Sawtooth Software RESEARCH PAPER SERIES

Adaptive Choice Based Conjoint Analysis
Rich Johnson, Sawtooth Software, Joel Huber, Duke University and Lynd Bacon, NFO Worldwide

© Copyright 2003, Sawtooth Software, Inc. 530 W. Fir St. Sequim, WA 98382 (360) 681-2300 www.sawtoothsoftware.com

Adaptive Choice-Based Conjoint
Rich Johnson, Sawtooth Software
Joel Huber, Duke University
Lynd Bacon, NFO Worldwide

A critical aspect of marketing research is asking people questions that will help managers make better decisions. Adaptive marketing research questionnaires make those questions responsive to what has been learned before. Such adaptation lets us use the information we already have to make our questions more efficient and less tedious. Adaptive conjoint processes for understanding what a person wants have been around for 20 years, the most notable example being Sawtooth Software's Adaptive Conjoint Analysis, ACA (Sawtooth Software 1991). ACA asks respondents to evaluate attribute levels directly, then to assess the importance of level differences, and finally to make paired comparisons between profile descriptions. ACA is adaptive in two important respects. First, when it asks for attribute importances, it can frame the question in terms of the difference between the most and least valued levels as expressed by that respondent. Second, the paired comparisons are utility balanced based on the respondent's previously expressed values. This balancing avoids pairs in which one alternative is much better than the other, thereby engaging the respondent in more challenging questions. ACA revolutionized conjoint analysis as we know it, replacing the fixed full-profile designs that had been the historic mainstay of the business.

Currently, ratings-based conjoint methods are themselves being displaced by choice-based methods, in which respondents make a series of hypothetical choices instead of evaluating product concepts (Huber 1997). Choice-based conjoint is advantageous in that it mimics what we do in the marketplace: we rarely rate a concept prior to choice, we simply choose. Further, even though choices contain less information per unit of interview time than ratings or rankings, hierarchical Bayes now allows us to estimate individual-level utility functions.

The design issue in choice-based conjoint is determining which alternatives should be included in the choice sets. Currently, most choice designs are not adaptive, and the particular choice sets individuals receive are independent of anything known about them. What we seek to answer in this paper is whether information about an individual's attribute evaluations can enable us to ask better choice questions. This turns out to be a difficult thing to do. We describe a method, Adaptive Choice-Based Conjoint (ACBC), and a study that tests it against other methods.

What Makes a Good Choice Design?

A good design is one in which the estimation error for the parameters is as small as possible. The error theory for choice designs was developed in seminal work by Dan McFadden (1974).

For an individual respondent, or for an aggregation of respondents whose parameters can be assumed to be homogeneous, the variance-covariance matrix of errors for the parameters has a closed form:

$$\Sigma_\beta = (Z'Z)^{-1}$$

where the elements of $Z$ are

$$z_{jn} = P_{jn}^{1/2}\left(x_{jn} - \sum_{i=1}^{J_n} x_{in} P_{in}\right)$$

The $z_{jn}$ are derived from the original design matrix, in which $x_{jn}$ is a vector of features of alternative j in choice set n, and $P_{jn}$ is the predicted probability of choosing alternative j in choice set n. These somewhat daunting equations were derived in Huber and Zwerina (1996), and they have a simple and compelling intuition. The Z-transformation centers each attribute around its expected (probability-weighted) value. Once centered, the alternatives are weighted by the square roots of their probabilities of being chosen. Thus the transformation produces a within-set, probability-centered, and probability-weighted design matrix.

Probability centering attributes and weighting alternatives by the square roots of their choice probabilities within each choice set lead to four requirements of a good choice design. The first implication is that the only information that comes from a choice experiment derives from contrasts within its choice sets. This leads to the idea of minimal overlap: each choice set should have as much variation in attribute levels as possible. The second principle is level balance, the idea that levels within attributes should be represented equally. For example, with four brands, the design will be more accurate if each of the four appears equally often in the choice design. The third principle is utility balance, specifying that each alternative in the set should have approximately equal probability of being chosen. Utility balance follows from the fact that each alternative is weighted by the square root of its probability of being chosen. At the extreme, if one alternative were never chosen within a choice set, its weight would be zero and it would contribute nothing to an understanding of the value of its attributes. The final principle is orthogonality, which says that the correlations among the columns of the Z matrix should be as close to zero as possible.

While these principles are useful in helping us understand what makes a good choice set, they are less useful in designing an actual choice experiment because they inherently conflict. Consider, for example, the conflict between orthogonality and utility balance. If one were able to devise a questionnaire in which probabilities of choice were exactly equal within each choice set, then the covariance matrix would be singular because each column of the Z matrix would equal a linear combination of the other columns. Generally speaking, there do not exist choice sets that simultaneously satisfy all four principles, so a search method is needed to find one that minimizes a global criterion.
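To make these quantities concrete, the sketch below computes the Z transformation and the resulting information matrix under multinomial logit probabilities. It is a minimal illustration, not the authors' code; the function names and the dummy-coded inputs are our own.

```python
import numpy as np

def z_matrix(X, beta):
    """Probability-centered, probability-weighted design matrix for one
    choice set (the Z transformation above).

    X    : (J, K) design matrix, one row per alternative in the set
    beta : (K,) partworths used to compute logit choice probabilities
    """
    u = X @ beta                           # deterministic utilities
    p = np.exp(u - u.max())                # stabilized logit numerators
    p /= p.sum()                           # choice probabilities P_jn
    centered = X - p @ X                   # center on probability-weighted mean
    return np.sqrt(p)[:, None] * centered  # weight rows by sqrt(P_jn)

def information_matrix(choice_sets, beta):
    """Fisher information Z'Z accumulated over a list of choice sets."""
    Z = np.vstack([z_matrix(X, beta) for X in choice_sets])
    return Z.T @ Z
```

The normalized D-error of a design is then `np.linalg.det(np.linalg.inv(info)) ** (1 / info.shape[0])`, which is the determinant criterion discussed next.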

The global criterion most used is the determinant of the variance-covariance matrix of the estimated parameters. Minimizing this determinant is equivalent to minimizing the volume of the ellipsoid defining the estimation errors around the parameters. The determinant also has convenient analytical properties (e.g., decomposability, invertibility, and continuous derivatives) that make it particularly suitable as an optimization measure. Efficient search routines have made the process of finding an optimal design much easier (Zwerina, Huber and Kuhfeld 1996). These routines tend to be used where one is looking for a single good design across people, one that works well for relatively homogeneous respondents (Arora and Huber 2001; Sandor and Wedel 2001). Adaptive CBC, by contrast, takes individual-level prior information and uses it to construct efficient designs on the fly. The process by which it accomplishes this is detailed in the next section.

Adaptive CBC's Choice Design Process

Adaptive CBC (ACBC) exploits properties of the determinant of the expected covariance matrix that enable it to find, quickly and efficiently, the next in a sequence of customized choice sets. Instead of minimizing the determinant of the inverse of the Z'Z matrix, ACBC performs the mathematically equivalent operation of maximizing the determinant of Z'Z, the Fisher information matrix. The determinant of Z'Z can be decomposed as the product of the characteristic roots of Z'Z, each of which has an associated characteristic vector. Therefore, if we want to maximize this determinant, increasing the sizes of the smallest roots makes the largest improvement. This, in turn, can be done by choosing choice sets with design vectors similar to the characteristic vectors corresponding to those smallest roots. In an attempt to provide modest utility balance, the characteristic vectors are further modified so as to be orthogonal to the respondent's partworths. After conversion to zeros and ones most of the resulting utility balance is lost, but it means one should rarely see dominated choices, an advantage for choice experiments.

ACBC begins with self-explicated partworths, similar to those used by ACA, constructed from rankings of levels within attributes and judgments of importance for each attribute (see Sawtooth Software 1991). It uses these to develop prior estimates of the individual's value parameters. The first choice set is random, subject only to requiring minimal overlap among the attribute levels represented. The information matrix for that choice set is then calculated, and its smallest few characteristic roots are computed, along with the corresponding characteristic vectors. Each alternative for the next choice set is then constructed from the elements of one of those characteristic vectors.

Once we have a characteristic vector from which we want to create a design vector describing an alternative in the proposed choice set, the next job is to choose a (0-1) design vector that best approximates that characteristic vector. Within each attribute, we assign a 1 to the level with the highest value, indicating that this level will be present in the design. An example is given in Figure 1 below.

Figure 1
Building a Choice to Correspond to a Characteristic Vector

                        Attribute 1              Attribute 2
                    L1      L2      L3       L1      L2      L3
  Char. Vector    -.03     .73    -.70      .93    -.67    -.23
  Design Vector      0       1       0        1       0       0
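The sketch below strings these steps together for one customized choice set: take the characteristic vectors of the smallest roots of the current information matrix, remove the component along the prior partworths, and convert each to a 0-1 design vector by the within-attribute rule of Figure 1. This is our reconstruction from the description above, not Sawtooth Software's implementation; details such as eigenvector sign handling and overlap checks are omitted.

```python
import numpy as np

def next_choice_set(info, beta, attr_slices, n_alts=3):
    """Construct the next choice set from the characteristic vectors
    associated with the smallest roots of the information matrix.

    info        : (K, K) Fisher information matrix Z'Z so far
    beta        : (K,) prior partworths
    attr_slices : list of slices marking each attribute's level columns
    """
    roots, vectors = np.linalg.eigh(info)   # eigenvalues in ascending order
    b = beta / np.linalg.norm(beta)
    alternatives = []
    for k in range(n_alts):
        v = vectors[:, k]                   # vector of one of the smallest roots
        v = v - (v @ b) * b                 # orthogonalize against partworths
        design = np.zeros_like(v)
        for s in attr_slices:               # within each attribute, show the
            design[s.start + np.argmax(v[s])] = 1.0  # level with the highest value
        alternatives.append(design)
    return np.array(alternatives)
```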

This relatively simple process results in choice sets that are focused on information from attribute levels that are least represented so far. Although Adaptive CBC is likely to find good choice sets, there are several reasons why these may not be optimal. They are listed below and elaborated upon after the results are presented.

1. The priors themselves have error. While work by Huber and Zwerina (1996) indicates that approximate priors work quite well, poor priors could result in less rather than more efficient designs.
2. The translation from a continuous characteristic vector to a categorical design vector adds another source of error.
3. D-error is designed for pooled logit, not for hierarchical Bayes logit with its ability to accommodate heterogeneous values across individuals.
4. The Adaptive CBC process assumes that human error does not depend on the particular choice set. Customized designs, particularly those that increase utility balance, may increase respondent error. If so, the increased error may counterbalance any gains in statistical efficiency.

In all, the cascading impact of these various sources of error could lead ACBC to be less successful than standard CBC. The results of the predictive comparisons below will test whether this occurs.

An Experiment to Test Adaptive CBC

We had several criteria in developing a test for ACBC. First, it is valuable to test it in a realistic conjoint setting, with respondents, product attributes and complexity similar to those of a commercial study. Second, it is important to have enough respondents that measures of choice share and hit rate accuracy can differentiate among the methods. Finally, we wanted a design in which we could predict not only within the same sample but also to an independent sample, a far more difficult predictive test. Knowledge Networks conducted the study, implementing the various design strategies among approximately 1,000 allergy sufferers who were part of their web-based panel.

Respondents made choices within sets of three unbranded antihistamines having the attributes shown in Table 1. There were 9 product attributes, of which 5 had three levels and 4 had two levels. Notice the two potentially conflicting price measures, cost per day and cost per bottle. We presented price information to all respondents both ways, but within each choice task only one of these price attributes appeared. We did not provide the option of "None," so there were a total of 14 independent parameters to be estimated for each respondent.

Table 1
Attributes and Levels Used to Define the Choice Alternatives

Attribute                                   Level 1                           Level 2                           Level 3
1. Cost/day                                 $1.35                             $.90                              $.45
2. Cost/100x 24 dose                        $10.80                            $7.20                             $3.60
3. Begins working in                        60 minutes                        30 minutes                        15 minutes
4. Symptoms relieved                        Nasal congestion                  Nasal congestion and headache     Nasal, chest congestion and headache
5. Form                                     Tablet                            Coated tablet                     Liquid capsule
6. Interacts with monoamine oxidase
   inhibitors (MAOIs)?                      Don't take with MAOIs             May take with MAOIs               --
7. Interacts with antidepressants           Don't take with antidepressants   May take with antidepressants     --
8. Interacts with hypertension medication   No                                Yes                               --
9. Drowsiness                               Causes drowsiness                 Does not cause drowsiness         --
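For the code sketches in this paper, the Table 1 attributes can be laid out as follows. This one-column-per-level coding is our illustrative assumption, not the study's actual coding scheme.

```python
# Levels per attribute, in the order of Table 1 (illustrative layout).
LEVELS = [3, 3, 3, 3, 3, 2, 2, 2, 2]

def attribute_slices(levels):
    """Column slices for a design vector with one column per level."""
    slices, start = [], 0
    for k in levels:
        slices.append(slice(start, start + k))
        start += k
    return slices

# 22 columns in total; dropping one reference level per attribute leaves
# 5*2 + 4*1 = 14 independent parameters, matching the count in the text.
ATTR_SLICES = attribute_slices(LEVELS)
```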

The form of the exercise was identical for all respondents; the only difference was the particular alternatives in the 12 calibration choice tasks. To begin, all respondents completed an ACA-like section in which they answered desirability and importance questions used to compute "prior" self-explicated partworths. We were confident of the rank order of desirability of levels within 8 of the 9 attributes, so we asked for the desirability of levels only for one attribute, the tablet/capsule form attribute. We asked about attribute importance for all attributes. Next, respondents answered 21 identically formatted choice tasks.

o The first was a "warm-up" task and its answers were discarded.
o The next 4 were holdout tasks for assessing predictive validity. All respondents received the same choice sets.
o The next 12 were used to estimate partworths for each respondent, and were unique for each respondent.
o The final 4 were additional holdout tasks, identical to the initial 4 except that the order of alternatives was rotated.

The respondents were randomly allocated to five experimental conditions, each containing about 200 people, whose calibration sets were determined by different choice design strategies. The first group received standard CBC questionnaires. CBC provides designs with good orthogonality, level balance, and minimal overlap, but it takes no account of respondents' values in designing its questions, and so makes no attempt at adaptive design. The second group saw choice sets designed by ACBC. ACBC does not directly seek utility balance, although it does take account of estimated partworths, designing questions that provide information lacking in previous questions. The third group also received questions designed by the adaptive algorithm, but one additional "swap" was made in each choice set, exchanging the levels of one attribute between two alternatives to create more utility balance. The fourth group was identical to the third, except that their choice sets had two utility-balancing swaps. Finally, a fifth group received questions designed by the adaptive algorithm but based on aggregate partworths estimated from a small pilot study. This group was not of direct interest in the present comparison and will not be reported further, although its holdout choices were included in the test of predictive validity.

Results

Before assessing predictive accuracy, it is useful to explore the ways the experimental conditions did and did not create differences on other measures. In particular, the groups did not differ with respect to the reliability of the holdouts: choice consistency for all groups was within one percentage point of 76%. Also, pre- and post-holdout choices did not differ with respect to choice shares. If the first holdouts had been used to predict the shares of the second holdouts, the mean absolute error would have been 2.07 share points. These reliability numbers are useful in that they indicate how well any model might predict.

The different design strategies did differ substantially with respect to the utility balance of their choice sets. Using the final estimated utility values, we examined the difference in utility between the most and least preferred alternatives in each choice set. If we set this range at 1.0 for CBC, it drops to .81 for ACBC, then to .32 for ACBC with one swap and .17 for ACBC with two swaps. Thus ACBC injects moderate utility balance compared with CBC, while each stage of swapping creates substantially more. This utility balance has implications for interview time. While regular CBC and ACBC each took around 9.15 minutes, adding one swap added another 15 seconds and two swaps another 25 seconds. Thus the greater difficulty of the choices had some impact on the time to take the study, but less than 10%.
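The balance measure and the swapping operation just described might look like the sketch below. The rule for picking which attribute to exchange is not spelled out above, so the selection criterion here (the exchange between the best and worst alternatives that most reduces the set's utility range) is our assumption.

```python
import numpy as np

def utility_range(X, beta):
    """Spread between the best and worst alternatives in a choice set."""
    u = X @ beta
    return u.max() - u.min()

def one_swap(X, beta, attr_slices):
    """Apply one utility-balancing swap: exchange the levels of a single
    attribute between the highest- and lowest-utility alternatives,
    choosing the attribute whose exchange most reduces the utility range.
    """
    u = X @ beta
    hi, lo = int(np.argmax(u)), int(np.argmin(u))
    trials = []
    for s in attr_slices:
        trial = X.copy()
        trial[hi, s] = X[lo, s]   # give the best alternative the worst's level
        trial[lo, s] = X[hi, s]   # and vice versa
        trials.append(trial)
    return min(trials, key=lambda t: utility_range(t, beta))
```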

In making our estimates of individual utility functions, we used a special version of Sawtooth Software's hierarchical Bayes routine that includes the option of using within-attribute prior rankings as constraints in the estimation. This option constrains the final partworths to match each respondent's initial ordering of the attribute levels. We also tested using the prior attribute importance measures, but found them to degrade prediction. This result is consistent with work showing that self-explicated importance weights are less useful in stabilizing partworth values (van der Lans, Wittink, Huber and Vriens 1992). Since within-attribute priors do help, their impact on prediction is presented below.

In comparing models it is appropriate to consider both hit rates and share predictions as different measures of accuracy. Hit rates reflect a method's ability to use the 12 choices from each person to predict that person's 8 holdout choices. Hit rates are important if the conjoint is used at the individual level, for example to segment customers for a given mailing. Share predictions, by contrast, test the ability of the models to predict choice shares for the holdout choices. Share predictions are most important when the managerial task is to estimate choice shares for new products. Hit rates are very sensitive to the reliability of individuals' choices, which in this case hovers around 76%. We measure the success of share predictions with Mean Absolute Error (MAE). MAEs are sensitive mostly to bias, since unreliability at the individual level is minimized by aggregation across independent respondents.

Hit rates, shown in Table 2, demonstrate two interesting tendencies (although none of the differences within columns is statistically significant). With unconstrained estimation there is some evidence that two swaps reduce accuracy. However, when constraints are used in estimation, hit rates improve for all groups, with the greatest improvement for the group with the most utility balance. We will return to the reason for this main effect and significant interaction after noting a similar effect on the accuracy of the designs in predicting choice share.

Table 2
Accuracy Predicting Choices
Percent of Holdouts Correctly Predicted by Different Design Strategies

Design Strategy           No HB Constraints    Within-Attribute Constraints
Regular CBC                     75%                       77%
Adaptive CBC                    74%                       76%
Adaptive CBC + 1 Swap           73%                       77%
Adaptive CBC + 2 Swaps          69%                       79%
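As a rough illustration of what the within-attribute constraints in Table 2 do, the sketch below projects estimated partworths onto a respondent's prior level ordering by reassigning the sorted values to the levels in that order. The actual routine enforces the constraints inside the HB estimation itself; this post-hoc projection is only an assumed stand-in.

```python
import numpy as np

def enforce_prior_order(partworths, attr_slices, prior_orders):
    """Force each attribute's partworths to respect a prior level ordering.

    partworths   : (K,) estimated partworths for one respondent
    attr_slices  : list of slices marking each attribute's level columns
    prior_orders : per attribute, level indices from least to most preferred
    """
    out = partworths.copy()
    for s, order in zip(attr_slices, prior_orders):
        vals = np.sort(out[s])                   # ascending estimated values
        out[np.asarray(order) + s.start] = vals  # smallest -> least preferred
    return out
```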

To generate expected choice shares we used Sawtooth Software's Randomized First Choice simulation method (Orme and Huber 2000). Randomized First Choice finds the level of error that, when added to the fixed portion of utility, best predicts holdout choice shares. It does this by taking 1,000 random draws from each individual after perturbing the partworths with different levels of variation, and it finds the level of variation that best predicts the choice shares of that group's holdout choices.
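In outline, such a simulation might look like the following. The perturbation distribution and the tuning loop here are illustrative assumptions, not the specifics of the Sawtooth Software implementation.

```python
import numpy as np

def rfc_shares(partworths, products, scale, n_draws=1000, seed=0):
    """Randomized-First-Choice-style share simulation (simplified).

    partworths : (R, K) estimated partworths, one row per respondent
    products   : (P, K) design vectors of the simulated alternatives
    scale      : magnitude of the random variation added to partworths
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(products))
    for _ in range(n_draws):
        noisy = partworths + scale * rng.gumbel(size=partworths.shape)
        choices = np.argmax(noisy @ products.T, axis=1)   # first-choice rule
        counts += np.bincount(choices, minlength=len(products))
    return counts / counts.sum()

# Tuning: choose the scale that minimizes the mean absolute error of the
# simulated shares against the holdout shares, e.g. by a grid search.
```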

Since the Randomized First Choice procedure may result in overfitting choice shares within a group, in this study we used the partworths within each group to predict the combined choice shares of the other groups. Table 3 provides the mean absolute error in the choice share predictions of the four design strategies. For example, regular CBC had a mean absolute error of 3.15 percentage points, 20% worse than the MAE of 2.61 for ACBC. Without constraints, the new ACBC method was the clear winner. When the solutions were constrained by within-attribute information, all methods improved and again, as with hit rates, the groups with the greatest utility balance improved the most. With constraints, as without, ACBC remained the winner. We are not aware of a statistical test for MAEs, so we cannot make statements about statistical significance, but it is noteworthy that ACBC with constraints has an MAE almost half that of regular CBC (without constraints).

Table 3
Error Predicting Share
Mean Absolute Error Projecting Choice Shares for Different Design Strategies

Design Strategy           No HB Constraints    Within-Attribute Constraints
Regular CBC                    3.15*                     2.28
Adaptive CBC                   2.61                      1.66
Adaptive CBC + 1 Swap          5.23                      2.05
Adaptive CBC + 2 Swaps         7.11                      3.06

* Read: Regular CBC had an average absolute error of 3.15 percentage points in predicting choice shares for different respondents.
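The MAE criterion in Table 3 is straightforward to compute; a minimal helper, with shares given as proportions:

```python
import numpy as np

def mae(predicted, actual):
    """Mean absolute error between predicted and actual choice shares,
    reported in percentage points."""
    return 100 * np.mean(np.abs(np.asarray(predicted) - np.asarray(actual)))
```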

Why were constraints more effective in improving the designs that included swaps? We believe this occurred because swaps make the choice model less accurate by removing necessary information from the design. In our case, the information used to do the balancing came from the prior estimates involving the rank orders of levels within attributes. Thus the rank-order information used to make the swaps needs to be added back in the estimation process.

An analogy may make this idea more intuitive. Suppose teams in a basketball league were handicapped by being balanced with respect to height, so that swaps made the average height of the players approximately the same at each game. The result might make the games closer and more entertaining, and could even provide greater opportunity to evaluate the relative contributions of individual players. However, such height-balanced games would provide very little information on the value of height per se, since height is always balanced between the playing teams. In the same way, balancing choice sets with prior information about partworths appears to make the individual utility estimates of partworths less precise.

We tested this explanation by examining the relationship between utility balance and the correlation between the priors and the final partworths. For (unbalanced) CBC the correlation is 0.61, while for Adaptive CBC with two swaps it drops to 0.36, with the intermediate balancing groups showing intermediate correlations. Bringing this prior information back in at the estimation stage raises the correlation for all methods to a consistent 0.68.

Summary and Conclusions

In this paper we presented a new method that uses a characteristic-roots-and-vectors decomposition of the Fisher information matrix to develop efficient individual choice designs. We tested this new method against Sawtooth Software's CBC and against Adaptive CBC designs that had additional swaps for utility balance. The conclusions relate to the general effectiveness of the new method, the value of swapping, and the benefit of including priors in the estimation stage.

Adaptive Choice-Based Conjoint

For those performing choice-based conjoint, the relevant question is whether the new adaptive method provides a benefit over standard CBC, which does not alter its design strategy depending on characteristics of the individual respondent. We find that the two techniques take about the same respondent time. In terms of accuracy, there are no significant differences in predicting individual choice as measured by hit rates. However, the new method appears to be more effective at predicting aggregate choice shares, although we are not able to test the statistical significance of this difference.

While we take comfort in the fact that this new prototype is clearly no worse than standard CBC, an examination of the four points of potential slippage discussed earlier suggests ways Adaptive CBC might be improved. The first issue is whether the priors are sufficiently accurate to appropriately guide the design. The second issue arises from the imprecision of approximating continuous characteristic vectors with zero-one design vectors. The third issue concerns the appropriateness of minimizing D-error, a criterion built around pooled analysis, for the individual estimates from hierarchical Bayes. The final issue is whether human error from more difficult (e.g., utility balanced) choices counteracts any efficiency gains. Below we discuss each of these issues.

The first issue, basing a design on unreliable prior estimates, suggests a context in which the adaptive procedure will do well relative to standard CBC. Suppose there is relatively low variability in the partworths across subjects. In that case, the HB procedure will do a fine job of approximating the relatively minor differences in values across respondents. However, where there are substantial differences in values across respondents, even an approximate adjustment of the choice design to reflect those differences is likely to help differentiate respondents with very different values from the average.

The second issue relates to the additional error imparted by fitting continuous characteristic vectors into categorical design vectors. The current algorithm constructs design vectors from characteristic vectors on a one-to-one basis. However, what we really need is a set of design vectors that "span the space" of the characteristic vectors, and these could be derived from any linear transformation of them. Just as a varimax procedure can rotate a principal components solution to have values closest to zero or one, it may be possible to rotate the characteristic vectors to define a choice set that is best approximated by the zeros and ones of the design vectors.

The third issue relates to the application of D-error in hierarchical Bayes estimation, particularly one allowing for constraints. While the determinant is well established as an aggregate measure of dispersion, hierarchical Bayes needs choice sets that permit one to discriminate a person's values from the average, with less emphasis on the precision of the average per se. More simulations will be needed to differentiate a strategy that minimizes aggregate D-error from one that minimizes error in the posterior estimates of individual values.

The final issue involves the possible increase in human error brought about by the adaptive designs, and particularly by their utility balance. As evidence of greater task difficulty, we found that the utility-balanced designs took longer, but by less than 10%. Notice, however, that the increased time taken may not compensate for the difficulty of the task: it is possible that the error around individuals' choices also increases with greater utility balance. It will be difficult to determine the extent of such increased errors, but they clearly limit the effectiveness of the utility balance aspect of adaptive designs.

Utility Balance

Along with orthogonality, minimal overlap and level balance, utility balance is one of the factors contributing to the efficiency of choice designs. We tested the impact of utility balance by adding one or two utility-balancing swaps to the Adaptive CBC choice sets. Unless constraints were used in the estimation process, two swaps degraded both hit rate and MAE accuracy. It is likely that the decay in orthogonality induced by the second swap combined with greater individual error to limit accuracy. However, if too much utility balance is a bad thing, a little (one swap) seems relatively benign. Particularly if constraints are used, one swap does well by both the hit rate and the choice share criteria.

The general problem with utility balance is that it is easy to characterize but hard to set at an optimal level. Some utility balance is good, but too much quickly cuts into overall efficiency. D-error is one way to trade off these goals, but limiting the number of swaps may not be generally appropriate. For example, in a choice design with relatively few attributes (say 3 or 4), one swap should have a greater impact than in our study with nine attributes. Further, the benefit of balancing generally depends on the accuracy of the information used to do the balancing. The important point is that while a general rule of thumb recommending one but not two swaps may usually work, it will certainly not apply in all circumstances.

Using Prior Attribute Orders as Constraints

Using individual priors about attribute level orders as constraints in the hierarchical Bayes analysis improved both hit rates and share predictions. It is relevant to note that using prior importance weights did not help; people appear unable to state consistently what is important to them. However, the effectiveness of using the rankings of levels within attributes suggests that choices depend importantly on this information.

Using this within-attribute information had particular value in counteracting the negative impact of utility balancing. Utility balancing results in less precision with respect to the information used to do that balancing. Thus it becomes important to add back in the analysis the information that was lost in the choice design.

In the current study most of the attributes were ones on which respondents agreed on the order of levels. These are sometimes called "vector attributes," for which people agree that more of the attribute is better. Examples of vector attributes for antihistamines include speed of action, low price and lack of side effects. By contrast, there are attributes, such as brand, type of pill, or bottle size, on which people may reasonably disagree about the ordering. Where there is substantial heterogeneity in values from non-vector attributes, we expect that optimizing the design on this individual-level information and using priors as constraints should have even greater impact than in the current study.

In conclusion, the current study gives reason to be optimistic about the effectiveness of adaptive choice design. It is likely that future research will both improve the process by which prior information guides choice design and suggest changes in design strategies that adjust to different product class contexts.

References

Arora, Neeraj and Joel Huber (2001), "Improving Parameter Estimates and Model Prediction by Aggregate Customization of Choice Experiments," Journal of Consumer Research, 28:2 (September), 273-283.

Huber, Joel and Klaus Zwerina (1996), "The Importance of Utility Balance in Efficient Choice Designs," Journal of Marketing Research, 33 (August), 307-317.

Huber, Joel (1997), "What We Have Learned from 20 Years of Conjoint Research: When to Use Self-Explicated, Graded Pairs, Full Profiles or Choice Experiments," Sawtooth Software Conference Proceedings, available at http://www.sawtoothsoftware.com/download/techpap/whatlrnd.pdf

McFadden, Daniel (1974), "Conditional Logit Analysis of Qualitative Choice Behavior," in Frontiers in Econometrics, P. Zarembka, ed. New York: Academic Press, 105-142.

Orme, Bryan and Joel Huber (2000), "Improving the Value of Conjoint Simulations," Marketing Research, 12 (Winter), 12-21.

Sandor, Zsolt and Michel Wedel (2001), "Designing Conjoint Choice Experiments Using Managers' Prior Beliefs," Journal of Marketing Research, 38 (November), 430-444.

Sawtooth Software (1991), "ACA System: Adaptive Conjoint Analysis," available at http://www.sawtoothsoftware.com/download/techpap/acatech.pdf

Sawtooth Software (1999), "Choice-Based Conjoint (CBC)," available at http://www.sawtoothsoftware.com/download/techpap/cbctech.pdf

van der Lans, Ivo A., Dick Wittink, Joel Huber and Marco Vriens (1992), "Within- and Across-Attribute Constraints in ACA and Full Profile Conjoint Analysis," available at http://www.sawtoothsoftware.com/download/techpap/acaconst.pdf

Zwerina, Klaus, Joel Huber and Warren Kuhfeld (1996), "A General Method for Constructing Efficient Choice Designs," available at http://support.sas.com/techsup/technote/ts677/ts677d.pdf