FEEDBACK MECHANISMS, JUDGMENT BIAS AND TRUST FORMATION IN ONLINE AUCTIONS

Author: Eustacia Harris

Abstract

Online markets like eBay and Amazon rely on electronic reputation or “feedback” systems to curtail moral hazard and promote trust among participants in the marketplace. These systems are based on the idea that providing information about a trader’s past behavior (performance on previous market transactions) allows market participants to form judgments regarding the trustworthiness of potential interlocutors in the marketplace. It is often assumed that traders correctly interpret the data presented by these systems when forming their judgments. In this paper, we demonstrate that this assumption does not hold. Using controlled laboratory experiments simulating an online auction site with participants acting as buyers, we find that participants interpret seller feedback ratings in a biased (non-Bayesian) fashion. In addition, we find that the degree to which buyers misweight seller feedback information is moderated by the presentation format of the feedback system, as well as by attitudinal and psychological attributes of the buyer. Specifically, we find that buyers interpret feedback data presented in an Amazon-like format in a more biased (less-Bayesian) manner than identical data presented in an eBay-like format. Further, we find that participants with online shopping experience interpreted feedback data in a more biased (less-Bayesian) manner than participants without such experience. Finally, we find that participants with greater institution-based trust (i.e., structural assurance) interpreted feedback data in a more biased (less-Bayesian) manner.

Keywords: Trust formation, feedback mechanisms, judgment bias, online auctions

1. Introduction

One way that online commerce sites have attempted to reduce fraud and mitigate the uncertainty associated with shopping online is by implementing online reputation systems. These systems allow buyers and sellers to publicly post feedback about their transaction experiences with their counterparts, and this form of shared reputation may serve to increase buyer trust and offset the effects of asymmetric information.

As a result of the success of several online marketplaces and the suspected role feedback systems have played in these successes, a number of economics and IS researchers have begun to study online reputation systems and their impact on buyer trust formation. Some of this work has looked at the design of feedback systems. For example, Dellarocas (2000) examined measures to safeguard feedback rating systems from positive or negative ballot stuffing (i.e., artificially inflating or deflating a seller’s ratings to fool consumers, eliminate competition, or simply out of spite). Chen and Singh (2001) developed a rating system that evaluates the users providing feedback and gives additional weight to the input of more credible participants. Related work by Miller, Resnick, and Zeckhauser (2002) explored the use of payments to induce truthful feedback from participants.

Another area of research has focused on the economic and behavioral outcomes of user feedback. In this area, experimental work by Bolton et al. (2004) found that markets with feedback systems were more efficient (i.e., more trades were made). Resnick et al. (2006) found that established sellers received price premiums over newcomers. Houser and Wooders (2006) found that seller, but not buyer, reputations affected selling price. Lucking-Reiley et al. (2007) found that both positive and negative feedback ratings affected price, but that negative ratings


had greater effect. Resnick and Zeckhauser (2002) found that sellers with better reputations were more likely to sell their items. Wolf and Muhanna (2005) found that feedback ratings mitigate but do not eliminate the risk of adverse selection or moral hazard on eBay Motors.

However, while a great deal of research has looked into the design and the various economic implications of online reputation systems, to date, no one has examined how users actually interpret these rating systems. The implied belief is that people correctly interpret the data presented by these systems and act accordingly. However, a large body of literature suggests that this may not be the case and that human intuitive judgments are subject to a number of biases (Griffin & Tversky, 1992; Jiang, Muhanna, & Pick, 1996; Nelson et al., 2001; Massey & Wu, 2005). In short, it is conceivable that even if system designers had the ability to create systems that were completely accurate, that induced all participants to provide truthful responses, and that eliminated all possibility of tampering or corruption, there would still be inefficiencies in the market if users are not able to interpret the data in an unbiased fashion.

As Kraemer and Weber (2004) note, non-rational consideration of strength and weight can also affect aggregate market outcomes. For example, user overconfidence may lead buyers to place unwarranted trust in some vendors, making shoppers vulnerable to fraud. On the other hand, under-confidence may lead users to be overly suspicious of some vendors. If users refuse to trade with these vendors due to lack of trust, the shunned vendors may suffer from reduced sales and reduced profits and, ultimately, be forced from the market.

The goal of this paper is twofold. First, we examine the extent to which an eBay- or Amazon-like feedback system facilitates buyer trust in an online-auction seller and assess the


degree to which risk and judgment bias play a role in trust formation. To accomplish this, we ran a set of controlled laboratory experiments simulating an online auction site with participants acting as buyers. Second, if buyers do misweight feedback information, to what degree is this misweighting affected by the presentation format (e.g., eBay-like or Amazon-like) of the feedback data, or by the psychological or attitudinal characteristics of the buyer? Using data collected via a pre-experiment survey, we examine both the indirect (moderating) role of presentation format as well as two psychological factors. One of these factors, structural assurance in online auctions, captures the buyer’s belief that legal and technological Internet protections like data encryption safeguard one from loss of privacy, identity, or money. The other factor is the participant’s online shopping experience. The second objective, therefore, is to provide a better understanding of the factors that might explain variations in the degree of judgment bias across individuals.

2. Literature Review and Hypotheses

For buyers, shopping online carries two distinct types of uncertainty, or risk. One set of risks involves uncertainty about the Internet as a transaction medium, and the other involves uncertainty about the seller. Hirshleifer and Riley (1979) call these risks system-dependent and transaction-specific uncertainty, respectively. Research shows that trust facilitates cooperative behavior, which can mitigate both system-dependent and transaction-specific uncertainty.

System-dependent uncertainty comprises events that are beyond the direct influence of either party (online shoppers or online sellers) and can be characterized as exogenous or environmental uncertainty. In the context of electronic commerce, exogenous uncertainty primarily relates to potential technological sources of errors and security gaps that are inherent in the Internet. They


are technology-dependent risks that cannot be avoided by an agreement or a contract with another actor involved in the transaction. Grabner-Kräuter and Kaluscha (2003) note that the smooth and secure processing of an online transaction depends on the functioning of the hardware and software that is used, as well as on the security of the data-exchange services, including the cryptographic protocols employed. Technical safety gaps can emerge either in the data channel or at the “final points” of the e-commerce system. In business-to-consumer e-commerce, the “final points” of the system are the desktop system of the customer, the server of the Internet retailer, and, possibly, the servers of the involved banks and the operators of the electronic marketplace.

Grabner-Kräuter and Kaluscha (2003) define transaction-specific uncertainty as endogenous, or market, uncertainty that results from the decisions of economic actors and is caused by an asymmetric distribution of information between the transaction partners. It is this type of uncertainty that online trust-generating mechanisms are created to address, and it is the focus of this paper. As Kollock (1999) notes, “At the heart of any unsecured transaction is a social dilemma.” By social dilemma, Kollock means:

…a situation in which behavior that is rational for the individual leads to an irrational collective outcome. In the case of a bilateral exchange (e.g., an online auction), there is a temptation to receive the good or service without reciprocation, but if both parties hold back on their side of the exchange, the trade is never consummated and both are worse off.


Akerlof’s (1970) seminal “Lemon Markets” paper illustrates the potential problem of trades involving uncertainties similar to those mentioned by Kollock and present in online auctions: total market failure. Akerlof notes that the cost of dishonesty in markets with asymmetric information is more than simply the losses of the swindled; it also includes the losses that result from driving legitimate businesses out of the market. In the “Lemons Market,” all sellers of high-quality cars are driven from the market, leaving only poor-quality autos (i.e., lemons).

In traditional commerce, merchants and buyers have long understood this temptation to defect or cheat. As a result, a wide range of formal and informal mechanisms for managing this risk has developed over the years. Ancient social traditions were designed to elicit trust during uncertain encounters. Handshaking demonstrated the absence of weapons, and the clinking of glasses evolved from pouring wine back and forth to prove it was not poisoned (Shneiderman, 1999). The simple act of meeting face to face for the transaction helps to reduce the likelihood that one party will end up empty-handed. However, separating the two sides by time or space (such as purchasing something by mail or on credit) introduces greater risk; the party that moves second must be trustworthy or have some other form of guarantee.

Highlighting the importance of trust (or the lack of it) in online settings, Lee and Turban (2001) find that lack of trust is one of the most frequently cited reasons for consumers not purchasing on the Internet. Supporting this, Fukuyama (1995) notes that users are more likely to participate in transactions and relationships if they feel that they are engaging in a trusting relationship. The author explains that users seek reliable reports about past performance and truthful statements of future guarantees.


So what exactly is trust, and how does one create trust in an online setting? Baier (1986) offers a useful starting point, writing:

One leaves others an opportunity to harm one when one trusts, and also shows one’s confidence that they will not take it. Reasonable trust will require reasonable grounds for such confidence in another’s good will, or at least the absence of good grounds for expecting their ill will or indifference. Trust, then, on this first approximation, is accepted vulnerability to another’s possible, but not expected, ill will (or lack of good will) toward one.

Several empirical studies have investigated the role of trust in the specific context of e-commerce, focusing on different aspects of this multi-dimensional construct. However, as Bhattacherjee (2002) explains, empirical research in this area is beset by conflicting conceptualizations of the trust construct, inadequate understanding of the relationships between trust, its antecedents, and its consequents, and the frequent use of trust scales that are neither theoretically derived nor rigorously validated. In the same vein, Grabner-Kräuter and Kaluscha (2003) lament that determinants of online trust are used interchangeably in many studies, making comparisons difficult.

Despite these difficulties, there does appear to be widespread agreement among researchers on the importance of trust in electronic commerce. However, while the notion of trust is widely accepted as beneficial in several settings and has been extensively investigated by researchers in a number of disciplines, the focus and definitions of trust in these various literatures vary greatly. In this paper, our focus is on the process through which a buyer arrives at his or her judgment regarding the trustworthiness of a given seller, recognizing


that seller trustworthiness is an important antecedent to a buyer’s trust or willingness to depend on the seller (Rousseau et al., 1998).

2.1. Feedback Mechanisms and Online Trust Formation

One way that online commerce sites have attempted to reduce fraud and mitigate the uncertainty associated with shopping online is by implementing online reputation systems. These systems allow buyers and sellers to publicly post feedback about their transaction experiences with their counterparts. This form of shared reputation may serve to offset the effects of asymmetric information.

2.1.1. eBay’s Feedback System

After a transaction is complete, eBay allows buyers to rate sellers on their performance and vice versa. There are three possible ratings an eBay user can give a transaction partner: positive (the party behaved honestly), neutral, and negative (the party behaved dishonestly). Positive ratings add one point (+1) to, and negative ratings subtract one point (-1) from, a user’s feedback rating; neutral ratings have no numerical impact. The seller’s feedback rating, along with the percentage of positive ratings the seller has earned, is prominently displayed on each eBay auction listing page (Kollock, 1999). As Kollock notes, this number is a summary measure of a person’s reputation in the eBay market. However, the number can be difficult to interpret. For example, a feedback rating of 10 could represent 10 positive ratings and no negatives, or 110 positive ratings and 100 negative ratings, together with any number of neutral ratings (Wolf and Muhanna, 2004). In addition, ratings earned as both a buyer and a seller are used to calculate a seller’s feedback rating. This means that a seller can earn a high feedback rating before ever selling an item.
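As a rough sketch (not eBay's actual implementation), the summary statistics described above can be expressed as follows; the two example histories show why the summary score alone is ambiguous:

```python
# Sketch of eBay-style summary statistics, following the +1/0/-1
# scheme described in the text; function names are ours.

def feedback_score(positives: int, neutrals: int, negatives: int) -> int:
    """Positives add +1, negatives subtract 1, neutrals add 0."""
    return positives - negatives

def percent_positive(positives: int, negatives: int) -> float:
    """Share of positive ratings among positives and negatives."""
    total = positives + negatives
    return 100.0 * positives / total if total else 0.0

# Two very different histories yield the same summary score of 10:
print(feedback_score(10, 0, 0))       # 10 positives, no negatives -> 10
print(feedback_score(110, 5, 100))    # 110 positives, 100 negatives -> 10
print(round(percent_positive(10, 0), 1))     # 100.0
print(round(percent_positive(110, 100), 1))  # 52.4
```

The percentage of positive ratings disambiguates the two histories, which is presumably why eBay added it to the listing page in 2003.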

However, eBay also allows users to view a detailed history of a seller’s (or buyer’s) feedback record. On each eBay user’s Feedback Profile Page is a table that lists the number of positive, neutral, and negative feedback ratings for the past 1, 6, and 12 months. The Feedback Profile Page also includes tabs that allow viewers to access the feedback comments other users have left for the user, as well as the feedback comments the user has left for others.

The design of the Feedback Profile Page, as well as the way feedback has been presented on eBay’s auction listing pages, has changed several times since eBay’s feedback system was first introduced in 1996 [1]. Originally, all registered eBay users had the ability to leave feedback for all other registered users. In 2000, the system was changed and the ability to leave feedback was limited to transaction partners (i.e., only the buyer and seller involved in a transaction could leave feedback on each other) [2]. At this time, each eBay user’s Feedback Profile Page contained a table (the ID Card) that listed the number of positive, neutral, and negative feedback ratings received in the past 7 days, as well as those received in the past month and past 6 months (Resnick and Zeckhauser, 2002). In 2003, the seller’s percentage of positive ratings was added to both the auction listing page and the user’s Feedback Profile Page (Steiner, 2003). The Feedback Profile Page was also changed: the ID Card was replaced with a table that listed the number of positive, neutral, and negative feedback ratings the user had received for the past 1, 6, and 12 months (Steiner, 2007). In May 2007, eBay introduced its Feedback 2.0 system, which included Detailed Seller Ratings (DSR) (Steiner, 2003). Detailed Seller Ratings enable buyers to rate sellers on an Amazon-like scale of 1 to 5 stars on four criteria (i.e., Item as Described, Communication,

1. http://www.ebaychatter.com/the_chatter/2008/02/feedback-change.html
2. http://www.ebaychatter.com/the_chatter/2008/02/feedback-change.html


Shipping Time, and Shipping and Handling Charge). Currently, the Detailed Seller Ratings for each of the four criteria are prominently displayed on the Feedback Profile Page.

2.1.2. Amazon’s Feedback System

Amazon users rate their experiences with transaction partners using 1 to 5 stars, with 5 stars being the best [3]. On Amazon, sellers have the ability to leave feedback on buyers. However, the system is intended primarily as a way for buyers to rate and evaluate sellers. As Amazon’s Feedback FAQ states, most buyers do not read their feedback, and Amazon does not use buyer feedback to evaluate buyers [4]. Further driving the point home, the FAQ tells sellers, “Please do not feel compelled to leave feedback for all buyers.”

On Amazon product listing pages, each seller’s average rating, along with their cumulative number of ratings (both for the past 12 months and lifetime), is prominently displayed alongside their Amazon user name (Roth and Ockenfels, 2002). Like eBay, Amazon also allows users to view past feedback details, including buyer comments. On each seller’s At-a-Glance Page, buyers can view a table depicting the percentages of positive, neutral, and negative feedback ratings, as well as the total number of ratings left by buyers, over the past 30, 90, and 365 days and for the seller’s entire “Lifetime” of selling on Amazon. Only feedback ratings received as a seller are used to calculate these percentages. Amazon’s online documentation describes how to map the 1 to 5 star ratings Amazon users provide into the eBay-like positives, neutrals, and negatives found on a seller’s At-a-Glance

3. http://www.amazon.com/gp/help/customer/display.html?ie=UTF8&nodeId=537774&qid=1207513896&sr=1-1
4. http://www.amazon.com/gp/help/customer/display.html?nodeId=1161284


Page: 5 or 4 stars map to Positive Feedback, 3 stars to Neutral Feedback, and 2 or 1 stars to Negative Feedback [5].

2.1.3. Related Literature

In studies of online reputation systems, Houser and Wooders (2006) find that seller, but not buyer, reputation affects selling price. Resnick et al. (2006) find that established sellers, those with publicly available feedback ratings, receive price premiums over newcomers. In related work, Ba and Pavlou (2002) find that ratings affect buyer trust, which in turn affects the price premium that sellers receive. Empirical investigations by Bajari and Hortacsu (2003), Lucking-Reiley et al. (2007), and Resnick and Zeckhauser (2002) each find evidence of connections between seller reputations and economic benefit. The existing literature strongly suggests that online reputation systems can be effective in reducing perceived transaction-specific risk due to information asymmetry.

2.2. Judgment Bias and Online Trust Formation

Judgment and decision-making research suggests that people often incorporate evidence into decision choices erroneously. In some cases, decision makers place too much weight on diagnostic evidence (e.g., Massey & Wu, 2005; Luan, Sorkin & Itzkowitz, 2004; Nelson et al., 2001; Griffin & Tversky, 1992), and in other cases, decision makers allow their decisions to be influenced by irrelevant or nondiagnostic factors (e.g., Chen & Jiang, 2006; Bloomfield, Libby, & Nelson, 2003; Nisbett, Zukier, & Lemly, 1981).

As an example of research demonstrating that decision makers often place too much weight on unimportant factors, experiments by Bloomfield et al. (2003) suggest that investors and

5. http://www.amazon.com/gp/help/seller/feedback-popup.html?ie=UTF8&seller=A30ER87EVMEYN8
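A minimal sketch of the star-to-category mapping described above (the thresholds follow the text; the function name is ours):

```python
# Map an Amazon-style 1-5 star rating onto the eBay-like categories
# described in the text: 5-4 positive, 3 neutral, 2-1 negative.

def stars_to_category(stars: int) -> str:
    if stars not in (1, 2, 3, 4, 5):
        raise ValueError("star ratings run from 1 to 5")
    if stars >= 4:
        return "positive"
    if stars == 3:
        return "neutral"
    return "negative"

print([stars_to_category(s) for s in (5, 4, 3, 2, 1)])
# -> ['positive', 'positive', 'neutral', 'negative', 'negative']
```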


analysts rely too heavily on old earnings reports when predicting future earnings. The authors report that experiment participants lent too much credence to old annual return on equity (ROE) information when predicting future ROE. According to these researchers, earnings performance from previous years is relatively uninformative once more current performance is known. They note that this pattern is especially strong for annual earnings, which approximate a random walk.

As an example of research demonstrating that decision makers often place too much weight on a subset of informative factors, Tversky and Kahneman (1974) gave two groups of high school students five seconds to estimate the product of a multiplication problem. One group was given the problem 8x7x6x5x4x3x2x1 and the other group was given the same problem written as 1x2x3x4x5x6x7x8. Tversky and Kahneman found that the students’ answers depended upon the order in which the sequences were presented: the mean answer for students given the first problem was 2,250, while the mean for students given the second problem was 512. The authors suggest that students gave the earlier numbers in the multiplication problem too much weight in the estimation of the final product. In this case, the product of the first few numbers is relevant information with respect to the final estimate. Similarly, Chen and Jiang (2006) examined earnings forecasts from 1985 to 2001, finding that, on average, financial analysts place larger-than-efficient weights on (i.e., they overweight) private information and on favorable news (whether private or public) when forecasting corporate earnings.

Griffin and Tversky (1992) suggest that two aspects of evidence must be considered in order to understand how people develop confidence in their decisions. One is the extremeness of the evidence (its strength) and the other is its weight. Statistical theory provides normative rules for combining strength and weight. One important rule, Bayes’ rule, specifies how to calculate a posterior probability given a prior probability, sample proportion (strength), and sample size

(weight). However, several studies suggest that people do not combine the strength and weight of information in accordance with statistical rules (Griffin & Tversky, 1992; Jiang et al., 1996; Nelson et al., 2001). Griffin and Tversky (1992) note that individuals systematically over-weight the strength of evidence (how well the evidence matches the hypothesis) and under-weight the weight (or diagnosticity) of the evidence.

Massey and Wu (2005) illustrate Griffin and Tversky’s point with a coin example. They ask readers to imagine a biased coin. The coin is either biased to land heads 70% of the time or biased to land tails 70% of the time, but you do not know which. If the biased coin is flipped several times, the proportion of flips on which the coin lands heads is the strength of the evidence, while the sample size represents its weight. In this example, 4 heads out of 5 flips would constitute high strength but low weight, while 32 heads out of 60 tosses would constitute low strength but high weight.

In settings like Griffin and Tversky’s (and Massey and Wu’s) coin-toss experiment, or when a potential bidder is attempting to interpret the feedback ratings of an eBay seller, individuals receive signals generated by one of two regimes. The system-neglect hypothesis posits that individuals respond primarily to the signal and secondarily to the system. This is consistent with Griffin and Tversky (1992), who found that participants in their coin-toss experiment tended to be overconfident when the strength of the representative information (the difference between the proportion of heads and tails in the sample) was high and the weight (the number of coin tosses, or sample size) low, and underconfident when strength was low and weight high.

Shifting from the abstract coin-toss setting to an online marketplace, we expect that, when presented with the number of positive and negative ratings for a seller, buyers in an online


market like eBay would form their judgments regarding the trustworthiness of that seller in a similar (biased) way. This leads to the following hypothesis: HYPOTHESIS 1. Online buyers combine the weight and strength of the evidence provided through the sellers’ feedback ratings in a biased fashion, overemphasizing the strength (or extremeness) and underemphasizing the weight (or credence) of the ratings.
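To make the strength-versus-weight distinction concrete, a Bayesian update for the two-hypothesis coin above (our own illustration; the 70% bias and the 4-of-5 vs. 32-of-60 samples come from the text) shows that the low-strength, high-weight sample actually carries more evidential force:

```python
# Posterior probability that the coin is the 70%-heads coin, given
# h heads in n flips, with a 50/50 prior over the two hypotheses.
# With symmetric hypotheses, the posterior depends only on h - (n - h).

def posterior_heads_biased(h: int, n: int, q: float = 0.7) -> float:
    net = h - (n - h)                        # heads minus tails
    likelihood_ratio = ((1 - q) / q) ** net  # P(data | tails-biased) / P(data | heads-biased)
    return 1.0 / (1.0 + likelihood_ratio)

# High strength, low weight: 4 heads in 5 flips.
print(round(posterior_heads_biased(4, 5), 3))    # ~0.927
# Low strength, high weight: 32 heads in 60 flips.
print(round(posterior_heads_biased(32, 60), 3))  # ~0.967
```

Despite its much weaker strength (53% heads versus 80%), the 60-flip sample yields the higher posterior, which is exactly the pattern Griffin and Tversky report that intuitive judges get backwards.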

2.2.1. Judgment Bias Antecedents

Next, if buyers do misweight feedback information, what causes this misweighting? And to what degree is this misweighting affected by the presentation format of the feedback information or by the psychological or attitudinal characteristics of the buyer?

2.2.2. Presentation Mode

Kraemer and Weber (2004) demonstrated that presentation mode can affect the manner in which users combine the weight and strength of evidence. Kraemer and Weber found, like Griffin and Tversky and others, that when users are presented with segregated data, they tend to give insufficient consideration to the weight and excessive focus to the strength of the evidence. However, when presented with equivalent aggregated data, Kraemer and Weber found that users gave excessive consideration to the weight of the evidence. Similarly, experimental work by Juslin, Wennerholm and Olsson (1999) found that data presentation format can affect subjects’ over- and under-confidence.

Given that, even when it is not highlighted, buyers tend to give excessive consideration to the strength of evidence, we believe that this judgment bias may be exacerbated by feedback systems


with presentation formats that prominently feature the strength of the feedback information. For this reason, we believe that judgment bias will be more pronounced when feedback data is presented in a format that emphasizes strength (e.g., a format similar to Amazon’s) than when the data is presented in a format that places less emphasis on the strength of the information (e.g., a format similar to eBay’s). Thus,

HYPOTHESIS 2. Buyers’ judgment bias will be more pronounced when feedback data is presented in a format that prominently features strength.

2.2.3. Structural Assurance

Previous decision-making research has shown that individual characteristics can affect judgment biases. For example, experimental studies suggest that men are more prone to overconfidence than women (e.g., Barber and Odean, 2001) and that Asians are more overconfident than Americans (e.g., Yates, Lee and Shinotsuka, 1996; Yates, Lee and Bush, 1997). As Chen, Kim, Nofsinger and Rui (2007) note, cognitive biases may be learned, and differences in life experiences and education may cause differences in cognitive biases.

A construct prominent in the online trust literature is structural assurance. According to McKnight et al. (2002), structural assurance means one believes that structures like guarantees, regulations, promises, legal recourse, or other procedures are in place to promote success. McKnight and associates note that a person with high Web-related structural assurance would believe that legal and technological Internet protections like data encryption safeguard one from loss of privacy, identity, or money. McKnight et al. (2004) find that structural assurance positively affects a person’s perceived quality of a website and a person’s willingness to explore a website. Both of these factors are, in turn, found to positively affect a person’s trusting intention

toward a website. McKnight et al. (2002) find that structural assurance of the Web is positively related to a person’s trusting intentions, or willingness to depend on a Web vendor, as well as to their trusting beliefs in a Web vendor. While each of these studies explores structural assurance’s effect on buyer trust, we believe that structural assurance in online auctions will have a similar effect on buyers’ judgment bias. That is, structural assurance will exacerbate this bias. Further, we believe that structural assurance in online auctions will both directly affect buyer judgment bias and act as a moderator between seller feedback (the strength and weight of the evidence) and buyer bias. Thus,

HYPOTHESIS 3. Judgment bias is increasing in buyers’ structural assurance in the infrastructure that enables the Internet to perform as a platform for transaction processing.

2.2.4. Experience

Several studies have demonstrated biased decision making by a wide array of experts, including physicians, clinical psychologists, meteorologists, forecasters and nurses (Shanteau, 1992). Often these studies have found that the decisions of experts are no less biased than those of inexperienced decision makers (Kahneman, 1991). As Chen, Kim, Nofsinger and Rui (2007) show, cognitive biases may be learned or even exacerbated with experience.

According to Langer (1975), people often assume a skill orientation in a chance situation, mistakenly seeing causal relationships between their actions and situational outcomes, even when no such relationship exists. For example, Henslin (1967) found that Las Vegas dice players behave as if they can control the outcome of the toss, tossing the dice softly when they want low numbers and harder when they want high numbers. This

“illusion of control” causes experienced stock traders to be overconfident, which leads to overtrading (Barber and Odean, 2001). Another example of experience leading to increased cognitive bias is attribution bias (Block and Funder, 1986). Work by Wolosin, Sherman and Till (1973) found that participants who were successful in their tasks saw themselves as responsible for their outcome, while participants who failed or had neutral outcomes blamed the situation (e.g., luck or room noise) or competitors for the results. As a result, experience often creates or exacerbates cognitive biases. Thus,

HYPOTHESIS 4. Judgment bias will be more pronounced in buyers with previous online auction shopping experience.

3. Data and Methods

To test the above hypotheses, we simulated an online auction site similar to eBay’s, with the participants acting as buyers. One hundred and twenty-seven undergraduate students from a large public university in the Midwest participated in the study. Prior to taking part in the experiment, participants completed a survey designed to capture their attitudes toward and experiences with electronic markets.

3.1. Experiment

The participants were told that sellers on eBay, Amazon and other online markets behave honestly or dishonestly by delivering (or failing to deliver) a product that matches the posted description in a manner consistent with the posted delivery terms. Further, the participants were told that on eBay and other online auction sites, after an auction is completed, the buyer is given the opportunity to provide feedback on the seller’s performance, and that there are three possible


ratings that a buyer can give a seller: positive (the seller behaved honestly), neutral, and negative (the seller behaved dishonestly).

{Insert Table 1 Here}

The study employed a within-subject design, and all participants were given identical instructions and presented with the same nine scenarios of products being auctioned, together with the prior feedback ratings of the sellers. Subjects were randomly assigned to one of three treatment groups. Each treatment group was presented with exactly the same seller feedback information; however, the presentation format of the feedback information was varied. In the first group, the feedback rating information was presented in a manner similar to the “ID Card” found on eBay Feedback Profile Pages between 2000 and 2003 (see Figure 1).

{Insert Figure 1 Here}

Figure 1 depicts the feedback profile for a seller with 7 positive, 0 neutral and 2 negative ratings in the year-2000 eBay-like treatment setting.

In addition to the seller’s feedback ratings, in each treatment group, for each scenario, participants were presented with a detailed description of the product and the auction’s buy-it-now price. On eBay and other auction sites, bidders have the opportunity to bid for the item and wait to see if they are the winning bidder at the end of the auction, or they can simply agree to pay the buy-it-now price and win the item on the spot. Once a bidder agrees to pay the buy-it-now price, the auction ends and that bidder is declared the winner.

A second group was presented with the feedback information in a format similar to the one employed on eBay auction listing pages after 2003 (see Figure 2).

{Insert Figure 2 Here}

Figure 2 depicts the feedback profile for a seller with 7 positive, 0 neutral and 2 negative ratings in the year 2003 eBay-like treatment setting. The seller’s feedback score is calculated by taking the number of positive ratings (7) and subtracting the number of negative ratings (2). The percentage of positive ratings is derived by dividing the number of positive ratings (7) by the total number of ratings (9). A final group was presented with the feedback information in a format similar to the one employed by Amazon (see Figure 3). In the Amazon-like feedback treatment, participants were shown a table resembling the table found on an Amazon seller’s “At a Glance” page.

{Insert Figure 3 Here}

Figure 3 depicts the percentages of positive, neutral and negative feedback ratings, as well as the total number of ratings left by buyers, for a seller with 7 positive, 0 neutral and 2 negative ratings in the Amazon-like treatment setting. We modified Amazon’s star structure slightly: in our system, 3 stars is Positive Feedback, 2 stars is Neutral Feedback and 1 star is Negative Feedback. Consistent with Amazon’s practice, we rounded the average rating to the first decimal place. In addition, to be pictorially accurate, we replicated Amazon’s method of displaying fractional stars (i.e., 0.6 is represented by a half-filled star). For this experiment, all of the products auctioned were identical, new, black, eight-gigabyte Apple iPods. All participants were shown a picture of the item and informed that the item could be purchased at multiple area retailers for $299.99. Finally, in addition to the above information, participants were shown shipping details that outlined the delivery terms and
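The summary numbers described above are simple to reproduce. The following sketch (illustrative only; the function names are ours, not part of the experiment's software) computes the post-2003 eBay-style score and percent-positive figure, and the Amazon-like star average under the 3/2/1-star mapping described in the text:

```python
def ebay_summary(pos, neu, neg):
    """Post-2003 eBay-style summary: the feedback score is positives
    minus negatives; percent positive is positives over all ratings."""
    total = pos + neu + neg
    score = pos - neg
    pct_positive = round(100 * pos / total, 1)
    return score, pct_positive

def amazon_stars(pos, neu, neg):
    """Amazon-like average under the paper's mapping (3 stars = positive,
    2 = neutral, 1 = negative), rounded to one decimal place."""
    total = pos + neu + neg
    return round((3 * pos + 2 * neu + 1 * neg) / total, 1)

# The example seller above: 7 positive, 0 neutral, 2 negative ratings.
print(ebay_summary(7, 0, 2))  # score 5, 77.8% positive
print(amazon_stars(7, 0, 2))  # 2.6 stars
```

For the example seller, the score is 7 − 2 = 5, the percentage positive is 7/9 ≈ 77.8%, and the star average is 23/9 ≈ 2.6.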


conditions for each auction. Participants were tasked with reading each scenario and, based on the information provided, assessing their level of confidence in the trustworthiness of the seller. For the experiment, nine distinct feedback ratings, which form three groupings with identical posteriors, were selected (see Table 1). The three groups were chosen to provide a spread of posteriors. Each group contains a high-, medium-, and low-weight rating. If participants were Bayesian, they would exhibit the same level of confidence in the trustworthiness of every seller with an equal posterior probability of being trustworthy. For example, regardless of prior probabilities, Bayesian participants would exhibit the same level of confidence in the trustworthiness of a seller with 3 positive feedback ratings and 0 negative ratings (posterior probability group 1 in Table 1) as they would for a seller with 6 positive ratings and 3 negative ratings (also in posterior probability group 1). For each seller rating and item pairing, the degree of trust in the seller was assessed using the following six-item scale:

1. This seller is honest.
2. This seller is reliable.
3. This seller is trustworthy.
4. I would feel comfortable buying this item from this seller.
5. I would not hesitate to participate in this auction from this seller.
6. I can rely on this seller to deliver to me the product as described in this auction in accordance with the posted delivery terms and conditions.

An 11-point Likert-like scale (0 = strongly disagree, 5 = neither agree nor disagree and 10 = strongly agree) was used. Similar to Ba and Pavlou (2002), trust is operationalized as the simple mean of the six questions (alpha = 0.9789). In addition to the above trust questions, a final question asked participants to enter the maximum amount they would be willing to bid on this item from this seller.

3.2. Empirical Specifications

According to Bayes’ Rule, for each sample case of the data (D), the posterior probability in favor of POSITIVE (trustworthy) seller performance, assuming equal prior probabilities, is as follows:

(1)  $P(\mathrm{POSITIVE} \mid D) = \dfrac{P(\mathrm{POSITIVE})^{\mathrm{POSITIVE}}\, P(\mathrm{NEGATIVE})^{\mathrm{NEGATIVE}}}{P(\mathrm{POSITIVE})^{\mathrm{POSITIVE}}\, P(\mathrm{NEGATIVE})^{\mathrm{NEGATIVE}} + P(\mathrm{NEGATIVE})^{\mathrm{POSITIVE}}\, P(\mathrm{POSITIVE})^{\mathrm{NEGATIVE}}}$

where P(POSITIVE) and P(NEGATIVE) are the percentages of positive and negative seller performances expected in the market, POSITIVE and NEGATIVE are the frequencies of positive and negative seller performance, respectively, and n = POSITIVE + NEGATIVE denotes the total number of seller transactions, representing the weight of the evidence. The difference between the proportions of POSITIVE and NEGATIVE ratings in the sample represents the strength of the evidence. Likewise, assuming equal prior probabilities, the posterior probability in favor of NEGATIVE seller performance is:

(2)  $P(\mathrm{NEGATIVE} \mid D) = \dfrac{P(\mathrm{NEGATIVE})^{\mathrm{POSITIVE}}\, P(\mathrm{POSITIVE})^{\mathrm{NEGATIVE}}}{P(\mathrm{POSITIVE})^{\mathrm{POSITIVE}}\, P(\mathrm{NEGATIVE})^{\mathrm{NEGATIVE}} + P(\mathrm{NEGATIVE})^{\mathrm{POSITIVE}}\, P(\mathrm{POSITIVE})^{\mathrm{NEGATIVE}}}$

Combined, the two formulas above yield the posterior log odds:

(3)  $\ln\dfrac{P(\mathrm{POSITIVE} \mid D)}{P(\mathrm{NEGATIVE} \mid D)} = (\mathrm{POSITIVE} - \mathrm{NEGATIVE}) \cdot \ln\dfrac{P(\mathrm{POSITIVE})}{P(\mathrm{NEGATIVE})} = n \cdot \mathrm{STRENGTH} \cdot \ln\dfrac{P(\mathrm{POSITIVE})}{P(\mathrm{NEGATIVE})}$

where STRENGTH = (POSITIVE − NEGATIVE)/n.
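The posterior in equation (1) and the closed-form log odds in equation (3) can be checked numerically. In this sketch the probability p that a trustworthy seller earns a positive rating is an assumed illustrative value (0.75), not a parameter from the study; the code also confirms the earlier claim that a 3-positive/0-negative seller and a 6-positive/3-negative seller have identical posteriors:

```python
import math

def posterior_positive(pos, neg, p=0.75):
    """Equation (1): posterior that the seller is the trustworthy type,
    with equal priors. p (assumed value) is the chance a trustworthy
    seller earns a positive rating; a dishonest one earns it with 1 - p."""
    q = 1 - p
    num = p ** pos * q ** neg
    return num / (num + q ** pos * p ** neg)

def log_odds(pos, neg, p=0.75):
    """Equation (3): closed-form log odds, (POSITIVE - NEGATIVE) * ln(p/q),
    i.e. weight n times strength (pos - neg)/n times ln(p/q)."""
    return (pos - neg) * math.log(p / (1 - p))

# Sellers (3+, 0-) and (6+, 3-) fall in the same posterior group:
print(posterior_positive(3, 0), posterior_positive(6, 3))  # identical values
```

Note that the posterior depends only on the difference POSITIVE − NEGATIVE, which is why ratings of 3/0 and 6/3 are Bayesian-equivalent.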

To test Hypothesis 1, following the lead of Griffin and Tversky (1992) and Jiang et al. (1996), we transformed participants’ trust judgments into natural log odds. We then performed a clustered multivariate regression with these log odds as the dependent variable and the natural log of STRENGTH ((POSITIVE - NEGATIVE)/n), the natural log of WEIGHT (n), and the estimated retail value (ERV) of the item as independent variables. The regression equation is:

(4) ln(Logodds) = β1 ln(Strength) + β2 ln(Weight) + β3 ERV + ε
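To illustrate equation (4), the sketch below generates trust log odds from known coefficients and recovers them by least squares. The data are synthetic, and the plain OLS fit is a simplification of the clustered regression actually used:

```python
import numpy as np

# Hypothetical feedback profiles (positive, negative counts).
pos = np.array([3, 6, 9, 4, 8, 12])
neg = np.array([0, 3, 6, 0, 2, 4])
n = pos + neg
strength = (pos - neg) / n          # strength of the evidence
weight = n.astype(float)            # weight of the evidence
erv = np.full(len(n), 299.99)       # estimated retail value (constant here)

# Synthetic log odds built from known coefficients beta = (1.2, 0.4, 0.0).
y = 1.2 * np.log(strength) + 0.4 * np.log(weight)

X = np.column_stack([np.log(strength), np.log(weight), erv])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # recovers approximately [1.2, 0.4, 0.0]
```

A Bayesian respondent would produce β1 = β2; the synthetic coefficients above (β1 > β2) mimic the strength-overweighting pattern the hypothesis predicts.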

In the above, TRUST denotes the perceived trustworthiness, and β1 and β2 are, respectively, the regression coefficients for strength and weight. It can be shown (à la Griffin & Tversky, 1992) that according to Bayes’ Rule, for any given prior probability, the regression coefficients for strength and weight should be equal if the participants were Bayesian (i.e., if they did not interpret the ratings in a biased fashion). To test Hypotheses 2-4 empirically, we ran a series of generalized least squares (GLS) regressions with random effects (grouped by participant). The regression equations are:

(6) ln(Logodds) = β1 ln(Strength) + β2 ln(Weight) + β3 eBay + β4 (ln(Strength) * eBay) + β5 (ln(Weight) * eBay) + β6 Amazon + β7 (ln(Strength) * Amazon) + β8 (ln(Weight) * Amazon) + ε

and

(7) ln(Logodds) = β1 ln(Strength) + β2 ln(Weight) + β3 eBay + β4 (ln(Strength) * eBay) + β5 (ln(Weight) * eBay) + β6 Old_eBay + β7 (ln(Strength) * Old_eBay) + β8 (ln(Weight) * Old_eBay) + ε

and

(8) ln(Logodds) = β1 ln(Strength) + β2 ln(Weight) + β3 StructuralAssurance + β4 (ln(Strength) * StructuralAssurance) + β5 (ln(Weight) * StructuralAssurance) + ε

and

(9) ln(Logodds) = β1 ln(Strength) + β2 ln(Weight) + β3 OnlineExperience + β4 (ln(Strength) * OnlineExperience) + β5 (ln(Weight) * OnlineExperience) + ε
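The dummy and interaction terms in equations (6)-(9) can be assembled mechanically. The sketch below builds one design-matrix row per observation for equation (6); the numeric values and format labels are hypothetical, and the classic (old) eBay format serves as the omitted baseline:

```python
import numpy as np

def design_row(ln_s, ln_w, fmt):
    """One row of the equation (6) design matrix: base terms, eBay and
    Amazon dummies, and their interactions with strength and weight
    (the old_eBay format is the omitted baseline category)."""
    ebay = 1.0 if fmt == "ebay" else 0.0
    amazon = 1.0 if fmt == "amazon" else 0.0
    return [ln_s, ln_w,
            ebay, ln_s * ebay, ln_w * ebay,
            amazon, ln_s * amazon, ln_w * amazon]

# Hypothetical observations: (ln strength, ln weight, format seen).
rows = [(-0.5, 2.2, "ebay"), (0.0, 1.1, "amazon"), (-1.1, 2.7, "old_ebay")]
X = np.array([design_row(*r) for r in rows])
print(X.shape)  # (3, 8): eight regressors per observation
```

With this coding, β4 and β5 measure how the post-2003 eBay format shifts the strength and weight coefficients relative to the baseline format, and β7 and β8 do the same for the Amazon format.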

In the above, ln(Logodds) is the transformed participants’ trust judgment found in equation (4). OnlineExperience is a dummy variable set to one if the participant had previous experience shopping online and zero otherwise. Old_eBay, eBay and Amazon are dummy variables that indicate the presentation format experienced by the participant. For example, if the participant was presented with the Amazon-like format (Figure 3), the Amazon dummy variable is set to one; otherwise, it is set to zero. Similarly, if the participant was presented with the post-2003 eBay-like format (Figure 2), the eBay dummy variable is set to one. Finally, if the participant was presented with the classic, 2000-2003, eBay-like format (Figure 1), the Old_eBay dummy variable is set to one. StructuralAssurance is the participant’s structural assurance in online auctions. It was assessed using the following four-item scale from the pre-experiment survey completed by all participants:

1. Online auction sites like eBay have enough safeguards to make me feel comfortable using them to transact personal business.
2. I feel assured that legal and technological structures adequately protect me from problems on online auction sites.
3. I feel confident that the technologies used on online auction sites make it safe for me to do business there.
4. In general, online auction sites are now a robust and safe environment in which to transact business.

All items were measured on a 7-point Likert-like scale (1 = strongly disagree, 4 = neither agree nor disagree and 7 = strongly agree). These items loaded strongly together into a single factor (Table 2). Consistent with extant literature, structural assurance was operationalized as the simple mean of the four questions (alpha = 0.9506).
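The scale reliabilities reported here (alpha = 0.9506 above, and 0.9789 for the trust scale) are Cronbach's alpha statistics. A minimal sketch of the standard formula (not the study's analysis code), assuming a respondents-by-items response matrix:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) response matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Perfectly consistent items (four identical columns) yield alpha = 1.
responses = np.tile(np.array([[1.0], [4.0], [7.0], [2.0]]), (1, 4))
print(cronbach_alpha(responses))  # approximately 1.0
```

Values near 0.95, as reported for both scales, indicate that averaging the items into a single score is statistically defensible.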

{Insert Table 2 Here}

4. Results

Table 3 (columns A and F) summarizes the testing of Hypothesis 1 and suggests that participants place significantly greater weight on the strength of a seller’s feedback ratings than on the weight of the ratings. As noted earlier, had the participants been Bayesian, on average the regression coefficients for strength and weight would have been equal. Further, in addition to pooling observations across participants, we also estimated the STRENGTH and WEIGHT coefficients of each subject separately. Again, the regression coefficient for strength was larger than the coefficient for weight for 117 out of 127 participants. Post-regression Wald tests found that the coefficient for strength was significantly larger for 68 of the 127 participants (p