Discovering Influential Nodes for Viral Marketing

Author: Iris Allen
Proceedings of the 42nd Hawaii International Conference on System Sciences - 2009

Discovering Influential Nodes for Viral Marketing Yung-Ming Li National Chiao-Tung University [email protected]

Cheng-Yang Lai National Chiao-Tung University [email protected]

Abstract
High cost and uncertainty are persistent problems in marketing. Influential online product reviews are more powerful than a firm’s advertisements. The key to viral marketing is to discover the “viruses” that spread product impressions efficiently. In this paper, a model combining mining techniques with an adapted RFM analysis is proposed to evaluate the influential power of online reviewers. A modified PMI equation quantifies the value of each review, and the RFM concept captures the writing status of reviewers for the influence calculation. An artificial neural network is adopted to learn the appropriate network structure for our model. Trust, the most common indicator of influential power, is then used to evaluate the model. The results show that our model outperforms two general methods in selecting influential reviewers. Our work can accurately point out which reviewers should be selected to become the virus.

1. Introduction
Under the current global economic structure, almost all firms face extreme competition from rivals around the globe. To survive in such a tough environment, superior marketing strategies are needed to raise sales and to gain larger market shares as well as customer loyalty. Research on marketing behavior has thus emerged as an important topic. For this reason, many marketing-related research issues have been studied from various perspectives and domains [20], such as:
1. organizational issues relevant to marketing strategy (e.g. branding, competitive behavior, positioning, and segmentation)
2. organizational issues that span functions (e.g. quality management)
3. the interface between marketing and business strategy
4. organization-level phenomena that impact marketing strategy (e.g. market orientation, corporate culture)
5. outcomes of marketing strategy (e.g. market share, customer satisfaction)

Chia-Hao Lin National Chiao-Tung University [email protected]

However, current marketing modes suffer from problems of cost and uncertainty. One reason enterprises are unwilling to invest resources in marketing is that revenues cannot be generated as planned. In other words, an investment in advertising may fail or be wasted because of unexpected events. In general, the goals of marketing are high growth in sales, market share, and gross margin in the marketplace. However, research has shown that the advancement of technology drives down manufacturing and managerial costs but, at the same time, rapidly raises marketing costs [11]. The implementation of Just-in-Time strategies and flexible manufacturing systems has reduced manufacturing costs efficiently. Within the whole cost structure of firms, only marketing costs (including expenses such as product development, selling, distribution, advertising, sales promotion, public relations, customer service, outbound logistics, and order fulfillment) have risen substantially over the last 50 years [14]. In addition, the returns on marketing are usually unpredictable. Although advertisements, which can take many forms, can spread extensively, it is difficult to predict how many viewers will be attracted and how much revenue will be generated for the enterprise. In the e-commerce context, the characteristics of the Internet make it the most appropriate platform for delivering virtual products or ordering physical ones. From the perspective of marketing, the purpose is to spread positive impressions of products to customers and, further, to attract their interest. In other words, advertising is a kind of “information spreading” procedure, and the Internet is the best medium for it. With lower costs, high speed, and externality effects, marketing on the Internet has advantages over traditional media, but these advantages come with drawbacks of their own.
For instance, although e-mail is one of the most common channels for general advertising on the web, statistics show that more than 95% of e-mails are junk mail [28]. Such a high junk rate makes most people pay less attention to these advertisements and lowers their effect. In addition, over-advertising makes customers have

978-0-7695-3450-3/09 $25.00 © 2009 IEEE


negative impressions of firms. No matter how low the cost of advertising on the Internet is, the resources are then definitely wasted. Recommendation mechanisms have been developed to filter out most junk information and to send customers only the products they are interested in. This can be seen as one-to-one marketing. The quality of recommendations relies on purchase history and personal preferences, which data mining techniques take as the basic system input [30][31]. The advancement of Internet infrastructure gives almost everyone the ability to contribute or share information on the Internet. These sharing behaviors on the web are called “Web 2.0”. In other words, information no longer flows purely through a client/server structure but rather through something like a peer-to-peer (P2P) architecture. The concepts of peer production and social networks are also built on the power of Web 2.0 [18]. Today’s Web 2.0 platforms offer customers an opportunity to obtain detailed information about other customers’ experiences with, and comments on, a specific product. In online communities and discussion forums, participants can contribute comments and find comments about products they need or are ready to buy. Moreover, these comments may be provided by nodes in their own social network. Consumers are far more likely to believe comments on real usage experience provided by a trusted node of their social network than firms’ advertisements. Viral marketing builds on this situation. Research has shown that social networks affect the adoption of individual innovations and products [8] and that the power of social networks spreads information at breathtaking speed [26]. In fact, users’ purchase decisions are usually influenced by the comments on purchase experience made within their own social network.
From the perspective of firms, marketing should focus only on the users who have power over others and who can be expected to be willing to spread product impressions. This strategy not only decreases costs but also increases the accuracy of marketing. This work aims to discover the influential nodes that have the potential to achieve the effects of viral marketing. How to measure the influence of each node is a very important question because it decides which nodes are appropriate to be the “virus”. Enterprises can use this information to make good marketing strategies and budget plans in order to achieve the best effects of infection. The remainder of this paper is organized as follows. In Section 2, we survey the existing literature on our research topics, which include viral marketing, information retrieval, text mining, the RFM concept, neural network applications, and trust. In

Section 3, we propose the system architecture and the methodologies applied in this work. The experimental procedures and materials are discussed in Section 4, which also presents the results and evaluations. Finally, Section 5 concludes the paper.

2. Literature Review
Viral marketing is a new marketing method that uses electronic communications (e.g. e-mail) to trigger brand messages throughout a widespread network of buyers [1]. Dobele et al. [1] studied several real marketing cases and analyzed why firms need viral marketing, how to apply technology to it, and how to use it successfully. Dobele et al. [2] showed that “emotion” has more impact than “the expectation of the recipient” on successful message passing. They also stated that marketing to a few influential people performs better than sending messages to everyone, which is what we want to achieve. Moore [23] investigated branding influence in a viral marketing environment. Leskovec et al. [12] proposed a model to explain user behavior in a large community. Richardson and Domingos [15] utilized probabilistic models and data from knowledge-sharing sites to design the best viral marketing plan. Zhan et al. [13] emphasized the important role of writing and referring to product reviews on the Internet. As for the methodologies of opinion mining, many scholars focus on identifying the author’s attitude, such as positive or negative [29][16]. However, opinion mining techniques are mostly applied to the binary classification of reviews. We want to use these techniques to score the value of each review in detail; modified, multi-dimensional scoring mechanisms are more appropriate for our problem. There are already some social datasets distributed on the web [27], which help simplify the data collection process for opinion mining. This technique will be one unit of our research, used to analyze semantics. Hughes [9] proposed the RFM (Recency, Frequency, and Monetary) analytical model in 1994 to measure the value of customers to enterprises. With RFM analysis, firms can easily understand the potential of customers by observing their past behavior.
Newell [17] also stated that the RFM method is very effective for customer segmentation. This simple and direct measure has therefore been used in direct marketing for a number of decades. Obviously, customers who purchased recently (Recency), who purchase many times (Frequency), and who spend more money (Monetary) with a marketer typically represent the best targets for new offerings [10]. Drozdenko and Drake [6]


applied hard coding techniques to RFM weighting, assigning weights to the three variables in RFM analysis. Chan [5] proposed a novel approach that combines customer targeting and customer segmentation for campaign strategies. In our work, we use this concept to judge part of the value of our target nodes, with some modifications to fit our data source scenario. An artificial neural network (ANN) is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation [4]. ANNs are appropriate for solving complex problems that include many variables. Kuo and Chen [21] utilized a fuzzy neural network to learn rules produced from order selection questionnaires in electronic commerce. Our study also relies on this technique to build a feasible model for accurate prediction of influential nodes. Trust can be used to indicate the strength of relationships among people without a detailed investigation of intention [24]. Munns [3] stated that trust is a personal relation to an individual that arises from the experiences of, and influences on, that individual. Smith and Barclay [25] studied the relationships between buyers and sellers and showed that trust is based on character/motives/intentions and role competence/judgment. Dasgupta [7] states that trust is helpful where there is uncertainty about the actions that will be undertaken by others and when these actions are of consequence to those involved. In other words, trust is an important and effective factor in making a purchase decision even when we are not familiar with the product. Because of these features, the trust score in our work and datasets is a clear and appropriate pointer to potential nodes. We use this pointer as the evaluation indicator to reflect the effects of our study.
We can find several works on viral marketing based on social networks. However, most of them focus on observing current business conditions or on calculating how information spreads in social networks. In fact, a practical model that can easily be applied to business strategy making is hard to find. Our work pays particular attention to applying information technologies to help enterprises find a good marketing strategy for viral marketing.

3. The model
The proposed model analyzes the after-use reviews provided by online users, together with the RFM values recorded from each author’s activity, to identify which authors are influential. Influential reviews represent the influence of their authors, and the RFM value indicates the infective ability of each reviewer through time segmentation. An influence ranking list of authors is expected to identify potential nodes, so we aim to construct a well-learned model that calculates each reviewer’s mixed score from the two elements. A mass of data containing complete review content and RFM attributes is needed to train a well-structured model. An artificial neural network is applied in the training procedure to obtain better weights among these elements. The well-trained network model is then fed with selected testing nodes and outputs a ranking list for selecting influential nodes. The high-ranking nodes are valuable targets for firms’ marketing. Like real viruses, they are expected to spread the fame of products and their manufacturers more widely and more strongly than other people. Firms can adopt special strategies to take advantage of these potential reviewers. Figure 1 displays the concept and the whole architecture of our system model:

Figure 1. System concept and architecture

A target node is a possible virus for viral marketing: a product reviewer chosen from an online social network environment. In this case, the environment is an online discussion area that provides a platform for users to write many kinds of product reviews. The model scores these viruses to decide which one is the most infective to market to. The infective ability is decided by two factors: the review value and the RFM value. The reviews written by each reviewer are analyzed with text mining techniques to measure the score, and the results of the analysis are quantified by our modified PMI model at six different degrees. In addition, the “RFM value” of each node is acquired by recording the attributes of each review (i.e. time, date, and category). Both scores are weighted into the final virus score, which decides whether the reviewer is valuable or not. The weighting mechanism in our proposed model is


implemented by an artificial neural network, which learns the most appropriate network structure to reflect the effect of each element through training on massive data. The mechanism is able to discover the hidden value in each review and to verify its effect at the same time. Each unit of this architecture is described in detail in the following subsections.

3.1 Recursive Word Set Expanding
In existing research, the semantics of a single article are usually classified into four categories: positive, negative, objective, and subjective. This work follows these categories to identify hidden information in reviews. People are more easily influenced by subjective expressions in strong wording. Empirically, many scholars focus on identifying the author’s attitude, such as positive or negative [16][29]. The semantic tendency of an article is usually decided by specific keywords that are clear and hard to misinterpret, so semantic identification helps judge the tendency of a review automatically. A trustworthy review must take a fair attitude toward the products it reviews and comments on. Therefore, the positive and negative perspectives are combined in the review analysis. Turney and Littman [19] define two sets of words that represent positive ($S_p$) and negative ($S_n$) sentiments respectively. To account for the subjectivity of reviews, both word sets are included in our model:

$S_p$ = { good, nice, excellent, positive, fortunate, correct, superior }
$S_n$ = { bad, nasty, poor, negative, unfortunate, wrong, inferior }
$S_{p+n} = S_p + S_n$

The composite word set $S_{p+n}$, the combination of $S_p$ and $S_n$, covers both positive and negative semantics, so its members carry the complete meaning of “subjective words”. To make the subjectivity check more accurate, we expand $S_{p+n}$ recursively from the WordNet online semantic lexicon into a subjective word base. We write $S_{p+n}^k$ for the word set at expansion level $k$. For example, $k=1$ gives the original 14 items in $S_{p+n}^1$, and for $k=2$ all synonyms of these 14 items are added to form $S_{p+n}^2$. The sets grow rapidly with $k$, and the number of matches also increases with the larger word set $S_{p+n}^k$. Six degrees of word set expansion are executed in the experiment to find the better expansion level. The six levels of word matching are recorded and quantified by the following PMI method.
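The level-by-level growth of the word set can be illustrated with a minimal Python sketch. The `synonyms` dictionary is a tiny hypothetical stand-in for WordNet (its entries are invented for illustration), and `expand_word_set` is an assumed helper name rather than the authors' implementation.

```python
# Seed sets S_p and S_n as listed in Section 3.1 (Turney and Littman [19]).
S_P = {"good", "nice", "excellent", "positive", "fortunate", "correct", "superior"}
S_N = {"bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"}

# Hypothetical synonym table standing in for a WordNet lookup (illustration only).
synonyms = {
    "good": ["fine", "solid"],
    "bad": ["awful"],
    "fine": ["great"],
    "awful": ["terrible"],
}


def expand_word_set(seed_words, synonyms, k):
    """Return the word set at expansion level k (k = 1 is the seed set itself)."""
    if k <= 1:
        return set(seed_words)
    # Recursive step: level k is level k-1 plus all synonyms of its members.
    previous = expand_word_set(seed_words, synonyms, k - 1)
    expanded = set(previous)  # using a set drops duplicate words automatically
    for word in previous:
        expanded.update(synonyms.get(word, ()))
    return expanded


levels = {k: expand_word_set(S_P | S_N, synonyms, k) for k in (1, 2, 3)}
```

With a real lexicon, the `synonyms` lookup would be replaced by a WordNet query; the recursion itself stays the same.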

3.2 PMI Strength Level Approach
PMI (Pointwise Mutual Information) is used to calculate the strength score of each review, which forms the basis of the review analysis. Turney and Littman [19] define PMI by the following equation:

$$\mathrm{PMI}(t, t_i) = \sum \log_2 \frac{\Pr_c(t, t_i)}{\Pr_r(t)\,\Pr_w(t_i)}$$

This equation measures the semantic association between a matched term $t$ in a review and a term $t_i$ in the word set $S_{p+n}^k$ by calculating the probability with which they occur in the whole article. From the consumers’ perspective, objective and influential reviews should not only focus on the merits of products but also point out their defects. Reviewers must be honest and objective in reporting their own experience if readers are to trust them. For this reason, both the positive and the negative subjective terms ($S_{p+n}^k$) are used to measure PMI values. The key to the PMI calculation is the values of $\Pr_c(t, t_i)$, $\Pr_r(t)$, and $\Pr_w(t_i)$. We define them as follows:

$$\Pr_r(t) = \frac{n_{tr}}{N_r}, \qquad \Pr_w(t_i) = \frac{1}{N_s}, \qquad \Pr_c(t, t_i) = 1 \ \text{(i.e. terms $t$ and $t_i$ are the same word)}$$

Here $n_{tr}$ is the number of occurrences of term $t$ (i.e. the number of matches) in the target’s review, and $N_r$ is the number of all words in this review. $N_s$ is the number of words in the word set into which $t_i$ was collected. The real effect of PMI is that it considers the number of matched words in the whole article, so it reflects the subjective strength level of the target. In addition, PMI takes into account the probability with which a word appears in each review, which reduces the errors caused by reviews having unequal numbers of words. To simplify the calculation and obtain values appropriate for processing, we modify PMI as follows:

$$\mathrm{PMI}(t, t_i) = \log_2\left[\Pr_r(t)\,\Pr_w(t_i)\right]$$
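A minimal Python sketch of the modified PMI score for one review follows. The tokenizer, the function name `modified_pmi_score`, and the choice to sum over distinct matched terms are assumptions made for illustration; the text does not spell out these details.

```python
import math
import re


def modified_pmi_score(review_text, word_set):
    """Sum log2(Pr_r(t) * Pr_w(t_i)) over the distinct matched terms of one review.

    Pr_r(t)   = n_tr / N_r  (occurrences of t over all words in the review)
    Pr_w(t_i) = 1 / N_s     (one over the size of the word set)
    """
    # Assumed tokenization: lowercase alphabetic runs (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", review_text.lower())
    n_r = len(tokens)
    n_s = len(word_set)
    if n_r == 0 or n_s == 0:
        return 0.0
    # Count occurrences n_tr of every token that matches the word set.
    counts = {}
    for tok in tokens:
        if tok in word_set:
            counts[tok] = counts.get(tok, 0) + 1
    return sum(math.log2((n_tr / n_r) * (1.0 / n_s)) for n_tr in counts.values())


seed_set = {"good", "nice", "excellent", "positive", "fortunate", "correct", "superior",
            "bad", "nasty", "poor", "negative", "unfortunate", "wrong", "inferior"}
score = modified_pmi_score("Good camera, good lens, but poor battery.", seed_set)
```

Because both probabilities are at most one, every matched term contributes a non-positive value, which is why the raw score needs to be rescaled before it is combined with other indicators.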


The modified PMI is computed from the viewpoint of each review. Each time, the model processes the score of one review, and PMI(t, t_i) considers all matches in it. That is, every review in the dataset has its own PMI score, which is the sum over all of its matches. Obviously, this equation produces negative scores, which is inconvenient for further processing. Before it is combined with the other characteristic values, the PMI score of every review is standardized by:

$$\mathrm{PMI}_{i\_std} = \frac{\mathrm{PMI}_i - \mathrm{PMI}_{\min}}{\mathrm{PMI}_{\max} - \mathrm{PMI}_{\min}}$$

Then the score of the target node is acquired by:

$$\frac{\sum_{i=1}^{n} \mathrm{PMI}_{i\_std}}{n}$$

This equation transfers the viewpoint from each review to each author: it sums and averages the PMI scores of all reviews of every author.

3.3 RFM Value
In this work, the original RFM concepts are modified to fit the experiment. Based on the characteristics of online product reviews, only the Recency and Frequency indexes of RFM analysis are adopted. In the Web 2.0 environment, information contributors provide knowledge to the Internet voluntarily, and their efforts have no direct relationship to pecuniary revenue.

3.3.1 Recency Model
The original concept of “Recency” is the number of days from the last purchase date to now. In the experiment, “Recency” is the time range $\gamma$ between the current date and the latest writing date of each node, measured in days. The benchmark date (i.e. the current date) is set at May 20th, 2008 because of the experimental duration.

$$\gamma_i = C - l_i$$

Here $l_i$ is the last writing date of node $i$ and $C$ is the current date. The initial Recency values are measured in days and need to be standardized before they can be combined with the other index values. Recency standardization differs slightly from the general standardization procedure because higher values indicate lower market value. To reflect the real meaning of Recency and for the convenience of later calculations, the following formula is used to standardize Recency:

$$\mathrm{Std}\,x_i = \frac{\gamma_i - \max \gamma_i}{\max \gamma_i - \min \gamma_i}$$

The absolute value indicates the strength: a lower Recency gives a higher standardized value.

3.3.2 Frequency Model
Frequency originally represents the number of purchases within a specific time range. A similar definition is applied in this work: it indicates the number of writings of each author within a specific time range. Segmenting the time range is the first step in recording frequency. Three separating time points are used, as follows:

$\theta_{365}$: number of writings made over 365 days

The three time points were chosen according to the general usage of RFM. In addition, most of the data concern electronic products, whose life cycles are shorter than those of general products.

3.4 Trust Value Calculation
The real spirit of viral marketing lies in the social network. It is also clear that recommendations from our friends are always more trustworthy. The trust relationships between reviewers can add further benefits to the spreading of product information. Many online discussion areas record impression scores for every online user, and users can establish personal friend lists and black lists. This procedure also indicates each user’s level of trust in others.

Figure 2. Trust name list expanding relations

Figure 2 displays how the trust relationships expand among online users. The relationships can be traced deeper and deeper, so the social connections of the members of the discussion area can be observed. In addition,


the level of intimacy between nodes can also be calculated in detail by introducing more trust scores. The social connections represent the influencing range of each node, called its SCN (Social Connection Number). The influential power of a reviewer relates to the range of his or her SCN. The SCN range trace of a target node can be formulated as the following equation:

$$\mathrm{SCN} = \cdots \sum_{j=1}^{n_1} \sum_{i=1}^{n_2} f_{ij} \cdots$$

Our purpose is to discover how large the influential range of each reviewer is, which is a fair indicator of his/her influence. In other words, we want to know how many people trust each reviewer. The whole Social Connection Number (SCN) of a reviewer can be constructed by recursive tracing. The pseudo code for the recursive SCN computation is shown in the following figure:
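One possible reading of this recursive tracing, written as a minimal Python sketch, is given here. It is an illustrative assumption and not the authors' pseudo code: the `trusted_by` mapping, the depth limit `max_depth`, and the rule of counting each truster only once are all choices made for this example.

```python
def social_connection_number(reviewer, trusted_by, max_depth):
    """Count distinct users who trust `reviewer` directly or through a chain
    of at most `max_depth` trust links (assumed interpretation of SCN)."""

    def trace(frontier, seen, depth):
        # Stop when the depth budget is used up or no new trusters were found.
        if depth == 0 or not frontier:
            return seen
        next_frontier = set()
        for user in frontier:
            next_frontier.update(trusted_by.get(user, ()))
        next_frontier -= seen  # count every user once, which also breaks cycles
        return trace(next_frontier, seen | next_frontier, depth - 1)

    reached = trace({reviewer}, {reviewer}, max_depth)
    return len(reached) - 1  # exclude the reviewer himself/herself


# Hypothetical trust lists: key = user, value = users who trust that user.
trusted_by = {"A": ["B", "C"], "B": ["C"], "C": ["D"], "D": ["A"]}
scn_depth_1 = social_connection_number("A", trusted_by, 1)
scn_depth_2 = social_connection_number("A", trusted_by, 2)
```

Tracing level by level, rather than depth-first, keeps the count correct when the same truster can be reached through paths of different lengths.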

Figure 2. Pseudo codes of SCN computation

4. Experiment and Result

4.1 Data Source
The data used in this experiment were collected from Epinions.com, an open platform on which online users write reviews of various products. It provides a simple trust mechanism that lets members identify the effect of reviews, which makes it easy for us to retrieve the related data. The dataset was composed by randomly picking ten reviews of one product in each sub-classification. In general, each review was written by one author (or node); all reviews written by these nodes were then retrieved for analysis. This process requires web crawler techniques or an open dataset. The training dataset includes 2952 reviews randomly selected from the Electronics sub-categories of Epinions.com. A further 715 reviews written by 16 reviewers were retrieved as the testing dataset.

4.2 Word Set Expansion
To match subjective words at different levels, the original word set $S_{p+n}$ needs to be expanded. WordNet defines the related subjective words and lists their synonyms clearly, so it is used to expand the original word set in this experiment. First, the adjective classification in WordNet is targeted because adjectives are the most appropriate words to indicate the tendency or semantics of a review author. Next, the JWordnet API is used to extract all synonyms of $S_{p+n}^1$ from the WordNet word base and to add these words to $S_{p+n}^1$, ignoring duplicates, to generate $S_{p+n}^2$. Six levels of word set expansion were constructed; the expansion results are shown in Table 1.

Table 1. Word set expansion results
k    Size of $S_{p+n}^k$
1    14
2    142
3    578
4    1241
5    2148
6    3223

Next, the results of the six different scenarios are recorded, and the following measure is applied to find the better word expanding level (i.e. the value of k):

$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \frac{\left| A_t - F_t \right|}{A_t}$$

Mean Absolute Percentage Error (MAPE) is adopted for this purpose. $A$ stands for the actual value and $F$ for the forecast value of the data. The concept of MAPE is simple to understand, and it clearly displays the difference between the actual and the predicted values. In addition, all reviewers in our testing dataset have some basic level of trust value, so we need not worry that the denominator would be zero. Table 2 shows the MAPE results under the six different word expanding levels:

Table 2. The six different word expanding levels
Neurons   k value   MAPE
50        1         4353.267014
50        2         3866.308969
50        3         3546.679299
50        4         2475.813779
50        5         4591.692716
50        6         4867.48149

A smaller MAPE represents more accurate results. Level 4 has the smallest error between the actual and the forecast trust values. That is, the four-level word set expansion is the most appropriate for our experiment. In the following experiment, the testing data are processed by an ANN model with a 50-neuron hidden layer and the four-level word set expansion.
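The choice of expansion level reduces to taking the minimum over the MAPE values of Table 2, as this short Python sketch shows (the variable names are illustrative):

```python
# MAPE per expansion level k, copied from Table 2 (50 hidden neurons).
mape_by_k = {
    1: 4353.267014,
    2: 3866.308969,
    3: 3546.679299,
    4: 2475.813779,
    5: 4591.692716,
    6: 4867.48149,
}

# The level with the smallest error is kept for the rest of the experiment.
best_k = min(mape_by_k, key=mape_by_k.get)
print(best_k)  # prints 4
```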

4.3 Word Matching


In this experiment, 920,137 words in 715 reviews are processed. Every review has six different matching results, but results are compared only under the same k value to maintain the accuracy of the experiment. The procedure is as follows:
(1) Keep the original word set.
(2) Match all words in the word set against all reviews in the dataset.
(3) Record the matching counts and calculate the PMI strength score.
(4) Average and record the review scores of each reviewer.
(5) Increase the k value and repeat steps (2)-(4) to acquire the scores at the different levels.
(6) Select the best score set, under a specific k value, for the subsequent process.
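Steps (3) and (4), combined with the min-max standardization of Section 3.2, can be sketched in Python as follows. The sample scores and author labels are invented for illustration, and taking the minimum and maximum over all reviews of the dataset is an assumption.

```python
def standardize(scores):
    """Min-max standardization: (PMI_i - PMI_min) / (PMI_max - PMI_min)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: all reviews share one score, so no spread to rescale.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def author_scores(review_scores):
    """Average the standardized PMI scores of each author's reviews.

    review_scores: list of (author, raw_pmi_score) pairs, one per review.
    """
    standardized = standardize([score for _, score in review_scores])
    per_author = {}
    for (author, _), value in zip(review_scores, standardized):
        per_author.setdefault(author, []).append(value)
    return {author: sum(vals) / len(vals) for author, vals in per_author.items()}


# Hypothetical raw PMI scores (negative, as the modified PMI produces).
sample = [("reviewer_a", -12.0), ("reviewer_a", -10.0), ("reviewer_b", -8.0)]
result = author_scores(sample)
```

Repeating this for each k value yields the six score sets from which the best level is chosen.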

4.4 RFM Score
The time attribute of each review is needed to calculate the Recency and Frequency values. Conveniently, both indicators are already defined from the reviewer’s viewpoint. The Recency and Frequency values are displayed in Table 3. A preliminary analysis of RFM reveals large differences among these reviewers’ publication patterns. Reviewers who write reviews continuously get low Recency values and, similarly, low Frequency values. This is helpful for identifying influential nodes.

Table 3. Recency and Frequency value
ID / Period      Recency (Standardize)   365 days
ASourdough4      20                      0.650
AtlantaGreg      94                      0.962
corona79         68                      0.000
dkozin           35                      0.522
Howard_Creech    50                      0.890
hwz1             890                     1.000
JIMILAGRO        1418                    1.000
jvolzer          97                      0.645
njpoteri         1518                    1.000
porcupine1       91                      0.600
readsteca        121                     0.766
sarahrose12      69                      0.917
theheidis        232                     0.933
tucknroll        851                     1.000
williamrender    1484                    1.000
zan720           1079                    1.000
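The Recency computation of Section 3.3.1 can be sketched in Python as follows. The sketch reads the integer Recency column of Table 3 as days since the last review, which is an assumption because the extracted column header is ambiguous; the function names and the sample date in the usage line are likewise illustrative.

```python
from datetime import date

BENCHMARK = date(2008, 5, 20)  # benchmark ("current") date given in Section 3.3.1


def recency_days(last_written, benchmark=BENCHMARK):
    """gamma_i = C - l_i, measured in days."""
    return (benchmark - last_written).days


def standardize_recency(gammas):
    """|(gamma_i - max gamma) / (max gamma - min gamma)|: lower Recency, higher value."""
    g_max, g_min = max(gammas.values()), min(gammas.values())
    if g_max == g_min:
        return {name: 0.0 for name in gammas}
    return {name: abs((g - g_max) / (g_max - g_min)) for name, g in gammas.items()}


# Integer Recency column of Table 3, assumed to be days.
recency = {
    "ASourdough4": 20, "AtlantaGreg": 94, "corona79": 68, "dkozin": 35,
    "Howard_Creech": 50, "hwz1": 890, "JIMILAGRO": 1418, "jvolzer": 97,
    "njpoteri": 1518, "porcupine1": 91, "readsteca": 121, "sarahrose12": 69,
    "theheidis": 232, "tucknroll": 851, "williamrender": 1484, "zan720": 1079,
}
standardized = standardize_recency(recency)
example_gamma = recency_days(date(2008, 4, 30))  # hypothetical last writing date
```

Under this reading, the most recently active reviewer receives the highest standardized value and the least recently active one receives zero.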

4.5 Trust Score
The purpose of the trust score calculation is to identify how large the influential range of each reviewer is, that is, to know how many people trust the reviewer.

A three-layer artificial neural network was applied to build a fitted model for predicting the ranking of reviewers. The NNTool in MATLAB 2006 is used in this experiment. Table 5 lists the parameters applied in network training.

Table 5. Parameters for neural network training
Parameters                            Value
Network type                          Feed-forward back-propagation
Number of neurons in hidden layer     50
Training function                     TRAINLM
Performance function                  MSE
Epochs                                150
Goal                                  0
Mu                                    0.001
Mu_dec                                0.1
Mu_inc                                10
Mu_max                                10,000,000,000
Max_fail                              5


4.7 Result and Evaluation
Table 6 shows the analysis result for our testing data. The ranks are determined by the predicted values:

Table 6. Influential ranking
Reviewer ID      Predicting value   Rank
ASourdough4      0.5101             6
AtlantaGreg      0.8826             3
corona79         0.1446             13
dkozin           0.7615             4
Howard_Creech    0.9474             1
hwz1             0.9129             2
JIMILAGRO        0.2709             9
jvolzer          0.3716             8
njpoteri         0.0013             16
porcupine1       0.0669             15
readsteca        0.1553             12
sarahrose12      0.0871             14
theheidis        0.2191             10
tucknroll        0.5523             5
williamrender    0.1579             11
zan720           0.4601             7
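The ranks of Table 6 follow from sorting the predicted values in descending order. A minimal Python sketch, with `rank_descending` as an illustrative helper name:

```python
# Predicted values from Table 6.
predicted = {
    "ASourdough4": 0.5101, "AtlantaGreg": 0.8826, "corona79": 0.1446,
    "dkozin": 0.7615, "Howard_Creech": 0.9474, "hwz1": 0.9129,
    "JIMILAGRO": 0.2709, "jvolzer": 0.3716, "njpoteri": 0.0013,
    "porcupine1": 0.0669, "readsteca": 0.1553, "sarahrose12": 0.0871,
    "theheidis": 0.2191, "tucknroll": 0.5523, "williamrender": 0.1579,
    "zan720": 0.4601,
}


def rank_descending(scores):
    """Rank 1 goes to the highest score; the values in Table 6 contain no ties."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


ranks = rank_descending(predicted)
```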

To judge whether our ranking is effective, two common ranking mechanisms, “popular author” and “review rating”, are compared with our method. These two methods were chosen because they cover “human connections” and “value of writings”. MAPE was used to measure the result and was calculated by the following equation:

$$\mathrm{MAPE}_R = \frac{1}{16} \sum_{i=1}^{16} \frac{\left| R_{Ti} - R_{Pi} \right|}{R_{Ti}}$$

RTi stands for the trust ranking of each reviewer and RPi is the predicted ranking. Comparing each predicted ranking with the trust ranking yields the MAPE values shown in Table 7. To judge whether our ranking is effective, we also collected the rankings produced by two common ranking mechanisms for comparison: the “popular author” and “review rating” approaches. We chose these two methods because they are widely applied and cover “human connections” and “value of writings”.

Popular author is a ranking mechanism applied by Epinions.com. Epinions.com selects popular authors on a monthly basis, and the ranking for the current month can be looked up in real time. Popular authors are classified by product category to make the mechanism more complete and effective. We collected the popular author ranking of each node in 2008 for evaluation. Given the characteristics of the reviews, the ranking category was set to “Electronics” and “Overall” for comparison. Although nodes have different rankings in the two categories, their relative positions are the same. Therefore, the popular author ranking is displayed as a single ranking set.

Review rating is another common ranking mechanism. When someone posts a review, every online member can rate it. In other words, each review has a composite score determined by other online users. We use the average score of all reviews written by each author as his/her overall review rating, and the ranking is generated from these average scores. The review rating represents the comments of readers, and we believe it should correctly reflect their feelings. It also indicates the value of these articles to online users.

Table 7. Ranking and MAPE value

id / Method      Ours     Popular Author   Review Rating   Trust
ASourdough4      6        4                12              3
AtlantaGreg      3        5                16              5
corona79         13       10               7               6
dkozin           4        2                8               4
Howard_Creech    1        1                4               1
hwz1             2        3                6               2
JIMILAGRO        9        12               13              10
jvolzer          8        6                15              8
njpoteri         16       16               1               16
porcupine1       15       11               10              14
readsteca        12       7                14              13
sarahrose12      14       13               3               11
theheidis        10       8                9               9
tucknroll        5        9                5               12
williamrender    11       15               2               15
zan720           7        14               11              7
MAPE value       0.2531   0.2918           1.0369          0
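The MAPE row of Table 7 can be recomputed from the rank columns. The sketch below is illustrative only: it assumes the usual form MAPE = (1/n) Σ |RTi − RPi| / RTi, and the names `mape`, `TRUST`, and `RANKINGS` are hypothetical helpers introduced here, not part of the authors' system.

```python
# Rankings copied from Table 7, in row order (ASourdough4 ... zan720).
TRUST = [3, 5, 6, 4, 1, 2, 10, 8, 16, 14, 13, 11, 9, 12, 15, 7]
RANKINGS = {
    "Ours": [6, 3, 13, 4, 1, 2, 9, 8, 16, 15, 12, 14, 10, 5, 11, 7],
    "Popular Author": [4, 5, 10, 2, 1, 3, 12, 6, 16, 11, 7, 13, 8, 9, 15, 14],
    "Review Rating": [12, 16, 7, 8, 4, 6, 13, 15, 1, 10, 14, 3, 9, 5, 2, 11],
}


def mape(trust_ranks, predicted_ranks):
    """Mean absolute percentage error of predicted ranks against trust ranks.

    Assumed form: (1/n) * sum(|RT_i - RP_i| / RT_i).
    """
    if len(trust_ranks) != len(predicted_ranks):
        raise ValueError("rank lists must have the same length")
    errors = [abs(rt - rp) / rt for rt, rp in zip(trust_ranks, predicted_ranks)]
    return sum(errors) / len(errors)


for method, ranks in RANKINGS.items():
    print(f"{method}: MAPE = {mape(TRUST, ranks):.4f}")
```

Under this assumed form, the three rank columns give 0.2531, 0.2918, and 1.0369, which agrees with the MAPE row reported in Table 7.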

As Table 7 shows, our proposed method has a lower error rate than the others. That is, our proposed method for predicting influential power in viral marketing has an accuracy rate of roughly 75%, given its MAPE of 0.2531. This shows that the composite method works. Our method determines the strength of influence from the quantity of emotional expression. Although this is similar to review rating, our method achieves a much lower MAPE than review rating does. Moreover, even though our method does not consider the popularity of the author, it still performs better than the mechanism that takes author popularity into account. In future research, including author popularity may help improve our model's performance. There are several reasons for this result:


First, our method is composed of key factors that may determine someone’s influence. Through opinion mining, this paper does not focus on judging whether reviews are positive or negative but pays attention to the “quantity of emotional expression” in them. Second, the ANN training process shapes the model to be close to the real trust value. Over thousands of training iterations, the system learned the patterns extracted from real data. Third, “popular author rating” considers only the hit numbers of reviews. In other words, the contents of reviews are not analyzed and their quality is not guaranteed. A reviewer may have high hit numbers yet fail to provide the information consumers need.

5. Conclusion

Although advances in information technology and the Internet have reduced the cost of marketing activities such as advertising, the “uncertainty” problem still exists. Many enterprises waste considerable resources on ineffective marketing. Viral marketing is a new and effective marketing method that relies on the power of “word of mouth” to save the resources and trouble of mass marketing. The key to viral marketing is finding the potential nodes that influence others and are willing to spread positive product impressions efficiently. On the Internet, recommendations in other online users’ product reviews have more influential power than traditional advertising. In this work, a solution for finding potential influential nodes was proposed. Text mining techniques and RFM analysis were combined to calculate the influential power of real online users through their reviews. The trust score, which is composed of thousands of human connections, was applied for evaluation, and the final results passed this examination. For firms, the proposed model measures the influential power of each node clearly and makes it easy to identify which nodes are most worth marketing to. The method provides a simple and helpful name list for improving marketing activities. It not only raises the success rate of marketing but also uses fewer nodes (and thus lower cost) to accelerate the spread of advertising and achieve viral marketing. High-scoring nodes act like real viruses, infecting members of the social network automatically and widely. The method helps carry out online viral marketing, which can save considerable resources in finding customers. Online viral marketing can spread product information widely to a large number of potential customers. It widens the reach of marketing and provides more opportunities for enterprises.

Some parts of this work can still be improved. As shown in the data description above, Electronics, Computers, and Media are the three main partitions of the dataset. The current model reflects only the Electronics category. A possible solution is to classify the reviews in advance. Retrieving related data according to the needs of the enterprise would lead to more accurate rankings. A flexible weighting mechanism is another factor that could be applied to this model. Such a mechanism would not only fit the model more closely to enterprises’ needs but also save computing resources.

Acknowledgements

The authors are grateful for the financial support of the National Science Council (NSC: 97-2410-H-009-035MY2).

References

[1] A. Dobele, D. Toleman, and M. Beverland, “Controlled infection! Spreading the brand message through viral marketing”, Business Horizons, 48(2), 2005, pp. 143-149.
[2] A. Dobele, A. Lindgreen, M. Beverland, J. Vanhamme, and R. Wijk, “Why pass on viral messages? Because they connect emotionally”, Business Horizons, 50(4), 2007, pp. 291-304.
[3] A. K. Munns, “Potential influence of trust on the successful completion of a project”, International Journal of Project Management, 13(1), 1995, pp. 19-24.
[4] Artificial neural network. [http://en.wikipedia.org/wiki/Neural_network]. Accessed on May 31st, 2008.
[5] C. C. Chan, “Intelligent value-based customer segmentation method for campaign management: A case study of automobile retailer”, Expert Systems with Applications, 34(4), 2008, pp. 2754-2762.
[6] Drozdenko, R. G. and P. D. Drake, Optimal database marketing: strategy, development, and data mining. Thousand Oaks: Sage Publications, 2002.
[7] Dasgupta P., Trust as a commodity. Edited by Diego Gambetta. Trust: Making and breaking cooperative relations. Oxford, UK: Blackwell, 1988.
[8] D. Strang, and S. A. Soule, “Diffusion in organizations and social movements: From hybrid corn to poison pills”, Annual Review of Sociology, 24, 1998, pp. 265-290.
[9] Hughes, A. M., Strategic database marketing - The masterplan for starting and managing a profitable, customer-based marketing program, third edition. McGraw-Hill Professional, 2005.
[10] J. A. McCarty, and M. Hastak, “Segmentation approaches in data-mining: A comparison of RFM, CHAID, and logistic regression”, Journal of Business Research, 60, 2007, pp. 656-662.
[11] J. A. Weber, “Managing the marketing budget in a cost-constrained environment”, Industrial Marketing Management, 31(8), 2002, pp. 705-717.
[12] J. Leskovec, L. A. Adamic, and B. A. Huberman, “The Dynamics of Viral Marketing”, ACM Transactions on the Web (TWEB), 1(1), 2007.
[13] J. M. Zhan, H. T. Loh, and Y. Liu, “Gather customer concerns from online product reviews - A text summarization approach”, Expert Systems with Applications, In Press, Corrected Proof, Available online 28 December 2007.
[14] J. N. Sheth, and R. S. Sisodia, “Feeling the heat: marketing is under fire to account for what it spends”, Marketing Management, 4, 1995, pp. 8-23.
[15] M. Richardson, and P. Domingos, “Mining knowledge-sharing sites for viral marketing”, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, 2002.
[16] M. Q. Hu, and B. Liu, “Mining and summarizing customer reviews”, In Proceedings of the Tenth ACM SIGKDD international Conference on Knowledge Discovery and Data Mining, New York, 2004, pp. 168-177.
[17] Newell Frederick, The new rules of marketing: How to use one-to-one relationship marketing to be the leader in your industry, McGraw-Hill Companies Inc., New York, 1997.
[18] O'Reilly T. What Is Web 2.0 - Design Patterns and Business Models for the Next Generation of Software. O'Reilly Network. September 30, 2005. [http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html]. Accessed on March 5th, 2008.
[19] P. D. Turney, and M. L. Littman, “Measuring Praise and Criticism: Inference of Semantic Orientation from Association”, ACM Transactions on Information Systems, 21(4), 2003, pp. 315-346.
[20] P. R. Varadarajan, and S. Jayachandran, “Marketing strategy: An assessment of the state of the field and outlook”, Journal of the Academy of Marketing Science, 27, 1999, pp. 120-143.
[21] R. J. Kuo, and J. A. Chen, “A decision support system for order selection in electronic commerce based on fuzzy neural network supported by real-coded genetic algorithm”, Expert Systems with Applications, 26(2), 2004, pp. 141-154.
[22] Rappa M. Business Models. Managing the Digital Enterprise. Chapter 5. [http://digitalenterprise.org/index.html]. Accessed on May 7th, 2008.
[23] R. E. Moore, “From genericide to viral marketing: on "brand"”, Language & Communication, 23(3-4), 2003, pp. 331-357.
[24] Simmel G., and D. Frisby, The Philosophy of Money. Routledge Taylor & Francis Group, 2004.
[25] S. J. Brockand, and D. W. Barclay, “The effects of organizational differences and trust on the effectiveness of selling partner relationships”, Journal of Marketing, 61, 1997, pp. 3-21.
[26] S. Jurvetson, “What exactly is viral marketing?”, Red Herring, 78, 2000, pp. 110-112.
[27] TrustLet, a free, collaborative project for collecting and analyzing information about trust metrics. [http://www.trustlet.org/wiki/Trust_network_datasets]. Accessed on May 30th, 2008.
[28] Ward Mark, More than 95% of e-mail is 'junk'. Technology correspondent, BBC News website. Thursday, 27 July 2006. [http://news.bbc.co.uk/2/hi/technology/5219554.stm]. Accessed on June 6th, 2008.
[29] X. W. Ding, B. Liu, and P. S. Yu, “A Holistic Lexicon-Based Approach to Opinion Mining”, Proceedings of the international conference on Web search and web data mining, 2008.
[30] Y. H. Cho, and J. K. Kim, “Application of Web usage mining and product taxonomy to collaborative recommendations in e-commerce”, Expert Systems with Applications, 26(2), 2004, pp. 233-246.
[31] Y. Y. Zhang, and J. X. Jiao, “An associative classification-based recommendation system for personalization in B2C e-commerce applications”, Expert Systems with Applications, 33(2), 2007, pp. 357-367.
