User Intention Modeling in Web Applications Using Data Mining

World Wide Web: Internet and Web Information Systems, 5, 181–191, 2002. © 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.

ZHENG CHEN [email protected]
Microsoft Research Asia, 49 Zhichun Road, Beijing 100080, PR China

FAN LIN [email protected]
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, PR China

HUAN LIU [email protected]
Arizona State University, PO Box 875406, Tempe, AZ 85287-5406, USA

YIN LIU [email protected]
Department of Computer Science and Engineering, Tongji University, Shanghai, PR China

WEI-YING MA [email protected]
Microsoft Research Asia, 49 Zhichun Road, Beijing 100080, PR China

LIU WENYIN [email protected]
Department of Computer Science, City University of Hong Kong, Hong Kong SAR, PR China

Abstract

The problem of inferring a user’s intentions in human–machine interaction has been a key research issue in providing personalized experiences and services. In this paper, we propose novel approaches to modeling and inferring a user’s actions on a computer. Two linguistic features – keyword and concept features – are extracted from the semantic context for intention modeling. Concept features are conceptual generalizations of keywords, and association rule mining is used to find the proper concept for each keyword. A modified Naïve Bayes classifier is used in our intention modeling. Experimental results show that our proposed approach achieves 84% average accuracy in predicting the user’s intention, which is close to the precision (92%) of human prediction.

Keywords: intention modeling, user modeling, machine learning, data mining, Web navigation

1. Introduction

The rapid growth of the Internet has resulted in an exponential growth of information. People often get lost trying to find information, even with the help of search engines [4]. Meanwhile, software applications aim to provide ever more powerful functionality to satisfy the needs of different users. For example, in order to compose a multimedia document, the user must learn how to insert different media objects and format them appropriately. To better assist users in searching for what they want more efficiently and in learning new software tools more effectively, the computer needs to understand the user’s intention.


In our opinion, the user’s intention can be classified into two levels: action intention and semantic intention. Action intentions are lower level, such as mouse clicks, keyboard typing, and other basic actions performed on a computer. Semantic intentions correspond to what the user wants to achieve at a high level, which may involve several basic actions on a computer to accomplish. For example, “I want to buy a book from Amazon,” “I want to find some papers on data mining,” and “I want to attach an image file to the email I am composing” [12] are semantic intentions. In this paper, we mainly focus on predicting action intentions based on features extracted from the user’s interaction, such as typed sentences and viewed content. Although not explicitly designed to predict the user’s semantic intention, the predicted actions together constitute the high-level goal that the user intends to achieve.

It has been shown that assistance is helpful when the user’s intention is predicted by observing the user’s behavior [10]. For example, in Web surfing, a user may conduct a series of actions including clicking (hyperlinks), saving (pages), and closing (the browser). Suppose a user’s semantic intention is to buy a digital camera. He may do the following: first, open a Web browser; second, type www.amazon.com in the address bar; third, after the page is returned, type digital camera in the search box; fourth, click on one of the objects on the page; fifth, click the buy button to confirm; and last, after the transaction is finished, close the browser. Our goal is to predict this series of basic actions that the user will conduct to accomplish his intention of buying a digital camera on the Web. Based on the prediction, a software agent may, for example, automatically highlight the hyperlinks that the user is likely to click.

In this paper, we use a modified Naïve Bayes classifier to model the user’s action intention on a computer. The model can be trained incrementally and used to predict the user’s next action. Besides keyword features, our algorithm also utilizes WordNet to generalize the learned rules to similar words. Our experiments show that our prediction algorithm can reach 85% accuracy.

The rest of the paper is organized as follows. Section 2 gives a brief overview of related work. Section 3 describes our algorithm in detail. Section 4 presents the experiments and evaluations. We conclude in Section 5.

2. Related work

Predicting a user’s intention based on a sequence of the user’s actions or other related semantic information is an interesting and challenging task. In fact, predicting the user’s intention is an important functionality of agents, which are known as intelligent, and frequently autonomous and mobile, software systems. Hence, various agent systems have been designed, each with its own method of achieving this goal. Bauer et al. [3] introduced some typical notions and procedures for constructing and training such an agent. However, most of these works focused mainly on the user’s preferences and did not address the difference between the user’s intentions and preferences.

Office Assistant [10] is perhaps the most widely used agent. It uses Bayesian Networks to analyze and predict the user’s goals.


A temporal reasoning method is used to handle changes in the user’s goals. To the best of our knowledge, this work is probably the only one that has studied the user’s intention in depth. However, we think the Office Assistant’s predictions could be further improved if semantic contexts were used in addition to action sequences for mining user intention.

For Syskill & Webert, the authors of [15] compared several learning algorithms for modeling and predicting user preferences, including a Bayesian classifier, Nearest Neighbor, PEBLS, Decision Trees, TF*IDF, and Neural Nets. However, their methods require manual tagging, and no incremental algorithms are considered. For Retriever, the authors of [8] used the TF*IDF weighting scheme, a traditional Information Retrieval approach, to automatically mine the user’s preferences from the semantic information the user is involved in. Furthermore, they analyze the queries the user typed to improve precision, using a so-called query domain analysis approach. Other information agents of this kind, such as WebWatcher [2], WebMate [5], and WAIR [17], are based on a similar approach. For instance, WebWatcher considers hyperlink information, and WAIR adds the user’s feedback as an important source for analysis. In contrast with most existing works [6,7], our approach not only considers keyword features extracted from the text but also forms a concept hierarchy of keywords to improve prediction performance.

Research efforts on incremental algorithms have also been made in this field. Somlo and Howe [9] gave a detailed discussion of incremental clustering, presenting and comparing different algorithms, including explicit cluster assignment, greedy clustering, and a doubling algorithm. These algorithms decrease the space needed to maintain the source documents and speed up model rebuilding. Inspired by their success, in this paper we also extend the Naïve Bayes algorithm to support incremental learning in the intention modeling process.

3. Mining user’s intention

3.1. Linguistic features

Linguistic features are features in the text that may indicate a user’s intention. Two types of linguistic features are considered in this paper: keyword features and concept features. A keyword feature is a single word extracted from the text, which may be stemmed [16], with stop words excluded. For example, the sentence “Attached word file is a map of our office” is parsed into the keyword features “attach word file map office.” Although keywords carry much of the meaning of a sentence, they are too specific to represent the user’s intention. For example, “road” and “way” sometimes have the same meaning, yet they are different keyword features. To solve this problem, we introduce the concept hierarchy of keywords, whose concepts are more general than keywords. WordNet [13] is a tool representing underlying lexical concepts in various forms: synonyms (similar terms), hypernyms (broader terms), hyponyms (narrower terms), etc. Among these, the hypernym relation is a good form of concept hierarchy for keywords.


FEATURE_EXTRACTION(Record, α, β)
    For each Ri in the Record set (R1, ..., Rn)
        Extract (Ki1, ..., Kim) from Ri and add to F
        For each Kij
            Generate C of Kij using WordNet
            Add C and the action tag of Ri to T
    Given α, β, generate the Association Rules (Ci1, ..., Cim) from T using the Apriori algorithm and add them to F
    Return F

Figure 1. Algorithm for extracting linguistic features.

For instance, the hypernym ladder of the word “dog” is “dog ⇒ canine ⇒ · · · ⇒ mammal ⇒ · · · ⇒ animal ⇒ life form ⇒ entity,” which reads “a dog is a kind of . . . ,” and the hypernym ladder of the word “cat” is “cat ⇒ feline ⇒ · · · ⇒ mammal ⇒ · · · ⇒ animal ⇒ life form ⇒ entity.” If we roll up the concept hierarchy, the words “dog” and “cat” can be merged into “mammal,” “animal,” or even “entity.” This operation is the generalization of keywords. Not all concept generalizations are suitable as linguistic features: “entity” may be too abstract a concept for the keyword “dog” and is not representative, whereas “animal” is a good concept in the case above. We use WordNet to extract the concept hierarchy of each keyword and select the most representative hypernym as the concept feature by means of association rules. As the experiments show, user intention prediction based on concept features outperforms prediction based on pure keyword features.
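To make the two feature types concrete, the following Python sketch reproduces the keyword and concept extraction described above using NLTK’s Porter stemmer [16], stop-word list, and WordNet interface [13]. The paper does not name its tooling, so this pipeline is our illustration, not the authors’ implementation.

# Sketch: keyword features (stemmed, stop words removed) and the WordNet
# hypernym ladder used for concept features. Requires the NLTK data packages
# 'stopwords' and 'wordnet' (via nltk.download(...)).
from nltk.corpus import stopwords, wordnet as wn
from nltk.stem import PorterStemmer

STOP = set(stopwords.words('english'))
STEMMER = PorterStemmer()

def keyword_features(text):
    # "Attached word file is a map of our office" -> stemmed keywords
    tokens = [w.lower() for w in text.split() if w.isalpha()]
    return [STEMMER.stem(w) for w in tokens if w not in STOP]

def hypernym_ladder(word):
    # Concept hierarchy of a keyword, e.g. dog => canine => ... => entity
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return []
    ladder, node = [], synsets[0]
    while node.hypernyms():
        node = node.hypernyms()[0]  # follow the first hypernym path
        ladder.append(node.name().split('.')[0])
    return ladder

print(keyword_features('Attached word file is a map of our office'))
print(hypernym_ladder('dog'))  # passes through 'mammal' and 'animal' up to 'entity'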

3.2. Feature extraction algorithms

At the keyword level of feature extraction, the text is parsed so that all words are extracted from the sentences and stemmed, with stop words excluded. Each keyword is a feature and is added to the keyword feature set. In Section 3.1, we mentioned that a concept in the hierarchy of a keyword may be selected as a better representation. Given the α and β thresholds, which are explained in detail below, association rules are employed to mine the most popular concept of the keywords in the training data, that is, to generalize the various keywords to the same concept level. The algorithm for extracting linguistic features is presented in Figure 1, where α and β are the thresholds for rule generation, F is the feature set, C is the concept hierarchy, and T is the transaction set for association rule mining.

In Section 3.1 we mentioned that we need to select an appropriate concept for each keyword as its concept feature. To make this selection automatic, we chose a rule generation method to mine the proper concept. The Apriori algorithm proposed by Agrawal et al. [1] was adopted to generate the association rules. The rules represent the association between features and intentions; e.g., given a feature, the association indicates whether a click action is intended. The association rules were further constrained by two


parameters: α (the support of the item sets) and β (the confidence of the association rule) [1]. The first parameter, α, depicts the scope of a rule; it can be expressed as the percentage of records that contain both the feature and the corresponding intention. The second parameter, β, depicts the probability that the rule holds, i.e., the probability of the intention given the appearance of the feature. We evaluate the generated rules based on these two parameters: rules whose parameters exceed certain thresholds are selected as concept features.
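A minimal sketch of this selection criterion, restricted to single-item rules of the form concept ⇒ intention for brevity (the full Apriori algorithm [1] also enumerates larger item sets); the record representation is our assumption:

from collections import Counter

def select_concepts(records, alpha=0.005, beta=0.6):
    # records: list of (concept, action) pairs drawn from the training log
    n = len(records)
    pair_counts = Counter(records)
    concept_counts = Counter(concept for concept, _ in records)
    selected = set()
    for (concept, action), count in pair_counts.items():
        support = count / n                            # alpha: scope of the rule
        confidence = count / concept_counts[concept]   # beta: P(action | concept)
        if support >= alpha and confidence >= beta:
            selected.add(concept)
    return selected

The default thresholds match the values used in Section 4 (α = 0.005, β = 0.6).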

3.3. Intention modeling

Without loss of generality, we focus on modeling and inferring the user’s intention in the Web browser environment, because training and testing data are easier to collect there. Initially, a user intention model is empty, but it can be learned from the user logs. Each user action record contains a text part and an action tag, as well as other important information that may reflect the user’s intention. The XML format is adopted for recording the user’s log data, as shown in Figure 2. “Action Type” is one of the five intentions listed in Section 4. “Title” and “Body” contain the corresponding content of the HTML file. “Links” records every URL link that appears in the HTML file.

Figure 2. A skeleton of XML user log data.
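For concreteness, the following sketch writes and reads one record in the spirit of Figure 2, using Python’s standard xml.etree.ElementTree. The element names mirror the prose above; the paper’s exact schema is our assumption.

import xml.etree.ElementTree as ET

# Build one user-action record (hypothetical element names).
record = ET.Element('Record')
ET.SubElement(record, 'ActionType').text = 'click'
ET.SubElement(record, 'Title').text = 'Learning & Research'
ET.SubElement(record, 'Body').text = 'Microsoft Research Asia ...'
links = ET.SubElement(record, 'Links')
ET.SubElement(links, 'Url').text = 'http://research.microsoft.com/'

# Serialize and parse it back, as a logging tool and a trainer would.
xml_text = ET.tostring(record, encoding='unicode')
parsed = ET.fromstring(xml_text)
print(parsed.find('ActionType').text)  # -> click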


INTENTION_MODELING(Record, M)
    This algorithm incrementally builds the intention model from training data.
    Call FEATURE_EXTRACTION(Record, α, β)
    For each Ri in the Record set (R1, ..., Rn)
        Use IG to select (Ki1, ..., Kij, Ci1, ..., Cij)
        Add (Ki1, ..., Kij, Ci1, ..., Cij) to the counting set
        If U ≥ 0.01 (threshold for incremental learning)
            Recalculate the probability distribution of the intention model and empty the counting set:
                P* = (P · N⁺_old + N⁺_new) / (N_total + 1)
        Else keep the probability distribution unchanged
    Return M

Figure 3. Algorithm for intention modeling.

The Naïve Bayes classifier and the Bayesian Belief Network are two machine learning algorithms widely used in recent years because they provide a probabilistic approach to inference [14]. The Naïve Bayes classifier is based on the simplifying assumption that the features are conditionally independent, which dramatically reduces the complexity of learning the target function. In contrast, a Bayesian Belief Network describes the joint probability distribution for a set of features. In general, a Bayesian Belief Network provides better classification performance than a Naïve Bayes classifier; however, the computational complexity of building a Bayesian Belief Network becomes impractical when the training data is large. Therefore, we chose the Naïve Bayes classifier to build intention models and revised the algorithm to support incremental learning. The training algorithm for intention modeling is depicted in Figure 3, where Record (R1, ..., Rn) is a set of log data tagged with the user’s intention, n is the number of records, Model (M) is the previously trained intention model (initially empty), K_ij is a keyword feature, C_ij is a concept feature, m is the dimension of the features, IG is the Information Gain algorithm [18], and U is the percentage of untrained records.

To support incremental learning, we store extra count information in the counting set: N⁺_old is the count of previous positive training examples, N⁺_new is the count of positive examples added in the current batch, and N_total is the total count of the training data. With this counting set, there is no need to keep the raw training data, and the probability table can be updated incrementally.

Furthermore, we use Information Gain [18] to select the most discriminative features (both keyword and concept), which reduces the size of the dictionary and improves performance when the training set is small. Information Gain measures the significance of a feature for intention prediction based on its presence or absence in a record. Let $\{V_i\}_{i=1}^{m}$ denote the set of predefined intentions. The information gain of a feature f is defined as follows:

IG(f) = -\sum_{i=1}^{m} P(V_i) \log P(V_i) + P(f) \sum_{i=1}^{m} P(V_i \mid f) \log P(V_i \mid f) + P(\bar{f}) \sum_{i=1}^{m} P(V_i \mid \bar{f}) \log P(V_i \mid \bar{f}).    (1)

Given a training set, we compute the information gain for each unique feature and remove from the feature set those features whose information gain is below a predetermined threshold.
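The counting-set bookkeeping can be sketched as follows. This is a minimal illustration of the incremental update in Figure 3, not the authors’ implementation; in particular, the add-one smoothing in prob() is our own choice.

from collections import defaultdict

class IncrementalNaiveBayes:
    def __init__(self, threshold=0.01):
        self.counts = defaultdict(lambda: defaultdict(int))  # feature -> intention -> N+
        self.intent_counts = defaultdict(int)
        self.pending = []            # the counting set of untrained records
        self.n_trained = 0
        self.threshold = threshold   # rebuild when U >= 0.01

    def observe(self, features, intention):
        self.pending.append((features, intention))
        u = len(self.pending) / (self.n_trained + len(self.pending))
        if u >= self.threshold:      # U, the fraction of untrained records
            self._rebuild()

    def _rebuild(self):
        # Fold N+_new into N+_old and empty the counting set.
        for features, intention in self.pending:
            self.intent_counts[intention] += 1
            for f in features:
                self.counts[f][intention] += 1
        self.n_trained += len(self.pending)
        self.pending.clear()

    def prob(self, feature, intention):
        # P(feature | intention) with add-one smoothing (our choice).
        return (self.counts[feature][intention] + 1) / (self.intent_counts[intention] + 2)

Because only counts are kept, the raw training records can be discarded after each rebuild, which is the point of the counting set.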


Table 1. Probability distribution of the trained model. P(click) = 0.5, P(save) = 0.2, P(close) = 0.3.

Feature      Click   Save   Close
Learn         0.5     0.2    0.3
Research      0.6     0.2    0.2
Cognition     0.7     0.1    0.2

3.4. Predicting user intention

Once training is completed, the obtained intention model is used to predict the user’s intentions in the future. The prediction process is as follows. A set of linguistic features (f_1, f_2, ..., f_n) is extracted from the text typed or viewed by the user. Assuming conditional independence among the features, the prediction module calculates the probabilities of all predefined user intentions V and chooses the one with the maximum probability, v_NB, based on the following equation:

v_{NB} = \arg\max_{v_j \in V} P(v_j \mid f_1, f_2, \ldots, f_n)
       = \arg\max_{v_j \in V} P(v_j) P(f_1, f_2, \ldots, f_n \mid v_j)
       = \arg\max_{v_j \in V} P(v_j) P(f_1 \mid v_j) P(f_2 \mid v_j) \cdots P(f_n \mid v_j).    (2)

For example, given the title of a hyperlink, “Learning & Research,” we can apply the model to infer the user’s intention as follows:

v_{NB} = \arg\max_{v_j \in \{\mathrm{click},\,\mathrm{save},\,\mathrm{close}\}} \big[ P(v_j)\, P(f_1 = \text{“learn”} \mid v_j)\, P(f_2 = \text{“research”} \mid v_j)\, P(f_3 = \text{“cognition”} \mid v_j) \big].    (3)

Among the features, “learn” and “research” are keyword features, and “cognition” is the concept feature of these two keywords. To calculate v_NB using the above expression, we require estimates for the probability terms P(v_j) and P(f_i = w_k | v_j) (here w_k denotes the kth feature in the feature dictionary). Suppose we have a trained model and Table 1 provides part of its probability distribution. Using the probabilities in Table 1, we calculate v_NB as follows:

P(click) P(f_1 = “learn” | click) P(f_2 = “research” | click) P(f_3 = “cognition” | click) = 0.5 × 0.5 × 0.6 × 0.7 = 0.105,
P(save) P(f_1 = “learn” | save) P(f_2 = “research” | save) P(f_3 = “cognition” | save) = 0.2 × 0.2 × 0.2 × 0.1 = 0.0008,
P(close) P(f_1 = “learn” | close) P(f_2 = “research” | close) P(f_3 = “cognition” | close) = 0.3 × 0.3 × 0.2 × 0.2 = 0.0036.    (4)


Based on the above values, we can predict that the user wants to follow, i.e., click, the hyperlink.
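The calculation in (4) is straightforward to reproduce in code. A sketch using the Table 1 probabilities:

# Sketch: Naive Bayes prediction for the "Learning & Research" example,
# with priors and conditionals taken from Table 1.
priors = {'click': 0.5, 'save': 0.2, 'close': 0.3}
cond = {
    'learn':     {'click': 0.5, 'save': 0.2, 'close': 0.3},
    'research':  {'click': 0.6, 'save': 0.2, 'close': 0.2},
    'cognition': {'click': 0.7, 'save': 0.1, 'close': 0.2},
}

features = ['learn', 'research', 'cognition']
scores = dict(priors)
for v in scores:
    for f in features:
        scores[v] *= cond[f][v]

print(scores)                       # click ~0.105, save 0.0008, close 0.0036
print(max(scores, key=scores.get))  # -> click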

4. Experiments and discussion

We developed a tool to automatically collect the user’s log data in the Internet Explorer (IE) environment. The following five user actions are recorded:

(1) browse (view a Web page),
(2) click (follow a hyperlink in a Web page),
(3) query (type query words in a search box),
(4) save (save pages or the objects in a page),
(5) close (close the IE browsing window).

Each action was stored in the XML format (cf. Figure 2). Five users’ data over a period of one month were logged, and approximately 15,000 pages and their corresponding actions were recorded. We randomly select some of the pages as training data and the rest as testing data according to a training ratio, and we repeat the split 10 times in each test to compute a t-test value [11]. From the training data, keyword and concept features were extracted and stored in their corresponding feature sets. Information Gain is used to select the most useful features and hence can dramatically reduce the feature set. In the training stage, the model is built incrementally from the feature set. For the testing data, each action record is predicted using the trained model, and the predicted intention is compared with the action actually performed by the user. In our experiments, the α and β thresholds mentioned in Section 3.2 are 0.005 and 0.6, respectively, and the Information Gain threshold is 0.02.

In the experiments, we used keyword features, concept features, and their combination to compare their performance for intention modeling. Because most related work did not predict user action intention, and the Office Assistant [10] only achieved a precision of 40% (while predicting many intentions), we compared our method with human prediction. We randomly chose 150 records from the log data and predicted the user’s intention manually: we read the training data to become familiar with the user’s behavior and then used the testing data for prediction. The comparison of these methods is shown in Figure 4. The keyword features are considered the baseline configuration of our approach, and the human prediction precision is the upper bound for benchmarking intention prediction approaches, because our goal is to help the user like a human assistant.

As Figure 4 shows, the precision using concept features is better than that using keyword features. The data can be analyzed with a paired-samples t-test with a one-tailed distribution; the P value is 0.0079 (< 0.05), which shows a significant improvement from the concept features. The association rules help improve performance because concept features provide more generalized descriptions of a Web page, as discussed in Sections 3.1 and 3.3. Furthermore, when the two feature types are combined, performance improves slightly compared with concept features alone; the P value of this t-test is 0.0097, which supports the conclusion. The performance of our algorithm is close to that of human prediction when the training ratio is 50%.
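The evaluation protocol above, repeated random splits compared with a one-tailed paired t-test [11], can be sketched as follows. Here train_and_score is a hypothetical stand-in for training the intention model on one split and returning its prediction precision.

import random
from scipy import stats

def compare_feature_sets(pages, train_and_score, train_ratio=0.5, runs=10):
    keyword_prec, concept_prec = [], []
    for _ in range(runs):
        random.shuffle(pages)
        cut = int(len(pages) * train_ratio)
        train, test = pages[:cut], pages[cut:]
        keyword_prec.append(train_and_score(train, test, features='keyword'))
        concept_prec.append(train_and_score(train, test, features='concept'))
    t, p_two_sided = stats.ttest_rel(concept_prec, keyword_prec)
    # One-tailed p-value, assuming the difference is in the expected direction.
    return p_two_sided / 2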


Figure 4. Comparison of different features for intention modeling.

Figure 5. Precision comparison among different intentions.

In the experiments, we found that the precision varies across intentions to some extent, as shown in Figure 5 (combined keyword and concept features were used). “Browse,” “Click,” and so on are the intentions predicted in this experiment. The “browse” action is easy to predict because of its large proportion (almost 60%) of the training data; for a fully trained Naïve Bayes model, this kind of intention is easy to predict. The “query” action is predicted with higher precision because the concept features give a well-generalized description of pages containing search results: for instance, when the user types a query into Google, the concept generalizations of the keywords in the returned pages are denser than the keywords themselves. However, the “close” action has lower prediction precision, indicating that it is probably unpredictable from text information alone; for example, the user may close a window he is interested in because something urgent comes up, or may simply close an uninteresting window.

The average precision is the weighted sum over the five intentions:

\mathrm{Pr}_{\mathrm{average}} = \sum_{i} P(\mathrm{Act}_i) \cdot \mathrm{Pr}_{\mathrm{Act}_i}.    (5)
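As a worked illustration of Equation (5), the action shares and per-intention precisions below are hypothetical placeholders (only the roughly 60% share of “browse” is taken from the text):

# Sketch: action-frequency-weighted average precision, per Equation (5).
action_share = {'browse': 0.60, 'click': 0.15, 'query': 0.10, 'save': 0.05, 'close': 0.10}
precision    = {'browse': 0.90, 'click': 0.80, 'query': 0.85, 'save': 0.75, 'close': 0.55}

pr_average = sum(action_share[a] * precision[a] for a in action_share)
print(round(pr_average, 3))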


Figure 6. Impact of IG on prediction precision.

An experiment on the impact of Information Gain (IG) was also conducted. Figure 6 shows the performance under different Information Gain thresholds. The IG algorithm gives better performance when the training data is small: in this situation, many keywords appear only once or twice and contribute little to the intention model, and IG is a good way to remove them. However, if the threshold is set too high, i.e., too few features are selected, the model performs poorly. When the training data is large enough, using IG makes little difference: in either case the model is fully trained, and noisy data does not influence the overall model. IG does, however, generate a much smaller dictionary and hence a more compact Naïve Bayes model.

We have also implemented and tested our algorithm on the prediction of Email attachment insertion, i.e., whether to insert an attachment while the user is typing an Email. After the intention is inferred, we further predict which file should be inserted, according to the user’s preference model. Another application is input-method switching: in a multi-lingual country, it is tedious to switch input methods when writing multi-language documents. With the help of our algorithms, we can monitor the user’s intention and switch automatically. What the user has typed serves as the semantic context, and the intention has two states: switch or not.

5. Conclusion

In this paper, we presented our methods for modeling and inferring the user’s intention via data mining. We defined two levels of intention (action intention and semantic intention) and differentiated the user’s intentions from the user’s preferences. Two types of linguistic features (keyword and concept features) are extracted for intention modeling, and association rules are used to mine the proper concept for each keyword. In our experiments, concept features are more effective than keyword features because they are generalizations of the keywords. The Naïve Bayes classifier serves as the learned intention model; it was chosen for its simplicity and speed, and we modified the algorithm to support incremental learning. Experiments have shown the usefulness and effectiveness of the developed algorithms. To take advantage of the vast body of work on user preferences, our future work will concentrate on using both the user’s intentions and the user’s preferences in Web applications.


References

[1] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo, “Fast discovery of association rules,” in Advances in Knowledge Discovery and Data Mining, AAAI Press, California, 1996, pp. 307–328.
[2] R. Armstrong, D. Freitag, T. Joachims, and T. Mitchell, “WebWatcher: A learning apprentice for the World Wide Web,” in Proceedings of the AAAI Spring Symposium on Information Gathering from Heterogeneous, Distributed Environments, 1995.
[3] M. Bauer, D. Dengler, and G. Paul, “Instructible information agents for Web mining,” in Proceedings of the 2000 International Conference on Intelligent User Interfaces, 2000, pp. 21–28.
[4] H. Chen, Y. Chung, and M. Ramsey, “A smart itsy bitsy spider for the Web,” Journal of the American Society for Information Science 49(7), 1998, 604–618.
[5] L. Chen and K. Sycara, “WebMate: A personal agent for browsing and searching,” in Proceedings of the Second International Conference on Autonomous Agents, 1998, pp. 132–139.
[6] Z. Chen, W. Liu, F. Zhang, M. Li, and H. J. Zhang, “Web mining for Web image retrieval,” Journal of the American Society for Information Science and Technology 52(10), 2001, 831–839.
[7] F. Crestani, M. Lalmas, C. J. van Rijsbergen, and I. Campbell, “‘Is this document relevant? ... Probably’: A survey of probabilistic models in information retrieval,” ACM Computing Surveys 30(4), 1998, 528–552.
[8] D. Fragoudis and S. D. Likothanassis, “Retriever: An agent for intelligent information recovery,” in Proceedings of the 20th International Conference on Information Systems, 1999, pp. 422–427.
[9] G. L. Somlo and A. E. Howe, “Incremental clustering for profile maintenance in information gathering Web agents,” in Proceedings of the Fifth International Conference on Autonomous Agents, 2001, pp. 262–269.
[10] E. Horvitz, J. Breese, D. Heckerman, D. Hovel, and K. Rommelse, “The Lumière project: Bayesian user modeling for inferring the goals and needs of software users,” in Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, 1998, pp. 256–265.
[11] R. E. Kirk, Statistics: An Introduction, Baylor University, 1999.
[12] F. Lin, W. Liu, Z. Chen, H. J. Zhang, and L. Tang, “User modeling for efficient use of multimedia files,” in Proceedings of the Second IEEE Pacific-Rim Conference on Multimedia, Beijing, October 2001, Lecture Notes in Computer Science, Vol. 2175, Springer, 2001, pp. 182–189.
[13] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. Miller, “Introduction to WordNet: An on-line lexical database,” International Journal of Lexicography 3(4), 1990, 235–244.
[14] T. Mitchell, Machine Learning, McGraw-Hill, New York, 1997, pp. 154–200.
[15] M. Pazzani, J. Muramatsu, and D. Billsus, “Syskill & Webert: Identifying interesting Web sites,” in Proceedings of the 13th National Conference on Artificial Intelligence (AAAI-96), 1996, pp. 54–61.
[16] M. F. Porter, “An algorithm for suffix stripping,” Program 14(3), 1980, 130–137.
[17] Y. W. Seo and B. T. Zhang, “A reinforcement learning agent for personalized information filtering,” in Proceedings of the 2000 International Conference on Intelligent User Interfaces, 2000, pp. 248–251.
[18] Y. Yang and J. Pedersen, “A comparative study on feature selection in text categorization,” in Proceedings of the 14th International Conference on Machine Learning, 1997.