Technology Trend Analysis Tool using Twitter as a Source

International Journal of Information Technology & Computer Science ( IJITCS ) (ISSN No : 2091-1610 ) Volume 6 : Issue on November / December , 2012

Technology Trend Analysis Tool using Twitter as a Source Yi-Chun Lin Department of Information Management National Taiwan University Taipei, Taiwan [email protected]

Ping-che Yang Institute for Information Industry Taipei, Taiwan [email protected]

Wen-Tai Hsieh Department of Information Management National Taiwan University Taipei, Taiwan [email protected]

Seng-cho T. Chou Department of Information Management National Taiwan University Taipei, Taiwan [email protected]

Abstract: With the rise of social networking, people have started to share information through many kinds of social media. Among them, Twitter is a valuable resource for data mining because of its prevalence and its adoption by celebrities. In this paper we present a novel system that collects tweets from technology celebrities. By applying data mining techniques to those tweets, the system performs trend analysis and thus provides some prediction of future trends. The results of the trend analysis are displayed on a website whose sections present top news, trending topics, active users, and top sources.

Keywords: trend analysis; content-based recommendation; news recommendation; social recommendation; Twitter; social media

I. INTRODUCTION

Trend analysis has long been an important topic in data mining, and with the spread of social networking, more and more trend analysis research focuses on social networks. Many celebrities use Twitter as a medium for sharing information, which has driven a wave of adoption of Twitter as a communication tool and makes trend analysis on Twitter a valuable topic for further study. In this paper we introduce a trend analysis tool that comprises three functions: trend analysis among Twitter celebrities, finding the most influential celebrities, and

This Paper is presented on : International Conference on Information Technology, E-Government and applications ( ICITEA 2012)


finding critical information resources. The tool focuses on analyzing tweets from those celebrities and thus provides a way to identify future technology trends.

II. RELATED WORK

A. Social Network Analysis

Social network analysis is a methodology developed mainly by sociologists and researchers in social psychology. It views social relationships in terms of network theory: each individual actor is seen as a node, and each relationship between nodes is represented as an edge. In [1], social network analysis is characterized by its assumption of the importance of relationships among interacting units; the relations defined by linkages among units are a fundamental component of network theories. Social network analysis has emerged as a key technique in modern sociology. It has also gained a significant following in anthropology, biology, communication studies, economics, geography, information science, organizational studies, social psychology, and sociolinguistics (see footnote 1). In 1954, Barnes [2] started to use the term systematically to denote patterns of ties. Afterwards, many scholars expanded the use of systematic social network analysis. With the growth of online social networking sites, online social network analysis has recently become a popular research topic.

B. Twitter

Twitter is an online social network used by millions of people around the world to stay connected with their friends, family, and colleagues through their computers and mobile phones [3]. The interface allows users to post short messages (up to 140 characters) that can be read by any other Twitter user. Users declare the people they are interested in following, in which case they are notified when that person posts a new message. A user who is followed by another user need not reciprocate by following back, which makes the links of the network directed.
Twitter is categorized as a micro-blogging service. Micro-blogging is a form of blogging that allows users to send brief text updates or other media such as photographs or audio clips. Among the many micro-blogging services, which include Twitter, Plurk, Tumblr, Emote.in, Squeelr, Jaiku, identi.ca, and others, Twitter contains an enormous number of text posts and grows quickly every day. Its audience also ranges from regular users to celebrities, company representatives, politicians [4], and even heads of state, and therefore provides a huge base for data mining. We choose Twitter as the source for trend analysis because of its popularity and data volume.

1 Wikipedia, http://en.wikipedia.org/wiki/Social_network_analysis

C. Social Network Analysis on Twitter

A social networking service is an online service that focuses on building social networks among people who are willing to share interests, activities, information, or real-life connections. Given their fast-growing popularity on the Internet, social networking platforms provide ample information for social network analysis. In [5], Ahn, Han, Kwak, Moon, and Jeong analyze whether online relationships and their growth patterns are the same as in real-life social networks by comparing the structures of three online social networking services: Cyworld, MySpace, and orkut.


Among all kinds of social networking services, Twitter, a micro-blogging service, is the second most popular social networking site [6]. With its distinctive limit of 140 characters per tweet, Twitter is well suited to social network analysis. Much research has focused on social network analysis on Twitter. Longueville, Smith, and Luraschi [7] focus on how Twitter can be used as a source of spatio-temporal information. Sakaki, Okazaki, and Matsuo [8] investigate the real-time nature of Twitter and propose an event notification system that monitors tweets and delivers notifications promptly. Pak and Paroubek [9] used Twitter as a source for opinion mining and sentiment analysis tasks. Phelan, McCarthy, and Smyth [10] used Twitter to recommend real-time topical news. Mathioudakis and Koudas [11] focus on trend detection over the Twitter stream. Compared with Mathioudakis and Koudas's TwitterMonitor, our system also focuses on trend detection but goes further, using its trend analysis algorithm to find opinion leaders and the most influential media. The system therefore provides a more complete view of a trend and has the potential to provide information about how the trend forms.

III. SYSTEM FRAMEWORK

We present a novel system that collects tweets from technology celebrities and uses data mining techniques to detect hot topics, thus providing a view of technology trends. In our framework, the trend analysis tool has two layers: the data processing layer and the information display layer. The data processing layer deals with data collection and data mining, while the information display layer uses a website to present the results of data mining. More details are given in the following sections.

A. Data Collection and Data Mining

At present, the list of technology celebrities is set up manually. We then crawl the websites of those celebrities to collect their tweets.
All data collected by the crawler is stored in a database for further analysis. During the analysis process, user properties, social connections, and message properties are also taken into consideration. By combining social semantic analysis with natural language processing, tweets about daily gossip or unrelated content are discarded, so that relevant content is accurately extracted.
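The store-and-discard step can be illustrated with a small sketch. It is not the authors' pipeline: the paper combines social semantic analysis with natural language processing, whereas the sketch below swaps in a plain keyword match as a stand-in. The keyword list, the table layout, and the function names `is_relevant` and `store_relevant` are all hypothetical.

```python
import re
import sqlite3

# Hypothetical keyword list; the paper does not publish its vocabulary.
TECH_KEYWORDS = {"android", "iphone", "cloud", "startup", "google", "app"}


def tokenize(text):
    """Lowercase a tweet and split it into word tokens, keeping #hashtags whole."""
    return re.findall(r"#?\w+", text.lower())


def is_relevant(text, keywords=TECH_KEYWORDS):
    """Return True when the tweet mentions at least one technology keyword.

    This keyword match is an assumed simplification of the semantic/NLP
    filtering described in the text.
    """
    return any(tok.lstrip("#") in keywords for tok in tokenize(text))


def store_relevant(conn, tweets):
    """Insert only the relevant (user, posted_at, text) rows; return how many were kept."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (user TEXT, posted_at TEXT, text TEXT)"
    )
    kept = [t for t in tweets if is_relevant(t[2])]
    conn.executemany("INSERT INTO tweets VALUES (?, ?, ?)", kept)
    conn.commit()
    return len(kept)


# Made-up sample data standing in for crawler output.
sample_tweets = [
    ("alice", "2012-07-01", "New Android release looks promising #cloud"),
    ("bob", "2012-07-01", "Had a great lunch today"),
    ("carol", "2012-07-02", "Our startup just shipped its first app"),
]
conn = sqlite3.connect(":memory:")
kept_count = store_relevant(conn, sample_tweets)
```

In this sample, the tweet about lunch is discarded and the other two are stored. An in-memory SQLite database is used only to keep the sketch self-contained; the paper does not say which database the system uses.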

Figure 1. Overview of Data Collection and Data Mining Process

B. System Display

Our system provides a website that visualizes the results of mining the celebrities' tweets. Four sections of mining results are presented: the top news section, the trending topics section, the active users section, and the top sources section. The top news section is shown on the FrontPage, with the original sources and the referring Twitter users alongside each item. The system also provides a filtering function for top news, so users can easily find what they want to read by time period, social activity, or celebrity list type.


The second section, trending topics, provides a view centered on a particular keyword. Users may either enter a keyword to search or simply click on the trending topic box, which contains hot keywords and hashtags. For each keyword, we analyze its trend in terms of degree of popularity, related keywords, and the most closely related celebrities on Twitter. Related news articles are also displayed on this page. The third and fourth sections, active users and top sources, present the popularity of celebrities and source media by calculating the weight of their contribution to the trend analysis.
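The per-keyword statistics described above (popularity over time, related keywords, most related celebrities) can be sketched with simple counting. This is an illustrative assumption, not the paper's algorithm: related keywords are approximated by co-occurrence within the same tweet, and the function name `keyword_profile` and the `(user, day, text)` tuple layout are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into word tokens, keeping #hashtags whole."""
    return re.findall(r"#?\w+", text.lower())


def top_items(counter, n):
    """Return the n highest counts, breaking ties alphabetically for stable output."""
    return sorted(counter.items(), key=lambda item: (-item[1], item[0]))[:n]


def keyword_profile(tweets, keyword, top_n=3):
    """Summarize one keyword over an iterable of (user, day, text) tuples."""
    daily_counts = Counter()   # popularity: number of mentioning tweets per day
    related = Counter()        # terms that co-occur with the keyword
    users = Counter()          # who mentions the keyword most often
    for user, day, text in tweets:
        tokens = set(tokenize(text))
        if keyword not in tokens:
            continue
        daily_counts[day] += 1
        users[user] += 1
        related.update(tokens - {keyword})
    return {
        "trend": dict(sorted(daily_counts.items())),
        "related": top_items(related, top_n),
        "users": top_items(users, top_n),
    }


# Made-up sample data.
tweets = [
    ("alice", "2012-07-01", "digg relaunch announced today"),
    ("bob", "2012-07-01", "the digg relaunch looks clean"),
    ("alice", "2012-07-02", "digg traffic is up after relaunch"),
    ("carol", "2012-07-02", "lunch was great"),
]
profile = keyword_profile(tweets, "digg")
```

Here `profile["trend"]` gives the daily mention counts that a trend chart could plot, while `profile["related"]` and `profile["users"]` rank co-occurring terms and mentioning users. A production system would likely remove stop words before counting related terms.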

Figure 2. System Framework

IV. RUN-TIME EXECUTION

Figure 3. A Snapshot of the FrontPage including Top News, Trending Topics, Active Users, and Top Sources

From all the data crawled from celebrities on Twitter, we extract hot topics using TF-IDF together with fixed keywords associated with the technology domain. Our trend analysis tool uses a web view to visualize the mining results. All results are presented in four sections: the top news section, the trending topics section, the active users section, and the top sources section. The top news section presents several popular news items on the FrontPage along with the Twitter users and source media that referred to each topic. Fig. 3 shows a snapshot of the FrontPage. All Twitter usernames and source media are clickable, providing a way to reach the original tweets and source pages. Fig. 4 presents a detailed view of top news.
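As a rough illustration of TF-IDF-based hot topic extraction, the sketch below treats all tweets from one day as a single document and scores that day's terms against the other days, so that terms frequent today but absent on other days rank highest. The paper does not specify how documents are defined or how the fixed keywords enter the computation; the daily grouping, the optional `fixed_keywords` restriction, and the function names are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into word tokens, keeping #hashtags whole."""
    return re.findall(r"#?\w+", text.lower())


def tfidf_for_day(daily_tweets, day):
    """Score each term of one day's tweets with TF-IDF, one document per day."""
    docs = {
        d: [tok for tweet in tweets for tok in tokenize(tweet)]
        for d, tweets in daily_tweets.items()
    }
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per day
    tokens = docs[day]
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {
        term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
        for term, count in counts.items()
    }


def hot_topics(daily_tweets, day, fixed_keywords=None, top_n=5):
    """Return the top-scoring terms, optionally restricted to a fixed keyword set."""
    scores = tfidf_for_day(daily_tweets, day)
    if fixed_keywords is not None:
        scores = {t: s for t, s in scores.items() if t in fixed_keywords}
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


# Made-up sample data.
daily = {
    "2012-07-01": ["the new iphone rumor", "the weather is nice"],
    "2012-07-02": ["the cloud is the future"],
    "2012-07-03": ["the nexus tablet is out", "nexus tablet review", "the nexus launch"],
}
top_two = hot_topics(daily, "2012-07-03", top_n=2)
domain_only = hot_topics(daily, "2012-07-03", fixed_keywords={"nexus", "iphone", "cloud"})
```

Words that occur on every day, such as "the", receive an IDF of zero and drop out of the ranking without a stop-word list. Restricting candidates to a fixed vocabulary is only one possible reading of how the fixed keywords are used; they could equally serve to boost scores or to pre-filter tweets.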


Figure 4. A Detail View of Top News

We use keywords and hashtags to show trending topics. Each keyword or hashtag has its own statistics: keyword trend, relation to other keywords, mention counts by Twitter users, and related news. By using the filtering or search function, users can easily access relevant information. The filtering function provides several filter types: a time filter by day, week, month, or year; a social activity filter by the counts of tweets, likes, and shares; and a list type filter by the type of celebrity. Fig. 5 presents an example of the trend of the keyword "digg". For the active users and top sources sections, we rate celebrities and source media by counting their presence. Fig. 6 shows the result for top sources.
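The filters and the presence-based rating described above can be sketched as follows. The record fields, the window lengths (a month taken as 30 days, a year as 365), and the names `Tweet`, `apply_filters`, and `rank_by_presence` are hypothetical choices for this example rather than details from the system.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Tweet:
    user: str        # celebrity who posted the tweet
    list_type: str   # hypothetical celebrity category, e.g. "founder"
    posted_on: date
    source: str      # source media the tweet links to
    shares: int      # one possible social activity count


# Assumed window lengths for the time filter.
WINDOW_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}


def apply_filters(tweets, today, window="week", min_shares=0, list_type=None):
    """Keep tweets inside the time window that meet the activity and list-type filters."""
    cutoff = today - timedelta(days=WINDOW_DAYS[window])
    return [
        t for t in tweets
        if cutoff < t.posted_on <= today
        and t.shares >= min_shares
        and (list_type is None or t.list_type == list_type)
    ]


def rank_by_presence(tweets, field):
    """Rank users or sources by how often they appear, ties broken alphabetically."""
    counts = Counter(getattr(t, field) for t in tweets)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample data.
today = date(2012, 7, 13)
tweets = [
    Tweet("alice", "founder", date(2012, 7, 12), "techcrunch.com", 10),
    Tweet("bob", "journalist", date(2012, 7, 11), "techcrunch.com", 3),
    Tweet("alice", "founder", date(2012, 7, 10), "wired.com", 0),
    Tweet("carol", "founder", date(2012, 5, 1), "wired.com", 50),
]
recent = apply_filters(tweets, today, window="week")
active_users = rank_by_presence(recent, "user")
top_sources = rank_by_presence(recent, "source")
popular_founders = apply_filters(
    tweets, today, window="year", min_shares=5, list_type="founder"
)
```

With this sample, the weekly window keeps three tweets, `active_users` ranks alice ahead of bob, and `top_sources` ranks techcrunch.com ahead of wired.com. Counting presence is the simplest rating; a weighted variant would replace the counter increments with per-source weights.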

V. CONCLUSION

In this paper, we present a trend analysis system for monitoring trends in the technology domain on Twitter. The system not only collects data but also supports further information exploration through data mining and a friendly user interface. It offers four main functions: top news, trending topics, active users, and top sources. Top news and trending topics are two different ways of displaying our mining results; users can also use filters, the search function, or top keywords and hashtags to interact with the results. Presenting active users and top sources not only gives insight for predicting future trends but also provides valuable information for detecting opinion leaders among celebrities and source media. By analyzing celebrities' discussions and displaying the mining results on a user-friendly, visualized website, our system provides a new way for users to discover the hot topics of the future.

VI. FUTURE WORK

In this paper, we demonstrate a trend analysis tool for technology celebrities on Twitter. The same structure can also be applied to other domains and other social networking sites. The mining results reveal the contribution of each celebrity and each source medium. In the future, weights can be assigned to all sources so that crucial sources count more heavily, producing better mining results. The framework implemented in this system also has promising potential for developing further trend analysis algorithms. For instance, by combining natural language processing with information retrieval, more precise and crucial details can be discovered, which provides more capability for developing the next generation of trend analysis tools.

ACKNOWLEDGMENT


This study is conducted under the "Social Intelligence Analysis Service Platform" project of the Institute for Information Industry, which is subsidized by the Ministry of Economic Affairs of the Republic of China.

REFERENCES

[1] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications. Cambridge; New York: Cambridge University Press, 1994.
[2] J. A. Barnes, "Class and Committees in a Norwegian Island Parish," Human Relations, vol. 7, pp. 39-58, 1954.
[3] S. Milstein, A. Chowdhury, G. Hochmuth, B. Lorica, and R. Magoulas. (2008). Twitter and the micromessaging revolution: Communication, connections, and immediacy. 140 characters at a time.
[4] A. Cheng, M. Evans, and N. Koudas. (2009, July 13). Inside the Political Twittersphere. Available: http://www.sysomos.com/insidetwitter/politics/
[5] Y.-Y. Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong, "Analysis of topological characteristics of huge online social networking services," in Proceedings of the 16th International Conference on World Wide Web. Banff, Alberta, Canada: ACM, 2007, pp. 835-844.
[6] e. Articles. (2012, July 13). Top 15 Most Popular Social Networking Sites | July 2012. Available: http://www.ebizmba.com/articles/social-networking-websites
[7] B. D. Longueville, R. S. Smith, and G. Luraschi, "'OMG, from here, I can see the flames!': a use case of mining location based social networks to acquire spatio-temporal data on forest fires," presented at the 2009 International Workshop on Location Based Social Networks, Seattle, Washington, 2009.
[8] T. Sakaki, M. Okazaki, and Y. Matsuo, "Earthquake shakes Twitter users: real-time event detection by social sensors," M. Rappa, P. Jones, J. Freire, and S. Chakrabarti, Eds. ACM, 2010, pp. 851-860.
[9] A. Pak and P. Paroubek, "Twitter as a Corpus for Sentiment Analysis and Opinion Mining," in Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10). Valletta, Malta: European Language Resources Association (ELRA), 2010.
[10] O. Phelan, K. McCarthy, and B. Smyth, "Using twitter to recommend real-time topical news," presented at the Third ACM Conference on Recommender Systems, New York, New York, USA, 2009.
[11] M. Mathioudakis and N. Koudas, "TwitterMonitor: trend detection over the twitter stream," in Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. Indianapolis, Indiana, USA: ACM, 2010, pp. 1155-1158.
