Qiaozhu Mei
Department of Computer Science
University of Illinois at Urbana-Champaign
201 N. Goodwin Ave.
Urbana, IL 61801, U.S.A.

Office Phone: (217) 244-4026
Mobile Phone: (217) 848-9394
Office Fax: (217) 265-6494
Email: [email protected]
http://sifaka.cs.uiuc.edu/~qmei2/

Research Interests

Text information management in general. Specific interests include text data mining, information retrieval, machine learning, natural language processing, databases, and bioinformatics.

Education

May 2009 (expected)

University of Illinois at Urbana-Champaign, Urbana, Illinois
Ph.D. in Computer Science
Advisor: ChengXiang Zhai

Aug. 2003 – May 2004

Vanderbilt University, Nashville, Tennessee
Graduate Student in Computer Science

July 2003

Peking University, Beijing, China
B.S. in Computer Science
Advisors: Junfeng Hu, Xiaoming Li

Employment

May 2008 – Aug. 2008

Yahoo! Research, Santa Clara, California
Research Intern. Mentors: Andrew Tomkins, Ravi Kumar

May 2007 – Aug. 2007

Microsoft Research, Redmond, Washington
Research Intern. Mentor: Kenneth Church

May 2006 – Aug. 2006

Microsoft Research, Redmond, Washington
Research Intern. Mentor: Kenneth Church

Aug. 2004 – May 2007

Department of Computer Science, Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
Research Assistant

Aug. 2003 – May 2004

Department of EECS, Vanderbilt University, Nashville, Tennessee
Teaching Assistant

Nov. 2001 – Jul. 2003

Department of Computer Science, Peking University, Beijing, China
Research Assistant, Teaching Assistant (Sep. 2002 – Jan. 2003)

Awards

2007  Yahoo! PhD Student Fellowship (1 of 5 recipients nationwide)
2007  ACM KDD 2007 Best Student Paper Runner-Up Award (out of over 500 submissions)
2007  C. L. and Jane Liu Award, University of Illinois at Urbana-Champaign
2006  ACM KDD 2006 Best Student Paper Runner-Up Award (out of 531 submissions)
2004  Roy J. Carver Fellowship, University of Illinois at Urbana-Champaign
2003  University Graduate Fellowship, Vanderbilt University
2002  Distinctive Academic Performance Award, Peking University
2002  Kodak Scholarship, Peking University
2001  Aode Scholarship, Peking University
2000  General Electric Scholarship, Peking University

Teaching Experience

Spring 2008

University of Illinois, Urbana, Illinois
Two guest lectures for “Introduction to Text Information Systems (CS410).” Attendees were graduate and senior undergraduate students.

Spring 2004

Vanderbilt University, Nashville, Tennessee
Teaching Assistant for “Program Design and Data Structures (CS201).” Attendees were undergraduate freshmen and sophomores. Duties: office hours, review sessions, grading, lab supervision.

Fall 2003

Vanderbilt University, Nashville, Tennessee
Teaching Assistant for “Principles of Operating Systems (CS281).” Attendees were undergraduate students. Duties: office hours, review sessions, grading, lab supervision.

Fall 2002

Peking University, Beijing, China
Teaching Assistant for “Introduction to Computing.” Attendees were non-CS-major undergraduate students. Duties: lab supervision, grading.

Summer 2001

China Computer Federation and Peking University
Instructor for “Introduction to Computer Skills,” an 8-day summer popularization course on computer techniques in Zhongyang, Shanxi, China. Attendees were 120 teachers from local high schools.

Dissertation Research

Title: Contextual Text Mining
Advisor: ChengXiang Zhai

Many problems in text information management are concerned with knowledge discovery from text data, which involves rich context information. Some context information is explicit, such as time, geographic locations, users, and social networks. Some context information is implicit, such as topics/themes, relations, and sentiments. I designed a novel and general framework and methodology to facilitate the discovery and comparative analysis of various types of topic patterns over different contexts, a general problem we refer to as contextual text mining. The framework includes a family of probabilistic topic models with context variables, a component to regularize topic models with context structures, and a module to generate semantic labels for the topic models. Special cases of this framework can solve many research problems, such as modeling topic evolution and topic trends, extracting spatiotemporal topic patterns, modeling the mixture of topics and sentiments, generating gene summaries from biology literature, and discovering topical communities and topic maps in social networks. This work has been broadly applied to news articles, weblogs, search engine logs, scientific literature, emails, and customer reviews.

Other Research Experience

Aug. 2004 – Present

University of Illinois at Urbana-Champaign, Urbana, Illinois
Graduate Research Assistant. Advisor: ChengXiang Zhai
I worked on formal information retrieval models, including Poisson language models and novel smoothing techniques for language models. I also applied advanced IR techniques to generate impact-based summaries for scientific literature, semi-structured summaries for genes, and semantic annotations for frequent patterns.

Aug. 2004 – May 2007

Institute for Genomic Biology, University of Illinois, Urbana, Illinois
Graduate Research Assistant. Advisors: ChengXiang Zhai, Bruce Schatz
As a research assistant on the BeeSpace project, I applied advanced text mining techniques to biology literature. Our system supports user-oriented discovery and summarization of biological entities, interactions, and other patterns. My work included concept normalization, theme extraction, and semantic analysis in biology literature.

May 2008 – Aug. 2008

Yahoo! Research, Santa Clara, California
Research Intern. Mentors: Andrew Tomkins, Ravi Kumar
I worked on a formal framework for the study of sequences of search activities. This analysis framework supports a general data model, a vocabulary for discussing types of features, models, and tasks, and straightforward feature re-use across problems. It provides not only realistic baselines for many sequence analysis problems, but also a simple mechanism to develop baselines for unexplored sequence analysis tasks.

May 2006 – Aug. 2006, May 2007 – Aug. 2007

Microsoft Research, Redmond, Washington
Research Intern, TMSN group. Mentor: Kenneth Church
I contributed extensively to the processing and exploration of very large-scale log data (18 months) from the Live search engine. We studied the entropy of query logs, based on which we gave practical answers to questions such as how large the web is, how hard search is, and how much personalization could help. I also developed novel solutions to search problems, such as personalization with backoff and query suggestion using hitting time.

Sep. 2002 – June 2003

Peking University, Beijing, China
Undergraduate Research. Advisor: Xiaoming Li
I worked on text mining problems with the web data collected by the Tian Wang search engine. The goal was to provide an analysis platform for interdisciplinary research problems and for researchers such as sociologists. This has become a long-term, ongoing collaborative project with Peking University.

Nov. 2001 – June 2003

Institute of Computational Linguistics, Peking University, Beijing, China
Undergraduate Research. Advisor: Junfeng Hu
I conducted language processing and knowledge discovery on ancient Chinese poetry, a special type of text with restricted linguistic rules and rich metaphorical patterns. Collaborating with linguists, I studied the problems of word/collocation extraction and metaphor pattern discovery, and designed a digital museum system based on the extracted patterns.

Publications

Selected Refereed Publications

1. Qiaozhu Mei, Duo Zhang, and ChengXiang Zhai. A general optimization framework for smoothing language models on graph structures. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 08), pages 611-618, 2008. (Full paper, 17% acceptance)
2. Qiaozhu Mei, Deng Cai, Duo Zhang, and ChengXiang Zhai. Topic Modeling with Network Regularization. In Proceedings of the World Wide Web Conference 2008 (WWW 08), pages 101-110, 2008. (Full paper, 12% acceptance)
3. Xu Ling, Qiaozhu Mei, ChengXiang Zhai, and Bruce R. Schatz. Mining multi-faceted overviews of arbitrary topics in a text collection. In Proceedings of the 2008 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 08), pages 497-505, 2008. (Short paper, 20% acceptance)
4. Qiaozhu Mei and ChengXiang Zhai. Generating Impact-Based Summaries for Scientific Literature. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-08:HLT), pages 816-824, 2008. (Full paper, 25% acceptance)
5. Qiaozhu Mei and Kenneth Church. Entropy of Search Logs: How Hard is Search? With Personalization? With Backoff? In Proceedings of the First ACM International Conference on Web Search and Data Mining (WSDM 08), pages 45-54, 2008. (Full paper, 16% acceptance)
6. Qiaozhu Mei, Dengyong Zhou, and Kenneth Church. Query Suggestion using Hitting Time. In Proceedings of the 17th ACM International Conference on Information and Knowledge Management (CIKM 08), pages 469-478, 2008. (Full paper, 17% acceptance)
7. Deng Cai, Qiaozhu Mei, Jiawei Han, and ChengXiang Zhai. Modeling Hidden Topics on Document Manifold. In Proceedings of the 17th ACM International Conference on Information and Knowledge Management (CIKM 08), pages 911-920, 2008. (Full paper, 17% acceptance)
8. Qiaozhu Mei, Xuehua Shen, and ChengXiang Zhai. Automatic Labeling of Multinomial Topic Models. In Proceedings of the 2007 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 07), pages 490-499, 2007. (Full paper, 8% acceptance) Best Student Paper Runner-up Award
9. Qiaozhu Mei, Hui Fang, and ChengXiang Zhai. A Study of Poisson Query Generation Model for Information Retrieval. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 07), pages 319-326, 2007. (Full paper, 17% acceptance)

10. Qiaozhu Mei, Dong Xin, Hong Cheng, Jiawei Han, and ChengXiang Zhai. Semantic Annotation of Frequent Patterns. ACM Transactions on Knowledge Discovery from Data, 1(3), article 11, 2007.
11. Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, and ChengXiang Zhai. Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs. In Proceedings of the World Wide Web Conference 2007 (WWW 07), pages 171-180, 2007. (Full paper, 14.7% acceptance)
12. Xu Ling, Jing Jiang, Xin He, Qiaozhu Mei, ChengXiang Zhai, and Bruce Schatz. Generating Semi-Structured Gene Summaries from Biomedical Literature. Information Processing and Management 43, pages 1777-1791, 2007.
13. Qiaozhu Mei, Dong Xin, Hong Cheng, Jiawei Han, and ChengXiang Zhai. Generating Semantic Annotations for Frequent Patterns with Context Analysis. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 06), pages 337-346, 2006. (Full paper, 11% acceptance) Best Student Paper Runner-up Award
14. Qiaozhu Mei and ChengXiang Zhai. A Mixture Model for Contextual Text Mining. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 06), pages 649-655, 2006. (Short paper, 23% acceptance)
15. Dong Xin, Xuehua Shen, Qiaozhu Mei, and Jiawei Han. Discovering Interesting Patterns Through User’s Interactive Feedback. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 06), pages 773-778, 2006. (Short paper, 23% acceptance)
16. Qiaozhu Mei, Chao Liu, Hang Su, and ChengXiang Zhai. A Probabilistic Approach to Spatiotemporal Theme Pattern Mining on Weblogs. In Proceedings of the 15th World Wide Web Conference (WWW 06), pages 533-542, 2006. (Full paper, 11% acceptance)
17. Tao Tao, Xuanhui Wang, Qiaozhu Mei, and ChengXiang Zhai. Language Model Information Retrieval with Document Expansion. In Proceedings of the Human Language Technology Conference/the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 06), pages 407-414, 2006. (Full paper, 25% acceptance)
18. Xu Ling, Jing Jiang, Xin He, Qiaozhu Mei, ChengXiang Zhai, and Bruce Schatz. Automatically Generating Gene Summaries from Biomedical Literature. In Proceedings of Pacific Symposium on Biocomputing 2006 (PSB 06), pages 40-51, 2006.
19. Qiaozhu Mei and ChengXiang Zhai. Discovering Evolutionary Theme Patterns from Text - An Exploration of Temporal Text Mining. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 05), pages 198-207, 2005. (Full paper, 12% acceptance)

Other Refereed Publications

1. Mu Xia, ChengXiang Zhai, Bin Tan, Yue Lu, and Qiaozhu Mei. “You Are What You Write” – Understanding User Online Behavior through Text Mining. In CHI 2009 Workshop on Social Mediating Technologies, to appear.
2. Qiaozhu Mei and Yi Zhang. Automatic Web Tagging and Person Tagging Using Language Models. In Proceedings of the 4th International Conference of Advanced Data Mining and Applications (ADMA 08), pages 741-748, 2008.
3. Bei Yu, Qiaozhu Mei, and ChengXiang Zhai. English Usage Comparison between Native and non-Native English Speakers in Academic Writing. In Proceedings of the 17th Joint International Conference of the Association for Computers and the Humanities and the Association for Literary and Linguistic Computing, 2005. (Top Humanities Computing conference)
4. Tao Tao, Xuanhui Wang, Qiaozhu Mei, and ChengXiang Zhai. Accurate Language Model Estimation with Document Expansion. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM 05), pages 273-274, 2005. (Poster)
5. Hang Su and Qiaozhu Mei. Template Extraction from Candidate Template Set Generation: A Structure and Content Approach. In Proceedings of the 43rd ACM Annual Southeast Regional Conference, volume 2, pages 211-216, 2005.
6. Qiaozhu Mei and Junfeng Hu. From Text to Exhibitions: A New Approach for E-Learning on Language and Literature based on Text Mining. In COLING’2004 Workshop on E-Learning for Computational Linguistics and Computational Linguistics for E-Learning, 2004.
7. Qiaozhu Mei. A Knowledge Processing Oriented Life Cycle Study from a Digital Museum System. In Proceedings of the 42nd ACM Annual Southeast Regional Conference, pages 116-121, 2004.
8. Qiaozhu Mei and Junrong Shen. A Knowledge Flow Driven E-Learning Architecture Design: What is its Stratification and How is it Personalized. In Proceedings of the International Conference on Computers in Education, pages 1307-1308, 2002. (Poster Paper)

Patents

• Kenneth Church and Qiaozhu Mei. Personalized information retrieval search with backoff. US Patent Number 11529134. Filed Sep 28, 2006.

Unrefereed Publications

1. Qiaozhu Mei, Jing Jiang, Hang Su, and ChengXiang Zhai. Search and Tagging: Two Sides of the Same Coin? Department of Computer Science Technical Report No. 2919, University of Illinois at Urbana-Champaign (UIUCDCS-R-2007-2919), 2007.

2. Qiaozhu Mei. A Digital Museum of Ancient Chinese Poetry Art: Design, Realization and Related Computational Linguistics Research. Literature and Information Science and Technology, pages 219-262, Taiwan Tsinghua University Press, 2004. ISBN: 957-28986-8-x. (A shorter version of the Bachelor Thesis. In Chinese.)
3. Junfeng Hu, Qiaozhu Mei, and Shiwen Yu. Computer Assisted Archaeology Research on Tang and Song Poetry. Literature and Information Technology Workshop, City University of Hong Kong, 2003. (In Chinese)
4. Qiaozhu Mei. A Digital Museum of Ancient Chinese Poetry Art: Its Design, Realization and Related Researches on Computational Linguistics. Bachelor Thesis, Department of Computer Science, Peking University, 2003. (In Chinese)

Professional Activities and Service

Program committee member, the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2009)
Program committee member, the North American Chapter of the Association for Computational Linguistics - Human Language Technologies 2009 conference (NAACL-HLT 2009)
Program committee member, the 18th International Conference on World Wide Web (WWW 2009)
Program committee member, the 31st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2008)
Program committee member, the 17th International Conference on World Wide Web (WWW 2008)
Program committee member, the ACM 17th Conference on Information and Knowledge Management (CIKM 2008)
Program committee member, the IADIS European Conference on Data Mining 2008 (ECDM 2008)
Program committee member, the ACL 2008 Student Research Workshop
Program committee member, the IADIS European Conference on Data Mining 2007 (ECDM 2007)
Reviewer for major journals of information retrieval and data mining (e.g., ACM Transactions on Information Systems, ACM Transactions on Knowledge Discovery from Data, ACM Transactions on the Web, IEEE Transactions on Knowledge and Data Engineering, International Journal of Data Warehousing and Mining)
Coordinator of the Database and Information Systems (Yahoo!-DAIS) Seminar, 2007-2008

Talks

Invited Talk at the Search Labs, Microsoft Research, August 2008. Title: Context Analysis in Text Mining and Search.
Invited Talk at the Department of Computer Science, Peking University, April 2008. Title: Contextual Text Mining with Probabilistic Models.
Invited Talk at the Department of Computer Science, Tsinghua University, April 2008. Title: Contextual Text Mining with Probabilistic Models.
Invited Poster Presentation at the Annual Department of Homeland Security University Network Summit, Washington D.C., March 2008. Title: Contextual Text Mining with Probabilistic Models.
Invited Talk at the IRKM Lab, University of California, Santa Cruz, August 2007. Title: Topic-Sentiment Mixture: Modeling Facets and Opinions in Weblogs.
Seminar Talks at the Yahoo!-DAIS seminar, University of Illinois at Urbana-Champaign, August 2006, October 2006, April 2007, September 2007, April 2008, September 2008.
Summer Intern Talks at Microsoft Research, June 2006, August 2006, August 2007.
Conference Talks at KDD’05, KDD’06, WWW’07, SIGIR’07, KDD’07, WSDM’08, ACL’08, WWW’08, SIGIR’08, KDD’08, CIKM’08.

References

Available upon request.

December, 2008