IEEE Big Data 2015
Inferring Crowd-Sourced Venues for Tweets
Bokai Cao∗, Francine Chen†, Dhiraj Joshi† and Philip S. Yu∗‡
∗ University of Illinois at Chicago
† FX Palo Alto Laboratory
‡ Tsinghua University
Motivations
• Over 500 million tweets are generated per day (http://www.internetlivestats.com/twitter-statistics/).
• Less than 1% of tweets are geotagged [Wang et al. 2014].
• Inferring the location of non-geotagged tweets can facilitate better understanding of users’ geographic context, …
• Tweets are usually short and informal, without clear location signals, especially for chain stores.
• Most existing studies infer the location of a user or a tweet at a coarse level of granularity.
Problem Definition
• Venue Inference for Tweets (VIT): Given a non-geotagged tweet t_i and a candidate venue v_p, estimate the tweet's probability of being posted at the venue, P(y(t_i, v_p) = 1), such that the venue with the maximum probability, v_est(t_i), is the tweet’s actual venue, v_act(t_i).
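The selection step in the definition above can be sketched in a few lines of Python. This is only an illustration of "pick the candidate with the maximum probability": the function name `infer_venue`, the `prob` callback, and the toy scores are assumptions, not part of the presented system, and the probability model itself is left abstract.

```python
def infer_venue(tweet, candidates, prob):
    """Return the candidate venue with the highest estimated P(y(tweet, venue) = 1).

    `prob` is a hypothetical scoring function (tweet, venue) -> float; the
    slides do not specify how it is implemented.
    """
    best_venue, best_p = None, float("-inf")
    for venue in candidates:
        p = prob(tweet, venue)
        if p > best_p:
            best_venue, best_p = venue, p
    return best_venue


# Made-up probabilities for one tweet and three candidate venues.
toy_scores = {
    ("t1", "cafe"): 0.2,
    ("t1", "bookstore"): 0.7,
    ("t1", "gym"): 0.1,
}
v_est = infer_venue("t1", ["cafe", "bookstore", "gym"],
                    lambda t, v: toy_scores[(t, v)])
```

With these toy scores, `v_est` is the venue whose score is largest; an empty candidate list yields `None`.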
Twitter + Foursquare
[Figure: Twitter entities (users, tweets) alongside Foursquare entities (users, venues, tips); the unknown links between the two are marked with "?".]
• Anchor links
Constructing the Heterogeneous Information Network
• Tweets from Jun. 2013 to Apr. 2014
• Tips from Feb. 2009 to Jun. 2014
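One way to picture such a network is as a graph whose nodes and edges both carry types. The sketch below is a minimal, assumed representation: the class `HIN`, the node identifiers, and the relation names (`posts`, `anchor`, `checks_in_at`, `mayor_of`, `writes`, `about`) are hypothetical labels chosen for illustration, not the schema used in the presented work.

```python
from collections import defaultdict


class HIN:
    """Tiny heterogeneous information network: typed nodes, typed edges."""

    def __init__(self):
        self.node_type = {}                               # node id -> type name
        self.adj = defaultdict(lambda: defaultdict(set))  # node -> relation -> neighbours

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, src, relation, dst):
        # Store both directions so paths can be walked either way.
        self.adj[src][relation].add(dst)
        self.adj[dst]["inv_" + relation].add(src)

    def neighbors(self, node, relation):
        if node not in self.adj:
            return set()
        return set(self.adj[node].get(relation, ()))


hin = HIN()
for node, ntype in [
    ("tw:alice", "twitter_user"), ("tweet:1", "tweet"),
    ("fs:alice", "foursquare_user"), ("venue:cafe", "venue"), ("tip:9", "tip"),
]:
    hin.add_node(node, ntype)

hin.add_edge("tw:alice", "posts", "tweet:1")
hin.add_edge("tw:alice", "anchor", "fs:alice")   # same person on both services
hin.add_edge("fs:alice", "checks_in_at", "venue:cafe")
hin.add_edge("fs:alice", "mayor_of", "venue:cafe")
hin.add_edge("fs:alice", "writes", "tip:9")
hin.add_edge("tip:9", "about", "venue:cafe")
```

Storing an inverse edge for every relation is a design choice made here so that paths can start from a tweet or from a venue; it is not something the slides prescribe.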
Exploiting the Heterogeneous Information Network
• EgoPath
  1. Check in at
  2. Being a mayor of
  3. Writing a tip about
• FriendPath: homophily principle in social science
• InterestPath: users tend to tweet at similar venues (based on their interests)
• TextPath: sharing common content words via tips
• The path counts are used as elements of the feature vectors.
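To make "path counts as feature elements" concrete, the sketch below counts how many paths of a given relation sequence connect a tweet to a candidate venue, and collects one count per path type into a vector. The adjacency layout, the relation names, and the exact relation sequences in `META_PATHS` are assumptions for illustration; the slides name the path families (EgoPath, FriendPath, ...) but do not spell out their definitions.

```python
from collections import Counter


def count_paths(adj, start, relations, target):
    """Count paths from `start` to `target` that follow `relations` in order.

    `adj` maps (node, relation) -> list of neighbouring nodes.
    """
    frontier = Counter({start: 1})          # node -> number of paths reaching it
    for rel in relations:
        nxt = Counter()
        for node, n_paths in frontier.items():
            for neighbour in adj.get((node, rel), ()):
                nxt[neighbour] += n_paths
        frontier = nxt
    return frontier[target]


# Hypothetical toy network.
adj = {
    ("tweet:1", "posted_by"): ["tw:alice"],
    ("tw:alice", "anchor"): ["fs:alice"],
    ("fs:alice", "checks_in_at"): ["venue:cafe", "venue:gym"],
    ("fs:alice", "mayor_of"): ["venue:cafe"],
    ("tw:alice", "friend"): ["tw:bob"],
    ("tw:bob", "anchor"): ["fs:bob"],
    ("fs:bob", "checks_in_at"): ["venue:cafe"],
}

# Assumed relation sequences, one per feature.
META_PATHS = {
    "ego_checkin": ["posted_by", "anchor", "checks_in_at"],
    "ego_mayor": ["posted_by", "anchor", "mayor_of"],
    "friend_checkin": ["posted_by", "friend", "anchor", "checks_in_at"],
}


def feature_vector(adj, tweet, venue):
    """One path count per meta path, in the order of META_PATHS."""
    return [count_paths(adj, tweet, rels, venue) for rels in META_PATHS.values()]


cafe_features = feature_vector(adj, "tweet:1", "venue:cafe")
gym_features = feature_vector(adj, "tweet:1", "venue:gym")
```

In this toy network the cafe is reachable along all three assumed paths while the gym is reachable only through the author's own check-in, so the two candidate venues receive different feature vectors.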
Leveraging the Geographic Context
• Tweet t_i is more likely to be associated with venue v_p if user u_i him/herself has posted any geotagged tweets in the neighborhood of v_p.
• Tweet t_i is more likely to be associated with venue v_p if user u_i’s friends u_k have posted any geotagged tweets in the neighborhood of v_p.
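Both statements above reduce to the same check: has someone (the user, or a friend) posted geotagged tweets near the venue? A minimal sketch follows, assuming "neighborhood" means a fixed radius around the venue. The 0.5 km radius, the function names, and the coordinates are all illustrative assumptions; the slides do not state how the neighborhood is defined.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearby_count(geotags, venue_latlon, radius_km=0.5):
    """Number of geotagged tweets within `radius_km` of the venue (radius is an assumption)."""
    return sum(
        1 for lat, lon in geotags
        if haversine_km(lat, lon, venue_latlon[0], venue_latlon[1]) <= radius_km
    )


# Made-up coordinates for illustration only.
venue = (37.4436, -122.1712)
user_geotags = [(37.4437, -122.1710), (37.7749, -122.4194)]   # one near, one far
friend_geotags = [(37.4430, -122.1715)]

user_feature = nearby_count(user_geotags, venue)
friend_feature = nearby_count(friend_geotags, venue)
```

The two counts could be appended to a tweet-venue feature vector, one for the user's own geotagged tweets and one for the friends' geotagged tweets.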
Experiments
• Feature analysis: P(y(t_i, v_p) = 1)
• Ranking geo-located venues: P(y(t_i, v_p1) = 1) > P(y(t_i, v_p2) = 1) > P(y(t_i, v_p3) = 1) > …
  • Chain stores: Starbucks, McDonald’s, and Apple Stores
  • Confined geographic areas: venues in the Stanford Shopping Center
• Evaluation metrics
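The ranking setup can be illustrated as sorting candidate venues by estimated probability and then checking where the actual venue lands. The slides do not name the evaluation metrics used, so reciprocal rank and hit-at-k below are stand-ins chosen purely for illustration; the venue labels and scores are made up.

```python
def rank_venues(scores):
    """Order candidate venues by descending estimated probability."""
    return sorted(scores, key=lambda venue: scores[venue], reverse=True)


def reciprocal_rank(ranking, actual):
    """1 / (position of the actual venue), or 0.0 if it is not among the candidates."""
    return 1.0 / (ranking.index(actual) + 1) if actual in ranking else 0.0


def hit_at_k(ranking, actual, k):
    """True if the actual venue appears in the top k of the ranking."""
    return actual in ranking[:k]


# Made-up probabilities for three branches of the same chain.
scores = {"branch_a": 0.5, "branch_b": 0.3, "branch_c": 0.2}
ranking = rank_venues(scores)

rr = reciprocal_rank(ranking, "branch_b")
top1 = hit_at_k(ranking, "branch_b", 1)
top2 = hit_at_k(ranking, "branch_b", 2)
```

Here the actual venue `branch_b` is ranked second, so it misses the top-1 cut but is found within the top 2.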
Summary and Conclusions
• Constructed a heterogeneous social network modeling users' social relations, activities, and textual information.
• Presented a method for using social network information and geographic context to predict the geo-located venue of a tweet.
• Evaluated the proposed model using datasets collected from Twitter and Foursquare on predefined geographic regions and on chain stores.
Future Work
• Investigate approaches that can jointly model the social network and the geographic context.
• Enhance the Foursquare-based text paths with more sophisticated language models.
• Incorporate temporal information.
• Implement the method on a distributed system.
Q&A