Inferring Crowd-Sourced Venues for Tweets

Author: Randolf Goodwin
IEEE Big Data 2015

Bokai Cao∗, Francine Chen†, Dhiraj Joshi† and Philip S. Yu∗‡
∗ University of Illinois at Chicago
† FX Palo Alto Laboratory
‡ Tsinghua University

Motivations

• Over 500 million tweets are generated per day (http://www.internetlivestats.com/twitter-statistics/).
• Fewer than 1% of tweets are geotagged [Wang et al. 2014].
• Inferring the location of non-geotagged tweets can facilitate better understanding of users’ geographic context, …

• Tweets are usually short and informal, without clear location signals, especially for chain stores.

• Most existing studies infer the location of a user or a tweet only at a coarse level of granularity.

Problem Definition

• Venue Inference for Tweets (VIT): Given a non-geotagged tweet t_i and a candidate venue v_p, estimate the probability that the tweet was posted at the venue, P(y(t_i, v_p) = 1), such that the venue with the maximum probability, v_est(t_i), is the tweet’s actual venue, v_act(t_i).
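The VIT decision rule can be read as an argmax over candidate venues. The snippet below is a minimal sketch of that rule only, not the authors' implementation: the function name `infer_venue`, the venue identifiers, and the probability values are hypothetical, and the probabilities are assumed to come from whatever classifier estimates P(y(t_i, v_p) = 1).

```python
from typing import Dict


def infer_venue(venue_probs: Dict[str, float]) -> str:
    """Return v_est(t_i): the candidate venue with the highest estimated probability."""
    if not venue_probs:
        raise ValueError("at least one candidate venue is required")
    return max(venue_probs, key=venue_probs.get)


# Hypothetical estimates of P(y(t_i, v_p) = 1) for a single tweet t_i.
probs = {"venue_a": 0.12, "venue_b": 0.71, "venue_c": 0.35}
estimated = infer_venue(probs)
print(estimated)
```

The prediction counts as correct when the returned venue equals the tweet's actual venue v_act(t_i).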

Twitter + Foursquare

[Figure: the Twitter and Foursquare networks; node labels: tweets, venues, users, tips; two relations are marked "?".]

• Anchor links (links between the accounts of the same user on the two networks)

Constructing the Heterogeneous Information Network

• Tweets from Jun. 2013 to Apr. 2014
• Tips from Feb. 2009 to Jun. 2014

Exploiting the Heterogeneous Information Network

• EgoPath
  1. Checking in at
  2. Being a mayor of
  3. Writing a tip about

• FriendPath: homophily principle in social science

• InterestPath: users tend to tweet at similar venues (based on their interests)

• TextPath: sharing common content words via tips

• The path counts are used as elements of the feature vectors.
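As a rough illustration of how path counts can become features, the sketch below counts path instances from a tweet to each venue by following one adjacency map per hop. It is an assumption-laden toy: the slides name the path families but do not spell out their schemas, so the two schemas used here (tweet → author → venue, and tweet → author → friend → venue), the relation names, and all data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

Adjacency = Dict[str, List[str]]


def count_paths(start: str, hops: List[Adjacency]) -> Dict[str, int]:
    """Count path instances from `start` to each reachable end node,
    following one adjacency map per hop (repeated edges count separately)."""
    counts: Dict[str, int] = {start: 1}
    for adjacency in hops:
        next_counts: Dict[str, int] = defaultdict(int)
        for node, n_paths in counts.items():
            for neighbor in adjacency.get(node, []):
                next_counts[neighbor] += n_paths
        counts = dict(next_counts)
    return counts


# Hypothetical toy network.
posted_by: Adjacency = {"t1": ["alice"]}
friend_of: Adjacency = {"alice": ["bob", "carol"]}
checked_in: Adjacency = {"alice": ["cafe"], "bob": ["cafe", "gym"], "carol": ["cafe"]}

# Assumed simplified schemas for an ego-style and a friend-style path.
ego = count_paths("t1", [posted_by, checked_in])
friend = count_paths("t1", [posted_by, friend_of, checked_in])


def features(venue: str) -> List[int]:
    """Feature vector for the pair (t1, venue): one path count per schema."""
    return [ego.get(venue, 0), friend.get(venue, 0)]


print(features("cafe"), features("gym"))
```

Each (tweet, venue) pair gets one count per path schema, and the resulting vector can be passed to any classifier that outputs a probability.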

Leveraging the Geographic Context

• Tweet t_i is more likely to be associated with venue v_p if user u_i has posted any geotagged tweets in the neighborhood of v_p.
• Tweet t_i is more likely to be associated with venue v_p if u_i’s friends u_k have posted any geotagged tweets in the neighborhood of v_p.
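One way to turn these two observations into numeric features is to count geotagged tweets that fall within some radius of the venue. The sketch below assumes a great-circle (haversine) distance and a 0.5 km radius; the slides do not define "neighborhood", so the radius, the function names, and the coordinates are all hypothetical.

```python
import math
from typing import Iterable, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearby_count(venue: LatLon, geotags: Iterable[LatLon], radius_km: float = 0.5) -> int:
    """Number of geotagged tweets within `radius_km` of the venue (assumed neighborhood)."""
    return sum(1 for point in geotags if haversine_km(venue, point) <= radius_km)


# Hypothetical coordinates.
venue = (37.4430, -122.1710)
user_geotags = [(37.4432, -122.1705), (37.7749, -122.4194)]    # one nearby, one far away
friend_geotags = [(37.4429, -122.1712), (37.4431, -122.1709)]  # both nearby

user_feature = nearby_count(venue, user_geotags)
friend_feature = nearby_count(venue, friend_geotags)
print(user_feature, friend_feature)
```

The two counts correspond to the user's own geotagged tweets and to the friends' geotagged tweets, respectively.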

Experiments

• Feature analysis: P(y(t_i, v_p) = 1)

• Ranking geo-located venues: P(y(t_i, v_p1) = 1) > P(y(t_i, v_p2) = 1) > P(y(t_i, v_p3) = 1) > …
  • Chain stores
  • Confined geographic areas
• Evaluation metrics
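The ranking setup can be sketched as sorting candidate venues by estimated probability and then scoring where the actual venue lands. The slides do not name the evaluation metrics, so accuracy@k and reciprocal rank are used here purely as assumed stand-ins that are common for ranking tasks; the function names and data are hypothetical.

```python
from typing import Dict, List


def rank_venues(venue_probs: Dict[str, float]) -> List[str]:
    """Candidate venues sorted by descending estimated probability."""
    return sorted(venue_probs, key=venue_probs.get, reverse=True)


def accuracy_at_k(ranked: List[str], actual: str, k: int) -> float:
    """1.0 if the actual venue appears among the top-k ranked venues, else 0.0."""
    return 1.0 if actual in ranked[:k] else 0.0


def reciprocal_rank(ranked: List[str], actual: str) -> float:
    """1 / rank of the actual venue (0.0 if it is not among the candidates)."""
    return 1.0 / (ranked.index(actual) + 1) if actual in ranked else 0.0


# Hypothetical probabilities for one tweet whose actual venue is "v3".
probs = {"v1": 0.2, "v2": 0.7, "v3": 0.5}
ranked = rank_venues(probs)
print(ranked, accuracy_at_k(ranked, "v3", 1), reciprocal_rank(ranked, "v3"))
```

Averaging these per-tweet scores over a test set gives a single number per metric.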

• Chain stores: Starbucks, McDonald’s, and Apple Stores
• Confined geographic area: venues in the Stanford Shopping Center

Summary and Conclusions

• Constructed a heterogeneous social network modeling users' social relations, activities, and textual information.
• Presented a method for using social network information and geographic context to predict the geo-located venue of a tweet.
• Evaluated the proposed model using datasets collected from Twitter and Foursquare on predefined geographic regions and on chain stores.

Future Work

• Investigate approaches that can jointly model the social network and the geographic context.
• Enhance the Foursquare-based text paths with more sophisticated language models.
• Incorporate temporal information.
• Implement the method on a distributed system.

Q&A