Incremental Algorithms for Closeness Centrality




A. Erdem Sarıyüce 1,2, Kamer Kaya 1, Erik Saule 1*, Ümit V. Çatalyürek 1,3
1 Department of Biomedical Informatics
2 Department of Computer Science & Engineering
3 Department of Electrical & Computer Engineering
The Ohio State University
* Department of Computer Science, University of North Carolina at Charlotte

IEEE BigData 2013, Santa Clara, CA

Massive Graphs are everywhere
•  Facebook has a billion users and a trillion connections
•  Twitter has more than 200 million users

[Figure: citation graphs, with regions labeled Topic 1 to Topic 6]

IEEE BigData’13

Incremental Algorithms for Closeness Centrality


Large(r) Networks and Centrality
•  Who is more important in a network? Who controls the flow between nodes?
•  Centrality metrics answer these questions
•  Closeness Centrality (CC) is an intriguing metric
•  How to handle changes?
•  Incremental algorithms are essential

[Figure (a): A toy social network with various types of vertices: Arthur is an articulation vertex, Diana is a side vertex, Jack and Martin are degree-1 vertices, and Amy and May are identical vertices.]

Closeness Centrality (CC)
•  Let G = (V, E) be a graph with vertex set V and edge set E
•  Farness (far) of a vertex is the sum of shortest distances to each vertex it can reach:

$$far[u] = \sum_{v \in V,\ d_G(u,v) \neq \infty} d_G(u,v)$$

•  Closeness centrality (cc) of a vertex:

$$cc[u] = \frac{1}{far[u]} \qquad (1)$$

[Figure 2: plot of Pr(d(u, v) = x)]
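As a small worked instance of the two definitions, assume a hypothetical path graph a - b - c (not taken from the slides). The middle vertex has the smallest farness and therefore the largest closeness:

```latex
\begin{align*}
far[b] &= d_G(b,a) + d_G(b,c) = 1 + 1 = 2, & cc[b] &= \tfrac{1}{2},\\
far[a] &= d_G(a,b) + d_G(a,c) = 1 + 2 = 3, & cc[a] &= \tfrac{1}{3},\\
far[c] &= d_G(c,b) + d_G(c,a) = 1 + 2 = 3, & cc[c] &= \tfrac{1}{3}.
\end{align*}
```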

•  Best algorithm: All-pairs shortest paths
•  O(|V|.|E|) complexity for unweighted networks
•  For large and dynamic networks
    •  Computation from scratch is infeasible
    •  Faster solutions are essential

If u cannot reach any vertex in the graph, cc[u] = 0. For a sparse unweighted graph G = (V, E) with n = |V| and m = |E|, the complexity of cc computation is O(n(m + n)) [2]. For each vertex s ∈ V, Algorithm 1 executes a Single-Source Shortest Paths (SSSP) computation, i.e., it initiates a breadth-first search (BFS) from s, computes the distances to the other vertices, and computes far[s], the sum of the distances that are different from ∞. As the last step, it computes cc[s]. Since a BFS takes O(m + n) time, and n SSSPs are required in total, the complexity follows.

CC Algorithm

Algorithm 1: CC: Basic centrality computation
Data: G = (V, E)
Output: cc[.]
for each s ∈ V do                      ▷ SSSP is computed for each vertex
    ▷ SSSP(G, s) with centrality computation
    Q ← empty queue
    d[v] ← ∞, ∀v ∈ V \ {s}
    Q.push(s), d[s] ← 0
    far[s] ← 0
    while Q is not empty do            ▷ Breadth-First Search with farness computation
        v ← Q.pop()
        for all w ∈ Γ_G(v) do
            if d[w] = ∞ then
                Q.push(w)
                d[w] ← d[v] + 1
                far[s] ← far[s] + d[w]
    cc[s] = 1 / far[s]                 ▷ cc value is assigned
return cc[.]
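Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the adjacency-dict representation and the name `closeness_centrality` are assumptions, and a vertex that reaches nothing is given cc = 0 as stated earlier.

```python
from collections import deque


def closeness_centrality(adj):
    """Basic CC computation: one BFS (SSSP) per source vertex.

    adj is an assumed representation: dict mapping each vertex to an
    iterable of its neighbours in an undirected, unweighted graph.
    """
    cc = {}
    for s in adj:
        dist = {s: 0}          # vertices missing from dist are at distance infinity
        far = 0                # running sum of finite distances from s
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    far += dist[w]
                    queue.append(w)
        # A vertex that cannot reach any other vertex gets cc = 0.
        cc[s] = 1.0 / far if far > 0 else 0.0
    return cc


# Hypothetical path graph a - b - c: the middle vertex b has the highest cc.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(closeness_centrality(path))
```

Each source costs one BFS, which is where the O(n(m + n)) total comes from.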


Incremental Closeness Centrality
•  Problem definition: Given a graph G = (V, E), the closeness centrality values cc of its vertices, and an inserted (or removed) edge uv, find the closeness centrality values cc' of the graph G' = (V, E ∪ {uv}) (or G' = (V, E \ {uv}))
•  Computing cc values from scratch after each edge change is very costly
•  Need a faster algorithm


Filtering Techniques

•  We aim to reduce the number of SSSPs to be executed
•  Three filtering techniques are proposed
    •  Filtering with level differences
    •  Filtering with biconnected components
    •  Filtering with identical vertices
•  And an additional SSSP hybridization technique


Filtering with level differences
•  Upon insertion of edge uv, the breadth-first search tree of each vertex may change. Three possibilities for a source vertex s, by the level difference between u and v:
    •  Case 1: d_G(s, u) = d_G(s, v)
    •  Case 2: |d_G(s, u) − d_G(s, v)| = 1
    •  Case 3: |d_G(s, u) − d_G(s, v)| > 1
•  Case 1 and 2 will not change cc of s!
    •  No need to apply SSSP from them
•  Just Case 3
•  How to find such vertices?
    •  BFSs are executed from u and v and the level difference is checked


Filtering with level differences
•  Vertices satisfying the statement of Theorem 1 are filtered

Algorithm 2: Simple work filtering
Data: G = (V, E), cc[.], uv
Output: cc'[.]
G' ← (V, E ∪ {uv})
du[.] ← SSSP(G, u)                     ▷ distances from u in G
dv[.] ← SSSP(G, v)                     ▷ distances from v in G
for each s ∈ V do
    if |du[s] − dv[s]| ≤ 1 then        ▷ Case 1 and 2
        cc'[s] = cc[s]
    else                               ▷ Case 3
        ▷ use the computation in Algorithm 1 with G'
return cc'[.]
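A minimal Python sketch of Algorithm 2, under stated assumptions: the graph is a dict of neighbour sets, the helper names (`bfs_distances`, `closeness_of`, `insert_edge_with_level_filter`) are hypothetical, and vertices that reach neither endpoint are treated as unchanged, a case the pseudocode does not spell out.

```python
from collections import deque


def bfs_distances(adj, source):
    """Distances from source; unreachable vertices are simply absent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_of(adj, s):
    far = sum(bfs_distances(adj, s).values())
    return 1.0 / far if far > 0 else 0.0


def insert_edge_with_level_filter(adj, cc, u, v):
    """Insert edge uv into adj (dict of sets) and return (new cc, recomputed sources)."""
    inf = float("inf")
    du = bfs_distances(adj, u)   # distances from u in G, before insertion
    dv = bfs_distances(adj, v)   # distances from v in G, before insertion
    adj[u].add(v)
    adj[v].add(u)
    new_cc, recomputed = dict(cc), []
    for s in adj:
        if s not in du and s not in dv:
            continue             # assumption: s reaches neither endpoint, nothing changes
        if abs(du.get(s, inf) - dv.get(s, inf)) <= 1:
            continue             # Case 1 and 2: cc[s] is kept
        new_cc[s] = closeness_of(adj, s)   # Case 3: SSSP from s in G'
        recomputed.append(s)
    return new_cc, recomputed


# Hypothetical path 0-1-2-3-4; inserting edge 0-4 closes it into a cycle.
adj = {i: set() for i in range(5)}
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    adj[a].add(b)
    adj[b].add(a)
cc = {s: closeness_of(adj, s) for s in adj}
new_cc, recomputed = insert_edge_with_level_filter(adj, cc, 0, 4)
print(sorted(recomputed))
```

In this toy graph the middle vertex 2 is equidistant from both endpoints, so it is the one source the filter skips.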

Incremental Algorithms for Closeness Centrality

9

Filtering with biconnected components
•  What if the graph has articulation points?

[Figure: components A and B, with vertices u and v]

•  A change in A can change the cc of any vertex in A and B
•  Computing the change for u is enough for finding the changes for any vertex v in B (a constant factor is added)
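The claim can be checked on a toy graph. The sketch below assumes a hypothetical graph in which u is the articulation vertex joining A = {u, a1, a2, a3} and B = {u, b1, b2}; the vertex names and helpers are illustrative only. Every path from B into A passes through u, so an edge inserted inside A shifts the farness of each vertex in B by the same amount as the farness of u.

```python
from collections import deque


def farness(adj, s):
    """Sum of BFS distances from s to every vertex it can reach."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return sum(dist.values())


def build(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


# Hypothetical graph: A is the path u - a1 - a2 - a3, B is the path u - b1 - b2.
edges = [("u", "a1"), ("a1", "a2"), ("a2", "a3"), ("u", "b1"), ("b1", "b2")]
before = build(edges)
after = build(edges + [("u", "a3")])   # edge inserted inside component A

delta = {s: farness(after, s) - farness(before, s) for s in before}
for s in ("u", "b1", "b2"):
    print(s, delta[s])                 # the same farness change for u and all of B
```

One SSSP from u therefore gives the farness update for every vertex in B, instead of one SSSP per vertex.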


Filtering with biconnected components
•  Maintain the biconnected decomposition

[Figure: biconnected decomposition, edge b-d added]
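To illustrate why the decomposition has to be maintained, the sketch below recomputes articulation vertices from scratch with the standard DFS low-link method before and after an edge insertion. This is not the incremental maintenance the slide refers to, and the path graph a - b - c - d is a hypothetical stand-in for the graph in the figure.

```python
def articulation_points(adj):
    """Articulation vertices of an undirected simple graph (recursive DFS, low-link)."""
    disc, low, result = {}, {}, set()
    counter = [0]

    def dfs(v, parent):
        disc[v] = low[v] = counter[0]
        counter[0] += 1
        children = 0
        for w in adj[v]:
            if w not in disc:
                children += 1
                dfs(w, v)
                low[v] = min(low[v], low[w])
                # Non-root v separates w's subtree if that subtree cannot climb above v.
                if parent is not None and low[w] >= disc[v]:
                    result.add(v)
            elif w != parent:
                low[v] = min(low[v], disc[w])
        # The DFS root is an articulation vertex only with two or more DFS children.
        if parent is None and children > 1:
            result.add(v)

    for v in adj:
        if v not in disc:
            dfs(v, None)
    return result


# Hypothetical path a - b - c - d.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(sorted(articulation_points(graph)))   # b and c both separate the path

graph["b"].add("d")                          # insert edge b-d
graph["d"].add("b")
print(sorted(articulation_points(graph)))   # only b remains: b, c, d form one component
```

After the insertion, b, c, and d merge into a single biconnected component, so the set of articulation vertices shrinks.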


Filtering with identical vertices
•  Two types of identical vertices:
    •  Type I: u and v are identical vertices if their neighbor lists are the same, i.e., Γ(u) = Γ(v)
    •  Type II: u and v are identical vertices if their neighbor lists are the same and they are also connected, i.e., {u} ∪ Γ(u) = {v} ∪ Γ(v)
•  If u and v are identical vertices, their cc values are the same
    •  Same breadth-first search trees!

Incremental Algorithms for Closeness Centrality

12

Filtering with identical vertices
•  Let V_ID ⊆ V be a vertex class containing type-I or type-II identical vertices. Then the cc values of all the vertices in V_ID are equal
    •  Applying SSSP from only one of them is enough!
•  Type-I and type-II identical vertices are found by simply hashing the neighbor lists
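Hashing the neighbor lists can be sketched as below. The dict-of-sets graph and the name `identical_vertex_classes` are assumptions; a `frozenset` of the neighbors serves as the hash key for type I, and the same set with the vertex itself added serves as the key for type II.

```python
from collections import defaultdict


def identical_vertex_classes(adj):
    """Group vertices into type-I and type-II identical classes by hashing neighbor sets."""
    type1 = defaultdict(list)   # key: Γ(v)
    type2 = defaultdict(list)   # key: {v} ∪ Γ(v)
    for v, nbrs in adj.items():
        type1[frozenset(nbrs)].append(v)
        type2[frozenset(nbrs) | {v}].append(v)
    classes = [c for c in type1.values() if len(c) > 1]
    classes += [c for c in type2.values() if len(c) > 1]
    return classes


# Hypothetical graph: a and b share neighbors {c, d} (type I, and so do c and d);
# x and y are adjacent and share neighbor z (type II).
adj = {
    "a": {"c", "d"}, "b": {"c", "d"}, "c": {"a", "b"}, "d": {"a", "b"},
    "x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y", "w"}, "w": {"z"},
}
for cls in identical_vertex_classes(adj):
    print(sorted(cls))
```

Within each class, a single SSSP from one representative yields the cc value for all members.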


SSSP Hybridization
•  BFS can be done in two ways:
    •  Top-down: Uses the vertices at distance k to find the vertices at distance k+1
    •  Bottom-up: After all distance-k vertices are found, all other unprocessed vertices are processed to see if they are neighbors of a distance-k vertex
•  Top-down is expected to be better for small k values
•  Following the idea of Beamer et al. [SC’12], we apply a hybrid approach
    •  Simply compare the # of edges to be processed at level k
    •  Choose the cheaper option
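A rough sketch of a hybrid BFS is given below. The cost estimate (sum of degrees of the frontier versus sum of degrees of the unvisited vertices) is an assumption chosen to match "compare the # of edges to be processed"; the exact switching rule used by the authors or by Beamer et al. may differ, and the name `hybrid_bfs` is hypothetical.

```python
def hybrid_bfs(adj, source):
    """Level-synchronous BFS that picks top-down or bottom-up at every level."""
    dist = {source: 0}
    frontier = {source}
    unvisited = set(adj) - {source}
    level = 0
    while frontier:
        # Assumed cost model: number of adjacency entries each direction would scan.
        top_down_cost = sum(len(adj[v]) for v in frontier)
        bottom_up_cost = sum(len(adj[v]) for v in unvisited)
        next_frontier = set()
        if top_down_cost <= bottom_up_cost:
            # Top-down: expand the distance-k vertices.
            for v in frontier:
                for w in adj[v]:
                    if w not in dist:
                        dist[w] = level + 1
                        next_frontier.add(w)
        else:
            # Bottom-up: each unvisited vertex looks for a parent in the frontier.
            for w in unvisited:
                if any(p in frontier for p in adj[w]):
                    next_frontier.add(w)
            for w in next_frontier:
                dist[w] = level + 1
        unvisited -= next_frontier
        frontier = next_frontier
        level += 1
    return dist


# Hypothetical graph: a star around vertex 0 with a short tail 1 - 5 - 6.
adj = {0: [1, 2, 3, 4], 1: [0, 5], 2: [0], 3: [0], 4: [0], 5: [1, 6], 6: [5]}
print(hybrid_bfs(adj, 0))
```

On this toy graph the first level runs top-down and the later levels switch to bottom-up, because few unvisited vertices remain.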


Experiments
•  The techniques are evaluated on large real-world social networks of different sizes and types

Table I

Graph               |V|      |E|      Time (in sec.)
                                      Org.       Best
hep-th              8.3K     15.7K    1.41       0.05
PGPgiantcompo       10.6K    24.3K    4.96       0.04
astro-ph            16.7K    121.2K   14.56      0.36
cond-mat-2005       40.4K    175.6K   77.90      2.87
soc-sign-epinions   131K     711K     778        6.25
loc-gowalla         196K     950K     2,267      53.18
web-NotreDame       325K     1,090K   2,845      53.06
amazon0601          403K     2,443K   14,903     298
web-Google          875K     4,322K   65,306     824
wiki-Talk           2,394K   4,659K   175,450    922
DBLP-coauthor       1,236K   9,081K   115,919    251

Geometric-mean speedup: 43.5 (first four graphs), 99.8 (remaining seven graphs)

Probability Distribution
•  Bars show the distribution of the random variable of level differences into three cases when an edge is inserted

[Figure 4: The bars show the distribution of random variable X = |d_G(u, w) − d_G(v, w)| into three cases we investigated when an edge uv is added: Pr(X = 0), Pr(X = 1), Pr(X > 1).]

... is more useful for the graphs having characteristics similar to small-world networks.

Filtering with identical vertices is not as useful as the other two techniques in the work filter.

Speedups
•  Random insertions for 10 graphs: ~100 times better
•  Real insertions for DBLP-coauthor graph: real temporal data shows larger speedups
•  Speedups are w.r.t. full cc computation

Time (secs)
Graph               CC            CC-B         CC-BL      CC-BLI     CC-BLIH   Filter time
hep-th              1.413         0.317        0.057      0.053      0.048     0.001
PGPgiantcompo       4.960         0.431        0.059      0.055      0.045     0.001
astro-ph            14.567        9.431        0.809      0.645      0.359     0.004
cond-mat-2005       77.903        39.049       5.618      4.687      2.865     0.010
Geometric mean      9.444         2.663        0.352      0.306      0.217     0.003
soc-sign-epinions   778.870       257.410      20.603     19.935     6.254     0.041
loc-gowalla         2,267.187     1,270.820    132.955    135.015    53.182    0.063
web-NotreDame       2,845.367     579.821      118.861    83.817     53.059    0.050
amazon0601          14,903.080    11,953.680   540.092    551.867    298.095   0.158
web-Google          65,306.600    22,034.460   2,457.660  1,701.249  824.417   0.267
wiki-Talk           175,450.720   25,701.710   2,513.041  2,123.096  922.828   0.491
DBLP-coauthor       115,919.518   18,501.147   288.269    251.557    252.647   0.530
Geometric mean      13,884.152    4,218.031    315.777    273.036    139.170   0.146

Speedups
Graph               CC-B    CC-BL   CC-BLI  CC-BLIH
hep-th              4.5     24.8    26.6    29.4
PGPgiantcompo       11.5    84.1    89.9    111.2
astro-ph            1.5     18.0    22.6    40.5
cond-mat-2005       2.0     13.9    16.6    27.2
Geometric mean      3.5     26.8    30.7    43.5
soc-sign-epinions   3.0     37.8    39.1    124.5
loc-gowalla         1.8     17.1    16.8    42.6
web-NotreDame       4.9     23.9    33.9    53.6
amazon0601          1.2     27.6    27.0    50.0
web-Google          3.0     26.6    38.4    79.2
wiki-Talk           6.8     69.8    82.6    190.1
DBLP-coauthor       6.2     402.1   460.8   458.8
Geometric mean      3.2     43.9    50.8    99.7

Table II. Execution times in seconds of all the algorithms and speedups when compared with the basic closeness centrality algorithm CC. In the table, CC-B is the variant which uses only BCDs, CC-BL uses BCDs and filtering with levels, and CC-BLI uses all three work filtering techniques including identical vertices. CC-BLIH uses all the techniques described in this paper including SSSP hybridization.

•  Biconnected decomposition brings 3x speedup
•  Filtering with level differences provides 14x speedup
•  Filtering with identical vertices brings 1.15x speedup
•  Hybridization brings 2x speedup

... NPRP grant 4-1454-1-233 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
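The per-technique factors quoted on the slide can be reproduced from the geometric-mean row of Table II for the seven larger graphs. The short script below is only an arithmetic check; the variable names are illustrative and the numbers are copied from the table.

```python
import math


def geometric_mean(values):
    return math.exp(sum(math.log(x) for x in values) / len(values))


# CC-BLIH speedups of the seven larger graphs (Table II, Speedups part).
large_blih_speedups = [124.5, 42.6, 53.6, 50.0, 79.2, 190.1, 458.8]

# Cumulative geometric-mean speedups over CC for the same graphs.
cumulative = [("CC-B", 3.2), ("CC-BL", 43.9), ("CC-BLI", 50.8), ("CC-BLIH", 99.7)]

# Gain contributed by each added technique = ratio of consecutive cumulative speedups.
stage_gain = {}
previous = 1.0
for name, speedup in cumulative:
    stage_gain[name] = speedup / previous
    previous = speedup

print(f"geometric mean of CC-BLIH speedups: {geometric_mean(large_blih_speedups):.1f}")
for name, gain in stage_gain.items():
    print(f"{name}: {gain:.2f}x over the previous variant")
```

The ratios come out close to 3x, 14x, 1.15x, and 2x, which is how the four callouts relate to the cumulative speedups.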


Conclusion
•  First algorithms for incremental closeness centrality computation
•  Update time on real temporal data is reduced from 1.3 days to 4.2 minutes
•  Fundamental building block for streaming workloads and the centrality management problem
•  Future Work:
    •  Sampling-based solutions
    •  Parallelization
        •  A. E. Sarıyüce, E. Saule, K. Kaya, Ü. V. Çatalyürek. STREAMER: a Distributed Framework for Incremental Closeness Centrality Computation, IEEE Cluster 2013.


Thanks
•  For more information
    •  Email [email protected]
    •  Visit http://bmi.osu.edu/~umit or http://bmi.osu.edu/hpc
•  Acknowledgement of Support
