Incremental Algorithms for Closeness Centrality
A. Erdem Sarıyüce 1,2, Kamer Kaya 1, Erik Saule 1*, Ümit V. Çatalyürek 1,3
1 Department of Biomedical Informatics, 2 Department of Computer Science & Engineering, 3 Department of Electrical & Computer Engineering, The Ohio State University
* Department of Computer Science, University of North Carolina Charlotte
IEEE BigData 2013, Santa Clara, CA
Massive Graphs are everywhere
• Facebook has a billion users and a trillion connections
• Twitter has more than 200 million users
[Figure: citation graphs, with vertex clusters labeled Topic 1 to Topic 6]

IEEE BigData’13
Incremental Algorithms for Closeness Centrality
Large(r) Networks and Centrality
• Who is more important in a network? Who controls the flow between nodes?
• Centrality metrics answer these questions
• Closeness Centrality (CC) is an intriguing metric
• How to handle changes?
• Incremental algorithms are essential

[Background image: pages of a related paper, including a passage on computing betweenness centrality bc[v] on a tree from the sizes l_v and r_v of the subtrees of v, and a figure captioned "(a) A toy social network with various types of vertices: Arthur is an articulation vertex, Diana is a side vertex, Jack and Martin are degree-1 vertices, and Amy and May are identical vertices."]
Closeness Centrality (CC)
• Let G = (V, E) be a graph with vertex set V and edge set E
• The farness (far) of a vertex is the sum of its shortest distances to every vertex it can reach:
$$\mathrm{far}[u] = \sum_{v \in V,\ d_G(u,v) \neq \infty} d_G(u,v)$$
• The closeness centrality (cc) of a vertex is the inverse of its farness:
$$cc[u] = \frac{1}{\mathrm{far}[u]} \qquad (1)$$

[Figure 2: distribution of Pr(d(u, v) = x), the probability that the distance between two vertices is equal to x]
[Background text from the paper: a maximal biconnected subgraph of a graph G is a biconnected component.]
• If u cannot reach any vertex in the graph, cc[u] = 0
• Best algorithm: all-pairs shortest paths
• O(|V|·|E|) complexity for unweighted networks
• For large and dynamic networks, computation from scratch is infeasible
• Faster solutions are essential

For a sparse unweighted graph G = (V, E), the complexity of cc computation is O(n(m + n)) [2]. For each vertex s ∈ V, Algorithm 1 executes a Single-Source Shortest Paths (SSSP) computation, i.e., it initiates a breadth-first search (BFS) from s, computes the distances to the other vertices and far[s], the sum of the distances that are different from ∞. As the last step, it computes cc[s]. Since a BFS takes O(m + n) time and n SSSPs are required in total, the complexity follows.
CC Algorithm
Algorithm 1: CC: Basic centrality computation
Data: G = (V, E)
Output: cc[.]
for each s ∈ V do                       ▷ an SSSP is computed for each vertex
    ▷ SSSP(G, s) with centrality computation (breadth-first search)
    Q ← empty queue
    d[v] ← ∞, ∀v ∈ V \ {s}
    Q.push(s), d[s] ← 0
    far[s] ← 0
    while Q is not empty do
        v ← Q.pop()
        for all w ∈ Γ_G(v) do
            if d[w] = ∞ then
                Q.push(w)
                d[w] ← d[v] + 1
                far[s] ← far[s] + d[w]   ▷ farness computation
    cc[s] = 1/far[s]                     ▷ cc value is assigned
return cc[.]
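For readers who want to run Algorithm 1, here is a minimal Python sketch. It assumes the graph is undirected, unweighted, and stored as an adjacency dict (vertex to neighbours), a representation the slides do not prescribe, and it follows the convention that cc[u] = 0 when u reaches no other vertex. The function name is illustrative.

```python
from collections import deque


def closeness_centrality(adj):
    """Algorithm 1 sketch: one BFS (an SSSP on an unweighted graph) per source.

    adj: dict mapping each vertex to an iterable of its neighbours (undirected).
    Returns cc with cc[s] = 1 / far[s], or 0.0 when s reaches no other vertex.
    """
    cc = {}
    for s in adj:
        dist = {s: 0}  # vertices absent from dist are at distance infinity
        queue = deque([s])
        far = 0
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:  # first visit: d[w] = d[v] + 1
                    dist[w] = dist[v] + 1
                    far += dist[w]  # farness only sums reachable vertices
                    queue.append(w)
        cc[s] = 1.0 / far if far > 0 else 0.0
    return cc


# Path graph a - b - c: far[a] = far[c] = 3, far[b] = 2
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(closeness_centrality(path))
# {'a': 0.3333333333333333, 'b': 0.5, 'c': 0.3333333333333333}
```

Each source costs one BFS, O(m + n), so the loop over all n sources gives the O(n(m + n)) bound stated for sparse unweighted graphs.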
Incremental Closeness Centrality
• Problem definition: given a graph G = (V, E), the closeness centrality values cc of its vertices, and an inserted (or removed) edge uv, find the closeness centrality values cc' of the graph G' = (V, E ∪ {uv}) (or G' = (V, E \ {uv}))
• Computing cc values from scratch after each edge change is very costly
• Need a faster algorithm
Filtering Techniques
• We aim to reduce the number of SSSPs to be executed
• Three filtering techniques are proposed:
  • Filtering with level differences
  • Filtering with biconnected components
  • Filtering with identical vertices
• And an additional SSSP hybridization technique
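Filtering with level differences
• Upon edge insertion, the breadth-first search tree of each vertex s may change. Three possibilities, depending on the levels of u and v in the tree rooted at s:
  • Case 1: d_G(s, u) = d_G(s, v)
  • Case 2: |d_G(s, u) − d_G(s, v)| = 1
  • Case 3: |d_G(s, u) − d_G(s, v)| > 1
• Cases 1 and 2 do not change the cc of s: any path from s that uses the new edge is no shorter than an existing one, so no distance from s decreases
• No need to apply SSSP from such vertices
• Just Case 3
• How to find such vertices?
• BFSs are executed from u and v, and the level difference is checked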
Filtering with level differences
Algorithm 2: Simple work filtering
Data: G = (V, E), cc[.], uv
Output: cc'[.]
G' ← (V, E ∪ {uv})
du[.] ← SSSP(G, u)                  ▷ distances from u in G
dv[.] ← SSSP(G, v)                  ▷ distances from v in G
for each s ∈ V do
    if |du[s] − dv[s]| ≤ 1 then     ▷ Cases 1 and 2
        cc'[s] = cc[s]
    else                             ▷ Case 3
        use the computation in Algorithm 1 with G'
return cc'[.]
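Algorithm 2 can be sketched in Python as follows. The sketch assumes a connected, undirected graph stored as an adjacency dict (so du[s] and dv[s] are always finite) and handles insertion only; the helper names bfs_distances, closeness and insert_edge_with_filter are illustrative, not taken from the paper.

```python
from collections import deque


def bfs_distances(adj, src):
    """Hop distances from src (SSSP on an unweighted graph); unreachable vertices are absent."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj, s):
    far = sum(bfs_distances(adj, s).values())
    return 1.0 / far if far > 0 else 0.0


def insert_edge_with_filter(adj, cc, u, v):
    """Sketch of Algorithm 2 for a connected graph: returns (G', cc', recomputed vertices)."""
    du = bfs_distances(adj, u)  # distances from u in G
    dv = bfs_distances(adj, v)  # distances from v in G
    new_adj = {x: set(nbrs) for x, nbrs in adj.items()}  # G' = (V, E + {uv})
    new_adj[u].add(v)
    new_adj[v].add(u)
    new_cc, recomputed = {}, []
    for s in adj:
        if abs(du[s] - dv[s]) <= 1:  # Cases 1 and 2: cc[s] is unchanged
            new_cc[s] = cc[s]
        else:  # Case 3: rerun the Algorithm 1 computation from s on G'
            new_cc[s] = closeness(new_adj, s)
            recomputed.append(s)
    return new_adj, new_cc, recomputed


# Closing the path 0-1-2-3-4 into a 5-cycle
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
cc = {s: closeness(path, s) for s in path}
cycle, cc_new, redone = insert_edge_with_filter(path, cc, 0, 4)
print(redone)  # [0, 1, 3, 4]
```

In the example, vertex 2 sees both endpoints at level 2 (Case 1), so its cc value is copied; the other four vertices fall into Case 3 and get a fresh SSSP on G'.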
Filtering with biconnected components
• What if the graph has articulation points?
[Figure: subgraphs A and B, with vertices u and v]
• A change in A can change the cc of any vertex in A and B
• Computing the change for u is enough to find the change for any vertex v in B (the same constant is added to the farness of every such vertex)
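The additive shift in the last bullet can be justified as follows, assuming u is the articulation vertex separating B from A and the changed edge lies in A. Every shortest path from a vertex w in B to a vertex x in A passes through u, so d(w, x) = d(w, u) + d(u, x), while distances among the vertices of B ∪ {u} do not change. Hence far'[w] − far[w] = far'[u] − far[u], and cc'[w] = 1/far'[w]. A small Python check of this identity, on a hypothetical graph and with illustrative helper names:

```python
from collections import deque


def farness(adj, s):
    """Sum of hop distances from s to every vertex it reaches (one BFS)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return sum(dist.values())


def add_edge(adj, x, y):
    new = {k: set(nbrs) for k, nbrs in adj.items()}
    new[x].add(y)
    new[y].add(x)
    return new


def shift_far_behind(new_adj, far, art, side_b):
    """far'[w] = far[w] + (far'[art] - far[art]) for every w in side_b.

    Assumes art is an articulation vertex separating side_b from the part A
    that contains both endpoints of the changed edge.
    """
    delta = farness(new_adj, art) - far[art]  # one SSSP from art instead of one per w
    return {w: far[w] + delta for w in side_b}


# Hypothetical graph: A = path u-a1-a2-a3, B = path u-b1-b2, articulation vertex u
g = {}
for x, y in [("u", "a1"), ("a1", "a2"), ("a2", "a3"), ("u", "b1"), ("b1", "b2")]:
    g.setdefault(x, set()).add(y)
    g.setdefault(y, set()).add(x)
far = {s: farness(g, s) for s in g}
g2 = add_edge(g, "u", "a3")  # the inserted edge lies inside A
print(shift_far_behind(g2, far, "u", ["b1", "b2"]))  # {'b1': 9, 'b2': 13}
```

Here far[u] drops from 9 to 7, so both b1 (11 to 9) and b2 (15 to 13) shift by −2 without their own SSSPs.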
Filtering with biconnected components
• Maintain the biconnected decomposition
[Figure: biconnected decomposition of an example graph, updated after edge b-d is added]
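The slides do not say how the decomposition is maintained, so the sketch below simply recomputes it before and after an insertion with the classic Hopcroft-Tarjan edge-stack algorithm and reports the component that contains the inserted edge. The example graph, vertex names and helper functions are hypothetical; only the inserted edge b-d mirrors the figure label.

```python
def biconnected_components(adj):
    """Hopcroft-Tarjan sketch: return (components as vertex sets, articulation vertices).

    Recursive DFS with an edge stack; fine for small examples, but large graphs
    would need an iterative version to avoid Python's recursion limit.
    """
    disc, low = {}, {}
    edge_stack, comps, arts = [], [], set()

    def dfs(v, parent):
        disc[v] = low[v] = len(disc)  # discovery order
        children = 0
        for w in adj[v]:
            if w not in disc:  # tree edge
                children += 1
                edge_stack.append((v, w))
                dfs(w, v)
                low[v] = min(low[v], low[w])
                if low[w] >= disc[v]:  # v separates w's subtree: pop one component
                    comp = set()
                    while True:
                        edge = edge_stack.pop()
                        comp.update(edge)
                        if edge == (v, w):
                            break
                    comps.append(comp)
                    if parent is not None:
                        arts.add(v)
            elif w != parent and disc[w] < disc[v]:  # back edge to an ancestor
                edge_stack.append((v, w))
                low[v] = min(low[v], disc[w])
        if parent is None and children > 1:  # DFS root with several subtrees
            arts.add(v)

    for v in adj:
        if v not in disc:
            dfs(v, None)
    return comps, arts


def build(edges):
    adj = {}
    for x, y in edges:
        adj.setdefault(x, []).append(y)
        adj.setdefault(y, []).append(x)
    return adj


def component_with_edge(comps, u, v):
    # Two distinct biconnected components share at most one vertex,
    # so exactly one of them contains both endpoints of an existing edge.
    return next(c for c in comps if u in c and v in c)


# Hypothetical example: triangle a-b-c with a tail c-d-e
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")]
before, arts_before = biconnected_components(build(edges))
after, arts_after = biconnected_components(build(edges + [("b", "d")]))
print(sorted(sorted(c) for c in before), sorted(arts_before))
# [['a', 'b', 'c'], ['c', 'd'], ['d', 'e']] ['c', 'd']
print(sorted(sorted(c) for c in after), sorted(arts_after))
# [['a', 'b', 'c', 'd'], ['d', 'e']] ['d']
print(sorted(component_with_edge(after, "b", "d")))  # ['a', 'b', 'c', 'd']
```

Inserting b-d merges two components into one, and c stops being an articulation vertex while d remains one.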
Filtering with identical vertices
• Two types of identical vertices:
  • Type I: u and v are identical vertices if their neighbor lists are the same, i.e., Γ(u) = Γ(v)
  • Type II: u and v are identical vertices if they are adjacent and their neighbor lists are otherwise the same, i.e., {u} ∪ Γ(u) = {v} ∪ Γ(v)
• If u and v are identical vertices, their cc values are the same
• Same breadth-first search trees, up to swapping the roles of u and v!
Filtering with identical vertices
• Let V_ID ⊆ V be a vertex class containing type-I or type-II identical vertices. Then the cc values of all the vertices in V_ID are equal
• Applying SSSP from only one of them is enough!
• Type-I and type-II identical vertices are found by simply hashing the neighbor lists
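A Python sketch of the hashing step, assuming an adjacency dict: the neighbour set Γ(v) (type I) and {v} ∪ Γ(v) (type II) are used as dictionary keys, which Python hashes and then compares on collision, so the grouping is exact. The paper's actual hash scheme is not given on the slides, and the example graph and function names are hypothetical. The example also confirms that farness, hence cc, is equal within each class.

```python
from collections import defaultdict, deque


def identical_classes(adj):
    """Group vertices into type-I and type-II identical-vertex classes by hashing neighbour sets."""
    type1, type2 = defaultdict(list), defaultdict(list)
    for v, nbrs in adj.items():
        key = frozenset(nbrs)
        type1[key].append(v)        # type I key: Γ(v)
        type2[key | {v}].append(v)  # type II key: {v} ∪ Γ(v)
    return [c for c in type1.values() if len(c) > 1] + [
        c for c in type2.values() if len(c) > 1
    ]


def farness(adj, s):
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return sum(dist.values())


# Hypothetical graph: p and q are adjacent hubs shared by x, y and z
adj = {
    "p": ["q", "x", "y", "z"],
    "q": ["p", "x", "y", "z"],
    "x": ["p", "q"],
    "y": ["p", "q"],
    "z": ["p", "q"],
}
for cls in identical_classes(adj):
    print(cls, [farness(adj, v) for v in cls])
# prints ['x', 'y', 'z'] [6, 6, 6]  (a type-I class)
# then   ['p', 'q'] [4, 4]          (a type-II class)
```

Here one SSSP per class, two in total, would replace five.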
SSSP Hybridization
• BFS can be done in two ways:
  • Top-down: uses the vertices at distance k to find the vertices at distance k+1
  • Bottom-up: after all distance-k vertices are found, all other unprocessed vertices are checked to see whether they are neighbors of a distance-k vertex
• Top-down is expected to be better for small k values
• Following the idea of Beamer et al. [SC'12], we apply a hybrid approach:
  • Simply compare the number of edges to be processed at level k
  • Choose the cheaper option
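A sketch of the hybrid SSSP in Python. At each level it compares the number of edges a top-down step would scan (degrees of the frontier) with the number a bottom-up step would scan (degrees of the unvisited vertices) and runs the cheaper one. This plain comparison is a simplified reading of the slide; Beamer et al. use tuned switching thresholds instead. The function name and adjacency-dict input are assumptions.

```python
def hybrid_bfs(adj, src):
    """Level-synchronous BFS choosing top-down or bottom-up at every level.

    adj maps each vertex to a list of neighbours (undirected graph).
    Returns hop distances from src; unreachable vertices are absent.
    """
    dist = {src: 0}
    frontier = [src]
    unvisited = set(adj) - {src}
    unvisited_edges = sum(len(adj[v]) for v in unvisited)
    level = 0
    while frontier and unvisited:
        level += 1
        frontier_edges = sum(len(adj[v]) for v in frontier)
        next_frontier = []
        if frontier_edges <= unvisited_edges:
            # Top-down: every frontier vertex claims its unvisited neighbours
            for v in frontier:
                for w in adj[v]:
                    if w not in dist:
                        dist[w] = level
                        next_frontier.append(w)
        else:
            # Bottom-up: every unvisited vertex looks for a parent in the frontier,
            # stopping at the first one it finds
            in_frontier = set(frontier)
            for w in unvisited:
                if any(x in in_frontier for x in adj[w]):
                    dist[w] = level
                    next_frontier.append(w)
        unvisited.difference_update(next_frontier)
        unvisited_edges -= sum(len(adj[w]) for w in next_frontier)
        frontier = next_frontier
    return dist


print(hybrid_bfs({0: [1], 1: [0, 2], 2: [1]}, 0))  # {0: 0, 1: 1, 2: 2}
```

On this small path the first level runs top-down (1 frontier edge against 3 unvisited edges) and the second bottom-up (2 against 1); the unvisited-edge count is updated incrementally so that choosing a direction does not rescan the whole graph.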
Experiments
• The techniques are evaluated on large real-world social networks of different sizes and types
Table I (times in sec.)
Graph              |V|      |E|      Org.      Best
hep-th             8.3K     15.7K    1.41      0.05
PGPgiantcompo      10.6K    24.3K    4.96      0.04
astro-ph           16.7K    121.2K   14.56     0.36
cond-mat-2005      40.4K    175.6K   77.90     2.87
soc-sign-epinions  131K     711K     778       6.25
loc-gowalla        196K     950K     2,267     53.18
web-NotreDame      325K     1,090K   2,845     53.06
amazon0601         403K     2,443K   14,903    298
web-Google         875K     4,322K   65,306    824
wiki-Talk          2,394K   4,659K   175,450   922
DBLP-coauthor      1,236K   9,081K   115,919   251

Geometric-mean speedup: 43.5 for the first four graphs and 99.8 for the remaining seven.
Probability Distribution
• Bars show the distribution of the random variable X = |d_G(u, w) − d_G(v, w)|, the level difference of the endpoints of an inserted edge uv as seen from a vertex w, into the three cases X = 0, X = 1 and X > 1
[Figure 4: bar chart of Pr(X = 0), Pr(X = 1) and Pr(X > 1)]

[Background text from the paper: filtering with identical vertices is not as useful as the other two techniques in the work filter.]
Speedups
• ~100 times better
• Random insertions for 10 graphs
• Real insertions for the DBLP-coauthor graph (real temporal data) show larger speedups
• Speedups are w.r.t. full cc computation

Table II: Execution times in seconds of all the algorithms and speedups when compared with the basic closeness centrality algorithm CC. In the table, CC-B is the variant which uses only biconnected decompositions (BCDs), CC-BL uses BCDs and filtering with levels, CC-BLI uses all three work filtering techniques including identical vertices, and CC-BLIH uses all the techniques described in this paper including SSSP hybridization.

                   Time (secs)                                              Speedup                         Filter
Graph              CC           CC-B        CC-BL      CC-BLI     CC-BLIH   CC-B   CC-BL   CC-BLI  CC-BLIH  time (secs)
hep-th             1.413        0.317       0.057      0.053      0.048     4.5    24.8    26.6    29.4     0.001
PGPgiantcompo      4.960        0.431       0.059      0.055      0.045     11.5   84.1    89.9    111.2    0.001
astro-ph           14.567       9.431       0.809      0.645      0.359     1.5    18.0    22.6    40.5     0.004
cond-mat-2005      77.903       39.049      5.618      4.687      2.865     2.0    13.9    16.6    27.2     0.010
Geometric mean     9.444        2.663       0.352      0.306      0.217     3.5    26.8    30.7    43.5     0.003
soc-sign-epinions  778.870      257.410     20.603     19.935     6.254     3.0    37.8    39.1    124.5    0.041
loc-gowalla        2,267.187    1,270.820   132.955    135.015    53.182    1.8    17.1    16.8    42.6     0.063
web-NotreDame      2,845.367    579.821     118.861    83.817     53.059    4.9    23.9    33.9    53.6     0.050
amazon0601         14,903.080   11,953.680  540.092    551.867    298.095   1.2    27.6    27.0    50.0     0.158
web-Google         65,306.600   22,034.460  2,457.660  1,701.249  824.417   3.0    26.6    38.4    79.2     0.267
wiki-Talk          175,450.720  25,701.710  2,513.041  2,123.096  922.828   6.8    69.8    82.6    190.1    0.491
DBLP-coauthor      115,919.518  18,501.147  288.269    251.557    252.647   6.2    402.1   460.8   458.8    0.530
Geometric mean     13,884.152   4,218.031   315.777    273.036    139.170   3.2    43.9    50.8    99.7     0.146
• Biconnected decomposition brings 3x speedup
• Filtering with level differences provides 14x speedup
• Filtering with identical vertices brings 1.15x speedup
• Hybridization brings 2x speedup

[Background text from the paper's acknowledgment: NPRP grant 4-1454-1-233 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.]
Conclusion
• First algorithms for incremental closeness centrality computation
• Update time for real temporal data (DBLP-coauthor) is reduced from 1.3 days to 4.2 mins
• Fundamental building block for streaming workloads and the centrality management problem
• Future work:
  • Sampling-based solutions
  • Parallelization
• A.E. Sarıyüce, E. Saule, K. Kaya, Ümit V. Çatalyürek. STREAMER: a Distributed Framework for Incremental Closeness Centrality Computation, IEEE Cluster 2013.
Thanks
• For more information
  • Email: [email protected]
  • Visit http://bmi.osu.edu/~umit or http://bmi.osu.edu/hpc
• Acknowledgement of Support