Incremental Algorithms for Closeness Centrality

Author: Avis Higgins
A. Erdem Sarıyüce 1,2, Kamer Kaya 1, Erik Saule 1*, Ümit V. Çatalyürek 1,3

1 Department of Biomedical Informatics
2 Department of Computer Science & Engineering
3 Department of Electrical & Computer Engineering
The Ohio State University

* Department of Computer Science, University of North Carolina Charlotte

IEEE BigData 2013, Santa Clara, CA

Massive Graphs are everywhere
• Facebook has a billion users and a trillion connections
• Twitter has more than 200 million users

[Figure: citation graphs, with regions labeled Topic 1 to Topic 6]

Large(r) Networks and Centrality
• Who is more important in a network? Who controls the flow between nodes?
• Centrality metrics answer these questions
• Closeness Centrality (CC) is an intriguing metric
• How to handle changes?
• Incremental algorithms are essential

[Figure (a): A toy social network with various types of vertices: Arthur is an articulation vertex, Diana is a side vertex, Jack and Martin are degree-1 vertices, and Amy and May are identical vertices.]

Closeness Centrality (CC)
• Let G = (V, E) be a graph with vertex set V and edge set E (n = |V|, m = |E|)
• Farness (far) of a vertex is the sum of shortest distances to each vertex:

  $far[u] = \sum_{v \in V,\ d_G(u,v) \neq \infty} d_G(u, v)$

• Closeness centrality (cc) of a vertex:

  $cc[u] = \frac{1}{far[u]}$   (1)

• If u cannot reach any vertex in the graph, cc[u] = 0
• Best algorithm: all-pairs shortest paths
  • O(|V|·|E|) complexity for unweighted networks
• For large and dynamic networks
  • Computation from scratch is infeasible
  • Faster solutions are essential

For a sparse unweighted graph G = (V, E), the complexity of cc computation is O(n(m + n)) [2]. For each vertex s ∈ V, Algorithm 1 executes a Single-Source Shortest Paths (SSSP) computation, i.e., it initiates a breadth-first search (BFS) from s, computes the distances to the other vertices, and accumulates far[s], the sum of the distances which are different than ∞. As the last step, it computes cc[s]. Since a BFS takes O(m + n) time, and n SSSPs are required in total, the complexity follows.

[Figure 2: Pr(d(u,v) = x), the probability that the distance between two vertices is equal to x]

CC Algorithm

Algorithm 1: CC: Basic centrality computation
Data: G = (V, E)
Output: cc[.]
for each s ∈ V do                          (SSSP is computed for each vertex)
    ▷ SSSP(G, s) with centrality computation
    Q ← empty queue
    d[v] ← ∞, ∀v ∈ V \ {s}
    Q.push(s), d[s] ← 0
    far[s] ← 0
    while Q is not empty do                (breadth-first search with farness computation)
        v ← Q.pop()
        for all w ∈ Γ_G(v) do
            if d[w] = ∞ then
                Q.push(w)
                d[w] ← d[v] + 1
                far[s] ← far[s] + d[w]
    cc[s] = 1 / far[s]                     (cc value is assigned)
return cc[.]
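Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the adjacency-dictionary representation, the function name `closeness_centrality`, and the toy graph are assumptions made for the example. Vertices that reach no other vertex get cc = 0, matching the convention on the previous slide.

```python
from collections import deque


def closeness_centrality(adj):
    """Compute cc[s] = 1 / far[s] with one BFS per vertex.

    adj: dict mapping each vertex to an iterable of its neighbors
         (undirected, unweighted graph; this layout is an assumption).
    """
    cc = {}
    for s in adj:
        dist = {s: 0}            # a vertex missing from dist is at distance infinity
        far = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:            # first visit gives the shortest distance
                    dist[w] = dist[v] + 1
                    far += dist[w]
                    queue.append(w)
        cc[s] = 1.0 / far if far > 0 else 0.0
    return cc


# Hypothetical toy graph: the path a - b - c plus an isolated vertex z
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "z": []}
cc = closeness_centrality(graph)
```

Each BFS costs O(m + n), so the whole loop costs O(n(m + n)), which is why recomputing from scratch after every edge change becomes expensive.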


Incremental Closeness Centrality
• Problem definition: Given a graph G = (V, E), the closeness centrality values cc of its vertices, and an inserted (or removed) edge uv, find the closeness centrality values cc' of the graph G' = (V, E ∪ {uv}) (or G' = (V, E \ {uv}))
• Computing cc values from scratch after each edge change is very costly
• Need a faster algorithm


Filtering Techniques
• We aim to reduce the number of SSSPs to be executed
• Three filtering techniques are proposed
  • Filtering with level differences
  • Filtering with biconnected components
  • Filtering with identical vertices
• And an additional SSSP hybridization technique


Filtering with level differences
• Upon edge insertion, the breadth-first search tree of each vertex can change. Three possibilities for a source vertex s and an inserted edge uv:
  • Case 1: d(s, u) = d(s, v)
  • Case 2: |d(s, u) − d(s, v)| = 1
  • Case 3: |d(s, u) − d(s, v)| > 1
• Case 1 and 2 will not change cc of s!
  • No need to apply SSSP from them
• Just Case 3
• How to find such vertices?
  • BFSs are executed from u and v and the level difference is checked

Filtering with level differences

SSSPs are executed from u and v to all other vertices, and we filter the vertices satisfying the statement of Theorem 1.

Algorithm 2: Simple work filtering
Data: G = (V, E), cc[.], uv
Output: cc'[.]
G' ← (V, E ∪ {uv})
du[.] ← SSSP(G, u)        ▷ distances from u in G
dv[.] ← SSSP(G, v)        ▷ distances from v in G
for each s ∈ V do
    if |du[s] − dv[s]| ≤ 1 then            (Case 1 and 2)
        cc'[s] = cc[s]
    else                                   (Case 3)
        ▷ use the computation in Algorithm 1 with G'
return cc'[.]
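The level-difference filter can be sketched as below. This is an illustrative reading of Algorithm 2, not the authors' code: the helper names (`bfs_distances`, `closeness_of`, `update_after_insertion`) and the set-based adjacency dictionary are assumptions. One case is handled explicitly that the pseudocode leaves implicit: a vertex that reaches neither u nor v is skipped, since the new edge cannot affect it.

```python
from collections import deque
import math


def bfs_distances(adj, source):
    """Distances from source; unreachable vertices keep math.inf."""
    dist = {x: math.inf for x in adj}
    dist[source] = 0
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for w in adj[x]:
            if dist[w] == math.inf:
                dist[w] = dist[x] + 1
                queue.append(w)
    return dist


def closeness_of(adj, s):
    """cc[s] = 1 / far[s], or 0 when s reaches no other vertex."""
    far = sum(d for d in bfs_distances(adj, s).values() if d != math.inf)
    return 1.0 / far if far > 0 else 0.0


def update_after_insertion(adj, cc, u, v):
    """Insert edge u-v into adj (in place) and update cc with level filtering.

    Returns the new cc dict and the set of vertices that needed a new SSSP.
    """
    du = bfs_distances(adj, u)     # distances in G, before the insertion
    dv = bfs_distances(adj, v)
    adj[u].add(v)
    adj[v].add(u)
    new_cc = dict(cc)
    recomputed = set()
    for s in adj:
        if du[s] == math.inf and dv[s] == math.inf:
            continue               # s reaches neither endpoint: unaffected
        if abs(du[s] - dv[s]) <= 1:
            continue               # Case 1 and 2: cc[s] is unchanged
        new_cc[s] = closeness_of(adj, s)   # Case 3: rerun SSSP on G'
        recomputed.add(s)
    return new_cc, recomputed


# Hypothetical example: the path 0-1-2-3-4 becomes a 5-cycle when 0-4 is added
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
old_cc = {s: closeness_of(path, s) for s in path}
new_cc, recomputed = update_after_insertion(path, old_cc, 0, 4)
```

In this example the middle vertex 2 is equally far from both endpoints, so it is filtered and keeps its old value, while the other four vertices are recomputed.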


Filtering with biconnected components
• What if the graph has articulation points?

[Figure: a graph split into parts A and B, with vertices u and v marked]

• A change in A can change cc of any vertex in A and B
• Computing the change for u is enough for finding changes for any vertex v in B (a constant factor is added)
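The claim about part B can be checked on a small example. The sketch below assumes one reading of the figure: u is the articulation vertex shared by A and B, and the inserted edge lies inside A. Under that assumption every shortest path from a vertex w in B to a vertex in A passes through u, so the change in farness of w equals the change in farness of u. The graph, the vertex names, and the helper functions are hypothetical.

```python
from collections import deque


def farness(adj, s):
    """Sum of BFS distances from s to every vertex it can reach."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        for w in adj[x]:
            if w not in dist:
                dist[w] = dist[x] + 1
                queue.append(w)
    return sum(dist.values())


def build(edges):
    """Build an undirected adjacency dictionary from an edge list."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    return adj


# Part A: the cycle u-a1-a2-a3-a4-u. Part B: the path u-b1-b2.
part_a = [("u", "a1"), ("a1", "a2"), ("a2", "a3"), ("a3", "a4"), ("a4", "u")]
part_b = [("u", "b1"), ("b1", "b2")]

before = build(part_a + part_b)
after = build(part_a + part_b + [("u", "a2")])   # edge inserted inside A

delta_u = farness(after, "u") - farness(before, "u")
delta_b = {w: farness(after, w) - farness(before, w) for w in ("b1", "b2")}
```

Once the change for u is known, the new farness of any vertex in B follows by adding that same difference, and cc is its reciprocal, so no SSSP from the vertices of B is needed.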


Filtering with biconnected components
• Maintain the biconnected decomposition

[Figure: biconnected decomposition updated when edge b-d is added]


Filtering with identical vertices
• Two types of identical vertices:
  • Type I: u and v are identical vertices if their neighbor lists are the same, i.e., Γ(u) = Γ(v)
  • Type II: u and v are identical vertices if their neighbor lists are the same and they are also connected, i.e., {u} ∪ Γ(u) = {v} ∪ Γ(v)
• If u and v are identical vertices, their cc values are the same
  • Same breadth-first search trees!

Filtering with identical vertices
• Let V_ID ⊆ V be a vertex class containing type-I or type-II identical vertices. Then the cc values of all the vertices in V_ID are equal
  • Applying SSSP from only one of them is enough!
• Type-I and type-II identical vertices are found by simply hashing the neighbor lists
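Hashing the neighbor lists can be sketched with dictionaries keyed by frozen sets. This is an illustrative approach, not necessarily the hashing scheme used by the authors: the function name `identical_vertex_classes` and the toy graph are assumptions. Type-I classes group vertices by Γ(v), and type-II classes group them by {v} ∪ Γ(v).

```python
from collections import defaultdict


def identical_vertex_classes(adj):
    """Group vertices into type-I and type-II identical classes.

    adj: dict mapping each vertex to an iterable of its neighbors.
    Returns two lists of classes; only classes with 2+ vertices are kept.
    """
    type1 = defaultdict(list)    # key: open neighborhood  Γ(v)
    type2 = defaultdict(list)    # key: closed neighborhood {v} ∪ Γ(v)
    for v, nbrs in adj.items():
        open_nbrs = frozenset(nbrs)          # hashable, order-independent key
        type1[open_nbrs].append(v)
        type2[open_nbrs | {v}].append(v)
    classes1 = [sorted(c) for c in type1.values() if len(c) > 1]
    classes2 = [sorted(c) for c in type2.values() if len(c) > 1]
    return classes1, classes2


# Hypothetical graph: x and y hang off c; p and q hang off c and touch each other
graph = {
    "c": ["x", "y", "p", "q"],
    "x": ["c"],
    "y": ["c"],
    "p": ["c", "q"],
    "q": ["c", "p"],
}
classes1, classes2 = identical_vertex_classes(graph)
```

Here x and y form a type-I class and p and q form a type-II class, so one SSSP per class is enough to obtain the cc value of every member.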


SSSP Hybridization
• BFS can be done in two ways:
  • Top-down: Uses the vertices at distance k to find the vertices at distance k+1
  • Bottom-up: After all distance-k vertices are found, all other unprocessed vertices are processed to see whether they are neighbors of a distance-k vertex
• Top-down is expected to be better for small k values
• Following the idea of Beamer et al. [SC'12], we apply a hybrid approach
  • Simply compare the # of edges to be processed at level k
  • Choose the cheaper option
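A level-synchronous BFS that switches direction per level can be sketched as below. The cost estimates are an assumption made for illustration: the top-down cost is taken as the sum of the degrees of the frontier vertices and the bottom-up cost as the sum of the degrees of the unvisited vertices. The slide only says that the numbers of edges to be processed are compared, so the actual estimate used by the authors may differ. The function name `hybrid_bfs` and the star graph are hypothetical.

```python
def hybrid_bfs(adj, source):
    """BFS that picks top-down or bottom-up at each level by an edge-count estimate.

    adj: dict mapping each vertex to a collection of its neighbors.
    Returns the distance dict and the direction chosen at each level.
    """
    dist = {source: 0}
    frontier = {source}
    unvisited = set(adj) - frontier
    level = 0
    steps = []
    while frontier:
        # Assumed cost model: edges touched by each direction at this level
        top_down_cost = sum(len(adj[v]) for v in frontier)
        bottom_up_cost = sum(len(adj[w]) for w in unvisited)
        next_frontier = set()
        if top_down_cost <= bottom_up_cost:
            steps.append("top-down")
            for v in frontier:                   # expand from the frontier
                for w in adj[v]:
                    if w not in dist:
                        dist[w] = level + 1
                        next_frontier.add(w)
        else:
            steps.append("bottom-up")
            for w in unvisited:                  # each unvisited vertex looks for a parent
                if any(x in frontier for x in adj[w]):
                    dist[w] = level + 1
                    next_frontier.add(w)
        unvisited -= next_frontier
        frontier = next_frontier
        level += 1
    return dist, steps


# Hypothetical star graph: center 0 with leaves 1..5, BFS started from leaf 1
star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
dist, steps = hybrid_bfs(star, 1)
```

In the star example the first level is cheap top-down (one edge), while the second level is cheaper bottom-up because the four remaining leaves have fewer incident edges than the center.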


Experiments
• The techniques are evaluated on different sizes and types of large real-world social networks

Table I

Graph name          |V|       |E|       Time (in sec.)
                                        Org.       Best
hep-th              8.3K      15.7K     1.41       0.05
PGPgiantcompo       10.6K     24.3K     4.96       0.04
astro-ph            16.7K     121.2K    14.56      0.36
cond-mat-2005       40.4K     175.6K    77.90      2.87

soc-sign-epinions   131K      711K      778        6.25
loc-gowalla         196K      950K      2,267      53.18
web-NotreDame       325K      1,090K    2,845      53.06
amazon0601          403K      2,443K    14,903     298
web-Google          875K      4,322K    65,306     824
wiki-Talk           2,394K    4,659K    175,450    922
DBLP-coauthor       1,236K    9,081K    115,919    251

Annotated speedups: 43.5 for the four smaller graphs and 99.8 for the seven larger graphs.

Probability Distribution
• Bars show the distribution of the random variable of level differences into three cases when an edge is inserted

[Figure 4: The bars show the distribution of random variable X = |d_G(u, w) − d_G(v, w)| into three cases we investigated when an edge uv is added. Legend: Pr(X = 0), Pr(X = 1), Pr(X > 1).]

[…] is more useful for the graphs having characteristics similar to small-world networks.

Filtering with identical vertices is not as useful as the other two techniques in the work filter.

Speedups
• Random insertions for 10 graphs
• Real insertions for DBLP-coauthor graph
• Speedups are w.r.t. full cc computation
• ~100 times better
• Real temporal data shows larger speedups

Table II. Execution times in seconds of all the algorithms and speedups when compared with the basic closeness centrality algorithm CC. In the table, CC-B is the variant which uses only BCDs, CC-BL uses BCDs and filtering with levels, and CC-BLI uses all three work filtering techniques including identical vertices. CC-BLIH uses all the techniques described in this paper including SSSP hybridization.

Time (secs)
Graph               CC             CC-B          CC-BL       CC-BLI      CC-BLIH
hep-th              1.413          0.317         0.057       0.053       0.048
PGPgiantcompo       4.960          0.431         0.059       0.055       0.045
astro-ph            14.567         9.431         0.809       0.645       0.359
cond-mat-2005       77.903         39.049        5.618       4.687       2.865
Geometric mean      9.444          2.663         0.352       0.306       0.217
soc-sign-epinions   778.870        257.410       20.603      19.935      6.254
loc-gowalla         2,267.187      1,270.820     132.955     135.015     53.182
web-NotreDame       2,845.367      579.821       118.861     83.817      53.059
amazon0601          14,903.080     11,953.680    540.092     551.867     298.095
web-Google          65,306.600     22,034.460    2,457.660   1,701.249   824.417
wiki-Talk           175,450.720    25,701.710    2,513.041   2,123.096   922.828
DBLP-coauthor       115,919.518    18,501.147    288.269     251.557     252.647
Geometric mean      13,884.152     4,218.031     315.777     273.036     139.170

Speedups and filter time
Graph               CC-B    CC-BL    CC-BLI   CC-BLIH   Filter time (secs)
hep-th              4.5     24.8     26.6     29.4      0.001
PGPgiantcompo       11.5    84.1     89.9     111.2     0.001
astro-ph            1.5     18.0     22.6     40.5      0.004
cond-mat-2005       2.0     13.9     16.6     27.2      0.010
Geometric mean      3.5     26.8     30.7     43.5      0.003
soc-sign-epinions   3.0     37.8     39.1     124.5     0.041
loc-gowalla         1.8     17.1     16.8     42.6      0.063
web-NotreDame       4.9     23.9     33.9     53.6      0.050
amazon0601          1.2     27.6     27.0     50.0      0.158
web-Google          3.0     26.6     38.4     79.2      0.267
wiki-Talk           6.8     69.8     82.6     190.1     0.491
DBLP-coauthor       6.2     402.1    460.8    458.8     0.530
Geometric mean      3.2     43.9     50.8     99.7      0.146

• Biconnected decomposition brings 3x speedup
• Level differences filtering provides 14x speedup
• Filtering with identical vertices brings 1.15x speedup
• Hybridization brings 2x speedup

[…] impact of level filtering can also be seen on Figure 5.

[…] NPRP grant 4-1454-1-233 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
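The summary rows and the headline figures can be cross-checked with a few lines of arithmetic. The sketch below uses only values copied from Table II; the helper name `geometric_mean` is an assumption. Because the table entries are rounded, the recomputed geometric mean is expected to agree with the reported 43.5 only to within rounding.

```python
import math


def geometric_mean(values):
    """n-th root of the product, computed through logarithms for stability."""
    return math.exp(sum(math.log(x) for x in values) / len(values))


# CC-BLIH speedups of the four smaller graphs (Table II)
small_speedups = [29.4, 111.2, 40.5, 27.2]
gm_small = geometric_mean(small_speedups)

# DBLP-coauthor execution times in seconds (Table II): CC and CC-BLIH
cc_time, blih_time = 115919.518, 252.647
dblp_speedup = cc_time / blih_time      # compare with the 458.8 entry
cc_days = cc_time / 86400               # seconds to days
blih_minutes = blih_time / 60           # seconds to minutes
```

The last two conversions correspond to the "1.3 days to 4.2 mins" reduction quoted for the real temporal data.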


Conclusion
• First algorithms for incremental closeness centrality computation
• Update time on real temporal data is reduced from 1.3 days to 4.2 mins
• Fundamental building block for streaming workloads and the centrality management problem
• Future Work:
  • Sampling-based solutions
  • Parallelization
    • A.E. Sarıyüce, E. Saule, K. Kaya, Ümit V. Çatalyürek. STREAMER: a Distributed Framework for Incremental Closeness Centrality Computation, IEEE Cluster 2013.

Thanks
• For more information
  • Email [email protected]
  • Visit http://bmi.osu.edu/~umit or http://bmi.osu.edu/hpc
• Acknowledgement of Support