Graph Algorithms using MapReduce

Graph Algorithms using MapReduce Pruthvi (200601076) Nishanth (200601055) Sundeep (200601041) What’s a Graph?  G = (V,E), where     Diffe...
Author: Scott Morris
Graph Algorithms using MapReduce Pruthvi (200601076) Nishanth (200601055) Sundeep (200601041)

What’s a Graph?

G = (V,E), where
• V represents the set of vertices (nodes)
• E represents the set of edges (links)
• Both vertices and edges may contain additional information

Different types of graphs:
• Directed vs. undirected edges
• Presence or absence of cycles

Graphs are everywhere:
• Hyperlink structure of the Web
• Physical structure of computers on the Internet
• Interstate highway system
• Social networks

Graphs and MapReduce

Graph algorithms typically involve:
– Performing computation at each node
– Processing node-specific data, edge-specific data, and link structure
– Traversing the graph in some manner

Key questions:
– How do you represent graph data in MapReduce?
– How do you traverse a graph in MapReduce?

Representing Graphs

For computational purposes, working directly with the abstract definition G = (V,E) is inefficient, so graphs are usually stored in one of two ways:
• Adjacency matrix
• Adjacency lists

We chose adjacency lists for our project because:
• They are much more compact (we throw away all the zeros that take up space in the adjacency matrix)
• It is easy to compute over edges (outlinks)
• The graph structure can be broken up and distributed
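To make the comparison between the two representations concrete, here is a minimal Python sketch. The 4-node graph, the function names, and the tab-separated record layout are all assumptions made for illustration; the slides do not specify an input format.

```python
# Hypothetical 4-node directed graph stored as an adjacency matrix.
# Most entries are zero, which is the wasted space mentioned above.
matrix = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]


def matrix_to_adjacency_list(m):
    """Keep only the non-zero entries: node -> list of outlinks."""
    return {u: [v for v, bit in enumerate(row) if bit] for u, row in enumerate(m)}


def to_records(adjacency):
    """One line per node ("node<TAB>outlinks"), so the graph can be split
    line by line across map tasks. The layout is an assumed one."""
    return [f"{u}\t{','.join(map(str, outs))}" for u, outs in sorted(adjacency.items())]


adjacency = matrix_to_adjacency_list(matrix)
records = to_records(adjacency)
```

Each record carries everything a map task needs about one node, which is why this form distributes more easily than rows of a large, mostly empty matrix.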

Graph Problems Addressed
• Breadth First Search
• Dijkstra’s Shortest Path First
• Depth First Search

Breadth First Search
• Breadth-first search (BFS) is a graph search algorithm that begins at the root node and explores all of its neighbouring nodes. Then, for each of those neighbours, it explores their unexplored neighbours, and so on, until it finds the goal.

[Figure: BFS example graph with numbered nodes]

Continued…

Every node in the graph is assigned one of three colors:
• WHITE: the node has not been visited
• GREY: the node has been visited, but its children have not
• BLACK: the node and its children have been visited

Initially:
• All the nodes in the graph are WHITE, except for the root, which is GREY
• The distance from the root to every other node is infinite (INT_MAX)

Every node in the graph, along with its children, is sent to a Map task
• Input to Map: (node, outlinks)

The outputs from all the Map tasks are sent to the Reduce task, whose output is split among Map tasks again if another iteration is needed

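Algorithm

Map(Node) {
    If Node.Color is GREY:
        For each ChildNode in Node.outlinks:
            Emit ChildNode with distance = Node.distance + 1 and Color = GREY;
        Node.Color = BLACK;
    Emit Node;    (keeps the outlinks available for the next iteration)
}

Reduce(NodeId, ListOfNodes(outlinks, color, distance)) {
    Node.outlinks = the non-empty outlinks list;
    Node.distance = MinDistance;
    Node.Color = Darkest Color;
    Emit Node;
}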
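The Map and Reduce steps above can be sketched in plain Python, with the shuffle phase simulated by an in-memory dictionary. This is only an illustration under stated assumptions: the record layout `(outlinks, color, distance)`, the function names, and the sample graph are hypothetical, and a real job would run on Hadoop rather than in a single process.

```python
from collections import defaultdict

INF = float("inf")  # stands in for INT_MAX on the slides
RANK = {"WHITE": 0, "GREY": 1, "BLACK": 2}  # darker colors rank higher


def bfs_map(node_id, node):
    """Expand a GREY node: emit its children as GREY, then blacken it."""
    outlinks, color, distance = node
    if color == "GREY":
        for child in outlinks:
            # The child's own outlinks are unknown here, so emit an empty list.
            yield child, ([], "GREY", distance + 1)
        color = "BLACK"
    # Always re-emit the node itself so its outlinks survive the iteration.
    yield node_id, (outlinks, color, distance)


def bfs_reduce(node_id, values):
    """Merge all records for one node: real outlinks, min distance, darkest color."""
    outlinks = next((o for o, _, _ in values if o), [])
    distance = min(d for _, _, d in values)
    color = max((c for _, c, _ in values), key=RANK.get)
    return outlinks, color, distance


def run_iteration(graph):
    """One MapReduce round; grouping by key simulates the shuffle."""
    grouped = defaultdict(list)
    for node_id, node in graph.items():
        for key, value in bfs_map(node_id, node):
            grouped[key].append(value)
    return {key: bfs_reduce(key, values) for key, values in grouped.items()}


# Hypothetical input: node 1 is the root, everything else is undiscovered.
graph = {
    1: ([2, 3], "GREY", 0),
    2: ([4], "WHITE", INF),
    3: ([4], "WHITE", INF),
    4: ([], "WHITE", INF),
}
after_one = run_iteration(graph)
after_two = run_iteration(after_one)
```

After the first round the root is BLACK and nodes 2 and 3 are GREY at distance 1; the second round reaches node 4 at distance 2.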

Multiple Iterations Needed

This MapReduce task advances the known frontier by one hop in every iteration
• Subsequent iterations include more reachable nodes as the frontier advances
• Multiple iterations are needed to explore the entire graph
• Feed the output back into the same MapReduce task

Preserve the graph structure in every iteration

When do we Stop?
• Eventually, all the nodes will be discovered and we need to stop. So, ideally, we need a counter (state) that tells us when to stop iterating
• Two methods to keep track of the state
• First:
  – Create a directory in HDFS whenever another iteration is needed
  – Delete the directory when no more iterations are required
• Second:
  – Make an RPC call to a server that updates the state for every iteration
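The first method can be illustrated with a small driver loop. This is a sketch under assumptions: a local temporary directory stands in for the HDFS directory, `bfs_step` is a simplified single-process replacement for one MapReduce round, every child is assumed to appear as a key in the graph, and all names are hypothetical.

```python
import tempfile
from pathlib import Path

INF = float("inf")


def bfs_step(graph):
    """Simplified stand-in for one MapReduce round over
    {node: (outlinks, color, distance)}."""
    new = {n: list(rec) for n, rec in graph.items()}
    for node, (outlinks, color, distance) in graph.items():
        if color != "GREY":
            continue
        for child in outlinks:
            new[child][2] = min(new[child][2], distance + 1)
            if new[child][1] == "WHITE":
                new[child][1] = "GREY"
        new[node][1] = "BLACK"
    return {n: tuple(rec) for n, rec in new.items()}


def run_until_done(graph, flag_dir):
    """Iterate while the flag directory exists; remove it when the frontier is empty."""
    flag_dir = Path(flag_dir)
    flag_dir.mkdir(exist_ok=True)  # the root is GREY, so at least one round is needed
    iterations = 0
    while flag_dir.exists():
        graph = bfs_step(graph)
        iterations += 1
        if not any(color == "GREY" for _, color, _ in graph.values()):
            flag_dir.rmdir()  # no GREY nodes left: signal that we can stop
    return graph, iterations


graph = {
    "a": (["b", "c"], "GREY", 0),
    "b": (["d"], "WHITE", INF),
    "c": (["d"], "WHITE", INF),
    "d": ([], "WHITE", INF),
}
with tempfile.TemporaryDirectory() as tmp:
    result, rounds = run_until_done(graph, Path(tmp) / "more_work")
```

The driver only ever checks whether the directory exists, so the same loop would work if the flag were maintained by the tasks themselves rather than by the driver.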

Dijkstra’s Algorithm
• It is a graph search algorithm that solves the single-source shortest path problem for a graph with nonnegative edge costs, producing a shortest-path tree
• For a given source node in the graph, the algorithm finds the path with the lowest cost (i.e. the shortest path) between that node and every other node

[Figure: sequence of snapshots of Dijkstra’s algorithm on a small weighted graph, showing the tentative distances from the source (0), initially ∞, being lowered step by step]

Dijkstra
• Similar to BFS except for two changes
  – Edges to nodes are now attributed with positive weights
  – So, Map becomes:

Map(Node) {
    If Node.Color is WHITE:
        Set state to TRUE;
    Else:
        For each ChildNode in Node.outlinks:
            ChildNode.distance = Node.distance + weight of edge (Node, ChildNode);
            ChildNode.Color = GREY;
        Node.Color = BLACK;
}

• Reduce is unchanged:

Reduce(ListOfNodes(outlinks, color, distance)) {
    For all Nodes:
        Node.distance = MinDistance;
        Node.Color = Darkest Color;
}
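A weighted variant can be sketched in Python as follows. Note the substitutions: this sketch drops the colors, expands every node whose distance is already finite, and stops when a round changes nothing. That makes it closer to repeated parallel edge relaxation than to sequential Dijkstra with a priority queue, and the stopping rule is an assumption rather than the one on the slide. The sample graph and all names are hypothetical.

```python
from collections import defaultdict

INF = float("inf")


def sssp_map(node_id, node):
    """Relax every outgoing edge of a node that has already been reached."""
    edges, distance = node
    if distance < INF:
        for child, weight in edges:
            # The child's own edges are unknown here, so emit an empty list.
            yield child, ([], distance + weight)
    # Re-emit the node so its edge list survives the iteration.
    yield node_id, (edges, distance)


def sssp_reduce(values):
    """Keep the real edge list and the smallest distance seen for this node."""
    edges = next((e for e, _ in values if e), [])
    return edges, min(d for _, d in values)


def shortest_paths(graph):
    """Repeat map/reduce rounds until no distance changes (assumed stopping rule)."""
    while True:
        grouped = defaultdict(list)
        for node_id, node in graph.items():
            for key, value in sssp_map(node_id, node):
                grouped[key].append(value)
        updated = {key: sssp_reduce(values) for key, values in grouped.items()}
        if updated == graph:
            return {key: distance for key, (_, distance) in updated.items()}
        graph = updated


# Hypothetical graph: node -> (list of (child, weight), distance from source "s").
graph = {
    "s": ([("a", 10), ("b", 5)], 0),
    "a": ([("c", 1)], INF),
    "b": ([("a", 3), ("c", 9)], INF),
    "c": ([], INF),
}
distances = shortest_paths(graph)
```

Here node "a" is first reached at cost 10 and later improved to 8 via "b", which is why a node cannot simply be frozen once it has been expanded when edges carry weights.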

Depth First Search
• DFS is an uninformed search that progresses by expanding the first child node of the search tree that appears, going deeper and deeper until a goal node is found or until it hits a node that has no children. Then the search backtracks, returning to the most recent node it hasn't finished exploring.
• Still working on the MapReduce algorithm for DFS
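For reference, the traversal order described above can be sketched as an ordinary sequential DFS in Python. This is not a MapReduce formulation (the slides state that one was still in progress); the function name and the sample graph are hypothetical.

```python
def dfs_order(adjacency, root):
    """Return nodes in the order a depth-first search first visits them."""
    visited, order = set(), []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push children in reverse so the first child is expanded first.
        stack.extend(reversed(adjacency.get(node, [])))
    return order


# Hypothetical graph: node -> list of children.
adjacency = {1: [2, 3], 2: [4], 3: [4], 4: []}
order = dfs_order(adjacency, 1)
```

The explicit stack plays the role of backtracking: when a node has no unvisited children, the next pop returns to the most recent node with work left.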

Summary
• Store graphs as adjacency lists
• Graph algorithms with MapReduce:
  – Each map task receives a node and its outlinks
  – The map task computes some function of the link structure and emits a value with the target node as the key
  – The reduce task collects the values for each key (target node) and aggregates them
• Iterate multiple MapReduce cycles until some termination condition is met
  – The graph structure is passed from one iteration to the next
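The general pattern in this summary (map over a node and its outlinks, emit values keyed by target, aggregate in reduce) can be shown with a deliberately simple task: counting incoming links per node. The task, the helper `map_reduce`, and the sample graph are assumptions chosen for illustration and do not come from the slides.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Tiny in-memory stand-in for a MapReduce round: map, group by key, reduce."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)
    return {key: reducer(key, values) for key, values in grouped.items()}


def inlink_mapper(node, outlinks):
    """Emit a 1 for every target node, keyed by the target."""
    yield node, 0  # keeps nodes with no incoming links in the output
    for target in outlinks:
        yield target, 1


def inlink_reducer(node, counts):
    """Aggregate all values that arrived for one target node."""
    return sum(counts)


# Hypothetical graph stored as adjacency lists.
adjacency = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
inlinks = map_reduce(adjacency.items(), inlink_mapper, inlink_reducer)
```

Swapping in a different mapper and reducer while keeping `map_reduce` fixed is the same separation of concerns that the BFS and shortest-path formulations rely on.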

Conclusions
• MapReduce is adept at manipulating graphs
• MapReduce explores all paths in parallel
  – Divide and conquer is the norm
  – Throws more hardware at the problem
• The concept of graphs in MapReduce can also be used to solve other problems, such as
  – PageRank, minimum spanning trees, bipartite partition, etc.


THANK YOU
