Graph Algorithms using MapReduce

Graph Algorithms using MapReduce Pruthvi (200601076) Nishanth (200601055) Sundeep (200601041) What’s a Graph?  G = (V,E), where     Diffe...

Author: Scott Morris

Graph Algorithms using MapReduce Pruthvi (200601076) Nishanth (200601055) Sundeep (200601041)

What’s a Graph?
• G = (V,E), where
  – V represents the set of vertices (nodes)
  – E represents the set of edges (links)
  – Both vertices and edges may contain additional information
• Different types of graphs:
  – Directed vs. undirected edges
  – Presence or absence of cycles
• Graphs are everywhere:
  – Hyperlink structure of the Web
  – Physical structure of computers on the Internet
  – Interstate highway system
  – Social networks

Graphs and MapReduce
• Graph algorithms typically involve:
  – Performing computation at each node
  – Processing node-specific data, edge-specific data, and link structure
  – Traversing the graph in some manner
• Key questions:
  – How do you represent graph data in MapReduce?
  – How do you traverse a graph in MapReduce?

Representing Graphs
• For computational purposes, representing a graph literally as G = (V,E) is inefficient, so there are two common ways to represent graphs:
  – Adjacency Matrix
  – Adjacency Lists
• We have chosen "Adjacency Lists" for our project. Reasons:
  – Much more compact representation (for sparse graphs, most entries of an adjacency matrix are zeros, and adjacency lists do not store them)
  – Easy to compute over edges (outlinks)
  – The graph structure can be broken up and distributed
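A minimal Python sketch contrasting the two representations on a small hypothetical directed graph given as an edge list. The matrix stores a cell for every pair of nodes, while the adjacency list stores only existing outlinks. The one-line-per-node text format produced by `to_record` is our own illustrative assumption, chosen to show how each node's record could be handed to a different Map task.

```python
from collections import defaultdict

def to_adjacency_matrix(n, edges):
    """n x n 0/1 matrix: one cell per ordered pair of nodes, edge or not."""
    matrix = [[0] * n for _ in range(n)]
    for src, dst in edges:
        matrix[src][dst] = 1
    return matrix

def to_adjacency_list(n, edges):
    """Map each node to its outlinks; only edges that exist are stored."""
    outlinks = defaultdict(list)
    for src, dst in edges:
        outlinks[src].append(dst)
    return {node: outlinks[node] for node in range(n)}

def to_record(node, outlinks):
    """Hypothetical one-line-per-node text format: '<node>\t<outlink,outlink,...>'."""
    return f"{node}\t{','.join(map(str, outlinks))}"

# Hypothetical 4-node directed graph given as an edge list.
EDGES = [(0, 1), (0, 2), (1, 3), (2, 3)]

matrix = to_adjacency_matrix(4, EDGES)
adj = to_adjacency_list(4, EDGES)
cells = sum(len(row) for row in matrix)       # 16 cells, 12 of them zero
entries = sum(len(v) for v in adj.values())   # 4 stored outlinks
for node, links in adj.items():
    print(to_record(node, links))             # each line could go to a different Map task
```

For n nodes the matrix always has n² cells, while the adjacency list grows only with the number of edges, which is what makes it the better fit for sparse graphs such as the Web.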

Graph Problems Dealt With
• Breadth First Search
• Dijkstra’s Shortest Path First
• Depth First Search

Breadth First Search
• Breadth-first search (BFS) is a graph search algorithm that begins at the root node and explores all the neighbouring nodes. Then for each of those nearest nodes, it explores their unexplored neighbour nodes, and so on, until it finds the goal.
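For reference, a minimal single-machine BFS in Python that records each node's hop distance from the root using a FIFO queue; the graph is a hypothetical adjacency list. The MapReduce algorithm in these slides aims to compute the same distances, one frontier per iteration.

```python
from collections import deque

def bfs_distances(graph, root):
    """Return {node: hops from root} for every node reachable from root."""
    distance = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in distance:          # first visit gives the shortest hop count
                distance[child] = distance[node] + 1
                queue.append(child)
    return distance

# Hypothetical directed graph as an adjacency list.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": [], "F": []}
print(bfs_distances(graph, "A"))
```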

BFS
[Figure: BFS example graph; nodes labelled 1 to 4]

Continued…
• Every node in the graph is marked with one of three colors:
  – WHITE: the node has not been visited
  – GREY: the node has been visited, but its children have not
  – BLACK: the node and its children have been visited
• Initially:
  – All the nodes in the graph are WHITE except the root, which is GREY
  – The root's distance is 0; the distance from the root to every other node is infinite (INT_MAX)
• Every node in the graph, along with its children, is sent to a Map task
  – Input to Map: (node, outlinks)
• The outputs from all the Map tasks are sent to the Reduce task, whose output is again split across Map tasks if another iteration is needed
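Because each Map task sees only one node at a time, a node's outlinks, distance, and color have to travel together in one record. A possible encoding, sketched in Python under our own assumptions: the `node<TAB>children|distance|color` layout, the `Node` dataclass, and `sys.maxsize` as a stand-in for INT_MAX are illustrative choices, not a fixed Hadoop format.

```python
import sys
from dataclasses import dataclass, field

INFINITY = sys.maxsize  # stands in for INT_MAX ("not reached yet")
WHITE, GREY, BLACK = "WHITE", "GREY", "BLACK"

@dataclass
class Node:
    node_id: str
    children: list = field(default_factory=list)
    distance: int = INFINITY
    color: str = WHITE

def parse(line):
    """'B\tD,E|1|GREY' -> Node('B', ['D', 'E'], 1, 'GREY')."""
    node_id, value = line.rstrip("\n").split("\t")
    children, distance, color = value.split("|")
    return Node(node_id, children.split(",") if children else [], int(distance), color)

def serialize(node):
    """Inverse of parse(): one text line per node."""
    return f"{node.node_id}\t{','.join(node.children)}|{node.distance}|{node.color}"

def initial_records(graph, root):
    """Initial state: every node WHITE at INT_MAX, except the root, GREY at 0."""
    for node_id, children in graph.items():
        if node_id == root:
            yield serialize(Node(node_id, children, 0, GREY))
        else:
            yield serialize(Node(node_id, children))

for line in initial_records({"A": ["B", "C"], "B": ["C"], "C": []}, "A"):
    print(line)
```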

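Algorithm
Map(Node) {
    If Node is GREY:
        For All Child Nodes:
            Set ChildNode.distance = Node.distance + 1;
            ChildNode.Color = GREY;
            Emit (ChildNode.id, ChildNode);
        Node.Color = BLACK;
    Emit (Node.id, Node with its outlinks);   // preserves the graph structure
}
Reduce(NodeId, ListofNodes(outlinks, color, distance)) {
    Node.outlinks = the non-empty outlinks in the list;
    Node.distance = MinDistance over the list;
    Node.Color = Darkest Color in the list (WHITE < GREY < BLACK);
    Emit (NodeId, Node);
}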
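A minimal in-memory Python sketch of one iteration of the Map and Reduce above. The `shuffle` helper is a stand-in for the framework's grouping of map output by key; the tuple layout `(children, distance, color)` and the ranking WHITE < GREY < BLACK used for "darkest color" are our assumptions. The mapper re-emits each node with its outlinks so the graph structure survives into the next iteration.

```python
import sys
from collections import defaultdict

INFINITY = sys.maxsize                            # stands in for INT_MAX
DARKNESS = {"WHITE": 0, "GREY": 1, "BLACK": 2}    # order used for "darkest color"

def bfs_map(node_id, children, distance, color):
    """Yield (key, (children, distance, color)) pairs for one input node."""
    if color == "GREY":
        for child in children:
            # The child's outlinks are unknown here; the reducer gets them from
            # the child's own record, so an empty list is emitted.
            yield child, ([], distance + 1, "GREY")
        color = "BLACK"
    yield node_id, (children, distance, color)    # re-emit to preserve the structure

def bfs_reduce(node_id, values):
    """Merge all records for one node: real outlinks, min distance, darkest color."""
    children, distance, color = [], INFINITY, "WHITE"
    for kids, dist, col in values:
        if kids:
            children = kids
        distance = min(distance, dist)
        if DARKNESS[col] > DARKNESS[color]:
            color = col
    return node_id, (children, distance, color)

def shuffle(pairs):
    """Stand-in for the framework's shuffle: group map output by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def bfs_iteration(nodes):
    """One MapReduce pass over {id: (children, distance, color)}."""
    mapped = [pair for nid, rec in nodes.items() for pair in bfs_map(nid, *rec)]
    return dict(bfs_reduce(nid, vals) for nid, vals in shuffle(mapped).items())

# Hypothetical graph: root A is GREY at distance 0, everything else WHITE at INT_MAX.
nodes = {
    "A": (["B", "C"], 0, "GREY"),
    "B": (["D"], INFINITY, "WHITE"),
    "C": (["D"], INFINITY, "WHITE"),
    "D": ([], INFINITY, "WHITE"),
}
after_one = bfs_iteration(nodes)
print(after_one)
```

After one pass, A is BLACK and B and C are GREY at distance 1; D stays WHITE until the next pass expands B and C.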

Multiple Iterations Needed
• This MapReduce task advances the known frontier by one hop in every iteration
  – Subsequent iterations include more reachable nodes as the frontier advances
  – Multiple iterations are needed to explore the entire graph
  – Feed the output back into the same MapReduce task
• Preserve the graph structure in every iteration, so that each iteration's output can serve as the next iteration's input
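Feeding the output back into the same job can be sketched as a driver loop. In this self-contained Python sketch, a hypothetical `one_pass` helper compresses one BFS map, shuffle, and reduce into memory, and `run_bfs` repeats it while any node is still GREY, i.e. while the frontier is non-empty. On a cluster, each pass would instead read the previous pass's output directory.

```python
import sys
from collections import defaultdict

INFINITY = sys.maxsize
DARKNESS = {"WHITE": 0, "GREY": 1, "BLACK": 2}

def one_pass(nodes):
    """Hypothetical helper: one BFS map + shuffle + reduce over {id: (children, distance, color)}."""
    groups = defaultdict(list)
    for nid, (children, distance, color) in nodes.items():     # map phase
        if color == "GREY":
            for child in children:
                groups[child].append(([], distance + 1, "GREY"))
            color = "BLACK"
        groups[nid].append((children, distance, color))          # keep the graph structure
    result = {}
    for nid, values in groups.items():                           # reduce phase
        children = next((kids for kids, _, _ in values if kids), [])
        distance = min(dist for _, dist, _ in values)
        color = max((col for _, _, col in values), key=DARKNESS.get)
        result[nid] = (children, distance, color)
    return result

def run_bfs(nodes):
    """Feed each pass's output into the next pass while the frontier (GREY nodes) is non-empty."""
    iterations = 0
    while any(color == "GREY" for _, _, color in nodes.values()):
        nodes = one_pass(nodes)
        iterations += 1
    return nodes, iterations

# Hypothetical graph; E is not reachable from the root A.
start = {
    "A": (["B", "D"], 0, "GREY"),
    "B": (["C"], INFINITY, "WHITE"),
    "C": ([], INFINITY, "WHITE"),
    "D": ([], INFINITY, "WHITE"),
    "E": (["A"], INFINITY, "WHITE"),
}
final, iterations = run_bfs(start)
print(iterations, {nid: dist for nid, (_, dist, _) in final.items()})
```

Three passes are needed here: two to reach C, which is two hops from A, and one more to turn C BLACK. E, which A cannot reach, simply stays WHITE at INT_MAX, and the loop still terminates.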

When do we Stop?
• Eventually, all the nodes reachable from the root will be discovered, and we need to stop. So, ideally, we need a counter (state) that tells us when to stop iterating
• Two methods to keep track of the state:
  – First: create a directory in HDFS every time another iteration is needed, and delete the directory when no more iterations are required
  – Second: make an RPC call to a server, which updates the state for every iteration
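A sketch of the first method in Python, using a local temporary directory as a stand-in for the HDFS flag directory (on a cluster the same steps would go through HDFS, e.g. with `hdfs dfs -mkdir` and `hdfs dfs -rm -r`). The helper names, the flag path, and the `fake_pass` callback are hypothetical. Any task that still has work creates the flag, and the driver clears it before each pass and stops once a pass finishes without recreating it.

```python
import os
import shutil
import tempfile

def request_another_iteration(flag_dir):
    """Called by any task that still has work, e.g. one that emitted a new GREY node."""
    os.makedirs(flag_dir, exist_ok=True)

def iterate_until_done(run_pass, flag_dir):
    """Driver: clear the flag, run one pass, and repeat while some task recreated it."""
    iterations = 0
    while True:
        shutil.rmtree(flag_dir, ignore_errors=True)   # each pass starts without the flag
        run_pass(flag_dir)
        iterations += 1
        if not os.path.isdir(flag_dir):                # no task asked for another pass
            return iterations

with tempfile.TemporaryDirectory() as workdir:
    flag = os.path.join(workdir, "more_iterations_needed")   # hypothetical flag path
    passes_with_new_work = [3]

    def fake_pass(flag_dir):
        """Hypothetical pass: the first three passes find new work, the fourth does not."""
        if passes_with_new_work[0] > 0:
            passes_with_new_work[0] -= 1
            request_another_iteration(flag_dir)

    result = iterate_until_done(fake_pass, flag)
    print(result)
```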

Dijkstra’s Algorithm
• It is a graph search algorithm that solves the single-source shortest path problem for a graph with nonnegative edge path costs, producing a shortest path tree
• For a given source node in the graph, the algorithm finds the path with the lowest cost (i.e. the shortest path) between that node and every other node
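For comparison with the MapReduce version in these slides, a minimal single-machine Dijkstra in Python using a binary heap from the standard `heapq` module. The graph is hypothetical and all weights are nonnegative.

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbor, weight), ...]} with nonnegative weights."""
    distance = {source: 0}
    heap = [(0, source)]
    done = set()
    while heap:
        dist, node = heapq.heappop(heap)
        if node in done:
            continue                          # stale heap entry, already settled
        done.add(node)
        for neighbor, weight in graph.get(node, []):
            candidate = dist + weight
            if candidate < distance.get(neighbor, float("inf")):
                distance[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return distance

# Hypothetical weighted directed graph.
graph = {
    "S": [("A", 10), ("B", 5)],
    "A": [("C", 1)],
    "B": [("A", 3), ("C", 9)],
    "C": [],
}
print(dijkstra(graph, "S"))
```

Each node is settled the first time it leaves the heap; with nonnegative weights no later path can be shorter, which is why stale entries can be skipped.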

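[Figure: Dijkstra's algorithm, step by step, on a five-node weighted graph with edge weights between 1 and 10. The source has distance 0; the other four nodes' tentative distances go from ∞, ∞, ∞, ∞ to 10, ∞, 5, ∞, then 8, 14, 5, 7, then 8, 13, 5, 7, and finally 8, 9, 5, 7.]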

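Dijkstra
• Similar to BFS except for two changes:
  – Edges to nodes now carry positive weights, so a child's distance is Node.distance + weight of edge
  – Every visited (non-WHITE) node, not only GREY ones, propagates distances in every iteration, because a shorter path to a node may be found after the node has first been reached
• Map(Node) {
      If Node != WHITE:
          For All Child Nodes:
              Set ChildNode.distance = Node.distance + weight of edge;
              ChildNode.Color = GREY;
          Node.Color = BLACK;
  }
• Reduce(ListofNodes(outlinks, color, distance)) {
      For All Nodes:
          Node.distance = MinDistance;
          Node.Color = Darkest Color;
      If any Node.distance decreased: set state as TRUE;   // another iteration is needed
  }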
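An in-memory Python sketch of the iterative MapReduce version, assuming each node record holds weighted outlinks `[(child, weight), ...]` and a distance. A node counts as visited (non-WHITE) once its distance is finite, so colors are folded into the distance here; the shuffle is simulated with a dict, and all names are hypothetical. The driver stops once a pass lowers no distance. Stopping as soon as no node is WHITE can be premature, because a shorter path may still be found after every node has been reached.

```python
import sys
from collections import defaultdict

INFINITY = sys.maxsize   # stands in for INT_MAX; a finite distance means "not WHITE"

def sssp_pass(nodes):
    """One map + shuffle + reduce over {id: (outlinks, distance)}, where outlinks are
    (child, weight) pairs. Assumes every node has its own record.
    Returns (new_nodes, changed); changed is True if any distance decreased."""
    groups = defaultdict(list)
    for nid, (outlinks, distance) in nodes.items():          # map phase
        groups[nid].append((outlinks, distance))              # keep structure + own distance
        if distance != INFINITY:                              # every reached node propagates
            for child, weight in outlinks:
                groups[child].append((None, distance + weight))
    new_nodes, changed = {}, False
    for nid, values in groups.items():                        # reduce phase
        outlinks = next(links for links, _ in values if links is not None)
        best = min(dist for _, dist in values)
        changed = changed or best < nodes[nid][1]
        new_nodes[nid] = (outlinks, best)
    return new_nodes, changed

def parallel_sssp(nodes):
    """Repeat passes until one pass lowers no distance; returns (distances, passes)."""
    passes, changed = 0, True
    while changed:
        nodes, changed = sssp_pass(nodes)
        passes += 1
    return {nid: dist for nid, (_, dist) in nodes.items()}, passes

# Hypothetical weighted graph with source S.
graph = {
    "S": ([("A", 10), ("B", 5)], 0),
    "A": ([("C", 1)], INFINITY),
    "B": ([("A", 3), ("C", 9)], INFINITY),
    "C": ([], INFINITY),
}
distances, passes = parallel_sssp(graph)
print(distances, passes)
```

Here every node has a finite distance after the second pass, yet C is still 11 (via A at 10). The third pass lowers it to 9 (via B and then A), and the fourth pass changes nothing, so four passes are run.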

Depth First Search
• DFS is an uninformed search that progresses by expanding the first child node of the search tree that appears, going deeper and deeper until a goal node is found or until it hits a node that has no children. Then the search backtracks, returning to the most recent node it hasn't finished exploring.
• We are still working on the MapReduce algorithm for DFS
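The slides leave the MapReduce formulation of DFS open, so only a single-machine reference is sketched here: an iterative DFS in Python with an explicit stack, on a hypothetical graph, returning nodes in the order they are first visited. Children are pushed in reverse so the first child is expanded first, which reproduces the visiting order of the usual recursive DFS.

```python
def dfs_order(graph, root):
    """Return nodes in the order a recursive DFS would first visit them."""
    visited = []
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue                     # reached again via another path; skip
        seen.add(node)
        visited.append(node)
        # Push children in reverse so the first child is on top and explored first.
        for child in reversed(graph.get(node, [])):
            if child not in seen:
                stack.append(child)
    return visited

# Hypothetical directed graph.
graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "D": [], "E": ["F"], "F": []}
print(dfs_order(graph, "A"))
```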

Summary
• Store graphs as adjacency lists
• Graph algorithms with MapReduce:
  – Each Map task receives a node and its outlinks
  – The Map task computes some function of the link structure and emits a value keyed by the target node
  – The Reduce task collects the values for each key (target node) and aggregates them
• Iterate multiple MapReduce cycles until some termination condition is met
  – The graph structure is passed from one iteration to the next

Conclusions
• MapReduce is adept at manipulating graphs
• MapReduce explores all paths in parallel
  – Divide and conquer is the norm
  – Throws more hardware at the problem
• The concept of graphs in MapReduce can also be used to solve other problems, such as PageRank, Minimum Spanning Trees, Bipartite Partition, etc.

THANK YOU