9/3/2015
Peer‐to‐Peer (P2P) Architecture • The processes in a P2P system run on end‐user computer systems that are interconnected via the Internet • All processes are equal, playing the same role – Each process acts as a client and a server at the same time – The processes are called peers
• P2P applications – File sharing: Gnutella, Kazaa – File distribution: BitTorrent – Internet telephony: Skype
Scalability of P2P Systems • P2P systems are scalable because available resources grow with the number of users – When a computer joins a P2P system, it adds resources (i.e., processing, storage, network bandwidth) to the system
• P2P systems enable a large number of computers to provide access to data and other resources that they collectively store and manage
Comparison of Client‐Server and P2P Architectures
• Roles – Client‐Server: each process is either a client or a server – P2P: each process acts as a client and a server at the same time; all processes play the same role
• Shared resources – Client‐Server: placed on the server – P2P: distributed across the users' machines
• Scalability – Client‐Server: not scalable; resources in the system do not increase as the number of users increases – P2P: scalable; resources in the system increase as the number of users increases
• Failures – Client‐Server: the server is a single point of failure – P2P: no single point of failure
Overlay Networks • Processes in a P2P system form an overlay network to communicate with each other – In an overlay network, nodes represent processes and links represent communication channels (usually realized as TCP connections) – Each node in the overlay network maintains a list of neighbors – Each node communicates with its neighbors through the communication channels
• Two types of overlay networks – Unstructured: the overlay network is constructed using a randomized algorithm – Structured: the overlay network is constructed using a deterministic procedure
Unstructured Overlays • The overlay network is constructed based on a randomized algorithm – Each node maintains a list of c randomly chosen neighbors (called its partial view of the network) – Nodes periodically swap part of their partial views with their neighbors to create a new partial view
• Joining the overlay network – The system provides several permanent well‐known bootstrap hosts that maintain a list of recently joined peers – To join the network, a node contacts a bootstrap host and randomly picks c neighbors from the list
• Leaving the overlay network – A node can depart without informing any other node – When a neighbor contacts the node and gets no response, it knows the node has departed
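The join, view‐swapping, and silent‐departure rules above can be simulated in a few lines. This is a minimal single‐process sketch, not a real protocol: the class and function names, the swap length of 2, and the rule of adding each node's own id to what it sends during a swap are assumptions made for illustration.

```python
import random

class Node:
    """A peer in an unstructured overlay (hypothetical sketch)."""
    def __init__(self, node_id, c):
        self.node_id = node_id
        self.c = c            # maximum size of the partial view
        self.view = set()     # ids of current neighbors (the partial view)

class Bootstrap:
    """Well-known bootstrap host that lists recently joined peers."""
    def __init__(self, max_recent=50):
        self.recent = []
        self.max_recent = max_recent

    def join(self, node, rng):
        # Randomly pick up to c neighbors from the recently joined peers.
        k = min(node.c, len(self.recent))
        node.view = set(rng.sample(self.recent, k))
        self.recent = (self.recent + [node.node_id])[-self.max_recent:]

def shuffle(a, b, rng, swap_len=2):
    """Nodes a and b swap part of their partial views (assumed: plus their own ids)."""
    sent_a = set(rng.sample(sorted(a.view), min(swap_len, len(a.view)))) | {a.node_id}
    sent_b = set(rng.sample(sorted(b.view), min(swap_len, len(b.view)))) | {b.node_id}
    for node, received in ((a, sent_b), (b, sent_a)):
        merged = (node.view | received) - {node.node_id}
        # Keep at most c entries, chosen at random, as the new partial view.
        node.view = set(rng.sample(sorted(merged), min(node.c, len(merged))))

def prune_departed(node, alive):
    """Drop neighbors that no longer respond (they left without telling anyone)."""
    node.view = {n for n in node.view if n in alive}

# Usage: 20 peers join through one bootstrap host, then gossip for a while.
rng = random.Random(7)
boot = Bootstrap()
nodes = {}
for i in range(20):
    nodes[i] = Node(i, c=4)
    boot.join(nodes[i], rng)
for _ in range(200):
    a = nodes[rng.randrange(20)]
    if a.view:
        shuffle(a, nodes[rng.choice(sorted(a.view))], rng)

# Peer 3 departs silently; the others notice when it stops answering.
alive = set(nodes) - {3}
for n in alive:
    prune_departed(nodes[n], alive)
```

Because every swap trims the merged view back to c random entries, each node's neighbor list stays bounded while its contents keep changing, which is what keeps the overlay random.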
Search in Unstructured Overlays • Each node stores data items that it is willing to share • Search is done through flooding – To find a data item, a node broadcasts a search query to its neighbors, which forward the query to their neighbors – If a node receives a query and has the requested data item, it sends a reply to the source over the reverse path – Search is inefficient – In a network with N nodes, a search requires O(N^2) messages in the worst case – E.g., Gnutella
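Flooding and the reverse‐path reply can be sketched as a breadth‐first traversal that counts every forwarded query. Two details are assumptions not stated on the slide: each node drops a query it has already seen, and an optional hop limit (ttl) stops the flood early. The function name and the graph/data layout are hypothetical.

```python
from collections import deque

def flood_search(graph, data, source, item, ttl=None):
    """Flood a query from source; return (reply_path, messages_sent).

    graph: {node: [neighbors]}, data: {node: set_of_items} (hypothetical layout).
    reply_path lists the nodes the reply visits, from the responder back to source.
    """
    parent = {source: None}         # who forwarded the query to each node
    queue = deque([(source, 0)])
    messages = 0
    while queue:
        node, hops = queue.popleft()
        if node != source and item in data.get(node, set()):
            # The reply retraces the query's path in reverse.
            path = []
            cur = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path, messages
        if ttl is not None and hops >= ttl:
            continue                # assumed hop limit: stop forwarding
        for nb in graph[node]:
            if nb == parent[node]:
                continue            # do not send the query back to its sender
            messages += 1           # every forward costs one message
            if nb not in parent:    # assumed: duplicate queries are dropped
                parent[nb] = node
                queue.append((nb, hops + 1))
    return None, messages

graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
         "D": ["B", "C", "E"], "E": ["D"]}
data = {"E": {"song.mp3"}}
print(flood_search(graph, data, "A", "song.mp3"))  # (['E', 'D', 'B', 'A'], 6)
```

Even in this five‐node graph the query is sent six times to find one item; in a dense overlay the count grows with the number of links, which is why flooding scales poorly.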
Structured Overlays • A structured overlay network is constructed using a deterministic procedure • We will study Chord, a distributed hash table (DHT) based P2P system – A DHT provides a lookup service similar to a hash table – (key, value) pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given key – The mapping from keys to values is implemented in a completely decentralized manner – See the Chord paper at http://pdos.csail.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf
Chord (1) • Nodes are logically organized in a ring • Each data item is assigned a random key from a large identifier space, and each node is assigned a random identifier from the same identifier space • The data item with key k is mapped to the node with the smallest identifier id ≥ k (wrapping around to the smallest identifier if no such node exists), denoted as succ(k) • The system provides an operation LOOKUP(k) that returns the network address of succ(k) – In a network with N nodes, a lookup requires O(log N) messages – Much more efficient than flooding
Figure: The mapping of data items onto nodes in Chord.
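The succ(k) rule and the O(log N) lookup can be sketched as a single‐process simulation. The slides do not say how LOOKUP is routed; this sketch follows the finger‐table scheme from the Chord paper, where finger i of node n points to succ(n + 2^i). The 6‐bit identifier space, the node ids, and the helper names (succ, in_interval, build_fingers, lookup) are illustrative assumptions.

```python
import bisect

M = 6                  # identifier bits (assumed small for illustration)
RING = 2 ** M          # identifiers are 0 .. RING - 1

def succ(k, node_ids):
    """Node responsible for key k: smallest id >= k, wrapping around the ring."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, k % RING)
    return ids[i] if i < len(ids) else ids[0]

def in_interval(x, a, b):
    """True if x lies in the clockwise ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b     # interval wraps past 0 (a == b covers the ring)

def build_fingers(node_ids):
    """finger[i] of node n is succ(n + 2**i), for i = 0 .. M-1."""
    return {n: [succ(n + 2 ** i, node_ids) for i in range(M)] for n in node_ids}

def lookup(start, k, fingers):
    """Route LOOKUP(k) from node start; return (succ(k), number of hops)."""
    k %= RING
    n, hops = start, 0
    while True:
        successor = fingers[n][0]
        if in_interval(k, n, successor):
            return successor, hops
        # Forward to the farthest finger that lies strictly between n and k.
        nxt = successor
        for f in reversed(fingers[n]):
            if f != k and in_interval(f, n, k):
                nxt = f
                break
        n, hops = nxt, hops + 1

NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
FINGERS = build_fingers(NODES)
print(succ(10, NODES), succ(60, NODES))   # 14 1  (key 60 wraps around to node 1)
print(lookup(8, 54, FINGERS))             # (56, 2): 8 -> 42 -> 51 -> answer 56
```

Each hop at least halves the remaining ring distance to k, which is the intuition behind the O(log N) message bound.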
Chord (2) • When a node wants to join the system, it – generates a random identifier id – does a lookup on id to get the network address of succ(id) – contacts succ(id) and its predecessor, inserts itself in the ring – transfers from succ(id) those data items for which node id is now responsible
• When node id wants to leave the system, it – informs its predecessor and successor – transfers its data items to succ(id)
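The join and leave steps above can be simulated to show which data items move. This is a toy, centralized sketch: the ChordRing class, its method names, and the 6‐bit identifier space are assumptions; the local succ() computation stands in for the distributed LOOKUP, and the caller passes the new identifier explicitly instead of generating a random one so the result is reproducible.

```python
import bisect

RING = 2 ** 6   # assumed 6-bit identifier space

class ChordRing:
    """Toy, centralized simulation of Chord join/leave (not a real protocol)."""
    def __init__(self):
        self.ids = []      # sorted node identifiers
        self.store = {}    # node id -> {key: value}

    def succ(self, k):
        # Stand-in for LOOKUP(k): smallest node id >= k, wrapping around.
        i = bisect.bisect_left(self.ids, k % RING)
        return self.ids[i] if i < len(self.ids) else self.ids[0]

    def put(self, key, value):
        self.store[self.succ(key)][key] = value

    def join(self, new_id):
        if not self.ids:
            self.ids = [new_id]
            self.store[new_id] = {}
            return
        s = self.succ(new_id)            # look up succ(id) before inserting
        bisect.insort(self.ids, new_id)  # insert between predecessor and succ(id)
        self.store[new_id] = {}
        # Transfer from succ(id) the items this node is now responsible for.
        moved = {k: v for k, v in self.store[s].items() if self.succ(k) == new_id}
        for k in moved:
            del self.store[s][k]
        self.store[new_id].update(moved)

    def leave(self, node_id):
        items = self.store.pop(node_id)
        self.ids.remove(node_id)
        if self.ids:
            # Hand all items to the departing node's successor.
            self.store[self.succ(node_id)].update(items)

ring = ChordRing()
for n in [1, 14, 32, 48]:
    ring.join(n)
for k in [10, 20, 40, 60]:
    ring.put(k, f"v{k}")
ring.join(21)    # takes key 20 from node 32
ring.leave(14)   # hands key 10 to node 21
print(ring.store[21])   # {20: 'v20', 10: 'v10'}
```

Only the keys between the new node's predecessor and the new node move on a join, and only the departing node's keys move on a leave; the rest of the ring is untouched.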
Hybrid Architectures • Combines client‐server architecture and P2P architecture • Examples: – Superpeer networks – BitTorrent – Edge‐server systems
Superpeer Networks (1) • Two types of nodes: regular peers and superpeers • Each regular peer is connected to a single superpeer and is a client of its superpeer • Superpeers form an unstructured overlay network
Superpeer Networks (2) • Locating data items – A superpeer keeps an index over its clients' data items – A client submits its query to its superpeer; the superpeer searches its index and, if the requested data item is not in the local index, forwards the query to its neighbors – Searching is more efficient than in pure unstructured P2P networks: flooding is performed among the superpeers instead of among all nodes in the system
• Because superpeers handle query processing, they should be long‐lived and have sufficient processing power and network bandwidth • Examples: Kazaa, Skype
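The superpeer lookup rule can be sketched as: check the local index first, then flood only among superpeers. The SuperPeer class, the superpeer_search function, and the names of clients and files are hypothetical; the sketch also assumes each superpeer ignores a query it has already seen.

```python
from collections import deque

class SuperPeer:
    """A superpeer with an index over its clients' data items (hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.index = {}        # data item -> client that stores it
        self.neighbors = []    # other superpeers (unstructured overlay)

    def register(self, client, items):
        # A regular peer reports its shared items to its superpeer.
        for item in items:
            self.index[item] = client

def superpeer_search(start, item):
    """Return (superpeer name, client) holding item, or None if not found."""
    seen = {start.name}
    queue = deque([start])
    while queue:
        sp = queue.popleft()
        if item in sp.index:                 # answered from a local index
            return sp.name, sp.index[item]
        for nb in sp.neighbors:              # flood among superpeers only
            if nb.name not in seen:
                seen.add(nb.name)
                queue.append(nb)
    return None

s1, s2, s3 = SuperPeer("S1"), SuperPeer("S2"), SuperPeer("S3")
s1.neighbors, s2.neighbors, s3.neighbors = [s2], [s1, s3], [s2]
s1.register("c1", ["notes.txt"])
s3.register("c7", ["song.mp3"])
print(superpeer_search(s1, "song.mp3"))   # ('S3', 'c7')
```

Regular peers never see the flooded query; only the three superpeers take part, however many clients each one serves.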
BitTorrent (1) • BitTorrent is a file distribution system that enables fast downloading of large files and reduces the load on the hosting machine – Files are split into fixed‐size chunks – When multiple nodes are downloading a file at the same time, they upload chunks of the file to each other – A node downloads chunks of a file in parallel from many other nodes; this enables fast downloads and reduces the burden on the hosting machine – Client‐server architecture is used for the initial setup of the network – P2P architecture is used for downloading chunks of a file
• See paper at http://www.bittorrent.org/bittorrentecon.pdf
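Splitting a file into fixed‐size chunks and fetching them in parallel from several peers can be sketched as follows. Peers are simulated as in‐memory dictionaries, a thread pool stands in for concurrent network connections, and the tiny 4‐byte chunk size and the round‐robin choice among peers holding a chunk are assumptions for the demo, not BitTorrent's actual piece‐selection policy.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4   # bytes; unrealistically small, chosen only for the demo

def split_into_chunks(data, size=CHUNK_SIZE):
    """Split a file's bytes into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_download(num_chunks, peers):
    """peers: {peer_name: {chunk_index: bytes}} simulating remote peers."""
    # For each chunk, pick one peer that has it (round-robin spreads the load).
    plan = {}
    for idx in range(num_chunks):
        holders = sorted(p for p, chunks in peers.items() if idx in chunks)
        if not holders:
            raise ValueError(f"no peer has chunk {idx}")
        plan[idx] = holders[idx % len(holders)]

    def fetch(idx):
        # Stand-in for a network request to peer plan[idx].
        return idx, peers[plan[idx]][idx]

    # Download all chunks concurrently, then reassemble them in order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = dict(pool.map(fetch, range(num_chunks)))
    return b"".join(results[i] for i in range(num_chunks)), plan

data = b"hello bittorrent world!"
chunks = split_into_chunks(data)
peers = {
    "P1": {i: c for i, c in enumerate(chunks) if i % 2 == 0},  # even chunks only
    "P2": dict(enumerate(chunks)),                             # the whole file
}
file_bytes, plan = parallel_download(len(chunks), peers)
print(file_bytes == data, plan)
```

No single peer serves the whole file here, which is the load‐spreading effect the slide describes.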
BitTorrent (2)
• Each file has a tracker, which is a server that keeps track of nodes that are downloading the file • When a node wants to download a file, it – first contacts the tracker of the file to get the network addresses of a set of peers that are downloading the file – then downloads chunks from the peers
• A tit‐for‐tat mechanism is used to ensure downloaders are uploaders too: P uploads to Q if P downloads from Q
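The tracker's role and the tit‐for‐tat rule can be sketched separately. The Tracker class, the peer addresses, and the limit of 4 upload slots are hypothetical, and ranking peers by bytes received is a simplification: real BitTorrent clients use a more involved choking algorithm.

```python
class Tracker:
    """Keeps track of the peers that are downloading each file."""
    def __init__(self):
        self.swarms = {}   # file id -> set of peer addresses

    def announce(self, file_id, peer_addr, max_peers=50):
        # Register the caller and return other peers downloading the same file.
        swarm = self.swarms.setdefault(file_id, set())
        others = sorted(swarm - {peer_addr})
        swarm.add(peer_addr)
        return others[:max_peers]

def choose_upload_targets(received_from, slots=4):
    """Tit-for-tat (simplified): upload to the peers that uploaded most to us."""
    ranked = sorted(received_from.items(), key=lambda kv: kv[1], reverse=True)
    return [peer for peer, nbytes in ranked[:slots] if nbytes > 0]

tracker = Tracker()
first = tracker.announce("file-1", "10.0.0.1:6881")    # hypothetical addresses
second = tracker.announce("file-1", "10.0.0.2:6881")
print(first, second)    # [] ['10.0.0.1:6881']

# Bytes this node has downloaded from each neighbor recently.
received = {"A": 500, "B": 0, "C": 1200, "D": 300}
print(choose_upload_targets(received, slots=2))   # ['C', 'A']
```

Peer B, which uploaded nothing, receives nothing in return; that is the incentive that turns downloaders into uploaders.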
Edge‐Server Systems
• Servers are placed “at the edge” of the Internet (i.e., at the Internet Service Providers) • Edge servers replicate content and serve content to end users • Edge servers improve response time as they are close to the users • Example: YouTube’s Content Distribution Network
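A client's interaction with an edge‐server system can be sketched as two steps: pick the closest edge server, then let that server answer from its replica. Here "closest" is assumed to mean lowest measured round‐trip time, and replication is assumed to happen on the first request (pull‐through caching); real content distribution networks may also push content ahead of time. All class names, server names, and RTT values are hypothetical.

```python
class OriginServer:
    """The content provider's own server (hypothetical)."""
    def __init__(self, content):
        self.content = content
        self.requests = 0

    def get(self, name):
        self.requests += 1
        return self.content[name]

class EdgeServer:
    """A server near the users that replicates content from the origin."""
    def __init__(self, name, origin):
        self.name = name
        self.origin = origin
        self.cache = {}   # replicated content

    def get(self, name):
        if name not in self.cache:
            # Assumed: replicate on the first request (pull-through caching).
            self.cache[name] = self.origin.get(name)
        return self.cache[name]

def nearest_edge(rtts_ms, edges):
    """Pick the edge server with the lowest measured round-trip time."""
    best = min(rtts_ms, key=rtts_ms.get)
    return edges[best]

origin = OriginServer({"video1": b"...video bytes..."})
edges = {"east": EdgeServer("east", origin), "west": EdgeServer("west", origin)}
edge = nearest_edge({"east": 12.0, "west": 85.0}, edges)
edge.get("video1")
edge.get("video1")   # served from the edge replica this time
print(edge.name, origin.requests)   # east 1
```

Repeat requests never leave the edge server, which is where the response‐time improvement comes from.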