Peer to Peer Instant Messaging

Assignment in Data communication I, Department of Information Technology, Uppsala University.

Overview

In this programming exercise you will implement a peer-to-peer instant messaging system. You will build on a simple program that is provided to you. That program exchanges text messages between two copies of the peer process running on the same machine, using direct TCP communication.

The assignment asks you to extend the program into a more general messaging system based on a defined messaging protocol. The final system will provide a combination of features, possibly including both point-to-point and multicast message delivery to UIDs on any host. Host and UID discovery will be implemented over the peer network. A control message protocol will maintain a distributed name lookup service among the peers.

How much of the project you should complete depends on the course for which you do it. Unless the examiner states otherwise, stages are allocated to courses as follows:

• ITP DatorsysII (del 1), Datakom: Stages 1 and 2.
• Distance course, Datakom: Stages 1 and 2.
• MNP Datakom: Stages 1 and 2.
• DVP Datakom: Stages 1 to 3.

1 General Problem Description

The program to be developed in this assignment should implement several aspects of a peer-to-peer messaging application. The application uses TCP communication and a simple packet-based message structure.

The initial peer implementation code provided with this problem description sets up a server communication socket based on the UID of the person using the client. The UID is stored in a text file named .p2prc, or it can be given as a command-line argument when starting peer.

The peer has a simple user interface providing three commands: MESG, DISP, and QUIT. Using the MESG command and a UID, a peer can try to connect to a copy of itself that is also running on “localhost” (IP address 127.0.0.1). If the requested peer is executing, a connection is established and messages can be exchanged. The current message is just a text string. It does not represent the structure of the packets that should actually be exchanged between the peers. The DISP command is used to display an incoming message.

To complete this laboratory you need to extend the existing code to implement the facilities and application protocols specified in the assignment stages for your course. Each stage builds on the previous stages, so you can extend your code from a previous stage as you move forward in the assignment. Stages should be created as separate source code and executable files named peer1, peer2, etc. See Section 3 for more information about the hand-in instructions. It is important that you follow these instructions to pass the assignment!

The code is provided as a tar ball. Its contents are as follows:

    p2p/
    |-- code_lib/       : Contains library code (DO NOT MODIFY).
    |-- ns_source/      : Nameserver source code.
    |-- p2p_source/     : Contains your peer source code.
    |   |-- base/
    |   `-- copy/       : Copies of your compiled programs.
    `-- serv_test/      : Program to test the name server.

Figure 1: Directory structure of source code tree.

1.1 Important Things to Think About

• Help routines and support code can be found in the code_lib/ directory. Study that code and the peer.c code skeleton to familiarize yourself with the assignment. No modifications of the library code should be necessary to complete the assignment.
• All packets should be sent in network byte order, regardless of the hardware/CPU the code is running on. This means the code should use the functions htonl(), ntohl(), htons(), etc. for data that is sent on the network.
• If you have access to, for example, a Linux machine, you can easily detect byte order bugs. Run one “peer” on a SUN machine and one “peer” on the Linux machine, and have them communicate with each other.
• Run your own local nameserver for easier debugging while coding.

Good luck!

2 Project Stages

Stage 1: A Simple Message Protocol

In short:
1. Define and implement a new packet format.
2. Extend the code to handle multiple peers.
3. Extend the code to buffer and handle multiple messages on a per-peer basis.
4. Implement a PEER command to list currently connected peers.

Modify and extend the current implementation to support message passing between peers via user interaction. You may continue to assume that both peers are executing on the same physical computer. Start the assignment by copying the peer.c file into a file called peer1.c. Then edit the Makefile in the same directory to match the new target.

You need to extend the simple command-line interface, which allows a user to send a message to another user. The primitive implementation of the command language is defined as follows:

    command   ::= message | display | quit
    message   ::= MESG UID textblock
    display   ::= DISP
    quit      ::= QUIT
    UID       ::= *[0-9]
    textblock ::= *
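You may assume commands are well formed, so a MESG line can be split with a single sscanf() call. The following is a minimal sketch in which parse_mesg is a hypothetical helper. The %n conversion records how many characters have been consumed, so the text block is simply the rest of the line, spaces included.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split a well-formed "MESG <UID> <text>" line.
 * Returns 1 and sets *uid and *text (pointing into line) on success,
 * or 0 if the line does not start with MESG followed by a number. */
int parse_mesg(const char *line, unsigned *uid, const char **text)
{
    int consumed = 0;

    /* "MESG" must match literally; %u reads the UID; the space skips
     * blanks; %n stores the number of characters consumed so far. */
    if (sscanf(line, "MESG %u %n", uid, &consumed) != 1 || consumed == 0)
        return 0;
    *text = line + consumed;   /* remainder of the line is the text block */
    return 1;
}
```

For example, parsing "MESG 4711 hello world" yields UID 4711 and the text "hello world". A DISP line returns 0, so the caller can dispatch on the keyword first.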

To send a message to a given UID, type the keyword MESG followed by the desired UID and an alphanumeric text message of up to N characters. The UID must be delimited by spaces in both commands. Your program need not check commands for correct grammar; you can assume that all commands are well formed.

Your program should extract the relevant data from a MESG command and check that the specified UID is known to the peer. It should then pack and send the text to the other peer (UID) according to the packet format in Figure 2. That is, you need to change the current implementation, which only sends an ASCII string, so that it sends a binary header in front of the message. No packet format is currently defined in the code, so you will have to implement this yourself.

The current command-line implementation of the DISP command does not require a UID, since only one other peer process can be connected at a time. You should extend the program to allow connections to many peers simultaneously. You should also implement a command PEER that prints the currently connected peers (e.g., IP address, number of received messages, etc.). Finally, extend the message notification and buffering, and the DISP command, so that messages are buffered and displayed according to UID.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |     Type      |   Reserved    |              UID              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            Length                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            Message                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                              ...                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 2: Message packet format

The packet format contains the following fields:

Type: An integer describing the type of the message.
Reserved: Field reserved for future use, sent as 0.
UID: The user ID of the sender.
Length: An integer giving the length of the message.
Message: An arbitrary-length string containing the message.
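The exact layout in Figure 2 is yours to define, so the following is only one possible encoding. It assumes Type and Reserved are one byte each, UID is 16 bits and Length is 32 bits, which gives an 8-byte header. The names msg_hdr, pack_msg and unpack_msg_hdr are hypothetical. Each field is written with memcpy() after htons()/htonl() rather than by sending a C struct directly. A struct's padding and byte order may differ between compilers and CPUs.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htons, htonl, ntohs, ntohl */

#define MSG_HDR_LEN 8    /* Type(1) + Reserved(1) + UID(2) + Length(4), assumed */

/* Host-byte-order view of the (assumed) Figure 2 header. */
struct msg_hdr {
    uint8_t  type;
    uint8_t  reserved;   /* always sent as 0 */
    uint16_t uid;        /* UID of the sender */
    uint32_t length;     /* number of message bytes following the header */
};

/* Serialise header + text into buf in network byte order.
 * Returns the total number of bytes written, or 0 if buf is too small. */
size_t pack_msg(uint8_t *buf, size_t buflen, uint8_t type, uint16_t uid,
                const char *text, uint32_t textlen)
{
    uint16_t uid_n = htons(uid);
    uint32_t len_n = htonl(textlen);

    if (buflen < MSG_HDR_LEN + (size_t)textlen)
        return 0;
    buf[0] = type;
    buf[1] = 0;                              /* Reserved */
    memcpy(buf + 2, &uid_n, sizeof uid_n);
    memcpy(buf + 4, &len_n, sizeof len_n);
    memcpy(buf + MSG_HDR_LEN, text, textlen);
    return MSG_HDR_LEN + (size_t)textlen;
}

/* Decode the fixed 8-byte header back into host byte order. */
void unpack_msg_hdr(const uint8_t *buf, struct msg_hdr *h)
{
    uint16_t uid_n;
    uint32_t len_n;

    h->type = buf[0];
    h->reserved = buf[1];
    memcpy(&uid_n, buf + 2, sizeof uid_n);
    memcpy(&len_n, buf + 4, sizeof len_n);
    h->uid = ntohs(uid_n);
    h->length = ntohl(len_n);
}
```

For example, pack_msg(buf, sizeof buf, 1, 4711, "hello", 5) produces 13 bytes. The UID bytes are 0x12 0x67 on any host, which is what makes a SUN/Linux pair interoperate.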

In your extended version of the program, when the UID is not known to the peer, the peer should try to create a socket connection to that UID's peer on “localhost” (127.0.0.1). It should also set up appropriate data structures to manage the communication. If a connection can be established, the message can then be sent using the defined packet format. If a connection is not possible, the user should be notified that the message could not be delivered.

The length of the message is specified in the packet so that the receiving peer knows how many bytes to read from the stream to get the entire message. The source UID is included because the recipient of a connection request does not otherwise know the UID of the requestor. The packets must therefore carry that information. The peer that accepts the connection can then add an entry to its list of UIDs with which it has a connection and can exchange messages directly. The type field is not really needed at this point. It will later be used to tell protocol control packets apart from data packets containing message data.
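TCP delivers a byte stream rather than packets, so a single read() may return only part of a message. The following is a minimal sketch of reading one complete message. It assumes a hypothetical 8-byte header: Type (1 byte), Reserved (1 byte), UID (16 bits) and Length (32 bits), all in network byte order. read_full and recv_msg are illustrative names. The max_len check is an added safeguard against a corrupt Length field, not something the assignment specifies.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>      /* read, ssize_t */
#include <arpa/inet.h>   /* ntohs, ntohl */

/* Read exactly n bytes from fd, looping over short reads.
 * Returns 1 on success, 0 if the peer closed the connection first, -1 on error. */
int read_full(int fd, void *buf, size_t n)
{
    uint8_t *p = buf;

    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r == 0)
            return 0;                 /* EOF: connection closed */
        if (r < 0) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal: retry */
            return -1;
        }
        p += r;
        n -= (size_t)r;
    }
    return 1;
}

/* Read one message packet. On success, stores the sender UID in *uid and
 * returns a malloc'd, NUL-terminated copy of the text (caller frees it).
 * Returns NULL on EOF, error, or a Length larger than max_len. */
char *recv_msg(int fd, uint16_t *uid, uint32_t max_len)
{
    uint8_t hdr[8];
    uint16_t uid_n;
    uint32_t len_n, len;
    char *text;

    if (read_full(fd, hdr, sizeof hdr) != 1)
        return NULL;
    memcpy(&uid_n, hdr + 2, sizeof uid_n);
    memcpy(&len_n, hdr + 4, sizeof len_n);
    *uid = ntohs(uid_n);
    len = ntohl(len_n);               /* how many message bytes to expect */
    if (len > max_len)
        return NULL;
    text = malloc((size_t)len + 1);
    if (text == NULL)
        return NULL;
    if (read_full(fd, text, len) != 1) {
        free(text);
        return NULL;
    }
    text[len] = '\0';
    return text;
}
```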

When a message is received, the peer should notify the user that a message has arrived, and from which UID. Messages should be buffered in a FIFO queue until the user issues a DISP command to display the next waiting message from the specified UID. Displaying a message should automatically delete it from the queue.

To handle multiple peers and the queueing of messages, you can use linked lists. Library code implementing simple linked-list routines is provided in code_lib/ for your convenience. Note, though, that the library routines do not deal with memory management. Make sure you free the memory allocated for the data structures you have added to a list when you detach them.

Testing

Test that your peer code can communicate properly with copies of itself that have different UIDs. Test with several copies, connecting to each of them in different orders by issuing MESG commands to different destinations. Test your assumptions about the packet format by trying to exchange messages with peer implementations written by your classmates.

Questions

Answer the following questions related to your solution to stage 1 of the laboratory exercise. Each question should be answered using no more than 250 words to explain your reasoning.

1. What is the limit on the length of messages, if any?
2. Is there a limit on the total number of other peers that you can communicate with?
3. Does your implementation limit the number of messages queued in the peer waiting to be read? What are these limits?
4. There is another key protocol weakness with respect to UIDs. What might this weakness be, and what steps can be taken to help address it? (Think in terms of protocol design, not implementation limitations.)
5. Consider the implications of using UDP instead of TCP connections in your peer program. Would it fit this application? Explain why or why not (there may be both pros and cons). What differences in efficiency might result, and in what circumstances?
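The per-UID FIFO buffering described for Stage 1 can be sketched as a singly linked queue with head and tail pointers. This is a standalone illustration, not the code_lib list API, whose interface is not reproduced here. struct msg_queue, queue_push and queue_pop are hypothetical names. The sketch also follows the memory-management advice given above: the node is freed when it is detached, and the text is handed to the caller to free once DISP has printed it.

```c
#include <stdlib.h>
#include <string.h>

/* One buffered message (hypothetical type, not the code_lib list API). */
struct msg_node {
    char *text;
    struct msg_node *next;
};

/* FIFO of messages from one peer UID; initialise as {NULL, NULL, 0}. */
struct msg_queue {
    struct msg_node *head;   /* oldest message: returned by the next DISP */
    struct msg_node *tail;   /* newest message: arrivals are appended here */
    unsigned count;          /* number of messages waiting */
};

/* Append a private copy of text. Returns 0 on success, -1 if out of memory. */
int queue_push(struct msg_queue *q, const char *text)
{
    size_t len = strlen(text);
    struct msg_node *n = malloc(sizeof *n);

    if (n == NULL)
        return -1;
    n->text = malloc(len + 1);
    if (n->text == NULL) {
        free(n);
        return -1;
    }
    memcpy(n->text, text, len + 1);   /* includes the terminating NUL */
    n->next = NULL;
    if (q->tail != NULL)
        q->tail->next = n;
    else
        q->head = n;                  /* queue was empty */
    q->tail = n;
    q->count++;
    return 0;
}

/* Detach the oldest message. The node is freed here; the returned text
 * belongs to the caller, who must free() it. Returns NULL if empty. */
char *queue_pop(struct msg_queue *q)
{
    struct msg_node *n = q->head;
    char *text;

    if (n == NULL)
        return NULL;
    q->head = n->next;
    if (q->head == NULL)
        q->tail = NULL;               /* queue is now empty */
    q->count--;
    text = n->text;
    free(n);
    return text;
}
```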

Stage 2: Distributed Messaging and a Control Protocol

In short:
1. Implement nameserver registration.
2. Implement nameserver interaction for UID to IP lookups.
3. Extend the peer program so that it can communicate between different hosts.

Extend the code from stage 1 (by copying peer1.c to a file called peer2.c) so that it connects (via TCP) to a nameserver database. Initially, connect to a copy of the nameserver running on your own machine (“localhost”). Test that your peer application interacts correctly with the server, and that it can properly register itself and do IP lookups for UIDs. Consult the serv_test code if you are unsure how to communicate with the nameserver, or if you need practical examples of registration and query messages. The structure of the command protocol that we use to interact with the nameserver is as follows:

    control  ::= lookup | response | register
    lookup   ::= LRQ
    response ::= RSP
    register ::= PRG
    UID      ::= *[0-9]

When your peer runs correctly with a local nameserver, start using the public nameserver running on rama.it.uu.se at port 6345. This provides a central point where you can register your UID and IP address. It also lets you look up the current IP addresses of chat peers from your friends' UIDs. The command protocol grammar is given above.

Modify the connection algorithm for a message so that it first attempts a connection on the current machine. If that fails, it should use the control message protocol to look up the location (IP address) of the host running the required peer (UID) on the central nameserver. To access nameserver services, your peer must connect to rama.it.uu.se, port 6345, which operates as the default name lookup database. If no valid IP address can be retrieved, report to the user that the UID cannot be discovered and that the message is undeliverable.

To communicate with the nameserver we send a shorter packet type, which is used to convey commands between peers instead of messages. This packet type is depicted in Figure 3 and is predefined for you in globals.h as the type nsr_pkt.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |             Type              |           Reserved            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |            UID (1)            |            UID (2)            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        IP Address (1)                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        IP Address (2)                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 3: Control/Command packet format.

This is a general-purpose packet format, so not all fields are filled in for every packet type. The fields are specified as follows:

Type: An integer describing the type of the message (defined in globals.h).
Reserved: Field reserved for future use, sent as 0.
UID (1): User ID #1 (meaning depends on the packet type).
UID (2): User ID #2 (meaning depends on the packet type).
IP Address (1): An IP address corresponding to UID (1).
IP Address (2): An IP address corresponding to UID (2).
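Figure 3 fixes the field order, but the authoritative layout is the nsr_pkt definition in globals.h. The following sketch therefore assumes 16-bit Type and Reserved fields, which gives a 16-byte packet. It shows how a lookup request could be encoded and how the IP address in a response could be turned into dotted-quad text. struct ctrl, ctrl_pack, ctrl_unpack and ctrl_ip1_str are hypothetical names, and real type codes (LRQ, RSP, ...) should be taken from globals.h. IP addresses are stored in network byte order, as struct in_addr expects, so they are copied without conversion.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>  /* AF_INET */
#include <netinet/in.h>  /* struct in_addr */
#include <arpa/inet.h>   /* htons, ntohs, inet_ntop, INET_ADDRSTRLEN */

#define CTRL_PKT_LEN 16  /* four 32-bit rows, as drawn in Figure 3 */

/* Host-order view of a control packet (assumed widths; check globals.h). */
struct ctrl {
    uint16_t type, reserved, uid1, uid2;
    uint32_t ip1_net, ip2_net;       /* network byte order, like s_addr */
};

void ctrl_pack(uint8_t out[CTRL_PKT_LEN], const struct ctrl *c)
{
    uint16_t v16;

    v16 = htons(c->type); memcpy(out + 0, &v16, 2);
    v16 = htons(0);       memcpy(out + 2, &v16, 2);   /* Reserved: always 0 */
    v16 = htons(c->uid1); memcpy(out + 4, &v16, 2);
    v16 = htons(c->uid2); memcpy(out + 6, &v16, 2);
    memcpy(out + 8,  &c->ip1_net, 4);                 /* already big-endian */
    memcpy(out + 12, &c->ip2_net, 4);
}

void ctrl_unpack(const uint8_t in[CTRL_PKT_LEN], struct ctrl *c)
{
    uint16_t v16;

    memcpy(&v16, in + 0, 2); c->type = ntohs(v16);
    memcpy(&v16, in + 2, 2); c->reserved = ntohs(v16);
    memcpy(&v16, in + 4, 2); c->uid1 = ntohs(v16);
    memcpy(&v16, in + 6, 2); c->uid2 = ntohs(v16);
    memcpy(&c->ip1_net, in + 8, 4);
    memcpy(&c->ip2_net, in + 12, 4);
}

/* Format IP Address (1) as text, e.g. for the PEER listing.
 * Returns buf on success or NULL on failure. */
const char *ctrl_ip1_str(const struct ctrl *c, char buf[INET_ADDRSTRLEN])
{
    struct in_addr a;

    a.s_addr = c->ip1_net;
    return inet_ntop(AF_INET, &a, buf, INET_ADDRSTRLEN);
}
```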

Don't forget to replace the code that always connects to localhost with code that uses the IP address stored in your local UID lookup table.

Testing

Once again, test that your peer code can communicate properly with copies of itself that have different UIDs on the same computer. Extend the tests by placing the peers on different computers, and test the command protocol by looking up UIDs on the central nameserver running on rama.it.uu.se. Then try to exchange messages with peer implementations written by your classmates.

Questions

Answer the following questions related to your solution to stage 2 of the laboratory exercise. Each question should be answered using no more than 250 words to explain your reasoning.

1. What important functionality is missing from the control protocol? Are there significant implications for the functionality of the system over time as a result of this design flaw?
2. Reflect on the peer discovery problem. How do you find peers in the system you have built? How reliable is this solution, and how might it be made more robust?
3. How can you find out which peer UIDs exist, and how can you record the UIDs of friends between sessions of the peer? Outline a possible approach.

Stage 3: A Broadcast UID Resolution Protocol (BURP)

In short:
1. Extend the control packet format to include a TTL field.
2. Implement Broadcast UID Resolution so that the peer application no longer relies on the nameserver for UID to IP lookups.

This stage of the project replaces the primitive control message protocol of Stage 2, which queried directly connected hosts and the nameserver. In its place comes a more sophisticated, broadcast-based UID discovery protocol. The control protocol's packet format (nsr_pkt) can be found in the file code_lib/globals.h. Add a hostnameIP entry to the data structure that you use to map UIDs to sockets, so that we know which computer each peer is running on. This creates a more complete lookup database for UID information in the peers.

LRQ packets need to be given a time to live (TTL), counted in hops. They are sent to all peers with which we have a direct connection, using a broadcast flood method. These packets are forwarded until their TTL expires, while the originator waits for a positive response packet (RSP) that contains the required host IP address. If this technique fails (the RSP packet times out), the host is considered unreachable and message delivery failure is reported to the user.

You should not look up the host name in the nameserver tables. The nameserver is only to be used at startup, to create connections to two other hosts chosen at random by the nameserver. To obtain two IP addresses from the nameserver, connect to it and send a packet with the type field set to PRQ. The PRP packet sent in response contains two IP addresses in the payload.

In solving this problem, you should assume that a host that receives an LRQ packet creates a direct TCP connection to the originator. It sends the RSP packet over that link, and keeps the link for further communication with the originating peer. The originating peer can then use this connection to send the message to the required destination UID.
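You can check the flooding rule for LRQ packets offline, before any sockets are involved. The following sketch simulates one lookup over an overlay network given as an adjacency matrix. A peer holding an LRQ with TTL greater than 0 forwards it to every direct neighbour with the TTL decremented. The function reports after how many hops the target is reached. It returns -1 if the TTL runs out first, which is the case where the originator would time out waiting for RSP. Duplicate suppression (a peer drops an LRQ it has already seen) is an assumption added here. The assignment does not specify it, and on the wire it would need some way to identify a request. flood_lrq and MAX_PEERS are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_PEERS 16

/* Simulate flooding one LRQ from origin, processed hop by hop (FIFO order).
 * adj[a][b] is true when peers a and b have a direct TCP connection.
 * Returns the hop count at which target first receives the LRQ,
 * or -1 if the TTL expires before the target is reached. */
int flood_lrq(bool adj[MAX_PEERS][MAX_PEERS], int npeers,
              int origin, int target, int ttl)
{
    bool seen[MAX_PEERS] = {false};
    int q_peer[MAX_PEERS], q_ttl[MAX_PEERS], q_hops[MAX_PEERS];
    int head = 0, tail = 0;

    if (npeers > MAX_PEERS)
        return -1;
    if (origin == target)
        return 0;
    seen[origin] = true;
    q_peer[tail] = origin; q_ttl[tail] = ttl; q_hops[tail] = 0; tail++;

    while (head < tail) {
        int p = q_peer[head], t = q_ttl[head], h = q_hops[head];
        head++;
        if (t <= 0)
            continue;                 /* TTL exhausted: the packet dies here */
        for (int n = 0; n < npeers; n++) {
            /* Skip non-neighbours and peers that already saw this LRQ
             * (which includes the peer we received it from). */
            if (!adj[p][n] || seen[n])
                continue;
            seen[n] = true;
            if (n == target)
                return h + 1;         /* target would now connect back with RSP */
            q_peer[tail] = n; q_ttl[tail] = t - 1; q_hops[tail] = h + 1; tail++;
        }
    }
    return -1;
}
```

In the four-peer test scenario described under Testing, take peers 0 and 1 to be connected only through peer 2, with peer 3 attached to peer 1. Then flood_lrq(adj, 4, 0, 1, 4) returns 2, while the same lookup with a TTL of 1 returns -1. Processing in FIFO order means the first copy to reach a peer arrives along a shortest path, so it carries the largest remaining TTL.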
Testing

As in stage 2, begin by testing that your peer code can communicate properly with copies of itself that have different UIDs on several different computers. Test the initial connection establishment algorithm using a nameserver: either the central nameserver running on “rama.it.uu.se”, or one running locally.

A simple test scenario is to run four peers on different machines. Make sure that two of them are not connected to each other, but are both connected to one of the other peers. It should now be possible to send a message between the two peers that are not directly connected. After the BURP lookup, these peers should have a direct connection to each other (check with your PEER command). Then try to exchange messages with peer implementations written by your classmates. You should also test their implementations of the command protocol by asking them to broadcast lookups of UIDs for you.

Questions

Answer the following questions related to your solution to stage 3. Each question should be answered using no more than 250 words to explain your reasoning.

1. What changes do you need to make to the standard packet format to handle the new protocol functions? Draw a picture of the new packet format you have defined.
2. What should a peer do when it receives an LRQ message and has no information on the UID in its local database? Should it forward the request to the other peers that it has contact with? Comment briefly on the advantages and disadvantages of forwarding requests in this way.
3. The ability to respond correctly to broadcast lookup requests relies on a generally agreed protocol implementation. Has a standard emerged within your class, or have people adopted proprietary solutions?

3 Hand-in Instructions

To pass the assignment you need to provide the complete source code for the stages you are supposed to solve, as well as the answers to the questions. It is important that you follow the steps below:

1. Your source tree should contain the files peer1.c, peer2.c, etc., corresponding to each of the stages that you have done. Running “make” from the top directory p2p/ should automatically compile each of them into its corresponding executable file.
2. Create a tar ball of the complete source tree as shown in Figure 1. Name the tar ball “p2p-{username}.tar.gz”. Example:

    > tar zcvf p2p-krmo5621.tar.gz p2p/

3. Fill in the form at the back of this compendium. Do not forget to provide the path to the source code in your home directory. Also make sure that the tar ball has the right permissions, i.e., it must be readable by others.
4. Hand in the form and a printed copy of your answers to the questions at the location your examiner specifies.

If you do not follow these instructions your assignment will automatically be handed back to you without corrections.

4 Marking Guidelines

Laboratory results are allocated according to the following qualitative guidelines:

Godkänd (G, pass): Your code functions correctly and you have answered all the related questions in a satisfactory manner.
Komplettering (K, supplement required): The code or answers need further work to address shortcomings or implementation bugs revealed during the tests associated with marking.
Underkänd (U, fail): The code and/or answers have not been completed to the satisfaction of the marker, and the time limit for handing in material (generally one year from the date when the work was due) has expired.

Peer to Peer Instant Messaging

Name(s):
Login Name:
Email:
Course and Course Code:
Path to code: (e.g., /home/gujo8932/p2p-gujo8932.tar.gz)

Date:

Signature: