An Implementation of IP Traceback in IPv6 Using Probabilistic Packet Marking

Emil Albright and Xuan-Hien Dang
Department of Computer Science
University of Akron
Akron, OH, U.S.A.

Abstract - Lack of source authentication in the IP protocol encourages denial-of-service attacks. The open and trusting nature of the protocol makes it difficult to identify an attacker who chooses to spoof the source address. Probabilistic Packet Marking is an IP traceback approach that seeks to identify attackers by marking individual packets with some portion of the attack path, relying on the volume of attack traffic to ensure that the whole path can be retrieved. In this paper, we take the probabilistic packet marking scheme, initially devised for IPv4, and determine its viability under the next-generation Internet Protocol, IPv6. We demonstrate how the Flow Label field in the IPv6 datagram header can be used effectively to implement the scheme. We present simulation results showing the applicability and efficiency of the approach.

Keywords: IP traceback, IPv6, probabilistic packet marking.

1. Introduction

A major problem facing computer networks today comes in the form of Denial-of-Service (DoS) attacks. These attacks direct massive floods of apparently legitimate messages to a victim network. Processing these packets consumes much, if not all, of the victim's resources, and thus denies access to the intended users. Given the structure of the Internet, it is difficult if not impossible to prevent such an attack, yet relatively easy to mount one. This ease aggravates the problem, as attackers typically launch an attack from many machines simultaneously in a Distributed Denial-of-Service (DDoS) attack. Such attacks generally involve hundreds or thousands of hosts working in tandem, which makes it very difficult to hold the initiator responsible.

If the initiators of an attack are to be held responsible, they must first be identified. While in theory every IP packet contains the address of its originator, there is no way to ensure that an attacker has not inserted a false address into the Source Address field. Furthermore, not all Internet routers are capable of discerning the immediate origin of a packet (i.e., which of their ports received it), let alone its actual provenance. The task of identifying an IP packet's originator has been referred to as the IP traceback problem, and to date no clear solution exists. A variety of approximate solutions have been proposed, but they are intended for Internet Protocol version 4 (IPv4) [1-6].

However, IP version 6 (IPv6) was developed to accommodate the explosive growth of the Internet and to gradually replace IPv4. Even though we are still in the early stages of IPv6 deployment, most operating systems and network devices (routers) are already IPv6-capable, and many applications have been ported to IPv6. In this paper, we consider one of the promising techniques for IP traceback, known as Probabilistic Packet Marking (PPM) [2], and propose to port its implementation to IPv6. Section 2 presents the history of this method and related work. The details of our implementation in IPv6 are presented in Section 3, and an analysis of its performance follows in Section 4. Finally, Section 5 offers future work and conclusions.

2. Background and Related Work

Given the stateless nature of the Internet, it is very difficult to ascertain the origin of an IP packet. As was noted as early as 1985 [1], the IP protocol provides no real means of authenticating packet origins, and thus operates essentially on trust when dealing with inter-network traffic. Since a software user can easily modify the IP header fields, it was observed that solving this problem would need to involve the network hardware itself.

One such solution, known as probabilistic packet marking, was proposed by Savage et al. [2]. Based on the use of edge sampling for attack path reconstruction, it calls for routers to mark packets with some identifying information. Because the overhead of adding the full address path to each packet is excessive, the scheme dictates that each router add only its own address, or the edge between itself and its downstream neighbor. To minimize space, only one such mark is allowed per packet, and each router probabilistically decides whether to overwrite it. Furthermore, because the mark is stored in the 16-bit identification field of the IPv4 datagram header, space is limited, and each edge in the address path is split into eight fragments of the hash-compressed edge. Each subsequent router that receives a packet increments a distance field, which is also stored in the identification field. When a DoS victim seeks to reconstruct the address of its attacker, it recombines the fragments at each distance to eventually generate a full attack tree.

Several criticisms are evident. First, it is impossible to use this technique to identify the origin of a single packet; in fact, it generally requires thousands of packets to construct an attack tree. However, when the technique is used as a DoS deterrent, thousands or millions of packets are received in a very short period of time, so this limitation is insignificant. A more troubling limitation is that the fragmented nature of the edge markings makes the scheme very vulnerable to false packets inserted by an attacker [3]. This points to a general shortcoming of PPM: the attacker's ability to insert false packets into the traffic allows it to make the attack appear to arrive from a source further upstream. While measures can be taken to reduce this problem, PPM can only be guaranteed to produce a valid attack suffix, since the attacker cannot interfere with the marking performed by downstream routers [1]. A final set of criticisms is that the fragmented edge marking scheme is slow to reconstruct and can generate many false positives.

Song and Perrig [4] propose the Advanced Marking Scheme (AMS) to overcome these last problems. AMS seeks to improve both accuracy and reconstruction time by assuming some knowledge of Internet topology. If the victim possesses a map of its upstream routers, edges no longer need to be fragmented, because the smaller amount of stored information can be compared against the map to reconstruct the attack path. Path reconstruction is much faster, the number of false positives is much lower, and fewer packets are needed for reconstruction. These factors become particularly significant in a DDoS attack, where the far higher volume of attack traffic generates so many marked packets that fragment marking becomes unacceptably slow and inaccurate. In AMS, a router that chooses to mark a packet writes the hash of its address into the allotted space, and its downstream neighbor XORs the hash of its own address with this value. Because this XOR of hashes can be calculated for each edge of the map at the packet's marked distance, reconstruction is relatively fast and accurate. In fact, since the edge need not be fragmented, AMS can use eleven bits of the identification field to store the edge, whereas fragment marking must devote three of these bits to its fragment offset. Further refinements, discussed in Section 5, are also possible. It is worth noting that AMS still shares PPM's general shortcoming regarding precision: it cannot distinguish between an address path and its valid attack suffix. It has also been observed that the overhead of maintaining an accurate upstream router map may be prohibitive [3], but standard tools exist, and maps need only be obtained after an attack to be useful [4].

3. Implementation

Most of the literature on IP traceback, like the methods described above, has been specified for IPv4. Those studies that have addressed IPv6 have either focused on logging-based traceback [5] or addressed the matter only in passing [2]. Given the dearth of work on IPv6, we chose to study the practicality of applying a PPM algorithm designed for IPv4 to IPv6. For this preliminary study, we chose to implement an algorithm based on Song and Perrig's simple AMS. To implement an IPv4 packet marking algorithm in IPv6, it is necessary to find a field within the IPv6 header that can be overloaded in the manner of the IPv4 identification field. Savage et al. [2] suggest that the Flow Label field, depicted in Figure 1, is the best candidate.

Figure 1: Standard IPv6 header

According to the specifications [7,8], the Flow Label field in the IPv6 datagram header is a 20-bit field denoting a sequence of packets flowing from one source address to a specific destination or destinations. Specific requirements for its usage are not yet finalized. A router that does not support flow labeling must set the field to zero in packets it originates and leave the Flow Label of packets it forwards unchanged. Even a router that supports Flow Labels is required to leave the Flow Label unchanged. Thus, in principle, a PPM packet with an overloaded Flow Label field should be safe from corruption if it passes through a router that does not support the marking scheme. Furthermore, if an overloaded packet is passed to a router that does support flow-specific treatment, it will be given default treatment unless its Flow Label matches that of an existing flow and it also shares that flow's source and destination addresses. Therefore, there is little risk of inadvertently disrupting such a router's service.

Using the Flow Label field immediately affords us four extra bits compared to the 16-bit IPv4 identification field. However, IPv6 addresses are 128 bits long instead of 32. Furthermore, when we obtained the trace route data [9] used to generate our upstream router map and attacker addresses, we found that we needed 6 bits for the Distance field instead of 5, since a 5-bit field can represent distances only up to 31. Although we located only 29 addresses with path lengths of 32 or higher out of the trace route's total of 135629, we felt it was better to err on the side of caution. Thus, we replace the 20-bit Flow Label with a 14-bit Edge Hash field and a 6-bit Distance field, as shown in Figure 2.

Figure 2: Overloaded IPv6 header

Given this structure, we proceeded to generate a topological map of the Internet routers upstream from a single source point. This map consisted of a file containing 135629 distinct addresses and 37 files containing all distinct links between two levels. Our simulator was written in Java 1.4.2 and consisted of multiple threads, each representing a single attacker. When an attack was simulated, each attacker was given an address chosen at random and produced 20000 packets to be passed downstream to the victim. Each router R marked a packet P with a fixed probability of 0.03. If the packet was marked, the address of R was fed into a 14-bit hash function, the result was stored in the Edge Hash field, and the Distance field was cleared. The downstream node either chose to overwrite the prior mark or XORed the hash of its own address with the stored value and wrote the result back to the Edge Hash field. In all cases where a node did not initiate the marking process, the Distance field was incremented. Finally, the full attack path was stored for reference.

When the attack was complete, the victim reconstructed the attack tree. First, the upstream map M was loaded. Next, each packet p that the attack simulator had written to P was read in and sorted by level (distance) into the set S, eliminating duplicates. With these two structures created, reconstruction of the attack tree T began. For each distance, the value of each packet's Edge Hash field, i.e., the XOR of two hashed addresses, was compared to the hashes of that level's edges. If the values matched, the edge was added to the attack tree. Figure 3 shows a detailed description of the algorithm.
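The marking and reconstruction steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Java simulator. The paper does not name its 14-bit hash function (a truncated SHA-256 is used here as a hypothetical stand-in), does not give the bit order of the Edge Hash and Distance fields within the Flow Label (Edge Hash in the high 14 bits is assumed), and does not say how unmarked packets are initialized (this sketch starts every packet at distance 63 and treats that value as "unmarked"). Reconstruction follows Song and Perrig's approach of testing only those map edges whose downstream end is already in the tree. The names `h14`, `pack`, `unpack`, `forward`, and `reconstruct` are hypothetical.

```python
import hashlib
import random

EDGE_BITS = 14                    # Edge Hash width (from the paper)
DIST_BITS = 6                     # Distance width (from the paper)
EDGE_MASK = (1 << EDGE_BITS) - 1
DIST_MAX = (1 << DIST_BITS) - 1   # 63; also used here as an "unmarked" sentinel (assumption)
MARK_PROB = 0.03                  # marking probability used in the paper's simulation


def h14(addr):
    """Hypothetical 14-bit address hash (the paper does not name its hash)."""
    digest = hashlib.sha256(addr.encode()).digest()
    return int.from_bytes(digest[:2], "big") & EDGE_MASK


def pack(edge, dist):
    """Assumed bit order: Edge Hash in the high 14 bits, Distance in the low 6."""
    return ((edge & EDGE_MASK) << DIST_BITS) | (dist & DIST_MAX)


def unpack(label):
    """Split a 20-bit Flow Label into (edge_hash, distance)."""
    return label >> DIST_BITS, label & DIST_MAX


def forward(path, rng, p=MARK_PROB):
    """Send one packet along `path` (listed from the attacker side to the
    victim side) and return the Flow Label that arrives at the victim."""
    label = pack(0, DIST_MAX)              # assumption: start in the unmarked state
    for router in path:
        edge, dist = unpack(label)
        if rng.random() < p:
            edge, dist = h14(router), 0     # this router starts a new mark
        else:
            if dist == 0:
                edge ^= h14(router)         # downstream neighbor completes the edge
            dist = min(dist + 1, DIST_MAX)  # saturate instead of wrapping around
        label = pack(edge, dist)
    return label


def reconstruct(labels, upstream, victim):
    """Rebuild the attack tree T from the received labels.

    `upstream` (the map M) maps each node to its upstream neighbors.
    Returns a dict mapping each reconstructed router to its downstream parent.
    """
    marks = {}                              # distance -> set of edge hashes (the set S)
    for label in set(labels):               # eliminate duplicate labels
        edge, dist = unpack(label)
        marks.setdefault(dist, set()).add(edge)

    def node_hash(n):
        # The victim contributes nothing: distance-0 marks hold h(R) alone.
        return 0 if n == victim else h14(n)

    tree, frontier, dist = {}, [victim], 0
    while frontier:
        found = marks.get(dist, set())
        next_frontier = []
        for v in frontier:
            for u in upstream.get(v, []):
                # Candidate edge (u, v) at this distance: XOR of its endpoint hashes.
                if u not in tree and (node_hash(u) ^ node_hash(v)) in found:
                    tree[u] = v
                    next_frontier.append(u)
        frontier, dist = next_frontier, dist + 1
    return tree


if __name__ == "__main__":
    # Toy map rooted at victim "V"; X1 and X2 are decoy routers not on the path.
    rng = random.Random(2004)
    upstream = {"V": ["R1"], "R1": ["R2", "X1"], "R2": ["R3", "X2"]}
    attack_path = ["R3", "R2", "R1"]        # attacker side first
    labels = [forward(attack_path, rng) for _ in range(20000)]
    print(sorted(reconstruct(labels, upstream, "V")))
```

With 20000 packets and a marking probability of 0.03, each of the three routers on the attack path should appear in the returned tree. A decoy such as X1 is reported only if its 14-bit hash collides with that of a genuine router at the same distance, which is one way false positives arise in this kind of scheme.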

Figure 3: Marking scheme

4. Results and Analysis

Our goal was to ascertain the feasibility of applying existing IPv4 Probabilistic Packet Marking algorithms under IPv6. Our initial results suggest that these algorithms perform comparably well under IPv6. Our simulations and reconstructions were run under Windows XP on a 2.4 GHz Pentium 4 with 512 MB of RAM. We simulated attacks involving from 10 to 250 attackers in increments of 10, and from 250 to 500 attackers in increments of 50. At each attack size, we performed thirty separate attack simulations. We then performed reconstruction, recording the number of false positives generated and the time taken to reconstruct the attack tree, as shown in Figures 4 and 5, respectively. A false positive is defined as a path that does not take part in an attack but is reconstructed by the tracing mechanism [4].

Figure 4: False positives generated

Figure 5: Attack tree reconstruction time

For the attack sizes tested, our results are slightly superior to those reported by Song and Perrig [4] in terms of both false positives generated and tree reconstruction time. This is in keeping with our intuition, given the increased space available for storing our edge hashes. However, our reconstruction time appears slightly higher than expected. This is perhaps attributable to our non-optimized Java implementation, but regardless, it does not appear to represent a major deviation from AMS in IPv4.

5. Future Work and Conclusions

Our next step will be to perform further, larger simulations to determine whether our scheme continues to behave comparably to the IPv4 version. Subsequently, we will attempt to implement AMS II [4] to verify that the same gains in accuracy and performance can be achieved. Additionally, we would like to run tests in which a small portion of the packets received are not part of the attack tree, to gauge the effect of legitimate traffic during an attack.

In conclusion, we have reviewed the problems posed by the lack of source authentication in the IP protocol and considered their implications in the context of DoS and DDoS attacks. We have examined some methods proposed to overcome these problems via packet marking, along with the criticisms leveled at them. We then examined the IPv6 protocol and its Flow Label field, argued that this field could be safely overloaded, and proposed decomposing it into two fields for packet marking. Finally, we presented our simulation of IPv6 packet marking and concluded that it behaves comparably to Song and Perrig's IPv4 implementation.

References
[1] Morris, R. A Weakness in the 4.2 BSD Unix TCP/IP Software. AT&T Bell Labs, Computer Science Technical Report 117, 1985.
[2] Savage, S., Wetherall, D., Karlin, A., and Anderson, T. Network Support for IP Traceback. IEEE/ACM Transactions on Networking, Vol. 9(3), 2001, pp. 226-237.
[3] Waldvogel, M. GOSSIB vs. IP Traceback Rumors. Proc. 18th Ann. Computer Security Applications Conf. (ACSAC 2002), 2002, pp. 5-13.
[4] Song, D., and Perrig, A. Advanced and Authenticated Marking Schemes for IP Traceback. Proc. IEEE INFOCOM, Vol. 2, April 2001, pp. 878-886.
[5] Lee, H., Ma, M., Thing, V., and Xu, Y. On the Issues of IP Traceback for IPv6 and Mobile IPv6. Proc. 8th IEEE Intl. Symp. on Computers and Communication (ISCC'03), 2003, pp. 582-587.
[6] Kuznetsov, V., Simkin, A., and Sandstrom, H. An Evaluation of Different IP Traceback Approaches. Proc. 4th Intl. Conf. on Information and Communications Security, Singapore, December 9-12, 2002, pp. 37-48.
[7] Deering, S., and Hinden, R. Internet Protocol, Version 6 (IPv6) Specification. RFC 2460, 1998.
[8] Rajahalme, J., Conta, A., Carpenter, B., and Deering, S. IPv6 Flow Label Specification. RFC 3697, 2004.
[9] Internet Mapping Project. http://research.lumeta.com/ches/map/ Accessed October, 2004.