Characteristics of IP traffic in commercial wide area networks

Nadia Ben Azzouna and Fabrice Guillemin
France Telecom R&D, 2, Avenue Pierre Marzin, 22300 Lannion, France

ABSTRACT
Measurements from an Internet backbone link carrying TCP traffic towards different ADSL areas are analyzed in this paper. For traffic analysis, we adopt a flow-based approach and the popular mice/elephants dichotomy. The originality of the experimental data reported in this paper, when compared with previous measurements from very high speed backbone links, is that commercial traffic comprises a significant percentage due to peer-to-peer applications. This kind of traffic exhibits some remarkable properties in terms of mice, elephants and bit rates, which are thoroughly described in this paper. Mice due to p2p protocols and mice due to classical Internet applications such as HTTP, ftp, etc. are analyzed separately. It turns out that by adopting a suitable level of aggregation, global traffic can be described by means of usual tele-traffic models based on M/G/∞ queues with Weibullian service times. The global bit rate can be approximated by the superposition of Gaussian processes perturbed by a white noise and does not exhibit long range dependence.

Key words: ADSL, peer-to-peer, power spectrum, M/G/∞ queue

1. INTRODUCTION
Characterization of traffic is a critical issue for telecommunication network operators: it is indeed of prime importance to know traffic in order to plan the evolution of a network in terms of transmission capacities (network planning) as well as to estimate the quality of service offered to end users by the current network infrastructure (traffic management). The emergence of the Internet as a service integration network raises difficult problems in many respects.
From a traffic management point of view, the new challenges brought up by the Internet are the heterogeneity of bit rates, the connectionless transfer mode, the elasticity of a large part of traffic, and the rapid evolution of the usage by customers.

A huge amount of work has been devoted in the past ten years to characterizing and modeling Internet traffic. The self-similarity property observed in local area traffic [8, 10] has greatly influenced the theoretical developments in the domain of Internet traffic characterization. This has been further supported by the fact that the classical Poisson modeling assumption at packet level has been shown to fail in the case of wide area Internet traffic by Paxson and Floyd [13]. Different hypotheses and assumptions have been explored to explain why and how Internet traffic should be self-similar (see for instance [6, 7]). One possible cause is related to the oscillations of TCP reacting to packet loss, giving rise to highly variable traffic.

∗ This work has been partially supported by the project Metropolis funded by the French network for research in telecommunications (RNRT).

Another key characteristic of Internet traffic is long range dependence. This phenomenon was observed, maybe for the first time in the networking community, by Garrett and Willinger [9] in the study of variable bit rate video traffic. Long range dependence and self-similarity are two distinct concepts, but the former property is enjoyed by self-similar processes, which are then good candidates to model traffic when long range dependence is observed. This has motivated their introduction in network traffic modeling (see [11]). Digital signal processing techniques [1] can be used to estimate their parameters.

It is also worth noting that other techniques have been proposed in the literature to model Internet traffic. In [3], the authors try to approximate real traffic traces by means of a Poisson shot noise process. Different shot functions are tested in order to give the best fit with real traces. Other methods based on the versatile Markov Arrival Process (i.e., a Poisson process with an intensity modulated by a Markov chain) can also be used to approximate real traffic traces. The key problem in this last approach, however, is the rapid explosion of the number of modulating states.

The studies cited above on the self-similar nature of traffic have been carried out in the context of local area networks, but the situation is quite different for wide area networks, which are rapidly evolving with the steady increase of link transmission capacities and the emergence of new usage. On a Gigabit per second (Gbps) link, flows are highly aggregated and packets from different flows are interleaved. In this context, it can be shown under reasonable assumptions that the packet arrival process tends to become Poisson; see [4] for instance. In this paper, we analyze traffic measurements from a 1 Gbit/s high speed link carrying commercial traffic.
The key difference between this type of traffic and Ethernet traffic is that the prevalent part of global traffic is due to peer-to-peer (p2p) applications. Their operating mode greatly impacts the characteristics of traffic. In order to analyze this impact, we adopt in this paper a flow-based approach by distinguishing between short and long flows. The basic motivation for introducing this distinction is that these two types of flows are expected to have a different networking behavior: long flows controlled by TCP are likely to react to network congestion through the congestion avoidance regime of TCP, while short flows do not leave, or only barely leave, the slow start regime and are thus less sensitive to the bandwidth sharing imposed by TCP, even though they may also react to loss by reducing the congestion window to the minimum threshold (e.g., one packet). The primary goal of this paper is to investigate the impact of p2p applications on traffic characteristics.

The organization of this paper is as follows: Basic definitions are given in Section 2. Mouse and elephant bit rates are analyzed in Sections 3 and 4, respectively. The spectral characteristics at low frequencies of the global bit rate are investigated in Section 5. Finally, some concluding remarks are presented in Section 6.

2. FORMULATION
In this paper, we consider measurements from a 1 Gbit/s link of the France Telecom IP backbone network. We observe TCP traffic flowing from the backbone network towards several ADSL areas. Traffic is mainly due to ADSL customers and is thus quite different from the LAN or Tier One traffic usually analyzed in the technical literature [3, 5, 10, 15]. The total load of the link (including TCP and UDP traffic) is about 43.5%.

Traffic is observed by means of a measurement device, which copies the headers of both TCP segments and IP packets. We are thus able to identify those packets with the same 5-tuple composed of the source IP address, destination IP address, source port, destination port and protocol type. Packets with the same 5-tuple are said to belong to the same flow. The traffic trace analyzed below was captured in October 2003 between 9:00 pm and 11:00 pm, which usually corresponds to a period of quite high activity by ADSL customers.

In the following, we evaluate the "instantaneous" bit rate by computing the number of bits arriving in time intervals of length ∆ = 100 ms. Let $X_n$ denote the bit rate evaluated over the nth time interval. The goal of this paper is to analyze the properties of the process $\{X_n\}$, which can also be seen as a time series, assumed to be stationary in the wide sense and characterized by its mean and its spectral density $\psi(x)$ defined by

$$\mathrm{cov}(X_n, X_{n+k}) = \int_{-\pi}^{\pi} e^{ikx}\,\psi(x)\,dx,$$

where $\mathrm{cov}(X_n, X_{n+k})$ is the covariance of the random variables $X_n$ and $X_{n+k}$.

In a first step, we observe the composition of traffic per application by analyzing port numbers; results are given in Table 1. It immediately appears that a significant part of traffic is due to p2p applications. In fact, the figures reported in Table 1 are based upon port numbers, but current p2p protocols use dynamic port numbers and a large proportion of the traffic labelled "others" in Table 1 is certainly due to p2p protocols. The total contribution of p2p traffic is certainly closer to 80%, as usually observed by analyzing the application layer.

            Application               Percentage
  non p2p   http                      14.5
            ftp                        1.5
            nntp                       1.0
            others                    26.7
            Total non p2p traffic     43.7
  p2p       Edonkey                   50.6
            Kazaa & Morpheus           3.8
            Napster                    1.5
            Gnutella                   0.4
            Total p2p traffic         56.3

Table 1: Composition of ADSL traffic per application.

The second observation is that flows are very small: most of them comprise less than 10,000 bytes, as shown in Figure 1. This is in line with the usual observation that only a small fraction of flows generate the majority of traffic (only one out of 1000 flows is larger than 10,000 bytes). This shows the importance of the concept of flow when analyzing traffic: short and long flows do not have the same impact on the bit rate. In fact, it can be checked that flows with less than 10,000 bytes contribute only 6% of the total load. If we want to describe the bit rate process, we have to concentrate on long flows, which give rise to both the majority of traffic and the long term correlation structure in the bit rate process; short flows appear more or less as noise and generate short term correlations.

Figure 1: Survival function (complementary distribution function) of the flow size in bytes (both axes logarithmic).
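The flow definition and the bit rate series $X_n$ introduced in this section can be illustrated with a minimal Python sketch. The packet record layout (dictionaries with the keys `time`, `size`, `src`, `dst`, `sport`, `dport`, `proto`) and the function names are assumptions made for this illustration; they are not taken from the measurement device used for the trace.

```python
from collections import defaultdict

DELTA = 0.1  # interval length in seconds (Delta = 100 ms in the text)


def flow_key(pkt):
    """5-tuple identifying the flow a packet belongs to."""
    return (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])


def flow_sizes(packets):
    """Total number of bytes per flow, keyed by 5-tuple."""
    sizes = defaultdict(int)
    for pkt in packets:
        sizes[flow_key(pkt)] += pkt["size"]
    return dict(sizes)


def bit_rate_series(packets, delta=DELTA):
    """X_n: bits arriving in the nth interval of length delta, in bit/s.

    Intervals are counted from the timestamp of the earliest packet
    (a choice made for this sketch).
    """
    if not packets:
        return []
    t0 = min(p["time"] for p in packets)
    t1 = max(p["time"] for p in packets)
    n_bins = int((t1 - t0) / delta) + 1
    bits = [0] * n_bins
    for p in packets:
        bits[int((p["time"] - t0) / delta)] += 8 * p["size"]
    return [b / delta for b in bits]
```

Applied to a packet trace, `flow_sizes` yields the empirical flow size distribution of Figure 1 and `bit_rate_series` yields the series whose spectral properties are studied in the rest of the paper.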

In the technical literature, it is frequently argued that long range dependence is due to the transmission of very large amounts of data over a single TCP connection (see for instance [6]). This is typically the case of a file transfer via ftp. However, with the emergence of p2p protocols, the situation is changing. Indeed, most recent file sharing protocols rely on the segmentation of large files into small pieces (chunks), which can be downloaded by a peer in parallel (see [14] for more details). Thus, the principal cause of long range dependence is progressively disappearing as the proportion of p2p traffic in commercial wide area networks increases.

In order to describe the bit rate process, we use in the following a flow-based approach and a simple dichotomy [2]: short flows are composed of at most 20 packets and long flows comprise more than 20 packets; a short flow is terminated when no packets have been observed for a period of 5 seconds. This timer is introduced to concentrate on the transmission phase of flows. One could debate at length what mice and elephants are in the Internet [12]. In this paper, we call long flows "elephants" and short flows "mice", even though these terms may appear abusive to purists.

3. MODELING MOUSE TRAFFIC
In this section, we investigate the bit rate of mice. For this purpose, we make a distinction between those mice related to p2p protocols and the other mice due to usual Web applications. The basic motivation for this distinction is that p2p mice are generated by signaling or maintenance procedures in a p2p network. Those mice are likely to arrive in bursts. For instance, a peer searching for some content will send different mice to the hosts connected to the p2p network.
Introduction of macro-mice
In a first step, we analyze mice that are apparently not generated by p2p protocols, i.e., with port numbers different from 1214 (Kazaa), 4661-4662 (Edonkey), 6346 (Gnutella) and other p2p protocol port numbers. Note that the observation of port numbers does not suffice to be sure that some mice are not generated by p2p protocols. However, this seems to be sufficient to capture the global behavior of regular mice; this is why we discriminate mice only on the basis of port numbers.

The objective of this section is to describe the bit rate process of regular mice and to propose a probabilistic model approximating this process. To characterize the mouse arrival process, we compute the distribution of the number of mice active at an arbitrary instant; the arrival rate λ and the mean duration of mice are equal to λ ≈ 595 mice/second and E[S] = 2.4 seconds, respectively. If the arrival process of mice were Poisson, then the number of mice active at an arbitrary instant should be identical in distribution to the number N of customers in an M/G/∞ queue. In particular, the stationary distribution should be Poisson with mean λE[S], i.e.,

$$\Pr(N \ge n) = e^{-\lambda E[S]} \sum_{k \ge n} \frac{(\lambda E[S])^k}{k!}.$$

Figure 2(a) shows that this property is not satisfied and that the arrival process of mice is hence not Poisson.

Figure 2: Distribution of the number of active mice. Panels: (a) Mice; (b) Macro-mice. Curves: arbitrary instants, arrival instants, theoretical.

To overcome this problem, we note that mice are actually not independent. In reality, for a given destination IP address, a certain number of mice arrive close to one another, forming what we call in the following a macro-mouse. We specifically define a macro-mouse as a set of non p2p mice that have the same destination address and arrive within a rather short time interval, say, of length δ = 1 second; moreover, we impose that a macro-mouse comprises more than one packet. The stationary distributions of the numbers of macro-mice active at a given instant and at the arrival time of a macro-mouse are displayed in Figure 2(b). It turns out that these two experimental distributions are almost indistinguishable from each other and from the Poisson distribution with the same mean and arrival rate.

This is not sufficient to show that the arrival process is Poisson, but for the time being, we conjecture that this assumption is true. The complementary distribution function of the duration of macro-mice is displayed in Figure 3. Their inter-arrival time is exponential with mean $1/\lambda_m$, where $\lambda_m = 326.46$. The probability distribution of the duration $S_m$ of a macro-mouse can be well approximated by a two-parameter Weibullian distribution with scale parameter $\eta_m = 3.17$ and skew parameter $\beta_m = 0.87$, which means that

$$\Pr(S_m > x) \approx \exp\left(-(x/\eta_m)^{\beta_m}\right). \qquad (1)$$

The mean duration of a macro-mouse is $E[S_m] = 3.25$ seconds (the theoretical value is $\eta_m \Gamma(1 + 1/\beta_m) = 3.4$ s, where Γ is the Euler Gamma function). Finally, the mean number of mice in a macro-mouse is equal to 2.42.

Figure 3: Characteristics of non p2p macro-mice.
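The macro-mouse definition given above can be sketched in Python as follows. The text does not specify how the time interval of length δ is anchored; this sketch assumes that a mouse joins the current macro-mouse of its destination when it arrives at most δ seconds after the first mouse of that macro-mouse. The record layout (keys `dst`, `start`, `end`, `packets`) and the function name are hypothetical.

```python
from collections import defaultdict


def group_macro_mice(mice, delta=1.0):
    """Group mice sharing a destination address into macro-mice.

    Assumption of this sketch: the window of length `delta` starts at the
    arrival time of the first mouse of each macro-mouse.
    """
    by_dst = defaultdict(list)
    for m in mice:
        by_dst[m["dst"]].append(m)

    macro_mice = []
    for dst, group in by_dst.items():
        group.sort(key=lambda m: m["start"])
        current = None
        for m in group:
            if current is not None and m["start"] - current["start"] <= delta:
                # The mouse arrives inside the window: merge it.
                current["end"] = max(current["end"], m["end"])
                current["packets"] += m["packets"]
                current["mice"] += 1
            else:
                if current is not None:
                    macro_mice.append(current)
                current = {"dst": dst, "start": m["start"], "end": m["end"],
                           "packets": m["packets"], "mice": 1}
        if current is not None:
            macro_mice.append(current)

    # A macro-mouse must comprise more than one packet.
    return [mm for mm in macro_mice if mm["packets"] > 1]
```

Counting the macro-mice active at arbitrary instants and at arrival instants, and comparing both counts with a Poisson distribution of the same mean, reproduces the kind of check shown in Figure 2(b).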

For p2p mice, we adopt the same methodology by aggregating mice. At first glance, we may aggregate p2p mice according to their source address. Intuitively, this criterion corresponds to the fact that a member of a p2p network searching for some content sends requests to different nodes. But this level of aggregation is not sufficient because the process counting the aggregated p2p mice on the basis of their source address remains quite irregular. In fact, the search for content and the transmission of requests give rise to response messages from users and servers connected to the p2p network. Hence, a second level of aggregation consists of grouping the aggregated p2p mice on the basis of their destination address. This second level of aggregation yields p2p macro-mice, which are composed of p2p mice with the same IP source address and/or the same destination address and arriving in a time interval of δ = 1 second. Note that δ is a critical parameter since p2p mice are aggregated over time intervals of δ seconds, which should correspond to the time needed to send requests and get answers from the different members of a p2p network.

By using the same method as for regular mice, it can be shown that the p2p macro-mice arrive according to a Poisson process with rate $\lambda_\mu = 903.34$. Moreover, the macro-mice duration can be approximated by a two-parameter Weibullian distribution with skew parameter $\beta_\mu = 1.207$ and scale parameter $\eta_\mu = 6.36$; the mean value of the duration is $E[S_\mu] = 5.67$ s (which is close to the theoretical value $\eta_\mu \Gamma(1 + 1/\beta_\mu) = 5.97$ s). Finally, the mean number of p2p mice in a macro-mouse is equal to 4.31.

Bit rate process of mice
To describe the bit rate of mice, we use the macro-mice introduced in the previous section. Consider for instance the non p2p macro-mice. To analyze the bit rate created by those macro-mice, we adopt a fluid flow approach. More precisely, by neglecting discrete packet arrivals, we assume that the bit rate of a macro-mouse is constant and equal to the total number of bits divided by the duration of the macro-mouse. We then get the fluid approximation of the bit rate of the macro-mouse. The key point is that since the mean arrival rate of macro-mice is large, the fluid bit rate $\{\Lambda^m_t\}$ of macro-mice, defined by

$$\Lambda^m_t = \sum_{j \in A^m_t} Y^m_j, \qquad (2)$$

where $A^m_t$ is the set of macro-mice active at time t and $Y^m_j$ is the fluid bit rate of the jth macro-mouse, can be approximated in distribution by a Gaussian process, whose autocorrelation function is known explicitly [2]. The fluid bit rate over the nth time interval of length ∆ is defined by

$$\tilde{\Lambda}^m_n = \frac{1}{\Delta} \int_{n\Delta}^{(n+1)\Delta} \Lambda^m_s \, ds. \qquad (3)$$

Once we have computed the fluid bit rate process $\{\tilde{\Lambda}^m_n\}$, we can reasonably assume that discrete packet arrivals give rise to a white noise, since the number of packets is very large. Thus, the actual bit rate process $\{\bar{X}^m_n\}$ of macro-mice should be equal to the fluid bit rate process $\{\tilde{\Lambda}^m_n\}$ perturbed by a white noise. In other words, we should have the representation

$$\bar{X}^m_n = \tilde{\Lambda}^m_n + \sigma_m \varepsilon_n,$$

where $\{\varepsilon_n\}$ is a standard white noise.

Let us now consider the fluid bit rate process. Since the length ∆ of the integration interval in Eq. (3) is small, one may expect that $\tilde{\Lambda}^m_n \sim \Lambda^m_{n\Delta}$. It follows that the autocorrelation function $c_{\tilde{\Lambda}^m}(\ell)$ of the process $\{\tilde{\Lambda}^m_n\}$, defined by

$$c_{\tilde{\Lambda}^m}(\ell) = \mathrm{cov}[\tilde{\Lambda}^m_n, \tilde{\Lambda}^m_{n+\ell}] / \mathrm{var}[\tilde{\Lambda}^m_n],$$

should be close to $c_{\Lambda^m}(\ell\Delta)$, where $c_{\Lambda^m}(h)$ is the autocorrelation function of the process $\{\Lambda^m_t\}$, given by (see [2] for details)

$$c_{\Lambda^m}(h) = E[Y_m^2 (S_m - h)^+] / E[Y_m^2 S_m],$$

where $Y_m$ and $S_m$ denote the fluid bit rate and the duration of a macro-mouse, respectively.

Of course, the bit rate of a macro-mouse depends upon its duration. However, it is experimentally observed that $E[Y_m^2 \mid S_m]$ does not vary much with the duration $S_m$, so that we can assume that $E[Y_m^2 \mid S_m]$ is a constant, equal to $\kappa_m = 3.5\mathrm{e}8$. This constant is equal to $\kappa_\mu = 1.5\mathrm{e}6$ for p2p macro-mice. As a consequence, we have

$$c_{\Lambda^m}(h) \sim \frac{E[(S_m - h)^+]}{E[S_m]} = 1 - P\left(\frac{1}{\beta_m}, \left(\frac{h}{\eta_m}\right)^{\beta_m}\right), \qquad (4)$$

where we have taken into account the fact that the distribution of $S_m$ has the form given by Eq. (1) and $P(a, x)$ is the (regularized) incomplete Gamma function.

Finally, there remains a fraction of single-packet mice, which are not included in macro-mice. The resulting bit rate is very small (a few Kbit/s). In fact, this residual bit rate can be approximated by a white noise, represented as $\hat{X}^m_n = \hat{d}_m + \hat{\sigma}_m \varepsilon_n$, where $\{\varepsilon_n\}$ is a standard white noise ($\hat{d}_m = 3.3$ Kbit/s and $\hat{\sigma}_m = 526$ bit/s). The same phenomenon is observed for p2p mice; the residual bit rate process is a white noise with mean $\hat{d}_\mu = 3.22$ Kbit/s and standard deviation $\hat{\sigma}_\mu = 844.27$ bit/s.
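The Weibull means quoted in this section and the autocorrelation function of Eq. (4) can be checked numerically. The sketch below uses only the Python standard library; it evaluates E[(S − h)^+] by integrating the survival function of Eq. (1) with the midpoint rule instead of calling an incomplete Gamma routine. The function names, the truncation level (1e-12) and the number of integration steps are choices made for this illustration.

```python
import math


def weibull_survival(x, eta, beta):
    """Pr(S > x) = exp(-(x / eta) ** beta), as in Eq. (1)."""
    return math.exp(-((x / eta) ** beta)) if x > 0 else 1.0


def weibull_mean(eta, beta):
    """E[S] = eta * Gamma(1 + 1 / beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


def autocorrelation(h, eta, beta, steps=20000):
    """c(h) = E[(S - h)^+] / E[S], first equality of Eq. (4).

    E[(S - h)^+] is the integral of Pr(S > x) over x > h; it is
    approximated by the midpoint rule on [h, x_max], where x_max is the
    point at which the survival function drops to 1e-12.
    """
    x_max = eta * (-math.log(1e-12)) ** (1.0 / beta)
    if h >= x_max:
        return 0.0
    dx = (x_max - h) / steps
    integral = dx * sum(
        weibull_survival(h + (i + 0.5) * dx, eta, beta) for i in range(steps)
    )
    return integral / weibull_mean(eta, beta)
```

With the parameters reported for non p2p macro-mice, `weibull_mean(3.17, 0.87)` is about 3.40 s, and for p2p macro-mice `weibull_mean(6.36, 1.207)` is about 5.97 s, in agreement with the theoretical values quoted in the text.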

By taking into account the above results and assuming independence between the different white noise processes, we come to the conclusion that the non p2p mouse bit rate can be represented as

$$X^m_n = \tilde{\Lambda}^m_n + \hat{d}_m + \sqrt{\sigma_m^2 + \hat{\sigma}_m^2}\, \varepsilon_n, \qquad (5)$$

where $\{\varepsilon_n\}$ is a standard white noise and $\{\tilde{\Lambda}^m_n\}$ is related to the M/G/∞ queue according to Eqs. (2) and (3). This latter process can be well approximated by a Gaussian process with mean $E[\tilde{\Lambda}^m_n] = 5.551$ Mbit/s, standard deviation $\sigma_{\tilde{\Lambda}^m} = 711$ Kbit/s, and autocorrelation function given by Eq. (4).

To check the above representation for the bit rate process, we compute the spectral density $\psi_{X^m}$ of the time series $\{X^m_n\}$, for n = 0, 1, 2, .... The quantity $\mathrm{cov}[X^m_n, X^m_{n+\ell}]$ is computed by averaging $X^m_n X^m_{n+\ell}$ for n = 1, 2, ..., M, where M is the total number of samples. From Eq. (4), we have

$$\psi_{X^m}(x) \sim \frac{\sigma_m^2 + \hat{\sigma}_m^2}{2\pi} + \kappa_m \psi_{L^m}(x),$$

where $\psi_{L^m}$ is the spectral density associated with the process $\{L^m_{n\Delta}\}$, $L^m_t$ denoting the number of macro-mice active at time t. It then follows that for $x \in [-\pi, \pi]$, we should have

$$\psi_{X^m}(x/\Delta) \sim \frac{\sigma_m^2 + \hat{\sigma}_m^2}{2\pi} + \kappa_m \Delta \psi_m(x),$$

where $\psi_m$ is the spectral density of the process $\{L^m_t\}$ and is given by $\psi_m(x) = \psi(x; \lambda_m, \eta_m, \beta_m)$ with

$$\psi(x; \lambda, \eta, \beta) = \frac{\lambda}{\pi x} \int_0^\infty \sin(tx)\, e^{-(t/\eta)^\beta}\, dt. \qquad (6)$$

The comparison between $\psi_{X^m}(\cdot/\Delta)$, obtained by eliminating the white noise via a wavelet filter, and $\kappa_m \Delta \psi_m$ is illustrated in Figure 4. It clearly appears that the two spectral densities are very close to each other. This validates representation (5) for the bit rate of non p2p mice and in particular the Poisson assumption for the arrival process of macro-mice. The same properties hold for p2p mice (see [2]).
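The empirical side of this comparison can be sketched in Python with a sample autocovariance and a raw periodogram. This is only an illustration: the estimator below centers the series by its sample mean, which is an assumption about how the averaging is carried out, and it does not reproduce the wavelet filtering used to remove the white noise. The normalization by 2πM matches the convention of Section 2, in which the covariance is the integral of $e^{ikx}\psi(x)$ over $[-\pi, \pi]$.

```python
import cmath
import math


def sample_autocovariance(x, lag):
    """Average of (x[n] - mean) * (x[n + lag] - mean) over available n."""
    m = len(x) - lag
    if m <= 0:
        raise ValueError("lag must be smaller than the series length")
    mean = sum(x) / len(x)
    return sum((x[n] - mean) * (x[n + lag] - mean) for n in range(m)) / m


def periodogram(x, omega):
    """Raw periodogram of the centered series at angular frequency omega.

    I(omega) = |sum_n (x[n] - mean) * exp(-i * n * omega)|^2 / (2 * pi * M),
    where M is the number of samples.
    """
    mean = sum(x) / len(x)
    s = sum((v - mean) * cmath.exp(-1j * n * omega) for n, v in enumerate(x))
    return abs(s) ** 2 / (2.0 * math.pi * len(x))
```

For a bit rate series sampled every ∆ seconds, the angular frequency `omega` in [−π, π] corresponds to the physical frequency `omega / DELTA` in rad/s, which is the rescaling used in the formulas above.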

Figure 4: Comparison of spectral densities for mice. Curves: filtered bit rate, theoretical approximation.

4. MODELING ELEPHANT TRAFFIC
Different elephant types
In a first step, we draw attention to the fact that not all elephants are equivalent. Indeed, among all the elephants analyzed in the data collected, a significant number are essentially composed of ACK segments. On the one hand, these elephants indicate that the computers of some ADSL customers play the role of servers from which some files are retrieved by clients outside the corresponding ADSL area. This is in line with the observation that the usage of the Internet becomes more and more symmetrical as p2p applications massively spread over the whole Internet, the usual client/server scheme being progressively replaced with the peer-to-peer paradigm.

On the other hand, those elephants mainly composed of ACK segments offer a rather small bit rate (about 1% of global traffic). In fact, ACK segments do not contain very much information and, within a same flow, consecutive ACK segments are separated by rather long time periods, at least at the time scale of the 1 Gbit/s link. As a consequence, we shall assume in the following that the bit rate created by such elephants (referred to as ACK elephants) is close to a non-centered white noise process. Other elephants are referred to as data elephants in the following.

Data elephants
In this section, we examine the dynamics of elephants carrying data, that is, elephants with mean packet size greater than 80 bytes. Figure 5(a) shows the evolution of the sequence number of a particular elephant.

Figure 5: Temporal evolution of the sequence number of some data elephants, as a function of time (in seconds). Panels: (a) Well-behaved elephant; (b) p2p elephant.

The behavior of the elephant in Figure 5(a) is in line with the classical assumption of a permanent TCP connection. However, such an elephant is an exception. In reality, most elephants are not continuously transmitting but are composed of bursts separated by time periods of low activity, in which only a few packets are transmitted. A typical example is shown in Figure 5(b), displaying the evolution of the sequence numbers of a p2p elephant.

The above observation leads us to decompose elephants into mini-elephants and mice (which should not be confused with the other mice introduced in the previous section). A mini-elephant is a group of more than 20 packets and is terminated when no packets have been observed for a time period of 20 seconds. A mouse is a group of less than 20 packets, which do not belong to mini-elephants. In order to describe the bit rate, we have to concentrate on the transmission phases of elephants. This is the basic motivation for introducing the concept of mini-elephants. Time periods of low activity do not significantly contribute to the global bit rate, so that the contribution of mice can be neglected.

Characteristics of mini-elephants
The complementary distribution function of the duration $S_e$ of mini-elephants is displayed in Figure 6. It turns out that the experimental probability distribution function can be well approximated by a two-parameter Weibullian distribution (see Eq. (1)). Taking a scale parameter $\eta_e = 64.04$ seconds and a skew parameter $\beta_e = 0.4$ yields a good fit of the empirical distribution. Moreover, the experimental mean of $S_e$, $E[S_e] = 200$ seconds, is close to the theoretical value $\eta_e \Gamma(1 + 1/\beta_e) = 212$ seconds.

Figure 6: Complementary probability distribution function of the duration of mini-elephants (in seconds). Curves: empirical distribution, approximation.
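The decomposition of an elephant into mini-elephants and mice can be sketched in Python as follows. The input is assumed to be the list of packet arrival times (in seconds) of a single elephant; the function name and this input format are hypothetical. The text leaves the case of a group with exactly 20 packets open; this sketch counts such a group as a mouse.

```python
def split_elephant(packet_times, idle_timeout=20.0, min_packets=20):
    """Split one elephant's packet arrival times into mini-elephants and mice.

    A new group starts whenever the gap since the previous packet exceeds
    `idle_timeout` seconds. Groups with more than `min_packets` packets are
    mini-elephants; all other groups are treated as mice (assumption for
    groups of exactly `min_packets` packets).
    """
    groups = []
    for t in sorted(packet_times):
        if groups and t - groups[-1][-1] <= idle_timeout:
            groups[-1].append(t)
        else:
            groups.append([t])
    mini_elephants = [g for g in groups if len(g) > min_packets]
    mice = [g for g in groups if len(g) <= min_packets]
    return mini_elephants, mice
```

The duration of each mini-elephant is then the difference between the last and first timestamps of its group, which is the quantity whose distribution is fitted by a Weibullian law.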

Note that the fact that the duration of mini-elephants can be approximated by a Weibullian distribution indicates that mini-elephants do not spread over very long time periods, as is usually the case for data transfers causing long range dependence. It is nevertheless worth noting that the skew parameter $\beta_e = 0.4$ is much smaller than 1 and that mini-elephants are quite stretched, giving rise to the predominant component of the power spectrum of the global bit rate at low frequencies, as shown in the following.

As for mice, the squared fluid bit rate as a function of the duration $S_e$ can be approximated by a constant, equal to $\kappa_e = 1\mathrm{e}9$, corresponding to a bit rate of about 30 Kbit/s. Such small bit rates are typical of p2p applications. Indeed, chunks are retrieved from servers, which might be computers of customers with limited CPU capacities, leading to small bit rates for the corresponding TCP connections. The same phenomenon has been observed in [14] in a totally different setting.

5. SYNTHESIS OF THE GLOBAL BIT RATE
By using the decomposition of traffic introduced in the previous sections, we can represent the global bit rate process $\{X_n\}$ as

$$X_n = X^e_n + X^m_n + e_n,$$

where $\{X^e_n\}$ is the bit rate of mini-elephants, $\{X^m_n\}$ is the bit rate of mice and $\{e_n\}$ is a white noise due to ACK elephants. Now, mice have short durations (see Section 3) and give rise to short term variations of the global bit rate process; their impact on the power spectrum at low frequencies is negligible. By ignoring the noise component, which contributes only through the addition of a constant, the power spectrum of the process $\{X_n\}$ is essentially due to mini-elephants.

To model the bit rate of mini-elephants, we use the same method as for mice and the fact that the squared fluid bit rate does not depend very much on the duration and can be approximated by a constant $\kappa_e = 1\mathrm{e}9$. Then, we eliminate the white noise in the process $\{X_n\}$, by using for instance a wavelet filter (see [2] for details), to obtain the filtered process $\{\bar{X}_n\}$; the white noise is due to ACK elephants and discrete packet arrivals. If all the assumptions and approximations made so far are valid, the spectral density $\psi(x)$ of the process $\{\bar{X}_n\}$ should satisfy, for small x (i.e., low frequencies),

$$\psi(x/\Delta) \approx \Delta \kappa_e\, \psi(x; \lambda_e, \eta_e, \beta_e), \qquad (7)$$

where $\psi(x; \lambda, \eta, \beta)$ is defined by Eq. (6). Approximation (7) is illustrated in Figure 7, which shows that this approximation is in good agreement with experimental data at low frequencies. This figure also displays the power spectrum of the fluid bit rate process $\{\Lambda_t\}$, which is very close to that of the global bit rate for all frequencies. This justifies a posteriori all the assumptions made so far. In particular, the power spectrum does not exhibit a singularity near the origin and thus, there is no evidence for long range dependence in the global bit rate.
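Evaluating the right-hand side of approximation (7) requires the function ψ(x; λ, η, β) of Eq. (6), which has no simple closed form for a general β. The Python sketch below approximates the integral with the composite trapezoidal rule on a truncated interval. The truncation level, the step-size rule and the function name are choices made for this illustration, not the numerical method used to produce Figure 7.

```python
import math


def spectral_density(x, lam, eta, beta, points_per_scale=200, tail=1e-12):
    """psi(x; lam, eta, beta) of Eq. (6), for x > 0.

    psi = lam / (pi * x) * integral over t >= 0 of
          sin(t * x) * exp(-(t / eta) ** beta).
    The integral is truncated at t_max, where the exponential factor equals
    `tail`, and the step resolves both the decay scale eta and the
    oscillation period 2 * pi / x.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    t_max = eta * (-math.log(tail)) ** (1.0 / beta)
    step = min(eta, 2.0 * math.pi / x) / points_per_scale
    n = int(t_max / step) + 1
    h = t_max / n
    # Trapezoidal rule: the integrand vanishes at t = 0, so only the
    # upper end point receives the weight 1/2.
    total = 0.5 * math.sin(t_max * x) * math.exp(-((t_max / eta) ** beta))
    for i in range(1, n):
        t = i * h
        total += math.sin(t * x) * math.exp(-((t / eta) ** beta))
    return lam / (math.pi * x) * total * h
```

For β = 1 (exponential durations) the integral has the closed form x η² / (1 + x² η²), so that ψ = λ η² / (π (1 + x² η²)), which provides a convenient sanity check. The arrival rate $\lambda_e$ of mini-elephants is not reported in this paper, so any numerical evaluation of (7) requires a value estimated from the trace.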

Figure 7: Periodogram at low frequencies of the filtered bit rate process $\{\bar{X}_n\}$ and the fluid bit rate $\{\tilde{\Lambda}_n\}$, together with approximation (7). Curves: mini-elephants fluid bit rate, TCP filtered bit rate, theoretical approximation.

6. CONCLUSION
We have analyzed in this paper the impact of peer-to-peer protocols on the properties of traffic in commercial wide area networks by introducing a decomposition into different components, which are relevant from a networking point of view. The way p2p protocols operate, in particular the segmentation of large files into chunks of moderate size retrieved by a customer via several TCP connections, possibly opened in parallel, and the small bit rate achieved by each TCP connection, greatly impacts the properties of traffic. Commercial traffic is much smoother than Ethernet traffic in local area networks.

Finally, we come to the conclusion that the classical client/server paradigm, which was in force up to a few years ago, is progressively disappearing with the emergence of p2p protocols. One can already observe a certain symmetric usage of the Internet resources, showing that the peer-to-peer paradigm is emerging in commercial traffic. In the future, the migration of p2p traffic to UDP with the emergence of games may radically change the situation, making traffic more aggressive for the network.

7. REFERENCES
[1] P. Abry and D. Veitch. Wavelet analysis of long range dependent traffic. IEEE Trans. Information Theory, 44(1):2–15, January 1998.
[2] N. Ben Azzouna, F. Clérot, C. Fricker, and F. Guillemin. Modeling ADSL traffic on an IP backbone link. Annals of Telecommunications, to appear, 2004.
[3] C. Barakat, P. Thiran, G. Iannaccone, C. Diot, and P. Owezarski. A flow-based model for Internet backbone traffic. In Proc. ACM SIGCOMM Internet Measurement Workshop, Marseille, November 2002.
[4] J. Cao and K. Ramanan. A Poisson limit for buffer overflow probabilities. In Proc. Infocom 2002, New York, 2002.
[5] K. Claffy, G. Miller, and K. Thompson. The nature of the beast: Recent traffic measurement from an Internet backbone. In Proc. of Inet, 1998.
[6] M. Crovella and A. Bestavros. Self-similarity in world wide web. Evidence and possible causes. IEEE/ACM Trans. on Networking, pages 835–846, December 1997.
[7] A. Feldmann, A.C. Gilbert, W. Willinger, and T. Kurtz. The changing nature of network traffic: Scaling phenomena. In Computer Communication Review, volume 28:5–19, 1998.
[8] H.J. Fowler and W.E. Leland. Local area network traffic characteristics, with implications for broadband network congestion management. IEEE J. Sel. Areas in Commun., 9(7):1139–1149, 1994.
[9] M. Garrett and W. Willinger. Analysis, modeling and generation of self-similar VBR video traffic. In Proc. Sigcomm, London, England, 1994.
[10] W. Leland, M. Taqqu, W. Willinger, and D. Wilson. On the self-similar nature of Ethernet traffic. IEEE/ACM Trans. Net., pages 1–15, 1994.
[11] I. Norros. On the use of fractional Brownian motion in the theory of connectionless networks. IEEE J. Sel. Areas Commun., 13(6), August 1995.
[12] K. Papagiannaki, N. Taft, S. Bhattacharyya, P. Thiran, K. Salamatian, and C. Diot. On the feasibility of identifying elephants in Internet backbone traffic. Technical Report TR01-ATL-110918, Sprint Labs, Sprint ATL, November 2001.
[13] V. Paxson and S. Floyd. Wide area traffic: The failure of the Poisson assumption. IEEE/ACM Trans. on Networking, pages 226–244, 1995.
[14] K. Tutschku and P. Tran-Gia. A traffic profile of the eDonkey filesharing service. Technical report, COST 279TD(03)049, 2003.
[15] Z.-L. Zhang, V. Ribeiro, S. Moon, and C. Diot. Small time scaling behavior of Internet backbone traffic: An empirical study. In Proc. Infocom 2003, 2003.
