TCP CONGESTION CONTROL by SRINATH GOPALAN AND SURANJAN PRAMANIK
Table of Contents • Motivation • Terminology • Implementation Schemes • Simulation Results • References
Motivation
• Exponential increase in network demand
  - Rising packet loss rates
  - Low utilization and goodput
  - Potential for congestion collapse
• Need for end-to-end congestion control
  - To avoid congestion collapse
  - Fairness
  - As a tool for the application to better achieve its own goals, e.g. minimizing loss and delay and maximizing throughput
Congestion Control Before
• TCP in the early 80’s
  - TCP flow control to avoid overflowing the receiver’s buffer
  - TCP’s Go-Back-N retransmission
  - FIFO scheduling, drop-tail queue management
• A series of congestion collapses in 1986
  - Congestion collapse: paths clogged with unnecessarily retransmitted packets [Nagle 84]
Congestion Control Today
• TCP
  - Instrumental in preventing congestion collapse
  - Limits transmission rate at the source
  - Window-based rate control
    • Increased and decreased based on network feedback
    • Implicit congestion signal based on packet loss
    • Slow-start, Congestion avoidance, Fast-retransmit, Fast-recovery
    • Exponential backoff of the retransmit timer when a retransmitted packet is itself dropped
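The exponential backoff of the retransmit timer can be sketched as follows. This is only an illustration: the doubling factor is the commonly used one, while the function name and the 60-second cap are assumptions made for the example and do not come from the slides.

```python
def backed_off_rto(initial_rto, retransmissions, max_rto=60.0):
    """Return the retransmit timeout after a number of consecutive losses.

    Each time a retransmitted packet is itself dropped, the timer doubles.
    max_rto is an assumed upper bound, not a value taken from the slides.
    """
    return min(initial_rto * (2 ** retransmissions), max_rto)


# Timeout values (in seconds) for 0, 1, 2 and 3 consecutive retransmissions.
timeouts = [backed_off_rto(1.0, n) for n in range(4)]
```

Doubling the timer keeps a sender from repeatedly injecting retransmissions into a path that is already dropping them.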
Terminology
Sender Maximum Segment Size (SMSS) - The size of the largest segment that the sender can transmit.
Receiver Window (rwnd) - The most recently advertised receiver window.
Congestion Window (cwnd) - A TCP state variable which limits the amount of data a TCP can send.
Initial Window (IW) - Size of the sender’s congestion window after the three-way handshake is completed.
Terminology contd....
Flight Size - The amount of data that has been sent but not yet acknowledged.
Slow Start Threshold (ssthresh) - A TCP state variable that determines whether the slow start or the congestion avoidance algorithm is used.
Maximum Burst (maxburst) - A TCP state variable which limits the amount of data that can be sent after coming out of Fast Recovery.
TCP Congestion Control Mechanisms/Algorithms
• Basic control mechanism: sliding windows
• Modern TCP implementations contain a number of algorithms aimed at controlling network congestion while maintaining good user throughput
  - Slow Start
  - Congestion avoidance
  - Fast retransmit
  - Fast recovery
• TCP-Tahoe implements the first three algorithms
• TCP-Reno implements all four algorithms
Slow Start
• Why is slow start needed? With network conditions unknown, TCP needs to probe the network slowly to determine the available capacity
• Slow start is used at the beginning of a transfer or after a retransmission timeout
• TCP increments cwnd by at most SMSS bytes for each ACK received, so cwnd roughly doubles every round-trip time (exponential rather than additive increase)
• Slow start ends when cwnd > ssthresh or when congestion is observed
• On timeout: ssthresh = max(Flight Size/2, IW)
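The slow-start rules above can be sketched in a few lines. This is a simplified model rather than a real TCP stack: it assumes one ACK per full-sized segment and no losses, and the 1460-byte SMSS and all function names are made up for the illustration.

```python
SMSS = 1460  # assumed segment size in bytes, for illustration only


def slow_start_on_ack(cwnd, ssthresh, smss=SMSS):
    """Grow cwnd by one SMSS per ACK while slow start is still active."""
    if cwnd > ssthresh:
        return cwnd  # slow start has ended; congestion avoidance takes over
    return cwnd + smss


def ssthresh_on_timeout(flight_size, iw):
    """ssthresh = max(Flight Size/2, IW), as stated on the slide."""
    return max(flight_size // 2, iw)


def simulate_slow_start(iw, ssthresh, smss=SMSS, max_rtts=20):
    """Return cwnd at the start of each round trip until slow start ends."""
    cwnd = iw
    trace = [cwnd]
    for _ in range(max_rtts):
        if cwnd > ssthresh:
            break
        acks = cwnd // smss  # assumes one ACK per segment in flight
        for _ in range(acks):
            cwnd = slow_start_on_ack(cwnd, ssthresh, smss)
        trace.append(cwnd)
    return trace


trace = simulate_slow_start(iw=SMSS, ssthresh=8 * SMSS)
segments = [c // SMSS for c in trace]  # cwnd expressed in segments
```

Because every ACK adds one segment, a window of n segments produces n ACKs and so grows to 2n within one round trip, which is why the growth is exponential until cwnd passes ssthresh.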
[Figure: TCP without Slow-Start]
[Figure: TCP with Slow-Start]
Congestion Avoidance
• Starts when cwnd > ssthresh
• cwnd is incremented by at most one full-sized segment per round-trip time; on each ACK:
    cwnd += SMSS * SMSS / cwnd
• Stops when congestion is detected (timeout)
• The sender may send up to min(cwnd, rwnd)
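A minimal sketch of the congestion-avoidance update, assuming byte-counted windows and one ACK per full-sized segment. The function names are hypothetical, and rounding a zero increment up to one byte is an assumption of this sketch about integer arithmetic.

```python
def congestion_avoidance_on_ack(cwnd, smss):
    """Apply cwnd += SMSS * SMSS / cwnd using integer arithmetic."""
    increment = smss * smss // cwnd
    # Assumption: round a zero increment up to 1 byte so cwnd keeps growing.
    return cwnd + max(increment, 1)


def usable_window(cwnd, rwnd):
    """The sender is limited by the smaller of cwnd and rwnd."""
    return min(cwnd, rwnd)


def one_round_trip(cwnd, smss):
    """Process one window's worth of ACKs (one ACK per segment assumed)."""
    for _ in range(cwnd // smss):
        cwnd = congestion_avoidance_on_ack(cwnd, smss)
    return cwnd


start = 10 * 1460
end = one_round_trip(start, 1460)
growth = end - start  # at most one SMSS per round trip
```

Each ACK contributes roughly SMSS/cwnd of a segment, so a full window of ACKs adds up to about one segment per round-trip time.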
Fast Retransmit
• TCP’s coarse-grained timeout is inefficient: it waits too long before retransmitting
• When the receiver gets out-of-order packets, it sends an ACK for the packet it expects next; the sender sees these as duplicate ACKs
• After 3 duplicate ACKs, the sender retransmits the first unacknowledged packet without waiting for the retransmit timeout
• Set ssthresh = max(Flight Size/2, IW) ----- (1)
• Set cwnd = ssthresh + 3*SMSS
Fast Recovery
• For each additional duplicate ACK, increase cwnd by SMSS
  - Slow start is not performed because each duplicate ACK indicates that a segment has left the network
• Transmit a segment if allowed by cwnd and rwnd
• When the next ACK acknowledges new data, set cwnd = ssthresh as in (1) and come out of fast recovery
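The fast retransmit and fast recovery steps can be put together as a small state machine. This is a sketch under simplifying assumptions (windows in bytes, no timeouts, no rwnd check); the class and attribute names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class RenoSender:
    smss: int
    iw: int
    cwnd: int
    ssthresh: int
    dup_acks: int = 0
    in_fast_recovery: bool = False
    retransmits: int = 0

    def on_dup_ack(self, flight_size):
        """Handle one duplicate ACK."""
        self.dup_acks += 1
        if self.dup_acks == 3 and not self.in_fast_recovery:
            # Fast retransmit: resend the first unacknowledged packet.
            self.retransmits += 1
            self.ssthresh = max(flight_size // 2, self.iw)  # equation (1)
            self.cwnd = self.ssthresh + 3 * self.smss
            self.in_fast_recovery = True
        elif self.in_fast_recovery:
            # Each further duplicate ACK means a segment left the network.
            self.cwnd += self.smss

    def on_new_ack(self):
        """Handle an ACK that acknowledges new data."""
        if self.in_fast_recovery:
            self.cwnd = self.ssthresh  # deflate the window and exit
            self.in_fast_recovery = False
        self.dup_acks = 0


sender = RenoSender(smss=1000, iw=1000, cwnd=10000, ssthresh=20000)
for _ in range(3):
    sender.on_dup_ack(flight_size=10000)
```

After the third duplicate ACK the sender holds ssthresh = 5000 and cwnd = 8000; a fourth duplicate ACK inflates cwnd to 9000, and the next ACK for new data sets cwnd back to 5000.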
Example of TCP Windowing
[Figure: congestion window over time, in RTTs, with the slow-start, congestion avoidance and fast retransmit/recovery phases marked; window-axis labels 1, 2, 4, W, W+1, 2W.]
TCP Tahoe
• First implementation with congestion avoidance mechanisms
• Used new algorithms: slow-start, congestion avoidance and fast retransmit
• Modified the RTT estimator used for setting retransmission timeout values
Disadvantages:
• May retransmit packets that have already been delivered successfully
TCP Reno
• Enhancement of TCP Tahoe
• Modified the Fast retransmit operation to include Fast recovery
• Prevents the pipe from going empty after Fast retransmit
• Avoids the need to slow-start as in TCP Tahoe
Disadvantages:
• Retransmits at most one dropped packet per RTT
• Suffers when multiple packets are dropped from a single window of data
Two States for TCP Reno
[State diagram: Regular to Fast Recovery on 3 duplicate ACKs; Fast Recovery to Regular when the ACK for the retransmitted packet is received.]
TCP SACK
Implementation
• Three duplicate ACKs are required to trigger Fast Recovery
• Reduce the congestion window by half; don’t slow-start
• Response to further duplicate ACKs
Main difference from Reno: the behavior when multiple packets are lost from a single window of data
Two States for TCP SACK
[State diagram: Regular to Fast Recovery on 3 duplicate ACKs; Fast Recovery to Regular on an ACK for everything sent before Fast Recovery.]
TCP SACK Header
[Figure: packet layout. IP header (20 bytes), TCP header (20 bytes), TCP options (max 40 bytes). The SACK option consists of Kind = 05, Length, then Left edge of Block 1, Right edge of Block 1, Left edge of Block 2, Right edge of Block 2.]
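A short worked calculation shows how many SACK blocks fit into the 40 bytes of TCP options. The sizes used here (one byte each for kind and length, two 32-bit sequence numbers per block) follow the SACK option format of RFC 2018, which is listed in the references; the function names and the 12-byte figure for a padded timestamp option are assumptions of this sketch.

```python
TCP_OPTIONS_MAX = 40   # bytes available for all TCP options
SACK_FIXED_BYTES = 2   # kind (1 byte) + length (1 byte)
SACK_BLOCK_BYTES = 8   # left edge + right edge, 4 bytes each


def sack_option_length(n_blocks):
    """Total size in bytes of a SACK option carrying n_blocks blocks."""
    return SACK_FIXED_BYTES + SACK_BLOCK_BYTES * n_blocks


def max_sack_blocks(other_option_bytes=0):
    """How many blocks fit once other options have taken their share."""
    free = TCP_OPTIONS_MAX - other_option_bytes - SACK_FIXED_BYTES
    return max(free // SACK_BLOCK_BYTES, 0)


alone = max_sack_blocks()                 # SACK is the only option
with_timestamps = max_sack_blocks(12)     # assumed 12-byte padded timestamp option
```

With no other options the limit is four blocks; sharing the option space with a timestamp option leaves room for three.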
TCP SACK contd..
On entering Fast Recovery
• Retransmit one packet
• Cut the congestion window (“cwnd”) in half
• Estimate the number of packets in the pipe (“pipe”)
TCP SACK contd...
Behavior in Fast Recovery
• When and how much to send? Whenever the number of packets in the pipe is less than cwnd.
• What to send? Fill “holes”, one packet at a time, in sequence number order. If there are no holes, send new packets.
• If a retransmitted packet is itself dropped, then slow-start
• The current implementation waits for a retransmit timer to detect the dropped packet
TCP SACK contd..
Behavior in Fast Recovery: receiving an ACK
• Duplicate ACKs: decrement “pipe”, call “send”
• An ACK that ends Fast Recovery: call “send”
• An ACK that does not end Fast Recovery (SACK): decrement “pipe” by two packets, once for the retransmitted packet and once for the original packet (now presumed to have been dropped). Call “send”
TCP SACK contd...
Behavior in Fast Recovery: sending a data packet
• Send if the number of packets in the “pipe” is less than cwnd
• Use the SACK scoreboard to determine which packet to send
• Increment “pipe”
• Use the maxburst parameter to send new data
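The “pipe” bookkeeping described on these slides can be sketched as a toy model counted in whole packets. Several things here are assumptions made for the illustration: the caller supplies the initial pipe estimate, a “hole” is taken to be an un-SACKed packet below the highest SACKed one, the maxburst limit and the retransmit timer are left out, and all class and method names are invented.

```python
class SackRecovery:
    """Toy model of SACK fast recovery, counted in whole packets."""

    def __init__(self, cwnd, pipe, una, nxt, sacked):
        self.cwnd = max(cwnd // 2, 1)   # cut the congestion window in half
        self.pipe = pipe                # caller's estimate of packets in the path
        self.una = una                  # first unacknowledged packet
        self.nxt = nxt                  # next new packet to send
        self.recover = nxt              # everything below this was sent before recovery
        self.sacked = set(sacked)       # scoreboard of packets the receiver holds
        self.retransmitted = {una}      # the one packet retransmitted on entry

    def _holes(self):
        """Un-SACKed, not yet retransmitted packets below the highest SACKed one."""
        if not self.sacked:
            return []
        top = max(self.sacked)
        return [s for s in range(self.una, top)
                if s not in self.sacked and s not in self.retransmitted]

    def send(self):
        """Send while pipe < cwnd: holes first, in sequence order, then new data."""
        sent = []
        while self.pipe < self.cwnd:
            holes = self._holes()
            if holes:
                seq = holes[0]
                self.retransmitted.add(seq)
            else:
                seq = self.nxt
                self.nxt += 1
            self.pipe += 1
            sent.append(seq)
        return sent

    def on_dup_ack(self, sacked_seq):
        """Duplicate ACK: record the SACKed packet, decrement pipe, call send."""
        self.sacked.add(sacked_seq)
        self.pipe -= 1
        return self.send()

    def on_partial_ack(self, new_una):
        """ACK that does not end recovery: decrement pipe by two, call send."""
        self.una = new_una
        self.pipe -= 2
        return self.send()

    def ends_recovery(self, ack):
        """True once everything sent before Fast Recovery is acknowledged."""
        return ack >= self.recover


# Packets 1..8 are outstanding; 1 and 5 were lost; 2, 3, 4 arrived and
# produced the three duplicate ACKs that triggered recovery.
r = SackRecovery(cwnd=8, pipe=5, una=1, nxt=9, sacked={2, 3, 4})
step1 = r.on_dup_ack(6)       # pipe 4, not below cwnd 4: nothing sent
step2 = r.on_dup_ack(7)       # fills the hole at packet 5
step3 = r.on_dup_ack(8)       # no holes left: sends new packet 9
step4 = r.on_partial_ack(5)   # pipe drops by two: sends 10 and 11
```

In this run the second lost packet is repaired within the same recovery episode, which is the case where Reno, limited to one retransmission per RTT, falls behind.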
TCP SACK
[Figure: sender and receiver with send buffer, receive buffer and score board. Sender state: Snd.una = 1, Snd.fack = 4, Snd.next = 9; the send buffer holds packets 1 to 8.]
TCP SACK
[Figure: the same exchange at a later point. Sender state: Snd.una = 1, Snd.fack = 7, Snd.next = 9; the send buffer holds packets 1 to 8.]
TCP Reno and SACK
• Comparison of throughput and congestion window
• One PC at UCLA and another at PSC
• Behavior of an FTP connection, one with TCP SACK and another with TCP Reno
• The two FTP transfers were done at different times of the day with different network traffic
• Key: Seq = sequence number of the packet; Cwnd = congestion window
[Figure: TCP Reno (high), UCLA-PSC Reno. Curves: Seq and Cwnd.]
[Figure: TCP SACK (high), UCLA-PSC SACK. Curves: Seq and Cwnd.]
Results: Throughput
• TCP SACK: 81 Kbytes/s
• TCP Reno: 63 Kbytes/s
• TCP SACK / TCP Reno: 1.29
[Figure: TCP Reno (Avg.), UCLA-PSC Reno. Curves: Seq and Cwnd.]
[Figure: TCP SACK (Avg.), UCLA-PSC SACK. Curves: Seq and Cwnd.]
Results: Throughput
• TCP SACK: 132 Kbytes/s
• TCP Reno: 104 Kbytes/s
• TCP SACK / TCP Reno: 1.27
[Figure: TCP Reno (Low), UCLA-PSC Reno. Curves: Seq and Cwnd.]
[Figure: TCP SACK (Low), UCLA-PSC SACK. Curves: Seq and Cwnd.]
Results: Throughput
• TCP SACK: 257 Kbytes/s
• TCP Reno: 221 Kbytes/s
• TCP SACK / TCP Reno: 1.16
TCP NewReno
TCP Vegas contd.
TCP Vegas has a few problems
• Rerouting
  - Rerouting a path may change the propagation delay of the connection
  - There is no serious problem for TCP Vegas if the new route has a shorter propagation delay
  - For a greater propagation delay, BaseRTT must be updated; otherwise throughput could decrease substantially
TCP Vegas contd..
Problems contd...
• Persistent congestion
  - Delay can increase due to congestion or rerouting
  - TCP Vegas updates its BaseRTT if there is an increase in propagation delay
  - During congestion, BaseRTT should not be increased
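The rerouting problem can be illustrated with a small sketch. The Vegas window rule itself is not spelled out in the surviving slide text, so this example assumes the commonly described form: compare Diff = (Expected - Actual) * BaseRTT, with Expected = cwnd/BaseRTT and Actual = cwnd/RTT, against two thresholds. The threshold values alpha = 1 and beta = 3, the integer millisecond RTTs and the function names are all assumptions.

```python
def vegas_diff(cwnd, base_rtt, rtt):
    """Estimated number of packets queued in the network.

    Algebraically equal to (Expected - Actual) * BaseRTT with
    Expected = cwnd / BaseRTT and Actual = cwnd / RTT.
    """
    return cwnd * (rtt - base_rtt) / rtt


def vegas_update(cwnd, base_rtt, rtt, alpha=1, beta=3):
    """One window adjustment: grow below alpha, shrink above beta."""
    diff = vegas_diff(cwnd, base_rtt, rtt)
    if diff < alpha:
        return cwnd + 1
    if diff > beta:
        return cwnd - 1
    return cwnd


def run(cwnd, base_rtt, rtt, rounds):
    """Apply the update rule for a number of round trips."""
    for _ in range(rounds):
        cwnd = vegas_update(cwnd, base_rtt, rtt)
    return cwnd


# A reroute raises the propagation delay from 50 ms to 100 ms, with no queueing.
stale = run(cwnd=20, base_rtt=50, rtt=100, rounds=30)    # BaseRTT not updated
fresh = run(cwnd=20, base_rtt=100, rtt=100, rounds=5)    # BaseRTT updated
```

With the stale BaseRTT the extra propagation delay is mistaken for queueing and the window shrinks from 20 to 6 packets, while with the updated BaseRTT it keeps growing. The same update during real congestion would hide the queue, which is why the two cases conflict.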
TCP Vegas & Reno compared
[Figure: Network Topology. Hosts S1, S2, S3, S4 and routers R1, R2. Access links: three at 10 Mbps, 1 ms and one at 10 Mbps, x ms. Link between R1 and R2: 1.5 Mbps, 1 ms.]
Comparison contd..
x (ms)   w1    w2     ACK1     ACK2     Ratio
4        3.5   3.5    21,425   16,068   1.33
13       4.0   7.0    17,522   19,965   1.14
22       4.0   7.0    20,061   17,427   1.15
58       4.0   13.0   19,507   17,973   1.09
148      4.0   30.0   16,398   21,068   1.29
TCP Vegas with varying propagation delays
Comparison contd..
x (ms)   ACK1     ACK2     Ratio
4        21,100   15,637   1.35
13       25,460   11,785   2.16
22       25,684   11,672   2.20
58       34,429   2,627    13.11
148      35,598   959      37.12
TCP Reno with varying propagation delays
Comparison
Buffer   ACK(R)   ACK(V)   Reno/Vegas
4        13,010   24,308   0.535
7        16,434   20,903   0.786
10       22,091   15,365   1.438
15       25,397   12,051   2.107
25       30,798   6,621    4.652
50       34,443   2,936    11.730
Throughput of TCP Reno vs. Vegas
TCP Pacing
• TCP’s congestion control mechanism can produce bursty traffic
• Explicit rate control means sending packets at a predetermined rate
• Pacing is a hybrid between pure rate control and TCP’s use of acknowledgements: it uses the TCP window to determine how much to send, and uses rates instead of ACKs to determine when to send
TCP Pacing contd.
Implementation
• Timeouts are scheduled at regular intervals of duration RTT/cwnd
• A packet is transmitted from the window whenever the timer fires; this ensures that packet transmissions are spread across the whole duration of the RTT
• Pacing imposes the extra overhead of using a timer for each packet transmitted
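The timer schedule can be written out directly. This sketch assumes cwnd is counted in packets and the RTT is given in milliseconds; the function names are hypothetical and no real timer is involved.

```python
def pacing_interval(rtt_ms, cwnd_packets):
    """Time between timer firings: RTT / cwnd."""
    return rtt_ms / cwnd_packets


def paced_send_times(rtt_ms, cwnd_packets, start_ms=0.0):
    """Send time of each packet in one window, one packet per timer firing."""
    interval = pacing_interval(rtt_ms, cwnd_packets)
    return [start_ms + i * interval for i in range(cwnd_packets)]


# A window of 4 packets over a 100 ms RTT is spread across the whole RTT.
times = paced_send_times(rtt_ms=100, cwnd_packets=4)
```

An unpaced sender could transmit the same four packets back to back when an ACK arrives; the schedule above is what removes that burst, at the cost of one timer event per packet.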
Paced Reno & Reno compared
[Figure: Network Topology for Simulation Experiments. Senders S1 to Sn and receivers R1 to Rn, each attached by a 4x Mbps, 5 ms link; nodes BS and BR are joined by an x Mbps, 40 ms link.]
Simulation results
Comparisons between SACK, Reno, NewReno and Tahoe
[Figure: Network Topology for Simulation Experiments. Nodes S1, R1 and K1; link labels 8 Mbps, 0.1 ms and 0.8 Mbps, 100 ms. R1 indicates a finite-buffer drop-tail gateway.]
[Figures: Simulation with 1 dropped pkt]
[Figures: Simulation with 2 dropped pkt]
[Figures: Simulation with 3 dropped pkt]
[Figures: Simulation with 4 dropped pkt]
References
• [Nagle 84] RFC 896, Congestion Control in IP/TCP Internetworks - J. Nagle
• Congestion Avoidance and Control - Van Jacobson
• [F 98] Revisions to RFC 2001 - Sally Floyd. ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-aug98.ps
• Simulation Based Comparison of Tahoe, Reno and SACK TCP - Sally Floyd and Kevin Fall. ftp://ftp.ee.lbl.gov/papers/sacks.ps
• TCP and Successive Fast Retransmits - Sally Floyd. ftp://ftp.ee.lbl.gov/papers/fastretans.ps
• Improving the Start-up Behavior of a Congestion Control Scheme for TCP, ACM SIGCOMM - J. Hoe. www.acm.org/sigcomm/sigcomm96/program.html
• Issues in TCP Slow Start Restart after Idle - A. Hughes, J. Touch, J. Heidemann
• RFC 2018, TCP Selective Acknowledgment Options - M. Mathis, J. Mahdavi, S. Floyd, A. Romanow
• RFC 2001 - W. Stevens
• RFC 2581, TCP Congestion Control - W. Stevens
• RFC 2582, New Reno Modification to TCP’s Fast Recovery Algorithm - S. Floyd
• Understanding the Performance of TCP Pacing - Thomas Anderson
• UCLA Internet Research Lab. http://irl.cs.ucla.edu/sack.psc.f.html http://irl.cs.ucla.edu/sack.f.html