TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?
- longer than RTT
  - but RTT varies
- too short: premature timeout
  - unnecessary retransmissions
- too long: slow reaction to segment loss

Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
- SampleRTT will vary; want estimated RTT "smoother"
  - average several recent measurements, not just the current SampleRTT
3-1
TCP Round Trip Time and Timeout

    EstimatedRTT = (1 - α)·EstimatedRTT + α·SampleRTT

- exponential weighted moving average
- influence of a past sample decreases exponentially fast
3-2
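As a sketch, the update above can be coded directly. The α = 0.125 weight and the DevRTT-based TimeoutInterval are the standard RFC 6298 values, not given on this slide:

```python
# EWMA RTT estimation as on the slide; the DevRTT / TimeoutInterval
# update is the standard RFC 6298 rule, added here for completeness.
ALPHA = 0.125   # typical weight for SampleRTT (RFC 6298)
BETA = 0.25     # weight for the deviation estimate

def update_rtt(estimated_rtt, dev_rtt, sample_rtt):
    """Return (EstimatedRTT, DevRTT, TimeoutInterval) after one sample."""
    # deviation is computed against the *old* EstimatedRTT (RFC 6298 order)
    dev_rtt = (1 - BETA) * dev_rtt + BETA * abs(sample_rtt - estimated_rtt)
    estimated_rtt = (1 - ALPHA) * estimated_rtt + ALPHA * sample_rtt
    timeout = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout

est, dev = 100.0, 0.0
for sample in [120, 110, 150, 100]:   # SampleRTTs in ms
    est, dev, timeout = update_rtt(est, dev, sample)
```

Each new sample moves the estimate only by a factor α, so one outlier (the 150 ms sample) barely disturbs EstimatedRTT, while DevRTT widens the timeout.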
TCP reliable data transfer
- TCP creates reliable data transfer service on top of IP's unreliable service
- pipelined segments
- cumulative ACKs
- TCP uses a single retransmission timer
- retransmissions are triggered by:
  - timeout events
  - duplicate ACKs
- initially, consider a simplified TCP sender:
  - ignore duplicate ACKs
  - ignore flow control, congestion control

3-3
TCP sender events:

data rcvd from app:
- create segment with seq #
- seq # is byte-stream number of first data byte in segment
- start timer if not already running (think of timer as for oldest unacked segment)
- expiration interval: TimeOutInterval

timeout:
- retransmit segment that caused timeout
- restart timer

ACK rcvd:
- if it acknowledges previously unacked segments:
  - update what is known to be ACKed
  - start timer if there are outstanding segments
3-4
TCP sender (simplified)

    NextSeqNum = InitialSeqNum
    SendBase = InitialSeqNum

    loop (forever) {
        switch(event)

        event: data received from application above
            create TCP segment with sequence number NextSeqNum
            if (timer currently not running)
                start timer
            pass segment to IP
            NextSeqNum = NextSeqNum + length(data)

        event: timer timeout
            retransmit not-yet-acknowledged segment
                with smallest sequence number
            start timer

        event: ACK received, with ACK field value of y
            if (y > SendBase) {
                SendBase = y
                if (there are currently not-yet-acknowledged segments)
                    start timer
            }
    } /* end of loop forever */

Comment:
- SendBase - 1: last cumulatively ACKed byte
Example:
- SendBase - 1 = 71; y = 73, so the rcvr wants 73+;
  y > SendBase, so new data is ACKed
3-5
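The loop above can be sketched in Python. This is a toy model under stated assumptions: the timer is a flag rather than a real clock, network I/O is stubbed out, and names like `data_from_app` are illustrative:

```python
class SimplifiedSender:
    """Sketch of the slide's simplified TCP sender: no dup-ACK
    handling, no flow or congestion control, stubbed timer/network."""

    def __init__(self, initial_seq_num):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.timer_running = False
        self.unacked = {}                   # seq # -> segment data

    def data_from_app(self, data):
        segment = (self.next_seq_num, data)
        self.unacked[self.next_seq_num] = data
        if not self.timer_running:          # start timer if not running
            self.timer_running = True
        self.next_seq_num += len(data)      # seq #s count bytes, not segments
        return segment                      # "pass segment to IP"

    def on_timeout(self):
        oldest = min(self.unacked)          # smallest unACKed seq #
        self.timer_running = True           # restart timer
        return (oldest, self.unacked[oldest])   # retransmit it

    def on_ack(self, y):
        if y > self.send_base:              # ACKs previously unacked data
            self.send_base = y
            self.unacked = {s: d for s, d in self.unacked.items() if s >= y}
            self.timer_running = bool(self.unacked)   # timer only if outstanding

s = SimplifiedSender(initial_seq_num=92)
s.data_from_app(b"12345678")   # seq 92, 8 bytes
s.on_ack(100)                  # ACK=100: all 8 bytes cumulatively ACKed
```

After the ACK, SendBase = 100 and the timer stops, matching the pseudocode's ACK branch.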
TCP: retransmission scenarios

[Figure: two Host A / Host B timing diagrams.]
- Lost ACK scenario: A sends Seq=92, 8 bytes data; B's ACK=100 is
  lost (X); A's Seq=92 timer expires and A retransmits Seq=92, 8
  bytes data; B re-ACKs with ACK=100; SendBase = 100.
- Premature timeout scenario: A sends Seq=92, 8 bytes data and
  Seq=100, 20 bytes data; A's timeout fires before ACK=100 arrives,
  so A needlessly retransmits Seq=92; the ACKs (100, then 120)
  arrive and SendBase moves from 100 to 120; B answers the
  retransmission with the cumulative ACK=120.

3-6
TCP retransmission scenarios (more)

[Figure: Host A / Host B timing diagram.]
- Cumulative ACK scenario: A sends Seq=92, 8 bytes data and
  Seq=100, 20 bytes data; ACK=100 is lost (X), but the cumulative
  ACK=120 arrives before the timeout and covers both segments, so
  SendBase = 120 and nothing is retransmitted.

3-7
TCP ACK generation [RFC 1122, RFC 2581]

Event at receiver → TCP receiver action:
- arrival of in-order segment with expected seq #, all data up to
  expected seq # already ACKed → delayed ACK: wait up to 500 ms for
  the next segment; if no next segment, send ACK
- arrival of in-order segment with expected seq #, one other segment
  has an ACK pending → immediately send a single cumulative ACK,
  ACKing both in-order segments
- arrival of out-of-order segment with higher-than-expected seq #
  (gap detected) → immediately send a duplicate ACK, indicating the
  seq # of the next expected byte
- arrival of segment that partially or completely fills a gap →
  immediately send an ACK, provided the segment starts at the lower
  end of the gap

3-8
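The table can be turned into a small decision function. This is a simplified sketch: the 500 ms delayed-ACK timer is just a flag, gap bookkeeping is minimal, and only segments that start exactly at the lower end of a gap fill it:

```python
def receiver_step(state, seg_start, seg_len):
    """One step of the RFC 1122/2581 ACK-generation rules (sketch).
    state: {'expected': next expected byte, 'ack_pending': bool,
            'buffered': set of (start, length) out-of-order segments}."""
    if seg_start == state['expected']:          # in-order arrival
        state['expected'] += seg_len
        filled = False
        progressed = True
        while progressed:                        # pull buffered segments that
            progressed = False                   # this arrival made in-order
            for s, l in list(state['buffered']):
                if s == state['expected']:
                    state['expected'] += l
                    state['buffered'].remove((s, l))
                    progressed = filled = True
        if filled:
            state['ack_pending'] = False
            return 'immediate ACK (gap filled)'
        if state['ack_pending']:                 # one ACK already waiting
            state['ack_pending'] = False
            return 'immediate cumulative ACK'
        state['ack_pending'] = True
        return 'delayed ACK'
    if seg_start > state['expected']:            # gap detected
        state['buffered'].add((seg_start, seg_len))
        return 'immediate duplicate ACK'
    return 'immediate duplicate ACK'             # old data: re-ACK

st = {'expected': 100, 'ack_pending': False, 'buffered': set()}
a1 = receiver_step(st, 100, 8)    # in-order, nothing pending
a2 = receiver_step(st, 116, 8)    # out-of-order: gap at 108
a3 = receiver_step(st, 108, 8)    # fills the gap
```

The three calls walk through rows 1, 3, and 4 of the table in order.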
Fast Retransmit
- timeout period often relatively long:
  - long delay before resending lost packet
- detect lost segments via duplicate ACKs:
  - sender often sends many segments back-to-back
  - if a segment is lost, there will likely be many duplicate ACKs
- if sender receives 3 duplicate ACKs for the same data, it supposes
  that the segment after the ACKed data was lost:
  - fast retransmit: resend segment before timer expires

3-9
Fast retransmit algorithm:

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
        else {   /* a duplicate ACK for an already ACKed segment */
            increment count of dup ACKs received for y
            if (count of dup ACKs received for y == 3)
                resend segment with sequence number y   /* fast retransmit */
        }
3-10
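A minimal sketch of the duplicate-ACK counting above; the timer and the actual resend are stubbed out, and the function name is illustrative:

```python
def fast_retransmit_sketch(acks, send_base):
    """Process a stream of ACK values per the slide's algorithm.
    Returns (list of seq #s fast-retransmitted, final SendBase)."""
    dup_counts = {}
    retransmitted = []
    for y in acks:
        if y > send_base:                 # new data ACKed
            send_base = y
            dup_counts = {}
        else:                             # duplicate ACK
            dup_counts[y] = dup_counts.get(y, 0) + 1
            if dup_counts[y] == 3:        # triple duplicate ACK
                retransmitted.append(y)   # resend segment with seq # y
    return retransmitted, send_base

# ACK=100 arrives, then three duplicates: segment 100 is resent
# before any timeout would fire.
resent, base = fast_retransmit_sketch([100, 100, 100, 100], send_base=92)
```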
TCP Flow Control
- receive side of TCP connection has a receive buffer
- app process may be slow at reading from buffer

flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
- speed-matching service: matching the send rate to the receiving app's drain rate

3-11
TCP Flow control: how it works

(Suppose the TCP receiver discards out-of-order segments.)
- spare room in buffer = RcvWindow
                       = RcvBuffer - [LastByteRcvd - LastByteRead]
- rcvr advertises spare room by including the value of RcvWindow in segments
- sender limits unACKed data to RcvWindow
  - guarantees receive buffer doesn't overflow

3-12
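The two formulas above — the receiver's spare room and the sender's limit — amount to a small calculation:

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises:
    RcvWindow = RcvBuffer - (LastByteRcvd - LastByteRead)."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)

def sender_can_send(last_byte_sent, last_byte_acked, nbytes, rcv_window_val):
    """Sender-side check: unACKed data plus the new bytes must
    stay within the advertised RcvWindow."""
    return (last_byte_sent - last_byte_acked) + nbytes <= rcv_window_val

# 4 KB buffer; 1 KB has been received but not yet read by the app,
# so 3 KB of spare room is advertised.
w = rcv_window(4096, last_byte_rcvd=5120, last_byte_read=4096)
```

With 1024 bytes already in flight unACKed, the sender may add at most 2048 more bytes against this window.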
TCP Connection Management

Recall: TCP sender, receiver establish a "connection" before exchanging data segments
- initialize TCP variables:
  - seq. #s
  - buffers, flow control info (e.g. RcvWindow)
- client: connection initiator
    Socket clientSocket = new Socket("hostname", "port number");
- server: contacted by client
    Socket connectionSocket = welcomeSocket.accept();

Three way handshake:

Step 1: client host sends TCP SYN segment to server
  - specifies initial seq #
  - no data
Step 2: server host receives SYN, replies with SYNACK segment
  - server allocates buffers
  - specifies server initial seq. #
Step 3: client receives SYNACK, replies with ACK segment, which may contain data
3-13
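The slide's Java snippets have direct Python equivalents; a minimal local sketch (loopback address and port 0 are choices made here, not from the slide):

```python
import socket

# The three-way handshake happens inside connect()/accept();
# the application never sees the SYN / SYNACK / ACK segments.
welcome = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # welcomeSocket
welcome.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
welcome.listen(1)
port = welcome.getsockname()[1]

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)    # clientSocket
client.connect(("127.0.0.1", port))   # client: connection initiator (SYN)
conn, addr = welcome.accept()         # server: contacted by client

client.sendall(b"hello")              # data flows once the connection is up
data = b""
while len(data) < 5:                  # read until all 5 bytes arrive
    data += conn.recv(5 - len(data))

conn.close(); client.close(); welcome.close()
```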
TCP Connection Management (cont.)

Closing a connection:

client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timing diagram — client sends FIN, server
replies with ACK and then its own FIN; the client ACKs and enters a
timed wait before the connection is closed.]
3-14
TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK
  - enters "timed wait": will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with a small modification, the protocol can handle simultaneous FINs.

[Figure: client/server timing diagram — FIN and ACK exchanged in
both directions; the client's timed wait precedes the final
transition to closed on both sides.]

3-15
TCP Connection Management (cont.)

[Figure: TCP server lifecycle and TCP client lifecycle state diagrams.]
3-16
Principles of Congestion Control

Congestion:
- informally: "too many sources sending too much data too fast for the network to handle"
- different from flow control!
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
- a top-10 problem!
3-17
Causes/costs of congestion: scenario 1
- two senders, two receivers
- one router, infinite buffers
- no retransmission

[Figure: Hosts A and B send λin (original data) into one router
with unlimited shared output link buffers; λout is the delivered
rate.]

- large delays when congested
- maximum achievable throughput

3-18
Causes/costs of congestion: scenario 2
- one router, finite buffers
- sender retransmission of lost packet

[Figure: Hosts A and B send λin (original data) and λ'in (original
data plus retransmitted data) into a router with finite shared
output link buffers; λout is the delivered rate.]
3-19
Causes/costs of congestion: scenario 2
- always: λin = λout (goodput)
- "perfect" retransmission only when loss: λ'in > λout
- retransmission of delayed (not lost) packet makes λ'in larger (than the perfect case) for the same λout

"costs" of congestion:
- more work (retransmission) for a given "goodput"
- unneeded retransmissions: link carries multiple copies of a pkt

3-20
Causes/costs of congestion: scenario 3
- four senders
- multihop paths
- timeout/retransmit

Q: what happens as λin and λ'in increase?

[Figure: Hosts A and B send λin (original data) and λ'in (original
data plus retransmitted data) along multihop paths through routers
with finite shared output link buffers; λout is the delivered
rate.]

3-21
Causes/costs of congestion: scenario 3

[Figure: λout for Hosts A and B vs. offered load — as load grows,
delivered throughput eventually collapses.]

Another "cost" of congestion:
- when a packet is dropped, any upstream transmission capacity used for that packet was wasted!

3-22
Approaches towards congestion control

Two broad approaches towards congestion control:

End-end congestion control:
- no explicit feedback from network
- congestion inferred from end-system observed loss, delay
- approach taken by TCP

Network-assisted congestion control:
- routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate sender should send at
3-23
TCP Congestion Control
- end-end control (no network assistance)
- sender limits transmission: LastByteSent - LastByteAcked ≤ CongWin
- roughly, rate = CongWin/RTT bytes/sec
- CongWin is dynamic, a function of perceived network congestion

How does sender perceive congestion?
- loss event = timeout or 3 duplicate ACKs
- TCP sender reduces rate (CongWin) after loss event

Three mechanisms:
- AIMD
- slow start
- conservative after timeout events

3-24
TCP AIMD

multiplicative decrease: cut CongWin in half after loss event
additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing

[Figure: sawtooth congestion window of a long-lived TCP connection
over time, with axis ticks at 8, 16, and 24 Kbytes.]

3-25
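The sawtooth can be reproduced with a toy trace; the starting window and the rounds where losses occur are arbitrary choices for illustration:

```python
def aimd_trace(rounds_with_loss, cwnd=16, mss=1):
    """Sketch of AIMD in MSS units: +1 MSS per RTT (additive
    increase), halve CongWin on a loss event (multiplicative
    decrease). Returns the window after each of 10 RTT rounds."""
    trace = [cwnd]
    for rtt in range(1, 11):
        if rtt in rounds_with_loss:
            cwnd = max(cwnd // 2, 1)   # multiplicative decrease
        else:
            cwnd += mss                # additive increase: probing
        trace.append(cwnd)
    return trace

# losses at rounds 4 and 8 give the familiar sawtooth
t = aimd_trace({4, 8})
```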
TCP Slow Start
- when connection begins, CongWin = 1 MSS
  - example: MSS = 500 bytes & RTT = 200 msec
  - initial rate = 20 kbps
- available bandwidth may be >> MSS/RTT
  - desirable to quickly ramp up to a respectable rate
- when connection begins, increase rate exponentially fast until first loss event
3-26
TCP Slow Start (more)
- when connection begins, increase rate exponentially until first loss event:
  - double CongWin every RTT
  - done by incrementing CongWin for every ACK received
- summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A / Host B timing diagram — A sends one segment,
then two segments, then four segments, one window per RTT, as each
window of ACKs returns.]

3-27
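With the slide's numbers (MSS = 500 bytes, RTT = 200 msec), per-RTT doubling gives rates of 20, 40, 80, 160 kbps over the first four RTTs:

```python
def slow_start_rates(mss_bytes=500, rtt_s=0.2, num_rtts=4):
    """Sketch of slow start: CongWin doubles each RTT (1, 2, 4, ...
    segments), since each returning ACK adds one MSS to the window.
    Returns the sending rate in bits/sec for each RTT round."""
    cwnd_segments = 1
    rates_bps = []
    for _ in range(num_rtts):
        rates_bps.append(cwnd_segments * mss_bytes * 8 / rtt_s)
        cwnd_segments *= 2     # +1 MSS per ACK => doubling per RTT
    return rates_bps

rates = slow_start_rates()     # 20 kbps initial rate, then 40, 80, 160
```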
Refinement
- after 3 dup ACKs:
  - CongWin is cut in half
  - window then grows linearly
- but after timeout event:
  - CongWin instead set to 1 MSS
  - window then grows exponentially
  - to a threshold, then grows linearly

Philosophy:
- 3 dup ACKs indicate the network is capable of delivering some segments
- a timeout before 3 dup ACKs is "more alarming"
3-28
Refinement (more)

Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.

Implementation:
- variable Threshold
- at loss event, Threshold is set to 1/2 of CongWin just before the loss event

[Figure: congestion window size (segments) vs. transmission round
(1-15) for TCP Tahoe and TCP Reno, with the threshold marked.]
3-29
Summary: TCP Congestion Control
- when CongWin is below Threshold, sender is in slow-start phase; window grows exponentially
- when CongWin is above Threshold, sender is in congestion-avoidance phase; window grows linearly
- when a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
- when timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

3-30
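The four summary rules map onto one update function. This is a Reno-style sketch in MSS units; fast-recovery details and real ACK clocking are omitted:

```python
def congestion_event(cwnd, threshold, event, mss=1):
    """One step of the summary's rules (TCP Reno style sketch).
    event: 'new_ack', 'triple_dup_ack', or 'timeout'.
    cwnd/threshold in MSS units; returns (cwnd, threshold)."""
    if event == "triple_dup_ack":
        threshold = cwnd / 2           # Threshold = CongWin/2
        cwnd = threshold               # CongWin = Threshold
    elif event == "timeout":
        threshold = cwnd / 2           # Threshold = CongWin/2
        cwnd = mss                     # back to slow start at 1 MSS
    elif event == "new_ack":
        if cwnd < threshold:
            cwnd += mss                # slow start: +1 MSS per ACK
        else:
            cwnd += mss * mss / cwnd   # cong. avoidance: ~+1 MSS per RTT
    return cwnd, threshold

cw, th = 16, 8                                        # above threshold
cw, th = congestion_event(cw, th, "triple_dup_ack")   # cw = th = 8
cw, th = congestion_event(cw, th, "timeout")          # cw = 1, th = 4
```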
TCP Fairness

Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K

[Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R.]
3-31
Why is TCP fair?

Two competing sessions:
- additive increase gives a slope of 1 as throughput increases
- multiplicative decrease decreases throughput proportionally

[Figure: phase plot of connection 2 throughput vs. connection 1
throughput, bounded by R; alternating congestion-avoidance steps
(additive increase) and losses (window cut by factor of 2)
converge toward the equal bandwidth share line.]

3-32
Fairness (more)

Fairness and UDP:
- multimedia apps often do not use TCP
  - do not want rate throttled by congestion control
- instead use UDP: pump audio/video at constant rate, tolerate packet loss

Fairness and parallel TCP connections:
- nothing prevents an app from opening parallel connections between 2 hosts
- Web browsers do this
- example: link of rate R supporting 9 connections;
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets R/2!

3-33
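The 9-connection example is simple arithmetic, assuming the bottleneck is shared equally per connection:

```python
def share_of_link(existing_conns, new_conns):
    """Fraction of link rate R a new app gets when it opens
    new_conns connections against existing_conns, assuming
    per-connection fair sharing."""
    return new_conns / (existing_conns + new_conns)

one = share_of_link(9, 1)       # 1 connection: 1/10 of R
eleven = share_of_link(9, 11)   # 11 connections: 11/20 of R (~R/2)
```

This is why fairness per *connection* is not fairness per *application*: the app that opens more connections simply takes a larger share.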
Delay modeling

Q: How long does it take to receive an object from a Web server after sending a request?

Ignoring congestion, delay is influenced by:
- TCP connection establishment
- data transmission delay
- slow start

Notation, assumptions:
- assume one link between client and server of rate R
- S: MSS (bits)
- O: object size (bits)
- no retransmissions (no loss, no corruption)

Window size:
- first assume: fixed congestion window, W segments
- then dynamic window, modeling slow start
3-34
Fixed congestion window (1)

First case: WS/R > RTT + S/R: the ACK for the first segment in the window returns before a window's worth of data has been sent

    delay = 2·RTT + O/R
3-35
Fixed congestion window (2)

Second case: WS/R < RTT + S/R: the sender must wait for an ACK after sending a window's worth of data

    delay = 2·RTT + O/R + (K-1)·[S/R + RTT - WS/R]
3-36
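Both cases combine into one small calculator. Taking K = ⌈O/(W·S)⌉ for the fixed-window case is an assumption here, consistent with the later definition of K as the number of windows that cover the object:

```python
import math

def fixed_window_delay(O, S, R, RTT, W):
    """Object-transfer delay with a fixed congestion window of W
    segments, per the two slide cases.
    O: object size (bits), S: MSS (bits), R: link rate (bps)."""
    K = math.ceil(O / (W * S))            # windows covering the object
    if W * S / R > RTT + S / R:           # case 1: ACKs return in time
        return 2 * RTT + O / R
    # case 2: sender stalls after each of the first K-1 windows
    return 2 * RTT + O / R + (K - 1) * (S / R + RTT - W * S / R)

# 100 kb object, 1 kb segments, 1 Mbps link, 100 ms RTT:
d_small = fixed_window_delay(100_000, 1000, 1e6, 0.1, W=4)    # stalls
d_large = fixed_window_delay(100_000, 1000, 1e6, 0.1, W=200)  # no stall
```

With W = 4, the sender idles 24 times for 97 ms each; with W = 200, the window is large enough that the delay collapses to 2·RTT + O/R.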
TCP Delay Modeling: Slow Start (1)

Now suppose the window grows according to slow start.

Will show that the delay for one object is:

    Latency = 2·RTT + O/R + P·[RTT + S/R] - (2^P - 1)·S/R

where P is the number of times TCP idles at the server:

    P = min{Q, K - 1}

- Q is the number of times the server would idle if the object were of infinite size
- K is the number of windows that cover the object
3-37
TCP Delay Modeling: Slow Start (2)

Delay components:
- 2·RTT for connection establishment and request
- O/R to transmit the object
- time the server idles due to slow start

Server idles P = min{K-1, Q} times.

Example:
- O/S = 15 segments
- K = 4 windows
- Q = 2
- P = min{K-1, Q} = 2
- server idles P = 2 times

[Figure: client/server timing diagram — after the connection is
initiated and the object requested, the server sends windows of
S/R, 2S/R, 4S/R, and 8S/R, idling between the early windows until
the object is delivered.]
3-38
TCP Delay Modeling (3)

    S/R + RTT = time from when the server starts to send a segment
                until the server receives its acknowledgement

    2^(k-1)·S/R = time to transmit the kth window

    S/R + RTT - 2^(k-1)·S/R = idle time after the kth window

    delay = O/R + 2·RTT + Σ (p = 1..P) idleTime_p
          = O/R + 2·RTT + Σ (k = 1..P) [S/R + RTT - 2^(k-1)·S/R]
          = O/R + 2·RTT + P·[RTT + S/R] - (2^P - 1)·S/R

[Figure: the slow-start timing diagram again (windows of S/R,
2S/R, 4S/R, 8S/R), annotated with these delay components.]
3-39
TCP Delay Modeling (4)

Recall K = number of windows that cover the object.
How do we calculate K?

    K = min{k : 2^0·S + 2^1·S + … + 2^(k-1)·S ≥ O}
      = min{k : 2^0 + 2^1 + … + 2^(k-1) ≥ O/S}
      = min{k : 2^k - 1 ≥ O/S}
      = min{k : k ≥ log2(O/S + 1)}
      = ⌈log2(O/S + 1)⌉

Calculation of Q, the number of idles for an infinite-size object, is similar.

3-40
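The pieces of the model combine into a single latency calculator. The closed form for Q below is the standard one (the server idles after window k while S/R + RTT > 2^(k-1)·S/R); these slides only note that its calculation "is similar", so treat that line as an assumption. The numeric inputs are illustrative:

```python
import math

def slow_start_latency(O, S, R, RTT):
    """Latency = 2*RTT + O/R + P*(RTT + S/R) - (2**P - 1)*(S/R),
    with K windows covering the object and P = min(Q, K-1) idles.
    O: object size (bits), S: MSS (bits), R: link rate (bps)."""
    K = math.ceil(math.log2(O / S + 1))          # windows covering object
    # Q: idles for an infinite-size object; assumed closed form, the
    # slide only says its calculation "is similar":
    Q = math.ceil(math.log2(1 + RTT * R / S))
    P = min(Q, K - 1)                            # times TCP idles at server
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * (S / R)

# Slide-style example: O/S = 15 segments (K = 4 windows)
lat = slow_start_latency(O=15_000, S=1000, R=10_000, RTT=0.1)
```

With these numbers S/R = 0.1 s, so the server idles only after the first window, and the latency comes to 2·RTT + O/R + 0.1 = 1.8 s.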