TCP – cont.
Estimating RTT: Adaptive Retransmission
We don't want to re-transmit too often – instead, we would like to retransmit only when a
segment is actually lost. Roughly, if it is not lost, the returned ACK should arrive at
around RTT after we sent the segment. However, how do we know how long the RTT
is? Note that it may change as routing tables change and as network congestion increases.
Getting the timeout right is very important in congestion control. If the timeout is too
small, we will retransmit unnecessarily, making congestion worse. We dislike large
timeouts because in networks prone to losing segments (for whatever reason), segments
will be unnecessarily delayed (from the application’s point of view).
Original Technique for Estimating RTT
TCP records the time when it sends a data segment; it also records the time when the
ACK arrives. Then it finds the difference between the two times. This is a sample RTT.
To compute the average RTT, it uses a moving average formula:
MeanRTT = alpha * MeanRTT + (1-alpha)*SampleRTT.
The original TCP specification recommended an alpha between .8 and .9, which weights
history somewhat heavily.
The timeout is computed as twice the weighted average RTT.
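The moving average and the doubling rule above can be sketched in a few lines. This is only an illustration: the function names, the starting estimate, and the sample values are made up, and alpha = 0.85 is simply one value inside the recommended .8 to .9 range.

```python
def update_mean_rtt(mean_rtt, sample_rtt, alpha=0.85):
    """Moving average from the notes: alpha weights history, (1 - alpha) the new sample."""
    return alpha * mean_rtt + (1 - alpha) * sample_rtt


def original_timeout(mean_rtt):
    """Original rule: the timeout is twice the weighted average RTT."""
    return 2 * mean_rtt


# Hypothetical values in seconds, for illustration only.
mean_rtt = 0.100
for sample in (0.120, 0.080, 0.300):
    mean_rtt = update_mean_rtt(mean_rtt, sample)
timeout = original_timeout(mean_rtt)
```

Because alpha is close to 1, a single large sample (0.300 here) moves the estimate only a little.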
Suppose a segment is lost the first time it is sent. After the timeout (2*average RTT), it is
re-transmitted. The ACK for the second segment will arrive at about RTT after it was
sent (assuming that it was not lost). But we’re using the time it takes the ACK to arrive
to estimate the RTT. Consider the following examples.
How can the sender distinguish a late-arriving ACK for the first segment, which had been
assumed lost, from the normal ACK for the second segment, which was the
retransmission of the first segment?
Karn/Partridge Solution: If it was necessary to retransmit a segment, don’t use it in the
computation of RTT.
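A minimal sketch of the Karn/Partridge rule follows. The class and method names are hypothetical, and times are passed in explicitly (in seconds) rather than read from a clock, so the logic is easy to follow.

```python
class RttSampler:
    """Produces RTT samples, skipping any segment that was retransmitted."""

    def __init__(self):
        # seq -> (time of first transmission, was it retransmitted?)
        self.sent = {}

    def on_send(self, seq, now):
        if seq in self.sent:
            # A second transmission makes any later ACK ambiguous.
            first_time, _ = self.sent[seq]
            self.sent[seq] = (first_time, True)
        else:
            self.sent[seq] = (now, False)

    def on_ack(self, seq, now):
        """Return an RTT sample, or None if the ACK must not be used."""
        entry = self.sent.pop(seq, None)
        if entry is None:
            return None  # ACK for a segment we are not tracking
        send_time, retransmitted = entry
        return None if retransmitted else now - send_time
```

A caller would feed only the non-None results into the RTT average.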
The original recommendation for computing the timeout was to double the estimated
RTT. But the value used should not be fixed; it should really depend on the variance in
the RTT. If the variance is small, the timeout should be closer to the observed RTT; if it
is large, it should be larger.
Timeout = mu*MeanRTT + phi*Deviation
Typical values: mu = 1, phi = 4
Must estimate deviation as well as mean:
Difference = SampleRTT – MeanRTT
MeanRTT = MeanRTT + delta*Difference
Deviation = Deviation + delta*(|Difference| - Deviation)
delta is between 0 and 1 (it is the weight given to the newest sample: a small value weighs
history more heavily, a large value weighs recent values more heavily).
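The mean-and-deviation update can be sketched as follows. The names mu, phi, and delta stand for the two timeout weights (typical values 1 and 4) and the gain applied to each new difference. The default delta = 0.125 and the initial deviation of 0 are assumptions made for this sketch, not values given in these notes.

```python
class RttEstimator:
    """Tracks the mean RTT and its deviation; the timeout grows with the deviation."""

    def __init__(self, first_sample, delta=0.125, mu=1.0, phi=4.0):
        self.mean_rtt = first_sample
        self.deviation = 0.0  # assumed starting value
        self.delta = delta    # assumed gain, between 0 and 1
        self.mu = mu
        self.phi = phi

    def update(self, sample_rtt):
        difference = sample_rtt - self.mean_rtt
        self.mean_rtt += self.delta * difference
        self.deviation += self.delta * (abs(difference) - self.deviation)

    def timeout(self):
        return self.mu * self.mean_rtt + self.phi * self.deviation
```

With steady samples the deviation decays toward 0 and the timeout approaches the mean RTT; with noisy samples the deviation term dominates.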
Congestion Control in TCP
Peterson, Section 6.3
Works for either FIFO or Fair Queuing in network:
FIFO: first-in, first-out
FQ: simulates bit-by-bit round-robin service for each connection (through a router).
In addition to the advertised window, used for flow control, there is a congestion window
on the sender side (used for congestion!).
Review: for purposes of flow control, the size of the “effective window,” i.e., the number of
unsent bytes that can still be sent, is
EffectiveWindow = AdvertisedWindow – (LastByteSent – LastByteAcked)
The congestion window gives another upper bound on the number of unACKed bytes
that can be in flight. The actual number of bytes allowed to be in flight is the min of the
advertised window and the congestion window. The new computation is:
MaxWindow = MIN(CongestionWindow, AdvertisedWindow)
EffectiveWindow = MaxWindow – (LastByteSent – LastByteAcked)
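As a small worked example of the two formulas, the function below computes the effective window in bytes. Clamping the result at 0 is an assumption of this sketch, covering the case where more bytes are already in flight than the current limit allows.

```python
def effective_window(congestion_window, advertised_window,
                     last_byte_sent, last_byte_acked):
    """Bytes the sender may still transmit under both windows."""
    max_window = min(congestion_window, advertised_window)
    in_flight = last_byte_sent - last_byte_acked
    return max(0, max_window - in_flight)  # clamp is an assumption


# Congestion window 4000, advertised window 8000, 3000 bytes in flight:
room = effective_window(4000, 8000, 10000, 7000)  # 1000 bytes may still be sent
```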
The idea: decrease the congestion window size when a segment is dropped, increase it
when a segment is acked. By how much?
The Algorithm: Additive Increase, Multiplicative Decrease (AIMD)
TCP interprets timeouts as a sign of congestion, and halves the window size when a
timeout occurs (signaling a lost segment).
CongestionWindow /= 2
For the additive increase, the goal is to increase the Congestion Window by 1 MSS every
time a full Congestion Window worth of bytes has been sent. But instead of waiting for
the full Congestion Window to be sent to increase the window size, the algorithm
increases it proportionally with each ACK. Each time an ACK is received for a segment,
credit the congestion window for that segment:
Increment = (bytes in ACKED segment) * (MSS/CongestionWindow)
CongestionWindow += Increment
The net result is to increase the congestion window size by 1 MSS every time a full
congestion window's worth of segments has been acked.
The minimum congestion window is 1 MSS:
1) If the Congestion Window allows 1 MSS (min size), we add 1 MSS to the
window if a maximum size segment is acked.
2) If the congestion window allows 2 MSS, it takes two segments of the maximum
size to add 1 MSS to the window.
3) And so on.
This gives a sawtooth pattern in response to segment losses.
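The two AIMD rules can be sketched as a pair of functions on the congestion window, measured in bytes. The function names and the MSS value of 1460 bytes are illustrative assumptions.

```python
MSS = 1460  # bytes; illustrative value


def additive_increase(cwnd, acked_bytes, mss=MSS):
    """Credit the window for one ACKed segment: bytes * (MSS / CongestionWindow)."""
    return cwnd + acked_bytes * (mss / cwnd)


def multiplicative_decrease(cwnd, mss=MSS):
    """Halve the window on a timeout, never going below 1 MSS."""
    return max(mss, cwnd / 2)


# Sawtooth: grow on ACKs, halve on a (simulated) timeout every 40th ACK.
cwnd = float(MSS)
trace = []
for ack_number in range(1, 101):
    if ack_number % 40 == 0:
        cwnd = multiplicative_decrease(cwnd)
    else:
        cwnd = additive_increase(cwnd, MSS)
    trace.append(cwnd)
```

Plotting trace against the ACK number shows slow climbs interrupted by sharp drops.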
Why is this reasonable? Because segments delayed by congestion get re-sent, increasing
the congestion, i.e., we have a positive feedback loop – too much traffic induces more
traffic. So the sender needs to be aggressive about reducing traffic and cautious about
increasing it.
Slow Start
AIMD is too conservative when a session starts up – there may be plenty of bandwidth
available (there’s no evidence either way). However, using the advertised window may
invite congestion. So, the idea is to use exponential increase until a segment is lost:
1) Start the congestion window at 1 segment and double it each RTT (one added
segment for each new ACK that arrives).
2) When a segment times out, switch over to AIMD.
This is called “slow” because it's the alternative to using the full advertised window
immediately – this is slower than that. But, it’s actually quite fast.
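To see why slow start is actually fast, the sketch below counts the window in segments. It assumes that every segment in the window is ACKed within one RTT and that each ACK adds one segment to the window. The function name is hypothetical.

```python
def slow_start_round(cwnd_segments):
    """One RTT of slow start: each ACK for the current window adds one segment."""
    acks = cwnd_segments  # assume every segment in flight is ACKed
    for _ in range(acks):
        cwnd_segments += 1
    return cwnd_segments


cwnd = 1
history = [cwnd]
for _ in range(5):
    cwnd = slow_start_round(cwnd)
    history.append(cwnd)
# history is [1, 2, 4, 8, 16, 32]: the window doubles every RTT
```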
Slow start runs in two circumstances:
1) At the very beginning of a connection, when it has no information.
2) When the connection has gone dead because all the data allowed has been sent
and the ACK of the earliest segment has not arrived.
Why do we need slow start for the second case? When the timeout for the earliest
segment occurs, there are no segments in transit (they’ve either arrived or been lost). So
no more ACKs will arrive to clock the sender. In this case, the sender uses slow start.
In the slow start used at the beginning of a session, the sender has no information about
available bandwidth and increases until the first segment is lost. But after a lost segment,
the sender has the Congestion Window that had been computed – this becomes the target
congestion window. Instead of looking for a segment to be lost, slow start increases to
the size of the Congestion Window right after the lost segment and then switches over to
additive increase.
Note that TCP uses aggressive increases in the number of segments sent with slow start
to set the congestion window size.
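The restart after a timeout can be sketched as follows. The target window is stored in a variable called threshold, which is a name chosen for this sketch. The window is in bytes, and the sketch assumes each ACK covers one full-size segment.

```python
def restart_after_timeout(cwnd, mss):
    """Return (new_cwnd, threshold): restart at 1 MSS and aim for the halved window."""
    threshold = max(cwnd / 2, mss)
    return mss, threshold


def grow_on_ack(cwnd, threshold, mss):
    """Slow start below the threshold, additive increase at or above it."""
    if cwnd < threshold:
        return cwnd + mss              # exponential phase: +1 MSS per ACK
    return cwnd + mss * (mss / cwnd)   # additive phase: about +1 MSS per window


cwnd, threshold = restart_after_timeout(16000, 1000)
cwnd = grow_on_ack(cwnd, threshold, 1000)  # still in slow start
```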
Fast Retransmit and Fast Recovery
There's a relatively long fallow period following a lost segment (see Figure 6.11), while
the sender waits for a time-out. We can tweak the receiver to improve this: The basic
sliding window algorithm for TCP sends an ACK when a segment arrives that increases the
number of contiguous (in-order) bytes received.
If instead, the receiver sends an ACK each time a segment arrives at the receiver, but the
ACK still contains only the next expected byte, then the sender knows that a segment has
gotten through but that an earlier segment was either lost or delayed. Then the sender can
re-send the next expected segment without waiting for the timeout.
However, there are still congestion issues, so we want to be sure that the segment has
actually been lost, not merely delayed. This is heuristic, but at present a TCP sender
waits until 3 duplicate acks have been received before re-sending the lost-or-delayed
segment.
This greatly reduces the period of time between loss of a segment and its re-transmission.
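The duplicate-ACK rule can be sketched as a small counter on the sender side. The class and constant names are hypothetical, and ACK numbers are the next expected byte, as in the text.

```python
DUP_ACK_THRESHOLD = 3  # duplicates required before re-sending


class FastRetransmit:
    """Counts duplicate ACKs and signals when a segment should be re-sent."""

    def __init__(self):
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack_no):
        """Return the byte number to retransmit from, or None."""
        if ack_no == self.last_ack:
            self.dup_count += 1
            if self.dup_count == DUP_ACK_THRESHOLD:
                return ack_no  # third duplicate: re-send without waiting
        else:
            self.last_ack = ack_no
            self.dup_count = 0
        return None


fr = FastRetransmit()
results = [fr.on_ack(a) for a in (1000, 2000, 2000, 2000, 2000, 2000)]
```

Here the first ACK for 2000 is the normal one, the next three are duplicates, and the retransmission is triggered exactly once, on the third duplicate.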
DECBit
Routers monitor load and notify end nodes when congestion is about to occur. The
receiver gets the notification and relays it to the sender.
Routers assume congestion if the average queue length is >= 1.
The source node estimates how many packets resulted in congestion notification. If 50%
or more of the last window’s worth, then the source decreases its window to .875 times
its earlier value. If less, it increases by 1 packet. Note, this is a variant of AIMD.
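The source's adjustment rule can be sketched as below. The function name is hypothetical, the window is counted in packets, and keeping the window at 1 packet or more is an assumption of this sketch.

```python
def adjust_window(window, marked, total):
    """Adjust the window from the last window's worth of packets.

    marked: packets that resulted in a congestion notification
    total:  packets in the last window
    """
    if total > 0 and marked / total >= 0.5:
        return max(1.0, window * 0.875)  # multiplicative decrease
    return window + 1                    # additive increase by 1 packet


shrunk = adjust_window(16, 8, 16)  # 50% marked: 16 * 0.875 = 14.0
grown = adjust_window(16, 7, 16)   # under 50% marked: 17
```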
Random Early Detection
Uses a dropped packet for notification.
Assign packets a drop probability when queue lengths exceed a certain size; this
probability goes to 1 when the queue length is long enough.
As in DECBit, compute a rolling average queue length.
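The drop decision described in this section, based on a rolling average and the two thresholds MinThreshold and MaxThreshold, can be sketched as follows. Several details are assumptions of the sketch: the averaging weight of 0.002, the linear shape of the increasing function between the thresholds, and all function and parameter names.

```python
import random


def update_average(avg, queue_len, weight=0.002):
    """Rolling (weighted) average of the queue length; weight is an assumed value."""
    return (1 - weight) * avg + weight * queue_len


def drop_probability(avg, min_threshold, max_threshold, max_p):
    """0 below MinThreshold, rising to max_p at MaxThreshold, 1 beyond it."""
    if avg < min_threshold:
        return 0.0
    if avg > max_threshold:
        return 1.0
    # Linear ramp: one possible increasing function (an assumption).
    return max_p * (avg - min_threshold) / (max_threshold - min_threshold)


def should_drop(avg, min_threshold, max_threshold, max_p, rng=random.random):
    """Randomly decide whether to drop the arriving packet."""
    return rng() < drop_probability(avg, min_threshold, max_threshold, max_p)
```

The rng parameter is there only so the random choice can be replaced in testing.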
Use MinThreshold and MaxThreshold to determine drop probability. Between min and max,
compute a drop probability that is an increasing function of the average queue length,
going from 0 at MinThreshold to a maximum probability at MaxThreshold. After
MaxThreshold, drop them all.
Modeling TCP
The model will have a bad channel in it, which loses, duplicates, and re-orders messages.
It will also have a sender and a receiver. These could be decomposed into various
components – e.g., sliding window, RTT computation, flow control, etc. – or not.
Probably (considering the properties we want to prove) we want the TCP ioa to have
actions corresponding to getting data from an application on the sender side and giving
data to the partner application on the receiver side.
Important TCP Properties
Basic properties correspond to implementing a reliable FIFO channel. These could be
proved by defining a simulation relation between the above composition of machines and
the reliable FIFO.
Basic: Every message that is sent is eventually received in a fair trace.
Basic: No messages are received out of order.
Basic: No messages are received twice.
Flow control: There are never more bytes in flight than the advertised window allows.
Flow control (preliminary): The receiver never rolls back the advertised window.
Congestion control: There are never more bytes in flight than the congestion window