On the Resilience of SACK and NewReno TCP
Qiang Ye, Mike H. MacGregor
Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
Abstract-The de facto requirement in traditional telephone networks is to restore failures in 50 milliseconds or less. The same standard has been assumed in data networks. In this study we consider the reaction of TCP to a failure in a continental-scale network. Our goal is to determine whether there are particular values of outage duration at which file transfer times increase markedly. Such values would indicate significant objectives for the restoration of networks carrying TCP traffic. For SACK and NewReno TCP, we find that a restoration objective of 600 ms to 1 s is appropriate. In addition, we find that receive buffers can be sized at 2rτ to maximize link utilization and resilience.

Index Terms-TCP, resilience, data network.

I. INTRODUCTION

The de facto requirement in traditional telephone networks is for restoration to occur in 50 milliseconds or less. This was adopted as the result of considering the impact of outage duration on voice calls. Outages of greater than 50 ms will likely result in many calls being dropped, due to various voice switch design parameters. Once these calls have been dropped, there is the potential for an inrush of reattempts that could overload and crash the network. However, the same considerations do not necessarily apply to data traffic. Despite this, the same 50 ms objective has been assumed in the development of data networks. Recent studies contend that for the network and application layers, 50 ms restoration is not necessary. For Internet packet transport, the questions that need to be answered are: how does TCP behave in the case of network failures, and does TCP really need 50 millisecond restoration?

This paper focuses on the TCP-layer view of failures. That is, the goal of this study is to find out, in the absence of any other compensating mechanisms (e.g. Automatic Protection Switching), how the TCP control mechanisms react to outages. This data is fundamental to designing any restoration mechanisms for networks carrying TCP traffic.

In this paper we consider the reaction of a single TCP session to network link failures. Interactions between multiple TCP flows are not taken into account. We believe that the TCP protocol itself is complex enough that it is necessary to first understand how TCP behaves in this baseline scenario before setting standards, proposing mechanisms, or exploring the impact of additional variables.

Our other concern is to be able to size client buffers. The bandwidth-delay product is usually used to set the receive buffer size so as to fully utilize transport network bandwidth. When network failures are considered, we would like to know what buffer size leads to the best resilience of TCP.

The rest of the paper is organized as follows. Section 2 gives the background of TCP resilience mechanisms, and Section 3 presents the details of our simulations. In Section 4 we discuss the behavior of TCP in the case of network failures. The paper closes with our conclusions and recommendations in Section 5.

II. RESILIENCE MECHANISMS IN TCP

TCP does not have any resilience mechanisms that are specially designed to deal with network failures. From the viewpoint of TCP, there is no difference between network failure and network congestion. As a result, when part of the network fails and some segments are dropped, TCP will assume that there is congestion somewhere in the network, and the TCP congestion control mechanisms will start dealing with the segment loss.

TCP congestion control mechanisms have improved over time. The main versions of TCP are Tahoe TCP, Reno TCP, NewReno TCP and SACK TCP. Tahoe TCP is the oldest version and only a few old systems use it. Reno TCP, NewReno TCP and SACK TCP are widely implemented. This paper focuses on SACK and NewReno TCP because they are the newer versions and are more widely deployed. Details about TCP congestion control can be found in the RFCs and studies listed in the references; in our experiments, the TCP implementations conform to these specifications.

Both SACK and NewReno TCP congestion control are composed of three parts: slow start, congestion avoidance and fast retransmit/fast recovery. Three state variables, cwnd (congestion window), rwnd (receiver's advertised window) and ssthresh (slow start threshold), are maintained at the sender to deal with network congestion. In addition, SACK TCP has an extra variable called pipe at the sender that represents the estimated number of outstanding segments. SACK TCP also has a data structure called the scoreboard at the sender side that keeps track of the contiguous data blocks that have arrived at the receiver. Retransmission timeout (RTO) is an important parameter in TCP congestion control. It has a minimum of one second, and RFC 2988 suggests that a maximum value may be placed on RTO. In our simulation, this maximum value is 64 seconds.
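The RTO bounds just described can be sketched as a simple backoff calculation (illustrative only; the 1 s floor follows RFC 2988's recommended minimum, the doubling is the standard back-off on repeated timeouts, and the 64 s cap is the value used in our simulation):

```python
# Illustrative sketch of RTO exponential backoff with the bounds used in
# our simulation: a 1 s minimum (RFC 2988) and a 64 s cap.
def backoff_schedule(initial_rto, retries):
    """Return the RTO waited before each successive retransmission attempt."""
    rto = max(1.0, initial_rto)   # RTO is floored at 1 second
    schedule = []
    for _ in range(retries):
        schedule.append(rto)
        rto = min(rto * 2, 64.0)  # double on each timeout, capped at 64 s
    return schedule

print(backoff_schedule(0.5, 8))
# -> [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 64.0]
```

The schedule shows why a long outage is so costly: once the first retransmission is lost, each further attempt waits twice as long, up to the 64 s ceiling.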
III. SIMULATION DETAILS

Our simulations were carried out using OPNET because it has more up-to-date versions of the protocols of interest than other simulators such as ns. For this study, a client and server are connected across a continental-scale network. Each node is connected to a local router via a high-speed LAN link. The local routers are connected to the core network via access links. Three access link rates are commonly used in real-life systems: DS0 (64 Kbps), DS1 (1.544 Mbps) and OC-3c (155 Mbps). Based on the fact that servers are usually connected to the Internet via high-speed links while client-side access link rates vary a lot, our simulation fixes the server-side access at OC-3c and varies the client-side access link from DS0 to DS1 and OC-3c. The core network in our simulation has an NSFNET-like topology, shown in Figure 1. Core routers (Cisco 12008) are connected via OC-192, which is common in backbone networks nowadays. The client resides in Palo Alto and the server is located at Princeton.

As shown in Figure 1, a packet discarder model, used to simulate outages, is on the link connecting Salt Lake City to Palo Alto. We can specify either the number of packets to be dropped or a certain time period during which all packets are dropped. Our experiments simulate a unidirectional failure of packets going from Salt Lake City to Palo Alto (i.e. in the server-to-client path). Packets traveling the other way get to their destination safely. A unidirectional failure would be unusual in a transport network. However, this assumption has been made by many network researchers to reflect the reality of today's Internet: routes for IP packets are often asymmetric. Thus a failure in the underlying network will often affect a session in only one direction.

There is only one routing domain in our simulations, and the NSFNET-like topology is relatively old. However, this paper focuses on the TCP-layer view of failures. That is, this paper tries to find out, in the absence of any compensating mechanisms, how the TCP congestion control mechanisms react to outages. This first-step experiment generated many valuable results, some of which are presented in detail in Section 4. In practice, it usually takes routing protocols (both IGP and EGP) tens of seconds to detect and react to lower-layer failures. If failures can be restored within the time horizons recommended in this paper, the routing protocol will not detect them, and any failure will be restored long before the routing protocol could converge. For these reasons, we do not consider the potential reaction of routing protocols to the failures under study.

The receive buffer at the client plays an important role in TCP performance. During a TCP session, the sending TCP continuously compares the outstanding unacknowledged traffic with cwnd and rwnd. Whenever the outstanding traffic is less than the smaller of these two variables by at least one SMSS (sender maximum segment size), the sender will send out some segments if there are any waiting to be sent. Generally the receive buffer size (rbuff) is set as:

rbuff = bandwidth * round-trip-time = r * τ (1)

where r stands for bandwidth and τ is round-trip time (RTT). This is commonly called the bandwidth-delay product.

The TCP session in our simulation must be long enough to test scenarios with varying failure durations. We chose FTP as the application-layer protocol and made the transmitted file large enough to fulfill this requirement. For DS0, DS1 and OC-3c client-side access links, we used 5 MB, 10 MB and 20 MB files, respectively. In reality, the duration of TCP flows covers a very large range. Although short flows are the most numerous, long-running flows account for up to 50% of all traffic. This paper focuses on long-running TCP flows.

IV. EXPERIMENTAL RESULTS

We first describe the general behavior of SACK and NewReno TCP in the case of network failures, and then present some additional details of TCP resilience for different scenarios. We use Transmission Time Increase (TTI) to quantify the impact of a network failure:

TTI = ATT - NTT (2)

where ATT stands for Actual Transmission Time in the case of network failure, and NTT means Normal Transmission Time in the case of no network failure.

A. General Behavior of TCP

We use a scenario with DS1 client access and a 32 KB receive buffer as a typical example to illustrate the general behavior of TCP in the case of network failures.

1. SACK TCP General Behavior

In our example, a long SACK TCP session starts at time 0 and the packet discarder begins to drop packets at 30 seconds, as shown in Figure 2. The four curves in Figure 2 illustrate the changes in the sender's congestion window over time in four different cases. Before the failure starting point (marked by "X"), the four curves overlap each other because during that period they describe essentially the same conditions. After point "X", they split into four different curves.
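The overall shape of these curves follows from the window dynamics described in Section II. A minimal per-RTT sketch of the first two phases (illustrative only: the per-ACK updates are collapsed into one step per RTT, the 1460-byte SMSS and 64 KB initial ssthresh are the values from our simulation, and no losses are modeled):

```python
# Simplified per-RTT evolution of cwnd: exponential growth in slow start
# until ssthresh is reached, then one SMSS of linear growth per RTT in
# congestion avoidance. Loss handling is deliberately omitted.
SMSS = 1460            # sender maximum segment size, bytes
SSTHRESH = 64 * 1024   # initial slow start threshold, bytes

def cwnd_over_time(rtts):
    cwnd = SMSS
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        if cwnd < SSTHRESH:
            cwnd *= 2        # slow start: cwnd roughly doubles every RTT
        else:
            cwnd += SMSS     # congestion avoidance: +1 SMSS per RTT
    return history

print(cwnd_over_time(10))
```

The sparse, fast-rising points at the start of the curves correspond to the doubling phase; the closely spaced points afterwards correspond to the +1 SMSS per RTT phase.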
When the TCP session starts, the congestion window is initially one SMSS (1460 bytes in our simulation) and TCP is in slow start. cwnd increases exponentially as the sender receives acknowledgements, until cwnd equals ssthresh (initially 64 KB). The sparse points at the beginning of the curve correspond to this slow start period. TCP then transitions into congestion avoidance, during which cwnd increases by 1460 bytes every RTT. This increase is comparatively slow, and as a result the points for the congestion avoidance period are very close together.

Without a network failure, TCP stays in congestion avoidance until the file is completely transmitted and the TCP session is terminated. This is shown by the curve labeled "0 Drop".

Now we consider the case in which some segments are lost during a failure. In this case, after the lost segments, the client receives subsequent segments over the restored link. As a result, the sender gets three duplicate acknowledgements, and TCP transitions into fast retransmit/fast recovery. It retransmits the earliest unacknowledged segment and sets pipe to the number of outstanding segments, 23 in this case. It also sets ssthresh to half the current flight size (16060 B in this case) and sets cwnd to (ssthresh + 3 * SMSS) = 20440 bytes. Theoretically ssthresh should be set to exactly half the current flight size, but OPNET uses a slightly different calculation: after halving the flight size (33580 B), the result of 16790 B is rounded down to a multiple of SMSS (16060 B). Given the method used to calculate ssthresh, we have the following relation between cwnd and pipe:

cwnd = (pipe / 2 + 3) * SMSS (3)

During fast recovery, pipe is increased by one when the sender either sends a new segment or retransmits an old segment, and it is decreased by one for each additional duplicate ACK. For each partial ACK (an ACK received during fast recovery that acknowledges new data but does not take TCP out of fast recovery), pipe is decreased by two rather than one. When (pipe * SMSS) becomes at least one SMSS less than cwnd, the sender will check the scoreboard and either retransmit the earliest unacknowledged segment or transmit a new segment when there are no unacknowledged segments. We use nD to denote the number of duplicate ACKs received by the sender when (pipe * SMSS) becomes exactly one SMSS less than cwnd. Then we have:

pipe * SMSS - nD * SMSS = cwnd - SMSS (4)

So:

nD = pipe - (cwnd / SMSS) + 1 (5)

From Eq. (3) and (5), we have:

nD = pipe - (pipe / 2 + 3) + 1 = pipe - pipe / 2 - 2 (6)

If we use nC,S to denote the critical number of lost segments in this case, we arrive at:

nC,S = pipe - nD = pipe / 2 + 2 (7)

In the normal state the receiver only sends out an ACK for every second full-sized segment, or within 200 ms of the arrival of the first unacknowledged segment. Also, out-of-order data segments should be acknowledged immediately. Thus when the first out-of-order segment in the window arrives, if there is no unacknowledged segment at the receiver, this segment will trigger the receiver to send a duplicate ACK (we call this case a Type I Failure). Eq. (7) above applies to this scenario. But if the first out-of-order segment arrives within 200 ms of an unacknowledged segment, the receiver will not send out a duplicate ACK. Instead the receiver transmits an acknowledgement of the unacknowledged segment. Of course each segment following the first out-of-order segment results in a duplicate ACK. We call this case a Type II Failure. In this scenario, Eq. (7) should be modified slightly: nC,S should be decreased by one. Hence, we should calculate nC,S according to the following formula:

nC,S = pipe / 2 + 2 in a Type I Failure
nC,S = pipe / 2 + 1 in a Type II Failure (8)

Note that the SACK DS1-32K example illustrates a Type II failure, thus nC,S is 23/2 + 1 = 12.

If fewer than nC,S segments within one window are lost, enough segments will arrive at the receiver to trigger enough duplicate selective ACKs. These ACKs will then make (pipe * SMSS) at least one SMSS less than cwnd at the sender. In this case, the sender can retransmit other lost segments after retransmitting the earliest unacknowledged segment. When a non-duplicate ACK arrives acknowledging all data that was outstanding when fast retransmit/fast recovery was entered, TCP exits fast retransmit/fast recovery and switches into congestion avoidance. In short, SACK TCP can usually recover quickly from the loss of fewer than nC,S segments, and the overall transmission time does not increase much in this case. In the DS1-32K case, nC,S is 12. Thus from 1 lost segment to 12 lost segments, all the curves are similar. For clarity, we only include the 12-drop curve in Figure 2.

On the other hand, if more than nC,S segments are dropped and there are still three or more non-retransmitted segments following the lost segments that arrive at the receiver, enough duplicate ACKs will reach the sender to trigger fast retransmit/fast recovery. But in this case the duplicate selective ACKs will never make (pipe * SMSS) at least one SMSS less than cwnd, so the sender will not retransmit other lost segments after retransmitting the earliest unacknowledged segment. It is doomed to time out, which forces TCP into slow start. TCP will then have to retransmit the earliest segment at that moment, which in this case is the second lost segment.
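The critical-loss count above can be checked numerically. A small sketch (assuming the divisions in Eqs. (6)-(8) are integer divisions, as the DS1-32K example implies):

```python
# Critical number of lost segments for SACK TCP, per Eqs. (6)-(8).
# The divisions are taken as integer (floor) divisions.
def n_dup(pipe):
    """Duplicate ACKs needed before the sender may retransmit again, Eq. (6)."""
    return pipe - pipe // 2 - 2

def n_crit_sack(pipe, failure_type):
    """Critical loss count nC,S, Eq. (8): Type I adds 2, Type II adds 1."""
    return pipe // 2 + (2 if failure_type == 1 else 1)

# DS1-32K example from the text: pipe = 23, Type II failure.
print(n_dup(23))            # -> 10 duplicate ACKs
print(n_crit_sack(23, 2))   # -> 23/2 + 1 = 12
```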
If fewer than three segments follow the lost segments, fast retransmit/fast recovery will not occur because there will not be enough duplicate ACKs. This also leads to a timeout. When the retransmission timer expires TCP will transition into slow start, but in this case it retransmits the first lost segment instead of the second. Although TCP experiences different transitions in the above two cases, the total transmission time does not change dramatically, because the timeout is the main factor. Hence, from 13 lost segments to 23 lost segments, all the curves are similar to the 13-drop curve. We only include the 13-drop curve in Figure 2 for clarity.

If the network failure lasts long enough that the segment retransmitted due to the timeout is also dropped, things change again. This is because when the retransmitted segment is sent out, the retransmission timer has been doubled. If the retransmission fails, the sender will wait for twice the previous RTO before timing out and retransmitting the earliest unacknowledged segment again. This corresponds to the 24-drop curve in Figure 2. If the repeated retransmission does not succeed, the sender has to wait for four times the original RTO to retransmit a third time. This process goes on until TCP gives up the connection. For clarity, we did not include the curves for 25, 26, etc. dropped segments, but it is not difficult to imagine what they would look like in Figure 2. Figure 3 presents TTI vs. the number of dropped segments, and illustrates this trend.

2. NewReno TCP General Behavior

For NewReno TCP, a similar experimental setup is used, but different experimental results are obtained. As shown in Figure 4, a long NewReno TCP session also starts at time 0 and the packet discarder begins to drop packets at 30 seconds. The four curves in Figure 4 illustrate the changes in the sender's congestion window over time in four different cases. Again, before the failure starting point (marked by "X"), the four curves overlap each other; after point "X", they split into four different curves.

The curve labeled "0 Drop" is the same one illustrated in Section 4.A.1. Without network failures, SACK and NewReno TCP behave in the same way.

Now we consider the case in which some segments are lost during a failure. In this case, similar to the SACK scenario, if the sender can get three duplicate acknowledgements, TCP will transition into fast retransmit/fast recovery and set ssthresh and cwnd in the same way. NewReno TCP does not set pipe, because pipe only appears in SACK TCP. If we use nC,NR to denote the critical number of lost segments when there are just enough subsequent surviving segments in the window of data to trigger three duplicate acknowledgements, the critical number is usually equal to (rbuff/SMSS - 3). Due to the TCP acknowledging mechanism illustrated in Section 4.A.1, we can get the following formula:

nC,NR = rbuff/SMSS - 3 in a Type I Failure
nC,NR = rbuff/SMSS - 4 in a Type II Failure (9)

Note that the NewReno DS1-32K example also illustrates a Type II failure, thus nC,NR is (33580/1460) - 4 = 19.

If fewer than nC,NR segments in a window of data are lost, enough surviving segments can arrive at the receiver and trigger enough duplicate ACKs to make TCP transition into fast retransmit/fast recovery. In this case, the earliest unacknowledged segment is retransmitted, and the retransmission leads to a partial ACK. The partial ACK will then make the sender retransmit the earliest unacknowledged segment at that moment. This retransmitted segment will result in another partial ACK, and thus leads to another retransmission. This process goes on until a non-duplicate ACK arrives acknowledging all data that was outstanding when TCP transitioned into fast retransmit/fast recovery; then TCP switches into congestion avoidance by setting cwnd back to ssthresh. We should note that each time the sender gets a partial ACK, it does one retransmission and thus recovers one lost segment. That is, it takes NewReno TCP a whole RTT to recover each lost segment. Thus, in a sense, RTT determines the final TTI value. If RTT is comparatively long, TTI increases dramatically with the number of lost segments; otherwise, TTI remains almost unchanged. In the NewReno DS1-32K case, RTT is relatively small, thus TTI does not increase much. In this case nC,NR is 19, so from 1 lost segment to 19 lost segments, all the curves are similar. For clarity, we only include the 19-drop curve in Figure 4.

SACK TCP has a different mechanism for dealing with partial ACKs. In Section 4.A.1, we mentioned that pipe is decremented by one for each additional duplicate ACK, but it is decreased by two rather than one for each partial ACK. This additional decrease in pipe results in a faster recovery process: one partial ACK leads to two retransmissions. The two retransmissions will trigger another two partial ACKs, eventually leading to another four retransmissions. This process goes on until a non-duplicate ACK arrives acknowledging all data that was outstanding when TCP transitioned into fast retransmit/fast recovery. Hence, within one RTT, usually many more lost segments can be recovered with SACK TCP than with NewReno TCP. This is why with SACK TCP, TTI does not increase much when fewer than nC,S segments within one window are lost, regardless of the length of RTT; in contrast, the TTI of NewReno is influenced by RTT in this situation.

On the other hand, if more than nC,NR segments in a window of data are dropped, fast retransmit/fast recovery will not occur because there will not be enough duplicate ACKs. This leads to a timeout. When the retransmission timer expires, TCP will transition into slow start and retransmit the first lost segment. In this scenario, the timeout plays the major role in terms of TTI, and thus the overall transmission time does not increase much with the number of lost segments. In the NewReno DS1-32K case, from 20 lost segments to 23 lost segments, all the curves are similar. We only include the 20-drop curve in Figure 4 for clarity.
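Eq. (9) can be sketched in the same way as the SACK case (again assuming integer division; the 33580-byte value is the 32 KB buffer rounded to a multiple of SMSS, as noted in Section 4.A.1):

```python
# Critical number of lost segments for NewReno TCP, per Eq. (9).
SMSS = 1460  # sender maximum segment size, bytes

def n_crit_newreno(rbuff, failure_type):
    """Critical loss count nC,NR: Type I subtracts 3, Type II subtracts 4."""
    return rbuff // SMSS - (3 if failure_type == 1 else 4)

# NewReno DS1-32K example: rbuff = 33580 B, Type II failure.
print(n_crit_newreno(33580, 2))  # -> (33580/1460) - 4 = 19
```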
If the network failure lasts long enough that the segment retransmitted due to the timeout is also dropped, NewReno TCP will experience the same doubling of the retransmission timer that is illustrated in Section 4.A.1. A similar 24-drop curve is included in Figure 4. For clarity, we did not include the curves for 25, 26, etc. dropped segments. Figure 5 illustrates the overall trend by presenting TTI vs. the number of dropped segments.

B. Several Details of TCP Behavior

The bandwidth-delay product, rτ, is commonly used to size the receive buffer. In our simulations, the RTT for DS0, DS1 and OC-3c access is 210 ms, 41 ms and 26 ms respectively, so rτ has values of 1680, 7913 and 505440 bytes respectively. For each access link rate, we experimented with 8 different receive buffer sizes, from 8 KB to 1024 KB. By 8 KB, we mean a multiple of SMSS that is just above 8 KB. For example, in our simulation, SMSS is 1460 bytes, so by 8 KB we mean 1460 * 6 = 8760 bytes.

1. SACK TCP Details

We have demonstrated that losing fewer than nC,S segments typically does not increase SACK TCP transmission time significantly. Losing (nC,S + 1) segments makes a difference, and subsequent losses have little impact until it comes to the loss of the retransmitted copy. Knowing that pipe * SMSS = rbuff, we can easily translate nC,S into (rbuff/(2*SMSS) + 2) or (rbuff/(2*SMSS) + 1). To link the number of lost segments to outage duration, we define the SACK Level-1 Fault Tolerance Time (τ1,S) as the period from the moment the network failure occurs to the moment just before the segment following the dropped nC,S segments arrives. We define the SACK Level-2 Fault Tolerance Time (τ2,S) as the period from the moment the network failure occurs to the moment just before the copy retransmitted due to the timeout arrives. Thus τ1,S is the time during which (rbuff/(2*SMSS) + 2) or (rbuff/(2*SMSS) + 1) segments pass the failure point. In our simulations, this is mostly influenced by the client access rate, which is essentially the bandwidth r used to calculate the bandwidth-delay product, so that:

τ1,S = ((rbuff/(2*SMSS) + 2) * SMSS) / r in a Type I Failure
τ1,S = ((rbuff/(2*SMSS) + 1) * SMSS) / r in a Type II Failure (10)

Obviously, τ1,S increases with rbuff. If rbuff is large enough, τ1,S can be approximated as:

τ1,S = ((rbuff/(2*SMSS)) * SMSS) / r (11)

and thus τ1,S approximately doubles as rbuff doubles. This is illustrated in Figures 6, 7 and 8. τ2,S is not as straightforward because it is mainly related to RTO, and RTO is influenced by many factors. RTO usually increases with rbuff, and has a minimum value of 1 second. Thus, when RTO is greater than 1 second, τ2,S increases with rbuff; this is illustrated in Figures 6 and 7. When RTO is at its minimum of 1 second, τ2,S does not change much and is independent of rbuff; this can be observed in Figure 8. In any case, τ2,S is always greater than 1 second.

Either τ1,S or τ2,S can be chosen as a restoration objective. If the restoration can be finished within τ1,S, the overall transmission time will not increase much in the case of network failures. If the restoration time is in the range of τ1,S to τ2,S, the overall transmission time is increased, but the TTI is guaranteed to be a fixed value. Other thresholds can be defined on the basis of a third timeout, and so on. However, we know that τ2,S is certainly greater than 1 second. This is already much larger than the de facto target of 50 ms.

An interesting observation is that TTI is not always greater than the outage duration. When the receive buffer is greater than rτ, some segments will be buffered in the network. These buffered segments help keep traffic flowing between the sender and the receiver in the case of network failures. Thus TTI can be shorter than the outage duration in some scenarios. We can observe this trend in Figures 6 and 7.

There are some exceptions to these typical cases. First, the 512 KB and 1024 KB curves in Figure 6 illustrate situations in which very large receive buffers lead to a calculated value of RTO that is greater than the TCP-defined maximum of 64 seconds. This puts TCP into slow start many times unnecessarily and dramatically changes the normal recovery process. Thus the 512 KB and 1024 KB curves are very irregular. Secondly, in Figure 8, we note that a 1024 KB buffer mostly leads to a shorter TTI than does a 512 KB buffer. This is exceptional because generally TTI increases with rbuff. However, the bandwidth-delay product for OC-3c access is 505440 B, and after the failure ssthresh is set to half the current flight size, which is around 256 KB in the case of a 512 KB buffer. Setting ssthresh to a value less than rτ hurts link utilization and leads to a longer TTI. Thirdly, in Figure 8, we observe that when the outage duration is between τ1,S and τ2,S, TTI decreases dramatically with outage duration. This is again the result of the large value of rτ. When the outage duration is in this range, the sender times out and finally gets into congestion avoidance. In the case of OC-3c access, cwnd increases with the number of lost segments (corresponding to longer outage duration) when TCP transitions into congestion avoidance. In this scenario, the network connection is not fully utilized after the failure because cwnd is always less than rτ, so a larger cwnd due to a longer failure time leads to a shorter TTI.
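The Level-1 threshold of Eq. (10) can be evaluated numerically. A sketch, under our assumptions: rbuff/(2*SMSS) is an integer division, access rates are in bits per second with OC-3c taken as its 155.52 Mb/s line rate, and the buffer is the smallest simulated size of 8760 bytes:

```python
# SACK Level-1 Fault Tolerance Time, Eq. (10): the time for the critical
# nC,S segments to pass the failure point at the client access rate.
SMSS = 1460  # bytes

def tau1_sack(rbuff, rate_bps, failure_type):
    extra = 2 if failure_type == 1 else 1
    n_crit = rbuff // (2 * SMSS) + extra
    return n_crit * SMSS * 8 / rate_bps   # seconds

rbuff = 8760  # smallest receive buffer used in our experiments
for name, rate in [("DS0", 64e3), ("DS1", 1.544e6), ("OC-3c", 155.52e6)]:
    print(name, round(tau1_sack(rbuff, rate, 1), 4))
```

Even with this smallest buffer, the DS0 threshold comes out above 600 ms, while the OC-3c threshold is well below 15 ms; larger buffers scale these values up roughly in proportion to rbuff.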
Fourthly, in Figure 8, after τ1,S, TTI increases as the receive buffer increases from 8 KB to 256 KB, and it decreases as the receive buffer increases from 256 KB to 512 KB. We know that in the case of OC-3c access rτ is 505440 bytes. Hence, the curves for 8 KB to 256 KB are for receive buffer sizes less than rτ, and those for 512 KB to 1024 KB are for sizes greater than rτ. A value of rbuff less than rτ leads to poorer link utilization and so to a larger NTT. NTT is the baseline value used to calculate TTI in Eq. (2). Thus, we have two different classes in terms of TTI, above and below rτ; they are essentially not comparable.

The receive buffer size plays a significant role in SACK TCP resilience. First, from the viewpoint of network link utilization, receive buffers should be set to at least 2rτ to anticipate the case in which a network failure takes place during the transmission. Anything lower results in impaired resilience. This is because even if a timeout occurs, ssthresh is still equal to at least rτ when it is initially set to 2rτ. Secondly, the receive buffer should be set as large as possible in order to increase τ1,S. Finally, when RTO is at its minimum of 1 second, the receive buffer size does not affect τ2,S; when RTO is greater than 1 second, larger receive buffers lead to a longer τ2,S. We can observe these trends in Figures 6, 7 and 8.

2. NewReno TCP Details

We have illustrated that when fewer than nC,NR segments are lost, TTI is affected by RTT. Losing (nC,NR + 1) segments makes a difference, and subsequent losses have little impact until it comes to the loss of the retransmitted copy. We define the NewReno Level-1 Fault Tolerance Time (τ1,NR) as the period from the moment of network failure to the moment just before the segment following the dropped nC,NR segments arrives. τ1,NR is the time during which (rbuff/SMSS - 3) or (rbuff/SMSS - 4) segments pass the failure point. In our simulation, it is mostly affected by the client access rate, which is essentially r, so:

τ1,NR = ((rbuff/SMSS - 3) * SMSS) / r in a Type I Failure
τ1,NR = ((rbuff/SMSS - 4) * SMSS) / r in a Type II Failure (12)

Apparently, τ1,NR increases with rbuff. If rbuff is large enough, τ1,NR can be approximated as:

τ1,NR = rbuff / r (13)

and thus τ1,NR approximately doubles as rbuff doubles. This is illustrated in Figures 9, 10 and 11. From Eq. (11) and (13), we conclude that τ1,NR is approximately twice as large as τ1,S when rbuff is large.

We define the NewReno Level-2 Fault Tolerance Time (τ2,NR) as the period from the moment the network failure occurs to the moment just before the copy retransmitted due to the timeout arrives. τ2,NR is essentially the same as τ2,S, so all conclusions about τ2,S also apply to τ2,NR.

Either τ1,NR or τ2,NR can be chosen as a restoration objective. But with NewReno TCP, if the restoration can be finished within τ1,NR, the overall transmission time is influenced by RTT. If RTT is relatively small, the overall transmission time does not change much as the restoration time increases; otherwise, TTI increases with restoration time. In Figures 9, 10 and 11, we observe that, when the restoration time is less than τ1,NR, TTI does not change much in the DS0 scenario, but RTT plays a role in the DS1 scenario, and TTI increases dramatically with outage duration in the OC-3c scenario. It is interesting that in the OC-3c scenario, for large receive buffers, restoration times longer than τ1,NR lead to better resilience (i.e. decreased TTI). In addition, TTI can be less than the outage duration in some cases due to a large receive buffer; this can be observed in Figures 9 and 10.

The exceptions due to large receive buffers, ssthresh halving and insufficient rτ presented in Section 4.B.1 also apply to NewReno TCP, and they can be observed in Figures 9, 10 and 11. The exception with SACK TCP that TTI decreases with outage duration when restoration finishes between τ1,S and τ2,S in the OC-3c access case does not occur in NewReno TCP, because with NewReno TCP there are only 3 or 4 segments left in the window of data after τ1,NR, and these segments do not make a significant change to TTI.

For receive buffer sizing, all the rules illustrated for SACK TCP previously also apply to NewReno TCP.

V. CONCLUSIONS

The results presented here demonstrate that the traditional 50 ms restoration time is not suitable for TCP transport on the Internet. With SACK TCP, we found two restoration objectives, τ1,S and τ2,S. τ1,S is given by Eq. (10), and τ2,S is closely related to RTO. With NewReno TCP, we also found two restoration objectives, τ1,NR and τ2,NR. τ1,NR is given by Eq. (12), and τ2,NR is essentially the same as τ2,S. τ1,NR is approximately twice as large as τ1,S when rbuff is large. For SACK TCP, if restoration can be finished within τ1,S, TTI does not increase much with restoration time. For NewReno TCP, if network failures can be restored within τ1,NR, TTI is influenced by RTT. If RTT is relatively small, TTI does not change much; otherwise, TTI increases with restoration time. In addition, we find that TTI can be less than the outage duration in some cases due to a large receive buffer.
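The factor-of-two relation between the two Level-1 thresholds follows directly from the approximations in Eqs. (11) and (13), since Eq. (11) reduces to roughly rbuff/(2r). A quick numerical check (a sketch, using an assumed 256 KB buffer on DS1 access):

```python
# Approximate Level-1 thresholds for large rbuff:
# Eq. (11) gives τ1,S ≈ rbuff/(2r) for SACK; Eq. (13) gives τ1,NR ≈ rbuff/r.
def tau1_sack_approx(rbuff_bytes, rate_bps):
    return rbuff_bytes * 8 / (2 * rate_bps)

def tau1_newreno_approx(rbuff_bytes, rate_bps):
    return rbuff_bytes * 8 / rate_bps

rbuff, rate = 256 * 1024, 1.544e6   # 256 KB buffer, DS1 access (assumed)
t_sack = tau1_sack_approx(rbuff, rate)
t_newreno = tau1_newreno_approx(rbuff, rate)
print(round(t_sack, 3), round(t_newreno, 3))  # τ1,NR is twice τ1,S
```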
τ1,S or τ1,NR is the threshold after which TTI increases  J. Schallenburg. Is 50 ms Restoration Necessary? .
markedly. In the DS0 access scenario, both values are IEEE Bandwidth Management Workshop,
above 600 ms in all experimental cases. For high-rate Montebello, Quebec, Canada, Jun. 2001.
access, τ2,S or τ2,NR is recommended because τ1,S or τ1,NR  J. Pahdye and S. Floyd. On Inferring TCP Behavior.
is probably too small to be realistically attainable, and Proceedings of the 2001 conference on applications,
any additional outage up to τ2,S or τ2,NR does not increase technologies, architectures and protocols for
TTI significantly. In the OC-3c access scenario, both τ1,S computer communications.
and τ1,NR are below 15 ms in almost all experimental  M. Allman and V. Paxson. RFC 2581: TCP
cases. Hence, a restoration objective of τ2,S or τ2,NR is Congestion Control. Apr. 1999.
 W.R. Stevens. TCP/IP Illustrated, Volume 1.
appropriate; in the OC-3c access scenario, τ2,S and τ2,NR
Addison Wesley Press, Apr. 2000.
are approximately 1 s.
 M. Mathis, J. Mahdavi et al. RFC 2018: TCP
We also find that receive buffers can be sized to meet Selective Acknowledgement Options. Oct. 1996.
various utilization and resilience goals. First, from the  K. Fall and S. Floyd. Simulation-based Comparisons
viewpoint of network link utilization, receive buffers of Tahoe, Reno and SACK TCP. Computer
should be set to at least 2rτ in the case that one network Communication Review, V. 26 N. 3, Jul.1996, pp. 5-
failure takes place during the transmission. Anything 21.
lower results in impaired resilience. This is because even  S. Floyd and T. Henderson. RFC 2582: The
if a timeout occurs, ssthresh is still equal to at least rτ NewReno Modification to TCP’s Fast Recovery
when it is initially set to 2rτ. Secondly, the receive buffer Algorithm. Apr. 1999.
should be set as large as possible in order to increase τ1,S  V. Paxson and M. Allman. RFC 2988: Computing
and τ1,NR. Finally, when RTO stays at its minimum of 1 TCP's Retransmission Timer. Nov. 2000.
second, receive buffer size does not affect τ2,S and τ2,NR;  OPNET Modeler. Version: 8.1.A PL3. May 2002.
when RTO is greater than 1 second, larger receive buffers  LBNL Network Simulator. http://www-
lead to longer τ2,S and τ2,NR. nrg.ee.lbl.gov/ns/.
 K. Ramakrishnan, S. Floyd et al. RFC 3168: The
Addition of Explicit Congestion Notification (ECN)
REFERENCES to IP. Sep. 2001.
 Transport Systems Generic Requirements (TSGR):  J. Moy. RFC 2328: OSPF Version 2. Apr. 1998.
Common Requirements. Generic Requirements: GR-  K. Lougheed and Y. Rekhter. RFC 1267: Border
499-CORE. Dec. 1998. Gateway Protocol 3. Oct. 1991.
 Types and Characteristics of SDH Network  N. Brownlee and K.C. Claffy. Understanding
Protection Architectures. ITU-T G.841. Oct. 1998. Internet traffic streams: dragonflies and
 S. Mokbel. Canada's Optical Research and Education tortoises. IEEE Communications Magazine,
Network: CA*net3. Proceedings DRCN 2000, 40(10), Oct. 2002, pp. 110-117.
Munich, April2000, pp. 10-32.
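As a numerical illustration of the formulas above, the sketch below evaluates Eq. (12), its large-buffer approximation Eq. (13), and the 2rτ buffer-sizing rule for a few access rates. The SMSS of 1460 bytes, the nominal line rates, and the treatment of r as bytes per second are assumptions made for illustration, not values taken from the paper's experiments.

```python
# Sketch (not the paper's code): evaluate Eq. (12), Eq. (13) and the
# 2*r*tau receive-buffer rule. Assumed values: SMSS = 1460 bytes,
# nominal line rates, with r expressed in bytes per second.

SMSS = 1460  # sender maximum segment size, bytes (assumed)

ACCESS_RATES_BPS = {      # nominal client access rates, bits/s (assumed)
    "DS0": 64_000,
    "DS1": 1_544_000,
    "OC-3c": 155_520_000,
}

def tau1_nr(rbuff, r_bytes, k=3):
    """Eq. (12): ((rbuff/SMSS - k) * SMSS) / r, with
    k = 3 for a Type I Failure and k = 4 for a Type II Failure."""
    return (rbuff / SMSS - k) * SMSS / r_bytes

def tau1_nr_approx(rbuff, r_bytes):
    """Eq. (13): the large-buffer approximation rbuff / r."""
    return rbuff / r_bytes

def min_rbuff(r_bytes, rtt):
    """The 2*r*tau rule from the Conclusions: a receive buffer of at
    least 2*r*tau keeps ssthresh at r*tau or more after one failure."""
    return 2 * r_bytes * rtt

for name, bps in ACCESS_RATES_BPS.items():
    r = bps / 8  # convert to bytes per second
    for rbuff_kib in (32, 64, 1024):
        rbuff = rbuff_kib * 1024
        print(f"{name:6s} rbuff={rbuff_kib:5d}K  "
              f"tau1_NR={tau1_nr(rbuff, r):10.4f} s  "
              f"approx={tau1_nr_approx(rbuff, r):10.4f} s")
```

Doubling rbuff doubles Eq. (13) exactly, and nearly doubles Eq. (12) once rbuff/SMSS dominates the dropped-segment term; the DS0 values sit well above 600 ms while the OC-3c values are in the millisecond range, consistent with the low-rate and high-rate recommendations above.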
[Plots omitted; only captions are recoverable. Figures 2 and 4 plot congestion window (KB) vs. transmission time with the failure starting point marked; Figures 6-11 plot transmission time increase (s) vs. outage duration (s) for receive buffers ranging from 8 KB to 1024 KB.]

Figure - 1 Core Network Topology.
Figure - 2 Congestion Window vs. Transmission Time (SACK DS1-32K Case).
Figure - 3 TTI vs. No. of Lost Segments (SACK DS1-32K Case).
Figure - 4 Congestion Window vs. Transmission Time (NewReno DS1-32K Case).
Figure - 5 TTI vs. No. of Lost Segments (NewReno DS1-32K Case).
Figure - 6 TTI vs. Outage Duration (SACK DS0 Access).
Figure - 7 TTI vs. Outage Duration (SACK DS1 Access).
Figure - 8 TTI vs. Outage Duration (SACK OC-3c Access).
Figure - 9 TTI vs. Outage Duration (NewReno DS0 Access).
Figure - 10 TTI vs. Outage Duration (NewReno DS1 Access).
Figure - 11 TTI vs. Outage Duration (NewReno OC-3c Access).