Microscopic Behavior of TCP Congestion Control

Microscopic Behavior of Internet Control
Xiaoliang (David) Wei NetLab, CS&EE California Institute of Technology

Internet Control

Problem -> solution -> understanding ->

- 1986: First Internet Congestion Collapse
- 1988~1990: TCP-Tahoe; DEC-bit
- 1993~1995: Tri-S, DUAL, TCP-Vegas

[Timeline axis: 1986, 1989, 1995, 1999, 2003, …]

Outline

- Motivation
- Overview of Microscopic behavior
- Stability of Delay-based Congestion Control Algorithms
- Fairness of Loss-based Congestion Control Algorithms
- Future work
- Summary


Macroscopic View of TCP Control


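TCP/AQM: A feedback control system

[Figure: TCP Sender 1 and TCP Sender 2 send to TCP Receiver 1 and TCP Receiver 2 through a shared bottleneck of capacity C. The senders set the rates x_i(t) (TCP: Reno, Vegas, FAST); the bottleneck produces the congestion price q(t) (AQM: DropTail / RED, Delay, ECN). τF and τB label the forward and backward (feedback) delays.]

$\dot{x}_i(t) = F\big(x_i(t),\ q(t - \tau_B)\big)$
$\dot{q}(t) = G\big(q(t),\ \sum_i x_i(t - \tau_F) - c\big)$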


Fluid Models

$\dot{x}_i(t) = F\big(x_i(t),\ q(t - \tau_B)\big)$
$\dot{q}(t) = G\big(q(t),\ \sum_i x_i(t - \tau_F) - c\big)$

Assumptions:
- TCP algorithms directly control the transmission rates;
- The transmission rates are differentiable (smooth);
- Each TCP packet observes the same congestion price (loss, delay or ECN).

Methodology based on Fluid Models

$\dot{x}_i(t) = F\big(x_i(t),\ q(t - \tau_B)\big)$
$\dot{q}(t) = G\big(q(t),\ \sum_i x_i(t - \tau_F) - c\big)$

Equilibrium:
- Efficiency?
- Fairness?

Dynamics:
- Stability?
- Responsiveness?

Gap 1: Stability of TCP Vegas


Analysis: “TCP Vegas is stable if (and only if) the number of flows is large, and capacity is small, and delay is small.”



Experiment: a single TCP Vegas flow is stable with arbitrary delay and capacity.

Gap 2: Fairness of Scalable TCP


Analysis: “Scalable TCP is fair in homogeneous network” [Kelly’03]



Analysis: [Chiu&Jain’90] → Scalable TCP is unfair.



Experiment: in most cases, Scalable TCP is unfair in homogeneous network.

Gap 3: TCP vs TFRC


Analysis: “We designed TCP Friendly Rate Control (TFRC) algorithm to have the same equilibrium as TCP when they co-exist.”



Experiment: TCP flows do not fairly coexist with TFRC flows.

Gaps
- Stability: TCP-Vegas
- Fairness: Scalable TCP
- Friendliness: TCP vs TFRC

Current analytical models ignore microscopic behavior in TCP congestion control.


Microscopic View (Packet level)

Two timescales:
- On each RTT: TCP congestion control algorithm;
- On each packet arrival: ack-clocking:
      p--;
      while (p < w(t)) do
          send a packet;
          p++;
(p: number of packets in flight)
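
The ack-clocking rule above can be sketched in a few lines of Python. This is only an illustration: the class name `AckClockedSender`, its methods, and the idea of passing the window as a callable of time are assumptions made for the example, not part of the slides.

```python
class AckClockedSender:
    """Minimal sketch of ack-clocking (names are hypothetical)."""

    def __init__(self, window):
        self.window = window   # callable: time -> congestion window w(t), in packets
        self.in_flight = 0     # p in the slides: number of packets in flight

    def _fill(self, now):
        # while (p < w(t)): send a packet; p++
        sent = 0
        while self.in_flight < self.window(now):
            self.in_flight += 1
            sent += 1
        return sent

    def start(self, now):
        """Initial transmission: the whole window leaves as one micro burst."""
        return self._fill(now)

    def on_ack(self, now):
        """Called on each ack arrival; returns how many packets are sent."""
        self.in_flight = max(self.in_flight - 1, 0)  # p--
        return self._fill(now)
```

With a constant window each ack releases exactly one packet; when the window jumps, the difference leaves back to back as a micro burst, and when it shrinks, acks release nothing until the packets in flight drain below the window.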

[Animation: a single flow between Sender and Receiver over a bottleneck of capacity C; each frame plots the sending rate x(t) against c over time t.]
1. W: 0 -> 5 (packets 1 to 5 leave the sender back to back)
2. Packets queued in bottleneck
3. Packets leave bottleneck at rate c
4. Acknowledgments return at rate c
5. New packets sent at rate c
6. No queue in 2nd round trip

No need to control rate x(t)!

Two Flows

[Animation: TCP1 and TCP2 send to Rcv1 and Rcv2 through a shared bottleneck of capacity C; packets 1 to 5 of each flow and the returning acknowledgments (A1 to A4) interleave at the bottleneck, and each frame plots x(t) against c over successive RTTs.]

On-off pattern for each flow

Sub-RTT Burstiness: NS-2 Measurement

Two levels of Burstiness
[Figure: x(t) against capacity c over two RTTs.]

Micro Burst
- Pulse function
- Input rate >> c
- Extra queue & loss
- Transient

Sub-RTT burstiness
- On-off function
- Input rate <= c
- No extra queue & loss
- Persistent

Microscopic Effects: known
Micro Burst
- Loss-based TCP: Low throughput with small buffer; pacing improves throughput (clearly understood)
- Delay-based TCP: Noise to delay signal, should be eliminated (partially…)

Sub-RTT Burstiness
- Loss-based TCP and Delay-based TCP: Observed in Internet traffic (“Why do we care?”)

Microscopic Effects: new
Micro Burst
- Loss-based TCP: Low throughput with small buffer; pacing improves throughput (clearly understood)
- Delay-based TCP: Fast convergence in queuing delay and better stability

Sub-RTT Burstiness
- Loss-based TCP: Low loss synchronization rate with DropTail routers
- Delay-based TCP: No effect

New Understandings
Micro Burst with Delay-based TCP: fast queue convergence
1. A single TCP-Vegas flow is always stable, regardless of delay and capacity.

Sub-RTT Burstiness and Loss-based TCP: low loss sync rate
1. Scalable TCP is (usually) unfair;
2. TCP is unfriendly to TFRC.



A packet level model: basis

Ack-clocking: on each ack arrival
    p--;
    while (p < w(t)) do
        send a packet;
        p++;
(p: number of packets in flight)

- Packets can only be sent upon arrival of an acknowledgment;
- A micro burst of packets can be sent at a single moment;
- Window size w(t) can be an arbitrary given process.

A packet level model: variables

- p_j : number of packets in flight when j is sent;
- s_j : sending time of packet j;
- b_j : backlog experienced by packet j;
- a_j : ack arrival time of packet j.

[Animation: packets 1 to 6 and acknowledgments A3 to A5 in transit between Sender and Receiver across bottleneck C; successive frames introduce p_j, s_j, b_j and a_j in turn.]

A packet level model: variables

$p_j = \max_{0 \le k \le p_{j-1}} \{\, p_{j-1} - k + 1 \;:\; p_{j-1} - k + 1 \le w(a_{j-1-p_{j-1}+k}) \,\}$
$s_j = a_{j - p_j}$

- k : number of acks between s_j and s_{j-1};
- p_j : number of packets in flight when j is sent;
- s_j : sending time of packet j;
- a_{j-p_j} : ack arrival time of the packet one RTT ago.

A packet level model: variables

k : number of acks between s_j and s_{j-1}. For example, k = 0:

$p_j = \max_{0 \le k \le p_{j-1}} \{\, p_{j-1} - k + 1 \;:\; p_{j-1} - k + 1 \le w(\ldots) \,\} = p_{j-1} + 1$
$s_j = a_{j - p_j} = a_{j - p_{j-1} - 1} = a_{j-1-p_{j-1}} = s_{j-1}$

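A packet level model: variables

[Figure: bottleneck queue of capacity C holding packets j, j-1, …; c(s_j - s_{j-1}) packets are served between the two sending times.]

$b_j = \max\{\, b_{j-1} + 1 - c\,(s_j - s_{j-1}),\ 0 \,\}$
$a_j = s_j + d + \frac{b_j}{c}$

- b_j : experienced backlog;
- c : bottleneck capacity;
- a_j : ack arrival time;
- d : propagation delay.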

A packet level model

$p_j = \max_{0 \le k \le p_{j-1}} \{\, p_{j-1} - k + 1 \;:\; p_{j-1} - k + 1 \le w(a_{j-1-p_{j-1}+k}) \,\}$
$s_j = a_{j - p_j}$
$b_j = \max\{\, b_{j-1} + 1 - c\,(s_j - s_{j-1}),\ 0 \,\}$
$a_j = s_j + d + \frac{b_j}{c}$

- p_j : number of packets in flight when j is sent;
- s_j : sending time of packet j;
- b_j : backlog experienced by packet j;
- a_j : ack arrival time of packet j.
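
A rough way to get a feel for the model is to simulate it. The sketch below runs the ack-clocking rule event by event and applies the backlog and ack-time recursions, b_j = max{b_{j-1} + 1 - c(s_j - s_{j-1}), 0} and a_j = s_j + d + b_j/c. The function name, the choice of an empty queue before the first packet, and the convention that p_j counts packet j itself are assumptions made for this illustration.

```python
def simulate_ack_clocking(window, c, d, n_packets):
    """Sketch of the packet level model for one flow (hypothetical helper).

    window    : callable, time -> congestion window w(t) in packets
    c         : bottleneck capacity (packets per unit time)
    d         : propagation delay (round trip, without queueing)
    n_packets : stop after this many packets have been sent
    Returns lists s, b, a, p indexed by packet number j.
    """
    s, b, a, p = [], [], [], []
    in_flight = 0

    def send(t):
        nonlocal in_flight
        if s:
            # b_j = max(b_{j-1} + 1 - c (s_j - s_{j-1}), 0)
            backlog = max(b[-1] + 1.0 - c * (t - s[-1]), 0.0)
        else:
            backlog = 0.0  # assumption: the queue is empty before packet 0
        in_flight += 1
        s.append(t)
        b.append(backlog)
        a.append(t + d + backlog / c)  # a_j = s_j + d + b_j / c
        p.append(in_flight)            # packets in flight, counting packet j

    # Initial micro burst at time 0.
    while in_flight < window(0.0) and len(s) < n_packets:
        send(0.0)

    # Acks return in order; each one may release new packets.
    acked = 0
    while acked < len(s) and len(s) < n_packets:
        t = a[acked]
        acked += 1
        in_flight -= 1
        while in_flight < window(t) and len(s) < n_packets:
            send(t)
    return s, b, a, p
```

For example, with c = 1 and d = 10 (a bandwidth-delay product of 10 packets), a constant window of 5 produces an on-off sending pattern with no backlog after the first round trip, while a constant window of 15 settles at a backlog of 5 packets, so that p_j = cd + b_j.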

Ack-clocking: quick sending process

Theorem: For any time s_j at which a packet j is sent, there is always a packet j* := j*(j) such that
- s_j = s_{j*}
- p_{j*} = w(s_j)

The number of packets in flight at any packet sending time is in sync with the congestion window.

[Figure: w(t) and p(t) against time t, with the sending times s marked.]

Ack-clocking: fast queue convergence

Theorem: If $p_k \ge cd$ for all $k$ with $j - p_j \le k \le j$, then
$p_j = cd + b_j$

The queue converges instantly if the window size is larger than the BDP during the entire previous RTT.

[Figure: w(t) and q(t) against time t, with the sending times s marked.]

Window Control and Ack-clocking

Per RTT Window Control:
- makes a decision once every RTT
- with the measurement from the latest acknowledgment (a subsequence of sequence numbers k1, k2, k3, …)

[Figure: w(t) and p(t) against time t, with the ack arrival times a_{k1}, a_{k2} and the sending times s_{k1}, s_{k2}, s_{k3} marked.]

Stability of TCP Vegas

Theorem: Given the packet level model, if $\alpha d > 1$, a single TCP Vegas flow converges to equilibrium for arbitrary capacity c and propagation delay d. That is, there exists a sequence number J such that for all $j \ge J$:

$cd + \alpha d - 1 \le w(s_j) \le cd + \alpha d + 1$
$\alpha d - 1 \le b_j \le \alpha d + 1$

Stability of Vegas : 100-flow simulation

Stability of Vegas : Avg Window Size

Window Oscillation: 1 packet

Stability of Vegas : Queue Size

Queue Oscillation: 100 packets ( because 100 flows synchronized )

Gap 1: Stability of TCP Vegas


Analysis: “TCP Vegas is stable if (and only if) the number of flows is large, and capacity is small, and delay is small.”

Reason: micro burst leads to fast queue convergence


Experiment: a single TCP Vegas flow is stable with arbitrary delay and capacity.
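
The role of fast queue convergence can be illustrated with a toy per-RTT loop. The sketch below is not the packet level model itself: it simply assumes that the backlog settles to max(w - cd, 0) within each RTT, and it uses a single Vegas threshold α expressed as a rate, so that αd is the target backlog. The function name and these simplifications are assumptions of the example.

```python
def vegas_single_flow(c, d, alpha, w0=1, rtts=200):
    """Toy per-RTT Vegas loop under an instant-queue assumption.

    c     : capacity (packets per unit time)
    d     : propagation delay
    alpha : Vegas threshold as a rate, so alpha * d is the target backlog
    Returns a list of (window, backlog) pairs, one per RTT.
    """
    w = w0
    history = []
    for _ in range(rtts):
        # Assumption: the queue has already converged for this window.
        backlog = max(w - c * d, 0.0)
        rtt = d + backlog / c
        history.append((w, backlog))
        # Vegas compares the expected rate w/d with the actual rate w/rtt.
        diff = w / d - w / rtt
        if diff < alpha:
            w += 1
        elif diff > alpha:
            w -= 1
    return history
```

Under these assumptions the window climbs one packet per RTT and then stays within one packet of cd + αd, whatever values of c and d are chosen, which is the behavior the experiment reports for a single flow.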

FAST : stable and responsive
Designed based on the intuition that the queue is directly a function of the congestion window size. A FAST flow does the following every other RTT:

[Window update equation not recoverable from the extracted text; it is expressed in terms of w(t), p_j, b_j, α, d and c.]

FAST : stability


- Theorem: Given the packet level model, homogeneous FAST flows converge to equilibrium regardless of capacity c, propagation delay d and number of flows N.
- [Tang, Jacobsson, Andrew, Low’07]: FAST is stable with a single bottleneck link regardless of capacity c, propagation delay d and number of flows N (with an extended fluid model capturing micro-burst effects).
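
As a hedged illustration of why a window-based update can be both responsive and stable when the queue follows the window, the sketch below iterates the window update as commonly published for FAST TCP, w <- min{2w, (1-γ)w + γ(baseRTT/RTT · w + α)}, for a single flow. This is an assumption of the example: it may not be the exact form used on the slide, α here is a target backlog in packets, and the queue is assumed to settle instantly to max(w - cd, 0).

```python
def fast_single_flow(c, d, alpha, gamma=0.5, w0=1.0, updates=100):
    """Toy single-flow FAST iteration under an instant-queue assumption.

    c      : capacity (packets per unit time)
    d      : propagation delay (used as baseRTT)
    alpha  : target number of packets queued at the bottleneck
    gamma  : smoothing factor in (0, 1]
    Returns the list of window values after each update.
    """
    w = w0
    trace = []
    for _ in range(updates):
        backlog = max(w - c * d, 0.0)   # assumed converged queue
        rtt = d + backlog / c
        target = d / rtt * w + alpha    # baseRTT / RTT * w + alpha
        w = min(2.0 * w, (1.0 - gamma) * w + gamma * target)
        trace.append(w)
    return trace
```

In this toy setting the window first doubles, then approaches cd + α geometrically, with the remaining error shrinking by the factor 1 - γ at each update, independent of c and d.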



Micro-burst: Summary
[Figure: x(t) against capacity c over two RTTs.]

Effects:
- Fast queue convergence
- Stability of homogeneous Vegas for arbitrary delay
- Possibility of very responsive & stable TCP control
- Stability of FAST for arbitrary delay



New Understandings
Micro Burst with Delay-based TCP: fast queue convergence
1. A single (homogeneous) TCP-Vegas flow is always stable, regardless of delay and capacity.

Sub-RTT Burstiness and Loss-based TCP: low loss sync rate
1. Scalable TCP is (usually) unfair;
2. TCP is unfriendly to TFRC.

Loss Synchronization Rate: Definition


Loss Synchronization Rate [Baccelli,Hong’02]: The probability that a flow observes a packet loss during a congestion event.
- Congestion event (loss event): A round-trip time interval in which at least one packet is dropped by the bottleneck router due to congestion (buffer overflow at the router).

Loss Synchronization Rate: Effects


Intuitions:
- Individual flow: the smaller the better (selfishness)
- System design: the higher the better (for fairness and convergence)

Theoretical Results:
- Aggregate throughput [Baccelli,Hong’02]
- Instantaneous fairness [Baccelli,Hong’02]
- Fairness convergence [Shorten, Wirth, Leith’06]


Loss Sync. Rate: Existing Model


[Shorten, Wirth, Leith’06] No Model. Measure from NS-2 and feed into a model for computational results



[Baccelli,Hong’02] Assume each packet has the same probability of being dropped in the loss event.

Packet loss is bursty: Internet

~50% losses happen in bursts

Loss process is bursty: on-off
[Figure: incoming packets from all flows during the RTT of the loss event; within the burst period of the loss signal, L incoming packets are dropped. Legend: a packet (from any flow); a dropped packet.]

In each loss event (one RTT), the packet loss process is an on-off process.

Data packet process is bursty: on-off
[Figure: incoming packets from all flows during the RTT of the loss event; the burst period of one flow covers w packets. Below, x(t) against capacity c over two RTTs. Legend: a packet (from any flow); a packet from flow i.]

In each loss event (one RTT), the TCP data packet process is an on-off process.

Loss Sync. Rate: A Sampling Perspective
[Figure: incoming packets from all flows during the RTT of the loss event; the burst period of one flow covers w packets, and within the burst period of the loss signal L incoming packets are dropped. Legend: a packet (from any flow); a packet from flow i; a dropped packet.]

Loss Sync. Rate: The efficiency of a (bursty) TCP data process in sampling the loss signal of a (bursty) loss process.

- Assumption 1: Within the RTT of the loss event, the position of an individual flow’s burst is uniformly distributed.
- Assumption 2: The loss process does not depend on the data packet process of individual flows.

Loss Sync. Rate Case 1: TCP+DropTail

[Figure: incoming packets from all flows during the RTT of the loss event; the burst period of one flow covers w packets, and within the burst period of the loss signal L incoming packets are dropped. Legend: a packet (from any flow); a packet from flow i; a dropped packet.]

$\lambda_i = \frac{L + w_i - 1}{cd + B + L}$

- w_i : window of a TCP flow
- L : number of dropped packets
- cd+B+L : number of packets going through the bottleneck in the loss event (c : capacity; d : propagation delay; B : buffer size)
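
The TCP+DropTail expression (L + w_i - 1)/(cd + B + L) can be checked by brute force under the uniform-position assumption. The sketch below places a contiguous loss burst of L packets and tries every possible start position for a contiguous flow burst of w packets, counting the positions where the two overlap. Treating the RTT as a ring of cd+B+L packet slots (so bursts wrap around) and the function name are assumptions of this example.

```python
def droptail_sync_rate_by_enumeration(w, L, total):
    """Fraction of burst positions at which a w-packet burst hits the loss burst.

    w     : window (burst length) of the flow, in packets
    L     : number of consecutive dropped packets
    total : packets passing the bottleneck in the loss event (cd + B + L)
    Assumption: positions wrap around, i.e. the RTT is treated as a ring.
    """
    loss_slots = set(range(L))  # the loss burst occupies slots 0 .. L-1
    hits = 0
    for start in range(total):  # uniformly distributed burst position
        if any((start + k) % total in loss_slots for k in range(w)):
            hits += 1
    return hits / total
```

With w = 60, L = 16 and cd+B = 1080, the enumeration finds L + w - 1 = 75 overlapping positions out of 1096, in agreement with the closed form.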

Loss Sync. Rate: TCP+DropTail

Loss Sync. Rate Case 2: Pacing+DropTail

[Figure: incoming packets from all flows during the RTT of the loss event; the w packets of flow i are distributed over the entire RTT, and the burst period of the loss signal covers L incoming packets. Legend: a packet (from any flow); a packet from flow i; a dropped packet.]

$\lambda_i = 1 - \left(1 - \frac{L}{cd + B + L}\right)^{w_i}$

- w_i : window of a TCP flow
- L : number of dropped packets
- cd+B+L : number of packets going through the bottleneck in the loss event

Loss Sync. Rate: Pacing + DropTail

Loss Sync. Rate Case 3: TCP+RED

[Figure: incoming packets from all flows during the RTT of the loss event; the burst period of one flow covers w packets, and packet loss is distributed over the entire RTT of the loss event.]

$\lambda_i = 1 - \left(1 - \frac{w_i}{cd + B + L}\right)^{L}$

- w_i : window of a TCP flow
- L : number of dropped packets
- cd+B+L : number of packets going through the bottleneck in the loss event
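
The three cases can be compared side by side with a few lines of Python. The function names are hypothetical, and the cap at 1.0 in the DropTail case is an added safeguard for very large windows, not something stated on the slides.

```python
def sync_rate_tcp_droptail(w, L, cd_plus_B):
    """Bursty TCP + DropTail: (L + w - 1) / (cd + B + L), capped at 1."""
    return min((L + w - 1) / (cd_plus_B + L), 1.0)


def sync_rate_pacing_droptail(w, L, cd_plus_B):
    """Paced flow + DropTail: 1 - (1 - L / (cd + B + L)) ** w."""
    return 1.0 - (1.0 - L / (cd_plus_B + L)) ** w


def sync_rate_tcp_red(w, L, cd_plus_B):
    """Bursty TCP + RED: 1 - (1 - w / (cd + B + L)) ** L."""
    return 1.0 - (1.0 - w / (cd_plus_B + L)) ** L


if __name__ == "__main__":
    w, L, cd_plus_B = 60, 16, 1080
    for name, fn in [("TCP + DropTail", sync_rate_tcp_droptail),
                     ("Pacing + DropTail", sync_rate_pacing_droptail),
                     ("TCP + RED", sync_rate_tcp_red)]:
        print(f"{name}: {fn(w, L, cd_plus_B):.3f}")
```

For w_i = 60, L = 16 and cd+B = 1080, the bursty TCP + DropTail rate is several times smaller than either of the other two, which is the qualitative ordering these formulas imply.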

Model for Loss Sync. Rate: General form

[Figure: cd+B incoming packets during the RTT of the loss event; the burst period of flow i spans K incoming packets, and during the burst period of the loss signal packets are dropped at random from M incoming packets. Legend: a packet (from any flow); a packet from flow i; a dropped packet.]

- cd+B : number of packets going through the bottleneck in the loss event (c : capacity; d : propagation delay; B : buffer size)
- w_i : window of a TCP flow in the loss event
- L : number of dropped packets in the loss event
- λ_i : ?
- K_i : length of burst period of flow i (in packets)
- M : length of burst period of loss process (in packets)



Loss Sync. Rate: MatLab Computation

cd+B = 1080; wi = 60; L = 16; K , M vary

Measurement: TCP + DropTail
Averaged sync. rate
- cd+B = 3340
- M = L = N/2
- K = w = (cd+B)/N

Measurement: Pacing + DropTail

Averaged sync. rate
- cd+B = 3340
- M = L = N/2
- K = w = (cd+B)/N

Measurement: TCP + RED

Averaged sync. rate
- cd+B = 3340
- M = L = N/2
- K = w = (cd+B)/N

Loss Sync. Rate: Qualitative Results






- With DropTail and bursty TCP (the most widely deployed combination), the loss synchronization rate is very low;
- TCP Pacing increases the loss synchronization rate;
- RED increases the loss synchronization rate.

Loss Sync. Rate: Asymptotic Result

If the number of flows N is large: L >> w_i

- TCP: $\lambda_i = \frac{L + w_i - 1}{cd + B + L} \approx \frac{L}{cd + B + L}$
  Very weak dependency of loss sync. rate on window size: all flows see the same loss.
- TCP Pacing: $\lambda_i = 1 - \left(1 - \frac{L}{cd + B + L}\right)^{w_i} \approx \frac{w_i L}{cd + B + L}$
  Loss sync. rate is proportional to window size: rich guys see more loss.

Asymptotic Result: MatLab Computation

cd+B = 1080; L = N/2; N varies. Fair share window size: (cd+B)/N
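
The approximation for the paced case follows from a first-order expansion. As a short worked step, writing x for L/(cd+B+L) (a shorthand introduced only for this derivation):

```latex
\begin{align*}
\lambda_i &= 1 - (1 - x)^{w_i}, \qquad x = \frac{L}{cd + B + L} \\
          &= 1 - \Bigl(1 - w_i x + \tbinom{w_i}{2} x^2 - \dots\Bigr) \\
          &\approx w_i x = \frac{w_i L}{cd + B + L}
          \qquad \text{when } w_i x \ll 1 .
\end{align*}
```

The linear form therefore holds only while w_i L is small compared with cd+B+L; for larger windows the exact expression saturates toward 1.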

Implications
1. Scalable TCP is (usually) unfair with bursty TCP;
2. TCP is unfriendly to TFRC;
3. …

Fairness of Scalable TCP
- For each RTT without a loss: w_i(t+1) = α w_i(t); α = 1.01
- For each RTT with a loss (loss event): w_i(t+1) = β w_i(t); β = 0.875
- [Chiu,Jain’90]: MIMD algorithms cannot converge to fairness under the synchronization model
- [Kelly’03]: Scalable TCP (MIMD) converges to fairness in theory under the fluid model
- [Wei, Jin, Low’06][Li,Leith,Shorten’07]: Scalable TCP is unfair in experiments


Fairness of Scalable TCP: Chiu vs Kelly

[Chiu,Jain’90]: MIMD is not fair
- Assumption: loss event rate is independent of window size (simplified synchronization model)

[Kelly’03]: Scalable TCP (MIMD) is fair
- Assumption: loss event rate is proportional to window size (fluid model)

Fairness of Scalable TCP: Chiu vs Kelly

[Chiu,Jain’90]: MIMD is not fair
- Assumption: loss event rate is independent of window size (simplified synchronization model)
- Sync. Rate Model: many bursty TCP flows

[Kelly’03]: Scalable TCP is fair
- Assumption: loss event rate is proportional to window size (fluid model)
- Sync. Rate Model: true with very few bursty TCP flows or with paced TCP flows
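
The synchronization-model side of this argument can be illustrated with a small sketch. It takes the extreme case of a window-independent loss rate, in which every flow backs off in every loss event, and shows that the MIMD rules then never change the ratio between two windows. The function name, the two-flow setup and the capacity threshold used to trigger loss events are assumptions of the example.

```python
def mimd_two_flows(w1, w2, capacity, alpha=1.01, beta=0.875, rtts=5000):
    """Two Scalable-TCP-like (MIMD) flows with fully synchronized losses.

    Assumption: a loss event occurs whenever w1 + w2 exceeds `capacity`,
    and both flows observe it (loss sync rate independent of window size).
    """
    for _ in range(rtts):
        if w1 + w2 > capacity:
            # Loss event seen by both flows: multiplicative decrease.
            w1 *= beta
            w2 *= beta
        else:
            # No loss: multiplicative increase.
            w1 *= alpha
            w2 *= alpha
    return w1, w2
```

Because both windows are always scaled by the same factor, a flow that starts with four times the window of the other still has four times the window after thousands of RTTs; any move toward fairness has to come from larger windows seeing losses more often.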

Scalable TCP: simulations

Capacity=100Mbps; delay=200ms; buffer size: BDP; MTU=1500; N varies; averaged rate over 600 second runtime

Gap 2: Fairness of Scalable TCP


Analysis: “Scalable TCP is fair in homogeneous network” [Kelly’03]

Analysis: “MIMD in general is unfair.” [Chiu&Jain’90]. → Scalable TCP is unfair.


Reason: sub-RTT burstiness leads to similar loss sync. rate for different flows


Experiment: in most cases, Scalable TCP is unfair in homogeneous network.

TFRC vs TCP

[Figure: incoming packets from all flows during the RTT of the loss event; TCP packets (labeled 1) arrive in a burst period of w packets, TFRC packets (labeled 2) are spread over the RTT, and the burst period of the loss signal covers L incoming packets. Legend: a packet (from any flow); a packet from TCP; a packet from TFRC; a dropped packet.]

- TCP: $\lambda_i = \frac{L + w_i - 1}{cd + B + L}$
- TFRC (same as Pacing): $\lambda_i = 1 - \left(1 - \frac{L}{cd + B + L}\right)^{w_i}$

TFRC vs TCP: simulation

Gap 3: TCP vs TFRC


Analysis: “We designed TCP Friendly Rate Control (TFRC) algorithm to have the same equilibrium as TCP when they co-exist.”

Reason: sub-RTT burstiness leads to different loss sync. rate for TFRC and TCP


Experiment: TCP flows do not fairly coexist with TFRC flows.

Sub-RTT Burstiness: Summary
[Figure: x(t) against capacity c over two RTTs.]

Effects:
- Low loss sync. rate with DropTail routers
  - Poor convergence
  - MIMD unfairness
  - TFRC unfriendliness

Possible solutions:
- Eliminate sub-RTT burstiness: Pacing
- Randomize loss signal: RED
- Persistent loss signal: ECN



Future: a research framework on microscopic Internet behavior






- Experiment tools: help to observe, analyze and validate microscopic behavior in the Internet: WAN-in-Lab, NS-2 TCP-Linux, …
- Theoretical model: more accurate models to capture the dynamics of the Internet at microscopic timescales.
- New algorithms: new algorithms that utilize and control the microscopic Internet behavior.

NS-2 TCP-Linux
- The first tool that can run a congestion algorithm directly from Linux source code with the same simulation speed (sometimes even faster)
- 700+ local downloads (2400+ tutorial visits worldwide)
- 5+ Linux kernel fixes
- 2+ papers
- Outreach: BIC/Cubic-TCP (NCSU), H-TCP (Hamilton), TCP Westwood (UCLA/Politecnico di Bari), A-Reno (NEC), …

[Figure labels: NS-2 Simulator; Linux Implementation]

Thank you!
Q&A


						