Transport Layer Computer Networking by cuteandfamous


									Section 3: Transport Layer
Our goals:
r understand principles behind transport layer services:
   m multiplexing/demultiplexing
   m reliable data transfer
   m flow control
   m congestion control
r instantiation and implementation in the Internet

Overview:
r transport layer services
r multiplexing/demultiplexing
r connectionless transport: UDP
r principles of reliable data transfer
r connection-oriented transport: TCP
   m reliable transfer
   m flow control
   m connection management
r principles of congestion control
r TCP congestion control

                                                  3: Transport Layer   3-1
Transport services and protocols

r provide logical communication between app processes running on different hosts
r transport protocols run in end systems
r transport vs network layer services:
   m network layer: data transfer between end systems
   m transport layer: data transfer between processes
   m transport layer relies on, enhances, network layer services

[Figure: protocol stacks along an end-to-end path; end systems show the application layer on top of network, data link, physical; intermediate nodes show network, data link, physical only]

Transport-layer protocols

Internet transport services:
r reliable, in-order unicast delivery (TCP)
   m   congestion control
   m   flow control
   m   connection setup
r unreliable (“best-effort”), unordered unicast or multicast delivery: UDP
r services not available:
   m   real-time
   m   bandwidth guarantees
   m   reliable multicast

[Figure: protocol stacks along an end-to-end path, as on the previous slide]

Multiplexing/demultiplexing
Recall: segment - unit of data exchanged between transport layer entities
    m aka TPDU: transport protocol data unit
Demultiplexing: delivering received segments to correct app layer processes

[Figure: application-layer data M from processes P1, P2, P3, P4 passes through the transport and network layers; the transport layer prepends segment header Ht to M, and the network layer prepends Hn to the segment]

Multiplexing: gathering data from multiple app processes, enveloping data with header (later used for demultiplexing)
r based on sender, receiver port numbers, IP addresses
   m source, dest port #s in each segment
   m recall: well-known port numbers for specific applications

TCP/UDP segment format (32 bits wide):
   source port #  |  dest port #
   other header fields
   application data (message)

Multiplexing/demultiplexing: examples
port use: simple telnet app
   host A -> server B:   source port: x,  dest. port: 23
   server B -> host A:   source port: 23, dest. port: x

port use: Web server
   Web client host A -> Web server B:   Source IP: A, Dest IP: B, source port: x, dest. port: 80
   Web client host C -> Web server B:   Source IP: C, Dest IP: B, source port: y, dest. port: 80
   Web client host C -> Web server B:   Source IP: C, Dest IP: B, source port: x, dest. port: 80
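
The Web server example can be sketched as a lookup table keyed on source IP, source port, dest IP and dest port. This is only a minimal illustration: the class name Demultiplexer, the process labels, and the concrete values chosen for ports x and y are assumptions, not part of the slides.

```python
from typing import Dict, Tuple

# (source IP, source port, dest IP, dest port) -> label of the receiving process
ConnKey = Tuple[str, int, str, int]


class Demultiplexer:
    """Hypothetical demultiplexing table at Web server B."""

    def __init__(self) -> None:
        self.table: Dict[ConnKey, str] = {}

    def register(self, key: ConnKey, process: str) -> None:
        self.table[key] = process

    def deliver(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
        # All four fields are needed: dest port 80 alone is shared by every client.
        return self.table[(src_ip, src_port, dst_ip, dst_port)]


X, Y = 5000, 5001  # assumed values standing in for ports x and y

demux = Demultiplexer()
demux.register(("A", X, "B", 80), "process for client A")
demux.register(("C", Y, "B", 80), "process for client C, port y")
demux.register(("C", X, "B", 80), "process for client C, port x")
```

Host A and host C both use source port x here, yet the segments still reach different processes because the source IP differs.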

UDP: User Datagram Protocol [RFC 768]
r “no frills,” “bare bones” Internet transport protocol
r “best effort” service, UDP segments may be:
   m lost
   m delivered out of order to app
r connectionless:
   m no handshaking between UDP sender, receiver
   m each UDP segment handled independently of others

Why is there a UDP?
r no connection establishment (which can add delay)
r simple: no connection state at sender, receiver
r small segment header
r no congestion control: UDP can blast away as fast as desired

UDP: more
r often used for streaming multimedia apps
   m loss tolerant
   m rate sensitive
r other UDP uses (why?):
   m DNS
   m SNMP
r reliable transfer over UDP: add reliability at application layer
   m application-specific error recovery!

UDP segment format (32 bits wide):
   source port #  |  dest port #
   length         |  checksum
   application data (message)
(length: in bytes, of UDP segment, including header)

UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment

Sender:
r treat segment contents as sequence of 16-bit integers
r checksum: 1’s complement of the 1’s complement sum of segment contents
r sender puts checksum value into UDP checksum field

Receiver:
r compute checksum of received segment
r check if computed checksum equals checksum field value:
   m NO - error detected
   m YES - no error detected. But maybe errors nonetheless? More later ….
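
A minimal sketch of the sender and receiver steps above, using the 16-bit 1's complement sum with end-around carry. The function names and the sample bytes are illustrative assumptions. The sketch covers only the bytes it is given, whereas the real UDP checksum also covers a pseudo-header.

```python
def ones_complement_sum(data: bytes) -> int:
    """Add the data as 16-bit big-endian integers with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # wrap the carry back in
    return total


def checksum(data: bytes) -> int:
    """Sender side: 1's complement of the 1's complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


def receiver_check(data: bytes, checksum_field: int) -> bool:
    """Receiver side: recompute and compare with the checksum field."""
    return checksum(data) == checksum_field


segment = bytes.fromhex("0001f203f4f5f6f7")   # sample contents
field = checksum(segment)

flipped = bytes([segment[0] ^ 0x01]) + segment[1:]    # one flipped bit
swapped = segment[2:4] + segment[0:2] + segment[4:]   # two 16-bit words exchanged
```

The flipped bit is caught, but the swapped words produce the same sum, which is one way errors can go undetected ("But maybe errors nonetheless?").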

Principles of Reliable data transfer
r important in app., transport, link layers
r top-10 list of important networking topics!

r characteristics of unreliable channel will determine
   complexity of reliable data transfer protocol (rdt)
 Reliable data transfer: getting started
send side:
   rdt_send(): called from above (e.g., by app.). Passed data to deliver to receiver upper layer
   udt_send(): called by rdt, to transfer packet over unreliable channel to receiver

receive side:
   rdt_rcv(): called when packet arrives on rcv-side of channel
   deliver_data(): called by rdt to deliver data to upper layer

Reliable data transfer: getting started
r incrementally develop sender, receiver sides of
  reliable data transfer protocol (rdt)
r consider only unidirectional data transfer
   m   but control info will flow in both directions!
r use finite state machines (FSM) to specify
  sender, receiver
   m   state: when in this “state”, next state uniquely determined by next event
   m   each transition is labeled with the event causing the state transition and the actions taken on the state transition

[Figure: transition from state 1 to state 2, labeled with event (above the line) and actions (below the line)]

Rdt1.0: reliable transfer over a reliable channel
r underlying channel perfectly reliable
   m no bit errors
   m no loss of packets

r separate FSMs for sender, receiver:
   m sender sends data into underlying channel
   m receiver reads data from underlying channel

sender (single state: Wait for call from above)
   rdt_send(data)
      packet = make_pkt(data)
      udt_send(packet)

receiver (single state: Wait for call from below)
   rdt_rcv(packet)
      extract(packet,data)
      deliver_data(data)

Rdt2.0: channel with bit errors
  r underlying channel may flip bits in packet
     m recall: UDP checksum to detect bit errors

  r   the question: how to recover from errors:
      m   acknowledgements (ACKs): receiver explicitly tells sender
          that pkt received OK
      m   negative acknowledgements (NAKs): receiver explicitly
          tells sender that pkt had errors
      m   sender retransmits pkt on receipt of NAK
      m   human scenarios using ACKs, NAKs?
  r new mechanisms in rdt2.0 (beyond rdt1.0):
      m   error detection
      m   receiver feedback: control msgs (ACK,NAK) rcvr->sender

rdt2.0: FSM specification
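sender
   state: Wait for call from above
      rdt_send(data)
         sndpkt = make_pkt(data, checksum)
         udt_send(sndpkt)
         -> Wait for ACK or NAK
   state: Wait for ACK or NAK
      rdt_rcv(rcvpkt) && isNAK(rcvpkt)
         udt_send(sndpkt)
      rdt_rcv(rcvpkt) && isACK(rcvpkt)
         Λ (no action)
         -> Wait for call from above

receiver
   state: Wait for call from below
      rdt_rcv(rcvpkt) && corrupt(rcvpkt)
         udt_send(NAK)
      rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
         extract(rcvpkt,data)
         deliver_data(data)
         udt_send(ACK)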

rdt2.0: operation with no errors
[Figure: the rdt2.0 sender and receiver FSMs from the specification slide, tracing the case in which the packet arrives uncorrupted and the sender receives an ACK]

rdt2.0: error scenario
[Figure: the rdt2.0 sender and receiver FSMs from the specification slide, tracing the case in which the packet arrives corrupted and the sender retransmits it]

rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
r sender doesn’t know what happened at receiver!
r can’t just retransmit: possible duplicate

What to do?
r sender ACKs/NAKs receiver’s ACK/NAK? What if sender ACK/NAK lost?
r retransmit, but this might cause retransmission of correctly received pkt!

Handling duplicates:
r sender adds sequence number to each pkt
r sender retransmits current pkt if ACK/NAK garbled
r receiver discards (doesn’t deliver up) duplicate pkt

stop and wait
Sender sends one packet, then waits for receiver response
rdt2.1: sender, handles garbled ACK/NAKs
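state: Wait for call 0 from above
   rdt_send(data)
      sndpkt = make_pkt(0, data, checksum)
      udt_send(sndpkt)
      -> Wait for ACK or NAK 0
state: Wait for ACK or NAK 0
   rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
      udt_send(sndpkt)
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
      Λ (no action)
      -> Wait for call 1 from above
state: Wait for call 1 from above
   rdt_send(data)
      sndpkt = make_pkt(1, data, checksum)
      udt_send(sndpkt)
      -> Wait for ACK or NAK 1
state: Wait for ACK or NAK 1
   rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isNAK(rcvpkt) )
      udt_send(sndpkt)
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt)
      Λ (no action)
      -> Wait for call 0 from above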

  rdt2.1: receiver, handles garbled ACK/NAKs
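state: Wait for 0 from below
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
      extract(rcvpkt,data)
      deliver_data(data)
      sndpkt = make_pkt(ACK, chksum)
      udt_send(sndpkt)
      -> Wait for 1 from below
   rdt_rcv(rcvpkt) && corrupt(rcvpkt)
      sndpkt = make_pkt(NAK, chksum)
      udt_send(sndpkt)
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)      (duplicate: discard, but ACK so the sender can move on)
      sndpkt = make_pkt(ACK, chksum)
      udt_send(sndpkt)
state: Wait for 1 from below
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
      extract(rcvpkt,data)
      deliver_data(data)
      sndpkt = make_pkt(ACK, chksum)
      udt_send(sndpkt)
      -> Wait for 0 from below
   rdt_rcv(rcvpkt) && corrupt(rcvpkt)
      sndpkt = make_pkt(NAK, chksum)
      udt_send(sndpkt)
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)      (duplicate: discard, but ACK so the sender can move on)
      sndpkt = make_pkt(ACK, chksum)
      udt_send(sndpkt)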

rdt2.1: discussion
Sender:
r seq # added to pkt
r two seq. #’s (0,1) will suffice. Why?
r must check if received ACK/NAK corrupted
r twice as many states
   m   state must “remember” whether “current” pkt has 0 or 1 seq. #

Receiver:
r must check if received packet is duplicate
   m   state indicates whether 0 or 1 is expected pkt seq #
r note: receiver can not know if its last ACK/NAK received OK at sender

rdt2.2: a NAK-free protocol

r same functionality as rdt2.1, using ACKs only
r instead of NAK, receiver sends ACK for last pkt
  received OK
   m   receiver must explicitly include seq # of pkt being ACKed
r duplicate ACK at sender results in same action as
  NAK: retransmit current pkt

    rdt2.2: sender, receiver fragments
sender FSM fragment
   state: Wait for call 0 from above
      rdt_send(data)
         sndpkt = make_pkt(0, data, checksum)
         udt_send(sndpkt)
         -> Wait for ACK 0
   state: Wait for ACK 0
      rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
         udt_send(sndpkt)
      rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
         Λ (no action)

receiver FSM fragment
   state: Wait for 0 from below
      rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt))
         udt_send(sndpkt)
   transition into Wait for 0 from below:
      rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
         extract(rcvpkt,data)
         deliver_data(data)
         sndpkt = make_pkt(ACK1, chksum)
         udt_send(sndpkt)
rdt3.0: channels with errors and loss

New assumption: underlying channel can also lose packets (data or ACKs)
   m   checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Q: how to deal with loss?
   m   sender waits until certain data or ACK lost, then retransmits
   m   yuck: drawbacks?

Approach: sender waits “reasonable” amount of time for ACK
r retransmits if no ACK received in this time
r if pkt (or ACK) just delayed (not lost):
   m retransmission will be duplicate, but use of seq. #’s already handles this
   m receiver must specify seq # of pkt being ACKed
r requires countdown timer
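
The approach can be sketched as a small, deterministic simulation of stop-and-wait with a timeout and alternating 0/1 sequence numbers. It is a simplification under stated assumptions: the function name run_rdt3 and the loss sets are hypothetical, only loss is modeled (no bit errors, no premature timeouts), and a timeout is counted whenever no ACK comes back.

```python
from typing import List, Set, Tuple


def run_rdt3(messages: List[str], lose_data: Set[int], lose_ack: Set[int]) -> Tuple[List[str], int]:
    """Simulate stop-and-wait over a channel that drops chosen packets.

    lose_data / lose_ack hold the indices (counted over all transmissions)
    of the data packets / ACKs that the channel drops.
    Returns (data delivered to the receiver's upper layer, number of timeouts).
    """
    delivered: List[str] = []
    expected = 0      # receiver: seq # it is waiting for
    timeouts = 0
    data_sent = 0     # running count of data transmissions
    acks_sent = 0     # running count of ACK transmissions
    seq = 0           # sender: seq # of the current packet
    for msg in messages:
        acked = False
        while not acked:
            # sender: (re)transmit the current packet and start the timer
            data_lost = data_sent in lose_data
            data_sent += 1
            if data_lost:
                timeouts += 1          # nothing comes back, timer expires
                continue
            # receiver: deliver new data, discard a duplicate, ACK either way
            if seq == expected:
                delivered.append(msg)
                expected ^= 1
            ack_lost = acks_sent in lose_ack
            acks_sent += 1
            if ack_lost:
                timeouts += 1          # ACK never arrives, timer expires
                continue
            acked = True
        seq ^= 1                       # alternate between 0 and 1
    return delivered, timeouts


delivered, timeouts = run_rdt3(["a", "b", "c"], lose_data={1}, lose_ack={0})
```

In this run the first ACK and the first retransmission are both lost, so "a" is sent three times, yet it is delivered only once because the receiver recognizes the duplicate by its sequence number.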

rdt3.0 sender
state: Wait for call 0 from above
   rdt_send(data)
      sndpkt = make_pkt(0, data, checksum)
      udt_send(sndpkt)
      start_timer
      -> Wait for ACK0
   rdt_rcv(rcvpkt)
      Λ (no action)
state: Wait for ACK0
   rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
      Λ (no action)
   timeout
      udt_send(sndpkt)
      start_timer
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
      stop_timer
      -> Wait for call 1 from above
state: Wait for call 1 from above
   rdt_send(data)
      sndpkt = make_pkt(1, data, checksum)
      udt_send(sndpkt)
      start_timer
      -> Wait for ACK1
   rdt_rcv(rcvpkt)
      Λ (no action)
state: Wait for ACK1
   rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) )
      Λ (no action)
   timeout
      udt_send(sndpkt)
      start_timer
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1)
      stop_timer
      -> Wait for call 0 from above

rdt3.0 in action

 Performance of rdt3.0

 r rdt3.0 works, but performance stinks
 r example: 1 Gbps link, 15 ms e-e prop. delay (so RTT = 30 msec), 1KB packet:

\[ T_{transmit} = \frac{L \text{ (packet length in bits)}}{R \text{ (transmission rate, bps)}} = \frac{8\ \text{kb/pkt}}{10^{9}\ \text{b/sec}} = 8\ \text{microsec} \]

\[ U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027 \quad \text{(times in msec)} \]

    m   U sender: utilization, the fraction of time sender busy sending
    m   1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
    m   network protocol limits use of physical resources!
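
The arithmetic above can be checked with a few lines of code. The constant and function names are illustrative, and the 1KB packet is taken as 8,000 bits, as on the slide.

```python
L_BITS = 8_000    # packet length: 1 KB taken as 8 kb
R_BPS = 1e9       # link rate: 1 Gbps
RTT_S = 0.030     # round-trip time: 2 x 15 ms propagation delay


def transmit_time(l_bits: float, r_bps: float) -> float:
    """Time to push all bits of one packet onto the link, in seconds."""
    return l_bits / r_bps


def sender_utilization(l_bits: float, r_bps: float, rtt_s: float) -> float:
    """Fraction of time a stop-and-wait sender is busy sending."""
    t = transmit_time(l_bits, r_bps)
    return t / (rtt_s + t)


t_transmit = transmit_time(L_BITS, R_BPS)
u_sender = sender_utilization(L_BITS, R_BPS, RTT_S)
# One packet (1000 bytes) gets through per RTT + L/R.
throughput_bytes_per_s = (L_BITS / 8) / (RTT_S + t_transmit)
```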

    rdt3.0: stop-and-wait operation
sender:     first packet bit transmitted, t = 0
sender:     last packet bit transmitted, t = L / R
receiver:   first packet bit arrives
receiver:   last packet bit arrives, send ACK
sender:     ACK arrives, send next packet, t = RTT + L / R

\[ U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027 \quad \text{(times in msec)} \]

Pipelined protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
   m   range of sequence numbers must be increased
   m   buffering at sender and/or receiver

r Two generic forms of pipelined protocols: go-Back-N, selective repeat
      Pipelining: increased utilization
sender:     first packet bit transmitted, t = 0
sender:     last bit transmitted, t = L / R
receiver:   first packet bit arrives
receiver:   last packet bit arrives, send ACK
receiver:   last bit of 2nd packet arrives, send ACK
receiver:   last bit of 3rd packet arrives, send ACK
sender:     ACK arrives, send next packet, t = RTT + L / R

Increase utilization by a factor of 3!

\[ U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008 \quad \text{(times in msec)} \]
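
As a rough check of the factor of 3, and of how many in-flight packets would keep the sender busy all the time, the same formula can be evaluated for a general number of pipelined packets. This uses exact fractions to avoid rounding surprises; the function names are illustrative assumptions.

```python
import math
from fractions import Fraction

T_TRANSMIT = Fraction(8_000, 10**9)   # L / R in seconds (8 microsec)
RTT = Fraction(30, 1000)              # 30 msec


def pipelined_utilization(n_pkts: int) -> Fraction:
    """Sender utilization with n_pkts packets in flight per RTT + L/R (capped at 1)."""
    return min(Fraction(1), n_pkts * T_TRANSMIT / (RTT + T_TRANSMIT))


def window_to_fill_pipe() -> int:
    """Smallest number of in-flight packets that keeps the sender always busy."""
    return math.ceil((RTT + T_TRANSMIT) / T_TRANSMIT)
```

With these numbers, three packets give 3/3751 (about 0.0008), and roughly 3751 packets in flight would be needed to reach full utilization.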
Go-Back-N
Sender:
r k-bit seq # in pkt header
r “window” of up to N consecutive unack’ed pkts allowed
r ACK(n): ACKs all pkts up to, including seq # n - “cumulative ACK”
   m may receive duplicate ACKs (see receiver)
r timer for each in-flight pkt
r timeout(n): retransmit pkt n and all higher seq # pkts in window

GBN: sender extended FSM

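rdt_send(data)
   if (nextseqnum < base+N) {
       sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
       udt_send(sndpkt[nextseqnum])
       if (base == nextseqnum)
           start_timer
       nextseqnum++
   }
   else
       refuse_data(data)

timeout
   start_timer
   udt_send(sndpkt[base])
   udt_send(sndpkt[base+1])
   …

rdt_rcv(rcvpkt) && corrupt(rcvpkt)
   Λ (no action)

rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
   base = getacknum(rcvpkt)+1
   if (base == nextseqnum)
       stop_timer
   else
       start_timer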
GBN: receiver extended FSM
Λ (initialization)
   expectedseqnum=0
state: Wait
   rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum)
      extract(rcvpkt,data)
      deliver_data(data)
      sndpkt = make_pkt(expectedseqnum,ACK,chksum)
      udt_send(sndpkt)
      expectedseqnum++
   otherwise (corrupt or out-of-order pkt)
      udt_send(sndpkt)

 r ACK-only: always send ACK for correctly-received pkt
   with highest in-order seq #
    m   may generate duplicate ACKs
    m   need only remember expectedseqnum
 r out-of-order pkt:
    m discard (don’t buffer) -> no receiver buffering!
    m Re-ACK pkt with highest in-order seq #
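
The interplay of the GBN sender window and the ACK-only receiver can be sketched with a round-based simulation. It is a simplification, not the FSMs themselves: the function name go_back_n is hypothetical, ACKs are assumed to arrive reliably, each listed packet is lost only on its first transmission, and a timeout is modeled as "the window base made no progress this round".

```python
from typing import List, Set


def go_back_n(num_pkts: int, window: int, lose_once: Set[int]) -> List[int]:
    """Return the order in which data packets are put on the channel.

    lose_once: seq #s whose first transmission is dropped by the channel.
    """
    base = 0
    next_seq = 0
    expected = 0                     # receiver remembers only expectedseqnum
    pending_loss = set(lose_once)
    sent_log: List[int] = []
    while base < num_pkts:
        # sender: send while the window [base, base+N) has room
        while next_seq < min(base + window, num_pkts):
            sent_log.append(next_seq)
            if next_seq in pending_loss:
                pending_loss.discard(next_seq)   # dropped in transit
            elif next_seq == expected:
                expected += 1                    # in-order: deliver and ACK
            # otherwise out-of-order: receiver discards it and re-ACKs
            next_seq += 1
        if expected > base:
            base = expected          # cumulative ACK slides the window
        else:
            next_seq = base          # timeout: go back, resend the whole window
    return sent_log


order = go_back_n(num_pkts=6, window=4, lose_once={2})
```

Here packet 2 is lost, so packets 3, 4 and 5 are discarded by the receiver and sent again after the timeout, even though they arrived intact the first time.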
GBN in action
Selective Repeat
r receiver individually acknowledges all correctly
  received pkts
   m   buffers pkts, as needed, for eventual in-order delivery
       to upper layer
r sender only resends pkts for which ACK not received
   m   sender timer for each unACKed pkt
r sender window
   m N consecutive seq #’s
   m again limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows

Selective repeat
sender
data from above:
r if next available seq # in window, send pkt
timeout(n):
r resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N-1]:
r mark pkt n as received
r if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver
pkt n in [rcvbase, rcvbase+N-1]
r send ACK(n)
r out-of-order: buffer
r in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N,rcvbase-1]
r ACK(n)
otherwise:
r ignore
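
The receiver rules can be sketched as a small class. This is an illustration under assumptions: the class name SRReceiver is hypothetical, and sequence numbers are plain growing integers, so wraparound of a finite seq # space is not modeled.

```python
from typing import Dict, List, Optional


class SRReceiver:
    """Hypothetical selective repeat receiver with window size N."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.rcvbase = 0
        self.buffer: Dict[int, str] = {}
        self.delivered: List[str] = []

    def on_packet(self, n: int, data: str) -> Optional[int]:
        """Return the seq # to ACK, or None if the packet is ignored."""
        if self.rcvbase <= n <= self.rcvbase + self.window - 1:
            self.buffer[n] = data                    # buffer, possibly out of order
            while self.rcvbase in self.buffer:       # deliver the in-order run
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1                    # advance window
            return n
        if self.rcvbase - self.window <= n <= self.rcvbase - 1:
            return n                                 # already delivered: ACK again
        return None                                  # otherwise: ignore


rcv = SRReceiver(window=4)
acks = [rcv.on_packet(0, "a"), rcv.on_packet(2, "c"), rcv.on_packet(1, "b")]
```

Packet 2 is buffered while packet 1 is missing; once packet 1 arrives, both are delivered in order and the window base moves to 3.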

Selective repeat in action

Selective repeat:
r seq #’s: 0, 1, 2, 3
r window size=3

r receiver sees no difference in two scenarios!
r incorrectly passes duplicate data as new in (a)

Q: what relationship
   between seq # size
   and window size?
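
One way to explore the question is to check when a retransmitted packet from the old window could be mistaken for a new one. The sketch below assumes the worst case in which the receiver has accepted a full window and advanced, while every ACK was lost; the function name is a hypothetical helper.

```python
def retransmission_is_ambiguous(seq_space: int, window: int) -> bool:
    """True if an old seq # can also fall inside the receiver's next window."""
    old_window = {n % seq_space for n in range(0, window)}
    new_window = {n % seq_space for n in range(window, 2 * window)}
    return bool(old_window & new_window)
```

With seq #'s 0, 1, 2, 3 and window size 3 the two windows overlap, which is the situation on this slide; with window size 2 they do not. Under this model the overlap disappears when the window is at most half the size of the sequence number space.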
TCP: Overview                                 RFCs: 793, 1122, 1323, 2018, 2581

r point-to-point:
   m one sender, one receiver
r reliable, in-order byte stream:
   m no “message boundaries”
r pipelined:
   m TCP congestion and flow control set window size
r send & receive buffers
r full duplex data:
   m bi-directional data flow in same connection
   m MSS: maximum segment size
r connection-oriented:
   m handshaking (exchange of control msgs) init’s sender, receiver state before data exchange
r flow controlled:
   m sender will not overwhelm receiver

[Figure: application writes data through the socket door into the TCP send buffer; on the other side, the application reads data through the socket door from the TCP receive buffer]

   TCP segment structure
Segment layout (32 bits wide):
   source port #  |  dest port #
   sequence number
   acknowledgement number
   head len  |  not used  |  U A P R S F  |  rcvr window size
   checksum  |  ptr urgent data
   Options (variable length)
   data (variable length)

Field notes:
   sequence number, acknowledgement number: counting by bytes of data (not segments!)
   rcvr window size: # bytes rcvr willing to accept
   URG: urgent data (generally not used)
   ACK: ACK # valid
   PSH: push data now (generally not used)
   RST, SYN, FIN: connection estab (setup, teardown)
   checksum: Internet checksum (as in UDP)
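
To make the layout concrete, the fixed 20-byte part of the header (everything before Options) can be packed and unpacked with Python's struct module. This is a sketch only: the helper names and sample values are assumptions, options are omitted, and the checksum and urgent pointer are left at zero rather than computed.

```python
import struct
from typing import Dict

# ports (2x16 bits), seq # and ACK # (2x32), head len byte, flag byte,
# rcvr window size, checksum, urgent data pointer (3x16); network byte order
TCP_HEADER = struct.Struct("!HHIIBBHHH")

# Flag bits in the order shown on the slide: U A P R S F
FLAG_BITS = {"U": 0x20, "A": 0x10, "P": 0x08, "R": 0x04, "S": 0x02, "F": 0x01}


def build_header(src: int, dst: int, seq: int, ack: int, flags: str, window: int) -> bytes:
    flag_byte = 0
    for f in flags:
        flag_byte |= FLAG_BITS[f]
    head_len_words = 5   # 5 x 32-bit words = 20 bytes, no options
    return TCP_HEADER.pack(src, dst, seq, ack, head_len_words << 4, flag_byte, window, 0, 0)


def parse_header(raw: bytes) -> Dict[str, object]:
    (src, dst, seq, ack, offset_byte, flag_byte,
     window, checksum, urgent) = TCP_HEADER.unpack(raw[:20])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "head_len_bytes": (offset_byte >> 4) * 4,
        "flags": "".join(name for name, bit in FLAG_BITS.items() if flag_byte & bit),
        "rcvr_window": window,
        "checksum": checksum,
        "urgent_ptr": urgent,
    }


header = build_header(src=1234, dst=80, seq=42, ack=79, flags="SA", window=4096)
fields = parse_header(header)
```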

 TCP seq. #’s and ACKs
Seq. #’s:
    m byte stream “number” of first byte in segment’s data
ACKs:
    m seq # of next byte expected from other side
    m cumulative ACK
Q: how receiver handles out-of-order segments
    m A: TCP spec doesn’t say, - up to implementor

[Figure: simple telnet scenario between Host A and Host B: user types ‘C’; host B ACKs receipt of ‘C’, echoes back ‘C’; host A ACKs receipt of echoed ‘C’]
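
The telnet exchange can be traced with concrete numbers. The starting sequence numbers below (42 for Host A, 79 for Host B) are assumed purely for illustration, as is the helper name ack_for.

```python
def ack_for(seq: int, data: bytes) -> int:
    """ACK # = seq # of the next byte expected from the other side."""
    return seq + len(data)


# Assumed starting sequence numbers, chosen only for illustration.
a_seq, b_seq = 42, 79

# User types 'C': Host A sends one byte of data.
a_to_b = {"seq": a_seq, "ack": b_seq, "data": b"C"}

# Host B ACKs receipt of 'C' and echoes 'C' back in the same segment.
b_to_a = {"seq": b_seq, "ack": ack_for(a_to_b["seq"], a_to_b["data"]), "data": b"C"}

# Host A ACKs receipt of the echoed 'C' (no data of its own).
a_ack = {"seq": b_to_a["ack"], "ack": ack_for(b_to_a["seq"], b_to_a["data"]), "data": b""}
```

Because sequence numbers count bytes rather than segments, each one-byte segment advances the matching ACK # by exactly one.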
TCP: reliable data transfer

simplified sender, assuming
   •one way data transfer
   •no flow, congestion control

[Figure: sender with a single “wait for event” state and three events]
   event: data received from application above -> create, send segment
   event: timer timeout for segment with seq # y -> retransmit segment
   event: ACK received, with ACK # y -> ACK processing

TCP: simplified TCP sender

sendbase = initial_sequence number
nextseqnum = initial_sequence number

loop (forever) {
  switch(event)

  event: data received from application above
      create TCP segment with sequence number nextseqnum
      start timer for segment nextseqnum
      pass segment to IP
      nextseqnum = nextseqnum + length(data)

  event: timer timeout for segment with sequence number y
      retransmit segment with sequence number y
      compute new timeout interval for segment y
      restart timer for sequence number y

  event: ACK received, with ACK field value of y
      if (y > sendbase) { /* cumulative ACK of all data up to y */
          cancel all timers for segments with sequence numbers < y
          sendbase = y
      }
      else { /* a duplicate ACK for already ACKed segment */
          increment number of duplicate ACKs received for y
          if (number of duplicate ACKs received for y == 3) {
              /* TCP fast retransmit */
              resend segment with sequence number y
              restart timer for segment y
          }
      }
} /* end of loop forever */
                                                          3: Transport Layer    3-45
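The simplified sender pseudocode above can be exercised as an event-driven object. This is a sketch only: the class name `SimplifiedTcpSender`, the `unacked` dictionary standing in for per-segment timers, and the `sent` log standing in for "pass segment to IP" are all assumptions made for illustration.

```python
class SimplifiedTcpSender:
    """One-way sender with no flow or congestion control (illustrative)."""

    def __init__(self, initial_seq: int):
        self.sendbase = initial_seq    # oldest unACKed byte
        self.nextseqnum = initial_seq  # next byte to be sent
        self.unacked = {}              # seq -> data; stands in for running timers
        self.dup_acks = {}             # ACK value -> number of duplicates seen
        self.sent = []                 # log of (seq, data) passed to IP

    def on_app_data(self, data: bytes):
        seq = self.nextseqnum
        self.unacked[seq] = data       # "start timer" for this segment
        self.sent.append((seq, data))  # pass segment to IP
        self.nextseqnum += len(data)

    def on_timeout(self, y: int):
        if y in self.unacked:          # retransmit; the timer is restarted
            self.sent.append((y, self.unacked[y]))

    def on_ack(self, y: int):
        if y > self.sendbase:
            # cumulative ACK of all data up to y: cancel those timers
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]
            self.sendbase = y
        else:
            # duplicate ACK for an already ACKed segment
            self.dup_acks[y] = self.dup_acks.get(y, 0) + 1
            if self.dup_acks[y] == 3 and y in self.unacked:
                self.sent.append((y, self.unacked[y]))  # fast retransmit


s = SimplifiedTcpSender(100)
s.on_app_data(b"abcd")   # segment with seq 100
s.on_app_data(b"efgh")   # segment with seq 104
s.on_ack(104)            # first segment acknowledged
for _ in range(3):
    s.on_ack(104)        # three duplicate ACKs for byte 104
print(s.sent)
```

The third duplicate ACK causes the segment starting at byte 104 to be sent again without waiting for its timer.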
TCP ACK generation                      [RFC 1122, RFC 2581]

Event                               TCP Receiver action
in-order segment arrival,           delayed ACK. Wait up to 500ms
no gaps,                            for next segment. If no next segment,
everything else already ACKed       send ACK

in-order segment arrival,           immediately send single
no gaps,                            cumulative ACK
one delayed ACK pending

out-of-order segment arrival,       send duplicate ACK, indicating seq. #
higher-than-expected seq. #,        of next expected byte
gap detected

arrival of segment that             immediate ACK if segment starts
partially or completely fills gap   at lower end of gap
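
The table rows can be read as a decision procedure. Below is a minimal sketch; the function name `receiver_action` and its boolean parameters are hypothetical, and the branch for a segment below the expected byte is an added assumption because the table does not cover it.

```python
def receiver_action(seg_seq: int, expected: int,
                    gap_open: bool, delayed_ack_pending: bool) -> str:
    """Pick the receiver's ACK action for one arriving segment.

    seg_seq: sequence number of the arriving segment
    expected: next in-order byte the receiver is waiting for
    gap_open: True if out-of-order data is already buffered above `expected`
    delayed_ack_pending: True if an earlier in-order segment is still unACKed
    """
    if seg_seq > expected:
        # out-of-order arrival, gap detected
        return f"send duplicate ACK for byte {expected}"
    if seg_seq < expected:
        # old duplicate; not a row in the table (assumption)
        return f"re-send ACK for byte {expected}"
    if gap_open:
        # segment starts at the lower end of the gap
        return "send immediate ACK"
    if delayed_ack_pending:
        return "send single cumulative ACK immediately"
    return "delay ACK up to 500 ms"


print(receiver_action(200, 100, gap_open=False, delayed_ack_pending=False))
```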

TCP: retransmission scenarios
[Figure: two Host A / Host B timelines with timers labelled "Seq=92 timeout" and "Seq=100 timeout". Left panel: lost ACK scenario. Right panel: premature timeout, cumulative ACKs.]

TCP Flow Control
flow control: sender won’t overrun receiver’s buffers by transmitting too much, too fast

RcvBuffer = size of TCP Receive Buffer
RcvWindow = amount of spare room in Buffer

receiver: explicitly informs sender of (dynamically changing) amount of free buffer space
    m RcvWindow field in TCP segment
sender: keeps the amount of transmitted, unACKed data less than most recently received RcvWindow

[Figure: receiver buffering]

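The receiver and sender rules can be written as two small checks. This is a sketch under assumptions: the byte counters `last_byte_rcvd`, `last_byte_read`, `last_byte_sent` and `last_byte_acked` are names introduced here, and the sender check allows unACKed data up to and including RcvWindow.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer, advertised to the sender."""
    buffered = last_byte_rcvd - last_byte_read  # received but not yet read by the app
    return rcv_buffer - buffered


def can_send(nbytes: int, last_byte_sent: int, last_byte_acked: int,
             advertised_window: int) -> bool:
    """True if sending nbytes keeps unACKed data within the advertised window."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + nbytes <= advertised_window


window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window, can_send(1000, last_byte_sent=5000, last_byte_acked=4000,
                       advertised_window=window))
```

Because RcvWindow changes as the application reads data, the sender has to re-evaluate the check against the most recently received value.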
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
r longer than RTT
   m note: RTT will vary
r too short: premature timeout
   m unnecessary retransmissions
r too long: slow reaction to segment loss

Q: how to estimate RTT?
r SampleRTT: measured time from segment transmission until ACK receipt
   m ignore retransmissions, cumulatively ACKed segments
r SampleRTT will vary, want estimated RTT “smoother”
   m use several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout
EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
     r Exponentially weighted moving average
     r influence of given sample decreases exponentially fast
     r typical value of x: 0.1

Setting the timeout
r EstimatedRTT plus “safety margin”
r large variation in EstimatedRTT -> larger safety margin

       Timeout = EstimatedRTT + 4*Deviation
     Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT|

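The timeout computation is easy to try on a few samples. A minimal sketch follows, assuming the deviation term is x*|SampleRTT-EstimatedRTT| (the usual form), that the deviation is updated before the estimate, and that the first sample seeds the estimate; the class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    """Exponentially weighted moving average of RTT with a safety margin."""

    def __init__(self, first_sample: float, x: float = 0.1):
        self.x = x
        self.estimated_rtt = first_sample  # seed with the first measurement (assumption)
        self.deviation = 0.0

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new Timeout."""
        # deviation uses the estimate from before this sample (ordering assumption)
        error = abs(sample_rtt - self.estimated_rtt)
        self.deviation = (1 - self.x) * self.deviation + self.x * error
        self.estimated_rtt = (1 - self.x) * self.estimated_rtt + self.x * sample_rtt
        return self.timeout()

    def timeout(self) -> float:
        return self.estimated_rtt + 4 * self.deviation


est = RttEstimator(first_sample=100.0)       # milliseconds
for sample in (100.0, 200.0):
    print(sample, round(est.update(sample), 3))
```

A single outlier of 200 ms moves the estimate only from 100 to about 110 ms, but it raises the timeout to about 150 ms through the deviation term.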
Example RTT estimation:
[Figure: SampleRTT and Estimated RTT plotted as RTT (milliseconds) versus time (seconds)]

TCP Connection Management

Recall: TCP sender, receiver establish “connection” before exchanging data segments
r initialize TCP variables:
   m seq. #s
   m buffers, flow control info (e.g. RcvWindow)
r client: connection initiator
    Socket clientSocket = new Socket("hostname","port number");
r server: contacted by client
    Socket connectionSocket =

Three way handshake:
Step 1: client end system sends TCP SYN control segment to server
   m specifies initial seq #
Step 2: server end system receives SYN, replies with SYNACK control segment
   m ACKs received SYN
   m allocates buffers
   m specifies server->receiver initial seq. #

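The sequence and ACK numbers exchanged during the handshake can be sketched as below. The slide lists only the SYN and SYNACK steps; the final client ACK is standard TCP behaviour and is what makes the exchange three-way. The function name, the dictionary layout and the example initial sequence numbers are assumptions for illustration.

```python
def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three control segments as dictionaries."""
    # Step 1: client sends SYN and specifies its initial seq #
    syn = {"flags": "SYN", "seq": client_isn}
    # Step 2: server ACKs the SYN and specifies its own initial seq #
    # (a SYN consumes one sequence number, hence the +1)
    synack = {"flags": "SYN+ACK", "seq": server_isn, "ack": client_isn + 1}
    # Final step (not listed on the slide): client ACKs the server's SYN
    ack = {"flags": "ACK", "seq": client_isn + 1, "ack": server_isn + 1}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```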
TCP Connection Management (cont.)

Closing a connection:

client closes socket:

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client/server timeline with labels: close, timed wait, closed]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK.
   m Enters “timed wait” - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.

Note: with small modification, can handle simultaneous FINs.

[Figure: client/server timeline with labels: closing, timed wait]

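The client side of the four closing steps can be sketched as a small transition table. The state names (`FIN_WAIT_1`, `FIN_WAIT_2`, `TIME_WAIT`) follow the TCP specification rather than the slide, which only says "timed wait" and "closed"; the event names and the `run` helper are hypothetical.

```python
# (state, event) -> (next state, action to take or None)
CLIENT_CLOSE = {
    ("ESTABLISHED", "app_close"): ("FIN_WAIT_1", "send FIN"),   # Step 1
    ("FIN_WAIT_1", "rcv_ACK"): ("FIN_WAIT_2", None),            # server ACKed our FIN
    ("FIN_WAIT_2", "rcv_FIN"): ("TIME_WAIT", "send ACK"),       # Step 3
    ("TIME_WAIT", "rcv_FIN"): ("TIME_WAIT", "send ACK"),        # re-ACK repeated FINs
    ("TIME_WAIT", "timer_expired"): ("CLOSED", None),
}


def run(events, state="ESTABLISHED"):
    """Apply events in order; return the final state and the actions taken."""
    actions = []
    for event in events:
        state, action = CLIENT_CLOSE[(state, event)]
        if action:
            actions.append(action)
    return state, actions


print(run(["app_close", "rcv_ACK", "rcv_FIN", "timer_expired"]))
```

The self-loop in the timed-wait state is the behaviour the slide describes: the client keeps answering retransmitted FINs with ACKs until the wait expires.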
TCP Connection Management (cont)
[Figure: connection state diagrams for TCP server and TCP client]

Principles of Congestion Control

r informally: “too many sources sending too much
  data too fast for network to handle”
r different from flow control!
r manifestations:
   m lost packets (buffer overflow at routers)
   m long delays (queueing in router buffers)
r a top-10 problem!

Causes/costs of congestion: scenario 1
r two senders, two receivers
r one router, infinite buffers
r no retransmission

[Figure: Host A and Host B send $\lambda_{in}$ (original data) into a router with unlimited shared output link buffers; delivered rate is $\lambda_{out}$]

r large delays when congested
r maximum achievable throughput

Causes/costs of congestion: scenario 2

r one router, finite buffers
r sender retransmission of lost packet

[Figure: Host A and Host B send into a router with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; delivered rate is $\lambda_{out}$]

Causes/costs of congestion: scenario 2
r always: $\lambda_{in} = \lambda_{out}$ (goodput)
r “perfect” retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
r retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$

“costs” of congestion:
r more work (retrans) for given “goodput”
r unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3
r four senders
r multihop paths
r timeout/retransmit

Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?

[Figure: Host A and Host B with finite shared output link buffers. $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; delivered rate is $\lambda_{out}$]

Causes/costs of congestion: scenario 3

Another “cost” of congestion:
r when packet dropped, any “upstream” transmission capacity used for that packet was wasted!

Approaches towards congestion control
Two broad approaches towards congestion control:

 End-end congestion            Network-assisted
   control:                      congestion control:
 r no explicit feedback from   r routers provide feedback
   network                       to end systems
 r congestion inferred from       m single bit indicating
   end-system observed loss,         congestion (SNA,
   delay                             DECbit, TCP/IP ECN,
 r approach taken by TCP             ATM)
                                  m explicit rate sender
                                     should send at

Case study: ATM ABR congestion control

ABR: available bit rate:    RM (resource management)
r “elastic service”           cells:
r if sender’s path          r sent by sender, interspersed
  “underloaded”:              with data cells
   m sender should use      r bits in RM cell set by switches
      available bandwidth     (“network-assisted”)
r if sender’s path             m NI bit: no increase in rate
  congested:                      (mild congestion)
   m sender throttled to       m CI bit: congestion
      minimum guaranteed          indication
      rate                  r RM cells returned to sender by
                              receiver, with bits intact

Case study: ATM ABR congestion control

r two-byte ER (explicit rate) field in RM cell
   m congested switch may lower ER value in cell
   m sender’s send rate thus minimum supportable rate on path

r EFCI bit in data cells: set to 1 in congested switch
   m if data cell preceding RM cell has EFCI set, receiver sets CI
     bit in returned RM cell

TCP Congestion Control
r end-end control (no network assistance)
r transmission rate limited by congestion window size, Congwin, over segments
r w segments, each with MSS bytes sent in one RTT:

$$\text{throughput} = \frac{w \cdot MSS}{RTT} \ \text{Bytes/sec}$$

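As a quick worked example of the throughput relation (w segments of MSS bytes per RTT), the sketch below plugs in illustrative numbers. The MSS of 1460 bytes, the window of 10 segments and the 100 ms RTT are assumed values, not figures from the slides.

```python
def window_throughput(w: int, mss_bytes: int, rtt_seconds: float) -> float:
    """Bytes per second when w segments of mss_bytes are sent every RTT."""
    return w * mss_bytes / rtt_seconds


rate = window_throughput(w=10, mss_bytes=1460, rtt_seconds=0.1)
print(f"{rate:.0f} bytes/sec")  # roughly 146 kB/s for these assumed values
```

Doubling the window or halving the RTT doubles the rate, which is why Congwin is the quantity TCP adjusts.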
TCP congestion control:
r “probing” for usable bandwidth:
   m   ideally: transmit as fast as possible (Congwin as large as possible) without loss
   m   increase Congwin until loss (congestion)
   m   loss: decrease Congwin, then begin probing (increasing) again
r two “phases”
   m slow start
   m congestion avoidance
r important variables:
   m   Congwin
   m   threshold: defines threshold between the slow start phase and the congestion avoidance phase

TCP Slowstart
Slowstart algorithm

initialize: Congwin = 1
for (each segment ACKed)
    Congwin++
until (loss event OR
       Congwin > threshold)

r exponential increase (per RTT) in window size (not so slow!)
r loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)

[Figure: Host A / Host B timeline showing the window doubling each RTT]

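The per-ACK increment is what produces the per-RTT doubling. The sketch below simulates it under the assumptions that no loss occurs and that every segment sent in one RTT is ACKed within that RTT; the function name `slow_start` and the threshold value are illustrative.

```python
def slow_start(threshold: int, max_rtts: int):
    """Return Congwin at the start of each RTT until it exceeds threshold."""
    congwin = 1
    per_rtt = [congwin]
    for _ in range(max_rtts):
        if congwin > threshold:
            break                      # slow start is over
        for _ in range(congwin):       # one ACK per segment sent this RTT
            congwin += 1
            if congwin > threshold:
                break
        per_rtt.append(congwin)
    return per_rtt


print(slow_start(threshold=8, max_rtts=10))
```

With a threshold of 8 the window grows 1, 2, 4, 8 and then stops as soon as the first ACK of the next round pushes it past the threshold.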
TCP Congestion Avoidance: Tahoe
TCP Tahoe Congestion avoidance

/* slowstart is over   */
/* Congwin > threshold */
Until (loss event) {
  every w segments ACKed:
     Congwin++
  }
threshold = Congwin/2
Congwin = 1
perform slowstart

TCP Congestion Avoidance: Reno
r three duplicate ACKs (Reno TCP):
r some segments are getting through correctly!
r don’t “overreact” by decreasing window to 1 as in Tahoe
   m   decrease window size by half

TCP Reno Congestion avoidance

/* slowstart is over      */
/* Congwin > threshold */
Until (loss event) {
  every w segments ACKed:
     Congwin++
  }
threshold = Congwin/2
If (loss detected by timeout) {
  Congwin = 1
  perform slowstart }
If (loss detected by triple duplicate ACK)
  Congwin = Congwin/2

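The difference between the Tahoe and Reno reactions shows up clearly in a round-by-round simulation. This is a simplified sketch: it works in whole transmission rounds, uses integer halving, and takes the loss schedule as an input. The names `next_window` and `simulate`, the starting threshold of 8 and the loss at round 7 are all assumptions.

```python
def next_window(congwin: int, threshold: int) -> int:
    """Window for the next round when no loss occurred."""
    if congwin <= threshold:                    # slow start: double per round,
        return min(2 * congwin, threshold + 1)  # stopping once past threshold
    return congwin + 1                          # congestion avoidance: +1 per RTT


def simulate(variant: str, losses: dict, rounds: int, threshold: int = 8):
    """variant is 'tahoe' or 'reno'; losses maps round -> 'timeout' or '3dup'."""
    congwin, history = 1, []
    for r in range(rounds):
        history.append(congwin)
        loss = losses.get(r)
        if loss is None:
            congwin = next_window(congwin, threshold)
        else:
            threshold = max(congwin // 2, 1)
            if variant == "reno" and loss == "3dup":
                congwin = threshold             # halve, stay in congestion avoidance
            else:
                congwin = 1                     # restart from slow start
    return history


print("tahoe", simulate("tahoe", {7: "3dup"}, rounds=12))
print("reno ", simulate("reno", {7: "3dup"}, rounds=12))
```

After the triple-duplicate-ACK loss, Tahoe drops to a window of 1 and has to climb again, while Reno continues from half of its previous window.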
Congestion Avoidance: Reno
r increase window by one per RTT if no loss: Congwin++
r decrease window by half on detection of loss by triple duplicate ACK: Congwin = Congwin/2 (W <- W/2)

TCP Reno versus TCP Tahoe:
[Plot: congestion window size versus transmission round (rounds 1 to 15), with the threshold marked; series: TCP Tahoe, TCP Reno]
Figure 3.49 (revised): Evolution of TCP’s Congestion window (Tahoe and Reno)

TCP Fairness
TCP congestion avoidance:
r AIMD: additive increase, multiplicative decrease
  m   increase window by 1 per RTT
  m   decrease window by factor of 2 on loss

Fairness goal: if N TCP sessions share same bottleneck link, each should get 1/N of link capacity

[Figure: TCP connection 1 and TCP connection 2 sharing a link of capacity R]

Why is TCP fair?
Two competing sessions:
r Additive increase gives slope of 1, as throughput increases
r multiplicative decrease decreases throughput proportionally

[Figure: throughput plot with axes up to R (Connection 1 throughput labelled) and the equal bandwidth share line; the trajectory alternates "congestion avoidance: additive increase" and "loss: decrease window by factor of 2"]

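The fairness argument can be checked numerically: additive increase leaves the gap between two sessions unchanged, and each multiplicative decrease halves it. The sketch below assumes both sessions have the same RTT and see loss at the same moment (whenever their combined rate exceeds the capacity); the function name `aimd` and the starting rates are illustrative.

```python
def aimd(x1: float, x2: float, capacity: float, rounds: int):
    """Evolve two AIMD sessions sharing one bottleneck of the given capacity."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2    # loss: decrease by factor of 2
        else:
            x1, x2 = x1 + 1, x2 + 1    # congestion avoidance: additive increase
    return x1, x2


a, b = aimd(1.0, 70.0, capacity=100.0, rounds=500)
print(round(a, 3), round(b, 3), "gap:", round(abs(a - b), 6))
```

Starting from very unequal rates, the two sessions end up with almost identical shares, which corresponds to the trajectory moving toward the equal bandwidth share line.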
TCP latency modeling
Q: How long does it take to receive an object from a Web server after sending a request?
r TCP connection establishment
r data transfer delay

Notation, assumptions:
r Assume one link between client and server of rate R
r Assume: fixed congestion window, W segments
r S: MSS (bits)
r O: object size (bits)
r no retransmissions (no loss, no corruption)

Two cases to consider:
r WS/R > RTT + S/R: ACK for first segment in window returns before window’s worth of data sent
r WS/R < RTT + S/R: wait for ACK after sending window’s worth of data sent

TCP latency Modeling

K := O/WS

Case 1: latency = 2RTT + O/R
Case 2: latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]

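The two fixed-window cases can be evaluated directly. The following sketch assumes K is rounded up to a whole number of windows when O is not a multiple of WS; the function name and the example values (1000-bit segments on a 1 Mbps link with a 10 ms RTT) are illustrative only.

```python
import math


def static_window_latency(O: float, S: float, R: float, W: int, rtt: float) -> float:
    """Latency for an object of O bits with a fixed window of W segments."""
    K = math.ceil(O / (W * S))          # number of windows covering the object
    if W * S / R > rtt + S / R:
        # Case 1: ACKs return before the window is exhausted, no stalls
        return 2 * rtt + O / R
    # Case 2: server stalls after each of the first K-1 windows
    stall = S / R + rtt - W * S / R
    return 2 * rtt + O / R + (K - 1) * stall


print(static_window_latency(O=16000, S=1000, R=1e6, W=4, rtt=0.01))   # case 2
print(static_window_latency(O=16000, S=1000, R=1e6, W=20, rtt=0.01))  # case 1
```

With the small window, each of the three stalls adds 7 ms, so the transfer takes noticeably longer than the no-stall case even though the link rate is the same.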
TCP Latency Modeling: Slow Start
r Now suppose window grows according to slow start.
r Will show that the latency of one object of size O is:

$$\text{Latency} = 2\,RTT + \frac{O}{R} + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}$$

where P is the number of times TCP stalls at server:

$$P = \min\{Q, K-1\}$$

- where Q is the number of times the server would stall if the object were of infinite size.
- and K is the number of windows that cover the object.

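The slow start latency formula can be evaluated by computing K, Q and P explicitly. This is a sketch under the assumption that the window sizes are 1, 2, 4, ... segments and that the server stalls after window k whenever S/R + RTT exceeds the time to transmit that window; the function name and the example numbers are hypothetical.

```python
def slow_start_latency(O: float, S: float, R: float, rtt: float):
    """Return (K, Q, P, latency) for an object of O bits and segments of S bits."""
    # K: number of windows (1, 2, 4, ... segments) needed to cover the object
    K, covered = 0, 0
    while covered * S < O:
        covered += 2 ** K
        K += 1
    # Q: number of stalls if the object were of infinite size;
    # window Q+1 holds 2**Q segments and takes (2**Q) * S/R to transmit
    Q = 0
    while S / R + rtt - (2 ** Q) * S / R > 0:
        Q += 1
    P = min(Q, K - 1)
    latency = 2 * rtt + O / R + P * (rtt + S / R) - (2 ** P - 1) * S / R
    return K, Q, P, latency


# 15 segments of 1000 bits on a 1 Mbps link with a 2 ms RTT (assumed values)
print(slow_start_latency(O=15000, S=1000, R=1e6, rtt=0.002))
```

For these values the object needs K = 4 windows and the server stalls P = 2 times, so the latency is the 2 RTT plus O/R baseline of 19 ms plus 3 ms of stall time.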
TCP Latency Modeling: Slow Start (cont.)

Example:
O/S = 15 segments
K = 4 windows
P = min{K-1,Q} = 2
Server stalls P=2 times.

[Figure: timeline of the transfer showing RTT and the window transmission times: first window = S/R, second window = 2S/R, third window = 4S/R, fourth window = 8S/R]

TCP Latency Modeling: Slow Start (cont.)

$\frac{S}{R} + RTT$ = time from when server starts to send segment until server receives acknowledgement

$2^{k-1}\frac{S}{R}$ = time to transmit the kth window

$\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}$ = stall time after the kth window

$$\begin{aligned}
\text{latency} &= \frac{O}{R} + 2\,RTT + \sum_{p=1}^{P} \text{stallTime}_p \\
&= \frac{O}{R} + 2\,RTT + \sum_{k=1}^{P}\left[\frac{S}{R} + RTT - 2^{k-1}\frac{S}{R}\right] \\
&= \frac{O}{R} + 2\,RTT + P\left[RTT + \frac{S}{R}\right] - (2^P - 1)\frac{S}{R}
\end{aligned}$$

[Figure: timeline of the transfer showing RTT and the window transmission times: first window = S/R, second window = 2S/R, third window = 4S/R, fourth window = 8S/R]

Chapter 3: Summary
r principles behind transport layer services:
   m multiplexing/demultiplexing
   m reliable data transfer
   m flow control
   m congestion control
r instantiation and implementation in the Internet
   m UDP
   m TCP

Next:
r leaving the network “edge” (application, transport layers)
r into the network “core”
