Congestion Control by ert554898


									TOBB ETÜ BİL 551
Computer Networks
   Lecture 10
  March 16, 2007

      Spring 2007
  Friday 08:30 – 12:00
     Classroom: 212

     Bülent Tavlı
      Office: 169
   btavli@etu.edu.tr
                         Reminders
• Homework III
  – 5.8, 5.14, 5.48, 6.7, 6.17, 6.28, 6.39, 7.1, 7.17
  – Due: March 30, 2007
       Congestion Control

Outline
   Queuing Discipline
   Reacting to Congestion
   Avoiding Congestion
                                              Issues
• Two sides of the same coin
   – pre-allocate resources so as to avoid congestion
   – control congestion if (and when) it occurs

[Figure: Source 1 (10-Mbps Ethernet) and Source 2 (100-Mbps FDDI) feed a router whose 1.5-Mbps T1 link leads to the destination.]


• Two points of implementation
   – hosts at the edges of the network (transport protocol)
   – routers inside the network (queuing discipline)
• Underlying service model
   – best-effort (assume for now)
   – multiple qualities of service (later)
                        Framework
• Connectionless flows
   – sequence of packets sent between source/destination pair
   – maintain soft state at the routers
[Figure: Sources 1, 2, and 3 send flows through a set of routers to Destinations 1 and 2.]


• Taxonomy
   – router-centric versus host-centric
   – reservation-based versus feedback-based
   – window-based versus rate-based
                   Evaluation
• Fairness
• Power (ratio of throughput to delay)




[Figure: throughput/delay plotted against load, with the peak marked as the optimal load.]
                             Queuing Discipline
• First-In-First-Out (FIFO)
   – does not discriminate between traffic sources
• Fair Queuing (FQ)
   – explicitly segregates traffic based on flows
   – ensures no flow captures more than its share of capacity
   – variation: weighted fair queuing (WFQ)
• Problem?
[Figure: FIFO queue. (a) An arriving packet takes the next free buffer; queued packets wait behind the next packet to transmit. (b) An arriving packet is dropped when no buffer is free.]

[Figure: fair queuing. Flows 1 to 4 each have their own queue, served round-robin.]
                    FQ Algorithm (cont)
• For multiple flows
   – calculate Fi for each packet that arrives on each flow
   – treat all Fi’s as timestamps
   – next packet to transmit is one with lowest timestamp
• Not perfect: can’t preempt current packet
• Example
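[Figure: (a) Flow 1 has packets with F = 8 and F = 5 queued and Flow 2 has a packet with F = 10; the output link sends them in order of lowest F. (b) A Flow 2 packet with F = 10 is already transmitting when a Flow 1 packet with F = 2 arrives; the transmission is not preempted.]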
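
The timestamp rule above can be sketched in C. The slide that defines Fi is not part of this extract, so the sketch assumes the usual fair-queuing finish time, Fi = max(Fi-1, Ai) + Pi, where Ai is the packet's arrival time and Pi the time needed to transmit it. The names `fq_flow`, `fq_stamp`, and `fq_next` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-flow state: finish time of the flow's previous packet. */
struct fq_flow {
    double last_finish;
};

/* Assumed finish-time rule: F_i = max(F_{i-1}, A_i) + P_i, where A_i is
 * the arrival time and P_i the time needed to transmit the packet. */
double fq_stamp(struct fq_flow *flow, double arrival, double tx_time)
{
    assert(flow != NULL && tx_time > 0.0);
    double start = flow->last_finish > arrival ? flow->last_finish : arrival;
    flow->last_finish = start + tx_time;
    return flow->last_finish;
}

/* Return the index of the queued packet with the lowest timestamp. */
size_t fq_next(const double *stamps, size_t n)
{
    assert(stamps != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (stamps[i] < stamps[best])
            best = i;
    return best;
}
```

With two Flow 1 packets needing 5 and 3 time units and one Flow 2 packet needing 10, all arriving at time 0, the stamps come out as 5, 8, and 10, which matches the values in part (a) of the example.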
         TCP Congestion Control
• Idea
  – assumes best-effort network (FIFO or FQ routers)
  – each source determines network capacity for itself
  – uses implicit feedback
  – ACKs pace transmission (self-clocking)
• Challenge
  – determining the available capacity in the first place
  – adjusting to changes in the available capacity
   Additive Increase/Multiplicative
              Decrease
• Objective: adjust to changes in the available capacity
• New state variable per connection: CongestionWindow
   – limits how much data source has in transit

                MaxWin = MIN(CongestionWindow,
                             AdvertisedWindow)
                EffWin = MaxWin - (LastByteSent -
                                   LastByteAcked)
• Idea:
   – increase CongestionWindow when congestion goes down
   – decrease CongestionWindow when congestion goes up
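
The two window formulas above translate directly into C. This is a minimal sketch: the function name `effective_window` is hypothetical, sequence-number wraparound is ignored, and clamping the result at zero is an added assumption for the case where the window has shrunk below the data already in flight.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes the source may still send right now.
 * MaxWin = MIN(CongestionWindow, AdvertisedWindow)
 * EffWin = MaxWin - (LastByteSent - LastByteAcked)
 * Sequence wraparound is ignored in this sketch. */
uint32_t effective_window(uint32_t congestion_window,
                          uint32_t advertised_window,
                          uint32_t last_byte_sent,
                          uint32_t last_byte_acked)
{
    assert(last_byte_sent >= last_byte_acked);
    uint32_t max_win = congestion_window < advertised_window
                           ? congestion_window : advertised_window;
    uint32_t in_flight = last_byte_sent - last_byte_acked;
    /* Clamp at zero: the window may have shrunk below the data in flight. */
    return in_flight >= max_win ? 0 : max_win - in_flight;
}
```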
                   AIMD (cont)

• Question: how does the source determine whether
  or not the network is congested?

• Answer: a timeout occurs
   – timeout signals that a packet was lost
   – packets are seldom lost due to transmission error
   – lost packet implies congestion
                     AIMD (cont)
[Figure: source/destination timeline.]


• Algorithm
   – increment CongestionWindow by
     one packet per RTT (linear increase)
   – divide CongestionWindow by two
     whenever a timeout occurs
     (multiplicative decrease)

• In practice: increment a little for each ACK
      Increment = (MSS * MSS)/CongestionWindow
      CongestionWindow += Increment
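
The per-ACK increment and the halving on timeout can be sketched as two small C functions. The window is kept in bytes as a double; the `MSS` value of 1460 bytes and the floor of one segment after a timeout are assumptions made for illustration, and the function names are hypothetical.

```c
#include <assert.h>

#define MSS 1460.0   /* assumed segment size in bytes */

/* Additive increase: called once per new ACK.  Adds MSS*MSS/cwnd bytes,
 * which sums to roughly one MSS per RTT. */
double aimd_on_ack(double cwnd)
{
    assert(cwnd >= MSS);
    return cwnd + (MSS * MSS) / cwnd;
}

/* Multiplicative decrease: called on a timeout.  Halves the window,
 * but never below one segment (an assumed floor). */
double aimd_on_timeout(double cwnd)
{
    assert(cwnd >= MSS);
    double half = cwnd / 2.0;
    return half < MSS ? MSS : half;
}
```

Starting from a window of 10 segments, the 10 ACKs of one round trip grow the window by a little under one MSS in total, because each increment is computed against a window that has already grown slightly.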
                   AIMD (cont)
• Trace: sawtooth behavior

[Figure: trace plotted against time (seconds), 1.0 to 10.0; vertical axis ticks 10 to 70.]
                    Slow Start
[Figure: source/destination timeline.]

• Objective: determine the available
  capacity in the first place
• Idea:
   – begin with CongestionWindow = 1
     packet
   – double CongestionWindow each RTT
     (increment by 1 packet for each ACK)
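
A small C sketch shows why adding one packet per ACK doubles the window each RTT. The `limit` parameter is a hypothetical cap standing in for whatever ends slow start; it is not something this slide defines.

```c
#include <assert.h>

/* Window, in packets, after `rtts` round trips of slow start.
 * Starts at 1 packet; each of the cwnd ACKs that arrive in one RTT
 * adds 1 packet, so the window doubles per RTT.  Growth stops at
 * `limit`, a hypothetical cap on the window. */
unsigned slow_start_window(unsigned rtts, unsigned limit)
{
    assert(limit >= 1);
    unsigned cwnd = 1;
    for (unsigned r = 0; r < rtts && cwnd < limit; r++) {
        unsigned acks = cwnd;            /* one ACK per packet in flight */
        for (unsigned a = 0; a < acks && cwnd < limit; a++)
            cwnd += 1;                   /* increment by 1 packet per ACK */
    }
    return cwnd;
}
```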
                 Slow Start (cont)
• Exponential growth, but slower than all at once
• Used…
   – when first starting connection
   – when connection goes dead waiting for timeout
• Trace
[Figure: trace plotted against time (seconds), 1.0 to 9.0; vertical axis ticks 10 to 70.]

• Problem: lose up to half a CongestionWindow’s
  worth of data
 Fast Retransmit and Fast Recovery
• Problem: coarse-grain TCP timeouts lead to idle periods
• Fast retransmit: use duplicate ACKs to trigger retransmission

[Figure: sender/receiver timeline. The sender transmits packets 1 to 6; the receiver returns ACK 1, ACK 2, and then repeats ACK 2 as packets 4, 5, and 6 arrive. The sender retransmits packet 3 and the receiver answers with ACK 6.]
                              Results
[Figure: trace plotted against time (seconds), 1.0 to 7.0; vertical axis ticks 10 to 70.]


• Fast recovery
     – skip the slow start phase
     – go directly to half the last successful
       CongestionWindow (ssthresh)
             Congestion Avoidance
• TCP’s strategy
   – control congestion once it happens
   – repeatedly increase load in an effort to find the point at which
     congestion occurs, and then back off
• Alternative strategy
   – predict when congestion is about to happen
   – reduce rate before packets start being discarded
   – call this congestion avoidance, instead of congestion control
• Two possibilities
   – router-centric: DECbit and RED Gateways
   – host-centric: TCP Vegas
                                    DECbit
• Add binary congestion bit to each packet header
• Router
   – monitors average queue length over last busy+idle cycle
[Figure: queue length plotted against time. The averaging interval covers the previous busy+idle cycle plus the current cycle up to the current time.]

   – set congestion bit if average queue length > 1
   – attempts to balance throughput against delay
                   End Hosts
• Destination echoes bit back to source
• Source records how many packets resulted in set bit
• If less than 50% of last window’s worth had bit set
   – increase CongestionWindow by 1 packet
• If 50% or more of last window’s worth had bit set
   – decrease CongestionWindow to 0.875 times its previous value
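
The source's decision rule can be sketched in C as follows. The window is counted in packets; the function name `decbit_update` and the floor of one packet are assumptions for illustration.

```c
#include <assert.h>

/* New CongestionWindow (in packets) after one window's worth of ACKs.
 * `bits_set` of the `acked` packets came back with the congestion bit. */
double decbit_update(double cwnd, unsigned bits_set, unsigned acked)
{
    assert(acked > 0 && bits_set <= acked);
    if (2 * bits_set < acked)        /* fewer than 50% marked */
        return cwnd + 1.0;           /* additive increase */
    double next = cwnd * 0.875;      /* 50% or more marked */
    return next < 1.0 ? 1.0 : next;  /* assumed floor of one packet */
}
```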
   Random Early Detection (RED)

• Notification is implicit
   – just drop the packet (TCP will timeout)
   – could make explicit by marking the packet
• Early random drop
   – rather than wait for queue to become full, drop each
     arriving packet with some drop probability whenever
     the queue length exceeds some drop level
                   RED Details
• Compute average queue length
     AvgLen = (1 - Weight) * AvgLen +
                 Weight * SampleLen
     0 < Weight < 1 (usually 0.002)
     SampleLen is queue length each time a packet arrives
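
The running average can be sketched in C as a single update step plus a helper that applies it repeatedly. Both function names are hypothetical.

```c
#include <assert.h>

/* One EWMA step: AvgLen = (1 - Weight) * AvgLen + Weight * SampleLen. */
double red_avg_update(double avg_len, double sample_len, double weight)
{
    assert(weight > 0.0 && weight < 1.0);
    return (1.0 - weight) * avg_len + weight * sample_len;
}

/* Feed the same sample n times, e.g. a burst that holds the queue at
 * sample_len for n packet arrivals. */
double red_avg_after(double avg_len, double sample_len, double weight,
                     unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        avg_len = red_avg_update(avg_len, sample_len, weight);
    return avg_len;
}
```

With Weight = 0.002, a burst of 50 arrivals that each see a queue length of 100 moves the average from 0 to only about 9.5, which is why short bursts barely affect AvgLen.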

[Figure: router queue with MinThreshold and MaxThreshold marked, and AvgLen measured against them.]
             RED Details (cont)

• Two queue length thresholds

    if AvgLen <= MinThreshold then
       enqueue the packet
    if MinThreshold < AvgLen < MaxThreshold then
       calculate probability P
       drop arriving packet with probability P
    if MaxThreshold <= AvgLen then
       drop arriving packet
            RED Details (cont)
• Computing probability P
     TempP = MaxP * (AvgLen - MinThreshold)/
            (MaxThreshold - MinThreshold)
     P = TempP/(1 - count * TempP)

• Drop Probability Curve
[Figure: P(drop) plotted against AvgLen, with MinThresh and MaxThresh on the AvgLen axis and MaxP and 1.0 on the P(drop) axis.]
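
The threshold test and the probability formula can be combined into one C function. This is a sketch under stated assumptions: `count` is taken to be the number of packets enqueued since the last drop while AvgLen has been between the thresholds, and the guard for a non-positive denominator is added here for safety. The names `red_params` and `red_drop_probability` are hypothetical.

```c
#include <assert.h>

struct red_params {
    double min_threshold;
    double max_threshold;
    double max_p;
};

/* Drop probability for an arriving packet.  `count` is the number of
 * packets enqueued since the last drop while AvgLen has been between
 * the two thresholds. */
double red_drop_probability(const struct red_params *p, double avg_len,
                            unsigned count)
{
    assert(p != 0 && p->max_threshold > p->min_threshold);
    if (avg_len <= p->min_threshold)
        return 0.0;                       /* always enqueue */
    if (avg_len >= p->max_threshold)
        return 1.0;                       /* always drop */
    double temp_p = p->max_p * (avg_len - p->min_threshold) /
                    (p->max_threshold - p->min_threshold);
    double denom = 1.0 - count * temp_p;
    /* Assumed guard: once count * TempP reaches 1 the formula would
     * divide by zero or go negative, so force a drop. */
    if (denom <= 0.0)
        return 1.0;
    double prob = temp_p / denom;
    return prob > 1.0 ? 1.0 : prob;
}
```

Because P grows with `count`, drops tend to be spaced out rather than clustered, which is the purpose of the second formula.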
                      Tuning RED
• Probability of dropping a particular flow’s packet(s) is
  roughly proportional to the share of the bandwidth that flow
  is currently getting
• MaxP is typically set to 0.02, meaning that when the average
  queue size is halfway between the two thresholds, the
  gateway drops roughly one out of 50 packets.
• If traffic is bursty, then MinThreshold should be
  sufficiently large to allow link utilization to be maintained at
  an acceptably high level
• Difference between two thresholds should be larger than the
  typical increase in the calculated average queue length in one
  RTT; setting MaxThreshold to twice MinThreshold is
  reasonable for traffic on today’s Internet
• Penalty Box for Offenders
                        TCP Vegas
• Idea: source watches for some sign that router’s queue is
  building up and congestion will happen soon; e.g.,
   – RTT grows
   – sending rate flattens

[Figure: three synchronized traces plotted against time (seconds), 0.5 to 8.5.]
                    Algorithm
• Let BaseRTT be the minimum of all measured RTTs
  (commonly the RTT of the first packet)
• If not overflowing the connection, then
      ExpectRate = CongestionWindow/BaseRTT
• Source calculates sending rate (ActualRate) once per RTT
• Source compares ActualRate with ExpectRate

  Diff = ExpectRate - ActualRate
  if Diff < a
      increase CongestionWindow linearly
  else if Diff > b
      decrease CongestionWindow linearly
  else
      leave CongestionWindow unchanged
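
The comparison above can be sketched in C. Several details are assumptions made for illustration: the window is counted in packets, rates are in packets per second, Diff is multiplied by BaseRTT so that it can be compared with thresholds expressed in packets (1 and 3 here), and each linear step is one packet. The function name `vegas_update` is hypothetical.

```c
#include <assert.h>

#define VEGAS_A 1.0   /* lower threshold, in packets */
#define VEGAS_B 3.0   /* upper threshold, in packets */

/* One Vegas adjustment, made once per RTT.  cwnd is in packets, rates
 * are in packets per second, base_rtt is in seconds.  Assumption: Diff
 * is scaled by BaseRTT so it can be compared with thresholds given in
 * packets. */
double vegas_update(double cwnd, double base_rtt, double actual_rate)
{
    assert(cwnd >= 1.0 && base_rtt > 0.0 && actual_rate >= 0.0);
    double expect_rate = cwnd / base_rtt;
    double diff = (expect_rate - actual_rate) * base_rtt;
    if (diff < VEGAS_A)
        return cwnd + 1.0;                     /* linear increase */
    if (diff > VEGAS_B)
        return cwnd > 1.0 ? cwnd - 1.0 : 1.0;  /* linear decrease */
    return cwnd;                               /* leave unchanged */
}
```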
                  Algorithm (cont)
• Parameters
    a = 1 packet
    b = 3 packets

[Figure: two traces plotted against time (seconds), 0.5 to 8.0.]



• Even faster retransmit
   – keep fine-grained timestamps for each packet
   – check for timeout on first duplicate ACK
          Quality of Service

Outline
   Realtime Applications
   Integrated Services
   Differentiated Services
                Realtime Applications
• Require “deliver on time” assurances
   – must come from inside the network


[Figure: microphone → sampler, A→D converter → network → buffer, D→A converter → speaker.]


• Example application (audio)
   –   sample voice once every 125 µs
   –   each sample has a playback time
   –   packets experience variable delay in network
   –   add constant factor to playback time: playback point
                      Playback Buffer

[Figure: sequence number plotted against time, showing packet generation, packet arrival after the network delay, and playback after time spent in the buffer.]
Example Distribution of Delays
[Figure: distribution of packet delay (milliseconds, 50 to 200), with the 90%, 97%, 98%, and 99% points marked.]
                   Taxonomy
• Applications
   – Elastic
   – Real time
      • Intolerant
      • Tolerant
         – Nonadaptive
         – Adaptive
            • Rate adaptive
            • Delay adaptive
                Integrated Services
• Service Classes
   – guaranteed
   – controlled-load
• Mechanisms
   –   signalling protocol
   –   admission control
   –   policing
   –   packet scheduling
                        Flowspec
• Rspec: describes service requested from network
   – controlled-load: none
   – guaranteed: delay target
• Tspec: describes flow’s traffic characteristics
   –   average bandwidth + burstiness: token bucket filter
   –   token rate r
   –   bucket depth B
   –   must have a token to send a byte
   –   must have n tokens to send n bytes
   –   start with no tokens
   –   accumulate tokens at rate of r per second
   –   can accumulate no more than B tokens
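
The token bucket rules listed above can be sketched in C. One token is treated as one byte, time is passed in as seconds, and the names `token_bucket`, `tb_init`, and `tb_send` are hypothetical.

```c
#include <assert.h>

struct token_bucket {
    double rate;     /* r: tokens (bytes) added per second */
    double depth;    /* B: maximum tokens held */
    double tokens;   /* current tokens; starts at 0 */
    double last;     /* time of the last update, in seconds */
};

void tb_init(struct token_bucket *tb, double rate, double depth, double now)
{
    assert(tb != 0 && rate > 0.0 && depth > 0.0);
    tb->rate = rate;
    tb->depth = depth;
    tb->tokens = 0.0;          /* start with no tokens */
    tb->last = now;
}

/* Try to send n bytes at time `now`; returns 1 if the flow conforms. */
int tb_send(struct token_bucket *tb, double n, double now)
{
    assert(tb != 0 && now >= tb->last && n >= 0.0);
    tb->tokens += (now - tb->last) * tb->rate;   /* accumulate at rate r */
    if (tb->tokens > tb->depth)
        tb->tokens = tb->depth;                  /* no more than B tokens */
    tb->last = now;
    if (tb->tokens < n)
        return 0;                                /* need n tokens for n bytes */
    tb->tokens -= n;
    return 1;
}
```

The depth B bounds the largest burst: however long the flow stays idle, it can never send more than B bytes back to back.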
        Flowspec
[Figure: Flow A and Flow B plotted against time (seconds), 1 to 4; vertical axis ticks 1 to 3.]
        Per-Router Mechanisms
• Admission Control
  – decide if a new flow can be supported
  – answer depends on service class
  – not the same as policing
• Packet Processing
  – classification: associate each packet with the
    appropriate reservation
  – scheduling: manage queues so each packet receives the
    requested service
              Reservation Protocol
•   Called signaling in ATM
•   Proposed Internet standard: RSVP
•   Consistent with robustness of today’s connectionless model
•   Uses soft state (refresh periodically)
•   Designed to support multicast
•   Receiver-oriented
•   Two messages: PATH and RESV
•   Source transmits PATH messages every 30 seconds
•   Destination responds with RESV message
•   Merge requirements in case of multicast
•   Can specify number of speakers
            RSVP Example
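[Figure: Sender 1 and Sender 2 send PATH messages through routers (R) toward Receiver A and Receiver B. Each receiver sends a RESV message back upstream, and the RESV messages are merged at the router where the paths meet.]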
      RSVP versus ATM (Q.2931)
• RSVP
  –   receiver generates reservation
  –   soft state (refresh/timeout)
  –   separate from route establishment
  –   QoS can change dynamically
  –   receiver heterogeneity
• ATM
  –   sender generates connection request
  –   hard state (explicit delete)
  –   concurrent with route establishment
  –   QoS is static for life of connection
  –   uniform QoS to all receivers
            Differentiated Services
• Problem with IntServ: scalability
• Idea: segregate packets into a small number of classes
   – e.g., premium vs best-effort
• Packets marked according to class at edge of network
• Core routers implement some per-hop-behavior (PHB)
• Example: Expedited Forwarding (EF)
   – rate-limit EF packets at the edges
   – PHB implemented with class-based priority queues or WFQ
                    DiffServ (cont)
• Assured Forwarding (AF)
   – customers sign service agreements with ISPs
   – edge routers mark packets as being “in” or “out” of profile
   – core routers run RIO: RED with in/out

[Figure: P(drop) plotted against AvgLen for RIO, with Min out, Min in, Max out, and Max in on the AvgLen axis and MaxP and 1.0 on the P(drop) axis.]
     Presentation Formatting

Outline
   Presentation Formatting
                  Presentation Formatting
• Marshalling (encoding) application data into messages
• Unmarshalling (decoding) messages into application data

[Figure: application data → presentation encoding → Message, Message, ..., Message → presentation decoding → application data.]

• Data types we consider
   –   integers
   –   floats
   –   strings
   –   arrays
   –   structs
• Types of data we do not consider
   – images
   – video
   – multimedia documents
                                Difficulties
• Representation of base types
   – floating point: IEEE 754 versus non-standard
   – integer: big-endian versus little-endian (e.g., 34,677,374)
                      (2)        (17)        (34)       (126)
    Big-endian     00000010   00010001   00100010   01111110

                     (126)       (34)        (17)        (2)
    Little-endian  01111110   00100010   00010001   00000010

                   Low address                    High address



• Compiler layout of structures
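
The byte-order example can be reproduced in C by writing the integer one byte at a time with shifts, which gives the same result on any host. The helper names `put_be32`, `put_le32`, and `get_be32` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Write v into buf most-significant byte first (big-endian). */
void put_be32(uint8_t *buf, uint32_t v)
{
    assert(buf != 0);
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

/* Write v into buf least-significant byte first (little-endian). */
void put_le32(uint8_t *buf, uint32_t v)
{
    assert(buf != 0);
    buf[0] = (uint8_t)v;
    buf[1] = (uint8_t)(v >> 8);
    buf[2] = (uint8_t)(v >> 16);
    buf[3] = (uint8_t)(v >> 24);
}

/* Read a big-endian 32-bit integer regardless of host byte order. */
uint32_t get_be32(const uint8_t *buf)
{
    assert(buf != 0);
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

For 34,677,374 the big-endian buffer holds the bytes 2, 17, 34, 126 from low to high address, and the little-endian buffer holds them in the reverse order.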
                         Taxonomy
• Data types
   – base types (e.g., ints, floats); must convert
   – flat types (e.g., structures, arrays); must pack
   – complex types (e.g., pointers); must linearize

[Figure: an application data structure passes through the argument marshaller.]

• Conversion Strategy
   – canonical intermediate form
   – receiver-makes-right (an N x N solution)
                  Taxonomy (cont)
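• Tagged versus untagged data

[Figure: a tagged integer: type = INT, len = 4, value = 417892.]

• Stubs
  – compiled
  – interpreted

[Figure: a stub compiler reads the interface descriptor for procedure P and its specification, and generates code for the client stub and the server stub. A call to P passes arguments to the client stub, which hands marshalled arguments to RPC; a message carries them to the server's RPC layer, whose stub passes the arguments to P.]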
       eXternal Data Representation
                 (XDR)
•   Defined by Sun for use with SunRPC
•   C type system (without function pointers)
•   Canonical intermediate form
•   Untagged (except array length)
•   Compiled stubs
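#include <rpc/rpc.h>   /* XDR, bool_t, xdr_int, xdr_string, xdr_array */

#define MAXNAME 256
#define MAXLIST 100

struct item {
   int     count;             /* number of valid entries in list */
   char    name[MAXNAME];
   int     list[MAXLIST];
};

/* Encode or decode one item, depending on the direction set in xdrs. */
bool_t
xdr_item(XDR *xdrs, struct item *ptr)
{
   /* xdr_string and xdr_array take the address of a pointer to the
      data, so point at the fixed-size arrays first. */
   char *name = ptr->name;
   char *list = (char *)ptr->list;

   return(xdr_int(xdrs, &ptr->count) &&
       xdr_string(xdrs, &name, MAXNAME - 1) &&   /* leave room for NUL */
       xdr_array(xdrs, &list, (u_int *)&ptr->count,
                 MAXLIST, sizeof(int), (xdrproc_t)xdr_int));
}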
[Figure: XDR encoding of an item. Count = 3; Name = length 7 followed by J O H N S O N; List = length 3 followed by 497, 8321, 265.]
            Abstract Syntax Notation One
                      (ASN-1)
•    An ISO standard
•    Essentially the C type system
•    Canonical intermediate form
•    Tagged
•    Compiled or interpreted stubs
•    BER: Basic Encoding Rules
            (tag, length, value)

[Figure: nested BER encoding: type, length, value, where the value may itself hold type/length/value triples. Example: INT, 4, 4-byte integer.]
       Network Data Representation
                (NDR)
• Defined by DCE
• Essentially the C type system
• Receiver-makes-right (architecture tag)
• Individual data items untagged
• Compiled stubs from IDL
• 4-byte architecture tag
   – IntegerRep
      • 0 = big-endian
      • 1 = little-endian
   – CharRep
      • 0 = ASCII
      • 1 = EBCDIC
   – FloatRep
      • 0 = IEEE 754
      • 1 = VAX
      • 2 = Cray
      • 3 = IBM

[Figure: architecture tag layout with bit positions 0, 4, 8, 16, 24, 31 marked: IntegerRep, CharRep, FloatRep, Extension 1, Extension 2.]
             Multimedia

Outline
   Compression
   RTP
   Scheduling
          Compression Overview
• Encoding and Compression
  – Huffman codes
• Lossless
  – data received = data sent
  – used for executables, text files, numeric data
• Lossy
  – data received != data sent
  – used for images, video, audio
               Lossless Algorithms
• Run Length Encoding (RLE)
  – example: AAABBCDDDD encoded as 3A2B1C4D
  – good for scanned text (8-to-1 compression ratio)
  – can increase size for data with variation (e.g., some images)
• Differential Pulse Code Modulation (DPCM)
  – example: AAABBCDDDD encoded as A0001123333
  – change reference symbol if delta becomes too large
  – works better than RLE for many digital images (1.5-to-1)
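
The RLE example can be sketched in C as a count-then-symbol encoder. The function name `rle_encode` and its buffer-size convention are hypothetical, and the sketch assumes the input contains no digits, since a digit symbol would be ambiguous in this textual format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Run-length encode `in` as count+symbol pairs, e.g. "3A2B".
 * Returns the encoded length, or 0 if `out` (capacity `cap`, which
 * includes the terminating NUL) is too small. */
size_t rle_encode(const char *in, char *out, size_t cap)
{
    assert(in != NULL && out != NULL && cap > 0);
    size_t used = 0;
    size_t n = strlen(in);
    out[0] = '\0';
    for (size_t i = 0; i < n; ) {
        size_t run = 1;
        while (i + run < n && in[i + run] == in[i])
            run++;                       /* extend the current run */
        int w = snprintf(out + used, cap - used, "%zu%c", run, in[i]);
        if (w < 0 || (size_t)w >= cap - used)
            return 0;                    /* output buffer too small */
        used += (size_t)w;
        i += run;
    }
    return used;
}
```

Encoding ABCD this way yields 1A1B1C1D, twice the size of the input, which illustrates how RLE can increase the size of data with a lot of variation.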
           Dictionary-Based Methods
• Build dictionary of common terms
    – variable length strings
•   Transmit index into dictionary for each term
•   Lempel-Ziv (LZ) is the best-known example
•   Commonly achieve 2-to-1 ratio on text
•   Variation of LZ used to compress GIF images
    – first reduce 24-bit color to 8-bit color
    – treat common sequence of pixels as terms in dictionary
    – not uncommon to achieve 10-to-1 compression (x3)
             Image Compression
• JPEG: Joint Photographic Expert Group (ISO/ITU)
• Lossy still-image compression
• Three phase process
[Figure: JPEG compression pipeline: source image → DCT → quantization → encoding → compressed image.]



   – process in 8x8 block chunks (macro-block)
   – color (unlike grayscale): each pixel is three values (YUV)
   – DCT: transforms signal from spatial domain into an equivalent
     signal in the frequency domain (loss-less)
   – apply a quantization to the results (lossy)
   – RLE-like encoding (loss-less)
       Quantization and Encoding
• Quantization Table
                        3   5   7   9  11  13  15  17
                        5   7   9  11  13  15  17  19
                        7   9  11  13  15  17  19  21
                        9  11  13  15  17  19  21  23
                       11  13  15  17  19  21  23  25
                       13  15  17  19  21  23  25  27
                       15  17  19  21  23  25  27  29
                       17  19  21  23  25  27  29  31

• Encoding Pattern
                      MPEG
•   Motion Picture Expert Group
•   Lossy compression of video
•   First approximation: JPEG on each frame
•   Also remove inter-frame redundancy
                          MPEG (cont)
• Frame types
   – I frames: intrapicture
   – P frames: predicted picture
   – B frames: bidirectional predicted picture
[Figure: the input stream (Frames 1 to 7) passes through MPEG compression to give the compressed stream I frame, B frame, B frame, P frame, B frame, B frame, I frame. Forward prediction and bidirectional prediction link the frames.]



• Example sequence transmitted as I P B B I B B
                       MPEG (cont)
• B and P frames
   –   coordinate for the macroblock in the frame
   –   motion vector relative to previous reference frame (B, P)
   –   motion vector relative to subsequent reference frame (B)
   –   delta for each pixel in the macro block
• Effectiveness
   –   typically 90-to-1
   –   as high as 150-to-1
   –   30-to-1 for I frames
   –   P and B frames get another 3 to 5x
                         MP3
• CD Quality
  – 44.1 kHz sampling rate
  – 2 x 44.1 x 1000 x 16 = 1.41 Mbps
  – with synchronization and error-correction overhead: 49/16 x 1.41 Mbps = 4.32 Mbps
• Strategy
  –   split into some number of frequency bands
  –   divide each subband into a sequence of blocks
  –   encode each block using DCT + Quantization + Huffman
  –   trick: how many bits assigned to each subband
                           RTP
• Application-Level Framing
• Data Packets
   – sequence number
   – timestamp (app defines “tick”)
• Control Packets (send periodically)
   – loss rate (fraction of packets lost since last report)
   – measured jitter
              Transmitting MPEG
• Adapt the encoding
  –   resolution
  –   frame rate
  –   quantization table
  –   GOP mix
• Packetization
• Dealing with loss
• GOP-induced latency
                  Layered Video
• Layered encoding
   – e.g., wavelet encoded
• Receiver-driven Layered Multicast (RLM)
   – transmit each layer to a different group address
   – receivers subscribe to the groups they can “afford”
   – Probe to learn if you can afford next higher group/layer
• Smart Packet Dropper (multicast or unicast)
   – select layers to send/drop based on observed congestion
   – observe directly or use RTP feedback
            Real-Time Scheduling
•   Priority
•   Earliest Deadline First (EDF)
•   Rate Monotonic (RM)
•   Proportional Share
    – with feedback
    – with adjustments for deadlines

								