Fair Queuing for Aggregated Multiple Links

Josep M. Blanquer and Banu Özden
Proceedings of the ACM SIGCOMM,
August 2001

ABSTRACT
- Fair queuing algorithms
  - Proportionally share a single server among competing flows.
  - Do not address the problem of sharing multiple servers.
- Multi-server applications
  - Link aggregation
  - Multiprocessors
  - Multi-path storage I/O


- We introduce a new service discipline for multi-server systems, MSF2Q, that provides guarantees for competing flows.
- We prove that this new service discipline is a close approximation of the idealized Generalized Processor Sharing (GPS) discipline.
- We calculate its maximum packet delay and service discrepancy with respect to GPS.

1. INTRODUCTION
- A large increase in networked services brings
  - a much larger variety of traffic
  - different network requirements to be met simultaneously over the same links:
    - high bandwidth guarantees (backups)
    - low jitter guarantees (video streaming)
    - low delay guarantees (network data acquisition)
  - Network resources must therefore be appropriately scheduled.



- Fair queuing service disciplines allocate bandwidth fairly among competing traffic. They provide:
  - Protection from “misbehaving” traffic
  - Effective congestion control
  - Better service for rate-adaptive applications
  - Strict QoS guarantees, when combined with admission control




- Growing demand for bandwidth
  - Incremental scaling techniques
  - Grouping multiple links into a single logical interface [3]
- Implementations
  - [1] 3Com’s Dynamic Access
  - [2] Adaptec Duralink Software Suite
  - [12] Hewlett Packard’s Auto-Port Aggregation
  - [14] Intel Load Balancing
  - [6] J. Blanquer et al., Resource Management for QoS in Eclipse/BSD, Proceedings of the First FreeBSD Conference, Berkeley, California, Oct. 1999.

[Figure: Adaptec Duralink]
[Figure: HP Auto Port Aggregation]
[Figure: Intel Load Balancing]

2. BACKGROUND
- GPS (Generalized Processor Sharing)
  - Guaranteed fairness: for any flow x that is continuously backlogged during [τ, t], and any other flow y,

    $$ \frac{W_x(\tau, t)}{W_y(\tau, t)} \ge \frac{\psi_x}{\psi_y} $$

    Wx(τ, t) = the amount of flow x traffic served in the interval [τ, t]
    ψx = weight of flow x, which sets the proportion of the server bandwidth that flow x receives when it is backlogged
  - Guaranteed rate: a backlogged flow i is served at a rate of at least

    $$ r_i = \frac{\psi_i}{\sum_j \psi_j} \, r $$

    ri = rate of flow i
    r = server rate
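As a minimal sketch of the guaranteed-rate rule, the Python snippet below splits a server rate among the currently backlogged flows in proportion to their weights. The function name `gps_rates`, the `backlogged` parameter and the sample weights are illustrative assumptions, not part of the slides.

```python
from fractions import Fraction

def gps_rates(weights, server_rate, backlogged=None):
    """Fluid GPS share of each backlogged flow.

    weights: dict flow -> weight (psi); server_rate: r.
    backlogged: flows that currently have traffic (default: all flows).
    """
    active = list(weights) if backlogged is None else list(backlogged)
    total = sum(weights[f] for f in active)
    return {f: weights[f] / total * server_rate for f in active}

# Hypothetical weights: one heavy flow and ten light ones.
weights = {"F1": Fraction(1, 2)}
weights.update({f"F{i}": Fraction(1, 20) for i in range(2, 12)})

all_busy = gps_rates(weights, Fraction(1))
two_busy = gps_rates(weights, Fraction(1), backlogged=["F1", "F2"])
```

With every flow backlogged, F1 receives exactly its guaranteed half of the server. When only F1 and F2 are backlogged, the unused bandwidth is redistributed in the same 10:1 proportion, so the guaranteed rate is a lower bound rather than a cap.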

- Generalized Processor Sharing (GPS)
  - An idealized system that serves as a reference model for the fair queuing disciplines.
  - It assumes that the server can transmit more than one flow simultaneously and that the traffic is infinitely divisible.
- A number of packetized approximations to GPS have been devised:
  - WFQ (Weighted Fair Queueing), ’89, Demers et al.
  - VC (Virtual Clock), ’90, Zhang
  - PGPS (Packet-by-packet Generalized Processor Sharing), ’93, Parekh et al.
  - SCFQ (Self-Clocked Fair Queueing), ’94, Golestani
  - WF2Q (Worst-case Fair Weighted Fair Queueing), ’96, Bennett et al.
  - SFQ (Start-time Fair Queueing), ’96, Goyal et al.

* A New Priority Calculation Method for Sorted-priority Fair Queuing, Liu et al., 2004
B. Current packet priority calculation methods
- The three best-known packet priority calculation methods are [9]:
  - Smallest Finish time First (SFF)
    - Packet selection: smallest PiX(t) + li/ri (li = packet length, ri = rate of session i)
    - Used by WFQ and SCFQ
  - Smallest Start time First (SSF)
    - Packet selection: smallest PiX(t)
    - Used by SFQ
  - Smallest Eligible Finish time First (SEFF)
    - Pre-selection: sessions with session potentials smaller than the system potential
    - Packet selection: as in SFF, smallest PiX(t) + li/ri
    - Used by WF2Q
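The three selection rules can be compared side by side with a small Python sketch. The names `pick_session` and `sessions`, the tuple layout, and the sample numbers are hypothetical. The finish value is taken to be the start potential plus packet length divided by the session rate, and a session whose potential equals the system potential is treated as eligible; both are assumptions made for this illustration.

```python
def pick_session(sessions, policy, system_potential=None):
    """Choose the session to serve next.

    sessions: dict name -> (start_potential, head_packet_length, rate).
    policy: "SFF", "SSF" or "SEFF".
    """
    def finish(name):
        start, length, rate = sessions[name]
        return start + length / rate

    if policy == "SSF":   # smallest start potential
        return min(sessions, key=lambda n: sessions[n][0])
    if policy == "SFF":   # smallest finish value
        return min(sessions, key=finish)
    if policy == "SEFF":  # smallest finish value among eligible sessions
        # Assumption: "<=" so a session exactly at the system potential counts.
        eligible = [n for n in sessions if sessions[n][0] <= system_potential]
        return min(eligible, key=finish) if eligible else None
    raise ValueError(f"unknown policy: {policy}")

# Hypothetical sessions: finish values are a = 4.0, b = 2.0, c = 3.5.
sessions = {"a": (0.0, 4.0, 1.0), "b": (1.0, 1.0, 1.0), "c": (3.0, 0.5, 1.0)}
```

On this input SFF picks `b`, SSF picks `a`, and SEFF picks `a` while the system potential is 0.5 (only `a` is eligible) but switches to `b` once the system potential reaches 1.0.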

3. PROPORTIONAL SHARING OF MULTISERVER SYSTEMS
- Numerous applications use multi-server systems that can benefit from service guarantees:
  - Network: multiple network adapters attached to a web or file server
  - Storage: multiple I/O channels attached to a RAID server




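- System Model
  - (MSFQ, N, r): N servers, each with a rate of r, scheduled by MSFQ.
    [Figure: WFQ scheduler in front of the (MSFQ, N, r) system]
  - (GPS, 1, Nr): the reference system, a single server with a rate of Nr.
    [Figure: WFQ scheduler in front of the (GPS, 1, Nr) system]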

3.1 A Packetized Fair Queuing Discipline for Multi-Servers
- MSFQ’s scheduling discipline is the same as WFQ’s:
  - When a server is idle and there is a packet waiting for service, MSFQ schedules the “next” packet.
  - The “next” packet is defined as the first packet that would complete service in the (GPS, 1, Nr) system if no more packets were to arrive.
- To compare how well an (MSFQ, N, r) system approximates a (GPS, 1, Nr) system, we calculate:
    (i) the worst-case delay
    (ii) the traffic discrepancy
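The dispatch rule can be sketched in a few lines of Python under two simplifying assumptions: all packets are already queued at time 0, and their GPS completion order has been computed elsewhere and is passed in as a list. The function name `msfq_start_times` and its parameters are hypothetical.

```python
import heapq

def msfq_start_times(lengths_in_gps_order, n_servers, rate):
    """Return (start_time, server) for each packet, in dispatch order.

    lengths_in_gps_order: packet lengths, sorted by the order in which the
    packets would complete in the (GPS, 1, Nr) reference system (assumed given).
    """
    # Min-heap of (time the server becomes idle, server index).
    free_at = [(0.0, s) for s in range(n_servers)]
    heapq.heapify(free_at)
    schedule = []
    for length in lengths_in_gps_order:
        t, server = heapq.heappop(free_at)   # first server to become idle
        schedule.append((t, server))
        heapq.heappush(free_at, (t + length / rate, server))
    return schedule

demo = msfq_start_times([1, 1, 1, 1, 1], n_servers=2, rate=1.0)
```

Here five unit-length packets on two unit-rate servers start in pairs at times 0 and 1, with the fifth starting at time 2. Handling arrivals during service would require recomputing the GPS order as packets arrive, which this sketch leaves out.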


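3.2 Preliminary Properties
- Delay and service properties of MSFQ do not trivially follow from the single-server case, WFQ.
  - GPS and MSFQ busy periods do not coincide.
- Example: a single packet of length L.
  - (GPS, 1, Nr) serves it at rate Nr and finishes at Δ1 = L / (Nr).
  - (MSFQ, N, r) serves it on one server at rate r. At time Δ1 the bits left are
        L − r (L / (Nr)) = L − L/N = (N − 1) L / N,
    and the packet finishes at Δ2 = L / r.
  - Hence, at any time τ, W(0, τ) ≥ W’(0, τ).
[Figure: service of one packet in the (GPS, 1, Nr) system versus the (MSFQ, N, r) system]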

- When GPS is busy, MSFQ is busy. However, the converse is not true.
- Thus, for any τ,
      W(0, τ) ≥ W’(0, τ),            (2)
  where W(0, τ) and W’(0, τ) denote the total number of bits serviced by GPS and MSFQ, respectively, by time τ.
- We will use the term busy period to refer to a busy period in the reference (GPS, 1, Nr) system.




- Work from previous busy periods can accumulate under MSFQ.
  - This may happen either at the beginning or in the middle of a busy period.
[Figure: arrival times of packets 1 to 7, showing a delayed finish of service]
[Figure: arrival times of packets 1 to 7, showing a delayed start of service]

- Theorem 1: For any τ,
      W(0, τ) − W’(0, τ) ≤ (N − 1) Lmax
  where Lmax denotes the maximum packet length.
- Proof:
  - The slope of W (GPS) alternates between Nr (during a busy period) and 0 (idle, between two consecutive busy periods).
  - The slope of W’ (MSFQ) is at most Nr at any given time.

[Figure: W(0, t) versus t, assuming 3 servers, with packet arrivals a1 to a9 and two instants t0 marked on the time axis. The GPS curve has slope 0 or Nr; the MSFQ curve has slope r, 2r, or 3r.]

[Case 1] At most N − 1 MSFQ servers are busy at t:
  - Since MSFQ is work-conserving, if a server is idle, we know that there is no packet waiting for transmission.
  - In the worst case, all the k busy servers have just started transmitting a packet of maximum length (Lmax), so
        W(0, t) − W’(0, t) ≤ k Lmax ≤ (N − 1) Lmax           (a)
    where k ≤ N − 1 is the number of busy servers.

[Case 2] All MSFQ servers are busy at t:
  - Let [t0, t] be the largest interval in which all MSFQ servers are busy.
  - Since in [t0, t] the slope of W’ is Nr, and the slope of W is at most Nr, we have W(t0, t) ≤ W’(t0, t), and therefore
        W(0, t) − W’(0, t) ≤ W(0, t0) − W’(0, t0)             (b)
[Figure: over [t0, t], the GPS server has slope Nr and, with all MSFQ servers busy, W’ also has slope Nr.]



  - If t0 = 0, then W(0, t) = W’(0, t).
    [Figure: the case t0 = 0, where W(0, t) = W’(0, t)]
  - Otherwise, if t0 > 0, fewer than N servers were busy just before t0, so we know from (a) that
        W(0, t0) − W’(0, t0) ≤ (N − 1) Lmax     (c)
  - From (b) and (c), we have
        W(0, τ) − W’(0, τ) ≤ (N − 1) Lmax   ∎
  - This theorem implies the need for a buffer space of (N − 1) Lmax.
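A small numerical check of the Theorem 1 bound, offered as a sketch rather than a proof. It assumes a deliberately simple workload: k equal-length packets all arrive at time 0, so the multi-server system starts them in waves of N. The helper names (`gps_work`, `msfq_work`, `max_gap`) and the parameter values are hypothetical.

```python
from fractions import Fraction

def gps_work(tau, k, length, n, r):
    """Bits served by the single (GPS, 1, Nr) server by time tau."""
    return min(n * r * tau, k * length)

def msfq_work(tau, k, length, n, r):
    """Bits served by n servers of rate r; packet p starts in wave p // n."""
    total = Fraction(0)
    for p in range(k):
        start = (p // n) * length / r
        total += min(max(tau - start, 0) * r, length)
    return total

def max_gap(k, length, n, r, steps=200):
    """Largest W - W' seen on a uniform time grid."""
    horizon = (k // n + 1) * length / r  # every packet has finished by then
    times = (horizon * i / steps for i in range(steps + 1))
    return max(gps_work(t, k, length, n, r) - msfq_work(t, k, length, n, r)
               for t in times)

N, R, L_MAX = 4, Fraction(1), Fraction(1500)
gaps = [max_gap(k, L_MAX, N, R) for k in range(1, 13)]
```

For a single packet (k = 1) the largest gap is 1125 = (N − 1) L / N, reached when the reference server finishes the packet. Every sampled gap for k = 1 to 12 stays between 0 and (N − 1) Lmax = 4500.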

- The discrepancy of packet departure times (i.e., the times at which packets begin transmission or service) between the multi-server and single-server systems
  - Let dp be the time at which packet p departs from the (GPS, 1, Nr) system.
  - MSFQ packets may not depart in increasing order of dp.




- Lemma 1: Packet k will be scheduled no later than

  $$ b_k \le a_k + \frac{\sum_{i \in P} L_i}{N r} $$

  where
  - a_k and b_k are, respectively, the arrival time and scheduling time of packet k over N servers, each with a rate of r,
  - P is the set of packets scheduled before packet k since time a_k, including the packets in service at a_k,
  - L_i is the length of packet i.

- Proof:
  - Given a load that must be scheduled before packet k, a work-conserving service discipline schedules packet k latest if the load is equally divided among the N servers, such that all of them finish the work at the same time. ∎
[Figure: packet arrivals from all flows, with total load $\sum_{i \in P} L_i$, served between a_k and b_k over an interval of length $\frac{\sum_{i \in P} L_i}{N r}$]
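The Lemma 1 bound can be compared with a greedy work-conserving assignment in Python. This is a sketch under a stated simplification: all servers are treated as idle at a_k, which is pessimistic for packets already in service because the lemma counts their full length. The names `lemma1_bound` and `greedy_start_of_k` and the sample loads are hypothetical.

```python
import heapq
from fractions import Fraction

def lemma1_bound(a_k, lengths, n, r):
    """a_k + (sum of L_i over P) / (N r)."""
    return a_k + Fraction(sum(lengths)) / (n * r)

def greedy_start_of_k(a_k, lengths, n, r):
    """Time at which packet k starts when the packets in P are started,
    in order, on whichever server becomes idle first."""
    free_at = [Fraction(a_k)] * n
    heapq.heapify(free_at)
    for length in lengths:
        t = heapq.heappop(free_at)                     # earliest idle server
        heapq.heappush(free_at, t + Fraction(length) / r)
    return free_at[0]                                  # next idle instant

uneven = [3, 1, 2, 2]   # load split unevenly over 2 servers
even = [2, 2, 2, 2]     # load split evenly: the worst case of the proof
```

With the uneven load, packet k starts at time 3, before the bound of 4. With the evenly divided load both servers finish together and packet k starts exactly at the bound.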

4. PACKET DELAY
- Theorem 2: For all packets p,

  $$ d_p' - d_p \le \frac{(N - 1) L_p}{N r} + \frac{L_{max}}{r} $$

  where d_p' and d_p are the times at which packet p departs from the (MSFQ, N, r) and (GPS, 1, Nr) systems, respectively.
- Proof: skipped.
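To give a feel for the size of the Theorem 2 bound, the sketch below evaluates (N − 1) L_p / (N r) + L_max / r for one hypothetical configuration: four links of 100 Mbit/s and 1500-byte (12 000-bit) packets. The function name and the numbers are illustrative assumptions.

```python
from fractions import Fraction

def msfq_extra_delay_bound(n, r, l_p, l_max):
    """Upper bound on d_p' - d_p (in seconds if r is in bit/s and
    lengths are in bits)."""
    return Fraction((n - 1) * l_p, n * r) + Fraction(l_max, r)

# Hypothetical setup: 4 links of 100 Mbit/s, 12 000-bit packets.
bound = msfq_extra_delay_bound(n=4, r=100_000_000, l_p=12_000, l_max=12_000)

# With a single server the first term vanishes, leaving L_max / r.
single = msfq_extra_delay_bound(n=1, r=100_000_000, l_p=12_000, l_max=12_000)
```

Under these assumptions the bound is 210 microseconds for the four-link system and 120 microseconds for a single link of the same per-link rate.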

5. SERVICE PER-FLOW
- Theorem 3: For any τ,
      Wi(0, τ) − Wi’(0, τ) ≤ N Lmax
- Proof: skipped.




6. FAIRNESS
- Example 3:
  - 4 servers
  - 11 flows (fixed packet length L):
    - F1: weight = 0.5, 10 packets at t = 0
    - F2 to F11: weight = 0.05, each with 1 packet at t = 0




- GPS scheduled by WFQ (finish time):
  - F1A = 0 + L / 0.5
  - F1B = F1A + L / 0.5 = 2L / 0.5
  - ……
  - F2 = 0 + L / 0.05
  - F3 = 0 + L / 0.05
  - ……
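The finish times of Example 3 can be tabulated with a short Python sketch. Tags are expressed in units of the packet length L (so `length` defaults to 1); the helper name `finish_tags` is hypothetical.

```python
from fractions import Fraction

def finish_tags(weight, n_packets, length=Fraction(1)):
    """Finish tags of n_packets back-to-back packets, all queued at t = 0."""
    return [k * length / weight for k in range(1, n_packets + 1)]

# F1: weight 0.5, 10 packets. F2..F11: weight 0.05, 1 packet each.
tags = [(t, "F1") for t in finish_tags(Fraction(1, 2), 10)]
for i in range(2, 12):
    tags += [(t, f"F{i}") for t in finish_tags(Fraction(1, 20), 1)]

# Serve in increasing finish-tag order (ties keep insertion order).
order = [flow for _, flow in sorted(tags, key=lambda x: x[0])]
```

F1's tags are 2L, 4L, ..., 20L, while every other flow's single packet has tag 20L. Nine F1 packets therefore have strictly smaller tags than any other packet, so a scheduler that sorts purely by finish tag begins with a run of F1 packets.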




- MSFQ scheduled by WFQ (finish time):
  [Figure: MSFQ schedule for Example 3 under WFQ]




- GPS scheduled by WF2Q (eligible start time of the head-of-line (HOL) packet + finish time):
  [Figure: schedule for Example 3 under WF2Q, annotated "Not smooth?"]




- The direct application of the WF2Q technique to multi-server systems does not fix the undesired burstiness problem and, moreover, makes the discipline non-work-conserving.
  - A packet is not eligible until the previous packet is scheduled, hence non-work-conserving.



6.1 MSF2Q
- (MSF2Q, N, r)
- A packet is outstanding if it is being transmitted.
- Let ôi(t) denote the number of outstanding flow i packets in the MSF2Q system at time t.
- Ŵi(τ, t) = the work completed for flow i under MSF2Q over the interval [τ, t]




- At time t, when a server is idle and there is a packet waiting for service, MSF2Q schedules, among the (eligible) flows that satisfy

  $$ \hat{W}_i(0, t) < W_i(0, t) \quad \text{or} \quad \left[ \hat{W}_i(0, t) = W_i(0, t) \ \text{and} \ \hat{o}_i < \left\lceil \frac{r_i(t)}{r} \right\rceil \right] $$

  the packet that would complete service in the GPS system earliest.
- Example 3:
  - F1: r1 = 0.5
  - F2 to F11: rx = 0.05
  - r = 1/4 = 0.25
  - Hence at most ⌈0.5/0.25⌉ = 2 outstanding packets for F1 (ô1), and at most ⌈0.05/0.25⌉ = 1 for each other flow (ôx).
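The Example 3 figures on this slide (0.5/0.25 = 2 and 0.05/0.25 = 1) are consistent with rounding r_i / r up to a whole number of packets. Assuming that reading, the per-flow limit on outstanding packets can be sketched as follows; the function name `max_outstanding` is hypothetical.

```python
import math
from fractions import Fraction

def max_outstanding(flow_rate, server_rate):
    """Assumed cap on simultaneously transmitted packets of one flow:
    the flow's rate divided by the per-server rate, rounded up."""
    return math.ceil(Fraction(flow_rate) / Fraction(server_rate))

r = Fraction(1, 4)  # each of the 4 servers carries 1/4 of the total rate
caps = {
    "F1": max_outstanding(Fraction(1, 2), r),   # heavy flow
    "Fx": max_outstanding(Fraction(1, 20), r),  # any of the light flows
}
```

F1 may occupy two of the four servers at once, while each light flow may occupy one, which matches the intuition that F1 is entitled to half of the aggregate bandwidth.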

- The output of MSF2Q in Example 3:
  [Figure: MSF2Q schedule for Example 3, annotated "Smooth scheduling"]




6.2 Properties of MSF2Q
- Theorem 4: Let Li,max denote the maximum packet length of flow i. For any time τ and flow i, the following property holds:

  $$ \hat{W}_i(0, \tau) - W_i(0, \tau) \le N L_{i,max} \qquad (8) $$

- Proof: skipped.




7. APPLICATIONS
- Link Aggregation
  - Logical grouping of several Ethernet network interfaces to provide cost-effective load balancing, better scalability, and fault tolerance.
  - IEEE 802.3ad
  - Currently ranges from two to eight Fast/Gigabit Ethernet ports in either servers or switching elements.




- Access to storage I/O
  - Connecting a RAID system to a host (e.g., a Web server) over multiple SCSI or Fibre Channel links improves I/O performance.
  - Load balancing, failover




8. RELATED WORK
- Skipped




9. CONTRIBUTIONS AND FUTURE WORK
- Link aggregation, or the aggregation of multiple interfaces into a single logical link, is becoming the predominant approach for bandwidth scaling.
- Numerous fair queuing results previously obtained for single-server systems do not directly apply to multi-server systems.



- We first analyzed the cumulative service, packet delay, and per-flow cumulative service bounds for Weighted Fair Queuing (WFQ) applied to a multi-server system.
- We then presented a new fair queuing algorithm, MSF2Q, that leads to smooth and fair schedules at finer time scales.




- Our future plans include:
  - Investigation of implementation issues
  - Quantitative comparison of the approach presented in this paper with the alternative approach of partitioning flows among servers
  - Enhancing the algorithms for multiprocessors and clusters of servers
  - Hierarchical GPS
  - Servers with different rates
  - Misordering of packets
