Fair Queuing for Aggregated Multiple Links
Josep M. Blanquer and Banu Özden
Proceedings of the ACM SIGCOMM, August 2001

ABSTRACT
- Fair queuing algorithms
  - Proportionally share a single server among competing flows.
  - Do not address the problem of sharing multiple servers.
- Multi-server applications
  - Link aggregation
  - Multiprocessors
  - Multi-path storage I/O

- We introduce a new service discipline for multi-server systems, MSF2Q, that provides guarantees for competing flows.
- We prove that this new service discipline is a close approximation of the idealized Generalized Processor Sharing (GPS) discipline.
- We calculate its maximum packet delay and service discrepancy with respect to GPS.

1. INTRODUCTION
- A large increase in networked services brings
  - a much larger variety of traffic, and
  - different network requirements that must be met simultaneously over the same links:
    - high-bandwidth guarantees for backups,
    - low-jitter guarantees for video streaming,
    - low-delay guarantees for network data acquisition.
- Network resources must therefore be scheduled appropriately.

- Fair queuing service disciplines allocate bandwidth fairly among competing traffic, providing:
  - protection from "misbehaving" traffic,
  - effective congestion control,
  - better service for rate-adaptive applications, and
  - strict QoS guarantees when combined with admission control.

- Growing demand for bandwidth
  - Incremental scaling techniques
  - Grouping multiple links into a single logical interface [3]
- Implementations
  - [1] 3Com's Dynamic Access
  - [2] Adaptec Duralink Software Suite
  - [12] Hewlett-Packard's Auto-Port Aggregation
  - [14] Intel Load Balancing
  - [6] J. Blanquer et al., "Resource Management for QoS in Eclipse/BSD," Proceedings of the First FreeBSD Conference, Berkeley, California, Oct. 1999.

[Figure: Adaptec Duralink]
[Figure: HP Auto-Port Aggregation]
[Figure: Intel Load Balancing]

2. BACKGROUND
- GPS (Generalized Processor Sharing)
  - Guaranteed fairness: for any flow x that is continuously backlogged during [τ, t] and any other flow y,
      $\frac{W_x(\tau, t)}{W_y(\tau, t)} \ge \frac{\psi_x}{\psi_y}$
    where
      Wx(τ, t) = the amount of traffic of flow x served in the interval [τ, t],
      ψx = the weight of flow x, i.e., the proportion of the server bandwidth that flow x receives when it is backlogged.
  - Guaranteed rate:
      $r_i = \frac{\psi_i}{\sum_j \psi_j} \, r$
    where ri = rate guaranteed to flow i and r = server rate.

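As a minimal sketch of the two guarantees, the Python function below computes the instantaneous GPS rates for a given set of backlogged flows. The name `gps_rates` and the dict-based inputs are hypothetical; the rule it encodes is the standard one: backlogged flows split the server rate in proportion to their weights, so the all-backlogged case gives the guaranteed rates.

```python
def gps_rates(weights, backlogged, capacity):
    """Instantaneous GPS service rates (hypothetical helper).

    weights:    {flow: psi} for every flow
    backlogged: flows that currently have queued traffic
    capacity:   server rate r
    """
    share = sum(weights[f] for f in backlogged)
    # A backlogged flow gets capacity * psi_f / (sum of backlogged weights);
    # idle flows get nothing, so their share is redistributed to the others.
    return {f: capacity * weights[f] / share if f in backlogged else 0.0 for f in weights}


weights = {"a": 1, "b": 1, "c": 2}
all_busy = gps_rates(weights, {"a", "b", "c"}, capacity=12)  # guaranteed (minimum) rates
b_idle = gps_rates(weights, {"a", "c"}, capacity=12)         # b's share is redistributed
print(all_busy)  # {'a': 3.0, 'b': 3.0, 'c': 6.0}
print(b_idle)    # {'a': 4.0, 'b': 0.0, 'c': 8.0}
```

Removing flow b from the backlogged set raises the rates of a and c, yet their ratio stays equal to ψa/ψc, which is the fairness property in action.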
- Generalized Processor Sharing (GPS)
  - An idealized system that serves as a reference model for fair queuing disciplines.
  - The server transmits more than one flow simultaneously, and traffic is assumed to be infinitely divisible.
- A number of packetized approximations to GPS have been devised:
  - WFQ (Weighted Fair Queueing), '89, Demers et al.
  - VC (Virtual Clock), '90, Zhang
  - PGPS (Packet-by-packet Generalized Processor Sharing), '93, Parekh et al.
  - SCFQ (Self-Clocked Fair Queueing), '94, Golestani
  - WF2Q (Worst-case Fair Weighted Fair Queueing), '96, Bennett et al.
  - SFQ (Start-time Fair Queueing), '96, Goyal et al.

* A New Priority Calculation Method for Sorted-priority Fair Queuing, Liu et al., 2004
B. Current packet priority calculation methods
- The three best-known packet priority calculation methods are [9]:
  - Smallest Finish time First (SFF)
    - Packet selection: $P_i(t) + l_i / r_i$ ($P_i(t)$ = session potential, $l_i$ = packet length, $r_i$ = session rate)
    - Used by WFQ and SCFQ
  - Smallest Start time First (SSF)
    - Packet selection: $P_i(t)$
    - Used by SFQ
  - Smallest Eligible Finish time First (SEFF)
    - Pre-selection: sessions whose session potential is no greater than the system potential
    - Packet selection: as in SFF, $P_i(t) + l_i / r_i$
    - Used by WF2Q

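The three rules differ only in the key used to pick the next session. The sketch below is illustrative: the `Session` class and the function names are hypothetical, and it assumes each session exposes its potential P_i(t), its head-of-line packet length l_i, and a rate r_i, with the finish tag taken as P_i(t) + l_i / r_i and eligibility read as P_i(t) ≤ system potential.

```python
from dataclasses import dataclass


@dataclass
class Session:
    name: str
    potential: float  # P_i(t): virtual start time of the head-of-line packet
    length: float     # l_i: length of the head-of-line packet
    rate: float       # r_i: rate reserved for the session (assumed)

    @property
    def finish(self) -> float:
        return self.potential + self.length / self.rate


def sff(sessions):
    """Smallest Finish time First (WFQ, SCFQ)."""
    return min(sessions, key=lambda s: s.finish)


def ssf(sessions):
    """Smallest Start time First (SFQ)."""
    return min(sessions, key=lambda s: s.potential)


def seff(sessions, system_potential):
    """Smallest Eligible Finish time First (WF2Q): SFF over eligible sessions only."""
    eligible = [s for s in sessions if s.potential <= system_potential]
    return sff(eligible) if eligible else None


a = Session("A", potential=0.0, length=10.0, rate=1.0)  # finish tag 10
b = Session("B", potential=4.0, length=1.0, rate=1.0)   # finish tag 5
print(sff([a, b]).name, ssf([a, b]).name)                # B A
print(seff([a, b], 2.0).name, seff([a, b], 5.0).name)    # A B
```

The same two sessions are ranked differently by each rule; SEFF only falls back to the smaller finish tag once B has become eligible.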
3. PROPORTIONAL SHARING OF MULTI-SERVER SYSTEMS
- Numerous applications use multi-server systems that can benefit from service guarantees:
  - Network: multiple network adapters on a web or file server
  - Storage: multiple I/O channels to a RAID server

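- System model
  [Figure: WFQ with (MSFQ, N, r), i.e., N servers of rate r each]
  [Figure: WFQ with the reference system (GPS, 1, Nr), i.e., a single server of rate Nr]
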
3.1 A Packetized Fair Queuing Discipline for Multi-Servers
- MSFQ's scheduling rule is the same as WFQ's, with (GPS, 1, Nr) as the reference system:
  - When a server is idle and there is a packet waiting for service, MSFQ schedules the "next" packet.
  - The "next" packet is defined as the first packet that would complete service in the (GPS, 1, Nr) system if no more packets were to arrive.
- To evaluate how well an (MSFQ, N, r) system approximates a (GPS, 1, Nr) system, we calculate:
  (i) the worst-case delay, and
  (ii) the traffic discrepancy.

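A minimal event-driven sketch of the MSFQ rule follows. It assumes every packet already carries its finish time in the (GPS, 1, Nr) reference system (computing those requires a fluid GPS simulation, which is not shown), and that each packet is served non-preemptively at rate r on whichever server is free. The function name `msfq` and the tuple layout are hypothetical.

```python
import heapq


def msfq(packets, n_servers, rate):
    """Sketch of MSFQ: whenever a server is idle and packets are waiting,
    start the waiting packet with the smallest (GPS, 1, Nr) finish time.

    packets: (arrival, gps_finish, flow, length) tuples, gps_finish precomputed.
    Returns (start, server, flow, gps_finish) tuples in scheduling order.
    """
    pending = sorted(packets)                     # ordered by arrival time
    servers = [(0, s) for s in range(n_servers)]  # (time server becomes free, server id)
    heapq.heapify(servers)
    ready = []                                    # waiting packets keyed by GPS finish time
    schedule, clock, i = [], 0, 0
    while i < len(pending) or ready:
        free_at, server = heapq.heappop(servers)
        now = max(free_at, clock)
        if not ready:                             # nothing waiting: idle until next arrival
            now = max(now, pending[i][0])
        while i < len(pending) and pending[i][0] <= now:
            _, finish, flow, length = pending[i]
            heapq.heappush(ready, (finish, i, flow, length))
            i += 1
        finish, _, flow, length = heapq.heappop(ready)
        heapq.heappush(servers, (now + length / rate, server))
        schedule.append((now, server, flow, finish))
        clock = now
    return schedule


# Flow 1: three packets of length 2 at t = 0; flow 2: one packet of length 2 at t = 1.
# GPS finish times below were worked out by hand for an equal-weight (GPS, 1, 2) system.
packets = [(0, 1, 1, 2), (0, 3, 1, 2), (0, 4, 1, 2), (1, 3, 2, 2)]
sched = msfq(packets, n_servers=2, rate=1.0)
for start, server, flow, gps_finish in sched:
    print(f"t={start}: server {server} starts flow {flow} (GPS finish {gps_finish})")
```

In the usage lines, two equal-weight flows share a (GPS, 1, 2) reference: flow 1 is alone until t = 1, so its packets finish in GPS at 1, 3 and 4, while flow 2's packet finishes at 3. When both servers free up at t = 2, the later-arriving flow 2 packet goes first because its GPS finish time is smaller.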
3.2 Preliminary Properties
- Delay and service properties of MSFQ do not follow trivially from the single-server case (WFQ).
  - GPS and MSFQ busy periods do not coincide. Consider a single packet of length L:
    - (GPS, 1, Nr) serves it at rate Nr and finishes at Δ1 = L / Nr.
    - (MSFQ, N, r) serves it on one server of rate r. At Δ1 the bits left are L − r(L / Nr) = L − L / N = (N − 1)L / N, and the packet finishes at Δ2 = L / r.
  - Hence, at any time τ, W(0, τ) ≥ W'(0, τ).

- When GPS is busy, MSFQ is busy; however, the converse is not true.
- Thus, for any τ,
      W(0, τ) ≥ W'(0, τ),    (2)
  where W(0, τ) and W'(0, τ) denote the total number of bits serviced by GPS and MSFQ, respectively, by time τ.
- We use the term busy period to refer to a busy period in the reference (GPS, 1, Nr) system.

- Work from previous busy periods can accumulate under MSFQ.
  - This may happen either at the beginning or in the middle of a busy period.
  [Figure: arrival times of packets 1 to 7; accumulated work delays the finish of service]
  [Figure: arrival times of packets 1 to 7; accumulated work delays the start of service]

- Theorem 1: For any τ,
      W(0, τ) − W'(0, τ) ≤ (N − 1) Lmax,
  where Lmax denotes the maximum packet length.
- Proof:
  - The slope of W (GPS) alternates between Nr (during a busy period) and 0 (idle, between two consecutive busy periods).
  - The slope of W' (MSFQ) is at most Nr at any given time.

  [Figure: W(0, t) versus t for N = 3 servers. GPS has slope 0 or Nr; MSFQ has slope r, 2r, or 3r. Arrivals a1 to a9 are marked on the t axis, along with the interval starts t0.]

[Case 1] At most N − 1 MSFQ servers are busy at t:
- Since MSFQ is work-conserving, an idle server implies that no packet is waiting for transmission.
- In the worst case, all k busy servers have just started transmitting a packet of maximum length Lmax, so
      W(0, t) − W'(0, t) ≤ k Lmax ≤ (N − 1) Lmax    (a)
  where k ≤ N − 1 is the number of busy servers.

[Case 2] All MSFQ servers are busy at t:
- Let [t0, t] be the largest interval in which all MSFQ servers are busy.
- In [t0, t] the slope of W' is Nr, while the slope of W is never more than Nr, so
      W(0, t) − W'(0, t) ≤ W(0, t0) − W'(0, t0)    (b)
  [Figure: over [t0, t], the GPS server (slope at most Nr) adds W(t0, t), while MSFQ, with all servers busy (slope Nr), adds W'(t0, t) ≥ W(t0, t)]

- If t0 = 0, MSFQ has served at the full rate Nr since time 0, so by (2), W(0, t) = W'(0, t).
- Otherwise, t0 > 0 and at most N − 1 MSFQ servers are busy just before t0, so from (a),
      W(0, t0) − W'(0, t0) ≤ (N − 1) Lmax    (c)
- From (b) and (c), W(0, τ) − W'(0, τ) ≤ (N − 1) Lmax. ∎
- This theorem implies the need for a buffer space of (N − 1) Lmax.

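Because the proof uses only work conservation and non-preemptive service at rate r, the bound can be sanity-checked numerically with any work-conserving multi-server order. The sketch below is a hypothetical test harness, not the paper's method: it substitutes FIFO assignment to N servers for the MSFQ ordering, compares it against a work-conserving fluid server of rate Nr on random traffic, and samples W(0, t) − W'(0, t) on a time grid.

```python
import heapq
import random


def fluid_work(packets, capacity, t):
    """W(0, t): bits served by time t by a work-conserving fluid server of rate `capacity`."""
    served, backlog, clock = 0.0, 0.0, 0.0
    for arrival, length in sorted(packets):
        if arrival > t:
            break
        drained = min(backlog, capacity * (arrival - clock))  # serve queued bits until this arrival
        served, backlog, clock = served + drained, backlog - drained + length, arrival
    return served + min(backlog, capacity * (t - clock))


def fifo_servers(packets, n_servers, rate):
    """Start times under non-preemptive, work-conserving FIFO over N servers of rate r."""
    free_at = [0.0] * n_servers  # a list of equal values is already a valid heap
    schedule = []
    for arrival, length in sorted(packets):
        start = max(arrival, heapq.heappop(free_at))
        heapq.heappush(free_at, start + length / rate)
        schedule.append((start, length))
    return schedule


def multi_work(schedule, rate, t):
    """W'(0, t): bits served by time t across all N servers."""
    return sum(min(max(rate * (t - start), 0.0), length) for start, length in schedule)


random.seed(7)
N, r, L_MAX = 3, 1.0, 10.0
packets = [(random.uniform(0, 60), random.uniform(1, L_MAX)) for _ in range(80)]
schedule = fifo_servers(packets, N, r)
horizon = max(start + length / r for start, length in schedule)
times = [horizon * k / 4000 for k in range(4001)] + [s for s, _ in schedule]
gaps = [fluid_work(packets, N * r, t) - multi_work(schedule, r, t) for t in times]
print(f"max W - W' = {max(gaps):.2f}, bound (N - 1) * Lmax = {(N - 1) * L_MAX}")
```

Sampling a grid cannot prove the theorem, but it checks both inequality (2) and the (N − 1) Lmax bound on concrete traffic.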
- The discrepancy of packet departure times (i.e., the times at which packets begin transmission or service) between the multi-server and single-server systems:
  - Let dp be the time at which packet p departs from the (GPS, 1, Nr) system.
  - MSFQ packets may not depart in increasing order of dp.

- Lemma 1: Packet k is scheduled no later than
      $b_k \le a_k + \frac{\sum_{i \in P} L_i}{Nr}$
  where
  - ak and bk are, respectively, the arrival time and the scheduling time of packet k over N servers, each with a rate of r,
  - P is the set of packets scheduled before packet k since time ak, including the packets in service at ak, and
  - Li is the length of packet i.

- Proof:
  - Given a load that must be scheduled before packet k, a work-conserving service discipline schedules packet k latest if the load is divided equally among the N servers, so that all of them finish the work at the same time.
  [Figure: packet arrivals from all flows; the load $\sum_{i \in P} L_i$ is spread over the N servers starting at $a_k$ and finished by $a_k + \sum_{i \in P} L_i / (Nr)$, the latest possible $b_k$]

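Lemma 1 can be checked the same way: under a non-preemptive, work-conserving discipline, packet k waits only while every server is busy with packets from P. The sketch below uses FIFO server assignment as a stand-in for MSFQ (an assumption; the proof above relies only on work conservation) and computes, for every packet, the slack between the lemma's bound and the actual scheduling time. Function names are hypothetical.

```python
import heapq
import random


def fifo_servers(packets, n_servers, rate):
    """Non-preemptive, work-conserving FIFO over N servers: (a_k, b_k, L_k) per packet."""
    free_at = [0.0] * n_servers
    out = []
    for arrival, length in sorted(packets):
        start = max(arrival, heapq.heappop(free_at))
        heapq.heappush(free_at, start + length / rate)
        out.append((arrival, start, length))
    return out


def lemma1_bound(schedule, k, n_servers, rate):
    """a_k + sum(L_i for i in P) / (N r). In FIFO order, P is every earlier-scheduled
    packet that is still in service at a_k or starts at or after a_k."""
    a_k = schedule[k][0]
    load = sum(length for _, start, length in schedule[:k] if start + length / rate > a_k)
    return a_k + load / (n_servers * rate)


random.seed(3)
N, r = 4, 1.0
packets = [(random.uniform(0, 40), random.uniform(1, 12)) for _ in range(100)]
schedule = fifo_servers(packets, N, r)
slack = [lemma1_bound(schedule, k, N, r) - b for k, (_, b, _) in enumerate(schedule)]
print(f"smallest slack in Lemma 1: {min(slack):.3f}")
```

A non-negative smallest slack means no packet was scheduled later than the lemma allows.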
4. PACKET DELAY
- Theorem 2: For all packets p,
      $d_p' - d_p \le \frac{(N-1) L_p}{Nr} + \frac{L_{max}}{r}$
  where d'p and dp are the times at which packet p departs from the (MSFQ, N, r) and (GPS, 1, Nr) systems, respectively, and Lp is the length of packet p.
- Proof:
  - Skipped

5. SERVICE PER-FLOW
- Theorem 3: For any τ,
      Wi(0, τ) − Wi'(0, τ) ≤ N Lmax
  where Wi and Wi' denote the service received by flow i under GPS and MSFQ, respectively.
- Proof:
  - Skipped

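To give a feel for the magnitudes, the helper below simply evaluates the bounds of Theorems 1 to 3. The function name and the link parameters in the usage line (four aggregated 1 Gb/s links, 1500-byte maximum packets) are hypothetical choices for illustration.

```python
def msfq_bounds(n_servers, rate_bps, l_max_bits, l_p_bits=None):
    """Evaluate Theorems 1-3 for an (MSFQ, N, r) system (hypothetical helper)."""
    l_p = l_max_bits if l_p_bits is None else l_p_bits
    return {
        # Theorem 1: aggregate lag behind (GPS, 1, Nr), i.e., the buffer space needed
        "aggregate_lag_bits": (n_servers - 1) * l_max_bits,
        # Theorem 2: d'_p - d_p <= (N - 1) L_p / (N r) + L_max / r
        "departure_lag_s": (n_servers - 1) * l_p / (n_servers * rate_bps) + l_max_bits / rate_bps,
        # Theorem 3: per-flow lag behind GPS
        "per_flow_lag_bits": n_servers * l_max_bits,
    }


# Hypothetical setting: four aggregated 1 Gb/s links, 1500-byte maximum packets.
bounds = msfq_bounds(n_servers=4, rate_bps=1e9, l_max_bits=1500 * 8)
print(bounds)
```

For these numbers the aggregate lag is 36,000 bits (4,500 bytes), a maximum-length packet departs at most 9 µs + 12 µs = 21 µs later than in the reference GPS system, and a single flow lags by at most 48,000 bits.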
6. FAIRNESS
- Example 3:
  - 4 servers
  - 11 flows (fixed packet length L):
    - F1: weight = 0.5, 10 packets at t = 0
    - F2 ~ F11: weight = 0.05, each with 1 packet at t = 0

- GPS, scheduled by WFQ (finish times):
  - F1A = 0 + L / 0.5
  - F1B = F1A + L / 0.5 = 2L / 0.5
  - ...
  - F2 = 0 + L / 0.05
  - F3 = 0 + L / 0.05
  - ...

- MSFQ, scheduled by WFQ (finish times):
  [Figure: MSFQ schedule for Example 3]

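Because every packet in Example 3 is queued at t = 0 and all packets take the same time L/r on a server, MSFQ reduces to filling the N servers in rounds with the packets that have the smallest GPS finish times. The sketch below assumes L = 1 and Nr = 1 (so r = 1/4) and uses exact fractions; the finish times k·L/0.5 and L/0.05 are exact here only because every flow stays backlogged in GPS until t = 20. Variable names and the flow-id tie-breaking are hypothetical.

```python
from fractions import Fraction

L = 1                                  # fixed packet length (normalisation assumed here)
N, r = 4, Fraction(1, 4)               # 4 servers of rate 1/4, so Nr = 1
weights = {1: Fraction(1, 2), **{f: Fraction(1, 20) for f in range(2, 12)}}
backlog = {1: 10, **{f: 1 for f in range(2, 12)}}   # packets queued per flow at t = 0

# GPS finish time of the k-th packet of flow f: k * L / weight (exact here, since every
# flow stays backlogged until t = 20 and so keeps its share of the Nr = 1 server).
tags = sorted((k * L / weights[f], f) for f in weights for k in range(1, backlog[f] + 1))

# All servers free up together every L / r time units, and each round takes the N
# packets with the smallest GPS finish times (ties broken by flow id, an assumption).
schedule = [(i // N * L / r, f, tag) for i, (tag, f) in enumerate(tags)]
for start, f, tag in schedule:
    print(f"t={int(start):>2}  F{f}  (GPS finish {int(tag)})")
```

F1 occupies all four servers for the first two rounds and F2 to F11 wait until t = 8 or later, which is the burstiness that motivates MSF2Q.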
- GPS, scheduled by WF2Q (eligible start time of the head-of-line (HOL) packet + finish time):
  [Figure: WF2Q schedule for Example 3, annotated "Not smooth?"]

- Applying the WF2Q technique directly to multi-server systems does not fix the undesired burstiness problem; moreover, it makes the discipline non-work-conserving.
  [Figure: a packet is not eligible until the previous packet is scheduled, so the schedule becomes non-work-conserving]

6.1 MSF2Q
- (MSF2Q, N, r)
- A packet is outstanding if it is being transmitted.
- Let ôi(t) denote the number of outstanding flow i packets in the MSF2Q system at time t.
- Let Ŵi(τ, t) denote the work completed for flow i under MSF2Q over the interval [τ, t].

- At time t, when a server is idle and there is a packet waiting for service, MSF2Q considers only the flows (eligible flows) that satisfy
      $\hat{W}_i(0, t) < W_i(0, t)$
  or
      $\left[\hat{W}_i(0, t) \ge W_i(0, t) \text{ and } \hat{o}_i(t) < \left\lceil r_i / r \right\rceil\right]$
- Among the eligible flows, it schedules the packet that would complete service in the GPS system earliest.
- Example 3:
  - F1: r1 = 0.5
  - F2 ~ F11: rx = 0.05
  - r = 1/4 = 0.25
  - ⌈r1 / r⌉ = ⌈0.5 / 0.25⌉ = 2
  - ⌈rx / r⌉ = ⌈0.05 / 0.25⌉ = 1

- The output of MSF2Q in Example 3:
  [Figure: MSF2Q schedule for Example 3, annotated "Smooth scheduling"]

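The sketch below replays Example 3 under MSF2Q, reading the eligibility rule as: flow i is eligible if Ŵi(0, t) < Wi(0, t), or if Ŵi(0, t) ≥ Wi(0, t) and ôi(t) < ⌈ri / r⌉. It makes several simplifying assumptions: unit-length packets all queued at t = 0, Nr = 1 so that ri equals the weight, a closed-form GPS work curve Wi(0, t) = ψi·min(t, 20) that is exact only for this example, and eligibility re-checked only when a packet completes. Function names are hypothetical.

```python
import heapq
from fractions import Fraction
from math import ceil

L = 1                                   # fixed packet length (assumed normalisation)
N, r = 4, Fraction(1, 4)                # 4 servers of rate 1/4, so Nr = 1 and r_i = psi_i
weights = {1: Fraction(1, 2), **{f: Fraction(1, 20) for f in range(2, 12)}}
backlog = {1: 10, **{f: 1 for f in range(2, 12)}}


def gps_work(f, t):
    """W_i(0, t) in GPS; exact for Example 3 only (all flows backlogged until t = 20)."""
    return weights[f] * min(t, 20 * L)


def msf2q(n_servers, rate):
    queues = {f: [k * L / weights[f] for k in range(1, backlog[f] + 1)] for f in weights}
    limit = {f: ceil(weights[f] / rate) for f in weights}  # ceil(r_i / r)
    done = {f: 0 for f in weights}                         # hat W_i(0, t)
    outstanding = {f: 0 for f in weights}                  # hat o_i(t)
    in_service, free, now, log = [], list(range(n_servers)), 0, []
    while any(queues.values()) or in_service:
        while in_service and in_service[0][0] <= now:      # retire packets finished by now
            _, server, f = heapq.heappop(in_service)
            done[f] += L
            outstanding[f] -= 1
            free.append(server)
        free.sort()
        while free:
            # eligible: behind GPS, or not behind but below the outstanding-packet limit
            eligible = [f for f, q in queues.items()
                        if q and (done[f] < gps_work(f, now) or outstanding[f] < limit[f])]
            if not eligible:
                break
            f = min(eligible, key=lambda g: (queues[g][0], g))  # earliest GPS finish, ties by id
            tag = queues[f].pop(0)
            server = free.pop(0)
            outstanding[f] += 1
            heapq.heappush(in_service, (now + L / rate, server, f))
            log.append((now, server, f, tag))
        if in_service:
            now = in_service[0][0]  # simplification: eligibility re-checked at completions only
    return log


log = msf2q(N, r)
for start, server, f, tag in log:
    print(f"t={int(start):>2}  server {server}  F{f}  (GPS finish {int(tag)})")
```

F1 never holds more than ⌈0.5 / 0.25⌉ = 2 servers at once, so two of the small flows start in every round, in contrast to the MSFQ schedule for the same example.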
6.2 Properties of MSF2Q
- Theorem 4: Let Li,max denote the maximum packet length of flow i. For any time τ and flow i, the following property holds:
      $\hat{W}_i(0, \tau) - W_i(0, \tau) \le N L_{i,max}$    (8)
- Proof:
  - Skipped

7. APPLICATIONS
- Link aggregation
  - Logical grouping of several Ethernet network interfaces to allow for cost-effective load balancing, better scalability, and fault tolerance.
  - IEEE 802.3ad
  - Currently ranges from two to eight Fast/Gigabit Ethernet ports in either servers or switching elements.

- Access to storage I/O
  - Connecting the RAID system to a host (e.g., a web server) with multiple SCSI or Fibre Channel links to improve I/O performance.
  - Load balancing, failover

8. RELATED WORK
- Skipped

9. CONTRIBUTIONS AND FUTURE WORK
- Link aggregation, or the aggregation of multiple interfaces into a single logical link, is becoming the predominant approach for bandwidth scaling.
- Numerous fair queuing results previously obtained for single-server systems do not directly apply to multi-server systems.

- We first analyzed the cumulative service, packet delay, and per-flow cumulative service bounds for Weighted Fair Queuing (WFQ) applied to a multi-server system.
- We then presented a new fair queuing algorithm, MSF2Q, that leads to smooth and fair schedules at finer time scales.

- Our future plans include:
  - Investigation of implementation issues
  - Quantitative comparison of the approach presented in this paper with the alternative approach of partitioning flows among servers
  - Enhancing the algorithms for multiprocessors and clusters of servers
  - Hierarchical GPS
  - Servers with different rates
  - Misordering of packets