Bridges_ Switches_ Routers by jlhd32


									Chapter 1

Bridges, Switches, Routers

1.1 Introduction

  •   Packet switching vs circuit (and virtual-circuit) switching

  •   Network: a mesh interconnection of links and switches

         – LANs (multiaccess, broadcast or shared-medium Ethernet: 10BT to 1000BT, Cat 3 UTP)
         – WANs: switches connected by point-to-point links

  •   Packet processors: bridges, routers, ATM switches

[Figure 1.1 placeholder. Control path: congestion control, reservation, routing. Data path (per packet): switching, policing, scheduling.]

        Figure 1.1: Packet processor functions may involve the data path or control path.

1.2 Packet processor functions

Routing: creating and distributing the information that defines paths between source and destination,
     and determining the best path

Switching: making per-packet forwarding decisions, and sending each packet towards its destination

Other functions: congestion control, reservations, policing, scheduling

Control functions are performed infrequently; datapath functions are performed per packet.

[Figure 1.2 placeholder: bridges B1 to B4 interconnect LAN segments L1 to L5. R = root port of a bridge; D = designated port for a LAN.]

Spanning tree construction, as annotated in the figure:

  1. Determine the root bridge, and set its ports in forwarding mode.
  2. Each bridge determines its root port, and sets it in forwarding mode.
  3. Bridges determine the designated port for each LAN segment.
  4. All other ports are in blocked state.

Figure 1.2: Bridged extended LAN and corresponding graph. A bridge forwards frames along the
spanning tree, according to the FDB.

1.3 Transparent bridging IEEE 802.1D

An Ethernet LAN broadcasts each packet to every device on the LAN, so the throughput per host
decreases with the number of hosts connected to the LAN. See Problem 1.
Transparent bridging mitigates this by interconnecting LAN segments (collision domains) and
forwarding unicast packets according to a filtering database (FDB). Broadcast, multicast, and unknown
unicast packets are flooded to all LANs, so all segments form a single broadcast domain.
A bridge has two or more ports. Packets from incoming ports are forwarded to outgoing ports according
to the FDB, along a spanning tree to prevent loops. See Figure 1.2.

   •   spanning tree algorithm (STA): elect one root, then use the shortest path to the root;

   •   learning process: produces the FDB by relating each MAC source address to its incoming port and
       removing entries that are not refreshed.

Bridges exchange configuration messages to establish the topology, and topology-change messages to
indicate that the STA should be rerun.
With a fixed number of bridge ports, throughput per LAN segment decreases with the number of
segments in an extended LAN. See Problem 2.
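
The learning process and the forward-or-flood rule can be sketched as follows. This is a minimal illustration, not the 802.1D state machine: the class name, the method names, and the 300-second ageing value are assumptions made here, and all ports are assumed to be already in forwarding mode on the spanning tree.

```python
import time


class LearningBridge:
    """Toy transparent bridge: learns source MACs, then forwards or floods."""

    def __init__(self, ports, max_age=300.0):
        self.ports = set(ports)
        self.max_age = max_age      # assumed ageing time in seconds
        self.fdb = {}               # MAC address -> (port, time last seen)

    def receive(self, src, dst, in_port, now=None):
        """Process one frame and return the set of ports it is sent out on."""
        now = time.monotonic() if now is None else now
        # Learning: relate the source address to the incoming port.
        self.fdb[src] = (in_port, now)
        # Ageing: remove entries that have not been refreshed recently.
        self.fdb = {mac: (port, seen) for mac, (port, seen) in self.fdb.items()
                    if now - seen <= self.max_age}
        entry = self.fdb.get(dst)
        if entry is None:
            # Unknown destination: flood on every port except the arrival port.
            return self.ports - {in_port}
        out_port = entry[0]
        # Filter the frame if the destination is on the arrival segment.
        return set() if out_port == in_port else {out_port}
```

For example, a frame from an unknown source to an unknown destination is flooded, and the reply is then forwarded on a single port because the first frame taught the bridge where the sender lives.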

                             Figure 1.3: LAN vs VLAN topology

                                   Figure 1.4: VLAN tags

1.4 LAN switches IEEE 802.1Q

A LAN switch is a bridge with as many ports as there are LAN segments, and with enough capacity
to handle traffic on all segments. Problem 2 is solved through VLANs.
A virtual LAN (VLAN) is a collection of LAN segments and attached devices with the properties
of an independent LAN. Each VLAN is a separate broadcast domain: traffic on one VLAN is
restricted from going to another VLAN. Traffic between VLANs goes through a router.
A 4-byte VLAN tag carrying the VLAN identifier (VID) is added to MAC frames so that switches can
forward packets to ports with the same VID. The FDB is augmented to include, for each VID, the ports
(the member set) through which members of that VLAN can be reached.
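
A rough sketch of how a switch might read the tag and consult the member set. The layout used here (a 16-bit type value 0x8100 followed by 16 bits holding a 3-bit priority and a 12-bit VID) is the standard 802.1Q tag format; the function names and the dictionary used for the member sets are illustrative assumptions.

```python
import struct

TPID_8021Q = 0x8100  # type value that marks an 802.1Q tag


def parse_vlan_tag(frame: bytes):
    """Return (priority, vid) if the frame carries a VLAN tag, else None."""
    # Bytes 0-5: destination MAC, bytes 6-11: source MAC, bytes 12-15: the tag.
    if len(frame) < 16:
        return None
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    if tpid != TPID_8021Q:
        return None
    priority = tci >> 13     # 3-bit priority
    vid = tci & 0x0FFF       # 12-bit VLAN identifier
    return priority, vid


def egress_ports(vid, in_port, member_set):
    """Ports in the VLAN's member set, excluding the arrival port."""
    return member_set.get(vid, set()) - {in_port}
```

With `member_set = {100: {1, 2, 3}}`, a frame tagged with VID 100 arriving on port 1 is eligible for ports 2 and 3 only.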

The member set is derived from VLAN registration information: (i) explicitly by management
action, or (ii) by the GARP VLAN Registration Protocol (GVRP). GARP is the Generic Attribute
Registration Protocol.
Multicast filtering. A VLAN is a single broadcast domain. If multicast messages are broadcast, the
throughput is limited by the slowest link: a switch with 124 10-Mbps ports has a capacity of 1.24
Gbps but can transmit at most six 1.5-Mbps multicast video channels, because every channel occupies
every 10-Mbps port. The GARP Multicast Registration Protocol (GMRP) (IEEE 802.1P) allows switches
to limit multicast traffic along the spanning tree. (See IGMP.)

   •   JOIN: a host sends this message to express interest in joining a multicast group. The switch adds
       the port to the multicast group and forwards traffic from the multicast source to these ports.
       JOIN messages are sent once every JOINTIME timeout.

   •   LEAVE: message sent by a host. The switch removes this port from the multicast group unless
       another host on that port sends a JOIN message before the LEAVETIME timeout.

   •   LEAVEALL: message periodically sent by the switch.

When a host sends IP data to a multicast (Class D IP) address, the host inserts the low-order 23 bits
of the IP address into the low-order 23 bits of the MAC address. So a NIC that is not part of the group ignores these
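
The address mapping can be illustrated with a short sketch. The fixed prefix 01:00:5e used for the upper bits is the standard Ethernet prefix for IPv4 multicast and is not stated in the text above; the function name is hypothetical.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast (Class D) address to its Ethernet MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a Class D address")
    low23 = int(addr) & 0x7FFFFF          # low-order 23 bits of the IP address
    mac = 0x01005E000000 | low23          # fixed prefix, then the 23 bits
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
```

Because only 23 of the 28 group bits are copied, several groups share one MAC address: `224.0.0.1` and `224.128.0.1` both map to `01:00:5e:00:00:01`.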

            Figure 1.5: The IP header provides precedence and type of service fields

Quality of service. The 3-bit precedence field allows 8 priority levels. The ToS bits are D (minimum
delay), T (maximum throughput), R (maximum reliability), and C (minimum cost).
802.1 provides no support for priority. 802.1P provides in-band QoS signalling with 8 CoS levels. A
conforming bridge or switch maintains 8 queues. (VLAN tags may also carry priority information.)

1.5 Problems

  1. “The throughput per host decreases with number of hosts connected to the LAN.” Formulate
     two mathematical models, one deterministic and one stochastic, in which this quote is an
     assertion. Then prove or disprove the assertion. You will have to model LAN speed, host
     load and throughput.
    Hint: Use M/M/1 model of section 3.3.

  2. Follow Figure 1.2 and propose a graph model for an extended bridged LAN in which bridges
     may have multiple ports.

      (a) Use the graph to formulate two mathematical models, one deterministic and one stochas-
          tic, within which one can determine the throughput per LAN segment.
      (b) How would you formulate as a mathematical assertion the statement “the throughput per
          LAN segment increases with the number of ports in the bridge”?
          Hint: try the Jackson network model of section 3.3.

  3. Discuss the differences between STP and OSPF in terms of throughput or efficiency in link
Chapter 2

Processor architecture

2.1 Datapaths

When a packet arrives at a bridge,

   •   The DA is searched in the forwarding table (DA → output ports). If not found, the packet is
       broadcast to all output ports;

   •   If found, it is forwarded across the switching fabric to the appropriate output port (or ports
       for multicast);

   •   The SA is learned and added to the forwarding table;

   •   During transfer to the fabric the packet may be stored, or dropped if storage is full;

   •   The packet is stored in the output port queue (usually FIFO) and eventually transmitted.

When a packet arrives at a router,

   •   The DA is searched in the forwarding table. If not found, the packet is dropped;

   •   If found, the next-hop MAC address is appended, the TTL is decremented, a new Header Checksum
       is calculated, and the packet is forwarded across the switching fabric to the output port or ports;

   •   During transfer to the fabric the packet may be stored: if storage is full, this (or another)
       packet may be dropped;

   •   The packet is stored in an output queue (FIFO or more complex) and eventually transmitted.
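
The per-packet header rewrite in the router datapath (decrement TTL, recompute the Header Checksum) might look like the sketch below. It assumes an IPv4 header passed as raw bytes, with TTL at byte 8 and the checksum at bytes 10 to 11; dropping the packet when the TTL would reach zero is an assumption added here, and the function names are hypothetical.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum of the header, treating the checksum field as zero."""
    words = struct.unpack(f"!{len(header) // 2}H", header)
    total = sum(words) - words[5]       # word 5 is the Header Checksum field
    while total >> 16:                  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward_header(header: bytes):
    """Decrement TTL and rewrite the checksum; return None if the packet expires."""
    hdr = bytearray(header)
    if hdr[8] <= 1:                     # byte 8 is the TTL
        return None
    hdr[8] -= 1
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return bytes(hdr)
```

Real routers usually update the checksum incrementally rather than summing the whole header again; the full recomputation is used here only because it is easier to follow.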

When a cell arrives at an ATM switch,

   •   Its VCI is searched in the forwarding table (VC translation table: (VCI in, Port in) → (VCI out,
       Port out)). If not found, the cell is dropped;

   •   If the VCI is policed, the policing function determines if the cell is conformant. If not, it may
       be dropped. If yes, the cell is forwarded across the switching fabric to the output port;

   •   During transfer, the cell may be stored: if storage is full, this or another cell may be dropped;

   •   The cell is stored in an output queue and eventually transmitted. The service discipline may be
       FIFO or very elaborate.

[Figure 2.1 placeholder: four panels, A to D, showing different arrangements of line cards, CPUs, and memory.]

                                      Figure 2.1: Basic packet processor architecture

   •   Throughput in A is limited by CPU speed;

   •   In B, there is a choice of CPU to which a packet is forwarded;

   •   In C, a packet travels the bus only once, so throughput is limited by bus speed;

   •   In D, several packets can be forwarded through the crossbar at the same time.

General-purpose CPUs are not well suited for applications in which packets flow through. CPUs
do better when the same data are examined several times, making use of the cache.

[Figure 2.2 placeholder: the per-packet data path is elaborated as policing, forwarding decision, switching fabric, scheduling.]

                          Figure 2.2: Elaboration of datapath functions

2.2 Performance

The packet delay through the switch fabric consists of the time (1) for the forwarding decision, and
(2) to transfer the packet across the switch.
The packet delay through the processor consists of the time (1) for the policing decision, (2) for the
forwarding decision, (3) to transfer across the switch, and (4) for the output scheduling decision.

[Figure 2.3 placeholder: timing diagram with labels header, decision time, packet arrival rate, and minimum back-to-back packet size; axes are time and packet size.]

                           Figure 2.3: Delay of switch and packet processor
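
To keep up with back-to-back minimum-size packets, the per-packet decision must fit in the time one such packet occupies the link. The sketch below works this out for Ethernet, assuming a 64-byte minimum frame plus 20 bytes of preamble and inter-frame gap on the wire; these figures and the function names are assumptions introduced here, not taken from the figure.

```python
def decision_budget_ns(link_bps: float, frame_bytes: int = 64,
                       overhead_bytes: int = 20) -> float:
    """Time between back-to-back minimum-size frames, in nanoseconds."""
    bits_on_wire = 8 * (frame_bytes + overhead_bytes)
    return bits_on_wire / link_bps * 1e9


def max_packets_per_second(link_bps: float, frame_bytes: int = 64,
                           overhead_bytes: int = 20) -> float:
    """Worst-case packet arrival rate on one link."""
    return link_bps / (8 * (frame_bytes + overhead_bytes))
```

Under these assumptions a 10 Gbps port leaves about 67 ns per packet, which is only a handful of memory references.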

2.3 Forwarding decision

Criteria: (1) speed of address lookups, which depends on the number of memory references; (2) size of memory.
ATM switches perform direct lookup; see Figure 2.4.
The VCI address space is $2^{24}$ = 16 M. Most switches contain $2^{16}$ or fewer entries, since it is the
downstream switch that chooses a VCI that fits in its supported address space (PNNI).
For multicast, the lookup returns a list of output ports, each with a different VCI.

[Figure 2.4 placeholder: the lookup returns (port, new VCI).]

                           Figure 2.4: ATM switches perform direct lookup

[Figure 2.5 placeholder: a 48-bit network address is presented; the search returns the location of the
associated data as a pointer of log2 N bits (size-N memory).]

Figure 2.5: CAM or content-addressable memory. The 48-bit MAC address is presented. A successful
parallel search asserts a “hit” signal and returns a pointer to the entry where forwarding information
for the MAC address is stored.

Bridges. The address space is $2^{48}$, so direct lookup is not possible. Three indirect lookup techniques:
Associative memory. See Figure 2.5. Typical CAM size is $N = 1024$ entries. This is not suitable for large
LANs, which support $2^{16}$ (about 65,000) entries.

[Figure 2.6 placeholder: a hashing function maps the 48-bit address (one of M addresses) to a 16-bit value,
which is the address (log2 N bits) of one of N linked lists.]

Figure 2.6: A 48-bit address is presented and the hashing function returns a pointer to one of $N$
linked lists. The search through a linked list takes a random time proportional to the length of the list.

Hashing. For large LANs hashing is an option. Suppose the LAN has $M$ hosts. A hashing function,
$h$, maps a host's 48-bit address to a forwarding table with, say, $N = 2^{16}$ entries as in Figure 2.6.
Two addresses $x \neq y$ may collide: $h(x) = h(y)$. The entry points to a linked list of (MAC address,
forwarding data) pairs for the MAC addresses that map into the same entry. The list must be searched
sequentially to locate the MAC address. The duration of the search is proportional to the length of the list.
Suppose $h$ maps the $M$ MAC addresses $x_1, \ldots, x_M$ into the $N$ linked lists $1, \ldots, N$. Assume
that $h(x_1), \ldots, h(x_M)$ are independent and uniformly distributed over $\{1, \ldots, N\}$.
The length of the $i$th list is the random number

    $$n_i = \sum_{j=1}^{M} 1(h(x_j) = i), \quad i = 1, \ldots, N \tag{2.1}$$
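
The spread of list lengths under the uniform-hashing assumption can be explored by simulation. In this sketch the hash values are drawn directly from a random number generator, which is exactly the independence assumption made above rather than a real hash function; the function names and the seed are arbitrary choices made here.

```python
import random
from collections import Counter


def list_lengths(num_hosts: int, num_lists: int, seed: int = 0):
    """Lengths of the N lists when each of M addresses hashes uniformly at random."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(num_lists) for _ in range(num_hosts))
    return [counts.get(i, 0) for i in range(num_lists)]


def summarize(lengths):
    """Mean and maximum list length, plus the fraction of empty lists."""
    n = len(lengths)
    return {
        "mean": sum(lengths) / n,
        "max": max(lengths),
        "empty_fraction": lengths.count(0) / n,
    }
```

Calling `summarize(list_lengths(2**16, 2**16))` shows the point made in the text: the mean length is exactly M/N, but the longest list, which determines the worst-case search time, is noticeably longer.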

Let «     Æ Å . If « is small (number of lists larger than number of possible addresses), the lists
will usually have 0 or 1 element. Problem 3 asks to find the distribution of Ò . For Æ Å (« ½),
the mean length of the list is about ¼ ´½ · «µ. However, Ò being random, there is a chance that
some lists (and corresponding search time) may be very large. For real-time applications, you may
store forwarding tables in such a way (e.g. as trees) that retrieval has a deterministic bound,

                                   Prefix          Outgoing port
                       /16          1

                            Figure 2.7: Forwarding table with CIDR

IP routers. With CIDR, router forwarding table entries are identified by a pair, (route prefix/prefix
length), with prefix length between 0 and 32 bits. See Figure 2.7. The entry is a
16-bit long entry.
The forwarding decision must find the longest prefix match between the packet’s destination IP
address and the prefixes in the forwarding table.
CIDR reduces the size of the table, but the forwarding decision is more complex. See [9].
With declining memory cost, it may be more economical to expand the prefixes and use simpler,
exact matching algorithms.
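
A minimal illustration of longest prefix match follows. It scans the whole table linearly, which shows the rule but is not how a fast router would implement it; the function name and the example prefixes are made up for illustration.

```python
import ipaddress


def longest_prefix_match(table, destination):
    """Return the port of the longest prefix that contains destination, or None."""
    addr = ipaddress.ip_address(destination)
    best_len, best_port = -1, None
    for prefix, port in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_port = net.prefixlen, port
    return best_port


# Hypothetical forwarding table: (prefix/length, outgoing port).
example_table = [("10.0.0.0/8", 1), ("10.1.0.0/16", 2), ("0.0.0.0/0", 3)]
```

With `example_table`, the address `10.1.2.3` matches all three prefixes and is sent to port 2, because /16 is the longest match.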

Caching. The forwarding decision delay can be reduced by caching. The idea is that the IP destination
addresses of successive packets are correlated.
The cache stores the full source and destination IP addresses and the corresponding forwarding
decision (including perhaps the entire replacement IP header).
When a packet arrives, its SA and DA are used to do a full match in the local cache. If the addresses are
not there, the packet is forwarded to a central routing processor. A cache replacement rule is needed
to decide which entry to evict when there is a cache miss.
The improvement in delay depends on (1) the ratio of cache size to the size of the forwarding table,
and (2) the temporal locality. The latter is likely to be higher in a campus router than in an edge router,
and higher there than in a core router. See Problem 4.
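
One way to experiment with the cache is sketched below. The text does not fix a replacement rule, so least-recently-used (LRU) is assumed here; `slow_lookup` is a hypothetical stand-in for the central routing processor, and the class and method names are illustrative.

```python
from collections import OrderedDict


class RouteCache:
    """Exact-match cache keyed by (source, destination), with LRU replacement."""

    def __init__(self, capacity, slow_lookup):
        self.capacity = capacity
        self.slow_lookup = slow_lookup          # stands in for the central processor
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def forward(self, sa, da):
        key = (sa, da)
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)       # mark as most recently used
            return self.entries[key]
        self.misses += 1
        decision = self.slow_lookup(da)
        self.entries[key] = decision
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used entry
        return decision

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Replaying an address trace through `forward` for several values of `capacity` gives an empirical hit ratio for each cache size.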
Multicast. Some routers support multicast. The simplest rule is RPF (reverse-path forwarding): if a
multicast packet arrives on port $P$ from source $S$, look up $S$ in the forwarding table. If $P$ is the best
port to reach $S$, forward the packet on all ports except $P$.
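
The RPF rule can be written in a few lines. In this sketch the unicast forwarding table is simplified to a dictionary from source address to best port (a real router would use longest prefix match), and dropping packets that fail the check is the usual reading of the rule; the names are illustrative.

```python
def rpf_forward(in_port, source, unicast_table, ports):
    """Reverse-path forwarding: return the set of ports to copy the packet to."""
    best_port = unicast_table.get(source)   # port this router would use to reach source
    if best_port != in_port:
        return set()                        # arrived off the reverse path: not forwarded
    return set(ports) - {in_port}
```

For instance, with `unicast_table = {"S1": 1}` a packet from `S1` arriving on port 1 is copied to every other port, while the same packet arriving on port 2 is not forwarded.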
Switching fabrics. Need some queuing models.

2.4 Problems

     1. For a commercial LAN switch, find the various times in Figure 2.3. Also give the throughput.
        See, for example,

     2. If forwarding decision, switch transfer, and output scheduling can be pipelined, what is the
        throughput of the processor?

     3. Find the (marginal) distribution of the Ò given in (2.1), and calculate the mean length       Ò
        of a list. Show that for « ½ small, the mean is approximately ¼ ´½ · «µ.
        Find the joint distribution Ô´Ò½   ¡ ¡ ¡ ÒÆ µ. Verify that it has the product form:
                                                      ÉÆ Ô´Ò µ
                                      Ô´Ò ¡ ¡ ¡ ÒÆ µ È ÉÆ        ½

                                                            Ô´Ò µ

                                                            Ò¾       ½

        Here        Ò
                      ÈÒ          Å   , so the denominator is the normalizing constant.
        Take Å     Æ     ¾½
                              . Find the probabililty that Ò     ½¼¼¼.
        Suppose a memory access takes 100 ns, «         ½. Consider back-to-back Ethernet packets.
        What is the average throughput of this switch using the model of Figure 2.3 and ignoring the
        output scheduling decision delay?

     4. The packets arriving at a line card belong to several multiplexed TCP connections.

         (a) Formulate a model of packet arrivals with, say, $M$ simultaneous connections, in which
             connections last a random amount of time with a geometric distribution and mean $T$.
         (b) Suppose the size of the cache is $N$. If there is a cache miss, an existing entry is replaced
             by the missing entry. How would you calculate the hit ratio as a function of $M$, $N$, $T$?
         (c) Suppose you are given a ‘typical’ trace of the addresses of packet arrivals, but no model
             of the arrival process. You want to know how big a cache you would need so that the hit
             ratio is a certain value, say ¼ . What would you do?
         (d) The time to search a cache is Ì , the time to search the central forwarding table is Ì ,
             the hit ratio is . How would you decide if it’s worth having a cache?
Chapter 3

Queuing

3.1 Discrete time Markov chains

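$x = \{x_n, n \ge 0\}$ is a Markov chain with $x_n \in \mathcal{X}$, $\mathcal{X}$ finite or countable, stationary probability matrix
$P = \{P(i,j),\ i,j \in \mathcal{X}\}$, and initial distribution $\pi_0 = \{\pi_0(i),\ i \in \mathcal{X}\}$, if

    $$P(x_0 = i_0, \ldots, x_n = i_n) = \pi_0(i_0) P(i_0, i_1) \times \cdots \times P(i_{n-1}, i_n) \tag{3.1}$$

for all $n \ge 0$ and $i_0, \ldots, i_n \in \mathcal{X}$.

$\pi_n$ is the marginal distribution of $x_n$ written as a row vector. From (3.1),

    $$\pi_n = \pi_0 P^n \tag{3.2}$$

$\pi$ is invariant if it satisfies the balance equations

    $$\pi = \pi P \tag{3.3}$$

$x$ is irreducible if it goes from any state to any other state (with positive probability). Irreducible
chains have at most one invariant distribution. The chain is positive recurrent if it has one invariant
distribution. If $x$ is irreducible, with invariant distribution $\pi$,

    $$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N} 1(x_n = i) = \pi(i) \quad \text{a.s.}, \quad i \in \mathcal{X} \tag{3.4}$$

i.e. $\pi(i)$ is the fraction of time $x$ spends in state $i$.
$x$ is aperiodic if $d = 1$, where

    $$d = \gcd\{n \ge 1 : P^n(i,i) > 0\}, \quad i \in \mathcal{X}$$

If $d > 1$, $x$ is periodic with period $d$.

If $x$ is aperiodic and irreducible, with invariant distribution $\pi$, then for any initial distribution,

    $$\lim_{n \to \infty} \pi_n = \pi \tag{3.5}$$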

See Problems 1, 2.
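
As a numerical illustration of (3.2), (3.3) and (3.5), the invariant distribution of a small aperiodic, irreducible chain can be found by repeatedly multiplying a row vector by P. This is a plain-Python sketch; the function names, the tolerance, and the two-state example are chosen here for illustration.

```python
def step(dist, P):
    """One step of the recursion pi_{n+1} = pi_n P, with dist a row vector."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def invariant_distribution(P, pi0, tol=1e-12, max_iter=100_000):
    """Iterate pi_n = pi_0 P^n until successive iterates agree to within tol."""
    dist = list(pi0)
    for _ in range(max_iter):
        nxt = step(dist, P)
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    raise RuntimeError("no convergence; the chain may be periodic")


# Two-state example: the balance equations give pi = (5/6, 1/6).
P_example = [[0.9, 0.1], [0.5, 0.5]]
pi_example = invariant_distribution(P_example, [1.0, 0.0])
```

For a periodic chain the iterates oscillate instead of converging, which is why (3.5) requires aperiodicity.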
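Theorem Suppose $x$ is irreducible and $V : \mathcal{X} \to [0, \infty)$. The drift of $V$ at $i$ is

    $$\Delta(i) = E\{V(x_{n+1}) - V(x_n) \mid x_n = i\}$$

Suppose $S$ is a finite subset of $\mathcal{X}$ and there are constants $\epsilon > 0$, $b < \infty$ so that

    $$\Delta(x) \le b, \quad x \in S$$
    $$\Delta(x) \le -\epsilon, \quad x \notin S$$

Then $x$ is positive recurrent. See Problem 4.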

3.2 Continuous-time Markov chains

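A random variable $\tau$ is exponentially distributed with rate $\lambda$ if

    $$P(\tau > t) = e^{-\lambda t}, \quad t \ge 0$$

Its mean is

    $$E(\tau) = \lambda^{-1}$$

and it is memoryless,

    $$P\{\tau > t + s \mid \tau > s\} = P(\tau > t), \quad s, t \ge 0$$

A rate matrix $Q = \{q(i,j)\}$ on a countable set $\mathcal{X}$ satisfies

    $$0 \le q(i,j) < \infty, \quad i \ne j$$
    $$q(i) = -q(i,i) = \sum_{j \ne i} q(i,j) < \infty, \quad i \in \mathcal{X}$$

[Figure 3.1 placeholder: a piecewise-constant sample path plotted against time t.]

                          Figure 3.1: Constructing a continuous-time Markov chain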

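Given a rate matrix $Q$ and a distribution $\pi_0$ on $\mathcal{X}$, construct $x = \{x_t, t \ge 0\}$ thus:

     1. Select $x_0 \in \mathcal{X}$ with $P(x_0 = i) = \pi_0(i)$.
     2. If $x_0 = i$, select $\tau$ exponential with rate $q(i)$. Let

            $$x_t = i, \quad 0 \le t < \tau$$

     3. At $t = \tau$, $x$ takes a jump from $i$ to $j$, independently of $\tau$ and according to

            $$P\{x_\tau = j \mid x_{\tau-} = i\} = \Gamma(i,j) = \frac{q(i,j)}{q(i)}$$

     4. Return to step 2 with $x_\tau = j$, independently of the process before $\tau$.

Then $x$ is a Markov process with right-continuous sample paths. Figure 3.1 shows a sample path.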
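
The construction translates almost line by line into a simulation. In the sketch below the rate matrix is given as a dictionary of off-diagonal rates, `rates[i][j] = q(i, j)`; the function names, this data layout, and the two-state example in the usage note are assumptions made for illustration.

```python
import random


def simulate_ctmc(rates, x0, t_end, seed=0):
    """Return the sample path on [0, t_end) as a list of (jump_time, new_state)."""
    rng = random.Random(seed)
    t, state = 0.0, x0
    path = [(t, state)]
    while True:
        out = rates[state]                  # {j: q(i, j)} for j != i
        q_i = sum(out.values())             # total rate q(i) of leaving state i
        if q_i == 0:
            break                           # absorbing state: no further jumps
        t += rng.expovariate(q_i)           # holding time, exponential with rate q(i)
        if t >= t_end:
            break
        targets = list(out)
        # Jump to j with probability q(i, j) / q(i).
        state = rng.choices(targets, weights=[out[j] for j in targets])[0]
        path.append((t, state))
    return path


def time_fraction(path, t_end, state):
    """Fraction of [0, t_end] that the path spends in the given state."""
    total = 0.0
    for (t0, s), (t1, _) in zip(path, path[1:] + [(t_end, None)]):
        if s == state:
            total += t1 - t0
    return total / t_end
```

For the two-state chain `rates = {0: {1: 1.0}, 1: {0: 2.0}}`, the fraction of time spent in state 0 over a long run should be close to 2/3, the invariant probability of that state.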
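$Q$ is regular if

    $$\sum_{n=0}^{\infty} \tau_n = \infty \quad \text{a.s.}$$

where $\tau_n$ is the $n$th holding time in the construction. For small $t$ and $j \ne i$,

    $$P(x_0 = i, x_t = j) = \pi_0(i) q(i,j) t + o(t)$$
    $$P(x_0 = i, x_t = i) = \pi_0(i) [1 - q(i) t] + o(t)$$

[Figure 3.2 placeholder: a trajectory with sets $S_1$, $S_2$ marked at times $t_1$, $t_2$, $t_3$.]

                              Figure 3.2: A trajectory in $\mathcal{A}$ of Theorem

Theorem (Markov property) For any set $\mathcal{A}$ of trajectories

    $$P\{(x_s, s \ge t) \in \mathcal{A} \mid x_t = i, x_u, u < t\} = P\{(x_s, s \ge 0) \in \mathcal{A} \mid x_0 = i\}$$

Such $\mathcal{A}$ is of the form

    $$\mathcal{A} = \{x : x_{t_k} \in S_k, \ k = 1, \ldots, K\}$$

with $0 \le t_1 < \cdots < t_K$, $S_k \subset \mathcal{X}$, $K \ge 1$. See Figure 3.2.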
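
$Q$ is irreducible if $q(i) > 0$ for all $i \in \mathcal{X}$ and if $\Gamma$ is irreducible, where

    $$\Gamma(i,j) = \frac{q(i,j)}{q(i)}, \quad j \ne i$$

Theorem Suppose $x$ is a c-t Markov chain with irreducible rate matrix $Q$ and initial distribution $\pi$. Then

     1. $\pi$ is invariant, i.e. $P(x_t = i) = \pi(i)$, $t \ge 0$, $i \in \mathcal{X}$, iff it satisfies the balance equation

            $$\sum_{i} \pi(i) q(i,j) = 0, \quad j \in \mathcal{X} \tag{3.6}$$

     2. $x$ has at most one invariant distribution $\pi$, and then

            $$\lim_{t \to \infty} P(x_t = i) = \pi(i), \quad i \in \mathcal{X}$$
            $$\lim_{T \to \infty} \frac{1}{T} \int_0^T 1(x_s = i)\, ds = \pi(i), \quad i \in \mathcal{X}$$

     3. If $x$ has no invariant distribution,

            $$\lim_{t \to \infty} P(x_t = i) = 0, \quad i \in \mathcal{X}$$
            $$\lim_{T \to \infty} \frac{1}{T} \int_0^T 1(x_s = i)\, ds = 0, \quad i \in \mathcal{X}$$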
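
Theorem (Time reversal) Suppose $x$ is stationary, c-t, Markov with rate $Q$ and distribution $\pi$. The
time-reversed process

    $$\tilde{x} = \{\tilde{x}_t = x_{T-t}, \ 0 \le t \le T\}$$

is stationary, Markov, with distribution $\pi$ and rate $\tilde{Q}$ where

    $$\tilde{q}(i,j) = \frac{\pi(j) q(j,i)}{\pi(i)}$$

Indeed, for $j \ne i$,

    $$P(x_0 = i, x_t = j) = \pi(i) q(i,j) t + o(t)$$

and

    $$P(\tilde{x}_0 = i, \tilde{x}_t = j) = \pi(i) \tilde{q}(i,j) t + o(t)$$
    $$= P(x_T = i, x_{T-t} = j)$$
    $$= P(x_t = i, x_0 = j) = P(x_0 = j, x_t = i)$$
    $$= \pi(j) q(j,i) t + o(t)$$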
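
[Figure 3.3 placeholder: state-transition diagram on states 0, 1, 2, 3, ... with service rate µ on the
backward transitions, and a sample path of arrivals and departures against time t.]

Figure 3.3: Diagrams for M/M/1 system. Arrivals (blue) and departures (red) form Poisson processes.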

3.3 M/M/1 model

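See Figure 3.3. The balance equation (3.6) is

    $$\pi(0)\lambda = \pi(1)\mu$$
    $$\pi(n)(\lambda + \mu) = \pi(n-1)\lambda + \pi(n+1)\mu, \quad n \ge 1$$

which has a (unique) solution iff $\lambda < \mu$:

    $$\pi(n) = (1 - \rho)\rho^n, \quad n \ge 0, \quad \text{with } \rho = \lambda/\mu \tag{3.7}$$

The queue $x_t$ is time-reversible, because

    $$\pi(i) q(i,j) = \pi(j) q(j,i), \quad i, j \ge 0$$

so the rate matrix of the time-reversed process, $x_{T-t}$, is the same as that of $x_t$.
So the departures before time $t$ form a Poisson process with rate $\lambda$, independent of $x_t$. Surprise!
The mean queue length is

    $$E(x_t) = \sum_{n=0}^{\infty} n \pi(n) = \sum_{n=0}^{\infty} n (1 - \rho) \rho^n = \frac{\rho}{1 - \rho}$$

For $\rho = 0.9$, the mean is 9 packets.

    $\lambda$ = av. number of packet arrivals per sec (exponential interarrival times)
    $\mu$ = av. number of packets that can be transmitted per sec
    $\rho$ = av. utilization = $P(x_t > 0)$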
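
A packet arriving at time $t$ sees $x_t$ packets in queue with

    $$P\{x_t = n \mid \text{packet arrives in } (t, t+\epsilon)\}
      = \frac{P\{\text{packet arrives in } (t, t+\epsilon) \mid x_t = n\}\, P(x_t = n)}{P\{\text{packet arrives in } (t, t+\epsilon)\}}
      = \frac{\lambda \epsilon\, \pi(n)}{\lambda \epsilon} = \pi(n)$$

so the average time between arrival and departure (including packet service or transmission time) is

    $$T = \sum_{n=0}^{\infty} \frac{1}{\mu}(n+1)\pi(n) = \sum_{n=0}^{\infty} \frac{n+1}{\mu}(1-\rho)\rho^n = \frac{1}{\mu - \lambda}$$

Alternatively, $T = \mu^{-1}[1 + E(x_t)] = \frac{1}{\mu(1-\rho)} = \frac{1}{\mu - \lambda}$.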
Example. Consider a 10 Gbps link. Packet lengths are exponentially distributed with mean length 10,000 bits [1]. So $\mu = 10^{10} \times 10^{-4} = 10^{6}$ packets/s and $\mu^{-1} = 1$ μs per packet.
Link utilization is 90 percent, i.e. $\rho = 0.9$. Then the average number of packets in buffer is $\rho(1-\rho)^{-1} = 9$. The average delay faced by a packet, including its own service (transmission) time, is $1/(\mu(1-\rho)) = 10$ μs.
If the packet goes through 10 nodes the average delay is 100 μs (assuming independence of nodes).
For a 100 Mbps link, with same packet length distribution, $\rho = 0.9$, $\mu^{-1} = 10^{4} \times 10^{-8}$ s $= 100$ μs/packet, and the average delay is 1000 μs per link.
The probability of 100 or more packets in buffer is

$$\sum_{n \ge 100} \pi(n) = \sum_{n \ge 100} (1-\rho)\rho^n = \rho^{100} \approx 2.7 \times 10^{-5}.$$

Compare queuing delay with propagation delay of 3,000 km $\times$ 5 μs/km = 15 ms for a 3,000 km link.
Possible number of bits in the 3,000 km, 10 Gbps link is $15 \times 10^{-3} \times 10^{10} = 150 \times 10^{6}$.

[1] What is a more realistic distribution?
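The numbers in this example can be checked with a few lines of Python. This is only a sketch under the M/M/1 assumptions of this section; the function names `mm1_metrics` and `tail_probability` are illustrative and do not come from the notes.

```python
def mm1_metrics(link_bps, mean_packet_bits, rho):
    """Return (mu, mean number in system, mean delay in s) for an M/M/1 link."""
    if not 0 <= rho < 1:
        raise ValueError("utilization must satisfy 0 <= rho < 1")
    mu = link_bps / mean_packet_bits      # service rate, packets/s
    lam = rho * mu                        # arrival rate, packets/s
    mean_in_system = rho / (1.0 - rho)    # sum over n of n * pi(n)
    mean_delay = 1.0 / (mu - lam)         # T, includes own transmission time
    return mu, mean_in_system, mean_delay


def tail_probability(rho, k):
    """P(k or more packets in the system) = rho**k for M/M/1."""
    return rho ** k


fast = mm1_metrics(10e9, 10_000, 0.9)    # 10 Gbps link
slow = mm1_metrics(100e6, 10_000, 0.9)   # 100 Mbps link
tail = tail_probability(0.9, 100)
print(fast, slow, tail)
```

The delay scales inversely with the link rate at fixed utilization, which is why the 100 Mbps link is 100 times slower per hop.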

Alternative formulation. $A = \{A_t, t \ge 0\}$ is a Poisson counting process with rate $\lambda$, the arrival process. Let $S = \{S_t, t \ge 0\}$ be a Poisson counting process with rate $\mu$, the virtual service process. $A$, $S$ are independent.
The queue at $t$ is given by

$$x_t = x_0 + A_t - \int_0^t 1(x_s > 0)\, dS_s .$$

The departure counting process is $D$,

$$D_t = \int_0^t 1(x_s > 0)\, dS_s .$$

$D$ is also Poisson. Moreover,

   • Future arrivals, $\{A_s - A_t, s \ge t\}$, and current state, $x_t$, are independent;

   • Past departures, $\{D_t - D_s, s \le t\}$, and current state, $x_t$, are independent.

Figure 3.4: Parameters of Jackson network (line $i$ receives external traffic at rate $\gamma_i$ pkt/sec plus traffic routed from other lines $j$; its line rate is $\mu_i$ pkt/sec)

Jackson network. See Figure 3.4. Assumptions:

     • Independent, exponential service times with rate $\mu_i$;

     • Markovian routing $r(i, j)$;
     • Poisson external arrivals at rate $\gamma_i$ packets/sec;

Aggregate arrivals into node $i$ is $\lambda_i$ where

$$\lambda_i = \gamma_i + \sum_j \lambda_j\, r(j, i), \quad \text{all } i \qquad (3.8)$$

Let $x_t = (x_t^1, \cdots, x_t^J)$ be queue-length process. This is Markovian. Problem 5 asks to find its rate matrix.

Theorem. Assume $\lambda_i < \mu_i$, all $i$. Then $x$ has an invariant distribution of the product form:

$$\pi(x^1, \cdots, x^J) = \pi_1(x^1) \cdots \pi_J(x^J), \quad \text{with } \pi_i(n) = (1-\rho_i)\rho_i^n,\ n \ge 0,\ \rho_i = \lambda_i/\mu_i .$$

This is a surprising result. The departure from any node in the Jackson network need not be Poisson,
unlike the case of a single M/M/1 system.
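As an illustration of the traffic equations and the product form, the sketch below solves for the aggregate rates by fixed-point iteration and evaluates the invariant probability of a state. The two-node network, its rates, and the names `solve_traffic` and `product_form` are made up for the example; the code assumes `r[j][i]` is the probability that a packet leaving node `j` is routed to node `i`, and that every packet eventually leaves the network so the iteration converges.

```python
def solve_traffic(gamma, r, iterations=200):
    """Iterate lam_i = gamma_i + sum_j lam_j * r[j][i] to a fixed point."""
    n = len(gamma)
    lam = list(gamma)
    for _ in range(iterations):
        lam = [gamma[i] + sum(lam[j] * r[j][i] for j in range(n))
               for i in range(n)]
    return lam


def product_form(lam, mu, x):
    """Invariant probability of queue-length vector x under the product form."""
    prob = 1.0
    for l, m, n in zip(lam, mu, x):
        rho = l / m
        if rho >= 1:
            raise ValueError("need lam_i < mu_i at every node")
        prob *= (1 - rho) * rho ** n
    return prob


# Hypothetical example: external traffic enters node 0 only, and each node
# forwards a packet to the other node with probability 0.5.
gamma = [1.0, 0.0]
r = [[0.0, 0.5], [0.5, 0.0]]
mu = [2.0, 2.0]
lam = solve_traffic(gamma, r)             # close to [4/3, 2/3]
p_empty = product_form(lam, mu, [0, 0])   # close to (1/3) * (2/3) = 2/9
print(lam, p_empty)
```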
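
Figure 3.5: The M/M/m/$\infty$ system (state transition diagram over states $0, 1, 2, 3, \cdots, m-1, m, m+1$; each request is routed to the first free server; service rates $\mu, 2\mu, 3\mu, \cdots, (m-1)\mu, m\mu, m\mu$)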

3.4 Other M/M/m/n models

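M/M/m, the m server case. The received request is routed to the first of $m$ available servers, Figure 3.5. The buffer is infinite. The balance equations are

$$(\lambda + m\mu)\, \pi(n) = \lambda\, \pi(n-1) + m\mu\, \pi(n+1), \quad n \ge m$$
$$(\lambda + n\mu)\, \pi(n) = \lambda\, \pi(n-1) + (n+1)\mu\, \pi(n+1), \quad 0 < n < m$$
$$\lambda\, \pi(0) = \mu\, \pi(1)$$

This gives, with $\rho = \lambda/(m\mu)$,

$$\pi(n) = \begin{cases} \pi(0)\, \dfrac{(m\rho)^n}{n!}, & n \le m \\[2mm] \pi(0)\, \dfrac{m^m \rho^n}{m!}, & n \ge m \end{cases} \qquad (3.9)$$

It is assumed that $\rho < 1$. $\pi(0)$ is obtained using $\sum_n \pi(n) = 1$,

$$\pi(0) = \left[ \sum_{n=0}^{m-1} \frac{(m\rho)^n}{n!} + \frac{(m\rho)^m}{m!\, (1-\rho)} \right]^{-1}$$

A packet arriving at time $t$ sees all servers busy ($x_t \ge m$) with probability

$$\begin{aligned}
P\{x_t \ge m \mid \text{packet arrives in } (t, t+\epsilon)\}
&= \sum_{n \ge m} P\{x_t = n \mid \text{packet arrives in } (t, t+\epsilon)\} \\
&= \sum_{n \ge m} \frac{P\{\text{packet arrives in } (t, t+\epsilon) \mid x_t = n\}\, P(x_t = n)}{P(\text{packet arrives in } (t, t+\epsilon))} \\
&= \sum_{n \ge m} \frac{\lambda\epsilon\, \pi(n)}{\lambda\epsilon} = \sum_{n \ge m} \pi(n) = \sum_{n \ge m} \pi(0)\, \frac{m^m \rho^n}{m!} \quad \text{from (3.9)} \\
&= \frac{\pi(0)(m\rho)^m}{m!\, (1-\rho)} = P(\text{queue})
\end{aligned}$$

The expected number of packets waiting in queue (not in service) is

$$N(\text{queue}) = \sum_{n \ge 0} n\, \pi(n + m) = \frac{\pi(0)(m\rho)^m}{m!} \sum_{n \ge 0} n \rho^n = P(\text{queue})\, \frac{\rho}{1-\rho}$$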

By Little’s law (see below), the average waiting time in queue (not in service) is

$$W = \frac{N(\text{queue})}{\lambda}$$
and the total latency (service plus waiting time) is
$$T = \frac{1}{\mu} + W$$
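The M/M/m quantities above can be evaluated numerically. The following is a minimal sketch, assuming the standard M/M/m formulas with $\rho = \lambda/(m\mu) < 1$; the function name `mmm_metrics` is illustrative only.

```python
from math import factorial


def mmm_metrics(lam, mu, m):
    """Return (pi0, P(queue), N(queue), W, T) for an M/M/m queue."""
    rho = lam / (m * mu)
    if not 0 < rho < 1:
        raise ValueError("need 0 < lam < m * mu")
    a = m * rho                                        # offered load lam / mu
    tail = a ** m / (factorial(m) * (1 - rho))
    pi0 = 1.0 / (sum(a ** n / factorial(n) for n in range(m)) + tail)
    p_queue = pi0 * tail                               # arrival finds all servers busy
    n_queue = p_queue * rho / (1 - rho)                # mean number waiting
    w = n_queue / lam                                  # Little's law
    t = 1.0 / mu + w                                   # service plus waiting
    return pi0, p_queue, n_queue, w, t


single = mmm_metrics(0.9, 1.0, 1)   # reduces to M/M/1: T = 1 / (mu - lam)
double = mmm_metrics(1.0, 1.0, 2)   # two servers at 50 percent utilization
print(single, double)
```

With $m = 1$ the expressions collapse to the M/M/1 results, which gives a quick sanity check on the formulas.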

3.5 Little’s law

Suppose $A(t)$ is the cumulative arrivals in $[0, t]$ into a stable queueing system, $x(t)$ is number of packets in system (including those in service). Let $D(i) = S_i + W_i$ be latency of packet $i$. Let $\lambda = \lim_{t \to \infty} A(t)/t$ be arrival rate.
Suppose queue is empty at $t = 0$ and $t = T$. From figure 3.6, the time average of queue size is

$$\frac{\int_0^T x(t)\, dt}{T} = \frac{\sum_{i=1}^{A(T)} D(i)}{T} = \frac{A(T)}{T} \times \frac{\sum_{i=1}^{A(T)} D(i)}{A(T)}$$

Taking limits as $T \to \infty$, and if time averages equal ensemble averages, we get

$$E(x) = \lambda \times E(D)$$



Figure 3.6: Calculations for Little’s law (sample path showing service times $S_1, \cdots, S_5$ and waiting times $W_2, W_4, W_5$)
3.6 PASTA


We have used the PASTA property (Poisson arrivals see time averages) several times.
Consider stationary queuing system with deterministic service time of 3 and periodic arrivals (period 10). A sample path with arrivals at $1, 2, 3, 11, 12, 13, 21, 22, 23, \cdots$ and queue process $x(t)$ is shown in figure 3.7.
Let $\pi(n)$ be the probability that $x(t) = n$ at any time $t$, and let $p(n)$ be the probability that an arriving packet sees $n$ packets in queue. For this system,

$$\pi(0) = 1/10, \quad \pi(1) = 4/10, \quad \pi(2) = 4/10, \quad \pi(3) = 1/10$$
$$p(0) = p(1) = p(2) = 1/3$$
so the two probabilities are not the same.
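The two distributions in this deterministic example can be recomputed directly. The sketch below assumes a FIFO single server and uses the fact that all events fall on integer times, so the queue is constant on each unit interval; the helper names are illustrative.

```python
from collections import Counter
from fractions import Fraction

PERIOD, SERVICE = 10, 3
ARRIVALS = [1, 2, 3]                 # arrival times within one period


def departures(arrivals, service):
    """FIFO single-server departure times for one period."""
    out, free_at = [], 0
    for a in arrivals:
        free_at = max(free_at, a) + service
        out.append(free_at)
    return out


def in_system(t, arrivals, deps):
    """Packets in system just after time t (events at t already counted)."""
    return sum(a <= t for a in arrivals) - sum(d <= t for d in deps)


deps = departures(ARRIVALS, SERVICE)   # [4, 7, 10]: system empties each period

# Time average: x(t) is constant on each unit interval [k, k + 1).
time_counts = Counter(in_system(k, ARRIVALS, deps) for k in range(PERIOD))
pi = {n: Fraction(c, PERIOD) for n, c in time_counts.items()}

# Arrival average: what each arrival finds, excluding itself
# (no departure coincides with an arrival in this example).
seen = Counter(in_system(a, ARRIVALS, deps) - 1 for a in ARRIVALS)
p = {n: Fraction(c, len(ARRIVALS)) for n, c in seen.items()}
print(pi, p)
```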

Figure 3.7: PASTA property does not hold in this deterministic queuing system

Consider a M/G/1 system, with stationary probabilities $\pi(n)$. Let $p(n)$ be the probability that an arrival sees $n$ packets in queue. Then,

$$p(n) = P\{x(t) = n \mid \text{packet arrives in } (t, t+\epsilon)\} = \frac{P(x(t) = n)\, P(\text{packet arrives in } (t, t+\epsilon))}{P(\text{packet arrives in } (t, t+\epsilon))} = P(x(t) = n) = \pi(n)$$
using Bayes’ rule, independence of arrivals after $t$ from $\{x(s), s \le t\}$, and independence of service times from the arrival process.

Figure 3.8: Deriving Pollaczek-Khinchin formula (sample path of $W(t)$; each packet $i$ contributes a parallelogram-shaped area built from its service time $S_i$ and waiting time $W_i$)

3.7 Pollaczek-Khinchin formula

Consider M/G/1 system with independent service times $S_i$, $E S_i = \mu^{-1}$, $E S_i^2 < \infty$, Poisson arrivals
with rate $\lambda$. Let $W(t)$ be the remaining waiting time, i.e. the amount of time needed to serve packets
in the system at $t$. Let $S_i$ and $W_i$ be the service time and waiting times of packet $i$, see figure 3.8.
The time average of waiting time

$$\frac{1}{T} \int_0^T W(t)\, dt = \frac{1}{T} \sum_{i=1}^{A(T)} r(i)$$

$r(i)$ is the parallelogram area for packet $i$, so $r(i) = \frac{1}{2} S_i^2 + S_i W_i$. Substituting and taking
limits as $T \to \infty$,

$$W = \lambda \left( \tfrac{1}{2} E S^2 + (\text{waiting time faced by arriving packet})\, E S \right)$$

By PASTA, (waiting time faced by arrival) $= W$. So,

$$W = \frac{\lambda\, E S^2}{2(1 - \lambda E S)} = \frac{\lambda\, E S^2}{2(1 - \rho)}$$
where $\rho = \lambda/\mu$ is the utilization.
Note: The formula

$$E \sum_{i=1}^{A(T)} r(i) = E A(T) \times E\, r(i)$$

involving a random sum of $A(T)$ terms is sometimes called Wald’s formula. A general version of
Wald’s formula is a consequence of the fact that $\{A(t) - \lambda t, t \ge 0\}$ is a martingale. See Problem 8.
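Determinism minimizes waiting.
In general, $E S^2 = (E S)^2 + \sigma^2$, where $\sigma^2$ is the variance of the service time, so

$$W = \frac{\lambda\left((E S)^2 + \sigma^2\right)}{2(1 - \rho)} \ge \frac{\lambda (E S)^2}{2(1 - \rho)} = \frac{\rho}{2(\mu - \lambda)}$$
where the last expression is the waiting time for a deterministic service time (eg. ATM cells).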
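To see how much service-time variability costs, the Pollaczek-Khinchin waiting time can be evaluated for exponential and deterministic service at the same load. This is a sketch with illustrative rates ($\lambda = 0.9$, $\mu = 1$); the function name `pk_wait` is not from the notes.

```python
def pk_wait(lam, mean_s, second_moment_s):
    """Mean waiting time in queue: lam * E[S^2] / (2 * (1 - rho))."""
    rho = lam * mean_s
    if not 0 <= rho < 1:
        raise ValueError("need utilization rho = lam * E[S] < 1")
    return lam * second_moment_s / (2.0 * (1.0 - rho))


lam, mu = 0.9, 1.0
w_exponential = pk_wait(lam, 1 / mu, 2 / mu ** 2)    # E[S^2] = 2 / mu^2
w_deterministic = pk_wait(lam, 1 / mu, 1 / mu ** 2)  # E[S^2] = 1 / mu^2
print(w_exponential, w_deterministic)
```

At equal utilization the deterministic server halves the mean wait of the exponential one, consistent with the inequality above.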

3.8 Problems

  1. How does (3.2) follow from (3.1)?

  2. Give examples of Markov chains $x$ with the following properties:

      (a)   $x$ is irreducible and has no invariant distribution;
      (b)   $x$ is finite with more than one invariant distribution;
      (c)   $x$ is finite but not irreducible;
      (d)   $x$ is infinite and positive recurrent;
      (e)   $x$ is finite, irreducible and (3.5) does not hold.
  3. Show that if $x$ is finite the convergence in (3.5) is geometrically fast, i.e. $|\pi_n - \pi| \le K \alpha^n$ for
     some $0 < \alpha < 1$.
  4. A packet processor takes 1 μs to forward one packet. Packet arrivals are iid. In 1 μs, $k$
     packets arrive with probability $p_k \ge 0$. Let $x_n$ be the number of packets at the beginning of
     the $n$th μs in the (infinite) buffer.

      (a) Show that $x = \{x_n, n \ge 0\}$ is a Markov chain.
          Hint: express the evolution of $x$ as a stochastic dynamical system of the form

$$x_{n+1} = f(x_n, w_n)$$
            where $w = \{w_n, n \ge 0\}$ is an independent process. Show that in this case $x$ is always
            Markov, and if $w$ is iid, $x$ has stationary transition probabilities.
      (b) Show that $x$ is irreducible.
      (c) Write the balance equations.
      (d) Find conditions on the $p_k$ so that $x$ is positive recurrent. How would you find the
          expected forwarding delay faced by a packet?
      (e) Give an example of the $p_k$ so that $x$ is not positive recurrent. What happens to the queue
          size in this case?

  5. Find the rate matrix of the queue length process $x_t$ of the Jackson network in Figure 3.4.

  6. In the feedback network in Figure 3.9, at each link a packet leaves the system with probability
     0.5. For what values of $\gamma$ (in terms of $\mu_1, \mu_2$) is the system stable?

  7. In Figure 1.2 suppose the bridges have a throughput of 1 Gbps, L1,L2,L4 and L5 are 10 Mbps
     LANS and L3 is a 100 Mbps LAN. Suppose traffic originating in each LAN is 50 percent of
     LAN capacity.
     Suppose 90 percent of the traffic originating in LAN Li is destined for a station in the same
     LAN whereas 10 percent is destined for a station in LAN Lj, selected randomly.

      (a) Can the network support this traffic?

Figure 3.9: Network for problem 6 (two links, each with exit probability 0.5)

         (b) By what factor can the traffic increase, before the network becomes unstable?

     8. Let $x(n)$, $n \ge 0$, be a Bernoulli sequence with $P(x(n) = 1) = p$. Let $N$ be a random number
        defined below. For each case explain why or why not

$$E \sum_{n=0}^{N} x(n) = p\, E N$$

         (a)   $N = $ constant a.s.
         (b)   $N = \arg\min\{n \mid x(n+1) = 1\}$.
         (c)   $N = \arg\min\{n \mid x(n) = 1\}$.
         (d)   $N = \arg\min\{n \mid \sum_{i=0}^{n} x(i) = 100\}$.
Chapter 4


4.1 Packet switching

  •   Architectures

  •   IQ/HOL

  •   VoQ

  •   SQ

40                                                                     CHAPTER 4. SWITCHING

Figure 4.1: Packet switch architectures (IQ: HOL blocking; OQ: faster switch; VoQ: matching; SQ: reduces buffer size)

4.1.1 Architectures

Second generation PRIZMA architecture is $32 \times 32$, with 2 Gbps ports ... all on one chip [15]

Figure 4.2: Virtual HOL queue (left: input queues with per-destination arrival rate $\lambda/N$ and the virtual HOL queue $X_t$, fed by arrivals $A_t$ from nonblocked queues at rate $\rho$ and read at rate $\min(1, x_t)$; right: average delay in cell times versus $\rho$)

4.1.2 Input queues


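      • discrete time $t$, independent arrivals, uniform destination with prob $\lambda/N$
      • $N$ large so total number of port 1 arrivals is Poisson

$$P(n \text{ port 1 arrivals}) = e^{-\lambda} \frac{\lambda^n}{n!}$$

Virtual HOL queue of $X_t$ port 1 packets at head of queue:

$$X_{t+1} = (X_t - 1)^+ + A_t = X_t + A_t - 1(X_t > 0), \quad t \ge 0 \qquad (4.1)$$

$A_t$ = number of new port 1 packets that come to head of unblocked queues.
Suppose equilibrium probability of unblocked queue is $\Phi$. Then $A_t$ is Poisson with mean
$\rho = \lambda \times \Phi$. From (4.1)

$$E(X) = E(X) + E(A) - P(X > 0)$$

so $P(X > 0) = E(A) = \rho$. Square (4.1), and take expectations,

$$0 = E(A^2) + \rho + 2\rho E(X) - 2E(X) - 2\rho^2$$

Since $E(A^2) = \rho + \rho^2$,

$$E(X) = \frac{2\rho - \rho^2}{2(1 - \rho)} = \beta$$

$$E(\#\text{ blocked queues}) = N(1 - \Phi) = N\, E(X_t - 1)^+ = N(\beta - \rho)$$

For $\lambda = 1$ this gives $\beta = 1$, so $\rho = 2 - \sqrt{2} \approx 0.586$.
That is, about 41 % of switch bandwidth is not utilized.
Quick upper bound. Same switch. But at end of each cycle IQs are flushed. With $\lambda = 1$, switch
throughput is

$$N\, P(A > 0) = N[1 - P(A = 0)] = N\left[1 - \left(1 - \frac{1}{N}\right)^N\right]$$

Per port throughput is

$$1 - \left(1 - \frac{1}{N}\right)^N \to 1 - e^{-1} \approx 0.63$$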
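The two throughput figures for input queueing can be checked numerically. This sketch assumes saturated inputs and the large-$N$ Poisson approximation used above; the function names are illustrative.

```python
from math import exp, sqrt


def hol_mean(rho):
    """Mean virtual HOL queue length, beta = (2*rho - rho^2) / (2*(1 - rho))."""
    return (2 * rho - rho ** 2) / (2 * (1 - rho))


def saturation_throughput():
    """Root of beta(rho) = 1 in (0, 1), i.e. of rho^2 - 4*rho + 2 = 0."""
    return 2 - sqrt(2)


def flush_bound(n):
    """Per-port throughput when input queues are flushed every cycle."""
    return 1 - (1 - 1 / n) ** n


rho_sat = saturation_throughput()
print(rho_sat, hol_mean(rho_sat))    # about 0.586, and beta close to 1
print(flush_bound(2), flush_bound(16), 1 - exp(-1))
```

The flush bound decreases with $N$ toward $1 - e^{-1}$, so it stays above the HOL-blocking throughput of about 0.586 for every $N$.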

4.1.3 Virtual output queue

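   •   Each input port has $N$ VoQ’s, one per output port.

   •   If several input ports have packets for same destination, which one should be served?

   •   Assume iid arrivals $A_{ij}(t)$ with rates $\Lambda = \{\lambda_{ij}\}$ such that

$$\sum_i \lambda_{ij} < 1, \quad j = 1, \cdots, N$$
$$\sum_j \lambda_{ij} < 1, \quad i = 1, \cdots, N$$

   •   Service $S(t) = \{S_{ij}(t)\}$ such that

$$\sum_i S_{ij} \le 1, \quad j = 1, \cdots, N$$
$$\sum_j S_{ij} \le 1, \quad i = 1, \cdots, N$$

       Note: If = above, $S(t)$ is a permutation over $\{1, \cdots, N\}$.

   •   Queue lengths $L_{ij}(t)$ such that
$$L(t+1) = [L(t) - S(t)]^+ + A(t)$$

Question: Given $\Lambda$, find $S(t)$, based on past arrivals $\{A(s), s \le t\}$ and $L(t)$, so that $L$ is stable.
Conjecture: Always exists stabilizing matching $S(t)$.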

Figure 4.3: Matching in bipartite graphs (above: bipartite graph $G$ between inputs $I$ and outputs $O$, and a max size matching $M$) and counterexample (below: all demands 0.5, with its max size matchings)

Matching: Let $G = (V, E)$ be bipartite graph, i.e. $V = I \cup O$ and $E \subseteq I \times O$. $M \subseteq E$ is a matching if
no two edges in $M$ have a common vertex. Edges can be weighted.
Max size matching: $M$ has max number of edges. Best algorithm has running time $O(N^{2.5})$.

Max weight matching: $\sum \{w(e) \mid e \in M\}$ is max. Best algorithm $O(N^3 \log N)$.

Conjecture: Maximum size matching is stabilizing.
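Counterexample (Figure 4.3). Take $N = 3$, $\lambda_{11} = \lambda_{12} = 0.5$, $\lambda_{21} = \lambda_{32} = 0.5$.
Suppose $L_{11}(t) > 0$, $L_{12}(t) > 0$. Suppose $A_{21}(t) = A_{32}(t) = 1$ (which happens with prob 0.25).
Then there are 3 max size matchings and input 1 will be selected with prob 2/3. So,

$$\text{Prob}(S_{11}(t) + S_{12}(t) = 1) = 0.25 \times 2/3 + 0.75 \times 1 = 11/12 < 1 = \lambda_{11} + \lambda_{12}$$
i.e. $L_{11}(t) + L_{12}(t) \to \infty$.
Suppose all rates are $0.5 - \delta$. Still get instability with max size matching for $\delta > 0$ small.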
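The count of max size matchings in this counterexample can be confirmed by brute force. The sketch below enumerates edge subsets for the four non-empty VoQs, written as (input, output) pairs; it assumes, as in the argument above, that a max size matching is picked uniformly at random, and the function name is illustrative.

```python
from fractions import Fraction
from itertools import combinations


def max_size_matchings(edges):
    """Return all matchings of maximum size, by brute-force enumeration."""
    for k in range(len(edges), 0, -1):
        found = []
        for subset in combinations(edges, k):
            inputs = {i for i, _ in subset}
            outputs = {o for _, o in subset}
            if len(inputs) == k and len(outputs) == k:   # no shared vertex
                found.append(subset)
        if found:
            return found
    return [()]


# Non-empty VoQs when both A_21(t) and A_32(t) equal 1.
edges = [(1, 1), (1, 2), (2, 1), (3, 2)]
matchings = max_size_matchings(edges)
serving_input_1 = [m for m in matchings if any(i == 1 for i, _ in m)]
frac = Fraction(len(serving_input_1), len(matchings))
service_rate = Fraction(1, 4) * frac + Fraction(3, 4) * 1
print(matchings, frac, service_rate)
```

Input 1 is served at rate 11/12 while its queues receive traffic at rate 1, so they grow without bound.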
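
Try providing more service if queue lengths are large, i.e. choose

$$S(t) = \arg\max \Big\{ \sum_{i,j} L_{ij}(t) S_{ij} \ \Big|\ S \text{ is a permutation} \Big\}$$

Intuition. In continuous time approximation,

$$\dot{L}(t) = A(t) - S(t), \quad \text{neglecting negatives}$$
So, (below $L, A, S$ are vectors or matrices as appropriate)

$$\frac{d}{dt} |L(t)|^2 = 2 L(t)^T (A(t) - S(t)) \qquad (4.3)$$
$$E\Big\{ \frac{d}{dt} |L(t)|^2 \ \Big|\ L(t) \Big\} = 2 L(t)^T (\Lambda - S(t)) \qquad (4.4)$$
So choose
$$S^*(t) = \arg\max \{ L(t)^T S \mid S \text{ is a permutation} \}$$
Theorem. The assignment problem

$$\max_S\ L^T S \quad \text{subject to} \quad \sum_i S_{ij} \le 1, \quad \sum_j S_{ij} \le 1, \quad S_{ij} \ge 0$$
has an optimum solution that is a permutation.
Suppose in (4.4), $\sum_i \lambda_{ij} \le 1 - \delta$ and $\sum_j \lambda_{ij} \le 1 - \delta$. Then from Theorem,
$$L(t)^T (\Lambda - S^*(t)) \le -\delta\, |L(t)|$$
so with policy $S^*(t)$,

$$E\Big\{ \frac{d}{dt} |L(t)|^2 \ \Big|\ L(t) \Big\} \le -2\delta\, |L(t)|$$
from which stability follows by [16].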
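For a small switch, the max weight policy $S^*(t)$ can be illustrated by searching all permutations. This is a brute-force sketch (cost grows as $N!$), not the $O(N^3 \log N)$ algorithm mentioned earlier; the queue-length matrix is a made-up instance of the 3-port counterexample with long queues at the first input, and indices are 0-based.

```python
from itertools import permutations


def max_weight_permutation(L):
    """Return (perm, weight) maximizing sum_i L[i][perm[i]] over permutations."""
    n = len(L)

    def weight(perm):
        return sum(L[i][perm[i]] for i in range(n))

    best = max(permutations(range(n)), key=weight)
    return best, weight(best)


# Rows are inputs, columns are outputs; the first input has two long queues.
L = [[5, 5, 0],
     [1, 0, 0],
     [0, 1, 0]]
perm, w = max_weight_permutation(L)
print(perm, w)   # the first input is matched to one of its long queues
```

Unlike max size matching, the weighted choice never leaves the first input idle while its queues are the longest, which is the intuition behind the drift argument above.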

Figure 4.4: Switch with time-division shared bus and centralized shared memory (inputs 1 to N and outputs 1 to N attach to a shared bus; the shared memory holds queues 1 to N; the path taken by a packet from input 1 to output N is marked)

4.2 Shared queue

This architecture is used in most low speed packet processors: a time-division bus with a centralized
memory shared by all input and output lines, figure 4.4. Up to $N$ packets may arrive at one time
and up to $N$ may be read at one time, so memory bandwidth must be $2N$-times line rate. Assume
100 ns DRAM access time, 53B-wide bus, gives total bandwidth of $530 \times 8 = 4240$ Mbps (53 bytes every 100 ns is 530 Mbytes/s). For a
16-port ATM switch, this gives line rate of $4240/32 \approx 132$ Mbps.

$X_t$ = size of 1-list at beginning of slot $t$
$A_t$ = # of 1-packets arriving in slot $t$, $0 \le A_t \le N$

$$X_{t+1} = (X_t - 1)^+ + A_t = X_t + A_t - 1(X_t > 0) \qquad (4.5)$$
Following same argument that led to (4.2) gives

$$E(X) = \frac{\rho + \sigma^2 - \rho^2}{2(1 - \rho)}$$
where $\rho = P(X_t > 0)$ and $E(A^2) = \sigma^2 + \rho^2$. For the Poisson case, $\sigma^2 = \rho$, so

$$E(X) = \frac{2\rho - \rho^2}{2(1 - \rho)}$$

Shared vs separate queue
Suppose shared buffer is sized at            ´ µ·           ´ µ where ´ µ is the standard deviation of ´ µ. Then
                       separate buffer size                                  ´ µ·           ´µ
                        shared buffer size                                   ´ µ·                ´µ   ¾ ½ ¾

4.3 Output queue

In an output queued switch the switch fabric must run at $N$ times, and the output memory at
$N + 1$ times, the line rate. The queue length in port 1 is given by (4.5).

4.4 Problems

     1. Assume that $A_t$ is Poisson in (4.1) or (4.5). The mean queue size is given by (4.6).

         (a) Is $\{X_t, t \ge 0\}$ Markov? Why?
         (b) If $X_t$ is stationary, how would you find $\pi(n) = P(X_t = n)$?
Chapter 5


Crossbar switches need a controller to schedule a switch. The controller must find a good match,
eg. longest queue first, oldest cell first, etc.
It is too expensive to run a centralized matching algorithm with complexity $O(N^{2.5})$ or $O(N^3)$.
(A 40-byte packet at a line speed of 1 Gbps amounts to 320 ns/packet.)
So one may have to be satisfied with maximal matching, using distributed algorithms. Note that for
a fully-connected bipartite graph, a maximal matching is also maximum.
In case of QoS, the matching must satisfy some preferences.

50                                                                              CHAPTER 5. MATCHING

                          Man #     Preference list    Woman #      Preference list
                           1        1 2 3 4              1          1 3 4 2
                           2        2 1 4 3              2          3 4 1 2
                           3        3 2 4 1              3          2 1 4 3
                           4        3 4 2 1              4          1 2 3 4

5.1 The dating game

Consider a dating game with $N$ men and $N$ women and the preferences listed in the table above. The SMP (stable marriage problem) algorithm by
Gale and Shapley finds a “stable” match, eg.

$$(1, 1)\ \ (2, 4)\ \ (3, 2)\ \ (4, 3)$$

The algorithm is

     •   iterative: it proceeds in a sequence of proposals and (tentative) accepts

     •   upon termination, it returns a matching $\{(i, p(i))\}$
     •   guarantees stability.

A matching is unstable if it contains pairs $(i, p(i))$, $(j, p(j))$ such that
$$i \text{ prefers } p(j) \text{ to } p(i) \text{ and } p(j) \text{ prefers } i \text{ to } j .$$

$(i, p(j))$ is a blocking pair.
A stable matching has no blocking pair.

The GS algorithm (GSA). Say that a man or woman is

   •   free: if she/he is not engaged or matched to any man/woman

   •   engaged: if she/he is temporarily matched to some man/woman

   •   matched: if she/he is terminally matched

Figure 5.1: The GS algorithm (flowchart: initially all are free; while some man $m$ is free, $m$ proposes to $w$, the first woman he has not yet proposed to; if $w$ is free, $w$ becomes engaged to $m$; otherwise $w$ is currently engaged to $m'$, and if $w$ prefers $m$ to $m'$ then $w$ and $m$ are matched and $m'$ is set free, else $m$ continues free; stop when no man is free)

Algorithm will terminate. No man can be rejected by all women, because a woman can reject a
man only if she is engaged. Once she is engaged, she stays engaged. So if every woman rejects
$m$, they are all engaged, which is impossible: $N$ engaged women need $N$ distinct partners other than $m$, and there are only $N - 1$ other men. Alternatively: in each iteration, a man makes worse choices and a woman
makes better choices.
GSA finds a stable matching. Suppose $(i, p(i))$, $(j, p(j))$ are matched but $i$ prefers $p(j)$ to $p(i)$ and
$p(j)$ prefers $i$ to $j$. Then $i$ must have proposed to $p(j)$ before proposing to $p(i)$; $p(j)$ must have
rejected $i$ in favor of, say, $k$, preferred by her to $i$. But women make better and better choices, so
$p(j)$’s final match $j$ must be at least as good as $k$, who is better than $i$; hence $j$ is better than $i$, a contradiction.
Number of iterations is bounded by $N^2$: there are $N$ men and each makes at most $N$ proposals.
There may be more than one stable matching. Suppose $m_1$ prefers $w_1$ to $w_2$, $m_2$ prefers $w_2$ to $w_1$;
$w_1$ prefers $m_2$ to $m_1$, $w_2$ prefers $m_1$ to $m_2$.
Then $\{(m_1, w_1), (m_2, w_2)\}$ and $\{(m_1, w_2), (m_2, w_1)\}$ are both stable matches.
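The proposal loop of Figure 5.1 can be sketched in Python and run on the preference table at the start of this section. The function names and the dictionary layout are illustrative choices, and the order in which free men are picked is arbitrary (here first-in, first-out).

```python
def gale_shapley(men_prefs, women_prefs):
    """Men propose; returns a dict man -> woman describing the final matching."""
    rank = {w: {m: r for r, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}   # index of next woman to propose to
    fiance = {}                               # woman -> man she is engaged to
    free = list(men_prefs)
    while free:
        m = free.pop(0)
        w = men_prefs[m][next_choice[m]]      # first woman not yet proposed to
        next_choice[m] += 1
        current = fiance.get(w)
        if current is None:
            fiance[w] = m                     # w was free: engage
        elif rank[w][m] < rank[w][current]:
            fiance[w] = m                     # w prefers m: set current free
            free.append(current)
        else:
            free.append(m)                    # rejected: m continues free
    return {m: w for w, m in fiance.items()}


def is_stable(match, men_prefs, women_prefs):
    """True if the matching (man -> woman) has no blocking pair."""
    partner = {w: m for m, w in match.items()}
    for m, prefs in men_prefs.items():
        for w in prefs:
            if w == match[m]:
                break                         # the rest are less preferred by m
            if women_prefs[w].index(m) < women_prefs[w].index(partner[w]):
                return False                  # (m, w) is a blocking pair
    return True


men = {1: [1, 2, 3, 4], 2: [2, 1, 4, 3], 3: [3, 2, 4, 1], 4: [3, 4, 2, 1]}
women = {1: [1, 3, 4, 2], 2: [3, 4, 1, 2], 3: [2, 1, 4, 3], 4: [1, 2, 3, 4]}
match = gale_shapley(men, women)
print(match, is_stable(match, men, women))
```

On this table the run ends with man 1 matched to woman 1, man 2 to woman 4, man 3 to woman 2 and man 4 to woman 3, while the identity matching is unstable because man 4 and woman 3 form a blocking pair.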

Figure 5.2: $4 \times 4$ RRM showing $a_1$, $a_3$, $g_2$, $g_4$ pointers with $L(1,1), L(1,2), L(3,2), L(3,4) > 0$.

5.2 Round-robin matching

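Each input $i$ maintains accept pointer $a_i$. Each output $j$ maintains grant pointer $g_j$.
RRM cycle.
Step 1 Each $i$ requests all $j$ with $L(i, j) > 0$.
Step 2 Each $j$ grants next requesting input $i$ at or after current pointer value $g_j$, i.e.

$$i = \min\{ i' \ge g_j \mid L(i', j) > 0 \} \quad (\text{wrapping around modulo } N)$$
then increments $g_j = i + 1$.
Step 3 Each $i$ accepts next granted output $j$ at or after current pointer value $a_i$, i.e.

$$j = \min\{ j' \ge a_i \mid j' \text{ has granted } i \} \quad (\text{wrapping around modulo } N)$$
If grant has been accepted, increments $a_i = j + 1$. Figure 5.2 illustrates one RRM cycle. Initially,
all $a_i = 1$, and all $g_j = 1$. The input requests are
$$1 \to 1, \quad 1 \to 2, \quad 3 \to 2, \quad 3 \to 4$$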
So we have the following steps:

                                        ½            ¼             ¾                         ½       ¼      ¾
                                                     ¼                                               ¼
                         ½                           ½                          ½                    ½

                                        ½                          ¾                                        ½
                                                     ¼                                               ¼
                         ¾                           ¾                          ¾                    ¾

                                                                   ½                                        ½
                                                     ¼                                               ¼
                         ¿                           ¿                          ¿                    ¿

                                        ¿                                                                   ½
At the end of this cycle, the match is           ´½ ½µ ´¿ µ                 , and the pointer values are given above.
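The cycle of Figure 5.2 can be replayed in a few lines. This is a sketch under the assumptions that ports are numbered 1..N, pointers wrap modulo N, and requests are given as a set of (input, output) pairs; `rrm_cycle` is a hypothetical helper name.

```python
def rrm_cycle(requests, g, a, n):
    """Run one RRM request-grant-accept cycle.

    requests: set of (input, output) pairs with L(i, j) > 0, ports numbered 1..n.
    g, a: dicts holding grant pointers g[j] and accept pointers a[i]; updated in place.
    Returns the set of matched (input, output) pairs.
    """
    def next_at_or_after(ptr, candidates):
        # first candidate in the cyclic order ptr, ptr+1, ..., n, 1, ..., ptr-1
        return min(candidates, key=lambda x: (x - ptr) % n)

    grants = {}  # input -> outputs that granted it
    for j in range(1, n + 1):
        reqs = [i for (i, jj) in requests if jj == j]
        if reqs:
            i = next_at_or_after(g[j], reqs)
            grants.setdefault(i, []).append(j)
            g[j] = i % n + 1  # RRM advances the grant pointer whether or not i accepts
    match = set()
    for i, outs in grants.items():
        j = next_at_or_after(a[i], outs)
        match.add((i, j))
        a[i] = j % n + 1      # accept pointer moves one past the accepted output
    return match


n = 4
g = {j: 1 for j in range(1, n + 1)}
a = {i: 1 for i in range(1, n + 1)}
match = rrm_cycle({(1, 1), (1, 2), (3, 2), (3, 4)}, g, a, n)
print(sorted(match), g, a)
```

Output 2 ends the cycle unmatched even though input 3 requested it: its grant went to input 1, which accepted output 1 instead.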

5.2.1 Analysis of RRM

Under heavy load, the grant counters may get synchronized, reducing utilization. Consider $N = 2$,
$L(i,j) > 0$ all $i, j$. Then it is possible for $g_1 = g_2$ always as follows (pointer values are
those at the start of each cycle).

  Cycle   g_1 = g_2   Both outputs grant   Accept pointer   Match
    1         1            input 1            a_1 = 1        (1,1)
    2         2            input 2            a_2 = 1        (2,1)
    3         1            input 1            a_1 = 2        (1,2)
    4         2            input 2            a_2 = 2        (2,2)

At the end of the fourth cycle the situation repeats. Throughput is 50 percent.
Of course the following TDM cycle is also possible, and has throughput of 100 percent (pointer
values are those at the start of each cycle).

  Cycle   g_1   g_2   Match
    1      1     2    (1,1) (2,2)
    2      2     1    (1,2) (2,1)

Under heavy load, if grant counters get synchronized at any time (i.e. have the same value), they’ll
stay synchronized forever.
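A small simulation makes the two regimes concrete. The sketch below assumes a saturated switch (every VOQ non-empty, so the input at the grant pointer always requests), 1-based ports, and a hypothetical helper `rrm_saturated`.

```python
def rrm_saturated(n, g, a, slots):
    """Simulate RRM when every L(i, j) > 0. Returns the list of matches, one per slot."""
    history = []
    for _ in range(slots):
        grants = {}
        for j in range(1, n + 1):
            i = g[j]                 # all inputs request, so the pointed-to input wins
            grants.setdefault(i, []).append(j)
            g[j] = i % n + 1         # RRM: pointer advances even if the grant is declined
        match = []
        for i, outs in sorted(grants.items()):
            j = min(outs, key=lambda x: (x - a[i]) % n)
            match.append((i, j))
            a[i] = j % n + 1
        history.append(match)
    return history


def throughput(history, n):
    return sum(len(m) for m in history) / (n * len(history))


sync = rrm_saturated(2, {1: 1, 2: 1}, {1: 1, 2: 1}, slots=8)    # g_1 = g_2
tdm = rrm_saturated(2, {1: 1, 2: 2}, {1: 1, 2: 1}, slots=8)     # g_1 != g_2
print(sync[:4], throughput(sync, 2))
print(tdm[:2], throughput(tdm, 2))
```

Starting with equal grant pointers reproduces the four-cycle pattern (1,1), (2,1), (1,2), (2,2); starting with different pointers gives the TDM pattern.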
Under light load, the grant counters will be randomly distributed. The probability that a given input
is not served is

$$P = \left(\frac{N-1}{N}\right)^{N} \to \frac{1}{e} \approx 0.37 \quad (N \to \infty)$$
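As a quick numerical check of this expression, the sketch below evaluates it for a few port counts and compares it with $1/e$; `p_unserved` is just an illustrative name.

```python
import math


def p_unserved(n):
    """Probability that none of n independent, uniformly random grants picks a given input."""
    return ((n - 1) / n) ** n


for n in (2, 4, 16, 64):
    print(n, round(p_unserved(n), 4))
print("1/e =", round(math.exp(-1), 4))
```

The value rises from 0.25 at $N = 2$ toward $1/e$, so the limit is a reasonable approximation already for moderate switch sizes.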
[Figure: $2 \times 2$ switch with request rates $\lambda(1,1) = \lambda(1,2) = \lambda(2,1) = 1$ and
acceptance rates $\mu(1,1) = 1/4$, $\mu(1,2) = 3/4$, $\mu(2,1) = 1/2$.]

                         Figure 5.3: PIM can be unfair under heavy load

5.3 Parallel iterative matching, PIM

Step 1 Each unmatched input $i$ sends requests to every output $j$ such that $L(i,j) > 0$.
Step 2 Each output $j$ randomly picks one input $i$ from received requests and grants it.
Step 3 Each input $i$ randomly accepts one of received grants.
The I in PIM means that this cycle is repeated to improve match.
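The three steps and their repetition can be sketched as follows. The 0-based indexing, the `occupancy` matrix encoding of $L(i,j)$, and the choice to let only still-unmatched outputs take part in later iterations are assumptions of this sketch.

```python
import random


def pim(occupancy, iterations, rng):
    """Iterated random request-grant-accept matching.

    occupancy[i][j] = L(i, j). Returns dict input -> output.
    """
    n = len(occupancy)
    matched_in, matched_out = {}, set()
    for _ in range(iterations):
        # Step 1: each unmatched input requests every free output it has cells for
        requests = {}
        for i in range(n):
            if i in matched_in:
                continue
            for j in range(n):
                if j not in matched_out and occupancy[i][j] > 0:
                    requests.setdefault(j, []).append(i)
        # Step 2: each output grants one of its requests, chosen uniformly at random
        grants = {}
        for j, reqs in requests.items():
            grants.setdefault(rng.choice(reqs), []).append(j)
        # Step 3: each input accepts one of its grants, chosen uniformly at random
        for i, outs in grants.items():
            j = rng.choice(outs)
            matched_in[i] = j
            matched_out.add(j)
    return matched_in


occ = [[1, 1, 0],
       [1, 0, 0],
       [0, 1, 1]]
m = pim(occ, iterations=3, rng=random.Random(0))
print(m)
```

Every iteration that still has a request adds at least one pair, so after $N$ iterations no unmatched input has a cell for an unmatched output, i.e. the match is maximal.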

5.3.1 Analysis of PIM

It appears that with uniform iid traffic, PIM achieves maximal match in 3 iterations.
In heavy load, every input makes requests. Probability that input $i$ receives no grant in one round equals

$$P = \left(\frac{N-1}{N}\right)^{N} \to \frac{1}{e} \approx 0.37 \quad (N \to \infty)$$
PIM can be unfair. Figure 5.3 gives a $2 \times 2$ case where the request rate from input $i$ to output $j$ is
$\lambda(i,j)$. So requests $1 \to 1$, $1 \to 2$, $2 \to 1$ are made in each slot. The grant rates from output to
input will therefore be $\gamma(1,1) = \gamma(1,2) = 0.5$, $\gamma(2,1) = 1$, where $\gamma(j,i)$ is the rate at
which output $j$ grants input $i$. So input 1 will accept output 1 with probability $\mu(1,1) = 0.25$, and
output 2 with probability $\mu(1,2) = 0.75$; input 2 will accept output 1 with probability $\mu(2,1) = 0.5$.
Thus even though arrival rates for output port 1 are equal at input ports 1 and 2, the acceptance
rates are not the same.
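The acceptance probabilities can be obtained exactly by enumerating the grant choices of a single PIM iteration. The sketch below does this with rational arithmetic; `pim_accept_rates` is a hypothetical helper, and it assumes every listed request is present in every slot.

```python
from fractions import Fraction
from itertools import product


def pim_accept_rates(requests):
    """Exact per-slot acceptance probabilities for one PIM iteration.

    requests: list of (input, output) pairs that are requested in every slot.
    """
    outputs = sorted({j for _, j in requests})
    req_by_out = {j: [i for (i, jj) in requests if jj == j] for j in outputs}
    mu = {r: Fraction(0) for r in requests}
    # enumerate every combination of grant decisions, one granted input per output
    for choice in product(*(req_by_out[j] for j in outputs)):
        p = Fraction(1)
        for j in outputs:
            p *= Fraction(1, len(req_by_out[j]))   # outputs grant uniformly at random
        grants = {}
        for j, i in zip(outputs, choice):
            grants.setdefault(i, []).append(j)
        for i, outs in grants.items():
            for j in outs:
                mu[(i, j)] += p / len(outs)        # inputs accept uniformly at random
    return mu


mu = pim_accept_rates([(1, 1), (1, 2), (2, 1)])
print(mu)
```

Input 2 gets half of output 1, while input 1 gets only a quarter of it, although both request it in every slot.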

5.4 iSLIP matching

The detailed reference is [4]. RRM suffers from synchronization of the grant counters.
iSLIP modifies RRM slightly so that the grant counters are incremented only if the grant is accepted.
So step 2 of RRM is modified.
Step 2 Each output $j$ grants next requesting input $i$ at or after current pointer value $g_j$, i.e.

$$i = \min\{\, i' \ge g_j : L(i', j) > 0 \,\}$$
then increments $g_j = i + 1$ only if $i$ accepts output $j$.

5.4.1 Analysis of iSLIP

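Consider the situation $N = 2$, $L(i,j) > 0$ all $i, j$. In contrast with RRM, inputs 1 and 2 share outputs in
TDM fashion. Starting from $g_1 = g_2 = 1$ and $a_1 = a_2 = 1$ (pointer values are those at the start of
each cycle):

  Cycle   g_1   g_2   a_1   a_2   Grants                                      Match
    1      1     1     1     1    both outputs grant input 1                  (1,1)
    2      2     1     2     1    output 1 -> input 2, output 2 -> input 1    (1,2) (2,1)
    3      1     2     1     2    output 1 -> input 1, output 2 -> input 2    (1,1) (2,2)

After the first cycle the grant pointers differ, and both outputs are matched in every later cycle.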
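The desynchronizing effect of the modified Step 2 can be seen in a short simulation. As before this is a sketch for a saturated switch with 1-based ports and a single iteration per slot; `islip_saturated` is a hypothetical helper name.

```python
def islip_saturated(n, g, a, slots):
    """Simulate single-iteration iSLIP when every L(i, j) > 0."""
    history = []
    for _ in range(slots):
        grants = {}
        for j in range(1, n + 1):
            grants.setdefault(g[j], []).append(j)   # pointed-to input always requests
        match = []
        for i, outs in sorted(grants.items()):
            j = min(outs, key=lambda x: (x - a[i]) % n)
            match.append((i, j))
            a[i] = j % n + 1
            g[j] = i % n + 1    # iSLIP: only the accepted output moves its grant pointer
        history.append(match)
    return history


hist = islip_saturated(2, {1: 1, 2: 1}, {1: 1, 2: 1}, slots=6)
for slot, m in enumerate(hist, start=1):
    print(slot, m)
```

Even though both grant pointers start equal, output 2 keeps its pointer in the first slot because its grant was declined, and from the second slot on both outputs are busy.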

5.4.2 Priority iSLIP

Suppose there are $P$ priority levels. Then each input $i$ maintains $P \times N$ VoQs, with $L_p(i,j)$ the
buffer occupancy of priority $p$ and output $j$. The switch gives strict priority, i.e. serves $L_p(i,j)$ only
if $L_q(i,j) = 0$, $q > p$. Each input maintains counter $a_p(i)$ and each output maintains $g_p(j)$ for each
priority level.
Step 1 Each input $i$ selects highest priority level $P(i,j)$ with non-empty queue to output $j$.
Step 2 Output $j$ determines highest priority level $P(j) = \max_i P(i,j)$. The output then chooses one
input among those inputs that have requested at level $P(j)$. The output maintains separate pointer
$g_p(j)$ for each level, and chooses input $i_p$ among requests at level $p = P(j)$ in the same round-robin
scheme. The output notifies each input whether or not its request is granted. The pointer
$g_p(j) = i_p + 1$ is incremented only if granted input $i_p$ accepts output $j$.
Step 3 If input $i$ receives any grants, it determines the highest priority level grant, say $p$. The
input then chooses one grant, say from output $j_p$, among the requests granted at this level. This is
done according to the counter $a_p(i)$, which is incremented $a_p(i) = j_p + 1$. The input then notifies
each output whether or not its grant was accepted.

5.4.3 Threshold iSLIP

It may be better to select a weighted maximal match with weights corresponding to queue length.
If queue lengths are quantized in threshold levels $t_1 < t_2 < \cdots < t_P$, then priorities may be assigned
accordingly: queue $(i,j)$ gets priority $p$ when $t_p \le L(i,j) < t_{p+1}$.
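Mapping a queue length to its threshold level is a sorted-list lookup. The sketch below uses the standard `bisect` module; the threshold values are made up for illustration, and returning 0 for queues shorter than $t_1$ is an assumption of this sketch.

```python
from bisect import bisect_right


def threshold_priority(queue_len, thresholds):
    """Return p with t_p <= queue_len < t_{p+1}, or 0 if queue_len < t_1.

    thresholds: increasing list [t_1, ..., t_P].
    """
    return bisect_right(thresholds, queue_len)


thresholds = [1, 4, 16]   # hypothetical t_1, t_2, t_3
levels = [threshold_priority(q, thresholds) for q in (0, 1, 3, 4, 100)]
print(levels)
```

The resulting level can then be used as the priority in priority iSLIP, so longer queues are served first.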

5.4.4 Weighted iSLIP

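Suppose bandwidth from $i$ to $j$ is to be shared according to the ratio $f(i,j) = n(i,j)/d(i,j)$ subject
to $\sum_i f(i,j) \le 1$, $\sum_j f(i,j) \le 1$.
In iSLIP each counter is an ordered circular list $S = \{1, \cdots, N\}$. Now expand the list at output $j$ to
$S(j) = \{1, \cdots, W(j)\}$ where $W(j)$ is the lowest common denominator of the $f(i,j)$, i.e. the least
common multiple of the $d(i,j)$ over $i$, and input $i$ appears $W(j) \times n(i,j)/d(i,j)$ times in the list.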
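The expanded list for one output can be built directly from the fractions. In this sketch the shares are hypothetical, `math.lcm` needs Python 3.9 or later, and the order of entries within the list (grouped by input) is an assumption, since the text only fixes how often each input appears.

```python
from fractions import Fraction
from math import lcm


def weighted_schedule(shares):
    """Build the expanded round-robin list for one output.

    shares: dict input -> Fraction f(i, j), with the fractions summing to at most 1.
    Returns (W, list of inputs); slots beyond the sum of shares are left unassigned.
    """
    w = lcm(*(f.denominator for f in shares.values()))
    slots = []
    for i, f in sorted(shares.items()):
        slots += [i] * int(w * f)     # input i appears W * n/d times
    return w, slots


w, slots = weighted_schedule({1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)})
print(w, slots)
```

A practical scheduler would probably interleave the entries rather than group them, to avoid serving one input in long bursts.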
[Figure: state of input queues ($N^2$ bits) feeding grant arbiters $1, \ldots, N$, followed by accept
arbiters $1, \ldots, N$ and the decision register.]

   Figure 5.4: Interconnection of the input and output arbiters to construct the iSLIP scheduler

5.4.5 Implementation

Figure 5.4 shows how the iSLIP scheduler for an $N \times N$ switch is constructed from the input and
output arbiters.

   • The state memory records whether an input queue is empty. From this memory, an $N^2$-bit
     wide vector presents $N$ bits to each of the $N$ output grant arbiters, representing Step 1
     (request).

   • The grant arbiters select a single input among the contending requests to implement Step 2
     (grant).

   • The grant decisions are presented to the $N$ accept arbiters, each of which selects at most one
     output on behalf of its input to implement Step 3 (accept).

   • The final decision is stored in the decision register and the values of the $g_j$ and $a_i$ pointers
     are updated. The decision register is used to notify each input which cell to transmit and to
     configure the crossbar switch.
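The data flow through the arbiters can be mimicked in software with bit vectors. The sketch below is only an illustration: the bit layout (bit $i \cdot N + j$ set when VOQ $(i,j)$ is non-empty, 0-based) and the names `slice_requests` and `rr_arbiter` are assumptions, and a hardware arbiter would use a priority encoder rather than a loop.

```python
def slice_requests(state_bits, n):
    """Split the n*n-bit state vector into one n-bit request vector per output.

    Assumed layout: bit i*n + j of state_bits is 1 when VOQ (i, j) is non-empty.
    In the result, bit i of entry j is 1 when input i requests output j.
    """
    return [sum(((state_bits >> (i * n + j)) & 1) << i for i in range(n))
            for j in range(n)]


def rr_arbiter(req_bits, pointer, n):
    """Return the first requesting index at or after pointer (cyclically), or None."""
    for k in range(n):
        idx = (pointer + k) % n
        if (req_bits >> idx) & 1:
            return idx
    return None


# VOQs (0,0), (0,1) and (1,0) non-empty in a 2 x 2 switch
state = 0b0111
per_output = slice_requests(state, 2)
grants = [rr_arbiter(r, pointer=1, n=2) for r in per_output]
print(per_output, grants)
```

The same `rr_arbiter` function can serve as an accept arbiter by feeding it the vector of grants received by one input together with that input's accept pointer.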
Chapter 6

Network processors

Figure 6.1 is a logical diagram of how a network processor (NP) fits in a system design. The
NP is located between the physical layer (MAC or framer) and the switch fabric. In the figure
the Serializer/Deserializer (SERDES) is the interface between the NP and the switch fabric. The
framer or MAC presents a packet to the NP, which must examine it, parse it, do the necessary edits
and database lookups to enforce various policies at layers 3-7 (forwarding, queuing, labels), and
exchange messages with the switch controller. The NP is in the data path.


                  Figure 6.1: Location of NP in a logical diagram. Source [17].

6.0.6 NP operation

Figure 6.2 shows a generic block diagram. Data from multiple physical interfaces or the switch fabric
are transferred to/from the NP. The bitstream processors receive the serial stream of packet data and
extract the information needed to process the data, such as MAC or IP source/destination address,
TOS bits, TCP port numbers, MPLS or VLAN tags. The packet is then written into the packet
buffer memory. The extracted information is fed to the processor complex, the programmable unit of
the NP. Under program control, the processor may extract additional information from the packet and
submit relevant information to the search engine, which looks up the MAC or IP address, classifies
the packet, or does a VCI/VPI lookup using the routing/bridging tables. When the packet is
transmitted through the bitstream processor, the necessary modifications to the packet header are
performed.

[Figure: blocks labelled packet buffer, buffer manager/scheduler, general purpose CPU, search engine
with routing and bridging tables, HW assists, and bitstream processors connected to/from the
PHY/switch fabric.]

                        Figure 6.2: Generic NP architecture. Source [15].

           Figure 6.3: Time to process 40B packets at different line rates. Source [17]

6.0.7 Speed of operations

Figure 6.3 shows the time available to process back to back 40B packets at different line speeds. At
1 Gbps, the time to process one packet is 360 ns. Using 10-ns SRAM permits a maximum of 36
memory accesses per packet. Thus faster line rates can be accommodated only by processing several
packets simultaneously in a pipelined or parallel fashion.
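The per-packet budget is simple arithmetic, sketched below. Note that the raw serialization time of 40 bytes at 1 Gbps is 320 ns; the 360 ns quoted above is larger, so the sketch exposes a hypothetical `overhead_bytes` parameter. A value of 5 bytes reproduces 360 ns, but what overhead the source figure actually includes is not stated here, so treat that value as an assumption.

```python
def packet_time_ns(line_rate_gbps, packet_bytes=40, overhead_bytes=0):
    """Time on the wire for one packet, in ns (bits divided by Gbit/s gives ns)."""
    return (packet_bytes + overhead_bytes) * 8 / line_rate_gbps


def max_memory_accesses(line_rate_gbps, access_ns=10, **kwargs):
    """Number of sequential memory accesses that fit in one packet time."""
    return int(packet_time_ns(line_rate_gbps, **kwargs) // access_ns)


for rate in (1, 2.5, 10, 40):
    print(rate, "Gbps:", packet_time_ns(rate), "ns,",
          max_memory_accesses(rate), "accesses at 10 ns")
print("with 5 overhead bytes at 1 Gbps:", max_memory_accesses(1, overhead_bytes=5))
```

At 10 Gbps only three 10-ns accesses fit in a packet time and at 40 Gbps none do, which is why the text concludes that packets must be processed in a pipelined or parallel fashion.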

6.0.8 Packet buffer memory

For the architecture of Figure 6.2, each packet header byte may traverse the memory interface at
least four times:

     • write inbound packet

     • read header into processor complex

     • write back to memory

     • read for outbound transmission

So for 40 byte back-to-back packets the required memory interface capacity is at least four times
the line rate, i.e. 10-160 Gbps for line rates of 2.5-40 Gbps.
Chapter 7

Distributed Switch

The single switch fabric architectures cannot scale beyond 32 ports. Hence the need for distributed
architectures. We’ll study blocking and routing properties.

7.1 Blocking

A switch network is a graph of switches, each with a set of input and output ports as in Figure 7.1.
There is a set of $N$ input nodes and a set of $M$ output nodes. Each internal link has a capacity of 1.
A configuration is a set of input-output pairs

$$C = \{(i_1, o_1), \cdots, (i_r, o_r)\}$$
with distinct inputs and outputs and disjoint routes connecting $i_1$ to $o_1$, $\cdots$, $i_r$ to $o_r$.
A distributed switch (DS) is strictly non-blocking (SNB) if given a configuration $C$ and a pair $(i, o)$
of unused input and output not in $C$, there exists a route from $i$ to $o$ disjoint from the routes of $C$.
It is rearrangeably non-blocking (RNB) if given any partial permutation of input-output pairs, there
is a configuration that includes those pairs.
We first study modular architectures.

Figure 7.1: A distributed switch is a network of switches with a certain number of input and output
ports, $N$ input nodes and $M$ output nodes. Each internal link has capacity 1.

7.2 Clos network

This is a 3-stage network as illustrated in Figure 7.2. The Clos network is specified by 5 numbers
$(\mathrm{IN}, N_1, N_2, N_3, \mathrm{OUT})$. There are $N_1$, $N_2$ and $N_3$ switches in stages 1, 2 and 3 respectively.
The number of input-output ports and connectivity of the switches are as shown.
Theorem A Clos network with RNB switch modules is RNB iff

$$N_2 \ge \max\{\mathrm{IN}, \mathrm{OUT}\}$$

A Clos network with SNB switch modules is SNB iff

$$N_2 \ge \mathrm{IN} + \mathrm{OUT} - 1$$

The total number of input lines is $\mathrm{IN} \times N_1$. The total number of output lines is $\mathrm{OUT} \times N_3$.
The Clos network in the figure is SNB, since $N_2 = 5 \ge 3 + 2 - 1 = 4$. It has 9 input lines and 8
output lines.
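The two conditions of the theorem and the line counts are easy to evaluate for any parameter set. A minimal sketch, with `clos_properties` as an illustrative name and the modules themselves assumed to be SNB (respectively RNB):

```python
def clos_properties(IN, n1, n2, n3, OUT):
    """Line counts and non-blocking tests for a Clos network (IN, N1, N2, N3, OUT)."""
    return {
        "input_lines": IN * n1,
        "output_lines": OUT * n3,
        "rnb": n2 >= max(IN, OUT),      # rearrangeably non-blocking condition
        "snb": n2 >= IN + OUT - 1,      # strictly non-blocking condition
    }


fig = clos_properties(3, 3, 5, 4, 2)    # the network of Figure 7.2
print(fig)
print(clos_properties(3, 3, 3, 4, 2))   # with N2 = 3: RNB but not SNB
```

For the network of Figure 7.2 both tests pass; reducing the middle stage to three switches keeps it rearrangeable but no longer strictly non-blocking.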
[Figure: Clos (3, 3, 5, 4, 2) with IN = 3, $N_1 = 3$, $N_2 = 5$, $N_3 = 4$, OUT = 2.]

             Figure 7.2: A Clos network is fully specified by $(\mathrm{IN}, N_1, N_2, N_3, \mathrm{OUT})$

7.3 Recursive construction

We can recursively construct an $N \times N$ SNB switch with $N = p \times q$ input and output lines as in Figure
7.3. The result is a $(p, q, 2p - 1, q, p)$ switch. It is SNB if each module is SNB.
[Figure labels: $N = p \times q$ lines on each side; $q$ planes in the first and last stages; $2p - 1$ planes in
the middle stage; module sizes $p \times (2p - 1)$ and $(2p - 1) \times q$.]

      Figure 7.3: Recursive construction of a SNB Clos network
[Figure labels: $N = p \times q$ lines on each side; $q$ planes in the first and last stages; $p \times p$ modules.]

                     Figure 7.4: Recursive construction of a RNB Clos network

Figure 7.4 is an $N \times N$ RNB switch if each module is RNB.

[Figure labels: $N$ inputs and $N$ outputs around two $N/2 \times N/2$ subnetworks; $2 \log_2 N - 1$ stages of
$N/2$ $2 \times 2$ switches.]

                                  Figure 7.5: The Benes switch

Figure 7.5 is an $N \times N$ RNB switch made up of $2 \times 2$ switch modules.

[Figure: $4 \times 4$ Benes switch with inputs and outputs numbered 1 to 4.]
       1-->1 as shown; 4-->4 as shown; cannot accommodate
                             Figure 7.6: Benes switch is not SNB

Figure 7.6 shows that a Benes switch is not SNB.
Figure 7.7 illustrates an algorithm to rearrange existing connections in order to accommodate a new
connection.
Question 1: Can you supply a proof?
Question 2: Is there an algorithm to accommodate new connections in an arbitrary network of Figure

           Figure 7.7: Algorithm to add a new connection for a RNB switch

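In a Benes switch, feasible flows may require multiple paths. Figure 7.8 and 7.9 show this. Note:

$$\begin{bmatrix} \epsilon & 0 & 1-\epsilon & 0 \\ \epsilon & 0 & 0 & 1-\epsilon \\ 1-2\epsilon & 0 & \epsilon & \epsilon \\ 0 & 1 & 0 & 0 \end{bmatrix}
= (1 - 2\epsilon) \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}
+ \epsilon \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}
+ \epsilon \begin{bmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$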
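The rate matrix tabulated in Figures 7.8 and 7.9 is doubly stochastic, so it can be written as a weighted sum of permutation matrices. The sketch below checks one such decomposition with exact rational arithmetic; the value $\epsilon = 1/10$, the 0-based column encoding of the permutations, and the helper names are illustrative assumptions.

```python
from fractions import Fraction


def perm_matrix(cols):
    """Permutation matrix with a 1 in row r, column cols[r] (0-based)."""
    n = len(cols)
    return [[1 if cols[r] == c else 0 for c in range(n)] for r in range(n)]


def combine(terms):
    """Weighted sum of equally sized matrices; terms is a list of (weight, matrix)."""
    n = len(terms[0][1])
    return [[sum(w * P[r][c] for w, P in terms) for c in range(n)] for r in range(n)]


e = Fraction(1, 10)
M = [[e, 0, 1 - e, 0],
     [e, 0, 0, 1 - e],
     [1 - 2 * e, 0, e, e],
     [0, 1, 0, 0]]
P1 = perm_matrix([2, 3, 0, 1])   # 1->3, 2->4, 3->1, 4->2
P2 = perm_matrix([0, 3, 2, 1])   # 1->1, 2->4, 3->3, 4->2
P3 = perm_matrix([2, 0, 3, 1])   # 1->3, 2->1, 3->4, 4->2
decomposition_ok = combine([(1 - 2 * e, P1), (e, P2), (e, P3)]) == M
row_sums = [sum(row) for row in M]
col_sums = [sum(M[r][c] for r in range(4)) for c in range(4)]
print(decomposition_ok, row_sums, col_sums)
```

Each permutation on its own can be routed without splitting, but input 1, for example, sends rate $\epsilon$ to output 1 and $1 - \epsilon$ to output 3, and the figures show that carrying all such flows at once needs more than one path for some of them.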
[Figure: rate matrix (rows are inputs, columns are outputs) and one way of routing the resulting
split flows through the switch.]

              1       2       3       4
       1      e               1-e
       2      e                       1-e
       3      1-2e            e       e
       4              1

                                  Figure 7.8: Split flow 1
[Figure: the same rate matrix (rows are inputs, columns are outputs) and a second way of routing
the split flows through the switch.]

              1       2       3       4
       1      e               1-e
       2      e                       1-e
       3      1-2e            e       e
       4              1

                                          Figure 7.9: Split flow 2
[Figure: network with terminals labelled 1, 2, 3 on each side.]

Figure 7.10: Max flow for single commodity is 3 and flows are integers; in multi-commodity case,
max flows are 0.5 and non-integer

In a Clos switch, permutations can be achieved without splitting flows. In a general multi-commodity
case this is not so. Figure 7.10 shows that if this is a single commodity problem, the maximum flow
is 3 and all flows are 1 (integer).
However, if flows 1, 2 and 3 are distinct commodities, the max flows are 0.5 each, and not integer.
[Figure: two copies of the network of Figure 7.10, with terminals labelled 1, 2, 3, connected in
parallel.]

Figure 7.11: Two copies of figure 7.10 are connected in parallel. Achieving flows of 1,1,1 requires
splitting flows.

Figure 7.11 shows that a feasible permutation may require splitting flows. The green and cyan
flows must be connected in parallel similarly to the red flow.

 [1] J. Walrand and P. Varaiya. Chapter 12, Switching. High performance communication networks,
     2nd edition, 2000.

 [2] M.J. Karol, M. Hluchyj and S. Morgan. Input vs output queueing on a space-division packet
     switch. IEEE Trans Comm, COM-35(12): 1347-56, Dec. 1987.

 [3] T.E. Anderson, S. Owicki, J. Saxe and C.P. Thacker. High-speed scheduling for local area
     networks. ACM Trans Computer Systems, 11(4): 319-52, Nov. 1993.

 [4] N. McKeown. iSLIP: a scheduling algorithm for input-queued switches. IEEE Trans
     Networking, 7(2), April 1999.

 [5] N. McKeown, V. Anantharam and J. Walrand. Achieving 100% throughput in an input-queued
     switch. Proc. Infocom ’96, vol 1: 296-302.

 [6] B. Prabhakar and N. McKeown. On the speedup required for combined input and output
     queued switching. Automatica, 35(12), Dec. 1999.

 [7] J.F. Hayes, R. Breault and M.K. Mehmet-Ali. Performance analysis of a multicast switch.
     IEEE Trans Comm, COM-39(4): 581-87, April 1991.

 [8] B. Prabhakar, N. McKeown and R. Ahuja. Multicast scheduling for input-queued switches. J.
     Selected Areas in Comm, 15(5): 855-66, June 1997.

 [9] M. Waldvogel, G. Varghese, J. Turner and B. Plattner. Scalable high speed IP routing lookups.
     ACM Sigcomm ’97, September 1997.

[10] A. Demers, S. Keshav and S. Shenker. Analysis and simulation of a fair queueing algorithm.
     ACM Sigcomm ’89, Computer Communication Review, 19(4): 1-12, 1989.

[11] A. Parekh and R. Gallager. A generalized processor sharing approach to flow control of
     integrated services networks: the single node case. IEEE Trans Networking, 1(3): 344-57, June

[12] A. Parekh and R. Gallager. A generalized processor sharing approach to flow control of
     integrated services networks: the multiple node case. IEEE Trans Networking, 2(2): 137-50, April

[13] S. Floyd and V. Jacobson. Random early detection. IEEE Trans Networking, 1(4): 397-413,
     August 1993.

[14] I. Stoica, S. Shenker and H. Zhang. Core-stateless fair queuing: achieving approximately fair
     bandwidth allocations in high speed networks. ACM Sigcomm ’98, 1998.

[15] W. Bux, W.E. Denzel, T. Engbersen, et al. Technologies and building blocks for fast packet
     forwarding. IEEE Communications Magazine, 39(1): 70-77, January 2001.

[16] P.R. Kumar and S. Meyn. Stability of queuing networks and scheduling policies. IEEE Trans.
     Automatic Control, 40(2), February 1995.

[17] A. Deb. Building a network-processor based system. Integrated Communications Design,
     December 2000. Available at
