On Chip Communication Architectures(2)


     Ben Abdallah, Abderazek
      The University of Aizu
    E-mail: benab@u-aizu.ac.jp

         KUST University, March 2011
       Part 3
Routing Algorithms
 Deterministic Routing
   Oblivious Routing
   Adaptive Routing

           Routing Basics
Once the topology is fixed
Routing algorithm determines path(s)
 from source to destination
They must prevent deadlock, livelock,
 and starvation

            Routing Deadlock

 Without routing restrictions, a resource
 cycle can occur
   Leads to deadlock

       Deadlock Definition
Deadlock: A packet does not reach its
destination, because it is blocked at
some intermediate resource
Livelock: A packet does not reach its
destination, because it enters a cyclic
path
Starvation: A packet does not reach its
destination, because some resource
does not grant access (while it grants
access to other packets)
  Routing Algorithm Attributes
Number of destinations
  Unicast, Multicast, Broadcast?
Adaptivity
  Deterministic, Oblivious or Adaptive?
Implementation (Mechanisms)
  Source or node routing?
  Table or circuit?

Deterministic Routing

       Deterministic Routing
Always choose the same path
 between two nodes
  Easy to implement and to make deadlock-free
  Does not use path diversity and is thus bad at
   load balancing
  Packets arrive in order

         Deterministic Routing -
  Example: Destination-Tag Routing in Butterfly Networks

 Depends on the destination address only (not on source)

  The destination address in binary is 5 = 101 = down, up,
  down, selects the route.

  The destination address interpreted as quaternary
  digits, 11 = 1011(2) = 23(4), selects the route.

Note: Starting from any source and
using the same pattern always routes to destination.
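
The digit-by-digit selection above can be sketched in a few lines. This is only an illustration: the function name and the convention 0 = up, 1 = down are assumptions chosen to match the "101 = down, up, down" example on the slide.

```python
def destination_tag_route(dest, stages, radix=2):
    """Return the output port used at each stage, most significant digit first.

    Hypothetical helper: each stage of a radix-k butterfly consumes one
    base-k digit of the destination address.
    """
    digits = []
    for _ in range(stages):
        digits.append(dest % radix)
        dest //= radix
    return digits[::-1]


# Binary butterfly, 3 stages: destination 5 = 101 (1 = down, 0 = up)
print(destination_tag_route(5, stages=3))
# Radix-4 butterfly, 2 stages: destination 11 = 23 in base 4
print(destination_tag_route(11, stages=2, radix=4))
```

The source address never appears in the computation, which is why any source reaches the destination with the same port pattern.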
       Deterministic Routing-
          Dimension-Order Routing

For n-dimensional hypercubes and
 meshes, dimension-order routing
 produces deadlock-free routes

It is called XY routing in 2-D mesh and
 e-cube routing in hypercubes

Dimension-Order Routing -
    XY Routing Algorithm

[Figure: XY routing algorithm for 2-D mesh]
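
Since the figure with the algorithm is not available here, the following is a minimal sketch of XY routing, assuming nodes are addressed as (x, y) tuples and ports are named E, W, N, S with X for ejection (the port names follow the source-routing tables later in these slides; the function names are illustrative).

```python
def xy_next_hop(cur, dst):
    """One hop of XY routing: finish the x dimension first, then y."""
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:
        return "E" if dx > cx else "W"
    if cy != dy:
        return "N" if dy > cy else "S"
    return "X"  # arrived: eject the packet


def xy_route(src, dst):
    """Full port sequence from src to dst in a 2-D mesh."""
    step = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
    cur, route = src, ""
    while True:
        port = xy_next_hop(cur, dst)
        route += port
        if port == "X":
            return route
        cur = (cur[0] + step[port][0], cur[1] + step[port][1])


print(xy_route((0, 0), (2, 1)))  # EENX
```

Because a packet never turns from the y dimension back into x, no cycle of channel dependencies can form, which is the reason for deadlock freedom.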
     Deterministic Routing -
      E-cube Routing Algorithm

Dimension order routing algorithm for Hypercubes
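
A sketch of e-cube routing under the assumption that node addresses are n-bit integers and that dimensions are corrected from bit 0 upward (the slides do not state the order, so that choice is hypothetical):

```python
def ecube_route(src, dst, n):
    """Nodes visited when differing address bits are fixed in order 0..n-1."""
    path, cur = [src], src
    for bit in range(n):
        if (cur ^ dst) & (1 << bit):  # addresses differ in this dimension
            cur ^= 1 << bit           # traverse the link in this dimension
            path.append(cur)
    return path


# 3-cube: 000 -> 001 -> 101
print([format(v, "03b") for v in ecube_route(0b000, 0b101, 3)])
```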

Oblivious Routing

Oblivious (unconscious) Routing

Always choose a route without
 knowing about the state of the
 network
  Random algorithms that do not consider
   the network state are oblivious
  Include deterministic routing
   algorithms as a subset

     Minimal Oblivious Routing
   Minimal oblivious routing attempts to
    achieve the load balance of
    randomized routing without giving up
    the locality
   This is done by restricting routes to
    minimal paths
   Again routing is done in two steps
    1. Route to random node
    2. Route to destination

       Minimal Oblivious Routing -
Idea: For each packet
randomly determine a
node x inside the
minimal quadrant, such
that the packet is
routed from source
node s to x and then to
destination node d

Assumption: At each
node routing in x or y
direction is allowed.

                          03   13   23   33
                          02   12   22   32
                          01   11   21   31
                          00   10   20   30

           Minimal Oblivious Routing -
   For each node x in the minimal
    quadrant (00, 10, 20, 01, 11, 21)
    ◦ Determine a minimal
      route via x
   Start with x = 00
    ◦ Three possible routes:
       (00, 01, 11, 21) (p=0.33)
       (00, 10, 20, 21) (p=0.33)
       (00, 10, 11, 21) (p=0.33)

         Minimal Oblivious Routing -
   x = 01
    ◦ One possible route:
       (00, 01, 11, 21) (p=1)
   x = 10
    ◦ Two possible routes:
       (00, 10, 20, 21) (p=0.5)
       (00, 10, 11, 21) (p=0.5)
   x = 11
    ◦ Two possible routes:
       (00, 10, 11, 21) (p=0.5)
       (00, 01, 11, 21) (p=0.5)
   x = 20
    ◦ One possible route:
       (00, 10, 20, 21) (p=1)
   x = 21
    ◦ Three possible routes:
       (00, 01, 11, 21) (p=0.33)
       (00, 10, 20, 21) (p=0.33)
       (00, 10, 11, 21) (p=0.33)

           Minimal Oblivious Routing -
   Adding the
    probabilities on each link
   Example, link (00,01)
    ◦   P=1/3, x = 00
    ◦   P=1, x = 01
    ◦   P=0, x = 10
    ◦   P=1/2, x = 11
    ◦   P=0, x = 20
    ◦   P=1/3, x = 21
    ◦   P(00,01) = (2*1/3 + 1/2 + 1)/6 = 2.17/6
        Minimal Oblivious Routing -
 Results:
  ◦ Load is not very
    balanced
  ◦ Path between node
    10 and 11 is very
    seldom used
 Good locality
  performance is
  achieved at the expense
  of worst-case performance

 Link loads for source 00 and destination 21:
  (01,11) p = 2.17/6   (11,21) p = 3.83/6
  (00,01) p = 2.17/6   (10,11) p = 1.67/6   (20,21) p = 2.17/6
  (00,10) p = 3.83/6   (10,20) p = 2.17/6
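
The link probabilities for source 00 and destination 21 can be recomputed with a short script. It is a sketch under the assumptions used in the example: the intermediate node x is drawn uniformly from the minimal quadrant, and all minimal routes via x are equally likely. Nodes are written as (x, y) tuples, so node 21 is (2, 1); the function names are illustrative.

```python
from fractions import Fraction


def minimal_paths(a, b):
    """All minimal paths from a to b in a mesh, as lists of (x, y) nodes."""
    if a == b:
        return [[a]]
    (ax, ay), (bx, by) = a, b
    paths = []
    if ax != bx:
        nxt = (ax + (1 if bx > ax else -1), ay)
        paths += [[a] + p for p in minimal_paths(nxt, b)]
    if ay != by:
        nxt = (ax, ay + (1 if by > ay else -1))
        paths += [[a] + p for p in minimal_paths(nxt, b)]
    return paths


def link_loads(s, d):
    """Probability that a packet from s to d crosses each directed link."""
    xs = range(min(s[0], d[0]), max(s[0], d[0]) + 1)
    ys = range(min(s[1], d[1]), max(s[1], d[1]) + 1)
    quadrant = [(x, y) for x in xs for y in ys]
    loads = {}
    for mid in quadrant:
        routes = [p1 + p2[1:]
                  for p1 in minimal_paths(s, mid)
                  for p2 in minimal_paths(mid, d)]
        weight = Fraction(1, len(quadrant) * len(routes))
        for route in routes:
            for link in zip(route, route[1:]):
                loads[link] = loads.get(link, 0) + weight
    return loads


loads = link_loads((0, 0), (2, 1))
for (u, v), p in sorted(loads.items()):
    print(u, "->", v, float(p * 6), "/ 6")
```

Link (00,01) comes out as 13/36, which is the 2.17/6 of the worked example, and link (10,11) is the least used one.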

Adaptive Routing
   (route influenced by traffic along the way)

            Adaptive Routing
 Uses network state to make routing
  decisions
  Buffer occupancies often used
  Coupled with flow control mechanism
Local information readily available
  Global information more costly to obtain
  Network state can change rapidly
  Use of local information can lead to non-optimal
   choices
Can be minimal or non-minimal
         Adaptive Routing
   Local Information not enough

  0      1    2      3   4   5     6   7

 In each cycle:
  Node 5 sends packet to node 6
  Node 3 sends packet to node 7

         Adaptive Routing
   Local Information not enough
 Node 3 does not know about the traffic
 between 5 and 6 before the input buffers
 between node 3 and 5 are completely filled
 with packets!

    0    1   2    3    4    5    6    7

         Adaptive Routing
  Local Information is not enough
 Adaptive routing works better with smaller
 buffers, since small buffers fill faster
 and thus congestion is propagated earlier
 to the sensing node (stiff backpressure)

    0    1    2    3    4    5    6    7

        Adaptive Routing
How does the adaptive routing
 algorithm sense the state of the
 network?
It can only sense current local
 information
Global information is based on historic
 local information
Changes in the traffic flow in the
 network are observed much later
     Minimal Adaptive Routing

 Minimal adaptive routing chooses
 among the minimal routes from
 source s to destination d

03     13   23   33
02     12   22   32
01     11   21   31
00     10   20   30

 Minimal Adaptive Routing

 At each hop a routing function generates a
 productive output vector that identifies which
 output channels of the current node will
 move the packet closer to its destination
 Network state is then used to select
 one of these channels for the next hop
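
The two-step decision above (productive channels first, then a choice based on network state) might look as follows for a 2-D mesh. This is a sketch: the port names, the use of output queue length as the "network state", and the function names are assumptions.

```python
def productive_ports(cur, dst):
    """Output ports that move the packet closer to dst in a 2-D mesh."""
    (cx, cy), (dx, dy) = cur, dst
    ports = []
    if dx > cx:
        ports.append("E")
    if dx < cx:
        ports.append("W")
    if dy > cy:
        ports.append("N")
    if dy < cy:
        ports.append("S")
    return ports


def minimal_adaptive_next_hop(cur, dst, queue_len):
    """Pick the productive port with the shortest local output queue."""
    ports = productive_ports(cur, dst)
    if not ports:
        return "X"  # already at the destination
    return min(ports, key=lambda p: queue_len[p])


# At node 00 heading for 21: E and N are productive, N is less congested
print(minimal_adaptive_next_hop((0, 0), (2, 1), {"E": 5, "W": 0, "N": 2, "S": 0}))
```

When only one port is productive (no minimal path diversity), the queue lengths have no influence on the choice.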
    Minimal Adaptive Routing
 Good at locally balancing load
 Poor at globally balancing load
 Minimal adaptive routing algorithms
  are unable to avoid congestion
  of source-destination pairs
  with no minimal path diversity.

 [Figure, two cases: "Local congestion cannot be avoided" and
 "Local congestion can be avoided"]
      Fully Adaptive Routing

 Fully-Adaptive Routing
  does not restrict
  packets to take the
  shortest path
 Misrouting is allowed
 This can help to avoid
  congested areas and
  improves load balance

           Fully Adaptive Routing
 Fully-Adaptive Routing
  may result in livelock!
 Mechanisms must be
  added to prevent
  livelock
  ◦ Misrouting may only be
    allowed a fixed number
    of times
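
One possible way to bound misrouting is a per-packet misroute budget. The sketch below assumes a 2-D mesh, a hypothetical queue-length threshold, and a `queue_len` dictionary that lists only the ports that exist at the current node; none of these details come from the slides.

```python
def fully_adaptive_next_hop(cur, dst, queue_len, misroutes_left, threshold=4):
    """Return (port, misroutes_left) for one hop of fully adaptive routing."""
    (cx, cy), (dx, dy) = cur, dst
    if cur == dst:
        return "X", misroutes_left
    candidates = (("E", dx > cx), ("W", dx < cx), ("N", dy > cy), ("S", dy < cy))
    productive = [p for p, ok in candidates if ok and p in queue_len]
    best = min(productive, key=lambda p: queue_len[p])
    if queue_len[best] > threshold and misroutes_left > 0:
        others = [p for p in queue_len if p not in productive]
        if others:
            alt = min(others, key=lambda p: queue_len[p])
            if queue_len[alt] <= threshold:
                # Misroute, and spend one unit of the budget
                return alt, misroutes_left - 1
    return best, misroutes_left


queues = {"E": 9, "W": 1, "N": 0, "S": 2}
print(fully_adaptive_next_hop((1, 1), (3, 1), queues, misroutes_left=2))
print(fully_adaptive_next_hop((1, 1), (3, 1), queues, misroutes_left=0))
```

Once the budget reaches zero the packet can only take productive hops, so it reaches its destination after a bounded number of steps.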

Summary of Routing Algorithms
 Deterministic routing is a simple and
 inexpensive routing algorithm, but does
 not utilize path diversity and thus is weak
 on load balancing

 Oblivious algorithms often give good
 results since they allow load balancing and
 their effects are easy to analyse

 Adaptive algorithms, though in theory
 superior, suffer because global
 information is not available at a local node

Summary of Routing Algorithms
Latency is the paramount concern
  Minimal routing most common for NoC
  Non‐minimal can avoid congestion and
   deliver low latency
To date: NoC research favors DOR for
 simplicity and deadlock freedom
Only covered unicast routing
  Recent work on extending on‐chip routing
   to support multicast

       Part 4

NoC Routing Mechanisms

The term routing mechanics refers to the
mechanism that is used to implement any
routing algorithm

Two approaches:
  Fixed routing tables at the source or at
   each hop
  Algorithmic routing uses specialized
   hardware to compute the route or next
   hop at run-time
         Table-based Routing
Two approaches:
  Source-table routing implements all-at-once
   routing by looking up the entire route
   at the source
  Node-table routing performs incremental
   routing by looking up the hop-by-hop
   routing relation at each node along the
   route
Major advantage:
  A routing table can support any routing
  relation on any topology
         Table-based Routing

Example routing mechanism for deterministic
source routing NoCs. The NI uses a LUT to store
the route map.
            Source Routing
 All routing decisions are made at the source
 To route a packet
  1) the table is indexed using the packet
     destination
  2) a route or a set of routes is returned
  3) one route is selected
  4) the route is prepended to the packet
 Because of its speed, simplicity and
  scalability, source routing is very often used
  for deterministic and oblivious routing
              Source Routing - Example
  The example shows a routing table for a 4x2 torus network
  In this example there are two alternative routes for each destination
  Each node has its own routing table

        4x2 torus network:
          01     11     21     31
          00     10     20     30
        Note: In this example the order of XY
        should be the opposite, i.e. 21->12

        Source routing table for node 00 of 4x2 torus network
        Destination     Route 0              Route 1
        00              X                    X
        10              EX                   WWWX
        20              EEX                  WWX
        30              WX                   EEEX
        01              NX                   SX
        11              NEX                  ENX
        21              NEEX                 WWNX
        31              NWX                  WNX

        Example:
        - Routing from 00 to 21
        - Table is indexed with 21
        - Two routes: NEEX and WWNX
        - The source arbitrarily selects NEEX
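
The table for node 00 translates directly into a lookup structure. The sketch below is illustrative only: node labels are read as "xy" strings as in the table, the route is chosen at random, and `follow` is a hypothetical helper that walks a route on the 4x2 torus to check where it ends.

```python
import random

# Source routing table for node 00, copied from the slide
ROUTES_FROM_00 = {
    "00": ("X", "X"),
    "10": ("EX", "WWWX"),
    "20": ("EEX", "WWX"),
    "30": ("WX", "EEEX"),
    "01": ("NX", "SX"),
    "11": ("NEX", "ENX"),
    "21": ("NEEX", "WWNX"),
    "31": ("NWX", "WNX"),
}


def build_packet(dest, payload, rng=random):
    """Steps 1-4: index by destination, pick one route, prepend it."""
    route = rng.choice(ROUTES_FROM_00[dest])
    return route, payload


def follow(route, start=(0, 0), size=(4, 2)):
    """Walk a route on the torus (with wrap-around) and return the end node."""
    step = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
    x, y = start
    for port in route:
        if port == "X":  # X ejects the packet at the current node
            break
        dx, dy = step[port]
        x, y = (x + dx) % size[0], (y + dy) % size[1]
    return f"{x}{y}"


print(build_packet("21", "data"))
print(follow("NEEX"), follow("WWNX"))  # both end at node 21
```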
 Arbitrary Length Encoding of
        Source Routes
It can be used for arbitrary-sized
 networks
The complexity of routing is moved
 from the network nodes to the
 terminal nodes
But routers must be able to handle
 arbitrary length routes

     Arbitrary Length-Encoding

 Router has
    16-bit phits
    32-bit flits
 Route has 13 hops
 Extra symbols:
    P: Phit continuation
    F: Flit continuation
 The table entries in
 the terminals must be
 of arbitrary length
       Node-Table Routing
Table-based routing can also be
 performed by placing the routing table
 in the routing nodes rather than in
 the terminals

Node-table routing is appropriate for
 adaptive routing algorithms, since it
 can use state information at each node

        Node-Table Routing
A table lookup is required when a packet
 arrives at a router, which takes additional
 time compared to source routing

 Scalability is sacrificed, since different
 nodes need tables of varying size

 Difficult to give two packets arriving from
 a different node a different way through
 the network without expanding the tables

                    Example

   Table shows a set of routing tables
   There are two choices from a source
    to a destination

          01       11       21   31
          00       10       20   30

         Routing Table for Node 00
         Note: Bold font ports are misroutes
                     Example: Livelock can occur
A packet passing through node 00
destined for node 11.

If the entry for (00->11) is N,
go to 10 and (10->11) is S
=> 00 <-> 10 (livelock)

          01    11     21     31
          00    10     20     30

       Algorithmic Routing

Instead of using a table, algorithms
 can be used to compute the next route

In order to be fast, algorithms are
 usually not very complicated and
 implemented in hardware

                Algorithmic Routing -
        Dimension-Order
         ◦ sx and sy indicate the
           preferred directions
            sx=0, +x; sx=1, -x
            sy=0, +y; sy=1, -y
         ◦ x and y represent the
           number of hops in x and
           y direction
         ◦ The productive direction vector (PDV),
           which indicates which channels advance
           the packet, is used as an input for
           selection of a route
         ◦ The route selection logic determines
           the type of the routing
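
In software terms, the hardware described above could be modelled as below. The bit order (+x, -x, +y, -y, exit) and the function names are assumptions made for this sketch; only the meaning of sx, sy, x and y is taken from the slide.

```python
def productive_direction_vector(sx, x, sy, y):
    """PDV bits (+x, -x, +y, -y, exit) from sign bits and remaining hops."""
    plus_x = int(x != 0 and sx == 0)
    minus_x = int(x != 0 and sx == 1)
    plus_y = int(y != 0 and sy == 0)
    minus_y = int(y != 0 and sy == 1)
    exit_bit = int(x == 0 and y == 0)  # nothing left to travel: eject
    return (plus_x, minus_x, plus_y, minus_y, exit_bit)


def dimension_order_select(pdv):
    """Dimension-order selection: take the first set bit, x before y."""
    for name, bit in zip(("+x", "-x", "+y", "-y", "exit"), pdv):
        if bit:
            return name


pdv = productive_direction_vector(sx=0, x=2, sy=1, y=1)
print(pdv, dimension_order_select(pdv))
```

Swapping `dimension_order_select` for a random or queue-based choice among the set bits gives the oblivious and adaptive variants without changing the PDV logic.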
      Algorithmic Routing -
A minimal oblivious router -
 Implemented by randomly selecting
 one of the active bits of the PDV as
 the selected direction
Minimal adaptive router - Achieved by
 making the selection based on the length
 of the respective output queues
Fully adaptive router - Implemented
 by picking an unproductive direction if
 the queue length exceeds a threshold
Routing Mechanics
  Table based routing
  Source routing
  Node-table routing
  Algorithmic routing

Compression of source routes. In the source
  routes, each port selector symbol
  [N,S,W,E, and X] was encoded with three
  bits. Suggest an alternative encoding to
  reduce the average length (in bits) required
  to represent a source route. Justify your
  encoding in terms of typical routes that
  might occur on a torus. Also compare the
  original three bits per symbol with your
  encoding on the following routes:
(b) WNEENWWWWWNX
         Part 5

     NoC Flow Control
Resources in a Network Node
  Bufferless Flow Control
   Buffered Flow control

         Flow Control (FC)
FC determines how the resources of a
network, such as channel bandwidth and
buffer capacity, are allocated to packets
traversing a network.

Goal is to use resources as efficiently
 as possible to allow a high
 throughput

An efficient FC is a prerequisite to
 achieve a good network performance
             Flow Control
FC can be viewed as a problem of
  Resource allocation
  Contention resolution
Resources in form of channels,
 buffers and state must be allocated to
 each packet
If two packets compete for the same
 channel, flow control can only assign
 the channel to one packet, but must
 also deal with the other packet
               Flow Control
Flow Control can be divided into:

1. Bufferless flow control
  Packets are either dropped or misrouted

2. Buffered flow control
  Packets that cannot be routed via the
   desired channel are stored in buffers

 Resources in a Network Node
Control State
  Tracks the resources allocated to the packet in
   the node and the state of the packet
Buffer
  Packet is stored in
   a buffer before it is
   sent to next node
Bandwidth
  To travel to the next node bandwidth has to be
   allocated for the packet
 Units of Resource Allocation -
        Packet or Flits?
Contradictory requirements on
 packets
  Packets should be very large in order to
   reduce overhead of routing and sequencing
  Packets should be very small to allow
   efficient and fine-grained resource
   allocation and minimize blocking latency
Flits try to eliminate this conflict
  Packets can be large (low overhead)
  Flits can be small (efficient resource
   allocation)
 Units of Resource Allocation -
     Size: Phit, Flit, Packet

There are no fixed rules for the size
 of phits, flits and packets

Typical values
  Phits: 1 bit to 64 bits
  Flits: 16 bits to 512 bits
  Packets: 128 bits to 1024 bits

      Bufferless Flow Control
No buffers: lower implementation cost
If more than one packet is to be routed
 to the same output, one has to be
   Misrouted or
   Dropped
Example: two
packets, A and B
 (of several flits), arrive at a network node.

     Bufferless Flow Control
Packet B is dropped and must be
 resent
There must be a protocol that informs
 the sending node that the packet has
 been dropped
 Example: Resend after no acknowledgement has
 been received within a given time

     Bufferless Flow Control
Packet B is misrouted
No further action is required here,
 but at the receiving node packets have
 to be sorted into original order

           Circuit Switching

 Circuit switching is a bufferless flow
 control, where several channels are
 reserved to form a circuit
  A request (R) propagates from source to
   destination, which is answered by an
   acknowledgement (A)
  Then data is sent (here two five-flit packets (D))
   and a tail flit (T) is sent to deallocate the
   channels
           Circuit Switching

Circuit switching does not suffer from
 dropping or misrouting packets
However there are two weaknesses:
  High latency: T = 3 H tr + L/b
  Low throughput, since the channel is used for a
   large fraction of time for signaling and not
   for delivery of the payload
         Circuit Switching Latency

          T = 3 H tr + L/b
3 H tr: time required to set up the channel and deliver the
head flit (H: number of hops, tr: router delay per hop)
L/b: serialization latency (L: packet length, b: channel bandwidth)
Time of flight and contention time add to this latency

 Note: 3 x header latency because the path from source to
 destination must be traversed 3 times to deliver the packet:
 Once in each direction to set up the circuit and then again to
 deliver the first flit
      Buffered Flow Control
More efficient flow control can be
 achieved by adding buffers

With sufficient buffers packets do
 not need to be misrouted or dropped,
 since packets can wait for the
 outgoing channel to be ready

      Buffered Flow Control
Two main approaches:

1. Packet-Buffer Flow Control
  Store-And-Forward
  Cut-Through

2. Flit-Buffer Flow Control
  Wormhole Flow Control
  Virtual Channel Flow Control
 Store & Forward Flow Control

 Each node along a route waits until a packet
 is completely received (stored) and then the
 packet is forwarded to the next node
Two resources are needed
  Packet-sized buffer in the switch
  Exclusive use of the outgoing channel
 Store & Forward Flow Control

 Advantage: While waiting to acquire
  resources, no channels are being held idle
  and only a single packet buffer on the
  current node is occupied
 Disadvantage: Very high latency
   T = H (tr + L/b)
    Cut-Through Flow Control

  Cut-through
   reduces the latency
  T = H tr + L/b
 Disadvantages
  Poor utilization of buffers, since they are
   allocated in units of packets
  Contention latency is increased, since packets
   must wait until a whole packet leaves the occupied
   channel
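
To compare the latency formulas given for circuit switching, store-and-forward and cut-through, a small calculator helps. The sketch assumes H is the hop count, tr the per-hop router delay in cycles, L the packet length in bits and b the channel bandwidth in bits per cycle; the example numbers are made up.

```python
def circuit_switching_latency(H, tr, L, b):
    """T = 3 H tr + L/b: the path is traversed three times before data flows."""
    return 3 * H * tr + L / b


def store_and_forward_latency(H, tr, L, b):
    """T = H (tr + L/b): the whole packet is serialized at every hop."""
    return H * (tr + L / b)


def cut_through_latency(H, tr, L, b):
    """T = H tr + L/b: serialization is paid only once."""
    return H * tr + L / b


# Hypothetical values: 4 hops, 2 cycles per router, 128-bit packet, 16 bits/cycle
args = dict(H=4, tr=2, L=128, b=16)
print(circuit_switching_latency(**args))   # 32.0
print(store_and_forward_latency(**args))   # 40.0
print(cut_through_latency(**args))         # 16.0
```

These are zero-load figures; contention would add to all three.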
         Wormhole Flow Control
 Wormhole FC operates like cut-through, but
 with channel and buffers allocated to flits
 rather than packets

 When the head flit arrives at a node, it must
 acquire resources (virtual channel, buffer) before it can be
 forwarded to the next node

 Tail flits behave like body flits, but also
 release the channel

  Wormhole (WH) Flow Control
Virtual channels hold the state needed
 to coordinate the handling of flits of a
 packet over a channel

Comparison to cut-through
  Wormhole flow control makes far more
   efficient use of buffer space
  Throughput may be less, since wormhole
   flow control may block a channel mid-packet
  Example for WH Flow Control

 1. Input virtual channel is in idle state (I)
    Upper output channel is occupied,
    allocated to lower channel (L)
 2. Input channel enters waiting state (W)
    Head flit is buffered
 3. Body flit is also buffered
    No more flits can be buffered, thus congestion arises
    if more flits want to enter the switch
 4. Virtual channel enters active state (A)
    Head flit is output on upper channel
    Second body flit is accepted
 5. First body flit is output
    Tail flit is accepted
 6. Second body flit is output
 7. Tail flit is output
    Virtual channel is deallocated and
    returns to idle state
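
The sequence of states in this example can be replayed with a tiny model of one input virtual channel. It is a sketch under simplifying assumptions: a two-flit buffer, flits represented by the strings "H", "B1", "B2", "T", and an external `grant_output` call standing in for the output channel becoming free. The class and method names are hypothetical.

```python
from collections import deque


class InputVC:
    """One input virtual channel with states I (idle), W (waiting), A (active)."""

    def __init__(self, depth=2):
        self.state = "I"
        self.buf = deque()
        self.depth = depth

    def accept(self, flit):
        """Try to buffer an arriving flit; False means the buffer is full."""
        if len(self.buf) >= self.depth:
            return False
        if flit == "H":
            self.state = "W"  # head flit must wait for an output channel
        self.buf.append(flit)
        return True

    def grant_output(self):
        """The requested output channel has been allocated to this packet."""
        if self.state == "W":
            self.state = "A"

    def send(self):
        """Forward one flit if active; the tail flit releases the channel."""
        if self.state != "A" or not self.buf:
            return None
        flit = self.buf.popleft()
        if flit == "T":
            self.state = "I"
        return flit


vc = InputVC()
vc.accept("H")            # state W, head buffered
vc.accept("B1")           # buffer now full
print(vc.accept("B2"))    # False: congestion, flit must wait upstream
vc.grant_output()         # state A
print(vc.send(), vc.accept("B2"))
print(vc.send(), vc.accept("T"))
print(vc.send(), vc.send(), vc.state)
```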
        Wormhole Flow Control

 The main advantage of wormhole over cut-through
 is that buffers in the routers do
 not need to be able to hold full packets,
 but only need to store a number of flits
 This allows the use of smaller and faster
 routers
        Part 6
NoC Flow Control (continued)

Virtual Channel-Flow Control

   Virtual Channel Router

 Credit-Based Flow Control

   On/Off Flow Control

   Flow Control Summary
           Blocking -
   Cut-Through and Wormhole
       Cut-Through (Buffer-Size 1 Packet)


       Wormhole (Buffer-Size 2 Flits)


 If a packet is blocked, the flits of the
 wormhole packet are stored in different
 routers
        Wormhole Flow Control

 There is only one virtual channel for each physical
  channel
 Packet A is blocked and cannot acquire channel p
 Though channels p and q are idle, packet A cannot
  use these channels since B owns channel p

  Virtual Channel-Flow Control
 In virtual channel flow control several
 virtual channels are associated with a single
 physical channel
 This allows the use of bandwidth that
 otherwise is left idle when a packet blocks
 the channel
 Unlike wormhole flow control, subsequent
 flits are not guaranteed bandwidth, since
 they have to compete for bandwidth with
 other flits

  Virtual Channel Flow Control
There are several virtual channels
 for each physical channel
 Packet A can use a second virtual
 channel and thus proceed over
 channel p and q

       Virtual Channel Allocation
 Flits must be delivered in order, H, B, …B, T.
   Only the head flit carries routing information

 Allocate VC at the packet level, i.e., packet-by-packet
   The head flit is responsible for allocating VCs along the route.
   Body and tail flits must follow the VC path, and the tail flit
    releases the VCs.

 The flits of a packet cannot interleave with
  those of any other packet

    Virtual Channel Flow Control -
     Fair Bandwidth Arbitration
VCs interleave their flits
Results in a high average latency

 Virtual Channel Flow Control -
 Winner-Take-All Arbitration
A winner-take-all arbitration reduces
the average latency with no
throughput penalty

 Virtual Channel Flow Control -
        Buffer Storage
Buffer storage is organized in two
 dimensions
  Number of virtual channels
  Number of flits that can be buffered per
   channel
   Virtual Channel Flow Control -
          Buffer Storage
Virtual channel buffer shall at least
 be as deep as needed to cover round-
 trip credit latency

In general it is usually better to add
 more virtual channels than to increase
 the buffer size

         Virtual Channel

A: active
W: waiting
I: idle

Virtual Channel Router

         Buffer Organization

Single buffer per input
Multiple fixed length queues per physical
 channel
       Buffer Management
In buffered FC nodes there is a need
 for communication between nodes in
 order to inform about the availability
 of buffers
  Backpressure informs upstream nodes that
   they must stop sending to a downstream
   node when the buffers of that downstream
   node are full

     upstream node  --(traffic flow)-->  downstream node
    Credit-Based Flow Control
 The upstream router keeps a count of the
  number of free flit buffers in each virtual
  channel downstream
 Each time the upstream router forwards a
  flit, it decrements the counter
 If a counter reaches zero, the downstream
  buffer is full and the upstream node cannot
  send a new flit
 If the downstream node forwards a flit, it
  frees the associated buffer and sends a
  credit to the upstream node, which
  increments its counter
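
The counter handling described above fits in a few lines. This is a minimal sketch for a single virtual channel; the class and method names are illustrative, not taken from any particular router design.

```python
class CreditCounter:
    """Upstream view of the free flit buffers in one downstream VC."""

    def __init__(self, buffers):
        self.credits = buffers  # initially all downstream buffers are free

    def can_send(self):
        return self.credits > 0

    def send_flit(self):
        """Forwarding a flit consumes one downstream buffer."""
        if self.credits == 0:
            raise RuntimeError("no credits: downstream buffer is full")
        self.credits -= 1

    def receive_credit(self):
        """Downstream forwarded a flit and freed a buffer."""
        self.credits += 1


vc = CreditCounter(buffers=2)
vc.send_flit()
vc.send_flit()
print(vc.can_send())   # False: must wait for a credit
vc.receive_credit()
print(vc.can_send())   # True
```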
Credit-Based Flow Control

            Credit-Based Flow Control
      The minimum time between the credit being
        sent at time t1 and a credit sent for the
        same buffer at time t5 is the credit round-trip
        delay tcrt
All buffers on the
are full

       Credit-Based Flow Control
 If there is only a
 single flit buffer, a
 flit waits for a new
 credit and the
 maximum throughput
 is limited to one flit
 for each tcrt

 The bit rate would
 then be Lf / tcrt
 where Lf is the
 length of a flit in
 bits
       Credit-Based Flow Control
 If there are F flit
 buffers on the
 virtual channel, F
 flits could be sent
 before waiting for
 the credit, which
 gives a throughput of
 F flits for each tcrt
 and a bit rate of FLf
 / tcrt

       Credit-Based Flow Control
 In order not to limit
 the throughput by
 low level flow
 control the flit
 buffer should hold at
 least F >= tcrt * b / Lf flits,

where b is the
 bandwidth of a
 channel
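
Since F buffers allow a bit rate of F Lf / tcrt, requiring that rate to be at least the channel bandwidth b gives a lower bound on F. The sketch below works this out numerically; the units (cycles, bits per cycle) and the example numbers are assumptions.

```python
import math


def credit_limited_bit_rate(F, t_crt, flit_bits):
    """Bit rate allowed by F flit buffers: F * Lf / tcrt."""
    return F * flit_bits / t_crt


def min_flit_buffers(t_crt, b, flit_bits):
    """Smallest F for which F * Lf / tcrt reaches the channel bandwidth b."""
    return math.ceil(t_crt * b / flit_bits)


# Hypothetical link: tcrt = 6 cycles, b = 32 bits/cycle, Lf = 32 bits
print(min_flit_buffers(t_crt=6, b=32, flit_bits=32))            # 6
print(credit_limited_bit_rate(F=1, t_crt=6, flit_bits=32))      # about 5.33
print(credit_limited_bit_rate(F=6, t_crt=6, flit_bits=32))      # 32.0
```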

    Credit-Based Flow Control
 For each flit sent
 downstream a
 corresponding credit
 is sent upstream

 Thus there is a large
 amount of upstream
 signaling, which
 especially for small
 flits can represent a
 large overhead!

              On/Off Flow Control
   On/off Flow control tries to reduce
    the amount of upstream signaling

    An off signal is sent to the upstream
    node, if the number of free buffers
    falls below the threshold Foff

    An on signal is sent to the upstream
    node, if the number of free buffers
    rises above the threshold Fon

   With carefully dimensioned buffers
    on/off flow control can achieve a very
    low overhead in form of upstream
    signaling
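
The two thresholds act as a hysteresis, which is where the reduction in signaling comes from. The following sketch models the downstream side only; the class name, the buffer count and the threshold values are illustrative assumptions.

```python
class OnOffReceiver:
    """Downstream node that signals 'off' and 'on' based on free buffers."""

    def __init__(self, buffers, f_off, f_on):
        self.free = buffers
        self.f_off = f_off
        self.f_on = f_on
        self.sender_on = True

    def flit_arrives(self):
        self.free -= 1
        if self.sender_on and self.free < self.f_off:
            self.sender_on = False
            return "off"  # tell upstream to stop sending
        return None

    def flit_departs(self):
        self.free += 1
        if not self.sender_on and self.free > self.f_on:
            self.sender_on = True
            return "on"   # tell upstream to resume
        return None


rx = OnOffReceiver(buffers=8, f_off=3, f_on=5)
print([rx.flit_arrives() for _ in range(6)])
print([rx.flit_departs() for _ in range(4)])
```

Ten flit events produce only two upstream signals here, whereas credit-based flow control would send one credit per departing flit.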
      Ack/Nack Flow Control
In ack/nack flow
 control the
 upstream node
 sends packets
 without knowing, if
 there are free
 buffers in the
 downstream node

        Ack/Nack Flow Control
 If there is no buffer
   the downstream node sends
    nack and drops the flit
   the flit must be resent
   flits must be reordered at
    the downstream node
 If there is a buffer
   the downstream node
    sends ack and stores the
    flit in a buffer

       Buffer Management
Because of its buffer and bandwidth
 inefficiency ack/nack is rarely used

Credit-based flow control is used in
 systems with small numbers of
 flit buffers

On/off flow control is used in systems
 that have large numbers of flit
 buffers
       Flow Control Summary
Bufferless flow control
  Dropping, misrouting packets
  Circuit switching
Buffered flow control
  Packet-Buffer Flow Control: SAF vs. Cut-Through
  Flit-Buffer Flow Control: Wormhole and Virtual
   Channel
Switch-to-switch (link level) flow
 control
  Credit-based, On/Off, Ack/Nack
           Part 7

   Router Architecture
  Virtual-channel Router
Virtual channel state fields
    The Router Pipeline
      Pipeline Stalls
   Router Microarchitecture -
     Virtual-channel Router
Modern routers are pipelined and work
 at the flit level
Head flits proceed through buffer
 stages that perform routing and
 virtual channel allocation
All flits pass through switch allocation
 and switch traversal stages
Most routers use credits to allocate
 buffer space
 Typical Virtual Channel Router
A router's functional
 blocks can be divided into
  Datapath: handles storage
   and movement of a packet's payload
      Input buffers
      Switch
      Output buffers
  Control: coordinates the
   movement of the packets
   through the resources of the
   datapath
      Route Computation
      VC Allocator
      Switch Allocator
 Typical Virtual Channel Router
 The input unit contains
  a set of flit buffers
 Maintains the state
  for each virtual
  channel
     G = Global State
     R = Route
     O = Output VC
     P = Pointers
     C = Credits

Virtual Channel State Fields

 Typical Virtual Channel Router
 During route
 computation the
 output port for the
 packet is determined

 Then the packet
 requests an output
 virtual channel from
 the virtual-channel
 allocator

  Typical Virtual Channel Router
 Flits are forwarded via the
  virtual channel by allocating
  a time slot on the switch
  and output channel using
  the switch allocator
 Flits are forwarded to the
  appropriate output during
  this time slot
 The output unit forwards
  the flits to the next router
  in the packet’s path

Virtual Channel State Fields

    Packet Rate and Flit Rate
The control of the router operates at
 two distinct frequencies
  Packet Rate (performed once per packet)
  Route computation
  Virtual-channel allocation
  Flit Rate (performed once per flit)
  Switch allocation
  Pointer and credit count update

         The Router Pipeline
 A typical router pipeline
  includes the following
  stages
   RC (Routing Computation)
   VA (Virtual Channel Allocation)
   SA (Switch Allocation)
   ST (Switch Traversal)

 [Figure: router pipeline with no pipeline stalls]

         The Router Pipeline
                      Cycle 0
                       Head flit arrives and
                        the packet is directed
                        to a virtual channel of
                        the input port (G = I)

         The Router Pipeline
                      Cycle 1: Routing Computation
                       Virtual channel state
                        changes to routing (G = R)
                       Head flit enters RC stage
                       First body flit arrives at
                        the router

         The Router Pipeline
                      Cycle 2: Virtual-Channel Allocation
                       Route field (R) of virtual
                        channel is updated
                       Virtual channel state is set to
                        “waiting for output virtual
                        channel” (G = V)
                       Head flit enters VA stage
                       First body flit waits in the flit buffer
                        (body flits skip the RC and VA stages)
                       Second body flit arrives at the router

         The Router Pipeline
                      Cycle 2: Virtual-Channel Allocation (continued)
                       The result of the routing
                        computation is input to the
                        virtual channel allocator
                       If successful, the allocator
                        assigns a single output
                        virtual channel
                       The state of the virtual
                        channel is set to active (G = A)

         The Router Pipeline
                      Cycle 3: Switch Allocation
                       All further processing is
                        done on a per-flit basis
                       Head flit enters SA stage
                       Any active VC (G = A) that
                        contains buffered flits
                        (indicated by P) and has
                        downstream buffers
                        available (C > 0) bids for a
                        single-flit time slot
                        through the switch from
                        its input VC to the output
                        VC
         The Router Pipeline
                      Cycle 3: Switch Allocation (continued)
                       If the bid is successful, the
                        pointer field is updated
                       The credit field is decremented
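
The bidding rule and the bookkeeping after a successful bid can be sketched as follows. This is a minimal model, not the router's actual logic: `InputVC`, `can_bid`, and `on_grant` are hypothetical names, the P field is reduced to a count of buffered flits, and the credit count is assumed to drop by one for each flit sent.

```python
from dataclasses import dataclass


@dataclass
class InputVC:
    state: str      # G field: "I", "R", "V" or "A"
    buffered: int   # number of flits waiting (what the P field indicates)
    credits: int    # C field: free buffers at the downstream router


def can_bid(vc: InputVC) -> bool:
    """A VC bids for a switch time slot only if active, non-empty, and C > 0."""
    return vc.state == "A" and vc.buffered > 0 and vc.credits > 0


def on_grant(vc: InputVC) -> None:
    """Bookkeeping after a won bid: one flit leaves, one credit is used."""
    vc.buffered -= 1   # pointer update: the buffer head advances by one flit
    vc.credits -= 1    # assumption: one credit is consumed per flit sent


vc = InputVC(state="A", buffered=2, credits=1)
print(can_bid(vc))   # True: active, has flits, has a credit
on_grant(vc)
print(can_bid(vc))   # False: a flit is still buffered but no credit is left
```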

         The Router Pipeline
                      Cycle 4: Switch Traversal
                       Head flit traverses the switch
                      Cycle 5:
                       Head flit starts
                        traversing the channel
                        to the next router

         The Router Pipeline
                      Cycle 7:
                       Tail flit traverses the switch
                       Output VC set to idle
                       Input VC set to idle (G =
                        I), if buffer is empty
                       Input VC set to routing (G
                        = R), if another head
                        flit is in the buffer

         The Router Pipeline
                      Only the head flits enter
                      the RC and VA stages

                      The body and tail flits
                      are stored in the flit
                      buffers until they can
                      enter the SA stage
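
The stall-free timing walked through above can be reproduced with a few lines of code. The sketch assumes that flit i arrives in cycle i, that every stage takes exactly one cycle, and that nothing stalls; `pipeline_schedule` is a hypothetical helper name.

```python
def pipeline_schedule(num_flits: int) -> list:
    """Return, per flit, the cycle in which it occupies each stage (no stalls)."""
    schedule = []
    for i in range(num_flits):
        if i == 0:
            # Head flit: arrives in cycle 0, then RC, VA, SA, ST in turn.
            stages = {"RC": 1, "VA": 2, "SA": 3, "ST": 4}
        else:
            # Body and tail flits wait in the flit buffer, then follow
            # the head through SA and ST one cycle apart.
            stages = {"SA": 3 + i, "ST": 4 + i}
        schedule.append(stages)
    return schedule


for i, stages in enumerate(pipeline_schedule(4)):
    print(f"flit {i}: {stages}")
```

For a four-flit packet the tail flit (flit 3) traverses the switch in cycle 7, which matches the cycle-by-cycle description.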

                 Pipeline Stalls
Pipeline stalls can be divided into
  Packet stalls
   Occur if the virtual channel cannot advance
    to its R, V, or A state
  Flit stalls
   Occur if a virtual channel is in the active state and
    the flit cannot successfully complete switch
    allocation due to
     Lack of flit
     Lack of credit
     Losing arbitration for the switch time slot
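
The three causes of a flit stall can be checked in order for an active virtual channel. The function below is a hypothetical illustration; its name, parameters, and the order of the checks are assumptions.

```python
from typing import Optional


def flit_stall_reason(buffered_flits: int, credits: int,
                      won_arbitration: bool) -> Optional[str]:
    """Return why an active VC fails switch allocation, or None if it succeeds."""
    if buffered_flits == 0:
        return "lack of flit"        # buffer empty: nothing to send
    if credits == 0:
        return "lack of credit"      # no free buffer downstream
    if not won_arbitration:
        return "lost arbitration"    # another VC got the switch time slot
    return None


print(flit_stall_reason(0, 3, True))
print(flit_stall_reason(2, 0, True))
print(flit_stall_reason(2, 3, False))
print(flit_stall_reason(2, 3, True))
```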

    Example for Packet Stall
Virtual-channel allocation stall
  The head flit of packet A can enter the VA
   stage only after the tail flit of packet B
   completes switch allocation and releases
   the virtual channel

      Example for Flit Stalls

Switch allocation stall

Second body flit fails to allocate the
requested connection in cycle 5
      Example for Flit Stalls
Buffer empty stall

Body flit 2 is delayed three cycles. However,
since it does not have to pass through the RC and VA
stages, its output is delayed by only one cycle.
A buffer is allocated in the SA stage
 on the upstream (transmitting) node
To reuse the buffer, a credit is
 returned over a reverse channel after
 the same flit departs the SA stage of
 the downstream (receiving) node
When the credit reaches the input
 unit of the upstream node, the buffer
 is available and can be reused
The credit loop can be viewed as the
 path of a token that
  Starts at the SA stage of the upstream node
  Travels downstream with the flit
  Returns upstream as a credit after reaching
   the SA stage of the downstream node
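
The token view of the credit loop lends itself to a small simulation. In this sketch the upstream node holds one credit per downstream buffer, spends one per flit, and gets it back a fixed number of flit times later. The model is deliberately simplified: `simulate_credit_loop` and its parameters are hypothetical, and the round-trip time of the token is passed in as a single assumed number.

```python
from collections import deque


def simulate_credit_loop(buffers: int, loop_latency: int, cycles: int) -> int:
    """Count the flits sent in `cycles` flit times, starting with `buffers` credits.

    A credit spent in cycle t becomes usable again in cycle t + loop_latency.
    """
    credits = buffers
    in_flight = deque()   # return times of the credits currently in the loop
    sent = 0
    for t in range(cycles):
        # Tokens that have completed the loop become credits again.
        while in_flight and in_flight[0] <= t:
            in_flight.popleft()
            credits += 1
        if credits > 0:
            # Send one flit: spend a credit and start its trip around the loop.
            credits -= 1
            in_flight.append(t + loop_latency)
            sent += 1
    return sent


print(simulate_credit_loop(buffers=4, loop_latency=8, cycles=80))  # 40
print(simulate_credit_loop(buffers=8, loop_latency=8, cycles=80))  # 80
```

With four buffers and a loop of eight flit times the channel is busy only half the time; with eight buffers it never idles.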

       Credit Loop Latency
The credit loop latency tcrt,
 expressed in flit times, gives a lower
 bound on the number of flit buffers
 needed on the upstream side for the
 channel to operate at full bandwidth
tcrt in flit times is given by:

          Credit Loop Latency
If the number of buffers available per
 virtual channel is F, the duty factor of
 the channel will be
      d = min (1, F / tcrt)

The duty factor will be 100% as long
 as there are sufficient flit buffers to
 cover the round-trip latency
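
As a quick worked example of the formula d = min(1, F / tcrt), the helper below evaluates the duty factor for a few buffer counts. The function name and the sample values are illustrative assumptions, not figures from the slides.

```python
def duty_factor(num_buffers: int, t_crt: float) -> float:
    """d = min(1, F / tcrt), with tcrt expressed in flit times."""
    if t_crt <= 0:
        raise ValueError("t_crt must be positive")
    return min(1.0, num_buffers / t_crt)


print(duty_factor(4, 8))    # 0.5: too few buffers, channel idles half the time
print(duty_factor(8, 8))    # 1.0: buffers just cover the credit loop
print(duty_factor(12, 8))   # 1.0: extra buffers do not raise the duty factor
```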

                      Credit Stall
Virtual Channel Router with 4 flit buffers

           Flit and Credit Encoding
A. Flits and credits are sent over separate lines
   with separate widths
B. Flits and credits are transported over the same
   line. This can be done by
      Including credits in flits
      Multiplexing flits and credits at the phit level
    Option (A) is considered more efficient. For a more detailed
     discussion, see Section 16.6 of the Dally book

NoC is a scalable platform for
 billion-transistor chips
 Several driving forces behind it
 Many open research questions
 May change the way we structure
 and model VLSI systems

                     Hong Kong University of Science and Technology, March 2010
OASIS NoC Architecture Design in
 Verilog HDL, Technical Report TR-
 062010-OASIS, Adaptive Systems
 Laboratory, The University of Aizu,
 June 2010.
OASIS NoC Project:

