					Network-on-Chip
                 (2/2)



     Ben Abdallah, Abderazek
      The University of Aizu
    E-mail: benab@u-aizu.ac.jp



         KUST University, March 2011
                                       1
       Part 3
      Routing
Routing Algorithms
 Deterministic Routing
   Oblivious Routing
   Adaptive Routing



                         2
           Routing Basics
Once the topology is fixed, the routing algorithm determines the path(s) from source to destination
Routing algorithms must prevent deadlock, livelock, and starvation




                                          3
            Routing Deadlock




 Without routing restrictions, a resource cycle can occur
    This leads to deadlock


                                            4
       Deadlock Definition
Deadlock: A packet does not reach its destination because it is blocked at some intermediate resource
Livelock: A packet does not reach its destination because it enters a cyclic path
Starvation: A packet does not reach its destination because some resource does not grant it access (while it grants access to other packets)
                                      5
  Routing Algorithm Attributes
Number   of destinations
  Unicast, Multicast, Broadcast?
Adaptivity
   Deterministic, Oblivious, or Adaptive?
Implementation    (Mechanisms)
  Source or node routing?
  Table or circuit?



                                            6
Deterministic Routing



                        7
       Deterministic Routing
Always choose the same path between two nodes
   Easy to implement and to make deadlock-free
   Does not exploit path diversity and is thus poor at load balancing
   Packets arrive in order




                                               8
         Deterministic Routing -
  Example: Destination-Tag Routing in Butterfly Networks

 Depends on the destination address only (not on source)




Example (binary butterfly): the destination address in binary, 5 = 101, is read digit by digit as down, up, down and selects the route.

Example (radix-4 butterfly): the destination address is interpreted as quaternary digits; 11 = 1011(2) = 23(4) selects the route.

Note: Starting from any source, using the same digit pattern always routes to the destination.

[Figure: routing examples in a binary butterfly and in a radix-4 butterfly]                    9
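The following is a minimal C sketch of destination-tag routing (not from the slides; the function and variable names are illustrative). It assumes a radix-k butterfly in which the switch at each stage simply selects the output port given by the corresponding radix-k digit of the destination address, most significant digit first.

#include <stdio.h>

/* Return the output port used at 'stage' (0 = first stage)
 * for a butterfly with 'stages' stages and radix 'k'. */
int dest_tag_port(int dest, int k, int stages, int stage)
{
    int shift = stages - 1 - stage;      /* consume digits MSB first */
    int p = dest;
    for (int i = 0; i < shift; i++)
        p /= k;
    return p % k;                        /* the radix-k digit for this stage */
}

int main(void)
{
    /* Binary butterfly (k = 2, 3 stages), destination 5 = 101:
     * ports 1, 0, 1 ("down, up, down" in the slide). */
    for (int s = 0; s < 3; s++)
        printf("stage %d -> port %d\n", s, dest_tag_port(5, 2, 3, s));

    /* Radix-4 butterfly (k = 4, 2 stages), destination 11 = 23 in base 4:
     * ports 2, then 3. */
    for (int s = 0; s < 2; s++)
        printf("stage %d -> port %d\n", s, dest_tag_port(11, 4, 2, s));
    return 0;
}

The two loops reproduce the binary and quaternary examples given above.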
       Deterministic Routing-
          Dimension-Order Routing

For   n-dimensional hypercubes and
 meshes, dimension-order routing
 produces deadlock-free routing
 algorithms.

It is called XY routing in 2-D meshes and e-cube routing in hypercubes


                                       10
Dimension-Order Routing -
     XY Routing Algorithm

[Figure: XY routing examples on a 2-D mesh from a source S to destinations D]

                             11
Dimension-Order Routing -
       XY Routing Algorithm




XY routing algorithm for a 2-D mesh (see the sketch below)   12
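Since the algorithm itself appears only as a figure on the slide, here is a minimal C sketch of the XY routing decision, assuming each packet header carries the destination coordinates and each router knows its own (the port names are illustrative):

#include <stdio.h>

typedef enum { EAST, WEST, NORTH, SOUTH, LOCAL } port_t;

/* XY (dimension-order) routing for a 2-D mesh:
 * route fully in X first, then in Y, then deliver locally. */
port_t xy_route(int cur_x, int cur_y, int dst_x, int dst_y)
{
    if (dst_x > cur_x) return EAST;
    if (dst_x < cur_x) return WEST;
    if (dst_y > cur_y) return NORTH;
    if (dst_y < cur_y) return SOUTH;
    return LOCAL;                     /* packet has arrived */
}

int main(void)
{
    /* Example: route from (0,0) towards (2,1): EAST, EAST, NORTH, LOCAL. */
    int x = 0, y = 0;
    const char *name[] = { "EAST", "WEST", "NORTH", "SOUTH", "LOCAL" };
    for (;;) {
        port_t p = xy_route(x, y, 2, 1);
        printf("%s\n", name[p]);
        if (p == LOCAL) break;
        if (p == EAST) x++; else if (p == WEST) x--;
        else if (p == NORTH) y++; else y--;
    }
    return 0;
}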
     Deterministic Routing -
      E-cube Routing Algorithm




Dimension-order routing algorithm for hypercubes (see the sketch below)



                                                   13
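A minimal C sketch of the e-cube decision, assuming n-bit node addresses and that the packet is forwarded along the lowest-numbered dimension in which the current and destination addresses still differ (names are illustrative):

#include <stdio.h>

/* E-cube (dimension-order) routing for an n-dimensional hypercube.
 * Returns the dimension to traverse next, or -1 on arrival. */
int ecube_next_dim(unsigned cur, unsigned dst, int n)
{
    unsigned diff = cur ^ dst;            /* address bits that still differ */
    for (int d = 0; d < n; d++)
        if (diff & (1u << d))
            return d;                     /* lowest differing dimension */
    return -1;                            /* cur == dst: deliver locally */
}

int main(void)
{
    /* From node 010 to node 111 in a 3-cube: dimension 0, then dimension 2. */
    unsigned cur = 2, dst = 7;
    int d;
    while ((d = ecube_next_dim(cur, dst, 3)) >= 0) {
        printf("traverse dimension %d\n", d);
        cur ^= 1u << d;                   /* move to the neighbouring node */
    }
    return 0;
}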
Oblivious Routing



                    14
Oblivious (unconscious) Routing

Always choose a route without taking the state of the network into account
   Random algorithms that do not consider the network state are oblivious algorithms
   Oblivious routing includes deterministic routing algorithms as a subset

                                            15
     Minimal Oblivious Routing
    Minimal oblivious routing attempts to achieve the load balance of randomized routing without giving up locality
    This is done by restricting routes to minimal paths
    Again, routing is done in two steps:
     1. Route to a random node
     2. Route to the destination

                                            16
       Minimal Oblivious Routing -
                (Torus)
Idea: For each packet, randomly determine a node x inside the minimal quadrant, such that the packet is routed from source node s to x and then from x to destination node d.

Assumption: At each node, routing in either the x or y direction is allowed.

[Figure: 4x4 torus with nodes 00..33]
                                              17
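A minimal C sketch of the random-intermediate selection on a k x k torus (an illustrative sketch, not the slides' implementation); the second phase, routing s -> x -> d over minimal paths, could then use dimension-order routing:

#include <stdio.h>
#include <stdlib.h>

/* Shortest signed offset from a to b on a ring of size k. */
static int ring_delta(int a, int b, int k)
{
    int d = (b - a + k) % k;              /* 0 .. k-1, going "up"            */
    return (d <= k / 2) ? d : d - k;      /* prefer the shorter direction    */
}

/* Random coordinate lying on a minimal path from a to b on a ring of size k. */
static int random_between(int a, int b, int k)
{
    int d = ring_delta(a, b, k);
    int step = (d >= 0) ? 1 : -1;
    int hops = abs(d);
    int off = (hops > 0) ? (rand() % (hops + 1)) : 0;   /* 0 .. |d|          */
    return ((a + step * off) % k + k) % k;
}

/* Phase 1: pick a random intermediate node x in the minimal quadrant. */
void pick_intermediate(int sx, int sy, int dx, int dy, int k, int *ix, int *iy)
{
    *ix = random_between(sx, dx, k);
    *iy = random_between(sy, dy, k);
}

int main(void)
{
    int ix, iy;
    pick_intermediate(0, 0, 2, 1, 4, &ix, &iy);   /* s = 00, d = 21, 4x4 torus */
    printf("intermediate node: %d%d\n", ix, iy);
    return 0;
}

With s = 00 and d = 21 on the 4x4 torus of the slides, this picks one of the six quadrant nodes 00, 10, 20, 01, 11, 21 with equal probability.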
            Minimal Oblivious Routing -
                     (Torus)
 For each node x in the quadrant (00, 10, 20, 01, 11, 21)
   ◦ Determine a minimal route from 00 to 21 via x
 Start with x = 00
   ◦ Three possible routes:
       (00, 01, 11, 21) (p = 0.33)
       (00, 10, 20, 21) (p = 0.33)
       (00, 10, 11, 21) (p = 0.33)

[Figure: 4x4 torus with nodes 00..33]
                                                        18
          Minimal Oblivious Routing -
                   (Torus)
 x = 01
   ◦ One possible route:
       (00, 01, 11, 21) (p = 1)

[Figure: 4x4 torus with nodes 00..33]
                                               19
          Minimal Oblivious Routing -
                   (Torus)
 x = 10
   ◦ Two possible routes:
       (00, 10, 20, 21) (p = 0.5)
       (00, 10, 11, 21) (p = 0.5)

[Figure: 4x4 torus with nodes 00..33]
                                                    20
           Minimal Oblivious Routing -
                    (Torus)
 x = 11
   ◦ Two possible routes:
       (00, 10, 11, 21) (p = 0.5)
       (00, 01, 11, 21) (p = 0.5)

[Figure: 4x4 torus with nodes 00..33]
                                                    21
         Minimal Oblivious Routing -
                  (Torus)
 x = 20
   ◦ One possible route:
       (00, 10, 20, 21) (p = 1)

[Figure: 4x4 torus with nodes 00..33]
                                                   22
         Minimal Oblivious Routing -
                  (Torus)
 x = 21
   ◦ Three possible routes:
       (00, 01, 11, 21) (p = 0.33)
       (00, 10, 20, 21) (p = 0.33)
       (00, 10, 11, 21) (p = 0.33)

[Figure: 4x4 torus with nodes 00..33]
                                                   23
            Minimal Oblivious Routing -
                     (Torus)
 Adding the probabilities on each channel
 Example, link (00,01):
   ◦ P = 1/3 for x = 00
   ◦ P = 1 for x = 01
   ◦ P = 0 for x = 10
   ◦ P = 1/2 for x = 11
   ◦ P = 0 for x = 20
   ◦ P = 1/3 for x = 21
   ◦ P(00,01) = (2*1/3 + 1/2 + 1)/6 = 2.17/6

[Figure: 4x4 torus with nodes 00..33]
                                                     24
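A small C check of the arithmetic above, assuming the intermediate node x is chosen uniformly (probability 1/6 for each of the six quadrant nodes):

#include <stdio.h>

int main(void)
{
    /* P(link (00,01) is used | intermediate node x), taken from the slide. */
    double p_given_x[6] = {
        1.0 / 3.0,   /* x = 00 */
        1.0,         /* x = 01 */
        0.0,         /* x = 10 */
        0.5,         /* x = 11 */
        0.0,         /* x = 20 */
        1.0 / 3.0    /* x = 21 */
    };
    double load = 0.0;
    for (int i = 0; i < 6; i++)
        load += p_given_x[i] / 6.0;        /* each x is chosen with p = 1/6 */
    printf("expected load on (00,01) = %.3f (= 2.17/6)\n", load);
    return 0;
}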
         Minimal Oblivious Routing -
                  (Torus)
 Results:
   ◦ Load is not very balanced
   ◦ The path between nodes 10 and 11 is very seldom used (p = 1.67/6)
 Good locality performance is achieved at the expense of worst-case performance

[Figure: channel loads in the example quadrant: p = 3.83/6 on (00,10) and (11,21); p = 2.17/6 on (00,01), (01,11), (10,20), and (20,21); p = 1.67/6 on (10,11)]
                                                               25
Adaptive Routing
   (route influenced by traffic along the way)




                                                 26
            Adaptive Routing
 Uses network state to make routing
 decisions
  Buffer occupancies often used
   Coupled with the flow control mechanism
Local   information readily available
  Global information more costly to obtain
  Network state can change rapidly
  Use of local information can lead to non‐optimal
  choices
Can be minimal or non‐minimal
                                                      27
         Adaptive Routing
   Local Information not enough


  0      1    2      3   4   5     6   7




 In   each cycle:
  Node 5 sends packet to node 6
  Node 3 sends packet to node 7



                                           28
         Adaptive Routing
   Local Information not enough
 Node 3 does not know about the traffic between nodes 5 and 6 until the input buffers between nodes 3 and 5 are completely filled with packets!


    0    1   2    3    4    5    6    7




                                          29
         Adaptive Routing
  Local Information is not enough
 Adaptive flow control works better with smaller buffers, since small buffers fill faster and congestion is thus propagated earlier to the sensing node (stiff backpressure)


    0    1    2    3    4    5    6    7




                                             30
        Adaptive Routing
How   does the adaptive routing
 algorithm sense the state of the
 network?
It can only sense current local
 information
Global information is based on historic
 local information
Changes in the traffic flow in the
 network are observed much later
                                       31
      Minimal Adaptive Routing

 Minimal adaptive routing chooses among the minimal routes from source s to destination d

[Figure: 4x4 torus with nodes 00..33]
                                       32
  Minimal Adaptive Routing

 At each hop, a routing function generates a productive output vector that identifies which output channels of the current node will move the packet closer to its destination
 Network state is then used to select one of these channels for the next hop

[Figure: 4x4 torus with nodes 00..33]
                                         33
     Minimal Adaptive Routing

 Good at locally balancing load
 Poor at globally balancing load
 Minimal adaptive routing algorithms are unable to avoid congestion of source-destination pairs with no minimal path diversity

[Figures: a case where local congestion cannot be avoided and a case where local congestion can be avoided]
                                                              34
       Fully Adaptive Routing

 Fully-adaptive routing does not restrict packets to take the shortest path
 Misrouting is allowed
 This can help to avoid congested areas and improves load balance

[Figure: 4x4 torus with nodes 00..33]
                                                35
            Fully Adaptive Routing
                   Live-Lock
 Fully-adaptive routing may result in livelock!
 Mechanisms must be added to prevent livelock
   ◦ Misrouting may only be allowed a fixed number of times

[Figure: 4x4 torus with nodes 00..33]
                                                 36
Summary of Routing Algorithms
 Deterministic routing is a simple and inexpensive routing algorithm, but it does not utilize path diversity and thus is weak on load balancing

 Oblivious algorithms often give good results, since they allow load balancing and their effects are easy to analyze

 Adaptive algorithms, though superior in theory, suffer from the fact that global information is not available at a local node

                                                37
Summary of Routing Algorithms
Latency is a paramount concern
   Minimal routing is most common for NoC
   Non-minimal routing can avoid congestion and deliver low latency
To date, NoC research favors DOR for simplicity and deadlock freedom
Only unicast routing has been covered here
   Recent work extends on-chip routing to support multicast

                                              38
       Part 4

NoC Routing Mechanisms




                         39
                 Routing
The term routing mechanics refers to the
mechanism that is used to implement any
routing algorithm

Two   approaches:
  Fixed routing tables at the source or at
   each hop
  Algorithmic routing uses specialized
   hardware to compute the route or next
   hop at run-time
                                              40
         Table-based Routing
Two   approaches:
  Source-table routing implements all-at-
   once routing by looking up the entire route
   at the source
  Node-table routing performs incremental
   routing by looking up the hop-by-hop
   routing relation at each node along the
   route
Major   advantage:
  A routing table can support any routing
  relation on any topology
                                             41
         Table-based Routing




Example routing mechanism for deterministic
source routing NoCs. The NI uses a LUT to store
the route map.
                                                  42
             Source Routing
 All routing decisions are made at the source terminal
 To route a packet:
   1) the table is indexed using the packet destination
   2) a route or a set of routes is returned
   3) one route is selected
   4) the route is prepended to the packet
 Because of its speed, simplicity, and scalability, source routing is very often used for deterministic and oblivious routing
                                              43
               Source Routing - Example
 The example shows a source routing table for a 4x2 torus network (nodes 00, 10, 20, 30 and 01, 11, 21, 31)
 In this example there are two alternative routes for each destination
 Each node has its own routing table
 (Note: in this example the order of X and Y should be the opposite, i.e. 21 -> 12)

Source routing table for node 00 of the 4x2 torus network:

Destination   Route 0   Route 1
00            X         X
10            EX        WWWX
20            EEX       WWX
30            WX        EEEX
01            NX        SX
11            NEX       ENX
21            NEEX      WWNX
31            NWX       WNX

Example: routing from 00 to 21. The table is indexed with destination 21, which returns the two routes NEEX and WWNX; the source arbitrarily selects NEEX.
                                                                                         44
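A minimal C sketch of source-table routing for node 00 of the 4x2 torus above. The table contents are taken from the slide; the random choice between the two stored routes is one possible (illustrative) selection policy.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Source routing table of node 00 (4x2 torus), two routes per destination.
 * Symbols: N, S, E, W select an output port, X means "eject here". */
struct entry { const char *dest, *route0, *route1; };

static const struct entry table00[] = {
    { "00", "X",    "X"    }, { "10", "EX",   "WWWX" },
    { "20", "EEX",  "WWX"  }, { "30", "WX",   "EEEX" },
    { "01", "NX",   "SX"   }, { "11", "NEX",  "ENX"  },
    { "21", "NEEX", "WWNX" }, { "31", "NWX",  "WNX"  },
};

/* Look up the destination and return one of the stored routes;
 * the chosen route would then be prepended to the packet header. */
const char *source_route(const char *dest)
{
    for (size_t i = 0; i < sizeof table00 / sizeof table00[0]; i++)
        if (strcmp(table00[i].dest, dest) == 0)
            return (rand() & 1) ? table00[i].route1 : table00[i].route0;
    return NULL;                                   /* unknown destination */
}

int main(void)
{
    printf("route 00 -> 21: %s\n", source_route("21"));  /* NEEX or WWNX */
    return 0;
}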
 Arbitrary Length Encoding of
        Source Routes
Advantage:
  It can be used for arbitrary-sized
   networks
The  complexity of routing is moved
 from the network nodes to the
 terminal nodes
But routers must be able to handle
 arbitrary length routes


                                        45
      Arbitrary Length-Encoding
 Router has
    16-bit phits
    32-bit flits
    Route has 13 hops: NENNWNNENNWNN
 Extra symbols:
    P: phit continuation
    F: flit continuation
 The table entries in the terminals must be of arbitrary length

[Figure: the route encoded as port-selector symbols packed into phits and flits]
                                 46
       Node-Table Routing
Table-based  routing can also be
 performed by placing the routing table
 in the routing nodes rather than in
 the terminals

Node-table  routing is appropriate for
 adaptive routing algorithms, since it
 can use state information at each node


                                      47
        Node-Table Routing
 A table lookup is required when a packet arrives at a router, which takes additional time compared to source routing

 Scalability is sacrificed, since different nodes need tables of varying size

 It is difficult to give two packets arriving from different nodes different ways through the network without expanding the tables
                                               48
                     Example

 The table shows a set of routing tables; there are two choices from a source to a destination
 Routing table for node 00

[Figure: 4x2 torus with nodes 00..31, and the routing table for node 00 with two output-port choices per destination. Note: bold-font ports are misroutes]                 49
                      Example
Livelock can occur: consider a packet passing through node 00 destined for node 11.

If the entry for (00 -> 11) is N, the packet goes to node 10, and if the entry for (10 -> 11) is S, it goes back to node 00 => 00 <-> 10 (livelock)

[Figure: 4x2 torus with nodes 00..31]
                                                             50
       Algorithmic Routing

Instead of using a table, algorithms
 can be used to compute the next route

In order to be fast, the algorithms are usually not very complicated and are implemented in hardware




                                       51
                 Algorithmic Routing -
                       Example
 Dimension-Order Routing
   ◦ sx and sy indicate the preferred directions
       sx = 0: +x;  sx = 1: -x
       sy = 0: +y;  sy = 1: -y
   ◦ Δx and Δy represent the number of hops in the x and y direction
   ◦ The PDV (productive direction vector) is used as an input for the selection of a route

[Figure: routing logic; one field determines the type of the routing, and the PDV indicates which channels advance the packet]                                    52
       Algorithmic Routing -
             Example
A minimal oblivious router - implemented by randomly selecting one of the active bits of the PDV as the selected direction
Minimal adaptive router - achieved by making the selection based on the lengths of the respective output queues (Qs)
Fully adaptive router - implemented by also picking an unproductive direction if the output queue lengths exceed a threshold
                                       53
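A minimal C sketch of building a productive direction vector and of the oblivious and adaptive selections described above. The bit layout, the queue-length array, and the function names are illustrative assumptions, not the slides' hardware.

#include <stdio.h>
#include <stdlib.h>

/* PDV bits: one per output direction of a 2-D mesh/torus router. */
enum { PX = 1 << 0, NX = 1 << 1, PY = 1 << 2, NY = 1 << 3 };

/* Build the PDV from the preferred directions (sx, sy) and the remaining
 * hop counts (dx, dy): a direction is productive only if hops remain. */
unsigned pdv(int sx, int sy, int dx, int dy)
{
    unsigned v = 0;
    if (dx > 0) v |= sx ? NX : PX;     /* sx = 0: +x, sx = 1: -x */
    if (dy > 0) v |= sy ? NY : PY;     /* sy = 0: +y, sy = 1: -y */
    return v;
}

/* Minimal oblivious: pick one productive direction at random. */
unsigned select_oblivious(unsigned v)
{
    unsigned choice[4];
    int n = 0;
    for (int b = 0; b < 4; b++)
        if (v & (1u << b)) choice[n++] = 1u << b;
    return n ? choice[rand() % n] : 0;             /* 0: deliver locally */
}

/* Minimal adaptive: among productive directions, pick the one whose
 * output queue is shortest (qlen[] is indexed like the PDV bits). */
unsigned select_adaptive(unsigned v, const int qlen[4])
{
    int best = -1;
    for (int b = 0; b < 4; b++)
        if ((v & (1u << b)) && (best < 0 || qlen[b] < qlen[best]))
            best = b;
    return best >= 0 ? (1u << best) : 0;
}

/* A fully adaptive router would additionally allow an unproductive
 * direction when the productive output queues exceed a threshold. */

int main(void)
{
    int qlen[4] = { 3, 0, 1, 0 };              /* illustrative queue lengths */
    unsigned v = pdv(0, 0, 2, 1);              /* hops remain in +x and +y   */
    printf("oblivious pick: 0x%x, adaptive pick: 0x%x\n",
           select_oblivious(v), select_adaptive(v, qlen));
    return 0;
}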
                Summary
Routing   Mechanics
  Table based routing
  Source routing
  Node-table routing
  Algorithmic routing




                          54
                 Exercise
Compression of source routes. In the source
  routes, each port selector symbol
  [N,S,W,E, and X] was encoded with three
  bits. Suggest an alternative encoding to
  reduce the average length (in bits) required
  to represent a source route. Justify your
  encoding in terms of typical routes that
  might occur on a torus. Also compare the
  original three bits per symbol with your
  encoding on the following routes:
(a)NNNNNEEX
(b)WNEENWWWWWNX                               55
         Part 5

     NoC Flow Control
Resources in a Network Node
  Bufferless Flow Control
   Buffered Flow control

                              56
         Flow Control (FC)
FC determines how the resources of a network, such as channel bandwidth and buffer capacity, are allocated to packets traversing the network.

The goal is to use the resources as efficiently as possible to allow a high throughput

An efficient FC is a prerequisite for achieving good network performance        57
             Flow Control
FC   can be viewed as a problem of
  Resource allocation
  Contention resolution
Resources in the form of channels, buffers, and state must be allocated to each packet
If two packets compete for the same channel, flow control can only assign the channel to one packet, but it must also deal with the other packet
                                      58
               Flow Control
Flow Control can be divided into:

1.Bufferless   flow control
  Packets are either dropped or misrouted

2.Buffered   flow control
  Packets that cannot be routed via the
   desired channel are stored in buffers

                                             59
 Resources in a Network Node
Control   State
  Tracks the resources allocated to the packet in
   the node and the state of the packet
Buffer
   The packet is stored in a buffer before it is sent to the next node


Bandwidth
  To travel to the next node bandwidth has to be
   allocated for the packet
                                                     60
 Units of Resource Allocation -
        Packet or Flits?
Contradictory    requirements on
 packets
  Packets should be very large in order to
   reduce overhead of routing and sequencing
  Packets should be very small to allow
   efficient and fine-grained resource
   allocation and minimize blocking latency
Flits   try to eliminate this conflict
  Packets can be large (low overhead)
  Flits can be small (efficient resource
   allocation)                              61
 Units of Resource Allocation -
     Size: Phit, Flit, Packet

There  are no fixed rules for the size
 of phits, flits and packets

Typical   values
  Phits: 1 bit to 64 bits
  Flits: 16 bits to 512 bits
  Packets: 128 bits to 1024 bits


                                          62
      Bufferless Flow Control
No buffers means less implementation cost
If more than one packet has to be routed to the same output, one of them has to be
    Misrouted, or
    Dropped
Example: two packets, A and B (each consisting of several flits), arrive at a network node.

                                               63
     Bufferless Flow Control
Packet B is dropped and must be resent
There must be a protocol that informs the sending node that the packet has been dropped
  Example: resend if no acknowledgement has been received within a given time




                                            64
     Bufferless Flow Control
Packet  B is misrouted
No further action is required here, but at the receiving node packets have to be sorted back into their original order




                                      65
           Circuit Switching



 Circuit switching is a bufferless flow control, where several channels are reserved to form a circuit
   A request (R) propagates from source to destination and is answered by an acknowledgement (A)
   Then the data is sent (here two five-flit packets (D)), and a tail flit (T) is sent to deallocate the channels
                                                        66
           Circuit Switching



Circuit switching does not suffer from dropping or misrouting packets
However, there are two weaknesses:
  High latency: T = 3 H tr + L/b
  Low throughput, since the channel is used for a large fraction of the time for signaling rather than for delivery of the payload
                                                  67
         Circuit Switching Latency

           T = 3 H tr + L/b
Where:
H: number of hops from source to destination
tr: per-hop routing (header) delay
L: packet length in bits
b: channel bandwidth

 Note: 3 x header latency, because the path from source to destination must be traversed 3 times to deliver the packet: once in each direction to set up the circuit, and then again to deliver the first flit
                                                                  68
      Buffered Flow Control
More  efficient flow control can be
 achieved by adding buffers

With  sufficient buffers packets do
 not need to be misrouted or dropped,
 since packets can wait for the
 outgoing channel to be ready



                                        69
      Buffered Flow Control
Two main approaches:

1.Packet-Buffer   Flow Control
  Store-And-Forward
  Cut-Through

2.Flit-Buffer   Flow Control
  Wormhole Flow Control
  Virtual Channel Flow Control
                                  70
 Store & Forward Flow Control




 Each node along a route waits until a packet
 is completely received (stored) and then the
 packet is forwarded to the next node
Two     resources are needed
  Packet-sized buffer in the switch
  Exclusive use of the outgoing channel
                                             71
 Store & Forward Flow Control




 Advantage:   While waiting to acquire
  resources, no channels are being held idle
  and only a single packet buffer on the
  current node is occupied
 Disadvantage: Very high latency
   T = H (tr + L/b)
                                               72
    Cut-Through Flow Control

Advantages
  Cut-through
   reduces the latency
  T = H tr + L/b
 Disadvantages
   Poor utilization of buffers, since they are allocated in units of packets
  Contention latency is increased, since packets
   must wait until a whole packet leaves the occupied
   channel
                                                    73
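A small C comparison of the three latency formulas (circuit switching T = 3 H tr + L/b, store-and-forward T = H (tr + L/b), cut-through T = H tr + L/b); the numeric values of H, tr, L, and b are illustrative:

#include <stdio.h>

int main(void)
{
    double H  = 4.0;      /* hops                              */
    double tr = 2.0;      /* per-hop router delay              */
    double L  = 512.0;    /* packet length in bits             */
    double b  = 64.0;     /* channel bandwidth, bits per time unit */

    double t_circuit     = 3.0 * H * tr + L / b;   /* circuit switching  */
    double t_saf         = H * (tr + L / b);       /* store-and-forward  */
    double t_cut_through = H * tr + L / b;         /* cut-through        */

    printf("circuit switching: %.1f\n", t_circuit);     /* 32.0 */
    printf("store-and-forward: %.1f\n", t_saf);         /* 40.0 */
    printf("cut-through:       %.1f\n", t_cut_through); /* 16.0 */
    return 0;
}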
         Wormhole Flow Control
 Wormhole FC operates like cut-through, but
 with channel and buffers allocated to flits
 rather than packets

 When the head flit arrives at a node, it must acquire resources (a virtual channel and a buffer) before it can be forwarded to the next node

 Tail flits behave like body flits, but they also release the channel


                                                  74
  Wormhole (WH) Flow Control
Virtualchannels hold the state needed
 to coordinate the handling of flits of a
 packet over a channel

Comparison   to cut-through
  wormhole flow control makes far more
   efficient use of buffer space
   Throughput may be lower, since wormhole flow control may block a channel mid-packet
                                            75
  Example for WH Flow Control




Input  virtual channel is in idle state (I)
Upper output channel is occupied,
 allocated to lower channel (L)            76
 Example for WH Flow Control




Inputchannel enters waiting state (W)
Head flit is buffered
                                     77
  Example for WH Flow Control




 Body  flit is also buffered
 No more flits can be buffered, thus congestion arises
  if more flits want to enter the switch             78
 Example for WH Flow Control




Virtualchannel enters active state (A)
Head flit is output on upper channel
Second body flit is accepted
                                      79
 Example for WH Flow Control




First    body flit is output
   Tail flit is accepted
                                80
 Example for WH Flow Control




Second   body flit is output
                                81
 Example for WH Flow Control




Tail flit is output
 The virtual channel is deallocated and returns to the idle state                  82
        Wormhole Flow Control




 The main advantage of wormhole over cut-through is that the buffers in the routers do not need to be able to hold full packets, but only need to store a number of flits
 This allows the use of smaller and faster routers
                                             83
        Part 6
NoC Flow Control (continued)
          Blocking

Virtual Channel-Flow Control

   Virtual Channel Router

 Credit-Based Flow Control

   On/Off Flow Control

   Flow Control Summary
                               84
           Blocking -
   Cut-Through and Wormhole
[Figure: cut-through (buffer size 1 packet) vs. wormhole (buffer size 2 flits) when a packet is blocked]

 If a packet is blocked, the flits of the wormhole packet are stored in different routers
                                                      85
        Wormhole Flow Control




 There is only one virtual channel for each physical
  channel
 Packet A is blocked and cannot acquire channel p
 Though channels p and q are idle, packet A cannot use these channels, since B owns channel p

                                                        86
  Virtual Channel-Flow Control
 In virtual-channel flow control, several virtual channels are associated with a single physical channel
 This allows use of the bandwidth that is otherwise left idle when a packet blocks the channel
 Unlike in wormhole flow control, subsequent flits are not guaranteed bandwidth, since they have to compete for bandwidth with other flits

                                               87
  Virtual Channel Flow Control
There  are several virtual channels
 for each physical channel
 Packet A can use a second virtual channel and thus proceed over channels p and q




                                       88
       Virtual Channel Allocation
 Flits   must be delivered in order, H, B, …B, T.
   Only the head flit carries routing information

 Allocate VC at the packet level, i.e., packet-
  by-packet
   The head flit responsible for allocating VCs along the route.
   Body and tail flits must follow the VC path, and the tail flit
    releases the VCs.

 The flits of a packet cannot interleave with
  those of any other packet




                                                                     89
    Virtual Channel Flow Control -
     Fair Bandwidth Arbitration
If VCs interleave their flits, the result is a high average latency




                                       90
 Virtual Channel Flow Control -
 Winner-Take-All Arbitration
A winner-take-all arbitration reduces the average latency with no throughput penalty




                                        91
 Virtual Channel Flow Control -
        Buffer Storage
Buffer storage is organized in two
 dimensions
  Number of virtual channels
  Number of flits that can be buffered per
   channel




                                              92
   Virtual Channel Flow Control -
          Buffer Storage
The virtual channel buffer should be at least as deep as needed to cover the round-trip credit latency

In general, it is usually better to add more virtual channels than to increase the buffer size


                                         93
         Virtual Channel




A: active
W: waiting
I: idle


                           94
Virtual Channel Router




                         95
         Buffer Organization




Single buffer per   Multiple fixed length
input               queues per physical
                    channel
                                            96
       Buffer Management
In buffered FC there is a need for communication between nodes in order to inform them about the availability of buffers
   Backpressure informs upstream nodes that they must stop sending to a downstream node when the buffers of that downstream node are full
                             Traffic Flow


             upstream node             downstream node
                                                         97
    Credit-Based Flow Control
 The   upstream router keeps a count of the
  number of free flit buffers in each virtual
  channel downstream
 Each time the upstream router forwards a
  flit, it decrements the counter
 If a counter reaches zero, the downstream
  buffer is full and the upstream node cannot
  send a new flit
If the downstream node forwards a flit, it frees the associated buffer and sends a credit to the upstream router, which increments its counter                        98
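A minimal C sketch of the upstream credit counter described above, with one counter per downstream virtual channel (the type and function names are illustrative):

/* Credit counter kept by the upstream router for one downstream VC. */
typedef struct {
    int credits;                      /* free flit buffers known to exist downstream */
} vc_credit_t;

/* Check before forwarding a flit on this VC. */
int can_send(const vc_credit_t *vc)
{
    return vc->credits > 0;           /* 0 means the downstream buffer is full */
}

void on_flit_sent(vc_credit_t *vc)
{
    vc->credits--;                    /* one more downstream buffer now in use */
}

/* Called when a credit returns from the downstream router,
 * i.e. it forwarded the flit and freed the associated buffer. */
void on_credit_received(vc_credit_t *vc)
{
    vc->credits++;
}

int main(void)
{
    vc_credit_t vc = { 4 };           /* downstream VC has 4 flit buffers */
    if (can_send(&vc))
        on_flit_sent(&vc);            /* forward a flit: 3 credits remain */
    on_credit_received(&vc);          /* flit left downstream: back to 4  */
    return 0;
}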
Credit-Based Flow Control




                            99
            Credit-Based Flow Control
 The minimum time between a credit being sent at time t1 and a credit being sent for the same buffer at time t5 is the credit round-trip delay tcrt

[Figure label: all buffers on the downstream node are full]




                                                  100
       Credit-Based Flow Control
 If there is only a
 single flit buffer, a
 flit waits for a new
 credit and the
 maximum throughput
 is limited to one flit
 for each tcrt

 The bit rate would then be Lf / tcrt, where Lf is the length of a flit in bits
                                   101
       Credit-Based Flow Control
 If there are F flit
 buffers on the
 virtual channel, F
 flits could be sent
 before waiting for
 the credit, which
 gives a throughput of
 F flits for each tcrt
 and a bit rate of FLf
 / tcrt



                                   102
       Credit-Based Flow Control
 In order not to limit the throughput by low-level flow control, the number F of flit buffers should be at least

       F >= tcrt * b / Lf     (so that F * Lf / tcrt >= b)

 where b is the bandwidth of the channel
                                   103
    Credit-Based Flow Control
 For each flit sent downstream, a corresponding credit is sent upstream

 Thus there is a large
 amount of upstream
 signaling, which
 especially for small
 flits can represent a
 large overhead!


                                104
              On/Off Flow Control
   On/off Flow control tries to reduce
    the amount of upstream signaling

    An off signal is sent to the upstream
    node, if the number of free buffers
    falls below the threshold Foff

    An on signal is sent to the upstream
    node, if the number of free buffers
    rises above the threshold Fon

With carefully dimensioned buffers, on/off flow control can achieve a very low overhead in the form of upstream signaling
                                             105
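A minimal C sketch of the downstream side of on/off flow control; Foff and Fon are the thresholds named above, and signal_upstream is an illustrative placeholder for the on/off signalling:

#include <stdio.h>

/* Illustrative placeholder for the on/off signal sent upstream
 * (in hardware this would be a wire or a control flit). */
static void signal_upstream(int on)
{
    printf("signal upstream: %s\n", on ? "on" : "off");
}

typedef struct {
    int free_bufs;         /* currently free flit buffers               */
    int f_off, f_on;       /* thresholds Foff and Fon, with Foff < Fon  */
    int sending_allowed;   /* last state signalled to the upstream node */
} onoff_t;

static void update(onoff_t *c)
{
    if (c->sending_allowed && c->free_bufs < c->f_off) {
        c->sending_allowed = 0;
        signal_upstream(0);            /* "off": free buffers fell below Foff */
    } else if (!c->sending_allowed && c->free_bufs > c->f_on) {
        c->sending_allowed = 1;
        signal_upstream(1);            /* "on": free buffers rose above Fon   */
    }
}

static void on_flit_arrived(onoff_t *c)  { c->free_bufs--; update(c); }
static void on_flit_departed(onoff_t *c) { c->free_bufs++; update(c); }

int main(void)
{
    onoff_t ch = { .free_bufs = 8, .f_off = 2, .f_on = 6, .sending_allowed = 1 };
    for (int i = 0; i < 7; i++) on_flit_arrived(&ch);   /* fills up, sends "off" */
    for (int i = 0; i < 6; i++) on_flit_departed(&ch);  /* drains,  sends "on"  */
    return 0;
}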
      Ack/Nack Flow Control
In ack/nack flow
 control the
 upstream node
 sends packets
 without knowing, if
 there are free
 buffers in the
 downstream node


                              106
        Ack/Nack Flow Control
 Ifthere is no buffer
 available
   the downstream node sends
    nack and drops the flit
   the flit must be resent
   flits must be reordered at
    the downstream node
 Ifthere is a buffer
 available
  The downstream node
   sends ack and stores the
   flit in a buffer

                                 107
       Buffer Management
Because  of its buffer and bandwidth
 inefficiency ack/nack is rarely used

Credit-based flow control is used in
 systems with small numbers of
 buffers

On/off  flow control is used in systems
 that have large numbers of flit
 buffers                                108
       Flow Control Summary
Bufferless    flow control
  Dropping, misroute packets
  Circuit switching
Buffered    flow control
  Packet-Buffer Flow Control: SAF vs. Cut Through
  Flit-Buffer Flow Control: Wormhole and Virtual
   Channel
Switch-to-switch       (link level) flow
 control
  Credit-based, On/Off, Ack/Nack
                                                 109
           Part 7

   Router Architecture
  Virtual-channel Router
Virtual channel state fields
    The Router Pipeline
      Pipeline Stalls
                               110
   Router Microarchitecture -
     Virtual-channel Router
Modern    routers are pipelined and work
 at the flit level
Head flits proceed through buffer
 stages that perform routing and
 virtual channel allocation
All flits pass through switch allocation
 and switch traversal stages
Most routers use credits to allocate
 buffer space
                                        111
 Typical Virtual Channel Router
A router's functional blocks can be divided into
  Datapath: handles storage
   and movement of a packets
   payload
      Input buffers
      Switch
      Output buffers
  Control: coordinating the
   movements of the packets
   through the resources of the
   datapath
      Route Computation
      VC Allocator
      Switch Allocator
                                  112
 Typical Virtual Channel Router
 The    input unit contains
  a set of flit buffers
 Maintains the state
  for each virtual
  channel
     G = Global State
     R = Route
     O = Output VC
     P = Pointers
     C = Credits

                                  113
Virtual Channel State Fields
          (Input)




                               114
 Typical Virtual Channel Router
 Duringroute
 computation the
 output port for the
 packet is determined

 Then the packet
 requests an output
 virtual channel from
 the virtual-channel
 allocator

                                  115
  Typical Virtual Channel Router
 Flits are forwarded via the
  virtual channel by allocating
  a time slot on the switch
  and output channel using
  the switch allocator
 Flits are forwarded to the
  appropriate output during
  this time slot
 The output unit forwards
  the flits to the next router
  in the packet’s path


                                   116
Virtual Channel State Fields
          (Output)




                               117
    Packet Rate and Flit Rate
The control of the router operates at
 two distinct frequencies
  Packet Rate (performed once per packet)
  Route computation
  Virtual-channel allocation
  Flit Rate (performed once per flit)
  Switch allocation
  Pointer and credit count update




                                             118
         The Router Pipeline
                     A typical router pipeline
                      includes the following
                      stages
                       RC (Routing
                        Computation)
                       VC (Virtual Channel
no pipeline stalls      Allocation)
                       SA (Switch Allocation)
                        ST (Switch Traversal)


                                                  119
         The Router Pipeline
                      Cycle 0
                        Head flit arrives and the packet is directed to a virtual channel of the input port (G = I)

no pipeline stalls




                                               120
         The Router Pipeline
                      Cycle   1
                       Routing computation
                       Virtual channel state
                        changes to routing (G =
                        R)
                       Head flit enters RC-stage
no pipeline stalls     First body flit arrives at
                        router




                                                     121
         The Router Pipeline
                      Cycle2: Virtual Channel
                      Allocation
                       Route field (R) of virtual
                        channel is updated
                       Virtual channel state is set to
                        “waiting for output virtual
                        channel” (G = V)
no pipeline stalls     Head flit enters VA state
                       First body flit enters RC stage
                       Second body flit arrives at
                        router

                                                          122
         The Router Pipeline
                      Cycle2: Virtual Channel
                      Allocation
                       The result of the routing
                        computation is input to the
                        virtual channel allocator
                       If successful, the allocator
                        assigns a single output
no pipeline stalls      virtual channel
                        The state of the virtual channel is set to active (G = A)


                                                   123
         The Router Pipeline
                      Cycle 3: Switch Allocation
                        All further processing is done on a per-flit basis
                        Head flit enters the SA stage
no pipeline stalls      Any active VC (G = A) that contains buffered flits (indicated by P) and has downstream buffers available (C > 0) bids for a single-flit time slot through the switch from its input VC to the output VC                           124
         The Router Pipeline
                      Cycle 3: Switch
                      Allocation
                       If successful, pointer
                        field is updated
                       Credit field is
                        decremented
no pipeline stalls




                                                 125
         The Router Pipeline
                      Cycle4: Switch
                      Traversal
                       Head flit traverses the
                        switch
                     Cycle    5:
                       Head flit starts
no pipeline stalls      traversing the channel
                        to the next router



                                                 126
         The Router Pipeline
                      Cycle   7:
                       Tail traverses the switch
                       Output VC set to idle
                       Input VC set to idle (G =
                        I), if buffer is empty
                       Input VC set to routing (G
no pipeline stalls
                        = R), if another head
                        flit is in the buffer




                                                     127
         The Router Pipeline
                      Only the head flits enter the RC and VC stages

                      The body and tail flits
                      are stored in the flit
                      buffers until they can
no pipeline stalls
                      enter the SA stage




                                                 128
                 Pipeline Stalls
Pipeline   stalls can be divided into
  Packet stalls
   can occur if the virtual channel cannot advance
    to its R, V, or A state
  Flit stalls
   If a virtual channel is in active state and the
    flit cannot successfully complete switch
    allocation due to
     Lack of flit
     Lack of credit
     Losing arbitration for the switch time slot

                                                      129
    Example for Packet Stall
Virtual-channel   allocation stall
   The head flit of packet A can only enter the VA stage once the tail flit of packet B completes switch allocation and releases the virtual channel




                                              130
   Example for Packet Stall
Virtual-channel allocation stall




The head flit of packet A can only enter the VA stage once the tail flit of packet B completes switch allocation and releases the virtual channel            131
      Example for Flit Stalls

Switch allocation stall




Second body flit fails to allocate the
requested connection in cycle 5
                                         132
      Example for Flit Stalls
Buffer empty stall




Body flit 2 is delayed three cycles. However,
since it does not have to enter the RC and VA
stage the output is only delayed one cycle!
                                            133
               Credits
A  buffer is allocated in the SA stage
 on the upstream (transmitting) node
To reuse the buffer, a credit is
 returned over a reverse channel after
 the same flit departs the SA stage of
 the downstream (receiving) node
When the credit reaches the input unit of the upstream node, the buffer is available and can be reused
                                          134
                Credits
The credit loop can be viewed by means of a token that
   Starts at the SA stage of the upstream node
   Travels downstream with the flit
   Reaches the SA stage at the downstream node
   Returns upstream as a credit



                                              135
       Credit Loop Latency
The credit loop latency tcrt, expressed in flit times, gives a lower bound on the number of flit buffers needed on the upstream side for the channel to operate at full bandwidth
tcrt in flit times is given by: [expression shown as a figure on the original slide]
                                      136
          Credit Loop Latency
If the number of buffers available per
 virtual channel is F, the duty factor of
 the channel will be
      d = min (1, F / tcrt)

The  duty factor will be 100% as long
 as there are sufficient flit buffers to
 cover the round trip latency

                                           137
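As a worked example (with assumed numbers): if tcrt = 8 flit times, then F = 4 flit buffers gives a duty factor d = min(1, 4/8) = 50%, while F = 8 or more buffers gives d = min(1, 8/8) = 100%, i.e. full channel bandwidth.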
                      Credit Stall
Virtual Channel Router with 4 flit buffers




                                             138
           Flit and Credit Encoding
A. Flits and credits are sent over separate lines with separate widths
B. Flits and credits are transported over the same line. This can be done by
      Including credits in flits
      Multiplexing flits and credits at the phit level
    Option (A) is considered more efficient. For a more detailed discussion, see Section 16.6 in the Dally book.




                                                                    139
           Summary
NoC   is a scalable platform for
 billion-transistor chips
 Several driving forces behind it
 Many open research questions
 May change the way we structure
 and model VLSI systems




                     Hong Kong University of Science and Technology, March 2010   140
                  References
OASIS   NoC Architecture Design in
 Verilog HDL, Technical Report,TR-
 062010-OASIS, Adaptive Systems
 Laboratory, the University of Aizu,
 June 2010.
OASIS NoC Project:
http://web-ext.u-aizu.ac.jp/~benab/research/projects/oasis/




                                                              141
Network-on-Chip
     Ben Abdallah, Abderazek
      The University of Aizu
    E-mail: benab@u-aizu.ac.jp




         KUST University, March 2011
                                       142

				