# Designing Packet Buffers for Internet Routers

Wednesday, September 12, 2012

Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University
nickm@stanford.edu
www.stanford.edu/~nickm
## Contents

1. Motivation
   - A 100 Tb/s router
   - A 160 Gb/s packet buffer
2. Theory
   - Generic packet buffer problem
   - Optimal memory management
3. Implementation
## Motivating Design: 100 Tb/s Optical Router

[Figure: linecards connected through an optical switch over 40 Gb/s channels, with request/grant arbitration between linecards and the switch. 100 Tb/s = 625 × 160 Gb/s.]
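The capacity arithmetic in the figure can be checked directly; the variable names below are illustrative, but every number comes from the slide.

```python
# The router's aggregate capacity comes from 625 linecards
# running at 160 Gb/s each.
linecards = 625
line_rate_gbps = 160

total_tbps = linecards * line_rate_gbps / 1000
print(total_tbps)   # -> 100.0 (Tb/s)
```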
## Three stages on a linecard

[Figure: each linecard has three stages running at rate R: a 1st stage (segmentation/frame building), a 2nd stage (main buffering), and a 3rd stage (reassembly). Traffic from each of the N linecards is spread across the stages at rate R/N per internal path.]

- 100% throughput
- No switch scheduling
- Hybrid optical-electrical switch fabric
- Low (almost zero) power
- Can use an optical mesh
- No reconfiguration of internal switch (MEMS)
## 160 Gb/s Linecard

[Figure: linecard datapath. Packets arriving at rate R pass through lookup/processing and segmentation into fixed-size packets (0.4 Gbit of buffering at 3.2 ns access), then into VOQs in the 2nd-stage main buffer (40 Gbit at 3.2 ns), through the switching stage, and finally through 3rd-stage reassembly (0.4 Gbit at 3.2 ns) before departing at rate R.]
## Packet Buffering Problem

Packet buffers for a 160 Gb/s router linecard must hold 40 Gbits.

[Figure: a buffer manager writes one 128 B packet every 6.4 ns (write rate R) into buffer memory, and reads one 128 B packet every 6.4 ns (read rate R) in response to scheduler requests.]

The problem is solved if a single memory can be randomly accessed every 3.2 ns and can store 40 Gb of data.
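The timing figures above follow from simple arithmetic; this sketch re-derives them (variable names are mine, all numbers are from the slide):

```python
# Back-of-the-envelope check of the buffering requirements
# for a 160 Gb/s linecard.

LINE_RATE_GBPS = 160       # linecard rate, Gb/s
PACKET_BITS = 128 * 8      # one 128-byte packet

# Time between packet arrivals at line rate, in nanoseconds:
# bits / (Gb/s) gives ns directly.
arrival_ns = PACKET_BITS / LINE_RATE_GBPS
print(arrival_ns)          # -> 6.4

# The buffer must sustain one write AND one read per packet time,
# so a single shared memory needs a random access every 3.2 ns.
access_ns = arrival_ns / 2
print(access_ns)           # -> 3.2
```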
## Memory Technology

- Use SRAM? Fast enough random access time (+), but too low density to store 40 Gbits of data (−).
- Use DRAM? High density means we can store the data (+), but it cannot meet the random access time (−).
## Can't we just use lots of DRAMs in parallel?

[Figure: ten buffer memories in parallel, each holding one 128 B slice (bytes 0-127, 128-255, …, 1152-1279) of every 1280 B block. The buffer manager writes one 128 B packet every 6.4 ns (write rate R) and reads one 128 B packet every 6.4 ns (read rate R) against scheduler requests, transferring 1280 B per bulk DRAM access.]
## Works fine if there is only one FIFO

[Figure: the same parallel DRAMs; the buffer manager (with on-chip SRAM) accumulates arriving 128 B packets into a 1280 B block and transfers each block across all DRAMs at once, 128 B per device, in both directions.]

Aggregate 1280 B for the queue in fast SRAM, then read and write all DRAMs in parallel.
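The single-FIFO aggregation trick can be sketched in a few lines. The class and method names below are illustrative, not from the talk; only the numbers (128 B packets, 10 devices, 1280 B blocks) come from the slides.

```python
# Sketch of a single-FIFO packet buffer that aggregates small packets
# in fast SRAM and transfers wide blocks to/from parallel DRAM banks.

PACKET = 128             # bytes per fixed-size packet
BANKS = 10               # parallel DRAM devices
BLOCK = PACKET * BANKS   # 1280 B moved to/from all banks at once

class SingleFifoBuffer:
    def __init__(self):
        self.tail_sram = bytearray()                   # partial block being built
        self.dram_banks = [[] for _ in range(BANKS)]   # lists of 128 B slices

    def write(self, packet: bytes):
        """Accumulate packets; flush each full 1280 B block to DRAM."""
        assert len(packet) == PACKET
        self.tail_sram += packet
        if len(self.tail_sram) == BLOCK:
            # One wide access: a 128 B slice goes to each bank in parallel.
            for i, bank in enumerate(self.dram_banks):
                bank.append(bytes(self.tail_sram[i*PACKET:(i+1)*PACKET]))
            self.tail_sram = bytearray()

    def read_block(self) -> bytes:
        """One wide access reads 1280 B: one slice from every bank."""
        return b"".join(bank.pop(0) for bank in self.dram_banks)
```

Each DRAM device now sees only one access per ten packet times, relaxing the per-device random access requirement by the degree of parallelism.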
## In practice, the buffer holds many FIFOs

[Figure: Q FIFOs, each striped in 1280 B blocks across the parallel DRAMs, with the buffer manager writing one 128 B packet every 6.4 ns (write rate R) and reading one 128 B packet every 6.4 ns (read rate R) against scheduler requests.]

How can we write multiple packets into different queues, when no single queue may have a full 1280 B block ready?

E.g., in an IP router, Q might be 200; in an ATM switch, Q might be 10^6.
## Parallel Packet Buffer: Hybrid Memory Hierarchy

[Figure: a large DRAM holds the body of each of the Q FIFOs, striped b bytes at a time, where b is the degree of parallelism. Arriving packets at rate R enter a small tail-SRAM cache holding the FIFO tails; the buffer manager (an ASIC with on-chip SRAM) moves b-byte blocks between SRAM and DRAM; departing packets at rate R are served from a small head-SRAM cache holding the FIFO heads, driven by scheduler requests.]
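The hierarchy on this slide can be sketched as a per-queue generalization of the single-FIFO buffer: each queue gets a tail cache and a head cache in SRAM, with b-byte block transfers to DRAM. Class and method names are mine; the real design additionally bounds the SRAM size with the memory-management algorithm discussed next, which this sketch omits.

```python
# Sketch of the hybrid SRAM/DRAM packet-buffer hierarchy:
# per-queue tail and head caches in SRAM, b-byte blocks in DRAM.

class HybridPacketBuffer:
    def __init__(self, num_queues: int, b: int):
        self.b = b                                              # block size
        self.tails = [bytearray() for _ in range(num_queues)]   # tail SRAM cache
        self.heads = [bytearray() for _ in range(num_queues)]   # head SRAM cache
        self.dram = [[] for _ in range(num_queues)]             # b-byte blocks

    def write(self, q: int, data: bytes):
        """Append to queue q's tail cache; flush full b-byte blocks to DRAM."""
        self.tails[q] += data
        while len(self.tails[q]) >= self.b:
            self.dram[q].append(bytes(self.tails[q][:self.b]))
            del self.tails[q][:self.b]

    def read(self, q: int, n: int) -> bytes:
        """Serve n bytes from queue q's head cache, refilling from DRAM
        (or straight from the tail cache when DRAM holds no full block)."""
        while len(self.heads[q]) < n:
            if self.dram[q]:
                self.heads[q] += self.dram[q].pop(0)
            elif self.tails[q]:
                self.heads[q] += self.tails[q]
                self.tails[q] = bytearray()
            else:
                raise IndexError("read past end of queue")
        out = bytes(self.heads[q][:n])
        del self.heads[q][:n]
        return out
```

DRAM is touched only in b-byte blocks, so each device again sees one access per b/128 packet times; the open question, answered on the next slide, is how much SRAM the head and tail caches need.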
## Problem

What is the minimum size of the SRAM needed so that every packet is available either immediately or within a fixed latency?

Solutions:
- Qb(2 + ln Q) bytes, for zero latency.
- Q(b − 1) bytes, for a latency of Q(b − 1) + 1 time slots.

Examples (160 Gb/s linecard, b = 1280, Q = 625):
1. SRAM = 52 Mbits, for zero latency.
2. SRAM = 6.1 Mbits, with a latency of 40 ms.
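The example figures can be re-derived from the two formulas; the computation below reproduces the slide's numbers if "Mbit" is read as 2^20 bits (an assumption on my part).

```python
import math

# Check the example SRAM sizes for Q = 625 queues and b = 1280 bytes.
Q, b = 625, 1280

zero_latency_bytes = Q * b * (2 + math.log(Q))   # Qb(2 + ln Q)
max_latency_bytes = Q * (b - 1)                  # Q(b - 1)

print(zero_latency_bytes * 8 / 2**20)   # ~51.5 Mbit (slide: 52 Mbit)
print(max_latency_bytes * 8 / 2**20)    # ~6.1 Mbit

# The corresponding pipeline latency is Q(b - 1) + 1 time slots.
print(Q * (b - 1) + 1)                  # 799376
```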
## Discussion

[Figure: required SRAM queue length versus pipeline latency x, for Q = 1000 and b = 10. The curve falls from the zero-latency SRAM size toward the maximum-latency queue length, with the annotated slope dw/dx ∝ 1/x.]
## Technology Assumptions in 2005

DRAM technology:
- Access time ~ 40 ns
- Size ~ 1 Gbit
- Memory bandwidth ~ 16 Gb/s (16 data pins)

On-chip SRAM technology:
- Access time ~ 2.5 ns
- Size ~ 64 Mbits

Serial links:
- Bandwidth ~ 10 Gb/s
- 100 serial links per chip
## Packet Buffer Chip (×4): Details and Status

[Figure: a buffer manager chip flanked by SRAM on each side and connected to a row of DRAMs, handling R/4 in and R/4 out.]

- Incoming: 4 × 10 Gb/s
- Outgoing: 4 × 10 Gb/s
- 35 pins/DRAM × 10 DRAMs = 350 pins
- SRAM memory: 3.1 Mbits of 3.2 ns SRAM
- Implementation starts Fall 2003
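The per-chip numbers are internally consistent with the rest of the talk; a quick check (variable names are mine, figures from the slides):

```python
# Consistency check of the packet-buffer-chip numbers.

chips = 4
rate_per_chip_gbps = 4 * 10        # incoming 4 x 10 Gb/s serial links per chip
print(chips * rate_per_chip_gbps)  # -> 160, the full linecard rate in Gb/s

print(35 * 10)                     # -> 350 pins (35 pins/DRAM x 10 DRAMs)

# Ten DRAMs at ~16 Gb/s each (2005 technology assumption) give
# ~160 Gb/s of raw DRAM bandwidth, matching the line rate.
print(10 * 16)                     # -> 160
```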
