Designing Packet Buffers for Internet Routers
Wednesday, September 12, 2012
Nick McKeown
Professor of Electrical Engineering
and Computer Science, Stanford University
nickm@stanford.edu
www.stanford.edu/~nickm
Contents
1. Motivation
A 100 Tb/s router
160 Gb/s packet buffer
2. Theory
Generic Packet Buffer Problem
Optimal Memory Management
3. Implementation
Motivating Design: 100Tb/s Optical Router
[Figure: linecards attached to an optical switch over 40Gb/s links; request and grant signals are exchanged with an arbitration block.]
(100Tb/s = 625 * 160Gb/s)
Load Balanced Switch
Three stages on a linecard
[Figure: three-stage switch with ports 1, 2, …, N at rate R in each stage and internal links labeled R/N.
1st stage: Segmentation/Frame Building
2nd stage: Main Buffering
3rd stage: Reassembly]
Advantages
Load-balanced switch
100% throughput
No switch scheduling
Hybrid Optical-Electrical Switch Fabric
Low (almost zero) power
Can use an optical mesh
No reconfiguration of internal switch (MEMS)
160 Gb/s Linecard
[Figure: linecard datapath at line rate R.
1st stage (load-balancing): lookup/processing and segmentation into fixed-size packets; 0.4 Gbit buffer at 3.2 ns.
2nd stage (switching): VOQs 1, 2, …, N; 40 Gbit buffer at 3.2 ns.
3rd stage (reassembly): 0.4 Gbit buffer at 3.2 ns.]
Packet Buffering Problem
Packet buffers for a 160Gb/s router linecard
[Figure: a buffer manager in front of a 40Gbit buffer memory. Write rate R: one 128B packet every 6.4ns. Read rate R: one 128B packet every 6.4ns. Reads are driven by scheduler requests.]
The problem is solved if a memory can be (randomly) accessed every 3.2ns and can store 40Gb of data.
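The 6.4ns and 3.2ns figures follow from the packet size and the line rate. A minimal sketch of that arithmetic, assuming one write and one read per packet time; the function names are illustrative and do not come from the slides:

```python
def packet_time_ns(packet_bytes, line_rate_gbps):
    """Time to receive one packet at the line rate, in ns (bits / (Gb/s) = ns)."""
    return packet_bytes * 8 / line_rate_gbps


def required_access_time_ns(packet_bytes, line_rate_gbps):
    """Random access time the memory must sustain, in ns.

    Assumption: each packet time needs one write (arrival) and one read
    (departure), so two accesses must fit into one packet time.
    """
    return packet_time_ns(packet_bytes, line_rate_gbps) / 2


print(packet_time_ns(128, 160))           # 128B packets at 160 Gb/s
print(required_access_time_ns(128, 160))  # one read plus one write per packet time
```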
Memory Technology
Use SRAM?
+ Fast enough random access time, but
- Too low density to store 40Gbits of data.
Use DRAM?
+ High density means we can store data, but
- Can’t meet random access time.
Can’t we just use lots of DRAMs in parallel?
Read/write 1280B every 32ns
[Figure: parallel buffer memories, each holding one 128B slice of a 1280B block (bytes 0-127, 128-255, …, 1152-1279). The buffer manager writes 1280B and reads 1280B at a time, while packets arrive and depart at rate R (one 128B packet every 6.4ns). Reads are driven by scheduler requests.]
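The 1280B block size can be checked against the line rate. A small sketch, assuming the 32ns on the slide is the time per wide memory access and that reads and writes alternate; the function names and the reading of the byte ranges as ten 128B-wide slices are assumptions, not statements from the slides:

```python
def block_bytes(line_rate_gbps, access_time_ns):
    """Bytes each wide access must move to keep up with the line rate.

    Assumption: the memory serves a write stream and a read stream, each at
    the line rate, and completes one wide access every access_time_ns.
    Gb/s multiplied by ns gives bits.
    """
    bits_per_access = 2 * line_rate_gbps * access_time_ns
    return bits_per_access // 8


def num_parallel_memories(block, slice_bytes):
    """Number of memories if each one stores a slice_bytes-wide slice of the block."""
    return block // slice_bytes


b = block_bytes(160, 32)
print(b)                              # bytes per wide access
print(num_parallel_memories(b, 128))  # memories needed with 128B slices
```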
Works fine if there is only one FIFO
[Figure: a single FIFO stored as 1280B blocks striped across the parallel buffer memories (bytes 0-127, 128-255, …, 1152-1279). The buffer manager (on-chip SRAM) collects arriving 128B packets into 1280B blocks on the write side and splits 1280B blocks back into 128B packets on the read side. Packets arrive and depart at rate R, one 128B packet every 6.4ns. Reads are driven by scheduler requests.]
Aggregate 1280B for the queue in fast SRAM, and read and write to all DRAMs in parallel.
In practice, buffer holds many FIFOs
e.g. In an IP router, Q might be 200. In an ATM switch, Q might be 10^6.
How can we write multiple packets into different queues?
[Figure: Q FIFOs (1, 2, …, Q), each stored as 1280B blocks striped across the parallel buffer memories (bytes 0-127, 128-255, …, 1152-1279). DRAM is still read and written 1280B at a time, but the buffer manager now holds partial blocks for individual queues (labeled 320B and ?B in the figure). Packets arrive and depart at rate R, one 128B packet every 6.4ns. Reads are driven by scheduler requests.]
Parallel Packet Buffer
Hybrid Memory Hierarchy
Large DRAM memory holds the body of FIFOs
Small tail SRAM: cache for FIFO tails
Small head SRAM: cache for FIFO heads
b = degree of parallelism
[Figure: arriving packets enter the buffer manager (ASIC with on-chip SRAM) at rate R and are held in the tail SRAM, one region per FIFO (1, 2, …, Q). The buffer manager writes b bytes at a time to DRAM and reads b bytes at a time from DRAM into the head SRAM, from which packets depart at rate R in response to scheduler requests.]
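The hierarchy above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class name `HybridFifoBuffer` and its methods are hypothetical, DRAM is refilled on demand rather than by the memory management algorithm analyzed in the talk, and timing is ignored. It shows just the data movement: tails accumulate in SRAM, DRAM is touched only in b-byte blocks, and heads are served from SRAM.

```python
from collections import deque


class HybridFifoBuffer:
    """Toy model of Q FIFOs with tail SRAM, DRAM body, and head SRAM."""

    def __init__(self, num_queues, b):
        self.b = b  # bytes moved per DRAM access (degree of parallelism)
        self.tail = [bytearray() for _ in range(num_queues)]  # tail SRAM
        self.dram = [deque() for _ in range(num_queues)]      # b-byte blocks
        self.head = [bytearray() for _ in range(num_queues)]  # head SRAM
        self.dram_writes = 0
        self.dram_reads = 0

    def write(self, q, data):
        """Append arriving bytes to queue q; spill full b-byte blocks to DRAM."""
        self.tail[q] += data
        while len(self.tail[q]) >= self.b:
            block = bytes(self.tail[q][:self.b])
            del self.tail[q][:self.b]
            self.dram[q].append(block)
            self.dram_writes += 1

    def read(self, q, n):
        """Return up to n bytes from the head of queue q, in FIFO order."""
        while len(self.head[q]) < n:
            if self.dram[q]:
                # Oldest data not yet in the head cache lives in DRAM.
                self.head[q] += self.dram[q].popleft()
                self.dram_reads += 1
            elif self.tail[q]:
                # Assumption: a short queue bypasses DRAM, tail to head.
                self.head[q] += self.tail[q]
                self.tail[q].clear()
            else:
                break  # queue is empty
        out = bytes(self.head[q][:n])
        del self.head[q][:n]
        return out


buf = HybridFifoBuffer(num_queues=2, b=8)
buf.write(0, bytes(range(20)))  # two full blocks go to DRAM, 4 bytes stay in SRAM
buf.write(1, b"abc")            # too short for a DRAM block
print(buf.dram_writes, buf.read(1, 3), buf.read(0, 20) == bytes(range(20)))
```

Every DRAM access in this model moves exactly b bytes, which is the property that lets slow, wide DRAM keep up; the open question posed next is how much SRAM the two caches need.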
Problem
Problem:
What is the minimum size of the SRAM needed so that
every packet is available immediately within a fixed latency?
Solutions:
Qb(2 + ln Q) bytes, for zero latency
Q(b – 1) bytes, for a latency of Q(b – 1) + 1 time slots
Examples:
1. 160Gb/s line card, b = 1280, Q = 625: SRAM = 52Mbits, zero latency.
2. 160Gb/s line card, b = 1280, Q = 625: SRAM = 6.1Mbits, latency is 40µs.
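The two examples can be reproduced from the formulas. A minimal sketch with two assumptions that the slides do not state: 1 Mbit is taken as 2**20 bits, and one time slot is taken as the time to transfer one byte at 160Gb/s. Under those assumptions the results come out near 51.5 Mbit, 6.1 Mbit, and 40 microseconds, in line with the figures quoted above. The function names are illustrative.

```python
import math

MBIT = 2**20  # assumption: 1 Mbit = 2**20 bits


def sram_bytes_zero_latency(Q, b):
    """SRAM size in bytes for zero latency: Qb(2 + ln Q)."""
    return Q * b * (2 + math.log(Q))


def sram_bytes_max_latency(Q, b):
    """SRAM size in bytes when the full pipeline latency is tolerated: Q(b - 1)."""
    return Q * (b - 1)


def latency_time_slots(Q, b):
    """Pipeline latency, in time slots, that goes with the Q(b - 1) byte SRAM."""
    return Q * (b - 1) + 1


Q, b = 625, 1280
byte_time_ns = 8 / 160  # assumption: one time slot = one byte time at 160 Gb/s

print(sram_bytes_zero_latency(Q, b) * 8 / MBIT)        # Mbit, zero latency
print(sram_bytes_max_latency(Q, b) * 8 / MBIT)         # Mbit, maximum latency
print(latency_time_slots(Q, b) * byte_time_ns / 1000)  # latency in microseconds
```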
Discussion
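[Figure: SRAM size (queue length) versus pipeline latency x, for Q=1000, b = 10. The curve runs from the queue length needed for zero latency to the queue length needed at maximum latency; an annotation relates the slope dw/dx to 1/x.]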
Technology Assumptions in 2005
DRAM Technology
Access Time ~ 40 ns
Size ~ 1 Gbits
Memory Bandwidth ~ 16 Gb/s (16 data pins)
On-chip SRAM Technology
Access Time ~ 2.5 ns
Size ~ 64 Mbits
Serial Link Technology
Bandwidth ~ 10 Gb/s
100 serial links per chip
Packet Buffer Chip (x4)
Details and Status
[Figure: a buffer manager with on-chip SRAM on both sides, attached to a bank of DRAMs; traffic enters and leaves at rate R/4.]
Incoming: 4x10 Gb/s
Outgoing: 4x10 Gb/s
35 pins/DRAM x 10 DRAMs = 350 pins
SRAM Memory: 3.1 Mbits with 3.2ns SRAM
Implementation starts Fall 2003
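One way to read the DRAM count is as a capacity limit rather than a bandwidth limit. The sketch below assumes that the 40Gbit buffer and the 160Gb/s line rate are split evenly across the four chips and uses the 1 Gbit, 16 Gb/s DRAM figures from the technology assumptions; the slides do not state how the count of 10 was derived, so this is an interpretation, and the function name is hypothetical.

```python
import math


def drams_per_chip(buffer_gbit, chips, dram_gbit, line_rate_gbps, dram_bw_gbps):
    """Return (needed, by_capacity, by_bandwidth) DRAM counts for one chip."""
    # Capacity: each chip stores an equal share of the total buffer.
    by_capacity = math.ceil(buffer_gbit / chips / dram_gbit)
    # Bandwidth: each chip carries its share of the write and read streams.
    by_bandwidth = math.ceil(2 * line_rate_gbps / chips / dram_bw_gbps)
    return max(by_capacity, by_bandwidth), by_capacity, by_bandwidth


needed, by_capacity, by_bandwidth = drams_per_chip(40, 4, 1, 160, 16)
print(needed, by_capacity, by_bandwidth)  # DRAMs per chip and the two limits
print(needed * 35)                        # DRAM interface pins at 35 pins/DRAM
```

Under these assumptions capacity is the binding constraint, which matches the 10 DRAMs and 350 pins listed above.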