Quadrics QsNetII: A network for Supercomputing Applications

David Addison, Jon Beecroft, David Hewson, Moray McLaren (Quadrics Ltd.), Fabrizio Petrini (LANL)

August 2003

Hot Chips

You’ve bought your super computer
You’ve constructed your building for it!
So what do you want from the network?
Simple …
Any process to any other process communication in zero time.
That’s it!
Can’t have that, so you need the next best thing

Image Courtesy of LANL

A Quadrics Supercomputer Network
- Ultra low user process to user process latency
- Highest possible (affordable?) compute communications ratio
- Seamless scaling to many 1000s of nodes
- High availability
- Reliable data transfer
- Mixed system and multiple user traffic on one network

Supercomputers
- Los Alamos National Laboratory, ASCI Q: 13880 Gflops
- Lawrence Livermore National Laboratory, MCR: 7634 Gflops
- Lawrence Livermore National Laboratory, ALC: 6586 Gflops
- Pacific Northwest National Laboratory: 4881 Gflops
- Pittsburgh Supercomputing Center, LeMieux: 4463 Gflops
- CEA, Tera: 3680 Gflops

Images courtesy of LANL, LLNL, PNNL, PSC, CEA

QsNetII Components
- Elan 4 network interface card
- Elite 4 switch component
- QsNet II Switch

Topology
- Fat trees give great performance
  - Scales with the number of nodes
  - Fault tolerant structure
  - Uniform connectivity
  - Supports global operations
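To make the scaling point concrete, here is a minimal sketch of fat-tree sizing. It assumes a quaternary fat tree (a 4-ary n-tree) in which each 8-link switch uses 4 links down and 4 links up; that split and the function name `fat_tree_size` are assumptions for illustration, not details taken from the slide.

```python
def fat_tree_size(nodes, arity=4):
    """Size the smallest k-ary n-tree that can hold `nodes` endpoints.

    Returns (stages, capacity, switches, max_switch_hops).
    A k-ary n-tree has k**n endpoints and n * k**(n-1) switches.
    """
    stages = 1
    capacity = arity
    while capacity < nodes:
        stages += 1
        capacity *= arity
    switches = stages * arity ** (stages - 1)
    # Worst case: climb to the top stage and descend again.
    max_switch_hops = 2 * stages - 1
    return stages, capacity, switches, max_switch_hops


for n in (16, 1024, 4000):
    stages, capacity, switches, hops = fat_tree_size(n)
    print(f"{n} nodes: {stages} stages, capacity {capacity}, "
          f"{switches} switches, at most {hops} switch hops")
```

Under these assumptions a 4000-node machine fits in a six-stage tree with room for 4096 nodes, and the longest route crosses 11 switches.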

A Process Communication
[Diagram labels: Cable up to 13m; Elite4 Switch Chip; Cable or fibre up to 150m]

8 byte write latency on a 4000 node machine with 50m of cable
[Stacked bar chart: latency (ns), 0 to 3000, for Elan3 (QsNet) and Elan4 (QsNet II), broken down into Elite switch, cable, Elan and CPU chipset contributions. Data labels on the bars: 385, 218, 231, 1050, 218, 240, 855, 990.]

Elan 4 functional units
- 64-bit virtual addressing
- Short Transaction ENgine (STEN)
- Pipelined (R)DMA engine
- 64-bit RISC processor
  - 16Kbyte on-chip I-cache
- Memory system
  - 32Kbyte on-chip D-cache, pipelined fills, multi-port
  - 64-bit MMU, 128 TLB entries, hash walk engine, mixed page sizes, 16 bit context
  - 64-bit/133MHz PCI-X
  - 64Mbytes ECC DDR RAM
- Link: 2.6 Gbytes/sec total

Elan4 Command Queues
- Enables a user process to send packets into the network with very low latency
- Used to start all operations (RDMA, STEN, RISC threads etc)
- Up to 8K command queues can be allocated
- Command queues can be used by many processes simultaneously from multiple CPUs

Command Programming Model


Command Proc Implementation


Elan4 Command Queues
- The Command Processor can execute commands directly as they are written from the PCI-X bus
- 80ns from PCI-X bus to network link
- Provides auto retry of STEN packets
- DDR-SDRAM used as backing store for queues
- Copes with a main CPU process being timesliced partway through a command stream, and with concurrent access by multiple CPUs
- Copes with occasional “out of order” PIO writes
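The slide does not say how the command processor tolerates out-of-order PIO writes. One way to picture the behaviour is a queue that accepts writes to any slot but only executes the contiguous run of commands that has already arrived. The sketch below is a toy model under that assumption; `ToyCommandQueue`, `pio_write` and the command strings are hypothetical and are not the Elan4 interface.

```python
class ToyCommandQueue:
    """Toy model: commands may arrive out of order but execute in slot order."""

    def __init__(self):
        self.pending = {}     # slot index -> command that has not run yet
        self.next_slot = 0    # next slot the processor is allowed to execute
        self.executed = []    # commands in the order they were executed

    def pio_write(self, slot, command):
        """Record a write, then execute every command that is now in order."""
        self.pending[slot] = command
        while self.next_slot in self.pending:
            self.executed.append(self.pending.pop(self.next_slot))
            self.next_slot += 1


queue = ToyCommandQueue()
queue.pio_write(0, "cmd0")
queue.pio_write(2, "cmd2")   # arrives early and waits for slot 1
queue.pio_write(1, "cmd1")   # fills the gap, releasing slots 1 and 2
print(queue.executed)
```

In this model a command is never executed ahead of an earlier one, and a write that fills a gap releases everything queued behind it.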

Elan4 RDMA engine
- Addresses are 64 bit
- Pipelined operation to hide large PCI-X read latency
- Processes two RDMA descriptors concurrently to achieve peak bandwidth with multiple small RDMAs
- Two run queues for high and low priority scheduling
- Timeslices between multiple RDMAs
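As an illustration of two-level scheduling with timeslicing, the following toy scheduler serves a high-priority run queue before a low-priority one and moves a descriptor to the back of its queue after each slice. It is a sketch only: the 1024-byte quantum, the strict-priority rule and the name `run_rdma` are assumptions, and it does not model the engine's concurrent handling of two descriptors.

```python
from collections import deque


def run_rdma(high, low, quantum=1024):
    """Toy scheduler: high-priority queue first, round-robin within a queue.

    `high` and `low` are lists of (name, size_in_bytes) descriptors.
    Returns the (name, bytes_moved) slices in the order they were issued.
    """
    queues = [deque(high), deque(low)]
    trace = []
    while any(queues):
        # Serve the high-priority queue whenever it has work.
        q = queues[0] if queues[0] else queues[1]
        name, remaining = q.popleft()
        moved = min(quantum, remaining)
        trace.append((name, moved))
        if remaining > moved:
            # Timeslice: the unfinished transfer rejoins the back of its queue.
            q.append((name, remaining - moved))
    return trace


trace = run_rdma(high=[("h0", 1500)], low=[("l0", 2048), ("l1", 512)])
for name, moved in trace:
    print(name, moved)
```

Here the large low-priority transfer `l0` is interleaved with the short transfer `l1` rather than blocking it, which is the effect timeslicing is meant to have.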

RISC processor
- 64 bit word
- Instruction set optimised for low latency scheduling
- 16 Kbyte I-cache
- Registers loaded directly from the network
- Block load/store instructions for copy bandwidth

Elan4 implementation
- LSI G12 process
- 0.18um 4 + R layer metal process
- 7.5 mm x 7.5 mm
- ~800,000 gates
- ~283 signal pins
- 512 ball BGA
- 3 watts

QsNetII Physical Link
- 1.333 GHz design speed
  - 4b5b coding for DC balance on cables and fiber
  - ~920 Mbytes/s after protocol
  - Internal switch links deliver 1.18 Gbytes/s after protocol
  - 2 virtual channels
- Copper
  - 10 bit LVDS, total 40 wires
  - 12m range
- Optics
  - 12 bit parallel optical fiber
  - 150m range
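The link figures can be cross-checked with a little arithmetic. The sketch below assumes that 1.333 GHz is the per-wire signalling rate with one bit per wire per cycle, and that each LVDS signal uses a differential pair; neither assumption is stated on the slide. The PCI-X line uses the 64-bit/133MHz figure from the Elan 4 functional units slide.

```python
CLOCK_HZ = 1.333e9          # design speed from the slide
DATA_WIRES = 10             # 10 bit LVDS per direction
CODING_EFFICIENCY = 4 / 5   # 4b5b: 4 data bits carried per 5 line bits


def payload_bytes_per_second(clock_hz=CLOCK_HZ, wires=DATA_WIRES,
                             efficiency=CODING_EFFICIENCY):
    """Decoded bandwidth of one link direction (assumes 1 bit/wire/cycle)."""
    return clock_hz * wires * efficiency / 8


per_direction = payload_bytes_per_second()
both_directions = 2 * per_direction
pci_x_peak = (64 / 8) * 133e6       # 64-bit bus at 133 MHz, bytes per second
wire_count = DATA_WIRES * 2 * 2     # differential pairs, two directions

print(f"per direction:   {per_direction / 1e9:.3f} Gbytes/s")
print(f"both directions: {both_directions / 1e9:.3f} Gbytes/s")
print(f"switch link protocol efficiency: {1.18e9 / per_direction:.1%}")
print(f"PCI-X peak:      {pci_x_peak / 1e9:.3f} Gbytes/s")
print(f"copper wires:    {wire_count}")
```

Under these assumptions each direction carries about 1.33 Gbytes/s after decoding, which is consistent with the 2.6 Gbytes/sec total quoted for the Elan 4 link and with the 40-wire copper cable. The PCI-X peak of roughly 1.06 Gbytes/s may explain why the end-to-end figure (~920 Mbytes/s) is lower than the internal switch link figure (1.18 Gbytes/s), although the slides do not say so.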

QsNetII Elite4 Switch Component
- 8 QsNetII links
- 2 virtual channels
- Broadcast to range of outputs
- Full automatic error detection / recovery
- Arbitration based on age of packet
- Two levels of priority
- Adaptive routing support
- Unblocked latency of ~20ns
- Traceroute transaction for interrogating the network
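The slide lists age-based arbitration and two priority levels but not how they combine. The sketch below shows one plausible rule, in which priority is compared first and age breaks ties. The `Packet` fields and the `arbitrate` function are hypothetical names for illustration, not the Elite4 design.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    name: str
    high_priority: bool
    age: int   # cycles spent waiting; larger means older


def arbitrate(contenders):
    """Pick the winner for one output: priority first, then the oldest packet."""
    return max(contenders, key=lambda p: (p.high_priority, p.age))


contenders = [
    Packet("a", high_priority=False, age=90),
    Packet("b", high_priority=True, age=5),
    Packet("c", high_priority=True, age=12),
]
winner = arbitrate(contenders)
print(winner.name)
```

Favouring the oldest packet within a priority level bounds how long any packet can be overtaken, which matters when many nodes contend for the same output.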

Elite 4 implementation
- LSI G12 process
- 0.18um 4 + 1 layer metal process
- 8.67mm x 8.67mm
- ~1 million gates
- ~348 signal pins
- 608 ball BGA
- 6.5 watts

Performance
- Short message latency
- RDMA bandwidth
- Performance in applications
  - MPI message passing
  - Global operations

Short message latency on a 4000 node machine with 50m cable
[Chart: latency (us), 0 to 2.5, against number of bytes in message, 0 to 128.]

RDMA bandwidth
[Chart: bandwidth (MB/s), 0 to 1000, against size in bytes, 0 to 524288, for 1, 2, 4 and 8 batched RDMAs.]

MPI short message latency
[Chart: latency (microseconds), 0 to 40, against message size in bytes, 0 to 8192, comparing Infiniband and QsNet II (elan4).]

MPI Bandwidth - Elan4
[Chart: MBytes/sec, 0 to 1000, against size in bytes, 0 to 524288. Series: MPI bandwidth, MPI ping.]

Global operations - Gsync scaling
[Chart: latency (us), 0 to 4.5, against number of nodes, 2 to 16, for Elan3 and Elan4.]

Acknowledgements
- PNNL
- LANL
- Avnet/LSI Logic
- HP

www.quadrics.com
