Myrinet Technology Roadmap
Dr. Charles L. Seitz
CEO & CTO
Myricom, Inc.
chuck@myri.com
Myrinet Users Group Conference
Vienna, Austria
13 May 2002
Charles L. Seitz
www.myri.com MUG-2002, 13 May 2002
Myrinet Technology – History & Roadmap
Year   Products & Features
1st Generation: 0.64+0.64 Gb/s links
1994   32-bit SBus (SPARC) interfaces, 8-port switches
1995
1996   32-bit PCI interfaces (LANai 4), 8-port switches
       SAN PHY level
1997   Clos “network in a box” of 8-port switches
2nd Generation: 1.28+1.28 Gb/s links
1998   16-port switches, HA features
       64-bit PCI interfaces (LANai 7), GM message system
1999   Clos “network in a box” of 16-port switches
3rd Generation: “Myrinet 2000”, 2+2 Gb/s links
2000   64-bit PCI interfaces (LANai 9), SW16, Clos128
2001   Fiber becomes prevalent for Myrinet-2000 links
Past (above) / Future (below)
2002   PCI-X interfaces, GM 2
       GbE & 1x InfiniBand ports on Myrinet switches
2003   3GIO interfaces
2004   4x Myrinet links
Current Market and Technology Forces (winds of change)
• Continued healthy growth for clusters
– All of the major OEMs now offer clusters.
– Excellent progress in distributed-computing applications.
– Myricom’s competitive position -- the clear market leader
• Myricom’s 2001 revenue growth was 102%; 5-year growth has been ~609%.
• Myricom is already shipping >80% of ports in this market niche.
• Faster hosts, faster I/O (PCI-X and 3GIO)
– Just what we hoped for to be able to build better clusters.
– Moore’s Law still rules
• Advances in microelectronics (including VCSEL fiber components) apply to
interconnect in the same way as to processors and memory.
• InfiniBand
– Contributing to the expectation that interconnect will become “commodity.”
– However, IB has been a technical disappointment so far, and 1x IB is a
‘non-starter’ in the marketplace.
Myricom’s Strategy (Priorities)
• The “whole product” concept
– Extraordinary efforts toward software reliability and customer support
• Extend Myrinet performance ~2x at close to present prices.
– PCI-X interfaces with two Myrinet-2000 ports
• Two-port NICs also have applications for high availability.
• Extend and broaden Myricom’s market
– “Low-end” Myrinet interfaces (PCI-X)
• Over the next ~18 months, bring the list prices of “low-end” fiber interfaces down
to ~$700.
– GbE ports on Myrinet switches
• Interoperability for Myrinet, and possible market for GbE Beowulf clusters.
– Remain positioned to ship components with InfiniBand ports
• InfiniBand ports on Myrinet switches.
• Myrinet-2000 PHY is exactly the 1x InfiniBand PHY.
– Motherboard NIC modules
– And more…
Links: 2.5 GBaud, full duplex, mostly fiber
(At the PHY level, these links are identical to 1x InfiniBand.)
Advantages of fiber: small-diameter, lightweight, flexible cables; reliability;
EMC; 200m length; connector size. (See http://www.myri.com/news/01723/)
Links: Changes Planned
• (June 2002) - first chips with multi-protocol ports
– A multi-protocol port can act as a Myrinet port, long-range-Myrinet port
(1310nm single-mode fiber to 20km), GbE port, or InfiniBand port.
– Interoperability between Myrinet, GbE, & InfiniBand.
• (Nov 2002) - “High-end” PCI-X interfaces with two ports
– 2 x (250+250) MB/s = 1 GB/s, a good match to 1 GB/s PCI-X.
– GM-2 route dispersion can use both links concurrently
• (Early 2003) - SerDes function integrated into Myricom custom-VLSI chips
– These serial links will displace today’s SAN-2000 PHY.
– Links with a 2+2 Gb/s data rate at 2.5+2.5 GBaud (8b/10b encoded) are also the
base PHY of 3GIO. Myricom plans to support 3GIO, initially 4x 3GIO, as
soon as 3GIO hosts become available.
• (Early 2004) - “4x” Myrinet (multi-protocol) links
– Most product volume is expected to continue with “1x” links through 2006.
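The link figures on this slide follow from the 8b/10b encoding. A minimal Python sketch of that arithmetic, using illustrative function and variable names that are not taken from any Myricom software:

```python
def payload_gbps(gbaud: float, data_bits: int = 8, code_bits: int = 10) -> float:
    """Data rate in Gb/s of a serial link carrying data_bits per code_bits sent."""
    return gbaud * data_bits / code_bits


def gbps_to_mbytes_per_s(gbps: float) -> float:
    """Convert Gb/s to MB/s (decimal units, 8 bits per byte)."""
    return gbps * 1000 / 8


# One direction of a 2.5 GBaud, 8b/10b-encoded link.
per_direction_gbps = payload_gbps(2.5)
per_direction_mb_s = gbps_to_mbytes_per_s(per_direction_gbps)

# Two full-duplex ports, as on the planned "high-end" PCI-X interface.
two_port_total_mb_s = 2 * (per_direction_mb_s + per_direction_mb_s)

print(per_direction_gbps, per_direction_mb_s, two_port_total_mb_s)
```

The result matches the slide: 2 Gb/s (250 MB/s) per direction, and 1 GB/s for two full-duplex ports.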
Switches: 128-Host Clos Network (Flagship)
Switches: Changes Planned
• Switches with a mix of Myrinet, long-range-Myrinet, GbE, and
InfiniBand ports (starting this year).
– High-degree switches with GbE ports may find a market for “Beowulf”
clusters that use next-generation hosts with GbE on the motherboard.
• The use of dispersive routing (GM 2) allows better utilization of
Myrinet Clos networks (also HA at a finer time scale).
• More capable monitoring line card that can run Linux.
• Very few “Myrinet” changes until the advent of “4x Myrinet” links
(early 2004).
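To illustrate why dispersive routing can improve utilization of a Clos network, here is a toy Python sketch. It is not the GM 2 algorithm, which these slides do not describe. The spine count, the destination-based fixed route, and the round-robin dispersion policy are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import cycle

SPINES = 8  # hypothetical number of spine switches reachable from each leaf


def fixed_route_load(flows, packets_per_flow):
    """Every packet of a flow uses one spine, picked from the destination leaf."""
    load = Counter()
    for src_leaf, dst_leaf in flows:
        spine = dst_leaf % SPINES
        load[(src_leaf, spine)] += packets_per_flow
    return load


def dispersed_route_load(flows, packets_per_flow):
    """Each flow rotates its packets over all spines (round-robin dispersion)."""
    load = Counter()
    for src_leaf, _dst_leaf in flows:
        spines = cycle(range(SPINES))
        for _ in range(packets_per_flow):
            load[(src_leaf, next(spines))] += 1
    return load


# Four flows leave leaf 0; with the fixed rule above they all pick spine 0.
flows = [(0, 8), (0, 16), (0, 24), (0, 32)]
fixed = fixed_route_load(flows, packets_per_flow=64)
dispersed = dispersed_route_load(flows, packets_per_flow=64)

# Packets carried by the busiest leaf-to-spine uplink under each policy.
print(max(fixed.values()), max(dispersed.values()))
```

In this contrived traffic pattern the fixed routes pile all 256 packets onto one uplink, while dispersion spreads them evenly at 32 packets per uplink.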
Interfaces: Current production M3F-PCI64B-2
Myricom’s highest volume product
Interfaces: 64-bit, 66MHz, Myrinet/PCI interfaces
• PCI64B, 133MHz RISC and memory
– 1067 MB/s memory bandwidth
• PCI64C, 200MHz RISC and memory
– 1600 MB/s memory bandwidth
[Block diagram: SAN port (500 MB/s); LANai 9 chip containing the network interface, packet DMA, and RISC; fast SRAM (1067 or 1600 MB/s); PCIDMA chip containing the DMA controller & bus bridge; PCI bus (533 MB/s).]
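The bandwidth figures on this slide can be reproduced as width times clock. The Python sketch below assumes a 64-bit local-memory data path (inferred from 1067 MB/s at 133 MHz, not stated on the slide) and nominal clocks of 66.67 and 133.33 MHz. The function name is illustrative.

```python
def peak_mbytes_per_s(width_bits: int, clock_mhz: float) -> float:
    """Peak rate in MB/s when width_bits are transferred once per clock."""
    return width_bits / 8 * clock_mhz


pci_64_66 = peak_mbytes_per_s(64, 66.67)    # 64-bit, 66 MHz PCI
mem_pci64b = peak_mbytes_per_s(64, 133.33)  # PCI64B: 133 MHz memory (assumed 64-bit)
mem_pci64c = peak_mbytes_per_s(64, 200.0)   # PCI64C: 200 MHz memory (assumed 64-bit)
san_port = 250 + 250                        # MB/s, both link directions together

print(round(pci_64_66), round(mem_pci64b), round(mem_pci64c), san_port)
```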
Interfaces: Changes Planned
• Faster RISCs. Higher local-memory bandwidth
– Lower latency, to ~4µs GM latency by the end of 2002 (from 7µs currently).
– MPI latency will decrease correspondingly.
– Higher throughput.
• Higher levels of integration
– LANai (10) XP - 225MHz RISC and memory, PCI-X, and one port.
• Pricing of low-end interfaces is expected to decline over the next 1-2 years to less than
50% of current prices.
– LANai (10) 2XP - 300MHz RISC and memory, PCI-X, and two ports.
– LANai with on-chip memory (not this year)
• Open-ended performance growth to 600+MHz RISC and memory.
LANai 10 – New Features
• 250+250 MB/s multi-protocol ports
– Multi-protocol ports connect directly to a 10b SerDes (8b/10b-encoded data)
to support Myrinet-Fiber, single-mode-fiber Myrinet, GbE, or InfiniBand.
• Three pinout versions of the LANai 10 - XM, XP, 2XP
– See the following block diagrams, and Jakov’s talk
• Self-initialization of the LANai memory from ROM
– Necessary for stand-alone protocol converters.
– For interfaces, allows diskless hosts to boot over the Myrinet.
• Performance boost, plus headroom for future performance gains
– Initially: XM/XP versions 200-225MHz ZBT SRAM & RISC
– 2XP version 300+MHz ZBT SRAM & RISC
• Evolve to products with DDR or “Sigma” SRAM
– Headroom: 2.4–4.8+ GB/s local-memory data rate; 300–600+MHz RISC
LANai 10 as a protocol converter
[Block diagram of the LANai XM. Labeled blocks: a SAN network interface with send/recv DMA engines, connected through a SerDes to the line-card XBar16 port; an X network interface with send/recv DMA engines, connected through a SerDes to the line-card front-panel port; modes under program control: Myrinet, 1310nm fiber, InfiniBand, GbE; L-bus; x72b memory interface to SRAM; RISC; control & memory initialize, connected to the line-card µC (JTAG).]
This circuitry is repeated for each line-card port.
Low-Cost LANai 10 PCI-X Interface
[Block diagram of the LANai XP. Labeled blocks: PCI-card port connected through a SerDes to the X network interface with send/recv DMA engines; L-bus; x72b memory interface to SRAM; RISC (225MHz); PCI-X interface & DMA engine on the PCI-X bus; control & memory initialize, connected to the interface EEPROM & JTAG.]
High-End LANai 10 PCI-X Interface
[Block diagram of the LANai 2XP. Labeled blocks: two PCI-card ports, each connected through a SerDes to its own X network interface with send/recv DMA engines; L-bus; x72b memory interface to SRAM; RISC (300MHz); PCI-X interface & DMA engine on the PCI-X bus; control & memory initialize, connected to the interface EEPROM & JTAG; dRAM DMA engine with optional dRAM, used for IO page tables.]
Myrinet Software: Basic OS-Bypass Structure
[Layer diagram, top to bottom: Applications; middleware (MPI, VIA); UDP and TCP over IP in the host OS, alongside OS-bypass APIs (multiple host processes); Ethernet (10/100/1000 Mb/s) and Myrinet (2000 + 2000 Mb/s); the Myrinet Control Program (MCP), which executes in the Myrinet interface.]
The GM Message-Passing System
No Compromises
• Concurrent, protected, user-level access
• Reliable, ordered message delivery
• Very low CPU overhead
• Robust under network faults
• Mapping
• Segmentation and reassembly of long messages
• High-level flow control
• “Clean” API, with exception handling
• Zero-copy layering of other APIs
[Plot: GM Data-Rate Performance (Myrinet-2000 Fiber Interfaces). UNIX user process to user process; fully protected; end-to-end data integrity.]
GM short-message latency (Myrinet-2000 interfaces): ~7µs (PCI64C) or ~9µs (PCI64B)
GM CPU overhead = 1-2µs per message (LogP)
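A rough way to relate the short-message latency to the link data rate is the common linear cost model, time = latency + size / bandwidth. The Python sketch below applies that model. The model itself and the 250 MB/s figure (the one-direction link peak, used here as an upper bound) are assumptions, not GM measurements. Only the 7µs latency comes from the slide.

```python
def transfer_time_us(nbytes: int, latency_us: float, bandwidth_mb_s: float) -> float:
    """Linear cost model: fixed latency plus serialization time.

    1 MB/s equals 1 byte/us, so nbytes / bandwidth_mb_s is in microseconds.
    """
    return latency_us + nbytes / bandwidth_mb_s


def effective_rate_mb_s(nbytes: int, latency_us: float, bandwidth_mb_s: float) -> float:
    """Delivered data rate for one message of nbytes under the linear model."""
    return nbytes / transfer_time_us(nbytes, latency_us, bandwidth_mb_s)


def half_rate_size_bytes(latency_us: float, bandwidth_mb_s: float) -> float:
    """Message size at which the effective rate is half of bandwidth_mb_s."""
    return latency_us * bandwidth_mb_s


LATENCY_US = 7.0       # PCI64C short-message latency, from the slide
PEAK_MB_S = 250.0      # assumed: one-direction Myrinet-2000 link peak

n_half = half_rate_size_bytes(LATENCY_US, PEAK_MB_S)
rate_at_n_half = effective_rate_mb_s(int(n_half), LATENCY_US, PEAK_MB_S)
print(n_half, rate_at_n_half)
```

Under these assumptions, messages of about 1750 bytes already reach half of the peak rate, which is why lower latency matters most for short messages.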
Current Software Distributions
OS            Platforms
Linux         IA-32, IA-64, Alpha, PowerPC, UltraSPARC
Win2000/XP    IA-32, IA-64
Solaris       UltraSPARC
Tru64         Alpha
HP UX         PA-RISC, developed by HP, used for HyperFabric
AIX           PowerPC/IBM Power
Irix          MIPS
VxWorks       PowerPC
MacOS X       Apple Macintosh G4
FreeBSD, …    IA-32 & Alpha
Small Part of Myricom’s Software Lab
Yes, Myrinet runs beautifully on McKinleys
“gm_debug” for Myricom’s early
4-processor McKinley boxes using
M3F-PCI64C interfaces in PCI-X slots.
Myricom is distributing both Linux and
Windows software for IA64 now. Several
Itanium clusters in service.
DMA rate for 16384 Byte pages (64bit / 60MHz bus)
Timing 32 pages.
bus_read (send) = 418 MBytes/s
bus_write (recv) = 439 MBytes/s
Much higher throughputs are expected for the future Myrinet/PCI-X Interfaces.
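The measured DMA rates can be compared with the theoretical peak of the bus that gm_debug reports. The Python sketch below assumes that "MBytes/s" in the gm_debug output means 10^6 bytes per second, which the slide does not specify.

```python
BUS_WIDTH_BYTES = 8   # 64-bit bus, as reported by gm_debug
BUS_CLOCK_MHZ = 60    # bus clock, as reported by gm_debug

# Theoretical peak: one 8-byte transfer per clock.
peak_mb_s = BUS_WIDTH_BYTES * BUS_CLOCK_MHZ

measured_mb_s = {"bus_read (send)": 418, "bus_write (recv)": 439}
utilization = {name: rate / peak_mb_s for name, rate in measured_mb_s.items()}

for name, fraction in utilization.items():
    print(f"{name}: {fraction:.0%} of the {peak_mb_s} MB/s peak")
```

On this reading, the interface sustains roughly 87% (send) and 91% (receive) of what a 64-bit, 60MHz bus could carry.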
Current Choice of Myrinet Software Interfaces
• The GM API
– Low level, but some applications are programmed at this level
• TCP/IP
– Actually, “Ethernet emulation,” included in all GM releases
• 1.8 Gb/s TCP/IP under GM 2 (netperf benchmarks)
• MPICH-GM
– An implementation of the Argonne MPICH directly over GM.
• VI-GM
– An implementation of the VI Architecture API directly over GM.
• Possibly relevant to InfiniBand compatibility
• Sockets-GM
– An implementation of UNIX or Windows sockets (or DCOM) over GM.
Completely transparent to application programs. Use the same binaries!
• Sockets-GM/GM/Myrinet is similar to the proposed SDP/InfiniBand.
Myrinet Software: Changes Planned
• Increasing emphasis on Myrinet interoperability with GbE and
InfiniBand.
– Requires improvements & simplification of the mapper.
• GM 2
– GM-2.0-alpha0 (Linux, FreeBSD) is on the web for download now.
• Possible/probable Myricom support for HP UX.
– The only major software platform that we don’t support today.
• Performance.
– We are still leaving some performance ‘on the table.’
• No other “middleware” layers in sight.
– Ideas? SDP?
• Storage over Myrinet.
• Applications support.
– Myricom is now large enough to support application developers.