Cray Roadmap (2004-2010)

John M. Levesque, Senior Technologist
(Virtual Steve Scott, Chief Architect for X1/X1E/BW)
Cray Proprietary

Cray's Computing Vision
Scalable High-Bandwidth Computing
[Roadmap figure. Product labels: X1, X1E, 'Black Widow', 'Black Widow 2', Red Storm (RS), 'Strider 2', 'Strider 3', 'Strider X', 'Cascade'. Year labels: 2004, 2005, 2006, 2010. Other labels: Product Integration, Sustained Petaflops.]

Cray X1
Cray PVP
• Powerful vector processors
• Very high memory bandwidth
• Non-unit stride computation
• Special ISA features
• Modernized the ISA
T3E
• Extreme scalability
• Optimized communication
• Memory hierarchy
• Synchronization features
• Improved via vectors
High bandwidth, scalable shared memory supercomputer

Key Architectural Features
• New vector instruction set architecture (ISA)
– Much larger register set (32x64 vector, 64+64 scalar)
– 64- and 32-bit memory and IEEE arithmetic
– Based on 25 years of experience compiling with Cray1 ISA
• Decoupled Execution
– Scalar unit runs ahead of vector unit, doing addressing and control
– Hardware dynamically unrolls loops, and issues multiple loops concurrently
– Special sync operations keep pipeline full, even across barriers
→ Allows the processor to perform well on short nested loops
• Scalable, distributed shared memory (DSM) architecture
– Memory hierarchy: caches, local memory, remote memory
– Low latency, load/store access to entire machine (tens of TBs)
– Processors support 1000's of outstanding refs with flexible addressing
– Very high bandwidth network
– Coherence protocol, addressing and synchronization optimized for DM

Cray X1 Node
[Node diagram: 16 processor blocks (P), 16 cache blocks ($), 16 M/mem blocks, 2 IO blocks.]
51 Gflops, 200 GB/s
• Four multistream processors (MSPs), each 12.8 Gflops
• High bandwidth local shared memory (128 Direct Rambus channels)
• 32 network links and four I/O links per node

NUMA Scalable up to 1024 Nodes
[Figure: Interconnection Network]
• 16 parallel networks for bandwidth
• Global shared memory across machine

Network Topology (16 CPUs)
[Figure: node 0 to node 3, each with four processors (P) and memories M0 to M15; labels Section 0, Section 1, ..., Section 15.]

Network Topology (128 CPUs)
[Figure: eight routers (R), 16 links.]

Network Topology (512 CPUs)

Cray X1 Node Module

Cray X1 Chassis

64 Processor Cray X1 System
~820 Gflops

Cray X1E Product Enhancement

Cray X1E Mid-life Enhancement
• Technology refresh of the X1 (0.13 µm)
– ~50% faster processors
– Scalar performance enhancements
– Doubling processor density
– Modest increase in memory system bandwidth
– Same interconnect and I/O
• Machine upgradeable
– Can replace Cray X1 nodes with X1E nodes
• Shipping the end of this year

Cray BlackWidow System
• Second generation Vector MPP
– Upward compatible with the Cray X1
– Shipping in 2006
• Major improvement (>> Moore's Law rate) in:
– Single thread scalar performance
– Price performance
• BlackWidow features:
– Single chip vector microprocessor
– Globally addressable memory with 4-way SMP nodes
– Scalable to tens of thousands of processors
– Even more bandwidth per flop than the X1
– Innovative fault tolerance features
– Configurable memory capacity, memory BW and network BW

System Goals
• Balanced Performance between CPU, Memory, Interconnect, and I/O
• Highly scalable system hardware and software
• High speed, high bandwidth 3D mesh interconnect
• Run a set of applications 7 times faster than ASCI Red
• Run an ASCI Red application on full system for 50 hours
• Flexible partitioning for classified and non-classified computing
• High performance I/O subsystem (File system and storage)

Red Storm System Overview
• 40 TF peak performance
• 108 compute node cabinets, 16 service and I/O node cabinets, and 16 Red/Black switch cabinets
– 10,368 compute processors, 2.0 GHz AMD Opteron™
– 512 service and I/O processors (256P for red, 256P for black)
– 10 TB DDR memory
• 240 TB of disk storage (120 TB for red, 120 TB for black)
• MPP System Software
– Linux + lightweight compute node operating system
– Managed and used as a single system
– Easy to use programming environment
– Common programming environment
– High performance file system
– Low overhead RAS and message passing
• Approximately 3,000 ft² including disk systems

Typical Architecture
[Figure labels: two Intel Xeon™ processors, Northbridge, Southbridge or PCI-X Bridge, three PCI-X slots, 6.4 GB/sec, I/O speed limit 1 GB/sec.]
• Memory latency is ~160 ns, and bandwidth is shared between multiple processors
• Northbridge chip is the 2nd most complex chip on the board. Typical chip uses about 11 Watts
• Any interconnect is limited by the speed of PCI-X, since it's the fastest place to "plug in"
• Best place to tie in a high performance interconnect would be through the Northbridge, but this is difficult to do legally without an Intel bus license

Strider PE
CRAY Generic System
[Figure labels: AMD Opteron, DDR Memory Controller, HyperTransport (HT), 6.4 GB/sec, PCI-X Bridge, three PCI-X slots, Cray Router (SeaStar), Six Network Links Each >3 GB/s x 2.]
• SDRAM memory controller and function of Northbridge is pulled onto the Opteron die. Memory latency reduced to 60-90 ns
• No Northbridge chip results in savings in heat, power, complexity and an increase in performance
• Interface off the chip is an open standard (HyperTransport)

Using HyperTransport to Interface With System Interconnect
[Figure: AMD Opteron processors, each with a DDR memory controller, connected over HyperTransport to Cray SeaStar 6-port routers.]
• 6 GB/sec (3 GB/sec bi-directional)
• 3D Torus Interconnect
Sound Familiar?

Cray BlackWidow
The Next Generation HPC System From Cray Inc.

Cray BlackWidow System
• Second generation Vector MPP
– Upward compatible with the Cray X1
– Shipping in 2006
• Major improvement (>> Moore's Law rate) in:
– Single thread scalar performance
– Price performance
• BlackWidow features:
– Single chip vector microprocessor
– Globally addressable memory with 4-way SMP nodes
– Scalable to tens of thousands of processors
– Even more bandwidth per flop than the X1
– Innovative fault tolerance features
– Configurable memory capacity, memory BW and network BW

Cascade
Toward Sustained Petaflop Computing

HPCS Phases
• Phase I: Concept Development (CRAY, SGI, Sun, HP, IBM)
– Forecast available technology
– Propose HPCS hw/sw concepts
– Explore productivity metrics
– Develop research plan for Phase II
1 Year, 2H 2002 – 1H 2003, $3M/year
• Phase II: Concept Validation (CRAY, Sun, IBM)
– Focused R&D
– Hardware and software prototyping
– Experimentation and simulation
– Risk assessment and mitigation
3 Years, 2H 2003 – 1H 2006, $17M/year
• Phase III: Full Scale Product Development (?, ?)
– Commercially available system by 2010
– Outreach and cooperation in software and applications areas
4 Years, 2H 2006 – 2010, $?/year
The HPCS program lets us explore technologies we otherwise couldn't. A three year head start on typical development cycle.

Cray's Approach to HPCS
• High system efficiency at scale
– Bandwidth is the most critical and expensive part of scalability
– Enable very high (but configurable) global bandwidth
– Design processor and system to use this bandwidth wisely
– Reduce bandwidth demand architecturally
• High human productivity and portability
– Support legacy and emerging languages
– Provide strong compiler, tools and runtime support
– Support a mixed UMA/NUMA programming model
– Develop higher-level programming language and tools
• System robustness
– Provide excellent fault detection and diagnosis
– Implement automatic reconfiguration and fault containment
– Make all resources virtualized and dynamically reconfigurable

Using Bandwidth Wisely
• Implement global shared memory
– Lowest latency communication
– Lowest overhead communication
– Fine-grained overlap of computation and communication
• Tolerate latency with processor concurrency
– Message passing concurrency is constraining and hard to program
– Vectors and streaming provide concurrency within a thread
– Multithreading provides concurrency between threads
• Exploit locality to reduce bandwidth demand
– "Heavyweight" processors (HWPs) to exploit temporal locality
– "Lightweight" processors (LWPs) to exploit spatial locality
• Use other techniques to reduce network traffic
– Atomic memory operations
– Single word network transfers when no locality is present

Questions?
File Name: BWOpsReview081103.ppt
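
The peak figures quoted in the Cray X1 and Red Storm slides can be cross-checked with simple arithmetic. The sketch below is illustrative only and rests on two assumptions that are not stated in the deck: that the "64 Processor" X1 system counts MSPs at 12.8 Gflops each, and that each 2.0 GHz Opteron retires 2 double-precision floating-point operations per clock. The function and constant names are hypothetical.

```python
# Cross-check of peak-performance figures quoted in the slides.
# Assumptions (not from the deck): "processor" in the 64-processor X1
# system means one MSP, and an Opteron performs 2 flops per clock.

MSP_GFLOPS = 12.8             # per multistream processor (X1 node slide)
MSPS_PER_NODE = 4             # four MSPs per X1 node (X1 node slide)
OPTERON_FLOPS_PER_CLOCK = 2   # assumed, not stated in the slides


def x1_peak_gflops(num_msps: int) -> float:
    """Peak Gflops for a given number of X1 MSPs."""
    return num_msps * MSP_GFLOPS


def red_storm_peak_tflops(processors: int = 10_368,
                          clock_ghz: float = 2.0,
                          flops_per_clock: int = OPTERON_FLOPS_PER_CLOCK) -> float:
    """Peak Tflops: processors x clock (GHz) x flops per clock, scaled to Tflops."""
    return processors * clock_ghz * flops_per_clock / 1000.0


if __name__ == "__main__":
    print(f"X1 node:          {x1_peak_gflops(MSPS_PER_NODE):.1f} Gflops")
    print(f"64-MSP X1 system: {x1_peak_gflops(64):.1f} Gflops")
    print(f"Red Storm:        {red_storm_peak_tflops():.1f} Tflops")
```

Under these assumptions the results are 51.2 Gflops per node, 819.2 Gflops for 64 MSPs, and about 41.5 Tflops for Red Storm, which is consistent with the rounded "51 Gflops", "~820 Gflops", and "40TF" values on the slides.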
