Sun Microsystems' Computer Architecture Development Strategies

Insight into Sun Microsystems' Computer Architecture Development Strategies
CSC, May 22nd 2008
Søren Steenberg

Sun Microsystems
Drive Innovation at Every Level
Processor, System, Datacenter
Page 2

The Sun Systems Family & Branches
Expanded! Breakthrough! Enhanced! New!
• Sun Fire™ x64 Servers
• Sun SPARC CoolThreads™ T-series Servers
• Sun Fire™ UltraSPARC Servers
• Sun SPARC Enterprise Servers
Page 3

25 years: 100,000 to 1 million times more MIPS/$
[Chart: MIPS per $1K by year, 1880 to 2020, on a log scale from 10^-15 to 10^10; labelled levels are 0.01 MIPS/$1K, 100 MIPS/$1K and 10,000 MIPS/$1K. Credit: Hermann Brunner, Max-Planck-Institut fuer Extraterrestrische Physik, Germany]
Page 4

The “Brick Wall”* of Performance
“In 2006, performance is a factor of three below the traditional doubling every 18 months that we enjoyed between 1986 and 2002. The doubling of uniprocessor performance may now take 5 years.”*
* “The Landscape of Parallel Computing Research: A View from Berkeley”, December 2006 (emphasis added) http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html
Page 5

ILP vs.
TLP
[Figure: execution-unit occupancy for one thread under ILP (Instruction Level Parallelism) versus four threads under TLP (Thread Level Parallelism)]

The Memory Bottleneck
[Chart: relative performance (log scale, 1 to 10000) by year, 1980 to 2005. CPU frequency: 2x every two years. DRAM speeds: 2x every six years. The widening difference between the two curves is labelled "Gap".]
Page 7

Today's Bad News: The Challenges
• ILP exhausted, new tricks running short:
> Larger caches in 3 levels
> Out of Order execution
> Deeper superpipelines
> Superscalar design
> EPIC (explicitly parallel instruction computing)
> Branch prediction
> Speculative prefetches
• Physical constraints, speed of electrons
• Heat
• Power
• Memory latency gap
• Complexity
• Number of processor architects/engineers
• Time-to-market, chip respins
• Cost of R&D plus cost of manufacturing
Page 8

CMT: Multiple Multithreaded Cores
[Figure: Core 1 through Core 8, each running Thread 1 through Thread 4; along the time axis each thread alternates between Compute and Memory Latency phases]
Page 9

UltraSPARC T2 Plus
SMP System on a Chip
• Unprecedented throughput and integration
> Leveraged UltraSPARC T2® processor
> 8 cores, 64 threads
> Added 4x Coherence Channels
• 2-socket design
• Built-in wirespeed security
• Fast, integrated I/O
• Virtualization built in
3rd Generation CMT
Page 10

Multithreaded Multicore in the Curriculum
• It has become clear only recently, but has become astonishingly clear:
• The future performance growth in microprocessors, at least for the next five years, will almost certainly come from exploitation of thread-level parallelism (TLP) through multicore processors rather than through exploiting more instruction-level parallelism (ILP).
• The Sun T1 is a step in this direction.
Computer Architecture: A Quantitative Approach, 4th ed.
by John Hennessy and David Patterson
Page 11

Sun is Leading the Way
“We applaud Sun for being the first to enter this emerging market for highly multithreaded servers.”
- Ideas International
"We're at a historic point in computing, moving away from sequential processing to multicore designs...we need to invent new ways to evaluate these new parallel systems. Our initial experiments suggest that Niagara 2 has the highest performance, is the most power efficient and is the most 'software friendly' of the processors we've tested."
- Professor Dave Patterson, Pardee Chair of Computer Science for the University of California at Berkeley
Source 1: “Sun Gets it Right with Niagara 2 Servers,” 10/12/07, http://ideasint.blogs.com/ideasinsights/2007/10/sun-gets-it-rig.html
Source 2: "Sun Microsystems Enters Commercial Silicon Market With World's Fastest Commodity Microprocessor," 08/07/07, http://www.sun.com/aboutsun/pr/2007-08/sunflash.20070807.1.xml
Page 12

Slide 8-9 from Gartner
Figures p. 3 – 2 from Microprocessor Report
Page 13

Slide 8-9 from Gartner
Figures p. 3 – 2 from Microprocessor Report
Page 14

Slide 8-9 from Gartner
Figures p.
3 – 2 from Microprocessor Report
Page 15

UltraSPARC T2: True System on a Chip
• Up to 8 cores @1.2GHz or 1.4GHz
• Up to 64 threads per CPU
• Up to 16 FB-DIMMs, 4 memory controllers
> Up to 64GB memory (4GB DIMMs)
> 270 GB/s crossbar bandwidth
• 8 fully pipelined floating point units, 1 per core
• Dual 10Gbit Ethernet and PCI-E integrated onto chip
• 4MB L2 (8 banks), 16-way associative
• Enhanced MAU/Security coprocessor per core
> DES, 3DES, AES, RC4, SHA1, SHA256, MD5, RSA to 2048 key, ECC, CRC32
• Advanced power saving features
• 65nm process technology
[Block diagram: FB-DIMMs feed 4 MCUs and 8 L2$ banks; a full crossbar connects them to cores C0 to C7, each with an FPU and an MAU; I/O runs through the system interface buffer switch core, the DMA sub-system, the NIU (E-net+) with 2x 10GE Ethernet, and PCI-E x8 @2.5GHz at 4GBytes/s bi-directional. Power ~95W, <1.5W/thread.]
Page 16

UltraSPARC T2 Plus: Multi-Socket US T2
Each Coherency Link 6.4GBytes/s per direction, 12.8GB/s total
4 x Coherency links give Snoop bandwidth of 51.2GB/s
• Per socket
> Up to 8 cores @1.2GHz or 1.4GHz
> Up to 64 threads per socket
> 4MB L2$, 8 banks x 16-way SA
> Up to 8 FPUs, 1 per core
> Up to 8 crypto cores, 1 per core
> DES, 3DES, AES, RC4, SHA1, SHA256, MD5, RSA to 2048 key, ECC, CRC32
> 2 memory controllers, 16 DIMMs
> 25 GB/s read + 13 GB/s write memory bandwidth
> One x8 PCIe interface
• Per system
> 2 sockets glue-less
> 4 sockets using UST2 Plus XBR
> 16 or 32 PCIe lanes on 2 or 4 socket systems
• 65nm process technology
[Block diagram: 4x coherency links and FB-DIMMs feed the memory and coherency units and 8 L2$ banks; a full crossbar connects them to cores C0 to C7, each with an FPU and an MAU; I/O runs through the system interface, the SSI bus, and PCI-Express x8 @2.5GHz at 4GBytes/s bi-directional. Power ~105W, <1.5W/thread.]
Page 17

UltraSPARC T2 Plus 2-Socket System
[Block diagram: two UltraSPARC T2 Plus sockets, each with cores, crossbar and L2$ (8 cores, 64 threads, 4MB L2$), memory controllers on dual-channel FB-DIMMs, and coherence units joined by coherency links; each socket reaches system I/O (network, disk, etc.) through PCI-Express (NCU, DMU, NCX)]
Page 18

UltraSPARC T2 Servers
Page 19

Scale More with Less
The World's Fastest, Most Energy-Efficient Virtualization Servers
• World's First Dual-Socket CMT Servers
• First True Modular Design
• World's First Eco-Responsible Servers
• World's First 64-Thread Servers with “System on a Chip”
Servers shown:
> Sun SPARC Enterprise T5240
> Sun SPARC Enterprise T5220
> NEW Sun SPARC Enterprise T5140
> Sun Blade™ T6320
> Sun Fire™/Sun SPARC Enterprise® T2000
> Sun SPARC Enterprise T5120
> Sun Fire/Sun SPARC Enterprise T1000
> Sun Blade T6300
Page 20

Sun SPARC Enterprise T5140 Server
Sun SPARC Enterprise T5240 Server
• Up to 128 threads
• Up to 128GB of memory
• Up to 2.3TB of storage
• Up to 4.6GB/second delivered bandwidth
• Tightly coupled thread, memory and interconnects for high scalability
• Open source and FREE virtualization capabilities built in
• Sun ILOM service processor supports industry-standard management interfaces
Page 21

CoolThreads Power Saving Technology
• Power management at both core and memory
> Reduce instruction issue rates
> Control clocks in both core and memory and reduce power consumption
• Highly efficient 80 Plus and Climate Savers compliant power supplies
• System power consumption can be reported to management applications as often as every 5 seconds
> Vital for effective datacenter power management and chargeback
Page 22

Logical Domains: UltraSPARC Virtualization
[Figure: Solaris or Linux guest domains (File Server, Web Server, Mail Server) and a Solaris Control Domain run on an ultra lightweight hypervisor in the firmware; layer labels are Application, OS and Server]
SPARC Enterprise CoolThreads
Servers
Page 23

Solaris Containers for Virtualization
Strong isolation between apps
OS virtualization built into the kernel
Very lightweight and scales with any Solaris system
[Figure: containers for Calendar Server, Database and Web Server; layer labels are Application, OS and Server]
Page 24

Faster can be cooler.
Better can be cleaner.
Cheaper can be greener.
Page 25
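
Several figures quoted in the slides follow from simple arithmetic, and a short Python sketch can make them concrete. The growth rates (CPU frequency 2x every two years, DRAM 2x every six years) and the coherency link speed (6.4GBytes/s per direction, 4 links) are taken from the slides. The function names and the compute/stall cycle counts in the CMT model are hypothetical, chosen only for illustration; the utilization formula is an idealized textbook-style assumption and not Sun's published model.

```python
"""Back-of-the-envelope checks for figures quoted in the slides.

Growth rates and link speeds come from the slides; the cycle counts used
in the CMT model are hypothetical values chosen for illustration.
"""


def growth(years: float, doubling_period: float) -> float:
    """Relative performance after `years` when it doubles every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


def memory_gap(years: float) -> float:
    """CPU-to-DRAM speed ratio: CPU doubles every 2 years, DRAM every 6 years."""
    return growth(years, 2.0) / growth(years, 6.0)


def core_utilization(threads: int, compute_cycles: int, stall_cycles: int) -> float:
    """Idealized CMT model (an assumption): every thread computes for
    `compute_cycles`, then waits `stall_cycles` on memory, and the core
    switches to another ready thread at no cost. Utilization saturates at 1.0."""
    return min(1.0, threads * compute_cycles / (compute_cycles + stall_cycles))


def snoop_bandwidth(links: int, gb_per_direction: float) -> float:
    """Aggregate coherency bandwidth in GB/s; each link carries both directions."""
    return links * 2 * gb_per_direction


if __name__ == "__main__":
    print(f"CPU/DRAM gap, 1980 to 2005: about {memory_gap(25):.0f}x")
    for n in (1, 2, 4):
        # 25 compute cycles and 75 stall cycles are hypothetical numbers.
        print(f"{n} thread(s): core busy {core_utilization(n, 25, 75):.0%}")
    print(f"4 coherency links: {snoop_bandwidth(4, 6.4):.1f} GB/s")
```

Under these assumptions a single thread that stalls three cycles for every cycle of compute keeps a core busy only a quarter of the time, while four such threads can hide the memory latency entirely, which is the effect the CMT slide illustrates.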
