
Taking Multicore Chip Multithreading (CMT) to the Next Level of Throughput Performance with SMP:

The Victoria Falls Processor
aka UltraSPARC T2 Plus


Denis Sheahan
Distinguished Engineer
Sun Microsystems Inc.
Agenda
 • Chip Multi-threaded concepts
 • Aim of UltraSPARC T2 Plus
 • Hardware design decisions and implementation
 • Multi-core SMP OS scaling
 • Virtualization and Consolidation
 • Scaling Applications
 • Performance
 • Conclusions

                                                    Page 2
Memory Bottleneck

[Chart: relative performance (log scale, 1 to 10,000) of CPU frequency vs. DRAM speeds, 1980 to 2005. CPU performance doubles roughly every 2 years while DRAM speeds double only every 6 years, so the gap between processor and memory speed keeps widening.]
Source: Sun World Wide Analyst Conference Feb. 25, 2003
CMT Implementation
Four threads share a single pipeline. Every cpu cycle an instruction from a different thread is executed.
Niagara Processor Shared Pipeline Utilization: Up to 85%

[Timeline diagram: threads 1-4 each alternate compute (C) and memory-latency (M) phases; the phases are staggered in time so that while one thread waits on memory the pipeline executes another thread's compute phase.]
                                                                                        Page 4
Aims of UltraSPARC T2 Plus
 • Create an SMP version of CMT to extend the highly threaded Niagara design
 • Use T2 as the basis for these systems
 • Minimal modifications to T2 for shorter time to market
 • Create two-way and four-way systems without the need for a traditional SMP backplane
 • Avoid any hardware bottlenecks to scaling
 • High throughput, low latency interconnect
 • Scale memory bandwidth with nodes
 • Scale I/O bandwidth with nodes
 • Include hardware features to enable software scaling

                                                         Page 5
UltraSPARC T2: Basis for T2 Plus
 • Up to 8 cores @ 1.4GHz
 • Up to 64 threads per CPU
 • Memory
   > Up to 64GB memory with 4GB DIMMs
   > Up to 16 fully buffered DIMMs
   > Memory BW = 60+ GB/s
 • 8x FPUs, 1 fully pipelined floating point unit per core
 • 4MB L2$ (8 banks), 16-way set associative
 • Security co-processor per core
   > DES, 3DES, AES, RC4, SHA1, SHA256, MD5, RSA to 2048 key, ECC, CRC32
 • 2x 10GE Ethernet (NIU)
 • x8 PCI-Express @ 2.5GHz
 • Power 80W

[Block diagram: 16 FBDIMM slots feed 4 MCUs and 8 L2$ banks; a full crossbar connects the L2$ banks to cores C0-C7, each with its own FPU; the NIU (E-net+) and the system interface buffer switch core provide the 10GE Ethernet and PCI-Express I/O.]
Victoria Falls aka Niagara T2 Plus
 • Up to 4 sockets with 8 cores @ 1.4GHz
 • N2 core
 • Threads
   > Up to 256 threads
   > 64 threads per socket
 • Memory
   > Up to 128GB memory
   > Up to 32 fully buffered DIMMs
 • Coherence: 4 planes, each 6.4GB/s per direction (12.8GB/s total), delivering 51.2GB/s of snoop bandwidth

[Block diagram: per socket, FBDIMM memory sits behind two memory controllers, each with two coherence units (CUs); 8 L2$ banks connect through a full crossbar to cores C0-C7; the system interface provides PCI-Express and links to the maintenance subsystem, with coherence links to the second T2 Plus socket and I/O attached via a PCI-E switch.]
T2 Plus Hardware design decisions - Silicon layout
  • Majority of the T2 Plus layout is the same as T2
  • Includes packaging and pin count, i.e. the processor could not be any bigger or have more pins
  • Coherency units require space on silicon, so the space used by the 2 x 10Gig interfaces on T2 is reallocated for the coherency units on T2 Plus
  • Also regained some space in the move from 4 to 2 MCUs
  • All I/O on T2 Plus is via the x8 PCI-E link per chip
  • Can scale the I/O with processors
  • Two-way systems have 2x the I/O, four-way systems have 4x the I/O

                                                               Page 8
T2 Plus Hardware design decisions - Interconnect
   • Reduce the Memory Control Units (MCU) from four (T2) to two (T2 Plus)
   • Each MCU then serves four banks on the T2 Plus
   • Use half the FBDIMM channels for memory and the other half to create 4 bidirectional links for interconnect
   • Physical link of the interconnect is the same as FBDIMM
   • Address space partitioned into 4 coherence planes using physical address bits (illustrated in the sketch below)
   • Reuse the FBDIMM pins for the interconnect
   • Result is a high speed SERDES link
   • Provides high bandwidth and a low latency interconnect
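The slide does not say which physical address bits select the plane, so the sketch below only illustrates the idea: a pair of low-order address bits (the positions are an assumption) steers each cache-line address to one of the four coherence planes, spreading snoop traffic evenly across the links.

/* Illustrative only: partition the physical address space across 4
 * coherence planes using two address bits.  The bit positions (6-7,
 * i.e. consecutive 64-byte cache lines rotate across planes) are an
 * assumption for illustration, not the actual T2 Plus encoding.
 */
#include <stdio.h>
#include <stdint.h>

#define PLANE_SHIFT 6          /* assumed: bits just above the line offset */
#define PLANE_MASK  0x3        /* 4 coherence planes */

static unsigned coherence_plane(uint64_t paddr)
{
    return (unsigned)((paddr >> PLANE_SHIFT) & PLANE_MASK);
}

int main(void)
{
    for (uint64_t paddr = 0; paddr < 0x200; paddr += 0x40)
        printf("paddr 0x%03llx -> plane %u\n",
               (unsigned long long)paddr, coherence_plane(paddr));
    return 0;
}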



                                                                Page 9
UltraSPARC T2 Plus 2-Socket System

[Block diagram: each of the two sockets has two memory controllers fed by dual-channel FBDIMM, four coherence units linking it to the other socket, the Niagara2 cores, crossbar and L2$ (8 cores, 64 threads, 4MB L2$), and its own PCI-Express block (NCU, DMU, NCX) attaching to system I/O (network, disk, etc.).]
T2 Plus Hardware design decisions - Memory
 • Each T2 Plus processor has its own local memory connected to its FBDIMM channels
   > 21GB/sec (Theoretical Peak) Read
   > 10GB/sec (Theoretical Peak) Write
   > FBDIMM-667
 • Remote memory access has a 76ns penalty, about 1.5x the latency of local memory. This makes T2 Plus systems NUMA, though not highly so
 • Two memory interleaving modes implemented in hardware
 • 512 byte interleaving spreads memory accesses evenly across all nodes
 • 1 Gigabyte interleaving, where the OS can become involved in optimal memory placement (see the sketch below)
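A minimal sketch of the two modes, assuming a two-socket system and that the serving node is chosen by the address bits directly above the interleave granularity (the slide does not give the exact bit positions). Note that the slide's own numbers imply roughly 150ns local and 230ns remote latency, which is why the 1GB mode plus OS placement pays off.

/* Which node's memory serves a physical address under each interleave
 * mode on an assumed 2-node system.  512-byte interleave rotates every
 * 512-byte block across nodes; 1GB interleave keeps whole 1GB regions
 * on one node so the OS can place a process's memory near its cpus.
 */
#include <stdio.h>
#include <stdint.h>

#define NODES 2                          /* assumed 2-socket system */

static unsigned node_512b(uint64_t paddr) { return (paddr >> 9)  % NODES; }
static unsigned node_1gb (uint64_t paddr) { return (paddr >> 30) % NODES; }

int main(void)
{
    uint64_t addrs[] = { 0x0, 0x200, 0x400, 0x600, 0x40000000ULL };
    for (size_t i = 0; i < sizeof addrs / sizeof addrs[0]; i++)
        printf("paddr 0x%09llx: 512B-interleave node %u, 1GB-interleave node %u\n",
               (unsigned long long)addrs[i],
               node_512b(addrs[i]), node_1gb(addrs[i]));
    return 0;
}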

                                                        Page 11
T2 Plus Hardware design decisions - I/O
 • Each T2 Plus processor has its own x8 PCI-E link. Two-way systems have 2 x8 links, four-way systems have 4 x8 links
 • T2 implemented PCI-E strict ordering, which guaranteed data integrity but can unnecessarily block unrelated transactions
 • Enhancement for T2 Plus introduced relaxed DMA ordering. Allows request reordering and higher performance
 • Relaxed ordering helps scaling
 • A device interrupt is delivered to a local cpu via a PCI-E memory write (MSI or MSI-X) but if necessary can be forwarded to a remote cpu for completion via a cross call. This adds very little overhead.

                                                              Page 12
T5140 1U UltraSPARC T2 Platform
 • Optimized for maximum throughput per Rack Unit
 • Up to 128 threads / 64GB memory in a redundant 1U chassis
Extreme Rackmount Density
 • 1RU chassis, 26.5" depth
 • Two sockets
   > 6/8 cores @ 1.4GHz each
   > Up to 128 threads
 • 16 memory slots
   > Up to 64GB of memory, FB-DIMM
High Reliability
 • Up to 8 hot plug SATA/SAS 2.5" disk drives (RAID 0,1)
 • Redundant, hot-swappable PSUs and Fans
Expandable
 • 3 PCI-E expansion slots
 • 4 10/100/1000 Mbps Ethernet as standard
 • 4 USB ports


                                                                      Page 13
T5240 2U UltraSPARC T2 Platform
 • Aim is Data center simplification
Rackmount Density
 • 2RU chassis, 26.5" depth
 • Two sockets
   > 6/8 cores @ 1.4GHz
   > Up to 128 threads
 • 32 memory slots
   > Up to 128GB of memory, FB-DIMM
High Reliability
 • Up to 16 hot plug SATA/SAS 2.5" disk drives (RAID 0,1)
 • Redundant, hot-swappable PSUs and Fans
Expandable
 • 6 PCI-E expansion slots
 • 4 10/100/1000 Mbps Ethernet
 • 4 USB ports


                                                                        Page 14
Scaling the OS
 • There are a set of common issues when scaling the OS:
 • Single threaded kernel services
 • Scaling contended mutexes
 • Optimising memory placement
 • Optimal scheduling across cores
 • Scaling I/O
 • Scaling the Network
 • Local vs remote DMA
 • Delivering interrupts
 • Virtualization

                                                        Page 15
Single Threaded kernel services - Tick accounting

   • Solaris performs accounting and book keeping activities every clock tick
   • To do this, a cyclic timer is created to fire every clock tick
   • The handler then goes around all the active CPUs in the system to determine if any user thread is running on a CPU, and charges it with one tick
   • Measures the number of ticks a user thread is using and adjusts the time quantum used by a thread
   • Scheduler dispatching decisions are made using the number of ticks
   • LWP interval timers are processed every tick, if they have been set
                                                              Page 16
Tick Accounting (cont.)
  • As the number of CPUs increases, the tick accounting loop gets larger
  • Tick accounting is a single threaded activity, and on a busy system with many CPUs the loop can often take more than a tick (10ms) to process if the locks it needs to acquire are busy
  • This causes the clock to fall behind and timeout processing to be delayed. The system becomes unresponsive
  • The solution is to involve multiple cpus in the tick scheduling. Cpus are collected in groups and one is chosen to schedule all their ticks (see the sketch below)
  • An algorithm is used to spread the tick accounting evenly across active cpus, so each has to take less than 1% of the load
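A user-level sketch of the idea follows. The group size and the choice of each group's lowest-numbered cpu as its accountant are illustrative assumptions, not the actual Solaris algorithm; the point is that each cpu walks only its own small group instead of all 256.

/* Hypothetical sketch (not the Solaris implementation): spread tick
 * accounting across cpu groups so no single cpu walks all NCPUS entries
 * on every clock tick.
 */
#include <stdio.h>

#define NCPUS      256   /* active cpus, as on a 4-socket T2 Plus */
#define GROUP_SIZE 8     /* assumed cpus per accounting group      */

/* Charge one tick to whatever user thread is running on cpu (stub). */
static void account_tick(int cpu)
{
    printf("tick charged on cpu %d\n", cpu);
}

/* Called from the cyclic handler on cpu 'me'.  Each group elects its
 * lowest-numbered cpu to do the accounting walk for that group only.
 */
static void tick_handler(int me)
{
    int first = (me / GROUP_SIZE) * GROUP_SIZE;

    if (me != first)                 /* not this group's accounting cpu */
        return;

    for (int cpu = first; cpu < first + GROUP_SIZE && cpu < NCPUS; cpu++)
        account_tick(cpu);
}

int main(void)
{
    for (int cpu = 0; cpu < NCPUS; cpu++)   /* simulate one clock tick */
        tick_handler(cpu);
    return 0;
}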
                                                                 Page 17
Scaling contended mutexes
 • Mutexes in Solaris are optimised for the non contended case. Contended mutexes are considered rare
 • With up to 256 active threads in a single OS instance, calls on contended mutexes can become hotter
 • A side effect can be the thundering herd, where multiple callers are woken at the same time and attempt to acquire the lock
 • This is quite common in the network stack
 • The Solaris solution is an exponential backoff algorithm. This has the advantage of little overhead in the non-contended case and performance gains in the contended case



                                                               Page 18
T2 Plus NUMA aware OS
  • The OS has been NUMA aware since Solaris 9. It works on the concept of a logical group, lgroup, which is a portion of lower latency local memory and its associated cpus. This feature is called Memory Placement Optimisation (MPO)
  • Processes are assigned a home lgroup where hot areas of their address space such as heap and stack are allocated and where they will be scheduled
  • On T2 Plus systems an lgroup spans a processor and its local memory. lgrpinfo shows the groups and free memory (see the sketch below)
  • To implement NUMA aware memory placement all the memory is 1GB interleaved
  • MPO is greatly complicated by virtualization, as the Hypervisor needs to be MPO aware when allocating memory to a Logical Domain
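To see MPO from user land, the short program below asks Solaris for the number of lgroups and the calling thread's home lgroup. It assumes the documented liblgrp interfaces (lgrp_init, lgrp_nlgrps, lgrp_home) and is only a query example, not part of the MPO implementation.

/* Query the calling thread's home lgroup via Solaris liblgrp.
 * Assumes the lgrp_init/lgrp_nlgrps/lgrp_home interfaces (3LGRP);
 * build with:  cc -o homelgrp homelgrp.c -llgrp
 */
#include <stdio.h>
#include <sys/types.h>
#include <sys/procset.h>
#include <sys/lgrp_user.h>

int main(void)
{
    lgrp_cookie_t cookie = lgrp_init(LGRP_VIEW_OS);
    if (cookie == LGRP_COOKIE_NONE) {
        perror("lgrp_init");
        return 1;
    }

    /* On a two-socket T2 Plus system we would expect one leaf lgroup
     * per socket plus the root lgroup. */
    printf("lgroups in the system : %d\n", lgrp_nlgrps(cookie));
    printf("my home lgroup        : %d\n",
           (int)lgrp_home(P_LWPID, P_MYID));

    lgrp_fini(cookie);
    return 0;
}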
                                                              Page 19
Scaling the Network Stack
  • The T2 Plus systems have a 10Gig NIC integrated on the motherboard
  • We implemented Large Segment Offload (LSO) to increase transmit throughput and reduce CPU overhead
  • Large buffers (up to 64k) are sent to the lower layers, where the driver fragments them into packets (see the sketch below)
  • This avoids fragmenting at the IP layer and copying many smaller packets
  • Advantages:
    > Up to 30% better single thread throughput
    > Up to 20% better multi-thread throughput
    > 10% less cpu consumed by the network stack
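A toy calculation of what the offload moves out of the IP layer: one 64KB LSO buffer becomes about 45 wire packets at a standard 1460-byte TCP MSS (the MSS value is an assumption, not a T2 Plus-specific number).

/* Count the per-packet work LSO pushes down to the driver: one 64KB
 * send becomes ~45 wire packets at a typical 1460-byte TCP MSS.
 */
#include <stdio.h>

#define LSO_BUF  (64 * 1024)   /* large buffer handed to the driver   */
#define MSS      1460          /* assumed standard Ethernet TCP MSS   */

int main(void)
{
    int packets = (LSO_BUF + MSS - 1) / MSS;   /* round up */
    printf("one %d-byte LSO buffer -> %d packets fragmented in the driver\n",
           LSO_BUF, packets);
    printf("without LSO the IP layer would process all %d packets itself\n",
           packets);
    return 0;
}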
                                                             Page 20
T2 Plus Scheduling Optimizations
 • Solaris Processor Groups
   > Abstraction introduced to capture a group of CPUs with some
     (hardware) sharing relationship
      > int/FP pipelines, caches, chips, MMUs, crypto units, etc.
   > PGs used by dispatcher to implement multi-level CMT load balancing
      and affinity policies
 • T2 Plus
   > Groupings created for int/FP pipelines
   > Balances running threads across both levels
       > 16 - 32 threads => 1 per core i.e. 1 per floating point pipeline
       > 64 – 128 threads => 2 per core i.e. 1 per integer pipeline
 • Aim is to spread the load evenly across all available cpus
 • Logical domains can also use this information for scheduling
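The sketch below illustrates the two-level balancing for a single socket (8 cores, 2 integer pipelines per core, 4 strands per pipeline): threads are spread one per core first, then one per integer pipeline, before strands double up. The assignment formula is an illustrative assumption, not the dispatcher's actual code.

/* Illustrative placement for one T2 Plus socket: 8 cores, 2 integer
 * pipelines per core, 4 strands per pipeline.  Threads are spread one
 * per core (i.e. one per FP pipeline) first, then one per integer
 * pipeline, before strands on the same pipeline are doubled up.
 */
#include <stdio.h>

#define CORES   8
#define PIPES   2     /* integer pipelines per core     */
#define STRANDS 4     /* hardware strands per pipeline  */

int main(void)
{
    int nthreads = 20;   /* assumed number of runnable software threads */

    for (int t = 0; t < nthreads; t++) {
        int core   =  t % CORES;             /* level 1: across cores     */
        int pipe   = (t / CORES) % PIPES;    /* level 2: across pipelines */
        int strand = (t / (CORES * PIPES)) % STRANDS;
        printf("thread %2d -> core %d, int-pipe %d, strand %d\n",
               t, core, pipe, strand);
    }
    return 0;
}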

                                                                            Page 21
The Hypervisor: Virtualization for T2 Plus Platforms

  • "sun4v" architecture

[Diagram: in the traditional sun4u model, a Solaris update (genunix plus sun4u code and US-Z CPU code) runs directly on a specific CPU "Z". In the sun4v model, Solaris (genunix plus sun4v code) runs on the sun4v interface exported by the SPARC hypervisor, which runs on the SPARC CPU, decoupling the operating system from the platform.]
                                                                  Page 22
The Sun4v/HV/LDOMs Model
• Replace HW domains with Logical Domains
  > Highly flexible
• Each Domain runs an independent OS
  > Capability for specialized domains

[Diagram: a Service Domain and Logical Domains 1-3 each run their own OS (e.g. Solaris 10, OpenSolaris) with applications, some inside Containers; all sit on the Hypervisor, which shares the hardware - CPUs, memory and I/O - among them.]
                                                                          Page 23
Virtualized I/O

[Diagram: Logical Domain A (privileged) runs applications over a virtual device driver; the hypervisor (hyper-privileged) connects it through a domain channel to the virtual device service in a Service Domain. The Service Domain hosts the real device driver (/pci@B/qlc@6), nexus driver (/pci@B) and virtual nexus interface, and reaches the hardware through the I/O MMU, PCI root and I/O bridge on PCI bus B.]
                                                                                                   Page 24
T2 Plus - Scaling Applications
  • Scaling and throughput are the most important goals for performance on highly threaded systems
  • High utilisation of the pipelines is key to performance
  • Need many software threads or processes to utilise the hardware threads
  • Threads have to be actually working
  • Multiple instances of an application may be necessary to achieve scaling
  • A single thread can become the bottleneck for the application
  • Spinning in a tight loop waiting for a hot lock is not good for CMT
  • Critical sections can also reduce scaling

                                                               Page 25
Utilization
  • Solaris maintains a count of ticks in a kstat which is updated on entry and exit of the idle, user and kernel states. On traditional processors these kstats indicate the percentage of time spent in each state
  • T2 Plus utilises the pipeline completely differently: 4 threads share each of the two integer pipelines per core, and all 8 share the FP pipeline
  • If a thread stalls it is parked by the hardware and its cycles are given to the other 3 threads until the stall completes. A thread is also parked on entering the idle state
  • From a kstat perspective Solaris believes it is running on 4 processors. Depending on the amount of stall in the threads, the actual distribution of cpu cycles can be radically different from the Solaris view

                                                                 Page 26
Utilization and Corestat
• In order to determine the real pipeline utilization we use low
  level hardware counters to count the number of instructions per
  second per pipeline – both integer and floating point
• We have written a tool called corestat which collects data from
  the low level performance counters and aggregates it to
  present utilization percentages
• Available from http://cooltools.sunsource.net/corestat/
• Corestat reports (see the example calculation below):
  > Individual integer pipeline utilization (default mode)
  > Floating Point Unit utilization (-g flag)
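Under the hood the percentages are just instructions retired per pipeline divided by the pipeline's peak issue rate. The short sketch below reproduces that arithmetic for the first row of the example output on the next slide; the counter values are assumed inputs, and this is an illustration of the calculation, not corestat's actual implementation.

/* Utilization as corestat reports it: instructions completed per second
 * on a pipeline, divided by the pipeline's peak issue rate (one
 * instruction per cycle at 1.4GHz).  Counter values are assumed and
 * chosen to reproduce the 1.74 / 3.00 / 4.74 row on the next slide.
 */
#include <stdio.h>

#define CLOCK_HZ 1.4e9            /* 1.4GHz, one instruction/cycle peak */

int main(void)
{
    double user_instr = 24.4e6;   /* assumed user-mode instructions in 1s   */
    double sys_instr  = 42.0e6;   /* assumed system-mode instructions in 1s */

    printf("%%Usr = %.2f  %%Sys = %.2f  %%Usr+Sys = %.2f\n",
           100.0 * user_instr / CLOCK_HZ,
           100.0 * sys_instr  / CLOCK_HZ,
           100.0 * (user_instr + sys_instr) / CLOCK_HZ);
    return 0;
}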




                                                             Page 27
    Corestat: Example output
#   corestat
    Core Utilization for Integer pipeline
Core,Int-pipe %Usr %Sys %Usr+Sys
-------------      -----     -----   --------
   0,0              1.74     3.00    4.74
   0,1              1.56     2.70    4.25
   1,0              1.69     2.90    4.59
   1,1              1.55     2.69    4.25
   2,0              1.69     2.90    4.59
   2,1              1.56     2.69    4.24
   3,0             28.92     2.10 31.03
   3,1             1.56     2.68     4.24
   4,0             1.68     2.95     4.63
   4,1             1.56     2.69     4.24
   5,0             1.69     2.91     4.60
   5,1             1.56     2.69     4.24
   6,0             1.69     2.92     4.61
   6,1            31.77     1.89 33.66
-------------     -----    -----   --------
   Avg             5.73     2.69    8.42

                                                Page 28
Application Locking issues
  • As in the OS, the most common reason applications fail to scale on T2 Plus is hot locks
  • We have found this especially true when migrating from 2-4 way systems. Systems with 2-4 cpus/cores tend to serialize access to hot locks and structures
  • On T2 Plus locks tend to reside in the L2 cache. Cache to cache transfers are much lower latency and lock acquire time can be a lot less, so more threads can be waiting on the same lock
  • When developing locking code, remember the CMT architecture is different:
  • Use a low-impact, long-latency opcode to add delay in the busy-wait loop. The low-impact opcode frees up cycles so that other threads sharing the core get more useful work done
  • Try to use randomized exponential backoff between attempts to acquire the lock. Without backoff, multiple threads concurrently attempt atomic operations on the same address (see the sketch below)
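A minimal sketch of such a busy-wait loop using C11 atomics: a test-and-test-and-set lock with randomized, capped exponential backoff. The cpu_relax() delay loop and rand() stand in for whatever low-impact long-latency opcode and random source a real SPARC implementation would use; this illustrates the pattern, not Sun's library code.

/* Test-and-test-and-set spinlock with randomized exponential backoff.
 * The delay loop stands in for a low-impact, long-latency opcode that
 * would free pipeline cycles for the other strands on the core.
 */
#include <stdatomic.h>
#include <stdlib.h>

#define BACKOFF_MIN 4
#define BACKOFF_MAX 4096

static atomic_int lock = 0;

static void cpu_relax(int spins)          /* stand-in for a low-impact delay */
{
    for (volatile int i = 0; i < spins; i++)
        ;
}

static void spin_lock(atomic_int *l)
{
    int limit = BACKOFF_MIN;

    for (;;) {
        /* Read first: don't hammer the cache line with atomic swaps. */
        while (atomic_load_explicit(l, memory_order_relaxed))
            cpu_relax(BACKOFF_MIN);

        if (!atomic_exchange_explicit(l, 1, memory_order_acquire))
            return;                        /* got the lock */

        /* Lost the race: back off for a random slice of the current
         * window, then double the window up to a cap. */
        cpu_relax(rand() % limit);
        if (limit < BACKOFF_MAX)
            limit *= 2;
    }
}

static void spin_unlock(atomic_int *l)
{
    atomic_store_explicit(l, 0, memory_order_release);
}

int main(void)
{
    spin_lock(&lock);
    spin_unlock(&lock);
    return 0;
}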
                                                                           Page 29
Memory Bandwidth - T2 Plus STREAM Results




      vcpus   Copy    Scale    Add     Triad    (MB/s)
       128    27950   23212   28901   29248
       112    30057   24989   31480   31108
        96    28073   23546   29797   29766
        80    28218   23371   29511   29548
        64    28088   23523   29650   29555
        48    28033   23410   29495   29246
        32    27975   23041   26444   25751
        16    25000   17500   14447   14113




                                               Page 30
SPECint_rate2006: UltraSPARC T2 Plus
• 2x scaling from T2 to 2 x T2 Plus


 SPECint_rate2006

 System       Peak   Base   Configuration
 Sun T5140    157    141    2 x UltraSPARC T2 Plus       1.4GHz
 Sun T5120    78     72     1 x UltraSPARC T2            1.4GHz
 Sun T5120    68     63     1 x UltraSPARC T2            1.2GHz
 IBM p570     122    108    2 x Power 6                  4.7 GHz
 SuperMicro   147    121    2 x Xeon X5482               3.2 GHz




                                      *See Slide “Benchmark Disclosure”
                                                                          Page 31
SPECfp_rate2006: UltraSPARC T2 Plus
 • 1.9x scaling from T2 to 2x T2 Plus


 SPECfp_rate2006

 System       Peak   Base   Configuration
 Sun T5140    119    111    2 x UltraSPARC T2 Plus        1.4GHz
 Sun T5120    62     57     1 x UltraSPARC T2             1.4GHz
 IBM p570     116    98     2 x Power 6                   4.7 GHz
 Dell T7400   81     76     2 x Xeon X5482                3.2 GHz




                                      *See Slide “Benchmark Disclosure”
                                                                          Page 32
SPECjbb2005 performance, 16 JVMs
• 1.94x scaling from T2 to 2 x T2 Plus
• T5140 run with 16 JVMs


SPECjbb2005

 System             Metric   Way/GHz/cpu                #core
 Sun Fire T5140     373k     2 x 1.4GHz Sun T2 Plus     16
 Sun Fire T5120     192k     1 x 1.4GHz Sun T2          8
 IBM System x3650   323k     2 x 3.16GHz Xeon           8
 IBM p6 570         346k     4 x 4.7GHz Power 6         8

                                         *See Slide “Benchmark Disclosure”
                                                                             Page 33
Java on UltraSPARC T2 Plus
SPECjbb2005 Multi-JVM Results

[Bar chart comparing SPECjbb2005 bops (scale 0 to 375,000) for IBM p570, IBM System 3650, HP DL360 G5, Dell PE2950 III and Sun SE T5240; the T5240 leads at roughly 373,000 bops.]
                                         *See Slide “Benchmark Disclosure”        Page 34
SPECjAppServer2004
 • 1.7x scaling from T2 to 2 x T2 Plus


SPECjAppServer2004

 System           Metric   Appserver         Way/GHz/cpu                #core
 Sun Fire T5140   3331     Oracle 10.1.3.3   2 x 1.4GHz Sun T2 Plus     16
 Sun Fire T5120   2000     Oracle 10.1.3.3   1 x 1.4GHz Sun T2          8
 IBM p570         1197     WS 6.1            2 x 4.7GHz Power 6         4
 Inspur NF280D    1538     BEA 10            2 x 2.6GHz Xeon 5355       8

                                      *See Slide "Benchmark Disclosure"
                                                                          Page 35
it.com
• Run in house on a T5240
• Discover software for law firms
• Pure 64-bit Java application
• Ingesting a huge amount of data and running a compute intensive learning algorithm
• Traditional solution: x86 clusters
• Went from 3GB of data an hour on 2-way dual core x86 servers to 50GB an hour on the T5240




                                                     Page 36
Conclusions
 • The implementation of T2 Plus enables highly threaded two-way and four-way servers
 • The high bandwidth, low latency interconnect is essential to achieve good scalability
 • Scalable memory bandwidth and physical I/O is also required
 • An OS running on a highly threaded processor must also be scalable and aware of the underlying hardware implementation. It must also be virtualizable
 • High throughput and utilisation of the per core pipelines is key to Application scaling
 • Requires scalable locking
 • May require multiple instances

                                                             Page 37
More Info
• Start with our Engineering blogs
• http://blogs.sun.com/allanp/entry/sun_s_cmt_goes_multi
• All benchmarks are posted here:
• http://www.sun.com/servers/coolthreads/benchmarks/index.jsp




                                                 Page 38
Benchmark Disclosure
SPEC, SPECjAppServer are registered trademarks of Standard Performance Evaluation Corporation. All results from www.spec.org as of 04/01/08.
SPECjAppServer2004 Sun SPARC Enterprise T5240 (16 cores, 2 chip) 3,331 JOPS@Standard. SPECjAppServer2004; IBM p570 (4 cores, 2 chips)
1,197.51 JOPS@Standard. SPECjAppServer2004; Inspur NF280D (8 cores, 2 chips) 1,538.65 JOPS@Standard.SPECjAppServer2004.
SPECjAppServer2004 Sun SPARC Enterprise T5220 (8 cores, 1 chip) 2000 JOPS@Standard. SPECjAppServer2004


SPEC, SPECjbb reg tm of Standard Performance Evaluation Corporation. Sun SPARC Enterprise T5120 results submitted to
SPEC. Other results as of 09/28/07 on www.spec.org. T5240 results submitted to SPEC for review.
Other results as of 04/09/2008 on www.spec.org. Sun SPARC Enterprise T5240 (2 chip, 16 cores, 128 threads) 373,405
SPECjbb2005 bops, 23,338 SPECjbb2005 bops/JVM. IBM p570 (4 chips, 8 cores, 16 threads) SPECjbb2005 bops = 346,742,
SPECjbb2005 bops/JVM = 86,686. IBM System x3650 (2 chips, 8 cores, 8 threads) SPECjbb2005 bops = 323,172, SPECjbb2005
bops/JVM = 80,793. HP ProLiant DL360 G5 (2 chips, 8 cores, 8 threads) SPECjbb2005 bops = 301,321, SPECjbb2005 bops/JVM
= 75330. Dell PowerEdge 2950 III (2 chips, 8 cores, 8 threads) SPECjbb2005 bops = 305,411, SPECjbb2005 bops/JVM = 76,353

SPEC, SPECint reg tm of Standard Performance Evaluation Corporation. Sun result submitted to SPEC, other results from
www.spec.org as of 4/7/08. Sun SPARC Enterprise T5240 (UltraSPARC T2 Plus, 2 chips, 16 cores), 157 SPECint_rate2006, 142
SPECint_rate_base2006; IBM p 570 (POWER6, 2 chips, 4 cores), 122 SPECint_rate2006; Sun SPARC Enterprise T5220
(UltraSPARC T2, 1 chip, 8 cores), 83.2 SPECint_rate2006. Supermicro (Xeon X5482, 2 chips, 8 cores), 147 SPECint_rate2006

SPEC, SPECfp reg tm of Standard Performance Evaluation Corporation. Sun result submitted to SPEC, other results from
www.spec.org as of 4/7/08. Sun SPARC Enterprise T5240 (UltraSPARC T2 Plus, 2 chips, 16 cores), 119 SPECfp_rate2006, 111
SPECfp_rate_base2006; IBM p 570 (POWER6, 2 chips, 4 cores), 116 SPECfp_rate2006; Sun SPARC Enterprise T5220
(UltraSPARC T2, 1 chip, 8 cores), 62.3 SPECfp_rate2006; Dell T7400 (Xeon X5482, 2 chips, 8 cores), 81 SPECfp_rate2006
SPEC, SPECjAppServer, SPECint, SPECfp are registered trademarks of Standard Performance Evaluation Corporation.



                                                                                                                                      Page 39

								