
MCAE Performance on Sun's x64 Systems

Henry H. Fong
Industry Principal, Global HPC
Manufacturing Industry Sales
henry.fong@sun.com

May 14-15, 2006, Singapore

Agenda
• Sun HPC Update
• x64 Performance on MCAE Apps
• Sun's Strengths in MCAE/x64
x64 HPC Strategy
• Performance and Scalability
• Energy Efficiency
    > Power consumption, Heat dissipation, Cooling
    > Lower operating costs
• System Balance
    > Compute, Interconnect, I/O
• Complete software stack for each segment
• Manageability
• Serviceability
• Reference designs for each industry vertical from
  common building blocks
General HPC Trends
• Adoption of clusters is increasing
  > Compute nodes are moving to 4-way (two sockets, dual-core)
  > Memory density is growing
• Price/performance keeps improving, roughly following
  Moore's law
  > Nodes are getting fatter
• GbE is still the interconnect of choice in many HPC
  application areas (e.g., GM, DaimlerChrysler)
• InfiniBand (IB) provides a low-latency, high-bandwidth
  alternative
  > 32 dual-socket, dual-core nodes within a rack
  > Multi-rack deployments through layers of switches
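The node and rack figures on this slide multiply out as follows. This is a minimal sketch. The helper name `cores_per_rack` is hypothetical, and its defaults are simply the slide's 32 nodes per rack, two sockets per node, and two cores per socket.

```python
# Hypothetical helper: total processor cores in one rack of the nodes
# described on this slide (32 nodes, 2 sockets/node, 2 cores/socket).
def cores_per_rack(nodes: int = 32, sockets_per_node: int = 2,
                   cores_per_socket: int = 2) -> int:
    """Return the number of cores in a rack."""
    return nodes * sockets_per_node * cores_per_socket


if __name__ == "__main__":
    cores_per_node = 2 * 2  # a "4-way" node: 2 sockets x 2 cores
    print(f"{cores_per_node} cores/node, {cores_per_rack()} cores/rack")
```

A full rack therefore provides 128 cores, which is the building block that the multi-rack switch layers connect.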
Power Consumption Ratios
[Chart: peak performance (TFlops/s, log scale, 10 to 10,000) vs. average power consumption (MW, 0.1 to 10), with diagonal lines of constant GFlops/kW ratio (500, 200, 100, 50, 20, 10) and regions marked "More power efficient" and "Less power efficient". Series: Vector, MPP, Cluster, Fat node, plus a Sun x64 (40 TF) point. Average cluster power, if not known, is calculated as 2/3 of peak power; I/O is not included. Systems plotted: BG LLL, BG Watson, LLL ASC Purple, Sandia Tbird, NASA Columbia, MareNostrum, Sandia RedStorm, EarthSim, Colsa Xserve, ORNL XT3, VT Sys X, LLL Thunder, ORNL X1E.]
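The chart's efficiency metric divides peak GFlop/s by average power, falling back to the chart's "2/3 of peak power" rule when no measured average is known. This is a minimal sketch. The function names and the 40 TFlop/s, 600 kW example system are hypothetical and are not data from the chart.

```python
from typing import Optional


def average_power_kw(peak_power_kw: float,
                     measured_average_kw: Optional[float] = None) -> float:
    """Measured average power if known, else 2/3 of peak power (chart's rule)."""
    if measured_average_kw is not None:
        return measured_average_kw
    return peak_power_kw * 2.0 / 3.0


def gflops_per_kw(peak_tflops: float, average_kw: float) -> float:
    """Peak GFlop/s per kW of average power (I/O excluded, as on the chart)."""
    return peak_tflops * 1_000.0 / average_kw


if __name__ == "__main__":
    # Hypothetical system: 40 TFlop/s peak, 600 kW peak power, no measured average.
    avg = average_power_kw(600.0)
    print(f"average power {avg:.0f} kW -> {gflops_per_kw(40.0, avg):.0f} GFlops/kW")
```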
Floor Space Ratios
[Chart: peak performance (TFlops/s, log scale, 10 to 10,000) vs. floor space (K sq ft, 0.1 to 100), with diagonal lines of constant Gflops/sq ft ratio (1000, 500, 200, 100, 50, 20, 10, 5, 2) and regions marked "More space efficient" and "Less space efficient". Series: Sun x64, Vector, MPP, Cluster, Fat node. Floor space is calculated as 2x the footprint and does not include I/O. Systems plotted: BG LLL, BG Watson, NASA Columbia, LLL ASC Purple, Sandia Tbird, MareNostrum, Sandia RedStorm, EarthSim, Colsa Xserve, ORNL XT3, LLL Thunder, ORNL X1E, VT Sys X.]
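The floor-space metric follows the same pattern, with floor space taken as twice the footprint, as stated on the chart. This is a minimal sketch. The function names and the 40 TFlop/s, 200 sq ft example are hypothetical.

```python
def floor_space_sq_ft(footprint_sq_ft: float) -> float:
    """Chart's rule: floor space = 2 x footprint (I/O not included)."""
    return 2.0 * footprint_sq_ft


def gflops_per_sq_ft(peak_tflops: float, footprint_sq_ft: float) -> float:
    """Peak GFlop/s per square foot of (doubled) floor space."""
    return peak_tflops * 1_000.0 / floor_space_sq_ft(footprint_sq_ft)


if __name__ == "__main__":
    # Hypothetical system: 40 TFlop/s peak on a 200 sq ft footprint.
    print(f"{gflops_per_sq_ft(40.0, 200.0):.0f} Gflops/sq ft")
```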
Memory Size Ratios
[Chart: peak performance (TFlops/s, log scale, 10 to 10,000) vs. memory size (TBytes, 1 to 1,000), with diagonal lines of constant bytes/(flop/s) ratio (0.1, 0.2) and regions marked "Less balanced" and "More balanced". Series: Sun x64, Vector, MPP, Cluster, Fat node. Does not include I/O nodes. Systems plotted: BG LLL, BG Watson, NASA Columbia, LLL ASC Purple, EarthSim, Sandia Tbird, Sandia RedStorm, MareNostrum, Colsa Xserve, ORNL XT3, VT Sys X, LLL Thunder, ORNL X1E, Stuttgart SX-8.]
Bisection Bandwidth Ratios
[Chart: peak performance (TFlops/s, log scale, 10 to 10,000) vs. bisection bandwidth (TBytes/s, 0.01 to 1,000), with diagonal lines of constant bytes/flop ratio (.001, .002, .005, .01, .02, .05, .1, .2, .5) and regions marked "Less balanced" and "More balanced". Series: Sun x64, Vector, MPP, Cluster, Fat node. Systems plotted: BG LLL, BG Watson, Sandia Tbird, LLL ASC Purple, NASA Columbia, Sandia RedStorm, EarthSim, MareNostrum, Colsa Xserve, ORNL XT3, LLL Thunder, ORNL X1E, VT Sys X, Stuttgart SX-8.]
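The memory and bisection-bandwidth charts both measure balance: resources per flop/s of peak. Because TBytes and TFlop/s share the same 10^12 scale, the ratio can be taken directly. This is a minimal sketch. The function names and the 40 TFlop/s, 8 TB, 2 TB/s example are hypothetical.

```python
def memory_ratio(memory_tbytes: float, peak_tflops: float) -> float:
    """Bytes of memory per flop/s of peak (the 1e12 factors cancel)."""
    return memory_tbytes / peak_tflops


def bisection_ratio(bisection_tbytes_per_s: float, peak_tflops: float) -> float:
    """Bytes/s of bisection bandwidth per flop/s of peak, i.e. bytes per flop."""
    return bisection_tbytes_per_s / peak_tflops


if __name__ == "__main__":
    # Hypothetical system: 40 TFlop/s peak, 8 TB memory, 2 TB/s bisection.
    print(f"memory: {memory_ratio(8.0, 40.0):.2f} bytes/(flop/s)")
    print(f"bisection: {bisection_ratio(2.0, 40.0):.3f} bytes/flop")
```

Higher values of either ratio sit toward the "More balanced" side of the charts.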
Vertical vs. Horizontal Workloads
[Chart (workload characterization, courtesy of NEC): data size (small to large) vs. compute intensity (little to huge), with regions labeled "Fit for Vector", "64bit Shared Memory", "Fit for Scalar", and "32bit-Cluster". Workloads placed on the chart: nanotechnology, finance, real time, local weather forecast, engine analysis simulation, meteorology, genomics, automotive, EMD simulation, fluid dynamics, noise analysis, crash, chemistry, EMD, structure.]
A Complete HPC Portfolio From Sun
Across all layers: Sun CRS, Support, Architectural, and Professional Services
• Applications: custom or ISV applications; Sun HPC ClusterTools (open, free)
• Workload management: Sun N1™ Grid Engine software (open, free)
• Cluster management: Sun N1™ System Manager software (open, free)
• Operating system (open, free)
• Node/processor: 64-bit
• Interconnect: Gigabit Ethernet, Myrinet, InfiniBand
Sun’s Extended x64 Product Line
• New rack servers: Sun Fire X2100, Sun Fire X4100, Sun Fire X4200;
  future Galaxy servers up to 16-way
• Rack servers: V20z and V40z, single- or dual-core
• x64 workstations: Sun Ultra 20, Sun Ultra 40
Sun Grid Rack System
The Integrated HPC or Web Services System, Ready for Deployment
  • Any combination of Sun Fire™ X2100, X4100, and
    X4200 servers
     > Easy-to-use web-based configurator available
  • High Performance Computing (HPC) and Web
    Services options
     > Suggested configurations available, based on Sun's expertise
  • Infrastructure:
     > Sun™ Rack, cabling
     > Networking (Sun and selected third-party products)
     > Recommended management software: Sun N1™ Grid Engine,
       Sun N1™ System Manager, Sun N1™ Service Provisioning
       System
Sun Grid Rack System Configurator

> Flexibility in components
   > Rack size
   > Server nodes, OS
   > Management node and software
   > Networking and interconnect
   > Storage
   > Support service options
> Sample configurations in different sizes
   > Industry focus, e.g., Manufacturing

http://www.sun.com/servers/sungridracksystem/configtool/index.html
    FLUENT performance on Sun Fire X2100/X4100 clusters
    (SUSE Linux SLES 9 SP3; FL5L1, FL5L2, FL5L3 benchmarks;
    InfiniBand interconnects; 16 CPUs. Rating: higher is
    better. Apr. 5, 2006)
      -----------------------------------------------------------------------------------------
      Server           Processor             IB interconnect   CPUs    FL5L1    FL5L2    FL5L3
      ---------------  --------------------  ----------------  ----  -------  -------  -------
      Sun X2100        3.0 GHz Opteron 156   Topspin             16   2019.9   2054.7    395.7
      Sun X4100        2.8 GHz Opteron 254   SilverStorm         16   1888.5   1950.3    374.6
      ---------------  --------------------  ----------------  ----  -------  -------  -------
      IBM e326         2.4 GHz Opteron 250   Topspin             16   1608.2   1615.0    311.6
      IBM x336         3.6 GHz Xeon EM64T    Topspin             16   1632.5   1398.6    260.2
      HP DL360         3.4 GHz Xeon EM64T    Voltaire            16   1441.8   1249.9    237.8
      SGI Altix 3000   1.6 GHz Itanium 2     (unknown)           16   1205.0   1056.2    180.5
      -----------------------------------------------------------------------------------------
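FLUENT's published benchmark rating is, to our understanding, the number of runs that would complete in 24 hours (86,400 s divided by the elapsed wall-clock seconds), so each rating can be turned back into an elapsed time and compared against the Sun X2100 cluster. This is a sketch under that assumption. The dictionary and function names are illustrative, and the ratings are copied from the table.

```python
SECONDS_PER_DAY = 86_400  # assumed FLUENT convention: rating = runs per day

# (FL5L1, FL5L2, FL5L3) ratings, 16 CPUs each, from the table above.
RATINGS = {
    "Sun X2100": (2019.9, 2054.7, 395.7),
    "Sun X4100": (1888.5, 1950.3, 374.6),
    "IBM e326": (1608.2, 1615.0, 311.6),
    "IBM x336": (1632.5, 1398.6, 260.2),
    "HP DL360": (1441.8, 1249.9, 237.8),
    "SGI Altix 3000": (1205.0, 1056.2, 180.5),
}
BENCHMARKS = ("FL5L1", "FL5L2", "FL5L3")


def elapsed_seconds(rating: float) -> float:
    """Wall-clock seconds per run implied by a rating (runs per day)."""
    return SECONDS_PER_DAY / rating


def relative_to(baseline: str) -> dict:
    """Each system's ratings divided by the baseline's (above 1.0 = faster)."""
    base = RATINGS[baseline]
    return {name: tuple(r / b for r, b in zip(values, base))
            for name, values in RATINGS.items()}


if __name__ == "__main__":
    rel = relative_to("Sun X2100")
    for name, values in RATINGS.items():
        times = ", ".join(f"{b} {elapsed_seconds(v):6.1f} s"
                          for b, v in zip(BENCHMARKS, values))
        ratios = ", ".join(f"{x:.2f}" for x in rel[name])
        print(f"{name:<15} {times} | vs. X2100: {ratios}")
```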
MCAE on x64-Fong-Singapore-06-05-14.sxi
    STAR-CD performance on a Sun Fire X2100 cluster
    (2.6 GHz Opteron, SUSE Linux SLES 9 SP3; Mercedes A-Class
    benchmark, turbulent flow; Gigabit Ethernet; Nov. 2005)
                   ------------------------------------------------------
                         No. of processors     Scalability (speedup vs. 1 proc.)
                   -----------------------------  ---------------------------------
                                           1                 1.00
                                           4                 3.74
                                           8                 7.29
                                          16                12.79
                                          32                18.25
                   ------------------------------------------------------
                   We plan to run the same STAR-CD benchmark over Topspin,
                   SilverStorm, and Voltaire InfiniBand interconnects.
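The speedups in the STAR-CD table can be read as parallel efficiency (speedup divided by processor count). They can also be read through the Karp-Flatt metric, an experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p). The Karp-Flatt metric is a standard technique brought in here for illustration; the benchmark report does not use it. A value of e that grows with p usually points to growing communication overhead rather than a fixed serial portion. This is a minimal sketch, and the names are illustrative.

```python
# Speedups for the STAR-CD A-Class benchmark over Gigabit Ethernet (table above).
SPEEDUP = {1: 1.00, 4: 3.74, 8: 7.29, 16: 12.79, 32: 18.25}


def efficiency(p: int, s: float) -> float:
    """Parallel efficiency: speedup divided by processor count."""
    return s / p


def karp_flatt(p: int, s: float) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    if p == 1:
        raise ValueError("Karp-Flatt metric is undefined for p = 1")
    return (1.0 / s - 1.0 / p) / (1.0 - 1.0 / p)


if __name__ == "__main__":
    for p, s in SPEEDUP.items():
        e = "  -   " if p == 1 else f"{karp_flatt(p, s):.4f}"
        print(f"{p:>3} procs  speedup {s:5.2f}  "
              f"efficiency {efficiency(p, s):6.1%}  serial fraction {e}")
```

At 32 processors the efficiency drops to about 57%, and the serial fraction is highest there, which is consistent with the plan to retest over InfiniBand.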
CFX 5.7.1 and CFX 10.0
V440 and V40z Performance on Sun (larger is better)
[Bar chart: performance (axis 0 to 10) with 1, 2, and 4 CPUs for V440 CFX 5.7.1, V440 CFX 10.0, V40z CFX 5.7.1, and V40z CFX 10.0.]
ANSYS 9.0 and 10.0 Multiphysics
V440 and V40z Performance on Sun (larger is better)
[Bar chart: performance (axis 0 to 4) with 1, 2, and 4 CPUs for V440 ANSYS 9, V440 ANSYS 10, V40z ANSYS 9, and V40z ANSYS 10.]
64-bit ANSYS 9.0 Brake Automotive Performance (2P)
Ansys 9.0 Automotive Brake Rotor Benchmarks
[Bar chart: performance relative to Itanium 2 (axis 0.00 to 2.00) for Dell Precision 670 Xeon64 3.6 GHz, HP rx4640 1.5 GHz Itanium 2, and V40z Opteron 2.6 GHz on the benchmarks sm_struct_block_lanczos, sm_struct_frontal, sm_struct_pcg, sm_struct_sparse, sm_therm_frontal, sm_therm_pcg, and sm_therm_sparse.]
64-bit ANSYS 9.0 Brake Automotive Performance (4P)
Ansys 9.0 Automotive Brake Rotor Benchmarks
[Bar chart: performance relative to Itanium 2 (axis 0.00 to 2.00) for HP rx4640 1.5 GHz Itanium 2 and V40z Opteron 2.6 GHz on the benchmarks sm_struct_block_lanczos, sm_struct_frontal, sm_struct_pcg, sm_struct_sparse, sm_therm_frontal, sm_therm_pcg, and sm_therm_sparse.]
    LS-DYNA scalability/performance on a Sun Fire X2100
    cluster (2.6 GHz Opteron 152, SUSE Linux SLES 9; Neon
    benchmark, www.topcrunch.org): GbE vs. InfiniBand
    interconnects. Scal. = speedup over 1 processor;
    efficiency = Scal. / processors.
      ------------------------------------------------------------------------------------
                         GbE                                 Cisco/Topspin IB
      Procs   Elapsed t (s)    Scal.   Eff. (%)    Elapsed t (s)     Scal.   Eff. (%)
      -----   -------------  -------  ---------    -------------  --------  ---------
          1          11,331    1.000         --           10,241     1.000         --
          2           6,341    1.787       89.4            5,128     1.997       99.8
          4           3,222    3.517       87.9            2,703     3.789       94.7
          8           1,874    6.046       75.6            1,414     7.294       91.2
         16           1,676    6.761       42.3              774    13.231       82.7
         32           2,529    4.480       14.0              458    22.360       69.9
      ------------------------------------------------------------------------------------
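Speedup and efficiency in the Neon table follow from the elapsed times alone (S = T1/Tp, efficiency = S/p). Recomputing them reproduces the tabulated values, with one exception: in the 8-processor InfiniBand row, 10,241/1,414 is about 7.24 rather than 7.294, so one of those two figures may be a transcription slip. The sketch below also finds the processor count with the shortest run. With GbE, the 32-processor run is slower than the 16-processor one. This is a minimal sketch, and the names are illustrative.

```python
PROCS = [1, 2, 4, 8, 16, 32]
# Elapsed seconds for the LS-DYNA Neon benchmark, from the table above.
ELAPSED = {
    "GbE": [11_331, 6_341, 3_222, 1_874, 1_676, 2_529],
    "Cisco/Topspin IB": [10_241, 5_128, 2_703, 1_414, 774, 458],
}


def scaling(times: list) -> list:
    """(processors, speedup T1/Tp, efficiency in percent) for each run."""
    t1 = times[0]
    return [(p, t1 / t, 100.0 * t1 / (t * p)) for p, t in zip(PROCS, times)]


def best_processor_count(times: list) -> int:
    """Processor count with the shortest elapsed time."""
    return min(zip(times, PROCS))[1]


if __name__ == "__main__":
    for net, times in ELAPSED.items():
        print(f"{net} (fastest at {best_processor_count(times)} procs)")
        for p, s, eff in scaling(times):
            print(f"  {p:>2} procs  speedup {s:6.3f}  efficiency {eff:5.1f}%")
```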
 LS-DYNA performance on a Sun Fire X2100 cluster
      (3.0 GHz Opteron 156, SUSE Linux SLES 9 SP3; 3-car-crash
 benchmark; various interconnects; 16 processors; elapsed time
 in seconds, lower is better; Apr. 5, 2006; www.topcrunch.org)
      ----------------------------------------------------------------------------------------------
      Vendor, server, clock, processor           Interconnect    #nodes x #procs/node   Elapsed t
                                                                 x #cores/proc             (sec)
      -----------------------------------------  --------------  ---------------------  ---------
      Sun X2100, 3.0 GHz, Opteron 156            IB (Topspin)    16 x 1                    10,987
      ----------------------------------------------------------------------------------------------
      HP CP4000 (DL145), 2.6 GHz, Opteron        IB (Topspin)    16 x 1                    11,566
      Cray XD1, 2.2 GHz, dual-core Opteron       RapidArray      8 x 2 x 1                 12,222
      IBM eServer p5 575, 1.9 GHz                HPS             2 x 8                     12,386
      SGI Altix 3700, 1.6 GHz, Itanium 2         NUMAlink        1 x 16                    12,388
      Cray XD1, 2.4 GHz, Opteron                 RapidArray      8 x 2                     12,576
      Cray XD1, 2.2 GHz, dual-core Opteron       RapidArray      4 x 2 x 2                 14,078
      HP C6000 (rx2600), 1.5 GHz, Itanium 2      IB (Topspin)    8 x 2                     14,737
      ----------------------------------------------------------------------------------------------
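Because every 3-car-crash run uses 16 processors, the elapsed times compare directly. The sketch below ranks them and expresses each time as a multiple of the Sun X2100 result, where a value above 1.0 means slower. This is a minimal sketch. The data are copied from the table, and the names are illustrative.

```python
# (system, interconnect, elapsed seconds) from the 16-processor table above.
RESULTS = [
    ("Sun X2100, 3.0 GHz Opteron 156", "IB (Topspin)", 10_987),
    ("HP CP4000 (DL145), 2.6 GHz Opteron", "IB (Topspin)", 11_566),
    ("Cray XD1, 2.2 GHz dual-core Opteron (8x2x1)", "RapidArray", 12_222),
    ("IBM eServer p5 575, 1.9 GHz", "HPS", 12_386),
    ("SGI Altix 3700, 1.6 GHz Itanium 2", "NUMAlink", 12_388),
    ("Cray XD1, 2.4 GHz Opteron", "RapidArray", 12_576),
    ("Cray XD1, 2.2 GHz dual-core Opteron (4x2x2)", "RapidArray", 14_078),
    ("HP C6000 (rx2600), 1.5 GHz Itanium 2", "IB (Topspin)", 14_737),
]


def relative_times(results, baseline_seconds):
    """Sort by elapsed time; each ratio > 1.0 means slower than the baseline."""
    ranked = sorted(results, key=lambda r: r[2])
    return [(name, net, t / baseline_seconds) for name, net, t in ranked]


if __name__ == "__main__":
    for name, net, ratio in relative_times(RESULTS, RESULTS[0][2]):
        print(f"{name:<45} {net:<13} {ratio:.3f}x")
```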
MSC.Nastran Solaris x64 Performance
  “MSC.Nastran 2005r2 Vendor Benchmarks”
          AMD64 Linux vs. Solaris x64
          (run on Sun's V40z server)
          (elapsed time in seconds, Linux/Solaris;
          a speedup above 1 means Solaris x64 is faster)
•   1. xloop:    28710/23642    =   1.21   speedup
•   2. xlem0:     6319/5159     =   1.22   speedup
•   3. xltd0:     7353/6459     =   1.14   speedup
•   4. xxafst0:   1959/1854     =   1.06   speedup
•   5. xxcmd0:   18946/21581    =   0.88   speedup
•   6. xxcmda_1:  4099/2580     =   1.59   speedup
•   7. lgqdo:     3923/3082     =   1.27   speedup
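The Nastran speedups are plain ratios of Linux to Solaris elapsed time, so they can be recomputed from the listed seconds. To summarize all seven jobs in one number, the sketch takes a geometric mean, which is the usual way to average ratios. The geometric mean is a choice made for this sketch and is not a figure the benchmark report quotes. The names are illustrative.

```python
from math import prod  # Python 3.8+

# (Linux elapsed s, Solaris x64 elapsed s) on the V40z, from the list above.
TIMES = {
    "xloop": (28_710, 23_642),
    "xlem0": (6_319, 5_159),
    "xltd0": (7_353, 6_459),
    "xxafst0": (1_959, 1_854),
    "xxcmd0": (18_946, 21_581),
    "xxcmda_1": (4_099, 2_580),
    "lgqdo": (3_923, 3_082),
}


def speedups(times: dict) -> dict:
    """Linux time / Solaris time; values above 1.0 favour Solaris x64."""
    return {job: linux / solaris for job, (linux, solaris) in times.items()}


def geometric_mean(values) -> float:
    """Geometric mean of positive ratios."""
    vals = list(values)
    return prod(vals) ** (1.0 / len(vals))


if __name__ == "__main__":
    s = speedups(TIMES)
    for job, ratio in s.items():
        print(f"{job:<9} {ratio:.2f}")
    print(f"geometric mean {geometric_mean(s.values()):.2f}")
```

Only xxcmd0 runs slower on Solaris x64. The other six jobs favour it.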
MSC.Nastran on Sun Grid Compute Utility




Why Solaris x64 for MCAE?

• Major MCAE/MCAD applications are being ported to
  Solaris x64 (LS-DYNA, PTC Pro/E, ANSYS,
  MSC.Nastran, STAR-CD); Sun is negotiating with
  FLUENT, ESI, CATIA, Unigraphics, and others.
• Major customers have requested key MCAE codes
  on Solaris x64.
• Sun has just bid on a major General Motors HPC
  request for proposal (RFP) that does not permit Linux;
  GM invited Sun to bid Solaris 10 x64 on Sun's Opteron
  servers.
    Sun's Strengths in HPC/Grid/MCAE
     ●     Outstanding price/performance for MCAE
           applications on clustered, Opteron-based Linux
           servers and workstations; Solaris 10 x64
           versions are in progress.
     ●     Highly competitive, scalable Sun Fire family of
           SMP servers; in the future, SPARC/Solaris “APL”
           UNIX servers jointly developed by Sun and Fujitsu.
     ●     Grid computing is a vital part of Sun's HPC
           strategy, backed by excellent Sun Client Services
           support.