The race with MPRACE On GRAPE, FPGA, Petaflop/s

Application Driven
Reconfigurable Computing for
Astrophysics and other Fields

       Rainer Spurzem, Astronomisches Rechen-Institut (ARI)
       Zentrum für Astronomie, Univ. Heidelberg, Germany
       spurzem@ari.uni-heidelberg.de
       http://www.ari.uni-heidelberg.de/mitarbeiter/spurzem/


 Foundation Document of ARI
         May 10, 1700
Calendar Patent of Duke of Brandenburg

Collaboration:
Sverre Aarseth (IoA Cambridge, UK),
David Merritt (RIT, USA),
Naohito Nakasato, Tsuyoshi Hamada (RIKEN, Japan),
Simon Portegies Zwart, Alessia Gualandris (U Amsterdam)




Dec. 06, COEHT 2007
                     The GRACE Project = GRAPE + MPRACE
    Astrophysical Computer Simulations using Programmable Hardware
    R. Spurzem, R. Männer, A. Burkert with
    G. Lienhart, G. Marcus, A. Kugel
    P. Berczik, I. Berentzen, M. Wetzstein, T. Naab…
    Interdisciplinary: Computer Science and Astrophysics
                       Univ. Heidelberg (ARI-ZAH), Munich (USM)
                       Univ. Mannheim (Computer Engineering)




                                    MWK Baden-Württemberg


                      Astrophysics
Computer Simulation of Dense Star Clusters
Example 1: Galactic Globular Clusters

Gravitational Star-Star Interaction
Complexity N^2 (N: number of stars)

       Ground-based view



                       Globular Star Cluster ω Centauri
                       (central region, Hubble Space Telescope)



               Astrophysics
 Example 2: Motion of Supermassive Black Holes (SMBH)
 in central galactic star clusters (here not shown),
 gravitational wave emission, relativistic dynamics
                    Left: orbits of a triple SMBH in a central star cluster
                    (not shown here), simulation with NBODY6++

                    Right: SMBH coalescence, gravitational wave detection
                    with the space antenna LISA (2015). Source: ESA


              LISA: Binary Black Holes in the Universe
Terrestrial detectors (VIRGO, GEO600, LIGO): galactic compact
    objects (black holes, neutron stars…) at higher frequencies


[Figure: astrophysical sources; terrestrial detectors (GEO600 Hannover, VIRGO, LIGO, TAMA, AIGO); space detectors (LISA)]
              Hardware - GRAPE
~128 Gflops for a price ~5K USD; Memory for up to 128K particles




  GRAPE6a PCI board
GRAPE6a, -BL: PCI ASIC boards for PC clusters
PROGRAPE-4, FPGA-based board from RIKEN (Hamada)
GRAPE7 – new FPGA-based board from Tokyo Univ. (Fukushige)
GRAPE-DR – new board from Makino et al. NAOJ
MPRACE1,2 – FPGA boards from Univ. Mannheim/GRACE (Kugel et al.)
  Basic idea of any GRAPE N-body code:



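[Diagram: ~N operations on the host, ~N^2 on GRAPE]

$$\vec a_i = \sum_{j=1,\; j\neq i}^{N} \vec f_{ij}, \qquad \vec f_{ij} = -\frac{G\, m_j\, \vec r_{ij}}{\left(r_{ij}^2 + \varepsilon^2\right)^{3/2}}$$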
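A minimal Python sketch of the direct sum above, i.e. the ~N^2 part that GRAPE moves into hardware. It assumes G = 1 by default, Cartesian 3-vectors stored as tuples, and $\vec r_{ij} = \vec r_i - \vec r_j$ (the convention used on the SPH slide); the name `direct_accelerations` is illustrative, not a GRAPE library call.

```python
import math

Vec = tuple[float, float, float]


def direct_accelerations(masses: list[float], positions: list[Vec],
                         eps: float = 0.0, G: float = 1.0) -> list[Vec]:
    """Softened direct-summation accelerations a_i = sum_{j != i} f_ij, cost O(N^2)."""
    n = len(masses)
    acc = []
    for i in range(n):
        ax = ay = az = 0.0
        xi, yi, zi = positions[i]
        for j in range(n):
            if j == i:
                continue
            # r_ij = r_i - r_j
            dx = xi - positions[j][0]
            dy = yi - positions[j][1]
            dz = zi - positions[j][2]
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            # f_ij = -G m_j r_ij / (r_ij^2 + eps^2)^(3/2)
            s = -G * masses[j] / (r2 * math.sqrt(r2))
            ax += s * dx
            ay += s * dy
            az += s * dz
        acc.append((ax, ay, az))
    return acc
```

The softening length `eps` keeps the force finite for close pairs; with `eps = 0` the expression reduces to plain Newtonian gravity.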

  GRAPE = GRAvity PipE – more detail…
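[Diagram: the pipeline takes i-particle data $m_i, \vec r_i, \vec v_i, t_i$ and j-particle data $m_j, \vec r_j, \vec v_j, t_j$ and returns $\phi_i, \vec a_i, \dot{\vec a}_i$]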
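The pipeline's inputs and outputs can be sketched in software as below. This is not the GRAPE interface: `grape_pipeline` is a hypothetical name, the j-particles are assumed to be already predicted to the i-particle time (so $t_i$, $t_j$ are not used), and the hardware's reduced-precision arithmetic is replaced by ordinary double precision. The jerk $\dot{\vec a}_i$ is the analytic time derivative of the softened force, as needed by a 4th-order Hermite integrator.

```python
import math

Vec = tuple[float, float, float]


def grape_pipeline(i_pos: list[Vec], i_vel: list[Vec],
                   j_mass: list[float], j_pos: list[Vec], j_vel: list[Vec],
                   eps: float = 0.0, G: float = 1.0):
    """Return (phi_i, a_i, adot_i) for each i-particle, summed over all j-particles.

    Assumes the j-particles were already predicted to the common time t_i.
    """
    results = []
    for xi, vi in zip(i_pos, i_vel):
        phi = 0.0
        acc = [0.0, 0.0, 0.0]
        jerk = [0.0, 0.0, 0.0]
        for m, xj, vj in zip(j_mass, j_pos, j_vel):
            dr = [xj[k] - xi[k] for k in range(3)]  # r_j - r_i = -r_ij
            if dr == [0.0, 0.0, 0.0]:
                continue  # identical position: treat as self-interaction and skip
            dv = [vj[k] - vi[k] for k in range(3)]
            r2 = sum(c * c for c in dr) + eps * eps
            rinv = 1.0 / math.sqrt(r2)
            rinv3 = rinv ** 3
            rv = sum(dr[k] * dv[k] for k in range(3)) / r2
            phi -= G * m * rinv
            for k in range(3):
                acc[k] += G * m * dr[k] * rinv3
                # d/dt of dr/r^3 = dv/r^3 - 3 (dr.dv) dr / r^5
                jerk[k] += G * m * (dv[k] - 3.0 * rv * dr[k]) * rinv3
        results.append((phi, tuple(acc), tuple(jerk)))
    return results
```

The i/j split mirrors the hardware: a small set of i-particles is loaded into the pipelines while the full j-particle memory streams past them.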
ARI-ZAH + RIT 32-node GRAPE6a clusters




           Performance Analysis (3.2 Tflop/s):
           Harfst et al. 2006, New Astron., in press, astro-ph/0608125
                                         Hardware - GRAPE
[Figure: speed (Gflops, 10^-1 to 10^4) vs. particle number N (10^3 to 10^6) for 1, 2, 4, 8, 16, 32 nodes; labels GRAPE6a, GRAPE6, 32xGRAPE6a]
ARI-ZAH GRAPE cluster: ~3.2 Tflop/s sustained
Up to 4 million stars! World record in this class! (Direct N-Body)
Harfst, Gualandris, Merritt, Spurzem, Portegies Zwart, Berczik 2006, New Astron. in press, astro-ph/0608125

                             Software, NBODY6++

$O(N p) + O(N^2/p)\;[\,+\,O(N N_n/p)\,]$

(1) communication, (2) long-range regular force, (3) short-range irregular force
(p: number of processors, $N_n$: number of neighbours)
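To see what the first two terms imply, consider a toy cost model $T(p) = c_1 N p + c_2 N^2/p$ (neighbour term dropped; the constants $c_1, c_2$ are hypothetical, and this is not the Harfst et al. performance model). It is minimized where $dT/dp = c_1 N - c_2 N^2/p^2 = 0$, i.e. at $p^* = \sqrt{c_2 N / c_1}$. A sketch:

```python
import math


def step_cost(n: int, p: int, c_comm: float = 1.0, c_force: float = 1.0) -> float:
    """Toy model T(p) = c_comm*N*p + c_force*N^2/p (hypothetical constants)."""
    return c_comm * n * p + c_force * n * n / p


def optimal_processors(n: int, c_comm: float = 1.0, c_force: float = 1.0) -> float:
    """dT/dp = c_comm*N - c_force*N^2/p^2 = 0 gives p* = sqrt(c_force*N/c_comm)."""
    return math.sqrt(c_force * n / c_comm)
```

With unit constants and N = 10^4 the minimum sits at p* = 100; beyond that, adding processors increases the modelled time because the communication term dominates.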
Original code by S.J. Aarseth, S. Mikkola (ca. 20,000 lines):
• Hierarchical block time steps, 4th-order predictor/corrector scheme
• Ahmad-Cohen neighbour scheme
• Kustaanheimo-Stiefel and chain regularization for close encounters (quaternions!)
• 4th-order Hermite scheme (predictor/corrector)
• Parallelization (Spurzem 1999)
• Implementation on GRAPE cluster (Harfst et al. 2006)
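The 4th-order Hermite predictor/corrector can be sketched for a shared time step as below. This is a minimal illustration, not NBODY6++ code: it omits the hierarchical block time steps, the neighbour scheme and regularization, uses G = 1, and the function names (`acc_jerk`, `hermite_step`, `energy`) are hypothetical. The predictor is a Taylor expansion in $\vec a$ and $\dot{\vec a}$; the corrector combines the values at both ends of the step.

```python
import math


def acc_jerk(m, x, v, eps=0.0, G=1.0):
    """Direct O(N^2) accelerations and jerks (time derivatives of acceleration)."""
    n = len(m)
    a = [[0.0] * 3 for _ in range(n)]
    jk = [[0.0] * 3 for _ in range(n)]
    for i in range(n):
        for k in range(n):
            if k == i:
                continue
            dr = [x[k][d] - x[i][d] for d in range(3)]
            dv = [v[k][d] - v[i][d] for d in range(3)]
            r2 = sum(c * c for c in dr) + eps * eps
            rinv3 = 1.0 / (r2 * math.sqrt(r2))
            rv = sum(dr[d] * dv[d] for d in range(3)) / r2
            for d in range(3):
                a[i][d] += G * m[k] * dr[d] * rinv3
                jk[i][d] += G * m[k] * (dv[d] - 3.0 * rv * dr[d]) * rinv3
    return a, jk


def hermite_step(m, x, v, dt, eps=0.0, G=1.0):
    """Advance all particles by one shared step dt with the 4th-order Hermite scheme.

    For clarity, a and jerk are re-evaluated at the start of every step.
    """
    n = len(m)
    a0, j0 = acc_jerk(m, x, v, eps, G)
    # Predictor: Taylor series in acceleration and jerk
    xp = [[x[i][d] + v[i][d] * dt + a0[i][d] * dt**2 / 2 + j0[i][d] * dt**3 / 6
           for d in range(3)] for i in range(n)]
    vp = [[v[i][d] + a0[i][d] * dt + j0[i][d] * dt**2 / 2
           for d in range(3)] for i in range(n)]
    # Evaluate acceleration and jerk at the predicted state
    a1, j1 = acc_jerk(m, xp, vp, eps, G)
    # Corrector: uses acceleration and jerk at both ends of the step
    v1 = [[v[i][d] + (a0[i][d] + a1[i][d]) * dt / 2 + (j0[i][d] - j1[i][d]) * dt**2 / 12
           for d in range(3)] for i in range(n)]
    x1 = [[x[i][d] + (v[i][d] + v1[i][d]) * dt / 2 + (a0[i][d] - a1[i][d]) * dt**2 / 12
           for d in range(3)] for i in range(n)]
    return x1, v1


def energy(m, x, v, eps=0.0, G=1.0):
    """Total kinetic plus (softened) potential energy."""
    kin = sum(0.5 * m[i] * sum(c * c for c in v[i]) for i in range(len(m)))
    pot = 0.0
    for i in range(len(m)):
        for k in range(i + 1, len(m)):
            r2 = sum((x[i][d] - x[k][d]) ** 2 for d in range(3)) + eps * eps
            pot -= G * m[i] * m[k] / math.sqrt(r2)
    return kin + pot
```

For instance, an equal-mass circular binary (masses 0.5, separation 1, speeds 0.5, total energy -0.125) integrated for 1000 steps of dt = 0.001 should keep its energy almost unchanged; monitoring the relative energy error this way is a common accuracy check.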
               Hardware




          Pipeline Generation on FPGA I
          (see talk by Gerhard Lienhart)




          Pipeline Generation on FPGA II
          (see talk by Gerhard Lienhart)




Hardware FPGA


                                            [Figure: MPRACE, GRAPE]




   • GRAPE moves the bottleneck to the short-range (neighbour) forces
   • Use an FPGA platform to accelerate the neighbour algorithm
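A sketch of the Ahmad-Cohen idea behind this split: the force on a particle is divided into an irregular part from neighbours inside a radius, which changes quickly and must be recomputed often (the FPGA's task in GRACE), and a regular part from all other particles, which changes slowly (GRAPE's task). The fixed radius `r_nb` and the name `split_force` are hypothetical; NBODY6++ adapts neighbour radii per particle.

```python
import math

Vec = tuple[float, float, float]


def split_force(i: int, masses: list[float], positions: list[Vec],
                r_nb: float, eps: float = 0.0, G: float = 1.0):
    """Split the acceleration of particle i into irregular and regular parts.

    Neighbours are particles closer than the (hypothetical) fixed radius r_nb.
    Returns (neighbour_indices, a_irr, a_reg).
    """
    xi = positions[i]
    nb = []
    a_irr = [0.0, 0.0, 0.0]
    a_reg = [0.0, 0.0, 0.0]
    for j, (m, xj) in enumerate(zip(masses, positions)):
        if j == i:
            continue
        dr = [xj[k] - xi[k] for k in range(3)]
        d2 = dr[0] ** 2 + dr[1] ** 2 + dr[2] ** 2
        r2 = d2 + eps * eps
        s = G * m / (r2 * math.sqrt(r2))
        if d2 < r_nb * r_nb:
            nb.append(j)
            target = a_irr  # short range: updated often
        else:
            target = a_reg  # long range: updated rarely
        for k in range(3):
            target[k] += s * dr[k]
    return nb, tuple(a_irr), tuple(a_reg)
```

The total force is always `a_irr + a_reg`; the gain comes from updating `a_reg` on a longer time step than `a_irr`.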

                     Hardware - GRACE
              The GRACE architecture (GRAPE+MPRACE)
Univ. Heidelberg (ARI), Univ. Mannheim (LIV), Univ. Munich (USM), RIKEN Institute Tokyo




Infiniband Dual PCIe 20 Gb/s
                32 Hosts




                                          4 Tflops, 128 CPUs, 128 GB Memory
                                           (64 P4 Xeon, 32 GRAPE, 32 Xilinx FPGA-MPRACE)
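As a rough consistency check on the 4 Tflops figure, assuming it counts only the peak of the 32 GRAPE boards (taking the ~128 Gflops per board from the GRAPE hardware slide as the GRAPE6a value) and ignores the Xeon hosts and FPGAs:

$$32 \times 128\ \mathrm{Gflop/s} = 4096\ \mathrm{Gflop/s} \approx 4.1\ \mathrm{Tflop/s}$$

Against the ~3.2 Tflop/s sustained reported for the 32-node GRAPE6a cluster, this would correspond to roughly $3.2/4.1 \approx 78\%$ of peak.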


                       Preliminary
                       Ongoing Work
[Figure: Xeon 3.6 GHz vs. FPGA (1 pipeline) vs. GRAPE (12 pipelines)]




Prototype testing

Production: summer 2007




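                                                Other Applications
Smoothed Particle Hydrodynamics (SPH)

Hydrodynamic equation of motion, gravity:

$$\frac{d\vec v_i}{dt} = -\frac{1}{\rho_i}\nabla P_i + \vec a_i^{\,\mathrm{visc}}$$

SPH formulation:

$$\frac{d\vec v_i}{dt} = -\sum_j m_j \left(\frac{p_i}{\rho_i^2} + \frac{p_j}{\rho_j^2} + \Pi_{ij}\right) \nabla_i W\!\left(r_{ij}, h_{ij}\right)$$

$$\rho_i = \sum_{j=1}^{N} m_j\, W\!\left(r_{ij}, h_{ij}\right), \qquad p_i = P(\rho_i)$$

$$\Pi_{ij} = \begin{cases} \dfrac{-\alpha\, c_{ij}\,\mu_{ij} + \beta\,\mu_{ij}^2}{\rho_{ij}} & \text{for } \vec v_{ij}\cdot\vec r_{ij} \le 0 \\[4pt] 0 & \text{for } \vec v_{ij}\cdot\vec r_{ij} > 0 \end{cases}$$

$$\mu_{ij} = \frac{h_{ij}\,\vec v_{ij}\cdot\vec r_{ij}}{r_{ij}^2 + \eta^2 h_{ij}^2}\, f_{ij}$$

$$\rho_{ij} = \frac{\rho_i + \rho_j}{2}, \quad c_{ij} = \frac{c_i + c_j}{2}, \quad h_{ij} = \frac{h_i + h_j}{2}, \quad f_{ij} = \frac{f_i + f_j}{2}$$

$$\vec r_{ij} = \vec r_i - \vec r_j, \qquad \vec v_{ij} = \vec v_i - \vec v_j$$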
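The viscosity term $\Pi_{ij}$ can be transcribed directly into Python as a sketch. The defaults α = 1, β = 2, η = 0.1 are commonly used values assumed here, not taken from the slide; $f_i$ is not defined on the slide and defaults to 1, which makes $f_{ij} = 1$. The function name `artificial_viscosity` is hypothetical.

```python
Vec = tuple[float, float, float]


def artificial_viscosity(r_i: Vec, r_j: Vec, v_i: Vec, v_j: Vec,
                         rho_i: float, rho_j: float, c_i: float, c_j: float,
                         h_i: float, h_j: float, f_i: float = 1.0, f_j: float = 1.0,
                         alpha: float = 1.0, beta: float = 2.0, eta: float = 0.1) -> float:
    """Pairwise viscosity Pi_ij as in the SPH formulation above."""
    r_ij = [a - b for a, b in zip(r_i, r_j)]
    v_ij = [a - b for a, b in zip(v_i, v_j)]
    vr = sum(a * b for a, b in zip(v_ij, r_ij))
    if vr > 0.0:
        return 0.0  # receding pair: no viscosity
    # Symmetrised pair quantities
    rho = 0.5 * (rho_i + rho_j)
    c = 0.5 * (c_i + c_j)
    h = 0.5 * (h_i + h_j)
    f = 0.5 * (f_i + f_j)
    r2 = sum(a * a for a in r_ij)
    mu = h * vr / (r2 + eta * eta * h * h) * f  # mu_ij <= 0 for approaching pairs
    return (-alpha * c * mu + beta * mu * mu) / rho
```

Because $\mu_{ij} \le 0$ for approaching pairs, both terms are non-negative, so the viscosity only ever acts as a repulsive pressure between converging particles.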
              Other Applications
    Molecular Dynamics
   Protein Interactions, with Nanotubes, Ligands, Water
   Cellular Signaling
Long Range Force: Fast TREE or direct GRAPE
Intermediate Range: FPGA
Prospective Partners:
 * G. Sutmann, A. Schiller, NIC, FZ Jülich (using the PROGRAPE FPGA board, RIKEN Inst. Japan)
 * EML Research Institute Heidelberg, S. Richter, R. Wade
How to build a super-GRACE…
 … a 50 Tflop/s machine for < 5% of the cost of a general-purpose one?
• 200 standard nodes, AMD Opteron or Pentium Xeon
• 200 super-GRAPEs (250 Gflop/s each): MPRACE-2, GRAPE-DR, PROGRAPE
• Super-network (e.g. AMD HyperTransport, Xtoll-Connection custom network;
  AMD excellence centre with Univ. of Mannheim, U. Brüning)
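The 50 Tflop/s target follows directly from the accelerator count, counting only the peak of the 200 super-GRAPEs and ignoring the host CPUs:

$$200 \times 250\ \mathrm{Gflop/s} = 50\,000\ \mathrm{Gflop/s} = 50\ \mathrm{Tflop/s}$$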




Such a computer competes with general-purpose
supercomputers on the Petaflop/s scale.

Used: performance model of Harfst et al. 2006

                     Other Applications
 Astrophysical Excellence Cluster Univ. of
 Heidelberg – admitted for 2nd round –
 projected cooperation with informatics:




Co-ordination: Prof. J. Wambsganss (Director ZAH) …
Co-Is: Prof. R. Klessen, Prof. R. Spurzem …

Information Science: Prof. Brüning, Prof. Männer

Further partners?

				