Learning Center
Plans & pricing Sign in
Sign Out

Secrets of Supercomputing


Approved for public release;
distribution is unlimited

     Secrets of Supercomputing
                  The Conservation Laws
                      Supercomputing Challenge Kickoff
                            October 21-23, 2007

  I. Background to Supercomputing
  II. Get Wet! With the Shallow Water Equations
                   Bob Robey - Los Alamos National Laboratory
                  Randy Roberts – Los Alamos National Laboratory
                            Cleve Moler -- Mathworks

• Bob Robey -- Los Alamos National Lab, X division
   –, 665-9052 or home:,
   – 3D Hydrocodes and parallel numerical software
   – Helped found UNM and Maui High Performance Computing
     Centers and Supercomputing Tutorials
• Randy Roberts -- Los Alamos National Lab, D Division
   – Java, C++, Numerical and Agent Based Modeling
• Cleve Moler
   – Matlab Founder
   – Former UNM CS Dept Chair
   – SIAM President
   – Author of “Numerical Computing with Matlab” and “Experiments with

                  Conservation Laws
• Formulated as a conserved quantity
    – mass
    – momentum                            Conserved
    – energy
• Good reference is Leveque’s book and his
  freely available software package
  CLAWPACK (Fortran/MPI) and a 2D
  shallow water version Tsunamiclaw
Leveque, Randall, Numerical Methods for Conservation Laws
Leveque, Randall, Finite Volume Methods for Hyperbolic Problems
     I. Intro to Supercomputing
• Classical Definition of Supercomputing
  – Harnessing lots of processors to do lots of small
• There are many other definitions which usually
  include any computing beyond the norm
  – Includes new techniques in modeling, visualization,
    and higher level languages.
• Question for thought: With greater CPU
  resources is it better to save programmer work
  or to make the computer do bigger problems?

II. Calculus Quickstart
   Decoding the Language of

    Calculus Quickstart Goals
• Calculus is a language of mathematical wizards.
  It is convenient shorthand, but not easy to
  understand until you learn the secrets to the
• Our goal is for you to be able to
        READ calculus and         TALK calculus.

• Goal is not to ANALYTICALLY SOLVE calculus
  using traditional methods. In supercomputing we
  generally solve problems by brute force.

           Calculus Terminology
• Two branches of Calculus
   – Integral Calculus
   – Derivative Calculus
• P = f(x, y, t)
   – Population is a function of x, y, and t
• ∫f(x)dx – definite integral, area under the curve, or
• dP/dx – derivative, instantaneous rate of change, or
  slope of a function
• ∂P/∂x – partial derivative implying that P is a function of
  more than one variable

                    Matrix Notation
  a  c                                      This is just a system of equations

  b   d   0

   t   x
The first set of terms are state variables at time t and                           F
usually called U. The second set of terms are the flux            U
variables in space x and usually referred to as F.

  Patterns for Parallel Programming
  Parallel Algorithms                  Implementation Patterns
• Data Parallel -– most                •   Message Passing
  common with MPI                      •   Threads
• Master/Worker – one                  •   Shared Memory
  process hands out the                •   Distributed Arrays, Global
  work to the other                        Arrays
  processes – great load
  balance, good with
• Pipeline – bucket brigade

Patterns for Parallel Programming, Mattson, Sanders, and Massingill, 2005

                      Writing a Program
                         Data Parallel Model
Serial operations are done                                        Sections of distributed data are
on every processor so that                                        “owned” by each processor.
replicated data is the same                                       This is where the parallel
                                         P(400) – distributed     speedups occur.
on every processor.
                                          Ptot -- replicated

This may seem like a waste                                        Often ghost cells around each
of work, but it is easier than                                    processor’s data is a way to
synchronizing data values.                                        handle communication.

       Proc 1                    Proc 2                        Proc 3               Proc 4

      P(1-100)              P(101-200)                    P(201–300)              P(301-400)

        Ptot                      Ptot                          Ptot                 Ptot

         2007-2008 Sample
       Supercomputing Project
• Evaluation Criteria – Expo (Report slightly
  different). Use these to evaluate the
  following project.
  – 15% Problem Statement
  – 25% Mathematical/Algorithmic Model
  – 25% Computational Model
  – 15% Results and Conclusions
  – 10% Code
  – 10% Display           Evaluate Us!!

             Get Wet!
 With the Shallow Water Equations
• The shallow water model for wave motion is
  important for water flow, seashore waves, and
• Goal of this project is to model the wave motion
  in the shallow water tank
• With slight modifications this model can be
  applied to:
  – ocean or lake currents
  – weather
  – glacial movement

Output from a shallow water equation
    model of water in a bathtub.

    The water experiences 5 splashes which generate surface
    gravity waves that propagate away from the splash locations
    and reflect off of the bathtub walls. Wikipedia commons,
    Author Dan Copsey
    Go to shallow water movie.

           Mathematical Equations
            Mathematical Model
                         Shallow Water Equations
                                                    Notes: mass equals
  Conservation of Mass                              height because width,

       ht  (hu ) x  0                             depth and density are
                                                    all constant

  Conservation of Momentum                           Note: Force term,
                                                     Pressure P=½gh2

      (hu)t  (hu  gh ) x  0
                         2    1
                                     2                   h -> height
                                                         u -> velocity
                                                         g -> gravity

References: Leveque, Randall, Finite Volume Methods for Hyperbolic
Problems, p. 254
Shallow Water Equations
     Matrix Notation

 h       uh    
 hu    2 1 2   0.
   t hu  2 gh  x
                   The maximum time
                   step is calculated so
wave speed  gh0   as to keep a wave
                   from completely
                   crossing a cell.

             Numerical Model
• Lax-Wendroff two-step, a predictor-corrector
  – Predictor step estimates the values at the zone
    boundaries at half a time step advanced in time
  – Corrector step fluxes the variables using the predictor
    step values
• Mathematical Notes for next slide:
  – U is a state variable such as mass or height.
  – F is a flux term – the velocity times the state variable
    at the interface
  – superscripts are time
  – subscripts are space

       The Lax-Wendroff Method
  Half Step

       n 2
  Ui 1           0.5(U  U ) 
                          i 1       ( Fi 1  Fi )
                                       n  n      n


 Whole Step

                           t n
             n 1
                     U i  ( Fi   Fi  )
                                        n         1              1
     Ui                                            2              2

                                               1              1
                                               2              2

Explanation graphic courtesy of Jon Robey and Dov Shlacter, 2006-2007
Supercomputing Challenge
     Explanation of Lax-Wendroff Model

                                                              Data assumed to be
                                                              at the center of cell.
Physical model                                                             Ghost cell

                                                                     Space index
Half-step                                                         t+.5

Full step

     Explanation graphic courtesy of Jon Robey and Dov Shlacter, 2006-2007
     Supercomputing Challenge. See appendix for 2D index explanation.                  18
           Extension to 2D
• The extension of the shallow water
  equations to 2D is shown in the following
  – First slide shows the matrix form of the 2D
    shallow water equations
  – Second slide shows the 2D form of the Lax-
    Wendroff numerical method

2D Shallow Water Equations

     h          uh            vh      
     hu   hu 2  1 gh2    huv       
                  2                     0.
     hv  t  huv
                         hv 2  1 gh2 
                           x       2     y
       U               F                   G

Note the addition of fluxes in the y direction and a flux cross term
in the momentum equation. The U, F, and G are shorthand for
the numerical equations on the next slide. The U terms are the
state variables. F and G are the flux terms in x and y.

     The Lax-Wendroff Method
 Half Step
                      n 2
                  U i 1 , j     0.5(U  U ) n
                                              i 1, j
                                                                   ( Fi 1, j  Fi ,nj )

                                                          i, j

                      n 2
                  U i, j 1      0.5(U i , j 1  U i , j ) 
                                         n            n
                                                                   (Gin, j 1  Gin, j )
                            2                                  2y
Whole Step

           n 1                      t n  1             n 2
                                                                      t n  1            n 2

      U            U        n
                                       ( Fi  1 ,2j  Fi  1 , j )     (Gi , j 2 1  Gi , j  1 )
                                     x                               y
          i, j              i, j
                                               2            2                       2            2

   2D Shallow Water Equations
  Transformed for Programming
      Letting H = h, U = hu and V = hv so that our
      main variables are the state variables in the first
      column gives the following set of equations.

      H           U                 V         
      U   U 2 / H  1 gH 2    UV / H        
                     2                          0.
      V  t  UV / H
                             V 2 / H  1 gH 2 
                               x          2      y

H is height (same as mass for constant width, depth and density)
U is x momentum (x velocity times mass)
V is y momentum (y velocity times mass)

         Sample Programs
• The numerical method was extracted from
  the McCurdy team’s model (team 62) from
  last year and reprogrammed from serial
  Fortran to C/MPI using the programming
  style from one of the Los Alamos team’s
  project (team 51) with permission from
  both teams.
• Additional versions of the program were
  made in Java/Threads and Matlab

                  Programming Tools
                     Three options
1.       Matlab
     –     Computation and graphics integrated into Matlab desktop
2.       Java/Threads
     –     Eclipse or Netbeans workbench
     –     Graphics via Java 2D and Java Free Chart
3.       C/MPI
     –     Eclipse workbench -- An open-source Programmers
     –     PTP (parallel tools plug-in) – adds MPI support to Eclipse
           (developed partly at LANL)
     –     OpenMPI – a MPI implementation (developed partly at LANL)
     –     MPE -- graphics calls that come with MPICH. Graphics calls
           are done in parallel from each processor!

   Initial Conditions and Boundary
• Initial conditions
   – velocity (u and v) are 0 throughout the mesh
   – height is 2 with a ramp to the height of 10 at the right
     hand boundary starting at the mid-point in the x
• Boundary conditions are reflective, slip
   – hbound=hinterior; uxbound=0; vxbound=vinterior
   – hbound=hinterior; uybound=uinterior; vybound=0
   – If using ghost cells, force zero velocity at the
     boundary by setting Uxghost= -Uinterior

• The Lax-Wendroff model accurately
  models the experimental wave tank
  – matches wave speed across the tank
• Some of the oscillations in the simulation
  are an artifact of the numerical model
  – OK as long as initial wave is not too steep
  – numerical damping technique could be added
    but is beyond the scope of this effort

Work used by permission:
• Awash: Modeling Wave Movement in a Ripple Tank,
  Team 62, McCurdy High School, 2006-2007
  Supercomputing Challenges
• A Lot of Hot Air: Modeling Compressible Fluid Dynamics,
  Team 51, Los Alamos High School, 2006-2007
  Supercomputing Challenge
We all have bugs and thanks to those who found mine
• Randy Roberts and Jon Robey for finding and fixing a
  bug in the second pass
• Randy Leveque for finding a missing square in the
  gravity forcing term

              Lab Exercises
• TsunamiClaw

• Matlab
• Experimental demonstration

• Java Serial
• Java Parallel


         Java Wave Structure
• Wave class does most of the work
  – main(String[] args) calls start()
  – start() creates a WaveProblemSetup
  – start() calls methods to do initialization and
    boundary conditions
  – start() calls methods to iterate and update the
Java Wave Structure (continued)
• WaveProblemSetup stores the new and old
• swaps the new and old arrays when asked
  to by Wave
     Java Wave Program Flow
• Create arrays for new, old, and temporary
• Initialize data
• Set boundary data to represent correct
  boundary conditions
• Iterate for the given number of iterations
     Java Wave Iteration Flow
• Update physics into new arrays from data
  in old arrays
• Set boundary data to represent correct
  boundary conditions with updated arrays
• Update display
• Swap new arrays with old arrays
             Java Threads
• How do you take advantage of new Multi-
  Core processors?
• Run parts of the problem on different cores
  at the same time!
     Java Threads (continued)
• WaveThreaded program
  – partitions the problem into domains using
    SubWaveProblemSetup objects
  – runs calculations on each domain in separate
    threads using WaveWorker objects
  – adds complexity with synchronization of
    thread's access to data
C/MPI Program Diagram
        Allocate memory
       Set Initial Conditions
          Initial Display

      Update Boundary Cells
       MPI Communication
       External Boundaries

            First Pass
             x half step
             y half step

          Second Pass

         Swap new/old
        Graphics Output
       Conservation Check       Repeat

       Calculate Runtime
    Close Display, MPI & exit
                       MPI Quick Start
•   #include <mpi.h>
•   MPI_Init(&argc, &argv)

•   MPI_Comm_size(Comm, &nprocs) // get number of processors
•   MPI_Comm_rank(Comm, &myrank) // get processor rank 0 to nproc-1

•   // Broadcast from source processor to all processors
•   MPI_Bcast(buffer, count, MPI_type, source, Comm)

•   // Used to update ghost cells
•   MPI_ISend(buffer, count, MPI_type, dest, tag, Comm, req)
•   MPI_IRecv(buffer, count, MPI_type, source, tag, Comm, req+1)
•   MPI_Waitall(num, req, status)

•   // Used for sum, max, and min such as total mass or minimum timestep
•   MPI_Allreduce(&num_local, &num_global, count, MPI_type, MPI_op, Comm)

•   MPI_Finalize()

•   Web pages for MPI and MPE at Argonne National Lab (ANL) -- http://www-
• The software is already setup on the
• For setup on home computers, there are
  two parts. First download the files from the
  Supercomputing Challenge website for the
  lab in C/MPI if you haven’t already done
• Untar the lab files with
  “tar –xzvf Wave_Lab.tgz”

         Setting up Software
   Instructions in the README file
Setting up System Software    Setting up User’s
• Need Java, OpenMPI and        workspace
  MPE package from            • Download eclipse
  MPICH                         software including
• Download and install          eclipse, PTP and PLDT
  according to instructions   • Install according to
  in           instructions in
• Can install in user’s
  directory with some         • Import wave source files
  modifications                 and setup eclipse
                                according to instructions

                   Lab Exercises
• Try modifying the sample program (Java and/or C
   – Change initial distribution. How sharp can it be before it goes
   – Change number of cells
   – Change graphics output
   – Try running 1, 2, or 4 processes and time the runs. Note that you
     can run 4 processes even if you are on a one processor system.
   – Switch to PTP debug or Java debug perspective and try stepping
     through the program
• Comparing to data is critical
   – Are there other unrealistic behaviors of the model?
   – Design an experiment to isolate variable effects. This can greatly
     improve your model.

            Appendix A.
    Calculus and Supercomputing
• Calculus and Supercomputing are intertwined.
• Here is a simple problem – Add up the volume of
  earth above sea-level for an island 500 ft high by
  half a mile wide and twenty miles long.

• Typical science homework problem using simple
  algebra. Can be done by hand. Not appropriate
  for supercomputing. Not enough complexity.

             Add Complexity
• The island profile is a jagged mountainous terrain
  cut by deep canyons. How do we add up the

• Calculus – language of complexity
  – Addition – summing numbers
  – Multiplication – summing numbers with a constant
  – Integration – summing numbers with an irregular

          Divide and Conquer
• In discrete form
                               ∑ -- Summation symbol
                               ∆ -- delta symbol or x2-x1

• Divide the island into small pieces and sum up
  the volume of each piece.
• Approaches the solution as the size of the
  intervals grows smaller for a jagged profile.

           Divide and Conquer
• In Continuous Form – Integration

• Think of the integral symbols as describing a shape
  that is continuously varying
• The accuracy of the solution can be improved by
  summing over smaller increments
• Lots of arithmetic operations – now you have a
  “computing” problem. Add more work and you have a
  “supercomputing” problem.

             Derivative Calculus
                Describing Change
• Derivatives describe the change in a variable
  (numerator or top variable) relative to another variable
  (denominator or bottom). These three derivatives
  describe the change in population versus time, x-
  direction and y-direction.

                P P     P
                  ,   and
                t x     y

    Appendix B.
Computational Methods
   1. Eulerian and Lagrangian
      2. Explicit and Implicit

Two Main Approaches to Divide up
• Eulerian – divide up by spatial coordinates
  – Track populations in a location
  – Observer frame of reference
• Lagrangian – divide up by objects
  – Object frame of reference
  – Easier to track attributes of population since they
    travel with the objects
  – Agent based modeling of Star Logo uses this
  – Can tangle mesh in 2 and 3 dimensions

Eulerian – The area stays fixed and has a                  Population moves out of cell
  Population per area. We observe the
  change in population across the
  boundaries of the area.                                     Eulerian

Lagrangian – The population stays
  constant. The population moves with        Population moves and so does region

  velocity vx and vy and we move with
  them. The size of the area will change if
  the four vertexes of the rectangle move at Lagrangian
  different velocities. Changes in area will
  result in different densities.

        Explicit versus Implicit
• Explicit – In mathematical shorthand, Un+1=
  f(Un). This means that the next timestep values
  can be expressed entirely on the previous
  timestep values.
• Implicit – Un+1=f(Un+1,Un). Next timestep values
  must be solved iteratively. Often uses a matrix or
  iterative solver.
• We will stick with explicit methods here. You
  need more math to attempt implicit methods.

    Appendix C
Index Explanation for 2D Lax

• Most difficult part of programming this method is
  to keep track of indices – half step grid indices
  cannot be represented by ½ in the code so they
  have to be offset one way or the other.
• Errors are very difficult to find so it is important
  to be very methodical in the coding.
• Next two slides show the different sizes of the
  staggered half-step grid and the relationships
  between the indices in the calculation (courtesy
  Jon Robey).

              0        1       2       3       4
                                                   1st Pass
                       y       y       y
    1              x       x       x       x             0,0     -- 1,0 | 1,1
                       y       y       y
                                                       j,i     -- j+1,i | j+1,i+1
j   2              x       x       x       x
                       y       y       y
    3              x       x       x       x
                       y       y       y                       X step grid          Main grid

        0,0       -- 0,1 | 1,1
    j,i       -- j,i+1 | j+1,i+1

              Y step grid          Main grid                                                    51
              0        1       2       3       4
                                                   2nd Pass
                       y       y       y
    1              x       x       x       x             1,1     -- 0,0 | 0,1
                       y       y       y
                                                       j,i     -- j-1,i-1 | j-1,i
j   2              x       x       x       x
                       y       y       y
    3              x       x       x       x
                       y       y       y                       Main grid            X step grid

        1,1       -- 0,0 | 1,0
    j,i       -- j-1,i-1 | j,i-1

              Main grid            Y step grid                                                    52

To top