The U.S. DOE Advanced CompuTational Software (ACTS) Collection
Tony Drummond Lawrence Berkeley National Laboratory LADrummond@lbl.gov
SIMULA Research Laboratory - May 2008
OUTLINE
• Motivation
• Introduction to the DOE ACTS Collection
• Interfaces to the ACTS Collection
• Software Sustainability Requirements
• References
Development of High End Computer Simulations
Where are the applications?
• Accelerator Science
• Astrophysics
• Biology
• Chemistry
• Earth Sciences
• Materials Science
• Nanoscience
• Plasma Science
• ...
Commonalities:
• Major advancements in Science
• Increasing demands for computational power
• Rely on available computational systems, languages, and software tools
http://acts.nersc.gov/MatApps
Software Development and Evolution
min[time_to_first_solution] (prototype)
min[time_to_solution] (production)
• Outlive Complexity
  • Increasingly sophisticated models
  • Model coupling
  • Interdisciplinary
(Software Evolution)
• Sustained Performance
  • Increasingly complex algorithms
  • Increasingly diverse architectures
  • Increasingly demanding applications
min[software_development_cost], max[software_life], and max[resource_utilization] (Long-term deliverables)
THE U.S. DOE ACTS COLLECTION
Goal: The Advanced CompuTational Software Collection (ACTS) makes reliable and efficient software tools more widely used, and more effective in solving the nation’s engineering and scientific problems.
References: • L.A. Drummond, O. Marques: An Overview of the Advanced CompuTational Software (ACTS) Collection. ACM Transactions on Mathematical Software Vol. 31 pp. 282-301, 2005 • http://acts.nersc.gov
The Advanced CompuTational Software Collection (ACTS)
Components:
• Solid Base: non-commercial and open source tools developed at DOE laboratories and universities.
• Independent Tool Evaluations and Consultation: provided through acts-support@nersc.gov
• High Level User Support: problem identification, tool and interface selection, specific tuning parameter configurations, installation, documentation, etc.
• Training and Dissemination: workshops, lectures, active conference participation (acts.nersc.gov).
• Collaborations: with HPC centers, computational sciences research centers (national and international level), and software and computer vendors.
Problems addressed by the numerical tools: $Ax = b$, $Az = \lambda z$, $A = U \Sigma V^T$, PDEs, ODEs

Category | Tool | Functionalities
--- | --- | ---
Numerical | Trilinos | Algorithms for the iterative solution of large sparse linear systems.
Numerical | Hypre | Algorithms for the iterative solution of large sparse linear systems, intuitive grid-centric interfaces, and dynamic configuration of parameters.
Numerical | PETSc | Tools for the solution of PDEs that require solving large-scale, sparse linear and nonlinear systems of equations.
Numerical | OPT++ | Object-oriented nonlinear optimization package.
Numerical | SUNDIALS | Solvers for the solution of systems of ordinary differential equations, nonlinear algebraic equations, and differential-algebraic equations.
Numerical | ScaLAPACK | Library of high performance parallel dense linear algebra.
Numerical | SLEPc | Software library for the solution of large sparse eigenproblems on parallel computers.
Numerical | SuperLU | General-purpose library for the direct solution of large, sparse, nonsymmetric systems of linear equations.
Numerical | TAO | Large-scale optimization software.
Code Development | Global Arrays | Library for writing parallel programs that use large arrays distributed across processing nodes and that offers a shared-memory view of distributed arrays.
Code Development | Overture | Object-oriented tools for solving computational fluid dynamics and combustion problems in complex geometries.
Run Time Support | TAU | Set of tools for analyzing the performance of C, C++, Fortran and Java programs.
Library | ATLAS | Tools for the automatic generation of optimized numerical software for ...
Software Sustainability
Changes in algorithms sometimes lead to several years' advancement in computations. Needs flexibility! Performance is influenced by system parameters and by steps in the algorithm. Critical points: portability and scalability.
• Algorithmic Implementations
• Application Data Layout
• Control
• I/O
• Tuned and machine-dependent modules
A new architecture requires extensive tuning and may even require new programming paradigms. This is difficult to maintain and not very portable.
Software Sustainability
USER's APPLICATION CODE (Main Control)
Compilers + Expert Drivers + Support
• Application Data Layout: AVAILABLE LIBRARIES & PACKAGES
• Algorithmic Implementations: AVAILABLE LIBRARIES & PACKAGES
• I/O: AVAILABLE I/O LIBRARIES
• Tuned and machine-dependent modules
Critical Path for HPC Software Stack
• Scientific or engineering context
• Domain expertise
• Simulation codes
• Data Analysis codes
General Purpose Libraries
• Algorithms
• Data Structures
• Code Optimization
• Programming Languages
• O/S - Compilers
Hardware - Middleware - Firmware
Critical Path for HPC Software Stack
Funded by DOE/ASCR (http://acts.nersc.gov):
• Library Development
• Numerical Tools
• Code Development
• Run Time Support
General Purpose Libraries
• Algorithms
• Data Structures
• Code Optimization
• Programming Languages
• O/S - Compilers
Hardware - Middleware - Firmware
ACTS Numerical Tools: Functionality
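Computational Problem | Methodology | Algorithms | Library
--- | --- | --- | ---
Systems of Linear Equations | Direct Methods | LU Factorization | ScaLAPACK (dense), SuperLU (sparse)
Systems of Linear Equations | Direct Methods | Cholesky Factorization | ScaLAPACK
Systems of Linear Equations | Direct Methods | LDLT (Tridiagonal matrices) | ScaLAPACK
Systems of Linear Equations | Direct Methods | QR Factorization | ScaLAPACK
Systems of Linear Equations | Direct Methods | QR with column pivoting | ScaLAPACK
Systems of Linear Equations | Direct Methods | LQ factorization | ScaLAPACK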
ACTS Numerical Tools: Functionality
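Computational Problem | Methodology | Algorithms | Library
--- | --- | --- | ---
Systems of Linear Equations (cont.) | Iterative Methods | Conjugate Gradient | AztecOO (Trilinos), PETSc
Systems of Linear Equations (cont.) | Iterative Methods | GMRES | AztecOO, PETSc, Hypre
Systems of Linear Equations (cont.) | Iterative Methods | CG Squared | AztecOO, PETSc
Systems of Linear Equations (cont.) | Iterative Methods | Bi-CG Stab | AztecOO, PETSc
Systems of Linear Equations (cont.) | Iterative Methods | Quasi-Minimal Residual (QMR) | AztecOO
Systems of Linear Equations (cont.) | Iterative Methods | Transpose Free QMR | AztecOO, PETSc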
Structure of PETSc
Layers, from the application down to the kernels:
• PETSc PDE Application Codes
• ODE Integrators; Visualization
• Nonlinear Solvers, Unconstrained Minimization; Interface
• Linear Solvers: Preconditioners + Krylov Methods
• Object-Oriented Matrices, Vectors, Indices; Grid Management
• Profiling Interface
• Computation and Communication Kernels: MPI, MPI-IO, BLAS, LAPACK
Hypre Conceptual Interfaces
Linear System Interfaces
• Linear Solvers: GMG, ...; FAC, ...; Hybrid, ...; AMGe, ...; ILU, ...
• Data Layout: structured, composite, block-struc, unstruc, CSR
Hypre Conceptual Interfaces to Solvers
List of Solvers and Preconditioners per Conceptual Interface
• System Interfaces: Struct, SStruct, FEI, IJ
• Solvers: Jacobi, SMG, PFMG, BoomerAMG, ParaSails, PILUT, Euclid, PCG, GMRES
[Table: an X marks each solver that is available through a given system interface.]
ACTS Numerical Tools: Functionality
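Computational Problem | Methodology | Algorithms | Library
--- | --- | --- | ---
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | SYMMLQ | PETSc
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Preconditioned CG | AztecOO, PETSc, Hypre
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Richardson | PETSc
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Block Jacobi Preconditioner | AztecOO, PETSc, Hypre
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Point Jacobi Preconditioner | AztecOO
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Least Squares Polynomials | PETSc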
ACTS Numerical Tools: Functionality
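Computational Problem | Methodology | Algorithms | Library
--- | --- | --- | ---
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | SOR Preconditioning | PETSc
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Overlapping Additive Schwarz | PETSc
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Approximate Inverse | Hypre
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Sparse LU preconditioner | AztecOO, PETSc, Hypre
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Incomplete LU (ILU) preconditioner | AztecOO
Systems of Linear Equations (cont.) | Iterative Methods (cont.) | Least Squares Polynomials | PETSc
Systems of Linear Equations (cont.) | MultiGrid (MG) Methods | MG Preconditioner | PETSc, Hypre
Systems of Linear Equations (cont.) | MultiGrid (MG) Methods | Algebraic MG | Hypre
Systems of Linear Equations (cont.) | MultiGrid (MG) Methods | Semi-coarsening | Hypre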
ACTS Numerical Tools: Functionality
Computational Problem | Methodology | Algorithm | Library
--- | --- | --- | ---
Linear Least Squares Problems | Least Squares | $\min_x \lVert b - Ax \rVert_2$ | ScaLAPACK
Linear Least Squares Problems | Minimum Norm Solution | $\min_x \lVert x \rVert_2$ | ScaLAPACK
Linear Least Squares Problems | Minimum Norm Least Squares | $\min_x \lVert b - Ax \rVert_2$, $\min_x \lVert x \rVert_2$ | ScaLAPACK
Standard Eigenvalue Problem | Symmetric Eigenvalue Problem | $Az = \lambda z$ for $A = A^H$ or $A = A^T$ | ScaLAPACK (dense), SLEPc (sparse)
Singular Value Problem | Singular Value Decomposition | $A = U \Sigma V^T$, $A = U \Sigma V^H$ | ScaLAPACK (dense), SLEPc (sparse)
Generalized Symmetric Definite Eigenproblem | Eigenproblem | $Az = \lambda Bz$, $ABz = \lambda z$, $BAz = \lambda z$ | ScaLAPACK (dense), SLEPc (sparse)
ACTS Numerical Tools: Functionality
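Computational Problem | Methodology | Algorithm | Library
--- | --- | --- | ---
Non-Linear Equations | Newton Based | Line Search | PETSc
Non-Linear Equations | Newton Based | Trust Regions | PETSc
Non-Linear Equations | Newton Based | Pseudo-Transient Continuation | PETSc
Non-Linear Equations | Newton Based | Matrix Free | PETSc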
ACTS Numerical Tools: Functionality
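Computational Problem | Methodology | Algorithm | Library
--- | --- | --- | ---
Non-Linear Optimization | Newton Based | Newton | OPT++, TAO
Non-Linear Optimization | Newton Based | Finite-Difference Newton | OPT++, TAO
Non-Linear Optimization | Newton Based | Quasi-Newton | OPT++, TAO
Non-Linear Optimization | Newton Based | Non-linear Interior Point | OPT++, TAO
Non-Linear Optimization | CG | Standard Nonlinear CG | OPT++, TAO
Non-Linear Optimization | CG | Limited Memory BFGS | OPT++
Non-Linear Optimization | CG | Gradient Projections | TAO
Non-Linear Optimization | Direct Search | No derivative information | OPT++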
TAO - Interface with PETSc
OPT++ Interfaces
• Four major classes of problems available
• NLF0(ndim, fcn, init_fcn, constraint)
• Basic nonlinear function, no derivative information available
• NLF1(ndim, fcn, init_fcn, constraint)
• Nonlinear function, first derivative information available
• FDNLF1(ndim, fcn, init_fcn, constraint)
• Nonlinear function, first derivative information approximated
• NLF2(ndim, fcn, init_fcn, constraint)
• Nonlinear function, first and second derivative information available
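The difference between the NLF1 and FDNLF1 problem classes can be illustrated with a short sketch. It is written in plain Python and does not use the OPT++ API: `fd_gradient`, `rosenbrock`, and `rosenbrock_gradient` are hypothetical names chosen for this illustration. The first function approximates first derivatives by forward differences (the FDNLF1 situation), while the analytic gradient corresponds to the NLF1 situation where the user supplies it.

```python
def fd_gradient(fcn, x, h=1e-6):
    """Forward-difference approximation of the gradient of fcn at x."""
    f0 = fcn(x)
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h                      # perturb one coordinate at a time
        grad.append((fcn(xp) - f0) / h)
    return grad


def rosenbrock(x):
    """Two-dimensional Rosenbrock test function (illustrative objective)."""
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2


def rosenbrock_gradient(x):
    """Analytic gradient of the Rosenbrock function."""
    return [-400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
            200.0 * (x[1] - x[0] ** 2)]


if __name__ == "__main__":
    x0 = [-1.2, 1.0]
    print("finite difference:", fd_gradient(rosenbrock, x0))
    print("analytic:         ", rosenbrock_gradient(x0))
```

The approximation costs one extra function evaluation per variable, which is the usual trade-off when derivative code is not available.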
ACTS Numerical Tools: Functionality
Computational Problem | Methodology | Algorithm | Library
--- | --- | --- | ---
Non-Linear Optimization (cont.) | Semismoothing | Feasible Semismooth | TAO
Non-Linear Optimization (cont.) | Semismoothing | Infeasible Semismooth | TAO
Ordinary Differential Equations | Integration | Adams-Moulton (Variable coefficient forms) | CVODE (SUNDIALS), CVODES
Ordinary Differential Equations | Backward Differentiation Formula | Direct and Iterative Solvers | CVODE, CVODES
Nonlinear Algebraic Equations | Inexact Newton | Line Search | KINSOL (SUNDIALS)
Differential Algebraic Equations | Backward Differentiation Formula | Direct and Iterative Solvers | IDA (SUNDIALS)
ACTS Tools: Functionality
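Computational Problem | Support | Techniques | Library
--- | --- | --- | ---
Writing Parallel Programs | Distributed Arrays | Shared-Memory | Global Arrays
Writing Parallel Programs | Distributed Arrays | Distributed Memory | CUMULVS (viz), Globus (Grid)
Writing Parallel Programs | Grid Generation | | OVERTURE
Writing Parallel Programs | Grid Generation | Structured Meshes | CHOMBO (AMR), Hypre, OVERTURE, PETSc
Writing Parallel Programs | Grid Generation | Semi-Structured Meshes | CHOMBO (AMR), Hypre, OVERTURE
Writing Parallel Programs | Distributed Computing | GRID | Globus
Writing Parallel Programs | Distributed Computing | Remote Steering | CUMULVS
Writing Parallel Programs | Distributed Computing | Coupling | PAWS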
ACTS Tools: Functionality
Computational Problem | Support | Technique | Library
--- | --- | --- | ---
Writing Parallel Programs (cont.) | Distributed Computing | Check-point/restart | CUMULVS
Profiling | Algorithmic Performance | Automatic Instrumentation | PETSc
Profiling | Algorithmic Performance | User Instrumentation | PETSc
Profiling | Execution Performance | Automatic Instrumentation | TAU
Profiling | Execution Performance | User Instrumentation | TAU
Code Optimization | Library Installation | Linear Algebra Tuning | ATLAS
Interoperability | Code Generation | Language | BABEL, CHASM
Interoperability | Code Generation | Components | CCA
How Does One Use ACTS Tools?
Language Calls:
      CALL BLACS_GET( -1, 0, ICTXT )
      CALL BLACS_GRIDINIT( ICTXT, 'Row-major', NPROW, NPCOL )
      :
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
      :
      :
      CALL PDGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB,
     $             INFO )
Command lines:
• -ksp_type [cg,gmres,bcgs,tfqmr,…]
• -pc_type [lu,ilu,jacobi,sor,asm,…]
More advanced:
• -ksp_max_it
• -ksp_gmres_restart
• -pc_asm_overlap
• -pc_asm_type <. . >
Problem Domain:
• Linear System Interfaces
• Linear Solvers: GMG, FAC, Hybrid, ..., AMGe, ILU, ...
• Data Layout: structured, composite, block-struc, unstruc, CSR
Tool to Tool Interoperability
One Side Interoperability
PETSc
Ex 1
TAU
Ex 2
TOOL A
TOOL D
High-level User Interfaces to the ACTS Collection
• User: $Ax = b$, View_field(T1), $Az = \lambda z$, $A = U \Sigma V^T$
• High Level Interfaces: PyACTS, matlabMPI, NetSolve, Star-P
• ACTS tools: OPT++, AZTEC, ScaLAPACK, PAWS, Hypre, SuperLU, Globus, PETSc, TAO, CUMULVS, Chombo, PVODE, TAU, Global Arrays, Overture
PyACTS
Tony Drummond Lawrence Berkeley National Laboratory
Vicente Galiano Miguel Hernandez University Violeta Migallón and José Penadés University of Alicante
Goal: Provide a didactic tool for learning the ACTS Collection. Provide a Python-based interface to the ACTS Collection.
References:
• L. A. Drummond, V. Galiano, O. Marques, V. Migallón, J. Penadés: PyACTS: A High-level Framework for Fast Development of High Performance Applications. Lecture Notes in Computer Science, Vol. 4395, pp. 417-425, 2007.
PyACTS
Layers, from the user level down to the libraries:
• PyACTS: PyScaLAPACK, PySuperLU, ...
• Wrappers: PyACTS Wrappers, ScaLAPACK Wrappers, SuperLU Wrappers
• Python World: PyMPI, NumPy, Python
• ACTS libraries: ScaLAPACK, ..., SuperLU
PyACTS: Basic Services
• Basic Services: Creation and modification of different data objects and parallel environment specifications (matrices, data layouts, ctx).
• I/O Services: Parallel read/write. ASCII and NetCDF are currently supported.
• Verification and Validation: Predicates and parameter type checking.
• Data Conversion: Interoperable objects between libraries.
PyACTS: Motivation
PyClimate (J. Saenz et al., Univ. Basque Country): support for common tasks during the analysis of climate variability data.
• Simple I/O operations
• Operations with COARDS-compliant NetCDF files
• Empirical Orthogonal Function (EOF) analysis
• Canonical Correlation Analysis (CCA)
• Singular Value Decomposition (SVD) analysis of coupled datasets
• Some linear digital filters
• Kernel-based probability-density function estimation
• Access to the DCDFLIB.C library from Python
PyACTS: Performance in PyClimate EOF calculations
Empirical Orthogonal Function (Day calc)
PyScaLAPACK: pvgesvd Performance
PyACTS: Performance
    import PyACTS                              # bind the package name used below
    from PyACTS import *
    import PyACTS.PyPBLAS as PyPBLAS
    import time
    n = 500
    ACTS_lib = 1                               # ScaLAPACK library
    PyACTS.gridinit()                          # grid initialization
    alpha = Scal2PyACTS(2, ACTS_lib)           # convert scalar to PyACTS scalar
    beta = Scal2PyACTS(3, ACTS_lib)
    a = Rand2PyACTS(n, n, ACTS_lib)            # generate a random PyACTS array
    b = Rand2PyACTS(n, n, ACTS_lib)
    c = Rand2PyACTS(n, n, ACTS_lib)
    c = PyPBLAS.pvgemm(alpha, a, b, beta, c)   # call level 3 PBLAS routine
    PyACTS.gridexit()
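For readers unfamiliar with the level 3 PBLAS routine called above, the following sequential sketch shows the operation that GEMM-type routines compute, namely C := alpha*A*B + beta*C. It is plain Python on lists of rows, not PyACTS code; the function name `gemm` is an illustrative choice, and it is an assumption that `pvgemm` follows the standard PBLAS semantics without transposition.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices stored as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    return [[alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
             for j in range(m)]
            for i in range(n)]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = [[1, 1], [1, 1]]
    print(gemm(2, a, b, 3, c))
```

In the parallel routine the three matrices are distributed over the process grid, but the mathematical result is the same.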
Problem Statement:
Software Sustainability
THE GOOD
• Many successful HPC stories have induced major advances in science and engineering
• We have successfully run and scaled applications on 100,000+ processors
THE BAD
• Portability across platforms is still an outstanding issue:
  • Readiness
  • Performance
  • Robustness and Correctness
THE UGLY
• The multi-core and many-core era is knocking at the HPC door
Software Quality Assurance
• Robustness
• Scalability
• Extensibility
• Interoperability
• User Friendliness
• Documentation
• Periodic tests and evaluations (test engines and dependency graphs)
• Versions (tools, systems, O/S, compilers)
• Sanity-check (robustness)
• Interoperability (maintained)
• Consistent Documentation
ScaLAPACK’s Software Structure
• Global: ScaLAPACK, PBLAS
• Local: LAPACK, BLACS, and the platform-specific BLAS and MPI/PVM/...
BLAS: Basic Linear Algebra Subprograms
BLAS LEVELS:
• Level 1 BLAS: vector-vector
• Level 2 BLAS: matrix-vector
• Level 3 BLAS: matrix-matrix
Design Considerations:
• Portability
• Performance: development of blocked algorithms is important for performance!
[Figure: Mflop/s versus order of matrix/vector (100 to 1900) for BLAS 1, BLAS 2, and BLAS 3 on a 2.2 GHz AMD Opteron.]
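The remark about blocked algorithms can be made concrete with a small sketch. The code below is plain Python, not a BLAS implementation, and the names `matmul_naive`, `matmul_blocked`, and the block size `bs` are illustrative choices. It only shows the loop structure of a blocked matrix-matrix product, where the computation proceeds tile by tile so that each tile can be reused while it sits in fast memory; the performance gain itself appears only in compiled, tuned kernels.

```python
def matmul_naive(a, b):
    """Reference triple-loop product of two matrices stored as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_blocked(a, b, bs=2):
    """Same product, computed tile by tile with tiles of size bs x bs."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, bs):              # tile row of C
        for j0 in range(0, m, bs):          # tile column of C
            for p0 in range(0, k, bs):      # tile along the inner dimension
                for i in range(i0, min(i0 + bs, n)):
                    for j in range(j0, min(j0 + bs, m)):
                        s = c[i][j]
                        for p in range(p0, min(p0 + bs, k)):
                            s += a[i][p] * b[p][j]
                        c[i][j] = s
    return c


if __name__ == "__main__":
    a = [[i * 5 + j for j in range(5)] for i in range(5)]
    b = [[(i + 2 * j) % 7 for j in range(5)] for i in range(5)]
    print(matmul_blocked(a, b, bs=2) == matmul_naive(a, b))
```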
ScaLAPACK: Data Layouts
• 1D block and cyclic column distributions
• 1D block-cyclic column and 2D block-cyclic distributions
• 2D block-cyclic distribution used in ScaLAPACK for dense matrices
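A minimal sketch of the 2D block-cyclic mapping may help here. It is plain Python with hypothetical helper names (`owner_2d_block_cyclic`, `local_index_1d`), uses 0-based indices, and assumes the first block of the matrix is stored on process (0, 0). Blocks of size mb x nb are dealt out to an nprow x npcol process grid in round-robin fashion along each dimension.

```python
def owner_2d_block_cyclic(i, j, mb, nb, nprow, npcol):
    """Process coordinates (prow, pcol) that own global entry (i, j)."""
    return ((i // mb) % nprow, (j // nb) % npcol)


def local_index_1d(g, nb, nprocs):
    """Local index of global index g on its owning process, in one dimension."""
    # Full rounds of nprocs blocks that precede g, plus the offset inside the block.
    return (g // (nb * nprocs)) * nb + g % nb


if __name__ == "__main__":
    # 2 x 2 blocks on a 2 x 3 process grid.
    for i, j in [(0, 0), (2, 4), (5, 7)]:
        print((i, j), "->", owner_2d_block_cyclic(i, j, 2, 2, 2, 3))
```

Because consecutive blocks go to different processes, every process keeps a share of the work as a factorization advances through the matrix, which is the reason this layout suits dense matrices.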
Astrophysics Applications
Cosmic Microwave Background Analysis, BOOMERanG collaboration, MADCAP code (Apr. 27, 2000).
• The statistics of the tiny variations in the CMB (the faint echo of the Big Bang) allow the determination of the fundamental parameters of cosmology to the percent level or better.
• MADCAP (Microwave Anisotropy Dataset Computational Analysis Package) makes maps from observations of the CMB and then calculates their angular power spectra. (See http://crd.lbl.gov/~borrill).
• Calculations are dominated by the solution of linear systems of the form $M = A^{-1}B$ for dense $n \times n$ matrices $A$ and $B$, scaling as $O(n^3)$ in flops. MADCAP uses ScaLAPACK for those calculations.
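A system of the form $M = A^{-1}B$ is normally obtained by solving $AM = B$ with a factorization rather than by forming the inverse. The sketch below shows this on a tiny example with sequential Gaussian elimination and partial pivoting in plain Python. It is an illustration only: the function name `solve` is hypothetical, and the source does not state which ScaLAPACK routines MADCAP calls.

```python
def solve(a, b):
    """Solve A X = B for X by Gaussian elimination with partial pivoting."""
    n, nrhs = len(a), len(b[0])
    # Work on an augmented copy [A | B] so the inputs are left untouched.
    aug = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(aug[r][k]))
        if aug[piv][k] == 0.0:
            raise ValueError("matrix is singular")
        aug[k], aug[piv] = aug[piv], aug[k]
        for r in range(k + 1, n):
            f = aug[r][k] / aug[k][k]
            for col in range(k, n + nrhs):
                aug[r][col] -= f * aug[k][col]
    # Back substitution, one right-hand-side column at a time.
    x = [[0.0] * nrhs for _ in range(n)]
    for j in range(nrhs):
        for i in range(n - 1, -1, -1):
            s = aug[i][n + j] - sum(aug[i][col] * x[col][j] for col in range(i + 1, n))
            x[i][j] = s / aug[i][i]
    return x


if __name__ == "__main__":
    print(solve([[4, 1], [1, 3]], [[1, 2], [2, 1]]))
```

The elimination phase touches on the order of $n^3$ entries, which matches the $O(n^3)$ scaling quoted on the slide.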
PETSc
Layers, from the application down to the kernels:
• PETSc PDE Application Codes
• ODE Integrators; Visualization
• Nonlinear Solvers, Unconstrained Minimization; Interface
• Linear Solvers: Preconditioners + Krylov Methods
• Object-Oriented Matrices, Vectors, Indices; Grid Management
• Profiling Interface
• Computation and Communication Kernels: MPI, MPI-IO, BLAS, LAPACK
Image provided by the PETSc Development Team, ANL
Basic Conjugate Gradient Algorithm
Synchronization Points Scalars , , y
Vectors x, r, p (= search direction), and q
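A sequential sketch of the basic conjugate gradient iteration is given below, using the vector names from the slide (x, r, p as the search direction, and q). It is plain Python rather than PETSc code, and the helper names `dot`, `matvec`, and `conjugate_gradient` are illustrative. In a distributed run the inner products are typically where the synchronization points occur, since each requires a global reduction; the comments mark them.

```python
def dot(u, v):
    """Inner product; a global reduction in a distributed implementation."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(a, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, x) for row in a]


def conjugate_gradient(a, b, tol=1e-10, max_it=100):
    """Solve A x = b for a symmetric positive definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)                  # residual r = b - A x with x = 0
    p = list(r)                  # first search direction
    rho = dot(r, r)              # synchronization point
    for _ in range(max_it):
        if rho ** 0.5 <= tol:
            break
        q = matvec(a, p)
        alpha = rho / dot(p, q)  # synchronization point
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_new = dot(r, r)      # synchronization point
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return x


if __name__ == "__main__":
    print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```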
Preconditioning Matrices
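With the splitting $A = D - E - F$, where $D$ is the diagonal of $A$ and $-E$ and $-F$ are its strictly lower and strictly upper triangular parts:
• Gauss-Seidel: $M = D - E$. Uses the lower triangular part of $A$.
• Jacobi: $M = D$. Uses the diagonal of $A$.
• SOR: $M = \frac{1}{\omega}(D - \omega E)$. Uses the lower triangular part of $A$.
• SSOR: $M = \frac{1}{\omega(2 - \omega)}(D - \omega E) D^{-1} (D - \omega F)$. Uses the whole matrix $A$.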
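The preconditioning matrices can be assembled explicitly for a small example. The sketch below is plain Python with illustrative function names, and it assumes the standard splitting A = D - E - F with a relaxation parameter omega (an assumption, since the symbol is not legible on the slide). It builds the Jacobi, Gauss-Seidel, and SOR matrices and applies a lower triangular M by forward substitution, which is how such a preconditioner is used in practice. SSOR is omitted to keep the example short.

```python
def split(a):
    """Return (D, E, F) with A = D - E - F for a matrix stored as a list of rows."""
    n = len(a)
    d = [[a[i][j] if i == j else 0.0 for j in range(n)] for i in range(n)]
    e = [[-a[i][j] if i > j else 0.0 for j in range(n)] for i in range(n)]
    f = [[-a[i][j] if i < j else 0.0 for j in range(n)] for i in range(n)]
    return d, e, f


def jacobi_m(a):
    """Jacobi: M = D."""
    return split(a)[0]


def sor_m(a, omega):
    """SOR: M = (D - omega * E) / omega."""
    d, e, _ = split(a)
    n = len(a)
    return [[(d[i][j] - omega * e[i][j]) / omega for j in range(n)] for i in range(n)]


def gauss_seidel_m(a):
    """Gauss-Seidel: M = D - E, which is SOR with omega = 1."""
    return sor_m(a, 1.0)


def apply_lower(m, r):
    """Solve M z = r by forward substitution for a lower triangular M."""
    z = []
    for i in range(len(r)):
        z.append((r[i] - sum(m[i][j] * z[j] for j in range(i))) / m[i][i])
    return z


if __name__ == "__main__":
    a = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
    print(gauss_seidel_m(a))
    print(apply_lower(gauss_seidel_m(a), [4.0, 3.0, 3.0]))
```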
PETSc: Matrix Distribution
• proc 1: M=8, N=8, m=3, n=k1, rstart=0, rend=3
• proc 2: M=8, N=8, m=3, n=k2, rstart=3, rend=6
• proc 3: M=8, N=8, m=2, n=k3, rstart=6, rend=8
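The local row counts m = 3, 3, 2 for M = 8 rows on three processes follow from an even split in which the remainder goes to the lowest ranks. The sketch below computes such ownership ranges in plain Python. The function name `ownership_range` is hypothetical, ranks are 0-based, `rend` is one past the last owned row, and it is an assumption that this rule matches the library's default when local sizes are not given explicitly.

```python
def ownership_range(M, size, rank):
    """Return (rstart, rend) of the rows owned by `rank` out of `size` processes."""
    base, extra = divmod(M, size)
    m = base + (1 if rank < extra else 0)       # local number of rows
    rstart = rank * base + min(rank, extra)     # rows owned by lower ranks
    return rstart, rstart + m


if __name__ == "__main__":
    for rank in range(3):
        print("rank", rank, ownership_range(8, 3, rank))
```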
Software Dependency Graph
[Diagram: ScaLAPACK and PBLAS (global); LAPACK and BLACS (local); BLAS and MPI/PVM/... (platform specific).]
Software Dependency Tree:
• ScaLAPACK: PBLAS, LAPACK
• LAPACK: BLAS
• PBLAS: BLACS, MPI
Computational Platform Dependency:
• ScaLAPACK: compiles=[compiler-list] options=[compile-options]
Software Testing:
• ScaLAPACK: tests=[dir-list]
Python-based scripts
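One use of such a dependency tree is to derive the order in which packages must be built and tested. The sketch below encodes the three dependency lines from the slide as a dictionary and produces a valid order with a depth-first traversal. It is a hypothetical illustration in plain Python, not the scripts used by the ACTS project; the name `build_order` is an assumption.

```python
def build_order(deps):
    """Return the packages so that every package appears after its dependencies."""
    order, state = [], {}

    def visit(node):
        if state.get(node) == "done":
            return
        if state.get(node) == "visiting":
            raise ValueError("cyclic dependency at " + node)
        state[node] = "visiting"
        for dep in deps.get(node, []):
            visit(dep)
        state[node] = "done"
        order.append(node)

    for node in deps:
        visit(node)
    return order


if __name__ == "__main__":
    deps = {
        "ScaLAPACK": ["PBLAS", "LAPACK"],
        "LAPACK": ["BLAS"],
        "PBLAS": ["BLACS", "MPI"],
    }
    print(build_order(deps))
```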
Software Sustainability
Software Testing Engines (automatic)
Errors/Problems
yes
No
End
Fix/Report and Document
User Reported Problems
Software Sustainability
Software Testing Engines (automatic)
Errors/Problems
yes
Fix/Report and Document
No
End
User Reported Problems
Performance and Scalability
• Profiling and Tracing Tools: TAU
[Figure: Execution time of PDPOSV for various grid shapes. Axes: seconds (0 to 40), problem size (1000 to 10000), grid shape (1x60, 2x30, 3x20, 4x15, 5x12, 6x10).]
• Auto-Tuning (OSKI, ATLAS like)
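The grid shapes compared in the PDPOSV timing figure are the ways of arranging 60 processes as a rectangular grid. A small sketch that enumerates such shapes is shown below; it is plain Python, the name `grid_shapes` is illustrative, and a tuning script could time a solver on each returned shape to pick the fastest one.

```python
import math


def grid_shapes(nprocs):
    """Return all (nprow, npcol) pairs with nprow <= npcol and nprow * npcol == nprocs."""
    return [(p, nprocs // p)
            for p in range(1, math.isqrt(nprocs) + 1)
            if nprocs % p == 0]


if __name__ == "__main__":
    print(grid_shapes(60))
```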
Software Sustainability Requirement
ACTS Software Sustainability Center
· · · t∞ Sustainable Software Support · · · t∞
Open Challenges - Multi-core
• Improve interactions between Tool-Compilers-Hardware
• Software Distribution and Installation
• Automatic Tuning and Profiling (TAU, IPM, etc.)
• Automatic Code Generators (ATLAS-like)
• Debugging tools
• Tools and Language Interoperability
References
• ACTS Information Center: http://acts.nersc.gov
• Two upcoming journal issues dedicated to ACTS: ACM TOMS and IJHPCA
• Ninth ACTS Collection Workshop, August 19-22, 2008