HPCx Overview and Biomolecular Simulations

Opportunities for Biological Consortia on HPCx

Code Capabilities and Performance
HPCx and CCP Staff

http://www.ccp.ac.uk/
http://www.hpcx.ac.uk/
Welcome to the Meeting

 • Background
       – HPCx

 • Objectives
       – to consider whether there is a case to bid


 • Agenda
       –   Introduction to the HPCx service
       –   Overview of Code Performance
       –   Contributed Presentations
       –   Invited Presentation -
       –   Discussion



HPCx/Biology Discussions                  2           Royal Institution, 6th November 2003
Outline

 • Overview of Code Capabilities and Performance

       – Macromolecular simulation
            • DL_POLY, AMBER, CHARMM, NAMD
       – Localised basis molecular codes
            • Gaussian, GAMESS-UK, NWChem
       – Local basis periodic code
            • CRYSTAL
       – Plane wave periodic codes
            • CASTEP
            • CPMD (Alessandro Curioni talk)


 • Note - consortium activity is not limited to these codes.



The DL_POLY Molecular Dynamics
Simulation Package




             Bill Smith
DL_POLY Background


• General purpose parallel MD code
• Developed at Daresbury Laboratory for CCP5 1994-today
• Available free of charge (under licence) to University
  researchers world-wide
• DL_POLY versions:
     – DL_POLY_2
          • Replicated Data, up to 30,000 atoms
          • Full force field and molecular description
     – DL_POLY_3
          • Domain Decomposition, up to 1,000,000 atoms
          • Full force field but no rigid body description.




DL_POLY Force Field

 • Intermolecular forces
       –   All common van der Waals potentials
       –   Sutton Chen many-body potential
       –   3-body angle forces (SiO2)
       –   4-body inversion forces (BO3)
       –   Tersoff potential -> Brenner
 • Intramolecular forces
       – Bonds, angle, dihedrals, inversions
 • Coulombic forces
       – Ewald* & SPME (3D), HK Ewald* (2D), Adiabatic shell model,
         Reaction field, Neutral groups*, Truncated Coulombic
 • Externally applied field
       – Walled cells, electric field, shear field, etc.
 * Not in DL_POLY_3
Boundary Conditions

 •    None (e.g. isolated macromolecules)
 •    Cubic periodic boundaries
 •    Orthorhombic periodic boundaries
 •    Parallelepiped periodic boundaries
 •    Truncated octahedral periodic boundaries*
 •    Rhombic dodecahedral periodic boundaries*
 •    Slabs (i.e. x,y periodic, z nonperiodic)
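For the periodic cases listed above, short-range pair interactions are typically evaluated under the minimum image convention: each particle interacts with the nearest periodic image of its neighbour. A minimal sketch for the cubic case (illustrative plain Python, not DL_POLY source; the function names are ours):

```python
# Minimal sketch of the minimum image convention for the cubic case
# (plain Python, not DL_POLY source; function names are ours). Each
# displacement component is folded into [-L/2, L/2), so a particle
# interacts with the nearest periodic image of its neighbour.

def minimum_image(dx, box):
    """Fold one displacement component into the primary image."""
    return dx - box * round(dx / box)

def displacement(r1, r2, box):
    """Minimum-image displacement vector r2 - r1 in a cubic box."""
    return [minimum_image(b - a, box) for a, b in zip(r1, r2)]

# In a box of side 10, particles at x = 0.5 and x = 9.5 are separated
# by 1.0 through the boundary, not by 9.0.
print(displacement([0.5, 0.0, 0.0], [9.5, 0.0, 0.0], 10.0))
# [-1.0, 0.0, 0.0]
```

The non-cubic boundary types above generalise this idea with cell-matrix transformations rather than a per-component fold.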




Algorithms and Ensembles



                Algorithms               Ensembles

  •    Verlet leapfrog           •   NVE
  •    RD-SHAKE                  •   Berendsen NVT
  •    Euler-Quaternion*         •   Hoover NVT
  •    QSHAKE*                   •   Evans NVT
  •    [All combinations]        •   Berendsen NPT
                                 •   Hoover NPT
  *   Not in DL_POLY_3           •   Berendsen NσT
                                 •   Hoover NσT
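The Verlet leapfrog scheme listed under Algorithms can be sketched in a few lines: velocities are stored at half-steps, with a "kick" from the force followed by a "drift" of the positions. An illustrative pure-Python version for a 1D harmonic oscillator (not DL_POLY code; names are ours):

```python
# Illustrative sketch of the Verlet leapfrog integrator (plain Python,
# not DL_POLY source). Velocities are stored at half-steps: a "kick"
# updates v(t-dt/2) -> v(t+dt/2) using the force at time t, then a
# "drift" updates x(t) -> x(t+dt).

def leapfrog(x, v_half, force, dt, steps):
    """Advance position x and half-step velocity for `steps` steps."""
    traj = []
    for _ in range(steps):
        v_half = v_half + force(x) * dt   # kick
        x = x + v_half * dt               # drift
        traj.append(x)
    return x, v_half, traj

# 1D harmonic oscillator, unit mass and spring constant: F(x) = -x.
x, v, traj = leapfrog(x=1.0, v_half=0.0, force=lambda x: -x,
                      dt=0.01, steps=1000)
# The amplitude stays close to 1 over many periods, reflecting the
# integrator's good long-time energy behaviour.
print(round(max(abs(p) for p in traj), 2))  # 1.0
```

RD-SHAKE and QSHAKE add constraint iterations around this same kick-drift cycle.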




Migration from Replicated to Distributed data
DL_POLY_3 : Domain Decomposition

• Distribute atoms, forces across the nodes
   – More memory efficient, can address much larger cases (10^5-10^7 atoms)
• SHAKE and short-range forces require only neighbour communication
   – communications scale linearly with number of nodes
• Coulombic energy remains global
   – strategy depends on problem and machine characteristics
   – Adopt Smooth Particle Mesh Ewald (SPME) scheme
      • includes Fourier transform of smoothed charge density
        (reciprocal space grid typically 64x64x64 - 128x128x128)

[Figure: simulation cell divided into domains A-D, one per node.]
An alternative FFT algorithm has been designed to reduce communication costs.
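The block decomposition can be illustrated in a few lines: atoms are binned into a regular grid of domains (the A-D blocks of the figure generalise to ndom^3 blocks), and each domain exchanges halo data only with its 26 periodic neighbours. A toy sketch in plain Python (our own function names, not DL_POLY code):

```python
# Illustrative sketch (not DL_POLY source) of block domain
# decomposition: the periodic cell is cut into ndom x ndom x ndom
# domains, one per node; each atom belongs to exactly one domain, and
# short-range forces need data only from the 26 neighbouring domains.
# Positions are assumed to lie in [0, box)^3.

def domain_of(pos, box, ndom):
    """Map a position in [0, box)^3 to its (i, j, k) domain index."""
    return tuple(int(c / (box / ndom)) for c in pos)

def neighbours(dom, ndom):
    """The 26 periodic neighbour domains of `dom`."""
    i, j, k = dom
    out = set()
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            for dk in (-1, 0, 1):
                if (di, dj, dk) != (0, 0, 0):
                    out.add(((i + di) % ndom, (j + dj) % ndom,
                             (k + dk) % ndom))
    return out

atoms = [(1.0, 1.0, 1.0), (9.0, 9.0, 9.0), (4.0, 6.0, 1.0)]
doms = [domain_of(a, box=10.0, ndom=2) for a in atoms]
print(doms)  # [(0, 0, 0), (1, 1, 1), (0, 1, 0)]
```

Because only neighbour domains exchange data, the communication volume per node stays roughly constant as nodes are added, which is the linear-scaling behaviour claimed above.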
Migration from Replicated to Distributed data
DL_POLY_3: Coulomb Energy Evaluation

• Conventional routines (e.g. FFTW) assume plane or column distributions
• A global transpose of the data is required to complete the 3D FFT, and
  additional costs are incurred re-organising the data from the natural
  block domain decomposition
• An alternative FFT algorithm has been designed to reduce communication
  costs
   – the 3D FFT is performed as a series of 1D FFTs, each involving
     communications only between blocks in a given column
   – more data is transferred, but in far fewer messages
   – rather than all-to-all, the communications are column-wise only

[Figure: plane vs block distributions of the 3D grid.]

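The column-wise scheme rests on the separability of the 3D transform: it factorises into 1D FFTs along each axis, so each stage needs data movement only within one column of blocks. A serial pure-Python sketch of that factorisation (naive DFTs, no parallelism; an illustration of the mathematics, not the DL_POLY implementation):

```python
# A 3D DFT computed as three sweeps of 1D DFTs (z, then y, then x).
# Serial, naive illustration of the factorisation behind DL_POLY_3's
# column-wise parallel FFT - not the production algorithm.
import cmath

def dft1(v):
    """Naive 1D DFT of a list of complex samples."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * k * j / n)
                for j in range(n)) for k in range(n)]

def dft3(g):
    """3D DFT of an n x n x n nested list, done as 1D sweeps."""
    n = len(g)
    # Sweep along z: each (i, j) pencil is transformed independently.
    a = [[dft1(g[i][j]) for j in range(n)] for i in range(n)]
    # Sweep along y.
    b = [[[0j] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for k in range(n):
            col = dft1([a[i][j][k] for j in range(n)])
            for j in range(n):
                b[i][j][k] = col[j]
    # Sweep along x.
    out = [[[0j] * n for _ in range(n)] for _ in range(n)]
    for j in range(n):
        for k in range(n):
            col = dft1([b[i][j][k] for i in range(n)])
            for i in range(n):
                out[i][j][k] = col[i]
    return out

# 2x2x2 grid holding the values 0..7; the (0,0,0) component of the
# transform is the plain sum of all samples.
g = [[[float(i + 2 * j + 4 * k) for k in range(2)] for j in range(2)]
     for i in range(2)]
F = dft3(g)
print(abs(F[0][0][0]))  # 28.0
```

In the parallel code each sweep involves only the blocks in one column of the domain grid, which is why the messages are fewer (though larger) than in a transpose-based FFT.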
DL_POLY_2 & 3 Differences



 •    Rigid bodies not in _3
 •    MSD not in _3
 •    Tethered atoms not in _3
 •    Standard Ewald not in _3
 •    HK_Ewald not in _3
 •    DL_POLY_2 I/O files work in _3 but NOT vice versa
 •    No multiple timestep in _3



DL_POLY_2 Developments



 •    DL_MULTI - Distributed multipoles
 •    DL_PIMD - Path integral (ionics)
 •    DL_HYPE - Rare event simulation
 •    DL_POLY - Symplectic versions 2/3
 •    DL_POLY - Multiple timestep
 •    DL_POLY - F90 re-vamp




DL_POLY_3 on HPCx

 • Test case 1 (552,960 atoms, 300 timesteps)
       – NaKSi2O5 disilicate glass
       – SPME (128^3 grid) + 3-body terms, 15625 link cells
       – 32-512 processors (4-64 nodes)




DL_POLY_3 on HPCx

 • Test case 2 (792,960 atoms, 10 timesteps)
       – 64 x Gramicidin (354) + 256,768 H2O
       – SHAKE + SPME (256^3 grid), 14812 link cells
       – 16-256 processors (2-32 nodes)




DL_POLY People

 • Bill Smith DL_POLY_2 & _3 & GUI
       – w.smith@dl.ac.uk
 • Ilian Todorov DL_POLY_3
       – i.t.todorov@dl.ac.uk
 • Maurice Leslie DL_MULTI
       – m.leslie@dl.ac.uk

 • Further Information:
       – W. Smith and T.R. Forester, J. Molec. Graphics, (1996), 14, 136
       – http://www.cse.clrc.ac.uk/msi/software/DL_POLY/index.shtml
       – W. Smith, C.W. Yong, P.M. Rodger, Molecular Simulation (2002), 28,
         385




AMBER, NAMD and Gaussian



 Lorna Smith and Joachim Hein
AMBER

• AMBER (Assisted Model Building with Energy Refinement)
      – A molecular dynamics program, particularly for biomolecules
      – Weiner and Kollman, University of California, 1981.
• Current version – AMBER7
• Widely used suite of programs
      – Sander, Gibbs, Roar
• Main program for molecular dynamics: Sander
      – Basic energy minimiser and molecular dynamics
      – Shared memory version – only for SGI and Cray
      – MPI version: master / slave, replicated data model




AMBER - Initial Scaling

[Figure: speed-up (0-12) vs number of processors (0-144).]
• Factor IX protein with Ca++ ions – 90906 atoms

Current developments - AMBER

 • Bob Duke
       – Developed a new version of Sander on HPCx
       – Originally called AMD (Amber Molecular Dynamics)
       – Renamed PMEMD (Particle Mesh Ewald Molecular Dynamics)
 • Substantial rewrite of the code
       – Converted to Fortran90, removed multiple copies of routines,…
       – Likely to be incorporated into AMBER8
 • We are looking at optimising the collective communications –
   the reduction / scatter




Optimisation – PMEMD


[Figure: time (seconds, 0-300) vs number of processors (0-300) for PMEMD
and Sander7.]
NAMD

 • NAMD
       – molecular dynamics code designed for high-performance
         simulation of large biomolecular systems.
       – Theoretical and Computational Biophysics Group, University of
         Illinois at Urbana-Champaign.


 • Versions 2.4, 2.5b and 2.5 available on HPCx

 • One of the first codes to be awarded a capability incentive
   rating – bronze




NAMD Performance




 • Benchmarks from Prof. Peter Coveney
 • TCR-peptide-MHC system

[Figure: NAMD benchmark performance.]
Molecular Simulation - NAMD Scaling
http://www.ks.uiuc.edu/Research/namd/

 • Parallel, object-oriented MD code
 • High-performance simulation of large biomolecular systems
 • Scales to 100's of processors on high-end parallel platforms

 • Standard NAMD ApoA-I benchmark: a system comprising 92,442 atoms,
   with 12 Å cutoff and PME every 4 time steps
 • Scalability improves with larger simulations - speedup of 778 on
   1024 CPUs of TCS-1 in a 327K-particle simulation of F1-ATPase

[Figure: speedup vs number of CPUs (0-512) for IBM SP/Regatta-H and
Compaq AlphaServer ES45/1000, compared with linear scaling.]
Performance Comparison

 • Performance comparison between AMBER, CHARMM and
   NAMD

 • See: http://www.scripps.edu/brooks/Benchmarks/

 • Benchmark
       – dihydrofolate reductase protein in an explicit water bath with cubic
         periodic boundary conditions.
       – 23,558 atoms




Performance

[Figure: benchmark timings for AMBER, CHARMM and NAMD.]
Gaussian

 • Gaussian 03
       – Performs semi-empirical and ab initio molecular orbital
         calculations.
       – Gaussian Inc, www.gaussian.com

 • Shared memory version available on HPCx
       – Limited to the size of a logical partition (8 processors)
       – Phase 2 upgrade will allow access to 32 processors


 • Task farming option




CRYSTAL and CASTEP




   Ian Bush and Martin Plummer
Crystal

 • Electronic structure and related properties of periodic systems

 • All electron, local Gaussian basis set, DFT and Hartree-Fock

 • Under continuous development since 1974

 • Distributed to over 500 sites world wide

 • Developed jointly by Daresbury and the University of Turin




Crystal Functionality

 • Basis Set
       – LCAO - Gaussians
 • All electron or pseudopotential
 • Hamiltonian
       – Hartree-Fock (UHF, RHF)
       – DFT (LSDA, GGA)
       – Hybrid functionals (B3LYP)
 • Techniques
       – Replicated data parallel
       – Distributed data parallel
 • Forces
       – Structural optimization
 • Direct SCF
 • Visualisation
       – AVS GUI (DLV)

 • Properties: energy, structure, vibrations (phonons), elastic tensor,
   ferroelectric polarisation, piezoelectric constants, X-ray structure
   factors, density of states / bands, charge/spin densities, magnetic
   coupling, electrostatics (V, E, EFG classical), Fermi contact (NMR),
   EMD (Compton, e-2e)

Benchmark Runs on Crambin

•   Very small protein from Crambe abyssinica - 1284 atoms per unit
    cell

•   Initial studies using STO-3G (3948 basis functions)

•   Improved to 6-31G** (12354 functions)

•   All calculations Hartree-Fock

•   As far as we know, the largest HF calculation ever converged




Crambin - Parallel Performance

 •    Fit measured data to Amdahl's law to obtain an estimate of
      speed-up

 •    Increasing the basis set size increases the scalability

 •    About 700 speed-up on 1024 processors for 6-31G**

 •    Takes about 3 hours instead of about 3 months

 •    99.95% parallel

[Figure: speed-up vs number of processors (0-1024) for STO-3G (3,948
GTOs), 6-31G (7,194 GTOs) and 6-31G* (12,354 GTOs), compared with
linear scaling.]
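The Amdahl's-law fit quoted above can be reproduced in a few lines. With parallel fraction f, the predicted speed-up on p processors is S(p) = 1/((1 - f) + f/p); the slide does not state its actual fitting procedure, so the sketch below assumes a simple least-squares grid search:

```python
# Sketch of an Amdahl's-law fit (plain Python). The fitting procedure
# used on the slide is not stated; a least-squares grid search over the
# parallel fraction f is assumed here.
#   S(p) = 1 / ((1 - f) + f / p)

def amdahl(p, f):
    """Predicted speed-up on p processors with parallel fraction f."""
    return 1.0 / ((1.0 - f) + f / p)

def fit_parallel_fraction(data):
    """Grid-search f in [0.99, 1.0) minimising squared error over
    (processors, measured speed-up) pairs."""
    return min(((sum((amdahl(p, f) - s) ** 2 for p, s in data), f)
                for f in (0.99 + i * 1e-6 for i in range(10000))))[1]

# Consistency check on the quoted numbers: f = 0.9995 ("99.95%
# parallel") predicts a speed-up of roughly 700 on 1024 processors.
print(round(amdahl(1024, 0.9995)))  # 677
```

The quoted figures are mutually consistent: a 99.95% parallel fraction caps the speed-up near 2000 no matter how many processors are used, and gives about 677 on 1024.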
Results – Electrostatic Potential

 • Charge density isosurface
   coloured according to
   potential

 • Useful to determine possible
   chemically active groups




Futures - Rusticyanin

 • Rusticyanin (Thiobacillus ferrooxidans) has 6284 atoms and is
   involved in redox processes
 • We have just started calculations using over 33000 basis functions
 • In collaboration with S. Hasnain (DL) we want to calculate redox
   potentials for rusticyanin and associated mutants




What is Castep?

 • First principles (DFT) materials simulation code
       –   electronic energy
       –   geometry optimization
       –   surface interactions
       –   vibrational spectra
            • materials under pressure, chemical reactions
       – molecular dynamics

 • Method (direct minimization)
       – plane wave expansion of valence electrons
       – pseudopotentials for core electrons
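The plane-wave expansion can be illustrated in one dimension: a cell-periodic function is written as a sum of plane waves exp(iGx) with G = 2*pi*n/L, and each coefficient is recovered by projecting onto the corresponding wave. A small pure-Python sketch (illustrative only, not Castep code):

```python
# One-dimensional illustration of the plane-wave expansion (plain
# Python, not Castep code): f(x) = sum_G c_G exp(iGx) with
# G = 2*pi*n/L, and c_G recovered by projecting f onto exp(iGx).
import cmath

L = 2.0 * cmath.pi   # cell length (one period)
N = 64               # real-space grid points

def G(n):
    """Reciprocal-lattice 'vector' for integer n."""
    return 2.0 * cmath.pi * n / L

# Build f(x) from a few known coefficients: a constant plus cos(x)/2.
coeffs = {0: 1.5, 1: 0.25, -1: 0.25}
xs = [i * L / N for i in range(N)]
f = [sum(c * cmath.exp(1j * G(n) * x) for n, c in coeffs.items())
     for x in xs]

def project(n):
    """Recover c_G by discrete projection on the real-space grid."""
    return sum(f[i] * cmath.exp(-1j * G(n) * xs[i])
               for i in range(N)) / N

print(round(project(1).real, 6))   # 0.25
print(round(abs(project(2)), 6))   # 0.0
```

In a real plane-wave DFT code the expansion is three-dimensional, truncated at a kinetic-energy cutoff, and applied to the valence orbitals only, with pseudopotentials standing in for the core electrons as noted above.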




HPCx: biological applications

 • Examples currently include:
       – NMR of proteins
       – hydroxyapatite (major component of bone)
       – chemical processes following stroke


 • Possibility of treating systems with a few hundred atoms on
   HPCx

 • May be used in conjunction with classical codes (e.g. DL_POLY) for
   detailed QM treatment of 'features of interest'




Castep 2003 HPCx performance gain

Al2O3 120-atom cell, 5 k-points

[Figure: job time (0-8000 s) vs total number of processors (80-320) for
the Jan-03 code and the current 'best' code.]
Castep 2003 HPCx performance gain

Al2O3 270-atom cell, 2 k-points

[Figure: job time (0-16000 s) vs total number of processors (128-512)
for the Jan-03 code and the current 'best' code.]
HPCx: biological applications

 • Castep (version 2) is written by:
       – M Segall, P Lindan, M Probert, C Pickard, P Hasnip, S Clark,
         K Refson, V Milman, B Montanari, M Payne.
       – ‘Easy’ to understand top-level code.


 • Castep is fully maintained and supported on HPCx

 • Castep is distributed by Accelrys Ltd

 • Castep is licensed free to UK academics by the UKCP
   consortium (contact ukcp@dl.ac.uk)




CHARMM, NWChem and GAMESS-UK




          Paul Sherwood
NWChem

• Objectives
     – Highly efficient and portable MPP computational chemistry package
     – Distributed data - scalable with respect to chemical system size
       as well as MPP hardware size
     – Extensible architecture
          • Object-oriented design
              – abstraction, data hiding, handles, APIs
          • Parallel programming model
              – non-uniform memory access, global arrays
          • Infrastructure
              – GA, Parallel I/O, RTDB, MA, …
     – Wide range of parallel functionality essential for HPCx

• Tools
     – Global Arrays (GA): portable distributed-data tool presenting a
       single, shared data structure over physically distributed data
          • used by CCP1 groups (e.g. MOLPRO)
     – PeIGS: parallel eigensolver with guaranteed orthogonality of
       eigenvectors
Distributed Data SCF

[Figure: pictorial representation of the iterative SCF process in (i) a
sequential process and (ii) a distributed-data parallel process. MOAO
represents the molecular orbitals, P the density matrix and F the Fock
or Hamiltonian matrix. Each cycle builds F from the integrals (V_XC,
V_Coul, V_1e), diagonalises it, and forms a new density until
convergence; the sequential version uses a serial eigensolver and dgemm,
while the distributed version uses PeIGS and ga_dgemm.]
NWChem

NWChem Capabilities (Direct, Semi-direct and conventional):
     – RHF, UHF, ROHF using up to 10,000 basis functions; analytic
       1st and 2nd derivatives.
     – DFT with a wide variety of local and non-local XC potentials,
       using up to 10,000 basis functions; analytic 1st and 2nd
       derivatives.
     – CASSCF; analytic 1st and numerical 2nd derivatives.
     – Semi-direct and RI-based MP2 calculations for RHF and
       UHF wave functions using up to 3,000 basis functions; analytic
       1st derivatives and numerical 2nd derivatives.
     – Coupled cluster, CCSD and CCSD(T) using up to 3,000
       basis functions; numerical 1st and 2nd derivatives of the CC
       energy.
     – Classical molecular dynamics and free energy simulations
       with the forces obtainable from a variety of sources

Case Studies - Zeolite Fragments

 • DFT calculations with Coulomb fitting
 • Fragments:
       – Si8O7H18 (347/832)
       – Si8O25H18 (617/1444)
       – Si26O37H36 (1199/2818)
       – Si28O67H30 (1687/3928)
 • Basis (Godbout et al.): DZVP - O, Si; DZVP2 - H
 • Fitting basis: DGAUSS-A1 - O, Si; DGAUSS-A2 - H
 • NWChem & GAMESS-UK: both codes use an auxiliary fitting basis for the
   Coulomb energy, with 3-centre 2-electron integrals held in core
DFT Coulomb Fit - NWChem

[Figure: measured times (seconds) on 32, 64 and 128 CPUs for Si26O37H36
(1199/2818) and Si28O67H30 (1687/3928) on CS7 AMD K7/1000 + SCI, CS9
P4/2000 + Myrinet 2k, CS2 QSNet Alpha Cluster, SGI Origin 3800/R14k-500,
IBM SP/p690 and AlphaServer SC ES45/1000.]
Memory-driven Approaches: NWChem - DFT
(LDA): Performance on the IBM SP/p690

 • Zeolite ZSM-5
 • DZVP basis (DZV_A2) and DGauss A1_DFT fitting basis:
       – AO basis: 3554
       – CD basis: 12713
 • 3-centre 2e-integrals = 1.00 x 10^12
 • Schwarz screening = 6.54 x 10^9
 • % 3c 2e-ints in core = 100%

 • IBM SP/p690, wall time (13 SCF iterations):
       – 64 CPUs = 9,184 seconds
       – 128 CPUs = 3,966 seconds
 • MIPS R14k-500 CPUs (Teras), wall time (13 SCF iterations):
       – 64 CPUs = 5,242 seconds
       – 128 CPUs = 3,451 seconds
    GAMESS-UK
   •     GAMESS-UK is the general-purpose ab initio molecular electronic structure
         program for performing SCF, MCSCF and DFT gradient calculations, together
         with a variety of techniques for post-Hartree-Fock calculations.

          – The program is derived from the original GAMESS code, obtained from Michel
            Dupuis in 1981 (then at the National Resource for Computational Chemistry, NRCC),
            and has been extensively modified and enhanced over the past decade.

          – This work has included contributions from numerous authors†, and has been
            conducted largely at the CCLRC Daresbury Laboratory, under the auspices of the
            UK's Collaborative Computational Project No. 1 (CCP1). Other major sources that
            have assisted in the on-going development and support of the program include
            various academic funding agencies in the Netherlands, and ICI plc.
   •     Additional information on the code may be found from links at:
                         http://www.dl.ac.uk/CFS
† M.F.Guest, J.H. Amos, R.J. Buenker, H.J.J. van Dam, M. Dupuis, N.C. Handy, I.H. Hillier, P.J.
Knowles, V. Bonacic-Koutecky van Lenthe, J. Kendrick, K. Schoffel & P. Sherwood, with
contributions from R.D., W. von Niessen, R.J. Harrison, A.P. Rendell, V.R. Saunders, A.J. Stone
and D. Tozer.

GAMESS-UK features 1.
       – Hartree Fock:
            •   Segmented/ GC + spherical harmonic basis sets
            •   SCF-Energies and Gradients: conventional, in-core, direct
            •   SCF-Frequencies: numerical and analytic 2nd derivatives
            •   Restricted, unrestricted open shell SCF and GVB.
       – Density Functional Theory
            • Energies + gradients, conventional and direct including Dunlap fit
            • B3LYP, BLYP, BP86, B97, HCTH, B97-1, FT97 & LDA functionals
            • Numerical 2nd derivatives (analytic implementation in testing)
       – Electron Correlation:
            •   MP2 energies, gradients and frequencies, Multi-reference MP2, MP3 Energies
            •   MCSCF and CASSCF Energies, gradients and numerical 2nd derivatives
            •   MR-DCI Energies, properties and transition moments (semi-direct module)
            •   CCSD and CCSD(T) Energies
            •   RPA (direct) and MCLR excitation energies / oscillator strengths, RPA gradients
            •   Full-CI Energies
             •   Green's function calculations of IPs
            •   Valence bond (Turtle)
GAMESS-UK features 2.
       – Molecular Properties:
            • Mulliken and Lowdin population analysis, Electrostatic Potential-Derived
              Charges
            • Distributed Multipole Analysis, Morokuma Analysis, Multipole Moments
            • Natural Bond Orbital (NBO) + Bader Analysis
            • IR and Raman Intensities, Polarizabilities & Hyperpolarizabilities
            • Solvation and Embedding Effects (DRF)
            • Relativistic Effects (ZORA)
       – Pseudopotentials:
            • Local and non-local ECPs.
       –   Visualisation: tools include CCP1 GUI
       –   Hybrid QM/MM (ChemShell + CHARMM QM/MM)
        –   Semi-empirical : MNDO, AM1, and PM3 Hamiltonians
       –   Parallel Capabilities:
            •   MPP and SMP implementations (GA tools)
            •   SCF/DFT energies, gradients, frequencies
            •   MP2 energies and gradients
            •   Direct RPA
Parallel Implementation of GAMESS-UK

 • Extensive use of Global Array (GA) Tools and Parallel Linear
   Algebra from NWChem Project (EMSL)
 • SCF and DFT
       –   Replicated data, but …
       –   GA Tools for caching of I/O for restart and checkpoint files
       –   Storage of 2-centre 2-e integrals in DFT Jfit
        –   Linear Algebra (via PeIGS, DIIS/MMOs, Inversion of 2c-2e matrix)
 • SCF and DFT second derivatives
       – Distribution of <vvoo> and <vovo> integrals via GAs
 • MP2 gradients
        – Distribution of <vvoo> and <vovo> integrals via GAs
 • Direct RPA Excited States
       – Replicated data with parallelisation of direct integral evaluation
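The replicated-data pattern used in the SCF/DFT modules can be sketched in a few lines. This is an illustrative sketch, not GAMESS-UK source: each "process" holds a full-size Fock matrix, fills in only its round-robin share of integral contributions (here a placeholder `integral` function), and a global sum, standing in for the GA/MPI all-reduce, leaves every process with the complete matrix.

```python
def partial_fock(n_basis, rank, nprocs, integral):
    """Build a full-size (replicated) Fock matrix, but compute only this
    rank's round-robin share of the integral contributions."""
    fock = [[0.0] * n_basis for _ in range(n_basis)]
    task = 0
    for i in range(n_basis):
        for j in range(n_basis):
            if task % nprocs == rank:  # static round-robin task assignment
                fock[i][j] = integral(i, j)
            task += 1
    return fock

def global_sum(partials):
    """Stand-in for the global sum (all-reduce) that combines the partial
    replicated matrices so every process ends up with the full result."""
    n = len(partials[0])
    return [[sum(m[i][j] for m in partials) for j in range(n)]
            for i in range(n)]

# Simulate 4 processes building a 3x3 matrix with a toy "integral" i + j.
integral = lambda i, j: float(i + j)
partials = [partial_fock(3, rank, 4, integral) for rank in range(4)]
fock = global_sum(partials)
```

Because every process holds the whole matrix, only the integral evaluation is distributed; the memory cost per process is unchanged, which is the usual limitation of replicated-data schemes.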


   GAMESS-UK: DFT Calculations

   Cyclosporin (DFT B3LYP): Basis 6-31G* (1855 GTOs)
   Valinomycin (DFT HCTH): Basis DZVP2_A2 (Dgauss) (1620 GTOs)

   [Charts: elapsed time and speedup on 32, 64 and 128 CPUs, comparing
   SGI Origin 3800/R14k-500, IBM SP/Regatta-H and AlphaServer ES45/1000.]
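The speedup curves in benchmarks like these are derived from elapsed times relative to the smallest CPU count measured. A minimal helper showing the convention (the timing numbers below are illustrative, not taken from the chart):

```python
def speedup(times, base_cpus):
    """Speedup relative to a baseline CPU count:
    S(p) = base_cpus * T(base_cpus) / T(p); ideal scaling gives S(p) = p.
    times: dict mapping CPU count -> elapsed seconds."""
    t_base = times[base_cpus]
    return {p: base_cpus * t_base / t for p, t in sorted(times.items())}

# Illustrative elapsed times (seconds) at 32, 64 and 128 CPUs.
times = {32: 4800.0, 64: 2600.0, 128: 1500.0}
s = speedup(times, 32)
```

With these example numbers, S(128) is about 102 rather than the ideal 128, i.e. roughly 80% parallel efficiency at 128 CPUs.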
  DFT Analytic 2nd Derivatives Performance
  IBM SP/p690, HP/Compaq SC ES45/1000 and SGI O3800

  (C6H4(CF3))2: Basis 6-31G (196 GTOs)

  [Chart: elapsed time on 32, 64 and 128 CPUs for the B3LYP and HCTH
  functionals on SGI Origin 3800/R14k-500, IBM SP/p690 and AlphaServer
  ES45/1000, with a CS14 PIII/1000 + Myrinet single-CPU reference.]

  Terms from MO 2e-integrals held in GA storage (CPHF & perturbed Fock
  matrices); calculation dominated by the CPHF step.
CHARMM

  • CHARMM (Chemistry at HARvard Macromolecular
    Mechanics) is a general purpose molecular mechanics,
    molecular dynamics and vibrational analysis package for
    modelling and simulation of the structure and behaviour of
    macromolecular systems (proteins, nucleic acids, lipids etc.)
  • Supports energy minimisation and MD approaches using a
    classical parameterised force field.
  • J. Comp. Chem. 4 (1983) 187-217
  • Parallel Benchmark - MD Calculation of Carboxy Myoglobin
    (MbCO) with 3830 Water Molecules.
  • QM/MM model for study of reacting species
        – incorporate the QM energy as part of the system into the force
          field
        – coupling between GAMESS-UK (QM) and CHARMM.
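The QM/MM energy partitioning described above can be sketched schematically. This is a toy illustration, not the actual CHARMM/GAMESS-UK interface: positions are 1-D, a Lennard-Jones + Coulomb stand-in replaces the MM force field, and a callback stands in for the external QM energy evaluation.

```python
def lj_energy(r, epsilon=0.1, sigma=1.0):
    """Lennard-Jones pair energy (stand-in for the MM van der Waals term)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb_energy(q1, q2, r):
    """Point-charge Coulomb interaction (Gaussian units for brevity)."""
    return q1 * q2 / r

def qmmm_energy(qm_atoms, mm_atoms, qm_callback):
    """Additive QM/MM energy: E = E_QM + E_MM + E_coupling.
    qm_atoms / mm_atoms: lists of (charge, 1-D position) tuples.
    qm_callback stands in for the external QM code (e.g. a GAMESS-UK
    single-point energy)."""
    # MM region: pairwise LJ + Coulomb between MM atoms only.
    e_mm = 0.0
    for i in range(len(mm_atoms)):
        for j in range(i + 1, len(mm_atoms)):
            qi, xi = mm_atoms[i]
            qj, xj = mm_atoms[j]
            r = abs(xi - xj)
            e_mm += lj_energy(r) + coulomb_energy(qi, qj, r)
    # QM region: delegated to the quantum code.
    e_qm = qm_callback(qm_atoms)
    # Coupling: here approximated as classical QM-MM Coulomb pairs.
    e_couple = 0.0
    for qi, xi in qm_atoms:
        for qj, xj in mm_atoms:
            e_couple += coulomb_energy(qi, qj, abs(xi - xj))
    return e_qm + e_mm + e_couple
```

In a real coupling the MM point charges also enter the QM Hamiltonian (electrostatic embedding), so the "coupling" term is computed inside the QM code rather than classically as here.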

       Parallel CHARMM Benchmark
        Benchmark MD Calculation of Carboxy Myoglobin (MbCO) with 3830
        Water Molecules: 14026 atoms, 1000 steps (1 ps), 12-14 Å shift.

        [Charts: elapsed time and speedup on 8-64 CPUs, comparing commodity
        clusters (CS2 QSNet Alpha Cluster/667, CS9 P4/2000 + Myrinet 2k,
        CS12 P4/2400 + Gbit Ether, CS10 P4/2666 + Myrinet, CS1 PIII/450 +
        FE/LAM), the Cray T3E/1200E, SGI Origin 3800/R14k-500, AlphaServer
        SC ES45/1000 and IBM SP/p690.]
        QM/MM Applications

 Triosephosphate isomerase (TIM)
 • Central reaction in glycolysis: catalytic interconversion of DHAP to GAP
 • Demonstration case within QUASI (partners UZH and BASF)
 • QM region: 35 atoms (DFT BLYP)
      – includes residues with possible proton donor/acceptor roles
      – GAMESS-UK, MNDO, TURBOMOLE
 • MM region: 4,180 atoms + 2 link atoms
      – CHARMM force field, implemented in CHARMM and DL_POLY

 [Chart: measured time on 8-64 CPUs for CS9 P4/2000 + Myrinet 2k,
 SGI Origin3800/R14k-500, AlphaServer SC ES45/1000 and IBM SP/Regatta-H;
 T(128, IBM SP/Regatta-H) = 143 secs.]
Sampling Methods
 – Multiple independent simulations
 – Replica exchange - Monte Carlo exchange of configurations between an
   ensemble of replicas at different temperatures
 – Combinatorial approach to ligand binding
 – Replica path method - simultaneously optimise a series of points
   defining a reaction path or conformational change, subject to path
   constraints.
      • Suitable for QM and QM/MM Hamiltonians
      • Parallelisation per point
      • Communication is limited to adjacent points on the path -
        global sum of energy function

 [Diagram: replicas P0-P36 distributed along an energy vs. reaction
 co-ordinate profile.]

 Collaboration with Bernie Brooks (NIH)
 http://www.cse.clrc.ac.uk/qcg/chmguk
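The Monte Carlo exchange step in replica exchange reduces to a single Metropolis test. A minimal sketch (the temperature ladder and energies below are illustrative placeholders; in a production run each replica is an independent parallel simulation and swaps are attempted between neighbours):

```python
import math
import random

def attempt_swap(beta_i, beta_j, energy_i, energy_j):
    """Metropolis criterion for exchanging configurations between replicas
    at inverse temperatures beta_i and beta_j.  Accept with probability
    min(1, exp[(beta_i - beta_j) * (energy_i - energy_j)])."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    if delta >= 0.0:
        return True
    return random.random() < math.exp(delta)

# One sweep of attempted swaps between adjacent replicas in a
# temperature ladder (energies here are illustrative placeholders).
betas = [1.0, 0.8, 0.6, 0.4]
energies = [-12.0, -10.5, -9.0, -7.0]
for k in range(len(betas) - 1):
    if attempt_swap(betas[k], betas[k + 1], energies[k], energies[k + 1]):
        energies[k], energies[k + 1] = energies[k + 1], energies[k]
```

Because a swap only needs the two energies involved, the communication pattern matches the slide: exchanges touch adjacent replicas only, so the method scales naturally to many processors.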
Summary

 • Many of the codes used by the community have quite poor
   scaling
 • Best cases
       – large quantum calculations (CRYSTAL, DFT etc.)
       – very large MD simulations (NAMD)

 • For credible consortium bid we need to focus on applications
   which have
       – acceptable scaling now (perhaps involving migration to new codes,
         e.g. NAMD)
       – heavy CPU or memory demands (e.g. CRYSTAL)
       – potential for algorithmic development to exploit 1000s of
         processors (e.g. pathway optimisation, Monte Carlo etc.)




								