					  HIGH PERFORMANCE
ELECTRONIC STRUCTURE
        THEORY

Mark S. Gordon, Klaus Ruedenberg
         Ames Laboratory
      Iowa State University



                 BBG
                OUTLINE


• Methods and Strategies
  – Correlated electronic structure methods
  – Distributed Data Interface (DDI)
  – Approaches to efficient HPC in chemistry
  – Scalability with examples
      CORRELATED ELECTRONIC
       STRUCTURE METHODS
• Well Correlated Methods Needed for
  – Accurate relative energies, dynamics
  – Treatment of excited states, photochemistry
  – Structures of diradicals, complex species
• Computationally demanding: Scalability important
• HF Often Reasonable Starting Point for Ground
  States, Small Diradical Character
  – Single reference perturbation theory
     • MP2/MBPT2 scales ~$N^5$
     • Size consistent
     • Higher order MBPT methods often perform worse
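To make the quoted scaling concrete, here is a minimal sketch that assumes the cost is exactly proportional to N^p, where N measures system size. Real timings also depend on prefactors, integral screening and hardware, and the function name is illustrative only.

```python
def cost_ratio(size_factor, exponent):
    """Relative cost increase when the system size N grows by size_factor,
    assuming cost is proportional to N**exponent."""
    return size_factor ** exponent

# Doubling the system size:
mp2_growth = cost_ratio(2, 5)        # ~N^5 method (MP2/MBPT2)
steeper_growth = cost_ratio(2, 7)    # ~N^7 method, e.g. CCSD(T), for comparison
print(mp2_growth, steeper_growth)    # 32 128
```

Under this assumption, doubling the molecule makes an MP2 calculation about 32 times more expensive, which is why scalability matters.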
SINGLE REFERENCE COUPLED
    CLUSTER METHODS
– Cluster expansion is more robust
  • Can sum all terms in expansion
  • Size-consistent
– State-of-the-art single reference method
  • CCSD, CCSDT, CCSDTQ, …
  • CCSD(T), CR-CCSD(T): efficient compromise
     – Scales ~$N^7$
  • Methods often fail for bond-breaking: consider N2
     – Breaking 3 bonds: σ + 2π
     – Minimal active space = (6,6)
            MCSCF METHODS
• Single configuration methods can fail for
  – Species with significant diradical character
  – Bond breaking processes
  – Often for excited electronic states
  – Unsaturated transition metal complexes
• Then MCSCF-based method is necessary
• Most common approach is
  – Complete active space SCF (CASSCF/FORS)
     • Active space = orbitals+electrons involved in process
     • Full CI within active space: optimize orbitals & CI coeffs
     • Size-consistent
   MULTI-REFERENCE METHODS
• Multi reference methods, based on MCSCF
  – Second order perturbation theory (MRPT2)
    • Relatively computationally efficient
    • Size consistency depends on implementation
  – Multi reference configuration interaction (MRCI)
    •   Very accurate, very time-consuming
    •   Highly resource demanding
    •   Most common is MR(SD)CI
    •   Generally limited to (14,14) active space
    •   Not size-consistent
  – How to improve efficiency?
       DISTRIBUTED PARALLEL
            COMPUTING

• Distribute large arrays among available
  processors
• Distributed Data Interface (DDI) in GAMESS
  – Developed by G. Fletcher, M. Schmidt, R. Olson
  – Based on one-sided message passing
  – Implemented on T3E using SHMEM
  – Implemented on clusters using sockets or MPI,
    and paired CPU/data server
The virtual shared-memory model. Each large box
(grey) represents the memory available to a given
CPU. The inner boxes represent the memory used
by the parallel processes (rank in lower right). The
gold region depicts the memory reserved for the
storage of distributed data. The arrows indicate
memory access (through any means) for the
distributed operations: get, put and accumulate.
FULL shared-memory model:

All DDI processes within a node attach to all the shared-memory segments.

The accumulate operation shown can now be completed directly through
memory.
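The get, put and accumulate operations described above can be illustrated with a toy model. This is a sketch only: the class and method names are hypothetical and are not the DDI API, and the "ranks" are simulated inside one Python process rather than by message passing or shared memory.

```python
class ToyDistributedArray:
    """Toy model of a column-distributed 2D array (NOT the DDI API).

    Each simulated rank owns a contiguous block of columns. get/put/acc
    locate the owning rank and act on its local storage, mimicking
    one-sided access to another process's memory.
    """

    def __init__(self, nrows, ncols, nranks):
        self.nrows, self.ncols, self.nranks = nrows, ncols, nranks
        base, extra = divmod(ncols, nranks)   # split columns as evenly as possible
        self.starts, self.local = [], []
        start = 0
        for rank in range(nranks):
            width = base + (1 if rank < extra else 0)
            self.starts.append(start)
            self.local.append([[0.0] * nrows for _ in range(width)])
            start += width

    def owner(self, col):
        """Return (rank, local column index) for a global column index."""
        if not 0 <= col < self.ncols:
            raise IndexError(col)
        for rank in reversed(range(self.nranks)):
            if col >= self.starts[rank]:
                return rank, col - self.starts[rank]

    def get(self, col):
        rank, j = self.owner(col)
        return list(self.local[rank][j])      # copy out, like a remote read

    def put(self, col, values):
        rank, j = self.owner(col)
        self.local[rank][j] = list(values)    # overwrite the remote column

    def acc(self, col, values):
        rank, j = self.owner(col)
        column = self.local[rank][j]
        for i, v in enumerate(values):        # add into the remote column
            column[i] += v


arr = ToyDistributedArray(nrows=2, ncols=5, nranks=2)
arr.put(3, [1.0, 2.0])
arr.acc(3, [0.5, 0.5])
print(arr.owner(3), arr.get(3))   # (1, 0) [1.5, 2.5]
```

The caller never needs to know which rank holds a column, which is the point of a virtual shared-memory layer.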
     CURRENTLY DDI ENABLED
• Currently implemented
  – Closed shell MP2 energies & gradients
    • Most efficient closed shell correlated method when
      appropriate (single determinant)
    • Geometry optimizations
    • Reaction path following
    • On-the-fly “direct dynamics”
  – Unrestricted open shell MP2 energies & gradients
    • Simplest correlated method for open shells
  – Restricted open shell (ZAPT2) energies & gradients
    • Most efficient open shell correlated method
    • No spin contamination through second order
    CURRENTLY DDI ENABLED

– CASSCF Hessians
  • Necessary for vibrational frequencies, transition state
    searches, building potential energy surfaces
– MRMP2 energies
  • Most efficient correlated multi-reference method
– Singles CI energies & gradients
  • Simplest qualitative method for excited electronic states
– Full CI energies
  • Exact wavefunction for a given atomic basis
– Effective fragment potentials
  • Sophisticated model for intermolecular interactions
                COMING TO DDI
• In progress
  – Vibronic (derivative) coupling (Tim Dudley)
     • Conical intersections, photochemistry
  – GVVPT2 energies & gradients (Mark Hoffmann)
  – ORMAS energies, gradients
     • Joe Ivanic, Andrey Asadchev
     • Subdivides CASSCF active space into subspaces
  – Coupled cluster methods
     • Ryan Olson, Ian Pimienta, Alistair Rendell
     • Collaboration w/ Piotr Piecuch, Ricky Kendall
• Key Point:
  – Must grow problem size to maximize scalability
     FULL CI: ZHENGTING GAN
– Full CI = exact wavefunction for given atomic basis
– Extremely computationally demanding
  • Scales ~$e^N$
  • Can generally only be applied to atoms & small
    molecules
  • Very important because all other approximate methods
    can be benchmarked against Full CI
  • Can expand the size of applicable molecules by making
    the method highly scalable/parallel
  • CI part of FORS/CASSCF
– Parallel performance for FCI on IBM P3 cluster
  • * singlet state of H3COH:
     – 14 electrons in 14 orbitals
     – 11,778,624 determinants
  • ** singlet state of H2O2
     – 14 electrons in 15 orbitals
     – 41,409,225 determinants
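The determinant counts quoted above can be reproduced with simple combinatorics. The sketch below assumes a singlet-type (Ms = 0) expansion with an even number of electrons and ignores spatial symmetry: the alpha and beta electrons are each distributed over the active orbitals independently. The function name is illustrative.

```python
from math import comb

def n_determinants(n_electrons, n_orbitals):
    """Number of Ms = 0 Slater determinants for a full CI, ignoring
    spatial symmetry: choose alpha and beta occupations independently."""
    if n_electrons % 2:
        raise ValueError("this sketch handles an even electron count only")
    n_alpha = n_electrons // 2
    return comb(n_orbitals, n_alpha) ** 2

print(n_determinants(14, 14))  # 11778624
print(n_determinants(14, 15))  # 41409225
```

Adding a single orbital to the (14,14) space more than triples the expansion, which illustrates the steep growth of full CI.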
[Figure: FCI speedup vs. number of processors (NProcs, 0 to 36) for FCI(14,14)* and FCI(14,15)**. JCP, 119, 47 (2003)]
– Parallel performance for FCI on Cray X1 (ORNL)
  • O-
      – Aug-cc-pVTZ atomic basis, O 1s orbitals frozen
      – 7 valence electrons in 79 orbitals
      – 14,851,999,576 determinants: ~8-10 Gflops sustained vs. 12.5 Gflops theoretical peak
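A rough estimate shows why an expansion of this size needs distributed memory. The sketch assumes one double-precision (8-byte) coefficient per determinant; the number of vectors actually held by a given CI algorithm is not stated here and is left out.

```python
def ci_vector_gigabytes(n_determinants, bytes_per_coeff=8):
    """Storage for one CI coefficient vector in GB (10**9 bytes),
    assuming one floating-point number per determinant."""
    return n_determinants * bytes_per_coeff / 1e9

size_gb = ci_vector_gigabytes(14_851_999_576)
print(f"{size_gb:.1f} GB per CI vector")   # 118.8 GB per CI vector
```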
[Figure: FCI speedup vs. number of MSPs (0 to 256) on the Cray X1]


– Latest results: aug-cc-pVTZ C2, 8 electrons in 68 orbitals
– 64,931,348,928 determinants, < 4 hours wall time!
– Comparison with Coupled Cluster

 C2 vertical excitation energies (eV):

                      cc-pVTZ   cc-pVQZ   a-cc-pVTZ   a-cc-pVQZ
 ¹Δg (1Ag):
   EOM-CCSD            4.68      4.76      4.67        4.76
   CR-EOM-CCSD(T)      2.48      2.48      2.56        2.57
   FCI                 2.18
 ¹Πu (1B2u):
   EOM-CCSD            1.33      1.30      1.32        1.30
   CR-EOM-CCSD(T)      1.31      1.30      1.30        1.30
   FCI                 1.28
 ¹Σu+ (1B1u):
   EOM-CCSD            5.62      5.56      5.58        5.55
   CR-EOM-CCSD(T)      5.82      5.51      5.52        5.51
   FCI                 5.47
 ¹Πg (1B2g):
   EOM-CCSD            6.49      6.53      6.45        6.51
   CR-EOM-CCSD(T)      4.45      4.42      4.50        4.50
   FCI                 4.38
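The benchmark role of FCI can be made explicit by tabulating each method's deviation from it. The sketch below copies the first numeric column of the table above and assumes that the single FCI value listed for each state belongs to that same column; states are keyed by the labels given in parentheses.

```python
# Excitation energies in eV, first numeric column of the table above.
energies = {
    "1Ag":  {"EOM-CCSD": 4.68, "CR-EOM-CCSD(T)": 2.48, "FCI": 2.18},
    "1B2u": {"EOM-CCSD": 1.33, "CR-EOM-CCSD(T)": 1.31, "FCI": 1.28},
    "1B1u": {"EOM-CCSD": 5.62, "CR-EOM-CCSD(T)": 5.82, "FCI": 5.47},
    "1B2g": {"EOM-CCSD": 6.49, "CR-EOM-CCSD(T)": 4.45, "FCI": 4.38},
}

def errors_vs_fci(table):
    """Return {state: {method: E(method) - E(FCI)}} in eV, rounded to 0.01."""
    result = {}
    for state, row in table.items():
        fci = row["FCI"]
        result[state] = {
            method: round(value - fci, 2)
            for method, value in row.items()
            if method != "FCI"
        }
    return result

errors = errors_vs_fci(energies)
for state, errs in errors.items():
    print(state, errs)
```

With these numbers EOM-CCSD deviates from FCI by more than 2 eV for the 1Ag and 1B2g states, while CR-EOM-CCSD(T) stays within about 0.35 eV for all four.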
 Correlation Energy Extrapolation by Intrinsic Scaling: CEEIS
              An Alternative Approach to Full CI
Rigorous variational energy
Determined as complete basis set (CBS) limit of FCI calculations in terms
of systematically consistent basis sets.
Extrapolation to the CBS limit
Extrapolation formulas for Dunning DZ, TZ, ... XZ AO bases
FCI for a given orbital basis
Requires solution of the eigenvalue problem for Ψ = sum of ALL Slater
determinants generated by the orbital basis. This is impossible because the
expansions are much too long.

However: They contain over 99% deadwood.
Question: How to select a priori all live wood that is required for
achieving an accuracy of 1 mh/molecule ≈ 0.6 kcal/mol in the energy?

J. Chem. Phys. 121, 10852 (2004)       J. Chem. Phys. 121, 10905 (2004)
J. Chem. Phys. 121, 10919 (2004)       J. Chem. Phys. 122, 154110 (2005)
                        Full CI for a given orbital basis
Natural orbital ordering for a wavefunction $\Psi$
   Occupations > 0.1:           Principal NOs
   Occupations < 0.1:           Secondary NOs, "dynamically correlating"
Correspondingly ordered determinant expansion of CI wavefunction
    $\Psi = \Psi_0 + \Psi_{\mathrm{corr}}$,   $\Psi_{\mathrm{corr}}$ = correlating part of $\Psi$
    $\Psi_0$: Zeroth-order wavefunction contains only principal NOs
        SCF (one determinant) or MCSCF (many determinants)
Dynamic correlation term
    $\Psi_{\mathrm{corr}} = \sum_x \Psi_x = \Psi_1 + \Psi_2 + \Psi_3 + \dots + \Psi_n + \dots$
    $\Psi_x = \sum_k c_{xk} \Phi_{xk}$
    $\{\Phi_{xk}\}$ = all x-tuple excitations with respect to $\Psi_0$,
                  i.e. all determinants containing x secondary orbitals
 While $\Psi_{\mathrm{corr}} = \sum_x \Psi_x$ converges fast (6-8 terms for mh accuracy),
 the determinantal expansions $\Psi_x = \sum_k c_{xk} \Phi_{xk}$ converge very slowly.
 But, for $n > 3$, they contain over 99% deadwood.
   Correlation energy as a sum of incremental contributions from
                    successive excitation levels
Preliminary calculations
    $\Psi_0$ by SCF or (small) MCSCF
    Full SD-CI → SD-CI-NOs
All excitations generated from these natural orbitals
Resolution in terms of NO-based excitation contributions
    Excitation increment sum:   $E_{\mathrm{Total}} - E_0 = E_{\mathrm{corr}} = \sum_x \Delta E(x)$
    $\Delta E(x) = E(x) - E(x-1)$ = incremental contribution from x-tuple excitations
    where $E(x)$ = total energy up to and including x-tuple excitations

Incremental contributions $\Delta E(x)$ as orbital limits
    $\Delta E(x) = \lim_{m \to M} \Delta E(x|m)$,   where
    $\Delta E(x|m)$ = analogous to $\Delta E(x)$ above, except that only the first m
    correlating natural orbitals are used,
    and M = total number of correlating orbitals
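The increment sum is a telescoping series, which a few lines of code make explicit. The energies below are hypothetical numbers chosen for illustration only, and the function name is not taken from any CEEIS code.

```python
def increments(cumulative):
    """Given E(x) for x = 0, 1, 2, ... (E(0) = zeroth-order energy),
    return the increments dE(x) = E(x) - E(x-1) for x >= 1."""
    return [cumulative[x] - cumulative[x - 1] for x in range(1, len(cumulative))]

# Hypothetical cumulative energies E(0)..E(4) in hartree (illustration only).
E = [-100.000, -100.002, -100.250, -100.262, -100.270]

dE = increments(E)
e_corr = sum(dE)

# The increments telescope, so their sum equals E(last) - E(0).
assert abs(e_corr - (E[-1] - E[0])) < 1e-12
print([round(d, 3) for d in dE])
```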
 New relations between contributions from different excitation levels x

Considering $\Delta E(x|m)$ as a function of m for fixed x, we infer:
    The values for $x \ge 4$ are related to those for lower x by
            $\Delta E(x|m) = a_x \Delta E(x-2|m) + c_x$
    whence for $x > 4$:
    $\Delta E(x; x-1|m) = [E(x|m) - E(x-2|m)] = A_x \Delta E(2|m) + B_x \Delta E(3|m) + C_x$
    and also:
    $[E(\text{all excitations}|m) - E(3|m)] = A \Delta E(2|m) + B \Delta E(3|m) + C$

Controlled energy extrapolation by intrinsic scaling (CEEIS)
    (i) Obtain coefficients A, B, C by LMS fitting to low values of m
        with a moderate number of determinants.
    (ii) The desired values of $\Delta E(x) = \Delta E(x|M)$ for $x \ge 4$
         are then obtained from the values for x = 2, 3 and m = M.
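The two-step procedure can be sketched for the simplest of the relations above, the linear dependence of the quadruples increment on the doubles increment, dE(4|m) = a4 dE(2|m) + c4. All numbers are hypothetical, the fit is an ordinary least-squares line, and the three-coefficient form with A, B, C would need a two-variable fit instead.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit ys ~ a * xs + c; returns (a, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

# Hypothetical increments (mh) for truncated orbital sets of increasing size m.
dE2_low_m = [-150.0, -170.0, -185.0, -195.0]   # dE(2|m): cheap for any m
dE4_low_m = [-6.1, -7.1, -7.85, -8.35]         # dE(4|m): affordable only for small m

# Step (i): fit the scaling coefficients on the small-m data.
a4, c4 = fit_line(dE2_low_m, dE4_low_m)

# Step (ii): dE(2|M) for the full orbital set is still cheap to compute,
# so the fitted line yields an estimate of the expensive dE(4|M).
dE2_full = -230.0
dE4_estimate = a4 * dE2_full + c4
print(round(a4, 3), round(c4, 3), round(dE4_estimate, 3))
```

The expensive high-excitation calculation is therefore only ever done in small orbital spaces, and the full-space value comes from the fit.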
                Accurate binding energies of C2, N2, O2, F2
(i) Full CI energies, including all valence correlations, are determined using the
    CEEIS extrapolation for the cc-pV2Z, cc-pV3Z, cc-pV4Z basis sets
(ii) The FCI energies are extrapolated to the complete basis set (CBS) limit
     SCF energy:               $E_X(\mathrm{SCF}) = E_{\mathrm{CBS}}(\mathrm{SCF}) + c \exp(-\alpha X)$
     Correlation energy:       $E_X(\mathrm{Corr}) = E_{\mathrm{CBS}}(\mathrm{Corr}) + a X^{-3}$
   Yields the non-relativistic, valence-only-correlated energies of the four
   molecules and the corresponding atoms.
(iii) Experimentally known are the total atomic energies as sum of ionization
     potentials and the total molecular dissociation energies.
(iv) To relate these theoretical and experimental quantities, one must account for
    the following effects:
     • Scalar relativistic effects in atoms and molecules,
     • Spin-orbit coupling in the atoms C, O, F,
     • Zero-point vibrational and low rotational energies in molecules,
     • In-core and core-valence electron correlations.
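A basis-set extrapolation of the correlation energy can be sketched as a two-point formula. This example assumes the widely used inverse-cubic form E_X = E_CBS + a X^-3, where X is the cardinal number of the basis (3 for triple zeta, 4 for quadruple zeta); that form and the numbers below are assumptions for illustration, not values from this work.

```python
def cbs_two_point(e_x, x, e_y, y):
    """Solve E_X = E_CBS + a * X**-3 for E_CBS using two cardinal numbers."""
    wx, wy = x ** 3, y ** 3
    return (wx * e_x - wy * e_y) / (wx - wy)

# Hypothetical correlation energies (hartree) built from E_CBS = -0.400, a = 0.27.
e3 = -0.400 + 0.27 / 3 ** 3    # X = 3
e4 = -0.400 + 0.27 / 4 ** 3    # X = 4

e_cbs = cbs_two_point(e3, 3, e4, 4)
print(round(e_cbs, 6))
```

Multiplying each energy by X^3 and subtracting cancels the unknown coefficient a, which is why two basis sets suffice for this form.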
Comparison of CEEIS-FCI-CBS and experimental energies for C2, N2, O2, F2

Energy (mh)                         C            C2        2C → C2
Experimentally Measured         37 852.0     75 935.6    231.8 ± 0.8
Vibration - Rotation                 0.0          4.2      4.2
Scalar Relativistic                  6.65        13.0      0.3
Spin Orbit Coupling                  0.15         0.0      0.3
Core Correlations                   55.0        112.4      2.4
Nonrelativistic Valence Total   37 790.0     75 814.1    234.1
CEEIS - FCI - CBS               37 790.6     75 813.5   -232.3

Energy (mh)                         N            N2        2N → N2
Experimentally Measured         54 610.0    109 578.5    358.5 ± 0.04
Vibration - Rotation                 0.0          5.4      5.4
Scalar Relativistic                 20.7         41.2      0.2
Spin Orbit Coupling                  0.0          0.0      0.0
Core Correlations                   58.8        119.0      1.4
Nonrelativistic Valence Total   54 530.5    109 423.7
CEEIS - FCI - CBS               54 531.2
Comparison of experimental and CEEIS-FCI-CBS energies for C2, N2, O2, F2

Energy (mh)                         O            O2        2O → O2
Experimentally Measured          5 106.45   150 400.9          ± 0.002
Vibration - Rotation                 0.0          3.6      3.6
Scalar Relativistic                 38.35        76.4      0.3
Spin Orbit Coupling                  0.35         0.0      0.7
Core Correlations                               124.9      0.7
Nonrelativistic Valence Total   75 005.3    150 202.5    191.9
CEEIS - FCI - CBS               75 006.4    150 204.0    191.2

Energy (mh)                         F            F2        2F → F2
Experimentally Measured         99 785.3                 -58.9 ± 0.2
Vibration - Rotation                 0.0          2.1      2.1
Scalar Relativistic                               0.0
Spin Orbit Coupling                  0.6          0.0      1.2
Core Correlations                               130.8      0.0
Nonrelativistic Valence Total   99 668.7    199 399.6
CEEIS - FCI - CBS               99 669.5    199 399.3     60.3
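The agreement between the last two rows of these tables can be expressed in kcal/mol. The sketch below uses the total energies (in mh) for every species where both rows have an entry, and the standard conversion of 627.509 kcal/mol per hartree. It compares total energies only, not the binding-energy column.

```python
HARTREE_TO_KCAL = 627.509  # kcal/mol per hartree

# (nonrelativistic valence total, CEEIS-FCI-CBS) in mh, from the tables above.
totals_mh = {
    "C":  (37790.0, 37790.6),
    "C2": (75814.1, 75813.5),
    "N":  (54530.5, 54531.2),
    "O":  (75005.3, 75006.4),
    "O2": (150202.5, 150204.0),
    "F":  (99668.7, 99669.5),
    "F2": (199399.6, 199399.3),
}

def deviation_kcal(reference_mh, computed_mh):
    """Absolute deviation converted from millihartree to kcal/mol."""
    return abs(computed_mh - reference_mh) * 1e-3 * HARTREE_TO_KCAL

deviations = {s: deviation_kcal(*pair) for s, pair in totals_mh.items()}
worst = max(deviations, key=deviations.get)
print(worst, round(deviations[worst], 2))   # O2 0.94
```

For these entries the largest deviation is 1.5 mh, which is just under 1 kcal/mol.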
CEEIS-FCI vs. Complete FCI: Determinants Required
         for C2, N2, O2, F2 (cc-pVQZ Basis)

            C2             N2             O2             F2
CEEIS    6.4 x 10^7     3.2 x 10^7     2.0 x 10^8     1.1 x 10^8
FCI      3.6 x 10^12    1.6 x 10^15    1.7 x 10^17    3.7 x 10^19
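The saving implied by this table is easiest to see as a ratio of determinant counts. A short sketch using the values above:

```python
ceeis = {"C2": 6.4e7, "N2": 3.2e7, "O2": 2.0e8, "F2": 1.1e8}
fci = {"C2": 3.6e12, "N2": 1.6e15, "O2": 1.7e17, "F2": 3.7e19}

def reduction_factors(full, reduced):
    """Ratio of full-CI to CEEIS determinant counts for each molecule."""
    return {mol: full[mol] / reduced[mol] for mol in full}

factors = reduction_factors(fci, ceeis)
for mol, factor in factors.items():
    print(f"{mol}: {factor:.1e}")
```

The reduction grows from roughly four orders of magnitude for C2 to more than eleven for F2.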
Full Potential Energy Surfaces
F2 potential energy curves: cc-pVTZ

[Figure: binding energy E(F2) - E(2F) in mh (-60 to 60) vs. R(F-F) in Angstroms (1.0 to 3.0). Curves: CEEIS, completely renormalized CCSD(T), CCSD(T), CCSDT]
                            Summary

CEEIS deduces the correlation contributions of quadruple and
higher excitation levels from those of single, double and triple
excitations.

The FCI energy, generated from a given atomic basis, can be
obtained from energy values calculated in only a very small part
of the full configuration space, e.g. $10^7$ vs. $10^{19}$ determinants for
F2 with a QZ basis.

Combining these CEEIS full CI energies with extrapolation to
the CBS limit, the complete full CI energy is approached to
chemical accuracy.

The binding energies obtained agree with experimental values
within the chemical accuracy criterion of 1 kcal/mol.
MCSCF HESSIANS: TIM DUDLEY

– Analytic Hessians generally superior to numerical
  or semi-numerical
– Finite displacements frequently cause artificial
  symmetry breaking or root flipping
– Necessary step for derivative coupling
– Computationally demanding: Parallel efficiency
  desirable
– DDI-based MCSCF Hessians
– IBM clusters, 64-bit Linux
[Figure: Speedup of CAS(2|3) Hessian calculation of a cyclopentadienyl complex of zirconium vs. # processors (1 to 9). Linear fit: y = 0.95x + 0.11, R^2 = 0.9997. 304 basis functions, small active space; dominated by calculation of derivative integrals.]
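The linear fit quoted for the CAS(2|3) Hessian, y = 0.95x + 0.11, can be turned into a parallel efficiency. The sketch below simply evaluates the fitted line and divides by the processor count; it assumes the fit is only meaningful inside the measured range of 1 to 9 processors.

```python
def fitted_speedup(n_procs, slope, intercept):
    """Speedup predicted by a linear fit y = slope * x + intercept."""
    return slope * n_procs + intercept

def efficiency(speedup, n_procs):
    """Parallel efficiency = speedup / number of processors."""
    return speedup / n_procs

s8 = fitted_speedup(8, slope=0.95, intercept=0.11)
e8 = efficiency(s8, 8)
print(f"speedup {s8:.2f}, efficiency {e8:.3f}")
```

On 8 processors the fit gives a speedup of about 7.7, an efficiency of roughly 96%.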
[Figure: Speedup of CAS(2|3) Hessian calculation of zirconium cyclopentadienyl complex vs. # CPUs (0 to 128). Curves: derivative integrals, CPMCSCF, total frequency calculation.]
[Figure: Speedup of CAS(16|12) Hessian calculation of silicon dioxide vs. # processors (1 to 9). Linear fit: y = 0.97x + 0.03, R^2 = 1. Large active space, small AO basis; dominated by calculation of CI blocks of H.]
[Figure: Speedup of CAS(16|12)/6-31G* Hessian calculation of silicon dioxide vs. # CPUs (0 to 128). Curve: total frequency calculation.]
[Figure: Speedup of CAS(10|9) Hessian calculation of 7-azaindole vs. # processors (1 to 9). Linear fit: y = 0.96x + 0.09, R^2 = 0.9995. 216 basis functions, full π active space; calculation is a mix of all bottlenecks.]
[Figure: Speedup of CAS(10|9)/TZV Hessian calculation of 7-azaindole vs. # CPUs (0 to 128). Curves: derivative integrals, CPMCSCF, total frequency calculation.]
       ZAPT2 BENCHMARKS

• IBM p640 nodes connected by dual Gigabit Ethernet
   – 4 Power3-II processors at 375 MHz
   – 16 GB memory
• Tested
   –   Au3H4
   –   Au3O4
   –   Au5H4
   –   Ti2Cl2Cp4
   –   Fe-porphyrin: imidazole
                      Au3H4
• Basis set
   – aug-cc-pVTZ on H
   – uncontracted SBKJC with 3f2g polarization
     functions and one diffuse sp function on Au
   – 380 spherical harmonic basis functions
• 31 DOCC, 1 SOCC
• 9.5 MWords replicated
• 170 MWords distributed
[Figure: structure of Au3H4]
                      Au3O4
• Basis set
   – aug-cc-pVTZ on O
   – uncontracted SBKJC with 3f2g polarization
     functions and one diffuse sp function on Au
   – 472 spherical harmonic basis functions
• 44 DOCC, 1 SOCC
• 20.7 MWords replicated
• 562 MWords distributed
[Figure: structure of Au3O4]
                      Au5H4
• Basis set
   – aug-cc-pVTZ on H
   – uncontracted SBKJC with 3f2g polarization
     functions and one diffuse sp function on Au
   – 572 spherical harmonic basis functions
• 49 DOCC, 1 SOCC
• 30.1 MWords replicated
• 1011 MWords distributed
[Figure: structure of Au5H4]
                  Ti2Cl2Cp4
• Basis set
   – TZV
   – 486 basis functions (N = 486)
• 108 DOCC, 2 SOCC
• 30.5 MWords replicated
• 2470 MWords distributed
      Fe-porphyrin: imidazole
• Two basis sets
   – MIDI with d polarization functions (N = 493)
   – TZV with d,p polarization functions (N = 728)
• 110 DOCC, 2 SOCC
• N = 493
   – 32.1 MWords replicated
   – 2635 MWords distributed
• N = 728
   – 52.1 MWords replicated
   – 5536 MWords distributed
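The memory figures on these benchmark slides can be converted into more familiar units. The sketch assumes that one word is a 64-bit (8-byte) quantity and that distributed data is spread evenly over the processes. The 64-process count is the one used in the load-balancing runs, and the function names are illustrative.

```python
BYTES_PER_WORD = 8  # assumption: one word = one 64-bit quantity

def mwords_to_gb(mwords):
    """Convert millions of words to gigabytes (10**9 bytes)."""
    return mwords * 1e6 * BYTES_PER_WORD / 1e9

def per_process_gb(replicated_mw, distributed_mw, n_procs):
    """Memory per process: a full replicated copy plus an equal share
    of the distributed data (assumes an even distribution)."""
    return mwords_to_gb(replicated_mw) + mwords_to_gb(distributed_mw) / n_procs

# Fe-porphyrin: imidazole with N = 728 on 64 processes
print(round(mwords_to_gb(5536), 1))               # 44.3
print(round(per_process_gb(52.1, 5536, 64), 2))   # 1.11
```

The replicated part is paid by every process, while the distributed part shrinks as processes are added.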
[Figure: Speedup curve, speedup (0 to 80) vs. number of processors (0 to 70). Curves: Au3H4 (380), Au3O4 (472), Au5H4 (570), Ti2Cl2Cp4 (486), Fe-porphyrin (493), Fe-porphyrin (728), Linear]
            Load Balancing
• Au3H4 on 64 processors
   – Total CPU time ranged from 1124 to 1178 sec.
   – Master spent 1165 sec.
   – average: 1147 sec.
   – standard deviation: 13.5 sec.
• Large Fe-porphyrin on 64 processors
   – Total CPU time ranged from 50679 to 51448 sec.
   – Master spent 50818 sec.
   – average: 51024 sec.
   – standard deviation: 162 sec.
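The timing statistics above can be condensed into a few load-balance indicators. The three ratios below are common ad hoc measures chosen for this sketch rather than quantities defined on the slides.

```python
def load_balance(min_s, max_s, mean_s, std_s):
    """Simple load-balance indicators from per-process CPU times (seconds)."""
    return {
        "spread_pct": 100 * (max_s - min_s) / mean_s,      # total range / mean
        "imbalance_pct": 100 * (max_s - mean_s) / mean_s,  # slowest process vs. mean
        "cv_pct": 100 * std_s / mean_s,                    # standard deviation / mean
    }

au3h4 = load_balance(1124, 1178, 1147, 13.5)
fe_porphyrin = load_balance(50679, 51448, 51024, 162)

for name, stats in (("Au3H4", au3h4), ("Fe-porphyrin", fe_porphyrin)):
    print(name, {k: round(v, 2) for k, v in stats.items()})
```

By these measures the slowest process exceeds the mean by under 3% for Au3H4 and under 1% for the large Fe-porphyrin run, so the larger problem is the better balanced of the two.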
           THANKS!

• GAMESS Gang
• DOE SciDAC program
• IBM SUR grants

				