
HIGH PERFORMANCE ELECTRONIC STRUCTURE THEORY
Mark S. Gordon, Klaus Ruedenberg
Ames Laboratory
Iowa State University

OUTLINE
• Methods and Strategies
  – Correlated electronic structure methods
  – Distributed Data Interface (DDI)
  – Approaches to efficient HPC in chemistry
  – Scalability with examples

CORRELATED ELECTRONIC STRUCTURE METHODS
• Well Correlated Methods Needed for
  – Accurate relative energies, dynamics
  – Treatment of excited states, photochemistry
  – Structures of diradicals, complex species
• Computationally demanding: Scalability important
• HF Often Reasonable Starting Point for Ground States, Small Diradical Character
  – Single reference perturbation theory
    • MP2/MBPT2 Scales ~$N^5$
    • Size consistent
    • Higher order MBPT methods often perform worse

SINGLE REFERENCE COUPLED CLUSTER METHODS
  – Cluster expansion is more robust
    • Can sum all terms in expansion
    • Size-consistent
  – State-of-the-art single reference method
    • CCSD, CCSDT, CCSDTQ, …
    • CCSD(T), CR-CCSD(T): efficient compromise
  – Scales ~$N^7$
• Methods often fail for bond-breaking: consider N2
  – Breaking 3 bonds: $\sigma + \pi$
  – Minimal active space = (6,6)

MCSCF METHODS
• Single configuration methods can fail for
  – Species with significant diradical character
  – Bond breaking processes
  – Often for excited electronic states
  – Unsaturated transition metal complexes
• Then MCSCF-based method is necessary
• Most common approach is
  – Complete active space SCF (CASSCF/FORS)
    • Active space = orbitals + electrons involved in process
    • Full CI within active space: optimize orbitals & CI coeffs
    • Size-consistent

MULTI-REFERENCE METHODS
• Multi reference methods, based on MCSCF
  – Second order perturbation theory (MRPT2)
    • Relatively computationally efficient
    • Size consistency depends on implementation
  – Multi reference configuration interaction (MRCI)
    • Very accurate, very time-consuming
    • Highly resource demanding
    • Most common is MR(SD)CI
    • Generally limited to (14,14) active space
    • Not size-consistent
  – How to improve efficiency?

DISTRIBUTED PARALLEL COMPUTING
• Distribute large arrays among available processors
• Distributed Data Interface (DDI) in GAMESS
  – Developed by G. Fletcher, M. Schmidt, R. Olson
  – Based on one-sided message passing
  – Implemented on T3E using SHMEM
  – Implemented on clusters using sockets or MPI, and paired CPU/data server

Figure: The virtual shared-memory model. Each large box (grey) represents the memory available to a given CPU. The inner boxes represent the memory used by the parallel processes (rank in lower right). The gold region depicts the memory reserved for the storage of distributed data. The arrows indicate memory access (through any means) for the distributed operations: get, put and accumulate.

Figure: FULL shared-memory model. All DDI processes within a node attach to all the shared-memory segments. The accumulate operation shown can now be completed directly through memory.

CURRENTLY DDI ENABLED
• Currently implemented
  – Closed shell MP2 energies & gradients
    • Most efficient closed shell correlated method when appropriate (single determinant)
    • Geometry optimizations
    • Reaction path following
    • On-the-fly "direct dynamics"
  – Unrestricted open shell MP2 energies & gradients
    • Simplest correlated method for open shells
  – Restricted open shell (ZAPT2) energies & gradients
    • Most efficient open shell correlated method
    • No spin contamination through second order

CURRENTLY DDI ENABLED
  – CASSCF Hessians
    • Necessary for vibrational frequencies, transition state searches, building potential energy surfaces
  – MRMP2 energies
    • Most efficient correlated multi-reference method
  – Singles CI energies & gradients
    • Simplest qualitative method for excited electronic states
  – Full CI energies
    • Exact wavefunction for a given atomic basis
  – Effective fragment potentials
    • Sophisticated model for intermolecular interactions

COMING TO DDI
• In progress
  – Vibronic (derivative) coupling (Tim Dudley)
    • Conical intersections, photochemistry
  – GVVPT2 energies & gradients: Mark Hoffmann
  – ORMAS energies, gradients
    • Joe Ivanic, Andrey Asadchev
    • Subdivides CASSCF active space into subspaces
  – Coupled cluster methods
    • Ryan Olson, Ian Pimienta, Alistair Rendell
    • Collaboration w/ Piotr Piecuch, Ricky Kendall
• Key Point:
  – Must grow problem size to maximize scalability

FULL CI: ZHENGTING GAN
  – Full CI = exact wavefunction for given atomic basis
  – Extremely computationally demanding
    • Scales ~$e^N$
    • Can generally only be applied to atoms & small molecules
    • Very important because all other approximate methods can be benchmarked against Full CI
    • Can expand the size of applicable molecules by making the method highly scalable/parallel
    • CI part of FORS/CASSCF
  – Parallel performance for FCI on IBM P3 cluster
    • * singlet state of H3COH:
      – 14 electrons in 14 orbitals
      – 11,778,624 determinants
    • ** singlet state of H2O2:
      – 14 electrons in 15 orbitals
      – 41,409,225 determinants
    [Figure: Speedup vs. NProcs for FCI(14,14)* and FCI(14,15)**. JCP, 119, 47 (2003)]
  – Parallel performance for FCI on Cray X1 (ORNL)
    • O$^-$
      – aug-cc-pVTZ atomic basis, O 1s orbitals frozen
      – 7 valence electrons in 79 orbitals
      – 14,851,999,576 determinants: ~8-10 Gflops/12.5 theoretical
    [Figure: SpeedUp vs. number of MSPs]
  – Latest results: aug-cc-pVTZ C2, 8 electrons in 68 orbitals
    – 64,931,348,928 determinants, < 4 hours wall time!
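
The determinant counts quoted for the two IBM P3 benchmarks can be checked with a short calculation. The sketch below assumes a singlet with equal numbers of alpha and beta electrons and no use of point-group symmetry, so the count is a product of two binomial coefficients. The function name `n_determinants` is illustrative and is not part of GAMESS.

```python
from math import comb


def n_determinants(n_orbitals: int, n_alpha: int, n_beta: int) -> int:
    """Count Slater determinants with fixed alpha/beta occupations.

    Assumption: spatial symmetry is ignored, so every way of placing the
    alpha electrons and every way of placing the beta electrons is counted.
    """
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)


# FCI(14,14): 14 electrons (7 alpha, 7 beta) in 14 orbitals
print(n_determinants(14, 7, 7))  # 11778624

# FCI(14,15): 14 electrons (7 alpha, 7 beta) in 15 orbitals
print(n_determinants(15, 7, 7))  # 41409225
```

Both values match the counts on the slide. Applied to the Cray X1 cases, the same formula gives larger numbers than those quoted, presumably because those counts exploit spatial symmetry; that reading is an assumption, not something the slides state.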
  – Comparison with Coupled Cluster

C2 vertical excitation energies (eV)
Columns: cc-pVTZ | cc-pVQZ | a-cc-pVTZ | a-cc-pVQZ

$^1\Delta_g$ ($^1A_g$), FCI = 2.18
  EOM-CCSD       | 4.68 | 4.76 | 4.67 | 4.76
  CR-EOM-CCSD(T) | 2.48 | 2.48 | 2.56 | 2.57
$^1\Pi_u$ ($^1B_{2u}$), FCI = 1.28
  EOM-CCSD       | 1.33 | 1.30 | 1.32 | 1.30
  CR-EOM-CCSD(T) | 1.31 | 1.30 | 1.30 | 1.30
$^1\Sigma_u^+$ ($^1B_{1u}$), FCI = 5.47
  EOM-CCSD       | 5.62 | 5.56 | 5.58 | 5.55
  CR-EOM-CCSD(T) | 5.82 | 5.51 | 5.52 | 5.51
$^1\Pi_g$ ($^1B_{2g}$), FCI = 4.38
  EOM-CCSD       | 6.49 | 6.53 | 6.45 | 6.51
  CR-EOM-CCSD(T) | 4.45 | 4.42 | 4.50 | 4.50

Correlation Energy Extrapolation by Intrinsic Scaling: CEEIS
An Alternative Approach to Full CI

Rigorous variational energy
  Determined as complete basis set (CBS) limit of FCI calculations in terms of systematically consistent basis sets.
Extrapolation to the CBS limit
  Extrapolation formulas for Dunning DZ, TZ, ... XZ AO bases
FCI for a given orbital basis
  Requires solution of the eigenvalue problem for $\Psi$ = sum of ALL Slater determinants generated by the orbital basis. This is impossible because the expansions are much too long. However: They contain over 99% deadwood.
Question: How to select a priori all live wood that is required for achieving an accuracy of 1 mh/molecule ≈ 0.6 kcal/mole in the energy?

J. Chem. Phys. 121, 10852 (2004)
J. Chem. Phys. 121, 10905 (2004)
J. Chem. Phys. 121, 10919 (2004)
J. Chem. Phys. 122, 154110 (2005)

Full CI for a given orbital basis

Natural orbital ordering for a wavefunction
  Occupations >0.1 : Principal NOs
  Occupations <0.1 : Secondary NOs, "dynamically correlating"
Correspondingly ordered determinant expansion of CI wavefunction
  $\Psi = \Psi_0 + \Psi_{corr}$
  $\Psi_{corr}$ = correlating part of $\Psi$
  $\Psi_0$: Zeroth-order wavefunction contains only principal NOs
    SCF (one determinant) or MCSCF (many determinants)
Dynamic correlation term
  $\Psi_{corr} = \sum_x \Psi_x = \Psi_1 + \Psi_2 + \Psi_3 + \dots + \Psi_n + \dots$
  $\Psi_x = \sum_k c_{xk} \Phi_{xk}$
  $\{\Phi_{xk}\}$ = all x-tuple excitations with respect to $\Psi_0$, i.e. all determinants containing x secondary orbitals
While $\Psi_{corr} = \sum_x \Psi_x$ converges fast (6-8 terms for mh accuracy), the determinantal expansions $\Psi_x = \sum_k c_{xk} \Phi_{xk}$ converge very slowly.
But, for $n>3$, they contain over 99% deadwood.

Correlation energy as a sum of incremental contributions from successive excitation levels

Preliminary calculations
  – $\Psi_0$ by SCF or (small) MCSCF
  – Full SD-CI
  – SD-CI NOs
  – All excitations generated from these natural orbitals
Resolution in terms of NO-based excitation contributions
  Excitation increment sum: $E_{Total} - E_0 = E_{corr} = \sum_x \Delta E(x)$
  $\Delta E(x) = E(x) - E(x-1)$ = incremental contribution from x-tuple excitations,
  where $E(x)$ = total energy up to and including x-tuple excitations
Incremental contributions $\Delta E(x)$ as orbital limits
  $\Delta E(x) = \lim_{m \to M} \Delta E(x|m)$, where
  $\Delta E(x|m)$ = analogous to $\Delta E(x)$ above, except that only the first m correlating natural orbitals are used, and
  M = total number of correlating orbitals

New relations between contributions from different excitation levels x

Considering $\Delta E(x|m)$ as a function of m for fixed x, we infer:
The values for $x \ge 4$ are related to those for lower x by
  $\Delta E(x|m) = a_x \, \Delta E(x-2|m) + c_x$
whence for $x > 4$:
  $\Delta E(x; x-1|m) = [E(x|m) - E(x-2|m)] = A_x \, \Delta E(2|m) + B_x \, \Delta E(3|m) + C_x$
and also:
  $[E(\text{all excitations}|m) - E(3|m)] = A \, \Delta E(2|m) + B \, \Delta E(3|m) + C$

Controlled energy extrapolation by intrinsic scaling (CEEIS)
(i) Obtain coefficients A, B, C by LMS fitting to low values of m with a moderate number of determinants.
(ii) The desired values of $\Delta E(x) = \Delta E(x|M)$ for $x \ge 4$ are then obtained from the values for x = 2, 3 and m = M.

Accurate binding energies of C2, N2, O2, F2

i) Full CI energies, including all valence correlations, are determined using the CEEIS extrapolation for the cc-pV2Z, cc-pV3Z, cc-pV4Z basis sets.
ii) The FCI energies are extrapolated to the complete basis set (CBS) limit:
  SCF energy: $E_X(\mathrm{SCF}) = E_{CBS}(\mathrm{SCF}) + c \, \exp(-\alpha X)$
  Correlation energy: $E_X(\mathrm{Corr}) = E_{CBS}(\mathrm{Corr}) + a \, X^{-3}$
  Yields the non-relativistic, valence-only-correlated energies of the four molecules and the corresponding atoms.
iii) Experimentally known are the total atomic energies as sum of ionization potentials and the total molecular dissociation energies.
iv) To relate these theoretical and experimental quantities, one must account for the following effects:
  – Scalar relativistic effects in atoms and molecules
  – Spin-orbit coupling in the atoms C, O, F
  – Zero-point vibrational and low rotational energies in molecules
  – In-core and core-valence electron correlations

Comparison of CEEIS-FCI-CBS and experimental energies for C2, N2, O2, F2

Energy (mh)                   | C        | C2        | 2C-C2
Experimentally Measured       | 37 852.0 | 75 935.6  | 231.8 ± 0.8
Vibration - Rotation          | 0.0      | 4.2       | 4.2
Scalar Relativistic           | 6.65     | 13.0      | 0.3
Spin Orbit Coupling           | 0.15     | 0.0       | 0.3
Core Correlations             | 55.0     | 112.4     | 2.4
Nonrelativistic Valence Total | 37 790.0 | 75 814.1  | 234.1
CEEIS - FCI - CBS             | 37 790.6 | 75 813.5  | -232.3

Energy (mh)                   | N        | N2        | 2N-N2
Experimentally Measured       | 54 610.0 | 109 578.5 | 358.5 ± 0.04
Vibration - Rotation          | 0.0      | 5.4       | 5.4
Scalar Relativistic           | 20.7     | 41.2      | 0.2
Spin Orbit Coupling           | 0.0      | 0.0       | 0.0
Core Correlations             | 58.8     | 119.0     | 1.4
Nonrelativistic Valence Total | 54 530.5 | 109 423.7 |
CEEIS - FCI - CBS             | 54 531.2 |           |

Comparison of experimental and CEEIS-FCI-CBS energies for C2, N2, O2, F2

Energy (mh)                   | O         | O2        | 2O-O2
Experimentally Measured       | 75 106.45 | 150 400.9 | ± 0.002
Vibration - Rotation          | 0.0       | 3.6       | 3.6
Scalar Relativistic           | 38.35     | 76.4      | 0.3
Spin Orbit Coupling           | 0.35      | 0.0       | 0.7
Core Correlations             |           | 124.9     | 0.7
Nonrelativistic Valence Total | 75 005.3  | 150 202.5 | 191.9
CEEIS - FCI - CBS             | 75 006.4  | 150 204.0 | 191.2

Energy (mh)                   | F        | F2        | 2F-F2
Experimentally Measured       | 99 785.3 |           | -58.9 ± 0.2
Vibration - Rotation          | 0.0      | 2.1       | 2.1
Scalar Relativistic           |          |           | 0.0
Spin Orbit Coupling           | 0.6      | 0.0       | 1.2
Core Correlations             |          | 130.8     | 0.0
Nonrelativistic Valence Total | 99 668.7 | 199 399.6 |
CEEIS - FCI - CBS             | 99 669.5 | 199 399.3 | 60.3

CEEIS-FCI vs. Complete FCI: Determinants Required for C2, N2, O2, F2 (cc-pVQZ Basis)

      | C2                   | N2                   | O2                   | F2
CEEIS | $6.4\times10^{7}$    | $3.2\times10^{7}$    | $2.0\times10^{8}$    | $1.1\times10^{8}$
FCI   | $3.6\times10^{12}$   | $1.6\times10^{15}$   | $1.7\times10^{17}$   | $3.7\times10^{19}$

Full Potential Energy Surfaces

[Figure: F2 potential energy curves, cc-pVTZ. Binding energy = E(F2) - E(2F), mh, vs. R(F-F), Angstroms. Curves: CEEIS, completely renormalized CCSD(T), CCSD(T), CCSDT]

Summary

CEEIS deduces the correlation contributions of quadruple and higher excitation levels from those of single, double and triple excitations.
The FCI energy, generated from a given atomic basis, can be obtained from energy values calculated in only a very small part of the full configuration space, e.g. $10^{7}$ vs. $10^{19}$ determinants for F2 with a QZ basis.
Combining these CEEIS full CI energies with extrapolation to the CBS limit, the complete full CI energy is approached to chemical accuracy.
The binding energies obtained agree with experimental values within the chemical accuracy criterion of 1 kcal/mol.

MCSCF HESSIANS: TIM DUDLEY
  – Analytic Hessians generally superior to numerical or semi-numerical
  – Finite displacements frequently cause artificial symmetry breaking or root flipping
  – Necessary step for derivative coupling
  – Computationally demanding: Parallel efficiency desirable
  – DDI-based MCSCF Hessians
  – IBM clusters, 64-bit Linux

[Figure: Speedup of CAS(2|3) Hessian Calculation of Cyclopentadienyl Complex of Zirconium. Speedup vs. # Processors; linear fit y = 0.95x + 0.11, $R^2$ = 0.9997. 304 basis fxns, small active space. Dominated by calc of derivative integrals.]

[Figure: Speedup of CAS(2|3) Hessian Calculation of Zirconium Cyclopentadienyl Complex. Speedup vs. # CPUs (up to 128). Curves: Derivative Integrals, CPMCSCF, Total Freq. Calc.]
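
The linear fit reported for the small zirconium benchmark, y = 0.95x + 0.11, can be converted into a parallel efficiency (speedup divided by processor count). The following is a minimal sketch under the assumption that the fit is only meaningful over the small processor range it was fitted to; the function names are hypothetical and chosen for illustration.

```python
def fitted_speedup(n_procs: float, slope: float = 0.95, intercept: float = 0.11) -> float:
    """Speedup predicted by the linear fit y = slope * x + intercept.

    The default coefficients are the ones quoted for the CAS(2|3)
    zirconium Hessian benchmark.
    """
    return slope * n_procs + intercept


def parallel_efficiency(n_procs: float, slope: float = 0.95, intercept: float = 0.11) -> float:
    """Parallel efficiency = speedup / number of processors."""
    return fitted_speedup(n_procs, slope, intercept) / n_procs


# Tabulate the fitted speedup and efficiency for a few processor counts.
for p in (1, 2, 4, 8):
    print(f"{p:2d} procs: speedup {fitted_speedup(p):5.2f}, "
          f"efficiency {parallel_efficiency(p):6.1%}")
```

Passing `slope=0.97, intercept=0.03` or `slope=0.96, intercept=0.09` gives the same quantity for the other two fits quoted in the deck. Extrapolating any of these fits to the 128-CPU plots would go beyond what the fitted data support.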
[Figure: Speedup of CAS(16|12) Hessian Calculation of Silicon Dioxide. Speedup vs. # Processors; linear fit y = 0.97x + 0.03, $R^2$ = 1. Large active space, small AO basis. Dominated by calc of CI blocks of H.]

[Figure: Speedup of CAS(16|12)/6-31G* Hessian Calculation of Silicon Dioxide. Speedup vs. # CPUs (up to 128). Curve: Total Freq. Calc.]

[Figure: Speedup of CAS(10|9) Hessian Calculation of 7-Azaindole. Speedup vs. # Processors; linear fit y = 0.96x + 0.09, $R^2$ = 0.9995. 216 basis fxns, full $\pi$ active space. Calc is mix of all bottlenecks.]

[Figure: Speedup of CAS(10|9)/TZV Hessian Calculation of 7-azaindole. Speedup vs. # CPUs (up to 128). Curves: Derivative Integrals, CPMCSCF, Total Freq. Calc.]

ZAPT2 BENCHMARKS
• IBM p640 nodes connected by dual Gigabit Ethernet
  – 4 Power3-II processors at 375 MHz
  – 16 GB memory
• Tested
  – Au3H4
  – Au3O4
  – Au5H4
  – Ti2Cl2Cp4
  – Fe-porphyrin: imidazole

Au3H4
• Basis set
  – aug-cc-pVTZ on H
  – uncontracted SBKJC with 3f2g polarization functions and one diffuse sp function on Au
  – 380 spherical harmonic basis functions
• 31 DOCC, 1 SOCC
• 9.5 MWords replicated
• 170 MWords distributed

Au3O4
• Basis set
  – aug-cc-pVTZ on O
  – uncontracted SBKJC with 3f2g polarization functions and one diffuse sp function on Au
  – 472 spherical harmonic basis functions
• 44 DOCC, 1 SOCC
• 20.7 MWords replicated
• 562 MWords distributed

Au5H4
• Basis set
  – aug-cc-pVTZ on H
  – uncontracted SBKJC with 3f2g polarization functions and one diffuse sp function on Au
  – 572 spherical harmonic basis functions
• 49 DOCC, 1 SOCC
• 30.1 MWords replicated
• 1011 MWords distributed

Ti2Cl2Cp4
• Basis set
  – TZV
  – 486 basis functions (N = 486)
• 108 DOCC, 2 SOCC
• 30.5 MWords replicated
• 2470 MWords distributed

Fe-porphyrin: imidazole
• Two basis sets
  – MIDI with d polarization functions (N = 493)
  – TZV with d,p polarization functions (N = 728)
• 110 DOCC, 2 SOCC
• N = 493
  – 32.1 MWords replicated
  – 2635 MWords distributed
• N = 728
  – 52.1 MWords replicated
  – 5536 MWords distributed

[Figure: Speedup Curve. Speedup vs. number of processors for Au3H4 (380), Au3O4 (472), Au5H4 (570), Ti2Cl2Cp4 (486), Fe-porphyrin (493), Fe-porphyrin (728), and Linear.]

Load Balancing
• Au3H4 on 64 processors
  – Total CPU time ranged from 1124 to 1178 sec.
  – Master spent 1165 sec.
  – Average: 1147 sec.
  – Standard deviation: 13.5 sec.
• Large Fe-porphyrin on 64 processors
  – Total CPU time ranged from 50679 to 51448 sec.
  – Master spent 50818 sec.
  – Average: 51024 sec.
  – Standard deviation: 162 sec.

THANKS!
• GAMESS Gang
• DOE SciDAC program
• IBM SUR grants
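
The load-balancing figures quoted for the two 64-processor ZAPT2 runs can be condensed into two simple metrics: the coefficient of variation of the per-process CPU time, and the fraction by which the slowest process exceeds the mean. The sketch below uses only the summary numbers from the Load Balancing slide; the class name `CpuTimeStats` and its methods are hypothetical helpers, not GAMESS or DDI functionality.

```python
from dataclasses import dataclass


@dataclass
class CpuTimeStats:
    """Summary of per-process total CPU times (seconds) for one run."""
    t_min: float
    t_max: float
    mean: float
    std: float

    def coefficient_of_variation(self) -> float:
        """Standard deviation relative to the mean CPU time."""
        return self.std / self.mean

    def worst_case_imbalance(self) -> float:
        """Fraction by which the slowest process exceeds the mean."""
        return self.t_max / self.mean - 1.0


# Summary numbers as quoted for the 64-processor runs.
au3h4 = CpuTimeStats(t_min=1124, t_max=1178, mean=1147, std=13.5)
fe_porphyrin = CpuTimeStats(t_min=50679, t_max=51448, mean=51024, std=162)

for name, stats in (("Au3H4", au3h4), ("Fe-porphyrin (728)", fe_porphyrin)):
    print(f"{name}: CV = {stats.coefficient_of_variation():.2%}, "
          f"slowest process above mean by {stats.worst_case_imbalance():.2%}")
```

On these numbers both metrics stay below 3% for Au3H4 and below 1% for the large Fe-porphyrin run, which is consistent with the slide's point that the work is evenly distributed.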
