SciDAC Software Infrastructure for Lattice Gauge Theory
Richard C. Brower
Annual Progress Review, JLab, May 14, 2007
Code distribution: see http://www.usqcd.org/software.html

Software Committee
• Rich Brower (chair)
• Carleton DeTar
• Robert Edwards
• Don Holmgren
• Bob Mawhinney
• Chip Watson
• Ying Zhang
SciDAC-2 minutes/documents & progress report: http://super.bu.edu/~brower/scc.html

Major Participants in SciDAC Project
• Arizona: Doug Toussaint
• MIT: Andrew Pochinsky, Dru Renner, Joy Khoriaty
• BU: Rich Brower*, James Osborn, Mike Clark
• North Carolina: Rob Fowler, Ying Zhang*
• JLab: Chip Watson*, Robert Edwards*, Jie Chen, Balint Joo
• BNL: Chulwoo Jung, Enno Scholz, Efstratios Efstathiadis
• Columbia: Bob Mawhinney*
• IIT: Xian-He Sun
• DePaul: Massimo DiPierro
• Indiana: Steve Gottlieb, Subhasish Basak
• FNAL: Don Holmgren*, Jim Simone, Jim Kowalkowski, Amitoj Singh
• Utah: Carleton DeTar*, Ludmila Levkova
• Vanderbilt: Ted Bapty*
(* Software Committee; participants funded in part by the SciDAC grant)

QCD Software Infrastructure
Goal: create a unified software environment that will enable the US lattice community to achieve very high efficiency on diverse high-performance hardware.
Requirements:
1. Build on the 20-year investment in MILC/CPS/Chroma
2. Optimize critical kernels for peak performance
3. Minimize the software effort needed to port to new platforms and to create new applications
Solution for lattice QCD:
• (Perfect) load balancing: uniform periodic lattices and identical sublattices per processor
• (Complete) latency hiding: overlap computation and communication
• Data parallel: operations on small 3x3 complex matrices per link (illustrated below)
• Critical kernels: Dirac solver, HMC forces, etc.
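The per-site work in these kernels reduces to small dense complex linear algebra. As a rough illustration only (the type and function names ColorMatrix, ColorVector and multiply_link are hypothetical, not the QLA interface), here is a minimal C++ sketch of a 3x3 link matrix acting on a colour vector:

    // Illustrative sketch: the data-parallel kernels are built from many
    // independent per-link operations such as "3x3 complex matrix times
    // colour 3-vector". This is NOT the QLA API, just the arithmetic pattern.
    #include <array>
    #include <complex>

    using Complex     = std::complex<double>;
    using ColorMatrix = std::array<std::array<Complex, 3>, 3>;  // U(x,mu), one link
    using ColorVector = std::array<Complex, 3>;                 // one colour vector

    // chi = U * psi  (the inner kernel of the Dirac operator and force terms)
    ColorVector multiply_link(const ColorMatrix& U, const ColorVector& psi) {
        ColorVector chi{};
        for (int a = 0; a < 3; ++a)
            for (int b = 0; b < 3; ++b)
                chi[a] += U[a][b] * psi[b];
        return chi;
    }

The QLA layer described below provides optimized routines for exactly this kind of site-local linear algebra.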
The lattice Dirac operator accounts for 70%-90% of the execution time, so it is the critical kernel to optimize.

SciDAC-1 QCD API (optimized for Pentium 4 clusters and the QCDOC; exists in C and C++)
• Level 3: optimized Dirac operators and inverters
• Level 2: QDP (QCD Data Parallel) – lattice-wide operations, data shifts; QIO – binary/XML metadata files (ILDG collaboration)
• Level 1: QLA (QCD Linear Algebra); QMP (QCD Message Passing) – C/C++, implemented over MPI, native QCDOC, and M-VIA on a GigE mesh

Data Parallel QDP/C, C++ API
• Hides architecture and layout
• Operates on lattice fields across sites
• Linear algebra tailored for QCD
• Shifts and permutation maps across sites
• Reductions
• Subsets
• Entry/exit – attach to existing codes

Example of a QDP++ Expression
• Typical for the Dirac operator (QDP/C++ code; a sketch is given below)
• Uses the Portable Expression Template Engine (PETE)
• Temporaries eliminated, expressions optimized
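The QDP/C++ code from the original slide is not reproduced in this text, so the following is a sketch of the kind of expression meant; the names LatticeColorMatrix, LatticeFermion, shift, FORWARD and the checkerboard subset rb are taken from the publicly documented QDP++ interface and should be checked against the current headers.

    // Sketch of a QDP++ data-parallel expression (not the exact slide contents).
    #include "qdp.h"
    using namespace QDP;

    void hop_forward(const LatticeColorMatrix& u, const LatticeFermion& psi,
                     LatticeFermion& chi, int mu)
    {
        // Hop in the +mu direction on the even checkerboard:
        //   chi(x) = U_mu(x) * psi(x + mu)   for x in subset rb[0]
        chi[rb[0]] = u * shift(psi, FORWARD, mu);
    }

Because the right-hand side is a PETE expression template, the multiply and the shift are fused into a single loop over the subset, with no intermediate temporaries, which is the point of the "temporaries eliminated, expressions optimized" bullet above.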
Application codes: MILC / CPS / Chroma / roll-your-own

SciDAC-2 QCD API (SciDAC-1 components in gold and SciDAC-2 additions in blue in the original diagram; TOPS and PERI collaborations)
• Level 4: QCD Physics Toolbox – shared algorithms and building blocks, visualization, performance tools, and data-analysis tools; Workflow; uniform user environment – runtime, accounting, grid
• Level 3: QOP (optimized in asm) – Dirac operator, inverters, forces, etc.
• Level 2: QDP (QCD Data Parallel) – lattice-wide operations, data shifts; QIO – binary/XML files & ILDG
• Level 1: QLA (QCD Linear Algebra), QMP (QCD Message Passing), QMC (QCD Multi-core interface)

Level 3 Domain Wall CG Inverter
[Plot: Mflops/node vs. local lattice size for the Level 3 domain-wall CG inverter, comparing the JLab 3G and 4G clusters at Level II and Level III with the FNAL Myrinet cluster at Level III (32 nodes). Ls = 16; the 4G nodes are 2.8 GHz Pentium 4 with an 800 MHz FSB.]

Asqtad Inverter on the Kaon cluster at FNAL
[Plot: Mflop/s per core vs. L, comparing the MILC C code with SciDAC/QDP on L^4 sub-volumes for 16- and 64-core partitions of Kaon.]

Level 3 on QCDOC
[Plot: Mflop/s vs. L for domain-wall RHMC kernels (32^3 x 64 x 16 with 4^3 x 8 x 16 subvolumes), Asqtad RHMC kernels, and Asqtad CG on L^4 subvolumes (24^3 x 32 with 6^3 x 18 subvolumes).]

Building on SciDAC-1
Fuller use of the API in application code:
1. Integrate QDP into MILC and QMP into CPS
2. Universal use of QIO, file formats, QLA, etc.
3. Level 3 interface standards
Common runtime environment:
1. File transfer, batch scripts, compile targets
2. A practical three-laboratory "metafacility"
Porting the API to INCITE platforms:
1. BG/L & BG/P: QMP and QLA using XLC and Perl scripts
2. Cray XT4 & Opteron clusters

New SciDAC-2 Goals
Exploitation of multi-core:
1. Multi-core, not Hertz, is the new paradigm
2. Plans for a QMC API (JLab & FNAL & PERC); see the SciDAC-2 kickoff workshop, Oct 27-28, 2006, http://super.bu.edu/~brower/workshop
Toolbox – shared algorithms / building blocks:
1. RHMC, eigenvector solvers, etc.
2. Visualization and performance analysis (DePaul & PERC)
3. Multi-scale algorithms (QCD/TOPS collaboration), http://www.yale.edu/QCDNA/
Workflow and cluster reliability:
1. Automated campaigns to merge lattices and propagators and extract physics (FNAL & Illinois Institute of Technology)
2. Cluster reliability (FNAL & Vanderbilt), http://lqcd.fnal.gov/workflow/WorkflowProject.html

QMC – QCD Multi-Threading
• General evaluation – OpenMP vs. an explicit thread library (Chen): an explicit thread library can do better than OpenMP, but OpenMP results are compiler dependent
• Simple threading API, QMC: based on the older smp_lib (Pochinsky); uses pthreads and investigates barrier synchronization algorithms (a pthreads fork-join sketch is given at the end, after the conclusions)
• Evaluate threads for SSE-Dslash
• Consider a threaded version of QMP (Fowler and Porterfield at RENCI)
[Diagram: fork-join threading model – serial code with other cores idle, a parallel section with one thread on sites 0..7 and another on sites 8..15, then finalize / thread join.]

Conclusions
Progress has been made using a common QCD API and shared libraries for communication, linear algebra, I/O, optimized inverters, etc. But full implementation, optimization, documentation, and maintenance of the shared code is a continuing challenge, and there is much work to do to keep up with changing hardware and algorithms. Still, NEW users (young and old) with no prior lattice experience have initiated new lattice QCD research using the SciDAC software! The bottom line is that PHYSICS is being well served.
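As a backup illustration of the fork-join picture in the QMC slide above, here is a minimal pthreads sketch. This is an assumption about the style of interface, not the actual QMC or smp_lib API: sites are split between two threads, each thread works on its own block (sites 0..7 and 8..15), and a barrier marks the finalize / thread-join point.

    // Minimal fork-join sketch in the spirit of the QMC discussion.
    // Compile with: g++ -pthread
    #include <pthread.h>
    #include <cstdio>
    #include <vector>

    static const int NSITES   = 16;
    static const int NTHREADS = 2;

    static double            field[NSITES];
    static pthread_barrier_t barrier;

    struct Arg { int tid; };

    void* site_work(void* p) {
        int tid   = static_cast<Arg*>(p)->tid;
        int block = NSITES / NTHREADS;
        // Parallel section: thread 0 handles sites 0..7, thread 1 handles 8..15.
        for (int s = tid * block; s < (tid + 1) * block; ++s)
            field[s] = 2.0 * s;                 // stand-in for the real kernel
        pthread_barrier_wait(&barrier);         // "finalize / thread join" point
        return nullptr;
    }

    int main() {
        pthread_barrier_init(&barrier, nullptr, NTHREADS);
        std::vector<pthread_t> threads(NTHREADS);
        std::vector<Arg>       args(NTHREADS);
        for (int t = 0; t < NTHREADS; ++t) {    // fork the parallel section
            args[t].tid = t;
            pthread_create(&threads[t], nullptr, site_work, &args[t]);
        }
        for (int t = 0; t < NTHREADS; ++t)      // rejoin serial code
            pthread_join(threads[t], nullptr);
        pthread_barrier_destroy(&barrier);
        std::printf("site 15 = %f\n", field[15]);
        return 0;
    }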