The First Self Adapting Numerical Software (SANS) Summit

Jack Dongarra, Piotr Luszczek, Eric Meek, Sathish Vadhiyar, Zizhong “Jeffrey” Chen,
Ken Roche (University of Tennessee)
Lennart Johnsson (University of Houston)
Kathy Yelick, Rich Vuduc (University of California, Berkeley)
Christoph Ueberhuber, Franz Franchetti (Vienna University of Technology)
Jeremy Johnson (Drexel University)
Markus Püschel (CMU)
Jorge Moré (Argonne National Lab)

Day 1 (August 7, 2002)

Christoph contacted Jack to set up the first SANS meeting.

1. Introduction

2. Agenda by Jack Dongarra

3. “Self Adaptive DSP Software” – Vienna University of Technology

       a. FFTs (on Blue Gene – collaboration with Jose Moreira)
       b. SALT – Self Adapting Library for Transforms.
       c. MAP – Special purpose compiler.
       d. Single-processor activities – FMA optimization; parallel-machine activities –
       adaptive communication.
       e. Problems with general purpose compilers – bad register usage.
       f. Solutions – scheduled C code, kernel backend, FFT -> assembly by MAP
       instead of FFT -> C -> assembly.
       g. Comparisons of different SIMD extensions.
       h. Mini grids – choose number of processors.
       i. Collaboration with DSP.
       j. SALT – very similar to GrADS using Cluster Evaluator (CLUE) and GRIM.
       k. MAP – special purpose kernel compilers.

4. SPIRAL – Markus Püschel

       a. Library generator - Automates implementation, optimization etc.
       b. Formula generator, SPL compiler, search engine, output – platform-adapted
       c. DSP transform – algorithm, transform, rule, ruletree, formula
       d. Automatic generation of formula
       e. Formula -> implementation
       f. SPL language, SPL compiler input: SPL program, output: C, Fortran code
       g. Search – to find best implementation. Different search methods.
       h. Recent work – learning to generate fast algorithms, generates performance
       model from the training set
       i. SIMD: Vector code generation from SPL formulas.
       j. Parallel code generation
       k. Filters and wavelets
       l. Discussion on FPGA and matrix multiply
       m. Discussion on compiler limitations – length of compilation times

5. Adaptive Scientific Software Libraries – Lennart Johnsson

      a. Automatic algorithm selection, exploit multiple precision options, code
      b. GrADS project.
      c. UHFFT – Different factorization algorithms.
      d. Performance Modeling – analytic models.
      e. Search options.
      f. Results demonstrating codelet efficiency.
      g. New results obtained on Itanium and UltraSPARC III.
      h. Code generator written in C.
      i. New tools – CODELAB – uses scripting languages.

12:00 – Lunch break

1:00 – Meeting resumes

6. NEOS – Jorge J. Moré

      a. Nonlinearly constrained optimization.
      b. User submits optimization problem that is solved remotely.
      c. Optimization problem submitted by means of a procedure.
      d. Many problems being submitted to NEOS every year.
      e. Uses of NEOS for 2000, 2001.
      f. Problems – Stores the submitted problem.
      g. Self adapting optimization problems – detecting problem structure
      automatically, comparing performance profiles.
      h. Determining the best algorithm is difficult – what is best?
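
Note on item g: a minimal sketch of how performance profiles can be compared. It assumes the common definition in which a solver's profile at a factor tau is the fraction of problems it solves within tau times the best time recorded for each problem. The solver names and timings are made up for illustration and do not come from the meeting.

```python
def performance_profile(times, tau):
    """Fraction of problems each solver handles within a factor tau of the best.

    times: dict mapping solver name -> list of run times, one per problem
           (None marks a failed run). All lists must have the same length.
    """
    solvers = list(times)
    n_problems = len(times[solvers[0]])

    # Best (smallest) successful time for each problem.
    best = []
    for p in range(n_problems):
        ok = [times[s][p] for s in solvers if times[s][p] is not None]
        best.append(min(ok) if ok else None)

    profile = {}
    for s in solvers:
        hits = 0
        for p in range(n_problems):
            t = times[s][p]
            if t is not None and best[p] is not None and t / best[p] <= tau:
                hits += 1
        profile[s] = hits / n_problems
    return profile


# Hypothetical timings (seconds) for two made-up solvers on four problems.
timings = {
    "solver_a": [1.0, 2.0, 4.0, None],
    "solver_b": [2.0, 1.0, 4.0, 3.0],
}
print(performance_profile(timings, 1.0))
print(performance_profile(timings, 2.0))
```

Reading the profile at several values of tau shows why "what is best?" has no single answer: one solver may win most often at tau = 1 while another is more robust at larger tau.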

7. Automatic Performance Tuning of Sparse Matrix Kernels – Richard Vuduc

       a. Performance depends on architecture, kernel and matrix.
       b. Different optimization techniques – register blocking, cache blocking, matrix
       c. Search procedure – both benchmark based and model based.
       d. Example – determining register block size based on performance model.
       e. Results compared with reference implementations on Pentium 3, 4 and
       Power3, Itanium – results shown for matrix vector, triangular solve.
       f. Exploiting additional matrix structure, split matrices, cache blocking
       g. Current directions – tuning parameter selection, extending sparse BLAS.
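
Note on item d: a minimal sketch of one way a register block size could be picked from a performance model. It assumes a heuristic that divides a machine's measured dense-matrix rate for each r x c block size by the matrix's fill ratio (stored entries, including explicit zeros, per true nonzero). The function names and the dense rates are hypothetical and are not taken from the talk.

```python
def fill_ratio(nonzeros, r, c):
    """Stored entries per true nonzero when the matrix is stored in r x c blocks."""
    # Every block touched by at least one nonzero is stored in full.
    blocks = {(i // r, j // c) for i, j in nonzeros}
    return len(blocks) * r * c / len(nonzeros)


def choose_block_size(nonzeros, dense_rate):
    """Pick the (r, c) maximizing the estimated rate dense_rate / fill_ratio."""
    best, best_score = None, 0.0
    for (r, c), rate in dense_rate.items():
        score = rate / fill_ratio(nonzeros, r, c)
        if score > best_score:
            best, best_score = (r, c), score
    return best


# Hypothetical machine profile: Mflop/s of an r x c blocked kernel on a dense matrix.
dense_rate = {(1, 1): 100.0, (2, 2): 180.0, (4, 4): 220.0}

# 8 x 8 test matrix made of four aligned, fully dense 2 x 2 diagonal blocks.
nonzeros = [(2 * b + di, 2 * b + dj)
            for b in range(4) for di in range(2) for dj in range(2)]

print(choose_block_size(nonzeros, dense_rate))
```

In this toy case 4 x 4 blocks have the highest dense rate but double the stored entries, so the model prefers 2 x 2 blocks, which add no fill.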

8. SANS and NetSolve – University of Tennessee

       a. ATLAS BLAS – some results to show improvement with ATLAS.
       b. ATLAS uses recursive approach for level 3 BLAS.
       c. Automatic selection of MPI collective communication algorithm.
       d. CG variants by dynamic selection at runtime.
       e. Optimizations of BiCG-Stab – combining two vector operations into one loop, indexing
       f. Split ADI method
       g. LAPACK for Clusters – choose optimum processors and block size.
       h. LFC – User stages data to disk, calls library middleware, resource selection,
       time function minimization.
       i. LFC – plans to cover LU, Cholesky, QR and eigenvalue problems.
       j. Future SANS efforts – IC, system component, history database.
       k. CCA – dynamically selecting sparse matrix package
       l. CCA – approaches – blackbox, user steering,
       m. CCA – NSF funded Next Generation Software (NGS)
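
Note on item e: a minimal sketch of combining two vector operations into one loop. It uses a generic pair (a vector update followed by a dot product) chosen only for illustration; the minutes do not record which operations were fused in the BiCG-Stab work.

```python
def update_then_dot(r, v, alpha):
    """Two separate passes: s = r - alpha*v, then dot(s, s)."""
    s = [ri - alpha * vi for ri, vi in zip(r, v)]
    norm2 = sum(si * si for si in s)
    return s, norm2


def fused_update_dot(r, v, alpha):
    """One pass: each element of s is computed and accumulated into the dot product."""
    s = [0.0] * len(r)
    norm2 = 0.0
    for i in range(len(r)):
        si = r[i] - alpha * v[i]
        s[i] = si
        norm2 += si * si  # reuse si while it is still at hand
    return s, norm2


r = [1.0, 2.0, 3.0]
v = [1.0, 1.0, 1.0]
print(update_then_dot(r, v, 0.5))
print(fused_update_dot(r, v, 0.5))
```

The Python version only shows the structure. In a compiled kernel the point of fusing is that s is read once instead of twice, which reduces memory traffic for long vectors.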

9. Grid TLSE project

       a. Design a web expert site for sparse matrices.
       b. User submits a matrix to the web site, which recommends an algorithm to use.

10. Discussion on recursive LU factorizations.

Meeting ends at 4:00 P.M.

Day 2

11. Innovative Computing Laboratory – Jack Dongarra

       a. Fields – Numerical algebra, distributed computing, repositories, performance
       b. People – employees, students

12. Discussion

       a. Proposal to have a web page with links to all SANS projects. The
       webpage will be
       b. Proposal to have a workshop on SANS at some conference – ICCS 2003,
       Federated Computing Research Conference, SC2002 BOF, SC2003 workshop,
       EUROPAR 2003.
       c. Christoph's thoughts:
          I.   Are we targeting all systems? – Maybe specialized systems like FPGAs.
          II.  If specializing for hardware, the problem gets difficult because of the huge
               search space.
          III. Performance models for what software?
          IV.  Compiler technologies – are we doing it ourselves or are we going to
               influence? Try to get some compiler people into this. Issues with straight line
       d. Kathy Yelick will speak to Sun regarding sending proposals.
       e. Self adaptivity – different views – high level and low level; input-data based
       and hardware based.
       f. Need to worry about loss in system performance while doing self adaptivity.
       g. Speak about standards.
       h. Proposal submissions – NSF ITR proposal, DARPA HPC. Thoughts about
       SANS in Grid and embedded computing. In Europe – EU Framework program for
       5 years.
       i. Integrate MAP and ATLAS.

1:30 – Meeting adjourned
