Automatically Tuned Linear Algebra Software (ATLAS)




   R. Clint Whaley

University of Tennessee   www.netlib.org/atlas
What is ATLAS
 A package that adapts to differing architectures via AEOS techniques
   - Initially, supply BLAS
 Automated Empirical Optimization of Software (AEOS)
   - Machine searches optimization space
   - Finds application-apparent architecture
 AEOS requires:
   - Method of code variation
      » Code generation
      » Multiple implementations
      » Parameterization
   - Sophisticated timers
   - Robust search heuristic
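
The AEOS ingredients listed above (a parameterized code variant, a timer, and a search) can be sketched in a few lines. This is only an illustration under stated assumptions, not ATLAS's actual generator or search: `matmul_blocked`, `time_candidate`, and `empirical_search` are hypothetical names, the only tuned parameter is a block size, and the search is a plain exhaustive sweep rather than a robust heuristic.

```python
import time

def matmul_blocked(A, B, n, nb):
    """Parameterized variant: multiply two n x n lists-of-lists in nb x nb blocks."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, nb):
        for kk in range(0, n, nb):
            for jj in range(0, n, nb):
                for i in range(ii, min(ii + nb, n)):
                    Ci, Ai = C[i], A[i]
                    for k in range(kk, min(kk + nb, n)):
                        a, Bk = Ai[k], B[k]
                        for j in range(jj, min(jj + nb, n)):
                            Ci[j] += a * Bk[j]
    return C

def time_candidate(fn, args, repeats=3):
    """Timer: best-of-N wall-clock time, which damps measurement noise."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

def empirical_search(n=48, candidates=(4, 8, 16, 24, 48)):
    """Search: time every candidate block size on this machine, keep the fastest."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {nb: time_candidate(matmul_blocked, (A, B, n, nb))
               for nb in candidates}
    best_nb = min(timings, key=timings.get)
    return best_nb, timings
```

Calling `empirical_search()` returns the block size that ran fastest on the host together with all measured timings. The result depends on the machine, which is the point: the code learns the architecture as the application sees it instead of relying on a hand-written model.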
Why ATLAS is needed
     BLAS require many man-hours per platform
      - Only done if financial incentive is there
         » Many platforms will never have an optimal version
      - Lags behind hardware
      - May not be affordable by everyone
      - Improves vendor code
     Allows for portably optimal codes
      - Obsolescence insurance
     Some operations may be important, but not general enough for a standard
ATLAS Software

   Currently provided
    - Full BLAS (C & F77)
       » Level 3 BLAS
           Generated GEMM
              - 1-2 hours install time per precision
           Recursive GEMM-based L3 BLAS
              - Antoine Petitet
       » Level 2 BLAS
           GEMV & GER kernels
       » Level 1 BLAS
    - Some LAPACK
       » LU, LLt

   Coming soon
    - pthread support
    - Open source kernels
       » SSE & 3DNOW!
       » GOTO ev5/6 BLAS
    - Performance for banded and packed
    - More LAPACK

   Coming not-so-soon
    - Sparse support
    - User customization
Algorithmic Approach for Matrix Multiply
       Only generated code is on-chip multiply
       All BLAS operations written in terms of
        generated on-chip multiply
       All transpose cases coerced through data
        copy to 1 case of on-chip multiply
        - Only 1 case generated per platform
       [Figure: C (M x N) = A (M x K) * B (K x N), with C partitioned into blocks of size NB]
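
The copy-and-coerce idea above can be sketched as follows. This is a minimal illustration, not ATLAS's generated code: the names `kernel` and `gemm` are hypothetical, and the choice of the single kernel case (rows of op(A) against columns of op(B), both contiguous along K) is an assumption made for the example. Whatever the transpose flags are, the copy step lays each block out the same way, so only one kernel is ever needed.

```python
def kernel(a_rows, b_cols, C, i0, j0):
    """The single 'on-chip' case: C[i0+i][j0+j] += dot(a_rows[i], b_cols[j])."""
    for i, a in enumerate(a_rows):
        Ci = C[i0 + i]
        for j, b in enumerate(b_cols):
            Ci[j0 + j] += sum(x * y for x, y in zip(a, b))

def gemm(A, B, M, N, K, trans_a=False, trans_b=False, nb=4):
    """C = op(A) * op(B), where op(X) is X or its transpose.

    op(A) is M x K and op(B) is K x N; nb is the block size (hypothetical default).
    """
    def op(X, trans, i, j):
        # Element (i, j) of op(X); the transpose is resolved during the copy.
        return X[j][i] if trans else X[i][j]

    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, nb):
        i1 = min(i0 + nb, M)
        for k0 in range(0, K, nb):
            k1 = min(k0 + nb, K)
            # Copy a block of op(A): one contiguous list per row.
            a_rows = [[op(A, trans_a, i, k) for k in range(k0, k1)]
                      for i in range(i0, i1)]
            for j0 in range(0, N, nb):
                j1 = min(j0 + nb, N)
                # Copy a block of op(B): one contiguous list per column.
                b_cols = [[op(B, trans_b, k, j) for k in range(k0, k1)]
                          for j in range(j0, j1)]
                kernel(a_rows, b_cols, C, i0, j0)
    return C
```

For example, `gemm(A, B, 2, 2, 3)` and `gemm(At, B, 2, 2, 3, trans_a=True)` give the same result when `At` is the transpose of `A`, and both go through the same `kernel`. The sketch recopies blocks of B more often than a tuned library would; it is meant to show the coercion, not the performance.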



Algorithmic approach for Level 3 BLAS
       Recur down to L1 cache block size
       Need kernel at bottom of recursion
        - Use gemm-based kernel for portability
       [Figure: Recursive TRMM; a triangular matrix partitioned recursively, with its zero blocks marked 0]
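
The recursion above can be sketched for one TRMM case, B := L * B with L lower triangular. This is an illustrative sketch under stated assumptions, not the ATLAS implementation: `gemm_acc`, `trmm_kernel`, and `trmm_lower` are hypothetical names, `nb` stands in for the L1 cache block size, and the gemm-based base case is assumed to work by expanding the small triangle into a dense block with explicit zeros.

```python
def gemm_acc(A, a_r0, a_c0, B, b_r0, C, c_r0, m, k, n):
    """C[c_r0:c_r0+m, :n] += A[a_r0:a_r0+m, a_c0:a_c0+k] * B[b_r0:b_r0+k, :n]."""
    for i in range(m):
        Ci, Ai = C[c_r0 + i], A[a_r0 + i]
        for p in range(k):
            a, Bp = Ai[a_c0 + p], B[b_r0 + p]
            for j in range(n):
                Ci[j] += a * Bp[j]

def trmm_kernel(L, B, lo, hi, ncols):
    """Base case: expand the triangle to a dense block and reuse GEMM."""
    size = hi - lo
    dense = [[L[lo + i][lo + j] if j <= i else 0.0 for j in range(size)]
             for i in range(size)]
    old = [B[lo + i][:ncols] for i in range(size)]  # copy of the input rows
    for i in range(size):
        B[lo + i][:ncols] = [0.0] * ncols           # clear the output rows
    gemm_acc(dense, 0, 0, old, 0, B, lo, size, size, ncols)

def trmm_lower(L, B, lo, hi, ncols, nb=2):
    """In place: B[lo:hi] := L[lo:hi, lo:hi] * B[lo:hi], L lower triangular."""
    if hi - lo <= nb:
        trmm_kernel(L, B, lo, hi, ncols)
        return
    mid = (lo + hi) // 2
    # Split L = [[L11, 0], [L21, L22]] and B = [B1; B2].
    trmm_lower(L, B, mid, hi, ncols, nb)   # B2 := L22 * B2
    gemm_acc(L, mid, lo, B, lo, B, mid,    # B2 += L21 * B1 (B1 still unmodified)
             hi - mid, mid - lo, ncols)
    trmm_lower(L, B, lo, mid, ncols, nb)   # B1 := L11 * B1
```

The order of the three steps matters: B2 must be updated while B1 still holds its original values. Apart from the small triangles at the bottom of the recursion, all work is done by `gemm_acc`, so the speed of the routine follows from the speed of the matrix multiply.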



500x500 DGEMM Across Various Architectures
       [Figure: chart of MFLOPS (axis 0.0 to 900.0) for F77 BLAS, Vendor BLAS, and ATLAS BLAS on each of the following architectures: AMD Athlon-600, DEC ev56-533, DEC ev6-500, HP9000/735/135, IBM PPC604-112, IBM Power2-160, IBM Power3-200, Pentium Pro-200, Pentium II-266, Pentium III-550, SGI R10000ip28-200, SGI R12000ip30-270, Sun UltraSparc2-200]
500 x 500 Double Precision RB LU factorization
       [Figure: chart of MFLOPS (axis 0 to 700) for F77 BLAS, ATLAS BLAS, and Vendor BLAS on each of the following architectures: AMD Athlon-600, DEC ev56-533, DEC ev6-500, HP9000/735/135, IBM PPC604-112, IBM Power2-160, IBM Power3-200, Pentium Pro-200, Pentium II-266, Pentium III-550, SGI R10000ip28-200, SGI R12000ip30-270, Sun UltraSparc2-200]
500x500 Recursive BLAS on UltraSparc 2200
       [Figure: chart of MFLOPS (axis 0 to 350) for Vendor BLAS, ATLAS BLAS, and Reference BLAS on each of the following BLAS routines: DGEMM, DSYMM, DSYRK, DSYR2K, DTRMM, DTRSM]

				