pep-micro-2005-talk

Path & Edge Profiling

    Michael Bond, UT Austin
    Kathryn McKinley, UT Austin
Why profile?

   Inform optimizations
    –   Target hot code
    –   Inlining and unrolling
    –   Code scheduling and register allocation
   Increasingly important for speculative
    optimization
    –   Hardware trends → simplicity & multiple contexts
    –   Less speculation in hardware, more in software
Speculative optimization

   [Figure: spectrum of optimizations, from less speculative to more speculative]
Profiling requirements

   Predict future with representative profile
    –   Accurate
    –   Continuous
   Other requirements
    –   Low overhead
    –   Portable
    –   Path profiling
   Previous work struggles to meet all goals
PEP: continuous path and edge profiling

   Predict future with representative profile
    –   Accurate: 94-96%
    –   Continuous: yes
   Other requirements
    –   Low overhead: 1.2%
    –   Portable: yes
    –   Path profiling: yes
PEP: continuous path and edge profiling

   Combines all-the-time instrumentation & sampling (New!)
    –   Instrumentation computes path number
    –   Sampling updates profiles using path number
    –   Overhead: 30% → 2%
   Path profiling as efficient means of edge profiling (New!)
   [Figure: example control-flow graph with r=0 at entry, r=r+2 and r=r+1 on two edges, and SAMPLE r at exit]
Outline

   Introduction
   Background: Ball-Larus path profiling
   PEP
   Implementation & methodology
   Overhead & accuracy
Ball-Larus path profiling

   Acyclic, intraprocedural paths
   Instrumentation maintains execution
    frequency of each path
    –   Each path computes unique integer in [0, N-1]
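Ball-Larus path profiling

   4 paths → [0, 3]
   Each path sums to unique integer
   [Figure: control-flow graph with two consecutive branches. The first branch has edge values 2 and 0, the second has edge values 1 and 0. Paths 0, 1, 2, and 3 are highlighted in turn.]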
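Ball-Larus path profiling

   r: path register
    –   Computes path number
   count: frequency table
    –   Stores path frequencies
    –   Array by default
    –   Too many paths? Hash table
   [Figure: instrumented control-flow graph with r=0 at entry, r=r+2 and r=r+1 on the edges with nonzero values, and count[r]++ at exit]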
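The numbering on these slides can be reproduced with a short sketch. It is a minimal illustration, assuming a small acyclic control-flow graph given as an adjacency list. The graph, the node names, and the helpers `assign_edge_values`, `all_paths`, and `run_path` are hypothetical and do not come from the talk.

```python
from collections import defaultdict

# Hypothetical acyclic CFG: two consecutive if-then-else diamonds.
# Successor order matters: later successors receive larger edge values.
CFG = {
    "entry": ["B", "A"],
    "A": ["join1"],
    "B": ["join1"],
    "join1": ["D", "C"],
    "C": ["exit"],
    "D": ["exit"],
    "exit": [],
}

def assign_edge_values(cfg, entry="entry", exit_node="exit"):
    """Give each edge v->w the number of paths that leave v through earlier successors."""
    num_paths = {exit_node: 1}
    edge_val = {}

    def visit(v):
        if v in num_paths:
            return num_paths[v]
        total = 0
        for w in cfg[v]:
            edge_val[(v, w)] = total
            total += visit(w)
        num_paths[v] = total
        return total

    visit(entry)
    return num_paths, edge_val

def all_paths(cfg, v="entry", exit_node="exit"):
    """Enumerate every entry-to-exit path of the acyclic CFG."""
    if v == exit_node:
        yield [v]
    for w in cfg[v]:
        for rest in all_paths(cfg, w, exit_node):
            yield [v] + rest

def run_path(path, edge_val, count):
    r = 0                               # r=0 at method entry
    for v, w in zip(path, path[1:]):
        r += edge_val[(v, w)]           # r=r+val (edges with value 0 need no code)
    count[r] += 1                       # count[r]++ at method exit
    return r

num_paths, edge_val = assign_edge_values(CFG)
count = defaultdict(int)
ids = sorted(run_path(p, edge_val, count) for p in all_paths(CFG))
print(num_paths["entry"], ids)          # prints: 4 [0, 1, 2, 3]
```

Only two edges end up with a nonzero value (2 and 1), which matches the two increments shown in the figure.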
Motivation for PEP

   Where have all the cycles gone?
    –   Computing the path number (r=0, r=r+2, r=r+1): cheap, <10%
    –   Updating the path profile (count[r]++): expensive, >90%
What PEP does

   All-the-time instrumentation: r=0, r=r+2, r=r+1
   Sampling (piggybacks on existing mechanism): SAMPLE r
   Overhead: 30% → 2%
Profile-guided profiling

   Existing edge profile informs path profiling [Joshi et al. ’04]
   Assign zero to hotter edges
    –   Zero-valued edges need no instrumentation
   From practical path profiling [Bond & McKinley ’05]
   [Figure: example control-flow graph with edge frequencies 30 and 70 at the first branch and 90 and 10 at the second. The colder edges receive values 2 and 1 (r=r+2, r=r+1), the hotter edges receive 0. r=0 at entry, SAMPLE r at exit.]
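A minimal sketch of the "assign zero to hotter edges" idea, assuming the edge frequencies from the figure (30/70 and 90/10) on a hypothetical two-branch graph. The node names and the helpers `successors` and `assign_hot_zero` are illustrative only. The sketch orders each block's outgoing edges by decreasing frequency before numbering, so the hottest edge always gets value 0.

```python
# Hypothetical CFG annotated with the edge frequencies from the figure.
EDGE_FREQ = {
    ("entry", "A"): 30, ("entry", "B"): 70,
    ("A", "join"): 30, ("B", "join"): 70,
    ("join", "C"): 90, ("join", "D"): 10,
    ("C", "exit"): 90, ("D", "exit"): 10,
}

def successors(edge_freq):
    """Build an adjacency list from the edge set."""
    succ = {}
    for v, w in edge_freq:
        succ.setdefault(v, []).append(w)
        succ.setdefault(w, [])
    return succ

def assign_hot_zero(edge_freq, entry="entry", exit_node="exit"):
    """Path numbering that visits the hottest outgoing edge first, so it gets 0."""
    succ = successors(edge_freq)
    num_paths = {exit_node: 1}
    edge_val = {}

    def visit(v):
        if v in num_paths:
            return num_paths[v]
        total = 0
        for w in sorted(succ[v], key=lambda w: -edge_freq[(v, w)]):
            edge_val[(v, w)] = total
            total += visit(w)
        num_paths[v] = total
        return total

    visit(entry)
    return edge_val

edge_val = assign_hot_zero(EDGE_FREQ)
# Only edges with a nonzero value need an increment of r.
instrumented = {e: v for e, v in edge_val.items() if v != 0}
print(instrumented)
```

With these frequencies the two instrumented edges are the cold ones (frequency 30 and frequency 10), so the hottest path runs without any increment of r.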
Implementation

   Jikes RVM
    –   High performance, Java-in-Java VM
    –   Adaptive compilation triggered by sampling
   Two compilers
    –   Baseline compiles at first invocation
            Adds instrumentation-based edge profiling
    –   Optimizer recompiles hot methods
            Three optimization levels
   PEP implemented in optimizing compiler
PEP sampling

   Piggybacks on existing thread-switching mechanism
   Jikes RVM inserts yieldpoints at loop headers and method entry & exit
    –   Yieldpoint handlers switch threads and update profiles
   PEP samples r at headers and method exit
    –   Updates path and edge profiles
   [Figure: three yieldpoints. The first is "if (flag) handler();". With PEP, the second and third become "if (flag) handler(r);".]
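The split between all-the-time instrumentation and sampled profile updates can be illustrated with a toy simulation. This is a sketch under stated assumptions, not Jikes RVM code: the `Sampler` class, its `period` parameter (which stands in for the timer that sets the flag), and `run_method` are hypothetical.

```python
import random
from collections import Counter

class Sampler:
    """Toy stand-in for the yieldpoint mechanism."""

    def __init__(self, period):
        self.period = period          # hypothetical: flag is set every `period` yieldpoints
        self.ticks = 0
        self.flag = False
        self.path_profile = Counter()

    def yieldpoint(self, r):
        self.ticks += 1
        if self.ticks % self.period == 0:
            self.flag = True          # stand-in for the timer interrupt
        if self.flag:                 # if (flag) handler(r);
            self.handler(r)

    def handler(self, r):
        self.path_profile[r] += 1     # profile is updated only when sampled
        self.flag = False

def run_method(sampler, take_first, take_second):
    r = 0                             # r=0 at method entry
    if take_first:
        r += 2                        # r=r+2
    if take_second:
        r += 1                        # r=r+1
    sampler.yieldpoint(r)             # SAMPLE r at method exit
    return r

rng = random.Random(0)
sampler = Sampler(period=100)
exact = Counter()                     # what full instrumentation (count[r]++) would record
for _ in range(10_000):
    exact[run_method(sampler, rng.random() < 0.3, rng.random() < 0.1)] += 1

print(sum(sampler.path_profile.values()), "samples out of", sum(exact.values()), "executions")
```

The path register is computed on every execution, but the comparatively expensive table update runs only for the sampled executions.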
Sampling methodology

   Classic
   Arnold-Grove sampling: invaluable for PEP
    –   Multiple samples per timer tick
    –   Strides to reduce timer-based bias
   PEP strides before first sample only, not between samples
    –   Timer goes off every 20 ms
    –   Stride 0-16 samples, then 64 consecutive samples
    –   PEP sampling rate: 3200 paths/second
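The sampling rate follows from the two numbers on the slide, as the short calculation below shows. The helper `sampled_indices` is hypothetical, and choosing the stride uniformly at random is an assumption; the slide only says the stride is between 0 and 16.

```python
import random

TIMER_PERIOD_MS = 20      # from the slide: timer goes off every 20 ms
SAMPLES_PER_TICK = 64     # from the slide: 64 consecutive samples
MAX_STRIDE = 16           # from the slide: stride 0-16 before the first sample

ticks_per_second = 1000 // TIMER_PERIOD_MS
paths_per_second = ticks_per_second * SAMPLES_PER_TICK
print(paths_per_second)   # 50 ticks/s * 64 samples/tick = 3200

def sampled_indices(rng):
    """Indices of the yieldpoints sampled after one tick (0 = first yieldpoint after the tick)."""
    stride = rng.randint(0, MAX_STRIDE)   # assumption: stride chosen uniformly at random
    # Stride once, then sample consecutively, with no stride between samples.
    return list(range(stride, stride + SAMPLES_PER_TICK))

idx = sampled_indices(random.Random(1))
```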
Execution methodology

   Adaptive: normal adaptive run
    –   Different behavior from run to run
   Replay: deterministic compilation decisions
    –   First run includes compilation
    –   Second run is application only [Eeckhout et al. 2003]
   [Figure: an adaptive run records optimization decisions, an edge profile, and a call graph profile. Replay execution uses them in a first run (includes compiler) and a second run (application only).]
Benchmarks and platform

   SPEC JVM98
   pseudojbb: SPEC JBB2000, fixed workload
   DaCapo Benchmarks
    –   Exclude hsqldb


   3.2 GHz Pentium 4 with Linux
    –   8K DL1, 12Kμop IL1, 512K L2, 1GB memory
Path accuracy

   Compare PEP’s path profile to perfect profile
   Wall weight-matching scheme [Wall ’91]
    –   Measures how well PEP predicts hot paths
   Branch-flow metric [Bond & McKinley ’05]
    –   Weights paths by their lengths
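A weight-matching comparison can be sketched as follows. This is one common formulation and an assumption about the details: the perfect profile's hot set is every path above a threshold, the sampled profile nominates the same number of paths, and the score is the actual flow of the nominated paths divided by the actual flow of the truly hot paths. Variants exist, for example counting only nominated paths that are also hot in the perfect profile. The function name, the `hot_fraction` parameter, and the example profiles are hypothetical.

```python
def wall_weight_matching(perfect, sampled, hot_fraction):
    """perfect, sampled: {path_number: frequency}. Returns a score in [0, 1]."""
    total = sum(perfect.values())
    hot_actual = [p for p, f in perfect.items() if f >= hot_fraction * total]
    n = len(hot_actual)
    if n == 0:
        return 1.0
    # The n paths that the sampled profile believes are hottest.
    hot_estimated = sorted(sampled, key=sampled.get, reverse=True)[:n]
    matched = sum(perfect.get(p, 0) for p in hot_estimated)
    return matched / sum(perfect[p] for p in hot_actual)

perfect = {0: 700, 1: 200, 2: 90, 3: 10}   # made-up exact path counts
sampled = {0: 8, 2: 2, 1: 1}               # made-up sampled path counts
score = wall_weight_matching(perfect, sampled, hot_fraction=0.15)
print(f"{score:.3f}")
```

Here the sampled profile ranks path 2 above path 1, so it captures 790 of the 900 units of hot flow.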

   [Figure: Path accuracy. Bar chart of accuracy (0% to 100%) per benchmark: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, pseudojbb, antlr, bloat, fop, jython, pmd, ps, xalan, Avg]
Edge accuracy

   Compare PEP’s edge profile to perfect profile
   Relative overlap
    –   Measures how well PEP predicts edge frequency
        relative to source basic block
    –   Jikes RVM uses relative frequencies only
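One way to compare edge profiles by relative frequency is sketched below. The exact formula is an assumption, not the metric as defined in the talk: each block's outgoing edge counts are normalized to fractions in both profiles, the per-block overlap is the sum of the smaller fraction on each edge, and blocks are weighted by their share of flow in the perfect profile. The function name and the example profiles are hypothetical.

```python
def relative_overlap(perfect, sampled):
    """perfect, sampled: {(src, dst): count}. Returns a score in [0, 1]."""
    def by_source(profile):
        out = {}
        for (src, dst), n in profile.items():
            out.setdefault(src, {})[dst] = n
        return out

    p_src, s_src = by_source(perfect), by_source(sampled)
    total_flow = sum(perfect.values())
    score = 0.0
    for src, p_out in p_src.items():
        p_total = sum(p_out.values())
        s_out = s_src.get(src, {})
        s_total = sum(s_out.values())
        if p_total == 0 or s_total == 0:
            continue  # a block missing from one profile contributes no overlap
        overlap = sum(min(n / p_total, s_out.get(dst, 0) / s_total)
                      for dst, n in p_out.items())
        score += (p_total / total_flow) * overlap
    return score

perfect = {("entry", "A"): 30, ("entry", "B"): 70, ("join", "C"): 90, ("join", "D"): 10}
sampled = {("entry", "A"): 4, ("entry", "B"): 6, ("join", "C"): 9, ("join", "D"): 1}
score = relative_overlap(perfect, sampled)
print(f"{score:.2f}")
```

Because only the fractions per source block matter, a sampled profile with far fewer total counts can still score highly, which fits a client that uses relative frequencies only.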

   [Figure: Edge accuracy. Bar chart of accuracy (0% to 100%) per benchmark: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, pseudojbb, antlr, bloat, fop, jython, pmd, ps, xalan, Avg]
Compilation overhead

   [Figure: bar chart of total execution time (0% to 100%) per benchmark, without PEP and with PEP. Benchmarks: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, pseudojbb, antlr, bloat, fop, jython, pmd, ps, xalan, Avg]
Execution overhead

   [Figure: bar chart of overhead (0% to 100%) per benchmark for two configurations, instrumentation only and instrumentation & sampling. Benchmarks: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, pseudojbb, antlr, bloat, fop, jython, pmd, ps, xalan, Avg]
Execution overhead

   [Figure: bar chart of overhead with the axis zoomed to 0% to 6%, per benchmark, for two configurations, instrumentation only and instrumentation & sampling. Benchmarks: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, pseudojbb, antlr, bloat, fop, jython, pmd, ps, xalan, Avg]
Related work

   Instrumentation [Ball & Larus ’96]
    –   High overhead
   One-time profiling [Jikes RVM baseline compiler]
    –   Vulnerable to phased behavior
   Sampling
    –   Code sampling [Anderson et al. ’00]
            No path profiling
    –   Code switching [Arnold & Ryder ’01, dynamic instrumentation]
   Hardware [Vaswani et al. ’05, Shye et al. ’05]
Summary

   Continuous & accurate profiling needed for
    aggressive, speculative dynamic optimization
   PEP: continuous, accurate, low overhead,
    portable, path profiling
Thank you!

   Questions?

				