Performance and Power Modeling: The Return of Synthetic Benchmarks


            Prof. Lizy K. John
            Email: ljohn@ece.utexas.edu
            www: http://users.ece.utexas.edu/~ljohn



Keynote Presentation
9th Annual Austin CAS Conference 2008
9th ACAS 2008 - Lizy K. John   7/7/2010
If cars were benchmarked like computers

The mileage chart might have looked like this:

  Mileage Chart
  I-10   25.125 mpg
  I-20   23.115 mpg
  I-80   24.997 mpg
  I-90   21.236 mpg
  I-35   17.433 mpg
  I-95   23.534 mpg
  I-75   28.758 mpg

And we would have asked questions like:
    – When you drive through Austin on I-35, how many truck accidents were there?
    – Was there a Longhorn game that day?
    – Did you drive on I-80 in the summer or winter?
    – Was it night or day when you crossed Chicago?

Imagine running cars on:
          I-10 (CA to FL)   2460 miles
          I-20 (TX to SC)   1540 miles
          I-80 (CA to NJ)   2900 miles
          I-90 (WA to MA)   3020 miles
          I-35 (TX to MN)   1570 miles
          I-95 (ME to FL)   1920 miles
          I-75 (FL to MI)   1790 miles

International Routes: Autobahn, Selangor Route 11, Japan's Hwy 142

Abstracting the Long Roads to: City, Highway
If we look at our computer performance
evaluation trends, we have been adding longer
and longer roads to the list.




The Resurrection of Synthetic Benchmarks

   Whetstone, Dhrystone
   Misuse of Synthetic Benchmarks
   Not many good memories
   A fresh look at a new kind of synthetic benchmark
   A few new scenarios
     – The Pre-Silicon Nightmare
     – Proprietary Applications that cannot be shared
     – Expiring Benchmarks
     – Power and Thermal Stress Benchmarks




The Pre-silicon Design Nightmare

   Modern microprocessors are built from millions of lines
    of VHDL/Verilog code (IBM POWER4, 1.5 million lines
    of VHDL, 174 million transistors)
   RTL and Performance Models are 1000X or more
    slower than hardware.
   Industry standard benchmarks run for hours on actual
    hardware.
   Design tradeoffs need to be analyzed quickly to make
    timely design decisions.
   If the wrong decisions are made, the pre-silicon
    nightmare becomes a serious post-silicon nightmare.
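To make the "1000X or more" slowdown concrete, the arithmetic below estimates how long one benchmark run takes on a slow RTL or performance model. The one-hour native run time and the 1000X and 10,000X factors are illustrative assumptions, not measurements from this talk: a one-hour run becomes about 42 days at 1000X and well over a year at 10,000X.

```python
def simulated_runtime_days(native_hours: float, slowdown: float) -> float:
    """Wall-clock days to run a workload on a model `slowdown` times slower than hardware."""
    return native_hours * slowdown / 24.0


# Illustrative assumption: a benchmark that runs for one hour on real hardware.
for slowdown in (1_000, 10_000):
    days = simulated_runtime_days(native_hours=1.0, slowdown=slowdown)
    print(f"{slowdown:>6,}x slower model: {days:,.1f} days")
```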




Synthesis Process

  Real Workload (program binary)
    → Measure Workload Characteristics at Basic Block Granularity
  Detailed Workload Profile: Instruction Mix, ILP, Data Access Patterns, branch patterns, ...
    [Figure: profile graph of basic blocks A, B, C, D connected by branches (BR) labeled with transition probabilities]
    → Generate Synthetic Clone with Similar Inherent Characteristics
  Synthetic Clone:
      ADD R1, R2, R3
      LD R4, R1, R6
      MUL R3, R6, R7
      ADD R3, R2, R5
      DIV R10, R2, R1
      SUB R3, R5, R6
      STORE R3, R10, R20
      ADD R1, R2, R3
      LD R4, R1, R6
      MUL R3, R6, R7
      ADD R3, R2, R5
      DIV R10, R2, R1
      SUB R3, R5, R1
      BEQ R3, R6, LOOP
      SUB R3, R5, R6
      STORE R3, R10, R20
      DIV R10, R2, R1
      ...
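The slide profiles the workload at basic-block granularity and then generates a clone with similar characteristics. A minimal sketch of one piece of that idea, assuming the control-flow part of the profile is a statistical flow graph (SFG) of basic-block transition probabilities: build the graph from a trace of executed block IDs, then random-walk it to choose the clone's block order. The function names and the toy trace are hypothetical; a real synthesizer also reproduces the instruction mix, dependencies and memory access patterns inside each block.

```python
import random
from collections import Counter, defaultdict


def build_sfg(block_trace):
    """Statistical flow graph: for each basic block, the probability of each
    successor block, estimated from a trace of executed block IDs."""
    counts = defaultdict(Counter)
    for src, dst in zip(block_trace, block_trace[1:]):
        counts[src][dst] += 1
    return {
        src: {dst: n / sum(succ.values()) for dst, n in succ.items()}
        for src, succ in counts.items()
    }


def synthesize_block_sequence(sfg, start, length, seed=0):
    """Random walk over the SFG; the emitted block order follows the
    transition probabilities measured on the original program."""
    rng = random.Random(seed)
    seq = [start]
    while len(seq) < length:
        succ = sfg.get(seq[-1])
        if not succ:  # block with no observed successor: jump back to the entry block
            seq.append(start)
            continue
        blocks, probs = zip(*succ.items())
        seq.append(rng.choices(blocks, weights=probs, k=1)[0])
    return seq


# Hypothetical trace of a loop: A -> (B or C) -> D -> back to A
trace = list("ABDACDABDABDABD")
sfg = build_sfg(trace)
print(sfg["A"])  # {'B': 0.8, 'C': 0.2}
clone_blocks = synthesize_block_sequence(sfg, "A", length=20)
print("".join(clone_blocks))
```

Sampling successors in proportion to measured frequencies keeps the clone's branch behavior statistically close to the original while letting the clone be far shorter.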
        Synthetic Workload Generation

  Workload Space of Inherent Program Characteristics
    Workload Signature = Point in Workload Space, along:
      Instruction Mix, Control Flow Behavior, Program Locality, Instruction Level Parallelism
  Modeling Workload Attributes into Synthetic Workload
    Workload Synthesizer → Synthetic Benchmark Clone
  Experiment Environment
    VHDL/Verilog Models, Performance Simulator

  Error in performance and power estimates obtained with synthetics: <10%
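A workload signature is a point along axes such as instruction mix and instruction-level parallelism. The sketch below computes two such coordinates from an instruction trace: opcode fractions, and mean register dependency distance as a simple ILP proxy. The trace format, the reading of each instruction as "OP dest, src1, src2" and the treatment of STORE as writing no register are assumptions for illustration, not the synthesizer's actual profiler.

```python
from collections import Counter


def workload_signature(trace):
    """Summarize an instruction trace as (instruction-mix fractions, mean dependency distance).

    trace: list of (opcode, dest_reg or None, [src_regs]) tuples.
    Dependency distance = how many instructions separate a register's producer
    from its consumer; longer distances usually leave more room for ILP.
    """
    n = len(trace)
    mix = {op: c / n for op, c in Counter(op for op, _, _ in trace).items()}

    last_writer = {}  # register -> index of its most recent producer
    distances = []
    for i, (_, dest, srcs) in enumerate(trace):
        for reg in srcs:
            if reg in last_writer:
                distances.append(i - last_writer[reg])
        if dest is not None:
            last_writer[dest] = i
    mean_distance = sum(distances) / len(distances) if distances else float("inf")
    return mix, mean_distance


# First block of the clone listing, read as "OP dest, src1, src2" (assumption);
# STORE is assumed to write no register.
trace = [
    ("ADD", "R1", ["R2", "R3"]),
    ("LD", "R4", ["R1", "R6"]),
    ("MUL", "R3", ["R6", "R7"]),
    ("ADD", "R3", ["R2", "R5"]),
    ("DIV", "R10", ["R2", "R1"]),
    ("SUB", "R3", ["R5", "R6"]),
    ("STORE", None, ["R3", "R10", "R20"]),
]
mix, dist = workload_signature(trace)
print(mix, dist)
```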
Overall Performance Estimation


[Figure: Cycles-Per-Instruction of each benchmark and its synthetic clone (y-axis 0 to 12) for applu, apsi, art, bzip2, crafty, eon, equake, gcc, gzip, mcf, mesa, mgrid, perlbmk, swim, twolf, vortex, vpr, wupwise]

Average Error of 6.3% & Maximum Error of 9.9% (mcf) [Bell05,06]
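Summary numbers like "average error" and "maximum error" are typically computed per benchmark and then aggregated. A minimal sketch, assuming the error is the relative difference between the clone's CPI and the original's; the exact error definition used in [Bell05,06] is not stated on the slide, and the benchmark names and values below are hypothetical, not read from the chart.

```python
def clone_errors(original, clone):
    """Relative error |clone - original| / original per benchmark, plus the
    average error and the worst case as a (benchmark, error) pair."""
    errors = {b: abs(clone[b] - original[b]) / original[b] for b in original}
    average = sum(errors.values()) / len(errors)
    worst = max(errors.items(), key=lambda item: item[1])
    return errors, average, worst


# Hypothetical CPI values, NOT taken from the chart.
original_cpi = {"bench_x": 1.00, "bench_y": 2.00, "bench_z": 4.00}
clone_cpi = {"bench_x": 1.05, "bench_y": 1.80, "bench_z": 4.00}
errors, average, worst = clone_errors(original_cpi, clone_cpi)
print(f"average {average:.1%}, maximum {worst[1]:.1%} ({worst[0]})")
```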




Overall Power Estimation


[Figure: Energy-Per-Instruction of each benchmark and its synthetic clone (y-axis 0 to 20) for applu, apsi, art, bzip2, crafty, eon, equake, gcc, gzip, mcf, mesa, mgrid, perlbmk, swim, twolf, vortex, vpr, wupwise]

Average Error of 7.3% & Maximum Error of 10.6% (mcf)
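Energy-per-instruction links the power chart to the CPI chart: EPI = average power × execution time / instruction count, and since execution time = instruction count × CPI / clock frequency, EPI = P × CPI / f. A small sketch of that identity; the 50 W, CPI 1.0 and 2 GHz inputs are illustrative assumptions, and the slide does not state the units or the power model behind its EPI axis.

```python
def energy_per_instruction(avg_power_w: float, cpi: float, freq_hz: float) -> float:
    """EPI in joules: P * t / N with t = N * CPI / f, so EPI = P * CPI / f."""
    return avg_power_w * cpi / freq_hz


# Illustrative assumptions: 50 W average power, CPI of 1.0, 2 GHz clock.
epi = energy_per_instruction(50.0, 1.0, 2e9)
print(f"{epi * 1e9:.1f} nJ per instruction")  # 25.0 nJ
```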




Five Orders of Magnitude Speedup!
  Benchmark    Speedup from Synthetic Benchmark Clone
  applu        22,300
  apsi         34,700
  art           4,500
  bzip2        12,800
  crafty       19,100
  eon           8,000
  equake       13,100
  gcc           4,600
  gzip         10,300
  mcf           6,100
  mesa         14,100
  mgrid        41,900
  swim         22,500
  twolf        34,600
  vortex       11,800
  vpr           8,400
  wupwise      34,900
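Because speedups are ratios, a geometric mean is the usual way to summarize a column like this one. The sketch recomputes the geometric mean and the range from the seventeen values in the speedup table; only the summary method is an editorial choice.

```python
import math

# Speedup of each synthetic clone over its original benchmark (values from the speedup table).
speedup = {
    "applu": 22_300, "apsi": 34_700, "art": 4_500, "bzip2": 12_800,
    "crafty": 19_100, "eon": 8_000, "equake": 13_100, "gcc": 4_600,
    "gzip": 10_300, "mcf": 6_100, "mesa": 14_100, "mgrid": 41_900,
    "swim": 22_500, "twolf": 34_600, "vortex": 11_800, "vpr": 8_400,
    "wupwise": 34_900,
}


def geometric_mean(values):
    """Geometric mean: the usual way to average ratios such as speedups."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


gm = geometric_mean(speedup.values())
slowest = min(speedup, key=speedup.get)
fastest = max(speedup, key=speedup.get)
print(f"geometric mean: {gm:,.0f}x ({math.log10(gm):.1f} orders of magnitude)")
print(f"range: {speedup[slowest]:,}x ({slowest}) to {speedup[fastest]:,}x ({fastest})")
```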


Sharing Proprietary Applications


 Military Applications cannot be shared with Vendors
 Many commercial applications cannot be shared freely between
  software and hardware developers

       Proprietary Applications → Performance/Power Clones


 Earlier Performance Testing

 Usage Scenario
   – Derive proxy applications from a set of workload characterizations
   – Proxies convey no proprietary information, but capture the execution behavior
     of XX’s applications
   – Proxy applications can be available very early in the project
   – Proxies focus on the computationally significant portions of code, not the small
     parts that are hardware-dependent (e.g., DMA)

 Early testing confirms performance, or discovers shortfalls in time to
  address them before final integration

      Platform-independent Workload Characterizations → Platform Performance Prediction & Analysis Engine → Workload-Derived Application Proxies




Proxy Applications can Transform Procurement Process

 Proxy applications can turn vendors into partners in achieving
  performance goals
   – Provides them with the means to directly measure performance
   – Places the iterative test/analyze process in vendor’s labs
   – Reveals no proprietary information




 Turn hardware performance tuning into a supplier task
   – Periodically capture application performance in a new set of proxies
   – Task supplier to re-tune hardware for best performance as applications mature




The Expiring Benchmarks


 SPEC89, SPEC92, SPEC95, SPEC CPU2000


 TPC-A, TPC-B, TPC-D


 Benchmarks become obsolete very quickly


 Benchmark makers inflate data sets so that
   benchmarks stay relevant for at least 4-5 years


 The lifetime of a benchmark has been no
   longer than the time it took to create it


Adaptable Benchmark Synthesis

  Application Behavior Space
    Workload Characteristics: Instruction Mix, Program Locality, Control Flow Behavior, Instruction Level Parallelism
  'Knobs' for Changing Program Characteristics
  Workload Synthesis Algorithm → Workload Synthesizer
    → Synthetic Benchmark (generated instruction sequence: ADD R1, R2, R3 / LD R4, R1, R6 / MUL R3, R6, R7 / ... / BEQ R3, R6, LOOP / ...)
  Compile and Execute on Real Hardware or RTL, or on an Execution-Driven Simulator
  Results: Scalable Benchmarks, Futuristic Benchmarks
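The "knobs" idea means a synthetic benchmark is generated from explicit parameters, so a characteristic can be changed without rewriting any application. A minimal sketch, assuming hypothetical knobs for instruction mix, loop-body size, memory footprint and stride (none of these names come from the actual synthesizer), that emits a pseudo-assembly loop body and derives a "futuristic" variant by turning a single knob.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Knobs:
    """Hypothetical knob settings for a synthetic benchmark (illustration only)."""
    mix: tuple            # ((opcode, fraction), ...); fractions should sum to 1
    block_size: int       # instructions per generated loop body
    working_set_kb: int   # memory footprint swept by the loads and stores
    stride_bytes: int     # distance between consecutive memory accesses


def loop_body(k: Knobs):
    """Emit a pseudo-assembly loop body whose opcode counts follow k.mix.
    Rounding can make the body slightly shorter or longer than block_size, and a
    real synthesizer would also order instructions to hit dependency targets."""
    body = [op for op, frac in k.mix for _ in range(round(frac * k.block_size))]
    header = f"; sweep {k.working_set_kb * 1024} bytes, stride {k.stride_bytes}"
    return [header, *body, "BNE LOOP"]


base = Knobs(mix=(("ADD", 0.5), ("LD", 0.25), ("MUL", 0.25)),
             block_size=8, working_set_kb=256, stride_bytes=64)
# Turning one knob: same instruction mix, 16x larger footprint for a "futuristic" variant.
future = replace(base, working_set_kb=base.working_set_kb * 16)
print("\n".join(loop_body(future)))
```

A frozen dataclass plus `dataclasses.replace` keeps each knob setting immutable, so scaled variants never disturb the baseline configuration.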




 Power and Thermal Stress Benchmarks [Joshi, HPCA2008]
Automatic identification of test sequences that cause power and thermal stress

  1. BenchMaker (Genetic Algorithms): Workload Characteristics → Intermediate Test
  2. Performance Model / RTL / Real Hardware executes the intermediate test
  3. Evaluator – Quality of stressmark & Simulation Budget
  4. Workload Space Exploration over the Application Behavior Space
  Result: stressmark (Power Virus, Thermal Virus), an instruction sequence such as
      ADD R1, R2, R3 / LD R4, R1, R6 / MUL R3, R6, R7 / ... / BEQ R3, R6, LOOP / ...
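BenchMaker is described as using genetic algorithms to explore the workload space. A minimal sketch of such a search loop, assuming a genome of four knob values in [0, 1] and a made-up `toy_power` function standing in for the real evaluator (performance model, RTL or hardware measurement). The operators here (tournament selection, uniform crossover, Gaussian mutation, elitism) are generic choices, not necessarily the ones BenchMaker uses; `generations * pop_size` plays the role of the simulation budget.

```python
import random

# Hypothetical genome: four knob values in [0, 1]
#   (fraction of FP ops, fraction of memory ops, ILP level, branch predictability)
GENES = 4


def toy_power(g):
    """Made-up stand-in for the evaluator. Rewards high ILP with FP activity and
    predictable branches, and prefers a moderate fraction of memory operations."""
    fp, mem, ilp, pred = g
    return fp * ilp + 0.5 * pred * ilp + mem * (1.0 - mem)


def evolve(fitness, generations=30, pop_size=20, mut_sigma=0.1, seed=1):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(GENES)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        best_history.append(fitness(scored[0]))
        next_pop = [scored[0]]  # elitism: the best candidate always survives
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(scored, 3), key=fitness)
            p2 = max(rng.sample(scored, 3), key=fitness)
            # Uniform crossover, then Gaussian mutation clamped to [0, 1].
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            child = [min(1.0, max(0.0, x + rng.gauss(0.0, mut_sigma))) for x in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness), best_history


best, history = evolve(toy_power)
print(f"best knobs {[round(x, 2) for x in best]}, score {toy_power(best):.3f}")
```

Elitism guarantees that the best score never decreases between generations, which matters when each evaluation is an expensive simulation.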




Synthetic Benchmarks
  – Performance and Power Clones for RTL
  – Clones for Proprietary Applications
  – Stress Benchmarks
  – Benchmarks with Knobs, Adjustable Benchmarks, Scalable Benchmarks
  – Futuristic Benchmarks




Weaknesses – The Vitamin Tablet Problem




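  Vitamin A, B, C, D, E, K

  What about undiscovered vitamins? Vitamin P, Q, R?
  A synthetic clone reproduces only the characteristics that are explicitly modeled; characteristics nobody has identified yet are not captured.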

Other Applications of Workload Characterization

  Benchmark Similarity/Dissimilarity Studies
  Performance Prediction
  Benchmark Subsetting




One of our Recent Contributions
Workload Similarity/Dissimilarity Analysis for Benchmark
  Subsetting and Performance Prediction (used during
  selection of SPEC CPU2006)

[Figure: programs plotted in the space of Performance Characteristic 1 vs. Performance Characteristic 2 (illustration, not from real data)]
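Similarity analysis treats each program as a point in a space of performance characteristics; programs that lie close together are redundant for benchmarking. A minimal sketch of measuring that closeness, assuming each characteristic is standardized (z-score) before taking Euclidean distances, so that characteristics in different units weigh equally. The program names and values are hypothetical.

```python
import math


def zscore_columns(rows):
    """Standardize each characteristic (column) to mean 0 and standard deviation 1,
    so characteristics measured in different units weigh equally."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
        stats.append((mean, std))
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def distance_matrix(names, rows):
    """Euclidean distance between every pair of programs in standardized space."""
    z = dict(zip(names, zscore_columns(rows)))
    return {(a, b): math.dist(z[a], z[b]) for a in names for b in names}


# Hypothetical programs, each described by two characteristics
# (say, a cache miss rate and an average basic-block size).
names = ["prog_a", "prog_b", "prog_c"]
rows = [[1.0, 100.0], [1.1, 110.0], [3.0, 20.0]]
d = distance_matrix(names, rows)
print(f"a-b: {d['prog_a', 'prog_b']:.2f}, a-c: {d['prog_a', 'prog_c']:.2f}")
```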



Principal Components Analysis (PCA)


– Reduce dimensionality when there are many characteristics

– PCA is useful here

– It removes correlation between program characteristics

– Principal Components (PCs) are linear combinations of the original characteristics:

  $$PC_1 = a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots$$
  $$PC_2 = a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots$$
  $$PC_3 = a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \cdots$$

– Var(PC1) > Var(PC2) > ...
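A dependency-free sketch of the PCA step: build the covariance matrix of the characteristics and extract the leading eigenvectors by power iteration with deflation. Power iteration is used only to keep the example self-contained; in practice a library eigensolver or SVD would do this, and characteristics are often standardized first. Each returned loading vector holds the coefficients a_i1, a_i2, ... of PC_i, and the variances come out largest first, matching Var(PC1) > Var(PC2) > ...

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix of the characteristics (rows = programs)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(x[i] * x[j] for x in X) / (n - 1) for j in range(d)] for i in range(d)]


def principal_components(cov, k, iters=500, seed=0):
    """First k principal components by power iteration with deflation.

    Returns (variance, loadings) pairs, largest variance first; loadings[j]
    is the coefficient a_ij that multiplies characteristic x_j in PC_i.
    """
    rng = random.Random(seed)
    d = len(cov)
    C = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left to explain
                break
            v = [x / norm for x in w]
        var = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append((var, v))
        # Deflate: subtract this component so the next pass finds the next one.
        C = [[C[i][j] - var * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


# Hypothetical data: 4 programs x 3 characteristics.
data = [[0.9, 12.0, 0.30], [1.1, 14.5, 0.25], [2.0, 30.0, 0.10], [1.6, 22.0, 0.18]]
for var, loadings in principal_components(covariance(data), k=2):
    print(f"variance {var:.3f}, loadings {[round(a, 3) for a in loadings]}")
```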



Four Generations of SPEC CPU benchmarks

[Figure: SPEC CPU programs plotted on PC1 and PC2 (97%) of instruction locality characteristics: instruction temporal locality on the x-axis (good to poor), instruction spatial locality on the y-axis (good to poor)]
CPU2000 compared to CPU2006 [ISCA07]




CPU2006 covers a broader characteristic space than CPU2000.




Subsetting CPU2006 integer programs [ISCA07]




   Subset of four programs   400.perlbench, 462.libquantum, 473.astar, 483.xalancbmk
   Subset of six programs    400.perlbench, 471.omnetpp, 429.mcf, 462.libquantum, 473.astar, 483.xalancbmk
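Subsets like these come from grouping similar programs and keeping one representative per group. A minimal sketch, assuming k-means clustering on already reduced (for example PCA) coordinates, seeded with deterministic farthest-first initialization, and choosing the program nearest each centroid. The [ISCA07] study's exact clustering procedure is not given on this slide, so treat this as one plausible approach; the program names and coordinates are hypothetical.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=100):
    """Plain k-means (Lloyd's algorithm)."""
    centroids = farthest_first(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        updated = [
            [sum(xs) / len(cl) for xs in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if updated == centroids:  # assignments have stabilized
            break
        centroids = updated
    return centroids


def pick_subset(names, points, k):
    """Keep the program closest to each centroid as its cluster's representative
    (may return fewer than k names if two centroids share a nearest program)."""
    subset = set()
    for c in kmeans(points, k):
        nearest = min(range(len(points)), key=lambda i: math.dist(points[i], c))
        subset.add(names[nearest])
    return sorted(subset)


# Hypothetical programs in a 2-D (e.g. PCA-reduced) characteristic space.
names = ["prog_1", "prog_2", "prog_3", "prog_4", "prog_5", "prog_6"]
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8), (9.0, 0.0), (8.8, 0.3)]
print(pick_subset(names, points, k=3))
```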

                                                                                                          Speedup




                                                                                          0
                                                                                              2
                                                                                                  4
                                                                                                      6
                                                                                                          8
                                                                                                              10
                                                                                                                   12
                                                                                                                        14
Validation of Integer Subset Using Speedup
[Figure: speedup (y axis, up to 20) of eight machines: AMD Tyan Thunder K8E; Apple iMac (2.0GHz Intel Core Duo); Fujitsu SC CELSIUS M440; Intel DG965WH motherboard; Sun Blade 2000; Bull SAS NovaScale B260 (Intel Xeon E5320, 1.86GHz); Dell Precision 690 (Intel Xeon 5160, 3.00 GHz); HP Integrity BL860c (1.6GHz Dual-Core Intel Itanium 2). Series: using all benchmarks, based on the subset of 6, based on the subset of 4. Average speedup error: 3.8% (K=6), 5.8% (K=4).]
Validation of FP Subset Using Speedup
[Figure: speedup (y axis, 0 to 20) of eight machines: HP Integrity rx6600 (Itanium 2); Fujitsu SC CELSIUS V830; Apple 2.0GHz Intel Core Duo; Dell Precision 380 Workstation; Sun Fire X2100; Bull SAS NovaScale B280 (Intel Xeon 5150, 2.66GHz); IBM BladeCenter LS41 (AMD Opteron 8220); Intel DG965WH motherboard (3.6GHz Pentium D). Series: using all benchmarks, based on the subset of 8, based on the subset of 6. Average speedup error: 7% (K=6), 10.8% (K=4).]
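The two charts report an "average speedup error" for each subset without defining it on the slide. A minimal sketch, assuming a machine's speedup is the geometric mean of its per-benchmark speedups (the usual SPEC convention) and that the error is the relative difference between the subset-based and full-suite speedups, averaged over machines. The function names and the data are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers (the usual SPEC-style summary)."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def speedup_error(per_benchmark, subset):
    """Relative error (%) of the subset's GM speedup vs. the full-suite GM speedup.

    per_benchmark: dict benchmark name -> speedup on one machine (hypothetical)
    subset: names of the benchmarks kept in the subset
    """
    full = geometric_mean(per_benchmark.values())
    sub = geometric_mean(per_benchmark[b] for b in subset)
    return abs(sub - full) / full * 100.0


def average_speedup_error(machines, subset):
    """Assumed definition: mean of the per-machine relative errors."""
    errors = [speedup_error(m, subset) for m in machines]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical per-benchmark speedups on two machines.
    machines = [
        {"a": 10.0, "b": 12.0, "c": 8.0, "d": 14.0},
        {"a": 15.0, "b": 16.0, "c": 11.0, "d": 18.0},
    ]
    err = average_speedup_error(machines, ["a", "d"])
    print(f"average speedup error: {err:.2f}%")
```

A subset that contains every benchmark gives zero error by construction; the smaller the subset, the more the geometric mean can drift.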
               PERFORMANCE PREDICTION


Perf Prediction Using Workload Characteristics

• Predict the performance of application a1 using the known performance of benchmarks b1, b2, b3, b4

• K-nearest neighbors are used for prediction

• Each neighbor gets a weight that is inversely proportional to the distance of the application from that benchmark




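The prediction step above can be sketched in a few lines. This is a minimal sketch, assuming each program is described by a vector of already normalized workload characteristics and that distance is Euclidean; the function name `knn_predict`, the value of k and the toy vectors for a1 and b1 to b4 are illustrative, not taken from the study.

```python
import math


def knn_predict(app, benchmarks, scores, k=3):
    """Predict a score for `app` from its k nearest benchmarks.

    app:        characteristic vector of the application (e.g. a1)
    benchmarks: characteristic vectors of benchmarks with known scores (b1, b2, ...)
    scores:     the benchmarks' measured performance scores, same order
    Each neighbor is weighted by 1 / distance, so closer benchmarks count more.
    """
    nearest = sorted((math.dist(app, b), s) for b, s in zip(benchmarks, scores))[:k]
    if nearest[0][0] == 0.0:
        # Identical characteristics: 1/distance would be infinite, reuse that score.
        return nearest[0][1]
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * s for w, (_, s) in zip(weights, nearest)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical normalized characteristics for b1..b4 and their known scores.
    b = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    scores = [1.0, 2.0, 3.0, 10.0]
    a1 = [0.1, 0.0]
    print(knn_predict(a1, b, scores, k=2))
```

With k=2, a1 sits at distance 0.1 from b1 and 0.9 from b2, so b1's score receives nine times the weight of b2's.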
CONCLUDING REMARKS


• Synthesis for multicore/multiprocessor evaluation
• Not all rosy, but lots of potential
• Workload characterization
• Excited about the opportunities

• Thanks to IBM: Mark Papermaster, Steve Stevens, Alex Mericas, Rob Bell Jr., Alan MacKay, Ann Marie Maynard, Tom Keller, Kevin Nowka




CONCLUDING REMARKS


• Thanks to LCA graduates, especially those at IBM: Juan Rubio, Madhavi Valluri, Rob Bell Jr., Pattabi Seshadri

• Thanks to my current students




Backup slides




                    Workload Synthesis



Miniature Benchmarks for RTL Model Validations
• Early Power/Performance Estimates
• Simulating RTL models with SPEC benchmarks is practically impossible
• Distill workload characteristics into miniature clones
• “Automatic Testcase Synthesis and Performance Model Validation for High Performance PowerPC Processors”, Bell, Bhatia, John, Stuecheli, Griswell, Tu, Capps, Blanchard, and Thai, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2006.
• “Automatic Stressmark Generation”, HPCA 2008

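To make "distill workload characteristics into miniature clones" concrete, here is a deliberately simplified sketch that reproduces only one characteristic, the instruction mix, in a short loop body. A realistic clone would also need to reproduce other characteristics (for example dependency distances, branch behavior and memory access patterns). The functions, instruction classes and mix values are hypothetical, and the mix fractions are assumed to sum to 1.

```python
import math


def allocate(mix, length):
    """Largest-remainder rounding of mix fractions to counts that sum to length."""
    raw = {op: frac * length for op, frac in mix.items()}
    counts = {op: math.floor(v) for op, v in raw.items()}
    leftover = length - sum(counts.values())
    for op in sorted(raw, key=lambda o: raw[o] - counts[o], reverse=True)[:leftover]:
        counts[op] += 1
    return counts


def synthesize_clone(mix, length):
    """Return a loop body of `length` instruction classes matching `mix`.

    Smooth weighted round-robin spreads each class evenly through the body
    instead of emitting all loads, then all ALU ops, and so on.
    """
    counts = allocate(mix, length)
    total = sum(counts.values())
    credit = {op: 0 for op in counts}
    body = []
    for _ in range(total):
        for op, c in counts.items():
            credit[op] += c
        pick = max(credit, key=credit.get)
        credit[pick] -= total
        body.append(pick)
    return body


if __name__ == "__main__":
    # Hypothetical instruction mix measured from a long-running workload.
    mix = {"load": 0.25, "store": 0.10, "alu": 0.50, "branch": 0.15}
    print(" ".join(synthesize_clone(mix, 20)))
```

Over one pass of the round-robin each class is picked exactly as many times as its allocated count, so a 20-instruction body keeps the measured proportions while being small enough to simulate on an RTL model.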
The Expiring Benchmarks


• SPEC89, SPEC92, SPEC95, SPEC CPU2000


• TPC-A, TPC-B, TPC-D


• Benchmarks become obsolete very quickly


• Benchmark makers create inflated data sets so that benchmarks stay alive for at least 4-5 years


• The lifetime of a benchmark has been no longer than the time it took to create it


Fast Subsetting to form CPU2006 suite




Redundancy in Benchmark Suites:
A Quote from Microprocessor Report


“...SPEC CPU suites have tended to contain too many programs ... whose inclusion or exclusion makes little or no difference to the overall score achieved.”




    H. McGhan, Microprocessor Report, October 2006




Workload Similarity/Dissimilarity
[Figure: program similarity analysis, using principal component analysis and clustering techniques, produces a workload space analysis that is applied to subsetting and performance prediction.]
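The subsetting step can be illustrated with a small clustering sketch. It assumes the programs' characteristics have already been normalized and, if desired, reduced with principal component analysis; k-means with farthest-point initialization is a stand-in chosen here for illustration, since the slide only says "clustering techniques". The program names and vectors are hypothetical.

```python
import math


def mean(points):
    """Component-wise mean of a list of vectors."""
    return [sum(c) / len(points) for c in zip(*points)]


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = assign(points, centroids)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(mean(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return assign(points, centroids), centroids


def choose_subset(names, points, k):
    """From each cluster, keep the program closest to the cluster centroid."""
    labels, centroids = kmeans(points, k)
    subset = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            best = min(members, key=lambda i: math.dist(points[i], centroids[j]))
            subset.append(names[best])
    return subset


if __name__ == "__main__":
    # Hypothetical 2-D workload-space coordinates for six programs.
    names = ["p1", "p2", "p3", "p4", "p5", "p6"]
    points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    print(choose_subset(names, points, k=2))
```

Programs that land in the same cluster behave similarly in the workload space, so keeping one representative per cluster removes the redundancy the Microprocessor Report quote complains about.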
Ph.D.s in the past
• I have been fortunate enough to work with several highly talented Ph.D. students, who are currently at:
  – Deepu Talla (Texas Instruments)
  – Ravi Bhargava (AMD)
  – Tao Li (University of Florida)
  – Juan Rubio (IBM-ARL)
  – Madhavi Valluri (IBM)
  – Ramesh Radhakrishnan (Dell)
  – Yue Luo (Microsoft)
  – Byeong Kil Lee (Texas Instruments)
  – Shiwen Hu (Freescale)
  – Robert H. Bell, Jr. (IBM)
  – Aashish Phansalkar (Marvell)
  – Ajay Joshi (ARM)

Example Ph.D. Dissertations
Microarchitectural Techniques to Enable Efficient Java Execution
Architectural Techniques to Accelerate Multimedia Applications on General-Purpose Processors
Instruction History Management for High-Performance Microprocessors
OS-aware Architecture for Improving Microprocessor Performance and Energy Efficiency
A Hierarchical Computing Model for a Commercial Server
A Hybrid-Scheduling Approach for Energy-Efficient Superscalar Processors
Network Processor Design: Benchmarks and Architectural Alternatives
Efficient Adaptation of Multiple Microprocessor Resources for Energy Reduction Using Dynamic Optimization
Automatic Workload Synthesis for Early Design Studies and Performance Model Validation
Measuring Program Similarity for Efficient Benchmarking and Performance Evaluation of Computer Systems
Constructing Adaptable and Scalable Benchmarks for Microprocessor Performance Evaluation
Current Research Topics

• Synthetic Benchmarks for Multi-cores/multithreaded processors, Portable Synthetic Benchmarks (Ganesan, Jung-ho)
• Complete System Power Measurement, Modeling and Management (Bircher)
• Embedded Java Characterization and Acceleration (Isen)
• QoS in Highly Threaded Processors (Kaseridis)
• Caches for Highly Threaded Processors (Stuecheli)
• Application Mapping to Highly Threaded Processors Utilizing Workload Characteristics (Chen)

                     PERFORMANCE PREDICTION

Data Transformation Techniques
• Not all of the measured microarchitecture-independent characteristics are equally important
  – Choose appropriate characteristics
  – Find weights for each characteristic


• Four different techniques
  – Equal Weights (EW) (baseline)
  – Choosing characteristics based on their correlation to performance (COR)
  – Principal Components Analysis (PCA)
  – Genetic Algorithm (GA)


                    PERFORMANCE PREDICTION

Data Transformation Techniques

• EW
  – All the characteristics are normalized and form the workload space
  – It is the first step for all the other techniques


• COR
  – Characteristics are chosen based on their correlation coefficient with the performance scores and form the workload space
  – This is equivalent to giving each characteristic a weight of 0 or 1


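The EW and COR steps can be sketched as follows. This is a minimal sketch, assuming z-score normalization for EW and Pearson correlation for COR; the correlation threshold of 0.5 and the toy data are hypothetical, since the slide does not say how the cut-off was chosen.

```python
import math


def zscore_columns(data):
    """EW step: normalize each characteristic (column) to mean 0, std 1."""
    columns = []
    for col in zip(*data):
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col))
        columns.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*columns)]


def pearson(x, y):
    """Pearson correlation coefficient (0.0 if either input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def cor_weights(data, scores, threshold=0.5):
    """COR step: weight 1 if |correlation with the scores| >= threshold, else 0."""
    return [1 if abs(pearson(col, scores)) >= threshold else 0 for col in zip(*data)]


if __name__ == "__main__":
    # Rows: benchmarks; columns: hypothetical characteristics.
    data = [[1.0, 5.0, 0.2], [2.0, 3.0, 0.9], [3.0, 4.0, 0.1], [4.0, 1.0, 0.5]]
    scores = [10.0, 20.0, 30.0, 40.0]
    w = cor_weights(data, scores)
    # Workload space: the normalized columns whose COR weight is 1.
    space = [[v for v, keep in zip(row, w) if keep] for row in zscore_columns(data)]
    print(w, space)
```

Using the absolute value keeps strongly negative correlations too, because a characteristic that falls as performance rises is just as informative.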
                  PERFORMANCE PREDICTION

Data Transformation Techniques
• PCA
  – Only the top few principal components (PCs) are retained
  – The chosen PCs form the workload space
  – Characteristics that show higher variance get more weight


• GA
  – Search algorithm based on the theory of evolution
  – Used to find the best set of weights using the performance scores of the benchmarks
  – The weights are then applied to each characteristic, and the transformed data is used to form the workload space


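A GA weight search of the kind described above might look like this. It is a minimal sketch, assuming fitness is the negative mean absolute leave-one-out error of inverse-distance k-NN under weighted Euclidean distance, with truncation selection, one-point crossover and random-reset mutation; all parameter values and the toy data are hypothetical, not the settings used in the study.

```python
import math
import random


def weighted_dist(x, y, w):
    """Euclidean distance with a non-negative weight per characteristic."""
    return math.sqrt(sum(wi * (a - b) ** 2 for a, b, wi in zip(x, y, w)))


def loo_error(data, scores, w, k=2):
    """Mean absolute leave-one-out error of inverse-distance k-NN under weights w."""
    total = 0.0
    for i, (x, s) in enumerate(zip(data, scores)):
        nearest = sorted((weighted_dist(x, y, w), t)
                         for j, (y, t) in enumerate(zip(data, scores)) if j != i)[:k]
        if nearest[0][0] < 1e-12:
            pred = nearest[0][1]
        else:
            inv = [1.0 / d for d, _ in nearest]
            pred = sum(v * t for v, (_, t) in zip(inv, nearest)) / sum(inv)
        total += abs(pred - s)
    return total / len(data)


def ga_weights(data, scores, pop_size=20, generations=30, mut_rate=0.2, seed=0):
    """Evolve per-characteristic weights in [0, 1] that minimize the LOO error."""
    rng = random.Random(seed)
    n = len(data[0])

    def fitness(w):
        return -loo_error(data, scores, w)

    # Start from the equal-weights (EW) vector plus random vectors; with elitism
    # the best individual can therefore never be worse than EW.
    pop = [[1.0] * n] + [[rng.random() for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # selection: keep the fitter half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally replace a gene with a fresh random weight.
            child = [rng.random() if rng.random() < mut_rate else g for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    # Hypothetical data: characteristic 0 tracks the score, characteristic 1 is noise.
    data = [[0.0, 5.0], [1.0, 1.0], [2.0, 4.0], [3.0, 0.0], [4.0, 3.0], [5.0, 2.0]]
    scores = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
    best = ga_weights(data, scores)
    print(best, loo_error(data, scores, best), loo_error(data, scores, [1.0, 1.0]))
```

Seeding the population with the EW vector makes EW a floor rather than a competitor: the GA either finds weights that predict better on the training benchmarks or returns something at least as good as EW.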
                   PERFORMANCE PREDICTION

Experiments for Performance Prediction

• Experimental Setup
  – Prediction of machine ranks
  – Prediction of CPI
  – Prediction of data cache hit rate


• Procedure
  – Leave-one-out validation method
  – Use k-nearest neighbors to predict the performance score




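The leave-one-out procedure can be sketched as follows: each program in turn is held out and its metric (CPI here) is predicted from all the remaining programs. This is a minimal sketch, assuming the inverse-distance k-NN predictor and Euclidean distance; the names, the one-dimensional characteristics and the CPI values are hypothetical.

```python
import math


def knn_predict(x, train_x, train_y, k=2):
    """Inverse-distance-weighted k-NN (an exact match returns its score)."""
    nearest = sorted((math.dist(x, t), y) for t, y in zip(train_x, train_y))[:k]
    if nearest[0][0] == 0.0:
        return nearest[0][1]
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def leave_one_out(names, chars, target, k=2):
    """Predict each program's target metric from all the *other* programs."""
    predictions = {}
    for i, name in enumerate(names):
        train_x = chars[:i] + chars[i + 1:]
        train_y = target[:i] + target[i + 1:]
        predictions[name] = knn_predict(chars[i], train_x, train_y, k)
    return predictions


if __name__ == "__main__":
    # Hypothetical one-dimensional characteristics and measured CPI values.
    names = ["p1", "p2", "p3", "p4"]
    chars = [[0.0], [1.0], [2.0], [10.0]]
    cpi = [1.0, 1.2, 1.4, 3.0]
    pred = leave_one_out(names, chars, cpi)
    mae = sum(abs(pred[n] - c) for n, c in zip(names, cpi)) / len(names)
    print(pred, f"mean absolute error = {mae:.3f}")
```

The held-out program never contributes to its own prediction, so the error measures how well the workload space generalizes to an unseen application.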
                                            PERFORMANCE PREDICTION

Prediction of Machine Ranks Using Speedup

[Figure: average rank correlation coefficient (y axis, 0.86 to 0.93) for each data transformation technique: GM, EW, COR, PCA, GA.]

• A rank correlation coefficient close to 1 means the predicted ranks are close to the actual ranks
• GA shows a better rank correlation coefficient than any other data transformation technique
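The slides do not say which rank correlation coefficient is used. A minimal sketch, assuming Spearman's rank correlation computed as the Pearson correlation of the two rank vectors (tied values share their average rank); the machine speedups are hypothetical.

```python
def ranks(values):
    """Rank values 1..n (1 = largest, e.g. fastest machine); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for m in range(i, j + 1):
            r[order[m]] = avg
        i = j + 1
    return r


def spearman(actual, predicted):
    """Spearman's rank correlation: Pearson correlation of the two rank vectors."""
    ra, rp = ranks(actual), ranks(predicted)
    n = len(ra)
    ma, mp = sum(ra) / n, sum(rp) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(ra, rp))
    va = sum((a - ma) ** 2 for a in ra)
    vp = sum((p - mp) ** 2 for p in rp)
    return cov / (va * vp) ** 0.5


if __name__ == "__main__":
    # Hypothetical actual vs. predicted speedups of five machines.
    actual = [12.0, 8.5, 15.2, 9.9, 11.0]
    predicted = [11.5, 9.0, 14.0, 10.9, 10.8]
    print(spearman(actual, predicted))  # prints 0.9
```

Only the ordering matters: here the prediction swaps the 3rd and 4th machines, which costs 0.1 of correlation even though every predicted speedup is off by a different amount.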
                                                                              PERFORMANCE PREDICTION

Prediction of Machine Ranks Using Speedup

[Figure: rank correlation coefficient for individual programs (y axis, 0.6 to 1) under GM, EW, COR, PCA and GA, for the SPEC CPU2000 programs ammp, applu, apsi, art, bzip2, crafty, eon, equake, fma3d, galgel, gcc, gzip, lucas, mcf, mesa, mgrid, parser, swim, twolf, vortex, vpr, wupwise and their average (AVG).]

				