execution time

					Computing Systems

Assessing and Understanding
Performance
   Measure, Report, and Summarize performance
   Make intelligent choices
   See through the marketing hype
   Key to understanding underlying organizational motivation

    Why is some hardware better than others for different programs?

    What factors of system performance are hardware related?
    (e.g., Do we need a new machine, or a new operating system?)

    How does the machine's instruction set affect performance?

   We are looking for metrics for measuring performance from the
    viewpoint of both a computer user and a designer
Defining performance
 Airplane          Passenger   Range     Speed      Passenger throughput
                   capacity    (miles)   (m.p.h.)   (passengers x m.p.h.)
 Boeing 777        375         4630       610       228750
 Boeing 747        470         4150       610       286700
 Concorde          132         4000      1350       178200
 Douglas DC-8-50   146         8720       544        79424

• How much faster is the Concorde compared to the 747 ?
• Is the Concorde faster compared to the 747 ?
• How much bigger is the 747 than the Douglas DC-8?
• Which of these airplanes has the best performance?
Understanding performance

   The performance of a program
    depends on:
     - the algorithm,
     - the language,
     - the compiler,
     - the architecture
     - the actual hardware
           Computer performance:
            Time, Time, Time !!!
   Response Time = Execution Time = Latency
     - The time between the start and completion of a task
   Throughput
     - Total amount of work completed in a given time

    If we upgrade a machine with a new faster
    processor what do we increase?
    If we add a new processor to a system that uses
    multiple processors what do we increase?
Execution Time
   Execution Time (response time or elapsed time)
      total time to complete a program; it counts everything
       (disk accesses, memory accesses, input/output, etc.)
      a useful number, but often not good for comparison

   CPU (execution) time
      doesn't count time spent waiting for I/O or time spent
       running other programs
      can be broken up into system time (CPU time spent in
       the OS), and user time (CPU time spent in the program)

   Our focus: user CPU time
      time spent executing the lines of code that are "in" our
       program
Book’s definition of performance

- For some program running on machine X,

    $$\text{Performance}_X = \frac{1}{\text{ExecutionTime}_X}$$

- "X is n times faster than Y"

    $$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{ExecutionTime}_Y}{\text{ExecutionTime}_X} = n$$

Problem (Relative Performance) :
    – machine A runs a program in 10 seconds
    – machine B runs the same program in 25 seconds
    – How much faster is machine A compared to B?
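
The relative-performance problem can be worked through with a few lines of Python. This is a minimal sketch; the function name `relative_performance` is an illustrative choice and is not taken from the slides.

```python
def relative_performance(exec_time_x: float, exec_time_y: float) -> float:
    """Return n such that machine X is n times faster than machine Y.

    Uses Performance_X / Performance_Y = ExecutionTime_Y / ExecutionTime_X.
    """
    return exec_time_y / exec_time_x


# Machine A takes 10 s and machine B takes 25 s for the same program.
n = relative_performance(exec_time_x=10.0, exec_time_y=25.0)
print(f"A is {n} times faster than B")  # prints "A is 2.5 times faster than B"
```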
      Measuring Time

   Instead of reporting execution time in seconds, we often use
    cycles (= clock cycles = ticks = clock ticks = clocks = clock
    periods)

    $$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

   Clock “ticks” indicate when to start activities:


   cycle time = time between ticks = seconds per cycle
   clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)

    Example: a 4 GHz clock has a 250 ps cycle time
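
The conversion between clock rate and cycle time is a simple reciprocal. The sketch below checks the 4 GHz example; the helper name `cycle_time_ps` is hypothetical.

```python
def cycle_time_ps(clock_rate_hz: float) -> float:
    """Clock cycle time in picoseconds for a clock rate given in hertz."""
    # 1 second = 1e12 picoseconds, and cycle time = 1 / clock rate.
    return 1e12 / clock_rate_hz


print(cycle_time_ps(4e9))  # 4 GHz clock -> 250.0 ps
```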
How to improve performance
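    $$\text{CPU execution time for a program} = \text{CPU clock cycles for a program} \times \text{clock cycle time}$$

    $$\text{Time} = \frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$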

   So, to improve performance (everything else being equal)
    you can either (increase or decrease?)

    ________ the # of required cycles for a program, or
    ________ the clock cycle time or, said another way,
    ________ the clock rate.
Example - improving performance

 Our favorite program runs in 10 seconds on computer A,
 which has a 4 GHz clock. We are trying to help a
 computer designer build a new machine B that will run
 this program in 6 seconds. The designer can use new (or
 perhaps more expensive) technology to substantially
 increase the clock rate, but has informed us that this
 increase will affect the rest of the CPU design, causing
 machine B to require 1.2 times as many clock cycles as
 machine A for the same program. What clock rate
 should we tell the designer to target?
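
One way to work this example is to recover machine A's cycle count from its time and clock rate, scale it by 1.2, and divide by the target time. The sketch below follows that reasoning; the function and parameter names are illustrative only.

```python
def target_clock_rate_hz(time_a_s: float, clock_rate_a_hz: float,
                         cycle_ratio: float, time_b_s: float) -> float:
    """Clock rate machine B needs to finish the program in time_b_s."""
    cycles_a = time_a_s * clock_rate_a_hz  # cycles = seconds x cycles/second
    cycles_b = cycle_ratio * cycles_a      # B needs 1.2x as many cycles
    return cycles_b / time_b_s             # rate = cycles / seconds


rate_b = target_clock_rate_hz(time_a_s=10.0, clock_rate_a_hz=4e9,
                              cycle_ratio=1.2, time_b_s=6.0)
print(f"Target clock rate: {rate_b / 1e9:.1f} GHz")  # prints 8.0 GHz
```

Under these numbers, machine B needs twice the clock rate of machine A to achieve a speedup of only 10/6.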
Cycles required for a program
   Can we assume that # cycles = # instructions ?

   This assumption is incorrect. Different instructions take different
    amounts of time. Why?
      remember that these are machine instructions, not lines of
        C code
      Multiplication takes more time than addition
      Floating point operations take longer than integer ones
      Accessing memory takes more time than accessing registers

Important point: changing the cycle time often changes the
number of cycles required for various instructions (more later)
Clock cycles per instruction

   It is clear that the execution time of a program
    depends on the number of machine instructions
    generated by the compiler:

    $$\text{CPU clock cycles} = \text{instructions for a program} \times \text{average clock cycles per instruction}$$

   the average number of clock cycles each instruction
    takes to execute is often abbreviated CPI
   CPI provides a way of comparing two different
    implementations of the same ISA (since the instruction
    count (IC) required for a program will be the same)
The “performance equation”

   A given program will require:
        some number of instructions (machine instructions)
        some number of cycles per instruction
        some number of seconds per cycle

    $$\text{CPU time} = \text{instruction count} \times \text{CPI} \times \text{clock cycle time}$$

   This useful formula separates the 3 key factors that affect
    performance.
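
The performance equation translates directly into code. In the sketch below the function name and the sample values (2 billion instructions, CPI of 1.5, 250 ps cycle) are assumptions chosen for illustration; they do not come from the slides.

```python
def cpu_time_s(instruction_count: float, cpi: float,
               clock_cycle_time_s: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical program: 2e9 instructions, average CPI 1.5, 250 ps cycle time.
t = cpu_time_s(instruction_count=2e9, cpi=1.5, clock_cycle_time_s=250e-12)
print(f"CPU time: {t:.2f} s")  # prints 0.75 s
```

Halving any one of the three factors while holding the other two fixed halves the CPU time, which is why all three must be reported together.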
Performance - the BIG picture
   The only complete and reliable measure of performance is
    execution time

    $$\text{Time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

   Do any of the other variables equal performance?     NO!
      # of cycles to execute program?
      # of instructions in program?
      # of cycles per second?
      average # of cycles per instruction (CPI)?
      average # of instructions per second (MIPS)?

   Common pitfall: thinking one of the variables is indicative
    of performance when it really isn’t.
Example - CPI
   Suppose we have two implementations of the same
    instruction set architecture (ISA).
    For some program:
        Machine A has a clock cycle time of 250 ps and a CPI of 2.0
        Machine B has a clock cycle time of 500 ps and a CPI of 1.2

   Which machine is faster for this program, and by how much?
   If two machines have the same ISA which of the following
    quantities will always be identical?
     -   clock rate,
     -   CPI,
     -   execution time,
     -   # of instructions,
     -   # of cycles,
     -   MIPS
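
For the first question, both machines execute the same number of instructions (same ISA, same program), so it is enough to compare the time per instruction, CPI x cycle time. A minimal sketch, with an illustrative helper name:

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time per instruction in picoseconds."""
    return cpi * cycle_time_ps


machine_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250.0)  # 500 ps
machine_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500.0)  # 600 ps

# The instruction count cancels out of the execution-time ratio.
print(f"A is {machine_b / machine_a:.1f} times faster than B")  # prints 1.2
```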
Example - Number of Instructions
A compiler designer is trying to decide between two code sequences
for a particular machine. Based on the hardware implementation,
there are three different classes of instructions: Class A, Class B, and
Class C, and they require one, two, and three cycles (respectively).

The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C.
The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.

Which sequence will be faster? How much?
What is the average CPI for each sequence?
Hint:   $$\text{CPU clock cycles} = \sum_{i=1}^{N} (\text{CPI}_i \times C_i)$$

N = number of instruction classes,
CPI_i = cycles per instruction for class i,
C_i = count of the # of instructions of class i executed
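
The hint can be applied to both code sequences as follows. This is a sketch only; the dictionary layout and function names are assumptions made for the example.

```python
# Cycles required by each instruction class, as given in the problem.
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts: dict) -> int:
    """Sum of CPI_i x C_i over all instruction classes."""
    return sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict) -> float:
    """Total cycles divided by total instruction count."""
    return total_cycles(counts) / sum(counts.values())


seq1 = {"A": 2, "B": 1, "C": 2}  # 5 instructions
seq2 = {"A": 4, "B": 1, "C": 1}  # 6 instructions

print(total_cycles(seq1), average_cpi(seq1))  # 10 cycles, CPI 2.0
print(total_cycles(seq2), average_cpi(seq2))  # 9 cycles, CPI 1.5
```

The second sequence executes more instructions but needs fewer cycles, so it is faster by a factor of 10/9 (about 1.11).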

MIPS
   Million instructions per second

    $$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6}$$

   Problems using MIPS for comparing computers
        MIPS specifies the instruction execution rate but does not
         take into account that instructions may have different
         capabilities
        MIPS varies between programs on the same computer; thus
         a computer cannot have a single MIPS rating for all
         programs
        MIPS can vary inversely with performance!
Example – MIPS
   Two different compilers are being tested for a 4 GHz machine
    with three different classes of instructions: Class A, Class B, and
    Class C, which require one, two, and three cycles (respectively).
    Both compilers are used to produce code for a large piece of
    software.

    The first compiler's code uses:
          - 5 million Class A instructions,
          - 1 million Class B instructions,
          - 1 million Class C instructions.

    The second compiler's code uses:
          - 10 million Class A instructions,
          - 1 million Class B instructions,
          - 1 million Class C instructions.

   Which sequence will be faster according to MIPS?
   Which sequence will be faster according to execution time?
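
Both questions can be answered by computing execution time and MIPS for each compiler's output. The sketch below assumes the 4 GHz clock and cycle counts stated in the problem; the names are illustrative.

```python
CLOCK_RATE_HZ = 4e9
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def execution_time_s(counts: dict) -> float:
    """Execution time = total clock cycles / clock rate."""
    cycles = sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())
    return cycles / CLOCK_RATE_HZ


def mips(counts: dict) -> float:
    """MIPS = instruction count / (execution time x 10^6)."""
    return sum(counts.values()) / (execution_time_s(counts) * 1e6)


compiler1 = {"A": 5_000_000, "B": 1_000_000, "C": 1_000_000}
compiler2 = {"A": 10_000_000, "B": 1_000_000, "C": 1_000_000}

for name, counts in (("compiler 1", compiler1), ("compiler 2", compiler2)):
    print(f"{name}: {execution_time_s(counts) * 1e3:.2f} ms, "
          f"{mips(counts):.0f} MIPS")
```

Compiler 2 has the higher MIPS rating (3200 versus 2800), yet compiler 1 produces the faster code (2.50 ms versus 3.75 ms). This is a case of MIPS varying inversely with performance.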
Performance is best determined by running a real application
     Use programs typical of expected workload
     or typical of expected class of applications
       (e.g., compilers/editors, scientific applications, graphics, etc.)

Small benchmarks
    nice for architects and designers
    easy to standardize
    can be abused

SPEC (System Performance Evaluation Cooperative)
    companies have agreed on a set of real programs and inputs
    valuable indicator of performance (and compiler technology)
    can still be abused (see Intel’s benchmark)
Benchmark “games”
   An embarrassed Intel Corp. acknowledged Friday that a bug in
    a software program known as a compiler had led the company
    to overstate the speed of its microprocessor chips on an
    industry benchmark by 10 percent. However, industry analysts
    said the coding error…was a sad commentary on a common
    industry practice of “cheating” on standardized performance
    tests…The error was pointed out to Intel two days ago by a
    competitor, Motorola …came in a test known as
    SPECint92…Intel acknowledged that it had “optimized” its
    compiler to improve its test scores. The company had also said
    that it did not like the practice but felt compelled to make the
    optimizations because its competitors were doing the same
    thing…At the heart of Intel’s problem is the practice of “tuning”
    compiler programs to recognize certain computing problems in
    the test and then substituting special handwritten pieces of
    code.

          Saturday, January 6, 1996 New York Times
   Different classes and applications of computers require different
    types of benchmark suites
      SPEC CPU2000
      SPECweb99
      EEMBC

   The execution time measurements are normalized by dividing the
    execution time on a Sun Ultra 5_10 with a 300 MHz processor by
    the execution time on a measured computer (this measure is called
    SPEC ratio)

   The guiding principle in reporting performance measurements
    should be reproducibility (an important aspect of reproducibility is
    the choice of input)
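
The SPEC ratio normalization described above is a single division. In this sketch the two execution times are invented values used only to show the calculation; they are not measured SPEC results.

```python
def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Reference machine's execution time divided by the measured machine's."""
    return reference_time_s / measured_time_s


# Hypothetical times: 1400 s on the reference machine, 700 s on the machine
# being measured.
ratio = spec_ratio(reference_time_s=1400.0, measured_time_s=700.0)
print(ratio)  # 2.0: a larger ratio means a faster machine
```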
Spec’89 – Compiler enhancement
and performance


[Figure: bar chart of SPEC performance ratio for the benchmarks gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, and tomcatv, comparing "Compiler" with "Enhanced compiler"]
SPEC CPU2000
SPEC2000
 Ratio                          PIII    P4
 CINT2000/clock rate in MHz     0.47    0.36
 CFP2000/clock rate in MHz      0.34    0.39

Does doubling the clock rate double the performance?



[Figure: SPEC CINT2000 and CFP2000 results for the Pentium III and Pentium 4 plotted against clock rate in MHz, from 500 to 3500]
Can a machine with a slower clock rate have better performance?
[Figure: SPECINT2000 and SPECFP2000 results by benchmark and power mode (always on/maximum clock, laptop mode/adaptive clock, minimum power/minimum clock) for the Pentium M @ 1.6/0.6 GHz, Pentium 4-M @ 2.4/1.2 GHz, and Pentium III-M @ 1.2/0.8 GHz]
  Performance, power, and energy

     Power is increasingly becoming a key limitation in
      processor performance (especially for embedded
      processors)

     In CMOS technology the primary source of power
      dissipation is:

    $$\text{switching power} = \text{capacitive load} \times \text{Voltage}^2 \times \text{switching frequency}$$

     For power-limited applications, the most important metric
      is energy efficiency, which is computed by taking
      performance and dividing by average power consumption
      when running the benchmark
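
Because voltage enters the switching-power expression squared, lowering voltage saves more power than lowering frequency by the same factor. The sketch below illustrates this with invented scaling factors and an invented benchmark score; none of these numbers come from the slides.

```python
def relative_switching_power(voltage_scale: float,
                             frequency_scale: float,
                             load_scale: float = 1.0) -> float:
    """Switching power relative to a baseline, from C x V^2 x f."""
    return load_scale * voltage_scale ** 2 * frequency_scale


def energy_efficiency(performance: float, average_power_w: float) -> float:
    """Performance divided by average power while running the benchmark."""
    return performance / average_power_w


# Hypothetical low-power mode: 80% of the voltage, 50% of the frequency.
print(f"{relative_switching_power(0.8, 0.5):.2f}")  # 0.32 of baseline power

# Hypothetical benchmark score of 600 at an average of 20 W.
print(energy_efficiency(performance=600.0, average_power_w=20.0))  # 30.0
```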
Summarizing performance
   Although summarizing measurements results in less
    information, marketers and even users often prefer to
    have a single number to compare performance

   Arithmetic mean of the execution times (underlying
    assumption that the programs in the workload are each
    run an equal number of times)

    $$AM = \frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$$

   Weighted arithmetic mean (w_i is the frequency of program i
    in the workload)

    $$WAM = \sum_{i=1}^{n} w_i \times \text{Time}_i$$
Amdahl’s law

   The performance enhancement possible with a given
    improvement is limited by the amount that the improved
    feature is used

    $$\text{Execution time after improvement} = \frac{\text{Execution time affected}}{\text{Amount of improvement}} + \text{Execution time unaffected}$$

   Principle: make the common case fast
Example – Amdahl’s law

Suppose a program runs in 100 seconds on a
machine, with multiply responsible for 80 seconds of
this time.

How much do we have to improve the speed of
multiplication if we want the program to run 4 times
faster?

How about making it 5 times faster?
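
Both questions can be answered by solving Amdahl's law for the amount of improvement. The sketch below returns `None` when no finite improvement can reach the target; the function name and that convention are choices made for this example.

```python
from typing import Optional


def required_improvement(total_s: float, affected_s: float,
                         target_speedup: float) -> Optional[float]:
    """Factor by which the affected part must speed up, or None if impossible."""
    unaffected_s = total_s - affected_s
    target_s = total_s / target_speedup
    # Time left for the improved part once the unaffected part is paid for.
    budget_s = target_s - unaffected_s
    if budget_s <= 0:
        return None
    return affected_s / budget_s


print(required_improvement(100.0, 80.0, 4.0))  # 16.0
print(required_improvement(100.0, 80.0, 5.0))  # None
```

A 4x overall speedup needs multiplication to become 16 times faster. A 5x speedup would require the program to finish in 20 seconds, which the unaffected 20 seconds already consume, so no improvement to multiplication alone can achieve it.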
Performance is specific to a particular program
     Execution time is the only valid and unimpeachable
       measure of performance

For a given ISA, increases in CPU performance come from three
sources:
     Increases in clock rate
     Improvements in processor organization that lower the CPI
     Compiler enhancements that lower the instruction count
       and/or the average CPI (e.g. by using simpler instructions)

Pitfalls:
          Using a subset of the “performance equation” as a
           performance metric
          Expecting the improvement of one aspect of a machine’s
           performance to increase total performance by an amount
           proportional to the size of the partial improvement
          Designing only for performance without considering cost,
           functionality and other requirements is unrealistic
