									Computer Architecture
        Chapter 1
      Fundamentals



          Fall 2007



       Chapter 1 - Fundamentals   1
                  Introduction

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy




                      Art and Architecture

What’s the difference between Art and Architecture?



                   Lyonel Feininger,
                   Marktkirche in Halle



          Art and Architecture




                                              Notre Dame
                                              de Paris



What’s the difference between Art and Architecture?
What’s Computer Architecture?
The attributes of a [computing] system as seen by the
programmer, i.e., the conceptual structure and functional
behavior, as distinct from the organization of the data
flows and controls, the logic design, and the physical
implementation.
          Amdahl, Blaauw, and Brooks, 1964

What’s Computer Architecture?
• 1950s to 1960s: the Computer Architecture Course meant
  Computer Arithmetic.
• 1970s to mid 1980s: the Computer Architecture Course meant
  Instruction Set Design, especially ISA appropriate for
  compilers. (What we’ll do in Chapter 2)
• 1990s to 2000s: the Computer Architecture Course means
  Design of CPU, memory system, I/O system,
  Multiprocessors. (All evolving at a tremendous rate!)

The Task of a Computer Designer

[Figure: the design cycle. Stages: Evaluate Existing Systems for Bottlenecks,
Simulate New Designs and Organizations, Implement Next Generation System.
Other labels in the figure: Benchmarks, Technology Trends, Workloads,
Implementation Complexity.]

Technology and Computer Usage Trends

When building a Cathedral numerous very practical considerations need to
be taken into account:
• available materials
• worker skills
• willingness of the client to pay the price.

Similarly, Computer Architecture is about working within constraints:
• What will the market buy?
• Cost/Performance
• Tradeoffs in materials and processes

                             Trends
Gordon Moore (co-founder of Intel) observed in 1965 that the number of
  transistors that could be crammed on a chip doubled every year.
In 1975 he revised the rate to a doubling roughly every two years, and that
  trend has broadly CONTINUED since then.
[Figure: Transistors Per Chip (log scale, 1.E+03 to 1.E+08) vs. year, 1970 to 2005.
Labeled processors: 4004, 8086, 80286, 386, 486, Pentium, Pentium Pro, Pentium II,
Pentium 3, Power PC 601, Power PC G3.]

                                Trends
Processor performance, as measured by the SPEC benchmark, has
  also risen dramatically.

[Figure: SPEC performance (vertical axis 0 to 5000) by year, 1987 to 2000.
Labeled machines: Sun-4/260, MIPS M2000, IBM RS/6000, DEC AXP/500,
DEC Alpha 4/266, DEC Alpha 5/500, DEC Alpha 21264/600, Alpha 6/833.]

                                                    Trends
Memory Capacity (and Cost) have changed dramatically in the last 20
  years.

[Figure: memory size in Bits (log scale, 1,000 to 1,000,000,000) vs. Year, 1970 to 2000.]

  Year    Size (Mb)    Cycle time
  1980    0.0625       250 ns
  1983    0.25         220 ns
  1986    1            190 ns
  1989    4            165 ns
  1992    16           145 ns
  1996    64           120 ns
  2000    256          100 ns

                          Trends
Measured by SPEED, the CPU has improved dramatically, but memory
  and disk have improved only a little. This has led to dramatic
  changes in architecture, Operating Systems, and Programming
  practices.



                Capacity                Speed (latency)
     Logic      2x in 3 years           2x in 3 years
     DRAM       4x in 3 years           2x in 10 years
     Disk       4x in 3 years           2x in 10 years
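
To compare the rows of the table on one scale, each "Nx in Y years" entry can be converted to a compound annual rate. The sketch below is only an illustration; the helper name annual_rate and the dictionary are not from the slides.

```python
def annual_rate(factor: float, years: float) -> float:
    """Compound annual growth rate implied by a `factor`x change over `years` years."""
    return factor ** (1.0 / years) - 1.0


# (factor, years) pairs copied from the table above
trends = {
    "Logic speed": (2, 3),
    "DRAM capacity": (4, 3),
    "DRAM speed": (2, 10),
}

for name, (factor, years) in trends.items():
    print(f"{name}: {annual_rate(factor, years):.1%} per year")

# How far logic speed pulls ahead of DRAM speed over one decade,
# if both rates in the table hold for the whole decade.
gap_after_decade = (2 ** (10 / 3)) / 2
print(f"Logic/DRAM speed gap after 10 years: {gap_after_decade:.1f}x")
```

Under these rates logic speed grows about 26% per year while DRAM speed grows about 7% per year, so the gap between them widens by roughly a factor of five each decade.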




Measuring And Reporting Performance

This section talks about:

1. Metrics – how do we describe in a numerical way the
   performance of a computer?

2. What tools do we use to find those metrics?

                           Metrics
   Plane               DC to Paris    Speed       Passengers    Throughput (pmph,
                                                                passenger-miles per hour)

   Boeing 747          6.5 hours      610 mph     470           286,700
   BAC/Sud Concorde    3 hours        1350 mph    132           178,200


• Time to run the task (ExTime)
  – Execution time, response time, latency
• Tasks per day, hour, week, sec, ns …
  (Performance)
  – Throughput, bandwidth
     Metrics - Comparisons
"X is n times faster than Y" means

  ExTime(Y)             Performance(X)
  ---------     =      ---------------     =     n
  ExTime(X)             Performance(Y)


Speed of Concorde vs. Boeing 747

Throughput of Boeing 747 vs. Concorde
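
The two comparisons can be worked out directly from the definition, using the trip times and throughput figures from the Metrics table. This is a minimal sketch; the function name times_faster is an assumption for illustration.

```python
def times_faster(ex_time_y: float, ex_time_x: float) -> float:
    """Return n such that "X is n times faster than Y", given execution times."""
    return ex_time_y / ex_time_x


# Speed (latency view): DC to Paris takes 6.5 hours on the 747, 3 hours on Concorde.
concorde_vs_747 = times_faster(ex_time_y=6.5, ex_time_x=3.0)

# Throughput view: passenger-miles per hour, taken from the Metrics table.
boeing_vs_concorde = 286_700 / 178_200

print(f"Concorde is {concorde_vs_747:.2f} times faster than the 747 (trip time)")
print(f"The 747 has {boeing_vs_concorde:.2f} times the throughput of Concorde")
```

The same pair of planes gives opposite rankings depending on whether latency or throughput is the metric, which is the point of the comparison.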
            Metrics - Comparisons
Pat has developed a new product, "rabbit" about which she wishes to determine
   performance. There is special interest in comparing the new product, rabbit to the
   old product, turtle, since the product was rewritten for performance reasons. (Pat
   had used Performance Engineering techniques and thus knew that rabbit was
   "about twice as fast" as turtle.) The measurements showed:

Performance Comparisons

Product   Transactions / second      Seconds/ transaction    Seconds to process transaction
Turtle               30                  0.0333                        3
Rabbit               60                  0.0166                        1


Which of the following statements reflect the performance comparison of rabbit and
  turtle?

o Rabbit is 100% faster than turtle.            o Rabbit takes 200% less time than turtle.
o Rabbit is twice as fast as turtle.            o Turtle is 50% as fast as rabbit.
o Rabbit takes 1/2 as long as turtle.           o Turtle is 50% slower than rabbit.
o Rabbit takes 1/3 as long as turtle.           o Turtle takes 200% longer than rabbit.
o Rabbit takes 100% less time than turtle.      o Turtle takes 300% longer than rabbit.

      Metrics - Throughput
      Application             Answers per month
                              Operations per second
     Programming
       Language
       Compiler
                    (millions) of Instructions per second: MIPS
          ISA       (millions) of (FP) operations per second: MFLOP/s
        Datapath
            Control            Megabytes per second
     Function Units
Transistors Wires Pins         Cycles per second (clock rate)




     Methods For Predicting
         Performance
• Benchmarks, Traces, Mixes
• Hardware: Cost, delay, area, power estimation
• Simulation (many levels)
   – ISA, RT, Gate, Circuit
• Queuing Theory
• Rules of Thumb
• Fundamental “Laws”/Principles




                          Benchmarks
         SPEC: System Performance Evaluation
                    Cooperative
•   First Round 1989
     – 10 programs yielding a single number (“SPECmarks”)
•   Second Round 1992
     – SPECint92 (6 integer programs) and SPECfp92 (14 floating point programs)
           • Compiler Flags unlimited. March 93 flags for DEC 4000 Model 610:
           spice: unix.c:/def=(sysv,has_bcopy,"bcopy(a,b,c)=memcpy(b,a,c)"
           wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
           nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
•   Third Round 1995
     – new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating
        point)
     – “benchmarks useful for 3 years”
     – Single flag setting for all programs: SPECint_base95, SPECfp_base95

                   Benchmarks
CINT2000 (Integer Component of SPEC CPU2000):

  Program               Language         What Is It
  164.gzip      C               Compression
  175.vpr       C               FPGA Circuit Placement and Routing
  176.gcc       C               C Programming Language Compiler
  181.mcf       C               Combinatorial Optimization
  186.crafty    C               Game Playing: Chess
  197.parser    C               Word Processing
  252.eon       C++             Computer Visualization
  253.perlbmk   C               PERL Programming Language
  254.gap       C               Group Theory, Interpreter
  255.vortex    C               Object-oriented Database
  256.bzip2     C               Compression
  300.twolf     C               Place and Route Simulator
  http://www.spec.org/osg/cpu2000/CINT2000/
                 Benchmarks
CFP2000 (Floating Point Component of SPEC
                CPU2000):
Program        Language        What Is It
168.wupwise    Fortran 77      Physics / Quantum Chromodynamics
171.swim       Fortran 77      Shallow Water Modeling
172.mgrid      Fortran 77      Multi-grid Solver: 3D Potential Field
173.applu      Fortran 77      Parabolic / Elliptic Differential Equations
177.mesa       C               3-D Graphics Library
178.galgel     Fortran 90      Computational Fluid Dynamics
179.art        C                Image Recognition / Neural Networks
183.equake     C               Seismic Wave Propagation Simulation
187.facerec    Fortran 90      Image Processing: Face Recognition
188.ammp       C               Computational Chemistry
189.lucas      Fortran 90      Number Theory / Primality Testing
191.fma3d      Fortran 90      Finite-element Crash Simulation
200.sixtrack   Fortran 77      High Energy Physics Accelerator Design
 301.apsi      Fortran 77      Meteorology: Pollutant Distribution

http://www.spec.org/osg/cpu2000/CFP2000/
Benchmarks: Sample Results For SPECint2000
Intel OR840 (1 GHz Pentium III processor)
http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc

                Base     Base   Base      Peak    Peak    Peak
Benchmarks    Ref Time Run Time Ratio    Ref Time Run Time Ratio
164.gzip         1400    277      505*   1400     270      518*
175.vpr          1400    419      334*   1400     417      336*
176.gcc          1100    275      399*   1100     272      405*
181.mcf          1800    621      290*   1800     619      291*
186.crafty       1000    191      522*   1000     191      523*
197.parser       1800    500      360*   1800     499      361*
252.eon          1300    267      486*   1300     267      486*
253.perlbmk      1800    302      596*   1800     302      596*
254.gap          1100    249      442*   1100     248      443*
255.vortex       1900    268      710*   1900     264      719*
256.bzip2        1500    389      386*   1500     375      400*
300.twolf        3000    784      382*   3000     776      387*
SPECint_base2000                  438
SPECint2000                                                442
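
The Ratio column and the summary line can be reproduced approximately from the times in the table. In SPEC CPU2000 each ratio is 100 x (reference time / run time) and the summary is the geometric mean of the ratios. The sketch below uses the Base columns only; because the published run times are rounded to whole seconds, the result should land near, but not exactly on, the reported 438.

```python
import math

# (reference time, base run time) in seconds, copied from the table above
base_runs = {
    "164.gzip": (1400, 277),
    "175.vpr": (1400, 419),
    "176.gcc": (1100, 275),
    "181.mcf": (1800, 621),
    "186.crafty": (1000, 191),
    "197.parser": (1800, 500),
    "252.eon": (1300, 267),
    "253.perlbmk": (1800, 302),
    "254.gap": (1100, 249),
    "255.vortex": (1900, 268),
    "256.bzip2": (1500, 389),
    "300.twolf": (3000, 784),
}


def spec_ratio(ref_time: float, run_time: float) -> float:
    """SPEC CPU2000 ratio: how much faster than the reference machine, times 100."""
    return 100.0 * ref_time / run_time


def geometric_mean(values) -> float:
    """Geometric mean computed via logarithms to avoid large intermediate products."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = {name: spec_ratio(ref, run) for name, (ref, run) in base_runs.items()}
specint_base = geometric_mean(ratios.values())
print(f"Recomputed SPECint_base2000: about {specint_base:.0f}")
```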

                    Benchmarks
                 Performance Evaluation
• “For better or worse, benchmarks shape a field”
• Good products are created when we have:
    – Good benchmarks
    – Good ways to summarize performance
• Since sales are in part a function of performance relative to the
  competition, companies invest in improving the product as reported
  by the performance summary
• If benchmarks/summary inadequate, then choose between
  improving product for real programs vs. improving product to get
  more sales;
  Sales almost always wins!
• Execution time is the measure of computer performance!


                        Benchmarks
               How to Summarize Performance
Management would like to have one number.
Technical people want more:
1.   They want to have evidence of reproducibility – there should be enough
     information so that you or someone else can repeat the experiment.
2.   There should be consistency when doing the measurements multiple
     times.

          How would you report these results?
                        Computer A      Computer B   Computer C

    Program P1 (secs)          1             10           20

    Program P2 (secs)        1000           100           20

    Total Time (secs)        1001           110           40
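
One way to explore the question is to compute several candidate summaries side by side. The sketch below reports total time, the arithmetic mean, and the geometric mean of times normalized to a reference machine. The geometric mean is not named on this slide; it is included here as an assumption because SPEC summaries use it. All function names are illustrative.

```python
import math

# Execution times in seconds, copied from the table above
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}


def total_time(machine: str) -> float:
    """Sum of execution times: the cost of running each program once."""
    return sum(times[machine].values())


def arithmetic_mean(machine: str) -> float:
    """Average execution time per program, weighting each program equally."""
    return total_time(machine) / len(times[machine])


def normalized_geometric_mean(machine: str, reference: str) -> float:
    """Geometric mean of per-program times relative to a reference machine."""
    ratios = [times[machine][p] / times[reference][p] for p in times[machine]]
    return math.prod(ratios) ** (1.0 / len(ratios))


for m in times:
    print(
        f"Computer {m}: total={total_time(m)} s, "
        f"mean={arithmetic_mean(m):.1f} s, "
        f"geomean vs A={normalized_geometric_mean(m, 'A'):.3f}"
    )
```

Total time ranks C far ahead of A, while the geometric mean normalized to A rates A and B as equal. The choice of summary changes the conclusion, which is why a single number needs to be reported together with how it was computed.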


Quantitative Principles of Computer Design

Make the common case fast.

Amdahl’s Law:
     Relates total speedup of a system to the speedup of some
     portion of that system.

Quantitative Design: Amdahl's Law

 Speedup due to enhancement E:

\[
\mathrm{Speedup}(E)
  = \frac{\text{Execution Time Without Enhancement}}{\text{Execution Time With Enhancement}}
  = \frac{\text{Performance With Enhancement}}{\text{Performance Without Enhancement}}
\]

[Figure: execution-time bar with one portion labeled "This fraction enhanced".]

Suppose that enhancement E accelerates a fraction F
  of the task by a factor S, and the remainder of the
  task is unaffected.

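Quantitative Design: Amdahl's Law

\[
\mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times
  \left[ (1 - \mathrm{Fraction}_{enhanced})
       + \frac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}} \right]
\]

\[
\mathrm{Speedup}_{overall}
  = \frac{\mathrm{ExTime}_{old}}{\mathrm{ExTime}_{new}}
  = \frac{1}{(1 - \mathrm{Fraction}_{enhanced})
       + \dfrac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}}}
\]

[Figure: two bars, ExTimeold and ExTimenew; the portion that shrinks is labeled
"This fraction enhanced".]
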
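Quantitative Design: Amdahl's Law

• Floating point instructions are improved to run 2x faster, but only
  10% of actual instructions are FP (assuming they also account for
  10% of execution time)

\[
\mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \mathrm{ExTime}_{old}
\]

\[
\mathrm{Speedup}_{overall} = \frac{1}{0.95} = 1.053
\]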
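
Amdahl's Law is easy to check numerically. The sketch below is one possible helper (the name amdahl_speedup is an assumption, not from the slides); it reproduces the floating point example and shows the upper bound reached when the enhanced part becomes arbitrarily fast.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the time runs `speedup_enhanced`x faster."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    # New execution time as a fraction of the old execution time
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / new_time


# The example from the slide: 10% of the time is FP, and FP becomes 2x faster.
fp_speedup = amdahl_speedup(0.10, 2.0)
print(f"Overall speedup: {fp_speedup:.3f}")

# Even an enormously faster FP unit cannot beat 1 / (1 - 0.10).
fp_limit = 1.0 / (1.0 - 0.10)
print(f"Upper bound with FP made nearly free: {fp_limit:.3f}")
```

The bound of about 1.11 for a 10% fraction is the quantitative form of "make the common case fast": improving a rarely used part yields little.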

Quantitative Design: Cycles Per Instruction

\[
\mathrm{CPI} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}}
             = \frac{\text{Cycles}}{\text{Instruction Count}}
\]

\[
\text{CPU Time} = \text{Cycle Time} \times \sum_{i=1}^{n} \mathrm{CPI}_i \times I_i
\]

where I_i is the number of instructions of type i.

“Instruction Frequency”:

\[
\mathrm{CPI} = \sum_{i=1}^{n} \mathrm{CPI}_i \times F_i
\qquad \text{where} \qquad
F_i = \frac{I_i}{\text{Instruction Count}}
\]

 Invest Resources where time is Spent!

Quantitative Design: Cycles Per Instruction
Suppose we have a machine where we can count the frequency with which
instructions are executed. We also know how many cycles it takes for
each instruction type.

 Base Machine (Reg / Reg)
 Op            Freq Cycles CPI(i)            (% Time)
 ALU           50%     1   .5                (33%)
 Load          20%     2   .4                (27%)
 Store         10%     2   .2                (13%)
 Branch        20%     2   .4                (27%)
 Total CPI                 1.5

 How do we get CPI(I)?
 How do we get %time?
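
Both questions can be answered by applying the instruction frequency formula to the table: each CPI(i) entry is frequency times cycles, and each % Time entry is that contribution divided by the total CPI. A minimal sketch, with variable names chosen here for illustration:

```python
# op -> (instruction frequency, cycles per instruction), copied from the table above
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}

# CPI(i) = F_i * CPI_i: the cycles each instruction type adds per average instruction
cpi_contribution = {op: freq * cycles for op, (freq, cycles) in mix.items()}

# Total CPI is the sum of the per-type contributions
total_cpi = sum(cpi_contribution.values())

# % Time is each type's share of the total cycles
time_share = {op: c / total_cpi for op, c in cpi_contribution.items()}

for op in mix:
    print(f"{op:7s} CPI(i)={cpi_contribution[op]:.1f}  time={time_share[op]:.0%}")
print(f"Total CPI = {total_cpi:.1f}")
```

ALU instructions are half of the instruction count but only a third of the time, which is why the time column, not the frequency column, shows where to invest resources.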
Quantitative Design: Locality of Reference
 Programs access a relatively small portion of the address space at
 any instant of time.

 There are two different types of locality:

 Temporal Locality (locality in time): If an item is referenced, it will
 tend to be referenced again soon (loops, reuse, etc.)

 Spatial Locality (locality in space/location): If an item is referenced,
 items whose addresses are close by tend to be referenced soon
 (straight line code, array access, etc.)
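
The effect of locality can be seen with a toy simulation. The sketch below models a hypothetical direct-mapped cache (4 blocks of 8 words; both sizes are arbitrary assumptions for illustration, not figures from the slides) and measures the hit rate for three access patterns.

```python
def hit_rate(addresses, num_blocks: int = 4, block_size: int = 8) -> float:
    """Simulate a tiny direct-mapped cache and return the fraction of accesses that hit."""
    cache = [None] * num_blocks  # each slot remembers which memory block it holds
    hits = 0
    total = 0
    for addr in addresses:
        block = addr // block_size   # memory block containing this address
        slot = block % num_blocks    # the only cache slot that block may occupy
        if cache[slot] == block:
            hits += 1
        else:
            cache[slot] = block      # miss: bring the whole block in
        total += 1
    return hits / total if total else 0.0


sequential = list(range(256))             # spatial locality: neighbors share a block
looped = [i % 16 for i in range(256)]     # temporal locality: 16 addresses reused
strided = [i * 64 for i in range(256)]    # neither: every access touches a new block

print(f"sequential: {hit_rate(sequential):.3f}")
print(f"looped:     {hit_rate(looped):.3f}")
print(f"strided:    {hit_rate(strided):.3f}")
```

With these parameters the sequential scan misses once per 8-word block, the loop misses only on its first pass, and the strided pattern never hits.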




The Concept of Memory Hierarchy

Fast memory is expensive.

Slow memory is cheap.

The goal is to minimize the price/performance for a
particular price point.

              Memory Hierarchy

                      Registers         Level 1 Cache   Level 2 Cache   Memory          Disk

Typical Size          4 - 64            <16K bytes      <2 Mbytes       <16 Gigabytes   >5 Gigabytes
Access Time           1 nsec            3 nsec          15 nsec         150 nsec        5,000,000 nsec
Bandwidth (MB/sec)    10,000 - 50,000   2000 - 5000     500 - 1000      500 - 1000      100
Managed By            Compiler          Hardware        Hardware        OS              OS/User

             Memory Hierarchy
• Hit: data appears in some block in the upper level (example:
  Block X)
   – Hit Rate: the fraction of memory accesses found in the upper level
   – Hit Time: Time to access the upper level, which consists of
        RAM access time + Time to determine hit/miss
• Miss: data needs to be retrieved from a block in the lower level
  (Block Y)
   – Miss Rate = 1 - (Hit Rate)
   – Miss Penalty: Time to replace a block in the upper level +
        Time to deliver the block to the processor
• Hit Time << Miss Penalty (500 instructions on 21264!)
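
These terms are commonly combined into an average access time: hit time plus miss rate times miss penalty. That formula is standard but is not stated on the slide, so treat the sketch below as an illustration. The 3 nsec and 15 nsec figures are the Level 1 and Level 2 access times from the Memory Hierarchy table; the 80% hit rate is an assumed value.

```python
def average_access_time(hit_time: float, hit_rate: float, miss_penalty: float) -> float:
    """Average time per access: every access pays hit_time, misses also pay the penalty."""
    miss_rate = 1.0 - hit_rate
    return hit_time + miss_rate * miss_penalty


# Assumed example: 3 nsec hit time, 80% hit rate, 15 nsec penalty to reach the next level
avg_ns = average_access_time(hit_time=3.0, hit_rate=0.80, miss_penalty=15.0)
print(f"Average access time: {avg_ns:.1f} nsec")
```

Because Hit Time << Miss Penalty, small changes in the miss rate dominate the average.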




        Memory Hierarchy

                        Level 1      Level 2
         Registers                              Memory   Disk
                         cache       Cache


What is the cost of executing a program if:
• Stores are free (there’s a write pipe)
• Loads are 20% of all instructions
• 80% of loads hit (are found) in the Level 1 cache
• 97% of loads hit in the Level 2 cache.
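
One way to work the exercise is sketched below. Several inputs are assumptions, because the slide does not give them: the access times (3, 15, and 150 nsec) are borrowed from the Memory Hierarchy table, each time is treated as the full cost of a load satisfied at that level, and the Level 2 figure is read as a share of all loads (so 17% of loads hit only in Level 2 and 3% go to memory). The alternative reading, where 97% of Level 1 misses hit in Level 2, is computed as well.

```python
# Assumed access times in nsec, borrowed from the Memory Hierarchy table
L1_NS, L2_NS, MEM_NS = 3.0, 15.0, 150.0
LOAD_FRACTION = 0.20  # loads are 20% of all instructions; stores are free


def avg_load_time(l1_hit: float, l2_hit_of_all_loads: float) -> float:
    """Average nsec per load, given cumulative hit fractions measured over all loads."""
    l2_only = l2_hit_of_all_loads - l1_hit   # loads that miss L1 but hit L2
    memory = 1.0 - l2_hit_of_all_loads       # loads that miss both caches
    return l1_hit * L1_NS + l2_only * L2_NS + memory * MEM_NS


# Reading 1: 97% of ALL loads are found in L1 or L2.
cumulative_ns = avg_load_time(0.80, 0.97)

# Reading 2: 97% of the loads that MISS L1 are found in L2.
local_ns = avg_load_time(0.80, 0.80 + 0.20 * 0.97)

# Load cost averaged over every instruction, under reading 1
load_cost_per_instruction = LOAD_FRACTION * cumulative_ns

print(f"Reading 1: {cumulative_ns:.2f} nsec per load")
print(f"Reading 2: {local_ns:.2f} nsec per load")
print(f"Reading 1, per instruction: {load_cost_per_instruction:.2f} nsec")
```

Under either reading, the few loads that reach main memory contribute a large share of the total, even though they are rare.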




                      Wrap Up


1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy



