
									     MAMAS – Computer Architecture
               234267

                                    Lecturer: Adi Yoaz

           Presentation made by: Dr. Lihu Rappoport
Some of the slides were taken from Avi Mendelson, Randi Katz, Patterson, Gabriel Loh




 1                                                               Computer Architecture 2008 – Introduction
        General Course Information
   Grade
       20% Exercise (mandatory; the grade is binding)
       80% Final exam
       No midterm exam


   Textbooks
       Computer Architecture: A Quantitative Approach,
        by Hennessy & Patterson

   Other course information
       Course web site:
        http://webcourse.cs.technion.ac.il/234267
       Foils will be on the web several days before the class

                Lecturer details
   Name: Adi Yoaz
   Cell: 054-7885599
   Email: adi.yoaz@intel.com

   Feel free to call or send email

   Reception hours: after the lecture, from 18:20 onward




                         Class Focus
   CPU
       Introduction: performance, instruction set (RISC vs. CISC)
       Pipeline, hazards
       Branch prediction
       Out-of-order execution
   Memory Hierarchy
       Cache
       Main memory
       Virtual Memory
   Advanced Topics
   PC Architecture
       Motherboard & chipset, DRAM, I/O, Disk, peripherals


                                  Computer System Structure
   [Figure: block diagram of a PC system; the components shown are listed below]
   CPU with cache, attached to the North Bridge over the CPU bus
   North Bridge
       On-board graphics
       Memory controller, driving DDRII channels 1 and 2 over the memory bus
       PCI express ×16 link to an external graphics card
   South Bridge
       USB, IDE and SATA controllers
       PCI and PCI express ×1
       IO controller
   Devices: parallel port, serial port, floppy drive, keyboard, mouse,
    DVD drive, hard disk, sound card (speakers), LAN adapter (LAN)


    Architecture & Microarchitecture
   Architecture
    The processor features seen by the “user”
       Instruction set, addressing modes, data width, …

   Micro-architecture
    The way a processor is implemented
       Cache size and structure, number of execution units, …
       Timing is considered uArch (though it is user visible)

   Processors with different uArch can support the
    same Architecture




                     Compatibility
   Backward compatibility
       New hardware can run existing software
        • Core2 Duo can run SW written for Pentium4, PentiumM,
          Pentium III, Pentium II, Pentium, 486, 386, 268

   Forward compatibility
       New software can run on existing hardware
       Example: new software written with SSE2™ runs on an older
        processor which does not support SSE2™
       Commonly supports one or two generations behind

   Architecture independent SW
       JIT – just in time compiler: Java and .NET
       Binary translation

    Performance




  Technology Trends and Performance
   [Figure: two log-scale plots over the years 1980 to 2007, each with Logic and DRAM curves]
   Speed (scale 1 to 1000)
       Logic: 2× in 3 years
       DRAM: 1.1× in 3 years
       CPU speed and memory speed grow apart
   Capacity (scale 1 to 1,000,000)
       Growth labels: 4× in 3 years and 2× in 3 years

      Computing capacity: 4× per 3 years
         If we could keep all the transistors busy all the time

         Actual: 3.3× per 3 years


      Moore’s Law: Performance is doubled every ~18 months
        Trend is slowing: process scaling declines, power is up
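
The growth rates quoted above can be related with a little arithmetic. The sketch below is only an illustration (the helper names `growth_over` and `doubling_period` are made up for this example): it checks that doubling every 18 months equals 4× per 3 years, and estimates the doubling period implied by the "actual" 3.3× per 3 years.

```python
import math

def growth_over(years: float, doubling_period_years: float) -> float:
    """Growth factor after `years` if the quantity doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)

def doubling_period(factor: float, years: float) -> float:
    """Doubling period (in years) implied by growing by `factor` over `years`."""
    return years * math.log(2.0) / math.log(factor)

# Doubling every 18 months (1.5 years), observed over 3 years.
ideal = growth_over(3.0, 1.5)

# Doubling period, in months, implied by 3.3x per 3 years.
actual_months = 12.0 * doubling_period(3.3, 3.0)

print(f"ideal growth: {ideal:.1f}x per 3 years")
print(f"actual doubling period: {actual_months:.1f} months")
```

Under these assumptions the "actual" rate corresponds to a doubling period of roughly 21 months rather than 18.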


                                Moore’s Law




Graph taken from: http://www.intel.com/technology/mooreslaw/index.htm

           CPI – Cycles Per Instruction
   CPUs work according to a clock signal
        Clock cycle is measured in nsec ($10^{-9}$ of a second)
        Clock frequency (= 1/clock cycle) is measured in GHz ($10^{9}$ cycles/sec)

   Instruction Count (IC)
        Total number of instructions executed in the program

   CPI – Cycles Per Instruction
        Average #cycles per Instruction (in a given program)

            $$\mathrm{CPI} = \frac{\text{\#cycles required to execute the program}}{\mathrm{IC}}$$

        IPC (= 1/CPI): instructions per cycle
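
A minimal sketch of these definitions, using made-up numbers (the cycle count, instruction count, and 2.5 GHz clock are hypothetical, chosen only to make the arithmetic easy to follow):

```python
def cpi(cycles: int, instruction_count: int) -> float:
    """Average cycles per instruction for one program run."""
    return cycles / instruction_count

def ipc(cycles: int, instruction_count: int) -> float:
    """Instructions per cycle: the reciprocal of CPI."""
    return instruction_count / cycles

def clock_cycle_ns(freq_ghz: float) -> float:
    """Clock cycle in nanoseconds for a frequency given in GHz."""
    return 1.0 / freq_ghz

# Hypothetical program: 2,000,000 instructions executed in 3,000,000 cycles.
program_cpi = cpi(3_000_000, 2_000_000)
program_ipc = ipc(3_000_000, 2_000_000)
cycle = clock_cycle_ns(2.5)  # a 2.5 GHz clock has a 0.4 ns cycle

print(program_cpi, program_ipc, cycle)
```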


                          CPU Time
    CPU Time - time required to execute a program
             CPU Time = IC × CPI × clock cycle




    Our goal: minimize CPU Time
        Minimize clock cycle: more GHz (process, circuit, uArch)
        Minimize CPI:        uArch (e.g.: more execution units)
        Minimize IC:         architecture (e.g.: SSE™)
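
The three levers can be compared with a small calculation. This is a sketch with hypothetical values (the instruction counts, CPIs, and frequencies below are invented for illustration); `cpu_time_seconds` is an illustrative helper, not part of any library.

```python
def cpu_time_seconds(ic: float, cpi: float, freq_ghz: float) -> float:
    """CPU Time = IC x CPI x clock cycle, with clock cycle = 1 / frequency."""
    clock_cycle_s = 1.0 / (freq_ghz * 1e9)
    return ic * cpi * clock_cycle_s

# Hypothetical baseline: 1e9 instructions, CPI of 1.5, 2 GHz clock.
baseline = cpu_time_seconds(1e9, 1.5, 2.0)

# Each variant changes exactly one factor of the product.
faster_clock = cpu_time_seconds(1e9, 1.5, 3.0)         # shorter clock cycle
lower_cpi = cpu_time_seconds(1e9, 1.0, 2.0)            # better uArch
fewer_instructions = cpu_time_seconds(0.8e9, 1.5, 2.0)  # richer ISA

for name, t in [("baseline", baseline), ("faster clock", faster_clock),
                ("lower CPI", lower_cpi), ("fewer instructions", fewer_instructions)]:
    print(f"{name}: {t:.3f} s")
```

In practice the three factors are not independent: for example, a change that reduces IC may raise CPI or lengthen the clock cycle.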




                      Amdahl’s Law
Suppose enhancement E accelerates a fraction F of the task by a
factor S, and the remainder of the task is unaffected, then:




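 $$\mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times \left[ (1 - \mathrm{Fraction}_{enhanced}) + \frac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}} \right]$$

 $$\mathrm{Speedup}_{overall} = \frac{\mathrm{ExTime}_{old}}{\mathrm{ExTime}_{new}} = \frac{1}{(1 - \mathrm{Fraction}_{enhanced}) + \dfrac{\mathrm{Fraction}_{enhanced}}{\mathrm{Speedup}_{enhanced}}}$$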



          Amdahl’s Law: Example
• Floating point instructions improved to run at 2×,
  but only 10% of executed instructions are FP

$$\mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times (0.9 + 0.1 / 2) = 0.95 \times \mathrm{ExTime}_{old}$$

$$\mathrm{Speedup}_{overall} = \frac{1}{0.95} = 1.053$$


                      Corollary:
        Make The Common Case Fast
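
The example and its corollary can be checked numerically. The sketch below reproduces the floating-point case, shows the upper bound when the enhanced part becomes infinitely fast, and contrasts it with a hypothetical enhancement that covers 90% of the task (the 90% figure is an assumption added for illustration).

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction of the task is accelerated by a given factor."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# The example above: 10% of the work runs 2x faster.
fp_example = amdahl_speedup(0.10, 2.0)

# Upper bound for that 10%: even an infinitely fast FP unit leaves the other 90%.
fp_limit = 1.0 / (1.0 - 0.10)

# Hypothetical common case: the same 2x applied to 90% of the work.
common_case = amdahl_speedup(0.90, 2.0)

print(f"{fp_example:.3f} {fp_limit:.3f} {common_case:.3f}")
```

The same 2× enhancement is worth far more when it applies to the common case.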


     Calculating the CPI of a Program
    $IC_i$: number of times an instruction of type $i$ is executed in the program
    $IC$: number of instructions executed in the program:
            $$IC = \sum_{i=1}^{n} IC_i$$
    $F_i$: relative frequency of instructions of type $i$: $F_i = IC_i / IC$
    $CPI_i$: number of cycles to execute an instruction of type $i$
            e.g.: $CPI_{add} = 1$, $CPI_{mul} = 3$
    Number of cycles required to execute the program:
            $$\#cyc = \sum_{i=1}^{n} CPI_i \times IC_i = CPI \times IC$$
    CPI:
            $$CPI = \frac{\#cyc}{IC} = \frac{\sum_{i=1}^{n} CPI_i \times IC_i}{IC} = \sum_{i=1}^{n} CPI_i \times \frac{IC_i}{IC} = \sum_{i=1}^{n} CPI_i \times F_i$$
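
The two ways of computing CPI (total cycles over IC, or the frequency-weighted sum of per-type CPIs) can be checked against each other. This is a sketch on a hypothetical instruction mix: the counts and the CPI of `load` are invented, while the CPIs of `add` and `mul` follow the values given above.

```python
def cpi_from_counts(mix: dict) -> float:
    """CPI = (sum of CPI_i * IC_i) / IC, where mix maps type -> (IC_i, CPI_i)."""
    total_ic = sum(ic for ic, _ in mix.values())
    total_cycles = sum(ic * cpi_i for ic, cpi_i in mix.values())
    return total_cycles / total_ic

def cpi_from_frequencies(mix: dict) -> float:
    """CPI = sum of CPI_i * F_i, with F_i = IC_i / IC."""
    total_ic = sum(ic for ic, _ in mix.values())
    return sum(cpi_i * (ic / total_ic) for ic, cpi_i in mix.values())

# Hypothetical mix: instruction type -> (IC_i, CPI_i).
mix = {"add": (600, 1), "mul": (100, 3), "load": (300, 2)}

by_counts = cpi_from_counts(mix)
by_frequencies = cpi_from_frequencies(mix)
print(by_counts, by_frequencies)
```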
            Comparing Performance
    Peak Performance
        MIPS, MFLOPS
        Often not useful: unachievable / unsustainable in practice
    Benchmarks
        Real applications, or representative parts of real apps
        Targeted at the specific system usages
    SPEC INT – integer applications
        Data compression, C compiler, Perl interpreter, database
         system, chess-playing, text processing, …
    SPEC FP – floating point applications
        Mostly important scientific applications
    TPC Benchmarks
        Measure transaction-processing throughput

      Instruction Set Design

   [Figure: three layers, top to bottom: software, instruction set, hardware]
   The ISA is what the user / compiler sees
   The HW implements the ISA




                 ISA Considerations
    Code size
        Long instructions take more time to fetch
        Longer instructions require a larger memory
         • Important in small devices, e.g., cell phones


    Number of instructions (IC)
        Reducing IC reduces execution time
         • At a given CPI and frequency


    Code “simplicity”
        Simple HW implementation
         • Higher frequency and lower power
        Code optimization can better be applied to “simple code”

Architectural Consideration Example
     Displacement Address Size


    [Figure: bar chart of displacement sizes; x-axis: Address Bits (0 to 15),
     y-axis: 0% to 30%; series: Int. Avg. and FP Avg.]

          1% of addresses > 16-bits
          12 - 16 bits of displacement needed


                    CISC Processors
    CISC - Complex Instruction Set Computer
        The idea: a high level machine language
        Example: x86


    Characteristics
        Many instruction types, with many addressing modes
        Some of the instructions are complex
         • Execute complex tasks
         • Require many cycles
        ALU operations directly on memory
         • Only a few registers, in many cases not orthogonal
        Variable length instructions
         • common instructions get short codes → shorter code


          Top 10 x86 Instructions
            Rank   instruction     % of total executed
            1      load                     22%
            2      conditional branch       20%
            3      compare                  16%
            4      store                    12%
            5      add                      8%
            6      and                      6%
            7      sub                      5%
            8      move register-register   4%
            9      call                     1%
            10     return                   1%
                   Total                    96%


     Simple instructions dominate instruction frequency
                          CISC Drawbacks
    Complex instructions and complex addressing modes
      complicate the processor
      slow down the simple, common instructions
      contradict Make The Common Case Fast

    Compilers don’t use complex instructions / indexing methods
    Variable length instructions are a real pain in the neck
            Difficult to decode several instructions in parallel
              • Until an instruction is decoded, its length is unknown
                 It is unknown where the instruction ends
                 It is unknown where the next instruction starts
            An instruction may span more than one cache line
            An instruction may span more than one page
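
The decoding problem can be illustrated with a toy model. The encoding below is purely hypothetical (the first byte of each instruction holds its total length in bytes) and is not the x86 format; the 64-byte line size is also an assumption. The point is that with variable lengths each start address depends on decoding the previous instruction, while with fixed lengths all start addresses are known up front.

```python
def variable_length_starts(code: bytes) -> list:
    """Find instruction start offsets one by one (toy rule: first byte = length)."""
    starts = []
    pc = 0
    while pc < len(code):
        length = code[pc]
        if length == 0:
            raise ValueError("toy encoding does not allow zero-length instructions")
        starts.append(pc)
        pc += length  # the next start is known only after this one is decoded
    return starts

def fixed_length_starts(code: bytes, width: int = 4) -> list:
    """With fixed-length instructions every start offset is known immediately."""
    return list(range(0, len(code), width))

def crosses_boundary(start: int, length: int, line_size: int = 64) -> bool:
    """True if an instruction spans two aligned blocks (e.g. cache lines)."""
    return start // line_size != (start + length - 1) // line_size

# Four toy instructions of lengths 1, 3, 2 and 5 bytes.
code = bytes([1, 3, 0, 0, 2, 0, 5, 0, 0, 0, 0])
print(variable_length_starts(code))
print(fixed_length_starts(bytes(12)))
print(crosses_boundary(62, 5), crosses_boundary(0, 4))
```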



                    RISC Processors
    RISC - Reduced Instruction Set Computer
        The idea: simple instructions enable fast hardware
    Characteristics
        A small instruction set, with only a few instruction formats
        Simple instructions
         • execute simple tasks
         • Most of them require a single cycle (with pipeline)
        A few indexing methods
        ALU operations on registers only
         • Memory is accessed using Load and Store instructions only
         • Many orthogonal registers
         • Three address machine:    Add dst, src1, src2
        Fixed length instructions

    Examples: MIPS™, Sparc™, Alpha™, Power™

             RISC Processors (Cont.)
    Simple architecture → Simple micro-architecture
        Simple, small and fast control logic
        Simpler to design and validate
        Room for large on die caches
        Shorten time-to-market


    Using a smart compiler
        Better pipeline usage
        Better register allocation


    Existing RISC processors are not “pure” RISC
        e.g., support division which takes many cycles


                   Compilers and ISA
    Ease of compilation
        Orthogonality:
          • no special registers
          • few special cases
          • all operand modes available with any data type or instruction
            type
        Regularity:
          • no overloading of the meanings of instruction fields
        Streamlined:
          • resource needs easily determined


    Register Assignment is critical too
        Easier if lots of registers


                  CISC Is Dominant
    The x86 architecture, which is a CISC
     architecture, dominates the processor market
        A vast amount of existing software
        Intel, AMD, Microsoft and others benefit from this
         • Intel and AMD invest heavily in making high-performance
           x86 processors, despite the architectural disadvantage
         • Current x86 processors give the best cost/performance
        CISC processors use arch ideas from the RISC world
        Starting at Pentium II and K6, x86 processors translate
         CISC instructions into RISC-like operations internally
         • the inside core looks much like that of a RISC processor
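
The idea of translating a CISC instruction into RISC-like operations can be sketched with a toy translator. Everything here is hypothetical: the tuple format, the temporary names `tmp_src` and `tmp_dst`, and the rules themselves are illustrative only and do not describe how any real x86 decoder works.

```python
def is_mem(operand: str) -> bool:
    """Toy convention: a memory operand is written in square brackets."""
    return operand.startswith("[") and operand.endswith("]")

def to_micro_ops(op: str, dst: str, src: str) -> list:
    """Split a two-operand ALU instruction into load / ALU / store operations."""
    uops = []
    a, b = dst, src
    if is_mem(src):
        uops.append(("load", "tmp_src", src))  # bring the memory source into a temp
        b = "tmp_src"
    if is_mem(dst):
        uops.append(("load", "tmp_dst", dst))  # read-modify-write destination
        a = "tmp_dst"
    uops.append((op, a, a, b))                 # three-address form: op dst, src1, src2
    if is_mem(dst):
        uops.append(("store", dst, a))         # write the result back to memory
    return uops

# An ALU operation directly on memory becomes three register-style operations.
for uop in to_micro_ops("add", "[0x100]", "eax"):
    print(uop)

# A register-register instruction maps to a single operation.
print(to_micro_ops("add", "eax", "ebx"))
```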




         Software Specific Extensions
    Extend arch to accelerate exec of specific apps

    Example: SSE™ – Streaming SIMD Extensions
        128-bit packed (vector) / scalar single precision FP (4×32)
        Introduced with Pentium® III in 1999
        8 new 128 bit registers (XMM0 – XMM7)
        Accelerates graphics, video, scientific calculations, …

    Packed:                            Scalar:
                128-bits                              128-bits
         x3   x2       x1   x0           x3         x2          x1         x0
                   +                                        +
         y3   y2       y1   y0           y3         y2          y1         y0

      x3+y3 x2+y2 x1+y1 x0+y0            y3          y2         y1      x0+y0
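
The difference between the packed and scalar forms in the figure can be modeled in a few lines. This is a behavioral sketch only: Python lists stand in for 128-bit registers (index 0 is lane 0), Python floats are double precision rather than single precision, and the function names are made up for this example.

```python
def packed_add(x: list, y: list) -> list:
    """Packed form: all four lanes are added independently."""
    return [a + b for a, b in zip(x, y)]

def scalar_add(x: list, y: list) -> list:
    """Scalar form: only lane 0 is added; lanes 1..3 pass through from y, as in the figure."""
    return [x[0] + y[0]] + list(y[1:])

x = [1.0, 2.0, 3.0, 4.0]      # lanes x0..x3
y = [10.0, 20.0, 30.0, 40.0]  # lanes y0..y3

print(packed_add(x, y))
print(scalar_add(x, y))
```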
