Computer Architecture

              Princess Sumaya University for Technology




         Computer Architecture
       Dr. Esam Al_Qaralleh

Instruction Set Architecture (ISA)
                     Outline

Introduction
Classifying instruction set architectures
Instruction set measurements
     Memory addressing
     Addressing modes for signal processing
     Type and size of operands
     Operations in the instruction set
     Operations for media and signal processing
     Instructions for control flow
     Encoding an instruction set
MIPS architecture

 Instruction Set Principles and
           Examples
   Basic Issues in Instruction Set Design

 What operations, and how many?
    Load/store/increment/branch are sufficient to do any
     computation, but not useful (programs would be too long).
 How are operands specified, and how many?
    Most operations are dyadic (e.g., A ← B + C); some are
     monadic (e.g., A ← ~B).
 How to encode them into an instruction format?
    Instruction lengths should be a multiple of bytes.
 Typical Instruction Set
      32-bit word
      Basic operand addresses are 32 bits long.
      Basic operands (like integers) are 32 bits long.
      In general, an instruction can refer to 3 operands (A ← B + C).
 Challenge: Encode operations in a small number of
  bits.
        Brief Introduction to ISA

 Instruction Set Architecture: a set of instructions
    Each instruction is directly executed by the CPU’s hardware
 How is it represented?
    By a binary format since the hardware understands only bits
       6         5        5                   16

     opcode       rs          rt           Immediate

 Options - fixed or variable length formats
    Fixed - each instruction encoded in same size field (typically 1
     word)
    Variable – half-word, whole-word, multiple word instructions are
     possible


          What Must be Specified?



Instruction Format (encoding)
   How is it decoded?
Location of operands and result
   Where other than memory?
   How many explicit operands?
   How are memory operands located?
Data type and Size
Operations
   What are supported?

Example of Program Execution


Commands (opcodes)
   1: Load AC from memory
   2: Store AC to memory
   5: Add to AC from memory
Example program: add the contents of memory location 940
 to the contents of memory location 941 and store the
 result at 941
[Figure: fetch and execution steps of the example program]
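
The fetch/execute cycle of this example can be traced with a small simulator. This is a minimal sketch under assumptions the slide does not state: a 16-bit instruction word with a 4-bit opcode and a 12-bit address, hexadecimal addresses, a program starting at 0x300, and operand values chosen by the caller. Only the three opcodes (1, 2, 5) and the locations 940 and 941 come from the slide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_WORDS 4096   /* assumed: 12-bit address space */

enum { OP_LOAD = 1, OP_STORE = 2, OP_ADD = 5 };   /* opcodes from the slide */

typedef struct {
    uint16_t mem[MEM_WORDS];
    uint16_t pc;   /* program counter */
    uint16_t ir;   /* instruction register */
    uint16_t ac;   /* accumulator */
} Machine;

/* One fetch/execute cycle. Returns 1 on success, 0 on an unknown opcode. */
int step(Machine *m)
{
    assert(m->pc < MEM_WORDS);
    m->ir = m->mem[m->pc++];              /* fetch */
    uint16_t opcode = m->ir >> 12;        /* assumed: top 4 bits */
    uint16_t addr   = m->ir & 0x0FFF;     /* assumed: low 12 bits */
    switch (opcode) {                     /* execute */
    case OP_LOAD:  m->ac = m->mem[addr];                     return 1;
    case OP_STORE: m->mem[addr] = m->ac;                     return 1;
    case OP_ADD:   m->ac = (uint16_t)(m->ac + m->mem[addr]); return 1;
    default:       return 0;
    }
}

/* Runs Load 940; Add 941; Store 941 and returns the final Mem[941]. */
uint16_t run_add_example(uint16_t a, uint16_t b)
{
    static Machine m;
    memset(&m, 0, sizeof m);
    m.pc = 0x300;              /* assumed program start */
    m.mem[0x300] = 0x1940;     /* 1: Load AC from memory 940 */
    m.mem[0x301] = 0x5941;     /* 5: Add to AC from memory 941 */
    m.mem[0x302] = 0x2941;     /* 2: Store AC to memory 941 */
    m.mem[0x940] = a;
    m.mem[0x941] = b;
    for (int i = 0; i < 3; i++) {
        int ok = step(&m);
        assert(ok);
        (void)ok;
    }
    return m.mem[0x941];
}
```

Calling `run_add_example(3, 2)` performs three fetch/execute cycles and leaves the sum in location 941.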

           Classifying
       Instruction Set
          Architecture
             Instruction Set Design




The instruction set influences everything


Instruction Characteristics

 Usually a simple operation
    Which operation is identified by the op-code field
 But operations require operands - 0, 1, or 2
    To identify where they are, they must be addressed
       • Address is to some piece of storage
       • Typical storage possibilities are main memory, registers, or a stack
 Two options: explicit or implicit addressing
    Implicit - the op-code implies the address of the operands
       • ADD on a stack machine - pops the top 2 elements of the stack,
         then pushes the result
       • HP calculators work this way
    Explicit - the address is specified in some field of the instruction
       • Note the potential for 3 addresses - 2 operands + the destination




Classifying Instruction Set Architectures


                                  Based on CPU internal storage options
                                  AND # of operands




   These choices critically affect the instruction count, CPI, and
   cycle time
Operand Locations for Four ISA Classes




                          C = A + B

 Stack
    Push A
    Push B
    Add
      • Pop the top two values of the stack (A, B) and push
        the result onto the stack
    Pop C
 Accumulator (AC)
    Load A
    Add B
      • Add AC (A) to B and store the result in AC
    Store C
 Register (register-memory)
    Load R1, A
    Add R3, R1, B
    Store R3, C
 Register (load-store)
    Load R1, A
    Load R2, B
    Add R3, R1, R2
    Store R3, C
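
The stack sequence can be sketched in a few lines to show what "implicit operands" means in practice. This is an illustrative model only; the type `Stack` and the functions `push`, `pop`, `add`, and `stack_c_equals_a_plus_b` are hypothetical names, not part of any real ISA.

```c
#include <assert.h>

#define STACK_MAX 16

typedef struct {
    int data[STACK_MAX];
    int top;   /* number of values currently on the stack */
} Stack;

void push(Stack *s, int v)
{
    assert(s->top < STACK_MAX);
    s->data[s->top++] = v;
}

int pop(Stack *s)
{
    assert(s->top > 0);
    return s->data[--s->top];
}

/* Add names no operands: both sources and the destination are implicit. */
void add(Stack *s)
{
    int b = pop(s);
    int a = pop(s);
    push(s, a + b);
}

/* C = A + B as the sequence Push A; Push B; Add; Pop C. */
int stack_c_equals_a_plus_b(int a, int b)
{
    Stack s = { {0}, 0 };
    push(&s, a);
    push(&s, b);
    add(&s);
    return pop(&s);
}
```

Compared with the load-store sequence, no instruction here names a register; the price is that operands are only reachable in stack order.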
   Modern Choice – Load-store Register
           (GPR) Architecture

 Reasons for choosing GPR (general-purpose registers)
  architecture
    Registers (stacks and accumulators…) are faster than memory
    Registers are easier and more effective for a compiler to use
       • (A+B) – (C*D) – (E*F)
           – May be evaluated in any order (for pipelining concerns or …)
               » But on a stack machine it must be evaluated left to right
    Registers can be used to hold variables
       • Reduce memory traffic
       • Speed up programs
       • Improve code density (fewer bits are used to name a register)
 Compiler writers prefer that all registers be equivalent
  and unreserved
    The number of GPR: at least 16


 Characteristics Divide GPR Architectures


# of operands
   Three-operand: 1 result and 2 source
    operands
   Two-operand: 1 operand that is both source and result, and 1
    source
How many operands are memory
  addresses
   0 to 3 (two sources + 1 result)
   From fewest to most memory operands: load-store,
    register-memory, memory-memory
 Pros and Cons of Three Most Common
             GPR Computers

Register-Register: (0,3)
        + Simple, fixed length instruction encoding.
        + Simple code-generation model.
        + Similar number of clocks to execute.
        - Higher instruction count.
Memory-memory: (3,3)
        + Most compact.
        - Different Instruction size.
        - Memory access bottleneck.
Register-Memory: (1,2)
        + Data access without loading first.
        + Easy to encode and yield good density.
        - One operand is destroyed.
        - Limited number of registers.


       Memory Addressing
            Memory Addressing Basics
                            All architectures must address memory
What is accessed - byte, word, multiple words?
   Today’s machines are byte addressable
   Main memory is organized in 32 - 64 byte lines
   Big-Endian or Little-Endian addressing
Hence there is a natural alignment problem
   An access of size s bytes at byte address A is aligned if
                         A mod s = 0
   A misaligned access takes multiple aligned memory
    references
Memory addressing mode influences instruction
 counts (IC) and clock cycles per instruction (CPI)
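
The alignment rule translates directly into code. The sketch below is illustrative: `is_aligned` implements A mod s = 0 as stated, while `aligned_refs_needed` rests on the added assumption that memory returns exactly one aligned s-byte block per reference, so a misaligned access touches two blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An access of `size` bytes at byte address `addr` is aligned
   if addr mod size == 0. */
int is_aligned(uint64_t addr, size_t size)
{
    assert(size > 0);
    return addr % size == 0;
}

/* Assumption: one aligned `size`-byte block is delivered per reference,
   so a misaligned access straddles two blocks. */
int aligned_refs_needed(uint64_t addr, size_t size)
{
    return is_aligned(addr, size) ? 1 : 2;
}
```

For example, a 4-byte access at address 6 is misaligned (6 mod 4 = 2) and spans the blocks starting at addresses 4 and 8.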
                     Byte Ordering


Idea
   Bytes in long word numbered 0 to 3
   Which is most (least) significant?
   Can cause problems when exchanging binary data
    between machines
Big Endian: Byte 0 is most, 3 is least
   IBM 360/370, Motorola 68K, SPARC.
Little Endian: Byte 0 is least, 3 is most
   Intel x86, VAX
Alpha
   Chip can be configured to operate either way
   DEC workstations are little endian
   Cray T3E Alphas are big endian
        Byte Ordering Example


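/* Overlay the same 8 bytes with progressively wider integer types. */
union {
  unsigned char  c[8];
  unsigned short s[4];
  unsigned int   i[2];
  unsigned long  l[1];   /* 8 bytes on the Alpha; 4 bytes on the x86 and Sun runs shown */
} dw;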


     c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
       s[0]          s[1]          s[2]          s[3]
              i[0]                        i[1]
                            l[0]




                Byte Ordering on Alpha


Little Endian
          f0   f1   f2   f3   f4   f5   f6   f7
         c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
         LSB    MSB    LSB      MSB    LSB    MSB    LSB    MSB
            s[0]           s[1]             s[2]          s[3]
         LSB                    MSB    LSB                  MSB
                   i[0]                            i[1]
         LSB                                                MSB
                                   l[0]

                                    Print
Output on Alpha:
        Characters   0-7   ==   [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
        Shorts       0-3   ==   [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
        Ints         0-1   ==   [0xf3f2f1f0,0xf7f6f5f4]
        Long         0     ==   [0xf7f6f5f4f3f2f1f0]


                Byte Ordering on x86


Little Endian
          f0   f1   f2   f3   f4   f5   f6   f7
         c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
         LSB    MSB    LSB      MSB    LSB    MSB    LSB    MSB
            s[0]           s[1]             s[2]          s[3]
         LSB                    MSB    LSB                  MSB
                   i[0]                            i[1]
         LSB                     MSB
                   l[0]

                                    Print
Output on Pentium:
        Characters   0-7   ==   [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
        Shorts       0-3   ==   [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
        Ints         0-1   ==   [0xf3f2f1f0,0xf7f6f5f4]
        Long         0     ==   [0xf3f2f1f0]


                Byte Ordering on Sun


Big Endian
         f0   f1   f2   f3   f4   f5   f6   f7
        c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
        MSB    LSB    MSB        LSB   MSB    LSB    MSB     LSB
             s[0]          s[1]             s[2]          s[3]
        MSB                      LSB   MSB                  LSB
                    i[0]                           i[1]
        MSB                      LSB
                    l[0]

                                    Print
Output on Sun:
       Characters   0-7    ==   [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
       Shorts       0-3    ==   [0xf0f1,0xf2f3,0xf4f5,0xf6f7]
       Ints         0-1    ==   [0xf0f1f2f3,0xf4f5f6f7]
       Long         0      ==   [0xf0f1f2f3]
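
The short and int values printed on the three machines can be reproduced on any host by assembling the bytes explicitly rather than relying on the union. This is a minimal sketch; `load_little_endian`, `load_big_endian`, and `SLIDE_BYTES` are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The eight bytes c[0]..c[7] used in the slides. */
static const unsigned char SLIDE_BYTES[8] =
    { 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7 };

/* Interpret n bytes with byte 0 as the least significant (Alpha, x86). */
uint64_t load_little_endian(const unsigned char *p, size_t n)
{
    assert(n <= 8);
    uint64_t v = 0;
    for (size_t i = n; i > 0; i--)
        v = (v << 8) | p[i - 1];
    return v;
}

/* Interpret n bytes with byte 0 as the most significant (Sun). */
uint64_t load_big_endian(const unsigned char *p, size_t n)
{
    assert(n <= 8);
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}
```

Because the byte order is spelled out in the loops, the results do not depend on the endianness of the machine running the code, which is also the usual way to exchange binary data safely between machines.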


                   Addressing Modes

Immediate
   Add R4, #3
   Regs[R4] ← Regs[R4] + 3
Register
   Add R4, R3
   Regs[R4] ← Regs[R4] + Regs[R3]
Register Indirect
   Add R4, (R1)
   Regs[R4] ← Regs[R4] + Mem[Regs[R1]]
                        Addressing Modes(Cont.)

Direct
   Add R4, (1001)
   Regs[R4] ← Regs[R4] + Mem[1001]
Memory Indirect
   Add R4, @(R3)
   Regs[R4] ← Regs[R4] + Mem[Mem[Regs[R3]]]
                     Addressing Modes(Cont.)

Displacement
   Add R4, 100(R1)
   Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]]
Scaled
   Add R1, 100(R2) [R3]
   Regs[R1] ← Regs[R1] + Mem[100 + Regs[R2] + Regs[R3]*d]
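
Each addressing mode is simply a different recipe for locating the source operand. The sketch below expresses the seven modes as C functions over a toy machine state. The `State` layout, the word-addressed 2048-entry memory, and all function names are assumptions made for illustration; only the operand formulas come from the slides.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS  32
#define MEM_WORDS 2048   /* assumed: small word-addressed toy memory */

typedef struct {
    int64_t regs[NUM_REGS];
    int64_t mem[MEM_WORDS];
} State;

static int64_t reg_read(const State *s, int r)
{
    assert(r >= 0 && r < NUM_REGS);
    return s->regs[r];
}

static int64_t mem_read(const State *s, int64_t addr)
{
    assert(addr >= 0 && addr < MEM_WORDS);
    return s->mem[addr];
}

/* Immediate: the operand is the constant itself. */
int64_t operand_immediate(int64_t imm) { return imm; }

/* Register: Regs[r] */
int64_t operand_register(const State *s, int r) { return reg_read(s, r); }

/* Register indirect: Mem[Regs[r]] */
int64_t operand_register_indirect(const State *s, int r)
{ return mem_read(s, reg_read(s, r)); }

/* Direct: Mem[addr] */
int64_t operand_direct(const State *s, int64_t addr)
{ return mem_read(s, addr); }

/* Memory indirect: Mem[Mem[Regs[r]]] */
int64_t operand_memory_indirect(const State *s, int r)
{ return mem_read(s, mem_read(s, reg_read(s, r))); }

/* Displacement: Mem[disp + Regs[base]] */
int64_t operand_displacement(const State *s, int64_t disp, int base)
{ return mem_read(s, disp + reg_read(s, base)); }

/* Scaled: Mem[disp + Regs[base] + Regs[index] * d] */
int64_t operand_scaled(const State *s, int64_t disp, int base, int index, int64_t d)
{ return mem_read(s, disp + reg_read(s, base) + reg_read(s, index) * d); }
```

In every case the Add instruction then computes destination ← destination + operand; only the operand lookup differs, which is why richer modes cost more memory references per instruction.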
Typical Address Modes (I)




Typical Address Modes (II)




Use of Memory Addressing Mode (Figure 2.7)

  Based on a VAX, which supported everything
  Not counting register mode (50% of all)
             Displacement Address Size

Average of 5 programs from SPECint92 and
 SPECfp92.
   1% of addresses > 16 bits.
   [Figure: curves for Integer Average and FP Average]
       Immediate Addressing Mode

10 Programs from SPECInt92 and
 SPECfp92




        Immediate Addressing Mode

50% to 60% fit within 8 bits
75% to 80% fit within 16 bits
   [Figure: curves for gcc, spice, and TeX]
Short Summary – Memory Addressing

 Need to support at least three addressing
  modes
    Displacement, immediate, and register
     deferred (+ REGISTER)
    They represent 75% to 99% of the addressing
     modes in benchmarks
 The address size for displacement mode
  should be at least 12 to 16 bits (75% to
  99%)
 The size of the immediate field should be at least
  8 to 16 bits (50% to 80%)
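
Whether a given displacement or immediate "fits" in a field is a simple range check. The sketch below assumes the field is a signed two's-complement value, which the slides do not state explicitly; `fits_signed` and `bits_needed_signed` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* True if `value` is representable as a `bits`-wide two's-complement field. */
int fits_signed(int64_t value, int bits)
{
    assert(bits >= 1 && bits <= 63);
    int64_t min = -((int64_t)1 << (bits - 1));
    int64_t max =  ((int64_t)1 << (bits - 1)) - 1;
    return value >= min && value <= max;
}

/* Smallest signed field width that can hold `value`. */
int bits_needed_signed(int64_t value)
{
    for (int bits = 1; bits <= 63; bits++)
        if (fits_signed(value, bits))
            return bits;
    return 64;
}
```

Running a check like this over every displacement and immediate in a program trace is how size distributions like the ones summarized above are gathered.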
Operand Type & Size

Typical types: assume word= 32 bits
   Character - byte - ASCII or EBCDIC (IBM) - 4
    per word
   Short integer - 2 bytes, 2’s complement
   Integer - one word - 2’s complement
   Float - one word - usually IEEE 754 these
    days
   Double precision float - 2 words - IEEE 754
   BCD or packed decimal - 4-bit values packed
    8 per word
Data Access Patterns




   Short Summary – Type and Size of
              Operand

The future - as we go to 64-bit machines
Larger offsets, immediates, etc. are likely
Usage of 64- and 128-bit values will
 increase
DSPs need accumulating registers wider
 than the operand size in memory to aid accuracy in
 fixed-point arithmetic



       ALU Operations
           What Operations are Needed

 Arithmetic + Logical
    Integer arithmetic: ADD, SUB, MULT, DIV, SHIFT
    Logical operation: AND, OR, XOR, NOT
 Data Transfer - copy, load, store
 Control - branch, jump, call, return, trap
 System - OS and memory management
    We’ll ignore these for now - but remember they are needed
 Floating Point
    Same as arithmetic but usually take bigger operands
 Decimal
 String - move, compare, search
 Graphics – pixel and vertex,
  compression/decompression operations
        Top 10 Instructions for 80x86

 load: 22%
 conditional branch: 20%
 compare: 16%
 store: 12%
 add: 8%
 and: 6%
 sub: 5%
 move register-register: 4%
 call: 1%
 return: 1%

 The most widely executed instructions are the simple
  operations of an instruction set
 The top-10 instructions for 80x86 account for 96% of
  instructions executed
 Make them fast, as they are the common case
 Control Instructions are a Big Deal

Jumps - unconditional transfer
Conditional Branches
   How is condition code set? – by flag or part of the
    instruction
   How is target specified? How far away is it?
Calls
   How is target specified? How far away is it?
   Where is return address kept?
   How are the arguments passed? Callee vs. Caller
    save!
Returns
   Where is the return address? How far away is it?
   How are the results passed?

        Breakdown of Control Flows

Call/Returns
   Integer: 19%    FP: 8%
Jump
   Integer: 6% FP: 10%
Conditional Branch
   Integer: 75%    FP: 82%




         Branch Address Specification

Known at compile time for unconditional and
 conditional branches - hence specified in the
 instruction
   As a register containing the target address
   As a PC-relative offset
Consider word length addresses, registers, and
 instructions
   Full address desired? Then pick the register option.
     • BUT - setup and effective address will take longer.
   If you can deal with smaller offset then PC relative
    works
     • PC relative is also position independent - so simple linker
       duty
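
The position independence of PC-relative branches can be shown with a short sketch. It assumes the offset is measured in bytes from the branch's own address; real ISAs often measure from the following instruction and in instruction units, so treat the function names and the convention as illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Offset stored in the branch: signed distance from the branch to its target.
   Assumption: measured in bytes from the branch's own address. */
int64_t pc_relative_offset(uint64_t branch_pc, uint64_t target)
{
    if (target >= branch_pc) {
        assert(target - branch_pc <= (uint64_t)INT64_MAX);
        return (int64_t)(target - branch_pc);
    }
    assert(branch_pc - target <= (uint64_t)INT64_MAX);
    return -(int64_t)(branch_pc - target);
}

/* Target recovered at run time from the current PC and the stored offset. */
uint64_t pc_relative_target(uint64_t branch_pc, int64_t offset)
{
    /* Unsigned wraparound makes a negative offset subtract. */
    return branch_pc + (uint64_t)offset;
}
```

If the same code is loaded at 0x1000 or at 0x5000, the branch and its target move together, so the stored offset is unchanged and nothing needs to be patched when the code is relocated.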
         Returns and Indirect Jumps

Branch target is not known at compile time
Need a way to specify the target
 dynamically
   Use a register
   Permit any addressing mode
   Regs[R4] ← Regs[R4] + Mem[Regs[R1]]
Also useful for
   case or switch
   Dynamically shared libraries
   Higher-order functions or function pointers
Branch Stats - 90% are PC Relative

Call/Return
   TeX = 16%, Spice = 13%, GCC = 10%
Jump
   TeX = 18%, Spice = 12%, GCC = 12%
Conditional
   TeX = 66%, Spice = 75%, GCC = 78%




Branch Distances




           Condition Testing Options




PSW: Program Status Word
What kinds of compares do Branches Use?




A large fraction of comparisons are with zero
           Direction, Frequency, and real
                                  Change




Key points: 75% are forward branches
• Most backward branches are loops, taken about 90% of the time
• Branch statistics are both compiler and application dependent
• Any loop optimizations may have a large effect
   Short Summary – Operations in the
            Instruction Set

Branch addressing should be able to jump to
 about 100+ instructions either above or
 below the branch
   This implies a PC-relative branch displacement of at
    least 8 bits
Register-indirect and PC-relative
 addressing for jump instructions to support
 returns as well as many other features of
 current systems (dynamic allocations)

          Encoding an
       Instruction Set
                                Encoding the ISA

 Encode instructions into a binary representation for
  execution by CPU
 Can pick anything but:
    Affects the size of code - so it should be tight
    Affects the CPU design - in particular the instruction decode
 So it may have a big influence on the CPI or cycle-time
 Must balance several competing forces
    Desire for lots of addressing modes and registers
    Desire to make average program size compact
    Desire to have instructions encoded into lengths that will be easy
     to handle in a pipelined implementation (multiple of bytes)




                3 Popular Encoding Choices

 Variable (compact code but difficult to decode)
     Primary opcode is fixed in size, but opcode modifiers may exist
     Opcode specifies number of arguments - each used as address fields
     Best when there are many addressing modes and operations
     Use as few bits as possible, but individual instructions can vary widely in
      length
     e.g. VAX - integer ADD versions vary between 3 and 19 bytes
 Fixed (easy to decode, but lengthy code)
     Every instruction looks the same - some fields may be interpreted
      differently
     Combine the operation and the addressing mode into the opcode
     e.g. all modern RISC machines
 Hybrid
     Set of fixed formats
     e.g. IBM 360 and Intel 80x86
 Trade-off between size of program
  vs. ease of decoding
3 Popular Encoding Choices (Cont.)




  An Example of Variable Encoding -- VAX


addl3 r1, 737(r2), (r3): 32-bit integer add
 instruction with 3 operands → needs 6 bytes to
 represent it
   Opcode for addl3: 1 byte
   A VAX address specifier is 1 byte (4 bits: addressing
    mode, 4 bits: register)
     • r1: 1 byte (register addressing mode + r1)
     • 737(r2)
         – 1 byte for address specifier (displacement addressing + r2)
         – 2 bytes for displacement 737
     • (r3): 1 byte for address specifier (register indirect + r3)
Length of VAX instructions: 1 to 53 bytes
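
The 6-byte total is just the opcode plus one specifier byte per operand plus any bytes that trail a specifier. The sketch below reproduces that arithmetic; the `Operand` struct and function names are hypothetical, and the model covers only the cases listed on the slide, not the full VAX encoding rules.

```c
#include <assert.h>
#include <stddef.h>

/* One operand: a 1-byte address specifier plus any bytes that follow it. */
typedef struct {
    const char *text;     /* operand as written, for readability only */
    size_t extra_bytes;   /* displacement bytes after the specifier */
} Operand;

size_t instruction_length(size_t opcode_bytes, const Operand *ops, size_t n_ops)
{
    assert(opcode_bytes >= 1);
    size_t len = opcode_bytes;
    for (size_t i = 0; i < n_ops; i++)
        len += 1 + ops[i].extra_bytes;   /* specifier byte + trailing bytes */
    return len;
}

/* addl3 r1, 737(r2), (r3) from the slide. */
size_t addl3_example_length(void)
{
    const Operand ops[] = {
        { "r1",      0 },   /* register mode: specifier only */
        { "737(r2)", 2 },   /* displacement mode: 2-byte displacement */
        { "(r3)",    0 },   /* register indirect: specifier only */
    };
    return instruction_length(1, ops, sizeof ops / sizeof ops[0]);
}
```

The total is 1 + 1 + (1 + 2) + 1 = 6 bytes, matching the slide. A decoder has to walk the specifiers one at a time to find where the instruction ends, which is the decoding cost of variable-length encoding.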
    Short Summary – Encoding the
            Instruction Set

Choice between variable and fixed
 instruction encoding
   Code size matters more than performance → variable
    encoding
   Performance matters more than code size → fixed encoding




       Role of Compilers
Critical goals in ISA from the compiler
 viewpoint
    What features will lead to high-quality code
    What makes it easy to write efficient
     compilers for an architecture




                        Compiler and ISA

ISA decisions are no longer made to ease
 assembly language (AL) programming
Because of high-level languages (HLLs), the ISA is a compiler target today
Performance of a computer will be significantly
 affected by compiler
Understanding compiler technology today is
 critical to designing and efficiently implementing
 an instruction set
Architecture choice affects the code quality and
 the complexity of building a compiler for it

                           Goal of the Compiler

Primary goal is correctness
Second goal is speed of the object code
Others:
     Speed of the compilation
     Ease of providing debug support
     Inter-operability among languages
     Flexibility of the implementation - languages
      may not change much but they do evolve,
      e.g. Fortran 66 ===> HPF
      Make the frequent cases fast and the rare case correct
           Optimization Observations

Hard to reduce branches
Biggest reduction is often memory
 references
Some ALU operation reduction happens
 but it is usually a few %
Implication:
   Branch, Call, and Return become a larger
    relative % of the instruction mix
   Control instructions among the hardest to
    speed up
  How can Architects Help Compiler
                           Writers

Provide Regularity
   Address modes, operations, and data types should be
    orthogonal (independent) of each other
     • Simplify code generation especially multi-pass
     • Counterexample: restricting which registers can be used for
       certain classes of instructions
Provide primitives - not solutions
   Special features that match a HLL construct are often
    un-usable
   What works in one language may be detrimental to
    others


  How can Architects Help Compiler
                    Writers (Cont.)

Simplify trade-offs among alternatives
   How to write good code? What is good code?
     • Metric: IC or code size (no longer sufficient because of caches and
       pipelining)
   Anything that makes code sequence performance
    obvious is a definite win!
     • How many times a variable should be referenced before it is
       cheaper to load it into a register
Provide instructions that bind the quantities
 known at compile time as constants
   Don’t hide compile time constants
     • Instructions which work off of something that the compiler
       thinks could be a run-time determined value hand-cuffs the
       optimizer

           Short Summary -- Compilers

ISA has at least 16 GPR (not counting FP
 registers) to simplify allocation of registers using
 graph coloring
Orthogonality suggests all supported addressing
 modes apply to all instructions that transfer data
Simplicity – understand that less is more in ISA
 design
    Provide primitives instead of solutions
    Simplify trade-offs between alternatives
    Don’t bind constants at runtime
Counterexample – Lack of compiler support for
 multimedia instructions
          The MIPS
       Architecture
                  Expectations for New ISA

 Use general-purpose registers, with a load-store architecture
 Support displacement (offset size 12-16 bits), immediate (size 8 to
  16 bits), and register indirect
 Support 8-, 16-, 32-, and 64-bit integers and 64-bit IEEE 754
  floating-point numbers
 Support the following simple instructions: load, store, add, subtract,
  move register-register, and, shift, compare equal, compare not equal,
  branch (with a PC-relative address at least 8 bits long), jump, call,
  return
 Use fixed instruction encoding if interested in performance and use
  variable instruction encoding if interested in code size
 Provide at least 16 general-purpose registers (GPRs) + separate
  floating-point registers, be sure all addressing modes apply to all
  data transfer instructions, and aim for a minimalist instruction set




                                 MIPS

Simple load- store ISA
Enable efficient pipeline implementation
Fixed instruction set encoding
Efficiency as a compiler target
MIPS64 variant is discussed here




                        Register for MIPS

32 64-bit integer GPR’s - R0, R1, ... R31,
 R0= 0 always
32 FPR’s - used for single or double
 precision
   For single precision: F0, F1, ... , F31 (32-bit)
   For double precision: F0, F2, ... , F30 (64-bit)
Extra status registers - moves via GPR’s
Instructions for moving between an FPR
 and a GPR
                       Data Types for MIPS

8-bit byte, 16-bit half words, 32-bit word, and 64-
 bit double words for integer data
32-bit single precision and 64-bit double
 precision for FP
MIPS64 operations work on 64-bit integer and
 32- or 64-bit floating point
   Bytes, half words, and words are loaded into the
    GPRs with zeros or the sign bit replicated to fill the 64
    bits of the GPRs
All references between memory and either
 GPRs or FPRs are through load or stores
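
Filling a 64-bit register from a narrower value "with zeros or the sign bit replicated" corresponds to zero extension and sign extension. The following is a minimal sketch of both; the function names are hypothetical and the code models the arithmetic only, not any particular MIPS load instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the low `bits` bits of `raw` and fill the rest with zeros. */
uint64_t zero_extend(uint64_t raw, int bits)
{
    assert(bits >= 1 && bits <= 64);
    if (bits == 64)
        return raw;
    return raw & (((uint64_t)1 << bits) - 1);
}

/* Keep the low `bits` bits and replicate bit (bits - 1) into the upper bits. */
uint64_t sign_extend(uint64_t raw, int bits)
{
    assert(bits >= 1 && bits <= 64);
    uint64_t low = zero_extend(raw, bits);
    if (bits == 64)
        return low;
    uint64_t sign = (uint64_t)1 << (bits - 1);
    return (low ^ sign) - sign;   /* subtracting the sign weight fills the upper bits */
}
```

For the byte 0x80, zero extension yields 128 while sign extension yields the 64-bit pattern for -128, which is why an ISA needs both signed and unsigned variants of narrow loads.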
           Addressing Modes for MIPS

Data addressing: immediate and displacement
 (16 bits)
   Displacement: Add R4, 100(R1)
    (Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]])
   Register-indirect: placing 0 in the displacement field
     • Add R4, (R1) (Regs[R4] ← Regs[R4] + Mem[Regs[R1]])
   Absolute addressing (16 bits): using R0 as the base
    register
     • Add R4, (1001) (Regs[R4] ← Regs[R4] + Mem[1001])
Byte addressable with 64-bit address
   Mode selection for Big Endian or Little Endian
           MIPS Instruction Format

Encode addressing mode into the opcode
All instructions are 32 bits with 6-bit
 primary opcode




          MIPS Instruction Format (Cont.)


           I-Type Instruction
  6        5      5               16
opcode     rs      rt           Immediate
       Loads and Stores                 LW R1, 30(R2), S.S F0, 40(R4)
       ALU ops on immediates            DADDIU R1, R2, #3
           rt <-- rs op immediate
       Conditional branches             BEQZ R3, offset
           rs is the register checked
           rt unused
           immediate specifies the offset
       Jump register, jump and link register    JR R3
           rs is target register
           rt and immediate are unused but = 011

     MIPS Instruction Format (Cont.)


                        R-Type Instruction
                6       5     5       5        5       6
             opcode     rs        rt    rd   shamt     funct
  Register-register ALU operations: rd ← rs funct rt     DADDU R1, R2, R3
       Function encodes the data path operations: Add, Sub...
  read/write special registers
  Moves

J-Type Instruction: Jump, Jump and Link, Trap and return from exception
                6                  26
             opcode               Offset added to PC
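
The I-type and R-type layouts can be packed and unpacked with shifts and masks. The sketch below follows the field widths given in the slides (6/5/5/16 and 6/5/5/5/5/6). The struct and function names are hypothetical, and the numeric opcode and funct values used in the usage note are illustrative inputs rather than values taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { unsigned opcode, rs, rt; uint16_t imm; } IType;
typedef struct { unsigned opcode, rs, rt, rd, shamt, funct; } RType;

/* I-type: opcode(6) | rs(5) | rt(5) | immediate(16) */
uint32_t encode_i(IType f)
{
    assert(f.opcode < 64 && f.rs < 32 && f.rt < 32);
    return ((uint32_t)f.opcode << 26) | ((uint32_t)f.rs << 21) |
           ((uint32_t)f.rt << 16) | f.imm;
}

IType decode_i(uint32_t w)
{
    IType f = { w >> 26, (w >> 21) & 0x1F, (w >> 16) & 0x1F,
                (uint16_t)(w & 0xFFFF) };
    return f;
}

/* R-type: opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6) */
uint32_t encode_r(RType f)
{
    assert(f.opcode < 64 && f.rs < 32 && f.rt < 32);
    assert(f.rd < 32 && f.shamt < 32 && f.funct < 64);
    return ((uint32_t)f.opcode << 26) | ((uint32_t)f.rs << 21) |
           ((uint32_t)f.rt << 16) | ((uint32_t)f.rd << 11) |
           ((uint32_t)f.shamt << 6) | f.funct;
}

RType decode_r(uint32_t w)
{
    RType f = { w >> 26, (w >> 21) & 0x1F, (w >> 16) & 0x1F,
                (w >> 11) & 0x1F, (w >> 6) & 0x1F, w & 0x3F };
    return f;
}
```

For instance, `encode_i((IType){35, 2, 1, 30})` packs an instruction of the form `LW R1, 30(R2)` with rs as the base register and rt as the destination. Because every field sits at a fixed bit position, the decoder can extract the register numbers before it knows the operation, which is what makes fixed encoding friendly to pipelining.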




MIPS instruction MIX




SPECint2000


MIPS instruction MIX (Cont.)




       SPECfp2000

