					Computer Architecture
          Chapter 2
       Instruction Sets

     Prof. Jerry Breecher
           CSCI 240
           Fall 2001
                    Introduction
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture
Bonus




                        Chap. 2 - Instruction Sets   2
                     Introduction
The Instruction Set Architecture is that portion of the machine visible
     to the assembly level programmer or to the compiler writer.


                software

                                    instruction set

                hardware


1.   What are the advantages and disadvantages of the various
     instruction set alternatives?
2.   How do languages and compilers affect the ISA?
3.   We use the DLX architecture as an example of a RISC architecture.
                     Classifying Instruction Set Architectures
Classifications can be by:
1.     Stack/accumulator/register
2.     Number of memory operands.
3.     Number of total operands.




Instruction Set Architectures                    Basic ISA Classes

Accumulator:
     1 address              add A             acc <- acc + mem[A]
     1+x address            addx A            acc <- acc + mem[A + x]

Stack:
     0 address              add               tos <- tos + next

General Purpose Register:
     2 address              add A B           EA(A) <- EA(A) + EA(B)
     3 address              add A B C         EA(A) <- EA(B) + EA(C)
     ALU instructions can have two or three operands.

Load/Store:
     0 Memory               load R1, Mem1
                            load R2, Mem2
                            add R1, R2
     1 Memory               add R1, Mem2
     ALU instructions can have 0, 1, 2, or 3 memory operands.
     Shown here are the cases of 0 and 1.


Instruction Set Architectures                    Basic ISA Classes
  The effect of the different address classes is easiest to see with the examples here,
  all of which implement the sequence for C = A + B.

  Stack       Accumulator     Register                Register
                              (register-memory)       (load-store)
  Push A      Load A          Load  R1, A             Load  R1, A
  Push B      Add B           Add   R1, B             Load  R2, B
  Add         Store C         Store C, R1             Add   R3, R1, R2
  Pop C                                               Store C, R3

Registers are the class that won out. The more registers on the CPU, the better.
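
The sequences above can be traced with a toy interpreter. The sketch below is an illustration only: the `run_stack` and `run_load_store` helpers, the tuple-based instruction format, and the dictionary used as memory are all assumptions made for this example, not part of any real ISA.

```python
# Toy interpreters for two of the ISA classes above (illustrative only).

def run_stack(program, mem):
    """Execute a 0-address (stack) program against the memory dict `mem`."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(mem[args[0]])
        elif op == "add":
            # tos <- tos + next: pop two operands, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "pop":
            mem[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown stack op: {op}")
    return mem


def run_load_store(program, mem):
    """Execute a load-store program; only load and store touch memory."""
    regs = {}
    for op, *args in program:
        if op == "load":
            regs[args[0]] = mem[args[1]]
        elif op == "add":
            dst, src1, src2 = args
            regs[dst] = regs[src1] + regs[src2]
        elif op == "store":
            mem[args[0]] = regs[args[1]]
        else:
            raise ValueError(f"unknown load-store op: {op}")
    return mem


stack_prog = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
ls_prog = [("load", "R1", "A"), ("load", "R2", "B"),
           ("add", "R3", "R1", "R2"), ("store", "C", "R3")]

print(run_stack(stack_prog, {"A": 2, "B": 3})["C"])
print(run_load_store(ls_prog, {"A": 2, "B": 3})["C"])
```

Both programs compute the same C; the difference is where the operands live while the add executes (the stack versus named registers).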


Instruction Set Architectures                    Intel 80x86 Integer Registers
GPR0   EAX                       Accumulator
GPR1   ECX                       Count register, string, loop
GPR2   EDX                       Data Register; multiply, divide
GPR3   EBX                       Base Address Register
GPR4   ESP                       Stack Pointer
GPR5   EBP                       Base Pointer – for base of stack seg.
GPR6   ESI                       Index Register
GPR7   EDI                       Index Register
       CS                        Code Segment Pointer
       SS                        Stack Segment Pointer
       DS                        Data Segment Pointer
       ES                        Extra Data Segment Pointer
       FS                        Data Seg. 2
       GS                        Data Seg. 3
PC     EIP                       Instruction Counter
       Eflags                    Condition Codes
                           Memory Addressing
Sections include:
     Interpreting Memory Addresses
     Addressing Modes
     Displacement Address Mode
     Immediate Address Mode




Memory Addressing                                Interpreting Memory Addresses


What object is accessed as a function of the address and length?

Objects have byte addresses – an address refers to the number of bytes
   counted from the beginning of memory.
Little Endian – puts the byte whose address is xx00 at the least
   significant position in the word.
Big Endian – puts the byte whose address is xx00 at the most significant
   position in the word.
Alignment – data must be aligned on a boundary equal to its size.
   Misalignment typically results in an alignment fault that must be
   handled by the Operating System.
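
As a small illustration of the byte-order and alignment rules above, the sketch below lays out one 32-bit word in both byte orders and checks two addresses for alignment. The helper names `store_word` and `is_aligned` and the sample values are assumptions made for this example.

```python
def store_word(value, little_endian):
    """Return the 4 bytes of a 32-bit word in increasing address order."""
    return value.to_bytes(4, "little" if little_endian else "big")


def is_aligned(address, size):
    """An object is aligned when its address is a multiple of its size."""
    return address % size == 0


word = 0x11223344
# Little Endian: the least significant byte (0x44) sits at address xx00.
print(store_word(word, little_endian=True).hex())
# Big Endian: the most significant byte (0x11) sits at address xx00.
print(store_word(word, little_endian=False).hex())
print(is_aligned(0x1004, 4), is_aligned(0x1006, 4))
```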



Memory Addressing                                Addressing Modes
This table shows the most common modes. A more complete set is in
  Figure 2.5.

Addressing Mode     Example Instruction   Meaning                            When Used
Register            Add R4, R3            R[R4] <- R[R4] + R[R3]             When a value is in a register.
Immediate           Add R4, #3            R[R4] <- R[R4] + 3                 For constants.
Displacement        Add R4, 100(R1)       R[R4] <- R[R4] + M[100 + R[R1]]    Accessing local variables.
Register Deferred   Add R4, (R1)          R[R4] <- R[R4] + M[R[R1]]          Using a pointer or a computed address.
Absolute            Add R4, (1001)        R[R4] <- R[R4] + M[1001]           Used for static data.
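
The "Meaning" column can be read as a recipe for fetching the source operand. The following is a minimal sketch of that recipe; the `operand` function, its parameter names, and the register and memory contents are hypothetical and chosen only to exercise each mode.

```python
def operand(mode, regs, mem, reg=None, const=None):
    """Return the source operand value for one addressing mode."""
    if mode == "register":
        return regs[reg]                 # R[reg]
    if mode == "immediate":
        return const                     # the constant itself
    if mode == "displacement":
        return mem[const + regs[reg]]    # M[const + R[reg]]
    if mode == "register_deferred":
        return mem[regs[reg]]            # M[R[reg]]
    if mode == "absolute":
        return mem[const]                # M[const]
    raise ValueError(f"unknown addressing mode: {mode}")


regs = {"R1": 900, "R3": 7, "R4": 1}
mem = {900: 20, 1000: 30, 1001: 40}

# Add R4, 100(R1): R[R4] <- R[R4] + M[100 + R[R1]]
regs["R4"] += operand("displacement", regs, mem, reg="R1", const=100)
print(regs["R4"])
```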


Memory Addressing                                Displacement Addressing Mode
How big should the displacement be?

For addresses that do fit in displacement size:
       Add R4, 10000 (R0)
For addresses that don’t fit in the displacement size, the compiler
  must do the following:
       Load R1, address
       Add R4, 0 (R1)

How big the displacement field should be depends on the displacements
  that typically occur in programs.

On DLX, the space allocated is 16 bits. IA32 encodes displacements of
  8, 16, or 32 bits, depending on the addressing form.
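
The compiler's choice between the two sequences can be sketched as a range check. This example assumes a signed, two's complement 16-bit displacement field; the function names are hypothetical, and the instruction strings simply mirror the notation used on this slide.

```python
def fits_signed(value, bits):
    """True if `value` is representable in a `bits`-wide two's complement field."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def access_sequence(address, disp_bits=16):
    """Pick the instruction sequence for adding M[address] to R4."""
    if fits_signed(address, disp_bits):
        return [f"Add R4, {address}(R0)"]
    # Too big for the field: put the address in a register first.
    return [f"Load R1, {address}", "Add R4, 0(R1)"]


print(access_sequence(10000))    # fits: a single instruction
print(access_sequence(100000))   # does not fit: two instructions
```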



Memory Addressing                                Immediate Address Mode
 Used when the instruction itself needs to supply a numerical
   value.

     At high level:                 At Assembler level:

     a = b + 3;                     Load     R2, 3
                                    Add      R0, R1, R2

     if ( a > 17 )                  Load   R2, 17
                                    CMPBGT R1, R2

     goto    Addr                   Load           R1, Address
                                    Jump           (R1)


  So how would you get a 32 bit value into a register?
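
One common answer to the question above is to build the value from two 16-bit immediates: set the upper half, then OR in the lower half. The sketch below shows only the arithmetic. The `LHI` and `ORI` mnemonics in the comments are an assumption about how a DLX-style machine would spell the two steps, and the function names are hypothetical.

```python
def split_halves(value):
    """Split a 32-bit value into its upper and lower 16-bit halves."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF


def build_register(value):
    """Rebuild the value the way a two-instruction sequence would."""
    hi, lo = split_halves(value)
    reg = hi << 16    # e.g. LHI R1, hi      (upper half set, lower half zero)
    reg |= lo         # e.g. ORI R1, R1, lo  (immediate treated as unsigned)
    return reg


hi, lo = split_halves(0x12345678)
print(hex(hi), hex(lo))
print(hex(build_register(0x12345678)))
```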
                     Operations In The Instruction Set
Sections include:
     Detailed information about types of instructions.
     Instructions for Control Flow (conditional branches, jumps)




Operations In The Instruction Set                Operator Types
Arithmetic and logical -            and, add
Data transfer -                     move, load
Control -                           branch, jump, call
System -                            system call, traps
Floating point -                    add, mul, div, sqrt
Decimal -                           add, convert
String -                            move, compare
Multimedia -                        2D, 3D? e.g., Intel MMX and Sun VIS




Operations In The Instruction Set                Control Instructions

Conditional branches are 20% of all instructions!!




      Control Instructions Issues:
      •     taken or not
      •     where is the target
      •     link return address
      •     save or restore
      Instructions that change the PC:
      •     (conditional) branches, (unconditional) jumps
      •     function calls, function returns
      •     system calls, system returns




Operations In The Instruction Set                Control Instructions
There are numerous tradeoffs:

Compare and branch
        + no extra compare, no state passed between instructions
        -- requires ALU op, restricts code scheduling opportunities
Implicitly set condition codes - Z, N, V, C
        + can be set "for free"
        -- constrains code reordering, extra state to save/restore
Explicitly set condition codes
        + can be set "for free", decouples branch/fetch from pipeline
        -- extra state to save/restore
Condition in general-purpose register
        + no special state but uses up a register
        -- branch condition separate from branch logic in pipeline

Some data for MIPS
        > 80% branches use immediate data, > 80% of those zero
        50% branches use == 0 or <> 0
Compromise in MIPS
        branch==0, branch<>0
        compare instructions for all other compares
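
To make the Z, N, V, C condition codes concrete, here is a minimal sketch of how a 32-bit add could set them as a side effect. The function name and the dictionary of flags are assumptions made for this example; real machines differ in which instructions update which flags.

```python
MASK32 = 0xFFFFFFFF


def add_with_flags(a, b):
    """Add two 32-bit values; return (result, flags) with Z, N, C, V."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "Z": result == 0,          # result is zero
        "N": bool(result >> 31),   # sign bit of the result
        "C": full > MASK32,        # carry out of bit 31
        # Signed overflow: both operands differ in sign from the result.
        "V": bool(((a ^ result) & (b ^ result)) >> 31),
    }
    return result, flags


print(add_with_flags(0x7FFFFFFF, 1))
print(add_with_flags(0xFFFFFFFF, 1))
```

A later conditional branch would read these flags instead of redoing the comparison, which is the "free" part; the cost is that the flags become extra state between the two instructions.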

Operations In The Instruction Set                Control Instructions
Link Return Address:

implicit register - many recent architectures use this
     + fast, simple
     -- s/w save register before next call, surprise traps?
explicit register
     + may avoid saving register
     -- register must be specified
processor stack
     + recursion direct
     -- complex instructions

Save or restore state:

What state?
     function calls: registers
     system calls: registers, flags, PC, PSW, etc
Hardware need not save registers
     Caller can save registers in use
     Callee save registers it will use
Hardware register save
     IBM STM, VAX CALLS
     Faster?
Many recent architectures do no register saving
Or do implicit register saving with register windows (SPARC)



               Type And Size of Operands
The type of the operand is usually encoded in the Opcode – a LDW
implies loading of a word.
Common sizes are:
     Character (1 byte)
     Half word (16 bits)
     Word (32 bits)
     Single Precision Floating Point (1 Word)
     Double Precision Floating Point (2 Words)
Integers are two’s complement binary.
Floating point is IEEE 754.
Some languages (like COBOL) use packed decimal.
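
Two of these representations are easy to show in a few lines. The sketch below encodes and decodes two's complement values and packs decimal digits two per byte. The helper names are hypothetical, and the packed-decimal helper is a simplification: it omits the sign nibble that real packed-decimal formats carry.

```python
def twos_complement(value, bits):
    """Encode a signed integer as an unsigned `bits`-wide bit pattern."""
    return value & ((1 << bits) - 1)


def from_twos_complement(pattern, bits):
    """Decode a `bits`-wide two's complement bit pattern."""
    sign_bit = 1 << (bits - 1)
    return (pattern ^ sign_bit) - sign_bit


def packed_decimal(digits):
    """Pack a string of decimal digits two per byte (sign nibble omitted)."""
    if len(digits) % 2:
        digits = "0" + digits
    return bytes(int(digits[i]) << 4 | int(digits[i + 1])
                 for i in range(0, len(digits), 2))


print(hex(twos_complement(-16, 8)))
print(from_twos_complement(0xF0, 8))
print(packed_decimal("1234").hex())
```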




      Encoding And Instruction Set
This section has to do with how an assembly level instruction is
encoded into binary.

Ultimately, it’s the binary that is read and interpreted by the
machine.




            We will be using the Intel instruction set which is defined at:
            http://developer.intel.com/design/Pentium4/manuals.

            Volume 2 has the instruction set.

Encoding And Instruction Set                     80x86 Instruction Encoding

Here’s some sample code that’s been disassembled. It was compiled with the
debugger option so is not optimized. This code was produced using Visual Studio.

for ( index = 0; index < iterations; index++ )
  0040D3AF   C7 45 F0 00 00 00 00   mov   dword ptr [ebp-10h],0
  0040D3B6   EB 09                  jmp   main+0D1h (0040d3c1)
  0040D3B8   8B 4D F0               mov   ecx,dword ptr [ebp-10h]
  0040D3BB   83 C1 01               add   ecx,1
  0040D3BE   89 4D F0               mov   dword ptr [ebp-10h],ecx
  0040D3C1   8B 55 F0               mov   edx,dword ptr [ebp-10h]
  0040D3C4   3B 55 F8               cmp   edx,dword ptr [ebp-8]
  0040D3C7   7D 15                  jge   main+0EEh (0040d3de)
          long_temp = (*alignment + long_temp) % 47;
  0040D3C9   8B 45 F4               mov   eax,dword ptr [ebp-0Ch]
  0040D3CC   8B 00                  mov   eax,dword ptr [eax]
  0040D3CE   03 45 EC               add   eax,dword ptr [ebp-14h]
  0040D3D1   99                     cdq
  0040D3D2   B9 2F 00 00 00         mov   ecx,2Fh
  0040D3D7   F7 F9                  idiv  eax,ecx
  0040D3D9   89 55 EC               mov   dword ptr [ebp-14h],edx
  0040D3DC   EB DA                  jmp   main+0C8h (0040d3b8)
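
The three two-byte jumps in this listing (`EB 09`, `7D 15`, `EB DA`) each hold a signed 8-bit displacement measured from the address of the following instruction. The sketch below recomputes their targets from the bytes shown; the function name is hypothetical and it handles only this two-byte short form.

```python
def short_jump_target(address, encoding):
    """Target of a 2-byte x86 short jump: opcode byte + signed 8-bit displacement."""
    disp = encoding[1]
    if disp >= 0x80:          # sign-extend the 8-bit displacement
        disp -= 0x100
    # The displacement is relative to the address of the NEXT instruction.
    return address + len(encoding) + disp


print(hex(short_jump_target(0x0040D3B6, bytes.fromhex("EB09"))))
print(hex(short_jump_target(0x0040D3C7, bytes.fromhex("7D15"))))
print(hex(short_jump_target(0x0040D3DC, bytes.fromhex("EBDA"))))
```

The results match the targets the disassembler printed in parentheses (0040d3c1, 0040d3de, and 0040d3b8); the last one is a backward jump because `DA` is negative as a signed byte.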
Encoding And Instruction Set                     80x86 Instruction Encoding

Here’s some sample code that’s been disassembled. It was compiled with
optimization. This code was produced using Visual Studio.

 for ( index = 0; index < iterations; index++ )
 00401000   8B 0D 40 54 40 00   mov    ecx,dword ptr ds:[405440h]
 00401006   33 D2               xor    edx,edx
 00401008   85 C9               test   ecx,ecx
 0040100A   7E 14               jle    00401020
 0040100C   56                  push   esi
 0040100D   57                  push   edi
 0040100E   8B F1               mov    esi,ecx
       long_temp = (*alignment + long_temp) % 47;
 00401010   8D 04 11            lea    eax,[ecx+edx]
 00401013   BF 2F 00 00 00      mov    edi,2Fh
 00401018   99                  cdq
 00401019   F7 FF               idiv   eax,edi
 0040101B   4E                  dec    esi
 0040101C   75 F2               jne    00401010
 0040101E   5F                  pop    edi
 0040101F   5E                  pop    esi
 00401020   C3                  ret
Encoding And Instruction Set                     80x86 Instruction Encoding

Here’s some sample code that’s been disassembled. It was compiled with
optimization. This code was produced using gcc and gdb. For details, see Lab 2.1.

 for ( index = 0; index < iterations; index++ )
 0x804852f <main+143>: add $0x10,%esp
 0x8048532 <main+146>: lea 0xfffffff8(%ebp),%edx
 0x8048535 <main+149>: test %esi,%esi
 0x8048537 <main+151>: jle 0x8048543 <main+163>
 0x8048539 <main+153>: mov %esi,%eax
 0x804853b <main+155>: nop
 0x804853c <main+156>: lea 0x0(%esi,1),%esi
      long_temp = (*alignment + long_temp) % 47;
 0x8048540 <main+160>: dec %eax
 0x8048541 <main+161>: jne 0x8048540 <main+160>
 0x8048543 <main+163>: add $0xfffffff4,%esp

Note that the representation of the code is dependent on the
compiler/debugger!
Encoding And Instruction Set                     80x86 Instruction Encoding

A morass of disjoint encoding!! This is Figure D.8.
The number above each field is its width in bits.

    4      3     1       8
   ADD    Reg    W     Disp.

    6      2       8          8
   SHL    V/w   postbyte    Disp.

    7      1       8          8
   TEST    W    postbyte   Immediate



Encoding And Instruction Set                     80x86 Instruction Encoding

    4      4      8
   JE    Cond   Disp.

    8        16            16
  CALLF    Offset    Segment Number

    6      2       8         8
   MOV    D/w   postbyte   Disp.

    5      3
  PUSH    Reg



Encoding And Instruction Set                     80x86 Instruction Encoding
        Here’s the instruction that we had several pages ago:
        0040D3AF C7 45 F0 00 00 00 00 mov          dword ptr [ebp-10h],0
        It is described in:
        http://developer.intel.com/design/Pentium4/manuals/24547103.pdf
        (I found it on page 472, but this is obviously version dependent.)

C7 /0    MOV r/m32,imm32          Move an immediate 32 bit data item to a register or to memory.

Copies the second operand (source operand) to the first operand (destination operand). The
    source operand can be an immediate value, general purpose register, segment register,
    or memory location. Both operands must be the same size, which can be a byte, a word,
    or a doubleword.
In our case, because of the “C7” Opcode, we know it’s a sub-flavor of MOV putting an
    immediate value into memory.

                        C7 45 F0 00 00 00 00 mov        dword ptr [ebp-10h],0
   C7             Op Code for Mov Immediate.
   45             Target Register + use next 8 bits as displacement.
   F0             This is -10 hex.
   00 00 00 00    32 bits of 0.
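
The byte-by-byte reading above can be checked mechanically. The following sketch decodes only this one form (`C7 /0` with a register base and an 8-bit displacement) and rejects everything else. The function name and the returned tuple layout are assumptions made for this example; it is not a general x86 decoder.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_imm32(code):
    """Decode `C7 /0` when the second byte selects base register + 8-bit displacement."""
    opcode, second = code[0], code[1]
    # The second byte splits into 2-bit, 3-bit, and 3-bit fields.
    mod, reg, rm = second >> 6, (second >> 3) & 7, second & 7
    if opcode != 0xC7 or reg != 0 or mod != 1 or rm == 4:
        raise ValueError("form not handled by this sketch")
    disp = int.from_bytes(code[2:3], "little", signed=True)   # F0 -> -16
    imm = int.from_bytes(code[3:7], "little")                 # 32-bit immediate
    return REG32[rm], disp, imm


print(decode_mov_imm32(bytes.fromhex("C745F000000000")))
```

Here `45` splits into `01 000 101`: the top two bits say an 8-bit displacement follows, the middle three are the `/0` in `C7 /0`, and the low three select ebp.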
                   The Role of Compilers
Compiler goals:
• All correct programs execute correctly
• Most compiled programs execute fast (optimizations)
• Fast compilation
• Debugging support




The Role of Compilers                            Steps In Compilation
Parsing --> intermediate representation
Jump Optimization
Loop Optimizations
Register Allocation
Code Generation --> assembly code
Common Sub-Expression
Procedure in-lining
Constant Propagation
Strength Reduction
Pipeline Scheduling




The Role of Compilers                            Steps In Compilation

Optimization Name     Explanation                                         % of the total number of
                                                                          optimizing transformations
High Level            At or near the source level; machine-independent    Not Measured
Local                 Within Straight Line Code                           40%
Global                Across A Branch                                     42%
Machine Dependent     Depends on Machine Knowledge                        Not Measured



The Role of Compilers                            What compiler writers want:
• regularity
• orthogonality
• composability

Compilers perform a giant case analysis
• too many choices make it hard

Orthogonal instruction sets
• operation, addressing mode, data type

One solution or all possible solutions
• 2 branch conditions - eq, lt
• or all six - eq, ne, lt, gt, le, ge
• not 3 or 4

There are advantages to having instructions that are primitives.
Let the compiler put the instructions together to make more complex
  sequences.
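
The "2 branch conditions or all six" point can be illustrated by showing that eq and lt are enough to express the other four, by swapping operands or inverting the branch sense. This is a sketch of the idea only; the function names stand in for branch conditions and are not taken from any instruction set.

```python
def eq(a, b):
    return a == b


def lt(a, b):
    return a < b


# The remaining four conditions need no new primitives.
def ne(a, b):
    return not eq(a, b)      # invert the branch sense


def gt(a, b):
    return lt(b, a)          # swap the operands


def ge(a, b):
    return not lt(a, b)      # invert the branch sense


def le(a, b):
    return not lt(b, a)      # swap the operands, then invert


for a, b in [(1, 2), (2, 2), (3, 2)]:
    print(a, b, ne(a, b), gt(a, b), ge(a, b), le(a, b))
```

With only two primitives the compiler has one way to build each comparison; with all six it never has to build one. A set of three or four gives it irregular cases to analyze.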



                   The DLX Architecture
DLX (pronounced DELUX) is an instruction set introduced by
Hennessy and Patterson in the 1st edition of this text.

DLX is very RISC oriented.

DLX will be used for many examples throughout the course.




The DLX Architecture                             DLX Characteristics
RISC strongly related to MIPS
32-bit byte addresses aligned
Load/store - only displacement addressing
Standard datatypes
3 fixed length formats
32 32-bit GPRs (r0 = 0)
16 64-bit (32 32-bit) FPRs
FP status register
No Condition Codes

Data transfer
• load/store word, load/store byte/halfword signed?
• load/store FP single/double
• moves between GPRs and FPRs
ALU
• add/subtract signed? immediate?
• multiply/divide signed?
• and,or,xor immediate?, shifts: ll, rl, ra immediate?
• sets immediate?



The DLX Architecture                             DLX Characteristics
  Control
  •   branches == 0, <> 0
  •   conditional branch testing FP bit
  •   jump, jump register
  •   jump & link, jump & link register
  •   trap, return-from-exception

  Floating Point
  •    add/sub/mul/div
  •    single/double
  •    fp converts, fp set



The DLX Architecture                             The DLX Encoding
  Register-Register
     31         26 25     21 20        16 15     11 10   6 5         0

          Op        Rs1           Rs2       Rd                 Opx

  Register-Immediate
     31         26 25     21 20        16 15                         0

          Op        Rs1           Rd             immediate

  Branch
     31         26 25     21 20        16 15                         0

          Op        Rs1    Rs2/Opx               immediate

  Jump / Call
     31         26 25                                                0

          Op                           target
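
The Register-Immediate layout above (Op in bits 31-26, Rs1 in 25-21, Rd in 20-16, immediate in 15-0) can be packed and unpacked with shifts and masks. The sketch below does that for this one format. The function names are hypothetical, and the opcode value is made up for illustration rather than taken from the DLX opcode table.

```python
def encode_i_type(op, rs1, rd, immediate):
    """Pack Op[31:26] Rs1[25:21] Rd[20:16] immediate[15:0] into one word."""
    if not (0 <= op < 64 and 0 <= rs1 < 32 and 0 <= rd < 32):
        raise ValueError("field out of range")
    if not -(1 << 15) <= immediate < (1 << 16):
        raise ValueError("immediate does not fit in 16 bits")
    return (op << 26) | (rs1 << 21) | (rd << 16) | (immediate & 0xFFFF)


def decode_i_type(word):
    """Unpack the four fields; the immediate is returned sign-extended."""
    imm = word & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F, imm


OP_EXAMPLE = 0x08   # hypothetical opcode value, chosen only for illustration
word = encode_i_type(OP_EXAMPLE, rs1=1, rd=4, immediate=-16)
print(f"{word:08x}")
print(decode_i_type(word))
```

Because every format is 32 bits wide with Op in the same place, the decoder can extract the opcode before it knows which of the formats it is looking at.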

BONUS                                            RISC versus CISC

combines 3 features
• architecture
• implementation
• compilers and OS
argues that
• implementation effects are second order
• compilers are similar
• RISCs are better than CISCs: fair comparison?

• NEEDS MORE WORK


BONUS                                            RISC versus CISC

RISC factor = (CPI VAX * Instr VAX) / (CPI MIPS * Instr MIPS)

Benchmark   Instruction Ratio   CPI MIPS   CPI VAX   CPI Ratio   RISC factor
li          1.6                 1.1         6.5       6.0        3.7
eqntott     1.1                 1.3         4.4       3.5        3.3
fpppp       2.9                 1.5        15.2      10.5        2.7
tomcatv     2.9                 2.1        17.5       8.2        2.9
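
The RISC factor formula can be applied directly to the table's columns. The sketch below assumes that the "Instruction Ratio" column means Instr MIPS / Instr VAX, so that the VAX instruction count cancels; that reading, and the function name, are assumptions of this example.

```python
def risc_factor(instr_ratio, cpi_mips, cpi_vax):
    """(CPI_VAX * Instr_VAX) / (CPI_MIPS * Instr_MIPS).

    Assumes instr_ratio = Instr_MIPS / Instr_VAX, so Instr_VAX cancels.
    """
    return cpi_vax / (cpi_mips * instr_ratio)


print(round(risc_factor(1.6, 1.1, 6.5), 1))    # li row
print(round(risc_factor(2.9, 2.1, 17.5), 1))   # tomcatv row
```

Only these two rows are checked here. The table entries are rounded, so recomputing from them need not reproduce every listed factor exactly.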




BONUS                                            RISC versus CISC
Compensating factors
• Increase VAX CPI but decrease VAX instruction count
• Increase MIPS instruction count
• e.g. 1: loads/stores versus operand specifiers
• e.g. 2: necessary complex instructions: loop branches
Factors favoring VAX
• Big immediate values
• Not-taken branches incur no delay
Factors favoring MIPS
• Operand specifier decoding
• Number of registers
• Separate floating point unit
• Simple branches/jumps (lower latency)
• No complex instructions
• Instruction scheduling
• Translation buffer
• Branch displacement size




                         Wrapup
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture
Bonus



