Lecture 5 MIPS R2000 Instructions and the Intel x86

					         Lecture 5:
MIPS R2000 Instructions and
 the Intel x86 Architecture


                     ECE 201-11
                     03 Feb 04
                Administration
• Chapter 3 homework will be posted by next class
  (5 Feb). Due following Tuesday (10 Feb)
• Course is online on Blackboard
              Today’s Objectives
• Complete review of MIPS ISA
• Contrast MIPS to the Intel x86 Architecture
                     Multiply / Divide
• Perform multiply, divide
   –   mult    rs, rt
   –   multu   rs, rt
   –   div     rs, rt
   –   divu    rs, rt
• Move result from multiply, divide
   – mflo rd
       • Low 32 bits for mult
       • Integer division result for div
   – mfhi rd
       • High 32 bits for mult
       • Remainder for div
[Figure: the Registers feed the Mult/Div Unit, which writes its results to HI and LO]
                div/mult SPIM Examples
main:    li         $a0, 4
         li         $t0, 103
         div        $t0, $a0

PC = 00000000      EPC = 00000000 Cause = 00000000 BadVAddr= 00000000
Status = 00000000  HI = 00000003 LO = 00000019
                                     General Registers
R0 (r0) = 00000000 R8 (t0) = 00000067 R16 (s0) = 00000000 R24 (t8) = 00000000
R1 (at) = 00000000 R9 (t1) = 00000000 R17 (s1) = 00000000 R25 (t9) = 00000000
...

main:    li         $a0, 4
         li         $t0, 0x7fffffff
         mult       $t0, $a0

PC = 00000000       EPC = 00000000 Cause = 00000000 BadVAddr= 00000000
Status = 00000000    HI = 00000001 LO = fffffffc
                                        General Registers
R0 (r0) = 00000000 R8 (t0) = 7fffffff R16 (s0) = 00000000 R24 (t8) = 00000000
R1 (at) = 7fff0000 R9 (t1) = 00000000 R17 (s1) = 00000000 R25 (t9) = 00000000
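
The HI and LO values in the two SPIM dumps above can be checked with a small model. This is only a sketch in Python, not SPIM itself: the function names are made up for illustration, and the truncate-toward-zero rule for signed division is an assumption about how the hardware rounds.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mult(rs, rt):
    """Model of mult: return (HI, LO) holding the 64-bit signed product."""
    product = (to_signed32(rs) * to_signed32(rt)) & 0xFFFFFFFFFFFFFFFF
    return (product >> 32) & MASK32, product & MASK32

def div(rs, rt):
    """Model of div: return (HI, LO) = (remainder, quotient).

    Assumption: the quotient truncates toward zero. Division by zero is
    not modeled (Python raises ZeroDivisionError).
    """
    a, b = to_signed32(rs), to_signed32(rt)
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    r = a - q * b
    return r & MASK32, q & MASK32

hi, lo = div(103, 4)
print(f"HI = {hi:08x} LO = {lo:08x}")
hi, lo = mult(0x7FFFFFFF, 4)
print(f"HI = {hi:08x} LO = {lo:08x}")
```

mflo and mfhi would then simply copy LO or HI into rd.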
       Sign-Extension/Zero-Extension
• Sign-extension is used for signed immediates (addi)
  and signed values from memory (lb).
• To sign-extend an n bit number to n+m bits, copy the
  sign-bit m times.
• For example, with n=4 and m=4,
        1011 = -5                 0101 = 5
   11111011 = -5             00000101 = 5
• Zero-extension is used for logical operations (ori), and
  unsigned values from memory (lbu)
• To zero-extend an n bit number to n+m bits, copy zero
  m times.
• For example, with n=4 and m=4,
        1011 = 11                0101 = 5
   00001011 = 11            00000101 = 5
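
Both extension rules can be sketched as bit operations. This is a minimal illustration, not how the hardware is built; the helper names (sign_extend, zero_extend, as_signed) are invented for this example.

```python
def sign_extend(value, n, m):
    """Extend an n-bit pattern to n+m bits by copying the sign bit m times."""
    value &= (1 << n) - 1
    if (value >> (n - 1)) & 1:           # sign bit set: fill the upper m bits with 1s
        value |= ((1 << m) - 1) << n
    return value

def zero_extend(value, n, m):
    """Extend an n-bit pattern to n+m bits; the upper m bits are zero."""
    return value & ((1 << n) - 1)

def as_signed(value, bits):
    """Read a bit pattern of the given width as a two's-complement number."""
    return value - (1 << bits) if (value >> (bits - 1)) & 1 else value

print(format(sign_extend(0b1011, 4, 4), "08b"))
print(format(zero_extend(0b1011, 4, 4), "08b"))
```

Sign-extension keeps the signed value unchanged, while zero-extension keeps the unsigned value unchanged.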
                    Procedure Calls

• Procedures & subroutines for frequently used code
  segments
• Structuring of code
• Steps:
   – Define joint parameter space for main program & procedure
     (typically the registers and the stack)
   – Transfer control to procedure
   – Acquire required storage resources
   – Perform the desired task
   – Store results in joint space
   – Return control to caller
            Jump and Link/Jump

• Calling a procedure:
               jal      ProcedureAddress
• Operation:
   – Jump to the address given as the operand
   – The return address is placed in the link (return address) register
   – The link register, $ra, is $31
• Return:
               jr      $ra
   – The return address register holds the value PC + 4
   – Additional variables are stored on the STACK
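
The effect of jal and jr on the PC and $ra can be sketched as follows. This is a simplified model that follows the slide in using PC + 4 (real MIPS hardware with a branch delay slot links PC + 8); the addresses are hypothetical.

```python
RA = 31  # $ra is register 31

def jal(pc, regs, target):
    """Jump and link: save the return address in $ra and return the new PC."""
    regs[RA] = pc + 4
    return target

def jr(regs, reg):
    """Jump register: the new PC is the value held in the register."""
    return regs[reg]

regs = [0] * 32
pc = 0x00400020                      # hypothetical address of the jal
pc = jal(pc, regs, 0x00400100)       # enter the procedure
pc = jr(regs, RA)                    # procedure returns
print(hex(pc))
```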
                       Example: Swap

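Swapping words: the C procedure

void swap(int v[], int k) {
  int temp;
  temp = v[k];
  v[k] = v[k + 1];
  v[k + 1] = temp;
}

and its MIPS translation

         jal swap
         ...
swap:    sll $t2, $a1, 2        # $t2 = 4 * k
         add $t2, $t2, $a0      # $t2 = address of v[k]
         lw  $t0, 0($t2)        # $t0 = v[k] (temp)
         lw  $t1, 4($t2)        # $t1 = v[k + 1]
         sw  $t1, 0($t2)        # v[k] = v[k + 1]
         sw  $t0, 4($t2)        # v[k + 1] = temp
         jr  $ra                # return to caller

$a0 is start addr of v[ ]
$a1 is k
$t0 is temp
$t1 is used in transferring v[k] = v[k + 1]
$t2 is addr of v[k] = 4*k + start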
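
The address arithmetic in swap can be traced with a word-addressed memory model. This is a sketch: the dictionary memory and the base address are assumptions made for illustration.

```python
def swap(memory, a0, a1):
    """Mirror the MIPS swap: a0 = start address of v, a1 = k."""
    t2 = a1 << 2            # sll $t2,$a1,2   -> 4 * k
    t2 = t2 + a0            # add $t2,$t2,$a0 -> address of v[k]
    t0 = memory[t2]         # lw $t0, 0($t2)
    t1 = memory[t2 + 4]     # lw $t1, 4($t2)
    memory[t2] = t1         # sw $t1, 0($t2)
    memory[t2 + 4] = t0     # sw $t0, 4($t2)

base = 0x10010000           # hypothetical start address of v[]
memory = {base + 4 * i: value for i, value in enumerate([10, 20, 30, 40])}
swap(memory, base, 1)
print([memory[base + 4 * i] for i in range(4)])
```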
           Moving Registers to Memory

• Moving registers to/from memory is typically implemented
  on a stack (our “joint parameter space”):
   – Supported by machine code on some machines
   – Push and pop
   – MIPS has no explicit support, and must be implemented in
     software


• Basic functionality:
   – The stack is an area allocated in memory
   – The stack pointer (SP) points to the top of stack
   – Push decrements the SP and stores a register (the MIPS stack grows
     toward low memory addresses)
   – Pop restores a register and increments the SP
                Stack Pointer Example
High Memory Address

   Before                  Upon Entry                Upon Exit

   SP -> top of stack      (old top of stack)        SP -> top of stack
                           $s1
                           SP -> $s0

   We must use             addi $sp, $sp, -8         lw $s0, 0($sp)
   $s0, $s1 in the         sw $s0, 0($sp)            lw $s1, 4($sp)
   procedure               sw $s1, 4($sp)            addi $sp, $sp, 8

Low Memory Address
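
The entry and exit sequences in this example can be modeled as below. This is a sketch only: registers are a Python dict, memory is a dict keyed by address, and the initial $sp value is hypothetical.

```python
def on_entry(regs, mem):
    """addi $sp,$sp,-8 ; sw $s0,0($sp) ; sw $s1,4($sp)"""
    regs["sp"] -= 8                      # the stack grows toward low addresses
    mem[regs["sp"]] = regs["s0"]
    mem[regs["sp"] + 4] = regs["s1"]

def on_exit(regs, mem):
    """lw $s0,0($sp) ; lw $s1,4($sp) ; addi $sp,$sp,8"""
    regs["s0"] = mem[regs["sp"]]
    regs["s1"] = mem[regs["sp"] + 4]
    regs["sp"] += 8

regs = {"sp": 0x7FFFEFFC, "s0": 111, "s1": 222}
mem = {}
on_entry(regs, mem)
regs["s0"], regs["s1"] = 1, 2            # the procedure body overwrites $s0, $s1
on_exit(regs, mem)
print(regs)
```

After on_exit, $s0, $s1 and $sp hold the values they had before the call.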
                Who saves the register?
• Caller save
   – All values that have to be kept must be saved before procedure is
     called, e.g., $t1
• Callee save
   – Within procedure all used registers are saved and afterwards
     restored, e.g., $s1
• Two types of procedures:
   – A leaf procedure does not call another, so $ra is ok
   – A non-leaf procedure calls another, so it must save $ra, etc., on the stack
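
Why a non-leaf procedure has to save $ra can be seen by tracing a call chain main -> outer -> leaf. This is a toy trace, not real machine state; the labels "main+4" and "outer+4" are hypothetical return addresses.

```python
def return_target_of_outer(save_ra):
    """Trace $ra through main -> outer -> leaf; report where outer returns."""
    stack = []
    ra = "main+4"                # main executes: jal outer
    if save_ra:
        stack.append(ra)         # outer pushes $ra before calling further
    ra = "outer+4"               # outer executes: jal leaf (overwrites $ra)
    leaf_returns_to = ra         # leaf never calls, so its jr $ra is safe
    assert leaf_returns_to == "outer+4"
    if save_ra:
        ra = stack.pop()         # outer restores $ra before returning
    return ra                    # target of outer's jr $ra

print(return_target_of_outer(save_ra=True))
print(return_target_of_outer(save_ra=False))
```

Without the save, outer's jr $ra jumps back into outer itself instead of returning to main.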
Register Names in MIPS Assembly
 Name     Register           Usage       Preserved
          Number                          On call?
 $zero       0       Constant 0             n/a
  $at        1       Used by Assembler      n/a
$v0-$v1     2-3      Results                No
$a0-$a3     4-7      Arguments              No
$t0-$t7     8-15     Temporaries            No
$s0-$s7    16-23     Saves                 Yes
$t8-$t9    24-25     More Temps             No
$k0-$k1    26-27     Reserved by OS        Yes
 $gp         28      Global pointer        Yes
 $sp         29      Stack pointer         Yes
  $fp        30      Frame pointer         Yes
  $ra        31      Return address        Yes
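
The table can be turned into a name-to-number lookup, which is roughly what an assembler needs when it encodes register operands. This is a sketch; build_register_map is an invented helper name.

```python
def build_register_map():
    """Map MIPS register names to register numbers, following the table."""
    names = {"$zero": 0, "$at": 1, "$gp": 28, "$sp": 29, "$fp": 30, "$ra": 31}
    # (prefix, first register number, count) for the contiguous ranges
    for prefix, first, count in [("$v", 2, 2), ("$a", 4, 4), ("$t", 8, 8),
                                 ("$s", 16, 8), ("$k", 26, 2)]:
        for i in range(count):
            names[f"{prefix}{i}"] = first + i
    names["$t8"], names["$t9"] = 24, 25      # the extra temporaries
    return names

REGS = build_register_map()
print(REGS["$t2"], REGS["$a0"], REGS["$ra"])
```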
             The Frame Pointer (FP)

• The stack is used to hold saved registers as well as
data local to a procedure
• For a given procedure, this segment is referred to as the
procedure frame or activation record
• The FP is a special purpose register that maintains the
address of the first word of the activation record
• The FP provides a stable base address within the
procedure, as the SP is dynamic
• The FP is implemented in software
• It is used for convenience, and is not necessary
   – GNU C Compiler uses the FP
   – SGI C Compiler does not, and uses FP as $s8
     Call-Return Linkage: Stack Frames

High Memory Address

   FP ->   Saved argument registers      At the beginning, SP and FP point
                                         to the same word
           Saved return address
           Saved saved registers
   SP ->   Local arrays and structures

Low Memory Address

The region from FP down to SP is the activation record. The SP moves during
procedure execution as registers and local variables are pushed onto the stack.
MIPS Memory Map
Linker and Loader
           Summary of the MIPS ISA

• 32-bit fixed format instructions (RISC based)
• 3 instruction formats (R-type, I-type, J-type)
• 32 32-bit GPR, 32 FP registers and special purpose
registers
• Registers are partitioned by software convention. NOT
enforced by hardware
• 3-address mode for reg-reg arithmetic instructions
• Single address mode for load/store: base+displacement
   – lw & sw (and lh, sh…) are the only ways to access memory
• Simple branch conditions: compare one register
against zero or two registers for =, !=
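
To make the fixed 32-bit formats concrete, here is a sketch that packs R-type and I-type fields into a word. The field widths (6-bit opcode, 5-bit register fields, 5-bit shamt, 6-bit funct, 16-bit immediate) are the standard MIPS layout and are not spelled out in these slides; the function names are invented.

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """R-type: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i_type(opcode, rs, rt, imm):
    """I-type: opcode(6) rs(5) rt(5) immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# add $t2,$t2,$a0 : rs = $t2 (10), rt = $a0 (4), rd = $t2 (10), funct = 0x20
print(f"{encode_r_type(10, 4, 10, 0, 0x20):08x}")
# lw $t0,0($t2)   : opcode = 0x23, base rs = $t2 (10), rt = $t0 (8)
print(f"{encode_i_type(0x23, 10, 8, 0):08x}")
```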
         Summary of the MIPS ISA (cont’d)
• Register $zero always has the value zero (even if you try to write it)
• Branch-and-link instructions and jal put the return address PC+4 into the link register (R31)
• All instructions change all 32 bits of the destination register (including
  lui, lb, lh) and all read all 32 bits of sources (add, sub, and, or, …)
• Immediate arithmetic and logical instructions are extended as follows:
      – Logical immediates ops are zero extended to 32 bits
      – Arithmetic immediates ops are sign extended to 32 bits (including
        addiu)
• The data loaded by the instructions lb and lh are extended as follows:
      – lbu, lhu are zero extended
      – lb, lh are sign extended
• Overflow…
      – Can occur with add, sub, addi
      – Cannot occur in addu, subu, addiu, and, or, xor, nor, shifts, mult,
        multu, div, divu
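
The difference between add and addu on overflow can be sketched as follows. This is a model only: the OverflowTrap exception stands in for the hardware overflow exception and is a name invented for this example.

```python
MASK32 = 0xFFFFFFFF

class OverflowTrap(Exception):
    """Stand-in for the MIPS arithmetic overflow exception."""

def to_signed32(x):
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def addu(a, b):
    """addu: the result wraps modulo 2**32 and never traps."""
    return (a + b) & MASK32

def add(a, b):
    """add: traps if the signed result does not fit in 32 bits."""
    result = to_signed32(a) + to_signed32(b)
    if not -(1 << 31) <= result <= (1 << 31) - 1:
        raise OverflowTrap(f"{result} does not fit in 32 bits")
    return result & MASK32

print(f"{addu(0x7FFFFFFF, 1):08x}")
try:
    add(0x7FFFFFFF, 1)
except OverflowTrap as trap:
    print("overflow:", trap)
```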
The Intel x86 Architecture
            History of the Intel 80x86
• 1971: Intel invents microprocessor - 4004
• 1974: 8080 introduced
   – 8-bit microprocessor
   – Accumulator machine
• 1978: 8086 introduced
   – 16 bit microprocessor
   – Accumulator plus dedicated registers
• 1980: IBM selects 8088 as basis for IBM PC
   – 8088 is 8-bit external bus version of 8086
• 1980: 8087 floating point coprocessor
   – Adds 60 floating point instructions
   – 80 bit floating point registers
   – Uses hybrid stack/register scheme
    History of the Intel 80x86 (cont’d)
• 1982: 80286 introduced
     – 24-bit address
     – memory mapping & protection
• 1985: 80386 introduced
     – 32-bit address
     – 32-bit GP registers
• 1989: 80486 introduced
• 1993: Pentium introduced
• 1995: Pentium Pro introduced
• 1996: Pentium with MMX extensions
     – 57 new instructions
     – Primarily for multimedia applications
Sample MMX Application: Image Averaging
void imageAvg(unsigned char* im1, unsigned char* im2, unsigned char* im3, int sX, int sY)
{
    int counter = sX*sY/16, rem = (sX*sY)%16, i;

    // Average the leftover pixels (fewer than 16) one at a time
    for (i = 0; i < rem; i++, im1++, im2++, im3++)
        *im3 = (*im1 + *im2 + 1)/2;       // Add one to force round-up as in SIMD instruction

    __asm {
        mov     eax, 0                    // pointer offset
        mov     ecx, counter              // loop counter
        mov     esi, im1                  // image pointers
        mov     ebx, im2
        mov     edi, im3
until:
        cmp     ecx, 0
        jz      done
        movq    mm0, [esi+eax]            // load 8 pixels of image 1
        pavgb   mm0, [ebx+eax]            // average with 8 pixels of image 2
        movq    [edi+eax], mm0            // store 8 averaged pixels

        movq    mm0, [esi+eax+8]          // Loop unrolled for speedup
        pavgb   mm0, [ebx+eax+8]
        movq    [edi+eax+8], mm0
        add     eax, 16
        loop    until                     // decrement ecx, repeat while nonzero
done:
        emms                              // clear MMX state before FP code runs
    }
}
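
What pavgb computes on each group of bytes can be modeled in a few lines. This is a sketch of the per-byte rounding average, (a + b + 1) >> 1 on unsigned bytes, and not a replacement for the inline assembly; image_avg is an invented name.

```python
def pavgb(a, b):
    """Per-byte unsigned average with round-up, as pavgb does for packed bytes."""
    return bytes((x + y + 1) >> 1 for x, y in zip(a, b))

def image_avg(im1, im2):
    """Average two equally sized 8-bit images given as bytes objects."""
    if len(im1) != len(im2):
        raise ValueError("images must have the same number of pixels")
    return pavgb(im1, im2)

print(list(image_avg(bytes([0, 1, 255, 10]), bytes([0, 2, 255, 13]))))
```

The +1 is the same round-up the C loop applies to the leftover pixels, so both paths produce identical results.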
        History of the Intel 80x86 (cont’d)
• 1997: Pentium II
• 1999: Pentium III introduced
   – Supports Intel's Internet Streaming SIMD Extensions (SSE)
       • Additional multimedia instructions
       • Four 32-bit floating point operations in parallel
       • Useful in speech recognition, video encoding/decoding
• 2000: Pentium 4 introduced
   – SSE2 instructions extend MMX integer performance
• 2000: Itanium introduced
   – Release of IA-64 (RISC-like) architecture
   – Explicitly Parallel Instruction Computing (EPIC)
   – 128-bit bundle with three instructions and a template
   – 128 general purpose registers and 128 floating point registers
• Intel architecture dictated by backward compatibility
   – Highly irregular architecture
   – About 40 million units sold per year in the US alone!
Intel 80x86 Integer Registers
Pentium 4 Architecture




                         24 Registers
                         TOTAL!!!
              x86 Operand Types

• x86 instructions typically have two operands, where
  one operand is both a source and a destination
  operand.
• Possible combinations include
      Source/destination type    Second source type
      Register                   Register
      Register                   Immediate
      Register                   Memory
      Memory                     Register
      Memory                     Immediate

• No memory-memory or immediate-immediate
• Immediates can be 8, 16, or 32 bits
              80x86 Instructions

• Data movement (move, push, pop)
• Arithmetic and logic (logic ops, tests CCs, shifts,
  integer and decimal arithmetic)
• Control flow (branches, jumps, calls, returns)
• String instructions (move and compare)
• FP data movement (load, load const., store)
• Arithmetic instructions (add, subtract, multiply,
  divide, square root, absolute value)
• Comparisons (can send result to ALU)
• Transcendental functions (sin, cos, log, etc.)
            Top 10 80x86 Instructions
                                            Integer Average
    Rank          Instruction               % Total Executed

     1            load                            22%
     2            conditional branch              20%
     3            compare                         16%
     4            store                           12%
     5            add                              8%
     6            and                              6%
     7            sub                              5%
     8            move register-register           4%
     9            call                             1%
    10            return                           1%
                  Total                           96%

Simple instructions dominate instruction frequency - support these.
                      Addressing Modes

• The x86 offers several different addressing modes for
  accessing memory

  Mode                              Address computation

  Register indirect                 Address in register (mem[R1])

  Base with displacement            Address in base register plus
  (8, 16, or 32-bit displacement)   displacement (mem[R1+100])

  Base plus scaled index            Address is Base + 2^scale x Index,
                                    scale = 0, 1, 2 or 3

  Base plus scaled index with       Address is Base + 2^scale x Index + disp.,
  displacement                      scale = 0, 1, 2 or 3
  (8, 16, or 32-bit displacement)
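
All four modes are special cases of one formula, base + 2^scale x index + displacement. The sketch below computes it for 32-bit addresses; the function and parameter names are invented for illustration.

```python
def effective_address(base, index=0, scale=0, disp=0):
    """Return base + (index * 2**scale) + disp, wrapped to 32 bits."""
    if scale not in (0, 1, 2, 3):
        raise ValueError("scale must be 0, 1, 2 or 3")
    return (base + (index << scale) + disp) & 0xFFFFFFFF

print(hex(effective_address(0x1000)))                           # register indirect
print(hex(effective_address(0x1000, disp=100)))                 # base with displacement
print(hex(effective_address(0x1000, index=3, scale=2)))         # base plus scaled index
print(hex(effective_address(0x1000, index=3, scale=2, disp=8))) # with displacement
```

With scale = 2 the index is multiplied by 4, which matches stepping through an array of 32-bit words.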
         80x86 Instruction Format

• Instruction sizes vary from 1 to 17 bytes
[Figure: 80x86 Length Distribution. Bar chart of % instructions at each length
(length in bytes, 1 to 11; horizontal axis 0% to 30%) for the Espresso, Gcc,
Spice, and NASA7 benchmarks.]
         Performance Comparison
       Pentium Pro vs. MIPS R10000

Benchmark      Pro      MIPS        MIPS÷Pro
SPECint95      8.7      8.9           1.02
SPECfp95       6.0      17.2          2.87

• The Pentium Pro and MIPS R10000 have comparable
  performance on integer computations.

• The MIPS R10000 has much better performance than
  the Pentium Pro for floating point computations.
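
The MIPS÷Pro column is just the ratio of the two SPEC ratings. A quick check in Python, using the numbers from the table above (the dictionary layout is only for illustration):

```python
ratings = {
    "SPECint95": {"Pro": 8.7, "MIPS": 8.9},
    "SPECfp95":  {"Pro": 6.0, "MIPS": 17.2},
}

def mips_over_pro(benchmark):
    """Ratio of the MIPS R10000 rating to the Pentium Pro rating, to 2 places."""
    row = ratings[benchmark]
    return round(row["MIPS"] / row["Pro"], 2)

for name in ratings:
    print(name, mips_over_pro(name))
```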
                       Comparison

• How would you expect the x86 and MIPS architectures to
  compare on the following
   – CPI on SPEC benchmarks
   – Ease of design and implementation
   – Ease of writing assembly language & compilers
   – Code density
   – Overall performance

• What other advantages/disadvantages are there to the
  two architectures?
                  Next Time
• Chapter 4 Computer Arithmetic
• Assignment 1 to be posted