Register Usage in MIPS ABI

                    Register Usage in MIPS ABI

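                    Register          Soft       ABI function for this
                    Number            Name       register

                    $0                zero       always contains zero
                    $1                at         assembler temporary
                    $2-$3             v0,v1      function return values
                    $4-$7             a0-a3      function arguments
                    $8-$15            t0-t7      temporaries (caller-saved)
                    $16-$23           s0-s7      saved registers (callee-saved)
                    $24-$25           t8,t9      temporaries (caller-saved)
                    $26-$27           k0,k1      reserved for the operating system kernel
                    $28               gp         global pointer
                    $29               sp         stack pointer
                    $30               fp         frame pointer
                    $31               ra         return address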

                                              Inf3 Computer Architecture - 2007-2008               40

          The ABI gives a well-understood function to each register in the general-purpose register
          set. Some uses are obvious, such as the stack pointer (sp). There are three other special registers:
          the return address (ra), the frame pointer (fp) and the global pointer (gp). The ra register is
          assigned the return address when a function call is made. Software will put this value on the
          stack if the called function itself calls further functions. The fp register points to the base of the
          stack frame for the current function, as shown on the next slide. The gp register, when used,
          points to a pool of global data that can be referenced by all functions. This may
          include variables with file or global scope.
          A function can use registers t0-t9 freely, but if it calls another function they may be overwritten.
          A function may use s0-s7 only if it saves their original contents first and restores them before
          it returns. Hence, s0-s7 are callee-saved, whereas t0-t9 are caller-saved registers.
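
          The register conventions described above can be captured in a small lookup table. The
          sketch below is illustrative only: the function and type names are made up for this
          example, and the names k0 and k1 (registers $26-$27, conventionally reserved for the
          operating system kernel) are included so that the table covers all 32 registers.

```c
/* Conventional software names of the 32 MIPS general-purpose registers. */
static const char *const mips_reg_names[32] = {
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0",   "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0",   "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8",   "t9", "k0", "k1", "gp", "sp", "fp", "ra"
};

/* Hypothetical classification used only in this example. */
enum save_class { CALLER_SAVED, CALLEE_SAVED, OTHER };

/* Returns the software name of register number n, or "?" if out of range. */
const char *mips_reg_name(int n)
{
    return (n >= 0 && n < 32) ? mips_reg_names[n] : "?";
}

/* t0-t9 are caller-saved and s0-s7 are callee-saved, as stated in the notes. */
enum save_class mips_save_class(int n)
{
    if ((n >= 8 && n <= 15) || n == 24 || n == 25)
        return CALLER_SAVED;
    if (n >= 16 && n <= 23)
        return CALLEE_SAVED;
    return OTHER;   /* special-purpose registers are not classified here */
}
```

          For example, mips_reg_name(29) gives "sp" and mips_save_class(16) gives CALLEE_SAVED.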

         Additional notes:

Informatics 3 - Computer Architecture                                                                              40

                    Functions and Stack Frames

                    int foo (int i)
                    {
                       return bar (i);
                    }

                    int bar (int n)
                    {
                      int a = n+1, b = n-1;
                      return (a*b);
                    }

                       Each function has a dynamically allocated stack frame

                       Frame contents normally accessed by addresses that are relative to either
                        the stack pointer $sp or the frame pointer $fp

                    Stack layout (the stack usually grows downwards):

                                                    high addresses
                                                +---------------------+
                                                | Stack frame for foo |
                                                +---------------------+
                                                | Stack frame for bar |
                                                +---------------------+  <- $sp
                                                | free stack space    |
                                                +---------------------+
                                                    low addresses


          Stacks usually grow downwards in memory. Can you think why this might be?


                      Anatomy of a Stack Frame

                  int foo (int i)
                  {
                     return bar (i);
                  }

                  int bar (int n)
                  {
                    int a = n+1, b = n-1;
                    return (a*b);
                  }

                     Positive offsets from $fp = args
                     Negative offsets from $fp = locals

                     Not all portions of frame are needed by all functions
                     Callee save space holds previous $fp, $ra, and any $s0-$s7 that are
                      modified by function bar

                  Stack layout:

                                                    high addresses
                                                +---------------------+
                      Stack frame for foo       |        ...          |
                                                +---------------------+
                                                | incoming args       |
                                                +---------------------+
                                                | callee-save space   |
                      Stack frame for bar       +---------------------+
                                                | local variables     |
                                                +---------------------+
                                                | outgoing args       |  <- $sp
                                                +---------------------+
                                                | free stack space    |
                                                +---------------------+
                                                    low addresses


          The incoming arguments are values passed from foo to bar. Some of the args may be passed
          in registers and may not need space on the stack. The callee-save space is a region that bar
          can use to save any of $s0-$s7 that may be modified in bar. Local variables in bar may
          require some storage space on the stack. The outgoing args space is where args for functions
          that bar calls will be stored. This space will become the incoming args space of functions
          that bar calls (if any). If bar calls several functions, then the outgoing args space would
          typically be the maximum space needed by any such function, allowing it to be allocated
          once, when bar’s frame is created.
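
          The sizing rule for the outgoing args space can be sketched in C. This is a simplified
          model, not what any particular compiler does: the struct and function names are
          hypothetical, incoming arguments are assumed to live in the caller's frame, and
          alignment padding is ignored.

```c
#include <stddef.h>

/* Hypothetical summary of what one function needs in its stack frame (bytes). */
struct frame_needs {
    size_t callee_save_bytes;       /* saved $fp, $ra and any of $s0-$s7 */
    size_t local_bytes;             /* local variables kept on the stack */
    const size_t *call_arg_bytes;   /* stack argument bytes for each call made */
    size_t n_calls;                 /* number of entries in call_arg_bytes */
};

/* Outgoing args space: the maximum needed by any function that is called. */
size_t outgoing_args_bytes(const struct frame_needs *f)
{
    size_t max = 0;
    for (size_t k = 0; k < f->n_calls; k++)
        if (f->call_arg_bytes[k] > max)
            max = f->call_arg_bytes[k];
    return max;
}

/* Total frame size under the simplifying assumptions stated above. */
size_t frame_bytes(const struct frame_needs *f)
{
    return f->callee_save_bytes + f->local_bytes + outgoing_args_bytes(f);
}
```

          A function with 8 bytes of callee-save space, 8 bytes of locals and calls needing 16, 24
          and 0 bytes of stack arguments would reserve 24 bytes of outgoing args space, giving a
          40-byte frame in this model.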


                    Call – Return Sequencing

                       Call sequence (in the caller)
                         –   Save caller-saved registers
                         –   Copy arguments to stack or regs
                         –   Call the function

                       Function Prologue (in the callee)
                         –   Allocate callee’s stack frame
                         –   Reposition frame pointer
                         –   Save callee-saved registers

                       < execute body of function >

                       Function Epilogue (in the callee)
                         –   Restore callee-saved registers
                         –   Restore frame pointer
                         –   De-allocate callee’s stack frame
                         –   Return to caller

                       Return sequence (in the caller)
                         –   Restore caller-saved registers


          Exercise: take the foo() and bar() code shown earlier. Compile it using gcc on your
          workstation to produce an assembler file, and identify the four sequences listed in this slide.
          To do this type:

           gcc -O -S -o assembler.lis program.c

          Where assembler.lis is the output where your assembler code will be produced, and
          program.c is the name of your C source file containing foo() and bar().
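
          A possible program.c for this exercise is sketched below. It is the foo() and bar() code
          from the slides with the braces completed; the prototype for bar is an addition so that
          the file compiles cleanly with a modern compiler. With optimisation enabled the compiler
          may simplify some of the four sequences, so it can help to compare the output with and
          without -O.

```c
/* program.c: the foo() and bar() functions from the slides. */

int bar(int n);   /* prototype, so that foo can call bar before its definition */

int foo(int i)
{
    return bar(i);
}

int bar(int n)
{
    int a = n + 1, b = n - 1;
    return a * b;
}
```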


                  Categorising Data by Location and Access

                         C programs contain several categories of data, according to where they
                          live and how they are created
                         The way addresses are computed depends on the category of access

                        Classification                Where data is located                            How created                 Addressing mode

                        Function arguments            On stack, above frame pointer                    Dynamic, function scope     $fp + positive offset
                        Automatic variables           On stack, below frame pointer                    Dynamic, function scope     $fp + negative offset
                        Dynamically allocated         On the heap                                      Dynamic, malloc(), free()   GPR + offset
                        Global and static variables   .bss section                                     Static, read or write       $gp + signed offset
                        Embedded constants            Often in a constant pool in the .text section    Static, read-only           $pc + signed offset


          Each category of data, whether a function argument or an automatic variable, is allocated in
          a different way, and is therefore accessed in a different way. There are well-defined regions,
          such as the stack, the heap and the global data area. Each may have its own pointer (e.g. $sp,
          $gp) or may be accessed relative to $pc or a general-purpose register.
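
          The categories can be seen side by side in a few lines of C. The function below is a
          made-up example; the comments say where each object would normally live, although an
          optimising compiler is free to keep arguments and automatic variables in registers, and
          a small constant such as 1000 usually becomes an immediate operand rather than a
          constant-pool entry.

```c
#include <stdlib.h>

int global_counter;        /* global variable: static storage, zero-initialised */
static int call_count;     /* file-scope static: also static storage */

int categories_demo(int arg)             /* arg: function argument */
{
    int local = arg + 1;                 /* automatic variable */
    int *heap = malloc(sizeof *heap);    /* dynamically allocated, on the heap */
    int result;

    if (heap == NULL)
        return -1;
    *heap = local * 1000;                /* 1000 is an embedded constant */
    call_count++;
    global_counter += *heap;
    result = global_counter + call_count;
    free(heap);
    return result;
}
```

          Compiling this with gcc -S shows a different addressing pattern for each kind of access.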


                      Addressing Mode Frequency

                     Frequency of the addressing mode (%), spice (H&P Fig. 2.7)

                         Addressing mode      Frequency (%)

                         Displacement         55
                         Immediate            17
                         Register             3
                         Scaled               16
                         Indirect             6

                     Bottom-line: few addressing modes account for most of the
                      instructions in programs


          In practice, compilers usually convert complex address calculations into unsigned integer
          computations and then use very simple addressing modes based on computed addresses.
          Many memory references are to variables located on the stack. These always use [sp + offset]
          addressing modes, making the Displacement mode one of the most common.
          Try compiling a simple piece of C code into assembler and look at the addressing modes obtained
          for each variable accessed by the code.

          Hint:   gcc -S foo.c
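
          As an illustration of a complex address calculation reduced to integer arithmetic, the
          sketch below computes the address of a[i][j] explicitly as base + (i * COLS + j) * size.
          The array shape and the function name are assumptions made for this example only.

```c
#include <stddef.h>
#include <stdint.h>

enum { ROWS = 4, COLS = 6 };   /* array shape assumed for this example */

/* Address of a[i][j], computed with plain unsigned integer arithmetic. */
uintptr_t element_address(int (*a)[COLS], size_t i, size_t j)
{
    return (uintptr_t)a + (i * COLS + j) * sizeof(int);
}
```

          Once the address is in a register, the load or store itself only needs a simple
          register + offset (displacement) addressing mode.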


                    Displacement Addressing and Data Classification

                     Stack pointer and Frame pointer relative
                         – Compiler can often eliminate frame pointer
                         – Function must not call alloca()
                         – 5 to 10 bits of offset is sufficient in most cases

                     Register + offset
                         – Generic form for accessing via pointers
                         – Multi-dimensional arrays require address calculations

                     PC relative addresses
                         – Useful for locating commonly-used constants in a pool of
                             constants located in the .text section


          Exercise: add a call to alloca() in both foo() and bar() to see the effect on how the code gets
          compiled. Try “man alloca” if unsure how to use it.
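
          One possible starting point for this exercise is sketched below. It assumes a system
          that provides the non-standard <alloca.h> header (glibc does); the sizes passed to
          alloca() are arbitrary choices for illustration.

```c
#include <alloca.h>   /* non-standard header; assumed to be available */
#include <string.h>

int bar(int n)
{
    size_t count = (size_t)(n > 0 ? n : 1);
    /* alloca() extends the current stack frame by an amount known only at run time */
    int *scratch = alloca(count * sizeof(int));
    int a = n + 1, b = n - 1;

    memset(scratch, 0, count * sizeof(int));
    scratch[0] = a * b;
    return scratch[0];
}

int foo(int i)
{
    char *buf = alloca((size_t)(i > 0 ? i : 1));

    buf[0] = 0;
    return bar(i) + buf[0];
}
```

          Because the frame size is no longer a compile-time constant, the compiler generally has
          to keep a frame pointer in these functions.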


                    Floating point arithmetic

                       Usually based on the IEEE 754 floating point standard
                       Useful when greater range of number is required
                         –   Integer: -2^(m-1) .. +2^(m-1) - 1
                         –   Floating point:
                                                    Binary                          Decimal
                               Single precision       ± (2 - 2^-23) x 2^127         ~ ± 10^38.53
                               Double precision       ± (2 - 2^-52) x 2^1023        ~ ± 10^308.25

                       See Hennessy & Patterson appendix for details of formats and operations
                         –   Set aside an hour to read their appendix and become familiar with the overall
                             structure of the FP standard (don’t memorise details – you can always refer
                             back to the standard if you ever need to use it)

                       Key points for instruction sets:
                         –   Integer and Floating Point never mixed in same operation
                         –   Separate register sets for integer and FP operations are therefore common
                         –   Floating point operations often optional or omitted from embedded processors
                         –   Other ways to represent fractional values, e.g. fixed-point types


          Follow the suggested reading in Hennessy and Patterson given on the slide. Make
          summary notes here.
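
          The largest finite values quoted on the slide, (2 - 2^-23) x 2^127 for single precision
          and (2 - 2^-52) x 2^1023 for double precision, can be checked against the constants in
          <float.h>. The helper names below are made up for this sketch, and it assumes the
          platform's float and double are the IEEE 754 single and double formats.

```c
#include <float.h>

/* 2 raised to the integer power e, by repeated multiplication or division. */
static double pow2(int e)
{
    double r = 1.0;
    for (; e > 0; e--)
        r *= 2.0;
    for (; e < 0; e++)
        r /= 2.0;
    return r;
}

/* Largest finite single-precision value, computed in double precision. */
double largest_single(void)
{
    return (2.0 - pow2(-23)) * pow2(127);
}

/* Largest finite double-precision value. */
double largest_double(void)
{
    return (2.0 - pow2(-52)) * pow2(1023);
}
```

          On such a platform largest_single() equals FLT_MAX and largest_double() equals DBL_MAX.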


                    Encoding the Instruction Set

                     How many bits per instruction?
                        – Fixed-length 32-bit RISC encoding
                        – Variable-length encoding (e.g. Intel x86)
                        – Compact 16-bit RISC encodings
                              ARM Thumb
                              MIPS16
                              ARCompact

                     Formats define instruction groups with a common set of operands


          An instruction format defines a set of operands that are used in common by a group of
          instructions. An instruction set is simply a collection of formats and the operations defined
          for each format.


                    Design considerations for ISA encoding

                     How compact is the encoding?
                     Is the encoding orthogonal?
                     How easy is it to extract operands unambiguously?
                        – Register specifiers should be aligned in all formats (ideally)
                        – Implicitly defined registers will complicate decode
                        – How are the literals aligned and/or extended?
                     Are control transfers easily identifiable?
                        – If not, slow decoding of branches may increase CPI

                     Op-code assignment:
                        – Minimise Hamming distance between codes that perform
                          similar operations.
                        – Leads to simpler and faster decode logic


          If you don’t know what Hamming distance is, see page 193 of Andrew Tanenbaum,
          Computer Networks, 4th edition (a standard text in communications). A google search will
          also find the definition. Think about why this is useful in instruction set design, and then
          make notes here as a reminder.
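
          As a reminder, the Hamming distance between two codes is the number of bit positions in
          which they differ. A minimal C sketch follows; the function name is chosen for this
          example. The MIPS function codes for add (0x20) and sub (0x22) differ in a single bit,
          which is the kind of op-code assignment the slide recommends.

```c
#include <stdint.h>

/* Number of bit positions in which two op-codes differ. */
unsigned hamming_distance(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;   /* a 1 bit wherever a and b differ */
    unsigned count = 0;

    while (diff != 0) {
        diff &= diff - 1;    /* clear the lowest set bit */
        count++;
    }
    return count;
}
```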


                    MIPS 32-bit Instruction Formats

                     R-type (register to register)
                        – three register operands
                        – most arithmetic, logical and shift instructions

                     I-type (register with immediate)
                        – instructions which use two registers and a constant
                        – arithmetic/logical with immediate operand
                        – load and store
                        – branch instructions with relative branch distance

                     J-type (jump)
                        – jump instructions with a 26 bit address


          At this point you will find it helpful to read Appendix B from Hennessy and Patterson (4/e),
          in particular “Putting it all together: The MIPS Architecture”, p.B-32.
          Appendix B is all about ISA design issues, using the MIPS architecture as a teaching example.


                    MIPS R-type instruction format

                        Fields (bits):                        op (6)     rs (5)   rt (5)   rd (5)   shamt (5)   funct (6)

                        add     $1, $2, $3                    special    $2       $3       $1       0           add

                        sll     $4, $5, 16                    special    0        $5       $4       16          sll


          Make your own list of instructions that follow this format.
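
          The R-type layout is op (6 bits, zero for the "special" group), rs (5), rt (5), rd (5),
          shamt (5) and funct (6). A small encoder sketch makes the field positions concrete; the
          function name is hypothetical, and the funct values used in the usage note (0x20 for
          add, 0x00 for sll) are the standard MIPS32 ones.

```c
#include <stdint.h>

/* Encode an R-type instruction; the op field is 0 ("special"). */
uint32_t encode_r_type(unsigned rs, unsigned rt, unsigned rd,
                       unsigned shamt, unsigned funct)
{
    return ((uint32_t)(rs & 0x1f) << 21) |
           ((uint32_t)(rt & 0x1f) << 16) |
           ((uint32_t)(rd & 0x1f) << 11) |
           ((uint32_t)(shamt & 0x1f) << 6) |
           (uint32_t)(funct & 0x3f);
}
```

          With these values, add $1, $2, $3 is encode_r_type(2, 3, 1, 0, 0x20) = 0x00430820 and
          sll $4, $5, 16 is encode_r_type(0, 5, 4, 16, 0x00) = 0x00052400.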


                    MIPS I-type instruction format

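                      Fields (bits):                         op (6)       rs (5)     rt (5)   immediate (16)

                      lw     $1, offset($2)                  lw           $2         $1       address offset

                      beq    $4, $5, .L001                   beq          $4         $5       (.L001 - (PC + 4)) >> 2

                      addi   $1, $2, -10                     addi         $2         $1       0xfff6

                      (PC here is the address of the beq instruction itself)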


          Find more examples of instructions that follow this format and write them here.
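
          The I-type layout is op (6 bits), rs (5), rt (5) and a 16-bit immediate. The sketch
          below encodes such an instruction and computes a branch displacement, which MIPS
          measures in words from the instruction following the branch. Function names are
          hypothetical; the op-codes in the usage note (0x08 addi, 0x23 lw, 0x04 beq) are the
          standard MIPS32 values.

```c
#include <stdint.h>

/* Encode an I-type instruction; imm is truncated to its low 16 bits. */
uint32_t encode_i_type(unsigned op, unsigned rs, unsigned rt, int32_t imm)
{
    return ((uint32_t)(op & 0x3f) << 26) |
           ((uint32_t)(rs & 0x1f) << 21) |
           ((uint32_t)(rt & 0x1f) << 16) |
           ((uint32_t)imm & 0xffffu);
}

/* Branch displacement in words, relative to the instruction after the branch. */
int32_t branch_offset(uint32_t branch_pc, uint32_t target)
{
    return (int32_t)(target - (branch_pc + 4)) / 4;
}
```

          For example, addi $1, $2, -10 is encode_i_type(0x08, 2, 1, -10) = 0x2041fff6, and a
          branch at address 0x100 to a label at 0x110 has a displacement of 3 words.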


                    MIPS J-type instruction format

                      Fields (bits):                          op (6)       target (26)

                      jal func                                jal          (absolute func address >> 2), low 26 bits


          Again, find other examples of MIPS instructions that use this format.
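
          The J-type layout is a 6-bit op-code followed by a 26-bit target field holding the word
          address of the destination. A minimal encoder sketch follows; the function name is
          hypothetical and the op-codes mentioned (0x02 for j, 0x03 for jal) are the standard
          MIPS32 values.

```c
#include <stdint.h>

/* Encode a J-type instruction from an op-code and a byte address. */
uint32_t encode_j_type(unsigned op, uint32_t target_address)
{
    /* the target is a word address, so drop the two low-order zero bits */
    return ((uint32_t)(op & 0x3f) << 26) |
           ((target_address >> 2) & 0x03ffffffu);
}
```

          A jal to address 0x00400020 is encode_j_type(0x03, 0x00400020) = 0x0c100008. The top
          four address bits do not fit in the field; the hardware takes them from the PC.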


                    Code density optimisations

                     Prologue and Epilogue

                     Constant pools and PC relative loads

                     2-register formats

                     Restricted register sets

                     Non-orthogonality and implicit register operands


          Read section B.10, “Fallacies and Pitfalls”, on page B-39 of Hennessy & Patterson. Make
          brief notes here to remind you of the main points.



                             Instruction Set   Instruction            GP registers              Special Features
                             Architecture      Size

                             MIPS16            16 bit                 8                         Some special ABI registers still
                             ARM Thumb         16 bit                 8                         push and pop for stack frame support
                             ARCompact         Mixed 16 and 32 bit    8 direct, 32 available    Freely-mixed compact and 32-bit;
                                                                                                long-immediate data


          Most 32-bit architectures used in embedded systems have acquired a subset that is encoded
          in 16 bits. These instructions still operate on 32-bit data, but are encoded more compactly.
          Generally speaking they all use two register operands rather than three, and also restrict the
          number of general-purpose registers to 8. The ARCompact instruction set allows free
          mixing of the original 32-bit instructions and the compact 16-bit instructions. This is not
          permitted in ARM Thumb or MIPS16, where each function must be compiled into either the 32-bit
          or the 16-bit instruction set. Recently, ARM introduced the Thumb-2 instruction set, which
          removes that restriction.


                    ARM Thumb Push and Pop instructions

                       Particularly effective for encoding function entry and exit code in
                        a compact form.
                       Operand is a bit vector, with each bit specifying whether one of
                        the callee saved registers should be pushed or popped.
                       Push may also save the link register (equiv. to MIPS $ra)
                       Pop may then pop that value directly into PC, causing the
                        function to return to the caller.
                       E.g.
                               push { r4, r5, r6, r7, lr }
                               pop { r4, r5, r6, r7, pc }
                       These are multi-cycle operations, performing up to 5 memory
                        reads or writes.
                       Complex to implement, but highly effective in terms of code size
                         – Prologue and epilogue can account for 10-15% of the code space
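
                       The bit-vector operand can be made concrete with a short C sketch of the
                       16-bit Thumb PUSH and POP encodings: bits 0-7 select r0-r7 and bit 8 adds
                       lr (for push) or pc (for pop). The function names are made up for this
                       example.

```c
#include <stdint.h>

/* Thumb PUSH: bits 0-7 select r0-r7; bit 8 additionally pushes lr. */
uint16_t thumb_push(unsigned low_regs, int with_lr)
{
    return (uint16_t)(0xB400u | (with_lr ? 0x0100u : 0u) | (low_regs & 0xFFu));
}

/* Thumb POP: bits 0-7 select r0-r7; bit 8 additionally pops into pc. */
uint16_t thumb_pop(unsigned low_regs, int with_pc)
{
    return (uint16_t)(0xBC00u | (with_pc ? 0x0100u : 0u) | (low_regs & 0xFFu));
}
```

                       With the register mask 0xF0 (r4-r7), push { r4, r5, r6, r7, lr } encodes as
                       0xB5F0 and pop { r4, r5, r6, r7, pc } as 0xBDF0, one halfword each.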


          Try to find other Instruction Set Architectures that support multi-register move operations.
          List them here:


                     Instruction Frequency

                                     80x86 instruction              Fraction (%)

                                     load                           22
                                     conditional branch             20
                                     compare                        16
                                     store                          12
                                     add                            8
                                     and                            6
                                     sub                            5
                                     move register-register         4
                                     call                           1
                                     return                         1
                                     Total                          96

                                     Source: H&P Fig. 2.16

                    Bottom-line: few instruction types account for most of the
                     instructions executed


          Bear in mind that each architecture is different, but that in general the frequencies shown above
          are representative of typical desktop applications.
          Embedded applications often see increasing frequencies of signal processing operations,
          especially 16-bit multiplications.


                     IS and Performance

                             Implementation                   ISA                  Compiler


                    ISA → Implementation: cycle time, pipelining, CPI, instruction length
                    ISA → Compiler: instruction scheduling, code motion, branch
                     optimizations, code generation, code size, register allocation
                    Implementation → instruction delays, register allocation, functional units


          This slide summarises the relationship between ISA and Compiler, and ISA and Implementation.


                   IS Guidelines

                  Regularity: operations, data types, addressing modes, and
                   registers should be independent (orthogonal)

                  Primitives, not solutions: do not attempt to match HLL
                   constructs with special IS instructions

                  Simplify tradeoffs: make it easy for compiler to make choices
                   based on estimated performance

                  Trust compiler: provide compiler with instructions and
                   primitives that exploit knowledge at compile-time


          Instruction Sets can vary enormously from one architecture to another. However, within the set of
          all RISC architectures there are actually few substantial differences.
          It is also worth noting that the number of distinct desktop architectures has been decreasing year
          on year. In 2007 most new desktop systems shipped will have x86 processors. In the server space
          one can still find Sun SPARC and IBM PowerPC architectures.
          The embedded computing domain has a much greater diversity of architectures. Can you think
          why this might be?


                   Improving CPU Performance (H&P 2.11; A.1; A.3)

                    CPU performance can be computed by the “CPU
                     performance equation”: CPU time = IC x CPI x clock period,
                     where IC is the instruction count and CPI is the average
                     number of clock cycles per instruction

                    To reduce CPU time: ↓ IC; ↓ clock period; ↓ CPI

                    ISA influences implementation, compiler optimizations, and
                     therefore performance

                    ISA must be an easy compiler target

                    No need to provide too many and too complex instructions

                    Compiler has a significant role in improving performance

          Essentially, to improve CPU time we must reduce one of the three primary contributors, or else issue
          more than one instruction per cycle (or both!)
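
          The CPU performance equation is easy to experiment with in code. The sketch below uses a
          hypothetical function name and takes the clock period in seconds; the figures in the
          usage note are invented purely to show the arithmetic.

```c
/* CPU time in seconds = instruction count x cycles per instruction x clock period. */
double cpu_time(double instruction_count, double cpi, double clock_period_s)
{
    return instruction_count * cpi * clock_period_s;
}
```

          For instance, 10^9 instructions at a CPI of 2 with a 1 ns clock period take about 2
          seconds; halving the CPI, the instruction count or the clock period each brings this
          down to about 1 second.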


                     Program Structure: Basic-Blocks (BB)

                    Definition: straight-line code with single entry and single exit
                    Boundaries:
                      – Branches and jumps
                      – Calls and returns
                      – Targets of branches, jumps, calls, and returns

                         lw          r2,0(r1)            # BB1
                         lw          r3,4(r1)
                         addi        r3,r3,n
                         bne         r2,r3,Label2
                 Label1: lw          r4,8(r1)            # BB2
                         sub         r2,r2,m
                         beq         r2,r0,Label1
                 Label2: add         r1,r1,r3            # BB3

                 Control flow: BB1 -> BB2, BB1 -> BB3, BB2 -> BB2, BB2 -> BB3


          Note: not all basic blocks are preceded by a branch. Contrive an example instruction sequence to
          illustrate this point here:
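
          The boundary rules on the slide amount to finding "leaders": the first instruction, every
          target of a control transfer, and every instruction that follows a control transfer. The
          sketch below applies these rules to a simplified instruction record. The struct and
          function names are hypothetical, and branch delay slots are ignored.

```c
#include <stddef.h>

/* Hypothetical instruction record: only what is needed to find block boundaries. */
struct insn {
    int is_transfer;   /* non-zero for branches, jumps, calls and returns */
    int target;        /* index of the target instruction, or -1 if none */
};

/* Sets leader[i] = 1 where a basic block starts; returns the number of blocks. */
size_t find_leaders(const struct insn *code, size_t n, int *leader)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++)
        leader[i] = 0;
    if (n > 0)
        leader[0] = 1;                       /* first instruction */
    for (i = 0; i < n; i++) {
        if (!code[i].is_transfer)
            continue;
        if (code[i].target >= 0 && (size_t)code[i].target < n)
            leader[code[i].target] = 1;      /* target of a transfer */
        if (i + 1 < n)
            leader[i + 1] = 1;               /* instruction after a transfer */
    }
    for (i = 0; i < n; i++)
        count += (size_t)leader[i];
    return count;
}
```

          Applied to the eight instructions on the slide (bne at index 3 targeting index 7, beq at
          index 6 targeting index 4), the leaders are indices 0, 4 and 7, giving three blocks.
          Note that a block starting at a branch target can be entered by falling through from the
          previous instruction, so it need not be preceded by a branch.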


                    Structure of Modern Compilers

              Stage (top to bottom)         Dependences                         Function

              HLL code (input)
              Front end                     Language dependent;                 Generate intermediate
                                            machine independent                 representation
              High-level optimizations      Somewhat language independent;      Procedure inlining;
                                            largely machine independent         loop transformations
              Optimized IR
              Global optimizer              Mostly language independent;        Global + local optimizations;
                                            mostly machine independent          register allocation
              Code generator                Language independent;               Instruction selection;
                                            machine dependent                   scheduling
              Machine code (output)


          If you are taking a compiler course this year, these optimisations will be familiar. If not, you need
          to be at least aware of:
          1. The difference between global and local optimisations
          2. Machine dependent and machine independent optimisations
          If you need help with understanding the role of compilers, read section B.8, “Crosscutting Issues:
          The Role of Compilers”, in H&P (4/e) on page B-24


                     Compiler Optimizations

                    High-level: at HLL source
                      – Procedure inlining
                    Local: within basic-block (BB)
                      – Common sub-expression elimination
                      – Constant propagation
                      – Stack height reduction
                    Global: across BB’s
                      –   Global common sub-expression elimination
                      –   Copy propagation
                      –   Code motion
                      –   Induction variable elimination
                    Machine-dependent
                      – Strength reduction
                      – Pipeline scheduling
                      – Branch offset optimization


          This slide summarises the essential concepts. A little reading around the subject and
          supplementary note-taking will help with revision.
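
          As a revision aid, the pair of functions below shows by hand what three of these
          optimisations do to a small loop: common sub-expression elimination, code motion and
          strength reduction. The functions are invented for illustration and a real compiler may
          transform the code differently.

```c
/* Before: x * y is computed twice per iteration and i * 4 needs a multiply. */
int sum_before(int x, int y, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += (x * y) + (x * y) + i * 4;
    return total;
}

/* After: x * y is computed once outside the loop (common sub-expression
 * elimination plus code motion) and i * 4 becomes a running addition
 * (strength reduction). */
int sum_after(int x, int y, int n)
{
    int total = 0;
    int xy = x * y;
    int step = 0;                 /* always equal to i * 4 */
    for (int i = 0; i < n; i++) {
        total += xy + xy + step;
        step += 4;
    }
    return total;
}
```

          Both versions return the same value for any inputs that do not overflow; only the work
          done inside the loop differs.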
