Register Allocation by chenshu


									                    Register Allocation




Btw, there are lots of examples, but I will probably forget to stop and let
everyone digest or ask questions. Please feel free to stop me if I do.
                 Overview
•   Base case algorithm
•   Optimization goals
•   Improvements at basic block level
•   Improvements at function level
•   Practicalities
•   Going further
             Register allocation
• Registers hold values
   – Sometimes only certain types
• Used for the input and output of instructions
   – Exclusively, in the case of RISC architectures
• There are not many
   – Generally fewer for CISCs than RISCs
• Need to swap values in and out of memory
   – “Fills” and “spills”
• Optimization problem: minimize # fills, spills
   – Hard (in the human and theoretical sense)
              Pseudo-registers
• Isolate the complexity of registers with p-regs
  – Postpone decision of what to put in registers
• The idea: pretend you have infinite registers
  – Simplifies AST → IR
       add $1001, $999, $1000
       mul $1002, $1001, $1000

  – Simplifies high-level IR optimizations
• Infinite, but not dynamically addressable
  – Each use refers to one static p-reg, no indirection
  – Contrast with “the store”
      • Allows loads from dynamic addresses: lw $2, 4($1)
                 pseudo-registers       the store

                 ...                    0xffff0
lw $42, 4($41)   $41   0xffff0          0xffff4      7
                 $42     7              0xffff8
                 ...                    0xffffc
High-level organization

  Lex, parse, check,
     IR codegen

                        Use p-regs. Put
     Optimization       as much work as
                         possible here…

  Register allocation


       Low level
     optimization /     … and not here
     ISA codegen
               Pseudo-registers
• After high-level optimization, IR → ISA
   – No more pseudo-registers
• The idea:
   – Not enough registers: “spill” to memory
   – Need spilled contents: “fill” from memory
• Important distinction:
   – Load/stores (IR) vs. fills/spills (Regalloc)
   – For example:
      • [$1] := $2
          – IR: 1 store, 2 reads
          – Regalloc: 1 store, 0-2 fills
      • $2 := [$1]
          – IR: 1 load, 1 read, 1 write
          – Regalloc: 1 load, 0-1 fills, 0-1 spills
            Simplest approach
• Give every pseudo-register a home
• The home is in memory, but separate from the
  store
• Keep things in registers for as little time as
  possible
   – Every write to a p-reg spills
   – Every read from a p-reg fills

• Efficient?
   – Extremely
   – Just kidding, this is the worst possible approach
                       Example
 Source          IR                  Code generated

 A[x] = 3        mul p, x, 4         lw $t1, x($fp)
                 add q, p, A         muli $t2, $t1, #4
                 [q] := 3            sw $t2, p($fp)
                                     lw $t1, p($fp)
                                     lw $t2, A($fp)
                                     add $t3, $t1, $t2
                                     sw $t3, q($fp)
                                     ldi $t1, 3
                                     lw $t2, q($fp)
                                     sw $t1, 0($t2)

But on the bright side, there only
need be as many registers as there
are operands.
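
The base case can be sketched in a few lines of Python. This is only an illustration, assuming a hypothetical three-address IR of `(op, dest, src1, src2)` tuples in which every operand is a p-reg whose home is addressed by name off `$fp`, as in the example above. The function name `naive_codegen` is made up for this sketch.

```python
def naive_codegen(ir):
    """Emit MIPS-like text where every read fills and every write spills."""
    out = []
    for op, dest, *srcs in ir:
        temps = []
        for i, src in enumerate(srcs):
            t = f"$t{i + 1}"
            out.append(f"lw {t}, {src}($fp)")   # fill: read the p-reg from its home
            temps.append(t)
        d = f"$t{len(srcs) + 1}"
        out.append(f"{op} {d}, {', '.join(temps)}")
        out.append(f"sw {d}, {dest}($fp)")      # spill: write the p-reg to its home
    return out


code = naive_codegen([("add", "q", "p", "A")])
print("\n".join(code))
```

Only as many temporaries as there are operands are ever in use, which is the one advantage noted above.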
         Optimization goals
• Clearly room for improvement
• Want to minimize fills and spills
• Same constraint as any optimization:
  – Preserve the observable behavior
  – That means for any execution path
            Basic block level
• Keep track of what p-reg is in what register
• Avoid obviously redundant fill/fill, spill/fill
• Handle multiple incoming paths to BB
  entry:
  – Reset records at beginning of BB
• Handle multiple paths from BB to exit:
  – Spill every p-reg currently in a register
• Described in more detail in dragon book
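
A minimal sketch of the basic-block bookkeeping, under simplifying assumptions: the same hypothetical `(op, dest, src1, src2)` IR as before, enough physical registers for the whole block (no eviction), and a spill at block exit only for p-regs that were written, since the others still match their homes. `allocate_block` is an illustrative name, not something from the dragon book.

```python
def allocate_block(block, regs):
    """Local allocation for one basic block; records start empty at block entry."""
    where = {}          # p-reg -> physical register currently holding it
    dirty = set()       # p-regs written in this block (home is stale)
    free = list(regs)
    out = []

    def reg_for(preg, fill):
        if preg not in where:
            where[preg] = free.pop(0)   # assumption: the block never runs out
            if fill:
                out.append(f"lw {where[preg]}, {preg}($fp)")
        return where[preg]              # already resident: no redundant fill

    for op, dest, *srcs in block:
        src_regs = [reg_for(s, fill=True) for s in srcs]
        d = reg_for(dest, fill=False)
        out.append(f"{op} {d}, {', '.join(src_regs)}")
        dirty.add(dest)

    for preg in sorted(dirty):          # block exit: write modified p-regs home
        out.append(f"sw {where[preg]}, {preg}($fp)")
    return out


block = [("add", "p", "x", "y"), ("add", "q", "p", "x")]
out = allocate_block(block, ["$t1", "$t2", "$t3", "$t4"])
print("\n".join(out))
```

For this block the sketch emits two fills and two spills, where the simplest approach would emit four fills and two spills.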
                Function level
• Global register allocation
  – Old term, “global” means intra-procedural
• Minor improvement:
  – Use dataflow analysis
     • Live variables
  – Avoid useless spills:
     • Do not spill if not live
        – At end of BB
        – When spilling to make room
             Function level
• ...and now the big algorithm
  – Major improvement
  – Same idea at the heart of modern compilers
• Actually, two approaches:
  – Top-down
  – Bottom-up (a.k.a. graph coloring)
      Global register allocation
• Top-down register allocation [Chow, 84]
   – “Use high-level information to make allocation
     decisions.”
      • Priority-function determines ordering
   – More pessimistic: assumes nothing live at the start
   – More conservative: coarser definition of interference
   – [Briggs, 92] found O(n log n) for bottom-up and O(n^2) for
     top-down
   – Research appears to favor bottom-up (# papers)
   – Industry too?
      Global register allocation
• Bottom-up register allocation [Chaitin, 81][Briggs, 94]
   – Step 0: dataflow analyses
   – Step 1: build webs
   – Step 2: build interference graph
   – Step 3: coalesce
   – Step 4: compute spill costs
   – Step 5: color
   – Step 6: spill
    Step 0: dataflow analyses
1. Given: IR
  – Build the CFG
2. Find reaching definitions
3. Find live variables
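
Both analyses are standard iterative dataflow problems. The sketch below assumes a toy CFG with one statement per node, given as `nodes = {n: (defs, uses)}` and `succ = {n: [successors]}`. These structures and function names are invented for illustration.

```python
def live_variables(nodes, succ):
    """Backward analysis. Returns (live_in, live_out) per node."""
    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n, (defs, uses) in nodes.items():
            out = set().union(*(live_in[s] for s in succ[n]))
            inn = uses | (out - defs)
            if out != live_out[n] or inn != live_in[n]:
                live_out[n], live_in[n] = out, inn
                changed = True
    return live_in, live_out


def reaching_definitions(nodes, succ):
    """Forward analysis. A definition is (node, var); returns the set reaching each node's entry."""
    pred = {n: [] for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].append(n)
    reach_in = {n: set() for n in nodes}
    reach_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n, (defs, _uses) in nodes.items():
            inn = set().union(*(reach_out[p] for p in pred[n]))
            # a new definition of v kills every other definition of v
            out = {(n, v) for v in defs} | {(m, v) for (m, v) in inn if v not in defs}
            if inn != reach_in[n] or out != reach_out[n]:
                reach_in[n], reach_out[n] = inn, out
                changed = True
    return reach_in


# 1: def x -> 2: use x -> {3: def x, 4: use x}; 3 -> 4
nodes = {1: ({"x"}, set()), 2: (set(), {"x"}), 3: ({"x"}, set()), 4: (set(), {"x"})}
succ = {1: [2], 2: [3, 4], 3: [4], 4: []}
live_in, live_out = live_variables(nodes, succ)
reach_in = reaching_definitions(nodes, succ)
```

In this toy CFG the use at node 4 is reached by the definitions at nodes 1 and 3, which is exactly the situation that later forces those definitions into one web.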
      Step 0: dataflow analyses
[Figure: example CFG with several defs and uses of x, shown three times:
 unannotated, labeled "Live variables", and labeled "Reaching defs"]
           Step 1: build webs
• A web is:
  – a set of statements whose definitions and
    uses of a given pseudo-register must share a
    physical register
• (In the classic approach) all or nothing:
  – All reads/writes fill/spill or none do


   [Diagram: pseudo-register 1 : * web * : 1 physical register]
            Step 1: build webs
Web building approach:
• Initially:
    – Each use and def points to a web containing
      only itself

•   For each statement:
    – For each use U of a p-reg in the statement:
      •   For each reaching definition D (from step 0) :
          – Merge D’s and U’s webs
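
The merging step maps naturally onto a union-find structure. The sketch below assumes the reaching definitions from step 0 are available as `reaching[(stmt, var)]`, the set of statements whose definition of `var` reaches that use. The input shapes and the name `build_webs` are assumptions made for illustration.

```python
def build_webs(uses, reaching):
    """uses: list of (stmt, var). Returns a list of webs (sets of def/use items)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)          # initially each item is its own web
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for stmt, var in uses:
        u = ("use", stmt, var)
        find(u)
        for d in reaching[(stmt, var)]:
            union(("def", d, var), u)    # merge D's and U's webs

    webs = {}
    for item in list(parent):
        webs.setdefault(find(item), set()).add(item)
    return list(webs.values())


uses = [(2, "x"), (4, "x"), (6, "x")]
reaching = {(2, "x"): {1}, (4, "x"): {1, 3}, (6, "x"): {5}}
webs = build_webs(uses, reaching)
```

Here the defs at statements 1 and 3 end up in one web because both reach the use at statement 4, while the def at 5 and the use at 6 form a second, independent web for the same p-reg. Definitions with no uses never appear in `uses` and so get no web in this sketch.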
                  Step 1: build webs
[Figure: the defs and uses of x in the example CFG, grouped into webs]
     Step 2: interference graph
• An interference graph is a graph where:
  – Nodes are webs (step 1)
  – Edges connect webs that cannot occupy the same
    physical register

• Overly conservative approach
  – Two webs interfere if they are both live at any
    statement
• Better:
  – Two webs interfere if one is live at the other’s
    definition
        Step 2: interference graph
Interference graph building approach:
• For each web W:
    –   For each defining statement S in W:
        •   For each reaching and live (step 0) definition D at
            statement S that is not in W:
            –   W interferes with the web containing D

Store results as both:
• Triangular adjacency matrix:
    –   Efficient form for coalescing step
•   Adjacency list:
    –   Efficient form for coloring step
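
A sketch of the construction, using the "live at the other's definition" rule and storing the result in both forms. It assumes step 0 has been summarized as `live_after[stmt]`, the set of webs live just after each defining statement. That summary and the function name are hypothetical.

```python
def build_interference(web_defs, live_after):
    """web_defs: {web: [defining stmts]}. Returns (triangular matrix, adjacency sets)."""
    webs = sorted(web_defs)
    index = {w: i for i, w in enumerate(webs)}
    # lower-triangular matrix: row i holds entries for pairs (i, j) with j < i
    matrix = [[False] * i for i in range(len(webs))]
    adj = {w: set() for w in webs}
    for w, stmts in web_defs.items():
        for s in stmts:
            for other in live_after[s]:
                if other == w:
                    continue
                i, j = sorted((index[w], index[other]), reverse=True)
                matrix[i][j] = True      # constant-time "do these interfere?" test
                adj[w].add(other)        # fast neighbor iteration for coloring
                adj[other].add(w)
    return matrix, adj


web_defs = {"x": [1], "y": [2], "z": [4]}
live_after = {1: {"x"}, 2: {"x", "y"}, 4: {"z"}}
matrix, adj = build_interference(web_defs, live_after)
```

In the example, x is still live when y is defined, so the two interfere. z is defined after both are dead and interferes with neither.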
          Step 2: interference graph
[Figure: example CFG with defs and uses of x, y, and z, and the interference
 graph built from it, comparing alive-at-def with alive-at-same-statement]
                 Step 3: coalesce
• Given a copy statement: a := b
• If a’s web and b’s web do not interfere:
   – All uses of a can be replaced with b or vice-versa
       • a or b could be fixed (parameter or return register)
   – Eliminate copy instruction
• Redundant copies often are introduced by optimizations
• Can have a negative effect:
   – Live range is longer → less coloring flexibility → more spilling
   – Optimistic coalescing [Park, 98]
• Changes interference graph
   – Just merging edges is too conservative
   – Need to go back to step 2
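
A rough sketch of the coalescing test. It takes the shortcut the slide warns about: when two webs are merged, their edges are simply unioned, which is safe but too conservative, so a real allocator would rebuild the graph in step 2 instead. The data shapes and the name `coalesce` are assumptions.

```python
def coalesce(copies, adj):
    """copies: list of (dst_web, src_web) for statements dst := src.
    adj: {web: set of interfering webs}, updated in place.
    Returns (alias map of merged webs, copies that must stay)."""
    alias = {}

    def rep(w):
        while w in alias:
            w = alias[w]
        return w

    kept = []
    for dst, src in copies:
        a, b = rep(dst), rep(src)
        if a == b:
            continue                     # already the same web: copy is dead
        if b in adj[a]:
            kept.append((dst, src))      # webs interfere: cannot share a register
            continue
        alias[b] = a                     # replace b with a everywhere
        for n in adj.pop(b):             # conservative merge of b's edges into a
            adj[n].discard(b)
            adj[n].add(a)
            adj[a].add(n)
    return alias, kept


adj = {"a": {"c"}, "b": {"d"}, "c": {"a"}, "d": {"b"}}
alias, kept = coalesce([("a", "b"), ("a", "c")], adj)
```

After the first copy is eliminated, web a inherits b's conflict with d, which illustrates how coalescing lengthens live ranges and reduces coloring flexibility.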
    Step 4: compute spill costs
• Order webs by how expensive it is to spill
• Take into account:
  – Number of uses and defs
  – Loop nesting depth
  – Possibility of rematerialization
• This is a heuristic
  – Cannot generally know branch frequency,
    loop trip count (without profiling data)
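
One possible cost function, as a sketch only. The weight of 10 per loop nesting level is an assumed static estimate, not something the slides specify. Treating a rematerializable web as paying only for its uses is also an assumption, based on its defs needing no store.

```python
def spill_cost(refs, rematerializable=False, loop_weight=10):
    """refs: list of (kind, loop_depth) with kind 'def' or 'use'."""
    cost = 0
    for kind, depth in refs:
        if rematerializable and kind == "def":
            continue                     # value can be recomputed, no store needed
        cost += loop_weight ** depth     # deeper loops are assumed to run more often
    return cost


webs = {
    "w1": [("def", 0), ("use", 2)],              # used inside a doubly nested loop
    "w2": [("def", 0), ("use", 0), ("use", 1)],
}
order = sorted(webs, key=lambda w: spill_cost(webs[w]))   # cheapest to spill first
```

With profiling data, the `loop_weight ** depth` term could be replaced by measured execution counts.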
                      Step 5: color
• Problem: assign physical registers to webs
• Reduces to map maker’s coloring problem:
   – Give each node of the interference graph a color property
       • Color = physical register
   – No two adjacent nodes can have the same color
       • Adjacent → edge between → cannot share a register
   – # available colors = # available registers

• To address ISA restrictions:
   – Register classes:
       • Separate graphs
   – Other:
       • Add a node to the graph for every register
       • Register nodes are fully connected
       • Add an edge between a register and every web that cannot be
         allocated to that register
                 Step 5: color
•   Graph coloring is NP-complete for k >= 3 colors
•   But there are heuristics:
    1. Don’t try to find the minimum:
      – Given k registers, try to use k colors
    2. Just pick the best looking at the time
      – Might not be best overall
•   May not find solution, even if it exists
•   Acceptable and fast
                                Step 5: color
Optimistic graph coloring approach: [Chaitin, 81][Briggs, 94]
•   Initially:
    –     Each node’s degree is the number of adjacent nodes

•   Until every node has been pushed onto the stack:
    –     If there is an unpushed node with degree < k
          •    Choose it
    –     Otherwise
          •    Choose the node with the lowest spill cost (Step 4)   (optimism here)
    –     Lower the degree of the chosen node’s neighbors
    –     Push the chosen node onto the stack
•   For each node popped from the stack (LIFO):
    –     If there is a color not yet assigned to any of its neighbors:
          •    Use that color
    –     Else (optimism failed; cold, hard reality)
          •    Mark as spilled, keep uncolored
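
The simplify and select phases can be sketched as follows, assuming the interference graph is given as adjacency sets and spill costs as a dict. Ties are broken by node name only to keep the sketch deterministic. The function name `color` and its signature are illustrative.

```python
def color(adj, cost, k):
    """Returns (colors, spilled): colors maps node -> 0..k-1."""
    degree = {n: len(adj[n]) for n in adj}
    remaining = set(adj)
    stack = []
    while remaining:
        low = [n for n in remaining if degree[n] < k]
        if low:
            node = min(low)                                  # guaranteed colorable
        else:
            node = min(remaining, key=lambda n: (cost[n], n))  # optimism here
        remaining.remove(node)
        for m in adj[node]:
            if m in remaining:
                degree[m] -= 1
        stack.append(node)

    colors, spilled = {}, set()
    while stack:                                             # LIFO order
        node = stack.pop()
        used = {colors[m] for m in adj[node] if m in colors}
        free = [c for c in range(k) if c not in used]
        if free:
            colors[node] = free[0]
        else:
            spilled.add(node)                                # keep uncolored
    return colors, spilled


# A 4-cycle: every node has degree 2, so with k = 2 no node has degree < k.
square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
colors, spilled = color(square, {n: 1 for n in square}, k=2)
```

The 4-cycle shows why optimism pays off: no node ever has degree below k at the start, yet pushing one anyway still leads to a valid 2-coloring with no spills.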
                        Step 5: color
[Figures: the example interference graph being colored. Coloring is trivial
 if # registers >= # webs. Later figures add some ISA restrictions by adding
 register nodes to the graph.]
                       Step 6: spill
• Maybe do not even need to spill:
   – Rematerialization (should be chosen first, lowest spill cost)
• Better register usage:
   – Insert new load and store instructions
   – Creates new, very short webs
       • Interference graph changed, need restart at Step 2
       • Need to modify spill cost to make new web’s cost = ∞
• Simple approach:
   – Keep a set of registers reserved for filling/spilling
   – Add “spilt” flag to web
   – When emitting an instruction:
       • Load spilt webs of input into reserved registers
       • Execute
       • Use reserved register as destination of spilt web, then store
                  Epilogue: codegen
   • Now we know:
      – For each use/def, its web
      – For each web, whether it spills or not
   • If the web spills:
      – Same as the base case: reads fill, writes spill
   • Otherwise:
      – Just use the web’s register as the operand

   Example: neg $1, $2

   Allocation                            Code generated
   $1 → $s1, $2 spills to 28($sp)        lw $t1, 28($sp)
                                         neg $s1, $t1
   $1 → $s1, $2 → $s2                    neg $s1, $s2
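
The codegen rule, combined with the reserved-register approach from Step 6, can be sketched like this. The maps `reg_of` (colored webs) and `slot_of` (spilt webs and their `$sp` offsets) and the choice of `$t1`/`$t2` as reserved registers are assumptions made for the example.

```python
def emit(op, dest, srcs, reg_of, slot_of, reserved=("$t1", "$t2")):
    """Emit one instruction, filling spilt inputs and spilling a spilt output."""
    out, operands = [], []
    scratch = list(reserved)
    for s in srcs:
        if s in slot_of:                         # spilt web: fill into a reserved register
            r = scratch.pop(0)
            out.append(f"lw {r}, {slot_of[s]}($sp)")
            operands.append(r)
        else:                                    # colored web: use its register directly
            operands.append(reg_of[s])
    if dest in slot_of:
        d = reserved[0]                          # inputs are read before the result is written
        out.append(f"{op} {d}, {', '.join(operands)}")
        out.append(f"sw {d}, {slot_of[dest]}($sp)")
    else:
        out.append(f"{op} {reg_of[dest]}, {', '.join(operands)}")
    return out


spilled_case = emit("neg", "$1", ["$2"], {"$1": "$s1"}, {"$2": 28})
colored_case = emit("neg", "$1", ["$2"], {"$1": "$s1", "$2": "$s2"}, {})
```

The two calls reproduce the two allocations of `neg $1, $2` shown in the example above.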
                       Practicalities
• Webs that span calls:
   – Doesn’t appear to be addressed much [at all?] in literature
   – Caller- and callee-preserved registers
       • If a web spans a call, does it get split in two?
           – Not if it is callee-saved
   – Chicken-egg problem:
       • Splitting a caller-saved web changes the interference graph, makes
         it more colorable, which could change whether this web spans the
         call...
   – Simple heuristic:
       • During allocation: mark webs as either call-spanning or not
       • When picking registers:
           – Prefer caller-saved for non-spanning
           – Prefer callee-saved for spanning
               » If no callee-saved left, use caller-saved and spill/fill
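
The simple heuristic amounts to a preference order when picking a register. A sketch with hypothetical inputs (lists of currently free registers in each class); the second return value flags the fallback case where a call-spanning web lands in a caller-saved register and must be spilled and filled around calls.

```python
def pick_register(spans_call, free_caller_saved, free_callee_saved):
    """Return (register, needs_save_around_calls), or (None, False) if none is free."""
    if spans_call:
        if free_callee_saved:
            return free_callee_saved[0], False
        if free_caller_saved:
            return free_caller_saved[0], True    # spill/fill around each call
    else:
        if free_caller_saved:
            return free_caller_saved[0], False
        if free_callee_saved:
            return free_callee_saved[0], False
    return None, False


choice = pick_register(True, ["$t0"], [])
```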
                 Practical details
• Parameters (passed on the stack) and globals
   – Want to keep in registers
      • Cannot for globals unless:
          – Simple: no calls
          – 10x harder: inter-procedural analysis says it’s ok
   – Need to insert “import” statements
      • Otherwise there will be use statements for a variable with no
        reaching defs; messes up algorithms
   – Where to put the imports?
      • CFG head
          – Makes long webs
          – Especially if variable only used near the end
      • As late as possible
          – Requires an analysis like Partial Redundancy Elimination
               Further optimizations
 • Live-range splitting
    • Create contains graph
    • During coloring, use contains graph to split before resorting to
      spilling
    • Other variations in [Cooper, 04]

  Need to          Bad:            Bad:            Good:
  spill x or y     spill x         spill y         split x

  def x            def,spill x     def x           def x
  use x            fill,use x      use x           use x
  def x            def,spill x     def x           def x
                                                   spill x
  def y            def y           def,spill y     def y
  use y            use y           fill,use y      use y
  def y            def y           def,spill y     def y
                                                   fill x
  use x            fill,use x      use x           use x
  def x            def,spill x     def x           def x
  use x            fill,use x      use x           use x
          Further optimizations
• Stack allocation for fills/spills
   – Goal: minimize stack usage
   – Essentially, it’s the same problem we just solved:
      • P-reg is to register file as home is to stack memory
            Further optimizations
• Alias analysis for heap

    void foo(A *a) {
      int y = 1;
      for (; a->x < 1000; ++a->x)
        y += a->x;
    }

   • a and y have pseudo-registers, so they may be kept live
   • a->x does not have a pseudo-register: it has a dynamic location. Loads and
     stores generated during code generation, even before register allocation.
   • Start by creating indirect pseudo-registers and postponing loads/stores
   • More problems...
            Further optimizations
• Alias analysis for heap

    void foo(A *a, A *a2) {
      int y = 1;
      for (; a->x < 1000; ++a->x) {
        bar();
        y += a->x;
        a2->x *= a->x;
      }
    }

   • Need to generate import of a->x before first use.
       • Only when original program would have. Cannot introduce new memory
         accesses.
   • Aliasing is a problem:
       • Does a2 point to the same object as a?
   • Inter-procedural calls:
       • Does bar() modify the object a points to?
   • Solutions:
       • Easy: very conservative
       • 10x harder: points-to analysis
          Further optimizations
• Interaction with instruction scheduling and
  selection
  – Naïve approach: select, allocate, schedule
  – Not orthogonal
     • Scheduling goal: put as much space as possible between reads and
       writes.
     • Allocation goal: want short live ranges, so put definitions and
       uses close together.
  – Need to balance both interests
  – GCC: select, allocate 1, schedule 1, allocate 2, schedule 2
        Further optimizations
• List of techniques at end of chapter 13 in:

    Keith D. Cooper and Linda Torczon.
        Engineering a Compiler. 2004
                         References
[Briggs, 92]    Preston Briggs. Register Allocation via Graph Coloring, Tech.
                Rept. CRPC-TR92218, Ctr. for Research on Parallel
                Computation, Rice Univ., Houston, TX, Apr. 1992.
[Briggs, 94]    Preston Briggs, Keith D. Cooper, and Linda Torczon.
                Improvements to graph coloring register allocation. ACM
                Transactions on Programming Languages and Systems,
                16(3):428-455, May 1994.
[Chaitin, 81]   Gregory J. Chaitin. Register allocation and spilling via graph
                coloring. United States Patent 4,571,678, February 1986.
[Chow, 84]      Frederick C. Chow and John L. Hennessy. Register allocation
                by priority-based coloring. SIGPLAN Notices, 19(6):222-232,
                June 1984. Proceedings of the ACM SIGPLAN ’84 Symposium
                on Compiler Construction.
[Park, 98]      Jinpyo Park and Soo-Mook Moon. Optimistic register
                coalescing. In Proceedings of the 1998 International
                Conference on Parallel Architecture and Compilation
                Techniques (PACT), pages 196-204, October 1998.

								