Compile time Copy Elimination

Register Allocation and Spilling via Graph Coloring
           G. J. Chaitin
        IBM Research, 1982
     Before the register allocation phase, the
      compiler assumes that there are an unlimited
      number of general purpose registers
     The symbolic registers must be mapped to
      real registers in a way that avoids conflicts
     Symbolic registers that cannot be mapped to
      real registers must be spilled to memory
     We need an algorithm to map registers with
      minimal spilling cost
Paper Overview

   Register allocation overview
   Subsumption algorithm
   Interference graph coloring algorithm
   Spilling algorithm
Register Allocation Steps
  1.   Determine which registers are live at any
       point in the intermediate language (IL)
  2.   Build a register interference graph
          Nodes represent symbolic registers
          Edges represent a conflict between symbolic
          registers
  3.   Subsumption: eliminate unnecessary
       register copies
  4.   Find a 32-coloring of the interference graph
  5.   Decide which registers to spill if necessary
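
Steps 1 and 2 can be sketched for straight-line code as below. This is a minimal illustration, not the paper's implementation: the function name `build_interference_graph` and the `(dest, sources)` instruction format are assumptions made for the example. It ignores control flow, registers that are live on entry, and any special treatment of copy instructions.

```python
def build_interference_graph(instructions, live_out=()):
    """Return {register: set of interfering registers} for straight-line code.

    instructions: list of (dest, sources) pairs; dest may be None for a pure use.
    live_out: registers still live after the last instruction.
    (Hypothetical format, chosen only for this sketch.)
    """
    graph = {reg: set() for reg in live_out}
    live = set(live_out)
    # Walk backwards so that `live` always holds the registers that are
    # live immediately after the instruction being visited.
    for dest, sources in reversed(instructions):
        if dest is not None:
            graph.setdefault(dest, set())
            # A newly defined register conflicts with everything live after it.
            for other in live - {dest}:
                graph[dest].add(other)
                graph.setdefault(other, set()).add(dest)
            live.discard(dest)
        for src in sources:
            graph.setdefault(src, set())
            live.add(src)
    return graph


# A = 1; B = A; B = B + 1; C = B; D = A, with C and D used afterwards.
code = [("A", []), ("B", ["A"]), ("B", ["B"]), ("C", ["B"]), ("D", ["A"])]
graph = build_interference_graph(code, live_out=["C", "D"])
```

In this run A conflicts with B and C, and C conflicts with D, while the pairs A/D and B/C share no edge.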

     If the source and destination of a register
      copy do not interfere, they may be coalesced
      into a single node
     For each register copy in IL, determine
      whether the registers interfere
     If not, coalesce the two nodes into one
     After first pass, rewrite IL code
     Repeat until no more coalescing is possible
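
The coalescing loop can be sketched as follows. As a simplification that the slides do not describe, this version merges nodes directly in the graph (taking the union of their neighbours) instead of rewriting the IL and rebuilding the graph after each pass. The names `coalesce` and `alias` are hypothetical.

```python
def coalesce(graph, copies):
    """Merge the source and destination of each copy when they do not interfere.

    graph: {register: set of interfering registers}
    copies: list of (dest, source) register copies found in the IL
    Returns the merged graph and a map from each merged register to its
    representative.
    """
    graph = {node: set(adj) for node, adj in graph.items()}
    alias = {}  # merged register -> register it was folded into

    def find(reg):
        while reg in alias:
            reg = alias[reg]
        return reg

    changed = True
    while changed:  # repeat until no more coalescing is possible
        changed = False
        for dest, src in copies:
            a, b = find(dest), find(src)
            if a == b or b in graph[a]:
                continue  # already merged, or the two registers interfere
            # Fold b into a: every neighbour of b becomes a neighbour of a.
            for neighbour in graph.pop(b):
                graph[neighbour].discard(b)
                graph[neighbour].add(a)
                graph[a].add(neighbour)
            alias[b] = a
            changed = True
    return graph, {reg: find(reg) for reg in alias}


# Interference graph for: A = 1; B = A; B = B + 1; C = B; D = A.
graph = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"}}
merged, mapping = coalesce(graph, [("B", "A"), ("C", "B"), ("D", "A")])
```

Here B is folded into C and A into D, leaving two nodes that interfere with each other.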
Subsumption Example

Instructions   Live   Dead
A = 1          A
B = A          B
B = B + 1
C = B          C      B
D = A          D      A
…                     C, D

[Figure: interference graph with nodes A, B, C, D]
Subsumption Example

Instructions   Live   Dead
AD = 1         AD
BC = AD        BC
BC = BC + 1
…                     AD, BC

[Figure: interference graph with coalesced nodes AD and BC]
Finding a 32-Coloring
     Each symbolic register is assigned a color
      representing a real register
     If no adjacent nodes have the same color, then the
      coloring succeeds
     Assume that G has a node N with degree < 32
     Then G is 32-colorable iff the reduced graph from
      which N and all its edges have been omitted is 32-
      colorable
     Algorithm throws away nodes of degree < 32 until all
      nodes have been removed
     Algorithm fails if no node has degree < 32
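
A minimal sketch of this remove-and-color procedure, with the number of colors as a parameter `k` (32 in the slides). The function name `color_graph` and the dictionary representation are assumptions for illustration. Nodes are colored in the reverse of the order in which they were removed, which always leaves a free color for a node that had fewer than `k` neighbours when it was removed.

```python
def color_graph(graph, k):
    """Try to k-color an interference graph.

    graph: {register: set of interfering registers}
    Returns {register: color index}, or None when every remaining node has
    degree >= k and a spill decision is needed.
    """
    work = {node: set(adj) for node, adj in graph.items()}
    removed = []
    while work:
        node = next((n for n in work if len(work[n]) < k), None)
        if node is None:
            return None  # no node of degree < k is left
        for neighbour in work.pop(node):
            work[neighbour].discard(node)
        removed.append(node)

    colors = {}
    # Re-insert nodes in reverse removal order and give each one the
    # lowest color not used by its already-colored neighbours.
    for node in reversed(removed):
        used = {colors[n] for n in graph[node] if n in colors}
        colors[node] = next(c for c in range(k) if c not in used)
    return colors


# A graph in which A, B, C are mutually adjacent and D is adjacent to B and C.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}
three = color_graph(graph, 3)  # succeeds; A and D end up sharing a color
two = color_graph(graph, 2)    # None: every node has degree >= 2
```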
3-coloring example

Instructions   Live   Dead
A = 1          A
B = 2          B
C = 3          C
? = A                 A
D = 4          D
? = B                 B
? = C                 C
? = D                 D

[Figure: interference graph with nodes A, B, C, D]
     If the 32-coloring fails, then nodes must be
      spilled to memory
     Spilled registers are stored to memory, then
      loaded momentarily when their results are
      needed
     Every time spill code is generated, the
      interference graph must be rebuilt
     Usually recoloring succeeds after spilling, but
      sometimes several passes are required

   NP-Complete problem
   Heuristic: spill the node that minimizes
      – Cost of spilling / Degree of node
     Cost of spilling
      – (number of definition points + number of
        use points) * frequency of each point
     In some cases, spilled node can be
      reloaded for an extended interval
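
The heuristic can be sketched as follows. The function name `choose_spill`, the input format and the frequency numbers are made up for illustration; how frequencies are estimated is not covered here.

```python
def choose_spill(graph, points):
    """Pick the register with the smallest (spill cost / degree) ratio.

    graph: {register: set of interfering registers}
    points: {register: list of estimated execution frequencies, one entry
             per definition point or use point of that register}
    """
    def ratio(reg):
        cost = sum(points[reg])        # each def/use point weighted by frequency
        return cost / len(graph[reg])  # a high degree makes spilling more attractive

    # Nodes without neighbours never block coloring, so skip them.
    candidates = [reg for reg in graph if graph[reg]]
    return min(candidates, key=ratio)


graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}
# Hypothetical frequencies: C is used once inside a loop (weight 10).
points = {"A": [1, 1], "B": [1, 1], "C": [1, 10], "D": [1, 1]}
victim = choose_spill(graph, points)  # "B": cheap to spill and high degree
```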

   The graph coloring and spilling
    algorithms should produce faster code
   The register allocation algorithm is
      – Graph coloring is O(N)
      – But uses O(N^2) space
Compile-time Copy Elimination

            Peter Schnorf
         Mahadevan Ganapathi
           John Hennessy

            Stanford, 1993
     Single assignment languages simplify
      dependency checking
     Which simplifies automatic detection and
      exploitation of parallelism
     But single-assignment languages require a
      large number of copies
     Previous implementations eliminate copies at
      run time
     Increased efficiency if copies can be
      eliminated at compile time
Paper Overview
     Single-assignment languages
     Code generation
     Compile-time copy elimination techniques
      –   Substitution
      –   Pattern matching
      –   Substructure sharing
      –   Substructure targeting
     Results – success!
      – Eliminated all copies in bubble sort
Single-assignment languages
     Functional languages (LISP, Haskell, SISAL)
     Simpler dependency checking
      – True dependencies – write, read
             b = f(c), a = f(b)
      – Anti-dependencies – read, write
             a = f(b), b = f(c)
      – Output dependencies – write, write
             a = f(b), a = f(c)
      – Aliasing
             caused by pointers, array indexes
     To avoid aliasing, all inputs and outputs are passed
      by value
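
The three dependency kinds listed above can be checked mechanically for a pair of statements. In this small sketch a statement is modelled as a `(written_name, read_names)` pair; that format and the function name `dependencies` are assumptions for illustration, and aliasing is not modelled.

```python
def dependencies(first, second):
    """Classify the dependencies of `second` on `first`.

    Each statement is a (written_name, set_of_read_names) pair.
    """
    written1, read1 = first
    written2, read2 = second
    kinds = set()
    if written1 in read2:
        kinds.add("true")    # write, then read
    if written2 in read1:
        kinds.add("anti")    # read, then write
    if written1 == written2:
        kinds.add("output")  # write, then write
    return kinds


true_dep = dependencies(("b", {"c"}), ("a", {"b"}))    # b = f(c), a = f(b)
anti_dep = dependencies(("a", {"b"}), ("b", {"c"}))    # a = f(b), b = f(c)
output_dep = dependencies(("a", {"b"}), ("a", {"c"}))  # a = f(b), a = f(c)
```

In a single-assignment program every name is written once, so the anti and output cases cannot arise between distinct statements.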
Example – Swap(A,i,j)
   Data flow diagram
    – Edges transport values
    – Simple nodes are operations
   Pick any feasible node
    evaluation order at random
   Naïve implementation
    – Each edge has its own memory
    – Swap uses 5 array copies!
   Optimized implementation
    – Swap array updates are done in-place

[Figure: data flow diagram for Swap, from Input through two AElement nodes and two AReplace nodes]
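
The difference between the two implementations can be illustrated with a toy model of the AElement and AReplace nodes. The Python function names and the `in_place` flag are assumptions for this sketch. The model counts only the copies made by AReplace, so it does not reproduce the five copies counted for the naïve edge-per-buffer scheme.

```python
copy_count = [0]  # one-element list used as a mutable counter


def a_element(array, index):
    """AElement: read one element."""
    return array[index]


def a_replace(array, index, value, in_place=False):
    """AReplace: produce an array with one element replaced."""
    if not in_place:
        array = list(array)  # value semantics: the input must stay intact
        copy_count[0] += 1
    array[index] = value
    return array


def swap(array, i, j, in_place=False):
    # Both reads are evaluated before the first replace, so an in-place
    # update cannot destroy a value that is still needed.
    x = a_element(array, i)
    y = a_element(array, j)
    updated = a_replace(array, i, y, in_place)
    return a_replace(updated, j, x, in_place)


original = [10, 20, 30]
swapped = swap(original, 0, 2)                  # copies twice, original untouched
reused = swap(original, 0, 2, in_place=True)    # no copy, original is overwritten
```

The in-place variant is only valid when the caller no longer needs the old array, which is what the substitution analysis has to establish at compile time.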
Example: BubbleSort(A)

   Compound nodes represent
    control flow
   Loops are implemented
    using recursion to avoid
    multiple assignment of the
    iteration variable
   Naïve implementation
    – Bubble sort requires O(n^2) array
      copies
   Optimized implementation
    – All array updates are done in
      place
    – But parallelism is decreased
Code Generation Overview

     Input is from compiler front-end
      – IF1: intermediate data-flow graph
   Code generator eliminates copies
   Output is in C
      – Compiled into machine code using an
        optimized C compiler
Vertical Substitution
   If input and output
    have the same
    type and size, they
    can share memory
    – Updates are done in place

[Figure: Swap data flow graph with nodes 1 AElement, 2 AElement, 3 AReplace, 4 AReplace]
Horizontal Substitution
   If an output has multiple
    destinations, the
    output edges
    can share memory

[Figure: Swap data flow graph with nodes 1 AElement, 2 AElement, 3 AReplace, 4 AReplace]
Horizontal and Vertical Substitution

     Horizontal and vertical substitution can
      interfere with each other
      – A node along the substitution chain
        modifies the shared object before its last
        use
     Edges can be marked as read-only if
      they are shared and this is not the last
      use
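
The "last use" condition can be sketched as a check on a chosen evaluation order. The function name `may_update_in_place` and the node labels are hypothetical; a real code generator would derive the consumers from the data flow graph.

```python
def may_update_in_place(order, consumers, writer):
    """Return True if `writer` is evaluated after every other consumer.

    order: list of node names in evaluation order
    consumers: nodes that receive the shared value
    writer: the consumer that wants to modify the value in place
    """
    position = {node: i for i, node in enumerate(order)}
    return all(
        position[node] < position[writer]
        for node in consumers
        if node != writer
    )


# The input array of Swap feeds both AElement nodes and the first AReplace.
consumers = ["elem_i", "elem_j", "replace_1"]

good_order = ["elem_i", "elem_j", "replace_1", "replace_2"]
bad_order = ["elem_i", "replace_1", "elem_j", "replace_2"]

ok = may_update_in_place(good_order, consumers, "replace_1")      # True
blocked = may_update_in_place(bad_order, consumers, "replace_1")  # False
```

In the second order the edge into `replace_1` would have to be treated as read-only, forcing a copy, because `elem_j` still needs the unmodified array.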

[Figure: two evaluation orders of the Swap data flow graph. Left: 1 AElement, 2 AElement, 3 AReplace, 4 AReplace. Right: 1 AElement, 3 AElement, 2 AReplace, 4 AReplace.]
Interprocedural Substitution

   Previous discussion concerned simple
    nodes that can be analyzed at compiler
    design time
   Information about a function is needed
    in order to use substitution
      – Does the function modify an input?
      – Will an input be chained to an output?
Intersubgraph Substitution

   Substitution analysis is done for each
    subgraph
   Same basic principles
Determining the Evaluation Order

   Evaluation order can impact efficiency
    of substitution
   Naïve implementation selects the next
    node to evaluate at random
   Hints tell algorithm which nodes should
    be evaluated before and after other
    nodes if possible
   Hints are ad hoc?
Pattern Matching

   Replace hard-to-optimize pieces of
   Patterns are language-specific
   Patterns are detected using “ad hoc”
Substructure Sharing

   Allow substructures to be referenced
    without copies
   AElement can be treated as a NoOp
   Happens after substitution analysis –
    less important
   Same principles as substitution
Substructure Targeting

   Allow structures to be built from
    substructures without copies
   Similar to substructure sharing

   Compared optimizations versus naïve
    implementation
   Optimizations eliminate all copies for
    bubble sort
   Informal comparison to run-time
    optimizer shows improvements

   Substitution, pattern matching and
    substructure sharing can almost
    eliminate unnecessary copies in a
    single assignment language.
   Copy elimination no longer has to be
    done at run-time.
   Single assignment languages should be
    more efficient for parallel programs.