# Compile-time Copy Elimination

```
Register Allocation and Spilling via Graph Coloring
G. J. Chaitin
IBM Research, 1982

Motivation
• Before the register allocation phase, the compiler assumes that there is an unlimited number of general-purpose registers
• The symbolic registers must be mapped to real registers in a way that avoids conflicts
• Symbolic registers that cannot be mapped to real registers must be spilled to memory
• We need an algorithm that maps registers with minimal spilling cost

Paper Overview
• Register allocation overview
• Subsumption algorithm
• Interference graph coloring algorithm
• Spilling algorithm

Register Allocation Steps
1. Determine which registers are live at each point in the intermediate language (IL) program
2. Build a register interference graph
   – Nodes represent symbolic registers
   – Edges represent a conflict between symbolic registers
3. Subsumption: eliminate unnecessary register copies
4. Find a 32-coloring of the interference graph
5. Decide which registers to spill, if necessary

Subsumption
• If the source and destination of a register copy do not interfere, they may be coalesced into a single node
• For each register copy in the IL, determine whether the two registers interfere
• If not, coalesce the two nodes into one
• After the first pass, rewrite the IL code
• Repeat until no more coalescing is possible

Subsumption Example
  A = 1
  B = A
  B = B + 1
  C = B
  D = A
  …
[Figure: interference graph over symbolic registers A, B, C, D]
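
Subsumption Example (after coalescing B and C)
  BC = BC + 1
[Figure: interference graph in which B and C are merged into the single node BC]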

Finding a 32-Coloring
• Each symbolic register is assigned a color representing a real register
• If no two adjacent nodes have the same color, the coloring succeeds
• Assume that G has a node N with degree < 32
• Then G is 32-colorable iff the reduced graph, from which N and all its edges have been omitted, is 32-colorable (N has at most 31 neighbors, so a free color always remains for it)
• The algorithm throws away nodes of degree < 32 until all nodes have been removed
• The algorithm fails if no node has degree < 32

3-Coloring Example
  A = 1
  B = 2
  C = 3
  ? = A
  D = 4
  ? = B
  ? = C
  ? = D
[Figure: interference graph over A, B, C, D, with the registers live at each line]

Spilling
• If the 32-coloring fails, some nodes must be spilled to memory
• A spilled register is stored to memory after each definition and loaded back just before each use
• Every time spill code is generated, the interference graph must be rebuilt
• Usually recoloring succeeds after spilling, but sometimes several passes are required

Spilling
• NP-complete problem
• Heuristic: spill the node that minimizes
  – cost of spilling / degree of node
• Cost of spilling
  – sum, over all definition and use points, of the estimated execution frequency of each point
• In some cases, a spilled node can be …

Conclusion
• The graph coloring and spilling algorithms should produce faster code
• The register allocation algorithm is efficient
  – Graph coloring is O(N)
  – But it uses O(N²) space

Compile-time Copy Elimination
Peter Schnorf
John Hennessy
Stanford, 1993

Motivation
• Single-assignment languages simplify dependency checking, which in turn simplifies automatic detection and exploitation of parallelism
• But single-assignment languages require a large number of copies
• Previous implementations eliminate copies at run time
• Efficiency increases if copies can be eliminated at compile time

Paper Overview
• Single-assignment languages
• Code generation
• Compile-time copy elimination techniques
  – Substitution
  – Pattern matching
  – Substructure sharing
  – Substructure targeting
• Results: success!
  – Eliminated all copies in bubble sort

Single-assignment languages
• Functional languages (LISP, Haskell, SISAL)
• Simpler dependency checking
  – True dependencies: write, then read
      b = f(c), a = f(b)
      a = f(b), b = f(c)   (under single assignment both orders express the same dependence of a on b)
  – Output dependencies: write, write
      a = f(b), a = f(c)   (cannot arise under single assignment, since a is assigned only once)
  – Aliasing
      caused by pointers and array indexes
• To avoid aliasing, all inputs and outputs are passed by value

Example – Swap(A, i, j)
• Data flow diagram
  – Edges transport values
  – Simple nodes are operations
• Nodes may be evaluated in any feasible order, picked at random
• Naïve implementation
  – Each edge has its own memory
  – Swap uses 5 array copies!
• Optimized implementation
  – Swap array updates are done in place
[Figure: data flow graph for Swap with an Input node, two AElement nodes and two AReplace nodes]

Example – BubbleSort(A)
• Compound nodes represent control flow
• Loops are implemented using recursion to avoid multiple assignment of the iteration variable
• Naïve implementation
  – Bubble sort requires O(n²) array copies
• Optimized implementation
  – All array updates are done in place
  – But parallelism is decreased

Code Generation Overview
• Input comes from the compiler front end
  – IF1: an intermediate data flow graph representation
• The code generator eliminates copies
• Output is in C
  – Compiled into machine code by an optimizing C compiler

Vertical Substitution
• If an input and an output have the same type and size, they can share memory
  – Updates are done in place
[Figure: Swap data flow graph (Input; 1 AElement, 2 AElement, 3 AReplace, 4 AReplace)]

Horizontal Substitution
• If an output has several destinations, the output edges can share memory
[Figure: Swap data flow graph (Input; 1 AElement, 2 AElement, 3 AReplace, 4 AReplace)]

Horizontal and Vertical Substitution
• Horizontal and vertical substitution can interfere with each other
  – A node along the substitution chain may modify the shared object before its last use by another node
• An edge can be marked read-only if it is shared and this is not the object's last use

Horizontal and Vertical Substitution
[Figure: the Swap graph under two node numberings (evaluation orders). Left: 1 AElement, 2 AElement, 3 AReplace, 4 AReplace. Right: 1 AElement, 2 AReplace, 3 AElement, 4 AReplace]

Interprocedural Substitution
• The previous discussion concerned simple nodes, whose behavior is known at compiler design time
• Substitution across a function call needs information about the function
  – Does the function modify an input?
  – Will an input be chained to an output?

Intersubgraph Substitution
• Substitution analysis is done for each construct
• Same basic principles

Determining the Evaluation Order
• Evaluation order affects how effective substitution can be
• The naïve implementation selects the next node to evaluate at random
• Hints tell the algorithm which nodes should, if possible, be evaluated before or after other nodes

Pattern Matching
• Replaces hard-to-optimize pieces of code
• Patterns are language-specific
• Patterns are detected using “ad hoc” methods

Substructure Sharing
• Allows substructures to be referenced without copies
• AElement can be treated as a NoOp
• Happens after substitution analysis and is less important
• Same principles as substitution analysis

Substructure Targeting
• Allows structures to be built from substructures without copies
• Similar to substructure sharing

Results
• Compared the optimizations against a naïve implementation
• The optimizations eliminated all copies in bubble sort
• An informal comparison with a run-time optimizer shows improvements

Conclusions
• Substitution, pattern matching and substructure sharing can eliminate almost all unnecessary copies in a single-assignment language
• Copy elimination no longer has to be done at run time
• Single-assignment languages should be more efficient for parallel programs

```
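The first three Register Allocation Steps in the Chaitin slides above (liveness, interference graph, subsumption) can be sketched for straight-line code as below. This is a minimal illustration under stated assumptions, not Chaitin's implementation: the IL is a hypothetical list of `(dest, op, sources)` tuples, `dest` is `None` for a pure use such as `? = A`, and merged registers are named by concatenating the two names (`B` and `C` become `BC`), which assumes no register with that name already exists. Following the usual interference rule, a definition conflicts with every register live after it, except that the source of a copy is exempt so that the pair can still be coalesced.

```python
# Hypothetical IL: each instruction is (dest, op, sources); dest is None for a
# pure use such as "? = A". Only straight-line code is handled in this sketch.


def live_after(code, live_out=frozenset()):
    """Backward liveness pass: the set of registers live after each instruction."""
    live = set(live_out)
    result = [frozenset()] * len(code)
    for i in range(len(code) - 1, -1, -1):
        result[i] = frozenset(live)
        dest, _op, srcs = code[i]
        live.discard(dest)      # the definition kills dest (no-op for a pure use)
        live.update(srcs)       # the operands are live before the instruction
    return result


def interference_graph(code, live_out=frozenset()):
    """A definition interferes with every other register live after it; for a
    copy d = s, the source s is exempt so that d and s can still be coalesced."""
    graph = {r: set() for r in live_out}
    for dest, _op, srcs in code:
        for r in [dest, *srcs]:
            if r is not None:
                graph.setdefault(r, set())
    for (dest, op, srcs), live in zip(code, live_after(code, live_out)):
        if dest is None:
            continue
        for other in live - {dest}:
            if op == "copy" and other == srcs[0]:
                continue
            graph[dest].add(other)
            graph[other].add(dest)
    return graph


def coalesce(code, live_out=frozenset()):
    """Subsumption: merge each copy whose source and destination do not
    interfere, rewrite the IL, and repeat until no more merging is possible."""
    while True:
        graph = interference_graph(code, live_out)
        rep = {r: r for r in graph}             # register -> name it was merged into

        def find(r):
            while rep[r] != r:
                r = rep[r]
            return r

        changed = False
        for dest, op, srcs in code:
            if op != "copy":
                continue
            d, s = find(dest), find(srcs[0])
            if d == s or s in graph[d]:
                continue                        # already merged, or they interfere
            merged = s + d                      # e.g. "B" + "C" -> "BC"
            graph[merged] = (graph.pop(d) | graph.pop(s)) - {d, s}
            for nbrs in graph.values():         # redirect edges to the merged node
                if d in nbrs or s in nbrs:
                    nbrs -= {d, s}
                    nbrs.add(merged)
            rep[d] = rep[s] = rep[merged] = merged
            changed = True
        if not changed:
            return code
        rewritten = []
        for dest, op, srcs in code:             # rewrite the IL with merged names
            dest = None if dest is None else find(dest)
            srcs = [find(r) for r in srcs]
            if not (op == "copy" and dest == srcs[0]):   # drop copies like BC = BC
                rewritten.append((dest, op, srcs))
        code, live_out = rewritten, frozenset(find(r) for r in live_out)


THREE_COLOR = [("A", "const", []), ("B", "const", []), ("C", "const", []),
               (None, "use", ["A"]), ("D", "const", []),
               (None, "use", ["B"]), (None, "use", ["C"]), (None, "use", ["D"])]
SUBSUMPTION = [("A", "const", []), ("B", "copy", ["A"]), ("B", "add", ["B"]),
               ("C", "copy", ["B"]), ("D", "copy", ["A"])]

print(interference_graph(THREE_COLOR))
print(coalesce(SUBSUMPTION, live_out=frozenset({"C", "D"})))
```

On the 3-coloring example this gives edges between A and B, A and C, B and C, B and D, and C and D, with no edge between A and D. On the subsumption example, assuming C and D are the only registers live at the `…`, the copy C = B disappears and leaves `BC = BC + 1`, as on the slide. Under that same assumption the copy D = A is coalesced too (A and D become AD), which the slide does not show, so the elided code may well keep them apart.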
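The simplify argument on the Finding a 32-Coloring slide can be sketched as follows, with the number of colors `k` as a parameter so that the slide's 3-coloring example can be checked; for the slides' target machine `k` would be 32. The graph below is written out by hand from the liveness of that example (A is dead before D is defined, so A and D do not interfere). This is a minimal sketch of the simplify/select idea rather than Chaitin's code: ties are broken alphabetically, and when no node of degree < k remains it just reports failure instead of spilling.

```python
def chaitin_color(graph, k):
    """Color an interference graph (register -> set of neighbours) with k colors.

    Simplify: repeatedly remove a node of degree < k; since such a node has fewer
    than k neighbours, the graph is k-colorable iff the reduced graph is.
    Select: re-insert nodes in reverse removal order, giving each the lowest
    color not used by an already-colored neighbour.
    Returns None when every remaining node has degree >= k (the failure case)."""
    degree = {n: len(adj) for n, adj in graph.items()}
    removed, stack = set(), []
    while len(stack) < len(graph):
        node = next((n for n in sorted(graph) if n not in removed and degree[n] < k), None)
        if node is None:
            return None
        stack.append(node)
        removed.add(node)
        for m in graph[node]:
            if m not in removed:
                degree[m] -= 1          # degree within the reduced graph
    colors = {}
    for node in reversed(stack):
        used = {colors[m] for m in graph[node] if m in colors}
        colors[node] = min(c for c in range(k) if c not in used)
    return colors


# Interference graph of the slide's 3-coloring example (A and D never overlap).
EXAMPLE = {"A": {"B", "C"}, "B": {"A", "C", "D"}, "C": {"A", "B", "D"}, "D": {"B", "C"}}
print(chaitin_color(EXAMPLE, 3))  # {'D': 0, 'C': 1, 'B': 2, 'A': 0}
print(chaitin_color(EXAMPLE, 2))  # None: every node has degree >= 2
```

Because A and D do not interfere, the select phase is free to give them the same color, which is why three colors suffice for four registers.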
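The spill heuristic on the second Spilling slide can be grafted onto the same simplify loop. In this sketch, which is an assumption and not Chaitin's code, the execution frequencies are made-up estimates (10 stands for a point inside a loop, 100 for a nested loop), and the chosen node is simply removed from the graph. A real allocator would then insert the store and load code and rebuild the interference graph, as the first Spilling slide says.

```python
def spill_cost(points):
    """Spill cost: every definition or use point needs a store or a load,
    weighted by the (estimated) execution frequency of that point."""
    return sum(freq for _kind, freq in points)


def simplify_with_spills(graph, k, cost):
    """Simplify as on the coloring slide, but when every remaining node has
    degree >= k, remove the node with the smallest cost / degree and record it
    as spilled. The remaining nodes are then colored in reverse removal order."""
    adj = {n: set(nbrs) for n, nbrs in graph.items()}
    stack, spilled = [], []
    while adj:
        node = next((n for n in sorted(adj) if len(adj[n]) < k), None)
        if node is None:
            # Every remaining degree is >= k >= 1, so the division is safe.
            node = min(sorted(adj), key=lambda n: cost[n] / len(adj[n]))
            spilled.append(node)
        else:
            stack.append(node)
        for m in adj.pop(node):
            adj[m].discard(node)
    colors = {}
    for node in reversed(stack):
        used = {colors[m] for m in graph[node] if m in colors}
        colors[node] = min(c for c in range(k) if c not in used)
    return colors, spilled


# Hypothetical def/use points: (kind, estimated execution frequency).
POINTS = {
    "A": [("def", 1), ("use", 10), ("use", 10)],
    "B": [("def", 10), ("use", 10)],
    "C": [("def", 1), ("use", 1)],
    "D": [("def", 10), ("use", 100)],
}
K4 = {r: set(POINTS) - {r} for r in POINTS}   # four mutually interfering registers
costs = {r: spill_cost(p) for r, p in POINTS.items()}
colors, spilled = simplify_with_spills(K4, 3, costs)
print(spilled)  # ['C']
print(colors)   # {'D': 0, 'B': 1, 'A': 2}
```

With three colors and four mutually interfering registers, C is chosen: its cost/degree ratio is 2/3, far below those of A (21/3), B (20/3) and D (110/3).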
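The Swap example and the Horizontal and Vertical Substitution slides can be made concrete with a toy interpreter. Everything here is a hypothetical simplification of IF1: the node names such as `get_i` and `put_j` are invented, the array is a Python list, and the last node in the order is taken to be the output. In naive mode every array-carrying edge gets its own copy; in this model there are five such edges (three out of `Input`, one between the two AReplace nodes and one to the output). In substitution mode an AReplace reuses its input's memory unless a node evaluated later still reads that array value, in which case the edge is treated as read-only and the update copies.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One simple node of a (hypothetical, simplified) IF1-style data flow graph."""
    op: str          # "AElement" (read one element) or "AReplace" (write one element)
    array: str       # edge carrying the array: "Input" or the producing node's name
    index: str       # which scalar input is the index ("i" or "j")
    value: str = ""  # AReplace only: the node whose scalar result is stored


# Swap(A, i, j): read A[i] and A[j], then write them back in exchanged positions.
SWAP = {
    "get_i": Node("AElement", "Input", "i"),
    "get_j": Node("AElement", "Input", "j"),
    "put_j": Node("AReplace", "Input", "j", value="get_i"),  # A' = A with A[j] := A[i]
    "put_i": Node("AReplace", "put_j", "i", value="get_j"),  # A'' = A' with A'[i] := A[j]
}


def run(graph, order, scalars, array, naive=False):
    """Evaluate the nodes in `order`; return (result array, number of array copies).

    naive=True: every array-carrying edge gets its own memory (one copy per edge).
    naive=False: substitution. An AReplace updates its input in place unless a
    node evaluated later still reads the same array value; then it must copy."""
    values = dict(scalars)
    arrays = {"Input": array}
    copies = 0
    for pos, name in enumerate(order):
        node = graph[name]
        src = arrays[node.array]
        if naive:
            src, copies = list(src), copies + 1
        if node.op == "AElement":
            values[name] = src[values[node.index]]
            continue
        if not naive and any(graph[n].array == node.array for n in order[pos + 1:]):
            src, copies = list(src), copies + 1   # shared edge is read-only: copy
        src[values[node.index]] = values[node.value]
        arrays[name] = src
    result = arrays[order[-1]]
    if naive:
        result, copies = list(result), copies + 1  # the output edge has its own memory
    return result, copies


readers_first = ["get_i", "get_j", "put_j", "put_i"]
writer_early = ["get_i", "put_j", "get_j", "put_i"]
ij = {"i": 0, "j": 2}
print(run(SWAP, readers_first, ij, [10, 20, 30], naive=True))  # ([30, 20, 10], 5)
print(run(SWAP, readers_first, ij, [10, 20, 30]))              # ([30, 20, 10], 0)
print(run(SWAP, writer_early, ij, [10, 20, 30]))               # ([30, 20, 10], 1)
```

Evaluating both AElement nodes first lets both updates happen in place. Evaluating the first AReplace before the second AElement forces one copy, which is the interference between horizontal and vertical substitution that the slides describe.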
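The Determining the Evaluation Order slide does not say how hints are expressed, so the sketch below assumes one simple scheme: a topological sort (Kahn's algorithm) that, among the nodes whose inputs are ready, picks the one with the smallest hint value. Applied to the Swap graph (with the same invented node names as a toy model), preferring readers (AElement) over writers (AReplace) yields the order in which both updates can be done in place, while the opposite preference yields the order that forces a copy.

```python
import heapq


def evaluation_order(deps, hint):
    """Topological order of a data flow graph (Kahn's algorithm).

    deps maps each node to the set of nodes whose results it needs. Among the
    ready nodes, the one with the smallest (hint(node), node) goes first, so the
    hint says "evaluate this before that, if the dependences allow it"."""
    waiting = {n: set(d) for n, d in deps.items()}
    users = {n: [] for n in deps}
    for n, needs in deps.items():
        for d in needs:
            users[d].append(n)
    ready = [(hint(n), n) for n, needs in waiting.items() if not needs]
    heapq.heapify(ready)
    order = []
    while ready:
        _, n = heapq.heappop(ready)
        order.append(n)
        for u in users[n]:
            waiting[u].discard(n)
            if not waiting[u]:
                heapq.heappush(ready, (hint(u), u))
    if len(order) != len(deps):
        raise ValueError("the graph has a cycle")
    return order


# Swap(A, i, j) as node -> dependences; put_j writes A[j], put_i writes A[i].
SWAP_DEPS = {"get_i": set(), "get_j": set(), "put_j": {"get_i"}, "put_i": {"put_j", "get_j"}}
KIND = {"get_i": "AElement", "get_j": "AElement", "put_j": "AReplace", "put_i": "AReplace"}

readers_first = evaluation_order(SWAP_DEPS, lambda n: 0 if KIND[n] == "AElement" else 1)
writers_first = evaluation_order(SWAP_DEPS, lambda n: 0 if KIND[n] == "AReplace" else 1)
print(readers_first)  # ['get_i', 'get_j', 'put_j', 'put_i']
print(writers_first)  # ['get_i', 'put_j', 'get_j', 'put_i']
```

A priority queue keeps the choice deterministic; the naïve implementation on the slide instead picks among the ready nodes at random.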
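For the BubbleSort slide, the following sketch writes both loops as recursion, so no iteration variable is ever reassigned, and passes the array update in as a function. `swap_copy` models the naive implementation, where every update produces a fresh array, and `swap_in_place` models the optimized one; these names and the copy counter are illustrative assumptions. Python itself is not a single-assignment language, and the recursion depth grows with the array length, so this only suits small inputs.

```python
def swap_copy(a, i, stats):
    """Naive update: exchanging a[i] and a[i + 1] yields a fresh array (one full copy)."""
    stats["copies"] += 1
    b = list(a)
    b[i], b[i + 1] = a[i + 1], a[i]
    return b


def swap_in_place(a, i, stats):
    """Optimized update: the old array value is dead afterwards, so reuse its memory."""
    a[i], a[i + 1] = a[i + 1], a[i]
    return a


def bubble_pass(a, i, last, stats, swap):
    """Inner loop as recursion: each call receives a new i instead of reassigning it."""
    if i >= last:
        return a
    nxt = swap(a, i, stats) if a[i] > a[i + 1] else a
    return bubble_pass(nxt, i + 1, last, stats, swap)


def bubble_sort(a, last, stats, swap):
    """Outer loop as recursion over the index of the last unsorted position."""
    if last <= 0:
        return a
    return bubble_sort(bubble_pass(a, 0, last, stats, swap), last - 1, stats, swap)


for swap in (swap_copy, swap_in_place):
    stats = {"copies": 0}
    data = [5, 4, 3, 2, 1]
    print(swap.__name__, bubble_sort(data, len(data) - 1, stats, swap), stats["copies"])
# swap_copy [1, 2, 3, 4, 5] 10
# swap_in_place [1, 2, 3, 4, 5] 0
```

A reverse-sorted array of length n needs n(n-1)/2 swaps, so the naive version makes that many full-array copies, the quadratic count mentioned on the slide; the in-place version makes none, but it also forces the updates to happen one after another.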
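Substructure sharing can be illustrated with a row-major matrix stored in an `array.array` (an assumed representation, not the paper's). The naive AElement copies a row out, while the shared version returns a read-only `memoryview` slice of the same buffer, so selecting the row copies nothing. The function names are hypothetical, and `memoryview.toreadonly()` needs Python 3.8 or later. The final update shows why the slide says the same principles as substitution analysis apply: a shared row sees later writes to the matrix, so sharing is only safe if no such write happens before the row's last use.

```python
from array import array


def row_copy(matrix, cols, k):
    """Naive AElement on a substructure: copy row k out of a row-major matrix."""
    return matrix[k * cols:(k + 1) * cols]   # slicing an array.array makes a copy


def row_shared(matrix, cols, k):
    """Substructure sharing: AElement becomes a no-op that hands out a
    read-only view of row k; no elements are copied."""
    return memoryview(matrix)[k * cols:(k + 1) * cols].toreadonly()


m = array("i", range(6))                  # 2 x 3 matrix [[0, 1, 2], [3, 4, 5]]
copied, shared = row_copy(m, 3, 1), row_shared(m, 3, 1)
m[4] = 40                                 # a later update to the matrix ...
print(list(copied), list(shared))         # [3, 4, 5] [3, 40, 5]
```

Only the view reflects the update, which is exactly the situation the read-only and last-use checks must rule out.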
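Substructure targeting is the building-side counterpart of sharing. In this sketch the hypothetical producer `fill_row` writes a row into whatever buffer it is handed. The naive builder gives it a temporary row and then copies that row into the result, whereas the targeted builder hands it a writable `memoryview` slice of the result itself, so each row is built directly in its final place.

```python
from array import array


def fill_row(target, k):
    """Hypothetical row producer: writes row k into the buffer it is given."""
    for c in range(len(target)):
        target[c] = 10 * k + c


def build_naive(rows, cols):
    """Build each row in a temporary buffer, then copy it into the result."""
    result = array("i")
    for k in range(rows):
        row = array("i", [0]) * cols
        fill_row(row, k)
        result.extend(row)                        # copies the finished row
    return result


def build_targeted(rows, cols):
    """Substructure targeting: each row is produced directly in its final slot."""
    result = array("i", [0]) * (rows * cols)
    view = memoryview(result)
    for k in range(rows):
        fill_row(view[k * cols:(k + 1) * cols], k)   # writable slice, no temporary
    return result


print(build_naive(2, 3).tolist())     # [0, 1, 2, 10, 11, 12]
print(build_targeted(2, 3).tolist())  # [0, 1, 2, 10, 11, 12]
```

Both builders produce the same matrix; the targeted one simply never materializes a row outside the result.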