Register Allocation and Spilling via Graph Coloring
G. J. Chaitin
IBM Research, 1982

Motivation
Before the register allocation phase, the compiler assumes an unlimited number of general-purpose registers
These symbolic registers must be mapped to real registers in a way that avoids conflicts
Symbolic registers that cannot be mapped to real registers must be spilled to memory
We need an algorithm that maps registers with minimal spilling cost

Paper Overview
Register allocation overview
Subsumption algorithm
Interference graph coloring algorithm
Spilling algorithm

Register Allocation Steps
1. Determine which registers are live at each point in the intermediate language (IL) program
2. Build a register interference graph
– Nodes represent symbolic registers
– Edges represent a conflict between symbolic registers
3. Subsumption: eliminate unnecessary register copies
4. Find a 32-coloring of the interference graph
5. Decide which registers to spill if necessary

Subsumption
If the source and destination of a register copy do not interfere, they may be coalesced into a single node
For each register copy in the IL, determine whether the two registers interfere
If not, coalesce the two nodes into one
After the first pass, rewrite the IL code
Repeat until no more coalescing is possible

Subsumption Example (before coalescing)
Instructions    Live    Dead
A = 1           A
B = A           B
B = B + 1
C = B           C       B
D = A           D       A
…                       C, D
[Figure: interference graph with nodes A, B, C, D]

Subsumption Example (after coalescing)
Instructions    Live    Dead
AD = 1          AD
BC = AD         BC
BC = BC + 1
…                       AD, BC
[Figure: interference graph with nodes AD, BC]

Finding a 32-Coloring
Each symbolic register is assigned a color representing a real register
The coloring succeeds if no two adjacent nodes have the same color
Assume that G has a node N with degree < 32
Then G is 32-colorable iff the reduced graph, from which N and all its edges have been omitted, is 32-colorable
The algorithm removes nodes of degree < 32 until all nodes have been removed
The algorithm fails if nodes remain and none of them has degree < 32

3-Coloring Example
Instructions    Live    Dead
A = 1           A
B = 2           B
C = 3           C
? = A                   A
D = 4           D
? = B                   B
? = C                   C
? = D                   D
[Figure: interference graph with nodes A, B, C, D]

Spilling
If the 32-coloring fails, then nodes must be spilled to memory
Spilled registers are stored to memory, then loaded momentarily when their values are needed
Every time spill code is generated, the interference graph must be rebuilt
Usually recoloring succeeds after spilling, but sometimes several passes are required

Spilling
Optimal spilling is an NP-complete problem
Heuristic: spill the node that minimizes (cost of spilling) / (degree of node)
Cost of spilling: the number of definition points plus use points, each weighted by its estimated execution frequency
In some cases, a reloaded value can remain in a register for an extended interval

Conclusion
The graph coloring and spilling algorithms should produce faster code
The register allocation algorithm is efficient
– Graph coloring takes O(N) time
– But uses O(N^2) space

Compile-time Copy Elimination
Peter Schnorf, Mahadevan Ganapathi, John Hennessy
Stanford, 1993

Motivation
Single-assignment languages simplify dependency checking
– This in turn simplifies automatic detection and exploitation of parallelism
But single-assignment languages require a large number of copies
Previous implementations eliminate copies at run time
Efficiency improves if copies can be eliminated at compile time

Paper Overview
Single-assignment languages
Code generation
Compile-time copy elimination techniques
– Substitution
– Pattern matching
– Substructure sharing
– Substructure targeting
Results – success!
– Eliminated all copies in bubble sort

Single-assignment languages
Functional languages (LISP, Haskell, SISAL)
Simpler dependency checking
– True dependencies (write, then read): b = f(c), a = f(b)
– Anti-dependencies (read, then write): a = f(b), b = f(c)
– Output dependencies (write, then write): a = f(b), a = f(c)
– Aliasing caused by pointers and array indexes
To avoid aliasing, all inputs and outputs are passed by value

Example – Swap(A,i,j)
Data flow diagram
– Edges transport values
– Simple nodes are operations
Pick any feasible node evaluation order at random
Naïve implementation
– Each edge has its own memory
– Swap uses 5 array copies!
Optimized implementation
– Swap array updates are done in place
[Figure: data-flow graph for Swap with nodes Input, AElement, AElement, AReplace, AReplace]

Example: BubbleSort(A)
Compound nodes represent control flow
Loops are implemented using recursion to avoid multiple assignment of the iteration variable
Naïve implementation
– Bubble sort requires O(n^2) array copies
Optimized implementation
– All array updates are done in place
– But parallelism is decreased

Code Generation Overview
Input is from the compiler front end
– IF1: intermediate data-flow graph representation
Code generator eliminates copies
Output is in C
– Compiled into machine code using an optimizing C compiler

Vertical Substitution
If a node's input and output have the same type and size, they can share memory
– Updates are done in place
[Figure: Swap graph. Input, then 1 AElement, 2 AElement, 3 AReplace, 4 AReplace]

Horizontal Substitution
If an output has several destinations, the output edges can share memory
[Figure: Swap graph. Input, then 1 AElement, 2 AElement, 3 AReplace, 4 AReplace]

Horizontal and Vertical Substitution
Horizontal and vertical substitution can interfere with each other
– A node along the substitution chain may modify the shared object before its last use
Edges can be marked as read-only if they are shared and this is not the last use

Horizontal and Vertical Substitution
[Figure: two evaluation orders of the Swap graph. Left: Input, 1 AElement, 2 AElement, 3 AReplace, 4 AReplace. Right: Input, 1 AElement, 3 AElement, 2 AReplace, 4 AReplace]
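The interaction between sharing and in-place update in the Swap example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's code generator: the names `swap`, `a_replace` and `reads_first` are hypothetical, and the `last_use` flag is hard-coded here, whereas a compiler would derive it from substitution analysis.

```python
def swap(arr, i, j, reads_first=True):
    """Swap arr[i] and arr[j] using AElement/AReplace-style steps.

    Returns (result, copies), where copies counts array copies made.
    Hypothetical illustration; last_use flags are set by hand.
    """
    copies = 0

    def a_replace(a, k, value, last_use):
        # AReplace: produce an array equal to a, except a[k] = value.
        # If this is the last use of a, update it in place (vertical
        # substitution). Otherwise the edge is read-only: copy first.
        nonlocal copies
        if not last_use:
            a = list(a)
            copies += 1
        a[k] = value
        return a

    if reads_first:
        # Order 1 AElement, 2 AElement, 3 AReplace, 4 AReplace:
        # both reads finish before any update, so no copy is needed.
        x = arr[i]
        y = arr[j]
        b = a_replace(arr, i, y, last_use=True)
        c = a_replace(b, j, x, last_use=True)
    else:
        # An AReplace runs while the input is still needed by a later
        # AElement, so the input must be treated as read-only.
        y = arr[j]
        b = a_replace(arr, i, y, last_use=False)
        x = arr[i]
        c = a_replace(b, j, x, last_use=True)
    return c, copies


print(swap([1, 2, 3], 0, 2))                     # ([3, 2, 1], 0)
print(swap([1, 2, 3], 0, 2, reads_first=False))  # ([3, 2, 1], 1)
```

Note that the in-place path overwrites the caller's list, which is only safe when the caller's own use of the array is also its last.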
Interprocedural Substitution
Previous discussion concerned simple nodes, which can be analyzed at compiler design time
Information about a function is needed in order to use substitution
– Does the function modify an input?
– Will an input be chained to an output?

Intersubgraph Substitution
Substitution analysis is done for each construct
Same basic principles

Determining the Evaluation Order
Evaluation order can impact the efficiency of substitution
Naïve implementation selects the next node to evaluate at random
Hints tell the algorithm which nodes should be evaluated before and after other nodes if possible
Hints are ad hoc?

Pattern Matching
Replace hard-to-optimize pieces of code
Patterns are language-specific
Patterns are detected using "ad hoc" methods

Substructure Sharing
Allow substructures to be referenced without copies
AElement can be treated as a NoOp
Happens after substitution analysis – less important
Same principles as substitution analysis

Substructure Targeting
Allow structures to be built from substructures without copies
Similar to substructure sharing

Results
Compared optimizations versus naïve implementation
Optimizations eliminate all copies for bubble sort
Informal comparison to a run-time optimizer shows improvements

Conclusions
Substitution, pattern matching and substructure sharing can almost eliminate unnecessary copies in a single-assignment language.
Copy elimination no longer has to be done at run time.
Single-assignment languages should be more efficient for parallel programs.
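As a rough illustration of the evaluation-order hints described under "Determining the Evaluation Order", the sketch below schedules a small data-flow graph and, among the ready nodes, prefers those that only read their inputs. The graph encoding, the name `schedule`, and the "reads before updates" rule are assumptions made for this example. The slides do not say which hints the paper actually uses.

```python
def schedule(nodes, prefer_reads=True):
    """Return an evaluation order for a data-flow graph.

    nodes maps a node name to (predecessor names, modifies_input).
    Hypothetical hint: among ready nodes, evaluate read-only nodes
    first so that later updates are more likely to be a last use.
    """
    done, order = set(), []
    while len(order) < len(nodes):
        ready = [name for name, (preds, _) in nodes.items()
                 if name not in done and all(p in done for p in preds)]
        if not ready:
            raise ValueError("cycle in data-flow graph")
        if prefer_reads:
            # Stable sort: read-only nodes (False) come before updates.
            ready.sort(key=lambda name: nodes[name][1])
        chosen = ready[0]
        done.add(chosen)
        order.append(chosen)
    return order


# Swap(A, i, j) as a graph: two AElement reads, two AReplace updates.
SWAP_GRAPH = {
    "elem_j": ([], False),
    "replace_i": (["elem_j"], True),
    "elem_i": ([], False),
    "replace_j": (["replace_i", "elem_i"], True),
}

print(schedule(SWAP_GRAPH))
# ['elem_j', 'elem_i', 'replace_i', 'replace_j']
print(schedule(SWAP_GRAPH, prefer_reads=False))
# ['elem_j', 'replace_i', 'elem_i', 'replace_j']
```

With the hint, both reads run before the first update, so the input array has no remaining readers when it is modified. Without it, the first update runs while a read is still pending.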