Compilers I - Chapter 6: Optimisation and data-flow analysis
October 09
• Lecturers:
– Part I: Paul Kelly (phjk@doc.ic.ac.uk)
• Office: room 423
– Part II: Naranker Dulay (nd@doc.ic.ac.uk)
• Office: room 562
• Materials:
– Textbook
– Course web pages (http://www.doc.ic.ac.uk/~phjk/Compilers)
Overview
• This introductory course has focussed so far on fast, simple techniques which generate code that works reasonably well
• In this chapter we briefly look at what optimising compilers do, and how they do it
• Compare “gcc file.c” versus “gcc -O file.c”
• According to the gcc manual page (“man gcc”):
– Without `-O', the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results. Statements are independent: if you stop the program with a breakpoint between statements, you can then assign a new value to any variable or change the program counter to any other statement in the function and get exactly the results you would expect from the source code.
– Without `-O', only variables declared “register” are allocated in registers
The plan
• To optimise or not to optimise?
• High-level vs low-level; role of analysis
• Peephole optimisation
• Local, global, interprocedural
– Loop optimisations
– Where optimisation fits in the compiler
• Example: live ranges
– Live ranges as a data flow problem
– Solving the data-flow equations
– Deriving the interference graph
• Other data-flow analyses
• Loop-invariant code and code motion optimisations
– More sophisticated optimisations
Optimisation: example
• Consider the loop from tutorial exercise 4:
void P(int i, int j)
{
  int k, tmp;
  for (k=0; k<100; k++) {
    tmp = A[i+k];
    A[i+k] = A[j+k];
    A[j+k] = tmp;
  }
}
• What can optimisation do here?
Compilers Chapter 6 (c) Paul Kelly, Imperial College London 1
Without optimisation…
_P:
    subl $36,%esp
    pushl %ebp
    pushl %ebx
    nop
    movl $0,28(%esp)
    .align 4
L3:
    cmpl $99,28(%esp)
    jle L6
    jmp L4
    .align 4
L6:
    movl 48(%esp),%eax
    movl 28(%esp),%edx
    addl %edx,%eax
    leal 0(,%eax,4),%edx
    movl $_A,%eax
    movl (%edx,%eax),%edx
    movl %edx,24(%esp)
    movl 48(%esp),%eax
    movl 28(%esp),%ecx
    leal (%ecx,%eax),%edx
    leal 0(,%edx,4),%eax
    movl $_A,%edx
    movl 52(%esp),%ecx
    movl 28(%esp),%ebx
    addl %ebx,%ecx
    leal 0(,%ecx,4),%ebx
    movl $_A,%ecx
    movl (%ebx,%ecx),%ebx
    movl %ebx,(%eax,%edx)
    movl 52(%esp),%eax
    movl 28(%esp),%ecx
    leal (%ecx,%eax),%edx
    leal 0(,%edx,4),%eax
    movl $_A,%edx
    movl 24(%esp),%ecx
    movl %ecx,(%eax,%edx)
L5:
    incl 28(%esp)
    jmp L3
    .align 4
L4:
L2:
    popl %ebx
    popl %ebp
    addl $36,%esp
    ret
31 instructions in loop
Without optimisation, code is large, slow, but compiles quickly and works well with the debugger
Performance:
• 27.5ns per iteration (gcc 2.95, 800MHz Pentium III)
• 8.2ns per iteration (gcc 3.2.2, 2GHz Pentium IV)
With optimisation:
_P:
    pushl %edi
    pushl %esi
    movl $99,%edi
    pushl %ebx
    movl $_A,%esi
    movl 20(%esp),%ebx
    movl 16(%esp),%ecx
    sall $2,%ebx
    sall $2,%ecx
    .align 4
L6:
    movl (%esi,%ecx),%edx
    movl (%esi,%ebx),%eax
    movl %eax,(%esi,%ecx)
    movl %edx,(%esi,%ebx)
    addl $4,%ecx
    addl $4,%ebx
    decl %edi
    jns L6
    popl %ebx
    popl %esi
    popl %edi
    ret
8 instructions in loop
• In this extreme example, optimised code is four times faster
– Use registers not stack
– One jump per iteration
– Loop-invariant offset calculation moved out
– Array pointers incremented instead of recalculated
– Loop control variable replaced with down-counter
• Even faster code is possible by loop unrolling
Performance:
• 6.71ns per iteration (gcc 2.95, 800MHz Pentium III)
• 3.4ns per iteration (gcc 3.2.2, 2GHz Pentium IV)
Optimisation principles…
• To generate really good code, need to combine many techniques, including both high-level and low-level
• High-level example: inlining
– replace a call “f(x)” with the function body itself
– Avoids call/return overheads
– Also creates further opportunities…
– Can we inline virtual method calls “x.f(y)”?
– Need static analysis of possible types of “x”
• Low-level example: instruction scheduling
– Re-order instructions so processor executes them in parallel
– To switch order of load A[i] and store A[j], need dependence analysis: could i and j refer to same location?
A simple local technique – peephole optimisation
• Scan assembly code, replacing obviously inane combinations of instructions (eg mov R0,a; mov a,R0)
• Easy to implement:
peep :: [Instruction] -> [Instruction]
peep (Store r1 dest : Load r2 src : rest)
  | src == dest = Store r1 dest : (peep (Load r2 r1 : rest))
  | otherwise   = Store r1 dest : (peep (Load r2 src : rest))
peep (instr : rest) = instr : peep rest   -- any other instruction is kept as it is
peep []             = []
• Endless possibilities…
• Phase ordering problem: in which sequence should optimisations be applied?
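The peephole rule above can be tried out with a minimal, self-contained sketch. The `Operand` and `Instruction` types below are assumptions made for illustration (the slides do not define them); only the `Store`/`Load` rule itself comes from the slide.

```haskell
-- Hypothetical operand type: a register or a named memory location.
data Operand = Reg Int | Mem String
  deriving (Eq, Show)

-- Hypothetical instruction type, just large enough for the rule.
data Instruction
  = Store Operand Operand   -- Store src dest: copy register src to location dest
  | Load Operand Operand    -- Load dest src: copy src into register dest
  | Other String            -- any instruction this pass does not look at
  deriving (Eq, Show)

-- A store followed by a load from the same location becomes a store
-- followed by a register-to-register copy; everything else is unchanged.
peep :: [Instruction] -> [Instruction]
peep (Store r1 dest : Load r2 src : rest)
  | src == dest = Store r1 dest : peep (Load r2 r1 : rest)
  | otherwise   = Store r1 dest : peep (Load r2 src : rest)
peep (instr : rest) = instr : peep rest
peep []             = []
```

In GHCi, `peep [Store (Reg 1) (Mem "a"), Load (Reg 2) (Mem "a")]` rewrites the load to read from register 1 instead of memory. The sketch assumes the memory location is ordinary storage; a location with side effects on reads would make the rewrite unsafe.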
Spectrum…
• Peephole optimisation works at instruction level
• The Sethi-Ullman “weights” algorithm: expressions
• “Local” optimisation works at the level of basic blocks – a sequence of instructions which has a single point of entry and a single point of exit
• “Global” optimisation works on a whole procedure
• Interprocedural optimisation works on the whole program
Local: generally runs quickly and easy to validate
Global: may have worse-than-linear complexity, eg O(N^2) where N is number of basic blocks, or local variables
Interprocedural: rare – hard to avoid excessive compilation time
Some loop optimisations…
• Loop-invariant code motion
– An instruction is loop-invariant if its operands can only arrive from outside the loop
– move loop-invariant instructions into loop header
• Detection of induction variables
– Induction variable is a value which increases/decreases by a (loop-invariant) constant on each iteration
• Strength reduction: calculate induction variable by incrementing, instead of by multiplying other induction variables
• Control variable selection: replace loop control variable with one of the induction variables actually used in the loop
Loop optimisations - example
int P(int N, int M)
{
  int i, u, v, w, x, y;
  int z = 0;
  for (i=0; i<N; i++) {
    w = w+10;
    x = w*10;
    y = z*(w+x);
    u = w+x+y+N+M;
    v = v+u;
  }
  return v;
}
1. y is constant (constant propagation, Appel pg457)
2. w+x is dead code (dead code elimination, pg457, 397)
3. y+N+M is loop-invariant (loop-invariant code motion, pg422)
4. i, w and x are induction variables (induction variable recognition, pg426)
5. x increases by 100 each iteration (strength reduction, ditto)
6. i is used only to control the loop, and can be omitted if convenient (rewriting comparisons, pg428)
Where does optimisation happen?
[Diagram: Source Language Program (char string) → Analysis → (internal representation) → Synthesis → Target Language Program (char string)]
[Diagram, further decomposition: Lexical Analysis → Syntax Analysis → Semantic Analysis → Intermediate Code Generation → Optimisation → Code Generator, with the Abstract Syntax Tree and the Symbol Table shared between phases]
Optimisation:
• Input: intermediate code
• Output: intermediate code
• Uses: symbol table, semantic analysis
Intermediate code
• In our simple compiler, translator traverses AST and produces assembler code directly
• In optimising compiler, translator traverses AST and produces “intermediate code”
• Intermediate code is designed to
– Represent all primitive operations necessary to execute program
– In a uniform way, easy to analyse and manipulate
– Independently of target instruction set
• Compiler writers argue… Appel advocates two IRs:
• Tree: before instruction selection
• FlowGraph: after instruction selection
• IR uses “temporaries” T0, T1, T2… instead of real registers; after optimisation, use graph colouring to assign temporaries to real registers
Dataflow analysis (DFA)
• Optimisation consists of analysis and transformation
• Analysis: deduce program properties from IR
– Analyse effect of each instruction
– Compose these effects to derive information about the entire procedure
• Consider: Add (Reg T0) (Reg T1)
– Uses temporaries T0 and T1
– Kills old definition of T1
– Generates new definition of T1
• We will see how to do “dataflow analysis” in order to use this local information to derive global properties
Example dataflow analysis: live ranges
• Recall graph colouring:
1. Generate code using temporaries T0… instead of registers
2. For each temporary Ti, find Ti’s “live range” – the set of instructions for which Ti must reside in a register
3. LiveRange(Ti) intersects LiveRange(Tj) means they have to be allocated to different registers – they interfere
4. Assemble the register interference graph (RIG)
5. Colour the RIG by assigning real registers to temporaries avoiding interference
6. If successful, replace temporaries with registers and generate code
7. If graph cannot be coloured, find a temporary to spill to memory, then retry
Preliminary: build the control flow graph
• data CFG = ControlFlowGraph [CFGNode]
• data CFGNode = Node Id Instruction [Register] [Register] [Id]   (defs uses succs)
• type Id = Int
• data Register = D Int | T Int   (temporaries before, real after)
• buildCFG :: [Instruction] -> CFG
• Each node of the control flow graph contains an instruction, together with:
– nodeDefs cfgnode = list of temporaries which this instruction updates
– nodeUses cfgnode = list of temporaries which this instruction reads
– nodeSuccs cfgnode = list of nodes which might be executed next
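One way `buildCFG` might work is sketched below for a toy instruction set. Everything here is an assumption for illustration: the `Instr` type, the simplified `Node` record (no instruction field) and the use of strings for temporaries are not the course's definitions. Instructions are numbered from 1, labels are resolved to the number of the next real instruction, and a conditional branch gets two successors (fall-through first, then the target).

```haskell
import Data.Maybe (fromMaybe)

type Id   = Int
type Temp = String

-- Hypothetical toy instruction set.
data Instr
  = Label String              -- branch target; not an instruction itself
  | Jump String               -- unconditional branch
  | Branch String             -- conditional branch: may also fall through
  | Op String [Temp] [Temp]   -- mnemonic, temporaries read, temporaries written

-- Simplified CFG node: id, defs, uses, successor ids.
data Node = Node { nodeId :: Id, nodeDefs :: [Temp], nodeUses :: [Temp], nodeSuccs :: [Id] }
  deriving (Eq, Show)

buildCFG :: [Instr] -> [Node]
buildCFG prog = [ mkNode i ins | (i, ins) <- tagged, not (isLabel ins) ]
  where
    -- Pair every item with the id of the next real instruction at or after it.
    tagged = tag 1 prog
    tag _ []               = []
    tag i (Label l : rest) = (i, Label l) : tag i rest
    tag i (ins : rest)     = (i, ins) : tag (i + 1) rest
    isLabel (Label _) = True
    isLabel _         = False
    labels   = [ (l, i) | (i, Label l) <- tagged ]
    target l = fromMaybe (error ("undefined label " ++ l)) (lookup l labels)
    mkNode i (Jump l)     = Node i [] [] [target l]
    mkNode i (Branch l)   = Node i [] [] [i + 1, target l]
    mkNode i (Op _ us ds) = Node i ds us [i + 1]
    mkNode i (Label _)    = Node i [] [] []   -- not reached: labels are filtered out

-- A while loop containing an if/else, written in the toy instruction set.
example :: [Instr]
example =
  [ Jump "L2"
  , Label "L1"
  , Op "cmp" ["a", "b"] []
  , Branch "L3"
  , Op "mul" ["a"] ["a"]
  , Op "mov" ["a"] ["b"]
  , Op "add" ["b"] ["b"]
  , Jump "L4"
  , Label "L3"
  , Op "mov" ["b"] ["a"]
  , Op "sub" ["a"] ["a"]
  , Label "L4"
  , Label "L2"
  , Op "cmp" ["b"] []
  , Branch "L1"
  ]
```

In GHCi, `map nodeSuccs (buildCFG example)` lists the successors of nodes 1 to 11. The fall-through successor of the last instruction (id 12) is not a node; it stands for the exit of the procedure.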
Finding live ranges… example
Source code
while (b<10)
{
  if (b<a) {
    a = a*7;
    b = a+1;
  } else
    a = b-1;
}
Intermediate code and control flow graph
Id  Instruction  Uses  Defs  Succs
1   Bra L2                   10
    L1:
2   cmp b a      a,b         3
3   bge L3                   4,8
4   mul #7 a     a     a     5
5   mov a b      a     b     6
6   add #1 b     b     b     7
7   bra L4                   10
    L3:
8   mov b a      b     a     9
9   sub #1 a     a     a     10
    L4:
    L2:
10  Cmp b #10    b           11
11  Blt L1                   12,2
Live variable analysis - definition
• Point: any location between adjacent nodes
• Path: a sequence of points p_1 .. p_i, p_{i+1} .. p_n such that p_{i+1} is the immediate successor of p_i in the CFG
• “x is live at p”: for some variable x and point p, the value of x could be used along some path starting at p.
[Figure: the control flow graph of the example (nodes 1 to 11) drawn as a graph and repeated over four slides, each restating the definition “x is live at p”: for some variable x and point p, the value of x could be used along some path starting at p. The slides in turn ask the reader to consider variable b at node 1, at node 2 and at node 4.]
Dataflow equations for live variable analysis
Define:
• LiveIn(n): the set of temporaries live immediately before node n
• LiveOut(n): the set of temporaries live immediately after node n
• A variable is live immediately after node n if it is live before any of n’s successors:
– LiveOut(n) = ∪ s ∈ succ(n) LiveIn(s)
• A variable is live immediately before node n if:
– It is live after node n (ie some later instruction reads it)
– Unless it is overwritten by node n
OR
– It is used by node n (ie the instruction reads it)
– LiveIn(n) = uses(n) ∪ (LiveOut(n) − defs(n))
• What’s the difference between LiveIn and LiveOut?
LiveIn(n): the set of variables that could be used along some path starting at the point just before n
n: node 6, add #1 b (uses b, defs b, succ 7)
LiveOut(n): the set of variables that could be used along some path starting at the point just after n
Writing the two equations out for every node of the example (uses, defs and succs as in the control flow graph table):
LiveIn(n) = uses(n) ∪ (LiveOut(n) − defs(n))   for n = 1 .. 11
LiveOut(n) = ∪ s ∈ succ(n) LiveIn(s)   for n = 1 .. 11
• 22 simultaneous equations
Clearer if we substitute in the successors, eg succs(11) = {12,2}:
LiveIn(n) = uses(n) ∪ (LiveOut(n) − defs(n))   for n = 1 .. 11
LiveOut(1) = LiveIn(10)
LiveOut(2) = LiveIn(3)
LiveOut(3) = LiveIn(4) ∪ LiveIn(8)
LiveOut(4) = LiveIn(5)
LiveOut(5) = LiveIn(6)
LiveOut(6) = LiveIn(7)
LiveOut(7) = LiveIn(10)
LiveOut(8) = LiveIn(9)
LiveOut(9) = LiveIn(10)
LiveOut(10) = LiveIn(11)
LiveOut(11) = LiveIn(12) ∪ LiveIn(2)
• 22 simultaneous equations
Solving the dataflow equations
• We have a system of simultaneous equations for LiveIn(n) and LiveOut(n) for each node n
• How can we solve them?
Solving the dataflow equations
• Idea: Iterate!
for each n in CFG {
  LiveIn(n) := {}; LiveOut(n) := {};
}
repeat {
  for each n in CFG {
    LiveIn(n) = uses(n) ∪ (LiveOut(n) − defs(n));
    LiveOut(n) = ∪ s ∈ succ(n) LiveIn(s);
  }
} until LiveIn and LiveOut stop changing
• see Appel pg 226 for another example
Iteration… walkthrough: Step 0
Node  uses  defs  succs  in  out
1                 10     {}  {}
2     a,b         3      {}  {}
3                 4,8    {}  {}
4     a     a     5      {}  {}
5     a     b     6      {}  {}
6     b     b     7      {}  {}
7                 10     {}  {}
8     b     a     9      {}  {}
9     a     a     10     {}  {}
10    b           11     {}  {}
11                12,2   {}  {}
Q: should I process the nodes in order?
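The iteration can be sketched in a few lines of Haskell. This is a minimal illustration, not the course's code (that is in the appendix): the `Node` record, the use of strings for temporaries and the names `update` and `solve` are assumptions. `solve` visits the nodes in the order they are given, so it can also be used to experiment with the question of traversal order; it returns the number of passes made together with the final sets.

```haskell
import Data.List (nub, sort, (\\))
import Data.Maybe (fromMaybe)

type Id   = Int
type Temp = String

-- Simplified CFG node: id, uses, defs, successor ids.
data Node = Node { nodeId :: Id, nodeUses :: [Temp], nodeDefs :: [Temp], nodeSuccs :: [Id] }

-- For each node id: (LiveIn, LiveOut), kept as sorted lists without duplicates.
type Liveness = [(Id, ([Temp], [Temp]))]

-- Nodes outside the CFG (the exit, node 12 in the example) have empty sets.
get :: Id -> Liveness -> ([Temp], [Temp])
get n env = fromMaybe ([], []) (lookup n env)

set :: Id -> ([Temp], [Temp]) -> Liveness -> Liveness
set n v env = [ (i, if i == n then v else w) | (i, w) <- env ]

norm :: [Temp] -> [Temp]
norm = sort . nub

-- Apply both equations to one node: LiveIn from the current LiveOut,
-- then LiveOut from the current LiveIn sets of the successors.
update :: Liveness -> Node -> Liveness
update env node = set n (newIn, newOut) env
  where
    n      = nodeId node
    oldOut = snd (get n env)
    newIn  = norm (nodeUses node ++ (oldOut \\ nodeDefs node))
    newOut = norm (concat [ fst (get s (set n (newIn, oldOut) env)) | s <- nodeSuccs node ])

-- Repeat passes over the nodes, in the order given, until nothing changes.
-- The count includes the final pass that confirms nothing changed.
solve :: [Node] -> (Int, Liveness)
solve nodes = go 1 (sort [ (nodeId n, ([], [])) | n <- nodes ])
  where
    go k env
      | env' == env = (k, env)
      | otherwise   = go (k + 1) env'
      where env' = foldl update env nodes

-- The 11-node example from the slides.
example :: [Node]
example =
  [ Node 1  []         []    [10]
  , Node 2  ["a", "b"] []    [3]
  , Node 3  []         []    [4, 8]
  , Node 4  ["a"]      ["a"] [5]
  , Node 5  ["a"]      ["b"] [6]
  , Node 6  ["b"]      ["b"] [7]
  , Node 7  []         []    [10]
  , Node 8  ["b"]      ["a"] [9]
  , Node 9  ["a"]      ["a"] [10]
  , Node 10 ["b"]      []    [11]
  , Node 11 []         []    [12, 2]
  ]
```

In GHCi, compare `solve example` with `solve (reverse example)`: both reach the same sets, but visiting the nodes from last to first needs fewer passes, because liveness information flows from successors back to predecessors.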
Iteration… walkthrough: Step 1
Node  uses  defs  succs  in     out
1                 10     {}     {b}
2     a,b         3      {a,b}  {}
3                 4,8    {}     {a,b}
4     a     a     5      {a}    {a}
5     a     b     6      {a}    {b}
6     b     b     7      {b}    {}
7                 10     {}     {b}
8     b     a     9      {b}    {a}
9     a     a     10     {a}    {b}
10    b           11     {b}    {}
11                12,2   {}     {}
Q: should I process the nodes in order?
• see Appel pg 226 for another example
[Table: the in and out sets of every node after each of Steps 1 to 5, obtained by repeating LiveIn(n) = uses(n) ∪ (LiveOut(n) − defs(n)); LiveOut(n) = ∪ s ∈ succ(n) LiveIn(s); until no set changes.]
Real example: factorial loop
Concrete syntax
program
  declare x : Integer
  declare a : Integer
begin
  a := 1 :
  for x = 1 to 10
    a := a * x
  end
end
Abstract syntax
(Program [Decl "a" Integer]
  [(Assign (Var "a") (Const 1)),
   (For "x" (Const 1) (Const 10)
     [(Assign (Var "a")
        (Binop Times (Ref (Var "a")) (Ref (Var "x"))))]
   )])
Code
.data
; Integer variable a has been allocated to T0
.text
    move.l #1, T0
    move.l #10, T1
    move.l #1, T2
    bra L2
L1:
    move.l T2, T3
    move.l T0, T4
    mul.l T3, T4
    move.l T4, T0
    add.l #1, T2
L2:
    cmp.l T1, T2
    bgt L3
    bra L1
L3:
    move.l T2, x   (updates variable x on exit from loop – a bug!)
Real example: factorial loop (control flow graph)
(Node id instrn defs uses succs preds)
Node 0 (Mov (ImmNum 1) (Reg T0)) [T0] [] [1] []
Node 1 (Mov (ImmNum 10) (Reg T1)) [T1] [] [2] [0]
Node 2 (Mov (ImmNum 1) (Reg T2)) [T2] [] [3] [1]
Node 3 (Bra "L2") [] [] [9] [2]
Node 4 (Mov (Reg T2) (Reg T3)) [T3] [T2] [5] [11]
Node 5 (Mov (Reg T0) (Reg T4)) [T4] [T0] [6] [4]
Node 6 (Mul (Reg T3) (Reg T4)) [T4] [T3,T4] [7] [5]
Node 7 (Mov (Reg T4) (Reg T0)) [T0] [T4] [8] [6]
Node 8 (Add (ImmNum 1) (Reg T2)) [T2] [T2] [9] [7]
Node 9 (Cmp (Reg T1) (Reg T2)) [] [T1,T2] [10] [3,8]
Node 10 (Bgt "L3") [] [] [11,12] [9]
Node 11 (Bra "L1") [] [] [4] [10]
Node 12 (Mov (Reg T2) (Abs "x")) [] [T2] [13] [10]
Node 13 Halt [] [] [] [12]
Live range analysis for factorial example: Steps 0 to 4
LiveIns
Node  Step 0  Step 1   Step 2   Step 3      Step 4
0     []      []       []       []          []
1     []      []       []       []          []
2     []      []       []       [T1]        [T1]
3     []      []       [T1,T2]  [T1,T2]     [T1,T2]
4     []      [T2]     [T2,T0]  [T2,T0]     [T2,T0]
5     []      [T0]     [T0,T3]  [T0,T3]     [T0,T3,T2]
6     []      [T3,T4]  [T3,T4]  [T3,T4,T2]  [T3,T4,T2,T1]
7     []      [T4]     [T4,T2]  [T4,T2,T1]  [T4,T2,T1]
8     []      [T2]     [T2,T1]  [T2,T1]     [T2,T1]
9     []      [T1,T2]  [T1,T2]  [T1,T2]     [T1,T2]
10    []      []       [T2]     [T2]        [T2]
11    []      []       []       [T2]        [T2,T0]
12    []      [T2]     [T2]     [T2]        [T2]
13    []      []       []       []          []
LiveOuts
Node  Step 0  Step 1   Step 2   Step 3      Step 4
0     []      []       []       []          []
1     []      []       []       [T1]        [T1]
2     []      []       [T1,T2]  [T1,T2]     [T1,T2]
3     []      [T1,T2]  [T1,T2]  [T1,T2]     [T1,T2]
4     []      [T0]     [T0,T3]  [T0,T3]     [T0,T3,T2]
5     []      [T3,T4]  [T3,T4]  [T3,T4,T2]  [T3,T4,T2,T1]
6     []      [T4]     [T4,T2]  [T4,T2,T1]  [T4,T2,T1]
7     []      [T2]     [T2,T1]  [T2,T1]     [T2,T1]
8     []      [T1,T2]  [T1,T2]  [T1,T2]     [T1,T2]
9     []      []       [T2]     [T2]        [T2]
10    []      [T2]     [T2]     [T2]        [T2,T0]
11    []      []       [T2]     [T2,T0]     [T2,T0]
12    []      []       []       []          []
13    []      []       []       []          []
Live range analysis for factorial example: Steps 5 to 9
LiveIns
Node  Step 5         Step 6         Step 7         Step 8         Step 9
0     []             []             []             []             []
1     []             []             []             []             [T0]
2     [T1]           [T1]           [T1]           [T1,T0]        [T1,T0]
3     [T1,T2]        [T1,T2]        [T1,T2,T0]     [T1,T2,T0]     [T1,T2,T0]
4     [T2,T0]        [T2,T0,T1]     [T2,T0,T1]     [T2,T0,T1]     [T2,T0,T1]
5     [T0,T3,T2,T1]  [T0,T3,T2,T1]  [T0,T3,T2,T1]  [T0,T3,T2,T1]  [T0,T3,T2,T1]
6     [T3,T4,T2,T1]  [T3,T4,T2,T1]  [T3,T4,T2,T1]  [T3,T4,T2,T1]  [T3,T4,T2,T1]
7     [T4,T2,T1]     [T4,T2,T1]     [T4,T2,T1]     [T4,T2,T1]     [T4,T2,T1]
8     [T2,T1]        [T2,T1]        [T2,T1,T0]     [T2,T1,T0]     [T2,T1,T0]
9     [T1,T2]        [T1,T2,T0]     [T1,T2,T0]     [T1,T2,T0]     [T1,T2,T0]
10    [T2,T0]        [T2,T0]        [T2,T0]        [T2,T0]        [T2,T0,T1]
11    [T2,T0]        [T2,T0]        [T2,T0]        [T2,T0,T1]     [T2,T0,T1]
12    [T2]           [T2]           [T2]           [T2]           [T2]
13    []             []             []             []             []
LiveOuts
Node  Step 5         Step 6         Step 7         Step 8         Step 9
0     []             []             []             []             [T0]
1     [T1]           [T1]           [T1]           [T1,T0]        [T1,T0]
2     [T1,T2]        [T1,T2]        [T1,T2,T0]     [T1,T2,T0]     [T1,T2,T0]
3     [T1,T2]        [T1,T2,T0]     [T1,T2,T0]     [T1,T2,T0]     [T1,T2,T0]
4     [T0,T3,T2,T1]  [T0,T3,T2,T1]  [T0,T3,T2,T1]  [T0,T3,T2,T1]  [T0,T3,T2,T1]
5     [T3,T4,T2,T1]  [T3,T4,T2,T1]  [T3,T4,T2,T1]  [T3,T4,T2,T1]  [T3,T4,T2,T1]
6     [T4,T2,T1]     [T4,T2,T1]     [T4,T2,T1]     [T4,T2,T1]     [T4,T2,T1]
7     [T2,T1]        [T2,T1]        [T2,T1,T0]     [T2,T1,T0]     [T2,T1,T0]
8     [T1,T2]        [T1,T2,T0]     [T1,T2,T0]     [T1,T2,T0]     [T1,T2,T0]
9     [T2,T0]        [T2,T0]        [T2,T0]        [T2,T0]        [T2,T0,T1]
10    [T2,T0]        [T2,T0]        [T2,T0]        [T2,T0,T1]     [T2,T0,T1]
11    [T2,T0]        [T2,T0]        [T2,T0,T1]     [T2,T0,T1]     [T2,T0,T1]
12    []             []             []             []             []
13    []             []             []             []             []
Derive interference graph from live ranges
Recall definition:
• “x is live at p”: for some variable x and point p, the value of x could be used along some path starting at p.
• Eg: liveOut(7) = [T2,T1,T0]: “The values of T2, T1 and T0 could be used along some path starting from 7”
• LiveOut:
[(0,[T0]),
 (1,[T1,T0]),
 (2,[T1,T2,T0]),
 (3,[T1,T2,T0]),
 (4,[T0,T3,T2,T1]),
 (5,[T3,T4,T2,T1]),
 (6,[T4,T2,T1]),
 (7,[T2,T1,T0]),
 (8,[T1,T2,T0]),
 (9,[T2,T0,T1]),
 (10,[T2,T0,T1]),
 (11,[T2,T0,T1]),
 (12,[]),
 (13,[])]
Interference: find overlapping live ranges
• For each temporary t
• For each node id
• If t is in liveOut(id)
• Then interferes(t) includes liveOut(id)
• Interference graph:
interferes =
[(T0,[T0,T1,T2,T3]),
 (T1,[T1,T0,T2,T3,T4]),
 (T2,[T1,T2,T0,T3,T4]),
 (T3,[T0,T3,T2,T1,T4]),
 (T4,[T3,T4,T2,T1])]
Derive interference graph from live ranges
• Interference graph:
[(T0,[T0,T1,T2,T3]),
 (T1,[T1,T0,T2,T3,T4]),
 (T2,[T1,T2,T0,T3,T4]),
 (T3,[T0,T3,T2,T1,T4]),
 (T4,[T3,T4,T2,T1])]
[Figure: the interference graph drawn with nodes T0, T1, T2, T3 and T4]
Use interference graph to assign temporaries
• Find colouring:
[(T0,D0),(T1,D1),(T2,D2),(T3,D3),(T4,D0)]
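The two steps (deriving the interference lists from LiveOut, then finding a colouring) can be sketched as follows. This is a minimal illustration under stated assumptions: temporary Ti and register Dj are both represented by plain Ints, the names `interference` and `colour` are made up for the sketch, and the colouring is a simple greedy one (lowest free register, no backtracking), which is not necessarily what a production allocator does.

```haskell
import Data.List (nub, sort)
import Data.Maybe (fromMaybe)

type Temp = Int   -- temporary Ti is represented by i
type Reg  = Int   -- real register Dj is represented by j

-- Final LiveOut sets of the factorial example, copied from the slide.
liveOuts :: [(Int, [Temp])]
liveOuts =
  [ (0, [0]), (1, [1,0]), (2, [1,2,0]), (3, [1,2,0]), (4, [0,3,2,1])
  , (5, [3,4,2,1]), (6, [4,2,1]), (7, [2,1,0]), (8, [1,2,0]), (9, [2,0,1])
  , (10, [2,0,1]), (11, [2,0,1]), (12, []), (13, []) ]

-- interferes(t) is the union of every LiveOut set that contains t
-- (so, as on the slide, it also contains t itself).
interference :: [(Int, [Temp])] -> [(Temp, [Temp])]
interference outs =
  [ (t, sort (nub (concat [ l | (_, l) <- outs, t `elem` l ]))) | t <- temps ]
  where temps = sort (nub (concatMap snd outs))

-- Greedy colouring with k registers: each temporary gets the lowest-numbered
-- register not already given to one of its neighbours. Nothing means some
-- temporary could not be coloured and would have to be spilled.
colour :: Int -> [(Temp, [Temp])] -> Maybe [(Temp, Reg)]
colour k graph = go [] (map fst graph)
  where
    go done []       = Just (reverse done)
    go done (t : ts) =
      let neighbours = filter (/= t) (fromMaybe [] (lookup t graph))
          taken      = [ r | (u, r) <- done, u `elem` neighbours ]
      in case [ r | r <- [0 .. k - 1], r `notElem` taken ] of
           []      -> Nothing
           (r : _) -> go ((t, r) : done) ts
```

In GHCi, `colour 4 (interference liveOuts)` gives T0 to T3 the registers D0 to D3 and lets T4 share D0 with T0, matching the colouring on the slide; with only 3 registers the result is `Nothing`.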
Applying the colouring:
Before colouring
.data
; Integer variable a has been allocated to T0
.text
    move.l #1, T0
    move.l #10, T1
    move.l #1, T2
    bra L2
L1:
    move.l T2, T3
    move.l T0, T4
    mul.l T3, T4
    move.l T4, T0
    add.l #1, T2
L2:
    cmp.l T1, T2
    bgt L3
    bra L1
L3:
    move.l T2, x
After colouring
.data
; Integer variable a has been allocated to D0
.text
    move.l #1, D0
    move.l #10, D1
    move.l #1, D2
    bra L2
L1:
    move.l D2, D3
    mul.l D3, D0   (T0 & T4 assigned to D0)
    add.l #1, D2
L2:
    cmp.l D1, D2
    bgt L3
    bra L1
L3:
    move.l D2, x
Live variable analysis… summary
• We found we could find live ranges by constructing a system of dataflow equations and solving it by iteration
• The algorithm always terminates: starting from empty sets, each pass can only add temporaries to LiveIn and LiveOut, and the sets cannot grow beyond the set of all temporaries
• The amount of work per iteration depends on program complexity - #instructions, #temporaries
• The number of iterations needed depends on the order in which the CFG is traversed…
– See EaC pg445, Appel pg226, pg399
– Live variable analysis is a backwards analysis – LiveIn(n) depends on its successors
– Number of iterations depends on program’s structural complexity – its “loop interconnectiveness”
APPENDIX: Liveness analysis, colouring in Haskell…
Detailed code is shown in the hope that it will make the concepts clearer; please don’t memorize it! Spend the time reading the textbook instead.
• Encode DFA equations:
newLiveIn liveIns liveOuts node
  = nodeUses node `union` ( (liveOutsOf node) \\ nodeDefs node )
  where
    liveOutsOf node = retrieve (nodeId node) liveOuts
newLiveOut liveIns liveOuts node
  = bigU [retrieve s liveIns | s <- nodeSuccs node]
  where bigU sets = nub (concat sets)
• Do one step: update LiveIn and LiveOut sets for each node:
updateLiveness [] (liveIns, liveOuts) = (liveIns, liveOuts)
updateLiveness (node:nodes) (liveIns, liveOuts)
  = updateLiveness nodes (newLiveIns, newLiveOuts)
  where
    newLiveIns = subst (nodeId node) liveIns (newLiveIn liveIns liveOuts node)
    newLiveOuts = subst (nodeId node) liveOuts (newLiveOut newLiveIns liveOuts node)
Solving DFAs in Haskell… (for completeness!)
• Iterate…
iterateUpdates nodes (liveIns, liveOuts)
  = let
      (newLiveIns, newLiveOuts) = updateLiveness nodes (liveIns, liveOuts)
    in
      if newLiveIns == liveIns && newLiveOuts == liveOuts
      then (newLiveIns, newLiveOuts)
      else iterateUpdates nodes (newLiveIns, newLiveOuts)
-- live ranges liveIn & liveOut, each a mapping from node to list of temps
findLiveRanges :: CFG -> ([(Id,[Register])], [(Id,[Register])])
findLiveRanges (ControlFlowGraph cfgnodes)
  = iterateUpdates cfgnodes (initialLiveIns, initialLiveOuts)
  where
    initialLiveIns = initialLiveOuts
    initialLiveOuts = [(id,[]) | id <- map nodeId cfgnodes]   -- an empty list for each node
• Now build the register interference graph (RIG):
buildInterferenceGraph cfg
  = [(t, nub (buildInterferenceList liveOuts t)) | t <- temporaries]   -- nub eliminates duplicates
  where
    (liveIns, liveOuts) = findLiveRanges cfg
    temporaries = findTemporaries cfg   -- findTemporaries lists temps used in code
buildInterferenceList [] t = []
buildInterferenceList ( (id,livelist) : liveIns) t
  | t `elem` livelist = livelist ++ buildInterferenceList liveIns t
  | otherwise = buildInterferenceList liveIns t
• If we assign Ti to Dj, will we have a conflict?
doesntInterfere :: (Register,Register) -> InterferenceGraph -> Bool
doesntInterfere (t,r) ifg
  = actualinterferences == []
  where
    actualinterferences = [ ai | ai <- potentialinterferences, ai == r ]
    -- retrieve finds the list corresponding to t;
    -- remove t itself, which also appears in the list
    potentialinterferences = retrieve t ifg \\ [t]
• Colour the graph – find a conflict-free assignment
type Colouring = [(Register, Register)]   -- (temporary, real register)
findColouring cfg ifg
  = let temporaries = findTemporaries cfg
    in findColouring' temporaries ifg
findColouring' :: [Register] -> InterferenceGraph -> Colouring
findColouring' [] ifg = []
findColouring' (t:ts) ifg
  = let
      possibleMappings = [(t,r) | r <- theRealRegisters]   -- theRealRegisters is [D0,D1..D31]
      validMappings = [(t,r) | (t,r) <- possibleMappings, doesntInterfere (t,r) ifg]
    in
      -- updateIFG replaces temps with regs
      head [ (t,r) : (findColouring' ts (updateIFG ifg (t,r))) | (t,r) <- validMappings ]
• If no colouring can be found, this function fails (the list above is empty). If this happens, we will have to “spill” one of the variables to memory and try again.
• This is a quick and dirty but dumb inefficient algorithm; see Appel pg239
• Put it all together…
applyColouring :: [Instruction] -> [Instruction]
applyColouring code
  = let
      cfg = buildCFG code
      colouring = findColouring cfg (buildInterferenceGraph cfg)
    in
      map (replaceTemporaries colouring) code
(where “replaceTemporaries colouring instruction” updates the instruction to use the specified real registers instead of temporaries)