# transformations by absences

```

Chapter 7
Control Dependences

— If-conversion
— Control dependence

Optimizing Compilers for Modern Architectures
Control Dependences
•    Constraints posed by control flow

DO 100 I = 1, N
S1         IF (A(I-1).GT. 0.0) GO TO 100     S2 δ₁ S1
S2            A(I) = A(I) + B(I)*C
100     CONTINUE

If we vectorize S2 while ignoring S1:
S2    A(1:N) = A(1:N) + B(1:N)*C
DO 100 I = 1, N
S1      IF (A(I-1).GT. 0.0) GO TO 100
100 CONTINUE

•    This is incorrect because we are missing a dependence: S1 decides whether S2 executes
•    There is a dependence from S1 to S2: a control dependence

Control Dependences
•    Two strategies to deal with control dependences:
— If-conversion: expose by converting to data dependences. Used for
vectorization
— Explicitly expose as control dependences. Used for automatic
parallelization

If-conversion
•    Underlying Idea: Convert statements affected by branches to
conditionally executed statements
DO 100 I = 1, N
S1      IF (A(I-1).GT. 0.0) GO TO 100
S2             A(I) = A(I) + B(I)*C
100 CONTINUE

can be converted to:

DO I = 1, N
IF (A(I-1).LE. 0.0) A(I) = A(I) + B(I)*C
ENDDO

If-conversion
DO 100 I = 1, N
S1     IF (A(I-1).GT. 0.0) GO TO 100
S2            A(I) = A(I) + B(I) * C
S3            B(I) = B(I) + A(I)
100 CONTINUE

•    can be converted to:
DO 100 I = 1, N
S2    IF (A(I-1).LE. 0.0) A(I) = A(I) + B(I) * C
S3    IF (A(I-1).LE. 0.0) B(I) = B(I) + A(I)
100 CONTINUE

•    S2 carries a recurrence on A and stays in a sequential loop; S3 can be vectorized using the Fortran WHERE statement:
DO 100 I = 1, N
S2      IF (A(I-1).LE. 0.0) A(I) = A(I) + B(I) * C
100   CONTINUE
S3   WHERE (A(0:N-1).LE. 0.0) B(1:N) = B(1:N) + A(1:N)

If-conversion
•    If-conversion assumes a target notation of guarded execution in
which each statement implicitly contains a logical expression
controlling its execution

S1 IF (A(I-1).GT. 0.0) GO TO 100
S2     A(I) = A(I) + B(I)*C
100 CONTINUE

•    with guards in place:

S1 M = A(I-1).GT. 0.0
S2    IF (.NOT. M) A(I) = A(I) + B(I)*C
100 CONTINUE

Branch Classification

•    Forward Branch: transfers control to a target that occurs
lexically after the branch but at the same level of nesting

•    Backward Branch: transfers control to a statement occurring
lexically before the branch but at the same level of nesting

•    Exit Branch: terminates one or more loops by transferring
control to a target outside a loop nest

If-conversion
•    If-conversion is a composition of two different transformations:
1.      Branch relocation
2.      Branch removal

Branch removal
•    Basic idea:
— Make a pass through the program.
— Maintain a Boolean expression cc that represents the condition that
must be true for the current statement to be executed
— On encountering a branch, conjoin the negation of its controlling
expression into cc, and record cc conjoined with the controlling
expression for the branch target
— On encountering the target of a branch, disjoin the condition
recorded for that branch into cc

Branch Removal: Forward Branches
•    Remove forward branches by inserting appropriate guards

DO I = 1,N
C1       IF (A(I).GT.10) GO TO 60
20           A(I) = A(I) + 10
C2           IF (B(I).GT.10) GO TO 80
40               B(I) = B(I) + 10
60           A(I) = B(I) + A(I)
80       B(I) = A(I) - 5
ENDDO

DO I = 1,N
m1 = A(I).GT.10
20       IF(.NOT.m1) A(I) = A(I) + 10
IF(.NOT.m1) m2 = B(I).GT.10
40       IF(.NOT.m1.AND..NOT.m2) B(I) = B(I) + 10
60       IF(.NOT.m1.AND..NOT.m2.OR.m1)A(I) = B(I) + A(I)
80       IF(.NOT.m1.AND..NOT.m2.OR.m1.OR..NOT.m1.AND.m2) B(I) = A(I) - 5
ENDDO

Branch Removal: Forward Branches
•    We can simplify to:
DO I = 1,N
m1 = A(I).GT.10
20     IF(.NOT.m1) A(I) = A(I) + 10
IF(.NOT.m1) m2 = B(I).GT.10
40     IF(.NOT.m1.AND..NOT.m2) B(I) = B(I) + 10
60     IF(m1.OR..NOT.m2) A(I) = B(I) + A(I)
80     B(I) = A(I) - 5
ENDDO

•    vectorize to:
m1(1:N) = A(1:N).GT.10
20 WHERE(.NOT.m1(1:N)) A(1:N) = A(1:N) + 10
WHERE(.NOT.m1(1:N)) m2(1:N) = B(1:N).GT.10
40 WHERE(.NOT.m1(1:N).AND..NOT.m2(1:N)) B(1:N) = B(1:N) + 10
60 WHERE(m1(1:N).OR..NOT.m2(1:N)) A(1:N) = B(1:N) + A(1:N)
80 B(1:N) = A(1:N) - 5

Branch Removal: Forward Branches
•    To show correctness we must establish:
— the guard for each statement instance in the new program is true if
and only if the corresponding statement instance in the old program is
executed; the exception is a statement introduced to capture the value
of a guard variable, which must be executed at the point where the
conditional expression would have been evaluated

— the order of execution of statements in the new program with true
guards is the same as the order of execution of those statements
in the original program

— Any expression with side effects is evaluated exactly as many times
in the new program as in the old program

Exit Branches
DO J = 1, M
DO I = 1, N
A(I,J) = B(I,J) + X
S         IF (L(I,J)) GO TO 200
C(I,J) = A(I,J) + Y
ENDDO
D(J) = A(N,J)
200    F(J) = C(10,J)
ENDDO

•    Exit branches are more complicated because they terminate a loop
•    Solution: relocate exit branches and convert them to forward
branches

Exit Branches
DO J = 1, M
DO I = 1, N
A(I,J) = B(I,J) + X
S                  IF (L(I,J)) GO TO 200
C(I,J) = A(I,J) + Y
ENDDO
D(J) = A(N,J)
200        F(J) = C(10,J)
ENDDO

DO J = 1, M
DO I = 1, N
IF (C1) A(I,J) = B(I,J) + X
Sa                        Code to set C1 and C2
IF (C2) C(I,J) = A(I,J) + Y
ENDDO
Sb                IF (.NOT.C1.OR..NOT.C2) GO TO 200
D(J) = A(N,J)
200           F(J) = C(10,J)
ENDDO
•    What should C1 and C2 be?

Exit Branches
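•    Statements in the inner loop should be executed only if the exit
branch was not taken on any previous iteration
•    For the ith iteration, C1 should therefore be
lm = .NOT.L(1,J) .AND. .NOT.L(2,J) .AND. ... .AND. .NOT.L(i-1,J)
and C2 is the same conjunction extended through .NOT.L(i,J), because
C(I,J) follows the exit test of iteration i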

DO J = 1, M
lm = .TRUE.
DO I = 1, N
IF (lm) A(I,J) = B(I,J) + X
IF (lm) m1 = .NOT. L(I,J)
lm = lm .AND. m1
IF (lm) C(I,J) = A(I,J) + Y
ENDDO
m2 = lm
IF (m2) D(J) = A(N,J)
200           F(J) = C(10,J)
ENDDO

Exit Branches
•    After forward substitution and expansion of lm, we get:
DO J = 1, M
lm(0,J) = .TRUE.
DO I = 1, N
IF (lm(I-1,J)) A(I,J) = B(I,J) + X
IF (lm(I-1,J)) m1 = .NOT.L(I,J)
lm(I,J) = lm(I-1,J) .AND. m1
IF (lm(I,J)) C(I,J) = A(I,J) + Y
ENDDO
IF (lm(N,J)) D(J) = A(N,J)
200           F(J) = C(10,J)
ENDDO

•    codegen will keep the recurrence on lm in a loop nest and vectorize the four remaining statements.

Exit Branches
•    After running codegen:
DO J = 1, M
lm(0,J) = .TRUE.
DO I = 1, N
IF (lm(I-1,J)) m1 =.NOT.L(I,J)
lm(I,J) = lm(I-1,J) .AND. m1
ENDDO
ENDDO
WHERE(lm(0:N-1,1:M)) A(1:N,1:M)=B(1:N,1:M)+X
WHERE(lm(1:N,1:M)) C(1:N,1:M)=A(1:N,1:M)+Y
WHERE(lm(N,1:M)) D(1:M) = A(N,1:M)
200 F(1:M) = C(10,1:M)

•    Procedure relocate_branches()

Backward Branches
•   Problems:
— They create implicit loops: backward control flow cannot be
simulated by simple guards
— They complicate removal of forward branches, because they may
create loops into which forward branches jump
IF (P) GO TO 200
...
100      S1
...
200      S2
...
IF (Q) GO TO 100

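•   Applying forward if-conversion alone is incorrect: when P is true, S1 is skipped even when it is reached again through the backward branch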
m1 = .NOT. P
...
100    IF (m1) S1
...
200     S2
...
IF (Q) GO TO 100

Backward Branches
•    Solutions?
— Do not if-convert any region enclosed by a backward control flow edge
— Eliminate backward branches through a variant of
if-conversion

•    Note that:
— S1 is executed on the first pass through the code only if P is false
— S1 is always executed when the backward branch is taken

•    Use a backward branch guard!

Backward Branches
•    Using a backward branch guard:

IF (P) GO TO 200
...
100     S1
...
200     S2
...
IF (Q) GO TO 100
•    converted to:
m = P
...
bb = .FALSE.
100    IF (.NOT.m .OR. (m.AND.bb)) S1
...
200       S2
...
IF (Q) THEN
bb = .TRUE.
GO TO 100
ENDIF

Backward Branches
•    In general, two ways a target of a backward branch can be
reached:
— Fall through
— Branch around the statement but reach it via a backward branch

•    Thus, if current condition just prior to target y is cc, the
branch condition is m, and the backward branch condition is bb,
the guard at y should be:
cc OR (m AND bb)

Complete Forward Branch Removal
1    Statement is branch target: combine (disjoin) set of conditions
associated with branches to that target with the current
condition passed from the lexical predecessor

2    Statement is any type except DO, ENDDO, CONTINUE: the
current condition is conjoined to the guard for the current
statement

3    Statement is a DO: invoke relocate_branches to remove exit
branches. Recur on body of the loop. May generate some
statements before the loop which should be guarded by the
current condition

Complete Forward Branch Removal
4    Statement is a conditional branch: two copies of the current condition cc are used:
— The compiler generated variable associated with the new condition
is conjoined with cc and the result is appended to the list
associated with the branch target
— The negation of the variable is conjoined to cc and is the current
condition for the next statement

5    Statement is an unconditional branch: current condition, cc, is
appended to the list of conditions for the branch target.
Current condition for the next statement is set to false

6    Continue processing at step 1 for next statement

Simplification
•    General Boolean simplification is NP-complete
•    Instead, use Simplify, an O(N²) algorithm obtained by tailoring
the simplification process to the expressions if-conversion produces

Iterative Dependences
•    Iterative statements can also create control dependences:
20      DO I = 1, 100
40             L = 2*I
60             DO J= 1,L
80               A(I,J) = 0
ENDDO
ENDDO

•    If we vectorize as:
20       DO I = 1, 100
40            L = 2*I
100      ENDDO
80       A(1:100,1:L) = 0

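•    Incorrect! After the loop L = 200, so every row of A is cleared
through column 200 instead of through column 2*I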
•    Must capture the notion that the DO statement controls the
number of times a particular statement is executed.

Iterative Dependences
•    Notation used:
•    A(I, J)       (irange)

•    where irange is a compiler generated scalar which holds the
iteration range
•    Using this notation, the example will be converted to:

20     irange1 = (1,100)
DO I = irange1
40        L = 2*I (irange1)
60        irange2 = (1,L) (irange1)
DO J = irange2
80           A(I,J) = 0 (irange2)
ENDDO
ENDDO

Iterative Dependences
•    Forward substituting constants and loop-independent variables:
20     DO I = 1,100
40        L = 2*I (1,100)
60       DO J = 1,L (1,100)
80              A(I,J) = 0 (1,L) (1,100)
ENDDO
ENDDO

•    which vectorizes to:
20     DO I = 1, 100
40           L = 2*I
80          A(I,1:L) = 0
ENDDO

If-reconstruction
•    If-conversion may degrade performance when vectorization is
not possible

DO 100 I = 1, N
IF (A(I) .GT. 0) GOTO 100
B(I) = A(I) * 2.0
A(I+1) = B(I) + 1
100       CONTINUE

•    After if-conversion:
DO 100 I = 1, N
m1 = (A(I) .GT. 0)
IF (.NOT. m1) B(I) = A(I) * 2.0
IF (.NOT. m1) A(I+1) = B(I) + 1
100       CONTINUE

If-reconstruction
•    On a machine without predicated execution, each guarded statement needs its own branch:

DO 100 I = 1, N
m1 = (A(I) .GT. 0)
IF ( m1) GOTO 10
B(I) = A(I) * 2.0
10         IF (m1) GOTO 20
A(I+1) = B(I) + 1
20       CONTINUE
100     CONTINUE

•    If-reconstruction: replace sections of guarded code with a
minimal set of branches that enforce the guarded execution

Control Dependence
•    Drawbacks of if-conversion:
— It unnecessarily complicates code when the code cannot be vectorized
— It is not possible to analyze code a priori to decide whether
if-conversion will pay off

•    Alternate approach: explicitly expose constraints due to control
flow as control dependences

Control Dependence
•    A node x in directed graph G with a single exit node
postdominates node y in G if any path from y to the exit node
of G must pass through x.
•    A statement y is said to be control dependent on another
statement x if:
— there exists a non-trivial path from x to y such that every
statement zx in the path is postdominated by y and
— x is not postdominated by y.
•    In other words, a control dependence exists from S1 to S2 if
one branch out of S1 forces execution of S2 and another
doesn’t
•    Note that control dependences can be viewed as a property of
basic blocks

Control Dependence: Example
•    n nodes and O(n²) control
dependences.
•    Control dependence graphs
can thus get much larger than
the corresponding CFG
•    procedure ConstructCD
constructs the control
dependence relation

Control Dependence: Loops
•    Loops can be converted to a CFG and then ConstructCD can be
applied
•    Want to treat loops as special cases to help in transforming
loops
•    Use a loop control node to represent the loop

10    DO I = 1, 100
20       A(I) = A(I) + B(I)
30       IF (A(I).GT.0) GO TO 50
40       A(I) = -A(I)
50       B(I) = A(I) + C(I)
ENDDO

Execution Model
•    In Chapter 2, we annotated each statement S with the
corresponding iteration vector i
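•    S(i) could execute whenever every statement instance that it
depends on had been executed

•    However, that model ignores control flow: in the loop below, S1
must not execute when the branch at S0 is taken, even if S1 has no
data dependence on S0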

DO I = 1, N
S0            IF (P) GO TO S2
S1            ...
S2            ...
ENDDO

Execution Model
•    Solution: Use a doit flag for each statement: S(i).doit
•    Statement instances that are not control dependent on any
other statement: doit = True
•    For all other statements: doit = False
•    How does doit get set to True?
— All those statements that are control dependent on the conditional
and whose execution is forced by the sense of the condition: doit =
true

•    Execute statement instance S(i) if its doit flag is set to True
and every statement instance it depends on either has a false
doit flag or has been executed

Execution Model

•    Note that if doit is true for S, then there is a sequence of
control statements S0, S1, ... , Sm = S such that S0 is executed
unconditionally and the decision taken at Sk forces the
execution of Sk+1, for 0 ≤ k < m

•    Sequence of control dependences defines a unique execution
path

Execution Model
•    Behavior of loop control nodes under this model:

•    Case 1: Evaluation of iteration range does not depend on
quantities computed in loop:
— Set doit for loop node to True
— Range of iteration can be completely evaluated
— Create collection of statement instances for the loop body, one for
each iteration of the loop
— Set doit flags of statements control dependent on loop header to
true, all other doit flags to False

Execution Model

•    Case 2: Evaluation of iteration range depends on quantities
computed in loop:
— If range is non-empty, create new instance of loop header,
adjusting range to the remainder of the iterations
— DO.doit = True if dependence back to DO is a data dependence and
False if it is a control dependence
— Set doit flags of statements control dependent on loop header to
true, all other doit flags to False

Execution Model
Theorem 7.1. Dependence graphs that are executed according to
the execution model are equivalent in meaning to the programs
from which they are created.
•    Proof:
— Show that the doit flag of a statement instance is true iff that
instance is executed in the original program
— Proof by contradiction: Consider the shortest sequence
S0, S1, …,Sm-1, Sm s.t. Sm is the first statement to get the
wrong doit flag
— Focus on Sm-1:
– All statements executed leading to Sm-1 in the original program
must be executed in this model
– Statements that are not executed leading to Sm-1 in the
original program cannot be executed in this model

Control Dependence and Parallelization
•    For simplicity, we shall only consider:
— Forward branches - they create loop-independent control
dependences
— Control Dependences due to loops
•    From Chapter 2: Most loop transformations are unaffected by
loop-independent dependences
•    Loop reversal, loop skewing, strip mining, index-set splitting, and
loop interchange do not affect loop-independent dependences
•    Might be problematic: Loop fusion, loop distribution
•    However, since exit branches are excluded, loop fusion is not a
problem

Loop Distribution
DO I = 1, N
S1       IF (A(I).LT.B(I)) GOTO 20
S2       B(I) = B(I) + C(I)             S1 δ⁻¹ S2
20      CONTINUE
ENDDO

•    Distributing…
DO I = 1, N
S1       IF (A(I).LT.B(I)) GOTO 20
ENDDO
DO I = 1, N
S2       B(I) = B(I) + C(I)
ENDDO
20         CONTINUE

•    Incorrect!

Loop Distribution
•    Problem: control dependences crossing between distributed loops
•    Solution: Keep a history of the evaluated conditions (similar to
if-conversion).
DO I = 1, N
S1       IF (A(I).LT.B(I)) GOTO 20
S2       B(I) = B(I) + C(I)
20      CONTINUE
ENDDO

•    Convert to:
DO I = 1, N
S1       e(I) = A(I).LT.B(I)
ENDDO
DO I = 1, N
S2       IF (e(I).EQ..FALSE.) B(I) = B(I) + C(I)
ENDDO

Loop Distribution
•    More complex example:

DO I = 1, N
1       IF (A(I).NE.0) THEN
2              IF (B(I)/A(I).GT.1) GOTO 4
ENDIF
3       A(I) = B(I)
GOTO 8
4       IF (A(I).GT.T) THEN
5              T = (B(I) - A(I)) + T
ELSE
6              T = (T + B(I)) - A(I)
7              B(I) = A(I)
ENDIF
8       C(I) = B(I) + C(I)
ENDDO

Loop Distribution
•    Fusion into "like" regions
•    Needs two execution variables, E2(I) and E4(I), to hold the
results of the branches at statements 2 and 4 respectively

Loop Distribution
•    Consider branch at node 2:
•    3 cases may hold
— Statement 2 is executed and the true
branch to statement 4 is taken
— Statement 2 is executed and the false
branch to statement 3 is taken
— Statement 2 is never executed
because the false branch is taken at
statement 1
•    Corresponds to condition for doit
variable to be set:
— A control dependence exists from
S0 to S.
— S0 has its doit flag set
— Value of the conditional expression is the
label on the branch

Loop Distribution
•    Use three corresponding values: True, False, Undefined
•    procedure DistributeCDG implements these ideas. It inserts
execution variables at appropriate places in the code and
selectively converts control dependences to data dependences

Code Generation
•    Problem: Mapping the arbitrary control flow represented in the
control dependence graph to real machines
DO I = 1, N
S1     IF (p1) GOTO 3
S2   ...
GOTO 4
3     IF (p3) GOTO 5
4     S4
5     S5
ENDDO

Loop distribution

Code Generation
•    Code generated for first partition:
DO I = 1, N
E1(I) = p1
IF (E1(I).EQ..FALSE.) THEN
S2    ...
ENDIF
S5    ...
ENDDO

•    For second partition:
DO I = 1, N
IF (((E1(I).EQ..TRUE.).AND..NOT.p3).OR.
(E1(I).EQ..FALSE.)) THEN
S4     ...
ENDIF
ENDDO

Code Generation
•    Observation: generating code for graphs in which every vertex
has at most one control dependence predecessor is relatively
easy

•    Thus, transform graph into canonical form consisting of a set of
control dependence trees with the following properties:
— each statement is control dependent on at most one other
statement, i.e., each statement is a member of at most one tree
— the trees can be ordered so that all data dependences between
trees flow from trees earlier in the order to trees that are later in
the order


Code Generation
•    How can the statements be organized into groups of statements
that are part of the same conditional statement?
— Statements can be grouped together if there is no dependence path
between them that passes through a statement that is not a child
of the same conditional node with the same label
— Typed Fusion!
— Each statement typed by (p, l) where
— p: its unique control dependence predecessor
— l: the truth label of the edge from p to the statement

Code Generation
•    Simple recursive procedure
•    Generate code for each subtree in an order consistent
with the data dependences
•    Roughly linear in the size of the original dependence graph

```
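
The following Python sketch illustrates the simple recursive code-generation procedure. It assumes the graph is already in canonical form as control dependence trees. It also assumes that the trees, and the children on each branch, are already ordered consistently with the data dependences. Every node is visited once, which is in line with the roughly linear cost noted above. The node encoding and the name `gen` are hypothetical. The example tree is the first partition from the Code Generation slides.

```python
def gen(subtrees, depth=0):
    """Emit structured code for a list of control dependence trees.

    A node is ("stmt", text) or ("if", condition, {True: [...], False: [...]}).
    Sibling lists are assumed to be ordered consistently with data dependences.
    """
    pad = "   " * depth
    lines = []
    for node in subtrees:
        if node[0] == "stmt":
            lines.append(pad + node[1])
            continue
        _, cond, branches = node
        lines.append(f"{pad}IF ({cond}) THEN")
        lines += gen(branches.get(True, []), depth + 1)
        if branches.get(False):
            lines.append(pad + "ELSE")
            lines += gen(branches[False], depth + 1)
        lines.append(pad + "ENDIF")
    return lines


first_partition = [
    ("stmt", "E1(I) = p1"),
    ("if", "E1(I).EQ..FALSE.", {True: [("stmt", "S2")]}),
    ("stmt", "S5"),
]
print("\n".join(gen(first_partition)))
```

Because every statement has at most one control dependence predecessor, each statement appears under exactly one IF. This is why the canonical form makes code generation straightforward.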