Programming Transformations
5th Irish Workshop on Formal Methods, Trinity College, Dublin
Speaker: Oege de Moor, University of Oxford (joint work with David Lacey and Eric van Wyk)
Time, Date: 10am, Monday July 16, 2001

Summary: How does one specify a program transformation, so that it is easy to verify that the transformation is correct, and it is also possible to generate an executable "program transformer" from the specification? In this talk, we suggest a formalism based on regular expressions for describing the side conditions in typical compiler optimisations. The principles are illustrated by a number of examples. Next, we turn to the problem of generating a program transformer from such specifications. That problem boils down to this: given a source vertex v in an edge-labelled graph, and a regular language R, determine all vertices v' such that every path v → v' is in the language R. The final part of the talk shows how such problems can be formalised in relational calculus, and how one can systematically derive a solution.

Slide 1: Programming Transformations — Oege de Moor, joint work with David Lacey and Eric van Wyk

I would like to thank the organisers of IWFM, in particular Glenn Strong and Andrew Butterfield, for the invitation to give a talk at this workshop. Much of formal methods is directed towards the discovery and proof of high-level program transformations: indeed most of my own work could be classified under that heading. Today, however, I wish to talk about the work that comes after such discovery and proof, namely the task of implementing program transformations. Even for low-level transformations, that can be a difficult undertaking. One could simply write a program transformer that manipulates tree structures, and indeed many optimising compilers are implemented in that way. Such programming of transformations is too primitive, however: it is difficult to experiment with the effect of different transformations.
As we are currently lacking sound theories for predicting the combined effect of a large set of program optimisations, such experimentation is rather important. Therefore, I would like to have a way of specifying the transformations in a declarative style, and to generate the program transformer from such a declarative description. Essentially, most transformations can be thought of as rewrite rules, replacing one piece of code by an equivalent (hopefully more efficient) one. These rewrite rules do, however, need side conditions that express certain semantic properties. Even for a simple programming language, and very simple transformations, these properties can be tricky to express precisely.

Slide 2: Constant propagation

y := c; P; x := y   ⇒   y := c; P; x := c
provided P does not change y

To illustrate, consider constant propagation, a transformation that is implemented in many optimising compilers. Its purpose is to save a memory access operation: the assignment x := y (where y is a variable) is replaced by x := c (where c is a constant). For this to be a valid replacement, we need to know that on all execution paths, the assignment x := y is preceded by an assignment y := c, with no changes to y in between. To express this condition, we could say that the two relevant assignments y := c and x := y are separated by a program fragment P that does not change y in any way. Admittedly this formulation is a bit restrictive, because it requires that all execution paths to the occurrence of x := y pass through the same block P. We shall see shortly how that requirement can be relaxed, by being more precise about the notion of execution paths.

Slide 3: Dead code elimination

x := E; P; return y   ⇒   P; return y
provided P does not use x, until x is defined or "return y" is reached

An equally simple transformation is dead code elimination. Suppose that we have an assignment to a variable x, which is not used until the program returns the value of some other variable y.
Such an assignment can be replaced by Skip, because it does not contribute to the final result of the program. Compiler writers call such a useless program fragment dead code. (Other forms of dead code are program fragments that are never reached.) Again the applicability condition is easily phrased in terms of execution paths: the assignment x := E is dead provided all paths from that assignment to the return statement do not refer to the value of x, or redefine x before using it.

Slide 4: Strength reduction

while C do P; x := i * E; Q; i := i+1 end
⇒
y := (i-1)*E;
while C do P; y := y + E; x := y; Q; i := i+1 end
provided P and Q do not change i

Let us consider just one example of a slightly more involved transformation, namely strength reduction. Here we have a loop that contains an expensive multiplication by the loop variable i. At each step of the loop body, i is increased by one, so we could replace the multiplication by a number of successive additions. Those of you who teach wp semantics to your students will recognise this as a typical example of "strengthening the invariant". We introduce a new variable y, with the invariant that y = (i-1)*E. The invariant is established upon loop entry, and maintained by adding E to the value of y. The condition for this to work is that the assignment i := i+1 is the only place in the loop body where i is updated. You may be wondering why I decided to introduce the new variable y. Could we not have done without it, and just used x instead of y throughout the optimised program? The answer is yes, but doing so would introduce many new conditions. To see that, consider what would happen if C evaluated to false first time round. Then we would have made an assignment to x that was not present in the original program. The lesson is clear: the main tool to simplify side conditions is our brain, and no sophisticated notation will substitute for that!
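To see the rule in action, here is a hedged executable sketch (in Python, not the talk's notation) of one concrete instance: E, the loop condition and the initial values are my own choices, and P and Q are taken to be empty, so they trivially do not change i. Both versions compute the same final values of i and x.

```python
# Hypothetical concrete instance of the strength-reduction rule, with E = 7
# and the loop condition C chosen as i <= n. P and Q are empty.

def original(n):
    i, x = 1, 0
    while i <= n:        # while C do
        x = i * 7        #   x := i * E   (expensive multiplication)
        i = i + 1        #   i := i + 1
    return i, x

def reduced(n):
    i, x = 1, 0
    y = (i - 1) * 7      # y := (i-1)*E  establishes the invariant y = (i-1)*E
    while i <= n:        # while C do
        y = y + 7        #   y := y + E   (cheap addition)
        x = y            #   x := y
        i = i + 1        #   i := i + 1
    return i, x

# Same results, including when C is false first time round (n = 0),
# precisely because x is only assigned inside the loop in both versions.
assert all(original(n) == reduced(n) for n in range(10))
```

Note how the auxiliary variable y earns its keep: when the loop body never runs, x is untouched in both versions, which is exactly the point made above.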
Slide 5: The Goal

declarative specification of transformations:
• automatic generation of program transformer
• easy proof of correctness

I can now be more specific about the goal of this talk. We seek a declarative way of specifying program transformations, of the simple kind that I have just discussed. Preferably they will be expressed as rewrite rules, with side conditions in some suitable formal language. Why do we want to do this? Firstly, we hope to reduce the programmer effort involved in implementing program transformers. That in turn will enable application programmers to take control of their optimising compiler, rather than regard it as a mystical black box. In particular, it should be possible to introduce transformations that are specific to an application, for instance exploiting the associativity of matrix multiplication, or merging a number of consecutive OpenGL primitives. Secondly, these optimisations can be very hard to get right. It is important, therefore, that we can prove that they are indeed correctness preserving. That is easier to do for the declarative specifications than for an imperative formulation of these program edits. In this talk I shall focus almost exclusively on the specification formalism, and on its implementation. The issue of correctness is addressed in a paper by my coauthors and Neil Jones.

Slide 6: Constant propagation revisited

x := y   ⇒   x := c
provided all paths to x := y set y := c, and then do not change y until x := y is reached

Here is constant propagation as a rewrite rule. The side condition states that all execution paths to x := y make the assignment y := c, and no changes to y happen between those assignments and x := y itself. It is important to realise that x, y and c are all meta-variables that are instantiated to concrete program fragments when the rule is applied. They are not the names of variables in the program to be optimised!
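For straight-line code the side condition is easy to check mechanically. The following Python sketch applies the slide-6 rewrite whenever the last assignment to y was a constant and y has not been redefined since; the list-of-pairs representation of programs is my own illustration, not the talk's.

```python
# Toy constant propagation on straight-line code (no branches).
# A program is a list of (lhs, rhs) pairs; rhs is a variable name or an int.

def constant_propagation(stmts):
    known = {}                       # variable -> constant, while still valid
    out = []
    for lhs, rhs in stmts:
        if isinstance(rhs, str) and rhs in known:
            rhs = known[rhs]         # rewrite x := y into x := c
        if isinstance(rhs, int):
            known[lhs] = rhs         # lhs now holds a known constant
        else:
            known.pop(lhs, None)     # lhs redefined to something unknown
        out.append((lhs, rhs))
    return out

# x := y becomes x := 0, because the last assignment to y was the constant 0:
assert constant_propagation([("y", 0), ("z", 5), ("x", "y")]) == \
       [("y", 0), ("z", 5), ("x", 0)]
# but not if y is redefined in between:
assert constant_propagation([("y", 0), ("y", "w"), ("x", "y")]) == \
       [("y", 0), ("y", "w"), ("x", "y")]
```

With branches in the picture this per-statement bookkeeping no longer suffices, which is exactly why the talk now turns to paths in the flow graph.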
As an aside, I'd like to mention that this is indeed a little more general than our previous formulation, because there may be multiple occurrences of the assignment y := c that precede x := y, not necessarily just a single one. But what exactly do we mean by "paths" here? In order to answer that question, I have to tell you about flow graphs.

Slide 7: Flow graph

if p > 3
then q := 0; r := 1
else r := 2; q := 0
fi;
s := q;
if q > 3 then t := q else t := 5 fi

[The slide shows this program on the left and its flow graph on the right: entry leads to p>3, which branches to the two assignment sequences; both converge on s := q, followed by q>3 branching to t := q and t := 5. All paths to "s := q" set q := 0.]

Paths refer to the flow graph: a graph where each node is an atomic statement, and the edges indicate the flow of control from one to the next. Whenever there is a conditional (which could branch to multiple nodes), we add all possible edges, independent of the condition in the test. This slide shows an artificial example. On the left is a program; on the right is its flow graph. We start by evaluating the test p>3. Depending on its truth, we choose the then-branch or the else-branch. Each of these branches makes assignments to q and r, and then they converge again on the assignment s := q. Finally, another test is made, resulting in t being assigned the value of q, or t set to 5. Now note that we actually know that q=0 at the time of that last test, so it is always the then-branch that is chosen. But this does not affect the construction of the flow graph: we add edges from q>3 to both branches irrespective of this particular piece of knowledge. So now we can state exactly what "paths" are meant to be: they are paths in the flow graph, that is, sequences of edges. We could think of such edges as being labelled by propositions about their target statement. For example, we could label the edge into t:=q with a proposition that expresses the fact that t is defined (say def(t)) and also with a proposition that records the use of q (say use(q)).
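The flow graph of the slide-7 program, and the def/use labelling of its edges, can be sketched concretely as follows. The node numbering and the props helper are my own illustration, not part of the talk.

```python
# Flow graph of the slide-7 program. Each node is an atomic statement; edges
# follow control flow, with both branches of each conditional included.

nodes = {0: "entry", 1: "p>3", 2: "q:=0", 3: "r:=1", 4: "r:=2",
         5: "q:=0", 6: "s:=q", 7: "q>3", 8: "t:=q", 9: "t:=5"}
edges = {0: [1], 1: [2, 4], 2: [3], 3: [6], 4: [5], 5: [6],
         6: [7], 7: [8, 9], 8: [], 9: []}

def props(stmt):
    """Propositions labelling the edge into a statement: its defs and uses."""
    if stmt == "entry":
        return set()
    if ":=" in stmt:
        lhs, rhs = stmt.split(":=")
        return {f"def({lhs})"} | {f"use({v})" for v in rhs if v.isalpha()}
    return {f"use({v})" for v in stmt if v.isalpha()}   # a test only uses

# the edge into t := q is labelled def(t) and use(q), as described above
assert props("t:=q") == {"def(t)", "use(q)"}
assert props("q:=0") == {"def(q)"}
```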
That view of paths (as sequences of sets of propositions) allows us to concisely specify the side conditions of program transformations. Naturally the definition of the flow graph gets more complicated for more advanced programming languages. In particular, to deal faithfully with higher-order functions and exceptions requires a lot of extra machinery. The essence for this talk is that the paths in the flow graph are a superset of the paths that can actually occur during program runs. Furthermore, the flow graph is finite.

Slide 8: Universal regular path queries

query: (_)* ; y := c ; (¬def(y))* ; x := y

two solutions (on the slide-7 flow graph, where v1 is the node s := q and v2 the node t := q):
({v1}, {y ↦ q, c ↦ 0, x ↦ s}) and ({v2}, {y ↦ q, c ↦ 0, x ↦ t})

Let us now consider constant propagation again. We wish to express the condition that all paths to x := y pass through y := c, and do not define y in between. We could express this as a regular expression: all paths to an assignment that we wish to optimise consist of four consecutive components. First, an initial segment that we do not care about: it is specified as (_)*, where the underscore denotes a wildcard matching any edge whatsoever (grep aficionados would write it with a dot rather than an underscore). The second path component is a statement of the form y := c. Here y and c are both meta-variables, to be instantiated with concrete program fragments. Third, the path has a section of zero or more edges that all point to nodes that do not define y. Finally, the path ends up in the statement x := y. Naturally these free variables (x, y and c) are to be instantiated consistently, so that the paths from entry to the desired vertex are all in the regular language defined by applying these instantiations to the query shown here. So what is an answer to such a query? An answer is a set of pairs. Each pair consists of a set of vertices (say S) and a substitution.
Each element v of S satisfies the condition that all paths from entry to v are in the regular language obtained by applying the substitution to the query. The example on this slide illustrates the idea. Here there are two solutions. In each case, the set of successful nodes consists of just one element. The first says that with the substitution {y ↦ q, c ↦ 0, x ↦ s}, the vertex v1 is the only v such that all paths from entry to v are in the given regular pattern. The second solution describes the application of constant propagation to the leftmost bottommost node in the picture of the flow graph. Perhaps some of you have been following the recent excitement in the database community about query languages for semi-structured data. Here the aim is to perform queries on graphs (sub-graphs of the worldwide web, that is). The basis of many such query languages is the notion of so-called regular path queries. As in our case, they are expressed through regular expressions that may contain free variables. The difference, however, is that in database applications the main interest is in existential queries: there exists a path that satisfies the pattern, rather than the requirement that all paths are in the given regular language. To emphasise that difference, we shall name our type of queries "universal regular path queries". So this, then, is the core of my talk this morning. The side conditions of optimising transformations are specified as universal regular path queries on the flow graph. Presumably this makes it easy to reason about the transformations in terms of some trace semantics. As I announced at the beginning of this talk, I am side-stepping any further discussion of the correctness arguments by referring to my absent co-authors. The question remains how such path queries can be efficiently executed. To appreciate the efficiency concerns, you need to know that the flow graph can be quite big, while the pattern is likely to be quite small. Furthermore, the pattern will be available at transformer-generation time.
We are therefore looking for a solution that is at worst linear in the size of the flow graph, treating the pattern itself as a constant.

Slide 9: A simpler problem

given: an edge-labelled graph G with entry vertex v, and a regular expression P
compute: the set of vertices v' such that all paths v → v' are in P
(extend with free variables in P later...)

Let us step back for a moment, and consider a simplified version of the problem first, where free variables do not enter the picture. We are given an edge-labelled directed graph G with a designated entry vertex called v. Furthermore we are given a regular expression P. The simplification here is that P does not contain free variables. It is not stated here on the slide, but I am also assuming that G is finite. We could therefore think of G as a non-deterministic finite state machine, where each edge represents a state transition. The type of edge labels is the alphabet of G. Note, however, that G does not specify a set of accepting states. By contrast, P is also a finite state machine, but here we do have a specified set of final states. Our aim is to compute the set of vertices v' such that all paths from the entry vertex v to v' are in the regular language given by P. Knowledge increases suffering, and that is certainly true in computer science. Just before starting to think about this problem, I had read the marvellous paper by Robert Tarjan on path problems, which suggests the use of regular expressions for certain analysis problems. It describes a beautiful algorithm for labelling every vertex v of a graph with a regular expression that describes all paths from the entry node to v. Need I tell you that its complexity involves the inverse Ackermann function? Anyway, I got carried away, attempting to use Tarjan's result. Fortunately, however, a much simpler solution is feasible.
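To anticipate the next slide: the whole query can be answered by determinising P with the subset construction and exploring its product with G. Here is a hedged Python sketch; the graph and pattern encodings, and the example pattern, are my own choices.

```python
from collections import deque

def universal_path_query(edges, entry, delta, init, accept):
    """All-paths regular query: determinise the pattern NFA on the fly via
    the subset construction, explore the product of graph and pattern
    breadth-first, and keep the graph vertices that are only ever paired
    with accepting pattern states.
    Encodings: edges[u] = [(label, u2), ...]; delta[(state, label)] = set."""
    start = (entry, frozenset(init))
    seen, queue = {start}, deque([start])
    while queue:
        u, w = queue.popleft()
        for label, u2 in edges.get(u, []):
            # deterministic pattern state: all NFA states reachable on label
            w2 = frozenset(q2 for q in w for q2 in delta.get((q, label), ()))
            if (u2, w2) not in seen:
                seen.add((u2, w2))
                queue.append((u2, w2))
    paired = {}
    for u, w in seen:
        paired.setdefault(u, []).append(w)
    # a subset state is accepting iff it contains an accepting NFA state
    return {u for u, ws in paired.items() if all(w & set(accept) for w in ws)}

# Hypothetical query "all paths from the entry consist of a's only" (a*):
result = universal_path_query(
    edges={1: [('a', 2)], 2: [('a', 3), ('b', 4)], 4: [('a', 3)]},
    entry=1,
    delta={('q0', 'a'): {'q0'}}, init={'q0'}, accept={'q0'})
assert result == {1, 2}   # 3 is reachable by "aba", which is not in a*
```

The search visits each product state once, so the cost is linear in the size of the graph times the size of the determinised pattern, matching the efficiency requirement above.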
Slide 10: Algorithm outline

let P' be the deterministic equivalent of P
construct the product automaton G × P'
let S be the set of reachable states (u,w)
return { u | ∀w : (u,w) ∈ S : w ∈ accept(P') }

Here it is, described at a fairly high level of abstraction, relying on your general education as computer scientists for a little automata theory. We start by constructing the deterministic automaton for the pattern P. This deterministic automaton is called P'. Next, we construct the product automaton of G and P', where the states are pairs of states in G and P'. The transitions in the product automaton are precisely those that are shared by G and P'. Some of you may find it helpful to think in terms of parallel composition here. We can now compute the set of reachable states (u,w), where u is a state of G and w is a state of P'. This could be performed using a simple depth-first search from the initial states of G and P'. Call this set of reachable states S. Finally, we return the set of all u that have been paired up only with accepting states of P'. Naturally all states that were not reachable in the product automaton vacuously satisfy the all-paths condition. Does this algorithm meet our efficiency concerns? Yes: although the first step could be exponential in the size of P, it does not matter because P is small and statically available. The construction of the product automaton takes no more than the size of G times the size of P'. It could in fact be merged with the third step, which is typically implemented using depth-first search, so it takes linear time. Naturally the set of reachable states is likely to be much smaller than the size of the full product automaton.

Slide 11: Example

[The slide shows, side by side: the graph G, with entry vertex 1 and further vertices 2 through 5; the nondeterministic pattern P, with states U, V and W; its deterministic equivalent P', whose states are the subsets {U}, {V,W} and {}; and the product G × P', with paired states (1,{U}), (2,{V,W}), (3,{U}), (4,{V,W}), (5,{V,W}) and (5,{}).]

Let us inspect a little example, and see the algorithm in action. This slide has been divided into three vertical sections.
The leftmost section shows the graph G: its entry vertex is numbered 1, and the other vertices are numbered 2 through 5. Note that 5 is a sink: if this were a flow graph, it would represent the successful termination of the program. The upper middle section shows the pattern P. The pattern has three states U, V and W, but as you can see, it is nondeterministic. To make it deterministic, we can consider an automaton that works on sets of states instead. The resulting automaton P' is shown in the lower middle section. Note that we now have transitions for the symbol c, all ending in the empty set. The product G × P' is shown in the third section. Here we start off with the state (1,{U}). Where can a symbol a take us? There are two options: either we move to state 2, or to state 4. In either case, we pair up with the set of all pattern states that can be reached from {U} via an a transition, that is, the set consisting of V and W. Now consider the paired state (2,{V,W}). According to the graph G, we can only move with a b, to state 3. Moving with a b in the deterministic pattern means moving to {U}. The other transitions are constructed similarly. An interesting thing happens with the transition labelled c from 4 to 5: as there is no corresponding transition in the pattern, this implies a move to the empty set of states in the product. Now we can see what states should be returned as a result of the regular path query. Only W is an accepting state of the pattern. Since we look for the reachable states that are paired up only with accepting states of the deterministic pattern, we have 2 and 4, but not 5. Indeed, 2 and 4 are the two states where the paths from 1 are in the language described by the pattern. Hopefully this example convinces you that the procedure sketched on the previous slide does indeed work.

Slide 12: How do we derive this?
• define in algebraic terms
• use familiar laws of functional programming to derive solution
• need relations instead of functions

Terrible! Shame on me! An invited talk at a conference on formal methods, and I have just indulged in the worst practice of algorithm design: I presented you with a woolly description of the algorithm, and then tried to convince you of its correctness by talking through a trivial example. Is this the same guy who used to trot the conference circuit proclaiming that algorithms should be derived from specifications? As my colleagues at Microsoft say, I need to eat my own dog food! Here is the strategy. First, we shall define the problem in algebraic terms. That is, we need a formal definition of graphs and automata, and rather than relying on those tuples and big sigmas found in textbooks on formal language theory, we shall use the standard toolkit of functional programmers. The advantage is that we can then use the familiar laws of functional programming to derive an algorithm by equational reasoning. In fact, given the right set of primitives, calculating the algorithm I have just shown you from its specification takes a mere eight steps. There is a small complication, however. We need to reason about nondeterministic automata. To do that, we shall leave the realm of purely functional programming, and consider binary relations instead of just total functions.

Slide 13: Relational fold

(⊕) :: S × A → S
e :: 1 → S
fold (⊕) e [a0, a1, ..., an-1] = (...(((e 1) ⊕ a0) ⊕ a1) ...) ⊕ an-1

Our first task is to model the notion of an automaton or state machine. Let us think of the transition relation as a binary operator that takes a state (of type S) and a symbol (of type A) and returns a new state (of type S). For a deterministic automaton, this operator is a partial function. We shall however allow it to be a relation, and then we can think of it as a nondeterministic operator that chooses a state in S from a set of possible alternatives.
The initial state of an automaton can be defined as a constant e of type S. For technical reasons, it works out slightly nicer to say that e is a constant function (taking an argument of unit type 1) that returns a state in S. Note that this also allows e to specify a set of initial states, rather than just a single one. It is important to spell this out: the unit type 1 has only one element. A relation of type 1 → S can therefore be thought of as a subset of S itself. The operation of an automaton can now be modelled via the fold operator of functional programming. It takes the operator, the initial state and a list of symbols. The return value is obtained by summing the symbols from left to right, starting with e and applying the operator at each step.

Slide 14: Automata and languages

automaton M = (Init, Step, Final), where
Init :: 1 → S
Step :: S × A → S
Final :: S → 1

language of M is a relation A* → 1:
L(M) = Final · fold Step Init

L(M) is regular if A and S are finite

To complete the definition of an automaton, all we need is a specification of the final (accepting) states. The type of that is dual to the type of the initial states, namely a relation from S to 1. The language of an automaton is now simply defined as the set of all input strings (sequences of symbols) that can lead to an accepting state. In the definition on this slide, the dot signifies relational composition. We thus obtain the language by composing the Final states with the machine that consumes input symbols. A language is said to be regular if A and S are finite. This is in fact a well-known view of automata, pioneered by Eilenberg and Wright, who proved that the generalisation of fold from functions to relations is well-defined, and that it satisfies the same properties as its purely functional counterpart. That preservation of well-known results gives us a license to apply the laws of functional programming, giving equational proofs of results in automata theory.
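The relational fold and the definition L(M) = Final · fold Step Init can be executed directly if we model the Step relation as a set-valued map; the encoding and the example machine below are my own illustration, not the talk's.

```python
def rel_fold(step, init, symbols):
    """fold Step Init: the states reachable after consuming `symbols`,
    starting from Init (a relation 1 -> S, i.e. a subset of S).
    Step is encoded as a dict (state, symbol) -> set of successor states."""
    states = set(init)
    for a in symbols:
        states = {s2 for s1 in states for s2 in step.get((s1, a), ())}
    return states

def in_language(machine, symbols):
    """L(M) = Final . fold Step Init: can the word reach a final state?"""
    init, step, final = machine
    return bool(rel_fold(step, init, symbols) & final)

# An illustrative nondeterministic machine over {a, b} whose language is
# the set of words ending in "ab":
M = ({0}, {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}}, {2})
assert in_language(M, "aab")
assert not in_language(M, "aba")
```

Here nondeterminism costs nothing extra: rel_fold simply carries a set of current states, which is the set-valued-function view of a relation that reappears later in the derivation.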
Slide 15: Tupling

(a,b) ⟨R, S⟩ c  ≡  a R c ∧ b S c
⟨fold Step Init, fold Step' Init'⟩ = fold Steps Inits
⟨X, Y⟩° · ⟨X', Y'⟩ = X° · X' ∩ Y° · Y'

To illustrate, let me remind you of the technique of tupling, which most of us teach in our first-year course on functional programming. The aim here is to take a program that makes two passes over an input list, and to replace it with a program that makes only one pass. To start with, define the split ⟨R, S⟩ of two relations R and S as shown on the first line of this slide. It says that c is related to (a,b) precisely when R relates c to a and S relates c to b. If R and S are functions, the split of R and S is a function that takes c and returns the pair (a,b). It follows that the split of two folds precisely captures the pattern of two passes over the same input list. The tupling transformation shows how two such folds can be written as a single fold, by appropriately modifying the transition relation and the initial states. I have not given the new definitions of Steps and Inits on this slide: all you need to know is that they can be defined, and that you can look up the definition in many textbooks on functional programming. Many of my fellow Dutchmen (including that honorary Dutchman, Richard Bird) write the fold operator with banana brackets. This dubious notational habit is purely motivated by the desire to call this identity the banana split law. Today, we stick to the more conventional terminology of tupling. I promised you that I would illustrate how laws from functional programming come to bear on automata theory. To do so, there is one more technicality that I need to tell you about. Every relation R has a converse R°, obtained by flipping round all the pairs in R. Using converse and split, we can express a deep connection between pairing and intersection: if you first split X' and Y', and then you go back with the converse of the split of X and Y, what you get is an intersection.
The intersection consists of the converse of X composed with X', intersected with the converse of Y composed with Y'.

Slide 16: Closure under intersection

L(M) ∩ L(M')
= Final · fold Step Init ∩ Final' · fold Step' Init'
= ⟨Final°, Final'°⟩° · ⟨fold Step Init, fold Step' Init'⟩
= ⟨Final°, Final'°⟩° · fold Steps Inits
= L(Inits, Steps, ⟨Final°, Final'°⟩°)

Let us now investigate the intersection of two regular languages. In the first step of this calculation, we simply unfold the definition of language. What you see here is the intersection of two relations, and each of the component relations is itself obtained by sequential composition. Hopefully this rings a bell: it is similar to the pattern I just told you about in connection with split. Indeed, we can apply that identity here, using the additional fact that taking the converse of the converse of R gives you the original R back again. This has prepared the ground for the tupling transformation, where the split of two folds is written as a single fold. It follows that we now have the language of a single machine, and we thus have an algebraic proof that regular languages are closed under intersection. I have to admit I find this sort of calculation seductive, and I hope to investigate with an MSc student at Oxford whether more sophisticated results from automata theory can be similarly streamlined. But that is enough of the airy-fairy stuff about pretty proofs: we have a job to do, namely deriving a solver for universal regular path queries.

Slide 17: Deriving a path solver

• tupling
• isomorphism between relations and set-valued functions
• weakest pre-specification

extend the simple algorithm to free variables in the pattern by introducing an appropriate monad

As you will have guessed from the relentless advancement of the clock, I am going to refer you to our paper for the details of that derivation. There are three main ingredients. First, there is the idea of using tupling, and its close connection to intersection of relations.
Second, there is the isomorphism between relations and set-valued functions. That bijection allows us to make the transition from a non-deterministic pattern automaton to a deterministic one. Finally, we have to express the universal quantification in the specification. The appropriate idiom to do that in relational calculus is the weakest pre-specification, first studied by Hoare and He in the mid-eighties. I should like to stress that all these ingredients are standard, and it was not necessary to develop any new theory to solve this new problem. That came as a surprise, even to myself. I first presented the algorithm to our research group in much the same fashion as I did earlier during this talk, citing incantations from automata theory, and drawing examples to argue the correctness. It was only after that unsatisfactory experience that I went back and ate the dog food from my book with Richard Bird, and found it, at least on this occasion, quite tasty. Of course the story does not end here: we need to extend the algorithm to deal with free variables in the pattern. Fortunately that is only a small change, by introducing an appropriate monad that keeps track of the current substitution.

Slide 18: Alternative to regular queries: modal logic

all paths to x := y are of the form
(_)* ; y := c ; (¬def(y))* ; x := y

in Computational Tree Logic:
A(¬def(y) U y := c)

My coauthors would not forgive me if I closed this talk without mentioning an alternative to regular path queries, namely modal logic. Modal logic is ideally suited to expressing the conditions in program transformations, and we can use model checking (using the flow graph as the model) to discharge the conditions. Eric van Wyk pioneered this idea when he was at the University of Iowa. To give you a flavour, here is the side condition of constant propagation again, this time expressed in computational tree logic.
Paraphrasing the formula, it says that all backward paths (from x := y) do not define y until a statement of the form y := c is reached. The difficulty with this approach is that standard model checkers need to be extended to deal with free variables in formulae. That generalisation has been sorted by my other coauthor, David Lacey. At present I am unable to say whether regular path queries or a variant of CTL are better. Only further experiments with examples and implementations can tell.

Slide 19: Current status

• advanced implementation of rewrite rules + modal logic for Jimple (David Lacey)
• just started implementation of rewrite rules + regular path queries for the .NET intermediate language
• a paper by Lacey, Van Wyk and Jones gives correctness proofs through trace semantics

Let me close by summarising the current state of the research programme that I have attempted to sketch. David Lacey has produced a well-engineered implementation of rewrite rules with side conditions in modal logic. His system transforms programs in Jimple, an intermediate language for processing Java programs. We have also just started to implement a similar system, based on universal regular path queries, for the .NET intermediate language. This work is generously supported by a gift from Microsoft. Finally, as I have said several times during this talk, there is a paper by Lacey, Van Wyk and Jones on the correctness of transformations expressed in this way. Perhaps the most interesting research question from a mathematical point of view is to justify the generalisation of the path solver to deal with free variables. It appears to be an instance of a general construction, and it ought to be possible to prove it once and for all, for a large class of searching programs. Any suggestions you may have for attacking this problem would be most welcome!