A Machine-Verified Code Generator

                   Christoph Walther and Stephan Schweitzer

                           Fachgebiet Programmiermethodik
                           Technische Universität Darmstadt

        Abstract. We consider the machine-supported verification of a code
        generator computing machine code from WHILE-programs, i.e. abstract
        syntax trees which may be obtained by a parser from programs of an
        imperative programming language. We motivate the representation of
        states developed for the verification, which is crucial for success, as the
        interpretation of tree-structured WHILE-programs differs significantly in
        its operation from the interpretation of the linear machine code. This
        work has been developed for a course to demonstrate to the students
        the support gained by computer-aided verification in a central subject of
        computer science, boiled down to the classroom-level. We report about
        the insights obtained into the properties of machine code as well as the
        challenges and efforts encountered when verifying the correctness of the
        code generator. We also illustrate the performance of the VeriFun system
        that was used for this work.

1     Introduction

We develop the VeriFun system [1], [23], a semi-automated system for the
verification of programs written in a functional programming language. One
reason for this development originates from our experiences when teaching Formal
Methods, Automated Reasoning, Semantics, Verification, and similar subjects.
As the motivation of the students largely increases when they can gather
practical experiences with the principles and methods taught, VeriFun has been
developed as a small, highly portable system with an elaborated user interface
and a simple base logic, which nevertheless allows the students to perform ambi-
tious verification case studies within the restricted time frame of a course. The
system has been used in practical courses at the graduate level for proving e.g.
the correctness of a first-order matching algorithm, the RSA public key encryp-
tion algorithm and the unsolvability of the Halting Problem, as well as recently
in an undergraduate course about Algorithms and Data Structures, where more
than 400 students took their first steps in computer-aided verification of simple
statements about Arithmetic and Linear Lists and the verification of algorithms
like Insertion Sort and Mergesort, cf. [22], [24]. VeriFun comes as a JAVA
application which the students can run on their home PC (whatever platform it
may use) after a 1 MB download, to work with the system whenever they like.


Technical Report VFR 03/01
    This paper is concerned with the verification of a code generator for a sim-
ple imperative language. Work on verified code generators and compilers dates
back more than 35 years [9]. With the development of elaborated logics and the
evolving technology of theorem proving over the years, systems developed that
provide a remarkable support for compiler verification as well. Various impres-
sive projects have been carried out which demonstrate well the benefits of certain
logical frameworks and their implementation by reasoning systems in this do-
main. Meanwhile a tremendous amount of literature exists, which precludes an
exhaustive account. E.g., [8] presents a case study using the Elf language, [14]
uses the HOL system to verify a compiler for an assembly language, [6] and [3]
report on compiler verification projects for a subset of CommonLisp using PVS,
and [15] verifies a compiler for Prolog with the KIV system. Much work also
centers around the Boyer-Moore prover and its successors, e.g. [4], and in one
of the largest projects the compilation of an imperative programming language
via an assembly language down to machine code is verified, cf. [10], [11], [27].
    However, the high performance of these systems also comes with the price
of highly elaborated logics and complicated user interfaces, which makes their
use difficult for teaching within the restricted time frame of a course (if not
outright impossible). Furthermore, as almost all of the cited work is concerned
with real programming languages and the bits-and-pieces coming with them, it
is hard to work out the essential principles and problems from the presentations
to demonstrate them in the classroom. And last but not least, it is also difficult
(in particular for the students) to assess the effort needed when using a certain
tool, as most of the papers do not provide appropriate statistics but refer to
large proof scripts in an appendix or to be downloaded from the web for further
details.
    The work presented here was prepared (in addition to the material given in
[18]) for a course about Semantics and Program Verification to illustrate the
principles of state-based semantics and the practical use of formal semantics
when developing compilers etc. However, the main focus is to demonstrate the
support gained by computer-aided verification in a central subject of computer
science education, boiled down to the classroom-level. The code generator com-
putes machine code from abstract syntax trees as used in standard textbooks
of formal semantics, e.g. [7], [13], [26]. We report about the insights obtained
into the properties of machine code as well as the challenges and efforts en-
countered when verifying the correctness of this program. We also illustrate the
performance of the VeriFun system that was used for this work.

2   WHILE-Programs

The language of WHILE-programs consists of conditional statements, while-loops,
assignments, compound statements and statements for doing nothing, and is de-
fined by the data structure WHILE.PROGRAM in Fig. 1. WHILE-programs represent
abstract syntax trees which for instance are computed by a compiler from a
program conforming to the concrete syntax of a programming language to be

structure    VARIABLE <= VAR(ADR:nat)
structure    nullary.operator <= FALSE, TRUE, CONSTANT(number:nat)
structure    unary.operator <= NOT, EVEN, DIV2
structure    binary.operator <= AND, OR, PLUS, MINUS, TIMES, EQ, GT, LE
structure EXPR <=
 EXPR0(e-op0:nullary.operator), EXPR1(e-op1:unary.operator,arg:EXPR),
 EXPR2(e-op2:binary.operator,arg1:EXPR,arg2:EXPR), VAR@(index:nat)
structure WHILE.PROGRAM <=
 SKIP, SET(CELL:VARIABLE,TERM:EXPR),
 IF(ICOND:EXPR,THEN:WHILE.PROGRAM,ELSE:WHILE.PROGRAM),
 WHILE(WCOND:EXPR,BODY:WHILE.PROGRAM),
 COMPOUND(LEFT:WHILE.PROGRAM,RIGHT:WHILE.PROGRAM)

               Fig. 1. The languages of Expressions and WHILE-programs


                      Fig. 2. A WHILE-program computing the gcd
available for subsequent code generation, cf. e.g. [2]. The language of WHILE-
programs uses program variables and expressions built with arithmetical and
logical operators, which are defined by further data structures, cf. Fig. 1.¹ For
instance, the WHILE-program of Fig. 2 computes the gcd of the variables VAR1
and VAR2 and returns the result in the program variable VAR1.
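Since the content of Fig. 2 is not reproduced in this excerpt, the following Python sketch shows how such abstract syntax trees can be rendered as nested tuples. The gcd program given is an assumed subtraction-based variant in the spirit of Fig. 2, not necessarily the exact program of that figure; constructor names follow Fig. 1, and the node-count measure is an assumption of this sketch.

```python
# Hypothetical tuple rendering of the abstract syntax of Fig. 1.
def VAR_AT(i):        return ("VAR@", i)
def EXPR1(op, a):     return ("EXPR1", op, a)
def EXPR2(op, a, b):  return ("EXPR2", op, a, b)
def SET(v, e):        return ("SET", v, e)
def IF(c, t, e):      return ("IF", c, t, e)
def WHILE(c, b):      return ("WHILE", c, b)

# Assumed gcd-by-subtraction program:
# while not VAR1 = VAR2 do
#   if VAR1 > VAR2 then VAR1 := VAR1 - VAR2 else VAR2 := VAR2 - VAR1
gcd = WHILE(EXPR1("NOT", EXPR2("EQ", VAR_AT(1), VAR_AT(2))),
            IF(EXPR2("GT", VAR_AT(1), VAR_AT(2)),
               SET(1, EXPR2("MINUS", VAR_AT(1), VAR_AT(2))),
               SET(2, EXPR2("MINUS", VAR_AT(2), VAR_AT(1)))))

def size(wp):
    # number of constructor nodes of a tree; used here as a stand-in for
    # the program measure |wp| of Section 6 (its precise definition is an
    # assumption of this sketch)
    return 1 + sum(size(c) for c in wp[1:] if isinstance(c, tuple))
```

The `size` function mirrors the role of the program measure used in the termination argument for eval in Section 6.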
    Following the standard approach for the definition of a (structural) oper-
ational semantics of WHILE-programs, e.g. [7], [13], [26], we start by provid-
ing an operational semantics for the expressions EXPR: Given a data structure
association which associates a VARIABLE with a natural number, we define a
data structure memory organized as a linear list of associations and a proce-
dure function assignment(v:VARIABLE, m:memory):nat <= ... which re-
turns the assignment of a variable v wrt. memory m (or returns 0 in case that
no assignment for v exists in m). Then a procedure function value(e:EXPR,m:
memory):nat <= ... is used to compute the value of an expression wrt. the
    ¹ Neither the language of expressions nor the language of WHILE-programs respects
    types, i.e. boolean expressions may be used in places where only arithmetical
    expressions are meaningful and vice versa. Of course, one may take care of correct typing by
    an appropriate modification of the definitions in Fig. 1 without complicating subse-
    quent proofs. However, we do not follow this idea, because (1) our code generator is
    also correct for ill-typed expressions, and (2) type checking is performed by a com-
    piler before the computation of an abstract syntax tree, so that ill-typed expressions
    are never available for code generation.

structure state <= timeout, triple(loops:nat,store:memory,stack:Stack)
function eval(r:state, wp:WHILE.PROGRAM):state <=
if r=timeout
 then timeout
 else if wp=WHILE(WCOND(wp),BODY(wp))
       then if value(WCOND(wp),store(r))=0
             then if loops(r)=0
                    then timeout
                    else if pred(loops(r))≥loops(eval(S,BODY(wp)))
                           then eval(eval(S,BODY(wp)),wp)
                           else timeout
                          fi fi
             else r fi
       else if wp=SET(CELL(wp),TERM(wp))
             then triple(loops(r),
             else if wp=COMPOUND(LEFT(wp),RIGHT(wp))
                    then eval(eval(r,LEFT(wp)),RIGHT(wp))
                    else if wp=IF(ICOND(wp),THEN(wp),ELSE(wp))
                           then if value(ICOND(wp),store(r))=0
                                 then eval(r,THEN(wp))
                                  else eval(r,ELSE(wp)) fi
                            else r fi fi fi fi fi
      where S abbreviates triple(pred(loops(r)),store(r),stack(r))

                   Fig. 3. An interpreter eval for WHILE-programs
assignments in memory m and the semantics provided for the operators, thus
defining an operational semantics for the language of expressions. The procedure
value calls assignment to obtain the number assigned to a program variable by
memory m, returns the number denoted by a CONSTANT and returns 0 for TRUE
and 1 for FALSE.² In all other cases, the operation corresponding to the expres-
sion’s operator is applied to the values of the expression’s arguments computed
recursively by value. E.g., the computation of value(EXPR2(PLUS,e1,e2),m)
yields call-PLUS(value(e1,m),value(e2,m)), where call-PLUS and similar
procedures defining the semantics of the other operators are given elsewhere.³
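The evaluation of expressions just described can be sketched in Python as follows. The representation choices are assumptions of this sketch: memory is a list of (index, number) pairs keyed directly by the variable's index, and operator semantics are given only for PLUS and MINUS, since call-PLUS and its siblings remain undefined in the paper as well.

```python
def assignment(v, m):
    # first association of variable v in memory m, 0 if none exists
    for var, n in m:
        if var == v:
            return n
    return 0

def value(e, m):
    tag = e[0]
    if tag == "VAR@":             # value of a program variable
        return assignment(e[1], m)
    if tag == "EXPR0":            # TRUE, FALSE or a CONSTANT
        op = e[1]
        if op == "TRUE":
            return 0              # truth values encoded as in the paper
        if op == "FALSE":
            return 1
        return op[1]              # op = ("CONSTANT", n)
    if tag == "EXPR2":            # evaluate arguments recursively, then apply
        a, b = value(e[2], m), value(e[3], m)
        return {"PLUS": a + b,
                "MINUS": max(a - b, 0)}[e[1]]   # subtraction on nat assumed
    raise ValueError("unknown expression: %r" % (tag,))
```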
    An operational semantics for WHILE-programs is given by the interpreter eval
which maps a program state and a WHILE-program to a program state, cf. Fig.
3. A program state is either the symbol timeout (denoting a non-terminating
interpretation of a WHILE-program) or is a triple consisting of a counter loops,
    ² Truth values are encoded by natural numbers, where natural numbers are predefined
    in VeriFun by the data structure structure nat <= 0, succ(pred:nat).
    ³ The procedure call-PLUS and all other procedures providing the semantics of the
    operators remain undefined here, because the correctness of the code generator does
    not depend on the semantics of the operators.

structure STACK.PROGRAM <=
structure INSTRUCTION <=
 JUMP+(displ.J+:nat), JUMP-(displ.J-:nat),
 BRANCH+(displ.B+:nat), BRANCH-(displ.B-:nat)

                 Fig. 4. Stack Programs and Machine Instructions
a memory store and a stack stack. The store holds the current variable as-
signments under which expressions are evaluated by value (when executing a
WHILE-, a SET- or an IF-statement), and which is updated when executing a
SET-statement. The rôle of the counter loops is discussed in Sections 6 and 7.
Also the stack is used for subsequent developments, cf. Section 3, and may be
ignored here as eval does not consider this component of a triple at all.
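A rough Python rendering of eval may clarify the interplay of the loops-counter and timeout. In this sketch, expressions degenerate to plain numbers, the store is a dict, and requirement (*) of Fig. 3 is omitted, as it only serves the system's termination analysis discussed in Section 6; all of these simplifications are assumptions.

```python
TIMEOUT = "timeout"

def value(e, store):
    return e                     # degenerate expression evaluator (assumption)

def eval_wp(r, wp):
    # a state r is TIMEOUT or a triple (loops, store, stack), as in Fig. 3
    if r == TIMEOUT:
        return TIMEOUT
    loops, store, stack = r
    tag = wp[0]
    if tag == "SKIP":
        return r
    if tag == "SET":             # ("SET", var, expr): update the store
        return (loops, {**store, wp[1]: value(wp[2], store)}, stack)
    if tag == "COMPOUND":        # left part first, then right part
        return eval_wp(eval_wp(r, wp[1]), wp[2])
    if tag == "IF":              # 0 encodes true, as in the paper
        return eval_wp(r, wp[2] if value(wp[1], store) == 0 else wp[3])
    if tag == "WHILE":
        if value(wp[1], store) != 0:
            return r             # condition false: leave the loop
        if loops == 0:
            return TIMEOUT       # loops-counter exhausted
        s = (loops - 1, store, stack)
        return eval_wp(eval_wp(s, wp[2]), wp)
    raise ValueError(tag)
```

A while-loop with a condition that never becomes false exhausts the counter and yields timeout, as the first assertion-style example below illustrates.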

3   Machine Programs
The Stack Machine. Our target machine consists of a program store, a ran-
dom access memory, a stack and a program counter pc. Arithmetical and logical
operations are performed by a subdevice of the target machine, called the stack
machine. The stack machine may push some content of the memory onto the
stack and performs arithmetical and logical operations by fetching operands from
and returning results to the stack. The operation of the stack machine is con-
trolled by so-called stack programs. The target machine provides an instruction
EXEC which calls the stack machine to run the stack program provided by the
parameter of EXEC.
    Stack programs are finite sequences of PUSH-commands and are defined by
the data structure STACK.PROGRAM of Fig. 4. An operational semantics for stack
programs is given by procedure function run(sp:STACK.PROGRAM,s:Stack,
m:memory):Stack <= ... which interprets the commands of a stack program
step by step: When executing a stack program PUSH.VAR(v, sp), the number
assigned to program variable v by memory m (obtained by assignment) is pushed
onto stack, and when executing PUSH.OP0(c, sp), the value denoted by constant
c is pushed onto stack. In all other cases operands are fetched from and the
result of the operation is pushed onto stack. For instance, stack s is replaced by
push(call-PLUS(top(pop(s)),top(s)),pop(pop(s))) when executing a stack
program PUSH.OP2(PLUS, sp), where call-PLUS and the procedures defining the
semantics of the other operators are the same as for value.
    Having executed a PUSH-command, run proceeds with the execution of the
remaining stack program sp.
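The behaviour of run can be sketched as follows; stack programs become Python lists, memory a dict, and operator semantics are supplied only for PLUS and GT. These representation choices are assumptions of the sketch.

```python
def run(sp, stack, m):
    ops = {"PLUS": lambda a, b: a + b,
           "GT":   lambda a, b: 0 if a > b else 1}   # 0 encodes true
    for cmd in sp:
        if cmd[0] == "PUSH.VAR":          # push the assignment of a variable
            stack = stack + [m.get(cmd[1], 0)]
        elif cmd[0] == "PUSH.OP0":        # push a constant
            stack = stack + [cmd[1]]
        elif cmd[0] == "PUSH.OP2":        # apply operator to the top two entries
            a, b = stack[-2], stack[-1]   # a = top(pop(s)), b = top(s)
            stack = stack[:-2] + [ops[cmd[1]](a, b)]
        else:
            raise ValueError(cmd[0])
    return stack
```

Note that the operand order matches the description above: the second-from-top entry is the left operand, the top entry the right one.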

Syntax and Semantics of Machine Programs. Besides the EXEC instruc-
tion, the target machine also provides a LOAD instruction to write the top-of-stack

[00]   JUMP+(7);
[01]   EXEC(PUSH.VAR(VAR(1),PUSH.VAR(VAR(2),PUSH.OP2(GT,null))));
[02]   BRANCH+(3);
[04]   LOAD(VAR(1));
[05]   JUMP+(2);
[07]   LOAD(VAR(2));
[09]   BRANCH-(7);
[11]   BRANCH+(3);
[12]   EXEC(PUSH.VAR(VAR(2),null));
[13]   LOAD(VAR(1));
[14]   JUMP+(1);
[15]   NOOP

    Fig. 5. A machine program computed by code for the WHILE-program of Fig. 2

to a designated address of the memory, two unconditional jump instructions
JUMP+ and JUMP- to move the pc forward or backward in the program store,
two conditional jump instructions BRANCH+ and BRANCH- which are controlled
by the top-of-stack, a HALT instruction which halts the target machine, and a
NOOP instruction which does nothing (except incrementing the pc). The instruc-
tion set is formally defined by the data structure INSTRUCTION of Fig. 4, and a
MACHINE.PROGRAM is simply a linear list of instructions, where VOID denotes the
empty program, begin yields the first instruction and end denotes the machine
program obtained by removing the first instruction. Fig. 5 gives an example of
a machine program.⁴
    An operational semantics of machine programs is defined by procedure exec
of Fig. 6.⁵ This interpreter uses a procedure fetch to fetch the instruction to
which the program counter pc points in the machine program mp. The interpreter
exec returns the input state if called in the state timeout, called with an empty
machine program, with a HALT instruction or if the pc is not within the address
space [0, ..., size(mp) − 1] of the machine program mp, where size(mp) computes
the number of instructions in mp.
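The fetch&execute cycle of exec can be sketched iteratively in Python. Several details are assumptions read off the guards of Fig. 6 rather than taken from the paper: the displacement conventions (forward target pc+d, back-leap target pc-d-1), the popping of the top-of-stack by BRANCH-, and an EXEC that only pushes constants; JUMP- and BRANCH+ are omitted, as in the paper's footnote on Fig. 6.

```python
TIMEOUT = "timeout"

def exec_mp(r, pc, mp):
    # a state r is TIMEOUT or (loops, store, stack); mp is a list of instructions
    while True:
        if r == TIMEOUT:
            return TIMEOUT
        if not 0 <= pc < len(mp):            # pc left the address space
            return r
        loops, store, stack = r
        ins, tag = mp[pc], mp[pc][0]
        if tag == "NOOP":
            pc += 1
        elif tag == "HALT":
            return r
        elif tag == "LOAD":                  # write top-of-stack to the memory
            r = (loops, {**store, ins[1]: stack[-1]}, stack[:-1])
            pc += 1
        elif tag == "EXEC":                  # run a stack program (constants only)
            r = (loops, store, stack + [n for (_, n) in ins[1]])
            pc += 1
        elif tag == "JUMP+":
            pc += ins[1]                     # forward leap (assumed convention)
        elif tag == "BRANCH-":
            top, rest = stack[-1], stack[:-1]
            if top == 0:                     # 0 encodes true: take the back-leap
                if ins[1] + 1 > pc:
                    return r                 # leap would leave the address space
                if loops == 0:
                    return TIMEOUT           # loops-counter exhausted
                r = (loops - 1, store, rest)
                pc -= ins[1] + 1
            else:
                r = (loops, store, rest)
                pc += 1
        else:
            raise ValueError(tag)
```

Each taken back-leap decrements the loops-counter, so a never-terminating loop eventually returns timeout, mirroring the termination argument of Section 6.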
    ⁴ To ease readability, the list structure of machine programs is omitted in Fig. 5. E.g.
    we write LOAD(VAR(1));JUMP+(1);NOOP for the last three instructions of the ma-
    chine program instead of the formally required conc(LOAD(VAR(1)),conc(JUMP+(1),
    conc(NOOP,VOID)...). Also the addresses [. . .] are not part of the machine program,
    but are added for illustration purposes only.
    ⁵ For lack of space, some instructions are omitted in Fig. 6. The execution of NOOP
    results in a recursive call of exec with incremented pc and unchanged state, HALT
    terminates the execution of exec and returns the input state, BRANCH+ is imple-
    mented like JUMP+, however controlled by the top-of-stack like BRANCH-, and JUMP-
    is implemented like BRANCH-, however without being controlled by the top-of-stack.

function exec(r:state, pc:nat, mp:MACHINE.PROGRAM):state <=
if r=timeout
 then timeout
 else if mp=VOID
       then r
       else if pc>size(end(mp))
             then r
             else if fetch(pc,mp)=LOAD(loc(fetch(pc,mp)))
                   then exec(triple(loops(r),
               else if fetch(pc,mp)=EXEC(prog(fetch(pc,mp)))
                     then exec(triple(loops(r),
                 else if fetch(pc,mp)=BRANCH-(displ.B-(fetch(pc,mp)))
                       then if top(stack(r))=0
                             then if succ(displ.B-(fetch(pc,mp)))>pc
                                   then r
                                   else if loops(r)=0
                                         then timeout
                                         else exec(triple(pred(loops(r)),
                             else exec(triple(loops(r),
                       else if fetch(pc,mp)=JUMP+(displ.J+(fetch(pc,mp)))
                             then exec(r,
                             else ... fi ... fi

              Fig. 6. The interpreter exec for machine programs

function postfix(e:EXPR):STACK.PROGRAM <=
if e=VAR@(index(e))
 then PUSH.VAR(VAR(index(e)),null)
 else if e=EXPR0(e-op0(e))
       then PUSH.OP0(e-op0(e),null)
       else if e=EXPR1(e-op1(e),arg(e))
             then extend(postfix(arg(e)),PUSH.OP1(e-op1(e),null))
             else extend(postfix(arg1(e)),
             fi fi fi
function code(wp:WHILE.PROGRAM):MACHINE.PROGRAM <=
if wp=SKIP
 then conc(NOOP,VOID)
 else if wp=IF(ICOND(wp),THEN(wp),ELSE(wp))
       then conc(EXEC(postfix(ICOND(wp))),
       else if wp=WHILE(WCOND(wp),BODY(wp))
             then conc(JUMP+(size(code(BODY(wp)))),
             else if wp=SET(CELL(wp),TERM(wp))
                  then conc(EXEC(postfix(TERM(wp))),
                  else append(code(LEFT(wp)),code(RIGHT(wp)))
                 fi fi fi fi

                  Fig. 7. Generation of stack code and machine code

4     Code Generation

We aim at defining a procedure code which generates a machine program from
a WHILE-program such that both programs compute the same function. To do
so, we start by generating stack programs from the expressions used in a WHILE-
program. This is achieved by procedure postfix of Fig. 7, which simply trans-
lates the tree-structure of expressions into the linear format of stack programs.⁶
E.g., the expression EXPR2(MINUS,VAR@(1),VAR@(2)) is translated by postfix
to the stack program PUSH.VAR(VAR(1),PUSH.VAR(VAR(2),PUSH.OP2(MINUS,null))).
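Over a tuple representation of expressions, postfix is a plain post-order traversal. A minimal sketch, with stack programs as Python lists so that the paper's extend becomes list concatenation (both representation choices are assumptions):

```python
def postfix(e):
    tag = e[0]
    if tag == "VAR@":
        return [("PUSH.VAR", e[1])]
    if tag == "EXPR0":
        return [("PUSH.OP0", e[1])]
    if tag == "EXPR1":                # argument first, then the operator
        return postfix(e[2]) + [("PUSH.OP1", e[1])]
    if tag == "EXPR2":                # left argument, right argument, operator
        return postfix(e[2]) + postfix(e[3]) + [("PUSH.OP2", e[1])]
    raise ValueError(tag)
```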
    Using the stack code generation for expressions, procedure code of Fig. 7
defines the machine code generation for WHILE-programs in a straightforward
recursive way, where the recursively computed code is embedded into machine
instructions in order to translate the control structure of the statement under
    ⁶ The call of procedure extend in procedure postfix concatenates stack programs and
    the call of procedure append in procedure code concatenates machine programs.

consideration. Fig. 5 displays the result of code when applied to the WHILE-
program of Fig. 2.
   The correctness property for code is formally stated by

(4.1) lemma code is correct <= all wp:WHILE.PROGRAM, r:state
      eval(r,wp) = exec(r,0,code(wp))

expressing that eval returns the same state upon the evaluation of a WHILE-
program wp as exec returns when applied in the same initial state r to the
machine program resulting from the translation of wp computed by code.

5     About VeriFun

We intend to prove (4.1) code is correct with VeriFun. In a typical session
with this system, a user defines a program by stipulating the data structures and
the procedures of the program, defines statements about these program elements
and verifies these statements and the termination of the procedures.
    VeriFun consists of several automated routines for theorem proving and for
the formation of hypotheses to support verification. It is designed as an inter-
active system, where, however, the automated routines substitute the human
expert in striving for a proof until they fail. In such a case, the user may step in
to guide the system for a continuation of the proof.
    When called to prove a statement, the system computes a prooftree. An
interaction, which may be required when the construction of the prooftree gets
stuck, is to instruct the system to use a proof rule, i.e.

 —   to   perform a case analysis,
 —   to   use an instance of a lemma or an induction hypothesis,
 —   to   unfold a procedure call,
 —   to   apply an equation,
 —   to   use an induction axiom, or
 —   to   insert, move or delete some hypothesis in the sequent of a proof-node.

    For simplifying proof goals, a further set of so-called computed proof rules
is provided. For example, the Simplification rule rewrites a goalterm using the
definitions of the data structures and the procedures, the hypotheses and the
induction hypotheses of the proof-node sequent and the lemmas already veri-
fied. The other computed proof rules perform a similar rewrite, however with
restricted performance. The computed proof rules are implemented by the
Symbolic Evaluator, i.e. an automated theorem prover over which the VeriFun user
has no control. The Symbolic Evaluator can also be used for program tests to
evaluate instances of lemmas or to “run” procedures.⁷
    ⁷ For instance, the machine program of Fig. 5 has been computed by the Symbolic
    Evaluator when called to apply code to the WHILE-program of Fig. 2. The Symbolic
    Evaluator may also compute the gcd of two numbers N and M by either calling

    VeriFun provides no control commands (except disabling induction
hypotheses upon symbolic evaluation), thus leaving the proof rules as the only means
for the user to control the system’s behavior. The symbolic evaluations and all
proofs computed by the system may be inspected by the user.
    Having applied a user suggested proof rule, the system takes over control
again and tries to develop the prooftree further until it gets stuck once more etc.
or it eventually succeeds. In addition, it may be necessary to formulate (and to
prove) an auxiliary lemma (sometimes after providing a new definition) in order
to complete the actual proof task.
    VeriFun demands that the termination of each procedure that is called in
a statement be verified before a proof of the statement can be started.
Therefore the system’s automated termination analysis [16] is activated immediately
fore the system’s automated termination analysis [16] is activated immediately
after the definition of a recursively defined procedure. If the automated termi-
nation analysis fails, the user has to tell the system useful termination functions
represented by (sequences of) so-called measure terms. Based on this hint, the
system computes termination hypotheses that are sufficient for the procedure’s
termination and then need to be verified like any other statement.
    An introduction into the use of the system is given in [20], a short survey
is presented in [23], and a detailed account on the system’s operation and its
logical foundations can be found in [21].

6     A Machine Verification of code

Termination. VeriFun’s automated termination analysis verifies termination
of all procedures upon definition, except for eval and exec. The interpreter
eval terminates because upon each recursive call either the size of the program
decreases or otherwise remains unchanged (for the outer eval-call in the WHILE-
case), but the loops-counter decreases. But the system is unable to recognize this
argumentation by itself and must be provided with the pair of termination
functions λwp.|wp|, λr.loops(r), causing the system to prove termination by the
lexicographic relation (wp1, r1) >eval (wp2, r2) iff |wp1| > |wp2| or |wp1| = |wp2|
and loops(r1) > loops(r2). Hence for the outer eval-call in the WHILE-case, the
proof obligation loops(r)>loops(eval(S,BODY(wp))) is obtained, which is trivially
verified as loops(r)≠0 and (*) pred(loops(r))≥loops(eval(S,BODY(wp)))
must hold, cf. Fig. 3.
    Note that requirement (*) controlling the outer recursive call in the WHILE-
case is always satisfied, as we may prove (after having verified eval’s
termination)

(6.1) lemma eval not increases loops <=
      all r:state, wp:WHILE.PROGRAM loops(r)≥loops(eval(r,wp))

    eval with the WHILE-program of Fig. 2 or exec with the machine program of Fig.
    5, where N and M are assigned to the program variables VAR(1) and VAR(2) in the
    memory of the state which is used when calling one of the interpreters.

expressing that eval never increases the loops-component of a state. However,
when removing (*) from the definition of eval, VeriFun is unable to prove
termination because the system verifies only strong termination of procedures.
Roughly speaking, a terminating procedure terminates strongly iff termination
can be proven without reasoning about the procedure’s semantics, cf. [17] for
a formal definition. Since one has to reason about the semantics of eval, viz.
(6.1), for proving its termination if (*) is not provided, the system would fail to
prove eval’s termination.⁸
    The interpreter exec terminates because with each fetch&execute cycle either
the loops-component of the state decreases (in case of a back-leap BRANCH- or
JUMP-) or otherwise stays unchanged, but pc moves towards the end of the program,
cf. Fig. 6. Here the system is unable to recognize this argumentation as well
and needs to be provided with the pair of termination functions λr.loops(r),
λmp,pc.size(mp)−pc.
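Python's lexicographic tuple comparison gives a quick way to see both pairs of measures at work; this is a toy illustration, not the system's actual termination analysis.

```python
# Termination measures as tuples, compared lexicographically as by >eval:
# eval decreases in (|wp|, loops(r)), exec in (loops(r), size(mp) - pc).
def measure_eval(wp_size, loops):
    return (wp_size, loops)

def measure_exec(loops, mp_size, pc):
    return (loops, mp_size - pc)
```

For the outer eval-call in the WHILE-case the program is unchanged but loops shrinks; for the call on BODY(wp) the program shrinks even if loops grows. For exec, either loops shrinks (back-leap) or the distance to the end of the program does.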

Correctness of code. Before considering (4.1) code is correct, the correct-
ness of the procedure postfix has to be verified. Therefore we start with proving

(6.2) lemma postfix is correct <= all e:EXPR, s:Stack, m:memory
      top(run(postfix(e),s,m)) = value(e,m)
expressing that the value computed by value upon the evaluation of an expres-
sion e is obtained as top-of-stack after running the stack program returned by
postfix when applied to expression e. The proof of (6.2) requires two auxiliary
lemmas, one stating that stack programs can be executed step-by-step, viz.

(6.3) lemma run extend <= all m:memory, s:Stack, sp1,sp2:STACK.PROGRAM
      run(extend(sp1,sp2),s,m) = run(sp2,run(sp1,s,m),m)
and the other one expressing that the execution of stack programs obtained by
postfix does not affect the stack initially given to run, i.e.

(6.4) lemma pop run postfix <= all e:EXPR, s:Stack, m:memory
      pop(run(postfix(e),s,m))=s .
All three lemmas have an automated proof and are frequently (and automati-
cally) used in subsequent proofs.
    The induction proof of (4.1) code is correct is based on eval’s recursion
structure. Hence it develops into 7 cases, viz.

 1. wp is a SKIP-statement,
    ⁸ The need for terminating procedures is necessitated by the logic implemented by our
    system, while the failure to prove eval’s termination without requirement (*) is a
    lack of our implementation only. This is because, as proved in [5], our approach for
    automated termination proofs [16] is also sound for procedures with nested recur-
    sions like eval. We learned how to transform a terminating procedure with nested
    recursions into a strongly terminating procedure from [12].

 2. wp is a SET-statement,
 3. wp is a WHILE-statement and loops(r)=0,
 4. wp is a COMPOUND-statement,
 5. wp is an IF-statement,
 6. wp is a WHILE-statement, pred(loops(r))≥loops(eval(S,BODY(wp))) and
    loops(r)≠0, and
 7. wp is a WHILE-statement, pred(loops(r))<loops(eval(S,BODY(wp))) and
    loops(r)≠0, where S is defined as in Fig. 3.

The first 3 cases are the base cases of the induction, and (consequently) the
remaining cases are the step cases.
    The first two base cases have an automated proof. However, the proof of the
third base case gets stuck and after Simplification the goalterm reads as

(6.5) if(r=timeout, true,
        if(value(WCOND(wp),store(r))=0, R=timeout, R=r)) ,

where R abbreviates

(6.6) exec(r,succ(size(code(BODY(wp)))),P)

and P abbreviates

(6.7) conc(JUMP+(size(code(BODY(wp)))),

   Since the pc points in R to the instruction EXEC(postfix(WCOND(wp))),
symbolic execution of the exec-call R yields R′ as

(6.8) exec(triple(loops(r),
          P) .

    Here the pc points to the instruction BRANCH-(size(code(BODY(wp)))),
hence symbolic execution of the exec-call R′ yields timeout, if value(WCOND(wp),
store(r))=0 holds, and r otherwise, thus completing the proof of (6.5).
    We therefore instruct the system with the Unfold Procedure rule to unfold
both procedure calls R in (6.5) and also both resulting procedure calls R′, which
(after subsequent Simplification steps using (6.2) and (6.4)) proves the third base
case.
    Of course, it is annoying to call interactively for Unfold Procedure quite often
instead of only letting Simplification do the job. The reason is that the unfold
of procedure calls needs to be controlled heuristically upon symbolic evaluation,
because otherwise unusable goalterms may result. VeriFun uses a heuristic which
is based on a similarity measure between the well-founded relation used for an
induction proof of a statement and the well-founded relations which have been
used to prove the termination of the procedures involved in the statement, and
only calls of procedures having a termination relation similar to the induction
relation are unfolded automatically. This heuristic proved successful in almost all
cases but may fail if a statement refers to procedures which differ significantly in
their recursion structure, as eval and exec do in the present case. In the proof
of (4.1), we must therefore call for Unfold Procedure not only in the third base
case, but also in some of the step cases to instruct the system interactively to
execute parts of the machine code in the goalterms symbolically.
    Having proved the base cases, the system proceeds with the first step case
(where wp is a COMPOUND-statement) and simplifies the induction conclusion to

(6.9) if(r=timeout, true,
         exec(exec(r,0,code(LEFT(wp))),0,code(RIGHT(wp)))
          = exec(r,0,append(code(LEFT(wp)),code(RIGHT(wp)))))

which we straightforwardly generalize to

(6.10) exec(exec(r,pc,mp1),0,mp2) = exec(r,pc,append(mp1,mp2))

in order to prove the subgoal (6.9) with this equation.
    However, (6.10) does not hold. If the pc points to some instruction of append(
mp1,mp2) but is not within the address space of mp1, equation (6.10) rewrites
to exec(r,0,mp2)=exec(r,pc,append(mp1,mp2)), which obviously is false. We
therefore demand size(mp1)>pc, but this restriction is not enough:
    Assume that mp1 contains a HALT instruction and exec(r,pc,mp1) returns
the state r’ upon the execution of HALT. Then r’ is also obtained when executing
append(mp1,mp2), and consequently equation (6.10) rewrites to exec(r’,0,mp2)
=r’. Hence we also demand a further restriction on mp1, expressed by a procedure
which returns true iff it is applied to a machine program free of HALT instructions.
    But even with this additional restriction equation (6.10) is still false. This
time we assume that mp1 contains some forward-leap instruction with a displace-
ment pointing beyond the last instruction of append(mp1,mp2). If this instruc-
tion is executed in a state r’, exec returns r’ and equation (6.10) rewrites to
exec(r’,0,mp2)=r’ again. We therefore demand closed+(mp1) as a further re-
striction, where procedure closed+ returns true iff it is applied to a machine
program mp such that pc+d≤size(mp)-1 for each forward-leap instruction in mp
with fetch(pc,mp)=JUMP+(d) or fetch(pc,mp)=BRANCH+(d).
    We continue with our analysis and now assume that mp2 contains some back-
leap instruction with a displacement pointing beyond the first instruction of mp2.
For instance, mp1 may consist of one instruction NOOP only and mp2 may consist
of only one instruction JUMP-(0). Then equation (6.10) rewrites to r=timeout,

and a counter example is found again. We therefore demand closed-(mp2)
as another restriction, where procedure closed- returns true iff it is applied
to a machine program mp such that pc≥d+1 for each back-leap instruction in
mp with fetch(pc,mp)=BRANCH-(d) or fetch(pc,mp)=JUMP-(d). But we also
have to demand closed-(mp1), as otherwise equation (6.10) may rewrite to
exec(r’,0,mp2)=r’ again. We are done with this final restriction, and the
required lemma reads as

(6.11) lemma exec stepwise <=
   all pc:nat, mp1,mp2:MACHINE.PROGRAM, r:state
         exec(exec(r,pc,mp1),0,mp2) = exec(r,pc,append(mp1,mp2)),
         true), ... , true) .
   However, in order to prove subgoal (6.9) by lemma (6.11), it must be verified
that all preconditions of (6.11) are satisfied for machine programs mp1 and mp2
computed by code. We therefore formulate the lemmas size code not zero,
code is closed- and code is closed+, together with a lemma stating that the
generated code is free of HALT instructions, all of which are proved easily.
Recognizing the lemmas just developed and verified, the system simplifies the
induction conclusion of the first step case to true.
   We continue in the verification of lemma (4.1) with the second step case,
which is concerned with the code generation for IF-statements. Using both in-
duction hypotheses, Simplification of the step formula yields the goalterm

(6.12) if(r=timeout,
             exec(r,0,code(ELSE(wp)))=exec(r,0,P))) .
where P abbreviates

(6.13) conc(EXEC(postfix(ICOND(wp))),
                             code(THEN(wp)))))) .
    As before, parts of the machine code must be executed symbolically by some
interactive unfold steps, yielding the modified goalterm

(6.14) if(r=timeout,

after executing the EXEC and the BRANCH+ instruction in the THEN-case. Since
the pc points in exec(r,succ(succ(succ(size(code(ELSE(wp)))))),P) to
the first instruction of code(THEN(wp)), we may get rid of the machine code
EXEC(...);...;JUMP+(...) preceding code(THEN(wp)) in P and adjust the pc
accordingly, as code(THEN(wp)) (being closed-) does not contain any back-
leaps into the preceding code. This yields exec(r,0,code(THEN(wp))) and the
equation of the THEN-case is proved.
    The elimination of machine code is formally justified by

(6.15) lemma exec skip program <=
       all mp1,mp2:MACHINE.PROGRAM, r:state, pc:nat

which is needed also in the ELSE-case: After symbolic execution of the EXEC and
the BRANCH+ instruction, both instructions are eliminated using (6.15), yielding
the goalterm

(6.16) if(r=timeout,
                                   code(THEN(wp))))))) .

   To get rid of the JUMP+ instruction and the code for the THEN-part succeeding
code(ELSE(wp)), we instruct the system to apply lemma (6.11) exec stepwise,
yielding
the simplified goalterm

(6.17) if(r=timeout,

which simplifies further to the inner call exec(r,0,code(ELSE(wp))) after a
symbolic execution of the outer exec-call, as the pc is out of range after the
execution of JUMP+. This proves the subgoal of the ELSE-part, and of the whole
step case in turn.
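The idea behind the code elimination justified by (6.15) can also be illustrated on the toy machine sketched earlier (again our own simplification, and we guess the statement of the lemma from the way it is used): a prefix that execution never leaps back into can be discarded, provided the pc is shifted by the prefix length.

```python
# Toy illustration of the idea behind lemma (6.15) exec skip program, on a
# simplified machine (NOOP and JUMP+ only) of our own; the state r counts
# executed instructions.

def exec_(r, pc, mp):
    while 0 <= pc < len(mp):
        op, d = mp[pc]
        r += 1
        pc = pc + 1 if op == "NOOP" else pc + d + 1
    return r

mp1 = [("NOOP", 0), ("NOOP", 0), ("NOOP", 0)]   # prefix, never re-entered
mp2 = [("JUMP+", 0), ("NOOP", 0)]               # suffix without back-leaps

# Running inside the suffix of append(mp1,mp2) equals running the suffix
# alone, with the pc adjusted by len(mp1).
assert exec_(0, len(mp1) + 0, mp1 + mp2) == exec_(0, 0, mp2)
```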

    Next we consider the third step case, which is concerned with the code genera-
tion for WHILE-statements if loops(r)≠0 and pred(loops(r))≥loops(eval(S,
BODY(wp))), where S is defined as in Fig. 3. Since we formulated the proof of the
WHILE-base case as a lemma, Simplification proves the statement if the loop-
condition is false and returns (using both induction hypotheses) the simplified

(6.18) if(value(WCOND(wp),store(r))=0,
          true) ,
where P abbreviates

(6.19) append(code(BODY(wp)),
                   conc(BRANCH-(size(code(BODY(wp)))),VOID))) .
    Now (appealing to lemma (6.15)), the JUMP+ instruction is eliminated in the
right-hand exec-call of (6.18), and then the EXEC and the BRANCH- instructions
are executed symbolically in the resulting exec-call, yielding the simplified goal-

(6.20) if(value(WCOND(wp),store(r))=0,
          true) .
    Although the call of exec(S,0,P) in (6.20) seems to be a candidate for
applying lemma (6.11) exec stepwise, this lemma is unusable here because
the machine program EXEC(...);BRANCH-(size(code(BODY(wp)))) fails to be
closed-. We therefore need a further lemma, which is obtained as a generaliza-
tion of the equation in (6.20), viz.

(6.21) lemma exec repetition <=
      all mp:MACHINE.PROGRAM, pc:nat, sp:STACK.PROGRAM, r:state

                    = exec(r,pc,append(mp,conc(EXEC(sp),
                      true),true),true)) .

    To avoid an over-generalization of the equation in (6.20), the equation of
(6.21) must be restricted to machine programs being closed-, closed+ and free
of HALT instructions, where these requirements have been recognized by the
same careful analysis undertaken when developing lemma (6.11) exec stepwise.
Having verified (6.21), we instruct the system to use this lemma in (6.20) by Apply
Equation, and this results in the simplified goalterm

(6.22) if(value(WCOND(wp),store(r))=0,
          true) .

    It only remains to unfold the outer exec-call in the left-hand side of the equa-
tion in (6.22). Now Simplification sets the pc to succ(size(code(BODY(wp))))
and then uses lemma (6.15) to eliminate the JUMP+ instruction just executed as
well as to reset the pc to size(code(BODY(wp))). This makes both sides of the
equation identical, thus proving the third step case.
    We conclude with the last step case in which code is generated for WHILE-
statements if loops(r)≠0 and pred(loops(r))<loops(eval(S,BODY(wp))),
where S is defined as in Fig. 3. But this case is impossible, because loops(S)≥
loops(eval(S,...)) by lemma (6.1) eval not increases loops and loops(S)=
pred(loops(r)). Hence the goalterm of this step case simplifies to true, thus
proving this step case and completing the proof of lemma (4.1) code is correct.
     It is interesting to see that we do not get rid of lemma (6.1). Either we need it to
     verify lemma (4.1), or we need it to verify eval’s termination without considering
     the additional requirement (*) controlling the outer eval-call in the WHILE-case of
     Fig. 3, cf. Section 6. In the latter case, (6.1) is not required for subsequent proofs as
     step cases 3 and 4 then merge into one step case.

7      Discussion

Viewed in retrospect, the proofs required to verify (4.1) code is correct were
obtained without that much effort. However, theorem proving sometimes is like
crossing unknown terrain to climb a hill. Viewing the way after reaching the
top, it seems quite obvious how to get there directly, but being on the way
one is faced with dead ends and traps turning the whole event into a
nightmare. In the case of (4.1) code is correct, the crucial steps to success were
(1) the “right” definition of the machine language’s interpreter exec, (2) the
“right” definition of a state, and (3) the invention of the key lemmas (6.11)
exec stepwise, (6.15) exec skip program and (6.21) exec repetition as well
as the key notions needed to formulate them.
    Our first attempt was to define a state without the loops- and the stack-
components, but to care for the termination of exec by limiting the total number
of fetch-calls performed when executing a machine program. This means to use
a definition like function exec(r:state, pc:nat, mp: MACHINE.PROGRAM,
cycles:nat):state <= ..., where cycles is decremented in each recursive
call of exec to enforce termination (and the treatment of the stack is ignored
here for the moment). But this approach requires an additional procedure, say
get.cycles, computing the minimum number of cycles needed to execute a
machine program (if this execution does not result in timeout). This procedure
get.cycles is required to formulate the correctness statement for code, as the
number of loop-bodies of a WHILE-program wp evaluated by eval has to be re-
lated to the number of machine cycles needed to execute code(wp) by exec.
Procedure get.cycles is easily obtained from the definition of exec, but with
this procedure also a bunch of additional lemmas, in particular the get.cycles-
versions of the key lemmas (6.11), (6.15) and (6.21), need to be formulated and
verified.
    Having followed this approach for some time, we gave up and started to
work with another version of exec. This definition was based on the observation
that the number of evaluated loop-bodies in a WHILE-program wp is exactly
the number of BRANCH--calls performed upon the execution of code(wp). We
therefore got rid of get.cycles (and also, what is even more important, of all the
lemmas coming with it) by (renaming cycles by loops and) decrementing loops
only in the recursive exec-calls coming with a JUMP- or a BRANCH--instruction.
     Another obstacle was caused by two faulty address calculations in the code generator
     which were recognized while attempting to prove code is correct.
     get.cycles corresponds to the clock-procedures used in [4], [10], [11], [27], causing
     similar problems there. However, clock can be used to reason about running times,
     and also for proof-technical reasons, as Moore explains (personal communication).
     Of course, this idea depends on our definition of code generation, hence it may
     not transfer to alternative implementations of code. There are two answers to this
     complaint: First of all, it would be rather artificial to use a JUMP- or a BRANCH-
     instruction for some reason other than to implement a loop. And second, the procedure
     exec is a model of some piece of hardware only, where a counter (be it cycles or
     loops) has been added for verification purposes only. As long as a real machine does
     not provide such a counter, we may define it as we like without spoiling the truth of
     statements wrt. the partial correctness of the machine language interpreter.
   However, this approach has problems, too: This time another procedure, say
get.loops, is required to rephrase the equation in (6.11) exec stepwise for this
version of exec, which then would read

 (7.1) exec(exec(r,pc,mp1,loops), 0, mp2, get.loops(r,pc,mp1,loops))
     = exec(r,pc,append(mp1,mp2),loops) .
    Here the procedure get.loops is needed to compute the number of loops
remaining after the execution of mp1 starting at pc in state r, hence this approach
necessitates the same burden of additional lemmas as the get.cycles idea.
    But fortunately, there is an easy remedy to this problem. We simply let
loops become a component of a state rather than a formal parameter of exec,
and then the need for an additional procedure get.loops disappears. As similar
problems with the formulation of (6.11) arise if exec is provided with a formal
parameter s:stack, also the stack becomes a component of a state, and this
motivates our definitions of the data structure state and the procedure exec.
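The resulting shape of a state can be pictured roughly as follows; this is a sketch with field names of our own choosing, and VeriFun's actual data structure declaration may well differ.

```python
# Sketch of a state record in which loops and the stack are components of
# the state itself, rather than extra parameters of exec; field names and
# types are our own guesses, not VeriFun's declaration.

from dataclasses import dataclass, field

@dataclass
class State:
    store: dict                                  # variable bindings
    loops: int                                   # remaining back-leaps allowed
    stack: list = field(default_factory=list)    # operand stack for EXEC

r = State(store={"x": 1}, loops=5)
r.loops -= 1    # exec would decrement loops at a JUMP- or BRANCH- instruction
assert r.loops == 4 and r.stack == []
```

With loops inside the state, the result of exec(r,pc,mp1) already carries the remaining loop count, so no auxiliary get.loops is needed to state a lemma like (6.11).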
    Having settled these definitions, the next problem was to formulate the key
lemmas (6.11), (6.15) and (6.21). It is interesting to see that each lemma cor-
responds directly to a statement of the WHILE-language, viz. the COMPOUND-, the
IF-, and the WHILE-statement. Whereas the control structure of WHILE-programs
is easily captured by the notions of the meta-language, viz. functional compo-
sition, conditionals and recursion, cf. Fig. 3, the control structure of a machine
program mp is encoded in the data (viz. mp) by the JUMP and BRANCH instruc-
tions, cf. Fig. 6. This requires making the control structure of machine programs
explicit in the meta-language, and this is achieved by the key lemmas, whose syn-
tactical similarity with the recursive eval-calls (necessitated by the respective
statements) is obvious. However, this was not obvious to us when we developed
the proof of (4.1) code is correct, and most of the time was spent to analyze
the system’s outcome when it got stuck, to find out which properties of ma-
chine programs are required in order to get the proof through. Upon this work,
the key notions of HALT-free, closed-, and closed+ machine programs were
recognized, and the key lemmas were speculated step by step.
    While the speculation of the key lemmas constituted the main challenge for
us, the proofs of these lemmas constituted the main challenge for the system:
By the rich case-structure of procedure exec, goalterms evolved which are so
huge that the Symbolic Evaluator fails to process them within reasonable time.
A remedy to this problem is to throw in interactively a case analysis (stipulating
the kind of instruction fetch(pc,mp) may yield) which separates the goalterm
into smaller pieces so that the theorem prover can cope with them.
     This case study also revealed a shortcoming of VeriFun’s object language, viz. not
     having let- and case-instructions available. We intend to remove this lack of the
     language in a future version of the system, as this would significantly improve the
     performance of the Symbolic Evaluator.

    But still the system showed performance problems for another reason: Since
the clauses stemming from the verified lemmas and the induction hypotheses
may be used upon symbolic evaluation, system performance would decrease with
an increasing number of clauses. Therefore the system uses a lemma-filter to
throw out all clauses computed from verified lemmas which (heuristically) do
not seem to contribute to a proof. This prevents the Symbolic Evaluator from
being swamped by a huge number of clauses (without introducing the need for
frequent user calls to consider a lemma which - although required - did not pass
the filter), and makes a user command to disable verified lemmas obsolete in
VeriFun. However, induction hypotheses are not considered by the lemma-filter,
simply because induction hypotheses are quite similar to the original statement
thus causing the lemma-filter to let them pass in almost all cases. Now in case
of each key-lemma, 8 induction hypotheses are available in the step case which
separate into 66 clauses having 10–12 literals each. This creates a large search
space for the Symbolic Evaluator and performance decreases significantly. We
therefore instructed the system to disable the induction hypotheses upon sym-
bolic evaluation, and then the system went through the proofs (needing further
advice from time to time). Using this setting, VeriFun succeeded because
after Simplification following the case analysis, the system picked up the right
induction hypotheses with the Use Lemma rule, and a subsequent Simplification
then proved the case. Viewed from the structure of the proofs, the key-lemmas
showed nothing spectacular as the proofs (which are quite similar) consist of
large sequences of (interactive) calls for Case Analysis and (automated) calls of
Simplification and Use Lemma only.
    In addition, a bunch of “routine” lemmas had been created (expressing, for
example, distributivity of various procedures over append, etc.), whose need,
however, was immediately obvious from the system’s outcome and whose proofs
needed an interaction in very rare cases only.
    Fig. 8 gives an account of VeriFun’s automation degree measured in terms of
prooftree edits. For the whole case study, 13 data structures and 22 procedures
were defined to formulate 56 lemmas, whose verification required 1038 prooftree
edits in total, where only 186 of them had to be suggested interactively.
    The Symbolic Evaluator computed proofs with a total number of 64750
rewrite steps within 33 minutes running time, where the longest subproof of
1259 steps had been created for lemma (6.21) exec repetition.
    The value for the automated calls of Use Lemma is unusually high as com-
pared to other case studies performed with VeriFun, e.g. [19], [25], which is
caused by the fact that the induction hypotheses had been disabled upon sym-
bolic evaluation of the key lemmas so that Use Lemma could succeed following
Simplification. Whereas the Induction rule performed perfectly here, the values
for Unfold Procedure and for Case Analysis are unusually high, which reflects the
need for frequent interactive calls for symbolic execution of machine programs
when proving (4.1), and also reflects the separation into subcases needed for the
proofs of the key lemmas.
     As a matter of fact, we never saw the need for disabling induction hypotheses before.
     Induction hypotheses are treated like lemmas in VeriFun, except that induction
     hypotheses are only locally available, i.e. for a specific proof, whereas lemmas are
     globally available, i.e. for all proofs.
     The values are computed for VeriFun 2.6.1 under Windows XP running JAVA
     1.4.1_01 on a 2.2 GHz Pentium 4 PC. The 3 interactions to disable induction
     hypotheses are not listed in Fig. 8.

     Fig. 8. Proof statistics obtained for the verification of the code generator
    In total, 82.1% of the required prooftree edits had been computed by ma-
chine, a number which (although it is not as good as the values encountered
in other cases) we consider good enough to provide significant support for
computer-aided verification. With the key notions for machine programs and
the key lemmas for the machine language interpreter, a clear and illuminating
structure for the proof of the main statement evolved, which lacks formal clutter
and therefore provides a useful base to illustrate the rôle of formal semantics
and the benefits of computer-aided verification in the classroom.
Acknowledgement We are grateful to Markus Aderhold for useful comments.


References

 2. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques and
    Tools. Addison-Wesley, New York, 1986.
 3. A. Dold and V. Vialard. A mechanically verified compiling specification for a Lisp
    compiler. In R. Hariharan, M. Mukund, and V. Vinay, editors, FST TCS 2001:
    Foundations of Software Technology and Theoretical Computer Science, volume
    2245 of Lect. Notes in Comp. Sc., pages 144–155, 2001.
 4. A. D. Flatau. A Verified Implementation of an Applicative Language with Dynamic
    Storage Allocation. PhD thesis, Univ. of Texas, 1992.
 5. J. Giesl. Termination of Nested and Mutually Recursive Algorithms. Journal of
    Automated Reasoning, 19:1–29, 1997.
 6. W. Goerigk, A. Dold, T. Gaul, G. Goos, A. Heberle, H. von Henke, U. Hoffmann,
    H. Langmaack, and W. Zimmermann. Compiler correctness and implementation
    verification: The Verifix approach. In P. Fritzson, editor, Proc. of the Poster Ses-
    sion of CC'96 - Intern. Conf. on Compiler Construction, pages 65–73, 1996.
 7. C. A. Gunter. Semantics of Programming Languages – Structures and Techniques.
    The MIT Press, Cambridge, 1992.
 8. J. Hannan and F. Pfenning. Compiler verification in LF. In A. Scedrov, editor,
    Proceedings of the Seventh Annual IEEE Symposium on Logic in Computer Science,
    pages 407–418. IEEE Computer Society Press, 1992.
 9. J. McCarthy and J. A. Painter. Correctness of a Compiler for Arithmetical Ex-
    pressions. In J. T. Schwartz, editor, Proc. on a Symp. in Applied Math., 19, Math.
    Aspects of Comp. Sc. American Math. Society, 1967.
10. J. S. Moore. A Mechanically Verified Language Implementation. Journal of Auto-
    mated Reasoning, 5(4):461–492, 1989.
11. J. S. Moore. PITON - A Mechanically Verified Assembly-Level Language. Kluwer
    Academic Publishers, Dordrecht, 1996.
12. J. S. Moore. An exercise in graph theory. In M. Kaufmann, P. Manolios, and
    J. S. Moore, editors, Computer-Aided Reasoning: ACL2 Case Studies, pages 41–
    74, Boston, MA., 2000. Kluwer Academic Press.
13. H. R. Nielson and F. Nielson. Semantics with Applications. John Wiley and Sons,
    New York, 1992.
14. P. Curzon. A verified compiler for a structured assembly language. In M. Archer,
    J. J. Joyce, K. N. Levitt, and P. J. Windley, editors, International Workshop on
    Higher Order Logic Theorem Proving and its Applications, pages 253–262, Davis,
    California, 1991. IEEE Computer Society Press.
15. G. Schellhorn and W. Ahrendt. The WAM case study: Verifying compiler correct-
    ness for Prolog with KIV. In W. Bibel and P. H. Schmidt, editors, Automated
    Deduction: A Basis for Applications. Volume III, Applications. Kluwer Academic
    Publishers, Dordrecht, 1998.
16. C. Walther. On Proving the Termination of Algorithms by Machine. Artificial
    Intelligence, 71(1):101–157, 1994.
17. C. Walther. Criteria for Termination. In S. Hölldobler, editor, Intellectics and Com-
    putational Logic, pages 361–386. Kluwer Academic Publishers, Dordrecht, 2000.
18. C. Walther. Semantik und Programmverifikation. Teubner-Wiley, Leipzig, 2001.
19. C. Walther and S. Schweitzer. A Machine Supported Proof of the Unique Prime
    Factorization Theorem. Technical Report VFR 02/03, Programmiermethodik,
    Technische Universität Darmstadt, 2002.
20. C. Walther and S. Schweitzer. The VeriFun Tutorial. Technical Report VFR
    02/04, Programmiermethodik, Technische Universität Darmstadt, 2002.
21. C. Walther and S. Schweitzer. VeriFun User Guide. Technical Report VFR 02/01,
    Programmiermethodik, Technische Universität Darmstadt, 2002.
22. C. Walther and S. Schweitzer. Verification in the Classroom. Technical Report
    VFR 02/05, Programmiermethodik, Technische Universität Darmstadt, 2002.
23. C. Walther and S. Schweitzer. About VeriFun. In F. Baader, editor, Proc. of the
    19th Intern. Conf. on Automated Deduction (CADE-19), volume 2741 of Lecture
    Notes in Artificial Intelligence, pages 322–327, Miami Beach, 2003. Springer-Verlag.
24. C. Walther and S. Schweitzer. Verification in the Classroom. To appear in Journal
    of Automated Reasoning - Special Issue on Automated Reasoning and Theorem
    Proving in Education, pages 1–43, 2003.
25. C. Walther and S. Schweitzer. A Verification of Binary Search. In D. Hutter and
    W. Stephan, editors, Mechanizing Mathematical Reasoning: Techniques, Tools and
    Applications, volume 2605 of LNAI, pages 1–18. Springer-Verlag, 2003.
26. G. Winskel. The Formal Semantics of Programming Languages. The MIT Press,
    Cambridge, 1993.
27. W. D. Young. A Mechanically Verified Code Generator. Journal of Automated
    Reasoning, 5(4):493–518, 1989.