A Machine-Verified Code Generator
Christoph Walther and Stephan Schweitzer
Technische Universität Darmstadt
Abstract. We consider the machine-supported veriﬁcation of a code
generator computing machine code from WHILE-programs, i.e. abstract
syntax trees which may be obtained by a parser from programs of an
imperative programming language. We motivate the representation of
states developed for the veriﬁcation, which is crucial for success, as the
interpretation of tree-structured WHILE-programs diﬀers signiﬁcantly in
its operation from the interpretation of the linear machine code. This
work has been developed for a course to demonstrate to the students
the support gained by computer-aided veriﬁcation in a central subject of
computer science, boiled down to the classroom-level. We report about
the insights obtained into the properties of machine code as well as the
challenges and eﬀorts encountered when verifying the correctness of the
code generator. We also illustrate the performance of the XeriFun system
that was used for this work.
1 Introduction

We develop the XeriFun system, a semi-automated system for the ver-
iﬁcation of programs written in a functional programming language. One rea-
son for this development originates from our experiences when teaching Formal
Methods, Automated Reasoning, Semantics, Veriﬁcation, and similar subjects.
As the motivation of the students largely increases when they can gather prac-
tical experiences with the principles and methods taught, XeriFun has been
developed as a small, highly portable system with an elaborated user interface
and a simple base logic, which nevertheless allows the students to perform ambi-
tious veriﬁcation case studies within the restricted time frame of a course. The
system has been used in practical courses at the graduate level for proving e.g.
the correctness of a ﬁrst-order matching algorithm, the RSA public key encryp-
tion algorithm and the unsolvability of the Halting Problem, as well as recently
in an undergraduate course about Algorithms and Data Structures, where more
than 400 students took their ﬁrst steps in computer-aided veriﬁcation of simple
statements about Arithmetic and Linear Lists and the veriﬁcation of algorithms
like Insertion Sort and Mergesort. XeriFun comes as a JAVA ap-
plication which the students can run on their home PC (whatever platform it
may use) after a 1 MB download to work with the system whenever they like to.
Technical Report VFR 03/01
This paper is concerned with the veriﬁcation of a code generator for a sim-
ple imperative language. Work on veriﬁed code generators and compilers dates
back more than 35 years. With the development of elaborated logics and the
evolving technology of theorem proving over the years, systems have been
developed that provide remarkable support for compiler verification as well.
Various impres-
sive projects have been carried out which demonstrate well the beneﬁts of certain
logical frameworks and their implementation by reasoning systems in this do-
main. Meanwhile a tremendous amount of literature exists, which precludes an
exhaustive account. For example, one case study uses the Elf language, another
uses the HOL system to verify a compiler for an assembly language, two further
projects report on compiler verification for a subset of CommonLisp using PVS,
and another verifies a compiler for Prolog with the KIV system. Much work also
centers around the Boyer-Moore prover and its successors; in one
of the largest such projects, the compilation of an imperative programming language
via an assembly language down to machine code is verified.
However, the high performance of these systems also comes at the price
of highly elaborated logics and complicated user interfaces, which makes their
use diﬃcult for teaching within the restricted time frame of a course (if it is
not impossible at all). Furthermore, as almost all of the cited work is concerned
with real programming languages and the bits-and-pieces coming with them, it
is hard to work out the essential principles and problems from the presentations
to demonstrate them in the classroom. And last but not least, it is also diﬃcult
(in particular for the students) to assess the eﬀort needed when using a certain
tool, as most of the papers do not provide appropriate statistics but refer to
large proof scripts in an appendix or on the web for further details.
The work presented here was prepared, in addition to material given elsewhere,
for a course about Semantics and Program Verification to illustrate the
principles of state-based semantics and the practical use of formal semantics
when developing compilers etc. However, the main focus is to demonstrate the
support gained by computer-aided veriﬁcation in a central subject of computer
science education, boiled down to the classroom-level. The code generator com-
putes machine code from abstract syntax trees as used in standard textbooks
of formal semantics. We report on the insights obtained
into the properties of machine code as well as the challenges and eﬀorts en-
countered when verifying the correctness of this program. We also illustrate the
performance of the XeriFun system that was used for this work.
2 WHILE-Programs
The language of WHILE-programs consists of conditional statements, while-loops,
assignments, compound statements and statements for doing nothing, and is de-
ﬁned by the data structure WHILE.PROGRAM in Fig. 1. WHILE-programs represent
abstract syntax trees which for instance are computed by a compiler from a
program conforming to the concrete syntax of a programming language to be
structure VARIABLE <= VAR(ADR:nat)
structure nullary.operator <= FALSE, TRUE, CONSTANT(number:nat)
structure unary.operator <= NOT, EVEN, DIV2
structure binary.operator <= AND, OR, PLUS, MINUS, TIMES, EQ, GT, LE
structure EXPR <= ...
structure WHILE.PROGRAM <= ...
Fig. 1. The languages of Expressions and WHILE-programs
Fig. 2. A WHILE-program computing the gcd
available for subsequent code generation, cf. e.g. . The language of WHILE-
programs uses program variables and expressions built with arithmetical and
logical operators, which are deﬁned by further data structures, cf. Fig. 1.1 For
instance, the WHILE-program of Fig. 2 computes the gcd of the variables VAR1
and VAR2 and returns the result in the program variable VAR1.
Following the standard approach for the deﬁnition of a (structural) oper-
ational semantics of WHILE-programs, e.g. , , , we start by provid-
ing an operational semantics for the expressions EXPR: Given a data structure
association which associates a VARIABLE with a natural number, we deﬁne a
data structure memory organized as a linear list of associations and a proce-
dure function assignment(v:VARIABLE, m:memory):nat <= ... which re-
turns the assignment of a variable v wrt. memory m (or returns 0 in case that
no assignment for v exists in m). Then a procedure function value(e:EXPR,m:
memory):nat <= ... is used to compute the value of an expression wrt. the
Neither the language of expressions nor the language of WHILE-programs respects
types, i.e. boolean expressions may be used in places where only arithmetical expres-
sions are meaningful and vice versa. Of course, one may take care of correct typing by
an appropriate modiﬁcation of the deﬁnitions in Fig. 1 without complicating subse-
quent proofs. However, we do not follow this idea, because (1) our code generator is
also correct for ill-typed expressions, and (2) type checking is performed by a com-
piler before the computation of an abstract syntax tree, so that ill-typed expressions
are never available for code generation.
structure state <= timeout, triple(loops:nat,store:memory,stack:Stack)
function eval(r:state, wp:WHILE.PROGRAM):state <=
if r=timeout then r
else if wp=WHILE(WCOND(wp),BODY(wp))
     then if value(WCOND(wp),store(r))=0
          then if loops(r)=0
               then timeout
               else if pred(loops(r))≥loops(eval(S,BODY(wp)))
                    then eval(eval(S,BODY(wp)),wp)
                    else r fi fi
          else r fi
else if wp=SET(CELL(wp),TERM(wp)) then ...
else if wp=COMPOUND(LEFT(wp),RIGHT(wp)) then eval(eval(r,LEFT(wp)),RIGHT(wp))
else if wp=IF(ICOND(wp),THEN(wp),ELSE(wp))
     then if value(ICOND(wp),store(r))=0
          then eval(r,THEN(wp)) else eval(r,ELSE(wp)) fi
else r fi fi fi fi fi
where S abbreviates triple(pred(loops(r)),store(r),stack(r))
Fig. 3. An interpreter eval for WHILE-programs
assignments in memory m and the semantics provided for the operators, thus
deﬁning an operational semantics for the language of expressions. The procedure
value calls assignment to obtain the number assigned to a program variable by
memory m, returns the number denoted by a CONSTANT and returns 0 for TRUE
and 1 for FALSE.2 In all other cases, the operation corresponding to the expres-
sion’s operator is applied to the values of the expression’s arguments computed
recursively by value. E.g., the computation of value(EXPR2(PLUS,e1,e2),m)
yields call-PLUS(value(e1,m),value(e2,m)), where call-PLUS and similar
procedures deﬁning the semantics of the other operators are given elsewhere.3
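The evaluation of expressions can be sketched in Python (a hypothetical rendering of assignment and value; the constructor names mirror Fig. 1, truth values are encoded as naturals with 0 for true and 1 for false as in footnote 2, and cut-off subtraction on naturals is our assumption):

```python
# Expressions as nested tuples, e.g. ('EXPR2', 'MINUS', ('VAR', 1), ('VAR', 2));
# a memory is a linear list of (variable, nat) associations as in Section 2.

def assignment(v, m):
    """Assignment of variable v w.r.t. memory m; 0 if no association exists."""
    for var, n in m:
        if var == v:
            return n
    return 0

# Semantics of the operators (call-PLUS etc. in the paper); truth values are
# encoded as naturals: 0 means true, 1 means false (cf. footnote 2).
OPS1 = {'NOT':  lambda a: 0 if a != 0 else 1,
        'EVEN': lambda a: 0 if a % 2 == 0 else 1,
        'DIV2': lambda a: a // 2}
OPS2 = {'PLUS':  lambda a, b: a + b,
        'MINUS': lambda a, b: max(a - b, 0),   # cut-off subtraction (assumed)
        'TIMES': lambda a, b: a * b,
        'EQ':    lambda a, b: 0 if a == b else 1,
        'GT':    lambda a, b: 0 if a > b else 1,
        'LE':    lambda a, b: 0 if a <= b else 1,
        'AND':   lambda a, b: 0 if a == 0 and b == 0 else 1,
        'OR':    lambda a, b: 0 if a == 0 or b == 0 else 1}

def value(e, m):
    """Operational semantics of expressions: the value of e w.r.t. memory m."""
    tag = e[0]
    if tag == 'VAR':
        return assignment(e[1], m)
    if tag == 'CONSTANT':
        return e[1]
    if tag == 'TRUE':
        return 0
    if tag == 'FALSE':
        return 1
    if tag == 'EXPR1':
        return OPS1[e[1]](value(e[2], m))
    return OPS2[e[1]](value(e[2], m), value(e[3], m))   # 'EXPR2'
```

For instance, value(('EXPR2', 'PLUS', e1, e2), m) applies the PLUS operation to the recursively computed argument values, mirroring the call-PLUS example above.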
An operational semantics for WHILE-programs is given by the interpreter eval
which maps a program state and a WHILE-program to a program state, cf. Fig.
3. A program state is either the symbol timeout (denoting a non-terminating
interpretation of a WHILE-program) or is a triple consisting of a counter loops,
Truth values are encoded by natural numbers, where natural numbers are predeﬁned
in XeriFun by the data structure structure nat <= 0, succ(pred:nat).
The procedure call-PLUS and all other procedures providing the semantics of the
operators remain undeﬁned here, because the correctness of the code generator does
not depend on the semantics of the operators.
structure STACK.PROGRAM <= ...
structure INSTRUCTION <=
NOOP, HALT, EXEC(prog:STACK.PROGRAM), LOAD(loc:VARIABLE),
JUMP+(displ.J+:nat), BRANCH-(displ.B-:nat), ...
Fig. 4. Stack Programs and Machine Instructions
a memory store and a stack stack. The store holds the current variable as-
signments under which expressions are evaluated by value (when executing a
WHILE-, a SET- or an IF-statement), and which is updated when executing a
SET-statement. The rôle of the counter loops is discussed in Sections 6 and 7.
Also the stack is used for subsequent developments, cf. Section 3, and may be
ignored here as eval does not consider this component of a triple at all.
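The behaviour of eval can be sketched in Python (a hypothetical rendering; a state is the symbol timeout or a (loops, store) pair, the stack component being ignored just as eval ignores it; the store is a dict for brevity; 0 encodes true; the rendering of the gcd program of Fig. 2 below is one plausible reading, not a verbatim copy):

```python
TIMEOUT = 'timeout'

def value(e, m):
    """Minimal expression evaluator (cf. Section 2); 0 encodes true."""
    tag = e[0]
    if tag == 'VAR':
        return m.get(e[1], 0)
    if tag == 'EXPR1':                        # 'NOT' only, for brevity
        return 0 if value(e[2], m) != 0 else 1
    a, b = value(e[2], m), value(e[3], m)     # 'EXPR2'
    if e[1] == 'EQ':
        return 0 if a == b else 1
    if e[1] == 'GT':
        return 0 if a > b else 1
    return max(a - b, 0)                      # 'MINUS', cut-off subtraction

def eval_wp(r, wp):
    """Interpreter for WHILE-programs, cf. Fig. 3.  The loops counter bounds
    the number of loop iterations and thereby makes the interpreter total."""
    if r == TIMEOUT:
        return r
    loops, store = r
    tag = wp[0]
    if tag == 'SKIP':
        return r
    if tag == 'SET':                          # SET(CELL, TERM)
        new = dict(store)
        new[wp[1]] = value(wp[2], store)
        return (loops, new)
    if tag == 'COMPOUND':                     # COMPOUND(LEFT, RIGHT)
        return eval_wp(eval_wp(r, wp[1]), wp[2])
    if tag == 'IF':                           # IF(ICOND, THEN, ELSE)
        return eval_wp(r, wp[2] if value(wp[1], store) == 0 else wp[3])
    # WHILE(WCOND, BODY)
    if value(wp[1], store) != 0:              # loop condition false: done
        return r
    if loops == 0:                            # iteration budget exhausted
        return TIMEOUT
    return eval_wp(eval_wp((loops - 1, store), wp[2]), wp)

# One plausible rendering of the gcd program of Fig. 2:
gcd = ('WHILE', ('EXPR1', 'NOT', ('EXPR2', 'EQ', ('VAR', 1), ('VAR', 2))),
       ('IF', ('EXPR2', 'GT', ('VAR', 1), ('VAR', 2)),
        ('SET', 1, ('EXPR2', 'MINUS', ('VAR', 1), ('VAR', 2))),
        ('SET', 2, ('EXPR2', 'MINUS', ('VAR', 2), ('VAR', 1)))))
```

With a sufficiently large loops budget the result is a proper state holding the gcd in VAR(1); with loops = 0 the interpreter returns timeout, illustrating why eval is total.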
3 Machine Programs
The Stack Machine. Our target machine consists of a program store, a ran-
dom access memory, a stack and a program counter pc. Arithmetical and logical
operations are performed by a subdevice of the target machine, called the stack
machine. The stack machine may push some content of the memory onto the
stack and performs arithmetical and logical operations by fetching operands from
and returning results to the stack. The operation of the stack machine is con-
trolled by so-called stack programs. The target machine provides an instruction
EXEC which calls the stack machine to run the stack program provided by the
parameter of EXEC.
Stack programs are ﬁnite sequences of PUSH-commands and are deﬁned by
the data structure STACK.PROGRAM of Fig. 4. An operational semantics for stack
programs is given by procedure function run(sp:STACK.PROGRAM,s:Stack,
m:memory):Stack <= ... which interprets the commands of a stack program
step by step: When executing a stack program PUSH.VAR(v, sp), the number
assigned to program variable v by memory m (obtained by assignment) is pushed
onto stack, and when executing PUSH.OP0(c, sp), the value denoted by constant
c is pushed onto stack. In all other cases operands are fetched from and the
result of the operation is pushed onto stack. For instance, stack s is replaced by
push(call-PLUS(top(pop(s)),top(s)),pop(pop(s))) when executing a stack
program PUSH.OP2(PLUS, sp), where call-PLUS and the procedures deﬁning the
semantics of the other operators are the same as for value.
Having executed a PUSH-command, run proceeds with the execution of the
remaining stack program sp.
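The step-by-step interpretation of stack programs can be sketched in Python (a hypothetical rendering; the stack grows towards the end of a list, the memory is a dict, and only a few operators are spelled out):

```python
def run(sp, s, m):
    """Interpreter for stack programs (finite lists of PUSH commands),
    cf. Section 3.  s is the stack (top = last element), m a memory mapping
    variable addresses to naturals."""
    OPS2 = {'PLUS':  lambda a, b: a + b,
            'MINUS': lambda a, b: max(a - b, 0),
            'TIMES': lambda a, b: a * b}
    for cmd in sp:
        tag = cmd[0]
        if tag == 'PUSH.VAR':        # push the assignment of a variable
            s = s + [m.get(cmd[1], 0)]
        elif tag == 'PUSH.OP0':      # push the value denoted by a constant
            s = s + [cmd[1]]
        else:                        # 'PUSH.OP2': fetch operands, push result
            a, b = s[-2], s[-1]
            s = s[:-2] + [OPS2[cmd[1]](a, b)]
    return s
```

Note that the first operand is the second element from the top, matching the push(call-PLUS(top(pop(s)),top(s)),pop(pop(s))) scheme above, and that an initial stack below the pushed results is left untouched, which is the content of lemma (6.4) in Section 6.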
Syntax and Semantics of Machine Programs. Besides the EXEC instruc-
tion, the target machine also provides a LOAD instruction to write the top-of-stack
Fig. 5. A machine program computed by code for the WHILE-program of Fig. 2
to a designated address of the memory, two unconditional jump instructions
JUMP+ and JUMP- to move the pc forward or backward in the program store,
two conditional jump instructions BRANCH+ and BRANCH- which are controlled
by the top-of-stack, a HALT instruction which halts the target machine, and a
NOOP instruction which does nothing (except incrementing the pc). The instruc-
tion set is formally deﬁned by the data structure INSTRUCTION of Fig. 4, and a
MACHINE.PROGRAM is simply a linear list of instructions, where VOID denotes the
empty program, begin yields the ﬁrst instruction and end denotes the machine
program obtained by removing the ﬁrst instruction. Fig. 5 gives an example of
a machine program.4
An operational semantics of machine programs is deﬁned by procedure exec
of Fig. 6.5 This interpreter uses a procedure fetch to fetch the instruction to
which the program counter pc points in the machine program mp. The interpreter
exec returns the input state if called in the state timeout, called with an empty
machine program, with a HALT instruction or if the pc is not within the address
space [0, ..., size(mp) − 1] of the machine program mp, where size(mp) computes
the number of instructions in mp.
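The fetch&execute behaviour of exec can be sketched in Python (a hypothetical rendering; the displacement convention is our reading of the proofs in Section 6, namely that a forward leap JUMP+(d)/BRANCH+(d) advances the pc by d+1 and a back leap JUMP-(d)/BRANCH-(d) decreases it by d+1, with back leaps consuming the loops counter):

```python
TIMEOUT = 'timeout'

def run(sp, s, m):
    """Minimal stack-machine interpreter (cf. Section 3)."""
    ops = {'MINUS': lambda a, b: max(a - b, 0),
           'EQ':    lambda a, b: 0 if a == b else 1}
    for cmd in sp:
        if cmd[0] == 'PUSH.VAR':
            s = s + [m.get(cmd[1], 0)]
        elif cmd[0] == 'PUSH.OP0':
            s = s + [cmd[1]]
        else:                                  # 'PUSH.OP2'
            s = s[:-2] + [ops[cmd[1]](s[-2], s[-1])]
    return s

def exec_mp(r, pc, mp):
    """Fetch&execute interpreter for machine programs, cf. Fig. 6.  A state is
    TIMEOUT or a triple (loops, store, stack); the input state is returned for
    an empty program, on HALT, or when pc leaves the address space."""
    while r != TIMEOUT and 0 <= pc < len(mp):
        loops, store, stack = r
        ins = mp[pc]
        tag = ins[0]
        if tag == 'HALT':                      # halt the target machine
            return r
        if tag == 'NOOP':
            pc += 1
        elif tag == 'EXEC':                    # call the stack machine
            r = (loops, store, run(ins[1], stack, store))
            pc += 1
        elif tag == 'LOAD':                    # write top-of-stack to memory
            store = dict(store)
            store[ins[1]] = stack[-1]
            r = (loops, store, stack)
            pc += 1
        elif tag == 'JUMP+':
            pc += ins[1] + 1
        elif tag == 'BRANCH+':                 # forward branch if top = 0
            pc += ins[1] + 1 if stack[-1] == 0 else 1
        elif tag == 'JUMP-' or stack[-1] == 0: # back leap taken
            if loops == 0:                     # budget exhausted: timeout
                return TIMEOUT
            r = (loops - 1, store, stack)
            pc -= ins[1] + 1
        else:                                  # 'BRANCH-' not taken
            pc += 1
    return r
```

Decrementing the loops counter on every back leap is what makes this interpreter total, mirroring the termination argument for exec in Section 6.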
To ease readability, the list structure of machine programs is omitted in Fig. 5. E.g.
we write LOAD(VAR(1));JUMP+(1);NOOP for the last three instructions of the ma-
chine program instead of the formally required conc(LOAD(VAR(1)),conc(JUMP+(1),
conc(NOOP,VOID)...). Also the addresses [. . .] are not part of the machine program,
but are added for illustration purposes only.
For lack of space, some instructions are omitted in Fig. 6. The execution of NOOP
results in a recursive call of exec with incremented pc and unchanged state, HALT
terminates the execution of exec and returns the input state, BRANCH+ is imple-
mented like JUMP+, however controlled by the top-of-stack like BRANCH-, and JUMP-
is implemented like BRANCH-, however without being controlled by the top-of-stack.
function exec(r:state, pc:nat, mp:MACHINE.PROGRAM):state <=
if r=timeout then r
else if mp=VOID then r
else if pc>size(end(mp)) then r
else if fetch(pc,mp)=LOAD(loc(fetch(pc,mp))) then ...
else if fetch(pc,mp)=EXEC(prog(fetch(pc,mp))) then ...
else if fetch(pc,mp)=BRANCH-(displ.B-(fetch(pc,mp)))
     then if top(stack(r))=0
          then if succ(displ.B-(fetch(pc,mp)))>pc
               then r
               else if loops(r)=0
                    then timeout
                    else ... fi fi
          else ... fi
else if fetch(pc,mp)=JUMP+(displ.J+(fetch(pc,mp))) then ...
else ... fi ... fi
function postfix(e:EXPR):STACK.PROGRAM <=
if e=... then ...
else if e=EXPR0(e-op0(e)) then ...
else if e=EXPR1(e-op1(e),arg(e)) then ...
else ... fi fi fi

function code(wp:WHILE.PROGRAM):MACHINE.PROGRAM <=
if wp=... then ...
else if wp=IF(ICOND(wp),THEN(wp),ELSE(wp)) then ...
else if wp=WHILE(WCOND(wp),BODY(wp)) then ...
else if wp=SET(CELL(wp),TERM(wp)) then ...
else ... fi fi fi fi
Fig. 7. Generation of stack code and machine code
4 Code Generation
We aim at deﬁning a procedure code which generates a machine program from
a WHILE-program such that both programs compute the same function. To do
so, we start by generating stack programs from the expressions used in a WHILE-
program. This is achieved by procedure postfix of Fig. 7, which simply trans-
lates the tree-structure of expressions into the linear format of stack programs.6
E.g., the expression EXPR2(MINUS,VAR@(1),VAR@(2)) is translated by postfix
to the stack program PUSH.VAR(VAR(1),PUSH.VAR(VAR(2),PUSH.OP2(MINUS,...))).
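The translation of expression trees into linear stack programs is a postorder traversal and can be sketched in Python (a hypothetical rendering with the tuple encoding of expressions used earlier):

```python
def postfix(e):
    """Translate an expression tree into a linear stack program, cf. Fig. 7:
    code for the arguments first, then the command for the operator."""
    tag = e[0]
    if tag == 'VAR':
        return [('PUSH.VAR', e[1])]
    if tag == 'CONSTANT':
        return [('PUSH.OP0', e[1])]
    if tag in ('TRUE', 'FALSE'):               # truth values as 0/1 (footnote 2)
        return [('PUSH.OP0', 0 if tag == 'TRUE' else 1)]
    if tag == 'EXPR1':
        return postfix(e[2]) + [('PUSH.OP1', e[1])]
    # 'EXPR2': code for both arguments, then the binary operator
    return postfix(e[2]) + postfix(e[3]) + [('PUSH.OP2', e[1])]
```

Applied to the MINUS expression above, this yields exactly the three-command stack program of the example.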
Using the stack code generation for expressions, procedure code of Fig. 7
deﬁnes the machine code generation for WHILE-programs in a straightforward
recursive way, where the recursively computed code is embedded into machine
instructions in order to translate the control structure of the statement under
The call of procedure extend in procedure postfix concatenates stack programs and
the call of procedure append in procedure code concatenates machine programs.
consideration. Fig. 5 displays the result of code when applied to the WHILE-
program of Fig. 2.
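The recursive code generation can be sketched in Python. The layout of the IF- and WHILE-translations below (instruction order and displacements) is our reconstruction from the proof discussion in Section 6, not a verbatim copy of Fig. 7, and assumes the displacement convention of the earlier exec sketch:

```python
def postfix(e):
    """Abbreviated stack-code generation (variables and binary operators)."""
    if e[0] == 'VAR':
        return [('PUSH.VAR', e[1])]
    return postfix(e[2]) + postfix(e[3]) + [('PUSH.OP2', e[1])]

def code(wp):
    """Machine-code generation for WHILE-programs, cf. Fig. 7.  The embedding
    of the recursively computed code into leap instructions translates the
    control structure of each statement."""
    tag = wp[0]
    if tag == 'SKIP':
        return [('NOOP',)]
    if tag == 'SET':                     # evaluate TERM, store into CELL
        return [('EXEC', postfix(wp[2])), ('LOAD', wp[1])]
    if tag == 'COMPOUND':                # code for LEFT, then code for RIGHT
        return code(wp[1]) + code(wp[2])
    if tag == 'IF':                      # test; branch over ELSE-code on true
        thn, els = code(wp[2]), code(wp[3])
        return ([('EXEC', postfix(wp[1])), ('BRANCH+', len(els) + 1)]
                + els + [('JUMP+', len(thn))] + thn)
    # 'WHILE': leap over the body to the test; branch back while true
    body = code(wp[2])
    return ([('JUMP+', len(body))] + body
            + [('EXEC', postfix(wp[1])), ('BRANCH-', len(body))])
```

Run with the exec sketch of Section 3, such code computes the same store as eval, which is exactly what the correctness property below states.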
The correctness property for code is formally stated by
(4.1) lemma code is correct <= all wp:WHILE.PROGRAM, r:state
eval(r,wp)=exec(r,0,code(wp))
expressing that eval returns the same state upon the evaluation of a WHILE-
program wp as exec returns when applied in the same initial state r to the
machine program resulting from the translation of wp computed by code.
5 About XeriFun
We intend to prove (4.1) code is correct with XeriFun. In a typical session
with this system, a user deﬁnes a program by stipulating the data structures and
the procedures of the program, deﬁnes statements about these program elements
and veriﬁes these statements and the termination of the procedures.
XeriFun consists of several automated routines for theorem proving and for
the formation of hypotheses to support veriﬁcation. It is designed as an inter-
active system, where, however, the automated routines substitute the human
expert in striving for a proof until they fail. In such a case, the user may step in
to guide the system for a continuation of the proof.
When called to prove a statement, the system computes a prooftree. An
interaction, which may be required when the construction of the prooftree gets
stuck, is to instruct the system to use a proof rule, i.e.
— to perform a case analysis,
— to use an instance of a lemma or an induction hypothesis,
— to unfold a procedure call,
— to apply an equation,
— to use an induction axiom, or
— to insert, move or delete some hypothesis in the sequent of a proof-node.
For simplifying proof goals, a further set of so-called computed proof rules
is provided. For example, the Simpliﬁcation rule rewrites a goalterm using the
deﬁnitions of the data structures and the procedures, the hypotheses and the
induction hypotheses of the proof-node sequent and the lemmas already veri-
ﬁed. The other computed proof rules perform a similar rewrite, however with
restricted performance. The computed proof rules are implemented by the Sym-
bolic Evaluator, i.e. an automated theorem prover over which the XeriFun user
has no control. The Symbolic Evaluator can also be used for program tests to
evaluate instances of lemmas or to “run” procedures.7
For instance, the machine program of Fig. 5 has been computed by the Symbolic
Evaluator when called to apply code to the WHILE-program of Fig. 2. The Symbolic
Evaluator may also compute the gcd of two numbers N and M by either calling
XeriFun provides no control commands (except disabling induction hypothe-
ses upon symbolic evaluation), thus leaving the proof rules as the only means
for the user to control the system’s behavior. The symbolic evaluations and all
proofs computed by the system may be inspected by the user.
Having applied a user suggested proof rule, the system takes over control
again and tries to develop the prooftree further until it gets stuck once more etc.
or it eventually succeeds. In addition, it may be necessary to formulate (and to
prove) an auxiliary lemma (sometimes after providing a new deﬁnition) in order
to complete the actual proof task.
XeriFun demands that the termination of each procedure that is called in
a statement be veriﬁed before a proof of the statement can be started. There-
fore the system's automated termination analysis is activated immediately
after the deﬁnition of a recursively deﬁned procedure. If the automated termi-
nation analysis fails, the user has to tell the system useful termination functions
represented by (sequences of) so-called measure terms. Based on this hint, the
system computes termination hypotheses that are suﬃcient for the procedure’s
termination and then need to be veriﬁed like any other statement.
An introduction to the use of the system, a short survey, and a detailed
account of the system's operation and its logical foundations are given in the
accompanying literature.
6 A Machine Veriﬁcation of code
Termination. XeriFun's automated termination analysis verifies termination
of all procedures upon deﬁnition, except for eval and exec. The interpreter
eval terminates because upon each recursive call either the size of the program
decreases or otherwise remains unchanged (for the outer eval-call in the WHILE-
case), but the loops-counter decreases. But the system is unable to recognize this
argumentation by itself and must be provided with the pair of termination func-
tions λwp.|wp|, λr.loops(r), causing the system to prove termination by the lexi-
cographic relation (wp1, r1) >eval (wp2, r2) iff |wp1| > |wp2|, or |wp1| = |wp2| and
loops(r1) > loops(r2). Hence for the outer eval-call in the WHILE-case, the proof
obligation loops(r)>loops(eval(S,BODY(wp))) is obtained, which is trivially
verified as loops(r)≠0 and (*) pred(loops(r))≥loops(eval(S,BODY(wp)))
must hold, cf. Fig. 3.
(Continuation of footnote 7:) eval with the WHILE-program of Fig. 2 or exec with
the machine program of Fig. 5, where N and M are assigned to the program variables
VAR(1) and VAR(2) in the memory of the state which is used when calling one of
the interpreters.

Note that requirement (*) controlling the outer recursive call in the WHILE-
case is always satisfied, as we may prove (after having verified eval's termination)

(6.1) lemma eval not increases loops <=
all r:state, wp:WHILE.PROGRAM loops(r)≥loops(eval(r,wp))
expressing that eval never increases the loops-component of a state. However,
when removing (*) from the definition of eval, XeriFun is unable to prove
termination because the system veriﬁes only strong termination of procedures.
Roughly speaking, a terminating procedure terminates strongly iﬀ termination
can be proven without reasoning about the procedure's semantics. Since one
has to reason about the semantics of eval, viz. (6.1), for proving its termination
if (*) is not provided, the system would fail to prove eval's termination.8
The interpreter exec terminates because with each fetch&execute cycle either
the loops-component of the state decreases (in case of a back-leap BRANCH- or
JUMP-) or otherwise stays even, but pc moves towards the end of the program,
cf. Fig. 6. Here the system is unable to recognize this argumentation as well
and needs to be provided with the pair of termination functions λr.loops(r),
λmp,pc.size(mp)−pc.
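Both lexicographic termination arguments can be illustrated by tuple comparison, which Python performs lexicographically (a hypothetical sketch; taking |wp| as the node count of the program tree is our assumption):

```python
def size(wp):
    """|wp|: number of nodes of a program tree, counting tuple children only
    (our reading of the size measure)."""
    return 1 + sum(size(c) for c in wp[1:] if isinstance(c, tuple))

def measure_eval(wp, loops):
    """Termination measure of eval: the pair (|wp|, loops(r)); Python
    compares such pairs exactly in the required lexicographic order."""
    return (size(wp), loops)

def measure_exec(loops, mp, pc):
    """Termination measure of exec: (loops(r), size(mp) - pc)."""
    return (loops, len(mp) - pc)
```

Each recursive call of eval either shrinks the program (first component) or keeps it and shrinks the loops counter; each fetch&execute cycle of exec either shrinks the loops counter or keeps it and moves the pc towards the end of the program.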
Correctness of code. Before considering (4.1) code is correct, the correct-
ness of the procedure postfix has to be veriﬁed. Therefore we start with proving
(6.2) lemma postfix is correct <= all e:EXPR, s:Stack, m:memory
top(run(postfix(e),s,m))=value(e,m)
expressing that the value computed by value upon the evaluation of an expres-
sion e is obtained as top-of-stack after running the stack program returned by
postfix when applied to expression e. The proof of (6.2) requires two auxiliary
lemmas, one stating that stack programs can be executed step-by-step, viz.
(6.3) lemma run extend <= all m:memory,s:Stack,sp1,sp2:STACK.PROGRAM
and the other one expressing that the execution of stack programs obtained by
postfix does not aﬀect the stack initially given to run, i.e.
(6.4) lemma pop run postfix <= all e:EXPR, s:Stack, m:memory
pop(run(postfix(e),s,m))=s
All three lemmas have an automated proof and are frequently (and automati-
cally) used in subsequent proofs.
The induction proof of (4.1) code is correct is based on eval’s recursion
structure. Hence it develops into 7 cases, viz.
1. wp is a SKIP-statement,
2. wp is a SET-statement,
3. wp is a WHILE-statement and loops(r)=0,
4. wp is a COMPOUND-statement,
5. wp is an IF-statement,
6. wp is a WHILE-statement, pred(loops(r))≥loops(eval(S,BODY(wp))) and
loops(r)≠0, and
7. wp is a WHILE-statement, pred(loops(r))<loops(eval(S,BODY(wp))) and
loops(r)≠0, where S is defined as in Fig. 3.

The need for terminating procedures is necessitated by the logic implemented by our
system, while the failure to prove eval's termination without requirement (*) is a
lack of our implementation only. This is because, as proved in earlier work, our
approach for automated termination proofs is also sound for procedures with nested
recursions like eval. We learned how to transform a terminating procedure with
nested recursions into a strongly terminating procedure from the literature.
The ﬁrst 3 cases are the base cases of the induction, and (consequently) the
remaining cases are the step cases.
The ﬁrst two base cases have an automated proof. However, the proof of the
third base case gets stuck and after Simpliﬁcation the goalterm reads as
where R abbreviates
and P abbreviates
Since the pc points in R to the instruction EXEC(postfix(WCOND(wp))),
symbolic execution of the exec-call R yields R′ as
Here the pc points to the instruction BRANCH-(size(code(BODY(wp)))),
hence symbolic execution of the exec-call R′ yields timeout, if value(WCOND(wp),
store(r))=0 holds, and r otherwise, thus completing the proof of (6.5).
We therefore instruct the system with the Unfold Procedure rule to unfold
both procedure calls R in (6.5) and also both resulting procedure calls R′, which
(after subsequent Simplification steps using (6.2) and (6.4)) proves the third base
case.
Of course, it is annoying to call interactively for Unfold Procedure quite often
instead of only letting Simplification do the job. The reason is that the unfolding
of procedure calls needs to be controlled heuristically upon symbolic evaluation,
because otherwise unusable goalterms may result. X eriFun uses a heuristic which
is based on a similarity measure between the well-founded relation used for an
induction proof of a statement and the well-founded relations which have been
used to prove the termination of the procedures involved in the statement, and
only calls of procedures having a termination relation similar to the induction
relation are unfolded automatically. This heuristic proved successful in almost all
cases but may fail if a statement refers to procedures which diﬀer signiﬁcantly in
their recursion structure, as eval and exec do in the present case. In the proof
of (4.1), we must therefore call for Unfold Procedure not only in the third base
case, but also in some of the step cases to instruct the system interactively to
execute parts of the machine code in the goalterms symbolically.
Having proved the base cases, the system proceeds with the ﬁrst step case
(where wp is a COMPOUND-statement) and simpliﬁes the induction conclusion to
which we straightforwardly generalize to
(6.10) exec(exec(r,pc,mp1),0,mp2) = exec(r,pc,append(mp1,mp2))
in order to prove the subgoal (6.9) with this equation.
However, (6.10) does not hold. If the pc points to some instruction of append(
mp1,mp2) but is not within the address space of mp1, equation (6.10) rewrites
to exec(r,0,mp2)=exec(r,pc,append(mp1,mp2)), which obviously is false. We
therefore demand size(mp1)>pc, but this restriction is not enough:
Assume that mp1 contains a HALT instruction and exec(r,pc,mp1) returns
the state r’ upon the execution of HALT. Then r’ is also obtained when executing
append(mp1,mp2), and consequently equation (6.10) rewrites to exec(r’,0,mp2)
=r’. Hence we also demand HALT.free(mp1), where procedure HALT.free re-
turns true iﬀ it is applied to a machine program free of HALT instructions.
But even with this additional restriction equation (6.10) is still false. This
time we assume that mp1 contains some forward-leap instruction with a displace-
ment pointing beyond the last instruction of append(mp1,mp2). If this instruc-
tion is executed in a state r’, exec returns r’ and equation (6.10) rewrites to
exec(r’,0,mp2)=r’ again. We therefore demand closed+(mp1) as a further re-
striction, where procedure closed+ returns true iﬀ it is applied to a machine
program mp such that pc+d≤size(mp)-1 for each forward-leap instruction in mp
with fetch(pc,mp)=JUMP+(d) or fetch(pc,mp)=BRANCH+(d).
We continue with our analysis and now assume that mp2 contains some back-
leap instruction with a displacement pointing beyond the ﬁrst instruction of mp2.
For instance, mp1 may consist of one instruction NOOP only and mp2 may consist
of only one instruction JUMP-(0). Then equation (6.10) rewrites to r=timeout,
and a counter example is found again. We therefore demand closed-(mp2)
as another restriction, where procedure closed- returns true iﬀ it is applied
to a machine program mp such that pc≥d+1 for each back-leap instruction in
mp with fetch(pc,mp)=BRANCH-(d) or fetch(pc,mp)=JUMP-(d). But we also
have to demand closed-(mp1), as otherwise equation (6.10) may rewrite to
exec(r’,0,mp2)=r’ again. We are done with this ﬁnal restriction, and the re-
quired lemma reads as
(6.11) lemma exec stepwise <=
all pc:nat, mp1,mp2:MACHINE.PROGRAM, r:state
exec(exec(r,pc,mp1),0,mp2) = exec(r,pc,append(mp1,mp2)),
provided size(mp1)>pc, HALT.free(mp1), closed+(mp1), closed-(mp1),
and closed-(mp2).
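The side conditions collected by this analysis can be sketched as predicates over a machine program represented as a list of instruction tuples (a hypothetical Python rendering; the instruction encoding is our assumption):

```python
def halt_free(mp):
    """True iff the machine program mp is free of HALT instructions."""
    return all(ins[0] != 'HALT' for ins in mp)

def closed_plus(mp):
    """True iff each forward-leap instruction JUMP+(d) or BRANCH+(d) at
    address pc satisfies pc + d <= size(mp) - 1, i.e. no forward leap points
    beyond the last instruction of mp."""
    return all(pc + ins[1] <= len(mp) - 1
               for pc, ins in enumerate(mp)
               if ins[0] in ('JUMP+', 'BRANCH+'))

def closed_minus(mp):
    """True iff each back-leap instruction BRANCH-(d) or JUMP-(d) at
    address pc satisfies pc >= d + 1, i.e. no back leap points before the
    first instruction of mp."""
    return all(pc >= ins[1] + 1
               for pc, ins in enumerate(mp)
               if ins[0] in ('JUMP-', 'BRANCH-'))
```

The NOOP/JUMP-(0) counterexample from the analysis above is detected by closed_minus: the one-instruction program JUMP-(0) fails the check, whereas the same instruction preceded by NOOP passes it.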
However, in order to prove subgoal (6.9) by lemma (6.11), it must be veriﬁed
that all preconditions of (6.11) are satisﬁed for machine programs mp1 and mp2
computed by code. We therefore formulate the lemmas size code not zero,
code is HALT.free, code is closed- and code is closed+, all of which are
proved easily. Recognizing the lemmas just developed and verified, the system
simpliﬁes the induction conclusion of the ﬁrst step case to true.
We continue in the veriﬁcation of lemma (4.1) with the second step case,
which is concerned with the code generation for IF-statements. Using both in-
duction hypotheses, Simpliﬁcation of the step formula yields the goalterm
where P abbreviates
As before, parts of the machine code must be executed symbolically by some
interactive unfold steps, yielding the modiﬁed goalterm
after executing the EXEC and the BRANCH+ instruction in the THEN-case. Since
the pc points in exec(r,succ(succ(succ(size(code(ELSE(wp)))))),P) to
the ﬁrst instruction of code(THEN(wp)), we may get rid of the machine code
EXEC(...);...;JUMP+(...) preceding code(THEN(wp)) in P and adjust the pc
accordingly, as code(THEN(wp)) (being closed-) does not contain any back-
leaps into the preceding code. This yields exec(r,0,code(THEN(wp))) and the
equation of the THEN-case is proved.
The elimination of machine code is formally justiﬁed by
(6.15) lemma exec skip program <=
all mp1,mp2:MACHINE.PROGRAM, r:state, pc:nat
which is needed also in the ELSE-case: After symbolic execution of the EXEC and
the BRANCH+ instruction, both instructions are eliminated using (6.15), yielding
To get rid of the JUMP+ instruction and the code for the THEN-part succeeding
code(ELSE(wp)), we instruct the system to apply (6.11) exec stepwise yielding
the simpliﬁed goalterm
which simpliﬁes further to the inner call exec(r,0,code(ELSE(wp))) after a
symbolic execution of the outer exec-call, as the pc is out of range after the
execution of JUMP+. This proves the subgoal of the ELSE-part, and of the whole
step case in turn.
Next we consider the third step case, which is concerned with the code genera-
tion for WHILE-statements if loops(r)≠0 and pred(loops(r))≥loops(eval(S,
BODY(wp))), where S is defined as in Fig. 3. Since we formulated the proof of the
WHILE-base case as a lemma, Simplification proves the statement if the loop-
condition is false and returns (using both induction hypotheses) the simplified
goalterm (6.18),
where P abbreviates
Now (appealing to lemma (6.15)), the JUMP+ instruction is eliminated in the
right-hand exec-call of (6.18), and then the EXEC and the BRANCH- instructions
are executed symbolically in the resulting exec-call, yielding the simplified goal-
term (6.20).
Although the call of exec(S,0,P) in (6.20) seems to be a candidate for
applying lemma (6.11) exec stepwise, this lemma is unusable here because
the machine program EXEC(...);BRANCH-(size(code(BODY(wp)))) fails to be
closed-. We therefore need a further lemma, which is obtained as a generaliza-
tion of the equation in (6.20), viz.
(6.21) lemma exec repetition <=
all mp:MACHINE.PROGRAM, pc:nat, sp:STACK.PROGRAM, r:state
To avoid an over-generalization of the equation in (6.20), the equation of
(6.21) must be restricted to machine programs being closed-, closed+ and
HALT.free, where these requirements have been recognized by the same care-
ful analysis undertaken when developing lemma (6.11) exec stepwise. Having
veriﬁed (6.21), we instruct the system to use this lemma in (6.20) by Apply
Equation, and this results in the simpliﬁed goalterm
It only remains to unfold the outer exec-call in the left-hand side of the equa-
tion in (6.22). Now Simpliﬁcation sets the pc to succ(size(code(BODY(wp))))
and then uses lemma (6.15) to eliminate the JUMP+ instruction just executed as
well as to reset the pc to size(code(BODY(wp))). This makes both sides of the
equation identical, thus proving the third step case.
We conclude with the last step case, in which code is generated for WHILE-
statements if loops(r)≠0 and pred(loops(r)) < loops(eval(S,BODY(wp))),
where S is defined as in Fig. 3. But this case is impossible, because loops(S) ≥
loops(eval(S,...)) by lemma (6.1) eval not increases loops and loops(S) =
pred(loops(r)).9 Hence the goalterm of this step case simplifies to true, thus
proving this step case and completing the proof of lemma (4.1) code is correct.
Footnote 9: It is interesting to see that we do not get rid of lemma (6.1): either we
need it to verify lemma (4.1), or we need it to verify eval's termination without
considering the additional requirement (*) controlling the outer eval-call in the
WHILE-case of Fig. 3, cf. Section 6. In the latter case, (6.1) is not required for
subsequent proofs, as step cases 3 and 4 then merge into one step case.
Viewed in retrospect, the proofs required to verify (4.1) code is correct were
obtained without too much effort. However, theorem proving is sometimes like
crossing unknown terrain to climb a hill: looking back at the path after reaching
the top, it seems quite obvious how to get there directly, but along the way one
faces dead ends and traps that can turn the whole endeavor into a nightmare.
In the case of (4.1) code is correct, the crucial steps to success were
(1) the “right” deﬁnition of the machine language’s interpreter exec, (2) the
“right” deﬁnition of a state, and (3) the invention of the key lemmas (6.11)
exec stepwise, (6.15) exec skip program and (6.21) exec repetition as well
as the key notions needed to formulate them.10
Our ﬁrst attempt was to deﬁne a state without the loops- and the stack-
components, but to care for the termination of exec by limiting the total number
of fetch-calls performed when executing a machine program. This amounts to
a definition like function exec(r:state, pc:nat, mp:MACHINE.PROGRAM,
cycles:nat):state <= ..., where cycles is decremented in each recursive
call of exec to enforce termination (the treatment of the stack is ignored
here for the moment). But this approach requires an additional procedure, say
get.cycles, computing the minimum number of cycles needed to execute a
machine program (if this execution does not result in timeout). This procedure
get.cycles is required to formulate the correctness statement for code, as the
number of loop-bodies of a WHILE-program wp evaluated by eval has to be re-
lated to the number of machine cycles needed to execute code(wp) by exec.
Procedure get.cycles is easily obtained from the definition of exec, but with
this procedure a bunch of additional lemmas, in particular the get.cycles-
versions of the key lemmas (6.11), (6.15) and (6.21), need to be formulated and
verified as well.
Having followed this approach for some time, we gave up and started to
work with another version of exec. This deﬁnition was based on the observation
that the number of evaluated loop-bodies in a WHILE-program wp is exactly
the number of BRANCH--calls performed upon the execution of code(wp).12 We
therefore got rid of get.cycles (and also, what is even more important, of all the
lemmas coming with it) by renaming cycles to loops and decrementing loops
only in the recursive exec-calls coming with a JUMP- or a BRANCH- instruction.

Footnote 10: Another obstacle was caused by two faulty address calculations in
the code generator, which were recognized while attempting to prove code is
correct.

Footnote 11: get.cycles corresponds to the clock-procedures used in related
work, causing similar problems there. However, clock can be used to reason
about running times, and also for proof-technical reasons, as Moore explains
(personal communication).

Footnote 12: Of course, this idea depends on our definition of code generation,
hence it may not transfer to alternative implementations of code. There are two
answers to this complaint: First, it would be rather artificial to use a JUMP- or
a BRANCH- instruction for some reason other than to implement a loop. Second,
the procedure exec is a model of some piece of hardware only, where a counter
(be it cycles or loops) has been added for verification purposes only. As long
as a real machine does not provide such a counter, we may define it as we like
without spoiling the truth of statements wrt. the partial correctness of the
machine language interpreter.
However, this approach has problems, too: This time another procedure, say
get.loops, is required to rephrase the equation in (6.11) exec stepwise for this
version of exec, which then would read
Here the procedure get.loops is needed to compute the number of loops
remaining after the execution of mp1 starting at pc in state r, hence this approach
necessitates the same burden of additional lemmas as the get.cycles idea.
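To see the problem concretely, here is a toy sketch in Python (a hypothetical machine with only EXEC and forward-JUMP instructions, not XeriFun's actual definitions): once termination is enforced by a fuel parameter, stating that executing mp1;mp2 equals executing mp1 and then mp2 requires a companion function that predicts the fuel consumed by the first part.

```python
def exec_fuel(store, pc, prog, cycles):
    """Run prog from pc; 'cycles' bounds the number of fetches to force
    termination (the stack is ignored, as in the paper's first attempt)."""
    while 0 <= pc < len(prog) and cycles > 0:
        op, arg = prog[pc]
        cycles -= 1
        if op == "EXEC":      # apply a store transformation
            store = arg(store)
            pc += 1
        else:                 # "JUMP": unconditional forward jump
            pc += arg
    return store

def get_cycles(store, pc, prog):
    """Companion procedure (cf. get.cycles): the number of fetches a run
    actually needs. Assumes the run terminates without timeout."""
    n = 0
    while 0 <= pc < len(prog):
        op, arg = prog[pc]
        n += 1
        if op == "EXEC":
            store = arg(store)
            pc += 1
        else:
            pc += arg
    return n

# A 'stepwise' statement must now mention get_cycles for the first part:
p1 = [("EXEC", lambda s: s + 1)]
p2 = [("EXEC", lambda s: s * 2)]
n1 = get_cycles(0, 0, p1)
assert exec_fuel(0, 0, p1 + p2, n1 + 1) == exec_fuel(exec_fuel(0, 0, p1, n1), 0, p2, 1)
```

And every key lemma acquires a get.cycles-version in the same way, which is exactly the burden described above.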
But fortunately, there is an easy remedy to this problem. We simply let
loops become a component of a state rather than a formal parameter of exec,
and then the need for an additional procedure get.loops disappears. As similar
problems with the formulation of (6.11) arise if exec is provided with a formal
parameter s:stack, the stack likewise becomes a component of a state, and this
motivates our definitions of the data structure state and the procedure exec.
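In the same toy setting, the remedy can be sketched as follows (again hypothetical Python, not the paper's actual state and exec): with loops and the stack inside the state, the result of exec alone carries all information a stepwise lemma needs, and no companion procedure arises.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    store: int = 0
    loops: int = 0                        # remaining loop budget (termination measure)
    stack: list = field(default_factory=list)

def exec_state(r, pc, prog):
    """Run prog from pc; 'loops' is decremented only at backward branches
    (the paper's JUMP-/BRANCH- case), so no extra fuel parameter is needed."""
    while 0 <= pc < len(prog):
        op, arg = prog[pc]
        if op == "EXEC":
            r.store = arg(r.store)
            pc += 1
        elif op == "BRANCH-":             # branch backward by arg positions
            if r.loops == 0:
                return r                  # loop budget exhausted: stop
            r.loops -= 1
            pc -= arg
        else:                             # "JUMP+": forward jump
            pc += arg
    return r

# The body runs loops+1 times before the exhausted budget stops the backward branch:
r = exec_state(State(store=0, loops=3), 0, [("EXEC", lambda s: s + 1), ("BRANCH-", 1)])
print(r.store, r.loops)  # 4 0
```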
Having settled these deﬁnitions, the next problem was to formulate the key
lemmas (6.11), (6.15) and (6.21). It is interesting to see that each lemma cor-
responds directly to a statement of the WHILE-language, viz. the COMPOUND-, the
IF-, and the WHILE-statement. Whereas the control structure of WHILE-programs
is easily captured by the notions of the meta-language, viz. functional compo-
sition, conditionals and recursion, cf. Fig. 3, the control structure of a machine
program mp is encoded in the data (viz. mp) by the JUMP and BRANCH instruc-
tions, cf. Fig. 6. This requires making the control structure of machine programs
explicit in the meta-language, and this is achieved by the key lemmas, whose
syntactic similarity to the recursive eval-calls (necessitated by the respective
statements) is obvious. However, this was not obvious to us while we developed
the proof of (4.1) code is correct, and most of the time was spent analyzing
the system's output when it got stuck, in order to find out which properties of
machine programs are required to get the proof through. In the course of this
work, the key notions of HALT.free, closed-, and closed+ machine programs
were recognized, and the key lemmas were speculated step by step.
While the speculation of the key lemmas constituted the main challenge for
us, the proofs of these lemmas constituted the main challenge for the system:
By the rich case-structure of procedure exec, goalterms evolved which are so
huge that the Symbolic Evaluator fails to process them within reasonable time.
A remedy to this problem is to throw in interactively a case analysis (stipulating
the kind of instruction fetch(pc,mp) may yield) which separates the goalterm
into smaller pieces so that the theorem prover can cope with them.13
Footnote 13: This case study also revealed a shortcoming of XeriFun's object
language, viz. not having let- and case-instructions available. We intend to
remove this shortcoming in a future version of the system, as this would
significantly improve the performance of the Symbolic Evaluator.
But still the system showed performance problems for another reason: Since
the clauses stemming from the veriﬁed lemmas and the induction hypotheses
may be used upon symbolic evaluation, system performance would decrease with
an increasing number of clauses. Therefore the system uses a lemma-ﬁlter to
throw out all clauses computed from veriﬁed lemmas which (heuristically) do
not seem to contribute to a proof. This prevents the Symbolic Evaluator from
being swamped by a huge number of clauses (without introducing the need for
frequent user calls to consider a lemma which, although required, did not pass
the filter), and makes a user command to disable verified lemmas obsolete in
XeriFun. However, induction hypotheses are not considered by the lemma-filter,
simply because induction hypotheses are quite similar to the original statement,
thus causing the lemma-filter to let them pass in almost all cases. Now, in the
case of each key lemma, 8 induction hypotheses are available in the step case,
which separate into 66 clauses having 10-12 literals each. This creates a large search
space for the Symbolic Evaluator and performance decreases signiﬁcantly. We
therefore instructed the system to disable the induction hypotheses upon sym-
bolic evaluation, and then the system went through the proofs (needing further
advice from time to time).14 Using this setting, XeriFun succeeded because
after Simpliﬁcation following the case analysis, the system picked up the right
induction hypotheses with the Use Lemma rule, and a subsequent Simpliﬁcation
then proved the case.15 Viewed from the structure of the proofs, the key-lemmas
showed nothing spectacular as the proofs (which are quite similar) consist of
large sequences of (interactive) calls for Case Analysis and (automated) calls of
Simpliﬁcation and Use Lemma only.
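The lemma-filter idea can be approximated as follows (a hypothetical Python sketch; the text does not specify XeriFun's actual heuristic, so the symbol-overlap criterion here is an assumption):

```python
def lemma_filter(goal_symbols, clauses):
    """Keep only clauses that look relevant to the current goal, e.g. those
    sharing at least one function symbol with it. This keeps the Symbolic
    Evaluator's search space small; the real heuristic may differ."""
    return [c for c in clauses if goal_symbols & c["symbols"]]

# Hypothetical clause database computed from verified lemmas:
clauses = [
    {"name": "exec_stepwise", "symbols": {"exec", "append", "size"}},
    {"name": "gcd_commutes",  "symbols": {"gcd"}},
]
kept = lemma_filter({"exec", "code"}, clauses)
print([c["name"] for c in kept])  # ['exec_stepwise']
```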
In addition, a bunch of "routine" lemmas had been created (expressing, for
example, that HALT.free distributes over append, etc.), whose need, however,
was immediately obvious from the system's output and whose proofs required
interaction only in very rare cases.
Fig. 8 gives an account of XeriFun's automation degree measured in terms of
prooftree edits.16 For the whole case study, 13 data structures and 22 procedures
were defined to formulate 56 lemmas, whose verification required 1038 prooftree
edits in total, of which only 186 had to be suggested interactively.
The Symbolic Evaluator computed proofs with a total number of 64750
rewrite steps within 33 minutes running time, where the longest subproof of
1259 steps had been created for lemma (6.21) exec repetition.
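As a quick sanity check of these figures (plain arithmetic on the counts above):

```python
# Totals reported for the case study (from the text above).
total_edits = 1038        # prooftree edits in total
interactive = 186         # edits that had to be suggested interactively

automated = total_edits - interactive
degree = round(100 * automated / total_edits, 1)
print(automated, degree)  # 852 edits computed by machine, i.e. 82.1%
```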
The value for the automated calls of Use Lemma is unusually high compared to
other case studies performed with XeriFun, which is caused by the fact that the
induction hypotheses had been disabled upon symbolic evaluation of the key
lemmas, so that Use Lemma could succeed following Simplification.

Footnote 14: As a matter of fact, we never saw the need for disabling induction
hypotheses before.

Footnote 15: Induction hypotheses are treated like lemmas in XeriFun, except
that induction hypotheses are only locally available, i.e. for a specific proof,
whereas lemmas are globally available, i.e. for all proofs.

Footnote 16: The values are computed for XeriFun 2.6.1 under Windows XP
running JAVA 1.4.1_01 on a 2.2 GHz Pentium 4 PC. The 3 interactions to
disable induction hypotheses are not listed in Fig. 8.

Fig. 8. Proof statistics obtained for the verification of the code generator
Simpliﬁcation. Whereas the Induction rule performed perfectly here, the values
for Unfold Procedure and for Case Analysis are unusually high, which reﬂects the
need for frequent interactive calls for symbolic execution of machine programs
when proving (4.1), and also reﬂects the separation into subcases needed for the
proofs of the key lemmas.
In total, 82.1% of the required prooftree edits had been computed by ma-
chine, a number which (although it is not as good as the values encountered
in other cases) we consider as good enough to provide signiﬁcant support for
computer-aided veriﬁcation. With the key notions for machine programs and
the key lemmas for the machine language interpreter, a clear and illuminating
structure for the proof of the main statement evolved, which lacks formal clutter
and therefore provides a useful basis to illustrate the role of formal semantics
and the benefits of computer-aided verification in the classroom.
Acknowledgement We are grateful to Markus Aderhold for useful comments.
2. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques and
Tools. Addison-Wesley, New York, 1986.
3. A. Dold and V. Vialard. A mechanically veriﬁed compiling speciﬁcation for a Lisp
compiler. In R. Hariharan, M. Mukund, and V. Vinay, editors, FST TCS 2001:
Foundations of Software Technology and Theoretical Computer Science, volume 2245
of Lect. Notes in Comp. Sc., pages 144—155, 2001.
4. A. D. Flatau. A Veriﬁed Implementation of an Applicative Language with Dynamic
Storage Allocation. PhD. Thesis, Univ. of Texas, 1992.
5. J. Giesl. Termination of Nested and Mutually Recursive Algorithms. Journal of
Automated Reasoning, 19:1—29, 1997.
6. W. Goerigk, A. Dold, T. Gaul, G. Goos, A. Heberle, H. von Henke, U. Hoﬀmann,
H. Langmaack, and W. Zimmermann. Compiler correctness and implementation
veriﬁcation: The Veriﬁx approach. In P. Fritzson, editor, Proc. of the Poster Ses-
sion of CC’96 - Intern. Conf. on Compiler Construction, pages 65 — 73, 1996.
7. C. A. Gunter. Semantics of Programming Languages – Structures and Techniques.
The MIT Press, Cambridge, 1992.
8. J. Hannan and F. Pfenning. Compiler veriﬁcation in LF. In A. Scedrov, editor,
Proceedings of the Seventh Annual IEEE Symposium on Logic in Computer Science,
pages 407—418. IEEE Computer Society Press, 1992.
9. J. McCarthy and J. A. Painter. Correctness of a Compiler for Arithmetical Ex-
pressions. In J. T. Schwartz, editor, Proc. on a Symp. in Applied Math., 19, Math.
Aspects of Comp. Sc. American Math. Society, 1967.
10. J. S. Moore. A Mechanically Veriﬁed Language Implementation. Journal of Auto-
mated Reasoning, 5(4):461—492, 1989.
11. J. S. Moore. PITON - A Mechanically Veriﬁed Assembly-Level Language. Kluwer
Academic Publishers, Dordrecht, 1996.
12. J. S. Moore. An exercise in graph theory. In M. Kaufmann, P. Manolios, and
J. S. Moore, editors, Computer-Aided Reasoning: ACL2 Case Studies, pages 41—
74, Boston, MA., 2000. Kluwer Academic Press.
13. H. R. Nielson and F. Nielson. Semantics with Applications. John Wiley and Sons,
New York, 1992.
14. P. Curzon. A veriﬁed compiler for a structured assembly language. In M. Archer,
J.J. Joyce, K.N. Levitt, and P.J. Windley, editors, International Workshop on
Higher Order Logic Theorem Proving and its Applications, pages 253—262, Davis,
California, 1991. IEEE Computer Society Press.
15. G. Schellhorn and W. Ahrendt. The WAM case study: Verifying compiler correct-
ness for Prolog with KIV. In W. Bibel and P. H. Schmidt, editors, Automated
Deduction: A Basis for Applications. Volume III, Applications. Kluwer Academic
Publishers, Dordrecht, 1998.
16. C. Walther. On Proving the Termination of Algorithms by Machine. Artiﬁcial
Intelligence, 71(1):101—157, 1994.
17. C. Walther. Criteria for Termination. In S. Hölldobler, editor, Intellectics and Com-
putational Logic, pages 361—386. Kluwer Academic Publishers, Dordrecht, 2000.
18. C. Walther. Semantik und Programmveriﬁkation. Teubner-Wiley, Leipzig, 2001.
19. C. Walther and S. Schweitzer. A Machine Supported Proof of the Unique Prime
Factorization Theorem. Technical Report VFR 02/03, Programmiermethodik,
Technische Universität Darmstadt, 2002.
20. C. Walther and S. Schweitzer. The XeriFun Tutorial. Technical Report VFR
02/04, Programmiermethodik, Technische Universität Darmstadt, 2002.
21. C. Walther and S. Schweitzer. XeriFun User Guide. Technical Report VFR 02/01,
Programmiermethodik, Technische Universität Darmstadt, 2002.
22. C. Walther and S. Schweitzer. Veriﬁcation in the Classroom. Technical Report
VFR 02/05, Programmiermethodik, Technische Universität Darmstadt, 2002.
23. C. Walther and S. Schweitzer. About XeriFun. In F. Baader, editor, Proc. of the
19th Inter. Conf. on Automated Deduction (CADE-19), volume 2741 of Lecture
Notes in Artificial Intelligence, pages 322—327, Miami Beach, 2003. Springer-Verlag.
24. C. Walther and S. Schweitzer. Veriﬁcation in the Classroom. To appear in Journal
of Automated Reasoning - Special Issue on Automated Reasoning and Theorem
Proving in Education, pages 1—43, 2003.
25. C. Walther and S. Schweitzer. A Veriﬁcation of Binary Search. In D. Hutter and
W. Stephan, editors, Mechanizing Mathematical Reasoning: Techniques, Tools and
Applications, volume 2605 of LNAI, pages 1—18. Springer-Verlag, 2003.
26. G. Winskel. The Formal Semantics of Programming Languages. The MIT Press,
Cambridge, 1993.
27. W. D. Young. A Mechanically Veriﬁed Code Generator. Journal of Automated
Reasoning, 5(4):493—518, 1989.