A Rely-Guarantee-Based Simulation for
Verifying Concurrent Program Transformations

Hongjin Liang    Xinyu Feng    Ming Fu
School of Computer Science and Technology
University of Science and Technology of China

Abstract

Verifying program transformations usually requires proving that the resulting program (the target) refines or is equivalent to the original one (the source). However, the refinement relation between individual sequential threads cannot be preserved in general in the presence of parallel compositions, due to instruction reordering and the different granularities of atomic operations at the source and the target. On the other hand, the refinement relation defined based on fully abstract semantics of concurrent programs assumes arbitrary parallel environments, which is too strong and cannot be satisfied by many well-known transformations.
   In this paper, we propose a Rely-Guarantee-based Simulation (RGSim) to verify concurrent program transformations. The relation is parametrized with constraints on the environments that the source and the target programs may be composed with. It considers the interference between threads and their environments, thus is less permissive than relations over sequential programs. It is compositional w.r.t. parallel compositions as long as the constraints are satisfied. Also, RGSim does not require semantics preservation under all environments, and can incorporate the assumptions about environments made by specific program transformations in the form of rely/guarantee conditions. We use RGSim to reason about optimizations and prove atomicity of concurrent objects. We also propose a general garbage collector verification framework based on RGSim, and verify the Boehm et al. concurrent mark-sweep GC.

1.  Introduction

Many verification problems can be reduced to verifying program transformations, i.e., proving the target program of the transformation has no more observable behaviors than the source. Below we give some typical examples in concurrent settings:

 • Correctness of compilation and optimizations of concurrent programs. In this most natural program transformation verification problem, every compilation phase does a program transformation T, which needs to preserve the semantics of the inputs.

 • Atomicity of concurrent objects. A concurrent object or library provides a set of methods that allow clients to manipulate the shared data structure with abstract atomic behaviors [13]. Their correctness can be reduced to the correctness of the transformation from abstract atomic operations to concrete and executable programs in a concurrent context.

 • Verifying implementations of software transactional memory (STM). Many languages supporting STM provide a high-level atomic block atomic{C}, so that programmers can assume the atomicity of the execution of C. Atomic blocks are implemented using some STM protocol (e.g., TL2 [10]) that allows very fine-grained interleavings. Verifying that the fine-grained program respects the semantics of atomic blocks gives us the correctness of the STM implementation.

 • Correctness of concurrent garbage collectors (GCs). High-level garbage-collected languages (e.g., Java) allow programmers to work at an abstract level without knowledge of the underlying GC algorithm. However, the concrete and executable low-level program involves interactions between the mutators and the collector. If we view the GC implementation as a transformation from high-level mutators to low-level ones with a concrete GC thread, the GC safety can be reduced naturally to the semantics preservation of the transformation.

   To verify the correctness of a program transformation T, we follow the standard approach [17] and define a refinement relation ⊑ between the target and the source programs, which says the target has no more observable behaviors than the source. Then we can formalize the correctness of the transformation as follows:

    Correct(T) ≜ ∀C, C. C = T(C) =⇒ C ⊑ C.    (1.1)

That is, for any source program C acceptable by T, T(C) is a refinement of C. When the source and the target are shared-state concurrent programs, the refinement ⊑ needs to satisfy the following requirements to support effective proof of Correct(T):

 • Since the target T(C) may be in a different language from the source, the refinement should be general and independent of the language details.

 • To verify fine-grained implementations of abstract operations, the refinement should support different views of program states and different granularities of state accesses at the source and the target levels.

 • When T is syntax-directed (and it is usually the case for parallel compositions, i.e., T(C ∥ C′) = T(C) ∥ T(C′)), a compositional refinement is of particular importance for modular verification of T.

However, existing refinement (or equivalence) relations cannot satisfy all these requirements at the same time. Contextual equivalence, the canonical notion for comparing program behaviors, fails to handle different languages since the contexts of the source and the target will be different. Simulations and logical relations have been used to verify compilation [4, 14, 17, 18], but they are usually designed for sequential programs (except [18, 22], which we will discuss in Section 8). Since the refinement or equivalence relation between sequential threads cannot be preserved in general with parallel compositions, we cannot simply adapt existing work on sequential programs to verify transformations of concurrent programs. Refinement relations based on fully abstract semantics of concurrent programs are compositional, but they assume arbitrary

                                                                     1                                                                 2011/10/8
program contexts, which is too strong for many practical transformations. We will explain the challenges in detail in Section 2.
   In this paper, we propose a Rely-Guarantee-based Simulation (RGSim) for compositional verification of concurrent transformations. By addressing the above problems, we make the following contributions:

 • RGSim parametrizes the simulation between concurrent programs with rely/guarantee conditions [15], which specify the interactions between the programs and their environments. This makes the corresponding refinement relation compositional w.r.t. parallel compositions, allowing us to decompose refinement proofs for multi-threaded programs into proofs for individual threads. On the other hand, the rely/guarantee conditions can incorporate the assumptions about environments made by specific program transformations, so RGSim can be applied to verify many practical transformations.

 • Based on the simulation technique, RGSim focuses on comparing externally observable behaviors (e.g., I/O events) only, which gives us considerable leeway in the implementations of related programs. The relation is mostly independent of the language details. It can be used to relate programs in different languages with different views of program states and different granularities of atomic state accesses.

 • RGSim makes relational reasoning about optimizations possible in parallel contexts. We present a set of relational reasoning rules to characterize and justify common optimizations in a concurrent setting, including hoisting loop invariants, strength reduction and induction variable elimination, dead code elimination, redundancy introduction, etc.

 • RGSim gives us a refinement-based proof method to verify fine-grained implementations of abstract algorithms and concurrent objects. We successfully apply RGSim to verify concurrent counters, the concurrent GCD algorithm, Treiber's non-blocking stack and the lock-coupling list.

 • We reduce the problem of verifying concurrent garbage collectors to verifying transformations, and present a general GC verification framework, which combines unary rely-guarantee-based verification [15] with relational proofs based on RGSim.

 • We verify the Boehm et al. concurrent garbage collection algorithm [7] using our framework. As far as we know, this is the first formal correctness proof of this algorithm.

In the rest of this paper, we first analyze the challenges for compositional verification of concurrent program transformations, and explain our approach informally in Section 2. Then we give the basic technical settings in Section 3 and present the formal definition of RGSim in Section 4. We show the use of RGSim to reason about optimizations in Section 5, verify atomicity of concurrent objects in Section 6, and prove the correctness of concurrent GCs in Section 7. Finally we discuss related work and conclude in Section 8.

2.  Challenges and Our Approach

The major challenge we face is to have a compositional refinement relation ⊑ between concurrent programs, i.e., we should be able to know T(C1) ∥ T(C2) ⊑ C1 ∥ C2 if we have T(C1) ⊑ C1 and T(C2) ⊑ C2.

2.1  Sequential Refinement Loses Parallel Compositionality

Observable behaviors of sequential imperative programs usually refer to their control effects (e.g., termination and exceptions) and final program states. However, refinement relations defined correspondingly cannot be preserved after parallel compositions. It has been a well-known fact in the compiler community that sound optimizations for sequential programs may change the behaviors of multi-threaded programs [5]. Dekker's algorithm shown in Figure 1(a) has been widely used to demonstrate the problem. Reordering the first two statements of the thread on the left preserves its sequential behaviors, but the whole program can no longer ensure exclusive access to the critical region.

      local r1;                local r2;
      x := 1;                  y := 1;
      r1 := y;            ∥    r2 := x;
      if (r1 = 0) then         if (r2 = 0) then
        critical region          critical region

        (a) Dekker's Mutual Exclusion Algorithm

             x := x+1;  ∥  x := x+1;
                       vs.
      local r1;                local r2;
      r1 := x;            ∥    r2 := x;
      x := r1 + 1;             x := r2 + 1;

      (b) Different Granularities of Atomic Operations

    Figure 1. Equivalence Lost after Parallel Composition

   In addition to instruction reordering, the different granularities of atomic operations between the source and the target programs can also break the compositionality of program equivalence in a concurrent setting. In Figure 1(b), the target program at the bottom behaves differently from the source at the top (assuming each statement is executed atomically), although the individual threads at the target and the source have the same behaviors.

2.2  Assuming Arbitrary Environments is Too Strong

The problem with the refinement for sequential programs is that it does not consider the effects of threads' intermediate state accesses on their parallel environments. People have given fully abstract semantics to concurrent programs (e.g., [1, 8]). The semantics of a program is modeled as a set of execution traces. Each trace is an interleaving of state transitions made by the program itself and arbitrary transitions made by the environment. Then the refinement between programs can be defined as the subset relation between the corresponding trace sets. Since it considers all possible environments, the refinement relation has very nice compositionality, but unfortunately is too strong to formulate the correctness of many well-known transformations, including the four classes of transformations mentioned before:

 • Many concurrent languages (e.g., C++ [6]) do not give semantics to programs with data races (like the examples shown in Figure 1). Therefore the compilers only need to guarantee the semantics preservation of data-race-free programs.

 • When we prove that a fine-grained implementation of a concurrent object is a refinement of an abstract atomic operation, we can assume that all accesses to the object in the context of the target program use the same set of primitives.

 • Usually the implementation of STM (e.g., TL2 [10]) ensures the atomicity of a transaction atomic{C} only when there are no data races. Therefore, the correctness of the transformation from high-level atomic blocks to fine-grained concurrent code assumes data-race-freedom in the source.

 • Many garbage-collected languages are type-safe and prohibit operations such as pointer arithmetics. Therefore the garbage collector could make corresponding assumptions about the mutators that run in parallel.
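The lost-update scenario of Figure 1(b) can be replayed concretely by brute-forcing interleavings. The sketch below is our own toy encoding (not the paper's formal languages): each atomic action is a function on a shared store, and we enumerate every schedule of the two threads.

```python
# Exhaustive interleaving check for the two programs in Figure 1(b).
# Each thread is a list of atomic actions; an action mutates a shared store.
from itertools import combinations

def final_values(thread1, thread2):
    """Run every interleaving of the two action lists; collect final x."""
    results = set()
    n1, n2 = len(thread1), len(thread2)
    # choose which positions of the merged schedule belong to thread1
    for pos1 in combinations(range(n1 + n2), n1):
        store = {"x": 0, "r1": 0, "r2": 0}
        i1 = i2 = 0
        for pos in range(n1 + n2):
            if pos in pos1:
                thread1[i1](store); i1 += 1
            else:
                thread2[i2](store); i2 += 1
        results.add(store["x"])
    return results

# Source: each x := x+1 is one atomic action.
inc = lambda s: s.update(x=s["x"] + 1)
source = final_values([inc], [inc])

# Target: the increment is split into an atomic read and an atomic write.
read1  = lambda s: s.update(r1=s["x"])
write1 = lambda s: s.update(x=s["r1"] + 1)
read2  = lambda s: s.update(r2=s["x"])
write2 = lambda s: s.update(x=s["r2"] + 1)
target = final_values([read1, write1], [read2, write2])
```

Here `source` is {2} while `target` is {1, 2}: splitting the atomic increment lets an update be lost, even though each thread alone refines its source counterpart.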

In all these cases, the transformations of individual threads are allowed to make various assumptions about the environments. They do not have to ensure semantics preservation within all contexts.

2.3  Languages at Source and Target May Be Different

The use of different languages at the source and the target levels makes the formulation of the transformation correctness more difficult. If the source and the target languages have different views of program states and different atomic primitives, we cannot directly compare the state transitions made by the source and the target programs. This is another reason that makes the aforementioned subset relation between sets of program traces in fully abstract semantics infeasible. For the same reason, many existing techniques for proving refinement or equivalence of programs in the same language cannot be applied either.

2.4  Different Observers Make Different Observations

Concurrency introduces tensions between two kinds of observers: the outside observers as human beings and the parallel program contexts. Outside observers do not care about the implementation details of the source and the target programs. For them, intermediate state accesses (such as memory reads and writes) are silent steps (unobservable), and only external events (such as I/O operations) are observable. On the other hand, state accesses have effects on the parallel program contexts, and are not silent to them.
   If the refinement relation relates externally observable event traces only, it cannot have parallel compositionality, as we explained in Section 2.1. On the other hand, relating all state accesses of programs is too strong. Any reordering of state accesses or change of atomicity would fail the refinement.

2.5  Our Approach

In this paper we propose a Rely-Guarantee-based Simulation (RGSim) ⪯ between the target and the source programs. It establishes a weak simulation, ensuring that for every externally observable event made by the target program there is a corresponding one in the source. We choose to view intermediate state accesses as silent steps, thus we can relate programs with different implementation details. This also makes our simulation independent of language details.
   To support parallel compositionality, our relation takes into account explicitly the expected interference between threads and their parallel environments. Inspired by the rely-guarantee (R-G) verification method [15], the relation (C, R, G) ⪯ (C, R, G) is now parametrized with the interference specified using rely/guarantee conditions. It talks about not only the target C and the source C, but also the interference R and G between C and its target-level environment, and R and G between C and its environment at the source level. Here the rely condition R (or R) specifies the permitted state transitions that the environment may have, and G (or G) specifies the possible transitions made by the thread itself.
   Informally, (C, R, G) ⪯ (C, R, G) says the executions of C under the environment R do not exhibit more observable behaviors than the executions of C under the environment R, and the state transitions of C and C satisfy G and G respectively. RGSim is now compositional, as long as the threads are composed with well-behaved environments only. The parallel compositionality lemma is in the following form. If we know (C1, R1, G1) ⪯ (C1, R1, G1) and (C2, R2, G2) ⪯ (C2, R2, G2), and also the interference constraints are satisfied, i.e., G2 ⊆ R1, G1 ⊆ R2, G2 ⊆ R1 and G1 ⊆ R2, we could get

    (C1 ∥ C2, R1 ∩ R2, G1 ∪ G2) ⪯ (C1 ∥ C2, R1 ∩ R2, G1 ∪ G2).

The compositionality of RGSim gives us a proof theory for concurrent program transformations.
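Once relies and guarantees are concretized as relations, the interference side conditions of the parallel compositionality lemma become simple set-theoretic checks. The sketch below is our own finite toy model (states are just integers, and we check one level only; the lemma imposes the constraints at both the low and the high level), with `par_compose` a hypothetical helper name:

```python
# Rely/guarantee conditions as finite relations (sets of state pairs).
# The paper treats them as arbitrary binary relations over states.

def par_compose(R1, G1, R2, G2):
    """Check the interference constraints of the parallel compositionality
    lemma (one level of G2 <= R1 and G1 <= R2) and, if they hold, return
    the rely/guarantee of the parallel composition: (R1 & R2, G1 | G2)."""
    if not (G2 <= R1 and G1 <= R2):
        raise ValueError("interference constraints violated")
    return (R1 & R2, G1 | G2)

Id   = {(0, 0), (1, 1), (2, 2)}        # environment leaves the state alone
IncG = Id | {(0, 1), (1, 2)}           # a thread may increment (or do nothing)

# Two increment threads: each guarantees at most increments and
# tolerates the other's increments, so the composition goes through.
R, G = par_compose(IncG, IncG, IncG, IncG)
```

If thread 1 instead relied on an unchanging state (R1 = Id), the check G2 ⊆ R1 would fail and `par_compose` would reject the composition, mirroring how the lemma refuses ill-matched interference.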

         (τ, Σ′) ∈ c Σ                      (e, Σ′) ∈ c Σ
     ─────────────────────             ──────────────────────
     (c, Σ) −→ (skip, Σ′)              (c, Σ) −e→ (skip, Σ′)

         abort ∈ c Σ                        Σ ∉ dom(c)
     ─────────────────────             ──────────────────────
     (c, Σ) −→ abort                   (c, Σ) −→ (c, Σ)

     ───────────────────────────────
     (skip ∥ skip, Σ) −→ (skip, Σ)

          (C1, Σ) −→ (C1′, Σ′)                    (C2, Σ) −→ (C2′, Σ′)
     ─────────────────────────────────       ─────────────────────────────────
     (C1 ∥ C2, Σ) −→ (C1′ ∥ C2, Σ′)          (C1 ∥ C2, Σ) −→ (C1 ∥ C2′, Σ′)

          (C1, Σ) −e→ (C1′, Σ′)                   (C2, Σ) −e→ (C2′, Σ′)
     ─────────────────────────────────       ─────────────────────────────────
     (C1 ∥ C2, Σ) −e→ (C1′ ∥ C2, Σ′)         (C1 ∥ C2, Σ) −e→ (C1 ∥ C2′, Σ′)

      (C1, Σ) −→ abort   or   (C2, Σ) −→ abort
     ───────────────────────────────────────────
     (C1 ∥ C2, Σ) −→ abort

            Figure 3. Operational Semantics of the High-Level Language
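The rules of Figure 3 translate almost line-for-line into a nondeterministic step function. Below is a sketch under our own Python encoding of statements and instructions (the paper leaves both abstract): a statement is "skip", ("instr", c), or ("par", C1, C2), and an instruction maps a state to a set of (label, state) outcomes, to a set containing "abort", or to None when blocked.

```python
# A direct transcription of the Figure 3 rules for primitive instructions
# and parallel compositions (our own toy encoding; states must be hashable).

def step(C, S):
    """Return the set of one-step successors of configuration (C, S).
    Each successor is (label, C', S'); the special result "abort" may
    also appear. Labels are "tau" or an event name."""
    if C == "skip":
        return set()                      # skip has no transitions
    kind = C[0]
    if kind == "instr":
        c = C[1]
        outcomes = c(S)
        if outcomes is None:              # blocked: reduce to itself
            return {("tau", C, S)}
        res = set()
        for out in outcomes:
            if out == "abort":            # unsafe execution propagates
                res.add("abort")
            else:
                label, S2 = out
                res.add((label, "skip", S2))
        return res
    if kind == "par":
        _, C1, C2 = C
        if C1 == "skip" and C2 == "skip":
            return {("tau", "skip", S)}   # skip || skip -> skip
        res = set()
        for r in step(C1, S):             # left thread steps
            res.add("abort" if r == "abort" else (r[0], ("par", r[1], C2), r[2]))
        for r in step(C2, S):             # right thread steps
            res.add("abort" if r == "abort" else (r[0], ("par", C1, r[1]), r[2]))
        return res
```

For example, with states as integers and `inc = ("instr", lambda S: {("tau", S + 1)})`, the configuration (inc ∥ inc, 0) has exactly two successors, one per scheduled thread, both reaching state 1.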

   Statements are either primitive instructions or compositions of them. skip is a special statement used as a flag to show the end of executions. A single-step execution of statements is modeled as a labeled transition −→L, which maps a program configuration (a pair of statement and state) to sets of label and configuration pairs. It is undefined when the initial statement is skip. The step aborts if an unsafe instruction is executed.
   The high-level language (source language) is defined similarly in Figure 2(c), but it is important to note that its states and primitive instructions may be different from those in the low-level language. The compound statements are almost the same as their low-level counterparts. C1;; C2 and C1 ∥ C2 are sequential and parallel compositions of C1 and C2 respectively. Note that we choose to use the same set of compound statements in the two languages for simplicity only. This is not required by our simulation relation, which only needs both languages to have parallel compositions.
   Figure 3 shows part of the definition of −→H, which gives the high-level operational semantics of statements. We often omit the subscript H (or L) in −→H (or −→L). We put the label on top of the arrow in our notation, and omit it when it is τ. The semantics is mostly standard. We only show the rules for primitive instructions and parallel compositions here. Note that when a primitive instruction c is blocked at state Σ (i.e., Σ ∉ dom(c)), we let the program configuration reduce to itself. Primitive instructions in the high-level and low-level languages are atomic in the interleaving semantics. Below we use −→∗ for zero or multiple-step transitions with no events generated, and −e→∗ for multiple-step transitions with only one event e generated.

3.2  The Event Trace Refinement

Now we can formally define the refinement relation ⊑ that relates the sets of externally observable event traces generated by the target and the source programs. A trace is a sequence of events e, and may end with a termination marker done or a fault marker abort.

    (EvtTrace)  E  ::= ε | done | abort | e :: E

Definition 1 (Event Trace Set). ETrSetn(C, σ) represents a set of external event traces produced by C in n steps from the state σ:

 • ETrSet0(C, σ) ≜ {ε};
 • ETrSetn+1(C, σ) ≜
     {E | (C, σ) −→ (C′, σ′) ∧ E ∈ ETrSetn(C′, σ′)
        ∨ (C, σ) −e→ (C′, σ′) ∧ E′ ∈ ETrSetn(C′, σ′) ∧ E = e :: E′
        ∨ (C, σ) −→ abort ∧ E = abort
        ∨ C = skip ∧ E = done}.

We define ETrSet(C, σ) as ⋃n ETrSetn(C, σ).

We overload the notation and use ETrSet(C, Σ) for the high-level language. Then we define an event trace refinement as the subset relation between event trace sets.

Definition 2 (Event Trace Refinement). We say (C, σ) is an e-trace refinement of (C, Σ), i.e., (C, σ) ⊑ (C, Σ), if and only if

    ETrSet(C, σ) ⊆ ETrSet(C, Σ).

The refinement is defined for program configurations instead of for code only because the initial states may affect the behaviors of programs. In this case, the transformation T should translate states as well as code. We overload the notation and use T(Σ) to represent state transformation, and use C ⊑T C for

    ∀σ, Σ. σ = T(Σ) =⇒ (C, σ) ⊑ (C, Σ),

then Correct(T) defined in formula (1.1) can be reformulated as

    Correct(T) ≜ ∀C, C. C = T(C) =⇒ C ⊑T C.    (3.1)

   The e-trace refinement is defined directly over the externally observable behaviors of programs. It is intuitive, and also abstract in that it is independent of language details. However, as we explained before, it is not compositional w.r.t. parallel compositions. In the next section we propose RGSim, which can be viewed as a compositional proof technique that allows us to derive the simple e-trace refinement.

    ⟨R, R⟩α ≜ {((σ, σ′), (Σ, Σ′)) | (σ, σ′) ∈ R ∧ (Σ, Σ′) ∈ R∗
                                      ∧ (σ, Σ) ∈ α ∧ (σ′, Σ′) ∈ α}
    InitRelT(ζ) ≜ ∀σ, Σ. σ = T(Σ) =⇒ (σ, Σ) ∈ ζ
    B ⇔ B ≜ {(σ, Σ) | B σ = B Σ}        B ∧ B ≜ {(σ, Σ) | B σ ∧ B Σ}
    Intuit(α) ≜ ∀σ, Σ, σ′, Σ′. (σ, Σ) ∈ α ∧ σ ⊆ σ′ ∧ Σ ⊆ Σ′
                                      =⇒ (σ′, Σ′) ∈ α
    η # α ≜ (η ∩ α) ⊆ (η ∗ α)
    β ◦ α ≜ {(σ, Σ) | ∃θ. (σ, θ) ∈ α ∧ (θ, Σ) ∈ β}
    α ∗ β ≜ {(σ1 ⊎ σ2, Σ1 ⊎ Σ2) | (σ1, Σ1) ∈ α ∧ (σ2, Σ2) ∈ β}
    Id ≜ {(σ, σ) | σ ∈ LState}        True ≜ {(σ, σ′) | σ, σ′ ∈ LState}

            Figure 4. Auxiliary Definitions for RGSim

4.  The RGSim Relation

4.1  The Definition

We define RGSim in Definition 3. The co-inductively defined relation (C, σ, R, G) ⪯α;γ (C, Σ, R, G) is a simulation between program configurations (C, σ) and (C, Σ). It is parametrized with the rely and guarantee conditions at the low level and the high level, which are binary relations over states:

    R, G ∈ P(LState × LState),    R, G ∈ P(HState × HState).

The simulation also takes two additional parameters α and γ, which are relations between the low-level and the high-level states:

    α, γ, ζ ∈ P(LState × HState).

   The definition below uses ⟨R, R⟩α, the α-related transitions of R and R. As defined in Figure 4, ⟨R, R⟩α puts together the
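Definitions 1 and 2 can be prototyped on finite transition systems. In the sketch below (our own encoding: `steps` plays the role of the transition relation, labels are "tau" or event names, a successor may be "abort", and the configuration "skip" is terminated), `etr_set` accumulates the traces of all lengths up to n, and the e-trace refinement check is a subset test:

```python
# Bounded event trace sets (Definition 1) and the e-trace refinement
# check (Definition 2) on finite, explicitly given transition systems.

def etr_set(steps, conf, n):
    """Event traces conf can produce within n steps, cumulative over
    all lengths <= n (approximating the union in Definition 1)."""
    traces = {()}                           # the empty trace (0 steps)
    if n == 0:
        return traces
    if conf == "skip":
        traces.add(("done",))               # terminated configuration
    for label, succ in steps.get(conf, set()):
        if succ == "abort":
            traces.add(("abort",))          # unsafe step
        else:
            for t in etr_set(steps, succ, n - 1):
                traces.add(t if label == "tau" else (label,) + t)
    return traces

def etrace_refines(tsteps, tconf, ssteps, sconf, n):
    """Definition 2 bounded to n steps: subset of event trace sets."""
    return etr_set(tsteps, tconf, n) <= etr_set(ssteps, sconf, n)

# Target takes a silent step before the output; source outputs directly.
t = {"c0": {("tau", "c1")}, "c1": {("out", "skip")}}
s = {"d0": {("out", "skip")}}
```

Here `etrace_refines(t, "c0", s, "d0", 3)` holds: the silent step is unobservable, so both configurations produce exactly the traces ε, ("out") and ("out", done).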

corresponding transitions of R and ℛ that can be related by α. Here a step in R may correspond to zero or multiple steps of ℛ.

Definition 3 (RGSim). Whenever (C, σ, R, G) ⪯α;γ (ℂ, Σ, ℛ, 𝒢), then (σ, Σ) ∈ α and the following are true:
1. if (C, σ) −→ (C′, σ′), then (σ, σ′) ∈ G and there exist ℂ′ and Σ′ such that (ℂ, Σ) −→∗ (ℂ′, Σ′), (Σ, Σ′) ∈ 𝒢∗ and (C′, σ′, R, G) ⪯α;γ (ℂ′, Σ′, ℛ, 𝒢);
2. if (C, σ) −e→ (C′, σ′), then (σ, σ′) ∈ G and there exist ℂ′ and Σ′ such that (ℂ, Σ) −→∗ −e→ (ℂ′, Σ′), (Σ, Σ′) ∈ 𝒢∗ and (C′, σ′, R, G) ⪯α;γ (ℂ′, Σ′, ℛ, 𝒢);
3. if C = skip, then there exists Σ′ such that (ℂ, Σ) −→∗ (skip, Σ′), (Σ, Σ′) ∈ 𝒢∗ and (σ, Σ′) ∈ γ;
4. if (C, σ) −→ abort, then (ℂ, Σ) −→∗ abort;
5. if ((σ, σ′), (Σ, Σ′)) ∈ ⟨R, ℛ⟩α, then (C, σ′, R, G) ⪯α;γ (ℂ, Σ′, ℛ, 𝒢).

Then, (C, R, G) ⪯α;ζ,γ (ℂ, ℛ, 𝒢) iff for all σ and Σ, if (σ, Σ) ∈ ζ, then (C, σ, R, G) ⪯α;γ (ℂ, Σ, ℛ, 𝒢).

Informally, (C, σ, R, G) ⪯α;γ (ℂ, Σ, ℛ, 𝒢) says the low-level configuration (C, σ) is simulated by the high-level configuration (ℂ, Σ) with behaviors G and 𝒢 respectively, no matter how their environments R and ℛ interfere with them. It requires the following to hold for every execution of C:
• Starting from α-related states, each step of C corresponds to zero or multiple steps of ℂ, and the resulting states are α-related too. If an external event is produced in the step of C, the same event should be produced by ℂ.
• The α relation reflects the abstractions from the low-level machine model to the high-level one, and is preserved by related steps at the two levels (so it is an invariant). For instance, in optimizations where the target machine is the same as the source, the α relation usually requires that the shared states be identical, and the thread-local parts be arbitrarily related.
• The executions of C and ℂ need to satisfy their guarantee conditions G and 𝒢 respectively. The guarantees are abstractions of the programs' behaviors. As we will show later, they will serve as the rely conditions of the parallel environments at the time of parallel compositions.
• If C terminates, then ℂ terminates as well, and the final states should be related by the postcondition γ. Here we assume γ ⊆ α, i.e., the postcondition is stronger than the invariant.
• C is not safe only if ℂ is not safe either. This means the transformation should not make a safe high-level program unsafe at the low level.
• Whatever the low-level environment R and the high-level one ℛ do, as long as the state transitions are α-related, they should not affect the simulation between C and ℂ.

Then based on the simulation, we hide the states by the precondition ζ and define the RGSim relation between programs only. It's easy to see that ζ ⊆ α if (C, R, G) ⪯α;ζ,γ (ℂ, ℛ, 𝒢), i.e., the precondition needs to be stronger than the invariant.

RGSim is sound w.r.t. the e-trace refinement (Definition 2). That is, (C, σ, R, G) ⪯α;γ (ℂ, Σ, ℛ, 𝒢) ensures that (C, σ) does not have more observable behaviors than (ℂ, Σ).

Theorem 4 (Soundness). If there exist R, G, ℛ, 𝒢, α and γ such that (C, σ, R, G) ⪯α;γ (ℂ, Σ, ℛ, 𝒢), then (C, σ) ⊑ (ℂ, Σ).

For program transformations, since the initial state for the target program is transformed from the initial state for the source, we use InitRelT(ζ) (defined in Figure 4) to say the transformation T over states ensures the binary precondition ζ.

Corollary 5. If there exist R, G, ℛ, 𝒢, α, ζ and γ such that InitRelT(ζ) and (C, R, G) ⪯α;ζ,γ (ℂ, ℛ, 𝒢), then C ⊑T ℂ.

4.2  Compositionality Rules

RGSim is compositional w.r.t. various program constructs, including parallel compositions. We present the compositionality rules in Figure 5, which gives us a relational proof method for concurrent program transformations.

As in the R-G logic [15], we require that pre- and post-conditions be stable under the interference from the environments. Here we introduce the concept of stability of a relation ζ w.r.t. the α-related transitions of R and ℛ.

Definition 6 (Stability). Sta(ζ, ⟨R, ℛ⟩α) holds iff for all σ, σ′, Σ and Σ′, if (σ, Σ) ∈ ζ and ((σ, σ′), (Σ, Σ′)) ∈ ⟨R, ℛ⟩α, then (σ′, Σ′) ∈ ζ.

It says whenever ζ holds initially and R and ℛ perform related actions, the resulting states still satisfy ζ. It's easy to see that α itself is stable w.r.t. any α-related transitions, i.e., Sta(α, ⟨R, ℛ⟩α). Another simple example is given below, where both environments could increment x and the unary stable assertion {x ≥ 0} is lifted to the relation ζ:

    ζ ≜ {(σ, Σ) | σ(x) = Σ(x) ≥ 0}      α ≜ {(σ, Σ) | σ(x) = Σ(x)}
    R ≜ {(σ, σ′) | σ′ = σ{x ⇝ σ(x) + 1}}
    ℛ ≜ {(Σ, Σ′) | Σ′ = Σ{x ⇝ Σ(x) + 1}}

We can prove Sta(ζ, ⟨R, ℛ⟩α). Stability is assumed to be an implicit side-condition at every proof rule. We also require implicitly that the relies and guarantees are closed over identity transitions, since stuttering steps will not affect observable event traces.

The rules SKIP, SEQ, IF and WHILE reveal a high degree of similarity to the corresponding inference rules in Hoare logic. In the SEQ rule, γ serves as the postcondition of C1 and ℂ1 and the precondition of C2 and ℂ2 at the same time. The IF rule requires the conditions of both sides to be evaluated to the same value under the precondition ζ. We give the definitions of the sets B ⇔ 𝔹 and B ∧ 𝔹 in Figure 4. The rule also requires the precondition ζ to be stronger than the invariant α. In the WHILE rule, the γ relation is viewed as a loop invariant preserved at the loop entry point.

The PAR rule shows parallel compositionality of RGSim. The interference constraints say that two threads can be composed in parallel only if one thread's guarantee implies the rely of the other. After parallel composition, they are expected to run in the common environment and their guaranteed behaviors contain each single thread's behaviors.

We also develop some other useful rules about RGSim. For example, the STREN-α rule allows us to replace the invariant α by a stronger invariant α′. We need to check that α′ is indeed an invariant preserved by the related program steps, i.e., Sta(α′, ⟨G, 𝒢⟩α) holds. Symmetrically, the WEAKEN-α rule requires α to be preserved by environment steps related by the weaker invariant α′. As usual, the pre/post conditions, the relies and the guarantees can be strengthened or weakened by the CONSEQ rule.

The FRAME rule allows us to use local specifications. When verifying the simulation between C and ℂ, we need to only talk about the locally-used resource in α, ζ and γ, and the local relies and guarantees R, G, ℛ and 𝒢. Then the proof can be reused in contexts where some extra resource η is used, and the accesses of it respect the invariant β and R′, G′, ℛ′ and 𝒢′. We give the auxiliary definitions in Figure 4. The disjoint union between states is lifted to state pairs. An intuitionistic state relation is monotone w.r.t. the extension of states. The disjointness η # α says that any state pair satisfying both η and α can be split into two disjoint state pairs
                              ζ ⊆ α
  ───────────────────────────────────────────────────────── (SKIP)
              (skip, R, Id) ⪯α;ζ,ζ (skip, ℛ, Id)

    (C1, R, G) ⪯α;ζ,γ (ℂ1, ℛ, 𝒢)    (C2, R, G) ⪯α;γ,η (ℂ2, ℛ, 𝒢)
  ───────────────────────────────────────────────────────── (SEQ)
              (C1; C2, R, G) ⪯α;ζ,η (ℂ1; ℂ2, ℛ, 𝒢)

    (C1, R, G) ⪯α;ζ1,γ (ℂ1, ℛ, 𝒢)    (C2, R, G) ⪯α;ζ2,γ (ℂ2, ℛ, 𝒢)
    ζ ⊆ (B ⇔ 𝔹)    ζ1 = (ζ ∩ (B ∧ 𝔹))    ζ2 = (ζ ∩ (¬B ∧ ¬𝔹))    ζ ⊆ α
  ───────────────────────────────────────────────────────── (IF)
    (if (B) C1 else C2, R, G) ⪯α;ζ,γ (if 𝔹 then ℂ1 else ℂ2, ℛ, 𝒢)

    (C, R, G) ⪯α;γ1,γ (ℂ, ℛ, 𝒢)    γ ⊆ (B ⇔ 𝔹)
    γ1 = (γ ∩ (B ∧ 𝔹))    γ2 = (γ ∩ (¬B ∧ ¬𝔹))
  ───────────────────────────────────────────────────────── (WHILE)
          (while (B) C, R, G) ⪯α;γ,γ2 (while 𝔹 do ℂ, ℛ, 𝒢)

    (C1, R1, G1) ⪯α;ζ,γ1 (ℂ1, ℛ1, 𝒢1)    (C2, R2, G2) ⪯α;ζ,γ2 (ℂ2, ℛ2, 𝒢2)
    G1 ⊆ R2    G2 ⊆ R1    𝒢1 ⊆ ℛ2    𝒢2 ⊆ ℛ1
  ───────────────────────────────────────────────────────── (PAR)
    (C1 ∥ C2, R1 ∩ R2, G1 ∪ G2) ⪯α;ζ,(γ1∩γ2) (ℂ1 ∥ ℂ2, ℛ1 ∩ ℛ2, 𝒢1 ∪ 𝒢2)

    (C, R, G) ⪯α;ζ,γ (ℂ, ℛ, 𝒢)
    (ζ ∪ γ) ⊆ α′ ⊆ α    Sta(α′, ⟨G, 𝒢⟩α)
  ───────────────────────────────────────────────────────── (STREN-α)
              (C, R, G) ⪯α′;ζ,γ (ℂ, ℛ, 𝒢)

    (C, R, G) ⪯α;ζ,γ (ℂ, ℛ, 𝒢)
    α ⊆ α′    Sta(α, ⟨R, ℛ⟩α′)
  ───────────────────────────────────────────────────────── (WEAKEN-α)
              (C, R, G) ⪯α′;ζ,γ (ℂ, ℛ, 𝒢)

    (C, R, G) ⪯α;ζ,γ (ℂ, ℛ, 𝒢)    ζ′ ⊆ ζ    γ ⊆ γ′ ⊆ α
    R′ ⊆ R    ℛ′ ⊆ ℛ    G ⊆ G′    𝒢 ⊆ 𝒢′
  ───────────────────────────────────────────────────────── (CONSEQ)
              (C, R′, G′) ⪯α;ζ′,γ′ (ℂ, ℛ′, 𝒢′)

    (C, R, G) ⪯α;ζ,γ (ℂ, ℛ, 𝒢)    η ⊆ β    Intuit({α, ζ, γ, β, η, R, ℛ, R′, ℛ′})
    η # {ζ, γ, α}    Sta(η, {⟨G, 𝒢⟩α, ⟨R′, ℛ′⟩β})
  ───────────────────────────────────────────────────────── (FRAME)
    (C, R ∗ R′, G ∗ G′) ⪯(α∗β);(ζ∗η),(γ∗η) (ℂ, ℛ ∗ ℛ′, 𝒢 ∗ 𝒢′)

    (C, R, G) ⪯α;ζ,γ (M, RM, GM)    (M, RM, GM) ⪯β;δ,η (ℂ, ℛ, 𝒢)
    ⟨R, ℛ⟩β◦α = ⟨RM, ℛ⟩β ◦ ⟨R, RM⟩α
  ───────────────────────────────────────────────────────── (TRANS)
              (C, R, G) ⪯(β◦α);(δ◦ζ),(η◦γ) (ℂ, ℛ, 𝒢)

              Figure 5. Compositionality Rules for RGSim
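Definition 6 and the implicit stability side-conditions of the rules above can be sanity-checked by brute force on small finite state spaces. The sketch below is our own modeling, not from the paper: each state is collapsed to the single value of x, and we check Sta(ζ, ⟨R, ℛ⟩α) for the increment example of Section 4.2.

```python
# Brute-force check of Definition 6 (Stability) for the increment example:
# zeta = {(sigma, Sigma) | sigma(x) = Sigma(x) >= 0},
# alpha = {(sigma, Sigma) | sigma(x) = Sigma(x)},
# and both environments may only do x := x + 1.
# States are modeled as the value of x, drawn from a small finite range.

zeta  = lambda lo, hi: lo == hi >= 0
alpha = lambda lo, hi: lo == hi

R_low  = lambda x: x + 1   # low-level environment transition
R_high = lambda x: x + 1   # high-level environment transition

def stable(zeta, alpha, R_low, R_high, domain):
    """Sta(zeta, <R, R'>_alpha): from zeta, alpha-related environment
    transitions on both levels must re-establish zeta."""
    for lo in domain:
        for hi in domain:
            lo2, hi2 = R_low(lo), R_high(hi)
            # <R, R'>_alpha pairs transitions whose pre- and post-states
            # are alpha-related
            if zeta(lo, hi) and alpha(lo, hi) and alpha(lo2, hi2):
                if not zeta(lo2, hi2):
                    return False
    return True

assert stable(zeta, alpha, R_low, R_high, range(-3, 4))
# A non-example: the tighter relation x = 0 on both sides is NOT stable,
# since the environments move (0, 0) to (1, 1).
assert not stable(lambda lo, hi: lo == hi == 0,
                  alpha, R_low, R_high, range(-3, 4))
```

Such a finite check is of course no substitute for the proof obligation, but it is a cheap way to catch an unstable candidate ζ before attempting a simulation proof.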

satisfying η and α respectively. For example, let η ≜ {(σ, Σ) | σ(y) = Σ(y)} and α ≜ {(σ, Σ) | σ(x) = Σ(x)}; then both η and α are intuitionistic and η # α holds. We also require η to be stable under interference from the programs (i.e., the programs do not change the extra resource) and the extra environments. We use η # {ζ, γ, α} as a shorthand for (η # ζ) ∧ (η # γ) ∧ (η # α). Similar representations are used in this rule.

Finally, the transitivity rule TRANS allows us to verify a transformation by using an intermediate level as a bridge. The intermediate environment RM should be chosen with caution so that the (β ◦ α)-related transitions can be decomposed into β-related and α-related transitions. Here ◦ defines the composition of two relations, as shown in Figure 4. Due to the space limit, all the soundness proofs are put in the technical report submitted with this paper.

Instantiations of relies and guarantees.  We can derive the sequential refinement and the fully-abstract-semantics-based refinement by instantiating the rely conditions in RGSim. For example, the refinement (4.1) over closed programs assumes identity environments, making the interference constraints in the PAR rule unsatisfiable. This confirms the observation in Section 2.1 that the sequential refinement loses parallel compositionality.

        (C, Id, True) ⪯α;ζ,γ (ℂ, Id, True)                    (4.1)

The refinement (4.2) assumes arbitrary environments, which makes the interference constraints in the PAR rule trivially true. But this requirement is too strong and usually cannot be satisfied in practice.

        (C, True, True) ⪯α;ζ,γ (ℂ, True, True)                (4.2)

Here we use Id and True (defined in Figure 4) for the sets of identity transitions and arbitrary transitions respectively, and overload the notations at the low level to the high level.

Example.  Below we give a simple example to illustrate the use of RGSim and its parallel compositionality. ℂ1 ∥ ℂ2 is transformed to C1 ∥ C2, using a lock l to synchronize the accesses of the shared variable x. We aim to prove C1 ∥ C2 ⊑T ℂ1 ∥ ℂ2. That is, although x := x + 2 is implemented by two steps of incrementing x in C2, the parallel observer C1 will not print unexpected values. Here we view output events as externally observable behaviors.

        print(x);              x := x + 2;
                        ⇓
        lock(l);               lock(l);
        print(x);              x := x+1; x := x+1;
        unlock(l);             unlock(l);

By the soundness and compositionality of RGSim, we only need to prove simulations over individual threads, providing appropriate relies and guarantees. We first define the invariant α, which only cares about the value of x when the lock is free.

        α ≜ {(σ, Σ) | σ(l) = 0 =⇒ σ(x) = Σ(x)} .

We let the pre- and post-conditions be α as well.

The high-level threads can be executed in arbitrary environments with arbitrary guarantees: ℛ = 𝒢 ≜ True. The transformation uses the lock to protect every access of x, thus the low-level relies and guarantees are not arbitrary:

    R ≜ {(σ, σ′) | σ(l) = cid =⇒ σ(x) = σ′(x) ∧ σ(l) = σ′(l)} ;
    G ≜ {(σ, σ′) | σ′ = σ ∨ σ(l) = 0 ∧ σ′ = σ{l ⇝ cid}
                 ∨ σ(l) = cid ∧ σ′ = σ{x ⇝ _}
                 ∨ σ(l) = cid ∧ σ′ = σ{l ⇝ 0}} .

The low-level thread guarantees that it updates x only when the lock is acquired. Its environment cannot update x or l if the current thread holds the lock. Here cid is the identifier of the current thread. When acquired, the lock holds the id of the owner thread.

Following the definition, we can prove (C1, R, G) ⪯α;α,α (ℂ1, ℛ, 𝒢) and (C2, R, G) ⪯α;α,α (ℂ2, ℛ, 𝒢). By applying the
PAR rule and from soundness of RGSim (Corollary 5), we know C1 ∥ C2 ⊑T ℂ1 ∥ ℂ2 holds for any T that respects α.

Perhaps interestingly, if we omit the lock and unlock operations in C1, then C1 ∥ C2 would have more observable behaviors than ℂ1 ∥ ℂ2. This does not show the unsoundness of our PAR rule (which is sound!). The reason is that we cannot have (print(x), R, G) ⪯α;α,α (print(x), ℛ, 𝒢) with the current definitions of α, R and G, even though the code of the two sides is syntactically identical.

More discussions.  RGSim ensures that the target program preserves safety properties (including the partial correctness) of the source, but allows a terminating source program to be transformed to a target having infinite silent steps. In the above example, this allows the low-level programs to be blocked forever (e.g., at the time when the lock is held but never released by some other thread). Proving the preservation of the termination behavior would require liveness proofs in a concurrent setting (e.g., proving the absence of deadlock), which we leave as future work.

In the next three sections, we show more serious examples to demonstrate the applicability of RGSim.

5.  Relational Reasoning about Optimizations

As a general correctness notion of concurrent program transformations, RGSim establishes a relational approach to justify compiler optimizations on concurrent programs. Below we adapt Benton's work [3] on sequential optimizations to the concurrent setting.

5.1  Optimization Rules

Usually optimizations depend on particular contexts, e.g., the assignment x := E can be eliminated only in the context that the value of x is never used after the assignment. In a shared-state concurrent setting, we should also consider the parallel context for an optimization. RGSim enables us to specify various sophisticated requirements for the parallel contexts by rely/guarantee conditions. Based on RGSim, we provide a set of inference rules to characterize and justify common optimizations (e.g., dead code elimination) with information of both the sequential and the parallel contexts. Due to the space limit, we only present some interesting rules here and leave other rules in the technical report. Note in this section the target and the source programs are in the same language.

Sequential skip Law

        (C1, R1, G1) ⪯α;ζ,γ (C2, R2, G2)
  ──────────────────────────────────────────────
        (skip; C1, R1, G1) ⪯α;ζ,γ (C2, R2, G2)

        (C1, R1, G1) ⪯α;ζ,γ (C2, R2, G2)
  ──────────────────────────────────────────────
        (C1, R1, G1) ⪯α;ζ,γ (skip; C2, R2, G2)

Plus the variants with skip after the code C1 or C2. That is, skips could be arbitrarily introduced and eliminated.

Common Branch

        ∀σ1, σ2. (σ1, σ2) ∈ ζ =⇒ ⟦B⟧σ2 ≠ ⊥
        (C, R, G) ⪯α;ζ1,γ (C1, R′, G′)    ζ1 = (ζ ∩ (true ∧ B))
        (C, R, G) ⪯α;ζ2,γ (C2, R′, G′)    ζ2 = (ζ ∩ (true ∧ ¬B))
  ──────────────────────────────────────────────
        (C, R, G) ⪯α;ζ,γ (if (B) C1 else C2, R′, G′)

This rule says that, when the if-condition can be evaluated and both branches can be optimized to the same code C, we can transform the whole if-statement to C without introducing new behaviors.

Dead While

        ζ = (ζ ∩ (true ∧ ¬B))    ζ ⊆ α    Sta(ζ, ⟨R, R′⟩α)
  ──────────────────────────────────────────────
        (skip, R, Id) ⪯α;ζ,ζ (while (B){C}, R′, Id)

We can eliminate the loop if the loop condition is false (no matter how the environments update the states) at the loop entry point.

Dead Code Elimination

        (skip, Id, Id) ⪯α;ζ,γ (C, Id, G)    Sta({ζ, γ}, ⟨R, R′⟩α)
  ──────────────────────────────────────────────
        (skip, R, Id) ⪯α;ζ,γ (C, R′, G)

Intuitively (skip, Id, Id) ⪯α;ζ,γ (C, Id, G) says that the code C can be eliminated in a sequential context where the initial and the final states satisfy ζ and γ respectively. If both ζ and γ are stable w.r.t. the interference from the environments R and R′, then the code C can be eliminated in such a parallel context as well.

Redundancy Introduction

        (c, Id, G) ⪯α;ζ,γ (skip, Id, Id)    Sta({ζ, γ}, ⟨R, R′⟩α)
  ──────────────────────────────────────────────
        (c, R, G) ⪯α;ζ,γ (skip, R′, Id)

Similar to the dead-code-elimination rule, we can also lift sequential redundant code introduction to the concurrent setting, as long as the pre/post conditions are stable w.r.t. the environments. Note that here c is a single instruction, because we should consider the interference from the environments at every intermediate state when introducing a sequence of redundant instructions.

5.2  An Example of Invariant Hoisting

With these rules, we can prove the correctness of many traditional compiler optimizations performed on concurrent programs in appropriate contexts. Here we only give a small example of hoisting loop invariants. More examples (e.g., strength reduction and induction variable elimination) can be found in the technical report.

        Target Code (C1)              Source Code (C)
        local t;                      local t;
        t := x + 1;                   while(i < n) {
        while(i < n) {                  t := x + 1;
          i := i + t;                   i := i + t;
        }                             }

When we do not care about the final value of t, it's not difficult to prove that the optimized code C1 preserves the sequential behaviors of the source C [3]. But in a concurrent setting, safely hoisting the invariant code t := x + 1 also requires that the environment should not update x nor t:

        R ≜ {(σ, σ′) | σ(x) = σ′(x) ∧ σ(t) = σ′(t)} .

The guarantee of the program can be specified as arbitrary transitions. Since we only care about the values of i, n and x, the invariant relation α can be defined as:

    α ≜ {(σ1, σ) | σ1(i) = σ(i) ∧ σ1(n) = σ(n) ∧ σ1(x) = σ(x)} .

We do not need special pre/post conditions, thus the correctness of the optimization is formalized as follows:

        (C1, R, True) ⪯α;α,α (C, R, True) .                   (5.1)

We can prove (5.1) directly by the RGSim definition and the operational semantics of the code. Below we give a more convenient proof using the optimization rules and the compositionality rules. We first prove the following by the dead-code-elimination and redundancy-introduction rules:

        (t:=x+1, R, True) ⪯α;α,γ (skip, R, True) ;
        (skip, R, True) ⪯α;γ,η (t:=x+1, R, True) ,

where γ and η specify the states at the specific program points:

        γ ≜ α ∩ {(σ1, σ) | σ1(t) = σ1(x) + 1} ;
        η ≜ γ ∩ {(σ1, σ) | σ(t) = σ(x) + 1} .

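As a quick executable illustration (our own toy model, not the paper's development) of why the rely R matters for (5.1): we model each assignment as one atomic step and the environment as one interfering transition before every program step, and unroll the loop for a fixed n (the environment here leaves i and n alone). A rely-respecting environment preserves the observable value of i, while a hypothetical environment that writes x separates the hoisted code from the source.

```python
# Toy interleaving check for the invariant-hoisting example.
# Programs are lists of atomic state updates; the environment takes
# one transition before each program step.

def run(steps, state, env):
    s = dict(state)
    for step in steps:
        env(s)    # one environment (rely) transition
        step(s)   # one program transition
    return s

def target_steps(n):
    # hoisted: t := x + 1 once, then n iterations of i := i + t
    return [lambda s: s.update(t=s["x"] + 1)] + \
           [lambda s: s.update(i=s["i"] + s["t"]) for _ in range(n)]

def source_steps(n):
    # original: recompute t := x + 1 inside every iteration
    steps = []
    for _ in range(n):
        steps.append(lambda s: s.update(t=s["x"] + 1))
        steps.append(lambda s: s.update(i=s["i"] + s["t"]))
    return steps

init = {"x": 5, "t": 0, "i": 0, "n": 3}
env_ok  = lambda s: None                     # respects R: x and t untouched
env_bad = lambda s: s.update(x=s["x"] + 1)   # violates R: writes x

# Under a rely-respecting environment, the observable i agrees...
assert run(target_steps(3), init, env_ok)["i"] == \
       run(source_steps(3), init, env_ok)["i"] == 18
# ...but an environment outside R separates target from source.
assert run(target_steps(3), init, env_bad)["i"] != \
       run(source_steps(3), init, env_bad)["i"]
```

The second assertion is exactly the failure the rely condition rules out: once the environment may change x between loop iterations, t := x + 1 is no longer loop-invariant and hoisting it changes the computed i.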
After adding skips to C1 and C to make them the same “shape”,                      ADD(e) :                   RMV(e) :
we can prove the simulation by the compositionality rules. Finally,                0 atom {                   0 atom {
we remove all the skips and conclude (5.1), i.e., the correctness                        S := S ∪ {e};              S := S − {e};
                                                                                       }                          }
of the optimization in appropriate contexts. Since the relies only
prohibit updates of x and t, we can execute C1 and C concurrently                                 (a) An Abstract Set
with other threads which update i and n or read x, still ensuring
semantics preservation.                                                       add(e) :                        rmv(e) :
                                                                                                                   local x,y,z,v;
                                                                                    local x,y,z,u;
                                                                                                               0 <x := Head;>
                                                                               0    <x := Head;>
                                                                                                               1 lock(x);
6.   Proving Atomicity of Concurrent Objects                                   1    lock(x);
                                                                                                               2 <y :=;>
                                                                               2    <z :=;>
A concurrent object provides a set of methods, which can be called             3    <u :=;>
                                                                                                               3 <v :=;>
in parallel by clients as the only way to access the object. RGSim             4    while (u < e) {
                                                                                                               4 while (v < e) {
gives us a refinement-based proof method to verify the atomicity                                                5      lock(y);
                                                                               5      lock(z);
of implementations of the object: we can define abstract atomic                                                 6      unlock(x);
                                                                               6      unlock(x);
                                                                                                               7      x := y;
operations in a high-level language as specifications, and prove                7      x := z;
                                                                                                               8      <y :=;>
the concrete fine-grained implementations refine the correspond-                 8      <z :=;>
                                                                                                               9      <v :=;>
ing atomic operations when executed in appropriate environments.               9      <u :=;>
For instance, in Figure 6(a) we define two atomic set operations,                    }
                                                                                                              10 if (v = e) {
ADD(e) and RMV(e). Figure 6(b) gives a concrete implementation                10    if (u != e) {
                                                                                                              11      lock(y);
                                                                              11      y := new();
of the set object using a lock-coupling list. The algorithm was ver-          12      y.lock := 0;
                                                                                                              12      <z :=;>
ified in RGSep [25, 26]. Here we show how to use our RGSim                     13 := e;
                                                                                                              13      < := z;>
to verify its atomicity by proving the low-level methods refine the                                            14      unlock(x);
                                                                              14 := z;
corresponding abstract operations.                                                                            15      free(y);
                                                                              15      < := y;>
                                                                                                                   } else {
    We first take the generic languages in Figure 3, and instantiate                 }
                                                                                                              16      unlock(x);
the high-level program states below.                                          16    unlock(x);
           (HMem)     Ms , M l   ∈   (Loc ∪ PVar)     HVal
                                                                                          (b) The Lock-Coupling List-Based Set
          (HThrds)       Π       ∈   ThrdID → HMem
           (HState)      Σ       ∈   HThrds × HMem                                            Figure 6. The Set Object

The state consists of the shared memory Ms (where the object resides) and a thread pool Π, which is a mapping from thread identifiers (t ∈ ThrdID) to their local memory Ml. The low-level state σ is defined similarly. We use ms, ml and π to represent the low-level shared memory, thread-local memory and thread pool respectively.
    To allow ownership transfer between the shared memory and thread-local memory, we use atom{C}A (or ⟨C⟩A at the low level) to convert the shared memory to local memory and execute C (or C) atomically. Following RGSep [26], an abstract transition A ∈ P(HMem × HMem) (or A ∈ P(LMem × LMem)) is used to specify the effects of the atomic operation over the shared memory, which allows us to split the resulting state back into shared and local parts. We omit the annotations A and A in Figure 6; they are the same as the corresponding guarantees in Figure 7, as we explain below.
    In Figure 6, the abstract set is implemented by an ordered singly-linked list pointed to by a shared variable Head, with two sentinel nodes at the two ends of the list containing the values MIN_VAL and MAX_VAL respectively. Each list node is associated with a lock. Traversing the list uses "hand-over-hand" locking: the lock on one node is not released until its successor is locked. add(e) inserts a new node with value e at the appropriate position while holding the lock of its predecessor. rmv(e) redirects the predecessor's pointer while both the node to be removed and its predecessor are locked.
    We define the α relation, the guarantees and the relies in Figure 7. The predicate ms |= list(x, A) represents a singly-linked list in the shared memory ms at location x, whose values form the sequence A. The mapping shared_map between the low-level and high-level shared memory is then defined by considering only the value sequence of the list: the concrete list must be sorted and its elements must constitute the abstract set. For a thread t's local memory at the two levels, we require that the values of e coincide and that enough local space is provided for add(e) and rmv(e). Then α relates the shared memory by shared_map and the local memory of each thread t by local_map.
    The atomic actions of the algorithm are specified by Glock, Gunlock, Gadd, Grmv and Glocal respectively, which are all parameterized with a thread identifier t. For example, Grmv(t) says that when holding the locks of the node y and its predecessor x, we can transfer the node y from the shared memory to the thread's local memory. This corresponds to the action performed by the code of line 13 in rmv(e). Every thread t executes in an environment where any other thread t′ can only perform those five actions, as defined in R(t). Similarly, the high-level G(t) and R(t) are defined according to the abstract ADD(e) and RMV(e).
    We can prove that for any thread t, the following hold:

        (t.add(e), R(t), G(t)) ⪯α;α⋉α (t.ADD(e), R(t), G(t));
        (t.rmv(e), R(t), G(t)) ⪯α;α⋉α (t.RMV(e), R(t), G(t)).

The proof is done operationally, based on the definition of RGSim. Due to the space limit, detailed proofs are given in the technical report submitted with this paper.
    By the compositionality and the soundness of RGSim, we know that the fine-grained operations (under the parallel environment R) are simulated by the corresponding atomic operations (under the high-level environment R), while R and R say that all accesses to the set must be done through the add and remove operations. This gives us the atomicity of the concurrent implementation of the set object.

More examples. In the companion technical report, we also show the use of RGSim to prove the atomicity of other fine-grained algorithms, including the non-blocking concurrent counter [24], Treiber's stack algorithm [23], and a concurrent GCD algorithm (calculating greatest common divisors).

2011/10/8

  ms |= list(x, A)    ≜ (ms = ∅ ∧ x = null ∧ A = ε)
                        ∨ (∃ms′, v, y, A′. ms = ms′ ⊎ {x ↦ (_, v, y)} ∧ A = v::A′ ∧ ms′ |= list(y, A′))
  shared_map(ms, Ms)  ≜ ∃ms′, A, x. ms = ms′ ⊎ {Head ↦ x} ∧ (ms′ |= list(x, MIN_VAL::A::MAX_VAL)) ∧ sorted(A) ∧ (elems(A) = Ms(S))
  local_map(ml, Ml)   ≜ ml(e) = Ml(e) ∧ ∃ml′. ml = ml′ ⊎ {x ↦ _, y ↦ _, z ↦ _, u ↦ _, v ↦ _}
  α                   ≜ {((π, ms), (Π, Ms)) | shared_map(ms, Ms) ∧ ∀t ∈ dom(Π). local_map(π(t), Π(t))}

  Glock(t)            ≜ {((π, ms), (π, ms′)) | ∃x, v, y. ms(x) = (0, v, y) ∧ ms′ = ms{x ↦ (t, v, y)}}
  Gunlock(t)          ≜ {((π, ms), (π, ms′)) | ∃x, v, y. ms(x) = (t, v, y) ∧ ms′ = ms{x ↦ (0, v, y)}}
  Gadd(t)             ≜ {((π ⊎ {t ↦ ml}, ms), (π ⊎ {t ↦ ml′}, ms′))
                        | ∃x, y, z, u, v, w. ms(x) = (t, u, z) ∧ ms(z) = (_, w, _)
                          ∧ ms′ = ms{x ↦ (t, u, y)} ⊎ {y ↦ (0, v, z)} ∧ (ml′ ⊎ {y ↦ (0, v, z)} = ml) ∧ u < v < w}
  Grmv(t)             ≜ {((π ⊎ {t ↦ ml}, ms), (π ⊎ {t ↦ ml′}, ms′))
                        | ∃x, y, z, u, v. ms(x) = (t, u, y) ∧ ms(y) = (t, v, z)
                          ∧ ms′ ⊎ {y ↦ (t, v, z)} = ms{x ↦ (t, u, z)} ∧ ml′ = ml ⊎ {y ↦ (t, v, z)} ∧ v < MAX_VAL}
  Glocal(t)           ≜ {((π ⊎ {t ↦ ml}, ms), (π ⊎ {t ↦ ml′}, ms)) | π ∈ (ThrdID ⇀ LMem) ∧ ml, ml′, ms ∈ LMem}
  G(t)                ≜ Glock(t) ∪ Gunlock(t) ∪ Gadd(t) ∪ Grmv(t) ∪ Glocal(t)        R(t) ≜ ⋃_{t′≠t} G(t′)

  Gadd(t)             ≜ {((Π ⊎ {t ↦ Ml}, Ms), (Π ⊎ {t ↦ Ml}, Ms′)) | ∃e. Ms′ = Ms{S ↦ Ms(S) ∪ {e}}}
  Grmv(t)             ≜ {((Π ⊎ {t ↦ Ml}, Ms), (Π ⊎ {t ↦ Ml}, Ms′)) | ∃e. Ms′ = Ms{S ↦ Ms(S) − {e}}}
  Glocal(t)           ≜ {((Π ⊎ {t ↦ Ml}, Ms), (Π ⊎ {t ↦ Ml′}, Ms)) | Π ∈ (ThrdID ⇀ HMem) ∧ Ml, Ml′, Ms ∈ HMem}
  G(t)                ≜ Gadd(t) ∪ Grmv(t) ∪ Glocal(t)                                R(t) ≜ ⋃_{t′≠t} G(t′)

                        Figure 7. Useful Definitions for the Lock-Coupling List

7.   Verifying Concurrent Garbage Collectors

In this section, we explain in detail how to reduce the problem of verifying concurrent garbage collectors to transformation verification, and use RGSim to develop a general GC verification framework. We apply the framework to prove the correctness of the Boehm et al. concurrent GC algorithm [7].

7.1   Correctness of Concurrent GCs

A concurrent GC is executed by a dedicated thread and performs the collection work in parallel with user threads (mutators), which access the shared heap via read, write and allocation operations. To ensure that the GC and the mutators share a coherent view of the heap, the heap operations from mutators may be instrumented with extra operations, which provide an interaction mechanism allowing arbitrary mutators to cooperate with the GC. These instrumented heap operations are called barriers (e.g., read barriers, write barriers and allocation barriers).
    The GC thread and the barriers constitute a concurrent garbage collecting system, which provides a higher-level, user-friendly programming model for garbage-collected languages (e.g., Java). In this high-level model, programmers feel they access the heap using regular memory operations, and are freed from manually disposing of objects that are no longer in use. They do not need to consider the implementation details of the GC or the existence of barriers.
    We could verify the GC system by using a Hoare-style logic to prove that the GC thread and the barriers satisfy their specifications. However, this is an indirect approach, because it is unclear whether the specified correct behaviors would indeed make the mutators happy and generate the abstract view for high-level programmers. Usually this part is examined by experts and then trusted.
    Here we propose a more direct approach. We view a concurrent garbage collecting system as a transformation T from a high-level garbage-collected language to a low-level language. A standard atomic memory operation at the source level is transformed into the corresponding barrier code at the target level. At the source level, we assume there is an abstract GC thread that magically turns unreachable objects into reusable memory. The abstract collector AbsGC is transformed into the concrete GC code Cgc running concurrently with the target mutators. That is,

        T(tgc.AbsGC ∥ t1.C1 ∥ ... ∥ tn.Cn) ≜ tgc.Cgc ∥ t1.T(C1) ∥ ... ∥ tn.T(Cn),

where T(C) simply translates some memory access instructions in C into the corresponding barriers, and leaves the rest unchanged.
    Then we reduce the correctness of the concurrent garbage collecting system to Correct(T), saying that any mutator program will not have unexpected behaviors when executed using this system.

7.2   A General Framework

We can use RGSim to prove Correct(T). By its compositionality, we can decompose the refinement proofs into proofs for the GC thread and each mutator thread.

Verifying the GC. The semantics of the abstract GC thread can be defined by a binary state predicate AbsGCStep:

                 (Σ, Σ′) ∈ AbsGCStep
        ─────────────────────────────────────
        (tgc.AbsGC, Σ) −→ (tgc.AbsGC, Σ′)

That is, the abstract GC thread always makes AbsGCStep to change the high-level state. We can choose different AbsGCStep for different GCs, but usually AbsGCStep guarantees not modifying reachable objects in the heap.
    Thus for the GC thread, we need to show that Cgc is simulated by AbsGC when executed in their environments. This can be reduced to unary rely-guarantee reasoning about Cgc by proving Rgc; Ggc ⊢ {pgc}Cgc{qgc} in a standard rely-guarantee logic with proper Rgc, Ggc, pgc and qgc, as long as Ggc is a concrete representation of AbsGCStep. The judgment says that, given an initial state satisfying the precondition pgc, if the environment's behaviors satisfy Rgc, then each step of Cgc satisfies Ggc, and the postcondition qgc holds at the end if Cgc terminates. In general, the collector never terminates, so we can let qgc be false. Ggc and pgc should be provided by the verifier, where pgc is general enough to be satisfied by any possible low-level initial state. Rgc encodes the possible behaviors of mutators, which can be derived, as we show below.

Verifying mutators. For a mutator thread, since T is syntax-directed on C, we can reduce the refinement problem for arbitrary mutators to refinement of each primitive instruction only, by the compositionality of RGSim. The proof needs proper rely/guarantee conditions. Let G(t.c) and G(t.T(c)) denote the guarantees of the source instruction c and the target code T(c) respectively. Then we can define the general guarantees for a mutator thread t:

        G(t) ≜ ⋃c G(t.c)
        G(t) ≜ ⋃c G(t.T(c))                                              (7.1)
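The reachability condition that AbsGCStep imposes can be made concrete with a small executable check (a Python sketch of our own, simplifying the state to a heap mapping locations to objects with pointer fields; the requirement that the mutator stores Π stay unchanged is left implicit since only the heap is modeled): a step is an abstract GC step iff every object reachable from the roots is unmodified, while unreachable cells may be reclaimed.

```python
def reachable(roots, heap):
    """Locations reachable in `heap` from the root pointers."""
    seen, stack = set(), [l for l in roots if l is not None]
    while stack:
        l = stack.pop()
        if l in seen or l not in heap:
            continue
        seen.add(l)
        # Each object is a dict; pointer fields hold locations or None.
        stack.extend(p for p in heap[l].values() if p is not None)
    return seen

def abs_gc_step(roots, heap, heap2):
    """(heap, heap2) is an AbsGCStep: all reachable objects are unchanged.
    Unreachable locations may be removed (reclaimed) or rewritten."""
    return all(heap2.get(l) == heap[l] for l in reachable(roots, heap))

# Root 1 points to 2; cell 3 is garbage.
heap = {1: {"next": 2}, 2: {"next": None}, 3: {"next": 1}}
assert abs_gc_step([1], heap, {1: {"next": 2}, 2: {"next": None}})  # reclaiming 3 is allowed
assert not abs_gc_step([1], heap, {1: {"next": None}})              # mutating a reachable cell is not
```

This is the specification the concrete guarantee Ggc must refine: each low-level GC step, viewed through the state relation, corresponds to zero or more such abstract steps.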
Its relies should include all the possible guarantees made by other threads, and the GC's abstract and concrete behaviors respectively:

        R(t) ≜ AbsGCStep ∪ ⋃_{t′≠t} G(t′)
        R(t) ≜ Ggc ∪ ⋃_{t′≠t} G(t′)                                      (7.2)

The Rgc used for GC verification can now be defined below:

        Rgc ≜ ⋃t G(t)                                                    (7.3)

    The refinement proof also needs definitions of the binary α, ζ and γ relations. The invariant α relates the low-level and the high-level states and needs to be preserved by each low-level step. In general, a high-level state Σ can be mapped to a low-level state σ by giving a concrete local store for the GC thread, adding additional structures in the heap (to record information for collection), renaming heap cells (for copying GCs), etc. For each mutator thread t, the relations ζ(t) and γ(t) need to hold at the beginning and the end of each basic transformation unit (every high-level primitive instruction in this case) respectively. We let γ(t) be the same as ζ(t) to support sequential compositions. We require InitRelT(ζ(t)) (see Figure 4), i.e., ζ(t) holds over the initial states. In addition, the target and the source boolean expressions should evaluate to the same value under related states, as required by the IF and WHILE rules in Figure 5.

        GoodT(ζ(t)) ≜ InitRelT(ζ(t)) ∧ ∀B. ζ(t) ⊆ (T(B) ⇔ B)             (7.4)

Theorem 7 (Verifying Concurrent Garbage Collecting Systems).
If there exist Rgc, Ggc, pgc, R(t), R(t), ζ(t) and α such that (7.1), (7.2), (7.3), (7.4) and the following hold:
1. (Verification of the GC code)
   Rgc; Ggc ⊢ {pgc}Cgc{false};
2. (Correctness of T on mutator instructions)
   ∀c. (t.T(c), R(t), G(t.T(c))) ⪯α;ζ(t)⋉ζ(t) (t.c, R(t), G(t.c));
3. (Side Conditions)
   Ggc ∘ α⁻¹ ⊆ α⁻¹ ∘ (AbsGCStep)∗;
   ∀σ, Σ. σ = T(Σ) ⟹ pgc σ;
then Correct(T).

    That is, to verify a concurrent garbage collecting system, we need to do the following:
 • Define the α and ζ(t) relations, and prove the correctness of T on high-level primitive instructions. Since T preserves the syntax of most instructions, it is often immediate to prove that the target instructions are simulated by their sources. But for instructions that are transformed into barriers, we need to verify that the barriers implement both the source instructions (by RGSim) and the interaction mechanism (shown in their guarantees).
 • Find proper Ggc and pgc, and verify the GC code by rely-guarantee reasoning. We require that the GC's guarantee Ggc not contain more behaviors than AbsGCStep (the first side condition), and that Cgc can start its execution from any state σ transformed from a high-level one (the second side condition).

7.3   Application: Boehm et al. Concurrent GC Algorithm

We illustrate the application of the framework (Theorem 7) by proving the correctness of a mostly-concurrent mark-sweep garbage collector proposed by Boehm et al. [7]. Variants of the algorithm have been used in practice (e.g., by IBM [2]). Due to the space limit, we only describe the proof sketch here. Details are presented in the companion technical report.

Overview of the GC algorithm. The top-level code of the GC thread is shown in Figure 8. In each collection cycle, after an initialization process, the GC enters the concurrent mark-phase (line 4) and traces the objects reachable from the roots (i.e., the mutators' local pointer variables that may contain references to the heap objects). A mark stack (mstk) is used to do a depth-first tracing. During the tracing, the connectivity between objects might be changed by the mutators, so a write barrier is required to notify the collector of those modified objects by dirtying the objects' tags (called cards). When the tracing is done, the GC suspends all the mutators and re-traces from the dirty objects that have been marked (called card-cleaning, lines 6 and 7). The stop-the-world phase is implemented by atomic{C}. Finally, all the reachable objects are ensured to be marked, and the GC performs the concurrent sweep-phase (line 8), in which unmarked objects are reclaimed. Usually in practice there is also a concurrent card-cleaning phase (line 5) before the stop-the-world card-cleaning, to reduce the pause time. The full GC code Cgc is given in the technical report.

       {wfstate}
     0 Collection() {
     1   local mstk : Seq(Int);
         Loop Invariant: {wfstate ∗ (own_np(mstk) ∧ mstk = ε)}
     2   while (true) {
     3     Initialize();
           {(wfstate ∧ reach_inv) ∗ (own_np(mstk) ∧ mstk = ε)}
     4     Trace();
           {(wfstate ∧ reach_inv) ∗ (own_np(mstk) ∧ mstk = ε)}
     5     CleanCard();
           {(wfstate ∧ reach_inv) ∗ (own_np(mstk) ∧ mstk = ε)}
           atomic{
     6       ScanRoot();
             {∃X. (wfstate ∧ reach_rtnw_stk(X) ∧ stk_black(X))
                  ∗ (own_np(mstk) ∧ mstk = X)}
     7       CleanCard();
           }
           {(wfstate ∧ reach_black) ∗ (own_np(mstk) ∧ mstk = ε)}
     8     Sweep();
         }
       }
       {false}

            Figure 8. Outline of the GC Code and Proof Sketch

    The write barrier is shown in Figure 9, where the dirty field is set after modifying the object's pointer field. Here we use a write-only auxiliary variable aux for each mutator thread to record the current object that the mutator is updating. The GC uses neither read barriers nor allocation barriers.

       update(x.id, E) {   // id ∈ {pt1, ..., ptm}
         atomic{ x.id := E; aux := x; }
         atomic{ x.dirty := 1; aux := 0; }
       }

            Figure 9. The Write Barrier for Boehm et al. GC

    We first present the high-level and low-level program state models in Figure 10. The behaviors of the high-level abstract GC thread are defined as follows:

        AbsGCStep ≜ {((Π, H), (Π, H′)) | ∀l. reachable(l)(Π, H) ⟹ H(l) = H′(l)},

saying that the mutator stores and the reachable objects in the heap remain unmodified. Here reachable(l)(Π, H) means the object at the location l is reachable in H from the roots in Π.

The transformation. The transformation T is defined as follows. For code, the high-level abstract GC thread is transformed into the GC thread shown in Figure 8. Each instruction x.id := E in mutators is transformed into the write barrier, where id is a pointer
field of x. Other instructions and the program structures of mutators are unchanged.

   (HStore)  S ∈ PVar ⇀ HVal              (HHeap)  H ∈ Loc ⇀ HObj
   (HThrds)  Π ∈ MutID → HStore           (HState) Σ ∈ HThrds × HHeap
   (LStore)  s ∈ PVar ⇀ LVal × {0, 1}     (LHeap)  h ∈ [1..M] ⇀ LObj
   (LThrds)  π ∈ ThrdID → LStore          (LState) σ ∈ LThrds × LHeap

            Figure 10. High-Level and Low-Level State Models

    The following transformations are made over initial states.
 • First, we require the high-level initial state to be well-formed:

        wfstate(Π, H) ≜ ∀l. reachable(l)(Π, H) ⟹ l ∈ dom(H).

   That is, reachable locations cannot be dangling pointers.
 • High-level locations are transformed to integers by a bijective function Loc2Int : Loc ↔ [0..M] satisfying Loc2Int(nil) = 0.
 • Variables are transformed to the low level using an extra bit to preserve the high-level type information (0 for non-pointers and 1 for pointers).
 • High-level objects are transformed to the low level by adding the color and dirty fields with initial values WHITE and 0 respectively. Other addresses in the low-level heap domain [1..M] are filled out using unallocated objects whose colors are BLUE and whose other fields are initialized to 0. Here we use BLACK and WHITE for marked and unmarked objects respectively, and BLUE for unallocated memory.
 • The concrete GC thread is given an initial store.
The formal definition of T is included in the technical report.
    To prove Correct(T) in our framework, we apply Theorem 7, prove the refinement between low-level and high-level mutators, and verify the GC code using a unary rely-guarantee-based logic.

Refinement proofs for mutator instructions. We first define the α and ζ(t) relations.

        α ≜ {((π ⊎ {tgc ↦ _}, h), (Π, H)) |
                ∀t ∈ dom(Π). store_map(π(t), Π(t))
                  ∧ heap_map(h, H) ∧ wfstate(Π, H)}.

In α, the relations between low-level and high-level stores and heaps are enforced by store_map and heap_map respectively. Their definitions reflect the state transformations we describe above, ignoring the values of those high-level-invisible structures. α also requires the well-formedness of high-level states.
    For each mutator thread t, the ζ(t) relation enforced at the beginning and the end of each transformation unit (each high-level instruction) is stronger than α. It requires that the value of the auxiliary variable aux (see Figure 9) be a null pointer (0p):

        ζ(t) ≜ α ∩ {((π, h), (Π, H)) | π(t)(aux) = 0p}.

    The refinement between the write barrier at the low level and the pointer update instruction at the high level is formulated as:

        (t.update(x.id, E), R(t), G_write_barrier(t)) ⪯α;ζ(t)⋉ζ(t) (t.(x.id := E), R(t), G_write_pt(t)),

where G_write_barrier(t) and G_write_pt(t) are the guarantees of the two-step write barrier and the high-level atomic write operation respectively. Since the transformation of other high-level instructions is the identity, the corresponding refinement proofs are simple.

Rely-guarantee reasoning of the GC code. The unary program logic we use to verify the GC thread is a standard rely-guarantee logic adapted to the target language. We describe states using separation logic assertions, as shown below:

        p, q ::= B | t.own_p(x) | t.own_np(x) | E1.id ↦ E2 | p ∗ q | ...

Following Parkinson et al. [20], we treat program variables as resources and use t.own_p(x) and t.own_np(x) for the thread t's ownership of pointers and non-pointers respectively. We omit the thread identifiers if these predicates hold for the current thread.
    We first give the precondition and the guarantee of the GC. The GC starts its execution from a low-level well-formed state, i.e., pgc ≜ wfstate. Corresponding to the high-level wfstate definition, the low-level wfstate predicate says none of the reachable objects are BLUE. We define Ggc as follows:

        Ggc ≜ {((π ⊎ {tgc ↦ s}, h), (π ⊎ {tgc ↦ s′}, h′))
                 | ∀n. reachable(n)(π, h)
                     ⟹ ⌊h(n)⌋ = ⌊h′(n)⌋
                         ∧ h(n).color ≠ BLUE ∧ h′(n).color ≠ BLUE}.

The GC guarantees not to modify the mutator stores. For any mutator-reachable object, the GC does not update the fields coming from the high-level mutator, nor does it reclaim the object. Here ⌊·⌋ lifts a low-level object to a new one that contains mutator data only.
    The proof sketch is given in Figure 8. One of the key invariants used in the proof is reach_inv, which says any WHITE reachable object can either be traced from a root object along a path on which every object is WHITE, or be reachable from a BLACK object whose pointer field was updated and whose dirty bit was set to 1. Since the proof is done in the unary logic, the details here are orthogonal to our simulation-based proof (but it is RGSim that allows us to derive Theorem 7, which then links proofs in the unary logic with relational proofs). We give the complete proofs in the companion technical report.

8.   Related Work and Conclusion

There is a large body of work on refinements and verification of program transformations. Here we only focus on the work most closely related to the typical applications discussed in this paper.

Verifying compilation and optimizations of concurrent programs. Compiler verification for concurrent programming languages dates back to work by Wand [28] and Gladstein et al. [12], which concerns functional languages using message-passing mechanisms. Recently, Lochbihler [18] presents a verified compiler for Java threads and proves semantics preservation by a weak bisimulation. He views every heap update as an observable move, and thus does not allow the target and the source to have different granularities of atomic updates. To achieve parallel compositionality, he requires the relation to be preserved by any transitions of shared states, i.e., the environment is assumed to be arbitrary. As we explained in Section 2, this is too strong a requirement in general for many transformations, including the examples in this paper.
    Burckhardt et al. [9] present a proof method for verifying concurrent program transformations on relaxed memory models. The method relies on a compositional trace-based denotational semantics, where the values of shared variables are always considered arbitrary at any program point. In other words, they also assume arbitrary environments.
    Following Leroy's CompCert project [17], Ševčík et al. [22] verify compilation from a C-like concurrent language to x86 by simulations. They focus on the correctness of a particular compiler, and two phases of their compiler have non-compositional proofs. Here we provide a general, compiler-independent, compositional proof technique to verify concurrent transformations.
    We apply RGSim to justify concurrent optimizations, following Benton [3], who presents a declarative set of rules for sequential
                                                                      11                                                                2011/10/8
optimizations. Also the proof rules of RGSim for sequential com-                  mostly concurrent garbage collector for servers. ACM Trans. Program.
positions, conditional statements and loops coincide with those in                Lang. Syst., 27(6):1097–1146, 2005.
relational Hoare logic [3] and relational separation logic [29].            [3]   N. Benton. Simple relational correctness proofs for static analyses and
                                                                                  program transformations. In Proc. POPL’04, pages 14–25, 2004.
Proving linearizability or atomicity of concurrent objects. Filipović et al. [11] show that linearizability can be characterized in terms of an observational refinement, where the latter is defined similarly to our Correct(T), but they give no proof method to verify the linearizability of fine-grained object implementations. Turon and Wand [24] propose a refinement-based proof method to verify concurrent objects. Their refinements are based on Brookes' fully abstract trace semantics [8], using rely conditions to specify permitted environments. They also prove the atomicity of a non-blocking counter and of Treiber's stack algorithm. In their work, both the fine-grained and the atomic versions of the object operations are expressed in the same language. Both methods focus on verifying concurrent objects, which is only a special case of transformation verification, and it is unclear whether they can be applied to other transformations, such as concurrent GCs. We propose RGSim for general verification of concurrent program transformations; verifying atomicity is just one of its applications.
   In his thesis [25], Vafeiadis proves linearizability of concurrent objects by introducing abstract objects and abstract atomic operations as auxiliary variables and code in RGSep logic. The refinement between the concrete implementation and the abstract version is implicitly embodied in the unary verification process, but is not spelled out in the meta-theory.

Verifying concurrent GCs. Vechev et al. [27] define transformations to generate concurrent GCs from an abstract collector. Afterwards, Pavlovic et al. [21] present refinements to derive concrete concurrent GCs from specifications. These methods focus on describing the behaviors of variants (or instantiations) of a correct abstract collector (or a specification) in a single framework, assuming all the mutator operations are atomic. By comparison, we provide a general correctness notion and a proof method for verifying concurrent GCs and their interactions with mutators (where the barriers can be fine-grained). Furthermore, the correctness of their transformations or refinements is expressed in a GC-oriented way (e.g., the target GC should mark no fewer objects than the source), which cannot be used to justify other transformations.
   Kapoor et al. [16] verify Dijkstra's GC using concurrent separation logic. To validate the GC specifications, they also verify a representative mutator in the same system. In contrast, we reduce the problem of verifying a concurrent GC to verifying a transformation, which ensures semantics preservation for all mutators. Our GC verification framework is inspired by McCreight et al. [19], who propose a framework for separate verification of stop-the-world and incremental GCs and their mutators, but their framework does not handle concurrency.

Conclusion and Future Work. We propose RGSim to verify concurrent program transformations. By describing the interference with environments explicitly, RGSim is compositional and can support many widely used transformations. We have applied RGSim to reason about optimizations, to prove atomicity of fine-grained concurrent algorithms, and to verify concurrent garbage collectors. In the future, we would like to further test its applicability with more applications, such as verifying STM implementations and compilers. It is also interesting to explore the possibility of building tools to automate the verification process.

References

 [1] M. Abadi and G. Plotkin. A model of cooperative threads. In Proc. POPL'09, pages 29–40, 2009.
 [2] K. Barabash, O. Ben-Yitzhak, I. Goft, E. K. Kolodner, V. Leikehman, Y. Ossia, A. Owshanko, and E. Petrank. A parallel, incremental, mostly concurrent garbage collector for servers. ACM Trans. Program. Lang. Syst., 27(6):1097–1146, 2005.
 [3] N. Benton. Simple relational correctness proofs for static analyses and program transformations. In Proc. POPL'04, pages 14–25, 2004.
 [4] N. Benton and C.-K. Hur. Biorthogonality, step-indexing and compiler correctness. In Proc. ICFP'09, pages 97–108, 2009.
 [5] H.-J. Boehm. Threads cannot be implemented as a library. In Proc. PLDI'05, pages 261–268, 2005.
 [6] H.-J. Boehm and S. V. Adve. Foundations of the C++ concurrency memory model. In Proc. PLDI'08, pages 68–78, 2008.
 [7] H.-J. Boehm, A. J. Demers, and S. Shenker. Mostly parallel garbage collection. In Proc. PLDI'91, pages 157–164, 1991.
 [8] S. D. Brookes. Full abstraction for a shared-variable parallel language. Inf. Comput., 127(2):145–163, 1996.
 [9] S. Burckhardt, M. Musuvathi, and V. Singh. Verifying local transformations on relaxed memory models. In Compiler Construction (CC'10), pages 104–123, 2010.
[10] D. Dice, O. Shalev, and N. Shavit. Transactional locking II. In Proc. DISC'06, pages 194–208, 2006.
[11] I. Filipović, P. O'Hearn, N. Rinetzky, and H. Yang. Abstraction for concurrent objects. In Proc. ESOP'09, 2009.
[12] D. S. Gladstein and M. Wand. Compiler correctness for concurrent languages. In Proc. COORDINATION'96, pages 231–248, 1996.
[13] M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann, Apr. 2008.
[14] C.-K. Hur and D. Dreyer. A Kripke logical relation between ML and assembly. In Proc. POPL'11, pages 133–146, 2011.
[15] C. B. Jones. Tentative steps toward a development method for interfering programs. ACM Trans. Program. Lang. Syst., 5(4):596–619, 1983.
[16] K. Kapoor, K. Lodaya, and U. Reddy. Fine-grained concurrency with separation logic. Technical report, March 2010. http://www.cs.
[17] X. Leroy. Formal verification of a realistic compiler. Commun. ACM, 52(7):107–115, 2009.
[18] A. Lochbihler. Verifying a compiler for Java threads. In Proc. ESOP'10, pages 427–447, 2010.
[19] A. McCreight, Z. Shao, C. Lin, and L. Li. A general framework for certifying garbage collectors and their mutators. In Proc. PLDI'07, pages 468–479, 2007.
[20] M. Parkinson, R. Bornat, and C. Calcagno. Variables as resource in Hoare logics. In Proc. LICS'06, pages 137–146, 2006.
[21] D. Pavlovic, P. Pepper, and D. R. Smith. Formal derivation of concurrent garbage collectors. In Proc. MPC'10, pages 353–376, 2010.
[22] J. Ševčík, V. Vafeiadis, F. Z. Nardelli, S. Jagannathan, and P. Sewell. Relaxed-memory concurrency and verified compilation. In Proc. POPL'11, pages 43–54, 2011.
[23] R. K. Treiber. System programming: coping with parallelism. Technical Report RJ 5118, IBM Almaden Research Center, 1986.
[24] A. Turon and M. Wand. A separation logic for refining concurrent objects. In Proc. POPL'11, pages 247–258, 2011.
[25] V. Vafeiadis. Modular fine-grained concurrency verification. Technical Report UCAM-CL-TR-726, University of Cambridge, Computer Laboratory, July 2008.
[26] V. Vafeiadis and M. J. Parkinson. A marriage of rely/guarantee and separation logic. In Proc. CONCUR'07, pages 256–271, 2007.
[27] M. T. Vechev, E. Yahav, and D. F. Bacon. Correctness-preserving derivation of concurrent garbage collection algorithms. In Proc. PLDI'06, pages 341–353, 2006.
[28] M. Wand. Compiler correctness for parallel languages. In Proc. FPCA'95, pages 120–134, 1995.
[29] H. Yang. Relational separation logic. Theoretical Computer Science, 375:308–334, 2007.
                                                                      12                                                                       2011/10/8
