					HAVOC: A precise and scalable
 verifier for systems software
         Shaz Qadeer
      Microsoft Research
                Collaborators
• Researchers
  – Jeremy Condit, Shuvendu Lahiri
• Interns
  – Shaunak Chatterjee, Brian Hackett, Zvonimir
    Rakamaric, Ian Wehrman, Thomas Wies
                         HAVOC
• Modular verifier for C programs
    –   Verifies each procedure separately
    –   Requires contracts: preconditions, postconditions,
        modifies clauses, loop invariants
• Features
    –   Accurate heap model
    –   Expressive annotation language
    –   Efficient checking using SMT solvers
•   Precise and efficient reasoning for loop-free and
    call-free code
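
To make the four kinds of contract concrete, here is a minimal sketch in plain C. HAVOC's own annotation syntax is not shown on this slide, so the contracts are written as comments and as runtime assert() checks, which are only a stand-in for static verification. The function name fill and its contract are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical example: set a[0..n-1] to v.
 *   requires: a != NULL || n == 0
 *   modifies: a[0..n-1] only
 *   ensures:  a[i] == v for every i < n
 */
void fill(int *a, size_t n, int v)
{
    assert(a != NULL || n == 0);              /* precondition */

    for (size_t i = 0; i < n; i++) {
        /* loop invariant: a[j] == v for every j < i */
        a[i] = v;
    }

    for (size_t i = 0; i < n; i++) {
        assert(a[i] == v);                    /* postcondition, checked at run time */
    }
}
```

A modular verifier would check the body of fill once against this contract, and check each caller against the contract alone rather than against the body.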
                 Annotated C program
                   → Visual C Front End
                   → Control flow graph
                   → CtoBoogiePL  (input: Memory model)
                   → Boogie program
                   → Boogie VCGenerator
                   → Verification condition
                   → Z3 SMT solver
                   → Verified / Warning
           Challenges for HAVOC
• Concise and precise expression of non-aliasing
  and disjointness of heap values
• Properties of unbounded collections
  – Lists, Arrays, …
• Enable such reasoning for low-level software
  – pointer arithmetic
  – interior pointers
  – nested structures and unions
  –…
  But will programmers ever write contracts?

• In some cases, they might
  – security properties: thousands of buffer
    annotations in Windows code
  – maintenance of critical legacy code: the Windows
    NT file system


• Automatic annotation inference
  – precise and efficient checking of annotated
    programs is a crucial first step
                 Roadmap
• Novel features of the specification language

• Dealing with low-level features of C

• Concluding remarks
[muh: Internet Relay Chat (IRC) bouncer]

log_list.head → LinkNode ⇄ LinkNode ⇄ … ⇄ LinkNode ← log_list.tail

        LinkNode fields: next, prev, data
        data → struct _logentry with fields:
                channel_name → char *
                file_name → char *
                logtype

                                LinkNode *iter = log_list.head;
                                while (iter != NULL) {
                                   struct _logentry *entry = iter->data;
                                   free (entry->channel_name);
                                   free (entry->file_name);
                                   free (entry);
                                   entry = NULL;
                                   iter = iter->next;
                                }

        Goal: ensure absence of double free


Data structure invariant

      For every node x in the list between log_list.head and null:
             x->data is a unique pointer, and
             x->data->channel_name is a unique pointer, and
             x->data->file_name is a unique pointer.

      "For every node x": universal quantification
      "in the list between log_list.head and null": reachability predicate
      Limitations of SMT solvers
• No support for precise reasoning with
  reachability predicate
  – Incompleteness in Floyd-Hoare proofs for straight
    line code
• Brittle support for quantifiers
  – Complexity: NP-complete (ground) → undecidable
  – Leads to unpredictable behavior of verifiers
     • Proof times, proof success rate
  – Requires user ingenuity to craft axioms/invariants
    with quantifiers
               Contribution
• Expressive and efficient logic for precise
  reasoning about reachability, unique pointers,
  and restricted quantification
• A decision procedure for the logic built over
  an SMT solver
  Simple Java-like memory model
• Heap consists of a set of objects (obj)
• Each field "f" is a mutable map
  – f: obj → obj
  – g: obj → int
  – h: obj → bool
• The sort obj may be refined into a collection of
  sorts
Reachability predicate: Btwn_f

    x → node → … → y        (each node has fields next, prev, data)

           Btwn_next(x, y)
           Btwn_prev(y, x)
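
As an informal illustration of the predicate, the sketch below tests membership in Btwn_next(x, y) by walking the list. It assumes one particular reading: y must be reachable from x along next, and the set is every node on that path with both endpoints included. The Node type, the function name, and the max_steps bound are hypothetical and are not taken from HAVOC.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Node {
    struct Node *next;
    struct Node *prev;
    void *data;
} Node;

/*
 * Returns true iff t lies on the next-path from x to y (endpoints included).
 * Returns false when y is not reached from x within max_steps steps, so a
 * cycle that does not contain y cannot make the walk run forever.
 * Passing y == NULL asks about a NULL-terminated list.
 */
bool in_btwn_next(const Node *x, const Node *y, const Node *t, size_t max_steps)
{
    bool seen_t = false;
    const Node *cur = x;

    for (size_t i = 0; i <= max_steps; i++) {
        if (cur == t)
            seen_t = true;
        if (cur == y)
            return seen_t;        /* reached the end of the segment */
        if (cur == NULL)
            return false;         /* fell off the list before reaching y */
        cur = cur->next;
    }
    return false;                 /* step bound exhausted */
}
```

Under this reading NULL itself belongs to Btwn_next(x, NULL), which is why invariants over a whole list are usually stated over the set with null removed.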
      Inverse of a function: f^-1

    x → node → … → y        (each node has fields next, prev, data)
    x->data and y->data both point to w

                     data^-1(w) = {x, y}
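
The inverse can be pictured as a scan over all objects that collects those whose field points at a given target. The sketch below does this for the data field over an array of nodes standing in for the heap; the Node type, the array-as-heap assumption, and the function name are hypothetical.

```c
#include <stddef.h>

typedef struct Node {
    struct Node *next;
    struct Node *prev;
    void *data;
} Node;

/*
 * Computes data^-1(w) restricted to nodes[0..n-1]: every node whose data
 * field equals w is stored in out (which must have room for n entries).
 * Returns the number of nodes found.
 */
size_t data_inverse(Node *nodes, size_t n, const void *w, Node **out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].data == w)
            out[count++] = &nodes[i];
    }
    return count;
}
```

A result of size one for data^-1(x->data) is what "x->data is a unique pointer" means in this setting.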
                               LinkNode *iter = log_list.head;
                               while (iter != NULL) {
                                  struct _logentry *entry = iter->data;
                                  free (entry->channel_name);
                                  free (entry->file_name);
                                  free (entry);
                                  entry = NULL;
                                  iter = iter->next;
                               }



Data structure invariant

    For every node x in the list between log_list.head and null:
           x->data is a unique pointer, and
           ….

    ∀x ∈ Btwn_f(log_list.head, null) \ {null}.
          data^-1(data(x)) = {x}
          ….
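
A runtime analogue of this invariant can be sketched in C. The checker below only covers the x->data clause and only compares nodes on the same list, whereas the formula ranges over the whole heap. The struct layouts, the int type of logtype, and both function names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct _logentry {
    char *channel_name;
    char *file_name;
    int logtype;                 /* assumed type; not given on the slide */
};

typedef struct LinkNode {
    struct LinkNode *next;
    struct LinkNode *prev;
    struct _logentry *data;
} LinkNode;

/* True iff no two nodes on the next-path from head share a non-NULL data pointer. */
bool data_is_unique(const LinkNode *head)
{
    for (const LinkNode *x = head; x != NULL; x = x->next)
        for (const LinkNode *y = x->next; y != NULL; y = y->next)
            if (x->data != NULL && x->data == y->data)
                return false;
    return true;
}

/* Frees every entry only if the invariant holds; otherwise frees nothing. */
bool free_entries(LinkNode *head)
{
    if (!data_is_unique(head))
        return false;            /* freeing would risk a double free */

    for (LinkNode *iter = head; iter != NULL; iter = iter->next) {
        struct _logentry *entry = iter->data;
        if (entry != NULL) {
            free(entry->channel_name);
            free(entry->file_name);
            free(entry);
        }
        iter->data = NULL;
    }
    return true;
}
```

The static proof replaces this quadratic runtime check with a fact established once, at the loop boundary.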
                Expressive logic
• Express properties of collections
  ∀x ∈ Btwn_f(f(hd), hd). state(x) = LOCKED     //cyclic


• Arithmetic reasoning on data (e.g. sortedness)
  ∀x ∈ Btwn_f(hd, null) \ {null}.
       ∀y ∈ Btwn_f(x, null) \ {null}. d(x) ≤ d(y)
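
The sortedness formula can be read directly as a pair of nested loops. The sketch below mirrors the two quantifiers one for one; the Cell type and the function name are hypothetical, with next playing the role of f and the d field playing the role of d(x).

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Cell {
    struct Cell *next;
    int d;
} Cell;

/*
 * Checks: for every x on the list from hd, and every y on the list from x
 * (including x itself), x->d <= y->d.
 */
bool is_sorted(const Cell *hd)
{
    for (const Cell *x = hd; x != NULL; x = x->next)
        for (const Cell *y = x; y != NULL; y = y->next)
            if (x->d > y->d)
                return false;
    return true;
}
```

Comparing only adjacent cells would give the same answer; the quadratic form is kept here because it matches the shape of the formula.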
                       Precise
• Given the Floyd-Hoare triple X = {P} S {Q}
  – P and Q are expressed in our logic
  – S is a loop-free call-free program
• We can construct a formula Y in our logic
  – Y is linear in the size of X
  – X is valid iff Y is valid

        Need annotations/abstractions
        only at procedure/loop boundaries
                    Efficient
• Decision problem is NP-complete
  – Can’t expect any better with propositional logic!
  – Retains the complexity of current SMT logics
• Provide a decision procedure for the logic on
  top of state-of-the-art Z3 SMT solver
  – Leverages powerful ground-theory reasoning
    (arithmetic, arrays, uninterpreted functions…)
                Ground Logic
t ∈ Term ::= c | x | t1 + t2 | t1 - t2 | f(t)
G ∈ GFormula ::= t = t' | t < t' |
                    t ∈ Btwn_f(t1, t2) | ¬G

S ∈ Set       ::=     f^-1(t) | Btwn_f(t1, t2)
F ∈ Formula ::=      G | F1 ∧ F2 | F1 ∨ F2 |
                      ∀x ∈ S. F
      Ground decision procedure
• Provide a set of 10 rewrite rules for Btwnf
   – Sound, complete and terminating
• E.g. Transitivity3

   t1 ∈ Btwn_f(t0, t2)     t ∈ Btwn_f(t0, t1)
   -------------------------------------------
   t ∈ Btwn_f(t0, t2),  t1 ∈ Btwn_f(t, t2)
                      Logic
t ∈ Term ::= c | x | t1 + t2 | t1 - t2 | f(t)
G ∈ GFormula ::= t = t' | t < t' |
                    t ∈ Btwn_f(t1, t2) | ¬G

S ∈ Set       ::=     f^-1(t) | Btwn_f(t1, t2)
F ∈ Formula ::=      G | F1 ∧ F2 | F1 ∨ F2 |
                      ∀x ∈ S. F

     Bounded quantification over interpreted sets
      Lazy quantifier instantiation
• Instantiation rule
             t ∈ S     ∀x ∈ S. F
             ---------------------
                    F[t/x]

• Lazy instantiation
   – Instantiate only when a term t belongs to the set S
   – Substantially reduces the number of terms to instantiate a
     quantified fact
• Terminates if ∀x ∈ S. F is sort-restricted
   – sort(x) is less than sort(t[x]) for any term t[x] in F
                  Experience
• Compared with an earlier implementation
  – Unrestricted quantifiers, incomplete
    axiomatization of reachability, no f-1
  – Small to medium sized benchmarks
• Greatly improved the predictability of HAVOC
  – Reduced runtimes (2X – 100X)
  – Eliminated the need for carefully crafted axioms and
    invariants
  – Can handle newer examples
                 Roadmap
• Novel features of the specification language

• Dealing with low-level features of C

• Concluding remarks
struct list {
  list *next;
  list *prev;
};

struct record {
  int data1;
  list node;
  int data2;
};

Each record holds data1, next, prev, data2; the list links the embedded node fields.
p points to the node field (next, prev) inside a record.
q points to the start of that record.

    q = CONTAINER(p, record, node)
      = (record *) ((int *) p - (int) (&(((record *)0)->node)))
      = (record *) ((int *) p - 1)
void init_all_records(list *p) {
  while (p != NULL) {
    init_record(p);
    p = p->next;
  }
}

void init_record(list *p) {
  record *r = CONTAINER(p, record, node);
  r->data2 = 42;
}


• Type safety requires nontrivial reasoning
     • the container of every element in list has type record*
• Use of memory model with field abstraction is
  unsound
• Field abstraction is crucial to all property checkers
     • &a->data1 is not aliased to &b->data2
     • init_all_records(p) preserves the assertion a->data1 == 0
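
The container idiom and the two procedures can be written as compilable C as follows. This is a sketch: the CONTAINER macro here is defined with the standard offsetof macro and byte arithmetic, which is an assumption about how such a macro is usually written rather than the definition used in the talk, and the typedefs are added so the slide's shorthand type names compile.

```c
#include <stddef.h>

typedef struct list {
    struct list *next;
    struct list *prev;
} list;

typedef struct record {
    int data1;
    list node;
    int data2;
} record;

/* Recover a pointer to the enclosing struct from a pointer to one of its fields. */
#define CONTAINER(p, type, field) \
    ((type *)((char *)(p) - offsetof(type, field)))

void init_record(list *p)
{
    /* Only safe if p really points at the node field of a record. */
    record *r = CONTAINER(p, record, node);
    r->data2 = 42;
}

void init_all_records(list *p)
{
    while (p != NULL) {
        init_record(p);
        p = p->next;
    }
}
```

Nothing in the C type system stops a caller from passing a list that is not embedded in a record, in which case the write to r->data2 lands in unrelated memory. That is the gap the type invariant is meant to close.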
Unify type checking and property
            checking
• Harness the power of constraint solvers to
  enhance type checking
  – type safety often depends on program-specific
    invariants
• Harness the strong guarantees provided by
  the type invariant to enhance property
  checking
  – non-aliasing, field abstraction
Mem: int → int     (mutable)
Type: int → type   (immutable)

    Mem maps each address (…, 99, 100, 101, 102, …) to an integer value.
    Type maps each address to a type:
        Int, List, Record, Ptr(Int), Ptr(List), Ptr(Record)

Type invariant: ∀a:int. HasType(Mem(a), Type(a))
void init_record(list *p) {
  record *r = CONTAINER(p, record, node);
  r->data2 = 42;
}

struct list { list *next; list *prev; };
struct record { int data1; list node; int data2; };

requires ∀a:int. HasType(Mem(a), Type(a))
requires HasType(p, Ptr(List))
ensures ∀a:int. HasType(Mem(a), Type(a))
void init_record(int p) {
  var r:int;
  r := p-1;
  assert HasType(r, Ptr(Record));
  Mem(r+3) := 42;
  assert ∀a:int. HasType(Mem(a), Type(a));
}
struct list { list *next; list *prev; };
struct record { int data1; list node; int data2; };

HasType(v, Int)    ⇔ true
HasType(v, Ptr(t)) ⇔ v = 0 ∨ (v > 0 ∧ Match(v, t))

Match(a, Int)      ⇔ Type(a) = Int
Match(a, Ptr(t))   ⇔ Type(a) = Ptr(t)
Match(a, List)     ⇔ Match(a, Ptr(List)) ∧ Match(a+1, Ptr(List))
Match(a, Record)   ⇔ Match(a, Int) ∧ Match(a+1, List) ∧ Match(a+3, Int)
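
The HasType and Match definitions can be executed over a toy word-addressed heap, which makes it easier to see when the assert in the translated init_record succeeds. The sketch below is an assumption-laden simulation: the heap size, the enum encoding of types, and the choice to return false instead of failing an assert are all hypothetical, and HasType is only defined on the scalar types Int and Ptr(t).

```c
#include <stdbool.h>

#define MEM_SIZE 32              /* assumed size of the toy heap */

typedef enum { T_INT, T_LIST, T_RECORD, T_PTR_INT, T_PTR_LIST, T_PTR_RECORD } Ty;

int Mem[MEM_SIZE];               /* Mem: address -> value, mutable      */
Ty  Type[MEM_SIZE];              /* Type: address -> type, fixed layout */

bool Match(int a, Ty t)
{
    switch (t) {
    case T_LIST:                 /* two consecutive Ptr(List) words */
        return Match(a, T_PTR_LIST) && Match(a + 1, T_PTR_LIST);
    case T_RECORD:               /* Int, then a List, then Int */
        return Match(a, T_INT) && Match(a + 1, T_LIST) && Match(a + 3, T_INT);
    default:                     /* Int and Ptr(t): compare with Type(a) */
        return a >= 0 && a < MEM_SIZE && Type[a] == t;
    }
}

bool HasType(int v, Ty t)
{
    switch (t) {
    case T_INT:        return true;
    case T_PTR_INT:    return v == 0 || (v > 0 && Match(v, T_INT));
    case T_PTR_LIST:   return v == 0 || (v > 0 && Match(v, T_LIST));
    case T_PTR_RECORD: return v == 0 || (v > 0 && Match(v, T_RECORD));
    default:           return false;   /* values never have struct type here */
    }
}

/* The type invariant: every word holds a value of that word's type. */
bool type_invariant(void)
{
    for (int a = 0; a < MEM_SIZE; a++)
        if (!HasType(Mem[a], Type[a]))
            return false;
    return true;
}

/* Mirrors the translated init_record; returns false where an assert would fail. */
bool init_record(int p)
{
    int r = p - 1;
    if (r <= 0 || !HasType(r, T_PTR_RECORD))   /* r > 0 also guards the write below */
        return false;
    Mem[r + 3] = 42;
    return type_invariant();
}
```

With a record laid out at addresses 10 to 13, init_record(11) succeeds; with a bare list whose preceding word is not an Int, the same call is rejected, matching the failed assert in the translation.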
void init_record(list *p) {
  record *r = CONTAINER(p, record, node);
  r->data2 = 42;
}

struct list { list *next; list *prev; };
struct record { int data1; list node; int data2; };

requires HasType(p-1, Ptr(Record)) ∧ p - 1 ≠ 0
requires ∀a:int. HasType(Mem(a), Type(a))
requires HasType(p, Ptr(List))
ensures ∀a:int. HasType(Mem(a), Type(a))
void init_record(int p) {
  var r:int;
  r := p-1;
  assert HasType(r, Ptr(Record));
  Mem(r+3) := 42;
  assert ∀a:int. HasType(Mem(a), Type(a));
}
struct list { list *next; list *prev; };
struct record { int data1; list node; int data2; };

HasType(v, Int)    ⇔ true
HasType(v, Data1)  ⇔ true
HasType(v, Data2)  ⇔ true
HasType(v, Ptr(t)) ⇔ v = 0 ∨ (v > 0 ∧ Match(v, t))

Match(a, Int)      ⇔ Type(a) = Int
Match(a, Data1)    ⇔ Type(a) = Data1
Match(a, Data2)    ⇔ Type(a) = Data2
Match(a, Ptr(t))   ⇔ Type(a) = Ptr(t)
Match(a, List)     ⇔ Match(a, Ptr(List)) ∧ Match(a+1, Ptr(List))
Match(a, Record)   ⇔ Match(a, Data1) ∧ Match(a+1, List) ∧ Match(a+3, Data2)
               Other highlights
• Decision procedure for type safety
  – suffices to instantiate the type invariant and the
    definitions of Match and HasType on a few terms
• Extensions
  – unions
  – function pointers
  – parametric polymorphism
  – user-defined types
  – sub-word accesses (char, short)
                   Experience
• Property checking on small benchmarks
  – list-manipulation: insertion, removal, multiple lists
    each with a different container type
  – sorting: bubble sort, merge sort, quick sort
  – intuitive and concise annotations
• Type checking of four WDK drivers
  – cancel, event, kbfiltr, vserial
  – ~1 min to check each driver
  – ~5KLOC, ~225 annotations
                 Roadmap
• Novel features of the specification language

• Dealing with low-level features of C

• Concluding remarks
  Other case studies with HAVOC
• Synchronization protocols protecting critical data
  structures in the NT file system (Brian Hackett)
   – ~300KLOC, 1500 procedures
   – reference count usage, lock usage, data races, teardown
     races
   – 45 confirmed bugs (out of 125 warnings)
   – most bugs fixed
• Spin lock usage in Windows device drivers (Juan
  Pablo Galeotti, Thomas Wies)
   – flpydisk, kbdclass, daytona, serial (~50KLOC)
          HAVOC is available
• Download:
  – http://research.microsoft.com/projects/HAVOC
            Future directions
• Unified decision procedure for reachability,
  inverse, arrays, and types for the low-level
  memory model

• Exploiting type invariant for property checking
  on device drivers

• Annotation inference
Questions

				