					Formal Design and Verification
 Methods for Shared Memory
          Systems

         Ratan Nalumasu
       Dissertation Defense
       September 10, 1998
   Problems Facing Digital Design
  • Complexity
  • Longer design time
  • Shorter time to market




9/10/1998         Design Complexity   2
   Current Debugging Technology




      + Full model
      – Partial examination → No assurance
      – Weaker properties
      – Difficult correctness metrics
      – Full model

9/10/1998            Design Complexity       3
              Formal Methods
• Formal methods = Math based
  techniques
• Continuous math : Engineering =
  Discrete math : Digital system design

    “It is what the designers want. It’s just
    challenging to prove.”



9/10/1998            Introduction to FM         4
    Formal Methods based Design




      – Reduced model
      + Complete examination
      + Better assurances (on the reduced model)
      + Stronger property language
      + Better correctness metrics
      + Reduced model
9/10/1998            Introduction to FM            5
                FM Taxonomy
       • Manual verification techniques:
         Interactive theorem provers
       • Automatic verification techniques:
         Model checkers
       • Compilation techniques:
         Refinement rules




9/10/1998             Introduction to FM      6
       Interactive Theorem Provers
 + Can deal with infinite state systems
 – Extensive manual reasoning




 + Good for algorithm verification
            Proof of a compilation scheme
9/10/1998             Theorem Provers       7
                    Model Checking
       process p(x) {                   (G=0, p.L=0, ...)
         global G; local L;
                                             0
            while (...) {
              recv ...;                      1
              send ...;
            }                                2     3
       }
       process q(x,y) ...


9/10/1998                   Model Checking                  8
            Model Checking Strengths
• Automatic
• If a property fails, the model checker shows the
  error trace
     – Deadlock: how it is reached from the initial state
     – Assertion: how it is reached from the initial state
     – Starvation: A loop where no progress is
       made



9/10/1998             Model Checking             9
            Model Checking: Example
• Construct graph of the system, and check
  the property: Deadlock at (22)
             0   0                     00

                                  10        01
             1   1
                             20        11        02

             2   2                21        12

                                       22
• State Explosion
            Partial Order Reductions
9/10/1998            Model Checking                   10
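The example above can be reproduced with a small explicit-state search. This is a minimal sketch, assuming two processes that each step 0 -> 1 -> 2 and then stop; the names `successors` and `explore` are illustrative and not taken from any tool mentioned in the talk.

```python
from collections import deque

LAST = 2  # assumption: each process steps 0 -> 1 -> 2 and then stops


def successors(state):
    """Return every state reachable in one step (any one process advances)."""
    result = []
    for i, local in enumerate(state):
        if local < LAST:
            nxt = list(state)
            nxt[i] = local + 1
            result.append(tuple(nxt))
    return result


def explore(initial=(0, 0)):
    """Breadth-first search over the full interleaving graph."""
    visited = {initial}
    deadlocks = []
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        nexts = successors(state)
        if not nexts:
            deadlocks.append(state)  # no enabled transition: a deadlock state
        for nxt in nexts:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return visited, deadlocks


visited, deadlocks = explore()
print(len(visited), deadlocks)  # 9 [(2, 2)]
```

Adding a third process, `explore((0, 0, 0))`, grows the graph from 9 to 27 states, which is the state explosion the slide refers to.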
             Refinement Algorithms

• Need to verify only high-level protocols

• Domain-specific compilers can generate
  efficient implementations




            Refinement rules for DSM protocols

9/10/1998              Refinement Algorithms     11
      State of the art of Applied FM
     +      General purpose
     +      Widely applicable techniques
     –      Inefficient algorithms
     –      Inefficient “compilers”
     –      Do not help with domain specific
            concerns



9/10/1998                  Applied FM          12
               Thesis Statement
      Domain specific formal methods
        • Efficient verification techniques
        • Address domain specific concerns

                       CPU     CPU

            Domain:

                      Memory   Memory


9/10/1998                                     13
                  Overview
  • Introduction to formal verification
  → Shared memory systems
  • Contributions
  • Conclusions




9/10/1998                                 14
             Memory Bottleneck
   • Processor speed increases at 55% a
     year, while memory speed increases at
     7%
        – Caches
   • Tendency toward multiprocessors
        – Further imbalance → complex protocols
        – SMP systems
        – DSM systems

9/10/1998             Memory Bottleneck           15
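The two growth rates compound, so the processor/memory gap widens geometrically. A short sketch of the arithmetic, assuming the 55% and 7% figures hold as constant annual rates (the function name `speed_gap` is illustrative):

```python
CPU_GROWTH = 0.55  # processor speed growth per year, from the slide
MEM_GROWTH = 0.07  # memory speed growth per year, from the slide


def speed_gap(years):
    """Factor by which the CPU/memory speed ratio grows after `years` years."""
    return ((1 + CPU_GROWTH) / (1 + MEM_GROWTH)) ** years


for years in (1, 5, 10):
    print(years, round(speed_gap(years), 1))
```

Under these assumptions the ratio grows by roughly 45% each year and by about a factor of 40 over a decade.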
            Symmetric Multiprocessors

                 CPU      CPU             CPU
                  $        $               $




                       Memory

       Can scale up to tens of processors
       Modern caches have support for such SMP
            protocols

9/10/1998              SMP Architecture          16
            SMP Protocol Design
  • Bus protocols
        – Bus arbitration algorithm
        – Cache invalidation scheme
        – Lack of atomicity on the bus
  • Bus and CPU interaction
        – Does CPU have out-of-order execution?
        – Does bus allow out-of-order completion?
  • Are these decisions visible to software?

9/10/1998               SMP Protocols               17
        Distributed Shared Memory
              MEM          MEM               MEM

              NODE         NODE              NODE



                         Network


            Each node may be an SMP or a single CPU


9/10/1998                 DSM Architecture           18
            DSM Protocol Design
• Network port arbitration
• Coherency maintenance across the
  network
     – Maintaining distributed state
     – Little atomicity
     – “Ghost” messages
     – Transient states
• Are these decisions visible to software?

9/10/1998              DSM Protocols         19
      Shared Memory Correctness
    • Low level:
            – deadlock
            – forward progress
            – bus arbitration
    • Intermediate level:
            – at most one owner of a cache line at a
              time
    • High-level:
            – abstraction provided to the software
9/10/1998                Shared Memory Systems         20
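The intermediate-level property can be written as an invariant over the per-node state of one cache line. This is a minimal sketch; the state names "owned", "shared", and "invalid" are assumptions made for illustration and are not taken from the protocols in the talk.

```python
def at_most_one_owner(line_states):
    """Invariant: at most one node holds the cache line in the 'owned' state.

    `line_states` lists the (hypothetical) state of the line at each node.
    """
    return sum(1 for state in line_states if state == "owned") <= 1


print(at_most_one_owner(["invalid", "owned", "shared"]))  # True
print(at_most_one_owner(["owned", "invalid", "owned"]))   # False
```

A model checker would evaluate such a predicate in every reachable state rather than on hand-picked examples.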
  Abstraction Provided to Software
 Uniprocessor: reordering by the cache, compiler, or
 out-of-order execution is ok
                          P1                    P1
                          write(a,new)   ok →   read(b)
                          read(b)               write(a,new)
 Multiprocessor: the same reordering is not ok under S.C.
                          P1                    P2
                          write(a,new)          write(b,new)
                          read(b)               read(a)

                          P1                    P2
                          read(b)               read(a)
                          write(a,new)          write(b,new)
                  Test model checking
 9/10/1998                 Software Interface              21
                        Overview
• Introduction to formal verification
• Shared Memory systems
→ Contributions
     – mitigating state explosion
        • Partial order reduction algorithm
     – facilitating high-level design
            • Protocol synthesis algorithm
     – enhancing applicability
            • High-level correctness such as SC
• Conclusions
9/10/1998                                         22
                     Contributions
                           2              Test
                Protocol              Model checking
                   1                       2
                       PO algorithm

            3        Refinement rules
                               3
                  Efficient implementation

9/10/1998                  Contributions               23
       Contribution #1

Mitigating State Explosion Problem
     Partial Order Reductions
                Partial Order Reductions

                                 00
            0      0                                        00
                            10          01             10
            1      1   20        11          02
                                                  20
                                        12
            2      2        21                         21
                                 22
                                                            22

      If two transitions are independent, then
      explore one of them postponing the other

9/10/1998                    PO Reductions                       25
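The reduction on this slide can be sketched in a few lines. The sketch assumes, as in the example, that the two processes share no variables, so all of their transitions are independent; a real partial order algorithm must establish independence and must also deal with the ignoring problem. The helper names are illustrative.

```python
LAST = 2  # assumption: each process steps 0 -> 1 -> 2 and then stops


def enabled(state):
    """Indices of the processes that can still take a step."""
    return [i for i, local in enumerate(state) if local < LAST]


def step(state, i):
    """State after process i advances by one."""
    nxt = list(state)
    nxt[i] += 1
    return tuple(nxt)


def explore(initial, reduce):
    """Depth-first search; with reduce=True only one enabled transition is taken."""
    visited = {initial}
    stack = [initial]
    while stack:
        state = stack.pop()
        choices = enabled(state)
        if reduce:
            # All transitions are assumed independent, so explore one of
            # them and postpone the others.
            choices = choices[:1]
        for i in choices:
            nxt = step(state, i)
            if nxt not in visited:
                visited.add(nxt)
                stack.append(nxt)
    return visited


full = explore((0, 0), reduce=False)
reduced = explore((0, 0), reduce=True)
print(len(full), len(reduced))  # 9 5
```

Both searches reach the final state (2, 2); the reduced one visits only the path 00, 10, 20, 21, 22.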
              Ignoring Problem
        Select some transitions, and postpone
        others, but do not postpone forever


                   S0       Postponed



                   S1
                            Postponed


9/10/1998               PO Reductions           26
            Proviso based Solution
       The solutions of Godefroid, Valmari, Holzmann,
       and Peled are very similar: a proviso
        – Expands the “last” state of the loop
          completely
                     S0       Postponed



                     S1
                               Expand

9/10/1998                 PO Reductions          27
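The ignoring problem and a stack-based proviso can be illustrated on a toy system. This is a hedged sketch, not any of the cited authors' algorithms: process P is assumed to loop 0 -> 1 -> 0 forever, process Q takes a single step, and the naive reduction always prefers P. With the proviso, a state whose chosen transition closes a loop on the search stack is expanded completely.

```python
def successors(state):
    """Labelled moves of a toy system (an assumption made for illustration)."""
    p, q = state
    moves = [("P", ((p + 1) % 2, q))]   # P loops forever: 0 -> 1 -> 0
    if q == 0:
        moves.append(("Q", (p, 1)))     # Q takes one step and stops
    return moves


def explore(initial, use_proviso):
    """Depth-first search with a naive reduction that always prefers P."""
    visited, on_stack = set(), []

    def dfs(state):
        visited.add(state)
        on_stack.append(state)
        moves = successors(state)
        chosen = [m for m in moves if m[0] == "P"] or moves
        if use_proviso and any(nxt in on_stack for _, nxt in chosen):
            chosen = moves              # loop closed: expand this state fully
        for _, nxt in chosen:
            if nxt not in visited:
                dfs(nxt)
        on_stack.pop()

    dfs(initial)
    return visited


print(sorted(explore((0, 0), use_proviso=False)))  # Q is postponed forever
print(sorted(explore((0, 0), use_proviso=True)))   # Q's step is explored
```

Without the proviso only (0, 0) and (1, 0) are visited, so any property that depends on Q's step is missed; with it, all four states are reached.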
              Problem with Proviso
                      0                           0

                  1   P 2                     1   Q 2

                      10      11             01       21
             00               12
                      02                     22       20

            Q postponed
                           ALL 9 states
9/10/1998                    PO Reductions                 28
            Our Algorithm: 2-phase
                  0                            0

              1   P 2                      1   Q 2

                             00


                  10    20            01       02

                       Only 5 states

9/10/1998                 PO Reductions              29
            Performance Comparison
                             States       Time
              Mig (Spin)     113,628      13.6
              Mig (2 PV)     9,185        1.7
              Inv (Spin)     > 620,446    DNF
              Inv (2 PV)     135,404      21.2

             [Bar chart comparing SPIN and PV on SC2, SC3, SC4, Pftp,
              and Snpy; vertical axis from 0 to 20,000; annotation “(20x)”]

9/10/1998                         PO Reductions                        30
    Contribution #2

Facilitating High-level Design
    Protocol Refinement
                Protocol Refinement
    • PO reductions not sufficient, theorem
      provers ruled out
    • Compile from high-level protocol
      specification
            – easier to design
            – easier to verify
            – can generate efficient implementation
              using domain knowledge


9/10/1998                Refinement Algorithms        32
            Unexpected Messages

               Send a                 recv ack
      P
               req to Q               from Q



             Some request              ???
                                Always nack →
                                 no forward progress
                                Always silence →
                                 deadlock


9/10/1998            Refinement Algorithms             33
             Refinement Procedures
     • Debug the high-level specification:
       Synchronous communication with no
       transient states
     • An automatic refinement procedure
       transforms it into a detailed
       implementation
            – No need to verify the implementation
            – Needs domain specific knowledge for
              efficiency

9/10/1998               Refinement Algorithms        34
                  Related Work
  • Buckley & Silberschatz, 83
        – For OS environments, not fit for
          hardware
  • Gribomont, 90
        – Protocols where synchronous
          messages can be simply replaced by
          asynchronous messages



9/10/1998             Refinement Algorithms    35
            Related Work (contd)
  • Teapot, 96 for DSM systems (Chandra)
        – Protocol programming language
        – “Suspend” construct for transient states
        – Not high-level: Suspend states still
          specify what to do in a transient state




9/10/1998             Refinement Algorithms          36
            Context: DSM Protocols
              MEM     MEM                    MEM


             NODE    NODE                NODE


                      Network

       One protocol per cache line
       1 home, n “remote” nodes per line
       Home is responsible for maintaining
            consistency (Hub)
9/10/1998            Refinement Algorithms         37
               Refinement Rules
            Req                                Req



            Ack or                             Ack or
            Nack                               Nack


   Home              Remote              Home      Remote


9/10/1998              Refinement Algorithms                38
              Refinement Rules (2)

            Req1                    Req2


                                              Req1 is
                                              ignored by
                   Ack or                     both
                   Nack                       processes

               Home    Remote

9/10/1998             Refinement Algorithms                39
                    Debugging Effort
              Protocol  N  Low-level      High-level spec
              Mig       2  54             23,164/2.8
                        4  235/0.4
                        8  965/0.5
              Inv       2  546/0.6        193,389/20.6
                        4  18,686/18.4

            Protocol compilation scheme has been
            proved using a theorem prover

9/10/1998                 Refinement Algorithms                 40
      Contribution #3

     Enhancing Applicability
Shared Memory Model Verification
        Relaxing Instruction Orders

                    P1                         P2
                    write(a,new)               write(b,new)
                    read(b)                    read(a)

                    P1                         P2
            Under   read(b)                    read(a)
            SC      write(a,new)               write(b,new)



9/10/1998                Test Model Checking                  42
  Verification of HW/SW Interface
                CPU         CPU               CPU
                 $           $                 $




                       Memory                   Test model
                                                checking
        SC:
              The result can be explained by
              some interleaving of the instructions.

9/10/1998               Test Model Checking                  43
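The SC definition above can be checked by brute force on the two-processor example from the earlier slides. This is a minimal sketch, assuming both locations initially hold the value "old"; the function names are illustrative.

```python
def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that preserves each program's order."""
    if not p1:
        yield list(p2)
        return
    if not p2:
        yield list(p1)
        return
    for rest in interleavings(p1[1:], p2):
        yield [p1[0]] + rest
    for rest in interleavings(p1, p2[1:]):
        yield [p2[0]] + rest


def run(trace):
    """Execute a trace on a single serial memory and return the two read results."""
    memory = {"a": "old", "b": "old"}   # assumed initial contents
    reads = {}
    for proc, op, addr in trace:
        if op == "write":
            memory[addr] = "new"
        else:
            reads[(proc, addr)] = memory[addr]
    return reads[("P1", "b")], reads[("P2", "a")]


P1 = [("P1", "write", "a"), ("P1", "read", "b")]
P2 = [("P2", "write", "b"), ("P2", "read", "a")]

sc_outcomes = {run(trace) for trace in interleavings(P1, P2)}
print(("old", "old") in sc_outcomes)  # False
```

No interleaving lets both reads return "old", so a machine that produces that result (for example by moving each read ahead of its write) is not sequentially consistent.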
  Current Verification Techniques
• Simulation
     – Must study lengthy executions
     – Must choose non-trivial programs
• Formal techniques (next slide)




9/10/1998           Test Model Checking   44
              Related Work
  • Graf’s Lazy caching in ACTL*
  • Gibbons’ approach → run programs and
    check if the results are SC
  • McMillan’s thesis → data abstraction for
    a test
  • Hojati → data abstraction in a different
    context
  • Undecidability result by Alur et al.

9/10/1998         Test Model Checking          45
      ACTL* for (stronger than) SC
• AG(enabled(read(a,d))) → avail(a,d)
• AG(avail(a,d) AND EF(enabled(read(a,d)))) →
  A[NOT avail(a,d) W AG NOT avail(a,d)]
• ...
• init → AG[after(write(a,d)) →
        A(NOT enabled(read(a,d)) W avail(a,d))]
  Such MODEL DEPENDENT SPECS do not
  fit in an iterative industrial frame


9/10/1998         Test Model Checking            46
                Test Model Checking
   • Adaptation of simulation to model
     checking
            – model checking (full coverage) +
              testing (“black box approach”)
   • Tests are independent of the model
     being verified → manual effort is
     considerably reduced
            – Test model-checking can be used early
              in the design cycle

9/10/1998                 Test Model Checking         47
                         Results
    • Defined a shared memory description
      language
            – “data is not used for control decisions”
            – “addresses are symmetric”
            – Can specify HP’s Runway/PA, ...
    • Model checking technique
            – “Small number of addresses is
              sufficient”
    • Application to runway/PA using PV
9/10/1998                 Test Model Checking            48
            Read Order, Write Order
      If P1 executes two write instructions, then
      P2 sees them in the program order of P1
             P1           P2
            A := 1      X1 := A
            A := 2      X2 := A
            A := 3      X3 := A                X(i+1) ≥ X(i)
             ....         ....
            A := k      Xk := A
                     Many deficiencies
9/10/1998                Test Model Checking                   49
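The check performed by this test can be sketched as follows, assuming that P1 writes increasing values and that the initial value of A is smaller than every written value, so that P2's successive reads must never go backwards. The function name is illustrative.

```python
def violates_order(reads):
    """True if some read returns an older value than an earlier read did."""
    return any(later < earlier for earlier, later in zip(reads, reads[1:]))


print(violates_order([0, 1, 1, 3]))  # False: values never go backwards
print(violates_order([1, 2, 1]))     # True: the third read sees an older write
```

Repeated values are allowed because P2 may read the same write more than once.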
              Deficiencies of the Test
     • Finite k
            – What if an error occurs for a really large
              k?
     • Location “A” is never written by P2
            – What if an error occurs when the
              ownership changes?
     • Only 1-address
            – The definitions of RO and WO are not
              restricted to a single address at a time
            – How many addresses to consider?

9/10/1998                  Test Model Checking             50
                     Unbounded k
     Data abstraction + non-determinism

   wr(0)
                        rd(0)
             wr(1)                           rd(1)


                         rd(1)                rd(0)
            wr(1)
                               Non-deterministic
                               change
9/10/1998              Test Model Checking            51
              Ownership Changes

     wr(0)
     or rd(-)               rd(0)
                             or wr(2)        rd(1)
            wr(1)


                             rd(1)            rd(0)
            wr(1)            or wr(2)
            or rd(-)

                Complete 1-address test

9/10/1998              Test Model Checking            52
             2-address (RO, WO) test
            rd(-) OR wr(0)              rd(-) OR wr(2)
    rd(A,-) OR rd(B,-) OR    rd(A,-) OR rd(B,-) OR
    wr(A,0) OR wr(B,0)    wr(A,2) OR wr(B,2)



                  wr(A,1)           rd(B,1)
                  wr(1)              rd(1)
                                                   rd(0)
                                                   rd(A,0)
        rd(A,-) OR wr(A,1) OR
        rd(B,-) OR wr(B,1)
            rd(-) OR wr(1)
9/10/1998                    Test Model Checking             53
            2-address (RO, WO) test

    rd(A,-) OR rd(B,-) OR    rd(A,-) OR rd(B,-) OR
    wr(A,0) OR wr(B,0)    wr(A,2) OR wr(B,2)



               wr(A,1)          rd(B,1)



                                               rd(A,0)
        rd(A,-) OR wr(A,1) OR
        rd(B,-) OR wr(B,1)

9/10/1998                Test Model Checking             54
       Complete Test for (RO, WO)
    • Theorem: A system implements (RO,
      WO) if and only if it has no errors on
      all 1- and 2-address programs
    • Complete 1-address and 2-address
      tests




9/10/1998          Test Model Checking         55
              Program Order
• PO generalizes RO and WO to include
  orderings between a read followed by a
  write, and a write followed by a read
            rd(A)
                    RO
            rd(B)
                    RW                     PO
            wr(A)
                    WR
            rd(B)

9/10/1998            Test Model Checking        56
               Write Atomicity
• All processors agree on the order of writes
     – WO imposes the order only if the writes are
       from the same program
            wr(A,0)


                             wr(B,1)

                  SC is (PO, WA)

9/10/1998             Test Model Checking        57
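Agreement on the order of writes can be illustrated with a pairwise check over what each processor observed. This is a sketch under stated assumptions: each write is a distinct label, each observer reports the writes in the order it saw them, and the check is only a necessary condition when observers see different subsets of the writes. The function name and data layout are hypothetical.

```python
from itertools import combinations


def agree_on_write_order(observed):
    """Check that every pair of observers orders their common writes the same way.

    `observed` maps a processor name to the list of writes in the order seen.
    """
    for (_, seq1), (_, seq2) in combinations(observed.items(), 2):
        pos1 = {write: i for i, write in enumerate(seq1)}
        pos2 = {write: i for i, write in enumerate(seq2)}
        common = [write for write in seq1 if write in pos2]
        for w1, w2 in combinations(common, 2):
            if (pos1[w1] < pos1[w2]) != (pos2[w1] < pos2[w2]):
                return False
    return True


same = {"P2": ["wr(A,0)", "wr(B,1)"], "P3": ["wr(A,0)", "wr(B,1)"]}
differ = {"P2": ["wr(A,0)", "wr(B,1)"], "P3": ["wr(B,1)", "wr(A,0)"]}
print(agree_on_write_order(same), agree_on_write_order(differ))  # True False
```

The second case is the kind of behavior WO alone does not rule out when the two writes come from different programs.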
               1-address SC test
             P0                     P1
            A := 0               A := 3
            rd(A)                rd(A)      ORDER:
                                             1, 4
            A := 1 Barrier A := 4             OR
                                             4, 1
            A := 2               A := 5
            rd(A)                rd(A)



9/10/1998             Test Model Checking            58
            Complete Tests for SC
   • Theorem: A system with N processors
     implements SC if and only if it has no
     errors on n-address programs, n < N
   • Scheme for N processors
        – N barriers
        – Data written before, at, and after barrier
          are different
            • data 0, 1, 2 for P0, and data 3, 4, 5 for
              P1
9/10/1998                Test Model Checking              59
                  Case Studies
  • Serial memory (operational semantics of
    SC)
  • Lazy caching
  • Runway/PA system model
       – Bus based design
       – An aggressive split transaction protocol
       – Out-of-order completion of transactions on
         Runway for high-performance
       – In-order completion of instructions in PA for
         sequential consistency

9/10/1998               Test Model Checking              60
            Test Model checking of
                 HP/Runway
                   Spin          PV
            PO-1   56K           2412
            PO-2   > 5M/DNF      285K
            SC-1   499K          7880
            SC-2a  > 5M/DNF      5.9M
            SC-2b  > 4M/DNF      574K

9/10/1998            Test Model Checking          61
                 Conclusion
    Showed that specializing formal methods
    for a particular domain (shared memory)
    leads to efficient verification techniques
    for the domain, and increases the
    applicability of the formal methods
     – Two phase algorithm
     – Refinement procedure
     – Memory model verification


9/10/1998                                        62
                 Future Work
• Model checking algorithms
     – better partial order algorithms
     – tune for test model checking
• Protocol synthesis
     – More optimizations
• Test model checking
     – Weaker memory models, other objects
     – Application to other fields


9/10/1998                                    63

				