```
Parallel External Directed Model Checking with Linear I/O
Shahid Jabbar
Stefan Edelkamp

Computer Science Department
University of Dortmund, Dortmund, Germany

Model Checking
   Given:
   A model of a system.
   A specification property.

   Model Checking Problem: Does the system satisfy the property?

   Method: An exhaustive exploration of the state space to search for a state that does not satisfy the property.

   Problem: How to cope with large state spaces that do not fit into the main memory?

Directed Model Checking
(Edelkamp, Leue, Lluch-Lafuente, 2004)

   A guided search in the state space.
   Usually guided by some heuristic estimate.
   Only promising states are explored.
   Proved to be complete under certain conditions.
   Shorter error trails.
   Better for human comprehension.

   Problem: The inevitable demand of the model: space, space, and space.

Possible Solution
   Use virtual memory.
   Assume a bigger address space divided into pages.
   Pages are saved on the hard disk and moved back to the main memory whenever they are “called” (page faults).
   Pages are mapped to physical locations within the main memory, and the desired content is returned from the main memory location.

Problem with the Virtual Memory
[Figure: virtual space (addresses up to 0xFFF…FFF) and memory pages]

External Memory Model (Aggarwal and Vitter)
   If the input size is very large, running time depends on the I/Os rather than on the number of instructions.
   Input of size N on disk, with N >> M.
   M: size of the main memory; B: size of a block transferred between disk and main memory.
   Scan(N) = O(N / B)
   Sort(N) = O((N/B) log_{M/B} (N/B))
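
A worked example of the two bounds, ignoring constant factors. The values are assumptions chosen for easy arithmetic and are not from the talk: N = 2^30 items on disk, M = 2^20 items of main memory, B = 2^10 items per block.
   Scan(N) = N / B = 2^30 / 2^10 = 2^20 I/Os
   Sort(N) = (N/B) log_{M/B} (N/B) = 2^20 · log_{2^10} (2^20) = 2^20 · 2 = 2^21 I/Os
Under these assumptions sorting costs only twice as many I/Os as scanning, because the base of the logarithm is M/B.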

External BFS (Munagala &
   I: Remove duplicates by sorting the nodes according to the indices and doing a scan and compaction phase.
   II: Subtract layers t and t+1 from t+2.
   [Figure: duplicates' removal across the BFS layers t, t+1, and t+2]
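
The two steps can be sketched in Python as follows. This is a minimal in-memory illustration, not the external algorithm itself: the function name next_layer and the list-based layers are assumptions, and a real implementation would sort and scan files on disk.

    def next_layer(successors, layer_t, layer_t1):
        # Step I: sort the generated nodes, then scan once and
        # compact each run of equal nodes into a single copy.
        ordered = sorted(successors)
        unique = [n for i, n in enumerate(ordered)
                  if i == 0 or n != ordered[i - 1]]
        # Step II: subtract the two preceding layers t and t+1.
        seen = set(layer_t) | set(layer_t1)
        return [n for n in unique if n not in seen]

For example, next_layer(["X", "Y", "X", "A"], ["A"], ["B", "C"]) keeps only X and Y: the second X is compacted away in step I, and A is subtracted in step II.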

A* Algorithm
a.k.a. goal-directed Dijkstra
   A heuristic estimate is used to guide the search.
   E.g., straight-line distance from the current node to the goal in case of a graph with a geometric layout.
   Reweighting: w’(u,v) = w(u,v) – h(u) + h(v)

   Problems:
   A* needs to store all the states during exploration.
   A* generates a large number of duplicates, which can be removed using an internal hash table only if it fits in the main memory.
   A* does not exhibit any locality of expansion. For large state spaces, standard virtual memory management can result in excessive page faults.
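
A small worked check of the edge reweighting, assuming a unit-cost edge (u,v) and a consistent heuristic, i.e. h(u) <= w(u,v) + h(v). The numbers are illustrative and not from the talk.
   w(u,v) = 1, h(u) = 3, h(v) = 2  =>  w’(u,v) = 1 - 3 + 2 = 0
   w(u,v) = 1, h(u) = 3, h(v) = 3  =>  w’(u,v) = 1 - 3 + 3 = 1
   w(u,v) = 1, h(u) = 3, h(v) = 4  =>  w’(u,v) = 1 - 3 + 4 = 2
In all three cases the reweighted cost is non-negative, which is the condition Dijkstra's algorithm needs in order to run on w’.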

Bringing Locality …
   Implicit, unweighted, undirected graphs.
   Consistent heuristic estimates.
   => ∆h ∈ {-1, 0, 1}
   [Figure: grid of buckets indexed by g (rows 0 to 5) and h (columns 0 to 6)]

External A* [Edelkamp, Jabbar, and Schroedl,
2004]

   Buckets represent temporal locality: a cache-efficient order of expansion.

   If we store the states in the same bucket together, we can exploit the spatial locality.

   External A*: … and Korf’s delayed duplicate detection for implicit graphs.
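
A bucket-wise order of expansion could look like the Python sketch below. It rests on assumptions that the slides do not spell out: buckets are visited by increasing f = g + h, with increasing g inside one f-value, and edges have unit cost, so a consistent heuristic changes h by at most 1 per step. The function names are hypothetical.

    def bucket_order(max_f):
        # Yield (g, h) bucket indices diagonal by diagonal:
        # increasing f = g + h, and increasing g within one f.
        for f in range(max_f + 1):
            for g in range(f + 1):
                yield (g, f - g)

    def successor_buckets(g, h):
        # Successors of bucket (g, h) land in depth g + 1 with h
        # changed by -1, 0, or +1; negative h values are skipped.
        return [(g + 1, h + d) for d in (-1, 0, 1) if h + d >= 0]

Under these assumptions, the successors of a bucket never fall into a bucket that has already been expanded on an earlier f-diagonal, so every bucket is read once as a whole.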

External Search For Model Checking
[Jabbar and Edelkamp VMCAI – 05]
+ Uses the hard disk to store the state space, divided into buckets.
+ Implemented on top of the SPIN model checker.
+ Promising: the largest exploration so far took ~20 GB, much larger than even the address limits of most computers.
+ Pause and resume support: more hard disks can be added.
Problems:
-  Slow duplicate detection phase
-  Internal processing time >> external I/O time

Solution
Distribute the internal work across multiple processors.

Distributed Directed Model Checking
Observations:

   Since the states in a bucket are independent of each other, they can be expanded in parallel.

   Duplicate removal can be distributed over different processors.

   Bulk (streamed) transfers between processors are much better than single transfers.

Distributed Queue
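   Queue entries: <g, h, start byte, size>
   [Figure: queue with a TOP pointer holding the entries <15,34, 20, 100>, <15,34, 0, 100>, and <15,34, 40, 100>, shown alongside processors P0, P1, and P2]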
Delayed Duplicate Detection
   Each state can appear several times in a bucket.
   A bucket has to be searched completely for the duplicates.
   [Figure: processors P0 to P3 writing sorted buffers into single files; label: GOAL]

Problem: Concurrent Writes !!!!

Multiple Processors - Multiple Disks variant
   Each processor (P1 to P4) writes buffers sorted w.r.t. the hash value => sorted files.
   Divide w.r.t. the hash ranges: h0 ….. hk-1, hk ….. hl-1.
   Sorted buffers from every processor for one hash range => one sorted file.
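
The hash-range scheme can be sketched in Python as below. This is an in-memory illustration under assumptions: states are represented only by their integer hash values, boundaries lists the exclusive upper bound of each hash range (the last bound must exceed every hash value), and both function names are hypothetical.

    import heapq

    def split_by_hash_range(sorted_buffer, boundaries):
        # Cut one processor's sorted buffer into one part per hash range.
        parts = [[] for _ in boundaries]
        i = 0
        for hash_value in sorted_buffer:
            while hash_value >= boundaries[i]:
                i += 1
            parts[i].append(hash_value)
        return parts

    def merge_range(parts_from_all_processors):
        # Merge the sorted parts of one hash range and drop duplicates.
        merged = []
        for hash_value in heapq.merge(*parts_from_all_processors):
            if not merged or merged[-1] != hash_value:
                merged.append(hash_value)
        return merged

For example, split_by_hash_range([1, 4, 7, 9], [5, 10]) gives [[1, 4], [7, 9]], and merge_range([[1, 4], [2, 4]]) gives [1, 2, 4]. If each hash range is merged by a single processor, no two processors need to write the same file.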

I/O Complexity
External memory algorithms are evaluated on the number of I/Os.
Expansion:
   Linear in I/O => O(Scan(V))
Delayed Duplicate Detection:
   Phase I: Ω(Scan(E)) if the operating system provides enough file pointers, else Ω(Sort(E)).
   Phase II: Subtracting previous levels: k · O(Scan(E)), where k is bounded by the size of the largest cycle in the combined automata.

Comparison with other approaches
   Delayed transfer.
   Bulk transfer is much better than individual transfers over a network.
   External memory provides the space for large state spaces.

Multiple Processors Machine (Space Consumption = 2.1 Gigabytes)

Processors  Measure    Time taken   Time taken   Time taken   Speed-up
                       on Proc. 1   on Proc. 2   on Proc. 3
1           Real-time  25m 59s
1           CPU-time   18m 20s
2           Real-time  17m 30s      17m 29s                   1.48
2           CPU-time   9m 49s       9m 44s                    1.89
3           Real-time  15m 55s      16m 6s       15m 58s      1.64
3           CPU-time   7m 32s       7m 28s       7m 22s       2.44

Multiple Processors Machine (Space Consumption = 5.2 Gigabytes)

Processors  Measure    Time taken   Time taken   Time taken   Speed-up
                       on Proc. 1   on Proc. 2   on Proc. 3
1           Real-time  73m 10s
1           CPU-time   52m 50s
2           Real-time  41m 42s      41m 38s                   1.75
2           CPU-time   25m 56s      25m 49s                   2.04
3           Real-time  37m 24s      34m 27s      37m 20s      2.12
3           CPU-time   18m 8s       18m 11s      18m 20s      2.91

Multiple Processors Machine (Space Consumption = 20 Gigabytes)

Processors  Measure    Time taken   Time taken   Time taken   Speed-up
                       on Proc. 1   on Proc. 2   on Proc. 3
1           Real-time  269m 9s
1           CPU-time   186m 12s
2           Real-time  165m 25s     165m 25s                  1.62
2           CPU-time   91m 10s      90m 32s                   2.04
3           Real-time  151m 6s      151m 3s      151m 5s      1.78
3           CPU-time   63m 12s      63m 35s      63m 59s      2.93

Multiple Processors Machine (Space Consumption = 4.3 Gigabytes)

Processors  Measure    Time on      Time on      Time on      Speed-up
                       Proc. 1      Proc. 2      Proc. 3
1           Real-time  55m 53s
1           CPU-time   43m 26s
2           Real-time  31m 43s      31m 36s                   1.76
2           CPU-time   22m 46s      22m 58s                   1.89
3           Real-time  23m 32s      23m 17s      23m 10s      2.41
3           CPU-time   15m 20s      14m 24s      14m 25s      3.01

Workstations connected via NFS (Space Consumption = 4.3 Gigabytes)

Processors  Measure    Time taken   Time taken   Speed-up
                       on Proc. 1   on Proc. 2
1           Real-time  76m 33s
1           CPU-time   26m 37s
2           Real-time  54m 20s      54m 6s       1.41
2           CPU-time   14m 11s      14m 12s      1.87

Workstations connected via NFS (Space Consumption = 5.2 Gigabytes)

Processors  Measure    Time taken   Time taken   Speed-up
                       on Proc. 1   on Proc. 2
1           Real-time  100m 27s
1           CPU-time   31m 6s
2           Real-time  76m 38s      76m 39s      1.3
2           CPU-time   15m 52s      15m 31s      1.96

Summary
   The state space explosion problem can be circumvented by Directed External Model Checking.

   Time turns out to be a bottleneck:
   => not for the external I/O but for the expansion.

   Internal work is divided among multiple processors.

   Delayed transfer of state sets => low network cost.

   Implemented on top of IO-HSF-SPIN: the SPIN model checker with external heuristic search.

   Significant speed-up.

External Directed LTL Model Checking – under review
   Schuppan and Biere approach => liveness as reachability.
   Liveness requires searching for an acceptance cycle:
      a path to a previously seen state that also visits an accepting state.
   Save a tuple of states.
   Two new heuristics to accelerate the search.
Shahid Jabbar (Dortmund)   External Parallel Directed Model Checking   26
External Directed LTL Model Checking – under review
[Figure: labels 0 to 4; annotations: “Arrives at the final state”, “Same states in both parts”]

Arrives