Benchmark and Framework for Encouraging
Research on Multi-Threaded Testing Tools
Klaus Havelund
Scott D. Stoller
Shmuel Ur
IBM Labs in Haifa, Software and Verification Technology
Goal of this Work – Foster Research in Multi-Threaded Testing
Create a benchmark of programs with documented bugs that will help in evaluating testing technologies
Create a set of tools and interfaces such that the implementation of a new idea may be based on existing technology
Create a community of tool makers for multi-threaded testing and debugging
Agenda
Multi-threaded testing is different
Existing dedicated testing technologies
Formal verification
Static analysis
Noise makers
Race detection
Trace analysis
Replay
State space exploration
Cloning
Interactions between the technologies
Proposed benchmark content
How will we get there?
Distinguishing Factors for Concurrent Testing
The set of possible interleavings is huge
The bugs are intermittent (now you see me, now you don’t)
Usually the schedulers are deterministic
Running the test many times will not help
Bugs are very hard to fix
Hard to recreate
Debugging changes the timing
Current testing technologies are geared to sequential code
Inspection
Coverage
Result – Bugs are found very late or by the user
Formal Verification and Static analysis
Analyzing the code and looking for bugs
Model Checkers usually verify invariants
Detecting risky programming practices
Constructing models for verification
Static analyzers detect the relevant parts of the program, but tend to be either too cautious or too optimistic
Formal verification is limited by the state explosion problem
Lots of research – practical impact only in limited domains
Tools: FeaVer, Bandera, SLAM, BLAST, TVLA, Canvas
Papers in many conferences
Noise Makers
Cause different interleavings by adding random delays
Try to make the application behave in an unpredictable way
Do not report to the user – no false warnings
Benefit from multiple executions of tests
Do not guess what the correct result is
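The noise-making idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' tool: `noise`, `racy_increment`, and `run_test` are hypothetical names, and a random sleep stands in for whatever delay primitive a real noise maker would instrument into the program.

```python
import random
import threading
import time

counter = 0  # shared variable, deliberately unprotected


def noise(rng, probability=0.5, max_delay=0.002):
    """Hypothetical noise call: sometimes sleep for a short random time."""
    if rng.random() < probability:
        time.sleep(rng.uniform(0, max_delay))


def racy_increment(rng, iterations):
    """Read-modify-write without a lock; noise widens the race window."""
    global counter
    for _ in range(iterations):
        value = counter      # read
        noise(rng)           # a delay here may let another thread run
        counter = value + 1  # write back, possibly overwriting an update


def run_test(seed, threads=4, iterations=25):
    """Run the racy test once and return the final counter value."""
    global counter
    counter = 0
    workers = [
        threading.Thread(target=racy_increment,
                         args=(random.Random(seed + i), iterations))
        for i in range(threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter


if __name__ == "__main__":
    expected = 4 * 25
    results = [run_test(seed) for seed in range(5)]
    # Any value below `expected` means an update was lost to the race.
    print("expected", expected, "observed", results)
```

The sketch does not judge the outcome itself. It only perturbs the schedule, so the test's own checks decide whether a run failed, which is why repeated executions pay off.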
Race Detectors
A race is a pair of unsynchronized accesses to the same variable from two different threads, at least one of which is a write
Race detectors can work on-line or off-line
On-line detection lowers performance
Off-line detection requires large traces
Not all information reported is correct
The race could be intentional
The synchronization may be subtle or user implemented
A lot of research in this domain, mainly on improving performance and
accuracy
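The definition of a race above can be turned into a small off-line check over a recorded trace. The sketch below is an assumption-laden simplification: the `Access` record and `find_races` are hypothetical, and "synchronized" is approximated as "both accesses hold a common lock", which ignores other ordering such as thread start and join.

```python
from collections import namedtuple
from itertools import combinations

# One recorded access: which thread touched which variable, how,
# and which locks it held at that moment.
Access = namedtuple("Access", "thread var kind locks")


def find_races(trace):
    """Return pairs of accesses that match the definition of a race."""
    races = []
    for a, b in combinations(trace, 2):
        same_var = a.var == b.var
        different_threads = a.thread != b.thread
        has_write = "write" in (a.kind, b.kind)
        # Simplification: "synchronized" means sharing at least one lock.
        unsynchronized = not (set(a.locks) & set(b.locks))
        if same_var and different_threads and has_write and unsynchronized:
            races.append((a, b))
    return races


trace = [
    Access("T1", "balance", "write", ("m",)),
    Access("T2", "balance", "read", ("m",)),   # protected by the same lock
    Access("T1", "hits", "write", ()),
    Access("T2", "hits", "write", ()),         # no common lock: reported
    Access("T1", "config", "read", ()),
    Access("T2", "config", "read", ()),        # two reads never race
]

for a, b in find_races(trace):
    print(f"possible race on {a.var}: {a.thread} {a.kind} / {b.thread} {b.kind}")
```

Because the check only looks at locks, it would also flag accesses that are ordered by subtle or user-implemented synchronization, which is one source of the incorrect reports mentioned above.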
Trace Analyzers
Post mortem of the execution
Can check if properties hold
Can look for races
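As one illustration of checking a property post mortem, the sketch below scans a finished trace for lock-discipline violations. The event format `(thread, action, lock)` and the function name are assumptions made for this example, not a format defined by the framework.

```python
def check_lock_discipline(trace):
    """Scan a finished trace of (thread, action, lock) events for violations."""
    owner = {}        # lock -> thread currently holding it
    violations = []
    for index, (thread, action, lock) in enumerate(trace):
        if action == "acquire":
            if lock in owner:
                violations.append(
                    (index, f"{lock} acquired while held by {owner[lock]}"))
            owner[lock] = thread
        elif action == "release":
            if owner.get(lock) != thread:
                violations.append(
                    (index, f"{lock} released by non-owner {thread}"))
            else:
                del owner[lock]
    # Anything still owned when the trace ends was never released.
    for lock, thread in owner.items():
        violations.append(
            (len(trace), f"{lock} still held by {thread} at end of trace"))
    return violations


example_trace = [
    ("T1", "acquire", "a"),
    ("T1", "release", "a"),
    ("T2", "acquire", "b"),
    ("T1", "release", "b"),   # T1 does not hold b
]

for position, message in check_lock_discipline(example_trace):
    print(position, message)
```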
Replay
Once a bug is found you want to be able to replay the test
Accurate replay is hard
All OS interaction has to be captured
All application randomness (time, random, hash) captured
All input captured
Interleaving captured
Seed replay is much easier
Repeat the randomness caused by the application
Replay tool may be at the source, bytecode or JVM level
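A sketch of the seed-replay idea, under a strong assumption: the scheduler is simulated, so every interleaving decision comes from one seeded generator. With real threads the seed would only reproduce the injected randomness, not the operating system's scheduling. The function `run_once` and its two simulated threads are hypothetical.

```python
import random


def run_once(seed, steps_per_thread=3):
    """Simulate two threads incrementing a shared counter non-atomically.

    Every scheduling decision comes from one seeded generator, so the
    seed alone is enough to reproduce the run.
    """
    rng = random.Random(seed)
    shared = 0
    local = {"A": None, "B": None}      # value each thread has read, if any
    remaining = {"A": steps_per_thread, "B": steps_per_thread}
    schedule = []
    while any(remaining.values()):
        thread = rng.choice([t for t in ("A", "B") if remaining[t]])
        schedule.append(thread)
        if local[thread] is None:
            local[thread] = shared       # first half: read
        else:
            shared = local[thread] + 1   # second half: write back
            local[thread] = None
            remaining[thread] -= 1
    return shared, "".join(schedule)


if __name__ == "__main__":
    # Search seeds until a run loses an update, then replay that seed.
    for seed in range(1000):
        final, schedule = run_once(seed)
        if final != 6:
            print("seed", seed, "fails with schedule", schedule, "->", final)
            assert run_once(seed) == (final, schedule)  # same seed, same run
            break
```

Storing only the seed is far cheaper than capturing every input, OS interaction, and interleaving, which is the trade-off the slide describes.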
State Space Exploration
Integrates automatic test generation, execution & evaluation
Run random test
Expected results not known
Generate random inputs and random timing (or biased random)
Try to create
Deadlocks
Exceptions
Violations of user-defined assertions
Try to explore all program states
Tools: Verisoft, CMC, JPF
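For a tiny model, exploring all program states amounts to enumerating every interleaving and checking an assertion in each final state. The sketch below does this for two threads that each perform a non-atomic increment (a read step followed by a write step). It is an illustrative toy, not how Verisoft, CMC, or JPF are implemented; all names are hypothetical.

```python
from itertools import combinations


def interleavings(n_a, n_b):
    """Yield every order in which A's n_a steps and B's n_b steps can run."""
    total = n_a + n_b
    for a_slots in combinations(range(total), n_a):
        yield ["A" if i in a_slots else "B" for i in range(total)]


def execute(schedule):
    """Run one schedule of two non-atomic increments (read, then write)."""
    shared = 0
    local = {}
    pc = {"A": 0, "B": 0}               # program counter per thread
    for thread in schedule:
        if pc[thread] == 0:
            local[thread] = shared       # step 0: read the shared value
        else:
            shared = local[thread] + 1   # step 1: write it back incremented
        pc[thread] += 1
    return shared


def explore():
    """Check the assertion `shared == 2` in every reachable final state."""
    failures = []
    for schedule in interleavings(2, 2):
        if execute(schedule) != 2:
            failures.append("".join(schedule))
    return failures


print("schedules violating the assertion:", explore())
```

Of the six possible schedules, only the two that run one thread to completion before the other satisfy the assertion. Real programs have far more steps, which is where the state explosion comes from.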
Cloning
Run the “same” test multiple times in parallel
Load testing, the most widely used testing technology, is based on this observation: LoadRunner, Robot…
Contention is likely
Usually limited expected results are used
Need to distinguish between clones
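A minimal sketch of cloning, assuming a hypothetical `Inventory` class as the system under test. Each clone runs the same test with its own id so that its requests can be told apart, and the check is a limited expected result: how many purchases succeed, not which ones.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Inventory:
    """Hypothetical system under test: a stock counter guarded by a lock."""

    def __init__(self, stock):
        self.stock = stock
        self.lock = threading.Lock()
        self.sold_to = []

    def buy(self, customer):
        with self.lock:
            if self.stock == 0:
                return False
            self.stock -= 1
            self.sold_to.append(customer)
            return True


def one_test(inventory, clone_id):
    """The 'same' test; the clone id keeps each copy's requests apart."""
    return clone_id, inventory.buy(f"customer-{clone_id}")


def run_clones(clones=20, stock=5):
    """Run `clones` copies of the test in parallel against one inventory."""
    inventory = Inventory(stock)
    with ThreadPoolExecutor(max_workers=clones) as pool:
        results = list(pool.map(lambda i: one_test(inventory, i),
                                range(clones)))
    return inventory, results


if __name__ == "__main__":
    inventory, results = run_clones()
    winners = [clone_id for clone_id, ok in results if ok]
    # Which clones win depends on scheduling; only the count is predictable.
    print(len(winners), "purchases succeeded, stock left:", inventory.stock)
```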
Benefits of Information Sharing
Formal Verification
Is this really a race? (from a race detector, static analyzer)
Create a list of suspicious variables (to noise maker, instrumentor)
Race detectors
List of detected races (to formal tool, model checker)
List of suspicious variables (from static analyzer, formal, noise)
Coverage
List of shared variables (from static analyzer, formal)
State not yet covered (to state space exploration, noise, formal)
Opportunity – All the tools can benefit from sharing information
Problem – Too many APIs to create
Interactions between Technologies – Partial Solution
[Diagram. Group labels: Static, Dynamic. Component labels: Debugging, Static Analysis, Formal Verification, On-line Race Detection, Replay, Noise Making, State Space Exploration, Observation Database, Off-line Race Detection, Coverage, Trace Evaluation, Cloning, Performance Monitoring, Instrumentation Engine]
Instrumentation is a Key Enabling Technology
Instrumentation is the process of automatic modification of the code by
adding user exits
An example: increasing a counter at every statement yields coverage information
Instrumentation is used by all the dynamic and trace evaluation
technologies
Inputs to instrumentation
What to instrument
Parts of code (e.g. files, classes, methods, lines)
Subset of the variables
Types of instrumentation (e.g. specific kinds of bytecodes)
Which instrumentation to put at that point
Different instrumentation types – source, bytecode, class loading
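The counter example can be sketched as source-level instrumentation using Python's standard `ast` module. This is an illustration only: `_hit` plays the role of the user exit, `AddCounters` is a hypothetical name, and exception handlers are left uninstrumented for brevity. A bytecode or class-loading instrumentor would insert the same kind of call at a different level.

```python
import ast
from collections import Counter

hits = Counter()  # line number -> times executed


def _hit(lineno):
    """User exit called before each instrumented statement."""
    hits[lineno] += 1


class AddCounters(ast.NodeTransformer):
    """Insert a `_hit(<line>)` call before every statement in every body."""

    def _instrument(self, body):
        new_body = []
        for stmt in body:
            call = ast.Expr(ast.Call(
                func=ast.Name(id="_hit", ctx=ast.Load()),
                args=[ast.Constant(stmt.lineno)],
                keywords=[]))
            new_body.extend([call, self.visit(stmt)])
        return new_body

    def generic_visit(self, node):
        for name in ("body", "orelse", "finalbody"):
            block = getattr(node, name, None)
            if isinstance(block, list) and block and isinstance(block[0], ast.stmt):
                setattr(node, name, self._instrument(block))
        return node


def run_instrumented(source):
    """Instrument the source text, then execute the modified program."""
    tree = AddCounters().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    exec(compile(tree, "<instrumented>", "exec"), {"_hit": _hit})


SOURCE = """\
def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"

classify(1)
classify(2)
"""

if __name__ == "__main__":
    run_instrumented(SOURCE)
    # Line 3 never appears: the "negative" branch was not covered.
    print(sorted(hits.items()))
```

Restricting `_instrument` to chosen files, methods, or variables would correspond to the "what to instrument" input listed above.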
Observation Database
Interesting variables – may be involved in races
Possible race locations
Unimportant locations
Only one thread may be alive at that time
Suspicious areas
Interleaving coverage information
Traces of executions
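One way to picture the observation database is as a shared store that tools write to and read from, instead of each tool needing an API to every other tool. The class, the fact kinds, and the example entries such as `Account.balance` below are all hypothetical; the actual schema is not given in these slides.

```python
from collections import defaultdict


class ObservationDatabase:
    """Hypothetical shared store; tools talk to it instead of to each other."""

    def __init__(self):
        self._facts = defaultdict(set)   # kind -> set of (item, source tool)

    def report(self, kind, item, source):
        """Record that `source` observed `item` of the given kind."""
        self._facts[kind].add((item, source))

    def query(self, kind):
        """Return the items of one kind, regardless of which tool found them."""
        return sorted({item for item, _ in self._facts[kind]})

    def sources(self, kind, item):
        """Return the tools that reported this item."""
        return sorted(s for i, s in self._facts[kind] if i == item)


db = ObservationDatabase()
db.report("suspicious_variable", "Account.balance", source="static_analyzer")
db.report("suspicious_variable", "Cache.size", source="static_analyzer")
db.report("suspicious_variable", "Account.balance", source="race_detector")
db.report("unimportant_location", "Main.java:12", source="static_analyzer")

# A noise maker could restrict its delays to accesses of these variables.
targets = db.query("suspicious_variable")
print(targets)
```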
Dynamic linking
Technologies need to interact at runtime
Only a specific instantiation of a variable is suspicious
Transfer control to other technologies (e.g., race to noise)
Appropriate interfaces should be designed
Links between testing and debugging fall under this category
Programs with BUGS
Sample programs to show different bug categories
Real programs in which bugs were found
For each program
Annotation of the bug
Where it is
What variables are involved
Classification of the bug
Expected results: correct outputs, failures, and what the failures indicate
Instrumented program (generic instrumentation)
Driver for the test suite and result analyzer
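The per-program information listed above could be captured in a small record that a result analyzer consults. The sketch below is only a guess at a possible shape: the field names, the example program, and its outputs are hypothetical, and the benchmark's required format is not specified in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class BugAnnotation:
    """Hypothetical record describing one documented bug in a benchmark program."""
    program: str
    location: str                       # where the bug is
    variables: list                     # which variables are involved
    classification: str                 # bug category
    correct_results: list = field(default_factory=list)
    failure_meanings: dict = field(default_factory=dict)  # failure -> meaning

    def evaluate(self, observed):
        """Result analyzer: classify one observed output of the test driver."""
        if observed in self.correct_results:
            return "pass"
        return self.failure_meanings.get(observed, "unexpected output")


annotation = BugAnnotation(
    program="Counter",
    location="Counter.increment",
    variables=["count"],
    classification="non-atomic read-modify-write",
    correct_results=["200"],
    failure_meanings={"199": "one lost update"},
)

for output in ("200", "199", "abc"):
    print(output, "->", annotation.evaluate(output))
```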
Components of the Framework
Observation database with its APIs
Instrumentation engine with its APIs
Open APIs for runtime
Other components as they become available. We already have
Noise maker
Race detector
Coverage viewer
So what do we do?
We already have a few of the programs (10-15) in our regression suites
and will make them available. Scott has a few more.
I teach multi-threaded testing and gave the class an assignment to write programs with interesting bugs in the required format…
Start a few thesis projects in this area (we already have three MSc
students)
Propose projects in this area
Get you people to join us in this effort
Have something to show for the next PADTAD!
Questions, Comments, Are You with Us?