Safety Analysis of Hardware / Software Interactions in Complex Systems
John A. McDermid, David J. Pumfrey; University of York; York, UK
Keywords: software segregation, operating system safety
Abstract

This paper describes a new analysis technique developed specifically to study the safety implications of the relationship between software and the hardware on which it runs.

The technique was developed in response to a request for assistance in completing the safety argument for a critical avionics application. Evidence was required that the segregation mechanism, used to partition functions of different integrity levels running on the same processor, would adequately protect critical program and data memory from corruption by the lower integrity software.

The technique is based on an analysis of time and physical resources, using interpretations of a number of generic failure classes to prompt consideration of various hypothetical deviations from designed behaviour.

We consider this research to be of particular significance, as the ability to provide such evidence is fundamental to the development of safety cases for future systems which will need to use a generic high integrity kernel to manage a number of processes with different integrity levels running on the same hardware.

The paper describes the principles of the technique, and also presents our experience in applying it to the avionics system project which prompted its development.

Introduction

As the number and diversity of safety critical applications of computer systems has increased, there has been an attendant requirement for increased levels of assurance of the safety of such systems, where safety encompasses both the normal (intended) operation of the system and the management of failures. One area which has proved particularly problematic is assessment of the safety properties of the interaction between safety critical software and the hardware on which it runs. In seeking evidence of safety, there is a need to answer questions such as:
• does the software use the hardware safely and appropriately?
• can the software cope acceptably with plausible hardware failures?
• could hardware failures invalidate assumptions of independence of software failure modes (or vice versa)?

Safety requirements for a computer system are normally expressed in terms of requirements on the functions it provides, which are derived from hazard analyses carried out on the system which is to be controlled. These requirements are propagated down and refined as the computer system design is developed, so that, at each level, the implementers are aware which functions are critical, and what failure modes must be avoided. In addition to these specific requirements, there will also be some general requirements over the whole system, such as “no single failure shall lead directly to a hazard”. At the level of hardware / software interactions, a safety effect is considered to be anything which causes one of the functional failure modes identified as critical, or causes a violation of one of the general safety requirements.

Historically, safety critical systems have tended to be written as monolithic software, incorporating bespoke scheduling, interface and support functions, which ran on a single, custom-designed hardware platform. This style of system was relatively inflexible, but had the advantage (from the safety assessment viewpoint) that application software interfaced very directly with the hardware (figure 1a).

More recently, there has been a move to build safety critical software using architectures more similar to those found in general computing systems, with a distinct operating system layer between application functions and hardware (figure 1b). Since it adds a level of indirection between application code and hardware, this additional layer makes some parts of the task of analysing software / hardware interactions more
complex. However, it also offers significant advantages for the safety analyst, not least by making it possible to provide detailed evidence about the system which is independent of the application code, and does not need to be revised if the application is changed.

[Figure 1 shows the two architectures side by side: a) 'Monolithic' software, in which application functions and bespoke scheduling and support software interface directly with the controlled system; b) With operating system, in which application functions run on an operating system layer between them and the controlled system.]

Figure 1 – Safety Critical System Architectures

In particular, when critical and non-critical application functions are implemented on the same system, it is necessary to demonstrate not only that the use of resources by each is appropriate in itself, but also that functions of a lower integrity cannot in any way interfere with the operation of those of a higher integrity. More specifically, it must be demonstrated that under normal operation:
1. Data flow corruption is prevented
– low integrity level software cannot modify high integrity data.
2. Control flow corruption is prevented
– critical functions can always execute at the correct time, without being affected by the actions of non-critical functions.
– low integrity software cannot modify high integrity level code.
3. Corruption of the execution environment is prevented
– corruption of parts of the system used by both high and low integrity software (e.g. processor registers, device registers and memory access privileges) cannot occur.

It must also be shown that, if any of requirements 1 to 3 is violated, e.g. due to hardware failure, this will be detected, and the system caused to take whatever action is necessary to ensure continued safety. A suitable operating system may make it possible to demonstrate this segregation without needing to analyse all of the application code. Although the analysis technique described in this paper is applicable to any computer system, our focus will be its application to provide evidence for such an operating system.

For any realistic computer system, whether of the monolithic style or with an identifiable operating system, it would be impossible to carry out a complete analysis of the effects of every plausible failure of the hardware at every step of software execution. What is required, therefore, is an analysis based on a simplification or abstraction of the system, supported by an argument of the suitability and acceptability of the abstraction used. Parts of this argument will, necessarily, be system specific, but much of it is general, and some of the general parts will be described as we present our approach.

Analysis Approach

Our approach is based on the observation that all interactions between software and the hardware on which it is running can be considered in terms of the use of physical resources and time. By identifying a number of classes of resource criticality, it is possible to describe the type of arguments which must be made to demonstrate the acceptability of the use of the resources in a given system. We then show how these arguments can be based on consideration of the effects of a relatively small set of hypothetical failure modes.

Identifying Resources: Physical resources consist of the processor registers, memory locations, I/O and other special registers. This is, effectively, the programmer’s model of the hardware; hardware features such as buses, arbitration logic etc. are considered in terms of the registers which control them.

For any specific combination of software and hardware, we can partition these resources into classes based upon the criticality of the resource usage. We identify five classes of criticality:
• intrinsically critical
those resources which contain safety critical data at any point in the execution of the software, or the program code for safety critical functions; examples include I/O and RAM used by safety critical functions, processor registers etc.
• primary control
resources which directly control the use or function of an intrinsically critical resource; examples include memory management unit (MMU) registers, I/O control registers etc.
• secondary control
resources which either provide a backup to primary controls (e.g. a secondary MMU giving redundancy in memory protection), or control access to primary resources (for example, key registers which must be set to particular values before MMU registers can be altered)
• non-critical
resources which are never used by critical software, and do not affect the operation of any part of the hardware which is used by critical functions
• unused
locations in the memory map which do not correspond to a physical device. The importance of these locations is that there should be no attempts by any part of the software to access them; such an attempt indicates a failure, and must be trapped and handled safely.

In considering time, we are not concerned with the passage of real time, such as execution times of code sections, for which good methods exist. Our model of time is based on the identification of discrete timing events which have associated hardware actions. Examples of these include interrupts, the use of system timers and counters, and synchronisation actions. Again, this is a model familiar to programmers.

Timing events can be identified as either critical or non-critical, depending upon whether they affect the execution of critical code. Note that there will be a set of primary (and possibly some secondary) control resources associated with each timing event. For example, primary resources associated with a timer-generated interrupt will include the control registers for the timer, and CPU registers which determine its response to the arrival of the interrupt.

As a basis for analysis, this model has several advantages:
1. it is possible to ensure that the model is complete (i.e. includes the entire memory map, all non-mapped devices such as processor registers, and all interrupts and synchronisation events)
2. the model is familiar to the system’s designers and programmers, so it is possible to discuss safety analysis in familiar terms
3. although potentially large, the model is of a fixed and predetermined size, so the effort required for analysis can be predicted reasonably accurately in advance.

As an example of the identification and classification of resources, consider the simple vehicle brake-by-wire example shown in figure 2. Braking inputs from a pair of sensors on the driver’s foot pedal are converted by the processing electronics into digital values available to the software in particular registers in the I/O map. The software includes a routine which reads these registers and calculates a required braking value which is placed in an output register, from where further electronics convert it into drive signals to the brake actuator. The program code is stored in part of the Program ROM and runs on the processor, using some of the RAM locations as work space. If braking control is safety critical, then all these parts of the system must be regarded as intrinsically critical.

The system also includes a memory management unit, which performs the mapping between logical and physical memory locations. The registers which define this mapping are regarded as primary controls; they are not actually involved in the braking calculations, but directly affect resources which are.

The software is cyclic, and its periodic execution is triggered by the arrival of an interrupt from the timer. This interrupt is a critical event, and the device registers which control the timer are also considered to be primary controls.

[Figure 2 shows the brake-by-wire computer system: the two brake pedal inputs feed input electronics whose registers are read by the software; critical RAM and ROM hold the braking code and data, non-critical ROM holds the status functions, and the braking output drives the output / actuation electronics while the status output drives the dashboard display.]

Figure 2 – Simple Vehicle Brake-by-wire System
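The classification described above can be sketched as a simple lookup. This is an illustration only: the resource names and groupings below are invented for the brake-by-wire example, not taken from the paper's case study.

```python
from enum import Enum

class Criticality(Enum):
    INTRINSICALLY_CRITICAL = "intrinsically critical"
    PRIMARY_CONTROL = "primary control"
    SECONDARY_CONTROL = "secondary control"
    NON_CRITICAL = "non-critical"
    UNUSED = "unused"

# Illustrative classification for the brake-by-wire example.
RESOURCE_MAP = {
    "pedal_input_registers": Criticality.INTRINSICALLY_CRITICAL,
    "braking_output_register": Criticality.INTRINSICALLY_CRITICAL,
    "critical_rom": Criticality.INTRINSICALLY_CRITICAL,
    "critical_ram": Criticality.INTRINSICALLY_CRITICAL,
    "mmu_registers": Criticality.PRIMARY_CONTROL,
    "timer_registers": Criticality.PRIMARY_CONTROL,
    "status_output_register": Criticality.NON_CRITICAL,
    "status_display_ram": Criticality.NON_CRITICAL,
    "unmapped_region": Criticality.UNUSED,
}

def needs_full_failure_analysis(resource):
    """Critical resources and their controls carry argument obligations."""
    return RESOURCE_MAP[resource] in (
        Criticality.INTRINSICALLY_CRITICAL,
        Criticality.PRIMARY_CONTROL,
        Criticality.SECONDARY_CONTROL,
    )
```

The point of the partition is exactly this predicate: it determines which resources attract detailed failure analysis and which need only the argument that they cannot affect critical operation.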
The example brake-by-wire system incorporates other functions which are not critical, including functions to display system status on the dashboard. The parts of the program ROM containing the code for these functions, the status output register, and RAM locations used only by the status display functions may be regarded as non-critical resources. If we assume that the software is implemented as a set of application functions running on top of an operating system, then the operating system must be capable of providing suitably segregated access to all the system resources.

Resource Dependencies: From the descriptions of the resource classes, it is clear that there are dependencies between resources; that is, the state of one resource affects the behaviour of another. Indeed, this is explicit in the definition of primary and secondary control resources. However, there are other, less direct dependencies which are also vital. The most significant of these is that, in most systems, there must be an initialisation phase, in which the software configures the hardware to the state required for the execution of the main body of the application. However, this initialisation code is, itself, run on the very hardware it is configuring, so a circular dependency is created (figure 3).

For example, when the brake-by-wire system is powered up, the MMU registers will contain the manufacturer’s defaults, and will therefore not correctly describe the boundaries between the critical and non-critical RAM and ROM areas. The initialisation code must set these boundary values. However, the MMU is used in all memory accesses from the CPU, and will therefore be influential in the execution of the code which initialises its own registers.

A complete safety argument for the system must therefore demonstrate that the system powers up in a safe state, and respects minimum safety requirements throughout every stage of initialisation. To guarantee the correct execution of the application, it must also be shown either that successful completion of the initialisation guarantees that the hardware is correctly configured, or that it is impossible (or at least extremely improbable) that the main body of the software could fail to detect, and safely respond to, any incorrectness in its execution environment.

Safety Arguments for Resources: Having identified the timing events and resources in the system, and assigned appropriate criticality classes, we must now demonstrate the acceptability of their implementation and use. The arguments made must consider both normal (intended) operation and the effects of failures. There are many fault tolerance strategies, and our purpose here is not to discuss these, but rather to consider some general properties which are relevant whatever the system architecture.

[Figure 3 shows intrinsically critical resources (interrupts, output events, master cycle clock, program ROM, stack RAM, critical variables, CPU registers and I/O registers) governed by primary control resources (timer registers, MMU registers, bus arbitration control registers), which are in turn configured by initialisation routines held in ROM and RAM and executed through the CPU registers. The initialisation routines for primary control resources use system resources, and dependencies become cyclic.]

Figure 3 – Illustration of Cyclic Dependencies in System Resources
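The cyclic dependency illustrated in figure 3 can be made concrete by recording "initialising X relies on Y" edges and searching them for cycles. The edges below are a hypothetical sketch of the MMU example, not dependency data from the case study.

```python
# Hypothetical dependency edges: "initialising X relies on Y behaving
# correctly". The cycle mirrors the MMU example: the code that
# configures the MMU is itself fetched through the MMU.
DEPENDS_ON = {
    "mmu_registers": ["init_code_rom", "cpu_registers"],
    "init_code_rom": ["mmu_registers"],  # instruction fetch goes via the MMU
    "cpu_registers": [],
}

def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None."""
    def visit(node, stack):
        if node in stack:
            return stack[stack.index(node):] + [node]
        for dep in graph.get(node, []):
            found = visit(dep, stack + [node])
            if found:
                return found
        return None
    for start in graph:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None
```

Any cycle found in such a graph marks a set of resources whose initialisation must be covered by the power-up safety argument described above, since no member of the cycle can be trusted before the others are configured.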
Because the criticality of data and calculations is application dependent, it is not possible to make general arguments for the safety of intrinsically critical resource usage based on knowledge of the hardware and operating system alone. It is entirely the responsibility of the application designer to demonstrate that the intended (normal) operation of the system is safe. However, study of the underlying hardware is very important in understanding the behaviour of the system in the presence of failures. There are two categories of failure to consider; failure of the hardware implementing the resource itself, and failures in the configuration and protection of the resource arising from faults in primary control resources.

In both monolithic systems and those with operating systems, the arguments which can be made for the tolerability of hardware failure in intrinsically critical resources depend upon two factors; the improbability of that failure, and the provision of appropriate protection mechanisms for critical data and operations within the application. For example, in a simple, single channel system, the integrity of critical data held in RAM may be checked by storing the same value in two locations and comparing them before use. For temporary storage (e.g. intermediate results in critical calculations) this may not be viable, so the calculation may be repeated, or the results of two alternative algorithms compared. The effectiveness of these strategies depends upon the improbability of two hardware failures resulting in identical but incorrect results.

An alternative strategy may be to argue that intermediate values are stored for so little time that the effective exposure to random hardware failures is negligible. This argument may prove fallacious if a single calculation contains many steps which use previous intermediate results, or if the calculation is repeated frequently, in which case the proportion of the time for which values are stored in the same temporary location may prove too high.

Many of the possible resource protection strategies in application software depend upon the ability to control, or make use of, features of the hardware. For example, storing two copies of data gives greater protection if the two locations are in separate devices (RAM chips), potentially avoiding sources of common mode failure such as faulty address decoding or ‘stuck at’ faults on individual devices. This is relatively easy to achieve in monolithic software, where the programmer has direct control over the hardware. For an operating system to offer similar protection, it may be necessary to implement special features (possibly with complementary compiler directives) to provide the application programmer with the necessary control. However, the benefit of this will be that it is also possible to provide generic 'argument fragments' or patterns, which can be applied each time the feature is used.
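The duplicate-storage check described above can be sketched as follows. This is a minimal illustration of the idea only: two dictionaries stand in for the two separate RAM devices, and the names are invented.

```python
class CriticalStore:
    """Holds two copies of each critical value; reads cross-check them.

    In a real system the two copies would sit in separate RAM devices to
    avoid common mode failures such as 'stuck at' faults; here two
    dictionaries stand in for the two devices.
    """
    def __init__(self):
        self._bank_a = {}
        self._bank_b = {}

    def write(self, name, value):
        self._bank_a[name] = value
        self._bank_b[name] = value

    def read(self, name):
        a, b = self._bank_a[name], self._bank_b[name]
        if a != b:
            # Corruption detected: in the terms used above, the failure
            # must be trapped and the system driven to a safe state.
            raise RuntimeError("corruption detected in " + name)
        return a
```

The scheme detects any single-location corruption; as noted above, its effectiveness rests on the improbability of two failures producing identical but incorrect values in both copies.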
Failures of intrinsically critical resources arising from faults in, or incorrect management of, primary control resources are of particular concern because they have the potential to cause common mode failures, possibly invalidating any of the above arguments. However, this is an area where generic arguments may be made about the interaction of the operating system and the hardware; as there will normally be relatively few of these resources, it is feasible to devote significant time and effort to analysis.

So, for intrinsically critical resources, we have identified three essential strands to a safety argument:
1. Safety of normal usage – argument is responsibility of application developer.
2. Toleration of hardware failure – argument is responsibility of application developer, but with support from hardware and operating system analysis.
3. Correct management via primary controls – argument is primarily responsibility of hardware and operating system analysis.

For primary control resources, the safety arguments that can be made depend on so many factors that it is impossible to give general guidelines for the safety argument. If there is a secondary control resource which duplicates the behaviour of the primary for redundant protection (e.g. an external device which duplicates the memory protection functions of the MMU), it may be sufficient to argue safety simply from improbability of coincident failure, provided that initialisation is not a potential source of common failure. More probably, it will be necessary to identify means of detecting and managing the effects of a failure. For example, if the MMU is incorrectly configured so that a process is denied access to memory locations it requires, can it be shown that the resultant errors are always trapped and lead to the system taking appropriate action?

Similarly, for secondary control resources, the argument will depend entirely on the role of the resource, and no general guidance can be given.

Only one argument is normally required for unused (non-existent) resources. As any attempt to access such a location is necessarily an error, the argument must show how the system will trap and respond to such an attempt.

Failure modes to consider in safety arguments: We have now described a system model based upon identifying and classifying resources, and briefly discussed the types of safety argument which may be made for each. Most of these arguments require an assessment of the effects of failures, and it is now necessary to consider an appropriate model of failure. Again, it is infeasible to consider every type and cause of failure of each device individually, so it is necessary to make another abstraction.

We base our approach on a small number of hypothetical failures, based on research into the classification of computer system failures (refs. 1-2). We have previously used these same hypothetical failure categories as guide words for a HAZOP-style analysis of software designs (refs. 3-4), and this experience has shown that these categories are applicable to many aspects of computer systems. The failure categories are introduced below, together with the interpretations we have used in our case studies. Clearly, as this approach is new, we cannot be certain of the completeness and adequacy of these interpretations, but comparison with results obtained by other analysis techniques (e.g. more traditional Functional Failure Analysis at device block diagram level) has given us reasonable confidence that they provide a sound basis for analysis.

The failure categories we identify are based on the concept of a service, that is, the provision of a particular value at a specific time. They are:
• Omission – the value is never provided
• Commission – a value is provided when it is not required (i.e. a perfectly functioning service would have done nothing)
• Early – the value is provided before the time (either real time, or relative to some other action) at which it is required
• Late – the value is provided after the time at which it is required
• Value – the timing is correct, but the value delivered is incorrect.
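As a minimal sketch, the five categories can be expressed as a classifier over a toy model of a service in which each delivery is a (time, value) pair. The function below is illustrative only and is not part of the analysis method itself.

```python
from enum import Enum

class Failure(Enum):
    OMISSION = "value never provided"
    COMMISSION = "value provided when none was required"
    EARLY = "value provided before the required time"
    LATE = "value provided after the required time"
    VALUE = "timing correct, value incorrect"

def classify(expected, observed):
    """Classify one delivery of a service against its expectation.

    Both arguments are (time, value) pairs; None means "no delivery was
    required" / "no delivery occurred". Returns None for correct service.
    """
    if observed is None:
        return Failure.OMISSION if expected is not None else None
    if expected is None:
        return Failure.COMMISSION
    t_exp, v_exp = expected
    t_obs, v_obs = observed
    if t_obs < t_exp:
        return Failure.EARLY
    if t_obs > t_exp:
        return Failure.LATE
    if v_obs != v_exp:
        return Failure.VALUE
    return None
```

Note that the categories are mutually exclusive for a single delivery: a timing deviation is reported before a value deviation, matching the definition of Value as "the timing is correct, but the value delivered is incorrect".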
The interpretations of these categories used for timing events are:
• Omission – the failure of an event to occur. In a multi-processor system where events affecting more than one processor are possible (e.g. broadcast interrupts or synchronisation events), it is necessary to consider symmetric (where no recipient responds to the event) and asymmetric (one or some recipients respond) omission.
• Commission – the spurious occurrence of an event. Again, there may be different cases to consider in a multi-processor system. It may also be necessary to consider whether the spurious event is a repetition of an expected event, or insertion of something completely unexpected or out of sequence.
• Early and Late take the obvious interpretations.
• Value has no meaningful interpretation for timing events.

Interpretations for physical resources are:
• Omission and commission – interpreted as access permission violations. An omission failure occurs if a process which should be able to access a resource is denied permission. Commission failure occurs where a process is granted access to a resource which it should not have.
• Early has two interpretations in the case of a physical device such as memory, both leading to (unpredictably) corrupt data:
• the processor reads from a location in the device, and attempts to latch the data from the bus before it is stable, or
• the processor writes to a location in the device, and de-asserts data before the device has latched it correctly.
These may seem unlikely failures, and only possible with poor hardware design. However, there are systems in which parameters such as the number of wait states inserted on accessing a particular device are programmable; the system can dynamically alter its own timing characteristics. In such systems, this type of timing failure is plausible and extremely important.
• Late refers to delay in accessing the resource, arising either from effects such as contention for a shared bus, or from the same type of configuration fault that could lead to early failures. In general, lateness will not cause data corruption, and is only of interest in our analysis if the delay is great enough to be treated as an omission, e.g. by triggering a bus timeout.
• The value of a resource is its data content. For control resources, the correct value can often be determined in advance, and the effects of changes predicted. In the case of memory (RAM or ROM) the effect of unwanted changes can only be determined with knowledge of the application software.

Analysis Steps: The numbered steps below summarise our approach to conducting an analysis based on the principles outlined.
1. Identify resources from system design documentation, e.g. memory maps.
2. Describe the function and usage of each resource. Many of the resources can be grouped together at this stage; for example, blocks of memory with the same function.
3. Classify resources to help decide argument requirements.
4. For each resource, consider each applicable failure mode:
4.1. Describe the exact failure mode(s) being considered.
4.2. Describe the effects of the failure, considering hardware and software response, existing prevention, detection and mitigation mechanisms.
4.3. Decide whether an acceptable argument for safe handling of this failure mode can be made.
4.4. If argument exists, record it, otherwise propose necessary design revisions.
4.5. If necessary, repeat 4.1 - 4.4 for system initialisation.
5. Repeat until acceptable arguments have been made for all failure modes of all resources.

The natural format for recording the results of the analysis is a large table. Note that we find it helpful to add a 'check' column in which to note whether the argument obligations for a particular resource have been fully discharged. Obviously, no intrinsically critical resource can be considered to be fully discharged until arguments have been completed for each of the primary controls upon which it depends.

In practice, many of the mechanisms in step 4.2, and the related arguments, will be found to be generic, and step 4.3 will consist simply of adding an appropriate reference. Some of these generic mechanisms may be identified before the main body of the analysis is commenced, as they have been specifically provided as features of the design; others will emerge as analysis proceeds. It is important that any mechanisms identified in advance are thoroughly investigated to ensure that they function as intended.
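The steps above naturally produce one table row per resource / failure mode pair. The sketch below shows that bookkeeping, with invented resource names and a single placeholder entry standing in for the library of generic arguments; it is an illustration of the recording format, not the analysis itself.

```python
from itertools import product

# Invented resource names and failure modes standing in for steps 1-3;
# a real analysis would take these from the memory map and design docs.
RESOURCES = ["Timer T1 load register", "MMU registers", "Critical RAM"]
FAILURE_MODES = ["omission", "commission", "early", "late", "value"]

# Generic acceptance arguments identified in advance (step 4.3 then
# simply references one); the single entry here is a placeholder.
GENERIC_ARGUMENTS = {
    ("Timer T1 load register", "late"):
        "bus exception raised, followed by orderly shutdown",
}

def build_analysis_table(resources, failure_modes):
    """One row per resource / failure mode pair, with the 'check' column
    recording whether the argument obligation has been discharged."""
    rows = []
    for resource, mode in product(resources, failure_modes):
        argument = GENERIC_ARGUMENTS.get((resource, mode))
        rows.append({
            "resource": resource,
            "failure mode": mode,
            "argument": argument,          # None until one is recorded
            "check": argument is not None,
        })
    return rows

table = build_analysis_table(RESOURCES, FAILURE_MODES)
outstanding = [row for row in table if not row["check"]]
```

Because the model is of fixed and predetermined size, the number of rows — and hence the analysis effort — is known in advance, which is one of the advantages claimed for the model earlier.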
Case Study

The major case study used in the development and testing of this analysis approach was a large, multi-processor avionics system. The system was required to provide one safety critical function, and a large number of functions of lower criticality. The system was developed in accordance with UK defence standards, requiring the critical function to be developed to the highest integrity level, with the other functions developed to lower integrity levels as appropriate.

The basic structure of the computer system hardware is shown in figure 4. The processors were arranged in pairs, each processor having its own private bus, giving access to RAM, ROM and timers, and to the arbitration logic for access to the shared local and system buses. In addition, a secondary MMU on each private bus provided redundant protection of critical memory areas.

For reasons of efficient processor utilisation, a segregated operating system was used, with high and low integrity functions executing on the same processor. The application software ran in an environment which provided high integrity system initialisation, scheduling and memory protection. In order to ensure system consistency and guarantee the critical function access to the system bus and I/O when required, the system employed synchronised cyclic schedules, executing a high integrity code segment on all of the processors at the same time in each cycle.

Our analysis technique was applied to investigate whether there was any way in which hardware, scheduling or protection systems could fail in such a way that the three principles of safe operation outlined in the introduction could be violated. Some specific areas of concern, including bus arbitration, high integrity data I/O and the management of asynchronous events, were identified before the study started, and particular attention was paid to these during the analysis.

Another specific requirement was that the analysis produced should be as independent of the application software as possible, to permit changes to the application without the need for extensive repeat analysis. This was achieved by
[Figure 4 shows the case study hardware: processors arranged in pairs, each processor with its own private bus carrying private RAM, private ROM and a second MMU, connected through arbitration logic to the shared local and system buses and shared ROM.]

Figure 4 – Case Study System Hardware Architecture
assuming that all application software except that of the critical function itself would always behave in the worst conceivable way, i.e. any failure of resource protection or scheduling would always result in the lower integrity software causing the maximum possible interference to the operation of the critical function. If satisfactory protection of the critical function could be demonstrated under this assumption, no future change to the lower integrity software could invalidate the safety arguments.

Since the analysis approach was novel, the certifying authority for the system was involved at an early stage, and their agreement sought for the principles and method to be used. Figure 5 shows a fragment of the output produced.

Case Study Results: The analysis of the study system took approximately nine man months to complete, distributed over a considerably longer elapsed period. About one third of this time was spent in understanding the design of the system, and the rest in analysis and report writing. During the course of the analysis, a number of issues were identified, and these resulted in a number of revisions to the system design.

The final report was around 60 pages in length, most of which was occupied by tables summarising the analysis results. This report was submitted to the certifying authority as part of the review package. The comments received in response were detailed and technical in nature, and concentrated on the system rather than the analysis approach, which was seen as generally sound and useful. The only specifically analysis-related request made was that the response of the software to certain hardware failures should be described more completely.
Location: EFC8 - Timer T1 Counter Load Register
Use / Criticality: Timer T1 is used as the master frame timer on Processor 1 / Primary Control

Guide Word: Omission
Deviation: Denial of read access to any software; denial of write access to Class 1 software.
Causes: Failure of processor MMU or protection built in to Timer hardware.
Summary of Acceptance Arguments:
• Only write access needs to be protected; there are no safety implications of allowing any software read access to the timer.
• Failure of one device may lead to refused write access; this will lead to a bus exception being raised, followed by orderly shutdown of the affected processor.

Guide Word: Commission
Deviation: Granting of write permission to software other than Class 1.
Causes: Simultaneous failure of processor MMU and protection built in to Timer hardware.
Summary of Acceptance Arguments:
• Requires simultaneous failure of both access control mechanisms. Note that these are separately initialised by different code from two separate configuration tables.

Guide Word: Early
Deviation: Processor attempts to latch data off the bus before it has stabilised.
Causes: Timer hardware failure (waitstates for Timer access are controlled by the timer hardware).
Summary of Acceptance Arguments:
• Timers are read for two reasons:
1. to check how much time is left in a frame - an incorrect reading will lead to loss of synchronisation between processors at end of frame. Processors remaining in sync. will force orderly shutdown of the affected processor;
2. in continuous built-in test to check timer status - an incorrect reading will lead immediately to orderly shutdown.

Guide Word: Late
Deviation: Excessive latency between device access and data read.
Causes: Timer hardware failure (waitstates for Timer access are controlled by the timer hardware).
Summary of Acceptance Arguments:
• Local device - no arbitration delays.
• Worst case will lead to a bus exception being raised, followed by orderly shutdown of the affected processor.

Guide Word: Value
Deviation: Incorrect timer setting.
Causes: Timer hardware failure, or corruption of initialisation data.
Summary of Acceptance Arguments:
• May be detected by CBIT, or result in loss of synchronisation between processors at end of frame. Arguments for Early apply.

Figure 5 – Example Analysis Output
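Each entry in Figure 5 follows the same pattern: a resource location, a guide word, a hypothesised deviation, its causes, and the acceptance arguments. Purely as an illustration (this is not the tooling used in the study, and the guide-word glosses below are paraphrased), such entries could be held as simple records and the generic failure classes used to generate deviation prompts for each resource:

```python
from dataclasses import dataclass, field

# Generic failure classes used as guide words (glosses paraphrased).
GUIDE_WORDS = {
    "Omission":   "access to the resource is denied or the service is not delivered",
    "Commission": "access or service is provided when it should not be",
    "Early":      "the interaction occurs before the data or device is ready",
    "Late":       "there is excessive latency between device access and data read",
    "Value":      "the value transferred or stored is incorrect",
}

@dataclass
class AnalysisEntry:
    """One row of the analysis output for a hardware resource."""
    location: str
    guide_word: str
    deviation: str
    causes: str
    acceptance: list = field(default_factory=list)

def deviation_prompts(location: str) -> list:
    """One hypothetical-deviation question per guide word for a resource."""
    return [f"{location} / {gw}: could {gloss}?"
            for gw, gloss in GUIDE_WORDS.items()]

# The Omission row of Figure 5, captured as a record.
omission = AnalysisEntry(
    location="EFC8 - Timer T1 Counter Load Register",
    guide_word="Omission",
    deviation="Denial of read access to any software; "
              "denial of write access to Class 1 software",
    causes="Failure of processor MMU or protection built in to Timer hardware",
    acceptance=["Only write access needs to be protected",
                "Refused write access raises a bus exception, "
                "then orderly shutdown of the affected processor"],
)

for prompt in deviation_prompts(omission.location):
    print(prompt)
```

Representing the entries this way makes the completeness check mechanical: every resource location should have one entry, or a justified exclusion, per guide word.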
• Redundant MMUs, independently initialised, which permitted a simple and sound generic argument for memory protection safety.
• The physical and logical memory maps of the system were the same (i.e. the MMUs performed no address translation). It was therefore a simple task to identify all the areas of memory that could ever be involved in a critical process.

The tight synchronisation between processors meant that many failures could be detected and mitigated by related events on other processors.
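Because the physical and logical memory maps coincide, showing that no lower-integrity partition is ever granted write access to a critical region reduces to a plain interval-overlap check. A minimal sketch of such a check (the partition names, region names, and addresses are invented for illustration):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    """A physical memory region; 'critical' marks involvement in a critical process."""
    name: str
    start: int
    end: int          # exclusive
    critical: bool

@dataclass(frozen=True)
class Mapping:
    """An access grant to a partition; integrity 1 is the highest (Class 1)."""
    partition: str
    integrity: int
    start: int
    end: int          # exclusive
    writable: bool

def overlaps(a_start, a_end, b_start, b_end):
    """True if the half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end

def unsafe_writes(regions, mappings):
    """Return (partition, region) pairs where software below Class 1 is granted
    write access to a critical region. With identical physical and logical maps,
    no address translation need be modelled: interval overlap suffices."""
    return [(m.partition, r.name)
            for r in regions if r.critical
            for m in mappings
            if m.writable and m.integrity > 1
            and overlaps(m.start, m.end, r.start, r.end)]

regions = [Region("timer_t1_load", 0xEFC8, 0xEFCC, critical=True)]
mappings = [Mapping("nav_display", integrity=2,
                    start=0xE000, end=0xF000, writable=True)]
print(unsafe_writes(regions, mappings))  # flags ('nav_display', 'timer_t1_load')
```

With address translation present (as raised in the conclusions below), each mapping would first have to be resolved to its physical range, and the check repeated for every reachable translation state.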
Conclusions
We believe that the work described here presents a useful advance in the range of low-level safety analysis techniques available for computer system safety analysis. The combination of a HAZOP-like approach with a suitable model of the hardware allows for a very thorough analysis, whilst still remaining relatively tractable. However, there are many open issues, both for the improvement of safety analysis techniques, and for the design of systems such that they are amenable to analysis. This is of particular relevance to the avionics industry at present, in view of the industry-wide move towards Integrated Modular Avionics (IMA). As noted, the analysis of the case study system was greatly facilitated by certain features of the design, some of which (such as a safe fail stop state) are not generally possible. Others, such as the provision of redundant memory protection, could relatively easily be incorporated into the design of future systems. More work is needed to identify features like this, and to focus the attention of designers on the need to incorporate them into their systems.

Particular research challenges for the future development of this work are how to model and analyse systems with address translation and dynamic memory allocation. We also need to improve our understanding of the arguments which can be made for the safety of all types of resource usage, so that improved guidance can be given to engineers attempting this type of analysis in future.

Biographies
Professor John A. McDermid, High Integrity Systems Engineering Group, Department of Computer Science, University of York, Heslington, York YO10 5DD, UK, telephone +44 (0) 1904 432726, fax +44 (0) 1904 432708, e-mail John.McDermid@cs.york.ac.uk

John McDermid is Professor of Software Engineering at the University of York, where he runs the High Integrity Systems Engineering group. His primary interest is safety critical systems in aerospace, and he directs the BAe Dependable Computing Systems Centre and the Rolls-Royce University Technology Centre in Systems and Software Engineering.

David Pumfrey, DCSC, Department of Computer Science, University of York, Heslington, York YO10 5DD, UK, telephone +44 (0) 1904 433385, fax +44 (0) 1904 432708, e-mail David.Pumfrey@cs.york.ac.uk

David Pumfrey is a Research Associate in the BAe Dependable Computing Systems Centre, where he is currently investigating the use of HAZOP and related techniques for software hazard analysis. He is also one of the presenters of a highly successful series of short courses on system safety and safety cases.