From the Proceedings of the 17th International System Safety Conference (August 1999)
Dependency Modelling Using Fault Tree Analysis
J.D. Andrews, PhD; Department of Mathematical Sciences, Loughborough University, England
J.B. Dugan, PhD; Department of Electrical Engineering, University of Virginia, USA
Keywords: fault tree analysis, dependency, reliability, availability, Markov methods, binary decision diagrams
Abstract

This paper describes the use of the fault tree method to model the failure probability of systems that feature dependencies. Our method is presented by illustrating its application to an example safety system taken from the offshore industry. Despite the dependencies, the representation of the system failure logic retains the fault tree structure by utilising a new gate set. The analysis of the dependent parts of the system is performed by software in a manner which is totally transparent to the analyst. Markov methods are employed to solve the dynamic, dependent sections of the fault tree and Binary Decision Diagrams to solve the static fault tree sections. It may be necessary to alternate between the two analysis methods several times to solve the complete fault tree structure.

Introduction

Over the last two decades, Fault Tree Analysis has become an established tool used to assess the likelihood of failure of industrial systems. It is particularly well suited to the assessment of safety systems whose failure can cause fatalities or incur excessive financial penalties. The popularity of the method is due to the ease with which the system failure causality is represented in a logic tree diagram, which increases in resolution as the diagram develops until terminal branch events, representing component failures, software errors or human actions, are encountered. This form of diagram, whilst representing a mathematical logic equation, provides a concise, documented means of representing the fault propagation through the engineering system. There are therefore advantages in retaining the basic fault tree structure to develop causes of system failure, whilst extending its analytical capabilities.

Usually the analysis is performed using one of the many commercially available computer software packages. The method used in the packages to perform the calculations is commonly based on the Kinetic Tree Theory of Vesely [1]. Kinetic Tree Theory was developed in the early 1970s, at a time when the pioneering work in systems reliability was being performed. One of the main assumptions of this modelling technique is that the basic events occur independently. In the last decade there has been a significant increase in the complexity of high technology system designs. A feature of many of these modern systems is that they are software controlled and the failure events may exhibit some form of dependency. For the analysis techniques to remain relevant to modern systems, their development must keep pace with developments in the systems technology. Several recent publications deal with research on extending the traditional fault tree analysis method to incorporate dependencies [2],[3],[4],[5].

The approach adopted to incorporate dependency modelling retains the fault tree structure and introduces a new gate set which enables the dependent sections of the tree structure to be identified and the exact nature of the dependency to be specified. These gates are described in the sections below. The analysis of such a tree is then performed by transforming the dependent sections of the fault tree into equivalent Markov state transition diagrams. Construction and analysis of the Markov diagrams is performed by the analysis software in a manner which is totally transparent to the analyst. Once analysed to produce either the probability or frequency of the intermediate level fault tree events, the results of the Markov assessments are incorporated back into the original fault tree structure. This process is continued until a static, independent fault tree structure remains, which can then be analysed by traditional fault tree techniques or the more recent Binary Decision Diagram (BDD) methods [6],[7],[8]. Depending on the system structure, it may at times be necessary to conduct the analysis in an alternating BDD / Markov process to achieve the final results. An example of such a system, a water deluge system on an offshore platform, is presented in this paper along with a description of its analysis.

Provided as a free service of Fault Tree Central http://www.fault-tree.net

Safety System Description (Water Deluge System)

The water deluge system used to illustrate the dependency methodology is shown in figure 1. Whilst this particular system is an example taken from the offshore industry, its features are typical of water spray systems used in many different onshore industries. Four pumps are used to provide the water demand to the ringmain. The ringmain transports the water around the platform to the take-off points where it is used to protect against the hazards posed by hydrocarbon fires and explosions. Pressure in the ringmain is maintained by a jockey pump (not shown in the figure). When the take-off valves open and water is delivered to the spray nozzles the ringmain pressure will drop. Ringmain pressure is monitored and transmitted to the computer control system by the three pressure transmitters (PS1-PS3). When two of the three transmitters indicate a low ringmain pressure the main pumps are activated in the order indicated from top to bottom of the diagram (i.e. EP1, EP2, DP1, DP2). As long as two pumps are available, water can be delivered at the required rate to satisfy demand. Four pumps provide redundancy in the system. Pumps 1 and 2 are electric powered and pumps 3 and 4 are the diesel backups.
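The two voting rules in this description (two of three pressure transmitters, two of four pumps) are both k-out-of-n conditions. As a rough illustration only, the sketch below evaluates such a condition for identical, independent components. The function name and the probability value are assumptions made for this example and do not come from the paper; the independence assumption is precisely what the dependency modelling described later relaxes for the pumps.

```python
from math import comb


def prob_at_least_k_failed(n: int, k: int, q: float) -> float:
    """Probability that k or more of n identical, independent components
    have failed, each with failure probability q (binomial tail)."""
    return sum(comb(n, j) * q**j * (1.0 - q) ** (n - j) for j in range(k, n + 1))


# Hypothetical per-transmitter failure probability, for illustration only.
q_sensor = 0.1

# Two-out-of-three voting is lost once two or more transmitters have failed.
p_voting_lost = prob_at_least_k_failed(3, 2, q_sensor)
```

The same function with n = 4 and k = 3 gives the static probability that fewer than two pumps are available, but only under the independence assumption stated above.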
Figure 1 - Schematic representation of the deluge system pump streams
[Figure not reproduced. It shows the four pump streams (EP1, EP2, DP1, DP2), each with a filter, isolation valves either side of the pump, a pressure relief valve and a test valve; the electric power (ep) and diesel power (dp) supplies; the pressure transmitters PS1, PS2 and PS3; and the computer control system.]
The features on each pump stream are identical. As the water supply is direct from the sea, a filter is fitted on each stream. Manual isolation valves are located either side of the pump for maintenance purposes. A pressure relief valve provides protection for the pump, and a test valve on each line enables individual pumps to be tested without fully activating the deluge system.

There are two failure modes of concern for each stream: the first is that it fails to start (unavailable) and the second is that it fails once running (unreliable). If a pump stream activates on demand it means that the filter, isolation valves, test valve and pressure relief valve, which are all (for this function) passive components, are in the working condition. As they are passive they are unlikely to fail in the relatively short running times if they work initially. These are static failure modes. The pump, however, is a dynamic component and can also fail once running.

System failure will occur if fewer than two of the four streams can be activated (i.e. at least 3 of the 4 fail) for the required duration (12 hours).

Fault tree model of example safety system

Let us consider the two parts of the system separately when building the fault tree model. That is, we will first consider the computer control system and then consider the pump system. As we analyse this system, we will describe the dependencies that must be modelled, and describe special gates which incorporate these dependencies into the fault tree analysis.

The computer control system consists of the three pressure sensors (of which 2 are needed), plus the hardware and the software. The hardware consists of redundant processors in hot standby mode, each equipped with identical software. While the spare processor is in spare mode, it is monitoring the inputs and outputs of the primary, in order to provide detection and recovery in case of error. When an error is detected, control is switched to the backup processor. The computer control system can thus tolerate a single (detected) hardware or software failure. However, an undetected error causes failure of the computer subsystem regardless of the state of the backup. This latter case (undetected error) is an example of an uncovered fault, which leads to immediate system failure. Another example of an uncovered fault is a software fault which affects both processors simultaneously. One might expect, since the software on both processors is identical, that all software faults would affect both processors. However, there is field data to support the assumption that a large percentage of software faults will affect only a single processor [10]. Modeling uncovered faults is crucial to the analysis of a fault tolerant computer system, and is discussed in more detail in [7] and [9]. A fault tree model showing the failure of the computer system is shown in figure 2, in which the basic events represent hardware (processors), software and the sensor set.

Each basic event is characterized by information giving the probability of failure (either as a distribution or as a fixed probability) and the probability that a given fault is covered (or uncovered).

Figure 2 Computer system failure fault tree
[Figure not reproduced. Labelled nodes: "Computer System Failure" (top event); "Both processing units fail"; "2 out of 3 sensors fail" (K/M gate over the 3 sensors); "Primary fails"; "Hot spare fails"; basic events HW1, SW1, HW2, SW2.]

Next consider the pump system, consisting of the four pumps, their power sources (two are electric and two are diesel) and their pump streams (associated valves and filters). For now, let us ignore the pump streams and power supplies, and concentrate on the four pumps.

The set of four pumps operates in standby redundancy in that the two electric pumps are started first, and the diesel pumps provide replacements when the electric pumps are unavailable. On demand, pumps EP1 and EP2 are turned on. If one of these two should fail, it is replaced by DP1. The second pump failure is replaced by pump DP2. This dynamic redundancy scheme introduces dependencies between the failures and requires special modeling techniques. A pump which is in use experiences a different failure rate than one in standby. Therefore, we need to keep track of which pumps are being used and which are in standby. We use a spare gate to model the failure dependencies which arise from the use of spares.

A spare gate is one of several dynamic gates introduced in [9] and it is used to model several dependencies associated with the use of spares. First, a component which is used as a spare has an associated dormancy factor (between zero and one inclusive), which multiplies the active failure rate to produce the spare failure rate. If the dormancy factor is zero, the spare is said to be a cold spare; a cold spare cannot fail before being switched into active operation (failure to activate is modeled as an uncovered failure). If the dormancy factor is unity, then the spare is said to be a hot spare and can fail at the same rate as when active. The in-between situation is referred to as a warm spare; a warm spare can fail before being switched into active operation, but does so at a lower rate than when active.

The second dependency handled by the spare gate is the use of pooled spares, which are spares that can be used as a replacement for whichever of a set of components fails first. Modeling pooled spares requires us to keep track of not only the state of each component, but also the order in which they have failed, so that we can determine which spare is being used where. Further, it might be the case that components have preferences for replacements, in that there is a priority or order in which spares are utilized. This order may well be different for different components.

The spare gate has a set of at least two inputs, the first (leftmost) of which is the designated primary, and the second and subsequent (from left to right) are the spares. When the primary fails, it is replaced (in order) by the spares which are still available (i.e. not failed and not used elsewhere). The single output of the spare gate returns true when the primary and the spares have been exhausted. Basic events representing spares have failure rates, coverage factors and dormancy factors.

Continuing to ignore the power supplies and pump streams, the fault tree in figure 3 models the pumps and their spares. The pump system fails when there are no longer two available pumps (thus the OR gate with two inputs). The basic events represent the two electric pumps, which are both initially active (on demand). The two diesel pumps (DP1 and DP2) are pooled spares shared by both electric pumps. The first electric pump failure is replaced by DP1 and the second by DP2. Note that if EP2 preferred to be replaced by DP2 then we could switch the order of the DP1 and DP2 inputs on the second spare gate.

Next let us consider the power supplies. There is an electrical power supply for pumps EP1 and EP2 and a diesel supply for DP1 and DP2. If a power supply fails, then the associated pumps are unavailable (essentially failed). This type of functional dependency of one component on another is easily modeled with a functional dependency gate [11]. The functional dependency gate has a trigger input and one or more dependent inputs; when the event associated with the trigger input occurs, the dependent inputs are then forced to occur. The functional dependency gate can be used to model the functional dependence of the pumps on the power supplies: the power supply is the trigger event and the two pumps are the dependent events. The fault tree in figure 4 adds the functional dependency to the fault tree in figure 3. The functional dependency gate produces no output other than the propagation of failures. For this reason it is connected via a dashed line to the rest of the fault tree.
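The behaviour of the spare gate and the functional dependency gate described above can be sketched in a few lines. This is a minimal illustration and not the implementation used by the authors or by any fault tree tool: all function names, the dictionary layout and the numeric rate are hypothetical, and the functional dependency helper handles a single level of triggers only.

```python
def spare_failure_rate(active_rate: float, dormancy: float) -> float:
    """Failure rate of a unit while it waits as a spare."""
    if not 0.0 <= dormancy <= 1.0:
        raise ValueError("dormancy factor must lie in [0, 1]")
    return dormancy * active_rate


def spare_kind(dormancy: float) -> str:
    """Name used in the text for a spare with this dormancy factor."""
    if dormancy == 0.0:
        return "cold"
    if dormancy == 1.0:
        return "hot"
    return "warm"


def current_unit(primary, spares, failed, in_use_elsewhere):
    """Unit filling the primary's role, searched left to right.
    Returns None once primary and spares are exhausted (gate output true)."""
    for unit in [primary, *spares]:
        if unit not in failed and unit not in in_use_elsewhere:
            return unit
    return None


def apply_fdep(failed, fdep):
    """Force the dependents of every failed trigger to fail as well."""
    result = set(failed)
    for trigger, dependents in fdep.items():
        if trigger in result:
            result.update(dependents)
    return result


# Pooled diesel spares shared by both electric pumps, as in the pump system.
FDEP = {"electric_power": ["EP1", "EP2"], "diesel_power": ["DP1", "DP2"]}
slot1 = current_unit("EP1", ["DP1", "DP2"], {"EP1"}, set())
slot2 = current_unit("EP2", ["DP1", "DP2"], {"EP1", "EP2"}, {slot1})
after_power_loss = apply_fdep({"diesel_power"}, FDEP)
```

Here the first electric pump failure is taken over by DP1 and the second by DP2, because DP1 is already in use elsewhere; swapping the spare order for the second gate would express a different replacement preference.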
Figure 3 Pump system fault tree structure
[Figure not reproduced. Labelled nodes: "Pump System failure" (top event); "Spare structure for Pump 1" and "Spare structure for Pump 2" (each a WSP gate); basic events Electric Pump 1, Electric Pump 2, Diesel Pump 1 and Diesel Pump 2.]
An interesting aspect of the model is the inclusion of the pump streams (by which we mean the isolation valves, the pressure relief valves, the test valves and the filter). The pump streams provide support for the pumps and need to be operational in order for the pump to be utilized. We have used two different approaches to including the pump streams, and will describe these approaches in turn. First, we can use the functional dependency gate to model the dependence of each pump on the associated pump stream. Figure 5 shows a fault tree model for a stream, and the functional dependency of the pump on the pump stream. In the full fault tree model, there are then 4 such constructs, one for each pump and stream configuration.

The advantage of this approach is that it is simple to describe and can be solved using Galileo (a software tool for dynamic fault tree analysis) [12], but the disadvantage is that a large Markov chain is needed for analysis. The Markov chain which is used to solve this system must account for not only the pumps and power supplies, but also for every valve and filter in each channel. Since the number of states in a Markov chain grows exponentially with the number of components being considered, the resulting model can be quite large. Further, since the pump streams are unlikely to fail once the pump is running, it is not necessary to model each filter and valve in such detail. It is sufficient to know whether the stream is operational on demand.

In [2], an approach is developed for separating the static analysis of the pump stream from the dynamic analysis of the pumps themselves. The probability that the stream is available on demand is determined for each stream, and these probabilities are used to determine the initial state probabilities for the Markov analysis of the pumps and power supplies.
Figure 4 Detailed Pump system fault tree
[Figure not reproduced. Labelled nodes: "Pump System failure" (top event); "Spare structure for Pump 1" and "Spare structure for Pump 2" (WSP gates); "Electric pumps need power" (FDEP gate triggered by the electric power supply); "Diesel pumps need power" (FDEP gate triggered by the diesel power supply); basic events Electric Pump 1, Electric Pump 2, Diesel Pump 1 and Diesel Pump 2.]

Figure 5 Pump Stream Fault Tree
[Figure not reproduced. Labelled nodes: "Pump stream failure causes pump failure" (FDEP gate linking "Pump stream failure" to "pump failure"); "Valve failure"; "Filter failure"; "Isolation valve fails"; "Relief or test valve fails"; basic events Isolation Valve 1, Isolation Valve 2, Pressure relief valve and Test valve.]
Analysis Process

The final fault tree for the failure of the pumping system has a top event whose cause is given by the failure of the computer system (figure 2) OR the failure of the pumping system (figures 4 and 5; the functional dependencies modelled in the figure 5 fault tree feed into the four pump failure events in figure 4). The procedure for analysis first involves identifying the modules of the tree which have dependencies and those which are independent. A bottom-up analysis scheme is then implemented where the lowest level modules are analysed using the appropriate technique (BDDs for independent sections and Markov models for dependencies) and the results fed into the analysis of the higher level sections. This is performed as an alternating sequence of BDD and Markov models until the results for the top event are obtained.

The lowest level modules in the pump system fault tree are the functional dependency gates represented in figure 5. All events contributing to the cause of the functional dependency event are independent and therefore these sections can be analysed using the BDD technique. This will produce the pump stream failure probabilities PE1, PE2, PD1, and PD2.

Proceeding up the tree structure, the next section for analysis is the Pump system section represented by the fault tree in figure 4. This has basic events with functional dependencies and will therefore require a Markov assessment. The Markov model has been listed as a table (Table 1) for clarity. The first column represents the state number; the remaining columns identify the status of each pump, coded as W for working, S for standby and F for failed. The status thus indicated is at the start of the pumping process. The failure status for a pump is therefore caused by either the pump itself failing prior to the demand whilst dormant, the associated pump stream failure (the probability determined by the first level BDD analysis) or the power source unavailability. Standby status indicates that the pump is functional but not operating when the demand occurs. It can then fail following the demand due to a dynamic failure event. The pumping process is required for 12 hours to mitigate the hazard. It is assumed that in this short period of time repair action cannot be completed and that passive components (valves, pipework etc.) cannot fail.
represented by the fault tree in figure 4. This has
STATE NO.   STREAM STATUS     INITIAL PROBABILITY
            1   2   3   4
 1          W   W   S   S     (1-PE1)(1-PE2)(1-PD1)(1-PD2)
 2          F   W   W   S     PE1(1-PE2)(1-PD1)(1-PD2)
 3          W   F   W   S     (1-PE1)PE2(1-PD1)(1-PD2)
 4          W   W   F   S     (1-PE1)(1-PE2)PD1(1-PD2)
 5          W   W   S   F     (1-PE1)(1-PE2)(1-PD1)PD2
 6          F   F   W   W     PE1 PE2(1-PD1)(1-PD2)
 7          F   W   F   W     PE1(1-PE2)PD1(1-PD2)
 8          F   W   W   F     PE1(1-PE2)(1-PD1)PD2
 9          W   F   F   W     (1-PE1)PE2 PD1(1-PD2)
10          W   F   W   F     (1-PE1)PE2(1-PD1)PD2
11          W   W   F   F     (1-PE1)(1-PE2)PD1 PD2
12          F   F   F   W     PE1 PE2 PD1(1-PD2)
13          F   F   W   F     PE1 PE2(1-PD1)PD2
14          F   W   F   F     PE1(1-PE2)PD1 PD2
15          W   F   F   F     (1-PE1)PE2 PD1 PD2
16          F   F   F   F     PE1 PE2 PD1 PD2
Table 1 Markov Diagram State List
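The sixteen rows of Table 1 follow mechanically from the four on-demand failure probabilities and the pump start order. The sketch below regenerates them; it is an illustration under the assumption that the four stream states are independent at the moment of demand, and the function name and the numeric inputs are hypothetical.

```python
from itertools import product


def initial_states(pe1, pe2, pd1, pd2):
    """Return (status, probability) pairs for all 16 initial states.
    Status is one letter per stream in start order EP1, EP2, DP1, DP2:
    F = failed on demand, W = working, S = standby."""
    q = (pe1, pe2, pd1, pd2)
    rows = []
    for failed in product((False, True), repeat=4):
        prob = 1.0
        for is_failed, qi in zip(failed, q):
            prob *= qi if is_failed else 1.0 - qi
        status, running = [], 0
        for is_failed in failed:
            if is_failed:
                status.append("F")
            elif running < 2:      # the first two available pumps are started
                status.append("W")
                running += 1
            else:                  # any further available pump waits in standby
                status.append("S")
        rows.append(("".join(status), prob))
    return rows


# Hypothetical on-demand failure probabilities, for illustration only.
table = dict(initial_states(0.1, 0.2, 0.3, 0.4))
p_failed_at_demand = sum(p for s, p in table.items() if s.count("F") >= 3)
```

The statuses are keyed by their letter pattern rather than by state number, so a row such as "FWWS" corresponds to state 2 of Table 1.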
The initial probabilities of entering each of the states in the table are determined using the results of the pump stream dormant failure fault trees. The transition rates between the states are then obtained from the failures of the pumps, listed in Table 2, and of their power supplies, listed in Table 3. The two sets of transition rates are superimposed onto the state transition diagram for analysis. System failure states are those with fewer than two functional pumps, i.e. states 12-16. Performing the analysis on the Markov diagram then yields the failure probability for the pump system. Progressing up to the top level in the fault tree structure will combine this probability with that of the computer system failure to obtain the overall system unavailability.
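As a worked form of this last step: if the computer system module and the pump system module share no basic events and can therefore be treated as independent (an assumption made here for illustration), the OR gate at the top of the tree combines the two module results as shown below. The symbols Q_C (computer system failure probability), Q_P (pump system failure probability) and Q_SYS (overall system unavailability) are introduced for this example and do not appear in the original.

```latex
Q_{SYS} = 1 - (1 - Q_C)(1 - Q_P) = Q_C + Q_P - Q_C Q_P
```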
From state  To state  Transition rate     From state  To state  Transition rate
 1           2        λe1                  7          14        λd2
 1           3        λe2                  8          13        λe2
 2           6        λe2                  8          14        λd1
 2           7        λd1                  9          12        λe1
 3           6        λe1                  9          15        λd2
 3           9        λd1                 10          13        λe1
 4           7        λe1                 10          15        λd1
 4           9        λe2                 11          14        λe1
 5           8        λe1                 11          15        λe2
 5          10        λe2                 12          16        λd2
 6          12        λd1                 13          16        λd1
 6          13        λd2                 14          16        λe2
 7          12        λe2                 15          16        λe1
Table 2 Pump failure transition rates
From state  To state  Transition rate     From state  To state  Transition rate
 1           6        λep                  1          11        λdp
 2           6        λep                  2          14        λdp
 3           6        λep                  3          15        λdp
 4          12        λep                  4          11        λdp
 5          13        λep                  5          11        λdp
 7          12        λep                  6          16        λdp
 8          13        λep                  7          14        λdp
 9          12        λep                  8          14        λdp
10          13        λep                  9          15        λdp
11          16        λep                 10          15        λdp
14          16        λep                 12          16        λdp
15          16        λep                 13          16        λdp
Table 3 Power failure transition rates
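To show how Tables 1 to 3 fit together numerically, the sketch below builds the 16-state chain from the listed transitions and computes the state probabilities after the 12 hour mission. Several things here are assumptions and not part of the paper: the numerical method (uniformization) is simply one convenient choice, all rate values are hypothetical, the chain is started in state 1 instead of with the Table 1 initial probabilities, and the diesel power transition out of state 1 is taken to lead to state 11 (both electric pumps still running, both diesel pumps lost), which is what the state definitions in Table 1 imply.

```python
import math

# (from_state, to_state, rate_name) triples transcribed from Tables 2 and 3.
PUMP = [(1, 2, "e1"), (1, 3, "e2"), (2, 6, "e2"), (2, 7, "d1"), (3, 6, "e1"),
        (3, 9, "d1"), (4, 7, "e1"), (4, 9, "e2"), (5, 8, "e1"), (5, 10, "e2"),
        (6, 12, "d1"), (6, 13, "d2"), (7, 12, "e2"), (7, 14, "d2"),
        (8, 13, "e2"), (8, 14, "d1"), (9, 12, "e1"), (9, 15, "d2"),
        (10, 13, "e1"), (10, 15, "d1"), (11, 14, "e1"), (11, 15, "e2"),
        (12, 16, "d2"), (13, 16, "d1"), (14, 16, "e2"), (15, 16, "e1")]
POWER = [(1, 6, "ep"), (2, 6, "ep"), (3, 6, "ep"), (4, 12, "ep"), (5, 13, "ep"),
         (7, 12, "ep"), (8, 13, "ep"), (9, 12, "ep"), (10, 13, "ep"),
         (11, 16, "ep"), (14, 16, "ep"), (15, 16, "ep"),
         (1, 11, "dp"), (2, 14, "dp"), (3, 15, "dp"), (4, 11, "dp"),
         (5, 11, "dp"), (6, 16, "dp"), (7, 14, "dp"), (8, 14, "dp"),
         (9, 15, "dp"), (10, 15, "dp"), (12, 16, "dp"), (13, 16, "dp")]
FAILED_STATES = range(12, 17)  # fewer than two functional pumps


def state_probabilities(p0, rates, t, tol=1e-12):
    """Transient state probabilities at time t, computed by uniformization."""
    n = 16
    q = [[0.0] * n for _ in range(n)]          # generator matrix
    for i, j, name in PUMP + POWER:
        q[i - 1][j - 1] += rates[name]
        q[i - 1][i - 1] -= rates[name]
    lam = max(-q[i][i] for i in range(n)) or 1.0   # uniformization rate
    step = [[(i == j) + q[i][j] / lam for j in range(n)] for i in range(n)]
    v = list(p0)
    weight = math.exp(-lam * t)                # Poisson weight for k = 0
    out = [weight * x for x in v]
    covered, k = weight, 0
    while 1.0 - covered > tol and k < 10_000:  # add Poisson-weighted terms
        k += 1
        v = [sum(v[i] * step[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k
        covered += weight
        out = [o + weight * x for o, x in zip(out, v)]
    return out


# Hypothetical failure rates per hour, for illustration only.
rates = {"e1": 1e-3, "e2": 1e-3, "d1": 2e-3, "d2": 2e-3, "ep": 1e-4, "dp": 1e-4}
p0 = [1.0] + [0.0] * 15                        # start in state 1
p12 = state_probabilities(p0, rates, 12.0)
pump_system_failure = sum(p12[s - 1] for s in FAILED_STATES)
```

Because no repair is possible during the mission, states 12 to 16 can only lead to one another, so the probability mass found in them at 12 hours is the probability that the pump system has failed at some point during the demand.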
Summary and Conclusions

We have described, by means of a representative example, a methodology for incorporating the analysis of various kinds of dependencies into a fault tree. The dependencies considered include functional dependencies, static (on-demand) dependencies, sharing relationships and uncovered faults. These dependencies arise naturally in mechanical, electrical and computer based systems, and their correct analysis is crucial to the accurate assessment of the system reliability.

References

1. Vesely, W.E., “A time dependent methodology for fault tree evaluation”, Nucl. Eng. Des., 13, 337-360, 1970.
2. Ridley L.M. and Andrews J.D., “Optimal design of systems with standby dependencies”, to be published in Quality and Reliability Engineering International, 15, 1999.
3. Andrews J.D. and Ridley L.M., “Analysis of systems with standby dependencies”, Proceedings of the International System Safety Conference, Seattle, Sept 1998.
4. Rohit Gulati and Joanne Bechta Dugan, “A modular approach for analyzing static and dynamic fault trees,” in Proceedings of the Reliability and Maintainability Symposium, January 1997.
5. Ragavan Manian, David Coppit, Kevin J. Sullivan and Joanne Bechta Dugan, “Bridging the gap between systems and dynamic fault tree models,” Proceedings of the 1999 Reliability and Maintainability Symposium, January 1999, pages 105-111.
6. Rauzy A., “A brief introduction to Binary Decision Diagrams”, Eur. J. Automat., 30(8), 1996.
7. Dugan J.B. and Doyle S.A., “Incorporating imperfect coverage into binary decision diagrams”, Eur. J. Automat., 30(8), 1996.
8. Sinnamon R.M. and Andrews J.D., “Quantitative fault tree analysis using binary decision diagrams”, Eur. J. Automat., 30(8), 1996.
9. Joanne Bechta Dugan, Salvatore Bavuso, and Mark Boyd, “Fault trees and Markov models for reliability analysis of fault tolerant systems,” Reliability Engineering and System Safety, 39:291-307, 1993.
10. I. Lee and R.K. Iyer, “Faults, Symptoms and Software Fault Tolerance in the Tandem GUARDIAN90 Operating System,” Proceedings of the 23rd International Symposium on Fault Tolerant Computing, June 1993.
11. Joanne Bechta Dugan, Salvatore J. Bavuso and Mark A. Boyd, “Dynamic fault tree models for fault tolerant computer systems,” IEEE Transactions on Reliability, Volume 41, Number 3, pages 363-377, September 1992.
12. Kevin J. Sullivan, David Coppit and Joanne Bechta Dugan, “The Galileo Fault Tree Analysis Tool,” Proceedings of the 1999 Fault Tolerant Computing Symposium (FTCS-29), June 1999. (Also see the web page www.cs.virginia.edu).

Biography

John D. Andrews, PhD, Department of Mathematical Sciences, Loughborough University, Loughborough, LE11 3TU, England. e-mail J.D.Andrews@lboro.ac.uk

Dr Andrews is a Senior Lecturer in the Department of Mathematical Sciences at Loughborough University. He joined this department in 1989, having previously gained nine years' industrial experience at British Gas and two years' lecturing experience at the University of Central England.

His current research interests concern the assessment of the safety and risks of potentially hazardous industrial systems. This research has been heavily supported by funding from industry. Recent grants have been secured from Mobil North Sea Ltd, Daimler Chrysler and Rolls Royce Aero Engines.

Joanne Bechta Dugan, PhD, Department of Electrical Engineering, University of Virginia, Thornton Hall, Charlottesville, VA 22903-2442 USA. e-mail jbd@Virginia.edu

Joanne Bechta Dugan is a Professor of Electrical Engineering at the University of Virginia, and was previously Associate Professor of Computer Science at Duke University and visiting Scientist at the Research Triangle Institute. She has performed and directed research on the development and application of techniques for the analysis of computer systems which are designed to tolerate hardware and software faults. Dr. Dugan is Senior Associate Editor of the IEEE Transactions on Reliability and is a Senior Member of the IEEE (Reliability and Computer Societies). She served on the National Research Council Committee on Application of Digital Instrumentation and Control Systems to Nuclear Power Plant Operations and Safety.

Acknowledgement
The authors would like to acknowledge the
financial support of NATO which has enabled
the collaboration between Loughborough
University, England and University of Virginia,
USA in developing methods to predict the
reliability of safety critical systems.