                      On the Fault Diagnosis and Failure Analysis in the
                      Satellite Attitude Control Subsystem *
                                Amitabh Barua, Purnendu Sinha, Kash Khorasani

                                  Dept. of Electrical and Computer Engineering
                                  Concordia University, Montreal, QC H3G 1M8


                                                        Abstract

A rigorous failure analysis procedure within the on-board fault detection, isolation and recovery (FDIR)
system is desirable, and indeed necessary, for unmanned space vehicles. However, such analysis cannot
easily be performed within existing diagnostic systems alone, because of their limited capability to perform
fault detection and diagnosis together. Additional diagnosis tools can be incorporated within the
diagnosis scheme of the Attitude Control Subsystem (ACS) of a spacecraft that maintains its required
attitude using reaction wheel(s), which are typically a potential source of anomalies in the ACS. We are
developing a framework that uses fault tree analysis to complement existing diagnostic approaches. Unlike
most existing algorithms, the fault tree synthesis algorithm that we propose here does not require
in-depth knowledge of the system under consideration. We anticipate that the fault detection and
diagnosis framework that we are developing may help an on-board control system, or an operator at a
ground station, to make an intelligent decision to prevent a major failure in the ACS of the satellite.


1. Introduction

For unmanned space vehicles, continuous communication with the ground station is often not possible,
and in unforeseen environments ground control could be interrupted for a long time. Moreover, the
round-trip communication delay between the ground and the orbiting spacecraft makes it more difficult
for an operator to intervene in real time and adapt the satellite to changes in the environment.
For these reasons, an on-board fault detection, diagnosis and recovery scheme is necessary for
unmanned space vehicles. A rigorous failure analysis procedure within the diagnostic system can be a
very useful feature for identifying the source of a malfunction. While such analysis cannot easily be
performed within many existing diagnostic systems alone, because of their limited capability to perform
fault detection and diagnosis together, it can be achieved by a complementary procedure that incorporates
fault tree analysis based techniques in the on-board fault diagnosis and recovery system.


Fault tree analysis is a widely used technique for finding the cause of a failure. Fault trees can be
generated either manually or automatically. Though the manual synthesis is not uncommon in practice,
for complex systems, automatic generation of trees is desired. An automatic fault tree synthesis algorithm
that does not require detailed knowledge of the design, construction and operation of the system under
consideration is desirable and feasible.

    *   This work is supported in part by a Strategic Projects Grant STPGP 258007 from
        Natural Sciences and Engineering Research Council of Canada (NSERC)

In this paper we propose a framework for system fault diagnosis using fault tree analysis for the Attitude
Control Subsystem (ACS) of a satellite that maintains its required attitude using reaction wheels. In
such satellites, disturbances and anomalies in the reaction wheel mechanisms are often the main
reasons behind failures in the ACS. We are aware of a similar problem encountered by the Canadian
Space Agency, in which the pitch momentum wheel of a satellite failed. Given this background, we plan
to use fault tree analysis to investigate the possibility that correlated faults in the ACS lie behind
such failures. A fault tree can be generated by an automatic synthesis algorithm from multiple fault
data, available either from the history of a satellite mission or from simulation. We base our work on
simulated data for both fault-free and faulty scenarios. We begin by modeling the ACS in MATLAB-Simulink,
simulate the model under both fault-free and faulty conditions to generate the data necessary for fault
tree synthesis, and subsequently construct a fault tree from the generated data set. Each decision in the
fault tree synthesis is based on a monitored signal for which thresholds or ranges are pre-defined. The
signals to be monitored are selected by evaluating the failure scenarios. The generated fault tree can be
coupled with the on-board fault diagnosis and recovery system to give that system a failure analysis
capability. In addition, if the framework is incorporated at the ground station, it may save a considerable
amount of the time an analyst spends on fault diagnosis and failure analysis, and thus help in making a
quick, intelligent decision.


Specifically, as our contributions in this paper, we:
    (a) propose a framework for on-board (or ground-station) fault diagnosis in the ACS;
    (b) present a fault tree synthesis algorithm for analyzing the cause(s) of failure; and
    (c) highlight some promising preliminary results of the proposed scheme.


This paper is organized as follows. A brief description of the Attitude Control Subsystem is presented in
Section 2. Section 3 highlights some important points about the ACS model that we developed in
MATLAB-Simulink. A review of fault trees is presented in Section 4. In Section 5, we present an algorithm
for fault tree synthesis utilizing learning techniques. The proposed framework for fault diagnosis and
failure analysis is presented in Section 6. Section 7 highlights some promising preliminary simulation
results. Finally, we conclude the paper in Section 8.


2. The ACS Overview:

The main purpose of the Attitude Control Subsystem (ACS), which is commonly regarded as a
momentum management system, is to orient the main structure of the satellite at the desired angle(s)
within the required accuracy. This ‘required accuracy’ is set by the payload mounted on the main structure.
The attitude of a spacecraft may be specified in a number of ways, such as direction cosines or Euler
angles. We use Euler angles to specify the attitude. The information required for specifying the attitude
comprises three angles: roll, pitch and yaw, which measure rotations about the x, y and z
axes respectively.

The major components of the ACS are [1]: the Attitude Control Processor (ACP); control torquers or
actuators (for example, reaction wheels (RW), momentum wheels (MW), magnetic torque bars (MTB),
etc.); attitude sensors; and the spacecraft body. The attitude sensors acquire the spacecraft’s attitude.
The errors in the angles are computed, and based on these error signals the on-board ACP generates
torque command voltages. The control actuators produce torques according to the torque demand/command
voltage inputs they receive from the ACP. In this way the required attitude is attained. Though this is an
automatic control loop, even for autonomous space vehicles the option of intervention from
ground control is usually kept open for various reasons.
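
As an illustration only, the sense-compare-command-actuate loop described above can be sketched for a single axis. The proportional-derivative (PD) control law, the gains, the inertia and the time step below are assumptions chosen for the sketch and are not taken from the ACS model of this paper; the sensor and actuator are treated as ideal.

```python
def simulate_pitch_loop(theta_ref, steps=2000, dt=0.1,
                        inertia=10.0, kp=4.0, kd=12.0):
    """Simulate an idealized single-axis attitude loop with a PD law.

    All numeric defaults are hypothetical placeholders (consistent units
    are assumed, e.g. rad, N-m, kg-m^2, s). Returns the pitch angle history.
    """
    theta, rate = 0.0, 0.0                     # pitch angle and pitch rate
    history = []
    for _ in range(steps):
        error = theta_ref - theta              # attitude error from the (ideal) sensor
        torque = kp * error - kd * rate        # torque command computed by the ACP (assumed PD law)
        accel = torque / inertia               # ideal actuator acting on a rigid body
        rate += accel * dt                     # semi-implicit Euler integration
        theta += rate * dt
        history.append(theta)
    return history


if __name__ == "__main__":
    trace = simulate_pitch_loop(theta_ref=0.05)
    print(f"final pitch angle: {trace[-1]:.6f}")
```

With these assumed gains the loop is well damped, so the pitch angle settles at the commanded value; a fault could be emulated in such a sketch by perturbing the torque term.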


3. Simulink Model of ACS:

We have developed a MATLAB-Simulink model of a generic satellite ACS. Simulink was chosen as
the modeling tool because of its wide acceptance and growing use in system modeling, analysis and
design. A detailed description of the ACS model is not presented in this paper because of limited space;
only some important points are mentioned here. We consider a ‘zero-momentum’ system. The model has
been developed to diagnose the ACS along a single axis, namely the pitch axis. The actuator (reaction
wheel) block in the control loop of our ACS model is primarily based on the reaction wheel model
presented in [2]. We have extended and modified this model to include fault injection capabilities. At this
stage, we also assume ideal dynamics for the attitude sensors, i.e., signals from sun sensors, horizon
scanners, magnetometers, etc. are fed back to the ACP without any error or time delay. As part of our
ongoing research, we are looking into incorporating the sensor dynamics in our ACS model and
extending the problem to accommodate control along all three axes.


4. A Review on Fault Trees:


The concept of the fault tree evolved primarily within the U.S. aerospace and nuclear industries. The fault
tree was first developed in 1961 at Bell Telephone Laboratories by H.A. Watson, for evaluating the
Minuteman Launch Control System with respect to an inadvertent missile launch. Fault trees have been
used extensively in system safety analysis for the last 40 years.


The fault tree for any failure analysis has the basic structure shown in Figure-1. The top event in a fault
tree is the failure that is to be analyzed. The basic events are the occurrences beyond which there is
no further interest for analysis. The basic events (often called leaves) are connected to the top event
through intermediate events (often called nodes), which show how the fault(s) propagated through the
system and led to the failure. The top event in the fault tree has to be foreseen by the analyst. The event(s)
at one level of a fault tree are connected to the event(s) at the next level through logic gates,
which are not shown in Figure-1.




                                      Figure-1: Basic Fault Tree Structure
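
The structure described above (leaves, intermediate nodes, a top event, and logic gates between levels) maps naturally onto a small recursive data structure. The following is a minimal sketch; the class names and the event labels in the example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Event:
    """A basic event (leaf): it occurs if it is marked True in the observations."""
    name: str

    def occurs(self, observed: Dict[str, bool]) -> bool:
        return observed.get(self.name, False)


@dataclass
class Gate:
    """A top or intermediate event whose inputs are combined by an AND or OR gate."""
    name: str
    kind: str                                      # "AND" or "OR"
    inputs: List[Union["Event", "Gate"]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.kind not in ("AND", "OR"):
            raise ValueError(f"unsupported gate type: {self.kind}")

    def occurs(self, observed: Dict[str, bool]) -> bool:
        results = [child.occurs(observed) for child in self.inputs]
        return all(results) if self.kind == "AND" else any(results)


# Hypothetical tree: the top event needs leaf A and at least one of leaves B, C.
top = Gate("top event", "AND", [
    Event("A"),
    Gate("intermediate event", "OR", [Event("B"), Event("C")]),
])

if __name__ == "__main__":
    print(top.occurs({"A": True, "C": True}))      # both AND inputs are satisfied
    print(top.occurs({"A": True}))                 # the OR branch is not satisfied
```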

The overall fault tree problem can be divided into two parts: fault tree synthesis (FTS) and fault tree
analysis (FTA). FTS and FTA can be used in two ways: as a design tool to avoid costly design changes,
or as a diagnostic aid for diagnosing faults in the system. In our study, we use the fault tree as a
diagnostic aid for fault diagnosis and health monitoring in the ACS of a satellite.


We point out that a fault tree is not a complete representation of all possible faults and failures in a
system. It can usually represent only those combinations of events leading to a failure that have been
foreseen by the analyst. It should also be noted that FTA is primarily a means of analyzing the cause of a
failure; the top event in a fault tree must therefore be detected by some other mechanism. Hence, the
existence of an efficient fault detection mechanism is assumed here.


Fault tree generation, or synthesis, is considered to be a relatively difficult problem, mainly because
a very good understanding of the system is often required for FTS. Once a tree is generated,
both qualitative and quantitative analysis can be performed to determine the root cause of a failure.
Though manual synthesis of fault trees is still common in industry, it is often not feasible for large and
complex systems. Therefore, there is a need for automatic fault tree synthesis algorithms.

The Fault Tree Handbook [3] by the Office of Nuclear Regulatory Research provides basic guidelines for
FTS. Common fault tree construction approaches include component transfer function based fault tree
synthesis, fault tree synthesis based on discrete event systems specification simulation, and generation
of fault trees by tree induction algorithms from databases of examples comprising data recordings.

A formal methodology for FTS appears in [4]; it utilizes a set of component transfer functions, which
represent the components and their various failure modes. Taylor [5] suggested FTS using mini fault
tree models of the components. The main advantage of a mini fault tree model is that it can be reused
in different system failure analyses. An automated fault tree generation method using
symbolic discrete event systems specification simulation is found in [6]; it provides powerful
capabilities for specifying coupling relations of component modes and for representing timing-related
faults and their effects. Madden and Nolan [7] have presented an alternative way to induce fault trees
from example cases. The advantage of their approach is that detailed process knowledge is not required.
In [8], the authors have presented a fault-diagnosis engine that utilizes the algorithm for induction of
fault trees introduced in [7]. The diagnostic procedure learns from a database of examples comprising data
or sensor values that have been categorized as normal or abnormal behavior depending on the
absence or presence of one or more faults in the system.


5. Proposed FTS Algorithm:

The proposed algorithm for FTS is similar, to some extent, to the algorithm presented in [7]. Our algorithm
also generates fault trees by induction from vectors of numeric feature values of the attributes. However,
there are some important differences between the two algorithms. First, instead of using a single
threshold point for a numeric feature value, we use more than one range of values for the features
of some of the attributes. The advantage is that a range is easier to determine than
an exact threshold point for a feature value.


Secondly, the algorithm in [7], on selecting a feature value Vx of an attribute Ax with threshold Tx,
considers both cases, Vx<Tx and Vx ≥Tx. This results in a generalized tree for a particular top
event. This type of tree is useful for detecting and classifying faults. Our objective here, however,
is to find the reason behind a failure immediately after an anomaly is detected in the system.
Note that one can also evaluate all possible combinations of events behind a failure, which may provide
useful information for future designs so as to avoid other unforeseen combinations of events that could
lead to the failure scenario. At present, however, we are interested in identifying the exact combination
of events behind a failure. To achieve this, we define distinct (non-overlapping) ranges for
numeric feature values wherever necessary. Upon selection of the numeric feature value for
a particular attribute, the proposed algorithm considers only the subset of examples for which the
existing feature values (discussed in the next section) fall in the same range as the current feature value.
Finally, we use an ‘Undeveloped Leaf’ in the tree to indicate that the tree could not be
developed beyond that point because of a lack of information (examples).
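
The use of pre-defined, non-overlapping ranges instead of a single threshold can be sketched as follows. The half-open interval convention and the range boundaries in the example are assumptions made for illustration; they are not values used in our study.

```python
from typing import Dict, Optional, Tuple

Range = Tuple[float, float]          # half-open interval [low, high), an assumed convention


def validate_disjoint(ranges: Dict[int, Range]) -> None:
    """Raise ValueError if any two of the pre-defined ranges overlap."""
    ordered = sorted(ranges.values())
    for (_, prev_high), (next_low, _) in zip(ordered, ordered[1:]):
        if next_low < prev_high:
            raise ValueError("ranges overlap")


def classify(value: float, ranges: Dict[int, Range]) -> Optional[int]:
    """Return the label of the range containing value, or None if no range does."""
    for label, (low, high) in ranges.items():
        if low <= value < high:
            return label
    return None


if __name__ == "__main__":
    # Hypothetical ranges for one feature, e.g. a peak current in amperes.
    current_ranges = {1: (0.0, 1.0), 2: (1.0, 2.0), 3: (2.0, 5.0)}
    validate_disjoint(current_ranges)
    print(classify(1.2, current_ranges))
```

Each feature value in an example would be tagged with the label returned by `classify` before the example is passed to the synthesis step.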


 Proposed FTS Algorithm:
 Input:
 A set of examples, each consisting of a vector of numeric feature values for the attributes. Each
 example is tagged with a TRUE or FALSE flag to indicate whether it represents a fault
 condition or a fault-free condition. Also, in each example, each feature value is tagged with an
 identifier for the class (range) to which it belongs.
 Algorithm:
 A.    Form a TOP NODE with the undesired event.
 B.    1.     Select an attribute Ax. Read its current feature value. Determine the pre-defined range i to
              which the current feature value belongs.
 B.    2.     a.    (Only if executing step-B for the first time):
                    Connect an AND gate with the Top Node and place a node (FAx Є i) at one input
                    branch of the AND gate.
              b.    (Only if back from B.3a):
                    Place (FBx Є i) at one input of the AND gate.
              c.    (Only if back from B.3c):
                    Replace (FBx Є j) by an AND gate and place (FBx Є j) at one input of the AND gate.
 B.    3.     To develop the other branch of AND gate, consider all the examples in the example-set
              where, FAx Є i.
              a.   If all examples are FALSE, remove the (FAx Є i) node. Select the next attribute Bx by
                   going back to step-B.1 and executing the algorithm recursively.
                   If no more attributes are available, connect an ‘Undeveloped Leaf’ to the
                   remaining input of the AND gate and STOP.
              b.   If all examples are TRUE, then STOP.
              c.   Otherwise, to develop the other branch of the AND gate, select the next attribute Bx
                   and place a node (FBx Є i), (where i is the range for FBx in the current example), at
                   the other input of the AND gate. Repeat step-B.2 onwards recursively for Bx.
                   If no more attributes are available, connect an ‘Undeveloped Leaf’ to the remaining
                   input of the AND gate and STOP.
 C.    Remove all single-input AND gates.
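
The steps above can be sketched in code under some simplifying assumptions. This is one possible reading of the algorithm, not a definitive implementation: the nested two-input AND gates are flattened into a single list of conditions that are ANDed under the top node (which also covers step C), the example set is taken to hold only previously recorded examples, and an attribute with no faulty example in the matching range is skipped as in step B.3a. The function name, the attribute names and the data are hypothetical.

```python
from typing import Dict, List, Tuple

# (attribute name -> range label, fault flag); the flag is True for a faulty example
Example = Tuple[Dict[str, int], bool]


def synthesize_and_chain(current: Dict[str, int],
                         examples: List[Example],
                         attribute_order: List[str]) -> List[str]:
    """Return the conditions that are ANDed together under the top node."""
    nodes: List[str] = []
    subset = examples
    for attr in attribute_order:                       # step B.1: select an attribute
        label = current[attr]                          # range of the current feature value
        matching = [ex for ex in subset if ex[0][attr] == label]
        n_faulty = sum(1 for _, faulty in matching if faulty)
        if n_faulty == 0:
            continue                                   # step B.3a: no faulty example, drop the node
        nodes.append(f"F[{attr}] in range {label}")    # step B.2: place the node
        if n_faulty == len(matching):
            return nodes                               # step B.3b: all examples TRUE, stop
        subset = matching                              # step B.3c: narrow and continue
    nodes.append("Undeveloped Leaf")                   # attributes exhausted
    return nodes


if __name__ == "__main__":
    history = [
        ({"current": 3, "speed": 2}, True),
        ({"current": 3, "speed": 1}, False),
        ({"current": 1, "speed": 2}, False),
    ]
    observed = {"current": 3, "speed": 2}
    print(synthesize_and_chain(observed, history, ["current", "speed"]))
```

In this toy data set, range 3 of `current` alone does not separate faulty from fault-free examples, so the sketch adds the `speed` condition before stopping.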


Sometimes it may be difficult to specify distinct, non-overlapping ranges for a particular feature,
so overlapping ranges may have to be specified. In that case, a feature value may
belong to more than one class (range). The proposed algorithm can easily be modified to handle this:
whenever a feature value belongs to more than one class, an OR gate is placed in the tree with all
those classes at its inputs, and the remaining part of the algorithm is executed for each OR gate input,
i.e., for each range of the feature. Clearly, the more such OR gates the fault tree contains,
the more generic the resulting tree will be.


The sequence in which the attributes are selected is important because some attributes divide the data
more efficiently than others. The attribute selection is based on probability analysis and on the knowledge
available about the system under consideration. It should be mentioned that once the ACS model of a
satellite is available, detailed knowledge of the ACS construction and operation is not necessary for fault
tree construction. A reasonable knowledge of the ACS, together with some intelligent observations of
various ACS parameters over their different ranges, is sufficient.


6. Proposed Framework for Fault Diagnosis in ACS:


As mentioned earlier, we assume the existence of an efficient error detection mechanism. Figure-2
illustrates the proposed framework: upon detection of any error, and hence a fault, in the ACS, the
diagnosis system starts monitoring the pre-defined attributes (signals) over time frames of X sec with Y
sec overlap. The values of X and Y can be selected according to the diagnosis requirement. After
acquiring data for a particular attribute over a frame, the next step is to extract features from the data.
Subsequently, a vector of numeric feature values for the attributes is generated. We call each value in
this vector a ‘current feature value’.
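
The framing of a monitored signal into frames of X sec with Y sec overlap can be sketched as below. The function name is hypothetical, frame length and overlap are expressed in samples, and a trailing partial frame is simply discarded, which is an assumption of this sketch.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float],
                      frame_len: int,
                      overlap: int) -> List[Sequence[float]]:
    """Split samples into frames of frame_len samples sharing overlap samples."""
    if not 0 <= overlap < frame_len:
        raise ValueError("overlap must satisfy 0 <= overlap < frame_len")
    step = frame_len - overlap                 # advance between consecutive frames
    frames = []
    start = 0
    while start + frame_len <= len(samples):   # incomplete trailing frame is dropped
        frames.append(samples[start:start + frame_len])
        start += step
    return frames


if __name__ == "__main__":
    signal = list(range(10))
    for frame in split_into_frames(signal, frame_len=4, overlap=2):
        print(frame)
```

With a sampling step of 1 sec, X and Y translate directly into `frame_len` and `overlap`.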




                                     Figure-2: Proposed Framework for Diagnosis


Feature extraction is a difficult step and can be done in several ways. The most commonly used
techniques include (a) spectrum analysis by N-point DFT (Discrete Fourier Transform), (b) standard
control system specifications such as natural frequency, peak value, percentage overshoot and settling
time, and (c) curve fitting techniques. At this stage, we follow the second approach for feature
extraction from the signals.
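
As a rough illustration of the second approach, the sketch below extracts three control-system features from one frame of a sampled response. The function name, the 2% settling band and the requirement that the final (steady-state) value be known and non-zero are assumptions of this sketch, not details of our implementation.

```python
from typing import Dict, Sequence


def extract_features(signal: Sequence[float],
                     final_value: float,
                     dt: float = 1.0,
                     band_fraction: float = 0.02) -> Dict[str, float]:
    """Compute peak value, percentage overshoot and settling time of a response."""
    if not signal:
        raise ValueError("signal must not be empty")
    if final_value == 0:
        raise ValueError("final_value must be non-zero for a percentage overshoot")
    peak = max(signal)
    overshoot = max(0.0, (peak - final_value) / abs(final_value) * 100.0)
    band = band_fraction * abs(final_value)
    settling_time = 0.0
    # Search backwards for the last sample outside the settling band.
    for k in range(len(signal) - 1, -1, -1):
        if abs(signal[k] - final_value) > band:
            settling_time = (k + 1) * dt
            break
    return {
        "peak_value": peak,
        "percent_overshoot": overshoot,
        "settling_time": settling_time,
    }


if __name__ == "__main__":
    response = [0.0, 0.6, 1.2, 1.05, 0.99, 1.0, 1.0]
    print(extract_features(response, final_value=1.0))
```

Each returned value would then be mapped to its pre-defined range to form one entry of the example vector.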


Once an example vector has been created by feature extraction from the attributes, the next task is fault
tree synthesis. For this purpose, the current example is added to the existing example set, which
consists of vectors of numeric feature values for all attributes. This example set is
formed through simulation of the ACS (under fault-free conditions as well as in the presence of faults)
and/or from data collected during the mission. The total example set (current plus previously existing) is
the input to the FTS tool, which generates the tree using the algorithm proposed in the previous
section. The constructed tree can then be used for fault diagnosis. If the generated tree does not
match any of the existing trees in the database, it is added to the library. The above computations
can be performed in real time or near-real time, on-board and/or on the ground, for fault diagnosis and
failure analysis in the ACS.


7. Simulation Results:

In order to generate data for faulty scenarios, faults were injected into the ACS model. We have considered
failure scenarios such as an increase in reaction wheel bearing friction due to improper lubricant flow,
random error in the motor driver unit output, and bus voltage failure, to mention a few.


The pitch control loop was implemented so that the torque command voltage was in the ±5 V range and
the RW angular momentum was approximately ±4 N-m-s at 5100 RPM. The torque generated by the
wheel was approximately within the ±27 mN-m range. We have chosen a time frame of 2500 sec with 1500
sec overlap. We present here the results for one failure. Figure-3a and 3b show the pitch error (PE)
responses for two different faults that led to the failure ‘Pitch Error > 0.03 degrees’ (PE>0.03) in the
ACS over a 2500 sec time frame. Fault-1 was due to an increase in friction in the reaction wheel bearing,
and fault-2 was due to faulty output of the reaction wheel motor driver unit. In Figure-3a, system
behavior under fault-free conditions can be observed between t=1500 and t=2000 and also between
t=3000 and t=4000. Figure-3c shows the fault tree that would be generated by the proposed algorithm
for fault-1. The fault tree shows the combination of events leading to the pitch angle failure. By an
event ‘F8 Є 3’ in the tree we mean that the feature value of attribute (signal) 8, which may be a current,
voltage, speed, etc., is in a pre-defined range ‘3’. A pre-defined range i for a feature value FAx of a
particular attribute Ax is a range specified by the analyst on foreseeing that, in the presence of one or
more fault(s) in the system, a failure may take place if FAx belongs to i while some other feature values
lie in particular ranges. In the case of ‘F8 Є 3’, if the feature value of
signal-8 belongs to range-3, together with some other feature values being in particular ranges (as
shown by the other nodes of the tree in Figure-3c), the resulting combination is a potential cause of a
failure. Where the feature extraction function is simply the ‘min’ and/or ‘max’ value over the time
interval for which the fault tree analysis has been performed, the resulting tree gives
information on the failure of the form: “The top event took place when the peak value of motor current was
within the range (x1, y1) ampere AND maximum speed of the reaction wheel was within the range (x2, y2)
RPM AND minimum bus voltage was within the range (x3, y3) …” and so on. Generating this type of
information in real or near-real time will save a considerable amount of the time spent on fault diagnosis.
Moreover, the combinations of events obtained from generated fault trees can be avoided in future system
designs and modifications to make the system more reliable.


Figure-3: (a): Pitch Error (PE) versus time during fault-1, (b): Pitch Error (PE) versus time during fault-2, (c): Resulting fault tree for fault-1
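
The textual failure description discussed in this section can be assembled mechanically from the conditions along a branch of the tree. The sketch below is illustrative only; the function name, the tuple layout and the numeric ranges are hypothetical and do not correspond to the tree in Figure-3c.

```python
from typing import List, Tuple

# (description of the feature, lower bound, upper bound, unit)
Condition = Tuple[str, float, float, str]


def explain_failure(top_event: str, conditions: List[Condition]) -> str:
    """Render an AND-combination of range conditions as one sentence."""
    parts = [
        f"{name} was within the range ({low}, {high}) {unit}"
        for name, low, high, unit in conditions
    ]
    return f"The top event '{top_event}' took place when " + " AND ".join(parts) + "."


if __name__ == "__main__":
    branch = [
        ("the peak value of motor current", 0.5, 0.9, "ampere"),
        ("the maximum speed of the reaction wheel", 4000, 5100, "RPM"),
    ]
    print(explain_failure("PE>0.03", branch))
```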


We have shown results for the case where the initial speed of the RW is zero and the ACS maintains the
satellite’s required attitude within an accuracy of 0.02 degrees. Clearly, our method can be applied to
other initial conditions, if necessary, by determining different distinct ranges for the feature values at
those conditions. For the failure whose resulting tree we have shown, we could specify distinct ranges for
the features. The fault tree could have been more complex and generic had we used overlapping ranges for
the feature values. As we have worked with a simulated model and data generated from it, we have tried
to resemble real-life spacecraft and missions. We have worked only with
parameters that are measured and available through telemetry in spacecraft operations [1]. The
values of the different ACS parameters that we have used were inferred from sample mission
data collected from already launched spacecraft. The sampling step size in the pitch control loop has been
chosen as 1 sec, which is common in practice.


8. Conclusion:

In this paper we have proposed a new fault tree synthesis algorithm that utilizes learning techniques.
Given the system parameters under faulty and fault-free conditions, the algorithm can be implemented
without detailed knowledge of the design, construction and operation of the system. Further,
we have shown that the proposed algorithm can successfully determine the combination of events
leading to a failure. We have used FTS and FTA as a diagnostic aid for fault diagnosis in the Attitude
Control Subsystem of a satellite, and have shown that the proposed framework, based on fault tree
analysis and synthesis techniques, has potential for automated spacecraft health monitoring and
diagnosis applications.


References:


[1] W. J. Larson, J. R. Wertz (Eds.) ‘Space Mission Analysis and Design’. Second Edition, Kluwer
   Academic Publishers, 1999
[2] B. Bialke ‘High Fidelity Mathematical Modeling of Reaction Wheel Performance’. Advances in the
   Astronautical Sciences, Vol. 98. pp. 483 –496, 1998
[3] W.E. Vesely, F.F. Goldberg, N.H. Roberts, D.F. Haasl ‘The Fault Tree Handbook’, Technical Report
   USNRC NUREG 0492, United States Nuclear Regulatory Commission, January 1981
[4] J.B. Fussell, G.J. Powers, R.G. Bennetts ‘Fault Trees – A State of the Art Discussion’. IEEE Transactions
   on Reliability, Vol. R-23, No. 1, April 1974
[5] J. R. Taylor ‘An Algorithm for Fault Tree Construction’. IEEE Transactions on Reliability, Vol. R-31,
   No. 2, pp. 137- 146, June 1982
[6] S. Chi, S. Lee, S. Park ‘Automated Generation of Fault Tree Using the Symbolic DEVS Simulation’.
   Proc. of AIS, 1993
[7] M.G. Madden, P.J. Nolan ‘Generation of Fault Trees from Simulated Incipient Fault Case Data’. Proc.
   9th Intl. Conf. on Applications of AI in Engg, 1994. Pennsylvania, USA. Pages: 567-574
[8] M.G. Madden, P.J. Nolan ‘Monitoring and Diagnosis of Multiple Incipient Faults Using Fault Tree
   Induction’. IEE Proc. on Control Theory and Applications, Vol. 146, Number 2. March 1999