THE UNIVERSITY OF YORK

					    Safety Tactics for Reconfigurable Process Control Devices


                                    Adrian E Hill




            Report on a project submitted in part for the degree of
                MSc in Safety Critical Systems Engineering
                in the Department of Computer Science at

              THE UNIVERSITY OF YORK

                                4th September 2008




Number of words = 43655, as counted by MS Word word count command. This includes all the
body of the report but does not include the Reference Section or Appendices A and B.
Abstract

        Control systems are increasingly subject to reconfiguration either because of a failure of
        a component or a change in the process under control. System reconfiguration typically
        takes place by shutting down either the whole or part of an operational system and then
        going through an installation and commissioning programme before bringing the system
        back on line. This can be expensive, disruptive and inefficient, but with the introduction
        of intelligent field devices such as sensors and actuators, system reconfiguration can
        now be managed more efficiently through commercially available asset management
        software applications without the need to shut down the system.
        The advantages that the use of intelligent field devices brings to the process control
        industry are very appealing to the UK Naval Submarine Programme where operations in
        harsh environments mean that opportunities to shut down critical processes are limited.
        With the introduction of smart hardware into future Naval Submarine platforms the ability
        to reconfigure networked field devices in order to maintain levels of service without
        disrupting the operation of the submarine would be beneficial. However, as operation of
        a submarine is a safety critical activity, the question arises,
        “How can the benefits of systems that use intelligent reconfigurable hardware be
        realised without compromising safety?”
        This project report has sought to answer this question through the development and
        evaluation of a solution based on the use of safety tactics. Specific safety tactics have
        been developed and applied to a hypothetical submarine control system that implements
        intelligent plug and play functionality in a safety critical operation. The application of a
        tactical approach has been shown to reduce the hazards posed by intelligent plug and
        play functionality.




Statement of Ethics
        I confirm that the basic ethical principles of “do no harm”, “informed consent” and
        “confidentiality of data” have been considered while undertaking this project. This
        project is an academic work; it has not involved experimentation, and no user testing
        or questionnaires have been used.




Glossary of Terms and Abbreviations
Abbreviations
      ADC       Analogue to Digital Conversion
      ALARP     As Low As Reasonably Practicable
      ASAAC     Allied Standards Avionics Architecture Council
      ASICs     Application Specific Integrated Circuits
      CCA       Component Criticality Analysis
      CHM       Hardware Component Monitoring
      CPLD      Complex Programmable Logic Device
      CNNRP     Chairman Naval Nuclear Regulatory Panel
      COTS      Commercial Off The Shelf
      CRC       Cyclic Redundancy Check
      DAC       Digital to Analogue Conversion
      DCS       Distributed Control System
      DDL       Device Description Language
      DIA       Distributed Intelligent Agent
      DS        Defence Standard
      DTM       Device Type Manager
      EDDL      Electronic Device Description Language
      E/E/PES   Electrical/Electronic/Programmable Electronic Systems
      FDT       Field Device Tool
      FAST      Failure Avoidance Safety Tactic
      FCST      Failure Containment Safety Tactic
      FDST      Failure Detection Safety Tactic
      FFA       Functional Failure Analysis
      FMEA      Failure Modes and Effects Analysis
      FPGA      Field Programmable Gate Array
      FSK       Frequency Shift Keying
      FTA       Fault Tree Analysis
      GHM       Global level Health Monitoring
      GSN       Goal Structuring Notation
      HAZOP     Hazard and Operability
      HART      Highway Addressable Remote Transducer


      HCF       HART Communications Foundation
      HCI       Human Computer Interface
      HDAL      Hardware Design Assurance Level
      HMI       Human Machine Interface
      HSE       High Speed Ethernet
      IMA       Integrated Modular Avionics
      IMS       Integrated Modular System
      iPMS      Integrated Platform Management System
      IVHM      Integrated Vehicle Health Management
      JSP       Joint Services Publication
      LRU       Lowest Replaceable Unit
      MHM       Module /application level Health Monitoring
      MOD       Ministry of Defence
      NCAP      Network Capable Application Processor
      NII       Nuclear Installations Inspectorate
      NIST      National Institute of Standards and Technology
      PHM       Partition /application level Health Monitoring
      PID       Proportional Integral Derivative (controller)
      PLC       Programmable Logic Controller
      PLD       Programmable Logic Device
      QoS       Quality of Service
      RTCA      Radio Technical Commission for Aeronautics
      RTBP      Real Time Blueprint
      SAL       Safety Assurance Level
      SHARD     Software Hazard Analysis and Resolution in Design
      SIL       Safety Integrity Level
      SIS       Safety Instrumented System
      STIM      Smart Transducer Interface Module
      TEDS      Transducer Electronic Data Sheet
      XDCR      Transducer




Glossary of Terms
        The following list provides a project specific definition of a number of common technical
        terms that are used throughout this project report.
         Asset Management        Application software that manages the overall data collected
                                 from distributed intelligent networks and field devices. Asset
                                 management allows devices to be installed, configured and
                                 maintained through the life of the process plant.
                                 Manufacturers design field devices to support either a single
                                 proprietary asset management system or open systems such
                                 as FDT/DTM.
         Health Management       An approach for monitoring the operational health of a device
                                 or system.     Health management can include predictive
                                 analysis to enable the system to predict device failures before
                                 they occur.
         Intelligent Hardware    Intelligent hardware (device) describes a device that provides
                                 built in functions that support diagnostic, health and asset
                                 management as part of a distributed control system.
         iPMS                    The integrated platform management system that provides
                                 the means by which platform control is maintained from the
                                 HMI through to plant machinery.
         OPC                     OPC is a system of open standards developed and supported
                                 by the OPC Foundation for the automation industry. OPC
                                 enables devices to communicate and enables access to
                                 system wide data.
         Smart Instrument        In its broadest sense a smart instrument is a field device that
                                 includes some software (firmware) as indicated by the use of
                                 a microprocessor or other similar complex logic technology
                                 (FPGAs, CPLDs, PLDs, ASICs) and is capable of
                                 communicating on a digital network.




Acknowledgments


        I would like to thank all those who have supported my work at The University of York,
        which has culminated in the completion of this project report. Special thanks go to Dr Mark
        Nicholson, who has supervised this project, and to BAE Systems Submarine Solutions, who
        have sponsored and supported me throughout.
        An extra special thank you must go to my wife, Stella, and daughter, Esther, who have
        patiently suffered my further educational exploits.




Table of Contents
ABSTRACT ................................................................................................................................................................ 2
STATEMENT OF ETHICS....................................................................................................................................... 2
GLOSSARY OF TERMS AND ABBREVIATIONS............................................................................................... 3
ACKNOWLEDGMENTS.......................................................................................................................................... 6
TABLE OF CONTENTS ........................................................................................................................................... 7
LIST OF FIGURES.................................................................................................................................................... 9
LIST OF TABLES...................................................................................................................................................... 9
1       INTRODUCTION ............................................................................................................................................ 10
2    LITERATURE SURVEY................................................................................................................................ 12
  2.1 INTELLIGENT HARDWARE ............................................................................................................................. 12
  2.2 PROCESS INDUSTRY ...................................................................................................................................... 14
     2.2.1 Process Control .................................................................................................................................. 14
     2.2.2 Current Advances in Process Control ................................................................................................ 15
     2.2.3 Health Monitoring and Asset Management ........................................................................................ 16
     2.2.4 Networks of Intelligent Hardware ...................................................................................................... 17
     2.2.5 Distributed Intelligent Agents ............................................................................................................. 24
  2.3 NUCLEAR INDUSTRY ..................................................................................................................................... 25
     2.3.1 Safety Concerns .................................................................................................................................. 25
     2.3.2 Approach to Intelligent Hardware...................................................................................................... 26
  2.4 AEROSPACE INDUSTRY ................................................................................................................................. 27
     2.4.1 Integrated Modular Avionics (IMA) ................................................................................................... 27
     2.4.2 IMA Reconfiguration and Certification Issues ................................................................................... 28
     2.4.3 Blueprints in a wider context .............................................................................................................. 30
     2.4.4 IMA Certification Issues ..................................................................................................................... 30
     2.4.5 Dynamic Reconfiguration of IMA - Certification Issues..................................................................... 31
  2.5 REGULATORY ISSUES .................................................................................................................................... 31
     2.5.1 General ............................................................................................................................................... 31
     2.5.2 Application to Military Systems .......................................................................................................... 32
     2.5.3 Certification and Standards Compliance Issues ................................................................................. 33
  2.6 COTS ACQUISITION AND SAFETY ISSUES ..................................................................................................... 33
     2.6.1 Justifying the use of COTS Components ............................................................................................. 34
  2.7 SAFETY TACTICS ........................................................................................................................................... 35
  2.8 LITERATURE SURVEY CONCLUSIONS ............................................................................................................ 36
3    SYSTEM RECONFIGURATION – A TACTICAL APPROACH TO SAFETY ASSURANCE ............ 38
  3.1 PROCESS CONTROL IN CONTEXT ................................................................................................................... 38
  3.2 A GENERIC ARCHITECTURE FOR PROCESS AUTOMATION AND RECONFIGURATION ...................................... 39
     3.2.1 Introduction to a Generic Architecture for Process Automation and Reconfiguration ...................... 39
     3.2.2 A Proposed Process Automation Architecture.................................................................................... 40
     3.2.3 Benefits of a layered architecture ....................................................................................................... 42
  3.3 GENERAL SAFETY ISSUES WITH SYSTEM RECONFIGURATION ........................................................................ 43
     3.3.1 Scope of Issues.................................................................................................................................... 43
     3.3.2 Plug and Play at the Field Devices layer ........................................................................................... 44
     3.3.3 Device Reconfiguration at the Field Devices layer ............................................................................ 45
     3.3.4 System Reconfiguration at the Application Layer............................................................................... 46
  3.4 A TACTICAL APPROACH TO SAFETY ASSURANCE ......................................................................................... 47
     3.4.1 Failure Avoidance .............................................................................................................................. 48
     3.4.2 Acquisition Process............................................................................................................................. 48
     3.4.3 COTS Acquisition and Safety Contracts ............................................................................................. 52
     3.4.4 Failure Detection................................................................................................................................ 56
     3.4.5 Failure Containment........................................................................................................................... 59
  3.5 SAFETY JUSTIFICATION ................................................................................................................................. 62
  3.6 INSTANTIATION OF THE TACTICAL APPROACH .............................................................................................. 62
  3.7 CONCLUSION TO PROPOSAL PHASE ............................................................................................................... 63
     3.7.1 Overview of Safety Tactics.................................................................................................................. 64

4    CASE STUDY – PLUG AND PLAY IN A SUBMARINE INTEGRATED PLATFORM
     MANAGEMENT SYSTEM ............................................................................................................................ 65
  4.1 INTRODUCTION TO CASE STUDY ................................................................................................................... 65
  4.2 SUBMARINE HOVER CONTROL ...................................................................................................................... 66
     4.2.1   Overview of Hover System.................................................................................................................. 66
     4.2.2   Hover Control..................................................................................................................................... 67
  4.3 SUBMARINE HAZARDS .................................................................................................................................. 68
     4.3.1   Flowmeter Failure Analysis................................................................................................................ 69
  4.4 FLOWMETER REQUIREMENTS ........................................................................................................................ 72
  4.5 SELECTION AND APPLICATION OF SAFETY TACTICS...................................................................................... 72
     4.5.1   High Level Requirements.................................................................................................................... 73
     4.5.2   Architecture Level Constraints ........................................................................................................... 73
     4.5.3   Behavioural Level Constraints ........................................................................................................... 73
     4.5.4   Quantifiable Level Constraints ........................................................................................................... 74
     4.5.5   Documenting of Tactics ...................................................................................................................... 74
     4.5.6   Application of Safety Tactics and Safety Contracts ............................................................................ 74
  4.6 FAILURE AVOIDANCE TACTICS ..................................................................................................................... 74
     4.6.1   Reference COTS component ............................................................................................................... 74
     4.6.2   Evaluation and Selection Criteria ...................................................................................................... 75
     4.6.3   High level Requirements..................................................................................................................... 77
  4.7 FAILURE DETECTION TACTICS ...................................................................................................................... 78
     4.7.1   Communication Failure Detection Tactics ......................................................................................... 79
     4.7.2   Initialisation Failure Detection Tactics.............................................................................................. 80
     4.7.3   Diagnostic Failure Detection Tactics ................................................................................................. 83
  4.8 FAILURE CONTAINMENT TACTICS ................................................................................................................. 84
     4.8.1   General Failure Containment Tactic.................................................................................................. 84
     4.8.2   Communication Failure Containment Tactics .................................................................................... 85
     4.8.3   Initialisation Failure Containment Tactics......................................................................................... 87
     4.8.4   Diagnostics Failure Containment Tactics .......................................................................................... 88
  4.9 SUMMARY OF CASE STUDY SAFETY TACTICS ............................................................................................... 89
  4.10     SAFETY JUSTIFICATION ............................................................................................................................ 90
     4.10.1     Overview of Safety Justification..................................................................................................... 90
     4.10.2     Safety Justification Issues .............................................................................................................. 92
  4.11     CONCLUSION TO CASE STUDY .................................................................................................................. 92
5    PROJECT CONCLUSION............................................................................................................................ 93
  5.1 PROJECT SUMMARY ...................................................................................................................................... 93
  5.2 OVERALL CONCLUSION ................................................................................................................................ 93
6    FUTURE WORK............................................................................................................................................. 95
  6.1 FURTHER DEVELOPMENT .............................................................................................................................. 95
     6.1.1   Development of Patterns..................................................................................................................... 95
     6.1.2   Safety Tactics for Increasing Complexity ........................................................................................... 95
  6.2 EMERGENT WORK ......................................................................................................................................... 96
     6.2.1   Dynamic Reconfiguration at the Field Device Layer.......................................................................... 96
     6.2.2   Safety Analysis of Device Description Languages.............................................................................. 96
     6.2.3   Autonomous Devices........................................................................................................................... 96
     6.2.4   Wireless Devices ................................................................................................................................. 96
     6.2.5   Safety Certification ............................................................................................................................. 97
7    REFERENCES ............................................................................................................................................... 98
APPENDIX A - PLUG AND PLAY FAILURE MODES ................................................................................... 101
APPENDIX B – CASE STUDY SAFETY ARGUMENT STRUCTURE .......................................................... 105




List of Figures
Figure 2-1 Generic Intelligent Sensor Architecture.................................................................................................... 13
Figure 2-2 Generic Process Control [15].................................................................................................................... 15
Figure 2-3 Intelligent System Architecture [37]......................................................................................................... 16
Figure 2-4 Digital on Analogue Signal [24] ............................................................................................................... 18
Figure 2-5 General Use of HART [24]....................................................................................................................... 19
Figure 2-6 Foundation Fieldbus network [34] ............................................................................................................ 20
Figure 2-7 Profi Network with ProfiSafe Layer [36]................................................................................................. 21
Figure 2-8 IEEE 1451 Interface Model [26]............................................................................................................... 23
Figure 2-9 Scope of FDT/DTM [49] .......................................................................................................................... 24
Figure 2-10 IMS Layered Architecture [7]................................................................................................................. 27
Figure 2-11 IMA Blueprints [13] ............................................................................................................................... 29
Figure 2-12 Component Acquisition Process ............................................................................................................. 34
Figure 2-13 Diagrammatic View of COTS Evaluation .............................................................................................. 35
Figure 2-14 Safety Tactics for Software Architecture Design [41] ............................................................................ 36
Figure 3-1 Layered Architecture ................................................................................................................................ 40
Figure 3-2 Expanded Process Automation Architecture............................................................................................. 40
Figure 3-3 Modified Safety Tactics for System Reconfiguration............................................................................... 47
Figure 3-4 “V” Model of the COTS Development/Safety Lifecycle [33] .................................................................. 49
Figure 3-5 COTS Software Acquisition Process ........................................................................................................ 50
Figure 3-6 COTS Component Acquisition ................................................................................................................. 50
Figure 3-7 COTS Component Evaluation, Selection and Acquisition ....................................................................... 52
Figure 3-8 – Failure Detection.................................................................................................................................... 57
Figure 3-9 Masking Based Failure Containment ........................................................................................................ 61
Figure 3-10 Lifecycle of Tactical Approach............................................................................................................... 63
Figure 4-1 Typical Submarine at sea .......................................................................................................................... 65
Figure 4-2 Basic Submarine Hover Control ............................................................................................................... 66
Figure 4-3 Hover Control Layered Architecture ........................................................................................................ 67
Figure 4-4 Submarine Depth, Roll and Pitch.............................................................................................................. 68
Figure 4-5 Segment of Hover Control System ........................................................................................................... 85
Figure 4-6 Overall Safety Argument .......................................................................................................................... 90
Figure 4-7 Failure Avoidance Argument.................................................................................................................... 91
Figure 4-8 Failure Detection and Containment Argument ......................................................................................... 91

List of Tables
Table 2-1 HART Protocol Commands [24]................................................................................................................ 19
Table 2-2 Properties of PROFISafe [36] .................................................................................................................... 21
Table 3-1 Example of Safety Contract for COTS Component Acquisition [33] ........................................................ 53
Table 3-2 Example of IMA Behavioural Level Contract [38].................................................................................... 54
Table 3-3 Example of COTS Component Contract.................................................................................................... 55
Table 4-1 Failure Detection Constraints (Communication)........................................................................................ 80
Table 4-2 Failure Detection Constraints (Initialisation) ............................................................................................. 83
Table 4-3 Failure Detection Constraints (Diagnostics).............................................................................................. 84
Table 4-4 Failure Containment Constraints (Communication)................................................................................... 86
Table 4-5 Failure Containment Constraints (Initialisation) ........................................................................................ 88
Table 4-6 Failure Containment Constraints (Diagnostics) ........................................................................................ 88
Table 4-7 List of Safety Tactics used in Case Study .................................................................................................. 89




1 Introduction
        Recent advances in the process control industry have led to the wide availability of what
        have become known as smart instruments. The degree of intelligence held within this
        new breed of hardware device can vary from simple microprocessor-controlled signal
        conditioning and data manipulation, through in-built diagnostics and asset management,
        to autonomous real-time reconfiguration and wireless integration.
        The most numerous of these smart instruments are typically sensor-type devices that
        are designed to measure one or more process variables as part of a distributed control
        system. Process actuation devices, typically valve positioners, are also available with a
        growing number of complex functions, including closed-loop control functions such as
        proportional-integral-derivative (PID) control, that were once only available within process
        control application software packages but are now built into the device itself.
        The process automation industry has in recent years taken advantage of the increase in
        complex functionality at the device level, especially with regard to increased diagnostics
        as well as health and asset management across distributed systems. More sophisticated
        asset management systems enable failures to be predicted through the use of trend data
        supplied by the field device itself, and it can be envisaged that such systems may be
        capable of dynamic reconfiguration, thereby ensuring the continuation of control in the
        presence of field device failure. Dynamically reconfigurable systems may
        also facilitate the safe replacement of hardware before it fails rather than after it has
        done so. Intelligence built into the field device also enables sophisticated plug and play
        type functions to be implemented so that the replacement of defective or potentially
        defective hardware may be undertaken more easily by the maintainer. In practice, the
        ability for systems to automatically configure new hardware in real time without the
        maintainer having to undertake complex manual activities enables systems to be
        efficiently maintained with a reduction in the scope for human error.
        The advantages that the use of intelligent devices brings to the process control industry
        are very appealing to the UK Naval Submarine Programme. With the introduction of
        smart hardware into future Naval Submarine platforms the ability to dynamically
        reconfigure control systems of networked field devices in order to maintain levels of
        service and improve fault tolerance would be appealing.
        The latest class of nuclear powered submarines for the British Royal Navy is the Astute
        Class. The Astute Class is notable not only because it is the largest attack submarine to
        be built for the British Navy but also because, in the context of this project, it is the first
        British submarine to have a large process automation system that offers both control and
        monitoring functions. This system, the integrated Platform Management System (iPMS),
        serves a wide range of submarine control systems that now demand a level of intelligent
        monitoring and control that was not possible in previous classes of submarine.
        The current situation is that there are a limited but increasing number of smart
        sensors/actuators being introduced on the Astute nuclear submarine platform. The
        current iPMS is the first step in the process of moving toward greater automation of
        submarine systems and although intelligent control is used, the advantages of being
        able to access the additional data available from smart hardware have not yet been
        realised. However, the benefits of implementing commercial off the shelf (COTS)
        intelligent hardware from the process control industry within the next generation of UK
        nuclear submarine platforms are appealing due to the overall need to reduce costs and
        improve performance whilst maintaining the safe operation of what is in effect a mobile
        nuclear reactor, process plant, hotel and fighting machine.



        The requirement to maintain the safe operation of the submarine whilst taking advantage
        of increasingly intelligent hardware is a key driver for this project. Generally speaking,
        the use of software intensive systems has caused nervousness amongst those in the
        military who have an interest in such systems. Headlines, such as “Software glitches
        leave Navy Smart Ship dead in the water” [1] concerning the problems that beset the
        USS Yorktown when designers used software intensive systems to make savings in
        “manpower, maintenance and cost” [1] do nothing to increase confidence in complex
        intelligent systems. Although the problem with the USS Yorktown was not specifically
        due to failures with intelligent hardware, the headline is not easily forgotten. Likewise,
        the UK Nuclear Regulator’s risk-averse attitude to smart instrumentation is a further
        cause of nervousness when contemplating the implementation of intelligent systems
        on platforms powered by nuclear power plants.
        So, the question arises, “How can the benefits of systems that use intelligent
        reconfigurable hardware be realised without compromising safety?” The motivation
        behind this project has been to develop a solution that addresses this question.
        The aims of the project have been to,
        1. Determine the extent to which the safety aspects of implementing the dynamic
           reconfiguration of intelligent hardware have been addressed in industry.
        2. Develop a method for arguing that a system based on reconfigurable intelligent field
            devices has been safely implemented.
        3. Evaluate the chosen approach to arguing the safety of reconfigurable intelligent field
           devices.
        In order to satisfy the aims of the project an approach was taken that focused on the
        adaptation of existing, relevant and appropriate techniques. The initial project scope
        was broad in its approach to the problem area and included the safety issues associated
        with reconfigurable systems, smart instruments, intelligent hardware and
        certification/regulatory issues. This scope was reduced through the literature survey and
        proposal phase to address the specific area of reconfigurable plug and play intelligent
        hardware at the system level. The overall approach taken to address the project aims
        has been,
        1. To undertake a review of relevant research and information that is currently available
           including,
                • Process control systems in the commercial market place and their applicability to
                  reconfigurable systems, intelligent devices, diagnostics and asset management.
                • The approach to system reconfiguration taken by the avionics industry and the
                  safety work undertaken with regard to the certification of IMA systems.
                • Nuclear safety and regulatory issues around smart instrumentation.
                • Regulatory issues and standards that are applicable to the design and
                  implementation of reconfigurable systems.
                • Methods of justifying the use of COTS components.
        2. To develop a tactical approach to safety in reconfigurable systems in the context of a
           layered system architecture that includes system applications, infrastructure and
           intelligent field devices.
        3. To undertake an evaluation of the method through the application of the proposed
           tactical approach to a UK naval submarine platform case study.




2 Literature Survey
        The purpose of this literature survey was to review current and past work undertaken
        across a wide range of industries in the area of process control and dynamic
        reconfiguration.

        The scope of the literature survey was as follows,

         •      To gain an understanding of the term Intelligent Hardware in the context of dynamic real
                time reconfigurable systems.
         •      To understand the use of intelligent hardware within the Process Industry and review the
                current technological developments around intelligent hardware.
         •      To review the current issues faced by the Nuclear Industry with regard to intelligent
                hardware. This is relevant in the context of nuclear powered submarines.
         •      To review the work that has been undertaken in the Aerospace Industry in the areas of real
                time reconfiguration and IMA systems.
         •      To assess the Regulatory Issues faced by industry when contemplating the use of
                reconfigurable systems.
         •      To determine the issues around COTS Acquisition and Safety Issues, as many of the
                components used to build a reconfigurable system are Commercial Off The Shelf (COTS)
                items.
         •      To review a method of applying a tactical approach to safety assurance at the architectural
                level.

2.1   Intelligent Hardware
        In older analogue control systems, the method of compensating for inaccuracies within
        sensors was to apply bespoke correction factors, calibration curves or additional signal
        processing to the raw analogue signal in the controlling computerised system.
        Modern digital systems can now perform these and many additional functions within the
        sensor itself and transmit corrected data directly to the control system over digital media.
        In the 1990s Najafi [16] produced a paper on smart sensors whose purpose was
        to provide a review of the smart sensor technology available at the time. Najafi [16]
        provides a definition of a smart sensor that states that “a smart sensor is defined as one
        that is capable of (i) providing a digital output; (ii) communicating through a bidirectional
        digital bus; (iii) being accessed through a specific address; and (iv) executing commands
        and logical functions”. Najafi [16] constructed this definition from information provided
        by Brignell [18] and Giachino [19]. However, this definition, although described as
        “broad” by Najafi [16], is in my opinion quite narrow in terms of the specific technology
        being described. Whilst the ability to execute commands and logical functions is
        applicable to all smart sensors, the requirement to provide a digital output and to be
        addressable on a network is very limiting.
        A better and more inclusive definition of a smart sensor is provided by Dubey [17]
        “Smart sensors are sensors with integrated electronics that can perform one or more of
        the following… logic functions, two-way communication, make decisions...” This is a
        particularly useful definition in that the smart sensor is now defined in terms of integrated
        electronics, logic functions and decision making functions that strictly speaking do not
        have to be provided via a digital output. However, the requirement for two way
        communication is still limiting in that many sensors that might otherwise be defined as
        “smart” are not networked and only provide an analogue output to a control system. In
        this case the sensor is smart in terms of internal processing and manipulation of
        information but is dumb in terms of its connectivity.


        A longer, but perhaps more inclusive definition is provided by Sellafield Limited, a major
        player in the British Civil Nuclear Industry, whose definition [20] of a smart instrument is:
        “A Smart Instrument meets the following criteria:
         •      That the main purpose of the instrument is to measure or directly control a single
                process variable.
         •      That despite using a microprocessor (or similar), it is a proprietary or ‘off the shelf’
                instrument in common use.
         •      It may (or may not) include some flexibility in its use, due to parameters that are set
                by the vendor or user.
         •      That its life cycle includes the production of some generic embedded firmware by
                the manufacturer and may include some particular configuration software or settings
                by the user. This is often called Fixed Programming Language (FPL).”
        “Smart hardware should not be restricted to measurement but should also include
        actuators, valves, motor starters and other control instruments.”
        The definition given by Sellafield Limited [20] allows a wider scope of instrumentation
        to be classified as smart. This includes the most basic requirement that the instrument
        has to include some software (firmware), as indicated by the use of a microprocessor or
        other similar technology. From this definition, devices that implement various types of
        logic, such as FPGAs, CPLDs, PLDs and ASICs, are also included. It is also interesting
        that actuators have been explicitly included in the definition given by Sellafield
        Limited [20], whereas they are not in other definitions, which focus on sensors.
        From the various definitions reviewed, none really fit the need to describe the level of
        intelligence required to support real time reconfiguration based on health
        management/diagnostic type data. For this reason, the term Intelligent Hardware or
        Device will be used throughout the rest of this project to describe a device that provides
        functions that support health monitoring and asset management as well as the “smart”
        functions described by other definitions.




                             Figure 2-1 Generic Intelligent Sensor Architecture
        Figure 2-1 provides a generic functional block diagram of a fictional intelligent device
        that is based on engineering practice and my generic definition. An intelligent device will
        require as a minimum,
         •      A sensing element to capture the analogue process variable
         •      Some signal conditioning for the analogue input
         •      A means of converting the analogue signal to a digital format for further processing
         •      A microprocessor or other programmable logic device to handle digital manipulation,
                health monitoring, diagnostics and other digital processing.
         •      A digital to analogue converter for systems that require an analogue output, for
                example 4 – 20 mA
         •      A digital output for systems that require a direct digital connection to the device.
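
        To make the data path through Figure 2-1 concrete, the following minimal sketch (written
        in Python purely for illustration; the class, method names and scaling values are
        hypothetical rather than taken from any real device) models the blocks listed above:
        conditioning of the raw analogue input, analogue to digital conversion, digital processing
        with a simple built-in health check, and both a 4 – 20 mA analogue output and a digital
        output that carries diagnostics alongside the process variable.

        # Minimal illustrative model of the generic intelligent device in Figure 2-1.
        # All names and values are hypothetical.
        from dataclasses import dataclass, field
        from typing import List


        @dataclass
        class IntelligentDevice:
            calibration_gain: float = 1.0
            calibration_offset: float = 0.0
            diagnostics: List[str] = field(default_factory=list)

            def condition(self, raw: float) -> float:
                # Signal conditioning of the analogue input (gain/offset correction).
                return raw * self.calibration_gain + self.calibration_offset

            def adc(self, conditioned: float, full_scale: float = 10.0, bits: int = 12) -> int:
                # Analogue to digital conversion to a bounded integer count.
                clamped = max(0.0, min(conditioned, full_scale))
                return int(clamped / full_scale * (2 ** bits - 1))

            def process(self, counts: int, bits: int = 12) -> float:
                # Digital manipulation plus a very simple built-in health check.
                if counts in (0, 2 ** bits - 1):
                    self.diagnostics.append("input at rail: possible open or short circuit")
                return counts / (2 ** bits - 1)              # normalised process variable 0..1

            def analogue_output(self, value: float) -> float:
                # Digital to analogue conversion onto a conventional 4-20 mA loop.
                return 4.0 + 16.0 * value

            def digital_output(self, value: float) -> dict:
                # Digital output for systems with a direct network connection.
                return {"process_variable": value, "diagnostics": list(self.diagnostics)}


        device = IntelligentDevice(calibration_offset=0.05)
        pv = device.process(device.adc(device.condition(3.3)))
        print(device.analogue_output(pv), device.digital_output(pv))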

2.2   Process Industry
2.2.1 Process Control
        Process control systems have been widely used across industry for many years,
        particularly in the automation of factories used for the mass production of consumer
        items such as automobiles, food stuffs and paper products etc. Automated control is
        also used in the process intense petrochemical and pharmaceutical industries. All these
        industries have driven advances in process control technology in order to improve the
        reliability of factory automation systems and in so doing improve efficiency by reducing
        the plant down time due to system failures.
        One significant area of technology advancement has been in the increased intelligence
        held within low level field devices such as sensors and actuators. One of the drivers for
        greater intelligence within field devices has been to enable better asset management of
        such devices through improvements in health monitoring, diagnostics, automatic
        configuration and reconfiguration, and what has become known as “Plug and Play”. Plug
        and play functionality provides the ability to simply replace a defective unit with a
        replacement without the need to manually configure it or change the application software
        to accommodate it. This has huge benefits in terms of the maintenance and
        upgradeability of process plants, but the question that this work sets out to answer is,
        how can such benefits be achieved whilst still keeping the safety risk as low as
        reasonably practicable (ALARP)?
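
        As a simple illustration of the plug and play idea described above, the following sketch
        (hypothetical, in Python; the ElectronicDatasheet fields and the flowmeter example are
        invented for illustration and are loosely modelled on the electronic datasheet concept
        used by standards such as IEEE 1451 TEDS) shows a host channel configuring itself from
        data held on the replacement device, so that no manual configuration or application
        change is needed.

        # Hypothetical sketch: automatic configuration of a channel when a
        # replacement field device is attached and its electronic datasheet is read.
        from dataclasses import dataclass


        @dataclass
        class ElectronicDatasheet:
            device_type: str          # e.g. "flowmeter"
            units: str                # e.g. "l/min"
            range_low: float
            range_high: float
            firmware_version: str


        class ProcessChannel:
            """One measurement channel in the host control system."""

            def __init__(self, expected_type: str):
                self.expected_type = expected_type
                self.configured = False
                self.scaling = None

            def on_device_attached(self, sheet: ElectronicDatasheet) -> None:
                # Reject a device of the wrong type rather than mis-configure the loop.
                if sheet.device_type != self.expected_type:
                    raise ValueError(f"expected {self.expected_type}, got {sheet.device_type}")
                # Derive scaling from the datasheet instead of manual data entry.
                self.scaling = (sheet.range_low, sheet.range_high, sheet.units)
                self.configured = True


        channel = ProcessChannel(expected_type="flowmeter")
        channel.on_device_attached(ElectronicDatasheet("flowmeter", "l/min", 0.0, 500.0, "2.1"))
        print(channel.configured, channel.scaling)       # True (0.0, 500.0, 'l/min')

        Even in this toy form, the type check hints at why the tactics developed later in this report
        pay particular attention to detecting and containing initialisation failures.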
        A review of how the Process industry implements intelligence within smart hardware and
        reconfigurable systems is very relevant to the submarine industry as it moves away from
        bespoke military applications and seeks to implement commercial off the shelf (COTS)
        process control systems. The submarine iPMS is in effect a process control system that
        is specifically tailored for the military environment. The submarine iPMS is built from
        components from the process industry, including standard items such as
        Programmable Logic Controllers (PLCs), Single Board Computers, and smart and dumb
        sensors and actuators.
        A generic process control system on a submarine is used to control various processes
        including the flow of liquids such as oils and water around the submarine as well as
        electrical supplies. Typically, actuation devices include valves, pumps and hydraulic
        systems. Human interaction with the system is typically through a human computer
        interface (HCI) that is computer controlled and includes touch screens to allow “under
        glass” control. This allows control to be undertaken without the need for keyboards
        and pointing devices.
        A model of a process control system, shown in Figure 2-2, depicts a system in which
        some physical process is measured using sensors and then manipulated using
        actuation. Typically, such a process is controlled by an automated controller which
        controls the physical process within set limits. Human controllers either monitor the
        control process or directly interact with the control system through the HCI.




                               Figure 2-2 Generic Process Control [15]
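
        The closed loop in Figure 2-2 can be illustrated with a minimal simulation (hypothetical
        values and a deliberately crude plant model, written in Python for illustration only) in which
        the automated controller applies the PID control action mentioned in the Introduction, the
        sensor reading is the process variable, the actuator demand is the controller output, and the
        HCI only needs to display a status to the human controller.

        # Minimal closed-loop sketch of Figure 2-2 with a PID automated controller.
        # Plant model, gains and setpoint are illustrative only.
        class PIDController:
            def __init__(self, kp: float, ki: float, kd: float, dt: float = 1.0):
                self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
                self.integral = 0.0
                self.previous_error = 0.0

            def update(self, setpoint: float, measured: float) -> float:
                error = setpoint - measured
                self.integral += error * self.dt
                derivative = (error - self.previous_error) / self.dt
                self.previous_error = error
                return self.kp * error + self.ki * self.integral + self.kd * derivative


        controller = PIDController(kp=0.6, ki=0.05, kd=0.1)
        process_variable = 20.0                        # measured by the sensor
        for step in range(15):
            actuator_demand = controller.update(setpoint=50.0, measured=process_variable)
            process_variable += 0.5 * actuator_demand  # crude first-order plant response
            hci_status = "OK" if abs(50.0 - process_variable) < 10.0 else "ALARM"
            print(f"step {step:2d}  pv={process_variable:6.2f}  hci={hci_status}")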

        One pertinent issue for the process industry, and for that matter the military, is the extent
        to which the human supervisor needs to be involved in the control of the process. As
        computer based systems become more capable in terms of intelligence, it must be asked
        whether the human operator should be taken out of the control loop when fault
        conditions are detected and should only be involved in system restoration in special
        circumstances when command override type decisions can be made to continue with
        degraded systems. Dynamic reconfiguration of systems could be used to take workload
        off the human operator during high stress fault conditions and enable the operator to
        focus on more important tasks. However, there would need to be a balance struck
        between the advantages of dynamic reconfiguration and reduced human interaction so
        that the probability of human error due to the lack of awareness of the new system
        status during and following reconfiguration is managed.

2.2.2 Current Advances in Process Control
        In recognising the current trend in process control, Yalcinkaya, Atherton, Calis and
        Powner [37] extend the generic process control shown in Figure 2-2 to describe an
        intelligent system architecture (Figure 2-3) in which smart sensors are used in
        conjunction with decision making algorithms and plant models to enable very
        sophisticated fault detection to take place. Although Yalcinkaya et al. do not explicitly
        say so, there is scope here for an intelligent system not only to diagnose a fault and
        provide a prognosis but also to reconfigure itself to provide continuation of service in the
        presence of a fault condition. They do however recognise the potential for smart
        sensors to communicate with other devices and modify their own behaviour accordingly.




                  [Figure: block diagram of the intelligent system architecture, showing the human
                  supervisor (displays and controls), decision taking, evaluation, prediction against
                  model data, behaviour generation, and the actuators and sensors around the physical
                  process, with direct monitoring/control and indirect monitoring paths.]

                            Figure 2-3 Intelligent System Architecture [37]
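
        One common way in which a plant model and a decision-making step of this kind can be
        combined for fault detection (not necessarily the mechanism used by Yalcinkaya et al. [37];
        the tank-level model and tolerance below are invented for illustration) is to compare each
        measurement with the model's prediction and to flag a fault when the residual falls outside
        a tolerance band.

        # Illustrative residual-based fault detection using a trivial plant model.
        def model_prediction(previous_level: float, inflow: float, outflow: float,
                             dt: float = 1.0) -> float:
            """Hypothetical tank-level model used to predict the next measurement."""
            return previous_level + (inflow - outflow) * dt


        def fault_detected(measured: float, predicted: float, tolerance: float = 2.0) -> bool:
            residual = measured - predicted
            return abs(residual) > tolerance


        predicted = model_prediction(previous_level=10.0, inflow=1.2, outflow=0.7)
        print(fault_detected(measured=14.8, predicted=predicted))   # True: residual of 4.3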

2.2.3 Health Monitoring and Asset Management
        Bailey [21] presents a case for asset management pushing advanced diagnostics to
        deliver enhanced process plant performance and significant savings in through life
        maintenance. As instruments become smarter, basic diagnostic functions, calibration and
        the configuration of user-programmable parameters are being replaced by more advanced
        diagnostics with a focus on being able to predict plant failures and deploy maintenance
        before failure occurs. Bailey [21] provides an example of a servo valve with advanced
        diagnostics that has the capability to learn what normal plant parameters should be, and
        is therefore able to raise warnings when abnormal parameter values are experienced.
        Bailey also points out that conditions such as measurement sensor drift, leaks, losses
        and instability can be detected with modern hardware and that, by detecting potential
        failures early, larger plant failures can be avoided. This is a
        significant driver in the process industry where plant shutdown due to failure can be very
        costly.
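
        A minimal sketch of the “learn what normal looks like, then warn” behaviour described
        above (illustrative only; the thresholds, the use of Welford's running statistics and the
        drift test are my own assumptions rather than anything taken from Bailey [21]) might
        maintain a baseline mean and spread for a monitored parameter and raise warnings for
        abnormal values or a slow shift of the mean.

        # Illustrative baseline-learning diagnostic: warns on abnormal values and drift.
        import math


        class BaselineDiagnostic:
            def __init__(self, warmup: int = 30, warn_sigma: float = 3.0,
                         drift_fraction: float = 0.05):
                self.n = 0
                self.mean = 0.0
                self.m2 = 0.0                   # running sum of squared deviations (Welford)
                self.baseline_mean = None       # frozen after the warm-up period
                self.warmup = warmup
                self.warn_sigma = warn_sigma
                self.drift_fraction = drift_fraction

            def update(self, value: float) -> list:
                warnings = []
                if self.n >= self.warmup and self.baseline_mean is not None:
                    std = math.sqrt(self.m2 / (self.n - 1)) or 1e-9
                    if abs(value - self.mean) > self.warn_sigma * std:
                        warnings.append("abnormal parameter value")
                    if abs(self.mean - self.baseline_mean) > \
                            self.drift_fraction * max(abs(self.baseline_mean), 1e-9):
                        warnings.append("possible sensor drift")
                # Welford's online update of the running mean and variance.
                self.n += 1
                delta = value - self.mean
                self.mean += delta / self.n
                self.m2 += delta * (value - self.mean)
                if self.baseline_mean is None and self.n == self.warmup:
                    self.baseline_mean = self.mean
                return warnings


        monitor = BaselineDiagnostic()
        for reading in [100.0 + 0.1 * i for i in range(40)] + [130.0]:
            alerts = monitor.update(reading)
            if alerts:
                print(reading, alerts)          # the final step-change reading is flagged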
        The prospect of detecting failures earlier is also taken up by Aaseng [22], who, in his paper
        on Integrated Vehicle Health Management (IVHM) for space vehicles, emphasises the need to be
        able to “…understand the state of the vehicle and its components, to restore the vehicle to
        nominal system status when malfunctions occur, and to minimize safety risks and mission
        impacts that results in system failures…”. Aaseng [22] suggests that four areas of interest
        need to be addressed in order to satisfy this emphasis on IVHM:




         •      Diagnostics - Knowing what system components are not operating correctly, and to
                what degree they aren’t working
         •      Mitigation - Dealing with the failure for as long as necessary, while maximizing
                mission effectiveness in spite of the failure
         •      Repair - Replacing or otherwise restoring the failed components to a nominal state
         •      Verification - Making sure that the repairs fixed the problems and that no latent side
                effects persist.
        For the purposes of discussing the part intelligent hardware plays in the health
        monitoring and reconfiguration of a system, the two most pertinent points drawn out by
        Aaseng [22] are “diagnostics” and “mitigation”. In terms of diagnostics, Aaseng [22]
        extends the principle to encompass prognosis, in a similar way that Bailey [21] described
        the advantage of advanced diagnostics to detect potential failures. This is of great
        interest to the submarine application as the ability to detect potential failures and
        intelligently reconfigure the system around that potential failure is appealing, especially
        as there can be few opportunities to repair or replace components during intense
        periods of sustained operations, including “fight” situations. Aaseng [22] recognises that
        repair of space vehicles can be equally challenging when in flight but only considers
        reconfiguration of a system from a manual perspective following a review of the situation
        by human operators using fault models. The use of fault models is an interesting idea
        that could be taken forward and used, in a similar fashion to a static configuration list, in a
        system that is capable of dynamic reconfiguration.
        Aaseng [22] also highlights an issue with the perceived trustworthiness of software
        based control systems and the extent to which operators feel comfortable with handing
        control over to such systems. Aaseng's [22] experience with the space industry is
        apparently that “Turning over some of these decision responsibilities to an automated
        system will require very high confidence in the accuracy of the system. Reducing the
        human element involved in the decision processes will be done only cautiously and
        reluctantly”. This is very similar to the situation with UK Naval personnel who,
        anecdotally, appear to have a reluctance to take the “man” out of the loop and allow the
        “computer” to take control.
        Whilst the process industry pushes toward ever-increasing levels of automation, any
        argument concerning the use of automated systems in military systems will need to be
        carefully constructed and move in measured steps towards taking the man out of the loop.
        To some extent this is already happening with the Astute Submarine currently under
        construction. This is the first British Nuclear Submarine that has an iPMS with some
        automated sequences, although the man is still in command for most control operations.
        As the UK MoD pushes toward lower manning levels on their naval platforms, this situation
        is likely to change over the next couple of generations of submarine. To meet this challenge,
        there is an opportunity to work towards automated reconfigurable systems that can be shown
        to be sufficiently safe to operate.

2.2.4 Networks of Intelligent Hardware
        Advances in the Process Control industry have seen much of the complexity that used to
        reside in the automated controller move closer to the physical process, i.e. to the
        sensors and actuators. In order to facilitate this move toward greater automation and
        intelligence, modern process control systems now utilise digital networks and are
        beginning to move toward wireless networks.




        HART
        The HART Communications Foundation [23] is the organisation that controls the
        development of HART and provided the historical and technical information about the
        protocol that has been used in this section of the project report. The HART (Highway
        Addressable Remote Transducer) Protocol was introduced around 1986. It was one of the
        earliest methods of taking advantage of the intelligent instruments that were appearing on
        the market and provided a digital interface without the need for an additional network. By
        superimposing a digital signal onto the analogue measurement signal (4 mA – 20 mA loop
        current), a two-way digital communication path can be established between an instrument
        and the control system. The digital signal is superimposed using frequency shift keying
        (FSK), with 1.2 kHz and 2.2 kHz representing bits 1 and 0 respectively. As a sine wave is
        used in the implementation, the average value of the superimposed signal is zero and
        therefore the analogue measurement signal is not affected by the additional signal.
        Figure 2-4 shows the general arrangement of how the digital data is superimposed on
        the analogue signal.




                             Figure 2-4 Digital on Analogue Signal [24]
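
        To illustrate the principle, the following Python sketch (illustrative only: the two carrier
        frequencies and the 4–20 mA loop range come from the description above, while the signalling
        rate, amplitude and sample rate are assumptions made for the example) superimposes an FSK
        component on a steady analogue value and shows that the mean of the combined signal remains
        close to the analogue value, which is why the measurement is unaffected.

        import math

        # Illustrative parameters: only the two FSK frequencies and the 4-20 mA loop
        # range are taken from the description above; the rest are assumptions.
        FREQ_ONE = 1200.0        # Hz, represents a logical 1
        FREQ_ZERO = 2200.0       # Hz, represents a logical 0
        SAMPLE_RATE = 48000.0    # samples per second (assumed)
        BIT_RATE = 1200.0        # bits per second (assumed signalling rate)
        FSK_AMPLITUDE_MA = 0.25  # peak amplitude of the superimposed signal in mA (assumed)

        def fsk_samples(bits):
            """Generate the FSK waveform (in mA) for a sequence of bits."""
            samples = []
            samples_per_bit = int(SAMPLE_RATE / BIT_RATE)
            for bit in bits:
                freq = FREQ_ONE if bit else FREQ_ZERO
                for n in range(samples_per_bit):
                    t = n / SAMPLE_RATE
                    samples.append(FSK_AMPLITUDE_MA * math.sin(2 * math.pi * freq * t))
            return samples

        def superimpose(analogue_ma, bits):
            """Add the digital FSK component to a steady analogue loop current."""
            return [analogue_ma + s for s in fsk_samples(bits)]

        if __name__ == "__main__":
            measurement_ma = 12.0                      # a steady process measurement on the loop
            message_bits = [1, 0, 1, 1, 0, 0, 1, 0]
            combined = superimpose(measurement_ma, message_bits)
            mean_value = sum(combined) / len(combined)
            # The sine bursts average to approximately zero over each bit period, so the
            # mean of the combined signal stays close to the analogue measurement.
            print(f"analogue: {measurement_ma:.3f} mA, mean of combined: {mean_value:.3f} mA")
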

        Within the HART protocol there is the opportunity to include not only measurement data
        but also a great amount of diagnostic data about the performance of the instrument. The
        HART Communications Foundation (HCF) [23] has published specifications and application
        notes for the use of the protocol, and it is apparent from the application notes [24] that
        HART can be used to access a number of common and device-specific parameters. This can
        be seen in Table 2-1, which has been taken from the HCF application notes [24]. Universal
        commands are those that every HART-enabled device must recognise, whilst common and
        device-specific commands are provided by many, but not all, manufacturers. As can be seen
        from Table 2-1, there are a number of commands that could be very useful in determining the
        value of secondary variables within the device that could contribute to device failure. The
        control system could then be used to predict plant failure and reconfigure the system such
        that the failing device could be reset or removed from service for repair whilst maintaining
        plant integrity.




                             Table 2-1 HART Protocol Commands [24]
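
        As a simple illustration of how a control system might act on such parameters, the sketch
        below polls a (simulated) device reading and raises warnings when secondary variables stray
        outside limits. The DeviceReading structure, the limits and the checks are hypothetical; a
        real implementation would use the actual HART commands and the manufacturer's diagnostic
        limits.

        from dataclasses import dataclass

        @dataclass
        class DeviceReading:
            """A snapshot of values that might be retrieved using HART commands."""
            primary_variable: float    # the process measurement itself
            electronics_temp_c: float  # an example secondary variable
            loop_current_ma: float     # the loop current reported by the device

        # Hypothetical diagnostic limits; a real system would take these from the
        # manufacturer's data or the asset management application.
        MAX_ELECTRONICS_TEMP_C = 70.0
        LOOP_CURRENT_TOLERANCE_MA = 0.2

        def assess_device(reading, expected_loop_current_ma):
            """Return warnings that could prompt the control system to reconfigure
            around the device before it fails outright."""
            warnings = []
            if reading.electronics_temp_c > MAX_ELECTRONICS_TEMP_C:
                warnings.append("electronics temperature above limit - possible incipient failure")
            if abs(reading.loop_current_ma - expected_loop_current_ma) > LOOP_CURRENT_TOLERANCE_MA:
                warnings.append("loop current deviates from expected value - possible drift or fault")
            return warnings

        if __name__ == "__main__":
            reading = DeviceReading(primary_variable=42.3, electronics_temp_c=74.5, loop_current_ma=11.9)
            for warning in assess_device(reading, expected_loop_current_ma=12.0):
                print("WARNING:", warning)
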

        Figure 2-5 shows a number of HART enabled devices on a typical process control
        network with a controller and a local input/output device. The handheld device is a
        special tool designed to be able to configure a HART device.




                               Figure 2-5 General Use of HART [24]

        Adler [25] makes a case for the use of HART-enabled devices to increase the reliability
        of process plant and claims that increased reliability can support an argument that a
        required safety integrity level (SIL) has been achieved. Although Adler [25] provides
        several examples of how diagnostic data can be used to improve the reliability of a
        process plant, particularly when alternative networked systems such as Fieldbus
        necessitate the wholesale replacement of older instruments, he fails to substantiate the
        safety case for the implementation of the HART protocol. Adler's [25] focus is on the
        reliability of the measurement and so he fails to consider the safety issues around the
        implementation of an intelligent digital instrument, i.e. the HART protocol itself and the
        software and hardware design of the instrument. Whilst the reliability of the
        measurement and the benefits that advanced diagnostics bring are valuable, there are other
        issues to contend with that have not been considered, including common mode failures across
        instruments that use a common communication protocol such as HART.
        Digital Control Networks
        Following on from and building on the principles of the HART protocol, instrument
        manufacturers have developed intelligent instruments that are capable of being
        networked together. Of the various types of digital intelligent field devices available, the
        main flavours are Foundation Fieldbus [34], Profi (ProfiBus, ProfiNet & ProfiSafe [35])
        and IEEE 1451 [26]. All these protocols are claimed to support open standards and
        many manufacturers offer intelligent instruments with options for Foundation Fieldbus,
        Profi and HART protocols enabled, although IEEE 1451 does not appear to have been
        adopted by the main instrument manufacturers in the process industry.
        Foundation Fieldbus was one of the first digital networks developed to interconnect
        intelligent field devices to process control systems. It is an open standard that has, along
        with the HART protocol, been adopted by all of the major manufacturers of intelligent
        instrumentation.
        The Foundation Fieldbus protocol runs on two digital networks: H1, which operates at the
        field instrumentation level, and High Speed Ethernet (HSE) at the higher control level, which
        is used to share data between the field instrumentation, the PLC control system and the
        human operator via the HCI [34]. Figure 2-6 shows a Foundation Fieldbus network in
        diagrammatic form.




                            Figure 2-6 Foundation Fieldbus network [34]

        ProfiBus and ProfiNet networks [35] are typically used to interconnect field devices
        using open industrial standards. ProfiNet is typically used to enable existing systems of
        ProfiBus and Fieldbus devices to be integrated into a single system by providing a
        common network backbone through which systems can connect and interact.
        ProfiNet uses common communication protocols such as TCP, IP and UDP. At the
        application layer, ProfiNet can use protocols such as HTTP (web), FTP (file transfer) and
        SMTP (email). The designers of ProfiNet have recognised the importance of real time
        applications and have provided protocols for three levels of process criticality: non time
        critical, real time, and clock-synchronised communication (Isochronous Real Time),
        which enables cycle times of under 1 ms and a jitter of less than 1 µs [35]. ProfiBus
        [35] is commonly used at the lowest level of the system architecture to provide
        interconnectivity between field devices and PLCs via standard RS485 serial interfaces
        for non-network-powered devices and MBP interfaces for network-powered devices.
        ProfiSafe [36] has been developed to provide an additional layer that sits on top of the
        standard ProfiNet (Figure 2-7) and which provides more reliable data transmission
        between devices on the network. The real benefit of using ProfiSafe comes from the
        ability to use the same physical network medium to transmit both safety related and non
        safety related data without compromising the integrity of that data.




                        Figure 2-7 Profi Network with ProfiSafe Layer [36]
        Data integrity is assured through the use of four main measures that address the most
        likely causes of error. An overview of these four measures and the specific data integrity
        issues they have been designed to address is provided in Table 2-2.




                               Table 2-2 Properties of PROFISafe [36]
        The implementation of these measures within network components enables the
        communication of safety related data to be certified by independent bodies up to IEC
        61508 SIL 3 [36]. From the perspective of constructing an argument for safely
        implementing a reconfigurable process control system, it could be argued that the
        network communication between field devices and the control system exhibits sufficient
        integrity provided the integrity requirements for the function are no greater than SIL 3.
        Therefore, as long as a suitable digital network infrastructure can be implemented, the
        communication network can be regarded as a pre-qualified component of the system.
        However, in reality additional work still needs to be undertaken to demonstrate that the
        safety certification is valid in a specific application; problems often occur with the mode
        of operation and the specific installation requirements that are assumed by the safety
        certification.
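
        Although the specific PROFIsafe measures are not reproduced here, the general style of
        safety layer that such protocols represent can be sketched. The example below is a generic
        illustration, not an implementation of PROFIsafe: a sequence number, a sender identifier and
        a CRC are checked at the receiver, and a watchdog timeout rejects data that arrives too late
        to be acted on safely.

        import binascii
        import time

        # Generic illustration of a safety layer carried over a standard network;
        # NOT an implementation of PROFIsafe, just the style of checks such layers apply.

        def build_frame(seq_no, sender_id, payload):
            """Build a frame: 2-byte sequence number, 2-byte sender id, payload, 4-byte CRC."""
            header = seq_no.to_bytes(2, "big") + sender_id.to_bytes(2, "big")
            crc = binascii.crc32(header + payload).to_bytes(4, "big")
            return header + payload + crc

        class SafetyReceiver:
            def __init__(self, expected_sender_id, watchdog_s):
                self.expected_sender_id = expected_sender_id
                self.watchdog_s = watchdog_s
                self.last_seq_no = -1
                self.last_rx_time = time.monotonic()

            def accept(self, frame):
                """Validate a frame; raise if any integrity check fails, else return the payload."""
                header, payload, crc = frame[:4], frame[4:-4], frame[-4:]
                if binascii.crc32(header + payload).to_bytes(4, "big") != crc:
                    raise ValueError("CRC mismatch - corrupted data")
                seq_no = int.from_bytes(header[:2], "big")
                sender_id = int.from_bytes(header[2:4], "big")
                if sender_id != self.expected_sender_id:
                    raise ValueError("unexpected sender - possible misrouted frame")
                if seq_no <= self.last_seq_no:
                    raise ValueError("stale or repeated sequence number - possible loss or replay")
                if time.monotonic() - self.last_rx_time > self.watchdog_s:
                    raise ValueError("watchdog expired - data too old to act on safely")
                self.last_seq_no = seq_no
                self.last_rx_time = time.monotonic()
                return payload

        if __name__ == "__main__":
            rx = SafetyReceiver(expected_sender_id=7, watchdog_s=0.5)
            frame = build_frame(seq_no=1, sender_id=7, payload=b"\x01\x2c")  # e.g. a valve demand
            print("accepted payload:", rx.accept(frame).hex())
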
        IEEE 1451 “plug and play” type functionality is described as “self describing capabilities”
        [40], which Potter [40] sees as analogous to plugging in a computer mouse and having it
        work without further configuration by the end user.
        IEEE 1451 is a family of standards that has been developed to allow plug and play
        functionality across different types of networks. The development of the standard has been
        undertaken in the USA, primarily by NIST, and it has yet to be widely adopted as a
        commercial protocol by the process industry. Lee [26] helpfully provides an
        overview of the IEEE 1451 standard and details the connection of instruments in both
        drop and ring configurations based on Ethernet technologies. One of the main
        advantages of this standard is the ability to plug and play instruments from a variety of
        manufacturers, with functionality ranging from very simple devices through to more complex
        reconfigurable devices. This plug and play functionality is based on a standard method of
        defining the capability and interface of an instrument in non-volatile memory such that other
        network components are able to interrogate the instrument and automatically interface with
        it. The device configuration is stored as a Transducer Electronic Data Sheet (TEDS). The
        correctness of this data appears to be a key factor in the success or failure of interfacing an
        instrument and is likely to be key in enabling dynamic reconfiguration of a system based on
        health monitoring data. However, the IEEE 1451 standard appears to be focused on plug and
        play functionality rather than health monitoring and reconfiguration.
        Figure 2-8 provides an overview of the connection of intelligent instruments to a network
        and demonstrates how an intelligent instrument could be connected via the interface between
        the Smart Transducer Interface Module (STIM) and the Network Capable Application
        Processor (NCAP). The STIM is used as a means of encapsulating the details of the
        transducer arrangement and presenting a common interface to the NCAP, which in turn
        presents a commonly defined interface to the physical network.




                              Figure 2-8 IEEE 1451 Interface Model [26]
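
        A minimal sketch of the plug and play idea is given below. The SimplifiedTEDS structure is
        a hypothetical, much-simplified stand-in for the binary TEDS format defined by IEEE 1451;
        the point is only that the network-side processor reads the device's self-description and uses
        it to interpret the raw transducer data without prior configuration.

        from dataclasses import dataclass

        @dataclass
        class SimplifiedTEDS:
            """A hypothetical, much-simplified stand-in for a Transducer Electronic
            Data Sheet; the real IEEE 1451 TEDS is a defined binary structure."""
            manufacturer: str
            model: str
            units: str
            min_value: float
            max_value: float
            raw_full_scale: int   # raw ADC count corresponding to max_value

        class SimulatedTransducer:
            """Plays the role of the transducer module holding its own TEDS."""
            def __init__(self, teds, raw_reading):
                self._teds = teds
                self._raw_reading = raw_reading

            def read_teds(self):
                return self._teds

            def read_raw(self):
                return self._raw_reading

        def interrogate_and_read(transducer):
            """The network-side processor discovers the device from its TEDS and
            converts the raw reading to engineering units without prior configuration."""
            teds = transducer.read_teds()
            span = teds.max_value - teds.min_value
            value = teds.min_value + span * transducer.read_raw() / teds.raw_full_scale
            return f"{teds.manufacturer} {teds.model}: {value:.2f} {teds.units}"

        if __name__ == "__main__":
            pressure_sensor = SimulatedTransducer(
                SimplifiedTEDS("ExampleCo", "P-100", "bar", 0.0, 10.0, 65535),
                raw_reading=32768,
            )
            print(interrogate_and_read(pressure_sensor))
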
        Whilst there are advantages and disadvantages when adopting any of the digital
        networking technologies reviewed for networks of intelligent instrumentation, there are
        clear advantages in adopting those technologies that are established and have been
        adopted by a wide range of industries and users. In terms of safety, established
        technologies have typically been introduced through extensive user testing as well as
        several product development iterations and so it would be reasonable to assume that
        any significant issues would have been identified and fixed. Established technologies
        also benefit from the experience of a wide user base that in many cases is supported
        through user groups and industry forums.
        Newer technologies that have not been widely adopted by industry, and are therefore untried
        in a broad range of applications, could be considered to be at greater risk of harbouring
        unrevealed failure modes. Given that IEEE 1451 has not been widely adopted by the
        process control industry it would be more difficult to justify its use in a safety critical
        process automation application whilst proven technologies such as Foundation Fieldbus
        and ProfiNet/ProfiBus are widely available.
        FDT/DTM [46] is an open asset management technology that is network protocol
        independent. Figure 2-9, taken from the FDT Technical Description [46], shows the
        overall scope of FDT/DTM, which consists of two parts. The Field Device Tool (FDT) is
        an interface that provides access, through a frame application, to asset management
        applications, data storage and engineering resources that are used for developing and
        managing intelligent instruments on a network.




                                  Figure 2-9 Scope of FDT/DTM [46]
        The Device Type Manager (DTM) is the equivalent of a device driver and is supplied by
        the device manufacturer. The device DTMs provide encapsulated data specific to each
        intelligent device while communication DTMs provide communication channels between
        the intelligent field device and the FDT. FDT/DTM offers a number of useful features
        that could be exploited in a reconfigurable system. In-built diagnostics and system
        scans to search for and automatically configure connected devices are particularly
        useful features. FDT/DTM has been taken up by a number of instrument manufacturers
        and the number of instruments offering these features is therefore growing. The
        downside, however, is that this technology is relatively new and is still evolving. This
        evaluation is borne out by an independent survey of device description languages and
        asset management applications [50] in which it was concluded that in concept FDT/DTM
        “offered enhanced data accessibility and extended functionality as used in
        commissioning … and maintenance of… smart devices in a very effective manner”. In
        conclusion, FDT/DTM technology is currently in a strong position to supply the
        necessary functionality to enable the benefits of reconfigurable intelligent hardware to be
        realised if field device manufacturers continue to support and further exploit this
        technology.
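
        The device scan and automatic configuration described above can be outlined as follows.
        The classes below are hypothetical simplifications of the FDT/DTM roles rather than the
        actual FDT interface definitions: a frame application asks each communication channel which
        devices it can see and then instantiates a matching device DTM for each one.

        # Hypothetical, simplified sketch of the FDT/DTM roles described above;
        # the real FDT interfaces are considerably richer than this.

        class ExampleDeviceDTM:
            """Stands in for a manufacturer-supplied device DTM (a device 'driver')."""
            def __init__(self, tag, device_type):
                self.tag = tag
                self.device_type = device_type

            def default_configuration(self):
                # A real DTM would expose the device-specific parameters here.
                return {"tag": self.tag, "type": self.device_type, "mode": "normal"}

        class ExampleCommunicationDTM:
            """Stands in for a communication DTM providing access to one network segment."""
            def __init__(self, segment_name, attached_devices):
                self.segment_name = segment_name
                self._attached_devices = attached_devices

            def scan(self):
                """Return (tag, device_type) pairs for devices found on this segment."""
                return list(self._attached_devices)

        class FrameApplication:
            """Stands in for the FDT frame application that hosts the DTMs."""
            def __init__(self, comm_dtms):
                self.comm_dtms = comm_dtms

            def discover_and_configure(self):
                configured = []
                for comm in self.comm_dtms:
                    for tag, device_type in comm.scan():
                        dtm = ExampleDeviceDTM(tag, device_type)
                        configured.append(dtm.default_configuration())
                return configured

        if __name__ == "__main__":
            segment = ExampleCommunicationDTM("H1-segment-1", [("PT-101", "pressure"), ("FV-203", "valve")])
            frame = FrameApplication([segment])
            for cfg in frame.discover_and_configure():
                print(cfg)
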

2.2.5 Distributed Intelligent Agents
        One of the advantages of the ability to network intelligent instruments has been the
        development of what has become known as Distributed Intelligent Agents (DIAs). DIAs
        are typically used to reduce the processing requirements on traditional control systems
        by transferring decision making logic to intelligent field instrumentation and allowing the
        field instrumentation to collectively reconfigure in order to mitigate system failures.
        Discenzo et al [27] present a number of research projects, including municipal water
        utilities and military ship systems, that are using DIAs to locate and mitigate system
        failures. Discenzo et al [27] conclude that “Autonomous intelligent agents have been
        shown to provide valuable capabilities for distributed control and dynamic
        reconfiguration without the need for centralized control”. However, they concede that
        there are still many issues to address, including the planning and coordination of
        configuration and reconfiguration, establishing alternative control strategies to maintain a
        stable service while the system is reconfiguring, and dealing with a reconfigured
        system with missing or degraded components. Whilst Discenzo et al [27] describe the
        benefits of DIAs in terms of being able to maintain critical processes, i.e. fault tolerance,
        nowhere do they address the very important issues of safety certification or indeed raise
        any comments about such issues.
        The use of DIAs in the “real” world is very limited at present as noted by Pechoucek and
        Marik [28] when they state that “Unfortunately there is a gap between fundamental
        researchers and industrial users of agent technology”. This is an important issue for the
        immediate use of this new technology in a safety related military application – it is yet to
        be widely deployed and experience with the technology is limited. However, from the
        work of Pechoucek and Marik [28] it is interesting to note that there are some synergies
        between the work being undertaken with real time blueprints and DIAs that could be
        taken forward as general principles for implementing real time or dynamic
        reconfiguration of systems. One of the most striking similarities is the use of a layered
        approach both to control and to the authority to reconfigure, with a contract-type arrangement
        between DIAs for managing resources among themselves in the event of a detected
        failure. In terms of arguing the safety of such a system, it is apparent that the authority
        and mechanism for managing a reconfiguration are key to its success.
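
        The contract-type arrangement can be illustrated with a deliberately simplified sketch, which
        is not based on any of the referenced agent systems: when an agent detects a local failure it
        invites bids from its peers, and the peer with the most spare capacity takes on the duty, with
        escalation to supervisory control if no peer can absorb it.

        # Simplified sketch of a contract-style handover between distributed agents;
        # not based on any specific agent framework referenced in the text.

        class FieldAgent:
            def __init__(self, name, capacity, load):
                self.name = name
                self.capacity = capacity
                self.load = load
                self.healthy = True

            def spare_capacity(self):
                return self.capacity - self.load

            def bid_for(self, duty_load):
                """Return a bid (remaining margin after taking on the duty); negative means refuse."""
                return self.spare_capacity() - duty_load

            def take_over(self, duty_load):
                self.load += duty_load

        def reassign_duty(failed, peers, duty_load):
            """Contract-style reassignment: ask peers to bid and award to the best positive bid."""
            bids = [(peer.bid_for(duty_load), peer) for peer in peers if peer.healthy]
            bids = [b for b in bids if b[0] >= 0]
            if not bids:
                raise RuntimeError("no agent can absorb the duty - escalate to supervisory control")
            best_bid, winner = max(bids, key=lambda b: b[0])
            winner.take_over(duty_load)
            return winner

        if __name__ == "__main__":
            pump_a = FieldAgent("pump-A", capacity=100.0, load=60.0)
            pump_b = FieldAgent("pump-B", capacity=100.0, load=30.0)
            pump_c = FieldAgent("pump-C", capacity=100.0, load=80.0)
            pump_a.healthy = False  # pump A's agent detects a failure
            winner = reassign_duty(pump_a, [pump_b, pump_c], duty_load=40.0)
            print(f"duty reassigned to {winner.name}, new load {winner.load}")
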

2.3   Nuclear Industry
        The nuclear industry is a user of a wide range of intelligent instrumentation in operations
        ranging from nuclear reprocessing, power generation and research through to weapon
        design and manufacture.

2.3.1 Safety Concerns
        Due to the high risks involved in working with nuclear material, the industry is highly
        regulated and tends to be in the public eye, especially when accidents occur. The
        Health and Safety Executive (HSE) regulates the civil nuclear industry through an HSE
        organisation called the Nuclear Installations Inspectorate (NII). For the military (Naval)
        nuclear industry, a number of regulatory expectations for computer based systems are
        published by the Chairman of the Nuclear Naval Regulatory Panel (CNNRP) [30]. In
        providing guidance to Regulatory assessors, the CNNRP takes note of guidance from the
        civil authorities, in particular the NII and HSE. In taking such an approach, a commonality
        is developing between the civil and military domains with regard to the regulation of
        computer based systems in nuclear applications.
        Due to the high level of regulation and the risk-averse attitude towards nuclear work, the
        industry has a tendency to be very cautious in the assimilation of new computer based
        technologies such as intelligent instruments and to be pessimistic with regard to the
        safety justification of new technology. In recognising the issues facing the industry with
        regard to intelligent instrumentation, the HSE have initiated a number of research
        projects [29], including:
          •      Development of an approach to the assurance of Smart sensors
          •      Guidelines for assessment of Smart instrument suppliers
        From the list of current research [29], the HSE have not yet started to look at the
        dynamic reconfiguration of systems involving intelligent instrumentation, but are still
        concentrating their efforts on the fundamental issue of arguing the safety of intelligent
        instrumentation as components of a wider control system.
        The HSE has also issued technical directives with respect to Electrical, Electronic and
        Programmable Electronic Systems (E/E/PES) on its Control Systems Technical
        Measures Document web site [31]. This provides an overview of the important areas of
        concern for the HSE and specifically comments on the use of intelligent instruments.
        Again, there is a measure of pessimism with the advice from the HSE to use intelligent
        instruments in “dumb” mode rather than use the digital output offered. This is seen as a
        means of mitigating any possible systematic failure of the instrument due to software.
        This is the real concern of this highly regulated industry, and dealing with the issues
        around systematic failure due to software errors is something the industry is still
        grappling with.

2.3.2 Approach to Intelligent Hardware
        The nuclear industry has to date taken a very conservative approach to intelligent
        instrumentation, particularly with regard to how it deals with safety integrity and
        systematic failure. Current practice in the industry is to take a pessimistic view of 3rd
        party certification to IEC 61508 [5] and to develop its own “IEC 61508 [5] like” approach
        to the assessment of intelligent instruments. Taking guidance from regulatory
        bodies such as the NII and HSE, the industry takes a three-legged approach by applying
        the principles of “Production Excellence”, “Independent Confidence Building Measures”
        and “Compensating Activities” [32].
        Production Excellence is an assessment of the process evidence available from the
        manufacturer of the intelligent instrument in order to claim a safety integrity level in
        accordance with IEC 61508 [5]. The industry has developed a tool called EMPHASIS to
        assess the process evidence that is available from a manufacturer and to identify gaps
        between the available evidence and that required by IEC 61508 [5]. This is heavily
        dependent upon the openness of the manufacturer to share information with the
        assessor. However, for commercial reasons, most manufacturers are not very open and
        meeting the regulators demands for access to source code for example is very difficult if
        not impossible [32].
        Independent Confidence Building Measures are measures, independent of the
        manufacturer, that can be used to enhance the safety integrity claim for the instrument.
        This could include 3rd party certification to IEC 61508 [5], but it would appear that the
        industry is not prepared to trust such certification without some additional assessment of
        its own. Other measures could include activities such as analysis of code, functional
        testing and in-service history [32].
        Compensating Activities are used in cases where there are shortfalls in the process
        evidence. Compensating activities are dependent on the extent of the shortfall, but
        could include statistical testing, use of diverse instruments, audits of manufacturer and
        ALARP arguments.
        The experience of the industry is that one third of all instruments assessed typically fail
        the assessment due to the lack of evidence from the manufacturer and therefore cannot
        be used in safety related applications [32].
        The trustworthiness of information from manufacturers, including 3rd party certification, is a
        key concern to the nuclear industry and this, along with the expectations of the regulatory
        bodies, has made the use of intelligent instrumentation very difficult. However, the
        process industry is progressively moving toward more sophisticated control solutions
        including intelligent hardware and therefore the manufacturing base for instrumentation
        is also moving in the same direction. This has a knock-on effect on the nuclear industry
        as dumb instruments become harder to source and the acceptance of intelligent
        instruments in safety related applications remains difficult to achieve. It is hoped that the
        results of this MSc project will in some way be able to challenge these pessimistic views
        without diminishing the overall safety of such systems.



2.4   Aerospace Industry

2.4.1 Integrated Modular Avionics (IMA)
        The real time dynamic reconfiguration of systems has been taken up by the aerospace
        industry through the introduction of Integrated Modular Avionics (IMA), which has become
        known generically as the Integrated Modular System (IMS) so that the principles of IMA can
        be applied in a wider context than just avionics.
        IMS moves away from the federated approach which relies on a number of independent
        systems, each with its own memory and processing resources, to an approach that uses
        a modular system with shared resources. An IMS implements a number of key
        attributes that enable the sharing of computer resources whilst maintaining the integrity
        of each application using those resources.
        A clearly defined layered architecture is presented in Defence Standard 00-74 part 2 [8]
        and is used to separate system functions into the four layers depicted in Figure 2-10.
        The separation of the architecture into layers allows a modular approach to be taken
        with regard to the applications that are designed to run on the system and the definition
        of the interfaces between the layers of the architecture.




                               Figure 2-10 IMS Layered Architecture [7]
        Resource sharing and the partitioning of resources between the various components of
        the system are key to the successful implementation of an IMS. Partitioning
        enables applications to share common resources, even at differing safety
        integrity levels, and therefore opens up the prospect of dynamic
        reconfiguration of an IMS in the event of a failure. For example, it would theoretically be
        possible to move a software application to an alternative operational processor in the
        event of a processor hardware failure and, if need be, to halt less critical processes
        running on the available processor to enable the process with the higher criticality to
        run. Therefore, IMA systems can be implemented in such a way that safety critical
        functions are fault tolerant.
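
        The idea of relocating an application and, if necessary, halting a less critical one can be
        sketched as follows. The single "capacity" figure per processor and the example applications
        are illustrative assumptions only and are not drawn from any referenced IMA implementation.

        # Illustrative-only sketch of moving an application off a failed processor,
        # shedding lower-criticality work if required to make room.

        class Processor:
            def __init__(self, name, capacity):
                self.name = name
                self.capacity = capacity
                self.apps = []   # list of (app_name, demand, criticality); lower number = more critical

            def used(self):
                return sum(demand for _, demand, _ in self.apps)

            def free(self):
                return self.capacity - self.used()

        def relocate(app, target):
            """Place `app` (name, demand, criticality) on `target`, halting strictly less
            critical applications if necessary. Returns the list of halted applications."""
            name, demand, criticality = app
            halted = []
            # Shed the least critical work first until the application fits.
            for victim in sorted(target.apps, key=lambda a: a[2], reverse=True):
                if target.free() >= demand:
                    break
                if victim[2] > criticality:      # only halt strictly less critical apps
                    target.apps.remove(victim)
                    halted.append(victim)
            if target.free() < demand:
                raise RuntimeError("cannot host application without breaching capacity")
            target.apps.append(app)
            return halted

        if __name__ == "__main__":
            standby = Processor("processor-B", capacity=100.0)
            standby.apps = [("trim-logging", 50.0, 4), ("crew-display", 40.0, 3)]
            halted = relocate(("plant-protection", 60.0, 1), standby)
            print("halted:", [h[0] for h in halted])
            print("now hosting:", [a[0] for a in standby.apps])
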
        Nicholson [12] makes the point that health management is a key enabler of this level of
        fault tolerance and dynamic reconfiguration when he states that “It is possible to
        reconfigure such [IMS] systems to provide continued functionality when an element of
        the system fails. To achieve this aim a number of pre-requisites must be in-place: the
        ability to determine when a failure has occurred, the appropriate configuration to move to
        and the ability to safely transfer from one configuration to another” [12]. In general, IMS
        provides health management through the monitoring of the state of each of the layers in
        the model depicted in Figure 2-10. As part of this health monitoring, Nicholson [12]
        identifies the allocation of authority to reconfigure a system on failure as an issue that
        would need to be dealt with in order to have assurance that the system remained
        stable during the reconfiguration process. It is suggested that one way of solving this
        problem might be to allocate authority for health monitoring based on the layered
        architecture of the IMS approach. The levels of authority envisaged by Nicholson [12] are:
         •      Software process / hardware component monitoring (CHM)
         •      Module / application level health monitoring (MHM)
         •      Partition /application level health monitoring (PHM)
         •      Global level health monitoring (GHM).
        However, there is an area concerning smart sensors that Nicholson [12] touches on but
        does not consider in great detail. The trend in the process industry is to move towards
        software-driven smart sensors that possess the ability to reconfigure themselves based
        on internal diagnostics or health monitoring. In effect this level of health monitoring
        could be considered to be a fifth level below the external health monitoring of the sensor
        hardware by the IMS.
        This fifth level provides an additional place in which authority to reconfigure may be
        allowed to reside and which must be taken into consideration when attempting to argue
        the safety of such a system. However, the placement of authority within this fifth layer
        needs to be carefully managed as it would potentially be hazardous to allow a low level
        field device to autonomously change the configuration of the system based on its view of
        its own state and the state of the system. It would therefore be prudent to place the
        authority to reconfigure at a higher system level where the health management
        application has a system wide view. Moving authority to reconfigure to higher levels
        could also be used to address any fault with a field device that caused it to continually
        move from “in service” to “out of service” (babbling) and therefore continually and
        uncontrollably change the configuration of the system.
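
        A simple illustration of holding reconfiguration authority above the field device, and of
        suppressing a babbling device, is sketched below; the thresholds and the structure are
        illustrative assumptions rather than a description of any particular health management
        application.

        import time

        # Illustrative sketch: a higher-level health manager holds the authority to
        # reconfigure and refuses to act on a device that changes state too often.

        class HealthManager:
            def __init__(self, max_state_changes, window_s):
                self.max_state_changes = max_state_changes
                self.window_s = window_s
                self._state_changes = {}     # device -> list of change timestamps
                self.quarantined = set()

            def report_state_change(self, device, now=None):
                """Called when a field device reports a move between 'in service' and
                'out of service'. Returns the action taken by the health manager."""
                now = time.monotonic() if now is None else now
                history = self._state_changes.setdefault(device, [])
                history.append(now)
                # Keep only the changes within the sliding window.
                history[:] = [t for t in history if now - t <= self.window_s]
                if device in self.quarantined:
                    return "ignored (device quarantined)"
                if len(history) > self.max_state_changes:
                    self.quarantined.add(device)
                    return "quarantined babbling device; flagged for maintenance"
                return "reconfiguration evaluated at system level"

        if __name__ == "__main__":
            manager = HealthManager(max_state_changes=3, window_s=60.0)
            for i in range(5):
                print(manager.report_state_change("flow-sensor-12", now=float(i)))
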

2.4.2 IMA Reconfiguration and Certification Issues
        Although certification issues are identified, Nicholson [12] does not provide a detailed
        approach to the certification of an IMS. However, there is a very helpful overview of
        some of the issues that must be overcome to enable a safe reconfiguration of an IMS to
        take place. Typically, failures in the health monitoring system of an IMS will be either
        ones of omission, the failure to detect a problem, or ones of commission, where the health
        monitor incorrectly detects a failure that does not exist. In both cases there is a
        high probability that the health monitoring system would take the IMS into an incorrect
        and potentially dangerous configuration. In addition to these types of failure, there are
        also the cases of early detection and late detection of a problem or failure by the health
        monitoring system, and Nicholson [12] suggests that the SHARD analysis technique
        would be suitable for the analysis of the failure classes mentioned.
        Jolliffe [13] takes the principles of Integrated Modular Avionics and explores the
        possibilities of arguing the safety of reconfiguration through the use of blueprints.

                                   Figure 2-11 IMA Blueprints [13]

        Blueprints are described by Jolliffe [13] in terms of a definition of their content, as shown
        in Figure 2-11. Each blueprint holds information about the configuration of the system and
        is specific to the aircraft in question. For example, the hardware blueprint would typically
        hold information about the resources that are available to the system such as memory,
        processors and communications. Similarly, the software blueprint typically holds information
        about the software applications that are available to the system and their current
        configuration, while the configuration blueprint holds information about how hardware
        and software may be configured to work together. Mapping rules and optimisation are
        then applied so that the most appropriate configuration of software and hardware is
        chosen by the system to load as the System Blueprint. In effect, the System Blueprint is
        the run time blueprint that the system uses during aircraft operations.
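
        The blueprint idea can be outlined with hypothetical data structures (these are not the
        blueprint formats defined for IMA): the hardware and software blueprints describe what is
        available, the configuration blueprint describes the permitted pairings, and a simple mapping
        rule selects one placement per application to form the run time System Blueprint.

        # Hypothetical data structures sketching the blueprint idea; not the actual
        # IMA blueprint formats.

        hardware_blueprint = {
            "cpu-1": {"memory_mb": 512, "available": True},
            "cpu-2": {"memory_mb": 256, "available": True},
        }

        software_blueprint = {
            "ballast-control": {"memory_mb": 200, "criticality": 1},
            "trend-logging":   {"memory_mb": 100, "criticality": 3},
        }

        # Configuration blueprint: which applications are permitted on which processors.
        configuration_blueprint = {
            "ballast-control": ["cpu-1", "cpu-2"],
            "trend-logging":   ["cpu-2"],
        }

        def build_system_blueprint():
            """Apply a simple mapping rule (most critical applications placed first, on the
            permitted processor with the most free memory) to produce the run-time blueprint."""
            free = {cpu: info["memory_mb"] for cpu, info in hardware_blueprint.items() if info["available"]}
            system_blueprint = {}
            apps = sorted(software_blueprint, key=lambda a: software_blueprint[a]["criticality"])
            for app in apps:
                need = software_blueprint[app]["memory_mb"]
                candidates = [cpu for cpu in configuration_blueprint[app] if free.get(cpu, 0) >= need]
                if not candidates:
                    raise RuntimeError(f"no valid placement for {app}")
                chosen = max(candidates, key=lambda cpu: free[cpu])
                free[chosen] -= need
                system_blueprint[app] = chosen
            return system_blueprint

        if __name__ == "__main__":
            print(build_system_blueprint())
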
        Jolliffe [13] introduces three methods of IMA reconfiguration based on the use of the
        blueprints described above, which are defined as follows:
        Manual – this scheme is used to reconfigure a system from a known state to a new
        known state that has been manually selected from a list of configuration options. This
        scheme appears to be very limited in its approach and would require human
        intervention. It is also apparent that this approach is very inflexible, fixing the number
        of possible configurations to those that can reasonably be foreseen and accommodated.
        Although it is perhaps easier to formulate a safety argument for this
        approach, the through-life costs of updating the system for new operational roles and
        replacement components would be an obvious downside, alongside the possibility of
        there not being an optimal configuration for unforeseen circumstances.
        Static (on the ground) – this scheme is used to automatically reconfigure an aircraft on
        the ground, probably between missions. As far as a submarine application is concerned
        the operational constraints of a 90 day mission and the requirement for continuous plant
        operation during this period would make this scheme of little benefit. However, the
        prospect of automatic reconfiguration is appealing and Jolliffe [13] highlights some
        important issues that need to be addressed, including the choice of algorithm for the
        automatic optimisation and the number of possible permutations along with the need to
        undertake some amount of testing to ensure the run-time blueprint is valid. For a
        submarine, the time between missions is typically measured in weeks rather than hours
        and therefore the option to reconfigure a control system is less time critical. However, the
        philosophy of automatically changing a configuration to match a mission cannot readily be
        carried across to a submarine application, as a submarine has a limited number
        of operational purposes, none of which would necessitate a reconfiguration of the control
        system. The most likely scenario for a submarine is to require system reconfiguration in
        response to a failure during a mission as the time between missions is typically used for
        maintenance, including fault rectification.
        Dynamic – this scheme is used to automatically reconfigure the system while the
        aircraft is in flight. Jolliffe [13] considers this to be the most difficult to implement, both
        technically and from a safety certification aspect. Problems in this area include
        overcoming the technical challenges of dynamic reconfiguration in the very hard real
        time environment typically experienced by a military jet, as well as the problems
        associated with safely making the transition between states. However, in the submarine
        environment the timing constraints are less hard and there may be greater opportunities for
        dynamic reconfiguration. The advantages of being able to use dynamic reconfiguration
        for fault tolerance and health management are very appealing in a hostile environment
        like the sea, where opportunities for fault finding, maintenance and system downtime are
        very limited. In such an environment there may be an opportunity for a trade-off between
        maintaining the operation of a system, and therefore the whole platform, and a reduction
        in the overall safety margin.

2.4.3 Blueprints in a wider context
        The use of blueprints as a mechanism for the implementation of system reconfiguration is
        interesting as it provides a clear allocation of authority to the controlling system, i.e. the
        global health monitoring layer, which would use the principle of mapping rules and
        optimisation to create a suitable run time system blueprint. However, Jolliffe [13] does
        not consider the levels at which reconfiguration could take place and certainly does not
        consider reconfiguration at the lowest hardware level by smart instruments. If the
        blueprint approach were to be adopted for a submarine platform control system then a
        consideration of the levels at which authority is given for reconfiguration would have to
        be factored into the safety argument, given that the aspirations of the shipbuilder are
        toward smarter hardware and reduced reliance on the submarine crew.

2.4.4 IMA Certification Issues
        Jolliffe [13] argues that the safety of reconfiguration through the use of blueprints may be
        achieved through a strategy that involves arguing the safety of the components of the
        reconfiguration system, i.e. the Hardware, Software and Configuration Blueprints along
        with arguments about the mapping rules. In essence, at the highest level, Jolliffe [13]
        argues that “the correct combination of safe blueprints and safe mapping rules will
        provide a safe IMA transition”. He then goes on to provide a justification for this high
        level statement through a set of GSN diagrams that decomposes the argument into
        strategies that have been designed to argue the main aspects of safety, including,
         •      Software safety
         •      The sufficiency of the pre-determined configurations to meet the operational
                requirements of the application
         •      The sufficiency of the Run-Time Blueprint configurations
         •      The sufficiency of the mapping rules and post-reconfiguration testing and sanity
                checks
         •      The sufficiency of the chosen algorithm in terms of its predictability.
        Jolliffe [13] introduces the use of safety case contracts as part of the development of the
        GSN safety argument. The use of contracts would be very useful when dealing with
        plug and play type COTS hardware, which is likely to bring significant advantages to the
        through-life maintenance of large complex control systems. The prospect of being able to
        interface a safety case module for a smart hardware component into the system safety
        case, using a contract to ensure that the safety argument is maintained, is seen as a
        possible way forward.

2.4.5 Dynamic Reconfiguration of IMA - Certification Issues
        Hollow, McDermid and Nicholson [14] address some of the problems arising from the
        certification of dynamically reconfigurable IMA by proposing an alternative method to
        that currently used for traditional federated systems. Their approach does not seek
        certification of each discrete configuration from a finite but very large number of possible
        configurations, but rather concentrates on the certification of a family of configurations.
        In this area of work, a family of configurations is described as a group of configurations
        that are equivalent to one another. Equivalence is based on system properties such
        as resource usage, timing properties, time to failure and cost of configuration. The
        notion of equivalence is used in heuristic search algorithms in order to determine which
        configurations are equivalent and can therefore be grouped together. In the example
        provided by Hollow et al [14], a Simulated Annealing algorithm was used with an off-line
        model of a system to create a baseline configuration and then, by introducing single
        faults into the model, to create a table of equivalent configurations such that the system
        remains fault tolerant to a single failure.
        Hollow et al [14] conclude that by using off-line modelling and analysis a baseline of
        deterministic configurations can be established. They also conclude that, by arguing the
        equivalence of a family of configurations with the baseline, certification should only be
        required for the baseline.
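
        The grouping of configurations into families of equivalents can be illustrated with a short
        sketch. The equivalence test below, which simply rounds a handful of properties into coarse
        bands, is an illustrative stand-in for the heuristic search and property comparisons described
        by Hollow et al [14]; it is not their algorithm.

        from collections import defaultdict

        # Illustrative stand-in for grouping configurations into equivalence families
        # based on selected system properties; not the algorithm used by Hollow et al.

        def equivalence_key(config):
            """Map a configuration's properties into coarse bands; configurations that
            fall in the same bands are treated as members of the same family."""
            return (
                round(config["resource_usage"], 1),          # e.g. fraction of capacity used
                round(config["worst_case_latency_ms"] / 5),  # 5 ms latency bands
                round(config["reconfig_cost"] / 10),         # cost bands
            )

        def group_into_families(configs):
            families = defaultdict(list)
            for cfg in configs:
                families[equivalence_key(cfg)].append(cfg["name"])
            return dict(families)

        if __name__ == "__main__":
            candidate_configs = [
                {"name": "baseline", "resource_usage": 0.62, "worst_case_latency_ms": 12.0, "reconfig_cost": 20.0},
                {"name": "fault-A",  "resource_usage": 0.64, "worst_case_latency_ms": 11.0, "reconfig_cost": 22.0},
                {"name": "fault-B",  "resource_usage": 0.81, "worst_case_latency_ms": 25.0, "reconfig_cost": 55.0},
            ]
            for key, members in group_into_families(candidate_configs).items():
                print(key, "->", members)
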
        This approach is of interest in so far as it informs the development of an approach to
        dynamic reconfiguration at the control system level. However, the problem of authority
        assignment between smart hardware and the overarching controlling system has not been
        considered, nor has a large, complex process control system of the kind most likely
        to be encountered in a submarine application. There are other differences between a
        submarine and an IMA application in that timing is often more critical in aircraft applications,
        although IMA systems are smaller and are not distributed over as wide an area as is
        common in a submarine application. Therefore, the principle of off-line
        modelling in order to establish a family of configurations could be used with some
        modification and additional consideration of application-specific issues.

2.5   Regulatory Issues

2.5.1 General
        The primary safety standard that is applicable to UK military projects is Defence
        Standard 00-56 [2]. Issue 4 is the latest revision of this standard. It is expected that the
        design, development and supply of future submarines will have to comply with the
        requirements of this standard and therefore it is of utmost importance. The standard
        moves away from safety integrity levels (SILs) and towards ALARP and goal-setting
        techniques. With this move, the older hardware- and software-specific safety standards
        Def Stan 00-54 [3] and Def Stan 00-55 [4] have been superseded, although some guidance
        on the application of Defence Standard 00-56 [2] can be found in Part 2 of the standard.
        As Defence Standard 00-56 [2] is aimed at the system level, the safety of the individual
        hardware units still needs to be addressed. In the commercial process industry, standards
        such as IEC 61508 [5] and its derivatives are used to assess and certify complex electronic
        and programmable hardware. IEC 61508 [5] does use SILs but could still be used at the
        individual equipment (PLC, sensor and actuator) level, in conjunction with Defence Standard
        00-56 [2] at the system level, to provide sufficient evidence for a safety case to be generated.
        When seeking to propose a method of undertaking a safety analysis of a real time or
        dynamically reconfigurable process control system, the assumption, from a certification
        perspective, is that pre-certified components such as smart hardware, computer resources
        and digital networks can be successfully justified as part of an ALARP-based safety case.

2.5.2 Application to Military Systems
        For military submarines, JSP 430 [6] is the platform level safety standard that is
        contractually applicable. JSP 430 [6] is helpful in providing an overview of the principles
        of safety that have been adopted by the MoD with regard to naval ships. The policy is
        defined in terms of the principles of ALARP and fits well with the approach to system
        safety taken by Defence Standard 00-56 [2]. However, JSP 430 [6] does not address
        the use of specific technologies such as sensors and actuators and their associated
        controllers when considering the safety of the platform. Whilst this presents a level of
        uncertainty in terms of compliance, at this level it is hoped that a safety argument can be
        established that brings together JSP 430 [6], Defence Standard 00-56 [2] and IEC
        61508 [5] in order to demonstrate that risks associated with dynamic reconfiguration of
        control systems can be reduced to ALARP.
        In the military avionics industry, from which this project is looking to transfer knowledge,
        the primary safety standard is again DS 00-56 [2] at the system level with technology
        specific standards at lower system and equipment levels. Def Standard 00-74 [7] is a
        new standard that has been introduced by the UK MoD in order to specify “the
        requirements to define and validate a set of open architecture standards, concepts &
        guidelines for Advanced Avionics Architectures.” [7]. The standard does this by
        providing a specification for each major part of a software architecture model based on
        the ASAAC Three Layer Software Architecture of “Application Layer”, “Operating System
        Layer” and “Module Support Layer”. Included in this standard are the requirements for
        implementing Real Time Blueprints (RTBP), but it does not address the safety or certification
        of such a system in any detail. Some safety issues that should be considered are
        mentioned throughout the text but these deal mainly with concerns about resource
        management and not about overall certification of RTBP and the associated dynamic
        reconfiguration of an IMS using such technology. By presenting the ASAAC three layer
        architectural model, the standard does not deal with low level hardware issues around
        sensors and actuators, whether intelligent or otherwise. This is not unexpected as the
        lowest layer of the model is a layer of abstraction above the interface between the IMS
        and the physical hardware.
        Commercial avionic system safety certification is based on standards such as DO-178B
        [9] for software and DO-254 [10] for complex electronic hardware.
        DO-178B [9] is well respected and has been used in the civil aerospace industry for a
        number of years. The standard itself is prescriptive in its approach to software safety
        and as such takes a similar approach to the older UK military safety standards. In taking
        this type of approach the standard is better suited to bespoke applications than to
        industrial COTS components.
        DO-254 [10] has been produced by the RTCA in order to address the development of
        complex electronic hardware for the aviation industry. Complex electronic hardware
        includes such items as Programmable Logic Devices (PLDs), Application Specific
        Integrated Circuits (ASICs) and electronic circuit boards that reside at the lowest
        replaceable unit (LRU) level. The application of DO-254 [10] is very similar to the
        equivalent functional safety standards in that hardware design assurance levels (HDALs)
        are used, in a similar way to SILs, to drive the hardware design and development processes
        employed by the hardware developer. DO-254 [10] defines five HDALs from A to E that
        correspond to hazard consequences ranging from “A”, being “Catastrophic”, through to
        “E”, which is “No effect”. DO-254 [10] addresses the issue of COTS components in a very
        similar fashion to other safety standards, by addressing the quality, the developer's track
        record and the in-service data that relate to a specific COTS component. For this project,
        where the focus on certification is toward the use of pre-certified components, the use of
        DO-254 [10] as a standard to apply to the development of complex hardware does not
        provide any significant advantage over the use of other safety standards.

2.5.3 Certification and Standards Compliance Issues
        The difficulty in arguing that systems which implement dynamic reconfiguration
        are tolerably safe is that they typically fall into the realms of both hardware and software
        safety and, as such, are not particularly well served by the prescriptive approach still
        taken by the likes of DO-178B [9] and the now superseded Defence Standards 00-54
        [3] and 00-55 [4]. Defence Standard 00-56 [2] has raised the level to that of the system
        and therefore a less prescriptive approach can be taken, based on risk and risk
        reduction rather than strict adherence to a set of mandatory practices. This is
        particularly important in situations where the controlling system may be a mixture of
        COTS and bespoke development and the sensing and actuating components of such a
        system are pure COTS. However, this in itself raises the problem of designing, building
        and justifying a safety related control system where a significant number of components
        are supplied pre-certified to a predetermined SIL in accordance with IEC 61508 [5] whilst
        the system requires a more bespoke ALARP approach in accordance with DS 00-56 [2].
        The problems of mixing SILs and ALARP have been discussed by Shuttleworth [11], who
        proposes various methods of ensuring that ALARP principles are maintained within a safety
        management environment dominated by the use of SILs. For COTS
        components, aiming for a higher SIL than is strictly required would be one
        way of addressing this issue, if such a component were available at the higher SIL. For
        example, a sensor serving a SIL 2 function would need to be certified to SIL 3 in order to
        provide a measure of margin and therefore support an argument that the risk has been
        reduced to ALARP.
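
        The "one SIL higher" rule suggested above can be expressed as a trivial check; the function
        and the component data below are invented purely for illustration.

        # Trivial illustration of the "certify one SIL higher than required" rule
        # discussed above; the component data is invented for the example.

        def meets_alarp_margin(required_sil, certified_sil, margin=1):
            """Return True if the component's certified SIL exceeds the SIL required
            by the function by at least the chosen margin."""
            return certified_sil >= required_sil + margin

        if __name__ == "__main__":
            components = [
                {"tag": "PT-101", "required_sil": 2, "certified_sil": 3},
                {"tag": "FV-203", "required_sil": 2, "certified_sil": 2},
            ]
            for c in components:
                ok = meets_alarp_margin(c["required_sil"], c["certified_sil"])
                print(f"{c['tag']}: margin {'met' if ok else 'NOT met'}")
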

2.6   COTS Acquisition and Safety Issues
        Typically, most process control systems consist of a number of COTS components that
        are brought together and integrated into a cohesive system to provide specified
        functionality. The MoD's desire to move away from costly bespoke product development
        fits well with the argument that a Submarine iPMS is no more complex than an industrial
        process control system and that COTS components could therefore be used to cost-effectively
        provide such a system and offer greater flexibility for through-life support.
        However, when attempting to argue the safety of a system that has been designed to
        utilise COTS components there are a number of key issues that need to be addressed.
        These are typically,
         •      The trustworthiness of the product, i.e. can the product be trusted to consistently
                perform as specified to a desired level of reliability. Generally, products that
                have been on the market for some time and have a proven track record are more
                likely to be justifiable than a new product that is unproven in the field.
         •      Little or no access to sufficient information about a product due to commercial
                sensitivities in a highly competitive market place. This is one of the most common
                problems faced when attempting to comply with the current safety standards.
                However, it is common practice to argue the safety of a product by employing a
                strategy based on what is commonly called Product Excellence. This is usually the
                case for software based products, where the normal strategy, arguing that adequate
                development processes have been employed in order to reduce the probability of
                systematic error, cannot be used due to the lack of sufficient information about the
                software design and development process.
         •      The COTS product rarely matches the exact requirement specification of the system
                of which it is to form a part. It is almost certainly the case that the COTS product
                will include additional functionality that is not required and may lead to unintentional
                behaviours in the wider system or could lead to unforeseen failure modes. It is
                unlikely that a COTS product would have less functionality than required as this
                would make it unsuitable for its intended use.

2.6.1 Justifying the use of COTS Components
        There have been several strategies put forward for justifying the use of COTS
        components in safety critical applications, and specifically for dealing with the problem
        of COTS software. For example, guidance provided in safety standards such as IEC
        61508 [5] and the now superseded Defence Standard 00-55 [4] tend to approach the
        problem from the position of arguing the “goodness” of the product by determining the
        construction of the component through reverse engineering or by arguing safety by the
        analysis of in-service data where this is available. With the publication of the newer goal
        based Defence Standard 00-56 [2] there is much greater emphasis on the quality of
        evidence and a broader recognition that the system integrator's processes with regard to
        component selection should be taken into consideration when arguing the safety of a
        system.
        The contribution made by the processes for component selection is recognised by Ye
        [33] in his proposal for a “systematic approach towards COTS software component
        acquisition for the development of safety critical systems”.
        Ye [33] describes a process whereby COTS component selection is based on clearly
        defined requirements derived from the system architecture and on the evaluation of the
        COTS component based on clearly defined evaluation criteria. Although not explicitly
        shown by Ye [33] in process form, the component acquisition process may be drawn as
        shown in Figure 2-12 which includes the main points from Ye’s COTS acquisition
        framework, i.e. software architecture, reference COTS component, COTS component
        evaluation and system safety justification.




                                Figure 2-12 Component Acquisition Process


        Although Ye [33] applies the COTS acquisition framework to software components, it
        would be useful to apply the same principles to a broader definition of a COTS
        component that includes hardware, and for the purposes of this project the intelligent
        COTS hardware components as well as the more traditional process control COTS
        components at each layer in the submarine control architecture, i.e. HMI Computers,
        Logic Controllers (PLCs) and network infrastructure. It may be useful to consider how
        components at all levels in the architecture interact with one another and how this might
        be affected by dynamic reconfiguration of components at various layers.
        What is particularly significant in the work undertaken by Ye [33] is the concept of a
        reference COTS component that is used to “establish the criteria for COTS component
        evaluation and selection” [33]. As the process industry drives towards the world of “plug
        and play” functionality fuelled by the advances in intelligent instrumentation it should be
        possible to define a reference COTS component for each candidate piece of equipment
        in the system design and as Ye [33] suggests, use this to establish the selection criteria
        for the actual COTS component as shown in Figure 2-13.


        [Figure 2-13 (diagram): the functional, non-functional and safety requirements and the
        constraints feed the definition of a Reference COTS Component; together with the
        evaluation and selection criteria and a Component Criticality Analysis (CCA, as defined
        by Ye [33]) this is used to evaluate candidate COTS components 1 to n.]

                            Figure 2-13 Diagrammatic View of COTS Evaluation


        The principles of COTS evaluation and selection defined by Ye [33] could then be used
        to select a number of components (intelligent hardware) that could be substituted (plug
        and play) for defective components, either by direct replacement (maintenance
        activities) or by dynamic system reconfiguration to utilise less critical hardware in order
        to provide fault tolerance.
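        As a purely illustrative sketch of how a reference COTS component might be used to screen candidate devices, the following Python fragment compares candidates against the requirements captured in the reference component. The attribute names and criteria are assumptions made for illustration and are not taken from Ye [33].

# Illustrative sketch of screening candidate intelligent field devices against
# a reference COTS component. Attribute names and criteria are assumptions
# made for illustration, not taken from Ye [33].

from dataclasses import dataclass

@dataclass
class ReferenceComponent:
    functions: set          # essential functions required by the application
    protocol: str           # e.g. "Foundation Fieldbus", "ProfiBus"
    min_sil: int            # minimum SIL certification required
    max_update_ms: int      # worst case update period acceptable to the application

@dataclass
class CandidateComponent:
    name: str
    functions: set
    protocol: str
    certified_sil: int
    update_ms: int

def meets_reference(ref: ReferenceComponent, cand: CandidateComponent) -> bool:
    """A candidate is acceptable only if it provides all essential functions,
    uses the required protocol, is certified to at least the required SIL
    and meets the timing requirement."""
    return (ref.functions <= cand.functions
            and cand.protocol == ref.protocol
            and cand.certified_sil >= ref.min_sil
            and cand.update_ms <= ref.max_update_ms)

ref = ReferenceComponent({"measure_pressure", "self_diagnose"},
                         "Foundation Fieldbus", 2, 500)
candidates = [
    CandidateComponent("Sensor A", {"measure_pressure", "self_diagnose"},
                       "Foundation Fieldbus", 3, 250),
    CandidateComponent("Sensor B", {"measure_pressure"}, "ProfiBus", 2, 100),
]
shortlist = [c.name for c in candidates if meets_reference(ref, c)]
print(shortlist)  # ['Sensor A']

        Candidates that pass such a screen would then be subject to the Component Criticality Analysis and the full evaluation criteria before a selection is made.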

2.7   Safety Tactics
        From a survey of literature dealing with the technical aspects of intelligent hardware and
        real time reconfiguration it is apparent that very little has been documented about the
        approach that can be taken to address safety when implementing such technology.
        However, there is an approach to the safety aspects of software architecture design that
        has been developed by Wu and Kelly [41] from principles originally developed by the
        Software Engineering Institute at Carnegie Mellon University.
        Figure 2-14 shows in diagrammatic form the grouping of several safety tactics for
        software architecture design.




                  Figure 2-14 Safety Tactics for Software Architecture Design [41]
        The hierarchical safety tactics developed by Wu and Kelly [41] are grouped into three
        distinct high level areas of concern, “Failure Avoidance”, “Failure Detection” and “Failure
        Containment”. Each main safety tactic is then decomposed into lower level safety
        tactics that are designed to address specific types of failure in software by taking
        measures in the software architecture. Taking the three high level safety tactics and
        applying them to a process automation system is straightforward, as the tactics are
        generic and could reasonably be adapted to a number of different
        architecture types. The safety tactics for software architecture design could also be
        broadened to encompass not only architectural aspects of a design but also the
        processes by which safety is argued in a wider context than currently shown in Figure
        2-14. One such process is that used to specify and acquire COTS components for
        safety related systems and another is the process for creating safety case contracts
        between components, both of which are explored in greater depth in Section 3.4.

2.8   Literature Survey Conclusions
        The purpose of this literature survey was to review a range of current and past work
        being undertaken across a wide range of industries in the area of process control and
        dynamic reconfiguration. This literature survey has also sought to investigate the ways
        in which the dynamic reconfiguration of intelligent hardware might be safely
        implemented as part of a process control system.
        The regulatory aspects of justifying the use of COTS products, and in particular intelligent
        hardware, were reviewed with the conclusion that, with care, a way forward could be
        taken that brings together the military and civil safety standards. This was important in
        the context of the project as the use of civil COTS components within a military
        application is one way of reducing cost. However, it is recognised that it is important to
        justify the safety of any use of COTS. The literature survey considered the use of
        IEC 61508 certified COTS in a goal based ALARP environment.
        When considering the safety issues of implementing real time dynamic reconfiguration
        the approach taken by the aerospace industry with regard to the certification of IMA
        systems was considered. It was concluded that this work could provide a basis for the

        development of safety tactics for real time reconfiguration of an industrial process control
        system.
        With regard to process control systems the literature survey took a broad view of the
        current technologies as well as the safety issues associated with recent advances in
        intelligent field instrumentation. From the literature survey it was determined that there
        is a significant body of existing and ongoing research into the technology and benefits of
        intelligent hardware that suggests that not only is dynamic reconfiguration possible but
        that it can also be highly beneficial in terms of improving fault tolerance. However, whilst
        there is a considerable amount of information about the technical capabilities of
        intelligent hardware and reconfiguration, the literature survey found very little evidence
        that the problems faced by those wanting to implement such technology in safety critical
        applications had been satisfactorily addressed.
        The work undertaken by Fan Ye [33] in developing a method of justifying COTS software
        components was reviewed and was considered as a candidate for further development
        toward application to the acquisition of intelligent hardware. In considering the work
        of Ye, the application of COTS acquisition and safety justification within a safety tactics
        framework was considered an important factor in the overall strategy for the use of real
        time reconfigurable intelligent hardware.
        In conclusion, the literature survey has revealed an opportunity to bring together the
        work undertaken in the area of IMA safety analysis, the development of safety contracts
        and the work associated with COTS software component acquisition and apply these
        techniques to the area of reconfigurable intelligent hardware. It is therefore proposed to
        explore these themes further in the development and application of safety tactics for the
        justification of real time reconfigurable intelligent hardware in the context of process
        control.




3 System Reconfiguration – A Tactical Approach to Safety
  Assurance
        The aim of this chapter is to explore the problems associated with the safety justification of
        industrial process control systems that are reconfigurable in real time. Once a set of safety
        problems has been established, this chapter will go on to describe a number of system safety
        tactics that could be used to address generic safety problems associated with plug and play
        reconfigurable systems. This method will then be applied to a case study in Chapter 4.

        The scope of this chapter is,

         •      An introduction to industrial process control and the use of intelligent field devices such as
                sensors and actuators in the context of this project.
         •      A Generic Architecture for Process Automation and Reconfiguration exploring the
                application of the three layer IMA architecture to a process automation system. The section
                also discusses the benefits of applying a layered architecture in terms of the use of safety
                tactics for reducing the risks associated with plug and play technology.
         •      General Safety Issues with system reconfiguration providing an overview of some general
                failure modes of reconfigurable process automation systems and the impact on safety. In
                this section, three high level functions are considered; Plug and Play at the Field Devices
                layer, Device Reconfiguration at the Field Devices layer and System Reconfiguration at the
                Application Layer.
         •      A Tactical Approach to Safety Assurance introducing the concept of safety tactics for
                software architecture design and how it can be modified for the application of intelligent field
                devices within a reconfigurable system.
         •      A proposed method of Instantiation of the Tactical Approach that introduces a lifecycle
                approach to the application of safety tactics.
         •      A description of how the use of safety tactics and safety contracts can be brought together to
                form a safety justification.

3.1   Process Control in Context
        Process control systems as introduced in Section 2.2 have been widely used across
        industry for many years, particularly in the automation of process plants. In the context
        of developing a tactical approach to the safety assurance of reconfigurable process
        control devices the following working definition of process automation is used.
        “The use of computer controlled sensors and actuators in a system to automate
        repetitive tasks and to maintain process variables within set limits”
        This is a useful definition as it essentially describes a system in which some physical
        process is monitored and controlled in order to ensure that some process variable
        (pressure, flow, temperature, level, velocity etc) is maintained within defined parameters.
        The implementation of a process automation system can take one of several different
        forms depending on the specific application. For simple installations, standalone control
        of a single process variable can be achieved with limited resources. For more complex,
        widely distributed applications, Distributed Control Systems (DCS) are typically used
        with multiple networked controllers, field devices and supervisory stations forming a
        large control network with distributed data processing. It is at this distributed system
        level that the safety tactics for plug and play reconfiguration will be discussed.
        Process automation naturally lends itself to a layered analysis due mainly to its modular
        construction from component parts as well as its adoption of open standards and clearly
        defined interfaces between system components. It is therefore proposed to take the
        layered approach to system safety analysis from the established discipline of IMA and to
        apply the techniques to a process automation system that includes aspects of dynamic
        reconfiguration. As well as applying an IMA approach to safety analysis, a number of
        safety tactics will be devised from the IMA approach to safety assurance as well as from
        work undertaken to justify COTS software acquisition. These techniques will then be
        applied to a process automation system to demonstrate that the risks posed by a
        reconfigurable system employing “plug and play” technology can be reduced to a
        tolerable level.

3.2   A Generic Architecture for Process Automation and Reconfiguration

3.2.1 Introduction to a Generic Architecture for Process Automation and Reconfiguration
        In general terms the architecture, including hardware and software, of a process
        automation system could be described by reference to at least six layers,
         •      Process Control Applications – Typically bespoke application specific software that
                is designed to provide process automation control sequences and human to
                computer interface (HCI). This layer provides the human supervisory functions that
                are typically found in control rooms as well as the bespoke control algorithms that
                are typically found in automated controllers (PLCs for example).
         •      Operating system and System Management – The operating system provides an
                interface between the application layer and the hardware and is used to provide
                support services such as process and memory management, network support and
                file management. In many systems there are also COTS system management
                software packages that are supplied with system components and that enable the
                management of diagnostics, health monitoring and system configuration.
         •      Computer Hardware Support – The controlling hardware, including memory,
                processors, human input and output devices such as touch screen, printers etc.
         •      Network Infrastructure – Physical network components including network switches
                and transport medium.
         •      Input/Output (I/O) Logic Devices – The point at which local input/output signals
                are connected to the network infrastructure, typically through field wiring interface
                boxes.
         •      Field Devices – A collection of sensors and actuators, some of which could be
                intelligent and therefore could be connected directly to the network infrastructure.
                Other devices could be dumb and would have to be connected to I/O logic devices
                through analogue field wiring.
        In order to facilitate the analysis of safety issues with a dynamically reconfigurable
        process automation system I have decided to use a simplified three layer architecture
        that could be directly compared to the ASAAC three layer IMA architecture [7]. The
        IMA approach was chosen in order to simplify the architecture so that the key interfaces
        could be identified and also to enable the safety analysis principles developed for IMA
        systems to be extended, modified and applied to process automation.
        Figure 3-1 shows a comparison between the IMA architecture and simplified process
        automation architecture that was developed as part of this project. The fourth layer
        shown on the IMA software architecture in Figure 3-1 relating to field devices has been
        included to show the direct comparison between the IMA structure and the process
        automation structure. The external hardware (sensors and actuators) does not normally
        form part of the three layer architecture as these devices connect into the IMA system
        through an I/O signal concentrator, which is part of the network infrastructure [39]. However, the
        IMA concept does not currently include the use of intelligent field devices and therefore
        does not include the ability to network such devices directly into the system. This is a
        significant difference between an IMA system and the proposed process automation
        system that needs to be taken into account when comparing the two systems. The other
        main difference is that the IMA architecture only takes into consideration the software
        layers within each IMA unit whereas the process automation architecture includes both
        software and hardware.




                                   Figure 3-1 Layered Architecture

3.2.2 A Proposed Process Automation Architecture
        Figure 3-2 shows an expanded process automation architecture that will be used as the
        basis for the discussion around the safety aspects of a real time reconfigurable system.




                      Figure 3-2 Expanded Process Automation Architecture
        The process automation architecture in Figure 3-2 is not representative of any specific
        system, but has been derived from the type of architecture typically used in
        commercially available DCS systems. This architecture has also been developed in
        such a way that it could reasonably be applied to many types of system currently in use

        or indeed to future process automation systems that are likely to want to implement new
        digital technologies and asset management schemes. The architecture is split into three
        layers that closely interface with each other to provide services from low level field
        instrumentation through to top level bespoke process applications. This three layer
        architecture is discussed in greater detail below.
        Application Layer
        The application layer represents the bespoke process automation applications that are
        specifically developed for the individual systems under control. Process automation
        applications typically reside in dedicated single board computers or within PLCs. In both
        cases, the application code will implement logic that is used to monitor and control areas
        of the plant through interfaces to field devices.
        In terms of the relationship between this layer and the equivalent IMA application layer, it
        is envisaged that the same reconfiguration functionality would be available, i.e. running
        applications on free resources in the event of their "normal" resource not being
        available due to a failure. In this sense, the principle of partitioning of
        resources in order to manage the safety requirements of individual applications could be
        applied in the same way as in the IMA world. However, in large DCS installations
        processing may need to remain close to the point of measurement due to latency and
        timing issues.
        Infrastructure Layer
        The infrastructure layer is a simplified representation of several layers of infrastructure
        that are required to ensure that data is effectively transferred between applications and
        intelligent field devices.
        This layer includes the operating system, which provides services to the application
        layer, including interfaces to the computer hardware and network devices that are also
        scoped within this layer.
        In DCS systems, data is typically held in distributed databases where the value of each
        signal parameter is updated at specified intervals. The data held in the signal database
        is made available to applications in accordance with the rules of the database. The
        database is typically updated from direct interfaces to the field devices and from
        calculated data from the process applications. Therefore, the database is well placed in
        the infrastructure layer.
        Health Monitoring at this layer is used to monitor the diagnostics data from a range of
        intelligent field instruments and to report the health status of the entire plant system.
        Modern health monitoring systems are typically marketed as asset management
        packages that not only monitor the health of intelligent field devices but also store long
        term diagnostic logs that are able to be analysed for trends to aid preventive
        maintenance. Leading DCS suppliers typically provide asset management packages as
        part of their range of applications, an example is the Siemens SIMATIC PCS 7 [51] that
        is capable of being integrated into a wider business information network. Other DCS
        suppliers offer similar packages including the AMS Suite from Emerson Process
        Management.
        Network Management includes the network protocols that allow data to be successfully
        transmitted between all networked components.
        System Management applications are used to monitor the components that make up the
        process automation system. For example in Foundation FieldBus systems, an OPC
        server is used to monitor process variables across the whole process in order to detect
        fault conditions or even conditions that could lead to a potential fault in the process.
        Bailey [21] provides an overview of such system monitoring and suggests that common
        problems such as liquid loss, pressure loss, sensor drift and valve problems can be
        detected even though individual intelligent field instrumentation components do not
        report an error condition.       In a system that is capable of initiating real time
        reconfiguration, the system management application(s) is fundamental to the detection
        of faults that may require action to be taken to ensure the process continues in the short
        term without immediate maintainer intervention. In such a system, it is envisaged that
        the system manager in conjunction with the process application would be capable of
        reconfiguring the system to provide additional fault tolerance through alternative or
        reconfigured field instrumentation.
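        A minimal sketch of the behaviour envisaged above, in which the system management function detects a failed field device and proposes reconfiguration onto an alternative instrument, is given below. The device names, health statuses and substitution table are assumptions made for illustration.

# Minimal sketch of the system management behaviour envisaged above:
# detect a faulty field device and request reconfiguration onto an
# alternative instrument. All names and statuses are illustrative assumptions.

from typing import Optional

HEALTH_STATUS = {          # latest diagnostics reported by intelligent devices
    "pressure_tx_01": "FAILED",
    "pressure_tx_02": "HEALTHY",
}

ALTERNATIVES = {           # pre-analysed substitution options
    "pressure_tx_01": ["pressure_tx_02"],
}

def select_alternative(failed_device: str) -> Optional[str]:
    """Return the first healthy alternative for a failed device, if any."""
    for candidate in ALTERNATIVES.get(failed_device, []):
        if HEALTH_STATUS.get(candidate) == "HEALTHY":
            return candidate
    return None

def system_manager_scan():
    """Scan device health and propose reconfigurations to the application."""
    for device, status in HEALTH_STATUS.items():
        if status == "FAILED":
            alternative = select_alternative(device)
            if alternative:
                print(f"Reconfigure: use {alternative} in place of {device}")
            else:
                print(f"No healthy alternative for {device}; raise alarm")

system_manager_scan()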
        Configuration Management is also required so that the system configuration is always
        known and new components are successfully integrated. It is envisaged
        that a configuration management application, either bought as COTS or developed as a
        bespoke application would be used as part of the system infrastructure. The
        configuration of the system would be held within this application along with individual
        configuration details for each intelligent field instrument.
        In commercial systems configurable off the shelf software applications are available to
        enable the user to monitor the health of the system and can also be used for managing
        the configuration of the system. These applications have also been included as part of
        the infrastructure as they provide a service to both the application layer and field devices
        layer.
        Field Devices Layer
        This layer is made up of intelligent field devices that are used to sense and actuate the
        process under control in accordance with commands from the applications. To enable
        reconfiguration at this level, industry standard intelligent field instrumentation and
        actuation is used in accordance with one of several standards such as Foundation
        Fieldbus, ProfiBus, IEEE 1451 or another digital protocol. Such standards allow devices to
        be configured and reconfigured, either under their own authority or under the authority of
        a controlling system.
        Some Health Monitoring and Diagnostic functionality also resides at this level. This
        functionality resides within the intelligent field devices that are able to self diagnose
        problems and to report back their status to a higher level health monitoring and asset
        management system for further processing and remedial action. There are also
        possibilities for health monitoring and prognostics within some intelligent field devices at
        the level of autonomous reconfiguration where potential failures are identified in advance
        from trend analysis, but this is out of the scope of this project.
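        The self diagnosis and reporting described above could be pictured as in the following illustrative sketch; the diagnostic checks, thresholds and status values are hypothetical and do not represent any particular device.

# Illustrative sketch of an intelligent field device that self diagnoses and
# reports its status to a higher level health monitoring function.
# The diagnostic checks, thresholds and status values are hypothetical.

from dataclasses import dataclass

@dataclass
class IntelligentSensor:
    tag: str
    reading: float
    supply_voltage: float
    drift_estimate: float

    def self_diagnose(self) -> str:
        """Return a simple health status derived from internal checks."""
        if self.supply_voltage < 10.0:
            return "FAILED"            # loss of power supply margin
        if abs(self.drift_estimate) > 0.05:
            return "DEGRADED"          # calibration drift beyond tolerance
        return "HEALTHY"

    def report(self) -> dict:
        """Status message passed up to the asset management package."""
        return {"tag": self.tag, "status": self.self_diagnose(),
                "reading": self.reading}

sensor = IntelligentSensor("level_tx_07", reading=2.4,
                           supply_voltage=11.8, drift_estimate=0.08)
print(sensor.report())   # {'tag': 'level_tx_07', 'status': 'DEGRADED', ...}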

3.2.3 Benefits of a layered architecture
        The main benefit of splitting the process automation architecture into the three distinct
        layers is that it allows change to be managed in the most efficient way possible. Without
        layering, change would ripple through the entire system increasing cost and complexity.
        For example, if a layered approach is taken, it should be possible to make changes to
        the field instrumentation without having to make very expensive changes to the bespoke
        applications. The infrastructure layer can be used to separate the field devices layer
        from the application layer and present an application with predefined data irrespective of
        the specific field device manufacturer. All that would be required is that the field device
        would be capable of fulfilling the requirements of the application that calls upon its
        services, either by sensory data or actuation commands.



        In the same way, it can be seen that changes at the application layer should not in
        theory necessarily drive changes in the field devices layer so long as the data and
        functionality of the field instrumentation remains sufficient for the needs of the
        application. It should not matter to the individual field device what functionality the
        application has so long as its interface to the application remains unchanged and the
        data requirements remain valid. By reducing the ripple effects the efficiency of the
        safety argument is also improved as a modular approach to the architecture enables a
        modular approach to the safety case. Modularity can be used to confine changes in the
        safety case to those modules that are directly affected by a component change in the
        system.
        Change at the infrastructure layer is more likely to be significant given that components
        in this layer provide services to both the application layer and the field devices layer.
        However, once the infrastructure has been established, change is usually logical, i.e.
        new management applications and operating system updates, rather than physical, i.e.
        changes to the routing of the network infrastructure and the position of physical components.
        Components within the infrastructure layer would not be candidates for real time
        reconfiguration although it is recognised that there is a high probability that updates
        throughout the operational life of the system would occur.

3.3   General Safety Issues with system reconfiguration

3.3.1 Scope of Issues
        In order to be able to define the functions of reconfiguration that could plausibly result in
        a hazardous situation occurring within a process automation system, it would seem
        prudent to establish the scope of the analysis.
        With reference to Figure 3-2 on page 40 and the generic architecture for process
        automation, the main candidates for the safety analysis of reconfiguration reside at the
        application layer and the field devices layer. It is not expected that components that
        reside in the infrastructure layer would be subject to real time reconfiguration. It is also
        assumed that the effects of the network infrastructure on the integrity of reconfiguration
        data across the network can be ignored as can the safety assurance of the network
        infrastructure devices. This assumption is based on the use of commercially available
        devices that have achieved a sufficient level of safety assurance and that the processing
        of reconfiguration data between an application and the device is treated the same as
        any other piece of safety related data. However, the consequences of data corruption
        on the implementation of plug and play functionality still need to be taken into
        consideration when developing safety tactics for its use.
        As far as the field device layer is concerned, the safety analysis being considered in this
        work need only be concerned with effects of real time reconfiguration and not the means
        by which the function of reconfiguration has been implemented in the individual device
        firmware. This is because I have made the assumption that in all cases, sufficient safety
        assurance that an intelligent field device will function correctly can be gained through the
        application of pre-certification to a recognised and respected safety standard such as
        IEC 61508 [5] or other appropriate means that can be justified in the context of the
        system in which the intelligent field device provides the safety related function. In
        practice this level of safety assurance would be provided within a safety case/argument
        for each individual field device which is considered to be outside the scope of this work.
        In this section, the safety analysis of a system that implements dynamic, or real time
        reconfiguration has been undertaken at a relatively high level and has not attempted to
        go down to individual functions as the focus is on identifying a generic set of safety
        issues. As an aid to analysis three failure guidewords have been used, these are,
         •      Commission: This is a type of failure that incorrectly triggers a reconfiguration where
                a reconfiguration is not required and would be hazardous.
         •      Omission: This is a failure to successfully complete the reconfiguration process
                within an acceptable timeframe.
         •      Value: This failure can be split between a data value that is grossly incorrect and
                one that is subtly incorrect. In both cases, the process of reconfiguration has
                resulted in incorrect data being transmitted either by a field device to an application
                or from an external source to a field device.
         •      The two other classic failure guidewords of “Early” and “Late” have not been used in
                this analysis because they were not considered to be appropriate. Timing issues in
                real time systems are often very important and can be safety critical, however, in the
                case of real time reconfiguration of intelligent hardware, the emphasis needs to be
                on the correctness of the reconfiguration rather than the timing of the
                reconfiguration. For example, a field device that needs to be configured by a
                system application may take a set time to update however if it fails to be updated
                with the correct configuration then a hazardous situation may result. If the device
                takes longer than expected to update but still updates properly it would be
                reasonable to expect the system to continue to operate within safe limits. The same
                reasoning applies equally with the failure of “too early”. The only critical timing issue
                concerns the point at which the reconfiguration could be described as having failed,
                i.e. a failure of “omission” has occurred, or if early, that a failure of “commission” has
                occurred. System specific safety analysis would have to take the timing of
                reconfiguration into consideration and define a strategy for determining the point at
                which a reconfiguration should be declared a failure of either commission or
                omission.
        The analysis itself is focused on three types of reconfiguration (an illustrative sketch combining these with the guidewords above is given after the list),
         •      Plug and Play at the Field Devices Layer
         •      Device Reconfiguration at the Field Devices layer
         •      System Reconfiguration at the Application Layer
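        To make the structure of the analysis concrete, the sketch below simply enumerates each type of reconfiguration against each guideword to produce the set of failure conditions to be examined in the following subsections. The code is illustrative only and produces prompts for analysis rather than analysis results.

# Illustrative sketch of the analysis structure: each type of reconfiguration
# is examined against each failure guideword. The entries produced here are
# prompts for analysis, not analysis results.

from itertools import product

GUIDEWORDS = ["Commission", "Omission", "Value (gross)", "Value (subtle)"]
RECONFIGURATION_TYPES = [
    "Plug and Play at the Field Devices layer",
    "Device Reconfiguration at the Field Devices layer",
    "System Reconfiguration at the Application Layer",
]

analysis_prompts = [
    f"{reconf} / {guideword}: identify causes, effects and mitigations"
    for reconf, guideword in product(RECONFIGURATION_TYPES, GUIDEWORDS)
]

for prompt in analysis_prompts:
    print(prompt)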

3.3.2 Plug and Play at the Field Devices layer
        When a defective field device is replaced in a non reconfigurable system, a maintenance
        engineer would need to manually update the configuration settings for the device in the
        application. In a plug and play reconfigurable system it would be expected that the
        replacement device would automatically update the configuration settings held at the
        infrastructure layer and used by the application.
        Plug and play functionality is also provided by other factory automation systems that
        include Foundation FieldBus and ProfiBus compatible field devices using device
        description languages. However, in the case of these devices, the configuration of the
        device is undertaken at the infrastructure layer by the maintenance engineer or
        automatically by the system from configuration data held by the device. Therefore, the
        replacement of a field device should still be transparent to the application layer.
        With plug and play there are a number of failure modes that could lead to a hazard (a simple configuration validation sketch addressing some of these follows the list),
         •      Transmission of configuration data fails so that the device data held by the system
                and used by the application is not updated when an intelligent field device is
                replaced in the system. There are a number of consequences that could be
                considered depending on the criticality of the system. The worst would be an
                incorrect set of data such as offsets, measurement ranges and sensitivity settings
                leading to grossly inaccurate data available to the application.

         •      The replacement field device includes additional functions that were not available in
                the original device. These additional functions cause the interface between the
                device, system infrastructure and the application to fail in some unpredictable way
                such that either the device or application fails. At best the additional functions
                would be ignored and the full range of device functions would not be available to the
                application. At worst, there are several implications for the system. The device
                itself could fail to be configured properly and therefore provide incorrect data to the
                application. The device could attempt to transmit additional, unexpected data to the
                application and cause the application to fail in some way, either by crashing or by
                incorrect operation. Depending on the actual effects, the failure may not be
                detectable if the application still believes the data is credible, i.e. is within normal
                operating parameters. However, this may still be hazardous as the system/operator
                may react inappropriately to incorrect data without realising it.
         •      Incorrect configuration data is transmitted either by the field device or by the
                configuration manager application. This could be caused by a corruption of data or
                by incorrect data provided by the device manufacturer that is embedded within the
                device. The consequence of incorrect data will either be detectable or undetectable
                depending upon the extent to which the data is incorrect. Subtle errors in calibration
                data may not be detectable but could have a significant effect on the safety of the
                process control. Other errors such as incorrect device identification may be easier
                to detect if for example the system expects the field device to be a specific type of
                sensor and the device describes itself as an actuator.
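        One way of reducing the likelihood of the first and third failure modes above is to validate the configuration data exchanged when a device is plugged in before the system accepts it. The sketch below is illustrative only; the field names, expected values and validation rules are assumptions.

# Illustrative sketch: validate configuration data received from a newly
# plugged in field device against what the system expects before the data
# is accepted. Field names and rules are assumptions for illustration.

EXPECTED = {
    "tag": "flow_tx_12",
    "device_type": "flow_sensor",
    "range_low": 0.0,
    "range_high": 100.0,
}

def validate_configuration(received: dict) -> list:
    """Return a list of discrepancies; an empty list means the configuration
    can be accepted, otherwise the replacement is rejected and an alarm raised."""
    problems = []
    if received.get("device_type") != EXPECTED["device_type"]:
        problems.append("device describes itself as the wrong type")
    if received.get("tag") != EXPECTED["tag"]:
        problems.append("device tag does not match the expected tag")
    if (received.get("range_low"), received.get("range_high")) != \
            (EXPECTED["range_low"], EXPECTED["range_high"]):
        problems.append("measurement range differs from the system configuration")
    return problems

# A device that describes itself as an actuator would be rejected.
print(validate_configuration({"tag": "flow_tx_12", "device_type": "actuator",
                              "range_low": 0.0, "range_high": 100.0}))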

3.3.3 Device Reconfiguration at the Field Devices layer
        Reconfiguration at the field device layer can take place in response to either internal (to
        the device) or external (control system) influences.
        In practice, the issues concerned with external influences are very similar to those
        associated with plug and play in that, rather than the field instrument updating the system
        with its internal configuration it is the external system that downloads configuration
        information to the newly installed field instrument. Of course, this issue may also arise
        through maintenance errors that result in a field device being incorrectly configured.
        External influences that cause device reconfiguration could contribute to system hazards
        in the following ways (a short sketch of an authority check for reconfiguration requests follows the list),
         •      Configuration download from external system causes the field device to malfunction
                such that the device fails to provide the expected functionality with the expected
                Quality of Service (QoS). A failure of this nature could contribute to a system
                hazard by causing the field device to behave in a nondeterministic manner as well
                as providing incorrect data to the system. In circumstances where the data is
                grossly incorrect, the failure may be detectable and therefore mitigation may be
                straightforward. However, in cases where the data is credible but incorrect the
                effects on the system may be more subtle, undetectable and therefore more likely to
                lead to a hazardous situation arising.
         •      External system allows field device to continue in failed or degraded state. The
                probability of this failure mode occurring would depend on the extent to which the
                external system had authority over the intelligent device to stop it reconfiguring in
                the event of a failure. If the external system failed to allow the intelligent device to
                reconfigure then the overall system would be compromised and at the very least a
                layer of fault tolerance would be lost and at worst the process could fail completely.
         •      External system fails to reconfigure device correctly either by failing to recognise the
                intelligent device (plug and play) or by the intelligent device failing to respond
                appropriately to the commands transmitted by the external system. This type of

                failure would result in a device operating to a QoS below that expected by the
                external system with a high probability that the data from the intelligent device would
                be erroneous.
         •      Reconfiguration of the field device in response to internal influences is typically
                caused by a failure of an internal component that results in the field device taking
                some action to mitigate for the failure and thus attempt to provide a level of fault
                tolerance. Although this additional fault tolerance is welcomed, it can also provide
                an opportunity for the field device to contribute to system hazards by either failing to
                reconfigure (total or partial) or to spuriously reconfigure whilst in a healthy state. A
                partial or full failure to reconfigure would generally result in reduced QoS with the
                associated compromise to the data transmitted to the external system. In the case
                of a spurious reconfiguration QoS may be reduced although the device should still
                provide a minimum level of performance. The risk with spurious reconfiguration is
                that if it is allowed to continue the intelligent device could start to babble, constantly
                moving from one state to another and causing resources to be taken and the
                system to become unstable as it attempts to provide a service to the babbling
                device.
         •      If a field device is capable of self reconfiguration then there is a further issue
                concerned with the authority to reconfigure and where that authority should reside.
                In dumb field devices, the configuration of the device is set at manufacture and
                remains set through life apart from changes that are necessary due to calibration or
                maintenance activities. With intelligent devices, the configuration can be much
                more flexible in terms of how, when and by whom it is changed. In the case of self
                reconfiguration a key safety issue is the “who” element of the reconfiguration.
                Traditionally, this was either the controlling system or the maintenance engineer.
                With intelligent technology it can be foreseen that an intelligent field device may
                have the authority to self reconfigure without regard for the wider system
                configuration. In some circumstances, autonomous self reconfiguration may lead to
                a system hazard as described in the case of the babbling device.
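        A simple sketch of how the authority question and the babbling device risk discussed above might be handled is given below. The rate limit and the rule that only the controlling system may authorise reconfiguration are hypothetical design choices made for illustration.

# Illustrative sketch: the controlling system retains authority over device
# reconfiguration and suppresses a device that reconfigures too frequently
# ("babbling"). The rate limit and authority rule are hypothetical.

import time
from typing import Optional

MAX_RECONFIGS_PER_HOUR = 3
_reconfig_history: dict = {}   # device tag -> list of reconfiguration times

def authorise_reconfiguration(device_tag: str, requested_by: str,
                              now: Optional[float] = None) -> bool:
    """Grant a reconfiguration only if it is requested by the controlling
    system (not the device itself) and the device is not babbling."""
    now = time.time() if now is None else now
    if requested_by != "controlling_system":
        return False                       # self reconfiguration not permitted
    history = [t for t in _reconfig_history.get(device_tag, [])
               if now - t < 3600.0]
    if len(history) >= MAX_RECONFIGS_PER_HOUR:
        return False                       # treat as a babbling device
    history.append(now)
    _reconfig_history[device_tag] = history
    return True

print(authorise_reconfiguration("valve_07", "controlling_system"))  # True
print(authorise_reconfiguration("valve_07", "valve_07"))            # False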

3.3.4 System Reconfiguration at the Application Layer
        Applications that sit at the application and infrastructure layers of the model presented in
        Figure 3-2 (Page 40) have a broader view of the entire system compared with the
        relatively narrow view of an individual field device, although as reported by Discenzo et
        al [27], greater autonomy is finding its way into the lower levels of the process control
        architecture. However, it is assumed that for currently available commercial systems,
        the autonomy of field devices is very limited and therefore real time reconfiguration of
        the wider system is to be managed through applications that sit at higher levels in the
        architecture.
        Reconfiguration at the application layer can be expressed as either reconfiguration of
        the field devices layer in order to work around a component or subsystem failure, or
        reconfiguration of the control system and applications in a similar way to that in which
        IMA systems reconfigure in the presence of failures. This includes the ability to move
        processes to alternative resources on a distributed computer network or to provide a
        means of graceful degradation by the reprioritisation of tasks (a short sketch of this
        reallocation is given after the list below).
         •      A total failure to successfully reconfigure part of the system in the presence of a
                failure would typically cause that part of the system to remain in a failed condition,
                i.e. a level of fault tolerance would be lost. This would have various consequences
                depending on the overall system and the layers of protection above the point of
                failure.



         •      A partial failure to successfully reconfigure part of the system in the presence of a
                failure would typically cause the behaviour of the affected system components to
                become nondeterministic. The effect on safety would be that there would be a high
                probability that the plant would fail in an unknown and therefore potentially unsafe
                state.
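        The reallocation and graceful degradation described at the start of this section can be pictured with the following illustrative sketch. The resource names, application priorities and allocation rule are assumptions and do not represent any particular IMA or DCS product.

# Illustrative sketch of application layer reconfiguration: move applications
# from a failed processing resource to spare capacity, shedding the lowest
# priority tasks first if capacity is insufficient. Names and the allocation
# rule are assumptions for illustration.

RESOURCES = {"controller_A": 2, "controller_B": 2}      # free application slots
APPLICATIONS = [                                        # (name, priority); low number = high priority
    ("ballast_control", 1),
    ("trim_control", 2),
    ("data_logging", 3),
]

def reallocate(failed_resource: str):
    """Reallocate applications after a resource failure, highest priority first."""
    spare = {r: slots for r, slots in RESOURCES.items() if r != failed_resource}
    placement, shed = {}, []
    for app, _priority in sorted(APPLICATIONS, key=lambda a: a[1]):
        target = next((r for r, slots in spare.items() if slots > 0), None)
        if target is None:
            shed.append(app)               # graceful degradation: task shed
        else:
            placement[app] = target
            spare[target] -= 1
    return placement, shed

print(reallocate("controller_A"))
# ({'ballast_control': 'controller_B', 'trim_control': 'controller_B'}, ['data_logging'])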

3.4   A Tactical Approach to Safety Assurance
        The approach to the safety assurance of reconfigurable intelligent field devices should
        be developed iteratively, taking as a starting point a simple plug and play scenario as the
        first step to a reconfigurable system and then as experience is gained, enhancing the
        method for more complex reconfiguration scenarios. As a starting point, the safety
        justification of intelligent plug and play field devices has been approached from a tactical
        position, drawing on the safety tactics proposed by Wu and Kelly [41], which were
        introduced in Chapter 2, section 2.7.

        [Figure 3-3 (diagram): the modified safety tactic hierarchy. Safety decomposes into
        Failure Avoidance (Acquisition Process, Safety Case Contracts), Failure Detection
        (Diagnostics, Health Management, Comparison) and Failure Containment (Redundancy,
        Recovery and Masking, with lower level tactics including Redundancy Replication,
        Degradation, System Reset and Fault Management).]

                         Figure 3-3 Modified Safety Tactics for System Reconfiguration

        Figure 3-3 shows that while the fundamental hierarchy has been retained, the safety
        tactics have been modified to show how the safety assurance of reconfigurable plug and
        play systems may be approached from a safety critical systems perspective. In this type
        of system, the extent of reconfiguration permissible is limited to that necessary for an
        intelligent device to be plugged into the process automation system, either as a direct
        replacement for a defective device or as an extension to the system. Once physically
        connected, the expectation is that the intelligent device and control system would
        communicate and that the intelligent device would be automatically configured for use in
        the system. This type of functionality is analogous to the plug and play functionality
        enjoyed in personal computing, where a peripheral is automatically configured by the
        operating system.
        The safety tactics shown in Figure 3-3 are hierarchical and should be applied top down
        and from left to right so that failure avoidance is considered before failure containment.
        This approach is based on the common understanding that it is far better to avoid a
        failure occurring than to have to contain it when it occurs. Of course, failure detection
        and failure containment should also be included in any overall approach to risk reduction
        and safety assurance although differing strategies may be required depending on the
        extent to which a safety argument can be made with respect to any specific system
        implementation.
        In the case of a plug and play reconfigurable system there are a number of failure
        avoidance, detection and containment tactics that could be used to good effect and the
        specific strategy in this case would be towards a balanced mixture of all three main
        tactics. This is mainly because of the way in which the process automation industry has
        developed its product base towards high quality components, diagnostics & health
        management as well as a whole system approach to plant availability in order to reduce
        operational overheads and plant downtime. It is this approach to failure management
        that can be exploited for safety assurance.
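        The top down, left to right ordering described above can be pictured as a prioritised walk through the tactic hierarchy. The sketch below is illustrative only; the groupings follow Figure 3-3 and the selection logic is an assumption.

# Illustrative sketch of applying the modified safety tactics top down and
# left to right: failure avoidance is considered first, then detection,
# then containment. The groupings follow Figure 3-3; the selection logic
# is an assumption made for illustration.

SAFETY_TACTICS = {
    "Failure Avoidance":   ["Acquisition Process", "Safety Case Contracts"],
    "Failure Detection":   ["Diagnostics", "Health Management", "Comparison"],
    "Failure Containment": ["Redundancy", "Recovery", "Masking"],
}

def tactics_in_order():
    """Yield tactics in the order in which they should be considered."""
    for group in ["Failure Avoidance", "Failure Detection", "Failure Containment"]:
        for tactic in SAFETY_TACTICS[group]:
            yield group, tactic

for group, tactic in tactics_in_order():
    print(f"Consider {tactic} ({group})")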

3.4.1 Failure Avoidance
        The first and most significant safety tactic to consider is failure avoidance. In the safety
        tactics for software architecture design, it was proposed by Wu and Kelly [41] that
        failures can be avoided by the use of simplicity and substitution. However, when put in
        the context of intelligent field devices and reconfiguration the tactic to use simple
        devices would not be feasible as by definition, intelligent field devices introduce greater
        complexity in comparison to dumb devices because they incorporate complex electronics
        and various types of programmable logic. Equally, it can be seen that a substitution type
        tactic that resulted in the use of the least complex device available from a choice of
        many would in theory be consistent with the intent of the tactic. In practice, however, this
        would still lead to similar issues due to the rate at which intelligent field devices continue to
        increase in complexity. As the technology matures and manufacturers take advantage
        of that technology in their products, it becomes increasingly difficult to acquire simple
        devices. Therefore, in order to develop a sustainable safety tactic that could be used
        effectively, two alternative but complementary tactics that sit equally at the same level as
        “simplicity” and “substitution” are proposed.
        The first tactic concerns the acquisition process and draws from the work undertaken by
        Ye [33] towards the safety justification of COTS software components. Although Ye’s
        work was focused on the acquisition of COTS software components the actual process
        for specifying and acquiring a software component could work equally effectively for any
        COTS component. It is therefore proposed that the principles of the COTS component
        acquisition process could be used to ensure that potential failures are avoided by the
        careful specification, evaluation and justification of intelligent field devices within a
        reconfigurable process automation system.
        The second tactic in the area of failure avoidance has been developed from a review of
        the methods used in the safety assurance of IMA systems that was undertaken during
        the development of these safety tactics. One of the techniques developed by Conmy et
        al [38] for IMA safety assurance is the use of safety assurance contracts. It is proposed
        that the principles of safety assurance contracts could be employed within the process
        for COTS acquisition to support the safety justification of plug and play reconfigurable
        COTS components.
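        Safety assurance contracts are typically expressed in terms of what a component relies on from its environment and what it guarantees in return. The sketch below shows one simple way in which such a contract might be recorded and checked for a plug and play device; the fields, values and matching rule are assumptions made for illustration and are not taken from Conmy et al [38].

# Illustrative sketch of a safety case contract between a plug and play field
# device and the wider system, expressed as relies/guarantees pairs.
# Field names, values and the matching rule are assumptions, not taken from [38].

from dataclasses import dataclass, field

@dataclass
class SafetyContract:
    component: str
    relies_on: set = field(default_factory=set)   # assumptions about the environment
    guarantees: set = field(default_factory=set)  # properties offered in return

def contract_satisfied(device: SafetyContract, system: SafetyContract) -> bool:
    """The device's assumptions must be met by what the system guarantees,
    and the system's assumptions by what the device guarantees."""
    return device.relies_on <= system.guarantees and system.relies_on <= device.guarantees

device = SafetyContract("pressure_tx_01",
                        relies_on={"24V supply", "bus cycle <= 500 ms"},
                        guarantees={"pressure data SIL 2", "self diagnostics"})
system = SafetyContract("process automation system",
                        relies_on={"pressure data SIL 2"},
                        guarantees={"24V supply", "bus cycle <= 500 ms"})
print(contract_satisfied(device, system))   # True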

3.4.2 Acquisition Process
        Ye [33] presents a framework for Software COTS acquisition that is integral to the
        established “V” development and safety lifecycle model shown in Figure 3-4.




                Figure 3-4 “V” Model of the COTS Development/Safety Lifecycle [33]
        Within the framework presented by Ye, although the process takes into account the
        availability of suitable components in the market place, the final selection of a COTS
        component is placed towards the right side of the V. The activity of evaluating and
        selecting a COTS component proposed by Ye partially takes in aspects of
        implementation but also includes integration, testing and delivery. There are also safety
        analysis activities to confirm that the selected component is suitable within the overall
        design. When this process is compared to a typical design development and build
        lifecycle of a large complex industrial process automation system the final selection of
        an individual component (in this case intelligent field device) is usually taken much
        earlier in the lifecycle. Typically this is a direct result of the need to “freeze” system
        designs early in the lifecycle in order to support the overall programme for the product
        development. Therefore, the COTS component evaluation and selection needs to be
        brought earlier in the lifecycle to enable detailed design work to continue with known and
        understood components that can be ordered early in the development lifecycle and can
        be safety justified. However, the definition of a “Reference COTS Component” is an
        interesting concept and one that can be utilised in the acquisition of process control
        COTS components.
        Therefore, if the evaluation and selection of a COTS component can be moved earlier in
        the lifecycle, the principles of the acquisition process described by Ye [33] can be drawn
        out and mapped as a basic process as shown in Figure 3-5. The process described by
        Ye [33] follows the design and development lifecycle from architecture definition through
        to system safety justification with a clear bias toward the activities around the
        architecture development. With respect to COTS acquisition and the process that can be
        applied to the acquisition of intelligent field devices, there are two main activities:
        definition of a COTS component, i.e. defining the requirements, and evaluation of a
        COTS component, i.e. ensuring that a best match is made between the requirements
        and the available components (shaded activities in Figure 3-5).




                            Figure 3-5 COTS Software Acquisition Process


        If the two activities to define and evaluate a COTS component are to be applied as a
        failure avoidance safety tactic then they need to be decomposed further and rearranged
        in order to differentiate between the evaluation, acquisition and safety justification
        activities in the context of the system architecture and design. The specific activities that
        need to be undertaken to support this safety tactic can then be identified in the context
        of the system design, development and safety activities. This is shown in Figure 3-6
        (shaded areas).




                               Figure 3-6 COTS Component Acquisition




        Reference COTS Component
        The first step toward ensuring that a system function can be supported by a suitable
        COTS component is to define the specification for the reference COTS component that
        will be used. This is called the “Reference Component”. Ye describes the reference
        COTS component as “a set of essential functions that are required by an application
        component, and are provided by a number of COTS products on the market” [33]. The
        specification for the reference COTS component within a safety related system is driven
        by the broader process automation system requirements that are vital to the success of
        the overall safety argument and the avoidance of intolerable component failures.
        From a process automation perspective, the reference component is required to fulfil a
        specific function or functions within the overall design of the system. For a plug and play
        component, there are not only the standard requirements such as function, performance
        and spatial constraints but there are also specific requirements concerning the
        communication protocols that are to be employed for network connection and integration
        with specific system and application software packages. The component is also
        required to conform to a specific device data type to enable successful integration with
        the system including the update or download of configuration data to and from the
        system. This is a key requirement and one that has a significant impact on the system
        architecture and design. At the present time there are three main types of device data:
        Field Device Tool/Device Type Manager (FDT/DTM) technology, Electronic Device
        Description Language (EDDL) and Device Description Language (DDL), with the
        prospect of a fourth once the IEEE 1451.4 Transducer Electronic Data Sheet (TEDS)
        begins to be adopted by industry. FDT/DTM, DDL and EDDL are well documented and,
        although development continues, are now mature and can therefore be implemented
        on the two main commercially available systems, Foundation FieldBus and
        ProfiBus/ProfiNet/ProfiSafe. The IEEE 1451 series of standards is new and untested
        in the process automation industry and is therefore not considered further.
        As recognized by Ye [33], market forces also play a part in the definition of a reference
        COTS component as it is pointless to define a software component that is required to
        support an application but cannot be purchased. In terms of plug and play applications,
        the principle remains the same in that the specification for the reference component
        must be reasonable and should not preclude a large proportion of the components
        available in the market place. Intelligent field devices are becoming widely available, but
        they remain relatively limited in terms of functionality and interoperability.
        The issues over device description languages previously discussed in this section are an
        example of this problem and must be taken into consideration. The limitations in the
        number and type of intelligent field devices may have an impact on the final system
        architecture and may drive additional safety requirements.
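        To illustrate how such a reference component might be captured in practice, the sketch
        below shows one possible machine readable representation written in Python. It is purely
        illustrative: the field names, the example function, and the device description and
        protocol values are assumptions made for this report and are not drawn from any standard
        or product.

        # Illustrative sketch of a reference COTS component specification.
        # Field names and example values are assumptions, not a standard.
        from dataclasses import dataclass
        from typing import Dict, List

        @dataclass
        class ReferenceComponent:
            function: str                   # system function the device must fulfil
            device_description: str         # e.g. "EDDL", "FDT/DTM" or "DDL"
            protocols: List[str]            # network protocols the device must support
            sil_target: int                 # SIL contribution required of the device
            spatial_constraints: str = ""   # installation envelope, environment etc.

        def matches(candidate: Dict, reference: ReferenceComponent) -> bool:
            """Crude screen of a candidate product against the reference component."""
            return (candidate.get("device_description") == reference.device_description
                    and all(p in candidate.get("protocols", []) for p in reference.protocols)
                    and candidate.get("certified_sil", 0) >= reference.sil_target)

        # Example: a hypothetical reference component for a flow measurement function.
        reference = ReferenceComponent(function="High Flow Alarm",
                                       device_description="EDDL",
                                       protocols=["Foundation FieldBus"],
                                       sil_target=2)

        A screening function of this kind would only ever be a first filter; the full evaluation
        and selection criteria discussed in the next section would still apply.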
        Evaluation and Selection Criteria
        Ye [33] identifies safety analysis as a key part of the COTS acquisition process and
        proposes a method of safety analysis called “Component Criticality Analysis” (CCA)
        which “…examines the degree of contribution that each individual failure mode of a
        component has with respect to system safety” [33]. As part of an overall activity
        designed to derive the safety requirements for the COTS component, CCA is undertaken
        as the focus for the safety analysis within the context of the system architecture. By
        defining a number of scenarios for the component and then undertaking the analysis, Ye
        develops the safety requirements for the COTS component into a Safety Contract that is
        used to specify the expectations of both the COTS Component and receiving software
        application. The safety analysis defined by Ye [33] forms part of the activity termed as
        “Evaluation and Selection Criteria” which forms a critical part of the COTS component
        acquisition and is shown in Figure 3-7.

                Figure 3-7 COTS Component Evaluation, Selection and Acquisition
        The activities described by Ye in the evaluation and selection criteria for a software
        COTS component can be used, with some slight modification, for a process automation
        COTS component.
        For a plug and play intelligent field device the main concerns regarding the safety
        requirements for the device within the system are with the behavioural characteristics of
        the device as it interacts with the host system. As the commercial process industry has
        adopted IEC 61508 [5], safety analysis is usually concerned with the functional safety
        aspects of the system, whether that system is a single component or a system of
        components. Therefore the contribution that a COTS component makes to a specific
        function is fundamental to the definition of that component’s safety requirements. The
        process industry has also embraced the concept of Safety Integrity Levels (SILs) with
        the widespread adoption of IEC 61508 [5], and therefore the contribution each individual
        COTS component makes to a safety related function needs to be determined in order to
        define a SIL target for that component.
        As Component Criticality Analysis as described by Ye [33] covers various safety
        analysis techniques including HAZOP, FFA and FTA, the principle of CCA can, like the
        other principles of COTS acquisition, be used as part of the COTS component acquisition
        process in support of the failure avoidance safety tactic. The significance of CCA is that
        it may be used to identify the contribution the plug and play COTS component makes to
        system safety and therefore it supports the COTS component acquisition process. The
        safety analysis of a system implementing plug and play technology can also be used as
        the basis for developing specific safety tactics so that hazards are identified and reduced
        to ALARP.

3.4.3 COTS Acquisition and Safety Contracts
        Out of CCA comes a need to document the specific requirements of the COTS
        component in order for the system to achieve safety assurance. Ye [33] uses a method
        known as a safety contract, in much the same way as Conmy et al [38] do in the
        safety assurance of IMA components, to establish the specific requirements of both the
        component and the application/system in which that component is used.
        It is therefore proposed that the principle of defining and using a safety assurance
        contract between the proposed COTS component and the host system is a key activity
        in both IMA and software COTS acquisition that could be effectively transferred to the
        safety assurance of intelligent, plug and play field devices.

        In the safety contract scheme proposed by Ye [33], the contract is focused on the safety
        justification for the COTS component through the satisfaction of goals. This can be
        demonstrated by the example contract in Table 3-1.


         Contract between application and COTS argument modules

         Application Safety Objectives for COTS:
             Goal:     Failure mode x of COTS shall not occur
             SAL:      3
             Context:  Operational assumptions etc

         COTS Component Safety Claims:
             Goal:     Failure mode x will not occur
             SAL:      3
             Context:  Assumptions on how the component should be used

            Table 3-1 Example of Safety Contract for COTS Component Acquisition [33]
        In this example, the contract is formed between an application and a COTS software
        component. Goals such as “Failure mode x of COTS shall not occur” are defined within the
        context of the operational scenario, which then allows Safety Assurance Levels (SALs) to
        be set. By referring to Goals and SALs, Ye ties the development of the contract into the
        development of a safety case using Goal Structuring Notation (GSN), such that the
        validation of safety arguments within the safety case can be achieved by verifying
        that the safety contract between the COTS component and application has been
        satisfied. The use of SALs also ensures that the contribution that the achievement of
        each contract makes to the system safety case is defined and assigned a level of
        importance in the safety case. SALs are significant because, where process based safety
        standards that use SILs or DALs are not applied, there still needs to be a way of
        determining that the arguments in the safety case are sufficient. The application of SALs
        is further described by Weaver et al in [42].
        In the context of the process automation industry, the use of SILs is prevalent and must,
        for the foreseeable future, be taken into account. Therefore, while the theory of applying
        a contract that makes use of the SAL concept may be valid for cases where goal based
        safety cases are to be developed, the use of SILs within the overall safety argument for
        COTS components still needs to be taken into consideration. This is the case in the
        process automation industry where many COTS components including intelligent plug
        and play field devices are pre-certified to a SIL in accordance with IEC 61508 [5] by
        recognised certification bodies.
        Conmy et al [38] take a slightly different approach to the definition and application of a
        contract from that taken by Ye [33]: rather than applying the contract in the context
        of goals, the contract between the Component (supplier) and Application (client) is
        formed on the basis of a guarantee. Each guarantee is defined to address a specific
        potential failure as defined by safety analysis of the component within the context of the
        system, and described as derived safety requirements. An example of this type of
        contract is shown in Table 3-2.




         Behavioural Level Constraints

         Potential Failure:        Configuration of component x is incorrectly altered by the
                                   application
         Guarantee (Supplier):     Component x will only allow configuration to be changed
                                   within defined limits                                    [P/A: O]
         Application(s) (Client):  Application reads back component x configuration and
                                   verifies correctness against known profile for
                                   device type                                              [P/A: A]

                      Table 3-2 Example of IMA Behavioural Level Contract [38]
        The example contract shown in Table 3-2 expresses the behavioural constraints that
        have been derived from the safety analysis. In this example, the potential failure is
        documented along with the derived requirements relating to the supplier (component)
        and the client (application). Each contractual constraint is marked as either being an
        Obligation (O), Proposed (P) or Accepted (A). This is a particularly useful notation when
        applied to COTS components as obligations in particular should drive the acquisition
        process down a particular product route. This is due mainly to the specific functionality
        offered by a COTS component when evaluated against the required obligations. In the
        example shown in Table 3-2, there is an obligation on the COTS component x to only
        respond to a defined list of configuration changes demanded by the host system
        application. It is this type of obligation that is likely to drive the acquisition and provide a
        differentiating factor when comparing a range of prospective COTS components.
        In describing a process for developing safety assurance contracts for IMA systems,
        Conmy et al [38] provide for the development of contracts at four levels of abstraction:
        “High level Requirements”, “Architectural Constraints”, “Behavioural Constraints” and
        “Quantifiable Details” (performance, reliability etc.). In terms of the safety assurance of
        a plug and play intelligent COTS field device, each of these four levels of abstraction
        needs to be considered in the context of process automation and the particular safety
        issues associated with such systems. High level requirements should focus on the
        overall requirements for the COTS intelligent field device in terms of the function it
        provides within the overall process automation system. The architectural contracts can
        be used to focus on the overall system and specifically on wider issues such as network
        topologies for example. At the behavioural level of abstraction the safety contract would
        need to be focused at the individual COTS component whilst the quantifiable details
        could be used to record other obligations both on the host system as well as the actual
        COTS component.
        Whilst either of the two examples of safety assurance contracts (Ye or Conmy et al)
        could be implemented for a process automation system using plug and play
        components, the idea of focusing on the interaction between the COTS component and
        the application/system in terms of obligations is the most relevant. However, the need to
        take into consideration the use of SILs in the process automation industry also needs to
        be factored into the contract arrangements between a COTS component and the host
        process automation system.
        As neither of the contract schemes reviewed exactly fits the requirements for explicitly
        defining the relationships between a COTS component and the system within which it
        functions, an alternative contract definition has been developed, as shown in Table 3-3.




        Behavioural Level Constraints

        Function:                  High Flow Alarm                                        SIL: 2
        COTS Device:               Flowmeter xyz
        Potential Failure:         Alarm “high” setpoint configuration error (set too high)
        COTS Device (Supplier):    Shall not be allowed to change set points in the
                                   Configuration Database                                 [IMP: L]
        Host System (Client):      Compare device setpoints with values held in the
                                   Configuration Database                                 [IMP: H]

                              Table 3-3 Example of COTS Component Contract
        With the primary objective of avoiding failure of a safety related function due to a failure
        of an intelligent COTS component, the definition of a contract has been developed from
        a merging of ideas from the work undertaken by Ye [33] and Conmy et al [38]. This
        modified definition has been developed to address the following specific requirements
        associated with plug and play functionality:
         •      The architectural, behavioural and quantitative constraints that address the
                dependencies between a COTS component and the system in which it functions. At
                the architectural level this mainly concerns the whole system, whereas at the
                behavioural level the concern is with the individual COTS component.
         •      The interactions between the COTS component and the system infrastructure.
                Contracts at this level need to ensure that a plug and play component is able to
                safely integrate itself with the host system through the services of the infrastructure
                layer. Of particular importance is the interface to the infrastructure and how
                reconfiguration failures may be mitigated when data is passed through to the
                application layer and control commands are received back from it.
         •      The interactions between the COTS component and the application layer. If the
                principle of layers is successfully implemented there should be very little if any direct
                interaction between the COTS component and the application layer, as all data
                should be processed through the infrastructure layer. However, it would be useful if
                the contract could be used to enforce this principle.
         •      The need to identify the contribution the COTS component makes to the SIL
                requirements of the system function.
         •      The behaviour of COTS asset management application software that resides at the
                infrastructure layer and has a direct influence on the failure modes of a plug and
                play driven system/component reconfiguration.
        In order to capture the functional SIL requirement that impacts on the COTS device and
        the client application, provision has been added for the function and SIL to be recorded.
        This means that the contracts can be formed on a functional basis and therefore more
        critical functions can be given priority as well as enabling each COTS device that
        contributes to that function to be identified.
        The Potential Failure information is retained from the contract defined by Conmy et al
        [38]. This allows the potential failure modes of the plug and play device to be explicitly
        tied to the derived requirements for the device and the client application. This should
        help in the development of the system safety case as, although not described in terms of
        goals, there needs to be a method of ensuring that all potential failures have been
        addressed, either in the COTS component or in the host system.


        For each COTS device (supplier) and application (client) constraint there is an
        opportunity to identify the impact of the constraint on either the COTS component or the
        implementation of the application. This is necessary to understand the relative
        importance each constraint may have on the overall COTS acquisition. For example, if
        there were a constraint on the COTS device to limit the amount and type of data it was
        capable of uploading to a system configuration management application then there
        would be a direct impact on the choice of devices available to fulfil that function.
        Therefore, such a constraint would have a high impact compared to a constraint on a
        COTS device not to allow data to be uploaded to an application. In this case the choice
        of COTS devices that do not upload data to a host system may be greater than those
        that do, and therefore the constraint has less of an impact. The impact is recorded in the
        contract in the “IMP” column and is classified as “High”, “Medium” or “Low”. For
        example, if the constraint is likely to cause requirements to be
        placed on a COTS device that would make the acquisition of such a device more
        difficult, then the impact of that constraint is said to be “high”. Although the distinction
        between high, medium and low is quite coarse it can still be used as part of the process
        to evaluate a number of COTS devices and could be a significant differentiator between
        competing products.
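        As an illustration only, the modified contract format of Table 3-3 could be held in a
        structured form so that contracts can be grouped by function and SIL and the high impact
        constraints identified during device evaluation. The Python sketch below shows one
        possible representation; the field names are assumptions and the example entries simply
        mirror Table 3-3 rather than describing any real device.

        # Sketch of the modified COTS component contract of Table 3-3.
        # The representation and field names are assumptions for illustration only.
        from dataclasses import dataclass
        from typing import List

        @dataclass
        class Constraint:
            party: str      # "COTS Device (Supplier)" or "Host System (Client)"
            text: str       # derived safety requirement
            impact: str     # "H", "M" or "L" - impact on the acquisition/implementation

        @dataclass
        class ComponentContract:
            function: str
            sil: int
            cots_device: str
            potential_failure: str
            constraints: List[Constraint]

        contract = ComponentContract(
            function="High Flow Alarm",
            sil=2,
            cots_device="Flowmeter xyz",
            potential_failure='Alarm "high" setpoint configuration error (set too high)',
            constraints=[
                Constraint("COTS Device (Supplier)",
                           "Shall not be allowed to change set points in Configuration Database", "L"),
                Constraint("Host System (Client)",
                           "Compare device setpoints with values held in Configuration Database", "H"),
            ])

        # High impact constraints are the likely differentiators during device selection.
        high_impact = [c for c in contract.constraints if c.impact == "H"]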
        In conclusion, the first safety tactic when implementing plug and play technology in a
        reconfigurable process automation system is to reduce the probability of failure through
        the use of a robust COTS acquisition process that is closely linked to system design and
        hazard analysis. It is also apparent that the use of safety contracts as part of a failure
        avoidance tactic encompasses all of the other safety tactics as well. In this sense, failure
        avoidance is the primary tactic to be used in the justification of plug and play technology.

3.4.4 Failure Detection
        In terms of being able to detect a failure, the use of intelligent field devices offers a
        significant advantage over more traditional dumb field devices. To detect a failure of a
        dumb device additional checking is required within the host system which could be quite
        complex depending on the granularity of the required detection. In this description the
        term “host system” is defined as the system in which the field device provides a service.
        Simple failures that produce incredible (clearly out-of-range) outputs can be
        straightforward to detect by checking whether or not the output is within a set of
        predefined bounds. For more subtle failures that result in credible but incorrect outputs,
        detection becomes more difficult and typically requires the use of redundant devices with
        comparison and voting type logic in order to determine whether or not the output is valid.
        With newer, intelligent, field devices, the device itself can provide some failure detection
        through inbuilt diagnostics. This functionality can be used in conjunction with health
        monitoring software applications that are often called Asset Management, to detect very
        subtle failures either by direct analysis of the diagnostic data from the device or by the
        long term analysis of trend data. Sophisticated failure detection techniques are also
        commonly used to reconfigure the device and/or system to contain and mitigate failures.
        For simple plug and play technology the main failure detection safety tactic that could be
        successfully implemented would be a combination of Diagnostics and Comparison as
        depicted in Figure 3-8.




                                      Figure 3-8 Failure Detection
        (Opportunities to detect a failure: device initiated self check and diagnostics, and host
        initiated comparison and trend analysis, both supporting health/asset management.)


        Device Initiated Failure Detection
        The first tactic in detecting a failure of a plug and play intelligent field device is to use the
        intelligence built into the device itself. If a scenario is considered where a faulty device
        is replaced with a new device of the same type and functionality as the original, the new
        device should also satisfy all of the constraints documented in the safety contract for the
        function(s) provided by that specific device. In the simple case of a like for like
        replacement, the plug and play functionality within the device could initiate a self
        diagnostic check on power up and transmit the results to the host system. On receipt of diagnostic check
        data, the host system would then be in a position to verify that the data was correct and
        either initiate the reconfiguration of the device in order to set the device up to function
        correctly in the host system or take alternative action as required.
        The opportunity to detect a failure may also exist in the use of diagnostics to check that
        the configuration data passed from the host system to the field device is valid for that
        particular device. If the diagnostic check of the configuration failed, e.g. the host system
        had tried to download the configuration data for a different type of device, then this
        would be classed as a failure and the process by the host system to assimilate the
        replacement device into the system would be aborted. It is assumed that there would be
        sufficient assurance that the diagnostic functions of the device performed correctly and
        that the data was correct. This type of failure detection is based on the principles of
        handshaking so that the field device only responds to a valid configuration command
        from the host control system.
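        The handshaking principle described above can be sketched as a simple device side
        check. The sketch below is an assumption about how such a check might look; the device
        type identifier, the configuration fields and the message format are all hypothetical.

        # Illustrative device side handshake: accept configuration data only if it is
        # valid for this device type, otherwise reject it so the host aborts assimilation.
        # The device type identifier and configuration fields are hypothetical.

        def accept_configuration(device_type: str, config: dict) -> bool:
            """Return True only if the downloaded configuration is valid for this device."""
            if config.get("device_type") != device_type:
                return False                        # wrong device type - reject
            required = {"range_low", "range_high", "alarm_setpoint"}
            return required.issubset(config)        # incomplete configurations are also rejected

        # Host side: only proceed with reconfiguration if the device acknowledges the data.
        if not accept_configuration("flowmeter_xyz", {"device_type": "valve_abc"}):
            print("Configuration rejected - plug and play assimilation aborted")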
        Another prominent failure mode concerns the transmission of reconfiguration commands
        to the intelligent field device from the controlling system. Process Automation systems
        typically operate over large areas, particularly in factory environments and therefore
        there is a potential for data to be corrupted or to suffer from timing issues caused by
        network latency. This could result in a plug and play device receiving a spurious
        command to reconfigure when no such command was required, which could be
        hazardous. A tactic that could be used to detect this type of failure would be to use
        checksums. Checksums allow a packet of data to be verified: the field device compares a
        checksum generated by the host system and transmitted with the packet against a
        checksum it generates itself on receipt of the data. This allows bit errors in the data to be
        detected and failure containment action taken, either to reject the data and request a
        resend or to reject the data and flag the communication channel as defective.
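        A minimal sketch of this checksum check is given below, assuming a CRC32 purely for
        illustration; a real fieldbus protocol would define its own integrity mechanism and
        message format.

        # Minimal sketch of checksum based detection of bit errors in a configuration
        # packet. CRC32 is used purely for illustration.
        import zlib

        def make_packet(payload: bytes) -> bytes:
            """Host side: append a checksum to the configuration payload."""
            return payload + zlib.crc32(payload).to_bytes(4, "big")

        def verify_packet(packet: bytes) -> bool:
            """Field device side: recompute the checksum and compare with the one received."""
            payload, received = packet[:-4], int.from_bytes(packet[-4:], "big")
            return zlib.crc32(payload) == received

        packet = make_packet(b"SET alarm_high=120")
        assert verify_packet(packet)
        corrupted = packet[:4] + b"\x00" + packet[5:]   # single corrupted byte
        assert not verify_packet(corrupted)             # detected - reject and request resend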


        Host initiated Failure Detection
        The scope for developing tactics to detect the failure of a plug and play initiated
        reconfiguration when considering the host system rather than the COTS device is
        focused on comparison and analysis techniques.
        There are a number of ways in which COTS software applications may be used to detect
        a failure by utilising asset management type functionality. In the least complex systems,
        a plug and play device connects to a COTS asset management software package and is
        then configured by the maintainer to work in the specific application. The safety tactic
        that could effectively be implemented in such a case would be for additional software,
        which could be termed the “Configuration Manager”, in the infrastructure layer to be
        used to read back the configuration from the device and compare it to a known “good”
        configuration. In this scenario, a failure in the configuration of the device would be
        detected by the Configuration Manager and a warning passed through to the application
        layer to alert the system operators.
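        The read back comparison could be sketched as follows. The reference configurations
        are keyed by device type, which also illustrates the later point that storage grows with
        the number of device types rather than the number of installed devices. All names and
        values here are hypothetical.

        # Sketch of a "Configuration Manager" read back check in the infrastructure layer.
        # Device types, parameters and values are hypothetical.
        REFERENCE_CONFIGS = {
            "flowmeter_xyz": {"units": "l/min", "range_high": 150, "alarm_high": 120},
        }

        def verify_device_config(device_type: str, read_back: dict) -> list:
            """Compare the configuration read back from a device with the known good
            reference and return a list of discrepancies to raise as warnings."""
            reference = REFERENCE_CONFIGS.get(device_type)
            if reference is None:
                return [f"no reference configuration held for device type {device_type}"]
            return [f"{key}: expected {value}, device reports {read_back.get(key)}"
                    for key, value in reference.items() if read_back.get(key) != value]

        issues = verify_device_config("flowmeter_xyz",
                                      {"units": "l/min", "range_high": 150, "alarm_high": 200})
        if issues:
            print("Configuration failure detected:", issues)   # warning raised to application layer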
        This tactic could also be used in a more complex plug and play reconfiguration scenario
        where the interchange of configuration data between the intelligent device and the asset
        management application takes place automatically when the device is plugged into the
        host system. In this case there would need to be specific constraints on the data
        exchange protocols and interchange timings so that the configuration verification takes
        place at the right time in the process.
        For a safety tactic that takes advantage of a comparison between configuration settings,
        there needs to be a known good reference configuration to check against. In large
        complex systems with a large number of different COTS devices there is a problem with
        storing, processing and managing a large number of reference configurations within the
        host system. This is not dissimilar to the issues faced by the use of real time blueprints
        in IMA where the number of blueprints needed to deal with a large number of possible
        reconfigurations becomes unmanageable. However, in the case of process automation
        systems, whilst there may be large numbers of field devices, there may not be an
        unmanageable number of different types of field device.
        The other issue with a tactic that may have a large number of possible reference
        configurations is one of performance and timing. In IMA systems, the time it takes to
        reconfigure a system in real time may be critical as circumstances change very quickly
        whilst the aircraft is in flight. With process automation systems, timing may not be as
        critical. Therefore the time it may take to search a database of reconfigurations may not
        become problematic unless the number of reconfigurations becomes unmanageable
        within the available system resources.
        Predictive analysis is another technique that could reasonably be used as part of a
        failure detection tactic. Predictive analysis uses diagnostic data
        collected from system components to create trends that can be used to predict specific
        types of failure before they occur. It is proposed that such a system could be extended
        to compare the output from a plug and play configured device over a period of time to
        known good data either from expected data or from similar devices in the system. This
        type of analysis requires data to be logged and compared with “ideal” data and therefore
        would not provide an immediate means of failure detection, which for critical processes
        may be too late to prevent a hazardous failure becoming an accident. Although useful in
        systems where long term failure rather than instant failure is important to detect, this
        particular tactic would need to be carefully considered before use.
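        As a simple illustration of the trend based approach, the sketch below flags a device
        whose recent readings drift away from an expected value. The window size and tolerance
        are arbitrary assumptions; a commercial asset management package would apply far
        richer analytics.

        # Illustrative trend check: flag sustained deviation of logged device output from
        # an expected value. Thresholds and window size are arbitrary assumptions.
        from statistics import mean

        def drifting(logged: list, expected: float, tolerance: float, window: int = 10) -> bool:
            """True if the mean of the most recent readings deviates from the expected value
            by more than the tolerance - a long term indicator, not an immediate detection."""
            recent = logged[-window:]
            return len(recent) == window and abs(mean(recent) - expected) > tolerance

        readings = [100.0 + 0.4 * i for i in range(30)]            # slow drift away from 100
        print(drifting(readings, expected=100.0, tolerance=5.0))   # True - flag for maintenance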



3.4.5 Failure Containment
        The tactics that could be employed to contain a failure of the plug and play functionality
        of a device are really no different to the type of tactics used to contain a failure of any
        intelligent field device. This is because a failure to reconfigure properly following the
        replacement of a plug and play intelligent device cannot be contained in isolation from
        the requirement to maintain the functionality provided by that device; a failure to
        configure the device properly would inevitably result in the device failing to provide the
        required level of service.
        In terms of failure containment, the best that could reasonably be achieved is the
        containment of the failure to that particular device and function so that a failure of the
        plug and play functionality would not be propagated throughout the system. A suitable
        tactic to prevent propagation of a failure would be to ensure that architecturally the
        interfaces between layers are maintained such that a failure is constrained to the
        interface between the field device and the infrastructure layer and is not allowed to
        propagate through to the application. Once a failure has been detected, it would be
        advisable for the system to flag the device as failed and to take other appropriate action
        to further contain the effects of the failure. By flagging the device as failed, the system
        acts in a deterministic way and stops the device from autonomously coming back on
        line. By effectively shutting off the device from the system, the problem of a babbling
        device is also managed by the system, so that a defective device cannot constantly
        attempt to make the transition from in-service to out of service and back.
        Therefore, to contain the failure such that the function to which the device contributes is
        maintained, there are a number of tactics that can be applied at the architectural
        level. It is proposed that failure containment can be achieved by using failure
        containment techniques based on the principles of redundancy, recovery and masking
        established by Wu et al [41].
        Redundancy Based Failure Containment
        Redundancy is the classic way of containing a failure whilst maintaining the overall
        function within the system. Essentially, the success of this tactic is dependent upon the
        provision of additional resources that are capable of providing the same services as the
        failed device. This is achievable if the system architecture is arranged in such a way
        that additional resources could be switched in without other critical functions being
        compromised. In other words, there is a constraint on the system architecture to provide
        redundancy without degrading other critical functions, although in some cases it may be
        acceptable to degrade some less critical functions in order to maintain more critical
        functions. In this case, the use of intelligent field devices in combination with asset
        management software can be used to ensure that systems remain configured to provide
        graceful degradation in the face of single or multiple failures. To be effective, the host
        system would need to have alternative resources available at the moment a device
        failure was detected and for that resource to immediately provide the same level of
        service, either through direct measurement or derived by calculation. However, this
        cannot always be implemented, especially in circumstances where additional resource
        would be prohibitively expensive or impracticable to implement.
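        The essence of the redundancy tactic can be sketched as a simple fall back chain in
        which the function is maintained from a redundant device or, failing that, from a value
        derived by calculation. The function and names below are hypothetical.

        # Sketch of redundancy based containment: maintain the function from the best
        # available source when the primary device fails. Names are hypothetical.

        def read_flow(primary_ok: bool, primary_value: float,
                      standby_value=None, derived_value=None):
            """Return the best available measurement together with its provenance."""
            if primary_ok:
                return primary_value, "primary device"
            if standby_value is not None:
                return standby_value, "redundant device"
            if derived_value is not None:
                return derived_value, "derived by calculation (degraded confidence)"
            raise RuntimeError("function lost - bring the plant to a safe state")

        value, source = read_flow(primary_ok=False, primary_value=0.0, standby_value=101.5)
        print(value, source)    # 101.5 redundant device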
        Recovery Based Failure Containment
        The tactic that could be used to recover from a failure to successfully reconfigure a plug
        and play field device is to allow the failed device to reset to factory defaults and then to
        retry the reconfiguration until successful. However, there are safety issues with this
        approach that may make recovery less safe than leaving the device in a failed condition.


        The main issue concerns the number of times the device should be allowed to retry the
        reconfiguration before it is determined that a permanent failure has occurred. If there
        are no constraints on the number of times a failed device is allowed to retry
        reconfiguration then the danger is that the device will continue to retry indefinitely. This
        is commonly called “babbling” and can result in the system becoming unstable and its
        behaviour unpredictable. This can pose significant risks to the safe operation of the
        system and should not be allowed to persist. However, there sometimes needs to be a
        trade-off between what is desirable and what is practicable. In the case of recovery from
        a failure to reconfigure a plug and play device, there needs to be a trade-off between the
        requirement to maintain the services of the device and the requirement for the system to
        remain stable. In most cases a single reattempt at configuration would be tolerated so
        long as the timing of such a retry did not compromise the performance requirements for
        the function being serviced by the device. For example, where timing is critical, it may
        be more tolerable to immediately set the device out of service and fall back to a
        redundant configuration to ensure that the system continues to function until a time
        comes when it is safe to undertake maintenance on the device. In other less time critical
        circumstances, it may be possible to allow the device to make several attempts to
        reconfigure before it is flagged as out of service by the system. Whatever approach is
        taken to a failed reconfiguration, once flagged as out of service a device should not be
        allowed to move back into service until either replaced with another device or repaired in
        situ. In all cases, the return to service following some remedial action should be by an
        explicit command by the controlling application within the infrastructure layer and should
        not be left to the individual device to take control and return to service.
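        The recovery behaviour described above can be sketched as a bounded retry followed by
        a latched out of service state that only the controlling application can clear. The retry
        limit and function names are assumptions for illustration.

        # Sketch of bounded recovery: a limited number of reconfiguration attempts, then a
        # latched out of service state cleared only by an explicit command.
        MAX_RETRIES = 1   # a single reattempt tolerated in most cases

        def attempt_recovery(reconfigure, max_retries: int = MAX_RETRIES) -> str:
            for _ in range(1 + max_retries):       # initial attempt plus bounded retries
                if reconfigure():
                    return "in service"
            return "out of service (latched)"      # no autonomous return to service

        def return_to_service(state: str, explicit_command: bool) -> str:
            """Only the controlling application may clear the latched state."""
            if state == "out of service (latched)" and explicit_command:
                return "in service"
            return state

        state = attempt_recovery(lambda: False)    # simulated persistent reconfiguration failure
        print(state)                               # out of service (latched)
        print(return_to_service(state, explicit_command=True))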
        Masking Based Failure Containment
        In terms of failure containment, the use of masking is more problematic. The classic
        approach to masking a failure is through the use of a voting scheme as identified by Wu
        and Kelly [41] and shown in Figure 2-14. Typically, in such a scheme there would be an
        odd number of signal channels feeding data into voting logic that would then compare
        each of its inputs and set the output to the majority result. In this scheme a failure in any
        minority of channels is masked because the voting logic in effect filters out the data that
        does not concur with the majority of the signal channels that make up the system.
        The question is, what is the equivalent means of masking a failure of a reconfigurable
        plug and play device?
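        Before considering that question, the classic 2 out of 3 arrangement just described can
        be sketched for analogue channels as a median vote with a simple agreement check; the
        tolerance value is an arbitrary assumption.

        # Sketch of a 2 out of 3 voter for analogue channels: the median outvotes a single
        # failed channel, whether its output is credible or incredible. Tolerance is arbitrary.
        from statistics import median

        def vote_2oo3(a: float, b: float, c: float, tolerance: float = 2.0):
            """Return the voted value and whether at least two channels agree with it."""
            voted = median([a, b, c])
            agreeing = sum(abs(x - voted) <= tolerance for x in (a, b, c))
            return voted, agreeing >= 2

        print(vote_2oo3(100.1, 99.8, 57.3))   # (99.8, True) - the failed channel is masked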
        The predominant failure mode that is relevant to this type of containment is incorrect
        operation by the device following a reconfiguration that has been instigated by a
        replacement device being “plugged” into the system. Incorrect operation may well
        involve incorrect data from the device that could be either credible or incredible but it
        could also involve incorrect operation of an actuation function. For an actuation function
        the use of a voting system would be inappropriate and the only way it could be masked
        would be to implement redundant actuator(s) such that a second, third or fourth actuator
        would seamlessly perform the function in the event of a single or multiple failures.
        In the case of a sensing system, although a voting system could be implemented in a
        system that implemented plug and play field devices in the same way that a voting
        system could be implemented in a more traditional system, it would necessitate the
        installation of additional sensors which is not always possible. An alternative approach
        would be to take advantage of the intelligence within the overall system. Figure 3-9
        shows a comparison between a 2 out of 3 traditional voting system and a system based
        on intelligent plant management.




                           Figure 3-9 Masking Based Failure Containment
        (Comparison of a traditional 2 out of 3 voting logic arrangement with a system based on
        intelligent plant management.)
        It is proposed that the intelligent plant management system is able to mask an intelligent
        plug and play field device’s failure to reconfigure by monitoring the state of the plant in
        real time and identifying the source of any potential failure. Once a failure has been
        detected, the plant management system would be able to determine what the true data
        should be with a defined level of confidence by comparison with a plant model.
        In the example shown in Figure 3-9, a failure of the device providing Variable 1 could be
        masked by the Plant Management System taking data from the devices providing
        Variables 2 through 4 and, based on a plant model, determining the data that should be
        provided by Variable 1. For this tactic to work, the plant model used by the management
        system needs to be of sufficient fidelity to be able to provide an estimate of the data with
        high enough confidence to ensure that the plant is able to continue safe operation.
        There also needs to be a minimum number of sensors within the system to provide the
        management system with an accurate view of the state of the system. Although this can
        be more efficient than the traditional voting system because there is no need for multiple
        channels, there is the issue of the level of confidence that can be placed in the derived
        variable.
        With a voting system, a number of physical sensors provide the data that is used by the
        voting logic and the output from the voting system is real data from a physical device
        operating in real time. However, in an intelligent plant management scheme for
        providing masking, the data provided by the system in the presence of a failure is
        derived from a multitude of different sources that due to their level of measurement
        fidelity could introduce errors into the data. This is especially true the further away the
        contributing measurements are from the desired point of measurement, such that there is
        a potential difference between the derived data and the actual value of the process at that
        point. This problem is typically handled by plant management software
        by providing a confidence level for the derived data that enables the system or an
        operator to assess the potential impact on the system of accepting the derived data. For
        safety related functions it would be sensible to assess the risk associated with a range of
        confidence levels and to set boundaries within the application software so that once

        those boundaries are breached then the system is brought into a known safe state and
        the defective device repaired or replaced.
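        The analytical masking scheme can be sketched as a derived estimate with an associated
        confidence figure that is compared against a predefined boundary. The linear “plant
        model”, its coefficients and the confidence values below are placeholders chosen only to
        show the shape of the logic.

        # Sketch of analytical masking: when the device providing Variable 1 fails, a plant
        # model estimates it from Variables 2-4 and attaches a confidence figure.
        # The model, coefficients and confidence boundary are placeholders.

        def estimate_variable_1(v2: float, v3: float, v4: float):
            estimate = 0.5 * v2 + 0.3 * v3 + 0.2 * v4   # placeholder plant model
            confidence = 0.85                           # confidence supplied by the model
            return estimate, confidence

        CONFIDENCE_BOUNDARY = 0.8   # below this the derived value must not be relied upon

        estimate, confidence = estimate_variable_1(98.0, 102.0, 100.0)
        if confidence >= CONFIDENCE_BOUNDARY:
            print(f"Masking failure with derived value {estimate:.1f} (confidence {confidence})")
        else:
            print("Confidence boundary breached - bring the plant to a known safe state")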

3.5   Safety Justification
        As identified by Ye [33], there is a need to ensure that the use of the COTS component
        can be justified within the context of the system of which it forms a part. This justification can
        be achieved through the careful matching of products that appear on the market with the
        requirements for a component that are derived from the safety analysis and associated
        safety contracts. In Ye [33], this safety justification is presented as a number of safety
        case patterns, documented in GSN, that argue that the use of a COTS component is
        “acceptably safe”. The argument that a specific COTS component is acceptably safe is
        based on arguments around the acquisition process and being able to argue about the
        suitability of the COTS component from a product perspective.
        For intelligent plug and play COTS field devices the principles of safety justification can
        be applied using a safety argument pattern similar to that proposed by Ye [33], adapted
        for system components rather than software components. From a safety critical process
        automation perspective it is apparent that the chosen COTS device needs to be
        justifiable in the context of the host system and not just the properties of the device itself.
        Likewise, the process undertaken to select the COTS device needs to be sufficiently
        robust to support the overall argument. In this proposal, the use of safety contracts
        developed from a number of appropriate safety tactics is the way in which proposed
        COTS devices may be evaluated and selected. It is therefore proposed that the process
        as described in this chapter provides a sound basis for moving forward into a real world
        case study.
        It also has to be acknowledged that given the large variety of products in the market
        place the evaluation and selection of a COTS device may be quite challenging. This is
        not made any easier by the commercial motivation of COTS manufacturers that is driven
        by market forces. This is especially the case when seeking to ensure that safety
        requirements that are derived from the acquisition process are satisfied by a specific
        device.
        In terms of safety justification there needs to be sufficient evidence that a best fit has
        been achieved and that the match is sufficient to demonstrate that the risk of
        implementing the chosen COTS device has been reduced to ALARP. In practice, the
        need for a robust justification may result in a disproportionate amount of mitigation being
        placed on the host system where there is typically greater control over the bespoke
        elements of the system design.

3.6   Instantiation of the Tactical Approach
        The discussion around the proposal to take a tactical approach to the implementation of
        reconfigurable intelligent hardware has focused on tactics that are designed to ensure
        that, where possible, failures are avoided, detectable and containable in the context of
        a process automation system. The tactics that have been defined in this proposal phase
        of the project are aimed at high level system functions and interactions and are therefore
        generic in their nature. It is envisaged that the generic safety tactics that have been
        discussed here will be further decomposed and instantiated as system specific tactics.
        The proposed process for working with safety tactics is presented as a “V” lifecycle in
        Figure 3-10. On the left hand side of the “V” are the activities that must be completed
        before the system level solution and associated safety justification can be established.
        The activities on the right hand side of the “V” are concerned with ensuring that the
        chosen solution including COTS devices and host system are consistent with the safety

        tactics and that a safety justification can be established for the choice of COTS device in
        the context of the system safety case. The overall lifecycle encompasses the more
        detailed COTS acquisition process as well as the process for developing safety
        contracts that have been discussed throughout this chapter. The lifecycle also provides
        for the important verification and validation (V&V) activities that are required to ensure
        that the processes involved in working with safety tactics are appropriate and the
        outputs sufficient for supporting the safety justification.

                             Figure 3-10 Lifecycle of Tactical Approach
        (A “V” shaped lifecycle: generic safety tactics are instantiated as lower level safety
        tactics derived from safety analysis of the preliminary design and reference COTS device,
        and captured as safety contracts; a solution is then selected on the basis of best fit with
        the safety contracts, with changes to the host system implemented in accordance with
        them. Verification routes confirm that the selected solution satisfies the lower level
        safety tactics and that the safety justification satisfies the safety tactic principles,
        while validation routes confirm consistency between successive stages; the resulting
        safety justification contributes to the system safety case.)
        In Figure 3-10, validation routes are provided that ensure that each stage of the safety
        tactic development is consistent with the previous stage. Likewise, verification routes
        are also provided to ensure that the selected solution satisfies the instantiated safety
        tactics and that the safety justification satisfies the project safety tactics.
        Although not specifically defined here, it is anticipated that a design review type process
        would be used as part of the V&V activity. As part of the review process, documented
        peer reviews could be used to make an informed evaluation of the extent to which the
        outputs from each stage satisfy the input requirements. Likewise, it is envisaged that
        reviews would be used to ensure that the stages on the right of the “V” are brought
        under an appropriate level of scrutiny from not only peer groups but also independent
        bodies operating with sufficient independence to support the overall safety case.

3.7   Conclusion to Proposal Phase
        This phase of the project has looked at the way in which a commercial process
        automation system can be compared to the three layer IMA architecture model and, in so
        doing, at how the principles of applying safety tactics can be used to justify the use of
        intelligent COTS field devices.
        The concept of applying a number of established safety tactics to focus on the specific
        issues with the implementation of plug and play intelligent field devices within process
        automation systems has been explored. The range of safety tactics that could
        successfully be used have been identified under the general headings of Failure
        Avoidance, Failure Detection and Failure Containment. It is proposed that these three
        main tactics should be used together to provide a broad approach, so ensuring that the

        risks involved with taking advantage of plug and play technology are reduced. The
        interaction between the safety tactics can be focused on the primary means of reducing
        risk through the avoidance of failures.
        By applying the principles of IMA, it has been shown how a process for acquiring a
        COTS intelligent field device can be used in conjunction with safety tactics and safety
        contracts as part of an effective strategy to justify the use of the plug and play
        intelligence currently being implemented within process control field devices.

3.7.1 Overview of Safety Tactics
        The generic safety tactics described in this proposal phase can be summarized as
        follows using the acronyms FAST for failure avoidance, FDST for failure detection and
        FCST for failure containment safety tactic.
        Acquisition Process (FAST#1) - Failure avoidance when implementing plug and play
        technology in a reconfigurable process automation system can be achieved through the
        use of a robust acquisition process that is closely linked to system design and hazard
        analysis.
        Safety Case Contracts (FAST#2) - The use of contracts in the acquisition of COTS
        components and the specification of the system architecture is a key feature of failure
        avoidance.
        Health Management (FDST#1) – Failure detection should be implemented across the
        boundaries between the field device, infrastructure and application architectural layers
        through the use of asset management/health management applications.
        Diagnostics (FDST#2) - Failure detection should be implemented in the field device
        through the use of internal diagnostics. Internal diagnostics should be considered at
        device power up and periodically through operations as well as during significant events
        such as configuration and reconfiguration.
        Comparison (FDST#3) – Systems that are capable of detecting failures/errors in data
        transmission between the field device and host system should be implemented.
        Redundancy (FCST#1) – The application of redundancy should be considered in order
        to contain field device configuration failures. Analytical sensing may be used for
        redundancy purposes in certain circumstances.
        Recovery (FCST#2) – A balanced approach should be taken to device initiated auto
        recovery from a failure to reconfigure a plug and play device. Auto recovery may be achieved by
        allowing the failed device multiple attempts at configuration before flagging it as failed.
        Masking (FCST#3) - Analytical Masking should be considered as an alternative to
        traditional voting schemes. If the field device data is found to be defective then
        analytical sensing could also be used in conjunction with this tactic.




4 Case Study – Plug and Play in a Submarine Integrated Platform
  Management System
      This chapter describes the submarine based case study that was undertaken in order to evaluate the
       project proposal that was developed and described in Chapter 3.
          • An introduction to the Case Study is provided that describes the overall scope of the project.
         •      A brief overview is provided that describes how the submarine's position in the water is
                maintained whilst stationary or at very slow speeds using what is known as Submarine Hover
                Control.
         •      A description of the high level Submarine Hazards associated with the submarine and
                specifically the hover control system is provided. The requirements for an intelligent hover
                control flowmeter are also defined.
         •      An overview of the Selection and Application of Safety Tactics that were used in this case
                study.
         •      The instantiation of the generic safety tactics in the context of a plug and play COTS
                Flowmeter in which the principles discussed in chapter 3 are further developed and applied.
         •      The results of the application of safety tactics to the acquisition of a COTS flowmeter in the
                form of an overall Safety Justification.

4.1    Introduction to Case Study




                                      Figure 4-1 Typical Submarine at sea
         This case study is based on the implementation of intelligent plug and play field devices
         in a nuclear powered submarine application. The purpose of this case study is to take
         the proposed approach and the principles of applying safety tactics in an intelligent plug
         and play application that were discussed in Chapter 3 and to apply them to a real process
         automation system. The intended outcome of this case study is to determine the extent to
         which plug and play intelligence can be implemented safely by using a range of
         appropriate safety tactics.
         The scope of this case study is limited to the plug and play functionality that is found in
         many intelligent devices and will become more prevalent as the technology matures and
         is further developed. This limited scope purposely omits the basic function of an
         instrument to either measure a process parameter or to influence a process parameter
         through activation. It is therefore assumed that the basic function of the instrument can
         be safety justified through established safety certification activities.


4.2   Submarine Hover Control
        For the purposes of assessing the safe application of plug and play intelligence on a
         nuclear powered submarine I decided to look at the submarine hover control system,
         which is a sub system of the larger integrated Platform Management System (iPMS).

4.2.1 Overview of Hover System
        Due to the security constraints on the actual submarine design, a hypothetical
        arrangement that could theoretically be used to maintain the depth and trim of a
        submarine while stationary or at very low speed is shown in Figure 4-2. This basic
         system design is based on the author's understanding of typical submarine
         hydrodynamics and the principles of the required performance and does not represent
         any known current design. When the submarine is submerged and stationary or at very
         low speed, the hydrodynamics of the submarine are such that the main means of
         controlling depth, i.e. the hydroplanes and propulsion systems, are not effective and
        therefore an alternative method is used. Under such conditions, the method of
        maintaining the attitude of the submarine is known as “Hover Control”, which is
        synonymous with the “hover” function of a helicopter.




                            Figure 4-2 Basic Submarine Hover Control
        The basic function of hover control is to maintain the position of the submarine by
         moving large quantities of sea water between tanks. Sea water is either pumped in from
         the external environment to make the submarine heavier and so increase depth, or
         pumped out of the submarine to make it lighter and so reduce depth. By moving

        water from tank to tank, the trim of the submarine can be changed so that a level
        position between starboard, port, forward and aft can be maintained.
         As can be seen from Figure 4-2, the movement of water is controlled by the use of a
         number of valves and pumps such that, under specific conditions, the required line up of
         pipe work can be achieved in order to pump water throughout the system and by doing so alter the
        weight distribution. By adding or subtracting water from the Port Tank and Starboard
        Tank the submarine trim can be adjusted in order to maintain a level platform. Likewise,
        by adjusting the amount of water in the Ballast Tank the forward/aft trim as well as depth
        can be maintained.

4.2.2 Hover Control
        Overall, the system is automatically controlled by the onboard iPMS which provides both
        monitoring and control of the system. Figure 4-3 shows how the overall function of
        controlling the Hover capability can be fitted to the previously described (Section 3.2.2)
        three layer architecture that is a tailored IMA architecture [7].

         [Figure 4-3 maps the hover control function on to the three layer architecture: an
         Application Layer containing the Hover Control application; an Infrastructure Layer
         containing Health Monitoring, Network Management, System Management, Operating
         System and Configuration Manager services; and a Field Devices Layer containing
         Pumps, Valves, Level Sensors and Flowmeters, each with their own Diagnostics. The
         layers communicate through defined interfaces.]
                           Figure 4-3 Hover Control Layered Architecture
         At the application level, there is an algorithm that takes as its input data from the
        submarine dynamics such as depth, speed, pitch and roll (Figure 4-4). The algorithm
        also takes inputs from the various flowmeters, pumps, valves and level sensors installed
        throughout the hover system in order to determine the state of the system at any one
        given point in time. From this data, the application algorithm is able to determine the
         state of the system and the dynamic performance of the submarine, and to command
        changes to the system in order to maintain the submarine’s position or to move the
        submarine to a new position in response to a command (human) decision.
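         To make the control relationship concrete, the sketch below gives a deliberately
         simplified, purely illustrative hover control step in Python. The class, threshold and
         command names (HoverState, DEPTH_TOLERANCE_M, FLOOD_BALLAST and so on) are
         assumptions made for illustration only and do not represent the actual hover algorithm
         or any real submarine design.

         # Illustrative sketch only: a highly simplified hover control step.
         # All names and values are hypothetical.
         from dataclasses import dataclass

         DEPTH_TOLERANCE_M = 0.5   # assumed dead band around the ordered depth

         @dataclass
         class HoverState:
             depth_m: float           # current depth from the submarine dynamics
             ordered_depth_m: float   # depth ordered by the command decision
             ballast_flow_lpm: float  # flow measured by the hover flowmeter

         def hover_control_step(state: HoverState) -> str:
             """Return a (hypothetical) pump command based on the depth error."""
             error = state.depth_m - state.ordered_depth_m
             if abs(error) <= DEPTH_TOLERANCE_M:
                 return "HOLD"            # within the dead band: no water movement
             if error < 0:
                 return "FLOOD_BALLAST"   # too shallow: take water in to go deeper
             return "PUMP_OUT_BALLAST"    # too deep: pump water out to rise

         print(hover_control_step(HoverState(depth_m=101.2, ordered_depth_m=100.0,
                                             ballast_flow_lpm=0.0)))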
        Movement of the submarine within a safe operating envelope is maintained by the
        movement of water between the various tanks and the external environment. With
        reference to Figure 4-2, the field instrument that has the most significant impact on the
        hover control is the flowmeter that measures the volume of water that is taken into the


        submarine from the environment and the volume of water that is removed from the
        submarine. Therefore, I have decided to use this flowmeter for the case study.




         [Figure 4-4 illustrates the submarine dynamics parameters of depth, roll and pitch.]
                              Figure 4-4 Submarine Depth, Roll and Pitch


        Currently a typical submarine system employs a flowmeter with microprocessor based
        flow measurement technology that provides a 4 – 20 mA analogue signal to the iPMS as
        an input to the hover control application. It is envisaged that in future submarines, an
        intelligent plug and play flowmeter will be used to provide the same function that should
        also provide the benefits of advanced diagnostics and health management. With
        respect to this case study, the benefits of using a plug and play flowmeter would be to
        allow defective units to be easily replaced without the need to manually update the
        system with the specific characteristics of the replacement flowmeter. Intelligent plug
        and play should also allow the flowmeter installation, setup and calibration to be
        achieved in one simple operation. Overall, the use of an intelligent flowmeter should
        also provide enhanced asset management of the device, although this is outside the
        scope of this case study.

4.3   Submarine Hazards
        By its very nature, the operation of a submarine is a hazardous business. It operates
        both above and under the surface of the World’s oceans. The UK Nuclear Attack
        Submarine is also a piece of military hardware, designed to operate in extreme
         conditions in times of both peace and conflict. It is home to approximately 100 people
         and provides domestic (hotel) services and utilities (heating, electricity and water), all in
         close proximity to a nuclear power plant. The main hazards concern the living
        conditions (air quality), nuclear radiation, Health and Safety (plant operation and working
        conditions), the handling of munitions and the physical movement of the submarine
        through the water.
         It is the hazards associated with the manoeuvrability of the submarine that concern the
        implementation of plug and play technology in the submarine hover application. In the
        case of the flowmeter marked in Figure 4-2, the main hazard concerns the movement of
        large volumes of water into and out of the submarine from the external environment. A
        failure of the flowmeter could lead to too much water being allowed to enter the
        submarine ballast tank which could lead to the submarine losing depth control.
        Ultimately this could lead to an accident with catastrophic consequences as the
        submarine could hit the bottom or some other object. Equally, not allowing enough
        water into the submarine ballast tank or not pumping enough water out could also lead

        to a catastrophic accident, again either by hitting another object or “broaching” the
        surface of the water. This type of accident is more likely to happen in shallow waters
         where the safe envelope of operation is much smaller, although even in deeper waters
         there are hazards associated with the hover system. While in the hover mode of operation a
         submarine is also able to provide a stable platform for operational purposes; if this
         function were to fail, significant harm to personnel could result. Therefore, in general
        terms, the function of hover is safety related and one of the significant field devices used
        in the provision of this function is the flowmeter.

4.3.1 Flowmeter Failure Analysis
        A number of lower level requirements have previously been defined in the FMEA [45]
         undertaken by the Submarine Designer, which analyses the main flow measurement
         process. The results of the FMEA undertaken on the Hover system showed that
        there are three credible dangerous failure modes that could result in a catastrophic
        hazard to the submarine. These three failures are briefly described as,
         •      Failure 1 – No measurement output from the flowmeter and therefore no flow
                measurement input values for the control algorithm. This is a very obvious failure to
                the control system.
         •      Failure 2 – Grossly incorrect measurement output from the flowmeter causing the
                control algorithm to make grossly incorrect control decisions. Generally, the control
                system would be able to detect such grossly incorrect input from the flowmeter and
                take necessary action.
         •      Failure 3 – Subtle incorrect measurement output from the flowmeter causing the
                control algorithm to make incorrect control decisions that would lead to a loss of
                hover control. This is perhaps the most dangerous failure as the control system
                would not detect the failure and would continue as normal.
        A further FMEA was undertaken as part of this case study in order to define the failures
        of the Hover flowmeter intelligent plug and play functionality. The FMEA was
        undertaken using the guide words of “Commission”, “Omission”, “Grossly Incorrect” and
        “Subtly Incorrect” as these were considered the most appropriate for an intelligent
        device. Performance type guide words such as “Early” and “Late” were not used
        because for the purpose of this project in a plug and play application, the transfer of data
        is not considered to be time critical and therefore such guide words are not relevant.
        Rather the important failure modes are those concerned with the correctness of the data
        being transferred as well as the correctness of the intelligent functions that provide the
        plug and play abilities of the flowmeter. This is not to say however that in other cases
        timing guide words “Early” and “Late” would not be appropriate. Using these four guide
        words the following functions were analysed,
         •      Communication: The means of transferring device and system configuration and
                diagnostic data between the flowmeter and the host system. In this case study the
                FDT/DTM scheme was used on a profibus type network infrastructure. Typical
                communication failure modes explored included the failure and corruption of data as
                well as the failures associated with the flowmeter and host system inappropriately
                initiating communications, i.e. failures of commission.
         •      Initialisation: One of the primary functions of the plug and play is the initialisation of
                the device when it is first connected to the host system and initial configuration
                takes place. Failure modes explored included failures in the initialisation and
                configuration process as well as gross and subtle corruption of configuration data.
                Failures of the flowmeter and host system to make the transition from initialisation to
                normal measurement state were also considered.


         •      Diagnostics: Although not strictly part of the plug and play function, device and
                system diagnostics play an important supporting role, especially with regard to the
                detection of failures. Therefore, failure modes in the diagnostic function were
                explored. These included failure of the built in flowmeter diagnostics as well as the
                failure of the diagnostic functions built in to the FDT/DTM mechanism.
        The results of this additional FMEA can be found in Appendix A, a summary of which is
        discussed below.
        Communication Failures
         The main source of communication failures on the network connecting the flowmeter and
         the asset management application that forms part of the iPMS architecture is the corrupted
         transfer of data between the host system and the device. When a flowmeter is inserted
         into the host system, there is an initial transfer of data between the host system
        discovering the device using the FDT/DTM mechanism and the device transferring
        identification and configuration information to the host system.
        The main function of plug and play is to ensure that the device is easily and successfully
        integrated into the host system. The initial detection of the device by the host system is
        a key feature of plug and play functionality. However if the host system fails to detect
        the flowmeter on initialization there is no immediate hazard because the hover
        application would not be operational at the time the flowmeter was inserted. The
        immediate consequence would be that the flowmeter would not appear in the system
        topology and therefore would be unavailable to the Hover system, constraining the
        operation of the submarine.
        With many intelligent devices configured on the system, each with their own device
        drivers it is possible that on initialisation the system topology held in the system
        configuration database could become updated with incorrect data. This is caused by the
        host system detecting and initialising the wrong device so that the configuration settings
        of another device appear on the system as those of the hover flowmeter. This would
        result in the hover application being supplied with incorrect data and possible loss of
        control.
        Once the host system has detected and connected to an intelligent field device there is a
        possibility that the system then fails to disconnect from the device when commanded to
        do so. In the case of the hover system, this failure could be caused by the hover
        flowmeter continuously being connected to the asset management application. This
        results in the hover application being denied access to flow measurement data from the
        flowmeter. Likewise, if the flowmeter itself fails to disconnect from the system when
        commanded to do so a similar situation arises.
        Initialisation Failures
        Following communication failures between the hover flowmeter and the host iPMS the
        next significant area concerned with plug and play technology is that of device
        initialisation. Device initialisation typically concerns the updating of the intelligent device
        with specific data including parameters, device specific diagnostic and calibration
        settings from the asset management application using the DTM (device driver) provided
         by the device manufacturer.
        The failure of the hover flowmeter to read the initial settings from the iPMS and to
        successfully update itself is a key failure mode in this application. This type of failure will
        typically result in the flowmeter coming on-line incorrectly configured for its specific role
        in the hover application. It is assumed that when the hover flowmeter does come on-line
        and fails to be updated with application specific data it will continue to operate with its
        factory default settings. This might not be significant in terms of safety risk if there is
        little difference between the factory defaults and the settings required by the target
         application. However, there is a risk that the delta between the configuration settings
         required by the hover application, and specifically the calibration settings, and the
         factory defaults will be significant enough to cause the hover application to lose control of the
         hover process.
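         A minimal sketch of how such a configuration delta might be detected after initialisation
         is given below. The parameter names, required values and tolerances are hypothetical
         and are used only to illustrate the comparison of applied settings against the hover
         specific configuration.

         # Illustrative sketch: detect whether a flowmeter has come on-line with
         # factory default settings instead of the hover-specific configuration.
         # All parameter names, values and tolerances are hypothetical.
         REQUIRED_CONFIG = {"span_lpm": 2000.0, "zero_offset_lpm": 0.0, "damping_s": 1.0}
         TOLERANCES      = {"span_lpm": 1.0,    "zero_offset_lpm": 0.1, "damping_s": 0.1}

         def config_delta(applied: dict) -> list:
             """Return the parameters whose applied value differs from the required
             hover configuration by more than the allowed tolerance."""
             mismatches = []
             for name, required in REQUIRED_CONFIG.items():
                 if abs(applied.get(name, float("inf")) - required) > TOLERANCES[name]:
                     mismatches.append(name)
             return mismatches

         # A device left at (hypothetical) factory defaults would be flagged here.
         factory_defaults = {"span_lpm": 1000.0, "zero_offset_lpm": 0.0, "damping_s": 2.0}
         print(config_delta(factory_defaults))   # -> ['span_lpm', 'damping_s']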
        Initialisation of the device as part of the plug and play functionality also involves the
        device itself transmitting data to the host system. Failure of the hover flowmeter to
         accurately transmit configuration data to the system could result in the hover application failing
        to maintain control over the hover process due to either gross or subtle errors in the
        data.
        As far as the host system (iPMS) is concerned there are failure modes that can also
        contribute to the failure of the hover system to provide its function following a plug and
        play operation. These failures centre around the actions of the asset management
        application that sits in the infrastructure layer of the iPMS. It is the asset management
        application that initialises the update of the hover flowmeter with configuration data as
        well as ensuring that the system configuration is kept up to date with relevant device
        data.
        If the iPMS was to continuously send initialisation commands and data to the flowmeter
        there is a possibility that the flowmeter would respond to those commands by remaining
        in its initialisation mode and would therefore be prevented from outputting measurement
        data to the hover application. At the point of plug and play initialisation the hover system
        is not operational and therefore a failure at this point would impact the availability of the
        function rather than the safety of the submarine. However, as a function of the
        interaction between the iPMS and the hover flowmeter, the possibility that the iPMS
        could spuriously command the initialisation of the flowmeter whilst the hover function is
        operational needs to be considered.
        As part of the initialisation of the hover flowmeter during the plug and play operation the
        failure of the hover flowmeter to accurately transmit configuration data to the system
        could result in the hover application failing to maintain control over the hover process.
        This failure could either be caused by gross or subtle errors in the data introduced either
        by the device manufacture during the development of the DTM or by data corruption. As
        part of the acquisition process it is assumed that the integrity of the DTM is assured by
        the application of safety tactics FA#1 and FA#2 combined with specific certification
        schemes from the FDT Group [46] which is the body responsible for controlling and
        certifying the technical aspects of FDT/DTM.
        Diagnostics Failures
        The capability of the device to identify internal failures through the use of inbuilt
        diagnostics is an important mitigation for the detection of failures. However, the
        diagnostic function is also susceptible to failures caused by errors in the software
        providing the intelligent functions and therefore needs to be considered as part of this
        study.
         Depending on the extent to which the diagnostic functionality fails there is a range of
        effects on the ability of the hover flowmeter to provide its primary function of flow
        measurement. If following initialisation the flowmeter internal diagnostics fail to run
        there is a possibility that errors in the configuration setup at initialisation would not be
        flagged up to the iPMS. Un-revealed failures in the flowmeter could impact on the ability
        of the flowmeter to provide accurate flow measurement results to the hover application
        and therefore loss of submarine control is a possibility.



        The other significant failure of the inbuilt flowmeter diagnostics concerns the failure of
        the flowmeter to make the transition from diagnostics mode to normal operation or to
        move to a diagnostic mode when it should be in normal operating mode. In both cases
        the flowmeter fails to provide normal flow measurement when required to do so by the
        hover application resulting in possible loss of submarine control.
        As well as there being diagnostic functions built in to the flowmeter there is also a soft
        diagnostic validity check of device parameters provided as part of the DTM (device
        driver) functionality. The parameter checking function within the DTM checks that the
        correct device parameters have been enabled in the host application. In the case of the
        use of a plug and play flowmeter, on initialisation the associated DTM will check that
        there is a match between the parameter provided by the device and those expected by
        the application. However, if this function was to fail to detect an error there would be the
        possibility of the hover application either processing incorrect data or not being able to
        access the correct device parameters.
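         The sketch below illustrates the general idea of such a parameter validity check. The
         parameter set and function names are assumptions made for illustration only and do not
         reproduce the actual FDT/DTM interfaces.

         # Illustrative sketch of a DTM-style parameter validity check: the parameters
         # reported by the device are compared with those the host application expects.
         # The parameter names are assumptions, not the real FDT/DTM definitions.
         EXPECTED_PARAMETERS = {"flow_rate", "flow_direction", "device_status", "serial_number"}

         def check_device_parameters(reported: set) -> tuple:
             """Return (missing, unexpected) parameter sets so the host can refuse to
             bring the device on-line if the match is incomplete."""
             missing = EXPECTED_PARAMETERS - reported
             unexpected = reported - EXPECTED_PARAMETERS
             return missing, unexpected

         missing, unexpected = check_device_parameters(
             {"flow_rate", "device_status", "serial_number", "totaliser"})
         if missing or unexpected:
             print("Parameter mismatch:", missing, unexpected)  # flag to the iPMS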

4.4   Flowmeter Requirements
        The role of the flowmeter in the provision of the hover function is understood and is
        considered to be fairly simple. However, it is useful to briefly outline the expected
        requirements that a flowmeter would be expected to satisfy in this specific application.
        The flowmeter is required to continually measure flow in both directions from 0 to n
        Litres per minute during hover operations.
        The flowmeter itself plays a significant role in the safe provision of the Hover function by
        providing the system with flow data relating to the flow and quantity of sea water in and
        out of the submarine via a hull valve. The data provided by the flowmeter has a direct
        influence on the algorithm that the control system uses to determine how much water to
        move and the route that water must take to ensure that the submarine remains under
        control. Therefore the primary requirement for the flowmeter is that it should provide
        data to a specified accuracy.
        The flowmeter should also be capable of providing an analogue signal that represents
        the direction and quantity of flow. This would typically be a 4-20mA signal that would
        connect directly into a field wiring connection facility, although the same functionality
        may be provided digitally through a network connection using a suitable device language
        and protocol.
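         As an illustration of one common scaling convention, the sketch below maps a 4-20 mA
         loop current on to a bidirectional flow range, with 12 mA representing zero flow. The full
         scale value is an assumed figure and not a requirement of any real system.

         # Illustrative sketch: one possible bidirectional mapping of a 4-20 mA signal
         # on to a flow range. FLOW_SPAN_LPM is an assumed full-scale figure.
         FLOW_SPAN_LPM = 2000.0   # hypothetical full-scale flow at 20 mA

         def current_to_flow(current_ma: float) -> float:
             """Map 4 mA -> -FLOW_SPAN_LPM, 12 mA -> 0 and 20 mA -> +FLOW_SPAN_LPM.
             Currents outside 4-20 mA are treated as a fault condition."""
             if not 4.0 <= current_ma <= 20.0:
                 raise ValueError("loop current outside 4-20 mA: possible sensor fault")
             return (current_ma - 12.0) / 8.0 * FLOW_SPAN_LPM

         print(current_to_flow(16.0))   # -> 1000.0 L/min in one direction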
        The flowmeter should also provide a number of additional functions including,
         •      Self diagnostics and health monitoring
         •      Self calibration and configuration on power-up (plug and play technology)
         •      Provide status information to the system using a recognised device language
         •      Be capable of interfacing with a digital network
        The rest of this case study will consider how, based on these requirements, an intelligent
        plug and play flowmeter may be used for the hover function whilst maintaining the safety
        requirements for the system.

4.5   Selection and Application of Safety Tactics
        In order to be able to tolerate the automatic configuration of a plug and play flowmeter in
        the hover application a number of the safety tactics discussed in Chapter 3 have been
         selected. The selection of safety tactics has been dependent upon the type of field


        device to be used and the constraints placed on the use of the device by the initial
        system architecture and the physical constraints of a submarine environment.
        When instantiating the safety tactics within a framework of COTS acquisition and safety
        contracts the suggestion offered by Conmy et al [38], which was briefly explored in
        Section 3.4.3, concerning the levels of abstraction that should be considered has been
        used. As a reminder, the four levels of abstraction suggested by Conmy et al [38] are,
         •      High Level Requirements
         •      Architectural level Constraints
         •      Behavioural Level Constraints
         •      Quantifiable details including performance and reliability etc.
         The same levels of abstraction can in principle be applied to a control and
         instrumentation system such as the iPMS on board a nuclear submarine. It is this
        principle of abstraction from the IMA safety contracts work that will be applied to the use
        of the safety tactics that are proposed for the hover flowmeter application. The
        application of each level of abstraction is discussed as follows.

4.5.1 High Level Requirements
        The high level requirements for the interaction between the hover flowmeter (the
        supplier) and the host system (the client) are at the highest level of abstraction and are
        fundamental to the correct and safe operation of the hover control system. The
        requirements are therefore concerned with the type and function of the flowmeter that is
        required to support the overall hover control function.

4.5.2 Architecture Level Constraints
        The hover flowmeter forms part of a larger integrated system that is used to provide the
        hover function. At the architectural level the application of an IMA type layering
        approach provides three main layers to consider, the application, the infrastructure and
        the field device. The hover application interfaces with the infrastructure layer that
        provides asset management, configuration management and network services. At the
         lowest layer the hover flowmeter provides the raw data required by the hover application
         and is managed from a plug and play perspective by the infrastructure layer. From this
        architecture there are interfaces between the flowmeter and host system that could be
        exploited in order to detect failure of the plug and play function. The architecture also
        provides opportunities for failure detection and containment through the use of analytical
        sensing by utilising the array of field devices that are also used as part of the hover
        control process.

4.5.3 Behavioural Level Constraints
        At the behavioural level, the types of safety tactic that can be used for the plug and play
        operation of the hover flowmeter are mainly concerned with the individual components
        within each of the architecture layers. The main focus of the safety tactics development
        is on those behaviours of the flowmeter itself and specifically of the built-in intelligence
        that could reasonably contribute to failure of the flow measurement function. The built-in
        intelligence includes communication, initialisation, diagnostics and the transfer of data
        between the flowmeter and the iPMS. It also includes the components within the iPMS
        that could also contribute to a failure within the context of the plug and play functionality.
        In this case study, the behaviours of the iPMS network and asset management
         application are a key factor as these components have a direct influence on the plug and
        play function provided by the flowmeter.
4.5.4 Quantifiable Level Constraints
        Although the plug and play functionality of the hover flowmeter does not need to meet
        any hard real time deadlines there are a limited number of failure modes that could
         reasonably be addressed using performance type constraints. One such issue is how
         many times the flowmeter should be allowed to attempt to initialise itself on the system
         before it is declared faulty and taken out of service. Network performance and the
         timeliness of transferring data across the network may also be important, as is the use or
         misuse of certain command strings between the flowmeter and the iPMS.
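         The sketch below illustrates how such a bound on initialisation attempts might be
         expressed. The attempt limit and the stand-in initialisation function are assumptions for
         illustration only.

         # Illustrative sketch: bound the number of plug and play initialisation
         # attempts before the flowmeter is declared failed. MAX_INIT_ATTEMPTS and
         # try_initialise() are hypothetical.
         import random

         MAX_INIT_ATTEMPTS = 3

         def try_initialise() -> bool:
             """Stand-in for one initialisation/configuration attempt."""
             return random.random() > 0.5

         def initialise_flowmeter() -> str:
             for attempt in range(1, MAX_INIT_ATTEMPTS + 1):
                 if try_initialise():
                     return f"IN_SERVICE after attempt {attempt}"
             return "DECLARED_FAILED"   # flag to the asset management application

         print(initialise_flowmeter())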

4.5.5 Documenting of Tactics
        Wu et al [41] propose a template for documenting safety tactics that aims to describe
        each tactic in a structured way. Whilst the rationale behind the creation of the template
        is understood, the safety tactics for this case study will be translated into safety
        contracts. Therefore, the derivation of the safety tactics has been based on the general
        template that includes, “Aim”, “Description”, “Rationale”, “Applicability”, “Consequences”,
        “Side Effects” and “Patterns” but the safety tactics themselves have not been
        documented in the form proposed by Wu et al. At this stage of development, each
        proposed safety tactic is described in free form text using the section headings from the
        template as guide words.

4.5.6 Application of Safety Tactics and Safety Contracts
        The application of the safety tactics in this case study has been undertaken in
        accordance with the generic process discussed in Chapter 3, section 3.6. For the
        purposes of this project there was a deviation from the process in that group activity
        such as the safety analysis of plug and play functions and Validation & Verification
        activities were undertaken solely by the author. The application of the safety tactics and
        the documenting of safety contracts have been based on a theoretical intelligent
        flowmeter and host system (iPMS) using currently available technologies and
        techniques.
        Table 4-7 on page 89 provides a cross reference between the generic safety tactics
        presented in Chapter 3 and the lower level safety tactics that have been developed as part of
        this case study.

4.6   Failure Avoidance Tactics
        Failure avoidance is the first safety tactic that should be considered as its aim is to
        reduce the probability that the flowmeter could fail in such a way that would be
        hazardous.
        In the context of this specific case study, the failure of the flowmeter is taken as a failure
        to successfully automatically configure (flowmeter and system) to a known good state
        when a flowmeter is inserted in the system. In the context of the submarine hover
        function a failure of this type may not manifest itself immediately as the hover system is
        not continually operating at all times during a submarine mission and therefore it could
        feasibly be a relatively long period of time before a plug and play induced error would
        have an effect on the safe operation of the submarine.

4.6.1 Reference COTS component
        In the approach taken by Ye [33] a reference COTS component was defined in order to
        undertake Component Criticality Analysis, which itself feeds into the development of the
        safety contract between the COTS component and the system of which the component
        was to form a part. In the case of the hover flowmeter, a simple reference device can be
        defined from which the contribution the flowmeter function makes to the safety of the
        hover function can be determined.
        The equivalent of the reference COTS component in this case study is a reference
        flowmeter that satisfies the initial requirements determined by the system architecture,
        overall hover function and the demands of the hover control algorithm. From the hover
        system overview in Section 4.2 and the flowmeter requirements in Section 4.4 the most
        significant aspects of the reference flowmeter are its interfaces and enhanced functions
        such as self diagnosis.
        The hover flowmeter needs to have an interface with the iPMS infrastructure layer in
        order to exchange device and measurement data. For this functionality, a device with a
        fieldbus protocol connection can be considered suitable and the basis for further
        analysis.
        The hover flowmeter also needs to be capable of self diagnosis and so the reference
        flowmeter should also include some intelligence with regard to self monitoring and health
        management.
        There is also a need to access device specific information in order to take advantage of
        the built-in intelligence that would be required for plug and play functionality. In the case
        of an intelligent flowmeter, the use of a suitable means of accessing data will be
        required. The most up to date approach is to use device data files to store and share
        details about a field device throughout a process automation system. One such
        approach is to use the Field Device Tool (FDT) [46] which provides standard
        communication of device data independently of the specific network protocol being used.
        An FDT enabled application is able to interface with a specific device on the network
        through the use of a device specific driver called a Device Type Manager (DTM). The
        DTM can be considered to be analogous to a device driver in that it contains device
        specific information that enables a computer application to access the device. As the
        flowmeter will be inserted in to an iPMS it will need to be accessible to the hover
        application and infrastructure. It would therefore seem reasonable for the reference
        flowmeter to be accessible to the iPMS using FDT/DTM technology.
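        To illustrate the general shape of this arrangement, the sketch below models a host
        (frame) application holding a device specific driver object for the flowmeter. It is a
        simplified abstraction of the FDT/DTM concept only; the class and method names are
        hypothetical and do not reproduce the real FDT interfaces.

        # Illustrative sketch of the idea behind FDT/DTM access: the host (frame)
        # application talks to each device through a device-specific driver object.
        # All names here are hypothetical.
        class FlowmeterDTM:
            """Hypothetical device driver exposing identification and configuration."""
            def __init__(self, serial_number: str):
                self.serial_number = serial_number

            def identify(self) -> dict:
                return {"type": "flowmeter", "serial": self.serial_number}

            def write_configuration(self, config: dict) -> bool:
                # In a real DTM this would push parameters down to the device.
                return bool(config)

        class FrameApplication:
            """Hypothetical host-side container that holds one DTM per device."""
            def __init__(self):
                self.topology = {}

            def register(self, address: int, dtm: FlowmeterDTM):
                self.topology[address] = dtm.identify()

        frame = FrameApplication()
        frame.register(21, FlowmeterDTM("FM-001"))
        print(frame.topology)   # {21: {'type': 'flowmeter', 'serial': 'FM-001'}}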
        The normal function of flow measurement is also important in the overall operation of the
        control system, although for this case study it is assumed that the performance of the
        flowmeter is unimportant given the emphasis on the plug and play configuration and
        reconfiguration function. Therefore, the actual implementation of the flow measurement
        function within the flowmeter will not be considered further.

4.6.2 Evaluation and Selection Criteria
        In terms of defining the evaluation and selection criteria for the flowmeter the major
        activity in the proposition made by Ye [33] was the component criticality analysis (CCA).
        As discussed in Chapter 3, there are several safety analysis techniques that could be
        used to satisfy the requirements to undertake CCA. In this case study it was decided
        that an appropriate technique would be to use Failure Modes Effects Analysis (FMEA)
        as this appeared to fit well with the analysis of what is essentially a black box device
        within the context of a control system. The use of FMEA within the scope of this case
        study also fitted with the analysis of the “flow” function undertaken by BAE Systems [45]
        and was described as part of the case study context in Section 4.3.1.
        From the FMEA, given the extent of the mitigation available, such as trained operators,
        additional depth indication etc., the SIL of the measurement function provided by the
        flowmeter was determined to be SIL 2. This FMEA is considered to be valid for this

        case study as it was undertaken for the same system, in the same application using the
        same functionality within submarine iPMS. It is also valid in so far as the plug and play
        functionality of the hover flowmeter could plausibly contribute to any of the three failure
        modes identified by the FMEA. These three failure modes are therefore taken further in
        an additional FMEA of the plug and play function that was undertaken as part of this
        case study and documented in Appendix A.
        However, for the acquisition of the hover flowmeter the requirement for flow
        measurement with safety integrity of SIL 2 can itself be captured as an instantiation of
        the generic safety tactic FAST#1.
        Safety Tactic FA#1 – Acquire a flowmeter that has been independently certified to IEC
        61508 SIL 2.
        In terms of the basic function of the flowmeter this tactic is reasonable and enables the
        rest of the safety work to be concentrated on the implementation of the plug and play
        functionality within the specific iPMS hover application. As the flowmeter is an intelligent
        field device the certification should cover both the hardware and firmware and should be
        appropriate for the flow measurement application in the hover system taking in to
        consideration the failure modes that have been identified. As the flowmeter is not an on
        demand type safety device, the certification should be for a continuous operation mode
        with a probability of dangerous failure of not greater than 10⁻⁶ failures per hour based on
        the definition of high demand mode and Safety Integrity Levels in IEC 61508 [5].
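        As a simple illustration, the sketch below checks whether a manufacturer quoted
        probability of dangerous failure per hour (PFH) falls within the IEC 61508 band for SIL 2
        in high demand/continuous mode (at least 10⁻⁷ and less than 10⁻⁶ per hour). The quoted
        PFH value is an assumed example and not data for any real device.

        # Illustrative check of a quoted PFH against the IEC 61508 SIL 2 band for
        # high demand / continuous mode. The example figures are assumptions.
        SIL2_CONTINUOUS_BAND = (1e-7, 1e-6)

        def meets_sil2_continuous(pfh_per_hour: float) -> bool:
            lower, upper = SIL2_CONTINUOUS_BAND
            return lower <= pfh_per_hour < upper

        print(meets_sil2_continuous(3.5e-7))   # True  - inside the SIL 2 band
        print(meets_sil2_continuous(2.0e-6))   # False - only good enough for SIL 1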
        Although this safety tactic applies primarily to the basic flow measurement function it
        may also be useful in the argument that the development and implementation of plug
        and play functionality within the flowmeter has been undertaken to a level of integrity
        comparable with the integrity of the main flow function.
        The overall goal to acquire a COTS component with a given safety integrity level has
        typically been argued from a process and/or from an analysis of operational data.
        However, as the commercial world of process automation starts to adopt the
        international safety standard IEC 61508 in greater numbers, there is now an alternative
        approach through the use of independent 3rd party certification. Whilst the acceptance
        of independent safety certification for intelligent devices is still an unresolved issue
        within the nuclear safety industry it does not appear to be an unreasonable tactic to use
        if some additional precautions are put in place.
        For the acquisition of the flowmeter, the IEC 61508 [5] SIL 2 certification should be
        supplemented by a report issued by the independent certification body which itself
        should be reviewed and any specific limitations noted and dealt with. At this point in
        time, it would appear that the market place for intelligent field devices such as the
        flowmeter is not yet mature enough with regard to safety certification. Issues that were
        encountered while searching for a commercially available intelligent flowmeter with
        suitable IEC 61508 [5] SIL 2 safety certification included,
         •      Certification that only applied to the hardware and not the firmware. This is clearly
                an issue for the implementation of an intelligent flowmeter where systematic errors
                in the firmware could lead to a dangerous failure. In most cases the SIL 2
                certification was based on IEC 61508 pt 2 [5] and consisted of a hardware FMEA
                and Reliability Analysis.
         •      Certification that was only applicable to devices with a low demand operation, which
                is defined in IEC 61508 [5] as an operational demand of less than 1 per year. This
                would not be appropriate for the hover flowmeter as it would be expected to be
                operated more frequently and therefore certification applicable to continuous
                operation would be more acceptable.

         •      Certification based on “proven in use”. Whilst this is useful, it can be difficult to
                apply in a marine/military application given that in all cases the previous use of an
                intelligent flowmeter has been in the commercial areas of petrochemical and
                process control. There is an ongoing debate over the “proven in use” approach to
                safety certification and two papers of interest that provide a useful overview of such
                issues have been written by Amkreutz et al [43], that gives an IEC 61508 approach
                and by Ferrell et al [44] which takes a wider view from a DO-178B certification
                approach.
        Having defined the high level integrity requirement for the hover flowmeter and identified
        possible constraints on the use of COTS and 3rd party safety certification further safety
        tactics concerning the high level requirements for the hover flowmeter have been
        defined by instantiating the generic safety tactics FAST#1 and FAST#2 as a single
        safety tactic FA#2.
        Safety Tactic FA#2 – Ensure that failures are avoided by choosing the most suitable
        COTS flowmeter for the role and that appropriate constraints on the flowmeter and iPMS
        have been defined and applied using a range of appropriate safety contracts.
        It could be argued that Safety Tactic FA#2 is really common sense in terms of acquiring
        a product to be used in a safety related system. Whilst this is true, it is often the case
        that COTS components are bought in isolation from the wider needs of the host system
        such as the iPMS and certainly without considering fully the safety impact the
        component has on the whole system. This tactic is useful in that it can be used to define
        the constraints and dependencies between a COTS product such as the hover
        flowmeter and the wider iPMS control system. As a tactic, FA#2 encompasses all of the
        safety tactics that in turn are used to drive out the specific constraints and dependencies
        between the COTS flowmeter and the iPMS by using the principles of the safety
        contract.
        In applying the concept of safety contracts to safety tactic FA#2, the most appropriate
        level of abstraction is considered to be the high level requirements. At this level of
        abstraction the high level requirement for the implementation of plug and play
        intelligence needs to be considered across the whole application and not just the COTS
        flowmeter. Lower levels of abstraction including architectural and behavioural are more
        appropriate for defining lower level constraints that may be applied to individual
        components and the interactions between such components. High level requirements
        can be considered to embrace a much wider scope and be used to address broader
        COTS component acquisition issues.

4.6.3 High level Requirements
        The high level requirements have been determined to be,
         •      The hover flowmeter shall be acquired from a manufacturer with a proven track
                record in the design and development of intelligent flow measurement devices with
                plug and play functionality. This requirement is not component specific, but is useful
                in terms of supporting the overall safety tactic of choosing the most suitable COTS
                product. This can be especially important when justifying the use of the flowmeter
                in the safety case as it can add to the overall body of evidence in favour of its use.
                However, it may of course be a source of negative evidence if the intelligent
                flowmeter is the first product developed by the manufacturer with this technology.
         •      The hover flowmeter shall provide intelligent plug and play functionality that includes
                self diagnostics and health management capabilities. Without this requirement, the
                flowmeter would not satisfy the basic functional requirement for plug and play


                technology and therefore would not be suitable. The intelligence provided by the
                flowmeter should also be compatible with that provided by the host system.
         •      The hover flowmeter shall support the FDT/DTM device description language and
                management tools. In essence this means that the hover flowmeter needs to have
                an associated device driver (DTM) developed in accordance with the FDT standard. To
                fail to meet this fundamental requirement would mean that the interface between the
                flowmeter and the host system would not operate correctly and it would not be
                possible to transfer data between the two.
         •      The hover flowmeter shall be acquired with appropriate safety certification.
                Although this requirement was discussed in some detail as part of Safety Tactic
                FA#1 it is useful to capture it here as a high level requirement. At the very least, the
                safety case for the primary flow measurement function needs to be supportable by
                the chosen flowmeter. This may be satisfied through independent certification or
                through other means such as verifiable process compliance and product testing.
         •      The network shall be capable of supporting the data throughput rates demanded by
                the exchange of configuration data between the flowmeter and the host system. As
                part of a failure avoidance tactic, the performance of the network upon which COTS
                components such as the flowmeter reside needs to be managed such that each
                device can be serviced when required. In terms of safety, the provision of plug and
                play configuration data on the network is not time critical in this application because
                operational constraints would not allow the hover application to run if the flowmeter
                had not been successfully integrated with the system. A delay in the transmission
                of plug and play configuration data would not therefore affect the hover application.
                However, other parts of the system may be more susceptible to poor network
                performance and the additional network traffic generated by the plug and play
                function should not be allowed to impact this.
        This relatively short list of high level requirements can be used to inform the final
        selection of a COTS component as it covers the basic requirements for the flowmeter.
        Typically, high level requirements will include some technical and non technical aspects
        as in this case. The primary aim of such an approach is to ensure that there is a solid
        foundation from which to move to lower level tactics and to ensure that safety drives the
        acquisition process.
        Below the high level requirements sit a number of lower level requirements that have
        been derived through the application of failure detection and failure containment safety
        tactic. Not only do these safety tactics sit alongside the failure avoidance tactic but they
        can be seen to also be part of failure avoidance. The most appropriate safety tactics to
        apply in order to drive the safety contract for the COTS flowmeter were determined as
        part of the intelligent flowmeter FMEA (see Appendix A). Each failure was analysed in
        order to determine whether failure detection and/or failure containment tactics could be
        reasonably applied in this application given the proposed architecture and system
        components. In line with the approach taken by Conmy et al [38], the analysis of the
        plug and play failures considered three further levels of abstraction (architectural,
        behavioural and quantifiable) and the possible failure detection and failure containment
        safety tactics that could be applied. The following sections 4.7 and 4.8 describe the
        safety tactics that were defined using this process.

4.7   Failure Detection Tactics
        Having determined the type of failures that could reasonably occur with a plug and play
        implementation of an intelligent flowmeter a range of failure detection tactics have been
        developed from the generic safety tactic principles discussed in Chapter 3 (FDST#1-3).
        Having developed the failure detection safety tactics a number of failure detection

        constraints were documented in accordance with the safety contracts discussed in
        Chapter 3.
        As a general principle, in this type of system failure detection can be implemented within
        the device and/or within the host system. In the case of the hover application that is
        hosted on the iPMS there are failure detection tactics that can be implemented within the
        hover flowmeter as well as within the infrastructure layer of the iPMS. The instantiated
        failure detection tactics are documented as safety tactics FD#1 - #8.

4.7.1 Communication Failure Detection Tactics
        Safety Tactic FD#1 – Implement appropriate device polling techniques
        “Sign of life” is a technique that could be embedded within the network management
        infrastructure and is used to periodically check that a device on the network is still active.
        This tactic is also at the architectural level and in the case of the plug and play hover
        flowmeter, a possible tactic would be to use device polling to initially register the
        flowmeter on the system following initialisation and configuration. Once registered, the
        device polling tactic could be used to detect a failure that prevented the flowmeter from
        providing a healthy status. From the FMEA in Appendix A there are failures of
        commission where the flowmeter fails to successfully disconnect from the iPMS asset
        management application on completion of initialisation. In this example, device polling
        would detect that the flowmeter was not responding to the poll and would either flag the
        failure and/or initiate failure containment measures.
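        A minimal sketch of a periodic sign of life poll is given below. The poll period, the missed
        poll limit and the stand-in polling function are assumptions made purely for illustration.

        # Illustrative sketch of a periodic "sign of life" poll. poll_device() is a
        # hypothetical stand-in for a real network-layer status request.
        import time

        POLL_PERIOD_S = 0.1      # shortened for the example
        MAX_MISSED_POLLS = 3

        def poll_device(address: int) -> bool:
            """Stand-in: return True if the device answered the status poll."""
            return address == 21   # pretend only the flowmeter at address 21 replies

        def monitor(address: int, cycles: int = 5) -> str:
            missed = 0
            for _ in range(cycles):
                if poll_device(address):
                    missed = 0
                else:
                    missed += 1
                    if missed >= MAX_MISSED_POLLS:
                        return "DEVICE_FAILED"   # flag and initiate containment
                time.sleep(POLL_PERIOD_S)
            return "DEVICE_HEALTHY"

        print(monitor(21, cycles=3))   # healthy device keeps answering the poll
        print(monitor(35, cycles=5))   # silent device is declared failed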
        Safety Tactic FD#2 – Implement appropriate data integrity checks
        Data integrity checks such as those provided by Cyclic Redundancy Checks (CRC) or
        checksums could be used as a general technique to ensure that data passing between
        the flowmeter and the asset management application during plug and play initialisation
        and configuration had not been corrupted. By using data integrity checking, data
        corruption failures may be detectable. The application of integrity checking on the
        network would be implemented in the specific network protocols and device languages
        used, such as PROFIBUS, PROFIsafe and FDT/DTM mechanisms and therefore could
        be considered as a behavioural level tactic. It is understood that FDT/DTM includes
        provision for data checking that would be implemented as part of the DTM device driver
        provided by the Flowmeter manufacturer.
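        A minimal sketch of the data integrity principle, using a CRC-32 from the Python standard library, is given below; the framing is illustrative only and does not represent the actual PROFIsafe or FDT/DTM data-checking mechanisms.

            import zlib

            def frame_with_crc(payload: bytes) -> bytes:
                """Append a CRC-32 so the receiver can detect corruption in transit."""
                crc = zlib.crc32(payload) & 0xFFFFFFFF
                return payload + crc.to_bytes(4, "big")

            def check_frame(frame: bytes) -> bytes:
                """Verify the trailing CRC-32 and return the payload, or raise on corruption."""
                payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
                if zlib.crc32(payload) & 0xFFFFFFFF != received:
                    raise ValueError("data integrity check failed: CRC mismatch")
                return payload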
        Safety Tactic FD#3 – Implement appropriate timeout techniques
        Implementing specific timeout periods for communication between the iPMS and the hover flowmeter may be used to detect both failures in the network and failures of the flowmeter. Typically, timeouts would be implemented at the iPMS infrastructure layer such that data passing from the field device layer through the infrastructure to the application layer is managed. Part of that management would ensure that the failure of a field device such as the flowmeter to respond to a command within a specified period of time would be detected and appropriate action taken. Timeouts may also be used to manage the number of attempts the iPMS is allowed to make to access the flowmeter before giving up and declaring the flowmeter out of service.
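        The sketch below illustrates the combination of a per-transaction timeout with a bounded number of access attempts; the transact callable and the timeout and attempt values are assumptions made for illustration.

            def access_with_retries(transact, timeout_s=2.0, max_attempts=3):
                """Attempt a device transaction a bounded number of times.

                transact -- callable(timeout_s) returning a response, or raising
                            TimeoutError if no reply arrives within timeout_s
                Returns the response, or None if the flowmeter should be declared
                out of service.
                """
                for _attempt in range(max_attempts):
                    try:
                        return transact(timeout_s)
                    except TimeoutError:
                        continue            # no reply within the allowed period; retry
                return None                 # give up: declare the flowmeter out of service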
        Safety Tactic FD#4 – Implement device identification through asset management and
        system configuration techniques.
        To ensure that communication is properly maintained between the host system and multiple intelligent devices in a process automation system, a method should be used to ensure that each device on the network is uniquely identified. This type of identification should work much as a computer network uses IP addresses to identify equipment on the network and to ensure that messages are routed to specific addresses. Although systems that use Ethernet and IP protocols are well established, there are methods of implementing the same type of function that are specific to the process control industry. PROFIsafe, which is a software layer that sits on top of network protocols such as PROFINET and PROFIBUS, includes a method of unique device identification and addressing on the network. The FDT/DTM scheme also includes device identification and addressing that can be used to ensure that data messages are always sent to the correct device. Such schemes also use techniques to detect failures so that devices typically time out if communication fails. Additional functionality could also be added within the infrastructure layer to make comparisons between a device and a known good configuration so that a device on the system, in this case the hover flowmeter, is correctly identified and data is not sent to an incorrect device.
        This type of functionality is well established and therefore the issue here is one of choosing the most appropriate scheme for the specific application.
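        A simple sketch of the comparison between the devices discovered on the network and a known good configuration is shown below; the dictionary-based representation of addresses and identities is an assumption for illustration and is not tied to any particular protocol.

            def check_network_configuration(expected, scanned):
                """Compare devices found on the network with the known good configuration.

                expected -- dict mapping device address -> expected identity (e.g. tag/type)
                scanned  -- dict mapping device address -> identity reported by the scan
                Returns (missing, unexpected, mismatched) so the infrastructure layer
                can withhold data from devices that are not correctly identified.
                """
                missing    = [a for a in expected if a not in scanned]
                unexpected = [a for a in scanned if a not in expected]
                mismatched = [a for a in expected
                              if a in scanned and scanned[a] != expected[a]]
                return missing, unexpected, mismatched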
        The safety contract between the hover flowmeter and the iPMS for ensuring that
        communication failures are detected is shown in Table 4-1


         Failure Detection Constraints
         Function              Data Transfer/Communication                                             SIL    2
         COTS Device           Hover Flowmeter
         Potential Failure                COTS Device (Supplier)        IMP   Host System (Client)           IMP
         Architecture Level
         Corrupted transfer of data       Flowmeter must support the    H     Network     topology   must     H
         between host system and          use     of    the   chosen          provide redundant paths with
         device.                          communication protocol and          integrated diagnostics
                                          physical media
         Behavioural Level
         Communication          channel   Flowmeter must provide sign   H     Host system must include        H
         failure                          of life status signal               method     of   periodically
                                                                              checking for device sign of
                                                                              life.
         Corrupted transfer of data       Flowmeter must support the    H     Host system must support        H
         between host system and          use of data checksums               the use of data checksums
         device.

         System fails to detect device    Flowmeter must support FDT    H     Implement        comparison     M
                                          scan function                       between expected system
                                                                              configuration and actual
                                                                              system configuration
         System detects and initialises   Flowmeter must use same       H     Ensure that adequate device     H
         incorrect device                 addressing scheme as host           addressing is implemented
                                          and have a unique address
         System fails to disconnect       NA                            NA    Implement watchdog timer        M
         from device when required                                            on asset management server
                         Table 4-1 Failure Detection Constraints (Communication)

4.7.2 Initialisation Failure Detection Tactics
        From the FMEA (Appendix A) it can be seen that initialisation failures are those that
        occur at the time of plug and play configuration when the flowmeter is inserted into the
        host systems (iPMS in this case). Whilst the flowmeter itself has some inbuilt diagnostic
        functions, the primary tactics for detecting initialisation failures are focused on the use of
        the host system.

        Safety Tactic FD#5 – Validate device configuration through the use of Host system
        configuration check.
        If on initialisation the flowmeter fails to read the initial configuration settings from the host system, due either to an internal failure or a failure of the host system, there may be a means of detecting the failure. Assuming that the copy of the required flowmeter configuration held by the asset management system was correct, it would be reasonable to use the asset management system to validate the flowmeter configuration as a final step in the initialisation process.
        The chosen FDT/DTM mechanism for implementing the plug and play technology
        includes a scan function that is able in the first instance to recognise a device on the
        network and secondly is able to download the device configuration from the device.
        These functions are part of the service tool that would need to be installed as part of the
        overall FDT/DTM asset management package.
        This tactic is dependent upon the validity of the master configuration held by the asset management application, which would need to be maintained on the system. This dependency would require that for each configuration change the master configuration would also need to be updated and verified. For a system with a large number of different devices this may become a source of error that could only be mitigated through maintenance instructions and procedures.
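        The validation step itself reduces to a parameter-by-parameter comparison, as sketched below; the dictionary representation of the master and device configurations is assumed for illustration.

            def validate_device_configuration(master_config, device_config):
                """Compare the configuration read back from the device with the master copy.

                Both arguments are dicts of parameter name -> value.
                Returns a list of (parameter, master_value, device_value) discrepancies;
                an empty list means the initialisation step can be accepted.
                """
                discrepancies = []
                for parameter, master_value in master_config.items():
                    device_value = device_config.get(parameter)
                    if device_value != master_value:
                        discrepancies.append((parameter, master_value, device_value))
                return discrepancies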
        Safety Tactic FD#6 – Verify device is operating in correct operating mode for plant
        state.
        Once the flowmeter has been inserted into the system there is a possibility that either
        the device or the host system remains in initialisation mode and does not make the
        transition to normal operation mode. This type of failure will deny the process control
        application access to the flow measurement data that it needs to successfully keep the
        submarine on position.
        To address this failure, the operational status of the flowmeter should be verified on a periodic basis. Simple verification could be achieved through a status indicator within the flowmeter that is read by the DTM and passed to the hover application. If the flowmeter status indicated any state other than operational, the hover application could flag the flowmeter as out of service.
        An out of service flag could then be used to drive an indication to the operator and
        automatically prevent the system from making the transition into hover operations. More
        complex methods of implementing this type of tactic would be to compare the output
        from the flowmeter with the expected output generated by the host system from a range
        of plant and submarine dynamic data.
        The checking of a status flag is considered to be the better method in this case, as the comparison with generated data is dependent upon the flowmeter being used in hover operations and therefore the failure would be detected late. It is better to be able to
        detect a failure prior to operational use rather than during operations, particularly when a
        single component is critical to the operation.
        For failures of the asset management application that result in initialisation commands being continually transmitted to the flowmeter, there may be an opportunity to build in timeout functionality so that the initialisation attempt is interrupted after a set
        period. Once a timeout has occurred, the system would have to check the status of the
        flowmeter to ensure that the initialisation had successfully completed.
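        The status check and the interlock on the hover transition could take a form similar to the sketch below; the status values are illustrative and do not correspond to the status model of any specific flowmeter or DTM.

            from enum import Enum

            class DeviceStatus(Enum):
                INITIALISING = "initialising"
                OPERATIONAL = "operational"
                DIAGNOSTICS = "diagnostics"
                FAULT = "fault"

            def may_enter_hover(flowmeter_status: DeviceStatus) -> bool:
                """Permit the transition into hover operations only when the flowmeter
                reports that it is operational; any other state marks it out of service."""
                return flowmeter_status is DeviceStatus.OPERATIONAL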




        Safety Tactic FD#7 – Verify flow measurement signal is within set limits.
        Assuming that the initialisation process completes, the most prevalent failure mode from
        a plug and play operation concerns the validity of the initial set up data which includes
        calibration, offsets, parameter and limit settings. An error in the initial setting could
        cause the output from the flowmeter to be either grossly or subtly incorrect. Gross and subtle errors in the data used by the hover application to maintain the submarine position have a range of undesirable effects on the process, from minor inconvenience through to loss of control. It is expected that gross errors in the data could reasonably be detected through the use of boundary checks on the data from the flowmeter as it is used by the hover application. Gross errors typically include those that cause the data to present out of range measurements to the application, which under any operational condition would be considered to be erroneous.
        There are, however, more subtle errors that could present credible but incorrect measurement data to the application, that would not be detectable in all operational conditions but would have a detrimental effect on the overall control of the process. For
        this type of failure, simply checking whether or not the measurement value is outside
        predefined ranges would not necessarily be enough to detect the failure. Therefore, an
        alternative and more powerful technique would be to compare the flowmeter
        measurement value with a calculated result derived from data collected from a range of
        plant and submarine dynamic parameters. This analytical method of determining the correct result and comparing it with the measured value is, however, susceptible to errors, particularly if other sensors within the process are for some reason unable to truly represent the measurement provided by the flowmeter. In such a case, it would be expected that a confidence level would have to be placed on the derived measurement and that error bands would be built into the comparison.
        The safety contract between the hover flowmeter and the iPMS for ensuring that
        initialisation failures are detected is shown in Table 4-2.




         Failure Detection Constraints
         Function                 Flowmeter Plug and Play Initialisation                                    SIL    2
         COTS Device              Hover Flowmeter
         Potential Failure               COTS Device (Supplier)            IMP   Host System (Client)              IMP
         Behavioural Level
         Device fails to read initial    Flowmeter supplied with           H     Compare actual device             H
         settings                        valid DTM Device Driver                 settings with known good
                                                                                 configuration.
         Device continuously             Flowmeter includes device         H     Determine device status and       H
         operates in initialisation      status information                      validate against current
         mode                                                                    system status.
         System continuously sends       NA                                NA    Implement timeout on device       H
         initialisation data                                                     initialisation application
         System transmits grossly        Flowmeter shall reject out        H     Check device status               H
         incorrect initial settings      of range configuration
                                         settings
         System transmits subtly         NA                                NA    Compare flowmeter output          H
         incorrect initial settings                                              with plant status by using
                                                                                 analytical sensing.
         Device transmits grossly        NA                                NA    Asset management application      L
         incorrect configuration data                                            shall detect out of range
         to system                                                               device configuration settings
         Device transmits subtly         NA                                NA    Validate device parameters        L
         incorrect configuration data                                            with FDT Manager.
         to system
                             Table 4-2 Failure Detection Constraints (Initialisation)

4.7.3 Diagnostic Failure Detection Tactics
        Diagnostic failures in the context of this case study are mainly concerned with the failure
        of the in-built diagnostic features of the intelligent flowmeter. It is assumed that once the
        flowmeter has been inserted into the system and initialised through the plug and play
        function it would complete the process by running diagnostics to ensure that the unit was
        functioning correctly. It is also assumed that in-built diagnostics run periodically throughout the operation of the flowmeter.
        Within the proposed hover system solution there is a potential failure of omission
        concerning the use of the FDT/DTM mechanism of implementing device intelligence.
        Diagnostics form part of the FDT/DTM mandatory interface that allows an FDT
        application to compare the actual device settings with those held in the DTM (device
        driver). However, if the FDT application fails to detect a difference between the actual device settings and those held in the DTM then there is a loss of diagnostic function. For this to be hazardous there would need to be not only a failure of this diagnostic function but also an error in either the device settings or the DTM. If such a failure resulted in incorrect flow measurement data being passed to the hover application, then Safety Tactic FD#7 should detect that the data had low confidence and alert the system operator to the failure.
        Safety Tactic FD#8 – Host system should check status of flowmeter following
        reconfiguration and periodically during its operation.
        In the event of a failure of the flowmeter diagnostic function, the only credible way of detecting such a failure is via the diagnostics built into the host iPMS. It would be reasonable to expect the iPMS, through the diagnostic functions of the asset management application, to validate the status of the flowmeter following its plug and play initialisation, the aim of which would be to ensure that the flowmeter was fully operational and fault free prior to the critical submarine hover operation. Once the status had been validated, repeat status checks should be run periodically, perhaps as part of the process that takes the submarine into a hover state. However, failures have been reported where instruments have gone into diagnostics mode. The Rockwell Automation ControlLogix Controller 1756-L55 and FlexLogix Controller 1794-L34 are a case in point: after 3-4 months of continuous operation the controllers went into diagnostics mode and caused a “non-recoverable major fault” [47]. Therefore, some care is needed when deciding whether or not to implement built-in diagnostics. In the hover application the diagnostics should not be executed while the submarine is in the hover state, as this may deny the flowmeter the resources it requires to provide valid flow measurement data to the hover application.
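        The scheduling constraint described above can be expressed very simply, as sketched below; the two callables are assumptions standing in for the iPMS state information and the asset management diagnostic request.

            def run_periodic_status_check(submarine_in_hover, request_device_diagnostics):
                """Run the host-initiated status check only when not in the hover state,
                so that diagnostics cannot starve the flow measurement path.

                submarine_in_hover         -- callable returning True while hovering
                request_device_diagnostics -- callable that triggers the device status check
                Returns True if the check was run, False if it was deferred.
                """
                if submarine_in_hover():
                    return False                # defer: never run diagnostics during hover
                request_device_diagnostics()
                return True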
        The safety contract between the hover flowmeter and the iPMS for ensuring that
        diagnostic failures are detected is shown in Table 4-3.
         Failure Detection Constraints
         Function                Flowmeter Diagnostics                                                SIL   2
         COTS Device             Hover Flowmeter
         Potential Failure                        COTS Device (Supplier)   IMP   Host System (Client)                 IMP
         Behavioural Level
         Inbuilt diagnostics fail to detect       NA                       NA    Operational status of the device     H
         device failure                                                          shall be verified following
                                                                                 initialisation and at a
                                                                                 periodicity of n
         Device remains in diagnostic mode        NA                       NA    As above
         following initialisation
         Device fails to run self diagnostics     NA                       NA    As above                             L
         following initialisation
                             Table 4-3 Failure Detection Constraints (Diagnostics)

4.8   Failure Containment Tactics
        Having determined the tactics that could reasonably be applied for the detection of
        failures in the plug and play implementation of the hover flowmeter, the next stage was
        to develop a range of failure containment tactics. The failure containment tactics
        discussed in this section have been developed from the generic safety tactic principles
        discussed in Chapter 3 (FCST#1 - #3). Having developed the failure containment safety
        tactics they were rewritten as safety constraints and documented as safety contracts.
        As a general principle, in this type of system failure containment can be implemented
        mainly within the host system. In the case of the hover application that is hosted on the
        iPMS most of the failure containment tactics that can be implemented will be shown to
        be within the application layer of the iPMS. The instantiated failure containment tactics are documented as safety tactics FC#1 - #4.

4.8.1 General Failure Containment Tactic
        Safety Tactic FC#1 – On detection of a failure switch to analytic sensing.
        As a general safety tactic, analytic sensing can be used to ensure a continuation of service in the event of a detectable flowmeter failure. As such, this
        specific safety tactic has been classified as a general failure containment tactic as it is
        applicable to the containment of all flowmeter failures, i.e. communication, initialisation
        and diagnostics.
        From the basic hover control diagram given in Figure 4-2 on page 66 there are a number of field instruments, including flowmeters and tank level sensors, that can be used to derive the flow rate of sea water as it is pumped through the system. Other systems, particularly the depth measurement system and the sensors that provide the hydrodynamic position of the submarine, could also be used to provide an alternative range of data required to control the movement of the submarine while in the hover mode of operation. Figure 4-5 shows a small segment of the hover control system that provides a number of different ways in which flow may be derived.




                            Figure 4-5 Segment of Hover Control System
        In this example the rate at which parameters change along with the actual dynamic
        response of the submarine can be compared with a model of the expected submarine
        response to continue the control of the hover process in spite of a hover flowmeter
        failure. However, the accuracy of control will be reduced due to the increase in measurement uncertainties. For example, various uncertainties will affect the accuracy of the control, including issues such as leaking joints, the combined measurement uncertainties of all the instruments involved and measurement latencies. The success of this type of system is also dependent upon the fidelity of the submarine model and the extent to which other parts of the system are failure free. In effect, all these variables and uncertainties result in a system that is less effective than if the flowmeter were functional. The overall confidence in the alternative analytic method of providing flow measurement redundancy would therefore need to be determined, and an algorithm used to determine the point at which the confidence was high enough to continue the hover operation. If implemented with care, this type of tactic can be a very powerful way of containing a sensor failure.
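        The selection logic for the flow value supplied to the hover application might then resemble the sketch below; the confidence figure and its threshold are illustrative placeholders for the algorithm discussed above.

            def select_flow_source(flowmeter_healthy, flowmeter_value,
                                   derived_value, derived_confidence,
                                   confidence_threshold=0.9):
                """Choose the flow value supplied to the hover application.

                On a detected flowmeter failure, fall back to the analytically derived
                flow provided its confidence (0..1, reflecting model fidelity and the
                combined sensor uncertainties) exceeds the threshold; otherwise report
                that no valid source exists so the hover operation can be inhibited.
                """
                if flowmeter_healthy:
                    return ("flowmeter", flowmeter_value)
                if derived_confidence >= confidence_threshold:
                    return ("analytic", derived_value)
                return ("none", None)           # insufficient confidence to continue hover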

4.8.2 Communication Failure Containment Tactics
        Safety Tactic FC#2 – Provide a redundant communication path between the flowmeter
        and the host system.
        At the architectural level the first tactic that could be employed concerns the use of
        highly reliable communications through the network in order to address failures where
        the reliability of the network could be considered as a contributing factor. The aim of the
        tactic would be to ensure that the most suitable network topology was implemented along with the most appropriate protocols across the whole network, from intelligent device through to processing units that run the applications.
        There are several network topologies and protocols that could be considered, including dual redundant LAN architectures using star, mesh or ring topologies. With a system like the hover control running on an iPMS there is a range of commercially available network protocols that are widely used in industry and could therefore be considered.
        The main contenders would be Ethernet, PROFIBUS, PROFINET, PROFIsafe and
        Foundation Fieldbus, all of which have pros and cons and all of which could provide a
        level of robust and reliable communication between the iPMS and the hover flowmeter.
        In implementing reliable communications, a redundant PROFIsafe network would be considered the most robust option. PROFIsafe [35] is a software layer that sits on top of the
        standard network protocol. It provides safety certified (up to IEC 61508 SIL3) services
        that are used to detect a range of failures such as messages being sent to the wrong
        device and bit errors for example. PROFIsafe facilitates the retransmission of messages
        under failure conditions and this combined with redundant network paths should ensure
        that not only are communication failures detected but also contained.
        Other schemes could also be used to provide communication failure containment, using
        standard process control network protocols such as Ethernet, PROFIBUS, PROFINET,
        and Foundation Fieldbus with redundancy and additional services similar to those
        provided by PROFIsafe. Care needs to be taken to ensure that network components
        are selected that work together and are capable of providing the overall redundancy
        required by the system.
        This safety tactic is applicable to all three tactic types: failure avoidance, detection and containment. In terms of avoidance, it is important to recognise that a robust network topology that supports the transfer of data to a defined integrity and reliability is an important means of avoiding system failures caused by failures in the communication channel. As a means of failure detection, the specific communication protocols implemented provide varying levels of network failure detection. Likewise,
        as a containment tactic, redundancy is an important means of ensuring that
        communication between the flowmeter and host system is maintained, even in the
        presence of single or multiple failures.
        The safety contract between the hover flowmeter and the iPMS for ensuring that
        communication failures are contained is shown in Table 4-4.


         Failure Containment Constraints
         Function              Data Transfer/Communication                                       SIL     2
         COTS Device           Hover Flowmeter
         Potential Failure               COTS Device (Supplier)    IMP   Host System (Client)            IMP
         Architecture Level
         Corrupted transfer of data      Provide connectivity to   H     Provide redundant means of      H
         between host system and         the host system network         transferring data
         device.                                                         Provide redundant means of      M
                                                                         measuring      flow through
                                                                         analytic sensing
         Behavioural Level
         Device fails to transfer data   NA                        NA    Switch off flowmeter signal
         to host system                                                  and use analytical sensing to
                                                                         determine flow
                      Table 4-4 Failure Containment Constraints (Communication)

4.8.3 Initialisation Failure Containment Tactics
        Safety Tactic FC#3 – On detected failure of device initialisation routine, allow system to
        command device to reinitialise a set number of times.
        On detection of an initialisation failure the system should be able to determine if the
        flowmeter was really defective or if the initialisation had failed due to a transient problem.
        In terms of containment, a transient problem should not be allowed to deny the system
        access to the resource offered by the flowmeter. However, the system needs a high
        level of confidence that the status of the flowmeter is valid before operating in hover
        mode. In the proposed hover system it is therefore assumed that the command to
        operate in hover mode is dependent upon the state of the system. If the flowmeter had
        failed to be successfully initialised as part of the plug and play function it would not have
        done so at a critical point in the process and would therefore not immediately contribute
        to a hazard. However, the flowmeter cannot be allowed to continually consume system
        resources by endlessly attempting to initialise and contribute to a system wide hazard.
        In this case, it would be reasonable to allow the system to reset and reinitialise the
        flowmeter a set number of times, probably three attempts would be reasonable with a
        set period between attempts. By limiting the number of attempts to initialise the
        flowmeter the system will prevent the flowmeter from babbling whilst ensuring that there
        is a high probability that the flowmeter is correctly identified as being defective before
        flagging it as such.
        In this case it should also be recognised that the replacement and initialisation of the
        flowmeter is a maintenance task that would be undertaken while the submarine was in a
        safe state and not at times of high stress. This safety tactic therefore assists the maintainer in the replacement of a defective flowmeter by ensuring that the initialisation process is effective.
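        A sketch of the bounded re-initialisation logic is given below; the attempt limit of three reflects the suggestion above, while the delay between attempts and the initialise_device callable are assumptions for illustration.

            import time

            MAX_INIT_ATTEMPTS = 3        # limit suggested above
            RETRY_DELAY_S = 10.0         # assumed settling period between attempts

            def reinitialise_with_limit(initialise_device):
                """Reset and re-initialise the flowmeter a bounded number of times.

                initialise_device -- callable returning True on successful initialisation
                Returns True if the device initialised, or False if it should be
                flagged as defective.
                """
                for _attempt in range(MAX_INIT_ATTEMPTS):
                    if initialise_device():
                        return True             # transient problem cleared
                    time.sleep(RETRY_DELAY_S)   # wait before the next attempt
                return False                    # persistent failure: flag flowmeter defective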
        Safety Tactic FC#4 – On detected failure of system to configure device then allow
        system to reset and retry configuration set up.
        This safety tactic is very similar to FC#3 in that it is focused on the maintenance of the
        flow measurement function in the most effective way. The safety tactic is designed to
        address failures of the asset management application that could cause the flowmeter to
        be incorrectly configured in the iPMS.
        The issue for the case study is how to implement this type of function. The failure is in
        the asset management application and within the FDT/DTM system there is a function
        that can be used to check the system configuration with the DTM device driver for
        consistency. The question is how this functionality can be used to not only detect a
        failure but also to drive a reconfiguration of the system. It is not clear from the FDT/DTM documentation [46] whether this is possible; however, there may be ways in which such functionality could be implemented as a bespoke application. Although it is not clear
        how this tactic might be implemented, it is nonetheless still valid to consider it and to
        write a safety contract for its implementation. The issue of implementation should then
        be a consideration when selecting a COTS flowmeter and when finalising the
        requirements for the iPMS.
        If implementation of this safety tactic proves impossible, then the main consequence is that, on detection of a failure, the flowmeter configuration within the asset management application would need to be manually updated and validated.
        The safety contract between the hover flowmeter and the iPMS for ensuring that
        initialisation failures are contained is shown in Table 4-5.



         Failure Containment Constraints
         Function                 Flowmeter Plug and Play Initialisation                                      SIL   2
         COTS Device              Hover Flowmeter
         Potential Failure              COTS Device (Supplier)      IMP      Host System (Client)                   IMP
         Architectural Level
         Device initialisation fails    NA                          NA     Switch off flowmeter signal and use      M
         either as a result of                                             analytical sensing to determine flow
         device failure or host
         system failure
         Behavioural Level
         Device fails to read initial   Allow multiple              H      Limit re-initialisation to three         H
         settings                       initialisation attempts            attempts before setting the
                                                                           flowmeter to defective.
         Device continuously            Implement timeout           H      Reset and re-initialise flowmeter up     H
         operates in initialisation     period for initialisation          to a maximum of three times before
         mode                           task                               setting the flowmeter to defective.
         System continuously            NA                          NA     Reset and re-initialise asset            H
         sends initialisation data                                         management application
         System transmits grossly       NA                          NA     System comparison with defined           M
         incorrect initial settings                                        parameter boundaries.
         System transmits subtly        NA                          NA     System comparison using analytical       H
         incorrect initial settings                                        sensing
                          Table 4-5 Failure Containment Constraints (Initialisation)

4.8.4 Diagnostics Failure Containment Tactics
        There is only one possible tactic that could be used to contain a failure of the in-built flowmeter diagnostic function that results in an undetected measurement failure, and this is the use of analytic sensing so that the flow measurement function is maintained.
        The principles of analytical sensing are discussed in section 4.8.1 and safety tactic
        FC#1. These can be used to ensure that the flow measurement function is maintained
        through the calculation of a derived flow measure from a diverse range of field
        instrumentation and real time submarine dynamic response.
        The safety contract between the hover flowmeter and the iPMS for ensuring that
        diagnostic failures are contained is shown in Table 4-6.


         Failure Containment Constraints
         Function                 Flowmeter Diagnostics                                               SIL   2
         COTS Device              Hover Flowmeter
         Potential Failure                              COTS Device (Supplier)   IMP   Host System (Client)           IMP
         Architectural Level
         Inbuilt diagnostics fail to detect device      NA                       NA    Switch off flowmeter signal    H
         failure                                                                       and use analytical sensing to
         Device remains in diagnostic mode                                             determine flow (applies to
         following initialisation                                                      all of the failures listed)
         Device fails to run self diagnostics
         following initialisation
         Validity check of device parameters
         fails to detect error in device description
                         Table 4-6 Failure Containment Constraints (Diagnostics)

4.9   Summary of Case Study Safety Tactics
        Table 4-7 presents an overview of the link between the generic safety tactics that were
        introduced in Chapter 3 and the case study specific safety tactics.


         Case Study Safety Tactic                                         Generic Safety Tactic
         FA#1 Acquire a flowmeter that has been independently                    FAST#1
              certified to IEC 61508 SIL 2.
         FA#2 Ensure that failures are avoided by choosing the most              FAST#1
              suitable COTS flowmeter for the role and that                      FAST#2
              appropriate constraints on the flowmeter and iPMS
              have been defined and applied using a range of
              appropriate safety contracts.
         FD#1 Implement appropriate device polling techniques                    FDST#1
         FD#2 Implement appropriate data integrity checks                        FDST#3
         FD#3 Implement appropriate timeout techniques                           FDST#2
         FD#4 Implement device identification through asset                      FDST#1
              management and system configuration techniques
         FD#5 Validate device configuration through the use of Host              FDST#1
              system configuration check.
         FD#6 Verify device is operating in correct operating mode               FDST#1
              for plant state.
         FD#7 Verify flow measurement signal is within set limits.               FDST#3
         FD#8 Host system should check status of flowmeter                       FDST#1
              following reconfiguration and periodically during its
              operation
         FC#1 On detection of a failure switch to analytic sensing               FCST#1
                                                                                 FCST#3
         FC#2 Provide a redundant communication path between                     FCST#1
              the flowmeter and the host system.
         FC#3 On detected failure of device initialisation routine,              FCST#2
              allow system to command device to reinitialise a set
              number of times.
         FC#4 On detected failure of system to configure device then             FCST#2
              allow system to reset and retry configuration set up.
                         Table 4-7 List of Safety Tactics used in Case Study


        The instantiation of the generic safety tactics was undertaken as a partial implementation of the lifecycle presented in Chapter 3. The three main tactics of failure avoidance,
        failure detection and failure containment have been decomposed into lower level safety
        tactics and translated into a number of safety contracts. The development of the safety
        tactics demonstrated that reasonable coverage of the issues associated with
        reconfigurable plug and play functionality can be achieved although it is difficult to
        establish an appropriate measure for determining whether or not the list of safety tactics
        in Table 4-7 is complete. It is expected that a full implementation of the lifecycle would
        validate the completeness of the safety tactics and associated safety contracts through formal peer review processes.


4.10 Safety Justification

4.10.1 Overview of Safety Justification
        Having completed the safety analysis of the plug and play function of the COTS flowmeter in
        the context of a submarine hover control application and iPMS, a number of safety
        tactics were developed in an attempt to derive a set of safety requirements for the
        flowmeter. The safety requirements were then documented as safety contracts between
        the COTS flowmeter and the host system (iPMS). Taking the safety tactics and
        associated safety contracts a high level safety argument was developed.




                                 Figure 4-6 Overall Safety Argument
        Figure 4-6 presents the top layer of the safety argument that intelligent plug and play technology can be safely used in the context of the case study. Goal Structuring Notation (GSN) has been used as a means of diagrammatically representing
        the argument structure. More information about the development and use of GSN may
        be found in Kelly, Arguing Safety - A Systematic Approach to Managing Safety Cases
        [48]. Appendix B provides an overview of the GSN as well as a more detailed safety
        argument.
        The top level goal recognises that the work presented in this case study relates to the
        plug and play aspects of the flowmeter within the context of the submarine iPMS.
        Although not discussed here, it is recognised that other aspects of the flowmeter’s use, including the safety requirements relating to its measurement performance, must also be successfully argued in a complete safety case. The strategy of arguing that the plug
        and play implementation does not present a hazard is supported by further arguments
        that are based on the three main safety tactics of failure avoidance, detection and
        containment.
        The argument strategy for failure avoidance, shown in Figure 4-7, is based on the COTS
        acquisition process as a means of ensuring that the best fit device is purchased for the
        given requirements. The first of the two supporting goals concerns the integrity of the flowmeter itself and the kind of arguments that might be made concerning
        safety certification or compliance to safety standards for the development of complex
        electronics devices. The second goal concerns the process for deriving the safety requirements from the safety tactics and how those requirements are satisfied. It is
        recognised that the argument strategy for failure avoidance is strongly biased toward the
        COTS flowmeter. This is because the plug and play functionality is an integral part of the flowmeter acquisition, with the iPMS host system playing a supporting role, and therefore a greater emphasis is placed on ensuring a good requirements match for the flowmeter.




                                                 Figure 4-7 Failure Avoidance Argument
        The argument strategy for failure detection and containment is shown in Figure 4-8. The
        strategy for both these areas of the safety justification is based on the failure detection
        and containment safety tactics and associated safety contracts. The focus at the
        supporting level goals is on the attributes of communication, plug and play initialisation
        and diagnostics.
        [GSN diagram: the goal FailureDetection, “Dangerous failures are detectable to an acceptable level of probability”, and the goal FailureContainment, “Dangerous failures have been contained to an acceptable level of probability”, are each supported by a strategy arguing that communication, initialisation and diagnostic failures are detectable and containable by the iPMS and COTS Flowmeter; supporting goals cover communication, initialisation, diagnostic and analytical-sensing failures, with the assumption that failure detection is capable of detecting a failure of the flowmeter and iPMS diagnostics.]
                            Figure 4-8 Failure Detection and Containment Argument
        Unlike failure avoidance, which is predominantly focused on the COTS flowmeter, the safety arguments around failure detection and containment have a much wider scope that includes the extent to which the design of the iPMS may also be influenced by the safety tactics that need to be employed in order to mitigate the risks arising from the use of the plug and play function.

4.10.2 Safety Justification Issues
        As well as presenting a justification for the choice of flowmeter, the design of the host
        system, in this case iPMS, also needs to be considered where there are safety contracts
        that make a demand upon it. This is not explicitly brought out in the safety justification
        due to the way in which the focus on the COTS acquisition process forces the thinking
        toward the COTS flowmeter. However, it is recognised that, as far as the iPMS is concerned, the specific impact of each safety tactic needs to be assessed and appropriate safety arguments developed to support the use of the plug and play technology. In most cases where the safety contracts drive the iPMS design, the bespoke nature of the process control applications enables the contracts to be satisfied in support of the overall safety justification.
        More problematic is the impact upon the COTS software components that are used
        within the infrastructure layer of the overall architecture. In order to be able to
        successfully argue that the hazards associated with the plug and play function, including
        device initialisation and reconfiguration, have been sufficiently reduced there needs to
        be a robust justification for the actual solution that is implemented. Whilst it is apparent
        that a large part of that justification needs to be around the selection of the COTS
        flowmeter there also needs to be a significant amount of work undertaken around the
        justification of the COTS elements of the host system. In a process automation system
        with asset management this inevitably means the network infrastructure and asset
        management applications.

4.11 Conclusion to Case Study
        The proposal for using a tactical approach to system safety when implementing
        intelligent plug and play devices has been applied to a submarine hover application.
        The hover application has been shown to be a reasonable basis for the case study due
        to its importance in maintaining the safety critical function of the submarine’s position
        while at slow speeds or stopped. The hover application is hosted on an iPMS distributed
        control system and therefore the choice of an intelligent flowmeter in the context of a
        process automation system was reasonable.
        With regard to the development of specific safety tactics in the areas of failure
        avoidance, detection and containment for the hover system, the case study has taken
        the generic solution defined in Chapter 3 and applied it in the context of this specific
        application. Because of the specific nature of the system and field device used for this
        case study, a set of lower level safety tactics was developed using the principles proposed in Chapter 3. This additional work has demonstrated that the process for developing safety tactics for a reconfigurable plug and play device can be tailored for specific applications.
        Safety contracts have been developed and a high level safety justification has been
        presented for the plug and play scope of the hover application. It is recognised that a
        more detailed safety argument is required and that further work is required in this area.




5 Project Conclusion
       This chapter discusses the conclusions that can be drawn from the work undertaken in this project
       by providing,
          • A Project Summary providing an overall summary of what has been achieved while
              undertaking this project.
         •      An Overall Conclusion providing a number of concluding remarks about the project in terms
                of its conduct, results and the benefits of the work in the area of safety critical engineering.

5.1    Project Summary
         This project has sought to address the question that was posed in the introduction to this
         report, i.e. “how can the benefits of systems that use intelligent reconfigurable hardware
         be realised without compromising safety”? In seeking to address this question the
         following activities have been undertaken,
         •      A survey of the most relevant research and documented practices relating to key
                industries was undertaken in order to understand what provision was already in
                place with regard to reconfigurable intelligent devices and to identify areas that
                could be taken further. The literature survey provided an opportunity to take a broad view of the use of intelligent hardware across industry and to investigate a range
                of technologies from the aviation and process control industries. It was from this
                survey that the project scope was finally defined and focused on the plug and play
                aspect of reconfiguration that was then taken forward into the proposal and
                evaluation phases of the project.
         •      A proposal based on a tactical approach to addressing failures that are related to
                plug and play technologies was developed from previously established work in the
                areas of safety assurance contracts undertaken by Conmy et al [38], acquisition of
                COTS components by Ye [33] and safety tactics by Wu and Kelly [41]. As part of
                this proposal a generic set of safety tactics were discussed with regard to the safety
                justification of plug and play intelligent hardware. A process to use safety tactics
                was also proposed with the aim of showing how safety tactics may be instantiated.
         •      A case study based on an intelligent plug and play flowmeter was used in the
                context of a nuclear powered submarine application to evaluate the use of safety
                tactics for reconfigurable process control devices. The case study provided the
                opportunity to instantiate the generic safety tactics by decomposing them to a lower
                level of detail. Although the full lifecycle detailed in section 3.6 was not completed,
                the scope of the case study was broad enough to enable an evaluation of the main
                topics, i.e. instantiation of generic safety tactics and development of safety
                contracts. Selection of a COTS flowmeter was impaired by limited opportunities to
                search the market place for the most appropriate device. The final safety
                justification was limited to a high level argument structure that was not decomposed
                down to the solution level that would be needed to support a safety case (see
                section 4.10 and Appendix B).

5.2    Overall Conclusion
         The main conclusions that can be drawn from the application of safety tactics to
         intelligent plug and play technology in the context of process automation are that,
        1. Safety tactics can be used to reduce the risks associated with intelligent hardware, primarily by reducing the contribution that the intelligent functions within the field device make to the safety critical functions of the system. This
            reduction in contribution to a safety critical function could be considered as a high
            level safety tactic where a SIL target has been determined for a specific function and
            intelligent diagnostics and asset management, including plug and play configuration, contribute to that function. Typically, it can be very difficult to justify the safety of a COTS intelligent device based on the traditional process argument for mitigating systematic failure of the embedded software. Therefore, the overall tactic to reduce the
            contribution of the software based intelligence through system level safety tactics
            can be very powerful in providing a compelling argument that the risks have been
            reduced to a tolerable level.
        2. By implementing system level safety tactics that seek to ensure that failures are
           avoided, detectable and containable, it can be shown that the system can be fault
           tolerant to at least a single failure of the intelligent functions and that a failure does
           not increase the risk that the system would fail in an unsafe state. This is based on
           the premise that the safety tactics can be successfully implemented. In reality, the
           levels of risk reduction may not be fully realised because the lack of availability of
            suitable COTS field devices may impact on the implementation of the safety tactics and/or the host system may not be flexible enough to support the demands placed
           upon it by the most appropriate safety tactics. In such circumstances there may
           need to be a trade off between the desired functionality and the extent to which the
           system can be justified from a safety perspective. When using safety tactics with
           COTS components it is clear that an optimum solution may not be achievable and
           that some compromise may be necessary.
3. The development of safety tactics and the translation of those tactics into safety
   contracts were found to be heavily weighted towards the host system. This stems
   from the inherent difficulty of using COTS components in safety critical systems,
   where very little, if any, influence can be exercised over the functions and behaviour
   provided by the component. Typically, COTS components are bought to a fixed
   specification, or at best with a number of options chosen from a limited list.
   Consequently, even when a component is the best possible fit to the requirements,
   there is a high reliance on the host system, over whose development there is typically
   greater influence, to implement risk reduction measures. It follows that the
   development lifecycle of a process automation system has to be heavily weighted
   towards architectural analysis to ensure that shortcomings in the COTS component
   can be successfully mitigated through the implementation of system level safety
   tactics.
4. The practicalities of bringing together COTS acquisition and safety contracts under
   the umbrella of safety tactics for reconfigurable intelligent hardware worked to the
   extent that a systematic approach to developing appropriate safety tactics was
   achievable. However, the evaluation of the project proposal revealed a fundamental
   issue concerning traceability between the safety tactics, the safety contracts and the
   final safety justification. The semantics of writing safety tactics, safety contracts and
   safety justifications are very different, and it became quite difficult to map the safety
   contracts back to the safety tactics in a way that showed the tactics were fully
   supported. Similarly, it was difficult to express the safety justification in a format that
   allowed it to be easily checked against the safety contracts and safety tactics to
   confirm that the justification was complete and supportable. Ye [33] developed safety
   contracts using goal centred language from which safety goals could easily be derived
   as part of a GSN argument, and Wu et al [41] documented safety tactics in GSN using
   contractual language. Combining these two approaches could yield a better method
   of tracing from safety tactics through to the safety justification; an illustrative sketch of
   such a traceability structure is given below.
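
The traceability difficulty described in conclusion 4 could be eased by recording tactics, contracts and argument goals in a single machine-checkable structure, so that tactics without supporting contracts, or goals citing non-existent contracts, can be flagged automatically. The following is a minimal illustrative sketch only; the class names (SafetyTactic, SafetyContract, GSNGoal), field names and example identifiers are hypothetical and are not drawn from the case study or from any GSN tool.

```python
# Minimal, illustrative traceability model linking safety tactics to safety
# contracts and on to GSN goals. All names are hypothetical assumptions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class SafetyContract:
    contract_id: str
    guarantee: str                      # what the component or host system guarantees
    supports_tactics: List[str] = field(default_factory=list)


@dataclass
class GSNGoal:
    goal_id: str
    statement: str
    satisfied_by_contracts: List[str] = field(default_factory=list)


@dataclass
class SafetyTactic:
    tactic_id: str
    intent: str                         # e.g. "failure detection", "failure containment"


def unsupported_tactics(tactics, contracts):
    """Return tactics that no safety contract claims to support."""
    supported = {tid for c in contracts for tid in c.supports_tactics}
    return [t for t in tactics if t.tactic_id not in supported]


def goals_citing_missing_contracts(goals, contracts):
    """Return GSN goals that cite a contract which does not exist."""
    known = {c.contract_id for c in contracts}
    return [g for g in goals
            if any(cid not in known for cid in g.satisfied_by_contracts)]


if __name__ == "__main__":
    tactics = [SafetyTactic("T1", "failure detection"),
               SafetyTactic("T2", "failure containment")]
    contracts = [SafetyContract("C1",
                                "Host system shall validate flowmeter configuration "
                                "on initialisation",
                                supports_tactics=["T1"])]
    goals = [GSNGoal("G1", "COTS flowmeter failures are detectable",
                     satisfied_by_contracts=["C1"])]

    print("Tactics without supporting contracts:",
          [t.tactic_id for t in unsupported_tactics(tactics, contracts)])
    print("Goals citing missing contracts:",
          [g.goal_id for g in goals_citing_missing_contracts(goals, contracts)])
```

A real implementation would sit alongside the safety case tooling so that these checks could be re-run whenever the tactics, contracts or argument structure change.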




6 Future Work
The scope for future work in the area of intelligent hardware and reconfigurable systems falls
into the following areas:
   •    Further development of topics around the tactical approach to safety and dynamic
        reconfiguration that were part of the original project scope but were not addressed
        because the final scope had to be reduced.
   •    Emergent work related to intelligent hardware and dynamic reconfiguration that warrants
        further research, including autonomous devices and wireless technologies.

6.1    Further Development

6.1.1 Development of Patterns
The overall safety justification for the case study was presented at a fairly high level and
did not describe in detail how a safety case might be constructed. One of the
techniques that can be used with GSN is the development of safety case patterns. Each
pattern presents a generic structured safety argument with detailed notes about how the
pattern could be implemented. It would be helpful if safety argument patterns could be
developed for safety cases whose scope includes intelligent reconfigurable hardware.

6.1.2 Safety Tactics for Increasing Complexity
The project began with a broader scope, looking at the use of intelligent hardware and
reconfigurable systems. The final project proposal and evaluation focused on plug and
play configuration, which is just one aspect of this area. More work could therefore be
undertaken on reconfigurable systems as complexity is increased in the following areas:

•   Increased complexity in plug and play applications that use intelligent devices
    sourced from different manufacturers, or with updated or different functions to those
    originally considered during system design. In these cases the work undertaken in
    the IMA world may be transferable, specifically with regard to the use of real-time
    blueprints. In plug and play systems it is envisaged that blueprints could initially be
    used to hold multiple configurations of a device type, capturing the specific device
    drivers and parameter settings of a number of manufacturers. This would enable a
    defined set of device variants to be used and so increase flexibility in the choice of
    device through the life of the system (a minimal illustrative sketch of such a blueprint
    store is given after this list).

•   Dynamic system reconfiguration, in which systems self-heal following a failure, would
    dramatically improve system availability and would also benefit safety where safe
    operation depends on the resilience of the system to single or multiple failures. It is
    envisaged that, on the failure of a field device, the host system would be capable of
    dynamically reconfiguring itself so that control could be maintained with a high level
    of confidence and the failure tolerated. In the presence of multiple failures such a
    system could contribute to graceful degradation by ensuring that critical processes
    were maintained through reassigning or sacrificing less critical intelligent devices.
    The principles of using safety tactics could be applied to ensure that the requirement
    for such functionality was captured and implemented.
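
As noted in the first bullet above, real-time blueprints could hold the approved variants of a device type. The sketch below illustrates, under stated assumptions, how such a blueprint store and a plug and play acceptance check might look; all identifiers, the fields (for example approved_for_sil) and the selection logic are hypothetical and are not taken from any IMA blueprint standard or existing product.

```python
# Illustrative sketch of a "blueprint" store holding approved configurations
# for a device type, and a selection routine the host system might use when a
# flowmeter is replaced. All identifiers are hypothetical.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DeviceBlueprint:
    manufacturer: str
    model: str
    driver: str                    # device driver / DTM reference
    parameters: Dict[str, float]
    approved_for_sil: int          # highest SIL application approved for


class BlueprintStore:
    """Holds the defined set of device variants approved at design time."""

    def __init__(self) -> None:
        self._blueprints: Dict[str, List[DeviceBlueprint]] = {}

    def register(self, device_type: str, blueprint: DeviceBlueprint) -> None:
        self._blueprints.setdefault(device_type, []).append(blueprint)

    def approved_variants(self, device_type: str,
                          required_sil: int) -> List[DeviceBlueprint]:
        """Only variants approved at or above the required SIL may be used."""
        return [b for b in self._blueprints.get(device_type, [])
                if b.approved_for_sil >= required_sil]


def select_replacement(store: BlueprintStore, device_type: str,
                       required_sil: int,
                       detected_model: str) -> Optional[DeviceBlueprint]:
    """On plug and play detection, accept the device only if it matches an
    approved blueprint; otherwise return None and leave the channel disabled."""
    for candidate in store.approved_variants(device_type, required_sil):
        if candidate.model == detected_model:
            return candidate
    return None


if __name__ == "__main__":
    store = BlueprintStore()
    store.register("flowmeter", DeviceBlueprint(
        manufacturer="Vendor A", model="FM-100", driver="fm100_dtm",
        parameters={"range_m3_per_h": 50.0}, approved_for_sil=2))

    blueprint = select_replacement(store, "flowmeter",
                                   required_sil=2, detected_model="FM-100")
    print("Accepted" if blueprint else "Rejected - device left out of service")
```

The design choice sketched here is that an unmatched device is simply left out of service, so that the plug and play function does not contribute to the safety critical function, in line with the tactics discussed earlier in this report.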




6.2   Emergent work

6.2.1 Dynamic Reconfiguration at the Field Device Layer
The development of safety tactics for the reconfiguration of process control devices was
limited in this project to the system level. Further work therefore needs to be undertaken
on the means by which the reconfiguration function can be safely implemented within the
individual device firmware. One way of achieving safe implementation of intelligent
reconfigurable hardware might be to apply, within the device itself, the principles of the
safety tactics that this report has developed at the macro level. Consideration of software
architectural analysis and the robust implementation of self diagnostics and health
management would form part of this analysis.

6.2.2 Safety Analysis of Device Description Languages
While reviewing the various methods of implementing intelligent field devices, an issue
came to light concerning the device description languages that are used in conjunction
with the most widely used asset management software applications. Currently there are
three main contenders for device descriptions: DDL, EDDL and FDT/DTM. In each case,
the device description language is used to implement specific intelligence within a field
device and enables that intelligence to be used by the asset management software.
However, how safe are such languages? Are there specific unsafe features that need to
be addressed? Are some languages better than others, and can a safe subset be
defined, as is the case with more traditional programming languages such as C? As the
device description language and asset management applications play a significant role
in the safety tactics discussed in this project, it would be useful if further research were
undertaken to address these questions.

6.2.3 Autonomous Devices
The safety tactics developed during this project rely heavily on the infrastructure layer of
the host system, i.e. the network management and asset management applications, with
some additional reliance on the application layer to provide operator warnings. However,
the advent of new autonomous devices, which when networked together are known as
distributed intelligent agents (DIA), moves the decision making about reconfiguration
from the host system down to the field device layer. Current research in this area
(references [27] and [28]) appears to be focused on developing the technical aspects of
DIA; further research into the way such systems might be implemented in safety critical
systems is therefore required. It would also be useful to determine how safety
arguments for such systems might be developed in support of system safety cases.

6.2.4 Wireless Devices
Wireless technology is seen by the major instrument manufacturers as a means by
which intelligence can be easily implemented as part of process control solutions.
Although manufacturers are claiming safety certification to IEC 61508 up to SIL 3 for
wireless-enabled devices [49], what does this really mean in terms of the use of wireless
field devices in safety critical applications? Is there an opportunity to take the concept of
wireless further and implement devices that scavenge power from the environment
rather than rely on battery technology in safety critical applications?
As the UK nuclear industry, in conjunction with the HSE, struggles to justify the use of
intelligent hardware in safety related systems, what further work needs to be undertaken
to satisfy the regulatory authorities regarding the use of wireless technologies?

Can the safety tactics discussed in this project report go some way towards the safety
justification of wireless devices, or are additional safety tactics required and, if so, what
are they?
Finally, with the benefit of reduced cable weight driving the use of wireless technologies,
can such technologies be safely implemented on platforms such as aircraft, ships and
submarines?

6.2.5 Safety Certification
It is now common practice for instrument manufacturers to offer third party independent
safety certification as an option when selling their products. Typically, products are
certified to the IEC 61508 international safety standard [5], although other sector-specific
standards may also be offered. It has been assumed in this project that third party safety
certification can be used as part of a COTS acquisition safety tactic, but how useful are
such certificates in reality? Are the certificates trustworthy, and are some more
trustworthy than others? This is a difficult subject that may be impossible to resolve
because of the inevitable commercial issues that would be encountered. However, if
such certification were determined to be acceptable from a regulatory perspective, a
significant amount of work could be saved and the resulting safety cases strengthened.




7 References

[1]   Slabodkin G, “Software glitches leave Navy Smart Ship dead in the water”,
      http://www.gcn.com/print/17_17/33727-1.html, 1998.
[2]   Ministry of Defence, Defence Standard 00-56, “Safety Management Requirements for
      Defence Systems Part 1: Requirements”, Issue 4, 01/06/2007.
[3]   Ministry of Defence, Defence Standard 00-54, “Requirements For Safety Related
      Electronic Hardware In Defence Equipment Part 1: Requirements”, Issue 1, 19/03/1999.
[4]   Ministry of Defence, Defence Standard 00-55, “Requirements For Safety Related
      Software in Defence Equipment Part 1: Requirements”, Issue 2, 01/08/1997.
[5]   IEC 61508 Parts 1-7, “Functional safety of electrical/electronic/programmable electronic
      safety-related systems”, 2002
[6]   Ministry of Defence, JSP 430, “Ship Safety Management Part1: Policy”, Issue 3,
      September 2006.
[7]   Ministry of Defence, Interim Defence Standard 00-74, “ASAAC Standards, Part 1:
      Proposed Standards for Software”, Issue 1, 14/01/2005
[8]   Ministry of Defence, Interim Defence Standard 00-74, “ASAAC Standards, Part 2:
      Rationale Report for Software Standards”, Issue 1, 14/01/2005
[9]   DO-178B “Software Considerations in Airborne Systems and Equipment Certification”,
      RTCA, Issued 1st December 1992
[10] DO-254 “Design Assurance Guidance for Airborne Electronic Hardware”, RTCA, Issued
     19th April 2000
[11] Shuttleworth J, “Integrating the ALARP Principle with the use of Safety Integrity Levels”,
     MSc Thesis, Department of Computer Science, University of York, 2007
[12] Nicholson M, “Health Monitoring for Reconfigurable Integrated Control Systems”, System
     Safety Symposium, Southampton, February 2005
[13] Jolliffe G, “Exploring the Possibilities Towards a Preliminary Safety Case for IMA
     Blueprints”, MSc Thesis, Department of Computer Science, University of York, 2004
[14] Hollow P, McDermid P, Nicholson M, “Approaches to Certification of Reconfigurable IMA
     Systems”. INCOSE 2000, Minneapolis, USA, July 2000.
[15] Leveson Nancy G, “A Systems-Theoretic Approach to Safety in Software-Intensive
     Systems”, Figure 3, IEEE Trans. on Dependable and Secure Computing, January 2005
[16] K Najafi, “Smart Sensors”, Journal of Micromechanics and Microengineering, 1991, 86-
     102.
[17] Dayashankar Dubey, “Smart Sensors”, M.Tech. Credit seminar report, Electronic Systems
     Group, EE Dept, IIT Bombay, submitted November 2002.
[18] Brignell J E, “Sensors, A Comprehensive Survey”, 1989, ed W Gopel, J Hesse and J N
     Zemel vol 1 (New York VCH)
[19] Giachino J M, “Sensor and Actuators”, 1986, 10 239-48
[20] British Nuclear Group, Standard BNF.EG.0036_1, “Design Guide for the use of Smart
     instruments or electrical equipment containing microprocessors & firmware”, 2003, British
     Nuclear Group.
[21] Bailey Krisi, “Beyond Basics - Asset Management Software pushes advanced
     diagnostics”, InTech October 2004


[22] Aaseng, G B, “Blueprint for an integrated vehicle health management system”, Honeywell
     Inc., 2001
[23] The Hart Communications Foundation at www.hartcomm2.org/index.html accessed
     November 2007.
[24] The Hart Communications Foundation, “Hart Application Guide HCF LIT 34”, 1999
[25] Adler Bud, “Using Hart To Increase Field Device Reliability”, Moore Industries-
     International Inc, 2001
[26] Lee Kang, “IEEE 1451: A Standard in Support of Smart Transducer Networking”, National
     Institute of Standards and Technology, 2000
[27] Discenzo F M, Maturana F P, Staron R J, "Distributed Diagnostics and Dynamic
     Reconfiguration Using Autonomous Agents", ICCS2006, 2006
[28] Pechoucek M and Marik V “Industrial Deployment of Multi-Agent Technologies: Review
     and Selected Case Studies”, to appear in: International Journal on Autonomous Agents
     and Multi-Agent Systems. 2008.
[29] HSE, “Control & Instrumentation Reactor Plant Strategy Statement and contents”,
     Reactor Nuclear Research Index 2006/07.
[30] CNNRP. “The Regulatory Expectations for Computer Based Safety Systems”,
     SA/RSD/N/028 Issue 1, August 2005.
[31] HSE, “Control Systems Technical Measures Document”, Web Site Page
     http://www.hse.gov.uk/comah/sragtech/techmeascontsyst.htm, Accessed July 2007.
[32] Hill, A, “Meeting Notes from the Nuclear Smart Instruments Working Group”, Internal BAE
     Systems Document eDMS 8233190 issue 1, November 2006.
[33] Ye F, “Justifying the use of COTS components within Safety Critical Applications”, PhD
     Thesis, Department of Computer Science, University of York, 2005.
[34] Foundation Fieldbus Website, www.fieldbus.org, accessed November 2007
[35] Profibus and Profinet International, www.profibus.com, accessed December 2007
[36] Profibus and Profinet International, “ProfiSafe System Description”, July 2007
[37] Yalcinkaya F, Atherton D.P, Calis H, Powner E.T, “Intelligent Sensors and decentralised
     control”, UKACC International Conference on CONTROL ‘98, September 1998,
     Conference Publication No. 455
[38] Conmy P, Nicholson M, McDermid J, “Safety Assurance Contracts for Integrated Modular
     Avionics”, 8th Australian Workshop on Safety Critical Systems and Software, 2003
[39] Edwards R A, “Integrated modular systems - Architecture Concept Summary”, BAe-WSE-
     RP-RDA-AAA-535, Issue 1, BAE Systems, 26 April 2001.
[40] Potter D, “Smart Plug and Play Sensors”, IEEE Instrumentation and Measurement
     Magazine, March 2002.
[41] Wu W and Kelly T. P, "Safety Tactics for Software Architecture Design," in Proceedings of
     the 28th Annual International Computer Software and Applications Conference
     (COMPSAC'04), Hong Kong, PR China, pp. 368-375, IEEE Computer Society, 2004.
[42] Weaver R, Fenn J and Kelly T, “A Pragmatic Approach to Reasoning about the
     Assurance of Safety Arguments”, Department of Computer Science, University of York.
[43] Amkreutz R, Beurden I, “What does proven in use imply”, exida, as published in
     Hydrocarbon Processing, 2004



[44] Ferrell, T.K., Ferrell, U.D., "Use of service history for certification credit for COTS [avionic
     software]," Aerospace and Electronic Systems Magazine, IEEE , vol.18, no.1, pp. 24-28,
     Jan 2003
[45] Lynch, J, “Hover System Flowmeter SIL Analysis”, BAE Systems eDMS8215702, 2006
[46] FDT Group, “FDT Technical Description”, www.fdtgroup.org, 2007
[47] Rockwell, “Incident Report Ref. ACIG 2006-11-001”, Rockwell Automation, November
     2006
[48] Kelly T P, “Arguing Safety - A Systematic Approach to Managing Safety Cases”, DPhil
     Thesis, Department of Computer Science, University of York, 1998.
[49] Rosemount 3051S Product Data Sheet, 00813-0100-4801, Rev KA, Emerson Process
     Management, March 2008.
[50] WIB Test Report T 2768 X 07 “FDT/DTM or EDDL For Asset Management Using FF
     Technology”, International Instrument Users Association, November 2007
[51] Siemens, “SIMATIC PCS 7 The process control system for all sectors”, Product brochure,
     2005




Appendix A – Plug and Play Failure Modes

Keywords
        Commission; Omission; Grossly Incorrect; Subtly Incorrect
Each entry below records: failure mode; immediate effect; system effect; hazard; environmental/operational contributory factors; possible safety tactic.

Function: Communication channel between the host system (system) and the flowmeter (device). Transmission of asset management and configuration data between the flowmeter and the host system.

Failure mode: Omission - Corrupted transfer of data between host system and device.
   Immediate effect: Data transfer fails and device remains out of service.
   System effect: System is not updated.
   Hazard: No immediate hazard.
   Contributory factors: Network failure; protocol errors; timing errors on network.
   Possible safety tactic: Detection by CRC checks. Containment by redundant ProfiSafe network.

Failure mode: Omission - System fails to detect device.
   Immediate effect: Device is not integrated into the overall system.
   System effect: Device does not appear in the system topology and therefore its output is unavailable to the application layer.
   Hazard: No immediate hazard.
   Contributory factors: Network failure; protocol errors; timing errors on network.
   Possible safety tactic: Detection by warning to operator that device is unavailable. Containment by disabling function.

Failure mode: Commission - System detects and initialises incorrect device.
   Immediate effect: Incorrect device is integrated into the system.
   System effect: Device appears in the system topology as the correct device but does not provide the required output; device ceases to perform its intended function.
   Hazard: Incorrect process data is passed to the application, leading to loss of process control.
   Contributory factors: Incorrect redundancy scheme may cause two devices to appear as the same device.
   Possible safety tactic: Detection by validation checks based on plant behaviour. Containment by analytic redundancy.

Failure mode: Commission - System fails to disconnect from device when required.
   Immediate effect: Device forced into a receive-data mode and cannot continue with normal operation.
   System effect: System cannot update the application with flow data from the device.
   Hazard: Loss of process control due to lack of data.
   Contributory factors: None.
   Possible safety tactic: Detection by use of timeouts.

Failure mode: Commission - Device fails to disconnect from system when required.
   Immediate effect: Device unable to process flow measurement data.
   System effect: System cannot update the application with flow data from the device.
   Hazard: Loss of process control due to lack of data.
   Contributory factors: None.
   Possible safety tactic: Detection by use of timeouts and watchdog.

Function: Initialisation of flowmeter on power up following insertion into the host system.

Failure mode: Omission - Device fails to read initial settings.
   Immediate effect: Device is not updated with correct configuration.
   System effect: Device continues to operate with factory default settings.
   Hazard: Loss of process control due to incorrect data.
   Contributory factors: Network failures.
   Possible safety tactic: Detection by read-back of device configuration to the host system along with a configuration check.

Failure mode: Omission - System fails to update system configuration with device data.
   Immediate effect: Application makes incorrect decisions based on old data.
   System effect: System may fail to operate depending on the extent of the failure.
   Hazard: Loss of process control due to incorrect configuration.
   Contributory factors: Network failures; maintainer failures.
   Possible safety tactic: Detection by validation checks based on plant behaviour. Containment by analytic redundancy.

Failure mode: Commission - Device continuously operates in initialisation mode.
   Immediate effect: Flowmeter fails to provide normal operation due to being stuck in an initialisation loop.
   System effect: Configuration and measurement data are unavailable to the system infrastructure and applications.
   Hazard: Loss of process control due to lack of data.
   Contributory factors: None.
   Possible safety tactic: Detection by validity checks by host system. Containment by analytic redundancy.

Failure mode: Commission - System continuously sends initialisation data.
   Immediate effect: Flowmeter fails to provide normal operation due to continually servicing demands from the system.
   System effect: Flowmeter will appear to be out of service and additional system resources will be utilised.
   Hazard: Loss of process control due to lack of data.
   Contributory factors: None.
   Possible safety tactic: Detection by validity checks by host system. Containment by analytic redundancy.

Failure mode: Grossly Incorrect - System transmits grossly incorrect initial settings.
   Immediate effect: Flowmeter operates with incorrect settings.
   System effect: Application will process incorrect data from the flowmeter.
   Hazard: Loss of process control due to incorrect data.
   Contributory factors: Data corruption by the network or system.
   Possible safety tactic: Detection by validity checks by host system. Containment by analytic redundancy.

Failure mode: Subtly Incorrect - System transmits subtly incorrect initial settings.
   Immediate effect: Flowmeter operates with incorrect settings.
   System effect: Application will process incorrect data from the flowmeter.
   Hazard: Loss of process control due to incorrect data.
   Contributory factors: Data corruption by the network or system.
   Possible safety tactic: Detection by validity checks by host system. Containment by analytic redundancy.

Failure mode: Grossly Incorrect - Device transmits grossly incorrect configuration data to system.
   Immediate effect: System infrastructure application is updated with incorrect flowmeter configuration.
   System effect: Asset management will be affected.
   Hazard: No immediate hazard.
   Contributory factors: Data corruption by the network or system.
   Possible safety tactic: Detection by validity checks by host system.

Failure mode: Subtly Incorrect - Device transmits subtly incorrect configuration data to system.
   Immediate effect: System infrastructure application is updated with incorrect flowmeter configuration.
   System effect: Asset management will be affected.
   Hazard: No immediate hazard.
   Contributory factors: Data corruption by the network or system.
   Possible safety tactic: Detection by validity checks by host system.

Function: Diagnostics within the flowmeter that check for failures and the general health of the device, and system diagnostics that check the validity of the flowmeter configuration.

Failure mode: Omission - Inbuilt diagnostics fail to detect device failure.
   Immediate effect: Flowmeter may transmit incorrect data to the system.
   System effect: Application will process incorrect data from the flowmeter and a device warning may not be presented to the operator.
   Hazard: Loss of process control due to defective flowmeter.
   Contributory factors: None.
   Possible safety tactic: Detection by validity checks by host system.

Failure mode: Commission - Device remains in diagnostic mode following initialisation.
   Immediate effect: Flowmeter fails to provide normal operation due to being stuck in diagnostic mode.
   System effect: Flow rate data not available to the hover application.
   Hazard: Loss of process control due to loss of flow measurement.
   Contributory factors: None.
   Possible safety tactic: Detection by validity checks by host system.

Failure mode: Omission - Device fails to run self diagnostics following initialisation.
   Immediate effect: Failure of the flowmeter initialisation on power up is not detected.
   System effect: Flow rate data not available to the hover application.
   Hazard: Loss of process control due to loss of flow measurement or incorrect data.
   Contributory factors: None.
   Possible safety tactic: Detection by validity checks by host system.

Failure mode: Omission - Validity check of device parameters fails to detect error in device description.
   Immediate effect: Application may process incorrect data or fail to access correct data.
   System effect: Hover system fails to control the process.
   Hazard: Loss of process control due to incorrect data.
   Contributory factors: Asset management software.
   Possible safety tactic: Detection by validity checks by host system.
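
Several entries above cite "detection by validity checks based on plant behaviour" and "containment by analytic redundancy" as possible safety tactics. The sketch below is a purely illustrative outline of such a check, comparing the flowmeter reading with an estimate derived from other plant measurements; the plant relationship, constants and tolerances are assumptions made for this example and do not come from the case study flowmeter.

```python
# Outline of an analytic-redundancy style validity check: the host system
# estimates the expected flow from other plant measurements and flags the
# flowmeter reading if the two disagree by more than a tolerance.
# The plant relationship and constants below are purely illustrative.

def estimate_flow(pump_speed_rpm: float, valve_opening: float,
                  k_flow: float = 0.02) -> float:
    """Hypothetical plant model: flow roughly proportional to pump speed,
    scaled by valve opening (0.0 - 1.0)."""
    return k_flow * pump_speed_rpm * valve_opening


def flow_reading_valid(measured_flow: float, pump_speed_rpm: float,
                       valve_opening: float, tolerance: float = 0.15) -> bool:
    """Return True if the flowmeter reading agrees with the analytic estimate
    to within the given fractional tolerance."""
    expected = estimate_flow(pump_speed_rpm, valve_opening)
    if expected == 0.0:
        return abs(measured_flow) < 1e-6
    return abs(measured_flow - expected) / expected <= tolerance


if __name__ == "__main__":
    # Reading agrees with the estimate, so it is accepted.
    print(flow_reading_valid(measured_flow=29.0,
                             pump_speed_rpm=1450.0, valve_opening=1.0))
    # Reading far from the estimate: the host system would treat the
    # flowmeter as suspect and contain the failure, e.g. by falling back
    # on the redundant estimate or disabling the channel.
    print(flow_reading_valid(measured_flow=5.0,
                             pump_speed_rpm=1450.0, valve_opening=1.0))
```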
Appendix B – Case Study Safety Argument Structure

Top Level Argument




Failure Avoidance




Failure Avoidance - Safety Requirements
[GSN goal structure diagram; recovered node text: FASafetyReqts: COTS Flowmeter satisfies derived safety requirements; FAReqts: High level requirements determined; FAReqtsSatisfaction: COTS Flowmeter satisfies high level requirements; FAFailureModes: Potential failure modes of COTS Flowmeter determined; FASafetyContracts: Safety contracts for implementation of COTS Flowmeter derived; FAContractsContext: Safety contracts derived from high level requirements and safety tactics; FAMismatch: Mismatched requirements do not pose a hazard; FAMismatchContext: Best product match from market survey does not provide a perfect match with requirements; FAReqtsContext: Requirements for COTS Flowmeter derived from failure modes; FAHighLevelReqts: Attributes of COTS Flowmeter match high level requirements; FAAcquisition: Flowmeter shall be acquired from manufacturer with proven track record; FAComms: Flowmeter shall support chosen data communication protocols; FAPerformance: Flowmeter shall meet performance requirements; FAAssetManagement: Flowmeter shall support the chosen asset management application including health and diagnostic management.]
Failure Detection - Top




Failure Detection - Communication




Failure Detection - Initialisation




Failure Containment




                      -END-





				