					    The X-38 Spacecraft Fault-Tolerant Avionics System
              Coy Kouba1, Deborah Buscher1, Joseph Busa2, Samuel Beilin2

                     1. NASA-Johnson Space Center, Houston, TX
              2. The Charles Stark Draper Laboratory, Inc., Cambridge, MA

                                        September, 2003

ABSTRACT

In 1995 NASA began an experimental program to develop a reusable crew return vehicle (CRV) for the International Space Station. The purpose of the CRV was threefold: (i) to bring home an injured or ill crewmember; (ii) to bring home the entire crew if the Shuttle fleet was grounded; and (iii) to evacuate the crew in the case of an imminent Station threat (e.g., fire or decompression). Two approach and landing prototypes and one spacecraft demonstrator (called V201) were built at the Johnson Space Center. A series of increasingly complex ground subsystem tests was completed, and eight successful high-altitude drop tests were achieved to prove the design concept. The program used an unprecedented amount of commercial off-the-shelf (COTS) technology in this, the first crewed spacecraft NASA has built since the Shuttle program. Unfortunately, in 2002 the program was canceled due to changing Agency priorities. The vehicle was 80% complete, and the program was shut down in such a manner as to preserve design, development, test, and engineering data.

This paper describes the X-38 V201 fault-tolerant avionics system. The system is based on Draper Laboratory’s Byzantine-resilient Fault-Tolerant Parallel Processing (FTPP) system and its Network Element (NE) hardware. Each flight computer exchanges information on a strict timescale to process input data, compare results, and issue voted vehicle output commands. Major accomplishments achieved in this development include: (i) a space-qualified, two-fault tolerant design using mostly COTS hardware and operating system; (ii) a single event upset tolerant network element board; (iii) on-the-fly recovery of a failed processor; (iv) use of synched cache; (v) realignment of shared memory to bring back a failed channel; (vi) flight code automatically generated from a master measurement list; and (vii) a system built in-house by a team of civil servants and support contractors.

This paper presents an overview of the avionics system and the hardware implementation, as well as the system software and vehicle command & telemetry functions. Potential improvements and lessons learned on this program are also discussed.

        P81     2003 MAPLD International Conference, September 2003, Washington D.C.   Kouba

I. AVIONICS ARCHITECTURE OVERVIEW

The X-38 V201 avionics architecture is a four-string, two-fault tolerant system. Its central elements are the four Flight Critical Computers (FCCs) and the Network Element Fifth Unit (NEFU). Each FCC consists of a Flight Critical Processor (FCP), an Instrumentation Control Processor (ICP), a Network Element (NE) card, one multiprotocol/RS-422 card, four digital output (DO) cards, an analog output (AO) card, and an IRIG-B/Decom card. A simplified view of the X-38 architecture is pictured in Figure 1.

The FCP is the main application processor, which handles the guidance, navigation, control, and mission sequence instructions. This board is a Radstone Power PC604R single-board computer that runs the VxWorks operating system. This board contains Draper’s Fault Tolerant System Services (FTSS) software, which provides scheduling services, communication services, time services, fault detection and isolation (FDI), redundancy management (RM), and system support services. The FTSS software, in combination with the JSC-provided vehicle, mission, and power management software, provides a basic environment in which applications, such as flight control, can execute and meet all necessary timing and mission requirements.

Figure 1: X-38 / V201 Avionics Architecture
(Block diagram. Labels recoverable from the diagram: NEFU, FCC1, FCC2, FCC3, FCC4, FTPP Interconnect; Pyro, Actuators, GN&C Sensors, and Comm Equipment, each repeated four times; FCC’s command outputs, Discrete outputs, Analog outputs; High Power Relays, Low Power Relays, Power Relay Boxes, Aerosurfaces, Vehicle Equipment, Power Source; CTC #1, CTC #2, Flt Data Recorder, Comm Equipment, Sensors; RS-422, Ethernet, RS422 Data & Clk / IRIG 106; GSE Vehicle Panel, Ethernet/test signals; PAD T-0, AFT FLT DECK INTERFACE.)
Legend:
CTC – Command & Telemetry Computer
FCC – Flight Critical Computer
FTPP – Fault Tolerant Parallel Processor
GSE – Ground Support Equipment
NEFU – Network Element Fifth Unit

The ICP is the I/O processor and thus controls all the slave I/O boards. The ICP is also a Radstone Power PC604R single-board computer that runs the VxWorks operating system. This board obtains the majority of its sensor information from the vehicle’s instrumentation boxes via an IRIG-B/Decom card and from the electromechanical actuator (EMA) system via a Mil-Std-1553 card. The
remainder of the sensor information is obtained via 1553 data buses from the Space Integrated GPS/INS (SIGI) units, the Flush Air Data System (FADS), and the S-band Transponder, and via RS-422 from the altimeters. The ICP also outputs commands to the subsystems. Commands to a few analog devices, such as the cabin fans, are issued via the AO interface. Commands to many discrete devices, such as the power switches, are issued via the DO interfaces. EMA position commands are issued via the 1553 interface.
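The I/O assignments described above can be summarized as a small lookup table. This is an illustrative sketch only: the Python names (`Interface`, `SENSOR_INPUTS`, `COMMAND_OUTPUTS`, `route_command`) are hypothetical and do not come from the X-38 flight software; only the device-to-interface pairings are taken from the text.

```python
from enum import Enum


class Interface(Enum):
    """I/O interfaces named in the text (the enum itself is hypothetical)."""
    IRIG_B_DECOM = "IRIG-B/Decom card"
    MIL_STD_1553 = "Mil-Std-1553 bus"
    RS_422 = "RS-422"
    AO = "analog output card"
    DO = "digital output card"


# Sensor source -> interface through which the ICP reads it.
SENSOR_INPUTS = {
    "instrumentation boxes": Interface.IRIG_B_DECOM,
    "EMA system": Interface.MIL_STD_1553,
    "SIGI": Interface.MIL_STD_1553,
    "FADS": Interface.MIL_STD_1553,
    "S-band transponder": Interface.MIL_STD_1553,
    "altimeters": Interface.RS_422,
}

# Command class -> interface through which the ICP issues it.
COMMAND_OUTPUTS = {
    "analog": Interface.AO,                   # e.g. cabin fans
    "discrete": Interface.DO,                 # e.g. power switches
    "ema_position": Interface.MIL_STD_1553,   # EMA position commands
}


def route_command(kind: str) -> Interface:
    """Return the output interface for a command class (KeyError if unknown)."""
    return COMMAND_OUTPUTS[kind]
```

For example, `route_command("discrete")` returns `Interface.DO`, matching the statement that power switches are commanded through the DO interfaces.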

Communication between the FCP and ICP occurs via the NE. The NE, in combination with the FTSS software, provides the exchange mechanism for sharing input data and voting outputs. The NEs act as arbiters that connect a redundant set of FCC computers, each considered a fault containment region, to each other as well as to external systems in a manner that implements Byzantine-resilient parallel processing.

The NEFU is a fifth flight computer that contains an ICP and an NE. The NEFU was added to make the architecture two-fault tolerant. The NEFU participates in the voting exchange process, but does not provide any source data or command any vehicle functions.

A fiber optic (FO) data bus network connects all four FCCs and the NEFU. This ensures that each fault-containment region is electrically isolated from the others, a Byzantine requirement. Each NE board in these computers has four FO transmitters and four FO receivers, so each computer can send data to and receive data from every other computer. Figure 2 shows this configuration as a “star” topology.

Figure 2: Network Element Connectivity Diagram – Each channel represents a physical fault-containment region (i.e., Channel A = FCC 1). Each link in the diagram represents a fiber optic cable connecting a transmit-receiver pair.

By building on the FTPP and the FTSS, the vehicle can run flight-critical application code on a redundant set of computers in lock-step, vote inputs and outputs at the bit level, isolate a suspect FCC by masking it out of the group, and recover a once-faulty FCC at desired flight phases.

Two other computers are critical to this architecture: the Command and Telemetry Computers (CTCs). They serve as the vehicle’s primary interface to machines and people outside of the vehicle. Each CTC consists of three boards: a Power PC604R processor board, a multiprotocol/RS-422 board, and a Reed-Solomon convolutional encoding board. The two CTCs interface with the four FCPs via the multiprotocol/RS-422 interface. The CTCs receive remote commands from several sources and send telemetry data to several destinations, including both the ground control center and the Shuttle’s aft flight deck. All telemetry data is also stored onboard two flight data recorders for retrieval after the mission. Each flight data recorder is a 5.1 GB SCSI solid-state disk drive attached to one of the CTCs.

Figure 3 shows a schematic block diagram of the flight computer interconnectivity and the major I/O interfaces. The four FCCs are built exactly alike, as are the two CTCs, but each will have an address connector plug or a firmware setting to give it a unique identity.
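Bit-level voting and masking, as described above, can be illustrated with a short sketch. On the vehicle the vote is provided by the NE hardware in combination with FTSS; the Python below only mirrors the idea. The names `bitwise_majority` and `vote_outputs`, the 32-bit word width, and the rule that a tied bit resolves to 0 are assumptions made for illustration, not details from the paper.

```python
from typing import Dict, List, Set, Tuple


def bitwise_majority(words: List[int], width: int = 32) -> int:
    """Vote each bit position independently.

    A result bit is 1 only if a strict majority of the words have it set;
    a tie (possible with an even number of words) resolves to 0 here.
    """
    voted = 0
    for bit in range(width):
        ones = sum((word >> bit) & 1 for word in words)
        if 2 * ones > len(words):
            voted |= 1 << bit
    return voted


def vote_outputs(
    outputs: Dict[str, int], masked: Set[str], width: int = 32
) -> Tuple[int, Set[str]]:
    """Vote the outputs of all unmasked channels.

    Returns the voted word and the set of active channels whose own
    output disagrees with it (candidates for masking).
    """
    active = {ch: word for ch, word in outputs.items() if ch not in masked}
    voted = bitwise_majority(list(active.values()), width)
    suspects = {ch for ch, word in active.items() if word != voted}
    return voted, suspects


# One channel (hypothetically FCC4) has a single flipped bit.
outputs = {"FCC1": 0b1010, "FCC2": 0b1010, "FCC3": 0b1010, "FCC4": 0b1110}
voted, suspects = vote_outputs(outputs, masked=set())
# After the suspect is masked out, the remaining three channels agree.
voted_after, suspects_after = vote_outputs(outputs, masked=suspects)
```

In this toy run the flipped bit is outvoted, the disagreeing channel is reported as a suspect, and masking it leaves a group with no disagreement. Recovery of a masked channel is not modeled.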

Figure 3: Flight Computer avionics block diagram showing major interconnectivity and I/O interfaces.
(Labels recoverable from the diagram: FCC 1, FCC 2, FCC 3, and FCC 4, each with Discrete outputs, Analog outputs, 1553 bus, IRIG-B PCM data, Ethernet, RS-232, RS-422, health status, and 28V pwr; NEFU with Ethernet, RS-232, health status, and 28V pwr; CTC 1 and CTC 2 with Ethernet, RS-232, RS-422, UHF, S-band, PLD umbilical, health status, and 28V pwr; Flight Data Recorder #1 and Flight Data Recorder #2 with SCSI and 28V pwr; Fiber Optic Bus.)
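The five fault-containment regions (four FCCs plus the NEFU), each with four fiber optic transmitters, can be checked against the textbook bounds for Byzantine agreement without message authentication: tolerating f simultaneous faults takes at least 3f + 1 fault-containment regions, 2f + 1 disjoint paths between them, and f + 1 exchange rounds. These bounds come from the general literature rather than from this paper, and the function names below are hypothetical. Reading "two-fault tolerant" as two successive faults, with the first faulty channel masked out before the second occurs, is an assumption made for this sketch.

```python
def byzantine_requirements(f: int) -> dict:
    """Textbook minimums for tolerating f simultaneous Byzantine faults."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return {
        "fault_containment_regions": 3 * f + 1,
        "disjoint_paths_between_regions": 2 * f + 1,
        "exchange_rounds": f + 1,
    }


def directed_links(regions: int) -> int:
    """Point-to-point links when every region transmits to every other one."""
    return regions * (regions - 1)


def tolerable_simultaneous_faults(regions: int) -> int:
    """Largest f satisfying 3f + 1 <= regions."""
    return (regions - 1) // 3


# Five regions, each with four transmitters: 5 * 4 directed fiber links.
links = directed_links(5)

# Assumed sequential-fault reading: mask one region out after each fault.
first_fault_ok = tolerable_simultaneous_faults(5) >= 1   # 5 regions in the group
second_fault_ok = tolerable_simultaneous_faults(4) >= 1  # 4 regions remain
third_fault_ok = tolerable_simultaneous_faults(3) >= 1   # 3 regions remain
```

Under these assumptions, a five-region group meets the single-fault bound with one region to spare, and a four-region group left after masking still meets it, which is consistent with the NEFU having been added to reach two-fault tolerance. The count of two exchange rounds for f = 1 also matches the two-round exchange used for simplex input data.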

II. SOFTWARE IMPLEMENTATION

SOFTWARE ARCHITECTURE
The X-38 vehicle software is divided into four categories: (i) the COTS operating system (OS), VxWorks; (ii) FTSS, a layer of system services that supplements the OS; (iii) the Data Management Software, which is made up of the mission/vehicle management software and the ICP software; and (iv) the flight application software. The software architecture for the X-38 vehicle is shown in Figure 4. Each software category except the OS is described below.

Figure 4: X-38 V-201 Software Architecture
(Layered diagram with FCP and ICP columns. Labels recoverable from the diagram, by layer. Applications: Sensor, Appl, Effector, Vehicle FDI & RM; User Applications Tasks; Mission / Vehicle Management; User I/O Tasks. FTSS: Scheduling Services, Shared Memory Objects, Communications Services, Support Services, Time Services, FDI (FCP, NE), RM; on the ICP side, FTSS SW (partial), MPCC, Custom System Software. OS: VxWorks on both the FCP and the ICP. Hardware: FCP CPU and Resources, FCP I/O Hardware, BRVC, BRVC, ICP CPU & Resources, ICP I/O Hardware. Legend: User Software, Fault-Tolerant System Services, COTS Software, FCP Hardware, ICP Hardware.)

FTSS AND FTPP INTERPLAY
The FTSS software is a software layer woven around the VxWorks OS. FTSS works intimately with the FTPP hardware, which allows the developer of the flight application to program as if everything were running on only one computer. The FTSS handles the low-level ability to run this application in parallel across separate processors, in lockstep, leaving the developers no concern over the parallel nature or redundancy of the system. The flight application simply reads from inputs (which the FTSS delivers to the application and the FTPP ensures are congruent across all computers) and then writes the associated outputs. The FTSS then passes these outputs to the FTPP hardware, where output voting and final message delivery take place. The developer is further relieved from performing health monitoring of these systems, as FTSS performs intensive fault detection, isolation, and recovery functions. By working in unison, the FTPP and FTSS create a scalable architecture that allows growth both for state-of-the-art advancement (by providing distinct modular separation) and for further redundant parallel expansion (by allowing the creation of new virtual groupings).

The NE interface between the ICP and FCP serves several functions, including the exchange of sensor data from the ICP to the FCP. All ICP data is treated as simplex data
(i.e., single source) that is being passed to a quadruplex group. This is because the sensors and effectors are not redundant across the four ICPs. Instead, the I/O profile of the X-38 V201 vehicle is redundant and/or cross-strapped only in the key areas necessary for vehicle flight, life support, and environmental control.

Figure 5 below shows a single input value being read into the ICP and exchanged via a two-round exchange over the NEs to all four FCPs. If that single input value is “bad” (i.e., the sensor has produced erroneous data), that “bad” value is exchanged via the NEs just like any other value. It is up to the application software to determine whether the value is “bad.”

During the first round of the exchange, the data is sent from one NE to all of the NEs. During the second round of the exchange, the data is sent from all five NEs to all five NEs again. This two-round exchange is necessary because: (i) the ICPs are not synced during the first exchange (i.e., all four ICPs are running independently and in simplex mode); and (ii) the second exchange verifies that the data exchanged in the first round was received properly by all NEs.

DATA MANAGEMENT AND APPLICATION SOFTWARE
The X-38 data management and application software consists of five main functions: (i) data acquisition of sensor information; (ii) computer processing of sensor information; (iii) subsystem commanding & effector control; (iv) telemetry frame construction; and (v) remote command reception and execution. The FTSS software, in combination with the JSC vehicle and mission management software, provides a basic environment in which applications, such as flight control, can execute and meet all necessary timing requirements.
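The sensor data that these functions acquire reaches the FCPs through the two-round exchange described under FTSS AND FTPP INTERPLAY. A minimal sketch of that exchange follows, assuming integer values, a strict-majority vote at each receiving NE, and faults injected through override tables. The function names and the override parameters are hypothetical, and the real exchange is carried out by the NE hardware rather than by application code.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def vote(values: List[int]) -> Optional[int]:
    """Strict-majority vote; None means no value won a majority."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else None


def two_round_exchange(
    source_value: int,
    n_ne: int = 5,
    round1_overrides: Optional[Dict[int, int]] = None,
    round2_overrides: Optional[Dict[Tuple[int, int], int]] = None,
) -> List[Optional[int]]:
    """Return the value each NE settles on after both rounds.

    round1_overrides[ne] models NE `ne` receiving a corrupted value in round 1.
    round2_overrides[(sender, receiver)] models a corrupted round-2 message.
    """
    round1_overrides = round1_overrides or {}
    round2_overrides = round2_overrides or {}

    # Round 1: the source NE sends the simplex value to every NE.
    held = [round1_overrides.get(ne, source_value) for ne in range(n_ne)]

    # Round 2: every NE re-sends the value it holds to every NE,
    # and each receiver votes on what it heard.
    results = []
    for receiver in range(n_ne):
        heard = [
            round2_overrides.get((sender, receiver), held[sender])
            for sender in range(n_ne)
        ]
        results.append(vote(heard))
    return results


clean = two_round_exchange(42)
bad_round1 = two_round_exchange(42, round1_overrides={0: 1, 1: 2})
bad_relay = two_round_exchange(42, round2_overrides={(4, 0): 99, (4, 1): 7})
bad_sensor = two_round_exchange(-1)
```

In each of these cases all five NEs end up with the same value. The last case illustrates the point made in the text: an erroneous sensor reading is delivered congruently like any other value, so judging whether it is “bad” is left to the application software.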
(Diagram: Two Round Exchange Example - On Input Data, showing Round 1 and Round 2 of the exchange between the NEs and the FCPs.)

One of the primary jobs of the FCP is to run the “flight critical” and “non-flight critical” applications. These applications each consist of several parts: (i) sensor Subsystem Operating Procedure (SOP) code, which contains sensor data conversion routines, sensor redundancy management routines, and sensor fault detection, isolation, and recovery routines; (ii) application code, which takes these sensor inputs and uses them in equations to produce effector commands; (iii) effector reverse SOPs, which

                                               4                                    convert the commands from engineering
                                                 4                     3
                                    NE                         NE          FCP #4   units to raw effector units; and (iv) code for
                                                                                    processing remote commands coming from
                                                 3 3
                                                                                    the ground flight controllers or Shuttle
                                                               NE                   astronaut crew.

                                                                                    Each application program is divided up into
         Figure 5: A two-round exchange example on input                            an initialization procedure, an application
         data. Note the erroneous “4” input on FCC 4 is voted                       code procedure, a sensor SOP procedure, and
                                                                                                                                                              Page 7 of 18

an effector SOP procedure. Sensor redundancy management and FDIR are included in the sensor SOP task. Each task will operate using a global memory block, which is broken up into 50 Hz, 10 Hz, and 1 Hz data for each subsystem. Only tasks within the same rate group can communicate directly and share data with other tasks in that rate group. Data transfer between tasks in a different rate group is performed via FTSS communication services sockets. Since FCP applications do not have access to non-congruent data, FTSS communication services will bypass the use of the NEs.

Figures 6 and 7 below show end-to-end how the ICP brings in sensor data, how the application operates on that data and produces an effector command, and how the effector command is output to the ICP.

Once the application has completed processing of the sensor data, the application produces an effector command response. Figure 7 below shows how all four FCPs produce the EMA position command at the same time and a single round exchange occurs via the NE. In this case, three of the four FCPs have produced a solution of 5.0. A fourth FCP has produced a solution of 5.1. No FCPs have timed out, so all processors are in sync. Upon the completion of the single round exchange, a voted output (i.e., 5.0) is sent to all four ICPs. The 5.1 position that FCP #4 produced is masked out. The voted output is broadcast to all FCPs and to all four ICPs. This voted broadcast allows both the ICPs to receive the output command and the FCPs to: (i) receive the output command, which can then be placed in the telemetry stream; and (ii) receive any syndrome data on the output vote, which will in turn be used in FTSS FDI to determine whether or not a processor or NE board has a problem and needs to be voted out, reset, or powered off. All of the ICPs receive all commands. This allows the FCP, for the most part, to be independent from the effector configuration. It is each ICP’s responsibility to know its own identity and what I/O devices are attached to it.

[Figure 6 diagram: “Three Sensor Example.” ICP 1 reads SIGI 1 (0x355), ICP 2 reads SIGI 2 (0x356), and ICP 3 reads SIGI 3 (0x355); FCP 1 through FCP 4 each end up with 0x355, 0x356, and 0x355. Annotations:]
  • ICP1, ICP2, and ICP3 each obtain data via 1553 and perform a Send(SIGI1), Send(SIGI2), and Send(SIGI3), respectively.
  • A two round exchange occurs over the NE for each value, as in the previous example. Each exchange is independent of the other two SIGI exchanges.
  • FCP4 participates in the two round exchange, which occurs over the NE as in the previous example.
  • Each FCP does three reads - Read(SIGI3), Read(SIGI2), and Read(SIGI1).
  • Each FCP now has 0x355, 0x356, and 0x355 as values for the three SIGI reads.

Figure 6: End-to-End Sensor Input to Effector Output Data Exchange Example, Part 1

COMMANDING & TELEMETRY
Commanding and telemetry are dealt with in a different fashion than the rest of the system. In a perfect world, the telemetry and commanding would have been transmitted through the NEs via the ICPs in the traditional fashion. However, a high data volume of telemetry and commanding at a 10 Hz rate would bog down the system and could preempt high priority flight critical data. Instead, both the telemetry and commanding were taken off the NE path, yet would still adhere to all rules governing the NE. The solution was to receive the commands and send telemetry to/from a separate I/O board called the multiprotocol/RS-422 (MPCC) board, making it part of the fault containment region. Bringing commands into the FCCs is accomplished by reading the commands from the MPCC at 10 Hz from the redundant set, voting on the health of each MPCC, then selecting the two healthiest ones to single-source exchange their commands to the NE. None of the telemetry is voted, as the only difference in the data across the redundant computers is the timestamp value. The telemetry is simply transmitted out of the MPCC board at 10 Hz to the flight data recorders and eventually transmitted to the ground by a separate system.
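Sensor inputs (Figure 5) and MPCC commands both enter the redundant set as single-source data. The Python sketch below is a rough model of such an exchange, not the FTPP implementation. It assumes five NEs (four FCC NEs plus the NEFU), and it assumes that in the second round every NE rebroadcasts what it received and each NE then majority-votes the copies, following the classic interactive-consistency pattern. The names two_round_exchange and majority are hypothetical, and a faulty NE is simplified to one that sends the same wrong value to everyone.

```python
from collections import Counter


def majority(values):
    """Return the most common value, or None if there is no strict majority."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else None


def two_round_exchange(source_value, n_nes=5, faulty_ne=None, bad_value=None):
    """Model a single-source exchange among n_nes network elements.

    Round 1: the source NE sends its value to every NE.
    Round 2 (assumed): every NE rebroadcasts the copy it received; a faulty
    NE (simplified) rebroadcasts bad_value instead. Each NE then votes.
    Returns the value each NE delivers to its processor.
    """
    # Round 1: every NE now holds a copy of the source value.
    received = [source_value] * n_nes
    # Round 2: NE i rebroadcasts received[i]; the faulty NE corrupts its copy.
    rebroadcast = [
        bad_value if i == faulty_ne else received[i] for i in range(n_nes)
    ]
    # Every NE sees the same set of rebroadcast copies and votes on them.
    return [majority(rebroadcast) for _ in range(n_nes)]


# A "bad" sensor value is exchanged like any other value: everyone agrees on it.
print(two_round_exchange(4))                            # [4, 4, 4, 4, 4]
# One faulty NE in round 2 is outvoted by the others.
print(two_round_exchange(3, faulty_ne=3, bad_value=4))  # [3, 3, 3, 3, 3]
```

The first call illustrates the point made for Figure 5: the exchange guarantees that all processors see the same input, not that the input is correct, so detecting a bad sensor value remains the application's job.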

[Figure 7 diagram: “Three Sensor Example, cntd.” FCP 1, FCP 2, and FCP 3 each perform Write(EMA1) with a value of 5.0; FCP 4 performs Write(EMA1) with a value of 5.1; each ICP performs Read(EMA1) and receives 5.0. Annotations, repeated for each channel:]
  • GN&C is notified that the SIGI data is ready.
  • GN&C performs SOP function, FDIR, and RM. Decides solution is really 0x355.
  • This value is used in a GN&C equation, which produces an EMA position value of 5.0 (5.1 on FCP 4).
  • The reverse SOP is called for output.
  • The application then does a Write(EMA1) to the NE.
  • A single round voted exchange occurs, because all four FCPs are synced. The output is 5.0. The error in FCP4 is masked by the voters in the NE.
  • FTSS FDIR will determine how to treat FCP4 (i.e., determine if this is a transient error or permanent; RM will be dependent on the Flt. Mgr defined RM policy in force for that particular mission phase).
  • All four ICPs and each FCP receive the broadcast output value, and it is the ICPs’ responsibility to determine which ICP is channelized to which EMA controller.

Figure 7: End-to-End Sensor Input to Effector Output Data Exchange Example, Part 2
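The output vote in the Figure 7 example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name vote_outputs and the dictionary layout are hypothetical, the vote is modeled as an exact-match majority over whole values, and the “syndrome” is reduced to a list of the FCPs that disagreed with the voted result.

```python
from collections import Counter


def vote_outputs(commands):
    """Vote on one command per FCP and return (voted_value, syndrome).

    commands maps an FCP name to the value it wrote, e.g. via Write(EMA1).
    The syndrome lists the FCPs whose value differed from the voted result.
    """
    tally = Counter(commands.values())
    voted, count = tally.most_common(1)[0]
    if count <= len(commands) / 2:
        raise ValueError("no majority among FCP outputs")
    syndrome = sorted(fcp for fcp, value in commands.items() if value != voted)
    return voted, syndrome


# Values from the Figure 7 example: three FCPs produce 5.0, FCP4 produces 5.1.
ema1 = {"FCP1": 5.0, "FCP2": 5.0, "FCP3": 5.0, "FCP4": 5.1}
voted, syndrome = vote_outputs(ema1)
print(voted, syndrome)  # 5.0 ['FCP4']
```

The voted value is what every ICP and FCP receives, while the syndrome is the kind of information FTSS fault detection would use to decide how to treat FCP4.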

The FCPs and CTCs are connected together through their MPCC boards (an RS-422 interface). The FCC has one MPCC channel for command reading and one channel for telemetry writing. Refer to Figure 3 again for the FCC-CTC connectivity scheme. The FCP/CTC combination is maximized to minimize the chance of two failures (i.e., two failed FCCs) in two fault containment regions bringing down both CTCs. The channelization of the CTCs is completed along power fault containment regions.

There are three telemetry gathering tasks scheduled at 50, 10, and 1 Hz rates (known as the data collector tasks), and one data logger task at 10 Hz which sends the telemetry frame to the CTCs. The 50 Hz data collector task gathers all telemetry information at a 50 Hz rate and passes it to the 10 Hz data logger task. The 10 Hz data collector task gathers all telemetry information at a 10 Hz rate and passes it to the 10 Hz data logger task. The 1 Hz data collector task gathers all telemetry information at a 1 Hz rate and passes it to the 10 Hz data logger task. The 10 Hz data logger task then constructs each telemetry frame and, at a 10 Hz rate, outputs a stream of telemetry data to each CTC.

After the telemetry stream has been written, each FCP then reads data from its associated CTC. There will always be command and status data to be read (even if it is a null command or a repeated command).
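The collector and logger arrangement can be pictured with a small scheduling sketch in Python. The rates (50, 10, and 1 Hz collectors feeding a 10 Hz logger) come from the description above; everything else is an assumption made for illustration, including the function name run_one_second, the 50 Hz base tick, the phasing of the tasks within a second, and the representation of a telemetry sample as a (task, tick) pair.

```python
def run_one_second(base_rate_hz=50):
    """Simulate one second of 50 Hz ticks; return the frames the logger emits."""
    collector_rates_hz = {
        "collector_50hz": 50,
        "collector_10hz": 10,
        "collector_1hz": 1,
    }
    logger_rate_hz = 10
    pending = []  # samples handed to the logger but not yet framed
    frames = []   # telemetry frames that would be written to each CTC

    for tick in range(base_rate_hz):
        # Each collector runs on the ticks that match its rate (assumed phasing).
        for name, rate in collector_rates_hz.items():
            if tick % (base_rate_hz // rate) == 0:
                pending.append((name, tick))
        # The logger runs at the end of every fifth tick and builds one frame.
        if (tick + 1) % (base_rate_hz // logger_rate_hz) == 0:
            frames.append(list(pending))
            pending.clear()
    return frames


frames = run_one_second()
print(len(frames))               # 10
print([len(f) for f in frames])  # [7, 6, 6, 6, 6, 6, 6, 6, 6, 6]
```

Under these assumptions each 10 Hz frame carries five 50 Hz samples and one 10 Hz sample, and the first frame of the second also carries the 1 Hz sample.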


The X-38 flight computers are implemented using the industry standard VME64x protocol. As stated before, this project relies heavily on industry available ruggedized COTS components, with the exception being the Draper designed Network Element board. A few COTS components have been modified to meet flight quality specifications, and extensive system engineering & test principles have been applied to the hardware development. The chassis enclosures, VME backplanes, and power supply modules are custom designed and built by AITech Defense Systems (Israel). The VME cards are procured from five different manufacturers. The internal wire harness is designed and fabricated in-house. The NASA hardware effort is primarily to integrate all these components together as a working, space-qualified flight computer system, then integrate them into the V201 spacecraft.

FLIGHT CRITICAL COMPUTER
The FCC chassis contains an 11-slot VME64x backplane with two redundant power supply modules. VME cards populate ten of the eleven slots, and the entire chassis consumes approximately 180 watts at 28 volts (in idle mode). Figure 9 shows a block diagram.

The processor boards used for the FCP, ICP, NEFU and CTC computers are the Radstone Power PC604R, running at 333 MHz with 128MB of DRAM, 8MB user flash, and 1MB system flash memory onboard. Several I/O interfaces are used including the RS-232, RS-422, SCSI, Ethernet, and parallel port.

Figure 8: A flight FCC and CTC chassis on the test bench.

The four digital output (DO) boards in each FCC are used to control relays in the Power Conditioning Unit subsystem. The DO outputs either a 0-volt or 5-volt TTL command. The single analog output (AO) board is used for controlling vehicle cabin fans, pumps, and motor positions. The AO outputs a unipolar signal from 0 to 10 volts.

A single channel MIL-STD-1553B daughter card resides on the ICP PPC604R processor board. Both bus A and B are used and are transformer-coupled to the vehicle’s 1553 data bus network.

Most of the vehicle sensor information arrives at the FCCs via the IRIG-B/Decom board. When new information arrives at regularly scheduled rates, the decom board prepares the data for processing and generates an interrupt to the ICP processor.

As stated earlier, the FCCs and CTCs communicate with each other via their MPCC board’s RS-422 interfaces. Any command received by the CTC will be forwarded onto the FCCs, and all telemetry gathered by the FCCs will be sent out to the CTCs.

The two power supply modules inside each FCC can supply 300 watts each. They normally work in a load-sharing arrangement; however, if one supply should fail, the remaining one can adequately power the entire FCC. Nominal input voltage is 28Vdc, with +5V and 12Vdc being supplied to the VME backplane.

All seven flight computers are designed for conduction-cooling. Each chassis is mounted on an active thermal cold plate inside the pressurized crew cabin. Temperature sensors on the chassis sidewalls and on select VME cards are used to help determine the flight computer’s health status.

The CTC chassis utilizes a five-slot VME64x backplane with three 175-watt power supply modules available (only two are actually installed). The CTC contains three VME cards and it consumes approximately 92 watts at 28 volts (in idle mode). See Figure 10 for details. Two of the slots contain PPC604R and MPCC boards. The third VME board is a Reed-Solomon convolutional encoder. It formats and packages the telemetry stream, and encodes it for transmission to either the Shuttle’s aft flight deck or the launch pad’s T-0 connector, or for transmission via the S-band communication system. The other function of this board is to receive commands (from the same sources listed above), decode them, check for receive errors, and then forward them to the FCCs.

All the vehicle telemetry data is also stored onboard a flight data recorder connected to each CTC. The data recorder is a 5.1GB flash memory drive and interfaces to the CTC PPC604R via the SCSI interface.

NETWORK ELEMENT FIFTH UNIT
The NEFU utilizes the same backplane that the CTC uses, but the VME cards are placed in different slots. The NEFU contains two VME cards (the PPC604R and NE), with all three 175-watt power supplies installed, and
[Figure 9 diagram: “Flight Critical Computer Block Diagram.” External interfaces: SIGI, S-band Xponder, and EMA; Decomm (one-way) and Altimeter; 1553, RS-422, and RS-232 links; GSE; To/From CTCs; To ECLSS, winches; To PCUs; From Instrumentation Box; Fiber Optics (4 In, 4 Out) to NEs in the other FCCs and NEFU. Cards on the VME BackPlane: FCP, ICP, NE, Analog Output, Discrete Output, Decomm Board. Power: two PWR Supply modules, each behind an EMI filter, on Bus A and Bus B. Key: communication is really across backplane; FCP controlled I/O device; ICP controlled I/O device.]

Figure 9: Schematic block diagram of the Flight Critical Computer
Figure 10: Schematic block diagram of the CTC computer.
[Command & Telemetry Computer block diagram. Recoverable labels: UHF (RS-232), XPDR, Payload Umb., Dig Cam (RS-422), FCCs, GSE, CTP, Reed-Solomon, and Multi-protocol on a VME backplane; two power supplies, each with an EMI filter, fed from Bus A and Bus B.]

Figure 11: Schematic block diagram of the NEFU computer.
[Block diagram. Recoverable labels: ICP and NE; fiber optics, four in and four out, to the NEs in the FCCs; three power supplies, each with an EMI filter, fed from Bus A, Bus B, and Bus C.]

consumes approximately 113 watts at 28 volts (in idle mode). See Figure 11 for more details.

FIELD-PROGRAMMABLE GATE ARRAYS
Numerous field-programmable gate arrays (FPGAs) are utilized on the ruggedized COTS boards. The Draper NE board uses three Xilinx FPGAs and one Actel FPGA to implement the design’s state machine. Figure 12 shows a block diagram of the NE board from an FPGA perspective, and Figure 13 lists the programmable parts and memories used.

Figure 12: Network Element Block Diagram

The four NE FPGAs are known as the Global Controller, Receive, VME/Scoreboard, and Reset Supervisor. The Reset Supervisor is implemented in an Actel RT54SX16-CQ256B, and the others are in Xilinx XQR4036XL-3CB228M devices; all are radiation-tolerant components. The three Xilinx FPGAs are configured at power-up via configuration PROMs, and the Actel FPGA is hard-wired. The logic partitioned into the hard-wired Actel FPGA is the least likely to change. It also contains the logic required to reconfigure and reset the NE. Each NE contains four fiber-optic receive channels and four fiber-optic transmit channels to communicate with the NEs in the other flight computers. All four output channels carry the same data, except during a failure condition. The data transferred usually comes from three 8K x 16 dual-port SRAM memory chips. The functionality of the four FPGAs is as follows:

Global Controller – This FPGA operates the microcode-based state machine. It receives from the other FPGAs the inputs required to determine branch conditions. The state machine also controls all of the one-round and two-round data exchanges, executes all message type requests, and controls the flow of data into and out of the NE.

Receive – This FPGA performs four major functions: (i) it keeps all the operational NEs in lock-step state machine synchronization, which it does by recreating the fault-tolerant clocks from the other NEs and maintaining a complex digital phase-locked loop [6]; (ii) it facilitates the reading of data from the
other NEs and stores all their data in internal FPGA first-in-first-out (FIFO) buffers, with four FIFOs per FPGA corresponding to the four NE inputs; (iii) it performs voting on all the data; and (iv) it produces syndromes if any of the data does not match.

VME/Scoreboard – Two major functions are implemented in this Xilinx FPGA: the scoreboard and the VME controller.

The scoreboard function acts as an arbiter to determine message requests from the other NEs by maintaining an internal table of the system configuration and performing a sanity check on the message requests. Also, the scoreboard waits a prescribed time for all the processors in the other channels to send the same message request before execution.

This FPGA also implements a VME controller. The NE is an addressable slave device on the VME backplane, thus only VME reads are supported. If the NE were required to interface with another backplane protocol, e.g. PCI, the VME logic would be changed to PCI logic. This illustrates one of the flexibilities of the design.

Reset Supervisor – This FPGA controls the programming of the Xilinx FPGAs from their respective configuration PROMs at both start-up and reset conditions. There is a path from the fiber optic receivers to the Actel device, bypassing the Receive FPGA, to interpret a reset command from the other NEs.
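The voting and syndrome functions of the Receive FPGA, items (iii) and (iv) above, can be illustrated in software. The following is only a minimal sketch, not the NE's hardware logic: it assumes exact-match majority voting over whole data words and assumes the syndrome is a bitmask with one bit per NE input. The function name `vote_with_syndrome` and this syndrome encoding are hypothetical, chosen for illustration.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def vote_with_syndrome(copies: Sequence[Optional[int]]) -> Tuple[Optional[int], int]:
    """Majority-vote one data word received on each NE input.

    copies[i] is the word read from input FIFO i, or None if nothing arrived.
    Returns (voted_word, syndrome). Bit i of syndrome is set when input i
    is missing or disagrees with the voted word (assumed encoding).
    """
    all_bad = (1 << len(copies)) - 1
    present = [c for c in copies if c is not None]
    if not present:
        return None, all_bad

    word, count = Counter(present).most_common(1)[0]
    # Require a strict majority of all inputs; otherwise trust no value.
    if 2 * count <= len(copies):
        return None, all_bad

    syndrome = 0
    for i, c in enumerate(copies):
        if c != word:
            syndrome |= 1 << i
    return word, syndrome


if __name__ == "__main__":
    # Input 2 delivers a corrupted word; the other three agree.
    print(vote_with_syndrome([0x1234, 0x1234, 0xFFFF, 0x1234]))
```

In this sketch a nonzero syndrome with a valid voted word identifies which input disagreed, which is the kind of information a health-monitoring function could use to isolate a faulty channel or link.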

Manuf.             Part Number          Description             Qty  Notes

Xilinx             XQR4036XL-3CB228M    FPGA                     3   Rad Hard. Contains board’s state
ASI                AS58C1001SF-15/883C  128K x 8 PROM            3   Boot-up PROMs for Xilinx FPGAs
Actel              RT54SX16-CQ256B      FPGA                     1   Rad-tolerant
UTMC               UT28F64T-35UCC       8K x 8 PROM             10   Rad Hard. Contains the microcode
Space Electronics  7025ERPQB-35         8K x 16 Dual-Port SRAM   3   Rad Pak®. Memory space for the NE

Figure 13: FPGAs, PROMs, and Memories used in the Draper NE

A number of hardware techniques have been implemented to make the X-38 flight computer design more fault-tolerant and robust, and are described below. Some of these are features provided by the COTS boards, while others have been custom designed by our software team.

(i) DRAM memory scrubbing – as a low-priority task, certain portions of the shared DRAM memory space are continuously checked for errors or corrupted memory.

(ii) Error Correction Code – a background task runs continuously to detect and correct memory bit changes in DRAM (e.g., radiation single event upsets). This information is fed back into the overall health system.

(iii) Watchdog timer – a continuous task “pets” this circuit every 1.6 seconds to protect against processor lockups, etc. If this circuit ever times out, that computer will reset.

(iv) Synchronized Processor Cache – previous incarnations of fault-tolerant parallel processing systems purposefully excluded the use of cache to avoid timing issues. The X-38 implementation uses cache successfully and keeps it synchronized across the system.

(v) Real-time reintegration of a failed channel – if a particular channel is faulty and/or reset, the other channels can vote to reintegrate it into the working set, and thus allow it to resume participating in the voting process. This is only allowed in quiescent mission phases (e.g., not during re-entry). Up to three reintegration attempts are made before that channel is declared permanently “dead.”

(vi) Fiber optic link retries – the NE board has provisions to automatically retry and/or reset a particular fiber optic link if transmission errors are encountered.
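The watchdog behavior in item (iii) can be modeled in a few lines. This is a sketch under stated assumptions rather than a description of the flight circuit: the text gives only the 1.6-second petting interval, so the 2.0-second timeout used below is a hypothetical value, and the `Watchdog` class and its injectable clock are illustrative names.

```python
import time
from typing import Callable


class Watchdog:
    """Software model of a hardware watchdog timer."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_pet = clock()

    def pet(self) -> None:
        """Restart the countdown; called periodically by a healthy task."""
        self._last_pet = self._clock()

    def expired(self) -> bool:
        """True once more than timeout_s has passed since the last pet."""
        return self._clock() - self._last_pet > self.timeout_s


if __name__ == "__main__":
    now = [0.0]                                  # fake clock for the demo
    wd = Watchdog(timeout_s=2.0, clock=lambda: now[0])  # timeout is assumed
    now[0] = 1.6
    wd.pet()                                     # healthy task pets on schedule
    now[0] = 5.3                                 # task has locked up since 1.6 s
    print("reset computer" if wd.expired() else "running")
```

The timeout has to exceed the petting interval by some margin so that normal scheduling jitter does not trigger a reset; how much margin the X-38 circuit uses is not stated in the text.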
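The reintegration rule in item (v) amounts to a small state machine. The sketch below encodes only what the text states: reintegration requires a vote by the other channels, is allowed only in quiescent mission phases, and is abandoned after three attempts. Everything else is an assumption made for illustration, including the names (`Channel`, `try_reintegrate`), the use of a simple majority as the vote threshold, and the choice not to count a blocked attempt against the limit.

```python
from dataclasses import dataclass
from enum import Enum

MAX_REINTEGRATION_ATTEMPTS = 3  # from the text: up to three attempts


class ChannelState(Enum):
    ACTIVE = "active"
    FAILED = "failed"
    DEAD = "dead"


@dataclass
class Channel:
    name: str
    state: ChannelState = ChannelState.ACTIVE
    attempts: int = 0


def try_reintegrate(channel: Channel, votes_for: int, voters: int,
                    quiescent_phase: bool, resync_ok: bool) -> ChannelState:
    """Attempt to bring a failed channel back into the working set."""
    if channel.state is not ChannelState.FAILED:
        return channel.state            # nothing to do for active or dead channels
    if not quiescent_phase:
        return channel.state            # e.g. during re-entry: do not try
    if 2 * votes_for <= voters:
        return channel.state            # assumed rule: a majority must agree

    channel.attempts += 1
    if resync_ok:
        channel.state = ChannelState.ACTIVE
    elif channel.attempts >= MAX_REINTEGRATION_ATTEMPTS:
        channel.state = ChannelState.DEAD
    return channel.state


if __name__ == "__main__":
    ch = Channel("FCC-2", ChannelState.FAILED)
    for _ in range(3):
        print(try_reintegrate(ch, votes_for=3, voters=3,
                              quiescent_phase=True, resync_ok=False))
```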

IV. DEVELOPMENT & TESTING

Three software development labs were assembled using commercial-grade VME hardware (in standard 19-inch racks using convection-cooled hardware). One lab was primarily used for application code development, the second for guidance, navigation & control (GN&C) specific applications, and the third was used solely for software integration testing. Another facility was set up at Draper Labs for developing the NE board.

After early testing proved the architecture was viable, work began on developing the conduction-cooled flight hardware. The first units built were for engineering evaluation purposes, with the second units being the formal qualification units. A complete flight-quality shipset was then built, with provisions reserved for building spares as needed.

FUNCTIONAL TESTING WITH THE GSE TESTSET
In order to perform functional testing on each flight unit, a Ground Support Equipment (GSE) testset was built. Its purpose is to: (i) provide a test platform for powering up each flight chassis; (ii) verify the proper function of each VME flight board; (iii) verify the connectivity of the internal wire harnesses in each chassis; (iv) test limited fault detection, isolation, and recovery (FDIR) actions; and (v) characterize the behavior of the chassis power supplies under different loading conditions.

The GSE testset is assembled in a standard 7-foot portable equipment rack. It contains commercial VME hardware, an Ethernet hub, a power supply, numerous switch panels and LED status lamps, and a laptop computer as the user interface.

The GSE testset consists of both hardware and software, and performs the complementary functions of each flight computer. For example, it contains analog and digital input boards to test the FCC’s analog and digital output boards. It contains an MPCC board to verify the MPCC interfaces in the FCC and CTC computers. All other flight computer interfaces are similarly tested in this closed-loop fashion, allowing 100% of the wired I/O channels to be individually tested and verified.

After each flight chassis has been assembled, extensive burn-in time is accumulated using the GSE testset. Custom software routines were created specifically to exercise each interface, testing both nominal and off-nominal conditions. Limited fault-injection simulations are used to test some of the flight hardware’s FDIR routines. After testing at the box level, the entire shipset can be connected and the actual flight software application run in an integrated set, with the GSE testset monitoring its performance.

ENVIRONMENTAL TESTING
The X-38 flight computers are located on the vehicle inside the pressurized crew cabin, thus eliminating direct exposure to the space environment. However, a number of environmental tests have been performed to ensure the hardware will survive with ample margin. Thermal-vacuum, vibration, and ionizing radiation testing have been performed on most of the components. Several tests, including EMI/EMC and power quality, were not completed due to the project’s cancellation.

For thermal-vacuum testing, a complete flight-like unit was placed inside a vacuum chamber and on an insulated thermal plate (in two separate tests). Each chassis was

communicating with the GSE testset, running a suitable, high-duty-cycle software application. This was started before the test and monitored with appropriate instrumentation (i.e., thermocouples and RTDs) throughout the test. Several heating/cooling cycles were completed, ramping to about +/- 130% of the maximum expected temperatures while on orbit. Each cycle had a 1-2 hour dwell time. After each test, a thorough post-test functional check and visual inspection yielded no abnormal results for any of the flight hardware.

The vibration testing was done in a similar fashion, placing the flight-quality chassis on a vibration plate. The X, Y, and Z axes were each vibrated separately, again exceeding the limits expected in space by about 50%. A post-test inspection revealed problems with jumper pins on several VME boards, so a different pin was selected and silicon staking was used in the assembly process. A repeat of the vibration test showed the jumper pins held firm, and no other anomalies were observed.

Ionizing radiation testing was the biggest unknown, so this testing actually began early in the project, on the most critical components like processors, memories, etc., before the architecture committed to a particular hardware implementation. For this testing, individual components on each flight computer board were exposed to a high-energy proton beam to a dose exceeding that expected over the X-38’s on-orbit life by 300%. Several components were found to be unacceptably prone to single event upsets, and one SCSI controller device experienced a destructive latchup. Alternate parts were selected and retested to prove they would be suitable for this space application. Approximately 75% of the total parts to be tested were completed before the project was cancelled. Figures 14 and 15 show hardware in several tests.

VEHICLE V201 INTEGRATION
After each chassis completed formal acceptance testing, it was delivered to the V201 spacecraft for installation and integration testing with other subsystems. At the project’s end, the spacecraft was about 75% complete. Most of the subsystems were installed, including the electromechanical actuators, body flaps, environmental control & life support system, power system, communication gear, and avionics.

Figure 14: GSE Testset supporting an FCC vibration test.
Figure 15: FCC processor VME board being irradiated in the proton test chamber.

V. FUTURE WORK & CONCLUSIONS

Although the X-38 program was cancelled, much knowledge and valuable test experience was gained from this work. Several of our development labs remain in use today at the Johnson Space Center and at Draper Labs, where the FTPP system concept is being leveraged for future spacecraft applications, potentially for the new Orbital Space Plane. Enhancements are being studied to make the FTPP avionics system even more robust and reliable.

The current X-38 configuration is fully Byzantine resilient up to the I/O processor. Beyond that point, due to cost and weight concerns, the path is susceptible to Byzantine errors (though each hardware instance has a minimum of one fault tolerance). Improving this system would include implementing the Byzantine philosophy throughout the entire breadth of the system, beyond the flight computers and into the sensors, effectors, etc.

Additional improvements to the hardware design could be:

(i) Remove the fiber optic links – these components are very fragile and easily damaged. The fiber optic components also required us to significantly increase the size of each flight chassis due to the minimum bend radius restriction of each fiber cable. An alternative would be to replace them with copper connections and use optocouplers to provide electrical isolation.

(ii) Improve the Network Element’s throughput to reduce the overloading bottleneck. This would require a hardware design change.

(iii) Use faster FCP and ICP processor boards to also increase throughput.

(iv) Improve the radiation tolerance by upgrading certain parts on the Network Element and processor boards.

(v) Recent improvements in ruggedized COTS components could also lead to a faster and smaller hardware

REFERENCES

[1]   “X-38 V201 Software Architecture Definition Document - JSC29309”, Version 2.3; NASA-Johnson Space Center; Houston, TX; December 1999.

[2]   “X-38 Data Management System VME Card Configuration - JSC29839”, Baseline; NASA-Johnson Space Center; Houston, TX; December 2002.

[3]   Racine, LeBlanc and Beilin, “Design of a Fault-Tolerant Parallel Processor”; 21st IEEE Digital Avionics Systems Conference, Irvine, CA; 2002.

[4]   Racine, “System Status in the X-38 Fault Tolerant Parallel Processor,” C.S. Draper Laboratory, Cambridge, MA; 2002.

[5]   Alger and Racine, “Communication Error Detection in the X-38 Fault Tolerant Parallel Processor,” C.S. Draper Laboratory, Cambridge, MA;

[6]   Beilin, “Maintaining Network Element Synchronization in the X-38 Fault-Tolerant Parallel Processor”, International Conference on Dependable Systems and Networks, Washington, D.C., June 2002.
