					                       Fault-Detection Capability Analysis of a
                            Hardware-Scheduler IP-Core in
                      Electromagnetic Interference Environment



                                 J. Tarrillo1, L. Bolzani1, F. Vargas1, E. Gatti2, F. Hernandez3, L. Fraigi2



                      1 Electrical Engineering Dept., Catholic University – PUCRS. Porto Alegre, Brazil.
                               2 Inst. Nacional de Tecnologia Industrial (INTI). Buenos Aires, Argentina.

                                                               3 Universidad ORT. Montevideo, Uruguay.




Catholic University                           vargas@computer.org                                      1
     PUCRS
                           Motivation



Nowadays, safety-critical embedded systems support real-time (RT)
applications that have to respect strict timing constraints.



 They have to provide logically and temporally correct results!
The high complexity of these systems requires the adoption of Real-Time
Operating Systems (RTOS) that manage the task-switching process,
concurrency between tasks, memory, and time, as well as interrupts.




Understanding the Problem …

The increasing hostility of the electromagnetic
environment, caused by the widespread adoption of electronics
and, in particular, wireless technologies, represents a huge
challenge for the reliability of RT embedded systems.




Electromagnetic interference (EMI) may induce Power
Supply Disturbances (PSD) that can generate transient faults.



These faults can affect not only the applications running on
embedded systems but also the RTOS executing the application
code, by causing scheduling dysfunctions that could lead to
incorrect system behavior.

Understanding the Problem …


  Several solutions have been proposed. However, they provide
  fault tolerance only at the application level and do NOT
  consider faults affecting the RTOS that propagate to
  application tasks.



 e.g.: about 34% of the faults injected in the processor’s registers
 led to scheduling dysfunctions:

    - 44% of these dysfunctions led to system crashes,
    - 34% caused RT problems, and
    - 22% generated incorrect outputs (propagated to system
      outputs).

 If not detected at the RTOS level, these faults escape detection by
 conventional (app-level) techniques as well!

                      Goal

In this context…

We propose a Hardware-based Scheduler (Hw-S) IP core
to improve the robustness of embedded systems based
on RTOS.



 The Hw-S targets faults that are NOT detected by the
native structures present in the RTOS kernel.



       Summary


1. The Proposed Approach

2. Practical Experiments

3. Discussion: The Benefits

4. Conclusions


                        1. The Proposed Approach




Block diagram of the target embedded system (figure not reproduced).
Labels recovered from the diagram:

  - Embedded System
  - Events: tick, interrupts, ... (reference for switching the task context)
  - Memory addresses accessed by the processor
  - Hw-S: identifies the task currently under execution and correlates it
    with the information stored in an Address Table generated during
    the compilation process.
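
As a rough software illustration of the Address Table lookup mentioned on this slide, the sketch below maps an address seen on the processor bus to a task. The table contents, the address ranges and the function name identify_task are hypothetical; the slides do not give the real table layout or the hardware implementation.

```python
from bisect import bisect_right

# Hypothetical Address Table: (start, end_exclusive, task) per code region.
# In the slides this table is generated during compilation; these ranges
# are invented for illustration only.
ADDRESS_TABLE = [
    (0x1000, 0x1400, "T1"),
    (0x1400, 0x1900, "T2"),
    (0x1900, 0x2000, "T3"),
]
_STARTS = [start for start, _, _ in ADDRESS_TABLE]


def identify_task(address):
    """Return the task whose address range contains `address`, else None."""
    i = bisect_right(_STARTS, address) - 1
    if i < 0:
        return None  # address lies below every known region
    start, end, task = ADDRESS_TABLE[i]
    return task if start <= address < end else None


if __name__ == "__main__":
    for addr in (0x1000, 0x17FC, 0x2000):
        print(hex(addr), identify_task(addr))
```

A sorted table with binary search is only one possible choice; a hardware block would more likely compare the address against all ranges in parallel.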
                                          1. The Proposed Approach
Block diagram of the Hw-S (figure not reproduced). Annotations on the blocks:

  - One block is in charge of identifying the task under execution based on
    the addresses accessed by the CPU and on the information stored in an
    Address Table generated during the compilation process.
  - One block, based on the tick and on any other event (interrupts), is in
    charge of defining the Time Limit (tl) for the processor to execute each
    task, as well as detecting the events that can possibly interrupt the
    task in execution.
  - One block implements the scheduling algorithm based on the RTOS kernel
    and provides fault detection according to:
      - the task in execution,
      - the analysis of the tl, and
      - the events (interrupts) that can influence the RT system.
  - Output: Error Indication to System Level.
                      1. The Proposed Approach


Context Switch and Time Limit (timing diagram not reproduced).
Labels recovered from the diagram:

  - External Event
  - Current task retirement into the execution queue
  - Next task recovery from the execution queue
  - Time for Context Switching (Δ time, proportional to the number and
    complexity of resources used by the RTOS)
  - Time Limit for Switching Context
                1. The Proposed Approach



  Regarding the fault detection capability, the Hw-S targets two types of
faults:


  Sequence error (E_seq): flagged at the end of the Time Limit (tl) when
the current task is not the one expected according to the task
execution flow.


  Time error (E_time): flagged when a task switch takes place
between two consecutive context-switching events (e.g., two
consecutive ticks), thus violating the time constraints associated with the
real-time system.
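
The two checks can be modeled in software as a small state machine. This is a minimal sketch only: the class and method names are hypothetical, a fixed round-robin task order is assumed, and the tick is treated as the only legitimate context-switching event. The actual Hw-S is a hardware IP core whose internals are not given in these slides.

```python
class SchedulerMonitor:
    """Toy model of the E_seq and E_time checks (assumes round-robin order)."""

    def __init__(self, expected_order):
        self.expected_order = list(expected_order)
        self.index = 0                        # slot of the expected task
        self.current = self.expected_order[0]

    def on_task_observed(self, task):
        """Called between ticks with the task identified from bus accesses."""
        if task != self.current:
            self.current = task               # report the switch only once
            return "E_time"                   # switch between two ticks
        return None

    def on_tick(self, task_after_switch):
        """Called at the end of the time limit, after the context switch."""
        self.index = (self.index + 1) % len(self.expected_order)
        expected = self.expected_order[self.index]
        self.current = task_after_switch
        if task_after_switch != expected:
            return "E_seq"                    # unexpected task scheduled
        return None


if __name__ == "__main__":
    monitor = SchedulerMonitor(["T1", "T2", "T3"])
    print(monitor.on_task_observed("T1"))  # None
    print(monitor.on_tick("T2"))           # None
    print(monitor.on_task_observed("T3"))  # E_time
    print(monitor.on_tick("T1"))           # E_seq
```

Handling interrupts as additional legitimate switching events would need an extra input to the model; that is left out here to keep the two error definitions visible.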




                    2. Practical Experiments

Case study:

 Von Neumann 32-bit RISC Plasma microprocessor running an RTOS (opencores.org).

 Plasma’s instruction set is compatible with the MIPS architecture.

 We developed and validated three benchmarks that exploit different services offered by
the Plasma RTOS:


     BM1: Tasks T1, T2 and T3 access and update the value of three
          different global variables (Variable 1, Variable 2 and Variable 3,
          respectively).

     BM2: Tasks T1 and T2 communicate by message queue. T1 sends a value
          to the queue and T2 reads this value. Task T3 writes a value
          into a global variable (Variable 3).

     BM3: Tasks T1, T2 and T3 access a global variable which has been
          protected by a mutual exclusion semaphore (MUTEX).



                                 2. Practical Experiments

Test board designed for IEC 62132-2 and IEC 61000-4-29 electromagnetic susceptibility analysis
(photograph and block diagram not reproduced). Labels recovered from the figure:

  - Board sides: Test Side (top) and Glue Logic Side (bottom)
  - Test Side: FPGA0 and FPGA1, SRAM 0 and SRAM 1, Flash 0 and Flash 1, 8051
  - Glue Logic Side: FPGA, 8051, CLK
  - Other components: temperature sensor
  - Interconnect: 32-bit and 8-bit buses, RS232 links
  - Power supplies: Supply F0, F1, M0, M1, MSC and C
                                      2. Practical Experiments

Fault injection environment (photographs not reproduced):

  - GTEM Cell
  - Test Host Computer
  - RF Noise Generator and Amplifier
  - Power-Supply Noise Generator Board
  - Test Board and Shielding Box

Test Conditions:

  - Freq. range: 150 kHz – 3 GHz
  - Field range: 10 – 200 V/m
  - Signal Modulation: AM 80%
  - Total exposure time: 27 hours

Injected noise at the FPGA power bus (conducted EMI):
voltage dips of 4.2% (1.2 volts to 1.15 volts).
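
The 4.2% figure is consistent with the two voltage levels shown, assuming it expresses the depth of the dip relative to the 1.2 V level (the slide does not state the reference explicitly). A short check, with the function name dip_percent chosen only for illustration:

```python
def dip_percent(nominal_v, dipped_v):
    """Depth of a supply dip as a percentage of the nominal voltage."""
    return (nominal_v - dipped_v) / nominal_v * 100.0


if __name__ == "__main__":
    # Assumption: 1.2 V is the nominal level and 1.15 V the dipped level.
    print(round(dip_percent(1.2, 1.15), 1))  # 4.2
```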
                         2. Practical Experiments



Summary of the obtained results
(after 27 hours, number of erroneous outputs observed per benchmark: 65)

  Benchmark   RTOS            Hw-S            RTOS/Hw-S latency   FPGA configuration
              detection [%]   detection [%]   [clock cycles]      lost [%]
  BM1         33.8            100.0           1523                7.7
  BM2         43.1            100.0           498                 1.5
  BM3          1.5            100.0           810                 -

Callouts on the slide: highest fault detection; coverage of faults that
propagated to outputs; minimum fault latency.
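
To make the percentages more concrete, the sketch below converts the RTOS detection rates into event counts. It assumes that each percentage is a fraction of the 65 erroneous outputs observed per benchmark; the resulting counts are inferred from that assumption and are not stated in the slides.

```python
ERRONEOUS_OUTPUTS = 65  # per benchmark, as stated on the slide

# RTOS detection [%] per benchmark, copied from the results table.
RTOS_DETECTION_PCT = {"BM1": 33.8, "BM2": 43.1, "BM3": 1.5}


def implied_count(pct, total=ERRONEOUS_OUTPUTS):
    """Number of events implied by a percentage of `total` (assumption)."""
    return round(pct / 100.0 * total)


counts = {bm: implied_count(p) for bm, p in RTOS_DETECTION_PCT.items()}

if __name__ == "__main__":
    for bm, n in counts.items():
        print(f"{bm}: about {n} of {ERRONEOUS_OUTPUTS} detected by the RTOS")
```

Under the same assumption, the 100.0% Hw-S figure corresponds to all 65 erroneous outputs per benchmark.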


                              2. Practical Experiments

                   After inspection …

Percentage of E_seq and E_time detected by the Hw-S (chart not reproduced):

  - Time_Errors: CPU switched to another task between two consecutive ticks.
  - Sequence_Errors: CPU executed an unexpected task from the Task
    Execution Queue.

Percentage of assert() sent by the RTOS (chart not reproduced):

  - RTOS lost “semaphore information”, preventing the CPU from
    continuing the proper execution of the tasks.
  - RTOS lost information associated with the “next thread”, preventing
    the CPU from switching to the next task in the execution queue.

Migrate to HW the weakest reliability points of the RTOS.
                       4. Final Conclusions


We presented a Hardware-based Scheduler (Hw-S) IP core to
improve the robustness of embedded systems based on RTOS.

The Hw-S targets scheduling dysfunctions that could
lead to incorrect system behavior.
These faults are NOT detected by the native structures
present in the RTOS kernel.


The IP core is attached to the processor bus to monitor
the task execution flow.

Practical experiments indicate that the technique is effective at
increasing the fault detection coverage beyond that provided by the
RTOS-native structures.

				