FPGAs for Fault Tolerant Circuits _MAPLD 2001_


FPGAs for Fault
Tolerant Circuits
Steven A. Guccione
               The vast majority of integrated circuits produced today rely on extremely high reliability in manufacturing for their
               successful operation. A device with tens of millions of transistors will typically be unusable if a single one of these
               transistors fails. If this failure is detected during production, the device is usually discarded, leading to the well-
               known "yield" of an integrated circuit manufacturing process. Similarly, if any portion of a circuit fails while in use,
               the entire system may become unstable or non-functional. For many applications, this mode of failure is
               acceptable, and repair or replacement of the component or system is a cost-effective solution. In many other
               applications, including those where repair or replacement of faulty components is expensive or impossible, some
               form of Fault Tolerance is necessary to keep systems operating in the presence of such failures.

               In general, the techniques used to produce fault-tolerant circuits are based on having redundant components
               available to replace circuits detected as faulty. While providing such redundancy can be a costly and challenging
               design problem, Field Programmable Gate Arrays (FPGAs) provide a potential solution for implementing fault
               tolerant circuits and systems. An FPGA is itself a highly redundant circuit, built from a regular cellular array
               of reconfigurable logic and interconnect. This redundancy in the underlying device architecture, combined with
               the ability to dynamically reprogram the circuitry, makes FPGA devices a suitable platform for constructing
               fault tolerant circuits and systems.

               While much research has been done in the area of fault tolerance using FPGAs, little or no support for this type of
               design has been made commercially available. This is in spite of the large potential gains in both system reliability
               and device process yield. While the underlying FPGA architecture does inherently provide support for fault
               tolerance, existing FPGA design software does little to take advantage of these properties.

               Software for this purpose must support directly configuring, probing, and reconfiguring large, popular, commercially
               available FPGA devices. Such software has been the basis of some preliminary work in producing defect and fault
               tolerant FPGA circuits. These circuits have demonstrated the ability to configure and reconfigure FPGA devices in
               the presence of defects in both circuit logic and interconnect. This work involves three major components: the
               ability to construct working circuits in the presence of known defects, the ability to detect and isolate defects in an
               operating FPGA device, and finally the ability to reconfigure circuits at run-time to operate in the presence of
               newly detected defects. The fundamentals of this software design problem will be explored.
MAPLD 2001 E0_Guccione                                              2
                          FPGA History
                          FPGA Test / Defect Isolation
                          Defect Tolerance
                          Run-Time Reconfiguration / JBits
                          Fault Tolerance

                         1970s: Fixed SSI Logic
                   Printed Circuit Boards designed using fixed
                    Small Scale Integration (SSI) logic
                   Texas Instruments 7400 Transistor-Transistor
                    Logic (TTL) parts popular
                   Changes to hardware required physical
                    modification of system
                   No programmable hardware

                         1980s: Programmable Arrays
                    Programmable Array Logic (PAL) introduced
                     by Monolithic Memories Inc (MMI) in 1978
                    Low density programmable AND / OR array
                    Used primarily for interface and “glue” logic
                    Usually replaced several TTL parts
                    One-time programmability

                  1990s: Field Programmable Gate
                          Arrays (FPGAs)

                          FPGAs introduced by Xilinx in 1984
                          A programmable cellular array
                          More flexible architecture than PALs
                          RAM-based
                          Reprogrammable in-system

                         FPGA Architecture
                         IOB   IOB       IOB   IOB   IOB

                         IOB   CLB   CLB       CLB   CLB

                         IOB   CLB   CLB       CLB   CLB

                         IOB   CLB   CLB       CLB   CLB

                         IOB   CLB   CLB       CLB   CLB

                                  FPGA Architecture
                          Configurable Logic Blocks (CLBs)
                            — RAM-based Look-Up Tables (LUTs)
                          Configurable Interconnect
                            — MUXes, tristates, etc.
                          Configurable Input / Output
                            — Input, output, tristate, various voltages
                          Other features possible
                            — Embedded SRAM, Carry Logic, Multiplier
                              support, etc.
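
To make the idea of a RAM-based look-up table concrete, the sketch below models a 4-input LUT as a 16-entry truth table packed into one integer. Reprogramming the logic amounts to rewriting those bits. The names `make_lut` and `lut_eval` are illustrative only and do not correspond to any vendor tool or to the JBits API.

```python
def make_lut(func, n_inputs=4):
    """Build the truth-table bits for func over n_inputs boolean inputs."""
    bits = 0
    for addr in range(1 << n_inputs):
        # Bit i of the address is the value of input i.
        inputs = [(addr >> i) & 1 for i in range(n_inputs)]
        if func(*inputs):
            bits |= 1 << addr
    return bits


def lut_eval(bits, *inputs):
    """Look up the output: the inputs form an address into the stored bits."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return (bits >> addr) & 1


# The same hardware cell becomes a different gate by loading different bits.
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = make_lut(lambda a, b, c, d: a & b & c & d)
```

Because any 4-input function fits in the same 16 bits, every LUT on the device is interchangeable with every other, which is what makes spare cells usable as replacements.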
                                          FPGA Density

                          [Chart: FPGA density by year, 1985 to 2001, showing approx. 65% growth per year; Virtex and Virtex-II marked]
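
As a rough check of what 65% annual growth implies over the 1985 to 2001 span, the compound factor can be computed directly. This is plain arithmetic on the two figures quoted on the slide, not data taken from the chart itself.

```python
def compound_growth(rate, years):
    """Total growth factor after compounding `rate` once per year."""
    return (1.0 + rate) ** years


# 65% per year sustained from 1985 to 2001 (16 years).
factor = compound_growth(0.65, 2001 - 1985)
```

The result is a factor of roughly 3,000, which is consistent with the move from early devices to the multi-million-gate parts described later in the talk.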

                                                   FPGA Device Resources

                          [Chart: device resources by year, 1990 to 2000; two vertical scales, 0k to 100k and 0k to 10k, one labeled Metal Length (km). Courtesy and Copyright of UMC.]

                         Modern FPGA Devices
                          10 Million+ system gates
                          420+ MHz clock speeds
                          840 Mbit/sec LVDS IO
                          3.5 MB Dual-Ported SRAM
                          18 x 18 multipliers
                          12 High-performance clocks

              Defect Tolerance / Fault Tolerance
                          Software techniques
                           —   Use existing devices
                           —   General purpose
                           —   Low overhead
                           —   Fast enough for on-line use
                          Does not address:
                           —   Hardware techniques
                           —   BIST-based techniques
                           —   Customized manufacturing
                           —   Soft error recovery
                          Why Defect Tolerance?
                          Increase device yields
                          Build larger devices economically


                         Why Fault Tolerance?

                              Reliability
                              Availability
                              Repairability

                            DT / FT and FPGAs

                          FPGAs ideal for DT / FT
                           — Everything is a redundant spare!
                          Approach:
                           1) Isolate defects
                           2) Program device around defects

                                  Defect Tolerance
                          Basic technology available today
                          Approach:
                            —   Modify standard FPGA tools
                            —   Isolate and record defect during test
                            —   Import defect data from database
                            —   Place and route each device
                            —   Bypass faulty wires / logic
                            —   Generate configuration bitstream
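
The per-device flow above can be sketched as follows. This is a minimal model under stated assumptions: the defect database is a plain dictionary keyed by made-up serial numbers, placement is simple row-major assignment, and routing and bitstream generation are omitted. None of the names come from a real FPGA tool.

```python
# Hypothetical per-device defect records, keyed by an illustrative serial number.
# Each entry is the set of (row, col) CLB sites found faulty during test.
DEFECT_DATABASE = {
    "device-001": {(0, 1), (1, 0)},
    "device-002": set(),
}


def place_design(serial, n_cells, rows, cols, database=DEFECT_DATABASE):
    """Return n_cells CLB sites for one device, bypassing its recorded defects."""
    defects = database.get(serial, set())
    good_sites = [(r, c) for r in range(rows) for c in range(cols)
                  if (r, c) not in defects]
    if len(good_sites) < n_cells:
        raise ValueError(
            f"{serial}: only {len(good_sites)} usable CLBs for {n_cells} cells")
    return good_sites[:n_cells]


# The same design lands on different sites depending on each device's defects.
placement_a = place_design("device-001", n_cells=4, rows=3, cols=3)
placement_b = place_design("device-002", n_cells=4, rows=3, cols=3)
```

Since every device gets its own placement, the whole flow has to be rerun per part, which is why tool speed matters for production use.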

                            Defect Tolerance Flow

                          [Flow diagram, partially recoverable. Surviving labels: Netlister, Netlist, Place and (remainder lost), Defect Database]


                               The Need for Speed
                         But …
                          Modification of existing tools difficult
                          Too slow for most production uses
                            — Bitstream generation can take hours
                            — Faster circuit generation required

                               Xilinx’s JBits Toolkit
                          Provides extremely fast circuit generation
                          Can directly interface with defect database
                          Uses off the shelf Java compilers
                          Supports Run-Time Reconfiguration (RTR)
                          Core-based design model

                         The JBits Environment
                          [Block diagram, partially recoverable. Surviving labels: RTP Core Library, JBits API, JRoute, Code]


                              The JBits Environment
                  A collection of tools and Application Program
                   Interfaces (APIs)
                         —   JBits: The configuration bitstream API
                         —   RTP Cores: Run-Time Parameterizable Cores
                         —   BoardScope: The debug tool
                         —   XHWIF: The portable hardware API
                         —   JRoute: The run-time router API
                         —   VirtexDS: The Virtex Device Simulator

                 Run-Time Reconfiguration (RTR)
                Run-Time Reconfiguration (RTR): building and
                 configuring circuits at run-time, in-system
                A powerful advantage of FPGAs over ASICs
                         —   More flexible
                         —   Faster circuits
                         —   Multifunction hardware
                         —   Enables defect and fault tolerance

                                RTR = CPU + FPGA
                 CPU re-programs FPGA
                 Useful for:
                         —   Co-processing
                         —   Reducing CPU size and speed
                         —   Increasing system performance
                         —   Lower system power
                         —   Reducing system cost
                         —   Enabling Defect / Fault Tolerance

                                  JBits RTP Core
                          Run-Time Parameterizable (RTP) Core
                          A high-level circuit abstraction
                          An object generated at run-time
                          Parameters define specifics of circuit:
                            — <n>-bit shift register
                            — <row,col> location independent
                            — Other parameters possible
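
A run-time parameterizable core can be pictured as an ordinary object whose constructor arguments fix the circuit's size and location. The sketch below is a plain Python model, not the JBits RTP Core API: the class name, the `bits_per_clb` packing figure, and the vertical-column layout are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ShiftRegisterCore:
    """Toy model of an <n>-bit shift register core placed at <row, col>."""
    n_bits: int
    row: int
    col: int
    bits_per_clb: int = 2  # assumed packing density, for illustration only

    def footprint(self):
        """CLB sites occupied: a vertical column starting at (row, col)."""
        n_clbs = -(-self.n_bits // self.bits_per_clb)  # ceiling division
        return [(self.row + i, self.col) for i in range(n_clbs)]


# The same description yields different circuits from different parameters.
core = ShiftRegisterCore(n_bits=8, row=4, col=10)
moved = ShiftRegisterCore(n_bits=8, row=0, col=0)
```

Changing `row` and `col` relocates the core without changing its shape, which is the location independence the slide refers to.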

                         An Example RTP Core
                          An 8-bit adder
                          Parameter <n=8> defines size
                          Generated at run-time
                          [Figure: adder column with stages labeled from Stage 0 up to Stage 7]

                           Defect Tolerant RTP Cores
                 Use defect database as parameters
                 Skip defective resources
                         — Skip CLB
                         — Skip Rows
                         — Skip Columns
                 Can be variable granularity (CLB-level used here)
                 Defective wires handled in JRoute run-time router
                         — Simply marked as “used”
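
Marking defective wires as "used" means the router needs no special defect logic: it already avoids occupied resources. The sketch below shows the idea with a breadth-first search over a small made-up wire graph. It is a generic illustration and does not reflect how JRoute is implemented or called.

```python
from collections import deque


def route(graph, src, dst, used):
    """Breadth-first search for a path from src to dst avoiding nodes in `used`."""
    if src in used or dst in used:
        return None
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor chain back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in prev and nxt not in used:
                prev[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical routing graph: two ways to get from A to B.
graph = {"A": ["w1", "w2"], "w1": ["B"], "w2": ["w3"], "w3": ["B"]}
used = set()
used.add("w1")  # defective wire recorded as if it were already occupied
path = route(graph, "A", "B", used)
```

With `w1` marked, the router falls back to the longer path through `w2` and `w3`, at the cost of extra delay.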

                         Defect Tolerance RTP Cores
                          Mode used depends on goals
                          Common Defect Tolerance modes
                           — Skip CLB: resource efficient
                           — Skip CLB Row: aligns core internals
                           — Skip CLB Column: aligns datapaths
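
The trade-off between the three modes can be seen by counting the CLB sites each one leaves available. The function below is an illustrative model only: the mode names and the grid representation are assumptions, not part of the JBits core library.

```python
def usable_sites(rows, cols, defects, mode="clb"):
    """List the CLB sites left available under one of three skip modes."""
    if mode not in ("clb", "row", "column"):
        raise ValueError(f"unknown mode: {mode}")
    bad_rows = {r for r, _ in defects}
    bad_cols = {c for _, c in defects}
    sites = []
    for r in range(rows):
        for c in range(cols):
            if mode == "clb" and (r, c) in defects:
                continue  # skip only the faulty CLB
            if mode == "row" and r in bad_rows:
                continue  # skip every CLB in a row containing a fault
            if mode == "column" and c in bad_cols:
                continue  # skip every CLB in a column containing a fault
            sites.append((r, c))
    return sites


# One defect on a 4 x 4 array, evaluated under each mode.
defects = {(1, 2)}
counts = {m: len(usable_sites(4, 4, defects, m)) for m in ("clb", "row", "column")}
```

Skipping a single CLB wastes the least area, while skipping a whole row or column gives up more sites in exchange for keeping the core's internal structure or its datapath aligned.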

                         DT Constant Multiplier

                           [Screenshots: Core View and State View in BoardScope, XCV800 device]
                                Defect Tolerant Design
                 Use JBits DT RTP Core library
                         —   Place cores relative to one another
                         —   Provide defect database to all cores in design
                         —   Generate bitstream
                         —   Program device
                 Provides Defect Tolerance at the Core level

                         See: Run-Time Defect Tolerance using JBits. Prasanna Sundararajan and
                             Steven A. Guccione, FPGA 2001.

                            From Defect Tolerance to
                                Fault Tolerance
                  Defect Tolerance: produce working circuits
                   in the presence of manufacturing defects
                         — Fast circuit generation via JBits
                         — Test techniques still slow, off-line
                  Fault Tolerance: produce working circuits in
                   the presence of all defects, even those
                   occurring in-system at run time.
                   Requires fast, in-system device test techniques

                         JBits for FPGA Device Test
                 Use JBits to:
                         —   Configure test circuits
                         —   Supply test vectors
                         —   Read back results
                         —   Store defect data
                         —   Interface to JBits design tools
                 Small, fast, on-line solution
                         See: FPGA Device Test Using JBits, P. Sundararajan, S. McMillan
                                          and S. Guccione, MAPLD 2001.
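
The configure, stimulate, read back, and record loop can be sketched against a simulated device. Everything here is a stand-in: `SimulatedDevice` models stuck-at faults in software, and the two test functions are an arbitrary choice. The real test circuits and the JBits calls used to drive hardware are described in the cited paper, not reproduced here.

```python
class SimulatedDevice:
    """Stand-in for hardware: each CLB computes its configured function unless stuck."""

    def __init__(self, rows, cols, stuck_at=None):
        self.rows, self.cols = rows, cols
        self.stuck_at = stuck_at or {}  # {(row, col): forced output value}
        self.config = {}

    def configure(self, site, func):
        self.config[site] = func

    def apply(self, site, a, b):
        if site in self.stuck_at:
            return self.stuck_at[site]  # faulty cell ignores its configuration
        return self.config[site](a, b)


def find_defects(device):
    """Configure each CLB with test functions, apply vectors, record mismatches."""
    vectors = [(0, 0), (0, 1), (1, 0), (1, 1)]
    test_funcs = [lambda a, b: a ^ b, lambda a, b: 1 - (a ^ b)]
    defects = set()
    for r in range(device.rows):
        for c in range(device.cols):
            site = (r, c)
            for func in test_funcs:
                device.configure(site, func)
                if any(device.apply(site, a, b) != func(a, b) for a, b in vectors):
                    defects.add(site)
    return defects


device = SimulatedDevice(3, 3, stuck_at={(1, 1): 0, (2, 0): 1})
defect_data = find_defects(device)
```

The resulting set has the same shape a placement step would consume, which is the sense in which test and design tools can share one defect database.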

                         Fault Tolerance Using JBits
                  Small, fast, on-line test to detect faults
                  Fast, flexible circuit generation to produce
                   new circuits in the presence of faults
                  Preliminary results very encouraging

                  JBits provides an integrated tool to produce
                           on-line fault tolerant FPGA systems

                                      Timing Issues
                     Not recommended for aggressive designs
                         — Tight timing
                         — High resource utilization
                     Potential timing solutions:
                         —   On-line static timing analysis
                         —   Limited reconfiguration
                         —   Adjustable clock speeds
                         —   Others
                     Timing variance in FPGAs improving
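
Two of the listed remedies, on-line static timing analysis and adjustable clock speeds, can be combined in a simple way: recompute the longest path after rerouting and lower the clock if needed. The sketch below assumes an acyclic circuit graph with made-up delays in nanoseconds and a 10% safety margin. The function names and all numbers are illustrative.

```python
def critical_path_ns(node_delay, fanout):
    """Longest delay through an acyclic graph of nodes joined by delayed wires."""
    memo = {}

    def delay_from(node):
        # Delay from this node's input to the end of its slowest downstream path.
        if node not in memo:
            memo[node] = node_delay[node] + max(
                (wire + delay_from(nxt) for nxt, wire in fanout.get(node, [])),
                default=0.0,
            )
        return memo[node]

    return max(delay_from(n) for n in node_delay)


def max_clock_mhz(path_ns, margin=0.1):
    """Highest clock rate that still covers the path plus a safety margin."""
    return 1000.0 / (path_ns * (1.0 + margin))


node_delay = {"a": 1.0, "b": 1.0, "c": 1.0}
original = {"a": [("b", 0.5)], "b": [("c", 0.5)]}
rerouted = {"a": [("b", 1.5)], "b": [("c", 0.5)]}  # detour around a defect

before_ns = critical_path_ns(node_delay, original)
after_ns = critical_path_ns(node_delay, rerouted)
```

In this toy case the detour lengthens the critical path from 4.0 ns to 5.0 ns, so a system with an adjustable clock would need to slow down accordingly to stay safe.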
                                      Timing and Connectivity

                          [Chart: timing versus connectivity; x-axis LUTs Reached, 0 to 1000; Virtex-II curve labeled]

                                       Future Work
                 Expand DT / FT RTP Core library
                         — Permits transparent design of DT / FT
                         — Easily converts existing designs to DT / FT
                 Move from demos to real-world applications
                 Expand on-line testing to characterization for
                  speed, temperature, and other parameters
                 Explore architectural modifications to aid in
                  software-based defect / fault tolerance
                    RTR enables high-speed, in-system FPGA
                     device test and defect isolation
                    RTR enables fast circuit generation in the
                     presence of defects
                    An integrated, on-line Fault Tolerant FPGA
                     design tool using RTR demonstrated with

                Thanks to the JBits development team at both
                  Xilinx and Virginia Tech.
                Finally, thanks to DARPA Adaptive Computing
                  Systems (ACS) grant DABT63-99-3-0004 for
                  support of this work.
