IEEE TRANSACTIONS ON COMPUTERS, VOL. 55, NO. 6, JUNE 2006, p. 732




A New Reliability-Oriented Place and Route Algorithm for SRAM-Based FPGAs
Luca Sterpone, Student Member, IEEE, and Massimo Violante, Member, IEEE

Abstract—The very high integration levels reached by VLSI technologies for SRAM-based Field Programmable Gate Arrays (FPGAs) lead to a high occurrence rate of transient faults induced by Single Event Upsets (SEUs) in the FPGAs' configuration memory. The configuration memory defines which circuit an SRAM-based FPGA implements, so any modification induced by an SEU may dramatically change the implemented circuit. When such devices are used in safety-critical applications, fault-tolerant techniques are needed to mitigate the effects of SEUs in the FPGAs' configuration memory. In this paper, we analyze the effects induced by SEUs in the configuration memory of SRAM-based FPGAs. The analysis shows that SEUs in the FPGA's configuration memory are particularly critical, because they can escape well-known fault-masking techniques such as Triple Modular Redundancy (TMR). We then present a reliability-oriented place and route algorithm that, coupled with TMR, effectively mitigates the effects of the considered faults. Extensive fault-injection experiments demonstrate the effectiveness of the new algorithm: the capability of tolerating SEU effects in the FPGA's configuration memory increases by up to 85 times with respect to a standard TMR design technique.

       Index Terms—FPGA, transient fault injection, reliability, place and route.

The authors are with the Dipartimento di Automatica e Informatica, Politecnico di Torino, c.so Duca degli Abruzzi, 10129, Torino, Italy. E-mail: {luca.sterpone, massimo.violante}@polito.it.
Manuscript received 13 Apr. 2005; revised 14 Oct. 2005; accepted 1 Feb. 2006; published online 21 Apr. 2006.
For information on obtaining reprints of this article, please send e-mail to: tc@computer.org, and reference IEEECS Log Number TC-0108-0405.
0018-9340/06/$20.00 © 2006 IEEE. Published by the IEEE Computer Society.

1 INTRODUCTION

Electronic devices are sensitive to radiation, which is present both in the space environment and at ground level. The continuous evolution of manufacturing technologies makes Integrated Circuits (ICs) even more sensitive to radiation effects. Device shrinking, coupled with voltage scaling and high operating frequencies, significantly reduces noise margins. Reduced noise margins make ICs more sensitive to radiation and to other phenomena that provoke transient faults, such as crosstalk or internal noise sources.

In the last decade, new manufacturing technologies made the development of SRAM-based FPGAs feasible. These devices became very popular because they can implement complex circuits with a very short development time. Today, manufacturers are producing very complex and resource-rich FPGAs. State-of-the-art SRAM-based FPGAs embed megabits of RAM modules and plenty of configurable logic and routing resources, which make it feasible to implement circuits composed of millions of gates. SRAM-based FPGAs are used in different applications, such as signal processing, prototyping, and networking, and wherever reconfiguration capabilities are important.

The architecture of SRAM-based FPGAs is composed of a fixed number of routing resources (wires and programmable switches), memory modules, and logic resources (i.e., look-up tables or LUTs, and flip-flops or FFs). All these components are programmed by downloading a proper bitstream into an on-chip configuration memory. This gives the FPGA the capability of implementing nearly any kind of digital circuit on the same chip. In SRAM-based FPGAs, both the combinational and the sequential logic are controlled by many customizable SRAM cells that are extremely sensitive to radiation, which may cause Single Event Upsets [1], [2].

If an upset affects the combinational logic in the FPGA, it provokes a bit flip either in one of the LUT cells or in the cells that control the routing. This upset has a persistent effect that may propagate to other parts of the circuit, because the implemented hardware itself is modified. Such an upset can be corrected only at the next load of the configuration bitstream (which is performed often in some critical space applications), and its effect may still remain in the circuit until the next reset is performed.

When an upset affects the user sequential logic, its effect may be either transient or persistent. It is transient if the flip-flop's next load corrects it and the effect has not propagated to other parts of the circuit. It is persistent if the effect has propagated to other parts of the circuit. For instance, a counter affected by an SEU cannot return to its original counting sequence until it undergoes a reset; in this case, the SEU has a more persistent effect on the implemented user circuit.

SEUs may also affect the configuration control logic registers that are used while the bitstream is downloaded into the configuration memory. The experimental analysis based on heavy-ion beam testing described in [3] shows the criticalities of such registers. It also demonstrates that their sensitivity to SEUs is several orders of magnitude lower than that of the configuration memory.

SEUs may also alter the content of the half-latch structures used to generate the constant values “0” or “1.” In our work, we assumed that this problem is fixed according to the work described in [4].
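
The following Python sketch makes the distinction between the two kinds of upset discussed in this section concrete. It is a simplified model under stated assumptions, not the authors' fault-injection environment, and the helper names (`lut_eval`, `flip_bit`, `run_counter`) are hypothetical.

- The LUT is assumed to have 4 inputs, with its 16 configuration bits stored as one integer, where bit i holds the output for input pattern i.
- An SEU is modeled simply as the inversion of one stored bit.

Under this model, a flipped LUT configuration bit changes the implemented function on every later evaluation until the bitstream is reloaded. By contrast, an upset in one flip-flop of a counter shifts the counting sequence until the next reset.

```python
from itertools import product
from typing import List, Optional, Tuple

K = 4  # assumed LUT width: a 4-input LUT has 2**K = 16 configuration bits


def lut_eval(config: int, inputs: Tuple[int, ...]) -> int:
    """Evaluate a K-input LUT; bit i of `config` is the output for input pattern i."""
    index = sum(bit << pos for pos, bit in enumerate(inputs))
    return (config >> index) & 1


def flip_bit(config: int, bit: int) -> int:
    """Model an SEU in the configuration memory as the inversion of one stored bit."""
    return config ^ (1 << bit)


# Configure the LUT as a 4-input AND: only input pattern 15 (all ones) outputs 1.
AND4 = 1 << 15
corrupted = flip_bit(AND4, 3)  # hypothetical SEU on configuration bit 3

# The corrupted LUT now answers wrongly for input pattern 3, on every evaluation:
# the implemented function itself has changed.
wrong_patterns = [p for p in product((0, 1), repeat=K)
                  if lut_eval(AND4, p) != lut_eval(corrupted, p)]


def run_counter(steps: int, upset_at: Optional[int] = None,
                reset_at: Optional[int] = None, width: int = 4) -> List[int]:
    """Trace a modulo-2**width counter. Optionally flip its most significant
    flip-flop at step `upset_at` (an SEU in user sequential logic) and reset
    the counter at step `reset_at`."""
    state, trace = 0, []
    for t in range(steps):
        if t == reset_at:
            state = 0
        if t == upset_at:
            state ^= 1 << (width - 1)
        trace.append(state)
        state = (state + 1) % (1 << width)
    return trace


if __name__ == "__main__":
    print(wrong_patterns)                          # [(1, 1, 0, 0)]
    print(run_counter(6))                          # [0, 1, 2, 3, 4, 5]
    print(run_counter(6, upset_at=2))              # [0, 1, 10, 11, 12, 13]
    print(run_counter(6, upset_at=2, reset_at=4))  # [0, 1, 10, 11, 0, 1]
```

With `upset_at=2`, the trace stays offset by 8 from the fault-free sequence for as long as the simulation runs. Only the reset at step 4 brings it back to a known state. The LUT case has no recovery inside the model, which mirrors the need to reload the configuration bitstream.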


Several fault tolerance methods have been proposed to mitigate the effects of SEUs in the configuration memory of FPGAs. They can be divided into two main categories:

- reconfiguration-based methods, which aim to restore the proper values of the configuration bits as soon as possible after an SEU has occurred [5];
- redundancy-based methods, which aim to mask the propagation of an SEU's effects to the circuit's outputs [6], [7], [8].

Fault masking is usually achieved through Triple Modular Redundancy (TMR). In TMR, three identical replicas of the same circuit work in parallel, and their outputs are compared and voted through a majority voter. TMR is appealing for hardening designs implemented on SRAM-based FPGAs because memory elements, routing resources, and logic resources are all susceptible to SEUs, so redundancy must be adopted for all of them.

The resources most likely to be affected by SEUs are those controlling the routing: about 90 percent of the configuration memory bits are devoted to storing information about routing resources. Previous works, essentially based on a simulation tool, have experimentally tested the capability of TMR to tolerate SEUs [9]. To understand the criticalities of TMR, we performed a detailed analysis of FPGA resources [10], [11], followed by extensive fault-injection experiments [12].

We observed that a single SEU can produce multiple errors when it hits the portion of the FPGA's configuration memory that stores information about the routing resources. Moreover, we identified faulty behaviors produced when an SEU hits either a programmed memory bit or a nonprogrammed memory bit; a nonprogrammed bit may have side effects on the resources configured by the programmed ones. As a result, TMR only partially mitigates the effects of SEUs in routing resources. This phenomenon depends on many factors:

- the architecture of the adopted FPGA family;
- the organization of the configuration memory;
- the application implemented on the FPGA device;
- the configuration memory bit affected by the SEU.

Redundancy-based techniques alone are therefore not sufficient to ensure complete reliability. In our investigations, we considered several benchmark circuits designed according to the TMR architecture. About 10 percent of the SEUs hitting the routing portion of the configuration memory produce multiple errors that TMR cannot mask [11]. As shown in [13], a clever selection of the TMR architecture reduces the number of escaped SEUs, but it cannot reduce them to zero.

To identify the reasons that limit the effectiveness of TMR, we systematically analyzed the resources of the FPGA family taken as a case study, the Xilinx Virtex family. For each FPGA resource, we identified all its possible configurations, independently of the circuit mapped on the FPGA. For example, for each programmable interconnection point, we identified the configurations that the place and route tool can use to implement any given circuit. This analysis revealed critical situations, in which an SEU hitting the configuration memory modifies the configuration of two or more resources. As a result, we obtained a theoretical explanation of the criticality observed within circuits designed according to TMR.

After analyzing the effects of SEUs in the configuration memory of FPGAs, this paper presents RoRA, a reliability-oriented place and route algorithm. We developed RoRA to implement dependable circuits based on TMR architectures on SRAM-based FPGAs. RoRA places and routes the logic functions and signals of a design so that the number of configuration-memory SEUs that may cause FPGA misbehaviors is drastically reduced with respect to a common redundancy-based approach such as TMR. For the considered benchmark circuits, the capability of tolerating SEU effects in the FPGA's configuration memory increases by up to 85 times with respect to a TMR approach.

This level of reliability has a cost in computational time and circuit speed: we observed a reduction of about 22 percent in circuit speed. On the other hand, RoRA introduces no area overhead. The major advantage of RoRA is that it is transparent to designers, who can trade off fault tolerance against area and time overhead.

The rest of this paper is organized as follows:

- Section 2 summarizes the methods already proposed in the literature to mitigate the effects of SEUs in SRAM-based FPGAs.
- Section 3 introduces the abstract model we developed to describe the architecture of an FPGA.
- Section 4 describes how SEUs modify circuits implemented on SRAM-based FPGAs and how their effects can be modeled by means of our abstract model.
- Section 5 proposes the reliability-oriented place and route algorithm we developed.
- Section 6 shows some experimental results.
- Section 7 presents our conclusions.

2 PREVIOUS WORKS

Several SEU mitigation techniques have been proposed in the past. They fall into two categories: reconfiguration-based techniques, which correct fault effects, and redundancy-based techniques, which mask fault effects.

2.1 Reconfiguration-Based Techniques

The simplest approach to correcting SEUs in the FPGAs' configuration memory is known as Scrubbing, which periodically rewrites the configuration memory [14]. A scrubbing system introduces a limited overhead: the circuit needed to control the bitstream loading process and the memory for storing an error-free bitstream. The system also needs a mechanism to control how often scrubbing takes place. The frequency of scrubbing operations is called the scrub rate. It is determined from the expected SEU rate, i.e., from a figure predicting how often an SEU may appear in the FPGA configuration memory.

An improvement to Scrubbing exploits the partial reconfiguration capability of the latest generation of SRAM-based FPGAs, which allow reconfiguring only a user-selected portion of the
configuration memory (known as a frame) while leaving the remainder untouched [5]. This technique uses a Readback process that reads one frame at a time and compares it with the expected frame, which is stored in an error-free off-chip memory. Another common way to detect errors through Readback is to compute a CRC on each frame and store only the check word rather than the entire frame of configuration data [5].

When an SEU is detected, only the faulty frame is rewritten. Readback is normally transparent to the circuit the FPGA implements, which keeps operating normally while Readback is running. The presence of SEUs is thus checked online. The FPGA is set offline only for the time needed to rewrite the faulty configuration-memory frame, so the circuit's normal activity stops for a shorter period than with Scrubbing. In recent SRAM-based FPGA devices, such as Xilinx Virtex FPGAs, the configuration data can be rewritten without putting the device offline, which makes online and transparent fault correction possible.

2.2 Redundancy-Based Techniques

The techniques in this section use additional hardware components or additional computation time. Their purpose is to detect SEUs that modify the expected circuit operations and/or to mask the propagation of SEUs to the circuit's outputs. These techniques do not remove SEUs from the faulty configuration memory; they only mitigate their effects. SEUs can be removed from the configuration memory with the techniques presented in the previous section.

Fault detection can be achieved by duplicating the circuit the FPGA implements. The outputs of the two replicas are continuously compared, and an alarm signal is raised as soon as a mismatch is found [14]. This solution is fairly simple and cost-effective, but it cannot mask the SEU's effects.

When fault masking is mandatory, designers may resort to Triple Modular Redundancy (TMR). The basic idea of TMR is to harden a circuit against SEUs by implementing three copies of it and placing a majority voter on the outputs of the replicas. In technologies such as ASICs, TMR is generally limited to protecting the memory elements, because combinational logic and interconnections are less sensitive to SEUs. FPGA configuration memory changes this picture: a modification in the configuration memory may affect every FPGA resource, including routing resources implementing interconnections, combinational resources, sequential resources, and I/O logic. Three copies of the whole circuit, including the I/O logic, must therefore be implemented to harden it against SEUs [14].

The optimal implementation of TMR inside SRAM-based FPGAs depends on the type of circuit the FPGA implements. As described in [14], the logic may be grouped into four types of structure:

- throughput logic;
- state-machine logic;
- I/O logic;
- special features (embedded RAM modules, DLLs, etc.).

Throughput logic is a logic circuit of any size or functionality, synchronous or asynchronous, in which the entire logic path flows from the inputs to the outputs of the module without ever forming a loop. The TMR architecture for a throughput-logic module M is implemented as shown in Fig. 1.

Fig. 1. TMR architecture for throughput logic.

Three copies of M are connected to a majority voter V, which computes the output of the throughput logic. To prevent common-mode failures, the inputs feeding the throughput logic must be replicated, too. Consequently, when M is fed directly from I/O pins, TMR requires tripling the circuit's I/O pins.

State-machine logic is, by definition, state dependent, so the TMR voting must be performed inside the module rather than outside it. Applying TMR to a state machine therefore consists of tripling all circuits and inserting a majority voter in each of the replicated feedback paths. Using three redundant majority voters ensures that the voters themselves are not single points of failure, as shown in Fig. 2.

Fig. 2. TMR scheme for state-machine logic.

Fig. 3. TMR scheme for I/O logic.

Hardening the I/O logic through TMR severely increases the number of required I/O pins. It can be used only when there are enough I/O resources to triple all the inputs and outputs of the design. As illustrated in Fig. 3, each redundant module of a design that uses FPGA inputs should have its own set of inputs. Thus, if one input is
affected by an SEU, only one module of the TMR architecture is affected.

Most of any logic design can be realized with look-up tables, flip-flops, and routing resources, which can be hardened against configuration-memory SEUs with the methods outlined above. Other special FPGA resources, however, allow for more efficient and higher-performance circuit implementations. These include block RAM, LUT RAM, shift registers, and arithmetic cores. Each of these features comes with particular recommendations that must be followed to guarantee an accurate TMR architecture. These recommendations are beyond the scope of this paper, and we refer interested readers to [5], [14].

Other methodologies for implementing redundant architectures on SRAM-based FPGAs are also available. One of them performs all mitigations at the description-language level, providing a functional TMR methodology [8]. In this methodology, interconnections and registers are tripled, and internal voters are used before and after each register in the design. Its advantage is that it can be applied to any type of FPGA.

Another approach hardens a circuit against SEUs by applying TMR selectively (STMR) [16]. It extends the basic TMR technique by first identifying the SEU-sensitive gates of a circuit and then applying TMR to those gates only. Replicating only the most sensitive portions of a circuit saves area. However, the approach needs many majority voters, because each SEU-sensitive circuit portion requires its own voter.

To reduce both the pin count and the number of voters required by TMR, Lima et al. proposed a technique based on time and hardware redundancy for hardening combinational logic [6], [7]. The technique combines duplication with comparison (DWC) with a concurrent error detection (CED) machine. The CED machine is based on time redundancy and works as a self-checking block. DWC detects faults in the system, and CED detects which blocks are fault-free. The technique aims to reduce the number of I/O pads and the power dissipation. However, it is applied to a high-level description of the circuit; if the resulting components are not properly placed and routed on the FPGA, they may still suffer from the multiple effects induced by SEUs in the FPGA's configuration memory.

Addressing these multiple effects therefore requires a careful placement and routing of the design. To attack the problem, we abstracted the physical characteristics of FPGAs into a generic FPGA model.

3 PRELIMINARIES OF SRAM-BASED FPGAS ARCHITECTURE

To describe the general characteristics of SRAM-based FPGAs, we developed a generic FPGA model. The model lets us focus only on the components affected by the multiple faults that SEUs induce. On these components, the multiple effects of an SEU persist until the corrupted bitstream is refreshed by downloading a new one. Place and route algorithms must therefore be enhanced to introduce redundancies that are resilient to multiple effects as well.

3.1 Generic SRAM-Based FPGA Model

A Field Programmable Gate Array consists of an array of logic blocks that can be selectively interconnected to implement different designs. An FPGA logic block can typically implement many different combinational and sequential logic functions. Commercial FPGAs today use logic blocks based on one or more of the following:

- transistor pairs;
- basic small gates, such as two-input NANDs or exclusive ORs;
- multiplexers;
- look-up tables (LUTs);
- wide-fanin AND-OR structures.

An FPGA routing architecture includes wire segments of varying length that can be interconnected through electrically programmable switches. The distribution of wire-segment lengths directly affects the density and performance of an FPGA.

The SRAM-based FPGA generic model used in this work is shown in Fig. 4.

Fig. 4. Generic FPGA architecture model.

The model is common to the architectures of several families of SRAM-based FPGAs [17], [18]. It consists of three kinds of resources: wiring segments, logic blocks, and switch boxes.

Fig. 5. Simple FPGA logic block.

Wiring segments are pieces of wire that transfer information among logic blocks. They are organized in a horizontal plane, traversing the FPGA from east to west, and in a vertical plane, traversing the FPGA from north to south. Together with switch boxes, wiring segments deliver information between any two locations inside the FPGA. Logic blocks contain the combinational and sequential logic required to implement the user circuit, which is defined by writing proper bit patterns into the FPGA's configuration memory. Fig. 5 shows an example of a simple logic block, which contains a look-up table (LUT) to implement combinational functions, a flip-flop (FF) to
implement memory elements, and two multiplexers (MUX) needed for implementing different signal forwarding strategies.

Fig. 6. FPGA routing graph.

Fig. 7. Modeling of an FPGA implementing a circuit by means of the routing graph.

Each logic block has a number of input and output signals connected to adjacent switch boxes and logic blocks through wiring segments. The SRAM programming technology uses static RAM cells to control pass gates or multiplexers.
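To make concrete how bit patterns in the configuration memory define what a logic block computes, the following minimal Python sketch models a k-input LUT as an array of 2^k SRAM bits addressed by the input signals. The class and method names are hypothetical, the bit ordering (input 0 as the least significant address bit) is an assumption, and the FF and MUXes of the logic block are ignored. Flipping a single stored bit, as an SEU would, silently turns an AND gate into an XNOR gate, which is the LUT error category discussed in Section 4.1.

```python
from typing import List


class LUT:
    """Hypothetical model of a k-input look-up table whose truth table is
    stored in 2**k configuration-memory (SRAM) bits."""

    def __init__(self, k: int, config_bits: List[int]) -> None:
        if len(config_bits) != 2 ** k:
            raise ValueError("a k-input LUT needs exactly 2**k configuration bits")
        self.k = k
        self.bits = list(config_bits)  # copy: the LUT's "configuration memory"

    def evaluate(self, inputs: List[int]) -> int:
        # The input vector addresses one configuration bit (input 0 is the LSB).
        index = sum(bit << i for i, bit in enumerate(inputs))
        return self.bits[index]

    def flip(self, index: int) -> None:
        # Model an SEU: the stored configuration bit is inverted.
        self.bits[index] ^= 1

    def truth_table(self) -> List[int]:
        # Output for every input combination, in address order.
        return [self.evaluate([(n >> i) & 1 for i in range(self.k)])
                for n in range(2 ** self.k)]


# A 2-input LUT configured as AND: only address 3, i.e. inputs (1, 1), stores 1.
lut = LUT(2, [0, 0, 0, 1])
print(lut.truth_table())  # [0, 0, 0, 1]
lut.flip(0)               # upset the bit addressed by inputs (0, 0)
print(lut.truth_table())  # [1, 0, 0, 1], i.e. XNOR instead of AND
```

Because the function lives entirely in the stored bits, no logic is damaged; only the configuration changes, and the change persists until the configuration memory is rewritten.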
The programmable interconnection network consists of wiring segments that can be connected or disconnected by several programmable interconnect points (PIPs). The PIPs are organized to form switch matrices and are located inside switch boxes, which are controlled by the FPGA's configuration memory. PIPs (also called routing segments) provide configurable connections between pairs of wiring segments. The basic PIP structure consists of a pass transistor controlled by a configuration memory bit. There are several types of PIPs: cross-point PIPs, which connect wire segments located in disjoint planes (one in the horizontal plane and one in the vertical plane); break-point PIPs, which connect wire segments in the same plane; decoded and nondecoded multiplexer (MUX) PIPs; and compound PIPs, which consist of a combination of n cross-point PIPs and m break-point PIPs, each controlled separately by groups of configuration memory bits [19]. Decoded MUX PIPs are groups of 2^k cross-point PIPs sharing common output wire segments and controlled by k configuration memory bits. Conversely, nondecoded MUX PIPs consist of k wire segments controlled by k configuration bits.

3.2 FPGA Routing Graph

We developed a model that abstracts most of the details of SRAM-based FPGAs. It is general enough to describe any FPGA architecture, and it conveys only the information meaningful for our dependability-oriented analysis. Indeed, for the sake of our work, it is important to capture information about which logic blocks are used by a circuit mapped onto an FPGA, as well as all the information about the interconnections between used logic blocks (i.e., how wiring segments and switch matrices are configured for implementing a circuit). Conversely, it is not important to know which function (combinational or sequential) a logic block implements.

The resources in an SRAM-based FPGA that are used to implement a circuit can be described by resorting to a routing graph, where the graph's vertices model logic blocks and switch boxes while the graph's edges model wiring segments. As shown in Fig. 6, the routing graph has two types of vertices: logic vertices, which model the FPGA's logic blocks, and routing vertices, which model the input/output ports of each switch box. For each switch box having I inputs and O outputs, the routing graph has I+O routing vertices. Moreover, the routing graph has two types of edges: routing edges, which model the FPGA's PIPs as edges between two different routing vertices, and wiring edges, which model the FPGA's wiring segments as edges between logic vertices and routing vertices.

In the graph model, an FPGA's switch box is described as a set of routing edges forming a structure known as a Universal Switch Module (USM) [20]. The number of vertices and edges modeling switch boxes and logic blocks depends on the selected FPGA architecture.

According to our model, a logic signal connecting two logic blocks in the circuit the FPGA implements is modeled by the routing graph as a path that may span different wiring edges and routing edges. As illustrated in Fig. 7, edges and vertices are colored to indicate that the corresponding FPGA resource is used to implement a circuit. In case the FPGA implements different circuits or different replicas of the same circuit, different colors are used to mark the edges and vertices of each circuit or replica.

Moreover, a direction is associated with each edge to describe the direction of the information flow. The proposed graph model is very flexible and can be adopted to describe any type of FPGA architecture.

4 RADIATION EFFECTS ON SRAM-BASED FPGAS

The past 30 years have seen the discovery that electronic circuits are sensitive to transient effects such as Single Event Upsets (SEUs) provoked by ionizing radiation [21]. Since the discovery of SEUs at aircraft altitudes, researchers have made significant efforts to monitor the environment. The space and the earth environments contain various kinds of ionizing radiation, generated by natural phenomena such as solar activity or by man-made sources, that interact with silicon atoms. Whereas, at ground level, neutrons and alpha particles are

the most frequent causes of SEUs, in a space environment they are protons and heavy ions. When a particle hits the surface of a silicon area, it loses its energy through the production of free electron-hole pairs, resulting in a dense ionized track in the struck region [22]. Interestingly, when the struck silicon area implements a static memory cell, the transient pulse may induce permanent changes: it can indeed invert the stored value. In SRAM-based FPGAs, transient faults originating in the FPGA's configuration memory have dramatic effects, since the circuits the FPGAs implement are totally controlled by the content of the configuration memory, which is composed of static RAM cells [23], [24]. In this section, we describe in detail the effects of SEUs within the configuration memory of SRAM-based FPGAs; we then describe these effects through the graph model presented in the previous section.

4.1 SEU Effects on FPGA's Configuration Memory

SRAM-based FPGAs suffer from radiation as other semiconductor devices do. Designers and users have to consider these radiation effects before including an SRAM-based FPGA in a space application. SRAM-based FPGAs, like other devices that contain several arrays of memory cells, are extremely sensitive to SEUs due to the large amount of memory within a relatively small silicon area.

SRAM-based FPGAs contain a large number of memory cells implementing the configuration memory, and these cells are sensitive to SEUs. The SEU upset rate depends on the kind of radiation environment in which the device will be used. As an example, for the Cibola flight experiment, which uses an SRAM-based Virtex V1000 FPGA containing more than six million bits, worst-case SEU upset rates on an average orbit have been calculated to range from 0.13 SEUs per hour under a quiet sun to 4.2 SEUs per hour under a peak upset rate [25]. The effects induced by SEUs on SRAM-based FPGAs have recently been investigated through radiation experiments [26], [27], [28]. More recently, an analysis that combines the results of radiation testing with those obtained by analyzing the meaning of every bit in the FPGA's configuration was presented in [10].

Although SEUs are transient by nature, when they originate in the configuration memory their effects are permanent, since SEUs remain latched until the configuration memory is rewritten with new configuration data. The errors produced by SEUs in the FPGA's configuration memory can be classified into two categories: errors that affect logic blocks and errors that affect switch boxes.

As far as logic-block errors are concerned, several different phenomena may be observed, depending on which resource of the logic block the SEU modified:

   1. LUT error. The SEU modified one bit of an LUT, thus changing the combinational function it implements.
   2. MUX error. The SEU modified the configuration of a MUX in the logic block; as a result, signals are not correctly forwarded inside the logic block.
   3. FF error. The SEU modified the configuration of an FF, for example, changing the polarity of the reset line or that of the clock line.

To model faulty logic blocks in the routing graph described in Section 3.2, we use the color black to mark each vertex corresponding to a faulty logic block.

As far as switch boxes are concerned, different phenomena are possible. Although an SEU affecting a switch box modifies the configuration of one PIP, both single and multiple effects can arise.

Single effects happen when the modifications induced by the SEU alter only the affected PIP. In this case, one situation may happen, which we call open: the SEU changes the configuration of the affected PIP in such a way that the existing connection between two routing segments is opened. In the routing graph, we model such a situation by deleting the routing edge corresponding to the PIP that connects the two routing vertices.

Fig. 8. Possible multiple effects induced by one SEU.

To describe the multiple effects in terms of modifications to the routing graph, let us consider the two routing edges A_S/A_D and B_S/B_D connecting the routing vertices A_S, A_D, B_S, and B_D, as shown in Fig. 8a. We identified the following modifications that could be introduced by an SEU:

   1. Short between A_S/A_D and B_S/B_D. As shown in Fig. 8b, a new routing edge is added to the graph that connects one end of A to one end of B. This effect can happen if A_S/A_D and B_S/B_D belong to the same switch box and the SEU enables the nondecoded or decoded PIP that connects B with A.
   2. Open, which corresponds to the deletion of both routing edges A_S/A_D and B_S/B_D, as shown in Fig. 8c. This situation may happen if a decoded PIP controls both A_S/A_D and B_S/B_D.
   3. Open/Short, which corresponds to the deletion of either the routing edge A_S/A_D or the routing edge B_S/B_D and to the addition of the routing edge A_S/B_D or B_S/A_D, as shown in Fig. 8d. This situation may happen if a decoded PIP controls both A_S/A_D and B_S/B_D.

The Short effect, as shown in Fig. 8b, may happen if two nets are routed through the same switch box and a new edge is added between them. This kind of faulty effect happens when a cross-point PIP, which is nonbuffered and has bidirectional capability, links two wire segments located in disjoint planes. Conversely, the Open and the Open/Short effects, as shown in Fig. 8c and Fig. 8d, may happen if two nets are routed using decoded PIPs.

4.2 Constraints for Achieving Fault Tolerance

As previously described, one SEU affecting the FPGA's configuration memory may provoke multiple errors by

changing the configuration of routing resources. As a result, hardening techniques developed according to the single-fault assumption are not adequate to cope with the multiple effects of SEUs in the configuration memory controlling routing resources. In our analysis, we considered the TMR architecture and observed that many situations exist where one SEU provokes multiple errors in such a way that TMR is no longer able to mask the SEU's effects [11]. As an example of this situation, let us refer to Fig. 8a and assume that A_S/A_D and B_S/B_D are two routing edges belonging to two different replicas of a circuit hardened according to TMR. In this case, each SEU resulting in the erroneous configurations reported in Fig. 8b, Fig. 8c, and Fig. 8d violates the single-fault assumption.

This problem is particularly critical since 90 percent of the bits of the FPGA's configuration memory are devoted to programming the routing resources. While one upset may modify more than one routing edge, this becomes a problem only when two routing edges from two different TMR replicas (i.e., domains) are affected.

To estimate the magnitude of the problem, we considered one switch box and, for a given pair of routing edges (implemented by two PIPs of the same switch box) that belong to two different TMR replicas, we identified all the faulty configurations that are possible given the routing architecture of a selected FPGA device (in our case, the Xilinx Virtex). For each faulty configuration, we computed the corresponding image of the configuration memory, i.e., the faulty bitstream. We then compared the faulty bitstream with the reference one and observed that they differ by one bit only. This means that one SEU may provoke multiple effects. We performed this procedure for all the faulty cases (i.e., Short, Open, and Open/Short) and calculated that 72 percent of all the configuration memory bits controlling the considered switch box could produce critical situations if used for routing different TMR replicas. Note that the switch boxes within the FPGA are all identical and, therefore, the above considerations are general. An example of this analysis for the Short fault effect is illustrated in Fig. 9.

Fig. 9. Possible short configuration for a given pair of routing edges.

As a result, unless suitable countermeasures are developed, the TMR approach is no longer suitable for achieving fault tolerance.

Following the analysis we performed on FPGA architectures and on the organization of the FPGA's configuration memory, we defined the following constraints that must be enforced by the place and route algorithm in order to develop circuit implementations based on TMR that are resilient to multiple errors:

   1. All the circuit modules and connections must be replicated three times.
   2. The outputs of the three circuit replicas must be voted according to the TMR principle.
   3. The elements of the resulting TMR architecture (logic functions and connections among them) must be placed and routed in such a way that, given the corresponding routing graph, no edge added to (or deleted from) the graph can provoke any fault belonging to the following categories:
      • Short between different connections belonging to different circuit replicas.
      • Open affecting different connections belonging to different circuit replicas.

5 THE PROPOSED RELIABILITY-ORIENTED PLACE AND ROUTE ALGORITHM: RORA

In general, the commonly used design flow to map designs onto an SRAM-based FPGA consists of three phases. In the first phase, a synthesizer transforms a circuit model coded in a hardware description language into an RTL design. In the second phase, a technology mapper transforms the RTL design into a gate-level model composed of look-up tables (LUTs) and flip-flops (FFs) and binds them to the FPGA's resources (producing the technology-mapped design). In the third phase, the technology-mapped design is implemented on the FPGA by the place and route algorithm.

The problem of how to implement a circuit on an FPGA device is divided into two subproblems: placement and routing. The main reason behind this decomposition is to reduce the problem complexity. Our proposed reliability-oriented place and route algorithm, called RoRA, first reads a technology-mapped design. Then, it performs a reliability-oriented placement of each logic function and, finally, it routes the signals between functions in such a way that multiple errors affecting two different connections are not possible.

Fig. 10. The flow of the proposed Reliability-Oriented Place and Route Algorithm RoRA.

The algorithm we developed is described in Fig. 10, where the placement and routing steps are shown in a

C-like pseudocode. Our proposed RoRA Placement algorithm performs a robust placement, which implements the TMR principle, by executing four distinct functions:

   1. generate_functions_replicas() first reads the design description produced after technology mapping and identifies the logic functions in the design. Second, it generates three replicas of the logic functions belonging to the original design. Let F be the set of the original design's logic functions: at the end of this step, the three sets F1, F2, and F3 are produced.
   2. generate_majority_voter() analyzes the three logic function sets F1, F2, and F3 and generates a logic function set F4 that performs the majority voting between them.
   3. generate_partitions() partitions the routing graph's vertices into four nonoverlapping sets, where each set Si (i = 1, 2, 3, 4) has enough logic vertices to contain the logic functions of the corresponding set Fi (i = 1, 2, 3, 4).
   4. Every logic function in set Fi is heuristically placed on the logic vertices in set Si, where i = 1, 2, 3, 4. This phase takes care of marking the graph by assigning each logic function to exactly one logic vertex in our routing graph.

Fig. 11. The flow of the RoRA Placement algorithm.

The RoRA placement algorithm places each logic function in Fi on the graph vertices belonging to Si, as well as the majority voter on S4. After the placement process, each set Si exclusively contains the functions of set Fi. This solution guarantees that single or multiple effects confined to one set Si do not provoke any misbehavior of the circuit. Indeed, according to our placement, only multiple effects on the boundary of two different sets, Si ≠ Sj, may generate multiple errors that affect two different replicas.

When all the logic functions are placed on the corresponding sets of logic vertices, RoRA routes the interconnections between the logic vertices. Basically, the RoRA Routing algorithm works on the routing graph we developed, and it routes each connection between two logic vertices through the shortest path it can find. During path selection, the RoRA Routing algorithm dynamically labels the graph's routing vertices in such a way that it avoids instantiating two connections that may be subject to Short effects. Each graph routing vertex (RV) is labeled as free, used, or forbidden, with the following meanings:

   1. Free: the routing vertex is not used by any connection.
   2. Used: the routing vertex is already used by a connection.
   3. Forbidden: a routing vertex RV is forbidden if and only if:
      • it belongs to set Si (RV ∈ Si), and
      • at least one routing edge or one wiring edge exists between RV and another vertex RV' belonging to Sj (RV' ∈ Sj), where i ≠ j.
      If RV is added to the circuit and an SEU affects the routing resources in such a way that both RV and RV' are affected, TMR no longer works as expected. The Forbidden Vertices Sets (FVSs), which are empty at the beginning of RoRA routing, contain the vertices marked as forbidden and belonging to the corresponding graph routing vertex set Si.

RoRA performs the routing of each net by considering all the graph's vertices labeled as free, and it progressively updates the FVSs, adding the vertices marked as forbidden.

As soon as the net is routed and the marking of the graph has been updated (i.e., the vertices in the routing graph and the associated edges have been marked as used by the circuit implementation), the update() function is used to modify the i-th set of forbidden vertices (FVSi), which is empty at the beginning of RoRA routing. The implementation details of the RoRA Routing algorithm are described in Section 5.2.

5.1 RoRA Placement Algorithm

The algorithm we developed starts by reading a description of the circuit, which consists of unplaced logic blocks and a set of nets. While standard placement techniques are sufficient if the application mapped on the FPGA does not require any particular reliability constraints, special attention must be paid in the FPGA placement algorithm for safety-critical applications where high reliability is a mandatory requirement.

The RoRA Placement algorithm, which is described in Fig. 11 as C-like pseudocode, performs the placement of a logic function by using the concept of a window. A window is defined as a rectangular portion of the logic vertices belonging to the routing graph space. In more detail, the RoRA Placement algorithm uses two types of windows: the place window PW and the nearby window W. The place window PW defines a rectangular space containing the logic vertices already connected to the logic vertex being placed, while the nearby window W defines the space

containing a logic vertex labeled as free and a candidate for
the placement.
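The two windows can be illustrated with a small Python sketch. It assumes the logic vertices are laid out on a cols-by-rows grid addressed by hypothetical (column, row) coordinates; PW is taken as the bounding box of the already-placed vertices connected to the function being placed, and W is grown from PW one ring of rows and columns at a time until it contains a free logic vertex of the partition currently in use. The growth rule, the grid abstraction, and the names Window, place_window, and nearby_window are illustrative assumptions, not the RoRA pseudocode of Fig. 11.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple

Coord = Tuple[int, int]  # hypothetical (column, row) position of a logic vertex


@dataclass(frozen=True)
class Window:
    """Rectangular portion of the logic-vertex grid, bounds inclusive."""
    x0: int
    y0: int
    x1: int
    y1: int

    def contains(self, v: Coord) -> bool:
        return self.x0 <= v[0] <= self.x1 and self.y0 <= v[1] <= self.y1


def place_window(connected: Iterable[Coord]) -> Window:
    """PW: smallest rectangle enclosing the already-placed logic vertices
    connected to the logic function being placed."""
    points = list(connected)
    if not points:
        raise ValueError("PW needs at least one already-placed, connected vertex")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return Window(min(xs), min(ys), max(xs), max(ys))


def nearby_window(pw: Window, free: Set[Coord], cols: int, rows: int) -> Optional[Window]:
    """W: starts from PW and (assumed growth rule) is enlarged by one ring of
    rows and columns until it holds at least one free logic vertex.
    `free` should contain only the free vertices of the partition S_i in use."""
    whole = Window(0, 0, cols - 1, rows - 1)
    w = pw
    while True:
        if any(w.contains(v) for v in free):
            return w
        if w == whole:
            return None  # no free vertex left in this partition
        w = Window(max(w.x0 - 1, 0), max(w.y0 - 1, 0),
                   min(w.x1 + 1, cols - 1), min(w.y1 + 1, rows - 1))


pw = place_window([(2, 2), (4, 3)])
print(pw)                                           # Window(x0=2, y0=2, x1=4, y1=3)
print(nearby_window(pw, {(7, 3), (0, 9)}, 10, 10))  # Window(x0=0, y0=0, x1=7, y1=6)
```

Growing W symmetrically keeps it centered on PW, so candidate vertices stay close to the logic the new function is connected to; restricting `free` to one partition mirrors the requirement that each function in Fi be placed only in Si.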
   The RoRA Placement algorithm implements different
heuristic cost functions that measure the wirelength as well
as the routability of the placement. The wirelength is based
on the Manhattan distance that defines the distance
between two points measured along axes at right angles
that include horizontal and vertical components. Minimiz-
ing the wirelength minimizes the number of routing
resources required and, thus, reduces the existence of SEU
sensitive routing resources; thus, the Manhattan distance is
minimized. However, the minimization of the Manhattan
distance does not guarantee that a signal can be routed
successfully. Not all the available routing resources can
indeed be used since some of them must be avoided to
satisfy reliability constraints. To address this problem, we
added two metric functions: local density and global                                          Fig. 12. Example of place window.
constraints, that are defined as follows:

      1. The function local_density(W) computes the number of routing resources available in the nearby window W: it returns the number of available edges that link two routing vertices labeled as free.
      2. The function global_constraints(W) computes the routing reliability constraints in the nearby window W: it returns the number of routing reliability constraints that may be generated between the routing vertices labeled as free and contained in the nearby window W.
   The local density addresses the routability of the placement. It attaches a cost to the placement that reflects the capacity of the routing resources and, thus, aims at avoiding competition among signals for insufficient routing resources. The global constraints address routability shortfalls by computing the congestion caused by the routing reliability constraints. Both metrics examine the region contained in the nearby window W and compute a cost from the number of nets and routing reliability constraints that may exist in this region.
   During each placement phase, the nearby window W generated in the routing graph is examined. This makes it easier for the RoRA routing algorithm to find a route for every signal, since the routing capability of the window W in which the signals have to be routed has already been computed during placement.
   The RoRA placement of a logic function LF on a partition set S_i is divided into two phases: preplacement and placement. During preplacement, the place window PW is generated considering the logic functions connected to LF that have already been placed on the logic vertices DLVs.
   Fig. 12 shows an example of PW generation. Suppose that a logic function LF_A is connected to the logic functions LF_B, LF_C, and LF_D, as shown in Fig. 12a, and that only LF_B and LF_D have already been placed on the logic vertices DLV_B and DLV_D. During the placement of LF_A, the place window PW is generated as described in Fig. 12b: it selects an area where a logic vertex could be used for the placement of LF_A. W is initialized as equal to PW only if PW contains at least one free logic vertex. Otherwise, W is generated by adding to PW one row or column that contains at least one free logic vertex.
   During the placing phase, the RoRA Placement algorithm executes three steps until the logic function LF is placed on a logic vertex V. First, it computes the heuristic cost functions local density and global constraints on the nearby window W and compares their values with their limits. The limits depend on the cardinality of the adopted routing graph and, thus, on the kind of FPGA architecture used. If the limits are not respected, the nearby window W is updated until the cost functions are satisfied.
   Second, a logic vertex V labeled as free is selected from the nearby window W belonging to the partition set S_i. A cost M_DLV is associated with every logic vertex DLV that is already placed on the partition S_i and connected to the logic function LF; each cost M_DLV is the Manhattan distance between that DLV and the logic vertex V that is the candidate for the placement of LF. Finally, the RoRA Placement algorithm calculates a Manhattan cost C over all the DLVs and, if C satisfies the maximum length distance, places the logic function LF on the candidate logic vertex V.

5.2 RoRA Routing Algorithm

FPGA routing is a complex combinatorial problem. The RoRA router works on the routing graph we developed and routes each connection between two logic vertices through the shortest path it can find. During path selection, RoRA dynamically labels the graph’s routing vertices so as to avoid instantiating two connections that belong to two different sets S and that may be subject to multiple effects.
   The general approach implemented in the RoRA router is a two-phase method composed of a global routing followed by a detailed routing. As shown in Fig. 13, given a source vertex SV belonging to a logic function F_i, a connection between SV and all its destination vertices DVs is computed by executing the global routing followed by the detailed routing. The global routing balances the density of all the routing structures with respect to the reliability constraints, while the detailed routing assigns specific wiring edges, routing edges, and routing vertices to the paths. The global routing is based on a Super-Routing graph architecture, composed of logic vertices and super-routing vertices linked by super edges (SE), as shown in Fig. 14.

Fig. 13. The flow of the RoRA global and detailed router algorithm.
Fig. 14. Super-Routing graph architecture.

   The Super-Routing graph is used to execute the global routing, which is performed by the function find global_route SV to DV. This function generates a global route P, consisting of a sequence of super edges and super-routing vertices that link the source logic vertex SV to the destination logic vertex DV. Because the Super-Routing graph architecture is associated with the FPGA routing graph, a global route P is decomposed into a sequence of routing vertices, wiring edges, and routing edges that connect SV to DV. Thus, the RoRA Global Routing generates a set of candidate paths from which the RoRA Detailed Routing can choose to connect SV to DV.
   To select the best global route P, the RoRA Global Routing chooses the super edges and super-routing vertices that optimize a heuristic cost function with two components: the first minimizes the length of the global route by selecting the shortest way to connect the source to the sink, while the second computes the availability of the global route from the numbers of vertices labeled as free and as forbidden in it. The availability A_f of a global route P composed of the super-routing vertices SRV_i is defined as

$$A_f(P) = \sum_{i=0} \frac{\mathrm{free}(SRV_i)}{\mathrm{avoid}(SRV_i)}, \tag{1}$$

where avoid(SRV_i) is the number of routing vertices labeled as forbidden in the Super-Routing vertex SRV_i and free(SRV_i) is the number of routing vertices labeled as free in it. The global router makes the routing problem easier, since it can estimate the routing congestion due to the already routed interconnections and to the forbidden vertices. When a global route P is selected, the RoRA Routing algorithm executes the detailed routing.
   The RoRA detailed routing algorithm is split into two phases. In the first phase, it expands each routing tree, whose root is the logic vertex corresponding to the source of the connection and whose leaves are the logic vertices corresponding to the destinations of the connection. The routing tree is expanded by choosing wiring and routing edges linked by routing vertices that are labeled as free in our routing graph and that belong to the global route selected by the RoRA Global Routing.
   The RoRA detailed routing is based on the approach developed for the Pathfinder negotiated congestion algorithm [29], [30], which relies on the construction of a routing tree; maze routing, described in [31], is commonly used for this purpose. The RoRA detailed router expands the routing tree progressively toward the leaves while staying within the routing channel selected by the global routing: starting from a tree composed of the source vertex only, new vertices are added until all the destinations of the connection have been added to the tree. The previously executed global routing saves memory and running time during the routing tree expansion, since the detailed router chooses the net paths in a limited space of solutions. The RoRA detailed router uses the routing tree construction developed for the maze routing approach, with one fundamental difference in the creation of each routing tree: during the routing tree expansion, vertices labeled as forbidden are not used. Moreover, the set of forbidden vertices is updated in the second phase of the RoRA detailed router, after the routing tree has been created.
   The detailed routing generates the routing tree by computing the function create_routing_tree(). This function builds the routing tree considering all the graph’s vertices that are not labeled as forbidden and that belong to the selected global route P.
   After the expansion, each routing tree (SV, DVs) may contain routing vertices that the modification of a single configuration memory bit could link, through a routing edge, to other routing vertices of the routing graph. The update function of the RoRA algorithm selects these routing vertices belonging to the set S_i and checks whether each of them could be linked, by changing a single configuration memory bit, to a routing tree routed on the routing graph for the set S_j, where i ≠ j. If so, the update function labels the vertex as forbidden. In this way, no routing edge can link routing vertices belonging to different sets S and, thus, no SEU affecting the configuration memory of the SRAM-based FPGA can affect more than one replica of the implemented TMR architecture.
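The three placement steps described earlier in this section (check the nearby window W, score each free candidate V by the Manhattan costs M_DLV, accept V if the total cost C is within the maximum length) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' ANSI C implementation: the initial window PW is assumed to be the bounding box of the already-placed DLVs, W is assumed to grow by one row and one column on every side, C is assumed to be the sum of the M_DLV costs, and the local density and global constraints checks are reduced to a hypothetical `window_ok` callback.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

Coord = Tuple[int, int]             # (column, row) of a logic vertex
Window = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive bounds


def manhattan(a: Coord, b: Coord) -> int:
    """Distance measured along the horizontal and vertical axes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def grow(w: Window, width: int, height: int) -> Window:
    """Assumed update of W: one extra row/column on every side, clipped to the device."""
    x0, y0, x1, y1 = w
    return (max(x0 - 1, 0), max(y0 - 1, 0),
            min(x1 + 1, width - 1), min(y1 + 1, height - 1))


def place_logic_function(
    dlvs: Iterable[Coord],      # already-placed logic vertices connected to LF
    free: Set[Coord],           # logic vertices labeled as free in partition S_i
    max_length: int,            # limit on the Manhattan cost C
    width: int,
    height: int,
    window_ok: Callable[[Window], bool] = lambda w: True,  # hypothetical density/constraint check
) -> Optional[Coord]:
    dlvs = list(dlvs)
    if not dlvs:
        raise ValueError("sketch assumes at least one already-placed DLV")
    xs, ys = zip(*dlvs)
    w: Window = (min(xs), min(ys), max(xs), max(ys))  # assumed PW: bounding box of the DLVs
    full: Window = (0, 0, width - 1, height - 1)
    while True:
        x0, y0, x1, y1 = w
        inside = [v for v in free if x0 <= v[0] <= x1 and y0 <= v[1] <= y1]
        if inside and window_ok(w):
            # C = sum of the M_DLV costs (the aggregation is an assumption)
            cost, best = min((sum(manhattan(v, d) for d in dlvs), v) for v in inside)
            if cost <= max_length:
                return best
        if w == full:
            return None  # no acceptable candidate anywhere on the device
        w = grow(w, width, height)
```

For example, with DLVs at (0, 0) and (4, 0) and a free vertex at (2, 0), the initial window already contains the candidate and its cost is 2 + 2 = 4, so `place_logic_function([(0, 0), (4, 0)], {(2, 0), (2, 3), (9, 9)}, 4, 10, 10)` returns `(2, 0)`.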
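The availability term of equation (1) and the choice among candidate global routes can be illustrated with a small Python sketch. It assumes a global route is represented as the list of its super-routing vertices, each carrying its free and forbidden counts; the paper does not state how the length and availability components are weighted, so this sketch simply prefers the shortest route (fewest super-routing vertices) and breaks ties by higher availability. The treatment of a super-routing vertex with no forbidden vertices, where equation (1) would divide by zero, is also an assumption.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SuperRoutingVertex:
    free: int   # routing vertices labeled as free inside this SRV
    avoid: int  # routing vertices labeled as forbidden inside this SRV


def availability(route: Sequence[SuperRoutingVertex]) -> float:
    """Equation (1): sum over the route's SRVs of free(SRV_i) / avoid(SRV_i)."""
    total = 0.0
    for srv in route:
        # Assumption: an SRV without forbidden vertices is treated as
        # unconstrained, i.e., it contributes infinite availability.
        total += math.inf if srv.avoid == 0 else srv.free / srv.avoid
    return total


def select_global_route(candidates: List[Sequence[SuperRoutingVertex]]):
    """Shortest candidate first; ties broken by higher availability (assumed weighting)."""
    return min(candidates, key=lambda route: (len(route), -availability(route)))
```

With two candidate routes of two super-routing vertices each, one with availability 6/2 + 4/4 = 4.0 and one with 2/1 + 8/1 = 10.0, the sketch picks the second; a three-vertex route is never preferred over either, whatever its availability.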
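The two phases of the RoRA detailed router (expand the routing tree inside the selected global route while skipping forbidden vertices, then update the forbidden labels) can be sketched with a Lee-style breadth-first maze router [31]. This is only an illustration under assumptions: the routing graph is a plain adjacency dictionary, the `one_bit_links` map (which vertices a single configuration bit flip could connect) is a hypothetical input, Pathfinder's congestion negotiation is omitted, and the forbidden label is tracked per TMR domain so that a vertex near a domain-i tree is forbidden only for domains j ≠ i, which is one reading of the update rule above.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List, Optional, Set

Vertex = Hashable
Graph = Dict[Vertex, List[Vertex]]


def wave_expand(tree: Set[Vertex], target: Vertex, graph: Graph,
                usable: Callable[[Vertex], bool]) -> Optional[List[Vertex]]:
    """Lee-style breadth-first wave from every vertex already in the tree to target."""
    parent: Dict[Vertex, Optional[Vertex]] = {v: None for v in tree}
    queue = deque(tree)
    while queue:
        v = queue.popleft()
        if v == target:
            path = []
            while v is not None:  # walk back until a vertex already in the tree
                path.append(v)
                v = parent[v]
            return path[::-1]
        for u in graph.get(v, []):
            if u not in parent and usable(u):
                parent[u] = v
                queue.append(u)
    return None


def create_routing_tree(source, sinks, domain, graph: Graph,
                        global_route: Set[Vertex], forbidden: Dict[Vertex, Set]):
    """Phase 1: grow a tree from SV to every DV, staying inside the global route."""
    def usable(u):
        # u must lie on the selected global route and not be forbidden for this domain
        return u in global_route and not (forbidden.get(u, set()) - {domain})

    tree = {source}
    for sink in sinks:
        path = wave_expand(tree, sink, graph, usable)
        if path is None:
            raise RuntimeError(f"no legal path to {sink!r} for domain {domain!r}")
        tree.update(path)
    return tree


def update_forbidden(tree, domain, one_bit_links: Graph, forbidden: Dict[Vertex, Set]):
    """Phase 2: vertices one bit flip away from the tree become forbidden for other domains."""
    for v in tree:
        for u in one_bit_links.get(v, []):
            if u not in tree:
                forbidden.setdefault(u, set()).add(domain)
    return forbidden
```

In this sketch, once domain A has routed a tree containing vertex 1 and a single bit flip could connect vertex 1 to vertex 5, vertex 5 is excluded from every later tree of domain B, so no single SEU in that bit can bridge the two replicas.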

6 EXPERIMENTAL ANALYSIS

In this section, we describe the experiments we performed to evaluate the effectiveness of the RoRA algorithm. For this purpose, we implemented a prototype of the RoRA algorithm, which accounts for about 8K lines of ANSI C code. We used the tool we developed to harden five circuits, which we mapped on Xilinx Spartan II devices.
    To evaluate the robustness of the circuits obtained through RoRA against transient faults affecting the FPGA’s configuration memory and, in particular, against faults affecting the routing resources, we used the fault injection environment we presented in [11] to perform extensive fault-injection experiments.
    We considered three purely combinational case studies implemented on a Xilinx Spartan XC2S30PQ144 [32]: an adder with two 8-bit wide operands, an adder working on two 16-bit wide operands, and a multiplier with two 8-bit wide operands. We also analyzed an elliptic filter in order to evaluate the sensitivity to SEUs in the configuration memory of SRAM-based FPGAs implementing a sequential circuit. Finally, to evaluate RoRA on a real design, we mapped an IP core implementing the Controller Area Network (CAN) protocol, which uses about 98 percent of the resources of a Spartan II XC2S200 [32].
    To evaluate the effectiveness of the algorithm we developed, we mapped the five circuits using RoRA, as well as the TMR approach (i.e., each circuit is implemented by using three identical modules performing the same task and a majority voter). In the latter case, TMR circuits are placed and routed by standard tools, which place no emphasis on enforcing dependability-oriented place and route rules.
    The characteristics of the adopted circuits are reported in Table 1, where we report the number of FPGA slices that the circuits occupy (column Area), as well as their maximum working frequencies (column Speed), for the plain, the TMR, and the RoRA versions. It is interesting to observe that, for the considered benchmarks, RoRA does not introduce any area overhead with respect to the traditional TMR solution (which is about three times larger than the plain circuit) and, in some cases, it is even less resource demanding due to a better packing of the logic resources into the slices. Conversely, when placed and routed through RoRA, the circuits become 22 percent slower on average than their TMR versions. This effect is the result of the dependability-oriented routing algorithm that RoRA implements: the shortest path is not always selected as the best solution, since it may not be acceptable from the dependability point of view.

TABLE 1
Characteristics of the Adopted Circuits

    To measure the hardness of the obtained circuits, we injected 15,000 randomly selected SEUs in the FPGA’s configuration memory bits. These bits are selected among those configuration memory bits that define the designs we implemented. Note that the selected bits may be either programmed or unprogrammed, since both kinds may be critical to the mapped design.
    The number of injected faults was selected to guarantee that the gathered results are statistically meaningful. To verify this, we repeated the experiments with 150,000 randomly selected SEUs and observed negligible modifications to the results already gathered with 15,000 faults.
    Since the voter we used is not fault-tolerant, we did not inject faults in the portion of the configuration memory that implements it. The results we obtained are reported in Table 2, where Injected Faults is the number of injected SEUs and Wrong Answer is the number of SEUs for which the faulty circuit produces outputs that differ from those of the fault-free one. To show the contribution of the different FPGA resources, we report the number of injected faults and the observed wrong answers separately for the FPGA’s CLBs and routing resources.
    During our experiments, we adopted a workload composed of 100,000 randomly generated input stimuli.

TABLE 2
Fault Injection Results

    From these results, we can observe that most of the injected faults provoke erroneous behaviors in the plain, unhardened circuits. Moreover, even when the TMR architecture is adopted, a significant percentage of the injected faults still produces a wrong answer. We carefully analyzed each fault escaping the TMR and we observed that most
of them correspond to the effects described in Section 4. In particular, the majority of faults escaping TMR are due to SEUs in the routing resources. A very limited number of faults escaping TMR do not fall into the scenario outlined in Section 4: these faults affect FPGA resources that do not depend on the implemented circuit and whose usage is independent of the place and route algorithm. For this very specific, device-dependent type of fault, different hardening strategies must be envisioned, possibly provided by the FPGA vendor.
    We observed that RoRA drastically reduces the number of SEUs producing a Wrong Answer. In particular, as far as routing resources are concerned, RoRA reduces the number of faults producing a wrong answer by three orders of magnitude, while reductions by a factor of two were observed for faults affecting CLBs. The number of routing faults is effectively reduced thanks to the ability of RoRA to generate a reliability-aware routing topology that prevents multiple errors from propagating to the outputs of different circuit domains. Although very effective, RoRA still produces circuits where a few SEUs escape and provoke circuit misbehaviors. As the reader can notice, the number of faults within the CLBs is not reduced as much as the number of routing faults. This is due to critical faults that cannot be masked through a reliability-oriented place and route algorithm alone, since they produce errors that exclusively affect those FPGA parts (such as the delivery of power or reset signals to CLBs or routing resources) that can be hardened only by using information provided by the vendor.
    In our experiments, we also measured the overhead of RoRA in terms of required FPGA routing resources. The results are reported in Table 3, where PIPs TMR and PIPs RoRA are the numbers of PIPs in the circuits obtained by the TMR approach and by RoRA, respectively. Note that RoRA uses a higher number of PIPs for each circuit: the existence of forbidden routing vertices (i.e., PIPs) in the graph forces RoRA to produce connections that are longer than those obtained in the TMR circuits, where all the PIPs are available to the router tool. However, the overhead in terms of PIPs is rewarded by a much higher degree of fault tolerance. The computation times needed by RoRA to perform the place and route process are reported in Table 4. The machine used for running RoRA was a Sun Ultra 250 equipped with 2 Gbytes of RAM and running at 400 MHz. As a reference, we also report the time needed by a commercial tool (Xilinx PAR) for placing and routing the considered circuits. As the reader can observe, the time needed by RoRA is higher than that of the commercial tool; however, the increased time for running the place and route process is rewarded by a much higher fault tolerance capability.

TABLE 3
Summary of Routing Resources Needed by TMR and RoRA Circuits’ Implementations

TABLE 4
CPU Time Needed by RoRA and Xilinx PAR

7 CONCLUSIONS

This paper presented a theoretical analysis of the effects of SEUs in the configuration memory of SRAM-based FPGA devices. The analysis showed that one SEU may have multiple effects since, depending on which bit of the configuration memory it affects, it can modify two or more FPGA resources. When the circuit the FPGA implements is designed according to TMR, one SEU may therefore escape it.
    This paper also presented a reliability-oriented place and route algorithm able to minimize the effects of SEUs affecting the configuration memory of SRAM-based FPGAs. Its effectiveness was evaluated on some benchmark circuits by means of fault-injection experiments in the FPGA’s configuration memory. The results we gathered show a drastic reduction in the number of SEUs causing circuit misbehavior with respect to those observed for the same circuits when the TMR design technique is adopted.
    For the considered benchmarks, the capability of tolerating SEU effects in the FPGA’s configuration memory increases up to 85 times with respect to the standard TMR design technique. This improvement comes without any additional logic resources with respect to the TMR design technique, while a performance penalty of 22 percent on average was observed.

ACKNOWLEDGMENTS

This work was partially supported by the Xilinx University Program (XUP) and by the Italian Ministry for Research and University (MIUR).

REFERENCES

[1] M. Nikolaidis, “Time Redundancy Based Soft-Error Tolerance to Rescue Nanometer Technologies,” Proc. IEEE 17th VLSI Test Symp., pp. 86-94, Apr. 1999.
[2] E. Normand, “Single Event Upset at Ground Level,” IEEE Trans. Nuclear Science, vol. 43, no. 6, pp. 2742-2750, Dec. 1996.
[3] M. Alderighi, A. Candelori, F. Casini, S. D’Angelo, M. Mancini, A. Paccagnella, S. Pastore, and G.R. Sechi, “Heavy Ion Effects on Configuration Logic of Virtex FPGAs,” Proc. IEEE 11th On-Line Testing Symp., pp. 49-53, 2005.


[4] P. Graham, M. Caffrey, D.E. Johnson, N. Rollins, and M. Wirthlin, “SEU Mitigation for Half-Latches in Xilinx Virtex FPGAs,” IEEE Trans. Nuclear Science, vol. 50, no. 6, pp. 2139-2146, Dec. 2003.
[5] C. Carmichael, M. Caffrey, and A. Salazar, “Correcting Single-Event Upset through Virtex Partial Reconfiguration,” Xilinx Application Notes, XAPP216, 2000.
[6] F. Lima Kastensmidt, G. Neuberger, R. Hentschke, L. Carro, and R. Reis, “Designing Fault-Tolerant Techniques for SRAM-Based FPGAs,” IEEE Design and Test of Computers, pp. 552-562, Nov./Dec. 2004.
[7] F. Lima, L. Carro, and R. Reis, “Designing Fault Tolerant System into SRAM-Based FPGAs,” Proc. IEEE/ACM Design Automation Conf., pp. 650-655, June 2003.
[8] S. Habinc, Gaisler Research, “Functional Triple Modular Redundancy (FTMR) VHDL Design Methodology for Redundancy in Combinational and Sequential Logic,” www.gaisler.com, 2002.
[9] N. Rollins, M.J. Wirthlin, M. Caffrey, and P. Graham, “Evaluating TMR Techniques in the Presence of Single Event Upsets,” Proc. Military and Aerospace Programmable Logic Design (MAPLD 2003), 2003.
[10] M. Bellato, P. Bernardi, D. Bortolato, A. Candelori, M. Ceschia, A. Paccagnella, M. Rebaudengo, M. Sonza Reorda, M. Violante, and P. Zambolin, “Evaluating the Effects of SEUs Affecting the Configuration Memory of an SRAM-Based FPGA,” Proc. IEEE Design Automation and Test in Europe, pp. 188-193, 2004.
[11] M. Ceschia, M. Violante, M. Sonza Reorda, A. Paccagnella, P. Bernardi, M. Rebaudengo, D. Bortolato, M. Bellato, P. Zambolin, and A. Candelori, “Identification and Classification of Single-Event Upsets in the Configuration Memory of SRAM-Based FPGAs,” IEEE Trans. Nuclear Science, vol. 50, no. 6, pp. 2088-2094,

[26] E. Fuller, M. Caffrey, A. Salazar, C. Carmichael, and J. Fabula, “Radiation Testing Update, SEU Mitigation and Availability Analysis of the Virtex FPGA for Space Re-Configurable Computing,” Proc. IEEE Nuclear and Space Radiation Effects Conf., July 2000.
[27] M. Bellato, M. Ceschia, M. Menichelli, A. Papi, J. Wyss, and A. Paccagnella, “Ion Beam Testing of SRAM-Based FPGA’s,” Proc. IEEE Radiation Effects Data Workshop, July 2002.
[28] M. Alderighi, F. Casini, S. D’Angelo, F. Faure, M. Mancini, S. Pastore, G.R. Sechi, and R. Velazco, “Radiation Test Methodology of SRAM-Based FPGAs by Using THESIC+,” Proc. IEEE Ninth On-Line Testing Symp., p. 162, 2003.
[29] V. Betz and J. Rose, “Directional Bias and Non-Uniformity in FPGA Global Routing Architectures,” Proc. Int’l Conf. Computer-Aided Design (ICCAD), pp. 652-659, 1996.
[30] C. Ebeling, L. McMurchie, S.A. Hauck, and S. Burns, “Placement and Routing Tools for the Triptych FPGA,” IEEE Trans. Very Large Scale Integration, pp. 473-482, Dec. 1995.
[31] C.Y. Lee, “An Algorithm for Path Connections and Its Applications,” IRE Trans. Electronic Computers, vol. 10, pp. 346-365, Sept. 1961.
[32] Xilinx Inc., “Spartan-II 2.5 V FPGA Family: Introduction and Ordering Information,” Xilinx Product Specification Datasheets, 2003.

Luca Sterpone received the MS degree in computer science engineering from the Politecnico di Torino in 2003. He is currently working toward the PhD degree in computer and system engineering at the same institution. His main research interests include fault-tolerant systems, place and route algorithms, and reconfigurable
       Dec. 2003.
                                                                                                                                computing. He is a student member of the IEEE.
[12]   P. Bernardi, M. Sonza Reorda, L. Sterpone, and M. Violante, “On
       the Evaluation of Seus Sensitiveness in SRAM-Based FPGAs,”
       Proc. IEEE 10th On-Line Testing Symp., pp. 115-120, 2004.
[13]   F. Lima Kanstensmidt, L. Sterpone, L. Carro, and M. Sonza
       Reorda, “On the Optimal Design of Triple Modular Redundancy
       Logic for SRAM-Based FPGAs,” Proc. IEEE Design, Automation and                                                           Massimo Violante received the MS degree in
       Test in Europe, pp. 1290-1295, 2005.                                                                                     computer engineering (1996) and the PhD
[14]   C. Carmichael, “Triple Module Redundancy Design Techniques                                                               degree in computer engineering (2001) from
       for Virtex FPGAs,” Xilinx Application Notes, XAPP197, 2001.                                                              the Politecnico di Torino, Italy. Currently, he is
[15]   P. Brinkley, A. Carmichael, and C. Carmichael, “SEU Mitigation                                                           an assistant professor with the Department of
       Design Techniques for XQR4000XL,” Xilinx Application Notes,                                                              Computer Engineering of the same Institution.
       XAPP181, 2000.                                                                                                           His main research interests include design,
[16]   P.K. Samudrala, J. Ramos, and S. Katkoori, “Selective Triple                                                             validation, and test of fault-tolerant electronic
       Modular Redundancy (STMR) Based Single-Event Upset (SEU)                                                                 systems. He is a member of the IEEE.
       Tolerant Synthesis for FPGAs,” IEEE Trans. Nuclear Science, vol. 51,
       no. 5, Oct. 2004.
[17]   S. Brown, “FPGA Architecture Research: A Survey,” IEEE Design
       and Test of Computers, pp. 9-15, Nov./Dec. 1996.
[18]   J. Rose, A. El Gamal, and A. Sangiovanni-Vincetelli, “Architecture
       of Field-Programmable Gate Arrays,” Proc. IEEE, vol. 81, no. 7,
       pp. 1013-1029, July 1993.                                                                 . For more information on this or any other computing topic,
[19]   C. Stroud, J. Nall, M. Lashinsky, and M. Abramovici, “BIST-Based                          please visit our Digital Library at www.computer.org/publications/dlib.
       Diagnosis of FPGA Interconnect,” Proc. Int’l Test Conf., pp. 618-
       627, 2002.
[20]   Y.W. Chang, D.F. Wong, and C.K. Wong, “Universal Switch
       Modules for FPGA Design,” ACM Trans. Design Automation of
       Electronic Systems, pp. 80-101, Jan. 1996.
[21]   T.P. Ma and P.V. Dressendorfer, Ionizing Radiation Effects in MOS
       Devices and Circuits. Wiley, 1989.
[22]   J.L. Barth, C.S. Dyer, and E.G. Stassinopoulos, “Space, Atmo-
       spheric, and Terrestrial Radiation Environments,” IEEE Trans.
       Nuclear Science, vol. 50, no. 3, pp. 466-482, June 2003.
[23]   M. Ceschia, A. Paccagnella, S.-C. Lee, C. Wan, M. Bellato, M.
       Menichelli, A. Papi, A. Kaminski, and J. Wyss, “Ion Beam Testing
       of ALTERA APEX FPGAs,” NSREC 2002 Radiation Effects Data
       Workshop Record, July 2002.
[24]   R. Katz, K. LaBel, J.J. Wang, B. Cronquist, R. Koga, S. Penzin, and
       G. Swift, “Radiation Effects on Current Field Programmable
       Technologies,” IEEE Trans. Nuclear Science, vol. 44, no. 6, pp. 1945-
       1956, Dec. 1997.
[25]   M. Wirthlin, E. Johnson, N. Rollins, M. Caffrey, and P. Graham,
       “The Reliability of FPGA Circuit Designs in the Presence of
       Radiation Induced Configuration Upsets,” Proc. 11th Ann. IEEE
       Symp. Field-Programmable Custom Computing Machines, pp. 133-142,
       2003.