Fault tolerant circuit design

Soft-Error Modeling and Evaluation in Caches

Reliable Low-Power Design
Lin Li & Vijay Degalahal
Fall 2003, CSE-598C
Outline
   Introduction
       Soft Errors
       Low Power Cache design
   Methodology
       HSPICE
       SimpleScalar
   Results
   Conclusions
Soft Errors
   Soft errors, or transient errors, are
    circuit errors caused by excess
    charge carriers induced primarily by
    external radiation.
   These errors cause an upset event, but
    the circuit itself is not damaged.
Power Vs Soft Errors
   SER ∝ Nflux × CS × exp(-Qcritical / Qs)
    [Hazucha, 2000]
        Nflux: neutron flux
        CS: cross-sectional area
        Qcritical: critical charge necessary for a bit flip
        Qs: charge collection efficiency

        Q = CV, so Qcritical falls as the supply voltage is lowered.
        For a given process, SER therefore has an exponential
        dependency on voltage.
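The voltage dependency can be illustrated with a minimal sketch of the model above, assuming Qcritical = C × Vdd. The node capacitance (10 fF) and Qs (5 fC) are hypothetical values chosen only for illustration, and the function names are not from the slides.

```python
import math

def relative_ser(vdd, c_node_fF=10.0, q_s_fC=5.0, n_flux=1.0, area=1.0):
    """Relative SER from SER ∝ Nflux * CS * exp(-Qcritical / Qs).

    Assumes Qcritical = C * Vdd; c_node_fF and q_s_fC are illustrative values.
    """
    q_critical_fC = c_node_fF * vdd  # fF * V = fC
    return n_flux * area * math.exp(-q_critical_fC / q_s_fC)

def ser_increase(vdd_low, vdd_high, **kwargs):
    """Factor by which the modeled SER grows when Vdd drops from vdd_high to vdd_low."""
    return relative_ser(vdd_low, **kwargs) / relative_ser(vdd_high, **kwargs)

if __name__ == "__main__":
    for vdd in (1.0, 0.7, 0.3):
        print(f"Vdd = {vdd:.1f} V  relative SER = {relative_ser(vdd):.4f}")
    print(f"SER increase from 1.0 V to 0.3 V: {ser_increase(0.3, 1.0):.2f}x")
```

Under these assumed values the ratio depends only on the voltage difference, which is the exponential dependency the slide refers to.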
Power
   Caches are a dominant source of power
    consumption
       They occupy a lot of area, and most of it
        is not actively used
   Considered two schemes
       Cache Decay
       Drowsy Cache
Cache Decay
   “Turning off” cache lines when they hold
    data not likely to be reused.
Cache Decay
   Use the generational rate to
    determine the turn-off interval
   Disadvantage
       Data is lost if the prediction is wrong
Drowsy Cache
   Reduce cache VDD
    periodically
   Just one cycle of
    access penalty
   No data lost
   Simple, as it uses just
    one global counter
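The difference between the two schemes can be sketched as follows. This is a simplified model, not the implementation used in the study: the class names, the per-line idle counter for decay, and the "put every line to sleep when the global counter expires" policy for drowsy are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Line:
    valid: bool = True    # False once a decayed line has been powered off
    drowsy: bool = False  # True while the line sits at reduced VDD
    idle: int = 0         # cycles since last access (decay only)

class DecayCache:
    """Lines are turned off after decay_interval idle cycles; their data is lost."""
    def __init__(self, n_lines, decay_interval):
        self.lines = [Line() for _ in range(n_lines)]
        self.decay_interval = decay_interval

    def tick(self, accessed=None):
        """Advance one cycle; return True if the accessed line had decayed."""
        lost = False
        for i, line in enumerate(self.lines):
            if i == accessed:
                lost = not line.valid
                line.valid, line.idle = True, 0  # refetch on a decay-induced miss
            else:
                line.idle += 1
                if line.idle >= self.decay_interval:
                    line.valid = False
        return lost

class DrowsyCache:
    """One global counter puts all lines into low-voltage mode every window cycles."""
    def __init__(self, n_lines, window):
        self.lines = [Line() for _ in range(n_lines)]
        self.window = window
        self.counter = 0

    def tick(self, accessed=None):
        """Advance one cycle; return the wake-up penalty (0 or 1 cycle)."""
        penalty = 0
        if accessed is not None:
            line = self.lines[accessed]
            penalty = 1 if line.drowsy else 0
            line.drowsy = False  # data is retained, only the voltage is restored
        self.counter += 1
        if self.counter >= self.window:
            self.counter = 0
            for line in self.lines:
                line.drowsy = True
        return penalty
```

In this model a wrong decay prediction costs a full miss, while a drowsy line costs one extra cycle and never loses data.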
Methodology
   HSPICE
       Usual procedure
          Inject current at a node and check
           for inversion
          Takes a long time, and inspecting
           waveforms by hand is tedious
Methodology
   HSPICE
       Developed a script
       Good for memory-based elements
       Sweeps the injected current
       Fast and more accurate

       .param spike = 0
       Iio1 io1 vdd exp (0 spike 0.435ns 0.001ns 0.436ns 0.001ns)
       Vio1 io1
       .TRANS 100ps 30ns
       +sweep
       +(spike -10mamps -43.5mamps -1mamps)
       *+nvth 0.2v -0.2v -0.01v
       *+pvth -0.2v 0.2v 0.05v
       .measure avgflipcur avg i(Vio1) from = 435ps to = 442ps
       .measure minv min v(q) from = 1.6ns to = 1.8ns
       *.measure minv2 min v(out2) from = 10ps to = 500ps
       *.measure tran delay trig V(in) val =0.5 fall= 2 targ V(out) val= 0.5 fall=3
       *.measure tran delay2 trig V(out) val =0.5 fall= 3 targ V(out2) val= 0.5 fall=3
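The sweep results can be post-processed along the following lines. This is a sketch under stated assumptions: the (spike, minv) pairs are made-up numbers, the flip criterion (minimum of v(q) falling below a chosen threshold) is an assumption, and the charge estimate relies on the fact that a SPICE EXP pulse with equal rise and fall time constants encloses an area of peak × (td2 - td1).

```python
def critical_current(sweep, v_flip):
    """Return the smallest-magnitude injected current (A) whose run flipped the node.

    sweep is a list of (spike_current_A, min_vq_V) pairs, one per sweep point.
    A run counts as a flip when min v(q) drops below v_flip (assumed criterion).
    """
    flipped = [abs(current) for current, min_vq in sweep if min_vq < v_flip]
    return min(flipped) if flipped else None

def pulse_charge(i_peak, td1=0.435e-9, td2=0.436e-9):
    """Charge (C) delivered by the EXP pulse, assuming equal rise/fall time constants."""
    return abs(i_peak) * (td2 - td1)

if __name__ == "__main__":
    # Hypothetical sweep output, not measured data.
    sweep = [(-10e-3, 0.95), (-20e-3, 0.80), (-30e-3, 0.10), (-40e-3, 0.02)]
    i_crit = critical_current(sweep, v_flip=0.5)
    print(f"critical current ~ {i_crit * 1e3:.1f} mA")
    print(f"critical charge  ~ {pulse_charge(i_crit) * 1e15:.1f} fC")
```

The default delays match the values in the script (0.435 ns and 0.436 ns); everything else is illustrative.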
Leakage Vs Soft Error
[Figure: Qcritical (fC, left axis) and leakage energy (pJ/cycle, right axis) plotted against Vdd from 0.30 to 1.00; a region is annotated "Drowsy Caches".]
Methodology
   Soft error injection in SimpleScalar
       Random variable R1 ∈ [0, 1]
           Every cycle, generate a random number
           It indicates which kind of soft error happens
           Breakpoints on the R1 axis: 0, 1E-9, 1E-8, 1E-7, 1

                            Normal   Low Voltage   R/W
       Single-bit Error      1E-7       1E-6       5E-7
       Double-bit Error      1E-8       1E-7       5E-8
       Multi-bit Error       1E-9       1E-8       5E-9

                   Soft Error Rate (per cycle)
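The per-cycle selection step could look like the sketch below. It assumes the table values act as upper breakpoints on R1 (below the multi-bit rate is a multi-bit error, below the double-bit rate a double-bit error, and so on), which is one reading of the number line on the slide. The mode names and function names are hypothetical.

```python
import random

# (multi-bit, double-bit, single-bit) breakpoints per mode, taken from the table.
THRESHOLDS = {
    "normal":      (1e-9, 1e-8, 1e-7),
    "low_voltage": (1e-8, 1e-7, 1e-6),
    "read_write":  (5e-9, 5e-8, 5e-7),
}

def classify(r1, mode="normal"):
    """Map a draw r1 in [0, 1] to 'multi', 'double', 'single', or None (no error)."""
    multi, double, single = THRESHOLDS[mode]
    if r1 < multi:
        return "multi"
    if r1 < double:
        return "double"
    if r1 < single:
        return "single"
    return None

def inject(cycles, mode, rng):
    """Draw R1 once per cycle and count how many errors of each kind occur."""
    counts = {"single": 0, "double": 0, "multi": 0}
    for _ in range(cycles):
        kind = classify(rng.random(), mode)
        if kind is not None:
            counts[kind] += 1
    return counts

if __name__ == "__main__":
    print(inject(1_000_000, "low_voltage", random.Random(2003)))
```

Note that under this breakpoint reading the effective single-bit probability is the gap between two breakpoints (for example 9E-8 in normal mode) rather than exactly the tabulated rate.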
Methodology
   Soft error injection in SimpleScalar
     Random variable R2 ∈ [0, #set - 1]
     Random variable R3 ∈ [0, #way - 1]
     Random variable R4 ∈ [0, #bit - 1]
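A minimal sketch of choosing the fault location, assuming R2, R3, and R4 pick the set, the way, and the bit position respectively. The cache is modeled as a list of lists of integers purely for illustration; this is not SimpleScalar code.

```python
import random

def pick_location(n_sets, n_ways, n_bits, rng):
    """Draw R2, R3, R4 uniformly: set index, way index, bit index."""
    r2 = rng.randrange(n_sets)  # [0, #set - 1]
    r3 = rng.randrange(n_ways)  # [0, #way - 1]
    r4 = rng.randrange(n_bits)  # [0, #bit - 1]
    return r2, r3, r4

def flip_bit(cache, location):
    """Invert one bit of the block stored at (set, way)."""
    set_idx, way_idx, bit_idx = location
    cache[set_idx][way_idx] ^= 1 << bit_idx

if __name__ == "__main__":
    rng = random.Random(598)
    cache = [[0 for _ in range(4)] for _ in range(128)]  # 128 sets, 4 ways
    loc = pick_location(128, 4, 256, rng)
    flip_bit(cache, loc)
    print("injected at set/way/bit:", loc)
```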
Injected Soft Errors
              Original Cache               Drowsy Cache
          1-bit  2-bit  Multi-bit    1-bit  2-bit  Multi-bit
gzip         39      4          0      299     34          3
gcc          65      6          0      540     54          5
mcf         112     13          1      966    100         12
perlbmk      60      5          0      494     50          4
gap          36      4          0      281     34          3
vortex       82     10          1      662     69         11
bzip2        43      4          0      328     38          4
twolf        76      8          1      639     58          8
The number of soft errors increases significantly in the drowsy cache.
More powerful and effective error detection/correction
schemes are needed.
Category of Soft Errors
   Injected soft errors
        On an invalid cache block
        On a valid cache block
             Read into processor (1)
             Overwritten by new data
             Replaced by new cache block, if block is clean
             Written back to L2 cache, if block is dirty (2)

    We call (1) and (2) effective errors: they
    propagate to the pipeline or the L2 cache.
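The categories above can be expressed as a small classifier. This is a sketch: the event names ('read', 'write', 'evict') and the function signature are assumptions, standing in for whatever first touches the corrupted data after injection.

```python
from enum import Enum

class Outcome(Enum):
    INVALID = "on an invalid block"
    READ = "read into processor"            # category (1)
    OVERWRITTEN = "overwritten by new data"
    REPLACED_CLEAN = "replaced, block clean"
    WRITTEN_BACK = "written back to L2"     # category (2)

# Only (1) and (2) propagate the error to the pipeline or the L2 cache.
EFFECTIVE = {Outcome.READ, Outcome.WRITTEN_BACK}

def classify_outcome(valid, next_event, dirty):
    """Classify an injected error by the first event that touches the corrupted data."""
    if not valid:
        return Outcome.INVALID
    if next_event == "read":
        return Outcome.READ
    if next_event == "write":
        return Outcome.OVERWRITTEN
    if next_event == "evict":
        return Outcome.WRITTEN_BACK if dirty else Outcome.REPLACED_CLEAN
    raise ValueError(f"unknown event: {next_event!r}")

def is_effective(outcome):
    return outcome in EFFECTIVE
```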
Effective Soft Errors
              Original Cache               Decay Cache
          1-bit  2-bit  Multi-bit    1-bit  2-bit  Multi-bit
gzip         22      3          0        3      0          0
gcc          38      4          0       13      1          0
mcf          53      8          0       23      2          0
perlbmk      32      5          0       19      2          0
gap          32      3          0        7      0          0
vortex       38      6          0       16      4          0
bzip2         5      1          0        2      0          0
twolf        31      2          0       17      0          0
Decay cache turns off cache lines if they are not
accessed for a long period of time.
It saves leakage energy and increases reliability.
Outcome of Soft Errors
In the normal cache and the drowsy cache, errors that are
replaced or written back to L2 are dominant. In the
decay cache, errors on invalid blocks are dominant.
Occurrence Time of Soft Errors
[Figure: cumulative errors versus cycles for gzip.]
Cycles: the time gap between when a cache block is loaded
into the L1 cache and when the soft error happens.
The longer a block stays in the cache, the higher the probability
that it is hit by a soft error.
Influence of Bit Interleaving (drowsy cache)
             w/o Interleaving              w/ Interleaving
          1-bit  2-bit  Multi-bit    1-bit  2-bit  Multi-bit
gzip        299     34          3      372      0          0
gcc         540     54          5      653      1          0
mcf         966    100         12     1192      2          0
perlbmk     494     50          4      551      6          1
gap         281     34          3      351      4          1
vortex      662     69         11      823      3          0
bzip2       328     38          4      408      0          0
twolf       639     58          8      769      0          0

Interleaving converts a multi-bit error into several single-bit errors.
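Why interleaving has this effect can be shown with a small sketch. It assumes a simple column-interleaved layout in which physical bit p of a row belongs to word p mod degree; the layout and the function name are illustrative and not taken from the slides.

```python
from collections import Counter

def errors_per_word(start, burst_len, degree):
    """Map a burst of adjacent physical bit flips to per-word error counts.

    degree is the number of words interleaved in one row; degree=1 means
    no interleaving, so every flipped bit lands in the same word.
    """
    return Counter(p % degree for p in range(start, start + burst_len))

if __name__ == "__main__":
    burst = dict(start=5, burst_len=3)
    print("no interleaving:", dict(errors_per_word(degree=1, **burst)))
    print("4-way interleaving:", dict(errors_per_word(degree=4, **burst)))
```

With a degree at least as large as the burst length, each affected word sees at most one flipped bit, so a per-word single-error code can handle it.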
Influence of EDC/ECC
Failure errors in the drowsy cache
              Parity        SEC-DED          DEC
           Read    WB     Read    WB     Read    WB
gzip          1   152        0    17        0     1
gcc          15   266        0    32        0     2
mcf           0   392        0    39        0     4
perlbmk      10   258        0    34        0     2
gap           3   285        0    36        0     3
vortex       18   274        0    28        0     3
bzip2         1    60        0     4        0     1
twolf         1   259        0    23        0     4

Clean block: failure errors = undetectable errors
Dirty block: failure errors = undetectable errors + uncorrectable errors
Error protection is very important for dirty blocks in write-back cache.
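The two failure rules above can be sketched as follows. The detection and correction capabilities are simplifying assumptions: parity detects any odd number of flipped bits and corrects none, SEC-DED corrects one and detects two, and DEC is taken to mean double-error correction. Errors beyond a code's capability are treated as undetectable here, which is a pessimistic simplification.

```python
def detectable(scheme, n_errors):
    """True if the scheme flags a block with n_errors flipped bits (assumed model)."""
    if scheme == "parity":
        return n_errors % 2 == 1
    if scheme in ("sec-ded", "dec"):
        return 1 <= n_errors <= 2
    raise ValueError(f"unknown scheme: {scheme!r}")

def correctable(scheme, n_errors):
    """True if the scheme can repair a block with n_errors flipped bits (assumed model)."""
    if scheme == "parity":
        return False
    if scheme == "sec-ded":
        return n_errors == 1
    if scheme == "dec":
        return 1 <= n_errors <= 2
    raise ValueError(f"unknown scheme: {scheme!r}")

def is_failure(scheme, n_errors, dirty):
    """Clean block: fails only if undetectable (a detected error is refetched from L2).
    Dirty block: fails if undetectable or uncorrectable (L2 holds no good copy)."""
    if n_errors == 0 or correctable(scheme, n_errors):
        return False
    if dirty:
        return True
    return not detectable(scheme, n_errors)
```

For example, a single-bit error under parity is harmless on a clean block but is a failure on a dirty one, which is consistent with the large write-back counts in the Parity column.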
Conclusions
   Drowsy cache incurs a significant
    increase in soft errors.
   Decay cache can reduce the number
    of effective errors.
   Bit interleaving can alleviate the
    impact of multi-bit soft errors.
   Dirty cache blocks in a write-back
    cache are more vulnerable.

				