Timing Optimization in Logic with Interconnect

Arkadiy Morgenshtein, Eby G. Friedman, Ran Ginosar, Avinoam Kolodny

Technion – Israel Institute of Technology

SLIP (System Level Interconnect Prediction) 2008
Intro
Timing Optimization

[Diagram: a path from A to B implementing a function]

Special cases:
    Only gates:  A → Logic → B
    Only wires:  A → Logic → Interconnect → B

Typically, a mixture of both
Intro
Logic with Wires

Common example: UART design
[Figure: UART design, with elements numbered 1-5]
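Intro
The Interconnect Wall

Logic w/o wires: Logic Gate Sizing (Logical Effort)
    [Diagram: chain of gates g1 ... gN with input capacitances C1 ... CN driving load CL; h1 = C2/C1, h2 = C3/C2, ...]

Long wires: Interconnect Optimization (Repeater Insertion)
    [Diagram: logic driving a wire of length L into load CL; the wire is split into k segments of length L/k by repeaters 1 ... k-1]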
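Intro
Timing Optimization in Logic with Interconnect

Between the two extremes (logic w/o wires, long wires):
[Diagram: path from A to B through wire segments L1, L2, L3 alternating with elements C1, C2, C3]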
Existing Techniques

A (very) Short Tutorial



Intro
Logical Effort (only logic)

[Diagram: four-gate chain g1 ... g4 with input capacitances C1 ... C4 and load CL; h1 = C2/C1, h2 = C3/C2, h3 = C4/C3, h4 = CL/C4; all four stage delays are equal]

Delay model:
    $D_i = \tau \cdot (g_i \cdot h_i + p_i)$
    τ   - delay of minimal inverter R0·C0, technology constant
    g_i - logical effort, gate type factor: e.g. g_inv = 1
    h_i - electrical effort, load driving capability
    p_i - parasitic effort, due to output capacitance

Optimal sizing:
    $\text{Delay}_i = \text{Delay}_{i+1}$
    $g_i h_i = g_{i+1} h_{i+1}$

I. Sutherland, B. Sproull, and D. Harris, “Logical Effort - Designing Fast CMOS Circuits,” Morgan Kaufmann, 1999.
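
As a minimal sketch of the optimal-sizing rule (equal g_i·h_i in every stage), the Python function below sizes a chain with no wires and no branches. The function name and the example values are illustrative assumptions, not taken from the slides; the parasitic effort p_i is left out because it does not depend on the sizes.

```python
from math import prod

def logical_effort_sizing(g, c_in, c_load):
    """Size a gate chain so that every stage has the same effort g_i * h_i.

    g      -- logical efforts of the stages, in path order
    c_in   -- fixed input capacitance of the first gate
    c_load -- fixed load capacitance at the end of the path
    Returns (input capacitance of every gate, common stage effort).
    """
    n = len(g)
    path_effort = prod(g) * c_load / c_in     # F = G * H, no branching
    stage_effort = path_effort ** (1.0 / n)   # f = g_i * h_i for every stage
    caps = [0.0] * n
    load = c_load
    # Walk backwards from the load: C_i = g_i * C_{i+1} / f
    for i in reversed(range(n)):
        caps[i] = g[i] * load / stage_effort
        load = caps[i]
    return caps, stage_effort

# Hypothetical example: three inverters driving a load 64x the input capacitance.
caps, f = logical_effort_sizing([1, 1, 1], 1.0, 64.0)
print(caps, f)
```

The first returned capacitance reproduces c_in, which is a quick consistency check on the chosen stage effort.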
Intro
Limitations of Logical Effort

[Diagram: the same four-gate chain with equal stage delays; h1 = C2/C1, h2 = C3/C2, h3 = C4/C3, h4 = CL/C4]

    • No wires
    • No fixed side branches

Logic with wires and branches: LE breaks down
    Delay =? Delay =? Delay =? Delay
Intro
Repeater Insertion (only wires)

Unsegmented wire: Delay ~ Length²
    L = 5, R = 5, C = 5:  D = RC = 25
Wire cut into five segments: Delay ~ Length
    each segment L = 1, r = 1, c = 1:  D = Σ rc = 5

Optimal sizing:
    $x = \sqrt{\frac{R_0 \cdot C_w}{R_w \cdot C_0}}$
Optimal number of repeaters:
    $k = \sqrt{\frac{0.4 \cdot R_w \cdot C_w}{0.7 \cdot R_0 \cdot C_0}}$

R0 - effective resistance of minimal inverter
C0 - gate capacitance of minimal inverter
Rw - wire resistance
Cw - wire capacitance

H.B. Bakoglu, “Circuits, Interconnections and Packaging for VLSI,” Addison-Wesley, pp. 194-219, 1990.
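
The two repeater-insertion formulas and the segmenting arithmetic on this slide can be sketched in a few lines of Python. The function names are illustrative assumptions, and the segment-delay helper uses the slide's simplified D = Σ rc estimate, which ignores the delay of the repeaters themselves.

```python
from math import sqrt

def segmented_wire_delay(r_total, c_total, k):
    """Delay of a wire cut into k equal segments, D = sum of r*c per segment.

    Simplified estimate from the slide: repeater delay is ignored.
    """
    return k * (r_total / k) * (c_total / k)

def bakoglu_repeaters(r0, c0, rw, cw):
    """Repeater size x and repeater count k for a wire with total Rw, Cw.

    r0, c0 -- resistance and gate capacitance of the minimal inverter
    rw, cw -- total wire resistance and capacitance
    """
    x = sqrt(r0 * cw / (rw * c0))
    k = sqrt(0.4 * rw * cw / (0.7 * r0 * c0))
    return x, k

# Slide example: R = 5, C = 5 as one piece, then as five segments.
print(segmented_wire_delay(5, 5, 1))
print(segmented_wire_delay(5, 5, 5))
# Hypothetical technology values, chosen only to exercise the formulas.
print(bakoglu_repeaters(r0=1000.0, c0=1e-15, rw=100.0, cw=1e-12))
```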
Intro
        Properties of Repeater Insertion
   Assumptions of basic repeater insertion (RI)

                     Equal size

                     Equal spacing

                     Terminal gates are similar to repeaters


                     [Diagram: logic driving repeaters of equal size, separated by equal, fixed wire segments of length L/k]


   Characteristics of RI

                     Number and size of repeaters are independent

                     Single optimal size for a given process and metal layer

So, What Are We Going To Do?




Intro
We Are Breaking The Wall

Logic w/o wires: Logical Effort
Long wires: Repeater Insertion

[Diagram: gates g1 ... gN with input capacitances C1 ... CN driving load CL, connected through wires with parameters Rw1, Cw1, Rw2, Cw2, ...]

WANTED – solution for the mixed case
Challenges:
    Gate placements
    Gate sizes
    Number of gates, repeaters
Our Approach to Timing Optimization

Logic Gates as Repeaters (LGR): gate placement (along the wire)
Unified Logical Effort (ULE): gate sizes
Gate-terminated Sized Repeater Insertion (GSRI): number of repeaters
Logic Gates as Repeaters - LGR
 “Where should the gates be located (along the wire)?”




LGR
The Idea

• Problem – delay reduction in logic with wire
    [Diagram: IN → logic → wire of length L → OUT]
• A solution – wire segmenting by repeaters
    [Diagram: IN → logic → k segments of length L/k separated by repeaters 1 ... k-1 → OUT]
• Drawback – power, area w/o logical functionality = waste
• Proposed – logic gates as repeaters
    [Diagram: IN → logic gates separated by wire segments L1, L2, L3 → OUT]

LGR - distribution of logic gates over interconnect
    - driving the partitioned wire without adding repeaters

K. Venkat, “Generalized Delay Optimization of Resistive Interconnections through an Extension of Logical Effort,” ISCAS 1993
LGR
LGR Delay Modeling

[Diagram: gates i-1, i, i+1 separated by wire segments L_{i-1}, L_i, L_{i+1}]

$C_{w_i} = L_i \cdot C_{int}, \quad R_{w_i} = L_i \cdot R_{int}$

Total delay:
$$D_{tot} = \sum_{i=1}^{N} \left[ g_i \left( h_i + \frac{L_i C_{int}}{C_i} \right) + 0.5 \cdot L_i^2 R_{int} C_{int} + L_i R_{int} C_{i+1} \right]$$

M. Moreinis, A. Morgenshtein, I. Wagner, and A. Kolodny, “Logic Gates as Repeaters (LGR) for Area-Efficient Timing Optimization,” IEEE TVLSI, 2006
LGR
Optimal Wire Segmenting

[Diagram: gates i-1, i, i+1 separated by wire segments L_{i-1}, L_i, L_{i+1}; gate i has output resistance R_i and input capacitance C_i]

$$\frac{\partial D_{tot}}{\partial L_i} = 0, \qquad \sum_{i=1}^{N} L_i = L$$

$$L_i^{opt} = \frac{L}{N} + L \cdot \frac{R_{av} - R_i}{R_w} + L \cdot \frac{C_{av} - C_{i+1}}{C_w}$$

 •    Output resistance of driving gate i below average → wire length i is increased
 •    Input capacitance of successor gate i+1 above average → wire length i is decreased
 •    All gates are equal → equal partitioning

 •    In the case of a negative segment length, neighbor gates are merged
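
A minimal Python sketch of the segmenting expression follows. It uses per-unit-length wire parameters, so L·(R_av − R_i)/R_w becomes (R_av − R_i)/R_int (since R_w = L·R_int), and likewise for the capacitance term. The function name and the example values are assumptions for illustration; the merging of neighbor gates on a negative length is not implemented, the result is only returned for inspection.

```python
def lgr_segment_lengths(total_length, r_gate, c_next, r_int, c_int):
    """Optimal wire segment lengths for gates distributed along a wire.

    total_length -- total wire length L
    r_gate[i]    -- output resistance R_i of the gate driving segment i
    c_next[i]    -- input capacitance C_{i+1} of the gate loading segment i
    r_int, c_int -- wire resistance and capacitance per unit length
    A negative entry means the neighbor gates should be merged (not done here).
    """
    n = len(r_gate)
    r_av = sum(r_gate) / n
    c_av = sum(c_next) / n
    return [
        total_length / n + (r_av - r) / r_int + (c_av - c) / c_int
        for r, c in zip(r_gate, c_next)
    ]

# Hypothetical example: a strong driver (low R) gets the longer segment.
lengths = lgr_segment_lengths(4.0, [50.0, 150.0], [2.0, 2.0], r_int=100.0, c_int=1.0)
print(lengths, sum(lengths))
```

The segment lengths always sum to the total length, because the deviations from the averages cancel.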

LGR
LGR Results

Critical path of 8-256 decoder circuit. Delay [nsec]:

    Wire                Unoptimized    LGR Segmenting
    Low-tier, 1.5mm     2.28           2.15
    Low-tier, 15mm      34.6           25.2
    High-tier, 1.5mm    3.62           3.47
    High-tier, 15mm     36.4           34.9

      • Delay reduction of up to 27% - by “moving” the gates

      • Further delay reduction – by scaling and LGR+RI

M. Moreinis, et al., “Repeater Insertion combined with LGR Methodology for on-Chip Interconnect Timing Optimization,” ICECS, 2004.
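LGR
Optimal Gate Scaling

      • Enlargement of all gates by a uniform factor s to minimize delay
          • can be performed iteratively with Segmenting

[Diagram: gates 1-4 separated by wire segments L1 ... L4]

$$\frac{\partial D_{tot}}{\partial s} = 0 \;\Rightarrow\; s = \sqrt{\frac{\sum_{i=1}^{N} R_i \cdot C_{w_i}}{\sum_{i=1}^{N} R_{w_i} \cdot C_{i+1}}}$$

For equal inverters and equal segments:
$$s = \sqrt{\frac{C_w R_0}{C_0 R_w}}$$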
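
The uniform scaling factor can be checked numerically with a short Python sketch. The delay helper below is an assumption: it models scaling by s as dividing every gate resistance by s and multiplying every gate input capacitance by s, and it sums simple RC terms per stage. The function names and values are illustrative only.

```python
from math import sqrt

def lgr_delay(s, r_gate, c_wire, r_wire, c_next):
    """RC delay of the path when all gates are scaled by s (assumed model)."""
    total = 0.0
    for r, cw, rw, c in zip(r_gate, c_wire, r_wire, c_next):
        total += (r / s) * (cw + s * c)       # scaled gate drives wire and next gate
        total += 0.5 * rw * cw + rw * s * c   # wire RC, and wire into scaled next gate
    return total

def lgr_uniform_scale(r_gate, c_wire, r_wire, c_next):
    """s = sqrt( sum(R_i * Cw_i) / sum(Rw_i * C_{i+1}) )."""
    num = sum(r * cw for r, cw in zip(r_gate, c_wire))
    den = sum(rw * c for rw, c in zip(r_wire, c_next))
    return sqrt(num / den)

# Hypothetical normalized values: three identical stages.
args = ([10.0] * 3, [100.0] * 3, [1.0] * 3, [10.0] * 3)
s = lgr_uniform_scale(*args)
print(s, lgr_delay(s, *args), lgr_delay(0.9 * s, *args), lgr_delay(1.1 * s, *args))
```

With identical stages the result reduces to the equal-inverter expression sqrt(Cw·R0 / (C0·Rw)), and the delay at s is lower than at nearby scaling factors.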

LGR
LGR Segmenting and Scaling

Uniform scaling performed for all gates. Delay [nsec]:

    Wire                Repeater Insertion    LGR
    Low-tier, 1.5mm     0.268                 0.188
    Low-tier, 15mm      1.65                  5.45
    High-tier, 1.5mm    0.194                 0.086
    High-tier, 15mm     0.542                 0.557

  • For intermediate wires LGR outperforms RI by up to 55%
  • For long wires RI is faster
           • BUT: it requires 44 repeaters

  • Best for long wires – combined LGR and RI

M. Moreinis, et al., “Repeater Insertion combined with LGR Methodology for on-Chip Interconnect Timing Optimization,” ICECS, 2004.
LGR
LGR Summary

      • Logic gates serve as repeaters
          → No need for logically redundant repeaters

      • Delay reduction + lower area/power
      • Can be combined with RI
Unified Logical Effort - ULE
   “What is the optimal size of the gates?”




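ULE
Unified Delay Model (including wires)

[Diagram: two stages, each a gate with logical effort g and input capacitance C driving a wire segment modeled as a π network: series resistance Rw with Cw/2 at each end]

Capacitive interconnect effort:  $h_w = \frac{C_w}{C}$
Resistive interconnect effort:   $p_w = R_w \cdot (0.5 \cdot C_w + C)$

$$D = g \cdot (h + h_w) + p_w$$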
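
A minimal Python sketch of the unified stage delay follows. Two points are assumptions of this sketch rather than statements from the slide: p_w is divided by τ = R0·C0 so that it has the same (dimensionless) units as g·h, and the C in h_w is taken as the driving gate's input capacitance while the C in p_w is taken as the next gate's input capacitance. The function name is illustrative.

```python
def ule_stage_delay(g, c_in, c_next, r_w, c_w, tau):
    """Stage delay D = g * (h + h_w) + p_w, in units of tau.

    g      -- logical effort of the driving gate
    c_in   -- input capacitance of the driving gate
    c_next -- input capacitance of the next gate
    r_w    -- resistance of the wire between the two gates
    c_w    -- capacitance of the wire between the two gates
    tau    -- R0 * C0 of the minimal inverter (assumed normalization of p_w)
    """
    h = c_next / c_in                         # electrical effort
    h_w = c_w / c_in                          # capacitive interconnect effort
    p_w = r_w * (0.5 * c_w + c_next) / tau    # resistive interconnect effort
    return g * (h + h_w) + p_w

# Without a wire the model falls back to plain logical effort, g * h.
print(ule_stage_delay(g=1.0, c_in=1.0, c_next=4.0, r_w=0.0, c_w=0.0, tau=1.0))
# Hypothetical normalized wire values.
print(ule_stage_delay(g=1.0, c_in=2.0, c_next=4.0, r_w=1.0, c_w=2.0, tau=1.0))
```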



ULE
Minimal Delay Condition

Minimal delay:
$$\frac{\partial D}{\partial h_i} = 0$$

Equal stage delays:
$$\left( g_i + \frac{R_{w_i} \cdot C_i}{R_0 \cdot C_0} \right) \cdot h_i = g_{i+1} \cdot \left( h_{i+1} + h_{w_{i+1}} \right)$$
ULE
Minimal Delay for Capacitive Wires

General RC interconnect:
$$\left( g_i + \frac{R_{w_i} \cdot C_i}{R_0 \cdot C_0} \right) \cdot h_i = g_{i+1} \cdot \left( h_{i+1} + h_{w_{i+1}} \right)$$

Capacitive interconnect (short wires and branches):
$$g_i \cdot h_i = g_{i+1} \cdot \left( h_{i+1} + h_{w_{i+1}} \right)$$
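ULE
ULE Convergence to LE and RI

$$\left( g_i + \frac{R_{w_i} \cdot C_i}{R_0 \cdot C_0} \right) \cdot h_i = g_{i+1} \cdot \left( h_{i+1} + h_{w_{i+1}} \right)$$

Special cases:

Logic without wires ($h_w = 0$, $R_w = 0$):
$$g_i \cdot h_i = g_{i+1} \cdot h_{i+1}$$

Repeater insertion ($h = 1$, $g = 1$):
$$x_{opt} = \frac{C_i}{C_0} = \sqrt{\frac{R_0 \cdot C_{w_i}}{R_{w_i} \cdot C_0}}$$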
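
The repeater-insertion special case can be checked by direct substitution into the ULE condition. The short derivation below is a sketch that assumes identical stages, so that every gate has the same input capacitance C (h = 1), g = 1, and h_w = C_w/C:

```latex
\begin{align*}
\left(1 + \frac{R_w C}{R_0 C_0}\right)\cdot 1 &= 1\cdot\left(1 + \frac{C_w}{C}\right) \\
\frac{R_w C}{R_0 C_0} &= \frac{C_w}{C} \\
C^2 &= \frac{R_0 C_0 C_w}{R_w} \\
x_{opt} = \frac{C}{C_0} &= \sqrt{\frac{R_0 C_w}{R_w C_0}}
\end{align*}
```

The result matches the optimal repeater size quoted from Bakoglu earlier in the talk.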

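ULE
Some Algebra…

[Diagram: gate g_i (output resistance R_i) drives wire i (R_wi, with C_wi/2 at each end) into gate g_{i+1} (input capacitance C_{i+1}, output resistance R_{i+1}), which drives wire i+1 (R_wi+1, with C_wi+1/2 at each end) into gate g_{i+2} (input capacitance C_{i+2})]

$$\left( g_i + \frac{R_{w_i} \cdot C_i}{R_0 \cdot C_0} \right) \cdot h_i = g_{i+1} \cdot \left( h_{i+1} + h_{w_{i+1}} \right)$$

Substituting $g_i = \frac{R_i C_i}{R_0 C_0}$, $h_i = \frac{C_{i+1}}{C_i}$ and $h_{w_i} = \frac{C_{w_i}}{C_i}$:

$$\left( \frac{R_i \cdot C_i}{R_0 \cdot C_0} + \frac{R_{w_i} \cdot C_i}{R_0 \cdot C_0} \right) \cdot \frac{C_{i+1}}{C_i} = \frac{R_{i+1} \cdot C_{i+1}}{R_0 \cdot C_0} \cdot \left( \frac{C_{i+2}}{C_{i+1}} + \frac{C_{w_{i+1}}}{C_{i+1}} \right)$$

$$\left( R_i + R_{w_i} \right) \cdot C_{i+1} = R_{i+1} \cdot \left( C_{i+2} + C_{w_{i+1}} \right)$$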
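ULE
Intuition of ULE Optimum

[Diagram: gate i (output resistance R_i) drives wire i (R_wi, with C_wi/2 at each end) into gate i+1 (input capacitance C_{i+1}, output resistance R_{i+1}), which drives wire i+1 (R_wi+1, with C_wi+1/2 at each end) into a gate with input capacitance C_{i+2}]

At the optimal size:
$$\underbrace{\left( R_i + R_{w_i} \right) \cdot C_{i+1}}_{D_C} = \underbrace{R_{i+1} \cdot \left( C_{w_{i+1}} + C_{i+2} \right)}_{D_R}$$

Delay caused by gate capacitance should be equal to delay caused by gate resistance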
ULE
ULE Optimality

[Plot: Delay vs. Sizing Factor. x-axis: sizing factor (20 to 180); y-axis: delay (×10^-11). Curves: D_tot, D_R, D_C and D*. D_tot reaches its minimum D_min at the optimum sizing factor. Left of the optimum the size is too small (high resistance); right of it the size is too big (high capacitance).]
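ULE
Optimal Gate Capacitance

$$C_{i+1} = \underbrace{\sqrt{C_i \cdot C_{i+2}}}_{\text{LE}} \cdot \underbrace{\sqrt{1 + \frac{C_{w_{i+1}}}{C_{i+2}}}}_{\text{wire capacitance}} \cdot \underbrace{\sqrt{\frac{g_{i+1}}{g_i + \frac{R_{w_i} \cdot C_i}{R_0 \cdot C_0}}}}_{\text{logical efforts and wire resistance}}$$

      • Expression for size of a single gate
      • Gate sizes along a logic path are iteratively determined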
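
One way to carry out the iterative sizing is to sweep repeatedly along the path and re-evaluate the single-gate expression C_{i+1} = sqrt(C_i·(C_{i+2} + C_{w,i+1})·g_{i+1} / (g_i + R_wi·C_i/(R0·C0))) with the latest neighbor values. The Python sketch below does this; the sweep scheme, the fixed number of sweeps, the function name and the example values are assumptions of this sketch, not details given in the slides.

```python
from math import sqrt

def ule_sizing(g, r_w, c_w, c_first, c_load, tau, sweeps=200):
    """Iteratively size a gate chain with wires using the ULE expression.

    g[j]           -- logical effort of gate j (j = 0 .. n-1)
    r_w[j], c_w[j] -- resistance and capacitance of the wire driven by gate j
    c_first        -- fixed input capacitance of gate 0
    c_load         -- fixed load capacitance after the last wire
    tau            -- R0 * C0 of the minimal inverter
    Returns the input capacitances of gates 0 .. n-1.
    """
    n = len(g)
    caps = [c_first] * n + [c_load]          # caps[n] is the fixed load
    for _ in range(sweeps):
        for j in range(1, n):                # caps[0] and caps[n] stay fixed
            i = j - 1                        # index of the driving gate
            caps[j] = sqrt(
                caps[i] * (caps[j + 1] + c_w[j]) * g[j]
                / (g[i] + r_w[i] * caps[i] / tau)
            )
    return caps[:n]

# With zero-length wires the sizes converge to the logical effort solution.
print(ule_sizing([1.0] * 3, [0.0] * 3, [0.0] * 3, 1.0, 64.0, tau=1.0))
# Hypothetical normalized wires (units of R0 and C0).
print(ule_sizing([4 / 3] * 4, [0.01] * 4, [20.0] * 4, 10.0, 100.0, tau=1.0))
```

The first call returns the geometric progression expected from logical effort, which mirrors the L = 0 case of the sizing examples in the talk.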

ULE
Examples (1): ULE Sizing

N = 9 gates, each with g = 4/3, connected by equal wires of length L; L = {0, 0.01, 0.05, 0.1, 0.5, 1} mm
C1 = 10·C0, CN = 100·C0
Total electrical effort H = 10

[Plot: gate input capacitance (×C0, 10 to 100) vs. gate # (1 to 9), one curve per wire length from L = 0 (LE) to L = 1mm, with the level xopt marked]

• L = 0 → Size converges to LE
• Longer wires → ULE is faster
• Long wires → Fixed sizing xopt
ULE
Examples (2): ULE Sizing

N = 9 gates, each with g = 4/3, connected by equal wires of length L; L = {0, 0.01, 0.05, 0.1, 0.5, 1} mm
C1 = 10·C0, CN = 10·C0
Total electrical effort H = 1

[Plot: gate input capacitance (×C0, 10 to 60) vs. gate # (1 to 9), with curves for L = 0 (LE), 10 µm, 50 µm, 100 µm, 0.5 mm and 1 mm, and the level xopt marked]

• L = 0 → Converges to LE (no scaling)
• All wire lengths → ULE is faster
• Long wires → Fixed sizing xopt
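ULE
So, What is Xopt ?

[Diagram: wire i-1 (R_wi-1, C_wi-1) drives gate g_i (input capacitance C_i, output resistance R_i), which drives wire i (R_wi, C_wi); the two time constants R_wi-1·C_i and R_i·C_wi are marked]

$$C_i = \sqrt{C_{i-1} \cdot C_{i+1}} \cdot \sqrt{1 + \frac{C_{w_i}}{C_{i+1}}} \cdot \sqrt{\frac{g_i}{g_{i-1} + \frac{R_{w_{i-1}} \cdot C_{i-1}}{R_0 \cdot C_0}}}$$

For long wires:
$$x_{opt_i} = \frac{C_{i_{opt}}}{C_{i_{min}}} = \underbrace{\sqrt{\frac{c_w \cdot R_0}{r_w \cdot C_0 \cdot g_i}}}_{\text{constant}} \cdot \sqrt{\frac{L_i}{L_{i-1}}}$$

Here c_w and r_w are the wire capacitance and resistance per unit length.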
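ULE
Optimum Condition for Long Wires

[Diagram: wire (Rw, Cw) drives gate g_i (input capacitance C, output resistance R), which drives wire (Rw, Cw); the two time constants Rw·C and R·Cw are marked]

For long wires:
$$\left( R + R_w \right) \cdot C = R \cdot \left( C_w + C \right)$$

$$R_w \cdot C = R \cdot C_w$$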
ULE
                            Xopt and Repeaters
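[Figure: a wire (Rw, Cw) drives a repeater (input capacitance Crep, output resistance Rrep), which drives the next wire (Rw, Cw). The two delay components are Rw·Crep and Rrep·Cw.]

             Optimal sizing condition for repeater

$$R_w\,C_{rep} = R_{rep}\,C_w$$

             General case:

$$x_{opt_i} = \sqrt{\frac{c_w R_0}{r_w C_0 g_i}}\cdot\sqrt{\frac{L_i}{L_{i-1}}}$$

             Equal wires:

$$x_{opt_i} = \sqrt{\frac{c_w R_0}{r_w C_0 g_i}}$$

             INV (g=1):

$$x_{opt} = \sqrt{\frac{c_w R_0}{r_w C_0}}$$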


                   H.B. Bakoglu, “Circuits, Interconnections and Packaging for VLSI,” Addison-Wesley, pp. 194-219, 1990             35
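The three forms of xopt above can be evaluated with one small function. This is a minimal sketch: the function name and all numeric parameters are hypothetical and chosen only for illustration (they are not taken from the slides).

```python
import math

def x_opt(c_w, r_w, R0, C0, g=1.0, L_next=1.0, L_prev=1.0):
    """Optimal size of a gate between two long wires, in multiples of the minimal gate.

    c_w, r_w : wire capacitance and resistance per unit length
    R0, C0   : output resistance and input capacitance of the minimal inverter
    g        : logical effort of the gate (1 for an inverter)
    L_next   : length of the wire the gate drives (L_i)
    L_prev   : length of the wire feeding the gate (L_{i-1})
    """
    return math.sqrt(c_w * R0 / (r_w * C0 * g)) * math.sqrt(L_next / L_prev)

# Hypothetical technology parameters: F/um, ohm/um, ohm, F.
params = dict(c_w=0.2e-15, r_w=1.0, R0=10e3, C0=1e-15)

inv_equal = x_opt(**params)                          # inverter, equal wires
nand_equal = x_opt(**params, g=4 / 3)                # gate with g = 4/3, equal wires
inv_unequal = x_opt(**params, L_next=400.0, L_prev=100.0)  # driven wire 4x longer

print(inv_equal, nand_equal, inv_unequal)
```

With equal wires the length ratio drops out, so the result depends only on the technology constants and g; making the driven wire four times longer than the feeding wire doubles the size.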
ULE
        Solving Design Problems with Xopt
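[Figure: wire L1 drives a repeater (input capacitance Crep, output resistance Rrep), which drives wire L2.]

$$r_w L_1\,C_{rep} = R_{rep}\,c_w L_2$$

$$x_{opt} = \sqrt{\frac{c_w R_0}{r_w C_0 g_i}}\cdot\sqrt{\frac{L_i}{L_{i-1}}}$$

- Layout constraint: optimal size of a repeater located between two wires

$$x_{rep} = \sqrt{\frac{c_w R_0}{r_w C_0}}\cdot\sqrt{\frac{L_2}{L_1}}$$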

                                                                               36
ULE
        Solving Design Problems with Xopt
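[Figure: wire L1 drives a repeater (input capacitance Crep, output resistance Rrep), which drives wire L2.]

$$r_w L_1\,C_{rep} = R_{rep}\,c_w L_2$$

$$x_{opt} = \sqrt{\frac{c_w R_0}{r_w C_0 g_i}}\cdot\sqrt{\frac{L_i}{L_{i-1}}}$$

- Cell size constraint: optimal wire length with a repeater of size xrep

$$\frac{L_2}{L_1} = x_{rep}^2\cdot\frac{r_w C_0}{c_w R_0}$$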
                                                                             37
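The two design problems are the same relation solved for different unknowns. The sketch below makes that explicit; the function names and the numeric values are hypothetical, and the repeater is assumed to be an inverter (g = 1) as in the xrep formula.

```python
import math

def repeater_size(L1, L2, c_w, r_w, R0, C0):
    """Layout constraint: wire lengths L1 (before) and L2 (after) are fixed; return x_rep."""
    return math.sqrt(c_w * R0 / (r_w * C0)) * math.sqrt(L2 / L1)

def driven_wire_length(x_rep, L1, c_w, r_w, R0, C0):
    """Cell size constraint: repeater size x_rep and L1 are fixed; return L2."""
    return L1 * x_rep**2 * (r_w * C0) / (c_w * R0)

# Hypothetical technology parameters: F/um, ohm/um, ohm, F; lengths in um.
tech = dict(c_w=0.2e-15, r_w=1.0, R0=10e3, C0=1e-15)

x = repeater_size(L1=200.0, L2=800.0, **tech)
L2_back = driven_wire_length(x_rep=x, L1=200.0, **tech)
print(x, L2_back)
```

Feeding the computed repeater size back into the second function should recover the original L2, which is a quick consistency check on the two formulas.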
ULE
                 Typical Design Example
Optimal ULE sizing

(a) similar gates, similar wires
(b) different gates, similar wires
(c) similar gates, different wires

•     Gates with higher logical effort are sized larger
•     No fixed xopt in circuits with various gates and wires

[Figure: "Optimal Sizing of the Gates". Capacitance vs. gate # (1 to 9) for three cases: similar g, similar L; different g, similar L; similar g, different L.]

Logical effort g of each gate, and length L of the wire between each gate and the next:

gate #     1      2      3      4      5      6      7      8      9
(a)  g    4/3    4/3    4/3    4/3    4/3    4/3    4/3    4/3    4/3
     L    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1
(b)  g     1     4/3    4/3    5/3    5/3    4/3    7/3     1      1
     L    0.1    0.1    0.1    0.1    0.1    0.1    0.1    0.1
(c)  g    4/3    4/3    4/3    4/3    4/3    4/3    4/3    4/3    4/3
     L    0.1    0.2    0.05   0.05   0.5    0.12   0.08   0.15




                                                                                                                                                     38
ULE
                                      ULE Results

 Simulation Setup


[Figure: critical path in a logic circuit (e.g. adder), with inputs A0 to A3 and B0 to B3, sum outputs S0 to S3, and carries C0 to C4.]




   • Compared to Cadence Virtuoso® Analog Optimizer (using numerical algorithms)
   • 65 nm CMOS




                                                                                                 39
ULE
                                     Delay Optimization
[Figure: "Delay vs. wire lengths". Delay [ps] (log scale, 10 to 10000) vs. wire segment length (100nm to 1mm) for LE, ULE, and AO.]

  Logical Effort: higher delay
  ULE: minimal delay
  Analog Optimizer: minimal delay (but slow)

  • LE becomes inaccurate as the wire length grows
  • ULE is close to the Analog Optimizer tool
      •         within 9%
                                                                                                           40
ULE
                                          Run Time Comparison
[Figure: run time [min] (log scale) vs. # of stages in path (2 to 8) for AO (1% precision), AO (5% precision), and ULE (0.1% precision).]

• ULE run time is orders of magnitude shorter than that of the Analog Optimizer
      •   ULE run time is shorter than 1 second

                                                                                                        41
ULE
           Power-Delay Optimization in ULE
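[Figure: gate gi-1 (output resistance Ri-1) drives a wire (Rwi-1, with Cwi-1/2 at each end) loaded by gate gi (input capacitance Ci, output resistance Ri), which drives a wire (Rwi, with Cwi/2 at each end) loaded by gate gi+1 (input capacitance Ci+1).]

 Power is a function of gate and wire capacitances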

                                                                   
                                  P  Ci 1  Cwi 1  Ci  h i  Cwi 1

 Optimal gate size Ci

$$C_i^3\,a_1 + C_i^2\,a_2 + C_i\,a_3 + a_4 = 0$$
                                                      Rw  Ci 1 
                                      a1  2  g i 1  i 1       
                                                                  
       P  D                                                  
                   0
        Ci                                                                                   
                                                                     Rwi 1  Ci 1  0.5  Cwi 1  Cwi               
                                           
                                                           
                                      a2   g i 1  Cwi 1  Cwi                   
                                                                                                          pw i  Ci 1 
                                                                                                                        
                                                                                                                       
                                      a3  0

                                                                       
                                      a4   g i  Cwi  Ci 1  Ci 1  Cw i          .
                                                                                                                            42
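A numerical sketch of the P×D sizing step for a single gate follows. It does not use the slide's coefficients. Instead it assumes an Elmore delay for the two stages around the gate and takes power to be proportional to the total switched capacitance (an assumption for illustration). Under these assumptions d(P·D)/dCi = 0 is also a cubic with no linear term, which is at least consistent with a3 = 0 above. The function name, parameter names and numbers are hypothetical; units are assumed to be kΩ and fF, so delays are in ps.

```python
def min_power_delay_size(Rp, Cp, Rwp, Cwp, Rwn, Cwn, Cn, g, R0, C0):
    """Return (C_pd, C_d): the gate input capacitance minimizing P*D, and D alone.

    Rp, Cp   : output resistance and input capacitance of the previous gate
    Rwp, Cwp : resistance and capacitance of the wire feeding the gate
    Rwn, Cwn : resistance and capacitance of the wire the gate drives
    Cn       : input capacitance of the next gate
    Model (assumed): D(C) = b + c*C + d/C,  P(C) = a + C
    """
    a = Cp + Cwp + Cwn + Cn                       # capacitance not affected by sizing
    b = Rp * Cwp + Rwp * Cwp / 2 + Rwn * (Cwn / 2 + Cn)
    c = Rp + Rwp                                  # resistance charging the gate input
    d = g * R0 * C0 * (Cwn + Cn)                  # gate resistance is g*R0*C0 / C

    # d(P*D)/dC = 0  <=>  2c*C^3 + (a*c + b)*C^2 - a*d = 0, increasing for C > 0.
    f = lambda C: 2 * c * C**3 + (a * c + b) * C**2 - a * d
    lo, hi = 0.0, 1.0
    while f(hi) < 0:                              # bracket the single positive root
        hi *= 2
    for _ in range(100):                          # bisection
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2, (d / c) ** 0.5

C_pd, C_d = min_power_delay_size(Rp=1.0, Cp=10.0, Rwp=0.4, Cwp=80.0,
                                 Rwn=0.4, Cwn=80.0, Cn=10.0, g=1.0, R0=10.0, C0=1.0)
print(C_pd, C_d)
```

Because the assumed power term grows with the gate size while the delay is flat around its minimum, the P×D optimum lands at a smaller gate than the delay-only optimum.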
ULE
                 Sizing for minimal P×D
 Random logic path assumed with 10 stages

[Figure: path of ten gates x1 to x10 connected by wires L1 to L9, with C1=C0 and CN=10C0.]

 Four wire length scenarios

 S1: all wires L = 100µm
 S2: all wires L = 80µm
 S3: all wires L = 400µm
 S4: L = {900,600,150,300,800,200,400,150,250}

• Power-Delay optimization reduces gate sizes as compared to Delay optimization

[Figure: "Gate Sizes vs. Optimization (S4)". Gate size (×C0), 0 to 140, vs. # of gate (1 to 10) for minimal Delay and minimal Power×Delay.]

                                                                                                                                                            43
ULE
                       Reduced Energy, Low Delay Penalty

[Figure: two bar charts over scenarios S1 to S4, each comparing minimal Power-Delay with minimal Delay. Left: "Energy", energy [pJ] (0 to 10). Right: "Delay", delay [ps] (0 to 4000).]




                                                                                                                                    44
ULE
         ULE for Branches and Fanout
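[Figure: gate gi (Ci, Cpi) drives a wire segment (Rwi, with Cwi/2 at each end) to gate gi+1 (Ci+1, Cpi+1), which drives the next wire segment (Rwi+1, with Cwi+1/2 at each end). A branch wire segment (Rwbi, with Cwbi/2 at each end) leads to a branch load Cbi+1.]

$$b_{i+1} = \frac{C_{i+1} + C_{b_{i+1}}}{C_{i+1}} \qquad h_i = \frac{C_{i+1} + C_{b_{i+1}}}{C_i} \qquad h_{w_i} = \frac{C_{w_i} + C_{wb_i}}{C_i}$$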




 General ULE condition for gate sizing

                            Rwi   bi 1  1  Rwbi      
                       gi 
                                        
                                                        Ci   hi  g i 1  hi 1  hw i 1
                                                                                                            
                                                           

                                                                                                                         45
ULE
               Sizing in Path with Branches
[Figure: logic path with C1=C0 and CN=10C0 and four branches, each a wire of length Lb ending in a load Cb.]

 Four branch scenarios

 S1: Lb = 400µm, Cb = 1 for all branches
 S2: Lb = 400µm, Cb = 30 for all branches
 S3: Lb = {400, 100, 400, 400}µm, Cb = {30,1,30,1}
 S4: Lb = {100, 100, 100, 400}µm, Cb = {1,1,1,30}

 Lw = 100µm for all wires on the critical path

• Branches change the sizing as compared to ULE without branches

[Figure: "Gate Sizing with Branches". Size (0 to 140) vs. gate # (1 to 10) for S1, S2, S3, S4, and no branches.]




                                                                                                                                           46
ULE
        Delay Optimization with Branches

[Figure: "Delay vs. Optimization". Delay [ps] (700 to 950) for scenarios S1 to S4, comparing "Branches" with "No branches".]

 • Additional delay reduction is obtained using the extended ULE condition with branches



                                                                                       47
ULE
      Unified Logical Effort Summary
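[Figure: gate gi (output resistance Ri) drives a wire (Rwi, with Cwi/2 at each end) loaded by gate gi+1 (input capacitance Ci+1, output resistance Ri+1), which drives a wire (Rwi+1, with Cwi+1/2 at each end) loaded by gate gi+2 (input capacitance Ci+2).]

$$(R_i + R_{w_i})\,C_{i+1} = R_{i+1}\,(C_{w_{i+1}} + C_{i+2})$$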

      • Useful over entire range of problems
         logic only – logic & wires – wires only

      • Computes optimal gate sizes
      • Low computational complexity
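The ULE condition can be applied to a whole path by solving it for one gate at a time and sweeping until the sizes settle. The sketch below does this under stated assumptions: gate i has input capacitance g[i]·C0·x[i] and output resistance R0/x[i], the first gate size and the final load are fixed, and repeated sweeps are used as the solver. The sweep scheme, function name and example values are illustrative choices, not the authors' implementation.

```python
import math

def ule_sizes(g, Rw, Cw, x_first, C_load, R0=1.0, C0=1.0, sweeps=200):
    """Gate sizes x[0..n-1] satisfying the ULE condition along a path.

    g[i]         : logical effort of gate i
    Rw[i], Cw[i] : resistance and capacitance of the wire that follows gate i
    x_first      : fixed size of the first gate
    C_load       : fixed load capacitance at the end of the path
    """
    n = len(g)
    x = [x_first] + [1.0] * (n - 1)
    for _ in range(sweeps):
        for j in range(1, n):
            R_prev = R0 / x[j - 1]
            C_next = C_load if j == n - 1 else g[j + 1] * C0 * x[j + 1]
            # (R_prev + Rw[j-1]) * g[j]*C0*x[j] = (R0 / x[j]) * (Cw[j] + C_next)
            x[j] = math.sqrt(R0 * (Cw[j] + C_next)
                             / ((R_prev + Rw[j - 1]) * g[j] * C0))
    return x

# Logic only (no wires): three inverters driving 64*C0.
no_wires = ule_sizes(g=[1, 1, 1], Rw=[0, 0, 0], Cw=[0, 0, 0],
                     x_first=1.0, C_load=64.0)

# Logic with wires (hypothetical values, normalized to R0 = C0 = 1).
with_wires = ule_sizes(g=[1, 4 / 3, 1], Rw=[0.05, 0.05, 0.05],
                       Cw=[20.0, 20.0, 20.0], x_first=1.0, C_load=64.0)
print(no_wires, with_wires)
```

With all wire parameters set to zero the condition reduces to equal stage efforts, so the first case should converge to the geometric progression of ordinary logical effort.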
                                                                                                                48
ULE




           One More Question:
  “When can I reduce delay by adding an inverter?”




                                                     49
ULE

      Adding an Inverter to Reduce Delay
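[Figure: gate g1 (output resistance R1) drives a wire (Rw, with Cw/2 at each end) loaded by gate g3 (input capacitance C3).]

$$D_{old} = R_1\,(C_w + C_3) + R_w\left(\frac{C_w}{2} + C_3\right)$$

[Figure: the same path with an inverter (input capacitance C2, output resistance R2) inserted between wire segment 1 (Rw1, with Cw1/2 at each end) and wire segment 2 (Rw2, with Cw2/2 at each end).]

$$D_{new} = R_1\,(C_{w1} + C_2) + R_{w1}\left(\frac{C_{w1}}{2} + C_2\right) + R_2\,(C_{w2} + C_3) + R_{w2}\left(\frac{C_{w2}}{2} + C_3\right)$$

  Condition for inverter insertion:

$$D_{old} > D_{new} \;\Rightarrow\; \frac{R_1 + R_{w1}}{2\,R_0} > \frac{2\,C_0}{C_{w2} + C_3}$$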
                                                                                                                                       50
ULE

              Inverter Addition vs. Gate Sizing




  L = 1000µm
  X1, X3 - variables




      • Inverter insertion depends on the value and ratio of the gate sizes X1 and X3
          •   Size of the inverter X2 is determined from ULE

                                                                                        51
ULE

      Inverter Addition – More Applications

$$\frac{R_1 + R_{w1}}{2\,R_0} > \frac{2\,C_0}{C_{w2} + C_3}$$

  No wires

$$\frac{R_1}{R_0} > \frac{4\,C_0}{C_3}$$

  Beneficial when the electrical effort is higher than 4

  vs. wire length

$$L_1\cdot L_2 > \frac{4\,R_0\,C_0}{r_w\,c_w}$$

  equal wires:

$$L_{cr} = 2\sqrt{\frac{R_0\,C_0}{r_w\,c_w}}$$

  Beneficial when the wire is longer than Lcr

  Power

$$D_{new} < D_{old} - \Delta\cdot D_{old}$$

  Beneficial when the expected delay reduction is more than ∆



                                                                                                            52
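The insertion condition can be checked against the two delay expressions directly. The sketch below assumes a uniform wire split at an arbitrary point, sizes the inserted inverter from the ULE condition, and ignores the inverter's own parasitic delay (as the Dold and Dnew expressions do). Function names and numbers are hypothetical; units are assumed to be kΩ, fF and µm.

```python
import math

def d_old(R1, Rw, Cw, C3):
    """Delay of gate 1 driving the whole wire and gate 3."""
    return R1 * (Cw + C3) + Rw * (Cw / 2 + C3)

def d_new(R1, Rw1, Cw1, R2, C2, Rw2, Cw2, C3):
    """Delay with an inverter (R2, C2) inserted between the two wire segments."""
    return (R1 * (Cw1 + C2) + Rw1 * (Cw1 / 2 + C2)
            + R2 * (Cw2 + C3) + Rw2 * (Cw2 / 2 + C3))

def insertion_helps(R1, Rw1, Cw2, C3, R0, C0):
    """Condition for inverter insertion."""
    return (R1 + Rw1) / (2 * R0) > 2 * C0 / (Cw2 + C3)

def compare(L1, L2, R1, C3, r_w, c_w, R0, C0):
    """Return (condition, D_old, D_new) for a wire split into lengths L1 and L2."""
    Rw1, Cw1, Rw2, Cw2 = r_w * L1, c_w * L1, r_w * L2, c_w * L2
    x2 = math.sqrt(R0 * (Cw2 + C3) / ((R1 + Rw1) * C0))   # ULE size of the inverter
    old = d_old(R1, Rw1 + Rw2, Cw1 + Cw2, C3)
    new = d_new(R1, Rw1, Cw1, R0 / x2, C0 * x2, Rw2, Cw2, C3)
    return insertion_helps(R1, Rw1, Cw2, C3, R0, C0), old, new

tech = dict(r_w=0.001, c_w=0.2, R0=10.0, C0=1.0)          # kOhm/um, fF/um, kOhm, fF
long_case = compare(L1=1000.0, L2=1000.0, R1=1.0, C3=10.0, **tech)
short_case = compare(L1=50.0, L2=50.0, R1=1.0, C3=10.0, **tech)
L_cr = 2 * math.sqrt(tech["R0"] * tech["C0"] / (tech["r_w"] * tech["c_w"]))
print(long_case, short_case, L_cr)
```

For a uniform wire the closed-form condition and the direct comparison of the two delays should agree in both directions, which is what the two cases illustrate.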
ULE
              Example: Critical Wire Length
[Figure: "Critical Length vs. ∆". Lcr [µm] (0 to 4000) vs. ∆ (0 to 0.2), Matlab and Simulation results. The points are annotated with the inverter sizes X2=135, X2=164 and X2=187.]

 • Critical length Lcr for inverter insertion depends upon the minimal delay reduction factor ∆
      •   Size of the inverter X2 is determined from ULE
                                                                                         53
 Gate-Terminated Sized
Repeater Insertion - GSRI
 “What is the optimal number of gates/repeaters?”




                                                    54
GSRI
       Revisiting Standard Repeater Insertion
   RI Assumptions
           Fixed and equal sizes
           Terminal gates are similar to repeaters



[Figure: logic followed by a wire split into segments of length L/k by repeaters of fixed, equal size h.]


  BUT
           The wires are usually located between different logic gates
           Different repeater sizes may be chosen

        Gate-Terminated Sized Repeater Insertion (GSRI) is proposed

                                                                         55
GSRI
             Delay Model of Logic with Repeaters
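[Figure: gate 1 (input capacitance C0*g1*x1, output resistance R0/x1) drives k repeaters (each with input capacitance C0*xr and output resistance R0/xr) and finally gate 2 (input capacitance C0*g2*x2). The wire is split into k+1 segments, each with resistance Rw/(k+1) and capacitance Cw/(k+1).]

$$\begin{aligned}
D ={}& 0.7\,\frac{R_0}{x_1}\left(\frac{C_w}{k+1} + C_0 x_r\right) + (k-1)\,\frac{R_0}{x_r}\left(0.4\,\frac{C_w}{k+1} + 0.7\,C_0 x_r\right) \\
&+ \frac{R_0}{x_r}\left(0.4\,\frac{C_w}{k+1} + 0.7\,C_0 g_2 x_2\right) + k\,\frac{R_w}{k+1}\left(0.4\,\frac{C_w}{k+1} + 0.7\,C_0 x_r\right) + \frac{R_w}{k+1}\cdot 0.7\,C_0 g_2 x_2
\end{aligned}$$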




                                                                                                            56
GSRI
            Delay Minimization by GSRI
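[Figure: gate 1 (input capacitance C0*g1*x1, output resistance R0/x1) drives k repeaters (each with input capacitance C0*xr and output resistance R0/xr) and finally gate 2 (input capacitance C0*g2*x2). The wire is split into k+1 segments, each with resistance Rw/(k+1) and capacitance Cw/(k+1).]

Setting the derivative of the delay to zero, with K = k + 1 the number of wire segments, gives a cubic in K:

$$\frac{\partial D}{\partial K} = 0 \;\Rightarrow\; K^3\,a_1 + K^2\,a_2 + K\,a_3 + a_4 = 0$$

$$a_1 = 0.7\,C_0 R_0$$

$$a_2 = 0$$

$$a_3 = -R_0 C_w\left(\frac{0.7}{x_1} - \frac{0.4}{x_r}\right) + 0.7\,R_w C_0\,(x_r - g_2 x_2) - 0.4\,R_w C_w$$

$$a_4 = 2\cdot 0.4\,R_w C_w$$

 RI assumptions
           - Long wires
           - Terminal gates are repeaters
           - Many repeaters (K>>1)

$$K = \sqrt{\frac{0.4\,R_w C_w}{0.7\,R_0 C_0}}$$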
                  H.B. Bakoglu, “Circuits, Interconnections and Packaging for VLSI,” Addison-Wesley, pp. 194-219, 1990   57
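Since the number of repeaters is a small integer, the GSRI optimum can also be found by evaluating the delay for each candidate k. The sketch below transcribes the delay model of the repeater chain as best it can be read from the slides, and compares the result with the closed-form RI estimate. Function names and all numeric values are hypothetical (units assumed kΩ and fF), and they are not meant to reproduce the numbers quoted in the examples.

```python
import math

def gsri_delay(k, x1, x2, g2, xr, Rw, Cw, R0, C0):
    """Delay of a wire with k >= 1 repeaters of size xr between gate 1 and gate 2."""
    n = k + 1                                  # number of wire segments
    rw, cw = Rw / n, Cw / n                    # per-segment wire resistance, capacitance
    c_rep, c_end = C0 * xr, C0 * g2 * x2       # repeater and terminal gate input caps
    return (0.7 * R0 / x1 * (cw + c_rep)                     # gate 1 drives segment 1
            + (k - 1) * R0 / xr * (0.4 * cw + 0.7 * c_rep)   # repeaters 1..k-1
            + R0 / xr * (0.4 * cw + 0.7 * c_end)             # last repeater
            + k * rw * (0.4 * cw + 0.7 * c_rep)              # segments ending at a repeater
            + rw * 0.7 * c_end)                              # segment ending at gate 2

def best_k(k_max=50, **path):
    """Integer number of repeaters that minimizes the delay."""
    return min(range(1, k_max + 1), key=lambda k: gsri_delay(k, **path))

def ri_k(Rw, Cw, R0, C0):
    """Closed-form RI estimate (long wire, terminal gates equal to the repeaters)."""
    return math.sqrt(0.4 * Rw * Cw / (0.7 * R0 * C0))

path = dict(x1=10.0, x2=10.0, g2=1.0, xr=45.0, Rw=3.0, Cw=600.0, R0=10.0, C0=1.0)
k_gsri = best_k(**path)
k_ri = ri_k(path["Rw"], path["Cw"], path["R0"], path["C0"])
print(k_gsri, k_ri)
```

The exhaustive search keeps the terminal gate sizes x1 and x2 in the model, which is the point of GSRI; the RI formula ignores them.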
GSRI
                 Example: Single Wire
                             How many repeaters?

[Figure: a single wire of length L=3mm between a gate with C1=10C0 and a gate with C2=10C0.]

         RI → 2
         GSRI → 4

       Why?
       The first gate is weaker than the repeater (the RI assumption is inaccurate)
                                                                   58
GSRI
        Number of Repeaters in Logic Path
[Figure: logic path with five wires (wire #1 to wire #5), C1=10C0 and CL=10C0.]

 - ALU critical path, 65nm process
 - Several wire length scenarios
 - ULE sizing performed before GSRI

[Figure: number of repeaters K (0 to 30) on wires #1 to #5 vs. wire lengths [mm] (0.5, 1, 2, 3, 5), for GSRI and for RI.]

          •     GSRI allows optimization of shorter wires than RI
          •     The number of repeaters per wire is not equal in GSRI:
                              - Higher electrical effort → more repeaters
                                                                                                                                59
GSRI
                      Delay Reduction by GSRI
[Figure: Delay [ps] (0 to 2500) vs. wire lengths [mm] (0.5, 1, 2, 3, 5) for Initial ULE, RI, GSRI, and ULE on repeaters.]

       ULE sizing w/o repeaters → RI/GSRI → ULE sizing on repeaters
       •            GSRI results in up to 25% delay reduction as compared to RI
       •            ULE further reduces the delay by up to 27%
                     • mostly in short wires
                                                                                    60
 GSRI
                                      GSRI Followed by ULE Sizing
[Figure, left: "Gate sizes vs. Optimization technique". Size (×C0), 0 to 300, for ULE (only the repeaters), ULE (all gates + repeaters), GSRI, and RRI. Right: "Delay and Power (normalized to RRI) vs. Optimization technique", 20% to 120%.]
                                                                                                             NOR2
                                                                       NAND3




                                                                                                                       NAND4
                     INV




                                                                                          INV




                                                                                                                               INV
                           repeater


                                      repeater


                                                 repeater


                                                            repeater




                                                                               repeater




                                                                                                  repeater




                                                                                                                                      0%
                                                                                                                                             RRI
                                                                                                                                            GSRI        ULE (only the repeaters)   ULE(all gates +repeaters)




               Two alternatives for ULE sizing
                     - Sizing of the repeaters, without sizing the gates
                            - Power-efficient
                     - Sizing of the entire path, including the gates and the repeaters
                            - Lowest delay
                                                                                                                                                                                                               61
 GSRI
                                   Using Smaller Repeaters
                     [Figure, left panel: Delay vs. Repeater size. y-axis: Delay [ps], 0 to 3000; x-axis: repeater size (258, 200, 150, 100, 50, 20); series: RI, GSRI]
                     [Figure, right panel: Power vs. Repeater size. y-axis: Power [pW], 0 to 25; x-axis: repeater size (258, 200, 150, 100, 50, 20); series: RI, GSRI]

                     17% delay reduction & 15% power reduction

                     •     Smaller size → more repeaters
                     •     Power may decrease with a higher number of smaller repeaters
                           Many smaller repeaters → reduced transition time → lower short-circuit currents
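
As a rough illustration of choosing the repeater count for a fixed repeater size, the sketch below brute-forces the number of equally spaced repeaters on a wire that starts at a logic gate and ends at a gate load. It uses a plain Elmore-style RC model with the usual 0.7/0.4 coefficients and ignores repeater self-loading; it is not the GSRI analytic expression from the talk. The function names and all parameter values are hypothetical; only the repeater sizes 258, 100 and 20 are borrowed from the plot axis.

```python
def path_delay(k, h, r_gate, c_load, r_w, c_w, r0, c0):
    """Delay of a wire cut into k+1 equal segments by k repeaters of size h.

    The first segment is driven by a logic gate (output resistance r_gate),
    the last segment ends in a gate input (capacitance c_load).
    r0 and c0 are the resistance and input capacitance of a unit repeater.
    """
    n = k + 1
    r_seg, c_seg = r_w / n, c_w / n
    total = 0.0
    for i in range(n):
        r_drv = r_gate if i == 0 else r0 / h       # terminal gate vs. repeater
        c_end = c_load if i == n - 1 else h * c0   # gate load vs. next repeater
        total += 0.7 * r_drv * (c_seg + c_end) + r_seg * (0.4 * c_seg + 0.7 * c_end)
    return total


def best_repeater_count(h, r_gate, c_load, r_w, c_w, r0, c0, k_max=50):
    """Exhaustively pick the repeater count (0..k_max) with the lowest delay."""
    return min(
        range(k_max + 1),
        key=lambda k: path_delay(k, h, r_gate, c_load, r_w, c_w, r0, c0),
    )


# Hypothetical values in ohms and farads, chosen only for illustration.
PARAMS = dict(r_gate=5e3, c_load=20e-15, r_w=1000.0, c_w=1e-12, r0=10e3, c0=1e-15)

for h in (258, 100, 20):
    k = best_repeater_count(h, **PARAMS)
    print(f"size {h}: {k} repeaters, delay {path_delay(k, h, **PARAMS) * 1e12:.1f} ps")
```

Rerunning the loop with other sizes or terminal gates shows how the chosen count shifts under this simplified model; the printed numbers are not the ones plotted in the talk.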

                                                                                                                                                        62
GSRI
                    Additional Perspective

                    [Figure: Delay vs. Area curves for RI and GSRI, with a "possible delay gap" and a "possible beneficial region"; annotations: ∆P, ∆D, DPmin, Dmin, DminGSRI, APmin, ADmin, AGSRI]




  •    GSRI may provide smaller delay with smaller repeaters than RI
  •    Power-aware RI will lead to a higher delay penalty than currently assumed

                                                                                    63
GSRI    Gate-terminated Sized Repeater Insertion
                       Summary



       • Accurate number of repeaters
          Terminal gates ≠ repeaters

       • Supports smaller repeaters
          Analytic expression – no more “rules of thumb”

       • Minimal delay
          GSRI delay < standard RI delay
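
For reference, the "standard RI" baseline is commonly described by the textbook closed-form result for a wire driven by identical, equally spaced repeaters (often attributed to Bakoglu). The sketch below assumes that is the baseline meant here. It treats every stage as a repeater, which is exactly the simplification GSRI drops by accounting for the terminal gates. Function names and parameter values are hypothetical.

```python
import math


def ri_delay(k, h, r_w, c_w, r0, c0):
    """Elmore-style delay of a wire split into k equal segments,
    each driven by a repeater h times the unit inverter (r0, c0)."""
    r_seg, c_seg = r_w / k, c_w / k
    per_stage = 0.7 * (r0 / h) * (c_seg + h * c0) + r_seg * (0.4 * c_seg + 0.7 * h * c0)
    return k * per_stage


def ri_optimum(r_w, c_w, r0, c0):
    """Closed-form optimum obtained by setting d(delay)/dk = 0 and d(delay)/dh = 0."""
    k_opt = math.sqrt(0.4 * r_w * c_w / (0.7 * r0 * c0))
    h_opt = math.sqrt(r0 * c_w / (r_w * c0))
    return k_opt, h_opt


# Hypothetical values: unit inverter of 10 kohm / 1 fF, wire of 1 kohm / 1 pF.
R0, C0 = 10e3, 1e-15
R_W, C_W = 1000.0, 1e-12

k_opt, h_opt = ri_optimum(R_W, C_W, R0, C0)
print(f"k_opt = {k_opt:.2f} (round to an integer in practice), h_opt = {h_opt:.1f}")
print(f"delay at optimum = {ri_delay(k_opt, h_opt, R_W, C_W, R0, C0) * 1e12:.1f} ps")
```

In this model the optimal count does not depend on the repeater size, so it cannot reproduce the size-dependent counts reported for GSRI.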

                                                            64
Summary of Approaches
          LGR



                ULE



                      GSRI


                             65
                     Summary
         LE – only logic             RI – only wires

We propose: general solution - logic with wires

Unified Logical Effort (ULE)
     - Fast sizing of gates in presence of interconnect
     - Intuitive conditions for minimal delay

Gate-terminated Sized Repeater Insertion (GSRI)
     - Accurate optimal number of repeaters
     - Enhanced design flexibility and smaller delay than in RI

Logic Gates as Repeaters (LGR)
     - Distribution of logic gates over interconnect
     - Delay optimization without logically-redundant repeaters
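
The logic-only LE baseline named at the top of this slide can be sketched with the standard logical effort procedure: compute the path effort, take its N-th root as the stage effort, and size the gates backward from the load. This is the textbook method without wires, not ULE itself; the function name, the example path and the capacitance values are hypothetical.

```python
import math


def le_size_path(logical_efforts, branching, c_in, c_load):
    """Size a gate chain with classical logical effort (no interconnect).

    logical_efforts[i] and branching[i] describe stage i; c_in is the input
    capacitance of the first gate and c_load the final load, in the same units.
    Returns the optimal stage effort and the input capacitance of every stage.
    """
    n = len(logical_efforts)
    path_effort = math.prod(logical_efforts) * math.prod(branching) * (c_load / c_in)
    f = path_effort ** (1.0 / n)  # equal effort per stage minimizes delay

    caps = [0.0] * n
    c_next = c_load
    for i in reversed(range(n)):
        # stage effort f = g * b * C_next / C_in, solved for C_in
        caps[i] = logical_efforts[i] * branching[i] * c_next / f
        c_next = caps[i]
    return f, caps


# Hypothetical path INV -> NAND3 -> NOR2 with the usual efforts 1, 5/3, 5/3.
f, caps = le_size_path([1.0, 5 / 3, 5 / 3], [1, 1, 1], c_in=1.0, c_load=64.0)
print(f"stage effort = {f:.2f}, input capacitances = {[round(c, 2) for c in caps]}")
```

The effort delay of the path is then N·f in units of the technology time constant, before adding parasitic delays.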
                                                                  66
           Future Work

Analyzing wire sizing
Developing power efficient heuristics
Incorporating inductance


Integration in EDA tools

                                        67

				