
```
Timing Optimization in Logic with Interconnect

Arkadiy Morgenshtein, Eby G. Friedman, Ran Ginosar, Avinoam Kolodny

Technion – Israel Institute of Technology

SLIP (System Level Interconnect Prediction) 2008
1
Intro
Timing Optimization

A → function → B

Special cases:
Logic: only gates
Interconnect: only wires
Typically, a mixture of both
2
Intro
Logic with Wires

Common example: UART design
[figure: UART design with blocks numbered 1 to 5]

3
Intro
The Interconnect Wall

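Logic w/o wires: Logic Gate Sizing (Logical Effort)
[figure: chain of gates g1 ... gN with input capacitances C1 ... CN driving load CL; h1 = C2/C1, h2 = C3/C2, ...]

Long wires: Interconnect Optimization (Repeater Insertion)
[figure: wire of length L between logic and load CL, split into k segments of length L/k by repeaters 1 ... k-1]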

4
Intro
Timing Optimization in
Logic with Interconnect

Logic w/o wires | Long wires
[figure: path from A to B in which gates with input capacitances C1, C2, C3 are separated by wires L1, L2, L3]

5
Existing Techniques

A (very) Short Tutorial

6
Intro
Logical Effort (only logic)
[figure: chain of four gates g1 ... g4 with input capacitances C1 ... C4 driving load CL; h1 = C2/C1, h2 = C3/C2, h3 = C4/C3, h4 = CL/C4; all four stage delays are equal]

Delay model: $D_i = \tau (g_i h_i + p_i)$
Optimal sizing: $Delay_i = Delay_{i+1}$, that is, $g_i h_i = g_{i+1} h_{i+1}$

$\tau$ - delay of minimal inverter, $R_0 C_0$; technology constant
$g_i$ - logical effort, gate type factor: e.g. $g_{inv} = 1$
$h_i$ - electrical effort, load driving capability
$p_i$ - parasitic effort, due to output capacitance

I. Sutherland, B. Sproull, and D. Harris, “Logical Effort - Designing Fast CMOS Circuits,” Morgan Kaufmann, 1999.
7
Intro
Limitations of Logical Effort
[figure: the four-gate chain g1 ... g4 with equal stage delays; h1 = C2/C1, h2 = C3/C2, h3 = C4/C3, h4 = CL/C4]

• No wires
• No fixed side branches

Logic with wires and branches: LE breaks down
Delay =? Delay =? Delay =? Delay

8
Intro
Repeater Insertion (only wires)
Unbroken wire: L = 5, R = 5, C = 5, so D = RC = 25 (Delay ~ Length^2)
Wire split into five segments: each L = 1, r = 1, c = 1, so D = Σrc = 5 (Delay ~ Length)

Optimal sizing: $x = \sqrt{\frac{R_0 C_w}{R_w C_0}}$
Optimal number of repeaters: $k = \sqrt{\frac{0.4 R_w C_w}{0.7 R_0 C_0}}$

R0 - effective resistance of minimal inverter
C0 - gate capacitance of minimal inverter
Rw - wire resistance
Cw - wire capacitance

H.B. Bakoglu, “Circuits, Interconnections and Packaging for VLSI,” Addison-Wesley, pp. 194-219, 1990
9
Intro
Properties of Repeater Insertion
Assumptions of basic repeater insertion (RI)

Equal size

Equal spacing

Terminal gates are similar to repeaters

[figure: logic driving a wire split into equal segments of length L/k by repeaters of fixed, equal size h]

Characteristics of RI

Number and size of repeaters are independent

Single optimal size for a given process and metal layer

10
So, What Are We Going To Do?

11
Intro
We Are Breaking The Wall
Logic w/o wires: Logical Effort
Long wires: Repeater Insertion

[figure: gates g1 ... gN with input capacitances C1 ... CN, connected through wires (Rw1, Cw1) and (Rw2, Cw2), driving load CL]

WANTED – solution for the mixed case
Challenges:
Gate placements
Gate sizes
Number of gates, repeaters

12
Our Approach to Timing Optimization
Logic Gates as Repeaters (LGR)

Gate placement (along the wire)

Unified Logical Effort (ULE)

Gate sizes

Gate-terminated Sized Repeater Insertion (GSRI)

Number of repeaters

13
Logic Gates as Repeaters - LGR
“Where should the gates be located (along the wire)?”

14
LGR
The Idea
• Problem – delay reduction in logic with wire
[figure: IN to OUT over a wire of length L]

• A solution – wire segmenting by repeaters
[figure: IN to OUT with repeaters 1 ... k-1 splitting the wire into segments of length L/k]

• Drawback – power, area w/o logical functionality = waste
• Proposed – logic gates as repeaters
[figure: IN to OUT with logic gates separated by wire segments L1, L2, L3]

LGR - distribution of logic gates over interconnect
    - driving the partitioned wire without adding repeaters

K. Venkat, “Generalized Delay Optimization of Resistive Interconnections through an Extension of Logical Effort,” ISCAS 1993
15
LGR
LGR Delay Modeling
[figure: gates i-1, i, i+1 separated by wire segments of lengths L_{i-1}, L_i, L_{i+1}]

$C_{wi} = L_i \cdot C_{int}$, $R_{wi} = L_i \cdot R_{int}$

Total Delay
$$D_{tot} = \sum_{i=1}^{N} \left[ \tau g_i \left( h_i + \frac{L_i C_{int}}{C_i} \right) + 0.5 L_i^2 R_{int} C_{int} + L_i R_{int} C_{i+1} \right]$$

M. Moreinis, A. Morgenshtein, I. Wagner, and A. Kolodny, “Logic Gates as Repeaters (LGR) for Area-Efficient Timing Optimization,” IEEE TVLSI, 2006
16
LGR
Optimal Wire Segmenting
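[figure: gates i-1, i, i+1 with output resistances R_i, R_{i+1} and input capacitances C_i, C_{i+1}, separated by wire segments L_{i-1}, L_i, L_{i+1}]

$$\frac{\partial D_{tot}}{\partial L_i} = 0, \quad \sum_{i=1}^{N} L_i = L \quad\Rightarrow\quad L_i^{opt} = \frac{L}{N} + \frac{L (R_{av} - R_i)}{R_w} + \frac{L (C_{av} - C_{i+1})}{C_w}$$

• Output resistance of driving gate i below average → wire length i is increased
• Input capacitance of successor gate i+1 above average → wire length i is decreased
• All gates are equal → equal partitioning
• In the case of a negative segment length, neighbor gates are merged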

17
LGR
LGR Results
Critical path of 8-256 decoder circuit

[bar chart: Delay [nsec], Unoptimized vs. LGR Segmenting, for Low-tier 1.5mm, Low-tier 15mm, High-tier 1.5mm, High-tier 15mm; bar labels in extraction order: 36.4, 34.6, 34.9, 25.2, 3.62, 3.47, 2.28, 2.15]

• Delay reduction of up to 27% - by “moving” the gates
• Further delay reduction – by scaling and LGR+RI

M. Moreinis, et al., “Repeater Insertion combined with LGR Methodology for on-Chip Interconnect Timing Optimization,” ICECS, 2004.
18
LGR
Optimal Gate Scaling
• Enlargement of all gates by a uniform factor s to minimize timing
• Can be performed iteratively with Segmenting

[figure: gates 1 to 4 followed by wire segments L1 to L4]

$$\frac{\partial D_{tot}}{\partial s} = 0 \quad\Rightarrow\quad s = \sqrt{\frac{\sum_{i=1}^{N} R_i C_{wi}}{\sum_{i=1}^{N} R_{wi} C_{i+1}}}$$

Equal inverters and segments: $s = \sqrt{\frac{C_w R_0}{C_0 R_w}}$

19
LGR
LGR Segmenting and Scaling
Uniform scaling performed for all gates

[bar chart: Delay [nsec], Repeater Insertion vs. LGR, for Low-tier 1.5mm, Low-tier 15mm, High-tier 1.5mm, High-tier 15mm; bar labels in extraction order: 5.45, 1.65, 0.542, 0.557, 0.268, 0.194, 0.188, 0.086]

• For intermediate wires LGR outperforms RI by up to 55%
• For long wires RI is faster
  • BUT: it requires 44 repeaters
• Best for long wires – combined LGR and RI

M. Moreinis, et al., “Repeater Insertion combined with LGR Methodology for on-Chip Interconnect Timing Optimization,” ICECS, 2004.
20
LGR
LGR Summary

• Logic gates serve as repeaters
→ No need for logically redundant repeaters

• Delay reduction + lower area/power
• Can be combined with RI

21
Unified Logical Effort - ULE
“What is the optimal size of the gates?”

22
ULE
Unified Delay Model (including wires)
[figure: two stages, each a gate g with input capacitance C driving a wire segment modeled as a series resistance Rw with Cw/2 at each end]

Capacitive interconnect effort: $h_w = \frac{C_w}{C}$
Resistive interconnect effort: $p_w = R_w (0.5 C_w + C)$

$D = \tau g (h + h_w) + p_w$

23
ULE
Minimal Delay Condition

$$\frac{\partial D}{\partial h_i} = 0 \quad\Rightarrow\quad \left( g_i + \frac{R_{wi} C_i}{R_0 C_0} \right) h_i = g_{i+1} \left( h_{i+1} + h_{w,i+1} \right)$$
                 

Minimal Delay                   Equal Stage Delays

24
ULE
Minimal Delay for Capacitive Wires

General RC interconnect:
$$\left( g_i + \frac{R_{wi} C_i}{R_0 C_0} \right) h_i = g_{i+1} \left( h_{i+1} + h_{w,i+1} \right)$$

Capacitive interconnect (short wires and branches):
$$g_i h_i = g_{i+1} \left( h_{i+1} + h_{w,i+1} \right)$$

25
ULE
ULE Convergence to LE and RI

$$\left( g_i + \frac{R_{wi} C_i}{R_0 C_0} \right) h_i = g_{i+1} \left( h_{i+1} + h_{w,i+1} \right)$$

Special cases

Logic without wires ($h_w \to 0$, $R_w \to 0$):
$$g_i h_i = g_{i+1} h_{i+1}$$

Repeater insertion ($h = 1$, $g = 1$):
$$x_{opt} = \frac{C_i}{C_0} = \sqrt{\frac{R_0 C_{wi}}{R_{wi} C_0}}$$

26
ULE
Some Algebra…
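[figure: gates g_i, g_{i+1}, g_{i+2} with output resistances R_i, R_{i+1} and input capacitances C_{i+1}, C_{i+2}; each wire modeled as a series resistance (R_{wi}, R_{w,i+1}) with half of its capacitance (C_{wi}/2, C_{w,i+1}/2) at each end]

$$\left( g_i + \frac{R_{wi} C_i}{R_0 C_0} \right) h_i = g_{i+1} \left( h_{i+1} + h_{w,i+1} \right)$$

$$\left( \frac{R_i C_i}{R_0 C_0} + \frac{R_{wi} C_i}{R_0 C_0} \right) \frac{C_{i+1}}{C_i} = \frac{R_{i+1} C_{i+1}}{R_0 C_0} \left( \frac{C_{i+2}}{C_{i+1}} + \frac{C_{w,i+1}}{C_{i+1}} \right)$$

$$(R_i + R_{wi}) \cdot C_{i+1} = R_{i+1} \cdot (C_{i+2} + C_{w,i+1})$$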
27
ULE
Intuition of ULE Optimum
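[figure: gate i+1 between two wires; driving side R_i, R_{wi}, C_{wi}/2; gate input capacitance C_{i+1} and output resistance R_{i+1}; load side R_{w,i+1}, C_{w,i+1}/2, C_{i+2}]

Optimal size:
$$(R_i + R_{wi}) \cdot C_{i+1} = R_{i+1} \cdot (C_{w,i+1} + C_{i+2})$$
that is, $D_C = D_R$

Delay caused by gate capacitance should be equal to delay caused by gate resistance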

28
ULE
ULE Optimality
[plot: Delay vs. Sizing Factor; delay (×10^-11, 0.5 to 3.5) vs. sizing factor (20 to 180); curves Dtot, DR, DC and D*; Dmin marked at the optimum; size too small: high resistance; size too big: high capacitance]

29
ULE
Optimal Gate Capacitance

$$C_{i+1} = \sqrt{C_i C_{i+2}} \cdot \sqrt{1 + \frac{C_{w,i+1}}{C_{i+2}}} \cdot \sqrt{\frac{g_{i+1}}{g_i + \frac{R_{wi} C_i}{R_0 C_0}}}$$

First factor: LE
Second factor: wire capacitance
Third factor: logical efforts and wire resistance

• Expression for size of a single gate
• Gate sizes along a logic path are iteratively determined

30
ULE
Examples (1): ULE Sizing
N=9, L={0, 0.01, 0.05, 0.1, 0.5, 1}mm

[figure: path of nine gates, each with g=4/3, separated by equal wires of length L; C1=10C0, CN=100C0]

Equal wires
Total electrical effort H = 10

• L = 0 → Size converges to LE
• Longer wires → ULE is faster
• Long wires → Fixed sizing xopt

[plot: Capacitance (×C0), 10 to 100, vs. Gate # (1 to 9); curves from L=0 (LE) to L=1mm, with the xopt level marked]
31
ULE
Examples (2): ULE Sizing
N=9, L={0, 0.01, 0.05, 0.1, 0.5, 1}mm

[figure: path of nine gates, each with g=4/3, separated by equal wires of length L; C1=10C0, CN=10C0]

Total electrical effort H = 1

• L = 0 → Converges to LE (no scaling)
• All wire lengths → ULE is faster
• Long wires → Fixed sizing xopt

[plot: Capacitance (×C0), 10 to 60, vs. Gate # (1 to 9); curves for L = 0 (LE), 10 µm, 50 µm, 100 µm, 0.5 mm and 1 mm, with the xopt level marked]

32
ULE
So, What is Xopt ?
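[figure: gate g_i with input capacitance C_i and output resistance R_i, between wire i-1 (R_{w,i-1}, C_{w,i-1}) and wire i (R_{wi}, C_{wi}); marked products: R_{w,i-1} C_i and R_i C_{wi}]

$$C_i = \sqrt{C_{i-1} C_{i+1}} \cdot \sqrt{1 + \frac{C_{wi}}{C_{i+1}}} \cdot \sqrt{\frac{g_i}{g_{i-1} + \frac{R_{w,i-1} C_{i-1}}{R_0 C_0}}}$$

For long wires:
$$x_{opt,i} = \frac{C_i^{opt}}{C_i^{min}} = \sqrt{\frac{c_w R_0}{r_w C_0 g_i} \cdot \frac{L_i}{L_{i-1}}}$$
where the factor $\frac{c_w R_0}{r_w C_0 g_i}$ is constant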

33
ULE
Optimum Condition for Long Wires
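[figure: gate g_i with input capacitance C and output resistance R, between two wires (Rw, Cw); marked products: Rw·C and R·Cw]

$$(R + R_w) \cdot C = R \cdot (C_w + C)$$

For long wires:
$$R_w \cdot C = R \cdot C_w$$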

34
ULE
Xopt and Repeaters
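[figure: repeater with input capacitance Crep and output resistance Rrep between two wires (Rw, Cw); marked products: Rw·Crep and Rrep·Cw]

Optimal sizing condition for repeater:
$$R_w \cdot C_{rep} = R_{rep} \cdot C_w$$

General: $x_{opt,i} = \sqrt{\frac{c_w R_0}{r_w C_0 g_i} \cdot \frac{L_i}{L_{i-1}}}$
Equal wires: $x_{opt,i} = \sqrt{\frac{R_0 c_w}{r_w C_0 g_i}}$
INV (g=1): $x_{opt} = \sqrt{\frac{R_0 c_w}{r_w C_0}}$

H.B. Bakoglu, “Circuits, Interconnections and Packaging for VLSI,” Addison-Wesley, pp. 194-219, 1990
35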
ULE
Solving Design Problems with Xopt
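[figure: repeater (Crep, Rrep) between wires of lengths L1 and L2]

$$r_w L_1 \cdot C_{rep} = R_{rep} \cdot c_w L_2$$

$$x_{opt} = \sqrt{\frac{c_w R_0}{r_w C_0 g_i} \cdot \frac{L_i}{L_{i-1}}}$$

- Layout constraint - optimal size of the repeater located between two wires

$$x_{rep} = \sqrt{\frac{c_w R_0}{r_w C_0} \cdot \frac{L_2}{L_1}}$$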

36
ULE
Solving Design Problems with Xopt
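[figure: repeater (Crep, Rrep) between wires of lengths L1 and L2]

$$r_w L_1 \cdot C_{rep} = R_{rep} \cdot c_w L_2$$

$$x_{opt} = \sqrt{\frac{c_w R_0}{r_w C_0 g_i} \cdot \frac{L_i}{L_{i-1}}}$$

- Cell size constraint - optimal wire length with a repeater of size xrep

$$x_{rep}^2 = \frac{L_2}{L_1} \cdot \frac{c_w R_0}{r_w C_0}$$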
37
ULE
Typical Design Example
Optimal ULE sizing

(a) similar gates, similar wires: g = 4/3 for all 9 gates; L = 0.1 for all 8 wires
(b) different gates, similar wires: g = {1, 4/3, 4/3, 5/3, 5/3, 4/3, 7/3, 1, 1}; L = 0.1 for all 8 wires
(c) similar gates, different wires: g = 4/3 for all 9 gates; L = {0.1, 0.2, 0.05, 0.05, 0.5, 0.12, 0.08, 0.15}

[plot: Optimal Sizing of the Gates; capacitance (0 to 120) vs. gate # (1 to 9); curves: similar g, similar L; different g, similar L; similar g, different L]

• Gates with higher logical effort get bigger size
• No fixed xopt in circuits with various gates and wires

38
ULE
ULE Results

Simulation Setup

Critical path in a logic circuit (e.g. Adder)
[figure: 4-bit adder with inputs A0 to A3 and B0 to B3, sum outputs S0 to S3, and carry chain C0 to C4]

• Compared to Cadence Virtuoso® Analog Optimizer (using numerical algorithms)
• 65 nm CMOS

39
ULE
Delay Optimization
Logical Effort: higher delay
ULE: minimal delay
Analog Optimizer: minimal delay (but sloooooow)

[plot: Delay vs. wire lengths; delay [ps] (log scale, 10 to 10000) vs. wire segment lengths (100nm, 1µm, 10µm, 50µm, 100µm, 500µm, 1mm); curves: LE, ULE, AO]

• LE becomes inaccurate as the wire length grows
• ULE is close to the Analog Optimizer tool
  • within 9%
40
ULE
Run Time Comparison
[plot: Run time [min] (log scale) vs. # of stages in path (2 to 8); curves: AO (1% precision), AO (5% precision), ULE (0.1% precision)]

• ULE run time is orders of magnitude shorter than the run time of Analog Optimizer
  • ULE run time is shorter than 1 second

41
ULE
Power-Delay Optimization in ULE
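[figure: gates g_{i-1}, g_i, g_{i+1}; output resistances R_{i-1}, R_i; input capacitances C_i, C_{i+1}; wires (R_{w,i-1}, C_{w,i-1}) and (R_{wi}, C_{wi}), each with half of its capacitance at each end]

Power is a function of gate and wire capacitances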

                      
P  Ci 1  Cwi 1  Ci  h i  Cwi 1

Optimal gate size Ci

$C_i^3 \cdot a_1 + C_i^2 \cdot a_2 + C_i \cdot a_3 + a_4 = 0$
         Rw  Ci 1 
a1  2  g i 1  i 1       
                     
 P  D                                                  
0
Ci                                                                                   
Rwi 1  Ci 1  0.5  Cwi 1  Cwi               


a2   g i 1  Cwi 1  Cwi                   
 pw i  Ci 1 

                                                                            
$a_3 = 0$

                  
a4   g i  Cwi  Ci 1  Ci 1  Cw i          .
42
ULE
Sizing for minimal P×D
Random logic path assumed with 10 stages
[figure: gates x1 ... x10 connected by wires L1 ... L9; C1=C0, CN=10C0]

Four wire length scenarios
S1: all wires L = 100µm
S2: all wires L = 80µm
S3: all wires L = 400µm
S4: L = {900,600,150,300,800,200,400,150,250}

[plot: Gate Sizes vs. Optimization (S4); gate size (×C0), 0 to 140, vs. # of gate (1 to 10); curves: minimal Delay, minimal Power×Delay]

• Power-Delay optimization reduces gate sizes as compared to Delay optimization

43
ULE
Reduced Energy, Low Delay Penalty

[bar charts: Energy [pJ] (0 to 10) and Delay [ps] (0 to 4000) per scenario S1 to S4; series in both charts: minimal Power-Delay, minimal Delay]

44
ULE
ULE for Branches and Fanout
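[figure: gates g_i and g_{i+1} with input capacitances C_i, C_{i+1} and capacitances C_{pi}, C_{p,i+1}; wire segments (R_{wi}, C_{wi}) and (R_{w,i+1}, C_{w,i+1}); a branch wire segment (R_{wbi}, C_{wbi}) loaded by C_{b,i+1}]

$b_{i+1} = \frac{C_{i+1} + C_{b,i+1}}{C_{i+1}}$
$h_i = \frac{C_{i+1} + C_{b,i+1}}{C_i}$
$h_{wi} = \frac{C_{wi} + C_{wbi}}{C_i}$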

General ULE condition for gate sizing

      Rwi   bi 1  1  Rwbi      
 gi 
                  
 Ci   hi  g i 1  hi 1  hw i 1
                                                
                                     

45
ULE
Sizing in Path with Branches
[figure: critical path from C1=C0 to CN=10C0 with four side branches, each a wire of length Lb loaded by Cb]

Four branch scenarios
S1: Lb = 400µm, Cb = 1 for all branches
S2: Lb = 400µm, Cb = 30 for all branches
S3: Lb = {400, 100, 400, 400}µm, Cb = {30,1,30,1}
S4: Lb = {100, 100, 100, 400}µm, Cb = {1,1,1,30}
Lw = 100µm for all wires at critical path

[plot: Gate Sizing with Branches; size (0 to 140) vs. gate # (1 to 10); curves: S1, S2, S3, S4, no branches]

• Branches cause a change in sizing as compared to ULE without branches

46
ULE
Delay Optimization with Branches

[bar chart: Delay vs. Optimization; delay [ps] (700 to 950) per scenario S1 to S4; series: Branches, No branches]

• Additional delay reduction is obtained using extended ULE condition with branches

47
ULE
Unified Logical Effort Summary
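[figure: gates g_i, g_{i+1}, g_{i+2} with output resistances R_i, R_{i+1} and input capacitances C_{i+1}, C_{i+2}; wires (R_{wi}, C_{wi}) and (R_{w,i+1}, C_{w,i+1}), each with half of its capacitance at each end]

$$(R_i + R_{wi}) \cdot C_{i+1} = R_{i+1} \cdot (C_{w,i+1} + C_{i+2})$$

• Useful over entire range of problems
→ logic only – logic & wires – wires only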

• Computes optimal gate sizes
• Low computational complexity
48
ULE

One More Question:
“When can I reduce delay by adding an inverter?”

49
ULE

Adding an Inverter to Reduce Delay
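Without the inverter:
[figure: gate g1 (output resistance R1) driving gate g3 (input capacitance C3) through a wire Rw with Cw/2 at each end]

$$D_{old} = R_1 (C_w + C_3) + R_w \left( \frac{C_w}{2} + C_3 \right)$$

With the inverter:
[figure: the same path with an inverter (input capacitance C2, output resistance R2) splitting the wire into (Rw1, Cw1) and (Rw2, Cw2)]

$$D_{new} = R_1 (C_{w1} + C_2) + R_{w1} \left( \frac{C_{w1}}{2} + C_2 \right) + R_2 (C_{w2} + C_3) + R_{w2} \left( \frac{C_{w2}}{2} + C_3 \right)$$

Condition for inverter insertion:
$$D_{old} > D_{new} \quad\Rightarrow\quad \frac{R_1 + R_{w1}}{2 R_0} > \frac{2 C_0}{C_{w2} + C_3}$$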
50
ULE

L = 1000µm
X1, X3 - variables

• Inverter insertion depends on the value and ratio of the gate sizes X1 and X3
•   Size of the inverter X2 is determined from ULE

51
ULE

$$\frac{R_1 + R_{w1}}{2 R_0} > \frac{2 C_0}{C_{w2} + C_3}$$

No wires:
$$\frac{R_1}{R_0} > \frac{4 C_0}{C_3}$$
Beneficial when the electrical effort is higher than 4

Equal wires:
$$L_1 = L_2 > \sqrt{\frac{4 R_0 C_0}{r_w c_w}}, \qquad L_{cr} = 2 \sqrt{\frac{R_0 C_0}{r_w c_w}}$$
Beneficial when the wire is longer than Lcr

Power vs. wire length:
$$D_{new} < D_{old} - \Delta \cdot D_{old}$$
Beneficial when the expected delay reduction is more than ∆

52
ULE
Example: Critical Wire Length
[plot: Critical Length vs. ∆; Lcr [µm] (0 to 4000) vs. ∆ (0 to 0.2); curves: Matlab, Simulation; points labeled X2=135, X2=164, X2=187]

• Critical length Lcr for inverter insertion depends upon the minimal delay reduction
factor ∆
•   Size of the inverter X2 is determined from ULE
53
Gate-Terminated Sized
Repeater Insertion - GSRI
“What is the optimal number of gates/repeaters?”

54
GSRI
Revisiting Standard Repeater Insertion
RI Assumptions
Fixed and equal sizes
Terminal gates are similar to repeaters

[figure: logic driving a wire split into equal segments of length L/k by repeaters of fixed, equal size h]

BUT
The wires are usually located between different logic gates
Different repeater sizes may be chosen

Gate-Terminated Sized Repeater Insertion (GSRI) is proposed

55
GSRI
Delay Model of Logic with Repeaters
[figure: gate 1 (input capacitance C0*g1*x1, output resistance R0/x1) driving gate 2 (input capacitance C0*g2*x2) through a wire split by k repeaters (each with R0/xr and C0*xr) into k+1 segments, each with Rw/(k+1) and Cw/(k+1)]

$$D = 0.7 \frac{R_0}{x_1} \left( \frac{C_w}{k+1} + C_0 x_r \right) + (k-1) \frac{R_0}{x_r} \left( 0.4 \frac{C_w}{k+1} + 0.7 C_0 x_r \right) + \frac{R_0}{x_r} \left( 0.4 \frac{C_w}{k+1} + 0.7 C_0 g_2 x_2 \right) + k \frac{R_w}{k+1} \left( 0.4 \frac{C_w}{k+1} + 0.7 C_0 x_r \right) + \frac{R_w}{k+1} \cdot 0.7 C_0 g_2 x_2$$

56
GSRI
Delay Minimization by GSRI
[figure: gate 1 driving gate 2 through a wire split by k repeaters into k+1 segments, each with Rw/(k+1) and Cw/(k+1)]

$$\frac{\partial D}{\partial K} = 0 \quad\Rightarrow\quad K^3 a_1 + K^2 a_2 + K a_3 + a_4 = 0$$

$a_1 = 0.7 \cdot C_0 \cdot R_0$
$a_2 = 0$
$a_3 = -R_0 C_w \left( \frac{0.7}{x_1} - \frac{0.4}{x_r} \right) + 0.7 R_w C_0 (x_r - g_2 x_2) - 0.4 R_w C_w$
$a_4 = 2 \cdot 0.4 \cdot R_w \cdot C_w$

RI assumptions
- Long wires
- Terminal gates are repeaters
- Many repeaters (K>>1)

$$K = \sqrt{\frac{0.4 R_w C_w}{0.7 R_0 C_0}}$$

H.B. Bakoglu, “Circuits, Interconnections and Packaging for VLSI,” Addison-Wesley, pp. 194-219, 1990
57
GSRI
Example: Single Wire
How many repeaters?
[figure: wire of length L=3mm between gates with C1=10C0 and C2=10C0]

RI → 2
GSRI → 4
Why?
The first gate is weaker than the repeater
(RI assumption is inaccurate)
58
GSRI
Number of Repeaters in Logic Path
[figure: logic path from C1=10C0 to CL=10C0 with five wires, wire #1 to wire #5]

- ALU critical path, 65nm process
- Several wire length scenarios
- ULE sizing performed before GSRI

[bar charts: K, 0 to 30, vs. wire lengths [mm] (0.5, 1, 2, 3, 5), stacked by wire #1 to wire #5; one chart for GSRI, one for RI]

• GSRI allows optimization of shorter wires than RI
• The number of repeaters per wire is not equal in GSRI:
  - Higher electrical effort → more repeaters
59
GSRI
Delay Reduction by GSRI
[bar chart: Delay [ps] (0 to 2500) vs. wire lengths [mm] (0.5, 1, 2, 3, 5); series: Initial ULE, RI, GSRI, ULE on repeaters]

ULE sizing w/o repeaters → RI/GSRI → ULE sizing on repeaters
• GSRI results in up to 25% delay reduction as compared to RI
• ULE further reduces the delay by up to 27%
  • mostly in short wires
60
GSRI
GSRI Followed by ULE Sizing
[bar chart: Gate sizes vs. Optimization technique; Size (×C0), 0 to 300; x-axis labels: NOR2, NAND3, NAND4, INV, repeater; series: RRI, GSRI, ULE (only the repeaters), ULE (all gates + repeaters)]
[bar chart: Delay and Power (normalized to RRI) vs. Optimization technique; 0% to 120%; techniques: RRI, GSRI, ULE (only the repeaters), ULE (all gates + repeaters); series: Delay, Power]

Two alternatives for ULE sizing
- Sizing of the repeaters, without sizing the gates
  - Power-efficient
- Sizing of the entire path, including the gates and the repeaters
  - Lowest delay
61
GSRI
Using Smaller Repeaters
[plots: Delay vs. Repeater size, delay [ps] from 0 to 3000, and Power vs. Repeater size, power [pW] from 0 to 25; repeater sizes 258, 200, 150, 100, 50, 20; curves in both plots: RI, GSRI]

17% delay reduction & 15% power reduction

• Smaller size → more repeaters
• Power may decrease for higher number of smaller repeaters
  Many smaller repeaters → reduced transition time → lower short-circuit currents

62
GSRI

[plot: Delay curves for RI and GSRI; marked: Dmin, DminGSRI, DPmin, possible delay gap ∆D, possible beneficial region ∆P]

•    GSRI may provide smaller delay with smaller repeaters than RI
•    Power-aware RI will lead to higher delay penalty than currently assumed

63
GSRI    Gate-terminated Sized Repeater Insertion
Summary

• Accurate number of repeaters
→ Terminal gates ≠ repeaters

• Supports smaller repeaters
→ Analytic expression – no more “rules of thumb”

• Minimal delay
→ GSRI delay < standard RI delay

64
Summary of Approaches
LGR

ULE

GSRI

65
Summary
LE – only logic             RI – only wires

We propose: general solution - logic with wires

Unified Logical Effort (ULE)
- Fast sizing of gates in presence of interconnect
- Intuitive conditions for minimal delay

Gate-terminated Sized Repeater Insertion (GSRI)
- Accurate optimal number of repeaters
- Enhanced design flexibility and smaller delay than in RI

Logic Gates as Repeaters (LGR)
- Distribution of logic gates over interconnect
- Delay optimization without logically-redundant repeaters
66
Future Work

Analyzing wire sizing
Developing power efficient heuristics
Incorporating inductance

Integration in EDA tools

67
68

```
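
The ULE slides state that gate sizes along a path are determined iteratively from the single-gate expression for the optimal capacitance. A minimal sketch of that iteration follows. It assumes normalized units (tau = R0*C0 = 1 by default), fixed first and last capacitances, and a starting guess of geometrically spaced sizes. The function names `path_delay` and `ule_size`, the argument layout, and the stopping rule are illustrative choices, not taken from the slides.

```python
import math


def path_delay(C, g, Rw, Cw, tau=1.0):
    """Total path delay: sum over stages of tau*g*(h + hw) + pw.

    C      : capacitances; C[i] is the input of gate i, C[-1] is the load
    g      : logical efforts of the len(C)-1 driving gates
    Rw, Cw : resistance and capacitance of the wire after each gate
    tau    : R0*C0, delay of the minimal inverter (assumed normalized)
    """
    total = 0.0
    for i in range(len(C) - 1):
        h = C[i + 1] / C[i]                      # electrical effort
        hw = Cw[i] / C[i]                        # capacitive interconnect effort
        pw = Rw[i] * (0.5 * Cw[i] + C[i + 1])    # resistive interconnect effort
        total += tau * g[i] * (h + hw) + pw
    return total


def ule_size(c_first, c_last, g, Rw, Cw, tau=1.0, max_sweeps=1000, tol=1e-12):
    """Sweep the single-gate ULE expression over the interior gates until
    the sizes stop changing. The first and last capacitances stay fixed."""
    n = len(g) + 1
    ratio = c_last / c_first
    # Illustrative starting point: geometric progression between the ends.
    C = [c_first * ratio ** (i / (n - 1)) for i in range(n)]
    C[-1] = c_last
    for _ in range(max_sweeps):
        worst = 0.0
        for j in range(1, n - 1):
            new = math.sqrt(
                C[j - 1] * C[j + 1]
                * (1.0 + Cw[j] / C[j + 1])
                * g[j] / (g[j - 1] + Rw[j - 1] * C[j - 1] / tau)
            )
            worst = max(worst, abs(new - C[j]) / C[j])
            C[j] = new
        if worst < tol:
            break
    return C


# Hypothetical example values in normalized units (not from the slides).
g = [4 / 3] * 4
Rw = [0.05] * 4
Cw = [20.0] * 4
sized = ule_size(10.0, 100.0, g, Rw, Cw)
le_like = [10.0 * 10.0 ** (i / 4) for i in range(5)]
print(path_delay(sized, g, Rw, Cw), path_delay(le_like, g, Rw, Cw))
```

With all wire resistances and capacitances set to zero, each update reduces to the geometric mean of the neighboring capacitances, which is the Logical Effort sizing.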
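
The inverter-insertion slides compare the delay of a wire with and without an added inverter and reduce the comparison to one inequality. The sketch below evaluates both delay expressions for a uniform wire split in half and checks them against that inequality and against the critical length for equal wires. It assumes normalized units (R0 = C0 = 1) and sizes the inverter from the ULE condition (R1 + Rw1)·C2 = R2·(Cw2 + C3). All function names and the numeric values are hypothetical.

```python
import math


def delay_without_inverter(R1, Rw, Cw, C3):
    """D_old: driver R1, wire (Rw, Cw), load C3."""
    return R1 * (Cw + C3) + Rw * (Cw / 2 + C3)


def delay_with_inverter(R1, Rw1, Cw1, Rw2, Cw2, C3, x2, R0=1.0, C0=1.0):
    """D_new: an inverter of size x2 splits the wire into two segments."""
    R2, C2 = R0 / x2, C0 * x2
    return (R1 * (Cw1 + C2) + Rw1 * (Cw1 / 2 + C2)
            + R2 * (Cw2 + C3) + Rw2 * (Cw2 / 2 + C3))


def inverter_size(R1, Rw1, Cw2, C3, R0=1.0, C0=1.0):
    """Size x2 satisfying (R1 + Rw1)*C2 = R2*(Cw2 + C3)."""
    return math.sqrt(R0 * (Cw2 + C3) / ((R1 + Rw1) * C0))


def insertion_helps(R1, Rw1, Cw2, C3, R0=1.0, C0=1.0):
    """Condition for inverter insertion: (R1+Rw1)/(2*R0) > 2*C0/(Cw2+C3)."""
    return (R1 + Rw1) / (2 * R0) > 2 * C0 / (Cw2 + C3)


def critical_length(rw, cw, R0=1.0, C0=1.0):
    """Lcr = 2*sqrt(R0*C0/(rw*cw)) for equal wires."""
    return 2 * math.sqrt(R0 * C0 / (rw * cw))


def compare_insertion(L, rw, cw, R1, C3):
    """Return (D_old, D_new, criterion) for a uniform wire split in half."""
    Rw, Cw = rw * L, cw * L
    Rw1 = Rw2 = Rw / 2
    Cw1 = Cw2 = Cw / 2
    x2 = inverter_size(R1, Rw1, Cw2, C3)
    d_old = delay_without_inverter(R1, Rw, Cw, C3)
    d_new = delay_with_inverter(R1, Rw1, Cw1, Rw2, Cw2, C3, x2)
    return d_old, d_new, insertion_helps(R1, Rw1, Cw2, C3)


# Hypothetical per-unit-length wire parameters in normalized units.
rw, cw = 0.01, 0.5
print(compare_insertion(100.0, rw, cw, R1=0.1, C3=10.0))  # long wire
print(compare_insertion(2.0, rw, cw, R1=0.1, C3=10.0))    # short wire
print(critical_length(rw, cw))
```

For a uniform wire, the difference D_old - D_new at the ULE-sized inverter is positive exactly when the inequality holds, so the criterion and the direct delay comparison agree in both cases.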