 Post-layout Interconnect Optimization
 by Simultaneous Gate and Wire Sizing
    Based on Lagrangian Relaxation
• Jiang, Chang, Jou, “Crosstalk-driven Interconnect Optimization,”
  IEEE TCAD, Sep. 2000.
• Jiang, Jou, Chang, “Noise-constrained Performance Optimization
  by Simultaneous Gate and Wire Sizing Based on Lagrangian
  Relaxation,” DAC-99.
• Jiang, Pan, Chang, Jou, “Optimal Reliable Crosstalk-driven
  Interconnect Optimization,” ISPD-2000.
• Chen, Chang, Wong, “Fast Performance-driven Optimization for
  Buffered Clock Trees Based on Lagrangian Relaxation,” DAC-96.
 Technology Drives Post-layout
      Optimization Needs




• Workload and design complexity double every 18
  months.
        Post-Layout Optimization
• Trends
   –   Increased custom design for high-performance circuits
   –   Aggressive tuning for performance improvement
   –   Shorter time-to-market
   –   Severe interconnect effects
   –   Signal integrity issues

• Post-layout circuit component tuning
   – Can significantly improve circuit performance and signal integrity
     without major modification
                    Manual Sizing
• Pros
  – Takes advantage of human experience
  – Is reliable
  – Can be combined with other optimization techniques
• Cons
  – Is slow, tedious, limited, and error-prone
  – Relies too much on experience; requires solid training
  – Cannot guarantee optimality (no way to know when to stop)
[Figure: manual tuning loop: Change -> Simulate -> Satisfy?, repeated for 1000+ iterations]
      Automatic Circuit Tuning
• Pros
  – Is fast
  – Can achieve the best performance with interconnect
    considerations
  – Can explore other objectives (power/delay/noise tradeoff)
  – Can boost productivity
  – Can guarantee optimality (for convex problems)
  – Can ensure timing and reliability

• Cons
  – Complicated tool development and support ($$)
  – Tool testing, integration, and training
         Good Tuning Algorithm
•   Fast
•   Optimal (for convex problems)
•   Versatile
•   Easy to use
•   Solution quality index (error bound to the optimal solution)
•   Simple (easy to develop and maintain)
   A Simple Sizing Problem
• Minimize the maximum delay Dmax by
  changing w1,…,wn
[Figure: circuit with inputs a and b, component sizes w1 to w11, and output delay constraints D1 < Dmax and D2 < Dmax]
        Existing Sizing Works
• Algorithm: fast, non-optimal for general problems
   – TILOS (Fishburn, Dunlop, ICCAD-85)
   – Weighted Delay Optimization (Cong et al., ICCAD-95)


• Mathematical Programming: slower, optimal
   – Geometric Programming (TILOS)
   – Augmented Lagrangian (Marple et al., 86)
   – Sequential Linear Programming (Sapatnekar et al.)
   – Interior Point Method (Sapatnekar et al., TCAD-93)
   – Sequential Quadratic Programming (Menezes et al., DAC-95)
   – Augmented Lagrangian + Adjoint Sensitivity (Visweswariah et
     al., ICCAD-96, ICCAD-97)


• Is there any method that is fast and optimal?
            Converge?
[Figure: two families of methods with a "?" in the gap between them. Algorithm (fast): TILOS, Weighted Delay. Mathematical Programming (optimal): Augmented Lagrangian, SQP, SLP.]
   TILOS: Heuristic Approach
• Finds sensitivities associated with each gate
• Up-sizes the gate with the maximum sensitivity
• Minimizes the objective function

[Figure: example circuit with inputs a and b and component sizes w1 to w11; objective: minimize Dmax subject to D1 < Dmax and D2 < Dmax]
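
The three TILOS steps above can be sketched as a greedy loop. This is a minimal illustration, not the TILOS implementation: the gate-chain delay model, the names `chain_delay` and `tilos_like`, and the 10% up-sizing step are all assumptions made for this example.

```python
def chain_delay(sizes, r_unit=1.0, c_unit=1.0, c_load=10.0):
    """Delay of a gate chain (assumed model): gate i has resistance
    r_unit / x_i and drives the input capacitance of gate i+1;
    the last gate drives c_load."""
    total = 0.0
    for i, x in enumerate(sizes):
        load = c_unit * sizes[i + 1] if i + 1 < len(sizes) else c_load
        total += (r_unit / x) * load
    return total


def tilos_like(sizes, target, step=1.1, max_iter=1000):
    """Greedy sizing: repeatedly up-size the gate with the largest
    delay reduction per unit of added size (its sensitivity)."""
    sizes = list(sizes)
    for _ in range(max_iter):
        current = chain_delay(sizes)
        if current <= target:
            break
        best_gain, best_i = 0.0, None
        for i in range(len(sizes)):
            trial = sizes.copy()
            trial[i] *= step
            gain = (current - chain_delay(trial)) / (trial[i] - sizes[i])
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:  # no up-sizing move reduces the delay
            break
        sizes[best_i] *= step
    return sizes


initial = [1.0, 1.0, 1.0]
tuned = tilos_like(initial, target=8.0)
print(chain_delay(initial), chain_delay(tuned))
```

The loop stops as soon as the target is met, which reflects the heuristic nature of the approach: it finds a feasible sizing quickly but gives no optimality guarantee.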
  Weighted Delay Optimization
• Cong et al., ICCAD-95
• Sizes one wire at a time in DFS order
• Minimizes the weighted delay λ1D1 + λ2D2
• Open question: how to choose the best weights?
[Figure: drivers connected through wires w1 to w5 to two loads with weighted delays λ1D1 and λ2D2]
 Mathematical Programming
• Formulation:



• Lagrangian:

• Optimality (Necessary) Condition (Kuhn-Tucker
Condition):
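
For reference, the three items on this slide can be written in standard textbook form. The symbols f (objective), g_i (constraints, matching the gi(x) used on the next slide), and m (number of constraints) are notation assumed here rather than taken from the slide.

```latex
\begin{align*}
\text{Formulation:}\quad & \min_{x} \; f(x) \quad \text{subject to} \quad g_i(x) \le 0,\; i = 1,\dots,m \\
\text{Lagrangian:}\quad & L(x,\lambda) = f(x) + \sum_{i=1}^{m} \lambda_i\, g_i(x), \qquad \lambda_i \ge 0 \\
\text{Kuhn-Tucker:}\quad & \nabla f(x) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x) = 0, \qquad \lambda_i\, g_i(x) = 0, \qquad g_i(x) \le 0
\end{align*}
```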
    Lagrangian Relaxation Theory

• LRS (Lagrangian Relaxation Subproblem)
• There exist Lagrangian multipliers λ that lead LRS to
  the optimal solution of the convex programming
  problem (in which the gi(x)’s are posynomials).
• The optimal value of any LRS is a lower bound on the optimal
  value of the original problem, for any type of problem
    Lagrangian Relaxation
[Figure: plot of the Lagrangian Lλ against the multiplier λ]
    Lagrangian Relaxation
[Figure: Lagrangian Relaxation placed between the two families of methods. Algorithm: TILOS, Weighted Delay. Mathematical Programming: Augmented Lagrangian, SQP, SLP.]
Lagrangian Relaxation
     Framework
[Figure: flowchart loop: Weighted Delay Optimization -> Converge? -> if not converged, Update Multipliers and repeat]
      Lagrangian Relaxation
           Framework
More Critical -> More Resource -> Larger Weight

[Figure: two panels showing the multipliers λ1, λ2 and the delays D1, D2 relative to Dmax]
       Weighted Minimization
• Traverse the circuit in topological order
• Resize each component to minimize the
  Lagrangian during its visit
• Objective: minimize λ1D1 + λ2D2
[Figure: circuit with inputs a and b, component sizes w1 to w3, and output delays D1 and D2]
        Multiplier Adjustment:
       A Subgradient Approach




• Subgradient: an extension of the gradient to
  non-smooth functions.
• Experience: a simple heuristic implementation can
  achieve a very good convergence rate.
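
A minimal sketch of subgradient multiplier adjustment on a toy problem follows. The problem (minimize x² subject to x ≥ 1), the function name `subgradient_dual_ascent`, and the 1/k step size are assumptions chosen for illustration; they are not taken from the slides.

```python
def subgradient_dual_ascent(iterations=2000):
    """Toy problem (assumed): minimize x**2 subject to 1 - x <= 0.
    Lagrangian: L(x, lam) = x**2 + lam * (1 - x).
    The optimum is x = 1 with multiplier lam = 2."""
    lam = 0.0
    x = 0.0
    for k in range(1, iterations + 1):
        # Solve the relaxed subproblem: dL/dx = 2x - lam = 0.
        x = lam / 2.0
        # The constraint value at the minimizer is a subgradient
        # of the dual function at lam.
        violation = 1.0 - x
        # Diminishing step size, then projection onto lam >= 0.
        lam = max(0.0, lam + (1.0 / k) * violation)
    return x, lam


x, lam = subgradient_dual_ascent()
print(x, lam)  # both approach the optimum (1, 2) as iterations grow
```

A violated constraint (positive `violation`) raises its multiplier, which is the same "more critical, larger weight" rule used in the framework slides.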
     Convergence Sequence
[Figure: max delay vs. number of iterations. Any feasible maximum delay is an upper bound; the Lagrangian is a lower bound (weighted delay <= maximum delay); the two sequences converge toward the optimal solution.]
Path Delay Formulation
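[Figure: circuit with input arrival times Aa, Ab, Ac, delays d1, d2, d3, and output delays D1 and D2]
            • Number of path constraints grows exponentially
            • More accurate
            • Can exclude false paths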
Stage Delay Formulation
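[Figure: the same circuit with an internal arrival time Ae added; input arrival times Aa, Ab, Ac, delays d1, d2, d3, and output delays D1 and D2]
                 • Polynomial size
                 • Less accurate
                 • Contains false paths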
              Compatible?

[Figure: Stage Based and Path Based formulations with a question mark between them]
Both Multipliers Satisfy KCL
   (Flow Conservation)
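[Figure, left (stage based): nodes 4 and 5 feed node 3 with multipliers λ43 and λ53; node 3 feeds nodes 1 and 2 with multipliers λ31 and λ32. Right (path based): paths through node 3 with multipliers λ41, λ51, λ42, λ52.]
• Stage based: λ43 + λ53 = λ31 + λ32
• Path based: λ3,in = λ3,out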
              Compatible?
[Figure: Lagrangian Relaxation connects the Stage Based and Path Based formulations]
       Crucial Design Metrics
• NOISE is a crucial concern in DSM technology

[Figure: four interrelated design metrics: Noise, Area, Power, Delay]


        Goal: simultaneous optimization
             Circuit Model
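[Figure: gate i modeled as an output resistance r_i with input capacitance c_i; wire j modeled as a π-section with resistance r_j and capacitance c_j/2 at each end]
• Gate i:  $r_i = \hat{r}_i / x_i$,  $c_i = \hat{c}_i x_i$
• Wire j:  $r_j = \hat{r}_j / x_j$,  $c_j = \hat{c}_j x_j + f_j$
• Elmore delay model: $D_i = r_i C_i$
  ($D_i$: delay of node i; $C_i$: downstream capacitance)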
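
The circuit model can be exercised on a simple chain of components. The sketch below is an illustration under assumptions: a single driver-to-load chain (no branching), a π-model for wires, and a hypothetical `Component` class; the unit resistance `r_hat`, unit capacitance `c_hat`, and the `fringe` term play the roles of the per-unit-size and fringing quantities in the model above.

```python
from dataclasses import dataclass


@dataclass
class Component:
    kind: str            # "gate" or "wire"
    r_hat: float         # resistance at unit size
    c_hat: float         # capacitance at unit size
    x: float             # component size
    fringe: float = 0.0  # size-independent wire capacitance

    @property
    def r(self):
        return self.r_hat / self.x

    @property
    def c(self):
        extra = self.fringe if self.kind == "wire" else 0.0
        return self.c_hat * self.x + extra


def elmore_delays(chain, load_cap):
    """Return D_i = r_i * C_i for each component of a driver-to-load chain."""
    delays = []
    downstream = load_cap
    for comp in reversed(chain):
        # A wire's resistance sees its own far-end half capacitance;
        # a gate's input capacitance sits upstream of its resistance.
        own = comp.c / 2.0 if comp.kind == "wire" else 0.0
        delays.append(comp.r * (downstream + own))
        downstream += comp.c
    delays.reverse()
    return delays


chain = [
    Component("gate", r_hat=2.0, c_hat=1.0, x=1.0),
    Component("wire", r_hat=4.0, c_hat=2.0, x=2.0, fringe=1.0),
]
print(elmore_delays(chain, load_cap=3.0))  # [16.0, 11.0]
```

Doubling the wire size halves its resistance but adds capacitance to the gate driving it, which is the tradeoff the sizing algorithm has to balance.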
                  Crosstalk Model
• Crosstalk between neighboring wires i and j:
   crosstalk(i,j) = switching_dissimilarity(i,j)
                  * coupling_capacitance(i,j)



• The effective coupling ranges from 0 to 2Cc around the nominal value Cc:
   – 0: anti-Miller effect (shortens the transition)
   – 2Cc: Miller effect (lengthens the transition)
               Switching Dissimilarity
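    • Switching dissimilarity between wires i and j:
      $\mathrm{switching\_dissimilarity}(i,j) = 1 - \frac{1}{T_D} \int_0^{T_D} f(i,t)\, f(j,t)\, dt$
[Figure: switching waveforms f(1,t), f(2,t), f(3,t) of wires 1, 2, 3, each taking values +1 or -1 over the period TD, and the resulting complete graph on wires 1, 2, 3 with edge weights 0.8125 (wires 1 and 2), 0.6875 (wires 1 and 3), and 0.75 (wires 2 and 3)]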
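
For waveforms sampled at equal time steps, the integral in the switching dissimilarity becomes an average of products. The sketch below assumes such sampled ±1 waveforms; the function names are illustrative, and the sample waveforms are made up rather than read off the figure.

```python
def switching_dissimilarity(f_i, f_j):
    """1 - (1/T_D) * integral of f_i(t) * f_j(t) dt, for waveforms
    sampled at equal time steps with values +1 or -1."""
    if len(f_i) != len(f_j) or not f_i:
        raise ValueError("waveforms must be non-empty and of equal length")
    correlation = sum(a * b for a, b in zip(f_i, f_j)) / len(f_i)
    return 1.0 - correlation


def crosstalk(f_i, f_j, coupling_cap):
    """crosstalk(i,j) = switching_dissimilarity(i,j) * coupling_capacitance(i,j)."""
    return switching_dissimilarity(f_i, f_j) * coupling_cap


same = [1, 1, -1, -1]
opposite = [-1, -1, 1, 1]
mixed = [1, -1, -1, 1]
print(switching_dissimilarity(same, same))      # 0.0: no effective coupling
print(switching_dissimilarity(same, opposite))  # 2.0: Miller effect, 2Cc
print(switching_dissimilarity(same, mixed))     # 1.0: nominal Cc
```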
         Coupling Capacitance
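• Coupling capacitance between two neighboring
  wires i and j:
    $c_{ij} = \frac{f_{ij} l_{ij}}{d_{ij} - \frac{x_i + x_j}{2}}$
    $\approx \frac{f_{ij} l_{ij}}{d_{ij}} \left(1 + \frac{x_i + x_j}{2 d_{ij}}\right)$
    $= \frac{f_{ij} l_{ij}}{d_{ij}} + \hat{c}_{ij} (x_i + x_j)$, with $\hat{c}_{ij} = \frac{f_{ij} l_{ij}}{2 d_{ij}^2}$
• $f_{ij}$: unit-length fringing capacitance
• $f_{ij} l_{ij} / d_{ij}$ and $\hat{c}_{ij}$ are constants
  extracted from technology files
[Figure: wire i of size x_i and wire j of size x_j running in parallel over length l_ij at distance d_ij]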
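
The quality of the linearized coupling capacitance can be checked numerically. This is a small sketch with made-up parameter values; the function names are illustrative.

```python
def coupling_exact(f_ij, l_ij, d_ij, x_i, x_j):
    """c_ij = f_ij * l_ij / (d_ij - (x_i + x_j) / 2)."""
    gap = d_ij - (x_i + x_j) / 2.0
    if gap <= 0:
        raise ValueError("d_ij must exceed (x_i + x_j) / 2")
    return f_ij * l_ij / gap


def coupling_linear(f_ij, l_ij, d_ij, x_i, x_j):
    """First-order approximation: constant term plus c_hat * (x_i + x_j)."""
    constant = f_ij * l_ij / d_ij
    c_hat = f_ij * l_ij / (2.0 * d_ij ** 2)
    return constant + c_hat * (x_i + x_j)


# Example values (assumed): wires of width 1 at distance 5.
print(coupling_exact(1.0, 10.0, 5.0, 1.0, 1.0))   # 2.5
print(coupling_linear(1.0, 10.0, 5.0, 1.0, 1.0))  # about 2.4
```

The approximation is accurate when the wire widths are small relative to the distance, and it slightly underestimates the exact value because the dropped higher-order terms are all positive.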
Taming Noise and Other Objectives
[Figure: two-step flow. Step 1: switching dissimilarity consideration by wire ordering. Step 2: simultaneous post-layout optimization of noise, area, delay, and power by sizing circuit components, based on Lagrangian relaxation.]
The Switching Dissimilarity Problem
  SS:
  Given           n wires and their switching behavior
  Find            an ordering for the wires such that
                  the total effective loading between
                  neighboring wires is minimized

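[Figure: complete graph on wires 1, 2, 3 with edge weights 0.8125 (wires 1 and 2), 0.6875 (wires 1 and 3), and 0.75 (wires 2 and 3), next to candidate wire orderings]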
 Efficient Algorithms for SS??
• SS is NP-hard
  – Equivalent to the MCWO problem (Vittal and Marek-Sadowska, TCAD-97)
• Theorem:
  If P ≠ NP and ρ ≥ 1, there is no polynomial-time
  approximation algorithm with ratio bound ρ for
  the SS problem.
  – Finding a good approximation algorithm is intractable
  – Resort to heuristics
The WOSS Approximation Algorithm
Algorithm: WOSS (Wire Ordering for the SS Problem)
Input:     Complete graph Kn for n wires
Output:    A wire ordering O minimizing effective loading

A1. Select a node r to be the root, 1 ≤ r ≤ n.
A2. Grow a minimum spanning tree T for Kn from r.
A3. O ← the list of nodes visited in the preorder traversal
    of T.

  Time complexity: O(n^2)
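
Steps A1 to A3 can be sketched with an array-based Prim's algorithm, which matches the O(n^2) bound on a complete graph. The function name `woss`, the 0-based node indices, and the weight-matrix input format are assumptions of this sketch.

```python
def woss(weights, root=0):
    """Wire ordering: preorder traversal of a minimum spanning tree.

    weights[u][v] is the edge weight (effective loading) between
    wires u and v in the complete graph; nodes are 0-based here."""
    n = len(weights)
    in_tree = [False] * n
    best = [float("inf")] * n
    parent = [None] * n
    children = [[] for _ in range(n)]
    best[root] = 0.0

    # A2: Prim's algorithm with linear scans, O(n^2) overall.
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and weights[u][v] < best[v]:
                best[v] = weights[u][v]
                parent[v] = u

    # A3: iterative preorder traversal of the tree.
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    return order


# Three-wire example with the edge weights used in the slides.
w = [
    [0.0, 0.8125, 0.6875],
    [0.8125, 0.0, 0.75],
    [0.6875, 0.75, 0.0],
]
print([i + 1 for i in woss(w, root=0)])  # [1, 3, 2]
```

With wire 1 as the root, the tree uses the two lightest edges (1 to 3 and 3 to 2), so the resulting order avoids placing wires 1 and 2, the most dissimilar pair, next to each other.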
Crosstalk-constrained Multi-
  Objective Optimization
M:
Minimize   A              Area
Subject to D ≤ D_B        Delay constraints
           X ≤ X_B        Crosstalk constraints
           P ≤ P_B        Power constraints
           L ≤ x ≤ U      Sizing constraints
[Figure: derivation flow. LP Problem P -> (introduce arrival time variables) -> LP Problem PP -> (Lagrangian relaxation) -> LR Problem LRS1 -> (optimality conditions) -> LR Problem LRS2]
Neighborhood and Dominating Index
  • Neighborhood N(i): the set of i’s adjacent wires
  • Dominating index I(i): { j | j > i and j ∈ N(i) }
  • Example: wire 1 lies between wires 2 and 3
      N(1) = {2, 3}, I(1) = {2, 3}
      N(2) = {1}, I(2) = ∅
      N(3) = {1}, I(3) = ∅
      Problem Formulation P
Minimize                 Σ_{i=s+1..n+s} α_i x_i
Subject to               Σ_{i∈d} D_i ≤ A_B, ∀ d ∈ D,
                         Σ_{i∈W} Σ_{j∈I(i)} c_ij ≤ X_B,
                         V²f Σ_{i=s+1..n+s} c_i ≤ P_B,
                         L_i ≤ x_i ≤ U_i, ∀ s+1 ≤ i ≤ n+s
α_i : unit size of component i
d : path; D : path set; A_B : delay bound
c_ij : coupling capacitance between i and j; X_B : crosstalk bound
V : supply voltage; f : working frequency; P_B : power bound
L_i : lower bound of x_i; U_i : upper bound of x_i
|D| may grow exponentially in the circuit size
    Problem Formulation PP
Minimize                Σ_{i=s+1..n+s} α_i x_i
Subject to              a_j ≤ A_0, j ∈ input(n+s+1),
                        a_j + D_i ≤ a_i, ∀ s+1 ≤ i ≤ n+s,
                                        j ∈ input(i),
                        D_i ≤ a_i, ∀ 1 ≤ i ≤ s,
                        Σ_{i∈W} Σ_{j∈I(i)} ĉ_ij (x_i + x_j) ≤ X_0,
                        Σ_{i=s+1..n+s} c_i ≤ P_0,
                        L_i ≤ x_i ≤ U_i, ∀ s+1 ≤ i ≤ n+s
a_i : arrival time of i; A_0 = A_B;
X_0 : transformed crosstalk bound
P_0 : transformed power bound
All functions are polynomials with positive coefficients
From LP to Lagrangian Relaxation

• LP formulation:         min cx   s.t.  Ax ≤ b,  x ∈ X
• Lagrangian relaxation:  min L(λ) = cx + λ(Ax − b)   s.t.  x ∈ X
  (λ: Lagrange multipliers)
• The functions are positive coefficient polynomials (posynomial forms)
        Lagrangian Relaxation
• Introduce Lagrange multipliers λ, β, γ to relax
  constraints into the objective function L
  L_{λ,β,γ}(x,a) = Σ_{i=s+1..n+s} α_i x_i + Σ_{j∈input(m)} λ_jm (a_j − A_0)
              + Σ_{i=s+1..n+s} Σ_{j∈input(i)} λ_ji (a_j + D_i − a_i)
              + Σ_{i=1..s} λ_0i (D_i − a_i) + β(Σ_{i=s+1..n+s} c_i − P_0)
              + γ(Σ_{i∈W} Σ_{j∈I(i)} ĉ_ij (x_i + x_j) − X_0)
• Lagrangian Relaxation Subproblem LRS1
  Minimize      L_{λ,β,γ}(x,a)
  Subject to    L_i ≤ x_i ≤ U_i, ∀ s+1 ≤ i ≤ n+s
  (only the sizing constraints remain)
              Optimality Conditions
  • Theorem:
    The optimality conditions on the Lagrange multipliers,
    obtained from ∂L_{λ,β,γ}(x*,a*) / ∂a_i = 0, are
       Σ_{k∈output(i)} λ_ik = Σ_{j∈input(i)} λ_ji, for 1 ≤ i ≤ n+s
  • Σ multipliers on incoming edges = Σ multipliers on outgoing edges
[Figure: a node with incoming multipliers 1, 3, 2 and outgoing multipliers 4, 2]
                        1+3+2=6=4+2
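
The flow-conservation condition is easy to check on a set of edge multipliers. In this sketch the dictionary format, the node names, and the function name `flow_imbalance` are assumptions; the numbers reproduce the 1+3+2 = 4+2 example.

```python
from collections import defaultdict


def flow_imbalance(multipliers):
    """multipliers maps an edge (j, i) to its multiplier lambda_ji.

    Returns inflow minus outflow for every node that has both
    incoming and outgoing edges; all zeros means the optimality
    conditions hold."""
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for (src, dst), lam in multipliers.items():
        outflow[src] += lam
        inflow[dst] += lam
    internal = set(inflow) & set(outflow)
    return {node: inflow[node] - outflow[node] for node in internal}


edges = {
    ("a", "n"): 1.0,
    ("b", "n"): 3.0,
    ("c", "n"): 2.0,
    ("n", "x"): 4.0,
    ("n", "y"): 2.0,
}
print(flow_imbalance(edges))  # {'n': 0.0}
```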
Lagrangian Relaxation Subproblem LRS2
  Minimize           L_{μ,β,γ}(x)
  Subject to         L_i ≤ x_i ≤ U_i, ∀ s+1 ≤ i ≤ n+s,
  where μ = (μ_1,…, μ_m), μ_i = Σ_{j∈input(i)} λ_ji, for 1 ≤ i ≤ m
    and L_{μ,β,γ}(x) = Σ_{i=s+1..n+s} α_i x_i + β(Σ_{i=s+1..n+s} c_i − P_0)
                  + γ(Σ_{i∈W} Σ_{j∈I(i)} ĉ_ij (x_i + x_j) − X_0)
                  + Σ_{i=1..n+s} μ_i D_i

  Apply the optimality conditions and rewrite L_{λ,β,γ}(x,a).
  L_{μ,β,γ}(x) is independent of a.
           Optimal Resizing
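• Theorem:
  Let x = (x_{s+1},…, x_{n+s}) be a solution.
  The optimal resizing of component i is
      x_i* = min(U_i, max(L_i, opt_i)),
  where
      $opt_i = \sqrt{\frac{\mu_i \hat{r}_i \left(C_i' + \sum_{j \in N(i)} \hat{c}_{ij} x_j\right)}{\alpha_i + (\beta + R_i)\hat{c}_i + \gamma \sum_{j \in N(i)} \hat{c}_{ij}}}$
  follows from ∂L_{μ,β,γ}(x) / ∂x_i = 0.
  Our algorithm updates x_i using this theorem.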
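
The local resizing step can be sketched as a single function. The square-root form used here is an assumption: it is what setting the derivative to zero yields when the Lagrangian has one term proportional to x_i and one proportional to 1/x_i. All parameter names and the example values are illustrative.

```python
import math


def optimal_resize(mu, r_hat, c_down, coupling, alpha, beta, r_up,
                   c_hat, gamma, lower, upper):
    """Closed-form resizing of one component, clamped to its size bounds.

    coupling: list of (c_hat_ij, x_j) pairs for the neighbors j in N(i).
    c_down:   downstream capacitance that does not depend on x_i.
    r_up:     the R_i term (weighted upstream resistance, assumed meaning).
    """
    numerator = mu * r_hat * (c_down + sum(c * x for c, x in coupling))
    denominator = (alpha + (beta + r_up) * c_hat
                   + gamma * sum(c for c, _ in coupling))
    opt = math.sqrt(numerator / denominator)
    return min(upper, max(lower, opt))


# Example values (made up): numerator = 36, denominator = 4, opt = 3.
size = optimal_resize(mu=2.0, r_hat=3.0, c_down=5.0, coupling=[(0.5, 2.0)],
                      alpha=1.0, beta=0.5, r_up=1.5, c_hat=1.0, gamma=2.0,
                      lower=1.0, upper=10.0)
print(size)  # 3.0
```

A larger multiplier `mu` (a more timing-critical component) pushes the size up, while larger area, power, or crosstalk weights in the denominator push it down.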
 Lagrangian Dual Problem LDP
Maximize D(λ,β,γ)
Subject to λ satisfies the optimality conditions,
           where D(λ,β,γ) = min L_{λ,β,γ}(x,a)
Solving LDP converges to the global optimum.
Optimal Gate and Wire Sizing
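[Figure: flowchart. The objective function (minimize area) enters the LR loop: solve the Lagrangian Relaxation Subproblem (minimize L_λ), evaluate the Lagrangian Dual Problem (maximize min L_λ), and adjust the Lagrange multipliers while the gap is > error bound; done when the gap is ≤ error bound.]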
RC Parameters And Size Bounds
Experimental Results
Ckt          Ckt Size             Noise (pF)         Delay (ps)
Name      #G     #W   Total    Initial   Final    Initial    Final
c1355    546   1064    1610      20.5      2.1     1005.6   1098.9
c1908    880   1498    2378      24.6      2.5     1444.6   1338.6
c2670   1193   2076    3269      33.5      3.4     1480.7   1499.9
c3540   1669   2939    4608      50.2      5.0     1713.5   1685.5
c432     214    426     640       7.9      0.9     1442.3    958.2
c499     514    928    1442      16.4      1.7      875.8    799.3
c5315   2307   4386    6693      82.1      8.2     1649.4   1548.4
c6288   2416   4800    7216      95.4      9.5     4888.3   4494.3
c7552   3512   6144    9656     103.3     10.3     1615.3   1619.4
c880     383    729    1112      13.1      1.4      931.5    794.4
Impr.             -                  89.7%               5.3%

Ckt       Power (mW)         Area (μm²)       Ite    Time    Mem
Name    Initial   Final    Initial   Final           (sec)   (KB)
c1355    228.3     28.5     48299     5203      9      56    1096
c1908    357.1     41.5     71338     7369     13     155    1184
c2670    486.4     58.5     98067    10319      7     444    1320
c3540    682.2     79.5    138242    14292      8     553    1472
c432      89.9     18.4     19200     2984      7      21     976
c499     211.3     27.9     43259     4834     10      97    1072
c5315    959.3    113.9    200803    20768      7    1321    1752
c6288   1015.0    129.9    216495    23341     14    2705    1808
c7552   1433.5    168.9    289707    30120      7    2823    2120
c880     159.3     22.1     33359     3827     12      94    1032
Impr.        86.8%              87.9%                  -
  Effectiveness of Our Method
• Improves area, noise, power, and delay by
  87.9%, 89.7%, 86.8%, and 5.3%, respectively
• Why not much improvement on delay?
  – Up-sizing increases the load on upstream components and
    the coupling capacitance
  – Down-sizing reduces the driving capability for
    downstream components
     Efficiency of Our Method
• 47 min runtime and 2.1 MB storage requirement
  for a circuit of 9656 gates and wires
• The runtime per iteration and the storage
  requirement grow approximately linearly with the circuit size
[Figure: runtime per iteration and storage vs. circuit size; at size 9656: 403 sec/ite and 2.1 MB]
  Conclusion: Crosstalk-constrained
     Multi-objective optimization
• Crosstalk model
  – Switching dissimilarity
  – Coupling capacitance
• Our method
  – Switching dissimilarity consideration: wire ordering
  – Multi-objective optimization: Lagrangian relaxation
• Experimental results
  – Effectiveness: improves area, noise, power, and delay by
    87.9%, 89.7%, 86.8%, and 5.3%, respectively
  – Efficiency: 47 min runtime and 2.1 MB storage
    requirement for a circuit of 9656 gates and wires
     Physical Design Challenges
1. Interconnect-driven design flow
     – Layout-driven synthesis
     – Interconnect-driven floorplanning: buffer block design, buffer
       constrained routing
2.  Physical design for very large-scale circuits
3.  Block-level floorplanning/placement
4.  Inductance extraction & modeling (transmission lines)
5.  RLC routing
6.  Coupling aware timing analysis
7.  Noise & timing-driven routing
8.  High-speed clock routing with power & skew
    constraints
9. Design with uncertain data
10. Process variation impacts
      Interconnect-driven Design
• Traditional post-layout optimization is not feasible
  for deep submicron ICs
[Figure: two circuit blocks separated by a routing channel that contains a wire and a buffer]
• Buffer-block design should be integrated into
  floorplanning
[Figure: design flow: floorplanning/placement -> routing -> post-layout optimization]
Large-scale Module Floorplanning




[Figure: floorplan with 9800 blocks]
Transmission-Line Effect in High-Speed ICs

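• Optimize delay with the transmission-line effect
  taken into consideration.
[Figure: RLC equivalent circuit: buffer output resistance r^b_i driving, from A to B, a line of impedance Z_{i+1} loaded by the buffer capacitance C^b_{i+1}]
• Buffer and wire sizing for impedance matching
  to handle staircase-like waveforms and ringing
   – r^b_i > Z_{i+1}: staircase-like waveform
   – r^b_i < Z_{i+1}: ringing (undershoot)
[Figure: voltage vs. time waveforms rising toward VDD for the two cases]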
     Process Variation
• Results from subwavelength lithography
• May cause unexpected circuit behavior
• Requires designs that are insensitive to process
  variation
          Crosstalk: Coupling
             Capacitance
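• $c_{ij} = \frac{f_{ij} l_{ij}}{d_{ij} - \frac{x_i + x_j}{2}} = \frac{f_{ij} l_{ij}}{d_{ij}} \cdot \frac{1}{1 - \frac{x_i + x_j}{2 d_{ij}}}$
[Figure: wire i of size x_i and wire j of size x_j running in parallel over length l_ij at distance d_ij, coupled by c_ij]
• Include coupling effect into wire capacitance
    – $c_i = \hat{c}_i x_i + f_i + 2 \sum_{j \in N(i)} c_{ij}$
    – consider crosstalk effect on delay and power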
            Crosstalk Sensitivity
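• Impact of process variations on crosstalk
• $V_{ij} \equiv \frac{\partial c_{ij}}{\partial x_i} + \frac{\partial c_{ij}}{\partial x_j} = \frac{f_{ij} l_{ij}}{d_{ij}^2} \left(1 - \frac{x_i + x_j}{2 d_{ij}}\right)^{-2}$
[Figure: wire i of size x_i and wire j of size x_j running in parallel over length l_ij at distance d_ij, coupled by c_ij]
  – first derivative of crosstalk w.r.t. wire width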
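
The closed form of the crosstalk sensitivity can be cross-checked against finite differences of the coupling capacitance. The function names, parameter values, and step size below are assumptions of this sketch.

```python
def coupling(f_ij, l_ij, d_ij, x_i, x_j):
    """c_ij = f_ij * l_ij / (d_ij - (x_i + x_j) / 2)."""
    return f_ij * l_ij / (d_ij - (x_i + x_j) / 2.0)


def crosstalk_sensitivity(f_ij, l_ij, d_ij, x_i, x_j):
    """V_ij = (f_ij * l_ij / d_ij**2) * (1 - (x_i + x_j) / (2 * d_ij))**-2."""
    return (f_ij * l_ij / d_ij ** 2) * (1.0 - (x_i + x_j) / (2.0 * d_ij)) ** -2


def numeric_sensitivity(f_ij, l_ij, d_ij, x_i, x_j, h=1e-6):
    """Sum of the two partial derivatives by central differences."""
    d_xi = (coupling(f_ij, l_ij, d_ij, x_i + h, x_j)
            - coupling(f_ij, l_ij, d_ij, x_i - h, x_j)) / (2.0 * h)
    d_xj = (coupling(f_ij, l_ij, d_ij, x_i, x_j + h)
            - coupling(f_ij, l_ij, d_ij, x_i, x_j - h)) / (2.0 * h)
    return d_xi + d_xj


params = (1.0, 10.0, 5.0, 1.0, 1.0)  # f_ij, l_ij, d_ij, x_i, x_j (made up)
print(crosstalk_sensitivity(*params))  # about 0.625
print(numeric_sensitivity(*params))    # agrees to several digits
```

Halving the distance d_ij at fixed widths more than quadruples the sensitivity, since the leading factor scales as 1/d_ij² and the bracketed term also grows.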
             Scaling Effect
• Technology scaling down by S times benefits
  gates but harms wires

  Feature   Crosstalk   Crosstalk     Gate    Area/    Wire
  Size                  Sensitivity   Delay   Device   Delay
  1/S       S           S²            1/S     1/S²     >1

  Lower bounds: crosstalk f_ij l_ij / d_ij; crosstalk sensitivity f_ij l_ij / d_ij²
 Metal Filling for Manufacturability and
               Performance

• Add dummy features to achieve global planarization
  for performance and manufacturability


[Figure: copper process cross-section with dense and sparse metal-wire regions in oxide, showing dishing and erosion]

								