Digital CMOS Design A Low-Power Approach

Dynamic and Leakage Power Reduction in MTCMOS Circuits Using an Automated Efficient Gate Clustering Technique

Mohab Anis, Shawki Areibi *, Mohamed Mahmoud and Mohamed Elmasry

  VLSI Research Group, University of Waterloo, Canada
  * School of Engineering, University of Guelph, Canada
       Presentation Outline

•   Low Power Design in DSM
•   Concept of sleep transistors
•   Previous work
•   Sizing the sleep transistor
•   Bin-Packing technique
•   Set-Partitioning technique
•   Conclusion and extended work done
    Why Low Power Design?
• Growing market of mobile and handheld
  electronic systems.
• Difficulty in providing adequate cooling. Fans
  create noise and add to cost.
• Heat dissipation impacts packaging technology
  and cost.
• Increasing standby time of portable devices.

In deep-submicron (DSM) regimes, leakage power has become as
  big a problem as dynamic power.
     Concept of sleep transistors
MTCMOS technology is an increasingly
popular technique to reduce leakage power

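Proper ST sizing is a key issue:
Larger ST size: Area, Pdynamic and Pleakage increase
Smaller ST size: Delay increases

[Figure: Modeling of a sleep transistor as a resistor. An LVT logic block sits on a virtual ground node VX above an HVT sleep transistor controlled by SLEEP; the sleep transistor is modeled as a resistor R carrying current I.]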
                        First Approach [1]

Single ST to support whole circuit

Increase in interconnect resistance for
distant blocks

ST size increases to compensate added
resistance: Area, Pdynamic and Pleakage increase

More significant in the DSM regime

[Figure: one LVT logic circuit above a single HVT sleep transistor controlled by SLEEP.]

[1] S. Mutoh et al., “1-V Power Supply High-Speed Digital Circuit Technology with Multi-Threshold Voltage CMOS,” IEEE J. of Solid-State Circuits, pp. 847-853, 1995.
                      Second Approach [2]
  Single ST is sized according to a mutually
  exclusive discharge pattern algorithm.

  ST assignments are wasteful.

  Increase in interconnect resistance for
  distant blocks. ST size increases to compensate
  added resistance: Pdynamic and Pleakage increase.

  More significant in the DSM regime.

  [Figure: gate network G1 to G10.]

[2] J. Kao et al., “MTCMOS Hierarchical Sizing Based on Mutual Exclusive Discharge Patterns,” in Proc. of 35th DAC, pp. 495-500, Las Vegas, 1998.
         Sizing the sleep transistor

• Objective: Constant ST size, causing 5%
  degradation in circuit speed.
• (W/L)sleep = Isleep / [0.05 μn Cox (Vdd - VtL)(Vdd - VtH)]

  Isleep is chosen to be 250 μA.
  (W/L)sleep ≈ 6 for 0.18 μm CMOS technology
  VtL = 350 mV, VtH = 500 mV
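The sizing expression can be checked numerically. The sketch below plugs in the slide's values for Isleep, VtL and VtH. The slides do not give Vdd or μn Cox, so Vdd = 1.8 V and μn Cox = 440 μA/V² are assumptions chosen as plausible for a 0.18 μm process, and the function name is hypothetical.

```python
def sleep_transistor_aspect_ratio(i_sleep, mu_n_cox, vdd, vt_low, vt_high,
                                  speed_penalty=0.05):
    """Return (W/L)sleep for a given allowed fractional speed degradation."""
    return i_sleep / (speed_penalty * mu_n_cox * (vdd - vt_low) * (vdd - vt_high))

# Values taken from the slide
I_SLEEP = 250e-6   # A
VT_LOW = 0.35      # V (LVT devices)
VT_HIGH = 0.50     # V (HVT sleep transistor)

# Assumptions, NOT from the slides
VDD = 1.8          # V
MU_N_COX = 440e-6  # A/V^2

w_over_l = sleep_transistor_aspect_ratio(I_SLEEP, MU_N_COX, VDD, VT_LOW, VT_HIGH)
print(f"(W/L)sleep = {w_over_l:.2f}")
```

With these assumed process values the result lands close to the (W/L)sleep ≈ 6 quoted on the slide; a tighter speed penalty raises the required aspect ratio proportionally.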
4-bit CLA Adder




Preprocessing of Gate Currents
1. Random inputs are applied to the CLA adder; the highest current discharge is monitored and multiplied by the corresponding switching activity.

2. The peak current value is monitored, together with its time of occurrence and its duration.

3. Currents are combined into a single current Ieq = max{Ii} when Σ Ii over time ≤ max{Ii}.
               Timing Diagram

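[Figure: gate G1 (FO=2, delay T1) drives gate G2 (FO=4, delay T2). The discharge current I1 (G1) peaks at 65 and I2 (G2) peaks at 79, both plotted against time. Time markers: T1 = 80 psec, 120 psec, T1+T2 = 210 psec, 260 psec.]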

I1 (G1):   0 0 11 22 33 43 54 65 54 43 33 22 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

I2 (G2):   0 0 0 0 0 0 0 6 12 18 24 30 37 43 49 55 61 67 73 79 73 67 61 55 49 43 37 30 24 18 12 6 0 0 0 0 0 0 0




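The combination rule can be tried on the two current vectors from the timing diagram. This is a minimal sketch, assuming both vectors are sampled on the same time grid and that trailing zeros can be dropped and padded back; the function names are hypothetical.

```python
from itertools import zip_longest

# Current samples from the timing diagram (trailing zeros omitted)
I1 = [0, 0, 11, 22, 33, 43, 54, 65, 54, 43, 33, 22, 11]
I2 = [0] * 7 + [6, 12, 18, 24, 30, 37, 43, 49, 55, 61, 67, 73, 79,
                73, 67, 61, 55, 49, 43, 37, 30, 24, 18, 12, 6]

def combine(*currents):
    """Sum waveforms sample by sample, zero-padding the shorter ones."""
    return [sum(samples) for samples in zip_longest(*currents, fillvalue=0)]

def equivalent_current(*currents):
    """Return Ieq = max{Ii} if the summed waveform never exceeds it, else None."""
    largest_peak = max(max(c) for c in currents)
    if max(combine(*currents)) <= largest_peak:
        return largest_peak
    return None

print(equivalent_current(I1, I2))  # prints 79
```

G1 and G2 overlap only while G1's current is falling and G2's is still small, so the summed waveform never exceeds G2's own peak and the pair can share one equivalent current.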
Preprocessing Heuristic
1.   Initialize current vectors
2.   Set all gates free to move to a sub-cluster
3.   For all gates i in circuit
         If gate i is not clustered yet
             assign gate i to new cluster k
             update cluster current vector
             calculate max current, start time, end time
             For all other gates j in circuit
                 If gate j is not clustered yet
                     add current of gate j to cluster k
                     If (combination ≤ max current)
                         append gate j to cluster k
                         update cluster info
                         set gate j locked in cluster k
                     Else
                         remove current of gate j from cluster k
                     End If
                 End If
             End For
         End If
     End For
4.   Return all clusters formed
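One possible reading of the preprocessing heuristic is sketched below. It assumes that "combination ≤ max current" means the peak of the summed waveform must not exceed the largest individual peak in the cluster; that interpretation, the function names and the G3 waveform are assumptions made for illustration, while the G1 and G2 vectors come from the timing diagram.

```python
from itertools import zip_longest

def add_waveforms(a, b):
    """Sum two sampled current waveforms, zero-padding the shorter one."""
    return [x + y for x, y in zip_longest(a, b, fillvalue=0)]

def preprocess(gate_currents):
    """Greedily merge gates whose summed current stays within the largest peak.

    gate_currents maps a gate name to its list of current samples.
    Returns a list of clusters with member gates, summed waveform and Ieq.
    """
    clustered = set()
    clusters = []
    for gate_i, wave_i in gate_currents.items():
        if gate_i in clustered:
            continue
        cluster = {"gates": [gate_i], "waveform": list(wave_i), "ieq": max(wave_i)}
        clustered.add(gate_i)
        for gate_j, wave_j in gate_currents.items():
            if gate_j in clustered:
                continue
            trial = add_waveforms(cluster["waveform"], wave_j)
            peak_limit = max(cluster["ieq"], max(wave_j))
            if max(trial) <= peak_limit:       # combination <= max current
                cluster["gates"].append(gate_j)
                cluster["waveform"] = trial
                cluster["ieq"] = peak_limit
                clustered.add(gate_j)          # gate j is now locked
        clusters.append(cluster)
    return clusters

gate_currents = {
    "G1": [0, 0, 11, 22, 33, 43, 54, 65, 54, 43, 33, 22, 11],
    "G2": [0] * 7 + [6, 12, 18, 24, 30, 37, 43, 49, 55, 61, 67, 73, 79,
                     73, 67, 61, 55, 49, 43, 37, 30, 24, 18, 12, 6],
    "G3": [0, 0, 20, 40, 60, 40, 20],  # hypothetical gate overlapping G1
}
clusters = preprocess(gate_currents)
for c in clusters:
    print(c["gates"], c["ieq"])
```

In this example G1 and G2 merge into one cluster with Ieq = 79, while the hypothetical G3 discharges at the same time as G1 and is left in a cluster of its own.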
   Bin-Packing Technique

Objective: Minimize the No. of used STs.

Subject to: 1. Σ Ieq ≤ Imax for any ST.
            2. Each Ieq is assigned only once.
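The slides do not say which bin-packing algorithm is used. As an illustration only, the sketch below applies the classic first-fit-decreasing heuristic, treating each sleep transistor as a bin of capacity Imax = 250 μA. The individual Ieq values are hypothetical.

```python
def first_fit_decreasing(ieq, i_max):
    """Assign every equivalent current to exactly one sleep transistor.

    ieq maps a current name to its value; each returned bin holds the
    assigned names and their total load, which never exceeds i_max.
    """
    bins = []
    for name, current in sorted(ieq.items(), key=lambda kv: kv[1], reverse=True):
        if current > i_max:
            raise ValueError(f"{name} exceeds the sleep transistor capacity")
        for b in bins:
            if b["load"] + current <= i_max:   # first ST with enough headroom
                b["items"].append(name)
                b["load"] += current
                break
        else:                                  # no ST fits: open a new one
            bins.append({"items": [name], "load": current})
    return bins

# Hypothetical equivalent currents in microamperes
ieq = {"IEQ1": 90, "IEQ2": 60, "IEQ3": 110, "IEQ4": 80,
       "IEQ5": 50, "IEQ6": 40, "IEQ7": 60}
bins = first_fit_decreasing(ieq, i_max=250)
for k, b in enumerate(bins, start=1):
    print(f"ST {k}: {b['items']} -> {b['load']} uA")
```

First-fit decreasing is a heuristic, so it does not guarantee the minimum number of sleep transistors in general; an exact formulation would need an integer-programming or exhaustive search.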
        Currents Assignment
Sleep Transistor   Equivalent Currents    Assigned Gates                                  Current (μA)
1                  IEQ3 IEQ4 IEQ7         G5 G6 G7 G8 G14 G16 G18 G23                     250
2                  IEQ1 IEQ2 IEQ5 IEQ6    G1 G2 G3 G4 G9 G10 G11 G12 G13 G15 G17          240
                                          G19 G20 G21 G22 G24 G25 G26 G27 G28
Clustering of CLA adder




  Set-Partitioning Technique

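[Figure: standard-cell layout of the 28 gates in three rows between alternating Vdd and gnd rails, annotated with the cell, the cell height, Lmin, the sleep device cavity and the ground rail. Row 1: G1 to G8. Row 2: G19, G9 to G18. Row 3: G24, G20, G25, G21, G26, G22, G27, G23, G28.]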
               Cost Function
Cj = ( w1 · Cj1 ) + ( w2 · Cj2 )

Cj1 = Sleep_Transistor max_current - Σi currenti

Cj2 = Σ duv in a group Sj

[Figure: group Sj containing gates Gu, Gv and Gw with pairwise distances duv, dvw and dwu.]
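The cost of one candidate group can be computed directly from these definitions. The sketch below assumes Euclidean distance between gate positions (the slides do not state the distance metric); the gate currents, coordinates and the function name are hypothetical.

```python
from itertools import combinations
from math import dist

def cluster_cost(gates, currents, positions, i_max, w1=1.0, w2=1.0):
    """Cj = w1*Cj1 + w2*Cj2 for one group of gates.

    Cj1 is the unused sleep-transistor capacity, Cj2 the sum of pairwise
    distances between the gates in the group.
    """
    cj1 = i_max - sum(currents[g] for g in gates)
    cj2 = sum(dist(positions[u], positions[v]) for u, v in combinations(gates, 2))
    return w1 * cj1 + w2 * cj2

# Hypothetical data for a three-gate group Sj = {Gu, Gv, Gw}
currents = {"Gu": 80, "Gv": 70, "Gw": 60}                 # microamperes
positions = {"Gu": (0, 0), "Gv": (3, 0), "Gw": (3, 4)}    # layout units

cost = cluster_cost(["Gu", "Gv", "Gw"], currents, positions, i_max=250)
print(cost)  # prints 52.0
```

Here Cj1 = 250 - 210 = 40 and Cj2 = 3 + 5 + 4 = 12. The weights w1 and w2 set the trade-off between filling each sleep transistor and keeping its gates physically close.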
 Clustering Heuristic
Create_Clusters ( )
1. Calculate distances between all gates;
2. Initialize maxgates_per_cluster = n;
3. Create clusters with single gates;
4. For cl = 2; cl ≤ maxgates_per_cluster
          Create_n_Gate_Clusters (cl)
5. For all clusters created calculate_cost ( )

Create_n_Gate_Clusters (cl)
1. For cluster of type cl
          create_new_cluster ( )
          While not done
               Choose gate with minimum distances
               If sum of currents ≤ capacity
                    append gate to newly created cluster
               End If
               If total gates within cluster ≥ limit
                    break;
               End If
          End While
   End For
2. Return newly created clusters
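The pseudocode leaves several details open, so the following is only one plausible reading: every gate seeds one candidate cluster per size, and the cluster grows by repeatedly taking the remaining gate with the smallest summed distance to the current members, provided the current capacity is respected. The seeding strategy, the Euclidean metric, the function name and the example data are all assumptions.

```python
from math import dist

def create_clusters(positions, currents, capacity, max_gates_per_cluster):
    """Generate candidate clusters (as frozensets) of 1..max_gates gates."""
    gates = sorted(positions)
    candidates = {frozenset([g]) for g in gates}      # single-gate clusters
    for size in range(2, max_gates_per_cluster + 1):
        for seed in gates:
            cluster, load = [seed], currents[seed]
            remaining = [g for g in gates if g != seed]
            while remaining and len(cluster) < size:
                # gate with minimum total distance to the current members
                nearest = min(remaining, key=lambda g: sum(
                    dist(positions[g], positions[m]) for m in cluster))
                remaining.remove(nearest)
                if load + currents[nearest] <= capacity:
                    cluster.append(nearest)
                    load += currents[nearest]
            candidates.add(frozenset(cluster))
    return candidates

# Hypothetical three-gate row
positions = {"G1": (0, 0), "G2": (1, 0), "G3": (5, 0)}
currents = {"G1": 100, "G2": 100, "G3": 100}          # microamperes
candidates = create_clusters(positions, currents, capacity=250,
                             max_gates_per_cluster=2)
print(sorted(sorted(c) for c in candidates))
```

Because growth is driven by distance, the distant pair {G1, G3} is never proposed, which keeps the candidate list much smaller than the set of all subsets.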
    Set-Partitioning Technique

• Objective: Minimize Σ Cj Sj

• Subject to: 1. Σ of currents for Sj ≤ Imax
              2. Groups must cover all gates
                 with no repetition.
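For a handful of candidate groups, the set-partitioning problem can be solved by exhaustive search, which makes the two constraints concrete. This sketch assumes the current limit was already enforced when the candidate groups were built; the candidate costs and the function name are hypothetical, and the slides do not state which solver is used.

```python
from itertools import combinations

def best_partition(gates, candidate_costs):
    """Pick the cheapest set of groups that covers every gate exactly once.

    candidate_costs maps a frozenset of gates (a group Sj) to its cost Cj.
    Returns (total_cost, chosen_groups); exponential time, small inputs only.
    """
    gates = frozenset(gates)
    candidates = list(candidate_costs)
    best_cost, best_groups = float("inf"), None
    for r in range(1, len(candidates) + 1):
        for combo in combinations(candidates, r):
            covered = [g for group in combo for g in group]
            # exact cover: no gate missing and no gate repeated
            if len(covered) == len(gates) and set(covered) == gates:
                cost = sum(candidate_costs[group] for group in combo)
                if cost < best_cost:
                    best_cost, best_groups = cost, combo
    return best_cost, best_groups

# Hypothetical candidate groups and costs
candidate_costs = {
    frozenset({"G1"}): 150, frozenset({"G2"}): 150, frozenset({"G3"}): 150,
    frozenset({"G1", "G2"}): 51, frozenset({"G2", "G3"}): 54,
}
total, chosen = best_partition(["G1", "G2", "G3"], candidate_costs)
print(total, [sorted(group) for group in chosen])
```

The search selects {G1, G2} plus {G3} at a total cost of 201, beating {G1} plus {G2, G3} at 204. The enumeration grows exponentially with the number of candidates, so it only serves to illustrate the formulation on toy inputs.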
                Grouping of gates

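[Figure: grouping of the 28 gates on the standard-cell layout, with three rows between alternating Vdd and gnd rails, annotated with the cell, the cell height, Lmin, the sleep device cavity and the ground rail. Row 1: G1 to G8. Row 2: G19, G9 to G18. Row 3: G24, G20, G25, G21, G26, G22, G27, G23, G28.]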
       Computational Time

[Figure: BP/SP CPU time. Series: SP CPU Time, BP CPU Time. Y-axis: Time (secs), 0 to 2000. X-axis: Number of Gates (28, 30, 31, 61, 160, 204).]
                  Results (% Savings)
REF   Benchmark         4-bit CLA   32-bit Parity   6-bit        4-bit 74181   32-bit Single Error   27-channel interrupt
                        adder       Checker         Multiplier   ALU           Correcting C499       controller C432
      No. of gates      28          31              30           61            202                   160

BP    Pdynamic to [1]   14%         18%             31%          17%           20%                   2%
      Pdynamic to [2]   12%         16%             23%          14%           19%                   0%
      Pleakage to [1]   96%         92%             95%          93%           95%                   99%
      Pleakage to [2]   93%         85%             78%          83%           89%                   89%
      ST_Area [1],[2]   95%, 92%    92%, 85%        95%, 78%     93%, 83%      95%, 89%              99%, 88%

SP    Pdynamic to [1]   7%          9%              19%          11%           9%                    2%
      Pdynamic to [2]   5%          6%              9%           8%            8%                    0%
      Pleakage to [1]   87%         85%             85%          86%           87%                   98%
      Pleakage to [2]   78%         70%             35%          66%           71%                   77%
      ST_Area [1],[2]   87%, 77%    84%, 69%        85%, 34%     86%, 67%      86%, 70%              98%, 76%
  % Power Savings (Bin-Packing)

[Figure: bar chart. Series: Pdyn/1, Pdyn/2, Pleak/1, Pleak/2. Y-axis: 0 to 100 (%). X-axis: Benchmarks (CLA, Parity, Mult, ALU, Error, C432).]
% Power Savings (Set-Partitioning)

[Figure: bar chart. Series: Pdyn/1, Pdyn/2, Pleak/1, Pleak/2. Y-axis: 0 to 100 (%). X-axis: Benchmarks (CLA, Parity, Mult, ALU, Error, C432).]
  % ST Area Saving (Bin-Packing)

[Figure: bar chart. Series: St-Area[1], St-Area[2]. Y-axis: 0 to 100 (%). X-axis: Benchmarks (CLA, Parity, Mult, ALU, Error, C432).]
% ST Area Saving (Set-Partitioning)

[Figure: bar chart. Series: St-Area[1], St-Area[2]. Y-axis: 0 to 100 (%). X-axis: Benchmarks (CLA, Parity, Mult, ALU, Error, C432).]
              Conclusion
• The BP technique clusters gates in MTCMOS
  circuits. Pdynamic and Pleakage are reduced by
  15% and 90%, respectively, compared to [1]
  and [2].

• SP takes routing complexity into
  consideration. Pdynamic and Pleakage are
  reduced by 11% and 77%, respectively,
  compared to [1] and [2].
         Extended Work Done

• A hybrid clustering technique that combines the
  BP and SP techniques is devised, to produce a
  more efficient and faster solution.
• Noise associated with ground bounce is taken
  as a design criterion (< 50 mV).
• Investigating effect of different ST sizes on
  circuit parameters.
• Investigating effect of the cost function weights
  w1 and w2 on circuit parameters.

				