Fine-Grained Sleep Transistor Sizing
Algorithm for Leakage Power Minimization
Student : Da-Cheng Juan
Advisor : Shih-Chieh Chang
NTHU-CS VLSI/CAD LAB
Outline
Sleep Transistor Sizing Problem
Maximum Instantaneous Current Estimation
Time-Frame Partitioning for Sizing
Experimental Results
Conclusions
Trend of Low Power Designs
Leakage increases exponentially with technology scaling
– reaches more than 50% of total power in 65nm technology.
Low power design is a must-have, not an option.
Power dissipation
– Active power (active mode)
– Leakage power (sleep mode)
[Figure: sub-threshold leakage current through a transistor (gate, source, drain)]
Power Gating
Power Gating
– One of the most effective ways to reduce leakage.
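[Figure: low-Vth logic devices between VDD and a virtual ground (VGND); a sleep transistor controlled by SL connects VGND to GND.]
Mode     SL   Sleep Transistor
Active   0    ON
Sleep    1    OFF
Use a high-Vth sleep transistor (controlled by SL)
to reduce the leakage current.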
Implementation of Power Gating
Distributed Sleep Transistor Network (DSTN)
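[Figure: clusters C1, C2, C3 of low-Vth logic devices between VDD and a shared virtual ground (VGND); sleep transistors controlled by SL connect VGND to GND.]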
Leakage Saving
In sleep mode:
– Leakage: proportional to the ST’s size.
– Small ST to reduce leakage.
[Figure: DSTN between VDD and VGND with a leakage current Ileakage through each sleep transistor]
Voltage Drop across Sleep Transistor
In active mode:
– Voltage drop across a ST degrades the performance.
– Voltage drop: inversely proportional to the ST’s size.
– Large ST to bound the voltage drop.
[Figure: DSTN between VDD and VGND with a voltage drop VST across each sleep transistor]
Sleep Transistor (ST) Sizing
Dilemma scenario:
– Small ST to reduce leakage. (sleep mode)
– Large ST to bound the voltage drop. (active mode)
Objective: minimize ST size (leakage) under a specified
voltage-drop constraint, VST*.
[Figure: DSTN between VDD and VGND with the voltage drop across each sleep transistor limited to VST*]
Voltage Drop Estimation with MIC
Maximum Instantaneous Current (MIC) through a ST
– determines the worst-case IR drop.
Estimating the upper bound of MIC(ST)
– to size each ST properly to meet the voltage-drop constraint.
MIC(ST): MIC through a ST.
[Figure: clusters C1, C2, C3 between VDD and VGND; MIC(ST1), MIC(ST2), MIC(ST3) flow through the sleep transistors]
Voltage Drop Estimation with MIC
MIC(C) (MIC of a cluster) is easy to measure: finding the MIC of a cluster is fast.
Due to the current-balancing effect
– MIC(ST) (MIC through a ST) is hard to predict: finding the MIC through a ST is time-consuming.
[Figure: clusters C1, C2, C3 between VDD and VGND, with MIC(C1) of cluster C1 and MIC(ST1), MIC(ST2), MIC(ST3) through the sleep transistors]
Temporal Perspective of Cluster’s MIC
Conventional way
– ST sizes are determined with the MIC
of the entire clock period.
[Figure: MIC(Ci) waveforms (current) of Cluster 1 and Cluster 2 over one clock cycle (time unit: 10ps). MIC(C1) occurs at T6; MIC(C2) occurs at T9.]
Temporal Perspective of Cluster’s MIC
Smaller time frames lead to:
– a more accurate MIC estimation,
– but higher computation complexity.
[Figure: MIC(Ci) waveforms (current in mA) of Cluster 1 and Cluster 2 over one clock cycle (time unit: 10ps)]
Difficulties
The current-balancing effect complicates the sizing problem.
[Figure: the MIC of one cluster spreading over several sleep transistors]
Time-frame partitioning leads to high computation complexity.
[Figure: one clock cycle divided into time frames]
Contributions
More accurate MIC prediction from a temporal perspective.
Variable-length partitioning to reduce computation complexity.
Algorithm to minimize the sizes of sleep transistors.
Achieving a 21% reduction in total sleep transistor size
compared with [2].
- [2] Chiou et al. DAC’06
Outline
Sleep Transistor Sizing Problem
Maximum Instantaneous Current Estimation
Time-Frame Partitioning for Sizing
Experimental Results
Conclusions
Resistance Network
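[Figure: resistance network. Cluster currents I(C1), I(C2), I(C3) enter the virtual ground (VGND), whose adjacent nodes are joined by wire resistances RV; currents I(ST1), I(ST2), I(ST3) discharge through the sleep transistor resistances R(ST1), R(ST2), R(ST3).]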
Discharging Ratio
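[Figure: current I(C1) injected at cluster C1; VGND nodes joined by 2Ω wire resistances; sleep transistors of 9Ω, 8Ω and 10Ω carry 0.43*I(C1), 0.34*I(C1) and 0.23*I(C1), respectively.]
The discharging ratio can be calculated by
– Kirchhoff’s Current Law
– Ohm’s Law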
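The discharging ratios above can be reproduced with plain nodal analysis: apply KCL at every VGND node and Ohm's law on every branch, inject a unit current at one cluster, solve for the node voltages, and divide by each R(STi). The sketch below is a minimal illustration, not the authors' tool. It assumes a linear chain of VGND nodes as drawn in the figure, and the helper names `solve` and `discharging_matrix` are hypothetical.

```python
def solve(a, b):
    """Solve a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def discharging_matrix(r_st, r_v):
    """Return Psi, where psi[i][j] is the fraction of I(Cj) that flows through STi.

    Hypothetical topology: VGND node i reaches GND through R(STi), and adjacent
    nodes i and i+1 are joined by the wire resistance r_v[i].
    """
    n = len(r_st)
    # Nodal conductance matrix G: KCL at every node, Ohm's law on every branch.
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        g[i][i] += 1.0 / r_st[i]
    for i, r in enumerate(r_v):
        g[i][i] += 1.0 / r
        g[i + 1][i + 1] += 1.0 / r
        g[i][i + 1] -= 1.0 / r
        g[i + 1][i] -= 1.0 / r
    psi = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Inject a unit current at node j and solve for the node voltages.
        v = solve(g, [1.0 if q == j else 0.0 for q in range(n)])
        for i in range(n):
            psi[i][j] = v[i] / r_st[i]  # current through STi = V_i / R(STi)
    return psi


psi = discharging_matrix(r_st=[9.0, 8.0, 10.0], r_v=[2.0, 2.0])
print([round(psi[i][0], 2) for i in range(3)])  # [0.43, 0.34, 0.23]
```

With the slide's values (2Ω wires and 9Ω, 8Ω, 10Ω sleep transistors), the first column reproduces the 0.43/0.34/0.23 split. Every column sums to 1 because all injected current must reach GND.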
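Discharging Matrix Ψ (SAI)
[Figure: cluster currents I(C1), I(C2), I(C3) and sleep transistor currents I(ST1), I(ST2), I(ST3) on the VGND network]
$$
\begin{bmatrix} I(ST_1) \\ I(ST_2) \\ I(ST_3) \end{bmatrix}
= \Psi
\begin{bmatrix} I(C_1) \\ I(C_2) \\ I(C_3) \end{bmatrix},
\qquad \text{where} \quad
\Psi = \begin{bmatrix}
\psi_{11} & \psi_{12} & \psi_{13} \\
\psi_{21} & \psi_{22} & \psi_{23} \\
\psi_{31} & \psi_{32} & \psi_{33}
\end{bmatrix}
$$
Each entry ψij is a discharging ratio: the fraction of I(Cj) that flows through STi.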
MIC(ST) Estimation Mechanism
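[Figure: cluster MICs MIC(C1), MIC(C2), MIC(C3) and sleep transistor MICs MIC(ST1), MIC(ST2), MIC(ST3)]
$$
\begin{bmatrix} \mathrm{MIC}(ST_1) \\ \mathrm{MIC}(ST_2) \\ \mathrm{MIC}(ST_3) \end{bmatrix}
= \Psi
\begin{bmatrix} \mathrm{MIC}(C_1) \\ \mathrm{MIC}(C_2) \\ \mathrm{MIC}(C_3) \end{bmatrix},
\qquad \text{where} \quad
\Psi = \begin{bmatrix}
\psi_{11} & \psi_{12} & \psi_{13} \\
\psi_{21} & \psi_{22} & \psi_{23} \\
\psi_{31} & \psi_{32} & \psi_{33}
\end{bmatrix}
$$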
Outline
Sleep Transistor Sizing Problem
Maximum Instantaneous Current Estimation
Time-Frame Partitioning for Sizing
Experimental Results
Conclusions
Temporal Perspective of Cluster’s MIC
Different MIC(Ci)s occur at
different time points.
[Figure: MIC(Ci) waveforms (current) of Cluster 1 and Cluster 2 over one clock cycle (time unit: 10ps). MIC(C1) occurs at T6; MIC(C2) occurs at T9.]
Temporal Perspective of Cluster’s MIC
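Different MIC(Ci)s occur at different time points
within a clock period.
The traditional way to estimate MIC(STi) is overly
pessimistic, because it combines cluster MICs that do not occur at the same time:
$$
\begin{bmatrix} \mathrm{MIC}(ST_1) \\ \mathrm{MIC}(ST_2) \\ \mathrm{MIC}(ST_3) \end{bmatrix}
= \Psi
\begin{bmatrix} \mathrm{MIC}(C_1) \\ \mathrm{MIC}(C_2) \\ \mathrm{MIC}(C_3) \end{bmatrix}
$$
over-estimated!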
Time-Frame Partitioning for MIC(ST) Estimation
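Expand MIC(Ci) into MIC(Ci,Tj), the MIC of cluster Ci within time frame Tj.
[Figure: MIC(Ci,Tj) waveforms (current) of Cluster 1 and Cluster 2 over one clock cycle, divided into time frames; labeled values include MIC(C1,T1), MIC(C1,T3), MIC(C1,T6), MIC(C2,T1), MIC(C2,T3) and MIC(C2,T6).]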
Time-Frame Partitioning for MIC(ST) Estimation
For each time frame Tj, use MIC(Ci,Tj) to obtain MIC(STi,Tj).
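$$
\begin{bmatrix} \mathrm{MIC}(ST_1,T_1) \\ \mathrm{MIC}(ST_2,T_1) \\ \mathrm{MIC}(ST_3,T_1) \end{bmatrix}
= \Psi
\begin{bmatrix} \mathrm{MIC}(C_1,T_1) \\ \mathrm{MIC}(C_2,T_1) \\ \mathrm{MIC}(C_3,T_1) \end{bmatrix}
$$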
Time-Frame Partitioning for MIC(ST) Estimation
For ST1, the maximum MIC(ST1,Tj)
among all Tj is the upper bound of
MIC(ST1) after partitioning.
[Figure: MIC(STi,Tj) waveforms (current) of ST1 (Cluster 1) and ST2 (Cluster 2) over one clock cycle, divided into time frames; the highest frame values give MIC(ST1) and MIC(ST2).]
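Why is the per-frame maximum still a safe upper bound? Here is a short derivation, assuming Ψ stays constant over the clock period and has only nonnegative entries (true for a purely resistive VGND network). Here I(STi,t) and I(Ck,t) denote instantaneous currents at time t.

$$
\begin{aligned}
\max_{t} I(ST_i,t)
&= \max_{j}\, \max_{t \in T_j} \sum_{k} \psi_{ik}\, I(C_k,t) \\
&\le \max_{j} \sum_{k} \psi_{ik} \max_{t \in T_j} I(C_k,t)
 = \max_{j} \sum_{k} \psi_{ik}\, \mathrm{MIC}(C_k,T_j)
 = \max_{j} \mathrm{MIC}(ST_i,T_j) \\
&\le \sum_{k} \psi_{ik}\, \mathrm{MIC}(C_k)
\end{aligned}
$$

The first line is the true peak current of STi. The middle expression is the partitioned estimate, and the last is the conventional one, which follows from MIC(Ck,Tj) ≤ MIC(Ck). Partitioning therefore never loosens the bound.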
Notation Review
MIC(Ci)
– Maximum Instantaneous Current of ith Cluster
MIC(STi)
– Estimated MIC upper bound flowing through ith sleep transistor
MIC(Ci,Tj)
– MIC of Ci in jth time frame
MIC(STi,Tj) = Σk ψik * MIC(Ck,Tj)
– Estimated MIC upper bound through STi in jth time frame
MIC(STi) = Σk ψik * MIC(Ck)
– Without time-frame partitioning
MIC(STi) = max{ MIC(STi,Tj) for all j }
– With time-frame partitioning
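The two MIC(STi) formulas can be compared directly. This is a minimal sketch, assuming Ψ is already known and each cluster's MIC is given per time frame. The 2×2 Ψ and the current values (mA) are made up for illustration, and the function names are hypothetical.

```python
def mic_st_without_partitioning(psi, mic_c_frames):
    """MIC(STi) = sum_k psi[i][k] * MIC(Ck), using each cluster's peak over the whole period."""
    mic_c = [max(frames) for frames in mic_c_frames]  # MIC(Ck) = max_j MIC(Ck,Tj)
    return [sum(p * m for p, m in zip(row, mic_c)) for row in psi]


def mic_st_with_partitioning(psi, mic_c_frames):
    """MIC(STi) = max_j MIC(STi,Tj), where MIC(STi,Tj) = sum_k psi[i][k] * MIC(Ck,Tj)."""
    n_frames = len(mic_c_frames[0])
    return [max(sum(p * mic_c_frames[k][j] for k, p in enumerate(row))
                for j in range(n_frames))
            for row in psi]


psi = [[0.6, 0.4],
       [0.4, 0.6]]
mic_c_frames = [[2.0, 10.0, 3.0],   # MIC(C1,T1..T3) in mA
                [9.0, 1.0, 4.0]]    # MIC(C2,T1..T3) in mA
print([round(x, 6) for x in mic_st_without_partitioning(psi, mic_c_frames)])  # [9.6, 9.4]
print([round(x, 6) for x in mic_st_with_partitioning(psi, mic_c_frames)])     # [6.4, 6.2]
```

The partitioned bound is lower here because C1 peaks in T2 while C2 peaks in T1. The conventional estimate adds two peaks that never coincide.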
Time-Frame Partitioning for MIC(ST) Estimation
Time-frame partitioning leads
to a better MIC(ST) estimation!
[Figure: MIC(STi,Tj) waveforms (current) of ST1 (Cluster 1) and ST2 (Cluster 2) over one clock cycle. The original estimates without partitioning, ORIGINAL_MIC(ST1) and ORIGINAL_MIC(ST2), are 37% and 27% larger than MIC(ST1) and MIC(ST2), respectively.]
Reduce the Computation Complexity
More time frames lead to
– more accurate voltage-drop estimation.
– but higher computation complexity.
Reduce the computation complexity:
– dominated time-frame removal
– variable-length time-frame partitioning
Dominated Time Frame Removal
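T3 is dominated by T6:
– MIC(C1,T6) > MIC(C1,T3),
– MIC(C2,T6) > MIC(C2,T3).
Neglect T3: since every entry of Ψ is positive, it follows that
– MIC(ST1,T6) > MIC(ST1,T3),
– MIC(ST2,T6) > MIC(ST2,T3),
so T3 never determines the upper bound.
[Figure: cluster MIC waveforms of Cluster 1 and Cluster 2, marking MIC(C1,T3), MIC(C1,T6), MIC(C2,T3) and MIC(C2,T6)]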
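Dominated time-frame removal can be sketched as a pairwise comparison of the per-frame cluster MIC vectors. This is a minimal illustration under stated assumptions. The function name and the frame values are hypothetical. The sketch also uses a slightly more general test than the slide: "≥ for every cluster and > for at least one". That test is still safe because Ψ has no negative entries.

```python
def remove_dominated_frames(frames):
    """Drop every time frame whose cluster MIC vector is dominated by another frame.

    `frames` maps a frame name to (MIC(C1,Tj), MIC(C2,Tj), ...). Frame a is dominated
    by frame b when b >= a for every cluster and b > a for at least one cluster;
    of several identical frames, only the first one is kept.
    """
    names = list(frames)
    kept = {}
    for i, a_name in enumerate(names):
        a = frames[a_name]
        dominated = False
        for j, b_name in enumerate(names):
            if i == j:
                continue
            b = frames[b_name]
            if all(bk >= ak for ak, bk in zip(a, b)) and (
                any(bk > ak for ak, bk in zip(a, b)) or j < i
            ):
                dominated = True
                break
        if not dominated:
            kept[a_name] = a
    return kept


frames = {"T1": (2.0, 9.0), "T3": (3.0, 4.0), "T6": (10.0, 8.0)}
print(remove_dominated_frames(frames))  # {'T1': (2.0, 9.0), 'T6': (10.0, 8.0)}
```

T3 is dropped because T6 is larger for both clusters. T1 survives because C2's MIC in T1 exceeds its MIC in T6.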
Variable-Length Time-Frame Partitioning
[Figure: cluster MIC waveforms of C1 and C2 under (1) a uniform two-way partition into frames Ta and Tb and (2) a variable-length two-way partition into frames Tc and Td, marking MIC(C1,Tb), MIC(C2,Tb), MIC(C1,Tc), MIC(C2,Tc), MIC(C1,Td) and MIC(C2,Td).]
(Tb dominates Tc) and (Tb dominates Td)
=> the estimated upper bound of partition (2) will be smaller.
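The sketch below shows how a variable-length cut can tighten the bound. It derives MIC(Ci,Tj) from a sampled current waveform (one sample per minimum time unit) for a given list of frame boundaries, then applies max_j Σk ψik·MIC(Ck,Tj). The waveforms, Ψ and boundaries are hypothetical. The sketch only evaluates a given partition; it does not reproduce how the authors choose frame boundaries.

```python
def frame_mics(waveform, boundaries):
    """MIC(C,Tj) per frame: peak of a sampled current waveform within each frame.

    `boundaries` holds the start sample of every frame (the first must be 0);
    frames may differ in length, i.e. the partition can be variable-length.
    """
    ends = boundaries[1:] + [len(waveform)]
    return [max(waveform[s:e]) for s, e in zip(boundaries, ends)]


def mic_st_bound(psi, waveforms, boundaries):
    """Upper bound max_j MIC(STi,Tj), with MIC(STi,Tj) = sum_k psi[i][k] * MIC(Ck,Tj)."""
    mic_c = [frame_mics(w, boundaries) for w in waveforms]  # mic_c[k][j] = MIC(Ck,Tj)
    return [max(sum(p * mic_c[k][j] for k, p in enumerate(row))
                for j in range(len(boundaries)))
            for row in psi]


psi = [[0.6, 0.4], [0.4, 0.6]]
waveforms = [[1, 1, 1, 1, 1, 8, 2, 1],   # cluster C1, one sample per 10 ps
             [1, 1, 1, 1, 2, 1, 7, 1]]   # cluster C2
uniform = mic_st_bound(psi, waveforms, [0, 4])   # (1) uniform two-way partition
variable = mic_st_bound(psi, waveforms, [0, 6])  # (2) variable-length two-way partition
print([round(x, 6) for x in uniform])   # [7.6, 7.4]
print([round(x, 6) for x in variable])  # [5.6, 5.0]
```

Both partitions use two frames. The uniform cut puts both cluster peaks in the same frame, while the variable-length cut separates them, which lowers the estimated upper bound for both STs.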
Variable-Length Time-Frame Partitioning
When all MIC(Ci)s fall into separate time frames,
- MIC(STi) can be better estimated!
Example with the number of time frames = 3
[Figure: cluster MIC waveforms of Clusters 1, 2 and 3 over one clock cycle, partitioned into frames T1, T2 and T3]
Variable-Length Time-Frame Partitioning
Partition one clock period
– exhaustively, with the minimum time unit
Not efficient
Accurate MIC(STi) estimation
– with a limited number of variable-length time frames
Efficient
Loses only slight accuracy
Problem Formulation of ST Sizing
Inputs:
1. Voltage-drop constraint.
2. MIC(Ci,Tj): Cluster’s MIC information.
Objective:
1. Minimize the total width of sleep transistors.
2. Voltage drops must meet the constraint.
Output:
1. A set of sleep transistor widths.
ST Sizing Algorithm
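1. Initialize each ST with a large resistance value (99Ω per ST in the example).
2. Update the discharging matrix, e.g.
$$
\Psi = \begin{bmatrix}
0.38 & 0.30 & 0.21 & 0.18 \\
0.27 & 0.30 & 0.21 & 0.18 \\
0.21 & 0.24 & 0.35 & 0.28 \\
0.14 & 0.16 & 0.23 & 0.36
\end{bmatrix}
$$
3. Update MIC(STi,Tj) and voltage drops:
$$
\mathrm{MIC}(ST_i,T_j) = \sum_{k} \psi_{ik}\,\mathrm{MIC}(C_k,T_j),
\qquad
V(ST_i,T_j) = \mathrm{MIC}(ST_i,T_j)\cdot R(ST_i)
$$
4. Resize the ST with the worst drop:
$$
W_{ST}^{*} = \frac{\mathrm{MIC}(ST_i,T_j)}{V_{ST}^{*}}\,k
$$
Voltage drops ok? No: return to step 2. Yes: return the ST sizes (99, 99, 73, 99 in the example).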
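The four-step loop can be sketched as a greedy iteration. This is a minimal illustration, not the authors' implementation. It assumes a linear VGND chain (as in the discharging-ratio figure), the linear-region model R(ST) = k / W(ST), and a hypothetical k = 1000 Ω·μm. Following the backup slide on RST initialization, every ST starts at a hypothetical minimum width (large R). The names, tolerance and example currents are illustrative; only VST* = 0.065 V comes from the setup slide.

```python
def solve(a, b):
    """Solve a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def discharging_matrix(r_st, r_v):
    """Psi for a hypothetical linear VGND chain: psi[i][j] = share of I(Cj) through STi."""
    n = len(r_st)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        g[i][i] += 1.0 / r_st[i]
    for i, r in enumerate(r_v):  # wire between node i and node i+1
        g[i][i] += 1.0 / r
        g[i + 1][i + 1] += 1.0 / r
        g[i][i + 1] -= 1.0 / r
        g[i + 1][i] -= 1.0 / r
    cols = [solve(g, [1.0 if q == j else 0.0 for q in range(n)]) for j in range(n)]
    return [[cols[j][i] / r_st[i] for j in range(n)] for i in range(n)]


def voltage_drops(widths, k, r_v, mic_c):
    """Steps 2-3: update Psi, then MIC(STi,Tj) and V(STi,Tj) = MIC(STi,Tj) * R(STi)."""
    r_st = [k / w for w in widths]  # R(STi) = k / W(STi), linear-region model
    psi = discharging_matrix(r_st, r_v)
    n, frames = len(widths), range(len(mic_c[0]))
    mic_st = [[sum(psi[i][c] * mic_c[c][j] for c in range(n)) for j in frames]
              for i in range(n)]
    drops = [[mic_st[i][j] * r_st[i] for j in frames] for i in range(n)]
    return mic_st, drops


def size_sleep_transistors(mic_c, r_v, k, v_max, w_init, tol=1e-6, max_iter=100_000):
    """Greedy loop: repeatedly resize the ST with the worst drop until all drops meet v_max."""
    widths = [w_init] * len(mic_c)  # step 1: minimum width, i.e. a large R(ST)
    for _ in range(max_iter):
        mic_st, drops = voltage_drops(widths, k, r_v, mic_c)
        i, j = max(((i, j) for i in range(len(widths)) for j in range(len(mic_c[0]))),
                   key=lambda ij: drops[ij[0]][ij[1]])
        if drops[i][j] <= v_max * (1 + tol):  # all voltage drops ok
            return widths
        widths[i] = k * mic_st[i][j] / v_max  # step 4: W* = k * MIC(STi,Tj) / VST*
    raise RuntimeError("sizing did not converge within max_iter iterations")


# Hypothetical example: 3 clusters x 3 time frames, cluster MICs in A.
MIC_C = [[0.004, 0.010, 0.003],
         [0.009, 0.002, 0.004],
         [0.005, 0.006, 0.008]]
widths = size_sleep_transistors(MIC_C, r_v=[2.0, 2.0], k=1000.0, v_max=0.065, w_init=1.0)
print([round(w, 1) for w in widths])
```

Each resize only increases a width, because the chosen ST currently violates VST*. Widths are also bounded, since no ST can carry more than the total cluster current in a frame, so the loop settles. Upsizing one ST pulls more current into it, which is why its drop is re-checked after each step and Ψ is recomputed every iteration.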
Outline
Sleep Transistor Sizing Problem
Maximum Instantaneous Current Estimation
Time-Frame Partitioning for Sizing
Experimental Results
Conclusions
Environment Setup
TSMC 130nm CMOS technology.
VDD = 1.3 V.
Specified tolerable voltage drop: 5% of the ideal supply
voltage (0.065 V).
MIC(Ci) is obtained via PrimePower™ simulations with
10,000 random patterns.
The minimum time unit is set to 10 ps.
Implementation Flow
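RTL netlist → Synthesis → Gate-level netlist, SDF file
Gate-level netlist → Placement → DEF file → Gate Positioning → Gate location
SDF file → Simulation → VCD file → VCD Partitioning → Partitioned VCD file
Gate location, Partitioned VCD file → MIC Estimation → Variable-length Partitioning (Optional) → ST Sizing → ST size
(The figure marks each step as using either commercial tools or our tools.)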
Experimental Results
                     Total Area (Width in μm)          Runtime (Sec.)
Circuit     [8]      [2]      TP       V-TP       TP       V-TP
C432        12817    8491     6775     7086       4262     495
C499        10741    8347     6684     7229       3644     568
C880        15050    11296    9233     9676       2561     345
C1355       19352    13056    10591    11496      2514     422
C3540       29808    23020    18650    20282      16856    942
C5315       29794    23773    18785    19534      13830    2190
C7552       41016    29500    24269    25621      17216    2896
dalu        3468     2904     2110     2283       3816     483
frg2        3632     2835     2232     2255       701      136
i8          13247    9931     7836     8141       7720     1080
t481        9405     7389     5024     5402       16289    1514
des         11804    9766     7850     8145       8321     1180
AES         44378    33965    27229    28137      28379    3524
Avg.        1.70     1.26     1        1.06       8.09     1
TP: time-frame partitioning; V-TP: variable-length time-frame partitioning.
Avg.: area normalized to TP, runtime normalized to V-TP.
Previous works: [2] Chiou et al. DAC’06, [8] Long et al. DAC’03
Outline
Sleep Transistor Sizing Problem
Maximum Instantaneous Current Estimation
Time-Frame Partitioning for Sizing
Experimental Results
Conclusions
Conclusions
Propose an efficient sleep transistor sizing method
for DSTN power-gating designs.
Present theorems based on a temporal perspective to
estimate a tight upper bound of the voltage drop.
Achieve a 21% reduction in both size and leakage on
average compared with [2].
- [2] Chiou et al. DAC’06
Thanks for your time.
Q&A
Backup Slides
Sleep Transistor (ST) Sizing
In the active mode
– Sleep transistors operate in the linear region.
– WST is inversely proportional to RST:
WST = k / RST
Relations between WST and VST:
[Figure: sleep transistor between VGND and GND. VST: the voltage drop across the sleep transistor; I(ST): the current through the sleep transistor.]
Substituting RST = VST / I(ST) (Ohm’s law) into WST = k / RST:
$$
W_{ST} = \frac{I(ST)}{V_{ST}}\,k
$$
Sleep Transistor (ST) Sizing
Determine the minimum required size (WST*)
based on:
1. MIC(ST): the Maximum Instantaneous Current (MIC) through the ST
2. VST*: the IR-drop constraint
$$
W_{ST}^{*} = \frac{\mathrm{MIC}(ST)}{V_{ST}^{*}}\,k
$$
A smaller MIC(ST) leads to a
better (smaller) ST size!
MIC Waveform
[Figure: MIC waveforms (current vs. time) of 3 patterns: Pattern 1, Pattern 2 and Pattern 3]
RST Initialization
Physical limitation
– CMOS process limits the width of a sleep transistor.
– Choose the resistance of the minimum-width ST as the initial RST.