Introduction to CMOS VLSI Design
Design for Skew
Outline
Clock Distribution
Clock Skew
Skew-Tolerant Static Circuits
Traditional Domino Circuits
Skew-Tolerant Domino Circuits
Design for Skew CMOS VLSI Design Slide 2
Clocking
Synchronous systems use a clock to keep
operations in sequence
– Distinguish the current operation from the previous or next one
– Determine speed at which machine operates
Clock must be distributed to all the sequencing
elements
– Flip-flops and latches
Also distribute clock to other elements
– Domino circuits and memories
Clock Distribution
On a small chip, the clock distribution network is just
a wire
– And possibly an inverter for clkb
On practical chips, the RC delay of the wire
resistance and gate load is very long
– Variations in this delay cause clock to get to
different elements at different times
– This is called clock skew
Most chips use repeaters to buffer the clock and
equalize the delay
– Reduces but doesn’t eliminate skew
Example
Skew comes from differences in gate and wire delay
– With right buffer sizing, clk1 and clk2 could ideally
arrive at the same time.
– But power supply noise changes buffer delays
– clk2 and clk3 will always see RC skew
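[Figure: clock network driven from gclk through buffers to clk1, clk2, and clk3; wire segments of 3 mm, 3.1 mm, and 0.5 mm; loads of 1.3 pF, 0.4 pF, and 0.4 pF]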
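The RC skew between two branches can be estimated with a first-order Elmore delay model. The sketch below is illustrative only: the per-millimeter wire resistance and capacitance (`r_per_mm`, `c_per_mm`) are assumed values that do not come from the slide, and the pairing of wire lengths with loads is hypothetical.

```python
def elmore_wire_delay(length_mm, load_pf, r_per_mm=50.0, c_per_mm=0.2):
    """Elmore delay in ps of a distributed RC wire driving a lumped load.

    r_per_mm (ohm/mm) and c_per_mm (pF/mm) are assumed, illustrative values.
    """
    r_wire = r_per_mm * length_mm   # total wire resistance, ohms
    c_wire = c_per_mm * length_mm   # total wire capacitance, pF
    # A distributed wire charges half of its own capacitance through its
    # full resistance; the lumped load sees the full wire resistance.
    return r_wire * (c_wire / 2.0 + load_pf)   # ohm * pF = ps


# Hypothetical pairing: a long branch and a short branch with equal loads.
delay_long = elmore_wire_delay(3.1, 0.4)
delay_short = elmore_wire_delay(0.5, 0.4)
rc_skew_ps = delay_long - delay_short
print(f"RC skew estimate: {rc_skew_ps:.1f} ps")
```

Because the wire delay grows roughly with the square of the length, unequal branch lengths leave a skew that buffer sizing alone cannot remove.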
Review: Skew Impact
Ideally full cycle is available for work
Skew adds sequencing overhead
$t_{pd} \le T_c - \underbrace{(t_{pcq} + t_{setup} + t_{skew})}_{\text{sequencing overhead}}$
Increases hold time too
$t_{cd} \ge t_{hold} - t_{ccq} + t_{skew}$
[Figure: flip-flops F1 and F2 around combinational logic (CL), from Q1 to D2, with timing diagrams marking T_c, t_pcq, t_pdq, t_setup, t_skew, t_hold, t_ccq, and t_cd]
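The two flip-flop constraints can be checked numerically. The following is a minimal sketch of the setup constraint t_pd <= T_c - (t_pcq + t_setup + t_skew) and the hold constraint t_cd >= t_hold - t_ccq + t_skew; the function names and the timing values in picoseconds are hypothetical and chosen only for illustration.

```python
def flop_max_logic_delay(t_c, t_pcq, t_setup, t_skew):
    """Largest propagation delay allowed between two flip-flops."""
    return t_c - (t_pcq + t_setup + t_skew)


def flop_min_contamination_delay(t_hold, t_ccq, t_skew):
    """Smallest contamination delay that still avoids a hold violation."""
    return t_hold - t_ccq + t_skew


# Hypothetical timing parameters, all in ps.
no_skew = flop_max_logic_delay(t_c=1000, t_pcq=80, t_setup=50, t_skew=0)
with_skew = flop_max_logic_delay(t_c=1000, t_pcq=80, t_setup=50, t_skew=50)
print(no_skew, with_skew)

hold_no_skew = flop_min_contamination_delay(t_hold=20, t_ccq=30, t_skew=0)
hold_with_skew = flop_min_contamination_delay(t_hold=20, t_ccq=30, t_skew=50)
print(hold_no_skew, hold_with_skew)
```

With these assumed numbers, skew both shortens the time available for logic and turns a hold requirement that was already met (a negative minimum) into one that needs a positive contamination delay.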
Cycle Time Trends
Much of CPU performance comes from higher f
– f is improving faster than simple process shrinks
– Sequencing overhead is bigger part of cycle
[Figure: clock frequency (MHz) and SpecInt95 performance vs. year, 1985 to 2000, for the 80386, 80486, Pentium, and Pentium II / III]
[Figure: fanout-of-4 (FO4) inverter delay (ps) vs. process (2.0 down to 0.25) at VDD = 5, 3.3, and 2.5; FO4 inverter delays per cycle vs. year, 1985 to 2000, for the same processors]
Solutions
Reduce clock skew
– Careful clock distribution network design
– Plenty of metal wiring resources
Analyze clock skew
– Only budget actual, not worst case skews
– Local vs. global skew budgets
Tolerate clock skew
– Choose circuit structures insensitive to skew
Clock Dist. Networks
Ad hoc
Grids
H-tree
Hybrid
Clock Grids
Use grid on two or more levels to carry clock
Make wires wide to reduce RC delay
Ensures low skew between nearby points
But possibly large skew across die
Alpha Clock Grids
[Figure: clock distribution on the Alpha 21064, Alpha 21164, and Alpha 21264, with the PLL and gclk grid labeled]
H-Trees
Fractal structure
– Gets clock arbitrarily close to any point
– Matched delay along all paths
Delay variations cause skew
A and B might see big skew
[Figure: H-tree with points A and B marked]
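The matched-delay property of an H-tree follows from its recursive construction. The sketch below builds an idealized H-tree and records the wire length from the root to every leaf; the function name, the coordinate convention, and the rule of halving the span after every horizontal/vertical pair are assumptions made for illustration, not details taken from the slide.

```python
def h_tree_leaves(x, y, half_span, levels, path_len=0.0, horizontal=True):
    """Return (x, y, path_length) for every leaf of an idealized H-tree.

    Each level branches two ways, alternating between horizontal and
    vertical segments. The span is halved after each vertical segment.
    """
    if levels == 0:
        return [(x, y, path_len)]
    leaves = []
    for sign in (-1, 1):
        next_x = x + sign * half_span if horizontal else x
        next_y = y if horizontal else y + sign * half_span
        next_span = half_span if horizontal else half_span / 2
        leaves += h_tree_leaves(next_x, next_y, next_span, levels - 1,
                                path_len + half_span, not horizontal)
    return leaves


leaves = h_tree_leaves(0.0, 0.0, 1.0, levels=4)
path_lengths = {length for _, _, length in leaves}
print(len(leaves), path_lengths)
```

Every leaf ends up at the same nominal wire length from the root, so any skew between two leaves comes from delay variations along the way rather than from the geometry. Two physically adjacent leaves can sit on branches that split near the root, which is why they share little of the tree and may see a large skew.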
Itanium 2 H-Tree
Four levels of buffering:
– Primary driver
– Repeater
– Second-level clock buffer
– Gater
Route around obstructions
[Figure: Itanium 2 H-tree; labels: Primary Buffer, Repeaters, Typical SLCB Locations]
Hybrid Networks
Use H-tree to distribute clock to many points
Tie these points together with a grid
Ex: IBM Power4, PowerPC
– H-tree drives 16-64 sector buffers
– Buffers drive total of 1024 points
– All points shorted together with grid
Skew Tolerance
Flip-flops are sensitive to skew because of hard edges
– Data launches at latest rising edge of clock
– Must set up before earliest next rising edge of clock
– Overhead would shrink if we could soften the edges
Latches tolerate moderate amounts of skew
– Data can arrive anytime latch is transparent
Skew: Latches
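2-Phase Latches
$t_{pd} \le T_c - \underbrace{2t_{pdq}}_{\text{sequencing overhead}}$
$t_{cd1}, t_{cd2} \ge t_{hold} - t_{ccq} - t_{nonoverlap} + t_{skew}$
$t_{borrow} \le \frac{T_c}{2} - (t_{setup} + t_{nonoverlap} + t_{skew})$
[Figure: latches L1, L2, and L3 clocked by φ1, φ2, and φ1, with Combinational Logic 1 between Q1 and D2 and Combinational Logic 2 between Q2 and D3]
Pulsed Latches
$t_{pd} \le T_c - \underbrace{\max(t_{pdq},\; t_{pcq} + t_{setup} - t_{pw} + t_{skew})}_{\text{sequencing overhead}}$
$t_{cd} \ge t_{hold} + t_{pw} - t_{ccq} + t_{skew}$
$t_{borrow} \le t_{pw} - (t_{setup} + t_{skew})$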
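The latch constraints can be evaluated side by side. This is a minimal sketch that assumes the standard forms: for two-phase latches, t_pd <= T_c - 2 t_pdq, t_cd >= t_hold - t_ccq - t_nonoverlap + t_skew, and t_borrow <= T_c/2 - (t_setup + t_nonoverlap + t_skew); for pulsed latches, t_pd <= T_c - max(t_pdq, t_pcq + t_setup - t_pw + t_skew), t_cd >= t_hold + t_pw - t_ccq + t_skew, and t_borrow <= t_pw - (t_setup + t_skew). The function names and all timing values (in ps) are hypothetical.

```python
def two_phase_latch_limits(t_c, t_pdq, t_setup, t_hold, t_ccq,
                           t_nonoverlap, t_skew):
    """Timing limits for a two-phase transparent latch system."""
    return {
        "max_logic_delay": t_c - 2 * t_pdq,
        "min_contamination_delay": t_hold - t_ccq - t_nonoverlap + t_skew,
        "max_borrow": t_c / 2 - (t_setup + t_nonoverlap + t_skew),
    }


def pulsed_latch_limits(t_c, t_pdq, t_pcq, t_setup, t_hold, t_ccq,
                        t_pw, t_skew):
    """Timing limits for a pulsed latch system with pulse width t_pw."""
    overhead = max(t_pdq, t_pcq + t_setup - t_pw + t_skew)
    return {
        "max_logic_delay": t_c - overhead,
        "min_contamination_delay": t_hold + t_pw - t_ccq + t_skew,
        "max_borrow": t_pw - (t_setup + t_skew),
    }


# Hypothetical timing parameters, all in ps.
two_phase = two_phase_latch_limits(t_c=1000, t_pdq=60, t_setup=50, t_hold=20,
                                   t_ccq=30, t_nonoverlap=0, t_skew=50)
pulsed = pulsed_latch_limits(t_c=1000, t_pdq=60, t_pcq=80, t_setup=50,
                             t_hold=20, t_ccq=30, t_pw=150, t_skew=50)
print(two_phase)
print(pulsed)
```

In this model skew does not appear in the two-phase propagation limit at all, and it affects the pulsed latch limit only when the pulse is too narrow to absorb it. The cost shows up instead as a tighter hold constraint and a smaller borrowing window.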
Dynamic Circuit Review
Static circuits are slow because fat pMOS transistors load the inputs
Dynamic gates use precharge to remove pMOS
transistors from the inputs
– Precharge: φ = 0, output forced high
– Evaluate: φ = 1, output may pull low
[Figure: gate schematics with inputs A, B, C, D and output Y]
Domino Circuits
Dynamic inputs must monotonically rise during
evaluation
– Place inverting stage between each dynamic gate
– Dynamic / static pair called domino gate
Domino gates can be safely cascaded
[Figure: domino AND built from a dynamic NAND followed by a static inverter; signals A, B, W, and X]
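The monotonicity requirement can be seen in a small behavioral model. The sketch below treats a domino AND as a dynamic NAND followed by a static inverter; the class name and the idealization that the dynamic node holds its charge perfectly during evaluation are assumptions for illustration, not a circuit-level model.

```python
class DominoAnd:
    """Idealized domino AND: dynamic NAND followed by a static inverter."""

    def __init__(self):
        self.dynamic_node = True  # starts precharged high

    def step(self, clk, a, b):
        if not clk:
            self.dynamic_node = True    # precharge: node forced high
        elif a and b:
            self.dynamic_node = False   # evaluate: pulldown discharges node
        # Otherwise the node keeps its value. Once discharged, it cannot
        # return high until the next precharge.
        return not self.dynamic_node    # static inverter output


gate = DominoAnd()
after_precharge = gate.step(clk=False, a=False, b=False)
after_glitch = gate.step(clk=True, a=True, b=True)   # input a glitches high
after_settle = gate.step(clk=True, a=False, b=True)  # a falls back low
print(after_precharge, after_glitch, after_settle)
```

The last call shows the hazard: the inputs settle to a state where the AND should be low, but the output stays high because the dynamic node was already discharged. The static inverter makes the domino output rise monotonically during evaluation, so it is a safe input for the next dynamic gate.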
Domino Timing
Domino gates are 1.5 – 2x faster than static CMOS
– Lower logical effort because of reduced Cin
Challenge is to keep precharge off critical path
Look at clocking schemes for precharge and eval
– Traditional schemes have severe overhead
– Skew-tolerant domino hides this overhead
Traditional Domino Ckts
Hide precharge time by ping-ponging between half-cycles
– One evaluates while other precharges
– Latches hold results during precharge
$t_{pd} \le T_c - 2t_{pdq}$
[Figure: traditional domino pipeline over one cycle T_c; a chain of dynamic and static gates in each half-cycle, each chain ending in a latch with delay t_pdq]
Clock Skew
Skew increases sequencing overhead
– Traditional domino has hard edges
– Evaluate at latest rising edge
– Setup at latch by earliest falling edge
$t_{pd} \le T_c - 2t_{setup} - 2t_{skew}$
[Figure: traditional domino pipeline with skewed clocks; each half-cycle chain of dynamic and static gates ends in a latch, with t_setup and t_skew marked before the latch]
Time Borrowing
Logic may not exactly fit half-cycle
– No flexibility to borrow time to balance logic
between half cycles
Traditional domino sequencing overhead is about
25% of cycle time in fast systems!
[Figure: traditional domino pipeline with unbalanced half-cycles; each chain of dynamic and static gates ends in a latch, with t_setup and t_skew marked]
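The size of the traditional domino overhead can be estimated from the setup time and skew. This sketch assumes the overhead per cycle is two setup times plus two skews, so that t_pd <= T_c - 2 t_setup - 2 t_skew; the function names are hypothetical, and the timing values (in ps) are assumed numbers picked so that the result lands at one quarter of the cycle.

```python
def traditional_domino_logic_time(t_c, t_setup, t_skew):
    """Time left for logic per cycle in traditional domino."""
    return t_c - 2 * t_setup - 2 * t_skew


def traditional_domino_overhead_fraction(t_c, t_setup, t_skew):
    """Fraction of the cycle lost to setup time and skew."""
    return (2 * t_setup + 2 * t_skew) / t_c


# Hypothetical values in ps for a fast system.
logic_time = traditional_domino_logic_time(t_c=1000, t_setup=50, t_skew=75)
fraction = traditional_domino_overhead_fraction(t_c=1000, t_setup=50, t_skew=75)
print(logic_time, fraction)
```

Because the setup time and skew do not shrink when the cycle time does, the same absolute overhead takes a larger share of a shorter cycle.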
Relaxing the Timing
Sequencing overhead caused by hard edges
– Data departs dynamic gate on late rising edge
– Must setup at latch on early falling edge
Latch functions
– Prevent glitches on inputs of domino gates
– Holds results during precharge
Is the latch really necessary?
– No glitches if inputs come from other domino
– Can we hold the results in another way?
Skew-Tolerant Domino
Use overlapping clocks to eliminate latches at phase
boundaries.
– Second phase evaluates using results of first
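[Figure: two domino stages clocked by φ1 and φ2, each a dynamic gate followed by a static gate, with no latch at the phase boundary; nodes a through d labeled, with waveforms for φ1, φ2, a, b, and c]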
Full Keeper
After second phase evaluates, first phase precharges
Input to second phase falls
– Violates monotonicity?
But we no longer need the value
Now the second gate has a floating output
– Need full keeper to hold it either high or low
[Figure: dynamic gate with weak full keeper transistors; labels H, X, and f]
Time Borrowing
Overlap can be used to
– Tolerate clock skew
– Permit time borrowing
No sequencing overhead
$t_{pd} \le T_c$
[Figure: chain of domino gates (dynamic + static pairs) split between Phase 1 and Phase 2 and clocked by overlapping φ1 and φ2; the timing diagram marks t_overlap, t_borrow, and t_skew]
Multiple Phases
With more clock phases, each phase overlaps more
– Permits more skew tolerance and time borrowing
[Figure: four overlapping clock phases φ1 to φ4 driving eight domino gates (dynamic + static pairs), two per phase, grouped as Phase 1 through Phase 4]
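The effect of adding phases can be worked out from the clock geometry. The sketch below assumes equally spaced phases that all share one duty cycle (50% by default); both assumptions, along with the function names, are illustrative and not stated on the slide.

```python
def phase_windows(t_c, n_phases, duty=0.5):
    """(rise, fall) time of each phase's evaluation window in one cycle."""
    spacing = t_c / n_phases
    return [(i * spacing, i * spacing + duty * t_c) for i in range(n_phases)]


def adjacent_overlap(t_c, n_phases, duty=0.5):
    """Time during which two consecutive phases are both high.

    A result of zero or less means consecutive phases do not overlap.
    """
    return duty * t_c - t_c / n_phases


# Hypothetical 1000 ps cycle.
for n in (2, 3, 4):
    print(n, phase_windows(1000, n), adjacent_overlap(1000, n))
```

Under these assumptions two phases at 50% duty cycle have no overlap, so a two-phase system would need clocks that stay high for more than half the cycle. With four phases a quarter of the cycle is available to split between skew tolerance and time borrowing.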
Clock Generation
[Figure: clock generator producing phases φ1 to φ4 from en and clk]
Summary
Clock skew effectively increases setup and hold
times in systems with hard edges
Managing skew
– Reduce: good clock distribution network
– Analyze: local vs. global skew
– Tolerate: use systems with soft edges
Flip-flops and traditional domino are costly
Latches and skew-tolerant domino perform at full
speed even with moderate clock skews.