
Introduction to CMOS VLSI Design


Design for Skew
                   Outline
• Clock Distribution
• Clock Skew
• Skew-Tolerant Static Circuits
• Traditional Domino Circuits
• Skew-Tolerant Domino Circuits




Design for Skew      CMOS VLSI Design   Slide 2
                  Clocking
• Synchronous systems use a clock to keep operations in sequence
   – Distinguish the current operation from the previous or next one
   – Determine the speed at which the machine operates
• The clock must be distributed to all the sequencing elements
   – Flip-flops and latches
• The clock is also distributed to other elements
   – Domino circuits and memories



           Clock Distribution
• On a small chip, the clock distribution network is just a wire
   – And possibly an inverter for clkb
• On practical chips, the RC delay of the wire resistance and gate load is very long
   – Variations in this delay cause the clock to reach different elements at different times
   – This is called clock skew
• Most chips use repeaters to buffer the clock and equalize the delay
   – Reduces but doesn’t eliminate skew
                         Example
• Skew comes from differences in gate and wire delay
   – With the right buffer sizing, clk1 and clk2 could ideally arrive at the same time
   – But power supply noise changes buffer delays
   – clk2 and clk3 will always see RC skew

[Figure: clock buffer network driven by gclk. Labels: wire lengths 3 mm, 3.1 mm, 0.5 mm; outputs clk1, clk2, clk3; loads 1.3 pF, 0.4 pF, 0.4 pF]
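
As a rough illustration of why clk2 and clk3 always see RC skew, the sketch below estimates the Elmore delay of a 0.5 mm wire segment driving a 0.4 pF load. It assumes that the 0.5 mm segment lies between the clk2 and clk3 taps. The per-millimeter wire resistance and capacitance (R_PER_MM, C_PER_MM) are illustrative values, not figures from the slides, and the function name is hypothetical.

```python
def wire_elmore_delay(length_mm, r_per_mm, c_per_mm, c_load):
    """Elmore delay (seconds) of a distributed RC wire driving a lumped load."""
    r_wire = r_per_mm * length_mm      # total wire resistance (ohms)
    c_wire = c_per_mm * length_mm      # total wire capacitance (farads)
    # The distributed wire capacitance sees half the wire resistance on
    # average; the load at the far end sees all of it.
    return r_wire * (c_wire / 2 + c_load)

# Assumed (hypothetical) wire parameters, not taken from the slides.
R_PER_MM = 50.0       # ohms per mm
C_PER_MM = 0.2e-12    # farads per mm

rc_skew = wire_elmore_delay(0.5, R_PER_MM, C_PER_MM, 0.4e-12)
print(f"estimated clk2-to-clk3 RC skew: {rc_skew * 1e12:.2f} ps")
```

With these assumed parameters the estimate is on the order of 10 ps. The point is only that buffer sizing cannot remove this component of skew, because it comes from the wire itself.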



     Review: Skew Impact
• Ideally full cycle is available for work
• Skew adds sequencing overhead
• Increases hold time too

$$ t_{pd} \le T_c - \underbrace{(t_{pcq} + t_{setup} + t_{skew})}_{\text{sequencing overhead}} $$

$$ t_{cd} \ge t_{hold} - t_{ccq} + t_{skew} $$

[Figure: flip-flops F1 and F2 separated by combinational logic (CL), with timing diagrams of clk, Q1, and D2 marking Tc, tpcq, tpdq, tsetup, tskew, thold, tccq, and tcd]
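
The two flip-flop constraints on this slide, the setup bound t_pd <= T_c - (t_pcq + t_setup + t_skew) and the hold bound t_cd >= t_hold - t_ccq + t_skew, can be explored numerically. The sketch below is only an illustration: the function names are hypothetical and all timing values are made-up picosecond figures, not data from the slides.

```python
def ff_max_logic_delay(t_c, t_pcq, t_setup, t_skew):
    """Setup constraint: t_pd <= T_c - (t_pcq + t_setup + t_skew)."""
    return t_c - (t_pcq + t_setup + t_skew)

def ff_min_contamination_delay(t_hold, t_ccq, t_skew):
    """Hold constraint: t_cd >= t_hold - t_ccq + t_skew."""
    return t_hold - t_ccq + t_skew

# Hypothetical timing values in picoseconds (not from the slides).
T_C, T_PCQ, T_CCQ, T_SETUP, T_HOLD = 1000, 80, 40, 60, 30

for t_skew in (0, 50, 100):
    t_pd_max = ff_max_logic_delay(T_C, T_PCQ, T_SETUP, t_skew)
    t_cd_min = ff_min_contamination_delay(T_HOLD, T_CCQ, t_skew)
    print(f"skew={t_skew:3d} ps  max t_pd={t_pd_max} ps  min t_cd={t_cd_min} ps")
```

Each picosecond of skew removes one picosecond from the time available for logic and adds one picosecond to the minimum contamination delay. This is the sense in which skew effectively increases both setup and hold time.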




          Cycle Time Trends
• Much of CPU performance comes from higher clock frequency f
   – f is improving faster than simple process shrinks
   – Sequencing overhead is a bigger part of the cycle

[Figure: four plots covering the 80386, 80486, Pentium, and Pentium II / III. SpecInt95 vs. year (1985 to 2000); clock frequency in MHz vs. year (1985 to 2000); fanout-of-4 (FO4) inverter delay in ps vs. process (2.0 down to 0.25) at VDD = 5, 3.3, and 2.5; FO4 inverter delays per cycle vs. year (1985 to 2000)]




                  Solutions
• Reduce clock skew
   – Careful clock distribution network design
   – Plenty of metal wiring resources
• Analyze clock skew
   – Budget only actual skews, not worst-case skews
   – Local vs. global skew budgets
• Tolerate clock skew
   – Choose circuit structures insensitive to skew




      Clock Distribution Networks
• Ad hoc
• Grids
• H-tree
• Hybrid




                  Clock Grids
• Use grid on two or more levels to carry clock
• Make wires wide to reduce RC delay
• Ensures low skew between nearby points
• But possibly large skew across die




           Alpha Clock Grids
[Figure: clock distribution of the Alpha 21064, Alpha 21164, and Alpha 21264. Labels: PLL, gclk grid]



                   H-Trees
• Fractal structure
   – Gets clock arbitrarily close to any point
   – Matched delay along all paths
• Delay variations cause skew
• A and B might see big skew

[Figure: H-tree with two points labeled A and B]
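
To make the "matched delay along all paths" property concrete, the sketch below builds an idealized H-tree recursively and checks that every leaf is the same wire length from the root. This is a purely geometric illustration under assumed dimensions: wire length stands in for delay, and the function name, span, and level count are hypothetical. In a real tree, delay variations along nominally matched paths are what produce skew.

```python
def h_tree_leaves(x, y, half_span, levels, horizontal=True, path_len=0.0):
    """Return (x, y, wire_length_from_root) for every leaf of an ideal H-tree."""
    if levels == 0:
        return [(x, y, path_len)]
    leaves = []
    for sign in (-1, 1):
        # Branch symmetrically in x or y, alternating direction each level.
        nx = x + sign * half_span if horizontal else x
        ny = y if horizontal else y + sign * half_span
        # Halve the branch length after each vertical step so the tree nests.
        next_span = half_span if horizontal else half_span / 2
        leaves += h_tree_leaves(nx, ny, next_span, levels - 1,
                                not horizontal, path_len + half_span)
    return leaves

leaves = h_tree_leaves(0.0, 0.0, half_span=4.0, levels=4)
lengths = {round(length, 9) for _, _, length in leaves}
print(len(leaves), "leaves, distinct root-to-leaf wire lengths:", lengths)
```

Two leaves can be physical neighbors yet share only the root of the tree, so almost none of their path is common. That is why points such as A and B may see large skew even though their nominal path lengths match.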




             Itanium 2 H-Tree
• Four levels of buffering:
   – Primary driver
   – Repeater
   – Second-level clock buffer
   – Gater
• Route around obstructions

[Figure: Itanium 2 H-tree. Labels: Primary Buffer, Repeaters, Typical SLCB Locations]




             Hybrid Networks
• Use H-tree to distribute clock to many points
• Tie these points together with a grid

• Example: IBM Power4, PowerPC
   – H-tree drives 16-64 sector buffers
   – Buffers drive total of 1024 points
   – All points shorted together with grid




              Skew Tolerance
• Flip-flops are sensitive to skew because of hard edges
   – Data launches at the latest rising edge of the clock
   – Must set up before the earliest next rising edge of the clock
   – Overhead would shrink if we could soften the edges
• Latches tolerate moderate amounts of skew
   – Data can arrive anytime the latch is transparent




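                       Skew: Latches
2-Phase Latches

$$ t_{pd} \le T_c - \underbrace{(2 t_{pdq})}_{\text{sequencing overhead}} $$

$$ t_{cd1}, t_{cd2} \ge t_{hold} - t_{ccq} - t_{nonoverlap} + t_{skew} $$

$$ t_{borrow} \le \frac{T_c}{2} - (t_{setup} + t_{nonoverlap} + t_{skew}) $$

[Figure: latches L1, L2, and L3 clocked by φ1, φ2, and φ1, with Combinational Logic 1 between Q1 and D2 and Combinational Logic 2 between Q2 and D3; clock waveforms φ1 and φ2]

Pulsed Latches

$$ t_{pd} \le T_c - \underbrace{\max(t_{pdq},\ t_{pcq} + t_{setup} - t_{pw} + t_{skew})}_{\text{sequencing overhead}} $$

$$ t_{cd} \ge t_{hold} + t_{pw} - t_{ccq} + t_{skew} $$

$$ t_{borrow} \le t_{pw} - (t_{setup} + t_{skew}) $$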
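
The latch constraints can be evaluated the same way as the flip-flop ones. The sketch below encodes three of them: the two-phase cycle-time bound t_pd <= T_c - 2 t_pdq, the two-phase borrowing bound t_borrow <= T_c/2 - (t_setup + t_nonoverlap + t_skew), and the pulsed-latch bound t_pd <= T_c - max(t_pdq, t_pcq + t_setup - t_pw + t_skew). The function names are hypothetical and every timing value is an assumed picosecond figure chosen for illustration.

```python
def two_phase_max_logic_delay(t_c, t_pdq):
    """t_pd <= T_c - 2*t_pdq: skew does not appear in the cycle-time budget."""
    return t_c - 2 * t_pdq

def two_phase_max_borrow(t_c, t_setup, t_nonoverlap, t_skew):
    """t_borrow <= T_c/2 - (t_setup + t_nonoverlap + t_skew)."""
    return t_c / 2 - (t_setup + t_nonoverlap + t_skew)

def pulsed_max_logic_delay(t_c, t_pdq, t_pcq, t_setup, t_pw, t_skew):
    """t_pd <= T_c - max(t_pdq, t_pcq + t_setup - t_pw + t_skew)."""
    return t_c - max(t_pdq, t_pcq + t_setup - t_pw + t_skew)

# Hypothetical timing values in picoseconds (not from the slides).
T_C, T_PDQ, T_PCQ, T_SETUP, T_NONOVERLAP, T_PW = 1000, 70, 80, 60, 20, 150

for t_skew in (0, 50, 100):
    print(
        f"skew={t_skew:3d} ps",
        f"two-phase t_pd<={two_phase_max_logic_delay(T_C, T_PDQ)}",
        f"borrow<={two_phase_max_borrow(T_C, T_SETUP, T_NONOVERLAP, t_skew)}",
        f"pulsed t_pd<={pulsed_max_logic_delay(T_C, T_PDQ, T_PCQ, T_SETUP, T_PW, t_skew)}",
    )
```

With these assumed numbers the two-phase logic budget is unchanged by skew, which only reduces the time available for borrowing. The pulsed-latch budget is unaffected until the skew-dependent term exceeds t_pdq.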

 Dynamic Circuit Review
• Static circuits are slow because wide pMOS transistors load the inputs
• Dynamic gates use precharge to remove pMOS transistors from the inputs
   – Precharge: φ = 0, output forced high
   – Evaluate: φ = 1, output may pull low

[Figure: static and dynamic versions of a gate with inputs A, B, C, D, clock φ, and output Y]



              Domino Circuits
• Dynamic inputs must monotonically rise during evaluation
   – Place inverting stage between each dynamic gate
   – Dynamic / static pair called domino gate
• Domino gates can be safely cascaded

[Figure: domino AND gate. A dynamic NAND with inputs A and B and clock φ drives node W; a static inverter drives output X]
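
The behavior of the domino AND gate can be mimicked with a small behavioral model. The sketch below is not a circuit simulation: it assumes an ideal dynamic node W that is forced high while φ = 0 and, once discharged during evaluation, stays low until the next precharge. The class name and the input trace are hypothetical.

```python
class DominoAnd:
    """Behavioral model of a domino AND: dynamic NAND followed by an inverter."""

    def __init__(self):
        self.w = 1  # dynamic node W, precharged high

    def step(self, phi, a, b):
        if phi == 0:
            self.w = 1            # precharge: W forced high
        elif a and b:
            self.w = 0            # evaluate: pull-down network discharges W
        # Otherwise W holds its value; once low it cannot rise until precharge.
        return 1 - self.w         # static inverter output X

gate = DominoAnd()
# (phi, A, B) samples: one precharge step, then inputs rise during evaluation.
trace = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 1, 1)]
x_values = [gate.step(phi, a, b) for phi, a, b in trace]
print(x_values)
```

Because X can only go from 0 to 1 during evaluation, it is a valid monotonically rising input for the next dynamic gate. In the same model, an input that falls after W has discharged leaves X stuck at 1, which is why inputs to dynamic gates must not glitch.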


                  Domino Timing
• Domino gates are 1.5–2x faster than static CMOS
   – Lower logical effort because of reduced Cin
• Challenge is to keep precharge off the critical path
• Look at clocking schemes for precharge and evaluation
   – Traditional schemes have severe overhead
   – Skew-tolerant domino hides this overhead




 Traditional Domino Circuits
• Hide precharge time by ping-ponging between half-cycles
   – One evaluates while other precharges
   – Latches hold results during precharge

$$ t_{pd} = T_c - 2 t_{pdq} $$

[Figure: two clock waveforms labeled clk (presumably clk and its complement) over one cycle Tc; pipeline of alternating dynamic and static gates with a latch, of delay tpdq, at the end of each half-cycle]



                        Clock Skew
• Skew increases sequencing overhead
   – Traditional domino has hard edges
   – Evaluate at latest rising edge
   – Setup at latch by earliest falling edge

$$ t_{pd} = T_c - 2 t_{setup} - 2 t_{skew} $$

[Figure: two clock waveforms labeled clk (presumably clk and its complement); pipeline of alternating dynamic and static gates with a latch at the end of each half-cycle, marking tsetup and tskew]
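
The relation t_pd = T_c - 2 t_setup - 2 t_skew for traditional domino can be turned into an overhead fraction. The sketch below does this for one set of assumed picosecond values. The function name and the numbers are hypothetical and are not measurements from any real design.

```python
def traditional_domino_logic_time(t_c, t_setup, t_skew):
    """Time left for logic per cycle: t_pd = T_c - 2*t_setup - 2*t_skew."""
    return t_c - 2 * t_setup - 2 * t_skew

# Hypothetical timing values in picoseconds (not from the slides).
T_C, T_SETUP, T_SKEW = 1000, 50, 50

t_pd = traditional_domino_logic_time(T_C, T_SETUP, T_SKEW)
overhead_fraction = (T_C - t_pd) / T_C
print(f"logic time = {t_pd} ps, sequencing overhead = {overhead_fraction:.0%}")
```

Setup time and skew are each paid twice per cycle, once at each half-cycle latch. The overhead fraction therefore grows quickly as the cycle time shrinks while setup time and skew stay roughly fixed.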



              Time Borrowing
• Logic may not exactly fit half-cycle
   – No flexibility to borrow time to balance logic between half cycles
• Traditional domino sequencing overhead is about 25% of cycle time in fast systems!

[Figure: two clock waveforms labeled clk; pipeline of dynamic and static gates with a latch at the end of each half-cycle, marking tsetup and tskew]



       Relaxing the Timing
• Sequencing overhead caused by hard edges
   – Data departs dynamic gate on late rising edge
   – Must set up at latch on early falling edge
• Latch functions
   – Prevent glitches on inputs of domino gates
   – Hold results during precharge
• Is the latch really necessary?
   – No glitches if inputs come from other domino gates
   – Can we hold the results in another way?


   Skew-Tolerant Domino
• Use overlapping clocks to eliminate latches at phase boundaries
   – Second phase evaluates using results of first

[Figure: dynamic/static gate pairs clocked by φ1 and φ2 with nodes a, b, c, d and no latch at the phase boundary; waveforms of φ1, φ2, a, b, and c]



                  Full Keeper
• After second phase evaluates, first phase precharges
• Input to second phase falls
   – Violates monotonicity?
• But we no longer need the value
• Now the second gate has a floating output
   – Need full keeper to hold it either high or low

[Figure: dynamic gate with weak full keeper transistors. Labels: φ, H, X]



              Time Borrowing
• Overlap can be used to
   – Tolerate clock skew
   – Permit time borrowing
• No sequencing overhead

$$ t_{pd} = T_c $$

[Figure: overlapping clocks φ1 and φ2 with intervals labeled toverlap, tborrow, and tskew; pipeline of dynamic/static gate pairs, the first group clocked by φ1 (Phase 1) and the second by φ2 (Phase 2)]


              Multiple Phases
• With more clock phases, each phase overlaps more
   – Permits more skew tolerance and time borrowing

[Figure: four overlapping clock waveforms φ1, φ2, φ3, and φ4; pipeline of dynamic/static gate pairs, two per phase, grouped as Phase 1 through Phase 4]
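
The claim that more phases give more overlap can be checked with simple arithmetic. The sketch below assumes N phases evenly spaced by T_c / N, each with the same duty cycle (50% by default). These assumptions, the function name, and the cycle time are illustrative and are not stated on the slides.

```python
def phase_overlap(t_c, n_phases, duty_cycle=0.5):
    """Overlap between consecutive phases that are evenly spaced by T_c / N."""
    high_time = duty_cycle * t_c      # how long each phase stays high
    spacing = t_c / n_phases          # delay from one phase's rise to the next
    return high_time - spacing

T_C = 1000  # hypothetical cycle time in ps
for n in (2, 3, 4, 6):
    print(f"{n} phases: overlap = {phase_overlap(T_C, n):.1f} ps")
```

Under these assumptions, two 50% duty-cycle phases do not overlap at all, so a two-phase scheme would need a longer high time to create overlap. With four phases, consecutive clocks overlap for a quarter of the cycle.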



            Clock Generation

[Figure: clock generator producing phases φ1, φ2, φ3, and φ4 from inputs en and clk]


                  Summary
• Clock skew effectively increases setup and hold times in systems with hard edges
• Managing skew
   – Reduce: good clock distribution network
   – Analyze: local vs. global skew
   – Tolerate: use systems with soft edges
• Flip-flops and traditional domino are costly
• Latches and skew-tolerant domino perform at full speed even with moderate clock skews


