Design Productivity Crisis UCLA VLSI CAD Lab

Welcome to PROFIT
Pacific-Rim Outlook Forum on IC Technology

Background of PROFIT
 IC-DFN: International Center for Design on Nanotechnologies (2000 – 2005)
 IC-SOC: Giga-Scale System-On-A-Chip International Research Center (2006 – 2009)

IC-SOC and IC-DFN workshops from 2000 to 2008:
   ICDFN 2008 - Tianjin, China
   ICDFN 2007 - Rizhao, China
   ICDFN 2007 - Las Vegas, NV
   ICDFN 2006 - Hangzhou, China
   ICDFN 2006 - Grand Formosa Taroko
   ICDFN 2005 - Chengdu
   ICDFN 2004 - Hawaii
   ICDFN 2004 - Changsha
   ICDFN 2003 - Kunming
   ICDFN 2003 - Taiwan
   ICDFN 2002 - Santa Barbara
   ICDFN 2002 - Beijing
   ICDFN 2002 - Taiwan
   ICDFN 2001 - Taiwan
 www.profitforum.org

Coping with Vertical Interconnect Bottleneck
 Jason Cong
 UCLA Computer Science Department
 cong@cs.ucla.edu
 http://cadlab.cs.ucla.edu/~cong

Outline
 Lessons learned
 Research challenges and opportunities

   Dr. Mike Fritze
   Chenson Chen and Craig Keast, MIT Lincoln Labs
   “Advanced 3D CAD/CAE for the Design of Mixed Signal Systems”, Mr. Kurt Obermiller, PTC
   “Technology & Design Infrastructure for High Performance 3D-ICs”, Dr. Albert Young, IBM
   “3D Integrated Circuit with Unlimited Upward Extendibility”, Dr. Simon Wong, Stanford University
   “3D Modular Integration for Massively Stacked Systems”, Dr. Volkan Ozguz, Irvine Sensors Corporation
   Larry Smith, Sematech
   CJ Shi, University of Washington
   Amy Moll, Boise State
   Gabriel Loh, Georgia Tech
   Jason Cong, UCLA

Early Studies of 3D ICs
 K. Banerjee, S. J. Souri, P. Kapur, K. C. Saraswat, “3D ICs: A novel chip design for improving deep submicron interconnect performance and systems-on-chip integration,” Proc. IEEE, Special Issue on Interconnects, May 2001, pp. 602-633.
 Y. Deng, W. Maly, Characteristics of 2.5-D System Integration Scheme

Recent Work on 3D Physical Design Flow
(IBM, UCLA, and PSU) (2006 – 2008)
 [Figure: 3D physical design flow organized around a 3D OA database]
   Inputs: layer & design rules (LEF), cell & via* definitions (LEF), netlist (HDL or DEF)
   3D OA: Tech. Lib, Ref. Lib, Design; tier export / tier import to and from 2D OA
   Left side (labeled PSU): 3D RC extraction, timing interface to EinsTimer, 3D DRC & 3D LVS, layout (GDSII)
   Right side (labeled UCLA): thermal-driven 3D floorplanner, thermal-driven 3D placer, 3D global router, thermal-via planner, detailed routing by Cadence router
 UCLA 3D research started in 2002 under DARPA with CFDRC
 10/8/2007                     UCLA VLSICAD LAB

3D Architecture Evaluation with Physical Planning
-- MEVA-3D [DAC’03 & ASPDAC’06]
 Optimize
   BIPS (not IPC or Freq)
     • Consider interconnect pipelining based on early floorplanning for critical paths
     • Use IPC sensitivity model [Jagannathan05]
   Area/wirelength
   Temperature
 [Figure: MEVA-3D evaluation flow]
   Inputs: microarchitecture configuration, target frequency, critical architectural paths and sensitivity, power density estimates
   ESTIMATION: 2D/3D floorplanning for performance and thermal with interconnect pipelining, producing estimated performance, temperature, and interconnect data
   VALIDATION: performance simulation with interconnect latencies; power density with interconnect consideration; 2D/3D thermal simulation
   Outputs: performance, power and temperature

Design Driver 1 (Using Top-Level Floorplan)
 An out-of-order superscalar processor micro-architecture with 4 banks of L2 cache in 70nm technology
 Critical paths
 [Figure: top-level floorplan with critical paths, not reproduced]

Top-Level Wirelength Improvement from 3D Stacking
 Close to 2X WL reduction (for top-level interconnects)
 [Figure: bar chart of top-level wirelength (vertical axis 0 to 120000) for 2D vs 3D at 3G, 4G, 5G and 6G]
 Assume two device layers

Performance Improvement from 3D Stacking
 Disappointing ...
 [Figure: performance comparison chart, not reproduced]
 Assume two device layers

2D vs 3D Layout
 Assume two device layers
   2D EV6-like core: BIPS = 2.75
   3D EV6-like core (2 layers): BIPS = 2.94
 Wakeup loop: the extra cycle is eliminated.
 Branch misprediction resolution loop and the L2 cache access latency: some of the extra cycles are eliminated.

Design Driver 2 (Using Full RTL)
 An open-source 32-bit processor
   Compliant with SPARC V8 architecture
 Synthesized by Cadence RTL compiler with UMC 90nm digital cell library and Faraday memory compiler
   Configuration: single core with 4KB data cache and 4KB instruction cache as direct-mapped caches
   Statistics:
     • #cell = 34225
     • #macro = 12
     • #net = 36789
     • Total area = 6.67 × 10^5 μm^2

Logical Hierarchy of LEON3
   LEON3 (77.8% area)
      Processor core (11.1% area)
          •   Integer unit (6.6% area)
          •   Multiplier (1.6% area)
          •   Divider (0.7% area)
          •   Memory management unit (2.2% area)
      Register file (16.6% area)
      Cache memory (38.1% area)
      TLB memory (12.0% area)
   Debug support unit (13.4% area)
   Other (8.8% area)
        Memory controller (1.8% area)
        Interrupt controller (0.3% area)
        UART serial interface (0.7% area)
        AMBA AHB bus, AMBA APB bus (4.3% area)
        General purpose timer unit (1.4% area)
        General purpose I/O unit (0.3% area)
   [Figure: area diagram with regions labeled Core, Register file, Cache memory, TLB memory, Debug support, Other]

3D Placement Restricted By Logical Hierarchies

 Comparisons
              Flat        Processor core    Register file
                          restricted        restricted
   HPWL       0.99 (m)    1.09 (m)          1.20 (m)
   #TSV       3835        1715              845
   Flat 3D placement
   Processor core restricted
    • Processor core is restricted to only one device layer
            Including integer unit, multiplier, divider and MMU
   Register file restricted
    • Register file is restricted to only one device layer
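
The trade-off in the comparison above can be made explicit with a little arithmetic. The sketch below only restates the table's numbers as percentage changes relative to the flat placement; the dictionary layout and function names are illustrative and not part of the original flow.

```python
# Figures copied from the comparison table (HPWL in meters, TSV counts).
results = {
    "flat": (0.99, 3835),
    "core_restricted": (1.09, 1715),
    "regfile_restricted": (1.20, 845),
}

def percent_change(value, baseline):
    """Signed percentage change of value relative to baseline."""
    return 100.0 * (value - baseline) / baseline

def compare_to_flat(results):
    """Return {name: (HPWL change %, TSV change %)} relative to the flat run."""
    base_hpwl, base_tsv = results["flat"]
    summary = {}
    for name, (hpwl, tsv) in results.items():
        summary[name] = (
            round(percent_change(hpwl, base_hpwl), 1),
            round(percent_change(tsv, base_tsv), 1),
        )
    return summary

summary = compare_to_flat(results)
for name, (d_hpwl, d_tsv) in summary.items():
    print(f"{name:20s} HPWL {d_hpwl:+6.1f}%   #TSV {d_tsv:+6.1f}%")
```

Read this way, restricting the register file to one layer removes roughly 78% of the TSVs at the cost of about 21% more wirelength, while restricting the processor core removes roughly 55% of the TSVs for about 10% more wirelength.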
Lessons Learned

 Block stacking following the logic hierarchy gives limited performance and WL reduction
 Full potential is realized with extensive vertical connections

Research challenges and opportunities

 Novel 3D architecture component designs that can cope with the vertical interconnect bottleneck
 Physical synthesis tools that can fully comply with global and local TSV density constraints
 3D microarchitecture exploration, including generating optimized 3D physical hierarchies under the TSV density constraints
 New interconnect technologies that can alleviate or eliminate the vertical interconnect bottleneck

Results from 3D Folding and Stacking
 [Figure: bar chart (vertical axis 0 to 4) comparing 1-, 2-, 3- and 4-layer designs at 3G, 4G, 5G and 6G]
 Over 35% performance improvement

5GHz 3 Device Layer Layout
 [Figure: layout, not reproduced]

3D Architectural Blocks – Issue Queue
 Block folding
   Fold the entries and place them on different layers
   Effectively shortens the tag lines
 Port partitioning
   Place tag lines and ports on multiple layers, thus reducing both the height and width of the ISQ.
   The reduction in tag and matchline wires can help reduce both power and delay.
 Benefits from block folding
   Maximum delay reduction of 50%, maximum area reduction of 90% and a maximum reduction in power consumption of 40%
 [Figure: (a) 2D issue queue with 4 taglines; (b) block folding; (c) port partitioning]

3D Architectural Blocks – Caches
   3D-CACTI: a tool to model 3D caches for area, delay and power
     We add a port partitioning method
     The area impact of vias
   Improvements
     Port folding performs better than wordline folding for area (72% vs 51%)
     Wordline folding is more effective in reducing the block delay (13% vs 5%)
     Port folding also performs better in reducing power (13% vs 5%)
   Requires dense TSVs
   [Figure: single layer design, wordline folding, port partitioning]

Research Challenges and Opportunities (2)

 Novel 3D architecture component designs that can cope with the vertical interconnect bottleneck
 Physical synthesis tools that can fully comply with global and local TSV density constraints
 3D microarchitecture exploration, including generating optimized 3D physical hierarchies under the TSV density constraints
 New interconnect technologies that can alleviate or eliminate the vertical interconnect bottleneck

Current Approaches to Handling TSV Constraints
   Approach 1: minimizing WL + k × #TSVs
   Approach 2: minimizing WL (or weighted WL) subject to a constraint on the total #TSVs
   Neither of these can handle local TSV density constraints
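
The two formulations above, and the gap they leave, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the tools' actual cost functions: nets are lists of hypothetical (x, y, layer) pins, WL is taken as half-perimeter wirelength, each net is assumed to need one TSV per layer boundary it crosses, and those TSVs are assumed to sit at the center of the net's bounding box.

```python
from collections import Counter

# A net is a list of pins; each pin is (x, y, layer). All names are illustrative.

def hpwl(net):
    """Half-perimeter wirelength of a net's 2D bounding box."""
    xs = [p[0] for p in net]
    ys = [p[1] for p in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def tsv_count(net):
    """Assumed TSVs for a net: one per layer boundary it spans."""
    layers = [p[2] for p in net]
    return max(layers) - min(layers)

def approach1_cost(nets, k):
    """Approach 1: WL + k * #TSVs, summed over all nets."""
    return sum(hpwl(n) + k * tsv_count(n) for n in nets)

def approach2_feasible(nets, max_total_tsv):
    """Approach 2: the total-#TSV constraint under which WL is minimized."""
    return sum(tsv_count(n) for n in nets) <= max_total_tsv

def local_tsv_violations(nets, bin_size, max_per_bin):
    """Bins whose TSV count exceeds a per-bin cap (a local density check)."""
    bins = Counter()
    for n in nets:
        t = tsv_count(n)
        if t == 0:
            continue
        # Assumption: the TSVs sit at the center of the net's bounding box.
        cx = (min(p[0] for p in n) + max(p[0] for p in n)) / 2
        cy = (min(p[1] for p in n) + max(p[1] for p in n)) / 2
        bins[(int(cx // bin_size), int(cy // bin_size))] += t
    return {b: c for b, c in bins.items() if c > max_per_bin}

nets = [
    [(0, 0, 0), (4, 2, 1)],      # crosses one layer boundary
    [(1, 1, 0), (3, 3, 1)],      # crosses one layer boundary, same region
    [(10, 10, 0), (12, 11, 0)],  # stays on one layer
]

cost = approach1_cost(nets, k=2)
feasible = approach2_feasible(nets, max_total_tsv=2)
violations = local_tsv_violations(nets, bin_size=5, max_per_bin=1)
```

In this toy instance the total TSV budget is met, yet both TSVs land in the same bin and exceed the per-bin cap, which is the kind of local density violation that neither formulation detects.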

Research Challenges and Opportunities (3)

 Novel 3D architecture component designs that can cope with the vertical interconnect bottleneck
 Physical synthesis tools that can fully comply with global and local TSV density constraints
 3D microarchitecture exploration, including generating optimized 3D physical hierarchies under the TSV density constraints
 New interconnect technologies that can alleviate or eliminate the vertical interconnect bottleneck

Example: Impact of Following Logical Hierarchy
 Comparisons
              Flat        Processor core    Register file
                          restricted        restricted
   HPWL       0.99 (m)    1.09 (m)          1.20 (m)
   #TSV       3835        1715              845
   Flat 3D placement
   Processor core restricted
     • Processor core is restricted to only one device layer
              Including integer unit, multiplier, divider and MMU
   Register file restricted
     • Register file is restricted to only one device layer
 Question: how much logic hierarchy to flatten for 3D design/optimization?

Research Challenges and Opportunities (4)

 Novel 3D architecture component designs that can cope with the vertical interconnect bottleneck
 Physical synthesis tools that can fully comply with global and local TSV density constraints
 3D microarchitecture exploration, including generating optimized 3D physical hierarchies under the TSV density constraints
 New interconnect technologies that can alleviate or eliminate the vertical interconnect bottleneck

Contactless Interconnects
 Inductor-coupled interconnect (TX and RX coupled through the M field)
   Advantages:
     More effective for longer distance communication (hundreds of microns)
   Disadvantages:
     Larger size
     Higher crosstalk between channels
 Capacitor-coupled interconnect (TX and RX coupled through the E field)
   Advantages:
     Smaller size
     Lower crosstalk
   Disadvantages:
     Effective only for short distance communication (several microns)
   Suitable for 3DIC integration

Die Photos (MIT LL 0.18um)
 [Figures: BISI die photo and RFI die photo, each labeled with the coupling capacitor, TX in Layer 2 and RX in Layer 1]

BISI Test Results [ISSCC’07]
 Data rate: 10Gbps
 [Figures: output eye diagram and output versus input waveforms; scales shown are 20ps/div, 500ps/div and 50mV/div]

Conclusions
 There are never enough vertical interconnects (VIs)
 Need to cope with VI constraints
    Novel 3D architecture component designs
    Physical synthesis tools that can fully comply with global and local TSV density constraints
    3D microarchitecture exploration, including generating optimized 3D physical hierarchies under the TSV density constraints
 Need to find ways to break the VI bottleneck
    New interconnect technologies

Acknowledgements
 We would like to thank DARPA for its support
 Support from the primary contractors -- collaboration with CFDRC and IBM
 Publications are available from http://cadlab.cs.ucla.edu/~cong

Example 1 Processor Parameters
 [Table of processor parameters not reproduced]

Example 2 Flat 3D Placement
 HPWL = 0.99 (m)
 #TSV = 3835
 Placement (bottom layer, top layer)
 [Figure: placements with regions labeled Core, Register file, Cache memory, TLB memory, Debug support, Other]

Example 2 Processor Core Restricted
 HPWL = 1.09 (m)
 #TSV = 1715
 Placement (bottom, top)
 [Figure: placements with regions labeled Core, Register file, Cache memory, TLB memory, Debug support, Other]

Example 2 Register File Restricted
 HPWL = 1.20 (m)
 #TSV = 845
 Placement (bottom, top)
 [Figure: placements with regions labeled Core, Register file, Cache memory, TLB memory, Debug support, Other]

Further Discussions of Example 2
 Comparisons
              Flat        Processor core    Register file
                          restricted        restricted
   HPWL       0.99 (m)    1.09 (m)          1.20 (m)
   #TSV       3835        1715              845
 To quantify the impact of TSV
    • Example: TSV in MIT Lincoln 180nm SOI 3D technology
             Resistance: one TSV is equivalent to an 8-20 μm metal 2 wire
             Capacitance: one TSV is equivalent to a 0.2 μm metal 2 wire

 Conclusion
   TSV impact on the RC is not significant
   Some logical units are better distributed across different device layers
    • E.g., the register file in the LEON3 circuit
   Flat 3D placement is preferred to optimize total RC
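
The conclusion that TSV impact on RC is small can be checked with rough arithmetic from the numbers on this slide. The sketch below converts each TSV into its quoted equivalent length of metal 2 wire and adds it to the HPWL. Treating HPWL as if it were all metal 2 wire is a simplifying assumption made here for illustration, as are the variable and function names.

```python
# Numbers from the slide: HPWL in meters and TSV counts per placement style.
HPWL_M = {"flat": 0.99, "core_restricted": 1.09, "regfile_restricted": 1.20}
TSVS = {"flat": 3835, "core_restricted": 1715, "regfile_restricted": 845}

# One TSV is quoted as equivalent to 8-20 um of metal 2 wire in resistance
# and 0.2 um in capacitance (MIT Lincoln 180nm SOI 3D technology).
TSV_R_EQUIV_UM = (8.0, 20.0)
TSV_C_EQUIV_UM = 0.2

def effective_length_m(hpwl_m, n_tsv, equiv_um):
    """HPWL plus the TSVs' equivalent wire length, in meters.

    Assumption: HPWL is treated as metal 2 wire so the two can be added.
    """
    return hpwl_m + n_tsv * equiv_um * 1e-6

# Worst case for resistance: every TSV counted as 20 um of wire.
worst_case = {
    name: effective_length_m(HPWL_M[name], TSVS[name], TSV_R_EQUIV_UM[1])
    for name in HPWL_M
}
best = min(worst_case, key=worst_case.get)

# Capacitance-equivalent length added by the TSVs in the flat placement.
flat_cap_extra_m = TSVS["flat"] * TSV_C_EQUIV_UM * 1e-6

for name, length in worst_case.items():
    print(f"{name:20s} worst-case effective length = {length:.4f} m")
print("shortest:", best)
```

Even with the pessimistic 20 μm figure, the 3835 TSVs of the flat placement add only about 0.077 m to its 0.99 m of wirelength, so under these assumptions the flat placement still has the shortest effective length.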

3D Capacitive RF-Interconnect
 [Figure: transmitter on tier N+1 (INPUT, buffer, ASK modulator producing the ASK signal) coupled through a coupling capacitor to the receiver on tier N (envelope detector, sense amp, buffer, OUTPUT)]

The NRZ baseband signal is up-converted by an RF carrier at the
transmitter (tier N+1) using ASK (amplitude-shift keying)
modulation, and an RF envelope detector at the receiver
(tier N) recovers the NRZ data.
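
The modulate-then-detect idea can be sketched numerically. The example below is an idealized illustration, not a model of the actual circuit: it uses on-off keying (the simplest form of ASK), a noiseless channel, and a per-bit peak detector with a fixed threshold standing in for the envelope detector and sense amp. The sample counts, carrier cycles per bit, and function names are all assumptions chosen for the example.

```python
import math

def ask_modulate(bits, samples_per_bit=64, cycles_per_bit=8):
    """On-off ASK: send the carrier for a 1 bit, nothing for a 0 bit."""
    signal = []
    for b in bits:
        for s in range(samples_per_bit):
            phase = 2 * math.pi * cycles_per_bit * s / samples_per_bit
            signal.append(b * math.sin(phase))
    return signal

def envelope_detect(signal, samples_per_bit=64, threshold=0.5):
    """Recover NRZ bits from the peak rectified amplitude in each bit slot."""
    bits = []
    for start in range(0, len(signal), samples_per_bit):
        chunk = signal[start:start + samples_per_bit]
        envelope = max(abs(x) for x in chunk)  # idealized envelope
        bits.append(1 if envelope > threshold else 0)
    return bits

tx_bits = [1, 0, 1, 1, 0, 0, 1]
rx_bits = envelope_detect(ask_modulate(tx_bits))
print(rx_bits == tx_bits)
```

Because the receiver only looks at the carrier's amplitude, it needs no carrier phase or frequency recovery, which is one reason an envelope detector is a natural fit for this kind of link.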

				