
Advances in Designing
Clockless Digital Systems

   Prof. Steven M. Nowick
   nowick@cs.columbia.edu
  Department of Computer Science
        Columbia University
        New York, NY, USA
     Introduction

   Synchronous vs. Asynchronous Systems?
       Synchronous Systems: use a global clock
          entire system operates at a fixed rate
          uses “centralized control”


   [Figure: system components driven by a single global clock]




                                                    #2
    Introduction (cont.)

   Synchronous vs. Asynchronous Systems? (cont.)
       Asynchronous Systems: no global clock
          components can operate at varying rates
          communicate locally via “handshaking”
          uses “distributed control”



   [Figure: components connected by “handshaking interfaces” (channels)]
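As a rough illustration of local handshaking, the sketch below lists the req/ack wire transitions a sender and receiver exchange under two common protocols: four-phase (return-to-zero) and two-phase (transition) signaling. The slide does not say which protocol is meant here; the function names are hypothetical.

```python
def four_phase_events(n_items: int) -> list[str]:
    """Four-phase (return-to-zero): req and ack each rise and fall once per item."""
    events = []
    for _ in range(n_items):
        events += ["req+", "ack+", "req-", "ack-"]
    return events


def two_phase_events(n_items: int) -> list[str]:
    """Two-phase (transition signaling): any single toggle of req or ack is an event."""
    events = []
    req = ack = 0
    for _ in range(n_items):
        req ^= 1  # sender toggles req: "new data is ready"
        events.append("req+" if req else "req-")
        ack ^= 1  # receiver toggles ack: "data consumed"
        events.append("ack+" if ack else "ack-")
    return events


print(four_phase_events(1))  # ['req+', 'ack+', 'req-', 'ack-']
print(two_phase_events(2))   # ['req+', 'ack+', 'req-', 'ack-']
```

The same four wire transitions carry one item under four-phase signaling but two items under two-phase signaling, which is the trade-off behind the transition-signaling pipelines later in the deck.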


                                                         #3
 Trends and Challenges
Trends in Chip Design: next decade
     “Semiconductor Industry Association (SIA) Roadmap” (1997-98)

Unprecedented Challenges:
     complexity and scale (= size of systems)
     clock speeds
     power management
     reusability & scalability
     “time-to-market”
Design is becoming unmanageable using a centralized,
  single-clock (synchronous) approach....
                                                             #4
  Trends and Challenges (cont.)

1. Clock Rate:
     1980: several megahertz
     2001: ~750 megahertz to 1+ gigahertz
     2005: several gigahertz


Design Challenge:
     “clock skew”: the clock edge must arrive near-simultaneously across the
      entire chip
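To see why skew matters, consider the standard setup-time constraint for a path from a launching register i to a capturing register j, where t_i and t_j are the clock arrival times at the two registers, t_cq is the clock-to-output delay, and t_setup is the capture register's setup time (symbols introduced here for illustration):

```latex
\[
T_{\text{clk}} \;\ge\; t_{cq} + t_{\text{logic}}^{\max} + t_{\text{setup}} + (t_i - t_j)
\]
```

When the capturing register's clock arrives earlier than the launching register's (t_j < t_i), the skew term eats directly into the cycle time left for logic; keeping |t_i - t_j| small across a large die is what makes global clock distribution hard.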

                                                             #5
   Trends and Challenges (cont.)

2. Chip Size and Density:
Total Transistors per Chip: 60-80% increase/year
        ~1970:   4 thousand (Intel 4004 microprocessor)
        today:   50-200+ million
        2006   and beyond: towards 1 billion+


Design Challenges:
     system complexity, design time, clock distribution
     global signals will require 10-20 clock cycles to cross the chip
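The cycle count is simply the cross-chip wire delay multiplied by the clock frequency. With a hypothetical cross-chip delay of 1.5 ns and a 10 GHz clock (illustrative numbers, not taken from the slide):

```latex
\[
N_{\text{cycles}} = \left\lceil t_{\text{cross}} \cdot f_{\text{clk}} \right\rceil
= \lceil 1.5\,\text{ns} \times 10\,\text{GHz} \rceil = 15
\]
```

which falls inside the 10-20 cycle range quoted above.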

                                                             #6
Trends and Challenges (cont.)

3. Power Consumption
     Low power: ever-increasing demand
        consumer electronics: battery-powered
        high-end processors: avoid expensive fans and packaging




Design Challenge:
     the clock inherently consumes power continuously
     “power-down” techniques: complex, only partly effective
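The usual first-order model of dynamic (switching) power makes the clock's cost explicit. Here α is the activity factor (the fraction of cycles in which a node makes a 0→1 transition), C the switched capacitance, V_DD the supply voltage, and f the clock frequency:

```latex
\[
P_{\text{dyn}} = \alpha \, C \, V_{DD}^{2} \, f
\]
```

The clock network switches every cycle (α = 1) and drives a large capacitance whether or not useful work is being done, while idle logic has α close to 0. The quadratic V_DD term is also why voltage scaling is such an effective power lever.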

                                                                  #7
  Trends and Challenges (cont.)

4. Time-to-Market, Design Re-Use, Scalability

Increasing pressure for faster “time-to-market”. Need:
     reusable components: “plug-and-play” design

     flexible interfacing: under varied conditions, voltage scaling

     scalable design: easy system upgrades


Design Challenge: mismatch with a central, fixed-rate clock


                                                                  #8
  Trends and Challenges (cont.)

5. Future Trends: “Mixed Timing” Domains
Chips themselves are becoming distributed systems....
     contain many sub-regions, operating at different speeds

 Design Challenge: breakdown of single, centralized clock control
                                                                 #9
   Asynchronous Design: Potential Advantages

Several Potential Advantages:

     Lower Power
        no clock → components use power only “on demand”

     Robustness, Scalability
        no global timing → “mix-and-match” variable-speed components
        composable/modular design style → “object-oriented”

     Higher Performance
        systems not limited to “worst-case” clock rate
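A toy comparison of the “worst-case” point, assuming a single datapath unit whose delay depends on its operands (all delays below are hypothetical): a synchronous design must give every operation a clock period sized for the worst case, whereas an asynchronous design with completion detection finishes each operation in its actual delay plus some handshake overhead.

```python
def synchronous_time(n_ops: int, worst_case_delay: float) -> float:
    # Every operation gets one full clock period sized for the slowest case.
    return n_ops * worst_case_delay


def asynchronous_time(delays: list[float], handshake_overhead: float = 0.0) -> float:
    # Each operation signals completion as soon as it finishes,
    # paying an assumed fixed handshake overhead per operation.
    return sum(d + handshake_overhead for d in delays)


# Hypothetical operation delays in ns: mostly fast, occasionally slow (e.g. a long carry chain).
ops = [1.0, 1.0, 1.0, 4.0, 1.0, 1.0, 1.0, 1.0]
print(synchronous_time(len(ops), worst_case_delay=4.0))  # 32.0
print(asynchronous_time(ops, handshake_overhead=0.5))    # 15.0
```

Real asynchronous circuits pay for completion detection or matched delays; the `handshake_overhead` parameter is only a crude stand-in for that cost.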


                                                                   #10
     Asynchronous Design: Some Recent Developments
1. Philips Semiconductors:
      commercial use: 100 million async chips for consumer electronics:
                pagers, cell phones, smart cards, digital passports, automotive
      3-4x lower power, less electromagnetic interference (“EMI”)
2. Intel:
      experimental: Pentium instruction-length decoder = “RAPPID” (1990s)
      3-4x faster than synchronous subsystem
3. Sun Labs:
      commercial use: high-speed FIFOs in recent “Ultras” (memory access)
4. IBM Research:
      experimental: high-speed pipelines, filters, mixed-timing systems

Recent Startups: Fulcrum, Theseus Logic, Handshake Solutions, Silistrix


                                                                            #11
        Asynchronous CAD Tools: Recent Developments
DARPA’s “CLASS” Program: Clockless Initiative (2003-07)
Goals:
    - CAD tool: produce viable commercial-grade async tool flow
    - demonstration: a complex Boeing ASIC chip
Participants:
        Lead (PI): Boeing
        Industrial participants:
           Philips (via its incubated async startup, “Handshake Solutions”)

           Theseus Logic, Codetronix

        Academic participants:
           Columbia, UNC, UW, Yale, OSU

Targets: cover a wide “design space”, from very robust to high-speed circuits
Columbia’s role: (i) high-speed pipelines, (ii) CAD optimizations


                                                                           #12
    Asynchronous Design: Challenges


   Critical Design Issues:

       components must communicate cleanly: “hazard-free” design
       highly concurrent designs: much harder to verify!
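To make “hazard-free” concrete, consider the classic static-1 hazard in f = a·b + a'·c. With b = c = 1, f should stay 1 while a changes, but the two product terms hand off to each other, so the output can glitch if one turns off before the other turns on. The sketch below checks the standard two-level condition for single-input changes (some product term must stay 1 across the whole transition) and shows that adding the consensus term b·c removes the hazard. The cube encoding is an assumption made for this example.

```python
from itertools import product

# A product term (cube) maps a variable name to its required value (0 or 1).
Cube = dict[str, int]


def covers(cube: Cube, point: dict[str, int]) -> bool:
    return all(point[v] == val for v, val in cube.items())


def evaluate(cover: list[Cube], point: dict[str, int]) -> int:
    return int(any(covers(c, point) for c in cover))


def static1_hazard_free(cover: list[Cube], variables: list[str]) -> bool:
    """True if every single-input 1->1 transition is held by one product term."""
    for values in product([0, 1], repeat=len(variables)):
        start = dict(zip(variables, values))
        for v in variables:
            end = {**start, v: 1 - start[v]}
            if evaluate(cover, start) and evaluate(cover, end):
                # Some single cube must be 1 at both endpoints of the transition.
                if not any(covers(c, start) and covers(c, end) for c in cover):
                    return False
    return True


f = [{"a": 1, "b": 1}, {"a": 0, "c": 1}]   # a·b + a'·c
f_fixed = f + [{"b": 1, "c": 1}]           # add the consensus term b·c
print(static1_hazard_free(f, ["a", "b", "c"]))        # False
print(static1_hazard_free(f_fixed, ["a", "b", "c"]))  # True
```

Full hazard-free minimization also has to handle multiple-input changes and dynamic hazards; this check covers only the single-input static-1 case.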


   Lack of Automated “Computer-Aided Design” Tools:
       most commercial “CAD” tools are targeted to synchronous design




                                                              #13
  What Are CAD Tools?

Software programs to aid digital designers =
                  “computer-aided design” tools
        automatically synthesize and optimize digital circuits




Input: desired circuit specification → CAD TOOL → Output: optimized circuit implementation



                                                                       #14
 Asynchronous Design Challenge

Lack of Existing Asynchronous Design Tools:
     Most commercial “CAD” tools are targeted to synchronous design

     Synchronous CAD tools:
        major drivers of growth in the microelectronics industry

     Asynchronous “chicken-and-egg” problem:
        few CAD tools → less commercial use of async design
        especially lacking: tools for designing/optimizing large systems




                                                                      #15
             Overview: My Research Areas
   CAD Tools for Asynchronous Controllers (FSMs)
       “MINIMALIST” Package: for synthesis + optimization

   Other Research Areas:
       CAD Tools for Designing Large-Scale Async Systems
       Mixed-Timing Interface Circuits:
          for interfacing sync/async systems

       High-Speed Asynchronous Pipelines



                                                             #16
             CAD Tools for Async Controllers
MINIMALIST: developed at Columbia University [1994-]
     extensible CAD package for synthesis of asynchronous controllers
     integrates synthesis, optimization and verification tools
     used at 80+ sites in 17+ countries (being taught at IIT Bombay)
     URL: http://www.cs.columbia.edu/async

Includes several optimization tools:
     State Minimization
     CHASM: optimal state encoding
     2-Level Hazard-Free Logic Minimization
     Verilog back-end
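MINIMALIST's state minimizer targets incompletely specified burst-mode machines, which is a considerably harder problem. As a simpler illustration of the idea only, the sketch below minimizes a completely specified Mealy machine by partition refinement: group states with identical outputs, then keep splitting groups until every member sends each input to the same group. The machine below is hypothetical.

```python
def minimize(states, inputs, next_state, output):
    """Return equivalence classes (frozensets) of a completely specified Mealy machine."""
    # Initial partition: states with identical output rows.
    blocks = {}
    for s in states:
        blocks.setdefault(tuple(output[(s, x)] for x in inputs), set()).add(s)
    partition = [frozenset(b) for b in blocks.values()]

    while True:
        block_of = {s: i for i, b in enumerate(partition) for s in b}
        refined = []
        for b in partition:
            groups = {}
            for s in b:
                # States stay together only if every input leads to the same block.
                key = tuple(block_of[next_state[(s, x)]] for x in inputs)
                groups.setdefault(key, set()).add(s)
            refined.extend(frozenset(g) for g in groups.values())
        if len(refined) == len(partition):  # no block was split: fixed point
            return refined
        partition = refined


transitions = {
    # state: ((next on 0, output on 0), (next on 1, output on 1))
    "A": (("B", 0), ("C", 0)),
    "B": (("A", 0), ("C", 0)),
    "C": (("D", 1), ("A", 0)),
    "D": (("D", 1), ("B", 0)),
    "E": (("E", 0), ("E", 0)),
}
inputs = [0, 1]
next_state = {(s, x): transitions[s][x][0] for s in transitions for x in inputs}
output = {(s, x): transitions[s][x][1] for s in transitions for x in inputs}
classes = minimize(list(transitions), inputs, next_state, output)
print(sorted(sorted(b) for b in classes))  # [['A', 'B'], ['C', 'D'], ['E']]
```

Five states collapse to three; E starts in the same output group as A and B but is split off in the first refinement step.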

Key goal: facilitate design-space exploration

                                                                         #17
   Example: “PE-SEND-IFC” (HP Labs)
   Inputs:  req-send, treq, rd-iq, adbld-out, ack-pkt
   Outputs: tack, peack, adbld

   [Figure: burst-mode specification with states 0-10; each arc is labeled
    "input burst / output burst", e.g. 0 → 1: req-send+ treq+ rd-iq+ / adbld+,
    1 → 2: adbld-out+ / peack+]

From HP Labs “Mayfly” Project:
   B. Coates, A. Davis, K. Stevens, “The Post Office Experience: Designing a
   Large Asynchronous Chip”, INTEGRATION: the VLSI Journal, vol. 15:3,
   pp. 341-66 (Oct. 1993)
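A burst-mode machine waits in each state until every input change of one of its input bursts has occurred, in any order, and only then fires the matching output burst. The minimal simulator below encodes just the first two arcs of the PE-SEND-IFC diagram (0 → 1 and 1 → 2, as far as they can be read from the figure); the data representation and the class are hypothetical.

```python
# Each arc: (from_state, input_burst, output_burst, to_state).
# Only the first two arcs of PE-SEND-IFC are encoded here.
ARCS = [
    (0, {"req-send+", "treq+", "rd-iq+"}, ["adbld+"], 1),
    (1, {"adbld-out+"}, ["peack+"], 2),
]


class BurstModeMachine:
    def __init__(self, arcs, start_state):
        self.arcs = arcs
        self.state = start_state
        self.seen = set()  # input edges observed since entering the current state

    def input_edge(self, edge: str) -> list[str]:
        """Record one input transition; return the output burst if an input burst completes."""
        legal = set().union(*(burst for src, burst, _, _ in self.arcs if src == self.state))
        if edge not in legal:
            raise ValueError(f"unexpected input {edge!r} in state {self.state}")
        self.seen.add(edge)
        for src, in_burst, out_burst, dst in self.arcs:
            if src == self.state and in_burst == self.seen:
                self.state, self.seen = dst, set()
                return out_burst
        return []  # burst incomplete: outputs must not change yet


m = BurstModeMachine(ARCS, start_state=0)
print(m.input_edge("treq+"))       # []
print(m.input_edge("rd-iq+"))      # []
print(m.input_edge("req-send+"))   # ['adbld+']
print(m.input_edge("adbld-out+"))  # ['peack+']
print(m.state)                     # 2
```

Real burst-mode specifications also require that no input burst leaving a state be a subset of another (the maximal set property), so the machine can tell which burst has completed; this sketch does not check that.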
                                                                                             #18
EXAMPLE (cont.): Design-Space Exploration using MINIMALIST:
      optimizing for area vs. speed
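Design-space exploration typically produces several candidate implementations and keeps only those that no other candidate beats on both axes. The sketch below filters (area, delay) results down to the Pareto-optimal set; the candidate names and numbers are invented for illustration and are not MINIMALIST output.

```python
def pareto_front(candidates: dict[str, tuple[float, float]]) -> list[str]:
    """Keep candidates not dominated in (area, delay); smaller is better for both."""
    front = []
    for name, (area, delay) in candidates.items():
        dominated = any(
            a2 <= area and d2 <= delay and (a2, d2) != (area, delay)
            for other_name, (a2, d2) in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return sorted(front, key=lambda n: candidates[n][0])  # order by area


# Hypothetical results: (area units, delay in ns).
results = {
    "min-area": (40.0, 3.1),
    "balanced": (52.0, 2.4),
    "min-delay": (70.0, 1.9),
    "poor": (60.0, 2.8),  # larger and slower than "balanced"
}
print(pareto_front(results))  # ['min-area', 'balanced', 'min-delay']
```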




Examples:


                                      #19
    CAD Tools for Large-Scale Asynchronous Systems

   Input Specification = “Control Data-flow Graph”
   [Figure: example CDFG with nodes Start, B:=2dx+dx, C:=X<a, Loop C<0, End,
    M:=U*X1, X:=X+dx, C:=X<a, Endloop]

   Target Architecture:
   [Figure: control unit made of Ctrlr 1, Ctrlr 2, Ctrlr 3, each driving a
    Functional Unit; Registers]
   Target:
     - synthesize distributed control
     - 1 controller per functional unit

[Theobald/Nowick, IEEE Design Automation Conf. (2001)]
                                                                   #20
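The “one controller per functional unit” target can be pictured as a partitioning step: given a CDFG whose operations are already bound to functional units, each controller receives the operations for its unit in order, plus the cross-unit data dependencies it must synchronize on through handshakes. The operation list, unit names, and data structures below are hypothetical, loosely modeled on the loop body in the figure; this is not the Theobald/Nowick algorithm.

```python
from collections import defaultdict

# Hypothetical loop body: (operation, bound functional unit, operations it depends on).
OPS = [
    ("M := U * X1", "MUL", []),
    ("X := X + dx", "ALU", []),
    ("C := X < a", "CMP", ["X := X + dx"]),
]


def distribute_control(ops):
    """Split a bound, ordered operation list into one controller per functional unit."""
    unit_of = {name: unit for name, unit, _ in ops}
    controllers = defaultdict(lambda: {"ops": [], "waits_on": []})
    for name, unit, deps in ops:
        ctrl = controllers[unit]
        ctrl["ops"].append(name)
        for d in deps:
            if unit_of[d] != unit:
                # Cross-unit dependency: this controller must handshake with
                # the producer's controller before starting `name`.
                ctrl["waits_on"].append((unit_of[d], d))
    return dict(controllers)


for unit, ctrl in distribute_control(OPS).items():
    print(unit, ctrl)
```

Dependencies between operations on the same unit would need no handshake, since that unit's controller already sequences them locally.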
   Mixed-Timing Interfaces


   [Figure: two asynchronous domains and two synchronous domains
    (Synchronous Domain 1, Synchronous Domain 2)]



Goal: provide low-latency communication between “timing domains”
Challenge: avoid synchronization errors
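Synchronization errors arise when a signal from another timing domain changes too close to a sampling clock edge and a flip-flop goes metastable. A commonly used first-order estimate of the mean time between synchronizer failures uses t_r (the time allowed for resolution), τ (the flip-flop's regeneration time constant), T_0 (a device-dependent constant), f_clk (the sampling clock frequency), and f_d (the rate of data transitions):

```latex
\[
\text{MTBF} = \frac{e^{\,t_r/\tau}}{T_0 \, f_{\text{clk}} \, f_d}
\]
```

Because MTBF grows exponentially with t_r, adding synchronizer stages buys reliability at the cost of latency, which is the tension behind the low-latency goal above.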
                                                                   #21
  Mixed-Timing Interfaces: Solution
   [Figure: asynchronous domains and Synchronous Domains 1 and 2 linked by
    Async-Sync, Sync-Async, and Mixed-Clock FIFOs]

Solution: insert mixed-timing FIFOs → provide safe data transfer
                            … developed a complete family of mixed-timing interface circuits
                                                 [Chelcea/Nowick, IEEE Design Automation Conf. (2001)]
                                                                   #22
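The Chelcea/Nowick FIFO circuits are not reproduced here. As a sketch of a different, widely used technique for the mixed-clock case, the code below keeps the FIFO's read and write pointers in Gray code, so a pointer passed to the other clock domain changes only one bit per increment and a synchronizer can never sample a wildly wrong value. The synchronizer flip-flops themselves are omitted, and the class and function names are hypothetical.

```python
def bin_to_gray(b: int) -> int:
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    # The receiving domain converts back to binary for pointer arithmetic.
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


class GrayPointerFifo:
    """FIFO with 2**addr_bits entries; pointers carry one extra wrap bit."""

    def __init__(self, addr_bits: int):
        assert addr_bits >= 1
        self.addr_bits = addr_bits
        self.depth = 1 << addr_bits
        self.ptr_mask = (1 << (addr_bits + 1)) - 1
        self.mem = [None] * self.depth
        self.wptr = 0  # binary write pointer (writer's clock domain)
        self.rptr = 0  # binary read pointer (reader's clock domain)

    def empty(self) -> bool:
        # In hardware the reader compares against a synchronized copy of the
        # Gray write pointer; this untimed model compares directly.
        return bin_to_gray(self.rptr) == bin_to_gray(self.wptr)

    def full(self) -> bool:
        # Pointers differ only in the wrap bit: in Gray code the top two
        # bits are inverted and the remaining bits are equal.
        top_two = 0b11 << (self.addr_bits - 1)
        return bin_to_gray(self.wptr) == bin_to_gray(self.rptr) ^ top_two

    def push(self, item) -> None:
        if self.full():
            raise OverflowError("FIFO full")
        self.mem[self.wptr & (self.depth - 1)] = item
        self.wptr = (self.wptr + 1) & self.ptr_mask

    def pop(self):
        if self.empty():
            raise IndexError("FIFO empty")
        item = self.mem[self.rptr & (self.depth - 1)]
        self.rptr = (self.rptr + 1) & self.ptr_mask
        return item


# Successive Gray codes differ in exactly one bit, including the wrap-around.
assert all(bin(bin_to_gray(i) ^ bin_to_gray((i + 1) % 8)).count("1") == 1 for i in range(8))
fifo = GrayPointerFifo(addr_bits=2)
for x in "abcd":
    fifo.push(x)
print(fifo.full(), [fifo.pop() for _ in range(4)], fifo.empty())  # True ['a', 'b', 'c', 'd'] True
```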
 High-Speed Asynchronous Pipelines



NON-PIPELINED COMPUTATION:        “datapath component” =
                                     adder, multiplier, etc.

[Figure: SYNCHRONOUS datapath component driven by a global clock]
                                                               #23
   High-Speed Asynchronous Pipelines

“PIPELINED COMPUTATION”: like an assembly line

[Figure: SYNCHRONOUS pipeline driven by a global clock vs. ASYNCHRONOUS
 pipeline with no global clock]
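The assembly-line analogy can be quantified with a simple synchronous model (hypothetical stage delays; latch overhead ignored unless given): without pipelining, each item passes through all of the logic before the next one starts, while with pipelining a new item can enter every clock period, which is set by the slowest stage.

```python
def nonpipelined_time(stage_delays: list[float], n_items: int) -> float:
    # Each item traverses the whole datapath before the next one starts.
    return n_items * sum(stage_delays)


def pipelined_time(stage_delays: list[float], n_items: int, latch_overhead: float = 0.0) -> float:
    # The clock period is set by the slowest stage; the pipeline needs
    # (stages - 1) extra periods to fill before results emerge every period.
    period = max(stage_delays) + latch_overhead
    return (n_items + len(stage_delays) - 1) * period


delays = [2.0, 3.0, 2.0]  # hypothetical ns per stage
print(nonpipelined_time(delays, 10))  # 70.0
print(pipelined_time(delays, 10))     # 36.0
```

In the synchronous pipeline the fast stages sit idle waiting for the global period; an asynchronous pipeline instead lets each stage hand data on as soon as it and its successor are ready.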
                                                 #24
   High-Speed Asynchronous Pipelines
Goal: extremely fast async datapath components
      speed: comparable to fastest existing synchronous designs
      additional benefits:
           dynamically adapt to variable-speed interfaces: voltage scaling!
           “elastic” processing of data in pipeline
           no clock distribution

Contributions: 3 new async pipeline styles          [SINGH/NOWICK]
      MOUSETRAP: static logic
      High-Capacity and Lookahead: dynamic logic

   Obtain multi-gigahertz speeds
   Used by IBM, currently incorporated into the Philips tool flow



                                                                               #25
   MOUSETRAP: A Basic FIFO (no computation)

Stages communicate using transition-signaling:
[Figure: three stages (Stage N-1, Stage N, Stage N+1), each with a Data Latch
 and a Latch Controller; signals reqN, doneN, reqN+1, ackN-1, ackN and latch
 enable En; Data in → Data out]

[Singh/Nowick, IEEE Int. Conf. on Computer Design (2001)]
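The transition-signaling behavior can be sketched as an untimed model. It assumes a latch controller that is a single XNOR gate, so a stage's latch is transparent exactly when its doneN and ackN signals are equal (En = XNOR(doneN, ackN)), with ackN being the done signal of stage N+1. Latch and gate delays are not modeled, and the class is hypothetical.

```python
class MousetrapFifo:
    """Untimed model of a linear MOUSETRAP-style FIFO with two-phase signaling."""

    def __init__(self, n_stages: int):
        self.n = n_stages
        self.done = [0] * n_stages   # phase of the req each stage has latched (doneN)
        self.data = [None] * n_stages
        self.in_req, self.in_data = 0, None  # left environment: req phase and data
        self.out_ack = 0                     # right environment: ack phase

    def _ack(self, i: int) -> int:
        # ackN is simply doneN+1 (or the right environment's ack).
        return self.done[i + 1] if i + 1 < self.n else self.out_ack

    def _req(self, i: int) -> int:
        return self.done[i - 1] if i > 0 else self.in_req

    def _settle(self) -> None:
        changed = True
        while changed:
            changed = False
            for i in range(self.n):
                enabled = self.done[i] == self._ack(i)        # En = XNOR(doneN, ackN)
                if enabled and self._req(i) != self.done[i]:  # a new transition arrives
                    self.data[i] = self.data[i - 1] if i > 0 else self.in_data
                    self.done[i] = self._req(i)  # latch holds the item; En drops
                    changed = True

    def send(self, item) -> bool:
        if self.in_req != self.done[0]:  # stage N-1 has not latched the previous item
            return False
        self.in_data, self.in_req = item, self.in_req ^ 1  # one transition per item
        self._settle()
        return True

    def receive(self):
        last = self.n - 1
        if self.done[last] == self.out_ack:  # no new item at the output
            return None
        item = self.data[last]
        self.out_ack ^= 1  # acknowledge with a single transition
        self._settle()
        return item


fifo = MousetrapFifo(n_stages=3)
print([fifo.send(x) for x in "abc"])       # [True, True, True]
print([fifo.receive() for _ in range(3)])  # ['a', 'b', 'c']
print(fifo.receive())                      # None
```

Each stage holds an item exactly while its doneN differs from ackN, so a 3-stage FIFO can buffer three items without the receiver acting.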
                                                                                   #26
    “MOUSETRAP” Pipeline: w/computation

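[Figure: Stage N-1, Stage N, Stage N+1, each with a logic block and a Data Latch
 controlled by a Latch Controller; each req (reqN, reqN+1) passes through a
 delay element alongside its logic block; signals ackN-1, ackN, doneN]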




Function Blocks: use “synchronous” single-rail circuits (not hazard-free!)
“Bundled Data” Requirement:
       each “req” must arrive after its data inputs are valid and stable
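The bundled-data requirement amounts to a per-stage timing check: the matched delay on each req path must exceed the worst-case delay of the logic it accompanies, usually with some safety margin. The sketch below checks that inequality for hypothetical stage delays; the margin and all numbers are assumptions, and a real check would also account for latch and wire delay differences.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    logic_max_delay: float  # worst-case delay through the function block (ps)
    matched_delay: float    # delay element on the req path (ps)


def bundling_violations(stages: list[Stage], margin: float) -> list[str]:
    """Return stages whose req could arrive before their data is valid and stable."""
    return [s.name for s in stages if s.matched_delay < s.logic_max_delay + margin]


pipeline = [
    Stage("N-1", logic_max_delay=180.0, matched_delay=220.0),
    Stage("N", logic_max_delay=250.0, matched_delay=260.0),
    Stage("N+1", logic_max_delay=150.0, matched_delay=200.0),
]
print(bundling_violations(pipeline, margin=20.0))  # ['N']
```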

                                                                                               #27
#28
      MOUSETRAP: A Basic FIFO
   Stages communicate using transition-signaling: 1 transition per data item!
[Figure: the same three-stage FIFO (Stage N-1, Stage N, Stage N+1; Data Latch,
 Latch Controller, reqN, doneN, reqN+1, ackN-1, ackN, En; Data in → Data out),
 shown passing One Data Item]
                                                                                 #29

								