									               Michael P. Frank


           http://www.eng.fsu.edu/~mpf




Approaching the Physical Limits of
          Computing

  Invited talk, presented at ISMVL 2005
 35th Int’l Symp. on Multiple-Valued Logic
    (Sponsor: IEEE Computer Society)
             Friday, May 20, 2005
                           Abstract of Talk
• Fundamental physics limits the performance of
  conventional computing technologies.
     – The energy efficiency of conventional machines will be
       forced to level off in roughly the next 10-20 years.
           • Practical computer performance will then plateau as well.
• However, all of the proven limits to computer energy
  efficiency can, in principle, be circumvented…
     – but only if computing undergoes a radical paradigm shift.
• The essential new paradigm: Reversible computing.
     – It involves reusing energy to improve energy efficiency.
           • However, doing this well tightly constrains computer design at all
             levels from devices through logic, architectures, and algorithms.
• In this talk, I review the stringent physical and logical
  requirements that must be met,
     – if we wish to break through the near-term barriers,
           • and approach the true physical limits of computing.
9/4/2010             M. Frank, "Approaching the Physical Limits of Computing"     2
            Moore's Law and Performance
[Figure: "Moore's Law - Transistors per Chip". Log-scale plot of devices per IC (1 to 1,000,000,000) vs. year of introduction (1950-2010); points labeled 4004, 8086, 286, 386, 486DX, Pentium, P2, P3, P4, Itanium 2, Madison; average increase of 57%/year.]
• Gordon Moore, 1975:
    – Devices per IC can be doubled every 18 months
           • Borne out by history!
• Some fortuitous corollaries:
    – Every 3 years: Devices ½ as long
    – Every 1.5 years: ~½ as much stored energy per bit!
           • This is what has enabled us to throw away bits (and their energies)
             2× more frequently every 1.5 years, at reasonable power levels!
               – And thereby double processor performance every 1.5 years!
• Increased energy efficiency of computation is a
  prerequisite for improved raw performance!
    – Given realistic levels of total power consumption.
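As a rough cross-check of the two growth figures on this slide (doubling every 18 months, and an average increase of 57%/year), a doubling period can be converted to an annual rate and back. This is only an illustrative sketch; the function names are hypothetical and not from the talk.

```python
import math

def annual_growth(doubling_period_years: float) -> float:
    """Fractional growth per year implied by a fixed doubling period."""
    return 2.0 ** (1.0 / doubling_period_years) - 1.0

def doubling_period(annual_rate: float) -> float:
    """Doubling period in years implied by a fractional annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)

# Doubling every 1.5 years corresponds to a bit under 60% growth per year.
print(f"annual growth for 18-month doubling: {annual_growth(1.5):.1%}")
# 57%/year corresponds to a doubling period slightly over 1.5 years.
print(f"doubling period at 57%/year: {doubling_period(0.57):.2f} years")
```

The two figures agree to within a couple of percentage points, which is about as close as a fitted trend line can be expected to match the rule of thumb.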
                    Efficiency in General,
                    and Energy Efficiency
• The efficiency η of any process is: η = P/C
    – Where P = Amount of some valued product produced
    – and C = Amount of some costly resources consumed
• In energy efficiency η_e, the cost C measures energy.
• We can talk about the energy efficiency of:
    – A heat engine: η_he = W/Q, where:
           • W = work energy output, Q = heat energy input
    – An energy-recovering process: η_er = E_end/E_start, where:
           • E_end = available energy at end of process,
           • E_start = energy input at start of process
    – A computer: η_ec = N_ops/E_cons, where:
           • N_ops = useful operations performed
           • E_cons = free energy consumed

                Trend of Minimum Transistor Switching Energy
[Figure: "ITRS '97-'03 Gate Energy Trends", based on ITRS '97-'03 roadmaps. Log-scale plot of CV²/2 gate energy in joules (1.E-22 to 1.E-14) vs. year (1995-2045). Series: LP min gate energy and HP min gate energy, with points labeled by node number (nm DRAM hp): 250, 180, 130, 90, 65, 45, 32, 22. Reference levels: room-temperature 100 kT reliability limit ("Practical limit for CMOS?"), one electron volt, room-temperature kT thermal energy, and the room-temperature von Neumann-Landauer limit, kT ln 2.]
           Some Lower Bounds on Energy
                   Dissipation
• In today's 90 nm VLSI technology, for minimal operations
  (e.g., conventional switching of a minimum-sized transistor):
     – E_diss,op is on the order of 1 fJ (femtojoule) ⇒ η_ec ≲ 10^15 ops/sec/watt.
           • Will be a bit better in coming technologies (65 nm, maybe 45 nm)
• But conventional digital technologies are subject to several
  lower bounds on their energy dissipation E_diss,op for digital
  transitions (logic / storage / communication operations),
     – And thus, corresponding upper bounds on their energy efficiency.
• Some of the known bounds include:
     – Leakage-based limit for high-performance field-effect transistors:
           • Maybe roughly 5 aJ (attojoules) ⇒ η_ec ≲ 2×10^17 ops/sec/watt
     – Reliability-based limit for all non-energy-recovering technologies:
           • On the order of 1 eV (electron-volt) ⇒ η_ec ≲ 6×10^18 ops/sec/watt
     – von Neumann-Landauer (VNL) bound for all irreversible technologies:
           • Exactly kT ln 2 ≈ 18 meV ⇒ η_ec ≲ 3.5×10^20 ops/sec/watt
               – For systems whose waste heat ultimately winds up in Earth's atmosphere,
                   » i.e., at temperature T ≈ T_room = 300 K.
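Each efficiency bound on this slide is just the reciprocal of the corresponding dissipation per operation, since ops/sec/watt is the same unit as ops/joule. The following sketch recomputes the four figures; the constant values are standard, while the dictionary labels and function name are illustrative choices rather than anything from the talk.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
EV = 1.602176634e-19   # one electron-volt, in joules
T_ROOM = 300.0         # room temperature assumed on the slide, K

def ops_per_joule(e_diss_joules: float) -> float:
    """Upper bound on efficiency (ops/J = ops/sec/watt) for a given dissipation per op."""
    return 1.0 / e_diss_joules

# Dissipation per operation for each bound listed on the slide, in joules.
bounds = {
    "90 nm CMOS (~1 fJ)": 1e-15,
    "FET leakage limit (~5 aJ)": 5e-18,
    "reliability limit (~1 eV)": EV,
    "VNL bound (kT ln 2 at 300 K)": K_B * T_ROOM * math.log(2),
}

for name, e_diss in bounds.items():
    print(f"{name}: {e_diss:.3g} J = {e_diss / EV:.3g} eV "
          f"-> {ops_per_joule(e_diss):.2g} ops/sec/watt")
```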

                 Reliability Bound on Logic
                      Signal Energies
• Let E_sig denote the logic signal energy,
     – The energy involved (transferred, manipulated) in the process of storing,
       transmitting, or transforming a bit's worth of digital information.
           • But note that “involved” does not necessarily mean “dissipated!”
• As a result of fundamental thermodynamic considerations, it is required
  that E_sig ≳ k_B T_sig ln r (with quantum corrections that are small for large r)
     – Where k_B is Boltzmann's constant, 1.38×10^−23 J/K;
     – and T_sig is the temperature in the degrees of freedom carrying the signal;
     – and r is the reliability factor, i.e., the improbability of error, 1/p_err.
• In non-energy-recovering logic technologies (totally dominant today)
     – Basically all of the signal energy is dissipated to heat on each operation.
           • And often additional energy (e.g., short-circuit power) as well.
• In this case, minimum sustainable dissipation is E_diss,op ≳ k_B T_env ln r,
     – Where T_env is now the temperature of the waste-heat reservoir (environment)
           • Averages around 300 K (room temperature) in Earth's atmosphere
• For a decent r of e.g. 2×10^17, this energy is on the order of ~40 kT ≈ 1 eV.
     – Therefore, if we want energy efficiency η_ec > ~1 op/eV, we must recover some
       of the signal energy for later reuse.
           • Rather than dissipating it all to heat with each manipulation of the signal.
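The "~40 kT ≈ 1 eV" figure follows directly from the k_B T ln r bound. A minimal sketch of that arithmetic, assuming T = 300 K and using a hypothetical helper name:

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
EV = 1.602176634e-19   # one electron-volt, in joules

def min_signal_energy(r: float, temperature_k: float = 300.0) -> float:
    """Reliability bound k_B * T * ln(r) on signal energy, in joules."""
    return K_B * temperature_k * math.log(r)

r = 2e17                         # reliability factor used on the slide
e_sig = min_signal_energy(r)

print(f"ln r            = {math.log(r):.1f}  (in units of kT)")
print(f"k_B T ln r      = {e_sig:.3g} J = {e_sig / EV:.2f} eV")
```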

                 The von Neumann-Landauer
                       (VNL) Principle
• First alluded to by John von Neumann in 1949.
     – Developed explicitly by Rolf Landauer of IBM in 1961.
• The principle is a rigorous theorem of physics!
     – It follows from the reversibility of fundamental dynamics.
• A correct statement of the principle is the following:
     – Any process that loses or obliviously erases 1 bit of known
       (correlated) information increases total entropy by at least
                       ∆S = 1 bit = k_B ln 2,
       and thus implies the eventual dissipation of at least
                        E_diss = k_B T_env ln 2
       of free energy to the environment as waste heat.
           • where k_B = log e = 1.38×10^−23 J/K is Boltzmann's constant
           • and T_env = temperature of the waste-heat reservoir (environment)
               – Not less than about room temperature, or 300 K, for earthbound
                 computers. This implies E_diss ≥ 18 meV.


                 Definition of Reversibility
• What does it mean for a dynamical system (either
  continuous or discrete) to be (time-) reversible?
     – Let x(t) denote the state of the system at time t.
           • The universe, or any closed system of interest (e.g. a computer).
     – Let F_t→u(x) be the transition relation operating between a
       given two times t and u; i.e., x(u) = F_t→u[x(t)].
           • Determined by the system's dynamics (laws of physics, or a FSM).
     – Then the system is called “dynamically reversible” iff F_t→u
       is a one-to-one function, for any times (t, u) where u > t.
           • That is, ∀ u > t: ¬∃ x1 ≠ x2: F_t→u(x1) = F_t→u(x2).
               – That is, no two distinct states would ever go to the same state over the
                 course of a given time interval.
     – The definition implies determinism, if we also allow u < t.
           • A reversible system is deterministic in the reverse time direction.
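For a discrete system such as an FSM, the definition above can be checked mechanically: the transition map is reversible exactly when no two distinct states share a successor. The sketch below assumes the dynamics are given as a Python dict from state to next state; the two example machines are made up for illustration.

```python
def is_reversible(transition: dict) -> bool:
    """True iff the deterministic transition map is one-to-one (injective)."""
    successors = list(transition.values())
    return len(set(successors)) == len(successors)

def invert(transition: dict) -> dict:
    """Reverse-time dynamics; only defined when the map is one-to-one."""
    if not is_reversible(transition):
        raise ValueError("two distinct states merge; no inverse exists")
    return {nxt: cur for cur, nxt in transition.items()}

# A 2-bit counter just permutes its four states, so it is reversible.
counter = {s: (s + 1) % 4 for s in range(4)}

# Clearing the low bit sends 0 and 1 (and 2 and 3) to the same state.
clear_low_bit = {s: s & 0b10 for s in range(4)}

print(is_reversible(counter))        # one-to-one
print(is_reversible(clear_low_bit))  # merges states
print(invert(counter))               # deterministic in reverse time
```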

               Types of Dynamics

 • Nondeterministic, irreversible            • Nondeterministic, reversible

 • Deterministic, irreversible               • Deterministic, reversible   (WE ARE HERE)



                   Physics is Reversible!
• All of the successful models of fundamental physics
  are expressible in the Hamiltonian formalism.
     – Including: Classical mechanics, quantum mechanics,
       special and general relativity, quantum field theories.
           • The latter two (GR & QFT) are backed up by enormous,
             overwhelming mountains of evidence confirming their predictions!
              – 11 decimal places of precision so far! And, no contradicting evidence.
• In Hamiltonian systems, the dynamical state x(t)
  obeys a differential equation that's first-order in time,
       dx/dt = g(x)     (where g is some function)
     – This immediately implies determinism of the dynamics.
• And, since the time differential dt can be taken to be
  negative, the formalism also implies reversibility!
     – Thus, dynamical reversibility is one of the most firmly-
       established, fundamental, inviolable facts of physics!
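The "take dt negative" argument can be tried numerically. The sketch below evolves a harmonic oscillator (a simple Hamiltonian system) forward, then runs the same update rule with a negative time step and recovers the initial state. The oscillator, the velocity-Verlet integrator, and all parameter values are assumptions chosen for illustration; they are not taken from the talk.

```python
def step(q: float, p: float, dt: float, k: float = 1.0, m: float = 1.0):
    """One velocity-Verlet step for H = p^2/(2m) + k*q^2/2."""
    p_half = p - 0.5 * dt * k * q        # half kick
    q_new = q + dt * p_half / m          # drift
    p_new = p_half - 0.5 * dt * k * q_new  # half kick
    return q_new, p_new

def evolve(q: float, p: float, dt: float, n_steps: int):
    """Apply the same update rule n_steps times with time step dt."""
    for _ in range(n_steps):
        q, p = step(q, p, dt)
    return q, p

q0, p0 = 1.0, 0.0
q1, p1 = evolve(q0, p0, dt=0.01, n_steps=1000)    # forward in time
q2, p2 = evolve(q1, p1, dt=-0.01, n_steps=1000)   # same rule, negative dt

print(f"after forward run:  q = {q1:.6f}, p = {p1:.6f}")
print(f"after reverse run:  q = {q2:.6f}, p = {p2:.6f}")
```

Velocity-Verlet is used here because each step with −dt algebraically undoes the step with +dt, so the round trip is exact up to floating-point rounding.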
              Illustration of VNL Principle
•   Either digital state is initially encoded by any of N possible physical microstates
     – Illustrated as 4 in this simple example (the real number would usually be much larger)
     – Initial entropy S = log[#microstates] = log 4 = 2 bits.
•   Reversibility of physics ensures the “bit erasure” operation can't possibly merge two
    microstates, so it must double the possible microstates in the digital state!
     – Entropy S = log[#microstates] increases by log 2 = 1 bit = (log e)(ln 2) = kB ln 2.
     – To prevent entropy from accumulating locally, it must be expelled into the environment.

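[Figure: microstates representing logical “0” and logical “1”. Before erasure, each digital state spans 4 microstates, so S = log 4 = 2 bits; after erasure, the single remaining digital state spans 8 microstates, so S′ = log 8 = 3 bits. ∆S = S′ − S = 3 bits − 2 bits = 1 bit.]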
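The microstate counting in this illustration can be written out as a short calculation. This sketch uses the slide's toy numbers (4 microstates per digital state) and converts the resulting 1-bit entropy increase to physical units; T = 300 K is the room-temperature assumption used elsewhere in the talk, and the function name is hypothetical.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
T_ENV = 300.0        # assumed environment temperature, K

def entropy_bits(n_microstates: int) -> float:
    """Entropy S = log2(number of equally likely microstates), in bits."""
    return math.log2(n_microstates)

s_before = entropy_bits(4)       # within one digital state before erasure
s_after = entropy_bits(2 * 4)    # microstates cannot merge, so the count doubles
delta_bits = s_after - s_before

delta_s = delta_bits * K_B * math.log(2)   # entropy increase in J/K
heat = T_ENV * delta_s                     # minimum heat expelled, in joules

print(f"S = {s_before} bits, S' = {s_after} bits, delta S = {delta_bits} bit")
print(f"delta S = {delta_s:.3g} J/K, heat at 300 K = {heat:.3g} J")
```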
                    Reversible Computing
• The basic idea is simply this:
     – Don’t erase information when performing logic / storage /
       communication operations!
           • Instead, just reversibly (invertibly) transform it in place!
• When reversible digital operations are implemented
  using well-designed energy-recovering circuitry,
     – This can result in local energy dissipation E_diss << E_sig,
           • this has already been empirically demonstrated by many groups.
     – and even total energy dissipation E_diss << kT ln 2!
           • This is easily shown in theory & simulations,
               – but we are not yet to the point of demonstrating such low levels of total
                 dissipation empirically in a physical experiment.
           • Achieving this goal requires very careful design,
               – and verifying it requires very sensitive measurement equipment.


           How Reversible Logic Avoids the
            von Neumann-Landauer Bound
• We arrange our logical manipulations to never
  attempt to merge two distinct digital states,
     – but only to reversibly transform them from
       one state to another!
• E.g., illustrated is a reversible operation
  cCLR (controlled CLR)
     – It and its inverse cSET enable arbitrary logic!
[Figure: state-transition diagram over the four digital states logic 00, logic 01, logic 10, logic 11.]
           A Few Highlights Of Reversible
                Computing History
• Charles Bennett @ IBM, 1973-1989:
     – Reversible Turing machines & emulation algorithms
           • Can emulate irreversible machines on reversible architectures.
               – But, the emulation introduces some inefficiencies
     – Models of chemical & Brownian-motion physical realizations.
• Fredkin and Toffoli's group @ MIT, late 1970s/early 1980s
     – Reversible logic gates and networks (space/time diagrams)
     – Ballistic and adiabatic circuit implementation proposals
• Groups @ Caltech, ISI, Amherst, Xerox, MIT, '85-'95:
     – Concepts for & implementations of adiabatic circuits in VLSI tech.
     – Small explosion of adiabatic circuit literature since then!
• Mid 1990s-today:
     – Better understanding of overheads, tradeoffs, asymptotic scaling
     – A few groups begin development of post-CMOS implementations
           • Most notably, the Quantum-dot Cellular Automata group at Notre Dame
                                   Caveat #1
• Technically, avoiding the VNL bound doesn't actually require that the digital
  operation be reversible at the level of the logical states…
     – It can be logically irreversible if the information in the digital state is already entropy!
           • In the example illustrated, the non-digital entropy doesn't change, because the operation is also
             nondeterministic (N to N), and the transition relation between logical states has semi-detailed
             balance, so the entropy in the digital state remains constant.
• However, such operations just re-randomize bits that are already random!
     – It's not clear if this kind of operation is computationally useful.
[Figure: a digital bit with unknown value (0 or 1) passes through physical dynamics whose precise details may be uncertain, ending in state 0 or 1.]
                                   Caveat #2
• Operations that are logically N-to-1 can be used, if
  there are sufficient compensating 1-to-N
  (nondeterministic) logical operations.
     – All that is really required is that the logical dynamics be
       1-to-1 in the long-term average.
           • Thus, it's possible to thermally generate random bits and discard
             them later when we are through with them.
               – While maintaining overall thermodynamic reversibility.
           • This ability is useful for probabilistic (randomized) algorithms.




[Figure: diagram of the logic 0 and logic 1 states.]
               Reversibility and Reliability
• A widespread myth: “Future low-level digital
  devices will necessarily be highly unreliable.”
     – This comes from a flawed line of reasoning:
           • Faster ⇒ more energy efficient ⇒ lower bit energies ⇒ high rate of
             bit errors from thermal noise
     – However, this scaling strategy doesn't work, because:
           • High rate of thermal errors ⇒ high power dissipation from error
             correction ⇒ less energy efficient ⇒ ultimately slower!
• But in contrast, using reversible computing, we can
  achieve arbitrarily high energy efficiency while also
  maintaining arbitrarily high reliability!
     – The key is to keep bit energies reasonably high!
           • While recovering most of the bit energy…

           Minimizing Energy Dissipation
               Due to Thermal Errors
• Let p_err = 1/r be the bit-error probability per operation.
     – Where r quantifies the “reliability level.”
     – And p_ok = 1 − p_err is the probability the bit is correct
• The necessary entropy increase ∆S per op due to error occurrence is
  given by the (binary) Shannon entropy of the bit-value after the operation:
        H(p_err) = p_err log(1/p_err) + p_ok log(1/p_ok).
• For r >> 1 (i.e., as r → ∞), this increase approaches 0:
        ∆S = H(p_err) ≈ p_err log(1/p_err) = (log r)/r → 0
• Thus, the required energy dissipation per op also approaches 0:
        E_diss = T∆S ≈ (kT ln r)/r → 0
• Could get the same result by assuming the signal energy E_sig = kT ln r
  required for reliability level r is dissipated each time an error occurs:
        E_diss = p_err E_sig = p_err (kT ln r) = (kT ln r)/r → 0 as r → ∞.
• Further, note that as r → ∞, the required signal energy grows only very
  slowly…
     – Specifically, only logarithmically in the reliability, i.e., E_sig = Θ(log r).
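To see how quickly the error-related dissipation falls while the signal energy barely grows, the formulas above can be evaluated for a few reliability levels. This is a sketch under the slide's assumptions (T = 300 K, independent bit errors); the function names and the sample values of r are illustrative.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
T_ENV = 300.0        # assumed environment temperature, K

def binary_entropy_nats(p_err: float) -> float:
    """Shannon entropy H(p_err) of one bit, in nats."""
    p_ok = 1.0 - p_err
    # log1p keeps the p_ok term accurate when p_err is tiny.
    return -p_err * math.log(p_err) - p_ok * math.log1p(-p_err)

def dissipation_exact(r: float) -> float:
    """E_diss = T * dS with dS = H(1/r), in joules per operation."""
    return K_B * T_ENV * binary_entropy_nats(1.0 / r)

def dissipation_approx(r: float) -> float:
    """Large-r approximation E_diss ~ (kT ln r) / r, in joules per operation."""
    return K_B * T_ENV * math.log(r) / r

def signal_energy(r: float) -> float:
    """Signal energy E_sig = kT ln r needed for reliability level r, in joules."""
    return K_B * T_ENV * math.log(r)

for r in (1e3, 1e6, 1e12, 2e17):
    print(f"r = {r:.0e}: E_sig = {signal_energy(r):.2e} J, "
          f"E_diss exact = {dissipation_exact(r):.2e} J, "
          f"approx = {dissipation_approx(r):.2e} J")
```

The exact value sits slightly above the approximation because the dropped p_ok term contributes about p_err nats, a relative correction of roughly 1/ln r.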

           Device-Level Requirements for
               Reversible Computing
• A good reversible device technology should have:
     – Low manufacturing cost ¢_d per device
           • Important for good overall (system-level) cost-efficiency
     – Low rate of static power dissipation P_leak due to energy leakage.
           • Required for energy-efficient storage especially (but also in logic)
     – Low energy coefficient c_E = E_diss/f (energy dissipated per operation, per unit
       transition frequency) for adiabatic transitions.
           • Implies we can achieve a high operating frequency (and thus good
             cost-performance) at a given level of energy efficiency.
     – High maximum available transition frequency f_max.
           • Important for those applications in which the latency of serial threads of computation
             dominates total cost
• Important: For system-level energy efficiency, P_leak and c_E must be taken
  as effective global values measuring the implied amount of energy emitted
  into the outside environment at temperature T_env.
     – With an ideal (Carnot) refrigerator, P_leak = S_t T_env and c_E = c_S T_env,
           • Where S_t = the static rate of leakage entropy generation per unit time,
           • and c_S = S_gen/f is the adiabatic entropy coefficient, or entropy generated per unit
             transition frequency.


             Early Chemical Implementations
• How to physically implement reversible logic?
     – Bennett's original inspiration: DNA polymerization!
           • Reversible copying of a DNA strand
               – Molecular basis of cell division / organism reproduction
           • This reaction, like all chemical reactions, is reversible…
               – Direction (forward vs. backward) & reaction rate depend on the relative
                 concentrations of reagent and product species, which affect the free energy
           • Energy dissipated per step turns out to be proportional to speed.
               – Implies process is characterized by an energy-time constant.
                   » I call this the “energy coefficient” c_Et ≡ E_diss,op · t_op = E_diss,op/f_op.
           • For DNA, typical figures are 40 kT ≈ 1 eV @ ~1,000 bp/s
               – Thus, the energy coefficient c_E is about 1 eV/kHz.

• Can we achieve better energy coefficients?
     – Yes, in fact, we had already beaten DNA's c_E in reversible
       CMOS VLSI technology available circa 1995!
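The DNA figure can be reproduced in a couple of lines, and the same coefficient then predicts the dissipation at other speeds under the slide's statement that energy per step is proportional to speed. A minimal sketch with hypothetical function names; the 10 bp/s example rate is made up for illustration.

```python
EV = 1.602176634e-19   # one electron-volt, in joules

def energy_coefficient(e_diss_per_op: float, f_op: float) -> float:
    """c_E = E_diss,op / f_op (energy dissipated per op, per unit frequency)."""
    return e_diss_per_op / f_op

def dissipation_at(c_e: float, f_op: float) -> float:
    """Energy per op at rate f_op, assuming dissipation scales linearly with speed."""
    return c_e * f_op

# DNA polymerization: about 1 eV per step at about 1,000 base pairs per second.
c_dna = energy_coefficient(1.0, 1e3)   # in eV per Hz, i.e. eV*s

print(f"c_E (DNA) = {c_dna * 1e3:.1f} eV/kHz = {c_dna * EV:.2e} J*s")
# Running 100x slower would dissipate 100x less energy per step.
print(f"E_diss at 10 bp/s = {dissipation_at(c_dna, 10.0):.3f} eV per step")
```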
           Energy & Entropy Coefficients
                  in Electronics
[Figure: a charge Q transferred along a path with resistance R.]
• For a transition involving the adiabatic transfer
  of an amount Q of charge along a path with
  resistance R:
     – The raw (local) energy coefficient is given by
       c_Et = E_diss·t = P_diss·t^2 = I·V·t^2 = I^2·R·t^2 = Q^2·R.
           • Where V is the voltage drop along the path.
     – The entropy coefficient c_St = Q^2·R/T_path.
           • where T_path is the local thermodynamic temperature in
             the path.
     – The effective (global) energy coefficient is
        c_Et,eff = Q^2·R·(T_env/T_path).
           • We pay a penalty for low-T operation!
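The chain of equalities for the energy coefficient can be spelled out step by step. The derivation below assumes, as an idealization not stated on the slide, that the charge Q flows as a constant current over the transition time t, and that Ohm's law holds along the path.

```latex
\begin{align*}
I &= \frac{Q}{t}, \qquad V = I R
  && \text{(constant current; Ohm's law)} \\
E_{\mathrm{diss}} &= P_{\mathrm{diss}}\, t = I V t = I^{2} R\, t = \frac{Q^{2} R}{t}
  && \text{(dissipation falls as the transition slows)} \\
c_{Et} &= E_{\mathrm{diss}}\, t = Q^{2} R
  && \text{(independent of } t\text{)} \\
c_{St} &= \frac{c_{Et}}{T_{\mathrm{path}}} = \frac{Q^{2} R}{T_{\mathrm{path}}},
  \qquad
  c_{Et,\mathrm{eff}} = c_{St}\, T_{\mathrm{env}}
  = Q^{2} R\, \frac{T_{\mathrm{env}}}{T_{\mathrm{path}}}
\end{align*}
```

The last line shows where the low-temperature penalty comes from: entropy generated at T_path must eventually be rejected at T_env, so cooling the path below the environment multiplies the effective coefficient by T_env/T_path > 1.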
                 Example of Electronic cEt
• In a fairly recent (180 nm) CMOS VLSI technology:
     – Energy stored per min. sized transistor gate: ~1 fJ @ 2 V
           • Corresponds to charge per gate of Q = 1 fC ≈ 6,000 electrons
     – Resistance per turned-on min-sized nFET of ~14 kΩ
           • Order of the quantum resistance R = R_0 = 1/G_0 = h/(2q^2) = 12.9 kΩ
     – Ideal energy coefficient for a single-gate transition
       ~1.4×10^−26 J/Hz
           • Or in more convenient units, ~80 eV/GHz = 0.08 eV/MHz!
     – with some expected overheads for a simple test circuit,
       calculated energy coefficient comes out to about 8× higher,
       or ~10^−25 J·s
           • Or ~600 eV/GHz = 0.6 eV/MHz.
     – Detailed Cadence simulations gave us, per transistor:
           • @ 1 GHz: P = 20 μW, E = 20 fJ = 1.2 keV, so E_c = 1.2 eV/MHz
           • @ 1 MHz: P = 0.35 pW, E = 0.35 aJ = 2.2 eV, so E_c = 2.1 eV/MHz
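The ideal single-gate figures on this slide follow from Q²R and a few unit conversions. The sketch below redoes that arithmetic with the slide's round inputs (Q = 1 fC, R = 14 kΩ); the variable names are illustrative, and the results should be read as order-of-magnitude checks against the rounded values quoted above.

```python
H_PLANCK = 6.62607015e-34   # Planck constant, J*s
Q_E = 1.602176634e-19       # elementary charge, C (also joules per eV)

charge = 1e-15        # charge per gate from the slide, C (1 fC)
resistance = 14e3     # on-resistance of a min-sized nFET from the slide, ohms

electrons = charge / Q_E                     # number of electrons in 1 fC
r_quantum = H_PLANCK / (2.0 * Q_E ** 2)      # quantum resistance h/(2q^2), ohms
c_et = charge ** 2 * resistance              # ideal energy coefficient Q^2 R, J*s
c_et_ev_per_ghz = c_et / Q_E * 1e9           # same coefficient in eV per GHz

print(f"electrons per gate : {electrons:.0f}")
print(f"quantum resistance : {r_quantum / 1e3:.1f} kOhm")
print(f"ideal c_Et         : {c_et:.2e} J*s = {c_et_ev_per_ghz:.0f} eV/GHz")
```

With these inputs the coefficient comes out somewhat under 90 eV/GHz, the same order as the ~80 eV/GHz quoted on the slide.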

                  Cadence Simulation Results
[Figure: "Power vs. freq., TSMC 0.18, Std. CMOS vs. 2LAL" (2LAL = Two-level adiabatic logic). Log-log plot of average power dissipation per nFET in W (1.E-14 to 1.E-05) and energy dissipated per nFET per cycle vs. frequency in Hz (1.E+09 down to 1.E+03); one curve is labeled "Standard CMOS".]
• Graph shows power dissipation vs. frequency
     – in a shift register.
• At moderate frequencies (1 MHz),
     – Reversible uses < 1/100th the power of irreversible!
• At ultra-low power (1 pW/transistor)
     – Reversible is 100× faster than irreversible!
• Minimum energy dissipation < 1 eV!
     – 500× lower than best irreversible!
           • 500× higher computational energy efficiency!
• Energy transferred is still ~10 fJ (~100 keV)
     – So, energy recovery efficiency is 99.999%!
           • Not including losses in power supply
              A Useful Two-Bit Primitive:
             Controlled-SET or cSET(a,b)
• Semantics: If a=1, then set b:=1.
    – Conditionally reversible, if the special precondition ab=0 is met.
           • Note it’s 1-to-1 on the subset of states used
               – Sufficient to avoid Landauer’s principle!
• We can implement cSET in dual-rail CMOS with a pair of transmission gates
    – Each needs just 2 transistors,
           • plus one controlling “drive” signal
• This 2-bit semi-reversible operation & its inverse cCLR are universal for
  reversible (and irreversible) logic!
    – If we compose them in special ways.
           • And include latches for sequential logic.

Truth table (states satisfying ab=0):
     a   b   |   a’  b’
     0   0   |   0   0
     0   1   |   0   1
     1   0   |   1   1

[Figure: cSET hardware diagram; labels: drive (0→1), a, switch (T-gate), b]
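The truth table above can be checked mechanically. The following is a minimal sketch, not the talk's own code: the function names `cset` and `cclr` are illustrative, and the semantics of cCLR ("if a=1, then set b:=0") are an assumption based on the slide calling it the inverse of cSET.

```python
from itertools import product

def cset(a, b):
    """Controlled-SET: if a == 1, set b := 1. Returns (a', b')."""
    return a, (1 if a == 1 else b)

def cclr(a, b):
    """Controlled-CLR (assumed semantics): if a == 1, set b := 0."""
    return a, (0 if a == 1 else b)

# States permitted by the precondition ab = 0.
allowed = [(a, b) for a, b in product((0, 1), repeat=2) if a * b == 0]

# Truth table restricted to the allowed states.
table = {state: cset(*state) for state in allowed}

# One-to-one on the allowed subset: no two inputs share an output.
injective_on_allowed = len(set(table.values())) == len(table)

# Over all four states the map is NOT one-to-one: (1,0) and (1,1) collide.
all_states = list(product((0, 1), repeat=2))
injective_everywhere = len({cset(*s) for s in all_states}) == len(all_states)

# cCLR undoes cSET on every allowed state.
round_trip_ok = all(cclr(*cset(*s)) == s for s in allowed)

if __name__ == "__main__":
    for state, result in table.items():
        print(state, "->", result)
    print("1-to-1 on allowed states:", injective_on_allowed)
    print("1-to-1 on all states:", injective_everywhere)
```

The contrast between the two injectivity checks is the point of the precondition: the operation merges states only if the excluded input (1,1) is allowed to occur.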

           Reversible OR (rOR) from cSET
• Semantics: rOR(a,b) ::= if a|b, c:=1.
     – Set c:=1, on the condition that either a or b is 1.
           • Reversible under precondition that initially a|b → ~c.
• Two parallel cSETs simultaneously driving a shared output line
  implement the rOR operation!
     – This type of gate composition was not traditionally considered.
• Similarly one can do rAND, and reversible versions of all operations.
     – Logic synthesis with these is extremely straightforward…
[Figure: Hardware diagram (inputs a, b; shared output c) and Spacetime diagram (a → a’, b → b’, c: 0 → a OR b = c’)]
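As a rough illustration of the composition described above, rOR can be modeled as two cSET operations writing to the same output bit. This is a logical sketch only: the two cSETs act simultaneously in hardware, whereas the hypothetical `r_or` helper below applies them one after the other, which gives the same result because each can only set c.

```python
from itertools import product

def cset(ctrl, target):
    """Controlled-SET on one target bit: if ctrl == 1, target := 1."""
    return 1 if ctrl == 1 else target

def r_or(a, b, c):
    """rOR modeled as two cSETs (controls a and b) driving the shared line c."""
    c = cset(a, c)
    c = cset(b, c)
    return a, b, c

# Precondition from the slide: (a|b) -> ~c, i.e. c is clear whenever a or b is 1.
allowed = [(a, b, c) for a, b, c in product((0, 1), repeat=3)
           if not ((a | b) and c)]

outputs = [r_or(*s) for s in allowed]

# Reversible under the precondition: distinct inputs give distinct outputs.
injective_on_allowed = len(set(outputs)) == len(allowed)

# With c initially clear, the output line carries a OR b.
computes_or = all(r_or(a, b, 0)[2] == (a | b)
                  for a, b in product((0, 1), repeat=2))
```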

                    CMOS Gate Implementing
                      rLatch / rUnLatch
• Symmetric Reversible Latch
[Figure panels: Implementation (signals: connect, in, mem), Icon (in, mem, connect), Spacetime Diagram (crLatch, crUnLatch; signals: in, (in), mem)]

• The hardware is just a CMOS transmission gate again
      • This time controlled by a clock, with the data signal driving
• Concise, symmetric hardware icon – Just a short orthogonal line
• Thin strapping lines denote connection in spacetime diagram.

                        Example:
                Building cNOT from rlXOR
• rlXOR(a,b,c): Reversible latched XOR.
     – Semantics: c := ab.
           • Reversible under precondition that c is initially clear.
• cNOT(a,b): Controlled-NOT operation.
     – Semantics: b := ab. (No preconditions.)
           • A classic “primitive” operation in reversible & quantum computing
     – But, it turns out to be fairly complex to implement cNOT in
       available fully adiabatic hardware technologies…
           • Thus, it’s really not a very good building block for practical
             reversible hardware designs!
     – Of course, we can still build it, if we really want to.
           • Since, as I said, our gate set is universal for reversible logic

                  cNOT from rlXOR:
                  Hardware Diagram
• A logic block providing an in-place cNOT
  operation (a cNOT “gate”) can be constructed
  from 2 rlXOR gates and two latched buffers.
[Figure: hardware diagram; labels: A, B, X, Reversible latches]
• The key is:
     – Operate some of the gates in reverse!
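One way such a construction could be sequenced is sketched below. This is a reconstruction under stated assumptions, not the circuit from the slide's diagram: the step ordering, the intermediate bit `x`, and all function names are hypothetical. It uses two rlXOR operations (one forward, one in reverse) and two latched-buffer operations (one forward, one in reverse), and the `assert` statements inside each function stand in for the reversibility preconditions.

```python
from itertools import product

def rlxor(a, b, c):
    """Forward rlXOR: c := a XOR b. Precondition: c is initially clear."""
    assert c == 0, "rlXOR needs a cleared output"
    return a ^ b

def rlxor_reverse(a, b, c):
    """rlXOR run in reverse: clears c. Precondition: c == a XOR b."""
    assert c == a ^ b, "reverse rlXOR needs c == a XOR b"
    return 0

def latch(src, dst):
    """Latched buffer, forward: dst := src. Precondition: dst is clear."""
    assert dst == 0, "latch needs a cleared destination"
    return src

def unlatch(src, dst):
    """Latched buffer in reverse: clears dst. Precondition: dst == src."""
    assert dst == src, "unlatch needs dst == src"
    return 0

def cnot_in_place(a, b):
    """In-place cNOT: returns (a, a XOR b), leaving the scratch bit clear."""
    x = 0                        # scratch bit, initially clear
    x = rlxor(a, b, x)           # step 1: X := A xor B
    b = rlxor_reverse(a, x, b)   # step 2: B == A xor X, so B can be cleared
    b = latch(x, b)              # step 3: copy X into the cleared B
    x = unlatch(b, x)            # step 4: X == B, so X can be cleared
    assert x == 0
    return a, b

results = {(a, b): cnot_in_place(a, b) for a, b in product((0, 1), repeat=2)}
```

Steps 2 and 4 are the reverse-direction operations: each erases a bit only when another copy of the same information still exists, so no information is discarded.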
                        Θ(log n)-time carry-skip adder
 (8 bit segment shown)
• With this structure, we can do a 2^n-bit add in 2(n+1) logic levels
     → 4(n+1) reversible ticks → n+1 clock cycles.
• Hardware overhead is < 2× regular ripple-carry!
• Spacetime overhead only ~2(n+1)× a conventional single-cycle equivalent.
[Figure: 8-bit adder segment; bit-cell labels: S, A, B, G, P, Cin, Cout; carry-skip block labels: Pms, Gls, Pls, MS, LS, G, P, GCout, Cin; 2nd, 3rd, and 4th carry ticks marked]
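The timing figures quoted for the carry-skip adder can be tabulated for specific widths. The sketch below assumes the adder width is read as 2^n bits (consistent with the Θ(log n) title and with the 8-bit segment showing four carry ticks); the function name and the tick ratios are taken from the slide's own arithmetic, not from any circuit detail.

```python
def carry_skip_timing(bits):
    """Return (logic_levels, reversible_ticks, clock_cycles) for a 2^n-bit add."""
    if bits < 1:
        raise ValueError("bits must be a positive power of two")
    n = bits.bit_length() - 1
    if (1 << n) != bits:
        raise ValueError("bits must be a positive power of two")
    logic_levels = 2 * (n + 1)
    reversible_ticks = 2 * logic_levels      # 4(n+1), per the slide
    clock_cycles = reversible_ticks // 4     # n+1, per the slide
    return logic_levels, reversible_ticks, clock_cycles

if __name__ == "__main__":
    for width in (8, 32, 64):
        print(width, carry_skip_timing(width))
```

For the 8-bit segment shown (n = 3) this gives 8 logic levels, 16 reversible ticks, and 4 clock cycles; a 32-bit adder (n = 5) gives 12, 24, and 6.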
                          32-bit Adder Simulation Results
[Figure, left panel: 32-bit adder power vs. frequency. y-axis: Power (W), 1.E-10 to 1.E-04; x-axis: Add Frequency (Hz), 1.E+08 down to 1.E+04; series: CMOS pwr, Adia. pwr; annotation: 20x better perf. @ 3 nW/adder]
[Figure, right panel: 32-bit adder energy vs. frequency. y-axis: Energy/Add (J), 1.E-15 to 1.E-11; x-axis: Add Frequency (Hz), 1.E+08 down to 1.E+04; series: CMOS energy, Adia. energy; annotations: 1V CMOS, 0.5V CMOS]
(All results here are normalized to a throughput level of 1 add/cycle)
                Technological Challenges
• Fundamental theoretical challenges:
     – Find more efficient reversible algorithms
           • Or, prove rigorous lower bounds on complexity overheads
     – Study fundamental physical limits of reversible computing
• Implementation challenges:
     – Design new devices with lower energy coefficients c_Et
     – Design high-quality resonators for driving transitions
     – Empirically demonstrate large system-level power savings
• Application development challenges:
     – Find a plausible near- to medium-term “killer app” for RC
           • Something that’s very valuable, and can’t be done without it
     – Build a prototype RC-based solution


      Plenty of Room for Device Improvement
• Recall, irreversible device technology has at most ~3-4 orders of
  magnitude of power-performance improvements remaining.
     – And then, the firm kT ln 2 (VNL) limit is encountered.
• But, a wide variety of proposed reversible device technologies have
  been analyzed by physicists.
     – With theoretical power-performance up to 10-12 orders of magnitude
       better than today’s CMOS!
           • Ultimate limits are unclear.
[Figure: Power vs. freq., alt. device techs. Panel title: Power per device, vs. frequency. y-axis: Power per device (W), 1.E-31 to 1.E-03; x-axis: Frequency (Hz), 1.E+12 down to 1.E+03; series: .18um 2LAL, nSQUID, QCA cell, Quantum FET, Rod logic, Param. quantron, Helical logic, .18um CMOS, kT ln 2; annotation: Various reversible device proposals]
                   Limiting Cases of
              Energy/Entropy Coefficients
• Energy/entropy coefficients in adiabatic “single electronics:”
     – Suppose the amount of charge moved |Q| = q (a single electron)
     – Let the path consist of a single quantum channel (chain of states)
           • Has quantum resistance R = R_0 = 1/G_0 = h/(2q²) = 12.9 kΩ.
     – Then c_E = h/2 = 2.07 meV/THz (very low!)
           • If path is at T_path = T_room = 300 K, then c_S = 0.08 k/THz.
     – For N× better efficiency than this, let the path consist of N parallel
       quantum channels. → N× lower resistance.
• What about systems where resistive models may not apply?
     – E.g., superconductors, photonics, etc.
• A more general and rigorous (but perhaps loose) lower bound
  on the energy coefficient in all adiabatic quantum systems is
  given by the expression c_E ≥ h²/(4 E_g t),
     – where E_g = energy gap between ground & excited states,
     – and t = time taken for a single orthogonalizing transition
     – Ex.: Let E_g = 1 eV, t = 1 ps. Then c_E ≥ 4.28 μeV/THz.
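The numbers on this slide can be reproduced from the SI values of the constants. The sketch below assumes the standard adiabatic-charging relation c_E = Q²R for the energy coefficient (dissipation ≈ Q²R/t for a slow transition), which with Q = q and R = h/(2q²) gives h/2; the entropy coefficient is taken as c_S = c_E/T. Variable names are illustrative.

```python
# SI defining constants (exact by definition since the 2019 SI revision).
H = 6.62607015e-34       # Planck constant, J*s
Q_E = 1.602176634e-19    # elementary charge, C
K_B = 1.380649e-23       # Boltzmann constant, J/K

# Quantum resistance of a single channel: R_0 = h / (2 q^2), in ohms.
r0_ohm = H / (2 * Q_E**2)

# Energy coefficient for moving one electron through R_0 (assumed c_E = Q^2 R).
c_e = Q_E**2 * r0_ohm                       # J*s, equals h/2

# J*s is J/Hz; multiply by 1e12 for J/THz, divide by q for eV, scale to meV.
c_e_mev_per_thz = c_e * 1e12 / Q_E * 1e3

# Entropy coefficient at 300 K, in units of k per THz.
t_path_kelvin = 300.0
c_s_k_per_thz = c_e * 1e12 / (K_B * t_path_kelvin)

# General lower bound c_E >= h^2 / (4 E_g t) for E_g = 1 eV, t = 1 ps.
e_gap = 1.0 * Q_E                           # 1 eV in joules
t_transition = 1e-12                        # 1 ps in seconds
c_e_bound = H**2 / (4 * e_gap * t_transition)
c_e_bound_uev_per_thz = c_e_bound * 1e12 / Q_E * 1e6

if __name__ == "__main__":
    print(f"R_0        = {r0_ohm / 1e3:.1f} kOhm")
    print(f"c_E        = {c_e_mev_per_thz:.2f} meV/THz")
    print(f"c_S(300 K) = {c_s_k_per_thz:.2f} k/THz")
    print(f"c_E bound  = {c_e_bound_uev_per_thz:.2f} ueV/THz")
```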

               Requirements for Energy-
            Recovering Clock/Power Supplies
• All known reversible computing schemes require a periodic
  global signal that synchronizes and drives adiabatic
  transitions.
     – For good system-level energy efficiency, this signal must oscillate
       resonantly and near-ballistically, with a high effective quality factor.
• Several factors make the design of a satisfactory resonator
  quite difficult:
     – Need to avoid uncompensated back-action of logic on resonator
     – In some resonators, Q factor may scale unfavorably with size
     – Effective quality factor problem
• There’s no reason to think that it’s impossible to do…
     – But it is definitely a nontrivial hurdle, that we need to face up to, pretty
       urgently…
           • If we want to make reversible computing practical in time to avoid an
             extended period of stagnation in computer performance growth.

                  The Back-Action Problem
• The ideal resonator signal is a pure periodic signal.
     – A pretty general result from communications theory:
           • A resonator’s quality factor is inversely proportional to its signal bandwidth B.
     – E.g., for an EM cavity w. resonant frequency ω_0,
           • the half-maximum BW is B = ∆ω = ω_0/(2πQ) [1].
     – Thus Q → ∞ ⇒ B → 0.
           • There must be little or no information in the resonator signal!
• However, if the logic load being driven varies from one cycle to the next,
     – whether due to data-dependent variations,
     – or structural variations (different amounts of logic being driven per cycle)
• this will tend to produce impedance nonuniformities, which will lead to
  nonuniform reflections of the resonator signal
     – and thereby introduce nonzero bandwidth into that signal.
• Even more generally, any departure of resonator energy away from its
  ideal desired trajectory represents a form of effective energy dissipation!
     – we must control exactly where (into what states) all of the energy goes!
           • the set of possible microstates of the system must not grow quickly


               [1] Schwartz, Principles of Electrodynamics, Dover, 1972.

            Unfavorable Scaling of Resonator
                Quality Factor with Size?
• I don’t yet have a perfectly clear and general understanding of
  this issue, but…
     – In a lot of oscillating systems I’ve looked at, the resonant Q factor may
       tend to get worse (or at least, not very much better) as the resonator
       dimensions get smaller.
           • E.g., in LC oscillators, inductor Q scales inversely to frequency
               – EM emission is greater at high frequencies
               – But, the tendency is for low f → large coil sizes, not small!
           • Anecdotal reports from people working in NEMS community…
               – It can be difficult to get high Q in nanoscale electromechanical resonators
                     » Perhaps due to present difficulty of precision engineering at nanoscale?
           • Our own experience working with transmission-line resonators
• Example: In a cubical EM cavity of length L,
     – We have 2πQ = L/(8δ), where δ = skin depth. ([1] again)
           • Skin depth δ = (2πσk)^(−1/2), where σ = wall conductivity, k = wave #.
               – So if L is fixed, high Q → small δ → large k → high f → low Q in logic!


              The Effective Quality Factor
                       Problem
• Actual quality factor of resonator Q = E_res/E_dissr.
     – Where E_res = energy contained in resonator signal
     – and E_dissr = energy dissipated in resonator per cycle.
• But the effective quality factor, for purposes of doing
  energy-efficient logic transitions, is Q_eff = E_deliv/E_dissr.
     – Where E_deliv = energy delivered to the logic per transition.
           • Since 1/Q_eff of the logic signal energy is dissipated per cycle.
• Thus, Q_eff = Q · (E_deliv/E_res).
     – That is, the effective Q is taken down by the fraction of
       resonator energy delivered to the logic per cycle.
• If a resonator needs to be large to attain high Q,
     – it may also hold a large amount of energy E_res,
           • and so it may not have a very high effective Q for driving the logic!
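A short numerical illustration of the relation above may help. The energies used here are made-up values chosen only to show the effect, and the function name is hypothetical; only the two ratio definitions come from the slide.

```python
import math

def quality_factors(e_res, e_deliv, e_dissr):
    """Return (Q, Q_eff) from resonator, delivered, and dissipated energies (J)."""
    q = e_res / e_dissr          # actual quality factor
    q_eff = e_deliv / e_dissr    # effective quality factor seen by the logic
    return q, q_eff

# Hypothetical example: a resonator storing 1 nJ that delivers 10 pJ to the
# logic per transition and dissipates 1 pJ per cycle.
e_res, e_deliv, e_dissr = 1e-9, 1e-11, 1e-12
q, q_eff = quality_factors(e_res, e_deliv, e_dissr)

# Consistency check with Q_eff = Q * (E_deliv / E_res).
identity_holds = math.isclose(q_eff, q * (e_deliv / e_res))

if __name__ == "__main__":
    print(f"Q = {q:.0f}, Q_eff = {q_eff:.0f}")
```

In this example the resonator's own Q is about 1000, but because only 1% of its stored energy reaches the logic, the effective Q for the logic transitions is about 10.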

             (PATENT PENDING, UNIVERSITY OF FLORIDA)

            Trapezoidal Resonator Concept
[Figure: trapezoidal resonator concept, drawn with x, y, z axes. Labels: moving metal plate support arm/electrode (arm anchored to nodal points of fixed-fixed beam flexures, located a little ways away, in both directions, for symmetry); moving plate and its range of motion; phase 0° electrode and phase 180° electrode, each with a capacitance curve C(θ) over θ = 0° to 360°. Note: repeat interdigitated structure arbitrarily many times along y axis, all anchored to the same flexure.]

       Previous CMOS-MEMS Resonators
       in post-CMOS DRIE process (in use at UF)
[Figure: front-side view and back-side view of 150 kHz resonators; labeled parts: serpentine spring, proof mass, comb drive]
           PATENT PENDING, UNIVERSITY OF FLORIDA
                     Resonator Schematic
[Figure: resonator schematic; labels: Actuator, Sensor, Vc, vac, Vb, capacitances Ca, Cs, Cr, and Vp (given on the slide in terms of Vc and Vb)]
           PATENT PENDING, UNIVERSITY OF FLORIDA
                 Post-TSMC35 AdiaMEMS Resonator
                              (Coventorware model)

     Taped out April ’04
[Figure: Coventorware model of the resonator; labeled parts: drive comb, sense comb, flex arm]

                Quasi-Trapezoidal MEMS
             Resonator: 1st Fabbed Prototype
• Post-etch process is still being fine-tuned.
     – Parts are not yet ready for testing…
[Figure: first fabricated prototype; labeled parts: primary flexure (fin), sense comb, drive comb]

             PATENT PENDING, UNIVERSITY OF FLORIDA
                          Conclusions
• Reversible computing will become necessary
  within our lifetimes,
     – if we wish to continue progress in computing
       performance/power beyond the next 1-2 decades.
• Much progress in our understanding of RC
  has been made in the past three decades…
     – But much important work still remains to be done.
• I encourage my audience to join the
  community of researchers who are working to
  address the reversible computing challenge.
