Approaching the Physical Limits of Computing


									               Michael P. Frank


Approaching the Physical Limits of Computing

  Invited talk, presented at ISMVL 2005
 35th Int’l Symp. on Multiple-Valued Logic
    (Sponsor: IEEE Computer Society)
             Friday, May 20, 2005
                           Abstract of Talk
• Fundamental physics limits the performance of
  conventional computing technologies.
     – The energy efficiency of conventional machines will be
       forced to level off in roughly the next 10-20 years.
           • Practical computer performance will then plateau as well.
• However, all of the proven limits to computer energy
  efficiency can, in principle, be circumvented…
     – but only if computing undergoes a radical paradigm shift.
• The essential new paradigm: Reversible computing.
     – It involves reusing energy to improve energy efficiency.
           • However, doing this well tightly constrains computer design at all
             levels from devices through logic, architectures, and algorithms.
• In this talk, I review the stringent physical and logical
  requirements that must be met,
     – if we wish to break through the near-term barriers,
           • and approach the true physical limits of computing.
9/4/2010             M. Frank, "Approaching the Physical Limits of Computing"     2
            Moore’s Law and Performance
• Gordon Moore, 1975:
    – Devices per IC can be
      doubled every 18 months
           • Borne out by history!
[Chart: "Moore's Law – Transistors per Chip," devices per IC (100 to
1,000,000,000) vs. year of introduction (1950–2010), from the 4004 through
the 486DX, Pentium, P4, and Itanium 2 (Madison); average increase of 57%/year.]
• Some fortuitous corollaries:
    – Every 3 years: Devices ½ as long
    – Every 1.5 years: ~½ as much stored energy per bit!
           • This is what has enabled us to throw away bits (and their energies)
             2× more frequently every 1.5 years, at reasonable power levels!
               – And thereby double processor performance every 1.5 years!
• Increased energy efficiency of computation is a
  prerequisite for improved raw performance!
    – Given realistic levels of total power consumption.
                    Efficiency in General,
                    and Energy Efficiency
• The efficiency η of any process is: η = P/C
    – Where P = Amount of some valued product produced
    – and C = Amount of some costly resources consumed
• In energy efficiency ηe, the cost C measures energy.
• We can talk about the energy efficiency of:
    – A heat engine: ηhe = W/Q, where:
           • W = work energy output, Q = heat energy input
    – An energy-recovering process: ηer = Eend/Estart, where:
           • Eend = available energy at end of process,
           • Estart = energy input at start of process
    – A computer: ηec = Nops/Econs, where:
           • Nops = useful operations performed
           • Econs = free-energy consumed
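As a quick sketch (my own, not from the slides), the three efficiency definitions above can be written directly as ratios; the numbers below are purely illustrative:

```python
# Sketch of the slide's efficiency definitions; example values are made up.

def eta_heat_engine(work_out_J, heat_in_J):
    """Heat-engine efficiency: eta_he = W / Q."""
    return work_out_J / heat_in_J

def eta_energy_recovery(e_end_J, e_start_J):
    """Energy-recovering process: eta_er = E_end / E_start."""
    return e_end_J / e_start_J

def eta_computer(n_ops, e_consumed_J):
    """Computer energy efficiency: eta_ec = N_ops / E_cons (ops per joule)."""
    return n_ops / e_consumed_J

# Illustrative: 1e12 useful ops on 1 mJ of free energy gives
# eta_ec = 1e15 ops/J, equivalently 1e15 ops/sec per watt.
print(eta_computer(1e12, 1e-3))
```
Note that ops per joule and ops per second per watt are the same unit, which is why the energy-efficiency bounds on the following slides can be quoted either way.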

                 Trend of Minimum Transistor Switching Energy
[Chart: CV²/2 gate energy (joules, zJ–fJ scale) vs. year (1995–2045), based on
the ITRS ’97–’03 roadmaps; LP and HP minimum gate energies (aJ) plotted at the
250, 180, 130, 90, 65, 32, and 22 nm nodes (nm DRAM half-pitch). Reference
lines: 100 k(300 K) — the room-temperature 100 kT reliability limit, marked
"Practical limit for CMOS?"; 1 eV — one electron volt; k(300 K) —
room-temperature kT thermal energy; and ln(2) k(300 K) — the room-temperature
von Neumann–Landauer limit.]
           Some Lower Bounds on Energy
• In today's 90 nm VLSI technology, for minimal operations
  (e.g., conventional switching of a minimum-sized transistor):
     – Ediss,op is on the order of 1 fJ (femtojoule) ⇒ ηec ≲ 10¹⁵ ops/sec/watt.
           • Will be a bit better in coming technologies (65 nm, maybe 45 nm)
• But, conventional digital technologies are subject to several
  lower bounds on their energy dissipation Ediss,op for digital
  transitions (logic / storage / communication operations),
     – And thus, corresponding upper bounds on their energy efficiency.
• Some of the known bounds include:
     – Leakage-based limit for high-performance field-effect transistors:
           • Maybe roughly ~5 aJ (attojoules) ⇒ ηec ≲ 2×10¹⁷ operations/sec./watt
     – Reliability-based limit for all non-energy-recovering technologies:
           • On the order of 1 eV (electron-volt) ⇒ ηec ≲ 6×10¹⁸ ops./sec/watt
     – von Neumann-Landauer (VNL) bound for all irreversible technologies:
           • Exactly kT ln 2 ≈ 18 meV ⇒ ηec ≲ 3.5×10²⁰ ops/sec/watt
               – For systems whose waste heat ultimately winds up in Earth's atmosphere,
                   » i.e., at temperature T ≈ Troom = 300 K.
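As a quick arithmetic check (mine, not the author's), each bound above is just the reciprocal of the per-operation energy:

```python
import math

# Reproduce the slide's four energy-efficiency ceilings:
# eta_ec <= 1 / E_diss,op, in ops per joule (= ops/sec per watt).
k_B = 1.380649e-23       # Boltzmann's constant, J/K
eV  = 1.602176634e-19    # joules per electron-volt
T   = 300.0              # room temperature, K

bounds_J = {
    "90 nm CMOS (~1 fJ/op)":     1e-15,
    "FET leakage limit (~5 aJ)": 5e-18,
    "reliability limit (~1 eV)": 1.0 * eV,
    "VNL limit (kT ln 2)":       k_B * T * math.log(2),
}
for name, e in bounds_J.items():
    print(f"{name}: eta_ec <= {1/e:.2g} ops/sec/watt")
```
Running this reproduces the slide's figures: 10¹⁵, 2×10¹⁷, ~6×10¹⁸, and ~3.5×10²⁰ ops/sec/watt.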

                 Reliability Bound on Logic
                      Signal Energies
• Let Esig denote the logic signal energy,
     – The energy involved (transferred, manipulated) in the process of storing,
       transmitting, or transforming a bit's worth of digital information.
           • But note that “involved” does not necessarily mean “dissipated!”
• As a result of fundamental thermodynamic considerations, it is required
  that Esig ≳ kBTsig ln r (with quantum corrections that are small for large r)
     – Where kB is Boltzmann's constant, 1.38×10⁻²³ J/K;
     – and Tsig is the temperature in the degrees of freedom carrying the signal;
     – and r is the reliability factor, i.e., the improbability of error, 1/perr.
• In non-energy-recovering logic technologies (totally dominant today)
     – Basically all of the signal energy is dissipated to heat on each operation.
           • And often additional energy (e.g., short-circuit power) as well.
• In this case, minimum sustainable dissipation is Ediss,op ≳ kBTenv ln r,
     – Where Tenv is now the temperature of the waste-heat reservoir (environment)
           • Averages around 300 K (room temperature) in Earth‟s atmosphere
• For a decent r of e.g. 2×10¹⁷, this energy is on the order of ~40 kT ≈ 1 eV.
     – Therefore, if we want energy efficiency ηec > ~1 op/eV, we must recover some
       of the signal energy for later reuse.
           • Rather than dissipating it all to heat with each manipulation of the signal.
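The "~40 kT ≈ 1 eV" figure above is easy to verify numerically (a sketch of my own):

```python
import math

k_B = 1.380649e-23       # Boltzmann's constant, J/K
eV  = 1.602176634e-19    # joules per electron-volt

def min_signal_energy_eV(r, T=300.0):
    """Reliability bound: E_sig >= k_B * T * ln(r), for error probability 1/r."""
    return k_B * T * math.log(r) / eV

# For the slide's r = 2e17: ln(r) ~ 40, so E_sig ~ 40 kT ~ 1 eV.
print(min_signal_energy_eV(2e17))   # ~1.03 eV
```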

                 The von Neumann-Landauer
                       (VNL) Principle
• First alluded to by John von Neumann in 1949.
     – Developed explicitly by Rolf Landauer of IBM in 1961.
• The principle is a rigorous theorem of physics!
     – It follows from the reversibility of fundamental dynamics.
• A correct statement of the principle is the following:
     – Any process that loses or obliviously erases 1 bit of known
       (correlated) information increases total entropy by at least
                       ∆S = 1 bit = kB ln 2,
       and thus implies the eventual dissipation at least
                        Ediss = kBTenv ln 2
       of free energy to the environment as waste heat.
           • where kB = log e = 1.38×10⁻²³ J/K is Boltzmann's constant
           • and Tenv = temperature of the waste-heat reservoir (environment)
               – Not less than about room temperature, or 300 K, for earthbound
                 computers, which implies Ediss ≥ 18 meV.

                 Definition of Reversibility
• What does it mean for a dynamical system (either
  continuous or discrete) to be (time-) reversible?
     – Let x(t) denote the state of the system at time t.
           • The universe, or any closed system of interest (e.g. a computer).
     – Let Ft→u(x) be the transition relation operating between a
       given two times t and u; i.e., x(u) = Ft→u[x(t)].
           • Determined by the system's dynamics (laws of physics, or a FSM).
     – Then the system is called “dynamically reversible” iff Ft→u
       is a one-to-one function, for any times (t, u) where u > t.
           • That is, ∀ t, u with u > t: ¬∃ x1 ≠ x2: Ft→u(x1) = Ft→u(x2).
               – That is, no two distinct states would ever go to the same state over the
                 course of a given time interval.
     – The definition implies determinism, if we also allow u < t.
           • A reversible system is deterministic in the reverse time direction.
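The definition above can be checked mechanically for any finite system; here is a minimal sketch (my own example, using a one-bit "erase" and a one-bit "flip"):

```python
# Check dynamical reversibility of a finite transition function:
# a system is reversible iff its transition map F is one-to-one.

def is_reversible(F):
    """F maps each state to its successor (a dict). Reversible iff injective."""
    images = list(F.values())
    return len(images) == len(set(images))

# Bit erasure merges states 0 and 1 into 0 -> irreversible:
erase = {0: 0, 1: 0}
# A NOT gate permutes the states -> reversible:
flip = {0: 1, 1: 0}
print(is_reversible(erase))  # False
print(is_reversible(flip))   # True
```
Because `flip` is injective, it can be inverted, i.e. it is deterministic in the reverse time direction, exactly as the slide states.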

               Types of Dynamics

 • Nondeterministic, irreversible            • Nondeterministic, reversible

 • Deterministic, irreversible               • Deterministic, reversible ← WE
                   Physics is Reversible!
• All of the successful models of fundamental physics
  are expressible in the Hamiltonian formalism.
     – Including: Classical mechanics, quantum mechanics,
       special and general relativity, quantum field theories.
           • The latter two (GR & QFT) are backed up by enormous,
             overwhelming mountains of evidence confirming their predictions!
              – 11 decimal places of precision so far! And, no contradicting evidence.
• In Hamiltonian systems, the dynamical state x(t)
  obeys a differential equation that's first-order in time,
       dx/dt = g(x)     (where g is some function)
     – This immediately implies determinism of the dynamics.
• And, since the time differential dt can be taken to be
  negative, the formalism also implies reversibility!
     – Thus, dynamical reversibility is one of the most firmly-
       established, fundamental, inviolable facts of physics!
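A toy numerical illustration of this point (my own, not from the talk): a symplectic leapfrog step of a Hamiltonian system is exactly time-reversible, so integrating with dt negated retraces the trajectory back to the initial state:

```python
def leapfrog(x, v, dt, accel, n):
    """Kick-drift-kick leapfrog integration of dx/dt = v, dv/dt = accel(x)."""
    for _ in range(n):
        v += 0.5 * dt * accel(x)
        x += dt * v
        v += 0.5 * dt * accel(x)
    return x, v

accel = lambda x: -x                              # harmonic oscillator
x1, v1 = leapfrog(1.0, 0.0, 0.01, accel, 1000)    # run forward...
x0, v0 = leapfrog(x1, v1, -0.01, accel, 1000)     # ...then with dt < 0
# Recovers the initial state (1.0, 0.0) up to floating-point rounding:
print(abs(x0 - 1.0) < 1e-9, abs(v0) < 1e-9)
```
The backward pass undoes each kick and drift in reverse order, which is the discrete analogue of taking the time differential dt to be negative.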
              Illustration of VNL Principle
•   Either digital state is initially encoded by any of N possible physical microstates
     – Illustrated as 4 in this simple example (the real number would usually be much larger)
     – Initial entropy S = log[#microstates] = log 4 = 2 bits.
•   Reversibility of physics ensures a "bit erasure" operation can't possibly merge two
    microstates, so it must double the possible microstates in the digital state!
     – Entropy S = log[#microstates] increases by log 2 = 1 bit = (log e)(ln 2) = kB ln 2.
     – To prevent entropy from accumulating locally, it must be expelled into the environment.

[Diagram: initially, 4 microstates represent logical "0" and 4 represent
logical "1"; entropy S = log 4 = 2 bits. After "erasure," all 8 microstates
must land in the logical-"0" region, so S′ = log 8 = 3 bits, and
∆S = S′ − S = 3 bits − 2 bits = 1 bit.]
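The arithmetic in this illustration is just S = log₂(#microstates); a one-line check (my own sketch):

```python
import math

# Entropy in bits as a function of microstate count, per the slide.
def entropy_bits(n_microstates):
    return math.log2(n_microstates)

S_before = entropy_bits(4)   # 4 microstates per known digital state: 2 bits
S_after  = entropy_bits(8)   # merging forbidden, so the count doubles: 3 bits
print(S_after - S_before)    # Delta-S = 1 bit (= k_B ln 2 in physical units)
```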
                    Reversible Computing
• The basic idea is simply this:
     – Don’t erase information when performing logic / storage /
       communication operations!
           • Instead, just reversibly (invertibly) transform it in place!
• When reversible digital operations are implemented
  using well-designed energy-recovering circuitry,
     – This can result in local energy dissipation Ediss << Esig,
           • this has already been empirically demonstrated by many groups.
     – and even total energy dissipation Ediss << kT ln 2!
           • This is easily shown in theory & simulations,
               – but we are not yet to the point of demonstrating such low levels of total
                 dissipation empirically in a physical experiment.
           • Achieving this goal requires very careful design,
               – and verifying it requires very sensitive measurement equipment.

           How Reversible Logic Avoids the
            von Neumann-Landauer Bound
• We arrange our logical manipulations to never
  attempt to merge two distinct digital states,
     – but only to reversibly transform them from
       one state to another!
• E.g., illustrated is a reversible operation
  cCLR (controlled CLR)
     – It and its inverse cSET enable arbitrary logic!
[Diagram: the four digital states logic 00, 01, 10, 11, with the cCLR
operation mapping them one-to-one rather than merging any pair.]
           A Few Highlights Of Reversible
                Computing History
• Charles Bennett @ IBM, 1973-1989:
     – Reversible Turing machines & emulation algorithms
           • Can emulate irreversible machines on reversible architectures.
               – But, the emulation introduces some inefficiencies
     – Models of chemical & Brownian-motion physical realizations.
• Fredkin and Toffoli's group @ MIT, late 1970s/early 1980s
     – Reversible logic gates and networks (space/time diagrams)
     – Ballistic and adiabatic circuit implementation proposals
• Groups @ Caltech, ISI, Amherst, Xerox, MIT, '85–'95:
     – Concepts for & implementations of adiabatic circuits in VLSI tech.
     – Small explosion of adiabatic circuit literature since then!
• Mid 1990s-today:
     – Better understanding of overheads, tradeoffs, asymptotic scaling
     – A few groups begin development of post-CMOS implementations
           • Most notably, the Quantum-dot Cellular Automata group at Notre Dame
                                  Caveat #1
• Technically, avoiding the VNL bound doesn't actually require that the digital
  operation be reversible at the level of the logical states…
     – It can be logically irreversible if the information in the digital state is already entropy!
           • In the example below, the non-digital entropy doesn't change, because the operation is also
             nondeterministic (N-to-N), and the transition relation between logical states has semi-detailed
             balance, so the entropy in the digital state remains constant.
• However, such operations just re-randomize bits that are already random!
     – It's not clear whether this kind of operation is computationally useful.
[Diagram: a digital bit with unknown value (0 or 1), mapped N-to-N back onto
{0, 1} by physical dynamics whose precise details may be uncertain.]
                                   Caveat #2
• Operations that are logically N-to-1 can be used, if
  there are sufficient compensating 1-to-N
  (nondeterministic) logical operations.
     – All that is really required is that the logical dynamics be 1-
       to-1 in the long-term average.
           • Thus, it's possible to thermally generate random bits and discard
             them later when we are through with them.
               – While maintaining overall thermodynamic reversibility.
           • This ability is useful for probabilistic (randomized) algorithms.

[Diagram: logic 0 and logic 1 connected by a mix of merging (N-to-1) and
splitting (1-to-N) transitions that balance out over the long-term average.]
               Reversibility and Reliability
• A widespread myth: “Future low-level digital
  devices will necessarily be highly unreliable.”
     – This comes from a flawed line of reasoning:
           • Faster ⇒ more energy efficient ⇒ lower bit energies ⇒ high rate of
             bit errors from thermal noise
     – However, this scaling strategy doesn't work, because:
           • High rate of thermal errors ⇒ high power dissipation from error
             correction ⇒ less energy efficient ⇒ ultimately slower!
• But in contrast, using reversible computing, we can
  achieve arbitrarily high energy efficiency while also
  maintaining arbitrarily high reliability!
     – The key is to keep bit energies reasonably high!
           • While recovering most of the bit energy…

           Minimizing Energy Dissipation
               Due to Thermal Errors
• Let perr = 1/r be the bit-error probability per operation.
     – Where r quantifies the “reliability level.”
     – And pok = 1 − perr is the probability the bit is correct
• The necessary entropy increase ∆S per op due to error occurrence is
  given by the (binary) Shannon entropy of the bit-value after the operation:
         H(perr) = perr log perr⁻¹ + pok log pok⁻¹.
• For r >> 1 (i.e., as r → ∞), this increase approaches 0:
         ∆S = H(perr) ≈ perr log perr⁻¹ = (log r)/r → 0
• Thus, the required energy dissipation per op also approaches 0:
        Ediss = T∆S ≈ (kT ln r)/r → 0
• Could get the same result by assuming the signal energy Esig = kT ln r
  required for reliability level r is dissipated each time an error occurs:
        Ediss = perrEsig = perr(kT ln r) = (kT ln r)/r → 0 as r → ∞.
• Further, note that as r → ∞, the required signal energy grows only very slowly:
     – Specifically, only logarithmically in the reliability, i.e., Esig = Θ(log r).
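The limit claimed above is easy to verify numerically (my own sketch): raising the reliability level r *reduces* the expected error-induced dissipation per operation, even though the signal energy grows (logarithmically):

```python
import math

k_B, T = 1.380649e-23, 300.0   # Boltzmann's constant (J/K), room temperature

def shannon_bits(p_err):
    """Binary Shannon entropy H(p) in bits, as on the slide (log base 2)."""
    p_ok = 1.0 - p_err
    return p_err * math.log2(1 / p_err) + p_ok * math.log2(1 / p_ok)

def e_diss_per_op(r):
    """E_diss = T*dS ~ (kT ln r)/r, which -> 0 as r -> infinity."""
    return k_B * T * math.log(r) / r

# More reliable -> *less* dissipation per op from thermal errors:
print(e_diss_per_op(1e3) > e_diss_per_op(1e6) > e_diss_per_op(1e9))  # True
```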

           Device-Level Requirements for
               Reversible Computing
• A good reversible device technology should have:
     – Low manufacturing cost ¢d per device
           • Important for good overall (system-level) cost-efficiency
     – Low rate of static power dissipation Pleak due to energy leakage.
           • Required for energy-efficient storage especially (but also in logic)
     – Low energy coefficient cE = Ediss/f (energy dissipated per operation, per unit
       transition frequency) for adiabatic transitions.
           • Implies we can achieve a high operating frequency (and thus good cost-
             performance) at a given level of energy efficiency.
     – High maximum available transition frequency fmax.
           • Important for those applications in which the latency of serial threads of computation
             dominates total cost
• Important: For system-level energy efficiency, Pleak and cE must be taken
  as effective global values measuring the implied amount of energy emitted
  into the outside environment at temperature Tenv.
     – With an ideal (Carnot) refrigerator, Pleak = StTenv and cE = cSTenv,
           • Where St = the static rate of leakage entropy generation per unit time,
           • and cS = Sgen/f is the adiabatic entropy coefficient, i.e., entropy generated per unit transition
             frequency.

             Early Chemical Implementations
• How to physically implement reversible logic?
     – Bennett's original inspiration: DNA polymerization!
           • Reversible copying of a DNA strand
               – Molecular basis of cell division / organism reproduction
           • This (like all chemical reactions) is reversible…
               – Direction (forward vs. backward) & reaction rate depend on relative
                 concentrations of reagent and product species, which affect the free energy
           • Energy dissipated per step turns out to be proportional to speed.
               – Implies process is characterized by an energy-time constant.
                    » I call this the "energy coefficient" cEt ≡ Ediss,op·top = Ediss,op/fop.
           • For DNA, typical figures are 40 kT ≈ 1 eV @ ~1,000 bp/s
               – Thus, the energy coefficient cE is about 1 eV/kHz.

• Can we achieve better energy coefficients?
     – Yes, in fact, we had already beaten DNA's cE in reversible
       CMOS VLSI technology available circa 1995!
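Working out DNA's energy coefficient from the slide's figures (a quick sketch of my own):

```python
# DNA polymerization as a reversible computer, per the slide's numbers:
# ~40 kT (~1 eV) dissipated per step at ~1,000 base pairs per second.
k_B, T, eV = 1.380649e-23, 300.0, 1.602176634e-19

E_diss = 40 * k_B * T          # joules dissipated per copied base pair
f_op   = 1e3                   # copying steps per second
c_E    = E_diss / f_op         # energy coefficient, J/Hz (= J*s)

print(c_E)                              # ~1.7e-22 J*s
print((E_diss / eV) / (f_op / 1e3))     # ~1 eV/kHz, as the slide states
```
Comparing this ~1.7×10⁻²² J·s with the ~1.4×10⁻²⁶ J·s quoted for adiabatic CMOS two slides later shows the roughly 10⁴× advantage behind the claim that CMOS "beat DNA's cE."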
                      Energy & Entropy Coefficients in Electronics
• For a transition involving the adiabatic transfer
  of an amount Q of charge along a path with
  resistance R:
     – The raw (local) energy coefficient is given by
       cEt = Ediss·t = Pdiss·t² = IVt² = I²Rt² = Q²R.
           • Where V is the voltage drop along the path.
     – The entropy coefficient cSt = Q²R/Tpath,
           • where Tpath is the local thermodynamic temperature in
             the path.
     – The effective (global) energy coefficient is
       cEt,eff = Q²R(Tenv/Tpath).
           • We pay a penalty for low-T operation!
                 Example of Electronic cEt
• In a fairly recent (180 nm) CMOS VLSI technology:
     – Energy stored per min. sized transistor gate: ~1 fJ @ 2V
           • Corresponds to charge per gate of Q = 1 fC ≈ 6,000 electrons
     – Resistance per turned-on min-sized nFET of ~14 kΩ
           • Order of the quantum resistance R = R0 = 1/G0 = h/2q² = 12.9 kΩ
     – Ideal energy coefficient for a single-gate transition
       ~1.4×10−26 J/Hz
           • Or in more convenient units, ~80 eV/GHz = 0.08 eV/MHz!
     – with some expected overheads for a simple test circuit,
       calculated energy coefficient comes out to about 8× higher,
       or ~10−25 J·s
           • Or ~600 eV/GHz = 0.6 eV/MHz.
     – Detailed Cadence simulations gave us, per transistor:
           • @ 1 GHz: P = 20 μW, E = 20 fJ = 1.2 keV, so Ec = 1.2 eV/MHz
           • @ 1 MHz: P = 0.35 pW, E = 3.5 aJ = 2.2 eV, so Ec = 2.1 eV/MHz

                      Cadence Simulation Results
[Graph: average power dissipation per nFET (W, ~10⁻¹⁴–10⁻⁵) vs. frequency
(1 kHz–1 GHz), TSMC 0.18 µm process, standard CMOS vs. 2LAL (two-level
adiabatic logic); contours mark energy dissipated per nFET per cycle.]
•   Graph shows power dissipation vs. frequency
     – in a shift register.
•   At moderate frequencies (1 MHz),
     – Reversible uses < 1/100th the power of standard CMOS.
•   At ultra-low power (1 pW/transistor),
     – Reversible is 100× faster than irreversible!
•   Minimum energy dissipation < 1 eV!
     – 500× lower than the best irreversible!
           • i.e., 500× higher computational energy efficiency
•   Energy transferred is still ~10 fJ (~100 keV)
     – So, energy recovery efficiency is 99.999%!
           • Not including losses in the power supply
              A Useful Two-Bit Primitive:
             Controlled-SET or cSET(a,b)
• Semantics: If a=1, then set b:=1.
     – Conditionally reversible, if the special
       precondition ab=0 is met.
           • Note it's 1-to-1 on the subset of states used
               – Sufficient to avoid Landauer's principle!

                      a  b │ a′ b′
                      0  0 │ 0  0
                      0  1 │ 0  1
                      1  0 │ 1  1

• We can implement cSET in dual-rail
  CMOS with a pair of transmission gates
     – Each needs just 2 transistors,
           • plus one controlling "drive" signal
[Schematic: a "drive" signal connected to output b through a
transmission gate (T-gate) switched by a.]
• This 2-bit semi-reversible operation &
  its inverse cCLR are universal for
  reversible (and irreversible) logic!
     – If we compose them in special ways.
           • And include latches for sequential logic.
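The truth table above can be checked programmatically; this sketch (mine) confirms cSET is 1-to-1 on the precondition subset but would merge states without it:

```python
# The slide's cSET(a,b): "if a == 1 then b := 1", precondition a*b == 0.

def cSET(a, b):
    return (a, 1) if a == 1 else (a, b)

# States satisfying the precondition ab = 0:
valid = [(a, b) for a in (0, 1) for b in (0, 1) if a * b == 0]
outputs = [cSET(a, b) for a, b in valid]
# One-to-one on the precondition subset -> avoids Landauer's principle:
print(len(outputs) == len(set(outputs)))   # True
# But on ALL four input states it would merge (1,0) and (1,1):
all_out = [cSET(a, b) for a in (0, 1) for b in (0, 1)]
print(len(all_out) == len(set(all_out)))   # False
```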

           Reversible OR (rOR) from cSET
• Semantics: rOR(a,b) ::= if a|b, c:=1.
     – Set c:=1, on the condition that either a or b is 1.
           • Reversible under precondition that initially a|b → ~c.
• Two parallel cSETs simultaneously
  driving a shared output line
  implement the rOR operation!
     – This type of gate composition was
       not traditionally considered.
       [Hardware diagram: a and b each drive a cSET onto the shared
        output c. Spacetime diagram: a→a', b→b', c: 0 → a OR b.]
• Similarly one can do
  rAND, and reversible
  versions of all operations.
     – Logic synthesis with these
       is extremely straightforward…
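The composition of two cSETs into rOR can likewise be modeled in software (an illustrative sketch; the precondition is the one stated above):

```python
# Illustrative model: rOR as two parallel cSETs ("if x: c := 1")
# driving a shared output line c.
# Reversible under the slide's precondition: (a|b) -> ~c initially.

def ror(a, b, c):
    assert not ((a or b) and c), "precondition (a|b) -> ~c violated"
    if a:        # first cSET drives the shared output line
        c = 1
    if b:        # second cSET drives the same line
        c = 1
    return a, b, c

# The operation is 1-to-1 on the allowed states, so the OR result
# is computed without erasing any information:
allowed = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)
           if not ((a or b) and c)]
outs = [ror(a, b, c) for (a, b, c) in allowed]
assert len(set(outs)) == len(allowed)   # injective on allowed inputs
```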

                    CMOS Gate Implementing
                      rLatch / rUnLatch
• Symmetric Reversible Latch
  [Implementation schematic, hardware icon, and spacetime diagram:
   "in" connects to "mem"; the spacetime diagram shows the crLatch
   and crUnLatch operations.]

• The hardware is just a CMOS transmission gate again
      • This time controlled by a clock, with the data signal driving it
• Concise, symmetric hardware icon – just a short orthogonal line
• Thin strapping lines denote connection in the spacetime diagram.

                Building cNOT from rlXOR
• rlXOR(a,b,c): Reversible latched XOR.
     – Semantics: c := a⊕b.
           • Reversible under precondition that c is initially clear.
• cNOT(a,b): Controlled-NOT operation.
     – Semantics: b := a⊕b. (No preconditions.)
           • A classic “primitive” operation in reversible & quantum computing
     – But, it turns out to be fairly complex to implement cNOT in
       available fully adiabatic hardware technologies…
           • Thus, it's really not a very good building block for practical
             reversible hardware designs!
     – Of course, we can still build it, if we really want to.
           • Since, as I said, our gate set is universal for reversible logic.

                  cNOT from rlXOR:
                  Hardware Diagram
• A logic block providing an in-place cNOT
  operation (a cNOT “gate”) can be constructed
  from 2 rlXOR gates and two latched buffers.
  [Hardware diagram: input B passing through the rlXOR gates and latches.]
• The key is:
     – Operate some of the gates in reverse!
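One plausible software model of this construction is sketched below; the exact gate arrangement is a reading of the hardware diagram (two rlXOR gate operations plus two latched-buffer operations, some run backward), not a transcription of it:

```python
# Illustrative model: an in-place cNOT built from reversible latched
# XORs (rlXOR: c := a XOR b, requiring c initially clear) and latched
# buffers, with some gates operated in reverse, as the slide suggests.

def rlxor(a, b, c=0):
    """Forward rlXOR: compute c := a ^ b into an initially clear latch."""
    assert c == 0, "precondition: output latch must start clear"
    return a ^ b

def rlxor_reverse(a, b, c):
    """The same gate run in reverse: uncompute c back to 0,
    valid only when c == a ^ b."""
    assert c == a ^ b, "reverse operation only valid on its own image"
    return 0

def buffer_fwd(x, out=0):
    """Latched buffer: copy x into an initially clear latch."""
    assert out == 0
    return x

def buffer_rev(x, out):
    """Latched buffer run in reverse: uncompute a copy of x."""
    assert out == x
    return 0

def cnot(a, b):
    """In-place cNOT: returns (a, a ^ b), with no precondition on b,
    built only from the conditionally reversible primitives above."""
    t = rlxor(a, b)              # rlXOR forward:   t := a ^ b
    b = rlxor_reverse(a, t, b)   # rlXOR reversed:  clear b (b == a ^ t)
    b = buffer_fwd(t, b)         # buffer forward:  b := t
    t = buffer_rev(b, t)         # buffer reversed: clear t
    return a, b

for a in (0, 1):
    for b in (0, 1):
        assert cnot(a, b) == (a, a ^ b)
```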
                         Θ(log n)-time carry-skip adder
[Hardware diagram: an 8-bit segment built from adder cells with S, A, B,
 G, P, Cin, and Cout signals, plus Pms/Gls/Pls carry-skip logic; the 2nd,
 3rd, and 4th carry ticks are indicated.]
With this structure, we can do a 2n-bit add in 2(n+1) logic levels
→ 4(n+1) reversible ticks → n+1 clock cycles.
Hardware overhead is < 2× regular; overhead only ~2(n+1)× a
single-cycle equivalent.
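A quick script (illustrative only) confirming the arithmetic behind these latency claims:

```python
# Check the slide's latency claims for the Theta(log n)-time reversible
# carry-skip adder: a 2n-bit add takes 2(n+1) logic levels
# = 4(n+1) reversible ticks = n+1 clock cycles (4 ticks per cycle).

def carry_skip_latency(bits):
    assert bits % 2 == 0
    n = bits // 2
    levels = 2 * (n + 1)     # logic levels
    ticks = 2 * levels       # 4(n+1) reversible ticks
    cycles = n + 1           # at 4 reversible ticks per clock cycle
    return levels, ticks, cycles

print(carry_skip_latency(32))   # (34, 68, 17)
```

So the 32-bit adder simulated on the next slide would complete an add in 17 clock cycles under this scheme.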
                        32-bit Adder Simulation Results
[Two plots: 32-bit adder power vs. add frequency, and energy per add vs.
 add frequency (1E+04–1E+08 Hz), comparing standard CMOS (1V and 0.5V)
 against the adiabatic design (“CMOS pwr/energy” vs. “Adia. pwr/energy”
 curves). The adiabatic adder achieves 20× better performance at a power
 level of 3 nW/adder.]
(All results here are normalized to a throughput level of 1 add/cycle)
                Technological Challenges
• Fundamental theoretical challenges:
     – Find more efficient reversible algorithms
           • Or, prove rigorous lower bounds on complexity overheads
     – Study fundamental physical limits of reversible computing
• Implementation challenges:
     – Design new devices with lower energy coefficients cE
     – Design high-quality resonators for driving transitions
     – Empirically demonstrate large system-level power savings
• Application development challenges:
     – Find a plausible near- to medium-term “killer app” for RC
           • Something that's very valuable, and can't be done without it
     – Build a prototype RC-based solution

           Plenty of Room for Device Improvement
• Recall, irreversible device technology has at most ~3-4 orders of
  magnitude of power-performance improvements remaining.
     – And then, the firm kT ln 2 (VNL) limit is encountered.
• But, a wide variety of proposed reversible device technologies
  have been analyzed by physicists.
     – With theoretical power-performance up to 10-12 orders of
       magnitude better than today's CMOS!
           • Ultimate limits are unclear.
[Chart: power per device (W) vs. frequency (Hz), over 1E+04–1E+12 Hz,
 for reversible device proposals (.18um 2LAL, nSQUID, QCA cell,
 Quantum FET, Rod logic, Param. quantron, Helical logic) compared with
 .18um CMOS and the kT ln 2 limit.]
                   Limiting Cases of
              Energy/Entropy Coefficients
• Energy/entropy coefficients in adiabatic “single electronics:”
     – Suppose the amount of charge moved |Q| = q (a single electron)
     – Let the path consist of a single quantum channel (chain of states)
           • Has quantum resistance R = R0 = 1/G0 = h/2q² = 12.9 kΩ.
     – Then cE = h/2 = 2.07 meV/THz (very low!)
           • If path is at Tpath = Troom = 300 K, then cS = 0.08 k/THz.
     – For N× better efficiency than this, let the path consist of N parallel
       quantum channels → N× lower resistance.
• What about systems where resistive models may not apply?
     – E.g., superconductors, photonics, etc.
• A more general and rigorous (but perhaps loose) lower bound
  on the energy coefficient in all adiabatic quantum systems is
  given by the expression cE ≥ h²/4Eg t,
     – where Eg = energy gap between ground & excited states,
     – and t = time taken for a single orthogonalizing transition.
     – Ex.: Let Eg = 1 eV, t = 1 ps. Then cE ≥ 4.28 μeV/THz.
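These figures can be verified numerically from standard physical constants (illustrative script; constant values are CODATA approximations):

```python
# Check the limiting-case figures quoted on the slide.
h  = 6.62607e-34        # Planck's constant, J*s
q  = 1.60218e-19        # elementary charge, C
kB = 1.38065e-23        # Boltzmann's constant, J/K
eV = q                  # 1 eV in joules
THz = 1e12

# Quantum of resistance for a single (spin-degenerate) channel:
R0 = h / (2 * q**2)
print(R0 / 1e3)                      # ~12.9 kOhm

# Energy coefficient cE = h/2, expressed in meV/THz:
cE = h / 2                           # units J/Hz = J*s
print(cE * THz / (1e-3 * eV))        # ~2.07 meV/THz

# Entropy coefficient at Tpath = 300 K, in units of k per THz:
print(cE * THz / (kB * 300))         # ~0.08 k/THz

# General quantum bound cE >= h^2/(4*Eg*t), with Eg = 1 eV, t = 1 ps:
Eg, t = 1 * eV, 1e-12
cE_min = h**2 / (4 * Eg * t)
print(cE_min * THz / (1e-6 * eV))    # ~4.28 ueV/THz
```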

               Requirements for Energy-
            Recovering Clock/Power Supplies
• All known reversible computing schemes require a periodic
  global signal that synchronizes and drives adiabatic transitions.
     – For good system-level energy efficiency, this signal must oscillate
       resonantly and near-ballistically, with a high effective quality factor.
• Several factors make the design of a satisfactory resonator
  quite difficult:
     – Need to avoid uncompensated back-action of logic on resonator
     – In some resonators, Q factor may scale unfavorably with size
     – Effective quality factor problem
• There's no reason to think that it's impossible to do…
     – But it is definitely a nontrivial hurdle that we need to face up to, pretty soon,
           • If we want to make reversible computing practical in time to avoid an
             extended period of stagnation in computer performance growth.

                  The Back-Action Problem
• The ideal resonator signal is a pure periodic signal.
     – A pretty general result from communications theory:
           • A resonator‟s quality factor is inversely proportional to its signal bandwidth B.
     – E.g., for an EM cavity w. resonant frequency ω0,
           • the half-maximum BW is B = ∆ω = ω0/(2πQ) [1].
     – Thus Q → ∞ implies B → 0.
           • There must be little or no information in the resonator signal!
• However, if the logic load being driven varies from one cycle to the next,
     – whether due to data-dependent variations,
     – or structural variations (different amounts of logic being driven per cycle)
• this will tend to produce impedance nonuniformities, which will lead to
  nonuniform reflections of the resonator signal
     – and thereby introduce nonzero bandwidth into that signal.
• Even more generally, any departure of resonator energy away from its
  ideal desired trajectory represents a form of effective energy dissipation!
     – we must control exactly where (into what states) all of the energy goes!
           • the set of possible microstates of the system must not grow quickly
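A small numeric illustration (with an assumed example frequency) of the bandwidth relation quoted from [1]:

```python
# Illustrate B = dw = w0/(2*pi*Q): as Q grows, the resonator signal's
# bandwidth -- and hence its capacity to carry information about the
# logic it drives -- shrinks toward zero.
import math

def half_max_bw(w0, Q):
    """Half-maximum bandwidth of an EM cavity with angular resonant
    frequency w0 and quality factor Q (relation [1] on the slide)."""
    return w0 / (2 * math.pi * Q)

w0 = 2 * math.pi * 1e9          # an assumed 1 GHz resonator
for Q in (1e3, 1e6, 1e9):
    # Higher Q -> narrower band -> a more nearly pure periodic signal.
    print(Q, half_max_bw(w0, Q))
```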

               [1] Schwartz, Principles of Electrodynamics, Dover, 1972.

            Unfavorable Scaling of Resonator
                Quality Factor with Size?
• I don't yet have a perfectly clear and general understanding of
  this issue, but…
     – In a lot of oscillating systems I've looked at, the resonant Q factor may
       tend to get worse (or at least, not very much better) as the resonator
       dimensions get smaller.
           • E.g., in LC oscillators, inductor Q scales inversely with frequency
               – EM emission is greater at high frequencies
               – But, the tendency is for low f → large coil sizes, not small!
           • Anecdotal reports from people working in the NEMS community…
               – It can be difficult to get high Q in nanoscale electromechanical resonators
                     » Perhaps due to present difficulty of precision engineering at the nanoscale?
           • Our own experience working with transmission-line resonators
• Example: In a cubical EM cavity of length L,
     – We have 2πQ = L/8δ, where δ = skin depth. ([1] again)
           • Skin depth δ = (2πσk)−1/2, where σ = wall conductivity, k = wave #.
               – So if L is fixed, high Q → small δ → large k → high f → low Q in logic!

              The Effective Quality Factor
• Actual quality factor of resonator Q = Eres/Edissr.
     – Where Eres = energy contained in resonator signal
     – and Edissr = energy dissipated in resonator per cycle.
• But the effective quality factor, for purposes of doing
  energy-efficient logic transitions is Qeff = Edeliv/Edissr.
     – Where Edeliv = energy delivered to the logic per transition.
           • Since 1/Qeff of the logic signal energy is dissipated per cycle.
• Thus, Qeff = Q · (Edeliv/Eres).
     – That is, the effective Q is taken down by the fraction of
       resonator energy delivered to the logic per cycle.
• If a resonator needs to be large to attain high Q,
     – it may also hold a large amount of energy Eres,
           • and so it may not have a very high effective Q for driving the logic!
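The Qeff relation is easy to illustrate numerically (the resonator numbers below are hypothetical):

```python
# Illustrate Qeff = Q * (Edeliv/Eres): a large, high-Q resonator that
# stores far more energy than it delivers per cycle can still have a
# poor *effective* Q for driving logic transitions.

def effective_q(Q, E_res, E_deliv):
    """Qeff = Edeliv/Edissr, where Edissr = Eres/Q is the energy
    dissipated in the resonator per cycle."""
    E_dissr = E_res / Q            # energy lost in resonator per cycle
    return E_deliv / E_dissr       # equals Q * (E_deliv / E_res)

# Hypothetical: Q = 1e6 resonator storing 1 uJ, but delivering only
# 1 pJ of signal energy to the logic per cycle:
print(effective_q(1e6, 1e-6, 1e-12))   # Qeff ~ 1: nearly all delivered
                                       # energy is dissipated each cycle
```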


             Trapezoidal Resonator Concept
[Diagram: a moving metal plate on a support arm/electrode, with the arm
 anchored to nodal points of fixed-fixed beam flexures located a little
 ways away in both directions (for symmetry). Phase 0° and phase 180°
 electrodes form an interdigitated structure spanning the plate's range
 of motion; the structure repeats arbitrarily many times along the y
 axis, all anchored to the same flexure. Insets plot each electrode's
 capacitance C(θ) over drive phase θ from 0° to 360°.]
        Previous CMOS-MEMS Resonators
        in post-CMOS DRIE process (in use at UF)
[Front-side view micrograph showing the serpentine spring and proof
 mass; resonant frequency 150 kHz.]
                      Resonator Schematic
[Schematic: sensor combs on either side of a central actuator; bias
 voltage Vb, control voltage Vc, and AC drive vac applied, with the
 polarization voltage Vp set by Vc and Vb.]
                  Post-TSMC35 AdiaMEMS Resonator
                       (Coventorware model)
[Coventorware rendering of the resonator layout, taped out April '04;
 comb structure visible.]
                Quasi-Trapezoidal MEMS
             Resonator: 1st Fabbed Prototype
                                 • Post-etch process is still being fine-tuned.
                                      – Parts are not yet ready for testing…
[Micrograph of the first fabricated prototype; drive comb labeled.]
                          Conclusions
• Reversible computing will become necessary
  within our lifetimes,
     – if we wish to continue progress in computing
       performance/power beyond the next 1-2 decades.
• Much progress in our understanding of RC
  has been made in the past three decades…
     – But much important work still remains to be done.
• I encourage my audience to join the
  community of researchers who are working to
  address the reversible computing challenge.

