How to cut SoC power dissipation, energy consumption and cost

Using multiple configurable processors can cut SoC power dissipation, energy consumption, and cost, says Steve Leibson of Tensilica.

Rising clock rates cause dynamic power dissipation and energy consumption to grow, resulting in unacceptable energy consumption and heat in contemporary SoCs. These negative consequences impose hard limits that are halting SoC advancement.

One key way to reverse this trend is to exploit inherent parallelism, which reduces the need for high clock rates. Programmable, configurable microprocessor cores play a significant role in exploiting parallelism, in ways that cannot be achieved with conventional, fixed-ISA (instruction set architecture) processors.

Dynamic power dissipation and energy consumption clearly rise with clock frequency. The formula is:

$$P_{\text{dynamic}} = \tfrac{1}{2} C V^2 f$$

In this formula, C is the total switched capacitance of all on-chip nodes, V is the core operating voltage, and f is the clock frequency. At first glance, dynamic power appears to be linearly proportional to clock frequency. However, SoCs must run at higher core operating voltages to attain the highest possible clock frequencies, which brings the $V^2$ term into play, so the relationship between dynamic power and operating frequency is superlinear.

Battery life is inversely proportional to energy consumption. SoCs that draw a lot of energy result in products with short talk or operating times and unacceptable standby times.

At the same time, SoCs that consume a lot of energy need larger and more expensive batteries and power supplies. Such SoCs run hotter, so they need more expensive packaging (such as copper heat spreaders or ceramic packages). Larger, more costly heat sinks or noisy fans are needed to cool them, and systems based on such SoCs therefore require larger, more expensive enclosures.

The added heat also reduces product reliability, which in turn increases warranty costs.

The place to start reducing energy consumption is at the system level, and the key to such cuts in digital systems is reducing clock rates. Lower clock rates immediately cut dynamic energy consumption, and they reduce the need for advanced IC processes, thus reversing the concurrent upward trend in static energy consumption.

One way to cut clock rates is to increase execution parallelism. Hardware designers use parallelism intuitively, but microprocessor-based designs have done the opposite for decades. In the name of saving hardware, microprocessor-based systems employ multitasking operating systems that allow the processors to execute multiple concurrent tasks. The processor's clock rate therefore becomes the aggregate clock rate needed to execute all of these tasks.

When energy consumption was not a problem, when microprocessors came in individual packages, and when tasks were fairly simple, it made sense to cut hardware costs by employing multitasking. However, a processor on a 90nm or 65nm SoC consumes less than 1 mm² of silicon. To avoid high clock rates and unnecessary energy consumption, it now makes sense to use more processors running at lower clock rates, with less multitasking on each.

It also makes sense to tailor each processor to its assigned task using the ISA-extension abilities inherent in configurable processors. Such tailoring makes each on-chip processor more efficient when executing a specific algorithm. Increased efficiency lets the processor complete the algorithm in the same amount of time while executing fewer instructions, and therefore at a lower clock rate.

The Fast Fourier Transform (FFT) is a good example of an algorithm that can be greatly accelerated through ISA tailoring. The FFT decomposes signals into their constituent frequency components and is commonly used in communications and signal-processing applications. For example, the 802.11g wireless PHY employs 64-point, radix-4, decimation-in-frequency FFTs, executed every 3.2 microseconds or so.

Radix-4 FFTs are usually implemented with hardware butterfly blocks (the butterfly is the basic FFT operation), which require twelve 16x16-bit multipliers and more than 20 16-bit adders. Implementing such an FFT in software is usually impractical, as Figure 1 illustrates, because of the high clock rates required. Using straight C code, a 32-bit RISC processor (with no hardware multiplier) needs 32,187 cycles to execute the FFT algorithm. To execute one of these FFTs every 3.2 microseconds, the processor would need to run at about 10GHz. That is clearly an impossible clock rate for any processor, much less one synthesized on an SoC.

Figure 1: Cycle Count and Energy Reduction for Radix-4 FFT.

Figure 1 shows the result of increasing amounts of ISA tailoring for the FFT. Adding one 32-bit multiplier to the RISC processor cuts the required clock rate to 1.6GHz. Although better, that number is still out of reach for synthesized processors. Creating a superscalar microprocessor that can issue multiple RISC instructions simultaneously, and adding the hardware multiplier, only drops the required clock rate to 930MHz, at the very edge of possibility for a synthesized processor.

However, adding a radix-4 butterfly instruction to the processor cuts the required clock rate to 46MHz (a 217x reduction) and cuts energy consumption by a factor of 62. Almost any algorithm can be addressed in a similar way.
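The clock rates quoted for the FFT all follow from one relation: the required clock rate equals the cycles needed per FFT divided by the time available for each FFT. The short Python sketch below reproduces that arithmetic. Only the 32,187-cycle count, the 3.2-microsecond period and the quoted clock rates come from the text; the per-FFT cycle budgets it prints are back-calculated from those rounded clock rates, not measured figures, and the function names are hypothetical.

```python
# Figures taken from the article's FFT example.
CYCLES_PER_FFT_SOFTWARE = 32187  # straight C on a 32-bit RISC, no hardware multiplier
FFT_PERIOD_S = 3.2e-6            # one 64-point FFT roughly every 3.2 microseconds


def required_clock_hz(cycles, period_s):
    """Clock rate needed to finish `cycles` of work within `period_s` seconds."""
    return cycles / period_s


def cycle_budget(clock_hz, period_s):
    """Cycles available in one period at a given clock rate."""
    return clock_hz * period_s


f_sw = required_clock_hz(CYCLES_PER_FFT_SOFTWARE, FFT_PERIOD_S)
print(f"all-software FFT: {f_sw / 1e9:.2f} GHz required")

# Clock rates quoted in the article for each level of ISA tailoring.
tailored = [
    ("+ 32-bit multiplier", 1.6e9),
    ("superscalar + multiplier", 930e6),
    ("radix-4 butterfly instruction", 46e6),
]
for label, clock_hz in tailored:
    budget = cycle_budget(clock_hz, FFT_PERIOD_S)
    print(f"{label:30s} {clock_hz / 1e6:6.0f} MHz -> about {budget:.0f} cycles per FFT")

# The 217x figure compares the rounded 10 GHz with 46 MHz.
print(f"clock-rate reduction: {10e9 / 46e6:.0f}x")
```

Running it gives roughly 10.06GHz for the all-software case and a budget of only about 147 cycles per FFT at 46MHz, which shows how much work each tailored instruction has to absorb.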
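The radix-4 butterfly that a custom instruction would collapse into a single operation can be illustrated in software. The following is a minimal floating-point Python sketch of a textbook radix-4 decimation-in-frequency FFT, checked against a direct DFT. It is not Tensilica's instruction or any hardware design, it ignores the 16-bit fixed-point arithmetic a real PHY would use, and the function names are hypothetical. Each butterfly needs three complex twiddle-factor multiplications, which is twelve real multiplications when each complex product uses four real multiplies, consistent with the twelve multipliers mentioned in the text.

```python
import cmath


def radix4_butterfly(a, b, c, d, w1, w2, w3):
    """One radix-4 decimation-in-frequency butterfly.

    a, b, c, d are inputs spaced N/4 apart; w1, w2, w3 are the twiddle
    factors W_N^n0, W_N^2n0, W_N^3n0. The additions and the multiply by -j
    need no real multiplier in hardware; the three twiddle products are the
    complex multiplications (3 x 4 = 12 real multiplies).
    """
    t0 = a + c
    t1 = a - c
    t2 = b + d
    t3 = (b - d) * -1j  # in hardware: swap real/imaginary parts and negate one
    y0 = t0 + t2
    y1 = (t1 + t3) * w1
    y2 = (t0 - t2) * w2
    y3 = (t1 - t3) * w3
    return y0, y1, y2, y3


def fft_radix4_dif(x):
    """Recursive radix-4 DIF FFT; len(x) must be a power of 4."""
    n = len(x)
    if n < 1 or n & (n - 1) or (n.bit_length() - 1) % 2:
        raise ValueError("length must be a power of 4")
    if n == 1:
        return list(x)
    quarter = n // 4
    branches = [[], [], [], []]  # branches[r] feeds outputs X[4q + r]
    for n0 in range(quarter):
        w1 = cmath.exp(-2j * cmath.pi * n0 / n)
        ys = radix4_butterfly(x[n0], x[n0 + quarter], x[n0 + 2 * quarter],
                              x[n0 + 3 * quarter], w1, w1 ** 2, w1 ** 3)
        for r in range(4):
            branches[r].append(ys[r])
    out = [0j] * n
    for r in range(4):
        sub = fft_radix4_dif(branches[r])  # N/4-point FFT of each branch
        for q in range(quarter):
            out[4 * q + r] = sub[q]
    return out


def dft(x):
    """Direct O(N^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


signal = [complex(m % 5, -(m % 3)) for m in range(64)]  # arbitrary test input
fast = fft_radix4_dif(signal)
ref = dft(signal)
print("max |FFT - DFT| =", max(abs(f - r) for f, r in zip(fast, ref)))
```

For a 64-point transform (64 = 4³), the recursion runs three radix-4 stages of 16 butterflies each, 48 butterflies in all; that per-butterfly work is what a dedicated butterfly instruction would compress.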
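The superlinear effect of supply voltage in the dynamic-power formula P = ½CV²f can be checked numerically. This Python sketch evaluates the formula at a few operating points; the 1nF switched capacitance, the 1.0V and 1.2V supplies and the 200MHz and 400MHz clocks are illustrative assumptions, not figures from the article, and `dynamic_power` is a hypothetical helper name.

```python
def dynamic_power(c_farads, v_volts, f_hz):
    """P_dynamic = 1/2 * C * V^2 * f, returned in watts."""
    return 0.5 * c_farads * v_volts ** 2 * f_hz


# Hypothetical operating points (illustrative assumptions only).
C = 1e-9  # total switched capacitance: 1 nF
base = dynamic_power(C, 1.0, 200e6)      # 1.0 V at 200 MHz
same_v = dynamic_power(C, 1.0, 400e6)    # double the clock, same voltage
higher_v = dynamic_power(C, 1.2, 400e6)  # double the clock, voltage raised 20%

print(f"baseline:         {base:.3f} W")
print(f"2x f, same V:     {same_v / base:.2f}x baseline")
print(f"2x f, V up 20%:   {higher_v / base:.2f}x baseline")
```

Doubling the clock at a fixed voltage doubles dynamic power, but if reaching the higher clock also requires a 20% higher supply, power rises 2.88x because the voltage enters squared.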

28      09 April 2008                                                                                                                         
