					              Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN
0976 – 6464(Print), ISSN 0976 – 6472(Online) Volume 3, Issue 2, July-September (2012), © IAEME

ISSN 0976 – 6464(Print)
ISSN 0976 – 6472(Online)
Volume 3, Issue 2, July- September (2012), pp. 199-208
Journal Impact Factor (2012): 3.5930 (Calculated by GISI)                ©IAEME

  1. P. Sreenivasulu†, 2. Krishnna veni, 3. Dr. K. Srinivasa Rao*, 4. Dr. A. VinayaBabu
  1. Assistant Professor of ECE, Dr.S.G.I.E.T, MARKAPUR, Prakasam Dist., A.P, INDIA
  2. Assistant Professor of ECE
  3. Principal & Professor of ECE, T.R.R College of Eng., HYDERABAD, A.P, INDIA. * principaltrr@gmail.com
  4. Principal, JNTU College of Eng., HYDERABAD.


         In the past, the major concerns of the VLSI designer were area, performance, cost and
reliability; power consideration was mostly of only secondary importance. In recent years,
however, this has begun to change and, increasingly, power is being given comparable weight
to area and speed considerations. Increasing demand for portable electronics for computing
and communication, as well as other applications, has necessitated longer battery life, lower
weight, and lower power consumption. In order to satisfy these requirements, research
activities focusing on low power/low voltage design techniques are underway. Low power
design basically involves two concomitant tasks: power estimation and analysis and power
minimization. These tasks need to be carried out at each of the levels in the design hierarchy,
namely, the system (behavioural), architectural, logic, circuit and physical levels. In this
paper, we discuss major sources of power dissipation in VLSI systems, and various low
power design techniques on the system, architectural, logic, circuit and physical levels.

Key words - Dynamic power, leakage power, MTCMOS, Threshold voltage.

               1. INTRODUCTION

         In VLSI, hundreds of millions of transistors are integrated on one chip. Packaging and
cooling have only a limited ability to remove the excess heat. As a result, power
consumption is an important factor in achieving high performance. Low power design can be
explored at various levels, such as the system, architectural, logic, and circuit levels. At system level,
computational complexity can be reduced via sub-structure sharing, algorithmic
transformations, precomputation and adaptive computing. At architectural level,
techniques such as retiming, pipelining and parallel processing can be employed to reduce
the critical path or increase the clock period, which either reduces switching activity or allows
a lower supply voltage to be applied while still meeting the target throughput (or both),
and thus reduces the power consumption drastically. At gate level, an effective
power optimization technique is the reduction of switching activity. The total
switching activity is defined as the sum of the switching activities of all gates in the circuit,
and it determines the power dissipation of that circuit. The switching activity of a particular
gate also has a significant influence on the temperature distribution in a chip. Reducing
switching activity therefore reduces power consumption, which in turn can reduce the
dimensions of the chip.

       At the circuit level, the designer may use different design styles and circuit topologies.
The choice between static and dynamic logic depends on more criteria than just low power
performance, e.g., testability and ease of design.
Physical design sits between the gate netlist specification and the geometric (mask)
representation known as the layout. It provides the automatic layout of circuits while minimizing
some objective function subject to given constraints.

        The rest of the paper is organised as follows. Section 2 describes low power design
principles. Section 3 evaluates low power design techniques at system level while section 4
reviews at architecture level and logic level. Section 5 evaluates at circuit and physical level.
Conclusions are offered in section 6.


               2. LOW POWER DESIGN PRINCIPLES

       There are three major sources of power dissipation in a CMOS circuit:
       $P_{Total} = P_{dynamic} + P_{SC} + P_{leakage}$

       $P_{Total}$ is the total power dissipation of a CMOS circuit, $P_{dynamic}$ is the dynamic power
consumption due to the switching of transistors, $P_{SC}$ is the short circuit power dissipated
when there is a direct path from Vdd to GND, and $P_{leakage}$ is the power consumption due to
leakage currents.

A. Dynamic power consumption
       Dynamic power consumption is due to the charging and discharging of the load capacitance,
as shown in fig 1, and is given by

               $P = A C V^2 F$
where P is the power consumed, C is the switched load capacitance, V is the supply (peak) voltage,
F is the clock frequency, and A is the activity factor, i.e., the average number of transitions per
clock cycle at the inverter's output, also known as the switching activity.

                           Fig 1. CMOS inverter for power analysis
        If a capacitance C is charged and discharged by a clock signal of frequency F and
peak voltage V, then the charge moved per cycle is $CV$ and the charge moved per second
is $CVF$. Since the charge packet is delivered at voltage V, the energy dissipated per cycle
is $CV^2$ and the energy dissipated per second, i.e. the power, is $CV^2F$. The data power for a
clocked flip-flop, which can toggle at most once per cycle, will be $0.5\,CV^2F$. When
capacitances are clock gated or when flip-flops do not
toggle every cycle, their power consumption will be lower. Hence, a constant called the
activity factor ($0 \le A \le 1$) is used to model the average switching activity in the circuit.
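
As a rough numerical illustration of the relation above, the following Python sketch evaluates P = A C V^2 F for three activity factors. The function name and the example values (10 pF, 1.2 V, 500 MHz) are assumptions chosen only for illustration; they are not measurements from this paper.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """Return dynamic power in watts: P = A * C * V^2 * F."""
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity factor must lie in [0, 1]")
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical values, for illustration only.
C = 10e-12   # switched capacitance: 10 pF
V = 1.2      # supply voltage: 1.2 V
F = 500e6    # clock frequency: 500 MHz

p_clock = dynamic_power(1.0, C, V, F)   # node charged and discharged every cycle
p_data = dynamic_power(0.5, C, V, F)    # node that toggles at most once per cycle
p_gated = dynamic_power(0.1, C, V, F)   # mostly idle node, e.g. behind a clock gate
```

Because power is linear in A but quadratic in V, halving the supply voltage saves more than halving the activity, which is why the later sections concentrate on transformations that permit a lower supply voltage.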

B. Short circuit power
        Short circuit power is caused by direct current from Vdd to GND when both transistors are
on. The finite slope of the input signal creates a direct current path between Vdd and GND for a
short period of time during switching, when both the NMOS and PMOS transistors are conducting.
It is given as

$P_{SC} = t_{SC} \, V_{dd} \, I_{peak} \, f_{0 \to 1}$

$t_{SC} = (t_r + t_f)/2$

where $t_r$ is the rise time, $t_f$ is the fall time and $f_{0 \to 1}$ is the frequency of 0 to 1 transitions.
$I_{peak}$ is determined by the saturation current of the P and N transistors, which depends on their
sizes, the process technology and the temperature.
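
The short circuit expression can be evaluated in the same way. The sketch below is a direct transcription of the two formulas above; the function name and the numbers (100 ps rise time, 60 ps fall time, 50 uA peak current, 100 MHz transition frequency) are hypothetical and serve only to show the orders of magnitude involved.

```python
def short_circuit_power(t_rise_s, t_fall_s, vdd_v, i_peak_a, f_switch_hz):
    """P_SC = t_SC * Vdd * I_peak * f, with t_SC = (t_r + t_f) / 2."""
    t_sc = (t_rise_s + t_fall_s) / 2.0
    return t_sc * vdd_v * i_peak_a * f_switch_hz


# Hypothetical values, for illustration only.
p_sc = short_circuit_power(
    t_rise_s=100e-12,    # 100 ps
    t_fall_s=60e-12,     # 60 ps
    vdd_v=1.2,
    i_peak_a=50e-6,      # 50 uA
    f_switch_hz=100e6,   # 100 MHz
)

# Doubling both edge times doubles the short circuit power.
p_sc_slow_edges = short_circuit_power(200e-12, 120e-12, 1.2, 50e-6, 100e6)
```

The linear dependence on the edge times is the reason sharp input transitions are preferred when short circuit power matters.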

C. Leakage power consumption
The leakage power can be expressed as:
$P_{leakage} = I_{leakage} V_{dd}$

where $I_{leakage}$ is the total leakage current in a CMOS circuit. $I_{leakage}$ is caused by six short
channel leakage mechanisms: reverse bias pn junction leakage, subthreshold leakage,
oxide tunnelling current, gate current due to hot carrier injection, gate induced drain leakage,
and channel punch through current.
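
To tie the three components together, the following sketch computes the leakage term and then the share of each component in the total. All names and values are assumptions made for illustration; a real breakdown depends strongly on process technology and workload.

```python
def leakage_power(i_leakage_a, vdd_v):
    """P_leakage = I_leakage * Vdd, in watts."""
    return i_leakage_a * vdd_v


def power_breakdown(p_dynamic_w, p_sc_w, p_leakage_w):
    """Return total power and each component's share of it."""
    total = p_dynamic_w + p_sc_w + p_leakage_w
    shares = {
        "dynamic": p_dynamic_w / total,
        "short_circuit": p_sc_w / total,
        "leakage": p_leakage_w / total,
    }
    return total, shares


# Hypothetical component values, for illustration only.
p_leak = leakage_power(i_leakage_a=0.5e-3, vdd_v=1.2)   # 0.6 mW
total_w, shares = power_breakdown(3.6e-3, 0.4e-3, p_leak)
```

A breakdown of this kind indicates which techniques are worth applying: activity and voltage reduction target the dynamic share, while supply shut down targets the leakage share.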


               3. SYSTEM LEVEL DESIGN

       At the system or behavioural level of design, selecting the right algorithm leads to a large
reduction in power consumption.

                           Fig 2. System level low power design flow

       System level power management is a design methodology that reconfigures an electronic
system to provide the requested services and performance with a minimum number of active
components or a minimum load on such components.

        The system specification defines system requirements, and is expressed at a very high
level of abstraction. This specification is usually written in a standard language, such as
C/C++ or SystemC, as shown in fig 2. Using this system specification, algorithms that realize
system functionality are developed and optimized, generally in those same standard
languages. The algorithmic description consists of an executable specification, or functional
description. This executable specification captures the system function, and enables the
verification thereof. It can be written as a behavioral description, which can be refined into a
bit-accurate, pure functional design description.

        The clocked components in a system account for a large share of the
total chip power consumption. Clock gating is an effective way to reduce the switching
power. Supply shut down of idle components can reduce both switching power and leakage
power. At this level of abstraction, it is very difficult to calculate the power dissipation
because the designer does not yet have a clear picture of the architecture or the complete
hardware of the system.


               4. ARCHITECTURE AND LOGIC LEVEL DESIGN

        Decisions made at a high level (architecture or system) have a much larger
impact on power than those made at a lower level (e.g. gate or circuit level). At the architectural
level, techniques such as retiming, pipelining, and parallel processing can be used to
reduce the critical path or increase the clock period.

a. Retiming:
        Retiming is a transformation technique used to change the locations of delay elements
in a circuit without affecting the input/output characteristics of the circuit.

                        Fig 3. (a) Original Circuit, (b) Retimed circuit

        Consider a single output function $f(x_1, \ldots, x_n)$ whose Boolean network is shown in
Figure 3(a). Let the node that produces the output function be G, and assume node G has m
inputs $y_1, \ldots, y_m$. Let the controlling value of node G be c, which means the output is
determined whenever one of the inputs has the value c on it. For instance, the controlling
value of AND and NAND gates is 0, while for OR and NOR gates the controlling value is
1. When a node has a controlling value on any one of its inputs, the logic signals on the other
inputs are not observable and thus can be set to any value. Since our goal is to achieve lower
power consumption, a feasible approach is to prevent those unobservable signals from changing
so that switching activity is reduced.
        In fig 3, if $y_1^t = c$, the signals on $y_2^t$ to $y_m^t$ are irrelevant and thus all the switching
activity in logic block D can be stopped to reduce power in cycle t. An easy way to
accomplish this goal is to disable register R2 from loading new values at cycle t. Since there
are no signal transitions in R2, the signals in logic block D will not change. A retimed circuit
that achieves this goal is shown in Figure 3(b). We move cone C such that it produces signal
$y_1$ a cycle earlier, and the result is stored in a flip-flop (FF), which is applied to node G
in the next cycle. Assume $y_1^{t-1} = c$, which is the controlling value of node G. In this case, $f^t$
will be solely decided by $y_1^t$, which implies that we do not have to load register R2 in cycle t.
Hence, in Figure 3(b), register R2 will be load enabled (LE) only if $y_1^{t-1} \ne c$.
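
The effect of the load enable can be sketched with a small behavioural simulation. This is a simplified model and not the circuit of Figure 3: node G is assumed to be an AND gate (controlling value 0), logic block D is replaced by a stand-in parity function, R2 is taken to be 8 bits wide, and the value of y1 is assumed to be available early enough to drive the load enable, which is what the retimed cone C and the flip-flop provide. All names are hypothetical.

```python
import random

CONTROLLING_VALUE = 0  # node G is modeled as an AND gate


def block_d(word):
    """Stand-in for logic block D: parity of the register contents."""
    return bin(word).count("1") % 2


def simulate(y1_stream, data_stream, gated):
    """Return (outputs of G, number of bit flips in register R2)."""
    r2 = 0
    toggles = 0
    outputs = []
    for y1, word in zip(y1_stream, data_stream):
        # With gating, R2 is loaded only when y1 is not the controlling value.
        load_enable = (not gated) or (y1 != CONTROLLING_VALUE)
        if load_enable:
            toggles += bin(r2 ^ word).count("1")
            r2 = word
        outputs.append(y1 & block_d(r2))
    return outputs, toggles


rng = random.Random(0)
y1_stream = [rng.randint(0, 1) for _ in range(1000)]
data_stream = [rng.getrandbits(8) for _ in range(1000)]

out_plain, toggles_plain = simulate(y1_stream, data_stream, gated=False)
out_gated, toggles_gated = simulate(y1_stream, data_stream, gated=True)
```

In this model the output sequence of G is identical with and without gating, while the gated register can never toggle more often than the ungated one, since skipping intermediate loads cannot increase the total Hamming distance travelled.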

b. Pipelining:
        Pipelining is another technique to reduce power consumption. An example of
pipelining for low power is shown in Figure 4.

     (a) Reference:              (b) Pipelined:
     Capacitance = C             Capacitance ~ C
     Frequency   = f             Frequency   = f
     Voltage     = V             Voltage     ~ V/N
     Power       = P ∝ CV^2f     Power       ~ P/N^2

Fig 4. Voltage scaling and pipelining for low power

        Here power reduction is achieved by inserting pipeline registers, resulting
in an N-stage pipelined version of a reference processor A (assuming processor A can be
pipelined to this extent). In this implementation, maintaining throughput requires that we
sustain the clocking frequency, f. Ignoring the overhead of the pipeline registers, the
capacitance, C, also remains constant. The advantage of this configuration is derived from the
greatly reduced computational requirements between pipeline registers. Rather than performing
the entire computation, A, within one clock cycle, only 1/Nth of A need be calculated per clock
cycle. Assuming that, to first order, delays are roughly inversely proportional to the supply
voltage, this allows a factor of N reduction in supply voltage and, considering the constant C
and f terms, the dynamic power consumption is reduced by a factor of $N^2$.
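
The reduction can be written out as a one-line derivation. It is a first-order estimate only: it assumes the pipeline register overhead is negligible and that the supply voltage can actually be scaled to V/N, which neglects the threshold voltage.

```latex
P_{\text{pipelined}}
  = C \left(\frac{V}{N}\right)^{2} f
  = \frac{C V^{2} f}{N^{2}}
  = \frac{P}{N^{2}},
\qquad \text{e.g. } N = 4 \;\Rightarrow\; P_{\text{pipelined}} = \frac{P}{16}.
```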

c. Parallel Processing:
       As a quantitative example, consider the use of parallelism to perform some complex
operation, A. The registers supplying operands and storing results for A are clocked at a
frequency f. Further assume that algorithmic and data dependency constraints do not prevent
concurrency in the calculations performed by A. When the computation of A is parallelized,
the configuration of Figure 5 results. The hardware comprising block A has been duplicated N
times, resulting in N identical processors. Since there are now N processors, a throughput equal
to that of the sequential processor, A, can be maintained with a clocking frequency N times lower
than that of A. That is, although each block will produce a result only 1/Nth as frequently as
processor A, there are N such processors producing results. Consequently, identical throughput is
maintained.

  (a) Reference:              (b) Parallel:
  Capacitance = C             Capacitance ~ NC
  Frequency   = f             Frequency   = f/N
  Voltage     = V             Voltage     ~ V/N (neglecting Vt)
  Power       = P ∝ CV^2f     Power       ~ P/N^2

Fig 5. Voltage scaling and parallelism

        The key to this architecture's utility as a power saving configuration lies in this factor
of N reduction in clocking frequency. In particular, with a clocking frequency of f/N, each
individual processor can run N times slower. Since, to first order, delays are roughly
inversely proportional to the supply voltage, this corresponds to a possible factor of N
reduction in supply voltage. Examining the power consumption relative to the single processor
configuration, we see that capacitances have increased by a factor of N (due to hardware
duplication), while frequency and supply voltage have been reduced by the same factor. Thus,
since $P \propto CV^2f$, power consumption is reduced by the square of the concurrency factor, N.
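
The scaling argument can be checked numerically with a few lines of Python. The sketch below simply multiplies the three scale factors from Figure 5; the function names are hypothetical, and the model inherits the figure's assumption that the threshold voltage is neglected, so real savings will be smaller.

```python
def relative_power(cap_scale, volt_scale, freq_scale):
    """Power relative to the reference, using P proportional to C * V^2 * f."""
    return cap_scale * volt_scale ** 2 * freq_scale


def parallel_power_ratio(n):
    """N copies of the hardware, each clocked at f/N with supply V/N."""
    return relative_power(cap_scale=n, volt_scale=1.0 / n, freq_scale=1.0 / n)


ratios = {n: parallel_power_ratio(n) for n in (1, 2, 4, 8)}
```

The factor of N gained in capacitance is cancelled by the factor of N lost in frequency, so the whole saving comes from the quadratic voltage term.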

Gate Level Design

        The philosophy that high-level decisions have the largest impact on power also applies
to the gate level. Still, as at the circuit level, there are gate-level techniques that can be
applied successfully to reduce power consumption. These techniques again reflect the theme of
trading performance and area for power. In this section we discuss a number of gate-level
techniques and give some quantitative indication of their impact on power. In particular, this
section presents techniques for technology mapping, glitching and activity reduction.

A. Technology Decomposition and Mapping:
        Technology decomposition and mapping refers to the process of transforming a gate-level
boolean description of a logic network into a CMOS circuit. For a given gate-level network
there may be many possible circuit-level implementations. For instance, a three-input NAND
can be implemented as a single complex CMOS gate or as a cascade of simpler two-input
gates. Each mapping may result in different signal activities, as well as physical capacitances.
The concept of technology mapping for low-power is to first decompose the boolean network
such that switching activity is minimized, and then to hide any high activity nodes inside
complex CMOS gates. In this way, rapidly switching signals are mapped to the low
capacitance internal nodes, thereby reducing power consumption. Making a gate too
complex, however, can slow the circuit, resulting in a trade-off of performance for power.
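
To illustrate how decomposition affects switching activity, the sketch below compares the orderings of a chain of two-input AND gates implementing a four-input AND. It rests on simplifying assumptions that are not part of the text above: the inputs are statistically independent, successive cycles are independent (so a node that is 1 with probability p switches with probability 2p(1-p)), and gate delays are ignored. The signal probabilities are hypothetical.

```python
from itertools import permutations


def activity(p_one):
    """Transition probability of a node that is 1 with probability p_one,
    assuming successive cycles are independent."""
    return 2.0 * p_one * (1.0 - p_one)


def chain_internal_activity(input_probs):
    """Sum of switching activity on the internal nodes of a chain of
    2-input AND gates fed with the inputs in the given order."""
    total = 0.0
    p_node = input_probs[0]
    # The last gate drives the primary output, which is the same for every order.
    for p_in in input_probs[1:-1]:
        p_node *= p_in
        total += activity(p_node)
    return total


probs = (0.9, 0.8, 0.2, 0.1)   # hypothetical signal probabilities
best = min(permutations(probs), key=chain_internal_activity)
worst = max(permutations(probs), key=chain_internal_activity)
```

Under this model the best ordering combines the two inputs that are rarely 1 first, so the internal nodes stay close to 0 and switch seldom; the worst ordering does the opposite.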

B. Glitch Reduction:
       Other gate-level activity reduction techniques focus on avoiding the wasted transitions
associated with glitching. Figure 6 shows two implementations of the same logic
function. One implementation employs a balanced tree structure, while the other uses a
cascaded gate structure.

                      a.Balanced tree structure      b.Cascaded structure

                      Fig 6. Cascaded and balanced tree gate structures

        If we assume equal input arrival times and gate delays, we find that the cascaded
structure undergoes many more transitions than the tree structure before settling at its
steady-state value. In particular, the arrival of the inputs may trigger a transition at the
output of each of the gates. These output transitions may in turn trigger additional transitions
for the gates within their fan-out. This reasoning leads to an upper bound on glitching that is
$O(N^2)$, where N is the depth of the logic network. In contrast, the path delays in the tree
structure are all balanced, and therefore each node makes at most a single transition and no
power is wasted. This concept can be extended to derive optimum tree structures for the case of
unequal arrival times as well. Some studies have suggested that eliminating glitching in static
circuits could reduce power consumption by as much as 15-20%.
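
The difference between the two structures can be reproduced with a small unit-delay simulation. The sketch below is an illustration under stated assumptions, not a model of Figure 6: every gate is a two-input XOR (chosen because every input change propagates), every gate has a delay of one time step, all nodes start at 0, and all inputs change at the same instant. The function and node names are hypothetical.

```python
def xor_chain(n):
    """Cascaded structure: each gate combines the previous result with one input."""
    gates = [("c1", "x0", "x1")]
    for k in range(2, n):
        gates.append((f"c{k}", f"c{k - 1}", f"x{k}"))
    return gates


def xor_tree(n):
    """Balanced tree structure; n must be a power of two."""
    gates, level, count = [], [f"x{i}" for i in range(n)], 0
    while len(level) > 1:
        next_level = []
        for a, b in zip(level[0::2], level[1::2]):
            count += 1
            gates.append((f"t{count}", a, b))
            next_level.append(f"t{count}")
        level = next_level
    return gates


def count_transitions(gates, new_inputs):
    """Apply new_inputs at time 0 and count gate output changes until stable."""
    # All inputs and outputs start at 0, which is a stable state for XOR gates.
    values = {name: 0 for gate in gates for name in gate}
    values.update(new_inputs)
    transitions = 0
    while True:
        nxt = {out: values[a] ^ values[b] for out, a, b in gates}
        changed = [out for out, v in nxt.items() if v != values[out]]
        if not changed:
            return transitions, values
        transitions += len(changed)
        values.update(nxt)


pattern = [1, 0, 1, 1, 1, 1, 1, 1]
new_inputs = {f"x{i}": bit for i, bit in enumerate(pattern)}
chain_count, chain_values = count_transitions(xor_chain(8), new_inputs)
tree_count, tree_values = count_transitions(xor_tree(8), new_inputs)
```

For this input pattern the seven-gate chain records 28 output transitions, that is 1 + 2 + ... + 7, while the seven-gate tree records 3, and both settle to the same output value. The quadratic growth of the chain count with depth matches the bound quoted above.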


               5. CIRCUIT AND PHYSICAL LEVEL DESIGN

        At the circuit level, the designer may use different design styles and circuit topologies.
The choice between static and dynamic logic depends on more criteria than just low
power. This section considers five topics relating to low-power circuit design: dynamic
logic, pass-transistor logic, asynchronous logic, transistor sizing, and design style.

A. Dynamic Logic
        In IC design, dynamic logic (sometimes called clocked logic) is a design methodology
for combinational logic circuits, particularly those implemented in MOS technology. It is
distinguished from so-called static logic by exploiting temporary storage of information in
stray and gate capacitances, and by using a clock signal in its implementation of combinational
logic circuits. Dynamic logic circuits are usually faster than their static counterparts
and require less surface area, but they are more difficult to design and have higher power
dissipation. The usual use of a clock signal is to synchronize transitions in sequential logic
circuits; most implementations of combinational logic do not need a clock signal at all.

                             (a)Static                 ( b) Dynamic

                   Fig 7. Static and dynamic implementations of F=(A+B)C

        Dynamic design styles often have significantly reduced device counts. Since the logic
evaluation function is fulfilled by the NMOS tree alone, the PMOS tree can be replaced by a
single precharge device. These reduced device counts result in a corresponding decrease in
capacitive loading, which can lead to power savings. An example is shown in fig 7.

B. Pass transistor logic
         In electronics, pass transistor logic describes several logic families used in the design
of integrated circuits. It reduces the number of transistors used to make different logic gates
by eliminating redundant transistors. Transistors are used as switches to pass logic
levels between nodes of a circuit, instead of as switches connected directly to the supply
voltages. This reduces the number of active devices, but has the disadvantage that output
levels can be no higher than the input level. Each transistor in series has a lower voltage at its
output than at its input. If several devices are chained in series in a logic path, a
conventionally constructed gate may be required to restore the signal voltage to the full
value. By contrast, conventional CMOS logic always switches transistors to the power supply
rails, so logic voltage levels in a sequential chain do not decrease. Since there is less isolation
between input signals and outputs, designers must take care to assess the effects of
unintentional paths within the circuit. For proper operation, design rules restrict the
arrangement of circuits so that sneak paths, charge sharing, and slow switching can be
avoided. Simulation of circuits may be required to ensure adequate performance.
        Complementary pass-transistor logic (CPL), or "differential pass transistor logic", refers to
a logic family which is designed for certain advantages. It is common to use this logic family
for multiplexers and latches. CPL uses series transistors to select between possible inverted
output values of the logic, the output of which drives an inverter to generate the non-inverted
output signal. Inverted and non-inverted inputs are needed to drive the gates of the pass
transistors.

C. Asynchronous logic
        As transistor switching speed improves, synchronizing a global clock increasingly
degrades system performance. Therefore, self-timed asynchronous logic becomes potentially
faster than synchronous logic. To be faster, however, it must exploit the techniques used in fast
synchronous designs, including redundant logic, inverting logic, transistor size optimization,
dynamic logic, and phase alignment. Most of these techniques can be applied equally well to
asynchronous logic (indeed, phase alignment is easier), but combining dynamic and
asynchronous logic is more difficult.

D. Transistor sizing

        Regardless of the circuit style employed, the issue of transistor sizing for low power
arises. The primary trade-off involved is between performance and cost - where cost is
measured by area and power. Transistors with larger gate widths provide more current drive
than smaller transistors. Unfortunately, they also contribute more device capacitance to the
circuit and, consequently, result in higher power dissipation. Moreover, larger devices
experience more severe short-circuit currents, which should be avoided whenever possible.

E. Design style

       Another decision which can have a large impact on the overall chip power
consumption is the selection of design style: e.g. full custom, gate array, standard cell,
clocked or non-clocked. Full custom design offers the best possibility of minimizing power
consumption, but it is a costly alternative in terms of design time and can rarely be
employed exclusively as a design strategy. Gate arrays offer one alternative for reducing
design cycles at the expense of area, power, and performance. Standard cell synthesis is
another commonly employed strategy for reducing design time. Current standard cell libraries
and tools, however, offer little hope of achieving low power operation. Finally, a non-clocked
design style consumes less power than a clocked design style.

Physical level Design

        In physical design, various optimization techniques are used to partition, place, resize
and route gates, depending on the target design style (full-custom, standard-cell,
gate arrays, FPGAs), the packaging technology (printed circuit boards, multi-chip modules,
wafer-scale integration) and the objective function (area, delay, power, reliability). At this
level, power may be reduced by using appropriate net weights during netlist partitioning, floor
planning, placement and routing. Individual transistors may be sized down to reduce the
power dissipation along the non-critical paths in a circuit.


               6. CONCLUSION

        In this paper we have reviewed the sources of power dissipation in CMOS circuits, namely
switching, short circuit and leakage power, and discussed low power design techniques at the
system, architectural, logic, circuit and physical levels.


