# Automating Transformations from Floating Point to Fixed Point for Implementing Digital Signal Processing Algorithms

```
Automating Transformations from
Floating Point to Fixed Point for Implementing
Digital Signal Processing Algorithms

Prof. Brian L. Evans
Embedded Signal Processing Laboratory
Dept. of Electrical and Computer Engineering
The University of Texas at Austin

Based on work by PhD student Kyungtae Han (now at Intel Research Labs)

July 4, 2006

Outline
• Introduction
• Background
• Optimize fixed-point wordlengths
• Reduce power consumption in arithmetic
• Automate transformations of systems
• Conclusion

Introduction

Implementing Digital Signal Processing Algorithms
[Figure: design flow from digital signal processing algorithms to hardware, with hardware price and power consumption each rated on a low (L) to high (H) scale]
  Floating-point program -> Floating-point processor
  Code conversion -> Fixed point (uniform wordlength) -> Fixed-point processor
  Wordlength optimization -> Fixed point (optimized wordlength) -> Fixed-point ASIC
ASIC: Application-specific integrated circuit

Transformations to Fixed Point
  - Lower hardware complexity
  - Lower power consumption
  - Faster speed in processing
  - Introduces distortion due to quantization error
  - Search for optimum wordlengths by trial and error is time-consuming
• Research goals
  - Automate transformations to fixed point
  - Control distortion vs. complexity tradeoffs
[Figure: Code conversion -> Wordlength optimization -> Fixed point (optimized wordlength)]

Background

Fixed-Point Data Format
• Integer wordlength (IWL)
  - Number of bits assigned to integer representation
  - Includes sign bit
• Fractional wordlength (FWL)
  - Number of bits assigned to fraction
• Wordlength: WL = IWL + FWL
• Example: π = 3.14159… (base 10) [floating point]
  - 3.140625 (base 10) = 011.001001 (base 2) [WL = 9; IWL = 3; FWL = 6]
  - 3.141479492 (base 10) = 011.0010010000111 (base 2) [WL = 16; IWL = 3; FWL = 13]
[Figure: wordlength layout S X X X X X with sign bit S, integer wordlength, binary point, and fractional wordlength; SystemC format, www.systemc.org]
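Worked example (a sketch, assuming quantization by truncation, which reproduces the two values above):
  WL = 9,  FWL = 6:   floor(π x 2^6) / 2^6   = 201 / 64     = 3.140625
  WL = 16, FWL = 13:  floor(π x 2^13) / 2^13 = 25735 / 8192 = 3.1414794921875
  The truncation error is below 2^-FWL, so each extra fractional bit halves the worst-case error.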

• Different wordlengths have different application
[Figure: application distortion d(w) vs. implementation complexity c(w), showing the feasible region and the optimal curve]
Vector of wordlengths: $w = \{w_0, w_1, w_2, \ldots, w_{N-1}\}$
• Minimize implementation cost
• Minimize application distortion
Notation:
  c(w)              Implementation cost function
  d(w)              Application distortion function
  Cmax              Constant for maximum implementation cost
  Dmax              Constant for maximum application distortion
  $\underline{w}$   Wordlength lower bounds
  $\overline{w}$    Wordlength upper bounds

Wordlength Optimization
• Single objective optimization
    $\min_{w \in I^n} \; [\alpha_c c(w) + \alpha_d d(w)]$
• Multiple objective optimization
    $\min_{w \in I^n} \; [c(w), d(w)]$
• Both subject to
    $d(w) \le D_{max}$
    $c(w) \le C_{max}$
    $\underline{w} \le w \le \overline{w}$

Proposed work fixes integer wordlengths
and searches for fractional wordlengths

Genetic Algorithm
• Evolutionary algorithm
  - Inspired by Holland 1975
  - Mimic processes of plant and animal evolution
  - Find optimum of a complex function
[Figure: genetic algorithm cycle with stages New gene pool, Function evaluation, Genes w/ measure, Selection, Parental genes, Mating, Child genes, Mutation]
[Greg Rohling, Ph.D. Defense, Georgia Tech, 2004]

Pareto Optimality
• Pareto optimality: “best that could be [...] one group” [Schick, 1970]
• Pareto optimal set is set of nondominated solutions
  - E is dominated by C as all objectives for C are less than corresponding objectives for E
  - Solutions A, B, C, D are nondominated (not dominated by any solution)
• Pareto front is boundary (tradeoff curve) that connects Pareto optimal set solutions
[Figure: Pareto front in the Objective 1 vs. Objective 2 plane, with solutions A through I marked as nondominated or dominated]

Optimize Fixed-Point Wordlengths

Search for Optimum Wordlength
• Exhaustive search impractical for many variables
• Gradient-based search
  - Utilizes gradient information to determine next candidates
  - Complexity measure (CM) [Sung & Kum, 1995]
  - Distortion measure (DM) [Han et al., 2001]
  - Complexity-and-distortion measure (CDM) [Han & Evans, 2004] (covered next)
• Guided random search
  - Genetic algorithm for single objective [Leban & Tasic, 2000]
  - Multiple objective genetic algorithm [Han, Olson & Evans, 2006] (covered next)

Complexity-and-Distortion Measure
• Weighted combination of measures
    $f_{cd}(w) = \alpha_c c(w) + \alpha_d d(w)$
    where $\alpha_c + \alpha_d = 1$, $0 \le \alpha_c \le 1$, $0 \le \alpha_d \le 1$
• Single objective function
    $\min_{w \in I^n} f_{cd}(w)$
    subject to $d(w) \le D_{max}$, $c(w) \le C_{max}$, $\underline{w} \le w \le \overline{w}$
• Gradient-based search
  - Initialization
  - Iterative greedy search based on complexity and distortion
Notation:
  c(w)   Complexity function
  d(w)   Distortion function
  Dmax   Constant for maximum distortion
  Cmax   Constant for maximum complexity

Case Study I: Filter Design
• Infinite impulse response (IIR) filter
  - Complexity measure: Area model of field-programmable gate array (FPGA) [Constantinides, Cheung & Luk 2003]
  - Distortion measure: Root mean square (RMS) error
  - Seven fixed-point variables (indicated by slashes)
[Figure: IIR filter block diagram with input x[n], output y[n], one delay element, and coefficients b0, b1, -a1]

• CDM could lead to lower complexity and lower
number of simulations compared to DM and CM

  Search Method   Gradient Measure   Number of System Simulations   Complexity Estimate (LUT)   Distortion (RMS)*
  Complete        -                  16^7 **                        -                           -

* Maximum distortion measured by root mean square (RMS) error is 0.1
** 16^7 = 268,435,456 (8.5 years, if 1 second per 1 simulation)
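Worked check of the exhaustive-search count (a sketch, assuming 16 candidate wordlengths for each of the seven variables, which is what the figure 16^7 implies):
  16^7 = 268,435,456 simulations
  268,435,456 s / (365 x 24 x 3600 s per year) ≈ 8.5 years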

Case Study I: Genetic Algorithm
• Search Pareto optimal set (nondominated)
• Handles multiple objectives: Error and Area
[Figure: Pareto front, Error (RMS) vs. Area (LUTs), shown at three generations]
  Generation   Simulations   Nondominated   Dominated
  100th        9,000         67/90          23/90
  250th        22,500        76/90          14/90
  500th        45,000        90/90          0/90
* Population for one generation: 90
LUT: Lookup table

Case Study I: Comparison
• Gradient-based search (GS) results vs. GA results
[Figure: Error (RMS) vs. Area (LUTs), with DM, CDM, and CM solutions overlaid on the GA population]
  - 50th generation (4,500 simulations): 35/90 nondominated, 55/90 dominated
  - 500th generation (45,000 simulations): 90/90 nondominated
* Required RMSmax values for gradient-based search are Dmax = {0.12, 0.1, 0.08}

• GS methods can get stuck in a local minimum
• GS methods reduce running time (CDM: 145 simulations)

Case Study II: Communication System
• Simple binary phase shift keying (BPSK) system
  - Complexity measure: Area model of field-programmable gate array (FPGA) [Constantinides, Cheung, and Luk 2003]
  - Distortion measure: Bit error rate (BER)
  - Four fixed-point variables (indicated by slashes)
[Figure: BPSK system block diagram with Source data (1 or -1), Carrier, AWGN, Integration & Dump, Decision, and BER blocks]

• CDM could lead to lower complexity and lower
number of simulations compared to DM and CM

  Search Method   Gradient Measure   Number of System Simulations   Complexity Estimate (LUT)   Distortion (BER)*
  Complete        -                  65536                          -                           -

* Maximum distortion measured by bit error rate (BER) is 0.1

Case Study II: Genetic Algorithm
• Search Pareto optimal set
• Handles multiple objectives
[Figure: Pareto front, Error (Bit Error Rate), at the 50th generation (4,500 simulations), 100th generation (9,000 simulations), and 200th generation (18,000 simulations); preliminary results]
For comparison:
  Measure   BER    LUT
  DM        0.83   40.65
  CDM       0.85   43.95
  CM        0.81   41.95
* Population for one generation: 90
LUT: Lookup table

Comparison of Proposed Methods

                          Gradient-based search   Genetic algorithm
  Type of Solution        One point               Family of points
  Execution Time          Short                   Long
  Amount of Computation   Low                     High
  Parallelism             Low                     High

Reduce Power Consumption in Arithmetic

Lower Power Consumption in DSP
• Minimize power dissipation due to limited battery power and cooling system
• Multipliers often a major source of dynamic power consumption in typical DSP applications
  - Multi-precision multipliers select smaller multipliers (8, 16 or 24 bits) to reduce power consumption
  - Wordlength reduction to select any word size [Han, Evans & Swartzlander 2004] (covered next)

• In general, what reductions in power are possible in
software when hardware has fixed wordlengths?

Wordlength Reduction in Multiplication
• Input data wordlength reduction
  - Fewer bits can be enough to represent the data, e.g. π x π ≈ 9
• Truncation
• Signed right shift
  - Move toward the least significant bit (LSB)
  - Sign bit extended for arithmetic right shift
Example (16-bit operands, sign bit leftmost):
  (a) Original multiplication:           0001 0010 0011 0100  x  1101 1100 1010 1001
  (b) Reduction by truncation:           0001 0010 0000 0000  x  1101 1100 0000 0000
  (c) Reduction by signed right shift:   0000 0000 0001 0010  x  1111 1111 1101 1100

Power Reduction via Wordlength Reduction

• Power consumption
  - Switching power consumption: $P_{switching} = \alpha C_L V_{dd}^2 f_{clk}$
  - Static power consumption
• Switching power consumption
  - Switching activity parameter, α
  - Reduce α by wordlength reduction
Notation:
  C_L     Load capacitance
  V_dd    Operating voltage
  f_clk   Operating frequency

Relationship between reduced wordlength and
switching parameter α in power consumption?

Analytical Method
[Figure: L-bit operand with sign bit S, split into M upper bits and N lower bits, shown with no reduction, with N bits truncated, and with an N-bit signed right shift (sign bit extended)]
  Input                      Switching expectation
  Full length                L/2
  Truncate N bits            M/2
  N-bit signed right shift   L/2
Wordlength (L) = 16
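Worked example (a sketch using the expectations above; the split M = N = 8 is chosen only for illustration, and the data are assumed independent and uniformly random):
  Full length:                L/2 = 16/2 = 8 expected bit switches
  Truncate N = 8 bits:        M/2 = 8/2  = 4 expected bit switches
  8-bit signed right shift:   L/2 = 16/2 = 8 expected bit switches
  Under this model truncation halves the expected input switching, while a signed right shift leaves it unchanged because the extended sign bits toggle whenever the sign changes.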

Dynamic Power Consumption
for Wallace Multiplier (1 MHz)
[Figure: dynamic power of a 16-bit x 16-bit multiplier (simulated on Xilinx XC3S200-5FT256 FPGA); curves: truncate 1st arg, truncate 2nd arg (recode, nonrecode); annotation: reduction (56%)]

Wallace multiplier used in TI 320C64 DSP

Modified Booth Multiplier (1 MHz)
[Figure: dynamic power of a 16-bit x 16-bit multiplier (simulated on Xilinx XC3S200-5FT256 FPGA); curves: truncate 1st arg, truncate 2nd arg (recode, nonrecode); annotations: sensitive (13%), reduction (31%)]

Swapping could have benefit
Radix-4 modified Booth multiplier used in TI 320C62 DSP

Comparison of Proposed Methods

• Truncation to 8 bits reduces est. power consumption by
56% in Wallace and 31% in Booth 16-bit multipliers
• Signed right shift has no est. power reduction in
Wallace multiplier (for any shift) and 25% reduction in
Booth (for 8-bit shift) multiplier
• Operand swapping reduces power consumption for
Booth but has negligible savings for Wallace multiplier
• Power consumption in tree-based multiplier
  - Highly dependent on input data
  - Simulation matches analysis

Automatic Transformations of Systems

Automating Transformations from
Floating Point to Fixed Point
• Existing fixed-point tools
  - Support fixed-point simulation
  - Convert floating-point code to raw fixed-point code
  - Manually find optimum wordlength by trial and error
• Fixed-point tools
  - SNU gFix, Autoscaler
  - CoWare SPW HDS
  - Synopsys CoCentric
  - MATLAB Fixed-point toolbox
  - MATLAB Fixed-point blockset
  - AccelChip DSP synthesis
  - Catalytic RMS, MCS
• Automating transformations
  - Fully automate conversion and wordlength optimization
[Figure: Floating-point program -> Code conversion -> Wordlength optimization -> Wordlength-optimized fixed-point program]

Automatic Transformation Flow
• Code generation
  - Parse floating-point program
  - Generate raw fixed-point program and auxiliary programs
• Range estimation
  - Estimate range to avoid overflow (Analytical/Simulation)
  - Determine integer wordlength (IWL)
• Wordlength optimization
  - Optimize wordlength according to given input and error specification (Analytical/Simulation)
  - Determine fractional wordlength (FWL)
[Figure: Code generation -> Range estimation -> Wordlength optimization]

Automating Transformation Environment
for Wordlength Optimization
[Figure: block diagram with Input data, Optimum wordlength, Top program, Floating-point program, Fixed-point program, Evaluation program (objectives), Search or genetic algorithm, Range estimation, Complexity estimation, and Error estimation]

• Given floating-point program and options,
auxiliary programs are automatically generated
• Given input data, optimum wordlength is searched

Demo of Released Software

Conclusion
• Search for optimum wordlength
  - Gradient-based search reduces execution time, but solutions could be trapped in a local optimum
  - Genetic algorithm can find distortion vs. complexity tradeoff curve, but it requires longer execution time
• Reduce power consumption by wordlength reduction
of multiplicands
• Automate transformations from floating-point
programs to fixed-point programs
• Freely distributable software release available at
http://www.ece.utexas.edu/~bevans/projects/wordlength/converter/

Future Work
  - Hybrid wordlength optimization
  - Prune redundant wordlength variables (e.g. delay, adder)

• Further analysis on search algorithms
  - Analysis of genetic algorithms with different settings
  - Comparison with simulated annealing

• Low power consumption
  - System level including memory [Powell and Chau, 1991]
  - Wordlength reduction for floating-point multipliers

Future Work (continued)
• Electronic design automation software
  - Enhanced code generator (e.g. rounding preferences)
  - Hybrid analytical/simulation range estimation

• Optimum DSP algorithms
  - Rearranging subsystems at block diagram
  - Rearranging mathematical expressions in algorithm

• Developing more sophisticated hardware area models
  - Avoids having to route each design through synthesis tools
  - Transcendental functions

End

Backup Slides

Publications

Publications-I
• Conference Papers
1. K. Han, A. G. Olson, and B. L. Evans, "Automatic floating-point to fixed-point
   transformations," Proc. IEEE Asilomar Conf. on Signals, Systems, and Computers, Nov.
   2006, Pacific Grove, CA USA (invited paper).
2. K. Han, B. L. Evans, and E. E. Swartzlander, Jr., "Low-Power Multipliers with Data
   Wordlength Reduction," Proc. IEEE Asilomar Conf. on Signals, Systems, and Computers,
   Oct. 30-Nov. 2, 2005, pp. 1615-1619, Pacific Grove, CA USA.
3. K. Han, B. L. Evans, and E. E. Swartzlander, Jr., "Data Wordlength Reduction for Low-Power
   Signal Processing Software," Proc. IEEE Work. on Signal Processing Systems,
   Oct. 13-15, 2004, pp. 343-348, Austin, TX USA.
4. K. Han and B. L. Evans, "Wordlength Optimization with Complexity-And-Distortion
   Measure and Its Applications to Broadband Wireless Demodulator Design," Proc. IEEE
   Int. Conf. on Acoustics, Speech, and Signal Proc., May 17-21, 2004, vol. 5, pp. 37-40.
5. K. Han, I. Eo, K. Kim, and H. Cho, "Numerical Word-Length Optimization for CDMA
   Demodulator," Proc. IEEE Int. Sym. on Circuits and Systems, May 2001, vol. 4,
   pp. 290-293, Sydney, Australia.
6. K. Han, I. Eo, K. Kim, and H. Cho, "Bit Constraint Parameter Decision Method for
   CDMA Digital Demodulator," Proc. CDMA Int. Conf. & Exhibition, Nov. 2000, vol. 2,
   pp. 583-586, Seoul, Korea.
7. S. Nahm, K. Han, and W. Sung, "A CORDIC-based Digital Quadrature Mixer:
   Comparison with ROM-based Architecture," Proc. IEEE Int. Sym. on Circuits and
   Systems, Jun. 1998, vol. 4, pp. 385-388, Monterey, CA USA.

Publications-II
• Journal Articles
1. K. Han and B. L. Evans, "Optimum Wordlength Search Using A Complexity-And-Distortion
   Measure," EURASIP Journal on Applied Signal Processing, special issue on
   Design Methods for DSP Systems, vol. 2006, no. 5, pp. 103-116, 2006.
• Other Publications
1. K. Han, E. Soo, H. Jung, and K. Kim, Apparatus and Method for Short-Delay Multipath
   Searcher in Spread Spectrum Systems, U.S. Patent pending, Nov. 2001.
2. K. Han, I. Lim, E. Soo, H. Seo, K. Kim, H. Jung, and H. Cho, Apparatus and Method for
   Separating Carrier of Multicarrier Wireless Communication Receiver System, U.S.
   Patent pending, Sep. 2001.
3. K. Han, "Carrier Synchronization Scheme Using Input Signal Interpolation for Digital
   Receivers," Master's Thesis, Seoul National University, Seoul, Korea, Feb. 1998.

Research on Transformation

Simulation Flow
[Figure: flowchart with two branches, each labelled with a search algorithm. Both start from "Setup desired specification" and end at "Generate optimized fixed-point program". One branch: Search wordlength set. Other branch: Search wordlength sets -> Generate Pareto front -> Pick one of sets]

Algorithm Design and Implementation
[Figure: algorithm design (left) vs. algorithm implementation (right), with hardware complexity, power consumption, and design time rated between low and high across the flow]
  Floating-point programs -> Floating-point processor
  Code conversion
  Uniform wordlength fixed-point programs -> Fixed-point processor
  Wordlength optimization
  Optimized fixed-point programs -> Fixed-point IC

Wordlength Optimization Constraints
• Distortion constraint
• Complexity constraint
[Figure: two plots of application-specific distortion d(w) vs. implementation complexity c(w), one marking the bound Dmax and the other marking the bound Cmax]

• Gradient information can be used for update direction
• Gradient information is measured in design
parameters such as implementation complexity,
precision distortion, or power consumption
  - Complexity measurement (CM) [Sung and Kum, 1995]
  - Distortion measurement (DM) [Han et al., 2001]
  - Complexity-and-distortion measurement (CDM) [Han and Evans, 2004] (proposed)

Gradient estimate by finite difference along variable n:
    $\frac{f(w^{(h)}) - f(\{w_1^{(h)}, w_2^{(h)}, \ldots, w_n^{(h-1)}, \ldots, w_N^{(h)}\})}{w_n^{(h)} - w_n^{(h-1)}}$
Notation:
  N      number of variables
  h      iteration index
  n      variable index
  w      wordlength vector
  f(w)   objective function
[Figure: objective values on a (w1, w2) wordlength grid, with the search direction from point a to point b]

• Wordlength update (s: step size)
    $w_{j+1} = w_j + s \, \nabla_j$
• Direction (estimated by finite difference)
    $\nabla_j = (1, 0, 0, \ldots, 0)$ if $m_j = \partial d / \partial w_1$
    $\nabla_j = (0, 1, 0, \ldots, 0)$ if $m_j = \partial d / \partial w_2$
    ...
    $\nabla_j = (0, 0, 0, \ldots, 1)$ if $m_j = \partial d / \partial w_N$
    where $m_j = \max(\partial d / \partial w_1, \partial d / \partial w_2, \ldots, \partial d / \partial w_N)$

Complexity and Distortion Function
• Complexity function, c(w)
  - Number of multiplications is counted
  - Hardware complexity is estimated by assuming that complexity linearly increases as wordlength increases
  - Given hardware model results in accurate complexity
• Distortion function, d(w)
  - Difficult to derive closed-form mathematical expression
  - Estimated by computer simulation measuring output SNR or bit error rate in digital communication systems

Complexity Measure [Sung and Kum, 1995]
• Uses complexity sensitivity information as direction
to search for optimum wordlength
• Disadvantage: demands large number of iterations

Objective function:     $f(w) = c(w)$
Optimization problem:   $\min_{w \in I^n} \{ f(w) \mid d(w) \le D_{max},\ \underline{w} \le w \le \overline{w} \}$
Update direction:       $\nabla c(w)$

Distortion Measure [Han et al., 2001]
• Applies the application performance information to
search for the optimum wordlengths
• Advantage: Fewer iterations
• Disadvantage: Not guaranteed to yield optimum
wordlength for complexity

Objective function:     $f(w) = d(w)$
Optimization problem:   $\min_{w \in I^n} \{ f(w) \mid d(w) \le D_{max},\ c(w) \le C_{max},\ \underline{w} \le w \le \overline{w} \}$
Update direction:       $\nabla d(w)$

Feasible Solution Search [Sung and Kum, 1995]
• Exhaustive search of all possible wordlengths
  - Does not miss optimum points
  - Simple algorithm
  - Many trials (= experiments)
• Distance $d = dw_1 + dw_2 + \cdots + dw_N$
• Expected number of iterations
    $E_{FS}^{N}(d) = \frac{(d+N-1) \cdots (d+2)(d+1)\,d}{N!}$
[Figure: (w1, w2) grid showing the direction of full search from the minimum wordlengths {2,2} to the optimum wordlengths w_opt = {5,5}; d = 6, trials = 24]
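Worked check of the formula (a sketch, assuming the four-variable example in these backup slides, with minimum wordlengths {5,4,4,4} and optimum wordlengths {7,7,4,6}):
  d = 2 + 3 + 0 + 2 = 7,  N = 4
  E = (d + 3)(d + 2)(d + 1) d / 4! = 10 x 9 x 8 x 7 / 24 = 210
  This matches the 210 trials reported for the feasible solution search in the simulation results.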

Sequential Search [K. Han et al. 2001]
• Greedy search based on sensitivity information
    $w_{j+1} = w_j + s \, \nabla_j$
• Example
  - Minimum wordlengths {2,2}
  - Direction of sequential search
  - Optimum wordlengths {5,5}
  - 12 iterations
• Advantage: Fewer trials
• Disadvantage: Could miss global optimum point
[Figure: (w1, w2) grid showing the direction of sequential search from w_b to w_opt, with distances dw1 and dw2]
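
[Figure: multicarrier system block diagram with Transmitter (Data, Encoder, Multicarrier modulator), Channel, Multicarrier demodulator, Channel estimator, Channel equalizer, and Bit error rate tester; wordlength variables w0 to w3 are marked on the receiver signals]
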
w0   Input wordlength of a multicarrier demodulator which performs
a fast Fourier transform (FFT)
w1   Input wordlength of equalizer
w2   Input wordlength of channel estimator
w3   Output wordlength of channel estimator

Simulation Results
• CDM leads to lower complexity compared to DM
• CDM reduces the number of trials compared to CM,
feasible solution [Sung and Kum, 1995], and exhaustive search
  - Fast searching

  Search       Gradient   αc    Number of   Simulations   Wordlength for   Complexity   Distortion
  Method       Measure          Trials                    Variables        Estimate     (BER)*
  Gradient     DM         0     16          64            {10,9,4,10}      10781        0.0009
  Gradient     CDM        0.5   15          60            {7,10,4,6}       7702         0.0012
  Gradient     CM         1     69          69            {7,7,4,6}        7699         0.0015
  Feasible     -          -     210         210           {7,7,4,6}        7699         0.0015
  Exhaustive   -          -     26364       26364         -                -            -
* Required BER ≤ 1.5 x 10^-3
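Worked check of the complexity estimates (a sketch, assuming the linear model C(w) = c^T w with the weights c = {1024, 1, 128, 2} listed under Simulation Environments, applied to w0..w3 in that order):
  {10,9,4,10}:  1024 x 10 + 1 x 9  + 128 x 4 + 2 x 10 = 10781
  {7,10,4,6}:   1024 x 7  + 1 x 10 + 128 x 4 + 2 x 6  = 7702
  {7,7,4,6}:    1024 x 7  + 1 x 7  + 128 x 4 + 2 x 6  = 7699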

Simulation Environments
• Assumptions
  - Internal wordlengths of blocks have been decided
  - Complexity increases linearly as wordlength increases
• Required application performance
  - Bit error rate of 1.5 x 10^-3 (without error correcting codes)
• Simulation tool
  - LabVIEW 7.0
Complexity vector:
  Input               Weight
  FFT                 1024
  Equalizer (right)   1
  Estimator           128
  Equalizer (upper)   2
Complexity: $C(w) = c^T w$

FFT Cost
• N Tap FFT cost
    $Cost_{FFT} = \frac{N}{2} \log_2 N$
• 256 Tap FFT cost
    $Cost_{FFT} = \frac{256}{2} \log_2 256 = 1024$

Minimum Wordlengths
• Change one wordlength variable while keeping other variables at high precision
  - {1,16,16,16}, {2,16,16,16}, ...
  - {16,1,16,16}, {16,2,16,16}, ...
  - ...
  - ..., {16,16,16,15}, {16,16,16,16}

• Minimum wordlength vector is {5,4,4,4}

Number of Trials
• Start at {5,4,4,4} wordlength
• Next wordlength vectors for complexity measure (α = 1.0): {5,4,4,4}, {5,5,4,4}, …
• Increase wordlength one-by-one until satisfying required application performance

Low-Power Signal Processing

Power Consumption
• Power consumption in CMOS circuits
    $P_{avg} = P_{switching} + P_{short\ circuit} + P_{leakage}$
• Significant power in CMOS circuits is dissipated when they are switching
    $P_{switching} = \alpha C V^2 f$
    α: Transition factor;  C: Capacitance;  V: Supply voltage;  f: Frequency
• Power reduction in hardware part [Chandrakasan and Brodersen, 1995]
  - Scaling down, minimizing area
  - Adjusting voltage and frequency during operation
• Power reduction in software part [Tiwari, Malik and Wolfe, 1994] [Lee et al., 1997]
  - Instruction ordering and packing
  - Energy reductions varying from 26% to 73%

Wordlength for Low-Power Consumption

• Power model of wordlength [Choi and Burleson, 1994]
  - Wordlength is considered as capacitance
  - Power consumption is proportional to wordlength
  - Switching activity is not considered
• Data wordlength reduction technique [Han, Evans, and Swartzlander, 2004] (proposed)
  - Count node transitions for switching activity
  - Reduce input data wordlength to decrease power consumption

Dynamic and Static Power

Figure: Trends in dynamic and static power dissipation, showing the increasing contribution of static power
[S. Thompson, P. Packan, and M. Bohr. MOS Scaling: Transistor Challenges for the 21st Century. Intel Technology Journal, Q3 1998]
Backup                                        63

Power Dissipation of Multiplier Unit
• Multiply unit is usually a major source of power consumption in typical DSP applications
 Multiply unit required for digital communication & digital signal processing algorithms
 Digital filters, equalizers, FFT/IFFT, digital down/upconverter, etc.

Figure: TMS320C5x Power Dissipation Characteristics (from www.ti.com)
Backup                                      64

Wallace vs. Booth Multipliers

Figure (left): Tree dot diagram of a 4-bit Wallace multiplier (symmetric)
Figure (right): Radix-4 multiplier based on Booth's recoding, X · a = P (asymmetric: one operand recoded)
Backup                              65

• One multiplicand is recoded
• a and X are the multiplicands
• P is the product of the multiplication
• Three bits in X are recoded to z
Backup                             66

Switching Activity in Multipliers
• Logic delay and propagation cause glitches
• Proposed analytical method
 Hard to estimate glitches in closed form
 Analyze switching activity with respect to input data wordlength
 Does not consider multiplier architecture
• Simulation method
 Count all switching activities (transition counts in logic)
 Power estimation (Xilinx XPower)
 Considers multiplier architecture
Reduce Power Consumption in Arithmetic                                      67

Analytical Method
• Stream of data for one multiplicand
• Compare two adjacent numbers in stream after reduction
• Expectation of bit switching, x, with probability $P_X(x)$: $E(X) = \sum_{x=0}^{L} x \, P_X(x)$
 L-bit input data: $E_L(X) = \frac{L}{2}$
 Truncate input data to M bits (remove N bits): $E_{tr}(X) = \frac{L-N}{2} = \frac{M}{2}$
 N-bit signed right shift in L-bit input (Y is sign bit): $E_{rs}(X) = \frac{1}{2} E(X \mid Y=0) + \frac{1}{2} E(X \mid Y=1) = \frac{L}{2}$

Figure: bit layout of the L-bit word (S: sign bit), split into M kept bits and N removed bits, and the sign-extended word after the right shift
Backup                          68

Analytical Method

X has binomial distribution
$E_{rs}(X) = \frac{1}{2} E(X \mid Y=0) + \frac{1}{2} E(X \mid Y=1)$
Always L/2 (independent of M and N)
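
A short derivation behind the L/2 results, offered as a sketch under two assumptions that the slides do not state explicitly: each bit toggles independently with probability 1/2 between adjacent samples, and Y = 1 denotes a toggle of the sign bit.

$E_L(X) = \sum_{x=0}^{L} x \binom{L}{x} 2^{-L} = \frac{L}{2}$

$E_{tr}(X) = \frac{M}{2} = \frac{L-N}{2}$ (only the M kept bits can toggle)

After an N-bit signed right shift, the top N+1 bits all copy the sign bit and the remaining M-1 bits stay random:

$E(X \mid Y=1) = (N+1) + \frac{M-1}{2}, \quad E(X \mid Y=0) = \frac{M-1}{2}$

$E_{rs}(X) = \frac{1}{2} \cdot \frac{M-1}{2} + \frac{1}{2} \left( N + 1 + \frac{M-1}{2} \right) = \frac{M-1}{2} + \frac{N+1}{2} = \frac{L}{2}$

For example, with L = 16 and N = 8 (so M = 8): $E_L = 8$, $E_{tr} = 4$, $E_{rs} = 8$. Under these assumptions truncation halves the expected switching, while the sign-extending shift leaves it unchanged.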
Backup                                                 69

Power Reduction in TI DSP
• TI TMS320VC5416 DSP Starter Kit
 Measure average current for wordlength reduction of multiplicands

loop:
    STM data_a, AR2;
    STM data_b, AR3;
    MPY *AR2+, *AR3+,a
    MPY *AR2+, *AR3+,a
    ….….
    MPY *AR2+, *AR3+,a
    B loop

Assembly program (data_a and data_b have random data with wordlength w)

Figure: Current [mA] (574 to 581) vs. Wordlength (w) (0 to 16); curves: (w,w), (16,w), (w,16), (wrsh,wrsh)
Backup                                           70

Code Generation for Fixed-Point Program

(a) Floating point program for adder

    function [c] = adder(a, b)
    c = 0;
    c = a + b;

(b) Raw fixed-point program (wordlengths determined by designers with trial and error)

    c = 0;
    a = fi (a, 1,32,16);
    b = fi (b, 1,32,16);
    c = fi (c, 1,32,16);
    c(:) = a + b;

(c) Converted fixed-point program for automating optimization

    function [c] = adder_fx(a, b, numtype)
    c = 0;
    a = fi (a, numtype.a);
    b = fi (b, numtype.b);
    c = fi (c, numtype.c);
    c(:) = a + b;

fi(a, S,WL,FWL) is a constructor function for a fixed-point object in fixed-point toolbox [S: Signed, WL: Wordlength, FWL: Fraction length]
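
To make the fi(a, S,WL,FWL) arguments concrete, here is the number format implied by the raw program's fi (a, 1,32,16), worked out from the definitions on this slide (two's-complement representation is assumed):

step size $= 2^{-FWL} = 2^{-16}$

range $= [-2^{WL-1-FWL},\ 2^{WL-1-FWL} - 2^{-FWL}] = [-2^{15},\ 2^{15} - 2^{-16}]$

So 16 of the 32 bits hold the fraction, 15 hold the integer part, and 1 holds the sign.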
Backup      71

Code Generation

<Run Code Generation>

<Floating-point Program>
Backup                   72

Running Transformation
• Just call the top-level function with input data

> in = rand(1,1000)
> mac_top(in)

• Range and optimum wordlengths depend on the input statistics
Backup               73