Automating Transformations from Floating Point to Fixed Point for Implementing Digital Signal Processing Algorithms

Prof. Brian L. Evans
Embedded Signal Processing Laboratory
Dept. of Electrical and Computer Engineering
The University of Texas at Austin

Based on work by PhD student Kyungtae Han (now at Intel Research Labs)
July 4, 2006

Outline
• Introduction
• Background
• Optimize fixed-point wordlengths
• Reduce power consumption in arithmetic
• Automate transformations of systems
• Conclusion

Introduction

Implementing Digital Signal Processing Algorithms
[Figure: implementation flow. Digital signal processing algorithms are written as a floating-point program (floating-point processor), passed through code conversion to fixed point with uniform wordlength (fixed-point processor), then through wordlength optimization to fixed point with optimized wordlength (ASIC). Each hardware target is rated for price and power consumption on a scale from low (L) to high (H).]
ASIC: application-specific integrated circuit

Transformations to Fixed Point
[Figure: floating-point program → code conversion → wordlength optimization → fixed point (optimized wordlength).]
• Advantages
  Lower hardware complexity
  Lower power consumption
  Faster speed in processing
• Disadvantages
  Introduces distortion due to quantization error
  Search for optimum wordlengths by trial and error is time-consuming
• Research goals
  Automate transformations to fixed point
  Control distortion vs.
complexity tradeoffs

Background

Fixed-Point Data Format
• Integer wordlength (IWL)
  Number of bits assigned to integer representation
  Includes sign bit
• Fractional wordlength (FWL)
  Number of bits assigned to fraction
• Wordlength: WL = IWL + FWL
• Example: π = 3.14159… (base 10) [floating point]
  3.140625 (base 10) = 011.001001 (base 2) [WL=9; IWL=3; FWL=6]
  3.141479492 (base 10) = 011.0010010000111 (base 2) [WL=16; IWL=3; FWL=13]
[Figure: SystemC format (www.systemc.org). A word of WL bits consists of a sign bit S followed by data bits X; the binary point separates the integer wordlength from the fractional wordlength.]

Distortion vs. Complexity Tradeoffs
• Different wordlengths have different application distortion and implementation complexity tradeoffs
• Minimize implementation cost
• Minimize application distortion
[Figure: application distortion d(w) plotted against implementation complexity c(w), showing the feasible region bounded by Dmax and Cmax and the optimal tradeoff curve.]
  c(w): implementation cost function
  d(w): application distortion function
  Cmax: constant for maximum implementation cost
  Dmax: constant for maximum application distortion
  Vector of wordlengths: $\mathbf{w} = \{w_0, w_1, w_2, \ldots, w_{N-1}\}$
  $\underline{\mathbf{w}}$: wordlength lower bounds
  $\overline{\mathbf{w}}$: wordlength upper bounds

Wordlength Optimization
• Single objective optimization
  $$\min_{\mathbf{w} \in I^n} \; a_c\, c(\mathbf{w}) + a_d\, d(\mathbf{w})$$
  subject to $d(\mathbf{w}) \le D_{max}$, $c(\mathbf{w}) \le C_{max}$, $\underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}}$
• Multiple objective optimization
  $$\min_{\mathbf{w} \in I^n} \; [c(\mathbf{w}),\, d(\mathbf{w})]$$
  subject to $d(\mathbf{w}) \le D_{max}$, $c(\mathbf{w}) \le C_{max}$, $\underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}}$
• Proposed work fixes integer wordlengths and searches for fractional wordlengths

Genetic Algorithm
• Evolutionary algorithm
  Inspired by Holland 1975
  Mimic processes of plant and animal evolution
  Find optimum of a complex function
[Figure: genetic algorithm cycle with stages gene pool, function evaluation, genes with measure, selection, parental genes, mating, child genes, mutation, and new gene pool.] [Greg Rohling, Ph.D. Defense, Georgia Tech, 2004]

Pareto Optimality
• Pareto optimality: “best that could be achieved without
disadvantaging at least one group” [Schick, 1970]
• Pareto optimal set is set of nondominated solutions
  E is dominated by C as all objectives for C are less than corresponding objectives for E
  A, B, C, D are nondominated (not dominated by any solution)
• Pareto front is boundary (tradeoff curve) that connects Pareto optimal set solutions
[Figure: solutions A through I plotted on Objective 1 vs. Objective 2 axes and marked as nondominated or dominated, with the Pareto front drawn through the nondominated solutions.]

Optimize Fixed-Point Wordlengths

Search for Optimum Wordlength
• Exhaustive search impractical for many variables
• Gradient-based search (single objective)
  Utilizes gradient information to determine next candidates
  Complexity measure (CM) [Sung & Kum, 1995]
  Distortion measure (DM) [Han et al., 2001]
  Complexity-and-distortion measure (CDM) [Han & Evans, 2004] (covered next)
• Guided random search
  Genetic algorithm for single objective [Leban & Tasic, 2000]
  Multiple objective genetic algorithm [Han, Olson & Evans, 2006] (covered next)

Complexity-and-Distortion Measure
• Weighted combination of measures
  $$f_{cd}(\mathbf{w}) = \alpha_c\, c(\mathbf{w}) + \alpha_d\, d(\mathbf{w})$$
  where $\alpha_c + \alpha_d = 1$, $0 \le \alpha_c \le 1$, $0 \le \alpha_d \le 1$
• Single objective function
  $$\min_{\mathbf{w} \in I^n} f_{cd}(\mathbf{w})$$
  subject to $d(\mathbf{w}) \le D_{max}$, $c(\mathbf{w}) \le C_{max}$, $\underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}}$
• Gradient-based search
  Initialization
  Iterative greedy search based on complexity and distortion gradient information
  c(w): complexity function
  d(w): distortion function
  Dmax: constant for maximum distortion
  Cmax: constant for maximum complexity

Case Study I: Filter Design
• Infinite impulse response (IIR) filter
  Complexity measure: Area model of field-programmable gate array (FPGA) [Constantinides, Cheung & Luk 2003]
  Distortion measure: Root mean square (RMS) error
  Seven fixed-point variables (indicated by slashes)
[Figure: IIR filter block diagram with input x[n], output y[n], coefficients b0, b1, and -a1, and a delay element.]

Case Study I:
Gradient-Based Search
• CDM could lead to lower complexity and lower number of simulations compared to DM and CM

Search Method   Gradient Measure   Number of Simulations   Complexity Estimate (LUT)   Distortion (RMS)*
Gradient        DM                 316                     51.05                       0.0981
Gradient        CDM                145                     49.85                       0.0992
Gradient        CM                 417                     51.95                       0.0986
Complete        -                  16^7 **                 -                           -

* Maximum distortion measured by root mean square (RMS) error is 0.1
** 16^7 = 268,435,456 (8.5 years, if 1 second per simulation)

Case Study I: Genetic Algorithm
• Search Pareto optimal set (nondominated)
• Handles multiple objectives: Error and Area
[Figure: Pareto front. Three panels plot Error (RMS) against Area (LUTs): 100th generation (9,000 simulations; 67/90 nondominated, 23/90 dominated), 250th generation (22,500 simulations; 76/90 nondominated, 14/90 dominated), and 500th generation (45,000 simulations; 90/90 nondominated).]
* Population for one generation: 90
LUT: lookup table

Case Study I: Comparison
• Gradient-based search (GS) results vs.
GA results
[Figure: two panels plot Error (RMS) against Area (LUTs) with the DM, CDM, and CM solutions overlaid on the GA population: 50th generation (4,500 simulations; 35/90 nondominated, 55/90 dominated) and 500th generation (45,000 simulations; 90/90 nondominated).]
* Required RMSmax values for gradient-based search are Dmax = {0.12, 0.1, 0.08}
• GS methods can get stuck in a local minimum
• GS methods reduce running time (CDM: 145 simulations)

Case Study II: Communication System
• Simple binary phase shift keying (BPSK) system
  Complexity measure: Area model of field-programmable gate array (FPGA) [Constantinides, Cheung, and Luk 2003]
  Distortion measure: Bit error rate (BER)
  Four fixed-point variables (indicated by slashes)
[Figure: BPSK system block diagram with blocks for source data (1 or -1), carrier, AWGN, integration & dump, decision, and BER.]

Case Study II: Gradient-Based Search
• CDM could lead to lower complexity and lower number of simulations compared to DM and CM

Search Method   Gradient Measure   Number of Simulations   Complexity Estimate (LUT)   Distortion (BER)*
Gradient        DM                 66                      40.65                       0.083
Gradient        CDM                65                      43.65                       0.085
Gradient        CM                 193                     41.95                       0.081
Complete        -                  65536                   -                           -

* Maximum distortion measured by bit error rate (BER) is 0.1

Case Study II: Genetic Algorithm
• Search Pareto optimal set
• Handles multiple objectives
[Figure: Pareto front. Three panels plot Error (Bit Error Rate): 50th generation (4,500 simulations), 100th generation (9,000 simulations), and 200th generation (18,000 simulations).]
For comparison:
       BER     LUT
DM     0.083   40.65
CDM    0.085   43.95
CM     0.081   41.95
* Population for one generation: 90
Preliminary results
LUT: lookup table

Comparison of Proposed Methods

                        Gradient-based search   Genetic algorithm
Type of Solution        One point               Family of points
Tradeoff Curve Found    No                      Yes
Execution Time          Short                   Long
Amount of Computation   Low                     High
Parallelism             Low                     High

Reduce Power Consumption in Arithmetic

Lower Power Consumption in DSP
• Minimize power dissipation due to limited battery power and cooling system
• Multipliers are often a major source of dynamic power consumption in typical DSP applications
  Multi-precision multipliers select smaller multipliers (8, 16 or 24 bits) to reduce power consumption
  Wordlength reduction to select any word size [Han, Evans & Swartzlander 2004] (covered next)
• In general, what reductions in power are possible in software when hardware has fixed wordlengths?

Wordlength Reduction in Multiplication
• Input data wordlength reduction
  Smaller bits enough to represent, e.g. π x π ≈ 9
• Truncation
• Signed right shift
  Move toward the least significant bit (LSB)
  Sign bit extended for arithmetic right shift
[Figure: 16-bit operand pairs, sign bit first.]
  (a) Original multiplication:           0001 0010 0011 0100 and 1101 1100 1010 1001
  (b) Reduction by truncation:           0001 0010 0000 0000 and 1101 1100 0000 0000
  (c) Reduction by signed right shift:   0000 0000 0001 0010 and 1111 1111 1101 1100

Power Reduction via Wordlength Reduction
• Power consumption
  Switching power consumption
  Static power consumption
• Switching power consumption
  $$P_{switching} = \alpha\, C_L\, V_{dd}^2\, f_{clk}$$
  α: switching activity parameter
  C_L: load capacitance
  V_dd: operating voltage
  f_clk: operating frequency
  Reduce α by wordlength reduction
• Relationship between reduced wordlength and switching parameter α in power consumption?
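The two reductions above can be tried out in a few lines of Python. This is only a rough sketch under stated assumptions: the function names (`truncate`, `signed_right_shift`, `mean_toggles`, `switching_power`) are hypothetical, the input data is uniformly random, and the average number of bit flips between adjacent words is used as a crude stand-in for the switching activity α. It ignores glitches and the multiplier architecture, so it says nothing about actual hardware power.

```python
import random

WORD_BITS = 16
MASK = (1 << WORD_BITS) - 1

def truncate(word, drop_bits):
    """Zero the drop_bits least significant bits of a 16-bit word."""
    return word & ~((1 << drop_bits) - 1) & MASK

def signed_right_shift(word, shift):
    """Arithmetic right shift of a 16-bit two's-complement word (sign extended)."""
    signed = word - (1 << WORD_BITS) if word >> (WORD_BITS - 1) else word
    return (signed >> shift) & MASK

def mean_toggles(words):
    """Average number of bit positions that flip between adjacent words."""
    flips = [bin(a ^ b).count("1") for a, b in zip(words, words[1:])]
    return sum(flips) / len(flips)

def switching_power(alpha, c_load, v_dd, f_clk):
    """P_switching = alpha * C_L * V_dd^2 * f_clk (units are up to the caller)."""
    return alpha * c_load * v_dd ** 2 * f_clk

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    data = [rng.getrandbits(WORD_BITS) for _ in range(20000)]
    streams = [
        ("full length", data),
        ("truncate 8 bits", [truncate(w, 8) for w in data]),
        ("signed right shift by 8", [signed_right_shift(w, 8) for w in data]),
    ]
    for name, stream in streams:
        print(f"{name:24s} {mean_toggles(stream):.2f} toggles per word")
```

Applied to the operands in the figure, `truncate(0x1234, 8)` and `signed_right_shift(0xDCA9, 8)` reproduce the bit patterns shown in (b) and (c). For random inputs, truncation should roughly halve the toggle count, while the signed right shift should leave it about unchanged because the replicated sign bits flip together.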
Analytical Method
[Figure: an L-bit word with sign bit S, shown with no reduction, with N bits truncated (M bits kept), and with an N-bit signed right shift (sign bit S replicated). Wordlength L = 16.]

Input                      Switching expectation
Full length                L/2
Truncate N bits            M/2
N-bit signed right shift   L/2

Dynamic Power Consumption for Wallace Multiplier (1 MHz)
[Figure: dynamic power of a 16-bit x 16-bit multiplier simulated on a Xilinx XC3S200-5FT256 FPGA, with curves for truncating the 1st argument and truncating the 2nd argument (recode, nonrecode); annotated reduction of 56%.]
Wallace multiplier used in TI 320C64 DSP

Dynamic Power Consumption for Radix-4 Modified Booth Multiplier (1 MHz)
[Figure: dynamic power of a 16-bit x 16-bit multiplier simulated on a Xilinx XC3S200-5FT256 FPGA, with curves for truncating the 1st argument and truncating the 2nd argument (recode, nonrecode); annotations: "Sensitive", reductions of 13% and 31%, "Swapping could have benefit".]
Radix-4 modified Booth multiplier used in TI 320C62 DSP

Comparison of Proposed Methods
• Truncation to 8 bits reduces estimated power consumption by 56% in Wallace and 31% in Booth 16-bit multipliers
• Signed right shift has no estimated
power reduction in Wallace multiplier (for any shift) and 25% reduction in Booth multiplier (for 8-bit shift)
• Operand swapping reduces power consumption for Booth but has negligible savings for Wallace multiplier
• Power consumption in tree-based multiplier
  Highly dependent on input data
  Simulation matches analysis

Automatic Transformations of Systems

Automating Transformations from Floating Point to Fixed Point
• Existing fixed-point tools
  Support fixed-point simulation
  Convert floating-point code to raw fixed-point code
  Manually find optimum wordlength by trial and error
  Fixed-point tools:
    SNU gFix, Autoscaler
    CoWare SPW HDS
    Synopsys CoCentric
    MATLAB Fixed-point toolbox
    MATLAB Fixed-point blockset
    AccelChip DSP synthesis
    Catalytic RMS, MCS
• Automating transformations
  Fully automate conversion and wordlength optimization
[Figure: floating-point program → code conversion → wordlength optimization → wordlength-optimized fixed-point program.]

Automatic Transformation Flow
• Code generation
  Parse floating-point program
  Generate raw fixed-point program and auxiliary programs
• Range estimation
  Estimate range to avoid overflow (Analytical/Simulation)
  Determine integer wordlength (IWL)
• Wordlength optimization
  Optimize wordlength according to given input and error specification (Analytical/Simulation)
  Determine fractional wordlength (FWL)
[Figure: code generation → range estimation → wordlength optimization.]

Automating Transformation Environment for Wordlength Optimization
[Figure: environment block diagram with input data, top program, floating-point program, fixed-point program, evaluation program (objectives), search engine (gradient-based or genetic algorithm), range estimation, complexity estimation, error estimation, and optimum wordlength output.]
• Given floating-point program and options,
auxiliary programs are automatically generated
• Given input data, optimum wordlength is searched

Demo of Released Software

Conclusion
• Search for optimum wordlength
  Gradient-based search reduces execution time, but solutions could be trapped in a local optimum
  Genetic algorithm can find distortion vs. complexity tradeoff curve, but it requires longer execution time
• Reduce power consumption by wordlength reduction of multiplicands
• Automate transformations from floating-point programs to fixed-point programs
• Freely distributable software release available at http://www.ece.utexas.edu/~bevans/projects/wordlength/converter/

Future Work
• Advanced wordlength search algorithms
  Hybrid wordlength optimization
  Prune redundant wordlength variables (e.g. delay, adder)
  Adaptive step size for gradient-based search methods
• Further analysis on search algorithms
  Analysis of genetic algorithms with different settings
  Comparison with simulated annealing
• Low power consumption
  System level including memory [Powell and Chau, 1991]
  Wordlength reduction for floating-point multipliers

Future Work (continued)
• Electronic design automation software
  Enhanced code generator (e.g. rounding preferences)
  Hybrid analytical/simulation range estimation
• Optimum DSP algorithms
  Rearranging subsystems at block diagram
  Rearranging mathematical expressions in algorithm
• Developing more sophisticated hardware area models
  Avoids having to route each design through synthesis tools
  Transcendental functions

End

Backup Slides

Publications

Publications-I
• Conference Papers
  1. K. Han, A. G. Olson, and B. L. Evans, "Automatic Floating-Point to Fixed-Point Transformations," Proc. IEEE Asilomar Conf. on Signals, Systems, and Computers, Nov. 2006, Pacific Grove, CA USA. Invited paper.
  2. K. Han, B. L. Evans, and E. E. Swartzlander, Jr., "Low-Power Multipliers with Data Wordlength Reduction," Proc.
IEEE Asilomar Conf. on Signals, Systems, and Computers, Oct. 30-Nov. 2, 2005, pp. 1615-1619, Pacific Grove, CA USA.
  3. K. Han, B. L. Evans, and E. E. Swartzlander, Jr., "Data Wordlength Reduction for Low-Power Signal Processing Software," Proc. IEEE Work. on Signal Processing Systems, Oct. 13-15, 2004, pp. 343-348, Austin, TX USA.
  4. K. Han and B. L. Evans, "Wordlength Optimization with Complexity-And-Distortion Measure and Its Applications to Broadband Wireless Demodulator Design," Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Proc., May 17-21, 2004, vol. 5, pp. 37-40, Montreal, Canada.
  5. K. Han, I. Eo, K. Kim, and H. Cho, "Numerical Word-Length Optimization for CDMA Demodulator," Proc. IEEE Int. Sym. on Circuits and Systems, May 2001, vol. 4, pp. 290-293, Sydney, Australia.
  6. K. Han, I. Eo, K. Kim, and H. Cho, "Bit Constraint Parameter Decision Method for CDMA Digital Demodulator," Proc. CDMA Int. Conf. & Exhibition, Nov. 2000, vol. 2, pp. 583-586, Seoul, Korea.
  7. S. Nahm, K. Han, and W. Sung, "A CORDIC-based Digital Quadrature Mixer: Comparison with ROM-based Architecture," Proc. IEEE Int. Sym. on Circuits and Systems, Jun. 1998, vol. 4, pp. 385-388, Monterey, CA USA.

Publications-II
• Journal Articles
  K. Han and B. L. Evans, "Optimum Wordlength Search Using A Complexity-And-Distortion Measure," EURASIP Journal on Applied Signal Processing, special issue on Design Methods for DSP Systems, vol. 2006, no. 5, pp. 103-116, 2006.
• Other Publications
  1. K. Han, E. Soo, H. Jung, and K. Kim, Apparatus and Method for Short-Delay Multipath Searcher in Spread Spectrum Systems, U.S. Patent pending, Nov. 2001.
  2. K. Han, I. Lim, E. Soo, H. Seo, K. Kim, H. Jung, and H. Cho, Apparatus and Method for Separating Carrier of Multicarrier Wireless Communication Receiver System, U.S. Patent pending, Sep. 2001.
  3. K.
Han, "Carrier Synchronization Scheme Using Input Signal Interpolation for Digital Receivers," Master's Thesis, Seoul National University, Seoul, Korea, Feb. 1998.

Research on Transformation

Simulation Flow
[Figure: two flows. Gradient-based search algorithm: set up desired specification → search wordlength set → generate optimized fixed-point program. Genetic search algorithm: generate Pareto front → search wordlength sets → pick one of the sets → generate optimized fixed-point program.]

Algorithm Design and Implementation
[Figure: algorithm design to algorithm implementation. Floating-point programs (floating-point processor) → code conversion → uniform wordlength fixed-point programs (fixed-point processor) → wordlength optimization → optimized fixed-point programs (fixed-point IC). Scales running between low and high are shown for hardware complexity, power consumption, and design time.]

Wordlength Optimization Constraints
• Distortion constraint
• Complexity constraint
[Figure: two plots of application-specific distortion d(w) against implementation complexity c(w), one marking the bound Dmax and the other marking the bound Cmax.]

Gradient-Based Search
• Gradient information can be used for update direction
• Gradient information is measured in design parameters such as implementation complexity, precision distortion, or power consumption
  Complexity measurement (CM) [Sung and Kum, 1995]
  Distortion measurement (DM) [Han et al., 2001]
  Complexity-and-distortion measurement (CDM) [Han and Evans, 2004] (proposed)

Gradient Information
[Equation: finite-difference gradient of the objective function f with respect to the n-th wordlength, i.e. the change in f(w) between two iterations when only w_n changes, divided by the change in w_n.]
  N: number of variables
  h: iteration index
  n: variable index
  w: wordlength vector
  f(w): objective function
[Figure: example grid of objective values over wordlengths w1 and w2, with the gradient between two objective values a and b indicating the search direction.]

Gradient-Based Search Direction
• Wordlength update (s: step size)
  $$\mathbf{w}_{j+1} = \mathbf{w}_j + s\,\boldsymbol{\delta}_j$$
• Direction (finite difference)
  $$\boldsymbol{\delta}_j = \begin{cases} (1,0,0,\ldots,0) & \text{if } m_j = \Delta d / \Delta w_1 \\ (0,1,0,\ldots,0) & \text{if } m_j = \Delta d / \Delta w_2 \\ \quad\vdots & \\ (0,0,0,\ldots,1) & \text{if } m_j = \Delta d / \Delta w_N \end{cases}$$
  where $m_j = \max\left(\frac{\Delta d}{\Delta w_1}, \frac{\Delta d}{\Delta w_2}, \ldots, \frac{\Delta d}{\Delta w_N}\right)$

Complexity and Distortion Function
• Complexity function, c(w)
  Number of multiplications is counted
  Hardware complexity is estimated by assuming that complexity linearly increases as wordlength increases
  Given hardware model results in accurate complexity
• Distortion function, d(w)
  Difficult to derive closed-form mathematical expression
  Estimated by computer simulation measuring output SNR or bit error rate in digital communication systems

Complexity Measure [Sung and Kum, 1995]
• Uses complexity sensitivity information as direction to search for optimum wordlength
• Advantage: minimizes complexity
• Disadvantage: demands large number of iterations
  Objective function: $f(\mathbf{w}) = c(\mathbf{w})$
  Optimization problem: $\min_{\mathbf{w} \in I^n} \{ f(\mathbf{w}) \mid d(\mathbf{w}) \le D_{max},\ \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}} \}$
  Update direction: $\nabla c(\mathbf{w})$

Distortion Measure [Han et al., 2001]
• Applies the application performance information to search for the optimum wordlengths
• Advantage: Fewer iterations
• Disadvantage: Not guaranteed to yield optimum wordlength for complexity
  Objective function: $f(\mathbf{w}) = d(\mathbf{w})$
  Optimization problem: $\min_{\mathbf{w} \in I^n} \{ f(\mathbf{w}) \mid d(\mathbf{w}) \le D_{max},\ c(\mathbf{w}) \le C_{max},\ \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}} \}$
  Update direction: $\nabla d(\mathbf{w})$

Feasible Solution Search [Sung and Kum, 1995]
• Exhaustive search of all possible wordlengths
• Advantages
  Does not miss optimum points
  Simple algorithm
• Disadvantage
  Many trials (= experiments)
• Distance
  $$d = dw_1 + dw_2 + \cdots + dw_N$$
• Expected number of iterations
  $$E_{FS}^{N}(d) = \frac{(d+N-1)\cdots(d+2)(d+1)\,d}{N!}$$
[Figure: direction of full search in the (w1, w2) plane from the minimum wordlengths w_b = {2,2} to the optimum wordlengths w_opt = {5,5}; d = 6, trials = 24.]

Sequential Search [K. Han et al.
2001]
• Greedy search based on sensitivity information (gradient)
  $$\mathbf{w}_{j+1} = \mathbf{w}_j + s\,\boldsymbol{\delta}_j$$
• Example
  Minimum wordlengths {2,2}
  Optimum wordlengths {5,5}
  12 iterations
• Advantage: Fewer trials
• Disadvantage: Could miss global optimum point
[Figure: direction of sequential search in the (w1, w2) plane from w_b = {2,2} to w_opt = {5,5}.]

Case Study: Receiver Design
[Figure: transmitter (data, encoder, multicarrier modulator), wireless channel, and receiver (multicarrier demodulator, channel equalizer, channel estimator, bit error rate tester), with wordlength variables w0 to w3 marked.]
  w0: Input wordlength of a multicarrier demodulator which performs a fast Fourier transform (FFT)
  w1: Input wordlength of equalizer
  w2: Input wordlength of channel estimator
  w3: Output wordlength of channel estimator

Simulation Results
• CDM leads to lower complexity compared to DM
• CDM reduces the number of trials compared to CM, feasible solution search [Sung and Kum, 1995], and exhaustive search (fast searching)

Search Method   Gradient Measure   αc    Number of Trials   Simulations   Wordlength for Variables   Complexity Estimate   Distortion (BER)*
Gradient        DM                 0     16                 64            {10,9,4,10}                10781                 0.0009
Gradient        CDM                0.5   15                 60            {7,10,4,6}                 7702                  0.0012
Gradient        CM                 1     69                 69            {7,7,4,6}                  7699                  0.0015
Feasible        -                  -     210                210           {7,7,4,6}                  7699                  0.0015
Exhaustive      -                  -     26364              26364         -                          -                     -

* Required BER ≤ 1.5 × 10^-3

Simulation Environments
• Assumptions
  Internal wordlengths of blocks have been decided
  Complexity increases linearly as wordlength increases
• Required application performance
  Bit error rate of 1.5 × 10^-3 (without error correcting codes)
• Simulation tool
  LabVIEW 7.0
• Complexity: $C(\mathbf{w}) = \mathbf{c}^T \mathbf{w}$, with complexity vector

Input               Weight
FFT                 1024
Equalizer (right)   1
Estimator           128
Equalizer (upper)   2

FFT Cost
• N tap FFT cost
  $$\text{Cost}_{FFT} = \frac{N}{2} \log_2 N$$
• 256 tap FFT cost
  $$\text{Cost}_{FFT} = \frac{256}{2} \log_2 256 = 1024$$

Minimum Wordlengths
• Change one wordlength variable while keeping other variables at high precision
  {1,16,16,16},{2,16,16,16},... {16,1,16,16},{16,2,16,16},...
  …, {16,16,16,15},{16,16,16,16}
• Minimum wordlength vector is {5,4,4,4}

Number of Trials
• Start at {5,4,4,4} wordlength
• Next wordlength vectors for complexity measure (α = 1.0)
  {5,4,4,4}, {5,5,4,4}, …
• Increase wordlength one by one until satisfying required application performance

Low-Power Signal Processing

Power Consumption
• Power consumption in CMOS circuits
  $$P_{avg} = P_{switching} + P_{short\text{-}circuit} + P_{leakage}$$
• Significant power in CMOS circuits is dissipated when they are switching
  $$P_{switching} = \alpha\, C\, V^2 f$$
  α: transition factor
  C: capacitance
  V: supply voltage
  f: frequency
• Power reduction in hardware part [Chandrakasan and Brodersen, 1995]
  Scaling down, minimizing area
  Adjusting voltage and frequency during operation
• Power reduction in software part [Tiwari, Malik and Wolfe, 1994] [Lee et al., 1997]
  Instruction ordering and packing
  Energy reductions varying from 26% to 73%

Wordlength for Low-Power Consumption
• Power model of wordlength [Choi and Burleson, 1994]
  Wordlength is considered as capacitance
  Power consumption is proportional to wordlength
  Switching activity is not considered
• Data wordlength reduction technique [Han, Evans, and Swartzlander, 2004] (proposed)
  Count node transitions for switching activity
  Reduce input data wordlength to decrease power consumption

Dynamic and Static Power
[Figure: trends in dynamic and static power dissipation showing increasing contribution of static power.] [S. Thompson, P. Packan, and M. Bohr, "MOS Scaling: Transistor Challenges for the 21st Century," Intel Technology Journal, Q3 1998]

Power Dissipation of Multiplier Unit
• Multiply unit is usually a major source of power consumption in typical DSP applications
  Multiply unit required for digital communication and digital signal processing algorithms
  Digital filters, equalizers, FFT/IFFT, digital down/upconverter, etc.
[Figure: TMS320C5x power dissipation characteristics, from www.ti.com.]

Wallace vs.
Booth Multipliers
[Figure: left, tree dot diagram of a 4-bit Wallace multiplier (symmetric); right, radix-4 multiplier based on Booth’s recoding, X ● a = P (asymmetric, one operand recoded).]

Radix-4 Modified Booth Multiplier
• One multiplicand is recoded
• Three bits in X are recoded
• The a and x are multiplicands to z
• P is product of multiplication

Switching Activity in Multipliers
• Logic delay and propagation cause glitches
• Proposed analytical method
  Hard to estimate glitches in closed form
  Analyze switching activity with respect to input data wordlength
  Does not consider multiplier architecture
• Simulation method
  Count all switching activities (transition counts in logic)
  Power estimation (Xilinx XPower)
  Considers multiplier architecture

Analytical Method
• Stream of data for one multiplicand
• Compare two adjacent numbers in stream after reduction
• Expectation of bit switching, x, with probability P_x
  $$E(X) = \sum_{x=0}^{L} x\, P_X(x)$$
  L-bit input data: $E_L(X) = \frac{L}{2}$
  Truncate input data to M bits (remove N bits): $E_{tr}(X) = \frac{L-N}{2} = \frac{M}{2}$
  N-bit signed right shift in L-bit input (Y is sign bit): $E_{rs}(X) = \frac{1}{2} E(X \mid Y=0) + \frac{1}{2} E(X \mid Y=1) = \frac{L}{2}$

Analytical Method
• X has binomial distribution
  $$E_{rs}(X) = \frac{1}{2} E(X \mid Y=0) + \frac{1}{2} E(X \mid Y=1)$$
  Always L/2 (independent of M and N)

Power Reduction in TI DSP
• TI TMS320VC5416 DSP Starter Kit
  Radix-4 modified Booth multiplier
  Measure average current for wordlength reduction of multiplicands
[Figure: current [mA] (574 to 581) plotted against wordlength w (0 to 16), with curves for (w,w), (16,w), (w,16), and (wrsh,wrsh).]
Assembly program (data_a and data_b have random data with wordlength w):

    loop:
        STM data_a, AR2;
        STM data_b, AR3;
        MPY *AR2+, *AR3+,a
        MPY *AR2+, *AR3+,a
        ….….
        MPY *AR2+, *AR3+,a
        B loop

Code Generation for Fixed-Point Program
• Adder function in MATLAB

(a) Floating-point program for adder

    function [c] = adder(a, b)
    c = 0;
    c = a + b;

(b) Raw fixed-point program (wordlengths determined by designers with trial and error)

    function [c] = adder_fx(a, b)
    c = 0;
    a = fi(a, 1, 32, 16);
    b = fi(b, 1, 32, 16);
    c = fi(c, 1, 32, 16);
    c(:) = a + b;

(c) Converted fixed-point program for automating optimization

    function [c] = adder_fx(a, b, numtype)
    c = 0;
    a = fi(a, numtype.a);
    b = fi(b, numtype.b);
    c = fi(c, numtype.c);
    c(:) = a + b;

fi(a, S, WL, FWL) is a constructor function for a fixed-point object in the fixed-point toolbox [S: signed, WL: wordlength, FWL: fraction length]

Code Generation
[Figure: screenshots labeled "Run Code Generation" and "Floating-point Program".]

Running Transformation
• Just call top function with input data
    > in = rand(1,1000)
    > mac_top(in)
• Range and optimum wordlengths depend on input statistics

Advantages/disadvantages of wordlength search algorithms
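
To make the tradeoffs of the greedy (sequential) wordlength search a little more concrete, here is a minimal Python sketch. It is not the released converter and not the authors' algorithm as implemented: the names `quantize`, `rms_error`, `greedy_search`, and `toy_distortion`, the coefficient and sample values, and the use of truncation with saturation are all assumptions made for illustration. The "system" is just a dot product whose coefficients are quantized, with integer wordlengths fixed and only fractional wordlengths searched.

```python
import math

def quantize(x, iwl, fwl):
    """Truncate x to signed fixed point with iwl integer bits (sign included)
    and fwl fractional bits, saturating at the representable range."""
    scale = 1 << fwl
    lo = -(1 << (iwl + fwl - 1))
    hi = (1 << (iwl + fwl - 1)) - 1
    code = max(lo, min(hi, math.floor(x * scale)))
    return code / scale

def rms_error(fwls, coeffs, samples, iwl=3):
    """Toy distortion d(w): RMS output error of a dot product whose
    coefficients are quantized with the fractional wordlengths in fwls."""
    quantized = [quantize(c, iwl, f) for c, f in zip(coeffs, fwls)]
    total = 0.0
    for x in samples:
        exact = sum(c * xi for c, xi in zip(coeffs, x))
        approx = sum(q * xi for q, xi in zip(quantized, x))
        total += (exact - approx) ** 2
    return math.sqrt(total / len(samples))

def greedy_search(distortion, w_min, w_max, d_max):
    """Start at the minimum wordlengths and repeatedly add one bit to the
    variable whose increment gives the lowest distortion, until d(w) <= d_max.
    Returns the wordlength vector and the number of trial evaluations."""
    w = list(w_min)
    trials = 0
    while distortion(w) > d_max:
        candidates = []
        for i in range(len(w)):
            if w[i] < w_max[i]:
                trial = w[:i] + [w[i] + 1] + w[i + 1:]
                candidates.append((distortion(trial), trial))
                trials += 1
        if not candidates:
            raise ValueError("distortion target not reachable within the bounds")
        _, w = min(candidates)
    return w, trials

# Hypothetical test system: three coefficients and four input vectors.
COEFFS = [0.3, -1.7, 2.2]
SAMPLES = [[1.0, 0.5, -0.25], [0.2, -1.0, 0.7], [-0.6, 0.3, 0.9], [0.8, 0.8, -0.4]]

def toy_distortion(w):
    return rms_error(w, COEFFS, SAMPLES)

if __name__ == "__main__":
    w_opt, n_trials = greedy_search(toy_distortion, [2, 2, 2], [16, 16, 16], 0.01)
    print(w_opt, n_trials, toy_distortion(w_opt))
```

With truncation, `quantize(math.pi, 3, 6)` gives 3.140625, matching the WL=9 example in the fixed-point format slide. The search returns a single feasible point after a small number of trials, which is the short-execution-time side of the tradeoff; because each step only looks one bit ahead, nothing guarantees that the result is the lowest-complexity solution.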