									      Automating Transformations from
Floating Point to Fixed Point for Implementing
     Digital Signal Processing Algorithms

                Prof. Brian L. Evans
      Embedded Signal Processing Laboratory
    Dept. of Electrical and Computer Engineering
          The University of Texas at Austin



    Based on work by PhD student Kyungtae Han (now at Intel Research Labs)

                                July 4, 2006
                                           2




                   Outline
• Introduction
• Background
• Optimize fixed-point wordlengths
• Reduce power consumption in arithmetic
• Automate transformations of systems
• Conclusion
                                          Introduction                             3


                     Implementing Digital Signal
                       Processing Algorithms
[Figure: implementation paths for digital signal processing algorithms, each rated for hardware price and power consumption on a scale from low (L) to high (H)]
• Floating-point program → floating-point processor
• Code conversion → fixed point (uniform wordlength) → fixed-point processor
• Wordlength optimization → fixed point (optimized wordlength) → fixed-point ASIC

ASIC: Application Specific Integrated Circuit
                         Introduction                                                4




      Transformations to Fixed Point
• Advantages
   Lower hardware complexity
   Lower power consumption
   Faster processing
• Disadvantages
   Introduces distortion due to quantization error
   Searching for optimum wordlengths by trial and error is time-consuming
• Research goals
   Automate transformations to fixed point
   Control distortion vs. complexity tradeoffs
[Figure: transformation from floating-point program through code conversion and wordlength optimization to fixed point (optimized wordlength)]
                                    Background                            6




             Fixed-Point Data Format
• Integer wordlength (IWL)
   Number of bits assigned to integer representation
   Includes sign bit
• Fractional wordlength (FWL)
   Number of bits assigned to fraction
• Wordlength: WL = IWL + FWL
• Bit layout (SystemC format, www.systemc.org): sign bit S and integer bits, then the binary point, then fractional bits
• Example: π = 3.14159…(10) [floating point]
   3.140625(10) = 011.001001(2) [WL=9; IWL=3; FWL=6]
   3.141479492(10) = 011.0010010000111(2) [WL=16; IWL=3; FWL=13]
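The two π examples can be reproduced with a few lines of Python. This is a minimal sketch that assumes truncation (rounding toward minus infinity) and a two's-complement layout; the helper name `quantize` is hypothetical and not part of any tool mentioned in these slides.

```python
import math

def quantize(value, iwl, fwl):
    """Truncate value to a signed fixed-point format.

    iwl counts the integer bits including the sign bit, fwl the fractional bits.
    Returns (quantized_value, bit_string_with_binary_point).
    """
    wl = iwl + fwl
    scaled = math.floor(value * 2 ** fwl)           # truncation toward -infinity
    lo, hi = -(2 ** (wl - 1)), 2 ** (wl - 1) - 1    # two's-complement range
    if not lo <= scaled <= hi:
        raise OverflowError("value does not fit in the given IWL")
    bits = format(scaled & (2 ** wl - 1), "0{}b".format(wl))
    return scaled / 2 ** fwl, bits[:iwl] + "." + bits[iwl:]

q9 = quantize(math.pi, iwl=3, fwl=6)     # WL = 9
q16 = quantize(math.pi, iwl=3, fwl=13)   # WL = 16
print(q9)
print(q16)
```

Increasing FWL from 6 to 13 shrinks the quantization error of π from about 1e-3 to about 1e-4, which is the distortion side of the tradeoff discussed on the following slides.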
                                        Background                                    7




        Distortion vs. Complexity Tradeoffs
   • Different wordlengths have different application
     distortion and implementation complexity tradeoffs
   • Vector of wordlengths: $w = \{w_0, w_1, w_2, \ldots, w_{N-1}\}$
   • Objectives
       Minimize implementation cost
       Minimize application distortion
   • Notation
       c(w): implementation cost function
       d(w): application distortion function
       C_max: constant for maximum implementation cost
       D_max: constant for maximum application distortion
       $\underline{w}$: wordlength lower bounds
       $\overline{w}$: wordlength upper bounds
[Figure: application distortion d(w) versus implementation complexity c(w), showing the feasible region and the optimal tradeoff curve]
                             Background                            8




             Wordlength Optimization
• Single objective optimization
   $\min_{w \in I^n} \; \alpha_c\, c(w) + \alpha_d\, d(w)$
   subject to $d(w) \le D_{max}$, $c(w) \le C_{max}$, $\underline{w} \le w \le \overline{w}$
• Multiple objective optimization
   $\min_{w \in I^n} \; [\, c(w),\ d(w) \,]$
   subject to $d(w) \le D_{max}$, $c(w) \le C_{max}$, $\underline{w} \le w \le \overline{w}$

           Proposed work fixes integer wordlengths
            and searches for fractional wordlengths
                           Background                                        9




                 Genetic Algorithm

• Evolutionary algorithm
   Inspired by Holland 1975
   Mimics processes of plant and animal evolution
   Finds optimum of a complex function
[Figure: genetic algorithm cycle: new gene pool → function evaluation → genes with measure → selection → parental genes → mating → child genes → mutation → new gene pool]
                          [Greg Rohling, Ph.D Defense, Georgia Tech, 2004]
                            Background                                                            10




                   Pareto Optimality

• Pareto optimality: “best that could be achieved without disadvantaging at least one group” [Schick, 1970]
• Pareto optimal set is set of nondominated solutions
    E is dominated by C as all objectives for C are less than corresponding objectives for E
    Solutions A, B, C, D are nondominated (not dominated by any solution)
• Pareto front is boundary (tradeoff curve) that connects Pareto optimal set solutions
[Figure: solutions A through I plotted against Objective 1 and Objective 2, marked as nondominated or dominated, with the Pareto front drawn through the nondominated points]
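The dominance test can be sketched in a few lines. The coordinates below are hypothetical (the slide's figure gives no numeric values) and were chosen only so that A, B, C, D come out nondominated and E is dominated by C. The sketch uses the common weak form of dominance for minimization: no worse in every objective and strictly better in at least one.

```python
def dominates(p, q):
    """True if p dominates q when every objective is minimized."""
    no_worse = all(a <= b for a, b in zip(p, q))
    better = any(a < b for a, b in zip(p, q))
    return no_worse and better

def pareto_set(points):
    """Keep only the solutions that no other solution dominates."""
    return {
        name: p
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    }

# Hypothetical (objective 1, objective 2) values, for illustration only
points = {
    "A": (1, 9), "B": (2, 6), "C": (4, 4), "D": (7, 2),
    "E": (5, 6), "F": (8, 5), "G": (5, 9),
}
front = pareto_set(points)
print(sorted(front))
```

Sorting the surviving points by the first objective gives the tradeoff curve that a designer would read a solution from.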
                         Optimize Fixed-Point Wordlengths                  12




            Search for Optimum Wordlength

   • Exhaustive search impractical for many variables
   • Gradient-based search (single objective)
        Utilizes gradient information to determine next candidates
        Complexity measure (CM) [Sung & Kum, 1995]
        Distortion measure (DM) [Han et al., 2001]
        Complexity-and-distortion measure (CDM) [Han & Evans, 2004] (covered next)
   • Guided random search
        Genetic algorithm for single objective [Leban & Tasic, 2000]
        Multiple objective genetic algorithm [Han, Olson & Evans, 2006] (covered next)
                      Optimize Fixed-Point Wordlengths                          13




   Complexity-and-Distortion Measure
• Weighted combination of measures
   $f_{cd}(w) = \alpha_c\, c(w) + \alpha_d\, d(w)$
   where $\alpha_c + \alpha_d = 1$, $0 \le \alpha_c \le 1$, $0 \le \alpha_d \le 1$
• Single objective function
   $\min_{w \in I^n} f_{cd}(w)$
   subject to $d(w) \le D_{max}$, $c(w) \le C_{max}$, $\underline{w} \le w \le \overline{w}$
• Gradient-based search
    Initialization
    Iterative greedy search based on complexity and distortion gradient information
• Notation
    c(w): complexity function
    d(w): distortion function
    D_max: constant for maximum distortion
    C_max: constant for maximum complexity
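A greedy search driven by the combined measure might look like the sketch below. It is not the authors' implementation: the complexity and distortion models are toy stand-ins, the search is assumed to start from the upper bounds and remove one bit at a time, and all function names are hypothetical. In the real flow each distortion evaluation would be a system simulation.

```python
import math

def cost(w):
    """Toy complexity model (an assumption): total bits, scaled to [0, 1]."""
    return sum(w) / (16.0 * len(w))

def distortion(w):
    """Toy distortion model (an assumption): RMS of quantization steps 2^-w."""
    return math.sqrt(sum(4.0 ** -wi for wi in w))

def greedy_cdm_search(w_lo, w_hi, d_max, c_max, alpha_c=0.5):
    alpha_d = 1.0 - alpha_c

    def f_cd(w):
        return alpha_c * cost(w) + alpha_d * distortion(w)

    w = list(w_hi)                 # start at the upper bounds (assumed feasible)
    simulations = 0
    while True:
        best = None
        for i in range(len(w)):
            if w[i] == w_lo[i]:
                continue           # already at the lower bound
            candidate = w[:i] + [w[i] - 1] + w[i + 1:]
            simulations += 1       # one distortion evaluation per candidate
            if distortion(candidate) > d_max:
                continue           # violates the distortion constraint
            score = f_cd(candidate)
            if best is None or score < best[0]:
                best = (score, candidate)
        if best is None or best[0] >= f_cd(w):
            break                  # no feasible move improves f_cd
        w = best[1]
    feasible = distortion(w) <= d_max and cost(w) <= c_max
    return w, simulations, feasible

w_opt, n_sims, ok = greedy_cdm_search([2, 2, 2], [16, 16, 16], d_max=0.01, c_max=1.0)
print(w_opt, n_sims, ok)
```

Because each step only looks at single-bit moves from the current point, a search of this kind can stop at a local minimum, which is the limitation the genetic algorithm is meant to address.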
                    Optimize Fixed-Point Wordlengths          14




          Case Study I: Filter Design
• Infinite impulse response (IIR) filter
    Complexity measure: Area model of field-programmable
     gate array (FPGA) [Constantinides, Cheung & Luk 2003]
    Distortion measure: Root mean square (RMS) error
    Seven fixed-point variables (indicated by slashes)
[Figure: IIR filter block diagram with input x[n], output y[n], one delay element, and coefficients b0, b1, and -a1]
                       Optimize Fixed-Point Wordlengths                   15




  Case Study I: Gradient-Based Search
• CDM could lead to lower complexity and lower
  number of simulations compared to DM and CM

           Search     Gradient     Number of System   Complexity Estimate   Distortion
           Method     Measure      Simulations        (LUTs)                (RMS)*

           Gradient   DM           316                51.05                 0.0981
           Gradient   CDM          145                49.85                 0.0992
           Gradient   CM           417                51.95                 0.0986
           Complete   -            16^7 **            -                     -

   * Maximum allowed distortion, measured by root mean square (RMS) error, is 0.1
   ** 16^7 = 268,435,456 simulations (8.5 years at 1 second per simulation)
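The size of the complete search can be checked directly. The sketch assumes 16 candidate wordlengths per variable, which is inferred from the 16^7 figure for seven variables and is consistent with the 65,536 (16^4) complete-search count quoted for the four-variable system in Case Study II.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def exhaustive_search_size(num_variables, choices_per_variable=16):
    """Number of wordlength combinations a complete search must simulate."""
    return choices_per_variable ** num_variables

iir = exhaustive_search_size(7)     # Case Study I: seven fixed-point variables
bpsk = exhaustive_search_size(4)    # Case Study II: four fixed-point variables
years = iir / SECONDS_PER_YEAR      # at one second per simulation
print(iir, bpsk, round(years, 1))
```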
                                                                        Optimize Fixed-Point Wordlengths                                                                      16




                                  Case Study I: Genetic Algorithm
                        • Search Pareto optimal set (nondominated)
                        • Handles multiple objectives: Error and Area
[Figure: Pareto fronts found by the genetic algorithm, plotted as Error (RMS, log scale from 10^-2 to 10^0) versus Area (LUTs, 20 to 100), in three panels]
   100th generation (9,000 simulations): 67 of 90 nondominated, 23 of 90 dominated
   250th generation (22,500 simulations): 76 of 90 nondominated, 14 of 90 dominated
   500th generation (45,000 simulations): 90 of 90 nondominated
* Population for one generation: 90
LUT: Lookup table
                                                    Optimize Fixed-Point Wordlengths                                                         17




                                Case Study I: Comparison
 • Gradient-based search (GS) results vs. GA results
[Figure: gradient-based search solutions (DM, CDM, CM) overlaid on genetic algorithm results, plotted as Error (RMS, log scale from 10^-2 to 10^0) versus Area (LUTs, 20 to 100), in two panels]
   50th generation (4,500 simulations): 35 of 90 nondominated, 55 of 90 dominated
   500th generation (45,000 simulations): 90 of 90 nondominated
* Gradient-based searches were run with maximum RMS error D_max set to 0.12, 0.1, and 0.08

     • GS methods can get stuck in a local minimum
     • GS methods reduce running time (CDM: 145 simulations)
                     Optimize Fixed-Point Wordlengths           18




 Case Study II: Communication System
• Simple binary phase shift keying (BPSK) system
    Complexity measure: Area model of field-programmable
     gate array (FPGA) [Constantinides, Cheung, and Luk 2003]
    Distortion measure: Bit error rate (BER)
    Four fixed-point variables (indicated by slashes)
[Figure: BPSK system block diagram with source data (1 or -1), carrier, AWGN, integration and dump, decision, and BER measurement]
                       Optimize Fixed-Point Wordlengths                   19




 Case Study II: Gradient-Based Search
• CDM could lead to lower complexity and lower
  number of simulations compared to DM and CM

           Search     Gradient     Number      Complexity   Distortion
           Method     Measure     of System     Estimate     (BER)*
                                 Simulations     (LUT)

         Gradient       DM              66         40.65        0.083
         Gradient      CDM              65         43.65        0.085
         Gradient       CM             193         41.95        0.081
         Complete        -           65536             -            -

     * Maximum allowed distortion, measured by bit error rate (BER), is 0.1
                                                             Optimize Fixed-Point Wordlengths                                                                  20




                                Case Study II: Genetic Algorithm




• Search Pareto optimal set
• Handles multiple objectives
[Figure: Pareto fronts found by the genetic algorithm, with Error (Bit Error Rate) on the vertical axis, in three panels: 50th generation (4,500 simulations), 100th generation (9,000 simulations), and 200th generation (18,000 simulations); preliminary results]

For comparison, gradient-based search results:
          BER      LUT
   DM     0.083    40.65
   CDM    0.085    43.95
   CM     0.081    41.95

* Population for one generation: 90
LUT: Lookup table
                   Optimize Fixed-Point Wordlengths                      21




    Comparison of Proposed Methods


                           Gradient-based                 Genetic
                               search                    algorithm
Type of Solution               One point              Family of points
Tradeoff Curve Found                No                      Yes
Execution Time                    Short                    Long
Amount of Computation              Low                     High
Parallelism                        Low                     High
                         Reduce Power Consumption in Arithmetic         23




         Lower Power Consumption in DSP
       • Minimize power dissipation due to limited battery
         power and cooling system
       • Multipliers are often a major source of dynamic power
         consumption in typical DSP applications
           Multi-precision multipliers select smaller multipliers (8,
            16 or 24 bits) to reduce power consumption
           Wordlength reduction selects any word size (covered next)
            [Han, Evans & Swartzlander 2004]

       • In general, what reductions in power are possible in
         software when hardware has fixed wordlengths?
                 Reduce Power Consumption in Arithmetic                          24




Wordlength Reduction in Multiplication
• Input data wordlength reduction
    Fewer bits are often enough to represent the operands, e.g. π × π ≈ 9
• Truncation
    Zero the least significant bits
• Signed right shift
    Move toward the least significant bit (LSB)
    Sign bit extended for arithmetic right shift
• Example with two 16-bit operands (sign bit is the leftmost bit)
    (a) Original multiplication:         0001 0010 0011 0100
                                         1101 1100 1010 1001
    (b) Reduction by truncation:         0001 0010 0000 0000
                                         1101 1100 0000 0000
    (c) Reduction by signed right shift: 0000 0000 0001 0010
                                         1111 1111 1101 1100
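The two reductions can be reproduced on the slide's operands (0x1234 and 0xDCA9) with plain integer operations. This is a minimal sketch assuming 16-bit two's-complement words and an 8-bit reduction; the helper names are hypothetical.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def truncate(x, n):
    """Zero the n least significant bits of a 16-bit word."""
    return x & ~((1 << n) - 1) & 0xFFFF

def signed_right_shift(x, n):
    """Arithmetic right shift by n bits; the sign bit fills the vacated bits."""
    return (to_signed16(x) >> n) & 0xFFFF

a, b = 0x1234, 0xDCA9   # 0001 0010 0011 0100 and 1101 1100 1010 1001
for name, op in (("truncate", truncate), ("shift", signed_right_shift)):
    print(name, format(op(a, 8), "016b"), format(op(b, 8), "016b"))
```

Truncation leaves the significant bits in place and zeroes the low half, while the signed shift moves them to the low half and fills the high half with copies of the sign bit.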
                Reduce Power Consumption in Arithmetic                        25




Power Reduction via Wordlength Reduction

• Power consumption
   Switching power consumption: $P_{switching} = \alpha\, C_L\, V_{dd}^2\, f_{clk}$
   Static power consumption
• Switching power consumption
   Switching activity parameter, α
   Reduce α by wordlength reduction
• Notation
   C_L: load capacitance
   V_dd: operating voltage
   f_clk: operating frequency

      What is the relationship between reduced wordlength and
      switching parameter α in power consumption?
                                  Reduce Power Consumption in Arithmetic                  26




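                               Analytical Method
• Operand of L bits: M most significant bits followed by N least significant bits (L = M + N), with sign bit S
• Expected switching per input type

    Input                       Switching expectation
    Full length                 L/2
    Truncate N bits             M/2
    N-bit signed right shift    L/2

[Figure: operand bit layouts for full length, truncation, and signed right shift (sign bit S replicated into the upper bits), with an accompanying plot labeled "No Reduction" for wordlength L = 16]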
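The table's entries can be checked by brute force under one plausible reading, stated here as an assumption: the "switching expectation" is the expected number of input bits that toggle between two independent, uniformly distributed operands. The sketch uses a small word (L = 8, N = 4) so that every pair of operands can be enumerated.

```python
from itertools import product

L_BITS, N_BITS = 8, 4              # small sizes so all pairs can be enumerated
M_BITS = L_BITS - N_BITS
MASK = (1 << L_BITS) - 1

def full_length(x):
    return x

def truncate(x):
    """Zero the N least significant bits."""
    return x & ~((1 << N_BITS) - 1) & MASK

def signed_right_shift(x):
    """Arithmetic right shift by N bits on an L-bit two's-complement word."""
    signed = x - (1 << L_BITS) if x >> (L_BITS - 1) else x
    return (signed >> N_BITS) & MASK

def expected_toggles(transform):
    """Mean Hamming distance between two uniformly random transformed words."""
    words = [transform(x) for x in range(1 << L_BITS)]
    total = sum(bin(u ^ v).count("1") for u, v in product(words, repeat=2))
    return total / len(words) ** 2

full = expected_toggles(full_length)
trunc = expected_toggles(truncate)
shift = expected_toggles(signed_right_shift)
print(full, trunc, shift)
```

Under this reading the signed shift does not lower the expectation because the replicated sign bits all toggle together whenever the sign changes, which offsets the bits that were shifted out.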
                       Reduce Power Consumption in Arithmetic                         27


                   Dynamic Power Consumption
                  for Wallace Multiplier (1 MHz)




[Figure: dynamic power of a 16-bit x 16-bit Wallace multiplier, simulated on a Xilinx XC3S200-5FT256 FPGA, for truncation of the first and second arguments (recode, nonrecode); annotated "Reduction (56%)"]


                      Wallace multiplier used in TI 320C64 DSP
                          Reduce Power Consumption in Arithmetic                         28


        Dynamic Power Consumption for Radix-4
          Modified Booth Multiplier (1 MHz)


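[Figure: dynamic power of a 16-bit x 16-bit radix-4 modified Booth multiplier, simulated on a Xilinx XC3S200-5FT256 FPGA, for truncation of the first and second arguments (recode, nonrecode); annotated "Sensitive (13%)" and "Reduction (31%)"]
• Operand swapping could have a benefit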
                  Radix-4 modified Booth multiplier used in TI 320C62 DSP
                Reduce Power Consumption in Arithmetic      29




     Comparison of Proposed Methods

• Truncation to 8 bits reduces est. power consumption by
  56% in Wallace and 31% in Booth 16-bit multipliers
• Signed right shift has no est. power reduction in
  Wallace multiplier (for any shift) and 25% reduction in
  Booth (for 8-bit shift) multiplier
• Operand swapping reduces power consumption for
  Booth but has negligible savings for Wallace multiplier
• Power consumption in tree-based multiplier
    Highly dependent on input data
    Simulation matches analysis
                    Automatic Transformations of Systems                           31


         Automating Transformations from
           Floating Point to Fixed Point
• Existing fixed-point tools
     Support fixed-point simulation
     Convert floating-point code to raw fixed-point code
     Leave the user to find the optimum wordlength manually by trial and error
     Examples
        SNU gFix, Autoscaler
        CoWare SPW HDS
        Synopsys CoCentric
        MATLAB Fixed-point toolbox
        MATLAB Fixed-point blockset
        AccelChip DSP synthesis
        Catalytic RMS, MCS
• Automating transformations
     Fully automate conversion and wordlength optimization

Floating-Point Program → Code Conversion → Wordlength Optimization → Wordlength-Optimized Fixed-Point Program
                  Automatic Transformations of Systems                  32




     Automatic Transformation Flow
• Code generation
   Parse floating-point program
   Generate raw fixed-point program and auxiliary programs
• Range estimation
   Estimate range to avoid overflow (Analytical/Simulation)
   Determine integer wordlength (IWL)
• Wordlength optimization
   Optimize wordlength according to given input and error
    specification (Analytical/Simulation)
   Determine fractional wordlength (FWL)

Code Generation → Range Estimation → Wordlength Optimization
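The range-estimation step can be illustrated with a simulation-style sketch: observe the values a variable takes, then pick the smallest integer wordlength (sign bit included) whose two's-complement range covers them. The function name and sample values are hypothetical, and the released software may estimate ranges differently.

```python
def integer_wordlength(samples):
    """Smallest IWL (sign bit included) such that every sample lies in
    the two's-complement range [-2^(IWL-1), 2^(IWL-1))."""
    lo, hi = min(samples), max(samples)
    iwl = 1                                   # sign bit only: range [-1, 1)
    while not (-(2 ** (iwl - 1)) <= lo and hi < 2 ** (iwl - 1)):
        iwl += 1
    return iwl

# Hypothetical values observed for one variable during simulation
observed = [3.14159, -2.5, 0.1, 1.75]
iwl = integer_wordlength(observed)

# With IWL fixed, the wordlength search only has to choose FWL = WL - IWL
wl = 16
fwl = wl - iwl
print(iwl, fwl)
```

Fixing IWL this way avoids overflow for the observed inputs, so the later optimization step only trades fractional bits against distortion.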
                      Automatic Transformations of Systems                     33


   Automating Transformation Environment
        for Wordlength Optimization
[Figure: block diagram with input data, top program, search engine (gradient-based or genetic algorithm), evaluation program (objectives), range estimation, complexity estimation, error estimation, floating-point program, fixed-point program, and optimum wordlength output]

       • Given floating-point program and options,
         auxiliary programs are automatically generated
       • Given input data, the search engine finds the optimum wordlength
     Automatic Transformations of Systems   34




Demo of Released Software
                            Conclusion                              35




                        Conclusion
• Search for optimum wordlength
    Gradient-based search reduces execution time, but
     solutions could be trapped in a local optimum
    Genetic algorithm can find distortion vs. complexity
     tradeoff curve, but it requires longer execution time
• Reduce power consumption by wordlength reduction
  of multiplicands
• Automate transformations from floating-point
  programs to fixed-point programs
• Freely distributable software release available at
 http://www.ece.utexas.edu/~bevans/projects/wordlength/converter/
                             Conclusion                         36




                      Future Work
• Advanced wordlength search algorithms
    Hybrid wordlength optimization
    Prune redundant wordlength variables (e.g. delay, adder)
    Adaptive step size for gradient-based search methods

• Further analysis on search algorithms
    Analysis of genetic algorithms with different settings
    Comparison with simulated annealing

• Low power consumption
    System level including memory [Powell and Chau, 1991]
    Wordlength reduction for floating-point multipliers
                           Conclusion                             37




           Future Work (continued)
• Electronic design automation software
    Enhanced code generator (e.g. rounding preferences)
    Hybrid analytical/simulation range estimation

• Optimum DSP algorithms
    Rearranging subsystems at block diagram
    Rearranging mathematical expressions in algorithm

• Developing more sophisticated hardware area models
    Avoids having to route each design through synthesis tools
    Transcendental functions
      38




End
                39




Backup Slides
                                         Publications                                              40




                              Publications-I
• Conference Papers
   1.   K. Han, A. G. Olson, and B. L. Evans, "Automatic Floating-Point to Fixed-Point
        Transformations," Proc. IEEE Asilomar Conf. on Signals, Systems, and Computers, Nov.
        2006, Pacific Grove, CA USA, invited paper.
   2.   K. Han, B. L. Evans, and E. E. Swartzlander, Jr., "Low-Power Multipliers with Data
        Wordlength Reduction," Proc. IEEE Asilomar Conf. on Signals, Systems, and Computers,
        Oct. 30-Nov. 2, 2005, pp. 1615-1619, Pacific Grove, CA USA.
   3.   K. Han, B. L. Evans, and E. E. Swartzlander, Jr., "Data Wordlength Reduction for Low-Power
        Signal Processing Software," Proc. IEEE Work. on Signal Processing Systems,
        Oct. 13-15, 2004, pp. 343-348, Austin, TX USA.
   4.   K. Han and B. L. Evans, "Wordlength Optimization with Complexity-And-Distortion
        Measure and Its Applications to Broadband Wireless Demodulator Design," Proc. IEEE
        Int. Conf. on Acoustics, Speech, and Signal Proc., May 17-21, 2004, vol. 5, pp. 37-40,
        Montreal, Canada.
   5.   K. Han, I. Eo, K. Kim, and H. Cho, "Numerical Word-Length Optimization for CDMA
        Demodulator," Proc. IEEE Int. Sym. on Circuits and Systems, May 2001, vol. 4,
        pp. 290-293, Sydney, Australia.
   6.   K. Han, I. Eo, K. Kim, and H. Cho, "Bit Constraint Parameter Decision Method for
        CDMA Digital Demodulator," Proc. CDMA Int. Conf. & Exhibition, Nov. 2000, vol. 2,
        pp. 583-586, Seoul, Korea.
   7.   S. Nahm, K. Han, and W. Sung, "A CORDIC-based Digital Quadrature Mixer:
        Comparison with ROM-based Architecture," Proc. IEEE Int. Sym. on Circuits and
        Systems, Jun. 1998, vol. 4, pp. 385-388, Monterey, CA USA.




                            Publications-II
• Journal Articles
   1. K. Han and B. L. Evans, "Optimum Wordlength Search Using A Complexity-And-
      Distortion Measure," EURASIP Journal on Applied Signal Processing, special issue on
      Design Methods for DSP Systems, vol. 2006, no. 5, pp. 103-116, 2006.
• Other Publications
   1. K. Han, E. Soo, H. Jung, and K. Kim, "Apparatus and Method for Short-Delay Multipath
      Searcher in Spread Spectrum Systems," U.S. Patent pending, Nov. 2001.
   2. K. Han, I. Lim, E. Soo, H. Seo, K. Kim, H. Jung, and H. Cho, "Apparatus and Method for
      Separating Carrier of Multicarrier Wireless Communication Receiver System," U.S.
      Patent pending, Sep. 2001.
   3. K. Han, "Carrier Synchronization Scheme Using Input Signal Interpolation for Digital
      Receivers," Master's Thesis, Seoul National University, Seoul, Korea, Feb. 1998.
          Backup             42




Research on Transformation




                   Simulation Flow

[Figure: flowchart with two search paths]
• Gradient-based search algorithm
    - Set up desired specification
    - Search wordlength set
• Genetic search algorithm
    - Search wordlength sets (generate Pareto front)
    - Pick one of sets
• Both paths: generate optimized fixed-point program




        Algorithm Design and Implementation
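[Figure: design flow from algorithm design (left) to algorithm implementation (right)]

  Algorithm Design                          Algorithm Implementation
  Floating-point programs                   Floating-point processor
      (code conversion)
  Uniform-wordlength fixed-point programs   Fixed-point processor
      (wordlength optimization)
  Optimized fixed-point programs            Fixed-point IC

• Design time: low at the top of the flow, high at the bottom
• Hardware complexity and power consumption: high at the top of the flow, low at the bottom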




  Wordlength Optimization Constraints
• Distortion constraint: d(w) ≤ Dmax
• Complexity constraint: c(w) ≤ Cmax
[Figures: application-specific distortion d(w) versus implementation complexity c(w); one plot marks the threshold Dmax on the distortion axis, the other marks Cmax on the complexity axis]




             Gradient-Based Search
• Gradient information can be used for update direction
• Gradient information is measured in design parameters such as implementation
  complexity, precision distortion, or power consumption
    - Complexity measurement (CM) [Sung and Kum, 1995]
    - Distortion measurement (DM) [Han et al., 2001]
    - Complexity-and-distortion measurement (CDM) [Han and Evans, 2004] (proposed)




                    Gradient Information
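• Gradient of the objective function with respect to the n-th wordlength, measured as a finite difference
    $$\frac{f(\mathbf{w}^{(h)}) - f(\{w_1^{(h)}, w_2^{(h)}, \ldots, w_n^{(h-1)}, \ldots, w_N^{(h)}\})}{w_n^{(h)} - w_n^{(h-1)}}$$
    N     number of variables
    h     iteration index
    n     variable index
    w     wordlength vector
    f(w)  objective function
[Figure: objective values on a grid of wordlengths (w1, w2), with the search direction and the gradient between two objective values a and b marked]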




     Gradient-Based Search Direction
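• Wordlength update (s: step size)
    $$\mathbf{w}_{j+1} = \mathbf{w}_j + s\,\boldsymbol{\eta}_j$$
• Direction (each ratio is measured as a finite difference)
    $$\boldsymbol{\eta}_j = \begin{cases}
      (1,0,0,\ldots,0) & \text{if } m_j = \frac{\nabla d}{\nabla w_1} \\
      (0,1,0,\ldots,0) & \text{if } m_j = \frac{\nabla d}{\nabla w_2} \\
      \quad\vdots & \\
      (0,0,0,\ldots,1) & \text{if } m_j = \frac{\nabla d}{\nabla w_N}
    \end{cases}$$
    where
    $$m_j = \max\left(\frac{\nabla d}{\nabla w_1}, \frac{\nabla d}{\nabla w_2}, \ldots, \frac{\nabla d}{\nabla w_N}\right)$$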




   Complexity and Distortion Function
• Complexity function, c(w)
    - Number of multiplications is counted
    - Hardware complexity is estimated by assuming that complexity increases
      linearly as wordlength increases
    - Given hardware model results in accurate complexity
• Distortion function, d(w)
    - Difficult to derive closed-form mathematical expression
    - Estimated by computer simulation measuring output SNR or bit error rate
      in digital communication systems
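As a rough illustration of estimating distortion by simulation, the Python sketch below quantizes a test signal to a given wordlength and measures the resulting SNR. The test signal, the rounding and saturation behaviour, and the function names are assumptions made for this example.

```python
import math


def quantize(x, wordlength):
    """Round x in [-1, 1) to a signed fixed-point grid with `wordlength` bits."""
    scale = 2 ** (wordlength - 1)
    q = round(x * scale)
    q = max(-scale, min(scale - 1, q))   # saturate to the representable range
    return q / scale


def output_snr_db(wordlength, num_samples=1000):
    """Simulated SNR in dB of a quantized sine wave against the original."""
    signal_power = 0.0
    noise_power = 0.0
    for k in range(num_samples):
        x = 0.9 * math.sin(2 * math.pi * 0.01234 * k)   # hypothetical test signal
        e = x - quantize(x, wordlength)
        signal_power += x * x
        noise_power += e * e
    return 10 * math.log10(signal_power / noise_power)


for wl in (4, 8, 12, 16):
    print(wl, round(output_snr_db(wl), 1))
```

The SNR grows with wordlength, which is why a search can trade wordlength against a distortion target.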




 Complexity Measure [Sung and Kum, 1995]
• Uses complexity sensitivity information as direction to search for optimum wordlength
• Advantage: minimizes complexity
• Disadvantage: demands large number of iterations
• Objective function
    $$f(\mathbf{w}) = c(\mathbf{w})$$
• Optimization problem
    $$\min_{\mathbf{w} \in I^n} \{ f(\mathbf{w}) \mid d(\mathbf{w}) \le D_{\max},\ \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}} \}$$
• Update direction
    $$\nabla c(\mathbf{w})$$




         Distortion Measure [Han et al., 2001]
• Applies the application performance information to search for the optimum wordlengths
• Advantage: Fewer iterations
• Disadvantage: Not guaranteed to yield optimum wordlength for complexity
• Objective function
    $$f(\mathbf{w}) = d(\mathbf{w})$$
• Optimization problem
    $$\min_{\mathbf{w} \in I^n} \{ f(\mathbf{w}) \mid d(\mathbf{w}) \le D_{\max},\ c(\mathbf{w}) \le C_{\max},\ \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}} \}$$
• Update direction
    $$\nabla d(\mathbf{w})$$




Feasible Solution Search [Sung and Kum, 1995]
• Exhaustive search of all possible wordlengths
• Advantages
    - Does not miss optimum points
    - Simple algorithm
• Disadvantage
    - Many trials (= experiments)
• Distance
    $$d = dw_1 + dw_2 + \cdots + dw_N$$
• Expected number of iterations
    $$E_{FS}^{N}(d) = \frac{(d+N-1)\cdots(d+2)(d+1)\,d}{N!}$$
[Figure: direction of full search in the (w1, w2) plane, from wb (minimum wordlengths {2,2}) to wopt (optimum wordlengths {5,5}); d = 6, trials = 24]
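A hedged Python sketch of a full search that tries wordlength vectors in order of increasing distance from the minimum wordlengths. The enumeration order within each distance and the stand-in feasibility test are assumptions; the slide does not specify them. The sketch also assumes the minimum vector itself is already known to be infeasible, so it is not counted as a trial.

```python
from itertools import product


def vectors_at_distance(w_min, d):
    """All wordlength vectors whose total increase over w_min equals d."""
    for steps in product(range(d + 1), repeat=len(w_min)):
        if sum(steps) == d:
            yield tuple(wm + st for wm, st in zip(w_min, steps))


def full_search(w_min, is_feasible, max_distance=32):
    """Try vectors by increasing distance; return (vector, number of trials)."""
    trials = 0
    for d in range(1, max_distance + 1):   # w_min assumed infeasible, so start at 1
        for w in vectors_at_distance(w_min, d):
            trials += 1
            if is_feasible(w):
                return w, trials
    return None, trials


# Stand-in for a distortion simulation: feasible once both wordlengths reach 5.
w_opt, trials = full_search((2, 2), lambda w: w[0] >= 5 and w[1] >= 5)
print(w_opt, trials)
```

Under these assumptions the count for the {2,2} to {5,5} example comes out to 24 trials, the figure quoted for the example above.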




         Sequential Search [Han et al., 2001]
• Greedy search based on sensitivity information (gradient), with step size s and direction η_j
    $$\mathbf{w}_{j+1} = \mathbf{w}_j + s\,\boldsymbol{\eta}_j$$
• Example
    - Minimum wordlengths {2,2}
    - Direction of sequential search
    - Optimum wordlengths {5,5}
    - 12 iterations
• Advantage: Fewer trials
• Disadvantage: Could miss global optimum point
[Figure: path of the sequential search in the (w1, w2) plane, from wb to wopt, with increments dw1 and dw2]
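The greedy search can be sketched in Python as follows. The toy distortion model, the tie-breaking rule, and the function name `sequential_search` are assumptions for illustration; in practice every distortion evaluation is a simulation run.

```python
def sequential_search(w_min, distortion, d_max, step=1, max_iters=100):
    """Greedy search: in each iteration, try raising each wordlength by `step`
    and keep the candidate with the lowest distortion.
    Returns (wordlengths, number of distortion evaluations)."""
    w = list(w_min)
    evaluations = 0
    for _ in range(max_iters):
        candidates = []
        for n in range(len(w)):
            trial = list(w)
            trial[n] += step
            candidates.append((distortion(trial), n, trial))
            evaluations += 1
        best_d, _, w = min(candidates)     # ties go to the lowest index n
        if best_d <= d_max:                # required performance reached
            return w, evaluations
    raise RuntimeError("no feasible wordlengths found")


def toy_distortion(w):
    # Hypothetical model: each variable contributes 2^-wordlength.
    return 2.0 ** -w[0] + 2.0 ** -w[1]


w_found, evaluations = sequential_search([2, 2], toy_distortion, d_max=2.0 ** -4)
print(w_found, evaluations)
```

With this toy model the search moves from {2,2} to {5,5} in 6 steps of 2 evaluations each, 12 evaluations in total, far fewer than a full search needs.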




       Case Study: Receiver Design
[Figure: block diagram of the multicarrier link]
• Transmitter: Data, Encoder, Multicarrier Modulator
• Wireless Channel
• Receiver: Multicarrier Demodulator (w0), Channel Equalizer (w1), Channel Estimator (w2, w3),
  Bit Error Rate Tester


w0   Input wordlength of a multicarrier demodulator which performs
     a fast Fourier transform (FFT)
w1   Input wordlength of equalizer
w2   Input wordlength of channel estimator
w3   Output wordlength of channel estimator




                        Simulation Results
• CDM leads to lower complexity compared to DM
• CDM reduces the number of trials compared to CM, feasible solution search
  [Sung and Kum, 1995], and exhaustive search
    - Fast searching

  Search Method | Gradient Measure | αc  | Number of Trials | Simulations | Wordlengths for Variables | Complexity Estimate | Distortion (BER)*
  Gradient      | DM               | 0   | 16               | 64          | {10,9,4,10}               | 10781               | 0.0009
  Gradient      | CDM              | 0.5 | 15               | 60          | {7,10,4,6}                | 7702                | 0.0012
  Gradient      | CM               | 1   | 69               | 69          | {7,7,4,6}                 | 7699                | 0.0015
  Feasible      | -                | -   | 210              | 210         | {7,7,4,6}                 | 7699                | 0.0015
  Exhaustive    | -                | -   | 26364            | 26364       | -                         | -                   | -

* Required BER ≤ 1.5 × 10^-3




            Simulation Environments
• Assumptions
    - Internal wordlengths of blocks have been decided
    - Complexity increases linearly as wordlength increases
• Required application performance
    - Bit error rate of 1.5 × 10^-3 (without error correcting codes)
• Simulation tool
    - LabVIEW 7.0
• Complexity vector

    Input               Weight
    FFT                 1024
    Equalizer (right)   1
    Estimator           128
    Equalizer (upper)   2

• Complexity
    $$C(\mathbf{w}) = \mathbf{c}^T \mathbf{w}$$
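The linear complexity model can be checked with a few lines of Python. The weights are the ones listed for this case study; mapping them to (w0, w1, w2, w3) in the order FFT, equalizer (right), estimator, equalizer (upper) is an assumption, chosen because it agrees with the wordlength definitions of the receiver case study.

```python
# Complexity weights, assumed to be ordered as (w0, w1, w2, w3):
# FFT input, equalizer (right) input, estimator input, equalizer (upper) input.
WEIGHTS = (1024, 1, 128, 2)


def complexity(w, c=WEIGHTS):
    """Linear complexity estimate C(w) = c^T w."""
    return sum(cn * wn for cn, wn in zip(c, w))


for w in [(10, 9, 4, 10), (7, 10, 4, 6), (7, 7, 4, 6)]:
    print(w, complexity(w))
```

With this ordering the three wordlength vectors give 10781, 7702, and 7699, matching the complexity estimates reported for DM, CDM, and CM in the simulation results.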




                     FFT Cost
• N-point FFT cost
    $$\mathrm{Cost}_{FFT} = \frac{N}{2} \log_2 N$$
• 256-point FFT cost
    $$\mathrm{Cost}_{FFT} = \frac{256}{2} \log_2 256 = 1024$$




               Minimum Wordlengths
• Change one wordlength variable while keeping other variables at high precision
    - {1,16,16,16}, {2,16,16,16}, ...
    - {16,1,16,16}, {16,2,16,16}, ...
    - …
    - …, {16,16,16,15}, {16,16,16,16}
• Minimum wordlength vector is {5,4,4,4}
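A sketch of this sweep in Python, assuming a predicate `meets_spec` that stands in for running the simulation and checking the required performance. The predicate used here is a hypothetical placeholder with fixed thresholds, not the actual BER simulation.

```python
def minimum_wordlengths(meets_spec, num_vars, high=16):
    """For each variable, find the smallest wordlength that still meets the
    specification while every other variable is held at `high` bits."""
    w_min = []
    for n in range(num_vars):
        for bits in range(1, high + 1):
            w = [high] * num_vars
            w[n] = bits                  # sweep only the n-th variable
            if meets_spec(w):
                w_min.append(bits)
                break
        else:
            raise ValueError(f"variable {n} fails even at {high} bits")
    return w_min


# Hypothetical stand-in for the simulation: per-variable thresholds.
THRESHOLDS = (5, 4, 4, 4)


def meets_spec(w):
    return all(wn >= t for wn, t in zip(w, THRESHOLDS))


print(minimum_wordlengths(meets_spec, num_vars=4))
```

The sweep needs at most 16 simulations per variable and gives the starting point for the gradient-based searches.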




                  Number of Trials
• Start at the minimum wordlength vector {5,4,4,4}
• Next wordlength vectors for complexity measure (α = 1.0)
    - {5,4,4,4}, {5,5,4,4}, …
• Increase wordlengths one at a time until the required application
  performance is satisfied
                             Low-Power Signal Processing                                 60




                     Power Consumption
• Power consumption in CMOS circuits
    $$P_{avg} = P_{switching} + P_{short\,circuit} + P_{leakage}$$
• Significant power in CMOS circuits is dissipated when they are switching
    $$P_{switching} = \alpha\, C\, V^2 f$$
    α : Transition factor
    C : Capacitance
    V : Supply voltage
    f : Frequency
• Power reduction in hardware part [Chandrakasan and Brodersen, 1995]
    - Scaling down, minimizing area
    - Adjusting voltage and frequency during operation
• Power reduction in software part [Tiwari, Malik and Wolfe, 1994] [Lee et al., 1997]
    - Instruction ordering and packing
    - Energy reductions varying from 26% to 73%
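A small Python sketch of the switching-power formula. The numeric values for the transition factor, capacitance, voltage, and frequency are hypothetical and serve only to show how each factor scales the result.

```python
def switching_power(alpha, capacitance, voltage, frequency):
    """Switching power P = alpha * C * V^2 * f, in watts for SI inputs."""
    return alpha * capacitance * voltage ** 2 * frequency


# Hypothetical operating point: alpha = 0.5, C = 1 nF, V = 1.8 V, f = 100 MHz.
p_ref = switching_power(0.5, 1e-9, 1.8, 100e6)
p_half_voltage = switching_power(0.5, 1e-9, 0.9, 100e6)
p_half_activity = switching_power(0.25, 1e-9, 1.8, 100e6)

print(p_half_voltage / p_ref)    # quadratic dependence on supply voltage
print(p_half_activity / p_ref)   # linear dependence on transition factor
```

Halving the supply voltage cuts switching power to about a quarter, while halving the transition factor, which is what wordlength reduction targets, cuts it to about a half.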




   Wordlength for Low-Power Consumption

• Power model of wordlength [Choi and Burleson, 1994]
    - Wordlength is considered as capacitance
    - Power consumption is proportional to wordlength
    - Switching activity is not considered
• Data wordlength reduction technique [Han, Evans, and Swartzlander, 2004] (proposed)
    - Count node transitions for switching activity
    - Reduce input data wordlength to decrease power consumption
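As a simplified illustration, the Python sketch below counts bit transitions between adjacent words of an input stream before and after truncation. It only looks at the input bits, not at internal multiplier nodes, and the random data, seed, and function names are assumptions for this example.

```python
import random


def count_transitions(stream, width=16):
    """Total number of bit positions that toggle between adjacent words."""
    mask = (1 << width) - 1
    return sum(bin((a ^ b) & mask).count("1")
               for a, b in zip(stream, stream[1:]))


def truncate(value, removed_bits):
    """Zero the `removed_bits` least significant bits of a word."""
    return value & ~((1 << removed_bits) - 1)


rng = random.Random(0)                       # fixed seed for repeatability
data = [rng.getrandbits(16) for _ in range(2000)]

full = count_transitions(data)
reduced = count_transitions([truncate(x, 8) for x in data])
print(full, reduced)
```

Truncated bits never toggle, so the reduced stream always switches no more than the full-wordlength stream.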




               Dynamic and Static Power




[Figure: trends in dynamic and static power dissipation, showing the increasing contribution of static power.
Source: S. Thompson, P. Packan, and M. Bohr, "MOS Scaling: Transistor Challenges for the 21st Century,"
Intel Technology Journal, Q3 1998]




  Power Dissipation of Multiplier Unit
• Multiply unit is usually a major source of power consumption in typical DSP applications
    - Multiply unit required for digital communication and digital signal processing algorithms
    - Digital filters, equalizers, FFT/IFFT, digital down/upconverter, etc.
[Figure: TMS320C5x power dissipation characteristics, from www.ti.com]




         Wallace vs. Booth Multipliers


[Figure, left: tree dot diagram in 4-bit Wallace multiplier (symmetric)]
[Figure, right: radix-4 multiplier based on Booth's recoding, X · a = P (asymmetric, one operand recoded)]




    Radix-4 Modified Booth Multiplier
• One multiplicand is recoded
• a and X are the multiplicands
• P is the product of the multiplication
• Three bits in X are recoded to z




      Switching Activity in Multipliers
• Logic delay and propagation cause glitches
• Proposed analytical method
    - Hard to estimate glitches in closed form
    - Analyze switching activity with respect to input data wordlength
    - Does not consider multiplier architecture
• Simulation method
    - Count all switching activities (transition counts in logic)
    - Power estimation (Xilinx XPower)
    - Considers multiplier architecture
                    Reduce Power Consumption in Arithmetic                                      67




                   Analytical Method
• Stream of data for one multiplicand
• Compare two adjacent numbers in stream after reduction
• Expectation of bit switching, x, with probability P_X(x)
    $$E(X) = \sum_{x=0}^{L} x\, P_X(x)$$
    - L-bit input data
        $$E_L(X) = \frac{L}{2}$$
    - Truncate input data to M bits (remove N bits)
        $$E_{tr}(X) = \frac{L-N}{2} = \frac{M}{2}$$
    - N-bit signed right shift in L-bit input (Y is sign bit)
        $$E_{rs}(X) = \frac{1}{2} E(X \mid Y=0) + \frac{1}{2} E(X \mid Y=1) = \frac{L}{2}$$
[Figure: bit layouts of an L-bit word: the full L bits; M retained bits followed by N removed bits; the sign bit S replicated into the upper bits after a signed right shift]




                  Analytical Method

• X has binomial distribution
    $$E_{rs}(X) = \frac{1}{2} E(X \mid Y=0) + \frac{1}{2} E(X \mid Y=1)$$
• Always L/2 (independent of M and N)
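These expectations can be checked by brute force for a small wordlength. The Python sketch below assumes adjacent words are independent and uniformly distributed, and uses hypothetical sizes L = 6 and N = 2 (so M = 4); it averages the number of toggling bits over all ordered pairs of words.

```python
from itertools import product

L, N = 6, 2                      # hypothetical sizes; M = L - N retained bits
MASK = (1 << L) - 1


def hamming(a, b):
    """Number of differing bits in the L-bit two's complement patterns."""
    return bin((a ^ b) & MASK).count("1")


def mean_switching(transform):
    """Average toggling bits over all ordered pairs of signed L-bit words."""
    words = [transform(v) for v in range(-(1 << (L - 1)), 1 << (L - 1))]
    pairs = list(product(words, repeat=2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


full = mean_switching(lambda v: v)                       # L-bit input
truncated = mean_switching(lambda v: v & ~((1 << N) - 1))  # N LSBs zeroed
shifted = mean_switching(lambda v: v >> N)               # sign-extending shift
print(full, truncated, shifted)
```

The enumeration gives L/2 for the full input, M/2 after truncation, and L/2 again after the signed right shift, because the replicated sign bits all toggle together.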




              Power Reduction in TI DSP
• TI TMS320VC5416 DSP Starter Kit
    - Radix-4 modified Booth multiplier
    - Measure average current for wordlength reduction of multiplicands
• Assembly program (data_a and data_b hold random data with wordlength w)

   loop:
   STM data_a, AR2;
   STM data_b, AR3;
   MPY *AR2+, *AR3+, a
   MPY *AR2+, *AR3+, a
   ….….
   MPY *AR2+, *AR3+, a
   B loop

[Figure: average current in mA (axis from 574 to 581) versus wordlength w (0 to 16) for the operand wordlength pairs (w,w), (16,w), (w,16), and (wrsh,wrsh)]




  Code Generation for Fixed-Point Program

• Adder function in MATLAB

  (a) Floating-point program for adder

      function [c] = adder(a, b)
      c = 0;
      c = a + b;

  (b) Raw fixed-point program (formats determined by designers with trial and error)

      function [c] = adder_fx(a, b)
      c = 0;
      a = fi(a, 1, 32, 16);
      b = fi(b, 1, 32, 16);
      c = fi(c, 1, 32, 16);
      c(:) = a + b;

  (c) Converted fixed-point program for automating optimization

      function [c] = adder_fx(a, b, numtype)
      c = 0;
      a = fi(a, numtype.a);
      b = fi(b, numtype.b);
      c = fi(c, numtype.c);
      c(:) = a + b;

• fi(a, S, WL, FWL) is a constructor function for a fixed-point object in the
  Fixed-Point Toolbox [S: signed, WL: wordlength, FWL: fraction length]
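For readers without the toolbox, the effect of a (S, WL, FWL) format can be approximated in Python. This is only a sketch: `to_fixed` is a hypothetical helper, not the toolbox's `fi`, and the rounding (Python's round-half-to-even) and saturation behaviour are assumptions. The `numtype` dictionary mirrors the idea of program (c), where formats are passed in rather than hard-coded.

```python
def to_fixed(value, signed, wordlength, fraction_length):
    """Quantize to a fixed-point grid: round to nearest, saturate on overflow."""
    scale = 2 ** fraction_length
    if signed:
        lo, hi = -(2 ** (wordlength - 1)), 2 ** (wordlength - 1) - 1
    else:
        lo, hi = 0, 2 ** wordlength - 1
    stored = max(lo, min(hi, round(value * scale)))   # stored integer
    return stored / scale


def adder_fx(a, b, numtype):
    """Adder whose operand and result formats are supplied by the caller."""
    a = to_fixed(a, *numtype["a"])
    b = to_fixed(b, *numtype["b"])
    return to_fixed(a + b, *numtype["c"])


# Hypothetical formats: signed, 8-bit wordlength, 4-bit fraction length.
numtype = {"a": (True, 8, 4), "b": (True, 8, 4), "c": (True, 8, 4)}
print(adder_fx(1.3, 2.6, numtype))
```

Because the formats are data, an optimizer can rerun the same function with different wordlengths without editing the program.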




                     Code Generation




[Figure: screenshots of the floating-point program and of running code generation]




           Running Transformation
• Just call the top function with input data

               > in = rand(1,1000)
               > mac_top(in)

• Range and optimum wordlengths depend on input statistics


Advantages/disadvantages of
wordlength search algorithms

								