
									Embedded ISA Support for Enhanced Floating-Point
      to Fixed-Point ANSI C Compilation


                Tor Aamodt and Paul Chow
                  University of Toronto

                     { aamodt, pc }@eecg.utoronto.ca




         3rd ACM International Conference on Compilers, Architectures and Synthesis for
         Embedded Systems, Nov. 17-18th, 2000, San Jose CA
    What is this presentation about?

FOCUS: Signal processing applications developed
 using high-level language representation and
 floating-point data types...
WANT: Faster fixed-point software development...

QUESTION: Are there “better” fixed-point DSP
 instruction-sets in terms of runtime, power, or
 roundoff-noise performance?


    Tor Aamodt & Paul Chow    Embedded ISA Support for Enhanced Floating-
      University of Toronto     Point to Fixed-Point ANSI C Compilation     Slide 2 of 32
             Presentation Outline
Motivation & Background
Focus on…
     Automatic Conversion to Fixed-Point
     Architectural Enhancements
     Some Experimental Results

Summary / Future Directions

                             Motivation

80% of DSPs in use are Fixed-Point. Why?

Because fixed-point hardware is cheaper and
 uses less power …

… however, it is much harder to develop
 signal-processing software for.

                                 Background


 UTDSP Project: DSP Compiler/Architecture Co-design
    Traditional DSP architectures are hard for compilers to generate
     efficient code for, e.g., extended-precision accumulators
    First Generation Silicon Sept. 30, 1999: 108 pin PGA 0.35 µm
     CMOS / 63 MHz (Sean Peng's M.A.Sc.)
    16-bit Fixed-Point VLIW DSP with novel 2-level Instruction
     fetching architecture (reduced pin-count)


 June 2000: Synopsys CoCentric Fixed-Point Designer Tool
    First commercial tool for transforming floating-point ANSI C
     programs into fixed-point ($20,000 US)


    Background: Fixed-Point versus Floating-Point

32 bit Floating-Point (IEEE):
   sign bit | 8 bit exponent (excess 127) | 23+1 bit normalized mantissa
   (explicit binary-point)

Fixed-Point:
   sign bit | Integer Part | Fractional Part
   (implied binary-point)
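
As a rough illustration of the implied binary point, the sketch below converts between a real value and a 16-bit fixed-point word. The helper names `to_fixed` and `to_double`, the 16-bit word size, the round-to-nearest step, and the saturation on overflow are all assumptions made for this example; the slides do not specify them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 1 sign bit, iwl integer bits, (15 - iwl) fractional
 * bits. The binary point is implied by iwl; it is not stored in the word. */
int16_t to_fixed(double v, int iwl)
{
    assert(iwl >= 0 && iwl <= 15);
    double scaled = v * (double)(1L << (15 - iwl));
    /* Round to nearest; the cast below truncates toward zero. */
    double rounded = scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5;
    if (rounded >= 32767.0) return INT16_MAX;   /* saturate on overflow */
    if (rounded <= -32768.0) return INT16_MIN;
    return (int16_t)rounded;
}

double to_double(int16_t q, int iwl)
{
    assert(iwl >= 0 && iwl <= 15);
    return (double)q / (double)(1L << (15 - iwl));
}
```

With `iwl = 2`, for instance, the value 3.25 is stored as 3.25 * 2^13 = 26624, and the same word read with a different `iwl` would mean a different value.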


          Background: Using Fixed-Point Arithmetic




Floating-Point:   y[n] = y[n-1] + x[n]

Fixed-Point:      y[n] = ((• y[n-1] >> 3) + x[n]) << 1

           Explicit Scaling Operations


        Automatic Conversion Process

              Traditional Optimizing Compiler:

 Input Program -> Parser -> Optimizer -> Code Generator -> Processor




        • CONSTRAINT:                    Input/Output Invariance
        • GOAL:                          Application Speedup


        i.e., make the code faster, but do not break anything!


          Automatic Conversion Process
              Traditional Optimizing Compiler:

 Input Program -> Parser -> Optimizer -> Code Generator -> Processor

 Sample Inputs -> Floating-Point to Fixed-Point Translator


• “RELAX” CONSTRAINTS…
• GOALS: “Good” Input/Output Fidelity (e.g., good signal-to-noise ratio)
         Fast/Low-Power Operation (10-500× faster than FP emulation)


         Floating-Point to Fixed-Point Translation



Floating-point:   float a, b, x[N];
                  y = a*x[i] + b*x[i+1];

Fixed-point:      int a, b, x[N];
                  y = (a•x[i] >> 2) + b•x[i+1];



1. Type Conversion
2. Scaling Operations
3. Fractional Fixed-Point Operations
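
The three steps can be sketched in C for a 16-bit word. This is only an illustration: the names `fmul_q15` and `weighted_sum` are hypothetical, the Q15 layout (15 fractional bits) is an assumption, and the `>> 2` scaling is copied from the example above rather than derived. The code also assumes that `>>` on a negative value is an arithmetic shift, which is implementation-defined in C but common in practice.

```c
#include <assert.h>
#include <stdint.h>

/* Step 3: fractional multiply. The 32-bit product of two Q15 operands has
 * 30 fractional bits; shifting right by 15 returns it to Q15.
 * (-1.0 * -1.0 does not fit in Q15 and is not handled here.) */
int16_t fmul_q15(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    return (int16_t)(p >> 15);
}

/* Steps 1 and 2: integer types replace float, and an explicit scaling
 * shift aligns the first product with the second before the addition. */
int16_t weighted_sum(int16_t a, int16_t b, const int16_t *x, int i)
{
    assert(x && i >= 0);
    return (int16_t)((fmul_q15(a, x[i]) >> 2) + fmul_q15(b, x[i + 1]));
}
```

Note the parentheses around the shifted product: in C, `>>` binds less tightly than `+`, so leaving them out would change the meaning of the expression.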




            Floating-Point to Fixed-Point Translator


SUIF Parser* -> Identifier Assignment -> Instrument Code -> Profile
             -> Fixed-Point Conversion -> Optimizer

Sample Inputs -> Profile

*SUIF = Stanford University Intermediate Format
 See: http://suif.stanford.edu

           Collecting Dynamic Range Information

Consider the ANSI C code:
   float a, b, x[N];
   y = a*x[i] + b*x[i+1];

Equivalent Expression Tree, with ID Assignment:
   “0” : y      = tmp_1 + tmp_2
   “1” : tmp_1  = a * x[i]
   “2” : tmp_2  = b * x[i+1]

Code Instrumentation:
   tmp_1 = a*x[i];
   profile(tmp_1,1);
   tmp_2 = b*x[i+1];
   profile(tmp_2,2);
   y = tmp_1 + tmp_2;
   profile(y,0);
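
One possible shape for the profiling hook is sketched below. The slides only show the call `profile(value, id)`; recording the largest magnitude seen per identifier, the table size `MAX_IDS`, and the helper `profiled_max` are assumptions made for this example.

```c
#include <assert.h>

#define MAX_IDS 64                  /* hypothetical table size */
static double max_abs[MAX_IDS];     /* largest |value| seen for each ID */

/* Record the dynamic range of the intermediate result tagged with id. */
void profile(double value, int id)
{
    assert(id >= 0 && id < MAX_IDS);
    double mag = value < 0.0 ? -value : value;
    if (mag > max_abs[id]) max_abs[id] = mag;
}

double profiled_max(int id)
{
    assert(id >= 0 && id < MAX_IDS);
    return max_abs[id];
}

/* The instrumented expression: every intermediate result is profiled. */
float instrumented(float a, float b, const float *x, int i)
{
    float tmp_1 = a * x[i];      profile(tmp_1, 1);
    float tmp_2 = b * x[i + 1];  profile(tmp_2, 2);
    float y = tmp_1 + tmp_2;     profile(y, 0);
    return y;
}
```

Running the instrumented code over the sample inputs leaves one maximum per identifier, which is the range information a later scaling step would consume.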



            Generating Scaling Operations


Signal Scaling: Integer Word Length (IWL)
  definition:               IWL[x] = floor(log2(max|x|)) + 1

  Word layout:   Sign bit | Integer Part (IWL bits) | Fractional Part
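
A small sketch of the IWL computation follows. It assumes the definition is read as floor(log2(max|x|)) + 1 and that the input is the largest magnitude found while profiling; the function name `iwl_from_max` is hypothetical. Halving and doubling are used instead of a library logarithm so the block needs no math library.

```c
#include <assert.h>

/* Returns floor(log2(max_abs)) + 1 for max_abs > 0.
 * The loops bring max_abs into [0.5, 1) while counting the powers of two. */
int iwl_from_max(double max_abs)
{
    assert(max_abs > 0.0);
    int iwl = 0;
    while (max_abs >= 1.0) { max_abs /= 2.0; iwl++; }
    while (max_abs < 0.5)  { max_abs *= 2.0; iwl--; }
    return iwl;
}
```

A signal whose magnitude never reaches 1.0 gets an IWL of 0 or less, meaning every bit after the sign bit can be spent on the fractional part.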



              Generating Scaling Operations


Example: “A op B”:

  Converted sub-expressions A and B each carry two values:
      IWL_{A,measured}, IWL_{A,current}
      IWL_{B,measured}, IWL_{B,current}

  For the result of op:
      IWL_{A op B,measured}  (known from profiling)
      IWL_{A op B,current} = ?


                             Automatic Conversion Process:

 IRP: Using Intermediate Result Profile Data
Previous Algorithms:
   ‘Worst-Case Evaluation’: Markus Willems et al. FRIDGE:
      An Interactive Code Generation Environment for HW/SW CoDesign.
      ICASSP, April 1997. (a.k.a. predecessor to the Synopsys CoCentric
      Fixed-Point Designer Tool)

   A ‘Statistical’ Approach: Ki-Il Kum, Jiyang Kang, and
      Wonyong Sung. A Floating-Point to Fixed-Point C Converter for
      Fixed-Point Digital Signal Processors. In Proc. 2nd SUIF Compiler
      Workshop, August 1997.

Neither uses Intermediate Result Profile data;
 instead, they combine range information from leaf
 nodes. Is useful information lost?
       IRP: Additive Operations
“A ± B”       For example, assume |A| > |B|, and
              IWL_{A+B,measured} ≤ IWL_{A,measured}

  Alignment: B is shifted right by n bits (>> n) so that its binary
  point lines up with that of A.

  “A ± B”  becomes  “(A << n_A) ± (B >> [n - n_B])”
  where:   n_A = IWL_{A,current}  - IWL_{A,measured}
           n_B = IWL_{B,current}  - IWL_{B,measured}
           n   = IWL_{A,measured} - IWL_{B,measured}

  IWL_{A+B,current} = IWL_{A,measured}
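
The additive rule can be tried out with the sketch below. The function names are hypothetical, values are held in 32-bit integers so the shifts cannot overflow in the example, and the case shown is the one on the slide (the measured IWL of A is at least that of B, and the sum fits in the IWL of A). Right-shifting a negative value is assumed to be an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Shift left for n >= 0, right for n < 0. */
static int32_t shift(int32_t x, int n)
{
    assert(n > -32 && n < 32);
    if (n >= 0) return (int32_t)((uint32_t)x << n);
    return x >> -n;
}

/* (A << n_A) + (B >> (n - n_B)); the result has IWL equal to a_meas. */
int32_t irp_add(int32_t a, int a_cur, int a_meas,
                int32_t b, int b_cur, int b_meas)
{
    int n_a = a_cur - a_meas;
    int n_b = b_cur - b_meas;
    int n   = a_meas - b_meas;
    assert(n >= 0);                 /* the case covered by the slide */
    return shift(a, n_a) + shift(b, -(n - n_b));
}
```

For example, with a 16-bit scale, A = 3.0 held at IWL 3 (12288) but measured at IWL 2, and B = 0.5 held at IWL 1 (8192) but measured at IWL 0, the call returns 28672, which is 3.5 at IWL 2.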


              IRP: Multiplication

   “A • B”  becomes  “(A << n_A) • (B << n_B)”

    where:   n_A = IWL_{A,current} - IWL_{A,measured}
             n_B = IWL_{B,current} - IWL_{B,measured}

  IWL_{A•B,current} = IWL_{A,measured} + IWL_{B,measured}
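
A minimal sketch of the multiplication rule for 16-bit operands is given below. The name `irp_mul` and the use of a Q15-style fractional multiply (product shifted right by 15) are assumptions for illustration; the code also assumes the normalized operands still fit in 16 bits and that `>>` on a negative value is arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* (A << n_A) * (B << n_B) as a fractional multiply.
 * On return, *iwl_out holds a_meas + b_meas, the IWL of the product. */
int16_t irp_mul(int16_t a, int a_cur, int a_meas,
                int16_t b, int b_cur, int b_meas, int *iwl_out)
{
    int n_a = a_cur - a_meas;
    int n_b = b_cur - b_meas;
    assert(n_a >= 0 && n_a < 16 && n_b >= 0 && n_b < 16);
    /* Normalize by multiplying, which avoids left-shifting negatives. */
    int32_t an = (int32_t)a * (1 << n_a);
    int32_t bn = (int32_t)b * (1 << n_b);
    int64_t product = (int64_t)an * bn;
    *iwl_out = a_meas + b_meas;
    return (int16_t)(product >> 15);
}
```

With A = 1.5 held at IWL 2 (12288) but measured at IWL 1, and B = 0.75 held at IWL 1 (12288) but measured at IWL 0, the result is 18432, which is 1.125 at IWL 1.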




                            IRP: Division

“A / B”  becomes  “(A >> [n_{dividend} - n_A]) / (B << n_B)”

    n_A = IWL_{A,current} - IWL_{A,measured}
    n_B = IWL_{B,current} - IWL_{B,measured}
    n_{diff} = IWL_{A/B,measured} - IWL_{A,measured} + IWL_{B,measured}

    n_{dividend} = n_{diff}   if n_{diff} ≥ 0
                 = 0          otherwise
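
The division rule might look as follows in C. Several things here are assumptions rather than statements from the slides: the function name `irp_div`, the 16-bit scale, the fractional divide implemented as (dividend * 2^15) / divisor, and the reading that the pre-shift of the dividend exists to make room for a quotient larger than the operand IWLs alone would allow.

```c
#include <assert.h>
#include <stdint.h>

/* Shift left for n >= 0, right for n < 0 (arithmetic shift assumed). */
static int32_t shift(int32_t x, int n)
{
    assert(n > -32 && n < 32);
    if (n >= 0) return (int32_t)((uint32_t)x << n);
    return x >> -n;
}

/* (A >> (n_dividend - n_A)) / (B << n_B), as a fractional divide. */
int32_t irp_div(int32_t a, int a_cur, int a_meas,
                int32_t b, int b_cur, int b_meas, int q_meas)
{
    int n_a = a_cur - a_meas;
    int n_b = b_cur - b_meas;
    int n_diff = q_meas - a_meas + b_meas;
    int n_dividend = n_diff >= 0 ? n_diff : 0;
    int32_t dividend = shift(a, n_a - n_dividend);
    int32_t divisor  = shift(b, n_b);
    assert(divisor != 0);
    /* Assumed fractional divide on a 16-bit scale. */
    return (int32_t)(((int64_t)dividend * 32768) / divisor);
}
```

For A = 1.5 at IWL 1 (24576) and B = 0.25 at IWL 0 (8192), with the quotient measured at IWL 3, the dividend is pre-shifted right by 2 and the call returns 24576, which is 6.0 at IWL 3.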




IRP-SA: Using ‘Shift Absorption’
Example:

       y = (a*x[i] + (b*x[i+1]>>1)) << 1


Question: Is information discarded unnecessarily here?
Consider the following alternative:

        y = (a*x[i]<<1) + b*x[i+1]

BUT: Can we really discard most significant bits and
get roughly the same answer? YES!
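
Why discarding most significant bits can be harmless is easiest to see with wraparound arithmetic: if the final result fits in the word, intermediate overflows cancel out. The sketch below compares the two forms on 16-bit words, with `p` standing for `a*x[i]` and `q` for `b*x[i+1]`. The function names are hypothetical, and the conversion back to `int16_t` is assumed to wrap, as it does on common two's complement targets.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 16 bits, as a 16-bit datapath would. */
static int16_t wrap16(int32_t v)
{
    return (int16_t)(uint16_t)(uint32_t)v;
}

/* y = (p + (q >> 1)) << 1 : the right shift drops the low bit of q. */
int16_t irp_form(int16_t p, int16_t q)
{
    return wrap16((p + (q >> 1)) * 2);
}

/* y = (p << 1) + q : the left shift may overflow, the sum does not. */
int16_t irp_sa_form(int16_t p, int16_t q)
{
    int32_t exact = 2 * (int32_t)p + q;
    assert(exact >= INT16_MIN && exact <= INT16_MAX); /* final result fits */
    int16_t p2 = wrap16((int32_t)p * 2);  /* most significant bit discarded */
    return wrap16((int32_t)p2 + q);
}
```

With p = -20000 and q = 30001, doubling p overflows 16 bits, yet `irp_sa_form` returns the exact value -9999, while `irp_form` returns -10000 because it threw away the low bit of q.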
                   Architectural Support
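Common occurrence (using IRP-SA):    A•B << n

Fractional Multiplication with internal Left Shift

[Figure: operand A with integer part IWL_A, operand B with integer part
 IWL_B, and the product A*B with integer part IWL_A + IWL_B, of which the
 leading n bits are shifted out]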
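
A software emulation of the operation could look like the sketch below. It assumes 16-bit operands with a 32-bit internal product, truncation rather than rounding, and wraparound when the discarded leading bits are not redundant; the names `fmls` and `fmul_then_shift` are hypothetical. The second function shows the two-instruction alternative for comparison (it assumes `>>` on a negative value is arithmetic).

```c
#include <assert.h>
#include <stdint.h>

/* Fractional multiply with internal left shift by n: select product bits
 * [15-n .. 30-n], dropping the top n bits and keeping n extra low bits. */
int16_t fmls(int16_t a, int16_t b, int n)
{
    assert(n >= 0 && n <= 15);
    int32_t p = (int32_t)a * (int32_t)b;
    return (int16_t)(uint16_t)((uint32_t)p >> (15 - n));
}

/* Plain fractional multiply followed by a separate left shift: the n low
 * bits were already discarded and come back as zeros. */
int16_t fmul_then_shift(int16_t a, int16_t b, int n)
{
    assert(n >= 0 && n <= 15);
    int32_t p = (int32_t)a * (int32_t)b;
    return (int16_t)(uint16_t)((uint32_t)(p >> 15) << n);
}
```

For a = b = 1001 and n = 3, `fmls` yields 244 while `fmul_then_shift` yields 240: the internal shift preserves three bits of precision that the separate shift has already lost.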

            Experimental Results
                                Benchmarks
 4th Order Cascaded/Parallel IIR Filter (IIR-C, IIR-P)
 (Normalized) Lattice Filter (LAT, NLAT)
 128-Point Radix 2 Decimation in Time FFT (FFT-NR, FFT-MW)
 Levinson-Durbin Recursion (LEVDUR)
 10x10 Matrix-Multiply (MMUL10)
 Nonlinear Control (INVPEND)
 Trig Function (SIN)




SQNR Enhancement: FMLS and/or IRP-SA

[Bar chart. Y-axis: Equivalent Bits (-0.5 to 2).
 Series: IRP-SA, FMLS, IRP-SA w/ FMLS.
 Benchmarks: IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, INVPEND, LEVDUR,
 MMUL10, SIN]




  What Is The Effect of “Shift Absorption” ?

[Bar chart: Distribution of Fractional Multiply Output Shifts.
 Y-axis: Relative Frequency (0 to 0.8).
 X-axis: FMLS Output Shift Distance (3 left, 2 left, 1 left, none, 1 right).
 Series: IRP, IRP-SA]


                                  Experimental Results:


  Rotational Inverted Pendulum
U of T System Control Group
Non-linear Testbench




 Closed-Loop System Response: Rotational Inverted Pendulum
               12-bit Controller Comparison

                                  WC (Worst-Case Evaluation): 32.8 dB
                                  IRP-SA:                     41.1 dB
                                  IRP-SA w/ FMLS:             48.0 dB
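
To relate these dB figures to the "equivalent bits" used elsewhere in the talk, one common rule of thumb is that each extra bit of precision is worth about 6.02 dB of SQNR. Whether the slides use exactly this conversion is an assumption; the function name and the constant below are illustrative only.

```c
#include <assert.h>

#define DB_PER_BIT 6.02   /* assumed: about 20*log10(2) dB per bit */

/* Convert an SQNR improvement over a baseline into equivalent bits. */
double equivalent_bits(double sqnr_db, double baseline_db)
{
    /* NaN is the only value that compares unequal to itself. */
    assert(sqnr_db == sqnr_db && baseline_db == baseline_db);
    return (sqnr_db - baseline_db) / DB_PER_BIT;
}
```

Under that assumption, the step from 32.8 dB to 41.1 dB corresponds to roughly 1.4 bits, and the step from 32.8 dB to 48.0 dB to roughly 2.5 bits.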



               128-Point Radix-2 FFT
     (Generated by MATLAB RealTime Workshop)




                            Speedup?
              Rotational Inverted Pendulum:
              Fractional Multiply Output Shift Relative Frequencies




                              …Yup!




                                      Speedup* Using FMLS

[Bar chart. Y-axis: Relative Speedup (1 to 1.4).
 Series:
    Limiting
    8-FMUL = { 4 left thru 3 right }
    4-FMUL = { 2 left thru 1 right }
    2-FMUL = { one left, no shift }
 Benchmarks: IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, LEVDUR, MMUL10,
 INVPEND, SIN]
SQNR Enhancement for various Output Shift Sets

[Bar chart. Y-axis: Equivalent Bits (0 to 2).
 Series: Limiting, 8-FMUL, 4-FMUL, 2-FMUL.
 Benchmarks: IIR4-C, IIR4-P, NLAT, LAT, FFT-NR, FFT-MW, LEVDUR, MMUL10,
 INVPEND, SIN]




                               Summary
The Fractional Multiply with internal Left Shift
 (FMLS) operation can improve both runtime and
 signal-to-noise performance: speedups of up to
 35%, and SQNR enhancement equivalent to up to
 2 bits, or even 4 bits depending on how it is
 measured.

FMLS is easy to implement in VLSI and easy for the
 compiler to use.

                  Future Directions
Higher Level Transformations:
   Automatic Generation of Block-Floating-Point...
   Quantization Error Feedback…
   BOTH need signal-flow-graph representation…
       therefore probably need a better DSP language
       than ANSI C


Variable Precision Arithmetic (How much
 precision does each operation need?)


								