
									Faster Math Functions

           Robin Green
           R&D Programmer
  Sony Computer Entertainment America
What Is This Talk About?
   This is an Advanced Lecture
    • There will be equations
    • Programming experience is assumed


   Writing your own Math functions
    •   Optimize for Speed
    •   Optimize for Accuracy
    •   Optimize for Space
    •   Understand the trade-offs
Running Order
   Part One – 10:00 to 11:00
    • Floating Point Recap
    • Measuring Error
    • Incremental Methods
       - Sine and Cosine


   Part Two – 11:15 to 12:30
    • Table Based Methods
    • Range Reduction
    • Polynomial Approximation
Running Order
   Part Three – 2:00 to 4:00
    • Fast Polynomial Evaluation
    • Higher Order functions
       - Tangent
       - Arctangent, Arcsine and Arccosine


   Part Four – 4:15 to 6:00
    • More Functions
       - Exponent and Logarithm
       - Raising to a Power
    • Q&A
Floating Point Formats
32-bit Single Precision Float
Floating Point Standards
   IEEE 754 is undergoing revision.
    • In process right now.


   Get to know the issues.
    • Quiet and Signaling NaNs.
    • Specifying Transcendental Functions.
    • Fused Multiply-Add instructions.
History of IEEE 754
History of IEEE 754
   IEEE754 ratified in 1985 after 8 years of
    meetings.

   A story of pride, ignorance, political
    intrigue, industrial secrets and genius.

   A battle of Good Enough vs. The Best.
Timeline: The Dark Ages
   Tower of Babel
    • On one machine, values acted as non-zero for add/subtract
      and zero for multiply-divide.
         b = b * 1.0;
         if(b==0.0) error;
         else return a/b;

    • On another platform, some values would overflow if
      multiplied by 1.0, but could grow by addition.
    • On another platform, multiplying by 1.0 would remove the
      lowest 4 bits of your value.
    • Programmers got used to storing numbers like this
         b = (a + a) - a;
Timeline: 8087 needs “The Best”
   Intel decided the 8087 has to appeal to
    the new mass market.

    • Help “normal” programmers avoid the
      counterintuitive traps.

    • Full math library in hardware, using only 40,000
      gates.

    • Kahan, Coonen and Stone prepare draft spec, the
      K-C-S document.
Timeline: IEEE Meetings
   Nat Semi, IBM, DEC, Zilog, Motorola,
    Intel all present specifications.
    • Cray and CDC do not attend…
   DEC with VAX has largest installed base.
    • Double float had 8-bit exponent.
    • Added an 11-bit “G” format to match K-C-S, but
      with a different exponent bias.
   K-C-S has mixed response.
    • Looks complicated and expensive to build.
    • But there is a rationale behind every detail.
Timeline: The Big Argument
 K-C-S specified Gradual Underflow.
 DEC didn’t.
Timeline: The Big Argument
   Both Cray and VAX had no way of
    detecting flush-to-zero.

   Experienced programmers could add
    extra code to handle these exceptions.

   How to measure the Cost/Benefit ratio?
Timeline: Trench Warfare
   DEC vs. Intel
    • DEC argued that Gradual Underflow was
      impossible to implement on VAX and too
      expensive.
    • Intel had cheap solutions that they couldn’t share
      (similar to a pipelined cache miss).


   Advocates fought for every inch
    • George Taylor from U.C.Berkeley built a drop-in
      VAX replacement FPU.
    • The argument for “impossible to build” was
      broken.
Timeline: Trench Warfare
   DEC turned to theoretical arguments
    • If DEC could show that GU was unnecessary then
      K-C-S would be forced to be identical to VAX.


   K-C-S had hard working advocates
    • Prof Donald Knuth, programming guru.
    • Dr. J.H. Wilkinson, error-analysis & FORTRAN.


   Then DEC decided to force the
    impasse…
Timeline: Showdown
   DEC found themselves a hired gun
    • U.Maryland Prof G.W.Stewart III, a highly respected
      numerical analyst and independent researcher


   In 1981 in Boston, he delivered his verdict
    verbally…


       “On balance, I think Gradual Underflow is
                the right thing to do.”
Timeline: Aftermath
   By 1984, IEEE 754 had been
    implemented in hardware by:
       •   Intel   •   Nat. Semi.
       •   AMD     •   Weitek
       •   Apple   •   Zilog
       •   IBM     •   AT&T



   It was the de facto standard long before
    being a published standard.
Why IEEE 754 is best
The Format
   Sign, Exponent, Mantissa
     • The mantissa is more formally called the “significand”

   Why base2?
    • Base2 has the smallest “wobble”.
    • Base2 also has the hidden bit.
       - More accuracy than any other base for N bits.
        - Arguments for base 3 always rely on fixed-point values


   Why 32, 64 and 80-bit formats?
    • Because 8087 could only do 64-bits of carry
      propagation in a cycle!
Why A Biased Exponent?
   For sorting.
   Biased towards underflow.

              exp_max = 127;
              exp_min = -126;

     •   Reciprocals of the smallest numbers will never overflow.
     •   Reciprocals of the largest numbers will use Gradual Underflow.
The Format
   Note the Symmetry
      1   11111111   ???????????????????????   Not A Number

      1   11111111   00000000000000000000000   Negative Infinity

      1   11111110   ???????????????????????   Negative Numbers

      1   00000000   ??????????????????????1   Negative Denormal

      1   00000000   00000000000000000000000   Negative Zero

      0   00000000   00000000000000000000000   Positive Zero

      0   00000000   ??????????????????????1   Positive Denormal

      0   00000001   ???????????????????????   Positive Numbers

      0   11111111   00000000000000000000000   Positive Infinity

      0   11111111   ???????????????????????   Not A Number
Rounding
   IEEE says operations must be exactly rounded,
    with ties rounded towards even.

   Why towards even?
    • To stop iterations slewing towards infinity.
    • Cheap to do using hidden “guard digits”.


   Why support different rounding modes?
    • Used in special algorithms, e.g. decimal to binary
      conversion.
Rounding
   How to round irrational numbers?
     • Impossible to round infinitely long results accurately.
     • Called the Table Maker's Dilemma.
       - In order to calculate the correct rounding, you need to
         calculate worst case values to infinite precision.
       - E.g. Sin(x) = 0.02310000000000000007


   IEEE754 just doesn’t specify functions
    • Recent work looking into worst case values
Special Values
   Zero
    • 0.0 = 0x00000000


   NaN
     • Not a Number.
    • NaN = sqrt(-x), 0*infinity, 0/0, etc.
    • Propagates into later expressions.
Special Values
   ±Infinity
    • Allows calculation to continue without overflow.

   Why does 0/0=NaN when ±x/0=±infinity?
    • Because of limit values.
    • a/b can approach many values, e.g.

              sin x      
                       1 
                 x
            1  cosx      as x  0
                         0
                 x         
Signed Zero
   Basically, WTF?
    • Guaranteed that +0 = -0, so no worries.


   Used to recover the sign of an
    overflowed value
     • Allows 1/(1/x) = x as x → ±infinity
    • Allows log(0)=-inf and log(-x)=NaN
    • In complex math, sqrt(1/-1) = 1/sqrt(-1) only
      works if you have signed zero
Destructive Cancellation
   The nastiest problem in floating point.
   Caused by subtracting two very similar values
      • For example, in the quadratic equation when b^2 ≈ 4ac
    • In fixed point…


            1.10010011010010010011101
          - 1.10010011010010010011100

              0.00000000000000000000001

    • Which gets renormalised with no signal that almost all digits
      have been lost.
Compiler “Optimizations”
   Floating Point does not obey the laws of
    algebra.
     • Replace x/2 with 0.5*x – good
     • Replace x/10 with 0.1*x – bad
     • Replace x*y-x*z with x*(y-z) – bad if y≈z
     • Replace (x+y)+z with x+(y+z) – bad


   A good compiler will not alter or reorder
    floating point expressions.
    • Compilers should flag bad constants, e.g.

          float x = 1.0e-40;
Decimal to Binary Conversion
   In order to reconstruct the correct binary value
    from a decimal constant


            Single float : 9 digits
            Double float : 17 digits

    • Loose proof in the Proceedings
       - works by analyzing the number of representable values in sub-
         ranges of the number line, showing a need for between 6 and
         9 decimal digits for single precision
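
    • As a quick check of those digit counts, a minimal C sketch
      (assuming printf/strtof round-trip through decimal text):

      #include <stdio.h>
      #include <stdlib.h>

      int main(void)
      {
          float f = 0.1f;
          char buf[64];
          sprintf(buf, "%.9g", f);           /* 9 digits for a single float  */
          printf("float  round-trips: %s\n", (strtof(buf, NULL) == f) ? "yes" : "no");

          double d = 0.1;
          sprintf(buf, "%.17g", d);          /* 17 digits for a double float */
          printf("double round-trips: %s\n", (strtod(buf, NULL) == d) ? "yes" : "no");
          return 0;
      }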
Approximation Error
Measuring Error
   Absolute Error
     • Measures the size of the deviation, but tells us nothing
       about its significance
    • The abs() is often ignored for graphing




       error_abs = | f_actual - f_approx |
Measuring Error
   Absolute Error sometimes written ULPs
    • Units in the Last Place


       Approx         Actual      ULPs


       0.0312         0.0314      2


       0.0314         0.0314159   0.159
Measuring Error
   Relative Error
    • A measure of how important the error is.




            error_rel = | 1 - f_approx / f_actual |
Example: Smoothstep Function
   Used for ease-in ease-out animations
    and anti-aliasing hard edges
    • Flat tangents at x=0 and x=1




           f(x) = ( 1 - cos(π·x) ) / 2
Smoothstep Function
Smoothstep Approximation
   A cheap polynomial approximation
    • From the family of Hermite blending functions.




          f_approx(x) = 3x^2 - 2x^3
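
    • As a minimal C sketch (the function name is mine):

      /* Hermite smoothstep approximation: 3x^2 - 2x^3, flat at x=0 and x=1 */
      float smoothstep_poly(float x)
      {
          return x * x * (3.0f - 2.0f * x);
      }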
Smoothstep Approximation
Absolute Error
Relative Error
Relative Error Detail
Incremental Algorithms
Incremental Methods
Q: What is the fastest method to calculate
   sine and cosine of an angle?

A: Just two instructions.
  There are however two provisos.

  1. You have a previous answer to the problem.
  2. You are taking equally spaced steps.
Resonant Filter
    Example using 64 steps per cycle.

    NOTE: the new s uses the previously updated c.

       int N = 64;
       float a = sinf(2.0f*PI/N);
       float c = 1.0f;
       float s = 0.0f;
       for(int i=0; i<M; ++i) {
         output_sin = s;
         output_cos = c;
         c = c - s*a;
         s = s + c*a;
         ...
       }
Resonant Filter Graph
Resonant Filter Quarter Circle
Goertzels Algorithm
    A more accurate algorithm
     • Uses two previous samples (Second Order)
     • Calculates x = sin(a+n*b) for all integer n

       float cb = 2.0f*cosf(b);
       float s2 = sinf(a+b);
       float s1 = sinf(a+2.0f*b);
       float c2 = cosf(a+b);
       float c1 = cosf(a+2.0f*b);
       float s,c;
       for(int i=0; i<m; ++i) {
         s = cb*s1 - s2;
         c = cb*c1 - c2;
         s2 = s1; c2 = c1;
         s1 = s;  c1 = c;
         output_sin = s;
         output_cos = c;
         ...
       }
Goertzels Algorithm Graph
Goertzels Initialization
   Needs careful initialization
    • You must account for a three iteration lag



       // N steps over 2*PI radians
       float b = 2.0f*PI/N;

       // subtract three steps from the initial value
       float new_a = a - 3.0f * b;
Goertzels Algorithm Quarter Circle
Table Based Solutions
Table Based Algorithms
   Traditionally the sine/cosine table was the
    fastest possible algorithm
    • With slow memory accesses, it no longer is


   New architectures resurrect the technique
    • Vector processors with closely coupled memory
    • Large caches with small tables forced in-cache


   Calculate point samples of the function
    • Hash off the input value to find the nearest samples
    • Interpolate these closest samples to get the result
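
    • A minimal C sketch of such a table (the table size, names and the
      assumption that x is already reduced into [0..2π) are mine):

      #include <math.h>

      #define TABLE_SIZE 256
      static const float TWO_PI = 6.2831853f;
      static float sintable[TABLE_SIZE + 1];     /* extra entry so i+1 is always valid */

      void init_sintable(void)
      {
          for (int i = 0; i <= TABLE_SIZE; ++i)
              sintable[i] = sinf((float)i * TWO_PI / TABLE_SIZE);
      }

      float table_sin_lerp(float x)              /* x assumed in [0..2π) */
      {
          float t = x * (TABLE_SIZE / TWO_PI);   /* hash into table space      */
          int   i = (int)t;                      /* index of the lower sample  */
          float a = t - (float)i;                /* fractional part            */
          return sintable[i] + a * (sintable[i + 1] - sintable[i]);
      }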
Table Based Sine
Table Based Sine Error
Precalculating Gradients
   Given an index i, the approximation is…


    sin x  table i   * (table i  1  table i)
             table i   * gradient i

   Which fits nicely into a 4-vector…


        sine       cosine     sin-grad cos-grad
How Accurate Is My Table?
   The largest error occurs when two samples
    straddle the highest curvature.
      • Given a stepsize of ∆x, the error E is:


                  E = 1 - cos( ∆x / 2 )

     • e.g. for 16 samples, the error will be:


                  1 - cos( π/16 ) = 0.0192147
How Big Should My Table Be?
   Turning the problem around, how big should a
    table be for an accuracy of E?
    • We just invert the expression…


                        E  1%
             1  cos N   1%
                 cos N   1  0.01
                       N   arccos0.99
                           N  22.19587...
                           N  23
How Big Should My Table Be?
   We can replace the arccos() with a small angle
     approximation, giving us a looser bound.


                 N ≥ π / sqrt(2E)
   Applying this to different accuracies gives us a
    feel for where tables are best used.
Table Sizes
                     E          360°        45°
 1% accurate         0.01       23          3
 0.1% accurate       0.001      71          9
 0.01% accurate      0.0001     223         28
 1 degree            0.01745    17          3
 0.1 degree          0.001745   54          7
 8-bit int           2^-7       26          4
 16-bit int          2^-15      403         51
 24-bit float        10^-5      703         88
 32-bit float        10^-7      7025        880
 64-bit float        10^-17     ~infinite   8.7e+8
Range Reduction
Range Reduction
   We need to map an infinite range of input
    values x onto a finite working range [0..C].

   For most transcendentals we use a technique
    called “Additive Range Reduction”
    • Works like y = x mod C but without a divide.
    • We just work out how many copies of C to subtract from x
      to get it within the target range.
Additive Range Reduction
1.   We remap 0..C onto the 0..1 range by scaling

        const float C = range;
        const float invC = 1.0f/C;


2.   We then truncate towards zero (e.g. convert to int)

        int k = (int)(x*invC);
               // or (int)(x*invC + 0.5f) to round to nearest


3.   We then subtract k copies of C from x.

        float y = x - (float)k*C;
High Accuracy Range Reduction
   Notice that y = x-k*C has a destructive
    subtraction.

   Avoid this by encoding C in several constants.
     • First constant C1 is a rational that holds the first M bits of C's
       mantissa, e.g. PI ≈ 201/64 = 3.140625
    • Second constant C2 = C - C1
    • Overall effect is to encode C using more bits than machine
      accuracy.

       float n = (float)k;
       float y = (x - n*C1) - n*C2;
Truncation Towards Zero
   Another method for truncation
     • Add the infamous 1.5 * 2^24 constant to your float
    • Subtract it again
    • You will have lost the fractional bits of the mantissa

        A = 123.45      = 1111011.01110011001100110
        B = 1.5*2^24    = 1100000000000000000000000.
        A = A+B         = 1100000000000000001111011.
        A = A-B         = 1111011.00000000000000000


    • This technique requires you know the range of your input
      parameter…
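
    • A minimal C sketch of the trick (assumes the default round-to-nearest
      mode, so it actually rounds rather than truncates, and |x| well below 2^23):

      float round_magic(float x)
      {
          const volatile float magic = 1.5f * 16777216.0f;  /* 1.5 * 2^24 */
          volatile float a = x + magic;   /* fraction bits fall off the mantissa */
          return a - magic;               /* recover the integer part */
      }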
Quadrant Tests
   Instead of range reducing to a whole cycle,
    let’s use C=Pi/2 - a quarter cycle
    • The lower bits of k now holds which quadrant our angle is in


   Why is this useful?
    • Because we can use double angle formulas
    • A is our range reduced angle.
    • B is our quadrant offset angle.



    sin( A  B)  sin( A) cos( B)  cos( A) sin( B)
    cos( A  B)  cos( A) cos( B)  sin( A) sin( B)
Double Angle Formulas
   With four quadrants, the double angle
    formulas now collapses into this useful form


          sin  y  0 *  2   sin  y 
          sin  y  1*  2   cos y 
          sin  y  2 *  2    cos y 
          sin  y  3 *  2   sin  y 
Four Segment Sine
A Sine Function
   Leading to code like this:

       float table_sin(float x)
       {
         const float C = PI/2.0f;
         const float invC = 2.0f/PI;
         int k = (int)(x*invC);
         float y = x-(float)k*C;
          switch(k&3) {
            case 0: return  sintable(y);
            case 1: return  sintable(TABLE_SIZE-y);
            case 2: return -sintable(y);
            default: return -sintable(TABLE_SIZE-y);
          }
         return 0;
       }
More Quadrants
   Why stop at just four quadrants?
    • If we have more quadrants we need to calculate both the
      sine and the cosine of y.
     • This is called the reconstruction phase.

       sin(y + 3π/16) = sin(y)·cos(3π/16) + cos(y)·sin(3π/16)
    • Precalculate and store these constants.
    • For little extra effort, why not return both the sine AND
      cosine of the angle at the same time?
     • This function is traditionally called sincos() in FORTRAN
       libraries
Sixteen Segment Sine
 float table_sin(float x)
 {
    const float C = PI/8.0f;
    const float invC = 8.0f/PI;
   int k = (int)(x*invC);
   float y = x-(float)k*C;
   float s = sintable(y);
   float c = costable(y);
   switch(k&15) {
     case 0: return s;
     case 1: return s*0.923879533f + c*0.382683432f;
     case 2: return s*0.707106781f + c*0.707106781f;
     case 3: return s*0.382683432f + c*0.923879533f;
     case 4: return c;
     ...
   }
   return 0;
 }
Math Function Forms
   Most math functions follow three phases of
    execution


             1. Range Reduction
             2. Approximation
             3. Reconstruction

   This is a pattern you will see over and over
    •   Especially when we meet Polynomial Approximations
Polynomial Approximation
Infinite Series
   Most people learn about approximating
    functions from Calculus and Taylor series

       sin(x) = x - x^3/3! + x^5/5! - x^7/7! + x^9/9! - …

   If we had infinite time and infinite storage,
    this would be the end of the lecture.
Taylor Series
   Taylor series are generated by repeated
    differentiation
     • More strictly, the Taylor Series around x=0 is called the
       Maclaurin series

                                       f 0 f 0
           f x   f 0  f 0                    
                                         2!       3!


   Usually illustrated by graphs of successive
    approximations fitting to a sine curve.
Taylor Approx of Sine
Properties Of Taylor Series
                sin(x) = x - x^3/3! + x^5/5! - x^7/7! + x^9/9! - …

   This series shows all the signs of convergence
    • Alternating signs
    • Rapidly increasing divisor


   If we truncate at the 7th order, we get:
         sin(x) ≈ x - (1/6)x^3 + (1/120)x^5 - (1/5040)x^7
                = x - 0.16667x^3 + 0.0083333x^5 - 0.00019841x^7
Graph of Taylor Series Error
   The Taylor Series, however, has problems
    • The problem lies in the error
    • Very accurate for small values but is exponentially bad for
      larger values.


   So we just reduce the range, right?
    • This improves the maximal error.
    • Bigger reconstruction cost, large errors at boundaries.
     • The distribution of error remains the same.


   How about generating series about x=Pi/4
    • Improves the maximal error.
    • Now you have twice as many coefficients.
Taylor 7th Order for –Pi/2..Pi/2
Taylor 7th Order for 0..Pi/2
Taylor 7th Order for 0..Pi/2
   And now the bad news.

           sin( x)  0.0000023014110 
                      1.000023121x 
                   0.00010117322 x 2 
                   0.1664154429 x 3 
                   0.00038530806 x 4 
                    0.008703147018 x 5 
                   0.0002107589082 x 6 
                   0.0001402989645 x 7
Taylor Series Conclusion
   For our purposes a Taylor series is next
    to useless
    • Wherever you squash error it pops back up
      somewhere else.
    • Sine is a well behaved function, the general case
      is much worse.


   We need a better technique.
    • Make the worst case nearly as good as the best
      case.
Orthogonal Polynomials
Orthogonal Polynomials
   Families of polynomials with interesting
    properties.
    • Named after the mathematicians who discovered them
    • Chebyshev, Laguerre, Jacobi, Legendre, etc.


   Integrating the product of two O.P.s returns
    zero if the two functions are different.

              ∫ w(x)·Pi(x)·Pj(x) dx  =  cj   if i = j
                                     =  0    otherwise

     • Where w(x) is a weighting function.
Orthogonal Polynomials
   Why should we care?
    • If we replace Pi(x) an arbitrary function f(x), we end up with
      a scalar value that states how similar f(x) is to Pj(x).
    • This process is called projection and is often notated as


                ⟨f, Pj⟩ = ⟨f·w, Pj⟩ = ∫ f(x)·Pj(x)·w(x) dx


   Orthogonal polynomials can be used to
    approximate functions
    • Much like a Fourier Transform, they can break functions into
      approximating components.
Chebyshev Polynomials
   Lets take a concrete example
    • The Chebyshev Polynomials Tn(x)

         T0  x   1
         T1  x   x
         T2  x   2 x 2  1
         T3  x   4 x 3  3 x
         T4  x   8 x 4  8 x 2  1
         T4  x   16 x 5  20 x 3  5 x


       Tn 1  x   2 xTn  x   Tn 1  x 
Chebyshev Plots
   The first five Chebyshev polynomials
Chebyshev Approximation
   A worked example.
    • Let’s approximate f(x) = sin(x) over [-π..π] using Chebyshev
      Polynomials.
    • First, transform the domain into [-1..1]


           a = -π
           b = +π

        g(x) = f( (b-a)/2·x + (a+b)/2 )
             = sin(πx)
Chebyshev Approximation
   Calculate coefficient kn for each Tn(x)


                           g x T x wx  dx
                           1
                                   n
                   kn     1
                                    cn


     Where the constant cn and weighting function w(x) are

         cn = π      if n = 0
            = π/2    otherwise

         w(x) = 1 / sqrt(1 - x^2)
Chebyshev Coefficients
   The resulting coefficients

            k0  0.0
            k1  0.5692306864
            k 2  0.0
            k 2  0.666916672
            k 4  0.0
            k5  0.104282369
            k6  

    • This is an infinite series, but we truncate it to produce an
      approximation to g(x)
Chebyshev Reconstruction
   Reconstruct the polynomial in x
    • Multiply through using the coefficients kn

         g ( x)  k0 1 
                k1  x  
                         
                k2 2 x 2  1 
                k 4 x  3 x  
                 3
                        3


                k 4 x  3 x  
                 3
                        3


                k 8 x  8 x  1 
                 4
                        4       2


                k 16 x  20 x  5 x 
                 5
                            5       3
Chebyshev Result
   Finally rescale the domain back to [-π..π]

                           2     ab
               f x   g     x    
                          ba    ba


    • Giving us the polynomial approximation

               f ( x)  0.984020813 x 
                     0.153301672 x 3 
                      0.00545232216 x 5
Chebyshev Result
   The approximated function f(x)
Chebyshev Absolute Error
   The absolute error sin(x)-f(x)
Chebyshev Relative Error
   The relative error tells a different story…
Chebyshev Approximation
   Good points
    •   Approximates an explicit, fixed range
    •   Uses easy to generate polynomials
    •   Integration is numerically straightforward
    •   Orthogonal Polynomials used as basis for new techniques
         - E.g. Spherical Harmonic Lighting


   Bad points
    • Imprecise control of error
    • No clear way of deciding where to truncate series
    • Poor relative error performance
Minimax Polynomials
Pafnuty Chebyshev 1821-1894
   Chebyshev described what the
    error for the best approximating
    polynomial should look like.

   An Order-N approximation error
    curve should:

    a. Change sign N+1 times
    b. Touch the maximal error N+2 times


   This function minimizes the
    maximal error, so it is called the
    Minimax Polynomial
Evgeny Remez 1896-1975
   The algorithm to generate
    these polynomials is the
    Remez Exchange Algorithm

   The algorithm requires:
    •   A function to be approximated
    •   A range of approximation
    •   The required order of approximation
    •   A weighting function
Remez Exchange Algorithm
   The algorithm works by generating a set of
    zero crossings for the error curve, e.g.

            sin  x   a  bx0  cx0  0
                                    2


            sin  x   a  bx1  cx12  0
                         
            sin  x   a  bxn  cxn  0
                                    2




   These are solved for a,b and c as a matrix
    inversion. The maximal error is found and one
    xn is altered. Repeat until minimized.
Remez Exchange Algorithm
   The algorithm is very sensitive to error and
    requires large number representations
    (e.g. 50 digits or more) and numerical solvers.

    • The best place to find these tools is in a mathematical
      package like Maple or Mathematica.

    • In Maple, a raw call to minimax() will use all coefficients
      available, so we will have to set up the calculation…
Forming The Minimax Inequality
   Start by looking at the Taylor Expansion of
    sine to get a feel for what the end result
    • Every odd power, first coefficient is 1.0


     sin x   x  0.16667 x3  0.0083333x5  0.00019841x 7


   Transform the problem into finding P(x2) in

               sin x  x  x3 Px 2 
The Minimax Inequality
   Next, we form the minimax inequality
    expressing our desire to minimize the relative
    error




           sin x   x  x P( x )
                          3     2
                                    error
                    sin x 
The Minimax Inequality
   Divide through by x3




          sin x  1
              3
                   2  P( x )
                            2

            x        x          error
                  sin x 
                      3
                    x
The Minimax Inequality
   We want only every other power, so
    substitute y = x^2



           |  ( sin(√y)/y^(3/2) - 1/y - P(y) )  /  ( sin(√y)/y^(3/2) )  |  ≤  error
The Function to Approximate
   We solve for P(y) giving us



            P y  
                          
                     sin y 1
                             
                          32
                        y      y
   With the weighting function

                         32
            w y  
                        y
                          
                     sin y
Final Details
   Before we can run this through the minimax algorithm,
    we need to convert to a Taylor Series
     • This stops the sin() function being evaluated and greatly
       speeds the calculation

                P(y) ≈ -1/6 + y/120 - y^2/5040 + y^3/362880 - …

   We also transform our approximation range

                x ∈ [0..π/2]   becomes   y = x^2 ∈ [0..π^2/4]
Calculating The Minimax Poly
   We run these values through the minimax
    algorithm:

      minimax(p(y), y=0..Pi^2/4, [2,0], w(y));

   It returns (to 5 d.p.):
          P y   0.16666  0.0083143 y  0.00018542 y 2

   Finally, we resubstitute back into the original
    equation

    sin x   x  0.16666 x3  0.0083143x5  0.00018542 x 7
Minimax Polynomial Error
   Maximal error = 1.108e-6 at x=1.5707
Magic Numbers
   Comparing the results

       taylor  x  0.16667 x 3  0.0083333x 5  0.00019841x 7
    minimax  x  0.16666 x 3  0.0083143x 5  0.00018542 x 7

   An enormous increase in accuracy with a tiny
    change in coefficients.
    • We must be very sure of our coefficients.
    • We must pay attention to floating point rounding.
Side Note: PS2 ESIN Instruction
   The manual entry for ESIN has a small error in it.
Improving On Perfection
   We can optimize polynomials for floating point
    representation
    • This will guarantee better accuracy for small angles, as the
      first term has the most effect in these cases.
    • Requires a smaller range than Pi/2, so we will use 0..Pi/4


   Using a different form of minimax inequality,
    we can specify the first coefficient:


               sin x   x  kx3  x5 P x 2 
Optimizing For Floating Point
   Now we have to specify the first coefficient as
    a single-precision float.

             k = 0.1666666666… ≈ 1/6

   Multiply k by 2^24, round to the nearest integer, and divide by
    2^24 to create an exactly representable number.


       k  2 24   2796203
                  
            24
          2         16777216
                   0.16666668653488159179687500000
Optimizing For Floating Point
   Solving the Minimax Inequality for P(x) and
    reinserting the answer gives us


sin x   x 
                2796203 3
                        x +0.00833282412 x 5  0.000195878418 x 7
               16777216

   Maximal error = 6.275e-9 at x=0.7855
    • About twice the error of the unoptimized version but with
      better accuracy for small angles.
Float Optimized Sine Minimax
Optimizing for Small Angles
   Most functions are bound by the accuracy of
    their first coefficient
    • In the same way we split the range reduction into two
      coefficients, we can do the same for the first coefficient of a
      polynomial, e.g.



            ln( x)  a0 x  a1 x  x P( x)    2



    • Where a0+a1 = k, the leading coefficient.
Fast Polynomial Evaluation
Polynomial Evaluation
   Now we have our coefficients, we need to
    evaluate them efficiently.

    • When presented with a polynomial of the form


           y  cx  bx  a
                     2


    • Most of us would produce code like


           y = c*x*x + b*x + a;
Large Polynomials
   Faced with a larger polynomial…


     y  Cx  Bx  Ax  x
             7      5      3


    … you can see the need for better methods


      y = C*x*x*x*x*x*x*x +
          B*x*x*x*x*x +
          A*x*x*x +
          x;
Horner Form
   Most good programmers will use Horner Form
    to encode the polynomial

     z = x*x;
     y = (((C*z + B)*z + A)*z + 1)*x;

   This also codes well if you have multiply-add
    instructions available

   Horner Form is used by nearly all libraries,
    with coefficients stored as arrays of values
The Problem With Horner Form
   Horner Form leads to very serial computations
    • Later instructions are waiting for earlier results.
    • A lot of instructions are stalled in the pipeline.
    • Makes no use of available SIMD instructions.


   There is a better way – Estrin’s Method.
    • Comes from parallel computation research
    • Only documented in Knuth – nowhere online!
Estrin’s Method
   Stems from the observation that by
    bracketing, we can produce regular sub-
    expressions

         f ( x)  Dx  Cx  Bx  A
                    3      2


               Dx  C x  Bx  A
                               2


               Px 2  Q

                                        x
    The trick is to keep a running copy of 2N   as
    “glue” between each sub expression…
Estrin’s Method Table
p0(x) =                                                             A
p1(x) =                                                         (Bx+A)
p2(x) =                                                   (C)x2+(Bx+A)
p3(x) =                                               (Dx+C)x2+(Bx+A)
p4(x) =                                       (E)x4+((Dx+C)x2+(Bx+A))
p5(x) =                                    (Fx+E)x4+((Dx+C)x2+(Bx+A))
p6(x) =                            ((G)x2+(Fx+E))x4+((Dx+C)x2+(Bx+A))
p7(x) =                        ((Hx+G)x2+(Fx+E))x4+((Dx+C)x2+(Bx+A))
p8(x) =                 (I)x8+(((Hx+G)x2+(Fx+E))x4+((Dx+C)x2+(Bx+A)))
p9(x) =             (Jx+I)x8+(((Hx+G)x2+(Fx+E))x4+((Dx+C)x2+(Bx+A)))
p10(x)=     ((K)x2+(Jx+I))x8+(((Hx+G)x2+(Fx+E))x4+((Dx+C)x2+(Bx+A)))
p11(x)= ((Lx+K)x2+(Jx+I))x8+(((Hx+G)x2+(Fx+E))x4+((Dx+C)x2+(Bx+A)))
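
    • As a concrete sketch, the 7th-order row p7(x) in C; the four (ax+b)
      pairs are independent and can issue in parallel:

      float estrin_p7(float x, float A, float B, float C, float D,
                               float E, float F, float G, float H)
      {
          float x2  = x * x;
          float x4  = x2 * x2;
          float p01 = B * x + A;
          float p23 = D * x + C;
          float p45 = F * x + E;
          float p67 = H * x + G;
          return (p67 * x2 + p45) * x4 + (p23 * x2 + p01);
      }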
Estrin’s Method Chart
Packing Estrin’s Into SIMD
   Evaluating the polynomial takes O(log N) time
    • Proves that you can improve on Horner Form


   Each row should be evaluated in parallel
    • Estrin’s assumes N processing units
    • We only have four-way SIMD instructions
    • We will still have serialization stalls


   Many architectures don’t allow interaction
    between vector elements
    • We have to split calculations across SIMD registers
    • We can often use redundant calculations to speed evaluation
Brute Force Searching
   One tool can find the perfect evaluation order
    for every architecture – Brute Force
    • We generate trees of expressions
    • Flatten the tree into an ordered sequence of instructions
    • Evaluate the sequence w.r.t a particular architecture


   Each run can generate millions of trees to test
    • Run the program overnight, keeping the best result so far
    • Distribute the calculation over many machines
Expression Tree
      f ( x)  Ax  Bx  Cx  Dx
                    3     5        7
Common Expression Removal
Tree After Rotations

      f(x) = x·( (A + Bz) + z^2·(C + Dz) )
              where z = x^2
Coding Using SIMD
   Using SIMD instructions to code this
    • Temptation is to gather all coefficients into a 4-vector
    • Calculate [x, x3, x5, x7] and do one big multiply
    • Finally sum the result

     ...
     calculate [ x, x3, x5, x7 ]
     ...
     N :=   [ A, B , C , D ]
          * [ x, x3, x5, x7 ]
     ...
     sum the elements of N
     ...
Coding With SIMD
   This turns out to be very slow
    • You are only using one parallel instruction!
    • Better to think of calculating in two pairs
     ...
     calculate [ x, x2, x, x2 ]
     ...
     N :=   [ A, B, C, D ]
          * [ x, x2, x, x2 ]

     N :=     [ Ax, Bx2, Cx, Dx2 ]
            * [ -, -, x2, x2 ]
     ...
     sum the elements of N
     ...
Summing Elements
   Summing the elements of a vector is very
    serial if you don’t have a special instruction
    • Better to split the problem over two registers.

          A.x = A.x + A.y          A.x = A.x + A.y
                        .          B.x = B.x + B.y
                        .          .
                        .          .
          A.x = A.x + A.z          .
                        .          A.x = A.x + B.x
                        .          .
                        .          .
          A.x = A.x + A.w          .
                        .          Finished.
                        .
                        .
                Finished.
SIMD Insights
   The more work you have to do, the more
    opportunities for parallelisation
    • Easier to code higher power polynomials as O(log n) starts
      to be more and more effective
    • Use redundant elements as “free duplicate values” to avoid
      moving elements around
    • You end up distributing coefficients in small groups

    SinCos is a good example
     • More efficient to do both sine and cosine at the same time

        Coeff1 := [A, A, E, E]
        Coeff2 := [B, B, F, F]
        Coeff3 := [C, C, G, G]
        Coeff4 := [D, D, H, H]
Higher Order Functions
More Transcendental Functions
   What about the other Transcendentals?
    • Tangent, Arctangent, Logarithm, Exponent, Roots, Powers


   Let’s take a look at how well they are
    approximated by polynomials
    • For each power, make an approximation (e.g. minimax
      between [0..1])
    • Calculate the maximum error within the range
    • Find the negative base-2 log of the error


   This value tells us how many bits of accuracy
    we have for each approximation
Table Of Error In Bits

  Order     2      3      4      5      6      7      8
 Exp(x)    6.8   10.8   15.1   19.8   24.6   29.6   34.7

 Sin(x)    7.8   12.7   16.1   21.6   25.5   31.3   35.7

Ln(1+x)    8.2   11.1   14.0   16.8   19.6   22.3   25.0

Atan(x)    8.7   9.8    13.2   15.5   17.2   21.2   22.3

 Tan(x)    4.8   6.9    8.9    10.9   12.9   14.9   16.9

 Asin(x)   3.4   4.0    4.4    4.7    4.9    5.1    5.3

 Sqrt(x)   3.9   4.4    4.8    5.2    5.4    5.6    5.8
Interpretation
   Sine and Exponent give us 4 bits per power
    • Will only need a 7th order approximation for single floats


   Why are tangent and arcsine so badly
    behaved?
    • We will find out shortly…


   Square Root will never be approximated by a
    polynomial.
    • Less than half a bit of accuracy per power
    • We must use Newton’s Method with good initial guesses
Hardware Square Root
    You will never outperform a hardware square root
      • To avoid serializing the Newton iterations, hardware designs
        use a redundant signed-digit number system.


          637  6 10 2  3 101  7 100
                1103  4 10 2  4 101  3 100  144 3
                1103  4 10 2  3 101  7 100  1437

    • Carry propagation happens only once, at the end of the
      calculation
Tangent
Tangent Definitions
   Tangent is defined as the ratio of sine over
    cosine


                            sin x 
                 tan x  
                            cosx 

   For infrequent use this calculation is good
    enough
     • Remember to test for cos(x)=0 and return BIGNUM
     • Has twice the inaccuracy of sine or cosine
Using Trig Identities
   We must range reduce as much as possible
    • Using trigonometric identities to help us




               tan  x    tan x 

   We record the sign, force x to positive
    • We will replace the sign during reconstruction
    • We have halved our input range
Range Reduction
   The next identity is the fun one


                   tan x  
                                     1
                                      
                             tan   x 
                                 2     
                            cot  2  x 

   This tells us two things
    1. The range 0..Pi/2 is repeated over and over.
    2. There is a sweet spot at Pi/4 where both sides are equal.
Tangent and Cotangent
Tangent and Cotangent Detail
Minimax Approximations
   Tangent between 0..Pi/4 is simple enough
    • Looking at the Taylor Series, it uses odd powers.


                tan(x) = x + x^3/3 + 2x^5/15 + 17x^7/315 + …


    • Apart from being strangely resistant to polynomial
      approximation, it is a straightforward minimax.



                tan( x)  x  x P x   3
                                           2
Cotangent Minimax
   But how to approximate Cotangent?
    • It has infinite values and high curvature


   There is a trick using small angles
     • The cotangent is just cosine/sine
     • As x→0, sin(x)→x and cos(x)→1
     • Meaning that as x→0, cotangent → 1/x
Cotangent and 1/x
Cotangent Minus 1/x
Cotangent Minimax
   We can use a different minimax form


              cot(x) ≈ 1/x + x·P(x^2)
   This has an interesting property
    • The reciprocal is based on nothing but x
    • We can start a reciprocal instruction way up the pipeline and
      use the result after calculating the difference polynomial
Final Tangent Form
   The final form of the tangent function

                    tan x         0 x 
                                           4
                    
                    cot x   4
                    
                                     
                                          x
        tan  x   
                                      4      2


                    cot x         
                                      2   x   3
                                                 4
                    
                    tan x   4
                    
                                     3
                                      4    x 
Tangent Error Plot 0..2Pi
Rational Polynomials
   Minimax can also produce rational polynomials
    • A pair of polynomials that are divided by each other to
      produce the final result


                     tan(x) ≈ x·P(x^2) / Q(x^2)
   Rational Polynomials can be very accurate for
    low powers
    • If you can take the cost of a divide or reciprocal at the end
      of the approximation process
    • Popular for high accuracy functions
Rational Tangent
   Rational approximation to tangent
    • Gives 1.3e-8 accuracy using fewer operations


               ( 1.15070768x - 0.110238971x^3 ) /
               ( 1.15070770 - 0.49380897x^2 + 0.0111809593x^4 )


    • Can improve this by dividing through by denominator
      constant, saving us a load-constant instruction


               ( 0.999999986x - 0.0958010197x^3 ) /
               ( 1 - 0.429135022x^2 + 0.00971659383x^4 )
Arctangent
Inverse Trig Functions
   Arctangent
    • The inverse of a tangent
     • Maps [-inf..+inf] to [-π/2..π/2]




   Arcsine and Arccosine
      • Asin maps [-1..+1] to [-π/2..π/2]
      • Acos maps [-1..+1] to [π..0]
Why only Arctangent?
   Atan, Acos and Asin are closely related
            arccos(x) = arctan( sqrt(1 - x^2) / x )

            arcsin(x) = arctan( x / sqrt(1 - x^2) )



            arccos(x) = π/2 - arcsin(x)

                      = 2·arcsin( sqrt( (1 - x)/2 ) )
Range Reduction
   Arctan has these interesting properties


          arctan  x    arctan( x)
                                   1
             arctan x    arctan  
                          2          x

   We only need to define arctan between 0..1
    • That is one huge range reduction
Arctan Taylor Series
   The Taylor series for Arctangent looks
    strangely familiar, yet different…
    • It converges very, very slowly
    • Doesn’t even converge for |x|>1


          arctan(x) = x - x^3/3 + x^5/5 - x^7/7 + …

    • For 1e-8 accuracy you need to evaluate 7071 terms!
    • We need polynomial approximations to make the function a
      viable calculation
Minimax Inequality
   The minimax form is very straightforward


              arctan x   x  x P x3
                                          
                                           2



   But gives us very poor accuracy
    • A 13th power minimax approximation over 0..1 gives a
      maximal error of 0.9!
    • We need to reduce the range a lot more
Double Angle Formulation
   Arctangent is the inverse of a tangent
    • So let’s assume x = tan(a+b) so that arctan(x) = a+b.
      What are a and b ?
                             tan a   tan b 
         x  tan a  b  
                            1  tan a  tan b 
         z  tan a   a  arctan  z 
         k  tan b   b  arctan k 
                zk         xk
         x           z 
               1  zk      1  kx
    arctan x   arctan k   arctan z 
                 b  arctan z 
The X-Z Mapping

                               xk 
     arctan x   b  arctan         
                               1  kx 

   How do we use this?
    • It remaps input x into the smaller z-range
    • We can subdivide 0..1 into, say, 2 segments, each with their
      own b offset and k parameter.


   What value of k to choose?
    • If z(0) = -z(1) we have equal distribution either side of the
      graph, which happens when k = sqrt(2)-1
    • Or divide by equal input values – ¼, ½, ¾
Some X-Z Mappings
Designing Your Function
   Now you have a set of tools to play with:

    • Range reduction to 0..1
    • Complement if x > 1.0
    • Choice of X-Z map
        - Equal-Angle maps are most accurate
        - Equal-Input maps are faster to range reduce
    • Choice of minimax order

      float arctan(float x)
      {
        float k, b, sign = +1.0f;
        bool compl = false;
        // remove sign
        if(x < 0.0f) { sign = -1.0f; x = -x; }
        // complement if x > 1.0
        if(x > 1.0f) { compl = true; x = 1.0f/x; }
        // x-z range reduce
        if(x < 0.5f) { k = 0.25f; b = arctan(k); }   // b is a precomputed constant
        else         { k = 0.75f; b = arctan(k); }
        float z = (x - k)/(1.0f + k*x);
        // poly approx
        float y = b + arctan_poly(z);
        // finish complement
        if(compl) y = PI/2.0f - y;
        // reinstate sign
        return y*sign;
      }
Using Arctan for Arccosine
   The ideal example for using IEEE754 flags
    • To convert from arctan to arccosine we need to cope as x
      tends towards zero

                 arccos(x) = arctan( sqrt(1 - x^2) / x )
                                             

    • Arccos(0.0) = Arctan(+inf) = Pi/2

    • We must remember to reset the divide-by-zero flag before
      returning from arccos(0.0)
Exponent
Exponent: The Anti-Log
   All exponentials can be expressed as powers of
    e, Euler's number.


           10 = e^ln(10)        10^x = e^(x·ln 10)

            2 = e^ln(2)          2^x = e^(x·ln 2)
Range Reduction
   Exponent also uses additive range reduction,
    much like sine and cosine

                x  N ln( 2) / 2  r
                    NC  r
   This means we break x into two parts
    • 2 raised to some exponent plus a fractional residual.
    • Which floating point numbers already do!
                                   N
                  exp( x)  2 e    2    r
Polynomial Approximation
   Looking at the Taylor series

          exp(x) = 1 + x + x^2/2 + x^3/6 + x^4/24 + …

   We can subtract the 1 and minimax it over the
    range [0..ln(2)/2] with


          exp x   1  x  x 2 Px 
The Exponent Function
   Two methods of coding

    1. Breaking your input into exponent and mantissa and
       reassembling at the end

    2. Using tables for 2^N to keep all work in float registers,
       but only supporting a small range of input

      float fast_exp(float x)
      {
        const float C    = 2.0f/ln(2);
        const float invC = ln(2)/2.0f;
        // range reduction
        int N = (int)(x*C);
        float y = x - N*invC;
        // approximation
        y = exp_poly(y) + 1.0f;
        // reconstruction
        return power2[N>>1]*y;
      }
Semi-Table Based Reconstruction
   New research has better methods for
    reconstructing the power of two
     • There are a family of algorithms by Tang where K > 1 in

            exp(x) = 2^(N / 2^K) · e^r

     • To reconstruct, say, 2^(N/128) we calculate

               N = 128M + 16K + J

               2^(N/128) = 2^M · 2^(K/8) · 2^(J/128)

     • Where the last two terms are from 8 and 16 entry tables
Fast Float Exponent
   How about just constructing a float directly
    from an integer power?
    • Remember how the IEEE exponent works.


        01011100000100000000000000000000


     • float = sign * 2^(exponent - 127) * 1.mantissa
     • Note also that :

                e^x = 2^(x / ln 2) = 2^(x · (1/ln 2)) = 2^(kx)
Fast Float Exponent
   Take an integer in [-126..127], e.g. 12
          00000000000000000000000000001100

   Integer multiply it by the integer 2^23 / ln(2)

       * 00000000101110001010101000111011
       = 00001000101001111111101011000101

   Add in the bias offset integer 127 · 2^23

       + 00111111100000000000000000000000
       = 01001000001001111111101011000110
       = 172011.09375
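
    • A minimal C sketch of the integer version, before any offset correction
      (the bit-copy via memcpy and the valid range are my assumptions):

      #include <string.h>

      /* Approximate e^n for integer n in roughly [-87..88] by building
         the IEEE 754 bit pattern directly. */
      float fast_exp_int(int n)
      {
          int bits = n * 12102203;        /* n * 2^23 / ln(2)          */
          bits += 127 << 23;              /* add the exponent bias     */
          float f;
          memcpy(&f, &bits, sizeof f);    /* reinterpret bits as float */
          return f;
      }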
Fast Float Exponent
   Plotting the function 1/(1-exp(-x)) shows an offset:
Fast Exponent Fixed
   Subtracting 0.7·2^23 ≈ 5872025 removes the offset
Increasing Resolution
   The problem with single precision floats
    • there are only 255 exponent values we can use hence the
      staircase effect.


   Solution:
     • Allow the input power to have some fractional bits.
     • Take the float, multiply it by 2^3 = 8 and floor it, e.g. 12.125 -> 97
            00000000000000000000000001100001
     • Multiply by 2^(23-3) / ln(2), add the IEEE bias 127·2^23

     • Pick how many fractional bits you want, but the bias
       changes each time
         - e.g. if bits = 6 then bias = 3·2^17
High Resolution Fast Exponent
High Res Fast Exponent Detail
High Res Relative Error
Logarithm
Logarithms
   Ln(x) returns exponents
    • Defined in the range [0..+infinity]


   There are two main logarithm functions
    • log(x) refers to the logarithm base 10
    • ln(x) refers to the Natural Logarithm base e


   They are freely convertible
                        log(x) = ln(x) / ln(10)
Taylor Series
   The Taylor Series comes in two forms
    • First is closely related to the arctangent series


          ln(x) = 2·( z + z^3/3 + z^5/5 + z^7/7 + … )    where z = (x-1)/(x+1)


     • Second is simpler but converges more slowly

          ln(1+x) = x - x^2/2 + x^3/3 - x^4/4 + x^5/5 - x^6/6 + …
Range Reduction
   Range reduction for ln uses multiplicative
    range reduction
    • Given that we store floating point numbers as exponent and
      mantissa

         x  sign  2n  f        where 1.0  f  2.0

    • we find that the logarithm of such a number is

              ln x   ln 2n  f 
                           
                      ln 2  ln f
                             n


                      n ln 2  ln f
Dissecting Floats
   How do we split a float into exponent and
    mantissa?
    • On a PC it’s easy, you just mask and shift bits.
    • Remember to subtract 127 from the exponent.


   On the PS2 Vector Units, we have a problem.
    • Only 16 bit integer registers – not enough to hold even the
      entire mantissa.
    • No way to shift the exponent into lower bits
    • Float-to-int conversion loses the exponent


   But there is a trick…
Extracting The Exponent
   What happens if you convert a float to a float?
    • Pretend the contents of a floating-point register is an integer
      value and convert this “integer” into a float.
     • For example, take the value 2^57 * 1.125
     • sign = 0
       exponent = 57 + 127 = 184
       mantissa = 1.125


         01011100000100000000000000000000


    • This bit pattern represents the integer 1544552448
Extracting The Exponent
  • Multiply 1544552448 by 1/2^23 to rescale the decimal point

       1544552448 · 2^-23 = 184.125

  • Subtract the exponent bias of 127

       184.125 - 127 = 57.125

 • Convert to integer to obtain the true exponent
A Mathematical Oddity: Bitlog
   A real mathematical oddity
    •   The integer log2 of a 16 bit integer
    •   Given an N-bit value, locate the leftmost nonzero bit.
    •   b = the bitwise position of this bit, where 0 = LSB.
    •   n = the NEXT three bits (ignoring the highest 1)


             bitlog(x) = 8·(b - 1) + n

    Bitlog is roughly 8 times larger than log2(x) - 1
Bitlog Example
   For example take the number 88

        88 = 1011000 (binary)
        b = 6th bit
        n = 011 (binary) = 3
        bitlog(88) = 8*(6-1) + 3
                   = 43
     • (43/8) + 1 = 6.375
     • log2(88) = 6.4594…

   This relationship holds down to bitlog(8)
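
    • A small C sketch of bitlog for 16-bit inputs (my own encoding of
      the description above, valid for x ≥ 8):

      int bitlog(unsigned int x)
      {
          int b = 15;
          while (b > 0 && !(x & (1u << b)))    /* find the leading 1 bit       */
              --b;
          int n = (int)(x >> (b - 3)) & 7;     /* the next three bits below it */
          return 8 * (b - 1) + n;              /* e.g. bitlog(88) = 43         */
      }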
Bitlog Graph
X To The Power N
Integer Powers
   Raising to an integer power
    • multiply all powers of x where N has a bit set



              N  23  101112
              x  x x x x
                23       16      4      2



   If N<0 then calculate 1/result
   If x<0 then record sign and reinstate later
     • Note that even powers of a negative number lose the sign
Integer Power Function
    As simple as it sounds
     • Loop through bits of N
     • Keep squaring x each loop

    Careful noting and reinstating signs

      float powi(float x, int N)
      {
        float y,w;
        bool ns = false, xs = false;
        // record signs
        if(N < 0)    ns = true, N = -N;
        if(x < 0.0f) xs = true, x = -x;
        // setup initial powers
        if(N&1) y = x;
        else    y = 1.0f, xs = false;
        w = x; N >>= 1;
        while(N) {
          w = w * w;
          if(N&1) y *= w;
          N >>= 1;
        }
        // reinstate signs
        if(xs) y = -y;
        if(ns) y = 1.0f/y;
        return y;
      }
XN Approximation
   From Graphics Gems 4

        pow(x, n) ≈ x / (n - nx + x)        for 0 ≤ x ≤ 1


        (plots: approximation vs. actual)
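
    • In C this is a one-line sketch, valid for 0 ≤ x ≤ 1:

      float pow_approx(float x, float n)
      {
          return x / (n - n * x + x);
      }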
Raising to a Power
   The canonical method is to do the calculation
    in two parts – log and exponent

              pow x, N   e N ln x
                             N log 2 x
                          2
    • Range reduction only needs to be done once.
    • MUST use extended accuracy in internal workings.
       - Log2(x) must have 8+ extra bits of accuracy
    • The most expensive operation in the toolbox.
    • Also the most difficult to write.
       - e.g. calculating 56156 naively gives an error of 1052 ULPS!
Floating Point Power Hack
   The log and power can be faked
    • If we force a float through the int-to-float conversion, the
      result is approximately logarithmic.


             float_as_int(x) ≈ A·log2(x) + B

    • And conversely the inverse operation approximates

                                      xB
            float_to_i nt x   2     A



   Credited to Matthew Jones, Infogrames
Expanding Out
   Combine these two operations to approximate
    the power function

                   x^N ≈ 2^( ( N·(A·log2(x) + B) - B ) / A )

                       = 2^( N·log2(x) + (N-1)·B/A )


    • So we have an extra term to subtract before we can use the
      approximate power.
Floating Point Power Hack
   So, the algorithm goes like this
     • First, convert our float into logarithmic space
        float a = float_as_int(x);

     • We multiply this value by the power we want to raise it to.
        a = a * power;

     • Then we subtract the magic constant
        magic = float_as_int(1.0f) * (power-1.0f);
        a = a - magic;

     • And finally reconvert back to a float
        float result = int_as_float(a);
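
    • The steps gathered into one C sketch (float_as_int / int_as_float are
      bit copies done here with memcpy; this is an approximation, not an
      accurate pow):

      #include <string.h>

      float pow_hack(float x, float power)        /* x > 0 */
      {
          int ix, one, out;
          float onef = 1.0f, r;
          memcpy(&ix,  &x,    sizeof ix);          /* float_as_int(x)          */
          memcpy(&one, &onef, sizeof one);         /* float_as_int(1.0f)       */
          float a = (float)ix * power;             /* multiply in "log space"  */
          a -= (float)one * (power - 1.0f);        /* subtract the magic value */
          out = (int)a;
          memcpy(&r, &out, sizeof r);              /* int_as_float(a)          */
          return r;
      }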
Scaling the Result
   We can scale the resulting value
    • by reformulating the magic number

        magic = float_as_int(1.0f) * (power-1.0f);
        a = a - magic;


     • Can be rewritten to give us 255 * x^N for no additional work

        magic = float_as_int(255.0f) -
                float_as_int(1.0f) * power;
        a = a + magic;
Cos16(x) Graph
Cos16(x) Absolute Error
Cos16(x) Relative Error
Another Approach
   An interesting effect
    • When raising pixel intensities in the range [0..255] to a power, for
      most of the range the value is less than one intensity level




    • As noted by Beaudoin & Guardado in “Direct3D ShaderX: Vertex and Pixel
      Tips and Tricks”
Low Power Approximation
   Higher powers look a bit like lower powers
    scaled and offset
    • Example: Using x4 to approximate x16
Low Power Approximation
   Our function is zero until it is greater than one
    part in 255
Low Power Approximation
   New power approximation is:

          x  max ax  b,0
            m                       n


    • Where n<<m and a and b are:

                   1
          a
              1  2561 n
          b  1 a
Low Power Approx Error
Improved Error
   Using a binary search, improved values of a and b can
    minimize the maximal error
Low Power Approx Problem…
   There is however a problem

    • The paper doesn’t seem to have obvious “best” values
      for a and b.
       - The results are just presented as a table of values with errors.
         The choice is already made for you.


    • Closer powers don’t seem to reduce the error
       - so there is no obvious way to control error, relative or absolute.


   What is going on?
    • Let’s plot the maximal error for every power approximating
      every other as a surface…
Low Power Approx Problem…



     x^4 approx of x^16




   The original example was a global minimum!
Conclusions
Conclusions
   Get to know IEEE754
    • It’s being updated as we speak
     • If your implementation differs, you will understand
       its weaknesses


   Always test both absolute and relative
    error
    • Functions may look accurate on a graph
    • Relative error will show the tough weaknesses
Conclusions
   Incremental Algorithms
    • Used for generating sequences of trig operations
    • Some of the fastest techniques around




   Table Based Algorithms
    • Don’t discount them just yet
    • Tables are usually far too big – check the size
    • Use range reduction and reconstruction on tables
Conclusions
   Polynomial Approximations
    • Some of the most flexible techniques around
    • Range Reduce, Approximate, Reconstruct
    • Choose between
       -   Aggressive or simple range reduction
       -   Polynomial order
       -   Parallel polynomial evaluation techniques
       -   Expensive or simple reconstruction
       -   Semi table-based reconstruction


    • Use Maple or Mathematica to your advantage
Conclusions
   New techniques
    • Floating Point Hacks
       - An exciting new set of techniques


     • Gaining support from hardware designers
       - IA64 architecture uses 8-bit mantissa lookup for 1/x


    • Combine with semi table-based reconstruction
       - Accuracy over a limited range
       - Research to be done on accurate approximations
References
   Many of the original books are out of print
       - Researchers work with piles of photocopies!


     • Hart, J.F., Computer Approximations, John Wiley & Sons, 1968
     • Cody, W. & Waite, W., Software Manual for the Elementary
       Functions, Prentice Hall, 1980
     • Goldberg, David, What Every Computer Scientist Should
       Know About Floating-Point Arithmetic, ACM Computing
       Surveys, March 1991
     • Muller, J.M., Elementary Functions: Algorithms and
       Implementation, Birkhäuser, 1997
     • Tang, Ping Tak Peter, Table-Lookup Algorithms for
       Elementary Functions and Their Error Analysis, Proceedings
       of the 10th Symposium on Computer Arithmetic, 1991
Website
   Notes available from:



         http://research.scea.com/



robin_green@playstation.sony.com
Questions?

								