									Decimal Floating-Point Arithmetic

           Dongdong Chen




              EE800, U of S         1
                  Objectives
• IEEE 754-2008 standard for Decimal
  Floating-Point (DFP) arithmetic (Lecture 1)
  –   DFP numbers formats
  –   DFP number encoding
  –   DFP arithmetic operations
  –   DFP rounding modes
  –   DFP exception handling

             Objectives (Cont.)
• Algorithm, architecture and VLSI circuit
  design for DFP arithmetic (Lecture 2)
  –   DFP adder/subtractor
  –   DFP multiplier
  –   DFP divider
  –   DFP transcendental function computation



            Background


"The decimal computer arithmetic went out
of style 25 to 30 years ago; no one uses it
now." Is that true?




                   Introduction
• Decimal is still essential for specific applications
   – Numbers in commercial databases are decimal
   – Decimal is used extensively in commercial applications
   – Reported by surveys of commercial databases
   – Decimal fixed-point or floating-point numbers
• How to process decimal computation
   – Software computation
   – Convert back to decimal representation
   – Problems


            Introduction (Cont.)
• Errors from decimal and binary conversion
   – Example 1: represent 0.1 in DFP or BFP
      Decimal representation (BCD code): 0.0001 (exact)
      Binary representation: 0.000110011… (inexact; truncated it gives 0.09…)
   – Example 2: telephone billing. Cost: 0.70; Tax: 5%
      BFP arithmetic: 0.6999…8 × 1.05 = 0.734999…, which rounds to 0.73
      DFP arithmetic: 0.70 × 1.05 = 0.735, which rounds to 0.74
• Decimal integer, fixed-point or floating-point?
• Decimal hardware or software solutions?
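
The telephone billing example can be reproduced in software. The sketch below uses Python's built-in binary `float` for the BFP side and the standard `decimal` module for the DFP side; the variable names and the choice of round-half-up to cents are illustrative assumptions, not part of the lecture.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point: 0.70 and 1.05 are both stored inexactly.
bfp_total = 0.70 * 1.05
bfp_bill = f"{bfp_total:.2f}"            # round the binary product to cents

# Decimal floating point: operands and product are exact.
dfp_total = Decimal("0.70") * Decimal("1.05")
dfp_bill = dfp_total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(bfp_bill)    # binary result, one cent too low
print(dfp_total)   # exact decimal product
print(dfp_bill)    # decimal result rounded to cents
```

The binary product lands just below 0.735, so it rounds down, while the decimal product is exactly 0.7350 and rounds up.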


             Current Research
• DFP arithmetic defined in IEEE 754-2008
• IBM computing systems include DFP hardware
   – IBM Power6, z9, z10
• Intel provides a DFP software solution
   – Intel DFP software computation library
• DFP arithmetic IP blocks:
   – Basic DFP arithmetic IPs:
   DFP adder/subtractor, multiplier, divider, square root, etc.
   – Transcendental DFP arithmetic IPs:
   DFP CORDIC, logarithm, antilogarithm, reciprocal, etc.
DFP Arithmetic in IEEE 754-2008


• Review BFP arithmetic in IEEE 754-2008
• How to define new DFP in IEEE 754-2008




  BFP Floating-point representation
• Representation:
   – sign, exponent, significand (or mantissa):
                  $(-1)^{sign} \times significand \times 2^{exponent}$
   – more bits for the significand give more accuracy
   – more bits for the exponent increase range
• IEEE 754 floating point standard:
   – single precision: 8-bit exponent, 23-bit significand
   – double precision: 11-bit exponent, 52-bit significand


      BFP floating-point Number
• Leading “1” bit of significand is implicit
   –Example: if the significand is 011010110…0, the
    actual significand is 1.011010110…0
• This is called a normalized number; there is
  exactly one non-zero digit to the left of the
  point.
   –Unique representation of a number
   –We get a little more precision: there are 24 bits in
    the significand, but only 23 of them are stored.


                       Exponent
• Exponent is “biased” to make sorting easier
   – all 0s is smallest exponent, all 1s is largest
   – The actual exponent is e-127 for single precision, and
     e-1023 for double precision
   – Bias of 127 for single precision and 1023 for double
     precision
   – By biasing the exponent and storing it before the
     significand, we can compare magnitudes as if they were
     unsigned integers.
      • If e = 1000 0011 (131 in decimal), the actual exponent is 131 − 127 = 4
      • If e = 0101 1101 (93 in decimal), the actual exponent is 93 − 127 = −34
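
The two exponent examples, and the claim that biased storage lets magnitudes be compared as unsigned integers, can be checked with a short sketch. It assumes Python's `struct` module to obtain the single-precision bit pattern; the helper names `float_bits` and `biased_exponent` are hypothetical.

```python
import struct

def float_bits(x):
    """Return the 32-bit single-precision pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def biased_exponent(x):
    """Extract the 8-bit biased exponent field."""
    return (float_bits(x) >> 23) & 0xFF

e = biased_exponent(16.0)            # 16 = 1.0 x 2^4
print(e, e - 127)                    # stored exponent and actual exponent

# For positive numbers, ordering by bit pattern matches ordering by value.
values = [1.0e10, 0.5, 16.0, 1.5, 2.0]
print(sorted(values) == sorted(values, key=float_bits))
```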


      BFP Floating-Point Formats

Field layout: Sign | Exponent | Significand

Format                  Sign    Exponent                              Significand
Short (32-bit) format   1 bit   8 bits, bias = 127, –126 to 127       23 bits for fractional part (plus hidden 1 in integer part)
Long (64-bit) format    1 bit   11 bits, bias = 1023, –1022 to 1023   52 bits for fractional part (plus hidden 1 in integer part)

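   BFP Floating-Point Formats (Cont.)
Special values (single precision):

Value                            Sign     Biased exponent   Fraction
Positive and negative zero       0 or 1   00000000          00000000000000000000000
Positive and negative infinity   0 or 1   11111111          00000000000000000000000

Biased exponent = 255 (all 1s) and fraction ≠ 0: the value is called “not a number”, or NaN.

Ranges on the number line, from most negative to most positive:
  – Negative overflow: below $-(2 - 2^{-23}) \times 2^{127}$
  – Expressible negative numbers: $-(2 - 2^{-23}) \times 2^{127}$ to $-2^{-126}$
  – Negative underflow: between $-2^{-126}$ and 0
  – Positive underflow: between 0 and $2^{-126}$
  – Expressible positive numbers: $2^{-126}$ to $(2 - 2^{-23}) \times 2^{127}$
  – Positive overflow: above $(2 - 2^{-23}) \times 2^{127}$
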
                 Example
• Summary: FP representation
    $(-1)^{sign} \times (1 + significand) \times 2^{exponent - bias}$
• Example:
  – decimal: −0.75 = −3/4 = −3/2^2
  – binary: −0.11 = −1.1 × 2^−1
  – floating point: exponent = −1 + 127 = 126 = 01111110
  – IEEE single precision:
  1 01111110 10000000000000000000000
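
The encoding of −0.75 can be verified by unpacking the three fields from the stored bits. This is a minimal sketch using Python's `struct` module; the variable names are illustrative.

```python
import struct

# Reinterpret the single-precision encoding of -0.75 as a 32-bit integer.
bits = struct.unpack(">I", struct.pack(">f", -0.75))[0]

sign = bits >> 31                    # 1 bit
exponent = (bits >> 23) & 0xFF       # 8-bit biased exponent
fraction = bits & 0x7FFFFF           # 23 stored significand bits

pattern = f"{sign:01b} {exponent:08b} {fraction:023b}"
print(pattern)

# Rebuild the value from the summary formula with bias = 127.
value = (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)
print(value)
```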

    DFP Number Representation
• Representation:
   – sign, exponent, significand (or mantissa):
                 $(-1)^{sign} \times significand \times 10^{exponent}$
   – more digits for the significand give more accuracy
   – more bits for the exponent increase range
• DFP formats:
   – decimal32: DFP storage format encoded in 32 bits
   – decimal64: DFP computational format encoded in 64 bits
   – decimal128: DFP computational format encoded in 128 bits

           DFP Number format



• 1-bit sign (S), defined as in the BFP format
• (w+5)-bit combination field (G), split into two subfields:
  – 5 bits (G0…G4) encode the 2 MSBs of the exponent, the MSD of
    the significand, Not-a-Number (NaN) and Inf
  – w bits (G5…Gw+4) are appended to the 2 exponent MSBs derived
    from G0…G4 to form the (w+2)-bit nonnegative biased exponent

                 DFP Exponent
• Exponent is “biased” to make sorting easier
  – Encoded in binary format (not decimal)
  – The actual exponent is e − 101 for decimal32, e − 398 for
    decimal64, e − 6176 for decimal128
  – Range of the exponent is (emin − q + 1) ≤ e ≤ (emax − q + 1),
    where q is the number of significand digits




      DFP Number format (Cont.)
• J×10-bit Trailing Significand (T) Field:
   – Densely packed decimal (DPD) encoding
     each 3-digit decimal group is encoded as a 10-bit binary number;
     DPD is converted to binary coded decimal (BCD)
   – Binary integer decimal (BID) encoding
     the decimal significand is encoded as a binary integer
   – Non-normalized decimal significand, e.g. the same value written as
     $(-1)^0 \times 0.00900 \times 10^2$ and $(-1)^0 \times 0.09000 \times 10^1$
   – DFP number’s cohort


Parameters in DFP Format
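
As a rough illustration, the format parameters can be derived from the storage width k using the relations that IEEE 754-2008 gives for decimal interchange formats. The function name `dfp_parameters` and the dictionary keys are hypothetical; the sketch only covers k = 32, 64 and 128.

```python
def dfp_parameters(k):
    """Derive decimal interchange-format parameters from the width k in bits."""
    p = 9 * k // 32 - 2                  # precision in decimal digits
    emax = 3 * 2 ** (k // 16 + 3)        # maximum exponent
    emin = 1 - emax                      # minimum exponent
    bias = emax + p - 2                  # exponent bias
    combination_bits = k // 16 + 9       # w + 5
    trailing_bits = 15 * k // 16 - 10    # J x 10
    return {
        "p": p, "emax": emax, "emin": emin, "bias": bias,
        "combination_bits": combination_bits, "trailing_bits": trailing_bits,
    }

for width in (32, 64, 128):
    print(width, dfp_parameters(width))
```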




                   Example
• Summary: DFP representation
• $(-1)^{sign} \times significand \times 10^{exponent - bias}$
• Convert −835 × 10^−2 (= −8.35) to decimal64
   – Sign bit: “1” negative, “0” positive (here sign = 1)
   – Exponent: −2 + 398 = 396 (10-bit “0110001100”)
   – Significand: 835 (50-bit DPD coding, hex “0…00 02 3D”)
   – Encoding of the 5 MSBs (G0…G4) of the combination
     field: “01000” (exponent MSBs “01”, leading significand digit 0)
   – decimal64: “10100010001100…..00…1000111101”
     “A2 30 00 00 00 00 02 3D” (binary/hex)
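
The hex word in this example can be reproduced with a small encoder. The sketch below is an illustration under stated assumptions: it handles only finite numbers in the DPD form of decimal64, does no range checking, and the function names `dpd_declet` and `encode_decimal64` are hypothetical. The operand is taken as integer coefficient 835 with exponent −2, which is what the biased exponent 396 encodes.

```python
def dpd_declet(d2, d1, d0):
    """Encode three decimal digits as one 10-bit densely packed decimal declet."""
    a, b, c, d = [(d2 >> s) & 1 for s in (3, 2, 1, 0)]
    e, f, g, h = [(d1 >> s) & 1 for s in (3, 2, 1, 0)]
    i, j, k, m = [(d0 >> s) & 1 for s in (3, 2, 1, 0)]
    table = {
        (0, 0, 0): (b, c, d, f, g, h, 0, j, k, m),   # all digits small (0-7)
        (0, 0, 1): (b, c, d, f, g, h, 1, 0, 0, m),   # right digit large (8-9)
        (0, 1, 0): (b, c, d, j, k, h, 1, 0, 1, m),   # middle digit large
        (1, 0, 0): (j, k, d, f, g, h, 1, 1, 0, m),   # left digit large
        (1, 1, 0): (j, k, d, 0, 0, h, 1, 1, 1, m),   # only right digit small
        (1, 0, 1): (f, g, d, 0, 1, h, 1, 1, 1, m),   # only middle digit small
        (0, 1, 1): (b, c, d, 1, 0, h, 1, 1, 1, m),   # only left digit small
        (1, 1, 1): (0, 0, d, 1, 1, h, 1, 1, 1, m),   # all digits large
    }
    value = 0
    for bit in table[(a, e, i)]:
        value = (value << 1) | bit
    return value

def encode_decimal64(sign, coefficient, exponent, bias=398):
    """Pack sign, integer coefficient (up to 16 digits) and exponent into 64 bits."""
    digits = [int(ch) for ch in f"{coefficient:016d}"]
    biased = exponent + bias                       # 10-bit biased exponent
    exp_msbs, exp_rest = biased >> 8, biased & 0xFF
    msd = digits[0]
    if msd < 8:                                    # G0..G4 = exponent MSBs, then MSD
        comb5 = (exp_msbs << 3) | msd
    else:                                          # G0..G4 = 11, exponent MSBs, MSD low bit
        comb5 = 0b11000 | (exp_msbs << 1) | (msd & 1)
    word = (sign << 63) | (comb5 << 58) | (exp_rest << 50)
    for n in range(5):                             # 15 trailing digits, 5 declets
        d2, d1, d0 = digits[1 + 3 * n: 4 + 3 * n]
        word |= dpd_declet(d2, d1, d0) << (10 * (4 - n))
    return word

print(f"{encode_decimal64(1, 835, -2):016X}")
```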
           DFP special values
• Not-a-Number: G0…G4 = “11111”
• Infinity: G0…G4 = “11110”; the sign of Inf is
  given by the sign bit
• Overflow: occurs when the absolute value of a result is
  larger than the largest DFP number,
  $|v_{max}| = (10^q - 1) \times 10^{emax-q+1}$.
• Underflow: occurs when the absolute value of a nonzero
  result is smaller than the smallest DFP number,
  $|v_{min}| = 10^{emin-q+1}$. A value whose magnitude is less
  than $10^{emin}$ but at least $10^{emin-q+1}$ is
  subnormal.
• Normal number: The remaining exponent values and
  significands represent normal numbers.
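
The overflow, underflow and subnormal boundaries can be explored in software. The sketch below assumes that a Python `decimal.Context` with 16 digits, Emax = 384 and Emin = −383 behaves like decimal64 for this purpose; it is an approximation of the format, not an implementation of it.

```python
from decimal import Context, Decimal, Overflow, Underflow

ctx = Context(prec=16, Emax=384, Emin=-383)    # decimal64-like parameters
ctx.clear_traps()                              # return default results instead of raising

vmax = Decimal("9999999999999999E369")         # (10^16 - 1) x 10^(384 - 16 + 1)
vmin = Decimal("1E-398")                       # 10^(-383 - 16 + 1)

too_big = ctx.multiply(vmax, Decimal(10))      # exceeds vmax
too_small = ctx.divide(vmin, Decimal(10))      # falls below vmin

print(too_big, ctx.flags[Overflow])
print(too_small, ctx.flags[Underflow])
print(vmin.is_subnormal(ctx), Decimal("1E-383").is_normal(ctx))
```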
     DFP Arithmetic Operations
• Basic DFP arithmetic operations
• Two decimal-specific DFP operations
   – SameQuantum(DFP1,DFP2)
   – Quantize(DFP1,DFP2)
• DFP comparison operations
  – do not distinguish between redundant representations
    of the same number
• DFP conversion operations
   – DFP to BFP conversion (correctly rounded);
   – DFP to integer conversion
• Recommended DFP operations
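
The two decimal-specific operations and the comparison rule have close software counterparts. This is a minimal sketch using Python's `decimal` module, whose `same_quantum` and `quantize` methods are assumed here to mirror SameQuantum and Quantize; the values are illustrative.

```python
from decimal import Decimal

a, b = Decimal("1.20"), Decimal("3.45")

same = a.same_quantum(b)                          # both have exponent -2
different = a.same_quantum(Decimal("1.2"))        # exponents -2 and -1

# Quantize: give the first operand the exponent of the second.
rescaled = Decimal("2.17").quantize(Decimal("0.001"))

# Comparison ignores which cohort member represents the value.
equal = Decimal("1.20") == Decimal("1.2")

print(same, different, rescaled, equal)
```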
         DFP Number’s Cohort
• Non-normalized decimal significand
• DFP number’s Cohort
• Standard defines the preferred (required) exponent
  (quantum)
   – Exact operation results: the cohort member is selected
     based on the preferred exponent (quantum) for a DFP
     result of that operation
   – Inexact operation results: the cohort member of least
     possible exponent is used to get the maximum number of
     significant digits
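
The effect of the preferred exponent on exact and inexact results can be observed in software. The sketch assumes that Python's `decimal` module, with a 16-digit context, selects cohort members the same way for these three operations.

```python
from decimal import Context, Decimal

ctx = Context(prec=16)                                    # 16 digits, as in decimal64

total = ctx.add(Decimal("1.20"), Decimal("0.3"))          # exact: exponent min(-2, -1)
product = ctx.multiply(Decimal("1.20"), Decimal("2.0"))   # exact: exponent -2 + -1
third = ctx.divide(Decimal(1), Decimal(3))                # inexact: least possible exponent

print(total, product, third)
print(Decimal("1.50") == Decimal("1.5"))                  # same value, different cohort members
```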


           DFP Rounding Modes
• Five rounding modes
   –   roundTiesToEven
   –   roundTiesToAway
   –   roundTowardPositive
   –   roundTowardNegative
   –   roundTowardZero
• Correct rounding and faithful rounding
• IEEE 754-2008 requires correctly rounded
  results for all DFP arithmetic operations
• DFP operations should support all rounding modes
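
The difference between the five modes shows up on tie cases. The sketch below maps the IEEE 754-2008 rounding attribute names onto the rounding constants of Python's `decimal` module; the mapping table `MODES` and the helper `round_to_integer` are illustrative names.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN, ROUND_HALF_UP)

MODES = {
    "roundTiesToEven": ROUND_HALF_EVEN,
    "roundTiesToAway": ROUND_HALF_UP,       # ties go away from zero
    "roundTowardPositive": ROUND_CEILING,
    "roundTowardNegative": ROUND_FLOOR,
    "roundTowardZero": ROUND_DOWN,
}

def round_to_integer(x, mode):
    """Round the decimal string x to an integer under the named mode."""
    return int(Decimal(x).quantize(Decimal("1"), rounding=MODES[mode]))

for name in MODES:
    print(name, round_to_integer("2.5", name), round_to_integer("-2.5", name))
```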
       DFP Exception Handling
• Invalid operation: an operand is a signaling NaN; 0 × Inf; square
  root of a negative operand. The default result is NaN.
• Division by zero: the dividend is a finite non-zero
  number and the divisor is zero. The default result is
  +Inf or −Inf.
• Overflow: the magnitude of a result
  exceeds the largest finite number representable in
  the format of the operation.
• Underflow: the magnitude of a nonzero result is
  below $10^{emin}$.
• Inexact: the correctly rounded result of an operation
  differs from the infinite-precision result.
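
Default exception handling (set a flag, deliver a default result) can be demonstrated in software. The sketch assumes Python's `decimal` module with all traps disabled; its signal classes are used here as stand-ins for the IEEE 754-2008 exceptions.

```python
from decimal import (Decimal, DivisionByZero, Inexact, InvalidOperation,
                     localcontext)

with localcontext() as ctx:
    ctx.clear_traps()                    # deliver default results instead of raising
    ctx.clear_flags()

    div0 = Decimal(1) / Decimal(0)               # division by zero
    invalid = Decimal(0) * Decimal("Infinity")   # 0 x Inf
    sqrt_neg = Decimal(-4).sqrt()                # square root of a negative operand
    third = Decimal(1) / Decimal(3)              # cannot be represented exactly

    raised = {
        "DivisionByZero": bool(ctx.flags[DivisionByZero]),
        "InvalidOperation": bool(ctx.flags[InvalidOperation]),
        "Inexact": bool(ctx.flags[Inexact]),
    }

print(div0, invalid, sqrt_neg, third)
print(raised)
```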

DFP Addition/Subtraction




DFP Add/Sub Data flow




                DFP Addition
• Step 1: equalize the exponents
  – the mantissas can be added only when the exponents are the
    same
  – the operand with the smaller exponent has its
    point shifted to the left (its digits move right), and the
    operand with the larger exponent has its point shifted to the
    right (its digits move left into leading zeros)
  – rewriting the operand with the smaller exponent
    can result in a loss of its least significant digits
  – keep a guard digit, a round digit and a sticky digit for
    the operand with the smaller exponent
              DFP addition
• Step 2: add the mantissas
                 0099999 × 10^1
                +0016234 × 10^−3
   aligned:      0999990 × 10^0
                +0000016(234) × 10^0
   sum:          1000006(234) × 10^0
• Step 3: Normalize the result if necessary

              DFP addition
• Step 4: Round the number if needed
       1000006.234 × 10^0 ≈ 1000006 × 10^0
• Step 5: Repeat step 3 if the result is no
  longer normalized
• The final (rounded) result is 1000006
• The exact sum is 1000006.234
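
The addition steps can be sketched with integer coefficients. This is a simplified illustration, not the hardware algorithm: it handles only non-negative operands, aligns exactly using Python's unbounded integers instead of keeping a few extra digits, and rounds half to even. The function name `dfp_add` and the 7-digit default precision (as in the worked example) are assumptions.

```python
def dfp_add(c1, e1, c2, e2, p=7):
    """Add c1*10**e1 and c2*10**e2, rounding half to even to p digits."""
    # Step 1: equalize the exponents by aligning to the smaller one.
    if e1 < e2:
        c1, e1, c2, e2 = c2, e2, c1, e1
    # Step 2: add the aligned coefficients.
    total = c1 * 10 ** (e1 - e2) + c2
    exponent = e2
    # Steps 3 and 4: keep p digits and round using the discarded part.
    extra = len(str(total)) - p
    if extra > 0:
        kept, rest = divmod(total, 10 ** extra)
        half = 5 * 10 ** (extra - 1)
        if rest > half or (rest == half and kept % 2 == 1):
            kept += 1
        total, exponent = kept, exponent + extra
        # Step 5: rounding may carry out, e.g. 9999999 -> 10000000.
        if len(str(total)) > p:
            total //= 10
            exponent += 1
    return total, exponent

print(dfp_add(99999, 1, 16234, -3))   # the slide example
```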

                Guard digits
• To help minimize rounding problems,
  intermediate steps of operations keep
  guard digits: additional internal digits
  that increase the precision of the
  operations.
• Previous example: add one extra digit.
• To obtain the correctly rounded results that
  IEEE 754-2008 requires, implementations
  typically keep one guard digit, one round
  digit and one sticky digit.
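
How the three extra digits drive a rounding decision can be shown with a small sketch. It assumes round-half-to-even and takes the discarded digits as a string; the function name `round_half_even_grs` is hypothetical.

```python
def round_half_even_grs(kept, discarded):
    """Round the retained integer `kept` using the discarded digit string."""
    guard = int(discarded[0]) if discarded else 0
    round_digit = int(discarded[1]) if len(discarded) > 1 else 0
    sticky = any(ch != "0" for ch in discarded[2:])   # OR of all remaining digits
    below_guard = round_digit != 0 or sticky
    if guard > 5 or (guard == 5 and below_guard):
        return kept + 1                  # more than half: round up
    if guard == 5 and kept % 2 == 1:
        return kept + 1                  # exact tie: round to the even neighbour
    return kept                          # less than half, or tie with even kept

print(round_half_even_grs(1000006, "234"))   # the addition example
print(round_half_even_grs(1000007, "500"))   # tie, odd retained value
```

The sticky digit matters only when the guard digit is exactly 5: it tells a true tie apart from a value slightly above the halfway point.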
DFP add/sub




General Description: Addition




Example: Addition




Example: Addition (Cont.)




DFU: IBM POWER6 and Z10




High performance Implementation




High performance Implementation




High performance Implementation




[12] A. Vázquez and E. Antelo, “A High-Performance Significand BCD Adder with
IEEE 754-2008 Decimal Rounding,” ARITH19, Portland, June 8-10, 2009.

Evaluation Results and Comparison




  [Proposed]: A. Vázquez and E. Antelo, “A High-Performance Significand BCD
  Adder with IEEE 754-2008 Decimal Rounding,” ARITH19, Portland, June 8-10, 2009.
DFP Multiplication




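Scheme of decimal multiplier
   x = 1963, y = 8145; each digit of y is recoded as the sum of two multiples of x

   Partial product   Digit of y   Recoding     Terms
   xy0               5            5x + 0        9815 + 0000
   xy1               4            5x − x        9815 − 1963
   xy2               1            x + 0         1963 + 0000
   xy3               8            10x − 2x     19630 − 3926

   Product: 1963 × 8145 = 15988635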
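
The recoding shown in this scheme can be sketched in software. The digit ranges chosen below (0 to 2 as a small multiple, 3 to 7 as 5x plus a small multiple, 8 and 9 as 10x minus a small multiple) are an assumption that reproduces the slide's example; the function names are hypothetical.

```python
def recode_digit(d):
    """Split a decimal digit into (coarse, fine) with d = coarse + fine,
    coarse in {0, 5, 10} and fine in {-2, ..., 2}."""
    if d <= 2:
        return 0, d
    if d <= 7:
        return 5, d - 5
    return 10, d - 10

def multiply(x, y):
    """Multiply by summing two easy multiples of x per digit of y."""
    product = 0
    terms = []
    for position, ch in enumerate(reversed(str(y))):   # least significant digit first
        coarse, fine = recode_digit(int(ch))
        terms.append((coarse, fine))
        product += (coarse * x + fine * x) * 10 ** position
    return product, terms

product, terms = multiply(1963, 8145)
print(terms)
print(product)
```

Only the multiples x, 2x, 5x and 10x (and their negatives) are needed, which is what makes this recoding attractive for partial product generation.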

Partial product generation


                          Generate X·Yi
                          Yi ∈ {1, 2, 3, …, 9}
                          X·Yi is in carry-save format




Partial product generation
                   Solid Circles: BCD Sum (digit)
                   Hollow Circles: Carry (bit)




                          n-digit radix-10 CSA




                          m-digit radix-10 counter

Carry Save Adder Tree



                        CSA Tree to Generate
                        Multiplication Result




Flowchart of DFP Multiplier




Architecture of DFP Multiplier




Exception Detection & Handling
 • Invalid operation
    – sNaN (pass significand of sNaN)
    – 0 x ∞ (produce qNaN with significand 0)
 • Overflow (and Inexact)
    – IEIP – SLA > Emax
    – Increase SLA until all LZs removed
 • Underflow (and possibly Inexact)
    – IEIP – SLA < Emin
    – Decrease SLA until 0, then shift right
 • Inexact
       Implementation Highlights
• Leverage operands' LZCs
  – SC, SLA, and IESIP
• Handle NaNs with minimal overhead
  – No dataflow modification
  – Coerce multiplicand or multiplier to 1
• Support gradual underflow
  – No dataflow modification
  – Simply extend number of iterations
• Simple, control-based rounding scheme
                Synthesis Results
•   64-bit (16 digit) operands, DPD encoded
•   LSI Logic's gflxp 0.11um CMOS, 55ps FO4
•   Synopsys Design Compiler
•   Results
    – Fixed-point      119,653 um2       14.72 FO4s
    – Floating-point   237,607 um2       15.45 FO4s
• Critical path
    – Fixed-point      4:2 compressor (accumulator)
    – Floating-point   128-bit barrel shifter
Applicability to Parallel Designs
•   IE and IP shift generation
•   Rounding scheme
•   NaN handling
•   Exception detection and handling

• On-the-fly sticky bit generation... NO


  Sequential vs. Parallel
• Sequential
  – Less area
  – Potentially better cycle time
• Parallel
  – Less latency
  – Higher throughput


DFP Division




DFP Division Data Flow

[Figure: data flow of a 64-bit DFP divider. Each operand is unpacked into Sign (1 bit), Combinational Field (5 bits), Exponent Field (8 bits) and Significands Field (50 bits). Blocks shown: Sign Logic, Combin_Register, Exponent Subtraction, Bias Addition, DPD_to_BCD, Mantissa Division, Normalization, Exponent Adjustment, Rounding, Rounding Control, BCD_to_DPD and packing.]

• Unpacking
• Check for zeros and infinity
• Subtract exponents
• Divide mantissa
• Normalize and detect overflow and underflow
• Perform rounding
• Replace sign
• Packing

         Unpacking and Sign Logic
[Figure: each 64-bit operand is unpacked into Sign (1 bit), Combinational Field (5 bits), Exponent Field (8 bits) and Significands Field (50 bits); the Sign Logic block takes S1 and S2 (1 bit each) and produces Sq (1 bit).]

•    Step 1: Unpack the floating-point numbers
     Check for zeros and infinity (if F=0, stop)
•    Step 2: Sign process
     $S_q = S_1 \oplus S_2$

   Exponent Subtraction
[Figure: E1 and E2 (11 bits each) enter the Exponent Subtraction block, producing E12 (11 bits); the Bias Addition block then produces Eb (11 bits).]

•     Step 3: Exponent subtraction
      $E_b = E_1 - E_2 + bias$

        Mantissa Division
Which algorithm to choose here?
  1. Restoring division
  2. Non-restoring division
  3. High-radix division
  4. Convergence division

[Figure: M1 and M2 (64 bits each) enter the Mantissa Division block, producing M12 (68 bits).]

•     Step 4: Mantissa division
      $0.1 \le M_1 < 1, \quad 0.1 \le M_2 < 1$
      $M_{min} = 0.1, \quad M_{max} = 1 - 10^{-p} < 1$
      $0.1 < M_{min}/M_{max} \le M_1/M_2 \le M_{max}/M_{min} < 10$

                Normalization
[Figure: Eb (10 bits) enters the Exponent Adjustment block and M12 (68 bits) enters the Normalization block; a 1-bit flag Fa links the two; the outputs are Ea (10 bits) and Mn (68 bits).]

      •   Step 5: A left shift by one digit may be
          needed to normalize the mantissa result;
          overflow and underflow must also be
          detected
For example: “0934…2140819564” shifted left by one digit becomes
“934…21408195640”; the exponent is adjusted accordingly, Ea = Eb − 1

                   Rounding and Packing
[Figure: Ea (10 bits) enters the Exponent Adjustment block and Mn (68 bits) enters the Rounding block under Rounding Control; a 1-bit flag Fr links them; the outputs Eq (10 bits) and Mq (64 bits) are packed into Sign (1 bit), Combinational Field (5 bits), Exponent Field (8 bits) and Significands Field (50 bits).]

•    Step 6: Rounding: truncate, round up, or round to nearest.
     These simple policies can be biased; the IEEE
     default, “round to nearest even”, is preferable.
•    Step 7: Pack the sign bit, exponent bits and
     significand bits together, and detect NaN, Infinity, …
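
The division steps (sign logic, exponent subtraction with bias addition, mantissa division, normalization, rounding) can be sketched with integer coefficients. This is a simplified software illustration rather than the hardware data flow: operands are (sign, biased exponent, integer coefficient) triples, zeros and special values are assumed to be handled elsewhere, rounding is half to even, and the result is not reduced to the preferred exponent. The function name `dfp_divide` and the decimal64 defaults are assumptions.

```python
def dfp_divide(s1, e1, m1, s2, e2, m2, p=16, bias=398):
    """Divide two DFP operands; coefficients must have at most p digits."""
    if m1 == 0 or m2 == 0:
        raise ValueError("zero operands are handled separately")
    sq = s1 ^ s2                                    # Step 2: sign logic
    # Scale the dividend so the quotient has p+1 or p+2 digits.
    k = p + 1 + len(str(m2)) - len(str(m1))
    eb = e1 - e2 + bias - k                         # Step 3: subtract exponents, add bias
    q, remainder = divmod(m1 * 10 ** k, m2)         # Step 4: mantissa division
    drop = len(str(q)) - p                          # Step 5: digits beyond p
    kept, rest = divmod(q, 10 ** drop)
    half = 5 * 10 ** (drop - 1)
    sticky = remainder != 0                         # nonzero remainder acts as sticky
    if rest > half or (rest == half and (sticky or kept % 2 == 1)):
        kept += 1                                   # Step 6: round half to even
    if kept == 10 ** p:                             # rounding carried out of p digits
        kept //= 10
        drop += 1
    return sq, eb + drop, kept                      # fields ready for packing (Step 7)

print(dfp_divide(0, 398, 1, 0, 398, 3))   # 1 / 3
print(dfp_divide(1, 398, 2, 0, 398, 3))   # -2 / 3
```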
   High performance Implementation




[1] L.-K. Wang and M. J. Schulte, “Decimal Floating-Point Division Using Newton-Raphson
Iteration,” Proceedings of the IEEE International Conference on Application-Specific Systems,
Architectures and Processors, pp. 84-95, Sep. 2004.
   High performance Implementation




[2] Tomás Lang and Alberto Nannarelli, “A Radix-10 Digit-Recurrence Division Unit: Algorithm and
Architecture,” IEEE Transactions on Computers, pp. 727–739, June 2007.

High performance Implementation




 Evaluation Results and Comparison

                           DFP Divider [1]         DFP Divider [2]
     Precision (digits)    16 (decimal64)          16 (decimal64)
     Cycle time (ns)       0.57                    1
     # of cycles           150                     20
     Latency (ns)          85.5                    20


1:   Synthesized with an STM 90-nm standard cell library


DFP Transcendental Arithmetic




                Contents
•   Introduction
•   Decimal Logarithmic Converter
•   Decimal Antilogarithmic Converter
•   Conclusions
•   Future Work




          32-bit DFP Logarithm
$X = (-1)^{s} \times 10^{e} \times \mathrm{coefficient}$
$R = \log_{10}(X) = \log_{10}(10^{e}) + \log_{10}(\mathrm{coefficient})$

The coefficient is a non-normalized decimal integer.

Example: $R = \log_{10}((-1)^{0} \times 10^{8} \times 0024589) = 8 + 5 + \log_{10}(0.2458900)$

To guarantee 32-bit DFP accuracy, the fixed-point (FXP) logarithm
calculation must keep 14 digits.
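
The decomposition in the example can be checked with Python's `decimal` module. This is a sketch under stated assumptions: `split_log10` is a hypothetical helper, the coefficient is passed as an integer (so leading zeros vanish), and `Decimal.log10()` stands in for the converter's fixed-point logarithm.

```python
from decimal import Decimal, getcontext

getcontext().prec = 20


def split_log10(exponent: int, coefficient: int):
    """Return (integer part, fraction in [0.1, 1)) so that
    log10(coefficient * 10**exponent) = integer part + log10(fraction)."""
    if coefficient <= 0:
        raise ValueError("the logarithm needs a positive coefficient")
    n = len(str(coefficient))                    # significant digits, here 5
    fraction = Decimal(coefficient) / (Decimal(10) ** n)
    return exponent + n, fraction


int_part, fraction = split_log10(8, 24589)       # coefficient 0024589
result = int_part + fraction.log10()             # 8 + 5 + log10(0.24589)
print(int_part, fraction, result)
```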
        32-bit DFP Antilogarithm
$P = \mathrm{Antilog}_{10}(X) = 10^{X}$
Here: $\log_{10}(X_{\min}) \le X \le \log_{10}(X_{\max})$

For 32-bit DFP: $X \in [-101, 96.99999]$

$\mathrm{Antilog}_{10}(X) = 10^{X_{Int} + X_{Frac}} = 10^{X_{Int}} \times 10^{X_{Frac}}$

Example: $\mathrm{Antilog}_{10}((-1)^{0} \times 1940467 \times 10^{-5}) = \mathrm{Antilog}_{10}(19.40467) = 10^{19} \times 10^{0.4046700}$

To guarantee 32-bit DFP accuracy, the fixed-point (FXP) antilogarithm
calculation must keep 8 digits.
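
The split into integer and fractional parts can be sketched as follows. `split_antilog10` is a hypothetical helper. Taking the floor is an assumption made here so that the fraction stays in [0, 1) for negative inputs as well. `Decimal` powers stand in for the fixed-point antilogarithm.

```python
from decimal import Decimal, getcontext, ROUND_FLOOR

getcontext().prec = 20


def split_antilog10(x):
    """Split x into (x_int, x_frac) with x_frac in [0, 1),
    so that 10**x = 10**x_int * 10**x_frac."""
    x = Decimal(x)
    x_int = x.to_integral_value(rounding=ROUND_FLOOR)
    return int(x_int), x - x_int


x_int, x_frac = split_antilog10("19.40467")
significand = Decimal(10) ** x_frac        # lies in [1, 10)
print(x_int, x_frac, significand)
```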
Digit-Recurrence Algorithm (Log)
The corresponding recurrences:
           $E[j+1] = E[j]\,(1 + e_j 10^{-j})$
           $L[j+1] = L[j] - \log_{10}(1 + e_j 10^{-j})$

 Here:      $E[1] = m$,   $L[1] = 0$

         $e_j \in \{-9, -8, -7, \ldots, 0, 1, \ldots, 7, 8, 9\}$

$e_j$ is selected so that $E[j+1]$ converges to 1
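
The logarithm recurrence can be tried out in software. This is a minimal sketch, not the presented hardware, and it makes several assumptions. Digits for j ≥ 2 are chosen by rounding $10^{j}(1 - E[j])$. The first digit is found by a direct search standing in for a look-up table. Inputs below 0.5 are first multiplied by 2, 3 or 5, with L initialized to minus the log of that multiplier, so that a single first digit can bring E close to 1. `Decimal.log10()` replaces the table of $\log_{10}(1 + e_j 10^{-j})$ constants.

```python
from decimal import Decimal, getcontext

getcontext().prec = 30


def log10_digit_recurrence(m, steps=14):
    """Approximate log10(m) for 0.1 <= m < 1 with the recurrences for E and L."""
    m = Decimal(m)
    if not (Decimal("0.1") <= m < 1):
        raise ValueError("m must lie in [0.1, 1)")
    # Assumed prescaling: bring k*m into [0.5, 1) and start L at -log10(k).
    k = next(c for c in (1, 2, 3, 5) if c * m >= Decimal("0.5"))
    E = k * m
    L = -Decimal(k).log10()
    for j in range(1, steps + 1):
        scale = Decimal(10) ** -j
        if j == 1:
            # Stand-in for the first-digit table: pick the digit that brings E closest to 1.
            e = min(range(-9, 10), key=lambda d: abs(1 - E * (1 + d * scale)))
        else:
            # Selection by rounding of the scaled remainder 10^j (1 - E[j]).
            e = round((1 - E) / scale)
        factor = 1 + e * scale
        E *= factor
        L -= factor.log10()
    return L, E


L, E = log10_digit_recurrence("0.24589")
print(L, E)   # L approximates log10(0.24589), E is close to 1
```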

Digit-Recurrence Algorithm (Antilog)
For any 7-digit fixed-point decimal input $m$:
                $10^{m} = e^{m \ln(10)} = e^{m'}$
The corresponding recurrences:
         $L[j+1] = L[j] - \ln(1 + e_j 10^{-j})$
         $E[j+1] = E[j] \times (1 + e_j 10^{-j})$

Here: $E[1] = 1$,   $L[1] = m'$,   $f_j = 1 + e_j 10^{-j}$
 $e_j$ is selected so that $L[j+1]$ converges to 0
              $e_j \in \{-9, -8, -7, \ldots, 0, 1, \ldots, 7, 8, 9\}$
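
A software sketch of the antilogarithm recurrence follows, with its assumptions stated up front. The scaled remainder is taken as $10^{j} L[j]$, the quantity being driven to 0, and digits for j ≥ 2 are chosen by rounding it. The first step is an assumption of this sketch rather than something recoverable from the slides: a direct search over a wider first-digit range (0 to 90) stands in for the look-up table so that the whole input range 0 ≤ m < 1 converges. `Decimal.ln()` replaces the table of $\ln(1 + e_j 10^{-j})$ constants.

```python
from decimal import Decimal, getcontext

getcontext().prec = 30
LN10 = Decimal(10).ln()


def antilog10_digit_recurrence(m, steps=10):
    """Approximate 10**m for 0 <= m < 1 with the recurrences for L and E."""
    m = Decimal(m)
    if not (0 <= m < 1):
        raise ValueError("m must lie in [0, 1)")
    L = m * LN10          # L[1] = m' = m * ln(10)
    E = Decimal(1)        # E[1] = 1
    for j in range(1, steps + 1):
        scale = Decimal(10) ** -j
        if j == 1:
            # Assumed stand-in for the first-digit table (wider digit range).
            e = min(range(0, 91), key=lambda d: abs(L - (1 + d * scale).ln()))
        else:
            # Selection by rounding of the scaled remainder 10^j L[j].
            e = round(L / scale)
        factor = 1 + e * scale
        L -= factor.ln()
        E *= factor
    return E, L


E, L = antilog10_digit_recurrence("0.40467")
print(E, L)   # E approximates 10**0.40467, L is close to 0
```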
  Selection By Rounding (cont.)
A scaled remainder is defined as:

Log:            $W[j] = 10^{j}(1 - E[j])$

Antilog:        $W[j] = 10^{j} L[j]$

$e_j$ is obtained by rounding $W[j]$:
                 $e_j = \mathrm{round}(W[j])$
$e_1$ is obtained from a look-up table; $e_2, \ldots, e_j$ are
obtained with selection by rounding.
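
A short worked bound suggests why rounding suffices for the later digits but not for the first one. It is a sketch for the logarithm case only, taking the recurrence $E[j+1] = E[j](1 + e_j 10^{-j})$ and introducing two symbols that are not on the slides: $r_j = 1 - E[j]$ for the remainder and $\varepsilon_j$ for the rounding error, with $e_j = W[j] + \varepsilon_j$ and $|\varepsilon_j| \le 1/2$.

```latex
\begin{aligned}
E[j+1] &= (1 - r_j)\,\bigl(1 + r_j + \varepsilon_j 10^{-j}\bigr)
        = 1 - r_j^{2} + \varepsilon_j 10^{-j}(1 - r_j) \\
W[j+1] &= 10^{j+1}\,(1 - E[j+1])
        = 10^{1-j}\,W[j]^{2} - 10\,\varepsilon_j (1 - r_j) \\
|W[j+1]| &\le 10^{1-j}\,W[j]^{2} + 5\,(1 + |r_j|)
\end{aligned}
```

For j = 2 with |W[2]| ≤ 5, the bound gives |W[3]| ≤ 2.5 + 5.25 = 7.75, so the rounded digit stays inside {-9, …, 9}, and the quadratic term shrinks further at every later step. For j = 1 the quadratic term is W[1]² itself, which can be far larger than 9. This is consistent with taking the first digit from a look-up table.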
Architecture: Decimal Log Converter
[Figure: two-stage datapath of the decimal logarithmic converter. Labeled blocks: Reg 1 (28-bit input m), Detector, Mult1 (forming m, 2m, 3m, 5m), Tab I (e1), Reg 2 to Reg 6, Mux 1 to Mux 9, Mult2, Mult3, Tab II, 9's complementers, shifters (x10^-j, x10, x100), two 14-digit decimal CLA adders, one 16-digit decimal CLA adder, rounding logic producing e_j from W[j], an adjusted constant (1/ln(10)), and the constants log10(5, 2, 3). Stage 1 datapaths are 56 bits wide and Stage 2 datapaths are 64 bits wide. The critical path is marked.]

         Implementation Results
    Logic Utilization        Used    Available*    Utilization
    # of Occupied Slices     2842    13696         21%
    Maximum Frequency        47.7 MHz
    # of Clock Cycles        17

*: Xilinx Virtex2p XC2VP30 with package ff1157 and speed -7


Critical Path Detail (ns):
 Reg2     Mux2     Mult2    Shifter  Mux5     CLA      Round    Total
 1.188    1.564    9.347    1.438    1.350    5.519    0.566    20.97
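
The figures on this slide are consistent with each other, as a quick check shows. The stage delays are copied from the critical-path table. The total latency printed last is derived here from the 17 clock cycles and is not a number reported on the slide.

```python
# Critical-path stage delays of the log converter, in ns (from the table).
stages_ns = {
    "Reg2": 1.188, "Mux2": 1.564, "Mult2": 9.347, "Shifter": 1.438,
    "Mux5": 1.350, "CLA": 5.519, "Round": 0.566,
}

period_ns = sum(stages_ns.values())   # minimum clock period
f_max_mhz = 1000.0 / period_ns        # maximum clock frequency
latency_ns = 17 * period_ns           # 17 clock cycles per conversion (derived)

print(f"{period_ns:.2f} ns, {f_max_mhz:.1f} MHz, {latency_ns:.0f} ns")
```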


  Architecture: Dec. Antilog Converter
[Figure: two-stage datapath of the decimal antilogarithmic converter. Labeled blocks: Reg 1 (28-bit input X_frac), a constant multiplier by ln(10) forming the 40-bit m' (Reg 2), TAB I (e1), TABLE II, two address generators (AddGen), Mux 1 to Mux 5, Mult, 9's complementers, shifters (x10^(j+1), x10^-j), Shifter_Reg, two 10-digit decimal CLA adders, rounding logic producing e_j from W[j], Reg 3, Reg 6, and a final rounding stage to 28 bits (Reg 5). Datapaths are 40 bits wide. The critical path is marked.]

           Implementation Results
    Logic Utilization        Used    Available*    Utilization
    # of Occupied Slices     2315    13696         17%
    Maximum Frequency        51.5 MHz
    # of Clock Cycles        11

*: Xilinx Virtex2p XC2VP30 with package ff1157 and speed -7

Critical Path Detail (ns):

   Reg6     Mult     Mux4     Shifter  CLA      Round    Total
   1.599    7.839    1.539    1.100    6.794    0.545    19.42


                  Comparison
  (with Binary FXP Log and Exponential Converters)
• Similar dynamic range for the normalized coefficients:
      $2^{23} < 10^{7} < 2^{24}$          $2^{53} < 10^{16} < 2^{54}$
• A binary reference design is available that uses the same
  digit-recurrence algorithm with selection by rounding.
• Radix 10 is close to radix 8.




                       Comparison (cont.)
      (with Binary FXP Log and Exponential Converters)

                           Radix-10 Decimal (1)           Radix-8 Binary [1]
                           Log.           Exp.            Log.           Exp.
     Precision             7     16       7     16        24    53       24    53
       (decimal: digits, binary: bits)
     Area (fa) (2)         1630  2640     1370  2260      647   1829     627   1777
     Cycle time (T) (3)    17    19       16    18        7     8        7     8
     # of cycles           8     17       8     17        8     18       11    21
     Latency (T) (3)       136   323      128   306       56    144      77    168

1: Synthesized with a TSMC 0.18-um standard cell library
2: fa is the area of a 1-bit full adder
3: T is the delay of a 1-bit full adder
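
The latency row is the product of cycle time and cycle count, which the following sketch verifies from the table values. The dictionary keys are labels made up for this example, and the printed ratio is derived here rather than reported on the slide.

```python
# (cycle time in full-adder delays, number of cycles, reported latency)
designs = {
    "decimal log, 7 digits":  (17, 8, 136),
    "decimal log, 16 digits": (19, 17, 323),
    "decimal exp, 7 digits":  (16, 8, 128),
    "decimal exp, 16 digits": (18, 17, 306),
    "binary log, 24 bits":    (7, 8, 56),
    "binary log, 53 bits":    (8, 18, 144),
    "binary exp, 24 bits":    (7, 11, 77),
    "binary exp, 53 bits":    (8, 21, 168),
}

# Latency = cycle time x number of cycles for every column of the table.
consistent = all(t * n == latency for t, n, latency in designs.values())

# Derived comparison: 16-digit decimal log versus 53-bit binary log.
ratio = designs["decimal log, 16 digits"][2] / designs["binary log, 53 bits"][2]
print(consistent, round(ratio, 2))
```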

                Conclusions
• Achieved 32-bit DFP accuracy for the decimal log and
  antilog results.
• Implemented both converters on FPGA and ASIC.
• Compared them with binary converters.




                        Future Work
• Extend the design to 64-bit and 128-bit DFP logarithm and
  antilogarithm converters.
• Optimize the presented architecture for higher speed or
  smaller area.




    EE990 April. 2009    Decimal Log and Antilog Converters   79/18
                    Summary
• IEEE 754-2008 defines a DFP standard that
  specifies
   – number representation in several precisions
   – correct DFP arithmetic operations
   – rounding modes
• Implementations of a DFP adder, multiplier, divider, and
  logarithmic and antilogarithmic converters were presented
• Implementing and programming DFP are both
  really hard.



								