Decimal Floating-Point Arithmetic
Dongdong Chen
EE800, U of S

Objectives
• IEEE 754-2008 standard for Decimal Floating-Point (DFP) arithmetic (Lecture 1)
  – DFP number formats
  – DFP number encoding
  – DFP arithmetic operations
  – DFP rounding modes
  – DFP exception handling

Objectives (Con.)
• Algorithm, architecture and VLSI circuit design for DFP arithmetic (Lecture 2)
  – DFP adder/subtracter
  – DFP multiplier
  – DFP divider
  – DFP transcendental function computation

Background
"The decimal computer arithmetic went out of style 25 to 30 years ago; no one uses it now." Is that true?

Introduction
• Decimal is still essential for specific applications
  – Numbers in commercial databases are decimal
  – Commercial applications use decimal extensively
  – Survey report on commercial databases
  – Decimal fixed-point or floating-point numbers
• How to process decimal computation
  – Software computation
  – Convert back to decimal representation
  – Problems

Introduction (Con.)
• Errors from decimal and binary conversion
  – Example 1: represent 0.1 in DFP or BFP
    Decimal representation (BCD code): 0.0001
    Binary representation: 0.00011… (0.09… when truncated)
  – Example 2: telephone billing. Cost: 0.70; Tax: 5%
    BFP arithmetic: 0.6999…8 × 1.05 = 0.734999…, rounded to 0.73
    DFP arithmetic: 0.70 × 1.05 = 0.735, rounded to 0.74
• Decimal integer, fixed-point or floating-point?
• Decimal hardware or software solutions?

Current Research
• DFP arithmetic is defined in IEEE 754-2008
• IBM computing systems include DFP hardware
  – IBM POWER6, z9, z10
• Intel provides a DFP software solution
  – Intel DFP software computation library
• DFP arithmetic IP blocks:
  – Basic DFP arithmetic IPs: DFP adder/subtracter, multiplier, divider, square root, etc.
  – Transcendental DFP arithmetic IPs: DFP CORDIC, logarithm, antilogarithm, reciprocal, etc.
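The telephone billing example from the Introduction can be checked in software. The following is a minimal sketch that uses Python's standard `decimal` module as a stand-in for DFP arithmetic; that choice, the helper name `bill`, and rounding to cents with round-half-up are assumptions made for illustration and are not taken from the slides.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def bill(cost, tax_rate):
    """Return cost * (1 + tax_rate), rounded to cents (round half up)."""
    return (cost * (1 + tax_rate)).quantize(CENT, rounding=ROUND_HALF_UP)

# Binary floating point: 0.70 and 1.05 are not exactly representable,
# so the double-precision product falls just below 0.735 and rounds down.
binary_product = Decimal(0.70 * 1.05)   # exact value of the double result
binary_total = binary_product.quantize(CENT, rounding=ROUND_HALF_UP)

# Decimal floating point: 0.70 * 1.05 is exactly 0.7350 and rounds up.
decimal_total = bill(Decimal("0.70"), Decimal("0.05"))

print(binary_total)   # 0.73
print(decimal_total)  # 0.74
```

The one-cent difference comes only from the representation of the operands, not from the rounding rule, which is the same in both cases.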
DFP Arithmetic in IEEE 754-2008
• Review BFP arithmetic in IEEE 754-2008
• How to define new DFP in IEEE 754-2008

BFP Floating-Point Representation
• Representation:
  – sign, exponent, significand (or mantissa): (-1)^sign × significand × 2^exponent
  – more bits for significand gives more accuracy
  – more bits for exponent increases range
• IEEE 754 floating-point standard:
  – single precision: 8-bit exponent, 23-bit significand
  – double precision: 11-bit exponent, 52-bit significand

BFP Floating-Point Number
• Leading "1" bit of significand is implicit
  – Example: if the stored significand is 011010110…0, the actual significand is 1.011010110…0
• This is called a normalized number; there is exactly one non-zero digit to the left of the point.
  – Unique representation of a number
  – We get a little more precision: there are 24 bits in the significand, but only 23 of them are stored.

Exponent
• Exponent is "biased" to make sorting easier
  – all 0s is smallest exponent, all 1s is largest
  – The actual exponent is e - 127 for single precision and e - 1023 for double precision (bias of 127 and 1023)
  – By biasing the exponent and storing it before the significand, we can compare magnitudes as if they were unsigned integers.
• If e = 1000 0011 (131 in decimal), the actual exponent is 131 - 127 = 4
• If e = 0101 1101 (93 in decimal), the actual exponent is 93 - 127 = -34

BFP Floating-Point Formats
• Short (32-bit) format: Sign | Exponent | Significand
  – Exponent: 8 bits, bias = 127, range -126 to 127
  – Significand: 23 bits for fractional part (plus hidden 1 in integer part)
• Long (64-bit) format: Sign | Exponent | Significand
  – Exponent: 11 bits, bias = 1023, range -1022 to 1023
  – Significand: 52 bits for fractional part (plus hidden 1 in integer part)

BFP Floating-Point Formats (Con.)
• Positive and negative zero: sign 0 or 1, biased exponent 00000000, fraction 00000000000000000000000
• Positive and negative infinity: sign 0 or 1, biased exponent 11111111, fraction 00000000000000000000000
• Number line (single precision), from left to right: negative overflow | expressible negative numbers | negative underflow | 0 | positive underflow | expressible positive numbers | positive overflow
  – Overflow boundaries: ±(2 - 2^-23) × 2^127
  – Underflow boundaries: ±2^-126
• Biased exponent 11111111 (actual exponent 128) with fraction ≠ 0 is called "not a number" (NaN)

Example
• Summary: FP representation (-1)^sign × (1 + significand) × 2^(exponent - bias)
• Example:
  – decimal: -0.75 = -3/4 = -3/2^2
  – binary: -0.11 = -1.1 × 2^-1
  – floating point: exponent = 126 = 01111110
  – IEEE single precision: 1 01111110 10000000000000000000000

DFP Number Representation
• Representation:
  – sign, exponent, significand (or mantissa): (-1)^sign × significand × 10^exponent
  – more digits for significand gives more accuracy
  – more bits for exponent increases range
• DFP formats:
  – decimal32: DFP storage format encoded in 32 bits
  – decimal64: DFP computational format encoded in 64 bits
  – decimal128: DFP computational format encoded in 128 bits

DFP Number Format
• 1-bit sign (S), defined the same as in the BFP format
• (w+5)-bit combination field (G), divided into two subfields:
  – 5 bits (G0…G4) encode: the 2 MSBs of the exponent; the MSD of the significand; Not-a-Number (NaN); infinity (Inf)
  – w bits (G5…Gw+4) form a suffix to the 2 exponent MSBs derived from G0…G4; together they give the (w+2)-bit nonnegative biased exponent

DFP Exponent
• Exponent is "biased" to make sorting easier
  – Binary format (not decimal)
  – The actual exponent is e - 101 for decimal32, e - 398 for decimal64, e - 6176 for decimal128
  – Range of exponent is (emin - q + 1) ≤ e ≤ (emax - q + 1), where q is the number of significand digits

DFP Number Format (Con.)
• (J × 10)-bit trailing significand (T) field:
  – Densely packed decimal (DPD) encoding: each 3-digit decimal group is encoded as a 10-bit binary number; DPD is converted to binary coded decimal (BCD) for computation
  – Binary integer decimal (BID) encoding: the decimal significand is encoded as a binary integer
  – Non-normalized decimal significand: (-1)^0 × 0.00900 × 10^2 and (-1)^0 × 0.09000 × 10^1 represent the same value
  – DFP number's cohort

Parameters in DFP Format

Example
• Summary: DFP representation (-1)^sign × significand × 10^(exponent - bias)
• Convert -8.35 = -835 × 10^-2 to decimal64
  – Sign bit: "1" negative, "0" positive (here sign = 1)
  – Exponent: -2 + 398 = 396 (10-bit "0110001100")
  – Significand: 835 (50-bit DPD coding, hex "0…00 02 3D")
  – Encoding of the 5 MSBs (G0…G4) of the combination field: "01000"
  – decimal64: binary "1 01000 10001100 0…0 1000111101", hex "A2 30 00 00 00 00 02 3D"

DFP Special Values
• Not-a-Number: G0…G4 = "11111"
• Infinity: G0…G4 = "11110"; the sign of Inf is given by the sign bit
• Overflow: occurs if the absolute value of a result is larger than the largest DFP number, |vmax| = (10^q - 1) × 10^(emax-q+1)
• Underflow: occurs if the absolute value of a result is less than the smallest DFP number, |vmin| = 10^(emin-q+1). If the absolute value is less than 10^emin but at least 10^(emin-q+1), the result is subnormal.
• Normal number: The remaining exponent values and significands represent normal numbers.
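The decimal64 example above (sign 1, significand 835, exponent -2) can be reproduced with a short encoder. This is a minimal sketch, assuming the DPD variant of decimal64 and finite inputs only; the function names `dpd_declet` and `encode_decimal64` are hypothetical, and NaN, infinity and range checks beyond the basics are left out.

```python
BIAS = 398        # decimal64 exponent bias
PRECISION = 16    # significand digits: 1 leading digit + 15 trailing digits

def dpd_declet(d2, d1, d0):
    """Pack three decimal digits (d2 most significant) into 10 DPD bits."""
    a, b, c, d = [(d2 >> s) & 1 for s in (3, 2, 1, 0)]
    e, f, g, h = [(d1 >> s) & 1 for s in (3, 2, 1, 0)]
    i, j, k, m = [(d0 >> s) & 1 for s in (3, 2, 1, 0)]
    # The row is chosen by which digits are "large" (8 or 9): key (a, e, i).
    rows = {
        (0, 0, 0): (b, c, d, f, g, h, 0, j, k, m),
        (0, 0, 1): (b, c, d, f, g, h, 1, 0, 0, m),
        (0, 1, 0): (b, c, d, j, k, h, 1, 0, 1, m),
        (1, 0, 0): (j, k, d, f, g, h, 1, 1, 0, m),
        (1, 1, 0): (j, k, d, 0, 0, h, 1, 1, 1, m),
        (1, 0, 1): (f, g, d, 0, 1, h, 1, 1, 1, m),
        (0, 1, 1): (b, c, d, 1, 0, h, 1, 1, 1, m),
        (1, 1, 1): (0, 0, d, 1, 1, h, 1, 1, 1, m),
    }
    value = 0
    for bit in rows[(a, e, i)]:
        value = (value << 1) | bit
    return value

def encode_decimal64(sign, coefficient, exponent):
    """Encode (-1)^sign * coefficient * 10^exponent as a 64-bit DPD word."""
    if not 0 <= coefficient < 10 ** PRECISION:
        raise ValueError("coefficient needs more than 16 digits")
    biased = exponent + BIAS
    if not 0 <= biased < 768:                 # 10-bit field, 2 MSBs in {00, 01, 10}
        raise ValueError("exponent out of range")
    digits = [int(ch) for ch in f"{coefficient:016d}"]
    msd, rest = digits[0], digits[1:]
    exp_hi, exp_lo = biased >> 8, biased & 0xFF
    if msd < 8:                               # G0G1 = exponent MSBs, G2..G4 = MSD
        comb = (exp_hi << 3) | msd
    else:                                     # "11", exponent MSBs, low bit of MSD
        comb = 0b11000 | (exp_hi << 1) | (msd & 1)
    trailing = 0
    for n in range(0, 15, 3):                 # five declets of three digits each
        trailing = (trailing << 10) | dpd_declet(*rest[n:n + 3])
    return (sign << 63) | (comb << 58) | (exp_lo << 50) | trailing

word = encode_decimal64(sign=1, coefficient=835, exponent=-2)
print(f"{word:016X}")   # A23000000000023D
```

The printed word matches the hex string on the example slide. A BID encoder would differ only in how the significand is stored.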
DFP Arithmetic Operations
• Basic DFP arithmetic operations
• Two decimal-specific DFP operations
  – SameQuantum(DFP1, DFP2)
  – Quantize(DFP1, DFP2)
• DFP comparison operations
  – do not distinguish between redundant representations of the same number
• DFP conversion operations
  – DFP to BFP conversion (correctly rounded)
  – DFP to integer conversion
• Recommended DFP operations

DFP Number's Cohort
• Non-normalized decimal significand
• DFP number's cohort: the set of all representations of the same numerical value
• The standard defines the preferred (required) exponent (quantum)
  – Exact operation results: the cohort member is selected based on the preferred exponent (quantum) for a DFP result of that operation
  – Inexact operation results: the cohort member of least possible exponent is used to get the maximum number of significant digits

DFP Rounding Modes
• Five rounding modes
  – roundTiesToEven
  – roundTiesToAway
  – roundTowardPositive
  – roundTowardNegative
  – roundTowardZero
• Correct rounding and faithful rounding
• IEEE 754-2008 requires correctly rounded results for all DFP arithmetic operations
• DFP operations should support all rounding modes

DFP Exception Handling
• Invalid operation: operand is a signaling NaN; 0 × Inf; square root of a negative operand. The default result is NaN.
• Division by zero: if the dividend is a finite non-zero number and the divisor is zero. The default result is +Inf or -Inf.
• Overflow: if the magnitude of a result exceeds the largest finite number representable in the format of the operation.
• Underflow: if the magnitude of a result is below 10^emin.
• Inexact: the correctly rounded result of an operation differs from the infinite-precision result.

DFP Addition/Subtraction

DFP Add/Sub Data Flow

DFP Addition
• Step 1: equalize the exponents
  – add the mantissas only when the exponents are the same
  – the operand with the smaller exponent has its point shifted to the left, and the operand with the larger exponent has its point shifted to the right
  – rewriting the operand with the smaller exponent could result in a loss of the least significant digits
  – keep a guard digit, a round digit and a sticky digit for the operand with the smaller exponent
• Step 2: add the mantissas
    0099999 × 10^1 + 0016234 × 10^-3
    =   0999990      × 10^0
      + 0000016(234) × 10^0
    =   1000006(234) × 10^0
• Step 3: Normalize the result if necessary
• Step 4: Round the number if needed: 1000006.234 × 10^0 becomes 1000006 × 10^0
• Step 5: Repeat step 3 if the result is no longer normalized
• The final result is 1000006
• The exact answer is 1000006.234

Guard Digits
• To help minimize rounding problems, intermediate steps of operations store guard digits: additional internal digits that increase the precision of the operations.
• Previous example: add one extra digit.
• Correct rounding as required by IEEE 754-2008 is typically implemented with one guard digit, one round digit and one sticky digit.

DFP Add/Sub

General Description: Addition

Example: Addition

Example: Addition (Con.)

DFU: IBM POWER6 and z10

High Performance Implementation
[12] A. Vázquez and E. Antelo, "A High-performance Significand BCD Adder with IEEE 754-2008 Decimal Rounding," ARITH19, Portland, June 08-10, 2009.

Evaluation Results and Comparison
[Proposed]: A. Vázquez and E. Antelo, "A High-performance Significand BCD Adder with IEEE 754-2008 Decimal Rounding," ARITH19, Portland, June 08-10, 2009.

DFP Multiplication

Scheme of Decimal Multiplier
x: 1963 × y: 8145 = 15988635
Each partial product x·yi is formed from two multiples of x:
  xy0 (y0 = 5):  5x  =  9815,   0  =  0000
  xy1 (y1 = 4):  5x  =  9815,  -x  = -1963
  xy2 (y2 = 1):   x  =  1963,   0  =  0000
  xy3 (y3 = 8): 10x  = 19630, -2x  = -3926

Partial Product Generation
• Generate X·Yi, Yi ∈ {1, 2, 3, …, 7, 8, 9}
• X·Yi is in carry-save format
• Figure legend: solid circles are BCD sums (digits); hollow circles are carries (bits); n-digit radix-10 CSA; m-digit radix-10 counter

Carry Save Adder Tree
• CSA tree to generate the multiplication result

Flowchart of DFP Multiplier

Architecture of DFP Multiplier

Exception Detection & Handling
• Invalid operation
  – sNaN (pass significand of sNaN)
  – 0 × ∞ (produce qNaN with significand 0)
• Overflow (and Inexact)
  – IEIP - SLA > Emax
  – Increase SLA until all LZs removed
• Underflow (and possibly Inexact)
  – IEIP - SLA < Emin
  – Decrease SLA until 0, then shift right
• Inexact

Implementation Highlights
• Leverage operands' LZCs
  – SC, SLA, and IESIP
• Handle NaNs with minimal overhead
  – No dataflow modification
  – Coerce multiplicand or multiplier to 1
• Support gradual underflow
  – No dataflow modification
  – Simply extend number of iterations
• Simple, control-based rounding scheme

Synthesis Results
• 64-bit (16-digit) operands, DPD encoded
• LSI Logic's gflxp 0.11 um CMOS, 55 ps FO4
• Synopsys Design Compiler
• Results
  – Fixed-point: 119,653 um^2, 14.72 FO4s
  – Floating-point: 237,607 um^2, 15.45 FO4s
• Critical path
  – Fixed-point: 4:2 compressor (accumulator)
  – Floating-point: 128-bit barrel shifter

Applicability to Parallel Designs
• IE and IP shift generation
• Rounding scheme
• NaN handling
• Exception detection and handling
• On-the-fly sticky bit generation... NO

Sequential vs. Parallel
• Sequential
  – Less area
  – Potentially better cycle time
• Parallel
  – Less latency
  – Higher throughput

DFP Division

DFP Division Data Flow
[Figure: 64-bit decimal64 division datapath. Inputs and output are split into sign (1 bit), combination field (5 bits), exponent field (8 bits) and significand field (50 bits).]
• Unpacking (DPD to BCD)
• Check for zeros and infinity
• Sign logic
• Subtract exponents (exponent subtraction, bias addition)
• Divide mantissas
• Normalize and detect overflow and underflow (exponent adjustment)
• Perform rounding
• Replace sign
• Packing (BCD to DPD)

Unpacking and Sign Logic
• Step 1: Unpack the floating-point numbers; check for zeros and infinity (if F = 0, stop)
• Step 2: Sign process (sign logic): Sq = S1 ⊕ S2

Exponent Subtraction
• Step 3: Exponent subtraction followed by bias addition: E12 = E1 - E2, Eb = E1 - E2 + bias

Mantissa Division
• Algorithms to choose from:
  1. Restoring division
  2. Non-restoring division
  3. High-radix division
  4. Convergence division
• Step 4: Mantissa division, M12 = M1 / M2, with
  $0.1 \le M_1 < 1$, $0.1 \le M_2 < 1$
  $M_{\min} = 0.1$, $M_{\max} = 1 - 10^{-p}$
  $0.1 < M_{\min}/M_{\max} \le M_1/M_2 \le M_{\max}/M_{\min} < 10$

Normalization
• Step 5: A left shift by one digit is needed when the mantissa result has a leading zero; overflow and underflow must also be detected.
  For example, "0934…2140819564" is shifted left by one digit to "934…21408195640", and the exponent is adjusted accordingly: Ea = Eb - 1.

Rounding and Packing
• Step 6: Rounding: truncate, round up, or round to nearest. These simple policies can be biased; according to the IEEE rounding standard, "round to nearest even" is better.
• Step 7: Packing: pack the sign bit, exponent bits and significand bits together; detect NaN and infinity.

High Performance Implementation
[1] L.-K. Wang and M. J. Schulte, "Decimal Floating-Point Division Using Newton-Raphson Iteration," Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors, pp. 84-95, Sep. 2004.
[2] Tomás Lang and Alberto Nannarelli, "A Radix-10 Digit-Recurrence Division Unit: Algorithm and Architecture," IEEE Transactions on Computers, pp. 727-739, June 2007.
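The division steps above (sign logic, exponent subtraction, mantissa division, normalization, rounding) can be traced in a small software model. This is a sketch under stated assumptions, not the datapath of the slides or of references [1] and [2]: it picks radix-10 restoring division from the list of algorithms, works on unpacked `(sign, coefficient, exponent)` triples with integer coefficients, rounds with round-half-even using one guard digit and a sticky flag, and ignores NaN, infinity, overflow and underflow. The names `restoring_digits` and `dfp_divide` are hypothetical.

```python
def restoring_digits(dividend, divisor, num_digits):
    """Radix-10 restoring division.

    Requires divisor <= dividend < 10 * divisor. Returns the first
    num_digits quotient digits as an integer, plus a sticky flag that
    is True when a non-zero remainder is left over.
    """
    quotient, rem = 0, dividend
    for _ in range(num_digits):
        digit = 0
        rem -= divisor
        while rem >= 0:            # subtract until the remainder goes negative
            digit += 1
            rem -= divisor
        rem += divisor             # restore the last (failed) subtraction
        quotient = quotient * 10 + digit
        rem *= 10                  # move on to the next digit position
    return quotient, rem != 0


def dfp_divide(x, y, precision=16):
    """Divide two (sign, coefficient, exponent) triples, round half even."""
    (sx, cx, ex), (sy, cy, ey) = x, y
    if cy == 0:
        raise ZeroDivisionError("division by zero")
    sign = sx ^ sy                         # sign logic: Sq = S1 xor S2
    preferred = ex - ey                    # exponent subtraction (unbiased here)
    if cx == 0:
        return sign, 0, preferred
    exp = preferred
    while cx < cy:                         # pre-normalize the operands so that
        cx, exp = cx * 10, exp - 1         # cy <= cx < 10 * cy
    while cx >= 10 * cy:
        cy, exp = cy * 10, exp + 1
    # Generate `precision` digits plus one guard digit and a sticky flag.
    q, sticky = restoring_digits(cx, cy, precision + 1)
    q, guard = divmod(q, 10)
    exp -= precision - 1
    if guard > 5 or (guard == 5 and (sticky or q % 2 == 1)):
        q += 1
        if q == 10 ** precision:           # rounding carried into a new digit
            q, exp = q // 10, exp + 1
    if guard == 0 and not sticky:          # exact: move toward preferred exponent
        while q % 10 == 0 and exp < preferred:
            q, exp = q // 10, exp + 1
    return sign, q, exp


print(dfp_divide((0, 1, 0), (0, 3, 0)))     # (0, 3333333333333333, -16)
print(dfp_divide((1, 835, -2), (0, 5, 0)))  # (1, 167, -2)
```

For exact quotients the sketch strips trailing zeros toward the preferred exponent `ex - ey`, which mirrors the cohort selection rule for exact results; inexact quotients keep all 16 digits.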
Evaluation Results and Comparison
                      DFP Divider [1]    DFP Divider [2]
Precision (digits)    16 (decimal64)     16 (decimal64)
Cycle time (ns)       0.57               1
# of cycles           150                20
Latency (ns)          85.5               20
1: Synthesized with an STM 90-nm standard cell library

DFP Transcendental Arithmetic

Contents
• Introduction
• Decimal Logarithmic Converter
• Decimal Antilogarithmic Converter
• Conclusions
• Future Work

32-bit DFP Logarithm
$X = (-1)^s \times 10^e \times \text{coefficient}$
$R = \log_{10}(X) = \log_{10}(10^e) + \log_{10}(\text{coefficient})$
The coefficient is a non-normalized decimal integer.
Example: $R = \log_{10}((-1)^0 \times 10^8 \times 0024589) = 8 + 5 + \log_{10}(0.2458900)$
To guarantee a 32-bit DFP result, a 14-digit fixed-point (FXP) logarithmic calculation is needed.

32-bit DFP Antilogarithm
$P = \mathrm{Antilog}_{10}(X) = 10^X$
Here: $\log_{10}(X_{\min}) \le X \le \log_{10}(X_{\max})$
For 32-bit DFP: $X \in [-101, 96.99999]$
$\mathrm{Antilog}_{10}(X) = 10^{X_{Int} + X_{Frac}} = 10^{X_{Int}} \times 10^{X_{Frac}}$
Example: $\mathrm{Antilog}_{10}((-1)^1 \times 1940467 \times 10^{-5}) = \mathrm{Antilog}_{10}(-19.40467) = 10^{-19} \times 10^{-0.4046700}$
To guarantee a 32-bit DFP result, an 8-digit FXP antilogarithm calculation is needed.

Digit-Recurrence Algorithm (Log)
The corresponding recurrences:
$E[j+1] = E[j]\,(1 + e_j 10^{-j})$
$L[j+1] = L[j] - \log_{10}(1 + e_j 10^{-j})$
Here: $E[1] = m$, $L[1] = 0$, $e_j \in \{-9, -8, -7, \ldots, 0, 1, \ldots, 7, 8, 9\}$
$e_j$ is selected so that $E[j+1]$ converges to 1.

Digit-Recurrence Algorithm (Antilog)
For any 7-digit fixed-point decimal input $m$: $10^{m} = e^{m \ln(10)} = e^{m'}$
The corresponding recurrences:
$L[j+1] = L[j] - \ln(1 + e_j 10^{-j})$
$E[j+1] = E[j]\,(1 + e_j 10^{-j})$
Here: $E[1] = 1$, $L[1] = m'$, $f_j = 1 + e_j 10^{-j}$, $e_j \in \{-9, -8, -7, \ldots, 0, 1, \ldots, 7, 8, 9\}$
$e_j$ is selected so that $L[j+1]$ converges to 0.

Selection By Rounding (cont.)
A scaled remainder is defined as:
Log: $W[j] = 10^j (1 - E[j])$
Antilog: $W[j] = 10^j L[j]$
$e_j$ is obtained by rounding $W[j]$: $e_j = \mathrm{round}(W[j])$
$e_1$ is obtained from a look-up table; $e_2 \ldots e_j$ are obtained with selection by rounding.

Architecture: Decimal Log Converter
[Figure: two-stage datapath for input m. Stage 1: Tab I (producing e1) and Mult1. Stage 2: multiples m, 2m, 3m, 5m, constant 1/ln(10), Tab II with log10 of 5, 2, 3, Mult2, Mult3, 9's complement units, shifters (×10^-j, ×10, ×100), 14-digit and 16-digit decimal CLA adders, rounding logic producing ej from W[j], Reg 1 to Reg 6, Mux 1 to Mux 9. The critical path is marked.]

Implementation Results
Logic utilization        Used     Available*    Utilization
# of occupied slices     2842     13696         21%
Maximum frequency        47.7 MHz
# of clock cycles        17
*: Xilinx Virtex2p XC2VP30 with package ff1157 and speed -7
Critical path detail (ns):
Reg2     Mux2     Mult 2    Shifter    Mux5     CLA      Round    Total
1.188    1.564    9.347     1.438      1.350    5.519    0.566    20.97

Architecture: Dec. Antilog Converter
[Figure: two-stage datapath for input X_frac. A constant multiplier by ln(10) produces m'. Stage 1: TAB I (producing e1). Stage 2: TABLE II, AddGen units, Mult, 9's complement units, shifters (×10^-j, ×10^(j+1)), 10-digit decimal CLA adders for L(j) and W[j], rounding logic producing ej, final rounding, Reg 1 to Reg 6, Mux 1 to Mux 5. The critical path is marked.]

Implementation Results
Logic utilization        Used     Available*    Utilization
# of occupied slices     2315     13696         17%
Maximum frequency        51.5 MHz
# of clock cycles        11
*: Xilinx Virtex2p XC2VP30 with package ff1157 and speed -7
Critical path detail (ns):
Reg6     Mult     Mux4     Shifter    CLA      Round    Total
1.599    7.839    1.539    1.100      6.794    0.545    19.42

Comparison (with Binary FXP Log and Exponential Converters)
• Similar dynamic range for the normalized coefficients: 10^7 is comparable to 2^23 and 2^24; 10^16 is comparable to 2^52 and 2^53.
• A binary reference is available that uses the same digit-recurrence algorithm with selection by rounding.
• Radix 10 is close to radix 8.

Comparison (cont.) (with Binary FXP Log and Exponential Converters)
                                 Radix-10 Decimal (1)           Radix-8 Binary [1]
                                 Log.          Exp.             Log.          Exp.
Precision (digits / bits)        7      16     7      16        24     53     24     53
Area (fa, note 2)                1630   2640   1370   2260      647    1829   627    1777
Cycle time (T, note 3)           17     19     16     18        7      8      7      8
# of cycles                      8      17     8      17        8      18     11     21
Latency (T, note 3)              136    323    128    306       56     144    77     168
1: Synthesized with a TSMC 0.18-um standard cell library
2: fa is the area of a 1-bit full adder
3: T is the delay of a 1-bit full adder

Conclusions
• Achieved 32-bit DFP accuracy for decimal log and antilog results.
• Implemented them on FPGA and ASIC.
• Compared them with binary converters.

Future Work
• 64-bit and 128-bit DFP logarithm and antilogarithm converters.
• The presented architecture can be optimized to achieve a faster speed or occupy a smaller area.

EE990, April 2009: Decimal Log and Antilog Converters

Summary
• IEEE 754-2008 defines a DFP standard that specifies
  – number representation in several precisions
  – correct DFP arithmetic operations
  – rounding modes
• Implementation of DFP adder, multiplier, divider, logarithmic and antilogarithmic converters
• Implementing and programming DFP are both really hard.
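The three items the summary attributes to the standard (number representation, arithmetic operations, rounding modes) can be tried out in software. The sketch below uses Python's standard `decimal` module as an assumed stand-in for a DFP implementation; the slides do not reference it. The 7-digit context reproduces the addition example from the DFP Addition slides, and the mapping from IEEE 754-2008 rounding-mode names to `decimal` constants is the usual correspondence, stated here as an illustration.

```python
from decimal import (Context, Decimal, Inexact, ROUND_CEILING, ROUND_DOWN,
                     ROUND_FLOOR, ROUND_HALF_EVEN, ROUND_HALF_UP)

# Number representation: members of one cohort compare equal but keep
# different exponents (quanta).
a, b = Decimal("0.900"), Decimal("0.9")
print(a == b)                                         # True
print(a.as_tuple().exponent, b.as_tuple().exponent)   # -3 -1
print(a.same_quantum(b))                              # False
print(b.quantize(Decimal("0.001")))                   # 0.900

# Arithmetic: the 7-digit addition example; the Inexact flag is raised
# because the digits .234 are rounded away.
ctx = Context(prec=7, rounding=ROUND_HALF_EVEN)
total = ctx.add(Decimal("99999E+1"), Decimal("16234E-3"))
print(total, bool(ctx.flags[Inexact]))                # 1000006 True

# Rounding modes: IEEE 754-2008 name -> decimal module constant.
MODES = {
    "roundTiesToEven": ROUND_HALF_EVEN,
    "roundTiesToAway": ROUND_HALF_UP,
    "roundTowardPositive": ROUND_CEILING,
    "roundTowardNegative": ROUND_FLOOR,
    "roundTowardZero": ROUND_DOWN,
}
half = Decimal("-2.5")
rounded = {name: str(half.quantize(Decimal("1"), rounding=mode))
           for name, mode in MODES.items()}
print(rounded)
```

Rounding -2.5 to an integer separates all five modes except the two pairs that coincide for a negative tie, which makes the table a convenient quick check of a rounding implementation.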