Floating-Point

Floating-point notation is a method for storing approximations of real numbers over large ranges. Computers use floating-point values primarily for scientific and engineering applications.

Converting numbers with “radix points” from various bases into decimal:

ex. (11011.0101)_2 = 1 * 2^4 + 1 * 2^3 + 1 * 2^1 + 1 * 2^0 + 1 * 2^-2 + 1 * 2^-4 = 27.3125

ex. (1B.5)_16 = 1 * 16^1 + 11 * 16^0 + 5 * 16^-1 = 16 + 11 + 0.3125 = 27.3125

Converting from decimal to other bases:

The integer part is converted by the normal method (repeated division by the radix of the new base; by 2 for binary).
The fractional part is multiplied by the radix of the new base. The integer portion of the result becomes the next digit in the fraction’s expansion, and the process is repeated with the fractional portion of the result replacing the old fraction.

ex. (4.25)_10

    4 = (100)_2

    Calculation    Integer    Fraction
    0.25 * 2       0          .5
    0.5 * 2        1          .0

    so 0.25 = (0.01)_2 and 4.25 = (100.01)_2

ex. 0.3 = (0.0100 1100 1100 ...)_2

    Calculation        Integer    Fraction
    0.3 * 2 = 0.6      0          .6
    0.6 * 2 = 1.2      1          .2
    0.2 * 2 = 0.4      0          .4
    0.4 * 2 = 0.8      0          .8
    0.8 * 2 = 1.6      1          .6
    0.6 * 2 = 1.2      1          .2
    0.2 * 2 = 0.4      0          .4
    0.4 * 2 = 0.8      0          .8
    0.8 * 2 = 1.6      1          .6
    0.6 * 2 = 1.2      1          .2
    . . .              . . .      . . .

    The fraction returns to .6, so the digits 1001 repeat forever and 0.3 has no exact binary representation.

Floating-Point Representation

    m × r^e

    where m = mantissa (0 ≤ m < 1)
          r = radix (base)
          e = exponent

ex. 0.1101 × 2^3

VAX allows 4 different formats for floating-point numbers, all of which use base 2. These formats are:
1. Single precision (F-Floating)
2. Double precision (D-Floating)
3. G-Floating
4. H-Floating
The most common one is F-Floating.

Normalized Floating-Point Numbers

A floating-point number is in normalized form if the first digit after the radix point is nonzero and the integer portion of the number is 0.

ex. 0.3454 × 10^2
    0.1011 × 2^3

VAX floating-point numbers are always stored in normalized form. Since the digit after the radix point is 1, 1/2 ≤ m < 1. (IEEE 754 normalizes differently, placing the leading 1 before the radix point.)

Why use normalized form?
1. The leading digit after the radix point is always 1, so it can be omitted from storage and assumed; hence we can express b + 1 significant bits with b bits.
2. Increased accuracy: no mantissa bits are spent on leading zeros.
Using normalized form also makes the representation unique.
ex. 0.1101 × 2^3 = 0.001101 × 2^5 are the same value, but only the first form is normalized.

Biased Exponent

Both the mantissa and the exponent are signed values (they can be negative or positive). This would mean that two sign bits are needed. As in the IEEE 754 standard, the VAX formats express the mantissa in sign-magnitude form and the exponent in biased form. This requires only one sign bit (for the mantissa).

The floating-point representation uses n bits for the exponent, so 2^n different values can be represented. In the biased exponent representation, one half of these 2^n values are used to represent negative exponents and the other half are used to represent non-negative exponents.

Instead of storing the actual exponent, the biased exponent is stored as follows:

    Biased exponent = true exponent + 2^(b - 1)

    where b is the number of bits used for the exponent field.

ex. In the VAX F-Floating data type there are 8 bits in the exponent field, so b = 8 and the bias is 128 (2^(8-1)); the stored exponent = true exponent + 128.

    Bit Pattern    Value in Decimal    True Exponent
    00000000       0                   special case
    00000001       1                   -127
    00000010       2                   -126
    . . .          . . .               . . .
    01111111       127                 -1
    10000000       128                 0
    10000001       129                 1
    . . .          . . .               . . .
    11111111       255                 127

The bit pattern is what the machine stores for the true exponent (true exponent + 128).

On the VAX the biased exponent 00000000 is used as follows:
- if the mantissa sign bit is 0, it represents 0.0
- if the mantissa sign bit is 1, it is not used

With 8 bits we represent true exponents from -127 to 127, which is similar to the range of the two’s complement representation. However, in two’s complement the bit patterns of negative numbers are larger (read as unsigned values) than those of positive numbers, whereas biased bit patterns sort in the same order as the true exponents.

To get the actual exponent from the biased exponent, the machine subtracts the bias (128 in the above example).

VAX Floating-Point Representations

1. Single Precision (F-Floating)

It requires 32 bits.

      15   14         7   6       0
    +----+-------------+-----------+
    | S  | b.exponent  |    m1     |   1st word
    +----+-------------+-----------+
    |              m2              |   2nd word
    +------------------------------+

    where S = sign bit (0 positive, 1 negative)
          m = .1m1m2
          b.exponent = biased exponent = true exponent + 128

so the number represented using F-Floating is .1m1m2 × 2^p, where p = b.exponent - 128.

ex.
    0000 0000 0000 0000    1    10000010    1010000
            m2             S     b.exp        m1

    b.exponent = (10000010)_2 = (130)_10, so p = 130 - 128 = 2 (true exponent)
    S = 1 (negative number)
    .1m1m2 × 2^p = -0.11010000 × 2^2 = -(11.01)_2 = (-3.25)_10

ex. What floating-point number is represented by (00004080)_16?

    0000 0000 0000 0000    0    10000001    0000000
            m2             S     b.exp        m1

    b.exponent = (10000001)_2 = (129)_10, so p = 129 - 128 = 1 (true exponent)
    S = 0 (positive)
    .1m1m2 × 2^p = 0.10000000 × 2^1 = (1.0)_10

ex. Show the floating-point representation (F type) of 5/8.

    5/8 = 0.625 = (0.101)_2 (already normalized, so p = 0)
    S = 0 (positive)
    m1 = 0100000
    m2 = 0000 0000 0000 0000
    b.exp = 0 + 128 = 128 = (1000 0000)_2

    0000 0000 0000 0000    0    10000000    0100000    = (00004020)_16
            m2             S     b.exp        m1

2. Double Precision (D-Floating)

It requires 64 bits.

      15   14         7   6       0
    +----+-------------+-----------+
    | S  | b.exponent  |    m1     |   1st word
    +----+-------------+-----------+
    |              m2              |
    +------------------------------+
    |              m3              |
    +------------------------------+
    |              m4              |
    +------------------------------+

    where S = sign bit (0 positive, 1 negative)
          m = .1m1m2m3m4
          b.exponent = biased exponent = true exponent + 128

so the number represented using D-Floating is .1m1m2m3m4 × 2^p, where p = b.exponent - 128.

The range of values represented by this type is the same as in F type; however, the number of significant digits is increased.

3. G-Floating

It requires 64 bits.

      15   14            4   3    0
    +----+----------------+--------+
    | S  |   b.exponent   |   m1   |   1st word
    +----+----------------+--------+
    |              m2              |
    +------------------------------+
    |              m3              |
    +------------------------------+
    |              m4              |
    +------------------------------+

    where S = sign bit (0 positive, 1 negative)
          m = .1m1m2m3m4
          b.exponent = biased exponent = true exponent + 2^10

so the number represented using G-Floating is .1m1m2m3m4 × 2^p, where p = b.exponent - 1024.

4. H-Floating

It requires 8 words (128 bits).

      15   14                     0
    +----+-------------------------+
    | S  |       b.exponent        |   1st word
    +----+-------------------------+
    |              m1              |
    +------------------------------+
    |            . . .             |
    +------------------------------+
    |              m7              |
    +------------------------------+

    where S = sign bit (0 positive, 1 negative)
          m = .1m1m2m3m4m5m6m7
          b.exponent = biased exponent = true exponent + 2^14

so the number represented using H-Floating is .1m1m2m3m4m5m6m7 × 2^p, where p = b.exponent - 2^14 (16384).

Allocating Memory for Floating-Point Values

Initialized memory

    .x_FLOATING list_of_floating_point_constants      where x = F, D, G, H

    or use .FLOAT (F type) and .DOUBLE (D type)

ex.
    X: .FLOAT 3.5
    Y: .DOUBLE 4.121

- The first directive allocates 4 bytes (F type) and initializes them to 3.5 represented in F-Floating type.
- The second directive allocates 8 bytes (D type) and initializes them to 4.121 represented in D-Floating type.

Uninitialized Storage

    .BLKF
    .BLKD
    .BLKG
    .BLKH

Each directive allocates as many bytes as the specified type requires, per item, and initializes them to zero.

ex.
    REAL: .BLKF 3

Allocates 3 items of F type (3 longwords) and initializes them to zero.

Floating-Point Operations

All the arithmetic instructions, special-purpose instructions, convert instructions, move instructions, compare instructions, and test instructions operate on floating-point types in the same way as before.

    ADDx2 op1, op2
    ADDx3 op1, op2, op3
    SUBx2 op1, op2
    SUBx3 op1, op2, op3
    MULx2 op1, op2
    MULx3 op1, op2, op3
    DIVx2 op1, op2
    DIVx3 op1, op2, op3
    CLRx op
    MOVx source, dest
    MOVAx source, dest
    MNEGx source, dest
    TSTx op
    CMPx op1, op2

    where x = F, D, G, H

Floating-Point Convert Instructions

    CVTxy source, dest

    x, y = F, D, G, H, B, W, L
    x ≠ y
    xy ≠ DG
    xy ≠ GD

Convert Round to Longword

    CVTRxL source, dest      x = F, D, G, H

Convert Truncate to Longword

    CVTxL source, dest       x = F, D, G, H
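
As a cross-check on the F-Floating worked examples earlier in these notes, the field layout (S, b.exponent, m1, m2) and the bias of 128 can be sketched in Python. This is only an illustration, not part of the VAX toolchain: the names decode_f_floating and encode_f_floating are hypothetical, the 32-bit value is assumed to be written as in the notes (m2 in the high half, the word holding S, b.exponent and m1 in the low half), and the encoder simply truncates extra mantissa bits, which is a simplification and not necessarily what the hardware does.

```python
from fractions import Fraction

BIAS = 128  # F-Floating exponent bias, as given in the notes


def decode_f_floating(longword):
    """Decode a 32-bit F-Floating pattern written as in the notes:
    low 16 bits = S | b.exponent | m1, high 16 bits = m2."""
    first = longword & 0xFFFF        # 1st word: S, b.exponent, m1
    m2 = (longword >> 16) & 0xFFFF   # 2nd word: low-order mantissa bits
    sign = (first >> 15) & 1         # bit 15
    b_exp = (first >> 7) & 0xFF      # bits 14..7
    m1 = first & 0x7F                # bits 6..0
    if b_exp == 0:
        if sign == 0:
            return Fraction(0)       # biased exponent 0, sign 0 means 0.0
        raise ValueError("biased exponent 0 with sign bit 1 is not used")
    # Mantissa is .1 m1 m2: the assumed leading 1, then 7 + 16 stored bits.
    mantissa = Fraction((1 << 23) | (m1 << 16) | m2, 1 << 24)
    value = mantissa * Fraction(2) ** (b_exp - BIAS)
    return -value if sign else value


def encode_f_floating(x):
    """Encode x as an F-Floating pattern (truncating, for illustration)."""
    x = Fraction(x)
    if x == 0:
        return 0
    sign = 1 if x < 0 else 0
    m, p = abs(x), 0
    while m >= 1:                    # normalize so that 1/2 <= m < 1
        m /= 2
        p += 1
    while m < Fraction(1, 2):
        m *= 2
        p -= 1
    b_exp = p + BIAS
    if not 1 <= b_exp <= 255:
        raise OverflowError("exponent out of range for F-Floating")
    bits = int(m * (1 << 24))        # 24 mantissa bits, extra bits dropped
    stored = bits & 0x7FFFFF         # drop the assumed leading 1
    m1, m2 = stored >> 16, stored & 0xFFFF
    first = (sign << 15) | (b_exp << 7) | m1
    return (m2 << 16) | first
```

With these definitions, decode_f_floating(0x00004080) gives 1 and encode_f_floating(Fraction(5, 8)) gives 0x00004020, matching the two worked examples in the notes. Fractions are used instead of floats so that the arithmetic in the sketch is exact.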