HP OpenVMS Systems: Ask the Wizard
The Question is:

This may seem simple, but when I run this BASIC code,

    1       DECLARE DOUBLE TMP.DBL
            DECLARE LONG TMP.LONG
            TMP.DBL=39.80*100.
            TMP.LONG=TMP.DBL
            PRINT TMP.LONG
    32767   END

I get the following output:

    ALPHA::ARS$ R TEST_TYPE_CAST
     3979
    ALPHA::ARS$

Why is this number 1 less than it should be? There was no subtraction involved. I came across this while I was doing a type cast between double and long in one of my payment processing programs. I ran this on two ALPHA systems so far and the same output came up for both. For some reason I could not get another number to do the same (e.g. 43.50) other than 39.80, any guesses? Thanks!

The Answer is:

VAX and Alpha systems, like just about every modern computer, represent real numbers in a binary floating-point format. Floating point refers to numbers being represented internally with the radix point adjusted so that the number's fraction is always between 0.5 and 1. This is similar in concept to 'scientific notation' of very large or small numbers being represented in a notation such as "6.02 * 10^23" where "10^23" represents an exponent indicating a large power of 10.

Internally, a floating point value is stored as a combination of three components:

  - A base-2 fraction of a certain number of digits
  - An exponent (in powers of 2)
  - A 1 bit sign

Some common floating point formats on VAX and Alpha are as follows (note that not all formats are 'native' to both architectures; the fraction counts include the hidden leading bit, which is not stored, and each format also carries one sign bit):

    Data Type        Bits   Fraction Bits   Exponent Bits
    ---------------  ----   -------------   -------------
    F Floating         32              24               8
    D Floating         64              56               8
    G Floating         64              53              11
    H Floating        128             113              15
    IEEE S Floating    32              24               8
    IEEE T Floating    64              53              11

For the F Floating format, the fraction is 24 binary digits (bits), and the exponent is 8 bits. The exponent is the power of 2 which, when multiplied by the fraction, gives the value.
In addition, things are manipulated so that the fraction's leftmost digit is always 1 - this is called "normalization" - and the exponent is adjusted accordingly. Since that bit is always 1, there is no need to store it, so it is assumed. So the fraction "f" is always in the range (0.5 <= f < 1). Note that the fraction is 24 bits long, but only 23 bits are stored. A sign bit is included as well, so there are 1 sign bit, 8 exponent bits and 23 fraction bits actually stored in memory for F Floating format.

The exponent for F Floating can range from -127 to +127, and is stored by adding 128 to the exponent value - this is called "biasing". A stored exponent of zero is reserved - if the sign is positive, then the value is zero, regardless of the fraction. If the sign is negative, this is called a "reserved operand", and generates an exception if it is used.

Let's take a simple real-world example - the number 1. Remembering that the fraction is between 0.5 and 1 (but less than 1), we have to represent this as a fraction of 0.5 and an exponent of 1 (0.5 times 2**1). 0.5 can be exactly expressed as a binary fraction, so there's no problem with this. The bits would work out this way:

    Sign:      0 (positive), goes in bit 15
    Exponent:  1, biased with 128 gives 129, bits 14:7
    Fraction:  0.5, or in binary, 0.100000000000000000000000
               bits 6:0 and 31:16 (23 actual bits plus hidden bit)

Putting all the bits together we get:

    3              111      00     0
    1              654      76     0
    ffffffffffffffffseeeeeeeefffffff
    00000000000000000100000010000000

or in hex:

       0   0   0   0   4   0   8   0

D Floating format is the same as F Floating except that it has another 32 fraction bits available (all zero in this case).

The first thing you can see is that since we only have 24 fraction bits, we are limited in the accuracy to which we can store values. 24 binary fraction digits translates roughly to 6 decimal digits, so if we have a value with more than 6 significant decimal digits, it's unlikely it can be represented accurately in F Floating.
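The packing steps for the number 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name encode_f_floating and the constant EXPONENT_BIAS are made up for this example, the longword is returned as a plain integer, and Python's round() stands in for the hardware's rounding rule.

```python
import math

EXPONENT_BIAS = 128  # F Floating stores the exponent plus 128

def encode_f_floating(value):
    """Pack a number into a 32-bit F Floating longword (hypothetical helper)."""
    if value == 0:
        return 0  # stored exponent 0 with a positive sign means zero
    sign = 1 if value < 0 else 0
    # frexp returns (fraction, exponent) with 0.5 <= fraction < 1
    fraction, exponent = math.frexp(abs(value))
    # Scale the fraction to 24 bits and round to the nearest integer.
    # Assumption: round() is a stand-in for the hardware rounding rule.
    mantissa = round(fraction * (1 << 24))
    if mantissa == 1 << 24:  # rounding carried out of the top bit
        mantissa >>= 1
        exponent += 1
    if not -127 <= exponent <= 127:
        raise OverflowError("exponent outside the F Floating range")
    stored = mantissa & 0x7FFFFF  # drop the hidden leading 1 bit
    # Bit 15 is the sign, bits 14:7 the biased exponent, bits 6:0 the
    # top seven stored fraction bits; bits 31:16 hold the rest.
    low_word = (sign << 15) | ((exponent + EXPONENT_BIAS) << 7) | (stored >> 16)
    high_word = stored & 0xFFFF
    return (high_word << 16) | low_word

print(f"{encode_f_floating(1.0):08X}")  # prints 00004080
```

math.frexp is convenient here because it uses the same convention as the text: a fraction in the range 0.5 <= f < 1 and a power-of-2 exponent.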
We'll choose the closest representation we can in 24 bits.

It is important to realize that "nice, clean" decimal fractions such as 0.1 and 0.05 don't translate to "nice, clean" binary fractions. In fact, they end up as repeating fractions, where you can keep adding bits forever and you'll never get it exactly right. The normalized binary fraction for 0.05 (that is, 0.8, which goes with an exponent of -4) looks like:

    0.110011001100110011001100110011001100... ad infinitum
                             ^ The 24th fraction bit is here

And since the next bit is 1, we'll round up, and thus the F Floating value will be slightly higher than 0.05. How "slightly"? Well, the F Floating value of CCCD3E4C turns out to be in decimal:

    0.05000000074505806

What would we have gotten if we didn't round, and left the 24th bit zero? The hex would be CCCC3E4C and in decimal:

    0.04999999701976776

which is about four times further away from 0.05 than the first value.

Now take this F Floating value and convert it to D Floating. This is done by tacking on 32 extra fraction bits of zero. But since the original F value is only correct to 24 bits, the D value isn't going to be any better. We'll end up with hex 00000000CCCD3E4C, which is exactly the same decimal value as above.

If we had started out by converting 0.05 to D Floating, adding 32 bits of precision, we'd STILL get a repeating fraction, but the rounding error would be much further out. In hex we'll get:

    CCCDCCCCCCCC3E4C

Certainly different than the F-converted-to-D value above. This is good to at least 16 decimal digits, but again isn't EXACTLY 0.05 but slightly higher. You could go to H Floating and get a whopping 113 fraction bits for about 33 decimal digits of accuracy, but you'd STILL not have exactly the right answer.

So when dealing with floating point, remember that you've only got an approximation of the value you want. Sometimes it's exactly right, when the fraction can be exactly expressed, but often it isn't, especially when dealing with decimal fractions.
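The rounding and widening arithmetic for 0.05 can be checked with exact rationals. The following Python sketch is an illustration only: the names TARGET and f_value are invented for this example, and it works on the fraction values directly rather than on real F or D Floating storage.

```python
from fractions import Fraction

TARGET = Fraction(1, 20)  # exactly 0.05, which is 0.8 * 2**-4

def f_value(mantissa, bits):
    """Exact value of a `bits`-bit fraction with an exponent of -4."""
    return Fraction(mantissa, 2**bits) / 16

scaled = Fraction(4, 5) * 2**24   # 0.8 scaled to 24 bits: 13421772.8
truncated = int(scaled)           # 0xCCCCCC, the 24th bit left at zero
rounded = truncated + 1           # 0xCCCCCD, rounded up

err_rounded = f_value(rounded, 24) - TARGET
err_truncated = TARGET - f_value(truncated, 24)
print(float(f_value(rounded, 24)), float(f_value(truncated, 24)))
print(err_truncated / err_rounded)  # prints 4

# Widening F to D appends 32 zero bits, so the value does not change.
widened = f_value(rounded << 32, 56)
# Converting 0.05 directly to a 56-bit fraction rounds much further out.
direct = f_value(round(Fraction(4, 5) * 2**56), 56)
print(widened == f_value(rounded, 24))               # prints True
print(abs(direct - TARGET) < abs(widened - TARGET))  # prints True
```

Working in Fraction keeps every intermediate value exact, so the only errors shown are the ones introduced by cutting the fraction off at 24 or 56 bits.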
And the other thing to remember is that simply converting a value from single-precision to double-precision doesn't magically conjure up those fraction bits that got chopped off in the first place. Choose your initial precision wisely, and don't necessarily believe that those last decimal digits you print out are meaningful.

When arithmetic is performed on these approximations of decimal values, the error is compounded and propagated to the final result. So tiny differences in conversion can result in much larger errors later on. Obviously, multiplication or division can magnify the differences even further.

Because in many cases an exact decimal number (for example, 0.05 as shown previously) does not accurately convert to a binary number, it is important to remember that these numbers are approximate when stored in binary floating point format. This accounts for the common advice that financial data and calculations representing dollars and cents should not use floating point numbers.

In case you are tempted to "check" your binary computer's floating point arithmetic operations by using your "pocket" calculator, be aware that the results rarely agree. Calculators almost always use BCD (Binary Coded Decimal) numeric representation, so their results tend to be more nearly "exact". Unfortunately, BCD calculations tend to be quite slow, which is why computers tend to use floating point arithmetic natively. Software is usually used to handle BCD operations (though the VAX architecture does describe optional support for Decimal-string instructions).

The OpenVMS Wizard would also encourage you to review the article "What Every Computer Scientist Should Know About Floating-Point Arithmetic" by David Goldberg of the Xerox Palo Alto Research Center (available on the internet in several places including http://docs.sun.com/source/806-3568/ncg_goldberg.html).

All that nonsense out of the way, this program appears to run just as would be expected.
The test system was OpenVMS Alpha V7.3-1 (all patches installed), with BASIC V1.4-000.

Never, never, never use floating point format for financial data, as floating point is, will be, and always has been an approximation. Most accountants will prefer integer values for monetary data, whether stored in a longword or a quadword.

    $ type x.bas
    1       DECLARE DOUBLE TMP.DBL
            DECLARE LONG TMP.LONG
            TMP.DBL=39.80*100.
            TMP.LONG=TMP.DBL
            PRINT TMP.LONG
    32767   END
    $ basic x
    $ link x
    $ run x
     3980
    $

And the same on OpenVMS VAX V7.3 (all patches installed), BASIC V3.9-000.

    $ basic

    VAX BASIC V3.9-000
    Ready

    1       DECLARE DOUBLE TMP.DBL
            DECLARE LONG TMP.LONG
            TMP.DBL=39.80*100.
            TMP.LONG=TMP.DBL
            PRINT TMP.LONG
    32767   END
    run
    NONAME  19-JUN-2003 19:04
     3980

    Ready

You could also resolve the current and undesired (but entirely correct and valid) result with the following change to the code:

    TMP.DBL='39.80'D*100.

Put another way, you cannot represent 39.8 exactly in a binary floating point value, and assigning a floating point value to an integer truncates it rather than rounding it, so a product that lands just below 3980 becomes 3979.

You will also want to read the following information in the BASIC HELP library:

    $ HELP/LIBRARY=BASICHELP CONSTANTS Literal_notation
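The truncation effect can be reproduced outside BASIC. The Python sketch below shows one way a result of 3979 can arise, under assumptions the source does not confirm: that the constant 39.80 is first held in single precision and the multiplication is then done in double precision. IEEE single precision stands in for F Floating here, since both keep a 24-bit fraction, and the helper name to_single is made up for this example.

```python
import struct
from decimal import Decimal

def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

single = to_single(39.80)   # nearest single to 39.8, slightly below it
product = single * 100.0    # multiplied in double precision
print(product)              # just under 3980
print(int(product))         # prints 3979 (conversion truncates)
print(round(product))       # prints 3980 (rounding instead)

# Exact decimal arithmetic never leaves the decimal value in the first place.
print(int(Decimal("39.80") * 100))  # prints 3980
```

The Decimal line reflects the advice above about monetary data: keep such values in an exact decimal or integer form (for example, whole cents) rather than in binary floating point.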