HP OpenVMS Systems: Ask the Wizard
The Question is:
This may seem simple but when I run this BASIC code,
1 DECLARE DOUBLE TMP.DBL
DECLARE LONG TMP.LONG
TMP.DBL=39.80*100.
TMP.LONG=TMP.DBL
PRINT TMP.LONG
32767 END
I get the following output:
ALPHA::ARS$ R TEST_TYPE_CAST
3979
ALPHA::ARS$
Why is this number 1 less than it should be? There was no subtraction
involved. I came across this while I was doing a type cast between double
and long in one of my payment processing programs. I ran this on two ALPHA
systems so far and the same output came up for both. For some reason I could
not get another number to do the same (e.g. 43.50) other than 39.80, any
guesses?
Thanks!
The Answer is :
VAX and Alpha systems, like just about every modern computer, represent
real numbers in a binary floating-point format. Floating point refers
to numbers being represented internally with the radix point adjusted
so that the number's fraction is always at least 0.5 and less than 1. This is
similar in concept to 'scientific notation' of very large or small
numbers being represented in a notation such as "6.02 * 10^23" where
"10^23" represents an exponent indicating a large power of 10.
Internally, a floating point value is stored as a combination of three
components:
- A base-2 fraction of a certain number of digits
- An exponent (in powers of 2)
- A 1 bit sign
Some common floating point formats on VAX and Alpha are as follows
(note that not all formats are 'native' to both architectures):
Data Type        Bits  Fraction Bits  Exponent Bits
---------------  ----  -------------  -------------
F Floating         32       24              8
D Floating         64       56              8
G Floating         64       53             11
H Floating        128      113             15
IEEE S Floating    32       24              8
IEEE T Floating    64       53             11
(Fraction bit counts include the assumed "hidden" leading bit, which is
not stored.)
For the F Floating format, the fraction is 24 binary digits (bits),
and the exponent is 8 bits. The exponent is the power of 2 which,
when multiplied by the fraction, gives the value. In addition, things
are manipulated so that the fraction's leftmost digit is always 1 -
this is called "normalization" - and the exponent adjusted
accordingly. Since that bit is always 1, there is no need to store
it, so it is assumed. So the fraction "f" is always in the range (0.5
<= f < 1). Note that the fraction is 24 bits long, but only 23 bits
are stored. A sign bit is included as well, so there is 1 sign bit, 8
exponent and 23 fraction bits actually stored in memory for F Floating
format.
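As a rough illustration of normalization, here is a minimal Python sketch
(Python rather than BASIC, and using the host's IEEE doubles rather than
VAX formats). The standard math.frexp call happens to use the same
convention described above, returning a fraction f with 0.5 <= f < 1 and a
power-of-2 exponent. The helper name normalize is made up for this example.

```python
import math

def normalize(x: float) -> tuple[float, int]:
    """Split x into (fraction, exponent) so that x == fraction * 2**exponent.

    For nonzero x the fraction returned by math.frexp satisfies
    0.5 <= abs(fraction) < 1, matching the normalization described above.
    """
    return math.frexp(x)

for value in (1.0, 39.8, 0.05):
    fraction, exponent = normalize(value)
    print(f"{value!r} = {fraction!r} * 2**{exponent}")
```

Only the split into fraction and exponent is shown here; rounding the
fraction to 24 bits and packing the fields is a separate step.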
The exponent for F Floating can range from -127 to +127, and is stored
by adding 128 to the exponent value - this is called "biasing". A
stored exponent of zero is reserved - if the sign is positive, then
the value is zero, regardless of the fraction. If the sign is
negative, this is called a "reserved operand", and generates an
exception if it is used.
Let's take a simple real-world example - the number 1. Remembering
that the fraction is between 0.5 and 1 (but less than 1), we have to
represent this as a fraction of 0.5 and an exponent of 1 (0.5 times
2**1). 0.5 can be exactly expressed as a binary fraction, so there's
no problem with this. The bits would work out this way:
Sign:      0 (positive), goes in bit 15
Exponent:  1, biased with 128 gives 129, bits 14:7
Fraction:  0.5, or in binary, 0.100000000000000000000000
           bits 6:0 and 31:16 (23 actual bits plus hidden bit)
Putting all the bits together we get:
3              111      00     0
1              654      76     0
ffffffffffffffffseeeeeeeefffffff
00000000000000000100000010000000
or in hex:
00004080
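The packing steps above can be sketched in Python (not BASIC). This is
only an illustration under stated assumptions: the function name
encode_f_floating is invented here, the input is a host IEEE double, and
Python's round() (ties to even) stands in for whatever rounding a real VAX
or Alpha conversion performs. The field positions are the ones listed
above: sign in bit 15, biased exponent in bits 14:7, fraction in bits 6:0
and 31:16.

```python
import math

def encode_f_floating(x: float) -> int:
    """Return the 32-bit F Floating longword for x (illustrative sketch)."""
    if x == 0.0:
        return 0  # stored exponent 0 with sign 0 means zero
    sign = 1 if x < 0 else 0
    fraction, exponent = math.frexp(abs(x))   # 0.5 <= fraction < 1
    scaled = round(fraction * (1 << 24))      # keep 24 fraction bits
    if scaled == 1 << 24:                     # rounding carried out of the top bit
        scaled >>= 1
        exponent += 1
    if not -127 <= exponent <= 127:
        raise ValueError("exponent out of range for F Floating")
    stored_fraction = scaled - (1 << 23)      # drop the hidden leading 1
    biased = exponent + 128                   # excess-128 biasing
    low_word = (sign << 15) | (biased << 7) | (stored_fraction >> 16)
    high_word = stored_fraction & 0xFFFF
    return (high_word << 16) | low_word

print(f"{encode_f_floating(1.0):08X}")  # 00004080, as worked out above
```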
D Floating format is the same as F Floating except that it has another
32 fraction bits available (all zero in this case).
The first thing you can see is that since we only have 24 fraction
bits, we are limited in the accuracy to which we can store values. 24
binary fraction digits translates roughly to 6 decimal digits, so if
we have a value with more than 6 significant decimal digits, it's
unlikely it can be represented accurately in F Floating. We'll choose
the closest representation we can in 24 bits.
It is important to realize that "nice, clean" decimal fractions such
as 0.1 and 0.05 don't translate to "nice, clean" binary fractions. In
fact, they end up as repeating fractions, where you can keep adding
bits forever and you'll never get it exactly right. The normalized
binary fraction for .05 (to be multiplied by 2**-4) looks like:
        0.110011001100110011001100110011001100... ad infinitum
                                 ^
    The 24th fraction bit is here|
And since the next bit is 1, we'll round up, and thus the F Floating
value will be slightly higher than .05. How "slightly"? Well, the F
Floating value of CCCD3E4C turns out to be in decimal:
0.05000000074505806
What would we have gotten if we didn't round, and left the 24th bit
zero? The hex would be CCCC3E4C and in decimal:
0.04999999701976776
which is much further away from .05 than the first value.
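To check the two hex values, the bit fields can be unpacked again. The
following Python sketch (Python rather than BASIC; decode_f_floating is a
name invented for this example) reverses the F Floating layout described
earlier and compares how far each candidate lies from .05. It assumes the
longword is written the way the hex values in this answer are written,
with bits 31:16 first.

```python
import math

def decode_f_floating(longword: int) -> float:
    """Convert a 32-bit F Floating longword to a Python float (sketch)."""
    low_word = longword & 0xFFFF            # sign, exponent, high fraction bits
    high_word = (longword >> 16) & 0xFFFF   # low-order fraction bits
    sign = low_word >> 15
    biased = (low_word >> 7) & 0xFF
    if biased == 0:
        if sign:
            raise ValueError("reserved operand")
        return 0.0
    # Restore the hidden leading 1 to get the full 24-bit fraction.
    fraction_bits = (1 << 23) | ((low_word & 0x7F) << 16) | high_word
    magnitude = math.ldexp(fraction_bits, biased - 128 - 24)
    return -magnitude if sign else magnitude

rounded = decode_f_floating(0xCCCD3E4C)
truncated = decode_f_floating(0xCCCC3E4C)
print(rounded, abs(rounded - 0.05))      # rounded up: the smaller error
print(truncated, abs(truncated - 0.05))  # 24th bit left zero: the larger error
```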
Now take this F Floating value and convert it to D Floating. This is
done by tacking on 32 extra fraction bits of zero. But since the
original F value is only correct to 24 bits, the D value isn't going
to be any better. We'll end up with hex 00000000CCCD3E4C which is
exactly the same decimal value as above.
If we had started out by converting 0.05 to D Floating, adding 32 bits
of precision, we'd STILL get a repeating fraction, but the rounding
error would be much further out. In hex we'll get:
CCCDCCCCCCCC3E4C
Certainly different than the F-converted-to-D value above. This is
good to at least 16 decimal digits, but again isn't EXACTLY .05 but
slightly higher. You could go to H Floating and get a whopping 113
fraction bits for about 33 decimal digits of accuracy, but you'd STILL
not have exactly the right answer.
So when dealing with floating point, remember that you've only got an
approximation of the value you want. Sometimes it's exactly right,
when the fraction can be exactly expressed, but often it isn't,
especially when dealing with decimal fractions.
And the other thing to remember is that simply converting a value from
single-precision to double-precision doesn't magically conjure up
those fraction bits that got chopped off in the first place. Choose
your initial precision wisely, and don't necessarily believe that
those last decimal digits you print out are meaningful.
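The same effect is easy to see on any machine with IEEE arithmetic. This
Python sketch is an analogy, not VAX code: it uses the standard struct
module to round a value to IEEE single precision (24 fraction bits, like F
Floating) and then widens it back to double. The helper name to_single is
made up for this example.

```python
import struct

def to_single(x: float) -> float:
    """Round x to IEEE single precision, then widen it back to double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

widened = to_single(0.05)         # single-precision .05, widened to double
print(widened == 0.05)            # False: the lost bits do not come back
print(abs(widened - 0.05))        # error left over from the 24-bit rounding
```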
When arithmetic is performed on these approximations of decimal
values, the error is compounded and propagated to the final result. So
tiny differences in conversion can result in much larger errors later
on. Obviously, multiplication or division can magnify the differences
even further.
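A small Python sketch of this accumulation, using the host's IEEE double
precision rather than any VAX format: adding the approximation of 0.1 ten
times does not land exactly on 1.

```python
total = 0.0
for _ in range(10):
    total += 0.1                  # each 0.1 is already an approximation

print(total == 1.0)               # False
print(abs(total - 1.0))           # small, but not zero
```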
Because in many cases an exact decimal number (for example, .05 as
shown previously) does not accurately convert to a binary number, it
is important to remember that these numbers are approximate when
stored in binary floating point format. This accounts for the common
advice that financial data and calculations representing dollars and
cents should not use floating point numbers.
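One common alternative is to hold money as an integer count of cents. The
Python sketch below is only an illustration of that idea, not a
recommendation of a particular library or of anything specific to OpenVMS.
The helper to_cents is hypothetical, and it parses the amount from text
with the standard decimal module so that no binary fraction is ever
involved.

```python
from decimal import Decimal

def to_cents(amount: str) -> int:
    """Parse a decimal amount such as "39.80" into an exact count of cents."""
    cents = Decimal(amount) * 100
    if cents != cents.to_integral_value():
        raise ValueError(f"{amount!r} has a fraction of a cent")
    return int(cents)

price = to_cents("39.80")                      # 3980, exactly
total = price * 3                              # integer arithmetic stays exact
print(price, total)
print(f"{total // 100}.{total % 100:02d}")     # back to dollars and cents
```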
In case you are tempted to "check" your binary computer's floating
point arithmetic operations by using your "pocket" calculator, be
aware that the results rarely agree. Calculators almost always use
BCD (Binary Coded Decimal) numeric representation, so their results
tend to be more nearly "exact". Unfortunately, BCD calculations tend
to be quite slow, which is why computers tend to use binary floating
point arithmetic natively. Software is usually used to handle BCD
operations (though the VAX architecture does describe optional support
for Decimal-string instructions).
The OpenVMS Wizard would also encourage you to review the article
"What Every Computer Scientist Should Know About Floating-Point
Arithmetic" by David Goldberg of the Xerox Palo Alto Research Center
(available on the internet in several places including
http://docs.sun.com/source/806-3568/ncg_goldberg.html).
All that nonsense out of the way, this program appears to run just as
would be expected. OpenVMS Alpha V7.3-1 (all patches installed), with
BASIC V1.4-000. Never, never, never use floating point format for
financial data, as floating point is, will be, and always has been
an approximation -- most accountants will prefer integer values for
monetary data, whether stored in a longword or a quadword.
$ type x.bas
1 DECLARE DOUBLE TMP.DBL
DECLARE LONG TMP.LONG
TMP.DBL=39.80*100.
TMP.LONG=TMP.DBL
PRINT TMP.LONG
32767 END
$ basic x
$ link x
$ run x
3980
$
And the same on OpenVMS VAX V7.3 (all patches installed), BASIC
V3.9-000.
$ basic
VAX BASIC V3.9-000
Ready
1 DECLARE DOUBLE TMP.DBL
DECLARE LONG TMP.LONG
TMP.DBL=39.80*100.
TMP.LONG=TMP.DBL
PRINT TMP.LONG
32767 END
run
NONAME 19-JUN-2003 19:04
3980
Ready
You could also resolve the current and undesired (but entirely correct
and valid) result with the following change to the code:
TMP.DBL='39.80'D*100.
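Put another way, you cannot represent 39.8 exactly in a binary floating
point value. If the stored approximation falls slightly below 39.8, as it
does when the constant is held in single precision, the product with 100.
falls slightly below 3980, and assigning it to an integer will truncate
it to 3979.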
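The reported 3979 can be reproduced on IEEE hardware with the Python
sketch below. It rests on an assumption: that the constant 39.80 was held
in single precision before the multiplication. IEEE single stands in here
for F Floating, since both keep 24 fraction bits. The helper to_single is
made up for this example.

```python
import struct

def to_single(x: float) -> float:
    """Round x to IEEE single precision, then widen it back to double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

price = to_single(39.80)      # assumed: constant held in single precision
product = price * 100.0       # multiplication carried out in double
print(product)                # slightly below 3980
print(int(product))           # truncation gives 3979
print(round(product))         # rounding to nearest gives 3980
```

Rounding explicitly before the integer assignment, or keeping the amount
as an integer from the start, avoids depending on which side of 3980 the
approximation happens to fall.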
You will also want to read the following information in the BASIC
HELP library:
$ HELP/LIBRARY=BASICHELP CONSTANTS Literal_notation