Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
How do you explain floating point inaccuracy to fresh programmers and laymen who still think computers are infinitely wise and accurate?
Do you have a favourite example or anecdote which seems to get the idea across much better than an precise, but dry, explanation?
How is this taught in Computer Science classes?
There are basically two major pitfalls people stumble in with floating-point numbers.
The problem of scale. Each FP number has an exponent which determines the overall “scale” of the number so you can represent either really small values or really larges ones, though the number of digits you can devote for that is limited. Adding two numbers of different scale will sometimes result in the smaller one being “eaten” since there is no way to fit it into the larger scale.
PS> $a = 1; $b = 0.0000000000000000000000001
PS> Write-Host a=$a b=$b
a=1 b=1E-25
PS> $a + $b
1
As an analogy for this case you could picture a large swimming pool and a teaspoon of water. Both are of very different sizes, but individually you can easily grasp how much they roughly are. Pouring the teaspoon into the swimming pool, however, will leave you still with roughly a swimming pool full of water.
(If the people learning this have trouble with exponential notation, one can also use the values 1 and 100000000000000000000 or so.)
Then there is the problem of binary vs. decimal representation. A number like 0.1 can't be represented exactly with a limited amount of binary digits. Some languages mask this, though:
PS> "{0:N50}" -f 0.1
0.10000000000000000000000000000000000000000000000000
But you can “amplify” the representation error by repeatedly adding the numbers together:
PS> $sum = 0; for ($i = 0; $i -lt 100; $i++) { $sum += 0.1 }; $sum
9,99999999999998
I can't think of a nice analogy to properly explain this, though. It's basically the same problem why you can represent 1/3 only approximately in decimal because to get the exact value you need to repeat the 3 indefinitely at the end of the decimal fraction.
Similarly, binary fractions are good for representing halves, quarters, eighths, etc. but things like a tenth will yield an infinitely repeating stream of binary digits.
Then there is another problem, though most people don't stumble into that, unless they're doing huge amounts of numerical stuff. But then, those already know about the problem. Since many floating-point numbers are merely approximations of the exact value this means that for a given approximation f of a real number r there can be infinitely many more real numbers r1, r2, ... which map to exactly the same approximation. Those numbers lie in a certain interval. Let's say that rmin is the minimum possible value of r that results in f and rmax the maximum possible value of r for which this holds, then you got an interval [rmin, rmax] where any number in that interval can be your actual number r.
Now, if you perform calculations on that number—adding, subtracting, multiplying, etc.—you lose precision. Every number is just an approximation, therefore you're actually performing calculations with intervals. The result is an interval too and the approximation error only ever gets larger, thereby widening the interval. You may get back a single number from that calculation. But that's merely one number from the interval of possible results, taking into account precision of your original operands and the precision loss due to the calculation.
That sort of thing is called Interval arithmetic and at least for me it was part of our math course at the university.
Show them that the base-10 system suffers from exactly the same problem.
Try to represent 1/3 as a decimal representation in base 10. You won't be able to do it exactly.
So if you write "0.3333", you will have a reasonably exact representation for many use cases.
But if you move that back to a fraction, you will get "3333/10000", which is not the same as "1/3".
Other fractions, such as 1/2 can easily be represented by a finite decimal representation in base-10: "0.5"
Now base-2 and base-10 suffer from essentially the same problem: both have some numbers that they can't represent exactly.
While base-10 has no problem representing 1/10 as "0.1" in base-2 you'd need an infinite representation starting with "0.000110011..".
How's this for an explantation to the layman. One way computers represent numbers is by counting discrete units. These are digital computers. For whole numbers, those without a fractional part, modern digital computers count powers of two: 1, 2, 4, 8. ,,, Place value, binary digits, blah , blah, blah. For fractions, digital computers count inverse powers of two: 1/2, 1/4, 1/8, ... The problem is that many numbers can't be represented by a sum of a finite number of those inverse powers. Using more place values (more bits) will increase the precision of the representation of those 'problem' numbers, but never get it exactly because it only has a limited number of bits. Some numbers can't be represented with an infinite number of bits.
Snooze...
OK, you want to measure the volume of water in a container, and you only have 3 measuring cups: full cup, half cup, and quarter cup. After counting the last full cup, let's say there is one third of a cup remaining. Yet you can't measure that because it doesn't exactly fill any combination of available cups. It doesn't fill the half cup, and the overflow from the quarter cup is too small to fill anything. So you have an error - the difference between 1/3 and 1/4. This error is compounded when you combine it with errors from other measurements.
In python:
>>> 1.0 / 10
0.10000000000000001
Explain how some fractions cannot be represented precisely in binary. Just like some fractions (like 1/3) cannot be represented precisely in base 10.
Another example, in C
printf (" %.20f \n", 3.6);
incredibly gives
3.60000000000000008882
Here is my simple understanding.
Problem:
The value 0.45 cannot be accurately be represented by a float and is rounded up to 0.450000018. Why is that?
Answer:
An int value of 45 is represented by the binary value 101101.
In order to make the value 0.45 it would be accurate if it you could take 45 x 10^-2 (= 45 / 10^2.)
But that’s impossible because you must use the base 2 instead of 10.
So the closest to 10^2 = 100 would be 128 = 2^7. The total number of bits you need is 9 : 6 for the value 45 (101101) + 3 bits for the value 7 (111).
Then the value 45 x 2^-7 = 0.3515625. Now you have a serious inaccuracy problem. 0.3515625 is not nearly close to 0.45.
How do we improve this inaccuracy? Well we could change the value 45 and 7 to something else.
How about 460 x 2^-10 = 0.44921875. You are now using 9 bits for 460 and 4 bits for 10. Then it’s a bit closer but still not that close. However if your initial desired value was 0.44921875 then you would get an exact match with no approximation.
So the formula for your value would be X = A x 2^B. Where A and B are integer values positive or negative.
Obviously the higher the numbers can be the higher would your accuracy become however as you know the number of bits to represent the values A and B are limited. For float you have a total number of 32. Double has 64 and Decimal has 128.
A cute piece of numerical weirdness may be observed if one converts 9999999.4999999999 to a float and back to a double. The result is reported as 10000000, even though that value is obviously closer to 9999999, and even though 9999999.499999999 correctly rounds to 9999999.
Related
Why does the following program print what it prints?
class Program
{
static void Main(string[] args)
{
float f1 = 0.09f*100f;
float f2 = 0.09f*99.999999f;
Console.WriteLine(f1 > f2);
}
}
Output is
false
Floating point only has so many digits of precision. If you're seeing f1 == f2, it is because any difference requires more precision than a 32-bit float can represent.
I recommend reading What Every Computer Scientist Should Read About Floating Point
The main thing is this isn't just .Net: it's a limitation of the underlying system most every language will use to represent a float in memory. The precision only goes so far.
You can also have some fun with relatively simple numbers, when you take into account that it's not even base ten. 0.1 (1/10th), for example, is a repeating decimal when represented in binary, just as 1/3rd is when represented in decimal.
In this particular case, it’s because .09 and .999999 cannot be represented with exact precision in binary (similarly, 1/3 cannot be represented with exact precision in decimal). For example, 0.111111111111111111101111 base 2 is 0.999998986721038818359375 base 10. Adding 1 to the previous binary value, 0.11111111111111111111 base 2 is 0.99999904632568359375 base 10. There isn’t a binary value for exactly 0.999999. Floating point precision is also limited by the space allocated for storing the exponent and the fractional part of the mantissa. Also, like integer types, floating point can overflow its range, although its range is larger than integer ranges.
Running this bit of C++ code in the Xcode debugger,
float myFloat = 0.1;
shows that myFloat gets the value 0.100000001. It is off by 0.000000001. Not a lot, but if the computation has several arithmetic operations, the imprecision can be compounded.
imho a very good explanation of floating point is in Chapter 14 of Introduction to Computer Organization with x86-64 Assembly Language & GNU/Linux by Bob Plantz of California State University at Sonoma (retired) http://bob.cs.sonoma.edu/getting_book.html. The following is based on that chapter.
Floating point is like scientific notation, where a value is stored as a mixed number greater than or equal to 1.0 and less than 2.0 (the mantissa), times another number to some power (the exponent). Floating point uses base 2 rather than base 10, but in the simple model Plantz gives, he uses base 10 for clarity’s sake. Imagine a system where two positions of storage are used for the mantissa, one position is used for the sign of the exponent* (0 representing + and 1 representing -), and one position is used for the exponent. Now add 0.93 and 0.91. The answer is 1.8, not 1.84.
9311 represents 0.93, or 9.3 times 10 to the -1.
9111 represents 0.91, or 9.1 times 10 to the -1.
The exact answer is 1.84, or 1.84 times 10 to the 0, which would be 18400 if we had 5 positions, but, having only four positions, the answer is 1800, or 1.8 times 10 to the zero, or 1.8. Of course, floating point data types can use more than four positions of storage, but the number of positions is still limited.
Not only is precision limited by space, but “an exact representation of fractional values in binary is limited to sums of inverse powers of two.” (Plantz, op. cit.).
0.11100110 (binary) = 0.89843750 (decimal)
0.11100111 (binary) = 0.90234375 (decimal)
There is no exact representation of 0.9 decimal in binary. Even carrying the fraction out more places doesn’t work, as you get into repeating 1100 forever on the right.
Beginning programmers often see floating point arithmetic as more
accurate than integer. It is true that even adding two very large
integers can cause overflow. Multiplication makes it even more likely
that the result will be very large and, thus, overflow. And when used
with two integers, the / operator in C/C++ causes the fractional part
to be lost. However, ... floating point representations have their own
set of inaccuracies. (Plantz, op. cit.)
*In floating point, both the sign of the number and the sign of the exponent are represented.
When I do the following double multiplication in C# 100.0 * 1.005 I get 100,49999999999999 as a result. I believe this is because the exact number (or some intermedia result when evaluting the expression) cannot be represented. When I do the same computation in calc.exe I get 100.5 as expected.
Another example is the ninefold incrementation of 0.001 (that is the first time a deviation occurs) so basically 9d * 0.001d = 0,0090000000000000011. When I do the same computation in calc.exe I get 0.009 as expected.
Now one can argue, that I should choose decimal instead. But with decimal I get the problem with other computations for example with ((1M / 3M) * 3M) = 0,9999999999999999999999999999 while calc.exe says 1.
With calc.exe I can divide 1 by 3 several times until some real small number and then multiply with 3 again as many several times and then I reach exacty 1. I therefore suspect, that calc.exe computes internally with fractions, but obviously with real big ones, because it computes
(677605234775492641 / 116759166847407000) + (932737194383944703 / 2451942503795547000)
where the common denominator is -3422539506717149376 (an overflow occured) when doing a long computation, so it must be at least ulong. Does anybody know how computation in calc.exe is implemented? Is this implementation made somewhere public for reuse?
As described here, calc uses an arbitrary-precision engine for its calculations, while double is standard IEEE-754 arithmetic, and decimal is also floating-point arithmetic, just in decimal, which, as you point out, has the same problems, just in another base.
You can try finding such an arbitrary-precision arithmetic library for C# and use it, e.g. this one (no idea whether it's good; was the first result). The one inside calc is not available as an API, so you cannot use it.
Another point is that when you round the result to a certain number of places (less than 15), you'd also get the intuitively "correct" result in a lot of cases. C# already does some rounding to hide the exact value of a double from you (where 0.3 is definitely not exactly 0.3, but probably something like 0.30000000000000004). By reducing the number of digits you display you lessen the occurrence of such very small differences from the correct value.
I have a class that does some length calculations based on a height on a ticket. It's been in place for years and working quite well... Until we got a unique ticket size.
They are entered by sales people in inches and are normally nice numbers like 3, 4 or 3.5 and store in a database - This one is however 3.66666 recurring (or 11/3) But it is being entered as 3.666 and causing the calculation to fail due to lost precision.
I have thought of a bit of a hack to restore precision for certain numbers, but thought maybe someone knows of a better way of getting a 3.666 or a 93.1333 back to it's number + two thirds status?
Thanks,
Mick.
As you explained in comments I see your point now. I've checked the numbers:
168000 / 3.666 = 45826.5139
168000 / 3.666666 = 45818.1901488
168000 * 3 / 11 = 45818.1818182
It makes a difference of 8 tickets. I have a feeling that your issue can be solved in many ways. On the side of user input for example. Or on the side of database. But back to your question:
How do I convert 3.666 or a 93.1333 back to it's number + two thirds
status?
You are looking for converting decimal (or double) to fraction.
There is already a question on SO: Algorithm for simplifying decimal to fractions which has many answeres. I've tested some of them, and none of them were satisfying. Some of them don't even hanlde recurrence. Perhaps I've missed the correct one, you can look by yourself.
Anyway, I believe you don't need to fully implement a conversion from 1.666 to 3/2, since it's not easy and you have a real-world sizes. You've said, that most of the time numbers are aroung 3, 3.5, 4 etc. So I suggest you to take a look at a question I've linked above and search for an algorythm of detecting the recurrence number. It was also discussed here How to know the repeating decimal in a fraction?
After what just convert 1.666 to 1.666666, since 1/1000000 of inch won't mess your calculations, as numbers above show.
It would be difficult to get the accurate value of double as double is floating point.
The MSDN says:
Remember that a floating-point number
can only approximate a decimal number,
and that the precision of a
floating-point number determines how
accurately that number approximates a
decimal number. By default, a Double
value contains 15 decimal digits of
precision, although a maximum of 17
digits is maintained internally. The
precision of a floating-point number
has several consequences:
Two floating-point numbers that appear equal for a particular
precision might not compare equal
because their least significant digits
are different.
A mathematical or comparison operation that uses a floating-point
number might not yield the same result
if a decimal number is used because
the floating-point number might not
exactly approximate the decimal
number.
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why is floating point arithmetic in C# imprecise?
If I loop through numerous random doubles, and "round" them to two fractional digit places, each individual round appears to be correct (0.02, 0.01, 0.00, etc).
However, there appears to be a very small fractional part that is kept along with the round.
double total = 0;
for (int i = 0; i < 10000; i++)
{
total += Math.Round(random.NextDouble() * 0.02, 2);
}
Console.WriteLine(total);
Sample Outputs:
100.600000000006
99.7400000000059
Anyone care to explain why this happens is a more intuitive way?
System.Double and System.Float are base 2 floating point types. There are many finite decimal values that have an infinite representation in base 2, much as 1/3 has an infinite representation in base 10. Therefore, when you round to such a value, the binary representation is approximate. To avoid this problem, use the decimal type, which is a base 10 floating point type.
There must be 100 duplicates of this question on stackoverflow, but I am on my phone, which makes it inconvenient to find them and link to them.
For more information, look at the wikipedia article for IEEE double.
Many will say that doubles are "not exact", which is false. Every double value represents an exact value that can be represented exactly in base 10 (except NaN and infinity, of course). That's because 2 is one of the prime factors of 10. The only approximation is when you try to represent certain decimal fractions (or other rational numbers whose denominator has at least one prime factor other than 2).
The best way to understand this, for me at least, is to work out, on paper, the binary representations of a few fractions. For example, try 0.5, 0.625, 3.25, 5/16, 1/3, 0.2, and 0.3.
Doubles do not store base 10 numbers- they store a value in base 2, and so when storing fractional numbers they can exhibit small differences from the expected decimal value. For what it's worth, this is not unique to base 2. Base 10 (and really all base-N systems) has the same problem- take for example 1/3. In base 10 you end up representing this as something like 0.3333333(...), but there is no way to perfectly represent 1/3 in base 10.
In your example, you can experience small errors in the representation of the fractional portion of the number, and because you are adding these together you may see these small errors accumulate. Using my example above, if you round .333333(...) to 2 decimal places, you get .33, but that has a fairly substantial inaccuracy relative to the actual value of 1/3. Accumulating these inaccuracies is a common mistake when doing floating point math.
As #Phoog writes, there are many explanations of this on SO. Here's one: Why is floating point arithmetic in C# imprecise?
I always tell in c# a variable of type double is not suitable for money. All weird things could happen. But I can't seem to create an example to demonstrate some of these issues. Can anyone provide such an example?
(edit; this post was originally tagged C#; some replies refer to specific details of decimal, which therefore means System.Decimal).
(edit 2: I was specific asking for some c# code, so I don't think this is language agnostic only)
Very, very unsuitable. Use decimal.
double x = 3.65, y = 0.05, z = 3.7;
Console.WriteLine((x + y) == z); // false
(example from Jon's page here - recommended reading ;-p)
You will get odd errors effectively caused by rounding. In addition, comparisons with exact values are extremely tricky - you usually need to apply some sort of epsilon to check for the actual value being "near" a particular one.
Here's a concrete example:
using System;
class Test
{
static void Main()
{
double x = 0.1;
double y = x + x + x;
Console.WriteLine(y == 0.3); // Prints False
}
}
Yes it's unsuitable.
If I remember correctly double has about 17 significant numbers, so normally rounding errors will take place far behind the decimal point. Most financial software uses 4 decimals behind the decimal point, that leaves 13 decimals to work with so the maximum number you can work with for single operations is still very much higher than the USA national debt. But rounding errors will add up over time. If your software runs for a long time you'll eventually start losing cents. Certain operations will make this worse. For example adding large amounts to small amounts will cause a significant loss of precision.
You need fixed point datatypes for money operations, most people don't mind if you lose a cent here and there but accountants aren't like most people..
edit
According to this site http://msdn.microsoft.com/en-us/library/678hzkk9.aspx Doubles actually have 15 to 16 significant digits instead of 17.
#Jon Skeet decimal is more suitable than double because of its higher precision, 28 or 29 significant decimals. That means less chance of accumulated rounding errors becoming significant. Fixed point datatypes (ie integers that represent cents or 100th of a cent like I've seen used) like Boojum mentions are actually better suited.
Since decimal uses a scaling factor of multiples of 10, numbers like 0.1 can be represented exactly. In essence, the decimal type represents this as 1 / 10 ^ 1, whereas a double would represent this as 104857 / 2 ^ 20 (in reality it would be more like really-big-number / 2 ^ 1023).
A decimal can exactly represent any base 10 value with up to 28/29 significant digits (like 0.1). A double can't.
My understanding is that most financial systems express currency using integers -- i.e., counting everything in cents.
IEEE double precision actually can represent all integers exactly in the range -2^53 through +2^53. (Hacker's Delight, pg. 262) If you use only addition, subtraction and multiplication, and keep everything to integers within this range then you should see no loss of precision. I'd be very wary of division or more complex operations, however.
Using double when you don't know what you are doing is unsuitable.
"double" can represent an amount of a trillion dollars with an error of 1/90th of a cent. So you will get highly precise results. Want to calculate how much it costs to put a man on Mars and get him back alive? double will do just fine.
But with money there are often very specific rules saying that a certain calculation must give a certain result and no other. If you calculate an amount that is very very very close to $98.135 then there will often be a rule that determines whether the result should be $98.14 or $98.13 and you must follow that rule and get the result that is required.
Depending on where you live, using 64 bit integers to represent cents or pennies or kopeks or whatever is the smallest unit in your country will usually work just fine. For example, 64 bit signed integers representing cents can represent values up to 92,223 trillion dollars. 32 bit integers are usually unsuitable.
No a double will always have rounding errors, use "decimal" if you're on .Net...
Actually floating-point double is perfectly well suited to representing amounts of money as long as you pick a suitable unit.
See http://www.idinews.com/moneyRep.html
So is fixed-point long. Either consumes 8 bytes, surely preferable to the 16 consumed by a decimal item.
Whether or not something works (i.e. yields the expected and correct result) is not a matter of either voting or individual preference. A technique either works or it doesn't.