Why does 1.0f + 0.0000000171785715f return 1f? - c#

After one hour of trying to find a bug in my code I've finally found the reason. I was trying to add a very small float to 1f, but nothing was happening. While trying to figure out why I found that adding that small float to 0f worked perfectly.
Why is this happening?
Does this have to do with 'orders of magnitude'?
Is there any workaround to this problem?
Thanks in advance.
Edit:
Changing to double precision or decimal is not an option at the moment.

Because the precision of a single-precision (32-bit) floating-point value is around 7 significant decimal digits (not 7 digits after the decimal point). This means the value you are adding is essentially zero, at least when added to 1. The value itself, however, can effortlessly be stored in a float, since its exponent is small in that case. But to add it to 1, both operands have to be expressed with the exponent of the larger number ... and then the digits after the zeroes disappear in rounding.
You can use double if you need more precision. Performance-wise this shouldn't make a difference on today's hardware, and memory is usually not so constrained that you have to think about every single variable.
EDIT: As you stated that using double is not an option you could use Kahan summation, as akuhn pointed out in a comment.
Another option may be to perform intermediary calculations in double-precision and afterwards cast to float again. This will only help, however, when there are a few more operations than just adding a very small number to a larger one.
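A minimal C# sketch of the effect described above, and of the "intermediate calculations in double" workaround. The names `SmallAddend`, `AddAsFloat` and `AccumulateInDouble` are made up for illustration. The explicit `(float)` casts are there on the assumption that the runtime may otherwise evaluate float expressions at a higher precision.

```csharp
using System;

public static class SmallAddend
{
    // Adds two floats and forces the result back to 32-bit precision.
    public static float AddAsFloat(float big, float small)
    {
        return (float)(big + small);
    }

    // Adds `step` to `start` `count` times, accumulating in a double
    // and rounding to float only once, at the very end.
    public static float AccumulateInDouble(float start, float step, int count)
    {
        double total = start;
        for (int i = 0; i < count; i++)
        {
            total += step;
        }
        return (float)total;
    }
}
```

With `small = 0.0000000171785715f`, `AddAsFloat(1f, small)` gives back `1f`, because the addend is smaller than half the gap between adjacent floats near 1 (roughly 1.19e-7), while `AddAsFloat(0f, small)` returns `small` unchanged. Accumulating ten such steps in a double and casting once yields a float greater than 1.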

Floating-point arithmetic

This probably happens because the number of digits of precision in a float is fixed, while the exponent can vary.
This means that although you can add your small number to 0, you cannot expect to add it to a number whose exponent is much larger, since there just won't be enough digits of precision left to hold both.
You should read What Every Computer Scientist Should Know About Floating-Point Arithmetic.

It looks like it has something to do with floating point precision. If I were you, I'd use a different type, like decimal. That should fix precision errors.

With float, you only get a precision of about seven significant digits, so your number will be rounded to 1f. If you want to store such a number, use double instead:
http://msdn.microsoft.com/en-us/library/ayazw934.aspx

In addition to the accepted answer: If you need to sum up many small number and some larger ones, you should use Kahan Summation.
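For reference, here is a hedged sketch of Kahan (compensated) summation in C#, staying entirely in float. `Summation`, `NaiveSum` and `KahanSum` are illustrative names, not part of any library, and the `(float)` casts assume the runtime might otherwise keep intermediate results at higher precision.

```csharp
using System;
using System.Collections.Generic;

public static class Summation
{
    // Plain left-to-right float summation, for comparison.
    public static float NaiveSum(IEnumerable<float> values)
    {
        float sum = 0f;
        foreach (float v in values)
        {
            sum = (float)(sum + v);
        }
        return sum;
    }

    // Kahan summation: carries a running compensation term that
    // estimates the low-order bits lost by each addition.
    public static float KahanSum(IEnumerable<float> values)
    {
        float sum = 0f;
        float compensation = 0f;
        foreach (float v in values)
        {
            float y = (float)(v - compensation);          // apply the correction to the next value
            float t = (float)(sum + y);                   // low-order bits of y may be lost here
            compensation = (float)((float)(t - sum) - y); // recover what was lost
            sum = t;
        }
        return sum;
    }
}
```

Summing `1f` followed by ten copies of `0.0000000171785715f` leaves `NaiveSum` stuck at exactly `1f`, whereas `KahanSum` ends up above `1f` because the small contributions are remembered in `compensation` until they are large enough to register.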

If performance is an issue (because you can't use double), then binary scaling/fixed-point may be an option. Values are stored as integers, scaled by a large factor (say, 2^16). Intermediate arithmetic is done with (relatively fast) integer operations. The final answer can be converted back to floating point at the end by dividing by the scaling factor.
This is often done if the target processor lacks a hardware floating-point unit.
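A rough sketch of what binary scaling could look like in C#, using the 2^16 factor mentioned above (a so-called 16.16 layout). The class `Fixed16` and its members are hypothetical, and overflow handling is left out to keep the example short.

```csharp
using System;

public static class Fixed16
{
    private const int FractionBits = 16;
    private const int Scale = 1 << FractionBits; // 65536

    // Converts a float to its scaled integer form, rounding to the nearest step.
    public static int FromFloat(float value)
    {
        return (int)Math.Round((double)value * Scale);
    }

    // Converts a scaled integer back to float by dividing by the scale.
    public static float ToFloat(int raw)
    {
        return (float)raw / Scale;
    }

    // Addition is plain integer addition.
    public static int Add(int a, int b)
    {
        return a + b;
    }

    // Multiplication needs a wider intermediate, then a shift to drop the extra scale.
    public static int Multiply(int a, int b)
    {
        return (int)(((long)a * b) >> FractionBits);
    }
}
```

Note the trade-off: with 16 fractional bits the smallest step is 1/65536 (about 0.000015), so every value in a calculation has the same absolute resolution. Very small addends would need a larger scaling factor to survive.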

You're using the f suffix on your literals, which will make these floats instead of doubles. So your very small float will vanish in the bigger float.

Related

c# floating point for loop, unexpected results

Can anyone explain to me why this program:
for (float i = -1; i < 1; i += .1F)
    Console.WriteLine(i);
Outputs this:
-1
-0.9
-0.8
-0.6999999
-0.5999999
-0.4999999
-0.3999999
-0.2999999
-0.1999999
-0.09999993
7.450581E-08
0.1000001
0.2000001
0.3000001
0.4000001
0.5000001
0.6000001
0.7000001
0.8000001
0.9000002
Where is the rounding error coming from??
I'm sure this question must have been asked in some form before but I can't find it anywhere quickly. :)
The answer comes down to the way that floating point numbers are represented. You can go into the technical detail via Wikipedia, but simply put, a decimal number doesn't necessarily have an exact floating point representation...
The way floating point numbers (base 2 floating point anyway, like doubles and floats) work [0] is by adding up powers of 1/2 to get to what you want. So 0.5 is just 1/2, 0.75 is 1/2 + 1/4, and so on.
The problem is that you can never represent 0.1 in this binary system without an unending stream of increasingly smaller powers of 1/2, so the best a computer can do is store a number that is very close to, but not quite, 0.1.
Usually you don't notice these differences, but they are there, and sometimes you can make them manifest themselves. There are a lot of ways to deal with these issues, and which one you use is very much dependent on what you are actually doing with it.
[0] in the slightly handwavey close enough kind of way
Floating point numbers are not exact; they are approximations because they must be rounded.
They are precise in their binary representation.
Different CPUs or PCs could produce different results.
Take a look at the Wikipedia page.
The big issue is that 0.1 cannot be represented in binary, just like 1 / 3 or 1 / 7 cannot be represented in decimal. So since the computer has to cut off at some point, it will accumulate a rounding error.
Try doing 0.1 + 0.7 == 0.8 in pretty much any programming language, you'll get false as a result.
In C# to get around this, use the decimal type to get better precision.
This will explain everything about floating-point:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
The rounding error comes from the fact that float is not a precise data type (when converted to decimal); it is an approximation. Note that the C# reference specifies float as having 7 digits of decimal precision.
This is fundamental to any floating point variable. The reasons are complex, but there is plenty of information if you google it.
Try using Decimal instead.
As other posters have intimated, the problem stems from the assumption that floating point numbers are a precise decimal representation. They are not- they are a precise binary (base-2) representation of a number. The problem you are experiencing is that you cannot always express a precise binary number in decimal format- just like you cannot express 1/3 in decimal format (.33333333...). At some point, rounding must occur.
In your example, rounding is occurring when you express .1F (because that is not a value that can be expressed precisely in base-2).
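One common way to keep the error from accumulating in a loop like the one in the question is to count with an integer and derive each float from the counter, so every value is rounded only once. This is only a sketch; `SteppedRange` and `Tenths` are illustrative names.

```csharp
using System;

public static class SteppedRange
{
    // Returns firstTenth/10, (firstTenth+1)/10, ..., (lastTenthExclusive-1)/10.
    // Each element comes from a single division, so rounding errors
    // do not build up from one step to the next.
    public static float[] Tenths(int firstTenth, int lastTenthExclusive)
    {
        float[] values = new float[lastTenthExclusive - firstTenth];
        for (int k = firstTenth; k < lastTenthExclusive; k++)
        {
            values[k - firstTenth] = k / 10f;
        }
        return values;
    }
}
```

`SteppedRange.Tenths(-10, 10)` produces the same twenty steps as the original loop, but the middle element is exactly 0 and each of the others is the closest float to its intended decimal value.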

Bitwise representation of division of floats - how division of floats works

A number can have multiple representations if we use a float, so the results of a division of floats may produce bitwise different floats. But what if the denominator is a power of 2?
AFAIK, dividing by a power of 2 would only shift the exponent, leaving the same mantissa, always producing bitwise identical floats. Is that right?
float n = xxx;
float result = n / 1024f; // always the same result?
--- UPDATE ----------------------
Sorry for my lack of knowledge in the IEEE black magic for floating points :) , but I'm talking about those numbers Guvante mentioned: no representation for certain decimal numbers, 'inaccurate' floats. For the rest of this post I'll use 'accurate' and 'inaccurate' considering Guvante's definition of these words.
To simplify, let's say the numerator is always an 'accurate' number. Also, let's divide not by any power of 2, but always by 1024. Additionally, I'm doing the operation the same way every time (same method), so I'm talking about getting the same results in different executions (for the same inputs, sure).
I'm asking all this because I see different numbers coming from the same inputs, so I thought: well if I only use 'accurate' floats as numerators and divide by 1024 I will only shift the exponent, still having an 'accurate' float.
You asked for an example. The real problem is this: I have a simulator producing sometimes 0.02999994 and sometimes 0.03000000 for the same inputs. I thought I could multiply these numbers by 1024, round to get an 'integer' ('accurate' float) that would be the same for those two numbers, and then divide by 1024 to get an 'accurate' rounded float.
I was told (in my other question) that I could convert to decimal, round and cast to float, but I want to know if this way works.
A number can have multiple representations if we use a float
The question appears to be predicated on an incorrect premise; the only number that has multiple representations as a float is zero, which can be represented as either "positive zero" or "negative zero". Other than zero a given number only has one representation as a float, assuming that you are talking about the "double" or "float" types.
Or perhaps I misunderstand. Is the issue that you are referring to the fact that the compiler is permitted to do floating point operations in higher precision than the 32 or 64 bits available for storage? That can cause divisions and multiplications to produce different results in some cases.
Since people often don't fully grasp floating point numbers, I will go over some of your points real quick. Each particular combination of bits in a floating point number represents a unique number. However, because that number has a base 2 fractional component, there is no exact representation for certain decimal numbers, for instance 1.1. In those cases you take the closest number. IEEE 754-2008 specifies round to nearest, ties to even in these cases.
The real difficulty is when you combine two of these 'inaccurate' numbers. This can introduce problems as each intermediate step will involve rounding. If you calculate the same value using two different methods, you could come up with subtly different values. Typically this is handled with an epsilon when you want equality.
Now onto your real question: can you divide by a power of two and avoid introducing any additional 'inaccuracies'? Normally you can. However, as with all floating point numbers, denormals and other odd cases have their own logic, and obviously if the exponent goes out of range you will have difficulty. And again note that no mathematical errors are introduced during any of this; it is simply math being done with limited precision, which involves intermediate rounding of results.
EDIT: In response to new question
What you are saying could work, but is pretty much equivalent to rounding. Additionally, if you are just looking for equality, you should use an epsilon as I mentioned earlier: |a - b| < e for some small value e (0.0001 would work in your example). If you are looking to print out a pretty number, and the framework you are using isn't doing it to your liking, some rounding would be the most direct way of describing your solution, which is always a plus.
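Both ideas from this thread can be sketched in a few lines of C#: the epsilon comparison, and the asker's multiply by 1024, round, divide by 1024 approach. `FloatCompare`, `NearlyEqual` and `SnapTo1024ths` are hypothetical names chosen for this example.

```csharp
using System;

public static class FloatCompare
{
    // True when a and b differ by less than epsilon in absolute terms.
    public static bool NearlyEqual(float a, float b, float epsilon)
    {
        return Math.Abs(a - b) < epsilon;
    }

    // Rounds a value to the nearest multiple of 1/1024. The scaling and
    // rounding are done in double; multiplying or dividing by a power of
    // two only changes the exponent, so those steps add no rounding error
    // unless the value is extremely small or large.
    public static float SnapTo1024ths(float value)
    {
        return (float)(Math.Round(value * 1024.0) / 1024.0);
    }
}
```

For the two simulator outputs from the question, `NearlyEqual(0.02999994f, 0.03f, 0.0001f)` is true, and both values snap to 31/1024. Keep in mind that snapping moves the value to a grid with a spacing of about 0.00098, and two nearly equal inputs that straddle a midpoint of that grid would still snap to different neighbours.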

How does decimal work?

I looked at decimal in C# but I wasn't 100% sure what it did.
Is it lossy? In C#, writing 1.0000000000001f + 1.0000000000001f results in 2 when using float (double gets you 2.0000000000002, which is correct). Is it possible to add two things with decimal and not get the correct answer?
How many decimal places can I use? I see the MaxValue is 79228162514264337593543950335, but if I subtract 1, how many decimal places can I use?
Are there quirks I should know of? In C# it's 128 bits; in other languages, how many bits is it, and will it work the same way as C# decimal does (when adding, dividing, multiplying)?
What you're showing isn't decimal - it's float. They're very different types. f is the suffix for float, aka System.Single. m is the suffix for decimal, aka System.Decimal. It's not clear from your question whether you thought this was actually using decimal, or whether you were just using float to demonstrate your fears.
If you use 1.0000000000001m + 1.0000000000001m you'll get exactly the right value. Note that the double version wasn't able to express either of the individual values exactly, by the way.
I have articles on both kinds of floating point in .NET, and you should read them thoroughly, along with other resources:
Binary floating point (float/double)
Decimal floating point (decimal)
All floating point types have their limits of course, but in particular you should not expect binary floating point to accurately represent decimal values such as 0.1. Decimal still can't represent anything that isn't exactly representable in 28/29 decimal digits though - so if you divide 1 by 3, you won't get the exact answer of course.
You should also note that the range of decimal is considerably smaller than that of double. So while it can have 28-29 decimal digits of precision, you can't represent truly huge numbers (e.g. 10^200) or minuscule numbers (e.g. 10^-200).
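The points above can be checked with a small C# sketch. The class and method names are invented for this example; the literals are the ones from the question.

```csharp
using System;

public static class DecimalDemo
{
    // Adds two decimals; exact as long as the result fits in 28-29 digits.
    public static decimal AddDecimals(decimal a, decimal b)
    {
        return a + b;
    }

    // Adds two floats, forcing the result to 32-bit precision.
    public static float AddFloats(float a, float b)
    {
        return (float)(a + b);
    }

    // 1/3 has no finite decimal expansion, so decimal has to round it,
    // and multiplying back by 3 does not give exactly 1.
    public static decimal ThirdTimesThree()
    {
        decimal third = 1m / 3m;
        return third * 3m;
    }
}
```

`AddDecimals(1.0000000000001m, 1.0000000000001m)` equals `2.0000000000002m` exactly, while the float version returns `2f` because the literal `1.0000000000001f` is already rounded to `1f`. `ThirdTimesThree()` comes out slightly below `1m`, which shows that decimal is not lossless either.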
Decimals in programming are (almost) never 100% accurate. Sometimes it's even better to multiply the decimal value with a very high number and then calculate, but that's only if you're for example sure that the value is always between 0 and 100(so it won't get out of range of the maxvalue)
Floating point is inherently imprecise. Some numbers can't be represented faithfully. Decimal is a large floating point with high precision. If you look on the page at MSDN you can see there are "28-29 significant digits." The .NET Framework classes are language agnostic. They will work the same in every language that uses .NET.
edit (in response to Jon Skeet): If you initialize the Decimal class with the numbers above, which are less than 28 digits each after the decimal point, the number will be stored faithfully as long as the binary representation is exact. Since it works in 64-bit format, I assume the 128-bit will handle it perfectly fine. Some numbers, such as 0.1, will never be exactly representable because they are a repeating sequence in binary.

When should I use double instead of decimal?

I can name three advantages to using double (or float) instead of decimal:
Uses less memory.
Faster because floating point math operations are natively supported by processors.
Can represent a larger range of numbers.
But these advantages seem to apply only to calculation intensive operations, such as those found in modeling software. Of course, doubles should not be used when precision is required, such as financial calculations. So are there any practical reasons to ever choose double (or float) instead of decimal in "normal" applications?
Edited to add:
Thanks for all the great responses, I learned from them.
One further question: A few people made the point that doubles can more precisely represent real numbers. When declared I would think that they usually more accurately represent them as well. But is it a true statement that the accuracy may decrease (sometimes significantly) when floating point operations are performed?
I think you've summarised the advantages quite well. You are however missing one point. The decimal type is only more accurate at representing base 10 numbers (e.g. those used in currency/financial calculations). In general, the double type is going to offer at least as great precision (someone correct me if I'm wrong) and definitely greater speed for arbitrary real numbers. The simple conclusion is: when considering which to use, always use double unless you need the base 10 accuracy that decimal offers.
Edit:
Regarding your additional question about the decrease in accuracy of floating-point numbers after operations, this is a slightly more subtle issue. Indeed, precision (I use the term interchangeably for accuracy here) will steadily decrease after each operation is performed. This is due to two reasons:
the fact that certain numbers (most obviously decimals) can't be truly represented in floating point form
rounding errors occur, just as if you were doing the calculation by hand. It depends greatly on the context (how many operations you're performing) whether these errors are significant enough to warrant much thought however.
In all cases, if you want to compare two floating-point numbers that should in theory be equivalent (but were arrived at using different calculations), you need to allow a certain degree of tolerance (how much varies, but is typically very small).
For a more detailed overview of the particular cases where errors in accuracies can be introduced, see the Accuracy section of the Wikipedia article. Finally, if you want a seriously in-depth (and mathematical) discussion of floating-point numbers/operations at machine level, try reading the oft-quoted article What Every Computer Scientist Should Know About Floating-Point Arithmetic.
You seem spot on with the benefits of using a floating point type. I tend to design for decimals in all cases, and rely on a profiler to let me know if operations on decimal is causing bottlenecks or slow-downs. In those cases, I will "down cast" to double or float, but only do it internally, and carefully try to manage precision loss by limiting the number of significant digits in the mathematical operation being performed.
In general, if your value is transient (not reused), you're safe to use a floating point type. The real problem with floating point types is the following three scenarios.
You are aggregating floating point values (in which case the precision errors compound)
You build values based on the floating point value (for example in a recursive algorithm)
You are doing math with a very wide number of significant digits (for example, 123456789.1 * .000000000000000987654321)
EDIT
According to the reference documentation on C# decimals:
The decimal keyword denotes a 128-bit data type. Compared to floating-point types, the decimal type has a greater precision and a smaller range, which makes it suitable for financial and monetary calculations.
So to clarify my above statement:
I tend to design for decimals in all cases, and rely on a profiler to let me know if operations on decimal is causing bottlenecks or slow-downs.
I have only ever worked in industries where decimals are favorable. If you're working on physics or graphics engines, it's probably much more beneficial to design for a floating point type (float or double).
Decimal is not infinitely precise (it is impossible to represent infinite precision for non-integral values in a primitive data type), but it is far more precise than double:
decimal = 28-29 significant digits
double = 15-16 significant digits
float = 7 significant digits
EDIT 2
In response to Konrad Rudolph's comment, item # 1 (above) is definitely correct. Aggregation of imprecision does indeed compound. See the below code for an example:
using System;

public static class Program
{
    private const float THREE_FIFTHS = 3f / 5f;
    private const int ONE_MILLION = 1000000;

    public static void Main(string[] args)
    {
        Console.WriteLine("Three Fifths: {0}", THREE_FIFTHS.ToString("F10"));
        float asSingle = 0f;
        double asDouble = 0d;
        decimal asDecimal = 0M;
        // Add the same float constant one million times in each type.
        for (int i = 0; i < ONE_MILLION; i++)
        {
            asSingle += THREE_FIFTHS;
            asDouble += THREE_FIFTHS;
            asDecimal += (decimal) THREE_FIFTHS;
        }
        Console.WriteLine("Six Hundred Thousand: {0:F10}", THREE_FIFTHS * ONE_MILLION);
        Console.WriteLine("Single: {0}", asSingle.ToString("F10"));
        Console.WriteLine("Double: {0}", asDouble.ToString("F10"));
        Console.WriteLine("Decimal: {0}", asDecimal.ToString("F10"));
        Console.ReadLine();
    }
}
This outputs the following:
Three Fifths: 0.6000000000
Six Hundred Thousand: 600000.0000000000
Single: 599093.4000000000
Double: 599999.9999886850
Decimal: 600000.0000000000
As you can see, even though we are adding from the same source constant, the double result is less precise (although it will probably round correctly), and the float result is far less precise, to the point where it has been reduced to only two significant digits.
Use decimal for base 10 values, e.g. financial calculations, as others have suggested.
But double is generally more accurate for arbitrary calculated values.
For example if you want to calculate the weight of each line in a portfolio, use double as the result will more nearly add up to 100%.
In the following example, doubleResult is closer to 1 than decimalResult:
// Add one third + one third + one third with decimal
decimal decimalValue = 1M / 3M;
decimal decimalResult = decimalValue + decimalValue + decimalValue;
// Add one third + one third + one third with double
double doubleValue = 1D / 3D;
double doubleResult = doubleValue + doubleValue + doubleValue;
So again taking the example of a portfolio:
The market value of each line in the portfolio is a monetary value and would probably be best represented as decimal.
The weight of each line in the portfolio (= Market Value / SUM(Market Value)) is usually better represented as double.
Use a double or a float when you don't need precision. For example, in a platformer game I wrote, I used a float to store the player velocities. Obviously I don't need super precision here because I eventually round to an int for drawing on the screen.
In some accounting scenarios, consider the possibility of using integral types instead or in conjunction. For example, let's say that the rules you operate under require every calculation result to be carried forward with at least 6 decimal places, and the final result will be rounded to the nearest penny.
A calculation of 1/6th of $100 yields $16.66666666666666..., so the value carried forth in a worksheet will be $16.666667. Both double and decimal should yield that result accurately to 6 decimal places. However, we can avoid any cumulative error by carrying the result forward as an integer 16666667. Each subsequent calculation can be made with the same precision and carried forward similarly. Continuing the example, I calculate Texas sales tax on that amount (16666667 * .0825 = 1375000, after rounding). Adding the two (it's a short worksheet) gives 16666667 + 1375000 = 18041667. Moving the decimal point back in gives us 18.041667, or $18.04.
While this short example wouldn't yield a cumulative error using double or decimal, it's fairly easy to show cases where simply calculating the double or decimal and carrying forward would accumulate significant error. If the rules you operate under require a limited number of decimal places, storing each value as an integer by multiplying by 10^(required # of decimal place), and then dividing by 10^(required # of decimal places) to get the actual value will avoid any cumulative error.
In situations where fractions of pennies do not occur (for example, a vending machine), there is no reason to use non-integral types at all. Simply think of it as counting pennies, not dollars. I have seen code where every calculation involved only whole pennies, yet use of double led to errors! Integer only math removed the issue. So my unconventional answer is, when possible, forgo both double and decimal.
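The worksheet above can be sketched with plain `long` arithmetic in C#. Everything here is illustrative: the names are made up, amounts are stored as millionths of a dollar (6 decimal places), the tax rate is passed as an integer number of ten-thousandths (825 for 8.25%), and round-half-up on non-negative values is an assumption, since the answer does not say which rounding rule applies.

```csharp
using System;

public static class MicroDollars
{
    // Integer division rounded to nearest (half up); assumes non-negative inputs.
    public static long DivideRounded(long numerator, long denominator)
    {
        return (numerator + denominator / 2) / denominator;
    }

    // Applies a rate given in ten-thousandths (825 means 8.25%).
    public static long ApplyRate(long amount, long rateInTenThousandths)
    {
        return DivideRounded(amount * rateInTenThousandths, 10000);
    }

    // Converts millionths of a dollar to whole cents.
    public static long ToCents(long amount)
    {
        return DivideRounded(amount, 10000);
    }
}
```

Following the numbers in the answer: `DivideRounded(100 * 1000000L, 6)` is 16666667, `ApplyRate(16666667, 825)` is 1375000, and `ToCents(16666667 + 1375000)` is 1804, i.e. $18.04. No binary or decimal floating point is involved at any step.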
If you need binary interop with other languages or platforms, then you might need to use float or double, which are standardized.
Depends on what you need it for.
Because float and double are binary data types, you run into some difficulties and errors in the way they round numbers. For instance, float would round 0.1 to 0.100000001490116, and would also round 1 / 3 to 0.333333343267441. Simply put, not all real numbers have an exact representation in binary floating-point types.
Luckily C# also supports the so-called decimal floating-point arithmetic, where numbers are represented via the decimal numeric system rather than the binary system. Thus, decimal floating-point arithmetic does not lose accuracy when storing and processing decimal fractions that fit within its precision. This makes it well suited to calculations where exact decimal results are needed.
Note: this post is based on information of the decimal type's capabilities from http://csharpindepth.com/Articles/General/Decimal.aspx and my own interpretation of what that means. I will assume Double is normal IEEE double precision.
Note2: smallest and largest in this post refer to the magnitude of the number.
Pros of "decimal".
"decimal" can represent exactly numbers that can be written as (sufficiently short) decimal fractions, double cannot. This is important in financial ledgers and similar where it is important that the results exactly match what a human doing the calculations would give.
"decimal" has a much larger mantissa than "double". That means that for values within its normalised range "decimal" will have a much higher precision than double.
Cons of decimal
It will be much slower (I don't have benchmarks, but I would guess at least an order of magnitude, maybe more). decimal will not benefit from any hardware acceleration, and arithmetic on it will require relatively expensive multiplication/division by powers of 10 (which is far more expensive than multiplication and division by powers of 2) to match the exponent before addition/subtraction and to bring the exponent back into range after multiplication/division.
decimal will overflow earlier than double will. decimal can only represent numbers up to ±(2^96 - 1). By comparison, double can represent numbers up to nearly ±2^1024.
decimal will underflow earlier. The smallest numbers representable in decimal are ±10^-28. By comparison, double can represent values down to 2^-1074 (approx 5 × 10^-324) if subnormal numbers are supported and 2^-1022 (approx 2 × 10^-308) if they are not.
decimal takes up twice as much memory as double.
My opinion is that you should default to using "decimal" for money work and other cases where matching human calculation exactly is important and that you should use double as your default choice the rest of the time.
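The range difference is easy to observe from C#. In this hedged sketch (`RangeCheck` and `FitsInDecimal` are invented names), an explicit double-to-decimal conversion is used as the probe: it throws `OverflowException` when the value is too large for decimal, and yields zero when the value is too small.

```csharp
using System;

public static class RangeCheck
{
    // True when the double can be converted to decimal without overflowing.
    public static bool FitsInDecimal(double value)
    {
        try
        {
            decimal converted = (decimal)value;
            return true;
        }
        catch (OverflowException)
        {
            return false;
        }
    }

    // Converts a double to decimal; magnitudes too small for decimal become zero.
    public static decimal ToDecimal(double value)
    {
        return (decimal)value;
    }
}
```

`FitsInDecimal(1e20)` is true, `FitsInDecimal(1e200)` is false even though 1e200 is an ordinary double, and `ToDecimal(1e-200)` silently returns `0m`.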
Use floating points if you value performance over correctness.
Choose the type based on your application. If you need precision, as in financial analysis, you have answered your own question. But if your application can settle for an estimate, you're OK with double.
Does your application need a fast calculation, or does it have all the time in the world to give you an answer? It really depends on the type of application.
Graphics hungry? float or double is enough. Financial data analysis, meteor-striking-a-planet kind of precision? Those would need a bit more precision :)
Decimal is wider (16 bytes), while double is natively supported by the CPU. Decimal is base-10, so a decimal-to-double conversion is happening while a decimal is computed.
For accounting - decimal
For finance - double
For heavy computation - double
Keep in mind .NET CLR only supports Math.Pow(double,double). Decimal is not supported.
.NET Framework 4
[SecuritySafeCritical]
public static extern double Pow(double x, double y);
A double value will serialize to scientific notation by default if that notation is shorter than the decimal display (e.g. .00000003 will be 3e-8). Decimal values will never serialize to scientific notation. When serializing for consumption by an external party, this may be a consideration.

Is a double really unsuitable for money?

I always say that in C# a variable of type double is not suitable for money. All sorts of weird things could happen. But I can't seem to create an example to demonstrate some of these issues. Can anyone provide such an example?
(edit; this post was originally tagged C#; some replies refer to specific details of decimal, which therefore means System.Decimal).
(edit 2: I was specifically asking for some C# code, so I don't think this is language agnostic only)
Very, very unsuitable. Use decimal.
double x = 3.65, y = 0.05, z = 3.7;
Console.WriteLine((x + y) == z); // false
(example from Jon's page here - recommended reading ;-p)
You will get odd errors effectively caused by rounding. In addition, comparisons with exact values are extremely tricky - you usually need to apply some sort of epsilon to check for the actual value being "near" a particular one.
Here's a concrete example:
using System;

class Test
{
    static void Main()
    {
        double x = 0.1;
        double y = x + x + x;
        Console.WriteLine(y == 0.3); // Prints False
    }
}
Yes it's unsuitable.
If I remember correctly double has about 17 significant numbers, so normally rounding errors will take place far behind the decimal point. Most financial software uses 4 decimals behind the decimal point, that leaves 13 decimals to work with so the maximum number you can work with for single operations is still very much higher than the USA national debt. But rounding errors will add up over time. If your software runs for a long time you'll eventually start losing cents. Certain operations will make this worse. For example adding large amounts to small amounts will cause a significant loss of precision.
You need fixed point datatypes for money operations, most people don't mind if you lose a cent here and there but accountants aren't like most people..
edit
According to this site http://msdn.microsoft.com/en-us/library/678hzkk9.aspx Doubles actually have 15 to 16 significant digits instead of 17.
@Jon Skeet decimal is more suitable than double because of its higher precision, 28 or 29 significant decimals. That means less chance of accumulated rounding errors becoming significant. Fixed point datatypes (i.e. integers that represent cents or 100ths of a cent, like I've seen used), as Boojum mentions, are actually better suited.
Since decimal uses a scaling factor of multiples of 10, numbers like 0.1 can be represented exactly. In essence, the decimal type represents this as 1 / 10 ^ 1, whereas a double would represent this as 104857 / 2 ^ 20 (in reality it would be more like really-big-number / 2 ^ 1023).
A decimal can exactly represent any base 10 value with up to 28/29 significant digits (like 0.1). A double can't.
My understanding is that most financial systems express currency using integers -- i.e., counting everything in cents.
IEEE double precision actually can represent all integers exactly in the range -2^53 through +2^53. (Hacker's Delight, pg. 262) If you use only addition, subtraction and multiplication, and keep everything to integers within this range then you should see no loss of precision. I'd be very wary of division or more complex operations, however.
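The 2^53 boundary can be probed with a round trip through double. This is only an illustrative sketch; `DoubleIntegers`, `MaxContiguous` and `SurvivesRoundTrip` are invented names.

```csharp
using System;

public static class DoubleIntegers
{
    // 2^53: up to this magnitude every integer has an exact double representation.
    public const long MaxContiguous = 1L << 53;

    // True when converting to double and back returns the same integer.
    // Intended for values well inside the long range.
    public static bool SurvivesRoundTrip(long value)
    {
        double asDouble = (double)value;
        return (long)asDouble == value;
    }
}
```

`SurvivesRoundTrip(DoubleIntegers.MaxContiguous)` is true, but `SurvivesRoundTrip(DoubleIntegers.MaxContiguous + 1)` is false: 2^53 + 1 is the first integer that a double cannot hold, so it gets rounded to a neighbour.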
Using double when you don't know what you are doing is unsuitable.
"double" can represent an amount of a trillion dollars with an error of 1/90th of a cent. So you will get highly precise results. Want to calculate how much it costs to put a man on Mars and get him back alive? double will do just fine.
But with money there are often very specific rules saying that a certain calculation must give a certain result and no other. If you calculate an amount that is very very very close to $98.135 then there will often be a rule that determines whether the result should be $98.14 or $98.13 and you must follow that rule and get the result that is required.
Depending on where you live, using 64 bit integers to represent cents or pennies or kopeks or whatever is the smallest unit in your country will usually work just fine. For example, 64 bit signed integers representing cents can represent values up to 92,233 trillion dollars. 32 bit integers are usually unsuitable.
No, a double will always have rounding errors; use "decimal" if you're on .NET...
Actually floating-point double is perfectly well suited to representing amounts of money as long as you pick a suitable unit.
See http://www.idinews.com/moneyRep.html
So is fixed-point long. Either consumes 8 bytes, surely preferable to the 16 consumed by a decimal item.
Whether or not something works (i.e. yields the expected and correct result) is not a matter of either voting or individual preference. A technique either works or it doesn't.
