I am getting very different results from what seem like identical floating point comparison statements. What is bizarre is that both statements are true in 64-bit, and only in 32-bit do the results differ.
Even if I explicitly cast the '134' and/or 'initial' to an Int32, the result in 32-bit is still the same.
int initial = 134;
float initialConverted = initial/255.0f;
// both are true in 64, abs2 is false in 32
var abs = Math.Abs(initialConverted - (134/255.0f)) < float.Epsilon;
var abs2 = Math.Abs(initialConverted - (initial/255.0f)) < float.Epsilon;
Why is there a problem with the division when the integer value is stored in its own variable?
This is just a variant of the normal floating point comparison and accuracy problems.
Floating point calculations are slightly different in 32-bit and 64-bit and slightly different between DEBUG and RELEASE builds. Most likely in one setting it evaluates to 0, in another to something equal to or slightly larger than float.Epsilon.
I would not use float.Epsilon, it is far too small to handle normal inaccuracies. Instead you need to decide on an epsilon value yourself, that would be "close enough".
float.Epsilon is the same as Single.Epsilon, which is documented as:
Represents the smallest positive Single value that is greater than zero. This field is constant.
In other words, this is just the smallest number representable in the Single data type, it is not usable to handle inaccuracies in calculations, you need something larger for that. Single.Epsilon is somewhere in the vicinity of 1.4E-45 which doesn't allow any inaccuracies at all.
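To illustrate (a sketch, not the original poster's code; the 1e-5f tolerance is an arbitrary assumption suited to values in the 0..1 range produced by dividing by 255), the comparison becomes stable once a domain-appropriate tolerance replaces float.Epsilon:

using System;

class ToleranceDemo
{
    // Hand-picked tolerance; an assumption that fits values between 0 and 1.
    const float Tolerance = 1e-5f;

    static void Main()
    {
        int initial = 134;
        float initialConverted = initial / 255.0f;

        // True regardless of how the intermediate result was rounded,
        // because the tolerance dwarfs ordinary rounding error.
        bool close = Math.Abs(initialConverted - (initial / 255.0f)) < Tolerance;
        Console.WriteLine(close);
    }
}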
There are a couple of things going on here.
Firstly, C# uses a different definition of epsilon than other languages. float.Epsilon is the next largest float after 0. Due to the scaling property of floating point numbers, this is very small (about 1.4e-45f, assuming the standard IEEE 754 binary32 format). In most other languages (such as, say, FLT_EPSILON in C), epsilon refers to the difference between 1 and the next largest float (1.1920929e-7f in binary32 format).
What this means is that the threshold you're using is very tight, too tight to allow for the usual floating point rounding error.
The reason for the difference between architectures is due to differences in handling intermediate precision. On a modern CPU, there are two sets of instructions for handling floating point numbers:
x87 instructions: these date back to the original 8086 processors (or, more specifically, the 8087 coprocessor that accompanied them). They internally utilise a higher precision than the destination format, namely an 80-bit format (compared with typical 32-bit floats and 64-bit doubles). However, at certain steps, operations will need to be truncated to the destination format. The precise rules for when this occurs depend on the language (see here for C#). This is the reason why your abs2 is false on a 32-bit machine: initialConverted has been rounded to a float, but the second (initial/255.0f) has not (I'm not sure why this doesn't occur in abs, but I guess the compiler optimises away the constant expression (134/255.0f) into a float).
SSE instructions: these were introduced as "fast-but-restrictive" floating point operations for games and multimedia, but now have almost completely supplanted the x87 instructions on modern processors. Unlike x87, there is no extended precision (so a float - float immediately returns a float), they are faster, and offer basic parallelism via SIMD operations. They are almost certainly being used on a 64-bit machine (they are also available on most 32-bit machines from the past decade, but compilers tend not to use them, I guess for compatibility reasons). As there is no extended precision, initialConverted and (initial/255.0f) will both be identical floats, hence abs2 is true.
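One way to sidestep the extended-precision issue (a sketch, relying on the C# rule that an explicit cast to float forces the value to be narrowed; intermediate precision is otherwise left to the runtime) is to make sure both operands have been rounded to 32 bits before the subtraction:

int initial = 134;
float initialConverted = initial / 255.0f;

// The explicit (float) cast forces the right-hand expression to be rounded
// to 32-bit precision even if the JIT kept it in an 80-bit x87 register,
// so both operands now go through the same rounding step.
var abs2 = Math.Abs(initialConverted - (float)(initial / 255.0f)) < float.Epsilon;

Even so, the earlier advice stands: a realistic tolerance is safer than float.Epsilon.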
Related
There was some code in my project that compared a double value against a constant to see whether they differed, such as:
if (totValue != 1.0)
ReSharper complained about this, suggesting I should compare against an "EPSILON", and added just such a comparison to the code (when invited to do so). However, it does not create the constant itself or suggest what value it should be. Is this a good solution:
const double EPSILON = double.Epsilon; // see http://msdn.microsoft.com/en-us/library/system.double.epsilon.aspx
. . .
if (Math.Abs(totValue - 1.0) > EPSILON)
compValue = Convert.ToString(totValue*Convert.ToDouble(compValue));
?
UPDATE
I changed it to this:
const double EPSILON = 0.001;
...thinking that's probably both large and small enough to work for typical double values (not scientific stuff, just "I have 2.5 of these," etc.)
No, it is not a sensible value for your epsilon. The code you have is no different than a straight equality check.
double.Epsilon is the smallest possible difference that there can be between any two doubles. There is no way for any two doubles to be closer to each other than by being double.Epsilon apart, so the only way for this check to be true is for them to be exactly equal.
As for what epsilon value you should actually use, that simply depends, which is why one isn't automatically generated for you. It all depends on what types of operations you're doing to the data (which affects the possible deviation from the "true value") along with how much precision you actually care about in your application (and of course if the precision that you care about is greater than your margin of error, you have a problem). Your epsilon needs to be some precision value greater than (or equal to) the precision you need, while being less than the possible margin of error of all operations performed on either numeric value.
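As an illustration only (the tolerances below are assumptions, not recommended constants), one common pattern is to scale the tolerance to the magnitude of the operands instead of relying on a single absolute value:

using System;

static class DoubleComparison
{
    // Hypothetical helper: two doubles are "nearly equal" when their difference
    // is small relative to their magnitude, with an absolute floor for values
    // near zero. Both tolerances must be tuned to the application.
    public static bool NearlyEqual(double a, double b,
                                   double relativeTolerance = 1e-9,
                                   double absoluteTolerance = 1e-12)
    {
        double diff = Math.Abs(a - b);
        double scale = Math.Max(Math.Abs(a), Math.Abs(b));
        return diff <= Math.Max(absoluteTolerance, relativeTolerance * scale);
    }
}

With such a helper, the original check would read if (!DoubleComparison.NearlyEqual(totValue, 1.0)) ...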
Yes. But even better is to not use floating point. Use decimal instead.
However, if, for some reason, you have to stick with double, never compare directly; that is, never rely on e.g. a - b == 0 (with a and b being double values that are meant to be equal).
Floating point arithmetic is fast, but not precise, and taking that into account, R# is correct.
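For example (a small sketch of the difference in behaviour), base-10 amounts that drift in double stay exact in decimal:

using System;

class DecimalDemo
{
    static void Main()
    {
        double dSum = 0.0;
        decimal mSum = 0.0m;
        for (int i = 0; i < 10; i++)
        {
            dSum += 0.1;   // 0.1 has no exact binary representation
            mSum += 0.1m;  // 0.1m is stored exactly in decimal
        }
        Console.WriteLine(dSum == 1.0);  // False on typical platforms
        Console.WriteLine(mSum == 1.0m); // True
    }
}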
Mathematically, 0.9 recurring can be shown to be equal to 1. This question however, is not about infinity, convergence, or the maths behind this.
The above assumption can be represented using doubles in C# with the following.
var oneOverNine = 1d / 9d;
var resultTimesNine = oneOverNine * 9d;
Using the code above, (resultTimesNine == 1d) evaluates to true.
When using decimals instead, the evaluation yields false, yet, my question is not about the disparate precision of double and decimal.
Since no type has infinite precision, how and why does double maintain such an equality where decimal does not? What is happening literally 'between the lines' of code above, with regards to the manner in which the oneOverNine variable is stored in memory?
It depends on the rounding used to get the closest representable value to 1/9. It could go either way. You can investigate the issue of representability at Rob Kennedy's useful page: http://pages.cs.wisc.edu/~rkennedy/exact-float
But don't think that somehow double is able to achieve exactness. It isn't. If you try with 2/9, 3/9 etc. you will find cases where the rounding goes the other way. The bottom line is that 1/9 is not exactly representable in binary floating point. And so rounding happens and your calculations are subject to rounding errors.
What is happening literally 'between the lines' of code above, with regards to the manner in which the oneOverNine variable is stored in memory?
What you're asking about is called IEEE 754. This is the spec that C#, its underlying .NET runtime, and most other programming platforms use to store and manipulate floating point values. This is because support for IEEE 754 is typically implemented directly at the CPU/chipset level, making it both far more performant than an alternative implemented solely in software and far easier when building compilers, because the operations map almost directly to specific CPU instructions.
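To see what is actually stored "between the lines" (a sketch, not part of the original answer; BitConverter.DoubleToInt64Bits simply exposes the raw IEEE 754 bit pattern of a double), you can print the bits of each value:

using System;

class InspectBits
{
    static void Main()
    {
        double oneOverNine = 1d / 9d;
        double resultTimesNine = oneOverNine * 9d;

        // Raw 64-bit IEEE 754 patterns, shown as hexadecimal.
        Console.WriteLine(BitConverter.DoubleToInt64Bits(oneOverNine).ToString("X16"));
        Console.WriteLine(BitConverter.DoubleToInt64Bits(resultTimesNine).ToString("X16"));
        Console.WriteLine(BitConverter.DoubleToInt64Bits(1d).ToString("X16"));

        // When the multiplication happens to round back onto 1.0 exactly,
        // the last two patterns match and the == test is true.
        Console.WriteLine(resultTimesNine == 1d);
    }
}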
This is a question concerning cross-platform consistency and determinism of floating point operations (i.e., whether they yield different results on different CPUs/systems).
Which one is more likely to stay cross-platform consistent(pseudo code):
float myFloat = float ( myInteger) / float( 1024 )
or
float myFloat = float ( myInteger ) / float( 1000 )
Platforms are C# and AS3.
AS3 versions:
var myFloat:Number = myInteger / 1000 // AS3
var myFloat:Number = myInteger / 1024 // AS3
OK, I've added the AS3 versions for clarification, which are equivalent to the pseudo code above. As you can see, in AS3 all calculations, even on integers, are performed as floats automatically; a cast is not required (nor can you avoid it or force the runtime to perform true integer division).
Hopefully this explains why I'm 'casting' everything into floats: I am not! That is simply what happens in one of the target languages!
The first one is likely the same on both platforms, since there are no representation issues. In particular for small integers (highest 8 bits unused) there is one exact result, and it's very likely that this result will be used.
But I wouldn't rely on it. If you need guaranteed determinism, I recommend implementing the required arithmetic yourself on top of plain integers. For example using a fixed point representation.
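A minimal fixed-point sketch (an illustration of the suggestion above, with hypothetical variable names, not code from either platform): keep values as scaled integers so the arithmetic is exact integer math everywhere, and convert to float only at the edges.

// Values are stored as integer counts of 1/1024ths (a Q-format fixed point),
// so all arithmetic is deterministic integer math on every platform.
int myInteger = 300;                 // hypothetical input
int fixedValue = myInteger * 1024;   // scale into fixed point
int halved = fixedValue / 2;         // exact, platform-independent arithmetic
float forDisplay = halved / 1024f;   // convert to float only for display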
The second one is likely to be inconsistent, even when using the same C# code on different hardware or .net versions. See the related question Is floating-point math consistent in C#? Can it be?
I suggest you read the IEEE 754-1985 standard. A copy can be purchased for $43. Although superseded by the 2008 version, it is an excellent introduction to floating-point because it is only 20 pages and is quite readable. It will show you why both dividing by 1000 and by 1024 are deterministic and why the former may have error but the latter does not (except in cases of underflow). It will also give you a basis for understanding the answers you have been given and why you are on the wrong track.
Which one is more likely to stay cross-platform consistent(pseudo code):
Dividing by 1024.
Every binary-based floating point system (IEEE 754, IBM, VAX, Cray) that applies a division by 1024 to a finite number will yield an exact result in the given representation. The reason is that dividing by 1024 is equivalent to shifting the value 10 binary places to the right, which for a floating point number simply means decreasing the binary exponent by 10; the mantissa bits are left untouched.
If the number is too small (below roughly 1E-38 for single or 1E-308 for double in IEEE 754), you will lose an exact result, but this is not a problem of the operation itself; it is a consequence of the limited range of the format, which simply cannot represent such small results accurately.
As no rounding is necessary, there can be no difference due to rounding (and yes, while most programming languages use round to even, some enable choosing another rounding mode).
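A quick way to convince yourself of this (a sketch; it inspects the raw bits via BitConverter and assumes the value stays inside the normal range) is to compare the bit patterns before and after the division; only the exponent field changes:

using System;

class ExactDivision
{
    static void Main()
    {
        float a = 12345f;       // hypothetical input, well inside the normal range
        float b = a / 1024f;

        int bitsA = BitConverter.ToInt32(BitConverter.GetBytes(a), 0);
        int bitsB = BitConverter.ToInt32(BitConverter.GetBytes(b), 0);

        // The exponent fields differ by exactly 10...
        Console.WriteLine(((bitsA >> 23) & 0xFF) - ((bitsB >> 23) & 0xFF)); // 10
        // ...and the mantissa bits are untouched, so no rounding occurred.
        Console.WriteLine((bitsA & 0x7FFFFF) == (bitsB & 0x7FFFFF));        // True
    }
}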
What is the maximum double value that can be represented\converted to a decimal?
How can this value be derived - example please.
Update
Given a maximum value for a double that can be converted to a decimal, I would expect to be able to round-trip the double to a decimal, and then back again. However, given a figure such as (2^52)-1 as in @Jirka's answer, this does not work. For example:
[Test]
public void round_trip_double_to_decimal()
{
double maxDecimalAsDouble = (Math.Pow(2, 52) - 1);
decimal toDecimal = Convert.ToDecimal(maxDecimalAsDouble);
double toDouble = Convert.ToDouble(toDecimal);
//Fails.
Assert.That(toDouble, Is.EqualTo(maxDecimalAsDouble));
}
All integers between -9,007,199,254,740,992 and 9,007,199,254,740,991 can be exactly represented in a double. (Keep reading, though.)
The upper bound is derived as 2^53 - 1. The internal representation of it is something like (0x1.fffffffffffff * 2^52) if you pardon my hexadecimal syntax.
Outside of this range, many integers can be still exactly represented if they are a multiple of a power of two.
The highest integer whatsoever that can be accurately represented would therefore be 9,007,199,254,740,991 * (2 ^ 971), which is even higher than Decimal.MaxValue, but this is a pretty meaningless fact, given that the value does not bother to change, for example, when you subtract 1 in double arithmetic.
Based on the comments and further research, I am adding info on .NET and Mono implementations of C# that relativizes most conclusions you and I might want to make.
Math.Pow does not seem to guarantee any particular accuracy and it seems to deliver a bit or two fewer than what a double can represent. This is not too surprising with a floating point function. The Intel floating point hardware does not have an instruction for exponentiation and I expect that the computation involves logarithm and multiplication instructions, where intermediate results lose some precision. One would use BigInteger.Pow if integral accuracy was desired.
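For instance (a trivial sketch), the exact bound can be computed without any floating point involvement at all:

using System;
using System.Numerics;

class ExactPow
{
    static void Main()
    {
        // Exact integral exponentiation; no floating point rounding anywhere.
        BigInteger exact = BigInteger.Pow(2, 53) - 1;
        Console.WriteLine(exact); // 9007199254740991
    }
}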
However, even (decimal)(double)9007199254740991M results in a round trip violation. This time it is, however, a known bug, a direct violation of Section 6.2.1 of the C# spec. Interestingly I see the same bug even in Mono 2.8. (The referenced source shows that this conversion bug can hit even with much lower values.)
Double literals are less rounded, but still a little: 9007199254740991D prints out as 9007199254740990D. This is an artifact of internal multiplication by 10 when parsing the string literal (before the upper and lower bound converge to the same double value based on the "first zero after the decimal point"). This again violates the C# spec, this time Section 9.4.4.3.
Unlike C, C# has no hexadecimal floating point literals, so we cannot avoid that multiplication by 10 by any other syntax, except perhaps by going through Decimal or BigInteger, if these only provided accurate conversion operators. I have not tested BigInteger.
The above could almost make you wonder whether C# invents its own unique floating point format with reduced precision. It does not; Section 11.1.6 references the 64-bit IEC 60559 representation. So the above are indeed bugs.
So, to conclude, you should be able to fit even 9007199254740991M in a double precisely, but it's quite a challenge to get the value in place!
The moral of the story is that the traditional belief that "Arithmetic should be barely more precise than the data and the desired result" is wrong, as this famous article demonstrates (page 36), albeit in the context of a different programming language.
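As an illustration of getting the value in place without going through a decimal or string literal (a sketch, not from the referenced article; the constant 0x433FFFFFFFFFFFFF is simply the IEEE 754 binary64 bit pattern of 2^53 - 1):

using System;

class ExactDouble
{
    static void Main()
    {
        // Converting from long is exact for any integer with magnitude up to 2^53.
        double fromLong = (double)9007199254740991L;

        // Building the bit pattern directly: sign 0, biased exponent 0x433
        // (unbiased 52), all 52 mantissa bits set.
        double fromBits = BitConverter.Int64BitsToDouble(0x433FFFFFFFFFFFFF);

        Console.WriteLine(fromLong == fromBits); // True
        Console.WriteLine((long)fromBits);       // 9007199254740991
    }
}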
Don't store integers in floating point variables unless you have to.
MSDN Double data type
Decimal vs double
The value of Decimal.MaxValue is positive 79,228,162,514,264,337,593,543,950,335.
A number can have multiple representations if we use a float, so the results of a division of floats may produce bitwise different floats. But what if the denominator is a power of 2?
AFAIK, dividing by a power of 2 would only shift the exponent, leaving the same mantissa, always producing bitwise identical floats. Is that right?
float a = xxx;
float result = a / 1024f; // always the same result?
UPDATE
Sorry for my lack of knowledge in the IEEE black magic for floating points :) , but I'm talking about those numbers Guvante mentioned: no representation for certain decimal numbers, 'inaccurate' floats. For the rest of this post I'll use 'accurate' and 'inaccurate' considering Guvante's definition of these words.
To simplify, let's say the numerator is always an 'accurate' number. Also, let's divide not by any power of 2, but always for 1024. Additionally, I'm doing the operation the same way every time (same method), so I'm talking about getting the same results in different executions (for the same inputs, sure).
I'm asking all this because I see different numbers coming from the same inputs, so I thought: well if I only use 'accurate' floats as numerators and divide by 1024 I will only shift the exponent, still having an 'accurate' float.
You asked for an example. The real problem is this: I have a simulator producing sometimes 0.02999994 and sometimes 0.03000000 for the same inputs. I thought I could multiply these numbers by 1024, round to get an 'integer' ('accurate' float) that would be the same for those two numbers, and then divide by 1024 to get an 'accurate' rounded float.
I was told (in my other question) that I could convert to decimal, round and cast to float, but I want to know if this way works.
A number can have multiple representations if we use a float
The question appears to be predicated on an incorrect premise; the only number that has multiple representations as a float is zero, which can be represented as either "positive zero" or "negative zero". Other than zero a given number only has one representation as a float, assuming that you are talking about the "double" or "float" types.
Or perhaps I misunderstand. Is the issue that you are referring to the fact that the compiler is permitted to do floating point operations in higher precision than the 32 or 64 bits available for storage? That can cause divisions and multiplications to produce different results in some cases.
Since people often don't fully grasp floating point numbers I will go over some of your points real quick. Each particular combination of bits in a floating point number represent a unique number. However because that number has a base 2 fractional component, there is no representation for certain decimal numbers. For instance 1.1. In those cases you take the closest number. IEEE 754-2008 specifies round to nearest, ties to even in these cases.
The real difficulty is when you combine two of these 'inaccurate' numbers. This can introduce problems as each intermediate step will involve rounding. If you calculate the same value using two different methods, you could come up with subtly different values. Typically this is handled with an epsilon when you want equality.
Now, onto your real question: can you divide by a power of two and avoid introducing any additional 'inaccuracies'? Normally you can; however, as with all floating point numbers, denormals and other odd cases have their own logic, and obviously if your exponent underflows into that range you will have difficulty. And again, note that no mathematical errors are introduced during any of this; it is simply math being done with limited precision, which involves intermediate rounding of results.
EDIT: In response to new question
What you are saying could work, but is pretty much equivalent to rounding. Additionally, if you are just looking for equality, you should use an epsilon as I mentioned earlier, Math.Abs(a - b) < e for some small value e (0.0001 would work in your example). If you are looking to print out a pretty number, and the framework you are using isn't doing it to your liking, some rounding would be the most direct way of describing your solution, which is always a plus.
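For completeness, a sketch of the multiply-round-divide idea from the question (hypothetical helper name; as noted above, this is essentially just rounding to the nearest 1/1024):

using System;

class Quantize
{
    // Snap a float to the nearest multiple of 1/1024. The final division by
    // 1024 is exact; only the Math.Round step discards information.
    static float SnapTo1024ths(float value)
    {
        return (float)Math.Round(value * 1024f) / 1024f;
    }

    static void Main()
    {
        // Two results that differ only by accumulated rounding error
        // collapse to the same quantized value.
        Console.WriteLine(SnapTo1024ths(0.02999994f) == SnapTo1024ths(0.03000000f)); // True
    }
}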