Bitwise representation of division of floats - how division of floats works

Bitwise representation of division of floats - how division of floats works - c#

A number can have multiple representations if we use a float, so the results of a division of floats may produce bitwise different floats. But what if the denominator is a power of 2?
AFAIK, dividing by a power of 2 would only shift the exponent, leaving the same mantissa, always producing bitwise identical floats. Is that right?
float a = xxx;
float result = n/1024f; // always the same result?
--- UPDATE ----------------------
Sorry for my lack of knowledge in the IEEE black magic for floating points :) , but I'm talking about those numbers Guvante mentioned: no representation for certain decimal numbers, 'inaccurate' floats. For the rest of this post I'll use 'accurate' and 'inaccurate' considering Guvante's definition of these words.
To simplify, let's say the numerator is always an 'accurate' number. Also, let's divide not by any power of 2, but always for 1024. Additionally, I'm doing the operation the same way every time (same method), so I'm talking about getting the same results in different executions (for the same inputs, sure).
I'm asking all this because I see different numbers coming from the same inputs, so I thought: well if I only use 'accurate' floats as numerators and divide by 1024 I will only shift the exponent, still having an 'accurate' float.
You asked for an example. The real problem is this: I have a simulator producing sometimes 0.02999994 and sometimes 0.03000000 for the same inputs. I thought I could multiply these numbers by 1024, round to get an 'integer' ('accurate' float) that would be the same for those two numbers, and then divide by 1024 to get an 'accurate' rounded float.
I was told (in my other question) that I could convert to decimal, round and cast to float, but I want to know if this way works.

A number can have multiple representations if we use a float
The question appears to be predicated on an incorrect premise; the only number that has multiple representations as a float is zero, which can be represented as either "positive zero" or "negative zero". Other than zero a given number only has one representation as a float, assuming that you are talking about the "double" or "float" types.
Or perhaps I misunderstand. Is the issue that you are referring to the fact that the compiler is permitted to do floating point operations in higher precision than the 32 or 64 bits available for storage? That can cause divisions and multiplications to produce different results in some cases.

Since people often don't fully grasp floating point numbers I will go over some of your points real quick. Each particular combination of bits in a floating point number represent a unique number. However because that number has a base 2 fractional component, there is no representation for certain decimal numbers. For instance 1.1. In those cases you take the closest number. IEEE 754-2008 specifies round to nearest, ties to even in these cases.
The real difficulty is when you combine two of these 'inaccurate' numbers. This can introduce problems as each intermediate step will involve rounding. If you calculate the same value using two different methods, you could come up with subtly different values. Typically this is handled with an epsilon when you want equality.
Now onto your real question, can you divide by a power of two and avoid introducing any additional 'inaccuracies'? Normally you can, however as with all floating point numbers, denormals and other odd cases have their own logic, and obviously if your mantissa overflows you will have difficulty. And again note, that no mathematical errors are introduced during any of this, it is simply math being done with limited percision, which involves intermittent rounding of results.
EDIT: In response to new question
What you are saying could work, but is pretty much equivalent to rounding. Additionally if you are just looking for equality, you should use an episilon as I mentioned earlier (a - b) < e for some small value e (0.0001 would work in your example). If you are looking to print out a pretty number, and the framework you are using isn't doing it to your liking, some rounding would be the most direct way of describing your solution, which is always a plus.

Related

double type Multiplication in C# giving me wrong values [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 7 years ago.
If I execute the following expression in C#:
double i = 10*0.69;
i is: 6.8999999999999995. Why?
I understand numbers such as 1/3 can be hard to represent in binary as it has infinite recurring decimal places but this is not the case for 0.69. And 0.69 can easily be represented in binary, one binary number for 69 and another to denote the position of the decimal place.
How do I work around this? Use the decimal type?

Because you've misunderstood floating point arithmetic and how data is stored.
In fact, your code isn't actually performing any arithmetic at execution time in this particular case - the compiler will have done it, then saved a constant in the generated executable. However, it can't store an exact value of 6.9, because that value cannot be precisely represented in floating point point format, just like 1/3 can't be precisely stored in a finite decimal representation.
See if this article helps you.

why doesn't the framework work around this and hide this problem from me and give me the
right answer,0.69!!!
Stop behaving like a dilbert manager, and accept that computers, though cool and awesome, have limits. In your specific case, it doesn't just "hide" the problem, because you have specifically told it not to. The language (the computer) provides alternatives to the format, that you didn't choose. You chose double, which has certain advantages over decimal, and certain downsides. Now, knowing the answer, you're upset that the downsides don't magically disappear.
As a programmer, you are responsible for hiding this downside from managers, and there are many ways to do that. However, the makers of C# have a responsibility to make floating point work correctly, and correct floating point will occasionally result in incorrect math.
So will every other number storage method, as we do not have infinite bits. Our job as programmers is to work with limited resources to make cool things happen. They got you 90% of the way there, just get the torch home.

And 0.69 can easily be represented in
binary, one binary number for 69 and
another to denote the position of the
decimal place.
I think this is a common mistake - you're thinking of floating point numbers as if they are base-10 (i.e decimal - hence my emphasis).
So - you're thinking that there are two whole-number parts to this double: 69 and divide by 100 to get the decimal place to move - which could also be expressed as:
69 x 10 to the power of -2.
However floats store the 'position of the point' as base-2.
Your float actually gets stored as:
68999999999999995 x 2 to the power of some big negative number
This isn't as much of a problem once you're used to it - most people know and expect that 1/3 can't be expressed accurately as a decimal or percentage. It's just that the fractions that can't be expressed in base-2 are different.

but why doesn't the framework work around this and hide this problem from me and give me the right answer,0.69!!!
Because you told it to use binary floating point, and the solution is to use decimal floating point, so you are suggesting that the framework should disregard the type you specified and use decimal instead, which is very much slower because it is not directly implemented in hardware.
A more efficient solution is to not output the full value of the representation and explicitly specify the accuracy required by your output. If you format the output to two decimal places, you will see the result you expect. However if this is a financial application decimal is precisely what you should use - you've seen Superman III (and Office Space) haven't you ;)
Note that it is all a finite approximation of an infinite range, it is merely that decimal and double use a different set of approximations. The advantage of decimal is it produces the same approximations that you would if you were performing the calculation yourself. For example if you calculated 1/3, you would eventually stop writing 3's when it was 'good enough'.

For the same reason that 1 / 3 in a decimal systems comes out as 0.3333333333333333333333333333333333333333333 and not the exact fraction, which is infinitely long.

To work around it (e.g. to display on screen) try this:
double i = (double) Decimal.Multiply(10, (Decimal) 0.69);
Everyone seems to have answered your first question, but ignored the second part.

Inaccuracy of decimal in .NET

Yesterday during debugging something strange happened to me and I can't really explain it:
So maybe I am not seeing the obvious here or I misunderstood something about decimals in .NET but shouldn't the results be the same?

decimal is not a magical do all the maths for me type. It's still a floating point number - the main difference from float is that it's a decimal floating point number, rather than binary. So you can easily represent 0.3 as a decimal (it's impossible as a finite binary number), but you don't have infinite precision.
This makes it work much closer to a human doing the same calculations, but you still have to imagine someone doing each operation individually. It's specifically designed for financial calculations, where you don't do the kind of thing you do in Maths - you simply go step by step, rounding each result according to pretty specific rules.
In fact, for many cases, decimal might work much worse than float (or better, double). This is because decimal doesn't do any automatic rounding at all. Doing the same with double gives you 22 as expected, because it's automatically assumed that the difference doesn't matter - in decimal, it does - that's one of the important points about decimal. You can emulate this by inserting manual Math.Rounds, of course, but it doesn't make much sense.

Decimal can only store exactly values that are exactly representable in decimal within its precision limit. Here 22/24 = 0.91666666666666666666666... which needs infinite precision or a rational type to store, and it does not equal to 22/24 after rounding anymore.
If you do the multiplication first then all the values are exactly representable, hence the result you see.

By adding brackets you are making sure that the division is calculated before the multiplication. This subtlely looks to be enough to affect the calculation enough to introduce a floating precision issue.
Since computers can't actually produce every possible number, you should make sure you factor this into your calculations

While Decimal has a higher precision than Double, its primary useful feature is that every value precisely matches its human-readable representation. While the fixed-decimal types which are available in some languages can guarantee that neither addition or subtraction of two matching-precision fixed-point values, nor multiplication of a fixed-point type by an integer, will ever cause rounding error, and while "big-decimal" types such as those found in Java can guarantee that no multiplication will ever cause rounding errors, floating-point Decimal types like the one found in .NET offers no such guarantees, and no decimal types can guarantee that division operations can be completed without rounding errors (Java's has the option to throw an exception in case rounding would be necessary).
While those deciding to make Decimal be a floating-point type may have intended that it be usable either for situations requiring more digits to the right of the decimal point or more to the left, floating-point types, whether base-10 or base-2, make rounding issues unavoidable for all operations.

Convert.ToDecimal(double x) - System.OverflowException

What is the maximum double value that can be represented\converted to a decimal?
How can this value be derived - example please.
Update
Given a maximum value for a double that can be converted to a decimal, I would expect to be able to round-trip the double to a decimal, and then back again. However, given a figure such as (2^52)-1 as in #Jirka's answer, this does not work. For example:
Test]
public void round_trip_double_to_decimal()
{
double maxDecimalAsDouble = (Math.Pow(2, 52) - 1);
decimal toDecimal = Convert.ToDecimal(maxDecimalAsDouble);
double toDouble = Convert.ToDouble(toDecimal);
//Fails.
Assert.That(toDouble, Is.EqualTo(maxDecimalAsDouble));
}

All integers between -9,007,199,254,740,992 and 9,007,199,254,740,991 can be exactly represented in a double. (Keep reading, though.)
The upper bound is derived as 2^53 - 1. The internal representation of it is something like (0x1.fffffffffffff * 2^52) if you pardon my hexadecimal syntax.
Outside of this range, many integers can be still exactly represented if they are a multiple of a power of two.
The highest integer whatsoever that can be accurately represented would therefore be 9,007,199,254,740,991 * (2 ^ 1023), which is even higher than Decimal.MaxValue but this is a pretty meaningless fact, given that the value does not bother to change, for example, when you subtract 1 in double arithmetic.
Based on the comments and further research, I am adding info on .NET and Mono implementations of C# that relativizes most conclusions you and I might want to make.
Math.Pow does not seem to guarantee any particular accuracy and it seems to deliver a bit or two fewer than what a double can represent. This is not too surprising with a floating point function. The Intel floating point hardware does not have an instruction for exponentiation and I expect that the computation involves logarithm and multiplication instructions, where intermediate results lose some precision. One would use BigInteger.Pow if integral accuracy was desired.
However, even (decimal)(double)9007199254740991M results in a round trip violation. This time it is, however, a known bug, a direct violation of Section 6.2.1 of the C# spec. Interestingly I see the same bug even in Mono 2.8. (The referenced source shows that this conversion bug can hit even with much lower values.)
Double literals are less rounded, but still a little: 9007199254740991D prints out as 9007199254740990D. This is an artifact of internal multiplication by 10 when parsing the string literal (before the upper and lower bound converge to the same double value based on the "first zero after the decimal point"). This again violates the C# spec, this time Section 9.4.4.3.
Unlike C, C# has no hexadecimal floating point literals, so we cannot avoid that multiplication by 10 by any other syntax, except perhaps by going through Decimal or BigInteger, if these only provided accurate conversion operators. I have not tested BigInteger.
The above could almost make you wonder whether C# does not invent its own unique floating point format with reduced precision. No, Section 11.1.6 references 64bit IEC 60559 representation. So the above are indeed bugs.
So, to conclude, you should be able to fit even 9007199254740991M in a double precisely, but it's quite a challenge to get the value in place!
The moral of the story is that the traditional belief that "Arithmetic should be barely more precise than the data and the desired result" is wrong, as this famous article demonstrates (page 36), albeit in the context of a different programming language.
Don't store integers in floating point variables unless you have to.

MSDN Double data type
Decimal vs double
The value of Decimal.MaxValue is positive 79,228,162,514,264,337,593,543,950,335.

How does decimal work?

I looked at decimal in C# but I wasnt 100% sure what it did.
Is it lossy? in C# writing 1.0000000000001f+1.0000000000001f results in 2 when using float (double gets you 2.0000000000002 which is correct) is it possible to add two things with decimal and not get the correct answer?
How many decimal places can I use? I see the MaxValue is 79228162514264337593543950335 but if i subtract 1 how many decimal places can I use?
Are there quirks I should know of? In C# its 128bits, in other language how many bits is it and will it work the same way as C# decimal does? (when adding, dividing, multiplication)

What you're showing isn't decimal - it's float. They're very different types. f is the suffix for float, aka System.Single. m is the suffix for decimal, aka System.Decimal. It's not clear from your question whether you thought this was actually using decimal, or whether you were just using float to demonstrate your fears.
If you use 1.0000000000001m + 1.0000000000001m you'll get exactly the right value. Note that the double version wasn't able to express either of the individual values exactly, by the way.
I have articles on both kinds of floating point in .NET, and you should read them thoroughly, along other resources:
Binary floating point (float/double)
Decimal floating point (decimal)
All floating point types have their limits of course, but in particular you should not expect binary floating point to accurately represent decimal values such as 0.1. It still can't represent anything that isn't exactly representable in 28/29 decimal digits though - so if you divide 1 by 3, you won't get the exact answer of course.
You should also note that the range of decimal is considerably smaller than that of double. So while it can have 28-29 decimal digits of precision, you can't represent truly huge numbers (e.g. 10200) or miniscule numbers (e.g. 10-200).

Decimals in programming are (almost) never 100% accurate. Sometimes it's even better to multiply the decimal value with a very high number and then calculate, but that's only if you're for example sure that the value is always between 0 and 100(so it won't get out of range of the maxvalue)

Floting point is inherently imprecise. Some numbers can't be represented faithfully. Decimal is a large floating point with high precision. If you look on the page at msdn you can see there are "28-29 significant digits." The .net framework classes are language agnostic. they will work the same in every language that uses .net.
edit (in response to Jon Skeet): If you initialize the Decimal class with the numbers above, which are less than 28 digits each after the decimal point, the number will be stored faithfully as long as the binary representation is exact. Since it works in 64-bit format, I assume the 128-bit will handle it perfectly fine. Some numbers, such as 0.1, will never be exactly representable because they are a repeating sequence in binary.

Is a double really unsuitable for money?

I always tell in c# a variable of type double is not suitable for money. All weird things could happen. But I can't seem to create an example to demonstrate some of these issues. Can anyone provide such an example?
(edit; this post was originally tagged C#; some replies refer to specific details of decimal, which therefore means System.Decimal).
(edit 2: I was specific asking for some c# code, so I don't think this is language agnostic only)

Very, very unsuitable. Use decimal.
double x = 3.65, y = 0.05, z = 3.7;
Console.WriteLine((x + y) == z); // false
(example from Jon's page here - recommended reading ;-p)

You will get odd errors effectively caused by rounding. In addition, comparisons with exact values are extremely tricky - you usually need to apply some sort of epsilon to check for the actual value being "near" a particular one.
Here's a concrete example:
using System;
class Test
{
static void Main()
{
double x = 0.1;
double y = x + x + x;
Console.WriteLine(y == 0.3); // Prints False
}
}

Yes it's unsuitable.
If I remember correctly double has about 17 significant numbers, so normally rounding errors will take place far behind the decimal point. Most financial software uses 4 decimals behind the decimal point, that leaves 13 decimals to work with so the maximum number you can work with for single operations is still very much higher than the USA national debt. But rounding errors will add up over time. If your software runs for a long time you'll eventually start losing cents. Certain operations will make this worse. For example adding large amounts to small amounts will cause a significant loss of precision.
You need fixed point datatypes for money operations, most people don't mind if you lose a cent here and there but accountants aren't like most people..
edit
According to this site http://msdn.microsoft.com/en-us/library/678hzkk9.aspx Doubles actually have 15 to 16 significant digits instead of 17.
#Jon Skeet decimal is more suitable than double because of its higher precision, 28 or 29 significant decimals. That means less chance of accumulated rounding errors becoming significant. Fixed point datatypes (ie integers that represent cents or 100th of a cent like I've seen used) like Boojum mentions are actually better suited.

Since decimal uses a scaling factor of multiples of 10, numbers like 0.1 can be represented exactly. In essence, the decimal type represents this as 1 / 10 ^ 1, whereas a double would represent this as 104857 / 2 ^ 20 (in reality it would be more like really-big-number / 2 ^ 1023).
A decimal can exactly represent any base 10 value with up to 28/29 significant digits (like 0.1). A double can't.

My understanding is that most financial systems express currency using integers -- i.e., counting everything in cents.
IEEE double precision actually can represent all integers exactly in the range -2^53 through +2^53. (Hacker's Delight, pg. 262) If you use only addition, subtraction and multiplication, and keep everything to integers within this range then you should see no loss of precision. I'd be very wary of division or more complex operations, however.

Using double when you don't know what you are doing is unsuitable.
"double" can represent an amount of a trillion dollars with an error of 1/90th of a cent. So you will get highly precise results. Want to calculate how much it costs to put a man on Mars and get him back alive? double will do just fine.
But with money there are often very specific rules saying that a certain calculation must give a certain result and no other. If you calculate an amount that is very very very close to $98.135 then there will often be a rule that determines whether the result should be $98.14 or $98.13 and you must follow that rule and get the result that is required.
Depending on where you live, using 64 bit integers to represent cents or pennies or kopeks or whatever is the smallest unit in your country will usually work just fine. For example, 64 bit signed integers representing cents can represent values up to 92,223 trillion dollars. 32 bit integers are usually unsuitable.

No a double will always have rounding errors, use "decimal" if you're on .Net...

Actually floating-point double is perfectly well suited to representing amounts of money as long as you pick a suitable unit.
See http://www.idinews.com/moneyRep.html
So is fixed-point long. Either consumes 8 bytes, surely preferable to the 16 consumed by a decimal item.
Whether or not something works (i.e. yields the expected and correct result) is not a matter of either voting or individual preference. A technique either works or it doesn't.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.