According to http://msdn.microsoft.com/en-us/library/system.dividebyzeroexception.aspx
only integer and decimal division will throw a DivideByZeroException when you divide by 0; when you divide a floating point value by 0, the result is infinity, negative infinity, or NaN. Why is this? And what are some examples where the result is positive infinity, negative infinity, or NaN?
The IEEE standards committee felt that exception handling was more trouble than it was worth, for the range of code that could encounter these kinds of issues with floating-point math:
Traps can be used to stop a program, but unrecoverable situations are extremely rare.
[...]
Flags offer both predictable control flow and speed. Their use requires the programmer be aware of exceptional conditions, but flag stickiness allows programmers to delay handling exceptional conditions until necessary.
This may seem strange to a developer accustomed to a language in which exception handling is deeply baked, like C#. The developers of the IEEE 754 standard are thinking about a broader range of implementations (embedded systems, for instance), where such facilities aren't available, or aren't desirable.
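For concrete examples (my own illustration, not taken from the linked page), the floating point cases look like this in C#:

double posInf = 1.0 / 0.0;    // evaluates to double.PositiveInfinity
double negInf = -1.0 / 0.0;   // evaluates to double.NegativeInfinity
double nan    = 0.0 / 0.0;    // evaluates to double.NaN (zero over zero has no meaningful value)

int zero = 0;
// integer division, by contrast, throws at runtime:
// int boom = 1 / zero;       // DivideByZeroException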
Michael's answer is of course correct. Here's another way to look at it.
Integers are exact. When you divide seven by three in integers you are actually asking the question "how many times can I subtract three from seven before I'd have to go into negative numbers?". Division by zero is undefined because there is no number of times you can subtract zero from seven to get something negative.
Floats are by their nature inexact. They have a certain amount of precision and you are best to assume that the "real" quantity is somewhere between the given float and a float near it. Moreover, floats usually represent physical quantities, and those have measurement error far larger than the representation error. I think of a float as a fuzzy smeared-out region surrounding a point.
So when you divide seven by zero in floats, think of it as dividing some number reasonably close to seven by some number reasonably close to zero. Clearly a number reasonably close to zero can make the quotient arbitrarily large! And therefore this is signaled to you by giving infinity as the answer; this means that the answer could be arbitrarily large, depending on where the true value actually lies.
The floating point engine built into the processor is quite capable of generating exceptions for float division by zero. Windows has a dedicated exception code for it, STATUS_FLOAT_DIVIDE_BY_ZERO, exception code 0xC000008E, "Floating-point division by zero". As well as other mishaps that the FPU can report, like overflow, underflow, denormal operands and inexact results.
Whether it does this is determined by the control register; programs can alter that register with a helper function like _controlfp(). Libraries created with Borland tools routinely did this, for example, unmasking these exceptions.
This has not worked out well, to put it mildly. It is the worst possible global variable you can imagine. Mixing such libraries with other ones that expect a division by zero to generate infinity instead of an exception just does not work and is next-to-impossible to deal with.
Accordingly, it is now the norm for language runtimes to mask all floating point exceptions. The CLR insists on this as well.
Dealing with a library that unmasks the exceptions is tricky, but has a silly workaround. You can throw an exception and catch it again. The exception handling code inside the CLR resets the control register. An example of this is shown in this answer.
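For reference, the workaround described above boils down to something like this (a sketch of the trick, not a guaranteed API contract):

try
{
    // throwing and catching any managed exception makes the CLR's exception
    // machinery reset the FPU control word, re-masking the exceptions that a
    // native library (e.g. one built with Borland tools) may have unmasked
    throw new Exception("Reset FPU control word");
}
catch (Exception)
{
    // intentionally swallowed; we only wanted the side effect
}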
Related
Yesterday during debugging something strange happened to me and I can't really explain it:
So maybe I am not seeing the obvious here or I misunderstood something about decimals in .NET but shouldn't the results be the same?
decimal is not a magical do-all-the-maths-for-me type. It's still a floating point number - the main difference from float is that it's a decimal floating point number rather than a binary one. So you can easily represent 0.3 as a decimal (which is impossible as a finite binary fraction), but you don't have infinite precision.
This makes it work much closer to a human doing the same calculations, but you still have to imagine someone doing each operation individually. It's specifically designed for financial calculations, where you don't do the kind of thing you do in Maths - you simply go step by step, rounding each result according to pretty specific rules.
In fact, for many cases, decimal might work much worse than float (or rather, double). This is because decimal doesn't do any automatic rounding at all. Doing the same with double gives you 22 as expected, because it's automatically assumed that the difference doesn't matter - in decimal, it does - that's one of the important points about decimal. You can emulate this by inserting manual Math.Round calls, of course, but it doesn't make much sense.
Decimal can only store values that are exactly representable in decimal within its precision limit. Here 22/24 = 0.91666666666666666666666..., which would need infinite precision (or a rational type) to store exactly; after rounding, the stored value no longer equals 22/24.
If you do the multiplication first then all the values are exactly representable, hence the result you see.
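A small experiment along those lines (assuming the elided example divided 22 by 24 and multiplied back, as the 22/24 above suggests):

decimal viaDecimal = 22m / 24m * 24m;   // the division is rounded first, so the product
                                        // comes out slightly different from 22m
double  viaDouble  = 22d / 24d * 24d;   // the binary rounding happens to cancel out here,
                                        // giving exactly 22 as the answer above notes
decimal reordered  = 22m * 24m / 24m;   // multiply first: every intermediate value stays
                                        // exactly representable, so this is exactly 22m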
By adding brackets you are making sure that the division is calculated before the multiplication. That subtle change is enough to affect the calculation and introduce a floating point precision issue.
Since computers can't represent every possible number exactly, you should make sure you factor this into your calculations.
While Decimal has a higher precision than Double, its primary useful feature is that every value precisely matches its human-readable representation. The fixed-decimal types available in some languages can guarantee that neither addition nor subtraction of two matching-precision fixed-point values, nor multiplication of a fixed-point value by an integer, will ever cause rounding error, and "big-decimal" types such as those found in Java can guarantee that no multiplication will ever cause rounding errors. Floating-point Decimal types like the one found in .NET offer no such guarantees, and no decimal type can guarantee that division operations will complete without rounding errors (Java's has the option to throw an exception in case rounding would be necessary).
While those deciding to make Decimal be a floating-point type may have intended that it be usable either for situations requiring more digits to the right of the decimal point or more to the left, floating-point types, whether base-10 or base-2, make rounding issues unavoidable for all operations.
I am having very different results when comparing, what seems like identical floating point comparison statements. What is bizarre, both statements are true in 64-bit, and only in 32-bit are the results not equal.
Even if I explicitly cast the '134' and/or 'initial' to an Int32, the result in 32-bit is still the same.
int initial = 134;
float initialConverted = initial/255.0f;
// both are true in 64, abs2 is false in 32
var abs = Math.Abs(initialConverted - (134/255.0f)) < float.Epsilon;
var abs2 = Math.Abs(initialConverted - (initial/255.0f)) < float.Epsilon;
Why is there a problem with division when the integer value is stored in its own field?
This is just a variant of the normal floating point comparison and accuracy problems.
Floating point calculations are slightly different in 32-bit and 64-bit and slightly different between DEBUG and RELEASE builds. Most likely in one setting it evaluates to 0, in another to something equal to or slightly larger than float.Epsilon.
I would not use float.Epsilon; it is far too small to handle normal inaccuracies. Instead you need to decide on an epsilon value yourself that would be "close enough".
float.Epsilon is the same as Single.Epsilon, which is documented as:
Represents the smallest positive Single value that is greater than zero. This field is constant.
In other words, this is just the smallest positive number representable in the Single data type. It is not usable for handling inaccuracies in calculations; you need something larger for that. Single.Epsilon is somewhere in the vicinity of 1.4E-45, which doesn't allow for any inaccuracy at all.
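As an illustration (the tolerance value here is only a placeholder; pick whatever counts as "close enough" for your data):

static bool NearlyEqual(float a, float b, float tolerance = 1e-5f)
{
    // compare against a tolerance you chose, rather than the tiny float.Epsilon
    return Math.Abs(a - b) < tolerance;
}

// usage, with the variables from the question
var abs  = NearlyEqual(initialConverted, 134 / 255.0f);
var abs2 = NearlyEqual(initialConverted, initial / 255.0f);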
There are a couple of things going on here.
Firstly, C# uses a different definition of epsilon than other languages. float.Epsilon is the next largest float after 0. Due to the scaling property of floating point numbers, this is very small (about 1.4e-45, assuming the standard IEEE 754 binary32 format). In most other languages (such as FLT_EPSILON in C), epsilon refers to the difference between 1 and the next largest float (1.1920929e-7f in binary32 format).
What this means is that the threshold you're using is very tight, too tight to allow for the usual floating point rounding error.
The reason for the difference between architectures is due to differences in handling intermediate precision. On a modern CPU, there are two sets of instructions for handling floating point numbers:
x87 instructions: these date back to the original 8086 processors (or, more specifically, the 8087 coprocessors that accompanied them). They internally utilise a higher precision than the format, namely an 80-bit format (compared with typical 32-bit floats and 64-bit doubles). However at certain steps, operations will need to be truncated to the destination format. The precise rules for when this occurs depends on the language (see here for C#). This is the reason why your abs2 is false on a 32-bit machine: initialConverted has been rounded to a float, but the second (initial/255.0f) has not (I'm not sure why this doesn't occur in abs, but I guess the compiler optimises away the constant expression (134/255.0f) into a float).
SSE instructions: these were introduced as "fast-but-restrictive" floating point operations for games and multimedia, but now have almost completely supplanted the x87 instructions on modern processors. Unlike x87, there is no extended precision (so a float - float immediately returns a float), they are faster, and offer basic parallelism via SIMD operations. They are almost certainly being used on a 64-bit machine (they are also available on most 32-bit machines from the past decade, but compilers tend not to use them, I guess for compatibility reasons). As there is no extended precision, initialConverted and (initial/255.0f) will both be identical floats, hence abs2 is true.
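One way to make the two settings agree, assuming the x87 explanation above is the culprit, is to force every intermediate back to 32 bits yourself. Per the C# spec, an explicit (float) cast rounds the value to true single precision even when the expression already has static type float. A sketch, reusing the question's variables:

float reference = (float)(initial / 255.0f);                   // the cast forces this intermediate to be rounded to 32 bits
var abs2Fixed = Math.Abs(initialConverted - reference) < 1e-5f;   // tolerance chosen for illustration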
Mathematically, 0.9 recurring can be shown to be equal to 1. This question however, is not about infinity, convergence, or the maths behind this.
The above assumption can be represented using doubles in C# with the following.
var oneOverNine = 1d / 9d;
var resultTimesNine = oneOverNine * 9d;
Using the code above, (resultTimesNine == 1d) evaluates to true.
When using decimals instead, the evaluation yields false, yet, my question is not about the disparate precision of double and decimal.
Since no type has infinite precision, how and why does double maintain such an equality where decimal does not? What is happening literally 'between the lines' of code above, with regards to the manner in which the oneOverNine variable is stored in memory?
It depends on the rounding used to get the closest representable value to 1/9. It could go either way. You can investigate the issue of representability at Rob Kennedy's useful page: http://pages.cs.wisc.edu/~rkennedy/exact-float
But don't think that somehow double is able to achieve exactness. It isn't. If you try with 2/9, 3/9 etc. you will find cases where the rounding goes the other way. The bottom line is that 1/9 is not exactly representable in binary floating point. And so rounding happens and your calculations are subject to rounding errors.
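A quick way to see this for yourself (just an experiment; it only prints what your runtime happens to do):

for (int n = 1; n <= 8; n++)
{
    double quotient = n / 9d;                 // rounded to the nearest double
    bool roundTrips = quotient * 9d == n;     // does the multiplication round back exactly?
    Console.WriteLine($"{n}/9 * 9 == {n}: {roundTrips}");
}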
What is happening literally 'between the lines' of code above, with regards to the manner in which the oneOverNine variable is stored in memory?
What you're asking about is called IEEE 754. This is the spec that C#, its underlying .NET runtime, and most other programming platforms use to store and manipulate non-integer (floating point) values. This is because support for IEEE 754 is typically implemented directly at the CPU/chipset level, making it both far more performant than an alternative implemented solely in software and far easier to target when building compilers, because the operations map almost directly to specific CPU instructions.
A number can have multiple representations if we use a float, so the results of a division of floats may produce bitwise different floats. But what if the denominator is a power of 2?
AFAIK, dividing by a power of 2 would only shift the exponent, leaving the same mantissa, always producing bitwise identical floats. Is that right?
float a = xxx;
float result = a/1024f; // always the same result?
--- UPDATE ----------------------
Sorry for my lack of knowledge in the IEEE black magic for floating points :) , but I'm talking about those numbers Guvante mentioned: no representation for certain decimal numbers, 'inaccurate' floats. For the rest of this post I'll use 'accurate' and 'inaccurate' considering Guvante's definition of these words.
To simplify, let's say the numerator is always an 'accurate' number. Also, let's divide not by any power of 2, but always for 1024. Additionally, I'm doing the operation the same way every time (same method), so I'm talking about getting the same results in different executions (for the same inputs, sure).
I'm asking all this because I see different numbers coming from the same inputs, so I thought: well if I only use 'accurate' floats as numerators and divide by 1024 I will only shift the exponent, still having an 'accurate' float.
You asked for an example. The real problem is this: I have a simulator producing sometimes 0.02999994 and sometimes 0.03000000 for the same inputs. I thought I could multiply these numbers by 1024, round to get an 'integer' ('accurate' float) that would be the same for those two numbers, and then divide by 1024 to get an 'accurate' rounded float.
I was told (in my other question) that I could convert to decimal, round and cast to float, but I want to know if this way works.
A number can have multiple representations if we use a float
The question appears to be predicated on an incorrect premise; the only number that has multiple representations as a float is zero, which can be represented as either "positive zero" or "negative zero". Other than zero a given number only has one representation as a float, assuming that you are talking about the "double" or "float" types.
Or perhaps I misunderstand. Is the issue that you are referring to the fact that the compiler is permitted to do floating point operations in higher precision than the 32 or 64 bits available for storage? That can cause divisions and multiplications to produce different results in some cases.
Since people often don't fully grasp floating point numbers I will go over some of your points real quick. Each particular combination of bits in a floating point number represent a unique number. However because that number has a base 2 fractional component, there is no representation for certain decimal numbers. For instance 1.1. In those cases you take the closest number. IEEE 754-2008 specifies round to nearest, ties to even in these cases.
The real difficulty is when you combine two of these 'inaccurate' numbers. This can introduce problems as each intermediate step will involve rounding. If you calculate the same value using two different methods, you could come up with subtly different values. Typically this is handled with an epsilon when you want equality.
Now onto your real question: can you divide by a power of two and avoid introducing any additional 'inaccuracies'? Normally you can; however, as with all floating point numbers, denormals and other odd cases have their own logic, and if the exponent underflows into the subnormal range you will have difficulty. And again note that no mathematical errors are introduced during any of this; it is simply math being done with limited precision, which involves intermediate rounding of results.
EDIT: In response to new question
What you are saying could work, but is pretty much equivalent to rounding. Additionally, if you are just looking for equality, you should use an epsilon as I mentioned earlier: Math.Abs(a - b) < e for some small value e (0.0001 would work in your example). If you are looking to print out a pretty number, and the framework you are using isn't doing it to your liking, some rounding would be the most direct way of describing your solution, which is always a plus.
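The approach from the update could be sketched like this (SnapTo1024th is my own name, not an existing API):

static float SnapTo1024th(float value)
{
    // scale up, round to the nearest integer, scale back down;
    // for normal (non-subnormal) results the final division by 1024 only adjusts the exponent
    return (float)Math.Round(value * 1024f) / 1024f;
}

// both 0.02999994f and 0.03000000f round to 31/1024, so they snap to the same float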
In this question the topic is how to make VS check for an arithmetic overflow in C# and throw an Exception: C# Overflow not Working? How to enable Overflow Checking?
One of the comments stated something weird and got upvoted much, I hope you can help me out here:
You can also use the checked keyword to wrap a statement or a set of statements so that they are explicitly checked for arithmetic overflow. Setting the project-wide property is a little risky because oftentimes overflow is a fairly reasonable expectation.
I don't know much about hardware but am aware that overflow has to do with the way registers work. I always thought overflow causes undefined behaviour and should be prevented where possible. (in 'normal' projects, not writing malicious code)
Why would you ever expect an overflow to happen and why wouldn't you always prevent it if you have the possibility? (by setting the corresponding compiler option)
The main time when I want overflow is computing hash codes. There, the actual numeric magnitude of the result doesn't matter at all - it's effectively just a bit pattern which I happen to be manipulating with arithmetic operations.
We have checked arithmetic turned on project-wide for Noda Time - I'd rather throw an exception than return incorrect data. I suspect that it's pretty rare for overflows to be desirable... I'll admit I usually leave the default to unchecked arithmetic, just because it's the default. There's the speed penalty as well, of course...
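The pattern Jon is describing usually looks something like this (field1 and field2 are placeholders for whatever fields participate in equality):

public override int GetHashCode()
{
    unchecked   // overflow is expected and harmless here, so opt out of checking explicitly
    {
        int hash = 17;
        hash = hash * 31 + field1.GetHashCode();   // field1, field2: hypothetical members
        hash = hash * 31 + field2.GetHashCode();
        return hash;
    }
}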
I always thought overflow causes undefined behaviour and should be prevented where possible.
You may also be confused about the difference between buffer overflow (overrun) and numeric overflow.
Buffer overflow is when data is written past the end of an unmanaged array. It can cause undefined behavior, doing things like overwriting the return address on the stack with user-entered data. Buffer overflow is difficult to do in managed code.
Numeric overflow, however, is well defined. For example, if you have an 8-bit register, it can only store 2^8 values (0 to 255 if unsigned). So if you add 100+200, you would not get 300, but 300 modulo 256, which is 44. The story is a little more complicated using signed types; the bit pattern is incremented in a similar manner, but they are interpreted as two's complement, so adding two positive numbers can give a negative number.
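In C# terms (note that byte arithmetic is performed as int, so the wraparound only shows up when the result is cast back down):

byte unsignedSum = unchecked((byte)(100 + 200));   // 300 mod 256 == 44
sbyte signedSum  = unchecked((sbyte)(127 + 1));    // wraps to -128 in two's complement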
When doing calculations with constantly incrementing counters. A classic example is Environment.TickCount:
int start = Environment.TickCount;
DoSomething();
int end = Environment.TickCount;
int executionTime = end - start;
If that arithmetic were checked, the program would have a chance of bombing roughly 25 days after Windows was booted, when TickCount ticks past int.MaxValue while DoSomething is running. PerformanceCounter is another example.
These types of calculations produce an accurate result, even though overflow is present. A secondary example is the kind of math you do to generate a representative bit pattern, you're not really interested in an accurate result, just a reproducible one. Examples of those are checksums, hashes and random numbers.
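To see why the subtraction above stays accurate even across the wrap (a small sketch):

int start = int.MaxValue - 5;            // pretend the counter is about to wrap
int end = unchecked(start + 10);         // wraps to a large negative number
int elapsed = unchecked(end - start);    // still 10: the result is exact modulo 2^32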
Angles
Integers that overflow are elegant tools for measuring angles. You have 0 == 0 degrees and 0xFFFFFFFF == 359.999.... degrees. It's very convenient, because as 32-bit integers you can add and subtract angles (350 degrees plus 20 degrees overflows and wraps back around to 10 degrees). You can also decide to treat the same 32-bit integer as signed (-180 to 180 degrees) or unsigned (0 to 360 degrees); 0xFFFFFFFF interpreted as signed sits just below 0 degrees, which is the same angle as 359.999..., so the two views agree. Very elegant.
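A sketch of the binary angle idea above (the names and conversion helpers are mine, not from any library): the full uint range maps onto 0..360 degrees, and ordinary unsigned wraparound gives you the modulo-360 arithmetic for free.

const double unitsPerDegree = 4294967296.0 / 360.0;   // 2^32 counts spread over 360 degrees

uint DegreesToBam(double degrees) => (uint)(degrees * unitsPerDegree);
double BamToDegrees(uint bam) => bam / unitsPerDegree;

// 350 degrees + 20 degrees wraps around to roughly 10 degrees
uint sum = unchecked(DegreesToBam(350) + DegreesToBam(20));
Console.WriteLine(BamToDegrees(sum));   // prints roughly 10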
When generating HashCodes, say from a string of characters.
why wouldn't you always prevent it if you have the possibility?
The reason checked arithmetic is not enabled by default is that checked arithmetic is slower than unchecked arithmetic. If performance isn't an issue for you it would probably make sense to enable checked arithmetic as an overflow occurring is usually an error.
This probably has as much to do with history as with any technical reasons. Integer overflow has very often been used to good effect by algorithms that rely on the behaviour (hashing algorithms in particular).
Also, most CPUs are designed to allow overflow, but set a carry bit in the process, which makes it easier to implement addition over longer-than-natural word-sizes. To implement checked operations in this context would mean adding code to raise an exception if the carry flag is set. Not a huge imposition, but one that the compiler writers probably didn't want to foist upon people without choice.
The alternative would be to check by default, but offer an unchecked option. Why this isn't so probably also goes back to history.
You might expect it on something that is measured for deltas. Some networking equipment keeps counter sizes small and you can poll for a value, say bytes transferred. If the value gets too big it just overflows back to zero. It still gives you something useful if you're measuring it frequently (bytes/minute, bytes/hour), and as the counters are usually cleared when a connection drops it doesn't matter they are not entirely accurate.
As Justin mentioned buffer overflows are a different kettle of fish. This is where you write past the end of an array into memory that you shouldn't. In numeric overflow, the same amount of memory is used. In buffer overflow you use memory you didn't allocate. Buffer overflow is prevented automatically in some languages.
There is one classic story about a programmer who took advantage of overflow in the design of a program:
The Story of Mel
This isn't so much related to how registers work as it is just the limits of the memory in variables that store data. (You can overflow a variable in memory without overflowing any registers.)
But to answer your question, consider the simplest type of checksum. It's simply the sum of all the data being checked. If the checksum overflows, that's okay and the part that didn't overflow is still meaningful.
Other reasons might include that you just want your program to keep running even though a inconsequential variable may have overflowed.
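The simplest-checksum idea above, sketched out (SimpleChecksum is my own name):

static uint SimpleChecksum(byte[] data)
{
    uint sum = 0;
    foreach (byte b in data)
    {
        sum = unchecked(sum + b);   // let the running sum wrap modulo 2^32
    }
    return sum;                     // the low bits that survive are still a useful check value
}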
One more possible situation I can imagine is a random number generation algorithm - we don't care about overflow in that case, because all we want is a random number.
An integer overflow goes like this.
You have an 8-bit integer 1111 1111; now add 1 to it and you get 0000 0000 - the leading 1 gets truncated since it would be in the 9th position.
Now say you have a signed integer; the leading bit means it's negative. So now you have 0111 1111 (which is 127). Add 1 to it and you have 1000 0000, which is -128. In this case, adding 1 to 127 made it wrap around to negative.
I'm very sure overflows behave in a well determined manner, but I'm not sure about underflows.
All integer arithmetic (adds, subtracts and multiplies at least) is exact. It is just the interpretation of the resulting bits that you need to be careful of. In the two's complement system, you get the correct result modulo 2^n, where n is the number of bits. The only difference between signed and unsigned is that for signed numbers the most significant bit is treated as a sign bit. It's up to the programmer to determine what is appropriate. Obviously for some computations you want to know about an overflow and take appropriate action if one is detected. Personally I've never needed the overflow detection. I use a linear congruential random number generator that relies on it, i.e. a 64x64-bit unsigned integer multiplication; I only care about the lowest 64 bits, and I get the modulo operation for free because of the truncation.
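A generator of the kind mentioned at the end might look like this (the constants are the well-known MMIX LCG parameters, used purely as an example):

ulong state = 12345;   // any non-trivial seed

ulong NextRandom()
{
    // unsigned 64-bit multiply-and-add; the wraparound *is* the "mod 2^64" step,
    // so no overflow check is wanted here
    state = unchecked(state * 6364136223846793005UL + 1442695040888963407UL);
    return state;
}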