Why would you want an integer overflow to occur? - c#

In this question the topic is how to make VS check for an arithmetic overflow in C# and throw an Exception: C# Overflow not Working? How to enable Overflow Checking?
One of the comments stated something weird and got upvoted a lot; I hope you can help me out here:
You can also use the checked keyword to wrap a statement or a set of statements so that they are explicitly checked for arithmetic overflow. Setting the project-wide property is a little risky because oftentimes overflow is a fairly reasonable expectation.
I don't know much about hardware, but I am aware that overflow has to do with the way registers work. I always thought overflow causes undefined behaviour and should be prevented where possible (in 'normal' projects, that is, not when writing malicious code).
Why would you ever expect an overflow to happen and why wouldn't you always prevent it if you have the possibility? (by setting the corresponding compiler option)

The main time when I want overflow is computing hash codes. There, the actual numeric magnitude of the result doesn't matter at all - it's effectively just a bit pattern which I happen to be manipulating with arithmetic operations.
We have checked arithmetic turned on project-wide for Noda Time - I'd rather throw an exception than return incorrect data. I suspect that it's pretty rare for overflows to be desirable... I'll admit I usually leave the default to unchecked arithmetic, just because it's the default. There's the speed penalty as well, of course...
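For illustration, here is a minimal sketch of the familiar hash-combining pattern being described (the Person class and its fields are made-up names): the multiplications are expected to wrap, so the block is marked unchecked even if the rest of the project compiles with checked arithmetic.
class Person
{
    public string First = "";
    public string Last = "";

    public override int GetHashCode()
    {
        // Deliberate, harmless overflow: only the resulting bit pattern matters.
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + First.GetHashCode();
            hash = hash * 31 + Last.GetHashCode();
            return hash;
        }
    }
}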

I always thought overflow causes undefined behaviour and should be prevented where possible.
You may also be confused about the difference between buffer overflow (overrun) and numeric overflow.
Buffer overflow is when data is written past the end of an unmanaged array. It can cause undefined behavior, doing things like overwriting the return address on the stack with user-entered data. Buffer overflow is difficult to do in managed code.
Numeric overflow, however, is well defined. For example, if you have an 8-bit register, it can only store 2^8 values (0 to 255 if unsigned). So if you add 100 + 200, you would not get 300, but 300 modulo 256, which is 44. The story is a little more complicated with signed types; the bit pattern is incremented in the same manner, but it is interpreted as two's complement, so adding two positive numbers can give a negative result.
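The same arithmetic can be reproduced in C# with the byte type; a minimal sketch (note the addition itself is done in int, so the wrap is made explicit with an unchecked cast back to byte):
byte a = 100, b = 200;
byte sum = unchecked((byte)(a + b));   // (100 + 200) % 256
Console.WriteLine(sum);                // 44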

When doing calculations with constantly incrementing counters. A classic example is Environment.TickCount:
int start = Environment.TickCount;
DoSomething();
int end = Environment.TickCount;
int executionTime = end - start;
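// end - start is still correct in unchecked arithmetic, even if TickCount
// wrapped past int.MaxValue while DoSomething was running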
If that were checked, the program could bomb roughly 25 days after Windows was booted, when TickCount ticks past int.MaxValue while DoSomething is running. PerformanceCounter is another example.
These types of calculations produce an accurate result, even though overflow is present. A secondary example is the kind of math you do to generate a representative bit pattern: you're not really interested in an accurate result, just a reproducible one. Examples of those are checksums, hashes and random numbers.

Angles
Integers that overflow are elegant tools for measuring angles. You have 0 == 0 degrees and 0xFFFFFFFF == 359.999... degrees. It's very convenient, because as 32-bit integers you can add/subtract angles (350 degrees plus 20 degrees overflows and wraps back around to 10 degrees). You can also decide to treat the same 32-bit integer as signed (-180 to 180 degrees) or unsigned (0 to 360 degrees): 0xFFFFFFFF reads as -0.000...1 degrees when signed, which is the same angle as 359.999... degrees unsigned. Very elegant.
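A minimal sketch of that idea (sometimes called binary angular measurement); the helper names are made up, and the scale factor simply maps the full circle onto the full range of a 32-bit unsigned integer:
const double UnitsPerDegree = 4294967296.0 / 360.0;

uint ToBam(double degrees) => (uint)(degrees * UnitsPerDegree);
double ToDegrees(uint bam) => bam / UnitsPerDegree;

uint a = ToBam(350.0);
uint b = ToBam(20.0);
uint sum = unchecked(a + b);          // wraps past the full circle
Console.WriteLine(ToDegrees(sum));    // roughly 10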

When generating HashCodes, say from a string of characters.

why wouldn't you always prevent it if you have the possibility?
The reason checked arithmetic is not enabled by default is that checked arithmetic is slower than unchecked arithmetic. If performance isn't an issue for you, it would probably make sense to enable checked arithmetic, since an overflow occurring is usually an error.
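For illustration, the per-expression forms look like this (a small sketch of the checked and unchecked keywords mentioned in the question):
int max = int.MaxValue;
Console.WriteLine(unchecked(max + 1));   // -2147483648: wraps silently
try
{
    Console.WriteLine(checked(max + 1)); // throws instead of wrapping
}
catch (OverflowException)
{
    Console.WriteLine("overflow detected");
}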

This probably has as much to do with history as with any technical reasons. Integer overflow has very often been used to good effect by algorithms that rely on the behaviour (hashing algorithms in particular).
Also, most CPUs are designed to allow overflow, but set a carry bit in the process, which makes it easier to implement addition over longer-than-natural word-sizes. To implement checked operations in this context would mean adding code to raise an exception if the carry flag is set. Not a huge imposition, but one that the compiler writers probably didn't want to foist upon people without choice.
The alternative would be to check by default, but offer an unchecked option. Why this isn't so probably also goes back to history.

You might expect it on something that is measured for deltas. Some networking equipment keeps counter sizes small, and you can poll for a value, say bytes transferred. If the value gets too big it just overflows back to zero. It still gives you something useful if you're measuring it frequently (bytes/minute, bytes/hour), and as the counters are usually cleared when a connection drops, it doesn't matter that they are not entirely accurate.
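A sketch of that delta calculation with made-up counter values; the unsigned subtraction gives the right answer even when the counter wrapped between the two polls:
uint previousPoll = 4294967290;     // close to uint.MaxValue
uint currentPoll = 50;              // the counter wrapped in between
uint bytesInInterval = unchecked(currentPoll - previousPoll);
Console.WriteLine(bytesInInterval); // 56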
As Justin mentioned, buffer overflows are a different kettle of fish. That is where you write past the end of an array into memory that you shouldn't. In numeric overflow, the same amount of memory is used; in a buffer overflow you use memory you didn't allocate. Buffer overflow is prevented automatically in some languages.

There is one classic story about a programmer who took advantage of overflow in the design of a program:
The Story of Mel

This isn't so much related to how registers work as it is just the limits of the memory in variables that store data. (You can overflow a variable in memory without overflowing any registers.)
But to answer your question, consider the simplest type of checksum. It's simply the sum of all the data being checked. If the checksum overflows, that's okay and the part that didn't overflow is still meaningful.
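For instance, a minimal sketch of such an additive checksum, where the running sum is simply allowed to wrap:
byte Checksum(byte[] data)
{
    byte sum = 0;
    foreach (byte b in data)
        sum = unchecked((byte)(sum + b));   // keep only the low 8 bits
    return sum;
}

Console.WriteLine(Checksum(new byte[] { 200, 100, 7 }));   // 51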
Other reasons might include that you just want your program to keep running even though an inconsequential variable may have overflowed.

One more possible situation I can imagine is a random number generation algorithm - we don't care about overflow in that case, because all we want is a random number.

An integer overflow goes like this.
You have an 8-bit integer 1111 1111; now add 1 to it and you get 0000 0000 - the leading 1 gets truncated since it would be in the 9th position.
Now say you have a signed integer, where the leading bit means the number is negative. The maximum positive value is then 0111 1111. Add 1 to it and you have 1000 0000, which is -128. In this case, adding 1 to 127 made it flip to negative.
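The same walk-through in C#, using byte and sbyte (the casts are needed because the addition itself is performed in int):
byte maxUnsigned = 0b1111_1111;                        // 255
Console.WriteLine(unchecked((byte)(maxUnsigned + 1))); // 0: the ninth bit is dropped

sbyte maxSigned = 0b0111_1111;                         // 127
Console.WriteLine(unchecked((sbyte)(maxSigned + 1)));  // -128: 1000 0000 in two's complement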
I'm very sure overflows behave in a well determined manner, but I'm not sure about underflows.

All integer arithmetic (adds, subtracts and multiplies at least) is exact. It is just the interpretation of the resulting bits that you need to be careful of. In the two's complement system, you get the correct result modulo 2 to the power of the number of bits. The only difference between signed and unsigned is that for signed numbers the most significant bit is treated as a sign bit. It's up to the programmer to determine what is appropriate. Obviously for some computations you want to know about an overflow and take appropriate action if one is detected. Personally, I've never needed overflow detection. I use a linear congruential random number generator that relies on it, i.e. 64-bit × 64-bit unsigned integer multiplication; I only care about the lowest 64 bits, and I get the modulo operation for free because of the truncation.
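A minimal sketch of that kind of generator; the multiplier and increment used here are the ones commonly quoted for Knuth's MMIX LCG, and the natural wrap-around of ulong supplies the mod 2^64 step for free:
ulong state = 12345;   // seed
ulong Next() => state = unchecked(state * 6364136223846793005UL
                                        + 1442695040888963407UL);

Console.WriteLine(Next());
Console.WriteLine(Next());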

Related

Why doesn't 0.9 recurring always equal 1

Mathematically, 0.9 recurring can be shown to be equal to 1. This question however, is not about infinity, convergence, or the maths behind this.
The above assumption can be represented using doubles in C# with the following.
var oneOverNine = 1d / 9d;
var resultTimesNine = oneOverNine * 9d;
Using the code above, (resultTimesNine == 1d) evaluates to true.
When using decimals instead, the evaluation yields false, yet, my question is not about the disparate precision of double and decimal.
Since no type has infinite precision, how and why does double maintain such an equality where decimal does not? What is happening literally 'between the lines' of code above, with regards to the manner in which the oneOverNine variable is stored in memory?
It depends on the rounding used to get the closest representable value to 1/9. It could go either way. You can investigate the issue of representability at Rob Kennedy's useful page: http://pages.cs.wisc.edu/~rkennedy/exact-float
But don't think that somehow double is able to achieve exactness. It isn't. If you try with 2/9, 3/9 etc. you will find cases where the rounding goes the other way. The bottom line is that 1/9 is not exactly representable in binary floating point. And so rounding happens and your calculations are subject to rounding errors.
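A quick way to see that for yourself; which values round-trip exactly depends entirely on how the two roundings happen to interact:
for (int k = 1; k <= 8; k++)
{
    double q = k / 9d;
    Console.WriteLine($"{k}/9 * 9 == {k}: {q * 9d == k}");
}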
What is happening literally 'between the lines' of code above, with regards to the manner in which the oneOverNine variable is stored in memory?
What you're asking about is called IEEE 754. This is the spec that C#, its underlying .NET runtime, and most other programming platforms use to store and manipulate floating-point values. This is because support for IEEE 754 is typically implemented directly at the CPU/chipset level, making it both far more performant than an alternative implemented solely in software and far easier when building compilers, because the operations map almost directly to specific CPU instructions.
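If you want to see exactly what is stored in memory for oneOverNine, you can inspect the raw IEEE 754 bit pattern (a small sketch using BitConverter):
double oneOverNine = 1d / 9d;
long bits = BitConverter.DoubleToInt64Bits(oneOverNine);
Console.WriteLine(bits.ToString("X16"));
// 64 bits in total: 1 sign bit, 11 exponent bits, 52 stored significand bits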

Why do int and decimal throw DivideByZeroException but floating point doesn't?

According to http://msdn.microsoft.com/en-us/library/system.dividebyzeroexception.aspx
only int and decimal will throw a DivideByZeroException when you divide them by 0; when you divide a floating-point value by 0, the result is infinity, negative infinity, or NaN. Why is this? And what are some examples where the result is +ve infinity, -ve infinity, or NaN?
The IEEE standards committee felt that exception handling was more trouble than it was worth, for the range of code that could encounter these kinds of issues with floating-point math:
Traps can be used to stop a program, but unrecoverable situations are extremely rare.
[...]
Flags offer both predictable control flow and speed. Their use requires the programmer be aware of exceptional conditions, but flag stickiness allows programmers to delay handling exceptional conditions until necessary.
This may seem strange to a developer accustomed to a language in which exception handling is deeply baked, like C#. The developers of the IEEE 754 standard are thinking about a broader range of implementations (embedded systems, for instance), where such facilities aren't available, or aren't desirable.
Michael's answer is of course correct. Here's another way to look at it.
Integers are exact. When you divide seven by three in integers you are actually asking the question "how many times can I subtract three from seven before I'd have to go into negative numbers?". Division by zero is undefined because there is no number of times you can subtract zero from seven to get something negative.
Floats are by their nature inexact. They have a certain amount of precision and you are best to assume that the "real" quantity is somewhere between the given float and a float near it. Moreover, floats usually represent physical quantities, and those have measurement error far larger than the representation error. I think of a float as a fuzzy smeared-out region surrounding a point.
So when you divide seven by zero in floats, think of it as dividing some number reasonably close to seven by some number reasonably close to zero. Clearly a number reasonably close to zero can make the quotient arbitrarily large! And therefore this is signaled to you by giving infinity as the answer; this means that the answer could be arbitrarily large, depending on where the true value actually lies.
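To make the contrast concrete (and to answer the question's request for examples), a small sketch:
double x = 1.0, zero = 0.0;
Console.WriteLine(x / zero);      // double.PositiveInfinity
Console.WriteLine(-x / zero);     // double.NegativeInfinity
Console.WriteLine(zero / zero);   // double.NaN

int i = 1, j = 0;
try { Console.WriteLine(i / j); }                       // integer division throws
catch (DivideByZeroException) { Console.WriteLine("DivideByZeroException"); }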
The floating point engine built into the processor is quite capable of generating exceptions for float division by zero. Windows has a dedicated exception code for it, STATUS_FLOAT_DIVIDE_BY_ZERO, exception code 0xC000008E, "Floating-point division by zero". As well as other mishaps that the FPU can report, like overflow, underflow, inexact results and denormal operands.
Whether it does this is determined by the control register; programs can alter this register with a helper function like _controlfp(). Libraries created with Borland tools routinely did this, for example, unmasking these exceptions.
This has not worked out well, to put it mildly. It is the worst possible global variable you can imagine. Mixing such libraries with other ones that expect a division by zero to generate infinity instead of an exception just does not work and is next-to-impossible to deal with.
Accordingly, it is now the norm for language runtimes to mask all floating point exceptions. The CLR insists on this as well.
Dealing with a library that unmasks the exceptions is tricky, but has a silly workaround. You can throw an exception and catch it again. The exception handling code inside the CLR resets the control register. An example of this is shown in this answer.

Consistency: Dividing an integer by the powers of 2 vs powers of 10?

This is a question concerning the cross-platform consistency and determinism of floating-point operations (i.e. whether they yield different results on different CPUs/systems).
Which one is more likely to stay cross-platform consistent (pseudo code):
float myFloat = float ( myInteger) / float( 1024 )
or
float myFloat = float ( myInteger ) / float( 1000 )
Platforms are C# and AS3.
AS3 versions:
var myFloat:Number = myInteger / 1000 // AS3
var myFloat:Number = myInteger / 1024 // AS3
Ok, I've added the AS3 version for clarification; it is equivalent to the 'C pseudo code' above. As you can see, in AS3 all calculations, even on integers, are performed as floats automatically; a cast is not required (nor can you avoid it or force the runtime to perform true integer division).
Hopefully this explains why I'm 'casting' everything into floats: I am not! That's simply what happens in one of the target languages!
The first one is likely the same on both platforms, since there are no representation issues. In particular for small integers (highest 8 bits unused) there is one exact result, and it's very likely that this result will be used.
But I wouldn't rely on it. If you need guaranteed determinism, I recommend implementing the required arithmetic yourself on top of plain integers. For example using a fixed point representation.
The second one is likely to be inconsistent, even when using the same C# code on different hardware or .net versions. See the related question Is floating-point math consistent in C#? Can it be?
I suggest you read the IEEE 754-1985 standard. A copy can be purchased for $43. Although superseded by the 2008 version, it is an excellent introduction to floating-point because it is only 20 pages and is quite readable. It will show you why both dividing by 1000 and by 1024 are deterministic and why the former may have error but the latter does not (except in cases of underflow). It will also give you a basis for understanding the answers you have been given and why you are on the wrong track.
Which one is more likely to stay cross-platform consistent(pseudo code):
Dividing by 1024.
Every binary-based floating-point system (IEEE 754, IBM, VAX, Cray) that applies division by 1024 to a finite number will yield an exact result in the given representation. The reason is that dividing by 1024 is equivalent to shifting the significand 10 bit positions to the right, which simply means decreasing the binary exponent by 10.
If the number is too small (below about 1E-38 for IEEE 754 single precision or 1E-308 for double precision), you will lose the exact result, but this is not a problem of the operation; it is a problem of the limited range of the type... it simply cannot represent such small results accurately.
As no rounding is necessary, there can be no difference due to rounding (and yes, while most programming languages use round to even, some enable choosing another rounding mode).
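A small check of that claim in C#, assuming the integer fits in a float's 24-bit significand:
int myInteger = 12345;
float by1024 = myInteger / 1024f;   // exact: only the exponent changes
float by1000 = myInteger / 1000f;   // rounded: 12.345 has no finite binary representation

Console.WriteLine(by1024 * 1024f == myInteger);   // True
Console.WriteLine(by1000.ToString("G9"));         // shows the rounding error, e.g. 12.3450003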

Increment forever and you get -2147483648?

For a clever and complicated reason that I don't really want to explain (because it involves making a timer in an extremely ugly and hacky way), I wrote some C# code sort of like this:
int i = 0;
while (i >= 0) i++; //Should increment forever
Console.Write(i);
I expected the program to hang forever or crash or something, but, to my surprise, after waiting for about 20 seconds or so, I get this output:
-2147483648
Well, programming has taught me many things, but I still cannot grasp why continually incrementing a number causes it to eventually be negative...what's going on here?
In C#, the built-in integers are represented by a sequence of bit values of a predefined length. For the basic int datatype that length is 32 bits. Since 32 bits can only represent 4,294,967,296 different possible values (that is, 2^32), clearly your code will not loop forever with continually increasing values.
Since int can hold both positive and negative numbers, the sign of the number must be encoded somehow. This is done with the first bit: if the first bit is 1, then the number is negative.
Here are the int values laid out on a number-line in hexadecimal and decimal:
Hexadecimal Decimal
----------- -----------
0x80000000 -2147483648
0x80000001 -2147483647
0x80000002 -2147483646
... ...
0xFFFFFFFE -2
0xFFFFFFFF -1
0x00000000 0
0x00000001 1
0x00000002 2
... ...
0x7FFFFFFE 2147483646
0x7FFFFFFF 2147483647
As you can see from this chart, the bits that represent the smallest possible value are what you would get by adding one to the largest possible value, while ignoring the interpretation of the sign bit. When a signed number wraps around in this way, it is called "integer overflow". Whether an integer overflow is allowed or treated as an error is configurable with the checked and unchecked statements in C#. The default is unchecked, which is why no error occurred and you got that crazy small number in your program.
This representation is called 2's Complement.
The value is overflowing the positive range of 32-bit integer storage, wrapping around to 0x80000000, which is -2147483648 in decimal. In effect, a signed int only has 31 bits available for the magnitude.
It's been pointed out elsewhere that if you use an unsigned int you'll get different behaviour, as the 32nd bit isn't being used to store the sign of the number.
What you are experiencing is Integer Overflow.
In computer programming, an integer overflow occurs when an arithmetic operation attempts to create a numeric value that is larger than can be represented within the available storage space. For instance, adding 1 to the largest value that can be represented constitutes an integer overflow. The most common result in these cases is for the least significant representable bits of the result to be stored (the result is said to wrap).
int is a signed integer. Once past the max value, it starts from the min value (large negative) and marches towards 0.
Try again with uint and see what is different.
Try it like this:
int i = 0;
while (i >= 0)
checked { i++; } //Should increment forever
Console.Write(i);
And explain the results
What the others have been saying. If you want something that can go on forever (and I won't remark on why you would need something of this sort), use the BigInteger class in the System.Numerics namespace (.NET 4+). You can do the comparison to an arbitrarily large number.
It has a lot to do with how positive numbers and negative numbers are really stored in memory (at bit level).
If you're interested, check this video: Programming Paradigms at 12:25 and onwards. Pretty interesting and you will understand why your code behaves the way it does.
This happens because when the variable "i" reaches the maximum int limit, the next value will be a negative one.
I hope this does not sound like smart-ass advice, because it's well meant and not meant to be snarky.
What you are asking is for us to describe what is pretty fundamental behaviour for integer datatypes.
There is a reason why datatypes are covered in the first year of any computer science course; it's really very fundamental to understanding how and where things can go wrong (you can probably already see how the behaviour above, if unexpected, causes unexpected behaviour, i.e. a bug in your application).
My advice is to get hold of the reading material for first-year computer science plus Knuth's seminal work "The Art of Computer Programming", and for ~$500 you will have everything you need to become a great programmer, much cheaper than a whole Uni course ;-)

Working with numbers larger than max decimal value

I'm working with the product of the first 26 prime numbers. This requires more than 52 bits of precision, which I believe is the max a double can handle, and more than the 28-29 significant digits a decimal can provide. So what would be some strategies for performing multiplication and division on numbers this large?
Also, what would the performance impacts be of whatever hoops I'd have to jump through to make this happen?
The product of the first 22 prime numbers (the most I can multiply together on my calculator without dropping into scientific mode) is:
10,642,978,845,819,148,849,204,664,294,430
The product of the last four is
72,370,439
When multiplied together, I get:
7.7023705133964511682328635583552e+38
The performance impacts are especially important here, because we're essentially trying to resolve the question of whether a prime-number string comparison solution is faster in practice than a straight comparison of characters. The post which prompted this investigation is here. Processors are optimized for floating-point calculations; ideally I'd want to leverage as much of that optimization in whatever solution I end up with.
TIA!
James
PS: The code I do have is for a competing solution; I don't think the prime number solution can possibly be faster, but I'm trying to give it the fairest chance I can.
You can use BigInteger (in the System.Numerics namespace) on .NET 4.0. For older versions, I think you need an open source library such as this one.
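A sketch of the BigInteger approach; nothing overflows because BigInteger has arbitrary precision:
using System;
using System.Numerics;   // BigInteger

int[] first26Primes =
{
     2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
    43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101
};

BigInteger product = BigInteger.One;
foreach (int p in first26Primes)
    product *= p;

Console.WriteLine(product);   // the exact product, no rounding, no overflow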
I read the post you linked to, about the interview question. Since you're only multiplying and dividing these large integers, a huge optimization is to keep them in their prime-factorized form. Each large integer is an array [0..25] of ints, each element representing the exponent of the nth prime in the factorization. To multiply two large integers in this form, simply add the exponents element-by-element; to divide, subtract exponents.
But you will see this is equivalent to tabulating character counts on the two strings.
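A minimal sketch of that representation (the type and method names are made up): each value is stored as 26 exponents, one per prime, so multiplication and division become element-wise addition and subtraction of exponents:
static class FactoredInt
{
    // exponents[i] is the power of the i-th prime (2, 3, 5, ...) in the factorization
    public static int[] Multiply(int[] a, int[] b)
    {
        var result = new int[26];
        for (int i = 0; i < 26; i++)
            result[i] = a[i] + b[i];   // exponents add when the values are multiplied
        return result;
    }

    public static int[] Divide(int[] a, int[] b)
    {
        var result = new int[26];
        for (int i = 0; i < 26; i++)
            result[i] = a[i] - b[i];   // a negative entry means the division isn't exact
        return result;
    }
}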
