How to know a value overflows without resorting to a runtime OverflowException - C#

So the question is:
Given a native integral type that represents the largest possible number in a given compiler (language?), for example ulong in C#, how do you detect that an input string representing a number is going to overflow the largest value that is representable by that given type without falling back to a checked context and a runtime OverflowException?
Obviously the C# compiler can detect constant integral overflows:
ulong l = 18446744073709551615; //ok
ulong l = 18446744073709551616; //compile time error: Integral constant is too large.
Is the compiler using a runtime OverflowException (or equivalent) under the hood? If so, is there a way to actually do this without resorting to a runtime exception or building a numeric type that can hold larger numbers, like System.Numerics.BigInteger? It is worth noting that BigInteger has no native literal support in C#: the following is a compile-time error even though the integral constant is well inside the type's range:
BigInteger i = 18446744073709551616; //compile time error: Integral constant is too large.

Would something like this count as a solution?
string s1 = "18446744073709551616";
string s2 = ulong.MaxValue.ToString();
// assuming s1 contains digits only and no leading zeros
if(
s1.Length > s2.Length ||
s1.Length == s2.Length && string.CompareOrdinal(s1, s2) > 0
)
Console.WriteLine("overflow");

The compiler most likely just parses the input one digit at a time, with code similar to that of ulong.Parse but obviously adapted to the task.
The way ulong.Parse decides that it overflows is reasonably simple, but you do need to know how integers are parsed. ulong.Parse parses the input one character at a time. It maintains the ulong result, and for each character, it multiplies the result by 10, then adds the value of the character (which is 0 to 9), and it keeps doing so until the input runs out or the result overflows.
There are two overflow checks, because there are two things that can overflow: the multiplication by 10, and the addition. To check that the multiplication won't overflow, the current result is compared to 1844674407370955161. If the result is greater than that, and there are still digits left, the algorithm exits, reporting overflow. Observe that this number is the same as ulong.MaxValue with the last digit removed, making it the largest integer that can be multiplied by 10 without overflow.
Next, it needs to check whether adding a digit from 0 to 9 will overflow. It does so by adding the digit first, and then checking whether the result has decreased instead of increasing. This works because of how unsigned addition is implemented inside CPUs: the carry out of the top bit is discarded because it can't fit, so an overflowing sum wraps around and comes out smaller than the value it started from.
And that's it, really. If the characters run out without tripping either of the two checks above, the number is parsed successfully; otherwise it's an overflow.
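The addition check in isolation, as a small sketch (AddWouldOverflow is a hypothetical helper name): an unchecked ulong addition that overflows drops the carry, so the wrapped sum is smaller than the left operand, and no exception or checked context is involved.

```csharp
using System;

// Hypothetical helper. If a + b needs a 65th bit, the carry is discarded,
// the sum wraps around, and the result ends up smaller than a.
static bool AddWouldOverflow(ulong a, ulong b) => unchecked(a + b) < a;

Console.WriteLine(AddWouldOverflow(ulong.MaxValue - 5, 5)); // False
Console.WriteLine(AddWouldOverflow(ulong.MaxValue - 5, 6)); // True
```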
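Put together, the two checks give a parser that reports overflow with a bool instead of an exception. This is a sketch that follows the description above, not the actual .NET source; TryParseULong is a hypothetical name, and it assumes the input is plain ASCII digits (no sign, whitespace or separators).

```csharp
using System;

// Hypothetical helper: digit-by-digit parse with the two overflow checks.
// Never throws and never uses a checked context.
static bool TryParseULong(string s, out ulong result)
{
    // ulong.MaxValue (18446744073709551615) with the last digit removed:
    // the largest value that can be multiplied by 10 without overflowing.
    const ulong MaxBeforeMultiply = ulong.MaxValue / 10; // 1844674407370955161

    result = 0;
    if (string.IsNullOrEmpty(s)) return false;

    ulong value = 0;
    foreach (char c in s)
    {
        if (c < '0' || c > '9') return false; // only plain digits are accepted
        ulong digit = (ulong)(c - '0');

        // Check 1: would value * 10 overflow? (a digit is still left to consume)
        if (value > MaxBeforeMultiply) return false;
        value *= 10;

        // Check 2: add first, then see whether the sum wrapped around.
        ulong sum = unchecked(value + digit);
        if (sum < value) return false;
        value = sum;
    }

    result = value;
    return true;
}

Console.WriteLine(TryParseULong("18446744073709551615", out ulong parsed)); // True
Console.WriteLine(parsed == ulong.MaxValue);                               // True
Console.WriteLine(TryParseULong("18446744073709551616", out _));           // False
```

For everyday code, ulong.TryParse also reports overflow by returning false rather than throwing; the sketch above only makes the mechanism visible.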

Related

Extremely large long type number becomes negative

Currently, I am having a bit of a problem with my C# code. I have a set of code that is supposed to turn a string in the form of "x^y^z..." into a number, so I have set up a method that looks like this.
public long valueOfPower()
{
long[] number = Array.ConvertAll(this.power.Split('^'), long.Parse);
if(number.Length == 1)
{
return number[0];
}
long result = number[0];
long power = number[number.Length-1];
for (long i = number.Length-1; i > 1; i-- )
{
power = (long)Math.Pow((int)number[(int)i-1], (int)power);
}
result= (long)Math.Pow((int)result,(int)power);
return result;
}
The problem I am having is that when something like 2^2^2^2^2 is entered, I get an extremely large negative number. I am not sure if it is something wrong with my code, or because 2^2^2^2^2 is too large of a number for the long object, but I don't understand what is happening.
So, the question is: why is my code returning a large negative number when "this.power" is 2^2^2^2^2, but normal numbers with smaller inputs (like 2^2^2^2)?
(Sorry about the random casting, that came from me experimenting with different number types.)
What is happening is overflow. Each data type is stored as a certain number of bits. Because that number of bits is limited, the biggest number any data type can store is limited. Because the most significant bit often represents the sign of the number, when the maximum value for a data type is exceeded, that bit flips and the computer now interprets it as a negative number.
You can use the checked keyword to throw an exception if your math would overflow. More info on that here: https://msdn.microsoft.com/en-us/library/74b4xzyw.aspx
Another possible solution would be using a BigInteger. More info here: https://msdn.microsoft.com/en-us/library/system.numerics.biginteger.aspx
See this for the max values of data types in C#: http://timtrott.co.uk/data-types-ranges/
See this for more info on overflow: https://en.wikipedia.org/wiki/Integer_overflow
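A small sketch of where checked helps in this particular code: Math.Pow works in double, and 2^65536 is beyond double's range, so it returns +Infinity. The overflow then happens in the cast back to long, which in an unchecked context yields an unspecified value; inside checked it throws instead. TryToLongChecked is a hypothetical helper name.

```csharp
using System;

// Math.Pow computes in double; 2^65536 exceeds double's range.
double huge = Math.Pow(2, 65536);
Console.WriteLine(double.IsPositiveInfinity(huge)); // True

// Hypothetical helper: the cast runs in a checked context, so an infinite,
// NaN or out-of-range double throws OverflowException instead of
// silently becoming some unspecified long.
static bool TryToLongChecked(double value, out long result)
{
    try
    {
        result = checked((long)value);
        return true;
    }
    catch (OverflowException)
    {
        result = 0;
        return false;
    }
}

Console.WriteLine(TryToLongChecked(Math.Pow(2, 16), out long small)); // True (small == 65536)
Console.WriteLine(TryToLongChecked(huge, out _));                     // False
```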
2^2^2^2^2 is, well, quite a large number: evaluated right to left, as your code does, it is 2^65536, so it overflows the maximum value of the long data type (9,223,372,036,854,775,807, just under 2^63) by a wide margin.
You could try using the BigInteger class out of System.Numerics, or come up with some other method of representing such a number.
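A sketch of the BigInteger route, assuming every operand is a non-negative integer. EvaluatePowerTower is a hypothetical name; it folds right to left, a^(b^(c^...)), which is the same order valueOfPower uses. BigInteger.Pow takes an int exponent, so the sketch refuses exponents larger than int.MaxValue.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: evaluates "x^y^z..." right to left in BigInteger,
// so no intermediate value is squeezed into a long or a double.
static BigInteger EvaluatePowerTower(string expression)
{
    string[] parts = expression.Split('^');
    BigInteger acc = BigInteger.Parse(parts[parts.Length - 1]);
    for (int i = parts.Length - 2; i >= 0; i--)
    {
        // BigInteger.Pow only accepts an int exponent.
        if (acc > int.MaxValue)
            throw new OverflowException("Exponent too large for BigInteger.Pow.");
        acc = BigInteger.Pow(BigInteger.Parse(parts[i]), (int)acc);
    }
    return acc;
}

Console.WriteLine(EvaluatePowerTower("2^2^2^2"));                     // 65536
Console.WriteLine(EvaluatePowerTower("2^2^2^2^2").ToString().Length); // 19729
```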
The Overflow problem you are experiencing is because you are doing downcasting.
number and power are supposed to be long, but in your calculation, for ex:
power = (long)Math.Pow((int)number[(int)i-1], (int)power);
// you are downcasting number and power into int.
when you do calculation in int, then your value will become negative because of overflow, and then you convert it back to long.
Also, Math.Pow only accepts double parameters and returns double; int arguments are accepted only because int converts implicitly to double.
So, to fix your issue, it should look like this:
power = (long)Math.Pow((double)number[(int)i-1], (double)power);
// and
result= (long)Math.Pow((double)result,(double)power);
Then, if you want to get something bigger than long, consider using BigInteger.

C# Limit left decimal places

How do you limit the number of places on the left hand side of a decimal?
So 123.45 needs to be 23.45.
I would like the output to be a decimal.
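Use the modulo operator: number % 100 already yields a decimal (23.45 for 123.45). If you also need a formatted string, add ToString(string format):
var resultString = (number % 100).ToString("#00.00");
which always shows at least two digits before the point and exactly two after it.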
Well, if you need "one hundred twenty-three" to become "twenty-three," one arithmetic way to do that is "mod(100.0)." Works equally well for positive or negative numbers.
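A minimal sketch of the modulo approach with decimal, which is a base-10 type, so % 100m introduces no binary rounding here. KeepLastTwoIntegerDigits is a hypothetical helper name; InvariantCulture is used only so the printed separator is always a dot.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: keeps only the last two digits left of the decimal point.
static decimal KeepLastTwoIntegerDigits(decimal number) => number % 100m;

decimal trimmed = KeepLastTwoIntegerDigits(123.45m);
Console.WriteLine(trimmed.ToString(CultureInfo.InvariantCulture));           // 23.45
// In C#, the remainder takes the sign of the dividend.
Console.WriteLine(KeepLastTwoIntegerDigits(-123.45m)
    .ToString(CultureInfo.InvariantCulture));                                // -23.45
Console.WriteLine(trimmed.ToString("#00.00", CultureInfo.InvariantCulture)); // 23.45
```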
Another approach might be to convert the value to a string, then lop off some of the leading digits/characters. (This would avoid applying yet another arithmetic operation to a floating-point value, risking the dreaded "off by one cent" that gets accountants and other types so upset.) It requires more debugging on your part, though.

Increment forever and you get -2147483648?

For a clever and complicated reason that I don't really want to explain (because it involves making a timer in an extremely ugly and hacky way), I wrote some C# code sort of like this:
int i = 0;
while (i >= 0) i++; //Should increment forever
Console.Write(i);
I expected the program to hang forever or crash or something, but, to my surprise, after waiting for about 20 seconds or so, I get this output:
-2147483648
Well, programming has taught me many things, but I still cannot grasp why continually incrementing a number causes it to eventually be negative...what's going on here?
In C#, the built-in integers are represented by a sequence of bit values of a predefined length. For the basic int datatype that length is 32 bits. Since 32 bits can only represent 4,294,967,296 different possible values (since that is 2^32), clearly your code will not loop forever with continually increasing values.
Since int can hold both positive and negative numbers, the sign of the number must be encoded somehow. This is done with the first (most significant) bit: if it is 1, the number is negative.
Here are the int values laid out on a number-line in hexadecimal and decimal:
Hexadecimal Decimal
----------- -----------
0x80000000 -2147483648
0x80000001 -2147483647
0x80000002 -2147483646
... ...
0xFFFFFFFE -2
0xFFFFFFFF -1
0x00000000 0
0x00000001 1
0x00000002 2
... ...
0x7FFFFFFE 2147483646
0x7FFFFFFF 2147483647
As you can see from this chart, the bits that represent the smallest possible value are what you would get by adding one to the largest possible value, while ignoring the interpretation of the sign bit. When an addition wraps around like this, it is called "integer overflow". Whether integer overflow is allowed or treated as an error is configurable with the checked and unchecked statements in C#. The default for non-constant expressions is unchecked, which is why no error occurred, and you got that crazy small number in your program instead.
This representation is called two's complement.
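A short sketch of the chart's key transition: adding one to int.MaxValue (0x7FFFFFFF) in an unchecked context produces the bit pattern 0x80000000, which two's complement reads as int.MinValue. Reading the same 32 bits as uint gives 2147483648.

```csharp
using System;
using System.Globalization;

int max = int.MaxValue;
// Non-constant int arithmetic is unchecked by default; written out explicitly here.
int wrapped = unchecked(max + 1);

Console.WriteLine(max.ToString(CultureInfo.InvariantCulture) + " = 0x" + max.ToString("X8"));         // 2147483647 = 0x7FFFFFFF
Console.WriteLine(wrapped.ToString(CultureInfo.InvariantCulture) + " = 0x" + wrapped.ToString("X8")); // -2147483648 = 0x80000000
// The same 32 bits reinterpreted as an unsigned value:
Console.WriteLine(unchecked((uint)wrapped)); // 2147483648
```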
The value is overflowing the positive range of 32-bit integer storage, going to 0x80000000, which is -2147483648 in decimal. This means only 31 bits hold the magnitude, because the 32nd bit holds the sign.
It's been pointed out elsewhere that if you use an unsigned int (uint) you'll get different behaviour, as the 32nd bit isn't being used to store the sign of the number.
What you are experiencing is Integer Overflow.
In computer programming, an integer overflow occurs when an arithmetic operation attempts to create a numeric value that is larger than can be represented within the available storage space. For instance, adding 1 to the largest value that can be represented constitutes an integer overflow. The most common result in these cases is for the least significant representable bits of the result to be stored (the result is said to wrap).
int is a signed integer. Once past the max value, it starts from the min value (large negative) and marches towards 0.
Try again with uint and see what is different.
Try it like this:
int i = 0;
while (i >= 0)
checked{ i++; } //Should increment forever
Console.Write(i);
And explain the results
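For comparison, a variant of the snippet above that catches the exception. It starts just below int.MaxValue rather than at 0, a change made only so the demo finishes instantly. Because checked arithmetic throws before storing the overflowing result, i never becomes negative.

```csharp
using System;

// Starts near the limit purely so the loop ends immediately.
int i = int.MaxValue - 2;
try
{
    while (i >= 0)
        checked { i++; }
    Console.WriteLine("Loop ended normally at " + i);
}
catch (OverflowException)
{
    // The overflowing increment throws, so i keeps its last valid value.
    Console.WriteLine("OverflowException; i is still " + i); // 2147483647
}
```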
What the others have been saying. If you want something that can go on forever (and I won't remark on why you would need something of this sort), use the BigInteger class in the System.Numerics namespace (.NET 4+). You can do the comparison to an arbitrarily large number.
It has a lot to do with how positive numbers and negative numbers are really stored in memory (at bit level).
If you're interested, check this video: Programming Paradigms at 12:25 and onwards. Pretty interesting and you will understand why your code behaves the way it does.
This happens because when the variable "i" reaches the maximum int limit, the next value will be a negative one.
I hope this does not sound like smart-ass advice, because it's well meant and not meant to be snarky.
What you are asking is for us to describe what is pretty fundamental behaviour for integer datatypes.
There is a reason why datatypes are covered in the first year of any computer science course: they are really very fundamental to understanding how and where things can go wrong (you can probably already see how the behaviour above, if unexpected, causes unexpected behaviour, i.e. a bug in your application).
My advice is to get hold of the reading material for first-year computer science plus Knuth's seminal work "The Art of Computer Programming", and for ~$500 you will have everything you need to become a great programmer, much cheaper than a whole uni course ;-)

Instead of error, why doesn't compiler promote the two literals to type long?

The following two statements will cause a compiler overflow error (the reason being that constant expressions are checked for overflow by default):
int i=(int)(int.MaxValue+100); // error
long l=(long)(int.MaxValue+100); // error
But if compiler is able to figure out that adding the two values causes an overflow, why doesn't it then promote both int.MaxValue and 100 to long and only then try to add them together? As far as I can tell, that shouldn't be a problem since according to the following quote, the integer literal can also be of type long:
When an integer literal has no suffix,
its type is the first of these types
in which its value can be represented:
int, uint, long, ulong.
thanx
The literal 100 can be represented as an int, which is the first of those four types in that order, so it's made an int.
int.MaxValue is not a literal. It is a public constant field of type int.
So, the addition operation is int + int, which results in an int, which then overflows for this case.
To turn the literal 100 into a long so you perform long integer addition, suffix it with L:
long l = int.MaxValue + 100L;
The rules are:
int + int is int, not long
the default for arithmetic on constants is "checked"; the default for arithmetic on non-constants is "unchecked"
100 and int.MaxValue are constants
Therefore the correct behaviour according to the specification is to do overflow checking at compile time and give an error.
If instead you said:
int x = 100;
int y = int.MaxValue;
long z = x + y;
then the right behaviour is to do an unchecked addition of two integers, wrap around on overflow, and then convert the resulting integer to long.
If what you want is long arithmetic then you have to say so. Convert one of the operands to long.
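A sketch of the three cases described above, using non-constant int operands so the compiler does not reject the expression: int + int wraps before the result is widened to long, while converting one operand first (or using the L suffix on a constant) gives real long arithmetic.

```csharp
using System;

int x = 100;
int y = int.MaxValue;

// int + int: computed in 32 bits (unchecked, the default for non-constants), wraps, then widens.
long wrapped = unchecked(x + y);
// Converting one operand first makes it long + long, so nothing overflows.
long widened = (long)x + y;
// For constants, a long literal suffix has the same effect.
long viaSuffix = int.MaxValue + 100L;

Console.WriteLine(wrapped == -2147483549L); // True
Console.WriteLine(widened == 2147483747L);  // True
Console.WriteLine(viaSuffix == widened);    // True
```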
The reason is all in the sequence. If you read your code, it does this: take an int with value int.MaxValue and add 100 to it. This is true in both cases, and it is evaluated before anything else happens, including the cast to long.
If you want to make it work, do:
long l = ((long)int.MaxValue)+100;
The short answer, I would imagine, is because it hasn't been designed to. Whenever MS adds features to the C# compiler (or when anyone adds features to anything), there has to be a cost-benefit analysis. People have to want the feature, and the cost of implementing the feature (in terms of time coding and testing and the opportunity cost of some other feature that could be implemented) must be outweighed by the potential benefit that the feature provides to the developer.
In your case, the way to get the compiler to do what you want is simple, obvious, and clear. If they add:
Infer the type of a numeric expression consisting only of constants and literals to be the minimal type that can contain the resulting value
That means that they now have more code paths to check and more unit tests to write. Changing expected behavior also means that there is probably someone who relies on this documented fact whose code will now be invalid because the inferences could be different.
It's not the compiler's place to promote types based on the result of run-time expressions.
Both int.MaxValue and 100 are integers. I would see potential for problems if the compiler changed the type based on the result of an expression.
Well, what do you expect? You are saying int.MaxValue+100 which goes over the maximum value allowed for an integer! To make it work for the long, do this:
((long)(int.MaxValue)) + 100;
Don't assume the compiler will promote the value to long automatically. That would be even stranger.

Get number of digits in an unsigned long integer c#

I'm trying to determine the number of digits in a C# ulong number. I'm trying to do so using some math logic rather than using ToString().Length. I have not benchmarked the two approaches, but I have seen other posts about using System.Math.Floor(System.Math.Log10(number)) + 1 to determine the number of digits.
It seems to work fine until I transition from 999999999999997 to 999999999999998, at which point I start getting an incorrect count.
Has anyone encountered this issue before?
I have seen similar posts with a Java emphasis (Why log(1000)/log(10) isn't the same as log10(1000)?) and also a post (How to get the separate digits of an int number?) which indicates how I could possibly achieve the same using the % operator, but with a lot more code.
Here is the code I used to simulate this:
Action<ulong> displayInfo = number =>
Console.WriteLine("{0,-20} {1,-20} {2,-20} {3,-20} {4,-20}",
number,
number.ToString().Length,
System.Math.Log10(number),
System.Math.Floor(System.Math.Log10(number)),
System.Math.Floor(System.Math.Log10(number)) + 1);
Array.ForEach(new ulong[] {
9U,
99U,
999U,
9999U,
99999U,
999999U,
9999999U,
99999999U,
999999999U,
9999999999U,
99999999999U,
999999999999U,
9999999999999U,
99999999999999U,
999999999999999U,
9999999999999999U,
99999999999999999U,
999999999999999999U,
9999999999999999999U}, displayInfo);
Array.ForEach(new ulong[] {
1U,
19U,
199U,
1999U,
19999U,
199999U,
1999999U,
19999999U,
199999999U,
1999999999U,
19999999999U,
199999999999U,
1999999999999U,
19999999999999U,
199999999999999U,
1999999999999999U,
19999999999999999U,
199999999999999999U,
1999999999999999999U
}, displayInfo);
Thanks in advance
Pat
log10 is going to involve floating point conversion - hence the rounding error. The error is pretty small for a double, but is a big deal for an exact integer!
Excluding the .ToString() method and a floating point method, then yes I think you are going to have to use an iterative method but I would use an integer divide rather than a modulo.
Integer divide by 10. Is the result > 0? If so, iterate again. If not, stop.
The number of digits is the number of iterations required.
Eg. 5 -> 0; 1 iteration = 1 digit.
1234 -> 123 -> 12 -> 1 -> 0; 4 iterations = 4 digits.
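The iteration above as a sketch (DigitCount is a hypothetical name). It uses integer division only, so no floating-point rounding can creep in, and the do-while form means 0 counts as one digit.

```csharp
using System;

// Hypothetical helper: one iteration per decimal digit.
static int DigitCount(ulong number)
{
    int iterations = 0;
    do
    {
        number /= 10;      // integer divide by 10
        iterations++;
    } while (number > 0);  // stop once the quotient reaches 0
    return iterations;
}

Console.WriteLine(DigitCount(1234));             // 4
Console.WriteLine(DigitCount(999999999999998));  // 15
Console.WriteLine(DigitCount(ulong.MaxValue));   // 20
```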
I would use ToString().Length unless you know this is going to be called millions of times.
"premature optimization is the root of all evil" - Donald Knuth
From the documentation:
By default, a Double value contains 15
decimal digits of precision, although
a maximum of 17 digits is maintained
internally.
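I suspect that you're running into precision limits, though not in the conversion itself: 999,999,999,999,998 is below 2^53, so converting it to double is exact. The limit is in the result of Math.Log10: the true value is about 15 - 8.7e-16, the nearest double to that is 15.0 itself, so Floor(...) + 1 gives 16. Above 2^53 (about 9.007e15), the ulong-to-double conversion starts rounding as well.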
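A small check of the two precision limits involved. The conversion facts below are exact IEEE 754 behaviour; the Log10 lines are printed without an expected value, because the last bit of Math.Log10 can depend on the runtime's math library.

```csharp
using System;

ulong n = 999999999999998;
double d = n;                                     // below 2^53, so this conversion is exact
Console.WriteLine((ulong)d == n);                 // True
Console.WriteLine(Math.Log10(d) == 15.0);         // the rounded logarithm
Console.WriteLine(Math.Floor(Math.Log10(d)) + 1); // the digit estimate the question reports

ulong big = 9007199254740993;                     // 2^53 + 1
Console.WriteLine((ulong)(double)big == big);     // False: the conversion itself rounds
```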
Other answers have posted why this happens.
Here is an example of a fairly quick way to determine the "length" of an integer (some cases excluded). This by itself is not very interesting -- but I include it here because using this method in conjunction with Log10 can get the accuracy "perfect" for the entire range of an unsigned long without requiring a second log invocation.
using System;
using System.Linq;

// the lookup would only be generated once
// and could be a hard-coded array literal
ulong[] lookup = Enumerable.Range(0, 20)
.Select((n) => (ulong)Math.Pow(10, n)).ToArray();
ulong x = 999;
int i = 0;
for (; i < lookup.Length; i++) {
if (lookup[i] > x) {
break;
}
}
// i is length of x "in a base-10 string"
// does not work with "0" or negative numbers
This lookup-table approach can be easily converted to any base. This method should be faster than the iterative divide-by-base approach but profiling is left as an exercise to the reader. (A direct if-then branch broken into "groups" is likely quicker yet, but that's way too much repetitive typing for my tastes.)
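One way to read the "in conjunction with Log10" idea, as a sketch under my own assumptions: take the Log10 estimate, which may be off by one near powers of ten, and correct it with one or two table comparisons, since a d-digit number satisfies 10^(d-1) <= x < 10^d. DigitCountLog10 is a hypothetical name, and the table is built with integer multiplication instead of Math.Pow so every entry is exact.

```csharp
using System;

// 10^0 .. 10^19, built with exact integer multiplication.
ulong[] powersOfTen = new ulong[20];
powersOfTen[0] = 1;
for (int k = 1; k < powersOfTen.Length; k++)
    powersOfTen[k] = powersOfTen[k - 1] * 10;

// Hypothetical helper: Log10 estimate plus a table-based correction.
int DigitCountLog10(ulong x)
{
    if (x == 0) return 1;
    int estimate = (int)Math.Floor(Math.Log10(x)) + 1;
    estimate = Math.Max(1, Math.Min(20, estimate));

    if (x < powersOfTen[estimate - 1]) return estimate - 1;               // estimate too high
    if (estimate < 20 && x >= powersOfTen[estimate]) return estimate + 1; // estimate too low
    return estimate;
}

Console.WriteLine(DigitCountLog10(999999999999997)); // 15
Console.WriteLine(DigitCountLog10(999999999999998)); // 15
Console.WriteLine(DigitCountLog10(ulong.MaxValue));  // 20
```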
Happy coding.
