Difference between two large numbers C#

Difference between two large numbers C# - c#

There are already solutions to this problem for small numbers:
Here: Difference between 2 numbers
Here: C# function to find the delta of two numbers
Here: How can I find the difference between 2 values in C#?
I'll summarise the answer to them all:
Math.Abs(a - b)
The problem is when the numbers are large this gives the wrong answer (by means of an overflow). Worse still, if (a - b) = Int32.MinValue then Math.Abs crashes with an exception (because Int32.MaxValue = Int32.MinValue - 1):
System.OverflowException occurred
HResult=0x80131516
Message=Negating the minimum value of a twos complement number is
invalid.
Source=mscorlib
StackTrace: at
System.Math.AbsHelper(Int32 value) at System.Math.Abs(Int32 value)
Its specific nature leads to difficult-to-reproduce bugs.
Maybe I'm missing some well known library function, but is there any way of determining the difference safely?

As suggested by others, use BigInteger as defined in System.Numerics (you'll have to include the namespace in Visual Studio)
Then you can just do:
BigInteger a = new BigInteger();
BigInteger b = new BigInteger();
// Assign values to a and b somewhere in here...
// Then just use included BigInteger.Abs method
BigInteger result = BigInteger.Abs(a - b);
Jeremy Thompson's answer is still valid, but note that the BigInteger namespace includes an absolute value method, so there shouldn't be any need for special logic. Also, Math.Abs expects a decimal, so it will give you grief if you try to pass in a BigInteger.
Keep in mind there are caveats to using BigIntegers. If you have a ludicrously large number, C# will try to allocate memory for it, and you may run into out of memory exceptions. On the flip side, BigIntegers are great because the amount of memory allotted to them is dynamically changed as the number gets larger.
Check out the microsoft reference here for more info: https://msdn.microsoft.com/en-us/library/system.numerics.biginteger(v=vs.110).aspx

The question is, how do you want to hold the difference between two large numbers? If you're calculating the difference between two signed long (64-bit) integers, for example, and the difference will not fit into a signed long integer, how do you intend to store it?
long a = +(1 << 62) + 1000;
long b = -(1 << 62);
long dif = a - b; // Overflow, bit truncation
The difference between a and b is wider than 64 bits, so when it's stored into a long integer, its high-order bits are truncated, and you get a strange value for dif.
In other words, you cannot store all possible differences between signed integer values of a given width into a signed integer of the same width. (You can only store half of all of the possible values; the other half require an extra bit.)
Your options are to either use a wider type to hold the difference (which won't help you if you're already using the widest long integer type), or to use a different arithmetic type. If you need at least 64 signed bits of precision, you'll probably need to use BigInteger.

The BigInteger was introduced in .Net 4.0.
There are some open source implementations available in lower versions of the .Net Framework, however you'd be wise to go with the standard.
If the Math.Abs still gives you grief you can implement the function yourself; if the number is negative (a - b < 0) simply trim the negative symbol so its unsigned.
Also, have you tried using Doubles? They hold much larger values.

Here's an alternative that might be interesting to you, but is very much within the confines of a particular int size. This example uses Int32, and uses bitwise operators to accomplish the difference and then the absolute value. This implementation is tolerant of your scenario where a - b equals the min int value, it naturally returns the min int value (not much else you can do, without casting things to the a larger data type). I don't think this is as good an answer as using BigInteger, but it is fun to play with if nothing else:
static int diff(int a, int b)
{
int xorResult = (a ^ b);
int diff = (a & xorResult) - (b & xorResult);
return (diff + (diff >> 31)) ^ (diff >> 31);
}
Here are some cases I ran it through to play with the behavior:
Console.WriteLine(diff(13, 14)); // 1
Console.WriteLine(diff(11, 9)); // 2
Console.WriteLine(diff(5002000, 2346728)); // 2655272
Console.WriteLine(diff(int.MinValue, 0)); // Should be 2147483648, but int data type can't go that large. Actual result will be -2147483648.

Related

Get random double (floating point) value from random byte array between 0 and 1 in C#?

Assume I have an array of bytes which are truly random (e.g. captured from an entropy source).
byte[] myTrulyRandomBytes = MyEntropyHardwareEngine.GetBytes(8);
Now, I want to get a random double precision floating point value, but between the values of 0 and positive 1 (like the Random.NextDouble() function performs).
Simply passing an array of 8 random bytes into BitConverter.ToDouble() can yield strange results, but most importantly, the results will almost never be less than 1.
I am fine with bit-manipulation, but the formatting of floating point numbers has always been mysterious to me. I tried many combinations of bits to apply randomness to and always ended up finding the numbers were either just over 1, always VERY close to 0, or very large.
Can someone explain which bits should be made random in a double in order to make it random within the range 0 and 1?

Though working answers have been given, I'll give an other one, that looks worse but isn't:
long asLong = BitConverter.ToInt64(myTrulyRandomBytes, 0);
double number = (double)(asLong & long.MaxValue) / long.MaxValue;
The issue with casting from an ulong to double is that it's not directly supported by hardware, so it compiles to this:
vxorps xmm0,xmm0,xmm0
vcvtsi2sd xmm0,xmm0,rcx ; interpret ulong as long and convert it to double
test rcx,rcx ; add fixup if it was "negative"
jge 000000000000001D
vaddsd xmm0,xmm0,mmword ptr [00000060h]
vdivsd xmm0,xmm0,mmword ptr [00000068h]
Whereas with my suggestion it will compile more nicely:
vxorps xmm0,xmm0,xmm0
vcvtsi2sd xmm0,xmm0,rcx
vdivsd xmm0,xmm0,mmword ptr [00000060h]
Both tested with the x64 JIT in .NET 4, but this applies in general, there just isn't a nice way to convert an ulong to a double.
Don't worry about the bit of entropy being lost: there are only 262 doubles between 0.0 and 1.0 in the first place, and most of the smaller doubles cannot be chosen so the number of possible results is even less.
Note that this as well as the presented ulong examples can result in exactly 1.0 and distribute the values with slightly differing gaps between adjacent results because they don't divide by a power of two. You can change them exclude 1.0 and get a slightly more uniform spacing (but see the first plot below, there is a bunch of different gaps, but this way it is very regular) like this:
long asLong = BitConverter.ToInt64(myTrulyRandomBytes, 0);
double number = (double)(asLong & long.MaxValue) / ((double)long.MaxValue + 1);
As a really nice bonus, you can now change the division to a multiplication (powers of two usually have inverses)
long asLong = BitConverter.ToInt64(myTrulyRandomBytes, 0);
double number = (double)(asLong & long.MaxValue) * 1.08420217248550443400745280086994171142578125E-19;
Same idea for ulong, if you really want to use that.
Since you also seemed interested specifically in how to do it with double-bits trickery, I can show that too.
Because of the whole significand/exponent deal, it can't really be done in a super direct way (just reinterpreting the bits and that's it), mainly because choosing the exponent uniformly spells trouble (with a uniform exponent, the numbers are necessarily clumped preferentially near 0 since most exponents are there).
But if the exponent is fixed, it's easy to make a double that's uniform in that region. That cannot be 0 to 1 because that spans a lot of exponents, but it can be 1 to 2 and then we can subtract 1.
So first mask away the bits that won't be part of the significand:
x &= (1L << 52) - 1;
Put in the exponent (1.0 - 2.0 range, excluding 2)
x |= 0x3ff0000000000000;
Reinterpret and adjust for the offset of 1:
return BitConverter.Int64BitsToDouble(x) - 1;
Should be pretty fast, too. An unfortunate side effect is that this time it really does cost a bit of entropy, because there are only 52 but there could have been 53. This way always leaves the least significant bit zero (the implicit bit steals a bit).
There were some concerns about the distributions, which I will address now.
The approach of choosing a random (u)long and dividing it by the maximum value clearly has a uniformly chosen (u)long, and what happens after that is actually interesting. The result can justifiably be called a uniform distribution, but if you look at it as a discrete distribution (which it actually is) it looks (qualitatively) like this: (all examples for minifloats)
Ignore the "thicker" lines and wider gaps, that's just the histogram being funny. These plots used division by a power of two, so there is no spacing problem in reality, it's only plotted strangely.
Top is what happens when you use too many bits, as happens when dividing a complete (u)long by its max value. This gives the lower floats a better resolution, but lots of different (u)longs get mapped onto the same float in the higher regions. That's not necessarily a bad thing, if you "zoom out" the density is the same everywhere.
The bottom is what happens when the resolution is limited to the worst case (0.5 to 1.0 region) everywhere, which you can do by limiting the number of bits first and then doing the "scale the integer" deal. My second suggesting with the bit hacks does not achieve this, it's limited to half that resolution.
For what it's worth, NextDouble in System.Random scales a non-negative int into the 0.0 .. 1.0 range. The resolution of that is obviously a lot lower than it could be. It also uses an int that cannot be int.MaxValue and therefore scales by approximately 1/(231-1) (cannot be represented by a double, so slightly rounded), so there are actually 33 slightly different gaps between adjacent possible results, though the majority of the gaps is the same distance.
Since int.MaxValue is small compared to what can be brute-forced these days, you can easily generate all possible results of NextDouble and examine them, for example I ran this:
const double scale = 4.6566128752458E-10;
double prev = 0;
Dictionary<long, int> hist = new Dictionary<long, int>();
for (int i = 0; i < int.MaxValue; i++)
{
long bits = BitConverter.DoubleToInt64Bits(i * scale - prev);
if (!hist.ContainsKey(bits))
hist[bits] = 1;
else
hist[bits]++;
prev = i * scale;
if ((i & 0xFFFFFF) == 0)
Console.WriteLine("{0:0.00}%", 100.0 * i / int.MaxValue);
}

This is easier than you think; its all about scaling (also true when going from a 0-1 range to some other range).
Basically, if you know that you have 64 truly random bits (8 bytes) then just do this:
double zeroToOneDouble = (double)(BitConverter.ToUInt64(bytes) / (decimal)ulong.MaxValue);
The trouble with this kind of algorithm comes when your "random" bits aren't actually uniformally random. That's when you need a specialized algorithm, such as a Mersenne Twister.

I don't know wether it's the best solution for this, but it should do the job:
ulong asLong = BitConverter.ToUInt64(myTrulyRandomBytes, 0);
double number = (double)asLong / ulong.MaxValue;
All I'm doing is converting the byte array to a ulong which is then divided by it's max value, so that the result is between 0 and 1.

To make sure the long value is within the range from 0 to 1, you can apply the following mask:
long longValue = BitConverter.ToInt64(myTrulyRandomBytes, 0);
longValue &= 0x3fefffffffffffff;
The resulting value is guaranteed to lay in the range [0, 1).
Remark. The 0x3fefffffffffffff value is very-very close to 1 and will be printed as 1, but it is really a bit less than 1.
If you want to make the generated values greater, you could set a number higher bits of an exponent to 1. For instance:
longValue |= 0x03c00000000000000;
Summarizing: example on dotnetfiddle.

If you care about the quality of the random numbers generated, be very suspicious of the answers that have appeared so far.
Those answers that use Int64BitsToDouble directly will definitely have problems with NaNs and infinities. For example, 0x7ff0000000000001, a perfectly good random bit pattern, converts to NaN (and so do thousands of others).
Those that try to convert to a ulong and then scale, or convert to a double after ensuring that various bit-pattern constraints are met, won't have NaN problems, but they are very likely to have distributional problems. Representable floating point numbers are not distributed uniformly over (0, 1), so any scheme that randomly picks among all representable values will not produce values with the required uniformity.
To be safe, just use ToInt32 and use that int as a seed for Random. (To be extra safe, reject 0.) This won't be as fast as the other schemes, but it will be much safer. A lot of research and effort has gone into making RNGs good in ways that are not immediately obvious.

Simple piece of code to print the bits out for you.
for (double i = 0; i < 1.0; i+=0.05)
{
var doubleToInt64Bits = BitConverter.DoubleToInt64Bits(i);
Console.WriteLine("{0}:\t{1}", i, Convert.ToString(doubleToInt64Bits, 2));
}
0.05: 11111110101001100110011001100110011001100110011001100110011010
0.1: 11111110111001100110011001100110011001100110011001100110011010
0.15: 11111111000011001100110011001100110011001100110011001100110100
0.2: 11111111001001100110011001100110011001100110011001100110011010
0.25: 11111111010000000000000000000000000000000000000000000000000000
0.3: 11111111010011001100110011001100110011001100110011001100110011
0.35: 11111111010110011001100110011001100110011001100110011001100110
0.4: 11111111011001100110011001100110011001100110011001100110011001
0.45: 11111111011100110011001100110011001100110011001100110011001100
0.5: 11111111011111111111111111111111111111111111111111111111111111
0.55: 11111111100001100110011001100110011001100110011001100110011001
0.6: 11111111100011001100110011001100110011001100110011001100110011
0.65: 11111111100100110011001100110011001100110011001100110011001101
0.7: 11111111100110011001100110011001100110011001100110011001100111
0.75: 11111111101000000000000000000000000000000000000000000000000001
0.8: 11111111101001100110011001100110011001100110011001100110011011
0.85: 11111111101011001100110011001100110011001100110011001100110101
0.9: 11111111101100110011001100110011001100110011001100110011001111
0.95: 11111111101110011001100110011001100110011001100110011001101001

BigInteger to double incorrect rounding

I'm attempting to convert a BigInteger value to a double-precision value using the explicit cast operator in C#, but not getting the convergent rounding behavior I expected. I can't find any documentation on the rounding mode for this operation, but it doesn't match that of a long to double cast (which is also undocumented). Bug or feature? I've implemented my own rounding routine to work around it, but would rather stick with the built-in functionality.
Here's a test that passes as written (VS 2012 on Windows 7):
BigInteger roundMeDown = BigInteger.Pow(2, 53) + 1;
double expectedRoundedDown = Math.Pow(2, 53);
BigInteger roundMeUp = BigInteger.Pow(2, 53) + 3;
double expectedRoundedUp = Math.Pow(2, 53) + 4;
double actualResult = Math.Pow(2, 53) + 2;
Assert.AreEqual(expectedRoundedDown, (double)roundMeDown);
Assert.AreNotEqual(expectedRoundedUp, (double)roundMeUp);
Assert.AreEqual(actualResult, (double)roundMeUp);

http://referencesource.microsoft.com/#System.Numerics/System/Numerics/BigInteger.cs
There's the actual source code, which will show you exactly how it does it.
In short, it looks like it basically chops and shifts, without any rounding occurring at all. Note it calls a method called GetApproxParts.
However, it should be easy to hack on at the end if you need special rounding.
Basically, simply look at the NumBits - 53rd bit (from most to least significant) on the BigInteger using the >> and | operators (doubles have 52 bits in the mantissa).
There's basically three cases, and only the last is handled differently based on different rounding modes you might want to employ.
If the 53rd bit is not set, you don't need to round.
If it is, check the bits after it. If any are set, round up (add Double.Epsilon).
If it is set and no bits after it are set, you are exactly in the middle of two valid double values. Do whatever is reasonable, so long as it is consistent.

Convert float to double loses precision but not via ToString

I have the following code:
float f = 0.3f;
double d1 = System.Convert.ToDouble(f);
double d2 = System.Convert.ToDouble(f.ToString());
The results are equivalent to:
d1 = 0.30000001192092896;
d2 = 0.3;
I'm curious to find out why this is?

Its not a loss of precision .3 is not representable in floating point. When the system converts to the string it rounds; if you print out enough significant digits you will get something that makes more sense.
To see it more clearly
float f = 0.3f;
double d1 = System.Convert.ToDouble(f);
double d2 = System.Convert.ToDouble(f.ToString("G20"));
string s = string.Format("d1 : {0} ; d2 : {1} ", d1, d2);
output
"d1 : 0.300000011920929 ; d2 : 0.300000012 "

You're not losing precision; you're upcasting to a more precise representation (double, 64-bits long) from a less precise representation (float, 32-bits long). What you get in the more precise representation (past a certain point) is just garbage. If you were to cast it back to a float FROM a double, you would have the exact same precision as you did before.
What happens here is that you've got 32 bits allocated for your float. You then upcast to a double, adding another 32 bits for representing your number (for a total of 64). Those new bits are the least significant (the farthest to the right of your decimal point), and have no bearing on the actual value since they were indeterminate before. As a result, those new bits have whatever values they happened to have when you did your upcast. They're just as indeterminate as they were before -- garbage, in other words.
When you downcast from a double to a float, it'll lop off those least-significant bits, leaving you with 0.300000 (7 digits of precision).
The mechanism for converting from a string to a float is different; the compiler needs to analyze the semantic meaning of the character string '0.3f' and figure out how that relates to a floating point value. It can't be done with bit-shifting like the float/double conversion -- thus, the value that you expect.
For more info on how floating point numbers work, you may be interested in checking out this wikipedia article on the IEEE 754-1985 standard (which has some handy pictures and good explanation of the mechanics of things), and this wiki article on the updates to the standard in 2008.
edit:
First, as #phoog pointed out below, upcasting from a float to a double isn't as simple as adding another 32 bits to the space reserved to record the number. In reality, you'll get an addition 3 bits for the exponent (for a total of 11), and an additional 29 bits for the fraction (for a total of 52). Add in the sign bit and you've got your total of 64 bits for the double.
Additionally, suggesting that there are 'garbage bits' in those least significant locations a gross generalization, and probably not be correct for C#. A bit of explanation, and some testing below suggests to me that this is deterministic for C#/.NET, and probably the result of some specific mechanism in the conversion rather than reserving memory for additional precision.
Way back in the beforetimes, when your code would compile into a machine-language binary, compilers (C and C++ compilers, at least) would not add any CPU instructions to 'clear' or initialize the value in memory when you reserved space for a variable. So, unless the programmer explicitly initialized a variable to some value, the values of the bits that were reserved for that location would maintain whatever value they had before you reserved that memory.
In .NET land, your C# or other .NET language compiles into an intermediate language (CIL, Common Intermediate Language), which is then Just-In-Time compiled by the CLR to execute as native code. There may or may not be an variable initialization step added by either the C# compiler or the JIT compiler; I'm not sure.
Here's what I do know:
I tested this by casting the float to three different doubles. Each one of the results had the exact same value.
That value was exactly the same as #rerun's value above: double d1 = System.Convert.ToDouble(f); result: d1 : 0.300000011920929
I get the same result if I cast using double d2 = (double)f; Result: d2 : 0.300000011920929
With three of us getting the same values, it looks like the upcast value is deterministic (and not actually garbage bits), indicating that .NET is doing something the same way across all of our machines. It's still true to say that the additional digits are no more or less precise than they were before, because 0.3f isn't exactly equal to 0.3 -- it's equal to 0.3, up to seven digits of precision. We know nothing about the values of additional digits beyond those first seven.

I use decimal cast for correct result in this case and same other case
float ff = 99.95f;
double dd = (double)(decimal)ff;

Get number of digits in an unsigned long integer c#

I'm trying to determine the number of digits in a c# ulong number, i'm trying to do so using some math logic rather than using ToString().Length. I have not benchmarked the 2 approaches but have seen other posts about using System.Math.Floor(System.Math.Log10(number)) + 1 to determine the number of digits.
Seems to work fine until i transition from 999999999999997 to 999999999999998 at which point, it i start getting an incorrect count.
Has anyone encountered this issue before ?
I have seen similar posts with a Java emphasis # Why log(1000)/log(10) isn't the same as log10(1000)? and also a post # How to get the separate digits of an int number? which indicates how i could possibly achieve the same using the % operator but with a lot more code
Here is the code i used to simulate this
Action<ulong> displayInfo = number =>
Console.WriteLine("{0,-20} {1,-20} {2,-20} {3,-20} {4,-20}",
number,
number.ToString().Length,
System.Math.Log10(number),
System.Math.Floor(System.Math.Log10(number)),
System.Math.Floor(System.Math.Log10(number)) + 1);
Array.ForEach(new ulong[] {
9U,
99U,
999U,
9999U,
99999U,
999999U,
9999999U,
99999999U,
999999999U,
9999999999U,
99999999999U,
999999999999U,
9999999999999U,
99999999999999U,
999999999999999U,
9999999999999999U,
99999999999999999U,
999999999999999999U,
9999999999999999999U}, displayInfo);
Array.ForEach(new ulong[] {
1U,
19U,
199U,
1999U,
19999U,
199999U,
1999999U,
19999999U,
199999999U,
1999999999U,
19999999999U,
199999999999U,
1999999999999U,
19999999999999U,
199999999999999U,
1999999999999999U,
19999999999999999U,
199999999999999999U,
1999999999999999999U
}, displayInfo);
Thanks in advance
Pat

log10 is going to involve floating point conversion - hence the rounding error. The error is pretty small for a double, but is a big deal for an exact integer!
Excluding the .ToString() method and a floating point method, then yes I think you are going to have to use an iterative method but I would use an integer divide rather than a modulo.
Integer divide by 10. Is the result>0? If so iterate around. If not, stop.
The number of digits is the number of iterations required.
Eg. 5 -> 0; 1 iteration = 1 digit.
1234 -> 123 -> 12 -> 1 -> 0; 4 iterations = 4 digits.

I would use ToString().Length unless you know this is going to be called millions of times.
"premature optimization is the root of all evil" - Donald Knuth

From the documentation:
By default, a Double value contains 15
decimal digits of precision, although
a maximum of 17 digits is maintained
internally.
I suspect that you're running into precision limits. Your value of 999,999,999,999,998 probably is at the limit of precision. And since the ulong has to be converted to double before calling Math.Log10, you see this error.

Other answers have posted why this happens.
Here is an example of a fairly quick way to determine the "length" of an integer (some cases excluded). This by itself is not very interesting -- but I include it here because using this method in conjunction with Log10 can get the accuracy "perfect" for the entire range of an unsigned long without requiring a second log invocation.
// the lookup would only be generated once
// and could be a hard-coded array literal
ulong[] lookup = Enumerable.Range(0, 20)
.Select((n) => (ulong)Math.Pow(10, n)).ToArray();
ulong x = 999;
int i = 0;
for (; i < lookup.Length; i++) {
if (lookup[i] > x) {
break;
}
}
// i is length of x "in a base-10 string"
// does not work with "0" or negative numbers
This lookup-table approach can be easily converted to any base. This method should be faster than the iterative divide-by-base approach but profiling is left as an exercise to the reader. (A direct if-then branch broken into "groups" is likely quicker yet, but that's way too much repetitive typing for my tastes.)
Happy coding.

How do you deal with numbers larger than UInt64 (C#)

In C#, how can one store and calculate with numbers that significantly exceed UInt64's max value (18,446,744,073,709,551,615)?

Can you use the .NET 4.0 beta? If so, you can use BigInteger.
Otherwise, if you're sticking within 28 digits, you can use decimal - but be aware that obviously that's going to perform decimal arithmetic, so you may need to round at various places to compensate.

By using a BigInteger class; there's one in the the J# libraries (definitely accessible from C#), another in F# (need to test this one), and there are freestanding implementations such as this one in pure C#.

What is it that you wish to use these numbers for? If you are doing calculations with really big numbers, do you still need the accuracy down to the last digit?
If not, you should consider using floating point values instead. They can be huge, the max value for the double type is 1.79769313486231570E+308, (in case you are not used to scientific notation it means 1.79769313486231570 multiplied by 10000000...0000 - 308 zeros).
That should be large enough for most applications

BigInteger represents an arbitrarily large signed integer.
using System.Numerics;
var a = BigInteger.Parse("91389681247993671255432112000000");
var b = new BigInteger(1790322312);
var c = a * b;

Decimal has greater range.
There is support for bigInteger in .NET 4.0 but that is still not out of beta.

There are several libraries for computing with big integers, most of the cryptography libraries also offer a class for that. See this for a free library.

Also, do check that you truly need a variable with greater capacity than Int64 and aren't falling foul of C#'s integer arithmetic.
For example, this code will yield an overflow error:
Int64 myAnswer = 20000*1024*1024;
At first glance that might seem to be because the result is too large for an Int64 variable, but actually it's because each of the numbers on the right side of the formula are implicitly typed as Int32 so the temporary memory space reserved for the result of the calculation will be Int32 size, and that's what causes the overflow.
The result will actually easily fit into an Int64, it just needs one of the values on the right to be cast to Int64:
Int64 myAnswer = (Int64)20000*1024*1024;
This is discussed in more detail in this answer.
(I appreciate this doesn't apply in the OP's case, but it was just this sort of issue that brought me here!)

You can use decimal. It is greater than Int64.
It has 28-29 significant digits.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.