Bitwise left shift in Python and C#

Why does a bitwise left shift produce different values in Python and C#?
Python:
>>> 2466250752<<1
4932501504L
C#:
System.Console.Write((2466250752 << 1).ToString()); // output is 637534208

You are overflowing the 32-bit (unsigned) integer in C#.
In Python, all integers are arbitrary-precision. That means an integer will expand to whatever size is required. Note that I added the underscores:
>>> a = 2466250752
>>>
>>> hex(a)
'0x9300_0000L'
>>>
>>> hex(a << 1)
'0x1_2600_0000L'
^-------------- Note the additional place
In C#, uint is only 32-bits. When you shift left, you are exceeding the size of the integer, causing overflow.
Before shifting: a == 0x9300_0000
After shifting: a << 1 == 0x2600_0000 (truncated to 32 bits)
Notice that the C# result does not have the leading 1 that Python showed.
To get around this limitation for this case, you can use a ulong which is 64 bits instead of 32 bits. This will work for values up to 2^64 - 1.
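For example, a minimal sketch of the 64-bit version (using the value from the question):
ulong a = 2466250752;
System.Console.WriteLine(a << 1); // 4932501504, no truncation this time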

Python makes sure your integers don't overflow, while C# allows for overflow (but throws an exception on overflow in a checked context). In practice this means you can treat Python integers as having infinite width, while a C# int or uint is always 4 bytes.
Notice in your Python example that the value "4932501504L" has a trailing L, which means long integer. Python automatically performs math in long (size-of-available-memory-width, unlike C#'s long, which is 8 bytes) integers when overflow would occur in int values. You can see the rationale behind this idea in PEP 237.
EDIT: To get the Python result in C#, you cannot use a plain int or long - those types have limited size. One type whose size is limited only by memory is BigInteger. It will be slower than int or long for arithmetic, so I wouldn't recommend using it everywhere, but it can come in handy.
As an example, you can write almost the same code in C#, with the same result as in Python:
Console.WriteLine(new BigInteger(2466250752) << 1);
// output is 4932501504
This works for arbitrary shift sizes. For instance, you can write
Console.WriteLine(new BigInteger(2466250752) << 1000);
// output is 26426089082476043843620786304598663584184261590451906619194221930186703343408641580508146166393907795104656740341094823575842096015719243506448572304002696283531880333455226335616426281383175835559603193956495848019208150304342043576665227249501603863012525070634185841272245152956518296810797380454760948170752
Of course, this would overflow a long.

Related

Get random double (floating point) value from random byte array between 0 and 1 in C#?

Assume I have an array of bytes which are truly random (e.g. captured from an entropy source).
byte[] myTrulyRandomBytes = MyEntropyHardwareEngine.GetBytes(8);
Now, I want to get a random double precision floating point value, but between the values of 0 and positive 1 (like the Random.NextDouble() function performs).
Simply passing an array of 8 random bytes into BitConverter.ToDouble() can yield strange results, but most importantly, the results will almost never be less than 1.
I am fine with bit-manipulation, but the formatting of floating point numbers has always been mysterious to me. I tried many combinations of bits to apply randomness to and always ended up finding the numbers were either just over 1, always VERY close to 0, or very large.
Can someone explain which bits should be made random in a double in order to make it random within the range 0 and 1?
Though working answers have been given, I'll give another one that looks worse but isn't:
long asLong = BitConverter.ToInt64(myTrulyRandomBytes, 0);
double number = (double)(asLong & long.MaxValue) / long.MaxValue;
The issue with casting from a ulong to double is that it's not directly supported by hardware, so it compiles to this:
vxorps xmm0,xmm0,xmm0
vcvtsi2sd xmm0,xmm0,rcx ; interpret ulong as long and convert it to double
test rcx,rcx ; add fixup if it was "negative"
jge 000000000000001D
vaddsd xmm0,xmm0,mmword ptr [00000060h]
vdivsd xmm0,xmm0,mmword ptr [00000068h]
Whereas with my suggestion it will compile more nicely:
vxorps xmm0,xmm0,xmm0
vcvtsi2sd xmm0,xmm0,rcx
vdivsd xmm0,xmm0,mmword ptr [00000060h]
Both tested with the x64 JIT in .NET 4, but this applies in general; there just isn't a nice way to convert a ulong to a double.
Don't worry about the bit of entropy being lost: there are only 2^62 doubles between 0.0 and 1.0 in the first place, and most of the smaller doubles cannot be chosen, so the number of possible results is even less.
Note that this, as well as the presented ulong examples, can result in exactly 1.0 and can distribute the values with slightly differing gaps between adjacent results, because they don't divide by a power of two. You can change them to exclude 1.0 and get a slightly more uniform spacing (see the first plot below: the original way has a bunch of different gaps, but this way it is very regular), like this:
long asLong = BitConverter.ToInt64(myTrulyRandomBytes, 0);
double number = (double)(asLong & long.MaxValue) / ((double)long.MaxValue + 1);
As a really nice bonus, you can now change the division to a multiplication (the reciprocal of a power of two is exactly representable):
long asLong = BitConverter.ToInt64(myTrulyRandomBytes, 0);
double number = (double)(asLong & long.MaxValue) * 1.08420217248550443400745280086994171142578125E-19;
Same idea for ulong, if you really want to use that.
Since you also seemed interested specifically in how to do it with double-bits trickery, I can show that too.
Because of the whole significand/exponent deal, it can't really be done in a super direct way (just reinterpreting the bits and that's it), mainly because choosing the exponent uniformly spells trouble (with a uniform exponent, the numbers are necessarily clumped preferentially near 0 since most exponents are there).
But if the exponent is fixed, it's easy to make a double that's uniform in that region. That cannot be 0 to 1 because that spans a lot of exponents, but it can be 1 to 2 and then we can subtract 1.
So first mask away the bits that won't be part of the significand:
x &= (1L << 52) - 1;
Put in the exponent (1.0 - 2.0 range, excluding 2)
x |= 0x3ff0000000000000;
Reinterpret and adjust for the offset of 1:
return BitConverter.Int64BitsToDouble(x) - 1;
Should be pretty fast, too. An unfortunate side effect is that this time it really does cost a bit of entropy, because only 52 bits are random where there could have been 53. This way always leaves the least significant bit of the result zero (the implicit bit steals a bit).
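Putting those three steps together, a minimal sketch (the method name and reading the input bytes with BitConverter.ToInt64 are my own choices):
static double BitsToUnitDouble(byte[] randomBytes)
{
    long x = BitConverter.ToInt64(randomBytes, 0);
    x &= (1L << 52) - 1;                          // keep only the 52 significand bits
    x |= 0x3ff0000000000000;                      // exponent bits for the [1.0, 2.0) range
    return BitConverter.Int64BitsToDouble(x) - 1; // shift down to [0.0, 1.0)
}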
There were some concerns about the distributions, which I will address now.
The approach of choosing a random (u)long and dividing it by the maximum value clearly has a uniformly chosen (u)long, and what happens after that is actually interesting. The result can justifiably be called a uniform distribution, but if you look at it as a discrete distribution (which it actually is) it looks (qualitatively) like this: (all examples for minifloats)
Ignore the "thicker" lines and wider gaps; that's just the histogram being funny. These plots used division by a power of two, so there is no spacing problem in reality; it's only plotted strangely.
Top is what happens when you use too many bits, as happens when dividing a complete (u)long by its max value. This gives the lower floats a better resolution, but lots of different (u)longs get mapped onto the same float in the higher regions. That's not necessarily a bad thing; if you "zoom out", the density is the same everywhere.
The bottom is what happens when the resolution is limited to the worst case (the 0.5 to 1.0 region) everywhere, which you can do by limiting the number of bits first and then doing the "scale the integer" deal. My second suggestion with the bit hacks does not achieve this; it's limited to half that resolution.
For what it's worth, NextDouble in System.Random scales a non-negative int into the 0.0 .. 1.0 range. The resolution of that is obviously a lot lower than it could be. It also uses an int that cannot be int.MaxValue and therefore scales by approximately 1/(2^31 - 1) (which cannot be represented exactly by a double, so it is slightly rounded), so there are actually 33 slightly different gaps between adjacent possible results, though the majority of the gaps are the same size.
Since int.MaxValue is small compared to what can be brute-forced these days, you can easily generate all possible results of NextDouble and examine them, for example I ran this:
const double scale = 4.6566128752458E-10;
double prev = 0;
Dictionary<long, int> hist = new Dictionary<long, int>();
for (int i = 0; i < int.MaxValue; i++)
{
    long bits = BitConverter.DoubleToInt64Bits(i * scale - prev);
    if (!hist.ContainsKey(bits))
        hist[bits] = 1;
    else
        hist[bits]++;
    prev = i * scale;
    if ((i & 0xFFFFFF) == 0)
        Console.WriteLine("{0:0.00}%", 100.0 * i / int.MaxValue);
}
This is easier than you think; it's all about scaling (also true when going from a 0-1 range to some other range).
Basically, if you know that you have 64 truly random bits (8 bytes) then just do this:
double zeroToOneDouble = (double)(BitConverter.ToUInt64(bytes, 0) / (decimal)ulong.MaxValue);
The trouble with this kind of algorithm comes when your "random" bits aren't actually uniformly random. That's when you need a specialized algorithm, such as a Mersenne Twister.
I don't know whether it's the best solution for this, but it should do the job:
ulong asLong = BitConverter.ToUInt64(myTrulyRandomBytes, 0);
double number = (double)asLong / ulong.MaxValue;
All I'm doing is converting the byte array to a ulong, which is then divided by its max value, so that the result is between 0 and 1.
To make sure the long value is within the range from 0 to 1, you can apply the following mask:
long longValue = BitConverter.ToInt64(myTrulyRandomBytes, 0);
longValue &= 0x3fefffffffffffff;
The resulting value is guaranteed to lie in the range [0, 1).
Remark: the 0x3fefffffffffffff value is very, very close to 1 and will be printed as 1, but it is really a bit less than 1.
If you want to make the generated values larger, you can set some of the higher exponent bits to 1. For instance:
longValue |= 0x3c00000000000000;
Summarizing: example on dotnetfiddle.
If you care about the quality of the random numbers generated, be very suspicious of the answers that have appeared so far.
Those answers that use Int64BitsToDouble directly will definitely have problems with NaNs and infinities. For example, 0x7ff0000000000001, a perfectly good random bit pattern, converts to NaN (and so do thousands of others).
Those that try to convert to a ulong and then scale, or convert to a double after ensuring that various bit-pattern constraints are met, won't have NaN problems, but they are very likely to have distributional problems. Representable floating point numbers are not distributed uniformly over (0, 1), so any scheme that randomly picks among all representable values will not produce values with the required uniformity.
To be safe, just use ToInt32 and use that int as a seed for Random. (To be extra safe, reject 0.) This won't be as fast as the other schemes, but it will be much safer. A lot of research and effort has gone into making RNGs good in ways that are not immediately obvious.
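A minimal sketch of that approach (variable names reuse the question's; replacing a zero seed with 1 is just one way to reject it):
int seed = BitConverter.ToInt32(myTrulyRandomBytes, 0);
if (seed == 0) seed = 1;                      // reject 0, as suggested above
double value = new Random(seed).NextDouble(); // in [0.0, 1.0)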
Simple piece of code to print the bits out for you.
for (double i = 0; i < 1.0; i += 0.05)
{
    var doubleToInt64Bits = BitConverter.DoubleToInt64Bits(i);
    Console.WriteLine("{0}:\t{1}", i, Convert.ToString(doubleToInt64Bits, 2));
}
0.05: 11111110101001100110011001100110011001100110011001100110011010
0.1: 11111110111001100110011001100110011001100110011001100110011010
0.15: 11111111000011001100110011001100110011001100110011001100110100
0.2: 11111111001001100110011001100110011001100110011001100110011010
0.25: 11111111010000000000000000000000000000000000000000000000000000
0.3: 11111111010011001100110011001100110011001100110011001100110011
0.35: 11111111010110011001100110011001100110011001100110011001100110
0.4: 11111111011001100110011001100110011001100110011001100110011001
0.45: 11111111011100110011001100110011001100110011001100110011001100
0.5: 11111111011111111111111111111111111111111111111111111111111111
0.55: 11111111100001100110011001100110011001100110011001100110011001
0.6: 11111111100011001100110011001100110011001100110011001100110011
0.65: 11111111100100110011001100110011001100110011001100110011001101
0.7: 11111111100110011001100110011001100110011001100110011001100111
0.75: 11111111101000000000000000000000000000000000000000000000000001
0.8: 11111111101001100110011001100110011001100110011001100110011011
0.85: 11111111101011001100110011001100110011001100110011001100110101
0.9: 11111111101100110011001100110011001100110011001100110011001111
0.95: 11111111101110011001100110011001100110011001100110011001101001

Difference between two large numbers C#

There are already solutions to this problem for small numbers:
Here: Difference between 2 numbers
Here: C# function to find the delta of two numbers
Here: How can I find the difference between 2 values in C#?
I'll summarise the answer to them all:
Math.Abs(a - b)
The problem is that when the numbers are large this gives the wrong answer (because of an overflow). Worse still, if (a - b) equals Int32.MinValue then Math.Abs throws an exception (because the magnitude of Int32.MinValue is one greater than Int32.MaxValue):
System.OverflowException occurred
  HResult=0x80131516
  Message=Negating the minimum value of a twos complement number is invalid.
  Source=mscorlib
  StackTrace:
   at System.Math.AbsHelper(Int32 value)
   at System.Math.Abs(Int32 value)
Its specific nature leads to difficult-to-reproduce bugs.
Maybe I'm missing some well known library function, but is there any way of determining the difference safely?
As suggested by others, use BigInteger as defined in System.Numerics (you'll have to include the namespace in Visual Studio)
Then you can just do:
BigInteger a = new BigInteger();
BigInteger b = new BigInteger();
// Assign values to a and b somewhere in here...
// Then just use included BigInteger.Abs method
BigInteger result = BigInteger.Abs(a - b);
Jeremy Thompson's answer is still valid, but note that System.Numerics includes an absolute value method (BigInteger.Abs), so there shouldn't be any need for special logic. Also, Math.Abs has no overload that accepts a BigInteger, so it will give you grief if you try to pass one in.
Keep in mind there are caveats to using BigIntegers. If you have a ludicrously large number, C# will try to allocate memory for it, and you may run into out of memory exceptions. On the flip side, BigIntegers are great because the amount of memory allotted to them is dynamically changed as the number gets larger.
Check out the microsoft reference here for more info: https://msdn.microsoft.com/en-us/library/system.numerics.biginteger(v=vs.110).aspx
The question is, how do you want to hold the difference between two large numbers? If you're calculating the difference between two signed long (64-bit) integers, for example, and the difference will not fit into a signed long integer, how do you intend to store it?
long a = (1L << 62) + 1000;
long b = -(1L << 62);
long dif = a - b; // Overflow, bit truncation
The difference between a and b is wider than 64 bits, so when it's stored into a long integer, its high-order bits are truncated, and you get a strange value for dif.
In other words, you cannot store all possible differences between signed integer values of a given width into a signed integer of the same width. (You can only store half of all of the possible values; the other half require an extra bit.)
Your options are to either use a wider type to hold the difference (which won't help you if you're already using the widest integer type), or to use a different arithmetic type. If you need more than 64 signed bits of precision, you'll probably need to use BigInteger.
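For example, when the inputs are 32-bit ints, widening to long before subtracting is enough; a minimal sketch:
static long Difference(int a, int b)
{
    // The difference of any two 32-bit ints always fits in a 64-bit long.
    return Math.Abs((long)a - b);
}

Console.WriteLine(Difference(int.MinValue, int.MaxValue)); // 4294967295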
The BigInteger was introduced in .Net 4.0.
There are some open source implementations available in lower versions of the .Net Framework, however you'd be wise to go with the standard.
If Math.Abs still gives you grief you can implement the function yourself: if the number is negative (a - b < 0), simply flip the sign so the result is positive.
Also, have you tried using Doubles? They hold much larger values.
Here's an alternative that might be interesting to you, but it is very much within the confines of a particular int size. This example uses Int32 and uses bitwise operators to compute the difference and then the absolute value. This implementation is tolerant of your scenario where a - b equals the min int value; it naturally returns the min int value (not much else you can do without casting to a larger data type). I don't think this is as good an answer as using BigInteger, but it is fun to play with if nothing else:
static int diff(int a, int b)
{
    int xorResult = (a ^ b);
    int diff = (a & xorResult) - (b & xorResult);
    return (diff + (diff >> 31)) ^ (diff >> 31);
}
Here are some cases I ran it through to play with the behavior:
Console.WriteLine(diff(13, 14)); // 1
Console.WriteLine(diff(11, 9)); // 2
Console.WriteLine(diff(5002000, 2346728)); // 2655272
Console.WriteLine(diff(int.MinValue, 0)); // Should be 2147483648, but int data type can't go that large. Actual result will be -2147483648.

Why Do Bytes Carryover?

I have been playing with some byte arrays recently (dealing with grayscale images). A byte can have values 0-255. I was modifying the bytes, and came across a situation where the value I was assigning to the byte was outside the bounds of the byte. It was doing unexpected things to the images I was playing with.
I wrote a test and learned that the byte carries over. Example:
private static int SetByte(int y)
{
    return y;
}
.....
byte x = (byte) SetByte(-4);
Console.WriteLine(x);
//output is 252
There is a carryover! This happens when we go the other way around as well.
byte x = (byte) SetByte(259);
Console.WriteLine(x);
//output is 3
I would have expected it to clamp to 0 in the first situation and to 255 in the second. What is the purpose of this carry over? Is it just due to the fact that I'm casting this integer assignment? When is this useful in the real world?
byte x = (byte) SetByte(259);
Console.WriteLine(x);
//output is 3
The cast of the result of SetByte is applying modulo 256 to your integer input, effectively dropping bits that are outside the range of a byte.
259 % 256 = 3
Why: the implementers chose to consider only the 8 least significant bits, ignoring the rest.
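In other words, for non-negative inputs the cast and the modulo agree; a quick sketch:
int y = 259;
Console.WriteLine((byte)y); // 3
Console.WriteLine(y % 256); // 3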
When compiling C# you can specify whether the assembly should be compiled in checked or unchecked mode (unchecked is the default). You are also able to make certain parts of code explicit via the checked or unchecked keywords.
You are currently using unchecked mode which ignores arithmetic overflow and truncates the value. The checked mode will check for possible overflows and throw if they are encountered.
Try the following:
int y = 259;
byte x = checked((byte)y);
And you will see it throws an OverflowException.
The reason the behaviour in unchecked mode is to truncate rather than clamp is largely performance: every unchecked cast would require conditional logic to clamp the value, when the majority of the time it is unnecessary and can be done manually.
Another reason is that clamping would involve a loss of data which may not be desirable. I don't condone code such as the following but have seen it (see this answer):
int input = 259;
var firstByte = (byte)input;
var secondByte = (byte)(input >> 8);
int reconstructed = (int)firstByte + (secondByte << 8);
Assert.AreEqual(reconstructed, input);
If firstByte came out as anything other than 3 this would not work at all.
One of the places I most commonly rely upon numeric carry over is when implementing GetHashCode(); see this answer to What is the best algorithm for an overridden System.Object.GetHashCode by Jon Skeet. It would be a nightmare to implement GetHashCode decently if overflowing meant we were constrained to Int32.MaxValue.
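For illustration, a combiner in that style deliberately relies on wrap-around arithmetic; a minimal sketch along the lines of the linked answer (the class and its fields are made up):
class Point
{
    public int X, Y; // made-up fields, purely for illustration

    public override int GetHashCode()
    {
        unchecked // let the multiplications wrap instead of throwing
        {
            int hash = 17;
            hash = hash * 23 + X;
            hash = hash * 23 + Y;
            return hash;
        }
    }
}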
The method SetByte is irrelevant; casting the value 259 to byte will also result in 3, since narrowing conversions between integral types simply discard the high-order bits.
You can create a custom clamp function:
public static byte Clamp(int n) {
    if (n <= 0) return 0;
    if (n >= 256) return 255;
    return (byte) n;
}
Doing arithmetic modulo 2^n makes it possible for overflow errors in different directions to cancel each other out.
byte under = unchecked((byte)(-12)); // = 244
byte over = unchecked((byte)260);    // = 4
byte total = (byte)(under + over);
Console.WriteLine(total);            // prints 248, as intended
If .NET instead had overflows saturate, then the above program would print the incorrect answer 255.
Bounds checking is not performed for a direct type cast (when using (byte)), to avoid reducing performance.
FYI, the result of most operations on byte operands is int. Use Convert.ToByte() and you will get an OverflowException, which you can handle by assigning 255 to your target.
Or you may create a function to do this check, as mentioned in another answer.
If performance is key, try adding the [MethodImpl(MethodImplOptions.AggressiveInlining)] attribute to that function.
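A minimal sketch of that idea (illustrative only; catching an exception per value is far slower than a plain range check):
int value = 259;
byte b;
try
{
    b = Convert.ToByte(value); // throws OverflowException for values outside 0..255
}
catch (OverflowException)
{
    b = value > 255 ? (byte)255 : (byte)0; // clamp instead
}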

Implement function from C++ in C# (MAKE_HRESULT - Windows function)

I have such code in C++
#define dppHRESULT(Code) \
MAKE_HRESULT(1, 138, (Code))
long x = dppHRESULT(101);
result being x = -2138439579.
MAKE_HRESULT is a windows function and defined as
#define MAKE_HRESULT(sev,fac,code) \
((HRESULT) (((unsigned long)(sev)<<31) | ((unsigned long)(fac)<<16) | ((unsigned long)(code))) )
I need to replicate this in C#. So I wrote this code:
public static long MakeHResult(uint facility, uint errorNo)
{
    // Make HR
    uint result = (uint)1 << 31;
    result |= (uint)facility << 16;
    result |= (uint)errorNo;
    return (long) result;
}
And call like:
// Should return type be long actually??
long test = DppUtilities.MakeHResult(138, 101);
But I get different result, test = 2156527717.
Why? Can someone please help me replicate that C++ function also in C#? Such that I get similar output on similar inputs?
Alternative implementation.
If I use this implementation
public static long MakeHResult(ulong facility, ulong errorNo)
{
    // Make HR
    long result = (long)1 << 31;
    result |= (long)facility << 16;
    result |= (long)errorNo;
    return (long) result;
}
this works on input 101.
But if I input -1, then C++ returns -1 as result while C# returns 4294967295. Why?
I would really appreciate some help as I am stuck with it.
I've rewritten the function to be the C# equivalent.
static int MakeHResult(uint facility, uint errorNo)
{
    // Make HR
    uint result = 1U << 31;
    result |= facility << 16;
    result |= errorNo;
    return unchecked((int)result);
}
C# is more strict about signed/unsigned conversions, whereas the original C code didn't pay any mind to it. Mixing signed and unsigned types usually leads to headaches.
As Ben Voigt mentions in his answer, there is a difference in type naming between the two languages. long in C++ (on Windows) is actually int in C#. They both refer to 32-bit types.
The U in 1U means "this is an unsigned integer." (Brief refresher: signed types can store negative numbers, unsigned types cannot.) All the arithmetic in this function is done unsigned, and the final value is simply cast to a signed value at the end. This is the closest approximation to the original C macro posted.
unchecked is required here because otherwise C# will not allow you to convert the value if it's out of range of the target type, even if the bits are identical. Switching between signed and unsigned will generally require this if you don't mind that the values differ when you deal with negative numbers.
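As a quick check with the question's inputs, this version reproduces the C++ value:
int test = MakeHResult(138, 101);
Console.WriteLine(test); // -2138439579, matching the C++ result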
In Windows C++ compilers, long is 32 bits. In C#, long is 64 bits. Your C# conversion of this code should not contain the type keyword long at all.
SaxxonPike has provided the correct translation, but his explanation(s) are missing this vital information.
Your intermediate result is a 32-bit unsigned integer. In the C++ version, the cast is to a signed 32-bit value, resulting in the high bit being reinterpreted as a sign bit. SaxxonPike's code does this as well. The result is negative if the intermediate value had its most significant bit set.
In the original code in the question, the cast is to a 64-bit signed version, which preserves the old high bit as a normal binary digit, and adds a new sign bit (always zero). Thus the result is always positive. Even though the low 32-bits exactly match the 32-bit result in C++, in the C# version returning long, what would be the sign bit in C++ isn't treated as a sign bit.
In the new attempt in the question, the same thing happens (sign bit in the 64-bit number is always zero), but it happens in intermediate calculations instead of at the end.
You're calculating it inside an unsigned type (uint). So shifts are going to behave accordingly. Try using int instead and see what happens.
The clue here is that 2156527717 as an unsigned int is the same as -2138439579 as a signed int. They are literally the same bits.
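For what it's worth, you can see that reinterpretation directly:
Console.WriteLine(unchecked((int)2156527717u));  // -2138439579
Console.WriteLine(unchecked((uint)-2138439579)); // 2156527717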

Different values when ANDING and assigned uints to a char array in C++ and C#

I have a very simple problem that is giving me a really big headache: I am porting a bit of code from C++ to C#, and for a very simple operation I am getting totally different results.
C++
char OutBuff[25];
int i = 0;
unsigned int SumCheck = 46840;
OutBuff[i++] = SumCheck & 0xFF; //these 2 ANDED make 248
The value written to the char array is -8
C#
char[] OutBuff = new char[25];
int i = 0;
uint SumCheck = 46840;
OutBuff[i++] = (char)(SumCheck & 0xFF); //these 2 ANDED also make 248
The value written to the char array is 248.
Interestingly, they are both the same character, so this may be something to do with the format of a char array in C++ and C#, but ultimately I would be grateful if someone could give me a definitive answer.
Thanks in advance for any help.
David
It's overflow in C++, and no overflow in C#.
In C#, char is two bytes. In C++, char is one byte!
So in C#, there is no overflow, and the value is retained. In C++, there is integral overflow.
Change the data type from char to uint16_t or unsigned char (in C++), and you will see the same result. Note that unsigned char can hold the value 248 without overflow; it can hold values up to 255, in fact.
Maybe you should be using byte or sbyte instead of char. (char is only meant to store text characters, and its binary serialization is not the same as in C++; char lets us store characters without worrying about character byte width.)
A C# char is actually 16 bits, while a C++ char is usually 8 bits (a char is exactly 8 bits on Visual C++). So you're actually overflowing the integer in the C++ code, but the C# code does not overflow, since it holds more bits, and therefore has a bigger integer range.
Notice that 248 is outside the range of a signed char (-128 to 127). That should give you a hint that C#'s char might be bigger than 8 bits.
You probably meant to use C#'s sbyte (the closest equivalent to Visual C++'s char) if you want to preserve the behavior. Although you may want to recheck the code, since there's an overflow occurring in the C++ implementation.
As everyone has stated, in C# a char is 16 bits while in C++ it is usually 8 bits.
-8 and 248 in binary both (essentially) look like this:
11111000
Because a char in C++ is usually 8 bits (which is in fact your case), the result is -8. In C#, the value looks like this:
00000000 11111000
Which is 16 bits and becomes 248.
The 2's complement representation of -8 is the same as the binary representation of 248 (unsigned).
So the binary representation is the same in both cases. The C++ result is interpreted as a signed 8-bit value, while in C# it's simply interpreted as a positive integer (int is 32 bits, and truncating to 16 bits by casting to char doesn't affect the sign in this case).
The difference between -8 and 248 is all in how you interpret the data. They are stored exactly the same (0xF8). In C++, the default char type is 'signed'. So, 0xF8 = -8. If you change the data type to 'unsigned char', it will be interpreted as 248. VS also has a compiler option to make 'char' default to 'unsigned'.
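To see the same bit pattern both ways from the C# side, a quick sketch:
byte asUnsigned = 0xF8;                        // 248
sbyte asSigned = unchecked((sbyte)asUnsigned); // -8, same bits
Console.WriteLine(asUnsigned); // 248
Console.WriteLine(asSigned);   // -8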
