Why BinaryReader.ReadInt64() read data as upper/lower separately? - c#

I have questions for BinaryReader by Microsoft.
From https://referencesource.microsoft.com/#mscorlib/system/io/binaryreader.cs ,
public virtual long ReadInt64() {
FillBuffer(8);
uint lo = (uint)(m_buffer[0] | m_buffer[1] << 8 |
m_buffer[2] << 16 | m_buffer[3] << 24);
uint hi = (uint)(m_buffer[4] | m_buffer[5] << 8 |
m_buffer[6] << 16 | m_buffer[7] << 24);
return (long) ((ulong)hi) << 32 | lo;
}
It parse data as two parts (upper 32bit, lower 32bit) for Int64 data. But I think that it is rebundant. So the method like below also should work.
public virtual long ReadInt64Other() {
FillBuffer(8);
return (long)m_buffer[0] |
(long)m_buffer[1] << 8 |
(long)m_buffer[2] << 16 |
(long)m_buffer[3] << 24 |
(long)m_buffer[4] << 32 |
(long)m_buffer[5] << 40 |
(long)m_buffer[6] << 48 |
(long)m_buffer[7] << 56;
}
Is there any reason for Microsoft use upper/lower style parsing?

Related

Bitshift over Int32.max fails

Ok, so I have two methods
public static long ReadLong(this byte[] data)
{
if (data.Length < 8) throw new ArgumentOutOfRangeException("Not enough data");
long length = data[0] | data[1] << 8 | data[2] << 16 | data[3] << 24
| data[4] << 32 | data[5] << 40 | data[6] << 48 | data[7] << 56;
return length;
}
public static void WriteLong(this byte[] data, long i)
{
if (data.Length < 8) throw new ArgumentOutOfRangeException("Not enough data");
data[0] = (byte)((i >> (8*0)) & 0xFF);
data[1] = (byte)((i >> (8*1)) & 0xFF);
data[2] = (byte)((i >> (8*2)) & 0xFF);
data[3] = (byte)((i >> (8*3)) & 0xFF);
data[4] = (byte)((i >> (8*4)) & 0xFF);
data[5] = (byte)((i >> (8*5)) & 0xFF);
data[6] = (byte)((i >> (8*6)) & 0xFF);
data[7] = (byte)((i >> (8*7)) & 0xFF);
}
So WriteLong works correctly(Verified against BitConverter.GetBytes()). The problem is ReadLong. I have a fairly good understanding of this stuff, but I'm guessing what's happening is the or operations are happening as 32 bit ints so at Int32.MaxValue it rolls over. I'm not sure how to avoid that. My first instinct was to make an int from the lower half and an int from the upper half and combine them, but I'm not quite knowledgeable to know even where to start with that, so this is what I tried....
public static long ReadLong(byte[] data)
{
if (data.Length < 8) throw new ArgumentOutOfRangeException("Not enough data");
long l1 = data[0] | data[1] << 8 | data[2] << 16 | data[3] << 24;
long l2 = data[4] | data[5] << 8 | data[6] << 16 | data[7] << 24;
return l1 | l2 << 32;
}
This didn't work though, at least not for larger numbers, it seems to work for everything below zero.
Here's how I run it
void Main()
{
var larr = new long[5]{
long.MinValue,
0,
long.MaxValue,
1,
-2000000000
};
foreach(var l in larr)
{
var arr = new byte[8];
WriteLong(ref arr,l);
Console.WriteLine(ByteString(arr));
var end = ReadLong(arr);
var end2 = BitConverter.ToInt64(arr,0);
Console.WriteLine(l + " == " + end + " == " + end2);
}
}
and here's what I get(using the modified ReadLong method)
0:0:0:0:0:0:0:128
-9223372036854775808 == -9223372036854775808 == -9223372036854775808
0:0:0:0:0:0:0:0
0 == 0 == 0
255:255:255:255:255:255:255:127
9223372036854775807 == -1 == 9223372036854775807
1:0:0:0:0:0:0:0
1 == 1 == 1
0:108:202:136:255:255:255:255
-2000000000 == -2000000000 == -2000000000
The problem is not the or, it is the bitshift. This has to be done as longs. Currently, the data[i] are implicitely converted to int. Just change that to long and that's it. I.e.
public static long ReadLong(byte[] data)
{
if (data.Length < 8) throw new ArgumentOutOfRangeException("Not enough data");
long length = (long)data[0] | (long)data[1] << 8 | (long)data[2] << 16 | (long)data[3] << 24
| (long)data[4] << 32 | (long)data[5] << 40 | (long)data[6] << 48 | (long)data[7] << 56;
return length;
}
You are doing int arithmetic and then assigning to long, try:
long length = data[0] | data[1] << 8L | data[2] << 16L | data[3] << 24L
| data[4] << 32L | data[5] << 40L | data[6] << 48L | data[7] << 56L;
This should define your constants as longs forcing it to use long arithmetic.
EDIT: Turns out this may not work according to comments below as while bitshift takes many operators on the left it only takes int on the right. Georg's should be the accepted answer.

Convert 3 Hex (byte) to Decimal

How do I take the hex 0A 25 10 A2 and get the end result of 851.00625? This must be multiplied by 0.000005. I have tried the following code without success:
byte oct6 = 0x0A;
byte oct7 = 0x25;
byte oct8 = 0x10;
byte oct9 = 0xA2;
decimal BaseFrequency = Convert.ToDecimal((oct9 | (oct8 << 8) | (oct7 << 16) | (oct6 << 32))) * 0.000005M;
I am not getting 851.00625 as the BaseFrequency.
oct6 is being shifted 8 bits too far (32 instead of 24)
decimal BaseFrequency = Convert.ToDecimal((oct9 | (oct8 << 8) | (oct7 << 16) | (oct6 << 24))) * 0.000005M;

C# Bitwise Operators

Could someone please explain what the following code does.
private int ReadInt32(byte[] _il, ref int position)
{
return (((il[position++] | (il[position++] << 8)) | (il[position++] << 0x10)) | (il[position++] << 0x18));
}
I'm not sure I understand how the bitwise operators in this method work, could some please break it down for me?
The integer is given as a byte array.
Then each byte is shifted left 0/8/16/24 places and these values are summed to get the integer value.
This is an Int32 in hexadecimal format:
0x10203040
It is represented as following byte array (little endian architecture, so bytes are in reverse order):
[0x40, 0x30, 0x20, 0x10]
In order to build the integer back from the array, each element is shifted i.e. following logic is performed:
a = 0x40 = 0x00000040
b = 0x30 << 8 = 0x00003000
c = 0x20 << 16 = 0x00200000
d = 0x10 << 24 = 0x10000000
then these values are OR'ed together:
int result = a | b | c | d;
this gives:
0x00000040 |
0x00003000 |
0x00200000 |
0x10000000 |
------------------
0x10203040
Think of it like this:
var i1 = il[position];
var i2 = il[position + 1] << 8; (<< 8 is equivalent to * 256)
var i3 = il[position + 2] << 16;
var i4 = il[position + 3] << 24;
position = position + 4;
return i1 | i2 | i3 | i4;

Why does this output differently in C# and C++?

In C#
var buffer = new byte[] {71, 20, 0, 0, 9, 0, 0, 0};
var g = (ulong) ((uint) (buffer[0] | buffer[1] << 8 | buffer[2] << 16 | buffer[3] << 24) |
(long) (buffer[4] | buffer[5] << 8 | buffer[6] << 16 | buffer[7] << 24) << 32);
In C++
#define byte unsigned char
#define uint unsigned int
#define ulong unsigned long long
byte buffer[8] = {71, 20, 0, 0, 9, 0, 0, 0};
ulong g = (ulong) ((uint) (buffer[0] | buffer[1] << 8 | buffer[2] << 16 | buffer[3] << 24) |
(long) (buffer[4] | buffer[5] << 8 | buffer[6] << 16 | buffer[7] << 24) << 32);
C# outputs 38654710855, C++ outputs 5199.
Why? I have been scratching my head on this for hours...
Edit: C# has the correct output.
Thanks for the help everyone :) Jack Aidley's answer was the first so I will mark that as the accepted answer. The other answers were also correct, but I can't accept multiple answers :\
The C++ is not working because you're casting to long which is typically 32-bits in most current C++ implementation but whose exact length is left to the implementor. You want long long.
Also, please read Bikeshedder's more complete answer below. He's quite correct that fixed size typedefs are a more reliable way of doing this.
The problem is that long type in C++ is still 4 byte or 32 bit(on most compilers) and thus your calculation overflows it. In C# however long is equivelent to C++'s long long and is 64 bit and so the result of the expression fits into the type.
Your unsigned long is not 64 bits long. You can easily check this using sizeof(unsigned long) which should return 4 (=32 bits) instead of 8 (=64 bits).
Don't use int/short/long if you expect them to be of a specific size. The standard does only say that short <= int <= long <= long long and defines a minimum size. They can actually be all the same size. long is guaranteed to be at least 32 bits and long long is guaranteed to be at least 64 bits. (See Page 22 of the C++ Standard) Still I would highly recommend against this and stick to stdint if you really want to work with a specific size.
Use <cstdint> (C++11) or <cstdint.h> (C++98) and the defined types uint8_t, uint16_t, uint32_t, uint64_t instead.
Corrected C++ code
#include <stdint.h>
#include <iostream>
int main(int argc, char *argv[]) {
uint8_t buffer[8] = {71, 20, 0, 0, 9, 0, 0, 0};
uint64_t g = (uint64_t) ((uint32_t) (buffer[0] | buffer[1] << 8 | buffer[2] << 16 | buffer[3] << 24) |
(int64_t) (buffer[4] | buffer[5] << 8 | buffer[6] << 16 | buffer[7] << 24) << 32);
std::cout << g << std::endl;
return 0;
}
Demo with output: http://codepad.org/e8GOuvMp
There is a subtle error in your castings.
long in C# is a 64-bit integer.
long in C++ is usually a 32-bit integer.
Thus your (long) (buffer[4] | buffer[5] << 8 | buffer[6] << 16 | buffer[7] << 24) << 32) has a different meaning when you execute it in C# or C++.

How can I combine 4 bytes into a 32 bit unsigned integer?

I'm trying to convert 4 bytes into a 32 bit unsigned integer.
I thought maybe something like:
UInt32 combined = (UInt32)((map[i] << 32) | (map[i+1] << 24) | (map[i+2] << 16) | (map[i+3] << 8));
But this doesn't seem to be working. What am I missing?
Your shifts are all off by 8. Shift by 24, 16, 8, and 0.
Use the BitConverter class.
Specifically, this overload.
BitConverter.ToInt32()
You can always do something like this:
public static unsafe int ToInt32(byte[] value, int startIndex)
{
fixed (byte* numRef = &(value[startIndex]))
{
if ((startIndex % 4) == 0)
{
return *(((int*)numRef));
}
if (IsLittleEndian)
{
return (((numRef[0] | (numRef[1] << 8)) | (numRef[2] << 0x10)) | (numRef[3] << 0x18));
}
return ((((numRef[0] << 0x18) | (numRef[1] << 0x10)) | (numRef[2] << 8)) | numRef[3]);
}
}
But this would be reinventing the wheel, as this is actually how BitConverter.ToInt32() is implemented.

Categories