Bit shifting and data explanation - C#

Ok, I'd like to have an explanation of how bit shifting works and how to build data from an array of bytes. The language is not important (if an example is needed, I know C, C++, Java and C#; they all follow the same shifting syntax, no?)
The question is: how do I go from byte[] to something which is a bunch of bytes together (be it 16-bit ints, 32-bit ints, 64-bit ints, n-bit ints)? And more importantly, why? I'd like to understand and learn how to do this myself rather than copy from the internet.
I know about endianness; I mainly mess with little-endian stuff, but explaining a general rule for both systems would be nice.
Thank you very very much!!
Federico

For bit shifting, I would say: byte1 << n shifts the bits of byte1 left by n places, and the result is the same as multiplying byte1 by 2^n. Likewise, byte1 >> n is the same as dividing byte1 by 2^n.
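A quick C# sketch of that equivalence (values chosen arbitrarily; note that for negative values, >> is an arithmetic shift in C# and rounds toward negative infinity, while / truncates toward zero, so the two can differ):
int byte1 = 5;
int n = 3;
Console.WriteLine(byte1 << n);        // 40
Console.WriteLine(byte1 * (1 << n));  // 40, i.e. 5 * 2^3
Console.WriteLine(40 >> n);           // 5
Console.WriteLine(40 / (1 << n));     // 5, i.e. 40 / 2^3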

Hm... it depends what you're looking for. There's really no way to convert to an arbitrary n-bit integer type (in memory it's always an array of bytes anyway), but you can shift bytes into other types.
my32BitInt = byte1 << 24 | byte2 << 16 | byte3 << 8 | byte4
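In C# the byte operands are promoted to int automatically, so that line works as-is once the values come out of an array; a minimal sketch (the names are illustrative, and byte1 is taken as the most significant byte, i.e. big-endian input):
byte[] b = { 0x12, 0x34, 0x56, 0x78 };  // example bytes
int my32BitInt = (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3];
// my32BitInt == 0x12345678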

Usually the compiler takes care of endianness for you, so you can OR bytes together with shifting and not worry too much about that.
I've used a bit of a cheat myself in the past to save cycles. When I know the data in the array is in least- to most-significant order, I can cast the bytes into larger types and walk the array by the size of the walker. For example, this sample walks 4 bytes at a time and reads out 4-byte ints in C.
#include <stdio.h>

int main(void)
{
    /* 16 bytes in least-to-most significant (little-endian) order */
    char bytes[] = {1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1};
    int *temp;

    temp = (int *)&bytes[0];   /* reinterpret 4 bytes as one int */
    printf("%d\n", *temp);
    temp = (int *)&bytes[4];
    printf("%d\n", *temp);
    temp = (int *)&bytes[8];
    printf("%d\n", *temp);
    temp = (int *)&bytes[12];
    printf("%d\n", *temp);
    return 0;
}
Output:
1
256
65536
16777216
This is probably no good for C# and Java, though.

You will want to take the endianness of the data into consideration. From there, it depends on the data type, but here is an example. Let's say you would like to go from an array of 4 bytes to an unsigned 32-bit integer. Let's also assume the bytes are in the same order as they started (so we don't need to worry about the endianness).
//In C
#include <stdint.h>
uint32_t offset;           //the offset in the array, should be set to something
uint32_t unpacked_int = 0; //target, zeroed so we can accumulate into it
uint8_t* packed_array;     //source, assumed to point at the data
for ( uint32_t i = 0; i < 4; ++i ) {
    unpacked_int += packed_array[ offset + i ] << 8*i;
}
You could equivalently use |= and OR the bytes in. Also, if you attempt this trick on structures, be aware that your compiler probably won't pack values back-to-back, although you can usually set the behavior with a pragma. For example, gcc/g++ aligns data to 32 bits on my architecture.
In C#, BitConverter should have you covered. I know Python, PHP, and Perl use a function called pack() for this; perhaps there is an equivalent in Java.
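For completeness, a sketch of the BitConverter route in C# (the array contents are made up for illustration):
byte[] packedArray = { 0xD6, 0xFF, 0xD3, 0x3A }; // hypothetical source bytes
uint unpackedInt = BitConverter.ToUInt32(packedArray, 0);
// BitConverter reads the bytes in the machine's native byte order,
// so on a little-endian system unpackedInt == 0x3AD3FFD6.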

Related

Why doesn't my 32-bit integer convert into a float properly?

Background
First of all, I have some hexadecimal data... 0x3AD3FFD6. I have chosen to represent this data as an array of bytes as follows:
byte[] numBytes = { 0x3A, 0xD3, 0xFF, 0xD6 };
I attempt to convert this array of bytes into its single-precision floating point value by executing the following code:
float floatNumber = 0;
floatNumber = BitConverter.ToSingle(numBytes, 0);
I have calculated this online using this IEEE 754 Converter and got the following result:
0.0016174268
I would expect the output of the C# code to produce the same thing, but instead I am getting something like...
-1.406E+14
Question
Can anybody explain what is going on here?
The bytes are in the wrong order. BitConverter uses the endianness of the underlying system (computer architecture); always make sure to use the right endianness.
Quick Answer: You've got the order of the bytes in your numBytes array backwards.
Since you're programming in C#, I assume you are running on an Intel processor, and Intel processors are little-endian; that is, they store (and expect) the least significant byte first. In your numBytes array you are putting the most significant byte first.
BitConverter doesn't so much convert byte array data as interpret it as another base data type. Think of physical memory holding a byte array:
b0 | b1 | b2 | b3.
To interpret that byte array as a single-precision float, one must know the endianness of the machine, i.e. whether the LSByte is stored first or last. It may seem natural that the LSByte comes last because many of us read that way, but for little-endian (Intel) processors, that's incorrect.
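A common fix for the original snippet, assuming the array holds big-endian data as written, is to reverse it on little-endian machines before converting:
byte[] numBytes = { 0x3A, 0xD3, 0xFF, 0xD6 }; // big-endian, as in the question
if (BitConverter.IsLittleEndian)
{
    Array.Reverse(numBytes); // match the machine's native byte order
}
float floatNumber = BitConverter.ToSingle(numBytes, 0); // 0.0016174268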

Creating URL ShortCode in C#

I am using this article to create a short code for a URL.
I've been working on this for a while, and the pseudo code is just not making sense to me. He states in "loop1" that I'm supposed to go from the first 4 bytes to the 4th set of 4 bytes, cast those bytes to an integer, and then convert that to bits. I end up with 32 bits for each 4 bytes, but in "loop3" he's consuming 5 bits at a time, and 32 isn't evenly divisible by 5. I am not understanding what he's trying to say.
Then I noticed that he closes "loop2" at the bottom, after the short code has already been written to the database. That makes no sense to me, because I would be writing the same short code to the database over and over again.
Then I have "loop1", which is going to loop to infinity; again, I'm not seeing why I would need to update the database infinitely.
I have tried to follow his example and ran it through the debugger line-by-line, but it's not making sense.
Here is the code I have so far, according to what I've been able to understand:
private void button1_Click(object sender, EventArgs e)
{
    string codeMap = "abcdefghijklmnopqrstuvwxyz012345"; // 32 characters

    // Compute MD5 hash
    MD5 md5 = MD5.Create();
    byte[] inputBytes = Encoding.ASCII.GetBytes(txtURL.Text);
    byte[] hash = md5.ComputeHash(inputBytes);

    // Loop from the first 4 bytes to the 4th 4 bytes
    byte[] FourBytes = new byte[4];
    for (int i = 0; i <= 3; i++)
    {
        FourBytes[i] = hash[i];
        //int CastedBytes = FourBytes[i];
        BitArray binary = new BitArray(FourBytes);
        int CastedBytes = 0;
        for (int ii = 0; i <= 5; i++)
        {
            CastedBytes = CastedBytes + ii;
        }
    }
}
Can someone help me figure out what I'm doing wrong, so I can get this program working? I just need to convert URLs into short 6-digit unique codes.
Thanks.
Your MD5 hash is 128 bits. The idea is to represent a slice of those 128 bits in 6 characters, losing as little information as possible.
The codeMap contains 32 characters
string codeMap = "abcdefghijklmnopqrstuvwxyz012345"
Note that 2^5 is also 32. The third loop is using 5 bits of the hash at a time, and converting those 5 bits to a character in the codeMap. For example, for the bit pattern
00001 00011 00100
b d e
The algorithm uses 6 sets of 5 bits, so 30 bits in total; the other 2 bits of each 32-bit integer are "wasted".
Note, though, that the 128-bit MD5 is taken 4 bytes at a time, and those 4 bytes are converted to an integer. That is one approach to consuming the bits of the MD5, but certainly not the only one; it involves bit masking and bit shifting.
You may find it more straightforward to use a BitArray for the implementation. While this is probably slightly less efficient, it is unlikely to matter. If you go that route, initialize the BitArray with the bits of your MD5 hash, then take 5 bits at a time and convert them to a number in the range 0..31 to use as an index into codeMap.
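A rough sketch of that BitArray approach, reusing url and codeMap from the question (the bit-to-index mapping here is one reasonable choice, not necessarily the article's exact algorithm):
byte[] hash = MD5.Create().ComputeHash(Encoding.ASCII.GetBytes(url));
BitArray bits = new BitArray(hash);      // bit 0 is the LSB of hash[0]
StringBuilder shortCode = new StringBuilder();
for (int c = 0; c < 6; c++)              // 6 output characters
{
    int index = 0;
    for (int b = 0; b < 5; b++)          // gather 5 bits into a 0..31 index
    {
        if (bits[c * 5 + b])
            index |= 1 << b;
    }
    shortCode.Append(codeMap[index]);
}
// shortCode.ToString() is the 6-character code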
This bit from the article is misleading:
6 characters of short code can used to map 32^6 (1,073,741,824) URLs so it is unlikely to be used up in the near future
Due to the possibility of hash collisions, the system can manage far fewer than 1 billion URLs without a significant risk of the same short URL being assigned to two long URLs. See the Birthday Problem for more.
Unless you are expecting to have a hugely popular URL shortener, just use base 16 or base 64 off of a database auto-increment column.
Six characters of base 16 would provide about 16 million unique URLs (16^6); base 64 would provide ~2^36 (about 68 billion).
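A minimal sketch of that approach, assuming a hypothetical 64-character alphabet and an id handed back by the database's auto-increment column:
// hypothetical 64-character alphabet; id comes from the auto-increment column
const string Alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_";
static string Encode(long id)
{
    var sb = new StringBuilder();
    do
    {
        sb.Insert(0, Alphabet[(int)(id % 64)]); // prepend next base-64 digit
        id /= 64;
    } while (id > 0);
    return sb.ToString();
}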

How do I convert a int to an array of byte's and then back?

I need to send an integer through a NetworkStream. The problem is that I can only send bytes.
That's why I need to split the integer into four bytes, send those, and at the other end convert them back to an int.
For now I need this only in C#, but for the final project I will need to convert the four bytes to an int in Lua.
[EDIT]
How about in Lua?
BitConverter is the easiest way, but if you want to control the order of the bytes you can do bit shifting yourself.
int foo = int.MaxValue;
byte lolo = (byte)(foo & 0xff);
byte hilo = (byte)((foo >> 8) & 0xff);
byte lohi = (byte)((foo >> 16) & 0xff);
byte hihi = (byte)(foo >> 24);
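Going back the other way is the same shifts OR'd together (a sketch using the bytes above):
int bar = lolo | (hilo << 8) | (lohi << 16) | (hihi << 24);
// bar == foo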
Also... the implementation of BitConverter uses unsafe code and pointers, but it's short and simple:
public static unsafe byte[] GetBytes(int value)
{
    byte[] buffer = new byte[4];
    fixed (byte* numRef = buffer)
    {
        *((int*) numRef) = value;
    }
    return buffer;
}
Try
BitConverter.GetBytes()
http://msdn.microsoft.com/en-us/library/system.bitconverter.aspx
Just keep in mind that the order of the bytes in the returned array depends on the endianness of your system.
EDIT:
As for the Lua part, I don't know how to convert back. You could always multiply by 16 to get the same effect as a bitwise left shift by 4. It's not pretty, and I would imagine there is some library or other that implements it. Again, the order in which to add the bytes depends on the endianness, so you might want to read up on that.
Maybe you can convert back in C#?
For Lua, check out Roberto's struct library. (Roberto is one of the authors of Lua.) It is more general than needed for the specific case in question, but it isn't unlikely that the need to interchange an int is shortly followed by the need to interchange other simple types or larger structures.
Assuming native byte order is acceptable at both ends (which is likely a bad assumption, incidentally), you can convert a number to a 4-byte integer with:
buffer = struct.pack("l", value)
and back again with:
value = struct.unpack("l", buffer)
In both cases, buffer is a Lua string containing the bytes. If you need to access the individual byte values from Lua, string.byte is your friend.
To specify the byte order of the packed data, change the format from "l" to "<l" for little-endian or ">l" for big-endian.
The struct module is implemented in C, and must be compiled to a DLL or equivalent for your platform before it can be used by Lua. That said, it is included in the Lua for Windows batteries-included installation package that is a popular way to install Lua on Windows systems.
Here are some functions in Lua for converting a 32-bit two's complement number into bytes and converting four bytes into a 32-bit two's complement number. A lot more checking could/should be done to verify that the incoming parameters are valid.
-- convert a 32-bit two's complement integer into four bytes (network order)
function int_to_bytes(n)
  if n > 2147483647 then error(n.." is too large",2) end
  if n < -2147483648 then error(n.." is too small",2) end
  -- adjust for 2's complement
  n = (n < 0) and (4294967296 + n) or n
  return (math.modf(n/16777216))%256, (math.modf(n/65536))%256, (math.modf(n/256))%256, n%256
end

-- convert bytes (network order) to a 32-bit two's complement integer
function bytes_to_int(b1, b2, b3, b4)
  if not b4 then error("need four bytes to convert to int",2) end
  local n = b1*16777216 + b2*65536 + b3*256 + b4
  n = (n > 2147483647) and (n - 4294967296) or n
  return n
end
print(int_to_bytes(256)) --> 0 0 1 0
print(int_to_bytes(-10)) --> 255 255 255 246
print(bytes_to_int(255,255,255,246)) --> -10
Investigate the BinaryWriter/BinaryReader classes.
Convert an int to a byte array and display : BitConverter ...
www.java2s.com/Tutorial/CSharp/0280__Development/Convertaninttoabytearrayanddisplay.htm
Integer to Byte - Visual Basic .NET answers
http://bytes.com/topic/visual-basic-net/answers/349731-integer-byte
How to: Convert a byte Array to an int (C# Programming Guide)
http://msdn.microsoft.com/en-us/library/bb384066.aspx
As Nubsis says, BitConverter is appropriate but has no guaranteed endianness.
I have an EndianBitConverter class in MiscUtil which allows you to specify the endianness. Of course, if you only want to do this for a single data type (int) you could just write the code by hand.
BinaryWriter is another option, and this does guarantee little endianness. (Again, MiscUtil has an EndianBinaryWriter if you want other options.)
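A sketch of the round trip, with a MemoryStream standing in for the NetworkStream (System.IO):
using (var ms = new MemoryStream())
{
    new BinaryWriter(ms).Write(12345);   // BinaryWriter writes little-endian
    ms.Position = 0;
    int value = new BinaryReader(ms).ReadInt32(); // 12345
}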
To convert to a byte[]:
BitConverter.GetBytes(int)
http://msdn.microsoft.com/en-us/library/system.bitconverter.aspx
To convert back to an int:
BitConverter.ToInt32(byteArray, offset)
http://msdn.microsoft.com/en-us/library/system.bitconverter.toint32.aspx
I'm not sure about Lua though.
If you are concerned about endianness, use Jon Skeet's EndianBitConverter. I've used it and it works seamlessly.
C# has its own implementations of htons and ntohs:
System.Net.IPAddress.HostToNetworkOrder()
System.Net.IPAddress.NetworkToHostOrder()
But they only work on signed Int16, Int32, and Int64, which means you'll probably end up doing a lot of unnecessary casting to make them work, and if you're using the highest-order bit for anything other than signing the integer, you're screwed. Been there, done that. ::tsk:: ::tsk:: Microsoft, for not providing better endianness conversion support in .NET.
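For reference, a minimal sketch of those calls (System.Net):
int hostValue = 12345;
int netValue = IPAddress.HostToNetworkOrder(hostValue);  // native order to big-endian
int backAgain = IPAddress.NetworkToHostOrder(netValue);  // big-endian back to native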

Most significant bit

I haven't dealt with programming against hardware devices in a long while and have forgotten pretty much all the basics.
I have a spec of what I should send in a byte, and each bit is defined from the most significant bit (bit 7) to the least significant (bit 0). How do I build this byte? From MSB to LSB, or vice versa?
If these bits are being 'packeted' (which they usually are), then the order of bits is the native order: 0 is the LSB and 7 is the MSB. Bits are not usually sent one by one, but as bytes (usually more than one byte...).
According to Wikipedia, bit ordering can sometimes be from 7 to 0, but that is probably the rarer case.
If you're going to write the whole byte at the same time, i.e. do a parallel transfer as opposed to a serial, the order of the bits doesn't matter.
If the transfer is serial, then you must find out which order the device expects the bits in, it's impossible to tell from the outside.
To just assemble a byte from eight bits, use bitwise OR to "add" bits one at a time:
byte value = 0;
value |= (byte)(1 << n); // 'n' is the index, with 0 as the LSB, of the bit to set
(The cast is needed in C# because the result of the shift is an int.)
If the spec says MSB, then build it MSB. Otherwise if the spec says LSB, then build it LSB. Otherwise, ask for more information.
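As a concrete sketch, suppose a hypothetical spec defines bit 7 as an "enable" flag and bit 0 as a "ready" flag (both names made up for illustration):
byte command = 0;
command |= (byte)(1 << 7);   // bit 7 (MSB): hypothetical "enable" flag
command |= (byte)(1 << 0);   // bit 0 (LSB): hypothetical "ready" flag
// command == 0x81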

Byte order: converting java bytes to c#

While converting a Java application to C#, I came across a strange and very annoying piece of code, which is crucial and works in the original version.
byte[] buf = new byte[length];
byte[] buf2 = bout.toByteArray();
System.arraycopy(buf2, 0, buf, 0, buf2.length);
for (int i = (int) offset; i < (int) length; ++i) {
    buf[i] = (byte) 255;
}
The part which is causing a casting error is the assignment of the byte 255 into buf[i]: while in Java it works fine, since java.lang.Byte spans from 0 to 255, .NET's System.Byte spans from 0 to 254.
Because of this limitation, the output in the C# version of the application is that the buffer contains a set of 254s instead of the expected 255s.
Could anyone give me a viable alternative?
Thank you very much for the support.
I think you've misdiagnosed your problem: .NET bytes are 8-bit like everyone else's. A better approach is to try to understand what the Java code is trying to do, then figure out what the cleanest equivalent is in C#.
I think this might be because you're casting the 255 integer literal to a byte, rather than assigning a byte value. I recommend you try using Byte.MaxValue instead. Byte.MaxValue has a value of 255.
For example:
buf[i] = byte.MaxValue;
Edit: I was wrong; (byte)255 definitely evaluates to 255 (I've just confirmed it in VS). There must be something elsewhere in your code causing the change.
byte.MaxValue equals 255.
The value of this constant is 255 (hexadecimal 0xFF).
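It's easy to confirm with a couple of lines:
byte b = (byte)255;
Console.WriteLine(b);             // 255
Console.WriteLine(byte.MaxValue); // 255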
Are you absolutely sure about this C# "limitation"? According to MSDN: http://msdn.microsoft.com/en-us/library/5bdb6693(VS.71).aspx
The C# byte is an unsigned 8-bit integer with values that can range between 0 and 255.
From MSDN:
byte: The byte keyword denotes an integral type that stores values as indicated in the following table.
.NET Framework type: System.Byte
Range: 0 to 255
Size: unsigned 8-bit integer
