Most significant bit

Most significant bit - c#

I haven't dealt with programming against hardware devices in a long while and have forgotten pretty much all the basics.
I have a spec of what I should send in a byte and each bit is defined from the most significant bit (bit7) to the least significant (bit 0). How do i build this byte? From MSB to LSB, or vice versa?

If these bits are being 'packeted' (which they usually are), then the order of bits is the native order, 0 being the LSB, and 7 being the MSB. Bits are not usually sent one-by-one, but as bytes (usually more than one byte...).
According to wikipedia, bit ordering can sometimes be from 7->0, but this is probably the rare case.

If you're going to write the whole byte at the same time, i.e. do a parallel transfer as opposed to a serial, the order of the bits doesn't matter.
If the transfer is serial, then you must find out which order the device expects the bits in, it's impossible to tell from the outside.
To just assemble a byte from eight bits, just use bitwise-OR to "add" bits, one at a time:
byte value = 0;
value |= (1 << n); // 'n' is the index, with 0 as the LSB, of the bit to set.

If the spec says MSB, then build it MSB. Otherwise if the spec says LSB, then build it LSB. Otherwise, ask for more information.

Related

Decompression of 2 byte to 4 byte float

I have some animation data (x,y,z), which is represented as 2 byte structures and written in Little Endian. I know that they should be a 4 byte floating point, so i have to unpack them. I collected a few sample values as precise as it was possible (they doesn't represent exactly packed values, but very close to them) and roughly divided packed values on few ranges.
Sample values (Little Endian):
0.048879981 - 0x0046
0.056879997 - 0x0047
0.253880024 - 0x0050
0.313879967 - 0x0051
0.623880029 - 0x0055
1.003879905 - 0x0058
-0.066120029 - 0x00С8
-0.1561199428 - 0x00СD
-0.8691199871 - 0x00D7
Ranges:
0x0000 : zero
[0x0000,0x0014] : invisible changes (increasing probably)
[0x0014, ....] : increasing (visible)
0x0080 : zero, probably the point of sign change
[0x0080,0x00B0] : invisible changes (decreasing probably)
[0x00B0, ....] : decreasing (visible)
There are gaps (....) on the ends of ranges because it is hard to check them correctly, but i assume such big values which are lying close to these ends doesn't used in practice.
Also, it looks like a symmetry between positive and negative ranges, for example i tested 0x0058 which gave 1.003879905 and 0x00D8 which gave value close to -1.003879905 but not precise. Maybe it happened because of slightly offset observed after 0x0080, when visible decreasing starts from 0x00B0, but it should be about 0x0094 if entire range had equal symmetry. But slight measure inaccuracy might be as well.
So, how to get a function in C#, that will convert source data to 4 byte floating point?

Some initial comments based on the information in the question so far:
byte[] buffer = new byte[4]; is a bad approach because it addresses bytes individually while the other code manipulates bits using shifts within words, and C# does not define endianness. Simply use an unsigned 32-bit integer for all the work. The code will actually be simpler.
The code does not handle subnormal values properly. If num2 is zero and num3 is not zero, the significand (num3) must be shifted and the exponent (num2) must be adjusted.

Reading and writing non-standard floating point values

I'm working with a binary file (3d model file for an old video game) in C#. The file format isn't officially documented, but some of it has been reverse-engineered by the game's community.
I'm having trouble understanding how to read/write the 4-byte floating point values. I was provided this explanation by a member of the community:
For example, the bytes EE 62 ED FF represent the value -18.614.
The bytes are little endian ordered. The first 2 bytes represent the decimal part of the value, and the last 2 bytes represent the whole part of the value.
For the decimal part, 62 EE converted to decimal is 25326. This represents the fraction out of 1, which also can be described as 65536/65536. Thus, divide 25326 by 65536 and you'll get 0.386.
For the whole part, FF ED converted to decimal is 65517. 65517 represents the whole number -19 (which is 65517 - 65536).
This makes the value -19 + .386 = -18.614.
This explanation mostly makes sense, but I'm confused by 2 things:
Does the magic number 65536 have any significance?
BinaryWriter.Write(-18.613f) writes the bytes as 79 E9 94 C1, so my assumption is the binary file I'm working with uses its own proprietary method of storing 4-byte floating point values (i.e. I can't use C#'s float interchangably and will need to encode/decode the values first)?

Firstly, this isn't a Floating Point Number its a Fix Point Number
Note : A fixed point number has a specific number of bits (or digits) reserved for the integer part (the part to the left of the decimal point)
Does the magic number 65536 have any significance
Its the max number of values unsigned 16 bit number can hold, or 2^16, yeah its significant, because the number you are working with is 2 * 16 bit values encoded for integral and fractional components.
so my assumption is the binary file I'm working with uses its own
proprietary method of storing 4-byte floating point values
Nope wrong again, floating point values in .Net adhere to the IEEE Standard for Floating-Point Arithmetic (IEEE 754) technical standard
When you use BinaryWriter.Write(float); it basically just shifts the bits into bytes and writes it to the Stream.
uint TmpValue = *(uint *)&value;
_buffer[0] = (byte) TmpValue;
_buffer[1] = (byte) (TmpValue >> 8);
_buffer[2] = (byte) (TmpValue >> 16);
_buffer[3] = (byte) (TmpValue >> 24);
OutStream.Write(_buffer, 0, 4);
If you want to read and write this special value you will need to do the same thing, you are going to have to read and write the bytes and convert them your self

This should be a build-in value unique to the game.
It should be more similar to Fraction Value.
Where 62 EE represent the Fraction Part of the value and FF ED represent the Whole Number Part of the value.
The While Number Part is easy to understand so I'm not going to explain it.
The explanation of Fraction Part is:
For every 2 bytes, there are 65536 possibilities (0 ~ 65535).
256 X 256 = 65536
hence the magic number 65536.
And the game itself must have a build-in algorithm to divide the first 2 bytes by 65536.
Choosing any number other than this will be a waste of memory space and result in decreased accuracy of the value which can be represented.
Of course, it's all depended on what kind of accuracy the game wish to present.

Why doesn't my 32-bit integer convert into a float properly?

Background
First of all, I have some hexadecimal data... 0x3AD3FFD6. I have chosen to represent this data as an array of bytes as follows:
byte[] numBytes = { 0x3A, 0xD3, 0xFF, 0xD6 };
I attempt to convert this array of bytes into its single-precision floating point value by executing the following code:
float floatNumber = 0;
floatNumber = BitConverter.ToSingle(numBytes, 0);
I have calculated this online using this IEEE 754 Converter and got the following result:
0.0016174268
I would expect the output of the C# code to produce the same thing, but instead I am getting something like...
-1.406E+14
Question
Can anybody explain what is going on here?

The bytes are in the wrong order. BitConverter uses the endianness of the underlying system (computer architecture), make sure to use the right endianness always.

Quick Answer: You've got the order of the bytes in your numBytes array backwards.
Since you're programming in C# I assume you are running on an Intel processor and Intel processors are little endian; that is, they store (and expect) the least significant bytes first. In your numBytes array you are putting the most significant byte first.
BitConverter doesn't so much convert byte array data as interpret it as another base data type. Think of physical memory holding a byte array:
b0 | b1 | b2 | b3.
To interpret that byte array as a single precision float, one must know the endian of the machine, i.e. if the LSByte is stored first or last. It may seem natural that the LSByte comes last because many of us read that way, but for little endian (Intel) processors, that's incorrect.

Converting an amount to a 4 byte array

I haven't ever had to deal with this before. I need to convert a sale amount (48.58) to a 4 byte array and use network byte order. The code below is how I am doing it, but it is wrong and I am not understanding why. Can anyone help?
float saleamount = 48.58F;
byte[] data2 = BitConverter.GetBytes(saleamount).Reverse().ToArray();
What I am getting is 66 66 81 236 in the array. I am not certain what it should be though. I am interfacing with a credit card terminal and need to send the amount in "4 bytes, fixed length, max value is 0xffffffff, use network byte order"

The first question you should ask is, "What data type?" IEEE single-precision float? Twos-complement integer? It's an integer, what is the implied scale? Is $48.53 represented as 4,853 or 485,300?
It's not uncommon for monetary values to be represented by an integer with an implied scale of either +2 or +4. In your example, $48.58 would be represented as the integer value 4858 or 0x000012FA.
Once you've established what they actually want...use an endian-aware BitConverter or BinaryWriter to create it. Jon Skeet's MiscUtil, for instance offers:
EndianBinaryReader
EndianBinaryWriter
BigEndianBitConverter
LittleEndianBitConverter
There are other implementations out there as well. See my answer to the question "Helpful byte array extensions to handle BigEndian data" for links to some.
Code you don't write it code you don't have to maintain.

Network byte order pseudo-synonym of big-endian, hence (as itsme86 mentioned already) so you can check BitConverter.IsLittleEndian:
float saleamount = 48.58F;
byte[] data2 = BitConverter.IsLittleEndian
? BitConverter.GetBytes(saleamount).Reverse().ToArray()
: BitConverter.GetBytes(saleamount);
But if you don't know this, probably you already using some protocol, which handle it.

Hashtable/Dictionary collisions

Using the standard English letters and underscore only, how many characters can be used at a maximum without causing a potential collision in a hashtable/dictionary.
So strings like:
blur
Blur
b
Blur_The_Shades_Slightly_With_A_Tint_Of_Blue
...

There's no guarantee that you won't get a collision between single letters.
You probably won't, but the algorithm used in string.GetHashCode isn't specified, and could change. (In particular it changed between .NET 1.1 and .NET 2.0, which burned people who assumed it wouldn't change.)
Note that hash code collisions won't stop well-designed hashtables from working - you should still be able to get the right values out, it'll just potentially need to check more than one key using equality if they've got the same hash code.
Any dictionary which relies on hash codes being unique is missing important information about hash codes, IMO :) (Unless it's operating under very specific conditions where it absolutely knows they'll be unique, i.e. it's using a perfect hash function.)

Given a perfect hashing function (which you're not typically going to have, as others have mentioned), you can find the maximum possible number of characters that guarantees no two strings will produce a collision, as follows:
No. of unique hash codes avilable = 2 ^ 32 = 4294967296 (assuming an 32-bit integer is used for hash codes)
Size of character set = 2 * 26 + 1 = 53 (26 lower as upper case letters in the Latin alphabet, plus underscore)
Then you must consider that a string of length l (or less) has a total of 54 ^ l representations. Note that the base is 54 rather than 53 because the string can terminate after any character, adding an extra possibility per char - not that it greatly effects the result.
Taking the no. of unique hash codes as your maximum number of string representations, you get the following simple equation:
54 ^ l = 2 ^ 32
And solving it:
log2 (54 ^ l) = 32
l * log2 54 = 32
l = 32 / log2 54 = 5.56
(Where log2 is the logarithm function of base 2.)
Since string lengths clearly can't be fractional, you take the integral part to give a maximum length of just 5. Very short indeed, but observe that this restriction would prevent even the remotest chance of a collision given a perfect hash function.
This is largely theoretical however, as I've mentioned, and I'm not sure of how much use it might be in the design consideration of anything. Saying that, hopefully it should help you understand the matter from a theoretical viewpoint, on top of which you can add the practical considersations (e.g. non-perfect hash functions, non-uniformity of distribution).

Universal Hashing
To calculate the probability of collisions with S strings of length L with W bits per character to a hash of length H bits assuming an optimal universal hash (1) you could calculate the collision probability based on a hash table of size (number of buckets) 'N`.
First things first we can assume a ideal hashtable implementation (2) that splits the H bits in the hash perfectly into the available buckets N(3). This means H becomes meaningless except as a limit for N.
W and 'L' are simply the basis for an upper bound for S. For simpler maths assume that strings length < L are simply padded to L with a special null character. If we were interested we are interested in the worst case this is 54^L (26*2+'_'+ null), plainly this is a ludicrous number, the actual number of entries is more useful than the character set and the length so we will simply work as if S was a variable in it's own right.
We are left trying to put S items into N buckets.
This then becomes a very well known problem, the birthday paradox
Solving this for various probabilities and number of buckets is instructive but assuming we have 1 billion buckets (so about 4GB of memory in a 32 bit system) then we would need only 37K entries before we hit a 50% chance of their being at least one collision. Given that trying to avoid any collisions in a hashtable becomes plainly absurd.
All this does not mean that we should not care about the behaviour of our hash functions. Clearly these numbers are assuming ideal implementations, they are an upper bound on how good we can get. A poor hash function can give far worse collisions in some areas, waste some of the possible 'space' by never or rarely using it all of which can cause hashes to be less than optimal and even degrade to a performance that looks like a list but with much worse constant factors.
The .NET framework's implementation of the string's hash function is not great (in that it could be better) but is probably acceptable for the vast majority of users and is reasonably efficient to calculate.
An Alternative Approach: Perfect Hashing
If you wish you can generate what are known as perfect hashes this requires full knowledge of the input values in advance however so is not often useful. In a simliar vein to the above maths we can show that even perfect hashing has it's limits:
Recall the limit of of 54 ^ L strings of length L. However we only have H bits (we shall assume 32) which is about 4 billion different numbers. So if you can have truly any string and any number of them then you have to satisfy:
54 ^ L <= 2 ^ 32
And solving it:
log2 (54 ^ L) <= 32
L * log2 54 <= 32
L <= 32 / log2 54 <= 5.56
Since string lengths clearly can't be fractional, you are left with a maximum length of just 5. Very short indeed.
If you know that you will only ever have a set of strings well below 4 Billion in size then perfect hashing would let you handle any value of L, but restricting the set of values can be very hard in practice and you must know them all in advance or degrade to what amounts to a database of string -> hash and add to it as new strings are encountered.
For this exercise the universal hash is optimal as we wish to reduce the probability of any collision i.e. for any input the probability of it having output x from a set of possibilities R is 1/R.
Note that doing an optimal job on the hashing (and the internal bucketing) is quite hard but that you should expect the built in types to be reasonable if not always ideal.
In this example I have avoided the question of closed and open addressing. This does have some bearing on the probabilities involved but not significantly

A hash algorithm isn't supposed to guarantee uniqueness. Given that there are far more potential strings (26^n for n length, even ignoring special chars, spaces, capitalization, non-english chars, etc.) than there are places in your hashtable, there's no way such a guarantee could be fulfilled. It's only supposed to guarantee a good distribution.

If your key is a string (e.g., a Dictionary) then it's GetHashCode() will be used. That's a 32bit integer. Hashtable defaults to a 1 key to value load factor and increases the number of buckets to maintain that load factor. So if you do see collisions they should tend to occur around reallocation boundaries (and decrease shortly after reallocation).

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.