C# Packing multiple signed integers into a single 64-bit value [closed]

I need to pack and unpack several values to/from a single 64-bit value. I have 3 signed integers (x,y,z). I would like to pack them into a single 64-bit value (signed or unsigned doesn't matter to me) using 24, 16, and 24 bits for the values respectively. Here are my requirements:
1) I can ensure ahead of time that the values being stored do not exceed the limits of the number of bits I am using to store them into the 64-bit value, so no additional checks need to be made.
2) The initial values are signed, so I'm thinking some kind of bit magic may need to be done in order to ensure that nothing is lost.
3) This conversion is going to take place a LOT, so it needs to be fast. I know in C++ this can pretty easily be done by storing the values in a struct that specifies the integer length, and then establishing a pointer that just points to the first value that can be used for the 64-bit value. With this method, there really isn't any math that needs to be done; everything is just memory read or write. As far as I can tell, this can't be done quite so simply in C#, but C# is what I have to work with for this project.
4) I don't really care if the 64-bit value is signed or unsigned, so long as I can go both ways with the operation and recover the initial values, and whatever type is used can be used for a Dictionary key.

Masks and shifts are probably your best option. You can create explicit layout structs in C#, but there's no 24-bit primitive, so you'd be tripping over yourself and have to mask anyway. As soon as you're shifting, it is usually best to work unsigned (especially when right-shifting), so:
ulong val = ((((ulong)x) & 0xFFFFFF) << 40) // 24 bits of x, left-shifted by 40
| ((((ulong)y) & 0xFFFF) << 24) // 16 bits of y, left-shifted by 24
| (((ulong)z) & 0xFFFFFF); // 24 bits of z, no left-shift
and to reverse that (assuming that we want uint values):
uint a = (uint)((val >> 40) & 0xFFFFFF),
b = (uint)((val >> 24) & 0xFFFF),
c = (uint)(val & 0xFFFFFF);
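If you want the original signed values back rather than the raw bit patterns, one extra step (not part of the answer above, but the usual trick) is to sign-extend each field: shift it up to the top of an int, then arithmetic-shift it back down:
int sx = ((int)(a << 8)) >> 8;    // 24-bit field back to a signed int
int sy = ((int)(b << 16)) >> 16;  // 16-bit field back to a signed int
int sz = ((int)(c << 8)) >> 8;    // 24-bit field back to a signed int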

With this method, there really isn't any math that needs to be done; everything is just memory read or write.
Not really; the math is done when you set partial integers into bitfields, so there's quite a bit of math going on.
As far as I can tell, this can't be done quite so simply in C#, but C# is what I have to work with for this project.
Correct, in C# you would need to write code that combines bits into a long manually. Assuming that you have taken care of range checking, this is relatively straightforward:
static long Pack(long a24, long b16, long c24) {
    // a24 can go with no masking, because its MSB becomes
    // the MSB of the 64-bit number. The other two numbers
    // need to be truncated to deal with 1s in the upper bits of negatives.
    return a24 << 40 | (b16 & 0xffffL) << 24 | (c24 & 0xffffffL);
}
static void Unpack(long packed, out int a24, out int b16, out int c24) {
    a24 = (int)(packed >> 40);        // Sign extension is done in the long
    b16 = ((int)(packed >> 8)) >> 16; // Sign extension is done in the int
    c24 = ((int)(packed << 8)) >> 8;  // Sign extension is done in the int
}
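For example, a quick round trip (a sketch with arbitrary values; assumes the inputs already fit in 24/16/24 bits):
long packed = Pack(-3, 1000, -123456);
Unpack(packed, out int a, out int b, out int c);
Console.WriteLine($"{a} {b} {c}"); // -3 1000 -123456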

These values are byte-aligned inside the long; you'll want to take advantage of Intel/AMD processors being able to access them directly to make the code as fast as possible. The killer requirement is the 24-bit size: the processor can only directly read/write 8, 16, 32 or 64 bits.
That is a problem in C++ as well; you'd have to use bit-fields. C# does not support them, so you'll have to write the code that the C++ compiler emits automatically. Like this:
[StructLayout(LayoutKind.Explicit)]
struct MyPackedLong {
    [FieldOffset(0)] uint item1;    // 24-bit field
    [FieldOffset(3)] uint item2;    // 24-bit field
    [FieldOffset(6)] ushort item3;  // 16-bit field
    public uint Item1 {
        get { return item1 & 0xffffff; }
        set { item1 = (item1 & 0xff000000) | value; }
    }
    public uint Item2 {
        get { return item2 & 0xffffff; }
        set { item2 = (item2 & 0xff000000) | value; }
    }
    public ushort Item3 {
        get { return item3; }
        set { item3 = value; }
    }
}
Some trickorama here: note how item2 has an intentional offset of 3 so that no shift is necessary. I ordered the fields so their access is optimal; putting the 16-bit value either first or last is best. Not thoroughly tested, but it ought to be in the ballpark. Be careful in threaded code: the writes are not atomic.
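If you also need the whole 64-bit value (for example to use as a Dictionary key, per the question), one option, which is an addition to the answer above rather than part of it, is to overlay a ulong field at offset 0 in the same struct (sketch only, untested; requires using System.Runtime.InteropServices;):
[StructLayout(LayoutKind.Explicit)]
struct MyPackedLong {
    [FieldOffset(0)] public ulong Packed;  // whole 64-bit view, usable as a key
    [FieldOffset(0)] uint item1;           // 24-bit field
    [FieldOffset(3)] uint item2;           // 24-bit field
    [FieldOffset(6)] ushort item3;         // 16-bit field
    // ...Item1/Item2/Item3 properties exactly as above...
}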

Related

Why Do Bytes Carryover?

I have been playing with some byte arrays recently (dealing with grayscale images). A byte can have values 0-255. I was modifying the bytes, and came across a situation where the value I was assigning to the byte was outside the bounds of the byte. It was doing unexpected things to the images I was playing with.
I wrote a test and learned that the byte carries over. Example:
private static int SetByte(int y)
{
    return y;
}
.....
byte x = (byte) SetByte(-4);
Console.WriteLine(x);
//output is 252
There is a carryover! This happens when we go the other way around as well.
byte x = (byte) SetByte(259);
Console.WriteLine(x);
//output is 3
I would have expected it to set it to 255 in the first situation and 0 in the second. What is the purpose of this carry over? Is it just due to the fact that I'm casting this integer assignment? When is this useful in the real-world?
byte x = (byte) SetByte(259);
Console.WriteLine(x);
//output is 3
The cast of the result of SetByte is applying modulo 256 to your integer input, effectively dropping bits that are outside the range of a byte.
259 % 256 = 3
Why: The implementers chose to only consider the 8 least significant bits, ignoring the rest.
When compiling C# you can specify whether the assembly should be compiled in checked or unchecked mode (unchecked is default). You are also able to make certain parts of code explicit via the use of the checked or unchecked keywords.
You are currently using unchecked mode which ignores arithmetic overflow and truncates the value. The checked mode will check for possible overflows and throw if they are encountered.
Try the following:
int y = 259;
byte x = checked((byte)y);
And you will see it throws an OverflowException.
The reason why the behaviour in unchecked mode is to truncate rather than clamp is largely performance: every unchecked cast would require conditional logic to clamp the value, when the majority of the time it is unnecessary and can be done manually.
Another reason is that clamping would involve a loss of data which may not be desirable. I don't condone code such as the following but have seen it (see this answer):
int input = 259;
var firstByte = (byte)input;
var secondByte = (byte)(input >> 8);
int reconstructed = (int)firstByte + (secondByte << 8);
Assert.AreEqual(reconstructed, input);
If firstByte came out as anything other than 3 this would not work at all.
One of the places I most commonly rely upon numeric carry over is when implementing GetHashCode(), see this answer to What is the best algorithm for an overridden System.Object.GetHashCode by Jon Skeet. It would be a nightmare to implement GetHashCode decently if overflowing meant we were constrained to Int32.MaxValue.
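For reference, that pattern looks roughly like this (a sketch; the 17/23 constants follow the linked answer, and field1/field2 are hypothetical fields of your type):
public override int GetHashCode()
{
    unchecked // overflow simply wraps around, which is exactly what we want here
    {
        int hash = 17;
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        return hash;
    }
}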
The method SetByte is irrelevant; simply casting (byte)259 will also result in 3, since downcasting integral types is implemented by cutting off the extra bytes.
You can create a custom clamp function:
public static byte Clamp(int n) {
    if (n <= 0) return 0;
    if (n >= 256) return 255;
    return (byte) n;
}
Doing arithmetic modulo 2^n makes it possible for overflow errors in different directions to cancel each other out.
byte under = unchecked((byte)(-12)); // = 244
byte over = unchecked((byte)260);    // = 4
byte total = (byte)(under + over);
Console.WriteLine(total); // prints 248, as intended
If .NET instead had overflows saturate, then the above program would print the incorrect answer 255.
Bounds checking is not performed for a direct type cast (when using (byte)), in order to avoid reducing performance.
FYI, the result of most operations with byte operands is an int. Use Convert.ToByte() and you will get an OverflowException, which you may handle by assigning 255 to your target.
Or you may create a function to do this check, as mentioned in another answer.
If performance is key, try adding the attribute [MethodImpl(MethodImplOptions.AggressiveInlining)] to that function.
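Putting those two suggestions together, a hypothetical clamping helper might look like this (sketch):
using System.Runtime.CompilerServices;

static class ByteHelper
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static byte ClampToByte(int value)
    {
        if (value < 0) return 0;
        if (value > 255) return 255;
        return (byte)value;
    }
}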

Implement function from C++ in C# (MAKE_HRESULT - Windows function)

I have this code in C++:
#define dppHRESULT(Code) \
MAKE_HRESULT(1, 138, (Code))
long x = dppHRESULT(101);
result being x = -2138439579.
MAKE_HRESULT is a Windows macro, defined as
#define MAKE_HRESULT(sev,fac,code) \
((HRESULT) (((unsigned long)(sev)<<31) | ((unsigned long)(fac)<<16) | ((unsigned long)(code))) )
I need to replicate this in C#. So I wrote this code:
public static long MakeHResult(uint facility, uint errorNo)
{
    // Make HR
    uint result = (uint)1 << 31;
    result |= (uint)facility << 16;
    result |= (uint)errorNo;
    return (long) result;
}
And call like:
// Should return type be long actually??
long test = DppUtilities.MakeHResult(138, 101);
But I get different result, test = 2156527717.
Why? Can someone please help me replicate that C++ function also in C#? Such that I get similar output on similar inputs?
Alternative implementation.
If I use this implementation
public static long MakeHResult(ulong facility, ulong errorNo)
{
    // Make HR
    long result = (long)1 << 31;
    result |= (long)facility << 16;
    result |= (long)errorNo;
    return (long) result;
}
this works on input 101.
But if I input -1, then C++ returns -1 as result while C# returns 4294967295. Why?
I would really appreciate some help as I am stuck with it.
I've rewritten the function to be the C# equivalent.
static int MakeHResult(uint facility, uint errorNo)
{
    // Make HR
    uint result = 1U << 31;
    result |= facility << 16;
    result |= errorNo;
    return unchecked((int)result);
}
C# is more strict about signed/unsigned conversions, whereas the original C code didn't pay any mind to it. Mixing signed and unsigned types usually leads to headaches.
As Ben Voigt mentions in his answer, there is a difference in type naming between the two languages. long in C is actually int in C#. They both refer to 32-bit types.
The U in 1U means "this is an unsigned integer." (Brief refresher: signed types can store negative numbers, unsigned types cannot.) All the arithmetic in this function is done unsigned, and the final value is simply cast to a signed value at the end. This is the closest approximation to the original C macro posted.
unchecked is required here because otherwise C# will not allow you to convert the value if it's out of range of the target type, even if the bits are identical. Switching between signed and unsigned will generally require this if you don't mind that the values differ when you deal with negative numbers.
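A quick check with the inputs from the question confirms the translation now matches the C++ result:
int test = MakeHResult(138, 101);
Console.WriteLine(test); // -2138439579, the same value the C++ macro produces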
In Windows C++ compilers, long is 32-bits. In C#, long is 64-bits. Your C# conversion of this code should not contain the type keyword long at all.
SaxxonPike has provided the correct translation, but his explanation(s) are missing this vital information.
Your intermediate result is a 32-bit unsigned integer. In the C++ version, the cast is to a signed 32-bit value, resulting in the high bit being reinterpreted as a sign bit. SaxxonPike's code does this as well. The result is negative if the intermediate value had its most significant bit set.
In the original code in the question, the cast is to a 64-bit signed version, which preserves the old high bit as a normal binary digit, and adds a new sign bit (always zero). Thus the result is always positive. Even though the low 32-bits exactly match the 32-bit result in C++, in the C# version returning long, what would be the sign bit in C++ isn't treated as a sign bit.
In the new attempt in the question, the same thing happens (sign bit in the 64-bit number is always zero), but it happens in intermediate calculations instead of at the end.
You're calculating it inside an unsigned type (uint). So shifts are going to behave accordingly. Try using int instead and see what happens.
The clue here is that 2156527717 as an unsigned int is the same as -2138439579 as a signed int. They are literally the same bits.

When is a shift operator >> or << useful? [duplicate]

Possible Duplicate:
When to use Shift operators << >> in C# ?
I've been programming a while and I've never used the shift operator. I could see how it could be helpful for calculating hash codes like in Tuple<T>, but other than that,
When and how is the shift operator useful in C#/.NET?
In general it's not used very often. But it's very useful when dealing with bit-level operations. For example, printing out the bits in a numeric value:
public static string GetBits(int value) {
    var builder = new StringBuilder();
    for (int i = 0; i < 32; i++) {
        var test = 1 << (31 - i);
        var isSet = 0 != (test & value);
        builder.Append(isSet ? '1' : '0');
    }
    return builder.ToString();
}
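For example (expected output shown as comments):
Console.WriteLine(GetBits(5));   // 00000000000000000000000000000101
Console.WriteLine(GetBits(-1));  // 11111111111111111111111111111111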
It's useful to write powers of two.
Quick: What's 2^27?
Answer: 1 << 27
Writing 1 << 27 is both easier and more understandable than 134217728.
I use it rather a lot in dealing with hardware. This isn't something you probably do in C# a lot, but the operator was inherited from C/C++ where it is a fairly common need.
Example 1:
I just got a longword from a little-endian machine, but I'm big endian. How do I convert it? Well, the obvious is call htonl() (you cheater). One of the manual ways to do it is with something like the following:
((source & 0x000000ff) << 24 ) |
((source & 0x0000ff00) << 8) |
((source & 0x00ff0000) >> 8) |
((source & 0xff000000) >> 24);
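The same swap translates to C# almost verbatim; a sketch (assuming source is a uint):
static uint SwapEndian32(uint source)
{
    return ((source & 0x000000ff) << 24)
         | ((source & 0x0000ff00) << 8)
         | ((source & 0x00ff0000) >> 8)
         | ((source & 0xff000000) >> 24);
}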
Example 2:
I have a DMA device that only allows longword accesses of up to 512K. So it requires me to put (for reasons only understood by hardware folks) the modulo 4 of the transfer size into the high order 18 bits of a DMA transfer control register. For the sake of argument, the low-order bits are to be filled with various flags controlling the DMA operation. That would be accomplished like so:
dma_flags | ((length & 0xffffc) << 14);
These might not be the kind of things you do every day. But for those of us that regularly interface to hardware they are.
If you ever need to multiply without using * How to implement multiplication without using multiplication operator in .NET :)
Or write a Sudoku solver Sudoku validity check algorithm - how does this code works?
In practice, the only time I've seen it used in my (limited) experience was as an (arguably) confusing way to multiply (see first link) or in conjunction with setting BitFlags (the Sudoku solver above).
In .NET I rarely have to work at the bit level; but if you need to, being able to shift is important.
Bitwise operators are good for saving space, but nowadays, space is hardly an issue.
It's useful when multiplying by powers of 2
number<<power;
is number*2^power
And of course division by powers of 2:
number>>power;
Another place is flags in enums.
When you come across code like
Regex re = new Regex(".",RegexOptions.Multiline|RegexOptions.Singleline);
the ability to combine multiple flags, i.e. RegexOptions.Multiline|RegexOptions.Singleline, is enabled through the shifting, which also keeps each flag's bit unique.
Something like:
enum RegexOptions {
    Multiline = (1 << 0),
    Singleline = (1 << 1)
};
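Combining and testing flags then uses | and & (a sketch; the values here are illustrative, not the real RegexOptions values):
[Flags]
enum MyOptions
{
    None       = 0,
    Multiline  = 1 << 0,
    Singleline = 1 << 1,
    Compiled   = 1 << 2
}

var opts = MyOptions.Multiline | MyOptions.Compiled;
bool hasMultiline = (opts & MyOptions.Multiline) != 0;  // true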
Bit shifts are used when manipulating individual bits is desired. You'll see a lot of bit shifts in many encryption algorithms, for example.
In optimization, it can be used in place of multiplication/division. A shift left is equal to multiplying by two; a shift right equals division. You probably don't see this done anymore, since this level of optimization is often unnecessary.
Other than that, I can't think of many reasons to use it. I've seen it used before, but rarely in cases where it was really required and usually a more readable method could have been used.
Whenever you need to multiply by 2 ;)
Really the only use I have is for interoperability code and bitfields:
http://www.codeproject.com/KB/cs/masksandflags.aspx

How to set endianness when converting to or from hex strings

To convert an integer to a hex formatted string I am using ToString("X4") like so:
int target = 250;
string hexString = target.ToString("X4");
To get an integer value from a hex formatted string I use the Parse method:
int answer = int.Parse(data, System.Globalization.NumberStyles.HexNumber);
However the machine that I'm exchanging data with puts the bytes in reverse order.
To keep with the sample data, if I want to send the value 250 I need a string of "FA00" (not "00FA", which is what hexString is). Likewise, if I get "FA00" I need to convert that to 250, not 64000.
How do I set the endianness of these two conversion methods?
Marc's answer seems, by virtue of having been accepted, to have addressed the OP's original issue. However, it's not really clear to me from the question text why. That still seems to require swapping of bytes, not pairs of bytes as Marc's answer does. I'm not aware of any reasonably common scenario where swapping bits 16 at a time makes sense or is useful.
For the stated requirements, IMHO it would make more sense to write this:
int target = 250; // 0x00FA
// swap the bytes of target
target = ((target << 8) | (target >> 8)) & 0xFFFF;
// target now is 0xFA00
string hexString = target.ToString("X4");
Note that the above assumes we're actually dealing with 16-bit values, stored in a 32-bit int variable. It will handle any input in the 16-bit range (note the need to mask off the upper 16 bits, as they get set to non-zero values by the << operator).
If swapping 32-bit values, one would need something like this:
int target = 250; // 0x00FA
// swap the bytes of target
target = (int)((int)((target << 24) & 0xff000000) |
               ((target << 8) & 0xff0000) |
               ((target >> 8) & 0xff00) |
               ((target >> 24) & 0xff));
// target now is 0xFA000000
string hexString = target.ToString("X8");
Again, masking is required to isolate the bits we are moving to specific positions. Casting the << 24 result back to int before or-ing with the other three bytes is needed because 0xff000000 is a uint (UInt32) literal and causes the & expression to be extended to long (Int64). Otherwise, you'll get compiler warnings with each of the | operators.
In any case, as this comes up most often in networking scenarios, it is worth noting that .NET provides helper methods that can assist with this operation: HostToNetworkOrder() and NetworkToHostOrder(). In this context, "network order" is always big-endian, and "host order" is whatever byte order is used on the computer hosting the current process.
If you know that you are receiving data that's big-endian, and you want to be able to interpret in as correct values in your process, you can call NetworkToHostOrder(). Likewise, if you need to send data in a context where big-endian is expected, you can call HostToNetworkOrder().
These methods work only with the three basic integer types: Int16, Int32, and Int64 (in C#, short, int, and long, respectively). They also return the same type passed to them, so you have to be careful about sign extension. The original example in the question could be solved like this:
int target = 250; // 0x00FA
// swap the bytes of target
target = IPAddress.HostToNetworkOrder((short)target) & 0xFFFF;
// target now is 0xFA00
string hexString = target.ToString("X4");
Once again, masking is required because otherwise the short value returned by the method will be sign-extended to 32 bits. If bit 15 (i.e. 0x8000) is set in the result, then the final int value would otherwise have its highest 16 bits set as well. This could be addressed without masking simply by using more appropriate data types for the variables (e.g. short when the data is expected to be signed 16-bit values).
Finally, I will note that the HostToNetworkOrder() and NetworkToHostOrder() methods, since they are only ever swapping bytes, are equivalent to each other. They both swap bytes, when the machine architecture is little-endian† . And indeed, the .NET implementation is simply for the NetworkToHostOrder() to call HostToNetworkOrder(). There are two methods mainly so that the .NET API matches the original BSD sockets API, which included functions like htons() and ntohs(), and that API in turn included functions for both directions of conversion mainly so that it was clear in code whether one was receiving data from the network or sending data to the network.
† And do nothing when the machine architecture is big-endian…they aren't useful as generalized byte-swapping functions. Rather, the expectation is that the network protocol will always be big-endian, and these functions are used to ensure the data bytes are swapped to match whatever the machine architecture is.
That isn't an inbuilt option. So either do string work to swap the characters around, or do some bit-shifting, i.e.
int otherEndian = (value << 16) | (int)((uint)value >> 16);

C/C++ Date Solution/Conversion

I need to come up with a way to unpack a date into a readable format. Unfortunately I don't completely understand the original process/code that was used.
Per information that was forwarded to me, the date was packed using custom C/Python code as follows:
date = year << 20;
date |= month << 16;
date |= day << 11;
date |= hour << 6;
date |= minute;
For example, a recent packed date is 2107224749 which equates to Tuesday Sept. 22 2009 10:45am
I understand....or at least I am pretty sure....the << is shifting the bits but I am not sure what the "|" accomplishes.
Also, in order to unpack the code, the notes read as follows:
year = (date & 0xfff00000) >> 20;
month = (date & 0x000f0000) >> 16;
day = (date & 0x0000f800) >> 11;
hour = (date & 0x000007c0) >> 6;
minute = (date & 0x0000003f);
Ultimately, what I need to do is perform the unpack and convert to readable format using either JavaScript or ASP but I need to better understand the process above in order to develop a solution.
Any help, hints, tips, pointers, ideas, etc. would be greatly appreciated.
The pipe (|) is bitwise OR; it is used to combine the bits into a single value.
The extraction looks straightforward, except I would recommend shifting first and then masking (see the sketch after the list below). This keeps the constant used for the mask as small as possible, which is easier to manage (and can possibly be a tad more efficient, although for this case that hardly matters).
Looking at the masks used written in binary reveals how many bits are used for each field:
0xfff00000 has 12 bits set, so 12 bits are used for the year
0x000f0000 has 4 bits set, for the month
0x0000f800 has 5 bits set, for the day
0x000007c0 has 5 bits set, for the hour
0x0000003f has 6 bits set, for the minute
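Putting shift-then-mask together for all five fields (shown in C# here for illustration; the same expressions work unchanged in JavaScript):
int date = 2107224749;              // the example value from the question
int year   = (date >> 20) & 0xfff;  // 2009
int month  = (date >> 16) & 0xf;    // 9
int day    = (date >> 11) & 0x1f;   // 22
int hour   = (date >> 6)  & 0x1f;   // 10
int minute =  date        & 0x3f;   // 45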
The idea is exactly what you said. Performing "<<" just shifts the bits to the left.
What the | (bitwise or) is accomplishing is basically adding more bits to the number, but without overwriting what was already there.
A demonstration of this principle might help.
Let's say we have a byte (8 bits), and we have two numbers that are each 4 bits, which we want to "put together" to make a byte. Assume the numbers are, in binary, 1010, and 1011. So we want to end up with the byte: 10101011.
Now, how do we do this? Assume we have a byte b, which is initialized to 0.
If we take the first number we want to add, 1010, and shift it by 4 bits, we get the number 10100000 (the shift adds zero bits to the right of the number).
If we do: b = (1010 << 4), b will have the value 10100000.
But now, we want to add the 4 more bits (1011), without touching the previous bits. To do this, we can use |. This is because the | operator "ignores" anything in our number which is zero. So when we do:
10100000 (b's current value)
|
00001011 (the number we want to add)
We get:
10101011 (the first four bits are copied from the first number,
the other four bits copied from the second number).
Note: This answer came out a little long, I'm wikiing this, so, if anyone here has a better idea how to explain it, I'd appreciate your help.
These links might help:
http://www.gamedev.net/reference/articles/article1563.asp
http://compsci.ca/v3/viewtopic.php?t=9893
In the decode section, & is bitwise AND and 0xfff00000 is a hexadecimal bit mask. Basically each character in the bit mask represents 4 bits of the number, 0 being 0000 in binary and f being 1111, so if you look at the operation in binary you are ANDing 1111 1111 1111 0000 0000 ... with whatever is in date. Basically you are getting the upper three nibbles (half bytes) and shifting them down, so that 0x00A00000 gives you 10 (A in hex) for the year.
Also note that |= is like +=: it is bitwise OR and assignment rolled into one.
Just to add some practical tips:
minute = value & ((1 << 6)-1);
hour = (value >> 6) & ((1<<5)-1); // 5 == 11-6 == bits reserved for hour
...
1 << 5 creates a bit at position 5 (i.e. 32=00100000b),
(1<<5)-1 creates a bit mask where the 5 lowest bits are set (i.e. 31 == 00011111b)
x & ((1<<5)-1) does a bitwise 'and' preserving only the bits set in the lowest five bits, extracting the original hour value.
Yes the << shifts bits and the | is the bitwise OR operator.
