I'm trying to implement different signals containing different data, and I implemented various datatypes in C# to manage the data cleanly (mainly structs, some enums). Most of these types are oddly sized: some are 9 bits wide, some 3 bits, and so on.
I implemented them as their closest equivalent basic C# type (most are byte, ushort or short, with some ints and uints).
What is the general way of handling such data types in C#? In the end I have to combine all the bits into one byte array, but I'm not sure how to combine them.
I thought about getting the byte array of each type with a BitConverter and putting all the data as booleans into a BitArray, which I can then convert back to a byte array. But I can't seem to split the byte array.
Another way to do it would be shifting every single variable I have, but that seems really dirty. If a type changed from 32 bits to 31 bits in the future, that would be a hassle to update.
How is this usually done? Any best practices or something?
Basically I want to combine different sized data into one byte array. For example, pack a 3-bit variable, a 2-bit boolean and an 11-bit value into 2 bytes.
Since I have the types implemented as C# types I can do BitArray arr = new BitArray(BitConverter.GetBytes((short)MyType)), but this would give me 16 bits while MyType might only have 9 bits.
My task is only to implement the data structures and the packing as binary data in C#.
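To illustrate, the kind of manual shifting I would like to avoid looks roughly like this (a sketch; the names are made up and the widths match the 2-byte example above):

static ushort Pack(byte threeBit, byte twoBit, ushort elevenBit)
{
    // Mask each value to its width, then shift into place:
    // bits 0..2, bits 3..4, bits 5..15.
    return (ushort)((threeBit & 0x07)
                  | ((twoBit & 0x03) << 3)
                  | ((elevenBit & 0x7FF) << 5));
}

byte[] packed = BitConverter.GetBytes(Pack(5, 1, 1000)); // 2 bytes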
To explicitly manipulate the in-memory layout of your structures:
Use StructLayout. Generally, this is only appropriate if dealing with interop or specialized memory constraints. See MSDN for examples/documentation. Note that you'll take a performance hit if your data is not byte-aligned.
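A minimal sketch of explicit layout (the type and field names are hypothetical); note that StructLayout controls byte offsets, not bit offsets, so sub-byte fields still need shifting/masking within their containing field:

using System.Runtime.InteropServices;

// Pin down the byte offset of each field at compile time.
[StructLayout(LayoutKind.Explicit, Size = 4)]
struct SignalFrame
{
    [FieldOffset(0)] public ushort Packed;   // holds the sub-byte fields
    [FieldOffset(2)] public byte Flags;
    [FieldOffset(3)] public byte Reserved;
}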
To design a data structure:
Just use an existing solution like ASN.1 or Protobuf. This problem has already been addressed by experts; take advantage of their skills and knowledge. As an added bonus, using standard protocols makes it far easier for third parties to implement custom interfaces.
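For example, with protobuf-net (a sketch; the type and members are hypothetical), the wire format uses varints, so a conceptually 9-bit value simply encodes in as many bytes as it needs:

using ProtoBuf;

[ProtoContract]
class SignalMessage
{
    [ProtoMember(1)] public uint SensorId { get; set; } // conceptually 9 bits
    [ProtoMember(2)] public byte Mode { get; set; }     // conceptually 3 bits
}

// Usage sketch:
// Serializer.Serialize(stream, new SignalMessage { SensorId = 300, Mode = 5 });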
Related
The basic building block type of my application decomposes into a type (class or structure) which contains some standard value types (int, bool, etc.) and some array types of standard value types, where there will be a small (but unknown) number of elements in the collection.
Given that I have many instances of the above building block, I would like to limit the memory usage of my basic type by using an array/collection as a Value Type instead of the standard Reference Type. Part of the problem is that my standard usage will be to have the arrays containing zero, one or two elements in them and the overhead of the array reference type in this scenario is prohibitive.
I have empirically observed and research has confirmed that the array wrapper itself introduces unwanted (by me, in this situation) overhead in each instance.
How do I make a collection a Value Type / Struct in .NET?
Side Note: it is interesting that Apple's Swift language has arrays as value types by default.
Pre-Emptive Comment
I am fully aware that the above is a non-standard way of using the .NET framework and is very bad practice etc...so it's not necessary to comment to that effect. I really just want to know how to achieve what I am asking.
The fixed keyword referenced in the docs seems to be what you're looking for. It has the same constraints on types as structs do, but it does require unsafe.
internal unsafe struct MyBuffer
{
    // 128 chars stored inline in the struct itself, not behind a reference.
    public fixed char fixedBuffer[128];
}
If you wanted to also have a fixed array of your struct it would be more complicated. fixed only supports the base value types, so you'd have to drop into manual memory allocation.
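A rough sketch of the manual-allocation route (the element struct is hypothetical; error handling is omitted):

using System;
using System.Runtime.InteropServices;

struct MyElement { public int A; public float B; }

static class Demo
{
    static void Main()
    {
        // Manually allocate unmanaged memory for 4 elements.
        IntPtr buffer = Marshal.AllocHGlobal(Marshal.SizeOf<MyElement>() * 4);
        try
        {
            var e = new MyElement { A = 1, B = 2f };
            Marshal.StructureToPtr(e, buffer, false);             // write element 0
            var back = Marshal.PtrToStructure<MyElement>(buffer); // read it back
            Console.WriteLine(back.A);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);
        }
    }
}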
A mix of ideas from a DirectBuffer and a BufferPool could work.
If you use a buffer pool then fixing buffers in memory is not a big issue because buffers become effectively long-lived and do not affect GC compaction as much as if you were fixing every new byte[] without a pool.
The DirectBuffer uses the flyweight pattern and adds very little overhead. You could read/write any blittable struct directly using pointers. As far as I understand, besides SBE, FlatBuffers and Cap'n Proto also use this approach. In the linked implementation you should change the delegate so that it returns a discarded byte[] to the pool.
Big advantage of such solution is zero-copy if you need to interop with native code or send data over network. Additionally, you could allocate a single buffer and work with offsets/lengths in ArraySegment-like fashion.
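A stripped-down sketch of the pointer-based idea (not the linked DirectBuffer itself; requires unsafe and the C# 7.3 unmanaged constraint):

// Read/write a blittable struct at an offset in a byte[] without
// copying the whole array.
static unsafe T Read<T>(byte[] buffer, int offset) where T : unmanaged
{
    fixed (byte* p = &buffer[offset])
        return *(T*)p;
}

static unsafe void Write<T>(byte[] buffer, int offset, T value) where T : unmanaged
{
    fixed (byte* p = &buffer[offset])
        *(T*)p = value;
}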
Update:
I have re-read the question and realized that it was specifically about collections as value types. However, the main rationale seems to be memory pressure, so this answer could be an alternative solution for that, even though DirectBuffer is a class.
Can someone tell me if the RTL_BITMAP structure (for use with RtlInitializeBitMap) in C++ is the same as a BitArray in C#? If not, is there anything that can be changed to make it the same? The reason I ask is because I am trying to port some C++ code to C#, and some of the code converts the Bitmap in VOLUME_BITMAP_BUFFER to an RTL_BITMAP structure.
Can someone tell me if the RTL_BITMAP structure (for use with RtlInitializeBitMap) in C++ is the same as a BitArray in C#?
Well, they are not the same in the sense of being identical. But these two types (and the associated RtlXXX functions in the case of RTL_BITMAP) essentially implement the same data structure: a compact array of boolean values.
There is quite a lot of high level functionality available for the RTL_BITMAP type that is not offered by the BitArray type. However, you may very well not need any of that, and any that you do need is readily implemented on top of BitArray.
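For instance, something like RtlFindClearBits (find a run of clear bits of a requested size) is only a few lines on top of BitArray (a sketch):

using System.Collections;

// Return the index of the first run of `count` clear bits, or -1 if none.
static int FindClearBits(BitArray bits, int count)
{
    int run = 0;
    for (int i = 0; i < bits.Length; i++)
    {
        run = bits[i] ? 0 : run + 1;
        if (run == count)
            return i - count + 1;
    }
    return -1;
}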
So in summary, BitArray seems like a good starting place for your translation, based on the information that you have provided.
Is it currently possible to use IsPacked=true for user defined structures? If not, then is it planned in the future?
I'm getting the following exception when I tried to apply that attribute to a field of the type ColorBGRA8[]: System.InvalidOperationException : Only simple data-types can use packed encoding
My scenario is as follows: I'm writing a game and have tons of blittable structures for various things such as colors, vectors, matrices, vertices and constant buffers. Their memory layout needs to be precisely defined at compile time in order to match, for example, the constant buffer layout from a shader (where fields generally need to be aligned on a 16-byte boundary).
I don't mean to waste anyone's time, but I couldn't find any recent information about this particular question.
Edit after it has been answered
I am currently testing a solution which uses protobuf-net for almost everything except large arrays of user-defined but blittable structures. All my fields holding arrays of custom structures have been replaced by arrays of bytes, which can be packed. After protobuf-net has finished deserializing the data, I use memcpy via p/invoke to turn the bytes back into an array of custom structures.
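The byte[]-to-struct[] step looks roughly like this (a sketch; the ColorBGRA8 layout shown is hypothetical, and the p/invoke target assumes the msvcrt C runtime on Windows):

using System;
using System.Runtime.InteropServices;

struct ColorBGRA8 { public byte B, G, R, A; }

static class Blit
{
    [DllImport("msvcrt.dll", EntryPoint = "memcpy", CallingConvention = CallingConvention.Cdecl)]
    static extern IntPtr memcpy(IntPtr dest, IntPtr src, UIntPtr count);

    public static ColorBGRA8[] ToColors(byte[] raw)
    {
        var result = new ColorBGRA8[raw.Length / Marshal.SizeOf<ColorBGRA8>()];
        // Pin both arrays so the GC cannot move them during the copy.
        GCHandle src = GCHandle.Alloc(raw, GCHandleType.Pinned);
        GCHandle dst = GCHandle.Alloc(result, GCHandleType.Pinned);
        try
        {
            memcpy(dst.AddrOfPinnedObject(), src.AddrOfPinnedObject(), (UIntPtr)raw.Length);
        }
        finally
        {
            src.Free();
            dst.Free();
        }
        return result;
    }
}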
The following numbers are from a test which serializes one instance containing one field of either byte[] or ColorBGRA8[]. The raw test data is ~38 MiB, i.e., 1,000,000 entries in the color array. Serialization was done in memory using a MemoryStream.
Writing
Platform.Copy + Protobuf: 51ms, Size: 38.15 MiB
Protobuf: 2093ms, Size: 109.45 MiB
Reading
Platform.Copy + Protobuf: 43ms
Protobuf: 2307ms
The test shows that for huge arrays of more or less random data, a noticeable memory overhead can occur. This wouldn't have been such a big deal, if not for the (de)serialization times. I understand protobuf-net might not be designed for my extreme case, let alone optimized for it, but it is something I am not willing to accept.
I think I will stick with this hybrid approach, as protobuf-net works extremely well for everything else.
Simply "does not apply". To quote from the encoding specification:
Only repeated fields of primitive numeric types (types which use the varint, 32-bit, or 64-bit wire types) can be declared "packed".
This doesn't work with custom structures or classes. The two approaches that apply here are strings (length-prefixed) and groups (start/end tokens). The latter is often cheaper to encode, but Google prefer the former.
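In protobuf-net terms, the distinction looks like this (a sketch; the member numbers are arbitrary, and the struct layout is hypothetical):

using ProtoBuf;

[ProtoContract]
struct ColorBGRA8
{
    [ProtoMember(1)] public byte B;
    [ProtoMember(2)] public byte G;
    [ProtoMember(3)] public byte R;
    [ProtoMember(4)] public byte A;
}

[ProtoContract]
class Frame
{
    // Allowed: int uses the varint wire type, so it can be packed.
    [ProtoMember(1, IsPacked = true)]
    public int[] Samples { get; set; }

    // Adding IsPacked = true here would throw the InvalidOperationException
    // from the question: each element of a custom type is length-prefixed
    // (or grouped), never packed.
    [ProtoMember(2)]
    public ColorBGRA8[] Colors { get; set; }
}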
Protobuf is not designed to arbitrarily match some other byte layout. It is its own encoding format and is only designed to process / output protobuf data. It would be like saying "I'm writing XML, but I want it to look like {non-xml} instead".
For every situation that warrants the use of an array ... there is an awesome collection with benefits. Is there any specific use case for Arrays any more in .NET?
Sending/receiving data with a specific length comes to mind, i.e. serial port, web request, FTP request. Basically stuff that works at a lower level in the system. Also, most collections use an array for storage (notable exception: LinkedList<T>). Collections are just another abstraction layer.
Arrays are useful because they are always linear in memory and are fast to work with. For example I can take a byte[] and marshal directly into a structure without any problems but a List<T> would have to be converted to an array first as far as I know.
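For example, on modern .NET that marshalling step is a one-liner (a sketch; the Header struct is hypothetical and assumes little-endian data):

using System.Runtime.InteropServices;

struct Header { public ushort Id; public ushort Length; }

static class Parser
{
    // Reinterpret the first bytes of a received byte[] as a struct
    // without copying field by field.
    public static Header ReadHeader(byte[] received) =>
        MemoryMarshal.Read<Header>(received);
}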
No, they still have their uses and should always be considered.
Remember, arrays are a very basic fixed-length representation, so they are very fast, and most languages understand them, depending on the element type.
You need to define an array size at the time that it is created and cannot change its size later. Lists and other things can grow as needed which adds overhead with respect to memory allocation.
Lists and other types are useful because they can do a lot but sometimes you don't need all that extra overhead so an array is all you need.
It's like driving a 4x4 because you think one day you might need to go off-roading, even though there is a 99.9% chance you will be on normal roads. The array would be the basic car and a List, for example, would be the 4x4... it does everything the car can do (and under the hood it might use most of the same parts), but at the expense of gas, cost, maybe not fitting in certain parking stalls, etc.
Arrays = performance and compatibility
Lists (or other representations) = ease of use at a cost of performance and compatibility
Yes, there certainly is still a use for arrays. Some methods still need arrays.
For example:
string[] items = "a,b;c:d".Split(new char[]{',',';',':'});
It's still the simplest way to keep a bunch of items, and the number one choice until you need some specific feature, like for example dynamic growth.
Have arrays lost (some) significance?
Yes. For many tasks requiring a 'table' of items, there are now more flexible and useful solutions like List<T> and IEnumerable<T>.
Have arrays lost their importance?
No. They are the fastest form of storage and they are used 'under the hood' in most of the collection classes, System.String etc.
So, arrays have become more low-level, and an application programmer will be using them directly less often.
Quite apart from everything else, many of those great collection classes you refer to are implemented using arrays. You might not be using them explicitly, but you're using loads of them and your program is better for it. That means that arrays must be in the language (or that the collections are implemented directly using lots of native code, which would be suckier).
Yes? Anytime I have a type which internally maintains a fixed-size collection of items, I use an array as it's the fastest to iterate and requires the least memory. No sense using a List<T>, Queue<T>, etc. if you don't need those features.
Two cases where I work daily with arrays:
Image analysis. Images are almost always byte[] or int[] arrays.
External hardware communication. Most devices require perfectly structured arrays to send/receive messages.
It's a fair question, but the answer is definitely that they're still useful. Speed is one reason, simplicity for fixed sizes is another. But I think the most important one is flexibility. It gives you a nice base to design your own collection, backed by a simple array, if you ever needed it.
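For example, a toy fixed-capacity collection backed by a plain array (a sketch, not production code):

// Fixed-capacity stack; Push past capacity or Pop when empty will throw.
class FixedStack<T>
{
    private readonly T[] _items;
    private int _count;

    public FixedStack(int capacity) => _items = new T[capacity];

    public void Push(T item) => _items[_count++] = item;
    public T Pop() => _items[--_count];
    public int Count => _count;
}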
No, arrays will not lose their importance.
1. When you know the number of items in advance, you can go for an array, which gives you very fast access.
2. In graph theory, when you store information about the links between vertices, an array implementation is faster than a linked-list implementation.
3. Some methods, like string.Split, return arrays.
You can use this wonderful static placeholder for items in a variety of programming problems.
I use arrays to manipulate images, for example with the WriteableBitmap class.
One thing I must tell you Arrays are the building blocks for any programming language. If you want to declare a storage having more than one element, Arrays are the basic option for you.
Say for instance a List.
If you look at the definition of List<T>, it actually holds
T[] items
Just use Reflector and find the definition of List<T>; you will be surprised to find that List<T> is actually backed by an array. In .NET, most collections other than LinkedList<T> are basically array implementations. They use an array because of its fast storage and retrieval.
I agree that arrays are limited when it comes to updating and removing; if your main emphasis is on that kind of flexibility rather than speed, you might go for a linked list.
What do you think the backing field behind many of those fancy collections is?
A Judy array is a fast data structure that may represent a sparse array or a set of values. Is there an implementation of it for managed languages such as C#? Thanks.
It's worth noting that these are often called Judy Trees or Judy Tries if you are googling for them.
I also looked for a .Net implementation but found nothing.
Also worth noting that:
The implementation is heavily designed around efficient cache usage; as such, implementation specifics may be highly dependent on the size of certain constructs used within the substructures. A .NET managed implementation may be somewhat different in this regard.
There are some significant hurdles to it that I can see (and there are probably more that my brief scan missed)
The API has some fairly anti-OO aspects (for example, a null pointer is viewed as an empty tree), so a simplistic 'move the state pointer to the LHS and make the functions instance methods' conversion to C++ wouldn't work.
The implementation of the substructures I looked at makes heavy use of pointers. I cannot see these being translated efficiently to references in managed languages.
The implementation is a distillation of a lot of very complex ideas that belies the simplicity of the public API.
The code base is about 20K lines (most of it complex); this doesn't strike me as an easy port.
You could take the library and wrap the C code in C++/CLI (probably just holding an internal pointer to the C API trie and routing all the C calls through it). This would give a simplistic implementation, but the linked libraries for the native implementation may be problematic (as might memory allocation).
You would also probably need to deal with converting .NET strings to plain old byte* at the boundary (or just work with bytes directly).
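Alternatively, a thin P/Invoke layer might look something like this (hypothetical and untested; the function names follow the Judy1 C API, and a native Judy library would have to be built and locatable):

using System;
using System.Runtime.InteropServices;

static class Judy1
{
    [DllImport("Judy", CallingConvention = CallingConvention.Cdecl)]
    public static extern int Judy1Set(ref IntPtr array, UIntPtr index, IntPtr error);

    [DllImport("Judy", CallingConvention = CallingConvention.Cdecl)]
    public static extern int Judy1Test(IntPtr array, UIntPtr index, IntPtr error);
}

// Usage sketch; note the 'null pointer is an empty tree' convention:
// IntPtr tree = IntPtr.Zero;
// Judy1.Judy1Set(ref tree, (UIntPtr)42, IntPtr.Zero);
// bool present = Judy1.Judy1Test(tree, (UIntPtr)42, IntPtr.Zero) == 1;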
Judy really doesn't fit well with managed languages. I don't think you'll be able to use something like SWIG and get the first layer done automatically.
I wrote PyJudy and I ended up having to make some non-trivial API changes to fit well in Python. For example, I wrote in the documentation:
JudyL arrays map machine words to machine words. In practice the words store unsigned integers or pointers. PyJudy supports all four mappings as distinct classes.
pyjudy.JudyLIntInt - map unsigned integer keys to unsigned integer values
pyjudy.JudyLIntObj - map unsigned integer keys to Python object values
pyjudy.JudyLObjInt - map Python object keys to unsigned integer values
pyjudy.JudyLObjObj - map Python object keys to Python object values
I haven't looked at the code for a few years so my memories about it are pretty hazy. It was my first Python extension library, and I remember I hacked together a sort of template system for code generation. Nowadays I would use something like genshi.
I can't point to alternatives to Judy - that's one reason why I'm searching Stackoverflow.
Edit: I've been told that my timing numbers in the documentation are off from what Judy's documentation suggests because Judy is developed for 64-bit cache lines and my PowerBook was only 32 bits.
Some other links:
Patricia tries (http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/PATRICIA/)
Double-Array tries (http://linux.thai.net/~thep/datrie/datrie.html)
HAT-trie (http://members.optusnet.com.au/~askitisn/index.html)
The last has comparison numbers for different high-performance trie implementations.
This is proving trickier than I thought. PyJudy might be worth a look, as would be Tie::Judy. There's something on Softpedia, and something Ruby-ish. Trouble is, none of these are .NET specifically.