Exact object to byte conversion, similar to memcpy in C#

I have been looking for a solution to this for a little while, and none of the stuff I've found quite matches. It is great that C# has a built-in serialization library, but that is not quite what I am looking for. I need to serialize objects, add header data to the packet (i.e. ID number, timestamp, object type, etc.), and then send it out without having to keep in mind the platform I am sending it to. In other words, I should be able to unwrap my packet in C++ or in Java without much more knowledge than the object type I am casting into and the order of the header data. The binary formatter in C# creates a problem because it is designed to be deserialized on the other end using the same library. It also creates bloated packets, which I would rather not have to deal with. I would rather format my packet like this:
|==========|==========|===========|=====|==================|
| packetID | datatype | timeStamp | etc | serializedObject |
|==========|==========|===========|=====|==================|
It would be nice to have access to something along the lines of memcpy to achieve this. Then, as long as the order of the data types in the object being deserialized into matches (yes, assuming the other language has the same byte sizes for its data types), it would be easy to grab data from the server on a new platform (say, an Android (Java) or an iPhone (Objective-C) client) with little hassle.

I've used Google's protocol buffers to good effect. The format is small, fast, cross-platform, backwards compatible, and binary. It doesn't support the custom header information you're looking for, but if you can frame your own packets you can tack a custom header onto the binary stream as you see fit. Protobuf can generate code for C#, Objective-C (for iOS), Java and C++.
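The "frame your own packets" idea can be sketched roughly as follows. This is only an illustration, not part of protobuf: the `PacketFramer` name, the field widths, and the choice of big-endian byte order are all assumptions made for the example, and the payload is treated as an opaque byte array (for instance, the output of a protobuf serializer).

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical framing helper; field widths and order are assumptions for this sketch.
public static class PacketFramer
{
    // Header: packetId (4 bytes) | dataType (2 bytes) | timestamp (8 bytes) | payload length (4 bytes)
    public const int HeaderSize = 4 + 2 + 8 + 4;

    public static byte[] Frame(uint packetId, ushort dataType, long unixMillis, byte[] payload)
    {
        var packet = new byte[HeaderSize + payload.Length];
        Span<byte> span = packet.AsSpan();
        // Byte order is written explicitly, so the layout does not depend on the sender's CPU.
        BinaryPrimitives.WriteUInt32BigEndian(span.Slice(0, 4), packetId);
        BinaryPrimitives.WriteUInt16BigEndian(span.Slice(4, 2), dataType);
        BinaryPrimitives.WriteInt64BigEndian(span.Slice(6, 8), unixMillis);
        BinaryPrimitives.WriteInt32BigEndian(span.Slice(14, 4), payload.Length);
        payload.AsSpan().CopyTo(span.Slice(HeaderSize));
        return packet;
    }

    public static (uint PacketId, ushort DataType, long UnixMillis, byte[] Payload) Unframe(byte[] packet)
    {
        if (packet.Length < HeaderSize)
            throw new ArgumentException("Packet is shorter than the header.");
        ReadOnlySpan<byte> span = packet;
        uint packetId = BinaryPrimitives.ReadUInt32BigEndian(span.Slice(0, 4));
        ushort dataType = BinaryPrimitives.ReadUInt16BigEndian(span.Slice(4, 2));
        long unixMillis = BinaryPrimitives.ReadInt64BigEndian(span.Slice(6, 8));
        int length = BinaryPrimitives.ReadInt32BigEndian(span.Slice(14, 4));
        // Reject a length field that points past the end of the packet.
        if (length < 0 || length > packet.Length - HeaderSize)
            throw new ArgumentException("Payload length does not match the packet size.");
        return (packetId, dataType, unixMillis, span.Slice(HeaderSize, length).ToArray());
    }
}
```

A receiver written in C++ or Java would only need to know the same field order, widths, and byte order to read the header before handing the payload to its own deserializer.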

Related

How do I get my object's metadata information in Binary format in C#?

I am receiving a binary stream from an application I am running in Python.
The stream contains an object as a byte array, and I want to create a C# object from it.
How do I deserialize the object and retrieve it from the binary stream?
We can ignore that it's a Python application. I am more interested in how binary streaming works.

You seem to think that all languages automatically use the same serialization scheme.
This is not so.
It is not even theoretically possible, because different programming languages have different notions of what it means to be an object.
If you are specifically interested in how to read a Python serialized stream in C#, then ask that. Otherwise, this question is unanswerable because it is based on a false premise.
FOLLOW UP - Out of curiosity, I did some searching for a Python pickle reader in C#. Nothing in the first 3 pages of search results ... though there was a reference to a pickle reader in C++.

Just to add a little general info:
In C#/.Net the general approach is to serialize objects to something other than a binary form, because a binary form needs a lot of protocol-like headers to carry the metadata, and the receiver then has to know the inner structure of .Net/CLR very well.
Instead, today, objects are usually serialized to XML (when type information is crucial) or JSON (when only the data matters), so that any receiver can read them quite easily. More importantly, any 3rd party can easily generate new object-like data that our application can "just deserialize", regardless of who generated it and on what platform.
Binary serialization is still used, though. XML/JSON data, even if compressed, is usually larger than the binary image. Binary serialization tends to be reserved for cases where we do not want the data published to the outside world, or where we somehow know that it will only be processed on .Net using our assemblies.

"C# object"
C# does not have objects; it's a .Net object.
Secondly we absolutely CANNOT ignore that it's a Python application, because that implies that it's likely it's not running on .Net and therefore the .Net binary format is not native to your Python runtime. That's not to say that it's not possible for the .Net serialization to be available to you in this case, because if you're running IronPython - the .Net python implementation - then you can simply use the Binary serialization APIs from within that and get the .Net object that was serialized.
If, however, it's Python running on a different platform, then you can still decode the information in the binary stream, but for that you need to know the format. For that, go straight to the horse's mouth and read through the Binary Format Data Structure spec on MSDN.
This will, of course, require (quite a lot) more work!
If the project you're working on allows you to change the way that the original object is serialized, then I strongly suggest changing over to XML serialization or something similar - that is designed to be portable.

Alternative ways to load / save data - without serialization?

Ok. I know how to use Serialization and such, but since that only applies to Objects that's been marked with Serialization attribute - how can I for example load data and use it in an application without using Serialization? Say a data file.
Or, create a datacontainer with serialization that holds files not serialized.
The methods I've used are binary serialization and XML serialization. Are there any other ways to load unknown data and somehow use it in C#?

JSON serialization using JSON.NET
This eats everything! Including anonymous types.
Edit
I know you said "you don't want serialization", but based on your statement "[...]Objects that's been marked with Serialization attribute", I believe you didn't try JSON serialization using JSON.NET!

Maybe a definition of terms is in order; serialization is "the process of converting a data structure or object state into a format that can be stored and "resurrected" later in the same or another computer environment". Pretty much any method of converting "volatile" memory into persistent data and back is "serialization", so even if you roll your own scheme to do it, you're "serializing".
That said, it sounds like you simply don't want to use .NET binary serialization. That's actually the right idea; binary serialization is simple, but very code- and environment-dependent. Moving a serializable class to a different namespace, or serializing a file using the Microsoft CLR and then trying to deserialize it in Mono, can break binary serialization.
First and foremost, you MUST be able to determine what type of object you should try to create based on the file. You simply cannot open some "random" file and expect to be able to get anything meaningful out of it without knowing how the data is structured within the file. The easiest way is for the file to tell you, by specifying the type name of the object it was created from (which you will hopefully have available in your codebase). Most built-in serializers do it this way. Other ways the file can inform consumers of its format include file, row and/or field header codes (very common in older standards as they economize on file size) and extension/MIME type.
With that sorted out, deserialization can take place. If the file was serialized using a built-in serializer, simply use that, but if it's an older format (CSV, fixed-length) then you will have to parse the file, line by line, into objects representing lines, collected within a main object representing the file.
Have a look at the ETL (Extract-Transform-Load) process pattern. This is a modular, scalable architecture pattern for taking files and turning them into data the program can work with:
Extract - This part of the system is pointed at the filesystem, or other incoming "pipe" for raw data, and its job is to open the file, extract the data into a very basic object format that can be further manipulated, and put those objects into an in-memory "queue" for the Transform step. The goal is to get data from the pipe as fast and efficiently as possible, but you are required at this point to have some knowledge of the data you are working with so that you can effectively encapsulate it for further processing; actually turning the data into the format you really want happens later.
Transform - This part of the system takes the extracted data, and performs the logic that will put that data into a hydrated object from your codebase. This is where, given information from the Extract step about the type of file the data was extracted from, you instantiate a domain object that represents the data model, slice the raw data up into the chunks that will be stored as data members, perform any type conversions (data you get from a file is usually either in string format or in raw bits and must be marshalled or otherwise converted into data types that better represent the concept of the data), and validate that the internal structure of the new object is consistent and meets known business rules. Hydrated, valid objects are placed in an output queue to be processed by the Load step.
Load - This step takes the hydrated, valid business objects from the Transform step and persists them into the data store that is used by your system (such as a SQL database or the program's native flat file format).
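The three steps above can be sketched in a few lines. This is a deliberately small illustration under stated assumptions: the input is taken to be comma-separated text with two fields per line, `Reading` is a made-up domain object, and a plain collection stands in for the real data store. It also uses a `record`, which needs C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical domain object for this sketch.
public sealed record Reading(string SensorId, double Value);

public static class EtlSketch
{
    // Extract: pull raw fields out of the "pipe" without interpreting them.
    public static IEnumerable<string[]> Extract(TextReader source)
    {
        var line = source.ReadLine();
        while (line != null)
        {
            if (line.Length > 0) yield return line.Split(',');
            line = source.ReadLine();
        }
    }

    // Transform: convert raw strings into typed, validated objects; invalid rows are skipped.
    public static IEnumerable<Reading> Transform(IEnumerable<string[]> rows)
    {
        foreach (var fields in rows)
        {
            if (fields.Length != 2) continue;
            if (!double.TryParse(fields[1], NumberStyles.Float, CultureInfo.InvariantCulture, out var value)) continue;
            yield return new Reading(fields[0].Trim(), value);
        }
    }

    // Load: persist the objects; the collection stands in for a database or native file format.
    public static void Load(IEnumerable<Reading> readings, ICollection<Reading> store)
    {
        foreach (var reading in readings) store.Add(reading);
    }
}
```

Because each stage only consumes the previous stage's output, a different file format would mean replacing `Extract` alone, which is the modularity the pattern is after.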

Well, the old fashioned way was to use stream access operations and read out the data you wanted. This way you could read/write to pretty much any file.
Serialization simply automates this process based on some contract.
Based on your comment, I'm guessing that your requirement is to read any kind of file without having a contract in the first place.
Let's say you have a raw file with the first byte specifying the length of a string and the next set of bytes representing the string:
For example, 5 | H | e | l | l | o
// Requires: using System.IO; using System.Text;
using var reader = new BinaryReader(File.OpenRead(filename));
byte length = reader.ReadByte();        // first byte holds the string length
byte[] b = reader.ReadBytes(length);    // keeps reading until 'length' bytes arrive or the file ends
string text = Encoding.ASCII.GetString(b);
Binary I/O is as raw as it gets.
Check MSDN for more.

Is it smart to output data from embedded device in xml format?

Our company makes many embedded devices that communicate with PCs via applications that I write in C#/.NET. I have been considering different ways of improving the data transfer so that the PC application can be more easily synchronized with the device's current state (which in some cases is continually changing).
I have been thinking about an approach where the device formats its description and state messages as XML before sending them across the serial port, USB, an Ethernet socket, etc. I was thinking that it might make the process of getting all of this data into my C# classes simpler.
The alternative is an approach where the host application sends a command like GETSTATUS and the device responds with an array of bytes, each representing a different property, sensor reading, etc.
I don't have a great deal of experience with XML, but from what I have seen of what can be done with LINQ to XML, it seems like it might be a good idea. What do you guys think? Is this something that is done commonly? Is it a horrible idea?!?

First, whichever way you go, make sure the returned data has a version number embedded so that you can revise the data structure.
Are both an option? Seriously, there are always situations where sending data in a more readable form is preferable, and others where a denser representation is best (these are fewer than most people think, but I don't want to start a religious war about it). People will passionately argue for both, because they are optimizing for different things. Providing both options would satisfy both camps.
A nice, clear XML status could definitely lower the bar for people who are starting to work with your devices. You could also build a C# object that can be deserialized from the binary data that is returned.

It isn't a terrible idea, but it is probably an overdesign. I would prefer a format that the embedded device can generate more easily and quickly. Then, on the PC side, I would insert a layer to convert it to a convenient format. Why not send the data in binary form or in a simple ASCII protocol and then convert it to C# objects? LINQ works with objects too, so you can still use it to access the data. In my opinion, XML introduces unnecessary complexity in this case.

There are tradeoffs either way, so the right choice depends on your application, how powerful your device is and who is going to be using this protocol.
You mention that the alternative is a binary-serialized, request-response approach. I think that there are two separate dimensions here: the serialization format (binary or XML) and the communication style. You can use whatever serialization format you want in either a push protocol or in a request-response protocol.
XML might be a good choice if:
readability is important;
there is variation between devices, i.e. you have different devices with different properties, since XML tends to be self-describing; or
you want to publish your device's data to the Internet.
Of course, XML is verbose, and there are certainly ways to accomplish all of the above with a binary protocol (e.g. tagged values can be used to make your binary protocol more self-describing).
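As an illustration of the tagged-values idea, here is a minimal tag-length-value (TLV) sketch. The layout (a 1-byte tag followed by a 1-byte length) and the `Tlv` class name are assumptions chosen for brevity, not an existing standard or library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical TLV codec: each entry is tag (1 byte) | length (1 byte) | value (length bytes).
public static class Tlv
{
    public static byte[] Write(IEnumerable<(byte Tag, byte[] Value)> fields)
    {
        using var stream = new MemoryStream();
        foreach (var (tag, value) in fields)
        {
            if (value.Length > 255)
                throw new ArgumentException("Value is too long for a 1-byte length field.");
            stream.WriteByte(tag);
            stream.WriteByte((byte)value.Length);
            stream.Write(value, 0, value.Length);
        }
        return stream.ToArray();
    }

    public static Dictionary<byte, byte[]> Read(byte[] data)
    {
        var result = new Dictionary<byte, byte[]>();
        int pos = 0;
        while (pos < data.Length)
        {
            if (pos + 2 > data.Length) throw new FormatException("Truncated tag/length header.");
            byte tag = data[pos];
            int length = data[pos + 1];
            pos += 2;
            if (pos + length > data.Length) throw new FormatException("Truncated value.");
            result[tag] = data.AsSpan(pos, length).ToArray();
            pos += length;
        }
        return result;
    }
}
```

Because every value carries its own tag and length, a PC application can skip tags it does not recognize, so devices with different property sets can share one parser.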

One of the founders of this very site has some sane and amusing opinions on XML in XML: The Angle Bracket Tax

I did something very similar in a previous design with PC-to-microprocessor communications using an XML format. It worked very well on the PC side, since Adobe Flex (which we were using) could interpret XML very easily, and I suspect .Net can do the same.
The more complicated part of it was on the microprocessor side. The XML parsing had to be done manually, which was not really that complicated, but just time intensive. Creating the XML string can also be quite a lot of code depending on what you're doing.
Overall - If I had to do it again, I still think XML was a good choice because it is a very flexible protocol. RAM was not that much of an issue with regards to storing a few packets in our FIFO buffer on the microprocessor side but that may be something to consider in your application.

It's a waste of precious embedded CPU time to generate and transmit XML files. Instead, I would just use an array of bytes to represent the data, and I would use structs to help interpret it. The struct feature of C# lets you easily interpret an array of bytes as meaningful data. Here's an example:
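// Requires: using System; using System.Runtime.InteropServices;
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct DeviceStatus
{
    public UInt16 position;     // Byte 0 and 1
    public Byte counter;        // Byte 2
    public Fruit currentFruit;  // Byte 3
}
// The enum must be public because it is used in a public field of a public struct.
public enum Fruit : Byte
{
    Off = 0,
    Apple = 1,
    Orange = 2,
    Banana = 3,
}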
Then you would have a function that converts your array of bytes to this struct:
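public unsafe DeviceStatus getStatus()
{
    byte[] dataFromDevice = fetchStatusFromDevice();
    // Guard against a short read before reinterpreting the bytes.
    if (dataFromDevice.Length < sizeof(DeviceStatus))
        throw new InvalidOperationException("Status packet is too short.");
    fixed (byte* pointer = dataFromDevice)
    {
        return *(DeviceStatus*)pointer;
    }
}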
Compared to XML, this method will save CPU time on the device and on the PC, and it is easier to maintain than an XML schema plus the complementary functions for building and parsing the XML file. All you have to do is make sure that the struct and enum definitions in your embedded device are the same as the definitions in your C# code, so that the C# program and the device agree on the protocol to use.
You'll probably want to use the "packed" attribute on both the C# and embedded side so that all the struct elements are positioned in a predictable way.
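If unsafe code is not an option, the same reinterpretation can be sketched with `MemoryMarshal` from `System.Runtime.InteropServices`. This is an alternative to the pointer cast above, not the answer's own method, and the `DeviceStatusCodec` name is made up for the example. Like the pointer cast, it uses the host's byte order, so it assumes the device sends multi-byte fields in the same order as the PC (little-endian on typical desktop hardware).

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public enum Fruit : byte { Off = 0, Apple = 1, Orange = 2, Banana = 3 }

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct DeviceStatus
{
    public ushort position;     // bytes 0 and 1
    public byte counter;        // byte 2
    public Fruit currentFruit;  // byte 3
}

// Hypothetical helper for this sketch.
public static class DeviceStatusCodec
{
    public static DeviceStatus Parse(ReadOnlySpan<byte> data)
    {
        // MemoryMarshal.Read throws if the span is shorter than the struct.
        return MemoryMarshal.Read<DeviceStatus>(data);
    }

    public static byte[] ToBytes(DeviceStatus status)
    {
        var buffer = new byte[Unsafe.SizeOf<DeviceStatus>()];
        // View the byte buffer as a one-element span of DeviceStatus and store into it.
        MemoryMarshal.Cast<byte, DeviceStatus>(buffer.AsSpan())[0] = status;
        return buffer;
    }
}
```

If the device is big-endian, fields such as `position` would need swapping after the read, for example with `BinaryPrimitives.ReverseEndianness`.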

Is serialization a must in order to transfer data across the wire?

Below is something I read and was wondering if the statement is true.
"Serialization is the process of converting a data structure or object into a sequence of bits so that it can be stored in a file or memory buffer, or transmitted across a network connection link to be "resurrected" later in the same or another computer environment.[1] When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward."

Serialization is just a fancy way of describing what you do when you want a certain data structure, class, etc to be transmitted.
For example, say I have a structure:
struct Color
{
int R, G, B;
};
When you transmit this over a network, you don't just say "send Color". You create a sequence of bits and send that. I could create an unsigned char*, concatenate R, G, and B into it, and then send those bytes. I just did serialization.

Serialization of some kind is required, but this can take many forms. It can be something like dotNET serialization, that is handled by the language, or it can be a custom built format. Maybe a series of bytes where each byte represents some "magic value" that only you and your application understand.
For example, in dotNET I can create a class with a single string property, mark it as serializable, and the dotNET framework takes care of almost everything else.
I can also build my own custom format where the first 4 bytes represent the length of the data being sent and all subsequent bytes are the characters of a string. But then, of course, you need to worry about byte ordering, Unicode vs. ANSI encoding, and so on.
Typically it is easier to make use of whatever framework your language/OS/dev framework uses, but it is not required.
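The custom format described above (a 4-byte length followed by the string's bytes) might look like the sketch below. The two concerns the answer raises are settled here by assumption: the length is written big-endian and the text is encoded as UTF-8. The `LengthPrefixedString` name is made up for the example.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Hypothetical codec: 4-byte big-endian length, then that many UTF-8 bytes.
public static class LengthPrefixedString
{
    public static byte[] Encode(string text)
    {
        byte[] body = Encoding.UTF8.GetBytes(text);
        var message = new byte[4 + body.Length];
        // The prefix counts bytes, not characters; the two differ for non-ASCII text.
        BinaryPrimitives.WriteInt32BigEndian(message.AsSpan(0, 4), body.Length);
        body.CopyTo(message, 4);
        return message;
    }

    public static string Decode(ReadOnlySpan<byte> message)
    {
        if (message.Length < 4) throw new FormatException("Missing length prefix.");
        int length = BinaryPrimitives.ReadInt32BigEndian(message.Slice(0, 4));
        if (length < 0 || length > message.Length - 4)
            throw new FormatException("Length prefix does not match the data.");
        return Encoding.UTF8.GetString(message.Slice(4, length));
    }
}
```

Both sides have to agree on exactly these two choices; a receiver that assumed little-endian lengths or a single-byte code page would misread the same bytes.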

Yes, serialization is the only way to transmit data over the wire. Consider what the purpose of serialization is: you define the way that the class is stored. In memory, though, you have no way to know exactly where each portion of the class is. A list in particular, if it was allocated early and then reallocated, is likely to be fragmented all over the place, so it's not one contiguous block of memory. How do you send that fragmented class over the line?
For that matter, if you send a List<ComplexType> over the wire, how does the receiver know where each ComplexType begins and ends?

The real problem here is not getting over the wire, the problem is ending up with the same semantic object on the other side of the wire. For properly transporting data between dissimilar systems -- whether via TCP/IP, floppy, or punch card -- the data must be encoded (serialized) into a platform independent representation.
Because of alignment and type-size issues, if you attempted to do a straight binary transfer of your object it would cause Undefined Behavior (to borrow the definition from the C/C++ standards).
For example the size and alignment of the long datatype can differ between architectures, platforms, languages, and even different builds of the same compiler.

Is serialization a must in order to transfer data across the wire?
Literally no.
It is conceivable that you can move data from one address space to another without serializing it. For example, a hypothetical system using distributed virtual memory could move data / objects from one machine to another by sending pages ... without any specific serialization step.
And within a machine, the objects could be transferred by switching pages from one virtual address space to another.
But in practice, the answer is yes. I'm not aware of any mainstream technology that works that way.

For anything more complex than a primitive or a homogeneous run of primitives, yes.

Binary serialization is not the only option. You can also serialize an object as XML, for example, or as JSON.

I think you're asking the wrong question. Serialization is a concept in computer programming and there are certain requirements which must be satisfied for something to be considered a serialization mechanism.
A serialization mechanism is any means of preparing data so that it can be transmitted or stored in such a way that another program (including but not limited to another instance of the same program on another system or at another time) can read the data and re-instantiate whatever objects the data represents.
Note I slipped the term "objects" in there. If I write a program that stores a bunch of text in a file, and I later use some other program, or some instance of that first program, to read that data ... I haven't really used a "serialization" mechanism. If I write it in such a way that the text is also stored with some state about how it was being manipulated ... that might entail serialization.
The term is used mostly to convey the concept that active combinations of behavior and state are being rendered into a form which can be read by another program/instance and instantiated. Most serialization mechanisms are bound to a particular programming language or virtual machine system (in the sense of a Java VM, a C# VM, etc.; not in the sense of "VMware" virtual machines). JSON and YAML are notable exceptions to this. They represent data for which there are reasonably close object classes with reasonably similar semantics, such that they can be instantiated in multiple different programming languages in a meaningful way.
It's not that all data transmission or storage entails "serialization" ... it's that certain ways of storing and transmitting data can be used for serialization. At the very least, it must be possible to disambiguate among the types of data that the programming language supports. If it reads: 1, it has to know whether that's text or an integer or a real (equivalent to 1.0) or a bit.

Strictly speaking, it isn't the only option; you could argue that "remoting" meets the meaning in the text. Here a fake object that contains no state is created at the receiver. All calls (methods, properties, etc.) are intercepted, and only the call and its result are transferred. This avoids the need to transfer the object itself, but can get very expensive if overly "chatty" usage is involved (i.e. lots of calls), as each call incurs a network round trip that is bounded below by the speed of light (which adds up).
However, "remoting" is now rather out of fashion. Most often, yes: the object will need to be serialised and deserialized in some way (there are lots of options here). The paragraph is then pretty-much correct.

Having messages as objects and serializing them into bytes is a better way of understanding and managing what is transmitted over the wire. In the old days, protocols and data were much simpler, and programmers often just put bytes into the output stream. Common understanding was shared through well-known and simple specifications.

I would say serialization is needed to store objects in a file for persistence, though dynamically allocated pointers inside the objects need to be rebuilt when we deserialize. Serialization for transfer, however, depends on the physical protocol and the mechanism used. For example, if I use a UART to transfer data, it is serialized bit by bit, but if I use a parallel port, 8 bits are transferred together, which is not serialized.

FileHelpers-like data import/export utility for binary data?

I use the excellent FileHelpers library when I work with text data. It allows me to very easily dump text fields from a file or in-memory string into a class that represents the data.
In working with a big-endian microcontroller-based system, I need to read a serial data stream. In order to save space on the very limited microcontroller platform, I need to write raw binary data which contains fields of various multi-byte types (essentially just dumping a struct variable out the serial port).
I like the architecture of FileHelpers. I create a class that represents the data and tag it with attributes that tell the engine how to put data into the class. I can feed the engine a string representing a single record and get a deserialized representation of the data. However, this is different from object serialization in that the raw data is not delimited in any way; it's a simple binary fixed-record format.
FileHelpers is probably not suitable for reading such binary data, as it cannot handle the nulls that show up and* I suspect that there might be Unicode issues (the engine takes input as a string, so I have to read bytes from the serial port and translate them into a Unicode string before they go to my data converter classes). As an experiment, I have set it up to read the binary stream, and as long as I'm careful not to send nulls it works quite well so far. It is easy to set up new converters that read the raw data and account for endian formatting issues and such. It currently fails on nulls and cannot process multiple records (it expects a CRLF between records).
What I want to know is if anyone knows of an open-source library that works similarly to FileHelpers but that is designed to handle binary data.
I'm considering deriving something from FileHelpers to handle this task, but it seems like there ought to be something already available to do this.
*It turns out that it does not complain about nulls in the input stream. I had an unrelated bug in my test program that came up where I expected a problem with the nulls. Should have investigated a little deeper first!
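For comparison, a hand-rolled reader for a fixed-size big-endian record is fairly short. The sketch below is not FileHelpers and not an existing library; the `SensorRecord` type and its 7-byte layout are invented for the example. It works on raw bytes, so null bytes and string encodings never come into play, and records are split purely by size rather than by CRLF.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Hypothetical 7-byte record: Id (uint16) | Reading (int32) | Flags (byte), all big-endian.
public readonly struct SensorRecord
{
    public const int Size = 7;

    public ushort Id { get; }
    public int Reading { get; }
    public byte Flags { get; }

    public SensorRecord(ushort id, int reading, byte flags)
    {
        Id = id;
        Reading = reading;
        Flags = flags;
    }

    public static SensorRecord Parse(ReadOnlySpan<byte> bytes)
    {
        if (bytes.Length < Size) throw new ArgumentException("Record is too short.");
        return new SensorRecord(
            BinaryPrimitives.ReadUInt16BigEndian(bytes.Slice(0, 2)),
            BinaryPrimitives.ReadInt32BigEndian(bytes.Slice(2, 4)),
            bytes[6]);
    }

    // Splits a buffer of back-to-back records; trailing bytes that do not
    // form a whole record are ignored.
    public static List<SensorRecord> ParseAll(ReadOnlySpan<byte> buffer)
    {
        var records = new List<SensorRecord>();
        for (int offset = 0; offset + Size <= buffer.Length; offset += Size)
            records.Add(Parse(buffer.Slice(offset, Size)));
        return records;
    }
}
```

An attribute-driven engine in the style of FileHelpers would generate this kind of offset-and-width logic from annotations on the class instead of having it written by hand.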

I haven't used FileHelpers, so I can't do a direct comparison; however, if you have an object model that represents your data, you could try protobuf-net. It is a binary serialization engine for .NET using Google's compact "protocol buffers" wire format. It is much more efficient than things like XML, but without the need to write all your own serialization code.
Note that "protocol buffers" does include some very terse markers between fields (typically one byte); this adds a little padding, but greatly improves version tolerance. For "packed" data (i.e. blocks of ints, say, from an array) this can be omitted if desired.
So: if you just want a compact output, it might be good. If you need a specific output, probably less so.
Disclosure: I'm the author, so I'm biased; but it is free.

When I am fiddling with GPS data in the SIRFstarIII binary mode, I use the Python interactive prompt with the serial module to fetch the stream from the USB/serial port and the struct module to convert the bytes as needed (per some format defined by SIRF). Using the interactive prompt is very flexible because I can read the string to a variable, process it, view the results and try again if needed. After the prototyping stage is finished, I have the data format strings that I need to put into the final program.
Your question doesn't mention anything about why you have a C# tag. I understand FileHelpers is a C# library, but that doesn't tell me what environment you are working in. There is an implementation of Python for .NET called IronPython.
I realize this answer might mean you have to learn a new language, but having an interactive prompt is a very powerful tool for any programmer.
