I use the excellent FileHelpers library when I work with text data. It allows me to very easily dump text fields from a file or in-memory string into a class that represents the data.
In working with a big-endian microcontroller-based system I need to read a serial data stream. In order to save space on the very limited microcontroller platform I need to write raw binary data which contains fields of various multi-byte types (essentially just dumping a struct variable out the serial port).
I like the architecture of FileHelpers. I create a class that represents the data and tag it with attributes that tell the engine how to put data into the class. I can feed the engine a string representing a single record and get a deserialized representation of the data. However, this is different from object serialization in that the raw data is not delimited in any way; it's a simple binary fixed-record format.
FileHelpers is probably not suitable for reading such binary data as it cannot handle the nulls that show up and* I suspect that there might be Unicode issues (the engine takes input as a string, so I have to read bytes from the serial port and translate them into a Unicode string before they go to my data converter classes). As an experiment I have set it up to read the binary stream, and as long as I'm careful not to send nulls it works quite well so far. It is easy to set up new converters that read the raw data and account for endian formatting issues and such. It currently fails on nulls and cannot process multiple records (it expects a CRLF between records).
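For reference, this is roughly what one of my experimental converters looks like (just a sketch of my test setup; the record class, field names, and the two-character field width are mine, and each raw byte is assumed to arrive as one character of the input string):

using FileHelpers;

// Converts two "characters" (raw bytes smuggled through the string)
// into a big-endian UInt16.
public class BigEndianUInt16Converter : ConverterBase
{
    public override object StringToField(string from)
    {
        // High byte arrives first on the wire.
        return (ushort)((from[0] << 8) | from[1]);
    }
}

[FixedLengthRecord]
public class SensorRecord
{
    [FieldFixedLength(2)]
    [FieldConverter(typeof(BigEndianUInt16Converter))]
    public ushort Position;
}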
What I want to know is if anyone knows of an open-source library that works similarly to FileHelpers but that is designed to handle binary data.
I'm considering deriving something from FileHelpers to handle this task, but it seems like there ought to be something already available to do this.
*It turns out that it does not complain about nulls in the input stream. I had an unrelated bug in my test program that came up where I expected a problem with the nulls. Should have investigated a little deeper first!
I haven't used FileHelpers, so I can't do a direct comparison; however, if you have an object model that represents your objects, you could try protobuf-net; it is a binary serialization engine for .NET using Google's compact "protocol buffers" wire format. Much more efficient than things like XML, but without the need to write all your own serialization code.
Note that "protocol buffers" does include some very terse markers between fields (typically one byte); this adds a little padding, but greatly improves version tolerance. For "packed" data (i.e. blocks of ints, say, from an array) this can be omitted if desired.
So: if you just want a compact output, it might be good. If you need a specific output, probably less so.
Disclosure: I'm the author, so I'm biased; but it is free.
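For what it's worth, usage looks roughly like this (a sketch; the Telemetry type and its members are invented for illustration):

using System.IO;
using ProtoBuf;

[ProtoContract]
public class Telemetry
{
    [ProtoMember(1)] public int Position { get; set; }
    [ProtoMember(2)] public byte Counter { get; set; }
}

class Demo
{
    static void Main()
    {
        var msg = new Telemetry { Position = 1024, Counter = 7 };

        using (var ms = new MemoryStream())
        {
            // Writes the compact protocol-buffers wire format.
            Serializer.Serialize(ms, msg);

            ms.Position = 0;
            Telemetry copy = Serializer.Deserialize<Telemetry>(ms);
        }
    }
}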
When I am fiddling with GPS data in the SIRFstarIII binary mode, I use the Python interactive prompt with the serial module to fetch the stream from the USB/serial port and the struct module to convert the bytes as needed (per some format defined by SIRF). Using the interactive prompt is very flexible because I can read the string to a variable, process it, view the results and try again if needed. After the prototyping stage is finished, I have the data format strings that I need to put into the final program.
Your question doesn't mention anything about why you have a C# tag. I understand FileHelpers is a C# library, but that doesn't tell me what environment you are working in. There is an implementation of Python for .NET called IronPython.
I realize this answer might mean you have to learn a new language, but having an interactive prompt is a very powerful tool for any programmer.
First of all, what I do at the moment:
I sniff an asynchronous serial bus with a 9-bit protocol and send the data to the PC. On the PC side I receive the data as an endless string that looks like this: .12_80E886.02_80E894.13. The PC-side software is written in C# with WinForms. Now I have the problem that there is no clear start, as you can see in the stream example, because I start sniffing somewhere in the middle of the protocol.
What I want to do:
I think I can use startindex = IndexOf("_") and set that as the new start. I have to evaluate the characters in the stream; the stream is built as _(timestamp in milliseconds).(addressbyte databyte). The only thing I want to display in my RichTextBox is the databyte, but I also need a data-management method for the timestamp, because the GUI has a function to show the time between two or more databytes; for that I think I will make a SQL database. I need the addressbyte so I can color each byte whose address bit is one in a special color.
Question:
How can I evaluate the stream so that I alternately get the timestamp, addressbyte, and databyte as single substrings?
The reason I want them like that is that I think I can then write a simple if / else if / else block to do everything I want.
If someone has a better suggestion for my project, please write it as a comment.
With friendly wishes, sniffi
I think you're trying to solve two problems at the same time. It would be better to separate them and solve them individually.
There is the issue of transporting the data; for this you are using streams, and that is a valid solution. It covers sending and receiving the data (bits) over the stream.
Then you have the problem of transforming these bits (after receiving them) into actual objects (dates, strings, etc.). For that you can use a simple parser or tokenizer, a local script that can pull the correct parts from the data and convert them, or a serialization framework (like DataContracts).
If you have simple data, I would opt for using a single method that can parse the data. For more complex scenarios I would look into serialization.
Also beware that you will need to validate your inputs, since you cannot assume that there is always a trusted (non-compromised) piece of software sending the bits to you.
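As a rough sketch of the single-method parser option, assuming the stream really is _(timestamp).(addressbyte databyte) repeated and the timestamp is hex milliseconds (both assumptions taken from the question; the names are mine):

using System;

class StreamParser
{
    // Splits ".12_80E886.02_80E894.13" style input into
    // (timestamp, data) pairs, discarding the partial record
    // that precedes the first '_'.
    public static void Parse(string raw)
    {
        string[] records = raw.Split(new[] { '_' },
                                     StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i < records.Length; i++)
        {
            // Skip the leading fragment (e.g. ".12") that has no timestamp.
            if (i == 0 && !raw.StartsWith("_")) continue;

            string[] parts = records[i].Split('.');
            if (parts.Length != 2) continue;   // malformed record: validate!

            string timestamp = parts[0];       // e.g. "80E886"
            string data = parts[1];            // e.g. "02" (address/data byte)

            long ms = Convert.ToInt64(timestamp, 16); // assuming hex
            Console.WriteLine($"{ms} ms -> {data}");
        }
    }
}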
I think a string is a bad choice. The data is probably sent as bytes, so sniff bytes rather than strings; and you need the protocol description to understand the data. You need to read the bytes from the bus and interpret them.
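In C# that means reading from the SerialPort as bytes, roughly like this (a sketch; the port name and settings are placeholders):

using System.IO.Ports;

class BusSniffer
{
    static void Main()
    {
        // Placeholder settings: adjust port name, baud rate, parity, etc.
        using (var port = new SerialPort("COM3", 115200))
        {
            port.Open();

            var buffer = new byte[256];
            int read = port.Read(buffer, 0, buffer.Length);

            // Interpret buffer[0..read) according to the bus protocol
            // instead of converting it to a string first.
        }
    }
}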
Is there a 'correct' or preferred manner for sending data over a web socket connection?
In my case, I am sending the information from a C# application to a python (tornado) web server, and I am simply sending a string consisting of several elements separated by commas. In python, I use rudimentary techniques to split the string and then structure the elements into an object.
e.g:
'foo,0,bar,1'
becomes:
object = {
'foo': 0,
'bar': 1
}
In the other direction, I am sending the information as a JSON string which I then deserialise using Json.NET.
I imagine there is no strictly right or wrong way of doing this, but are there significant advantages and disadvantages that I should be thinking of? And, somewhat related, is there a consensus for using string vs. binary formats?
Writing a custom encoding (e.g. as "k,v,..") is different from 'using binary'.
It is still text, just a rigid under-defined one-off hand-rolled format that must be manually replicated. (What happens if a key or value contains a comma? What happens if the data needs to contain nested objects? How can null be interpreted differently than '' or 'null'?)
While JSON is definitely the most ubiquitous format for WebSockets one shouldn't (for interchange purposes) write JSON by hand - one uses an existing serialization library on both ends. (There are many reasons why JSON is ubiquitous which are covered in other answers - this doesn't mean it is always the 'best' format, however.)
To this end a binary serializer can also be used (BSON being a trivial example as it is effectively JSON-like in structure and operation). Just replace JSON.parse with FORMATX.parse as appropriate.
The only requirements are then:
There is a suitable serializer/deserializer for all the clients and servers. JSON works well here because it is so popular and there is no shortage of implementations.
There are various binary serialization libraries with both Python and C# libraries, but it will require finding a 'happy intersection'.
The serialization format can represent the data. JSON usually works sufficiently and it has a very nice 1-1 correspondence with basic object graphs and simple values. It is also inherently schema-less.
Some formats are better at certain tasks and have different characteristics, features, or tool-chains. However most concepts (and arguably most DTOs) can be mapped onto JSON easily, which makes it a good 'default' choice.
The other differences between the various binary and text serializations are mostly dressing, unless you start talking about schema vs. schema-less, extensibility, external tooling, metadata, non-compressed encoded sizes (or size after transport compression), compliance with a specific existing protocol, and so on. The point to take away is: don't create a 'new' one-off format, unless of course you just like making wheels or there is a very specific use-case to fit.
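For instance, on the C# end this amounts to little more than the following (a sketch using Json.NET, which the question already mentions; the Message type is invented):

using Newtonsoft.Json;

public class Message
{
    public int Foo { get; set; }
    public int Bar { get; set; }
}

class Demo
{
    static void Main()
    {
        // Serialize: produces {"Foo":0,"Bar":1}
        string json = JsonConvert.SerializeObject(new Message { Foo = 0, Bar = 1 });

        // Deserialize the other direction with the same library.
        Message roundTripped = JsonConvert.DeserializeObject<Message>(json);
    }
}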
First advice would be to use the same format for both ways, not plain text in one direction and JSON in the other.
I personally think {'foo':0,'bar':1} is better than foo,0,bar,1 because everybody understands JSON, but for your custom format they might not without some explanation. The idea is that you are inventing a data interchange format when JSON already is one, and @jfriend00 is right: pretty much every language now understands JSON, Python included.
Regarding text vs. binary, there isn't any consensus. As @user2864740 mentions in the comments to my answer, as long as the two sides understand each other, it doesn't really matter. This only becomes relevant if one of the sides has a preference for a format (consider for example opening the connection from the browser, using JavaScript; for that, people might prefer JSON over binary).
My advice is to go with something simple like JSON and design your app so that you can change the wire format by swapping in another implementation without affecting the logic of your application.
I am receiving a binary stream from an application I am running in Python.
From the binary stream, I want to create a C# object that is inside the stream as a byte array.
How do I deserialise the object and retrieve it from the binary stream?
We can ignore that it's a python application. I am more interested in how binary streaming works.
You seem to think that all languages automatically use the same serialization scheme.
This is not so.
It is not even theoretically possible, because different programming languages have different notions of what it means to be an object.
If you are specifically interested in how to read a Python serialized stream in C#, then ask that. Otherwise, this question is unanswerable because it is based on a false premise.
FOLLOW UP - Out of curiosity, I did some searching for a Python pickle reader in C#. Nothing in the first 3 pages of search results ... though there was a reference to a pickle reader in C++.
Just to add you a little general info:
In C#/.NET the general approach is NOT to serialize objects to a binary form, because a binary form needs a lot of protocol-like headers to include the metadata, and this forces the receiver to know the .NET/CLR internal structure very well.
Instead, today, objects are usually serialized to XML (when type information is crucial) or JSON (when only the data matters), so that any receiver may read them quite easily and, more often, any third party may easily generate new object-like data that our application may "just deserialize", regardless of who generated it and on what platform.
However, binary serialization is still used: XML/JSON data, even if compressed, is usually still larger than the binary image. Binary serialization is generally reserved for when we want the data not to be published to the outside world, or when we somehow magically know that it will only be processed on .NET with the use of our assemblies.
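For example, the XML route in .NET can be as simple as this (a sketch; the Reading class is invented for illustration):

using System.IO;
using System.Xml.Serialization;

public class Reading
{
    public string Sensor { get; set; }
    public double Value { get; set; }
}

class Demo
{
    static void Main()
    {
        var serializer = new XmlSerializer(typeof(Reading));

        // Any receiver that can parse XML can consume this,
        // without knowing anything about the CLR.
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, new Reading { Sensor = "temp", Value = 21.5 });
            string xml = writer.ToString();

            using (var reader = new StringReader(xml))
            {
                var copy = (Reading)serializer.Deserialize(reader);
            }
        }
    }
}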
C# object
C# does not have objects; it's a .NET object.
Secondly, we absolutely CANNOT ignore that it's a Python application, because that implies it's likely not running on .NET, and therefore the .NET binary format is not native to your Python runtime. That's not to say that .NET serialization is unavailable to you in this case: if you're running IronPython, the .NET Python implementation, then you can simply use the binary serialization APIs from within that and get the .NET object that was serialized.
If, however, it's Python running on a different platform, then you can decode the information in the binary stream, but for that you need to know the format; go straight to the horse's mouth and read through the Binary Format Data Structure spec on MSDN.
This will, of course, require (quite a lot) more work!
If the project you're working on allows you to change the way that the original object is serialized, then I strongly suggest changing over to XML serialization or something similar - that is designed to be portable.
Our company makes many embedded devices that communicate with PCs via applications that I write in C#/.NET. I have been considering different ways of improving the data transfer so that the PC application can be more easily synchronized with the device's current state (which in some cases is continually changing).
I have been thinking about an approach where the device formats its description and state messages into an XML-formatted message before sending them across the serial port, USB, Ethernet socket, etc. I was thinking that it may make the process of getting all of this data into my C# classes simpler.
The alternative is an approach where the host application sends a command like GETSTATUS and the device responds with an array of bytes, each representing a different property, sensor reading, etc.
I don't have a great deal of experience with XML, but from what I have seen can be done with LINQ to XML, it seems like it might be a good idea. What do you guys think? Is this something that is done commonly? Is it a horrible idea?!?
First, whichever way you go, make sure the returned data has a version number embedded so that you can revise the data structure.
Is both an option? Seriously, there are always situations where sending data in a more readable form is preferable, and others where a more dense representation is best (these are fewer than most people think, but I don't want to start a religious war about it). People will passionately argue for both, because they are optimizing for different things. Providing both options would satisfy both camps.
A nice, clear XML status could definitely lower the bar for people who are starting to work with your devices. You could also build a C# object that can be deserialized from the binary data that is returned.
It isn't a terrible idea, but it is probably an overdesign. I would prefer a format that the embedded device can generate more easily and quickly, and then insert a layer on the PC side to convert it to a convenient format. You can also use LINQ with plain objects. Why not send the data in binary form, or in a simple ASCII protocol, and then convert it to C# objects? You can use LINQ to access the data. In my opinion, XML introduces unnecessary complexity in this case.
There are tradeoffs either way, so the right choice depends on your application, how powerful your device is and who is going to be using this protocol.
You mention that the alternative is a binary-serialized, request-response approach. I think that there are two separate dimensions here: the serialization format (binary or XML) and the communication style. You can use whatever serialization format you want in either a push protocol or in a request-response protocol.
XML might be a good choice if
Readability is important
There is variation between devices, i.e. you have different devices with different properties, since XML tends to be self-describing.
You want to publish your device's data to the Internet.
Of course, XML is verbose, and there are certainly ways to accomplish all of the above with a binary protocol (e.g. tagged values can be used to make your binary protocol more descriptive).
One of the founders of this very site has some sane and amusing opinions on XML in XML: The Angle Bracket Tax
I did something very similar in a previous design with PC-to-microprocessor communications using an XML format. It worked very well on the PC side since Adobe Flex (which we were using) could interpret XML very easily, and I suspect .NET can do the same thing very easily.
The more complicated part of it was on the microprocessor side. The XML parsing had to be done manually, which was not really that complicated, but just time intensive. Creating the XML string can also be quite a lot of code depending on what you're doing.
Overall - If I had to do it again, I still think XML was a good choice because it is a very flexible protocol. RAM was not that much of an issue with regards to storing a few packets in our FIFO buffer on the microprocessor side but that may be something to consider in your application.
It's a waste of precious embedded CPU time to generate and transmit XML files. Instead, I would just use an array of bytes to represent the data, but I would use structs to help interpret it. The struct feature of C# lets you easily interpret an array of bytes as meaningful data. Here's an example:
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct DeviceStatus
{
public UInt16 position; // Byte 0 and 1
public Byte counter; // Byte 2
public Fruit currentFruit; // Byte 3
};
enum Fruit : Byte
{
Off = 0,
Apple = 1,
Orange = 2,
Banana = 3,
}
Then you would have a function that converts your array of bytes to this struct:
public unsafe DeviceStatus getStatus()
{
byte[] dataFromDevice = fetchStatusFromDevice();
fixed (byte* pointer = dataFromDevice)
{
return *(DeviceStatus*)pointer;
}
}
Compared to XML, this method will save CPU time on the device and on the PC, and it is easier to maintain than an XML schema with complementary functions for building and parsing the XML file. All you have to do is make sure that the struct and enum definitions in your embedded device are the same as the definitions in your C# code, so that the C# program and device agree on the protocol to use.
You'll probably want to use the "packed" attribute on both the C# and embedded side so that all the struct elements are positioned in a predictable way.
Below is something I read and was wondering if the statement is true.
Serialization is the process of converting a data structure or object into a sequence of bits so that it can be stored in a file or memory buffer, or transmitted across a network connection link to be "resurrected" later in the same or another computer environment.[1] When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward.
Serialization is just a fancy way of describing what you do when you want a certain data structure, class, etc. to be transmitted.
For example, say I have a structure:
struct Color
{
int R, G, B;
};
When you transmit this over a network you don't say "send Color". You create a line of bits and send it. I could create an unsigned char*, concatenate R, G, and B, and then send those bytes. I just did serialization.
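In C#, that manual step might look like this (a sketch mirroring the struct above; note that BitConverter uses the machine's native byte order, so both ends must agree on endianness):

using System;

class ColorSerializer
{
    // Lay R, G, B out as a flat run of bytes: that IS serialization.
    public static byte[] Serialize(int r, int g, int b)
    {
        var bytes = new byte[12];
        BitConverter.GetBytes(r).CopyTo(bytes, 0);
        BitConverter.GetBytes(g).CopyTo(bytes, 4);
        BitConverter.GetBytes(b).CopyTo(bytes, 8);
        return bytes;           // this is what goes over the network
    }

    public static (int R, int G, int B) Deserialize(byte[] bytes)
    {
        return (BitConverter.ToInt32(bytes, 0),
                BitConverter.ToInt32(bytes, 4),
                BitConverter.ToInt32(bytes, 8));
    }
}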
Serialization of some kind is required, but this can take many forms. It can be something like dotNET serialization, that is handled by the language, or it can be a custom built format. Maybe a series of bytes where each byte represents some "magic value" that only you and your application understand.
For example, in dotNET I can create a class with a single string property, mark it as serializable, and the dotNET framework takes care of most everything else.
I can also build my own custom format where the first 4 bytes represent the length of the data being sent and all subsequent bytes are characters in a string. But then of course you need to worry about byte ordering, unicode vs ansi encoding, etc etc.
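A sketch of that hand-rolled format, assuming we settle on a little-endian length and UTF-8 characters (both choices are mine, purely for illustration):

using System.IO;
using System.Text;

class LengthPrefixedString
{
    // First 4 bytes: payload length; remaining bytes: the characters.
    public static void Write(Stream stream, string value)
    {
        byte[] payload = Encoding.UTF8.GetBytes(value);    // encoding choice!
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(payload.Length);    // little-endian on .NET
            writer.Write(payload);
        }
    }

    public static string Read(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            int length = reader.ReadInt32();
            return Encoding.UTF8.GetString(reader.ReadBytes(length));
        }
    }
}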
Typically it is easier to make use of whatever framework your language/OS/dev framework uses, but it is not required.
Yes, serialization is the only way to transmit data over the wire. Consider what the purpose of serialization is: you define the way the class is stored. In memory, though, you have no way to know exactly where each portion of the class is. Especially if you have, for instance, a list that was allocated early but then reallocated, it's likely to be fragmented all over the place, so it's not one contiguous block of memory. How do you send that fragmented class over the line?
For that matter, if you send a List<ComplexType> over the wire, how does the other side know where each ComplexType begins and ends?
The real problem here is not getting over the wire, the problem is ending up with the same semantic object on the other side of the wire. For properly transporting data between dissimilar systems -- whether via TCP/IP, floppy, or punch card -- the data must be encoded (serialized) into a platform independent representation.
Because of alignment and type-size issues, if you attempted to do a straight binary transfer of your object it would cause Undefined Behavior (to borrow the definition from the C/C++ standards).
For example the size and alignment of the long datatype can differ between architectures, platforms, languages, and even different builds of the same compiler.
Is serialization a must in order to transfer data across the wire?
Literally no.
It is conceivable that you could move data from one address space to another without serializing it. For example, a hypothetical system using distributed virtual memory could move data / objects from one machine to another by sending pages ... without any specific serialization step. And within a machine, objects could be transferred by switching pages from one virtual address space to another.
But in practice, the answer is yes. I'm not aware of any mainstream technology that works that way.
For anything more complex than a primitive or a homogeneous run of primitives, yes.
Binary serialization is not the only option. You can also serialize an object as an XML file, for example. Or as JSON.
I think you're asking the wrong question. Serialization is a concept in computer programming and there are certain requirements which must be satisfied for something to be considered a serialization mechanism.
Any means of preparing data such that it can be transmitted or stored in such a way that another program (including but not limited to another instance of the same program on another system or at another time) can read the data and re-instantiate whatever objects the data represents.
Note I slipped the term "objects" in there. If I write a program that stores a bunch of text in a file, and I later use some other program (or another instance of the first) to read that data, I haven't really used a "serialization" mechanism. If I write it in such a way that the text is also stored with some state about how it was being manipulated, that might entail serialization.
The term is used mostly to convey the concept that active combinations of behavior and state are being rendered into a form which can be read by another program/instance and instantiated. Most serialization mechanisms are bound to a particular programming language or virtual machine system (in the sense of a Java VM or a C# VM, not in the sense of "VMware" virtual machines). JSON (and YAML) are a notable exception: they represent data for which there are reasonably close object classes with reasonably similar semantics, such that they can be instantiated in multiple different programming languages in a meaningful way.
It's not that all data transmission or storage entails "serialization"; it's that certain ways of storing and transmitting data can be used for serialization. At the very least it must be possible to disambiguate among the types of data that the programming language supports: if it reads 1, it has to know whether that's text, an integer, a real (equivalent to 1.0), or a bit.
Strictly speaking it isn't the only option; you could argue that "remoting" meets the meaning in the text. Here a fake object is created at the receiver that contains no state; all calls (methods, properties, etc.) are intercepted and only the call and result are transferred. This avoids the need to transfer the object itself, but can get very expensive if overly "chatty" usage is involved (i.e. lots of calls), as each call has the latency of the speed of light (which adds up).
However, "remoting" is now rather out of fashion. Most often, yes: the object will need to be serialised and deserialized in some way (there are lots of options here). The paragraph is then pretty-much correct.
Having messages as objects and serializing them into bytes is a better way of understanding and managing what is transmitted over the wire. In the old days protocols and data were much simpler; often, programmers just put bytes onto the output stream, and common understanding was shared through well-known, simple specifications.
I would say serialization is needed to store objects in a file for persistence, and dynamically allocated pointers inside objects need to be rebuilt when we deserialize. But serialization for transfer depends on the physical protocol and mechanism used: for example, if I use a UART to transfer data, it is serialized bit by bit, but if I use a parallel port, 8 bits are transferred together, which is not serialized.