I'm looking at protobuf-net for implementing various messaging formats, and I particularly like the contract-based approach as I don't have to mess with the proto compiler. one thing I couldn't quite find information on is, does this make it difficult to work cross-platform? there are a few C++ apps that would need to be able to parse PB data, and while I understand that protobuf-net serializes to the PB standard format, if I use the contract approach and not a proto file, how does the C++ side parse the data?
can (should?) I write a separate proto file for the (very few) cases where C++ needs to understand the data? and if so, how exactly do I know that the C++ class generated from the proto file is going to match the data from the no-proto-file C# side?
Yes, in theory at least they should match at the binary level, but you might want to limit yourself to types that map simply to ".proto" - so avoid things like DateTime, inheritance ([ProtoInclude]), etc. This also has the advantage that you should be able to use:
string proto = Serializer.GetProto<YourType>();
to get the .proto; it (GetProto) isn't 100%, but it works for basic types. But ultimately, the answer is "testing and tweaking"; perhaps design for interop from the outset - i.e. test this early.
Related
Is there a 'correct' or preferred manner for sending data over a web socket connection?
In my case, I am sending the information from a C# application to a python (tornado) web server, and I am simply sending a string consisting of several elements separated by commas. In python, I use rudimentary techniques to split the string and then structure the elements into an object.
e.g:
'foo,0,bar,1'
becomes:
object = {
'foo': 0,
'bar': 1
}
In the other direction, I am sending the information as a JSON string which I then deserialise using Json.NET
I imagine there is no strictly right or wrong way of doing this, but are there significant advantages and disadvantages that I should be thinking of? And, somewhat related, is there a consensus for using string vs. binary formats?
Writing a custom encoding (eg, as "k,v,..") is different than 'using binary'.
It is still text, just a rigid under-defined one-off hand-rolled format that must be manually replicated. (What happens if a key or value contains a comma? What happens if the data needs to contain nested objects? How can null be interpreted differently than '' or 'null'?)
While JSON is definitely the most ubiquitous format for WebSockets one shouldn't (for interchange purposes) write JSON by hand - one uses an existing serialization library on both ends. (There are many reasons why JSON is ubiquitous which are covered in other answers - this doesn't mean it is always the 'best' format, however.)
To this end a binary serializer can also be used (BSON being a trivial example as it is effectively JSON-like in structure and operation). Just replace JSON.parse with FORMATX.parse as appropriate.
The only requirements are then:
There is a suitable serializer/deserializer for the all the clients and servers. JSON works well here because it is so popular and there is no shortage of implementations.
There are various binary serialization libraries with both Python and C# libraries, but it will require finding a 'happy intersection'.
The serialization format can represent the data. JSON usually works sufficiently and it has a very nice 1-1 correspondence with basic object graphs and simple values. It is also inherently schema-less.
Some formats are better are certain tasks and have different characteristics, features, or tool-chains. However most concepts (and arguably most DTOs) can be mapped onto JSON easily which makes it a good 'default' choice.
The other differences between different kinds of binary and text serializations is most mostly dressing - but if you'd like to start talking about schema vs. schema-less, extensibility, external tooling, metadata, non-compressed encoded sizes (or size after transport compression), compliance with a specific existing protocol, etc..
.. but the point to take away is don't create a 'new' one-off format. Unless of course, you just like making wheels or there is a very specific use-case to fit.
First advice would be to use the same format for both ways, not plain text in one direction and JSON in the other.
I personally think {'foo':0,'bar':1} is better than foo,0,bar,1 because everybody understands JSON but for your custom format they might not without some explanations. The idea is you are inventing a data interchange format when JSON is already one and #jfriend00 is right, pretty much every language now understands JSON, Python included.
Regarding text vs binary, there isn't any consensus. As # user2864740 mentions in the comments to my answer as long as the two sides understand each other, it doesn't really matter. This only becomes relevant if one of the sides has a preference for a format (consider for example opening the connection from the browser, using JavaScript - for that people might prefer JSON instead of binary).
My advice is to go with something simple as JSON and design your app so that you can change the wire format by swapping in another implementation without affecting the logic of your application.
so I feel like this is a thing that I've seen before, but I can't think what it's called.
I have a java program which loads up an object from a file (I'm going to call this object the catalogue). Defining the file format, and writing the parser for it has taken a fair amount of time. Recently I've had developers that are want to load the file so they can use the catalogue for their own tools, but are working in a different language (current just C#, but potentially others in future).
Most of our tools run out on a server, so to prevent other people having to write their own parser for the file format, I plan on having a web service running which uses my program to load the catalogue, then returns it's contents as a json string. (Encoding the original file as json is not an option). In future I may just return bits of the json string, since the whole contents of the file can be pretty massive.
What I'm wondering is, do I have to write my catalogue object in each language, and write a json parser for it in that language, or is there a tool that will allow me to do something similar? I'm hoping for something that just has a simple format for declaring storage classes, generate corresponding code for various languages, and have a default serialize/deserialize for something like json. Currently I'd need it to support java and C#.
Does such a thing exist?
As long as the target language can make a web call, and parse JSON strings, you should be OK. This covers just about all mainstream languages I think. C# is certainly OK.
I am recieving binary stream from an application I am running in Python.
From the binary stream, I want to create a C# object that is inside the stream in byte array.
How do I deserialise the object and retrieve the object from the binary stream?
We can ignore that it's a python application. I am more interested in how binary streaming works.
You seem to think that all languages automatically use the same serialization scheme.
This is not so.
It is not even theoretically possible, because different programming languages have different notions of what it means to be an object.
If you are specifically interested in how to read a Python serialized stream in C#, then ask that. Otherwise, this question is unanswerable because it is based on a false premise.
FOLLOW UP - Out of curiosity, I did some searching for a Python pickle reader in C#. Nothing in the first 3 pages of search results ... though there was a reference to a pickle reader in C++.
Just to add you a little general info:
In C#/.Net there's a general approach to serialize objects to NOT a binary form, because a binary form needs a lot of protocol-like headers to - note - include the metadata, and this causes the receiver to have to know the .Net/CLR inner structure very well.
Instead, today, the objects are usually serialized to XML (when type information is crucial) or JSON formats (when only data matters), so that any receiver may read them quite easily, and more often - any 3rd party may easily generate new object-like data that our application may "just deserialize", regardless of who generated it and on what platform.
However, binary serialization is still used. XML/JSON data, even if compressed, is still usually larger than the binary image. However, the binary serialization is strictly used when we want the data to not be published to the outside world, or if we somehow magically know that it will be only processed on .Net with use of our assemblies.
C# object
C# does not have objects; it's a .Net object.
Secondly we absolutely CANNOT ignore that it's a Python application, because that implies that it's likely it's not running on .Net and therefore the .Net binary format is not native to your Python runtime. That's not to say that it's not possible for the .Net serialization to be available to you in this case, because if you're running IronPython - the .Net python implementation - then you can simply use the Binary serialization APIs from within that and get the .Net object that was serialized.
If, however, it's Python running on a different platform, then you can decode the information in the binary stream, for that you need to know the format, and for that go straight to the horse's mouth and read through the Binary Format Data Structure spec from MSDN.
This will, of course, require (quite a lot) more work!
If the project you're working on allows you to change the way that the original object is serialized, then I strongly suggest changing over to XML serialization or something similar - that is designed to be portable.
I have two application that need to talk to each other. App1 needs to be able to serialize an object that App2 can then deserialize. Easily done, right? Here's the problem; App1 is C# based, App2 is Java based. So App1 needs to write out the file in the Java binary file format. How can this be done?
The way I see it, I have two options. The first is figure out some way to serialize a Java object in C#, so that App1 just creates the appropriate file. My other option would be to write a converter in Java that reads in a file and populates the object accordingly and serializes the newly populated object. That way the C# app would only have to write out some sort of formatted text file that the converter then interprets.
I can't make any changes to the Java application.
How should this be done?
Update:
The Java application is already in the hands of customers so changing the serialization scheme would cause the customers existing data to be incompatible. The Java App uses the native java serialization when dealing with this object. Modifications to the Java app can't happen.
The C# app uses protocol buffers to serialize its own data.
The best answer is option 3:
use a language-neutral serialization scheme.
I use JavaScript. Thrift is another option, protocol buffers I believe are more focused on RPC, but should be usable for serialization as well. XML or a custom binary format would be other options.
Edit:
Sorry, didn't notice that you can't make changes to the Java application. That said, the best way to do it would probably be to create your own well defined format, write a java app that can read that format, then output a serialized java object for the legacy app.
"IKVM" might be something you could use. This product allows you to convert compiled java bytecode (.jar, etc.) into a .NET DLL. It's super easy to use, and might give you the interop you need.
Other than this, the easiest way to accomplish this without a binary-level interop is to just use a plain text format, such as a CSV or XML.
Just use XML serialization. Both frameworks have good support, and the simplicity will make it easier to debug / maintain. Write a small program in Java that just imports the XML and writes the binary file.
Your best bet would be to write something that uses Java Native interface. Not fun, but it'll work.
You can do this directly using JNI (not fun but doable) or there may be some tools out there that will generate code for you -- take a look at SWIG: http://www.swig.org/
You would call Java from C# to do the persistence for you.
I can use write(&stName,sizeof(stName),&FileName) and define a same struct in other program to read the file(XXX.h) when i use C, But I want do the same use C# and I should not use the unsafe mode. How do to solve the problem?
Edit:
thanks all. I will to try them
Edit:
Now if I want to use C write the Struct to file.h and use C# to read the struct from file.h, may I have chance solve that and not to count the offset? Because count the offset is not a good answer when I want to add some variable or other struct in the struct.
Look at the ISerializable interface and Serialization in general.
Even in C, this is a dangerous thing to do IMO. If you use a different compiler, operating system, architecture etc you can very easily "break" your data files - you're absolutely relying on the layout of the data in memory. It's a bit like exposing fields directly instead of properties - the in-memory layout should be an implementation detail which can be changed without the public form (the file) changing.
There are lots of ways of storing data, of course. For example:
Binary serialization (still pretty fragile, IMO)
XML serialization
Google's Protocol Buffers
Thrift
YAML
Hand-written serialization/deserialization e.g. using BinaryReader and BinaryWriter
There are balances in terms of whether the file is human readable, speed, size etc. See this question for answers to a similar question about Java.
You should take a look at the NetDataContractSerializer. You can markup those portions of the struct that you wish to serializer and use a file stream to write them out.
Look at the StructLayoutAttribute
Use Managed C++ or C++/CLI. It can read your .h file struct. It can read and write using:
read(in, &structure, sizeof(structure));
write(out, &structure, sizeof(structure));
and it can transfer that data very simply to anything else in .NET.
You'll have to convert each member of the struct individually using Bitconverter.convert(). This works well when your struct holds numeric data types, but you might have to do something more complex when using structs that contain more complicated data types like strings, arrays, and collections. For purposes like this, you will want to check out the .Net serialization facilities that other have mentioned.
You can look into Google Protocol buffers as well. You may not want to add another dependency into your code, but it's meant to allow you to do this sort of thing.