Fast and compact object serialization in .NET (C#)

I want to use object serialization to communicate over the network between a Mono server and Silverlight clients.
It is important that serialization is space efficient and fast, as the server is going to host multiple real-time games.
What technique should I use? The BinaryFormatter adds a lot of overhead to serialized classes (Version, culture, class name, property names, etc.) that is not required within this application.
What can I do to make this more space efficient?

You can use Protocol Buffers. I'm changing all my serialization code from BinaryFormatter with compression to Protocol Buffers and obtaining very good results. It's more efficient in both time and space.
There are two .NET implementations by Jon Skeet and Marc Gravell.
Update: Official .NET implementation can be found here.

I have some benchmarks for the leading .NET serializers available based on the Northwind dataset.
Marc Gravell's binary protobuf-net is the fastest implementation benchmarked, about 7x faster than Microsoft's fastest serializer in the BCL (the XML DataContractSerializer).
I also maintain some open-source high-performance .NET text serializers as well:
JSV TypeSerializer, a compact, clean, JSON+CSV-like format that's 3.1x quicker than the DataContractSerializer,
as well as a JsonSerializer that's 2.6x quicker.

As the author, I would invite you to try protobuf-net; it ships with binaries for both Mono 2.0 and Silverlight 2.0, and is fast and efficient. If you have any problems whatsoever, just drop me an e-mail (see my Stack Overflow profile); support is free.
Jon's version (see the earlier accepted answer) is also very good, but IMO the protobuf-net version is more idiomatic for C# - Jon's would be ideal if you were talking C# to Java, so you could have a similar API at both ends.
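For a flavour of the protobuf-net API, here is a minimal sketch (the PlayerState type and its fields are invented for illustration): you decorate your types with attributes, and only the numeric field tags go on the wire, not type or property names, which is where the space saving comes from.

```csharp
using System.IO;
using ProtoBuf; // protobuf-net

[ProtoContract]
public class PlayerState
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public float X { get; set; }
    [ProtoMember(3)] public float Y { get; set; }
}

public static class Wire
{
    public static byte[] Pack(PlayerState state)
    {
        using (var ms = new MemoryStream())
        {
            // Compact tag/varint encoding; no version, culture or name metadata.
            Serializer.Serialize(ms, state);
            return ms.ToArray();
        }
    }

    public static PlayerState Unpack(byte[] data)
    {
        using (var ms = new MemoryStream(data))
            return Serializer.Deserialize<PlayerState>(ms);
    }
}
```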

I had a similar problem, although I'm just using .NET. I wanted to send data over the Internet as quickly and easily as possible. I didn't find anything that would be optimized enough, so I made my own serializer, named NetSerializer.
NetSerializer has its limitations, but they didn't affect my use case. I haven't run benchmarks for a while, but it was much, much faster than anything else I found.
I haven't tried it on Mono or Silverlight. I'd bet it works on Mono, but I'm not sure what the level of support is for DynamicMethods on Silverlight.
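Usage is along these lines (a rough sketch; check the project page for the exact current API, and ChatMessage is just an example type):

```csharp
using System;
using System.IO;
using NetSerializer; // https://github.com/tomba/netserializer

[Serializable]
public class ChatMessage
{
    public string Sender;
    public string Text;
}

static class Demo
{
    static void Main()
    {
        // All root types are given up front so the serializer can
        // pre-generate its (de)serialization code with DynamicMethods.
        var serializer = new Serializer(new[] { typeof(ChatMessage) });

        using (var ms = new MemoryStream())
        {
            serializer.Serialize(ms, new ChatMessage { Sender = "alice", Text = "hi" });
            ms.Position = 0;
            var msg = (ChatMessage)serializer.Deserialize(ms);
            Console.WriteLine(msg.Text); // "hi"
        }
    }
}
```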

You could try using JSON. It's not as bandwidth efficient as Protocol Buffers, but it would be a lot easier to monitor messages with tools like Wireshark, which helps a lot when debugging problems. .NET 3.5 comes with a JSON serializer.
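For example, with the DataContractJsonSerializer that ships in .NET 3.5 (the Move type here is just an illustration):

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json; // System.ServiceModel.Web.dll in .NET 3.5

[DataContract]
public class Move
{
    [DataMember] public int PlayerId { get; set; }
    [DataMember] public string Action { get; set; }
}

public static class JsonWire
{
    public static byte[] Pack(Move move)
    {
        var serializer = new DataContractJsonSerializer(typeof(Move));
        using (var ms = new MemoryStream())
        {
            serializer.WriteObject(ms, move); // UTF-8 JSON, readable in Wireshark
            return ms.ToArray();
        }
    }

    public static Move Unpack(byte[] json)
    {
        var serializer = new DataContractJsonSerializer(typeof(Move));
        using (var ms = new MemoryStream(json))
            return (Move)serializer.ReadObject(ms);
    }
}
```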

You could pass the data through a DeflateStream or GZipStream to compress it prior to transmission. These classes live in the System.IO.Compression namespace.
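A minimal sketch of that approach (note that for very small payloads the gzip header can actually make the result larger, so it's worth measuring):

```csharp
using System.IO;
using System.IO.Compression;

public static class Gz
{
    public static byte[] Compress(byte[] payload)
    {
        using (var output = new MemoryStream())
        {
            // The compression stream must be closed to flush its final block.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
                gzip.Write(payload, 0, payload.Length);
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            var buffer = new byte[4096];
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
                output.Write(buffer, 0, read);
            return output.ToArray();
        }
    }
}
```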

I had a very similar problem - saving to a file. But the following can also be used over a network as it was actually designed for remoting.
The solution is to use Simon Hewitt's library - see Optimizing Serialization in .NET - part 2.
Part 1 of the article states (the bold is my emphasis):
"... If you've ever used .NET remoting for large amounts of data, you will have found that there are problems with scalability. For small amounts of data, it works well enough, but larger amounts take a lot of CPU and memory, generate massive amounts of data for transmission, and can fail with Out Of Memory exceptions. There is also a big problem with the time taken to actually perform the serialization - large amounts of data can make it unfeasible for use in apps ...."
I got a similar result for my particular application: 40 times faster saving and 20 times faster loading (from minutes to seconds). The size of the serialised data was also much reduced; I don't remember exactly, but it was at least 2-3 times smaller.
It is quite easy to get started. However, there is one gotcha: only use .NET serialisation for the very highest level data structure (to get serialisation/deserialisation started), and then call the serialisation/deserialisation functions directly for the fields of that highest level data structure. Otherwise there will not be any speed-up. For instance, if a particular data structure (say Generic.List) is not supported by the library, then .NET serialisation will be used instead, and this is a no-no. Instead, serialise the list in client code (or similar). For an example see near "'This is our own encoding." in the same function as listed below.
For reference: code from my application - see near "Note: this is the only place where we use the built-in .NET ...".
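To make the gotcha concrete, the pattern looks schematically like the sketch below. The exact names in Hewitt's SerializationWriter/SerializationReader API differ from what I show here; treat every call as illustrative, not verbatim. The point is that .NET serialisation only touches the top-level type, and every field, including the list, is encoded by hand.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

[Serializable]
public class GameState : ISerializable
{
    private List<int> scores = new List<int>();

    public GameState() { }

    // .NET serialisation is used only to enter this top-level type.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        var writer = new SerializationWriter(); // from Hewitt's library
        // "This is our own encoding": the list is written out manually,
        // so the slow generic fallback is never triggered.
        writer.Write(scores.Count);
        foreach (int s in scores)
            writer.Write(s);
        writer.AddToSerializationInfo(info);    // illustrative method name
    }

    protected GameState(SerializationInfo info, StreamingContext context)
    {
        var reader = new SerializationReader(info); // illustrative constructor
        int n = reader.ReadInt32();
        for (int i = 0; i < n; i++)
            scores.Add(reader.ReadInt32());
    }
}
```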

You can try BOIS, which focuses on packed data size and provides the best packing so far. (I haven't seen better optimization yet.)
https://github.com/salarcode/Bois
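Usage is roughly as follows (going by the project README; verify against the current docs):

```csharp
using System.IO;
using Salar.Bois; // https://github.com/salarcode/Bois

public static class BoisWire
{
    public static byte[] Pack<T>(T value)
    {
        var serializer = new BoisSerializer();
        using (var mem = new MemoryStream())
        {
            serializer.Serialize(value, mem); // compact, schema-less binary
            return mem.ToArray();
        }
    }

    public static T Unpack<T>(byte[] data)
    {
        var serializer = new BoisSerializer();
        using (var mem = new MemoryStream(data))
            return serializer.Deserialize<T>(mem);
    }
}
```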

Related

Serialization in C# and de-serialization in Java

Is it possible to serialize a class/object in C# and deserialize the same in Java? I want to serialize the class itself and not any XML/JSON data. Please clarify.
I see 3 options here. I suggest option 1, Protobufs.
Look into Google's ProtoBufs
Or some equivalent. Here's the Java version. Here's a C# port.
Protobufs are meant for this sort of language interop. The format is binary, small, fast, and language agnostic.
It also has backwards compatibility, so if you change the serialized objects in the future, you can still read the old data. This is transparent to you too, as long as you write code understanding that newer fields may be missing when deserializing old objects. This is a huge advantage!
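To illustrate, a shared .proto schema (the Player message is my own example) is compiled to both Java and C# classes, and the numbered tags are what make old data readable by new code:

```proto
// proto2 syntax, as used by the ports mentioned above
message Player {
  required int32 id     = 1;
  optional string name  = 2;
  // Added in a later release: old messages simply lack tag 3, and old
  // readers skip unknown tags, which is what makes the format version tolerant.
  optional int32 rating = 3;
}
```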
Implement one language's default serialization in the other
You can try implementing the Java serialization logic in C#, or the C# serialization routines in Java. I don't suggest this, as it will be more difficult and more verbose, it will almost certainly be slower since you're writing new code, and it will net you the same result.
Write your serialization routines by hand
This will certainly be fast, but tedious, more error prone, harder to maintain, less flexible...
Here are some benchmarks for libraries like ProtoBufs. They should aid you in selecting the best one for your use case.
We did this a while ago. It worked after a lot of tinkering; it really depends on byte encoding. I think Java uses one byte order and C# uses another (little endian vs. big endian), so you will need to implement a deserializer that takes these effects into account. Hope this helps.
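For what it's worth, on the C# side the conversion can be done with the framework's network-order helpers; Java's DataOutputStream writes big-endian, while BitConverter on x86 .NET is little-endian (a rough sketch):

```csharp
using System;
using System.Net;

public static class ByteOrder
{
    // Produce the big-endian bytes Java's DataOutputStream.writeInt() expects.
    public static byte[] WriteInt32BigEndian(int value)
    {
        return BitConverter.GetBytes(IPAddress.HostToNetworkOrder(value));
    }

    // Read a big-endian int written by Java back into a native C# int.
    public static int ReadInt32BigEndian(byte[] buffer, int offset)
    {
        return IPAddress.NetworkToHostOrder(BitConverter.ToInt32(buffer, offset));
    }
}
```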
As others have suggested, your options are going to be external serialization libraries (Google Protobuf, Apache Thrift, etc.), or simply using something built-in that's slower/less efficient bandwidth-wise (JSON, XML, etc.). You could also write your own, but believe me, it's a maintenance nightmare.
Don't use native serialization. The built-in defaults are tied to the binary representation of the data types, which differs between the two VMs. The purpose of XML, JSON, and similar technologies is precisely to provide a format that's generic and can be moved between differing systems. For what it's worth, the overhead in serializing to JSON is usually small, and there's a lot of benefit to being able to read the serialized objects manually, so I'd recommend JSON unless you have a very specific reason why you can't.
Consider OMG's standard CORBA IIOP.
While you may not need the full-on "remote object" support of CORBA, IIOP is the underlying binary protocol for "moving language-neutral objects" (such as an object value parameter) across the wire.
For Java: Java EE EJBs are based on IIOP, and there is RMI-IIOP plus various support libraries. The IDL-to-Java compiler is delivered with the JDK.
For C# IIOP and integration with Java EE, see IIOP.NET.
You can also consider BSON, which is used by MongoDB.
If it is OK for your C#/Java programs to communicate with a MongoDB database, you could store your objects there and read them with the appropriate driver.
Regarding BSON itself, see BSON and Data Interchange at the MongoDB blog.

Programmatically filter XML in a streaming fashion (XmlWrappingReader/Writer alternatives?)

I'm working with some .NET services that have the potential to process significantly large XML documents, and I need to ensure that all processing is done in a streaming / pipelining fashion. I'm already using the XmlReader and XmlWriter classes. My question is, what is the best way to programmatically provide a filter into the reader and writer (either, depending upon the flow)?
(I am not looking for XSLT. I already do a lot with XSLT, and many of the things I'm looking to do are outside the scope of XSLT - or at least, implementing within XSLT would not be ideal.)
In Java and SAX, this would best be handled through an XMLFilterImpl. I do not see that .NET provides anything similar for working with an XmlReader. I did find this blog post, "On creating custom XmlReaders/XmlWriters in .NET 2.0, Part 2", which includes the following (I've fixed the first link, which was broken in the original post):
Here is the idea - have a utility wrapper class which wraps XmlReader/XmlWriter and does nothing else. Then derive from this class and override the methods you are interested in. These utility wrappers are called XmlWrappingReader and XmlWrappingWriter. They are part of the System.Xml namespace, but unfortunately they are internal - the Microsoft XML team considered making them public, but in the Whidbey release rush decided to postpone the issue. Happily, these classes, being pure wrappers, have no logic whatsoever, so anybody who needs them can indeed create them in 10 minutes. But to save you those 10 minutes I post these wrappers here. I will include XmlWrappingReader and XmlWrappingWriter in the next Mvp.Xml library release.
These two classes (XmlWrappingReader and XmlWrappingWriter) from the Mvp.Xml library are currently meeting my needs nicely. (As an added bonus, it is a free and open-source library, BSD licensed.) However, due to the stale status of the project, I have some concerns about including these classes in a contracted, commercial development project that will be handed off. The last release of Mvp.Xml was 4.5 years ago, in July 2007. Additionally, there is this comment from a "project coordinator" in response to this project discussion:
Anyway, this is not really a supported project anymore. All devs moved
out. But it's open source, you are on your own.
I've also found a SAX equivalent in .NET, but SAXDotNet doesn't seem to be in any better shape, with its last release being in 2006.
I'm well aware that a stale project doesn't necessarily mean it is any less usable, and I will be moving forward with the two wrapper classes from the Mvp.Xml library, at least for now.
Are there any better alternatives that I should be considering? (Again, any solution must not require the entire XML to exist in-memory at any one time - whether as a DOM, a string, or otherwise.) Are there any other libraries available (preferably something from a more active project), or maybe something within the LINQ features that would meet these requirements?
Personally I find that writing a pipeline of filters works much better with a push model than a pull model, although both are possible. With a pull model, a filter that needs to generate multiple output events in response to a single input event is quite tricky to program, though of course it can be done by keeping track of the state. So I think that looking for a SAX-like approach makes sense.
I would look again at SaxDotNet or equivalents. Be prepared to look at the source code and bend it to your needs; consider contributing back your improvements. Intrinsically the job it is doing is very simple: a loop that reads events from the (pull) input and writes events to the (push) output. In fact, it's so simple that perhaps the reason it hasn't changed since 2006 is that it doesn't need to.
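To sketch what that loop can look like (the <debug> element name is invented; your real filtering logic goes where that check is): read one node at a time from the XmlReader, write it to the XmlWriter, and call Skip() to drop an entire unwanted subtree without ever buffering the document.

```csharp
using System.Xml;

public static class XmlFilter
{
    // Streams nodes from reader to writer, dropping every <debug> subtree.
    public static void CopyFiltered(XmlReader reader, XmlWriter writer)
    {
        bool more = reader.Read();
        while (more)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "debug")
            {
                reader.Skip();      // jumps past the whole subtree...
                more = !reader.EOF; // ...and already sits on the next node
                continue;
            }

            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                {
                    bool isEmpty = reader.IsEmptyElement; // capture before attributes
                    writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                    writer.WriteAttributes(reader, true);
                    if (isEmpty) writer.WriteEndElement();
                    break;
                }
                case XmlNodeType.Text:
                    writer.WriteString(reader.Value);
                    break;
                case XmlNodeType.CDATA:
                    writer.WriteCData(reader.Value);
                    break;
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    writer.WriteWhitespace(reader.Value);
                    break;
                case XmlNodeType.Comment:
                    writer.WriteComment(reader.Value);
                    break;
                case XmlNodeType.ProcessingInstruction:
                    writer.WriteProcessingInstruction(reader.Name, reader.Value);
                    break;
                case XmlNodeType.EndElement:
                    writer.WriteFullEndElement();
                    break;
            }
            more = reader.Read();
        }
    }
}
```

Called as XmlFilter.CopyFiltered(XmlReader.Create(input), XmlWriter.Create(output)), this keeps memory usage flat regardless of document size.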

What is a good communication layer for both Java and C#?

I would like my newer C# 2.0 application to talk to my older Java 1.4 application (can't change versions, sorry). What are my options?
I think that using shared memory would give me better performance, but on the other hand, if I use a network protocol then the architecture would be more flexible. So I'm looking to weigh up both options to see which has the biggest pay off.
I've used XML-RPC implementations that are dog slow, but I assume that was just a bad implementation, and not the actual protocol. Would I be better off going with a lower-level protocol? I've used Google's protobuf before in C++ and Python (over plain old sockets) but I'm not so sure that it's available for Java and C# -- is there anything similar available for the languages I'm using?
I'm looking for the best performance that I can possibly get, but, I'm working with objects and inheritance hierarchies that I'd like to serialize (protobuf is a good example of how this can be done). So, sadly, just sending a simple string over sockets isn't really feasible.
Aha, there are actually C# versions of protobuf!
http://code.google.com/p/protosharp/
http://code.google.com/p/protobuf-csharp-port
... and protobuf has support for Java anyway.
You might also consider JSON as the serialization for your objects; it's much lighter than XML but has the same capabilities for representing object hierarchies, and many libraries are available.
For the communication bus, however, I would recommend the network, since it gives better flexibility.
IMHO, your performance bottleneck is due to serialization/deserialization more than the communication bus itself.

XML compression compatible to both Java and C#

I am building a C# front end that communicates with a Java Tomcat server via HTTP.
The WOX package is used to de/serialize the objects on the Java and C# ends.
However, I want to reduce the time spent in sending XML strings over HTTP, by using some XML compression packages.
My questions are:
Is WOX de/serialization, with XML strings being passed back and forth, the best way to communicate between C# and Java?
What XML compression libraries (they have to be free) should I consider to increase the speed?
I'd initially try just applying gzip compression at the HTTP level, partly because it can be applied transparently to your app. XML generally compresses pretty well. Do you have a specific target in mind, so you'll know when a result is "good enough"? (If not, that might be the first thing to work out; otherwise you won't know when to stop.) Tomcat supports gzip compression as a connector configuration option.
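On the C# side the decompression can be completely transparent too; HttpWebRequest will send the Accept-Encoding header and unzip the response for you:

```csharp
using System.Net;

public static class Http
{
    public static HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        // Sends "Accept-Encoding: gzip, deflate" and transparently
        // decompresses the response; pairs with Tomcat's compression="on"
        // connector attribute on the server side.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }
}
```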
As for whether XML is the right way to go - it certainly has advantages and disadvantages. There are plenty of other serialization options, including JSON, Thrift and Protocol Buffers. Each has pros and cons in terms of platform integration, size, readability, versioning etc. You should work out what's important to you and then look at the options in terms of those considerations.

Serializing vs Database

I believe that the best way to save your application state is to a traditional relational database, whose table structure most of the time pretty much represents the data model of our system plus metadata.
However other guys in my team think that today it's best to simply serialize the entire object graph to a binary or XML file.
Needless to say (but I'll still say it), World War 3 is going on between us, and I would like to hear your opinion about this issue.
Personally I hate serialization because:
1. The data saved is tied to your development platform (C# in my case). No other platform, like Java or C++, can use this data.
2. The entire object graph (including the whole inheritance chain) is saved, not only the data we need.
3. Changing the data model might cause severe backward compatibility issues when trying to load old states.
4. Sharing parts of the data between applications is problematic.
You didn't say what kind of data it is -- much depends on your performance, simultaneity, installation, security, and availability/centralization requirements.
If this data is very large (e.g. many instances of the objects in question), a database can help performance via its indexing capabilities. Otherwise it probably hurts performance, or is indistinguishable.
If your app is being run by multiple users simultaneously, and they may want to write this data, a database helps because you can rely on transactions to ensure data integrity. With file-based persistence you have to handle that yourself. If the data is single-user or single-instance, a database is very likely overkill.
If your app has its own soup-to-nuts installation, using a database places an additional burden on the user, who must set up and maintain (apply patches etc.) the database server. If the database can be guaranteed to be available and is handled by someone else, this is less of an issue.
What are the security requirements for the data? If the data is centralized, with multiple users (either simultaneous or sequential), you may need to manage security and permissions on the data. Without seeing the data it's hard to say whether it would be easier to manage with file-based persistence or a database.
If the data is local-only, many of the above questions about the data have answers pointing toward file-based persistence. If you need centralized access, the answers generally point toward a database.
My guess is that you probably don't need a database, based solely on the fact that you're asking about it mainly from a programming-convenience perspective and not a data-requirements perspective. Serialization, especially in .NET, is highly customizable and can be easily tailored to persist only the essential pieces you need. There are well-known best practices for versioning this data as well, so I'm not sure there's an advantage on the database side from that perspective.
About cross-platform concerns: If you do not know for certain that cross-platform functionality will be required in the future, do not build for it now. It's almost certainly easier overall to solve that problem when the time comes (migration etc.) than to constrain your development now. More often than not, YAGNI.
About sharing data between parts of the application: That should be architected into the application itself, e.g. into the classes that access the data. Don't overload the persistence mechanism to also be a data conduit between parts of the application; if you overload it that way, you're turning the persisted state into a cross-object contract instead of properly treating it as an extension of the private state of the object.
It depends on what you want to serialize, of course. In some cases serialization is ridiculously easy.
(I once wrote a kind of timeline program in Java, where you could draw, drag around and resize objects. When you were done you could save it to a file (like myTimeline.til). At that moment hundreds of objects were saved: their position on the canvas, their size, their colors, their inner texts, their special effects, ...
You could then of course open myTimeline.til and work further.
All this took only a few lines of code (I just made all the classes and their dependencies serializable), and my coding time was less than 5 minutes; I was astonished myself! (It was the first time I had ever used serialization.)
Working on a timeline, you could also 'save as' for different versions, and the 'til' files were very easy to back up and mail.
I think in my particular case it would be a bit idiotic to use a database. But that's of course for document-like structures only, like Word documents, to name one.)
My first point, then: there are certainly several scenarios in which databases wouldn't be the best solution. Serialization was not invented by developers just because they were bored.
To your four objections:
1. Not true if you use XML serialization or SOAP.
2. Not quite relevant anymore.
3. Only if you are not careful; there are plenty of 'best practices' for that.
4. Only if you want it to be problematic; see 1.
Besides speed of implementation, serialization of course has other important advantages, like not needing a database at all in some cases!
See this Stack Overflow posting for commentary on the applicability of XML vs. a database management system. It discusses an issue quite similar to the subject of the debate in your team.
You have some good points. I pretty much agree with you, but I'll play the devil's advocate:
1. Well, you could always write a converter in C# to extract the data later if needed.
2. That's a weak point, because disk space is cheap, and the extra bytes we'd use cost far less than the time we'd waste trying to get all of this to work your way.
3. That's the way of the world. Burn the bridges and require upgrades. Convert the data, or make a tool to do that, and then no longer support the old version's way of doing it.
4. Not if the C# program hands off the data to the other applications. Other applications should not be accessing data that belongs to this application directly, should they?
For transfer and offline storage, serialization is fine; but for active use, some kind of database is far preferable.
Typically (as you say), without a database you need to deserialize the entire stream to perform any query, which makes it hard to scale. Add the inherent issues with threading etc., and you're asking for pain.
Some of your other pain points about serialization aren't always true, as long as you pick wisely. Obviously, BinaryFormatter is a bad choice for portability and versioning, but "protocol buffers" (Google's serialization format) has versions for Java, C++, C# and many others, and is designed to be version tolerant.
Just make sure you have a component that handles saving/loading state with a clean interface to the rest of your application. Then whatever choice you make for persistence can easily be revisited later.
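A sketch of such a seam (all names here are invented for illustration): the application talks only to the interface, so the quick-and-dirty file implementation can be replaced by a database-backed one without touching callers.

```csharp
using System.IO;
using System.Xml.Serialization;

public class AppState // placeholder for whatever your application persists
{
    public int Version { get; set; }
}

public interface IStateStore
{
    void Save(AppState state);
    AppState Load();
}

// First, file-based implementation; a DatabaseStateStore could implement
// the same interface later with no impact on the rest of the application.
public class XmlFileStateStore : IStateStore
{
    private readonly string path;

    public XmlFileStateStore(string path) { this.path = path; }

    public void Save(AppState state)
    {
        var serializer = new XmlSerializer(typeof(AppState));
        using (var stream = File.Create(path))
            serializer.Serialize(stream, state);
    }

    public AppState Load()
    {
        var serializer = new XmlSerializer(typeof(AppState));
        using (var stream = File.OpenRead(path))
            return (AppState)serializer.Deserialize(stream);
    }
}
```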
Serializing an object graph to a file might be a good quick and dirty initial solution that is very quick to implement.
But if you start to run into issues that make a database a better choice, you can plug in a new version with little or no impact on the rest of the application.
Yes, probably true. The downside is that you must retrieve the whole object, which is like retrieving all the rows from a table; if it's big, that's a drawback. But if it isn't so big (and in my hobby projects it isn't), maybe it's a perfect match?
