Get version when deserializing binary object

I'm serializing a class using a BinaryFormatter. When I open the created file in a text editor, I can see that some attributes such as namespace, version, cultureInfo, ... are written at the beginning. How can I read this version string when deserializing the file again?
Thank you in advance!

You should take a look at these articles on MSDN:
Run-time Serialization, Part 1
Run-time Serialization, Part 2
Run-time Serialization, Part 3
The BinaryFormatter has two properties: Binder and SurrogateSelector.
With these you can hook into the serialization/deserialization process and access this information. More about it can be found in the articles above.
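As a rough sketch of the Binder route (the class name VersionCapturingBinder and the SeenVersions property are made up for illustration): a SerializationBinder subclass receives the full assembly name, including the version, for every type found in the stream. Returning null hands type resolution back to the formatter's default logic.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.Serialization;

// Hypothetical binder: records the assembly version stored for each type name.
public sealed class VersionCapturingBinder : SerializationBinder
{
    public Dictionary<string, Version> SeenVersions { get; } =
        new Dictionary<string, Version>();

    public override Type BindToType(string assemblyName, string typeName)
    {
        // assemblyName looks like "MyApp, Version=1.2.0.0, Culture=neutral, PublicKeyToken=null"
        var parsed = new AssemblyName(assemblyName);
        SeenVersions[typeName] = parsed.Version;

        // null means "resolve the type the default way"
        return null;
    }
}
```

With a BinaryFormatter you would assign an instance to formatter.Binder before calling Deserialize and inspect SeenVersions afterwards. Keep in mind that BinaryFormatter is marked obsolete in recent .NET versions, so this only applies where it is still available.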

You could probably read that part like a normal file (read and check the bytes).
However, why would you be interested in that part? If you need a version, then it's best to add your own version field, serialized the normal way alongside the other data, and retrieve it the normal way (by deserialization, like all other data).
Remark to your comment:
If this is the first time, you could write an 'updater' that reads the old file and transforms it into the new format (so change the enum values). For the new serialized object, add a version (always, and update it for each version you publish). That way you can always tell the formats apart. With such an update function you can always convert older versions of the data to newer ones. In this case (since you don't have a version yet), you can assume that a file without a version is the old version.
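A minimal sketch of the "add your own version" idea, assuming a hand-written format with BinaryWriter/BinaryReader rather than BinaryFormatter. The AppSettings type and its fields are hypothetical; the point is only that the version is written first and decides how the rest is read.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical settings type, used only for illustration.
public sealed class AppSettings
{
    public const int CurrentVersion = 2;

    public string UserName = "";
    public int Volume = 50;            // field added in version 2

    public void Save(Stream stream)
    {
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(CurrentVersion);  // the version always comes first
            writer.Write(UserName);
            writer.Write(Volume);
        }
    }

    public static AppSettings Load(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            int version = reader.ReadInt32();
            if (version > CurrentVersion)
                throw new InvalidDataException("File written by a newer version: " + version);

            var settings = new AppSettings { UserName = reader.ReadString() };

            // Fields added later are read only if the file is new enough;
            // otherwise the default from the field initializer stays.
            if (version >= 2)
                settings.Volume = reader.ReadInt32();

            return settings;
        }
    }
}
```

Each new version then only adds another "if (version >= n)" branch instead of a separate class per file version.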

Related

Good coding practice when saving data to files in .net

To give a bit of a background.
I have created an application that allows users to save settings and then recall the settings at a later date. To do this I have created some serializable objects. I have gotten this to work using the BinaryFormatter without much trouble.
Where I start to run into problems is when I upgrade the software and add new settings. Now my serializable objects no longer match, so I have to update the files. I have done this successfully for a few versions. But to do this I try deserializing the file, and if it throws an exception I try the next version... and then the next... and then the next... until I find the right one. Then I have to write conversion functions for each of the old versions to convert it into the newest version. I did create a "revision" file as well, so I can check up front what version they have and then upgrade it, but I still have to keep a lot of different "versions" alive and write conversion functions for all of them... which seems inherently messy to me and prone to bloat down the line if I keep going this route.
There has to be a better way to do this, I just am not sure how.
Thanks

You need to write a serialization binder to resolve assemblies.
For settings, I use a Dictionary<string, byte[]> to save to file. I serialize the dictionary and all is well. When I add new settings, I provide a default setting if not found in the settings file.
Also, if you are adding fields to a serialized object, you can decorate them with [OptionalField].
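The dictionary approach could look roughly like this. SettingsStore and its method names are hypothetical, and the int encoding via BitConverter is just one possible choice for the byte[] values:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper around a Dictionary<string, byte[]> settings bag.
public static class SettingsStore
{
    public static void SetInt32(IDictionary<string, byte[]> settings, string key, int value)
    {
        settings[key] = BitConverter.GetBytes(value);
    }

    // Returns the stored value, or defaultValue when the key is absent
    // (for example because the file was written by an older version).
    public static int GetInt32(IDictionary<string, byte[]> settings, string key, int defaultValue)
    {
        byte[] raw;
        return settings.TryGetValue(key, out raw)
            ? BitConverter.ToInt32(raw, 0)
            : defaultValue;
    }
}
```

Because unknown keys are simply ignored and missing keys fall back to a default, old and new versions of the program can read each other's files.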

This is exactly what the Settings class is for. You define default values in your app.config, and then a user can change them and when you save, their changes will save to a location in their user profile. When you read them you'll just get the modified settings.
This link is for VS 2005, but it works exactly the same in VS 2012: http://msdn.microsoft.com/en-us/library/aa730869(v=vs.80).aspx
Found the link for VS2012: http://msdn.microsoft.com/EN-US/library/k4s6c3a0(v=VS.110,d=hv.2).aspx

The XML format is made for such cases. A new version can still find the old settings it needs in a very early version of the settings file, and even an old version can handle XML settings written by a newer version. It does not work "automatically", i.e. with a method like Serialize/Deserialize, but writing conversion functions is not easier or faster either.

Actually, this can be done by adding a [DefaultValue()] attribute to the newer properties on your settings objects, at least for XML serialization. I haven't attempted this using binary serialization. For XML, this means that they are "optional" and the serialization will not break when loading an old version of the files. You can find this attribute in the System.ComponentModel namespace:
using System.ComponentModel;

public class MySettings
{
    public int MaxNumLogins { get; set; }

    // XmlSerializer omits this element when the value equals the default;
    // an old file without it leaves the property at its initial value (0).
    [DefaultValue(0)]
    public int CacheTimeoutMinutes { get; set; }
}

In addition to naming the fields as others have suggested this sort of thing just cries out for version numbers.

You could have a look at ProtoBuf-Net http://code.google.com/p/protobuf-net/wiki/GettingStarted if you are doing binary, because all these things regarding versioning etc. are covered. It is also very compact and actively developed, and if you potentially have cross-platform requirements you can achieve that too by using a .proto file.
If you would like people to be possibly able to edit the settings (outside of your program) then you could use the XML* serialisation methods.

Improve performance of XmlSerializer

I use an XmlSerializer to serialize/deserialize some objects. The problem is performance: profiling shows that using the XmlSerializer makes our application take 2 seconds longer to start. We cache our XmlSerializer instances and reuse them. We cannot use sgen.exe because we create the XmlSerializer with XmlAttributeOverrides.
I tried a serialization alternative, Json.Net, and at first it worked great. The problem is that we need to be backward compatible, so all the XML already generated needs to be parsed correctly. Also, the serialized output must be XML.
To summarize:
I receive XML data serialized by an XmlSerializer.
I need to deserialize the XML data and convert it into an object.
I need to serialize an object into XML (ideally in the same XML format an XmlSerializer would have produced).

Ultimately, it depends on the complexity of your model. XmlSerializer needs to do a lot of thinking, and the fact that it is taking so long leads me to suspect that your model is pretty complex. For a simple model, it might be possible to manually implement the deserialization using LINQ-to-XML (pretty easy), or maybe even XmlReader (if you are feeling very brave - it isn't easy to get 100% correct).
However, if the model is complex, this is a problem and frankly would be very risky in terms of introducing subtle bugs.
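For a simple model, the LINQ-to-XML route might look roughly like the sketch below. The Person type and the element names are hypothetical; ArrayOfPerson mirrors the root name XmlSerializer uses for a List<Person>, but the xsi/xsd namespace declarations it normally writes are not reproduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical model type, for illustration only.
public sealed class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class PersonXml
{
    // Hand-written replacement for XmlSerializer.Deserialize on a simple list.
    public static List<Person> Parse(string xml)
    {
        return XDocument.Parse(xml).Root
            .Elements("Person")
            .Select(e => new Person
            {
                Name = (string)e.Element("Name"),     // null if the element is missing
                Age = (int?)e.Element("Age") ?? 0     // 0 if the element is missing
            })
            .ToList();
    }

    // Hand-written replacement for XmlSerializer.Serialize.
    public static string ToXml(IEnumerable<Person> people)
    {
        var root = new XElement("ArrayOfPerson",
            people.Select(p => new XElement("Person",
                new XElement("Name", p.Name),
                new XElement("Age", p.Age))));
        return root.ToString();
    }
}
```

There is no code generation at startup with this approach, but every attribute override and edge case the XmlSerializer handled has to be re-implemented by hand, which is where the risk comes from.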
Another option is DataContractSerializer which handles xml, but not as well as XmlSerializer, and certainly not with as much control over the layout. I strongly suspect that DataContractSerializer would not help you.
There is no direct replacement for XmlSerializer that I am aware of, and if sgen.exe isn't an option I believe you basically have these options:
live with it
rewrite XmlSerializer yourself, somehow doing better than them
use something like LINQ-to-XML and accept the effort involved
Long term, I'd say "switch formats", and use xml for legacy import only. I happen to know of some very fast binary protocols that would be pretty easy to substitute in ;p

The problem is that you are requesting types which are not covered by sgen, which results in the generation of new assemblies during startup.
You could try to get your hands on the temp files generated by XmlSerializer for your specific types and use that code for your own pregenerated XmlSerializer assembly.
I have used this approach to find out why csc.exe was being executed, which delayed the startup of my application.
Besides this, it might help to rename some types, like in the article, to arrive at the same type names that sgen creates, so that you can use sgen. Usually type arrays are not precreated by sgen, which is a pity sometimes. But if you name your class ArrayOf HereGoesYourTypeName then you will be able to use the pregenerated assemblies.

This answer has some good info on why XmlSerializer runs slow with XmlAttributeOverrides.
Do you really need to use the XmlSerializer in your main thread at startup?
Maybe run it in a background thread; If only some parts of the data are mandatory for startup perhaps you could manually read them into proxy/sparse versions of the real classes while the XmlSerializer initializes.
If it's a GUI app you could just add a splash screen to hide the delay (or a game of tetris!!)
If all else fails can't you convert the existing files to JSON by just running the existing deserialize and a JSON serialize, or is there a hard requirement to keep them XML?

You have to deserialize your list using classic .NET serialization,
something like the below:
using System.Collections.Generic;
using System.IO;
using System.Web.Script.Serialization; // JavaScriptSerializer
using System.Xml.Serialization;

List<YourObject> myCustomList;
using (TextReader tr = new StreamReader("yourlist.xml"))
{
    XmlSerializer serializer = new XmlSerializer(typeof(List<YourObject>));
    myCustomList = (List<YourObject>)serializer.Deserialize(tr);
}
then you can use the JSON serialization:
JavaScriptSerializer json = new JavaScriptSerializer();
string output = json.Serialize(myCustomList); // Serialize returns a string

If you have xml stored in the format that you cannot use then use xslt to transform it to the format that you can use.
If this xml is stored in the XmlSerializer format (let's say in flat files or in a db), then you can run your transforms over it once and not incur the XmlSerializer overhead at your usual runtime.
Alternatively you could xslt it at run time - but I doubt that this would be faster than the method outlined by Massimiliano.

You can use threading or tasks to get the application to start up faster, so that it doesn't wait for the hard drive or the deserialization.
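A rough sketch of the task-based idea: start building the serializer on a thread-pool thread as early as possible, and only block when it is actually needed. AppConfig and SerializerWarmup are hypothetical names, and the plain XmlSerializer constructor is used here for brevity; the question's XmlAttributeOverrides would be passed in the same place.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using System.Xml.Serialization;

// Hypothetical type to deserialize, for illustration only.
public sealed class AppConfig
{
    public string Title { get; set; }
}

public static class SerializerWarmup
{
    private static Task<XmlSerializer> _serializerTask;

    // Call this first thing at startup; the expensive construction
    // then runs in parallel with the rest of the initialization.
    public static void Start()
    {
        _serializerTask = Task.Run(() => new XmlSerializer(typeof(AppConfig)));
    }

    public static AppConfig Load(string xml)
    {
        if (_serializerTask == null) Start();

        // Blocks only if the background construction has not finished yet.
        XmlSerializer serializer = _serializerTask.Result;
        using (var reader = new StringReader(xml))
        {
            return (AppConfig)serializer.Deserialize(reader);
        }
    }
}
```

This does not make the serializer any cheaper to create; it only moves the cost off the startup path.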

Another option I don't see mentioned in any answer here is to compile a serialization assembly. This way all the code generation and code compilation steps happen during compile-time in Visual Studio, not at runtime when your app is starting.
The OP mentions that the app takes too long to start. Well, that's exactly what serialization assembly is for.
In .NET Core the steps are pretty easy:
Add nuget reference to Microsoft.XmlSerializer.Generator
...that's it :)
More info here https://learn.microsoft.com/en-us/dotnet/core/additional-tools/xml-serializer-generator
P.S. If you're still using .NET Framework (not .NET Core), see this question Generating an Xml Serialization assembly as part of my build

Is there a way to generate c# code from serialized object in bson (file on a disk) to

What I have is a serialized object (given to me serialized from another language). I would like to generate c# code for this and use it in JSON.Net or similar. I have just started looking at JSON.Net capabilities. However, thought it may be interesting to ask it here in parallel.

I have found 2 great options:
json2csharp works well, which is a lightweight website giving .NET code that can be copy-pasted.
JsonCSharpClassGenerator is an executable that creates actual files in a subfolder of your choosing. So it's better for bulk .NET class generation from a large JSON string.

Based on the list here, there are several BSON implementations for C#:
http://bsonspec.org/#/implementation
One example is JSON.

what is serialization all about?

Where exactly does serialization come into the picture? I read about serialization on the 'net and have learned that
it is an interface which, if implemented in a class, means that the class can automatically be serialized and deserialized by the different serializers.
Give me a good reason why and when a class would need to be serialized. Once it's serialized, what exactly happens?

Serialization is needed whenever an object needs to be persisted or transmitted beyond the scope of its existence.
Persistence is the ability to save an object somewhere and load it later with the same state. For example:
You might need to store an object instance on disk as part of a file.
You might need to store an object in a database as a blob (binary large object).
Transmission is the ability to send an object outside of its original scope to some receiver. For example:
You might need to transmit an instance of an object to a remote machine.
You might need to transmit an instance to another AppDomain or process on the same machine.
For each of these, there must be some serial bit representation that can be stored, communicated, and then later used to reconstitute the original object. The process of turning an object into this series of bits is called "serialization", while the process of turning the series of bits into the original object is called "deserialization".
The actual representation of the object in serialized form can differ depending on what your goals are. For example, in C#, you have both XML serialization (via the XmlSerializer class) and binary serialization (through use of the BinaryFormatter class). Depending on your needs, you can even write your own custom serializer to do additional work such as compression or encryption. If you need a language- and platform-neutral serialization format, you can try Google's Protocol Buffers which now has support for .NET (I have not used this).
The XML representation mentioned above is good for storing an object in a standard format, but it can be verbose and slow depending on your needs. The binary representation saves on space but isn't as portable across languages and runtimes as XML is. The important point is that the serializer and deserializer must understand each other. This can be a problem when you start introducing backward and forward compatibility and versioning.
An example of potential serialization compatibility issues:
You release version 1.0 of your program which is able to serialize some Foo object to a file.
The user does some action to save his Foo to a file.
You release version 2.0 of your program with an updated Foo.
The user tries to open the version 1.0 file with your version 2.0 program.
This can be troublesome if the version 2.0 Foo has additional properties that the version 1.0 Foo didn't. You have to either explicitly not support this scenario or have some versioning story with your serialization. .NET can do some of this for you. In this case, you might also have the reverse problem: the user might try to open a version 2.0 Foo file with version 1.0 of your program.
I have not used these techniques myself, but .NET 2.0 and later has support for version tolerant serialization to support both forward and backward compatibility:
Tolerance of extraneous or unexpected data. This enables newer versions of the type to send data to older versions.
Tolerance of missing optional data. This enables older versions to send data to newer versions.
Serialization callbacks. This enables intelligent default value setting in cases where data is missing.
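As a hedged illustration of the last two points, this is roughly what a version 2.0 Foo could look like with the version-tolerance attributes from System.Runtime.Serialization. The field names and the default value are made up:

```csharp
using System;
using System.Runtime.Serialization;

[Serializable]
public class Foo
{
    public string Name;                  // present since version 1.0

    // Added in version 2.0; version 1.0 data will not contain it.
    [OptionalField(VersionAdded = 2)]
    public int Priority;

    // Runs before the fields are populated, so a value found in the
    // stream overrides this default, and a missing value keeps it.
    [OnDeserializing]
    private void SetDefaults(StreamingContext context)
    {
        Priority = 5;
    }
}
```

Field initializers and constructors are not run when the formatter creates the object, which is why the default is set in a callback.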

For example, when you want to send objects over a network or store them in files.
Let's say you're creating a savegame format for a video game. You could then make the Player class and every Enemy serializable. This way it would be easy to save the state of the current objects to a file.
On the other hand, when writing a multiplayer implementation for your game, you could send the serialized Player over the network to the other clients, which could then handle this data.

In non-object-oriented languages, one would typically have data stored in memory in a pattern of bytes that would 'make sense' without reference to anything else. For example, a bunch of shapes in a graphics editor might simply have all their points stored consecutively. In such a program, simply storing the contents of all one's arrays to disk might yield a file which, when read back into those arrays would yield the original data.
In object-oriented languages, many objects are stored as references to other objects. Merely storing the contents of in-memory data structures will not be useful, because a reference to object #24601 won't say anything about what that object represents. While an object-oriented system may be able to do a pretty good job figuring out what the in-memory data "mean" and try to convert it automatically to a sensible format, it can't recognize all the distinctions between object references which point to the same object, and those that point to objects which happen to match. It's thus often necessary to help out the system when converting objects to a raw stream of bits.

Not classes, but specific objects are serialized, to store them in some persistent storage or to pass them to another application or over the network.

For example, when you want to send an object to some URL, you might decide to send it in XML format. The process of converting from the in-memory object to (in this case) XML is called serialization. Converting from XML back to an in-memory object is called deserialization.
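A small sketch of that round trip using XmlSerializer. The Message type is hypothetical:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type to send, for illustration only.
public class Message
{
    public string Sender { get; set; }
    public int Priority { get; set; }
}

public static class XmlRoundTrip
{
    // Serialization: in-memory object -> XML text
    public static string Serialize(Message message)
    {
        var serializer = new XmlSerializer(typeof(Message));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, message);
            return writer.ToString();
        }
    }

    // Deserialization: XML text -> new in-memory object with the same state
    public static Message Deserialize(string xml)
    {
        var serializer = new XmlSerializer(typeof(Message));
        using (var reader = new StringReader(xml))
        {
            return (Message)serializer.Deserialize(reader);
        }
    }
}
```

The string returned by Serialize is what would be sent or stored; the receiver only needs a compatible Message type to rebuild the object.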

versioning serialized files

I've got a working app that serializes a document (of type IDocument) to disk. From there there's another app that I've made that can open that document (IDocument implements IPrintDocument) for viewing.
Let's assume that I've written an IDocument to disk, and then a week later a field gets added to the IDocument object. Both the program that writes the files and the one that opens them are updated with this new 'version' of IDocument. It will then break (I assume - haven't had a chance to check, I'm looking ahead here) when trying to open the previous IDocument version. Is there a known pattern that alleviates this kind of problem?

Yes - use a serialization mechanism which is tolerant to versioning.
Predictably enough, I'm going to suggest using Google's Protocol Buffers, for which there are at least two viable .NET implementations. So long as you're careful, Protocol Buffers are both backward and forward compatible - you can read a new message with old code and vice versa, and the old code will still be able to preserve the information it doesn't understand.
Another alternative is XML, whether using .NET's built-in XML serialization or not. The built-in serialization isn't particularly flexible in terms of versioning as far as I'm aware.

The .net built-in serialization is an option, but it does require you to add placeholders on the specific pieces that you want to extend in the future.
You add place holders for the extra elements/attributes like the following code:
[XmlAnyElement()]
public XmlElement[] ExtendedElements { get; set; }
[XmlAnyAttribute()]
public XmlAttribute[] ExtendedAttributes { get; set; }
By adding the above to the involved classes, you can effectively read saved information that has extra elements/attributes, modify the normal properties that the software knows how to handle, and save it again. This allows for both backward and forward compatibility. When adding a new field, just add the desired property.
Note that the above is limited to extension at the specified hooks.
Update: As Jon mentioned in the comment, the above will only work for xml serialization. As far as I know, binary serialization doesn't support anything similar. With binary serialization you can get both the old and the new version of the app to read each other's serialized info (.net 2.0+), but if you save it back you will lose the extra info that the version doesn't handle.
Starting with .net 2.0 the deserialization process ignores the extra data; if you combine that with optional fields you can effectively get both apps to read the other version's format. The problem is that the extra data isn't held by the class, as it is with the xml fields.
Some related links: http://msdn.microsoft.com/en-us/library/system.runtime.serialization.serializationbinder.aspx, http://msdn.microsoft.com/en-us/library/ms229752.aspx
If you don't want xml serialization, I would go with Jon's approach.
Ps. I am unaware if there is some good third party implementation, that we can access, that extends the binary serialization to hold and save the extra data.

Built-in serialization should give you some minimal tolerance for version updates using the [OptionalField] attribute. But things can get tricky really fast, so you had better look at a framework that has solved these issues, like Jon's Protocol Buffers suggestion, etc.
Another couple of options would be to use an embedded DB like Sqlite for your document store. And manually (or using an ORM) map properties/fields in your object to columns in a table.
or
Use Lucene which will also give you fulltext search through your documents.
