Serialization and object versioning in C#

If I want to serialize an object I have to use the [Serializable] attribute, and all member variables will be written to the file. What I don't know is how to do versioning: for example, if I add a new member variable (or rename or remove a variable) and then open (deserialize) the file, how can I determine the object/file version so that I can correctly set the new member or run some kind of migration? And how can I determine whether a variable was initialized during the load or ignored by the deserializer?
I know that there are version-tolerant approaches and that I can mark variables with the [OptionalField(VersionAdded = 1)] attribute. If I open an old file, the framework will ignore this optional (new) variable and it will just be zero/null. But again, how can I tell whether the variable was initialized by the load or was ignored?
I can write the class/object version number to the stream: use the ISerializable approach and read the version number in the constructor(SerializationInfo info, StreamingContext context). That tells me exactly which version of the class is in the stream.
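For illustration, a minimal sketch of that manual-version approach (the entry names such as "SerializationVersion" are placeholders of mine, not anything the framework defines):

using System;
using System.Runtime.Serialization;

[Serializable]
public class Document : ISerializable
{
    // Bump this constant whenever the serialized layout changes.
    private const int CurrentVersion = 2;

    public string Title;
    public string Author;   // added in version 2

    public Document() { }

    // Deserialization constructor: read the stored version first, then
    // decide which members can be expected in the stream.
    protected Document(SerializationInfo info, StreamingContext context)
    {
        int storedVersion = info.GetInt32("SerializationVersion");
        Title = info.GetString("Title");
        Author = storedVersion >= 2 ? info.GetString("Author") : "unknown";
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("SerializationVersion", CurrentVersion);
        info.AddValue("Title", Title);
        info.AddValue("Author", Author);
    }
}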
However, I expected that this kind of versioning would already be implemented by the serialization framework in .NET. I tried to obtain the assembly version from the SerializationInfo, but it is always set to the current version, not to the version that was used when the object was saved.
What is the preferred approach? I found a lot of articles on the net, but I could not find a good solution that addresses versioning.
Any help is appreciated
Thanks,
Abyss

Forgive me if some of what I write is too obvious,
First of all, please stop thinking that you are serializing an object.
That is simply incorrect, as the methods which are part of your object are not being persisted.
You are persisting information - and so, data only.
.NET serialization also serializes the type name of your object, which contains the assembly name and its version, so when you deserialize, it compares the persisted assembly information with the type that will be materialized from that information - if they are not the same, it throws an exception.
Besides the versioning problem, not everything can be serialized so easily. Try to serialize a System.Drawing.Color value and you will begin to understand the problems with the over-simplistic mechanism of .NET serialization.
Unless you plan to serialize something really simple that has no plans to evolve, I wouldn't use the serialization mechanism provided by .NET.
Getting the focus back to your question, you can read about the version-tolerance features provided for BinaryFormatter here:
http://msdn.microsoft.com/en-us/library/ms229752(v=vs.80).aspx
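To answer the "was this field actually loaded?" part: one common trick (just a sketch, and only one of several options) is to pre-set a sentinel value in an [OnDeserializing] method; whatever the formatter does not overwrite keeps the sentinel:

using System;
using System.Runtime.Serialization;

[Serializable]
public class Settings
{
    public string Name;

    [OptionalField(VersionAdded = 2)]
    private int timeoutSeconds;

    [OnDeserializing]
    private void SetDefaults(StreamingContext context)
    {
        // Runs before the formatter populates the fields. Anything the old
        // stream does not contain keeps the value assigned here.
        timeoutSeconds = -1;
    }

    [OnDeserialized]
    private void FixUp(StreamingContext context)
    {
        if (timeoutSeconds < 0)
        {
            // The field was not present in the stream: migrate or default it.
            timeoutSeconds = 30;
        }
    }
}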
You should also check XML serialization, which has some nice abilities, but the biggest benefit is that you get XML which is human readable, so your data will never be lost even if you run into complications with the versioning of your types.
But finally, I recommend you either use a database with Entity Framework to persist your data or write your own flat-file manager. While EF is very good for most solutions, sometimes you might want something lighter to persist something very simple.
(My point is that I can no longer see a scenario where .NET serialization is the relevant choice.)
I hope this helps, Good luck.

Related

Version control a protobuf-net serialised C# object

I have serialized a C# class using protobuf-net. The resultant byte array is stored in a database. This is for performance reasons and I probably won't be able to change this design. C# doesn't make it possible to prevent classes from being modified, and over time the class structure passed in for deserialization may require changes that no longer match the structure used for serialization, causing retrieval to fail.
Other than the wrapper technique suggested here, is there a pattern or technique for handling this kind of problem?
The only other technique that comes to my mind is to version the classes that need to be deserialized, so that nothing is lost when you need to make changes. When you serialize an instance of those classes, you also have to serialize the version of the class (it could be a field of the class itself).
I don't think this is the best solution, but it is a solution.
The versioning strategy can become very difficult to manage when the changes (and the versions) start to grow.
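A rough protobuf-net sketch of that idea, with the version stored as an ordinary member (the type, member names and field numbers are only illustrative):

using ProtoBuf;

[ProtoContract]
public class CustomerRecord
{
    // Written with every instance, so the reader knows which layout it is looking at.
    [ProtoMember(1)]
    public int Version { get; set; } = 2;

    [ProtoMember(2)]
    public string Name { get; set; }

    // Added later; old payloads simply leave it at its default value.
    [ProtoMember(3)]
    public string Email { get; set; }
}

// After Serializer.Deserialize<CustomerRecord>(stream), branch on record.Version
// to run whatever migration logic the older layouts require.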

C# - NoSQL Database with Dynamic Object Advice

I am considering using NoSQL databases such as MongoDB, RavenDB or any other ones you would recommend.
Can someone give me some advice, tutorials and useful links regarding the following question?
The system I want to write must be dynamic, e.g. the model may change a lot and should not be hard-coded in C#.
For example, if I had a JSON document saved holding ID, NAME, FULL NAME and then added a property called PHONENUMBER, I would not want to rebuild the C# code or redeploy.
Is it possible to build C# models from dynamic JSON and then be able to manipulate them?
If so, which libraries are most recommended for this type of system? Which libraries work best with .NET?
This question is a step in to starting my university project.
Thanks for help & advice.
Yes, you can do that quite easily with RavenDB.
You can do it in one of two ways.
Either you use a fully dynamic model, utilizing the C# dynamic keyword. That will let you do pretty much whatever you want, including adding properties at runtime, querying on runtime properties, etc.
However, a more common setup is that you'll have a lot of common properties (a customer has to have a name, for example). So you'll have a model that looks something like this:
public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public dynamic Props { get; set; }
}
The fixed properties are coded in C#, which gives you an easier, more consistent model and lets you work with all the usual compiled tooling.
The dynamic stuff goes in the Props property (which is usually initialized to an ExpandoObject).
Note that you cannot do LINQ queries using dynamic. This is a limitation of C#, not RavenDB. You can still query dynamically using RavenDB, but you'll have to use the string-based query API.
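A hedged usage sketch for that model; the store variable is assumed to be an IDocumentStore initialized elsewhere, and the client namespace varies between RavenDB versions:

using System.Dynamic;

public static class CustomerSaver
{
    // Assumes the Customer class above; recent RavenDB clients live in
    // Raven.Client.Documents, older versions use a different namespace.
    public static void SaveCustomer(Raven.Client.Documents.IDocumentStore store)
    {
        dynamic props = new ExpandoObject();
        props.PhoneNumber = "555-0100";          // property invented at runtime

        var customer = new Customer { Name = "Northwind", Props = props };

        using (var session = store.OpenSession())
        {
            session.Store(customer);             // the dynamic members are stored as JSON
            session.SaveChanges();
        }
    }
}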
I implemented a Json.NET serializer wrapper that may help you:
https://github.com/welegan/RedisSessionProvider/blob/master/RedisSessionProvider/Serialization/RedisJSONSerializer.cs
I use it in my library, which stores the contents of ASP.NET's Session object inside Redis, a NoSQL option you did not mention. Still, given the typeless nature of JSON, I imagine it will be applicable to your needs regardless of which NoSQL db you choose. The basic steps are as follows (a sketch follows the list):
Serialize:
Decide on a delimiter (I probably could have chosen a better one)
Store the type info (you can cache it for performance gains)
Store the object data after a delimiter
Deserialize:
Find the type info up to the delimiter
Deserialize the Type object
Pass Type as well as the object data to the library of your choosing. At the very least, Json.NET and ServiceStack.Json both expose serializers that will do the trick.
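A simplified sketch of those steps with Json.NET; the delimiter and layout here are arbitrary choices of mine, not necessarily what the linked library does:

using System;
using Newtonsoft.Json;

public static class TypedJson
{
    private const string Delimiter = "|||";

    public static string Serialize(object obj)
    {
        // Type info first, then the JSON payload, separated by the delimiter.
        return obj.GetType().AssemblyQualifiedName + Delimiter + JsonConvert.SerializeObject(obj);
    }

    public static object Deserialize(string stored)
    {
        int index = stored.IndexOf(Delimiter, StringComparison.Ordinal);
        Type type = Type.GetType(stored.Substring(0, index));
        string json = stored.Substring(index + Delimiter.Length);
        return JsonConvert.DeserializeObject(json, type);
    }
}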
Edit
Seems I misunderstood part of your question. You want to be able to support adding JSON properties without redeploying your C#, and my example would strip out the extra properties during the serialize step back to the NoSQL db. You can use either a Dictionary<string, string> or ExpandoObject as ayende or mxmissile suggest, but keep in mind you will then have very few guarantees about the types of the properties of the object you get out.
In other words, you can freely add property names, but as soon as you change the type of a property from int to long your code will break unexpectedly. Depending on your use case, that may or may not matter; just something to keep in mind.
Yes, using a Dictionary. However, I am not sure how those database systems handle dictionaries - gracefully or not.
No, C# is compiled, so once that is done there is no changing it without changing the source and compiling again. I think you should add some JavaScript for that, as it is a JS strong point.

BinaryFormatter deserialise malicious code?

I've heard there are security concerns over the BinaryFormatter.
I send user-generated files to the server from the client. These are serialized classes that are then read by the server.
From my understanding of the above link, this is dangerous. But I've tried sending disposable classes, and even tried a class that implemented ISerializable. Both were rejected because the server did not know the source assembly.
[Serializable]
public class Ship : ISerializable
{
    public Ship()
    {
    }

    public Ship(SerializationInfo info, StreamingContext context)
    {
        Console.WriteLine("test");
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
    }
}
So how could a client successfully get code onto my server via this vector? By faking the namespace name and public key, causing the server to try to deserialize it and thus run the above code? Or are there more subtle ways to do it?
This feature is a core fundamental to my game unfortunately so I want to be careful.
I know this is an old question but I'm not quite satisfied with the accepted answer.
"Serialization works on data, not code. [...] It does NOT extract any code from the payload."
It's not that simple. BinaryFormatter uses assembly qualified names to identify types. During the deserialization those type names are resolved by Type.GetType, which happily loads any assemblies. Therefore, a manipulated stream can load a prepared assembly, whose module initializer is immediately executed (but the malicious code can be placed in a serialization constructor or [OnDeserializing]/[OnDeserialized] method, too). This video demonstrates how to exploit this to open a PowerShell and a web page in a browser.
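One partial hardening step (my addition, not something the answers here spell out) is a SerializationBinder that only resolves an explicit allow-list of types, so arbitrary type names in the stream are never handed straight to Type.GetType. It narrows the attack surface but, as explained below, does not make BinaryFormatter safe:

using System;
using System.Runtime.Serialization;

// Only resolves an explicit allow-list (here, the question's Ship type);
// everything else throws instead of being resolved and loaded.
public sealed class AllowListBinder : SerializationBinder
{
    public override Type BindToType(string assemblyName, string typeName)
    {
        if (typeName == typeof(Ship).FullName)
            return typeof(Ship);

        throw new SerializationException("Unexpected type in payload: " + typeName);
    }
}

// Usage: formatter.Binder = new AllowListBinder();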
"In any case the answers to the original post were mostly speculation, comments on Java serialization that's not really relevant to .NET or really contrived examples."
Maybe that is just because the answer is old, but today there are a lot of known BinaryFormatter attacks. Some examples:
TempFileCollection can be exploited to delete files (only in .NET Framework). This is also mentioned in the linked video (and in the post linked in the question).
StructuralEqualityComparer can be used to cause a StackOverflowException or a hopelessly slow hash code calculation (a DoS attack). Starting with .NET Core this type is not serializable anymore.
A lot of [Serializable] types that don't implement ISerializable (so they are restored just by setting their fields) can be initialized with invalid data.
Even most types that implement ISerializable don't validate the incoming SerializationInfo against all possible attacks. For example, Dictionary<TKey, TValue> can throw an OutOfMemoryException if the value of the HashSize entry is manipulated.
But BinaryFormatter is vulnerable at a deeper level, too. For example, arrays are handled internally (and this cannot even be overridden by a surrogate selector), so a manipulated length can also cause an OutOfMemoryException.
I actually believed that BinaryFormatter can be made safe. I even opened an issue about this topic here: https://github.com/dotnet/runtime/issues/50909
But considering that security was never in focus when serializable types were implemented, and that it would be an enormous task to fix all these issues, I can understand that BinaryFormatter will be obsoleted in future versions.
And though I introduced a SafeMode option in my binary serializer, it cannot be completely safe as long as the serializable types themselves are vulnerable. Supporting many types natively can just reduce the threat (which is also good for producing very compact payload) but it cannot eliminate it in general.
Verdict: binary serialization is safe only when both serialization and deserialization are performed in the same process. In any other case you need to implement some additional security (e.g. by signing the stream cryptographically) to be completely safe.
Serialization works on data, not code. A deserializer extracts the data from the payload you provide, constructs a new object instance and sets the object's values from the extracted data. It does NOT extract any code from the payload.
If your code is vulnerable to malicious input in the first place, then yes, deserialization could be another way to attack it - just like any other way of injecting malicious data.
For example, if you construct SQL statements by concatenating strings, you will be vulnerable to SQL injection attack whether the strings come from user input or deserialized data. The way to fix this is to use parameterized queries, not avoid deserialization or try to sanitize the user's input.
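For example, a parameterized query along these lines (the table, column and variable names are made up for illustration):

using Microsoft.Data.SqlClient;   // System.Data.SqlClient on classic .NET Framework

public static class PlayerLookup
{
    public static string FindPlayer(string connectionString, string untrustedInput)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT Name FROM Players WHERE Name = @name", connection))
        {
            // The value travels as a parameter, never concatenated into the SQL text.
            command.Parameters.AddWithValue("@name", untrustedInput);
            connection.Open();
            return (string)command.ExecuteScalar();
        }
    }
}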
In any case the answers to the original post were mostly speculation, comments on Java serialization that's not really relevant to .NET or really contrived examples.
The BinaryFormatter serializes fields and not methods.
The only way to transmit and load unknown code is to:
Receive an assembly and load it using Assembly.Load.
Use a CodeDomProvider to emit IL at runtime.

Why is Serializable Attribute required for an object to be serialized

Based on my understanding, SerializableAttribute provides no compile-time checks, as it's all done at runtime. If that's the case, then why is it required for classes to be marked as serializable?
Couldn't the serializer just try to serialize an object and then fail? Isn't that what it does right now when something is marked: it tries and fails? Wouldn't it be better if you had to mark things as unserializable rather than serializable? That way you wouldn't have the problem of libraries not marking things as serializable.
As I understand it, the idea behind the SerializableAttribute is to create an opt-in system for binary serialization.
Keep in mind that, unlike XML serialization, which uses public properties, binary serialization grabs all the private fields by default.
Not only could this include operating-system structures and private data that is not supposed to be exposed, but deserializing it could result in corrupt state that can crash an application (silly example: a handle for a file open on a different computer).
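A contrived sketch of why that opt-in matters: with BinaryFormatter, private fields go into the stream by default unless you exclude them explicitly (the field names are invented for illustration):

using System;

[Serializable]
public class UserProfile
{
    public string DisplayName;

    // Private state is serialized too, whether or not that makes sense.
    private string cachedPasswordHash;

    // Opted out explicitly; comes back as zero/null after deserialization.
    [NonSerialized]
    private IntPtr nativeFileHandle;
}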
This is only a requirement for BinaryFormatter (and the SOAP equivalent, but nobody uses that). Diego is right; there are good reasons for this in terms of what it does, but it is far from the only option - indeed, personally I only recommend BinaryFormatter for talking between AppDomains - it is not (IMO) a good way to persist data (to disk, in cache, to a database BLOB, etc).
If this behaviour causes you trouble, consider using any of the alternatives:
XmlSerializer, which works on public members (not just the fields), but demands a public parameterless constructor and a public type
DataContractSerializer, which can work fully opt-in (using [DataContract]/[DataMember]), but which can also (in 3.5 and above) work against the fields instead (see the sketch after this list)
Also, for a third-party option (me being the third party): protobuf-net may have options here; "v2" (not fully released yet, but available as source) allows the model (which members to serialize, etc.) to be described independently of the type, so that it can be applied to types that you don't control. And unlike BinaryFormatter, the output is version-tolerant, a known public format, etc.
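The DataContractSerializer sketch promised above; the type and member names are mine:

using System.Runtime.Serialization;

[DataContract]
public class Order
{
    [DataMember]
    public int Id { get; set; }

    [DataMember]
    public decimal Total { get; set; }

    // Not decorated, so it is simply skipped by the serializer.
    public string InternalNote { get; set; }
}

// Usage:
// var serializer = new DataContractSerializer(typeof(Order));
// using (var stream = System.IO.File.Create("order.xml"))
//     serializer.WriteObject(stream, order);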

Serialize in memory object with C#

I've got a program that picks up some code from script files and compiles it.
And it works fine.
The problem is: in the scripts I declare a couple of classes and I want to serialize them.
Obviously the C# serializers (XML and binary) don't like serializing and deserializing objects defined in an in-memory assembly.
I'd prefer not to abandon the in-memory assembly, so I'm looking for another way of serializing; but just in case, is it possible to build an assembly in memory and eventually write it to a file?
You could always write your own ToXml function using reflection to write out your property data to a string. Then your object would deserialize itself.
Just a thought.
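A bare-bones sketch of that reflection idea (flat public properties only; nested objects and the matching FromXml are left out):

using System.Reflection;
using System.Xml.Linq;

public static class SimpleXml
{
    public static string ToXml(object obj)
    {
        // One element per public instance property, named after the property.
        var root = new XElement(obj.GetType().Name);
        foreach (PropertyInfo prop in obj.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            root.Add(new XElement(prop.Name, prop.GetValue(obj)));
        }
        return root.ToString();
    }
}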
If you want to create assemblies dynamically look into IL emitting via reflection. Here is a good article to get you started.
So just to clarify, are you asking how you can serialize a type if it hasn't got the [Serializable] attribute applied?
One solution is to use the WCF Data Contract Serializer: http://msdn.microsoft.com/en-us/library/ms731923.aspx.
Obviously this will only work if you can target .NET 3.0 or higher.
Alternately you can implement an ISerializationSurrogate. Jeffrey Richter has a great introduction at http://msdn.microsoft.com/en-us/magazine/cc188950.aspx.
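A minimal surrogate sketch; the ScriptResult type here is a hypothetical stand-in for one of the script-defined classes:

using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// A type we cannot (or do not want to) mark [Serializable].
public class ScriptResult
{
    public string Output { get; set; }
}

public class ScriptResultSurrogate : ISerializationSurrogate
{
    public void GetObjectData(object obj, SerializationInfo info, StreamingContext context)
    {
        info.AddValue("Output", ((ScriptResult)obj).Output);
    }

    public object SetObjectData(object obj, SerializationInfo info, StreamingContext context,
                                ISurrogateSelector selector)
    {
        ((ScriptResult)obj).Output = info.GetString("Output");
        return obj;
    }
}

// Registration:
// var selector = new SurrogateSelector();
// selector.AddSurrogate(typeof(ScriptResult),
//                       new StreamingContext(StreamingContextStates.All),
//                       new ScriptResultSurrogate());
// var formatter = new BinaryFormatter { SurrogateSelector = selector };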
I would avoid all built-in serialization whenever possible; both are badly broken. For example, XML serialization doesn't support dictionaries, and normal serialization/SOAP doesn't support generics. And both have versioning issues.
It is time-consuming, but creating ToXML and FromXML methods is probably the most effective way to go.
Have a look here for custom serializers; it is a sample of XML-serializing a dictionary.
I'm slightly confused by the statement that the XmlSerializer can't serialize dynamically generated types. The XmlSerializer generates its own serialization code dynamically during construction as well, so there should be no issue with it serializing your type.
You may need to decorate your dynamic classes with the appropriate attributes, depending on what you are generating (like derived classes), but there shouldn't be any issue with using the XmlSerializer in the situation you described.
If you could post details about the issues the XmlSerializer is giving you I can help you work out what the problem is.
Also, I'm of the belief that auto-generating code is in general a blessing. All too often have I had to go back into a class to fix one or all of the copy/paste/save/load functions, just because someone forgot to update them when adding a new variable. Save/load code is boilerplate code. Let the computers write it.
