I've heard there are security concerns about BinaryFormatter.
I send user-generated files from the client to the server. These files contain serialized objects that the server then deserializes.
From my understanding of the above link, this is dangerous. But I've tried sending disposable classes, and even a class that implemented ISerializable, and both were rejected because the server didn't know the source assembly.
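```csharp
using System;
using System.Runtime.Serialization;

[Serializable]
public class Ship : ISerializable
{
    public Ship()
    {
    }

    // Serialization constructor: invoked while the object is being
    // deserialized, so this line runs on whichever machine reads the payload.
    public Ship(SerializationInfo info, StreamingContext context)
    {
        Console.WriteLine("test");
    }

    // Called when the object is serialized; this class writes no data.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
    }
}
```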
So how could a client successfully get code into my server via this vector? By faking the namespace and public key so that the server tries to deserialize it, thus running the above code? Or are there more subtle ways to do it?
Unfortunately, this feature is fundamental to my game, so I want to be careful.
I know this is an old question, but I'm not quite satisfied with the accepted answer:
"Serialization works on data, not code. [...] It does NOT extract any code from the payload."
It's not that simple. BinaryFormatter uses assembly qualified names to identify types. During the deserialization those type names are resolved by Type.GetType, which happily loads any assemblies. Therefore, a manipulated stream can load a prepared assembly, whose module initializer is immediately executed (but the malicious code can be placed in a serialization constructor or [OnDeserializing]/[OnDeserialized] method, too). This video demonstrates how to exploit this to open a PowerShell and a web page in a browser.
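The assembly loading described above happens because untrusted type names are passed to Type.GetType. A common hardening idea (with BinaryFormatter it would live in a SerializationBinder) is to map incoming names against an explicit allow-list and never let the payload choose an assembly. The sketch below is a minimal illustration under that assumption; AllowListTypeResolver and the three allowed types are hypothetical. As the attacks listed next show, allowed types can still be abused, so this narrows the attack surface rather than closing it.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical helper, not a .NET API: maps an incoming type name to a Type
// only if it is on an explicit allow-list. Type.GetType is never called on the
// untrusted name, so no assembly named in the payload gets loaded.
public static class AllowListTypeResolver
{
    // Assumption: these are the only types the server expects in a payload.
    private static readonly Dictionary<string, Type> Allowed =
        new Dictionary<string, Type>(StringComparer.Ordinal)
        {
            ["System.Int32"] = typeof(int),
            ["System.String"] = typeof(string),
            ["System.DateTime"] = typeof(DateTime),
        };

    public static Type Resolve(string assemblyQualifiedName)
    {
        // "Namespace.Type, Assembly, Version=..., Culture=..., PublicKeyToken=..."
        // Keep only the part before the first comma. Generic names contain commas
        // inside brackets, so they are cut short and rejected (fail closed).
        int comma = assemblyQualifiedName.IndexOf(',');
        string typeName = (comma < 0
            ? assemblyQualifiedName
            : assemblyQualifiedName.Substring(0, comma)).Trim();

        // The assembly part is deliberately ignored: the mapped Type comes from
        // assemblies this process has already loaded.
        if (!Allowed.TryGetValue(typeName, out Type? type))
            throw new InvalidOperationException($"Type '{typeName}' is not on the allow-list.");
        return type;
    }
}
```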
In any case, the answers to the original post were mostly speculation, comments on Java serialization that aren't really relevant to .NET, or very contrived examples.
Maybe that's just because the answer is old, but today there are a lot of known BinaryFormatter attacks. Some examples:
TempFileCollection can be exploited to delete files (only in .NET Framework). This is also mentioned in the linked video (and also in the post linked in the question).
StructuralEqualityComparer can be used to cause a StackOverflowException or a hopelessly slow hash code calculation (DoS attack). Starting with .NET Core, this type is no longer serializable.
A lot of [Serializable] types that don't implement ISerializable (so they are restored just by setting their fields) can be initialized with invalid data.
Even most types that implement ISerializable don't validate the incoming SerializationInfo for all possible attacks. For example, Dictionary<TKey, TValue> can throw an OutOfMemoryException if the value of the HashSize entry is manipulated.
But BinaryFormatter is vulnerable at a deeper level, too. For example, arrays are handled internally (this cannot even be overridden by a surrogate selector), and manipulated length information can also cause an OutOfMemoryException.
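The manipulated-length attacks above work because a reader allocates whatever size the stream declares. A minimal sketch of the usual countermeasure in a hand-written, length-prefixed format: check the declared count against a caller-supplied limit and against the bytes actually remaining before allocating. SafeArrayReader and its format are hypothetical (BinaryFormatter's internal array handling cannot be changed this way), and the sketch assumes a seekable stream such as MemoryStream because it reads BaseStream.Length.

```csharp
using System;
using System.IO;

// Hypothetical length-prefixed reader, not BinaryFormatter's real format.
public static class SafeArrayReader
{
    // Reads an int32 count followed by that many int32 values.
    public static int[] ReadInt32Array(BinaryReader reader, int maxCount)
    {
        int count = reader.ReadInt32();

        // A manipulated count is the OutOfMemoryException vector:
        // reject it before allocating anything.
        if (count < 0 || count > maxCount)
            throw new InvalidDataException($"Declared length {count} is out of range.");

        // The stream must actually contain that many elements.
        long remaining = reader.BaseStream.Length - reader.BaseStream.Position;
        if ((long)count * sizeof(int) > remaining)
            throw new InvalidDataException("Declared length exceeds the remaining payload.");

        var result = new int[count];
        for (int i = 0; i < count; i++)
            result[i] = reader.ReadInt32();
        return result;
    }
}
```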
I actually believed that BinaryFormatter could be made safe. I even opened an issue about this topic here: https://github.com/dotnet/runtime/issues/50909
But considering that security was never a focus when serializable types were implemented, and that fixing all these issues would be an enormous task, I can understand why BinaryFormatter will be obsoleted in future versions.
And though I introduced a SafeMode option in my binary serializer, it cannot be completely safe as long as the serializable types themselves are vulnerable. Supporting many types natively can only reduce the threat (which also helps produce very compact payloads), but it cannot eliminate it in general.
Verdict: binary serialization is safe only when both serialization and deserialization are performed in the same process. In any other case, you need to implement some additional security (e.g., by signing the stream cryptographically) to be completely safe.
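Signing the stream can be sketched with an HMAC over the serialized bytes, verified before anything is handed to a deserializer. This is a minimal sketch, assuming a secret key shared only by trusted parties; SignedPayload and its tag-then-payload layout are hypothetical. A signature only proves who produced the bytes: it helps when the server signs data that later comes back to it, not when the untrusted client holds the key.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper: prepends an HMAC-SHA256 tag so the receiver can
// reject payloads it did not produce. Key distribution is out of scope.
public static class SignedPayload
{
    private const int TagSize = 32; // HMAC-SHA256 output length in bytes

    public static byte[] Sign(byte[] key, byte[] payload)
    {
        using var hmac = new HMACSHA256(key);
        byte[] tag = hmac.ComputeHash(payload);
        byte[] result = new byte[TagSize + payload.Length];
        Buffer.BlockCopy(tag, 0, result, 0, TagSize);
        Buffer.BlockCopy(payload, 0, result, TagSize, payload.Length);
        return result;
    }

    // Returns the payload only if the tag verifies; otherwise throws.
    public static byte[] Verify(byte[] key, byte[] signed)
    {
        if (signed.Length < TagSize)
            throw new CryptographicException("Payload too short.");

        byte[] payload = new byte[signed.Length - TagSize];
        Buffer.BlockCopy(signed, TagSize, payload, 0, payload.Length);

        using var hmac = new HMACSHA256(key);
        byte[] expected = hmac.ComputeHash(payload);

        // Constant-time comparison avoids leaking how many tag bytes matched.
        if (!CryptographicOperations.FixedTimeEquals(expected, signed.AsSpan(0, TagSize)))
            throw new CryptographicException("Signature mismatch.");
        return payload;
    }
}
```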
Serialization works on data, not code. A deserializer extracts the data from the payload you provide, constructs a new object instance and sets the object's values from the extracted data. It does NOT extract any code from the payload.
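To make the "data, not code" point concrete, here is a sketch using System.Text.Json as a stand-in. Unlike BinaryFormatter, it serializes public properties rather than fields and writes no type names. The Ship type, its Describe method and the DataNotCode helper are hypothetical. The payload carries only the property values; the method has to exist in the receiver's own copy of the type.

```csharp
using System.Text.Json;

// Hypothetical game type: data (properties) plus behavior (a method).
public class Ship
{
    public string Name { get; set; } = "";
    public int Crew { get; set; }

    // Behavior lives in the assembly, not in the payload.
    public string Describe() => $"{Name} with {Crew} crew";
}

public static class DataNotCode
{
    public static string Save(Ship ship) => JsonSerializer.Serialize(ship);

    // The receiver must already have the Ship type; the payload only fills in values.
    public static Ship Load(string json) => JsonSerializer.Deserialize<Ship>(json) ?? new Ship();
}
```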
If your code is vulnerable to malicious input in the first place, then yes, deserialization could be another way to attack it - just like any other way of injecting malicious data.
For example, if you construct SQL statements by concatenating strings, you will be vulnerable to SQL injection whether the strings come from user input or deserialized data. The way to fix this is to use parameterized queries, not to avoid deserialization or try to sanitize the user's input.
The BinaryFormatter serializes fields and not methods.
The only way to transmit and load unknown code is to:
Receive an assembly and load it using Assembly.Load.
Use a CodeDomProvider to emit IL at runtime.
Related
I have serialized a C# class using protobuf-net, and the resulting byte array is stored in a database. This is for performance reasons, and I probably won't be able to change this design. C# offers no way to prevent classes from being modified, and over time the class used for deserialization may need changes that no longer match the one used for serialization, causing retrieval to fail.
Other than the wrapper technique suggested here, is there a pattern or technique for handling this kind of problem?
The only other technique that comes to my mind is to version the classes that need to be deserialized, so that you don't lose anything when you need to make changes. When you serialize an instance of those classes, you also have to serialize the version of the class (it could be a field of the class itself).
I don't think this is the best solution, but it is a solution.
The versioning strategy could become very difficult to manage when the changes (and the versions) start to grow.
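A minimal sketch of the version-field idea, using BinaryWriter/BinaryReader instead of protobuf-net so it has no external dependencies. ShipRecord, its two layouts and the default speed used for migration are hypothetical. The version is written first, and the reader chooses the layout (and any migration) from it.

```csharp
using System.IO;

// Hypothetical record whose layout changed: v1 had Name only, v2 added Speed.
public class ShipRecord
{
    public const int CurrentVersion = 2;
    public string Name { get; set; } = "";
    public double Speed { get; set; }

    public void Write(BinaryWriter writer)
    {
        writer.Write(CurrentVersion); // version first, so readers know the layout
        writer.Write(Name);
        writer.Write(Speed);
    }

    public static ShipRecord Read(BinaryReader reader)
    {
        int version = reader.ReadInt32();
        var record = new ShipRecord { Name = reader.ReadString() };
        switch (version)
        {
            case 1:
                record.Speed = 1.0; // migration: assumed default for v1 data
                break;
            case 2:
                record.Speed = reader.ReadDouble();
                break;
            default:
                throw new InvalidDataException($"Unknown version {version}.");
        }
        return record;
    }
}
```

Each new version adds another case to the switch, which is the growth in maintenance the answer warns about.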
I wonder whether it is possible to serialize a class like the one described in the title.
Suppose we have someone else's library that is shared as a binary DLL file, and the creator of this library wrote a class that is not Serializable. How can such a class be serialized? I know I can create a twin class that contains all the properties etc. and can be serialized. But is there any other, easier solution? How do you serialize classes that are "not yours" and are available only in binary form?
The 3rd party class is an implementation detail; frankly, it is a very bad idea to involve this in your serialization, as you are then completely fenced into a corner, and can never change implementation. You would also face significant risk of versioning issues - something that BinaryFormatter simply doesn't handle well.
It might not be what you want to hear, but I offer two recommendations:
do not serialize implementation details; serialize the data (only); this may indeed require you to write a DTO that mirrors the implementation, but this is usually a trivial job
make sure you understand the implications of BinaryFormatter; frankly, I never recommend it - it has... glitches.
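A sketch of the DTO approach from the first bullet. ThirdPartyPoint stands in for the closed, non-serializable library class, and PointDto and PointStore are hypothetical names. System.Text.Json is used only to keep the example dependency-free; it is not the serializer the answer recommends. Only the DTO's shape is persisted, so the library's internals can change without breaking saved data.

```csharp
using System.Text.Json;

// Stand-in for a class from a third-party DLL: not [Serializable], no setters.
public sealed class ThirdPartyPoint
{
    public ThirdPartyPoint(int x, int y) { X = x; Y = y; }
    public int X { get; }
    public int Y { get; }
}

// Hypothetical DTO that captures only the data we care about.
public sealed class PointDto
{
    public int X { get; set; }
    public int Y { get; set; }

    public static PointDto From(ThirdPartyPoint p) => new PointDto { X = p.X, Y = p.Y };
    public ThirdPartyPoint ToModel() => new ThirdPartyPoint(X, Y);
}

public static class PointStore
{
    // Only the DTO is ever serialized; the third-party type never touches the format.
    public static string Save(ThirdPartyPoint p) => JsonSerializer.Serialize(PointDto.From(p));

    public static ThirdPartyPoint Load(string json) =>
        (JsonSerializer.Deserialize<PointDto>(json) ?? new PointDto()).ToModel();
}
```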
As for workarounds: you can investigate serialization surrogates, but that isn't a trivial thing to do inside BinaryFormatter, and is basically just a re-statement of the first bullet.
If it was me (although I am hugely biased), I would change serializer; protobuf-net (disclosure: I'm the author) works as a binary serializer, and has easy-to-implement support for surrogates if the third-party model is already coupled to your model.
If I want to serialize an object, I have to use the [Serializable] attribute, and all member variables will be written to the file. What I don't know is how to handle versioning. For example, if I add a new member variable (or rename or remove one) and then open (deserialize) the file, how can I determine the object/file version so that I can set the new member correctly or perform some kind of migration? How can I determine whether a variable was initialized during the load or ignored by the deserializer?
I know that there are version-tolerant approaches, and I can mark variables with the [OptionalField(VersionAdded = 1)] attribute. If I open an old file, the framework will ignore this optional (new) variable and it will just be zero/null. But again, how can I determine whether the variable was initialized by the load or ignored?
I can write the class/object version number to the stream, use the ISerializable approach, and read this version number in the serialization constructor (SerializationInfo oInfo, StreamingContext context). This will tell me exactly which class version is in the stream.
However, I expected this kind of versioning to be already implemented by the serialization framework in C#. I tried to obtain the assembly version from the SerializationInfo, but it is always set to the current version, not to the version that was used when the object was saved.
What is the preferred approach? I found a lot of articles on the net, but I could not find a good solution that addresses versioning...
Any help is appreciated
Thanks,
Abyss
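One way to tell "absent from an old file" apart from "saved as zero" is to make the new member nullable, and to give the version member a default that only survives when the file predates it. The sketch uses System.Text.Json rather than BinaryFormatter's [OptionalField]; GameSettings, SettingsLoader and the default volume of 80 are hypothetical.

```csharp
using System.Text.Json;

// Hypothetical settings type: Volume was added in version 2.
public class GameSettings
{
    // Assumed convention: files written before v2 omit Version, so the default stays 1.
    public int Version { get; set; } = 1;
    public string PlayerName { get; set; } = "";

    // Nullable, so "absent from the file" (null) differs from "saved as 0".
    public int? Volume { get; set; }
}

public static class SettingsLoader
{
    public static GameSettings Load(string json, out bool volumeWasMissing)
    {
        var settings = JsonSerializer.Deserialize<GameSettings>(json) ?? new GameSettings();

        // Detect the missing member and run a migration step for old files.
        volumeWasMissing = settings.Volume is null;
        if (volumeWasMissing)
            settings.Volume = 80; // hypothetical default for pre-v2 files
        return settings;
    }
}
```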
Forgive me if some of what I write is too obvious.
First of all, please stop thinking that you are serializing an object.
That is simply incorrect, as the methods that are part of your object are not persisted.
You are persisting information, and so DATA only.
.NET serialization also serializes the type name of your object, which contains the assembly name and its version. When you deserialize, it compares the persisted assembly information with the type that will be created from that information; if they are not the same, it throws an exception.
Besides the versioning problem, not everything can be serialized so easily. Try to serialize a System.Drawing.Color and you will begin to understand the problems with the oversimplified mechanism of .NET serialization.
Unless you plan to serialize something really simple that is not expected to evolve, I wouldn't use the serialization mechanism provided by .NET.
Coming back to your question, you can read here about the version-tolerant serialization that BinaryFormatter provides:
http://msdn.microsoft.com/en-us/library/ms229752(v=vs.80).aspx
You should also look at XML serialization, which has some nice features; the biggest benefit is that you get human-readable XML, so your data will never be lost even if you run into complications with the versioning of your types.
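A minimal XmlSerializer sketch showing the human-readable output mentioned above; ShipSave and XmlSave are hypothetical names. XmlSerializer needs a public type with a public parameterless constructor, and elements missing from an older file are simply left at their default values when deserializing.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical save type: public, with an implicit public parameterless constructor.
public class ShipSave
{
    public string Name { get; set; } = "";
    public int Crew { get; set; }
}

public static class XmlSave
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(ShipSave));

    public static string ToXml(ShipSave ship)
    {
        using var writer = new StringWriter();
        Serializer.Serialize(writer, ship);
        return writer.ToString();
    }

    public static ShipSave FromXml(string xml)
    {
        using var reader = new StringReader(xml);
        return (ShipSave)Serializer.Deserialize(reader);
    }
}
```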
But ultimately, I recommend that you either use a database with Entity Framework to persist your data or write your own flat-file manager. While EF is very good for most solutions, sometimes you might want something lighter to persist something very simple.
(What I'm implying is that I can no longer see a scenario where .NET serialization is relevant.)
I hope this helps. Good luck.
Based on my understanding, SerializableAttribute provides no compile-time checks, as it's all done at runtime. If that's the case, then why is it required for classes to be marked as serializable?
Couldn't the serializer just try to serialize an object and then fail? Isn't that what it does right now? When something is marked, it tries and fails. Wouldn't it be better if you had to mark things as unserializable rather than serializable? That way you wouldn't have the problem of libraries not marking things as serializable?
As I understand it, the idea behind the SerializableAttribute is to create an opt-in system for binary serialization.
Keep in mind that, unlike XML serialization, which uses public properties, binary serialization grabs all the private fields by default.
Not only could this include operating system structures and private data that are not supposed to be exposed, but deserializing it could also result in corrupt state that can crash an application (silly example: a handle for a file opened on a different computer).
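A sketch of what the opt-in gate protects against. The C# compiler records [Serializable] as the TypeAttributes.Serializable metadata flag, and field-based serialization would capture every instance field, private or not. OpenFile, NotMarked and OptInInspector are hypothetical; the handle field illustrates state that is meaningless on another machine.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

#pragma warning disable CS0414 // fields exist only to be inspected
[Serializable]
public class OpenFile
{
    private string path = "save.dat";
    private IntPtr handle = new IntPtr(42); // meaningless on any other machine
}

public class NotMarked
{
    private int secret = 7;
}
#pragma warning restore CS0414

// Hypothetical opt-in check, similar in spirit to the one BinaryFormatter applies.
public static class OptInInspector
{
    public static bool IsOptedIn(Type type) =>
        (type.Attributes & TypeAttributes.Serializable) != 0;

    // Lists the instance fields a field-based serializer would capture.
    public static IReadOnlyList<string> CapturedFields(Type type)
    {
        if (!IsOptedIn(type))
            throw new InvalidOperationException($"{type.Name} is not marked [Serializable].");
        return type.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
                   .Select(f => f.Name)
                   .ToList();
    }
}
```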
This is only a requirement for BinaryFormatter (and the SOAP equivalent, but nobody uses that). Diego is right; there are good reasons for this in terms of what it does, but it is far from the only option - indeed, personally I only recommend BinaryFormatter for talking between AppDomains - it is not (IMO) a good way to persist data (to disk, in cache, to a database BLOB, etc).
If this behaviour causes you trouble, consider using any of the alternatives:
XmlSerializer, which works on public members (not just the fields), but demands a public parameterless constructor and public type
DataContractSerializer, which can work fully opt-in (using [DataContract]/[DataMember]), but which can also (in 3.5 and above) work against the fields instead
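A sketch of DataContractSerializer in its fully opt-in mode: only members marked [DataMember] are written. Player, ContractSave and the SessionToken property are hypothetical names chosen for illustration.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Text;

[DataContract]
public class Player
{
    [DataMember] public string Name { get; set; } = "";
    [DataMember] public int Score { get; set; }

    // Not a [DataMember], so it is never written.
    public string SessionToken { get; set; } = "";
}

public static class ContractSave
{
    public static string ToXml(Player player)
    {
        var serializer = new DataContractSerializer(typeof(Player));
        using var stream = new MemoryStream();
        serializer.WriteObject(stream, player);
        return Encoding.UTF8.GetString(stream.ToArray());
    }
}
```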
Also, for a 3rd-party option (me being the 3rd party): protobuf-net may have options here; "v2" (not fully released yet, but available as source) allows the model (which members to serialize, etc.) to be described independently of the type, so that it can be applied to types that you don't control. And unlike BinaryFormatter, its output is version-tolerant, uses a known public format, etc.
I want to know how deserialization works, and whether it is really necessary to have the assembly on the system where the deserialization is happening.
If you haven't looked at MSDN yet, do so. It will tell you everything you need to know about the serialization/deserialization process...at least enough to use it. The link I gave you is specifically 'how to deserialize.'
As for the more technical aspects, the information that gets serialized is exactly what is required to fill that structure/class/object.
I'm not really sure what you mean by the 2nd part of your question about the assembly. However, if you are serializing a struct (for instance), then in order to deserialize to another machine, or application, you must have that exact same struct available: name, fields, data types, etc.
If you're looking for the exact details, you can boot up an instance of Reflector and point it to mscorlib and look into the various classes in the System.Runtime.Serialization namespace. Here's the high-level idea (as I understand it):
The first step is ensuring that the type system that wrote the binary stream is the same as the type system that is reading the binary stream. Since so little meta-information is attached to the output, problems can arise if we're not looking at the same type. If we have two classes named A, but the writer thinks an A is class A { int m_a, m_b; } and the reader thinks A is class A { int m_b, m_a; }, we're going to have problems. This problem is much worse if the types are significantly different. Keep this in mind; it will come back later.
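The class A example can be reproduced with a format that stores values by position only, with no member names. WriterA and ReaderA are hypothetical stand-ins for the two disagreeing definitions of A, and PositionalFormat is a hypothetical helper. The bytes round-trip without any error, but the values land in the wrong fields.

```csharp
using System.IO;

// Two hypothetical definitions of "class A" that disagree on field order.
public class WriterA { public int m_a; public int m_b; }
public class ReaderA { public int m_b; public int m_a; }

public static class PositionalFormat
{
    // Writes fields in the writer's declared order; no names are stored.
    public static byte[] Write(WriterA value)
    {
        using var stream = new MemoryStream();
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(value.m_a);
            writer.Write(value.m_b);
        }
        return stream.ToArray(); // ToArray still works after the stream is closed
    }

    // Reads fields in the reader's declared order.
    public static ReaderA Read(byte[] data)
    {
        using var reader = new BinaryReader(new MemoryStream(data));
        var result = new ReaderA();
        result.m_b = reader.ReadInt32(); // actually receives the writer's m_a
        result.m_a = reader.ReadInt32(); // actually receives the writer's m_b
        return result;
    }
}
```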
Okay, so we're reading and writing the same types. The next step is finding all the members of the object you want to serialize. You could do this through reflection with a call like typeof(T).GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic), but that will be super slow (rule of thumb: reflection is slow). Luckily, .NET's internal calls are much faster.
Step 1: Now we get to writing. First, the writer writes the strong name of the assembly that the object we're serializing resides in. The reader can then confirm that it actually has this assembly loaded. Next, the object's namespace-qualified type name is written, so the reader can read into the proper type. This basically guarantees that the reading type and the writing type are the same.
Step 2: Time to actually write the object. A look at the methods of Formatter lets us know that there is some basic functionality for writing ints, floats and all sorts of simple types. In a pre-determined order (the order they are declared, starting from the fields of the base class), each field is written to the output. For fields that are not the simple types, recurse back to step 1 with the object in that field.
To deserialize, you perform these same steps, except replace all the verbs such as 'write' with verbs like 'read.' Order is extremely important.
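The two steps can be sketched as a toy formatter. This is not BinaryFormatter's real format or implementation: ToyFormatter, Vehicle and Boat are hypothetical, only int and string fields are supported, and nested objects (the recursion in step 2) are left out. It uses plain reflection for clarity and orders each class's fields by metadata token as a proxy for declaration order.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

// Hypothetical sample types: a base-class field plus derived fields.
public class Vehicle { public string Name = ""; }
public class Boat : Vehicle { public int Crew; private int hullId = 7; public int HullId => hullId; }

public static class ToyFormatter
{
    private const BindingFlags InstanceFields = BindingFlags.Instance | BindingFlags.Public
        | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

    // Base class first, then each derived class, each in metadata (declaration) order.
    private static List<FieldInfo> OrderedFields(Type type)
    {
        var chain = new List<Type>();
        for (Type? t = type; t != null && t != typeof(object); t = t.BaseType)
            chain.Insert(0, t);
        var fields = new List<FieldInfo>();
        foreach (Type t in chain)
        {
            FieldInfo[] declared = t.GetFields(InstanceFields);
            Array.Sort(declared, (a, b) => a.MetadataToken.CompareTo(b.MetadataToken));
            fields.AddRange(declared);
        }
        return fields;
    }

    public static void Write(BinaryWriter writer, object value)
    {
        Type type = value.GetType();
        writer.Write(type.Assembly.FullName ?? ""); // step 1: assembly identity
        writer.Write(type.FullName ?? "");          // step 1: type name
        foreach (FieldInfo field in OrderedFields(type)) // step 2: field values
        {
            object? fieldValue = field.GetValue(value);
            if (field.FieldType == typeof(int)) writer.Write((int)fieldValue!);
            else if (field.FieldType == typeof(string)) writer.Write((string?)fieldValue ?? "");
            else throw new NotSupportedException($"Field {field.Name} has an unsupported type.");
        }
    }

    public static T Read<T>(BinaryReader reader) where T : new()
    {
        Type type = typeof(T);
        // Same steps in the same order, with "write" replaced by "read".
        if (reader.ReadString() != (type.Assembly.FullName ?? "")
            || reader.ReadString() != (type.FullName ?? ""))
            throw new InvalidDataException("Stream was written for a different type.");
        object result = new T();
        foreach (FieldInfo field in OrderedFields(type))
        {
            if (field.FieldType == typeof(int)) field.SetValue(result, reader.ReadInt32());
            else if (field.FieldType == typeof(string)) field.SetValue(result, reader.ReadString());
            else throw new NotSupportedException($"Field {field.Name} has an unsupported type.");
        }
        return (T)result;
    }
}
```

Reading performs the same steps in the same order, which is why order matters so much: swapping any two reads would silently assign values to the wrong fields.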