Can somebody explain to me the internals of deserialization? - c#

I want to know how deserialization works, and whether the assembly really has to be present on the system where the deserialization happens.

If you haven't looked at MSDN yet, do so. It will tell you everything you need to know about the serialization/deserialization process...at least enough to use it. The link I gave you is specifically 'how to deserialize.'
As for the more technical aspects, the information that gets serialized is exactly what is required to fill that structure/class/object.
I'm not really sure what you mean by the second part of your question about the assembly. However, if you are serializing a struct (for instance), then in order to deserialize it on another machine or in another application, that exact same struct must be available there: name, fields, data types, etc.

If you're looking for the exact details, you can boot up an instance of Reflector and point it to mscorlib and look into the various classes in the System.Runtime.Serialization namespace. Here's the high-level idea (as I understand it):
The first step is ensuring that the type system that wrote the binary stream is the same as the type system that is reading it. Since so little meta-information is attached to the output, problems can arise if we're not looking at the same type. If we have two classes named A, but the writer thinks an A is class A { int m_a, m_b; } and the reader thinks A is class A { int m_b, m_a; }, the values end up in the wrong fields. The problem is much worse if the types are significantly different. Keep this in mind; it will come back later.
Okay, so we're reading and writing the same types. The next step is finding all the members of the object you want to serialize. You could do this through reflection with a call like typeof(T).GetFields(~System.Reflection.BindingFlags.Default), but that will be super-slow (rule of thumb: reflection is slow). Luckily, .NET's internal calls are much faster.
Step 1: Now we get to writing. First, the writer writes the strong name of the assembly that contains the type of the object we're serializing. The reader can then confirm that it actually has this assembly loaded. Next, the object's namespace-qualified type name is written, so the reader can read into the proper type. This basically guarantees that the reading type and the writing type are the same.
Step 2: Time to actually write the object. A look at the methods of Formatter shows that there is some basic functionality for writing ints, floats and all sorts of simple types. In a predetermined order (the order in which they are declared, starting with the fields of the base class), each field is written to the output. For fields that are not simple types, recurse back to step 1 with the object in that field.
To deserialize, you perform these same steps, except replace all the verbs such as 'write' with verbs like 'read.' Order is extremely important.
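To make the two steps concrete, here is a minimal sketch of that scheme written by hand. It is not BinaryFormatter's actual wire format; NaiveSerializer, Point and Shape are hypothetical names made up for the illustration, and it assumes int and non-null string fields plus non-null nested objects with a parameterless constructor. Because reflection does not promise any particular field order, the sketch sorts the fields of each class by name instead of relying on declaration order.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

// Hypothetical sample types, used only to exercise the sketch.
public class Point { public int X; public int Y; }
public class Shape { public string Name = ""; public Point Origin = new Point(); }

// Illustrative only: this is NOT BinaryFormatter's wire format.
public static class NaiveSerializer
{
    const BindingFlags Flags = BindingFlags.Instance | BindingFlags.Public |
                               BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

    // Base-class fields first; within each class, sorted by name so that
    // writer and reader always agree on the order.
    static List<FieldInfo> FieldsOf(Type type)
    {
        var fields = new List<FieldInfo>();
        for (Type t = type; t != null && t != typeof(object); t = t.BaseType)
        {
            FieldInfo[] declared = t.GetFields(Flags);
            Array.Sort(declared, (a, b) => string.CompareOrdinal(a.Name, b.Name));
            fields.InsertRange(0, declared);
        }
        return fields;
    }

    public static void Write(BinaryWriter writer, object obj)
    {
        // Step 1: say which type follows (the name includes the assembly).
        writer.Write(obj.GetType().AssemblyQualifiedName);
        // Step 2: write each field in the agreed order.
        foreach (FieldInfo field in FieldsOf(obj.GetType()))
        {
            object value = field.GetValue(obj);
            if (field.FieldType == typeof(int)) writer.Write((int)value);
            else if (field.FieldType == typeof(string)) writer.Write((string)value);
            else Write(writer, value); // not a simple type: recurse to step 1
        }
    }

    public static object Read(BinaryReader reader)
    {
        // Throws if the named type (and so its assembly) cannot be found.
        Type type = Type.GetType(reader.ReadString(), true);
        object obj = Activator.CreateInstance(type);
        foreach (FieldInfo field in FieldsOf(type))
        {
            if (field.FieldType == typeof(int)) field.SetValue(obj, reader.ReadInt32());
            else if (field.FieldType == typeof(string)) field.SetValue(obj, reader.ReadString());
            else field.SetValue(obj, Read(reader));
        }
        return obj;
    }
}
```

The Type.GetType call in Read is where the second half of the question shows up: if the reading side cannot load the assembly named in the stream, the sketch fails before it reads a single field.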

Related

How to parse source code fragment to System.Type

I have a set of strings like this:
System.Int32
string
bool[]
List<MyType.MyNestedType>
Dictionary<MyType.MyEnum, List<object>>
I would like to test if those strings are actually source code representations of valid types.
I'm in an environment that doesn't support Roslyn, and incorporating any sort of parser would be difficult. This is why I've tried using System.Type.GetType(string) to figure this out.
However, I'm going down a dirty road, because there are so many edge cases where I need to modify the input string into an assembly-qualified name. E.g. the nested type "MyType.MyNestedType" needs to be "MyType+MyNestedType", and generics also have to be figured out the hard way.
Is there any helper method which does this kind of checking in .NET 2.0? I'm working in the Unity game engine, and we don't have any means to switch our system to a more sophisticated environment with available parsers.
Clarification
My company has developed a code generation system in Unity, which is not easily changed at this point. The one thing I need to add to it is the ability to get a list of fields defined in a class (via reflection) and then separate them based on whether they are part of the default runtime assembly or enclosed within #if UNITY_EDITOR preprocessor directives. When those are set, I basically want to handle those fields differently, but reflection alone can't tell me. Therefore I have decided to open my script files, look through the text for such #if regions, check whether a field is declared within them and, if so, put it in a separate FieldInfo[] array.
The one thing that is fixed and not changeable: all scripts will be inspected via reflection, and a collection of FieldInfo is used to generate new source code elsewhere. I just need to separate that collection into individual ones for the runtime vs. the editor assembly.
Custom types and nested generics are probably the hard part.
Can't you just have an "equivalency map" from source-style names to fully qualified names, or a few translation rules for all custom types?
I guess you know in advance what you will encounter.
Or maybe run it the opposite way: at startup, scan your assemblies and, for each class they contain, generate the name "as it's supposed to appear" in your input file from the fully qualified name in GetType() format.
For custom types in other assemblies, please note that you have to do things such as calling Assembly.LoadFile(), or include the assembly name in the string passed to GetType(), before they can be resolved.
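A minimal sketch of the "scan the assembly and build a map" idea, in case it helps. SourceNameMap, Build and TryResolve are hypothetical helper names, MyType is a stand-in for one of your own classes, and the alias table is deliberately incomplete. It handles keyword aliases, nested types and array suffixes, but it makes no attempt at generics such as List<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Stand-in for a user-defined type with a nested type.
public class MyType { public class MyNestedType { } }

public static class SourceNameMap
{
    // Builds a lookup from the name as written in source to the runtime Type.
    public static Dictionary<string, Type> Build(Assembly assembly)
    {
        var map = new Dictionary<string, Type>();
        // C# keyword aliases (only a few shown).
        map["bool"] = typeof(bool);
        map["int"] = typeof(int);
        map["float"] = typeof(float);
        map["string"] = typeof(string);
        map["object"] = typeof(object);

        foreach (Type t in assembly.GetTypes())
        {
            if (t.FullName == null) continue;
            // Reflection writes nested types as Outer+Inner; source uses Outer.Inner.
            map[t.FullName.Replace('+', '.')] = t;
        }
        return map;
    }

    public static bool TryResolve(Dictionary<string, Type> map, string name, out Type type)
    {
        if (name.EndsWith("[]", StringComparison.Ordinal))
        {
            // Resolve the element type first, then build the array type from it.
            Type element;
            if (TryResolve(map, name.Substring(0, name.Length - 2), out element))
            {
                type = element.MakeArrayType();
                return true;
            }
            type = null;
            return false;
        }
        if (map.TryGetValue(name, out type)) return true;
        // Fall back to framework names such as "System.Int32".
        type = Type.GetType(name, false);
        return type != null;
    }
}
```

Typical use would be to call Build(typeof(MyType).Assembly) once at startup and then TryResolve for each string from the script file. Names that cannot be resolved simply return false instead of throwing.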
See here for an example: Resolve Type from Class Name in a Different Assembly
Maybe this answer could also help: How to parse C# generic type names?
Could you please describe the final purpose of the project? The problem is a bit surprising, especially for a Unity project. Is it because you used some kind of unusual serialization to persist the state of some of your objects?
This answer is more a set of recommendations and questions to help you clarify your needs than a definitive answer, but it wouldn't fit in a single comment, and I think it provides useful information.

Serialization and object versioning in C#

If I want to serialize an object I have to use the [Serializable] attribute, and all member variables will be written to the file. What I don't know is how to do versioning: if I add a new member variable (or rename or remove one) and then open (deserialize) the file, how can I determine the object/file version so I can correctly set the new member or perform some kind of migration? How can I determine whether a variable was initialized during the load or not (i.e. ignored by the deserializer)?
I know that there are version-tolerant approaches and that I can mark variables with the [OptionalField(VersionAdded = 1)] attribute. If I open an old file, the framework will ignore this optional (new) variable and it will just be zero/null. But again, how can I determine whether the variable was initialized by the load or was ignored?
I can write the class/object version number to the stream: use the ISerializable approach and read this version number in the constructor(SerializationInfo oInfo, StreamingContext context). This will tell me exactly which class version is in the stream.
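For reference, a minimal sketch of that ISerializable approach. The Settings class, its fields and the entry names "version", "name" and "timeout" are hypothetical, chosen only for the illustration; it assumes Timeout was added in version 2 and should default to 30 when an older stream is loaded.

```csharp
using System;
using System.Runtime.Serialization;

[Serializable]
public class Settings : ISerializable
{
    const int CurrentVersion = 2;

    public string Name;
    public int Timeout = 30;                     // added in version 2
    public int LoadedVersion = CurrentVersion;   // version found in the stream

    public Settings() { }

    // Called by the formatter during deserialization.
    protected Settings(SerializationInfo info, StreamingContext context)
    {
        LoadedVersion = info.GetInt32("version");
        Name = info.GetString("name");
        // Only read what that version actually wrote; otherwise keep the default.
        if (LoadedVersion >= 2) Timeout = info.GetInt32("timeout");
    }

    // Called by the formatter during serialization.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("version", CurrentVersion);
        info.AddValue("name", Name);
        info.AddValue("timeout", Timeout);
    }
}
```

After loading, LoadedVersion tells you which fields came from the file and which are defaults, so any migration can key off it. Note that GetInt32 throws a SerializationException when the entry is missing, so data written before the "version" entry existed would need separate handling.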
However, I expected that this kind of versioning would already be implemented by the serialization framework in C#. I tried to obtain the assembly version from the SerializationInfo, but it is always set to the current version, not to the version that was used when the object was saved.
What is the preferred approach? I found a lot of articles on the net, but I could not find a good solution that addresses versioning...
Any help is appreciated
Thanks,
Abyss
Forgive me if some of what I write is too obvious.
First of all, please stop thinking that you are serializing an object.
That is simply incorrect, as the methods which are part of your object are not being persisted.
You are persisting information, and so: DATA only.
.NET serialization also serializes the type name of your object, which contains the assembly name and its version, so when you deserialize it compares the persisted assembly information with the type that is about to be populated with that information. If they are not the same, it throws an exception.
Besides the versioning problem, not everything can be serialized so easily. Try to serialize a System.Drawing.Color and you will begin to understand the problems with the overly simplistic mechanism of .NET serialization.
Unless you plan to serialize something really simple that is not going to evolve, I wouldn't use the serialization mechanism provided by .NET.
Getting back to your question, you can read about version-tolerant serialization here:
http://msdn.microsoft.com/en-us/library/ms229752(v=vs.80).aspx which is provided for BinaryFormatter.
You should also check out XML serialization, which has some nice abilities; the biggest benefit is that you get XML which is human-readable, so your data will never be lost even if you run into complications with the versioning of your types.
But finally, I recommend that you either use a database with Entity Framework to persist your data or write your own flat file manager. While EF is very good for most solutions, sometimes you might want something lighter to persist something very simple.
(My point is that I can no longer see a scenario where .NET serialization is the right choice.)
I hope this helps. Good luck.

how to use extensions from protocol buffers to maintain 'general' message

My client-server communication looks like this: there are so-called announcements, which are separate messages used to exchange information. The idea is that the announcement is the common part of every message. Actually, I suppose it will be the type of the message. The type decides what the content is. In a UML class diagram, Announcement would be the class that all other messages inherit from.
I want to implement that idea in communication between two applications, one written in C++ and the other in C#. I thought I could write a message that contains one field with the type of the message (an enum field). All additional information relevant to the type would be implemented as extensions.
I have found some examples of how to use extensions in C++, but I have no clue how to do it in C#. I know there are the interfaces IExtensible and IExtension (in protobuf-net), but how can I use them? Internet resources seem to be sparse on the matter.
I suppose that in the past messages in C# used to be defined in a similar fashion to how they are still defined in C++ apps (using a proto file and protoc). Can I use the same proto file to define the message in C#? How? Will extensions be interpreted or overridden?
If I could implement extensions, I would send a message, parse it, check the type and use the appropriate function to handle it. That sounds cool to me, because I wouldn't have to care about the type of the message I was going to read: I wouldn't have to know the type before parsing.
There are a number of ways you could do this. I'm not actually sure extensions is the one I would leap for, but:
In your message type, you could have a set of fully defined fields for each sub-message, i.e.
    base-message
        {1-5} common fields
        {optional 20} sub-message 1
        {optional 21} sub-message 2
        {optional 22} sub-message 3
        {optional 23} sub-message 4
    sub-message 1
        {1-n} specific fields
where exactly one of the sub-message fields would be populated.
Alternatively, encapsulate the common parts inside the more specific message:
    common field type
        {1-n} fields
    sub-message 1
        {1} common field type
        {2-m} specific fields
Either approach would allow you to deserialize; the second is trickier, IMO, since it requires you to know the type ahead of time. The only convenient way to do that is to prefix each with a different identifier. Personally I prefer the first. This does not, however, require extensions, since we know everything ahead of time. As it happens, the first is also how protobuf-net implements inheritance, so you could do that with type inheritance (4 concrete sub-types of an abstract base message type) and [ProtoInclude(...)].
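As a rough sketch of the inheritance route with protobuf-net attributes: the class names, property names and field numbers below are made up for the illustration (only two of the four sub-types are shown), and the code assumes the protobuf-net package is referenced.

```csharp
using System.IO;
using ProtoBuf;

// The base message carries the common fields; each [ProtoInclude] reserves
// a field number in the base message for one sub-type.
[ProtoContract]
[ProtoInclude(20, typeof(LoginAnnouncement))]
[ProtoInclude(21, typeof(ChatAnnouncement))]
public abstract class Announcement
{
    [ProtoMember(1)] public int SequenceNumber { get; set; }
}

[ProtoContract]
public class LoginAnnouncement : Announcement
{
    [ProtoMember(1)] public string UserName { get; set; }
}

[ProtoContract]
public class ChatAnnouncement : Announcement
{
    [ProtoMember(1)] public string Text { get; set; }
}

public static class AnnouncementCodec
{
    public static byte[] Encode(Announcement message)
    {
        using (var ms = new MemoryStream())
        {
            // Serialize as the base type so the sub-type is recorded in the payload.
            Serializer.Serialize<Announcement>(ms, message);
            return ms.ToArray();
        }
    }

    public static Announcement Decode(byte[] data)
    {
        using (var ms = new MemoryStream(data))
        {
            // Comes back as the concrete sub-type that was written.
            return Serializer.Deserialize<Announcement>(ms);
        }
    }
}
```

The receiver can then test the result with "is" or "as" to pick the right handler, which gives the "parse first, check the type afterwards" behaviour asked for in the question.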
Re extension data; protobuf-net does support that, however as mentioned in the blog this is not included in the current v2 beta. It will be there soon, but I had to put a line somewhere. It is included in the v1 (r282) download though
Note that protobuf-net is just one of several C#/.NET implementations. The wire format is the same, but you might also want to consider the directly ported version. If I had to summarise the difference I would say "protobuf-net is a .NET serializer that happens to be protobuf; protobuf-csharp-port is a protobuf serializer that happens to be .NET". They both achieve the same end, but protobuf-net focuses on being idiomatic to C#/.NET, whereas the port focuses more on having the same API. Either should work here, of course.

Storing multiple types in same object

First I will give a little background info so this question isn't completely without context:
I am writing a program which needs to read in binary files of which I know the format/layout. The bulk of these files contain "structs", with the layout of each stored in the header of the file. Each struct contains fields which could be either structs or "base" types (which are not structs, and can be either value or reference types like float or String or Vector3).
I do not need to access most of the data in these files, so I would not have to define all possible structs for my current project, but there are enough that I would have to define that doing it manually would be tedious and time consuming.
My problem is that there are an extremely large number of these structs (2500+ different ones, though only the ones appearing in a file are defined in that file's header). I would like to be able to read them in, without having to manually define each one, in a way that makes all the data available in the same way. My current thinking is that I should create a class like this:
class Struct {
    StructDefinition _def;
    List<Field> _fields;
    ...
}
In the field class I would need to be able to store both structs AND base types (or more importantly reference and value types). Is there a way to do this without turning everything into an object then casting it to the correct type when I need it? Also, is this the best way to go about reading in these files, or is there a better method?
dynamic (.NET 4.0) can do this at runtime, but you lose compile-time type safety and IntelliSense.
For your particular situation, I would recommend the new memory-mapped file classes in .NET 4.0 (the System.IO.MemoryMappedFiles namespace).
If you need a pre-.NET 4.0 solution, consider looking at the file as a series of offsets instead of structures, and use a FileStream, seeking to and reading in only the information you need. This is kind of like a poor-man's file mapping. Note that in this situation, it's more performant if you only move forward through the file.
Using either of these last two solutions, you don't have to waste time and memory reading in data you won't need anyway.
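A minimal sketch of the "series of offsets" idea, assuming the values are stored little-endian (which is what BinaryReader reads) and that you already know the offsets from the file header. OffsetReader and its methods are hypothetical helper names.

```csharp
using System.IO;

public static class OffsetReader
{
    // Reads a 32-bit integer stored at the given byte offset.
    public static int ReadInt32At(Stream stream, long offset)
    {
        stream.Seek(offset, SeekOrigin.Begin);
        // The reader is deliberately not disposed, so the stream stays open.
        return new BinaryReader(stream).ReadInt32();
    }

    // Reads a 32-bit float stored at the given byte offset.
    public static float ReadSingleAt(Stream stream, long offset)
    {
        stream.Seek(offset, SeekOrigin.Begin);
        return new BinaryReader(stream).ReadSingle();
    }
}
```

With a real file you would open it once, for example with File.OpenRead(path), and call these helpers with increasing offsets so the stream only ever moves forward.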
If you define an interface that contains all of the methods that you need to manipulate the fields, then you can create a small set of objects - one for reference and one for value types.
As long as each of these objects implements the same interface, you will be able to work with them generically.
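Something along these lines, as a sketch only. IField, ValueField<T> and ReferenceField<T> are hypothetical names, and the interface here exposes just a name and a type; you would add whatever operations you actually need. Struct stands in for the class from the question, with its other members left out.

```csharp
using System;
using System.Collections.Generic;

public interface IField
{
    string Name { get; }
    Type ValueType { get; }
}

// Holds value types (float, int, Vector3, ...) without boxing them.
public sealed class ValueField<T> : IField where T : struct
{
    public ValueField(string name, T value) { Name = name; Value = value; }
    public string Name { get; private set; }
    public T Value { get; private set; }
    public Type ValueType { get { return typeof(T); } }
}

// Holds reference types (string, nested Struct, ...).
public sealed class ReferenceField<T> : IField where T : class
{
    public ReferenceField(string name, T value) { Name = name; Value = value; }
    public string Name { get; private set; }
    public T Value { get; private set; }
    public Type ValueType { get { return typeof(T); } }
}

public sealed class Struct
{
    private readonly List<IField> _fields = new List<IField>();

    public void Add(IField field) { _fields.Add(field); }

    // Returns null when no field has that name.
    public IField Find(string name)
    {
        return _fields.Find(f => f.Name == name);
    }
}
```

You still need one cast from IField to the concrete wrapper when you want the typed value, but the value itself is never stored as object, and ValueType lets you check what you have before casting.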

Why is Serializable Attribute required for an object to be serialized

Based on my understanding, SerializableAttribute provides no compile time checks, as it's all done at runtime. If that's the case, then why is it required for classes to be marked as serializable?
Couldn't the serializer just try to serialize an object and then fail? Isn't that what it does right now? When something is marked, it tries and fails. Wouldn't it be better if you had to mark things as unserializable rather than serializable? That way you wouldn't have the problem of libraries not marking things as serializable?
As I understand it, the idea behind the SerializableAttribute is to create an opt-in system for binary serialization.
Keep in mind that, unlike XML serialization, which uses public properties, binary serialization grabs all the private fields by default.
Not only could this include operating system structures and private data that is not supposed to be exposed, but deserializing it could result in corrupt state that can crash an application (silly example: a handle for a file that is open on a different computer).
This is only a requirement for BinaryFormatter (and the SOAP equivalent, but nobody uses that). Diego is right; there are good reasons for this in terms of what it does, but it is far from the only option - indeed, personally I only recommend BinaryFormatter for talking between AppDomains - it is not (IMO) a good way to persist data (to disk, in cache, to a database BLOB, etc).
If this behaviour causes you trouble, consider using any of the alternatives:
XmlSerializer, which works on public members (not just the fields), but demands a public parameterless constructor and public type
DataContractSerializer, which can work fully opt-in (using [DataContract]/[DataMember]), but which can also (in 3.5 and above) work against the fields instead
Also - for a 3rd-party option (me being the 3rd party); protobuf-net may have options here; "v2" (not fully released yet, but available as source) allows the model (which members to serialize, etc) to be described independently of the type, so that it can be applied to types that you don't control. And unlike BinaryFormatter the output is version-tolerant, known public format, etc.
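To illustrate the opt-in style of DataContractSerializer, here is a minimal sketch. Account, its properties and the AccountXml helper are hypothetical names; only the member marked [DataMember] is written, so anything you don't mark stays out of the output.

```csharp
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Account
{
    [DataMember]
    public string Owner { get; set; }

    // Not marked with [DataMember], so it is never serialized.
    public string SessionToken { get; set; }
}

public static class AccountXml
{
    public static byte[] Save(Account account)
    {
        var serializer = new DataContractSerializer(typeof(Account));
        using (var ms = new MemoryStream())
        {
            serializer.WriteObject(ms, account);
            return ms.ToArray();
        }
    }

    public static Account Load(byte[] data)
    {
        var serializer = new DataContractSerializer(typeof(Account));
        using (var ms = new MemoryStream(data))
        {
            return (Account)serializer.ReadObject(ms);
        }
    }
}
```

Compared with [Serializable], the opt-in is per member rather than per type, which avoids the "grabs all private fields" behaviour described above.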
