I would like to know the most common scenarios where xml serialization may fail in .NET.
I'm thinking mainly of XmlSerializer here:
it is limited to tree-like data; it can't handle full object graphs
it is limited to public members, on public classes
it can't really do much with object members
it has some weaknesses around generics
like many serializers, it won't touch instance properties on a collection (bad practice in the first place)
xml simply isn't always a good choice for large data (not least, for performance)
requires a public parameterless constructor
DataContractSerializer solves some of these, but has its own limitations:
it can't handle values in attributes
requires .NET 3.0 (so not much use in 2.0)
Cannot easily serialize generic collections.
See another question: C# XML Serialization Gotchas
Depending on the serializer, cyclic references may not work
Using the shadows keyword has also broken serialization and deserialization for me because the shadowing causes a new implementation of that property to exist making it incompatible for proper reconstruction. Only use overloads if you want to retype to the specific for a subclass.
TimeSpan objects are not serializable. IDictionary-implementing types are not serializable either (although they can be serialized with some manual massaging).
AFAIK, classes marked as [Obsolete] are not serialized by XmlSerializer since .NET 2.0
Related
The .NET Framework ships with System.Runtime.Serialization.Json.DataContractJsonSerializer and System.Web.Script.Serialization.JavaScriptSerializer, both of which de/serialize JSON. How do I know when to choose one of these types over the other? MSDN doesn't make it clear what their relative advantages are.
We have several projects that consume or emit JSON, and the class selected for each thus far has depended on the opinion of the primary dev on each project. Some are simple, two have complex logic regarding producing managed types from JSON (the types do not map closely to the streams) but don't have any emphasis on speed, one requires speed. None interact with WCF, at least as of now.
While I'm interested in alternative libraries, I am hoping that somebody might have an answer to my question too.
The DataContractJsonSerializer is intended for use with WCF client applications where the serialized types are typically POCO classes with the DataContract attribute applied to them. No DataContract, no serialization. The mapping mechanism of WCF makes the sending and receiving very simple, but only if your platform is homogeneous. If you start mixing in different toolsets, your program might go sideways.
The JavaScriptSerializer can serialize any type, including anonymous types (one way), and does so in a more conformant way. You lose the "automagic" of WCF, but you gain more integration options.
As you can see by the comments, there are a lot of options out there for AJAX serialization, and to address your speed vs. maintainability questions, it might be worth investigating them to find a solution that meets the needs of all the teams, to reduce maintainability issues in the long term as everybody does things their own way.
2014-04-07 UPDATE:
I suggest using JSON.NET if you can. See http://james.newtonking.com/json Feature Comparison for a review of the 3 libraries considered in this question.
2015-05-26 UPDATE:
If your company requires the use of commercially licensable products, or you need every last bit of performance, you may also want to check out https://servicestack.net/.
Both do approximately the same but using very different infrastructure thus applying different restrictions on the classes you want to serialize/deserialize and providing different degree of flexibility in tuning the serialization/deserialization process.
For DataContractJsonSerializer you must mark all classes you want to serialize using DataContract atrtibute and all members using DataMember attribute. As well as if some of you classes have enum members, then the enums also must be marked as DataContract and each enum member - with EnumMember attribute.
Also DataContractJsonSerializer allows you fine control over the whole process of serialization/deserialization by altering types resolution logic and replacing the types you serialize with surrogates.
For JavaScriptSerializer you must provide parameterless constructor if you plan on deserializing objects from json string.
For me, I usually use JavaScriptSerializer in presentation logic, where there's a simple model I want to render in Json together with page, without additional ajax requests. And I even usually don't have to deserialize them back to c# - so there's no overhead at all. But if it's persistence logic, where I want to save objects into a data store (usually no-sql storage), to load them later, I prefer using DataContractJsonSerializer because the overhead of putting attributes is worth of flexibility in the serialization/deserialization process tuning, especially when it comes to loading of serialized data into the objects of the newer version, with updated definitions
Personally, I think that DataContractJsonSerializer reeks of over-engineering. I'd skip it and go with JavaScriptSerializer. In the event where JavaScriptSerializer isn't available, you can use FridayThe13th (a library I wrote ;p).
On my own object I can add the metatag [Serializable] to make it serializable. Now I use a 3rd party library that I need to be serializable. I inspected the code and it should not be a problem. Is there a way to fix this without altering the 3rd party code?
My advice would be: serialize data, not implementation. The fact of the existence of a 3rd-party object is nothing to do with the data; that is an implementation detail. As such, I always offer the same advice: if serialization ever gets complex, the first thing to do is to introduce a separate DTO model that represents the data in isolation of the implementation, and just map the current state to that DTO. This allows you to handle implementation changes without impact on the storage, and allows otherwise non-serializable objects to be serialized.
Some serializers offer workarounds - for example with protobuf-net you can a: supply the serialization information for any type at runtime, and b: supply a "surrogate" to use automatically when it gets tricky, but - using a DTO model is simpler and easier to maintain.
Your use of [Serializable] suggests BinaryFormatter; in my opinion, this is almost never a good choice for any kind of storage, since BinaryFormatter relies on implementation details. It works nicely for passing data between two in-sync app-domains, though
If the types are public you should be able to use the XmlSerializer to do what you want.
There's more information on this here
Serializes and deserializes objects into and from XML documents. The
XmlSerializer enables you to control how objects are encoded into XML.
Exactly take your subclass and make it serializable.
[Serializable] public class Foo: Bar {}
Write an adapter or be prepared to do something more extreme like disassembling the assembly, injecting the serializable attribute and reassembling.
Based on my understanding, SerializableAttribute provides no compile time checks, as it's all done at runtime. If that's the case, then why is it required for classes to be marked as serializable?
Couldn't the serializer just try to serialize an object and then fail? Isn't that what it does right now? When something is marked, it tries and fails. Wouldn't it be better if you had to mark things as unserializable rather than serializable? That way you wouldn't have the problem of libraries not marking things as serializable?
As I understand it, the idea behind the SerializableAttribute is to create an opt-in system for binary serialization.
Keep in mind that, unlike XML serialization, which uses public properties, binary serialization grabs all the private fields by default.
Not only this could include operating system structures and private data that is not supposed to be exposed, but deserializing it could result in corrupt state that can crash an application (silly example: a handle for a file open in a different computer).
This is only a requirement for BinaryFormatter (and the SOAP equivalent, but nobody uses that). Diego is right; there are good reasons for this in terms of what it does, but it is far from the only option - indeed, personally I only recommend BinaryFormatter for talking between AppDomains - it is not (IMO) a good way to persist data (to disk, in cache, to a database BLOB, etc).
If this behaviour causes you trouble, consider using any of the alternatives:
XmlSerializer, which works on public members (not just the fields), but demands a public parameterless constructor and public type
DataContractSerializer, which can work fully opt-in (using [DataContract]/[DataMember]), but which can also (in 3.5 and above) work against the fields instead
Also - for a 3rd-party option (me being the 3rd party); protobuf-net may have options here; "v2" (not fully released yet, but available as source) allows the model (which members to serialize, etc) to be described independently of the type, so that it can be applied to types that you don't control. And unlike BinaryFormatter the output is version-tolerant, known public format, etc.
In C#, if I want to serialize an instance with XmlSerializer, the object's type doesn't have to be marked with [Serializable] attribute. However, for other serialization approaches, such as DataContractSerializer, needs the class be marked as [Serializable] or [DataContract].
Is there any standard or pattern about serialization requirement?
This is because XmlSerializer only serializes public fields/properties. Other forms of serialization can serialize private data, which constitutes a potential security risk, so you have to "opt in" using an attribute.
Security isn't the only issue; simply, serialization only makes sense for certain classes. For example, it makes little snse to serialize a "connection". A connection string, sure, but the connection itself? nah. Likewise, anything that requires an unmanaged pointer/handle is not going to serialize very well. Nor are delegates.
Additionally, XmlSerializer and DataContractSerializer (by default) are tree serializers, not graph serializers - so any recursive links (like Parent) will cause it to break.
Marking the class with the serializer's preferred token is simply a way of saying "and it should make sense".
IIRC, both [XmlSerializer and [DataContractSerializer] used to be very rigid about demanding things like [Serializable], [DataContract] or [IXmlSerializable], but they have become a bit more liberal lately.
Right now there are really 3 forms of serialization in the .Net Framework.
XmlSerialization - By default works on public fields and properties. Can still be controlled via XmlElementAttribute, XmlAttributeAttribute, etc ...
BinarySerialization - Controlled by the SerializationAttribute. Deeply integrated into the CLR
WCF Seralization - DataContractAttribute, etc ...
There unfortunately is standard overall pattern for serialization. All 3 frameworks have different requirements and quirks.
I've got a program that picks up some code from script files and compiles it.
And It works fine.
The problem is: in the scripts I declare a couple of classes and I want to serialize them.
Obviously the C# serializer (xml and binary) doesn't like to serialize and the de-serialize object defined in a in-memory assembly.
I prefer to don't leave the in-memory assembly so i'm looking for another way of serializing, but in case, is possible to build assembly in memory and eventually write it on file ?
You could always write your own ToXml function using reflection to write out your property data to a string. Then your object would deserialize itself.
Just a thought.
If you want to create assemblies dynamically look into IL emitting via reflection. Here is a good article to get you started.
So just to clarify, are you asking how you can serialize a type if it hasn't got the [Serializable] attribute applied?
One solution is to use the WCF Data Contract Serializer: http://msdn.microsoft.com/en-us/library/ms731923.aspx.
Obviously this will only work if you can target .Net 3.0 or higher.
Alternately you can implement an ISerializationSurrogate. Jeffrey Richter has a great introduction at http://msdn.microsoft.com/en-us/magazine/cc188950.aspx.
I would avoid all built-in serialization whenever possible, both are badly broken. For example, XML serialization doesn't support dictionaries and normal serialization/SOAP doesn't support generics. And both have versioning issues.
It is time consuming, but createing ToXML and FromXML methods is probably to most effective way to go.
Hava a look at here for custom serialisers, which is a sample for dictionary XML serializing
I'm slightly confused by the statement that the XmlSerializer can't serialize dynamically generated types. The XmlSerializer generates it's own serialization code dynamically as well during construction so there should be no issue with it serializing your type.
You may need to decorate your dynamic classes with the appropriate attributes, depending on what you are generating (like derived classes), but there shouldn't be any issue with using the XmlSerializer in the situation you described.
If you could post details about the issues the XmlSerializer is giving you I can help you work out what the problem is.
Also, I'm of the belief that auto-generating code is in general a blessing. All to often have I had to go back into a class to fix one or all of the copy/paste/save/load functions, just because someone forgot to update them when adding a new variable. Save/Load code is boiler plate code. Let the computers write it.