Improve performance of XmlSerializer - C#

I use an XmlSerializer to serialize/deserialize some objects. The problem is performance. Profiling shows that the XmlSerializer makes our application take 2 seconds longer to start. We cache our XmlSerializers and reuse them. We cannot use sgen.exe because we are creating the XmlSerializers with XmlAttributeOverrides.
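For reference, our caching looks roughly like this (a minimal sketch with illustrative names) - the XmlSerializer(Type, XmlAttributeOverrides) constructor compiles a new dynamic assembly on every call, so the cache is essential:
using System;
using System.Collections.Concurrent;
using System.Xml.Serialization;

static class SerializerCache
{
    // One serializer per type, created once and reused.
    private static readonly ConcurrentDictionary<Type, XmlSerializer> Cache =
        new ConcurrentDictionary<Type, XmlSerializer>();

    public static XmlSerializer Get(Type type, XmlAttributeOverrides overrides)
    {
        return Cache.GetOrAdd(type, t => new XmlSerializer(t, overrides));
    }
}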
I tried serialization alternatives like Json.NET and, at first, they worked great. The problem is that we need to be backward compatible, so all the XML already generated needs to be parsed correctly. Also, the object serialization output must be XML.
To summarize:
I receive XML data serialized by an XmlSerializer.
I need to deserialize the XML data and convert it into an object.
I need to serialize objects back into XML (ideally in the same format an XmlSerializer would have produced).

Ultimately, it depends on the complexity of your model. XmlSerializer needs to do a lot of thinking, and the fact that it is taking so long leads me to suspect that your model is pretty complex. For a simple model, it might be possible to manually implement the deserialization using LINQ-to-XML (pretty easy), or maybe even XmlReader (if you are feeling very brave - it isn't easy to get 100% correct).
However, if the model is complex, this is a problem and frankly would be very risky in terms of introducing subtle bugs.
Another option is DataContractSerializer, which handles XML, but not as well as XmlSerializer, and certainly not with as much control over the layout. I strongly suspect that DataContractSerializer would not help you.
There is no direct replacement for XmlSerializer that I am aware of, and if sgen.exe isn't an option I believe you basically have three options:
live with it
rewrite XmlSerializer yourself, somehow doing a better job of it
use something like LINQ-to-XML and accept the effort involved (a sketch follows below)
Long term, I'd say "switch formats", and use XML for legacy import only. I happen to know of some very fast binary protocols that would be pretty easy to substitute in ;p
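To illustrate the LINQ-to-XML option, a minimal sketch with a hypothetical model; the element names must match whatever your XmlSerializer originally produced:
using System.Xml.Linq;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

static Person FromXml(string path)
{
    XElement root = XElement.Load(path);
    return new Person
    {
        Name = (string)root.Element("Name"),
        Age = (int)root.Element("Age")
    };
}

static XElement ToXml(Person p)
{
    // Mimic XmlSerializer's default layout: one element per public property.
    return new XElement("Person",
        new XElement("Name", p.Name),
        new XElement("Age", p.Age));
}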

The problem is that you are requesting types that are not covered by sgen, which results in the generation of new assemblies during startup.
You could try to get your hands on the temp files generated by XmlSerializer for your specific types and use that code for your own pregenerated XmlSerializer assembly.
I have used this approach to find out why csc.exe was being executed, which delayed the startup of my application.
Besides this, it might help to rename some types, as in the article, to arrive at the same type names that sgen creates, so that you can use sgen after all. Type arrays are usually not pre-created by sgen, which is a pity sometimes. But if you name your class ArrayOfHereGoesYourTypeName then you will be able to use the pregenerated assemblies.
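To get hold of those temp files, there is a diagnostics switch that makes XmlSerializer keep its generated source files instead of deleting them - a sketch of the app.config entry (it is a BooleanSwitch, so any non-zero value enables it):
<configuration>
  <system.diagnostics>
    <switches>
      <!-- keep the generated .cs files in %TEMP% for inspection -->
      <add name="XmlSerialization.Compilation" value="1" />
    </switches>
  </system.diagnostics>
</configuration>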

This answer has some good info on why XmlSerializer runs slow with XmlAttributeOverrides.
Do you really need to use the XmlSerializer in your main thread at startup?
Maybe run it on a background thread; if only some parts of the data are mandatory for startup, perhaps you could manually read them into proxy/sparse versions of the real classes while the XmlSerializer initializes.
If it's a GUI app you could just add a splash screen to hide the delay (or a game of Tetris!!)
If all else fails, can't you convert the existing files to JSON by just running the existing deserialization followed by a JSON serialization, or is there a hard requirement to keep them XML?

You have to deserialize your list using the classic .NET serialization,
something like the below:
TextReader tr = new StreamReader("yourlist.xml");
XmlSerializer serializer = new XmlSerializer(typeof(List<YourObject>));
List<YourObject> myCustomList = (List<YourObject>)serializer.Deserialize(tr);
tr.Close();
Then you can use the JSON serialization:
JavaScriptSerializer json = new JavaScriptSerializer();
string output = json.Serialize(myCustomList);

If you have XML stored in a format that you cannot use, then use XSLT to transform it into a format that you can.
If this XML is stored in the XmlSerializer format - let's say in flat files, or a DB - then you can run your transforms over it once and not incur the XmlSerializer overhead at your usual runtime.
Alternatively you could run the XSLT at runtime - but I doubt that this would be faster than the method outlined by Massimiliano.

You can use threading or tasks to get the application to start up faster, instead of waiting on the hard drive or the deserialization.
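A minimal sketch of that idea (the model type and overrides are hypothetical): kick off serializer construction in the background while the rest of startup proceeds.
using System.Threading.Tasks;
using System.Xml.Serialization;

// Start building the expensive serializer as early as possible.
Task<XmlSerializer> warmup = Task.Run(
    () => new XmlSerializer(typeof(MyModel), myOverrides));

// ... do the rest of the startup work ...

// Only blocks if construction hasn't finished yet.
XmlSerializer serializer = warmup.Result;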

Another option I don't see mentioned in any answer here is to compile a serialization assembly. This way all the code generation and code compilation steps happen at compile time in Visual Studio, not at runtime when your app is starting.
The OP mentions that the app takes too long to start. Well, that's exactly what a serialization assembly is for.
In .NET Core the steps are pretty easy:
Add a NuGet reference to Microsoft.XmlSerializer.Generator
...that's it :) (a .csproj sketch is shown below)
More info here: https://learn.microsoft.com/en-us/dotnet/core/additional-tools/xml-serializer-generator
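For reference, the project file ends up with something like this (a sketch; the version numbers are illustrative - check the linked docs for the current ones):
<ItemGroup>
  <DotNetCliToolReference Include="Microsoft.XmlSerializer.Generator" Version="2.0.0" />
  <PackageReference Include="Microsoft.XmlSerializer.Generator" Version="2.0.0" />
</ItemGroup>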
P.S. If you're still using .NET Framework (not .NET Core), see this question Generating an Xml Serialization assembly as part of my build

If I serialize a class in C#, can UnrealScript read it back in?

I am trying to figure out a way to pass large amounts of information between C# and UnrealScript. One of the things I am trying to investigate is whether C# can serialize a class for UnrealScript to later read in.
Serializing in C# is not so bad:
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// The type must be marked [Serializable] for BinaryFormatter to accept it.
[Serializable]
public class InfoToSerialize { public int i; public int j; public string str; }

InfoToSerialize myInfo = new InfoToSerialize();
myInfo.i = 1;
myInfo.j = 2;
myInfo.str = "hello";
IFormatter formatter = new BinaryFormatter();
Stream stream = new FileStream("MyFile", FileMode.Create, FileAccess.Write, FileShare.None);
formatter.Serialize(stream, myInfo);
stream.Close();
This little snippet of code successfully produces a binary file called MyFile, which looks quite serialized. If you open it in a notepad application, most of the data there is composed of unreadable symbols and icons.
My question: Is it then possible to have UnrealScript gain access to this file and deserialize it? My research on this topic so far makes it seem like it is not possible, but perhaps someone here has had some experience in this area. I know that UnrealScript has its own save and load functionality, but I am not sure if that will aid me in this task. Any information is greatly appreciated!
It is possible, but it might take some extra work on your part, depending on what is available in UnrealScript. The .NET framework supports several different serialization schemes. The default method is to serialize to an internal binary format. That works great if you need to read and write from the same .NET program (and as long as your program can handle upgrades -- binary serialization can break between different releases of the .NET framework). But this won't work at all in your case -- you need to be able to read the object from a program not written in the .NET framework.
Fortunately there are also ways to change your serialization methods. You can choose to serialize, for example, to XML. At the far end of the spectrum, if no serializers provided by the .NET framework can be read by UnrealScript, you can write your own serializer that writes to whatever format you require.
This MSDN link has more information about serialization.
Only if you use a language-neutral serialization scheme.
The default .NET serialization providers require .NET to read the object back in. But if you use a language-neutral serialization scheme, such as plain old XML, JSON or your own predefined format, then any other language should be able to parse it if a provider is available.
The format used by BinaryFormatter is fairly specific to .NET and relies on .NET type information, which is not available for easy serialization in UnrealScript.
However, JSON parsing is available in the UDK, I believe, although you will probably have to do most of the actual deserialization manually. There are multiple ways to serialize to JSON from .NET and JSON is also easy enough to serialize manually.
You can use Json.NET (http://json.codeplex.com/) for fast JSON serialization.

Can a DataContractSerializer be set up to ignore errors in a file rather than just failing entirely?

I'm using DataContractSerializer to save a large number of different classes which make up tree structures to XML files. I'm in the initial stages of writing this software, so at this point all the different components are changing around quite a bit. Yet every time I make a change to a class, I end up breaking my program's ability to open previously saved files.
My tree structures will still be functional if components are missing. Is there some way to tell DataContractSerializer to skip over data it has a problem deserializing and continue on, rather than just quitting at the first problem it has?
I know one answer would be to write my own serialization class, but I'd rather not spend the time to do that. I was hoping to still be able to take advantage of DataContractSerializer, but without it being an all-or-nothing situation.
I think what you're looking for is IExtensibleDataObject. This way, any unexpected elements get read into a name-value dictionary maintained internally, and can even be serialized back later. See the following resources for help (and the sketch after them).
Blog post -- WCF Extensibility – Other Serialization Extensions
Forward-Compatible Data Contracts
Data Contract Versioning
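A minimal sketch of the pattern (the type and member names are hypothetical; DataContractSerializer honors IExtensibleDataObject by default):
using System.Runtime.Serialization;

[DataContract]
public class TreeNode : IExtensibleDataObject
{
    [DataMember]
    public string Name { get; set; }

    // Unknown elements encountered during deserialization are parked here
    // and written back out on serialization instead of causing a failure.
    public ExtensionDataObject ExtensionData { get; set; }
}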

Any reason not to use XmlSerializer?

I just learned about the XmlSerializer class in .NET. Until now I had always parsed and written my XML using the standard classes. Before I dive into this, I am wondering if there are any cases where it is not the right option.
EDIT: By standard classes I mean XmlDocument, XmlElement, XmlAttribute...etc.
There are many constraints when you use the XmlSerializer:
You must have a parameterless constructor (as mentioned by idlewire in the comments, it doesn't have to be public)
Only public properties are serialized
Interface types can't be serialized
and a few others...
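For example, the constructor constraint bites as soon as you try to build the serializer (a hypothetical type):
using System.Xml.Serialization;

public class Order
{
    public Order(int id) { Id = id; }   // no parameterless constructor
    public int Id { get; set; }
}

// Throws InvalidOperationException: the type cannot be serialized
// because it does not have a parameterless constructor.
XmlSerializer serializer = new XmlSerializer(typeof(Order));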
These constraints often force you to make certain design decisions that are not the ones you would have made in other situations... and a tool that forces you to make bad design decisions is usually not a good thing ;)
That being said, it can be very handy when you need a quick way to store simple objects in XML format. I also like the fact that you have pretty good control over the generated schema.
Well, it doesn't give you quite as much control over the output, obviously. Personally I find LINQ to XML makes it sufficiently easy to write this by hand that I'm happy to do it that way, at least for reasonably small projects. If you're using .NET 3.5 or 4 but not using LINQ to XML, look into it straight away - it's much much nicer than the old DOM.
Sometimes it's nice to be able to take control over serialization and deserialization... especially when you change the layout of your data. If you're not in that situation and don't anticipate being in it, then the built-in XML serialization would probably be fine.
EDIT: I don't think XML serialization supports constructing genuinely immutable types, whereas this is obviously feasible from hand-built construction. As I'm a fan of immutability, that's definitely something I'd be concerned about. If you implement IXmlSerializable I believe you can make do with public immutability, but you still have to be privately mutable. Of course, I could be wrong - but it's worth checking.
The XmlSerializer can save you a lot of trouble if you are regularly serializing and deserializing the same types, and if you need the serialized representations of those types to be consumable by different platforms (i.e. Java, Javascript, etc.) I do recommend using the XmlSerializer when you can, as it can alleviate a considerable amount of hassle trying to manage conversion from object graph to XML yourself.
There are some scenarios where use of XmlSerializer is not the best approach. Here are a few cases:
When you need fast, forward-only processing of large volumes of XML data
Use an XmlReader instead (a streaming sketch follows after this list)
When you need to perform repeated searches within an XML document using XPath
When the XML document structure is rather arbitrary, and does not regularly conform to a known object model
When the XmlSerializer imposes requirements that do not satisfy your design mandates:
Don't use it when you can't have a default public constructor
When you can't use the XML serializer attributes to define the element and attribute name variants needed to conform to the required XML schema
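As a sketch of the forward-only case (the file and element names are hypothetical):
using System.Xml;

using (XmlReader reader = XmlReader.Create("huge.xml"))
{
    while (reader.Read())
    {
        // Handle one element at a time; only the current node is in memory.
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "Record")
        {
            string id = reader.GetAttribute("id");
            // ... process the record ...
        }
    }
}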
I find the major drawbacks of the XmlSerializer are:
1) For complex object graphs involving collections, sometimes it is hard to get exactly the XML schema you want by using the serialization control attributes.
2) If you change the class definitions between one version of the app and the next, your files will become unreadable.
Yes, I personally use automatic XML serialization - although I use DataContractSerializer, initially brought in because of WCF, instead (the ability to serialize types without any attributes at all is very helpful), and it doesn't embed type information in the output. Of course, you therefore need to know the type of object you are deserializing when loading back in.
The big problem with that is that it's difficult to serialize to XML attributes as well, without implementing IXmlSerializable on the type whose data you want written that way, or exposing some other types that the serializer can handle natively.
I guess the biggest gotcha with this is that you can't serialise interfaces automatically, because the DCS wants to be able to construct instances again when it receives the XML back. Standard collection interfaces, however, are supported natively.
All in all, though, I've found the DCS route to be the fastest and most pain-free way.
As an alternative, you could also investigate using Linq to XML to read and write the XML if you want total control - but you'll still have to process types on a member by member basis with this.
I've been looking at that recently (having avoided it like the plague because I couldn't see the point) after having read about it in the early access of Jon Skeet's new book. Have to say - I'm most impressed with how easy it makes it to work with XML.
I've used XmlSerializer a lot in the past and will probably continue to use it. However, the greatest pitfall is one already mentioned above:
The constraints on the serializer (such as restriction to public members) either 1) impose design constraints on the class that have nothing to do with its primary function, or 2) force an increase in complexity in working around these constraints.
Of course, other methods of Xml serialization also increase the complexity.
So I guess my answer is that there's no right or wrong answer that fits all situations; choosing a serialization method is just one design consideration among many others.
There are some scenarios.
You have to deal with a LOT of XML data - the serializer may overload your memory. I had that once with a simple schema that contained a database dump of 2000 or so tables. There were only a handful of classes, but in the end serialization did not work - I had to use a SAX-style streaming parser.
Besides that, I do not see any under normal circumstances. It is much easier to deal with the XmlSerializer than to use a lower-level parser, especially for more complex data.
When you want to transmit a lot of data and you have very limited resources.

What would be the best way to validate XML?

I've been looking at XML serialization for C# and it looks interesting. I was reading this tutorial:
http://www.switchonthecode.com/tutorials/csharp-tutorial-xml-serialization
and of course you can deserialize it back to a list of objects. So I am wondering: would it be better to deserialize it back to a list of objects and then go through each object and validate it, or to validate it using a schema first and then deserialize it and do stuff with it?
http://support.microsoft.com/kb/307379
Thanks
I guess it would depend a bit on what you want to validate, and for what purpose. If it is intended for interop with other systems, then validating via XSD is a reasonable idea, not least because you can use xsd.exe to write your classes for you from the XSD (you can also generate an XSD from XML or a DLL, but it isn't as accurate). Likewise, you can use an XmlReader (appropriately configured) to check against the XSD.
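For example, generating classes from a schema is a one-liner (the schema file and namespace are illustrative):
xsd.exe MySchema.xsd /classes /namespace:MyApp.Models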
If you just want valid .NET objects, I'd be tempted to leave the serialized form as an implementation detail, and write some C# validation code - perhaps implementing IDataErrorInfo, or using data-annotations.
You can create an XmlValidatingReader and pass that into your serializer. That way you can read the file in one pass and validate it at the same time.
I believe the same technique will work even if you are using hand rolled XML classes (for extremely large XML files) so you might find it worth a look.
Edit:
Sorry, I just reread some of my code; XmlValidatingReader is obsolete - you can do what you need with the XmlReader.
See XmlReaderSettings
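A minimal sketch of that approach (the schema, file, and type names are hypothetical):
using System;
using System.Xml;
using System.Xml.Serialization;

XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add(null, "MySchema.xsd");  // null = use the schema's target namespace
settings.ValidationEventHandler += (s, e) => Console.WriteLine(e.Message);

XmlSerializer serializer = new XmlSerializer(typeof(MyType));
using (XmlReader reader = XmlReader.Create("MyFile.xml", settings))
{
    // Validation happens as the serializer pulls nodes through the reader.
    MyType result = (MyType)serializer.Deserialize(reader);
}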
For speed I would do it in C#; however, for completeness you might want to do it using an XSD. The issue with that is that you have to learn the verbose and cumbersome XSD syntax, which from experience takes a lot of trial and error, is time-consuming, and holds little reward for serialization - particularly with constants, where you have to map them in C# and also in the XSD.
You'll always be writing the XML as C#. Anything not known when read back in is simply ignored. If you aren't editing the XML with a text editor, you can guarantee that it will come back in the right way, in which case an XSD is definitely not needed.
If you validate the XML, you can only prove that it's structurally correct. An attempt to deserialize from the XML will tell you the same thing.
Typically business objects can implement business logic/rules/conditions that go beyond a valid schema. That type of knowledge should stay with the business objects themselves, rather than being duplicated in some sort of external validation routine (otherwise, if you change a business rule, you have to update the validator at the same time).

Converting XML between schemas - XSLT or Objects?

Given:
Two similar and complex schemas; let's call them XmlA and XmlB.
We want to convert from XmlA to XmlB
Not all the information required to produce XmlB is contained within XmlA (a database lookup will be required)
Can I use XSLT for this given that I'll need to reference additional data in the database? If so what are the arguments in favour of using XSLT rather than plain old object mapping and conversion? I'm thinking that the following criteria might influence this decision:
Performance/speed
Memory usage
Code reuse/complexity
The project will be C# based.
Thanks.
With C# you can always provide extension objects to XSLT transforms, so that's a non-issue.
It's hard to qualitatively say without having the schemas and XML to hand, but I imagine a compiled transform will be faster than object mapping since you'll have to do a fair amount of wheel reinventing.
Further, one of the huge benefits of XSLT is its maintainability and portability. You'll be able to adapt the XSLT doc really quickly as the schemas change, on the fly and without having to do any rebuilds or takedowns if you're monitoring the file.
Could go either way based on what you've given us though.
My question is how likely are the set-of-transformations to change?
If they won't change much, I favor doing it all in one body of source code -- here that would be C#. I would use XSD.exe (.NET XSD tool) generated serialization classes in conjunction with data layers for this kind of thing.
On the other hand, if the set-of-transformations are likely to change -- or perhaps need to be 'corrected' post installation -- then I would favor a combination of XSLT and C# extensions to XSLT. The Extension mechanism is straightforward, and if you use the XslCompiledTransform type the performance is quite good.
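A sketch of that extension mechanism (the stylesheet, namespace URI, and lookup method are all hypothetical); in the stylesheet you would declare xmlns:db="urn:db-lookup" and call db:CustomerName(@id):
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper exposing a database lookup to the stylesheet.
public class DbLookup
{
    public string CustomerName(string id)
    {
        // ... real code would query the database here ...
        return "Example Customer";
    }
}

XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load("XmlAtoXmlB.xslt");

XsltArgumentList args = new XsltArgumentList();
args.AddExtensionObject("urn:db-lookup", new DbLookup());

using (XmlWriter writer = XmlWriter.Create("xmlB.xml"))
{
    xslt.Transform("xmlA.xml", args, writer);
}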
If the data isn't in the XML, then XSLT will be a pain. You can pull additional documents in with the document() function, or you can use XSLT extension methods (but that is not well supported between vendors). So unless you are dead set on XSLT, it doesn't sound like a good option on this occasion (although I'm a big fan of XSLT when used correctly).
So: I would probably use regular imperative code - streaming (IEnumerable<T>) if possible. Of course, unless you have a lot of data, such nuances are moot.
