Xsd to object class - c#

So, I'm trying to take an .xsd file (the fixed MusicXML standard), generate an object class from it, use portions of it - specifically the note object - include that in a graph object, and then save both the graph object and a MusicXML-validated file.
All in all, the solutions I'm using have one or two massively breaking shortcomings.
Xsd2Code - Creates the file, but for some reason it makes an Items collection (of the type I need, ObservableCollection) and then an enumerable ItemsChoiceType[0-9] ObservableCollection. The problem with the enumerable is that after it generates, I either have to switch the latter to an Array or do mumbo-jumbo with the XmlSerialisation attrs. It also generates a 2 MB .cs file, so a lot of autogenerated code, plus a pile of .extend.cs files would be needed to get it to fit. Maybe I have to change some switches for it to work? What switches fix this?
LinqToXsd / OpenLinqToXsd - Generates the file, but hard-codes a reference to a DLL and forces you to use List (no option to switch to ObservableCollection), which doesn't have EditItem and can't be used for binding to WPF/XAML. Otherwise, a bunch more .extend.cs files.
Altova C# generator - Expensive, requires a bunch of their DLLs to be included in the project, messy.
Long story short: has anyone used any of these systems successfully, and what did you have to do to shoehorn them into place? What kind of pain will I have to deal with beyond the issues I'm already having?
I remember now for XSD.exe: XSD notation doesn't export, and individual classes (such as 'note') don't serialise out to XML. I would have to write out the entire document from scorepartwise down to every piece in between, which means I can't serialise a graph object that has 'note's as vertices.
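To make the goal concrete, this is roughly what I want to end up being able to write - note stands for whatever class a generator emits for the note element, myNote is a placeholder instance, and I'm not claiming any of the tools above make this work out of the box:

using System.Xml;
using System.Xml.Serialization;

// Serialize a single generated class on its own, supplying a root override
// in case the generated type has no XmlRoot of its own (names are placeholders).
var noteSerializer = new XmlSerializer(typeof(note), new XmlRootAttribute("note"));
using (var writer = XmlWriter.Create("note.xml"))
{
    noteSerializer.Serialize(writer, myNote);
}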

Related

How to handle XML deserialization with a changing schema

We have a service that's receiving data in XML, and there's an accompanying namespace/schema definition that usually changes once a year, sometimes more.
The schema describes a very large object and we only use a small portion of it that has not changed in at least 2 years since I've been handling it. However, the schema change forces us to re-generate the C# classes, re-build and re-deploy the application.
It would be good to not have to touch the application unless there's a change in the parts that we use.
For a separate throwaway application that was set up with a certain namespace, I had the code replace the incompatible namespace with the compatible one and deserialize the data that way.
Is there a solution for this problem that's more elegant?
Edit: the data we receive is only a subset of the whole schema, which is why deserializing it with the namespace replacement isn't a problem.
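For what it's worth, the namespace replacement I mentioned is essentially just this (the namespace URIs and the OrderSubset type are placeholders, not the real schema):

using System.IO;
using System.Xml.Serialization;

// Swap the new namespace for the one our generated classes were built
// against, then deserialize as usual (URIs and type are placeholders).
string xml = File.ReadAllText("incoming.xml");
xml = xml.Replace("http://example.com/schema/v2", "http://example.com/schema/v1");

var serializer = new XmlSerializer(typeof(OrderSubset));
using (var reader = new StringReader(xml))
{
    var data = (OrderSubset)serializer.Deserialize(reader);
}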

XmlSerializer - the first deserialization is very slow

I have a solution with two projects: an ASP.NET MVC application and a class library. Let's call them project MVC and project CLS.
In the project CLS, there are two different versions (V1 and V2) of an XSD file that I have used to create two serializable classes with the same name, but under different namespaces (V1 and V2) using xsd2code.
In the MVC project, when the user uploads an XML file, CLS.dll is used to deserialize the XML into an object. When the XML file is of type V1, the deserialization is very fast, but the XSD file for the V2 version is a lot more complex, and the deserialization can take up to a couple of minutes - only the first time (it's very fast afterwards, until the application is restarted).
I used the Sgen.exe tool to create a serializer assembly (CLS.XmlSerializers.dll) for the CLS.V2 type in order to eliminate the first-time creation of the assembly on the fly and thereby improve performance.
I have successfully managed to add the Sgen task to the post-build events, and the assembly CLS.XmlSerializers.dll is created every time I build the project. I have also used the unit test code in this post to make sure the assembly is loaded, and it is. The test passes successfully.
However, the first time the XML file is deserialized, it still takes a long time. So something must still be wrong, but I don't know what. Please help.
UPDATE:
I used Fuslogvw.exe as suggested in the comments, and I can see that CLS.XmlSerializers.dll is being loaded successfully. Then how come the first time the XML file is deserialized it takes around one minute, but every time after that it takes less than a second?
UPDATE 2:
One of the differences between the two XSD files is that the second one (V2) has a reference to a very big XSD file that contains definitions of some xs:enumeration types used in the main file, and that's the reason the deserialization took so long. Since all I need to do is deserialize the XML files into objects, and I don't need to validate the values of the attributes and elements against those enumerations, I ended up removing the reference to that XSD file and replacing all the enumeration types with their base types (in this case, xs:string). Now V2 is deserialized as fast as V1, and I don't even need to use Sgen.exe. I guess Sgen.exe only helps in situations where you need to deserialize a very large XML file; in my case, the XML files are always very small, but the deserialization is (was) complex.
In order to increase the performance of XML serialization, an assembly is dynamically generated the first time XmlSerializer is instantiated for a specific type. This happens only once in the application lifetime, but it makes that first usage slow.
When you instantiate an XmlSerializer you have to pass the Type of the objects that you will attempt to serialize and deserialize with that serializer instance. The serializer examines all public fields and properties of the Type to learn about which types an instance references at runtime. It then proceeds to create C# code for a set of classes to handle serialization and deserialization using the classes in the System.CodeDOM namespace. During this process, the XmlSerializer checks the reflected type for XML serialization attributes to customize the created classes to the XML format definition. These classes are then compiled into a temporary assembly and called by the Serialize() and Deserialize() methods to perform the XML to object conversions.
Full Content: Troubleshooting Common Problems with the XmlSerializer
More Info: XmlSerializer Constructor Performance Issues
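One practical consequence: construct the serializer once and reuse it, and optionally force that construction at application start so whatever first-use cost remains isn't paid on a user request. A rough sketch, where CLS.V2.Document is an illustrative type name:

using System;
using System.Xml.Serialization;

static class SerializerCache
{
    // Construct once and reuse; the temp-assembly work is tied to the
    // serializer's first construction/use, not to every request.
    // (CLS.V2.Document is an illustrative type name.)
    public static readonly XmlSerializer V2Serializer =
        new XmlSerializer(typeof(CLS.V2.Document));

    // Call this from Application_Start so the remaining first-use cost is
    // paid at startup instead of on the first user upload.
    public static void WarmUp()
    {
        GC.KeepAlive(V2Serializer);   // forces the static initializer to run
    }
}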
It is a known issue with the x64 JIT compiler; it can be very slow in some cases. That's why you get much better performance when running the deserialization the second time, when the code has already been compiled.
Try to use .NET 4.6 or higher, which features a new version of the x64 JIT compiler (RyuJIT). If it is not possible to update the .NET version, then take a look at this thread.

How do I check if a class file has been changed before serializing it?

We have a custom serialization process for a large number of C# types. However, regenerating all serialization information for all classes/types is time consuming, so we were planning to optimize the serialization process by computing the hash of the file: if it has changed, we generate the serialized output; otherwise we skip it. EDIT: We can store the hashes in a Dictionary which could be written out to a file and re-read when processing. That's the current idea.
Our current serialization processor works as follows - we add the types to be serialized to a repo:
SerializerRepo.Add(typeof(MyType)); //Add type to be serialized to a repo
And then (possibly elsewhere in the code) have the serializer process the repo and output the custom XMLs etc.:
Serializer.WriteXML(SerializerRepo.GetTypes());
WriteXML goes through each type and spews out an XML file for each type at a particular location. I need to optimize the WriteXML method to only serialize the class/type if it has changed, else let it be.
This may not be the best way to do it and is open for refactoring suggestions. However, the current problem is how to ascertain if the class definition (or file) housing the class/type has changed in order to determine if the XML should be generated or not?
There is no inherent relation between a type and the file containing its class - a class can be partial, so .NET has no mapping from types to class files or vice versa. (We don't have any partial classes, as it happens.) But in our case we seem to need both (albeit unrelated) pieces of information: the file housing the type/class, and the type itself.
Two (possibly sub-optimal) ideas so far:
Either we have the user specify the file name along with the type. But that wouldn't survive any refactoring where the file name is changed.
Another solution is to manually read each .cs file and parse for public class <classname>, mapping it to each type. That seems like a huge overhead, and I'm not sure it's a reliable way to do it.
These are the only two ideas that I have but nothing concrete. Suggestions?
Separate the generation of XML in-memory from persisting it to disk.
Keep a dictionary from fully-qualified class names to hashes. On your first run, the dictionary will start out empty.
When it is time to ensure that a class's corresponding XML is up to date on disk, generate its XML in-memory, hash that, and check the hash against the dictionary. If the class's name is not in the dictionary or if its hash disagrees with the hash in the dictionary, persist the generated XML and update the dictionary with the new hash.
After you've gone through this process with all your types, you'll have a full dictionary of hashes. Persist that to disk and load it the next time you run this program.
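A rough sketch of that loop, assuming GenerateXml(type) stands in for your existing in-memory XML generation, and hashes.txt / the output folder are arbitrary locations:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Load the previously persisted hash dictionary, if any.
var hashes = new Dictionary<string, string>();
if (File.Exists("hashes.txt"))
    foreach (var line in File.ReadAllLines("hashes.txt"))
    {
        var parts = line.Split('\t');
        hashes[parts[0]] = parts[1];
    }

using (var sha = SHA256.Create())
{
    foreach (var type in SerializerRepo.GetTypes())
    {
        string xml = GenerateXml(type);   // your in-memory generation
        string hash = Convert.ToBase64String(
            sha.ComputeHash(Encoding.UTF8.GetBytes(xml)));

        string old;
        if (!hashes.TryGetValue(type.FullName, out old) || old != hash)
        {
            // Only types whose generated XML changed are written to disk.
            File.WriteAllText(Path.Combine("output", type.FullName + ".xml"), xml);
            hashes[type.FullName] = hash;
        }
    }
}

// Persist the dictionary for the next run.
File.WriteAllLines("hashes.txt",
    hashes.Select(kv => kv.Key + "\t" + kv.Value).ToArray());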

Save and restore type (not object) created at runtime?

Inside our application (C# - .NET 4.5) we have a report generation tool. This tool receives only an SQL command, verifies it, and from that creates a whole new report with available fields matching the names and data types specified by the SQL command, similar to what an ORM tool would do.
Because of the nature of this tool we're using Reflection and Emit to create a whole new class. From fields provided by a dataReader (System.Data.SqlClient.SqlDataReader) we can create the type and populate it with the corresponding data.
The result of this is an IQueryable object that I can use in my reports.
This whole process is done and tested, but to keep the report, the generated class, and the SQL command together, we need to save this new type in the database. Because of our database layout and system definitions, this requires me to provide an XML-like file or string to a method that will compress it and convert it to a Base64 string before saving.
It would be a simple task if I were to save the report into a DLL file, just like shown HERE.
But since this new type must be transformed into an XML-like format, I'm a little bit lost here.
I have done the opposite in the past: fully create a type from a pure XML file, manually. I also know I could do something similar here, but it would require me to loop over every detail/property/method/member/etc. of the class to create the XML file.
Is there any way (like a helper from .NET framework) that could help me here?
Instead of doing it 100% manually, I'd like to delegate the XML generation/parsing to a tool, probably with better performance too...
Edit:
People, READ THE TITLE BEFORE POSTING COMMENTS! I'm trying to save an XML for the type. TYPE. Not the object. The type.
@Mark Gravell
Thanks for the tip. I'll check that.
But about the schema: any way to save/load it automatically?
For saving the type, I would say either simply store the schema, and re-create a compatible type at runtime, or just use AssemblyBuilder etc and configure the dynamic-assembly as saveable, and write it as a .dll to disk (or elsewhere). Then just load the .dll at runtime and find the type. Either approach can work. If you already have the code to create a Type from a schema, the first may be easier.
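The AssemblyBuilder route looks roughly like this on .NET Framework (all names here are illustrative):

using System;
using System.Reflection;
using System.Reflection.Emit;

// Build the dynamic assembly as saveable, emit the type, then persist it.
var asmName = new AssemblyName("DynamicReports");
var asmBuilder = AppDomain.CurrentDomain.DefineDynamicAssembly(
    asmName, AssemblyBuilderAccess.RunAndSave);
var modBuilder = asmBuilder.DefineDynamicModule("DynamicReports", "DynamicReports.dll");

var typeBuilder = modBuilder.DefineType("ReportRow", TypeAttributes.Public);
// ... define fields/properties from the SqlDataReader schema here ...
var builtType = typeBuilder.CreateType();

asmBuilder.Save("DynamicReports.dll");

// Later: load the .dll and find the type again.
var reloaded = Assembly.LoadFrom("DynamicReports.dll").GetType("ReportRow");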
For saving the data, my first instinct would be XmlSerializer, however that works via Assembly generation, so it might not like working against a fully-dynamic Type, from TypeBuilder. If XmlSerializer isn't happy, you could try protobuf-net; that also works in-memory (by default), so should be pretty happy.
However! I should note that you might also want to consider simply using a DataTable (rough sketch after the list below). While I don't have tons of love for DataTable, it is designed for exactly this scenario:
- it can model fields that are defined only at runtime
- it has inbuilt serialization of both schema and data
- it implements the ComponentModel APIs for declaring runtime models, which means most tools work with it for free
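A rough sketch of the DataTable route - the columns are made up, but WriteXml/ReadXml with XmlWriteMode.WriteSchema round-trips both schema and data, and the resulting string can then be compressed and Base64-encoded by your existing storage code:

using System.Data;
using System.IO;

// Build the runtime "type" as a DataTable (columns are illustrative).
var table = new DataTable("Report");
table.Columns.Add("CustomerName", typeof(string));
table.Columns.Add("Total", typeof(decimal));

// Persist schema + data as XML.
string xml;
using (var writer = new StringWriter())
{
    table.WriteXml(writer, XmlWriteMode.WriteSchema);
    xml = writer.ToString();
}

// Restore it later.
var restored = new DataTable();
using (var reader = new StringReader(xml))
{
    restored.ReadXml(reader);
}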

Storing multiple types in same object

First I will give a little background info so this question isn't completely without context:
I am writing a program which needs to read in binary files of which I know the format/layout. The bulk of these files contain "structs", with the layout of each stored in the header of the file. Each struct contains fields which could be either structs or "base" types (which are not structs, and can be either value or reference types like float or String or Vector3).
I do not need to access most of the data in these files, so I would not have to define all possible structs for my current project, but there are enough that I would need to define that doing it manually would be tedious and time-consuming.
My problem is that there are an extremely large number of these structs (2500+ different ones, though only the ones appearing in a file are defined in that file's header). I would like to be able to read them in, without having to manually define each one, in a way that makes all the data available in the same way. My current thinking is that I should create a class like this:
class Struct {
    StructDefinition _def;
    List<Field> _fields;
    // ...
}
In the Field class I would need to be able to store both structs AND base types (or, more importantly, reference and value types). Is there a way to do this without turning everything into an object and then casting to the correct type when I need it? Also, is this the best way to go about reading in these files, or is there a better method?
dynamic (.NET 4.0) can do this at runtime, but you lose compile-time type safety and IntelliSense.
For your particular situation, I would recommend the new memory-mapped file classes in .NET 4.0 (System.IO.MemoryMappedFiles).
If you need a pre-.NET 4.0 solution, consider looking at the file as a series of offsets instead of structures, and use a FileStream, seeking to and reading in only the information you need. This is kind of like a poor-man's file mapping. Note that in this situation, it's more performant if you only move forward through the file.
Using either of these last two solutions, you don't have to waste time and memory reading in data you won't need anyway.
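The seek-and-read idea is roughly this (path and structOffset are placeholders; in practice the offsets come from the struct layouts in the file header):

using System.IO;

// Jump straight to the field you care about instead of materializing
// every struct (path and structOffset are placeholders).
using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
using (var reader = new BinaryReader(stream))
{
    stream.Seek(structOffset, SeekOrigin.Begin);   // offset taken from the header
    float x = reader.ReadSingle();                 // e.g. a Vector3 field
    float y = reader.ReadSingle();
    float z = reader.ReadSingle();
}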
If you define an interface that contains all of the methods you need to manipulate the fields, then you can create a small set of objects - one for reference types and one for value types.
As long as each of these objects implement the same interface, you will be able to work with them generically.
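A minimal sketch of that shape (all names are made up):

// One interface, plus one implementation each for value-typed and
// reference-typed fields; consumers only ever see IField.
interface IField
{
    string Name { get; }
    object GetValue();                                 // non-generic access when needed
}

class ValueField<T> : IField where T : struct          // float, int, custom structs...
{
    public string Name { get; set; }
    public T Value { get; set; }
    public object GetValue() { return Value; }
}

class ReferenceField<T> : IField where T : class       // String, nested Struct instances...
{
    public string Name { get; set; }
    public T Value { get; set; }
    public object GetValue() { return Value; }
}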
