Complex XML Parsing in C# - c#

I need to parse a complex and large (100 MB+) XML file. Fortunately I have XML Schema definitions, but unfortunately I cannot use Xsd2Code to generate the XML deserialization classes automatically, because abstract message types are used at the top level of the XML.
The structure of the XML file is like this:
<Head>
  <Batch>
    <Dog></Dog>
    <Dog></Dog>
  </Batch>
</Head>
The XSD defines Batch to contain abstract animals, not Dog. Xsd2Code can create the Dog class with the right XML attributes, but the Dog class is defined in another XSD file.
I tried pasting all the XSD files together, but that did not fix it.
Is there a good way like Linq to XML or Xpath to loop over the elements in Batch and create Dog instances without needing to parse Dog manually?

It depends on what you mean by "manually". I've found it's useful to have a pattern where each relevant class has a static FromXElement factory method (or a constructor taking an XElement) which extracts the relevant details. With LINQ to XML that's pretty straightforward, e.g.
public static Dog FromXElement(XElement element)
{
    // Or whatever...
    return new Dog((string) element.Element("Name"),
                   (double) element.Element("Weight"));
}
Then you can use:
List<Dog> dogs = batch.Elements("Dog")
                      .Select(x => Dog.FromXElement(x))
                      .ToList();
(You may be able to use Select(Dog.FromXElement) instead - it depends on which version of C# you're using.)
To process all the animals in a batch, you'd probably want something like:
private static readonly Dictionary<string, Func<XElement, Animal>> Factories =
    new Dictionary<string, Func<XElement, Animal>>
    {
        { "Dog", Dog.FromXElement },
        { "Cat", Cat.FromXElement },
        // etc
    };
...
List<Animal> animals = batch.Elements()
                            .Select(x => Factories[x.Name.LocalName](x))
                            .ToList();
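To make the factory pattern above concrete, here is a minimal end-to-end sketch. The Animal, Dog and Cat classes, their Name and Weight members, and the BatchParser helper are hypothetical stand-ins for whatever the real schema defines. The sketch also assumes the elements carry no XML namespace, which may not hold for schema-validated files.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical model classes; the real ones would mirror the XSD.
public abstract class Animal
{
    public string Name { get; set; }
}

public class Dog : Animal
{
    public double Weight { get; set; }

    public static Dog FromXElement(XElement element)
    {
        return new Dog
        {
            Name = (string) element.Element("Name"),
            Weight = (double) element.Element("Weight")
        };
    }
}

public class Cat : Animal
{
    public static Cat FromXElement(XElement element)
    {
        return new Cat { Name = (string) element.Element("Name") };
    }
}

public static class BatchParser
{
    // Maps an element name to the factory that knows how to read it.
    private static readonly Dictionary<string, Func<XElement, Animal>> Factories =
        new Dictionary<string, Func<XElement, Animal>>
        {
            { "Dog", Dog.FromXElement },
            { "Cat", Cat.FromXElement },
        };

    public static List<Animal> Parse(string xml)
    {
        XElement head = XElement.Parse(xml);
        return head.Elements("Batch")
                   .Elements()
                   .Select(x => CreateAnimal(x))
                   .ToList();
    }

    private static Animal CreateAnimal(XElement element)
    {
        Func<XElement, Animal> factory;
        if (!Factories.TryGetValue(element.Name.LocalName, out factory))
        {
            // Fail loudly on element names that have no registered factory.
            throw new NotSupportedException(
                "No factory registered for <" + element.Name.LocalName + ">");
        }
        return factory(element);
    }
}
```

Calling BatchParser.Parse on a document shaped like the one in the question returns one Animal per child of Batch. Note that XElement.Parse loads the whole document into memory, which may or may not be acceptable for a 100 MB+ file.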

Related

Are there annotations in C# to iterate the names of array elements in the xml created by serialization?

I have an array with elements and want to have an iteration in the names of the elements when using Serialization in C#.
Let's say I have the class:
class MyClass {
    public int[] MyArray { get; set; }
}
Currently the XML built by serialization looks like this:
<MyClass>
  <MyArray>
    <int>1</int>
    <int>2</int>
    ...
  </MyArray>
</MyClass>
But I need it to look like this:
<MyClass>
  <MyArray>
    <int1>1</int1>
    <int2>2</int2>
    ...
  </MyArray>
</MyClass>
I want to communicate with another system which doesn't allow two elements with the same XML path, so the default serialization isn't usable.
I know I can use annotations like [XmlArrayItem("int")], but I don't think they make it possible to give each element of the array a different name.
Worst case I have to implement the IXmlSerializable interface, but that would be a lot of work, because basically then I have to write the complete Serialization by myself with reflection.
So if anyone has an idea how to iterate the names of array elements with Annotation or any other hack, I would be very happy!
Solution: I used the solution from the similar question How do you deserialize XML with dynamic element names?. I changed it a bit for my problem, but now it's working! Thanks for the comments and tips!
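The linked solution is not reproduced here. As a rough illustration of one way to get indexed element names, the sketch below skips XmlSerializer attributes entirely and builds the XML with LINQ to XML. This is an assumption about a workable approach, not the asker's actual code; IndexedArrayXml and ToXml are hypothetical names.

```csharp
using System.Linq;
using System.Xml.Linq;

public class MyClass
{
    public int[] MyArray { get; set; }
}

public static class IndexedArrayXml
{
    // Builds <MyClass><MyArray><int1>..</int1><int2>..</int2></MyArray></MyClass>,
    // numbering the element names from 1.
    public static XElement ToXml(MyClass value)
    {
        return new XElement("MyClass",
            new XElement("MyArray",
                value.MyArray.Select((item, index) =>
                    new XElement("int" + (index + 1), item))));
    }
}
```

For example, IndexedArrayXml.ToXml(new MyClass { MyArray = new[] { 1, 2 } }) yields a MyArray element whose children are named int1 and int2.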

XmlSerializer Serialize with a default XmlRoot

Is there any way to add an XML root element or "wrapper" in the XmlSerializer when I serialize an object?
The XML I am looking for would be something like this:
<Groups>
  <Group method="ModifySubGroups" ID="1234" PIN="5678">
    <SubGroup action="Delete" number="95">
      <Name>Test</Name>
    </SubGroup>
  </Group>
</Groups>
I have two classes, Group and SubGroup. Group contains a generic list of SubGroups. It works great, but I don't have the XML Root "Groups". Using the two classes Group and SubGroup produces this:
<Group method="ModifySubGroups" ID="1234" PIN="5678">
  <SubGroup action="Delete" number="95">
    <Name>Test</Name>
  </SubGroup>
</Group>
The only way I could get it to work was to create another class "Groups" that contained Group. So now I have three classes, Groups, Group, and SubGroup. Groups contains Group and Group contains SubGroup.
Any other ideas?
You don't normally use XML serialization to make XML pretty. If you need a root container element, then you need to have a root container object, and serialize that instead of the Group object.
You can, however, serialize an array of Group objects:
void Main()
{
    var g = new Group();
    g.SubGroups.Add(new SubGroup {Name = "aaa"});
    // The XmlRootAttribute renames the array's root element to "Groups".
    var ser = new XmlSerializer(typeof(Group[]), new XmlRootAttribute("Groups"));
    using (var w = new StringWriter())
    {
        ser.Serialize(w, new Group[] {g});
        w.ToString().Dump(); // Dump() is a LINQPad extension method
    }
}
public class Group
{
    [XmlElement("SubGroup")]
    public List<SubGroup> SubGroups = new List<SubGroup>();
}
public class SubGroup
{
    public string Name;
}
Naturally this means that the deserialization code needs to either magically know that there is always exactly one Group element or assume that there could be zero or more. Honestly, I don't see much point in doing this unless you actually want to serialize a collection of groups. It would just add confusion.
EDIT: If you really want to comply with the vendor's schema, you are starting from the wrong point.
You do not need to write classes like this by hand at all. Instead, take the vendor's XSD and use the xsd utility provided with Visual Studio to generate .NET classes from the schema. You can also choose how to serialize the objects: with XmlSerializer or with DataContractSerializer (which, I would say, gives you better flexibility).
NOTE: If you do not have an XSD and do not know how to write one yourself, there are tools that can generate one from your XML.
You can use XmlRootAttribute in order to specify custom XML Root
Also when you serialize collection you can specify wrapper - see Array Serializing
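As an illustration of the wrapper-class approach combined with XmlRootAttribute, here is a minimal sketch. The GroupList and GroupsXml names and the attribute mappings are assumptions inferred from the XML in the question, not the asker's actual classes.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical wrapper: serialized as <Groups> thanks to XmlRoot,
// with each list item written directly as a <Group> child.
[XmlRoot("Groups")]
public class GroupList
{
    [XmlElement("Group")]
    public List<Group> Items = new List<Group>();
}

public class Group
{
    [XmlAttribute("method")] public string Method;
    [XmlAttribute("ID")] public string Id;
    [XmlAttribute("PIN")] public string Pin;

    [XmlElement("SubGroup")]
    public List<SubGroup> SubGroups = new List<SubGroup>();
}

public class SubGroup
{
    [XmlAttribute("action")] public string Action;
    [XmlAttribute("number")] public int Number;
    public string Name;
}

public static class GroupsXml
{
    public static string Serialize(GroupList groups)
    {
        var serializer = new XmlSerializer(typeof(GroupList));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, groups);
            return writer.ToString();
        }
    }
}
```

By default the serializer also emits an XML declaration and the xsi/xsd namespace declarations on the root element, so the output will not match the target XML character for character unless those are suppressed.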

Integration of Serialization and Deserialization in Builder Design Pattern using C#

I am implementing the Builder Pattern in order to generate a set of objects. These objects then have to be serialized to XML and deserialized from XML.
I know how to perform the serialization and deserialization however I am unsure how to integrate it into the design pattern.
For example suppose my code uses the builder to create products foo and bar. My first thought is to put a serialize function on each one because each product knows what to serialize.
My next thought was to put the deserialization in the Director or the ConcreteBuilder.
What I don't like about this is that the serialization and deserialization functions will be in different places - one in the file for the declaration of the foo and bar objects and the other in the files for something else. I am worried that they might end up becoming out of sync with each other as I work on the product classes.
My final thought was for the Director or ConcreteBuilder to perform the serialization and deserialization. What I don't like about that is the products then have to know which builder was used or know who the Director is.
To clarify - there are two situations where a product can be created:
User clicks on a button in the user interface
User loads an XML project
Can you not simply have a static serialize/deserialize class and create a generic method that can take any type of object? Isn't the pattern simply for building the objects? You can then serialize as you wish?
Something like:
public static string Serialize<T>(T data)
{
    XmlSerializer xmlSerializer = new XmlSerializer(typeof(T));
    StringWriter sw = new StringWriter();
    xmlSerializer.Serialize(sw, data);
    return sw.ToString();
}
My current solution is to have the Product perform the serialization and the ConcreteBuilder perform the deserialization, then put both the Product and its ConcreteBuilder declarations into the same source file.
Although the task is spread across two classes it is at least kept together in one file.
Any better solutions are appreciated.
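One way to keep both directions in a single place, following the static-helper suggestion above, might look like the sketch below. XmlStore and Foo are hypothetical names, and the sketch assumes the products are plain classes with public parameterless constructors that XmlSerializer can handle.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical helper holding both directions so they cannot drift apart.
public static class XmlStore
{
    public static string Serialize<T>(T data)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, data);
            return writer.ToString();
        }
    }

    public static T Deserialize<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
        {
            return (T) serializer.Deserialize(reader);
        }
    }
}

// Hypothetical product; XmlSerializer needs a public parameterless constructor.
public class Foo
{
    public string Name { get; set; }
    public int Size { get; set; }
}
```

With this layout neither the products nor the builders need to know about each other for persistence: the load path calls XmlStore.Deserialize<Foo>(xml) and the save path calls XmlStore.Serialize(foo).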

Create dynamic object with hierarchy from xml and c#

I want to create a dynamic object from a string of XML. Is there an easy way of doing this?
Example String.
<test><someElement><rep1>a</rep1><rep1>b</rep1></someElement></test>
I'm trying to create an MVC editor for passing data through NVelocity and would like people on the front end to input XML as their data for parsing.
Thanks in advance.
You need 2 things to achieve this:
1) Valid XML
2) A C# class which has the same data members as your input XML.
Create one object of the C# class, then enumerate all the elements of the XML. Using a switch on each element name, take the inner text of that element and assign it to the corresponding data member of the object.
C# code might look like the following (you need to fill in the gaps):
using System.Collections.Generic;
using System.Xml;

class test
{
    // Must be initialised, otherwise Add throws a NullReferenceException.
    public List<string> someElement = new List<string>();
}
class xmlEnum
{
    public static test createObject(string inputXml)
    {
        test t = new test();
        // load the input XML into an XmlDocument
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(inputXml);
        // and iterate through all the elements
        foreach (XmlElement element in doc.SelectNodes("//*"))
        {
            switch (element.Name)
            {
                case "rep1":
                    t.someElement.Add(element.InnerText);
                    break;
                // some more cases will go here
            }
        }
        // finally return the object
        return t;
    }
}
I hope this will help you.
I don't think there's a ready-made dynamic solution to this. If I understand your question correctly, you would like to do something like this.
SomeDynamicXmlObject test = new SomeDynamicXmlObject(yourteststring);
var rep1 = test.SomeElement.rep1;
The closest I can think of you could get to that, is to use XElement classes, something like this:
XElement test = XElement.Parse(yourteststring);
var rep1 = test.Element("SomeElement").Element("rep1");
If that's not good enough, I'm afraid you will have to write something yourself that will parse the xml and create the object on the fly. If you know in advance what the xml will look like, you could use shekhars code, but I guess from your comments that you don't.
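If you do end up writing something yourself, one possible shape is a small recursive converter from XElement to ExpandoObject. This is only a sketch under simplifying assumptions: attributes, namespaces and mixed content are ignored, and repeated child names become a list. DynamicXml and ToDynamic are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Dynamic;
using System.Linq;
using System.Xml.Linq;

public static class DynamicXml
{
    // Returns an object whose members mirror the children of the root element.
    public static dynamic Parse(string xml)
    {
        return ToDynamic(XElement.Parse(xml));
    }

    private static object ToDynamic(XElement element)
    {
        // Leaf elements become plain strings.
        if (!element.HasElements)
        {
            return element.Value;
        }

        IDictionary<string, object> result = new ExpandoObject();
        foreach (var group in element.Elements().GroupBy(e => e.Name.LocalName))
        {
            List<object> values = group.Select(e => ToDynamic(e)).ToList();
            // A single child is stored directly; repeated names become a list.
            result[group.Key] = values.Count == 1 ? values[0] : (object) values;
        }
        return result;
    }
}
```

With the example string from the question, DynamicXml.Parse(xml).someElement.rep1 is a list holding "a" and "b". The root element itself is not exposed as a member; the returned object stands for its contents.");
if ((string) t.someElement.rep1[0] != "a") throw new System.Exception("first rep1");
if ((string) t.someElement.rep1[1] != "b") throw new System.Exception("second rep1");
</test>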
If you have a schema for the XML and this is needed in a dev/build environment, then a roundabout way to do this would be:
Use the XSD tool to parse the schema and generate code from it
Build the generated code using the command-line compiler or compiler services to generate an assembly. Now you have a type available that can be used.
Needless to say, this will be quite slow, and out-of-process tools will be used.
Another way (not easy, but faster) that has no dev environment dependencies would be to parse your XML and generate a dynamic type using reflection. See this article to check how to use Reflection.Emit

What is the fastest way to convert a class to XML

I would like to know the fastest and most lightweight technique for converting a fairly large class to XML. The class will have lists and arrays in it. I need to convert all this data to XML.
Here is what my application does:
It gets all the information from the database using LINQ to Entities and stores the data in a class. Then I want to convert this class to XML. Once the data is in XML, I will send it to the browser along with the XSL stylesheet to be displayed to the user. What is the fastest way to do this?
The XmlSerializer actually creates an assembly (with an XmlSerializationWriter) that is custom made to serialize your class. You can look at the generated code by following these steps.
You only pay the price the first time it encounters a new type.
So I think that you should really go with the XmlSerializer, not only for performance, but for maintainability.
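Since the cost is paid per type, an application may also want to reuse serializer instances rather than construct a new one on every call. The sketch below is one hedged way to do that; SerializerCache and its For method are hypothetical names, and the cache relies on XmlSerializer's Serialize and Deserialize being safe to call from multiple threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Xml.Serialization;

// Hypothetical helper: keeps one XmlSerializer per type for the life of the process.
public static class SerializerCache
{
    private static readonly ConcurrentDictionary<Type, XmlSerializer> Cache =
        new ConcurrentDictionary<Type, XmlSerializer>();

    public static XmlSerializer For(Type type)
    {
        // The factory delegate only runs when the type has not been seen before.
        return Cache.GetOrAdd(type, t => new XmlSerializer(t));
    }
}
```

This matters most for the constructor overloads other than XmlSerializer(Type) and XmlSerializer(Type, String): according to the .NET documentation, those generate a new assembly on each call instead of reusing a cached one, so they would need their own cache with a suitable key.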
You can use a mixin-like serializer class:
public interface MXmlSerializable { }
public static class XmlSerializable {
    public static string ToXml(this MXmlSerializable self) {
        if (self == null) throw new ArgumentNullException();
        var serializer = new XmlSerializer(self.GetType());
        using (var writer = new StringWriter()) {
            serializer.Serialize(writer, self);
            return writer.GetStringBuilder().ToString();
        }
    }
}
public class Customer : MXmlSerializable {
    public string Name { get; set; }
    public bool Preferred { get; set; }
}
// ....
var customer = new Customer {
    Name = "Guybrush Threepwood",
    Preferred = true };
var xml = customer.ToXml();
The fastest way is to write the code for it yourself. That will remove any overhead, like the need to use reflection to read the properties of the object, as you can access the properties directly.
Add a method to the class that returns its data as XML, either by returning an XDocument or the XML already formatted as a string, or by passing an XmlWriter to the method.
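A hand-written version of the XmlWriter variant could look roughly like this. The Order class and its members are made up for illustration; a real class would write whatever elements its consumers expect.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical class that writes its own XML without reflection.
public class Order
{
    public int Id { get; set; }
    public List<string> Items { get; } = new List<string>();

    // Writes this object to any XmlWriter supplied by the caller.
    public void WriteXml(XmlWriter writer)
    {
        writer.WriteStartElement("Order");
        writer.WriteAttributeString("id", XmlConvert.ToString(Id));
        foreach (string item in Items)
        {
            writer.WriteElementString("Item", item);
        }
        writer.WriteEndElement();
    }

    // Convenience wrapper returning the XML as a string.
    public string ToXml()
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var text = new StringWriter())
        {
            using (var writer = XmlWriter.Create(text, settings))
            {
                WriteXml(writer);
            }
            return text.ToString();
        }
    }
}
```

Taking an XmlWriter parameter lets the same method write to a string, a file or a response stream, at the price of keeping the method in sync with the class by hand.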
By "fastest" do you mean you want the approach which will be fastest to develop? Or do you want the approach which will have the fastest execution speed?
If it's the former, I recommend just using .NET's XmlSerializer class: http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlserializer.aspx
Serializing a class to XML output is as simple as:
XmlSerializer serializer = new XmlSerializer(myObject.GetType());
serializer.Serialize(Response.OutputStream, myObject);
And there are various attributes you can decorate your class with to control things like whether individual properties are serialized as elements or attributes, etc.
There's a good FAQ at http://www.devolutions.net/articles/serialization.aspx also
You could use XML serialization, for example:
Foo foo = new Foo();
XmlSerializer serializer = new XmlSerializer(typeof(Foo));
TextWriter writer = new StringWriter();
serializer.Serialize(writer, foo);
string xml = writer.ToString();
The fastest method would depend on the class, because it would be hand-written to take advantage of knowledge of the specifics of that class in ways a more general approach couldn't do.
I'd probably use XmlTextWriter rather than straight to TextWriter though. While the latter would allow for some further savings, these would be minimal compared to the better structure of XmlTextWriter, and you've already sacrificed a good bit in terms of structure and ease of maintenance as it is.
You can always slot in your super-optimised implementation of XmlWriter afterwards ;)
It sounds like a rather convoluted set-up, when you could just display the class's information on a webpage using ASP.NET MVC. Why take the extra two steps of converting it to XML, send it to the browser, and use an XSL stylesheet to display it to the user? It doesn't make sense.
I wrote a program that serialized one simple object graph to XML in different ways:
1. Using XmlSerializer
2. Using hardcoded xml serializer
30,000 documents:
XmlSerializer took : 0.9 sec
Hardcoded serializer took: 0.45 sec
I relied on XmlWriter in both cases and that adds some overhead.
Note that you can instruct Visual Studio to generate the XmlSerializer assembly at compile time in order to reduce the cost of that first serialization (otherwise the assembly is generated at run time).
