Parsing an XML file -options? - c#

I'm developing a system to pick up XML attachments from emails, via Exchange Web Services, and enter them into a DB, via a custom DAL object that I've created.
I've manage to extract the XML attachment and have it ready as a stream... they question is how to parse this stream and populate a DAL object.
I can create an XMLTextReader and iterate through each element. I don't see any problems with this other than that I suspect there is a much slicker way. The reader seems to treat the opening tag, the content of the tag and the closing tag as different elements (using reader.NodeType). I expected myValue to be considered one element rather than three. Like I said, I can get round this problem, but I'm sure there must be a better way.
I came across the idea of using an XML Serializer (completely new to me) but a quick look suggested that these can't handle ArrayLists and List (I'm using List).
Again, I'm new to LINQ, but LINQ-to-XML has also been mentioned, but examples I've seen seem rather complex - though that my simply be my lack of familiarity.
Basically, I don't want a cludged system, but I don't want to use any complicated technique with a learning curve, just because it's 'cool'.
What is the simplest and most effective way of translating this XML/Stream in to my DAL objects?
XML Sample:
<?xml version="1.0" encoding="UTF-8"?>
<enquiry>
<enquiryno>100001</enquiryno>
<companyname>myco</companyname>
<typeofbusiness>dunno</typeofbusiness>
<companyregno>ABC123</companyregno>
<postcode>12345</postcode>
<contactemail>me#example.com</contactemail>
<firstname>My</firstname>
<lastname>Name</lastname>
<vehicles>
<vehicle>
<vehiclereg>54321</vehiclereg>
<vehicletype>Car</vehicletype>
<vehiclemake>Ford</vehiclemake>
<cabtype>n/a</cabtype>
<powerbhp>130</powerbhp>
<registrationdate>01/01/2003</registrationdate>
</vehicle>
</vehicles>
</enquiry>
Update 1:
I'm trying to deserialize, based on Graham's example. I think I've set up the DAL for serialization, including specifying [XmlElement("whatever")] for each property. And I've tried to deserialize using the following:
SalesEnquiry enquiry = null;
XmlSerializer serializer = new XmlSerializer(typeof(SalesEnquiry));
enquiry = (SalesEnquiry)serializer.Deserialize(stream);
However, I get an exception:'There is an error in XML document (2, 2)'. The innerexception states {"<enquiry xmlns=''> was not expected."}
Conclusion (updated):
My previous problem was the fact that the element in the XML file (Enquiry) != the name of the class (SalesEnquiry). Rather than an [XmlElement] attribute for the class, we need an [XmlRoot] attribute instead. For completeness, if you want a property in your class to be ignored during serialization, you use the [XmlIgnore] attribute.
I've successfully serialized my object, and have now successfully taken the incoming XML and de-serialized it into a SalesEnquiry object.
This approach is far easier than manually parsing the XML. OK, there has been a steep learning curve, but it was worth it.
Thanks!

If your XML uses a schema (i.e. you're always going to know what elements appear, and where they appear in the tree), you could use XmlSerializer to create your objects. You'd just need some attributes on your classes to tell the serializer what XML elements or attributes they correspond to. Then you just load up your XML, create a new XmlSerializer with the type of the .NET object you want to create, and call the Deserialize method.
For example, you have a class like this:
[Serializable]
public class Person
{
[XmlElement("PersonName")]
public string Name { get; set; }
[XmlElement("PersonAge")]
public int Age { get; set; }
[XmlArrayItem("Child")]
public List<string> Children { get; set; }
}
And input XML like this (saved in a file for this example):
<?xml version="1.0"?>
<Person>
<PersonName>Bob</PersonName>
<PersonAge>35</PersonAge>
<Children>
<Child>Chris</Child>
<Child>Alice</Child>
</Children>
</Person>
Then you create a Person instance like this:
Person person = null;
XmlSerializer serializer = new XmlSerializer(typeof(Person));
using (FileStream fs = new FileStream(GetFileName(), FileMode.Open))
{
person = (Person)serializer.Deserialize(fs);
}
Update:
Based on your last update, I would guess that either you need to specify an XmlRoot attribute on the class that's acting as your root element (i.e. SalesEnquiry), or the XmlSerializer might be a bit confused that you're referencing an empty namespace in your XML (xmlns='' doesn't seem right).

XmlSerializer does support arrays & lists... as long as the contained type is serializable.

I have found Xsd2Code very helpful for this kind of thing: http://xsd2code.codeplex.com/
Basically, all you need to do is write an xsd file (an XML schema file) and specify a few command line switches. Xsd2Code will automatically generate a C# class file that contains all the classes and properties plus everything needed to handle the serialization. It's not a perfect solution as it doesn't support all aspects of XSD, but if your XML files are relatively simple collections of elements and attributes, it should be a nice short-cut for you.
There's another similar project on Codeplex called Linq to XSD (http://linqtoxsd.codeplex.com/), which was designed to enforce the entire XSD specification, but last time I checked, it was no longer being supported and not really ready for prime time. Thought it was worth a mention, though.

Related

DeSerialize xml string and ignore the first element within root

I've got this XML data in string, structure can be seen as follows:
<Document>
<Contents>
<Content>
...
<Contents>
</Document>
So the structure is always like above, I made a class which exactly reflects the objects which will be identified as <Content>.
I wonder how I can deserialize the content in one go into a List of Content objects. Currently I try something as
XmlSerializer annotationSerializer = new XmlSerializer(
typeof(List<Content>),
new XmlRootAttribute("Document")
);
Of course this will not work as the first found element will be contents, how do I work around this? Do I require a certain attribute on the Content class?
You will need to use a root object here:
public class Document {
public List<Content> Contents {get;} = new List<Content>();
}
Now deserialize a Document and read .Contents. There are some scenarios where you can bypass the root object, but... not here, not conveniently.

rename property in xml schema class when serializing

I have an auto generated c# class file from an xml schema using the xsd generator tool.
There is a property in this class that i need to rename from "Balance" to "balance" when the xml file gets created.
As this is a generated class i need to update the created xml object on the fly before seralizing so cant just add an atrribute over the class property with the expected name.
I have accomplished the task of ignoring certain properties by using the XmlAttributes class so am sure there is something i could do along same lines for this
Can anyone point me in the direction of how to achieve this?
Thanks
I have managed to resolve my issue by using the following:
var overrides = new XmlAttributeOverrides();
overrides.Add(typeof(MyGeneratedCustomType), "Balance", new XmlAttributes { XmlAttribute = new XmlAttributeAttribute("balance") });
var serializer = new XmlSerializer(xmlFile.GetType(), overrides);
MyGeneratedCustomType is a type that appears in the generated xsd class which holds the property i needed to rename. Its an elegant solution as there is very minimal code required.
Assuming you need to deserialize from XML like this:
<root>
<Balance />
</root>
and then serialize to XML like this:
<root>
<balance />
</root>
You have two options here:
You could just create a second class that mirrors your auto-generated class in every way except for having [XmlElement("balance")] on the Balance property in the mirrored class. If the generated class is XsdGenerated and the mirrored class is CustomClass, create a constructor or override the =s operator to be able to populate the CustomClass with all the fields from XsdGenerated. When you serialize CustomClass, you should get the desired result. I think this is the preferable option.
Implement IXmlSerializable on XsdGenerated. Call base() in the ReadXml method, and just have the WriteXml method create the balance tag in lower case. Note that this option is probably more challenging to write/maintain and would limit your ability to serialize and deserialize - it'd be a one way operation, unless you created an even more complex mechanism to set whether ReadXml() and WriteXml() should treat balance with an upper or lower case b.

.NET XmlSerializer to Element FormDefault=Unqualified XML?

I use C# code more-or-less like this to serialize an object to XML:
XmlSerializer xs1 = new XmlSerializer(typeof(YourClassName));
StreamWriter sw1 = new StreamWriter(#"c:\DeserializeYourObject.xml");
xs1.Serialize(sw1, objYourObjectFromYourClassName);
sw1.Close();
I want it to serialize like this:
<ns0:Header xmlns:ns0="https://mynamespace/">
<SchemaVersion>1.09</SchemaVersion>
<DateTime>2009-12-15T00:00:01-08:00</DateTime>
but instead, it is doing this:
<Header xmlns="https://mynamespace/">
<SchemaVersion xmlns="">V109</SchemaVersion>
<DateTime xmlns="">2010-03-08T18:21:09.100125-08:00</DateTime>
The way it is serializing doesn't work with the XPath I had planned to use, and doesn't match my BizTalk schema. Originally I built the class using XSD.exe from a BizTalk 2006 schema, then I use it for an argument to a WCF web service.
This might be related to an option called element FormDefault = Qualified or Unqualified. In BizTalk, my I have the schema set to "Unqualfiied" which is what I want.
Is there any way for the serializer to output "unqualified" results?
Thanks,
Neal Walters
Update:
Sample attribute on DateTime:
/// <remarks/>
[System.Xml.Serialization.XmlElementAttribute(Form = System.Xml.Schema.XmlSchemaForm.Unqualified)]
public System.DateTime DateTime
{
get
{
return this.dateTimeField;
}
set
{
this.dateTimeField = value;
}
}
BizTalk provides for what it calls promoted (or distinguished) fields, which use XPath to pull out values of individual elements. I checked the XPath of BizTalk in a tool called StylusStudio, and Biztalk'x xpath didn't work with the xmlns='' fields above.
The first thing my WCF web service does is to serialize the object to a string (using UTF16 encoding) and store it in an XML column in a SQL database. It is from there I am seeing the above xml sample with the xmlns="".
XPath:
/*[local-name()='Header' and namespace-uri()='https://mynamespace/']/*[local-name()='DateTime' and namespace-uri()='']
The XPATH you're using does not match the namespaces of your XML. Your Header element, for instance, in in the https://mynamespace/, but your XPATH is searching in the http://mynamespace/ namespace.
My question was a bit muddled, so this answer may or may not help someone.
This is a fairly complex scenario, and half of my issues came from trying to simplify it to make an easy post here.
I was actually adding a new element programmatically with a C# routine (see "NewElement" below). The C# code did not set its namespace to an empty string, therefore I believe it is inheriting the namespace of the "Header" element.
I freaked out a little because I was jumping to the conclusion that DateTime should not have the "xmlns=""' when in fact it should. Even though DateTime falls under Header, it does not nor should not inherit the Header's namespace.
In BizTalk, typically only complex types have their own namespace, and DateTime as well as NewElement are simple types.
<Header xmlns="https://mynamespace/">
<SchemaVersion xmlns="">V109</SchemaVersion>
<DateTime xmlns="">2010-03-08T18:21:09.100125-08:00</DateTime>
<NewElement>myvalue</NewElement>
So in effect, the two XML's I posted originally are identical as far as XPath goes. If I insert a new element, I need to make sure it follows the same pattern.
I had written the C# routine to add the element more than a year ago, and it worked fine then, so I wasn't suspect that it was causing this problem.

Creating something similar to System.ServiceModel.Syndication using .NET 3.5

If I am dealing with several standard xml formats what would be the best practice way of encapsulating them in C# 3.5? I'd like to end up with something similar to the System.ServiceModel.Syndication namespace.
A class that encapsulates the ADF 1.0 XML standard would be one example. It has the main XML root node, 6 child elements, 4 of which are required IIRC, several required and optional elements and attributes further down the tree. I'd like the class to create, at a minimum, the XML for all the required pieces all the way up to a full XML representation. (Make sense).
With LINQ 4 XML and extension classes, etc, etc, there has to be some ideas on quickly generating a class structure for use. Yes? No? :)
Am not sure if I gave enough details to get one correct answer but am willing to entertain ideas right now.
TIA
XML serialization seems a good approach. You could even use xsd.exe to generate the classes automatically...
EDIT
Note that the class names generated by the tool are usually not very convenient, so you might want to rename them. You might also want to change arrays of T to List<T>, so that you can easily add items where needed.
Assuming the class for the root element is named ADF, you can load an ADF document as follows :
ADF adf = null;
XmlSerializer xs = new XmlSerializer(typeof(ADF));
using (XmlReader reader = XmlReader.Create(fileName))
{
adf = (ADF)xs.Deserialize(reader);
}
And to save it :
ADF adf = ...; // create or modify your document
...
XmlSerializer xs = new XmlSerializer(typeof(ADF));
using (XmlWriter writer = XmlWriter.Create(fileName))
{
xs.Serialize(writer, adf);
}
Why not just follow the pattern of the SyndicationFeed object in the Syndication namespace? Create a class that takes in a Uri to the xml document or just takes in the document fragment.
Then parse the document based on your standards (this parsing can be done using LinqToXml if you wanted to, though regEx might be faster if you are comfortable with them). Throw exceptions or track errors appropriately when the document doesn't pass the specification rules.
If the document passes the parse step then break the pieces of the document out into public getter properties of your object. Then return the fully hydrated object back to the consumer for use
Seems pretty straight forward to me. Is that what you are after or are you looking for something more than this?

Deserializing XML to Objects in C#

So I have xml that looks like this:
<todo-list>
<id type="integer">#{id}</id>
<name>#{name}</name>
<description>#{description}</description>
<project-id type="integer">#{project_id}</project-id>
<milestone-id type="integer">#{milestone_id}</milestone-id>
<position type="integer">#{position}</position>
<!-- if user can see private lists -->
<private type="boolean">#{private}</private>
<!-- if the account supports time tracking -->
<tracked type="boolean">#{tracked}</tracked>
<!-- if todo-items are included in the response -->
<todo-items type="array">
<todo-item>
...
</todo-item>
<todo-item>
...
</todo-item>
...
</todo-items>
</todo-list>
How would I go about using .NET's serialization library to deserialize this into C# objects?
Currently I'm using reflection and I map between the xml and my objects using the naming conventions.
Create a class for each element that has a property for each element and a List or Array of objects (use the created one) for each child element. Then call System.Xml.Serialization.XmlSerializer.Deserialize on the string and cast the result as your object. Use the System.Xml.Serialization attributes to make adjustments, like to map the element to your ToDoList class, use the XmlElement("todo-list") attribute.
A shourtcut is to load your XML into Visual Studio, click the "Infer Schema" button and run "xsd.exe /c schema.xsd" to generate the classes. xsd.exe is in the tools folder. Then go through the generated code and make adjustments, such as changing shorts to ints where appropriate.
Boils down to using xsd.exe from tools in VS:
xsd.exe "%xsdFile%" /c /out:"%outDirectory%" /l:"%language%"
Then load it with reader and deserializer:
public GeneratedClassFromXSD GetObjectFromXML()
{
var settings = new XmlReaderSettings();
var obj = new GeneratedClassFromXSD();
var reader = XmlReader.Create(urlToService, settings);
var serializer = new System.Xml.Serialization.XmlSerializer(typeof(GeneratedClassFromXSD));
obj = (GeneratedClassFromXSD)serializer.Deserialize(reader);
reader.Close();
return obj;
}
Deserialize any object, as long as the type T is marked Serializable:
function T Deserialize<T>(string serializedResults)
{
var serializer = new XmlSerializer(typeof(T));
using (var stringReader = new StringReader(serializedResults))
return (T)serializer.Deserialize(stringReader);
}
Well, you'd have to have classes in your assembly that match, roughly, the XML (property called Private, a collection property called ToDo, etc).
The problem is that the XML has elements that are invalid for class names. So you'd have to implement IXmlSerializable in these classes to control how they are serialized to and from XML. You might be able to get away with using some of the xml serialization specific attributes as well, but that depends on your xml's schema.
That's a step above using reflection, but it might not be exactly what you're hoping for.
Checkout http://xsd2code.codeplex.com/
Xsd2Code is a CSharp or Visual Basic Business Entity class Generator from XSD schema.
There are a couple different options.
Visual Studio includes a command line program called xsd.exe. You use that program to create a schema document, and use the program again on the schema document to creates classes you can use with system.xml.serialization.xmlserializer
You might just be able to call Dataset.ReadXml() on it.
i had the same questions few years back that how abt mapping xml to C# classes or creating C# classes which are mapped to our XMLs, jst like we do in entity Framework (we map tables to C# classes). I created a framework finally, which can create C# classes out of your XML and these classes can be used to read/write your xml. Have a look

Categories