Deserializing xml with namespace prefixes that are undefined - c#

The Xml response I receive is as follows:
<response>
<item xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="java:com.someDomain.item">
<name>some name</disc-name>
<description>some description</disc-desc>
</item>
<item xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="java:com.someDomain.item">
<name>some name</disc-name>
<description>some description</disc-desc>
</item>
<otherValue>12.1</otherValue>
</response>
My class is decorated as such:
[XmlElement("item")]
public Item[] Items{get;set;}
[XmlElement("otherValue")
public string OtherValue{get;set;}
When I attempt to deserialize the above Xml to the class described, I receive an error of "Namespace prefix 'java' is not defined". Adding the "namespace" attribute to the class resolves the parsing error(however, the xml is then distorted from the original).
ie
[XmlElement(ElementName="item",Namespace="java")]
How should I be decorating a given property to match up with a new namespace? Or, how do I correctly define the namespace?
I'm not 100% on using a stock array for my enumerable section either, but I think the namespace issue takes precident at the moment. Any insight or thoughts are greatly appreciated!
UPDATE:
I think the question is better rephrased now that I've gone back and forth a bit:
How do you use an XmlElementAttribute(or other attribute) to have a class that can serialize into the item snippet above, including the xsi tags?
As for my particular problem, I've realized since the Xml response is out of my control, I don't need the xsi attributes to begin with. To workaround the serialization issue, I'm simply doing the following(XmlElement element contains the original document above):
foreach(XmlNode node in element)
node.Attributes.RemoveAll();
I'm only noting my personal workaround as this is not actually a solution.

You were right the first time. "java" is not a namespace. It's a namespace prefix. That's an abbreviation of the namespace, for use in the XML. Otherwise, the actual namespace would need to be repeated wherever you currently see "java:".
You can use List<Item> instead of Item[].

Unfortunately this is valid XML, and completely conforms to the XML Standard.
It validates, it's correct and it's complete.
The problem you're having is in the deserialization, which is not a part of the XML Standard and is related to how .NET maps declared XML types to internal CLR types.
The xsi:type is a namespace reference and is intended to allow XML documents to substitute a derived type from another namespace for the declared type in the schema.
I know from my own experience that coders tend to react in shock that this sort of thing is even legal, much less correct XML. It basically hijacks your schema.
You do not need even need to include the foreign namespace in order for this to be considered correct.
(see this article for more ranting on this subject: http://norman.walsh.name/2004/01/29/trainwreck)
Now, as to how to handle your stated problem: deserialize this mess.
1) process the xml text and remove the xsi-types declaration and hope there are no fields declared that extend the base type.
2) declare a type that derives from your base type in the schema.
This looks like the following:
// note this "XmlIncludeAttribute" references the derived type.
// note that in .NET they are in the same namespace, but in XML they are in different namespaces.
[System.Xml.Serialization.XmlIncludeAttribute(typeof(DerivedType))]
[System.SerializableAttribute()]
[System.Xml.Serialization.XmlTypeAttribute(Namespace="http://BaseNameSpace")]
[System.Xml.Serialization.XmlRootAttribute(Namespace="http://BaseNameSpace", IsNullable=true)]
public partial class MyBaseType : object
{
...
}
/// <remarks/>
[System.SerializableAttribute()]
[System.Xml.Serialization.XmlTypeAttribute(Namespace="http://DerivedNameSpace")]
[System.Xml.Serialization.XmlRootAttribute(Namespace="http://DerivedNameSpace", IsNullable=true)]
public partial class DerivedType: MyBaseType
{
...
}
This is only a rough outline, hopefully enough to get you started. Note that this is not an easy problem to solve progmatically because it's always possible for someone to feed you XML and it will validate but not deserialize properly.

Related

XML serialization and encoded attributes

I'm trying to deserialize an XML with the following element with .NET XmlSerializer :
<SomeElement Foo_x003a_Bar="123"/>
In the target class, there's the following declaration:
class SomeElement
{
//...
[XmlAttribute("Foo_x003a_Bar")]
public string Foo_Bar;
}
The attribute is not being read from XML. In the UnknownAttribute event handler, I see that Foo_x003a_Bar is not recognized, and the list of valid attributes (Args.ExpectedAttributes) instead includes Foo_x005F_x003a_Bar.
What's the deal here, please? 0x5F is the code for the undescore character. Other attributes in the same element/class with names that contain _x0020_ deserialize properly. Why does _x003a_ get some kind of a special treatment?
EDIT: dirty hackery in the form of search/replacing in the XML string before it's parsed helps. But still.
EDIT2: the functions that implement this kind of encoding are XmlConvert:EncodeName, XmlConvert:EncodeLocalName. The latter handles colons, the former doesn't. Looks like they're calling EncodeName...
EDIT3: filed a bug report with Microsoft. Please navigate there and click "I can too" if you can, too :)
Looks like you are right about the treatment for _x<char-code>_ element names (described here), which uses XmlConvert.EncodeLocalName.
Given that you've instantiated your XmlSerializer using the only the type argument, the namespace is assumed to be empty.
What should work in your case is to simply use the decoded element names in your XmlAttributeAttribute settings:
class SomeElement
{
//...
[XmlAttribute("Foo:Bar")]
public string Foo_Bar;
}
One important factor here is casing, the XML node needs to have <Foo_x003A_Bar>, with a capital A.
The encoded node name needs to be consistent with XmlConvert.EncodeLocalName:
XmlConvert.EncodeLocalName("Foo:Bar")
// Foo_x003A_Bar

"Namespace prefix not defined" when it actually is defined

I'm having trouble deserializing XML with an "undefined" namespace prefix which really is defined.
We've published an internal web service in C# which serves a variety of clients. A new client's IDE insists on declaring xsi:type for every element in its XML output, and they can't turn off this "feature".
The XML message they produce goes like this, where "namespace" is the correct namespace.
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<myOperation xsi:type="ns1:namespace" xmlns="namespace" xmlns:ns1="namespace">
<inputString xsi:type="xsd:string">ABCDEF</inputString>
<books xsi:type="ns1:booksType">
<bookID xsi:type="xsd:string">ABC123</bookID>
<bookID xsi:type="xsd:string">DEF456</bookID>
</books>
<!-- ... snip... -->
</myOperation>
</soapenv:Body>
<books> is basically an array of strings.
The service method accepts as XmlNode, but XmlSerializer throws a "prefix 'ns1' not defined" error. (It is defined in a parent node, but apparently that is not good enough.) I have a similar problem using wsdl.exe to generate classes and deserialize the input for me.
Using XmlNamespaceManager to specify prefixes doesn't seem right -- akin to magic numbers, and I can't predict which prefix a given consumer will declare anyway. Is there a way to handle this without stripping the attributes out (books.Attributes.RemoveAll)? That doesn't feel particularly elegant either.
I've found that books.OuterXML does not contain any information for 'ns1' unless I hack the element inbound to use that prefix (), so I can see why it complains, but I don't yet understand why 'ns1' isn't recognized from its previous definition above.
Many thanks for any help, or at least education, someone can provide.
Edits: it works fine if I change <books> to use the prefix, i.e. <ns1:books xsi:type="ns1:booksType">. This works whether I've defined xmlns or no. That may be consistent with this answer, but I still don't see how I would feasibly declare the prefix in the service code.
#Chris, certainly. Hope I can strike a balance between "stingy with closed source" and "usable for those who would help". Here "books" is the XmlNode received in the service method parameter. (Not to get off topic, but will also humbly take suggestions to improve it in general; I'm still a novice.)
XmlSerializer xmlSerializer = new XmlSerializer(typeof(booksType));
StringReader xmlDataReader = new StringReader(books.OuterXml);
books = (booksType)xmlSerializer.Deserialize(xmlDataReader);
The class is pretty much this:
[Serializable()]
[XmlRoot("books", Namespace = "namespace")]
[XmlTypeAttribute(TypeName = "booksType", Namespace = "namespace")]
public class booksType
{
[XmlElement(ElementName = "bookID")]
public string[] bookIDs { get; set; }
}
Your deserialization code could look something like this:
XmlSerializer sz = new XmlSerializer(typeof(booksType));
var reader = new XmlNodeReader(booksXmlNode);
var books = sz.Deserialize(reader);
[EDIT] This is better, because the namespace declarations are preserved with the XmlNode, whereas converting to an XML string via OuterXml appears to slice off the namespace declaration for the ns1 prefix, and the serializer then barfs on the type attribute value containing this prefix. I imagine this is a bug in the XML implementation but maybe an XML guru can confirm this.
This should get you past the error you are seeing, but whether it solves the problem completely I'm not sure.
[FURTHER EDIT] As noted in the comments below, there is a bug in the .NET XmlSerializer which is causing the deserialization to fail. Stepping through the deserialization code in the generated assembly, there is a point where the following condition is tested:
(object) ((System.Xml.XmlQualifiedName)xsiType).Namespace == (object)id2_namespace))
Although the Namespace property of the XmlQualifiedName has the same value ('namespace') as the string variable id2_namespace, the condition is evaluating to false because it is coded as an object identity test rather than a test for string value equivalence. Failing this condition leads directly to the exception reported by OP.
As far as I can see, this bug will always cause deserialization to fail whenever the XML for the object being deserialized uses one prefix on the object's root element name, and another prefix (defined as the same namespace) on that element's xsi:type attribute.

WCF with XmlSerializer: namespace collision when returning generic contracts

The background
I'm developing a REST API for a C#.NET web application using WCF. I configured it to use the XmlSerializer rather than its default DataContractSerializer, for greater control over the XML format. I created a generic ResponseContract<TResponse, TErrorCode> data contract, which wraps the response with <Api> and <Response> for generic data like request status, error messages, and namespaces. An example method:
ResponseContract<ItemListContract, ItemListErrorCode> GetItemList(...)
An example response from the above method:
<?xml version="1.0" encoding="utf-8"?>
<Api xmlns="http://example.com/api/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Response Status="OKAY" ErrorCode="OKAY" ErrorText="">
<Data Template="ItemList">
<Pages Template="Pagination" Size="10" Index="1" Count="13" Items="126" />
<Items>
<Item example="..." />
<Item example="..." />
<Item example="..." />
  </Items>
</Data>
</Response>
</Api>
The problem
This works very well for services whose methods all have the same generic ResponseContract types. WCF or XmlSerializer expects each contract to have a unique name within its namespace, but the service is now returning a generic contract with different types having the same XML root name:
ResponseContract<ItemListContract, ItemListErrorCode> GetItemList(...)
ResponseContract<ItemContract, ItemErrorCode> GetItem(...)
With the resulting exception:
The top XML element 'Api' from namespace 'http://example.com/api/' references distinct types Company.Product.ApiServer.Contracts.ResponseContract`2[Company.Product.ApiServer.Contracts.Items.ItemListContract,Company.Product.ApiServer.Interfaces.Items.ItemListErrorCode] and Company.Product.ApiServer.Contracts.ResponseContract`2[Company.Product.ApiServer.Contracts.Items.ItemContract,Company.Product.ApiServer.Items.ItemErrorCode]. Use XML attributes to specify another XML name or namespace for the element or types.
The service must allow different return types. This is difficult to achieve because the ResponseContract<TResponse, TErrorCode> (which sets the name & namespace) is generic and returned by all API methods. I also need to maintain WSDL metadata integrity, which means no dynamic changes using reflection.
Attempted solutions
Changing the XML attributes declaratively is not possible, since the <Api> root element and its attributes are completely generic (in ResponseContract).
Changing the attribute namespace at runtime using reflection (eg, 'http://example.com/api/Items/GetItemList') has no effect. It's possible to get attributes, but changes to them have no effect. This would break WSDL anyway.
When implementing IXmlSerializable, the writer is already positioned after the <Api> start tag when WriteXml() is invoked. It's only possible to override the serialization of <Api>'s child nodes, which cause no problem anyway. This wouldn't work anyway, since the exception is thrown before the IXmlSerializable methods are called.
Concatenating the constant namespace with typeof() or similar to make it unique doesn't work, because the namespace must be a constant.
The default DataContractSerializer can insert type names into the name (like <ApiOfIdeaList>), but DataContractSerializer's output is bloated and unreadable and lacks attributes, which is not feasible for the external reusers.
Extending XmlRootAttribute to generate the namespace differently. Unfortunately there is no type information available when it's invoked, only the generic ResponseContract data. It's possible to generate a random namespace to circumvent the problem, but dynamically changing the schema breaks the WSDL metadata.
Making ResponseContract a base class instead of a wrapper contract should work, but would result in a lot of duplicated generic data. For example, <Pages> and <Item> in the example above are also contracts, which would have their own equivalent <Api> and <Response> elements.
Conclusion
Any ideas?
I got the tumbleweed badge for this one!
I abandoned the described approach because I couldn't find a viable solution. Instead, every contract inherits a nullable QueryStatus<TErrorCode> property from the generic BaseContract<TContract, TErrorCode>. This property is populated automatically for the main contract, and null for subcontracts.

Parsing an XML file -options?

I'm developing a system to pick up XML attachments from emails, via Exchange Web Services, and enter them into a DB, via a custom DAL object that I've created.
I've manage to extract the XML attachment and have it ready as a stream... they question is how to parse this stream and populate a DAL object.
I can create an XMLTextReader and iterate through each element. I don't see any problems with this other than that I suspect there is a much slicker way. The reader seems to treat the opening tag, the content of the tag and the closing tag as different elements (using reader.NodeType). I expected myValue to be considered one element rather than three. Like I said, I can get round this problem, but I'm sure there must be a better way.
I came across the idea of using an XML Serializer (completely new to me) but a quick look suggested that these can't handle ArrayLists and List (I'm using List).
Again, I'm new to LINQ, but LINQ-to-XML has also been mentioned, but examples I've seen seem rather complex - though that my simply be my lack of familiarity.
Basically, I don't want a cludged system, but I don't want to use any complicated technique with a learning curve, just because it's 'cool'.
What is the simplest and most effective way of translating this XML/Stream in to my DAL objects?
XML Sample:
<?xml version="1.0" encoding="UTF-8"?>
<enquiry>
<enquiryno>100001</enquiryno>
<companyname>myco</companyname>
<typeofbusiness>dunno</typeofbusiness>
<companyregno>ABC123</companyregno>
<postcode>12345</postcode>
<contactemail>me#example.com</contactemail>
<firstname>My</firstname>
<lastname>Name</lastname>
<vehicles>
<vehicle>
<vehiclereg>54321</vehiclereg>
<vehicletype>Car</vehicletype>
<vehiclemake>Ford</vehiclemake>
<cabtype>n/a</cabtype>
<powerbhp>130</powerbhp>
<registrationdate>01/01/2003</registrationdate>
</vehicle>
</vehicles>
</enquiry>
Update 1:
I'm trying to deserialize, based on Graham's example. I think I've set up the DAL for serialization, including specifying [XmlElement("whatever")] for each property. And I've tried to deserialize using the following:
SalesEnquiry enquiry = null;
XmlSerializer serializer = new XmlSerializer(typeof(SalesEnquiry));
enquiry = (SalesEnquiry)serializer.Deserialize(stream);
However, I get an exception:'There is an error in XML document (2, 2)'. The innerexception states {"<enquiry xmlns=''> was not expected."}
Conclusion (updated):
My previous problem was the fact that the element in the XML file (Enquiry) != the name of the class (SalesEnquiry). Rather than an [XmlElement] attribute for the class, we need an [XmlRoot] attribute instead. For completeness, if you want a property in your class to be ignored during serialization, you use the [XmlIgnore] attribute.
I've successfully serialized my object, and have now successfully taken the incoming XML and de-serialized it into a SalesEnquiry object.
This approach is far easier than manually parsing the XML. OK, there has been a steep learning curve, but it was worth it.
Thanks!
If your XML uses a schema (i.e. you're always going to know what elements appear, and where they appear in the tree), you could use XmlSerializer to create your objects. You'd just need some attributes on your classes to tell the serializer what XML elements or attributes they correspond to. Then you just load up your XML, create a new XmlSerializer with the type of the .NET object you want to create, and call the Deserialize method.
For example, you have a class like this:
[Serializable]
public class Person
{
[XmlElement("PersonName")]
public string Name { get; set; }
[XmlElement("PersonAge")]
public int Age { get; set; }
[XmlArrayItem("Child")]
public List<string> Children { get; set; }
}
And input XML like this (saved in a file for this example):
<?xml version="1.0"?>
<Person>
<PersonName>Bob</PersonName>
<PersonAge>35</PersonAge>
<Children>
<Child>Chris</Child>
<Child>Alice</Child>
</Children>
</Person>
Then you create a Person instance like this:
Person person = null;
XmlSerializer serializer = new XmlSerializer(typeof(Person));
using (FileStream fs = new FileStream(GetFileName(), FileMode.Open))
{
person = (Person)serializer.Deserialize(fs);
}
Update:
Based on your last update, I would guess that either you need to specify an XmlRoot attribute on the class that's acting as your root element (i.e. SalesEnquiry), or the XmlSerializer might be a bit confused that you're referencing an empty namespace in your XML (xmlns='' doesn't seem right).
XmlSerializer does support arrays & lists... as long as the contained type is serializable.
I have found Xsd2Code very helpful for this kind of thing: http://xsd2code.codeplex.com/
Basically, all you need to do is write an xsd file (an XML schema file) and specify a few command line switches. Xsd2Code will automatically generate a C# class file that contains all the classes and properties plus everything needed to handle the serialization. It's not a perfect solution as it doesn't support all aspects of XSD, but if your XML files are relatively simple collections of elements and attributes, it should be a nice short-cut for you.
There's another similar project on Codeplex called Linq to XSD (http://linqtoxsd.codeplex.com/), which was designed to enforce the entire XSD specification, but last time I checked, it was no longer being supported and not really ready for prime time. Thought it was worth a mention, though.

.NET XmlSerializer to Element FormDefault=Unqualified XML?

I use C# code more-or-less like this to serialize an object to XML:
XmlSerializer xs1 = new XmlSerializer(typeof(YourClassName));
StreamWriter sw1 = new StreamWriter(#"c:\DeserializeYourObject.xml");
xs1.Serialize(sw1, objYourObjectFromYourClassName);
sw1.Close();
I want it to serialize like this:
<ns0:Header xmlns:ns0="https://mynamespace/">
<SchemaVersion>1.09</SchemaVersion>
<DateTime>2009-12-15T00:00:01-08:00</DateTime>
but instead, it is doing this:
<Header xmlns="https://mynamespace/">
<SchemaVersion xmlns="">V109</SchemaVersion>
<DateTime xmlns="">2010-03-08T18:21:09.100125-08:00</DateTime>
The way it is serializing doesn't work with the XPath I had planned to use, and doesn't match my BizTalk schema. Originally I built the class using XSD.exe from a BizTalk 2006 schema, then I use it for an argument to a WCF web service.
This might be related to an option called element FormDefault = Qualified or Unqualified. In BizTalk, my I have the schema set to "Unqualfiied" which is what I want.
Is there any way for the serializer to output "unqualified" results?
Thanks,
Neal Walters
Update:
Sample attribute on DateTime:
/// <remarks/>
[System.Xml.Serialization.XmlElementAttribute(Form = System.Xml.Schema.XmlSchemaForm.Unqualified)]
public System.DateTime DateTime
{
get
{
return this.dateTimeField;
}
set
{
this.dateTimeField = value;
}
}
BizTalk provides for what it calls promoted (or distinguished) fields, which use XPath to pull out values of individual elements. I checked the XPath of BizTalk in a tool called StylusStudio, and Biztalk'x xpath didn't work with the xmlns='' fields above.
The first thing my WCF web service does is to serialize the object to a string (using UTF16 encoding) and store it in an XML column in a SQL database. It is from there I am seeing the above xml sample with the xmlns="".
XPath:
/*[local-name()='Header' and namespace-uri()='https://mynamespace/']/*[local-name()='DateTime' and namespace-uri()='']
The XPATH you're using does not match the namespaces of your XML. Your Header element, for instance, in in the https://mynamespace/, but your XPATH is searching in the http://mynamespace/ namespace.
My question was a bit muddled, so this answer may or may not help someone.
This is a fairly complex scenario, and half of my issues came from trying to simplify it to make an easy post here.
I was actually adding a new element programmatically with a C# routine (see "NewElement" below). The C# code did not set its namespace to an empty string, therefore I believe it is inheriting the namespace of the "Header" element.
I freaked out a little because I was jumping to the conclusion that DateTime should not have the "xmlns=""' when in fact it should. Even though DateTime falls under Header, it does not nor should not inherit the Header's namespace.
In BizTalk, typically only complex types have their own namespace, and DateTime as well as NewElement are simple types.
<Header xmlns="https://mynamespace/">
<SchemaVersion xmlns="">V109</SchemaVersion>
<DateTime xmlns="">2010-03-08T18:21:09.100125-08:00</DateTime>
<NewElement>myvalue</NewElement>
So in effect, the two XML's I posted originally are identical as far as XPath goes. If I insert a new element, I need to make sure it follows the same pattern.
I had written the C# routine to add the element more than a year ago, and it worked fine then, so I wasn't suspect that it was causing this problem.

Categories