how to verify that an XML file implements a specific schema

how to verify that an XML file implements a specific schema - c#

This is slightly different from just Schema validation. I'm asking how in C# you can check that not only is the document valid against a schema, but also validate that the schema actually applies to that document. I'd prefer a .NET / C# answer but any answer that fully respects the document standards will suffice.

I am making some assumptions as to what exactly you're looking for.
Having said "a specific schema" to me it means that you have a schema and that you're sifting through XML files trying to understand first if that schema should even be used to validate an XML.
First, I would lay some background... A "schema" could be that which one gets in a single file, or spread across multiple files. With multiple files, there are a couple of relationships possible between the XSD files: include, import, redefine; then there's the including of a schema file without a target namespace, by a schema file with a target namespace (this is typically called chameleon). So instead of "schema" I would prefer to use the term "schema set".
Some things to consider then:
A chameleon XSD in your "schema set" may not be intended to validate an XML having an unqualified document element.
A redefined XSD should not be used to validate matching XML content; the redefining XSD should.
Even though an XSD defines abc as a global element, it may not be acceptable to process XML instances that feature abc as the root element.
The above is to indicate that even though an XML may appear to implement a "specific schema", in itself it doesn't mean it matches the intent the author of the XSD placed in that schema.
Considering the above logic defined and implemented somehow, the verification I would do, as an answer to your question, would be to find the XSD definition of a non-abstract, global element - an XmlSchemaElement - in a specific XmlSchemaSet, using the full qualified name of the root element in the XML I am verifying.
System.Xml.Schema.XmlSchemaSet xset = ...; // Loaded somehow
System.Xml.XmlQualifiedName qn = ...; // LocalName + NamespaceURI
if (xset.GlobalElements.Contains(qn))
{
System.Xml.Schema.XmlSchemaElement el = (System.Xml.Schema.XmlSchemaElement)xset.GlobalElements[qn];
if (!el.IsAbstract)
{
// The XML file may implement the schemata loaded in this schema set.
}
}
I would expect this to at least help you improve your question, if I am off.

Related

How can I extract values as strings from an xml file based on the element/property name in a generated .Net class or the original XSD?

I have a large complex XSD set.
I have C# classes generated from those XSDs using xsd.exe. Naturally, though the majority of properties in the generated classes are strings, many are decimals, DateTimes, enums or bools, just as they should be.
Now, I have some UNVALIDATED data that is structured in the correct XML format, but may well NOT be able to pass XSD validation, let alone be put into an instance of the relevant .Net object. For example, at this stage, for all we know the value for the element that should be a DateTime could be "ABC" - not even parseable as a DateTime - let alone other string elements respecting maxLength or regex pattern restrictions. This data is ready to be passed in to a rules engine that we already have to make everything valid, including defaulting things appropriately depending on other data items, etc.
I know how to use the various types in System.Xml to read the string value of a given element by name. Clearly I could just hand craft code to get out all the elements that exist today by name - but if the XSD changes, the code would need to be reworked. I'd like to be able to either directly read the XSD or use reflection on the generated classes (including attributes like [System.Xml.Serialization.XmlTypeAttribute(TypeName=...] where necessary) to find exactly how to recursively query the XML down to the the raw, unverified string version of any given element to pass through to the ruleset, and then after the rules have made something valid of it, either put it back into the strongly typed object or back into a copy of the XML for serialization into the object.
(It has occurred to me that an alternative approach would be to somehow automatically generate a 'stringly typed' version of the object - where there are not DateTimes etc; nothing but strings - and serialize the xml into that. I have even madly thought of taking the xsd.exe generated .cs file and search/replacing all the enums and base types that aren't strings to strings, but there has to be a better way.)
In other words, is there an existing generic way to pull the XElement or attribute value from some XML that would correspond to a given item in a .Net class if it were serialized, without actually serializing it?

Sorry to self-answer, and sorry for the lack of actual code in my answer, but I don't yet have the permission of my employer to share the actual code on this. Working on it, I'll update here when there is movement.
I was able to implement something I called a Tolerant XML Reader. Unlike most XML deserializing, it starts by using reflection to look at the structure of the required .Net type, and then attempts to find the relevant XElements and interpret them. Any extra elements are ignored (because they are never looked for), any elements not found are defaulted, and any elements found are further interpreted.
The main method signature, in C#, is as follows:
public static T TolerantDeserializeIntoType<T>(
XDocument doc,
out List<string> messagesList,
out bool isFromSuppliedData,
XmlSchemaSet schemas = null,
bool tolerant = true)
A typical call to it might look like this:
List<string> messagesList;
bool defaultOnly;
SomeType result = TolerantDeserializeIntoType<SomeType>(someXDocument, out messagesList, out defaultOnly);
(you may use var; I just explicitly put the type there for clarity in this example).
This will take any XDocument (so the only criteria of the original was that it was well-formed), and make an instance of the specified type (SomeType, in this example) from it.
Note that even if nothing at all in the XML is recognized, it will still not fail. The new instance will simply have all properties / public fields nulled or defaulted, the MessageList would list all the defaulting done, and the boolean out paramater would be FALSE.
The recursive method that does all the work has a similar signature, except it takes an XElement instead of an XDocument, and it does not take a schemaSet. (The present implementation also has an explicit bool to indicate a recursive call defaulting to false. This is a slightly dirty way to allow it to gather all failure messages up to the end before throwing an error if tolerant is false; in a future version I will refactor that to only expose publicly a version without that, if I even want to make the XElement version public at all):
public static T TolerantDeserializeXElementIntoType<T>(
ref XElement element,
ref List<string> messagesList,
out bool isFromSuppliedValue,
bool tolerant = true,
bool recursiveCall = false)
How it works, detail
Starting with the main call, the one with with an XDocument and optional SchemaSet:
If a schema Set that will compile is supplied (actually, it also looks for xsi:noNamespaceSchemaLocation as well) the initial XDocument and schemaSet call runs a standard XDocument.Validate() across the supplied XDocument, but this only collects any issued validation error callbacks. It won't throw an exception, and is done for only two reasons:
it will give some useful messages for the MessageList, and
it will populate the SchemaInfo of all XElements to
possibly use later in the XElement version.
(note, however, that the
schema is entirely optional. It is actually only used to resolve
some ambiguous situations where it can be unclear from the C#
object if a given XElement is mandatory or not.)
From there, the recursive XElement version is called on the root node and the supplied C# type.
I've made the code look for the style of C# objects generated by xsd.exe, though most basic structured objects using Properties and Fields would probably work even without the CustomAttributes that xsd.exe supplies, if the Xml elements are named the same as the properties and fields of the object.
The method looks for:
Arrays
Simple value types, explicitly:
String
Enum
Bool
then anything
else by using the relevant TryParse() method, found by reflection.
(note that nulls/xsi:nill='true' values also have to be specially
handled)
objects, recursively.
It also looks for a boolean 'xxxSpecified' in the object for each field or property 'xxx' that it finds, and sets it appropriately. This is how xsd.exe indicates an element being omitted from the XML in cases where null won't suffice.
That's the outline. As I said, I may be able to put actual code somewhere like GitHub in due course. Hope this is helpful to someone.

DataContract deserialization fails due to incorrect ordering of XML nodes

I am baffled with the behavior of the DataContractSerializer. Our configuration is XML based. XML is used as source for DataContractSerializer.ReadObject method. Recently I have encountered a problem when some properties of deserialized object were not set. I have tracked the changes and discovered that those properties were added into XML manually. Which is OK in my opinion. Apparently it was not OK in DataContractSerializer's opinion because it appears it expects XML nodes to be ordered alphabetically. Really?! Deserialization seems like really straightforward thing - read XML sequentially, parse node name, set corresponding property. What's the purpose of ordering?
Is there a workaround? Maybe some sort of settings for DataContractSerializer?

I encountered this problem recently. To work around it, I used the XmlSerializer and removed the explicit ordering from the XmlElement attributes:
set proxy_tool="C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\SvcUtil.exe" /nologo /t:code /ser:XmlSerializer /UseSerializerForFaults
set sed_tool="$(ProjectDir)sed.exe" -r -i "s/,?[[:space:]]*Order=[[:digit:]]+//"
%proxy_tool% /o:"Proxy1.cs" /n:*,Namespaces.Name1 "Proxy1.wsdl"
%sed_tool% "Proxy1.cs"
%proxy_tool% /o:"Proxy2.cs" /n:*,Namespaces.Name2 "Proxy2.wsdl"
%sed_tool% "Proxy2.cs"
...
There's some more information on my blog post.
If you want to know why the order matters, it's because a sequence in XSD has a defined order, and the web service contracts are defined with XSD.
From the specification:
The consequence of this definition is that any element appearing in an instance whose type is declared to be USAddress (e.g. shipTo in po.xml) must consist of five elements and one attribute. These elements must be called name, street, city, state and zip as specified by the values of the declarations' name attributes, and the elements must appear in the same sequence (order) in which they are declared.

You can use the Order member of DataMemberAttribute to help with this, but in most cases: XML is order-specific (for elements, not attributes) - so it isn't specifically wrong.
That said: if you want fine control over XML serialization, DataContractSerializer is a poor choice XmlSerializer offers more control - and is less fussy re order iirc.

How can I deserialize data from xml and how can I create classes dynamically by xml element name?

I have a dynamic XML structure. I mean, root name can be changed, element name can be changed and structure of xml can be changed.So How can I deserialize dynamic xml to objects?
Creating classes dynamically by xml element names, is this possible? If yes, then how can I do this?

So How can I deserialize this type xml to objects?
Well, you cannot deserialize a XML without a specific XSD schema to statically typed classes. You could use a XDocument or XmlReader classes to parse and extract information from it.

If you're using .Net 4, the ExpandoObject might be what you need:
Represents an object whose members can be dynamically added and removed at run time.

You want to look into the factory pattern. The basic idea is:
for each XML node, get the type name / name of node
ask factory for type with given name
create object of the given type
pass the XML node to object to allow it to load its data
XSD schemas help here to verify the XML is structurally correct although not necessary. I did try this once, creating objects from XML, but I never found a way to have multiple schema files for one XML (this would have allowed me to extend the schema without changing the root schema, if you see what I mean).

XML: read children of an element

I'm looking into the possibility of storing settings in an XML file. Here's a simplified version of my code. You don't see it here, but this code is located inside a try block so that I can catch any XmlException that comes up.
XmlReader fileReader = XmlReader.Create(Path.GetDirectoryName(Application.ExecutablePath) + "\\settings.xml");
// Start reading
while (fileReader.Read())
{
// Only concern yourself with start tags
if (fileReader.IsStartElement())
{
switch (fileReader.Name)
{
// Calendar start tag detected
case "Calendar":
//
//
// Here's my question: can I access the children
// of this element here?
//
//
break;
}
}
}
// Close the XML reader
fileReader.Close();
Can I access the children of a certain element on the spot in the code where I put the big comment?

There are applications where using XmlReader to read through an XML document is the right answer. Unless you have hundreds of thousands of user settings that you want to read (you don't) and you don't need to update them (you do), XmlReader is the wrong choice.
Since you're talking about wanting to store arrays and structs in your XML, it seems obvious to me that you want to be using .NET's built in serialization mechanisms. (You can even use XML as the serialization format, if accessing the XML is important to you.) See this page for a starting point.
If you design your settings classes properly (which is not hard), you can serialize and deserialize them without having to write code that knows anything about the names and data types of the properties you're serializing. Your code that accesses the settings will be completely decoupled from the implementation details of how they are persisted, which, as Martha Stewart would say, is a Good Thing.

Check this short article for a solution.
And by the way, you should use LINQ to XML if you want to work easy with XML (overview).

You can use any of the following methods on XmlReader:
ReadSubtree()
ReadInnerXml()
ReadOuterXml()
ReadStartElement()
to read the child content. Which one you use depends on exactly what you wish to do with said child content.

App.config and create a custom section.
App.Config and Custom Configuration Sections

How can I load .xsd file to track relationships(nesting) between elements?

I know, where is XmlSchema class in dot net, but it does not allow to load file into it. Is there a clean solution to load xsd file and travel between elements?

You need to use XmlSchemaSet, e.g.:
XmlSchemaSet schemaSet = new XmlSchemaSet();
schemaSet.Add(targetNamespace, schemaUri);
schemaSet.Compile();
foreach(XmlSchemaElement element in schemaSet.GlobalElements.Values)
{
// do stuff...
}
EDIT: Sorry for not being clearer.
Where the comment says // do stuff..., what you need to do is traverse the inherited types of each element, which are available under XmlSchemaElement.SchemaType, and the inline type of the element, which is available under XmlSchemaElement.ElementSchemaType.
The MSDN does contain all the the information you're after, but it is somewhat maze-like and requires digging around plus a bit of trial-and-error to work it out.
As per my comment, the following class from one of my open-source side-projects may be of use to you here:
http://bitbucket.org/philbooth/schemabrute/src/tip/LibSchemaBrute/Navigator.cs

I suggest checking out this tool: http://www.altova.com/xmlspy.html. You can create a data model from any XSD file. I've used it in many XML projects.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.