Validate XML subset using schema subset in C# .NET XmlDocument

Currently I have a solution that builds an XML document in a number of sections and then validates the final concatenated xml against a single schema. Is it possible to use a subset of the same schema to validate each section individually?

Yes, in most cases. As a disclaimer: in theory someone could deliberately write an XML Schema that makes some of my proposals impossible, but that would simply be bad practice in XSD authoring.
For a straightforward solution, the following assumptions should hold:
Each section is well-formed XML, i.e. you're concatenating XmlElement nodes. E.g.:
<section-element ... attribute content>
... more content
</section-element>
Each of the sections being merged has a matching global element declaration in your XML Schema set. If you use the xsi:type attribute for any of your sections, things might get a bit tricky, but not hard to fix.
The validation would be common code, where the XmlReader would be an XmlNodeReader on the node you're concatenating. Use the XmlReaderSettings as usual...
The above would work for any XSD (you have no design-time dependency on knowing the XSD). For anything below, the code would have to match your XSD...
If you don't have matching global elements in the XML Schema, then you have to look at the type of each matching local element declaration. If the type is global, you can easily create, in memory, dummy global elements of that type that match your sections (assuming a Venetian Blind authoring style).
If the type is anonymous (more of a Russian Doll style), you can fake that too, by creating a global element whose type is a copy of the anonymous type, again all in memory.
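The common validation step described above (an XmlNodeReader over the section node, wrapped in a validating reader built from XmlReaderSettings) might look roughly like the sketch below. The class and method names are made up for illustration, and it assumes the XmlSchemaSet already contains a global element declaration for each section's root element. Warnings are switched on because, as far as I know, an element with no matching declaration is otherwise skipped silently.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper, not part of the framework.
public static class SectionValidator
{
    // Validates a single section node against a schema set and returns the
    // collected messages (an empty list means the section validated cleanly).
    public static List<string> Validate(XmlNode section, XmlSchemaSet schemas)
    {
        var messages = new List<string>();
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemas
        };
        // Report warnings too, e.g. "could not find schema information".
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler +=
            (sender, e) => messages.Add($"{e.Severity}: {e.Message}");

        // Wrap the node reader in a validating reader and read to the end.
        using (XmlReader reader = XmlReader.Create(new XmlNodeReader(section), settings))
        {
            while (reader.Read()) { }
        }
        return messages;
    }
}
```

You would call Validate on each section element before concatenating it, reusing the same compiled XmlSchemaSet for every call.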

Related

Parsing XML file in C# - how to report errors

First I load the file in a structure
XElement xTree = XElement.Load(xml_file);
Then I create an enumerable collection of the elements.
IEnumerable<XElement> elements = xTree.Elements();
And iterate elements
foreach (XElement el in elements)
{
}
The problem is: when I fail to parse an element (the user made a typo or inserted a wrong value), how can I report the exact line in the file?
Is there any way to tie an element to its corresponding line in the file?
One way to do it (although not a proper one) –
When you find a wrong value, add an invalid char (e.g. ‘<’) to it.
So instead of: <ExeMode>Bla bla bla</ExeMode>
You’ll have: <ExeMode><Bla bla bla</ExeMode>
Then load the XML again with try / catch (System.Xml.XmlException ex).
This XmlException has LineNumber and LinePosition.
If there is a limited set of acceptable values, I believe XML Schemas have the concept of an enumerated type -- so write a schema for the file and have the parser validate against that. Assuming the parser you're using supports Schemas, which most should by now.
I haven't looked at DTDs in decades, but they may have the same kind of capability.
Otherwise, you would have to consider this semantic checking rather than syntactic checking, and that makes it your application's responsibility. If you are using a SAX parser and interpreting the data as you go, you may be able to get the line number; check your parser's features.
Otherwise the best answer I've found is to report the problem using an xpath to the affected node/token rather than a line number. You may be able to find a canned solution for that, either as a routine to run against a DOM tree or as a state machine you can run alongside your SAX code to track the path as you go.
(All the "maybe"s are because I haven't looked at what's currently available in a very long time, and because I'm trying to give an answer that is valid for all languages and parser implementations. This should still get you pointed in some useful directions.)
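For the LINQ to XML code in the question specifically, line numbers can be kept at load time. The sketch below assumes loading with LoadOptions.SetLineInfo and reading the position back through the IXmlLineInfo interface, which XElement implements. The class and method names are only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper for illustration.
public static class LineInfoDemo
{
    // Returns "name at line:position" for each child element of the root.
    public static List<string> DescribeChildren(TextReader xml)
    {
        // SetLineInfo asks the loader to record where each node started.
        XElement tree = XElement.Load(xml, LoadOptions.SetLineInfo);
        var result = new List<string>();
        foreach (XElement el in tree.Elements())
        {
            IXmlLineInfo info = el; // XElement implements IXmlLineInfo explicitly
            if (info.HasLineInfo())
                result.Add($"{el.Name} at {info.LineNumber}:{info.LinePosition}");
        }
        return result;
    }
}
```

When parsing an element's value fails, the same IXmlLineInfo cast on that element gives the line to put in the error message. Elements created in code rather than loaded from a file carry no line information, hence the HasLineInfo() check.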

Get list of XSD elements in C#

I have an XSD file and want to get a list of the names of all the elements in it. I don't mean stuff like <xs:sequence> and so on, just the "real stuff", that actually can appear in XML that are valid according to the XSD.
"Real stuff" is a bit vague.
But if you just want all elements, it's just a bit of XPath.
If you want a tree, then you can't avoid sequence etc.
If you have things like xs:choice in there you have even more issues.
Then there's attributes...
From SimpleContent or ComplexType...
Might be easier to generate a 'blank' xml document from the xsd and then get what you want out of that. That's a fair chunk of code as well though. Might be one lying around you can borrow though.
If you don't actually want to do this from your code, you could use the XML Schema Definition Tool (Xsd.exe) to create source code for runtime objects.
From there you can use Xml serialization to create valid Xml samples for your given Xsd schema.
Since you're trying to code for this, I would assume you want to do this against different XML Schema files, over and over; if true, it would be then important to understand if you really have to embed this in your codebase, or if it can be used as an external tool.
If you really want to do it, most of what you need is in the System.Xml.Schema namespace. Start with an XmlSchemaSet to load and compile your XSD files. Then iterate over GlobalElements to visit the global elements that can appear as root elements in an XML document, and traverse those (for what you need, use the post-compilation PSVI properties). As someone else mentioned, there will be types and compositors to go through, and then there's more: abstract elements (which can't appear in XML themselves, and neither can references to them; members of their substitution groups appear instead), prohibited attributes, restricted types, etc.
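As a rough sketch of that traversal (not a complete solution): the helper below compiles the set, starts from GlobalElements and follows each complex type's ContentTypeParticle through sequence/choice/all groups. The class name is made up. It deliberately ignores the harder cases mentioned above: substitution groups, abstract elements, wildcards (xs:any) and attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper for illustration.
public static class SchemaElementLister
{
    // Collects the qualified names of all element declarations reachable
    // from the global elements of a schema set.
    public static SortedSet<string> ListElementNames(XmlSchemaSet schemas)
    {
        schemas.Compile(); // post-compilation properties are only set after this
        var names = new SortedSet<string>();
        var seenTypes = new HashSet<XmlSchemaType>(); // guards against recursive types
        foreach (XmlSchemaElement global in schemas.GlobalElements.Values)
            VisitElement(global, names, seenTypes);
        return names;
    }

    static void VisitElement(XmlSchemaElement element, SortedSet<string> names,
                             HashSet<XmlSchemaType> seenTypes)
    {
        names.Add(element.QualifiedName.ToString());
        // Descend into each complex type only once.
        if (element.ElementSchemaType is XmlSchemaComplexType complex && seenTypes.Add(complex))
            VisitParticle(complex.ContentTypeParticle, names, seenTypes);
    }

    static void VisitParticle(XmlSchemaParticle particle, SortedSet<string> names,
                              HashSet<XmlSchemaType> seenTypes)
    {
        switch (particle)
        {
            case XmlSchemaElement child:
                VisitElement(child, names, seenTypes);
                break;
            case XmlSchemaGroupBase group: // xs:sequence, xs:choice, xs:all
                foreach (XmlSchemaObject item in group.Items)
                    if (item is XmlSchemaParticle nested)
                        VisitParticle(nested, names, seenTypes);
                break;
            // Anything else (empty content, xs:any) is skipped in this sketch.
        }
    }
}
```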
I've recently answered another post that may be related to your need; your posted XML Schema may look like this:
root/ship/engine/@MaxSpeed,A,1..1,True
root/ship/crew/@function,A,1..1,True
root/ship/@Name,A,1..1,True
root/ship/@class,A,1..1,True
root/ship/special_abilities/hull/@separable,A,0..1,False
root/ship/special_abilities/hull/@canCarryWesley,A,0..1,False
root/ship/special_abilities/hull/@capableOfLanding,A,0..1,False
If you want, you can deal only with the first column; the generated XPath shows only those items (elements or attributes) that have data; processing something like the above might be much easier (split the string using /, elements are all but the @ items, etc.)

XML Schema validation - intra-field validation

This is the scenario/problem I am trying to solve: within a sequence of elements in my XSD I have an element, say XYZ, which can be nillable if one of the preceding elements, say ABC, has a certain value, say "Alpha". If that preceding element ABC has a different value, then XYZ must not be nillable.
What is the best approach to solve this problem?
I am using C# & SQL Server.
Is it possible to define new attributes within a XSD?
Really an XSD should be fixed to control the structure and format of the elements and attributes. What you are attempting to do is implement business rules, which cannot be validated using an XSD.
However, there is a framework available for implementing business rules in XML, it is an ISO standard called Schematron. Schematron basically uses a combination of XPath to implement the logic and XSLT to perform the validation.
There is a .NET project for this known as Schematron.NET.
This may be interesting reading 'Improving XML Document Validation with Schematron'.
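If pulling in Schematron is too much, the same kind of XPath assertion can be evaluated directly from C# after the XSD validation has passed. The sketch below is plain XPath 1.0 via System.Xml.XPath, not Schematron itself. It assumes ABC and XYZ are siblings in no namespace and that "nil" means xsi:nil='true' (the equally legal '1' is not handled). The class name is invented.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper for illustration.
public static class NillableRule
{
    // Selects every XYZ that is nil although no preceding sibling ABC equals "Alpha".
    const string ViolationXPath =
        "//XYZ[@xsi:nil = 'true' and not(preceding-sibling::ABC = 'Alpha')]";

    public static int CountViolations(XDocument doc)
    {
        var ns = new XmlNamespaceManager(new NameTable());
        ns.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
        return doc.XPathSelectElements(ViolationXPath, ns).Count();
    }
}
```

In the XSD itself XYZ would simply be declared nillable; the conditional part of the rule lives in this second pass.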

Declare namespaces within XPath expression

My application needs to evaluate an XPath expression against some XML data. The expression is provided by the user at runtime, so I cannot create an XmlNamespaceManager to pass to XPathEvaluate because I don't know the prefixes and namespaces at compile time.
Is there any possibility to specify namespaces declaration within xpath expression?
Answers to comments:
XML data has one default namespace but there can be nested elements with any namespaces. User knows namespaces of the data he works with.
User-provided xpath expression is to be evaluated against many XML documents, and every document can have its own prefixes for the same namespaces.
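If the same prefix can be bound to different namespaces and prefixes aren't known in advance, then the only pure XPath way to specify such expressions is to use this form of referring to elements:
*[local-name() = 'someName' and namespace-uri() = 'exactNamespace']
(In XPath 1.0 an unprefixed name test such as someName matches only elements in no namespace, so the name itself has to be tested with local-name().)
So, a particular XPath expression would be:
/*/*[local-name() = 'a' and namespace-uri() = 'defaultNS']/*[local-name() = 'b' and namespace-uri() = 'NSB']
/*[local-name() = 'c' and namespace-uri() = 'defaultNS']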
I don't know any way to define a namespace prefix in an XPath expression.
But you can write the XPath expression to be agnostic of namespace-prefixes by using local-name() and namespace-uri() functions where appropriate.
Or if you know the XML-namespaces in advance, you can register an arbitrary prefix for them in the XmlNamespaceManager and tell your user to use that prefix in the XPath expression. It doesn't matter if the XML document itself registers a different prefix or no prefix at all. Path resolution is based on the namespace alone, not on the prefix.
Another option would be to scan the document at runtime (use XmlReader for low resource overhead if you haven't loaded it already) and then add the used mappings in the document in the XmlNamespaceManager. I'm not sure if you can get the namespaces and prefixes from XmlDocument, but I see no direct method to do it. It's easy with XmlReader though, since it exposes NamespaceURI and Prefix members for each node.
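A minimal sketch of that scanning approach, with an invented class name: it walks the document with an XmlReader and registers the Prefix/NamespaceURI pair of every prefixed element and attribute. It assumes the first binding seen for a prefix is the one you want, and it cannot help with the default namespace, which has no prefix to register.

```csharp
using System;
using System.Xml;

// Hypothetical helper for illustration.
public static class NamespaceCollector
{
    // Builds an XmlNamespaceManager from the prefixes actually used in a document.
    public static XmlNamespaceManager Collect(XmlReader reader)
    {
        var manager = new XmlNamespaceManager(reader.NameTable);
        while (reader.Read())
        {
            if (reader.NodeType != XmlNodeType.Element) continue;
            Register(manager, reader.Prefix, reader.NamespaceURI);
            // Prefixed attributes can introduce further namespaces.
            while (reader.MoveToNextAttribute())
                Register(manager, reader.Prefix, reader.NamespaceURI);
        }
        return manager;
    }

    static void Register(XmlNamespaceManager manager, string prefix, string uri)
    {
        // Skip unprefixed names and the reserved xml/xmlns prefixes.
        if (prefix.Length == 0 || prefix == "xml" || prefix == "xmlns") return;
        // First binding wins in this sketch.
        if (manager.LookupNamespace(prefix) == null)
            manager.AddNamespace(prefix, uri);
    }
}
```

The resulting manager can then be passed to SelectNodes or XPathEvaluate together with the user's expression. Since each document may use different prefixes for the same namespace, it would have to be rebuilt per document.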
Is there any possibility to specify namespaces declaration within xpath expression?
The answer is no - it's always done in the calling environment (which is actually more flexible).
An alternative would be to use XQuery, which does allow declaring namespaces in the query prolog.
UPDATE (2020)
In XPath 3.1 you can use the syntax /*/Q{http://my-namespace}a.
Sadly, though, if you're still using Microsoft software, then the situation hasn't changed since 2011 - you're still stuck with XPath 1.0 with all its shortcomings.

LINQ to SQL to XML (using XML literals in C#)

Is it possible to use variables like <%=person.LastName %> in XML string this way?
XElement letters = new XElement("Letters");
XElement xperson = XElement.Parse("<Table><Row><Cell><Text>
<Segment>Dear <%=person.Title%> <%=person.FirstName%> <%=person.LastName%>,
</Segment></Text></Cell></Row>...");
foreach (Person person in persons){
letters.Add(xperson)
}
If it's possible, it would be a lifesaver since I can't use XElement and XAttribute to add the nodes manually. We have multiple templates and they frequently change (edit on the fly).
If this is not doable, can you think of another way so that I can use templates for the XML?
It looks like it's possible in VB:
http://aspalliance.com/1534_Easy_SQL_to_XML_with_LINQ_and_Visual_Basic_2008.6
This is an exclusive VB.NET feature known as XML Literals. It was added in VB 9.0. C# does not currently support this feature. Although Microsoft has stated its intent to bridge the gap between the languages in the future, it's not clear whether this feature will make it to C# any time soon.
Your example doesn't seem clear to me. You would want to have the foreach loop before parsing the actual XML since the values are bound to the current Person object. For example, here's a VB example of XML literals:
Dim xml = <numbers>
              <%= From i In Enumerable.Range(1, 3)
                  Select <number><%= i %></number>
              %>
          </numbers>
For Each e In xml.Elements()
    Console.WriteLine(e.Value)
Next
The above snippet builds the following XML:
<numbers>
<number>1</number>
<number>2</number>
<number>3</number>
</numbers>
Then it writes 1, 2, 3 to the console.
If you can't modify your C# code to build the XML dynamically then perhaps you could write code that traverses the XML template and searches for predetermined fields in attributes and elements then sets the values as needed. This means you would have to iterate over all the attributes and elements in each node and have some switch statement that checks for the template field name. If you encounter a template field and you are currently iterating attributes you would set it the way attributes are set, whereas if you were iterating elements you would set it the way elements are set. This is likely not the most efficient approach but is one solution.
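A minimal sketch of that template-traversal idea in C#, assuming a made-up placeholder syntax of {FieldName} inside the template (the <%= ... %> form from the question would not survive XElement.Parse as written). The class name and token format are assumptions, not an existing API.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper for illustration.
public static class TemplateFiller
{
    // Returns a filled-in copy of the template; the template itself is left untouched.
    public static XElement Fill(XElement template, IDictionary<string, string> values)
    {
        var copy = new XElement(template); // deep copy
        foreach (XElement el in copy.DescendantsAndSelf())
        {
            // Attributes are set one way...
            foreach (XAttribute attr in el.Attributes())
            {
                string replaced = Substitute(attr.Value, values);
                if (replaced != attr.Value) attr.Value = replaced;
            }
            // ...and element text another way.
            foreach (XNode node in el.Nodes())
            {
                if (node is XText text)
                    text.Value = Substitute(text.Value, values);
            }
        }
        return copy;
    }

    // Replaces every "{Key}" token with its value.
    static string Substitute(string input, IDictionary<string, string> values)
    {
        foreach (KeyValuePair<string, string> pair in values)
            input = input.Replace("{" + pair.Key + "}", pair.Value);
        return input;
    }
}
```

In the question's loop you would build a dictionary from each Person (Title, FirstName, LastName) and add Fill(xperson, dict) to letters. Because the values are assigned to text nodes and attributes rather than spliced into an XML string, characters such as < or & in the data are escaped on output.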
The simplest solution would be to use VB.NET. You can always develop it as a stand-alone project, add a reference to the dll from the C# project, pass data to the VB.NET class and have it return the final XML.
EDIT: to clarify, using VB.NET doesn't bypass the need to update the template. It allows you to specify the layout easier as an XML literal. So code still needs to be updated in VB once the layout changes. You can't load an XML template from a text file and expect the fields to be bound that way. For a truly dynamic solution that allows you to write the code once and read different templates my first suggestion is more appropriate.
