Gracefully handle validation errors in a XML file in C# - c#

The description is bit on the longer side please bear with me. I would like to process and validate a huge XML file and log the node which triggered the validation error and continue with processing the next node. A simplified version of the XML file is shown below.
What I would like to perform is on encountering any validation error processing node 'A' or its children (both XMLException and XmlSchemaValidationException) I would like to stop processing current node log the error and XML for node 'A' and move on to the next node 'A'.
<Root>
<A id="A1">
<B Name="B1">
<C>
<D Name="ID" >
<E>Test Text 1</E>
</D>
<D Name="text" >
<E>Test Text 1</E>
</D>
</C>
</B>
</A>
<A id="A2">
<B Name="B2">
<C>
<D Name="id" >
<E>Test Text 3</E>
</D>
<D Name="tab1_id" >
<E>Test Text 3</E>
</D>
<D Name="text" >
<E>Test Text 3</E>
</D>
</C>
</B>
</Root>
I am currently able to recover from the XmlSchemaValidationException by using a ValidationEventHandler with XMLReader which throws a Exception that I handle in the XML Processing code. However for some cases XMLException is being triggered which leads to termination of the process.
The following snippets of the code illustrate the current structure I am using; it is messy and code improvement suggestions are also welcome.
// Setting up the XMLReader
XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Auto;
settings.IgnoreWhitespace = true;
settings.CloseInput = true;
settings.IgnoreComments = true;
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add(null, "schema.xsd");
settings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);
XmlReader reader = XmlReader.Create("Sample.xml", settings);
// Processing XML
while (reader.Read())
if (reader.NodeType == XmlNodeType.Element)
if (reader.Name.Equals("A"))
processA(reader.ReadSubtree());
reader.Close();
// Process Node A
private static void processA(XmlReader A){
try{
// Perform some book-keeping
// Process Node B by calling processB(A.ReadSubTree())
}
catch (InvalidOperationException ex){
}
catch (XmlException xmlEx){
}
catch (ImportException impEx){
}
finally{ if (A != null) A.Close(); }
}
// All the lower level process node functions propagate the exception to caller.
private static void processB(XmlReader B){
try{
// Book-keeping and call processC
}
catch (Exception ex){
throw ex;
}
finally{ if (B != null) B.Close();}
}
// Validation event handler
private static void ValidationCallBack(object sender, ValidationEventArgs e){
String msg = "Validation Error: " + e.Message +" at line " + e.Exception.LineNumber+
" position number "+e.Exception.LinePosition;
throw new ImportException(msg);
}
When a XMLSchemaValidationException is encountered the finally block will invoke close() and the original XMLReader is being positioned on the EndElement of the subtree and hence the finally block in processA will lead to processing of the next node A.
However when a XMlException is encountered invoking the close method is not positioning the original reader on the EndElement node of the subtree and an InvalidOperationException is being throw.
I tried to use methods like skip, ReadToXYZ() methods but these are invariably leading to XMLExcpetion of InvalidOperationException when invoked on any node that triggered an exception.
The following is a excerpt from MSDN regarding the ReadSubTree method.
When the new XmlReader has been
closed, the original XmlReader will be
positioned on the EndElement node of
the sub-tree. Thus, if you called the
ReadSubtree method on the start tag of
the book element, after the sub-tree
has been read and the new XmlReader
has been closed, the original
XmlReader is positioned on the end tag
of the book element.
Note: I cannot use .Net 3.5 for this, however .Net 3.5 suggestions are welcome.

See this question:
XML Parser Validation Report
You need to distinguish between well-formed xml (it follows the rules required to be real xml) and valid xml (follows additional rules given by a specific xml schema). From the spec:
Once a fatal error is detected, however, the processor must not continue normal processing (i.e., it must not continue to pass character data and information about the document's logical structure to the application in the normal way).
For better or worse, the xml tools included with Visual Studio need to follow that spec very closely, and therefore will not continue processing if there is a well-formedness error. The link I provided might give you some alternatives.

I have done this almost like you have, except for the exception eating and questionable use XmlReader.Close():
XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Auto;
settings.IgnoreWhitespace = true;
settings.CloseInput = true;
settings.IgnoreComments = true;
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add(null, "schema.xsd");
settings.ValidationEventHandler += ValidationCallBack;
using (XmlReader reader = XmlReader.Create("Sample.xml", settings))
{
while (reader.Read())
{
if (reader.NodeType == XmlNodeType.Element &&
reader.Name.Equals("A"))
{
using (
XmlReader subtree = reader.ReadSubtree())
{
try {
processA(subtree);
}
catch (ImportException ex) { // log it at least }
catch (XmlException ex) {// log this too }
// I would not catch InvalidOperationException - too general
}
}
}
}
private static void processA(XmlReader A)
{
using (XmlReader subtree = A.ReadSubtree())
{
processB(subtree);
}
}
// All the lower level process node functions propagate the exception to caller.
private static void processB(XmlReader B)
{
}
You don't want to eat the exceptions, except for those from processA. Let the caller of processA decide wihch exceptions to ignore - processA should not be aware of that. Same with the using blocks - put them on the outside, rather than the inside.

Related

Partial XML file validation using XSD

I am trying to use the XDocument class and XmlSchemaSet class to validate an XMl file.
The XML file already exists but I want to add in just a single element consisting of a couple other elements and I only want to validate this node.
Here is an example of the XML file. The piece I would like to validate is the TestConfiguration node:
<?xml version="1.0" encoding="ISO-8859-1"?>
<Root>
<AppType>Test App</AppType>
<LabelMap>
<Label0>
<Title>Tests</Title>
<Indexes>1,2,3</Indexes>
</Label0>
</LabelMap>
<TestConfiguration>
<CalculateNumbers>true</CalculateNumbers>
<RoundToDecimalPoint>3</RoundToDecimalPoint>
</TestConfiguration>
</Root>
Here is my xsd so far:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="TestConfiguration"
targetNamespace="MyApp_ConfigurationFiles" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="TestConfiguration">
<xs:complexType>
<xs:sequence>
<xs:element name="CalculateNumbers" type="xs:boolean" minOccurs="1" maxOccurs="1"/>
<xs:element name="RoundToDecimalPoint" type="xs:int" minOccurs="1" maxOccurs="1"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Here is the code I use to validate it:
private bool ValidateXML(string xmlFile, string xsdFile)
{
string xsdFilePath = Path.Combine(Path.GetDirectoryName(Assembly.GetEntryAssembly().Location) ?? string.Empty, xsdFile);
Logger.Info("Validating XML file against XSD schema file.");
Logger.Info("XML: " + xmlFile);
Logger.Info("XSD: " + xsdFilePath);
try
{
XDocument xsdDocument = XDocument.Load(xsdFilePath);
XmlSchemaSet schemaSet = new XmlSchemaSet();
schemaSet.Add(XmlSchema.Read(new StringReader(xsdDocument.ToString()), this.XmlValidationEventHandler));
XDocument xmlDocument = XDocument.Load(xmlFile);
xmlDocument.Validate(schemaSet, this.XmlValidationEventHandler);
}
catch (Exception e)
{
Logger.Info("Error parsing XML file: " + xmlFile);
throw new Exception(e.Message);
}
Logger.Info("XML validated against XSD.");
return true;
}
Even validating the full XML file, the validation will pass successfully causing me to run into problems when I try to load the XML file into the generated class file created by xsd2code, the error: <Root xmlns=''> was not expected..
How can I validate just the TestConfiguration piece?
Thanks
You have a few issues here:
Validating the entire document succeeds when it should fail.
This happens because the root node is unknown to the schema, and encountering an unknown node is considered a validation warning not a validation error - even if that unknown node is the root element. To enable warnings while validating, you need to set XmlSchemaValidationFlags.ReportValidationWarnings. However, there's no way to pass this flag to XDocument.Validate(). The question XDocument.Validate is always successful shows one way to work around this.
Having done this, you must also throw an exception in your validation handler when ValidationEventArgs.Severity == XmlSeverityType.Warning.
(As for requiring a certain root element in your XSD, this is apparently not possible.)
You need a convenient way to validate elements as well as documents, so you can validate your <TestConfiguration> piece.
Your XSD and XML are inconsistent.
You XSD specifies that your elements are in the XML namespace MyApp_ConfigurationFiles in the line targetNamespace="MyApp_ConfigurationFiles" elementFormDefault="qualified". In fact the XML elements shown in your question are not in any namespace.
If the XSD is correct, your XML root node needs to look like:
<Root xmlns="MyApp_ConfigurationFiles">
If the XML is correct, your XSD needs to look like:
<xs:schema id="TestConfiguration"
elementFormDefault="unqualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
After you have resolved the XSD and XML inconsistency from #3, you can solve issues #1 and #2 by introducing the following extension methods that validate both documents and elements:
public static class XNodeExtensions
{
public static void Validate(this XContainer node, XmlReaderSettings settings)
{
if (node == null)
throw new ArgumentNullException();
using (var innerReader = node.CreateReader())
using (var reader = XmlReader.Create(innerReader, settings))
{
while (reader.Read())
;
}
}
public static void Validate(this XContainer node, XmlSchemaSet schemaSet, XmlSchemaValidationFlags validationFlags, ValidationEventHandler validationEventHandler)
{
var settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.Schema;
settings.ValidationFlags |= validationFlags;
if (validationEventHandler != null)
settings.ValidationEventHandler += validationEventHandler;
settings.Schemas = schemaSet;
node.Validate(settings);
}
}
Then, to validate the entire document, do:
try
{
var xsdDocument = XDocument.Load(xsdFilePath);
var schemaSet = new XmlSchemaSet();
using (var xsdReader = xsdDocument.CreateReader())
schemaSet.Add(XmlSchema.Read(xsdReader, this.XmlSchemaEventHandler));
var xmlDocument = XDocument.Load(xmlFile);
xmlDocument.Validate(schemaSet, XmlSchemaValidationFlags.ReportValidationWarnings, XmlValidationEventHandler);
}
catch (Exception e)
{
Logger.Info("Error parsing XML file: " + xmlFile);
throw new Exception(e.Message);
}
And to validate a specific node, you can use the same extension methods:
XNamespace elementNamespace = "MyApp_ConfigurationFiles";
var elementName = elementNamespace + "TestConfiguration";
try
{
var xsdDocument = XDocument.Load(xsdFilePath);
var schemaSet = new XmlSchemaSet();
using (var xsdReader = xsdDocument.CreateReader())
schemaSet.Add(XmlSchema.Read(xsdReader, this.XmlSchemaEventHandler));
var xmlDocument = XDocument.Load(xmlFile);
var element = xmlDocument.Root.Element(elementName);
element.Validate(schemaSet, XmlSchemaValidationFlags.ReportValidationWarnings, this.XmlValidationEventHandler);
}
catch (Exception e)
{
Logger.Info(string.Format("Error validating element {0} of XML file: {1}", elementName, xmlFile));
throw new Exception(e.Message);
}
Now validating the entire document fails while validating the {MyApp_ConfigurationFiles}TestConfiguration node succeeds, using the following validation event handlers:
void XmlSchemaEventHandler(object sender, ValidationEventArgs e)
{
if (e.Severity == XmlSeverityType.Error)
throw new XmlException(e.Message);
else if (e.Severity == XmlSeverityType.Warning)
Logger.Info(e.Message);
}
void XmlValidationEventHandler(object sender, ValidationEventArgs e)
{
if (e.Severity == XmlSeverityType.Error)
throw new XmlException(e.Message);
else if (e.Severity == XmlSeverityType.Warning)
throw new XmlException(e.Message);
}

C# read XML with DTD verification

I'm trying to read an XML file with dtd verification but no mather how I do it seems like the program doesn't read my dtd file. I have concentrated the problem to a small xml file and a small dtd file:
test.xml - Located at c:\test.xml
<?xml version="1.0"?>
<!DOCTYPE Product SYSTEM "test.dtd">
<Product ProductID="123">
<ProductName>Rugby jersey</ProductName>
</Product>
test.dtd - located at c:\test.dtd
<!ELEMENT Product (ProductName)>
<!ATTLIST Product ProductID CDATA #REQUIRED>
<!ELEMENT ProductName (#PCDATA)>
My C# program looks like this
namespace XML_to_csv_converter
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
ReadXMLwithDTD();
}
public void ReadXMLwithDTD()
{
// Set the validation settings.
XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.DTD;
settings.DtdProcessing = DtdProcessing.Parse;
settings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);
settings.IgnoreWhitespace = true;
// Create the XmlReader object.
XmlReader reader = XmlReader.Create("c:/test.xml", settings);
// Parse the file.
while (reader.Read())
{
System.Console.WriteLine("{0}, {1}: {2} ", reader.NodeType, reader.Name, reader.Value);
}
}
private static void ValidationCallBack(object sender, ValidationEventArgs e)
{
if (e.Severity == XmlSeverityType.Warning)
Console.WriteLine("Warning: Matching schema not found. No validation occurred." + e.Message);
else // Error
Console.WriteLine("Validation error: " + e.Message);
}
}
}
This results in the output:
XmlDeclaration, xml: version="1.0"
DocumentType, Product:
Validation error: The 'Product' element is not declared.
Element, Product:
Validation error: The 'ProductName' element is not declared.
Element, ProductName:
Text, : Rugby jersey
EndElement, ProductName:
EndElement, Product:
I have tried to have the files in defferent locations and i have tried both relative and absolute paths. I have tried to copy an example from microsoft webpage and it resulted in the same problem. Someone have an idea of what can be the problem? Is there any way to see if the program was able to load the dtd file?
I cannot comment so I add an answer to the correct answer by Jim :
// SET THE RESOLVER
settings.XmlResolver = new XmlUrlResolver();
this is a breaking change between .Net 4.5.1 and Net 4.5.2 / .Net 4.6. The resolver was set by default to XmlUrlResolver before. Got stung by this.
You need to add the resolver.
XmlReaderSettings settings = new XmlReaderSettings();
// SET THE RESOLVER
settings.XmlResolver = new XmlUrlResolver();
settings.ValidationType = ValidationType.DTD;
settings.DtdProcessing = DtdProcessing.Parse;
settings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);
settings.IgnoreWhitespace = true;
As long as the two files are in the same directory, this will work.
Alternatively you need to provide an URL to the DTD.
XmlUrlResolver can also be overridden to provide additional semantics to the resolution process.

XmlReader ReadStartElement causes XmlException

I'm writing a file reader using the XmlReader in a Silverlight project. However, I'm getting some errors (specifically around the XmlReader.ReadStartElement method) and it's causing me to believe that I've misunderstood how to use it somewhere along the way.
Basically, here is a sample of the format of the Xml I am using:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<root>
<EmptyElement />
<NonEmptyElement Name="NonEmptyElement">
<SubElement Name="SubElement" />
</NonEmptyElement>
</root>
And here is a sample of some code used in the same way as how I am using it:
public void ReadData(XmlReader reader)
{
// Move to root element
reader.ReadStartElement("root");
// Move to the empty element
reader.ReadStartElement("EmptyElement");
// Read any children
while(reader.ReadToNextSibling("SubEmptyElement"))
{
// ...
}
// Read the end of the empty element
reader.ReadEndElement();
// Move to the non empty element
reader.ReadStartElement("NonEmptyElement"); // NOTE: This is where I get the error.
// ...
}
So, essentially, I am simply trying to read each element and any contained children. The error I get at the highlighted point is as follows:
Error Description
[Xml_InvalidNodeType]
Arguments: None,10,8
Debugging resource strings are unavailable. Often the key and arguments provide sufficient information to diagnose the problem. See http://go.microsoft.com/fwlink/?linkid=106663&Version=4.0.51204.0&File=System.Xml.dll&Key=Xml_InvalidNodeType
Error Stack Trace
at System.Xml.XmlReader.ReadStartElement(String name)
at ----------------
Any advice or direction on this would be greatly appreciated.
EDIT
Since this reader needs to be fairly generic, it can be assumed that the Xml may contain elements that are children of the EmptyElement. As such, the attempt at reading any SubEmptyElements should be valid.
<SubElement/> is not a sibling of <EmptyElement>, so <NonEmptyElement> is going to get skipped entirely, and your call to ReadEndElement() will read the end element </root>. When you try to subsequently read "NonEmptyElement", there are no elements left, and you'll get an XmlException: {"'None' is an invalid XmlNodeType. Line 8, position 1."}
Note also that since <EmptyElement/> is empty, when you ReadStartElement("EmptyElement"), you'll read the whole element, and you won't need to use ReadEndElement().
I'd also recommend that you configure your reader settings to IgnoreWhitespace (if you're not already doing so), to avoid any complications introduced by reading (insignificant) whitespace text nodes when you aren't expecting them.
Try moving the Read of NonEmptyElement up:
public static void ReadData(XmlReader reader)
{
reader.ReadStartElement("root");
reader.ReadStartElement("EmptyElement");
reader.ReadStartElement("NonEmptyElement");
while (reader.ReadToNextSibling("SubEmptyElement"))
{
// ...
}
reader.ReadEndElement(/* NonEmptyElement */);
reader.ReadEndElement(/* root */);
// ...
}
If you just want to skip anything in <EmptyElement>, regardless of whether or not its actually empty, use ReadToFollowing:
public static void ReadData(XmlReader reader)
{
reader.ReadStartElement("root");
reader.ReadToFollowing("NonEmptyElement");
Console.WriteLine(reader.GetAttribute("Name"));
reader.ReadStartElement("NonEmptyElement");
Console.WriteLine(reader.GetAttribute("Name"));
while (reader.ReadToNextSibling("SubEmptyElement"))
{
// ...
}
reader.ReadEndElement(/* NonEmptyElement */);
reader.ReadEndElement(/* root */);
// ...
}
Update: Here's a fuller example with a clearer data model. Maybe this is closer to what you're asking for.
XMLFile1.xml:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<root>
<Person Type="Homeless"/>
<Person Type="Developer">
<Home Type="Apartment" />
</Person>
<Person Type="Banker">
<Home Type="Apartment"/>
<Home Type="Detached"/>
<Home Type="Mansion">
<PoolHouse/>
</Home>
</Person>
</root>
Program.cs:
using System;
using System.Xml;
namespace ConsoleApplication6
{
internal class Program
{
public static void ReadData(XmlReader reader)
{
reader.ReadStartElement("root");
while (reader.IsStartElement("Person"))
{
ReadPerson(reader);
}
reader.ReadEndElement( /* root */);
}
public static void ReadPerson(XmlReader reader)
{
Console.WriteLine(reader.GetAttribute("Type"));
bool isEmpty = reader.IsEmptyElement;
reader.ReadStartElement("Person");
while (reader.IsStartElement("Home"))
{
ReadHome(reader);
}
if (!isEmpty)
{
reader.ReadEndElement( /* Person */);
}
}
public static void ReadHome(XmlReader reader)
{
Console.WriteLine("\t" + reader.GetAttribute("Type"));
bool isEmpty = reader.IsEmptyElement;
reader.ReadStartElement("Home");
if (!isEmpty)
{
reader.Skip();
reader.ReadEndElement( /* Home */);
}
}
private static void Main(string[] args)
{
var settings = new XmlReaderSettings { IgnoreWhitespace = true };
using (var xr = XmlReader.Create("XMLFile1.xml", settings))
{
ReadData(xr);
}
Console.ReadKey();
}
}
}

C# XmlNodeReader exception NodeType not supported

I'm getting the exception that NodeType "None" is not supported when trying to run the following code.
public int ObjectContentI(string XmlPath)
{
XmlNodeReader xnr = new XmlNodeReader(this.xmlr.SelectSingleNode(XmlPath));
return xnr.ReadElementContentAsInt();
}
this.xmlr is a XmlDocument with a document successfully loaded in it. XmlPath contains a valid XPath url.
How do i set the NodeType (xnr.NodeType is readonly) or am I doing something else wrong?
Part of my XML:
<?xml version="1.0" encoding="utf-8" ?>
<ship weapons="0">
<cost>
<metal>250</metal>
<crystal>100</crystal>
</cost>
<health>
<shields>750</shields>
<sregene>10</sregene>
<hitpoints>1000</hitpoints>
<oxygen cps="2">25000</oxygen>
</health>
My XPath: "/ship/health/shields/text()"
Well, your approach is correct but not completely.
Let's suppose
XmlNode n = myXMLDoc.SelectSingleNode("/ship/health/shields/");
n.InnerXML OR n.InnerText should give you what you need.
Though conqenator provided you with code that fixed your problem following is reasoning why it didn't work in the first place:
If you don't call the Read method on a XmlNodeReader or any of the classes that derive from XmlReader, you will always get a XmlNodeType.None NodeType, which is the reason for the error. To fix the code you provided and to get an int back this is what the code needs to look like:
public int ObjectContentI(string XmlPath)
{
int result;
using(XmlNodeReader xnr = new XmlNodeReader(this.xmlr.SelectSingleNode(XmlPath))){
while(xnr.Read()){
result = xnr.ReadElementContentAsInt();
}
}
return result;
}
Note that the XPath to get this wroking needs to change to /ship/health/shields since ReadElementContentAsInt() returns the content of the element and won't work on a Text Node, which what you get when using /ship/health/shields/text().
Notice that I have also wrapped the XmlNodeReader in a using block, which will dispose of the XmlNodeReader once you are done with it to free up resources.

How can I find the position where a string is malformed XML (in C#)?

I'm writing a lightweight XML editor, and in cases where the user's input is not well formed, I would like to indicate to the user where the problem is, or at least where the first problem is. Does anyone know of an existing algorithm for this? If looking at code helps, if I could fill in the FindIndexOfInvalidXml method (or something like it), this would answer my question.
using System;
namespace TempConsoleApp
{
class Program
{
static void Main(string[] args)
{
string text = "<?xml version=\"1.0\"?><tag1><tag2>Some text.</taagg2></tag1>";
int index = FindIndexOfInvalidXml(text);
Console.WriteLine(index);
}
private static int FindIndexOfInvalidXml(string theString)
{
int index = -1;
//Some logic
return index;
}
}
}
I'd probably just cheat. :) This will get you a line number and position:
string s = "<?xml version=\"1.0\"?><tag1><tag2>Some text.</taagg2></tag1>";
System.Xml.XmlDocument doc = new System.Xml.XmlDocument();
try
{
doc.LoadXml(s);
}
catch(System.Xml.XmlException ex)
{
MessageBox.Show(ex.LineNumber.ToString());
MessageBox.Show(ex.LinePosition.ToString());
}
Unless this is an academic exercise, I think that writing your own XML parser is probably not the best way to go about this. I would probably check out the XmlDocument class within the System.Xml namespace and try/catch exceptions for the Load() or LoadXml() methods. The exception's message property should contain info on where the error occurred (row/col numbers) and I suspect it'd be easier to use a regular expression to extract those error messages and the related positional info.
You should be able to simply load the string into an XmlDocument or an XmlReader and catch XmlException. The XmlException class has a LineNumber property and a LinePosition property.
You can also use XmlValidatingReader if you want to validate against a schema in addition to checking that a document is well-formed.
You'd want to load the string into an XmlDocument object via the load method and then catch any exceptions.
public bool isValidXml(string xml)
{
System.Xml.XmlDocument xDoc = null;
bool valid = false;
try
{
xDoc = new System.Xml.XmlDocument();
xDoc.loadXml(xmlString);
valid = true;
}
catch
{
// trap for errors
}
return valid;
}

Categories