What is the best way to compare XML files for equality?


I'm using .NET 2.0, and a recent code change has invalidated my previous Assert.AreEqual call (which compared two strings of XML). Only one element of the XML is actually different in the new codebase, so my hope is that a comparison of all the other elements will give me the result I want. The comparison needs to be done programmatically, since it's part of a unit test.
At first, I was considering using a couple instances of XmlDocument. But then I found this:
http://drowningintechnicaldebt.com/blogs/scottroycraft/archive/2007/05/06/comparing-xml-files.aspx
It looks like it might work, but I was interested in Stack Overflow feedback in case there's a better way.
I'd like to avoid adding another dependency for this if at all possible.
Similar questions
Is there an XML asserts for NUnit?
How would you compare two XML Documents?

It really depends on what you want to check as "differences".
Right now, we're using Microsoft XmlDiff: http://msdn.microsoft.com/en-us/library/aa302294.aspx

You might find it's less fragile to parse the XML into an XmlDocument and base your Assert calls on XPath Query. Here are some helper assertion methods that I use frequently. Each one takes a XPathNavigator, which you can obtain by calling CreateNavigator() on the XmlDocument or on any node retrieved from the document. An example of usage would be:
XmlDocument doc = new XmlDocument();
doc.Load( "Testdoc.xml" );
XPathNavigator nav = doc.CreateNavigator();
AssertNodeValue( nav, "/root/foo", "foo_val" );
AssertNodeCount( nav, "/root/bar", 6 );
private static void AssertNodeValue(XPathNavigator nav,
string xpath, string expected_val)
{
XPathNavigator node = nav.SelectSingleNode(xpath, nav);
Assert.IsNotNull(node, "Node '{0}' not found", xpath);
Assert.AreEqual( expected_val, node.Value );
}
private static void AssertNodeExists(XPathNavigator nav,
string xpath)
{
XPathNavigator node = nav.SelectSingleNode(xpath, nav);
Assert.IsNotNull(node, "Node '{0}' not found", xpath);
}
private static void AssertNodeDoesNotExist(XPathNavigator nav,
string xpath)
{
XPathNavigator node = nav.SelectSingleNode(xpath, nav);
Assert.IsNull(node, "Node '{0}' found when it should not exist", xpath);
}
private static void AssertNodeCount(XPathNavigator nav, string xpath, int count)
{
XPathNodeIterator nodes = nav.Select( xpath, nav );
Assert.That( nodes.Count, Is.EqualTo( count ) );
}

A simple string comparison of two XML strings does not always work. Why?
For example, both
<MyElement></MyElement> and <MyElement/> are equal from an XML standpoint.
There are algorithms that convert an XML document into a form that always looks the same; they are called
canonicalization algorithms. .NET has support for canonicalization.
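As a rough sketch of what that could look like, the helper below runs both inputs through XmlDsigC14NTransform from System.Security.Cryptography.Xml and returns the canonical text. The class and method names (XmlCanonicalizer, Canonicalize) are made up for illustration, and the sketch assumes a reference to the System.Security assembly (or the equivalent package on newer runtimes) is acceptable.

```csharp
using System.IO;
using System.Security.Cryptography.Xml; // lives in the System.Security assembly on .NET Framework
using System.Xml;

// Hypothetical helper, not part of any framework API.
public static class XmlCanonicalizer
{
    // Returns the Canonical XML (C14N, comments dropped) form of the given XML text.
    public static string Canonicalize(string xml)
    {
        var doc = new XmlDocument(); // PreserveWhitespace is false by default
        doc.LoadXml(xml);

        var transform = new XmlDsigC14NTransform();
        transform.LoadInput(doc);

        // The transform emits UTF-8 bytes; read them back into a string.
        using (var stream = (Stream)transform.GetOutput(typeof(Stream)))
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In a unit test this would be used as Assert.AreEqual(XmlCanonicalizer.Canonicalize(expected), XmlCanonicalizer.Canonicalize(actual)). Canonicalization normalizes things like empty-element syntax and attribute order, but it does not ignore real differences in content.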

I wrote a small library with asserts for serialization, source.
Sample:
[Test]
public void Foo()
{
...
XmlAssert.Equal(expected, actual, XmlAssertOptions.IgnoreDeclaration | XmlAssertOptions.IgnoreNamespaces);
}

Because the contents of an XML file can be formatted differently and still be considered the same (from a DOM point of view), you first need to decide what equality means for your test. Is formatting ignored? Is metadata ignored? Does the position of elements matter? There are lots of edge cases.
Generally you would create a class that defines your equality rules and use it for your comparisons. If that class implements the IEqualityComparer and/or IEqualityComparer<T> interfaces, it can also be passed to many built-in framework collections and methods as the equality test. And of course you can write as many comparers as you need to measure equality in different ways.
For example:
IEnumerable<T>.Contains
IEnumerable<T>.SequenceEqual
The constructor of a Dictionary, etc.
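A minimal sketch of such a comparer follows, assuming LINQ to XML (.NET 3.5 or later) is available. The class name LooseXElementComparer and its rules (attribute order ignored, leading and trailing whitespace in leaf text ignored, child order significant) are illustrative choices, not a standard definition of XML equality.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical comparer: one possible set of equality rules among many.
public sealed class LooseXElementComparer : IEqualityComparer<XElement>
{
    public bool Equals(XElement x, XElement y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        if (x.Name != y.Name) return false;

        // Same attribute name/value pairs, in any order.
        if (!AttributeKeys(x).SequenceEqual(AttributeKeys(y))) return false;

        var xChildren = x.Elements().ToList();
        var yChildren = y.Elements().ToList();

        // Leaf elements: compare trimmed text, so <a/> equals <a></a>.
        if (xChildren.Count == 0 && yChildren.Count == 0)
            return x.Value.Trim() == y.Value.Trim();

        // Otherwise compare child elements pairwise, in document order.
        // Text mixed in between child elements is ignored by this sketch.
        return xChildren.SequenceEqual(yChildren, this);
    }

    // Coarse but consistent with Equals: equal elements always share a name.
    public int GetHashCode(XElement e)
    {
        return e == null ? 0 : e.Name.GetHashCode();
    }

    private static IEnumerable<string> AttributeKeys(XElement e)
    {
        return e.Attributes()
                .Select(a => a.Name.ToString() + "=" + a.Value)
                .OrderBy(s => s, StringComparer.Ordinal);
    }
}
```

A test could then call Assert.IsTrue(new LooseXElementComparer().Equals(expected, actual)), or pass the comparer to Contains or SequenceEqual.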

I ended up getting the result I wanted with the following code:
private static void ValidateResult(string validationXml, XPathNodeIterator iterator, params string[] excludedElements)
{
while (iterator.MoveNext())
{
if (!((IList<string>)excludedElements).Contains(iterator.Current.Name))
{
Assert.IsTrue(validationXml.Contains(iterator.Current.Value), "{0} is not the right value for {1}.", iterator.Current.Value, iterator.Current.Name);
}
}
}
Before calling the method, I create a navigator on the instance of XmlDocument this way:
XPathNavigator nav = xdoc.CreateNavigator();
Next, I create an instance of XPathExpression, like so:
XPathExpression expression = XPathExpression.Compile("/blah/*");
I call the method after creating an iterator with the expression:
XPathNodeIterator iterator = nav.Select(expression);
I'm still figuring out how to optimize it further, but it does the trick for now.

I made a method to create simple XML paths.
static XElement MakeFromXPath(string xpath)
{
XElement root = null;
XElement parent = null;
var splits = xpath.Split('/'); //split xpath into parts
foreach (var split in splits)
{
var el = new XElement(split);
if (parent != null)
parent.Add(el);
else
root = el; //first element created, set as root
parent = el;
}
return root;
}
Sample usage:
var element = MakeFromXPath("My/Path/To/Element");
element will contain the value:
<My>
  <Path>
    <To>
      <Element />
    </To>
  </Path>
</My>

Related

Cannot access or find reference to System.Xml.Linq.LineInfoAnnotation. Why is this?

I have an application that takes an XML document and sorts it by certain attributes. I have information associated with each line of the XML document that I want to include in the sorted document. To do this, I load the file with XDocument.Load(file, LoadOptions.SetLineInfo) so that the line info is loaded.
Then I recursively iterate over each XElement and get its line info. When I ran the app, I noticed that each XElement has two annotations: one of type System.Xml.Linq.LineInfoAnnotation and one of type System.Xml.Linq.LineInfoEndElementAnnotation.
They contain the info that I need, but in private fields.
I can't find any information on these classes, I can't instantiate them, they do not appear in the Object browser under System.Xml.Linq. Yet they exist and I can run "GetType()" on them and get information about the class.
If they exist, why are they not in MSDN references and why can't I instantiate them or extend them? Why can't I find them in the object browser?
P.S. My workaround for this was to use reflection to get the information contained inside each element. But I still can't pass a class name to tell the method what type it is, I have to isolate the object from XElement.Annotations(typeof(object)), and then run GetType() on it. I've illustrated this below.
public object GetInstanceField(Type type, object instance, string fieldName)
{
//reflective method that gets value of private field
}
XElement xEl = existingXElement; //existingXElement is passed in
var annotations = xEl.Annotations(typeof(object)); //contains two objects, start and end LineInfoAnnotation
var start = annotations.First();
var end = annotations.Last();
var startLineNumber = GetInstanceField(start.GetType(), start, "lineNumber"); //lineNumber is the private field I'm trying to access.
var endLineNumber = GetInstanceField(end.GetType(), end, "lineNumber");
This code works, but again, I can't just tell the method "typeof(LineInfoAnnotation)", instead I have to do GetType on the existing object. I cannot make sense of this.

Those classes are private - an implementation detail, if you will.
All XObjects (elements, attributes) implement the IXmlLineInfo interface, but they implement the interface explicitly, so you must perform a cast to access the properties.
Once you have your IXmlLineInfo, you can use the properties LineNumber and LinePosition.
var data =
@"<example>
<someElement
someAttribute=""val"">
</someElement></example>
";
var doc = XDocument.Load(new MemoryStream(Encoding.UTF8.GetBytes(data)), LoadOptions.SetLineInfo);
foreach(var element in doc.Descendants()) {
var elLineInfo = element as IXmlLineInfo;
Console.Out.WriteLine(
$"Element '{element.Name}' at {elLineInfo.LineNumber}:{elLineInfo.LinePosition}");
foreach(var attr in element.Attributes()) {
var attrLineInfo = attr as IXmlLineInfo;
Console.Out.WriteLine(
$"Attribute '{attr.Name}' at {attrLineInfo.LineNumber}:{attrLineInfo.LinePosition}");
}
}
Output:
Element 'example' at 1:2
Element 'someElement' at 2:2
Attribute 'someAttribute' at 3:3
To get the EndElement information, you have to use a plain old XML reader, since the XObject api doesn't expose any information about where the element ends.
using(var reader = doc.CreateReader()) {
while(reader.Read()) {
var lineInfo = reader as IXmlLineInfo;
Console.Out.WriteLine($"{reader.NodeType} {reader.Name} at {lineInfo.LineNumber}:{lineInfo.LinePosition}");
if(reader.NodeType == XmlNodeType.Element && reader.HasAttributes) {
while(reader.MoveToNextAttribute()) {
Console.Out.WriteLine($"{reader.NodeType} {reader.Name} at {lineInfo.LineNumber}:{lineInfo.LinePosition}");
}
}
}
}
Output:
Element example at 1:2
Element someElement at 2:2
Attribute someAttribute at 3:3
EndElement someElement at 5:5
EndElement example at 5:19

What is the most efficient way to select an XML value based on one of its associative attributes using C#

I have an XML document that resembles this:
<resorts>
<resort location="locationA" email="locationA@somewhere.com"></resort>
<resort location="locationB" email="locationB@somewhere.com"></resort>
<resort location="locationC" email="locationC@somewhere.com"></resort>
<resort location="locationD" email="locationD@somewhere.com"></resort>
</resorts>
I need to get the corresponding email address given a specific location and the code I'm using to do that is:
XmlDocument doc = new XmlDocument();
doc.Load(xml);
XmlElement xmlRoot = doc.DocumentElement;
XmlNodeList xmlNodes = xmlRoot.SelectNodes("/resorts/resort");
foreach(XmlNode element in xmlNodes)
{
foreach (XmlAttribute attribute in element.Attributes)
{
switch (attribute.Name)
{
case "location":
if (attribute.Value.ToLower() == location.ToLower())
{
loc = attribute.Value;
locationIdentified = true;
}
break;
case "email":
if (locationIdentified)
{
if(!emailIdentified)
{
email = attribute.Value;
var recipientList = new Dictionary<string, string>() { { "emailrecipients", email } };
emailRecipients.Add(recipientList);
emailIdentified = true;
}
}
break;
}
}
}
return recipients;
But I don't really care much for the iterative approach and would prefer something more streamlined with less code.
Something similar to a LINQ expression would be ideal. I don't typically deal much with XML data, so I'm a bit of a novice in this area, but I know there has to be a better way to retrieve the data from the XML.
What I need to do is acquire the email address for a specific location; having the location known beforehand.
What would be the most efficient manner to do this without an explicit iteration as I've done here?
This question is not specifically about "how to use" alternative options as much as "what are more streamlined approaches" to solve this problem. However, as I stated; since I am a novice at XML, it would be nice to have examples of any proposed alternatives
Thanks

octavioccl's answer is correct, but if you have some reason to stick with XmlDocument rather than switching to Linq to XML (perhaps this is a large project that's already heavily committed to the old XML classes, and nobody wants to cross the streams), this will work:
string location = "locationC";
string xpath = "/resorts/resort[@location='" + location + "']/@email";
var address = doc.SelectNodes(xpath).Cast<XmlAttribute>()
.Select(attr => attr.Value).SingleOrDefault();
If you can use C#6 features, this is tidier:
var address = doc.SelectSingleNode(xpath)?.Value;
If you need it to be case-blind, I think you may be stuck with this:
string xpath = "/resorts/resort[translate(@location, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')='"
+ location.ToLower() + "']/@email";
There's a lower-case() function in XPath 2.0, but SelectNodes seems to be implementing XPath 1.0. That messy translate() call is the usual workaround.
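Put together, the XPath approach above might be wrapped up as in the following sketch. The class and method names (ResortLookup, FindEmail) are invented for illustration, and the code assumes the location value contains no single quote, since it is pasted directly into the expression.

```csharp
using System.Xml;

// Hypothetical wrapper around the XPath lookup described above.
public static class ResortLookup
{
    // Returns the email for the given location, or null when no resort matches.
    // Assumption: location contains no single quote (it is embedded in a quoted XPath literal).
    public static string FindEmail(XmlDocument doc, string location)
    {
        string xpath = "/resorts/resort[@location='" + location + "']/@email";
        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.Value;
    }
}
```

After loading the sample document with doc.LoadXml(...) or doc.Load(...), a call such as ResortLookup.FindEmail(doc, "locationC") replaces the nested loops from the question. This version also works on compilers that predate the C# 6 null-conditional operator.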

If you use Linq to XML your query could be like this:
var query=root.Descendants("resort").Where(e=>e.Attribute("location").Value.ToLower()==location.ToLower())
.Select(e=>e.Attribute("email"));
If there is only one with that location you can use FirstOrDefault extension method to get it:
var result=query.FirstOrDefault();
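For reference, a slightly more defensive variant of the same query is sketched below. It casts the attributes to string instead of reading .Value, so a resort element without a location or email attribute does not throw, and it compares locations with StringComparison.OrdinalIgnoreCase rather than ToLower(). The names ResortQuery and FindEmail are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper; root is assumed to be the <resorts> element.
public static class ResortQuery
{
    // Returns the email of the first resort whose location matches, or null if none does.
    public static string FindEmail(XElement root, string location)
    {
        return root.Elements("resort")
                   .Where(e => string.Equals((string)e.Attribute("location"),
                                             location,
                                             StringComparison.OrdinalIgnoreCase))
                   .Select(e => (string)e.Attribute("email"))
                   .FirstOrDefault();
    }
}
```

With an XDocument, the call would look like ResortQuery.FindEmail(xdoc.Root, "locationc").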

It's hard to say what is most efficient, but you could look into using XPath to traverse the XML without having to iterate.
Here are some examples of what that looks like:
https://msdn.microsoft.com/en-us/library/ms256086(v=vs.110).aspx

XpathNavigator giving System.Xml.XPath.XPathException because of a single quote [duplicate]

This question already has answers here:
Apostrophe (') in XPath query
(11 answers)
Closed 8 years ago.
I have the following code
XPathNavigator nav = doc.CreateNavigator();
XPathExpression expr;
expr = nav.Compile("//somePath/FieldData[@Location='Payer's Name']/@Value");
The single quote gives the exception System.Xml.XPath.XPathException. I have tried escaping it with a slash ('\'), (&apos;), double single quotes. Nothing seems to work though. Any idea how to resolve this?

There are basically two ways to reliably handle this situation.
The first way is to define a VariableContext and put the value you want to use in an XPath variable. Then you can use an expression like:
//somePath/FieldData[@Location = $user]/@Value
I describe how to do that in this post. It requires creating a VariableContext class as nothing suitable is built into .NET, but the example I provide should work as-is for most cases.
The other option is to use Linq-to-XML. If you go that route, you can query the node using Linq instead of XPath and delimiters are not an issue this way:
// here the doc variable is an XElement or XDocument
var value = (from fd in doc.Descendants("FieldData")
where (string)fd.Attribute("Location") == sUser
select fd.Attribute("Value")).FirstOrDefault();
or with the other Linq syntax:
var value = doc.Descendants("FieldData")
.Where(fd => (string)fd.Attribute("Location") == sUser)
.Select(fd => fd.Attribute("Value"))
.FirstOrDefault();

You cannot have a space in the XPath "[@Location='Payer's Name']"; take the space out.

Note that the single quote delimiter (the troublemaker) has been replaced with a double quote.
string sUser = "Payer's name"; // as per the OP's requirement, considering only a single quote in the value
expr = nav.Compile("//somePath/FieldData[@Location=\"" + sUser + "\"]/@Value");
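Switching delimiters stops working once the value contains both kinds of quote. A commonly used workaround in XPath 1.0 is to build the literal with concat(); the sketch below shows one way to do that. XPathQuoting and ToLiteral are hypothetical names, not framework APIs.

```csharp
// Hypothetical helper that turns an arbitrary string into an XPath 1.0 string literal.
public static class XPathQuoting
{
    public static string ToLiteral(string value)
    {
        // No single quote: wrap in single quotes.
        if (!value.Contains("'")) return "'" + value + "'";

        // Single quotes but no double quote: wrap in double quotes.
        if (!value.Contains("\"")) return "\"" + value + "\"";

        // Both kinds present: split on the single quote and glue the pieces
        // back together with concat('part', "'", 'part', ...).
        string[] parts = value.Split('\'');
        return "concat('" + string.Join("', \"'\", '", parts) + "')";
    }
}
```

The expression from the question would then be built as "//somePath/FieldData[@Location=" + XPathQuoting.ToLiteral(sUser) + "]/@Value".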

Is there any reason you cannot use LINQ, given that you are using C#?
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
public class XmlParser
{
    public XDocument doc { get; set; }
    public XmlParser(XDocument doc)
    {
        this.doc = doc;
    }
    public List<XElement> searchElements(string elementName)
    {
        // Descendants searches every level of the document and returns the elements
        // with the specified name; Elements would only match direct children.
        return doc.Descendants(elementName).ToList();
    }
}

expr = nav.Compile("//somePath/FieldData[@Location=\"Payer's Name\"]/@Value");

XML linq query lists first elements but not all

I have this XML file that I parse into its elements and create a list of a custom object Module.
XDocument kobra = XDocument.Load(new StringReader(results.OuterXml));
XNamespace ns = "#RowsetSchema";
var kobraNodeList = from s in kobra.Descendants(ns + "row")
select new Module
{
id = s.Attribute("ows_ID").Value,
name = s.Attribute("ows_Title").Value,
sourceFile = s.Attribute("ows_Source_x0020_Message_x0020_File_").Value,
scope = Scope.KOBRA,
component = string.Empty
};
and here's my Module struct:
public struct Module
{
public string name;
public Scope scope;
public string component;
public int wordCound;
public string id;
public string sourceFile;
}
The code works fine, but things get weird when I try to convert the var kobraNodeList into a list of Modules, I get a System.NullReferenceException at the AddRange line:
this.moduleList = new List<Module>();
this.moduleList.AddRange(kobraNodeList);
When trying to debug, I notice that although kobraNodeList.Count() also returns System.NullReferenceException, a kobraNodeList.Any() returns true, and kobraNodeList.First() returns a perfectly valid and correct Module struct with the desired data.
The XML file is valid, and if I replace the linq query with this:
var kobraNodeList = from s in kobra.Descendants(ns + "row")
select s;
I get a valid list of XElement, which I can Count() ok.
Can someone explain to me what's wrong? BTW, I'm using .NET 3.5.

That looks like one (or more) of kobra.Descendants has ows_ID, ows_Title or ows_Source_x0020_Message_x0020_File_ attribute missing.
Linq uses deferred execution, so it won't try to build the sequence until you ask for the items. When you call Any() or First(), it only needs the first item in the sequence to work, which tells me that the first item in kobra.Descendants does have all of the required nodes.
However, one of the items after the first is probably missing at least one of those attributes - so you end up asking for the Value of a NULL attribute.

Inside
select new Module
{
// properties...
}
You could be running into a NullReferenceException as you access .Value on elements that might not exist in the XML document. Your first object in the collection is likely fine, hence your results when using Any() or First(). Subsequent items could be missing elements/attributes you are trying to use.
Try this as a replacement instead of using .Value directly.
id = (string)s.Attribute("whatever") // etc.
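To illustrate why the cast helps: the explicit conversion from XAttribute to string returns null when the attribute is missing, whereas .Value is a property access on a null reference. A small sketch, with AttributeReader and ReadOrNull as made-up names:

```csharp
using System.Xml.Linq;

// Hypothetical helper showing the null-tolerant cast.
public static class AttributeReader
{
    // Returns the attribute's value, or null when the element has no such attribute.
    public static string ReadOrNull(XElement element, string attributeName)
    {
        // element.Attribute(...) returns null for a missing attribute;
        // the explicit string conversion passes that null through instead of throwing.
        return (string)element.Attribute(attributeName);
    }
}
```

For a row parsed from <row ows_ID="1" />, ReadOrNull(row, "ows_ID") yields "1" and ReadOrNull(row, "ows_Title") yields null, so the query can be enumerated to the end and the missing values handled afterwards.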

One of your lines such as s.Attribute("ows_Source_x0020_Message_x0020_File_") will be returning null for one of the records so s.Attribute("ows_Source_x0020_Message_x0020_File_").Value would cause the null reference exception.

Convert XElement to string

I have a simple XElement object
XElement xml = new XElement("XML",
new XElement ("TOKEN",Session["Token"]),
new XElement("ALL_INCLUSIVE", "0"),
new XElement("BEACH", "0"),
new XElement("DEST_DEP", ddlDest.SelectedValue.ToString()),
new XElement("FLEX", "0")
);
I want to dump its contents into a string, exactly like Console.WriteLine(xml); does, but with the contents in a string. I tried various methods. xml.ToString(); doesn't return anything on its own.

ToString should most definitely work. I use it all the time. What does it return for you in this case? An empty string? My guess is that something went wrong building your XElement. To debug, rewrite the code to add each of the child XElements separately, so that you can step through your code and check on each of them. Then before you execute the .ToString, in the Locals window, look at the [xml] variable expanded to xml.
In short, your problem is happening before you ever get to the ToString() method.

ToString works, but it returns the content including the XElement's own tag. If you need the inner XML without the root tag ("<XML>" in your example), you can use the following extension method:
public static class XElementExtension
{
public static string InnerXML(this XElement el) {
var reader = el.CreateReader();
reader.MoveToContent();
return reader.ReadInnerXml();
}
}
Then simply call it: xml.InnerXML();
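For the plain ToString() case, it may also help to know that XElement has an overload taking SaveOptions, which controls whether the result is indented. A minimal sketch, with XmlText and ToCompactString as hypothetical names:

```csharp
using System.Xml.Linq;

// Hypothetical helper around XElement.ToString(SaveOptions).
public static class XmlText
{
    // Returns the element, including its own tag, as a single unindented line.
    public static string ToCompactString(XElement element)
    {
        return element.ToString(SaveOptions.DisableFormatting);
    }
}
```

For new XElement("XML", new XElement("BEACH", "0"), new XElement("FLEX", "0")), this returns <XML><BEACH>0</BEACH><FLEX>0</FLEX></XML>, while the parameterless ToString() returns the same markup with each child on its own indented line.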
