Count how many differences in 2 XML files - c#

Imagine one XML as:
<foo>
<node1>Some value</node1>
<node2>BB</node2>
<node3>TTTTT</node3>
<node4>XXXX</node4>
</foo>
and another XML as:
<foo>
<node1>Something Else</node1>
<node4>XXXX</node4>
<node5>TTTTT</node5>
</foo>
The difference count here is 3
a) node1 value is different
b) node2 is missing in 2nd XML
c) node5 is missing in 1st XML
I've tried using the XMLDiff class but the result is too cumbersome for my needs.
Schema:
Root named "foo" and one single set of childrens with one value each.
Question:
What's the simplest and fastest way to code this in C#?

One way to do this might be to generate a list of XPath assertions from your first document, in the form:
/foo/node1 = "Some value"
/foo/node2 = "BB"
/foo/node3 = "TTTT"
/foo/node4 = "XXXX"
and then apply these assertions to the second document to count how many of them are true. Because this won't catch data that is absent on the first document and present in the second, you might want to do the inverse as well. It's not perfect, of course, for example it won't catch differences in element order. But you haven't actually defined what you mean by a significant difference, and you could adjust the XPath expressions to assert what you consider significant. For example you could vary the last assertion to:
count(/foo/node4[. = "XXXX"]) = 1
The simplest and fastest way to code this, of course, is not in C#, unless that happens to be the only programming language you know. Using XSLT or XQuery would be much better.

Have you considered using XNode.DeepEquals, with the root (in this case 'foo') of each XML file being your node? The MSDN page on how to use it is here:
http://msdn.microsoft.com/en-us/library/system.xml.linq.xnode.deepequals.aspx

Related

Parsing XML file in C# - how to report errors

First I load the file in a structure
XElement xTree = XElement.Load(xml_file);
Then I create an enumerable collection of the elements.
IEnumerable<XElement> elements = xTree.Elements();
And iterate elements
foreach (XElement el in elements)
{
}
The problem is - when I fail to parse the element (a user made a typo or inserted a wrong value) - how can I report exact line in the file?
Is there any way to tie an element to its corresponding line in the file?
One way to do it (although not a proper one) –
When you find a wrong value, add an invalid char (e.g. ‘<’) to it.
So instead of: <ExeMode>Bla bla bla</ExeMode>
You’ll have: <ExeMode><Bla bla bla</ExeMode>
Then load the XML again with try / catch (System.Xml.XmlException ex).
This XmlException has LineNumber and LinePosition.
If there is a limited set of acceptable values, I believe XML Schemas have the concept of an enumerated type -- so write a schema for the file and have the parser validate against that. Assuming the parser you're using supports Schemas, which most should by now.
I haven't looked at DTDs in decades, but they may have the same kind of capability.
Otherwise, you would have to consider this semantic checking rather than syntactic checking, and that makes it your application's responsibility. If you are using a SAX parser and interpreting the data as you go, you may be able to get the line number; check your parser's features.
Otherwise the best answer I've found is to report the problem using an xpath to the affected node/token rather than a line number. You may be able to find a canned solution for that, either as a routine to run against a DOM tree or as a state machine you can run alongside your SAX code to track the path as you go.
(All the "maybe"s are because I haven't looked at what's currently available in a very long time, and because I'm trying to give an answer that is valid for all languages and parser implementations. This should still get you pointed in some useful directions.)

Linq duplicate elements when iterating over XML

<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<stock-items>
<stock-item>
<name>Loader 34</name>
<sku>45GH6</sku>
<vendor>HITINANY</vendor>
<useage>Lifter 45 models B to C</useage>
<typeid>01</typeid>
<version>01</version>
<reference>33</reference>
<comments>EOL item. No Re-order</comments>
<traits>
<header>56765</header>
<site>H4</site>
<site>A6</site>
<site>V1</site>
</traits>
<type-validators>
<actions>
<endurance-tester>bake/shake</endurance-tester>
</actions>
<rules>
<results-file>Test-Results.txt</results-file>
<file-must-contain file-name="Test-Results.xml">
<search>
<term>[<![CDATA[<"TEST TYPES 23 & 49 PASSED"/>]]></term>
<search-type>exactMatch</search-type>
</search>
</file-must-contain>
</rules>
</type-validators>
</stock-item>
</stock-items>
Im trying to get the rules fragment from the xml above into a string so it can be added to a database. Currently the search element and its contents are added twice. I know why this is happing but cant figure out how to prevent it.
Heres my code
var Rules = from rules in Type.Descendants("rules")
select rules.Descendants();
StringBuilder RulesString = new StringBuilder();
foreach (var rule in Rules)
{
foreach (var item in rule)
{
RulesString.AppendLine(item.ToString());
}
}
Console.WriteLine(RulesString);
Finally any elements in rules are optional and some of these elements may or may not contain other child elements up to 4 or 5 levels deep. TIA
UPDATE:
To try and make it clearer what im trying to achieve.
From the xml above I should end up with a string containing everthing in the rules element, exactly like this:
<results-file>Test-Results.txt</results-file>
<file-must-contain file-name="Test-Results.xml">
<search>
<term>[<![CDATA[<"TEST TYPES 23 & 49 PASSED"/>]]></term>
<search-type>exactMatch</search-type>
</search>
</file-must-contain>
Objective is to extract the entire contents of the rules element as is while taking account that the rules element may or may not contains child elements several levels deep
If you just want the entirety of the rules element as a string (rather than caring about its contents as xml), you don't need to dig into its contents, you just need to get the element as an XNode and then call ToString() on it :
The following example uses this method to retrieve indented XML.
XElement xmlTree = new XElement("Root",
new XElement("Child1", 1)
);
Console.WriteLine(xmlTree);
This example produces the following output:
<Root>
<Child1>1</Child1>
</Root>
if you want to prevent duplicates than you will need to use Distinct() or GroupBy() after parsing the xml and before building the string.
I'm still not fully understanding exactly what the output should be, so I can't provide a clear solution on what exactly to use, or how, in terms of locating duplicates. If you can refine the original post that would help.
we need the structure of the xml as it would appear in your scenario. nesting and all.
we need an example of the final string.
saving it to a db doesn't really matter for this post so you only need to briefly mention that once, if at all.

LINQ to SQL to XML (using XML literals in C#)

Is it possible to use variables like <%=person.LastName %> in XML string this way?
XElement letters = new XElement("Letters");
XElement xperson = XElement.Parse("<Table><Row><Cell><Text>
<Segment>Dear <%=person.Title%> <%=person.FirstName%> <%=person.LastName%>,
</Segment></Text></Cell></Row>...");
foreach (Person person in persons){
letters.Add(xperson)
}
If it's possible, it would be a lifesaver since I can't use XElement and XAttribute to add the nodes manually. We have multiple templates and they frequently change (edit on the fly).
If this is not doable, can you think of another way so that I can use templates for the XML?
Look like it's possible in VB
http://aspalliance.com/1534_Easy_SQL_to_XML_with_LINQ_and_Visual_Basic_2008.6
This is an exclusive VB.NET feature known as XML Literals. It was added in VB 9.0. C# does not currently support this feature. Although Microsoft has stated its intent to bridge the gap between the languages in the future, it's not clear whether this feature will make it to C# any time soon.
Your example doesn't seem clear to me. You would want to have the foreach loop before parsing the actual XML since the values are bound to the current Person object. For example, here's a VB example of XML literals:
Dim xml = <numbers>
<%= From i In Enumerable.Range(1, 5)
Select <number><%= i %></number>
%>
</numbers>
For Each e In xml.Elements()
Console.WriteLine(e.Value)
Next
The above snippet builds the following XML:
<numbers>
<number>1</number>
<number>2</number>
<number>3</number>
</numbers>
Then it writes 1, 2, 3 to the console.
If you can't modify your C# code to build the XML dynamically then perhaps you could write code that traverses the XML template and searches for predetermined fields in attributes and elements then sets the values as needed. This means you would have to iterate over all the attributes and elements in each node and have some switch statement that checks for the template field name. If you encounter a template field and you are currently iterating attributes you would set it the way attributes are set, whereas if you were iterating elements you would set it the way elements are set. This is likely not the most efficient approach but is one solution.
The simplest solution would be to use VB.NET. You can always develop it as a stand-alone project, add a reference to the dll from the C# project, pass data to the VB.NET class and have it return the final XML.
EDIT: to clarify, using VB.NET doesn't bypass the need to update the template. It allows you to specify the layout easier as an XML literal. So code still needs to be updated in VB once the layout changes. You can't load an XML template from a text file and expect the fields to be bound that way. For a truly dynamic solution that allows you to write the code once and read different templates my first suggestion is more appropriate.

Xpath, retrieving node value

I get this return value from Sharepoint... which I have just included the first part of the xml snippet...
<Result ID=\"1,New\" xmlns=\"http://schemas.microsoft.com/sharepoint/soap/\">
<ErrorCode>0x00000000</ErrorCode><ID /><z:row ows_ID=\"9\"
It populates a XmlNode node object.
How using xPath can I get the value of ows_id ?
My code so far...
XmlNode results = list.UpdateListItems("MySharePointList", batch);
Update
So far I have this : results.FirstChild.ChildNodes[2].Attributes["ows_ID"].Value
But I am not sure how reliable it is, can anyone improve on it?
I don't know if its necessarily an improvement, but it might be more readable, though more verbose:
/*[local-name() = 'Result']/*[local-name() = 'row']/#ows_ID
There is probably more to the fragment you posted so this XPath query might need a fixup when used against the actual xml result.
The function, local-name(), lets you ignore namespaces, which can be both a boon and a curse. :)
When you start from root:
/Result/z:row/#ows_ID
also you can improve search if exists multiple Result:
/Result[#ID='1,New']/z:row/#ows_ID
<xsl:value-of select="Result/b:row/#ows_ID"/>
or
<xsl:value-of select="Result/b:row[#ows_ID = '9']"/>
Depending on what value you wanted
You probably need to make sure the z namespace prefix is declared correctly - that's implementation dependent. Here's how you do it in Java's XPath implementation.
Then to select the value of the ows_ID attribute, you need to navigate to the element itself, then use #ows_ID to get the value.
The specific xpath calls depend on what library you use (e.g. libxml xpath implementation).
But the generic xpath statement would be:
"//z:row[#ows_ID='9']"
This will select all z:row nodes with an attribute ows_ID of value 9.
You can modify this query to match all z:row nodes or only those with a specific attribute.
For details look here: W3Schools XPath syntax

Xpathexpression AddSort on multilpe field?

My application will process more than 10000 xml documents as batch. while processing I want to sort the content of xml documents.
I came accross XpathExpression AddSort method, but how to use i to sort on multiple fields ?
or using xslttranform will be appropriate ?? which is better in term of performance ??
Thanks in advance.
Jon Kra
Let me answer in backorder
To choose between XPath and xsltransfor you should understand if xslt is enough to your batch processing. Most of xml operations can be done in xslt, so think about fully migrate.
Concerning to XPathExpression.AddSort. According to msdn: first argument can be XPathExpression, the second should be IComparer.
This exposes your 2 ways.
Let the XPathExpression to merge 2 or more fields to compare
Let the XPathExpression select some "root" of compare and pass it to IComparer, that in order would extract from "root" expected fields to compare.

Categories