Find and Replace through XML files - c#

I have a lot of XML files that need to be edited.
I need to find all instances with:
Example
<Btn>
<sText>Hold</sText>
and add a field before it
<Btn> //start of new fields
<sText>Tools</sText>
*rest of fields*
</Btn> //end of added fields
<Btn> //start of original search
<sText>Hold</sText>
I have read that using regex on XML is not advisable. What would be the best way to do a large one-time operation across multiple files for something like this?
I tried a regex just to start by searching for the needed fields, but with no luck:
/<Btn>(.*?(\n))+.*?<sText>Hold</sText>/im
I am currently using editors like Notepad++ and Brackets to edit the files. Any suggestions for doing a large one-time operation would be greatly appreciated. Making the changes by hand in the GUI to hundreds of configs is not desirable. I am just looking for an alternative route to save my sanity.

You can create an object for your XML document. From there you can traverse all of its nodes, find what you are looking for and add the matches to a list. Once you have the list, you can write your logic for inserting the nodes that you want. I'm using LINQ to XML.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class Program
{
    static void Main(string[] args)
    {
        XDocument doc = XDocument.Load("YourXmlFile.xml");
        RootElement root = new RootElement(doc.Elements().FirstOrDefault());
        foreach (XElement item in root.GetInstances())
        {
            //--Your logic for adding the fields you want
        }
        Console.ReadLine();
    }
}

public class RootElement
{
    public List<XElement> childElements { get; set; }

    // Collects the direct children of the given root element.
    public RootElement(XElement xElement)
    {
        childElements = xElement.Elements().ToList();
    }

    // Returns every <Btn> child whose first <sText> element has the value "Hold".
    public List<XElement> GetInstances()
    {
        List<XElement> instances = new List<XElement>();
        foreach (XElement item in childElements)
        {
            if (item.Name == "Btn")
            {
                XElement child = item.Elements().FirstOrDefault(x => x.Name == "sText");
                if (child != null && child.Value == "Hold")
                {
                    instances.Add(item);
                }
            }
        }
        return instances;
    }
}
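The placeholder comment in the loop above could be filled in along these lines. This is only a minimal sketch: it assumes the new button needs nothing more than an sText of "Tools" and that matching Btn elements may sit anywhere in the document. The names ButtonInserter, InsertToolsBeforeHold and ProcessFolder are hypothetical and not part of the answer above.

```csharp
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class ButtonInserter
{
    // Inserts a new <Btn><sText>Tools</sText></Btn> before every <Btn>
    // whose <sText> child is "Hold". Returns the number of inserts made.
    public static int InsertToolsBeforeHold(XDocument doc)
    {
        // ToList() snapshots the matches so the tree can be modified safely.
        var holds = doc.Descendants("Btn")
                       .Where(b => (string)b.Element("sText") == "Hold")
                       .ToList();
        foreach (XElement hold in holds)
        {
            // Assumed content of the new button; add the rest of the fields here.
            hold.AddBeforeSelf(new XElement("Btn", new XElement("sText", "Tools")));
        }
        return holds.Count;
    }

    // Applies the edit to every *.xml file in a folder (the folder is an assumption).
    public static void ProcessFolder(string folder)
    {
        foreach (string path in Directory.EnumerateFiles(folder, "*.xml"))
        {
            XDocument doc = XDocument.Load(path);
            if (InsertToolsBeforeHold(doc) > 0)
            {
                doc.Save(path);
            }
        }
    }
}
```

Running it twice inserts the button twice, and saving may change the whitespace layout of the files, so it is probably wise to work on a backup copy of the configs.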

I have an XSL approach you might like to try. XSL is great for transforming XML documents of one kind into another (amongst other things).
As I understand it, you need to find each instance of Btn and copy it to a new instance before its current location.
With this in mind, here's how I got it to work.
Test.xml file:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="Test.xslt"?>
<Something>
<Btn>
<sText>Hold</sText>
<Another>Foo</Another>
</Btn>
<Btn>
<sText>Hold</sText>
</Btn>
<Btn>
<sText>Hold</sText>
</Btn>
</Something>
Note the stylesheet reference; you would need to add this to the documents you wish to edit.
Test.xslt file:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<xsl:element name="Output">
<xsl:apply-templates select="//Btn" />
</xsl:element>
</xsl:template>
<xsl:template match="Btn">
<xsl:element name="NewBtn">
<xsl:copy-of select="current()/*" />
</xsl:element>
<xsl:element name="Btn">
<xsl:copy-of select="current()/*" />
</xsl:element>
</xsl:template>
</xsl:stylesheet>
The output should look like this:
<?xml version="1.0" encoding="utf-8"?>
<Output>
<NewBtn>
<sText>Hold</sText>
<Another>Foo</Another>
</NewBtn>
<Btn>
<sText>Hold</sText>
<Another>Foo</Another>
</Btn>
<NewBtn>
<sText>Hold</sText>
</NewBtn>
<Btn>
<sText>Hold</sText>
</Btn>
<NewBtn>
<sText>Hold</sText>
</NewBtn>
<Btn>
<sText>Hold</sText>
</Btn>
</Output>
The newly duplicated instances of your Btn nodes are named NewBtn in this example.
Note that I've changed/added some elements here (Output, Something) in order to get valid XML.
I hope that helps!

You can try to solve it without regular expressions. For example, you can use XmlReader and XmlWriter:
1. Read one row with XmlReader
2. Check for your condition
3. Skip/modify the row
4. Write the row with XmlWriter
It's the most memory- and CPU-efficient solution, since you don't need to load the whole file into memory, and XmlWriter/XmlReader in C# are pretty fast compared to XDocument or other fancy XML parsers. Moreover, it's simple, so you don't need to mess with regexes, and it can contain any complicated logic you need.
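The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: it assumes the Btn elements are direct children of the root element and that the new button only needs an sText of "Tools". The class and method names are hypothetical.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingButtonInserter
{
    // Copies XML from input to output, writing a new "Tools" button before
    // every <Btn> whose <sText> is "Hold". Assumes the <Btn> elements are
    // direct children of the root element.
    public static void Copy(TextReader input, TextWriter output)
    {
        using (XmlReader reader = XmlReader.Create(input))
        using (XmlWriter writer = XmlWriter.Create(output))
        {
            // Reproduce the root start tag and its attributes.
            reader.MoveToContent();
            writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
            writer.WriteAttributes(reader, false);
            reader.Read();

            while (!reader.EOF && reader.NodeType != XmlNodeType.EndElement)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "Btn")
                {
                    // 1. Read one <Btn>; ReadFrom advances the reader past it.
                    var btn = (XElement)XNode.ReadFrom(reader);
                    // 2. Check the condition. 3. Write the extra button first.
                    if ((string)btn.Element("sText") == "Hold")
                    {
                        new XElement("Btn", new XElement("sText", "Tools")).WriteTo(writer);
                    }
                    // 4. Write the original button unchanged.
                    btn.WriteTo(writer);
                }
                else
                {
                    // Anything else is copied as-is; WriteNode also advances the reader.
                    writer.WriteNode(reader, false);
                }
            }
            writer.WriteEndElement();
        }
    }
}
```

Only one Btn element is held in memory at a time, which is the point of the reader/writer approach. For files, a StreamReader and StreamWriter can be passed in place of the string-based readers and writers.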


C# XSLT Transforming Large XML Files Quickly

I'm transforming a > 2GB file with a lookup template in the XSLT.
I would like this to run faster but can't find any low hanging fruit to improve performance. Any help would be greatly appreciated.
I'm a newb when it comes to transformations.
This is the current format of the XML file.
<?xml version="1.0" encoding="utf-8" ?>
<contacts>
<contact>
<attribute>
<name>text12</name>
<value>B00085590</value>
</attribute>
<attribute>
<name>text34</name>
<value>Atomos</value>
</attribute>
<attribute>
<name>date866</name>
<value>02/21/1991</value>
</attribute>
</contact>
<contact>
<attribute>
<name>text12</name>
<value>B00058478</value>
</attribute>
<attribute>
<name>text34</name>
<value>Balderas</value>
</attribute>
<attribute>
<name>date866</name>
<value>11/24/1997</value>
</attribute>
</contact>
</contacts>
The xslt I used for the transformation.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"
>
<xsl:output method="xml" indent="yes"/>
<!--Identify location of the lookup xml-->
<xsl:param name="lookupDoc" select="document('C:\Projects\Attributes.xml')" />
<!--Main Template-->
<xsl:template match="/contacts">
<!--Apply Formatted Contacts Template-->
<xsl:apply-templates select="contact" />
</xsl:template>
<!--Formatted Contacts Template-->
<xsl:template match="contact">
<contact>
<xsl:for-each select="attribute">
<!--Create variable to hold New Name after passing the Data Name to the Lookup Template-->
<xsl:variable name="newName">
<xsl:apply-templates select="$lookupDoc/attributes/attribute">
<xsl:with-param name="nameToMatch" select="name" />
</xsl:apply-templates>
</xsl:variable>
<!--Format Contact Element with New Name variable-->
<xsl:element name="{$newName}">
<xsl:value-of select="value"/>
</xsl:element>
</xsl:for-each>
</contact>
</xsl:template>
<!--Lookup Template-->
<xsl:template match="attributes/attribute">
<xsl:param name="nameToMatch" />
<xsl:value-of select='translate(translate(self::node()[name = $nameToMatch]/mappingname, "()*%$#@!~&lt;&gt;&apos;&amp;,.?[]=-+/\:1234567890", "")," ","")' />
</xsl:template>
</xsl:stylesheet>
Sample Lookup XML
<?xml version="1.0" encoding="utf-8" ?>
<attributes>
<attribute>
<name>text12</name>
<mappingname>ID</mappingname>
<datatype>Varchar2</datatype>
<size>30</size>
</attribute>
<attribute>
<name>text34</name>
<mappingname>Last Name</mappingname>
<datatype>Varchar2</datatype>
<size>30</size>
</attribute>
<attribute>
<name>date866</name>
<mappingname>DOB</mappingname>
<datatype>Date</datatype>
<size></size>
</attribute>
</attributes>
Transformed XML
<?xml version="1.0" encoding="utf-8" ?>
<contacts>
<contact>
<ID>B00085590</ID>
<LastName>Atomos</LastName>
<DOB>02/21/1991</DOB>
</contact>
<contact>
<ID>B00058478</ID>
<LastName>Balderas</LastName>
<DOB>11/24/1997</DOB>
</contact>
</contacts>
C#
XsltSettings settings = new XsltSettings(true, true);
XslCompiledTransform ContactsXslt = new XslCompiledTransform();
ContactsXslt.Load(@"C:\Projects\ContactFormat.xslt", settings, new XmlUrlResolver());
using (XmlReader r = XmlReader.Create(@"C:\Projects\Contacts.xml")){
using (XmlWriter w = XmlWriter.Create(@"C:\Projects\FormattedContacts.xml")) {
w.WriteStartElement("contacts");
while (r.Read()) {
if (r.NodeType == XmlNodeType.Element && r.Name == "contact") {
XmlReader temp = new XmlTextReader(new StringReader(r.ReadOuterXml()));
ContactsXslt.Transform(temp, null, w);
}
}
}
}
The approach I'm taking is transforming 1 node at a time to avoid an OutOfMemoryException. Should I be feeding larger chunks through to speed up the process? Or am I going about this all wrong?
I think you can simplify the XSLT code
<xsl:for-each select="attribute">
<!--Create variable to hold New Name after passing the Data Name to the Lookup Template-->
<xsl:variable name="newName">
<xsl:apply-templates select="$lookupDoc/attributes/attribute">
<xsl:with-param name="nameToMatch" select="name" />
</xsl:apply-templates>
</xsl:variable>
using the template
<xsl:template match="attributes/attribute">
<xsl:param name="nameToMatch" />
<xsl:value-of select='translate(translate(self::node()[name = $nameToMatch]/mappingname, "()*%$#@!~&lt;&gt;&apos;&amp;,.?[]=-+/\:1234567890", "")," ","")' />
</xsl:template>
to
<xsl:for-each select="attribute">
<!--Create variable to hold New Name after passing the Data Name to the Lookup Template-->
<xsl:variable name="newName">
<xsl:apply-templates select="$lookupDoc/attributes/attribute[name = current()/name]"/>
</xsl:variable>
with the template being simplified to
<xsl:template match="attributes/attribute">
<xsl:value-of select='translate(translate(mappingname, "()*%$#@!~&lt;&gt;&apos;&amp;,.?[]=-+/\:1234567890", "")," ","")' />
</xsl:template>
I think that is certainly a more concise and more XSLT-like way of expressing the approach; whether it improves performance is something you would have to test.
In general with XSLT to improve performance of cross-references/lookups it is recommended to use a key so you would use
<xsl:key name="att-lookup" match="attributes/attribute" use="name"/>
and then use it as
<xsl:variable name="name" select="name"/>
<xsl:variable name="newName">
<!-- in XSLT 1 we need to change the context doc for the key lookup -->
<xsl:for-each select="$lookupDoc">
<xsl:apply-templates select="key('att-lookup', $name)"/>
</xsl:for-each>
</xsl:variable>
I think that would considerably speed up the lookup in a single transformation. Because you combine XmlReader and XSLT and run the XSLT once for every element your XmlReader finds, I can't tell whether it helps much; you would need to try.
As pointed out in the XSLT 3 suggestion, I would also consider transforming the lookup file first, and only once, to avoid repeating all those translate calls that create proper XML element names. Either do that outside of the existing XSLT, or do it inside by using a variable and then exsl:node-set to convert the result tree fragment into a node-set. But since you run the XSLT repeatedly, it is probably better to transform the lookup document first, outside of the main XSLT, so the translate calls are not executed again and again.
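Cleaning the lookup document once, outside the stylesheet, might look like the sketch below. It assumes the lookup file has the structure shown in the question and that the stripped character set mirrors the translate() calls; LookupCleaner and its members are hypothetical names.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class LookupCleaner
{
    // Assumed to mirror the characters the stylesheet strips with translate(),
    // plus the space removed by the outer translate() call.
    private const string Stripped = "()*%$#@!~<>'&,.?[]=-+/\\:1234567890 ";

    // Removes every stripped character from a mapping name.
    public static string ToElementName(string mappingName)
    {
        return new string(mappingName.Where(c => Stripped.IndexOf(c) < 0).ToArray());
    }

    // Rewrites every <mappingname> in place so the XSLT can use it directly.
    public static void Clean(XDocument lookup)
    {
        foreach (XElement mapping in lookup.Descendants("mappingname"))
        {
            mapping.Value = ToElementName(mapping.Value);
        }
    }
}
```

After saving the cleaned document, the lookup template would only need to output mappingname as it is, without the nested translate() calls.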
When reading huge XML files, always use XmlReader. I like using a combination of XmlReader and LINQ to XML. I also like using dictionaries. See the code below:
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
class Program
{
const string FILENAME = @"c:\temp\test.xml";
static void Main(string[] args)
{
XmlReader reader = XmlReader.Create(FILENAME);
while (!reader.EOF)
{
if (reader.Name != "contact")
{
reader.ReadToFollowing("contact");
}
if (!reader.EOF)
{
XElement xContact = (XElement)XElement.ReadFrom(reader);
Contact newContact = new Contact();
Contact.contacts.Add(newContact);
newContact.attributes = xContact.Descendants("attribute")
.GroupBy(x => (string)x.Element("name"), y => (string)y.Element("value"))
.ToDictionary(x => x.Key, y => y.FirstOrDefault());
}
}
}
}
public class Contact
{
public static List<Contact> contacts = new List<Contact>();
public Dictionary<string, string> attributes { get; set; }
}
}
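The loop above only collects the attributes in memory. To also produce the renamed output while streaming, something like the following sketch could work. It assumes a dictionary from data names (such as text12) to already cleaned element names (such as ID) has been built from the lookup file beforehand; ContactStreamer and Transform are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class ContactStreamer
{
    // Reads <contact> elements one at a time and writes each one with its
    // <attribute> children renamed according to the names dictionary.
    public static void Transform(XmlReader reader, XmlWriter writer,
                                 IDictionary<string, string> names)
    {
        writer.WriteStartElement("contacts");
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "contact")
            {
                // ReadFrom loads a single contact and advances the reader past it.
                var contact = (XElement)XNode.ReadFrom(reader);
                var renamed = new XElement("contact",
                    contact.Elements("attribute").Select(a =>
                        new XElement(names[(string)a.Element("name")],
                                     (string)a.Element("value"))));
                renamed.WriteTo(writer);
            }
            else
            {
                reader.Read();
            }
        }
        writer.WriteEndElement();
    }
}
```

An attribute name missing from the dictionary throws a KeyNotFoundException here; whether to skip or report such entries instead is a design choice left open.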
As an alternative, you might want to look into solving the task with XSLT 3 and its streaming feature (https://www.w3.org/TR/xslt-30/#streaming-concepts). There you can process the huge input file in a forwards-only but declarative way; only in the template for the attribute element do you need to make sure you work with an intentionally created full copy of that element, to allow XPath navigation to the child elements. Additionally, I think it makes sense to read in the lookup document only once and to do the translate calls that create the proper element names only once. So the following is a streaming XSLT 3 solution, runnable with Saxon 9.8 EE, which transforms the lookup document into an XPath 3.1 map (https://www.w3.org/TR/xpath-31/#id-maps) and otherwise uses a streamable mode to process the large main input:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:map="http://www.w3.org/2005/xpath-functions/map"
exclude-result-prefixes="xs map"
version="3.0">
<!-- could of course load the document using select="document('lookup.xml')" instead of inlining it as done here just for the example and testing -->
<xsl:param name="lookup-doc">
<attributes>
<attribute>
<name>text12</name>
<mappingname>ID</mappingname>
<datatype>Varchar2</datatype>
<size>30</size>
</attribute>
<attribute>
<name>text34</name>
<mappingname>Last Name</mappingname>
<datatype>Varchar2</datatype>
<size>30</size>
</attribute>
<attribute>
<name>date866</name>
<mappingname>DOB</mappingname>
<datatype>Date</datatype>
<size></size>
</attribute>
</attributes>
</xsl:param>
<xsl:variable
name="lookup-map"
as="map(xs:string, xs:string)"
select="map:merge(
$lookup-doc/attributes/attribute
!
map {
string(name) : translate(translate(mappingname, '()*%$#@!~&lt;&gt;''&amp;,.?[]=-+/\:1234567890', ''), ' ','')
}
)"/>
<xsl:mode on-no-match="shallow-copy" streamable="yes"/>
<xsl:output method="xml" indent="yes"/>
<xsl:template match="contact/attribute">
<xsl:variable name="attribute-copy" select="copy-of()"/>
<xsl:element name="{$lookup-map($attribute-copy/name)}">
<xsl:value-of select="$attribute-copy/value"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Online sample (there running with Saxon 9.8 HE which ignores the streaming and does normal XSLT processing) is at https://xsltfiddle.liberty-development.net/bFDb2Ct/1.
To run streaming XSLT 3 with Saxon 9.8 and C# you use http://saxonica.com/html/documentation/dotnetdoc/Saxon/Api/Xslt30Transformer.html and set up ApplyTemplates on an input Stream with your huge input XML (http://saxonica.com/html/documentation/dotnetdoc/Saxon/Api/Xslt30Transformer.html#ApplyTemplates(System.IO.Stream,Saxon.Api.XmlDestination)).

How can I replace xmlns namespace attributes with prefixes?

I have been trying to write a utility in C# which takes an XML file, removes the xmlns attributes from the tags, sets the prefix of these attributes in the root tag, and then uses these prefixes in the tags instead.
Source XML file:
<?xml version="1.0" encoding="utf-8"?>
<Main version="1.0" xmlns="urn:root:v1">
<Report>
<Title>Some Value</Title>
</Report>
<Content>
<Address>
<CountryName xmlns="urn:location:v2">Australia</CountryName>
</Address>
</Content>
</Main>
Target XML file:
<?xml version="1.0" encoding="utf-8"?>
<root:Main version="1.0" xmlns:root="urn:root:v1" xmlns:loc="urn:location:v2">
<root:Report>
<root:Title>Some Value</root:Title>
</root:Report>
<root:Content>
<root:Address>
<loc:CountryName>Australia</loc:CountryName>
</root:Address>
</root:Content>
</root:Main>
I've managed to get part of the way there with the following code. I have replaced all tags with no attributes with the root prefix, and added the xmlns attribute to the root tag, but have not been successful with removing the xmlns attribute from CountryName tag and using the prefix there instead.
XDocument doc = XDocument.Load(@"C:\Temp\Source.xml");
var content = XElement.Parse(doc.ToString());
content.Attributes("xmlns").Remove();
content.Add(new XAttribute(XNamespace.Xmlns + "root", "urn:root:v1"));
content.Add(new XAttribute(XNamespace.Xmlns + "loc", "urn:location:v2"));
foreach (var node in doc.Root.Descendants().Where(n => n.Name.NamespaceName == "urn:location:v2"))
{
node.Attribute("xmlns").Remove();
node.Add(new XAttribute(XNamespace.Xmlns + "loc", "urn:location:v2"));
}
content.Save(@"C:\Temp\Target.xml");
Any help would be appreciated - thanks!
You're not a million miles away. All you need to do is remove any existing namespace declaration attributes and then add the ones you want to the root. The rest will be taken care of.
var doc = XDocument.Load(@"C:\Temp\Source.xml");
doc.Descendants().Attributes().Where(x => x.IsNamespaceDeclaration).Remove();
doc.Root.Add(new XAttribute(XNamespace.Xmlns + "root", "urn:root:v1"));
doc.Root.Add(new XAttribute(XNamespace.Xmlns + "loc", "urn:location:v2"));
doc.Save(@"C:\Temp\Target.xml");
See this fiddle for a demo.
Consider XSLT, the special-purpose language designed to transform XML files. While I personally do not know or use C#, I do know it can run XSLT 1.0 scripts. See answers here. Also, the XSLT processor you choose to use must allow the document() function for this solution.
XSLT (save as .xsl file; notice namespaces declared in header)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:root="urn:root:v1" xmlns:local="urn:location:v2">
<xsl:output omit-xml-declaration="no" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*">
<xsl:element name="root:{name()}" namespace="urn:root:v1">
<xsl:copy-of select="document('')/*/namespace::local"/>
<xsl:apply-templates select="node()|@*"/>
</xsl:element>
</xsl:template>
<xsl:template match="*[local-name()='CountryName']">
<xsl:element name="local:{name()}" namespace="urn:location:v2">
<xsl:apply-templates select="node()|@*"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
C# (see link above)
var myXslTrans = new XslCompiledTransform();
myXslTrans.Load("XSLTScript.xsl");
myXslTrans.Transform("Input.xml", "Output.xml");
XML Output
<?xml version="1.0"?>
<root:Main xmlns:root="urn:root:v1" xmlns:local="urn:location:v2" version="1.0">
<root:Report>
<root:Title>Some Value</root:Title>
</root:Report>
<root:Content>
<root:Address>
<local:CountryName>Australia</local:CountryName>
</root:Address>
</root:Content>
</root:Main>

how to edit xml file in C#

<Family_Inventory>
<FamilyCategory>Curtain Panels
<FamilyName>System Panel
<FamilySymbol>Glazed</FamilySymbol>
<FamilySymbol>Wall</FamilySymbol>
</FamilyName>
</FamilyCategory>
<FamilyCategory>Curtain Panels
<FamilyName>Rectangular Mullion
<FamilySymbol>2.5" x 5" rectangular</FamilySymbol>
</FamilyName>
</FamilyCategory>
...........
...........// many other family categories
...........
</Family_Inventory>
I want to create only one FamilyCategory tag per distinct value (in the example above, 'Curtain Panels'),
so output will be
<Family_Inventory>
<FamilyCategory>Curtain Panels
<FamilyName>System Panel
<FamilySymbol>Glazed</FamilySymbol>
<FamilySymbol>Wall</FamilySymbol>
</FamilyName>
<FamilyName>Rectangular Mullion
<FamilySymbol>2.5" x 5" rectangular</FamilySymbol>
</FamilyName>
</FamilyCategory>
...........
...........// many other family categories
...........
</Family_Inventory>
Please tell me how I can do this.
Thanks in Advance
Regards,
Nitin
You should read the documentation on XmlDocument and XmlNode and their associated classes. There are also plenty of examples of how to use them online.
Here is an introduction to XML with C#.
The first thing that comes to my mind is to use LINQ to load the whole document, then create just two classes:
class FamilyCategory
{
    string name;
    List<FamilyName> familyNames;
}
class FamilyName
{
    string name;
    List<string> familySymbol;
}
After you parse the whole document, sort the entries by FamilyCategory and go through the list. Create a new instance of FamilyCategory for each new category and add all of its FamilyNames to its list. After that you just need to write them all back, which is described here.
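The grouping step described above can also be done directly on the XML tree without intermediate classes. The following is a minimal sketch that assumes the category name is the first text node of each FamilyCategory element, as in the sample; CategoryMerger and its methods are hypothetical names.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class CategoryMerger
{
    // The category name is assumed to be the first text node of the
    // mixed-content <FamilyCategory> element.
    private static string CategoryName(XElement category)
    {
        return category.Nodes().OfType<XText>()
                       .Select(t => t.Value.Trim())
                       .FirstOrDefault() ?? "";
    }

    // Builds a new inventory with one <FamilyCategory> per distinct name,
    // each holding copies of all <FamilyName> elements of that category.
    public static XElement Merge(XElement inventory)
    {
        return new XElement(inventory.Name,
            inventory.Elements("FamilyCategory")
                     .GroupBy(c => CategoryName(c))
                     .Select(g => new XElement("FamilyCategory", g.Key,
                                               g.Elements("FamilyName"))));
    }
}
```

Calling CategoryMerger.Merge(doc.Root) on the loaded document returns the merged tree, which can then be saved with Save(). The original tree is left untouched because elements that already have a parent are cloned when added elsewhere.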
I've not tested this, but I think it's close. It's a very manual approach, and there may be a slicker way to do it, but it's just manually grouping the categories into a new xdocument called groupedCategoryDocument.
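// requires: using System.Collections.Generic; using System.Linq; using System.Xml.Linq;
var doc = XDocument.Parse(xmlString);
var groupedCategoryDocument = XDocument.Parse("<Family_Inventory />");
// maps a category name to the single merged FamilyCategory element
var categories = new Dictionary<string, XElement>();
foreach (var category in doc.Root.Elements("FamilyCategory"))
{
    // the category name is the first text node of the mixed-content element
    var name = category.Nodes().OfType<XText>().First().Value.Trim();
    if (!categories.TryGetValue(name, out var groupedCategory))
    {
        // first time this category is seen: create it in the new document
        groupedCategory = new XElement("FamilyCategory", name);
        categories[name] = groupedCategory;
        groupedCategoryDocument.Root.Add(groupedCategory);
    }
    // copy all of the families for this category into the merged element
    foreach (var familyName in category.Elements("FamilyName"))
    {
        groupedCategory.Add(new XElement(familyName));
    }
}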
On third preview, I think the XML you are trying to generate is too complex and I think this will be the easiest both in terms of generation & consumption
<Family_Inventory>
<Family category="{category-name}" name="{family-name}" symbol="{symbol-1}"/>
<Family category="{category-name}" name="{family-name}" symbol="{symbol-2}"/>
...
<Family category="{category-name}" name="{family-name}" symbol="{symbol-n}"/>
</Family_Inventory>
One possibility is to use XSLT and grouping by keys. In this case it is complicated by the use of mixed content for "category" and "name"; it makes the key construction non-obvious:
normalize-space(./node()[1]): the first child node of FamilyCategory is assumed to be a text node that contains the category name; it is further trimmed of surrounding white space.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" />
<xsl:key name="category" match="FamilyCategory" use="normalize-space(./node()[1])" />
<xsl:template match="Family_Inventory">
<xsl:copy>
<xsl:for-each
select="FamilyCategory[count(. | key('category', normalize-space(./node()[1]))[1]) = 1]">
<xsl:copy>
<xsl:value-of select="normalize-space(./node()[1])" />
<xsl:for-each select="key('category', normalize-space(./node()[1]))/FamilyName">
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:copy>
</xsl:for-each>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
A good explanation of how it works is found here: http://www.jenitennison.com/xslt/grouping/muenchian.xml

Comparing two XML files & generating a third with XMLDiff in C#

I am trying to write a simple algorithm to read two XML files with the exact same nodes and structure but not necessarily the same data inside the child nodes and not the same order. How could I create a simple implementation for creating a third, temporary XML being the differential between the two first ones, using Microsoft's XML Diff .DLL ?
XML Diff on MSDN:
XML Diff and Patch Tool
XML Diff and Patch GUI Tool
sample XML code of the two different XML files to compare:
<?xml version="1.0" encoding="utf-8" ?>
<Stats Date="2011-01-01">
<Player Rank="1">
<Name>Sidney Crosby</Name>
<Team>PIT</Team>
<Pos>C</Pos>
<GP>39</GP>
<G>32</G>
<A>33</A>
<PlusMinus>20</PlusMinus>
<PIM>29</PIM>
</Player>
</Stats>
<?xml version="1.0" encoding="utf-8" ?>
<Stats Date="2011-01-10">
<Player Rank="1">
<Name>Sidney Crosby</Name>
<Team>PIT</Team>
<Pos>C</Pos>
<GP>42</GP>
<G>35</G>
<A>34</A>
<PlusMinus>22</PlusMinus>
<PIM>30</PIM>
</Player>
</Stats>
Result wanted (difference between the two)
<?xml version="1.0" encoding="utf-8" ?>
<Stats Date="2011-01-10">
<Player Rank="1">
<Name>Sidney Crosby</Name>
<Team>PIT</Team>
<Pos>C</Pos>
<GP>3</GP>
<G>3</G>
<A>1</A>
<PlusMinus>2</PlusMinus>
<PIM>1</PIM>
</Player>
</Stats>
In this case, I would probably use XSLT to convert the resulting XML "differential" file into a sorted HTML file, but I am not there yet. All I want to do is to write to the third XML file the difference of every numerical value of each node, starting from the "GP" child node.
C# code I have so far:
private void CompareXml(string file1, string file2)
{
XmlReader reader1 = XmlReader.Create(new StringReader(file1));
XmlReader reader2 = XmlReader.Create(new StringReader(file2));
string diffFile = StatsFile.XmlDiffFilename;
StringBuilder differenceStringBuilder = new StringBuilder();
FileStream fs = new FileStream(diffFile, FileMode.Create);
XmlWriter diffGramWriter = XmlWriter.Create(fs);
XmlDiff xmldiff = new XmlDiff(XmlDiffOptions.IgnoreChildOrder |
XmlDiffOptions.IgnoreNamespaces |
XmlDiffOptions.IgnorePrefixes);
bool bIdentical = xmldiff.Compare(file1, file2, false, diffGramWriter);
diffGramWriter.Close();
// cleaning up after we are done with the xml diff file
File.Delete(diffFile);
}
That's what I have so far, but the result is garbage. Note that for each "Player" node, the first three children must NOT be compared. How can I implement this?
There are two immediate solutions:
Solution 1.
You can first apply a simple transform to the two documents that deletes the elements that should not be compared. Then compare the two resulting documents, exactly as in your current code. Here is the transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Name|Team|Pos"/>
</xsl:stylesheet>
When this transformation is applied to the provided XML document:
<Stats Date="2011-01-01">
<Player Rank="1">
<Name>Sidney Crosby</Name>
<Team>PIT</Team>
<Pos>C</Pos>
<GP>39</GP>
<G>32</G>
<A>33</A>
<PlusMinus>20</PlusMinus>
<PIM>29</PIM>
<PP>10</PP>
<SH>1</SH>
<GW>3</GW>
<Shots>0</Shots>
<ShotPctg>154</ShotPctg>
<TOIPerGame>20.8</TOIPerGame>
<ShiftsPerGame>21:54</ShiftsPerGame>
<FOWinPctg>22.6</FOWinPctg>
</Player>
</Stats>
the wanted resulting document is produced:
<Stats Date="2011-01-01">
<Player Rank="1">
<GP>39</GP>
<G>32</G>
<A>33</A>
<PlusMinus>20</PlusMinus>
<PIM>29</PIM>
<PP>10</PP>
<SH>1</SH>
<GW>3</GW>
<Shots>0</Shots>
<ShotPctg>154</ShotPctg>
<TOIPerGame>20.8</TOIPerGame>
<ShiftsPerGame>21:54</ShiftsPerGame>
<FOWinPctg>22.6</FOWinPctg>
</Player>
</Stats>
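Applying the stylesheet of Solution 1 from C# could look roughly like the sketch below. It works on strings so that it stays self-contained; StatFilter and Apply are hypothetical names, and how the two cleaned documents are then handed to the comparison code is left open.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class StatFilter
{
    // Runs the given XSLT stylesheet over the input XML and returns the result.
    public static string Apply(string stylesheetXml, string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader sheet = XmlReader.Create(new StringReader(stylesheetXml)))
        {
            xslt.Load(sheet);
        }

        var output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(inputXml)))
        // OutputSettings carries the xsl:output options of the stylesheet.
        using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(input, writer);
        }
        return output.ToString();
    }
}
```

Calling Apply once per input document yields two cleaned XML strings without the Name, Team and Pos elements.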
Solution 2.
This is a complete XSLT 1.0 solution (for convenience only, the second XML document is embedded in the transformation code):
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vrtfDoc2">
<Stats Date="2011-01-01">
<Player Rank="2">
<Name>John Smith</Name>
<Team>NY</Team>
<Pos>D</Pos>
<GP>38</GP>
<G>32</G>
<A>33</A>
<PlusMinus>15</PlusMinus>
<PIM>29</PIM>
<PP>10</PP>
<SH>1</SH>
<GW>4</GW>
<Shots>0</Shots>
<ShotPctg>158</ShotPctg>
<TOIPerGame>20.8</TOIPerGame>
<ShiftsPerGame>21:54</ShiftsPerGame>
<FOWinPctg>22.6</FOWinPctg>
</Player>
</Stats>
</xsl:variable>
<xsl:variable name="vDoc2" select=
"document('')/*/xsl:variable[@name='vrtfDoc2']/*"/>
<xsl:template match="node()|@*" name="identity">
<xsl:param name="pDoc2"/>
<xsl:copy>
<xsl:apply-templates select="node()|@*">
<xsl:with-param name="pDoc2" select="$pDoc2"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:apply-templates select="*">
<xsl:with-param name="pDoc2" select="$vDoc2"/>
</xsl:apply-templates>
-----------------------
<xsl:apply-templates select="$vDoc2">
<xsl:with-param name="pDoc2" select="/*"/>
</xsl:apply-templates>
</xsl:template>
<xsl:template match="Player/*">
<xsl:param name="pDoc2"/>
<xsl:if test=
"not(. = $pDoc2/*/*[name()=name(current())])">
<xsl:call-template name="identity"/>
</xsl:if>
</xsl:template>
<xsl:template match="Name|Team|Pos" priority="20"/>
</xsl:stylesheet>
when this transformation is applied on the same first document as above, the correct diffgrams are produced:
<Stats Date="2011-01-01">
<Player Rank="1">
<GP>39</GP>
<PlusMinus>20</PlusMinus>
<GW>3</GW>
<ShotPctg>154</ShotPctg>
</Player>
</Stats>
-----------------------
<Stats xmlns:xsl="http://www.w3.org/1999/XSL/Transform" Date="2011-01-01">
<Player Rank="2">
<GP>38</GP>
<PlusMinus>15</PlusMinus>
<GW>4</GW>
<ShotPctg>158</ShotPctg>
</Player>
</Stats>
How this works:
1. The transformation is applied on the first document, passing the second document as parameter.
   This produces an XML document whose only leaf element nodes are the ones that have a different value than the corresponding leaf element nodes in the second document.
2. The same processing is performed as in 1. above, but this time on the second document, passing the first document as parameter.
   This produces a second diffgram: an XML document whose only leaf element nodes are the ones that have a different value than the corresponding leaf element nodes in the first document.
Okay... I finally opted for a pure C# solution to compare the two XML files, without using the XML Diff/Patch .dll and without even needing XSL transforms. I will need XSL transforms in the next step, to convert the XML into HTML for viewing purposes, but I have figured out an algorithm using nothing but System.Xml and System.Xml.XPath.
Here is my algorithm:
private void CompareXml(string file1, string file2)
{
// Load the documents
XmlDocument docXml1 = new XmlDocument();
docXml1.Load(file1);
XmlDocument docXml2 = new XmlDocument();
docXml2.Load(file2);
// Get a list of all player nodes
XmlNodeList nodes1 = docXml1.SelectNodes("/Stats/Player");
XmlNodeList nodes2 = docXml2.SelectNodes("/Stats/Player");
// Define a single node
XmlNode node1;
XmlNode node2;
// Get the root Xml element
XmlElement root1 = docXml1.DocumentElement;
XmlElement root2 = docXml2.DocumentElement;
// Get a list of all player names
XmlNodeList nameList1 = root1.GetElementsByTagName("Name");
XmlNodeList nameList2 = root2.GetElementsByTagName("Name");
// Get a list of all teams
XmlNodeList teamList1 = root1.GetElementsByTagName("Team");
XmlNodeList teamList2 = root2.GetElementsByTagName("Team");
// Create an XmlWriterSettings object with the correct options.
XmlWriter writer = null;
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.IndentChars = (" ");
settings.OmitXmlDeclaration = false;
// Create the XmlWriter object and write some content.
writer = XmlWriter.Create(StatsFile.XmlDiffFilename, settings);
writer.WriteStartElement("StatsDiff");
// The compare algorithm
bool match = false;
int j = 0;
try
{
// iterate over every player in the first file
for (int i = 0; i < nameList1.Count; i++)
{
// restart the search in the second file for each player
j = 0;
while (j < nameList2.Count && match == false)
{
// There is a match if the player name and team are the same in both lists
if (nameList1.Item(i).InnerText == nameList2.Item(j).InnerText
&& teamList1.Item(i).InnerText == teamList2.Item(j).InnerText)
{
match = true;
node1 = nodes1.Item(i);
node2 = nodes2.Item(j);
// Call to the calculator and Xml writer
this.CalculateDifferential(node1, node2, writer);
}
else
{
// also advances when only the name matches, so the loop cannot hang
j++;
}
}
match = false;
}
// end Xml document
writer.WriteEndElement();
writer.Flush();
}
finally
{
if (writer != null)
writer.Close();
}
}
XML Results:
<?xml version="1.0" encoding="utf-8"?>
<StatsDiff>
<Player Rank="1">
<Name>Sidney Crosby</Name>
<Team>PIT</Team>
<Pos>C</Pos>
<GP>0</GP>
<G>0</G>
<A>0</A>
<Points>0</Points>
<PlusMinus>0</PlusMinus>
<PIM>0</PIM>
<PP>0</PP>
<SH>0</SH>
<GW>0</GW>
<OT>0</OT>
<Shots>0</Shots>
<ShotPctg>0</ShotPctg>
<ShiftsPerGame>0</ShiftsPerGame>
<FOWinPctg>0</FOWinPctg>
</Player>
<Player Rank="2">
<Name>Steven Stamkos</Name>
<Team>TBL</Team>
<Pos>C</Pos>
<GP>1</GP>
<G>0</G>
<A>0</A>
<Points>0</Points>
<PlusMinus>0</PlusMinus>
<PIM>2</PIM>
<PP>0</PP>
<SH>0</SH>
<GW>0</GW>
<OT>0</OT>
<Shots>4</Shots>
<ShotPctg>-0,6000004</ShotPctg>
<ShiftsPerGame>-0,09999847</ShiftsPerGame>
<FOWinPctg>0,09999847</FOWinPctg>
</Player>
[...]
</StatsDiff>
I have left out the implementation of the CalculateDifferential() method; it is rather cryptic, but it is fast and efficient. This way I could obtain the wanted results using nothing but the strict minimum of references and without having to use XSL.
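Since the method body is not shown, here is one hypothetical way such a per-player differential could be computed. It is not the author's implementation: it uses LINQ to XML instead of XmlNode, returns an element instead of writing to the XmlWriter, and the names StatsDiff and Differential are assumptions made for illustration.

```csharp
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class StatsDiff
{
    // Children that identify the player and must not be compared.
    private static readonly string[] Identity = { "Name", "Team", "Pos" };

    // Returns a copy of the newer <Player> in which every numeric child holds
    // newer minus older; identity fields and non-numeric values are kept as-is.
    public static XElement Differential(XElement older, XElement newer)
    {
        var result = new XElement(newer);
        foreach (XElement stat in result.Elements())
        {
            if (Identity.Contains(stat.Name.LocalName)) continue;

            XElement before = older.Element(stat.Name);
            decimal a, b;
            if (before != null
                && decimal.TryParse(before.Value, NumberStyles.Float, CultureInfo.InvariantCulture, out a)
                && decimal.TryParse(stat.Value, NumberStyles.Float, CultureInfo.InvariantCulture, out b))
            {
                stat.Value = (b - a).ToString(CultureInfo.InvariantCulture);
            }
        }
        return result;
    }
}
```

Using decimal with the invariant culture is a deliberate choice in this sketch: it avoids binary floating-point noise in the differences and always writes a dot as the decimal separator, regardless of the machine's locale.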

C#: How to remove namespace information from XML elements

How can I remove the "xmlns:..." namespace information from each XML element in C#?
Zombiesheep's cautionary answer notwithstanding, my solution is to wash the xml with an xslt transform to do this.
wash.xsl:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="no" encoding="UTF-8"/>
<xsl:template match="/|comment()|processing-instruction()">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="*">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
<xsl:template match="@*">
<xsl:attribute name="{local-name()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
From here http://simoncropp.com/working-around-xml-namespaces
var xDocument = XDocument.Parse(
@"<root>
<f:table xmlns:f=""http://www.w3schools.com/furniture"">
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>");
xDocument.StripNamespace();
var tables = xDocument.Descendants("table");
public static class XmlExtensions
{
public static void StripNamespace(this XDocument document)
{
if (document.Root == null)
{
return;
}
foreach (var element in document.Root.DescendantsAndSelf())
{
element.Name = element.Name.LocalName;
element.ReplaceAttributes(GetAttributes(element));
}
}
static IEnumerable<XAttribute> GetAttributes(XElement xElement)
{
return xElement.Attributes()
.Where(x => !x.IsNamespaceDeclaration)
.Select(x => new XAttribute(x.Name.LocalName, x.Value));
}
}
I had a similar problem (needing to remove a namespace attribute from a particular element, then return the XML as an XmlDocument to BizTalk) but ended up with a bizarre solution.
Before loading the XML string into the XmlDocument object, I did a text replacement to remove the offending namespace attribute. It seemed wrong at first as I ended up with XML that could not be parsed by the "XML Visualizer" in Visual Studio. This is what initially put me off this approach.
However, the text could still be loaded into the XmlDocument and I could output it to BizTalk fine.
Note too that earlier, I hit one blind alley when trying to use childNode.Attributes.RemoveAll() to remove the namespace attribute - it just came back again!
