Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I need to convert an XML file to make it readable in Excel; the idea is to flatten the XML file. I am using C#, and most resources I've found use SQL, which I do not need.
Any help would be appreciated.
Source:
<root>
<product>
<screen>
Samsung
</screen>
<screen>
Mecer
</screen>
</product>
<product>
<phone>
Sony
</phone>
<phone>
Nokia
</phone>
</product>
</root>
Expected result:
<dataSet>
<row>
<column>
screen
</column>
<column>
phone
</column>
</row>
<row>
<column>
Samsung
</column>
<column>
Sony
</column>
</row>
<row>
<column>
Mecer
</column>
<column>
Nokia
</column>
</row>
</dataSet>
A better way to do this is to create a new class structure:
[Serializable]
[XmlRoot(ElementName = "dataSet")]
public class RootDataSet
{
[XmlElement(ElementName = "row")]
public List<Rows> Rows { get; set; }
}
[Serializable]
public class Rows
{
[XmlElement(ElementName = "column")]
public List<string> column { get; set; }
}
Once you have that, you can put this code in a method to generate the file:
// Requires: using System.Collections.Generic; using System.IO; using System.Xml.Serialization;
static void Main(string[] args)
{
    // FileMode.Create truncates an existing file; OpenOrCreate would leave stale
    // bytes at the end if the new document is shorter than the old one
    using (Stream fileStream = new FileStream(@"C:\Nova pasta\file.xml", FileMode.Create))
    {
        RootDataSet dataset = new RootDataSet();
        dataset.Rows = new List<Rows>();
        Rows Rows1 = new Rows();
        Rows1.column = new List<string>();
        Rows1.column.Add("teste1");
        Rows1.column.Add("teste2");
        dataset.Rows.Add(Rows1);
        //use reflection to get the properties names of the class
        //get the values of the class
        XmlSerializer xmlSerializer = new XmlSerializer(dataset.GetType());
        xmlSerializer.Serialize(fileStream, dataset);
    }
}
The returned file will be:
<?xml version="1.0"?>
<dataSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<row>
<column>teste1</column>
<column>teste2</column>
</row>
</dataSet>
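The snippet above fills the rows by hand. A minimal sketch of deriving them from the question's source XML with LINQ to XML instead, assuming each <product> holds repeated children of a single element name and that the N-th value of every product belongs to the same data row (ProductFlattener, Flatten and MakeRow are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ProductFlattener
{
    // Turns <root><product><screen>..</screen>..</product>..</root> into
    // <dataSet><row><column>..</column>..</row>..</dataSet>.
    // Assumption: each <product> repeats a single child element name, and the
    // N-th value of every product belongs to the N-th data row.
    public static XElement Flatten(XElement root)
    {
        List<XElement> products = root.Elements("product").ToList();

        // Header row: the child element name used by each product ("screen", "phone").
        List<string> header = products
            .Select(p => p.Elements().Select(e => e.Name.LocalName).FirstOrDefault() ?? "")
            .ToList();

        // One list of values per product; Trim() because the source puts each
        // value on its own line.
        List<List<string>> columns = products
            .Select(p => p.Elements().Select(e => e.Value.Trim()).ToList())
            .ToList();

        int rowCount = columns.Count == 0 ? 0 : columns.Max(c => c.Count);

        var dataSet = new XElement("dataSet", MakeRow(header));
        for (int i = 0; i < rowCount; i++)
        {
            int row = i; // copy for the lambda below
            // Products with fewer values get an empty column in that row.
            dataSet.Add(MakeRow(columns.Select(c => row < c.Count ? c[row] : "")));
        }
        return dataSet;
    }

    private static XElement MakeRow(IEnumerable<string> values)
    {
        return new XElement("row", values.Select(v => new XElement("column", v)));
    }
}
```

The returned element can be written out with its Save(path) method; pairing values by position is only one possible reading of "flatten", so adjust it if your products are related differently.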
As @June Paik mentioned, you could go down the XSLT route, as this lets you configure how the XML is transformed without recompiling the application every time (you can just modify the XSLT and run the application again).
Here is a starting point for the XSLT:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template match="node( ) | @*">
<xsl:copy><xsl:apply-templates select="@* | node( )"/></xsl:copy>
</xsl:template>
<!-- Name of the element you wish to find here (XML names are case-sensitive) -->
<xsl:template match="product">
<!-- What you want to change it to here -->
<row><xsl:apply-templates select="@* | node( )"/></row>
</xsl:template>
</xsl:stylesheet>
The XSLT changes elements named "product" to "row".
Save this to a file (e.g. C:\productTransform.xslt).
Then use the XSLT in C# by writing the following 3 lines:
XslCompiledTransform Trans = new XslCompiledTransform();
Trans.Load(@"C:\productTransform.xslt");
Trans.Transform("products.xml", "transformedProducts.xml");
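If you would rather keep the stylesheet in memory than save it to C:\, a minimal sketch of the same XslCompiledTransform class working on strings instead of files (XsltRunner is a hypothetical name):

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltRunner
{
    // Compiles an XSLT 1.0 stylesheet given as text and applies it to an XML
    // string, returning the result as a string (no files involved).
    public static string Transform(string xsltText, string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(new StringReader(xsltText)))
        {
            xslt.Load(xsltReader);
        }

        var output = new StringWriter();
        // When the result goes to an XmlWriter, xsl:output is ignored,
        // so these writer settings decide the output format.
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlReader input = XmlReader.Create(new StringReader(inputXml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            xslt.Transform(input, writer);
        }
        return output.ToString();
    }
}
```

Compiling a stylesheet is relatively expensive, so if you transform many documents it is worth keeping the loaded XslCompiledTransform around rather than reloading it per call.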
Related
I have the following problem. I need to write an application in C# that will read a given XML file and prepare its data for loading into a database. In the XML, whose structure I cannot influence, the main data is placed in CDATA sections. I have verified that the data inside them is itself well-formed XML.
I've searched hundreds of posts and haven't found a solution. Below is the XML file from which I need to extract the data in the CDATA sections. Can anyone help?
<Docs>
<Doc>
<Content>
<![CDATA[
<Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header DocNumber="1" Description="Desc1"></Header>
<Poss>
<Pos Id="1" Name="Pos1"></Pos>
<Pos Id="2" Name="Pos2"></Pos>
</Poss>
</Doc>
]]>
</Content>
</Doc>
<Doc>
<Content>
<![CDATA[
<Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header DocNumber="2" Description="Desc2"></Header>
<Poss>
<Pos Id="3" Name="Pos3"></Pos>
<Pos Id="4" Name="Pos4"></Pos>
</Poss>
</Doc>
]]>
</Content>
</Doc>
</Docs>
The most important part for me is the fields contained in the Content section; I have to load them as data into the database.
To extract the data from the CDATA sections:
1. Construct the classes.
public class Doc
{
public Header Header { get; set; }
[XmlArrayItem(typeof(Pos), ElementName = "Pos")]
public List<Pos> Poss { get; set; }
}
public class Header
{
[XmlAttribute]
public int DocNumber { get; set; }
[XmlAttribute]
public string Description { get; set; }
}
public class Pos
{
[XmlAttribute]
public int Id { get; set; }
[XmlAttribute]
public string Name { get; set; }
}
2. Implement the extraction logic.
2.1. Read the XML string as XDocument via XDocument.Parse().
2.2. Select the descendant nodes of the elements matching the XPath "/Docs/Doc/Content".
2.3. Keep only the nodes of type XCData.
2.4. Use XmlSerializer to deserialize each XCData value into a Doc.
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;
using System.Xml.Serialization;
using System.IO;
XmlSerializer xmlSerializer = new XmlSerializer(typeof(Doc));
XDocument xDoc = XDocument.Parse(xml);
var cdataSections = xDoc.XPathSelectElements("/Docs/Doc/Content")
.DescendantNodes()
.OfType<XCData>()
.Select(x => (Doc)xmlSerializer.Deserialize(new StringReader(x.Value)))
.ToList();
Demo @ .NET Fiddle
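As an alternative to XmlSerializer (LINQ to XML only, a swapped-in technique), a minimal sketch that parses each CDATA payload as its own document and flattens it into one row per Pos, repeating the header values, which is the shape you would insert into a table. CdataRows, Row and Extract are hypothetical names; Descendants("Pos") is used so the sketch works with or without the <Poss> wrapper:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CdataRows
{
    // One flattened row per <Pos>, repeating the values of its <Header>.
    public sealed class Row
    {
        public int DocNumber;
        public string Description;
        public int PosId;
        public string PosName;
    }

    public static List<Row> Extract(string xml)
    {
        XDocument outer = XDocument.Parse(xml);
        return outer.Root
            .Elements("Doc")
            .Elements("Content")
            .Nodes()
            .OfType<XCData>()
            // The CDATA text is XML in its own right, so parse it separately.
            .Select(cdata => XElement.Parse(cdata.Value))
            .SelectMany(doc =>
            {
                XElement header = doc.Element("Header");
                return doc.Descendants("Pos").Select(pos => new Row
                {
                    DocNumber = (int)header.Attribute("DocNumber"),
                    Description = (string)header.Attribute("Description"),
                    PosId = (int)pos.Attribute("Id"),
                    PosName = (string)pos.Attribute("Name")
                });
            })
            .ToList();
    }
}
```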
Here is an implementation based on a stored procedure with an XML parameter, as suggested in my comments.
I had to remove the <Poss> XML element to make the CDATA section well-formed XML.
SQL
DECLARE @xml XML =
N'<Docs>
<Doc>
<Content><![CDATA[
<Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header DocNumber="1" Description="Desc1"></Header>
<Pos Id="1" Name="Pos1"></Pos>
<Pos Id="2" Name="Pos2"></Pos>
</Doc>
]]>
</Content>
</Doc>
<Doc>
<Content><![CDATA[
<Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header DocNumber="2" Description="Desc2"></Header>
<Pos Id="3" Name="Pos3"></Pos>
<Pos Id="4" Name="Pos4"></Pos>
</Doc>
]]>
</Content>
</Doc>
</Docs>';
--INSERT INTO <targetTable>
SELECT h.value('(Header/@DocNumber)[1]', 'INT') AS DocNumber
, h.value('(Header/@Description)[1]', 'VARCHAR(256)') AS DocDescription
, d.value('@Id', 'INT') AS posID
, d.value('@Name', 'VARCHAR(256)') AS posName
FROM @xml.nodes('/Docs/Doc/Content/text()') AS t(c)
CROSS APPLY (SELECT TRY_CAST(c.query('.').value('.', 'NVARCHAR(MAX)') AS XML)) AS t1(x)
CROSS APPLY x.nodes('/Doc') AS t2(h)
CROSS APPLY h.nodes('Pos') AS t3(d);
Output
DocNumber  DocDescription  posID  posName
2          Desc2           3      Pos3
2          Desc2           4      Pos4
1          Desc1           1      Pos1
1          Desc1           2      Pos2
I'm transforming a > 2GB file with a lookup template in the XSLT.
I would like this to run faster, but I can't find any low-hanging fruit to improve performance. Any help would be greatly appreciated.
I'm a newb when it comes to transformations.
This is the current format of the XML file.
<?xml version="1.0" encoding="utf-8" ?>
<contacts>
<contact>
<attribute>
<name>text12</name>
<value>B00085590</value>
</attribute>
<attribute>
<name>text34</name>
<value>Atomos</value>
</attribute>
<attribute>
<name>date866</name>
<value>02/21/1991</value>
</attribute>
</contact>
<contact>
<attribute>
<name>text12</name>
<value>B00058478</value>
</attribute>
<attribute>
<name>text34</name>
<value>Balderas</value>
</attribute>
<attribute>
<name>date866</name>
<value>11/24/1997</value>
</attribute>
</contact>
</contacts>
The XSLT I used for the transformation:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"
>
<xsl:output method="xml" indent="yes"/>
<!--Identify location of the lookup xml-->
<xsl:param name="lookupDoc" select="document('C:\Projects\Attributes.xml')" />
<!--Main Template-->
<xsl:template match="/contacts">
<!--Apply Formatted Contacts Template-->
<xsl:apply-templates select="contact" />
</xsl:template>
<!--Formatted Contacts Template-->
<xsl:template match="contact">
<contact>
<xsl:for-each select="attribute">
<!--Create variable to hold New Name after passing the Data Name to the Lookup Template-->
<xsl:variable name="newName">
<xsl:apply-templates select="$lookupDoc/attributes/attribute">
<xsl:with-param name="nameToMatch" select="name" />
</xsl:apply-templates>
</xsl:variable>
<!--Format Contact Element with New Name variable-->
<xsl:element name="{$newName}">
<xsl:value-of select="value"/>
</xsl:element>
</xsl:for-each>
</contact>
</xsl:template>
<!--Lookup Template-->
<xsl:template match="attributes/attribute">
<xsl:param name="nameToMatch" />
<xsl:value-of select='translate(translate(self::node()[name = $nameToMatch]/mappingname, "()*%$#@!~&lt;&gt;&apos;&amp;,.?[]=-+/\:1234567890", "")," ","")' />
</xsl:template>
</xsl:stylesheet>
Sample Lookup XML
<?xml version="1.0" encoding="utf-8" ?>
<attributes>
<attribute>
<name>text12</name>
<mappingname>ID</mappingname>
<datatype>Varchar2</datatype>
<size>30</size>
</attribute>
<attribute>
<name>text34</name>
<mappingname>Last Name</mappingname>
<datatype>Varchar2</datatype>
<size>30</size>
</attribute>
<attribute>
<name>date866</name>
<mappingname>DOB</mappingname>
<datatype>Date</datatype>
<size></size>
</attribute>
</attributes>
Transformed XML
<?xml version="1.0" encoding="utf-8" ?>
<contacts>
<contact>
<ID>B00085590</ID>
<LastName>Atomos</LastName>
<DOB>02/21/1991</DOB>
</contact>
<contact>
<ID>B00058478</ID>
<LastName>Balderas</LastName>
<DOB>11/24/1997</DOB>
</contact>
</contacts>
C#
XsltSettings settings = new XsltSettings(true, true);
XslCompiledTransform ContactsXslt = new XslCompiledTransform();
ContactsXslt.Load(@"C:\Projects\ContactFormat.xslt", settings, new XmlUrlResolver());
using (XmlReader r = XmlReader.Create(@"C:\Projects\Contacts.xml"))
using (XmlWriter w = XmlWriter.Create(@"C:\Projects\FormattedContacts.xml"))
{
    w.WriteStartElement("contacts");
    while (!r.EOF)
    {
        if (r.NodeType == XmlNodeType.Element && r.Name == "contact")
        {
            // ReadOuterXml() already moves past </contact>; calling Read() as well
            // would skip an immediately following <contact>
            using (XmlReader temp = XmlReader.Create(new StringReader(r.ReadOuterXml())))
            {
                ContactsXslt.Transform(temp, null, w);
            }
        }
        else
        {
            r.Read();
        }
    }
    w.WriteEndElement();
}
The approach I'm taking is transforming 1 node at a time to avoid an OutOfMemoryException. Should I be feeding larger chunks through to speed up the process? Or am I going about this all wrong?
I think you can simplify the XSLT code
<xsl:for-each select="attribute">
<!--Create variable to hold New Name after passing the Data Name to the Lookup Template-->
<xsl:variable name="newName">
<xsl:apply-templates select="$lookupDoc/attributes/attribute">
<xsl:with-param name="nameToMatch" select="name" />
</xsl:apply-templates>
</xsl:variable>
using the template
<xsl:template match="attributes/attribute">
<xsl:param name="nameToMatch" />
<xsl:value-of select='translate(translate(self::node()[name = $nameToMatch]/mappingname, "()*%$#@!~&lt;&gt;&apos;&amp;,.?[]=-+/\:1234567890", "")," ","")' />
</xsl:template>
to
<xsl:for-each select="attribute">
<!--Create variable to hold New Name after passing the Data Name to the Lookup Template-->
<xsl:variable name="newName">
<xsl:apply-templates select="$lookupDoc/attributes/attribute[name = current()/name]"/>
</xsl:variable>
with the template being simplified to
<xsl:template match="attributes/attribute">
<xsl:value-of select='translate(translate(mappingname, "()*%$#@!~&lt;&gt;&apos;&amp;,.?[]=-+/\:1234567890", "")," ","")' />
</xsl:template>
That is certainly a more concise and idiomatic XSLT way of expressing the approach; whether it improves performance is something you would have to test.
In general, to improve the performance of cross-references/lookups in XSLT, it is recommended to use a key, so you would declare
<xsl:key name="att-lookup" match="attributes/attribute" use="name"/>
and then use it as
<xsl:variable name="name" select="name"/>
<xsl:variable name="newName">
<!-- in XSLT 1 we need to change the context doc for the key lookup -->
<xsl:for-each select="$lookupDoc">
<xsl:apply-templates select="key('att-lookup', $name)"/>
</xsl:for-each>
</xsl:variable>
I think that would considerably speed up the lookup in a single transformation. However, as you combine XmlReader and XSLT to run the XSLT once for every element your XmlReader finds, I can't tell whether it helps a lot; you would need to try.
As pointed out in the XSLT 3 suggestion, I would also consider transforming the lookup file first, and only once, to avoid repeating all those translate calls that create proper XML element names. Either do that outside of the existing XSLT, or do it inside by using a variable and then exsl:node-set to convert the result tree fragment into a node-set. But in your case, as you run the XSLT repeatedly, I think it is probably better to transform the lookup document outside of the main XSLT first, to avoid doing all those translates again and again.
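Preparing the lookup outside the XSLT could look like the following minimal sketch in C#: it reads the lookup document once and applies the same character removal as the two nested translate() calls, producing a dictionary from attribute name to element name. LookupMap and Build are hypothetical names, and the removed-character list is assumed to match the stylesheet's:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LookupMap
{
    // Characters removed by the inner translate() call, plus the space removed
    // by the outer one (assumed to match the stylesheet's list).
    private const string Removed = "()*%$#@!~<>'&,.?[]=-+/\\:1234567890 ";

    // Reads the lookup document once and maps each <name> to an element name
    // derived from <mappingname>, e.g. "text34" -> "LastName".
    public static Dictionary<string, string> Build(XDocument lookupDoc)
    {
        return lookupDoc.Root
            .Elements("attribute")
            .ToDictionary(
                a => (string)a.Element("name"),
                a => new string(((string)a.Element("mappingname") ?? "")
                    .Where(ch => Removed.IndexOf(ch) < 0)
                    .ToArray()));
    }
}
```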
When reading huge XML files, always use XmlReader. I like using a combination of XmlReader and LINQ to XML. I also like using dictionaries. See the code below:
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
namespace ConsoleApplication1
{
class Program
{
const string FILENAME = @"c:\temp\test.xml";
static void Main(string[] args)
{
XmlReader reader = XmlReader.Create(FILENAME);
while (!reader.EOF)
{
if (reader.Name != "contact")
{
reader.ReadToFollowing("contact");
}
if (!reader.EOF)
{
XElement xContact = (XElement)XElement.ReadFrom(reader);
Contact newContact = new Contact();
Contact.contacts.Add(newContact);
newContact.attributes = xContact.Descendants("attribute")
.GroupBy(x => (string)x.Element("name"), y => (string)y.Element("value"))
.ToDictionary(x => x.Key, y => y.FirstOrDefault());
}
}
}
}
public class Contact
{
public static List<Contact> contacts = new List<Contact>();
public Dictionary<string, string> attributes { get; set; }
}
}
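The code above keeps every contact in a static list, which for a 2 GB input may itself run out of memory. A minimal sketch of the same XmlReader plus LINQ to XML combination that instead writes each renamed contact straight to an XmlWriter, replacing the per-contact XSLT call with a dictionary lookup (a swapped-in technique; ContactRenamer and the map parameter are hypothetical, the map could for example come from the lookup document):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class ContactRenamer
{
    // Streams <contacts> from input to output one <contact> at a time, so memory
    // use is bounded by the size of a single contact rather than the whole file.
    // 'map' translates names such as "text12" into element names such as "ID".
    public static void Rename(TextReader input, TextWriter output, IDictionary<string, string> map)
    {
        var settings = new XmlWriterSettings { Indent = true };
        using (XmlReader reader = XmlReader.Create(input))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("contacts");
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "contact")
                {
                    // ReadFrom leaves the reader on the node after </contact>,
                    // so no extra Read() here (it could skip the next contact).
                    var contact = (XElement)XNode.ReadFrom(reader);
                    writer.WriteStartElement("contact");
                    foreach (XElement attribute in contact.Elements("attribute"))
                    {
                        string name = (string)attribute.Element("name");
                        string newName;
                        // Keep the original name when the lookup has no entry.
                        if (!map.TryGetValue(name, out newName))
                        {
                            newName = name;
                        }
                        writer.WriteElementString(newName, (string)attribute.Element("value"));
                    }
                    writer.WriteEndElement();
                }
                else
                {
                    reader.Read();
                }
            }
            writer.WriteEndElement();
        }
    }
}
```

Whether this beats the XSLT-per-node approach depends on your data, so it is worth timing both on a representative slice of the file.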
As an alternative, you might want to look into solving the task with XSLT 3 and its streaming feature (https://www.w3.org/TR/xslt-30/#streaming-concepts), as there you can process the huge input file in a forward-only but declarative way; in the template for the attribute element you only need to ensure that you work with an intentionally created full copy of that element, to allow XPath navigation to its child elements. Additionally, I think it makes sense to read in the lookup document only once and to do the translate calls that create the proper element names only once. So the following is a streaming XSLT 3 solution, runnable with Saxon 9.8 EE, which transforms the lookup document into an XPath 3.1 map (https://www.w3.org/TR/xpath-31/#id-maps) and otherwise uses a streamable mode to process the large main input:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:map="http://www.w3.org/2005/xpath-functions/map"
exclude-result-prefixes="xs map"
version="3.0">
<!-- could of course load the document using select="document('lookup.xml')" instead of inlining it as done here just for the example and testing -->
<xsl:param name="lookup-doc">
<attributes>
<attribute>
<name>text12</name>
<mappingname>ID</mappingname>
<datatype>Varchar2</datatype>
<size>30</size>
</attribute>
<attribute>
<name>text34</name>
<mappingname>Last Name</mappingname>
<datatype>Varchar2</datatype>
<size>30</size>
</attribute>
<attribute>
<name>date866</name>
<mappingname>DOB</mappingname>
<datatype>Date</datatype>
<size></size>
</attribute>
</attributes>
</xsl:param>
<xsl:variable
name="lookup-map"
as="map(xs:string, xs:string)"
select="map:merge(
$lookup-doc/attributes/attribute
!
map {
string(name) : translate(translate(mappingname, '()*%$#@!~&lt;&gt;''&amp;,.?[]=-+/\:1234567890', ''), ' ','')
}
)"/>
<xsl:mode on-no-match="shallow-copy" streamable="yes"/>
<xsl:output method="xml" indent="yes"/>
<xsl:template match="contact/attribute">
<xsl:variable name="attribute-copy" select="copy-of()"/>
<xsl:element name="{$lookup-map($attribute-copy/name)}">
<xsl:value-of select="$attribute-copy/value"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
Online sample (there running with Saxon 9.8 HE which ignores the streaming and does normal XSLT processing) is at https://xsltfiddle.liberty-development.net/bFDb2Ct/1.
To run streaming XSLT 3 with Saxon 9.8 and C# you use http://saxonica.com/html/documentation/dotnetdoc/Saxon/Api/Xslt30Transformer.html and set up ApplyTemplates on an input Stream with your huge input XML (http://saxonica.com/html/documentation/dotnetdoc/Saxon/Api/Xslt30Transformer.html#ApplyTemplates(System.IO.Stream,Saxon.Api.XmlDestination)).
I am trying to transform one XML document into another using an XSLT transform.
When a complex node repeats, all nodes are transformed properly and things are fine.
If a simple-type node repeats, the same number of nodes is produced, but every node gets the value of the very first node.
This is part of the XML:
<GetDataResult xmlns="http://tempuri.org/">
<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays">0</string>
<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays">1</string>
<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays">2</string>
<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays">3</string>
</GetDataResult>
This is my XSLT snippet:
<response>
<xsl:for-each select="ns1:GetDataResponse/ns1:GetDataResult/ns2:string" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<item>
<xsl:value-of select="/ns1:GetDataResponse/ns1:GetDataResult/ns2:string/text()" />
</item>
</xsl:for-each>
</response>
I tried a few combinations of the XSLT for-each loop; however, the final result is as below. All the items have the value of the first repeating node:
<?xml version="1.0" encoding="utf-16"?>
<Fields>
<response>
<item>0</item>
<item>0</item>
<item>0</item>
<item>0</item>
</response>
</Fields>
This is the transformation code snippet:
XmlDocument xslDoc = new XmlDocument();
xslDoc.LoadXml(XsltCode);
// XslCompiledTransform replaces the obsolete System.Xml.Xsl.XslTransform
XslCompiledTransform xslTransform = new XslCompiledTransform();
StringWriter xmlResult = new StringWriter();
//Load XSL Transform Object
xslTransform.Load(xslDoc, XsltSettings.Default, new XmlUrlResolver());
//Load the xsl parameter if Any
XsltArgumentList xslArgs = new XsltArgumentList();
//Call the actual Transform method; errors now surface instead of being swallowed by an empty catch
xslTransform.Transform(xmlDoc, xslArgs, xmlResult);
string firstParse = xmlResult.ToString();
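Use a relative XPath to get the value of the current ns2:string in each iteration instead. The absolute path /ns1:GetDataResponse/ns1:GetDataResult/ns2:string/text() selects all of the string elements on every iteration, and in XSLT 1.0 xsl:value-of outputs only the string value of the first node of a node-set, which is why every item is 0: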
<xsl:for-each select="ns1:GetDataResponse/ns1:GetDataResult/ns2:string" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<item>
<xsl:value-of select="." />
</item>
</xsl:for-each>
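A minimal self-contained sketch to check the fix with XslCompiledTransform, binding ns1 and ns2 to the namespaces in the question's XML. The GetDataResponse wrapper element is an assumption, since the question only shows GetDataResult, and RelativeXPathDemo and Items are hypothetical names:

```csharp
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

public static class RelativeXPathDemo
{
    private const string Xslt = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:ns1='http://tempuri.org/'
    xmlns:ns2='http://schemas.microsoft.com/2003/10/Serialization/Arrays'
    exclude-result-prefixes='ns1 ns2'>
  <xsl:template match='/'>
    <response>
      <xsl:for-each select='ns1:GetDataResponse/ns1:GetDataResult/ns2:string'>
        <!-- '.' is the ns2:string element of the current iteration -->
        <item><xsl:value-of select='.'/></item>
      </xsl:for-each>
    </response>
  </xsl:template>
</xsl:stylesheet>";

    // Runs the stylesheet and returns the text of every <item> it produced.
    public static string[] Items(string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(new StringReader(Xslt)))
        {
            xslt.Load(xsltReader);
        }
        var result = new XDocument();
        using (XmlReader input = XmlReader.Create(new StringReader(inputXml)))
        using (XmlWriter writer = result.CreateWriter())
        {
            xslt.Transform(input, writer);
        }
        return result.Root.Elements("item").Select(e => e.Value).ToArray();
    }
}
```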
I'm working with C# .NET 3.5 and trying to convert a given XML document (XDocument) into an empty one (where XElement.IsEmpty would be true) containing no text values. I tried setting XElement.Value to String.Empty, but that results in <element></element>, which isn't what I need. I need it to be <element />. Can someone suggest how this can be done in .NET?
Below is an example input:
<Envelope>
<Body>
<Person>
<first>John</first>
<last>Smith</last>
<address>123</address>
</Person>
</Body>
</Envelope>
Expected output:
<Envelope>
<Body>
<Person>
<first />
<last />
<address />
</Person>
</Body>
</Envelope>
You can use the ReplaceWith() method to replace the desired elements with empty elements:
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

var xml = @"<Envelope>
<Body>
<Person>
<first>John</first>
<last>Smith</last>
<address>123</address>
</Person>
</Body>
</Envelope>";
var doc = XDocument.Parse(xml);
// ToList() so the nodes can be replaced safely while iterating
foreach (XElement propertyOfPerson in doc.XPathSelectElements("/Envelope/Body/Person/*").ToList())
{
    propertyOfPerson.ReplaceWith(new XElement(propertyOfPerson.Name.LocalName));
}
Console.WriteLine(doc.ToString());
Result:
In the interest of sharing: although I have accepted the answer above, I actually went with the approach below, using an XSLT to transform the XML into what I wanted:
//an XSLT which removes the values and strips the white space
const string xslMarkup = "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"> <xsl:output method=\"xml\" omit-xml-declaration=\"yes\" indent=\"no\"/> <xsl:strip-space elements=\"*\"/> <xsl:template match=\"@* | node()\"> <xsl:copy> <xsl:apply-templates select=\"@* | node()\"/> </xsl:copy> </xsl:template> <xsl:template match=\"*/text()\"/> </xsl:stylesheet>";
var transformedXml = new XDocument();
XNode xml = YOUR_XML_OBJECT_HERE;
using (var writer = transformedXml.CreateWriter())
{
// Load the XSLT
var xslt = new XslCompiledTransform();
xslt.Load(XmlReader.Create(new StringReader(xslMarkup)));
// Execute the transform and output the results to a writer.
xslt.Transform(xml.CreateReader(), writer);
}
return transformedXml.ToString(SaveOptions.DisableFormatting);
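Try creating the new XElements without a value. Don't pass "" as content: new XElement("Person", "") serializes as <Person></Person>, while an element with no content at all serializes as <Person />:
var xElement = new XElement("Envelope", new XElement("Body", new XElement("Person", new XElement("first"), new XElement("last"), new XElement("address"))));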
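A generalization of the ReplaceWith() answer that works on any document rather than only the children of Person: a minimal sketch that replaces every leaf element (one with no child elements) with a content-free copy of the same name and attributes. It uses only LINQ to XML calls available in .NET 3.5; XmlBlanker and BlankLeaves are hypothetical names:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XmlBlanker
{
    // Replaces every leaf element with an element of the same name and
    // attributes but no content, so it serializes as <name /> and IsEmpty is true.
    // Unlike setting Value = "", this leaves no (empty) text content behind.
    public static void BlankLeaves(XDocument doc)
    {
        // ToList() first: replacing nodes while enumerating the live query is unsafe.
        foreach (XElement leaf in doc.Descendants().Where(e => !e.HasElements).ToList())
        {
            leaf.ReplaceWith(new XElement(leaf.Name, leaf.Attributes()));
        }
    }
}
```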
I am trying to write a simple algorithm to read two XML files with exactly the same nodes and structure, but not necessarily the same data inside the child nodes, and not necessarily in the same order. How could I create a simple implementation that creates a third, temporary XML file containing the differential between the first two, using Microsoft's XML Diff DLL?
XML Diff on MSDN:
XML Diff and Patch Tool
XML Diff and Patch GUI Tool
Sample XML of the two different files to compare:
<?xml version="1.0" encoding="utf-8" ?>
<Stats Date="2011-01-01">
<Player Rank="1">
<Name>Sidney Crosby</Name>
<Team>PIT</Team>
<Pos>C</Pos>
<GP>39</GP>
<G>32</G>
<A>33</A>
<PlusMinus>20</PlusMinus>
<PIM>29</PIM>
</Player>
</Stats>
<?xml version="1.0" encoding="utf-8" ?>
<Stats Date="2011-01-10">
<Player Rank="1">
<Name>Sidney Crosby</Name>
<Team>PIT</Team>
<Pos>C</Pos>
<GP>42</GP>
<G>35</G>
<A>34</A>
<PlusMinus>22</PlusMinus>
<PIM>30</PIM>
</Player>
</Stats>
Wanted result (difference between the two):
<?xml version="1.0" encoding="utf-8" ?>
<Stats Date="2011-01-10">
<Player Rank="1">
<Name>Sidney Crosby</Name>
<Team>PIT</Team>
<Pos>C</Pos>
<GP>3</GP>
<G>3</G>
<A>1</A>
<PlusMinus>2</PlusMinus>
<PIM>1</PIM>
</Player>
</Stats>
In this case, I would probably use XSLT to convert the resulting XML "differential" file into a sorted HTML file, but I am not there yet. All I want to do is to display in the third XML file the difference of every numerical value of each node, starting from the "GP" child node.
C# code I have so far:
private void CompareXml(string file1, string file2)
{
    string diffFile = StatsFile.XmlDiffFilename;
    XmlDiff xmldiff = new XmlDiff(XmlDiffOptions.IgnoreChildOrder |
                                  XmlDiffOptions.IgnoreNamespaces |
                                  XmlDiffOptions.IgnorePrefixes);
    // file1 and file2 are file paths; the diffgram is written to diffFile
    using (FileStream fs = new FileStream(diffFile, FileMode.Create))
    using (XmlWriter diffGramWriter = XmlWriter.Create(fs))
    {
        bool bIdentical = xmldiff.Compare(file1, file2, false, diffGramWriter);
    }
    // cleaning up after we are done with the xml diff file
    File.Delete(diffFile);
}
That's what I have so far, but the result is garbage. Note that for each "Player" node, the first three children must NOT be compared. How can I implement this?
There are two immediate solutions:
Solution 1.
You can first apply a simple transform to the two documents that deletes the elements that should not be compared. Then compare the two resulting documents, exactly with your current code. Here is the transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Name|Team|Pos"/>
</xsl:stylesheet>
When this transformation is applied to the provided XML document:
<Stats Date="2011-01-01">
<Player Rank="1">
<Name>Sidney Crosby</Name>
<Team>PIT</Team>
<Pos>C</Pos>
<GP>39</GP>
<G>32</G>
<A>33</A>
<PlusMinus>20</PlusMinus>
<PIM>29</PIM>
<PP>10</PP>
<SH>1</SH>
<GW>3</GW>
<Shots>0</Shots>
<ShotPctg>154</ShotPctg>
<TOIPerGame>20.8</TOIPerGame>
<ShiftsPerGame>21:54</ShiftsPerGame>
<FOWinPctg>22.6</FOWinPctg>
</Player>
</Stats>
the wanted resulting document is produced:
<Stats Date="2011-01-01">
<Player Rank="1">
<GP>39</GP>
<G>32</G>
<A>33</A>
<PlusMinus>20</PlusMinus>
<PIM>29</PIM>
<PP>10</PP>
<SH>1</SH>
<GW>3</GW>
<Shots>0</Shots>
<ShotPctg>154</ShotPctg>
<TOIPerGame>20.8</TOIPerGame>
<ShiftsPerGame>21:54</ShiftsPerGame>
<FOWinPctg>22.6</FOWinPctg>
</Player>
</Stats>
Solution 2.
This is a complete XSLT 1.0 solution (for convenience only, the second XML document is embedded in the transformation code):
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vrtfDoc2">
<Stats Date="2011-01-01">
<Player Rank="2">
<Name>John Smith</Name>
<Team>NY</Team>
<Pos>D</Pos>
<GP>38</GP>
<G>32</G>
<A>33</A>
<PlusMinus>15</PlusMinus>
<PIM>29</PIM>
<PP>10</PP>
<SH>1</SH>
<GW>4</GW>
<Shots>0</Shots>
<ShotPctg>158</ShotPctg>
<TOIPerGame>20.8</TOIPerGame>
<ShiftsPerGame>21:54</ShiftsPerGame>
<FOWinPctg>22.6</FOWinPctg>
</Player>
</Stats>
</xsl:variable>
<xsl:variable name="vDoc2" select=
"document('')/*/xsl:variable[@name='vrtfDoc2']/*"/>
<xsl:template match="node()|@*" name="identity">
<xsl:param name="pDoc2"/>
<xsl:copy>
<xsl:apply-templates select="node()|@*">
<xsl:with-param name="pDoc2" select="$pDoc2"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:apply-templates select="*">
<xsl:with-param name="pDoc2" select="$vDoc2"/>
</xsl:apply-templates>
-----------------------
<xsl:apply-templates select="$vDoc2">
<xsl:with-param name="pDoc2" select="/*"/>
</xsl:apply-templates>
</xsl:template>
<xsl:template match="Player/*">
<xsl:param name="pDoc2"/>
<xsl:if test=
"not(. = $pDoc2/*/*[name()=name(current())])">
<xsl:call-template name="identity"/>
</xsl:if>
</xsl:template>
<xsl:template match="Name|Team|Pos" priority="20"/>
</xsl:stylesheet>
When this transformation is applied to the same first document as above, the correct diffgrams are produced:
<Stats Date="2011-01-01">
<Player Rank="1">
<GP>39</GP>
<PlusMinus>20</PlusMinus>
<GW>3</GW>
<ShotPctg>154</ShotPctg>
</Player>
</Stats>
-----------------------
<Stats xmlns:xsl="http://www.w3.org/1999/XSL/Transform" Date="2011-01-01">
<Player Rank="2">
<GP>38</GP>
<PlusMinus>15</PlusMinus>
<GW>4</GW>
<ShotPctg>158</ShotPctg>
</Player>
</Stats>
How this works:
1. The transformation is applied to the first document, passing the second document as a parameter.
2. This produces an XML document whose only leaf element nodes are the ones that have a different value from the corresponding leaf element nodes in the second document.
3. The same processing is performed as in 1. above, but this time on the second document, passing the first document as a parameter.
4. This produces a second diffgram: an XML document whose only leaf element nodes are the ones that have a different value from the corresponding leaf element nodes in the first document.
Okay... I finally opted for a pure C# solution to compare the two XML files, without using the XML Diff/Patch DLL and without even needing XSL transforms. I will need XSL transforms in the next step, though, to convert the XML into HTML for viewing purposes, but I have figured out an algorithm using nothing but System.Xml and System.Xml.XPath.
Here is my algorithm:
private void CompareXml(string file1, string file2)
{
// Load the documents
XmlDocument docXml1 = new XmlDocument();
docXml1.Load(file1);
XmlDocument docXml2 = new XmlDocument();
docXml2.Load(file2);
// Get a list of all player nodes
XmlNodeList nodes1 = docXml1.SelectNodes("/Stats/Player");
XmlNodeList nodes2 = docXml2.SelectNodes("/Stats/Player");
// Define a single node
XmlNode node1;
XmlNode node2;
// Get the root Xml element
XmlElement root1 = docXml1.DocumentElement;
XmlElement root2 = docXml2.DocumentElement;
// Get a list of all player names
XmlNodeList nameList1 = root1.GetElementsByTagName("Name");
XmlNodeList nameList2 = root2.GetElementsByTagName("Name");
// Get a list of all teams
XmlNodeList teamList1 = root1.GetElementsByTagName("Team");
XmlNodeList teamList2 = root2.GetElementsByTagName("Team");
// Create an XmlWriterSettings object with the correct options.
XmlWriter writer = null;
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.IndentChars = (" ");
settings.OmitXmlDeclaration = false;
// Create the XmlWriter object and write some content.
writer = XmlWriter.Create(StatsFile.XmlDiffFilename, settings);
writer.WriteStartElement("StatsDiff");
// The compare algorithm: for each player in the first file, look for the
// player with the same name and team in the second file
try
{
    for (int i = 0; i < nodes1.Count; i++)
    {
        for (int j = 0; j < nodes2.Count; j++)
        {
            // There is a match if the player name and team are the same in both lists
            // (the original while loop never advanced j when only the names matched)
            if (nameList1.Item(i).InnerText == nameList2.Item(j).InnerText &&
                teamList1.Item(i).InnerText == teamList2.Item(j).InnerText)
            {
                node1 = nodes1.Item(i);
                node2 = nodes2.Item(j);
                // Call to the calculator and Xml writer
                this.CalculateDifferential(node1, node2, writer);
                break;
            }
        }
    }
// end Xml document
writer.WriteEndElement();
writer.Flush();
}
finally
{
if (writer != null)
writer.Close();
}
}
XML Results:
<?xml version="1.0" encoding="utf-8"?>
<StatsDiff>
<Player Rank="1">
<Name>Sidney Crosby</Name>
<Team>PIT</Team>
<Pos>C</Pos>
<GP>0</GP>
<G>0</G>
<A>0</A>
<Points>0</Points>
<PlusMinus>0</PlusMinus>
<PIM>0</PIM>
<PP>0</PP>
<SH>0</SH>
<GW>0</GW>
<OT>0</OT>
<Shots>0</Shots>
<ShotPctg>0</ShotPctg>
<ShiftsPerGame>0</ShiftsPerGame>
<FOWinPctg>0</FOWinPctg>
</Player>
<Player Rank="2">
<Name>Steven Stamkos</Name>
<Team>TBL</Team>
<Pos>C</Pos>
<GP>1</GP>
<G>0</G>
<A>0</A>
<Points>0</Points>
<PlusMinus>0</PlusMinus>
<PIM>2</PIM>
<PP>0</PP>
<SH>0</SH>
<GW>0</GW>
<OT>0</OT>
<Shots>4</Shots>
<ShotPctg>-0,6000004</ShotPctg>
<ShiftsPerGame>-0,09999847</ShiftsPerGame>
<FOWinPctg>0,09999847</FOWinPctg>
</Player>
[...]
</StatsDiff>
I have left out the implementation of the CalculateDifferential() method; it is rather cryptic, but it is fast and efficient. This way I could obtain the wanted results using only the strict minimum of references, without having to use XSL.
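Since CalculateDifferential() is not shown, here is a hypothetical stand-in, sketched with LINQ to XML rather than the answer's XmlWriter: it copies the identifying Name/Team/Pos fields and writes newer minus older for every other child that parses as a number in both players. StatsDiff and Diff are made-up names; parsing as decimal with the invariant culture is a design choice that avoids float artifacts and comma decimals like the "-0,6000004" above, and non-numeric values such as "21:54" are simply skipped:

```csharp
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class StatsDiff
{
    // Identifying children that are copied as-is, never subtracted.
    private static readonly string[] KeyFields = { "Name", "Team", "Pos" };

    // Builds one <Player> diff element from an older and a newer <Player>.
    public static XElement Diff(XElement olderPlayer, XElement newerPlayer)
    {
        var result = new XElement("Player", newerPlayer.Attributes());
        foreach (XElement field in newerPlayer.Elements())
        {
            if (KeyFields.Contains(field.Name.LocalName))
            {
                result.Add(new XElement(field));
                continue;
            }
            XElement old = olderPlayer.Element(field.Name);
            decimal newValue, oldValue;
            if (old != null
                && decimal.TryParse(field.Value, NumberStyles.Number, CultureInfo.InvariantCulture, out newValue)
                && decimal.TryParse(old.Value, NumberStyles.Number, CultureInfo.InvariantCulture, out oldValue))
            {
                result.Add(new XElement(field.Name, newValue - oldValue));
            }
        }
        return result;
    }
}
```

Applied to the two Sidney Crosby samples from the question, this yields GP 3, G 3, A 1, PlusMinus 2 and PIM 1, matching the wanted result.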