Transform XML into multiple CSV using XSLT - c#

For example, I have an XML file with the following structure:
<?xml version="1.0" encoding="utf-8"?>
<MainItem>
<Field1>1</Field1>
<Field2>2</Field2>
<SubItem>
<SubField1>1</SubField1>
<SubField2>2</SubField2>
</SubItem>
<SubItem>
<SubField1>3</SubField1>
<SubField2>4</SubField2>
</SubItem>
</MainItem>
I know for sure that there is always exactly one MainItem in the XML file. At the same time, one MainItem may have multiple SubItem elements.
I want to be able to transform this XML into CSV using XSLT. Below is my current XSLT script:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:text>Field1,Field2</xsl:text>
<xsl:text>
</xsl:text>
<xsl:for-each select="MainItem">
<xsl:value-of select="Field1"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="Field2"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
This XSLT transforms the XML into the following CSV:
Field1,Field2
1,2
The question is how can I use XSLT to transform the above-mentioned XML into 2 CSV files - the first one for MainItem element, the second one for SubItem?
I'm using .NET XslCompiledTransform class to perform transformation.

This is doable with the Cinchoo ETL library (an open source ETL framework):
using (var reader = new ChoXmlReader("test.xml").WithXPath("MainItem")
.WithField("Field1")
.WithField("Field2")
)
{
using (var writer = new ChoCSVWriter("test.csv"))
writer.Write(reader);
}
Disclaimer: I'm the author of this library.
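Staying with XSLT 1.0 and XslCompiledTransform, one possible approach is to run the same stylesheet twice and let a parameter decide which CSV is emitted, since XSLT 1.0 has no standard instruction for writing several result documents. The following is only a sketch: the parameter name target, the SubField1,SubField2 header row, and the output file names main.csv and sub.csv are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class TwoCsvDemo
{
    // One stylesheet; the hypothetical 'target' parameter ('main' or 'sub')
    // selects which CSV is produced on a given run.
    const string Stylesheet = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='text'/>
  <xsl:param name='target' select=""'main'""/>
  <xsl:template match='/MainItem'>
    <xsl:choose>
      <xsl:when test=""$target = 'sub'"">
        <xsl:text>SubField1,SubField2&#10;</xsl:text>
        <xsl:for-each select='SubItem'>
          <xsl:value-of select='SubField1'/>
          <xsl:text>,</xsl:text>
          <xsl:value-of select='SubField2'/>
          <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>Field1,Field2&#10;</xsl:text>
        <xsl:value-of select='Field1'/>
        <xsl:text>,</xsl:text>
        <xsl:value-of select='Field2'/>
        <xsl:text>&#10;</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>";

    // Runs the stylesheet once and returns the CSV text for the given target.
    public static string ToCsv(string xml, string target)
    {
        var xslt = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        var args = new XsltArgumentList();
        args.AddParam("target", "", target);

        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            xslt.Transform(input, args, output);
            return output.ToString();
        }
    }

    public static void Main()
    {
        const string xml =
            "<MainItem><Field1>1</Field1><Field2>2</Field2>" +
            "<SubItem><SubField1>1</SubField1><SubField2>2</SubField2></SubItem>" +
            "<SubItem><SubField1>3</SubField1><SubField2>4</SubField2></SubItem>" +
            "</MainItem>";

        // Two runs over the same input, one output file per run.
        File.WriteAllText("main.csv", ToCsv(xml, "main"));
        File.WriteAllText("sub.csv", ToCsv(xml, "sub"));
    }
}
```

Running the transform twice parses the input twice, which is usually acceptable for small files; for large inputs, loading the XML once into an XPathDocument and reusing it for both runs may be worth considering.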

Related

XSLT invalid token results in invalid XML document

I am using an XSLT file to transform an XML file to another XML file and then creating this XML file locally. I get this error:
System.InvalidOperationException: 'Token Text in state Start would result in an invalid XML document. Make sure that the ConformanceLevel setting is set to ConformanceLevel.Fragment or ConformanceLevel.Auto if you want to write an XML fragment.'
The XSLT file was debugged in Visual Studio and it looks like it works correctly, but I don't understand this error. What does it mean and how can it be fixed?
This is my XML:
<?xml version="1.0" encoding="utf-8"?>
<In xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="take.xsd">
<Submit ID="1234">
<Values>
<Code>34</Code>
<Source>27</Source>
</Values>
<Information>
<Number>55</Number>
<Date>2018-05-20</Date>
<IsFile>1</IsFile>
<Location></Location>
<Files>
<File>
<Name>Red.pdf</Name>
<Type>COLOR</Type>
</File>
<File>
<Name>picture.pdf</Name>
<Type>IMAGE</Type>
</File>
</Files>
</Information>
</Submit>
</In>
My XSLT code:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
<xsl:output method="xml" indent="yes"/>
<!-- identity template - copies all elements and its children and attributes -->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*" />
</xsl:copy>
</xsl:template>
<xsl:template match="/In">
<!-- Remove the 'In' element -->
<xsl:apply-templates select="node()"/>
</xsl:template>
<xsl:template match="Submit">
<!-- Create the 'Q' element and its sub-elements -->
<Q xmlns:tns="Q" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://schema.xsd" Source="{Values/Source}" Notification="true">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates select="Values" />
<xsl:apply-templates select="Information" />
<xsl:apply-templates select="Information/Files" />
</xsl:copy>
</Q>
</xsl:template>
<xsl:template match="Information">
<!-- Create the 'Data' sub-element without all of its children -->
<xsl:copy>
<xsl:copy-of select="Number"/>
<xsl:copy-of select="Date"/>
<xsl:copy-of select="IsFile"/>
<xsl:copy-of select="Location"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
And this is the C# code used to transform the file:
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(@"D:\\Main\XLSTFiles\Test.xslt");
string xmlPath = @"D:\Documents\Test2.xml";
using (XmlWriter w = XmlWriter.Create(@"D:\Documents\NewFile.xml"))
{
xslt.Transform(xmlPath, w);
}
Also, is there a way to produce the new XML file with proper indentation? It seems to create each node after the last one is closed and on the custom template it just appends each item one after another.
It's an amazingly unhelpful message, isn't it? But I think I can decipher it for you.
The XSLT processor is producing its output by writing events such as start-document, start-element, output-text to an XML Writer.
If you want to produce a well-formed XML document, then you can't have any text before the start of the first element. The message is saying that if the last thing you did is to issue start-document, then the next thing isn't allowed to be text, because the document would be ill-formed (it says invalid, but it means ill-formed).
Now, XSLT stylesheets are allowed to produce "well-formed fragments" rather than only being allowed to write "well-formed documents". Actually, the term used in the XML spec is "well-formed external general parsed entity", but that's a bit of a mouthful, so everyone calls them "fragments" because that's what DOM calls them, and there's no point using correct terminology in error messages if no-one understands it. The difference is that a fragment can contain multiple elements and text nodes at the top level, for example this <b>really</b> is a <i>well-formed</i> fragment. The problem is that the destination to which you write the XSLT output might not handle fragments, and in this particular case, the XML Writer can handle a fragment only if it's configured to do so.
I suspect you didn't actually intend to produce a fragment, and you need to fix your XSLT code so it outputs a well-formed document.
To expand on Michael Kay's excellent answer (as this was too long to write in comments), for your particular input XML the issue is with whitespace. In the template matching /In you do this...
<xsl:template match="/In">
<!-- Remove the 'In' element -->
<xsl:apply-templates select="node()"/>
</xsl:template>
But by selecting node() you are selecting the whitespace nodes before and after the child Submit, so you end up with a text node before your root Q element causing the error.
So, what you could do in this case, is simply strip out the whitespace from your XML by adding this to your XSLT
<xsl:strip-space elements="*" />
Alternatively, you could also do this to select only elements, as opposed to other nodes (although this would omit comments and processing instructions)
<xsl:apply-templates select="*" />
However, if you have multiple Submit elements in your XML, you then get multiple Q elements in your output, which will be a fragment, as there would no longer be a single root element. If this is what you really intend, then you should make the following change to your C#...
using (XmlWriter w = XmlWriter.Create(@"C:\Users\tcase.BGT\Documents\NewFile.xml", xslt.OutputSettings))
The ConformanceLevel of xslt.OutputSettings is ConformanceLevel.Auto, which I think allows fragments. Adding this will also solve your indentation problem, as it will use the settings in your xsl:output.
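To make the two suggestions concrete, here is a small self-contained sketch that combines selecting only elements with passing xslt.OutputSettings to the writer. The stylesheet is a cut-down stand-in for the one in the question (it simply wraps each Submit in a Q element), and the class and method names are made up for the example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

public static class OutputSettingsDemo
{
    // Simplified stand-in stylesheet: drops the In wrapper and wraps
    // every Submit element in a Q element.
    const string Stylesheet = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='xml' indent='yes'/>
  <xsl:strip-space elements='*'/>
  <xsl:template match='/In'>
    <xsl:apply-templates select='*'/>
  </xsl:template>
  <xsl:template match='Submit'>
    <Q><xsl:copy-of select='.'/></Q>
  </xsl:template>
</xsl:stylesheet>";

    public static string Transform(string xml)
    {
        var xslt = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        var sb = new StringBuilder();
        using (var input = XmlReader.Create(new StringReader(xml)))
        // OutputSettings carries the xsl:output choices (such as indent)
        // over to the writer that receives the result.
        using (var writer = XmlWriter.Create(sb, xslt.OutputSettings))
        {
            xslt.Transform(input, writer);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Two Submit elements, so the result has two top-level Q elements.
        Console.WriteLine(Transform(
            "<In>\n  <Submit ID='1'/>\n  <Submit ID='2'/>\n</In>"));
    }
}
```

If the output is meant to be a single well-formed document, it is probably better to wrap the Q elements in one root element in the stylesheet than to rely on the writer accepting a fragment.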

Saxon XSLT: Serializer producing weird indents

I'm using Saxon HE 9.5.1.8 to transform an XML to another XML file.
My problem is that the XML content written by the Serializer() class of Saxon prints out several additional indents that I don't want to have in there. I'm assuming that this is "wrong" because I got the expected output when using the DomDestination() class (but then the outer XML document information is missing) or other XSL transformers like the one that is shipped with Visual Studio / .NET Framework.
This is the input XML:
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>$44.95</price>
<publish_date>2000-10-01</publish_date>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>$5.95</price>
<publish_date>2000-12-16</publish_date>
</book>
</catalog>
This is the XSLT file:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"
>
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="book">
<book>
<xsl:copy-of select="@*|book/@*" />
<xsl:for-each select="*">
<xsl:attribute name="{name()}">
<xsl:value-of select="text()"/>
</xsl:attribute>
</xsl:for-each>
</book>
</xsl:template>
</xsl:stylesheet>
That is the expected output:
<?xml version="1.0" encoding="utf-8"?>
<catalog>
<book id="bk101" author="Gambardella, Matthew" title="XML Developer's Guide" genre="Computer" price="$44.95" publish_date="2000-10-01" />
<book id="bk102" author="Ralls, Kim" title="Midnight Rain" genre="Fantasy" price="$5.95" publish_date="2000-12-16" />
</catalog>
And that is the output when using Saxon:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book id="bk101"
author="Gambardella, Matthew"
title="XML Developer's Guide"
genre="Computer"
price="$44.95"
publish_date="2000-10-01"/>
<book id="bk102"
author="Ralls, Kim"
title="Midnight Rain"
genre="Fantasy"
price="$5.95"
publish_date="2000-12-16"/>
</catalog>
Does anybody know how to suppress or modify this behavior of Saxon? That is the C# code that is used to call the Saxon API:
public Stream Transform(string xmlFilePath, string xsltFilePath)
{
var result = new MemoryStream();
var xslt = new FileInfo(xsltFilePath);
var input = new FileInfo(xmlFilePath);
var processor = new Processor();
var compiler = processor.NewXsltCompiler();
var executable = compiler.Compile(new Uri(xslt.FullName));
var destination = new Serializer();
destination.SetOutputStream(result);
using(var inputStream = input.OpenRead())
{
var transformer = executable.Load();
transformer.SetInputStream(inputStream, new Uri(input.DirectoryName));
transformer.Run(destination);
}
result.Position = 0;
return result;
}
Try setting saxon:line-length (http://saxonica.com/documentation9.5/extensions/output-extras/line-length.html) to a very large value so that attributes are not wrapped onto new lines: <xsl:output xmlns:saxon="http://saxon.sf.net/" saxon:line-length="1000"/>.
Your goal of having multiple processors produce output in the same format is hopelessly misguided. That's especially so if you choose indented output: the spec leaves it entirely to implementations how to do indentation, saying only that the goal is to make it human-readable. (And placing constraints on where extra whitespace can be inserted.)
I'm sorry you don't find Saxon's way of wrapping long attribute lists pleasing, but it is entirely within the letter and the spirit of the specification. Without it, if you have an element with eight namespace declarations, you can easily get a line that is 400 characters long, which I certainly don't regard as human-readable.
There are many reasons that comparing two XML documents lexically is never going to work. For example, the attributes can be in a different order. There are two ways of comparing XML: convert the documents into canonical form using a "Canonical XML" processor, or compare them at the tree level for example by using the XPath 2.0 deep-equal() function. Ideally (especially if you want to know where the differences are, rather than just whether differences exist), use a specialist XML comparison tool such as DeltaXML.
For what it's worth, when we do unit testing, we first attempt a lexical comparison of the results. If that fails, we parse both documents and compare them using saxon:deep-equal(), which is a modified form of the deep-equal() function that gives fine control over the comparison rules, e.g. handling of whitespace and handling of namespaces.
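For readers on .NET without Saxon at hand, a rough tree-level comparison can be sketched with LINQ to XML. This is neither Canonical XML nor the deep-equal() function mentioned above; it is a swapped-in approximation that parses both documents, sorts attributes by name (as far as I know, XNode.DeepEquals compares attributes in order), and then compares the trees. The class and method names are invented for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlTreeCompare
{
    // Rebuilds an element with its attributes sorted by name, recursively,
    // so that attribute order no longer affects the comparison.
    static XElement Normalize(XElement e)
    {
        return new XElement(e.Name,
            e.Attributes().OrderBy(a => a.Name.ToString(), StringComparer.Ordinal),
            e.Nodes().Select(n => n is XElement child ? (XNode)Normalize(child) : n));
    }

    // Parsing without LoadOptions.PreserveWhitespace drops the
    // insignificant whitespace that indentation adds between elements.
    public static bool TreeEquals(string xmlA, string xmlB)
    {
        return XNode.DeepEquals(
            Normalize(XDocument.Parse(xmlA).Root),
            Normalize(XDocument.Parse(xmlB).Root));
    }

    public static void Main()
    {
        string oneLine = "<catalog><book id=\"bk101\" genre=\"Computer\"/></catalog>";
        string wrapped = "<catalog>\n   <book genre=\"Computer\"\n         id=\"bk101\"/>\n</catalog>";
        Console.WriteLine(TreeEquals(oneLine, wrapped));
    }
}
```

A comparison like this only answers whether the two trees match; it does not report where they differ.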

How to get an absolute path to the directory of the XSL file?

My Schema.xsd file is located in the same directory as the .xsl file. In the .xsl file I would like to generate a link to Schema.xsd in the generated output. The generated output is located in different directories. Currently I do it like this:
<xsl:template match="/">
<root version="1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="../../../Schema.xsd">
<!-- . . . -->
However this forces the generated output to be located 3 levels under the directory of Schema.xsd. I would like to generate an absolute path to the schema in the output, so the output could be located anywhere.
Update. I use XSLT 1.0 (XslCompiledTransform implementation in .NET Framework 4.5).
XSLT 2.0 Solution
Use the XPath 2.0 function, resolve-uri():
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"
omit-xml-declaration="yes"
encoding="UTF-8"/>
<xsl:template match="/">
<root version="1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="{concat(resolve-uri('.'), 'Schema.xsd')}">
</root>
</xsl:template>
</xsl:stylesheet>
Yields, without parameter passing and regardless of the input XML:
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
version="1.0"
xsi:noNamespaceSchemaLocation="file:/c:/path/to/XSLT/file/Schema.xsd"/>
This is a sketch of how to do it (also see Passing parameters to XSLT Stylesheet via .NET).
In your C# code you need to define and use a parameter list:
XsltArgumentList argsList = new XsltArgumentList();
argsList.AddParam("SchemaLocation","","<SOME_PATH_TO_XSD_FILE>");
XslCompiledTransform transform = new XslCompiledTransform();
transform.Load("<SOME_PATH_TO_XSLT_FILE>");
using (StreamWriter sw = new StreamWriter("<SOME_PATH_TO_OUTPUT_XML>"))
{
transform.Transform("<SOME_PATH_TO_INPUT_XML>", argsList, sw);
}
Your XSLT could be enhanced like this:
...
<xsl:param name="SchemaLocation"/> <!-- this more or less at the top of your XSLT! -->
...
<xsl:template match="/">
<root version="1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="{$SchemaLocation}">
...
...
</xsl:template>
....
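As a possible way to fill in the schema path for the XSLT 1.0 case, the calling code can derive it from the stylesheet's own location and pass it in as the SchemaLocation parameter. The sketch below assumes the schema sits next to the stylesheet; the temporary file name demo.xslt and the helper SiblingUri are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class SchemaLocationDemo
{
    // Builds an absolute file: URI for a file located in the same
    // directory as the stylesheet.
    public static string SiblingUri(string xsltPath, string fileName)
    {
        string dir = Path.GetDirectoryName(Path.GetFullPath(xsltPath));
        return new Uri(Path.Combine(dir, fileName)).AbsoluteUri;
    }

    public static void Main()
    {
        // Write a tiny stylesheet to a temp file so the example is self-contained.
        string xsltPath = Path.Combine(Path.GetTempPath(), "demo.xslt");
        File.WriteAllText(xsltPath,
            @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:param name='SchemaLocation'/>
  <xsl:template match='/'>
    <root version='1.0' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
          xsi:noNamespaceSchemaLocation='{$SchemaLocation}'/>
  </xsl:template>
</xsl:stylesheet>");

        var xslt = new XslCompiledTransform();
        xslt.Load(xsltPath);

        var args = new XsltArgumentList();
        args.AddParam("SchemaLocation", "", SiblingUri(xsltPath, "Schema.xsd"));

        using (var input = XmlReader.Create(new StringReader("<anything/>")))
        using (var output = new StringWriter())
        {
            xslt.Transform(input, args, output);
            Console.WriteLine(output.ToString());
        }
    }
}
```

Because the URI is computed from the stylesheet path at run time, the generated output can be written to any directory without changing the stylesheet.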

Preserving whitespace within XML elements between attributes when using XslCompiledTransform

I am applying an XSL-T file xsltUri to an XML file TargetXmlFile using the XslCompiledTransform class:
XslCompiledTransform xslTransform = new XslCompiledTransform(false);
xslTransform.Load(xsltUri);
using (var outStream = new MemoryStream())
{
var writer = new StreamWriter(outStream, new UTF8Encoding());
using (var reader = new XmlTextReader(TargetXmlFileName)
{
WhitespaceHandling = WhitespaceHandling.All,
DtdProcessing = DtdProcessing.Ignore
})
{
xslTransform.Transform(reader, xsltArguments, writer);
}
outStream.Position = 0;
using (FileStream outFile = new FileStream(outputFileName, FileMode.Create))
{
outStream.CopyTo(outFile);
}
}
Input XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<element
id="1"
attr1="value11"
attr2="value12"/>
<element id="2" attr1="value21" attr2="value22"/>
</root>
Input XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="//element[@id='2']/@attr1">
<xsl:attribute name="attr1">
<xsl:value-of select="'newvalue21'"/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
Actual output XML:
<?xml version="1.0" encoding="utf-8"?><root>
<element id="1" attr1="value11" attr2="value12" />
<element id="2" attr1="newvalue21" attr2="value22" />
</root>
Desired output XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<element
id="1"
attr1="value11"
attr2="value12"/>
<element id="2" attr1="newvalue21" attr2="value22"/>
</root>
Question: How can I preserve the whitespace (particularly, line breaks) of the input XML file within the "element" tags in the output XML file? I have experimented with different options, but nothing worked for this case.
Thanks for any hints!
This has nothing to do with XSLT. The whitespace you're referring to does not exist in the XML document model, and it cannot be made significant to a conformant XML processor, even with xml:space="preserve". There is no place for it in the DOM, and it will be skipped by the reader; as such there is no way to copy it to the writer. You would have to emit the XML with custom code (in other words, not with an XmlWriter).
The internal formatting of a tag (whitespace between attributes) is completely ephemeral in XML.
1) As far as XML documents are concerned, it does not exist.
2) As far as XML parsers are concerned, it is ignored, because of 1). The only exception is that whitespace is illegal immediately after a <.
3) As far as XML serializers are concerned, they can do what they want, because of 1) and 2). Most (if not all) will use a single space character to separate attributes from each other.
So...
Don't try to build an application that depends on the source code layout of XML.
Since this kind of source code layout in XML is technically irrelevant… get over your OCD. ;)
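The point can be checked with a short round trip through LINQ to XML: even when the document is loaded with LoadOptions.PreserveWhitespace, the line breaks between attributes do not survive, because they never become part of the parsed tree. This is a minimal sketch with made-up class and method names.

```csharp
using System;
using System.Xml.Linq;

public static class AttributeWhitespaceDemo
{
    // Parses the XML while preserving all whitespace the parser reports,
    // then serializes it again without any re-formatting.
    public static string RoundTrip(string xml)
    {
        return XElement.Parse(xml, LoadOptions.PreserveWhitespace)
                       .ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        string input = "<element\n    id=\"1\"\n    attr1=\"value11\"\n    attr2=\"value12\"/>";
        // The line breaks inside the start tag are gone in the output.
        Console.WriteLine(RoundTrip(input));
    }
}
```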

Trying to generate objects for an XML file I have

I have an XML file, I need to extract values from it, and put them in another XML file.
Questions:
Another person is creating the "schema" for the resulting XML file. Is there something that person can give me that will automate the inserting of the values? Do I even need to extract anything from the XML, or can something like a XSLT just do all the transformation?
Is there a problem with the XML structure below? I tried using xsd2code to generate objects, but nothing loads when I use the LoadFromFile method. I read an article that wasn't very specific but said "nested parents" cause problems for XSD.exe and xsd2code.
<Section>
<Form id="1"...>
<Control id="12523"..> <!-- Some have this, some don't -->
<Property name="Color">Red</Property>
<Property name="Size">Large</Property>
</Control>
</Form>
<Form id="2"...>
<Property name="Color">Blue</Property>
<Property name="Size">Large</Property>
</Form>
<Form id="3"...>
<Property name="Color">Red</Property>
<Property name="Size">Small</Property>
</Form>
</Section>
Thank you for any guidance!
XSLT is the tool for XML transformation.
As far as your XML goes, in a lot of applications you should replace this:
<Property name="Color">Red</Property>
with:
<Color>Red</Color>
Some reasons:
If you want to write a schema that restricts an element's content in some way (e.g. to one of a list of values), the element must be identifiable by its name; you can't write one schema for a Property element whose name attribute equals "Color" and another schema for a Property element whose name attribute equals "Size".
It's easier to write XPath predicates if your element names are meaningful. For instance, Form[Color = 'Red'] is a lot easier to write (and read) than Form[Property[@name='Color' and .='Red']].
The above is also true if you're writing Linq queries against the XML, in pretty much the same manner. Compare Element.Descendants("Color") with Element.Descendants("Property").Where(x => (string)x.Attribute("name") == "Color").
There are applications where it's appropriate to have generically-named elements, too; the above argument's not definitive. But if you're going to do that, you should have good reasons.
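The difference in query effort can be seen side by side in a small LINQ to XML sketch. The sample data mirrors the three forms from the question in both shapes; the class and method names are invented for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class PropertyQueryDemo
{
    // Generic elements: the name lives in an attribute, so every query
    // has to filter on that attribute first.
    public static int CountRedGeneric(XElement section)
    {
        return section.Descendants("Property")
                      .Count(p => (string)p.Attribute("name") == "Color" && p.Value == "Red");
    }

    // Meaningful element names: the element name itself does the filtering.
    public static int CountRedNamed(XElement section)
    {
        return section.Descendants("Color").Count(c => c.Value == "Red");
    }

    public static void Main()
    {
        var generic = XElement.Parse(
            "<Section>" +
            "<Form id='1'><Control id='12523'><Property name='Color'>Red</Property></Control></Form>" +
            "<Form id='2'><Property name='Color'>Blue</Property></Form>" +
            "<Form id='3'><Property name='Color'>Red</Property></Form>" +
            "</Section>");
        var named = XElement.Parse(
            "<Section>" +
            "<Form id='1'><Control id='12523'><Color>Red</Color></Control></Form>" +
            "<Form id='2'><Color>Blue</Color></Form>" +
            "<Form id='3'><Color>Red</Color></Form>" +
            "</Section>");

        Console.WriteLine(CountRedGeneric(generic));
        Console.WriteLine(CountRedNamed(named));
    }
}
```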
XSLT is the best way to transform XML from one schema to another. That's exactly what it was built to do. http://w3schools.com/xsl/default.asp is an excellent XSLT tutorial. All you really need is the schema, or a few examples of his XML, to write your XSLT file.
Also, your xml looks fine/well-formed to me.
XSLT is probably the solution if you just want to transform it, but if you need to do anything with the values in code then LINQ to Xml will make your task much easier.
I'd use XSLT for this, here's a small example to get you started.
Copy this sample code into an empty C# project:
using System;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;
static void Main(string[] args) {
const string xmlPath = "source.xml";
const string xslPath = "transform.xsl";
const string outPath = "out.xml";
try {
//load the Xml doc
var xmlDoc = new XPathDocument(xmlPath);
//load the Xsl
var xslDoc = new XslCompiledTransform();
xslDoc.Load(xslPath);
// create the output file
using (var outDoc = new XmlTextWriter(outPath, null)) {
//do the actual transform of Xml
xslDoc.Transform(xmlDoc, null, outDoc);
}
}
catch (Exception e) {
Console.WriteLine("Exception: {0}", e.ToString());
}
}
Save your example XML above as source.xml and put the following XSL code into transform.xsl:
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output indent="yes" method="xml" />
<xsl:template match="/">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="Section">
<OtherSection>
<xsl:apply-templates />
</OtherSection>
</xsl:template>
<xsl:template match="Form">
<OtherForm>
<xsl:attribute name="id">
<xsl:value-of select="@id" />
</xsl:attribute>
<xsl:apply-templates />
</OtherForm>
</xsl:template>
<xsl:template match="Control">
<OtherControl>
<!-- converts id attribute to an id tag -->
<id>
<xsl:value-of select="@id" />
</id>
<xsl:apply-templates />
</OtherControl>
</xsl:template>
<xsl:template match="Property">
<OtherProperty>
<!-- converts name attribute to an id attribute -->
<xsl:attribute name="id">
<xsl:value-of select="@name" />
</xsl:attribute>
<xsl:value-of select="."/>
</OtherProperty>
</xsl:template>
</xsl:stylesheet>
The resulting out.xml should give you an idea of how the xsl is doing the work and hopefully get you started.
For more info on XSLT look up the tutorial on W3Schools.
