I have the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
<open>1</open>
<Placemark>
<name>L14A</name>
<description>ID:01F40BF0
PLACEMENT:Home Woods
RSSI:-82
</description>
<Style>
<IconStyle>
<Icon>
<href>http://chart.apis.google.com/chart?chst=d_map_pin_letter&amp;chld=3|0000CC|FFFFFF</href>
</Icon>
</IconStyle>
</Style>
<Point>
<coordinates>-73.16551208,44.71051217,0</coordinates>
</Point>
</Placemark>
</Document>
</kml>
The real file is bigger, but this sample represents its structure. I'm trying to remove the <Style> element, but I can't find a way to get it right.
I have tried the approach from "How to remove an element from an xml using Xdocument when we have multiple elements with same name but different attributes".
The code is:
XDocument xdoc = XDocument.Load("kkk.kml");
xdoc.Descendants("Style").Remove();
xdoc.Save("kkk-mod.kml");
The Descendants collection is always empty.
Also, when I save the file, a "kml:" prefix is added to each of my elements (see below).
<kml:Placemark>
<kml:name>L14A</kml:name>
<kml:description>ID:01F40BF0
</kml:description>
<kml:Point>
<kml:coordinates>-73.200,44.500,0</kml:coordinates>
</kml:Point>
</kml:Placemark>
How can I get both of these right?
1. The removal of the <Style> element.
2. The "kml:" prefix added to the elements in the saved file.
You need to include the namespace in order to access the node. Based on the sample XML you posted, the namespace is http://www.opengis.net/kml/2.2, so something like this should get you going:
XDocument xdoc = XDocument.Load("kkk.kml");
XNamespace ns = "http://www.opengis.net/kml/2.2";
xdoc.Descendants(ns + "Style").Remove();
xdoc.Save("kkk-mod.kml");
If you want to remove the "kml" prefix from the modified document, you can use the following code snippet. This will remove all the namespaces from the document.
XDocument xdoc = XDocument.Load("kkk.kml");
XNamespace ns = "http://www.opengis.net/kml/2.2";
xdoc.Descendants(ns + "Style").Remove();
// Save the namespace-free copy; saving xdoc would write the original, still-namespaced tree.
XElement newDoc = RemoveAllNamespaces(xdoc.Root);
newDoc.Save("kkk-mod.kml");
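// Requires: using System.Linq; using System.Xml.Linq;
public static XElement RemoveAllNamespaces(XElement e)
{
    // Rebuild the element under its local name, recursing into child elements
    // and copying every attribute that is not a namespace declaration.
    return new XElement(e.Name.LocalName,
        (from n in e.Nodes()
         select ((n is XElement) ? RemoveAllNamespaces(n as XElement) : n)),
        (e.HasAttributes) ?
            (from a in e.Attributes()
             where (!a.IsNamespaceDeclaration)
             select new XAttribute(a.Name.LocalName, a.Value)) : null);
}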
Taken from this SO answer.
The resulting XML file looks like this:
<?xml version="1.0" encoding="utf-8"?>
<kml>
<Document>
<open>1</open>
<Placemark>
<name>L14A</name>
<description>ID:01F40BF0
PLACEMENT:Home Woods
RSSI:-82
</description>
<Point>
<coordinates>-73.16551208,44.71051217,0</coordinates>
</Point>
</Placemark>
</Document>
</kml>
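If the file has to remain valid KML, stripping every namespace may be more than is wanted. A minimal alternative sketch, assuming the "kml:" prefix only appears because the root declares the same namespace twice (once as the default and once as xmlns:kml): remove the redundant xmlns:kml declaration and keep the default one. The class and method names here are hypothetical.

```csharp
using System.Xml.Linq;

static class KmlCleaner
{
    // Removes every <Style> element and the duplicate xmlns:kml declaration,
    // leaving the default KML namespace in place.
    public static XDocument RemoveStyles(XDocument xdoc)
    {
        XNamespace ns = "http://www.opengis.net/kml/2.2";
        xdoc.Descendants(ns + "Style").Remove();

        // With only the default declaration left, elements are written without a prefix.
        xdoc.Root?.Attribute(XNamespace.Xmlns + "kml")?.Remove();
        return xdoc;
    }
}
```

Usage would look like KmlCleaner.RemoveStyles(XDocument.Load("kkk.kml")).Save("kkk-mod.kml");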
Of course, you can use a native language for XML restructuring called XSLT requiring no looping. As information, XSLT is a declarative, special-purpose programming language (same type as SQL) used to re-format, style, and re-structure XML documents for various end use needs. Practically all general purpose languages maintain XSLT processors including C#, Java, Python, PHP, Perl, and VB.
Below is a solution for future readers where the XSLT script runs an identity transform to copy entire document as is and then writes an empty template to the <Style> node, thereby removing it:
XSLT script (save as .xsl or .xslt file)
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns="http://www.opengis.net/kml/2.2"
xmlns:gx="http://www.google.com/kml/ext/2.2"
xmlns:kml="http://www.opengis.net/kml/2.2"
xmlns:atom="http://www.w3.org/2005/Atom">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>
<!-- Identity Transform -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- Empty Template for Style Element -->
<xsl:template match="kml:Style"/>
</xsl:transform>
C# Script (see tutorial)
using System;
using System.Xml;
using System.Xml.Xsl;
namespace XSLTransformation
{
class Class1
{
static void Main(string[] args)
{
// XslTransform is obsolete; XslCompiledTransform is its replacement.
XslCompiledTransform myXslTransform = new XslCompiledTransform();
myXslTransform.Load("XSLTScript.xsl");
myXslTransform.Transform("InputXML.xml", "OutputXML.xml");
}
}
}
Related
I'm using Saxon HE 9.5.1.8 to transform an XML to another XML file.
My problem is that the XML content written by the Serializer() class of Saxon prints out several additional indents that I don't want to have in there. I'm assuming that this is "wrong" because I got the expected output when using the DomDestination() class (but then the outer XML document information is missing) or other XSL transformers like the one that is shipped with Visual Studio / .NET Framework.
This is the input XML:
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>$44.95</price>
<publish_date>2000-10-01</publish_date>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>$5.95</price>
<publish_date>2000-12-16</publish_date>
</book>
</catalog>
This is the XSLT file:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"
>
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="book">
<book>
<xsl:copy-of select="@*|book/@*" />
<xsl:for-each select="*">
<xsl:attribute name="{name()}">
<xsl:value-of select="text()"/>
</xsl:attribute>
</xsl:for-each>
</book>
</xsl:template>
</xsl:stylesheet>
That is the expected output:
<?xml version="1.0" encoding="utf-8"?>
<catalog>
<book id="bk101" author="Gambardella, Matthew" title="XML Developer's Guide" genre="Computer" price="$44.95" publish_date="2000-10-01" />
<book id="bk102" author="Ralls, Kim" title="Midnight Rain" genre="Fantasy" price="$5.95" publish_date="2000-12-16" />
</catalog>
And that is the output when using Saxon:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book id="bk101"
author="Gambardella, Matthew"
title="XML Developer's Guide"
genre="Computer"
price="$44.95"
publish_date="2000-10-01"/>
<book id="bk102"
author="Ralls, Kim"
title="Midnight Rain"
genre="Fantasy"
price="$5.95"
publish_date="2000-12-16"/>
</catalog>
Does anybody know how to suppress or modify this behavior of Saxon? That is the C# code that is used to call the Saxon API:
public Stream Transform(string xmlFilePath, string xsltFilePath)
{
var result = new MemoryStream();
var xslt = new FileInfo(xsltFilePath);
var input = new FileInfo(xmlFilePath);
var processor = new Processor();
var compiler = processor.NewXsltCompiler();
var executable = compiler.Compile(new Uri(xslt.FullName));
var destination = new Serializer();
destination.SetOutputStream(result);
using(var inputStream = input.OpenRead())
{
var transformer = executable.Load();
transformer.SetInputStream(inputStream, new Uri(input.DirectoryName));
transformer.Run(destination);
}
result.Position = 0;
return result;
}
Try setting the saxon:line-length serialization attribute (http://saxonica.com/documentation9.5/extensions/output-extras/line-length.html) to a very large value so that attributes are not wrapped onto new lines: <xsl:output xmlns:saxon="http://saxon.sf.net/" saxon:line-length="1000"/>.
Your goal of having multiple processors produce output in the same format is hopelessly misguided. That's especially so if you choose indented output: the spec leaves it entirely to implementations how to do indentation, saying only that the goal is to make it human-readable. (And placing constraints on where extra whitespace can be inserted.)
I'm sorry you don't find Saxon's way of wrapping long attribute lists pleasing, but it is entirely within the letter and the spirit of the specification. Without it, if you have an element with eight namespace declarations, you can easily get a line that is 400 characters long, which I certainly don't regard as human-readable.
There are many reasons that comparing two XML documents lexically is never going to work. For example, the attributes can be in a different order. There are two ways of comparing XML: convert the documents into canonical form using a "Canonical XML" processor, or compare them at the tree level for example by using the XPath 2.0 deep-equal() function. Ideally (especially if you want to know where the differences are, rather than just whether differences exist), use a specialist XML comparison tool such as DeltaXML.
For what it's worth, when we do unit testing, we first attempt a lexical comparison of the results. If that fails, we parse both documents and compare them using saxon:deep-equal(), which is a modified form of the deep-equal() function that gives fine control over the comparison rules, e.g. handling of whitespace and handling of namespaces.
I am applying an XSL-T file xsltUri to an XML file TargetXmlFile using the XslCompiledTransform class:
XslCompiledTransform xslTransform = new XslCompiledTransform(false);
xslTransform.Load(xsltUri);
using (var outStream = new MemoryStream())
{
var writer = new StreamWriter(outStream, new UTF8Encoding());
using (var reader = new XmlTextReader(TargetXmlFileName)
{
WhitespaceHandling = WhitespaceHandling.All,
DtdProcessing = DtdProcessing.Ignore
})
{
xslTransform.Transform(reader, xsltArguments, writer);
}
outStream.Position = 0;
using (FileStream outFile = new FileStream(outputFileName, FileMode.Create))
{
outStream.CopyTo(outFile);
}
}
Input XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<element
id="1"
attr1="value11"
attr2="value12"/>
<element id="2" attr1="value21" attr2="value22"/>
</root>
Input XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="//element[@id='2']/@attr1">
<xsl:attribute name="attr1">
<xsl:value-of select="'newvalue21'"/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
Actual output XML:
<?xml version="1.0" encoding="utf-8"?><root>
<element id="1" attr1="value11" attr2="value12" />
<element id="2" attr1="newvalue21" attr2="value22" />
</root>
Desired output XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<element
id="1"
attr1="value11"
attr2="value12"/>
<element id="2" attr1="newvalue21" attr2="value22"/>
</root>
Question: How can I preserve the whitespace (particularly, line breaks) of the input XML file within the "element" tags in the output XML file? I have experimented with different options, but nothing worked for this case.
Thanks for any hints!
This has nothing to do with XSLT. The whitespace you're referring to does not exist in the XML document model, and it cannot be made significant to a conformant XML processor, even with xml:space="preserve". There is no place for it in the DOM, and it will be skipped by the reader; as such there is no way to copy it to the writer. You would have to emit the XML with custom code (in other words, not with an XmlWriter).
The internal formatting of a tag (whitespace between attributes) is completely ephemeral in XML.
1. As far as XML documents are concerned, it does not exist.
2. As far as XML parsers are concerned, it is ignored, because of 1). The only exception is that whitespace is illegal immediately after a <.
3. As far as XML serializers are concerned, they can do what they want, because of 1) and 2). Most (if not all) will use a single space character to separate attributes from each other.
So...
Don't try to build an application that depends on the source code layout of XML.
Since this kind of source code layout in XML is technically irrelevant… get over your OCD. ;)
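The point that whitespace inside a tag never reaches the document model can be checked with a small round trip. This is only a sketch using LINQ to XML rather than the XslCompiledTransform pipeline from the question; the class and method names are hypothetical.

```csharp
using System.Xml.Linq;

static class AttributeWhitespaceDemo
{
    // Parses a fragment and writes it back out without reformatting.
    public static string RoundTrip(string xml)
    {
        // PreserveWhitespace keeps whitespace text nodes between elements.
        // Line breaks between attributes are not nodes, so they are lost at parse time.
        XElement element = XElement.Parse(xml, LoadOptions.PreserveWhitespace);
        return element.ToString(SaveOptions.DisableFormatting);
    }
}
```

Passing in an element whose attributes are spread over several lines returns it on a single line, because the serializer has no record of the original layout.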
I am working in Visual Studio 2012 with C#.
I want to update the value of a node in an XSLT file.
The file abc.xslt looks like this:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs">
<xsl:output method="xml" encoding="UTF-8" indent="yes" />
<xsl:template match="/">
<DocumentElement>
<PositionMaster>
<Name>
<xsl:value-of select = "'Ryan'"/>
</Name>
</PositionMaster>
</DocumentElement>
</xsl:template>
</xsl:stylesheet>
The code I have written to modify this XSLT in C# is:
XmlDocument xslDoc = new XmlDocument();
xslDoc.Load("abc.xslt");
XmlNamespaceManager nsMgr = new XmlNamespaceManager(xslDoc.NameTable);
nsMgr.AddNamespace("xsl", "http://www.w3.org/1999/XSL/Transform");
I am looking to change the value of the Name field to David. What should I write next?
// SelectSingleNode returns an XmlNode, so cast it before calling SetAttribute.
XmlElement valueOf = xslDoc.SelectSingleNode("/xsl:stylesheet/xsl:template[@match = '/']/DocumentElement/PositionMaster/Name/xsl:value-of", nsMgr) as XmlElement;
if (valueOf != null)
{
valueOf.SetAttribute("select", "'David'");
xslDoc.Save("new.xslt");
}
else
{
// handle case here that element was not found
}
You seem to be going about this a very odd way. Why not just use a stylesheet parameter (a global xsl:param element)?
And if you do need to modify a source stylesheet, as you sometimes do, surely it makes more sense to use XSLT for the purpose?
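The stylesheet-parameter suggestion could look roughly like the sketch below. It assumes the stylesheet is changed to declare a global xsl:param (called name here, a hypothetical choice) with 'Ryan' as its default, and that the value is supplied at run time through XsltArgumentList. The stylesheet is inlined as a string and the input is a dummy document only to keep the sketch self-contained.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class ParamDemo
{
    // Same output structure as abc.xslt, but the name comes from a global parameter.
    const string Stylesheet = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:param name='name' select=""'Ryan'""/>
  <xsl:output method='xml' omit-xml-declaration='yes'/>
  <xsl:template match='/'>
    <DocumentElement><PositionMaster><Name><xsl:value-of select='$name'/></Name></PositionMaster></DocumentElement>
  </xsl:template>
</xsl:stylesheet>";

    public static string Run(string name)
    {
        var xslt = new XslCompiledTransform();
        using (var reader = XmlReader.Create(new StringReader(Stylesheet)))
            xslt.Load(reader);

        // Override the parameter's default value without touching the stylesheet file.
        var args = new XsltArgumentList();
        args.AddParam("name", "", name);

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader("<dummy/>")))
            xslt.Transform(input, args, output);
        return output.ToString();
    }
}
```

Calling ParamDemo.Run("David") produces a Name element containing David, and the stylesheet on disk never has to be rewritten.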
So, I've got a massive XML file and I want to remove all CDATA sections and replace the CDATA node contents with safe, html encoded text nodes.
Just stripping out the CDATA with a regex will of course break the parsing. Is there a LINQ or XmlDocument or XmlTextWriter technique to swap out the CDATA with encoded text?
I'm not too concerned with the final encoding quite yet, just how to replace the sections with the encoding of my choice.
Original Example
---
<COLLECTION type="presentation" autoplay="false">
<TITLE><![CDATA[Rights & Responsibilities]]></TITLE>
<ITEM id="2802725d-dbac-e011-bcd6-005056af18ff" presenterGender="male">
<TITLE><![CDATA[Watch the demo]]></TITLE>
<LINK><![CDATA[_assets/2302725d-dbac-e011-bcd6-005056af18ff/presentation/presentation-00000000.mp4]]></LINK>
</ITEM>
</COLLECTION>
---
Should become
<COLLECTION type="presentation" autoplay="false">
<TITLE>Rights &amp; Responsibilities</TITLE>
<ITEM id="2802725d-dbac-e011-bcd6-005056af18ff" presenterGender="male">
<TITLE>Watch the demo</TITLE>
<LINK>_assets/2302725d-dbac-e011-bcd6-005056af18ff/presentation/presentation-00000000.mp4</LINK>
</ITEM>
</COLLECTION>
I guess the ultimate goal is to move to JSON. I've tried this
XmlDocument doc = new XmlDocument();
doc.Load(Server.MapPath(@"~/somefile.xml"));
string jsonText = JsonConvert.SerializeXmlNode(doc);
But I end up with ugly nodes, i.e. "#cdata-section" keys. It would take way too many hours to have the front end redeveloped to accept this.
"COLLECTION":[{"@type":"whitepaper","TITLE":{"#cdata-section":"SUPPORTING DOCUMENTS"}},{"@type":"presentation","@autoplay":"false","TITLE":{"#cdata-section":"Demo Presentation"},"ITEM":{"@id":"2802725d-dbac-e011-bcd6-005056af18ff","@presenterGender":"male","TITLE":{"#cdata-section":"Watch the demo"},"LINK":{"#cdata-section":"_assets/2302725d-dbac-e011-bcd6-005056af18ff/presentation/presentation-00000000.mp4"}
Process the XML with a XSLT that just copies input to output - C# code:
XslCompiledTransform transform = new XslCompiledTransform();
transform.Load(@"c:\temp\id.xslt");
transform.Transform(@"c:\temp\cdata.xml", @"c:\temp\clean.xml");
id.xslt:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Using LINQ to XML, you can do it like this:
XDocument doc = …;
var cDataNodes = doc.DescendantNodes().OfType<XCData>().ToArray();
foreach (var cDataNode in cDataNodes)
cDataNode.ReplaceWith(new XText(cDataNode));
I think you can load the XML into an XmlDocument, then recursively process each XmlNode and look for XmlCDataSection nodes. Each XmlCDataSection node should be replaced with an XmlText node that has the same value.
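The recursive XmlDocument approach described above might be sketched like this. The class and method names are hypothetical, and the code only swaps the node type; the serializer takes care of escaping characters such as & when the document is saved.

```csharp
using System.Linq;
using System.Xml;

static class CDataCleaner
{
    // Replaces every CDATA section in the document with a plain text node.
    public static void ReplaceCData(XmlDocument doc)
    {
        Visit(doc, doc);
    }

    static void Visit(XmlDocument doc, XmlNode node)
    {
        // ChildNodes is a live list, so iterate over a snapshot while replacing.
        foreach (XmlNode child in node.ChildNodes.Cast<XmlNode>().ToList())
        {
            if (child is XmlCDataSection cdata)
                node.ReplaceChild(doc.CreateTextNode(cdata.Value), cdata);
            else
                Visit(doc, child);
        }
    }
}
```

After calling CDataCleaner.ReplaceCData(doc), the same doc can be passed on to whatever comes next, for example the JSON conversion from the question.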
I am doing an xslt transform inside my c# program. When I run the xslt on its own it outputs just fine, but when I run it from within my c# program it always leaves off the:
<?xml version="1.0" encoding="UTF-8"?>
At the top of the resulting xml document. My XSLT file looks like:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:hd="http://www.hotdocs.com/schemas/component_library/2009"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xml="http://www.w3.org/XML/1998/namespace">
<xsl:output method="xml" omit-xml-declaration="no" version="1.0" encoding="UTF-8"/>
<xsl:template match="/xsd:schema">
<hd:componentLibrary xmlns:hd="something" version="10">
</hd:componentLibrary>
</xsl:template>
<xsl:template match="text()" />
</xsl:stylesheet>
I am running the xslt in my c# program like this:
XPathDocument myXPathDoc = new XPathDocument(PathToXMLDocument);
XslCompiledTransform myXslTrans = new XslCompiledTransform();
myXslTrans.Load(PathToXSLTDocument);
XmlTextWriter myWriter = new XmlTextWriter(PathToOutputLocation, null);
myXslTrans.Transform(myXPathDoc,null,myWriter);
myWriter.Close();
I have tried the xslt document without the xsl:output line, but that does not seem to help.
How can I get the XML declaration (the <?xml ... ?> line) at the top of my output XML file?
Thanks
XmlTextWriter is a bit outdated. I recommend you switch to XmlWriter.Create.
Then you can specify OmitXmlDeclaration = false in the XmlWriterSettings.
If you use XmlWriter.Create() then you can pass an XmlWriterSettings instance as a parameter. The OmitXmlDeclaration member in the settings class controls whether or not the XML declaration is included.
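Putting the two answers together, a minimal sketch of the transform with XmlWriter.Create might look like this. The class name, method name, and parameter names are hypothetical stand-ins for the paths used in the question.

```csharp
using System.Text;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

static class DeclarationDemo
{
    public static void Transform(string xmlPath, string xsltPath, string outputPath)
    {
        var xslt = new XslCompiledTransform();
        xslt.Load(xsltPath);

        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = false,          // ask for the <?xml ... ?> line
            Encoding = new UTF8Encoding(false)   // UTF-8 without a byte order mark
        };

        // Disposing the writer flushes and closes the output file.
        using (XmlWriter writer = XmlWriter.Create(outputPath, settings))
        {
            xslt.Transform(new XPathDocument(xmlPath), null, writer);
        }
    }
}
```

When the writer is created by hand like this, its own settings decide the serialization. To follow the stylesheet's xsl:output element instead, the OutputSettings property of the loaded XslCompiledTransform can be passed to XmlWriter.Create.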