Transforming XML to valid HTML with XmlDocument - c#

I have XML and XSL that I want to transform into valid HTML. Both the source and the target are XmlDocuments.
I have this code:
public static XmlDocument XslTransformation(XslCompiledTransform xslt, XmlDocument input)
{
XmlDocument target = new XmlDocument();
using (var writer = XmlWriter.Create(target.CreateNavigator().AppendChild(), xslt.OutputSettings))
{
xslt.Transform(input, writer);
}
return target;
}
In XSL we set output method:
<xsl:output method="html" version="1.0" encoding="iso-8859-1" indent="yes" omit-xml-declaration="no"/>
But result is not valid HTML. For example
<script src="blah.js"></script>
is converted to
<script src="blah.js" />
I checked OutputMethod of XSLT OutputSettings and it is set to "Html".
I found many related questions with accepted answers, but I don't understand why I am still getting self-closing tags.

Related

XSLT to transform HTML to Markdown not working

I'm using the XSLT found here to transform content in HTML to Markdown format but the results I'm getting are plain text without the Markdown formatting syntax. Here's the function I'm using:
private static string ConvertToText()
{
string text = string.Empty;
XmlDocument xsl = new XmlDocument();
xsl.CreateEntityReference("nbsp");
xsl.Load(System.Web.HttpContext.Current.Server.MapPath("/Test/markdown.xslt"));
XmlReader xr = XmlReader.Create(System.Web.HttpContext.Current.Server.MapPath("/Test/html.xml"));
//creating stringwriter
StringWriter writer = new System.IO.StringWriter();
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(xsl);
xslt.Transform(xr, null, writer);
//return string
text = writer.ToString();
writer.Close();
return text;
}
Can anyone tell me why it's not working?
Thanks.
I guess your problem is the xmlns declaration in your input XML. Either remove it from the input before you transform it, or add a matching namespace declaration to your XSL file and use its prefix in the match patterns, like:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:html="http://www.w3.org/1999/xhtml">
...
<xsl:template match="html:h3">
...
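To see why the default namespace matters, here is a minimal sketch. The class name NamespaceDemo and the sample XHTML string are made up for illustration and are not from the question. An unprefixed XPath step such as h3 only matches elements in no namespace, so it finds nothing in a document that declares xmlns="http://www.w3.org/1999/xhtml". Binding a prefix to that namespace makes the match work, and XSLT 1.0 match patterns follow the same rule.

```csharp
using System.Xml;

public static class NamespaceDemo
{
    // Hypothetical sample input: an XHTML fragment with a default namespace.
    public const string Sample =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><h3>Title</h3></body></html>";

    // Counts h3 elements using an unprefixed XPath (matches no-namespace elements only).
    public static int CountUnprefixed(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes("//h3").Count;
    }

    // Counts h3 elements with the "html" prefix bound to the XHTML namespace.
    public static int CountPrefixed(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("html", "http://www.w3.org/1999/xhtml");
        return doc.SelectNodes("//html:h3", ns).Count;
    }
}
```

Calling NamespaceDemo.CountUnprefixed(NamespaceDemo.Sample) finds no nodes, while NamespaceDemo.CountPrefixed(NamespaceDemo.Sample) finds the single h3 element.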

Reading XML file and indenting

I've been having problems with the indentation of my XML files. Every time I load them from a certain server, the XML nodes are all jumbled up on a few lines. I want to write a quick application to indent the nodes properly. That is:
<name>Bob</name>
<age>24</age>
<address>
<stnum>2</stnum>
<street>herp derp st</street>
</address>
Currently it's coming out as:
<name>bob</name><age>24</age>
<address>
<stnum>2</stnum><street>herp derp st</street>
</address>
Since I can't touch the internal program that gives me these XML files, and re-indenting them by hand would take ages, I want to write a quick program to do this for me. When I use the XmlDocument classes, I only get the information in the nodes. So my question is: what's a good way to read the file and re-indent it? All XML nodes are the same.
Thanks.
You can use the XmlTextWriter class, specifically its Formatting property set to Formatting.Indented.
Here is some sample code I found in this blog post:
http://www.yetanotherchris.me/home/2009/9/9/formatting-xml-in-c.html
public static string FormatXml(string inputXml)
{
XmlDocument document = new XmlDocument();
document.Load(new StringReader(inputXml));
StringBuilder builder = new StringBuilder();
using (XmlTextWriter writer = new XmlTextWriter(new StringWriter(builder)))
{
writer.Formatting = Formatting.Indented;
document.Save(writer);
}
return builder.ToString();
}
With LINQ to XML, it's basically a one-liner:
public static string Reformat(string xml)
{
return XDocument.Parse(xml).ToString();
}
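One thing to keep in mind with this approach: XDocument.ToString() does not emit the XML declaration. Below is a small sketch of one way to keep it, assuming the input has a declaration you want to preserve. The class and method names are illustrative only.

```csharp
using System;
using System.Xml.Linq;

public static class XmlIndenter
{
    // Re-indents the XML and prepends the original declaration, if there was one.
    public static string ReformatWithDeclaration(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // ToString() indents the tree but leaves out the declaration.
        string body = doc.ToString();

        return doc.Declaration == null
            ? body
            : doc.Declaration + Environment.NewLine + body;
    }
}
```

If the declaration is not needed, the one-liner above is enough.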
Visual Studio or any decent XML editor will format (tabify) XML documents easily. There are also on-line tools available:
http://www.xmlformatter.net/
http://www.shell-tools.net/index.php?op=xml_format
If you are using Visual Studio, just open the XML file and press Ctrl+A, then Ctrl+K, Ctrl+F to format it.
You can also use XSLT:
// This XSLT copies everything, but indented.
// XslCompiledTransform replaces the obsolete XslTransform class.
StringReader sr = new StringReader(xsl);
XmlReader reader = XmlReader.Create(sr);
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(reader);
xslt.Transform(xmlFileUnidentedPath, xmlFileIdentedPath);
Having xsl defined as:
string xsl = @"<?xml version=""1.0""?>
<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
<xsl:output method=""xml"" omit-xml-declaration=""no"" indent=""yes"" encoding=""US-ASCII""/>
<xsl:strip-space elements=""*""/>
<xsl:template match=""/"">
<xsl:copy-of select="".""/>
</xsl:template>
</xsl:stylesheet>";

Generating XSL block in C#

I am trying to generate the following XSL block in my C# application. Can anyone tell me how to do it?
<XSL-Script xmlns:xsl="http://www.w3.org/......">
<xsl:value-of select="$VAR">
</XSL-Script>
I tried to use regular C# XML class, and it removes the xsl: from the tag name, because it thinks xsl: is the namespace. And it also doesn't allow to use "$" in front of VAR for attribute value of "select".
Thanks a lot.
Here is a simple C# program that "generates" a complete XSLT stylesheet and then performs this transformation on a "generated" XML document and outputs the result of the transformation to a file:
using System.IO;
using System.Xml;
using System.Xml.Xsl;
class testTransform
{
static void Main(string[] args)
{
string xslt =
@"<xsl:stylesheet version='1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:variable name='vX' select='1'/>
<xsl:template match='/'>
<xsl:value-of select='$vX'/>
</xsl:template>
</xsl:stylesheet>";
string xml = @"<t/>";
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xml);
XmlDocument xslDoc = new XmlDocument();
xslDoc.LoadXml(xslt);
XslCompiledTransform xslTrans = new XslCompiledTransform();
xslTrans.Load(xslDoc);
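// Dispose the writer so the output is flushed and the file is closed.
using (StreamWriter output = new StreamWriter("output.txt"))
{
xslTrans.Transform(xmlDoc, null, output);
}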
}
}
When this application is built and executed, it creates a file named "output.txt" whose contents are the expected, correct result of the dynamically generated XSLT transformation:
<?xml version="1.0" encoding="utf-8"?>1

Getting variable values out of an XSLT file

I wanted to find out if there is a way of getting a parameter or variable value out of an XSL file. For example, if I have the following:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:variable name="username" select ="usertest"/>
<xsl:variable name="password" select ="pass"/>
<!-- ... -->
</xsl:stylesheet>
I would like to read the username and password values from the XSL and use them for authentication. I am using ASP.Net and C# to perform the actual transform on an XML file.
Could someone please share code with me that would allow me to read the XSL variables from ASP.NET/C#. Thanks in advance for the help.
This is easy. XSL files are XML themselves, so you can treat them as such.
XmlDocument xslDoc = new XmlDocument();
xslDoc.Load("myfile.xsl");
XmlNamespaceManager nsMgr = new XmlNamespaceManager(xslDoc.NameTable);
nsMgr.AddNamespace("xsl", "http://www.w3.org/1999/XSL/Transform");
XmlNode usrNode = xslDoc.SelectSingleNode("/xsl:stylesheet/xsl:variable[@name='username']", nsMgr);
XmlNode pwdNode = xslDoc.SelectSingleNode("/xsl:stylesheet/xsl:variable[@name='password']", nsMgr);
string usr = usrNode.Attributes["select"].Value;
string pwd = pwdNode.Attributes["select"].Value;
Your question is (edit: was) missing the actual code, but from the description it appears that what you are looking for is XPath. XSL transforms one XML document into another XML document; you can then use XPath to query the resulting XML and get out the values that you want.
This Microsoft KB article has information about how to use XPath from C#:
http://support.microsoft.com/kb/308333
Thanks, everyone. Here is what finally worked:
Client (ASP with VBScript), used for testing purposes:
<%
' Create the HTTP request object
Set xmlhttp = CreateObject("Microsoft.XMLHTTP")
' Set up the object with the URL
'xmlhttp.open "POST" ,"http://localhost/ASP_Test/receiveXML.asp",False
' Create the DOM object
Set xmldom = CreateObject("Microsoft.XMLDOM")
xmldom.async = false
' Load the XSL to send over for the transform
xmldom.load(Server.MapPath("/ASP_Test/masterdata/test.xsl"))
' Send the transform file as a DOM object
xmlhttp.send xmldom
%>
//////////////////////////////////////////////////////////////////////////
On the server side (ASPX with C#), which accepts the XSLT and processes the transform:
//file path for data xml
String xmlFile = ("\\masterdata\\test.xml");
//file path for transformed xml
String xmlFile2 = ("\\masterdata\\out.xml");
XmlTextReader reader = new XmlTextReader(Request.InputStream);
Transform(xmlFile, reader, xmlFile2);
public static string Transform(string sXmlPath, XmlTextReader xslFileReader, string outFile)
{
try
{
//load the Xml doc
XPathDocument myXPathDoc = new XPathDocument(sXmlPath);
XslCompiledTransform myXslTrans = new XslCompiledTransform();
//load the Xsl
myXslTrans.Load(xslFileReader);
//create the output stream
XmlTextWriter myWriter = new XmlTextWriter
(outFile, null);
//do the actual transform of Xml
myXslTrans.Transform(myXPathDoc, null, myWriter);
myWriter.Close();
return "Done";
}
catch (Exception e)
{
return e.Message;
}
}

What's the proper way to transform with XSL without HTML encoding my final output?

So, I am working with .NET. I have an XSL file and an XslTransform object in C# that reads in the XSL file and transforms a piece of XML data (manufactured in-house) into HTML.
I notice that my final output has < and > automatically encoded into &lt; and &gt;. Is there any way I can prevent that from happening? Sometimes I need to bold or italicize my text, but the markup is being unintentionally sanitized.
Your xsl file should have:
an output of html
omit namespaces for all used in the xslt
i.e.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
exclude-result-prefixes="xsl msxsl">
<xsl:output method="html" indent="no" omit-xml-declaration="yes"/>
<!-- lots -->
</xsl:stylesheet>
And you should ideally use the overloads that accept either a TextWriter or a Stream (not XmlWriter) - i.e. something like:
StringBuilder sb = new StringBuilder();
using (XmlReader reader = XmlReader.Create(source))
using (TextWriter writer = new StringWriter(sb))
{
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load("Foo.xslt"); // in reality, you'd want to cache this
xslt.Transform(reader, options.XsltOptions, writer);
}
string html = sb.ToString();
In the xslt, if you really want standalone < / > (i.e. you want it to be malformed for some reason), then you need to disable output escaping:
<xsl:text disable-output-escaping="yes">
Your malformed text here
</xsl:text>
However, in general it is correct to escape the characters.
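The difference between escaped and unescaped output can be sketched as follows. This is only an illustration under stated assumptions: the class name EscapingDemo, the inline stylesheet, and the msg element are hypothetical and not from the question. It uses the TextWriter overload of Transform, since disable-output-escaping is only honored when the processor serializes the result itself rather than writing to an XmlWriter.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class EscapingDemo
{
    // Hypothetical stylesheet: emits the text of <msg> twice,
    // once escaped (the default) and once with escaping disabled.
    private const string Xslt = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='html' indent='no'/>
  <xsl:template match='/msg'>
    <p><xsl:value-of select='.'/></p>
    <p><xsl:value-of select='.' disable-output-escaping='yes'/></p>
  </xsl:template>
</xsl:stylesheet>";

    public static string Run(string inputXml)
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(Xslt)))
        {
            transform.Load(xsltReader);
        }

        using (var input = XmlReader.Create(new StringReader(inputXml)))
        using (var output = new StringWriter())
        {
            // TextWriter overload: xsl:output and disable-output-escaping apply.
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

For an input such as `<msg>&lt;b&gt;bold&lt;/b&gt;</msg>`, the first paragraph should contain the escaped text and the second the raw `<b>bold</b>` markup.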
I have used this in the past to transform XML documents into HTML strings, which is what you need.
public static string TransformXMLDocFromFileHTMLString(string orgXML, string transformFilePath)
{
System.Xml.XmlDocument orgDoc = new System.Xml.XmlDocument();
orgDoc.LoadXml(orgXML);
XmlNode transNode = orgDoc.SelectSingleNode("/");
System.Text.StringBuilder sb = new System.Text.StringBuilder();
XmlWriterSettings settings = new XmlWriterSettings();
settings.ConformanceLevel = ConformanceLevel.Auto;
System.Xml.Xsl.XslCompiledTransform trans = new System.Xml.Xsl.XslCompiledTransform();
trans.Load(transformFilePath);
// Dispose the writer before reading the StringBuilder so all output is flushed.
using (XmlWriter writer = XmlWriter.Create(sb, settings))
{
trans.Transform(transNode, writer);
}
return sb.ToString();
}
(XslTransform is deprecated, according to MSDN. They recommend you switch to XslCompiledTransform.)
Can you post an example of the input/output?
