Saxon .Net transformation

Maybe this is a dumb question, but I'm a total newbie when it comes to XSL.
I'm using this command line to perform a transformation, which works great:
transform.exe -s:source.xml -xsl:rules.xsl -o:output.xml -xi:on
I'm trying to achieve the same result from C#, but the output file is empty. What is the equivalent of the "-xi" parameter?
Thanks.
Uri xslUri = new Uri(@"rules.xsl");
Uri inputUri = new Uri(@"source.xml");
Uri outputUri = new Uri(@"toc.hhc");
// Compile stylesheet
try
{
    Processor processor = new Processor();
    XdmNode input = processor.NewDocumentBuilder().Build(inputUri);
    XsltCompiler compiler = processor.NewXsltCompiler();
    XsltExecutable exec = compiler.Compile(xslUri);
    XsltTransformer transformer = exec.Load();
    transformer.InitialContextNode = input;
    // Create a serializer
    Serializer serializer = new Serializer();
    FileStream fs = new FileStream(outputUri.AbsolutePath, FileMode.Create, FileAccess.Write);
    serializer.SetOutputStream( fs );
    // Transform the source XML and write the result to the serializer's stream.
    transformer.Run( serializer );
}
catch( Exception e )
{
    Console.WriteLine( e.Message );
}

I tried to set proc.Implementation.setXIncludeAware(true); but it did not work for me with Saxon 9.6.0.7. Therefore I posted my result at https://saxonica.plan.io/boards/3/topics/6206 and Saxonica created the bug report https://saxonica.plan.io/issues/2488 which was fixed later on. So it seems you have to wait until the fix is included in a new release to be able to use XInclude in a .NET application.

I would need to test it to be 100% sure that it works, but I think you should be able to do
processor.SetProperty("http://saxon.sf.net/feature/xinclude-aware", "true");

Related

Weird character encoded characters (â€™) appearing from a feed

I've got a question regarding an XML feed and XSL transformation I'm doing. In a few parts of the outputted feed on an HTML page, I get weird characters (such as â€™) appearing on the page.
On another site (that I don't own) that's using the same feed, it isn't getting these characters.
Here's the code I'm using to grab and return the transformed content:
string xmlUrl = "http://feedurl.com/feed.xml";
string xmlData = new System.Net.WebClient().DownloadString(xmlUrl);
string xslUrl = "http://feedurl.com/transform.xsl";
XsltArgumentList xslArgs = new XsltArgumentList();
xslArgs.AddParam("type", "", "specifictype");
string resultText = Utils.XslTransform(xmlData, xslUrl, xslArgs);
return resultText;
And my Utils.XslTransform function looks like this:
static public string XslTransform(string data, string xslurl, XsltArgumentList xslArgs)
{
    TextReader textReader = new StringReader(data);
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.DtdProcessing = DtdProcessing.Ignore;
    XmlReader xmlReader = XmlReader.Create(textReader, settings);
    XmlReader xslReader = new XmlTextReader(Uri.UnescapeDataString(xslurl));
    XslCompiledTransform myXslT = new XslCompiledTransform();
    myXslT.Load(xslReader);
    StringBuilder sb = new StringBuilder();
    using (TextWriter tw = new StringWriter(sb))
    {
        // Pass the caller's arguments through instead of an empty list.
        myXslT.Transform(xmlReader, xslArgs, tw);
    }
    string transformedData = sb.ToString();
    return transformedData;
}
I'm not extremely knowledgeable with character encoding issues and I've been trying to nip this in the bud for a bit of time and could use any suggestions possible. I'm not sure if there's something I need to change with how the WebClient downloads the file or something going weird in the XslTransform.
Thanks!

Give HtmlEncode a try. So in this case you would reference System.Web and then make this change (just call the HtmlEncode function on the last line):
string xmlUrl = "http://feedurl.com/feed.xml";
string xmlData = new System.Net.WebClient().DownloadString(xmlUrl);
string xslUrl = "http://feedurl.com/transform.xsl";
XsltArgumentList xslArgs = new XsltArgumentList();
xslArgs.AddParam("type", "", "specifictype");
string resultText = Utils.XslTransform(xmlData, xslUrl, xslArgs);
return HttpUtility.HtmlEncode(resultText);

The character â is the telltale first character of a multibyte UTF-8 sequence (â€™ is a right single quotation mark, ’) that has been decoded with a single-byte encoding such as Windows-1252. So, I guess, you generate the HTML in UTF-8 while the browser interprets it otherwise. I see two ways to fix it:
1. The simplest solution would be to update the XSLT to include the HTML meta tag that hints the correct encoding to the browser: <meta charset="UTF-8">.
2. If your transform already defines a different encoding in a meta tag and you'd like to keep it, that encoding needs to be specified in the function that saves the XML as a file. I assume this function used ASCII by default in your example. If your XSLT was configured to generate XML files directly to disk, you could adjust it with the XSLT instruction <xsl:output encoding="ASCII"/>.
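To see where â€™ comes from, here is a minimal sketch that encodes a right single quotation mark as UTF-8 and then decodes the same bytes with a single-byte code page. Windows-1252 is an assumption here (the page in question may be using a different single-byte encoding), and the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encodes the text as UTF-8, then (wrongly) decodes the bytes as Windows-1252.
    public static string DecodeUtf8AsWindows1252(string text)
    {
        // Required on .NET Core / .NET 5+, where code page 1252 is not available by default.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);   // U+2019 becomes E2 80 99
        return Encoding.GetEncoding(1252).GetString(utf8Bytes);
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        // The three UTF-8 bytes of U+2019 show up as three separate characters.
        Console.WriteLine(DecodeUtf8AsWindows1252("\u2019"));
    }
}
```

The bytes never change; only the interpretation does, which is why declaring the right charset (or decoding with the right encoding) makes the stray characters disappear.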

To use WebClient.DownloadString you have to know which encoding the server is going to use and tell the WebClient in advance. It's a bit of a Catch-22.
But there is no need to do that. Use WebClient.DownloadData or WebClient.OpenRead and let an XML library figure out which encoding to use.
using (var web = new WebClient())
using (var stream = web.OpenRead("http://unicode.org/repos/cldr/trunk/common/supplemental/windowsZones.xml"))
using (var reader = XmlReader.Create(stream, new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse }))
{
reader.MoveToContent();
//… use reader as you will, including var doc = XDocument.ReadFrom(reader);
}
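As a rough illustration of why handing the raw bytes to the XML parser helps, the sketch below builds an ISO-8859-1 document in memory (standing in for the downloaded feed, which is an assumption) and lets XmlReader pick the encoding from the XML declaration. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class EncodingDetectionDemo
{
    // Parses raw bytes; XmlReader chooses the encoding from the BOM or XML declaration.
    public static string ReadRootText(byte[] xmlBytes)
    {
        using (var stream = new MemoryStream(xmlBytes))
        using (var reader = XmlReader.Create(stream))
        {
            return XDocument.Load(reader).Root.Value;
        }
    }

    // Stand-in for a feed served as ISO-8859-1 rather than UTF-8.
    public static byte[] SampleLatin1Document()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return latin1.GetBytes(
            "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><t>caf\u00E9</t>");
    }

    public static void Main()
    {
        byte[] bytes = SampleLatin1Document();

        // The parser honors the declared encoding.
        Console.WriteLine(ReadRootText(bytes) == "caf\u00E9");

        // Guessing UTF-8 up front, as a string download might, corrupts the text.
        Console.WriteLine(Encoding.UTF8.GetString(bytes).Contains("caf\u00E9"));
    }
}
```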

Read custom property value from XMP with itextsharp 5.5.9

I'm having difficulties in reading a specific custom property from the XMP section of a PDF file, using itextsharp v. 5.5.9.
When I try to use the XmpReader class, it gets marked as obsolete, and it does not contain any public method that seems to be useful for reading purposes.
I can convert the Metadata section to an XML, and then parse it in some way (a workaround consists in using XmpCore library that has convenient methods for reading properties by name) but I'm sure I'm missing something...
I think it should be possible to just access some properties with just one library.
PdfReader reader = new PdfReader(inFile);
PdfStamper stamper = new PdfStamper(reader, new FileStream(outFile, FileMode.Create));
MemoryStream ms = null;
if (reader.Metadata != null)
ms = new MemoryStream(reader.Metadata);
else
{
stamper.CreateXmpMetadata();
ms = new MemoryStream();
}
XmpWriter xw = new XmpWriter(ms);
xw.XmpMeta.GetPropertyString(XmpConst.NS_DC, "MyProperty"); // -> not found, but it's ok for the first time...
xw.SetProperty(XmpConst.NS_DC, "MyProperty", "MyValue"); // -> OK
xw.XmpMeta.GetPropertyString(XmpConst.NS_DC, "MyProperty"); // -> OK
xw.Close();
stamper.XmpMetadata = ms.ToArray();
stamper.Close();
reader.Close();
If I run the program on the same file twice (so the property is saved in the file) the property is still not found..
How can I read the presence and value of MyProperty?

I ended up with this solution.
It requires XmpCore library, but it's easy and fast to implement, avoiding the explicit management of many details, such as encodings:
string result = null;
iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(inFile);
if (reader.Metadata != null)
{
XmpCore.IXmpMeta meta = XmpCore.XmpMetaFactory.ParseFromBuffer(reader.Metadata);
result = meta.GetPropertyString(XmpConst.NS_DC, "MyProperty");
}
reader.Close();
return result;

If I understand correctly, you want to read a custom property from the PDF file's metadata.
If so, you can do it like this:
PdfReader reader = new PdfReader(inFile);
string myProperty = reader.Info.Where(x => x.Key == "MyProperty").Select(x => x.Value).FirstOrDefault();

Output Param from XSLT

I am familiar with passing input parameters to the XslCompiledTransform class, so that the processor makes them available to the xsl:param declarations in the XSL file.
Is there a way to get an output parameter (say, the value of a node or something else) back from the XSLT into a host language like C#?
XslCompiledTransform xslTransform = new XslCompiledTransform();
string strXmlOutput = string.Empty;
StringWriter swXmlOutput = null;
MemoryStream objMemoryStream = null;
XPathDocument xpathXmlOrig = new XPathDocument(string_xmlInput);
swXmlOutput = new StringWriter();
objMemoryStream = new MemoryStream();
XsltArgumentList xslArg = new XsltArgumentList();
xslArg.AddParam("TESTING", "", SomeVar);
XsltSettings xslsettings = new XsltSettings(false, true);
xslTransform.Load(string_xslInput, xslsettings, new XmlUrlResolver());
xslTransform.Transform(xpathXmlOrig, xslArg, objMemoryStream);
This code does output the transformed XML, but my question is: can we take just one value as an output parameter from the XSL transformation (the XSLT file)?
Something like this:
xslArg.OutputParam("testing"); //Something like this?
........
........
xslTransform.Transform(xpathXmlOrig, xslArg, objMemoryStream);
string outputparam = xslArg.GetParam("testing"); //ideal way of getting the param after the transformation!
Does XSLT provide scope for something like this?

In C#, your best bet is to put <xsl:message> instructions in your XSLT code, and hook into the XsltMessageEncountered event on the XsltArgumentList class.
You can check that it's giving you the correct output without hooking the event by watching the output of the app: in the absence of an event handler, messages are piped to standard output.
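A minimal sketch of the xsl:message approach follows. The stylesheet, the `order`/`id` input, and the class and method names are all made up for illustration; only XslCompiledTransform, XsltArgumentList and its XsltMessageEncountered event come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class MessageDemo
{
    // Hypothetical stylesheet: reports /order/@id to the host through xsl:message.
    private const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:template match='/'>" +
        "<xsl:message><xsl:value-of select='/order/@id'/></xsl:message>" +
        "<result/>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    // Runs the transform and returns every message the stylesheet emitted.
    public static List<string> CollectMessages(string inputXml)
    {
        var messages = new List<string>();
        var args = new XsltArgumentList();
        // Each xsl:message raises this event; e.Message carries its text.
        args.XsltMessageEncountered += (sender, e) => messages.Add(e.Message);

        var transform = new XslCompiledTransform();
        using (var xsl = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(xsl);
        }

        using (var input = XmlReader.Create(new StringReader(inputXml)))
        using (var output = new StringWriter())
        {
            // The regular transform output still goes to the writer as usual.
            transform.Transform(input, args, output);
        }
        return messages;
    }

    public static void Main()
    {
        foreach (string message in CollectMessages("<order id='42'/>"))
        {
            Console.WriteLine(message);
        }
    }
}
```

This treats messages as a side channel, so the value arrives as text and any structure has to be agreed on between the stylesheet and the host code.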

How to get Xml as string from XDocument?

I am new to LINQ to XML. After you have built XDocument, how do you get the OuterXml of it like you did with XmlDocument?

You only need to use the overridden ToString() method of the object:
XDocument xmlDoc ...
string xml = xmlDoc.ToString();
This works with all XObjects, like XElement, etc.

I don't know when this changed, but today (July 2017) when trying the answers out, I got
"System.Xml.XmlDocument"
Instead of ToString(), you can use the originally intended way accessing the XmlDocument content: writing the xml doc to a stream.
XmlDocument xml = ...;
string result;
using (StringWriter writer = new StringWriter())
{
xml.Save(writer);
result = writer.ToString();
}

Several responses give a slightly incorrect answer.
XDocument.ToString() omits the XML declaration (and, according to @Alex Gordon, may return invalid XML if it contains encoded unusual characters like &).
Saving XDocument to StringWriter will cause .NET to emit encoding="utf-16", which you most likely don't want (if you save XML as a string, it's probably because you want to later save it as a file, and de facto standard for saving files is UTF-8 - .NET saves text files as UTF-8 unless specified otherwise).
@Wolfgang Grinfeld's answer is heading in the right direction, but it's unnecessarily complex.
Use the following:
var memory = new MemoryStream();
xDocument.Save(memory);
string xmlText = Encoding.UTF8.GetString(memory.ToArray());
This will return XML text with UTF-8 declaration.
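The differences described in this answer can be checked with a short sketch. It also shows a commonly used alternative that is not part of the answer above: a StringWriter subclass (here named Utf8StringWriter, a name chosen for illustration) that reports UTF-8, so the declaration says utf-8 while the result stays an ordinary string.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

// StringWriter that reports UTF-8, so XDocument.Save writes encoding="utf-8".
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

public static class XDocumentStringDemo
{
    // Saves the document to the given writer and returns the text, declaration included.
    public static string SaveTo(StringWriter writer, XDocument doc)
    {
        using (writer)
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }

    public static XDocument Sample()
    {
        return new XDocument(new XElement("Root", new XElement("Leaf", "data")));
    }

    public static void Main()
    {
        XDocument doc = Sample();
        Console.WriteLine(doc.ToString());                      // no XML declaration
        Console.WriteLine(SaveTo(new StringWriter(), doc));     // declaration says utf-16
        Console.WriteLine(SaveTo(new Utf8StringWriter(), doc)); // declaration says utf-8
    }
}
```

Because the text never passes through a byte stream, this variant cannot pick up a byte order mark; the trade-off is an extra small class.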

Doing XDocument.ToString() may not get you the full XML.
In order to get the XML declaration at the start of the XML document as a string, use the XDocument.Save() method:
var ms = new MemoryStream();
using (var xw = XmlWriter.Create(new StreamWriter(ms, Encoding.GetEncoding("ISO-8859-1"))))
new XDocument(new XElement("Root", new XElement("Leaf", "data"))).Save(xw);
var myXml = Encoding.GetEncoding("ISO-8859-1").GetString(ms.ToArray());

Use ToString() to convert XDocument into a string:
string result = string.Empty;
XElement root = new XElement("xml",
new XElement("MsgType", "<![CDATA[" + "text" + "]]>"),
new XElement("Content", "<![CDATA[" + "Hi, this is Wilson Wu Testing for you! You can ask any question but no answer can be replied...." + "]]>"),
new XElement("FuncFlag", 0)
);
result = root.ToString();

While @wolfgang-grinfeld's answer is technically correct (it also produces the XML declaration, as opposed to just using the .ToString() method), the code generates a UTF-8 byte order mark (BOM), which the XDocument.Parse(string) method cannot process: it throws a "Data at the root level is invalid. Line 1, position 1." error.
So here is another solution without the BOM:
var utf8Encoding =
new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
using (var memory = new MemoryStream())
using (var writer = XmlWriter.Create(memory, new XmlWriterSettings
{
OmitXmlDeclaration = false,
Encoding = utf8Encoding
}))
{
CompanyDataXml.Save(writer);
writer.Flush();
return utf8Encoding.GetString(memory.ToArray());
}

I found this example in the Microsoft .NET 6 documentation for the XDocument.Save method. I think it answers the original question (what is the XDocument equivalent of XmlDocument.OuterXml), and it also addresses the concerns that others have already pointed out. By using XmlWriterSettings you can predictably control the string output.
https://learn.microsoft.com/en-us/dotnet/api/system.xml.linq.xdocument.save
StringBuilder sb = new StringBuilder();
XmlWriterSettings xws = new XmlWriterSettings();
xws.OmitXmlDeclaration = true;
xws.Indent = true;
using (XmlWriter xw = XmlWriter.Create(sb, xws)) {
XDocument doc = new XDocument(
new XElement("Child",
new XElement("GrandChild", "some content")
)
);
doc.Save(xw);
}
Console.WriteLine(sb.ToString());

Looking at these answers, I see a lot of unnecessary complexity and inefficiency in pursuit of generating the XML declaration automatically. But since the declaration is so simple, there isn't much value in generating it. Just KISS (keep it simple, stupid):
// Extension method
public static string ToStringWithDeclaration(this XDocument doc, string declaration = null)
{
declaration ??= "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n";
return declaration + doc.ToString();
}
// Usage
string xmlString = doc.ToStringWithDeclaration();
// Or
string xmlString = doc.ToStringWithDeclaration("...");
Using XmlWriter instead of ToString() can give you more control over how the output is formatted (such as if you want indentation), and it can write to other targets besides string.
The reason to target a memory stream is performance. It lets you skip the step of storing the XML in a string (since you know the data must end up in a different encoding eventually, whereas string is always UTF-16 in C#). For instance, for an HTTP request:
// Extension method
public static ByteArrayContent ToByteArrayContent(
this XDocument doc, XmlWriterSettings xmlWriterSettings = null)
{
xmlWriterSettings ??= new XmlWriterSettings();
using (var stream = new MemoryStream())
{
using (var writer = XmlWriter.Create(stream, xmlWriterSettings))
{
doc.Save(writer);
}
var content = new ByteArrayContent(stream.GetBuffer(), 0, (int)stream.Length);
content.Headers.ContentType = new MediaTypeHeaderValue("text/xml");
return content;
}
}
// Usage (XDocument -> UTF-8 bytes)
var content = doc.ToByteArrayContent();
var response = await httpClient.PostAsync("/someurl", content);
// Alternative (XDocument -> string -> UTF-8 bytes)
var content = new StringContent(doc.ToStringWithDeclaration(), Encoding.UTF8, "text/xml");
var response = await httpClient.PostAsync("/someurl", content);

how to monitor the program code execution? (file creation and modification by code lines etc)

My program triggers an XSL transformation.
The code that carries out the transformation creates some DLL and tmp files and deletes them soon after the transformation is completed.
It is almost impossible for me to monitor the creation and deletion of these files manually, so I want to add some code that displays which line of code created or modified which tmp and DLL files in the console window.
This is the relevant part of the code:
string strXmlQueryTransformPath = @"input.xsl";
string strXmlOutput = string.Empty;
StringReader srXmlInput = null;
StringWriter swXmlOutput = null;
XslCompiledTransform xslTransform = null;
XPathDocument xpathXmlOrig = null;
XsltSettings xslSettings = null;
MemoryStream objMemoryStream = null;
objMemoryStream = new MemoryStream();
xslTransform = new XslCompiledTransform(false);
xpathXmlOrig = new XPathDocument("input.xml");
xslSettings = new XsltSettings();
xslSettings.EnableScript = true;
xslTransform.Load(strXmlQueryTransformPath, xslSettings, new XmlUrlResolver());
xslTransform.Transform(xpathXmlOrig, null, objMemoryStream);
objMemoryStream.Position = 0;
StreamReader objStreamReader = new StreamReader(objMemoryStream);
strXmlOutput = objStreamReader.ReadToEnd();
// make use of Data in string "strXmlOutput"
Searching Google and MSDN didn't help me much.

The temporary DLLs are created by the XslCompiledTransform object: the XSLT document is compiled at run time into MSIL, and that generated assembly is used to perform the actual transformation. If you really want to work out exactly when the DLL appears and disappears, you could just step through the code, line by line, in a debugger and watch the Temp directory.
Why do you care about the temporary files, though? They're just an implementation detail of the XSL transform code that shouldn't matter to your code.

For this particular example, see the XslCompiledTransform.TemporaryFiles property:
http://msdn.microsoft.com/en-us/library/system.xml.xsl.xslcompiledtransform.temporaryfiles.aspx
