Problems with OpenXML 2.5 XSL Transformation Word 2013 C#

Can anyone help me with this problem?
I haven't done anything with OpenXML before and it has me stumped!
I have a Word document which is an invoice.
In this document, I have the usual headers etc., plus the 'fields' which need to be populated with data from my XML dataset from SQL Server.
I took a copy of word/document.xml from the docx and made the recommended changes to convert it into an XSLT file.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
becomes
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:n2="urn:hl7-org:v3"
exclude-result-prefixes="n2 xs xsi xsl">
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
and the file is terminated with
</xsl:template>
</xsl:stylesheet>
Then, I changed some of my 'fields' to show where I wanted the data to be merged.
All well and good......
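For context, the merge step described above can be sketched roughly as follows. This is a minimal sketch, not the code from the original post: it assumes the stylesheet is applied with XslCompiledTransform and that the result is swapped into a copy of the template docx using System.IO.Compression. The class name InvoiceMerge and all file paths are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml;
using System.Xml.Xsl;

public static class InvoiceMerge
{
    // Applies the XSLT to the XML data and returns the generated document.xml bytes.
    public static byte[] RunTransform(string xsltPath, string dataXmlPath)
    {
        var xslt = new XslCompiledTransform();
        xslt.Load(xsltPath);

        using (var output = new MemoryStream())
        {
            // Reuse the stylesheet's own xsl:output settings for the writer.
            using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
            using (var reader = XmlReader.Create(dataXmlPath))
            {
                xslt.Transform(reader, writer);
            }
            return output.ToArray();
        }
    }

    // Copies the template docx and replaces its main document part.
    public static void WriteDocx(string templateDocx, string outputDocx, byte[] documentXml)
    {
        File.Copy(templateDocx, outputDocx, true);
        using (var zip = ZipFile.Open(outputDocx, ZipArchiveMode.Update))
        {
            var old = zip.GetEntry("word/document.xml");
            if (old != null) old.Delete();
            using (var stream = zip.CreateEntry("word/document.xml").Open())
            {
                stream.Write(documentXml, 0, documentXml.Length);
            }
        }
    }
}
```

Keeping the transform and the packaging separate makes it possible to inspect the generated document.xml on its own before it goes back into the docx.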
When I ran it, I got a new file which looked OK but would not open in Word 2013. I pulled document.xml out of the docx and tried to open that. This gave me an "Unspecified error" at line 1, column 1257.
I have since tried all sorts of things, including creating an XSLT with no merge fields, just the headers and footers set up, and I get the same thing.
I have tried several different headers of differing complexity and always get the same error.
When I trace the error, it points to this tag:
<w:document xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 wp14">
which finishes with column 1257:
mc:Ignorable="w14 w15 wp14">
I have checked that all the namespaces are declared, but I cannot see or understand what is going wrong.
Any ideas?
Thanks

Related

Removing Attribute value based on value from an XML using VB.Net

I have an XML document like the one below:
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope
xmlns="http://com/uhg/uht/uhtSoapMsg_V1"
xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
<env:Header>
<uhtHeader
xmlns="http://com/uhg/uht/uhtHeader_V1">
<consumer>COMET</consumer>
<auditId></auditId>
<sendTimestamp>2020-09-03T18:15:40.942-05:00</sendTimestamp>
<environment>P</environment>
<businessService version="24">getClaimHistory</businessService>
<status>success</status>
</uhtHeader>
</env:Header>
<env:Body>
<srvcRspn
xmlns="http://com/uhg/uht/getClaimHistory_V24">
<srvcErrList arrayType="srvcErrOccur[1]" type="Array">
<srvcErrOccur>
<orig>Foundation</orig>
<rtnCd>00</rtnCd>
<explCd>000</explCd>
<desc></desc>
</srvcErrOccur>
</srvcErrList>
</srvcRspn>
</env:Body>
</env:Envelope>
I want to remove all the attribute values containing "http", as below:
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope
xmlns=""
xmlns:env="">
<env:Header>
<uhtHeader
xmlns="">
<consumer>COMET</consumer>
<auditId></auditId>
<sendTimestamp>2020-09-03T18:15:40.942-05:00</sendTimestamp>
<environment>P</environment>
<businessService version="24">getClaimHistory</businessService>
<status>success</status>
</uhtHeader>
</env:Header>
<env:Body>
<srvcRspn
xmlns="">
<srvcErrList arrayType="srvcErrOccur[1]" type="Array">
<srvcErrOccur>
<orig>Foundation</orig>
<rtnCd>00</rtnCd>
<explCd>000</explCd>
<desc></desc>
</srvcErrOccur>
</srvcErrList>
</srvcRspn>
</env:Body>
</env:Envelope>
I have tried several ways but none of them has worked for me. Can anyone suggest the fastest way to do this in VB.NET/C#?
The actual response is very large (at least 100,000 lines of XML), and a For Each loop would take a long time. Is there a parsing method or LINQ query that can do it faster?

I found a way to do it using Regex, as below:
Return Regex.Replace(xmlDoc, "((?<=<|<\/)|(?<= ))[A-Za-z0-9]+:| xmlns(:[A-Za-z0-9]+)?="".*?""", "")
It serves my purpose completely. Thanks Cleptus for your quick reference.
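Since the question also asked about a parsing or LINQ approach, here is a minimal C# sketch using LINQ to XML. Like the regex above, it removes namespace declarations and prefixes altogether rather than leaving empty xmlns="" attributes. The class name NamespaceStripper is hypothetical, the whole document is loaded into memory, and the conditional expression needs C# 7 or later.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NamespaceStripper
{
    // Rebuilds the element tree using local names only.
    // Caveat: two attributes that differ only by prefix would collide
    // after stripping, and XElement would then throw.
    public static XElement Strip(XElement source)
    {
        return new XElement(
            source.Name.LocalName,
            // Drop xmlns and xmlns:prefix declarations, keep ordinary attributes.
            source.Attributes()
                  .Where(a => !a.IsNamespaceDeclaration)
                  .Select(a => new XAttribute(a.Name.LocalName, a.Value)),
            // Recurse into child elements; other nodes (text, comments) are copied.
            source.Nodes().Select(n => n is XElement child ? (XNode)Strip(child) : n));
    }
}
```

A call such as NamespaceStripper.Strip(XElement.Parse(xmlText)).ToString() returns the stripped markup. Unlike the regex, this only touches real element and attribute names, so text content that happens to look like a prefix is left alone.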

Saxon XSLT: Serializer producing weird indents

I'm using Saxon HE 9.5.1.8 to transform an XML to another XML file.
My problem is that the XML content written by the Serializer() class of Saxon prints out several additional indents that I don't want to have in there. I'm assuming that this is "wrong" because I got the expected output when using the DomDestination() class (but then the outer XML document information is missing) or other XSL transformers like the one that is shipped with Visual Studio / .NET Framework.
This is the input XML:
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>$44.95</price>
<publish_date>2000-10-01</publish_date>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>$5.95</price>
<publish_date>2000-12-16</publish_date>
</book>
</catalog>
This is the XSLT file:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl"
>
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="book">
<book>
<xsl:copy-of select="@*|book/@*" />
<xsl:for-each select="*">
<xsl:attribute name="{name()}">
<xsl:value-of select="text()"/>
</xsl:attribute>
</xsl:for-each>
</book>
</xsl:template>
</xsl:stylesheet>
That is the expected output:
<?xml version="1.0" encoding="utf-8"?>
<catalog>
<book id="bk101" author="Gambardella, Matthew" title="XML Developer's Guide" genre="Computer" price="$44.95" publish_date="2000-10-01" />
<book id="bk102" author="Ralls, Kim" title="Midnight Rain" genre="Fantasy" price="$5.95" publish_date="2000-12-16" />
</catalog>
And that is the output when using Saxon:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book id="bk101"
author="Gambardella, Matthew"
title="XML Developer's Guide"
genre="Computer"
price="$44.95"
publish_date="2000-10-01"/>
<book id="bk102"
author="Ralls, Kim"
title="Midnight Rain"
genre="Fantasy"
price="$5.95"
publish_date="2000-12-16"/>
</catalog>
Does anybody know how to suppress or modify this behavior of Saxon? That is the C# code that is used to call the Saxon API:
using System;
using System.IO;
using Saxon.Api;

public Stream Transform(string xmlFilePath, string xsltFilePath)
{
    var result = new MemoryStream();
    var xslt = new FileInfo(xsltFilePath);
    var input = new FileInfo(xmlFilePath);

    // Compile the stylesheet.
    var processor = new Processor();
    var compiler = processor.NewXsltCompiler();
    var executable = compiler.Compile(new Uri(xslt.FullName));

    // Serialize the transformation result into the in-memory stream.
    var destination = new Serializer();
    destination.SetOutputStream(result);

    using (var inputStream = input.OpenRead())
    {
        var transformer = executable.Load();
        // The second argument is the base URI for the input document.
        transformer.SetInputStream(inputStream, new Uri(input.DirectoryName));
        transformer.Run(destination);
    }

    // Rewind so the caller can read the result from the start.
    result.Position = 0;
    return result;
}

Try setting the saxon:line-length serialization property (http://saxonica.com/documentation9.5/extensions/output-extras/line-length.html) to a very large value so that attributes are not wrapped onto new lines: <xsl:output xmlns:saxon="http://saxon.sf.net/" saxon:line-length="1000"/>.

Your goal of having multiple processors produce output in the same format is hopelessly misguided. That's especially so if you choose indented output: the spec leaves it entirely to implementations how to do indentation, saying only that the goal is to make it human-readable. (And placing constraints on where extra whitespace can be inserted.)
I'm sorry you don't find Saxon's way of wrapping long attribute lists pleasing, but it is entirely within the letter and the spirit of the specification. Without it, if you have an element with eight namespace declarations, you can easily get a line that is 400 characters long, which I certainly don't regard as human-readable.
There are many reasons that comparing two XML documents lexically is never going to work. For example, the attributes can be in a different order. There are two ways of comparing XML: convert the documents into canonical form using a "Canonical XML" processor, or compare them at the tree level for example by using the XPath 2.0 deep-equal() function. Ideally (especially if you want to know where the differences are, rather than just whether differences exist), use a specialist XML comparison tool such as DeltaXML.
For what it's worth, when we do unit testing, we first attempt a lexical comparison of the results. If that fails, we parse both documents and compare them using saxon:deep-equal(), which is a modified form of the deep-equal() function that gives fine control over the comparison rules, e.g. handling of whitespace and handling of namespaces.
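To illustrate comparing at the tree level rather than lexically, here is a minimal C# sketch. It uses LINQ to XML's XNode.DeepEquals as a stand-in for the technique; it is not Saxon's deep-equal() or saxon:deep-equal(), and the names XmlCompare and SameTree are hypothetical. Attributes are sorted before comparing so that attribute order cannot affect the result.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlCompare
{
    // Returns a copy of the element with attributes sorted by name.
    private static XElement Normalize(XElement e)
    {
        return new XElement(
            e.Name,
            e.Attributes().OrderBy(a => a.Name.ToString(), StringComparer.Ordinal),
            e.Nodes().Select(n => n is XElement child ? (XNode)Normalize(child) : n));
    }

    // Compares two XML strings as trees instead of as text.
    public static bool SameTree(string xmlA, string xmlB)
    {
        // XDocument.Parse discards whitespace-only text nodes by default,
        // so differences in indentation do not matter.
        var a = XDocument.Parse(xmlA).Root;
        var b = XDocument.Parse(xmlB).Root;
        return XNode.DeepEquals(Normalize(a), Normalize(b));
    }
}
```

With this, the expected output and the Saxon output from the question should compare as equal, because they differ only in whitespace inside the start tags. Namespace declarations are compared as ordinary attributes here, so documents that bind the same namespace to different prefixes would still be reported as different.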

Loading multiple XDocuments, and working with its documents

I have written several lines of code but still can't get past this.
I need to load many XML documents from a web library. I don't know how many documents there are, so I wonder which loop I should use while loading:
XDocument doc = XDocument.Load("http://" + i);
where i is the identifier number.
I tried loading until I got a document without meaningful content (assuming that marked the end and the rest were empty), but the problem is that there are several empty documents in the middle of the library.
XML with content looks like this:
<?xml version="1.0" encoding="utf-8"?>
<OP xmlns="" xmlns:xsi="" xsi:schemaLocation="">
<request verb="GR" identifier="53" metadataPrefix="p"></request>
<GR>
<header>
<identifier>53,number of doc...used for counting</identifier>
</header>
<metadata>
<P xmlns="" xsi:schemaLocation="">
<TITLE>title</TITLE>
<CERTIFICATE NAME="different names">
</CERTIFICATE>
<YEAR>
<DATE>2012-10-18T00:00:00Z</DATE>
</YEAR>
<MINIATURE>
<COPY>
<CNAME>Copy name</CNAME>
<FORMAT>obj/max/dxf/3ds/...</FORMAT>
</COPY>
</MINIATURE>
</P>
</metadata>
</GR>
</OP>
XML without content
<?xml version="1.0" encoding="utf-8"?>
<OP xmlns="" xmlns:xsi="" xsi:schemaLocation="">
<request verb="GR" identifier="53" metadataPrefix="p"></request>
Furthermore, I need to do some counting, such as:
Total number of documents
Number of documents per certificate <CERTIFICATE>
Number of documents for each year <YEAR><DATE>
Number of documents for each format <MINIATURE><COPY><FORMAT>
and my output should look like:
<?xml version="1.0" encoding="UTF-8" ?>
<Statistic>
<DocSum>21220</DocSum>
<Certificates>
<Certificate id="certificateName">17098</Certificate>
…
</Certificates>
<Years>
<Year year="2014">23</Year>
…
</Years>
<Miniature>
<Format post="obj">11723</Format>
…
</Miniature>
</Statistic>
I would appreciate any help, hints, or tips on how to deal with this.

The posted answer by smink to the following thread should get you on the right path.
C# HttpWebRequest command to get directory listing
One of the easiest ways to get a list of the files in a web directory, without knowing exactly how many there are or what they are called, is to parse the HTML of the directory listing and pull out the link tags.
You can then iterate through these tags and filter them out for the files by extensions that you need. I can provide a more in-depth example if necessary.
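For the counting part of the question, a minimal C# sketch with LINQ to XML might look like this. It rests on several assumptions: a document counts as having content when it contains a <metadata> element, elements are matched by local name because the real namespaces are not shown in the question, the year is taken from the first four characters of <DATE>, and each <FORMAT> value is a single key. The class name LibraryStats is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LibraryStats
{
    // Increments the counter stored under key, creating it on first use.
    private static void Bump(IDictionary<string, int> counts, string key)
    {
        int current;
        counts.TryGetValue(key, out current);
        counts[key] = current + 1;
    }

    // Finds descendants by local name, ignoring namespaces.
    private static IEnumerable<XElement> Named(XContainer root, string localName)
    {
        return root.Descendants().Where(e => e.Name.LocalName == localName);
    }

    // Turns a counter table into <itemName attrName="key">count</itemName> elements.
    private static IEnumerable<XElement> Rows(IDictionary<string, int> counts, string itemName, string attrName)
    {
        return counts.OrderBy(kv => kv.Key, StringComparer.Ordinal)
                     .Select(kv => new XElement(itemName, new XAttribute(attrName, kv.Key), kv.Value));
    }

    public static XElement Build(IEnumerable<XDocument> docs)
    {
        int total = 0;
        var certificates = new Dictionary<string, int>();
        var years = new Dictionary<string, int>();
        var formats = new Dictionary<string, int>();

        foreach (var doc in docs)
        {
            // Assumption: documents without <metadata> are empty and are skipped.
            var metadata = Named(doc, "metadata").FirstOrDefault();
            if (metadata == null) continue;
            total++;

            // Distinct() makes a document count at most once per key.
            foreach (var name in Named(metadata, "CERTIFICATE")
                         .Select(c => (string)c.Attribute("NAME"))
                         .Where(n => !string.IsNullOrEmpty(n)).Distinct())
                Bump(certificates, name);

            foreach (var year in Named(metadata, "DATE")
                         .Select(d => d.Value.Trim())
                         .Where(v => v.Length >= 4)
                         .Select(v => v.Substring(0, 4)).Distinct())
                Bump(years, year);

            foreach (var format in Named(metadata, "FORMAT")
                         .Select(f => f.Value.Trim())
                         .Where(v => v.Length > 0).Distinct())
                Bump(formats, format);
        }

        return new XElement("Statistic",
            new XElement("DocSum", total),
            new XElement("Certificates", Rows(certificates, "Certificate", "id")),
            new XElement("Years", Rows(years, "Year", "year")),
            new XElement("Miniature", Rows(formats, "Format", "post")));
    }
}
```

Because Build takes any sequence of XDocument objects, the loading loop stays separate: the documents can come from XDocument.Load calls over whatever identifiers the listing yields, and empty documents in the middle of the library are simply skipped instead of ending the loop.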

Transformation from XML to Excel 2010 file

How do I create a template for the XSLT transformation from an XML file to an Excel 2010 or Word 2010 file?
This is our template for the transformation to Excel 2003, but now I need it for 2010.
Thanks for your help!
<xsl:stylesheet xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xslex="urn:XsltExtension"
xmlns:html="http://www.w3.org/TR/REC-html40"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:x="urn:schemas-microsoft-com:office:excel"
version="2.0">
<xsl:output method="xml" indent="yes" />
<xsl:output method="xml" version="1.0"/>
<xsl:template match="/NewDataSet">
<xsl:variable name="elements" select="xs:schema/xs:element/xs:complexType/xs:choice/xs:element/xs:complexType/xs:sequence/*" />
<xsl:variable name="columnAppearances" select="ColumnAppearances/*" />
<xsl:processing-instruction name="mso-application">
<xsl:text>progid="Excel.Sheet"</xsl:text>
</xsl:processing-instruction>
<Workbook>
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office" />
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
...
</Workbook>
</xsl:template>
</xsl:stylesheet>

The Open XML format used by Excel 2007+ consists of many XML files inside a ZIP package. The overall format is very different from the single-file XML format you mention. If you have a working solution now, I suggest you leave it as is and use interop to convert the output to the newer format. If you want to avoid interop, you should use the OpenXML SDK instead. Doing it with XSLT would be very painful, if not impossible, and it is certainly off-topic here because it's a vast amount of work.
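To see the package structure for yourself, here is a minimal C# sketch that lists the parts inside an xlsx or docx file using System.IO.Compression. The class name PackageInspector is hypothetical, and the sketch only reads entry names; it does not interpret the parts.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class PackageInspector
{
    // Returns the full names of all entries in an Open XML (ZIP) package.
    public static List<string> ListParts(Stream package)
    {
        // leaveOpen: true, so the caller keeps ownership of the stream.
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, true))
        {
            return zip.Entries.Select(e => e.FullName).ToList();
        }
    }
}
```

Passing it a stream from File.OpenRead on a workbook typically shows entries such as [Content_Types].xml, xl/workbook.xml and xl/worksheets/sheet1.xml, which is why a single XSLT output file cannot stand in for the whole package.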

how to reference xsd file from xml

1) The pls.xsd file
pls.xsd imports xml.xsd, which is in the same folder.
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!-- Externals changed by QTAssistant (http://www.paschidev.com) -->
<!--
This is a draft schema for the XML language defined in the
Pronunciation Lexicon Specification
(latest version at <http://www.w3.org/TR/pronunciation-lexicon/>)
At the time of writing, the specification as well as this schema are
subject to change, and no guarantee is made on their accuracy or the fact
that they are in sync.
Last modified: $Date: 2007/12/11 12:08:40 $
Copyright © 2006 World Wide Web Consortium, (Massachusetts Institute
of Technology, ERCIM, Keio University). All Rights Reserved. See
http://www.w3.org/Consortium/Legal/.
-->
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:p="http://www.w3.org/2005/01/pronunciation-lexicon"
targetNamespace="http://www.w3.org/2005/01/pronunciation-lexicon"
elementFormDefault="qualified" version="1.0">
<xs:annotation>
<xs:documentation>Importing dependent namespaces</xs:documentation>
</xs:annotation>
<xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="xml.xsd" />
...
</xs:schema>
2) My XML file
This file references pls.xsd:
<?xml version="1.0" encoding="utf-8"?>
<lexicon version="1.0"
xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
file://C:/xsdforproject/pls.xsd"
alphabet="x-microsoft-ups" xml:lang="en-IN">
<lexeme>
</lexeme>
</lexicon>
With these two files, I get the following error in both my XML file and pls.xsd:
An error has occurred while opening external DTD 'file:///C:/xsdforproject/XMLSchema.dtd': Could not find file 'C:\xsdforproject\XMLSchema.dtd'.
I am using Visual Studio 2010.
How can I resolve this issue?

It would appear that something you're using is referencing XMLSchema.dtd, so have you tried downloading it and placing it in your xsdforproject folder?
http://www.w3.org/2009/XMLSchema/XMLSchema.dtd
