XDocument can't load XML with version 1.1 in C# LINQ?

XDocument.Load throws an exception when using an XML file with version 1.1 instead of 1.0:
Unhandled Exception: System.Xml.XmlException: Version number '1.1' is invalid. Line 1, position 16.
Any clean solutions to resolve the error (without regex) and load the document?

Initial reaction, just to confirm that I can reproduce this:
using System;
using System.Xml.Linq;

class Test
{
    static void Main(string[] args)
    {
        string xml = "<?xml version=\"1.1\" ?><root><sub /></root>";
        XDocument doc = XDocument.Parse(xml);
        Console.WriteLine(doc);
    }
}
Results in this exception:
Unhandled Exception: System.Xml.XmlException: Version number '1.1' is invalid. Line 1, position 16.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
at System.Xml.XmlTextReaderImpl.ParseXmlDeclaration(Boolean isTextDecl)
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
at System.Xml.Linq.XDocument.Parse(String text, LoadOptions options)
at System.Xml.Linq.XDocument.Parse(String text)
at Test.Main(String[] args)
It's still failing as of .NET 4.6.

"Version 1.0" is hardcoded in various places in the standard .NET XML libraries. For example, your code seems to be falling foul of this line in System.Xml.XmlTextReaderImpl.ParseXmlDeclaration(bool):
if (!XmlConvert.StrEqual(this.ps.chars, this.ps.charPos, charPos - this.ps.charPos, "1.0"))
I had a similar issue with XDocument.Save refusing to retain 1.1. It was the same type of thing - a hardcoded "1.0" in a System.Xml method.
I couldn't find any way around it that still used the standard libraries.

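You can skip the first line (assuming the XML declaration sits on a line of its own), then use XDocument.Parse to load the rest. Like this:
var lines = File.ReadAllLines(xmlFilename).ToList();
lines[0] = String.Empty; // blank out the <?xml version="1.1"?> declaration
// Join with newlines so markup that spans several lines is not fused together
var xdoc = XDocument.Parse(string.Join(Environment.NewLine, lines));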
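If the declaration is not guaranteed to sit on its own first line, a slightly more defensive variant is to cut everything up to the closing ?> of the declaration before parsing. The following is only a sketch: the names LenientXmlLoader and ParseIgnoringDeclaration are made up for illustration and are not part of System.Xml. It also assumes the document uses nothing specific to XML 1.1 (such as the extra control characters 1.1 permits), because the remaining text is still parsed as XML 1.0.

```csharp
using System;
using System.Xml.Linq;

public static class LenientXmlLoader
{
    // Hypothetical helper: drops a leading <?xml ... ?> declaration, if any,
    // and parses the remainder with the regular XML 1.0 parser.
    public static XDocument ParseIgnoringDeclaration(string xml)
    {
        // Ignore a byte order mark or whitespace before the declaration.
        string body = xml.TrimStart('\uFEFF', ' ', '\t', '\r', '\n');

        // Require whitespace after "<?xml" so that processing instructions
        // such as <?xml-stylesheet ...?> are left alone.
        bool hasDeclaration =
            body.StartsWith("<?xml", StringComparison.Ordinal) &&
            body.Length > 5 &&
            char.IsWhiteSpace(body[5]);

        if (hasDeclaration)
        {
            int end = body.IndexOf("?>", StringComparison.Ordinal);
            if (end >= 0)
            {
                body = body.Substring(end + 2);
            }
        }

        return XDocument.Parse(body);
    }
}
```

It could be called as LenientXmlLoader.ParseIgnoringDeclaration(File.ReadAllText(xmlFilename)). Note that any encoding named in the declaration is discarded along with the version, so the file has to be read with the right encoding in the first place.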

Related

C# - Getting "Index was out of range...Parameter name: chunkLength" error on serializing to XMLstring

Our application requires logging API requests and we are logging them in the XML format in our file system. But we're getting an Index out of range error from time to time and are unable to reproduce them locally.
On checking the Stack trace we were able to identify that the error originates from our XML serializer method that converts an object to an XML string.
I'm using the following method to serialize objects to an XML string
public static string Serialize(object input)
{
    string output = "";
    var serializer = new XmlSerializer(input.GetType());
    using (var writer = new StringWriter())
    {
        serializer.Serialize(writer, input);
        output = writer.ToString();
    }
    return output;
}
But we are seeing the following error from time to time
Error occurred - Index was out of range.
Must be non-negative and less than the size of the collection. Parameter name: chunkLength
When searching for the above error, I found that StringBuilder throws this because it is not thread-safe. (link to post here)
I want to know whether using the StringWriter is causing this and, if so, what I should use in place of it so that I stop getting this error.
Thanks in advance

Minio. For security reasons DTD is prohibited in this XML document

Consider this code, which tries to put a file to Minio:
public async Task Put(byte[] data)
{
    using var ms = new MemoryStream(data);
    var args = new PutObjectArgs { };
    args.WithBucket("buckethead");
    args.WithObject(Guid.NewGuid().ToString()); // NewGuid is a method, so it needs ()
    args.WithStreamData(ms);
    args.WithObjectSize(ms.Length);
    args.WithContentType("application/vnd.ms-excel");
    await _client.PutObjectAsync(args);
}
The data is a ClosedXML XLTemplate, saved as bytes:
var template = new XLTemplate(@"D:\Documents\MyTemplate.xlsx");
template.AddVariable(myDto); // just a DTO class with values to fill the template
template.Generate();
using var ms = new MemoryStream();
template.SaveAs(ms);
return ms.ToArray();
The problem is that this line:
await _client.PutObjectAsync(args);
fails with the following:
{"There is an error in XML document (0, 0)."}
For security reasons DTD is prohibited in this XML document. To enable DTD processing set the DtdProcessing property on XmlReaderSettings to Parse and pass the settings into XmlReader.Create method.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.ParseDoctypeDecl()
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.XmlReader.MoveToContent()
at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderErrorResponse.Read3_Error()
What does this have to do with serialization, and how do I fix it?
By the way, the generated XLTemplate is valid: if I save it to the hard drive as .xlsx, I can open it just fine.
Turns out I had the port specified incorrectly in the Minio client. The exception never even hinted at it. Don't manage exception verbosity in your libraries like this, kids.

Replacing values in XML file

Our application needs to process XML files. Sometimes we receive XML with values like this:
<DiagnosisStatement>
<StmtText>ST &</StmtText>
</DiagnosisStatement>
Because of the &< sequence, my application cannot load the XML and throws the following exception:
An error occurred while parsing EntityName. Line 92, position 24.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
at System.Xml.XmlTextReaderImpl.Throw(String res)
at System.Xml.XmlTextReaderImpl.ParseEntityName()
at System.Xml.XmlTextReaderImpl.ParseEntityReference()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.Load(String filename)
at Transformation.GetEcgTransformer(String filePath, String fileType, String Manufacture, String Producer) in D:\Transformation.cs:line 160
Now I need to replace all occurrences of &< with 'and<' so that the XML can be processed without any exceptions.
This is what I did to load the XML, with the help of the answer given by Botz3000:
string oldText = File.ReadAllText(filePath);
string newText = oldText.Replace("&<", "and<");
File.WriteAllText(filePath, newText, Encoding.UTF8);
xmlDoc = new XmlDocument();
xmlDoc.Load(filePath);
The XML file is invalid because & needs to be escaped as &amp;, so you cannot load it as XML without getting an error. You can fix it if you load the file as plain text, though:
string invalid = File.ReadAllText(filename);
string valid = invalid.Replace("&<", "and<");
File.WriteAllText(filename, valid);
If you have control over how the XML file is generated, though, you should fix the issue there, either by escaping the & as &amp; or by replacing it with "and" as you said.
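If the ampersand itself should survive in the data, an alternative to substituting "and" is to escape only those ampersands that do not already start an entity or character reference. This is a minimal sketch, assuming that unescaped & characters are the only defect in the file. The class and method names are made up for illustration, and the pattern only recognises the five predefined XML entities plus numeric references:

```csharp
using System.Text.RegularExpressions;
using System.Xml;

public static class AmpersandFixer
{
    // Matches an & that is NOT the start of &amp; &lt; &gt; &quot; &apos;
    // or of a numeric character reference such as &#38; or &#x26;.
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)");

    // Returns the text with every bare ampersand rewritten as &amp;.
    public static string EscapeBareAmpersands(string text)
    {
        return BareAmpersand.Replace(text, "&amp;");
    }

    // Repairs the raw text in memory and loads it, leaving the file untouched.
    public static XmlDocument LoadLenient(string rawXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(EscapeBareAmpersands(rawXml));
        return doc;
    }
}
```

A call such as AmpersandFixer.LoadLenient(File.ReadAllText(filePath)) keeps the original file as received. One limitation of this sketch: it would also rewrite ampersands inside CDATA sections, where they are legal as they stand.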

Convert XML to Plain Text

My goal is to build an engine that takes the latest HL7 3.0 CDA documents and makes them backward compatible with HL7 2.5, which is a radically different beast.
The CDA document is an XML file which, when paired with its matching XSL file, renders an HTML document fit for display to the end user.
In HL7 2.5 I need to get the rendered text, devoid of any markup, and fold it into a text stream (or similar) that I can write out in 80-character lines to populate the HL7 2.5 message.
So far, my approach is to use XslCompiledTransform to transform my XML document using XSLT and produce a resultant HTML document.
My next step is to take that document (or perhaps something from a step before this) and render the HTML as text. I have searched for a while, but can't figure out how to accomplish this. I'm hoping it's something easy that I'm just overlooking, or that I just can't find the magical search terms. Can anyone offer some help?
FWIW, I've read the 5 or 10 other questions in SO which embrace or admonish using RegEx for this, and don't think that I want to go down that road. I need the rendered text.
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;
using System.Xml.XPath;

public class TransformXML
{
    public static void Main(string[] args)
    {
        try
        {
            string sourceDoc = "C:\\CDA_Doc.xml";
            string resultDoc = "C:\\Result.html";
            string xsltDoc = "C:\\CDA.xsl";
            XPathDocument myXPathDocument = new XPathDocument(sourceDoc);
            XslCompiledTransform myXslTransform = new XslCompiledTransform();
            XmlTextWriter writer = new XmlTextWriter(resultDoc, null);
            myXslTransform.Load(xsltDoc);
            myXslTransform.Transform(myXPathDocument, null, writer);
            writer.Close();
            StreamReader stream = new StreamReader(resultDoc);
        }
        catch (Exception e)
        {
            Console.WriteLine("Exception: {0}", e.ToString());
        }
    }
}
Since you have the XML source, consider writing an XSL that will give you the output you want without the intermediate HTML step. It would be far more reliable than trying to transform the HTML.
This will leave you with just the text:
using System;
using System.IO;
using System.Text;
using System.Xml;

class Program
{
    static string sourceDoc = "<html><body><p>this is a paragraph</p><p>another paragraph</p></body></html>";

    static void Main(string[] args)
    {
        using (var reader = XmlReader.Create(new StringReader(sourceDoc)))
        {
            var result = new StringBuilder();
            while (reader.Read())
            {
                // Value is empty for element and end-element nodes,
                // so for this input only the text content accumulates
                result.Append(reader.Value);
            }
            Console.WriteLine(result);
        }
    }
}
Or you can use a regular expression:
// requires: using System.Text.RegularExpressions;
public static string StripHtml(String htmlText)
{
    // replace all tags with spaces...
    htmlText = Regex.Replace(htmlText, @"<(.|\n)*?>", " ");
    // ...then collapse runs of spaces into a single space
    while (htmlText.Contains("  "))
    {
        htmlText = htmlText.Replace("  ", " ");
    }
    // replace non-breaking-space and ampersand entities with their characters
    htmlText = htmlText.Replace("&nbsp;", " ");
    htmlText = htmlText.Replace("&amp;", "&");
    return htmlText;
}
Can you use something like this which uses lynx and perl to render the html and then convert that to plain text?
This is a great use-case for XSL:FO and FOP. FOP isn't just for PDF output, one of the other major outputs that is supported is text. You should be able to construct a simple xslt + fo stylesheet that has the specifications (i.e. line width) that you want.
This solution is a bit more heavyweight than just using xml->xslt->text as ScottSEA suggested, but if you have any more complex formatting requirements (e.g. indenting), they will be much easier to express in FO than to mock up in XSLT.
I would avoid regexes for extracting the text. That's too low-level and guaranteed to be brittle. If you just want text and 80-character lines, the default XSLT template will only print element text. Once you have only the text, you can apply whatever text processing is necessary.
Incidentally, I work for a company that produces CDAs as part of our product (voice recognition for dictations). I would look into an XSLT that transforms the 3.0 directly into 2.5. Depending on the fidelity you want to keep between the two versions, the full XSLT route will probably be your easiest bet if what you really want to achieve is conversion between the formats. That's what XSLT was built to do.
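To illustrate the point about the default XSLT template: a stylesheet that declares text output and contains no templates of its own falls back on the built-in rules, which walk the tree and emit only the text nodes. The sketch below shows this with XslCompiledTransform. The class name TextOnlyTransform and the method ExtractText are invented for this example, and a real CDA document would most likely still need templates of its own to insert line breaks and skip header metadata.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class TextOnlyTransform
{
    // A stylesheet with no templates of its own: the built-in XSLT rules
    // copy text nodes to the output and drop the markup.
    private const string Stylesheet =
        "<xsl:stylesheet version=\"1.0\" " +
        "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
        "<xsl:output method=\"text\"/>" +
        "</xsl:stylesheet>";

    // Hypothetical helper: returns the concatenated text content of the XML.
    public static string ExtractText(string xml)
    {
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(xsltReader);
        }

        using (var input = XmlReader.Create(new StringReader(xml)))
        using (var output = new StringWriter())
        {
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

With the built-in rules alone, adjacent paragraphs run together without any separator, which is why a production stylesheet would add explicit templates for block-level elements before the text is wrapped to 80 characters.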

An error occurred while parsing EntityName

I'm trying to load an XML document into an XPathDocument object in C#.
My xml documents include this line:
trés dégagée + rade
and when the parser arrives there it gives me this error:
"An error occurred while parsing EntityName"
I know that's expected because of the character "é". Does anybody know how I can avoid this error? My idea is to insert an entity declaration into the XML document and then replace all special characters with entities, but that's long-winded and I'm not sure it will work. Do you have any other, simpler ideas?
Thanks a lot
Was about to post this and just then the servers went down. I think I've rewritten it correctly from memory:
I think that the problem lies within the fact that by default the XPathDocument uses an XmlTextReader to parse the contents of the supplied file and this XmlTextReader uses an EntityHandling setting of ExpandEntities.
In other words, when you rely on the default settings, an XmlTextReader will validate the input XML and try to resolve all entities. The better way is to do this manually by taking full control over the XmlReaderSettings (I always do it manually):
string myXMLFile = "SomeFile.xml";
string fileContent = LoadXML(myXMLFile);

private string LoadXML(string xml)
{
    XPathDocument xDoc;
    XmlReaderSettings xrs = new XmlReaderSettings();
    // The following line does the "magic".
    xrs.CheckCharacters = false;
    using (XmlReader xr = XmlReader.Create(xml, xrs))
    {
        xDoc = new XPathDocument(xr);
    }
    if (xDoc != null)
    {
        XPathNavigator xNav = xDoc.CreateNavigator();
        return xNav.OuterXml;
    }
    else
    {
        // Unable to load file
        return null;
    }
}
Typically this is caused by a mismatch between the encoding used to read the file and the file's actual encoding.
At a guess I would say the file is UTF-8 encoded but you are reading it with a default encoding.
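If an encoding mismatch is indeed the cause, one way to test the theory is to decode the file explicitly rather than relying on defaults. The sketch below is an illustration only: EncodingAwareLoader and its Load method are hypothetical names, and the right encoding for the file in question is something the caller has to know or find out.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.XPath;

public static class EncodingAwareLoader
{
    // Hypothetical helper: decodes the file with the given encoding and
    // hands the resulting characters to the XML parser.
    public static XPathDocument Load(string path, Encoding encoding)
    {
        using (var textReader = new StreamReader(path, encoding))
        using (var xmlReader = XmlReader.Create(textReader))
        {
            return new XPathDocument(xmlReader);
        }
    }
}
```

For example, EncodingAwareLoader.Load("SomeFile.xml", Encoding.UTF8) reads the file as UTF-8. Because the parser receives characters that are already decoded, the bytes are interpreted with the encoding passed in rather than with whatever the XML declaration names.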
Try beefing up your question with more details to get a more definitive answer.
