Replacing values in XML file - c#

Our application needs to process XML files. Sometimes we receive XML with values like the following:
<DiagnosisStatement>
<StmtText>ST &</StmtText>
</DiagnosisStatement>
Because of the &< sequence, my application cannot load the XML and throws the following exception:
An error occurred while parsing EntityName. Line 92, position 24.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
at System.Xml.XmlTextReaderImpl.Throw(String res)
at System.Xml.XmlTextReaderImpl.ParseEntityName()
at System.Xml.XmlTextReaderImpl.ParseEntityReference()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.Load(String filename)
at Transformation.GetEcgTransformer(String filePath, String fileType, String Manufacture, String Producer) in D:\Transformation.cs:line 160
Now I need to replace all occurrences of &< with 'and<' so that the XML can be processed without exceptions.

This is what I did to load the XML, based on the answer given by Botz3000:
string oldText = File.ReadAllText(filePath);
string newText = oldText.Replace("&<", "and<");
File.WriteAllText(filePath, newText, Encoding.UTF8);
xmlDoc = new XmlDocument();
xmlDoc.Load(filePath);

The XML file is invalid because a literal & must be escaped as &amp;, so you cannot load it as XML without getting an error. You can fix it if you load the file as plain text first, though:
string invalid = File.ReadAllText(filename);
string valid = invalid.Replace("&<", "and<");
File.WriteAllText(filename, valid);
If you have control over how the XML file is generated, though, you should fix the issue at the source, either by escaping the & as &amp; or by replacing it with "and" as you said.
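If the producer cannot be changed, a more general variant of the text replacement above is to escape every bare & rather than only the &< sequence. The following is a minimal sketch, not the answerer's code: the class and method names (XmlRepair, EscapeBareAmpersands, LoadLenient) are hypothetical, and it assumes the input contains no CDATA sections or comments whose ampersands should stay untouched.

```csharp
using System.Text.RegularExpressions;
using System.Xml;

public static class XmlRepair
{
    // Matches an '&' that does NOT start a named (&amp;), decimal (&#38;)
    // or hexadecimal (&#x26;) reference.
    private static readonly Regex BareAmpersand =
        new Regex(@"&(?!(?:[A-Za-z_][A-Za-z0-9_.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Escapes bare ampersands and leaves existing references alone.
    public static string EscapeBareAmpersands(string text)
    {
        return BareAmpersand.Replace(text, "&amp;");
    }

    // Repairs the text in memory and loads it, without rewriting the file.
    public static XmlDocument LoadLenient(string rawXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(EscapeBareAmpersands(rawXml));
        return doc;
    }
}
```

Usage would look like XmlRepair.LoadLenient(File.ReadAllText(filePath)). Unlike replacing with "and", this keeps the original character in the parsed value.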

Related

Which XML message generates an exception while doing ".LoadXml(xml_string)"?

I'm receiving text messages that should be XML.
However, sometimes that's not the case, and I can't see what those messages look like.
Current code:
var xmlDocument = new XmlDocument();
xmlDocument.LoadXml(xml);
The exception's error message is quite elaborate (it mentions the unclosed tags), but it does not include the XML string itself:
Exception during data receiving event: [System.Xml.XmlException: Unexpected end of file has occurred. The following elements are not closed: ID, VM, HMS. Line 1, position 156.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
at System.Xml.XmlTextReaderImpl.ThrowUnclosedElements()
at System.Xml.XmlTextReaderImpl.ParseElementContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at ... Information.FromString(String xml) in C:\...\Information.cs:line 40
at ....<ExtractInformation>d__71.MoveNext() in C:\...\Extraction.cs:line 161
at some_other_function(Byte[] data) in C:\...\Extraction.cs:line 134]
System.Xml.XmlException: Unexpected end of file has occurred. The following elements are not closed: ID, VM, HMS. Line 1, position 156.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
at System.Xml.XmlTextReaderImpl.ThrowUnclosedElements()
at System.Xml.XmlTextReaderImpl.ParseElementContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at ... Information.FromString(String xml) in C:\...\Information.cs:line 40
at ....<ExtractInformation>d__71.MoveNext() in C:\...\Extraction.cs:line 161
at some_other_function(Byte[] data) in C:\...\Extraction.cs:line 134]
My idea was to add the xml-string to the Exception's Message property, but this seems to be read-only:
public static Information FromString(string xml)
{
var xmlDocument = new XmlDocument();
try
{
xmlDocument.LoadXml(xml);
}
catch (Exception ex)
{
ex.Message += "[" + xml + "]"; // <= compiler error CS0200
throw ex;
}
At that depth I don't have any logging facilities, and as I'm working on a server application, I can't show anything on screen.
How can I add information to this Exception message (or to other properties/fields of the Exception)?
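Since Message is read-only, one common approach is to wrap the original exception in a new one whose message includes the payload. The sketch below is only an illustration: the class and method names (XmlLoading, LoadWithContext) are hypothetical, and it returns the XmlDocument rather than the Information type from the question, whose definition is not shown.

```csharp
using System;
using System.Xml;

public static class XmlLoading
{
    public static XmlDocument LoadWithContext(string xml)
    {
        var xmlDocument = new XmlDocument();
        try
        {
            xmlDocument.LoadXml(xml);
        }
        catch (XmlException ex)
        {
            // Message cannot be assigned, so build a new exception that carries
            // the offending text and keeps the original as InnerException.
            throw new XmlException(ex.Message + " [" + xml + "]", ex);
        }
        return xmlDocument;
    }
}
```

An alternative that keeps the original exception object is to store the text in its Data dictionary (ex.Data["xml"] = xml;) and rethrow with a bare throw; statement. Note that throw ex; resets the stack trace, while throw; preserves it.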

XML file as input to LoadXml returns xml exception c#

In my unit testing project, I have merged all my XML files into a single XML file and need to send it to a web service that separates it into modules.
So I have used the code below in my web service:
//xmlString is my xml file content
try
{
XElement XmlDocument = new XElement("XMLDocument");
XmlDocument testDoc = new XmlDocument();
testDoc.LoadXml(xmlString);
XmlDocument = XElement.Load(new XmlNodeReader(testDoc));
}
catch (Exception ex)
{
string returnString = ex.StackTrace.ToString();
returnString += "XML breaks the code";
return returnString;
}
EDIT: While running this in FinalBuilder, I am getting the exception below:
Exception Message: at ModuleSeparator.GetDetails(String xmlString) System.Xml.XmlException: Data at the root level is invalid. Line 1, position 1.
Results
Return value of 'ModuleSeparator' : at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
at System.Xml.XmlTextReaderImpl.ParseRootLevelWhitespace()
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at ModuleSeparator.GetDetails(String xmlString)XML breaks the code
Setting variable 'ModuleWiseTestCaseResults' to value ' at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
at System.Xml.XmlTextReaderImpl.ParseRootLevelWhitespace()
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at ModuleSeparator.GetDetails(String xmlString)XML breaks the code'.
My XML looks like this:
<?xml version="1.0" encoding="utf-8"?>
<test-results name="Merged results" total="418" errors="0" failures="0" not-run="0" inconclusive="128" ignored="0" skipped="7" invalid="0" date="2016-03-08" time="10:47:33">
<test-suite type="Test Project" name="" executed="True" result="Skipped" success="True" time="311.143731" asserts="0">
<results></results>
</test-suite>
</test-results>
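The thread shows no answer for this question. "Data at the root level is invalid. Line 1, position 1" is commonly reported when the string passed to LoadXml does not begin with '<', for example because a byte order mark (U+FEFF) survived at the start of the string, or because a file path was passed instead of the file content. The following sketch assumes the first cause; the names XmlInput and LoadTolerant are hypothetical.

```csharp
using System.Xml;

public static class XmlInput
{
    public static XmlDocument LoadTolerant(string xmlString)
    {
        // Assumption: the problem is a BOM or stray whitespace before the first '<'.
        string cleaned = xmlString.TrimStart('\uFEFF', ' ', '\t', '\r', '\n');

        var doc = new XmlDocument();
        doc.LoadXml(cleaned);
        return doc;
    }
}
```

If the string is actually a path, XmlDocument.Load(path) is the call to use instead of LoadXml.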

Remove all hexadecimal characters before loading string into XML Document Object?

I have an xml string that is being posted to an ashx handler on the server. The xml string is built on the client-side and is based on a few different entries made on a form. Occasionally some users will copy and paste from other sources into the web form. When I try to load the xml string into an XMLDocument object using xmldoc.LoadXml(xmlStr) I get the following exception:
System.Xml.XmlException = {"'', hexadecimal value 0x0B, is an invalid character. Line 2, position 1."}
In debug mode I can see the rogue character (sorry, I'm not sure of its official name):
My question is: how can I sanitise the XML string before I attempt to load it into the XmlDocument object? Do I need a custom function to strip out all these sorts of characters one by one, or can I use a native .NET 4 class to remove them?
Here is an example that removes invalid XML characters using Regex:
xmlString = CleanInvalidXmlChars(xmlString);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xmlString);
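public static string CleanInvalidXmlChars(string text)
{
// Remove the control characters XML 1.0 forbids, U+FFFE/U+FFFF, and unpaired surrogates.
// .NET's \x escape takes exactly two hex digits, so code units above 0xFF need \uXXXX.
string re = @"[\x00-\x08\x0B\x0C\x0E-\x1F\uFFFE\uFFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]";
return Regex.Replace(text, re, "");
}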
A more efficient way to not error out on invalid XML characters would be to use the CheckCharacters flag in XmlReaderSettings.
var xmlDoc = new XmlDocument();
var xmlReaderSettings = new XmlReaderSettings { CheckCharacters = false };
using (var stringReader = new StringReader(xml)) {
using (var xmlReader = XmlReader.Create(stringReader, xmlReaderSettings)) {
xmlDoc.Load(xmlReader);
}
}
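Regarding the question about a native .NET 4 class: XmlConvert gained IsXmlChar and IsXmlSurrogatePair in .NET 4.0, which allows filtering without a hand-written character range. This is a minimal sketch; the names XmlSanitizer and RemoveInvalidXmlChars are hypothetical.

```csharp
using System.Text;
using System.Xml;

public static class XmlSanitizer
{
    public static string RemoveInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                // Ordinary valid XML character.
                sb.Append(c);
            }
            else if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // Valid surrogate pair (a character above U+FFFF): keep both halves.
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            // Anything else (such as 0x0B) is dropped.
        }
        return sb.ToString();
    }
}
```

The string can then be loaded with xmlDoc.LoadXml(XmlSanitizer.RemoveInvalidXmlChars(xmlStr)).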

Boolean Deserialization Error

I am having problems deserializing an XML element. I assume it has something to do with the namespace on the XML element not being recognised by the deserializer.
The data arrives as a string from an outside source that I cannot modify, and I am using C# 4.0.
Any help would be gratefully appreciated.
string xml = "<boolean xmlns=\"http://schemas.microsoft.com/2003/10/serialization/\">false</boolean>";
var xSerializer = new XmlSerializer(typeof(bool));
using (var sr = new StringReader(xml))
using (var xr = XmlReader.Create(sr))
{
var y = xSerializer.Deserialize(xr);
}
Error:
System.InvalidOperationException was unhandled by user code
HResult=-2146233079
Message=There is an error in XML document (1, 2).
Source=System.Xml
StackTrace:
at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle)
...
...
...
InnerException: System.InvalidOperationException
HResult=-2146233079
Message=<boolean xmlns='http://schemas.microsoft.com/2003/10/serialization/'> was not expected.
Source=System.Xml
StackTrace:
at System.Xml.Serialization.XmlSerializationPrimitiveReader.Read_boolean()
at System.Xml.Serialization.XmlSerializer.DeserializePrimitive(XmlReader xmlReader, XmlDeserializationEvents events)
at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
InnerException:
It works if you create the serializer as follows:
var xSerializer = new XmlSerializer(typeof(bool), null, null,
new XmlRootAttribute("boolean"),
"http://schemas.microsoft.com/2003/10/serialization/");
A serializer created with just new XmlSerializer(typeof(bool)) does not accept a namespace or attributes on the root element. In fact, the input it expects is this:
string xml = "<boolean>false</boolean>";
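If XmlSerializer is not a requirement, the value can also be read with LINQ to XML, which ignores the element's namespace when converting its content. This is an alternative sketch rather than what either answer proposes; the names BooleanPayload and Parse are hypothetical.

```csharp
using System.Xml.Linq;

public static class BooleanPayload
{
    public static bool Parse(string xml)
    {
        // XElement defines an explicit conversion to bool that parses
        // the element's text content ("true", "false", "1" or "0").
        return (bool)XElement.Parse(xml);
    }
}
```

Calling BooleanPayload.Parse with the string from the question returns the boolean held in the element, regardless of the xmlns attribute.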

XDocument can't load xml with version 1.1 in C# LINQ?

XDocument.Load throws an exception when using an XML file with version 1.1 instead of 1.0:
Unhandled Exception: System.Xml.XmlException: Version number '1.1' is invalid. Line 1, position 16.
Any clean solutions to resolve the error (without regex) and load the document?
Initial reaction, just to confirm that I can reproduce this:
using System;
using System.Xml.Linq;
class Test
{
static void Main(string[] args)
{
string xml = "<?xml version=\"1.1\" ?><root><sub /></root>";
XDocument doc = XDocument.Parse(xml);
Console.WriteLine(doc);
}
}
Results in this exception:
Unhandled Exception: System.Xml.XmlException: Version number '1.1' is invalid. Line 1, position 16.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
at System.Xml.XmlTextReaderImpl.ParseXmlDeclaration(Boolean isTextDecl)
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.Linq.XDocument.Load(XmlReader reader, LoadOptions options)
at System.Xml.Linq.XDocument.Parse(String text, LoadOptions options)
at System.Xml.Linq.XDocument.Parse(String text)
at Test.Main(String[] args)
It's still failing as of .NET 4.6.
"Version 1.0" is hardcoded in various places in the standard .NET XML libraries. For example, your code seems to be falling foul of this line in System.Xml.XmlTextReaderImpl.ParseXmlDeclaration(bool):
if (!XmlConvert.StrEqual(this.ps.chars, this.ps.charPos, charPos - this.ps.charPos, "1.0"))
I had a similar issue with XDocument.Save refusing to retain 1.1. It was the same type of thing - a hardcoded "1.0" in a System.Xml method.
I couldn't find any way around it that still used the standard libraries.
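You can skip the first line and then use XDocument.Parse to load the XML. This assumes the XML declaration sits alone on the first line:
var lines = File.ReadAllLines(xmlFilename).ToList();
lines[0] = String.Empty; // drop the <?xml version="1.1" ?> declaration
var xdoc = XDocument.Parse(string.Join(Environment.NewLine, lines)); // keep line breaks so text content is not merged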
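When the declaration may share a line with the root element (as in the one-line string from the reproduction above), the same idea can be applied by cutting at the closing ?> instead of at a line break, still without regex. This is a sketch under that assumption; the names Xml11Loader and ParseIgnoringDeclaration are hypothetical. It only sidesteps the version check: content that is legal in XML 1.1 but not in 1.0 would still be rejected by the parser.

```csharp
using System;
using System.Xml.Linq;

public static class Xml11Loader
{
    public static XDocument ParseIgnoringDeclaration(string xml)
    {
        // Skip a possible BOM and leading whitespace.
        string body = xml.TrimStart('\uFEFF', ' ', '\t', '\r', '\n');

        // Drop the XML declaration, if present, up to and including "?>".
        if (body.StartsWith("<?xml ", StringComparison.Ordinal))
        {
            int end = body.IndexOf("?>", StringComparison.Ordinal);
            if (end >= 0)
            {
                body = body.Substring(end + 2);
            }
        }
        return XDocument.Parse(body);
    }
}
```

For a file, this could be called as Xml11Loader.ParseIgnoringDeclaration(File.ReadAllText(xmlFilename)).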
