I'm trying to load an XML document into an XPathDocument object in C#.
My XML documents include this line:
trés dégagée + rade
and when the parser arrives there it gives me this error:
"An error occurred while parsing EntityName"
I know that's expected because of the character "é". Does anybody know how I can avoid this error? My idea is to insert an entity declaration into the XML document and then replace all special characters with entities, but that is a lot of work and I'm not sure it will work. Do you have other, simpler ideas?
Thanks a lot
Was about to post this and just then the servers went down. I think I've rewritten it correctly from memory:
I think the problem is that, by default, XPathDocument uses an XmlTextReader to parse the contents of the supplied file, and this XmlTextReader uses an EntityHandling setting of ExpandEntities.
In other words, when you rely on the default settings, the XmlTextReader will try to resolve all entities while it parses the input XML. A better way is to take full control through XmlReaderSettings (I always do it manually):
string myXMLFile = "SomeFile.xml";
string fileContent = LoadXML(myXMLFile);
private string LoadXML(string xml)
{
XPathDocument xDoc;
XmlReaderSettings xrs = new XmlReaderSettings();
// The following line does the "magic".
xrs.CheckCharacters = false;
using (XmlReader xr = XmlReader.Create(xml, xrs))
{
xDoc = new XPathDocument(xr);
}
if (xDoc != null)
{
XPathNavigator xNav = xDoc.CreateNavigator();
return xNav.OuterXml;
}
else
// Unable to load file
return null;
}
Typically this is caused by a mismatch between the encoding used to read the file and the file's actual encoding.
At a guess I would say the file is UTF-8 encoded but you are reading it with a default encoding.
Try beefing up your question with more details to get a more definitive answer.
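As a rough illustration of the encoding-mismatch idea, the sketch below decodes the bytes with an explicitly chosen encoding before XPathDocument sees them. The class name EncodingAwareLoader is hypothetical, and the right encoding for your file is an assumption you would have to verify yourself (for example, ISO-8859-1 if the file was saved by a Western European editor without an XML declaration).

```csharp
using System.IO;
using System.Text;
using System.Xml.XPath;

// Hypothetical helper, not part of the original answer.
public static class EncodingAwareLoader
{
    // Decodes the stream with the encoding the caller believes is correct,
    // then hands the already-decoded characters to XPathDocument.
    public static XPathDocument Load(Stream input, Encoding encoding)
    {
        using (var reader = new StreamReader(input, encoding))
        {
            // XPathDocument reads the whole document inside its constructor,
            // so disposing the reader afterwards is safe.
            return new XPathDocument(reader);
        }
    }
}
```

A call such as EncodingAwareLoader.Load(File.OpenRead("SomeFile.xml"), Encoding.GetEncoding("iso-8859-1")) would then read accented characters as the author of the file intended, provided the guess about the encoding is right.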
Related
I'm getting the following error when trying to read some XML.
Exception has occurred: CLR/System.Xml.XmlException
Exception thrown: 'System.Xml.XmlException' in System.Private.Xml.dll: 'There is no Unicode byte order mark. Cannot switch to Unicode.'
I've identified the cause: the API serves the content as UTF-8, but the XML declaration says UTF-16.
<?xml version="1.0" encoding="utf-16"?>
I've confirmed this in tests from static files by deleting the encoding or saving the file in utf-16. I have also confirmed that the incoming response is utf-8 looking in the response Content.Headers.ContentType.
Unfortunately I don't maintain the API and don't think that this will be getting fixed any time soon.
Is there a way to make a System.Xml.XmlReader ignore the encoding in the XML declaration of the stream? It would be nice if there were a flag to simply ignore the declaration when it can't be trusted to be accurate.
Could I correct the content of the XML using some kind of replacement prior to the final parsing?
I could always re-encode the same content, but that seems a little mad.
var mockBytes = System.Text.Encoding.UTF8.GetBytes("<?xml version=\"1.0\" encoding=\"utf-16\"?>");
var mockStream = new MemoryStream(mockBytes);
XmlReaderSettings settings = new XmlReaderSettings();
settings.Async = true;
using (var reader = XmlReader.Create(mockStream, settings))
{
if (reader.ReadToFollowing("Message") && await reader.ReadAsync())
{
while (await reader.MoveToContentAsync() == XmlNodeType.Element)
{
...
}
}
}
Thank you for the comments. Using them, I have been able to confirm that simply instantiating a StreamReader and passing it in is all that is required to stop the XmlReader from interpreting the encoding attribute in the XML declaration.
var mockBytes = System.Text.Encoding.UTF8.GetBytes("<?xml version=\"1.0\" encoding=\"utf-16\"?>");
var mockStream = new MemoryStream(mockBytes);
var sr = new StreamReader(mockStream);
XmlReaderSettings settings = new XmlReaderSettings();
settings.Async = true;
using (var reader = XmlReader.Create(sr, settings))
{
if (reader.ReadToFollowing("Message") && await reader.ReadAsync())
{
while (await reader.MoveToContentAsync() == XmlNodeType.Element)
{
...
}
}
}
This is the simplest solution I can imagine, short of a flag on XmlReaderSettings, which doesn't seem to exist.
Furthermore, as @Jereon says, skipping to particular characters or line endings would get very brittle and fall over if some other change happened at the API. You would really have to look more carefully, perhaps pushing elements onto a stack between <? and ?>, which is not easy and fortunately also not necessary.
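For completeness, here is a small synchronous sketch of the StreamReader approach with a complete document. The class name DeclaredEncodingWorkaround and the Root/Message element names are made up for illustration; the key assumption is that the payload really is UTF-8, as reported by the response headers.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper illustrating the StreamReader workaround.
public static class DeclaredEncodingWorkaround
{
    // Returns the text of the first <Message> element, or null if there is none.
    public static string ReadMessage(byte[] utf8Payload)
    {
        using (var stream = new MemoryStream(utf8Payload))
        // The StreamReader decodes the bytes as UTF-8 up front, so the XmlReader
        // receives characters and does not switch encodings based on the declaration.
        using (var textReader = new StreamReader(stream, Encoding.UTF8))
        using (var xmlReader = XmlReader.Create(textReader))
        {
            return xmlReader.ReadToFollowing("Message")
                ? xmlReader.ReadElementContentAsString()
                : null;
        }
    }
}
```

If the API ever starts sending real UTF-16, the hard-coded Encoding.UTF8 would have to change, so it may be worth deriving the encoding from the Content-Type header instead.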
I'm trying to set up parsing for a test XML generated with ksoap2 in Android:
<?xml version="1.0" encoding="utf-8"?>
<v:Envelope xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns:d="http://www.w3.org/2001/XMLSchema" xmlns:c="http://schemas.xmlsoap.org/soap/encoding/" xmlns:v="http://schemas.xmlsoap.org/soap/envelope/">
<v:Header />
<v:Body>
<v:SOAPBODY>
<v:INFO i:type="v:INFO">
<v:LAITETUNNUS i:type="d:string">EI_TUNNUSTA</v:LAITETUNNUS>
</v:INFO>
<v:TOIMINNOT i:type="v:TOIMINNOT">
<v:TOIMINTA i:type="d:string">ASETUKSET_HAKU</v:TOIMINTA>
</v:TOIMINNOT>
<v:SISALTO i:type="v:SISALTO">
<v:KUVA i:type="d:string">AGFAFDGFDGFG</v:KUVA>
<v:MITTAUS i:type="d:string">12,42,12,4,53,12</v:MITTAUS>
</v:SISALTO>
</v:SOAPBODY>
</v:Body>
</v:Envelope>
But seemingly I can't parse it in any way. The exception is always that the "Root element is not found", even though it passes XML validators like the one at w3schools. If I'm correct, the contents of the body shouldn't be an issue when the problem is with the root element.
The test code for parsing that I try to use in C# is:
using (StreamReader streamreader = new StreamReader(Context.Request.InputStream))
{
    try
    {
        XDocument xmlInput = new XDocument();
        streamreader.BaseStream.Position = 0;
        string tmp = streamreader.ReadToEnd();
        var xmlreader = XmlReader.Create(streamreader.BaseStream);
        xmlInput = XDocument.Parse(tmp);
        xmlInput = XDocument.Load(xmlreader);
    }
    catch (Exception e)
    { }
}
Here, xmlInput = XDocument.Parse(tmp); does parse it into an XDocument, though not a navigable one. Then xmlInput = XDocument.Load(xmlreader); throws the exception about the missing root element. I'm completely at a loss here because I previously managed to parse and navigate almost the same XML with the XmlDocument and XDocument classes, and I fear I made some changes I didn't notice.
Thanks in advance.
Update: Here's the string tmp as requested :
"<?xml version=\"1.0\" encoding=\"utf-8\"?><v:Envelope xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:d=\"http://www.w3.org/2001/XMLSchema\" xmlns:c=\"http://schemas.xmlsoap.org/soap/encoding/\" xmlns:v=\"http://schemas.xmlsoap.org/soap/envelope/\"><v:Header /><v:Body><v:SOAPBODY><v:INFO i:type=\"v:INFO\"><v:LAITETUNNUS i:type=\"d:string\">EI_TUNNUSTA</v:LAITETUNNUS></v:INFO><v:TOIMINNOT i:type=\"v:TOIMINNOT\"><v:TOIMINTA i:type=\"d:string\">ASETUKSET_HAKU</v:TOIMINTA></v:TOIMINNOT><v:SISALTO i:type=\"v:SISALTO\"><v:KUVA i:type=\"d:string\">AGFAFDGFDGFG</v:KUVA><v:MITTAUS i:type=\"d:string\">12,42,12,4,53,12</v:MITTAUS></v:SISALTO></v:SOAPBODY></v:Body></v:Envelope>\r\n"
Update: Even with XDocument.Load(new StreamReader(Context.Request.InputStream, Encoding.UTF8)); the parsing will fail.
I believe you've already read to the end of the stream once, so you need to reset the position in the stream again. See: "Root element is missing" error but I have a root element
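A minimal sketch of that rewind step, assuming the stream is seekable (a MemoryStream is; a network stream might not be). The class name RewindBeforeLoad is hypothetical. The first read consumes the stream, so the position is reset before XDocument.Load reads it again.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper showing why the second read needs a rewind.
public static class RewindBeforeLoad
{
    public static XDocument ReadTwice(Stream input, out string rawText)
    {
        // First pass: read everything as text. leaveOpen keeps the
        // underlying stream usable after the StreamReader is disposed.
        using (var reader = new StreamReader(input, Encoding.UTF8, true, 1024, leaveOpen: true))
        {
            rawText = reader.ReadToEnd();
        }

        // The stream now sits at its end; without this reset the loader
        // sees no data and reports a missing root element.
        input.Position = 0;
        return XDocument.Load(input);
    }
}
```

If the text is all you need, reading once and calling XDocument.Parse on the string avoids the second pass entirely.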
I literally just want to be able to traverse the contents of various XML files that I have been given, but having a non-standard DTD means I am hitting some issues: one error is "Reference to undeclared entity 'reg'" and another says it is unable to locate the .DTD file.
Is it possible to do this sort of thing for XML files when no DTD is available? I have no control over these files and cannot change them. I am looking to grab a number of them at a time, move through the contents as efficiently as possible, email out some notifications, and that's it.
Sample of XML file below:
<!DOCTYPE Toro-Pub PUBLIC "-//Toro//DTD Toro Publication V1.0//EN//XML" "Toro-Pub.dtd">
<!--Arbortext, Inc., 1988-2011, v.4002-->
<?Pub UDT _nopagebreak _touchup KeepsKeep="yes" KeepsPrev="no" KeepsNext="no" KeepsBoundary="page"?>
<?Pub UDT template _font?>
<?Pub UDT _bookmark _target?>
<?Pub UDT _nocolumnbreak _touchup KeepsKeep="yes" KeepsPrev="no" KeepsNext="no" KeepsBoundary="column"?>
<?Pub UDT instructions _comment FontColor="red"?>
<?Pub EntList alpha bull copy rArr sect trade deg?>
<?Pub Inc?>
<Toro-Pub><PubMeta Brand="Toro" CE="Yes" ClientPubNo="" CopyrightYear="2013" FormNumber="3378-827" Lang="CS" LangParentForm="3378-826" LangParentID="72729" LangParentRev="A" PageSize="" PhoneNoCan="" PhoneNoMex="" PhoneNoUS="" ProductFamily="sample product name" PubID="72730" PublicationType="Operator Manual" RegistrationURL="www.website.com" Rev="A" ServiceURL="www.website.com"><?TranslationData DueDate="07/01/2013" InCarton(1-yes)="0" Author="Mr Smith" EngParent="https://lwww.website.com?vPubID=423&vPubNum=3378-826" ?></PubMeta><Pub-TBlock>
<Body-TB>
...
Many thanks.
UPDATE #1
I have tried the below code taken from the suggested comment:
Stream file = File.OpenRead("4d00fa60800e0a5d_3378-827.xml");
// The next line is the fix!!!
XmlTextReader xmlTextReader = new XmlTextReader(file);
xmlTextReader.XmlResolver = null; // Don't require file in system32\inetsrv
XmlReaderSettings readerSettings = new XmlReaderSettings();
readerSettings.ValidationType = ValidationType.Schema;
//readerSettings.Schemas.Add(null, "");
readerSettings.DtdProcessing = DtdProcessing.Ignore;
readerSettings.XmlResolver = null; // Doesn't help
//readerSettings.ValidationEventHandler += ValidationEventHandle;
XmlReader myXmlReader = XmlReader.Create(xmlTextReader, readerSettings);
XmlDocument myXmlDocument = new XmlDocument();
myXmlDocument.XmlResolver = null; // Doesn't help
myXmlDocument.Load(myXmlReader); // Load doc, no .dtd required on local disk
However, I now get a new error of 'Operation is not valid due to the current state of the object.' on the line 'myXmlDocument.Load(myXmlReader)'.
I'm trying to parse some XML inside a WiX installer. The XML would be an object of all my errors returned from a web server. I'm getting the error in the question title with this code:
XmlDocument xml = new XmlDocument();
try
{
xml.LoadXml(myString);
}
catch (Exception ex)
{
System.IO.File.WriteAllText(@"C:\text.txt", myString + "\r\n\r\n" + ex.Message);
throw;
}
myString is this (as seen in the output of text.txt)
<?xml version="1.0" encoding="utf-8"?>
<Errors></Errors>
text.txt comes out looking like this:
<?xml version="1.0" encoding="utf-8"?>
<Errors></Errors>
Data at the root level is invalid. Line 1, position 1.
I need this XML to parse so I can see if I had any errors.
The hidden character is probably the BOM (byte order mark).
The explanation to the problem and the solution can be found here, credits to James Schubert, based on an answer by James Brankin found here.
Though the previous answer does remove the hidden character, it also removes the whole first line. A more precise version would be:
string _byteOrderMarkUtf8 = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
if (xml.StartsWith(_byteOrderMarkUtf8, StringComparison.Ordinal))
{
xml = xml.Remove(0, _byteOrderMarkUtf8.Length);
}
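The same idea can be wrapped in a small reusable helper. This is only a sketch with a hypothetical class name (BomStripper); it uses an ordinal comparison because a culture-sensitive StartsWith can treat U+FEFF as an ignorable character and report a match even when no BOM is present.

```csharp
using System;

// Hypothetical helper for stripping a leading byte order mark from a string.
public static class BomStripper
{
    // U+FEFF is what the UTF-8 preamble decodes to.
    private const string ByteOrderMark = "\uFEFF";

    public static string StripBom(string xml)
    {
        // Ordinal comparison checks the actual first character.
        if (xml != null && xml.StartsWith(ByteOrderMark, StringComparison.Ordinal))
        {
            return xml.Substring(ByteOrderMark.Length);
        }
        return xml;
    }
}
```

Usage would be something like xml.LoadXml(BomStripper.StripBom(myString)); strings without a BOM pass through unchanged.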
I encountered this problem when fetching an XSLT file from Azure blob and loading it into an XslCompiledTransform object.
On my machine the file looked just fine, but after uploading it as a blob and fetching it back, the BOM character was added.
Use the Load() method instead; it will solve the problem.
The issue here was that myString had that header line. Either there was some hidden character at the beginning of the first line or the line itself was causing the error. I sliced off the first line like so:
xml.LoadXml(myString.Substring(myString.IndexOf(Environment.NewLine)));
This solved my problem.
I think the problem is about encoding. That's why removing the first line (with the encoding bytes) might solve the problem.
My solution for "Data at the root level is invalid. Line 1, position 1."
in XDocument.Parse(xmlString) was to replace it with XDocument.Load(new MemoryStream(xmlContentInBytes));
I noticed that my XML string looked OK:
<?xml version="1.0" encoding="utf-8"?>
but with a different encoding in the text editor it looked like this:
?<?xml version="1.0" encoding="utf-8"?>
In the end I did not need the XML string but the XML byte[]. If you need to use the string, you should look for "invisible" bytes in it and experiment with encodings to adjust the XML content for parsing or loading.
Hope this helps.
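A compact sketch of the byte-based approach, with a hypothetical class name (ByteLoader). Loading from a stream lets the XML reader examine the raw bytes, including any byte order mark, instead of receiving a string in which the mark has already become a stray character.

```csharp
using System.IO;
using System.Xml.Linq;

// Hypothetical helper: load XML straight from the raw bytes.
public static class ByteLoader
{
    public static XDocument LoadFromBytes(byte[] xmlContentInBytes)
    {
        using (var stream = new MemoryStream(xmlContentInBytes))
        {
            // The reader inspects the leading bytes itself, so a UTF-8 BOM
            // in front of the declaration does not end up in the content.
            return XDocument.Load(stream);
        }
    }
}
```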
Save your file with different encoding:
File > Save file as... > Save as UTF-8 without signature.
In VS 2017 you find encoding as a dropdown next to Save button.
The main culprit for this error is the logic that determines the encoding when converting a Stream or byte[] to a .NET string.
Using a StreamReader created with the second constructor parameter, detectEncodingFromByteOrderMarks, set to true will determine the proper encoding and create a string that does not break the XmlDocument.LoadXml method.
public string GetXmlString(string url)
{
using var stream = GetResponseStream(url);
using var reader = new StreamReader(stream, true);
return reader.ReadToEnd(); // no exception on `LoadXml`
}
A common mistake is to blindly use UTF-8 encoding on the stream or byte[]. The code below produces a string that looks valid when inspected in the Visual Studio debugger or copy-pasted somewhere, but it will throw the exception when used with Load or LoadXml if the file is encoded differently than UTF-8 without BOM.
public string GetXmlString(string url)
{
byte[] bytes = GetResponseByteArray(url);
return System.Text.Encoding.UTF8.GetString(bytes); // potentially exception on `LoadXml`
}
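To make the difference between the two approaches concrete, here is a self-contained sketch that works on a byte[] rather than on the GetResponseStream and GetResponseByteArray helpers above, which are not shown in the answer. The class name XmlTextDecoder is hypothetical.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: decode bytes to a string while honoring a BOM.
public static class XmlTextDecoder
{
    public static string Decode(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        // UTF-8 is only the fallback; a byte order mark at the start of the
        // data overrides it and is consumed rather than copied into the string.
        using (var reader = new StreamReader(stream, Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

For data that starts with a UTF-8 BOM, Decode returns a string beginning with '<', while Encoding.UTF8.GetString on the same bytes returns a string beginning with the U+FEFF character, which is what LoadXml trips over.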
I've solved this issue by directly editing the byte array.
Collect the UTF-8 preamble and remove it directly from the start of the data.
Afterward you can transform the byte[] into a string with the GetString method; see below.
I've removed the \r and \t characters as well, just as a precaution.
XmlDocument configurationXML = new XmlDocument();
List<byte> byteArray = new List<byte>(webRequest.downloadHandler.data);
byte[] preamble = Encoding.UTF8.GetPreamble();
// Strip the BOM only when the data actually starts with it (requires using System.Linq).
if (byteArray.Count >= preamble.Length && byteArray.Take(preamble.Length).SequenceEqual(preamble))
{
    byteArray.RemoveRange(0, preamble.Length);
}
string xml = System.Text.Encoding.UTF8.GetString(byteArray.ToArray());
// "\r" and "\t" are the control characters, not a backslash followed by a letter.
xml = xml.Replace("\r", "");
xml = xml.Replace("\t", "");
If your XML is in a string, use the following to remove the XML declaration (note that it does not remove a byte order mark that precedes the declaration):
xml = new Regex("\\<\\?xml.*\\?>").Replace(xml, "");
At first I had problems escaping the "&" character, then diacritics and special letters were shown as question marks, and I ended up with the issue the OP mentioned.
I looked at the answers and used @Ringo's suggestion to try the Load() method as an alternative. That made me realize that I can handle my response in other ways, not just as a string.
Using System.IO.Stream instead of string solved all the issues for me.
var response = await this.httpClient.GetAsync(url);
var responseStream = await response.Content.ReadAsStreamAsync();
var xmlDocument = new XmlDocument();
xmlDocument.Load(responseStream);
The nice thing about Load() is that it automatically detects the encoding of the input XML (for example, UTF-8, ANSI, and so on).
I have found one possible solution.
For your code, it could look as follows:
XmlDocument xml = new XmlDocument();
// assuming the file is in the current directory
// and is named loadData.xml
string myString = "./loadData.xml";
try
{
    xml.Load(myString);
}
catch (Exception ex)
{
    // myString is declared outside the try block so it is in scope here.
    System.IO.File.WriteAllText(@"C:\text.txt", myString + "\r\n\r\n" + ex.Message);
    throw;
}
If you are using XDocument.Parse(@"") with a string literal,
use a verbatim string (the @ prefix); that resolves the issue.
Using an XmlDataDocument object is much better than using an XDocument or XmlDocument object. XmlDataDocument works fine with UTF-8 and it doesn't have problems with byte order marks. You can get the child nodes of each element using the ChildNodes property.
Use a custom function such as the following one:
static public void ReadXmlDataDocument2(string xmlFilePath)
{
if (xmlFilePath != null)
{
if (File.Exists(xmlFilePath))
{
System.IO.FileStream fs = default(System.IO.FileStream);
try
{
fs = new System.IO.FileStream(xmlFilePath, System.IO.FileMode.Open, System.IO.FileAccess.Read);
System.Xml.XmlDataDocument k_XDoc = new System.Xml.XmlDataDocument();
k_XDoc.Load(fs);
fs.Close();
fs.Dispose();
fs = null;
XmlNodeList ndsRoot = k_XDoc.ChildNodes;
foreach (System.Xml.XmlNode xLog in ndsRoot)
{
foreach (System.Xml.XmlNode xLog2 in xLog.ChildNodes)
{
if (xLog2.Name == "ERRORs")
{
foreach (System.Xml.XmlNode xLog3 in xLog2.ChildNodes)
{
if (xLog3.Name == "ErrorCode")
{
// Do something
}
if (xLog3.Name == "Description")
{
// Do something
}
}
}
}
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
}
}
I want to read XML at runtime without saving it to a path.
After searching, I found that in a console application I need to use Console.Out to display the result:
xmlSerializer.Serialize(Console.Out, patient);
In a Windows or web application we need to set a path, like:
StreamWriter streamWriter = new StreamWriter(@"C:\test.xml");
But I need to read the XML without saving it. I am using a web service where I need to read it and decide whether it is valid or not.
I hope I have described it clearly.
Use the XmlDocument object.
There are several ways to load the XML: you can use XmlDocument.Load() and specify your URL there, or use XmlDocument.LoadXml() to load the XML from a string.
You could use the XmlDocument.LoadXml method to read the received XML. There is no need to save it to disk.
try
{
XmlDocument doc = new XmlDocument();
doc.LoadXml(receivedXMLStr);
//valid xml
}
catch (XmlException xe)
{
//invalid xml
}
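The try/catch pattern above can be folded into a small helper that answers the "valid or not" question directly. This is a sketch with a hypothetical class name (XmlWellFormedness); note that it only checks well-formedness, not conformance to any schema.

```csharp
using System.Xml;

// Hypothetical helper: checks well-formedness entirely in memory.
public static class XmlWellFormedness
{
    public static bool IsWellFormed(string receivedXml)
    {
        // LoadXml rejects null, so treat null or empty input as invalid up front.
        if (string.IsNullOrEmpty(receivedXml))
        {
            return false;
        }

        try
        {
            new XmlDocument().LoadXml(receivedXml);
            return true;
        }
        catch (XmlException)
        {
            // The parser could not read the text as XML.
            return false;
        }
    }
}
```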
Use LINQ to XML:
XElement doc = null;
try
{
    doc = XElement.Load(yourStream);
}
catch
{
    // invalid XML
}
if (doc != null)
{
    foreach (XElement node in doc.Descendants())
    {
        string value = node.Value;          // value of this node
        var attributes = node.Attributes(); // all the attributes of this node
    }
}
Thanks to all of you for your replies. I want to load my XML without saving it to a local path, because saving creates many XML files.
I finally found a solution that loads the XML from a class through a memory stream. I think this solution is easy and efficient:
XmlDocument doc = new XmlDocument();
System.Xml.Serialization.XmlSerializer serializer2 = new System.Xml.Serialization.XmlSerializer(Patients.GetType());
System.IO.MemoryStream stream = new System.IO.MemoryStream();
serializer2.Serialize(stream, Patients);
stream.Position = 0;
doc.Load(stream);
You need to use the Deserialize method to read the XML. Follow the steps below:
Create a target class. Its structure should represent the XML output.
After creating the class, use the code below to load your XML into the target object:
TargetType result = null;
XmlSerializer worker = new XmlSerializer(typeof(TargetType));
// Deserialize accepts a Stream, TextReader, or XmlReader, not a raw string.
using (StringReader reader = new StringReader("<xml>.....</xml>"))
{
    result = (TargetType)worker.Deserialize(reader);
}
Now the XML is loaded into the object 'result', ready to use.
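As a concrete illustration of those steps, here is a sketch with a made-up target class. PatientRecord, its Name and Age properties, and the PatientXml helper are all assumptions for the example; your real class has to mirror the structure of the XML you receive.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical target class; by default the root element name matches the class name.
public class PatientRecord
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// Hypothetical helper that deserializes from a string without touching the disk.
public static class PatientXml
{
    public static PatientRecord Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(PatientRecord));
        // StringReader wraps the in-memory text so Deserialize can read it.
        using (var reader = new StringReader(xml))
        {
            return (PatientRecord)serializer.Deserialize(reader);
        }
    }
}
```

For input such as <PatientRecord><Name>Ann</Name><Age>42</Age></PatientRecord>, Parse returns an object with those two values filled in; malformed or mismatching XML makes Deserialize throw an InvalidOperationException.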