How to load XML from URL on XmlDocument() - c#

I have this code:
string m_strFilePath = "http://www.google.com/ig/api?weather=12414&hl=it";
XmlDocument myXmlDocument = new XmlDocument();
myXmlDocument.LoadXml(m_strFilePath);
foreach (XmlNode RootNode in myXmlDocument.ChildNodes)
{
}
But when I try to execute it, I get this error:
Exception Details: System.Xml.XmlException: Data at the root level is invalid. Line 1, position 1.
Why? Where am I going wrong, and how can I fix this in C#?
I also tried:
myXmlDocument.Load(m_strFilePath);
but then I get:
Exception Details: System.Xml.XmlException: Invalid character in the given encoding. Line 1, position 503.

NOTE: You're really better off using XDocument for most XML parsing needs nowadays.
It's telling you that the value of m_strFilePath is not valid XML. Try:
string m_strFilePath = "http://www.google.com/ig/api?weather=12414&hl=it";
XmlDocument myXmlDocument = new XmlDocument();
myXmlDocument.Load(m_strFilePath); //Load NOT LoadXml
However, this also fails (for an unknown reason; it seems to be choking on the à in Umidità, which suggests an encoding problem). The following works (I'm still trying to figure out what the difference is, though):
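// requires: using System.Net; using System.Xml;
var m_strFilePath = "http://www.google.com/ig/api?weather=12414&hl=it";
string xmlStr;
using (var wc = new WebClient())
{
    // DownloadString decodes the response into a .NET string,
    // so LoadXml never has to work out the byte encoding itself
    xmlStr = wc.DownloadString(m_strFilePath);
}
var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xmlStr);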
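If the à in Umidità really is the culprit, a likely explanation is that the response bytes are not UTF-8, while the parser falls back to UTF-8 (the XML default when no encoding is declared). A minimal sketch under that assumption: decode the bytes yourself with the expected encoding and give XmlDocument a TextReader. ISO-8859-1 is an assumption here (check the response's Content-Type header), and `EncodingAwareLoader.LoadWithEncoding` is a hypothetical helper name.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class EncodingAwareLoader
{
    // Decodes the raw bytes with a caller-supplied encoding (assumed to match
    // what the server actually sent) and loads the resulting text. Because the
    // StreamReader does the decoding, XmlDocument receives characters rather
    // than bytes and never has to guess the encoding.
    public static XmlDocument LoadWithEncoding(Stream raw, Encoding encoding)
    {
        var doc = new XmlDocument();
        using (var reader = new StreamReader(raw, encoding))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```

For the URL in the question, `raw` could come from `new WebClient().OpenRead(m_strFilePath)`, with `Encoding.GetEncoding("ISO-8859-1")` as the encoding if that is what the server reports.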

You need to use Load() instead of LoadXml(). LoadXml parses its argument as an XML string, so in this case it tries to parse your URL itself as XML.
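A minimal sketch of the difference: LoadXml treats its argument as markup, so a URL is rejected with the same "Data at the root level is invalid" error, while Load accepts a location (file path or URL), a Stream, a TextReader or an XmlReader. The class and method names below are hypothetical.

```csharp
using System.Xml;

public static class LoadVsLoadXml
{
    // LoadXml expects the XML markup itself, not a location.
    public static XmlDocument FromMarkup(string markup)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup);
        return doc;
    }

    // Returns true when the text is rejected as XML markup,
    // e.g. when a URL is passed to LoadXml by mistake.
    public static bool IsRejectedAsMarkup(string text)
    {
        try
        {
            FromMarkup(text);
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }
}
```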

Related

ExceptionLogger using XML

I have a method that writes an exception's info into an XML file, but every time an exception is processed, the method replaces everything that is already in the file. I want the method to append the new exception info to the end of the file instead.
Here is my method:
public void WriteIntoFile()
{
    XDocument xdoc = new XDocument(
        new XElement("Exceptions",
            new XElement("Exception",
                new XElement("Message", this.ErrorMessage.ToString())
            )));
    xdoc.Save("1.xml");
}
Please help me with it.
This should do the job. It loads the existing file (if there is one), appends a new "Exception" node to the existing "Exceptions" root node (creating that root when the file is new), and saves the file again so earlier entries are kept:
using System.IO;
using System.Xml;
public void WriteIntoFile(string Message)
{
    const string Path = "C:\\Temp\\Log.xml";
    XmlDocument MyDocument = new XmlDocument();
    if (File.Exists(Path))
        MyDocument.Load(Path);
    // Reuse the existing root node; an XML document can only have one root element
    XmlNode ExceptionsNode = MyDocument.SelectSingleNode("/Exceptions");
    if (ExceptionsNode == null)
    {
        ExceptionsNode = MyDocument.CreateElement("Exceptions");
        MyDocument.AppendChild(ExceptionsNode);
    }
    XmlNode ExceptionNode = MyDocument.CreateElement("Exception");
    XmlNode MessageNode = MyDocument.CreateElement("Message");
    MessageNode.InnerText = Message;
    ExceptionNode.AppendChild(MessageNode);
    ExceptionsNode.AppendChild(ExceptionNode);
    // Write the document back; without this the new node only exists in memory
    MyDocument.Save(Path);
}
Greetings from Austria.
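Since the question builds the log with XDocument, the same append-instead-of-overwrite idea can also be sketched with LINQ to XML. `ExceptionLog.AppendException` and its `path` parameter are hypothetical names; the element names follow the question.

```csharp
using System.IO;
using System.Xml.Linq;

public static class ExceptionLog
{
    // Appends one <Exception><Message>...</Message></Exception> entry to the
    // <Exceptions> root in the given file, creating the file on first use.
    public static void AppendException(string path, string message)
    {
        XDocument xdoc = File.Exists(path)
            ? XDocument.Load(path)
            : new XDocument(new XElement("Exceptions"));

        // Add to the existing root instead of building a new document
        xdoc.Root.Add(
            new XElement("Exception",
                new XElement("Message", message)));

        xdoc.Save(path);
    }
}
```

Loading and re-saving the whole file on every call is simple but grows more expensive as the log gets larger.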

"Root element is missing" exception given when trying to parse XML file

I'm trying to set up parsing for a test XML generated with ksoap2 in Android:
<?xml version="1.0" encoding="utf-8"?>
<v:Envelope xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns:d="http://www.w3.org/2001/XMLSchema" xmlns:c="http://schemas.xmlsoap.org/soap/encoding/" xmlns:v="http://schemas.xmlsoap.org/soap/envelope/">
  <v:Header />
  <v:Body>
    <v:SOAPBODY>
      <v:INFO i:type="v:INFO">
        <v:LAITETUNNUS i:type="d:string">EI_TUNNUSTA</v:LAITETUNNUS>
      </v:INFO>
      <v:TOIMINNOT i:type="v:TOIMINNOT">
        <v:TOIMINTA i:type="d:string">ASETUKSET_HAKU</v:TOIMINTA>
      </v:TOIMINNOT>
      <v:SISALTO i:type="v:SISALTO">
        <v:KUVA i:type="d:string">AGFAFDGFDGFG</v:KUVA>
        <v:MITTAUS i:type="d:string">12,42,12,4,53,12</v:MITTAUS>
      </v:SISALTO>
    </v:SOAPBODY>
  </v:Body>
</v:Envelope>
But I can't seem to parse it in any way. The exception is always "Root element is missing", even though the XML passes validators such as the one at w3schools. If I'm correct, the contents of the body shouldn't matter when the problem is with the root element.
The test code I'm using for parsing in C# is:
using (StreamReader streamreader = new StreamReader(Context.Request.InputStream))
{
    try
    {
        XDocument xmlInput = new XDocument();
        streamreader.BaseStream.Position = 0;
        string tmp = streamreader.ReadToEnd();
        var xmlreader = XmlReader.Create(streamreader.BaseStream);
        xmlInput = XDocument.Parse(tmp);
        xmlInput = XDocument.Load(xmlreader);
    }
    catch (Exception e)
    { }
}
Here xmlInput = XDocument.Parse(tmp); does parse it into an XDocument, though not a navigable one. Then xmlInput = XDocument.Load(xmlreader); throws the exception about the missing root element. I'm completely at a loss here, because I managed to parse and navigate almost the same XML with the XmlDocument and XDocument classes before, and I fear I made some changes I didn't notice.
Thanks in advance.
Update: Here's the string tmp, as requested:
"<?xml version=\"1.0\" encoding=\"utf-8\"?><v:Envelope xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:d=\"http://www.w3.org/2001/XMLSchema\" xmlns:c=\"http://schemas.xmlsoap.org/soap/encoding/\" xmlns:v=\"http://schemas.xmlsoap.org/soap/envelope/\"><v:Header /><v:Body><v:SOAPBODY><v:INFO i:type=\"v:INFO\"><v:LAITETUNNUS i:type=\"d:string\">EI_TUNNUSTA</v:LAITETUNNUS></v:INFO><v:TOIMINNOT i:type=\"v:TOIMINNOT\"><v:TOIMINTA i:type=\"d:string\">ASETUKSET_HAKU</v:TOIMINTA></v:TOIMINNOT><v:SISALTO i:type=\"v:SISALTO\"><v:KUVA i:type=\"d:string\">AGFAFDGFDGFG</v:KUVA><v:MITTAUS i:type=\"d:string\">12,42,12,4,53,12</v:MITTAUS></v:SISALTO></v:SOAPBODY></v:Body></v:Envelope>\r\n"
Update: Even with XDocument.Load(new StreamReader(Context.Request.InputStream, Encoding.UTF8)); the parsing fails.
I believe you've already read to the end of the stream once, so you need to reset the stream's position before reading it again. See: "Root element is missing" error but I have a root element
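A minimal sketch of the rewind, assuming the input stream is seekable (CanSeek is true; a MemoryStream is used in the check below). After ReadToEnd the underlying stream sits at its end, so an XmlReader created on it sees no data, which is what produces "Root element is missing". `RewindExample.ReadThenLoad` is a hypothetical helper; if you already have the text, parsing it with XDocument.Parse is enough and no second read is needed.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class RewindExample
{
    // Reads the whole stream once as text, rewinds it, and then parses it
    // a second time through an XmlReader.
    public static XDocument ReadThenLoad(Stream input, out string text)
    {
        // leaveOpen: true so disposing the StreamReader does not close the
        // stream we are about to rewind
        using (var reader = new StreamReader(input, Encoding.UTF8, true, 1024, leaveOpen: true))
        {
            text = reader.ReadToEnd();
        }

        // Without this line the stream is at its end and the XmlReader
        // reports that the root element is missing
        input.Position = 0;

        using (XmlReader xmlReader = XmlReader.Create(input))
        {
            return XDocument.Load(xmlReader);
        }
    }
}
```

In the question's handler, `Context.Request.InputStream` would be passed as `input`, provided that stream supports seeking.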

XDocument will not parse html entities (e.g. ) but XmlDocument will

I am currently converting our old parsers that run on XmlDocument to XDocument. I'm doing this mainly to get LINQ querying and the added line-number information.
My xml contains an element like this:
<?xml version="1.0"?>
<fulltext>
hello this is a failed textnode
and I don't know how to parse it.
</fulltext>
My problem is that XmlDocument seems to have no problem reading that node with:
var xmlDocument = new XmlDocument();
var physicalPath = GetPhysicalPath(uploadFolderFile);
try
{
    xmlDocument.Load(physicalPath);
}
catch (XmlException xmlException)
{
    _log.Warn("Problems with the document", xmlException);
}
The example above parses the document fine, but when I try:
XDocument xmlDocument;
var physicalPath = GetPhysicalPath(uploadFolderFile);
using (var xmlStream = new System.IO.StreamReader(physicalPath))
{
    try
    {
        xmlDocument = XDocument.Load(xmlStream, LoadOptions.SetLineInfo | LoadOptions.SetBaseUri);
    }
    catch (XmlException xmlException)
    {
        _log.Warn("Trying to clean document for HexaDecimal", xmlException);
    }
}
It fails to read the document because of the character.
The special character seems to be allowed in XML 1.1, but changing the version in the XML declaration doesn't help.
I have thought about just parsing the document with XmlDocument and then converting it, but that seems counterintuitive. Can anybody help with this problem?
OK, so I sort of found a solution to this problem.
First, I try to parse the XML using the following code:
private XDocument GetXmlDocument(String physicalPath)
{
    try
    {
        // Dispose the reader before the fallback reopens the same file
        using (var xmlStream = new System.IO.StreamReader(physicalPath))
        {
            return XDocument.Load(xmlStream, LoadOptions.SetLineInfo);
        }
    }
    catch (XmlException)
    {
        // The strict load failed, so retry with the sanitizing fallback
        //_log.Warn("Trying to clean document for HexaDecimal", xmlException);
        return XmlSanitizingStream.TryToCleanXMLBeforeParsing(physicalPath);
    }
}
If it fails to load the document, then I will try to clean it using the technique used in this blogpost:
http://seattlesoftware.wordpress.com/2008/09/11/hexadecimal-value-0-is-an-invalid-character/
It will not remove the character I mentioned before, but it will remove any character not allowed by the XML standard.
Then, after sanitizing the XML, I create an XmlReader whose settings tell it not to check characters:
public static XDocument TryToCleanXMLBeforeParsing(String physicalPath)
{
    string xml;
    Encoding encoding;
    // XmlSanitizingStream (from the blog post above) drops characters
    // that are not allowed by the XML standard while reading the file
    using (var reader = new XmlSanitizingStream(File.OpenRead(physicalPath)))
    {
        xml = reader.ReadToEnd();
        encoding = reader.CurrentEncoding;
    }
    // Re-encode the cleaned text so it can be fed to an XmlReader as a stream
    byte[] encodedString;
    if (encoding.Equals(Encoding.UTF8)) encodedString = Encoding.UTF8.GetBytes(xml);
    else if (encoding.Equals(Encoding.UTF32)) encodedString = Encoding.UTF32.GetBytes(xml);
    else encodedString = Encoding.Unicode.GetBytes(xml);
    // Turn off character checking on the reader
    var settings = new XmlReaderSettings { CheckCharacters = false };
    using (var ms = new MemoryStream(encodedString)) // a new MemoryStream starts at position 0
    using (XmlReader xmlReader = XmlReader.Create(ms, settings))
    {
        return XDocument.Load(xmlReader);
    }
}
Since I've already removed the illegal characters before turning off character checking on the reader, I'm fairly sure I'm not reading a malformed XML document. In the worst case, the XML is still malformed and the reader throws an exception anyway.
I only use this for parsing and it should only be used to read the data. This will not make the XML well-formed and will in many cases throw exceptions elsewhere in your code. I am only using this because I cannot change what the customer is sending us and I have to read it as is.
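The reader configuration on its own can be sketched as follows. As I understand the XmlReaderSettings.CheckCharacters documentation, turning it off mainly stops the reader from rejecting numeric character references to characters outside the XML 1.0 range, so this sketch assumes the problem character arrives as such a reference; `&#x1F;` below is a hypothetical stand-in, since the exact character from the question did not survive. `LenientXDocument.Load` is a hypothetical name, and it keeps line info, which was one of the reasons for moving to XDocument.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class LenientXDocument
{
    // Loads XML into an XDocument with line info, but with CheckCharacters
    // turned off so that character references such as &#x1F; are not rejected.
    public static XDocument Load(TextReader input)
    {
        var settings = new XmlReaderSettings { CheckCharacters = false };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            return XDocument.Load(reader, LoadOptions.SetLineInfo);
        }
    }
}
```

As the answer notes, a document loaded this way may still fail later, for example when it is written back out with a writer that checks characters.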

Remove all hexadecimal characters before loading string into XML Document Object?

I have an XML string that is being posted to an ashx handler on the server. The XML string is built on the client side from a few different entries made on a form. Occasionally some users copy and paste from other sources into the web form. When I try to load the XML string into an XmlDocument object using xmldoc.LoadXml(xmlStr), I get the following exception:
System.Xml.XmlException = {"'', hexadecimal value 0x0B, is an invalid character. Line 2, position 1."}
In debug mode I can see the rogue character (sorry, I'm not sure of its official name; 0x0B is the vertical tab control character).
My question is: how can I sanitise the XML string before I attempt to load it into the XmlDocument object? Do I need a custom function to strip out all these sorts of characters one by one, or is there a native .NET 4 class that removes them?
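Here is an example that removes invalid XML characters using a regex:
xmlString = CleanInvalidXmlChars(xmlString);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xmlString);
// requires: using System.Text.RegularExpressions;
public static string CleanInvalidXmlChars(string text)
{
    // Matches every character outside the XML 1.0 Char range. In .NET regexes
    // \xNN takes exactly two hex digits, so \uNNNN is used instead. Characters
    // above U+FFFF are stored as surrogate pairs, which \uD800-\uDFFF keeps.
    string re = @"[^\u0009\u000A\u000D\u0020-\uD7FF\uD800-\uDFFF\uE000-\uFFFD]";
    return Regex.Replace(text, re, "");
}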
A more efficient way to not error out on invalid XML characters would be to use the CheckCharacters flag in XmlReaderSettings.
var xmlDoc = new XmlDocument();
var xmlReaderSettings = new XmlReaderSettings { CheckCharacters = false };
using (var stringReader = new StringReader(xml)) {
using (var xmlReader = XmlReader.Create(stringReader, xmlReaderSettings)) {
xmlDoc.Load(xmlReader);
}
}
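As an alternative to a hand-written regex, .NET 4 and later provide XmlConvert.IsXmlChar and XmlConvert.IsXmlSurrogatePair, which test characters against the XML rules. A minimal sketch of a sanitizer built on them (`XmlTextSanitizer.RemoveInvalidXmlChars` is a hypothetical name). Unlike the regex above, it also drops unpaired surrogates.

```csharp
using System.Text;
using System.Xml;

public static class XmlTextSanitizer
{
    // Removes characters that are not legal in XML documents, using the
    // framework's XmlConvert checks instead of a hand-written regex.
    public static string RemoveInvalidXmlChars(string text)
    {
        if (text == null) return null;
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            else if (i + 1 < text.Length && XmlConvert.IsXmlSurrogatePair(text[i + 1], c))
            {
                // A valid high/low surrogate pair encodes one character above U+FFFF.
                // Note the argument order: IsXmlSurrogatePair(lowChar, highChar).
                sb.Append(c).Append(text[i + 1]);
                i++;
            }
            // Anything else (e.g. 0x0B or a lone surrogate) is dropped.
        }
        return sb.ToString();
    }
}
```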

XMLDocument, problem reading Node

I am doing the following:
System.Net.WebRequest myRequest = System.Net.WebRequest.Create("http://www.atlantawithkid.com/feed/");
System.Net.WebResponse myResponse = myRequest.GetResponse();
System.IO.Stream rssStream = myResponse.GetResponseStream();
System.Xml.XmlDocument rssDoc = new System.Xml.XmlDocument();
rssDoc.Load(rssStream);
System.Xml.XmlNodeList rssItems = rssDoc.SelectNodes("rss/channel/item");
System.Xml.XmlNode rssDetail;
// FEED DESCRIPTION
string sRssDescription;
rssDetail = rssItems.Item(0).SelectSingleNode("description");
if (rssDetail != null)
sRssDescription = rssDetail.InnerText;
But when I read the "description" node and look at its InnerText or InnerXml, the string is different from the one in the original XML document.
The returned string contains an ellipsis and the data is truncated, yet I can see the full data in the original XML document.
Is there a way to select this node without the data being altered?
Thanks for the help.
I suspect you're looking at the string in the debugger, and that may be truncating the data. (Or you're writing it into something else which truncates text.)
I very much doubt that this is an XmlDocument problem.
I suggest you log the InnerText somewhere that you know you'll be able to get full data out, so you can tell for sure.
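A minimal sketch of that check: write the description to a file and print its length, so the full text can be inspected outside the debugger. `RssDescriptionCheck.DumpFirstDescription` and the dump path are hypothetical; the XPath follows the question's "rss/channel/item" query.

```csharp
using System;
using System.IO;
using System.Xml;

public static class RssDescriptionCheck
{
    // Returns the first item's <description> text (or null if there is none)
    // and writes it to dumpPath so it can be viewed in full.
    public static string DumpFirstDescription(XmlDocument rssDoc, string dumpPath)
    {
        XmlNode detail = rssDoc.SelectSingleNode("rss/channel/item/description");
        if (detail == null) return null;

        string text = detail.InnerText;
        File.WriteAllText(dumpPath, text);
        Console.WriteLine("description length: " + text.Length);
        return text;
    }
}
```

If the length printed here matches the source but the debugger shows an ellipsis, the truncation is in the display, not in XmlDocument.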
