Why does this XML file load slowly? - c#

I have some very simple code:
XmlDocument doc = new XmlDocument();
Console.WriteLine("loading");
doc.Load(url);
Console.WriteLine("loaded");
XmlNodeList nodeList = doc.GetElementsByTagName("p");
foreach (XmlNode node in nodeList)
{
    Console.WriteLine(node.ChildNodes[0].Value);
}
return source;
I'm working on this file and it takes two minutes to load. Why does it take so long? I tried both fetching the file from the net and loading a local file.

I imagine it's the DTD of the page that's taking so long to load. Given that it defines entities, you shouldn't disable it, so you're probably better off not going down this path.
Given the inner workings of the Wikipedia parser (a right mess), I'd say it's a big leap to assume it's going to produce well-formed XHTML every time.
Use HTML Agility Pack to parse (then you can convert to XmlDocument a little more easily if required, IIRC).
If you really want to go down the XmlDocument route you can keep a local cache of the HTML DTDs. See this post, this post and this post for details.

It is because XmlDocument doesn't just load your XML into a nice class hierarchy; it also goes and fetches the DTD and the entity files referenced by the document's DOCTYPE. Run Fiddler and you will see the calls to fetch
http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
These all took me about 20 seconds to fetch.
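One way to keep DTD processing (so entities such as &nbsp; still resolve) without the network round trips is to serve the XHTML DTDs from memory. The following is a minimal sketch, assuming .NET Framework 4.0 or later (or modern .NET), where System.Xml.Resolvers.XmlPreloadedResolver ships with the XHTML 1.0 DTDs built in. The XhtmlLoader class and LoadXhtml method are names made up for this illustration.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Resolvers;

public static class XhtmlLoader
{
    // Loads an XHTML 1.0 document without downloading the W3C DTDs.
    public static XmlDocument LoadXhtml(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            // Keep DTD processing on so named entities are still expanded.
            DtdProcessing = DtdProcessing.Parse,
            // Resolve the XHTML 1.0 DTDs and entity files from copies
            // embedded in the framework instead of fetching them from w3.org.
            XmlResolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10)
        };

        var doc = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```

With no fallback resolver configured, a DOCTYPE that points at anything other than the preloaded XHTML 1.0 DTDs should fail with an exception rather than quietly going to the network. This does nothing for input that is not well-formed in the first place.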

Related

C# does XPathDocument(Path) evaluate the whole path before returning the document?

My English is not the best, but it will work, I think.
Also, I'm an absolute newcomer to C#.
Given is the following code snippet.
It has to open an XML document in which I KNOW that one of the nodes can be misinterpreted, or rather is really wrong.
try
{
XPathDocument expressionLib = new XPathDocument(path);
XPathNavigator xnav = expressionLib.CreateNavigator();
}
...and so on
My intention is to create the XPathDocument and the XPathNavigator and THEN watch out for the errors.
But my code fails at "XPathDocument expressionLib = new XPathDocument(path);" (well, it raises an exception, which I catch), so I assume that "XPathDocument(path);" validates the whole XML document before returning it.
On the Microsoft pages I didn't find any hints about that assumed behavior. Can you verify it?
And what could be the workaround?
Yes, I WANT to open that XML with that error inside (not at the topmost node), react just to that invalid node, and work with the rest of the file.
Enjoy Weekend
Alex.
There is no workaround. If the document is not well-formed XML, or there are invalid characters or sections in it, you'll get the exception.
The only way to continue is to handle the XmlException and try to manipulate the XML data to make it well-formed. That could range from simple, if it's just a matter of escaping some invalid character(s), to complex, if you have to perform some advanced formatting or if you receive documents containing many different types of errors.
Perhaps the best course of action is to write an XML validator/repair class that you put your XML document through before attempting to load it with the XPathDocument class, although I'm pretty sure there must be some library out there that would do all the heavy lifting for you...
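To illustrate the point about handling the XmlException: the constructor reads the entire input, so the error surfaces there, and the exception carries the position of the first problem. A small sketch follows. The XPathDocumentLoader class and TryLoad method are hypothetical names, and it reads from a TextReader rather than a path only to keep the example self-contained.

```csharp
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class XPathDocumentLoader
{
    // Returns the loaded document, or null (with 'error' filled in)
    // when the input is not well-formed XML.
    public static XPathDocument TryLoad(TextReader input, out string error)
    {
        try
        {
            error = null;
            // The whole input is parsed here, before the constructor returns.
            return new XPathDocument(input);
        }
        catch (XmlException ex)
        {
            // LineNumber and LinePosition locate the first offending spot,
            // which is where a repair step would have to start.
            error = string.Format("Line {0}, position {1}: {2}",
                ex.LineNumber, ex.LinePosition, ex.Message);
            return null;
        }
    }
}
```

The reported position only tells you where the parser gave up. Whatever repair you apply still has to happen on the raw text before a second load attempt.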

Reading oddly-formatted XML file C#

I need some help with reading an oddly-formatted XML file. Because of the way the nodes and attributes are structured, I keep running into XmlException errors (at least, that's what the output window is telling me; my breakpoints refuse to fire so that I can check it). Anyway, here's the XML. Has anyone experienced anything like this before?
<ApplicationMonitoring>
  <MonitoredApps>
    <Application>
      <function1 listenPort="5000"/>
    </Application>
    <Application>
      <function2 listenPort="6000"/>
    </Application>
  </MonitoredApps>
  <MIBs>
    <site1 location="test.mib"/>
  </MIBs>
  <Community value="public"/>
  <proxyAgent listenPort="161" timeOut="2"/>
</ApplicationMonitoring>
Cheers
EDIT: Current version of the parsing code (file path shortened - I'm not actually using this one):
XmlDocument xml = new XmlDocument();
xml.LoadXml(@"..\..\..\ApplicationMonitoring.xml");
string port = xml.DocumentElement["proxyAgent"].InnerText;
Your problem in loading the XML is that xml.LoadXml expects you to pass the xml document as a string, not a file reference.
Try instead using:
xml.Load(@"..\..\..\ApplicationMonitoring.xml");
Essentially in your original code you are telling it that your xml document is
..\..\..\ApplicationMonitoring.xml
And I'm sure you can now see why there is a parse exception. :) I've tested this with your XML document and the modified load and it works fine (except for the issue that Only Bolivian Here pointed out: your InnerText is not going to return anything).
For completeness you probably want:
XmlDocument xml = new XmlDocument();
xml.Load(@"..\..\..\ApplicationMonitoring.xml");
string port = xml.DocumentElement["proxyAgent"].Attributes["listenPort"].Value;
//And to get stuff more specifically in the tree something like this
string function1 = xml.SelectSingleNode("//function1").Attributes["listenPort"].Value;
Note the use of the Value property on the attribute and not the ToString method which won't do what you are expecting.
Exactly how you extract the data from the XML probably depends on what you are doing with it. For example, you may want to get a list of Application nodes to enumerate over with a foreach by calling xml.SelectNodes("//Application").
If you are having trouble with extracting stuff, though, that is probably the scope of a different question, since this one was just about how to get the XML document loaded.
xml.DocumentElement["proxyAgent"].InnerText;
The proxyAgent element is self-closing. InnerText returns the text inside an XML element; in this case, there is no inner content.
You need to access an attribute of the element, not the InnerText.
Try this:
string port = xml.GetElementsByTagName("proxyAgent")[0].Attributes["listenPort"].Value;
Or use LINQ to XML:
http://msdn.microsoft.com/en-us/library/bb387098.aspx
And... your XML is not malformed...
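For the LINQ to XML route mentioned above, a sketch of reading the same attributes might look like this. The MonitoringConfig class and its method names are invented for the example, and it assumes every child of an Application element carries a listenPort attribute, as in the sample file.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class MonitoringConfig
{
    // Reads listenPort from the self-closing <proxyAgent> element.
    public static int ReadProxyPort(XDocument doc)
    {
        return (int)doc.Root.Element("proxyAgent").Attribute("listenPort");
    }

    // Collects listenPort from whatever element sits inside each
    // <Application> (function1, function2, ...), in document order.
    public static List<int> ReadApplicationPorts(XDocument doc)
    {
        return doc.Root
            .Element("MonitoredApps")
            .Elements("Application")
            .SelectMany(app => app.Elements())
            .Select(fn => (int)fn.Attribute("listenPort"))
            .ToList();
    }
}
```

Load the file with XDocument.Load(path) and pass the result in. The explicit casts on XAttribute do the string-to-int conversion and throw if an attribute is missing.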

Parsing a file containing xml elements in C#

I have a big xml file which is composed of multiple xml files. The file structure is like this.
<xml1>
</xml1>
<xml2>
</xml2>
<xml3>
</xml3>
.
.
<xmln>
</xmln>
I want to process each XML file with a LINQ statement, but I am not sure how to extract each XML element individually from the file and continue the iteration until the end. I figured experts here would be able to give me the nudge I am looking for. My requirement is to extract the individual elements so I can load each one into an XDocument for further processing and then continue on through the subsequent elements. Any help on this would be greatly appreciated.
Thanks for reading!!
Assuming each element is well-formed XML, you can wrap your document in a top-level tag and then load it just fine:
<wrap>
<xml1>
</xml1>
<xml2>
</xml2>
</wrap>
You can use the System.Xml reference, which includes a lot of built-in functionality, and it allows you to declare an XmlTextReader. More info here.
If this is an error log with individual well-formed XML elements, you probably don't want to load it as an XmlDocument. A better option is to use an XPathDocument created with an XmlReader where the XmlReaderSettings specify ConformanceLevel.Fragment. Then you can use an XPathNavigator to access your elements and attributes as needed. See the third answer in this forum post for sample code.
...late by just a year and then some. :)
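The ConformanceLevel.Fragment idea can be sketched as follows. Note that this variation hands each top-level element to LINQ to XML as an XElement instead of building an XPathDocument, since the question asks for LINQ processing. FragmentReader and ReadTopLevelElements are hypothetical names, and the sketch assumes the embedded documents carry no <?xml ...?> declarations of their own, which fragment mode would reject after the first one.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class FragmentReader
{
    // Lazily yields each top-level element of a multi-root input.
    public static IEnumerable<XElement> ReadTopLevelElements(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            // Allow more than one root element in the input.
            ConformanceLevel = ConformanceLevel.Fragment
        };

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // XNode.ReadFrom consumes the whole element and leaves
                    // the reader on the node that follows it.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    // Skip whitespace, comments, etc. between the elements.
                    reader.Read();
                }
            }
        }
    }
}
```

Usage would be something like foreach (XElement e in FragmentReader.ReadTopLevelElements(File.OpenText(path))) { ... }, with each e wrapped in new XDocument(e) if a full document object is needed. Only one element is held in memory at a time, which matters for a big file.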

Alternatives to XDocument

Hey guys, XDocument is being very finicky with one of the xml feeds I have to parse, and keeps giving me the error
'=' is an unexpected token. The expected token is ';'. Line 1, position 576.
Which is basically XDocument crying about a loose "=" sign in the XML document.
I don't have any control over the source XML document, so I need to either get XDocument to ignore this error, or use some other class. Any ideas on either one?
If the document isn't well-formed XML (and my guess is that you have '&=' in the document or some other entity-looking string) then it's unlikely that any other XML parsers are going to be any happier with it. Have you tried loading the document in, say, IE to see if it parses there or pasted to an XML validator? You can also just try XmlDocument.Load() and see if it parses there, that's the next closest XML parser (aside from XmlReader which takes a little bit of setting up).
It won't make for good XML, but if you need to just load up a bad document then the HTML Agility Pack is a good tool. It can overlook many of the things that make HTML not XHTML and not XML-like, so your erroneous XML input will likely be parsed too. The object model it expresses is similar to XmlDocument. e.g.
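HtmlDocument doc = new HtmlDocument();
doc.Load("file.xml");
// SelectNodes returns null when nothing matches, so guard for that in real code.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    HtmlAttribute att = link.Attributes["href"];
    att.Value = FixLink(att); // FixLink is your own helper
}
doc.Save("file.htm");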
Or you can use Agility Pack to clean up the XML and then feed its clean output to a real XML parser for further processing.
This is a quick and dirty trick that I've used for one-time tasks. It's not necessarily recommended over a proper solution.
What I would recommend, if time permits, is to somehow format/fix the erroneous XML content (e.g. in its string form, or using another tool) before feeding it to an XML parser.
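As one example of fixing the content in its string form: the error quoted in the question ('=' is an unexpected token, the expected token is ';') is what an unescaped & in something like a query string produces. The sketch below escapes ampersands that do not begin a character or entity reference and then parses the result. It is an assumption that this is the only defect in the feed, and the LenientXml class is a made-up name. The regex is deliberately naive: it also rewrites ampersands inside CDATA sections and comments.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class LenientXml
{
    // Matches '&' that is NOT followed by "name;", "#123;" or "#x1F;".
    private static readonly Regex StrayAmpersand =
        new Regex(@"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Turns stray ampersands into &amp; and leaves real references alone.
    public static string EscapeStrayAmpersands(string xml)
    {
        return StrayAmpersand.Replace(xml, "&amp;");
    }

    // Repairs the text first, then hands it to the normal parser.
    public static XDocument Parse(string xml)
    {
        return XDocument.Parse(EscapeStrayAmpersands(xml));
    }
}
```

Any other kind of damage in the document will still raise an XmlException from XDocument.Parse, so this is a targeted patch rather than a general lenient parser.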
Take a look at the answers of this question: Parsing an XML/XHTML document but ignoring errors in C#
The best option I believe is to parse it in a try/catch block, remove the offending block inside the catch block, and re-parse.

Parse XML in C#

Hello, I want to know how I can parse this simple XML file content in C#. I can have multiple "in" elements, and from those I want to use the date, min, max and state child values.
<out>
  <in>
    <id>16769</id>
    <date>29-10-2010</date>
    <now>12</now>
    <min>12</min>
    <max>23</max>
    <state>2</state>
    <description>enter text</description>
  </in>
  <in>
    <id>7655</id>
    <date>12-10-2010</date>
    <now>1</now>
    <min>1</min>
    <max>2</max>
    <state>0</state>
    <description>enter text</description>
  </in>
</out>
The System.Xml namespace has all sorts of tools for parsing, reading, and writing XML data. By the way, your XML isn't well-formed; you've got two <out> elements, but only one </out> element.
LINQ to XML is also helpful for parsing XML -
http://msdn.microsoft.com/en-us/library/bb387098.aspx
Also -
http://msdn.microsoft.com/library/bb308960.aspx
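A sketch of what the LINQ to XML approach could look like for the XML in the question, pulling out just date, min, max and state. The Reading and ReadingParser types are invented for the example, and the dd-MM-yyyy date format is inferred from the two sample values.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical holder for the four values the question asks about.
public class Reading
{
    public DateTime Date { get; set; }
    public int Min { get; set; }
    public int Max { get; set; }
    public int State { get; set; }
}

public static class ReadingParser
{
    public static List<Reading> Parse(string xml)
    {
        return XDocument.Parse(xml).Root
            .Elements("in") // one Reading per <in> element
            .Select(e => new Reading
            {
                // The sample dates look like day-month-year.
                Date = DateTime.ParseExact((string)e.Element("date"),
                    "dd-MM-yyyy", CultureInfo.InvariantCulture),
                Min = (int)e.Element("min"),
                Max = (int)e.Element("max"),
                State = (int)e.Element("state")
            })
            .ToList();
    }
}
```

To read from disk, swap XDocument.Parse(xml) for XDocument.Load(path). A missing child element makes the int cast throw, which may or may not be what you want for partial records.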
You need System.Xml, starting with XmlDocument.Load(filename).
Once you have the XmlDocument loaded, you can drill down into it as needed using the built-in .Net XML object model, starting from XmlDocument level. You can walk the tree recursively in a pretty intuitive way, capturing what you want from each XmlNode as you go.
Alternatively (and preferably) you can quickly locate all XmlNodes in your XmlDocument that match certain conditions using XPath - examples here. An example of usage in C# is XmlNode.SelectNodes.
using System;
using System.IO;
using System.Xml;

public class Sample {
    public static void Main() {
        XmlDocument doc = new XmlDocument();
        doc.Load("booksort.xml");
        XmlNodeList nodeList;
        XmlNode root = doc.DocumentElement;
        nodeList = root.SelectNodes("descendant::book[author/last-name='Austen']");
        // Change the price on the books.
        foreach (XmlNode book in nodeList)
        {
            book.LastChild.InnerText = "15.95";
        }
        Console.WriteLine("Display the modified XML document....");
        doc.Save(Console.Out);
    }
}
Examples can be found here http://www.c-sharpcorner.com/uploadfile/mahesh/readwritexmltutmellli2111282005041517am/readwritexmltutmellli21.aspx
This might be beyond what you want to do, but worth mentioning...
I hate parsing XML. Seriously, I almost refuse to do it, especially since .NET can do it for me. What I would do is create an "In" object that has the properties above. You probably have one already, or it would take 60 seconds to create. You'll also need a List of In objects called "Out".
Then just deserialize the XML into the objects. This takes just a few lines of code. Here is an example. BTW, this makes changing and re-saving the data just as easy.
How to serialize/deserialize
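A rough sketch of that deserialization approach with XmlSerializer, using the "In" and "Out" names from the answer. The property names and the OutSerializer helper are assumptions for illustration. Date is kept as a string because 29-10-2010 is not in the format XmlSerializer expects for DateTime values.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

[XmlRoot("out")]
public class Out
{
    // Each <in> child becomes one entry in the list.
    [XmlElement("in")]
    public List<In> Items { get; set; } = new List<In>();
}

public class In
{
    [XmlElement("id")] public int Id { get; set; }
    [XmlElement("date")] public string Date { get; set; }
    [XmlElement("now")] public int Now { get; set; }
    [XmlElement("min")] public int Min { get; set; }
    [XmlElement("max")] public int Max { get; set; }
    [XmlElement("state")] public int State { get; set; }
    [XmlElement("description")] public string Description { get; set; }
}

public static class OutSerializer
{
    // Maps the XML text onto the Out/In object graph.
    public static Out Deserialize(string xml)
    {
        var serializer = new XmlSerializer(typeof(Out));
        using (var reader = new StringReader(xml))
        {
            return (Out)serializer.Deserialize(reader);
        }
    }
}
```

Saving works the other way around with serializer.Serialize(writer, outObject), which is what makes the edit-and-resave round trip short.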
