SelectNodes not working on stackoverflow feed - c#

I'm trying to add support for StackOverflow feeds in my RSS reader, but SelectNodes and SelectSingleNode have no effect. This probably has something to do with Atom and XML namespaces, which I just don't understand yet.
I have gotten it to work by removing all attributes from the feed tag, but that's a hack and I would like to do it properly. So, how do you use SelectNodes with Atom feeds?
Here's a snippet of the feed.
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:thr="http://purl.org/syndication/thread/1.0">
<title type="html">StackOverflow.com - Questions tagged: c</title>
<link rel="self" href="http://stackoverflow.com/feeds/tag/c" type="application/atom+xml" />
<subtitle>Check out the latest from StackOverflow.com</subtitle>
<updated>2008-08-24T12:25:30Z</updated>
<id>http://stackoverflow.com/feeds/tag/c</id>
<creativeCommons:license>http://www.creativecommons.org/licenses/by-nc/2.5/rdf</creativeCommons:license>
<entry>
<id>http://stackoverflow.com/questions/22901/what-is-the-best-way-to-communicate-with-a-sql-server</id>
<title type="html">What is the best way to communicate with a SQL server?</title>
<category scheme="http://stackoverflow.com/feeds/tag/c/tags" term="c" /><category scheme="http://stackoverflow.com/feeds/tag/c/tags" term="c++" /><category scheme="http://stackoverflow.com/feeds/tag/c/tags" term="sql" /><category scheme="http://stackoverflow.com/feeds/tag/c/tags" term="mysql" /><category scheme="http://stackoverflow.com/feeds/tag/c/tags" term="database" />
<author><name>Ed</name></author>
<link rel="alternate" href="http://stackoverflow.com/questions/22901/what-is-the-best-way-to-communicate-with-a-sql-server" />
<published>2008-08-22T05:09:04Z</published>
<updated>2008-08-23T04:52:39Z</updated>
<summary type="html"><p>I am going to be using c/c++, and would like to know the best way to talk to a MySQL server. Should I use the library that comes with the server installation? Are they any good libraries I should consider other than the official one?</p></summary>
<link rel="replies" type="application/atom+xml" href="http://stackoverflow.com/feeds/question/22901/answers" thr:count="2"/>
<thr:total>2</thr:total>
</entry>
</feed>
The Solution
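XmlDocument doc = new XmlDocument();
// Map a prefix of our own choosing ("atom") to the feed's default namespace URI.
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("atom", "http://www.w3.org/2005/Atom");
doc.Load(feed);
// Successful: the prefixed name matches <entry> elements in the Atom namespace.
XmlNodeList itemList = doc.DocumentElement.SelectNodes("atom:entry", nsmgr);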
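Going one step further than selecting the entries, here is a minimal sketch that also reads each entry's title. It assumes the feed is already loaded into an XmlDocument; `AtomTitles` and `GetEntryTitles` are made-up names for illustration. The child elements need the prefix too, because they inherit the same default namespace as `<entry>`.

```csharp
using System.Collections.Generic;
using System.Xml;

// Illustrative sketch: list the <title> of every <entry> in an Atom feed.
// "atom" is an arbitrary prefix; only the namespace URI has to match the feed.
public static class AtomTitles
{
    public static List<string> GetEntryTitles(XmlDocument doc)
    {
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("atom", "http://www.w3.org/2005/Atom");

        var titles = new List<string>();
        // <entry> and its children all sit in the default Atom namespace,
        // so every step of each XPath expression carries the prefix.
        foreach (XmlNode entry in doc.DocumentElement.SelectNodes("atom:entry", nsmgr))
        {
            XmlNode title = entry.SelectSingleNode("atom:title", nsmgr);
            if (title != null)
                titles.Add(title.InnerText);
        }
        return titles;
    }
}
```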

Don't confuse the namespace prefixes in the XML file with the prefixes you register in your namespace manager. Both are just shorthand for a namespace URI, and they don't have to match.
So you can register "http://www.w3.org/2005/Atom" as "atom" and then call SelectNodes with "atom:entry", even though the feed itself declares that namespace without any prefix.
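One way to convince yourself that the prefix is only a local alias is to register two different prefixes for the same URI and compare the results. This is an illustrative sketch; `PrefixAliases` and both prefixes are arbitrary names.

```csharp
using System.Xml;

// Sketch: prefixes in an XmlNamespaceManager are local aliases for a URI.
// "atom" and "a" are arbitrary; both map to the Atom namespace, so both
// queries select the same nodes, although the document itself uses no prefix.
public static class PrefixAliases
{
    public static (int viaAtom, int viaA) CountEntries(XmlDocument doc)
    {
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("atom", "http://www.w3.org/2005/Atom");
        nsmgr.AddNamespace("a", "http://www.w3.org/2005/Atom");

        int viaAtom = doc.DocumentElement.SelectNodes("atom:entry", nsmgr).Count;
        int viaA = doc.DocumentElement.SelectNodes("a:entry", nsmgr).Count;
        return (viaAtom, viaA);
    }
}
```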

You might need to add an XmlNamespaceManager.
XmlDocument document = new XmlDocument();
XmlNamespaceManager nsmgr = new XmlNamespaceManager(document.NameTable);
nsmgr.AddNamespace("creativeCommons", "http://backend.userland.com/creativeCommonsRssModule");
// AddNamespace for other namespaces too.
document.Load(feed);
You need it whenever the XPath you pass to SelectNodes names elements that live in a namespace, which in this feed is every element. What error are you seeing?
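As a sketch of the "AddNamespace for other namespaces too" step, the helper below registers all three namespaces declared on the feed's root element and uses them to read the license and the per-entry reply totals. The prefix `cc` deliberately differs from the feed's own `creativeCommons` prefix, since only the URI matters. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Xml;

// Sketch: one manager, one AddNamespace call per namespace the XPath uses.
// The URIs are copied from the feed's root element; the prefixes are our own.
public static class FeedNamespaces
{
    public static XmlNamespaceManager CreateManager(XmlDocument doc)
    {
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("atom", "http://www.w3.org/2005/Atom");
        nsmgr.AddNamespace("cc", "http://backend.userland.com/creativeCommonsRssModule");
        nsmgr.AddNamespace("thr", "http://purl.org/syndication/thread/1.0");
        return nsmgr;
    }

    // <creativeCommons:license> is a direct child of the Atom <feed> element.
    public static string GetLicense(XmlDocument doc) =>
        doc.SelectSingleNode("/atom:feed/cc:license", CreateManager(doc))?.InnerText;

    // <thr:total> sits inside each Atom <entry>, so two prefixes appear in one path.
    public static List<int> GetReplyTotals(XmlDocument doc)
    {
        var nsmgr = CreateManager(doc);
        var totals = new List<int>();
        foreach (XmlNode n in doc.SelectNodes("/atom:feed/atom:entry/thr:total", nsmgr))
            totals.Add(int.Parse(n.InnerText));
        return totals;
    }
}
```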

You've guessed correctly: you're asking for nodes not in a namespace, but these nodes are in a namespace.
Description of the problem and solution: http://weblogs.asp.net/wallen/archive/2003/04/02/4725.aspx
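You can check which namespace a node is actually in through its NamespaceURI property. A small sketch with hypothetical helper names:

```csharp
using System.Xml;

// Sketch: ask the parser which namespace an element really belongs to.
// An unprefixed XPath step such as "entry" only matches elements whose
// NamespaceURI is the empty string.
public static class NamespaceInspector
{
    // The root's namespace comes from its xmlns="..." declaration.
    public static string RootNamespace(XmlDocument doc) => doc.DocumentElement.NamespaceURI;

    // Unprefixed child elements inherit that default namespace.
    public static string FirstChildElementNamespace(XmlDocument doc)
    {
        foreach (XmlNode child in doc.DocumentElement.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.Element)
                return child.NamespaceURI;
        }
        return null; // the root has no child elements
    }
}
```

For the feed above, both calls should report the Atom URI, which is why an unprefixed step like `entry` finds nothing.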

I just want to use...
XmlNodeList itemList = xmlDoc.DocumentElement.SelectNodes("entry");
but which namespace do the entry tags fall under? I would assume xmlns="http://www.w3.org/2005/Atom", but it has no prefix, so how would I add that namespace?
XmlDocument document = new XmlDocument();
XmlNamespaceManager nsmgr = new XmlNamespaceManager(document.NameTable);
nsmgr.AddNamespace("", "http://www.w3.org/2005/Atom");
document.Load(feed);
Something like that?
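Registering the URI under an empty prefix is accepted by XmlNamespaceManager, but it does not help here: the XPath 1.0 engine behind SelectNodes always treats an unprefixed name as being in no namespace. The sketch below (hypothetical class name) compares the two approaches on the same document; the empty-prefix query is expected to find nothing, while the prefixed one finds the entries.

```csharp
using System.Xml;

// Sketch: AddNamespace("", uri) sets a default namespace on the manager,
// but XPath name tests without a prefix still mean "no namespace".
public static class DefaultNamespaceDemo
{
    const string Atom = "http://www.w3.org/2005/Atom";

    public static int CountWithEmptyPrefix(XmlDocument doc)
    {
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("", Atom);     // accepted, but not used for "entry"
        return doc.DocumentElement.SelectNodes("entry", nsmgr).Count;
    }

    public static int CountWithRealPrefix(XmlDocument doc)
    {
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("atom", Atom); // any non-empty prefix works
        return doc.DocumentElement.SelectNodes("atom:entry", nsmgr).Count;
    }
}
```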

Related

Error in Xml parsing with C# XmlDocument class

When I add an xmlns attribute to my root element, this code throws an exception at the third line ("Object reference not set to an instance of an object"), but after removing the xmlns attribute from the root element it works fine.
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load("file.xml");
MessageBox.Show(xmlDoc.SelectSingleNode("person/name").InnerText);
here is my xmlfile
<?xml version="1.0" encoding="utf-8"?>
<person xmlns="namespace path">
<name>myname</name>
</person>
I want to know why it doesn't work after adding the xmlns attribute to my root element. Do I have to use another method for parsing?
You need to add a namespace manager to resolve the namespaces in your XML file.
Consider this example
XML File
<?xml version="1.0" encoding="utf-8"?>
<person xmlns="http://www.findpersonName.com"> <!-- Could be any namespace -->
<name>myname</name>
</person>
and in your code
XmlDocument doc = new XmlDocument();
doc.Load("file.xml");
//Create an XmlNamespaceManager for resolving namespaces.
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("ab", "http://www.findpersonName.com");
MessageBox.Show(doc.SelectSingleNode("//ab:name", nsmgr).InnerText);
Note
If the XPath expression does not include a prefix, it is assumed that the namespace URI is the empty namespace. If your XML includes a default namespace, you must still add a prefix and namespace URI to the XmlNamespaceManager; otherwise, you will not get a node selected.
For more information, see Select Nodes Using XPath Navigation.
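XmlNamespaceManager ns = new XmlNamespaceManager(xmldoc.NameTable);
// The URI must match the xmlns value in your file; the prefix can be anything.
ns.AddNamespace("something", "http://or.other.com/init");
// <name> inherits the default namespace as well, so it needs the prefix too.
XmlNode node = xmldoc.SelectSingleNode("something:person/something:name", ns);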
You may want to consider using XDocument and LINQ to process your XML document.
Here is a rough example:
XDocument xDoc = XDocument.Load("file.xml");
// LINQ to XML names carry the namespace too; take it from the root element so it matches the file's xmlns.
XNamespace ns = xDoc.Root.Name.Namespace;
var personName = (from x in xDoc.Descendants(ns + "person").Descendants(ns + "name") select x).FirstOrDefault();
How to Get XML Node from XDocument

Why can't I access the content of these nodes using XPath?

I am pretty new to XPath and C#, and I have the following problem:
I have to parse this file: http://static.nvd.nist.gov/feeds/xml/cpe/dictionary/official-cpe-dictionary_v2.3.xml
As you can see by opening it in a browser, this file has the following structure:
<?xml version='1.0' encoding='UTF-8'?>
<cpe-list xmlns:meta="http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2" xmlns:config="http://scap.nist.gov/schema/configuration/0.1" xmlns:ns6="http://scap.nist.gov/schema/scap-core/0.1" xmlns:scap-core="http://scap.nist.gov/schema/scap-core/0.3" xmlns="http://cpe.mitre.org/dictionary/2.0" xmlns:cpe-23="http://scap.nist.gov/schema/cpe-extension/2.3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://scap.nist.gov/schema/configuration/0.1 http://nvd.nist.gov/schema/configuration_0.1.xsd http://cpe.mitre.org/dictionary/2.0 http://scap.nist.gov/schema/cpe/2.3/cpe-dictionary_2.3.xsd http://scap.nist.gov/schema/scap-core/0.3 http://nvd.nist.gov/schema/scap-core_0.3.xsd http://scap.nist.gov/schema/scap-core/0.1 http://nvd.nist.gov/schema/scap-core_0.1.xsd http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2 http://nvd.nist.gov/schema/cpe-dictionary-metadata_0.2.xsd http://scap.nist.gov/schema/cpe-extension/2.3 http://scap.nist.gov/schema/cpe/2.3/cpe-dictionary-extension_2.3.xsd">
<generator>
<product_name>National Vulnerability Database (NVD)</product_name>
<product_version>2.22.0-SNAPSHOT (PRODUCTION)</product_version>
<schema_version>2.3</schema_version>
<timestamp>2014-03-05T05:13:33.550Z</timestamp>
</generator>
<cpe-item name="cpe:/a:1024cms:1024_cms:0.7">
<title xml:lang="en-US">1024cms.org 1024 CMS 0.7</title>
<cpe-23:cpe23-item name="cpe:2.3:a:1024cms:1024_cms:0.7:*:*:*:*:*:*:*"/>
</cpe-item>
<cpe-item name="cpe:/a:1024cms:1024_cms:1.2.5">
<title xml:lang="en-US">1024cms.org 1024 CMS 1.2.5</title>
<cpe-23:cpe23-item name="cpe:2.3:a:1024cms:1024_cms:1.2.5:*:*:*:*:*:*:*"/>
</cpe-item>
<cpe-item name="cpe:/a:1024cms:1024_cms:1.3.1">
<title xml:lang="en-US">1024cms.org 1024 CMS 1.3.1</title>
<cpe-23:cpe23-item name="cpe:2.3:a:1024cms:1024_cms:1.3.1:*:*:*:*:*:*:*"/>
</cpe-item>
.............................................................
.............................................................
.............................................................
<cpe-item name="cpe:/h:zyxel:p-660hw_t3:v2">
<title xml:lang="en-US">ZyXEL P-660HW T3 Model v2</title>
<cpe-23:cpe23-item name="cpe:2.3:h:zyxel:p-660hw_t3:v2:*:*:*:*:*:*:*"/>
</cpe-item>
</cpe-list>
So now, using XPath, I have to obtain the list of all the <cpe-item> tags (excluding the <generator> tag, which is the first tag inside my <cpe-list> tag).
In my code I have something like this:
XmlDocument document = new XmlDocument(); // Represents an XML document
document.Load(sourceXML.FullName); // Loads the XML document from the specified file path
// Add the namespaces:
XmlNamespaceManager nsmgr = new XmlNamespaceManager(document.NameTable);
nsmgr.AddNamespace("ns6", "http://scap.nist.gov/schema/scap-core/0.1");
nsmgr.AddNamespace("cpe-23", "http://scap.nist.gov/schema/cpe-extension/2.3");
nsmgr.AddNamespace("ns", "http://cpe.mitre.org/dictionary/2.0");
nsmgr.AddNamespace("meta", "http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2");
nsmgr.AddNamespace("scap-core", "http://scap.nist.gov/schema/scap-core/0.3");
nsmgr.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
nsmgr.AddNamespace("config", "http://scap.nist.gov/schema/configuration/0.1");
/* nodeList is the collection that contains all the <cpe-item> tags that are
* inside the root <cpe-list> tag in the XML document:
*/
XmlNodeList nodeList;
nodeList = document.DocumentElement.SelectNodes("//ns:cpe-list/ns:cpe-item", nsmgr);
long conta = 0;
So I am using this line to select all the <cpe-item> tags that are inside the <cpe-list> tag:
nodeList = document.DocumentElement.SelectNodes("//ns:cpe-list/ns:cpe-item", nsmgr);
It seems to work, but I am not sure it is correct, because the Visual Studio debugger tells me that my XmlNodeList nodeList contains 80588 elements (the file is very big, but that seems like too many elements to me!).
Another doubt is about the ns namespace prefix used in the code above (this is not my code; I have to work on it).
Why does the code put the ns prefix in front of cpe-list and cpe-item, when the XML I have to parse simply looks like this:
<cpe-item name="cpe:/a:1024cms:1024_cms:1.3.1">
<title xml:lang="en-US">1024cms.org 1024 CMS 1.3.1</title>
<cpe-23:cpe23-item name="cpe:2.3:a:1024cms:1024_cms:1.3.1:*:*:*:*:*:*:*"/>
</cpe-item>
which doesn't begin with an ns prefix? Why is it used?
My last question: how can I access the inner text of the <title> element?
I am trying to do something like this, but it doesn't work:
XmlNodeList nodeList;
nodeList = document.DocumentElement.SelectNodes("//ns:cpe-list/ns:cpe-item", nsmgr);
long conta = 0;
DataModel.Vulnerability.CPE currentCPE;
foreach (XmlNode node in nodeList)
{
// Access to the name ATTRIBUTE of the <cpe-item> tag:
Debug.WriteLine(String.Format("[{0:N0}] CPE: {1} Title: {2}", conta, node.Attributes["name"].Value, node.FirstChild.FirstChild.Value));
// Access to the <title> tag content:
//Debug.WriteLine(String.Format("[{0:N0}] Title: {1} Title: {2}", conta, node.SelectSingleNode("./title", nsmgr)));
XmlNode titleNode = node.SelectSingleNode("./title", nsmgr);
conta++;
}
When this code runs I have no problem accessing the name attribute of the current <cpe-item> element in my list, but I can't access the content of the <title> tag, because when I execute this line:
XmlNode titleNode = node.SelectSingleNode("./title", nsmgr);
it returns null.
What is the problem? What am I missing? How can I solve?
Tnx
Andrea
1. Your XPath looks fine given the XML snippet posted in this question, and as far as I can see it should return the correct number of elements. I can't tell more than that; you should check further yourself.
2. Your XML has a default namespace (xmlns="....."). In XML, every element without a prefix is considered to be in that default namespace. In XPath, however, every element name without a prefix is considered to be in no namespace. Because of this difference between the two, you need to define an ns prefix that points to the default namespace URI and use it in your XPath expressions.
3. Related to point 2: remember that every element without a prefix is in the default namespace, and so is the <title> element. Hence you need to add the ns prefix in the XPath expression: ./ns:title
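Combining the namespace advice with the ./ns:title fix, a sketch of the loop that reads each item's name attribute, its title, and the name of the nested cpe-23:cpe23-item might look like this. It assumes the document is already loaded; `CpeReader` and `ReadItems` are illustrative names, and the tuple result stands in for the question's DataModel.Vulnerability.CPE type, which isn't shown.

```csharp
using System.Collections.Generic;
using System.Xml;

// Sketch for the CPE dictionary: the default namespace (mapped here to "ns")
// covers <cpe-list>, <cpe-item> and <title>; <cpe23-item> uses the cpe-23 namespace.
public static class CpeReader
{
    public static List<(string Name, string Title, string Name23)> ReadItems(XmlDocument document)
    {
        var nsmgr = new XmlNamespaceManager(document.NameTable);
        nsmgr.AddNamespace("ns", "http://cpe.mitre.org/dictionary/2.0");
        nsmgr.AddNamespace("cpe-23", "http://scap.nist.gov/schema/cpe-extension/2.3");

        var items = new List<(string Name, string Title, string Name23)>();
        foreach (XmlNode item in document.SelectNodes("/ns:cpe-list/ns:cpe-item", nsmgr))
        {
            // Unprefixed attributes are in no namespace, so "name" needs no prefix.
            string name = item.Attributes["name"]?.Value;
            // "./title" would look for a title in no namespace and return null.
            string title = item.SelectSingleNode("ns:title", nsmgr)?.InnerText;
            string name23 = item.SelectSingleNode("cpe-23:cpe23-item/@name", nsmgr)?.Value;
            items.Add((name, title, name23));
        }
        return items;
    }
}
```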
Ideally, one post should contain no more than one specific question. Answering a bunch of questions in one post is rarely useful for future visitors; it tends to confuse them instead. Remember that we are not only solving your problem here, but also trying to build a knowledge base that is hopefully useful to others with a similar problem.

C#.NET XML Processing via XDocument.Descendants not fetching entities as expected

Using an XDocument and the Descendants method.
//first problem 'entries' doesn't fetch at all
var entries = xmlDoc.Descendants(XName.Get("entry"));
//neither does
// xmlDoc.Descendants("entry")
var ids = from e in entries
select e.Element(XName.Get("id")).Value;
The same XDocument code works on a more verbose blog feed, namely my blog: http://blog.nick.josevski.com/feed/ (a snippet is here: http://pastebin.com/KU65dgwL), where the 'entry' element is replaced with 'item' and 'id' with 'link'.
To test any suggestions I created a LinqPad code gist that demonstrates the issue.
Am I missing something obvious? I've tried various combinations of .Elements() .Elements("entry") and just .Descendants() and then attempting to filter further without luck too.
This is the XML that I'm struggling to extract the entry/id node from:
<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">Author</title>
<subtitle type="text">subtitle</subtitle>
<link rel="alternate" href="http://www.site.com/blog" />
<entry>
<id>http://www.site.com/a-blog-post</id>
<title type="text">Title Of Blog Post</title>
...
<entry>
<id>http://www.site.com/another-blog-post</id>
<title type="text">Title Of Another Blog Post</title>
You are missing the XML namespace:
XNamespace ns = "http://www.w3.org/2005/Atom";
var entries = xmlDoc.Descendants(ns + "entry");
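A slightly fuller sketch of the same idea also fixes the `id` lookup, since `id` sits in the Atom namespace as well. `AtomLinq` and `GetEntryIds` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Sketch: in LINQ to XML the namespace is part of each XName, so both
// "entry" and "id" are built from the Atom XNamespace.
public static class AtomLinq
{
    static readonly XNamespace Atom = "http://www.w3.org/2005/Atom";

    public static List<string> GetEntryIds(XDocument xmlDoc) =>
        xmlDoc.Descendants(Atom + "entry")
              // The explicit string conversion yields null if an entry has no <id>.
              .Select(e => (string)e.Element(Atom + "id"))
              .ToList();
}
```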

Why doesn't this XPath (C#) work?

Got this xml:
<?xml version="1.0" encoding="UTF-8"?>
<video xmlns="UploadXSD">
<title>
A vid with Pete
</title>
<description>
Petes vid
</description>
<contributor>
Pete
</contributor>
<subject>
Cat 2
</subject>
</video>
And this xpath:
videoToAdd.Title = doc.SelectSingleNode(@"/video/title").InnerXml;
And I'm getting 'Object reference not set to an instance of an object'. Any ideas why? This looks like a valid XPath to me, and it used to work...
Your XML contains a namespace declaration, so you need to modify your code to take it into account.
Example:
XmlDocument doc = new XmlDocument();
doc.Load("doc.xml");
XmlNamespaceManager xmlnsManager = new XmlNamespaceManager(doc.NameTable);
xmlnsManager.AddNamespace("ns", "UploadXSD");
videoToAdd.Title = doc.SelectSingleNode(@"/ns:video/ns:title", xmlnsManager).InnerXml;
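Here is the same fix as a self-contained sketch that parses the XML from a string. `VideoReader`, `ReadTitle`, and the prefix `u` are arbitrary choices. Because the title text in the question is surrounded by line breaks and indentation, the sketch returns the trimmed InnerText rather than the raw InnerXml.

```csharp
using System.Xml;

// Sketch: "u" is an arbitrary prefix; it must map to the document's xmlns value, "UploadXSD".
public static class VideoReader
{
    public static string ReadTitle(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("u", "UploadXSD");
        XmlNode title = doc.SelectSingleNode("/u:video/u:title", nsmgr);
        // Returns null instead of throwing when there is no matching <title>.
        return title?.InnerText.Trim();
    }
}
```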
/video/title would return a title element with no namespace, from within a video element with no namespace.
You need to either remove xmlns="UploadXSD" from your XML or register an appropriate namespace prefix in your C# code.
It's the xmlns="UploadXSD" attribute causing you grief here. I think you'll need to use an XmlNamespaceManager to help the parser resolve the names, or remove the xmlns attribute if you don't need it.
Is it possible that the doc variable points to the <video> element? In that case you would need to write either
videoToAdd.Title = doc.SelectSingleNode(@"./title").InnerXml;
or
videoToAdd.Title = doc.SelectSingleNode(@"//video/title").InnerXml;
Try this:
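XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("u", "UploadXSD");
videoToAdd.Title = doc.SelectSingleNode(@"//u:video/u:title", nsmgr).InnerXml;
Your XML document has a default XML namespace, so to find the elements you must qualify them with a prefix mapped to that namespace. The reserved xmlns prefix itself cannot be used for this, so pick any other name.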

Query an XmlDocument without getting a 'Namespace prefix is not defined' problem

I've got an XML document that both defines and references some namespaces. I load it into an XmlDocument object and, to the best of my knowledge, create an XmlNamespaceManager object to query XPath against. The problem is that I'm getting XPath exceptions saying that the namespace prefix "my" is not defined. How do I get the namespace manager to see that the namespaces I am referencing are already defined? Or rather, how do I get the namespace definitions from the document into the namespace manager?
Furthermore, it strikes me as strange that you have to provide a namespace manager, created from the document's own NameTable, to that same document. Even if you need to hardcode namespaces manually, why can't you add them directly to the document? Why do you always have to pass this namespace manager with every single query? Why can't XmlDocument just know?
Code:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(programFiles + @"Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\FEATURES\HfscBookingWorkflow\template.xml");
XmlNamespaceManager ns = new XmlNamespaceManager(xmlDoc.NameTable);
XmlNode referenceNode = xmlDoc.SelectSingleNode("/my:myFields/my:ReferenceNumber", ns);
referenceNode.InnerXml = this.bookingData.ReferenceNumber;
XmlNode titleNode = xmlDoc.SelectSingleNode("/my:myFields/my:Title", ns);
titleNode.InnerXml = this.bookingData.FamilyName;
Xml:
<?xml version="1.0" encoding="UTF-8" ?>
<?mso-infoPathSolution name="urn:schemas-microsoft-com:office:infopath:Inspection:-myXSD-2010-01-15T18-21-55" solutionVersion="1.0.0.104" productVersion="12.0.0" PIVersion="1.0.0.0" ?>
<?mso-application progid="InfoPath.Document" versionProgid="InfoPath.Document.2"?>
<my:myFields xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-01-15T18:21:55" xmlns:xd="http://schemas.microsoft.com/office/infopath/2003">
<my:DateRequested xsi:nil="true" />
<my:DateVisited xsi:nil="true" />
<my:ReferenceNumber />
<my:FireCall>false</my:FireCall>
Update:
ns.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
ns.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml");
ns.AddNamespace("xd", "http://schemas.microsoft.com/office/infopath/2003");
ns.AddNamespace("my", "http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-01-15T18:21:55");
This does the job, but it means I have to hardcode this particular XML schema. This schema represents an InfoPath form template. In particular, the my namespace URL will be different for every form template, so I really don't want to hardcode it. It would be nice to find a clean way to get this namespace from the XML without resorting to regex.
I was hoping that the XmlNamespaceManager would just pick up the namespace definitions from the NameTable. I mean, they're in the XML, but I still have to define them.
Here is the answer to the "Why can't XmlDocument just know?" question.
NameTable is just an optimization for storing names; it actually has nothing to do with namespaces.
And even if XmlNamespaceManager could infer all namespaces and prefixes from an XML document, that wouldn't help in the general case because of the nature of XML namespaces. For example, which namespace would XmlNamespaceManager map the "my" prefix to in this case:
<root>
<foo xmlns:my="blah"/>
<foo xmlns:my="balh-blah-blah"/>
</root>
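That said, for documents like the InfoPath template in the question, where every prefix is declared on the root element, one workable compromise is to copy those root-level declarations into the manager instead of hardcoding them. This is a sketch under that assumption only: declarations on nested elements, like the two conflicting `my` prefixes in the example above, are ignored, and a default `xmlns="..."` declaration is skipped because XPath cannot use an empty prefix anyway. `RootNamespaces` and `FromRootDeclarations` are hypothetical names.

```csharp
using System.Xml;

// Sketch, assuming every prefix you need is declared on the root element.
// Nested declarations are not considered, so the general ambiguity remains.
public static class RootNamespaces
{
    public static XmlNamespaceManager FromRootDeclarations(XmlDocument doc)
    {
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        foreach (XmlAttribute attr in doc.DocumentElement.Attributes)
        {
            // xmlns:my="..." is an attribute with Prefix "xmlns" and LocalName "my".
            // A plain xmlns="..." has an empty Prefix and is skipped here.
            if (attr.Prefix == "xmlns")
                nsmgr.AddNamespace(attr.LocalName, attr.Value);
        }
        return nsmgr;
    }
}
```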
Have you defined "my" in the namespace-manager?
ns.AddNamespace("my", "http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-01-15T18:21:55");
Or better, choose a prefix that is unlikely to conflict. It does seem odd that it didn't pick it up from the name table, though.
For me, with InfoPath 2007, this solved the problem:
using System.Xml;
// Extension methods must live in a non-nested static class.
public static class XmlDocumentExtensions
{
public static XmlNamespaceManager GetNameSpaceManager(this XmlDocument document)
{
XmlNamespaceManager xmlNamespaceManager = new XmlNamespaceManager(document.NameTable);
xmlNamespaceManager.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
xmlNamespaceManager.AddNamespace("dfs", "http://schemas.microsoft.com/office/infopath/2003/dataFormSolution");
xmlNamespaceManager.AddNamespace("d", "http://schemas.microsoft.com/office/infopath/2003/ado/dataFields");
// The "my" URI is specific to each form template; copy it from your template's root element.
xmlNamespaceManager.AddNamespace("my", "http://schemas.microsoft.com/office/infopath/2003/myXSD/2012-03-29T06:28:28");
xmlNamespaceManager.AddNamespace("xd", "http://schemas.microsoft.com/office/infopath/2003");
return xmlNamespaceManager;
}
}
