C#/.NET XML processing: XDocument.Descendants not fetching elements as expected

I'm using an XDocument and its Descendants method.
// First problem: 'entries' comes back empty
var entries = xmlDoc.Descendants(XName.Get("entry"));
// Neither does this return anything:
// xmlDoc.Descendants("entry")
var ids = from e in entries
          select e.Element(XName.Get("id")).Value;
The same XDocument code works on a more verbose blog feed, e.g. my blog at http://blog.nick.josevski.com/feed/ (snippet here: http://pastebin.com/KU65dgwL), where the 'entry' element is replaced with 'item' and 'id' is replaced with 'link'.
To test any suggestions, I created a LinqPad code gist that demonstrates the issue.
Am I missing something obvious? I've tried various combinations of .Elements(), .Elements("entry") and plain .Descendants(), and then attempted to filter further, also without luck.
This is the XML that I'm struggling to extract the entry/id node from:
<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">Author</title>
<subtitle type="text">subtitle</subtitle>
<link rel="alternate" href="http://www.site.com/blog" />
<entry>
<id>http://www.site.com/a-blog-post</id>
<title type="text">Title Of Blog Post</title>
...
<entry>
<id>http://www.site.com/another-blog-post</id>
<title type="text">Title Of Another Blog Post</title>

You are missing the XML namespace:
XNamespace ns = "http://www.w3.org/2005/Atom";
var entries = xmlDoc.Descendants(ns + "entry");
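For completeness, here is a minimal sketch of the whole round trip, assuming a feed shaped like the one in the question. The sample XML is a shortened stand-in with the elided parts closed off, and the variable names `ids` and `unqualifiedCount` are only illustrative. Note that the child `id` lookup needs the namespace as well, not just the `entry` lookup.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Shortened stand-in for the question's feed (elided parts closed off).
string xml = @"<feed xmlns=""http://www.w3.org/2005/Atom"">
  <title type=""text"">Author</title>
  <entry>
    <id>http://www.site.com/a-blog-post</id>
    <title type=""text"">Title Of Blog Post</title>
  </entry>
  <entry>
    <id>http://www.site.com/another-blog-post</id>
    <title type=""text"">Title Of Another Blog Post</title>
  </entry>
</feed>";

XDocument xmlDoc = XDocument.Parse(xml);
XNamespace ns = "http://www.w3.org/2005/Atom";

// Every element inherits the default namespace, so both lookups qualify the name.
var ids = xmlDoc.Descendants(ns + "entry")
                .Select(e => (string)e.Element(ns + "id"))
                .ToList();

// The unqualified name matches nothing in this document.
int unqualifiedCount = xmlDoc.Descendants("entry").Count();

foreach (var id in ids)
    Console.WriteLine(id);
```

The explicit `(string)` cast is used instead of `.Value` so that an entry without an `id` child yields null rather than throwing.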

Related

How to extract an attribute from an XML tag with C#?

<channel>
<title>test + test</title>
<link>http://testprog.test.net/api/test</link>
<description>test.com</description>
<category>test + test</category>
<item xml:base="http://test.com/test.html?id=25">
<guid isPermaLink="false">25</guid>
<link>http://test.com/link.html</link>
<title>title test</title>
<description>Description test description test</description>
<a10:updated>2015-05-26T10:23:53Z</a10:updated>
<enclosure type="" url="http://test.com/test/test.jpg" width="200" height="200"/>
</item>
</channel>
I extracted the title ("title test") like this:
title = ds.Tables["item"].Rows[0]["title"] as string;
How can I extract the url attribute from the <enclosure> tag with C#?
Thanks
First option
You can create classes that map to the XML, deserialize it into objects, and then access the values as properties.
Second option
If you are only interested in a few values and don't want to create mapping classes, you can use XPath; there are many articles and answered questions on it that you can easily find.
To extract the url attribute from the enclosure tag you can use the path:
"/channel/item/enclosure/@url"
There are many, many articles that will help you read XML, but the simple answer is to load your XML into an XML document, and simply call
doc.GetElementsByTagName("enclosure")
This will return an XmlNodeList with all 'enclosure' tags found in your document. I would really recommend doing some reading about using XML to make sure your application is functional and robust.
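As a rough sketch of both suggestions with XmlDocument, assuming a trimmed copy of the question's XML (the a10:updated element is left out because its namespace declaration is not shown in the question), the url attribute can be read either through an XPath expression or through GetElementsByTagName followed by GetAttribute:

```csharp
using System;
using System.Xml;

// Trimmed copy of the question's XML; a10:updated is omitted because the
// declaration of the a10 prefix is not shown in the original.
string xml = @"<channel>
  <title>test + test</title>
  <item>
    <title>title test</title>
    <enclosure type="""" url=""http://test.com/test/test.jpg"" width=""200"" height=""200""/>
  </item>
</channel>";

var doc = new XmlDocument();
doc.LoadXml(xml);

// Option A: XPath straight to the attribute node.
XmlNode urlAttr = doc.SelectSingleNode("/channel/item/enclosure/@url");
var urlViaXPath = urlAttr?.Value;

// Option B: find the elements by tag name, then read the attribute.
XmlNodeList enclosures = doc.GetElementsByTagName("enclosure");
var urlViaTagName = ((XmlElement)enclosures[0]).GetAttribute("url");

Console.WriteLine(urlViaXPath);
Console.WriteLine(urlViaTagName);
```

Option A returns null when the path matches nothing, which is why the null-conditional operator is used there; Option B assumes at least one enclosure element exists.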
You can use LINQ to XML, which is a good fit for this.
Please refer to the code:
string xml = @"<channel>
<title>test + test</title>
<link>http://testprog.test.net/api/test</link>
<description>test.com</description>
<category>test + test</category>
<item xml:base=""http://test.com/test.html?id=25"">
<guid isPermaLink=""false"">25</guid>
<link>http://test.com/link.html</link>
<title>title test</title>
<description>Description test description test</description>
<a10>2015-05-26T10:23:53Z</a10>
<enclosure type="""" url=""http://anupshah.com/test/test.jpg"" width=""200"" height=""200""/>
</item>
</channel>";
var str = XElement.Parse(xml);
var result = (from myConfig in str.Elements("item")
              select myConfig.Elements("enclosure").Attributes("url").SingleOrDefault())
             .First();
// .Value yields the bare URL; ToString() would print the whole attribute, url="..."
Console.WriteLine(result.Value);
I hope this helps.

Getting all entry Elements within a Namespace from XML using XPath

I'm trying to get all the entry elements so I can display them. I haven't done XPath for a while, but I thought it would be fairly simple. Here's what I have so far; the rssNodes count is 0. What am I missing?
XmlDocument rssXmlDoc = new XmlDocument();
rssXmlDoc.Load("http://www.businessopportunities.ukti.gov.uk/alertfeed/1425362.rss");
var rssNodes = rssXmlDoc.SelectNodes("feed/entry");
The XML file has the following structure:
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<!-- some other child elements -->
<entry>
<!-- child elements -->
</entry>
<entry>
<!-- child elements -->
</entry>
<!-- more entry elements -->
<!-- some other child elements -->
</feed>
You need to properly use namespaces:
var nsm = new XmlNamespaceManager(rssXmlDoc.NameTable);
nsm.AddNamespace("atom", "http://www.w3.org/2005/Atom");
var entries = rssXmlDoc.SelectNodes("/atom:feed/atom:entry", nsm);
You need to respect XML namespaces with XPath.
NOTE: If the namespace of the entry element is known, see JLRishe's answer which is more elegant in that case.
If you don't know the XML namespace beforehand you can also ignore it using the XPath built-in local-name() function:
SelectNodes("//*[local-name()='entry']")
This will get all entry elements in the entire XML document, no matter which namespace the element belongs to.
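The three variants discussed in this thread can be compared side by side. This is only a sketch against a small inline stand-in for the feed (the real feed URL is not fetched), and the counter names are illustrative:

```csharp
using System;
using System.Xml;

// Small inline stand-in for the Atom feed from the question.
string xml = @"<feed xmlns=""http://www.w3.org/2005/Atom"">
  <title>Sample</title>
  <entry><id>1</id></entry>
  <entry><id>2</id></entry>
</feed>";

var rssXmlDoc = new XmlDocument();
rssXmlDoc.LoadXml(xml);

// 1. Unprefixed names in XPath mean "no namespace", so nothing matches.
int plainCount = rssXmlDoc.SelectNodes("feed/entry").Count;

// 2. Bind a prefix of your choosing to the Atom namespace URI and use it in the path.
var nsm = new XmlNamespaceManager(rssXmlDoc.NameTable);
nsm.AddNamespace("atom", "http://www.w3.org/2005/Atom");
int prefixedCount = rssXmlDoc.SelectNodes("/atom:feed/atom:entry", nsm).Count;

// 3. Ignore namespaces entirely by comparing local names.
int localNameCount = rssXmlDoc.SelectNodes("//*[local-name()='entry']").Count;

Console.WriteLine($"{plainCount} {prefixedCount} {localNameCount}");
```

The local-name() form is convenient when the namespace is unknown, but it will also match same-named elements from unrelated namespaces.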

Why can't I access the content of these nodes using XPath?

I am pretty new to XPath and C#, and I have the following problem:
I have to parse this file: http://static.nvd.nist.gov/feeds/xml/cpe/dictionary/official-cpe-dictionary_v2.3.xml
As you can see by opening it in a browser, the file has the following structure:
<?xml version='1.0' encoding='UTF-8'?>
<cpe-list xmlns:meta="http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2" xmlns:config="http://scap.nist.gov/schema/configuration/0.1" xmlns:ns6="http://scap.nist.gov/schema/scap-core/0.1" xmlns:scap-core="http://scap.nist.gov/schema/scap-core/0.3" xmlns="http://cpe.mitre.org/dictionary/2.0" xmlns:cpe-23="http://scap.nist.gov/schema/cpe-extension/2.3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://scap.nist.gov/schema/configuration/0.1 http://nvd.nist.gov/schema/configuration_0.1.xsd http://cpe.mitre.org/dictionary/2.0 http://scap.nist.gov/schema/cpe/2.3/cpe-dictionary_2.3.xsd http://scap.nist.gov/schema/scap-core/0.3 http://nvd.nist.gov/schema/scap-core_0.3.xsd http://scap.nist.gov/schema/scap-core/0.1 http://nvd.nist.gov/schema/scap-core_0.1.xsd http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2 http://nvd.nist.gov/schema/cpe-dictionary-metadata_0.2.xsd http://scap.nist.gov/schema/cpe-extension/2.3 http://scap.nist.gov/schema/cpe/2.3/cpe-dictionary-extension_2.3.xsd">
<generator>
<product_name>National Vulnerability Database (NVD)</product_name>
<product_version>2.22.0-SNAPSHOT (PRODUCTION)</product_version>
<schema_version>2.3</schema_version>
<timestamp>2014-03-05T05:13:33.550Z</timestamp>
</generator>
<cpe-item name="cpe:/a:1024cms:1024_cms:0.7">
<title xml:lang="en-US">1024cms.org 1024 CMS 0.7</title>
<cpe-23:cpe23-item name="cpe:2.3:a:1024cms:1024_cms:0.7:*:*:*:*:*:*:*"/>
</cpe-item>
<cpe-item name="cpe:/a:1024cms:1024_cms:1.2.5">
<title xml:lang="en-US">1024cms.org 1024 CMS 1.2.5</title>
<cpe-23:cpe23-item name="cpe:2.3:a:1024cms:1024_cms:1.2.5:*:*:*:*:*:*:*"/>
</cpe-item>
<cpe-item name="cpe:/a:1024cms:1024_cms:1.3.1">
<title xml:lang="en-US">1024cms.org 1024 CMS 1.3.1</title>
<cpe-23:cpe23-item name="cpe:2.3:a:1024cms:1024_cms:1.3.1:*:*:*:*:*:*:*"/>
</cpe-item>
.............................................................
.............................................................
.............................................................
<cpe-item name="cpe:/h:zyxel:p-660hw_t3:v2">
<title xml:lang="en-US">ZyXEL P-660HW T3 Model v2</title>
<cpe-23:cpe23-item name="cpe:2.3:h:zyxel:p-660hw_t3:v2:*:*:*:*:*:*:*"/>
</cpe-item>
</cpe-list>
So now, using XPath, I have to obtain the list of all the <cpe-item> tags (excluding the <generator> tag, which is the first tag inside my <cpe-list> tag).
In my code I have something like this:
XmlDocument document = new XmlDocument(); // Represent an XML document
document.Load(sourceXML.FullName); // Loads the XML document from the specified stream
// Add the namespaces:
XmlNamespaceManager nsmgr = new XmlNamespaceManager(document.NameTable);
nsmgr.AddNamespace("ns6", "http://scap.nist.gov/schema/scap-core/0.1");
nsmgr.AddNamespace("cpe-23", "http://scap.nist.gov/schema/cpe-extension/2.3");
nsmgr.AddNamespace("ns", "http://cpe.mitre.org/dictionary/2.0");
nsmgr.AddNamespace("meta", "http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2");
nsmgr.AddNamespace("scap-core", "http://scap.nist.gov/schema/scap-core/0.3");
nsmgr.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
nsmgr.AddNamespace("config", "http://scap.nist.gov/schema/configuration/0.1");
/* nodeList is the collection that contains all the <cpe-item> tag that are
* inside the root <cpe-list> tag in the XML document:
*/
XmlNodeList nodeList;
nodeList = document.DocumentElement.SelectNodes("//ns:cpe-list/ns:cpe-item", nsmgr);
long conta = 0;
So I am using this line to select all the <cpe-item> tags that are inside the <cpe-list> tag:
nodeList = document.DocumentElement.SelectNodes("//ns:cpe-list/ns:cpe-item", nsmgr);
It seems to work, but I am not sure it is correct: when I inspect it in the Visual Studio debugger, my XmlNodeList nodeList contains 80588 elements (the file is very big, but that seems like too many elements to me).
Another doubt is related to the use of the ns namespace in the code above (this is not my code; I have to work on it).
Why does the code put the ns prefix in front of cpe-list and cpe-item when the XML to parse simply has something like:
<cpe-item name="cpe:/a:1024cms:1024_cms:1.3.1">
<title xml:lang="en-US">1024cms.org 1024 CMS 1.3.1</title>
<cpe-23:cpe23-item name="cpe:2.3:a:1024cms:1024_cms:1.3.1:*:*:*:*:*:*:*"/>
</cpe-item>
which doesn't begin with an ns prefix? Why is it used?
The last question is: how can I access the inner text of the title element?
I am trying to do something like this, but it doesn't work:
XmlNodeList nodeList;
nodeList = document.DocumentElement.SelectNodes("//ns:cpe-list/ns:cpe-item", nsmgr);
long conta = 0;
DataModel.Vulnerability.CPE currentCPE;
foreach (XmlNode node in nodeList)
{
// Access to the name ATTRIBUTE of the <cpe-item> tag:
Debug.WriteLine(String.Format("[{0:N0}] CPE: {1} Title: {2}", conta, node.Attributes["name"].Value, node.FirstChild.FirstChild.Value));
// Access to the <title> tag content:
//Debug.WriteLine(String.Format("[{0:N0}] Title: {1} Title: {2}", conta, node.SelectSingleNode("./title", nsmgr)));
XmlNode titleNode = node.SelectSingleNode("./title", nsmgr);
conta++;
}
When this code is executed I have no problem accessing the name attribute of the current cpe-item in my list, but I can't access the content of the <title> tag, because when this line executes:
XmlNode titleNode = node.SelectSingleNode("./title", nsmgr);
it returns null.
What is the problem? What am I missing? How can I solve it?
Thanks,
Andrea
1. Your XPath looks fine given the XML snippet posted in the question. As far as I can see, it should return the correct number of elements. I can't tell more than that; you will have to check further yourself.
2. Your XML has a default namespace (xmlns="....."). In XML, every element without a prefix is considered to be in the default namespace. In XPath, however, an element without a prefix is considered to have no namespace. Because of that difference, you have to define a prefix (here ns) that points to the default namespace URI and use it in the XPath expression.
3. Related to point 2: remember that every element without a prefix is in the default namespace, and so is the <title> element. Hence you need to add the ns prefix in the XPath expression: ./ns:title
4. Ideally, a post should contain no more than one specific question. Answering a bunch of questions in one post is rarely useful for future visitors and tends to confuse them instead. Remember that we are not only solving your problem here, but also trying to build a knowledge base that is hopefully useful for others with a similar problem.
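Points 2 and 3 can be checked with a short sketch. It assumes a cut-down copy of the CPE dictionary containing a single cpe-item, and registers only the one prefix the XPath expressions actually use:

```csharp
using System;
using System.Xml;

// Cut-down copy of the CPE dictionary with a single <cpe-item>.
string xml = @"<cpe-list xmlns=""http://cpe.mitre.org/dictionary/2.0""
          xmlns:cpe-23=""http://scap.nist.gov/schema/cpe-extension/2.3"">
  <generator><product_name>National Vulnerability Database (NVD)</product_name></generator>
  <cpe-item name=""cpe:/a:1024cms:1024_cms:0.7"">
    <title xml:lang=""en-US"">1024cms.org 1024 CMS 0.7</title>
    <cpe-23:cpe23-item name=""cpe:2.3:a:1024cms:1024_cms:0.7:*:*:*:*:*:*:*""/>
  </cpe-item>
</cpe-list>";

var document = new XmlDocument();
document.LoadXml(xml);

// "ns" is an arbitrary prefix bound to the document's default namespace URI.
var nsmgr = new XmlNamespaceManager(document.NameTable);
nsmgr.AddNamespace("ns", "http://cpe.mitre.org/dictionary/2.0");

XmlNodeList items = document.DocumentElement.SelectNodes("/ns:cpe-list/ns:cpe-item", nsmgr);
XmlNode first = items[0];

// Without the prefix the step looks for a <title> in no namespace and finds none.
XmlNode unprefixed = first.SelectSingleNode("./title", nsmgr);

// With the prefix the <title> in the default namespace is found.
XmlNode prefixed = first.SelectSingleNode("./ns:title", nsmgr);

string name = first.Attributes["name"].Value;
string title = prefixed.InnerText;
Console.WriteLine($"CPE: {name} Title: {title}");
```

Attributes such as name are not affected, because unprefixed attributes are never placed in the default namespace.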

How do I get all the child tag names of an XML document using c#?

I am trying to find an efficient way to parse xml data into an SQL table.
This is a small example of the XML I will get, in reality there will be around 20-25 tags inside the property tag and hundreds of entries.
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<entry m:etag="W/&quot;28&quot;">
<id>114</id>
<title type="text">Title 1</title>
<updated>2012-07-09T15:02:08+01:00</updated>
<author>
<name />
</author>
<m:properties>
<d:ContentTypeID>456</d:ContentTypeID>
<d:ApproverComments>Correction to Title</d:ApproverComments>
<d:Name>bar.pdf</d:Name>
<d:Title>Title</d:Title>
<d:DocumentOwnerId m:type="Edm.Int32">20</d:DocumentOwnerId>
<d:DocumentControllerId m:type="Edm.Int32" m:null="true"></d:DocumentControllerId>
</m:properties>
</entry>
<entry m:etag="W/&quot;28&quot;">
<id>115</id>
<title type="text">Title 2</title>
<updated>2012-07-09T15:05:35+01:00</updated>
<author>
<name />
</author>
<m:properties>
<d:ContentTypeID>456</d:ContentTypeID>
<d:ApproverComments>Correction of Title2</d:ApproverComments>
<d:Name>foo.pdf</d:Name>
<d:Title>Title 2</d:Title>
<d:DocumentOwnerId m:type="Edm.Int32">20</d:DocumentOwnerId>
</m:properties>
</entry>
I need to look at each 'entry', pull out all the tag names inside the m:properties tag, and set those as the columns of the SQL table.
I'm looking for a more efficient way to do this than iterating through every entry's property tags and then assembling a list of tag names that I have to cross-reference to make sure I get them all.
I have been trying to find a function along the lines of:
String TagNames = XMLDoc.getChildNode('properties').getChildNodeNames()
TagNames would equal "ContentTypeID, ApproverComments, Name, Title, DocumentOwnerId, DocumentControllerId"
Any help would be greatly appreciated.
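Assuming you already have an XDocument, which we'll call doc, and that the m: prefix is declared on its root element, one approach might be this:
// Resolve the namespace URI bound to the m: prefix on the root element.
XNamespace m = doc.Root.GetNamespaceOfPrefix("m");
var namesPerEntry = doc.Descendants()
    .Where(e => e.Name.LocalName == "entry")
    .Select(e => e.Element(m + "properties")
        .Elements()
        .Select(ce => ce.Name.LocalName)
        .ToList())
    .ToList();
That should give you a List<List<string>>, with one inner list per entry.
You could even qualify the inner list with the id of the entry:
var namesWithId = doc.Descendants()
    .Where(e => e.Name.LocalName == "entry")
    .Select(e =>
    {
        // Match on the local name so this works whether or not entry/id are in a namespace.
        var id = e.Elements().First(c => c.Name.LocalName == "id").Value;
        return e.Element(m + "properties")
            .Elements()
            .Select(ce => string.Format("{0}|{1}", id, ce.Name.LocalName))
            .ToList();
    })
    .ToList();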
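If the goal is a single set of column names across all entries, the per-entry lists can be flattened and de-duplicated in one pass. The following is a sketch under stated assumptions: the <feed> wrapper and both namespace URIs are hypothetical, since the question's snippet does not show where the m: and d: prefixes are declared, and the sample is shortened to a few properties.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical wrapper: the <feed> root and both namespace URIs are assumptions,
// because the question's snippet does not show the m: and d: declarations.
string xml = @"<feed xmlns:m=""http://schemas.microsoft.com/ado/2007/08/dataservices/metadata""
      xmlns:d=""http://schemas.microsoft.com/ado/2007/08/dataservices"">
  <entry>
    <id>114</id>
    <m:properties>
      <d:ContentTypeID>456</d:ContentTypeID>
      <d:Name>bar.pdf</d:Name>
      <d:DocumentControllerId m:type=""Edm.Int32"" m:null=""true""></d:DocumentControllerId>
    </m:properties>
  </entry>
  <entry>
    <id>115</id>
    <m:properties>
      <d:ContentTypeID>456</d:ContentTypeID>
      <d:Name>foo.pdf</d:Name>
      <d:Title>Title 2</d:Title>
    </m:properties>
  </entry>
</feed>";

XDocument doc = XDocument.Parse(xml);
XNamespace m = doc.Root.GetNamespaceOfPrefix("m");

// Flatten the children of every m:properties element and keep each name once.
List<string> columns = doc.Descendants(m + "properties")
    .SelectMany(p => p.Elements())
    .Select(e => e.Name.LocalName)
    .Distinct()
    .ToList();

Console.WriteLine(string.Join(", ", columns));
```

This still visits every property element once, but it avoids building and cross-referencing intermediate lists by hand.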

SelectNodes not working on stackoverflow feed

I'm trying to add support for stackoverflow feeds in my rss reader but SelectNodes and SelectSingleNode have no effect. This is probably something to do with ATOM and xml namespaces that I just don't understand yet.
I have gotten it to work by removing all attributes from the feed tag, but that's a hack and I would like to do it properly. So, how do you use SelectNodes with atom feeds?
Here's a snippet of the feed.
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:thr="http://purl.org/syndication/thread/1.0">
<title type="html">StackOverflow.com - Questions tagged: c</title>
<link rel="self" href="http://stackoverflow.com/feeds/tag/c" type="application/atom+xml" />
<subtitle>Check out the latest from StackOverflow.com</subtitle>
<updated>2008-08-24T12:25:30Z</updated>
<id>http://stackoverflow.com/feeds/tag/c</id>
<creativeCommons:license>http://www.creativecommons.org/licenses/by-nc/2.5/rdf</creativeCommons:license>
<entry>
<id>http://stackoverflow.com/questions/22901/what-is-the-best-way-to-communicate-with-a-sql-server</id>
<title type="html">What is the best way to communicate with a SQL server?</title>
<category scheme="http://stackoverflow.com/feeds/tag/c/tags" term="c" /><category scheme="http://stackoverflow.com/feeds/tag/c/tags" term="c++" /><category scheme="http://stackoverflow.com/feeds/tag/c/tags" term="sql" /><category scheme="http://stackoverflow.com/feeds/tag/c/tags" term="mysql" /><category scheme="http://stackoverflow.com/feeds/tag/c/tags" term="database" />
<author><name>Ed</name></author>
<link rel="alternate" href="http://stackoverflow.com/questions/22901/what-is-the-best-way-to-communicate-with-a-sql-server" />
<published>2008-08-22T05:09:04Z</published>
<updated>2008-08-23T04:52:39Z</updated>
<summary type="html"><p>I am going to be using c/c++, and would like to know the best way to talk to a MySQL server. Should I use the library that comes with the server installation? Are they any good libraries I should consider other than the official one?</p></summary>
<link rel="replies" type="application/atom+xml" href="http://stackoverflow.com/feeds/question/22901/answers" thr:count="2"/>
<thr:total>2</thr:total>
</entry>
</feed>
The Solution
XmlDocument doc = new XmlDocument();
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("atom", "http://www.w3.org/2005/Atom");
doc.Load(feed);
// successful
XmlNodeList itemList = doc.DocumentElement.SelectNodes("atom:entry", nsmgr);
Don't confuse the namespace names in the XML file with the namespace names for your namespace manager. They're both shortcuts, and they don't necessarily have to match.
So you can register "http://www.w3.org/2005/Atom" as "atom", and then do a SelectNodes for "atom:entry".
You might need to add an XmlNamespaceManager.
XmlDocument document = new XmlDocument();
XmlNamespaceManager nsmgr = new XmlNamespaceManager(document.NameTable);
nsmgr.AddNamespace("creativeCommons", "http://backend.userland.com/creativeCommonsRssModule");
// AddNamespace for other namespaces too.
document.Load(feed);
A namespace manager is needed if you want to call SelectNodes on a document that uses namespaces. What error are you seeing?
You've guessed correctly: you're asking for nodes not in a namespace, but these nodes are in a namespace.
Description of the problem and solution: http://weblogs.asp.net/wallen/archive/2003/04/02/4725.aspx
I just want to use:
XmlNodeList itemList = xmlDoc.DocumentElement.SelectNodes("entry");
But what namespace do the entry tags fall under? I would assume xmlns="http://www.w3.org/2005/Atom", but it has no prefix, so how would I add that namespace?
XmlDocument document = new XmlDocument();
XmlNamespaceManager nsmgr = new XmlNamespaceManager(document.NameTable);
nsmgr.AddNamespace("", "http://www.w3.org/2005/Atom");
document.Load(feed);
Something like that?
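A note on the empty-prefix idea: in .NET's XPath support an unprefixed name in the expression always refers to the empty namespace, so a prefix has to be specified when an XmlNamespaceManager is used for XPath. The sketch below, run against a one-entry stand-in for the feed, compares the empty prefix with an arbitrary non-empty one (the prefix "a" and the variable names are only illustrative):

```csharp
using System;
using System.Xml;

// One-entry stand-in for the Atom feed.
string xml = @"<feed xmlns=""http://www.w3.org/2005/Atom"">
  <entry><id>http://stackoverflow.com/questions/22901</id></entry>
</feed>";

var document = new XmlDocument();
document.LoadXml(xml);

// Registering the Atom URI under the empty prefix does not change how
// the unprefixed name "entry" is resolved by XPath.
var emptyPrefix = new XmlNamespaceManager(document.NameTable);
emptyPrefix.AddNamespace("", "http://www.w3.org/2005/Atom");
int withEmptyPrefix = document.DocumentElement.SelectNodes("entry", emptyPrefix).Count;

// Registering it under any non-empty prefix and using that prefix does work.
var atomPrefix = new XmlNamespaceManager(document.NameTable);
atomPrefix.AddNamespace("a", "http://www.w3.org/2005/Atom");
int withPrefix = document.DocumentElement.SelectNodes("a:entry", atomPrefix).Count;

Console.WriteLine($"{withEmptyPrefix} {withPrefix}");
```

So the plain SelectNodes("entry") call cannot be made to match Atom entries; some prefix, however short, is needed in the expression.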
