I think I'm confused about XPath usage. I'm new to C# and XPath, so please be patient with me ;)
First, my XML file that I'm testing with:
<?xml version="1.0" encoding="ISO-8859-1"?>
<testroot>
  <testnode>
    <name>Test Node 1</name>
    <things>
      <thing>
        <number>One</number>
      </thing>
      <thing>
        <number>Two</number>
      </thing>
    </things>
  </testnode>
  <testnode>
    <name>Test Node 2</name>
    <things>
      <thing>
        <number>Three</number>
      </thing>
      <thing>
        <number>Four</number>
      </thing>
    </things>
  </testnode>
  <testnode>
    <name>Test Node 3</name>
    <things>
      <thing>
        <number>Five</number>
      </thing>
    </things>
  </testnode>
</testroot>
First I want to get the "testnode" that contains the "name" I'm interested in. I did the following, which worked correctly:
XmlNode testRoot = xmlDoc.DocumentElement.SelectSingleNode("/testroot/testnode[name=\"Test Node 1\"]");
Now I want to get all of the nodes under it that contain a "number" element. According to my reading, I should be able to do this:
XmlNodeList testNodes = testRoot.SelectNodes("number");
But that yields an empty list. The only way I got any results was to use //:
XmlNodeList testNodes = testRoot.SelectNodes("//number");
The problem is that this seems to search the siblings and ancestors of testRoot as well. When I print everything out, I get every node in the file that contains "number":
txtOutput.InnerHtml += "<p>" + testRoot.FirstChild.InnerText + "</p>";
foreach (XmlNode node in testNodes)
{
    txtOutput.InnerHtml += node.InnerText + "<br />";
}
Test Node 1
One
Two
Three
Four
Five
This behavior confuses me. Am I using XPaths improperly, or does it normally search from the absolute root no matter which XmlNode you start with?
You're trying to find direct child nodes called number. There aren't any of those: the only children of testnode are name and things. If you want descendants, say so:
XmlNodeList testNodes = testRoot.SelectNodes("descendant::number");
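The version with "//number" looks through all nodes in the document, hence your other result: an expression that starts with // is absolute, so it searches from the document root no matter which node you call SelectNodes on. To search only below testRoot with the abbreviated syntax, use ".//number" instead.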
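To see the difference between these expressions side by side, here is a minimal sketch that loads a compact copy of the question's XML, selects "Test Node 1" as the context node, and counts the matches for each expression. The class and method names (XPathContextDemo, CountMatches) are hypothetical, chosen only for illustration.

```csharp
using System.Xml;

// Hypothetical helper: evaluates an XPath expression with the
// "Test Node 1" <testnode> element as the context node.
static class XPathContextDemo
{
    // Same content as the question's file, without whitespace.
    public const string Xml =
        "<testroot>" +
        "<testnode><name>Test Node 1</name><things>" +
        "<thing><number>One</number></thing><thing><number>Two</number></thing>" +
        "</things></testnode>" +
        "<testnode><name>Test Node 2</name><things>" +
        "<thing><number>Three</number></thing><thing><number>Four</number></thing>" +
        "</things></testnode>" +
        "<testnode><name>Test Node 3</name><things>" +
        "<thing><number>Five</number></thing>" +
        "</things></testnode>" +
        "</testroot>";

    public static int CountMatches(string expression)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);
        XmlNode testNode = doc.SelectSingleNode("/testroot/testnode[name='Test Node 1']");
        // SelectNodes evaluates relative paths against testNode,
        // but a path starting with "//" always starts at the document root.
        return testNode.SelectNodes(expression).Count;
    }
}
```

With this data, CountMatches("number") returns 0, "descendant::number" and ".//number" both return 2, and "//number" returns 5.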
Having said all this, if you're using .NET 3.5 or higher I'd just use LINQ to XML and do all the querying in that. It's a much nicer API, IMO :)
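For comparison, the same query might look like this in LINQ to XML. This is a sketch rather than the answerer's code; LinqToXmlDemo and NumbersFor are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: LINQ to XML version of the XPath query above.
static class LinqToXmlDemo
{
    // Returns the text of every <number> below the <testnode> whose <name> matches.
    public static List<string> NumbersFor(string xml, string nodeName)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Root
            .Elements("testnode")
            // The explicit (string) cast yields null instead of throwing if <name> is missing.
            .Where(t => (string)t.Element("name") == nodeName)
            .Descendants("number")
            .Select(n => n.Value)
            .ToList();
    }
}
```

Called with the question's XML and "Test Node 1", this should return the strings "One" and "Two".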
I have the following XML file:
<?xml version="1.0"?>
<Master>
  <Child1>
    <Display>Some things here</Display>
    <Link>http://google.ca</Link>
    <Description>Desc</Description>
    <Image>http://google.ca</Image>
  </Child1>
</Master>
I already figured out how to get the link using doc.SelectSingleNode("Master/Child1/Link").InnerText;
But now I need a way to list every child of Master (like Child1; there are many more, and they all have subnodes such as Link, Display, ...).
I tried a bunch of things, but all I found online was how to get "name" from <Master name="Name Here"/>.
Also, I need the result as a string (so I can print it to the console instead of getting System.Xml.XmlNode).
Thanks for your help.
In XPath, * matches any element.
// Print the Link of every child of Master:
var nodes = doc.SelectNodes("Master/*/Link");
foreach (XmlNode node in nodes)
    Console.WriteLine(node.InnerText);

// Print the name of every child of Master:
var children = doc.SelectNodes("Master/*");
foreach (XmlNode child in children)
    Console.WriteLine(child.Name);
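The two loops above can be combined so that each child of Master is reported with its own Link as a plain string. This is a sketch assuming each child may or may not contain a <Link> element; MasterChildrenDemo and Describe are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: one "ElementName: link" string per child of <Master>.
static class MasterChildrenDemo
{
    public static List<string> Describe(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var lines = new List<string>();
        foreach (XmlNode child in doc.SelectNodes("Master/*"))
        {
            // Relative path: looks for <Link> directly under this child only.
            XmlNode link = child.SelectSingleNode("Link");
            // Name and InnerText are already strings, so they print as text.
            lines.Add(child.Name + ": " + (link?.InnerText ?? "(no link)"));
        }
        return lines;
    }
}
```

Each returned string can be passed straight to Console.WriteLine.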
I'm trying to get all the entry elements so I can display them. I haven't done XPath for a while, but I thought it would be fairly simple. Here's what I have so far; rssNodes.Count is 0. What am I missing?
XmlDocument rssXmlDoc = new XmlDocument();
rssXmlDoc.Load("http://www.businessopportunities.ukti.gov.uk/alertfeed/1425362.rss");
var rssNodes = rssXmlDoc.SelectNodes("feed/entry");
The XML file has the following structure:
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <!-- some other child elements -->
  <entry>
    <!-- child elements -->
  </entry>
  <entry>
    <!-- child elements -->
  </entry>
  <!-- more entry elements -->
  <!-- some other child elements -->
</feed>
You need to use namespaces properly:
var nsm = new XmlNamespaceManager(rssXmlDoc.NameTable);
nsm.AddNamespace("atom", "http://www.w3.org/2005/Atom");
var entries = rssXmlDoc.SelectNodes("/atom:feed/atom:entry", nsm);
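As a self-contained check, the sketch below runs both the question's path and the namespace-aware path against a small inline Atom document. The sample feed and the names AtomXPathDemo, CountWithoutNamespace and CountWithNamespace are hypothetical.

```csharp
using System.Xml;

// Hypothetical helper contrasting unprefixed and prefixed XPath on an Atom feed.
static class AtomXPathDemo
{
    public const string SampleFeed =
        "<feed xmlns=\"http://www.w3.org/2005/Atom\">" +
        "<title>Example</title>" +
        "<entry><title>First</title></entry>" +
        "<entry><title>Second</title></entry>" +
        "</feed>";

    public static int CountWithoutNamespace()
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleFeed);
        // In XPath 1.0, "feed" means a feed element in no namespace, so nothing matches.
        return doc.SelectNodes("feed/entry").Count;
    }

    public static int CountWithNamespace()
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleFeed);
        var nsm = new XmlNamespaceManager(doc.NameTable);
        // The prefix "atom" is arbitrary; only the URI has to match the document's xmlns.
        nsm.AddNamespace("atom", "http://www.w3.org/2005/Atom");
        return doc.SelectNodes("/atom:feed/atom:entry", nsm).Count;
    }
}
```

Here CountWithoutNamespace() returns 0, reproducing the question's symptom, while CountWithNamespace() returns 2.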
You need to respect XML namespaces with XPath.
NOTE: If the namespace of the entry element is known, see JLRishe's answer, which is more elegant in that case.
If you don't know the XML namespace beforehand, you can also ignore it by using the built-in XPath function local-name():
SelectNodes("//*[local-name()='entry']")
This will get all entry elements in the entire XML document, no matter which namespace the element belongs to.
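A minimal sketch of the local-name() approach follows; LocalNameDemo and CountEntries are hypothetical names. The trade-off is that it also picks up entry elements from unrelated namespaces, which the second usage example illustrates.

```csharp
using System.Xml;

// Hypothetical helper: counts elements named "entry" regardless of namespace.
static class LocalNameDemo
{
    public static int CountEntries(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        // local-name() compares only the part after any prefix; no XmlNamespaceManager is needed.
        return doc.SelectNodes("//*[local-name()='entry']").Count;
    }
}
```

For example, CountEntries on "<feed xmlns='http://www.w3.org/2005/Atom'><entry/><entry/></feed>" returns 2, and on "<root xmlns:x='urn:x'><x:entry/><entry/></root>" it also returns 2, because both the x:entry and the unqualified entry match.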
I am pretty new to XPath and C#, and I have the following problem:
I have to parse this file: http://static.nvd.nist.gov/feeds/xml/cpe/dictionary/official-cpe-dictionary_v2.3.xml
As you can see by opening it in a browser, this file has the following structure:
<?xml version='1.0' encoding='UTF-8'?>
<cpe-list xmlns:meta="http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2" xmlns:config="http://scap.nist.gov/schema/configuration/0.1" xmlns:ns6="http://scap.nist.gov/schema/scap-core/0.1" xmlns:scap-core="http://scap.nist.gov/schema/scap-core/0.3" xmlns="http://cpe.mitre.org/dictionary/2.0" xmlns:cpe-23="http://scap.nist.gov/schema/cpe-extension/2.3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://scap.nist.gov/schema/configuration/0.1 http://nvd.nist.gov/schema/configuration_0.1.xsd http://cpe.mitre.org/dictionary/2.0 http://scap.nist.gov/schema/cpe/2.3/cpe-dictionary_2.3.xsd http://scap.nist.gov/schema/scap-core/0.3 http://nvd.nist.gov/schema/scap-core_0.3.xsd http://scap.nist.gov/schema/scap-core/0.1 http://nvd.nist.gov/schema/scap-core_0.1.xsd http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2 http://nvd.nist.gov/schema/cpe-dictionary-metadata_0.2.xsd http://scap.nist.gov/schema/cpe-extension/2.3 http://scap.nist.gov/schema/cpe/2.3/cpe-dictionary-extension_2.3.xsd">
  <generator>
    <product_name>National Vulnerability Database (NVD)</product_name>
    <product_version>2.22.0-SNAPSHOT (PRODUCTION)</product_version>
    <schema_version>2.3</schema_version>
    <timestamp>2014-03-05T05:13:33.550Z</timestamp>
  </generator>
  <cpe-item name="cpe:/a:1024cms:1024_cms:0.7">
    <title xml:lang="en-US">1024cms.org 1024 CMS 0.7</title>
    <cpe-23:cpe23-item name="cpe:2.3:a:1024cms:1024_cms:0.7:*:*:*:*:*:*:*"/>
  </cpe-item>
  <cpe-item name="cpe:/a:1024cms:1024_cms:1.2.5">
    <title xml:lang="en-US">1024cms.org 1024 CMS 1.2.5</title>
    <cpe-23:cpe23-item name="cpe:2.3:a:1024cms:1024_cms:1.2.5:*:*:*:*:*:*:*"/>
  </cpe-item>
  <cpe-item name="cpe:/a:1024cms:1024_cms:1.3.1">
    <title xml:lang="en-US">1024cms.org 1024 CMS 1.3.1</title>
    <cpe-23:cpe23-item name="cpe:2.3:a:1024cms:1024_cms:1.3.1:*:*:*:*:*:*:*"/>
  </cpe-item>
  .............................................................
  .............................................................
  .............................................................
  <cpe-item name="cpe:/h:zyxel:p-660hw_t3:v2">
    <title xml:lang="en-US">ZyXEL P-660HW T3 Model v2</title>
    <cpe-23:cpe23-item name="cpe:2.3:h:zyxel:p-660hw_t3:v2:*:*:*:*:*:*:*"/>
  </cpe-item>
</cpe-list>
So now, using XPath, I have to obtain the list of all <cpe-item> tags (excluding the <generator> tag, which is the first tag inside my <cpe-list> tag).
In my code I have something like this:
XmlDocument document = new XmlDocument(); // Represents an XML document
document.Load(sourceXML.FullName); // Loads the XML document from the specified file
// Add the namespaces:
XmlNamespaceManager nsmgr = new XmlNamespaceManager(document.NameTable);
nsmgr.AddNamespace("ns6", "http://scap.nist.gov/schema/scap-core/0.1");
nsmgr.AddNamespace("cpe-23", "http://scap.nist.gov/schema/cpe-extension/2.3");
nsmgr.AddNamespace("ns", "http://cpe.mitre.org/dictionary/2.0");
nsmgr.AddNamespace("meta", "http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2");
nsmgr.AddNamespace("scap-core", "http://scap.nist.gov/schema/scap-core/0.3");
nsmgr.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
nsmgr.AddNamespace("config", "http://scap.nist.gov/schema/configuration/0.1");
/* nodeList is the collection that contains all the <cpe-item> tags that are
 * inside the root <cpe-list> tag in the XML document:
 */
XmlNodeList nodeList;
nodeList = document.DocumentElement.SelectNodes("//ns:cpe-list/ns:cpe-item", nsmgr);
long conta = 0;
So I am using this line to select all the <cpe-item> tags that are inside the <cpe-list> tag:
nodeList = document.DocumentElement.SelectNodes("//ns:cpe-list/ns:cpe-item", nsmgr);
It seems to work, but I am not sure it is correct: when I inspect it in the Visual Studio debugger, my XmlNodeList nodeList contains 80588 elements (the file is very big, but that still seems like too many elements to me!).
Another doubt concerns the ns namespace prefix used in the code above (this is not my code; I have to work on it).
Why does that code put the ns prefix in front of cpe-list and cpe-item, when the XML I have to parse simply contains something like:
<cpe-item name="cpe:/a:1024cms:1024_cms:1.3.1">
  <title xml:lang="en-US">1024cms.org 1024 CMS 1.3.1</title>
  <cpe-23:cpe23-item name="cpe:2.3:a:1024cms:1024_cms:1.3.1:*:*:*:*:*:*:*"/>
</cpe-item>
which doesn't begin with an ns prefix? Why is it used?
My last question: how can I access the inner text of the <title> element?
I am trying something like this, but it doesn't work:
XmlNodeList nodeList;
nodeList = document.DocumentElement.SelectNodes("//ns:cpe-list/ns:cpe-item", nsmgr);
long conta = 0;
DataModel.Vulnerability.CPE currentCPE;
foreach (XmlNode node in nodeList)
{
    // Access to the name ATTRIBUTE of the <cpe-item> tag:
    Debug.WriteLine(String.Format("[{0:N0}] CPE: {1} Title: {2}", conta, node.Attributes["name"].Value, node.FirstChild.FirstChild.Value));
    // Access to the <title> tag content:
    //Debug.WriteLine(String.Format("[{0:N0}] Title: {1} Title: {2}", conta, node.SelectSingleNode("./title", nsmgr)));
    XmlNode titleNode = node.SelectSingleNode("./title", nsmgr);
    conta++;
}
When this code runs I have no problem accessing the name attribute of the current <cpe-item> element in my list, but I can't access the content of the <title> tag, because when this line executes:
XmlNode titleNode = node.SelectSingleNode("./title", nsmgr);
it returns null.
What is the problem? What am I missing? How can I solve?
Tnx
Andrea
1. Your XPath looks fine given the XML snippet posted in this question, and as far as I can see it should return the correct number of elements. I can't tell more than that; you'll need to check further yourself.
2. Your XML has a default namespace (xmlns="....."). In XML, every element without a prefix is in the default namespace. In XPath 1.0 (which XmlDocument uses), however, an element name without a prefix means an element in no namespace. Because of this difference you have to define a prefix such as ns that points to the default namespace URI and use it in your XPath expressions.
3. This is related to point 2: every element without a prefix is in the default namespace, and that includes <title>. Hence you need to add the ns prefix in the XPath expression: ./ns:title
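Putting the ns:title fix together, here is a sketch that reads each item's name attribute and title text from a small excerpt shaped like the CPE dictionary. The excerpt is trimmed from the question's sample, and CpeTitleDemo and ReadItems are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: reads name and <title> from each <cpe-item> in the default namespace.
static class CpeTitleDemo
{
    public const string Sample =
        "<cpe-list xmlns=\"http://cpe.mitre.org/dictionary/2.0\"" +
        " xmlns:cpe-23=\"http://scap.nist.gov/schema/cpe-extension/2.3\">" +
        "<generator><product_name>National Vulnerability Database (NVD)</product_name></generator>" +
        "<cpe-item name=\"cpe:/a:1024cms:1024_cms:0.7\">" +
        "<title xml:lang=\"en-US\">1024cms.org 1024 CMS 0.7</title>" +
        "<cpe-23:cpe23-item name=\"cpe:2.3:a:1024cms:1024_cms:0.7:*:*:*:*:*:*:*\"/>" +
        "</cpe-item>" +
        "<cpe-item name=\"cpe:/a:1024cms:1024_cms:1.2.5\">" +
        "<title xml:lang=\"en-US\">1024cms.org 1024 CMS 1.2.5</title>" +
        "</cpe-item>" +
        "</cpe-list>";

    // Returns "name | title" for every <cpe-item>; <generator> is not selected.
    public static List<string> ReadItems(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        // "ns" is bound to the URI declared by xmlns="..." on <cpe-list>.
        nsmgr.AddNamespace("ns", "http://cpe.mitre.org/dictionary/2.0");

        var result = new List<string>();
        foreach (XmlNode item in doc.SelectNodes("/ns:cpe-list/ns:cpe-item", nsmgr))
        {
            // <title> is unprefixed, so it is also in the default namespace.
            XmlNode title = item.SelectSingleNode("ns:title", nsmgr);
            result.Add(item.Attributes["name"].Value + " | " + title.InnerText);
        }
        return result;
    }
}
```

On this excerpt ReadItems returns two strings, the first being "cpe:/a:1024cms:1024_cms:0.7 | 1024cms.org 1024 CMS 0.7".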
Ideally, a post should contain no more than one specific question. Answering a bunch of questions in one post is rarely useful for future visitors; it tends to confuse them instead. Remember that we are not only solving your problem here, but also trying to build a knowledge base that will hopefully be useful to others with a similar problem.
I have the following XML structure:
<?xml version="1.0" encoding="utf-16"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <soap:Body>
    <StoreResponse xmlns="http://www.some-site.com">
      <StoreResult>
        <Message />
        <Code>OK</Code>
      </StoreResult>
    </StoreResponse>
  </soap:Body>
</soap:Envelope>
I need to get the InnerText of Code out of this document, and I need help with the appropriate XPath expression.
I'm really confused by XML namespaces. While working on a previous namespace problem in another XML document, I learned that even if there's nothing in front of Code (e.g. ns:Code), it is still part of a namespace defined by the xmlns attribute in its parent node. Now there are multiple xmlns attributes defined in the ancestors of Code. Which namespace do I need to specify in an XPath expression? Is there such a thing as a "primary namespace"? Do child nodes inherit the (primary) namespace of their parents?
The namespace of the <Code> element is http://www.some-site.com. xmlns:xxx means that names prefixed by xxx: (like soap:Body) are in that namespace. xmlns by itself declares the default namespace for names without any prefix.
An example using an XDocument (LINQ to XML) approach:
XNamespace ns = "http://www.some-site.com";
var document = XDocument.Parse("your-xml-string");
var elements = document.Descendants(ns + "StoreResult");
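Taking that one step further to the actual Code value, a sketch might look like this; SoapCodeDemo and GetCode are hypothetical names, and the input is assumed to be the envelope from the question as a string.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: reads <Code> from the some-site namespace with LINQ to XML.
static class SoapCodeDemo
{
    public static string GetCode(string xml)
    {
        XNamespace ns = "http://www.some-site.com";
        XDocument document = XDocument.Parse(xml);
        // Casting a null XElement to string yields null instead of throwing.
        return (string)document.Descendants(ns + "Code").FirstOrDefault();
    }
}
```

For the envelope in the question this returns "OK"; for a document without such an element it returns null.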
Descendant elements inherit the default namespace from the nearest ancestor that declares one with xmlns="...". In your example you will need two namespaces: one for the SOAP envelope and a second for "some-site".
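With XmlDocument and XPath, the two namespaces become two prefixes registered on an XmlNamespaceManager. The prefixes "soap" and "site" below are arbitrary choices, and SoapXPathDemo and GetCode are hypothetical names; this is a sketch, not the only valid path.

```csharp
using System.Xml;

// Hypothetical helper: full XPath to <Code> using one prefix per namespace.
static class SoapXPathDemo
{
    public static string GetCode(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        // Prefixes need not match the document's; only the URIs must.
        nsmgr.AddNamespace("soap", "http://www.w3.org/2003/05/soap-envelope");
        nsmgr.AddNamespace("site", "http://www.some-site.com");
        XmlNode code = doc.SelectSingleNode(
            "/soap:Envelope/soap:Body/site:StoreResponse/site:StoreResult/site:Code", nsmgr);
        return code?.InnerText;
    }
}
```

For the question's envelope, GetCode returns "OK".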
Here's an option I found in this question: Weirdness with XDocument, XPath and namespaces
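using System.Xml.Linq;
using System.Xml.XPath; // XPathSelectElement is an extension method defined here

var xml = "<your xml>";
var doc = XDocument.Parse(xml); // Could use .Load() here too
var code = doc.XPathSelectElement("//*[local-name()='Code']");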
I'm trying to read an XML RSS feed from a website. I use an async download and create an XDocument with the XDocument.Parse() method.
The document seems to be very simple, like this:
<root>
  <someAttribute></someAttribute>
  <item>...</item>
  <item>...</item>
</root>
Now I want to read out all the items, so I tried:
foreach (XElement NewsEntry in xDocument.Descendants("item"))
but this doesn't work. Then I found a post on this board suggesting the qualified name, because there are some namespaces defined in the root element:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns="http://purl.org/rss/1.0/">
Well, I tried all three available namespaces, but nothing worked for me:
XName itemName = XName.Get("item", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
XName itemName2 = XName.Get("item", "http://purl.org/dc/elements/1.1/");
XName itemName3 = XName.Get("item", "http://purl.org/rss/1.0/modules/syndication/");
Any help would be appreciated.
(Usually I do XML analysis with regex, but this time I'm developing for a mobile device and therefore need to care about performance.)
You have not tried the default namespace declared at the end of the rdf:RDF start tag:
xmlns="http://purl.org/rss/1.0/"
That is the one you need: elements without a prefix, such as item, belong to the default namespace, which is why no namespace prefix appears in front of their names.
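A sketch of the query using that default namespace (equivalent to XName.Get("item", "http://purl.org/rss/1.0/")) might look like this. Rss10Demo and ItemTitles are hypothetical names, and reading each item's title is an illustrative assumption about what "do stuff" would be.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: reads RSS 1.0 items from the default namespace.
static class Rss10Demo
{
    // URI from the xmlns="..." declaration on <rdf:RDF>.
    static readonly XNamespace Rss = "http://purl.org/rss/1.0/";

    public static List<string> ItemTitles(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        // Both item and its title child are unprefixed, so both use the Rss namespace.
        return doc.Descendants(Rss + "item")
                  .Select(item => (string)item.Element(Rss + "title"))
                  .ToList();
    }
}
```

Note that the channel's own title is not returned, because only title elements inside item are read.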
This is not directly a solution to the XDocument RSS reading problem, but why aren't you using the provided SyndicationFeed class to load the feed? http://msdn.microsoft.com/en-us/library/system.servicemodel.syndication.syndicationfeed.aspx
Try this:
var elements = from p in xDocument.Root.Elements()
               where p.Name.LocalName == "item"
               select p;
foreach (var element in elements)
{
    // Do stuff
}