Use XPath to find a node by name attribute value - c#

I am trying to find a node by the name attribute value.
Here is a sample of the xml document:
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE kfx:XMLRELEASE SYSTEM "K000A004.dtd">
<kfx:XMLRELEASE xmlns:kfx="http://www.kofax.com/dtd/">
<kfx:KOFAXXML>
<kfx:BATCHCLASS ID="00000008" NAME="CertficateOfLiability">
<kfx:DOCUMENTS>
<kfx:DOCUMENT DOCID="00000006" DOCUMENTCLASSNAME="COL">
<kfx:DOCUMENTDATA>
<kfx:DOCUMENTFIELD NAME="Producer Name" VALUE="Howalt+McDowell Insurance" />
...
....
Here is my attempted expression:
var xml = XDocument.Load(new StreamReader("C:\\Users\\Matthew_cox\\Documents\\test.xml"));
XNamespace ns = "http://www.kofax.com/dtd/";
XmlNamespaceManager nsm = new XmlNamespaceManager(xml.CreateNavigator().NameTable);
nsm.AddNamespace("kfx", ns.NamespaceName);
var docs = xml.Descendants(ns + "DOCUMENT");
foreach(var doc in docs)
{
doc.XPathSelectElement("/DOCUMENTDATA/DOCUMENTFIELD/[@name='Producer Name']", nsm); // this line throws: Expression must evaluate to a node-set.
}

XML is case-sensitive, and in the XML provided kfx:DOCUMENTFIELD has a NAME attribute, not name. Your XPath also doesn't reference the namespace.
Try this XPath:
kfx:DOCUMENTDATA/kfx:DOCUMENTFIELD[@NAME = 'Producer Name']

I see two things wrong.
First of all, you are selecting starting with "/", which selects from the document root rather than from the current DOCUMENT element, so strip the leading slash.
Secondly, the expression is a bit weird. I would put the condition directly on DOCUMENTFIELD. (In XPath 1.0 every step needs a node test, so .../[..] is not shorthand for .../node()[..] or .../*[..]; it is simply not a valid step.)
As Kirill notes, you should also watch the casing and namespaces, but this should stop C# complaining about expressions not evaluating to node sets:
kfx:DOCUMENTDATA/kfx:DOCUMENTFIELD[@NAME = 'Producer Name']
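Putting both answers together, a minimal sketch of the corrected query could look like the following. The inline XML is a trimmed stand-in for the Kofax document (the DOCTYPE is left out so nothing has to be resolved), and the class and method names are invented for illustration only.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class ProducerLookup
{
    // Trimmed stand-in for the document in the question (assumption: DOCTYPE omitted).
    public const string SampleXml = @"<kfx:XMLRELEASE xmlns:kfx=""http://www.kofax.com/dtd/"">
  <kfx:KOFAXXML>
    <kfx:BATCHCLASS ID=""00000008"" NAME=""CertficateOfLiability"">
      <kfx:DOCUMENTS>
        <kfx:DOCUMENT DOCID=""00000006"" DOCUMENTCLASSNAME=""COL"">
          <kfx:DOCUMENTDATA>
            <kfx:DOCUMENTFIELD NAME=""Producer Name"" VALUE=""Howalt+McDowell Insurance"" />
          </kfx:DOCUMENTDATA>
        </kfx:DOCUMENT>
      </kfx:DOCUMENTS>
    </kfx:BATCHCLASS>
  </kfx:KOFAXXML>
</kfx:XMLRELEASE>";

    // Hypothetical helper: returns the VALUE of the "Producer Name" field, or null.
    public static string FindProducerName(string xmlText)
    {
        XDocument xml = XDocument.Parse(xmlText);
        XNamespace ns = "http://www.kofax.com/dtd/";

        // Bind the kfx prefix so the XPath expression can use it.
        var nsm = new XmlNamespaceManager(new NameTable());
        nsm.AddNamespace("kfx", ns.NamespaceName);

        foreach (XElement doc in xml.Descendants(ns + "DOCUMENT"))
        {
            // Relative path (no leading slash), prefixed names, upper-case @NAME.
            XElement field = doc.XPathSelectElement(
                "kfx:DOCUMENTDATA/kfx:DOCUMENTFIELD[@NAME = 'Producer Name']", nsm);
            if (field != null)
                return (string)field.Attribute("VALUE");
        }
        return null;
    }

    public static void Main()
    {
        Console.WriteLine(FindProducerName(SampleXml));
    }
}
```

The path is evaluated relative to each DOCUMENT element, which is why it must not start with a slash.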

Make selectnode insensitive to uppercase or lowercase with namespaces

I have the following XML:
<myfile:bookstore>
<myfile:books>
<myfile:book> Book 1</myfile:book>
<myfile:book> Book 2</myfile:book>
</myfile:books>
</myfile:bookstore>
And the following code to select the <myfile:books> node:
XmlNamespaceManager nsmgr = new
XmlNamespaceManager(el.OwnerDocument.NameTable);
nsmgr.AddNamespace("myfile",
el.OwnerDocument.DocumentElement.NamespaceURI);
var node = el.SelectSingleNode(@"/myfile:bookstore/myfile:books", nsmgr);
How can I make this work when the node name is myfile:boOkS or myfile:BOOKS, that is, insensitive to upper and lower case?
Another question: is my namespace manager right? Can it be simpler?
You can use the local-name() and namespace-uri() functions to return element names and namespaces in an XPath query, and then use the translate() function to lower-case the local name.
Thus the following should work:
var elementName = "books"; // Or whatever
var nameSpaceUri = el.OwnerDocument.DocumentElement.NamespaceURI;
var xpathQuery = string.Format(@"/myfile:bookstore/*[translate(local-name(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')='{0}' and namespace-uri()='{1}']", elementName, nameSpaceUri);
var node = el.SelectSingleNode(xpathQuery, nsmgr);
Sample working .Net fiddle.
Note that lowercasing the return value of the name() function should be avoided in cases like this. E.g. you might be tempted to do the following:
var node = el.SelectSingleNode(@"/myfile:bookstore/*[translate(name(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz') = 'myfile:books']", nsmgr);
However, this should not be used because it hardcodes the namespace prefix myfile: inside the string literal 'myfile:books'. If the input XML were later modified to use a different prefix, or no prefix at all, such as:
<bookstore xmlns="http://MyRootNameSpace">
<Books>
<Book> Book 1</Book>
<Book> Book 2</Book>
</Books>
</bookstore>
Then your XPath query would break despite the fact that the new XML is semantically identical to the old.
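To see that the local-name() / namespace-uri() approach survives a prefix change, here is a rough sketch that runs the same query against a prefixed and an unprefixed document. The class name, method name and the two inline documents are assumptions made up for this example. The query uses no prefixes at all, so no XmlNamespaceManager is needed. It assumes elementName contains no quote characters, and translate() as written only folds ASCII letters.

```csharp
using System;
using System.Xml;

public static class CaseInsensitiveSelect
{
    const string Upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const string Lower = "abcdefghijklmnopqrstuvwxyz";

    // Hypothetical helper: returns the first child of the root element whose
    // lower-cased local name equals elementName and whose namespace is the
    // same as the root's namespace, or null if there is none.
    public static XmlNode SelectChildIgnoreCase(string xmlText, string elementName)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xmlText);
        string nsUri = doc.DocumentElement.NamespaceURI;

        string query = string.Format(
            "/*/*[translate(local-name(), '{0}', '{1}')='{2}' and namespace-uri()='{3}']",
            Upper, Lower, elementName.ToLowerInvariant(), nsUri);
        return doc.SelectSingleNode(query);
    }

    public static void Main()
    {
        string prefixed = @"<myfile:bookstore xmlns:myfile=""http://MyRootNameSpace""><myfile:BOOKS/></myfile:bookstore>";
        string unprefixed = @"<bookstore xmlns=""http://MyRootNameSpace""><Books/></bookstore>";

        Console.WriteLine(SelectChildIgnoreCase(prefixed, "books").Name);   // myfile:BOOKS
        Console.WriteLine(SelectChildIgnoreCase(unprefixed, "books").Name); // Books
    }
}
```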

XML: Retrieve a particular value from xml

From the following XML, I want to find a value based on the Employer.
<?xml version="1.0" encoding="UTF-8"?>
<Document>
<Details>
<Employer>Taxes</Employer>
<Adr>
<Strt>Street</Strt>
<Twn>Town</Twn>
</Adr>
</Details>
<DetailsAcct>
<Recd>
<Payroll>
<Id>9</Id>
</Payroll>
</Recd>
<br>
<xy>A</xy>
</br>
</DetailsAcct>
</Document>
the C# code I applied is
detail = root.SelectSingleNode($"//w:Document//w:Employer[contains(text(), 'Taxes']/ancestor::Employer",nsmgr);
But it gives me an invalid token error.
What am I missing?
The error was due to [contains(...]: notice that the closing parenthesis is missing. And since you want to return the Employer element itself, there is no need for ancestor::Employer here:
//w:Document//w:Employer[contains(., 'Taxes')]
If the XML posted resembles the structure of the actual XML (except for the namespaces), it is better to use a more specific XPath, i.e. avoid the costly // :
/w:Document/w:Details/w:Employer[contains(., 'Taxes')]
An alternative is to use LINQ to XML.
If the XML is in a string:
string xml = "<xml goes here>";
XDocument document = XDocument.Parse(xml);
XElement element = document.Descendants("Employer").First();
string value = element.Value;
If the XML is in a .xml file:
XDocument document = XDocument.Load("xmlfile.xml");
XElement element = document.Descendants("Employer").First();
string value = element.Value;
You can also find an employer element with a specific value, if that's what you need:
XElement element = document.Descendants("Employer").First(e => e.Value == "Taxes");
Note: this will throw an exception if no element is found with the specified value. If that is not acceptable, then you can replace .First(...) with .FirstOrDefault(...) which will simply return null if no element is found.
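As a small sketch of the FirstOrDefault variant, assuming the XML has no namespace (as in the sample posted; if the real document declares one, the element name would need to be qualified with an XNamespace). The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class EmployerLookup
{
    // Hypothetical helper: returns the text of the first Employer element
    // whose value equals 'wanted', or null when there is no such element.
    public static string FindEmployer(string xmlText, string wanted)
    {
        XDocument document = XDocument.Parse(xmlText);
        XElement element = document.Descendants("Employer")
                                   .FirstOrDefault(e => e.Value == wanted);
        return element?.Value; // null-conditional: no exception when nothing matched
    }

    public static void Main()
    {
        string xml = "<Document><Details><Employer>Taxes</Employer></Details></Document>";
        Console.WriteLine(FindEmployer(xml, "Taxes") ?? "(not found)");
        Console.WriteLine(FindEmployer(xml, "Other") ?? "(not found)");
    }
}
```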

Get nodes from xml files

How to parse the xml file?
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>link</loc>
<lastmod>2011-08-17T08:23:17+00:00</lastmod>
</sitemap>
<sitemap>
<loc>link</loc>
<lastmod>2011-08-18T08:23:17+00:00</lastmod>
</sitemap>
</sitemapindex>
I am new to XML, I tried this, but it seems to be not working :
XmlDocument xml = new XmlDocument(); //* create an xml document object.
xml.Load("sitemap.xml");
XmlNodeList xnList = xml.SelectNodes("/sitemapindex/sitemap");
foreach (XmlNode xn in xnList)
{
String loc= xn["loc"].InnerText;
String lastmod= xn["lastmod"].InnerText;
}
The problem is that the sitemapindex element defines a default namespace. You need to specify the namespace when you select the nodes, otherwise it will not find them. For instance:
XmlDocument xml = new XmlDocument();
xml.Load("sitemap.xml");
XmlNamespaceManager manager = new XmlNamespaceManager(xml.NameTable);
manager.AddNamespace("s", "http://www.sitemaps.org/schemas/sitemap/0.9");
XmlNodeList xnList = xml.SelectNodes("/s:sitemapindex/s:sitemap", manager);
Normally, when using the XmlNamespaceManager, you can leave the prefix as an empty string to specify that you want that namespace to be the default namespace. So you would think you'd be able to do something like this:
// WON'T WORK
XmlDocument xml = new XmlDocument();
xml.Load("sitemap.xml");
XmlNamespaceManager manager = new XmlNamespaceManager(xml.NameTable);
manager.AddNamespace("", "http://www.sitemaps.org/schemas/sitemap/0.9"); //Empty prefix
XmlNodeList xnList = xml.SelectNodes("/sitemapindex/sitemap", manager); //No prefixes in XPath
However, if you try that code, you'll find that it won't find any matching nodes. The reason for this is that in XPath 1.0 (which is what XmlDocument implements), when no namespace is provided, it always uses the null namespace, not the default namespace. So, it doesn't matter if you specify a default namespace in the XmlNamespaceManager, it's not going to be used by XPath, anyway. To quote the relevant paragraph from the Official XPath Specification:
A QName in the node test is expanded into an expanded-name using the
namespace declarations from the expression context. This is the same
way expansion is done for element type names in start and end-tags
except that the default namespace declared with xmlns is not used: if
the QName does not have a prefix, then the namespace URI is null (this
is the same way attribute names are expanded). It is an error if the
QName has a prefix for which there is no namespace declaration in the
expression context.
Therefore, when the elements you are reading belong to a namespace, you can't avoid putting the namespace prefix in your XPath statements. However, if you don't want to bother putting the namespace URI in your code, you can just use the XmlDocument object to return the URI of the root element, which in this case, is what you want. For instance:
XmlDocument xml = new XmlDocument();
xml.Load("sitemap.xml");
XmlNamespaceManager manager = new XmlNamespaceManager(xml.NameTable);
manager.AddNamespace("s", xml.DocumentElement.NamespaceURI); //Using xml's properties instead of hard-coded URI
XmlNodeList xnList = xml.SelectNodes("/s:sitemapindex/s:sitemap", manager);
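For completeness, here is a sketch of the whole loop with the namespace manager in place. The XML is loaded from a string so the example is self-contained (the original loads sitemap.xml), and the class and method names are made up. Note that the child lookups need the s: prefix as well.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class SitemapReader
{
    // Hypothetical helper: returns one "loc | lastmod" string per sitemap entry.
    public static List<string> ReadEntries(string xmlText)
    {
        var xml = new XmlDocument();
        xml.LoadXml(xmlText);

        // Bind a prefix of our choosing to the document's default namespace.
        var manager = new XmlNamespaceManager(xml.NameTable);
        manager.AddNamespace("s", xml.DocumentElement.NamespaceURI);

        var entries = new List<string>();
        foreach (XmlNode xn in xml.SelectNodes("/s:sitemapindex/s:sitemap", manager))
        {
            // Relative XPath lookups are namespace-aware too, so prefix them.
            string loc = xn.SelectSingleNode("s:loc", manager).InnerText;
            string lastmod = xn.SelectSingleNode("s:lastmod", manager).InnerText;
            entries.Add(loc + " | " + lastmod);
        }
        return entries;
    }

    public static void Main()
    {
        string xmlText = @"<sitemapindex xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">
  <sitemap><loc>link</loc><lastmod>2011-08-17T08:23:17+00:00</lastmod></sitemap>
  <sitemap><loc>link</loc><lastmod>2011-08-18T08:23:17+00:00</lastmod></sitemap>
</sitemapindex>";
        foreach (string entry in ReadEntries(xmlText))
            Console.WriteLine(entry);
    }
}
```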
Sitemap has two child nodes, "loc" and "lastmod". The nodes that you are accessing are "name" and "url", which is why you are not getting any result. Also, in your XML file the last sitemap tag is not closed properly with a corresponding </sitemap>. Kindly try xn["loc"].InnerText and see if you get the desired result.
I would definitely use LINQ to XML instead of the older XmlDocument-based XML API. You can accomplish what you are looking to do using the following code. Notice that I changed the names of the elements whose values I am reading to 'loc' and 'lastmod', because this is what is in your sample XML ('name' and 'url' did not exist). Because the document declares a default namespace, the element names also have to be qualified with it:
XElement element = XElement.Parse(XMLFILE); // XMLFILE holds the XML text
XNamespace ns = element.Name.Namespace; // the sitemap default namespace
IEnumerable<XElement> list = element.Elements(ns + "sitemap");
foreach (XElement e in list)
{
string loc = e.Element(ns + "loc").Value;
string lastmod = e.Element(ns + "lastmod").Value;
}

How do I process only certain XML nodes?

This is my XML snippet (it has a root element).
<ItemAttributes>
<Author>Ellen Galinsky</Author>
<Binding>Paperback</Binding>
<Brand>Harper Paperbacks</Brand>
<EAN>9780061732324</EAN>
<EANList>
<EANListElement>9780061732324</EANListElement>
</EANList>
<Edition>1</Edition>
<Feature>ISBN13: 9780061732324</Feature>
<Feature>Condition: New</Feature>
<Feature>Notes: BRAND NEW FROM PUBLISHER! 100% Satisfaction Guarantee. Tracking provided on most orders. Buy with Confidence! Millions of books sold!</Feature>
<ISBN>006173232X</ISBN>
<IsEligibleForTradeIn>1</IsEligibleForTradeIn>
<ItemDimensions>
<Height Units="hundredths-inches">112</Height>
<Length Units="hundredths-inches">904</Length>
<Weight Units="hundredths-pounds">98</Weight>
<Width Units="hundredths-inches">602</Width>
</ItemDimensions>
<Label>William Morrow Paperbacks</Label>
<ListPrice>
<Amount>1699</Amount>
<CurrencyCode>USD</CurrencyCode>
<FormattedPrice>$16.99</FormattedPrice>
</ListPrice>
<Manufacturer>William Morrow Paperbacks</Manufacturer>
<MPN>006173232X</MPN>
<NumberOfItems>1</NumberOfItems>
<NumberOfPages>400</NumberOfPages>
<PackageDimensions>
<Height Units="hundredths-inches">120</Height>
<Length Units="hundredths-inches">880</Length>
<Weight Units="hundredths-pounds">95</Weight>
<Width Units="hundredths-inches">590</Width>
</PackageDimensions>
<PartNumber>006173232X</PartNumber>
<ProductGroup>Book</ProductGroup>
<ProductTypeName>ABIS_BOOK</ProductTypeName>
<PublicationDate>2010-04-20</PublicationDate>
<Publisher>William Morrow Paperbacks</Publisher>
<ReleaseDate>2010-04-20</ReleaseDate>
<SKU>mon0000013657</SKU>
<Studio>William Morrow Paperbacks</Studio>
<Title>Mind in the Making: The Seven Essential Life Skills Every Child Needs</Title>
</ItemAttributes>
There are multiple "ItemAttributes" nodes, each having a different "ProductGroup" node. I want only the first "ItemAttributes" node where "ProductGroup" = "Book":
This is my C# code:
XPathDocument doc = new XPathDocument(sr);
XPathNavigator nav = doc.CreateNavigator();
// Compile a standard XPath expression
XPathExpression expr;
expr = nav.Compile("//ItemAttributes[contains(ProductGroup, 'Book')]");
XPathNodeIterator iterator = nav.Select(expr);
// Iterate on the node set
try {
int x = iterator.Count; // <----------- count = 0
while (iterator.MoveNext()) { // <----------- finds nothing!
XPathNavigator nav2 = iterator.Current.Clone();
listBox1.Items.Add("price: " + nav2.Value);
}
}
catch (Exception ex) {
Console.WriteLine(ex.Message);
}
I know my code isn't correct, but I don't understand why the iterator.Count is zero!
using System.Xml.Linq;
XDocument xdoc = XDocument.Load(new StringReader(xmlstr));
var foundNode = xdoc
.Descendants("ItemAttributes")
.Where(node => node.Element("ProductGroup").Value == "Book")
.First();
var price = foundNode.Element("ListPrice").Element("FormattedPrice").Value;
--EDIT--
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;
using System.IO;
namespace ConsoleApplication4
{
class Program
{
static void Main(string[] args)
{
string xmlstr = @"
<root>
<ItemAttributes>
<Author>Ellen Galinsky</Author>
<Binding>Paperback</Binding>
<Brand>Harper Paperbacks</Brand>
<EAN>9780061732324</EAN>
<EANList>
<EANListElement>9780061732324</EANListElement>
</EANList><Edition>1</Edition>
<Feature>ISBN13: 9780061732324</Feature>
<Feature>Condition: New</Feature>
<Feature>Notes: BRAND NEW FROM PUBLISHER! 100% Satisfaction Guarantee. Tracking provided on most orders. Buy with Confidence! Millions of books sold!</Feature>
<ISBN>006173232X</ISBN>
<IsEligibleForTradeIn>1</IsEligibleForTradeIn>
<ItemDimensions>
<Height Units=""hundredths-inches"">112</Height>
<Length Units=""hundredths-inches"">904</Length>
<Weight Units=""hundredths-pounds"">98</Weight>
<Width Units=""hundredths-inches"">602</Width>
</ItemDimensions>
<Label>William Morrow Paperbacks</Label>
<ListPrice>
<Amount>1699</Amount>
<CurrencyCode>USD</CurrencyCode>
<FormattedPrice>$16.99</FormattedPrice>
</ListPrice>
<Manufacturer>William Morrow Paperbacks</Manufacturer>
<MPN>006173232X</MPN>
<NumberOfItems>1</NumberOfItems>
<NumberOfPages>400</NumberOfPages>
<PackageDimensions>
<Height Units=""hundredths-inches"">120</Height>
<Length Units=""hundredths-inches"">880</Length>
<Weight Units=""hundredths-pounds"">95</Weight>
<Width Units=""hundredths-inches"">590</Width>
</PackageDimensions>
<PartNumber>006173232X</PartNumber>
<ProductGroup>Book</ProductGroup>
<ProductTypeName>ABIS_BOOK</ProductTypeName>
<PublicationDate>2010-04-20</PublicationDate>
<Publisher>William Morrow Paperbacks</Publisher>
<ReleaseDate>2010-04-20</ReleaseDate>
<SKU>mon0000013657</SKU>
<Studio>William Morrow Paperbacks</Studio>
<Title>Mind in the Making: The Seven Essential Life Skills Every Child Needs</Title>
</ItemAttributes>
</root>
";
XDocument xdoc = XDocument.Load(new StringReader(xmlstr));
var foundNode = xdoc
.Descendants("ItemAttributes")
.Where(node => node.Element("ProductGroup").Value == "Book")
.First();
Console.WriteLine(foundNode.Element("ListPrice").Element("FormattedPrice").Value);
Console.ReadLine();
}
}
}
--EDIT2--
XDocument xdoc = XDocument.Load("http://ecs.amazonaws.com/onca/xml?AWSAccessKeyId=AKIAIAAFYAPOR6SX5GOA&AssociateTag=pragbook-20&IdType=ISBN&ItemId=9780061732324&MerchantId=All&Operation=ItemLookup&ResponseGroup=Medium&SearchIndex=Books&Service=AWSECommerceService&Timestamp=2012-02-26T20%3A18%3A37Z&Version=2011-08-01&Signature=r7yE7BQI44CqWZAiK%2FWumF3N4iutOj3re9wZtESOaKs%3D");
XNamespace ns = XNamespace.Get("http://webservices.amazon.com/AWSECommerceService/2011-08-01");
var foundNode = xdoc
.Descendants(ns+"ItemAttributes")
.Where(node => node.Element(ns+"ProductGroup").Value == "Book")
.First();
Console.WriteLine(foundNode.Element(ns+"ListPrice").Element(ns+"FormattedPrice").Value);
Console.ReadLine();
I'd use XPath with XmlDocument to handle this.
XmlDocument xDoc = new XmlDocument();
xDoc.LoadXml(myXMLString);
XmlNodeList nodeList = xDoc.SelectNodes("//ItemAttributes[./ProductGroup[text()='Book']]");
foreach (XmlNode node in nodeList)
{
//Do anything with node.InnerText (node.Value is null for element nodes)
}
I haven't tried your code but from what I see your XPath expression was incorrect.
I have broken down the expression I wrote above, below.
It reads as
//ItemAttributes #look for all nodes named ItemAttributes
[
./ProductGroup #with a child node called ProductGroup
[
text()='Book' #that has the string 'Book' as the text
]
]
You say that your XML is a snippet. I'm going to hazard a guess that it's contained within an element that binds the default namespace prefix to a non-trivial URI, and that this is why you're getting no results from your iterator.
Your code worked for me, with the XML document as given above. I ran the code and got one element out of the iterator. I then took your XML document and wrapped it in a root element that binds the default namespace prefix to a made-up but non-trivial URI:
<SomeRootElement xmlns="http://schemas.blahblahblah.com/example">
<ItemAttributes>
<!-- rest of your document omitted -->
</ItemAttributes>
</SomeRootElement>
I then got no results from the iterator.
I then created an XmlNamespaceManager which mapped a prefix (I chose pfx) to the namespace URI used above:
XmlNamespaceManager mgr = new XmlNamespaceManager(new NameTable());
mgr.AddNamespace("pfx", "http://schemas.blahblahblah.com/example");
I then set this namespace manager as the namespace context for the XPath expression, and added the prefix pfx to the names within your XPath expression:
XPathExpression expr;
expr = nav.Compile("//pfx:ItemAttributes[contains(pfx:ProductGroup, 'Book')]");
expr.SetContext(mgr);
XPathNodeIterator iterator = nav.Select(expr);
I then got the one element out of the iterator, as expected.
XPath can be a bit funny with namespaces. I tried binding the empty prefix "" to the URI so that I could use your XPath expression unmodified, but that didn't work. That's one thing I've found with XPath before: always bind namespace URIs to prefixes with XPath, even if the original XML document binds the default namespace prefix. Unprefixed names in XPath always seem to be in the 'null' namespace.
I've not really looked into how you map namespace prefixes to URIs with XPath in .NET much, and there are probably better ways of doing it than what I've cobbled together after a quick Googling and reading of MSDN.
EDIT: the aim of my answer was to explain why your code using XPath didn't work. You didn't understand why you were getting no results out of the iterator. My suspicion was that you hadn't given me the full XML document, and that the answer lay in the part of the document that you didn't share with us.
Ultimately, I believe your original code didn't work because of XML namespaces. As I write this edit, I can only get a 'Request has expired' error from the URLs in your comment thread with L.B, so I can no longer test with the same kind of data you're using. However, this error response does begin as follows:
<?xml version="1.0"?>
<ItemLookupErrorResponse xmlns="http://ecs.amazonaws.com/doc/2011-08-01/">
The xmlns attribute puts the element, and every element contained within it, into a namespace. Each namespace is identified by a URI, and together, the URI and the element name identify that element.
It could be that a successful request may have the same attribute. However, L.B.'s answer uses a different namespace, so I can't be sure. For the rest of this edit, I'll have to assume a successful request does contain the same namespace as an unsuccessful one.
Because of this namespace, the element <ItemAttributes> within this XML
<ItemLookupResponse xmlns="http://ecs.amazonaws.com/doc/2011-08-01/">
<ItemAttributes />
</ItemLookupResponse>
and within this XML
<ItemAttributes />
are not the same. The first is in the http://ecs.amazonaws.com/doc/2011-08-01/ namespace, whereas the second is in the namespace identified by the empty string. This empty namespace is the default namespace if it hasn't been set any other way.
Because the two ItemAttributes elements have different namespaces, they're not the same.
As well as changing the namespace of elements by using xmlns="...", you can also associate (or bind) a prefix to a namespace. This is done by specifying the prefix you want to associate with the namespace in an xmlns attribute, using an attribute such as xmlns:prefix="some-uri". This prefix is then put in the XML element before the local name, for example <prefix:SomeElement ... />. This puts the SomeElement element in the namespace associated with the URI some-uri.
Because elements are identified by local name and namespace URI, the following two XML fragments are equal, even though one uses a prefix and the other one doesn't:
<ItemLookupResponse xmlns="http://ecs.amazonaws.com/doc/2011-08-01/">
<ItemAttributes />
</ItemLookupResponse>
<ecs:ItemLookupResponse xmlns:ecs="http://ecs.amazonaws.com/doc/2011-08-01/">
<ecs:ItemAttributes />
</ecs:ItemLookupResponse>
Now we turn to XPath and namespaces. Your XPath expression is
//ItemAttributes[contains(ProductGroup, 'Book')]
One irritation with XPath is that you can't change the namespace used without prefixes in the same way as you can with XML. So the names ItemAttributes and ProductGroup in the above are always in the 'empty' namespace. This XPath expression matches nothing in your XML document because there are no elements with local name ItemAttributes in the 'empty' namespace, let alone any with a ProductGroup child element containing the text Book.
However, with most (if not all) XPath APIs, there are ways of binding prefixes to namespaces. What I did was to show one way of doing this with XPath in .NET. I associated the prefix pfx (I could have chosen any prefix I wanted) with the URI I had used in my example above. You'd use a different URI to my made-up example. You could then use the XPath expression
//pfx:ItemAttributes[contains(pfx:ProductGroup, 'Book')]
to find the relevant element, because there are element(s) with name ItemAttributes and namespace http://ecs.amazonaws.com/doc/2011-08-01/, and at least one of those contains a child element with name ProductGroup in the same namespace and with text contents Book.

Can I avoid having to use fully-qualified element names in LINQ to XML?

Say I call XElement.Parse() with the following XML string:
var xml = XElement.Parse(@"
<?xml version="1.0" encoding="UTF-8"?>
<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Owner>
<ID>7c75442509c41100b6a413b88b523bd6f46554cdbee5b6cbe27bc08cb3f6a865</ID>
<DisplayName>me</DisplayName>
</Owner>
<AccessControlList>
<Grant>
<Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="Group">
...
");
When it comes time to query the element, I'm forced to use fully-qualified element names because that XML document contains an xmlns attribute in its root. This requires cumbersome creations of XName instances:
var AWS_XMLNS = "http://s3.amazonaws.com/doc/2006-03-01/";
var ownerElement = xml.Element(XName.Get("AccessControlPolicy", AWS_XMLNS)).Element(XName.Get("Owner", AWS_XMLNS));
When what I really want is simply,
var ownerElement = xml.Element("AccessControlPolicy").Element("Owner");
Is there a way to make LINQ to XML assume a specific namespace so I don't have to keep specifying it?
You could simplify by using
XNamespace ns = "http://s3.amazonaws.com/doc/2006-03-01/";
var ownerElement = xml.Element(ns + "AccessControlPolicy").Element(ns + "Owner");
I don't think you can (see Jon Skeet's comment), but there are a few tricks you can do.
1) create an extension method that appends the XNamespace to your string
2) Use VB?!?
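A minimal sketch of option 1, under the assumption that the child elements live in the same namespace as their parent (true for the S3 document above, but not guaranteed in general). The extension method name ChildInSameNamespace and the Demo class are hypothetical; the inline XML is a shortened version of the document in the question.

```csharp
using System;
using System.Xml.Linq;

public static class XElementNamespaceExtensions
{
    // Hypothetical helper: looks up a child by local name, reusing the
    // parent's own namespace so the caller never has to spell it out.
    public static XElement ChildInSameNamespace(this XElement parent, string localName)
    {
        return parent.Element(parent.Name.Namespace + localName);
    }
}

public static class Demo
{
    public static string OwnerDisplayName(string xmlText)
    {
        // XElement.Parse returns the root element (AccessControlPolicy) itself.
        XElement policy = XElement.Parse(xmlText);
        return (string)policy.ChildInSameNamespace("Owner")
                             .ChildInSameNamespace("DisplayName");
    }

    public static void Main()
    {
        string xmlText = @"<AccessControlPolicy xmlns=""http://s3.amazonaws.com/doc/2006-03-01/"">
  <Owner>
    <ID>7c75442509c41100b6a413b88b523bd6f46554cdbee5b6cbe27bc08cb3f6a865</ID>
    <DisplayName>me</DisplayName>
  </Owner>
</AccessControlPolicy>";
        Console.WriteLine(OwnerDisplayName(xmlText)); // me
    }
}
```

The trade-off is that the helper silently assumes one namespace per subtree; a document that mixes namespaces still needs explicit XName values for the foreign elements.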