How to read nodes in XSD document [duplicate] - c#

How does XPath deal with XML namespaces?
If I use
/IntuitResponse/QueryResponse/Bill/Id
to parse the XML document below I get 0 nodes back.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<IntuitResponse xmlns="http://schema.intuit.com/finance/v3"
time="2016-10-14T10:48:39.109-07:00">
<QueryResponse startPosition="1" maxResults="79" totalCount="79">
<Bill domain="QBO" sparse="false">
<Id>=1</Id>
</Bill>
</QueryResponse>
</IntuitResponse>
However, I'm not specifying the namespace in the XPath (i.e. http://schema.intuit.com/finance/v3 is not a prefix of each token of the path). How can XPath know which Id I want if I don't tell it explicitly? I suppose in this case (since there is only one namespace) XPath could get away with ignoring the xmlns entirely. But if there are multiple namespaces, things could get ugly.
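To make the symptom concrete, here is a small sketch using Python's standard xml.etree.ElementTree. It was picked only because it needs no setup, and its path syntax is a limited XPath subset rather than full XPath. The unqualified path matches nothing because every element is in the http://schema.intuit.com/finance/v3 namespace. Qualifying each step with that namespace finds the Id.

```python
import xml.etree.ElementTree as ET

# The document from the question (XML declaration omitted for brevity).
XML = """<IntuitResponse xmlns="http://schema.intuit.com/finance/v3"
    time="2016-10-14T10:48:39.109-07:00">
  <QueryResponse startPosition="1" maxResults="79" totalCount="79">
    <Bill domain="QBO" sparse="false">
      <Id>=1</Id>
    </Bill>
  </QueryResponse>
</IntuitResponse>"""

root = ET.fromstring(XML)  # root is the IntuitResponse element

# ElementTree stores element names as {namespace-uri}local-name,
# so these unqualified names match nothing.
unqualified = root.findall("QueryResponse/Bill/Id")

# Qualifying each step with the namespace URI finds the element.
NS = "{http://schema.intuit.com/finance/v3}"
qualified = root.findall(f"{NS}QueryResponse/{NS}Bill/{NS}Id")

print(len(unqualified))             # 0
print([e.text for e in qualified])  # ['=1']
```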

XPath 1.0/2.0
Defining namespaces in XPath (recommended)
XPath itself doesn't have a way to bind a namespace prefix with a namespace. Such facilities are provided by the hosting library.
It is recommended that you use those facilities and define namespace prefixes that can then be used to qualify XML element and attribute names as necessary.
Here are some of the various mechanisms which XPath hosts provide for specifying namespace prefix bindings to namespace URIs.
(OP's original XPath, /IntuitResponse/QueryResponse/Bill/Id, has been elided to /IntuitResponse/QueryResponse.)
C#:
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3");
XmlNodeList nodes = doc.SelectNodes(@"/i:IntuitResponse/i:QueryResponse", nsmgr);
Google Docs:
Unfortunately, IMPORTXML() does not provide a namespace prefix binding mechanism. See next section, Defeating namespaces in XPath, for how to use local-name() as a work-around.
Java (SAX):
NamespaceSupport support = new NamespaceSupport();
support.pushContext();
support.declarePrefix("i", "http://schema.intuit.com/finance/v3");
Java (XPath):
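// Imports: javax.xml.XMLConstants, javax.xml.namespace.NamespaceContext, java.util.Iterator
xpath.setNamespaceContext(new NamespaceContext() {
    public String getNamespaceURI(String prefix) {
        switch (prefix) {
            case "i": return "http://schema.intuit.com/finance/v3";
            // ...
            default: return XMLConstants.NULL_NS_URI;
        }
    }
    // Also required by the interface, but not used when evaluating XPath here.
    public String getPrefix(String namespaceURI) { throw new UnsupportedOperationException(); }
    public Iterator<String> getPrefixes(String namespaceURI) { throw new UnsupportedOperationException(); }
});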
Remember to call DocumentBuilderFactory.setNamespaceAware(true).
See also:
Java XPath: Queries with default namespace xmlns
JavaScript:
See Implementing a User Defined Namespace Resolver:
function nsResolver(prefix) {
var ns = {
'i' : 'http://schema.intuit.com/finance/v3'
};
return ns[prefix] || null;
}
document.evaluate( '/i:IntuitResponse/i:QueryResponse',
document, nsResolver, XPathResult.ANY_TYPE,
null );
Note that if the default namespace has an associated namespace prefix defined, using the nsResolver() returned by Document.createNSResolver() can obviate the need for a custom nsResolver().
Perl (LibXML):
my $xc = XML::LibXML::XPathContext->new($doc);
$xc->registerNs('i', 'http://schema.intuit.com/finance/v3');
my @nodes = $xc->findnodes('/i:IntuitResponse/i:QueryResponse');
Python (lxml):
from io import StringIO
from lxml import etree
f = StringIO('<IntuitResponse>...</IntuitResponse>')
doc = etree.parse(f)
r = doc.xpath('/i:IntuitResponse/i:QueryResponse',
namespaces={'i':'http://schema.intuit.com/finance/v3'})
Python (ElementTree):
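namespaces = {'i': 'http://schema.intuit.com/finance/v3'}
# root is the IntuitResponse element (e.g. root = ET.fromstring(xml_text));
# ElementTree paths are relative, and an absolute path on an Element raises SyntaxError
root.findall('i:QueryResponse', namespaces)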
Python (Scrapy):
response.selector.register_namespace('i', 'http://schema.intuit.com/finance/v3')
response.xpath('/i:IntuitResponse/i:QueryResponse').getall()
PHP:
Adapted from @Tomalak's answer using DOMDocument:
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace("i", "http://schema.intuit.com/finance/v3");
$result = $xpath->query("/i:IntuitResponse/i:QueryResponse");
See also @IMSoP's canonical Q/A on PHP SimpleXML namespaces.
Ruby (Nokogiri):
puts doc.xpath('/i:IntuitResponse/i:QueryResponse',
'i' => "http://schema.intuit.com/finance/v3")
Note that Nokogiri supports removal of namespaces,
doc.remove_namespaces!
but see the warnings below discouraging the defeating of XML namespaces.
VBA:
xmlNS = "xmlns:i='http://schema.intuit.com/finance/v3'"
doc.setProperty "SelectionNamespaces", xmlNS
Set queryResponseElement = doc.SelectSingleNode("/i:IntuitResponse/i:QueryResponse")
VB.NET:
Dim xmlDoc As New XmlDocument()
xmlDoc.Load("file.xml")
Dim nsmgr As New XmlNamespaceManager(xmlDoc.NameTable)
nsmgr.AddNamespace("i", "http://schema.intuit.com/finance/v3")
Dim nodes As XmlNodeList = xmlDoc.DocumentElement.SelectNodes("/i:IntuitResponse/i:QueryResponse", nsmgr)
SoapUI (doc):
declare namespace i='http://schema.intuit.com/finance/v3';
/i:IntuitResponse/i:QueryResponse
xmlstarlet:
-N i="http://schema.intuit.com/finance/v3"
XSLT:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:i="http://schema.intuit.com/finance/v3">
...
Once you've declared a namespace prefix, your XPath can be written to use it:
/i:IntuitResponse/i:QueryResponse
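Because the prefix is only a local alias for the namespace URI, it does not have to match anything in the document (which here uses no prefix at all). A minimal Python/ElementTree sketch of that point; the helper name select is made up for illustration:

```python
import xml.etree.ElementTree as ET

XML = """<IntuitResponse xmlns="http://schema.intuit.com/finance/v3">
  <QueryResponse><Bill><Id>=1</Id></Bill></QueryResponse>
</IntuitResponse>"""
root = ET.fromstring(XML)
URI = "http://schema.intuit.com/finance/v3"

def select(prefix):
    # Hypothetical helper: bind the chosen prefix to the URI, then use it in the path.
    path = f"{prefix}:QueryResponse/{prefix}:Bill/{prefix}:Id"
    return [e.text for e in root.findall(path, {prefix: URI})]

# Different prefixes bound to the same URI select the same elements.
print(select("i"))       # ['=1']
print(select("intuit"))  # ['=1']
```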
Defeating namespaces in XPath (not recommended)
An alternative is to write predicates that test against local-name():
/*[local-name()='IntuitResponse']/*[local-name()='QueryResponse']
Or, in XPath 2.0:
/*:IntuitResponse/*:QueryResponse
Skirting namespaces in this manner works but is not recommended because it under-specifies the full element/attribute name, fails to differentiate between element/attribute names in different namespaces (the very purpose of namespaces), and is excessively verbose.
Note that the second concern could be addressed by adding an additional predicate to check the namespace URI explicitly:
/*[ namespace-uri()='http://schema.intuit.com/finance/v3'
and local-name()='IntuitResponse']
/*[ namespace-uri()='http://schema.intuit.com/finance/v3'
and local-name()='QueryResponse']
Thanks to Daniel Haley for the namespace-uri() note.
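Python's ElementTree (3.8+) offers a similar wildcard, {*}name, which matches a local name in any namespace, much like local-name()='name'. The sketch below uses two made-up namespaces, urn:example:a and urn:example:b. It shows why matching on local name alone is risky: the wildcard cannot tell the two Id elements apart, while the namespace-qualified query can.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two different 'Id' elements.
XML = """<root xmlns:a="urn:example:a" xmlns:b="urn:example:b">
  <a:Id>from-a</a:Id>
  <b:Id>from-b</b:Id>
</root>"""
root = ET.fromstring(XML)

# Namespace wildcard: matches on local name only.
any_ns = [e.text for e in root.findall("{*}Id")]

# Qualified: matches only the element in namespace urn:example:a.
only_a = [e.text for e in root.findall("a:Id", {"a": "urn:example:a"})]

print(any_ns)  # ['from-a', 'from-b']
print(only_a)  # ['from-a']
```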
XPath 3.0/3.1
Libraries and tools that support modern XPath 3.0/3.1 allow the specification of a namespace URI directly in an XPath expression:
/Q{http://schema.intuit.com/finance/v3}IntuitResponse/Q{http://schema.intuit.com/finance/v3}QueryResponse
While Q{http://schema.intuit.com/finance/v3} is much more verbose than using an XML namespace prefix, it has the advantage of being independent of the namespace prefix binding mechanism of the hosting library. The Q{} notation derives from Clark notation ({uri}local), named after its originator, James Clark. The W3C XPath 3.1 EBNF grammar calls the Q{...} part a BracedURILiteral.
Thanks to Michael Kay for the suggestion to cover XPath 3.0/3.1's BracedURILiteral.
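Python's ElementTree happens to use plain Clark notation, {uri}local without the leading Q, for every element name, so it is a convenient place to see expanded names. This short sketch is only an illustration; ElementTree does not implement XPath 3.x.

```python
import xml.etree.ElementTree as ET

XML = """<IntuitResponse xmlns="http://schema.intuit.com/finance/v3">
  <QueryResponse/>
</IntuitResponse>"""
root = ET.fromstring(XML)

# The stored tag is the expanded name: {namespace-uri}local-name.
print(root.tag)  # {http://schema.intuit.com/finance/v3}IntuitResponse

# Paths accept the same braced form directly; no prefix binding is needed.
qr = root.find("{http://schema.intuit.com/finance/v3}QueryResponse")
print(qr is not None)  # True
```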

I use /*[name()='...'] in a Google Sheet to fetch some counts from Wikidata. I have a table like this:
thes  WD prop  links  items
NOM   P7749     3925   3789
AAT   P1014    21157  20224
and the formulas in the links and items columns are
=IMPORTXML("https://query.wikidata.org/sparql?query=SELECT(COUNT(*)as?c){?item wdt:"&$B14&"[]}","//*[name()='literal']")
=IMPORTXML("https://query.wikidata.org/sparql?query=SELECT(COUNT(distinct?item)as?c){?item wdt:"&$B14&"[]}","//*[name()='literal']")
respectively. The SPARQL query happens not to have any spaces...
I saw name() used instead of local-name() in Xml Namespace breaking my xpath!, and for some reason //*:literal doesn't work.
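A likely explanation, offered as a hedged sketch rather than a verified account of IMPORTXML: name() returns the qualified name as written (including any prefix), while local-name() drops the prefix. If the results document puts its elements in a default (unprefixed) namespace, the two are identical, so name()='literal' happens to work; with a prefix it would not. Separately, *:literal is XPath 2.0 syntax, which may be why it fails if IMPORTXML only supports XPath 1.0. The Python sketch below uses xml.dom.minidom, whose tagName and localName roughly correspond to name() and local-name(); the namespace urn:example:results is made up.

```python
from xml.dom import minidom

NS = "urn:example:results"  # hypothetical namespace URI

# The same element, once in the default namespace and once with a prefix.
default_doc = minidom.parseString(f'<literal xmlns="{NS}">3925</literal>')
prefixed_doc = minidom.parseString(f'<r:literal xmlns:r="{NS}">3925</r:literal>')

for doc in (default_doc, prefixed_doc):
    el = doc.documentElement
    # tagName ~ name(), localName ~ local-name(), namespaceURI ~ namespace-uri()
    print(el.tagName, el.localName, el.namespaceURI)
# literal literal urn:example:results
# r:literal literal urn:example:results
```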

Related

Context prefixes not loading in JsonLdParser.Load

I'm trying to load some basic json-ld content as a string, but I'm not able to see the namespace prefixes that should be included.
Given the following json-ld:
{
  "@context": {
    "name": "http://schema.org/name",
    "image": {
      "@id": "http://schema.org/image",
      "@type": "@id"
    },
    "foaf": "http://xmlns.com/foaf/0.1/"
  },
  "name": "Manu Sporny",
  "foaf:homepage": "http://manu.sporny.org/",
  "image": "http://manu.sporny.org/images/manu.png"
}
I run this against the dotnetrdf library:
void Main()
{
var targetPath = @"C:\Users\me\MinContext.json";
var jsonStr = File.ReadAllText(targetPath);
var parser = new JsonLdParser();
var store = new TripleStore();
parser.Load(store, new StringReader(jsonStr));
var g = store.Graphs.FirstOrDefault();
IUriNode rdfType = g.CreateUriNode("rdf:type");
IUriNode home = g.CreateUriNode("foaf:homepage");
}
On the last line I get this RdfException message:
The Namespace URI for the given Prefix 'foaf' is not known by the
in-scope NamespaceMapper. Did you forget to define a namespace for
this prefix?
...and if you inspect the graph namespaces (g.NamespaceMap.Prefixes) you can see that it only contains three: rdf, rdfs and xsd.
So question: how do I get the foaf prefix and namespace to load correctly?
This is based on using NuGet package version 2.6.1
Prefixes are not an inherent part of any RDF graph, they are just conventions and shortcuts so that you don't have to type the full IRI. A specific database software/implementation can have options for configuring namespaces/prefixes, but they are just for presentation.
In this case, JsonLdParser simply does not import any prefixes from the source data into the graph. This is perfectly valid behaviour, and I don't know if it can be changed. Load can also take an IRdfHandler, which seems to be able to do something with prefixes, but creating an implementation will most likely be more difficult than simply defining the namespace yourself:
g.NamespaceMap.AddNamespace("foaf", new Uri("http://xmlns.com/foaf/0.1/"));
I'd argue this is actually the more correct option. The source document can specify foaf: to be absolutely anything, but you want this foaf: (the full meaning of a resource name comes from the IRI of its prefix, not from the prefix name itself).
The alternative to that is g.CreateUriNode(new Uri("http://xmlns.com/foaf/0.1/homepage")) which creates a completely equivalent node. Of course it is simpler to add the namespace instead of typing the full IRI every time – that's what namespaces are for.
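The underlying idea, that a prefix is just shorthand and the full IRI is what identifies the resource, can be sketched in a few lines of Python. The expand helper is hypothetical and ignores JSON-LD details such as @vocab or terms that are not prefixes; it only shows that two different prefix names bound to the same IRI expand to the same name.

```python
# Hypothetical helper: expand a prefixed name using a prefix map.
def expand(prefixed: str, prefixes: dict) -> str:
    prefix, sep, local = prefixed.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {prefixed!r}")
    return prefixes[prefix] + local

FOAF = "http://xmlns.com/foaf/0.1/"

# Two different prefix names bound to the same IRI...
a = expand("foaf:homepage", {"foaf": FOAF})
b = expand("f:homepage", {"f": FOAF})

# ...expand to the same full IRI.
print(a)       # http://xmlns.com/foaf/0.1/homepage
print(a == b)  # True
```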

XmlDsigXPathTransform - namespace prefix is not defined

I am trying to create a signed xml detached signature file using this library: [opensbr]
I need to add an xpath filter to the TransformChain but upon calling SignedXml.ComputeSignature an exception is thrown that the namespace xbrli is not valid.
xpath: /xbrli:xbrl//*[not(local-name()='DocumentAdoptionStatus' or local-name()='DocumentAdoptionDate' and namespace-uri()='http://www.nltaxonomie.nl/8.0/basis/venj/items/bw2-data')]
constructing the transform (as per Microsoft example):
public static XmlDsigXPathTransform CreateXPathTransform(string XPathString)
{
XmlDocument doc = new XmlDocument();
XmlElement xPathElem = doc.CreateElement("XPath");
xPathElem.InnerText = XPathString;
XmlDsigXPathTransform xForm = new XmlDsigXPathTransform();
xForm.LoadInnerXml(xPathElem.SelectNodes("."));
return xForm;
}
The xpath and xml file are both valid.
How can I use namespace prefixes with XmlDsigXPathTransform?
MSDN example* suggests that you can declare namespace prefix on the XPath element :
.....
XmlElement xPathElem = doc.CreateElement("XPath");
xPathElem.SetAttribute("xmlns:xbrli", "http://www.xbrl.org/2003/instance");
xPathElem.InnerText = XPathString;
.....
*) See method LoadTransformByXml in Example #2
The issue here is that prefixes are only locally valid, within a limited scope. Your expression does not contain enough information to resolve your prefix to a namespace (even if it's the same as the default prefix in an XBRL document).
One solution is to "feed" the namespace mapping in code, as suggested by har07.
Another solution is to include the namespace in the complete XPath fragment, on a node level. This is what is used by the Dutch audit profession in official business register filings.
<dsig-xpath:XPath xmlns:dsig-xpath="http://www.w3.org/2002/06/xmldsig-filter2"
xmlns:xbrli="http://www.xbrl.org/2003/instance" Filter="subtract">
/xbrli:xbrl/*[local-name()='DocumentAdoptionStatus' or local-name()='DocumentAdoptionDate' or local-name()='EmailAddressContact'] | //text()[normalize-space()='']
</dsig-xpath:XPath>
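The key point is that xbrli in the expression only means something because the enclosing element declares it. As a rough Python illustration (not how .NET's XmlDsigXPathTransform works internally), this sketch uses ElementTree's iterparse "start-ns" events to collect the declarations made on the dsig-xpath:XPath element. It then uses them to resolve xbrli:xbrl to its full name. The resolve helper is hypothetical and the expression is shortened.

```python
import io
import xml.etree.ElementTree as ET

# The filter element from the answer above, with a shortened expression.
FILTER = """<dsig-xpath:XPath xmlns:dsig-xpath="http://www.w3.org/2002/06/xmldsig-filter2"
    xmlns:xbrli="http://www.xbrl.org/2003/instance" Filter="subtract">/xbrli:xbrl</dsig-xpath:XPath>"""

# Collect the prefix -> URI declarations made on the element.
in_scope = {}
for event, (prefix, uri) in ET.iterparse(io.BytesIO(FILTER.encode("utf-8")),
                                         events=("start-ns",)):
    in_scope[prefix] = uri

def resolve(qname):
    # Hypothetical helper: turn "prefix:local" into ElementTree's {uri}local form.
    prefix, local = qname.split(":", 1)
    return "{%s}%s" % (in_scope[prefix], local)

print(resolve("xbrli:xbrl"))  # {http://www.xbrl.org/2003/instance}xbrl
```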

C# LINQ and XML Getting child nodes

Having issues getting node values. Not sure why the following code is failing to do so.
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type='text/xsl' href='STIG_unclass.xsl'?>
<Benchmark xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:cpe="http://cpe.mitre.org/language/2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" id="Windows_7_STIG" xml:lang="en" xsi:schemaLocation="http://checklists.nist.gov/xccdf/1.1 http://nvd.nist.gov/schema/xccdf-1.1.4.xsd http://cpe.mitre.org/dictionary/2.0 http://cpe.mitre.org/files/cpe-dictionary_2.1.xsd" xmlns="http://checklists.nist.gov/xccdf/1.1">
<status date="2015-06-16">accepted</status>
<title>Windows 7 Security Technical Implementation Guide</title>
<description>
The Windows 7 Security Technical Implementation Guide (STIG) is published as a tool to improve the security of Department of Defense (DoD) information systems. The requirements were developed from DoD consensus, as well as the Windows 7 Security Guide and security templates published by Microsoft Corporation. Comments or proposed revisions to this document should be sent via e-mail to the following address: disa.stig_spt@mail.mil.
</description>
<notice id="terms-of-use" xml:lang="en">Developed_by_DISA_for_the_DoD</notice>
<reference href="http://iase.disa.mil">
<dc:publisher>DISA, Field Security Operations</dc:publisher>
<dc:source>STIG.DOD.MIL</dc:source>
</reference>
<plain-text id="release-info">Release: 20 Benchmark Date: 24 Jul 2015</plain-text>
</Benchmark>
Sample XML File.
and the following is my code.
String Title = LoadedXML.Element("Benchmark").Attribute("id").Value;
var XMLData = LoadedXML.Element("Benchmark").Elements("plain-text")
.Single(release => release.Attribute("id").Value == "release-info").Value;
Is there a way I can get multiple node values at the same time? For example, getting the Title and Release values at once instead of querying each one separately?
Your code is failing because your XML declares a default namespace, so you can't access the nodes by their local names alone. To confirm this, query LoadedXML.Elements() and examine the values in the debugger; you can clearly see the namespaces there.
So you need to declare the namespace and use it:
XNamespace ns = "http://checklists.nist.gov/xccdf/1.1";
If you want both values fetched at once, you can project them into an anonymous type like this:
var result = LoadedXML.Root.Elements(ns + "plain-text")
.Where(x => (string)x.Attribute("id") == "release-info")
.Select(x => new
{
Title = (string)x.Document.Root.Attribute("id"),
XMLData = x.Value
}).FirstOrDefault();
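For comparison, here is a rough Python/ElementTree sketch of the same approach on a trimmed copy of the sample: qualify the element name with the namespace, filter on the id attribute, and read both values in one go. The prefix x and the dict keys are arbitrary choices for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the sample Benchmark document.
XML = """<Benchmark xmlns="http://checklists.nist.gov/xccdf/1.1" id="Windows_7_STIG">
  <title>Windows 7 Security Technical Implementation Guide</title>
  <plain-text id="release-info">Release: 20 Benchmark Date: 24 Jul 2015</plain-text>
</Benchmark>"""
ns = {"x": "http://checklists.nist.gov/xccdf/1.1"}

root = ET.fromstring(XML)
# Elements need the namespace; unprefixed attributes such as id are in no namespace.
release = root.find("x:plain-text[@id='release-info']", ns)

result = {"Title": root.get("id"),
          "XMLData": release.text if release is not None else None}
print(result)
# {'Title': 'Windows_7_STIG', 'XMLData': 'Release: 20 Benchmark Date: 24 Jul 2015'}
```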
LINQ to XML is generally used to query XML: filter its nodes and then get the desired elements/values as needed. It's rather like querying a table with SQL.
If all or most of the XML is needed as a result, a better approach is to deserialize the XML into a native (here, C#) object and map it to the required model object. XML can be thought of as a serialized version of an object (although it can also be written by hand), and it can be deserialized back into an object.
.NET has native support for all of this; see the MSDN documentation on XML serialization and deserialization for details. You can write a small method to deserialize your object like this:
using System.IO;
using System.Xml.Serialization;
public class XMLHelper
{
public T DeserializeData<T>(string data)
{
XmlSerializer serializer = new XmlSerializer(typeof(T));
// Dispose the reader once deserialization is done.
using (StringReader reader = new StringReader(data))
{
var deserializedObject = serializer.Deserialize(reader);
return deserializedObject == null ? default(T) : (T)deserializedObject;
}
}
}
To get the string, you can use File.ReadAllText(xmlFilePath) or whatever is easiest in your situation.
This gives you a deserialized object for the whole XML. If you want a differently shaped object, you can either map it manually or use AutoMapper.
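Python's standard library has no direct counterpart to XmlSerializer. As a hedged analogue of the "XML to typed object" idea, this sketch maps selected parts of the sample into a dataclass by hand. The Benchmark class and from_xml function are hypothetical and cover only a few fields.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

NS = {"x": "http://checklists.nist.gov/xccdf/1.1"}

@dataclass
class Benchmark:
    # Hypothetical model covering only a few fields of the sample.
    id: str
    title: str
    release_info: str

def from_xml(data: str) -> Benchmark:
    """Map the parts of a Benchmark document we care about onto the dataclass."""
    root = ET.fromstring(data)
    return Benchmark(
        id=root.get("id", ""),
        title=root.findtext("x:title", default="", namespaces=NS),
        release_info=root.findtext(
            "x:plain-text[@id='release-info']", default="", namespaces=NS
        ),
    )

XML = """<Benchmark xmlns="http://checklists.nist.gov/xccdf/1.1" id="Windows_7_STIG">
  <title>Windows 7 Security Technical Implementation Guide</title>
  <plain-text id="release-info">Release: 20 Benchmark Date: 24 Jul 2015</plain-text>
</Benchmark>"""

benchmark = from_xml(XML)
print(benchmark.title)         # Windows 7 Security Technical Implementation Guide
print(benchmark.release_info)  # Release: 20 Benchmark Date: 24 Jul 2015
```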

Why is the empty namespace not working with .NET System.Xml.XmlNamespaceManager?

I have this XML document:
<?xml version="1.0" encoding="utf-8"?>
<document xmlns="file:///someurl/goes.here">
<levelone>
<leveltwo>
<title>Test Title</title>
<item>Item 1</item>
<item>Item 2</item>
<item>Item 3</item>
</leveltwo>
</levelone>
</document>
And this code:
static void Main(string[] args)
{
XmlDocument fileIn = new XmlDocument();
fileIn.Load("XMLFile1.xml");
XmlNamespaceManager nsm = new XmlNamespaceManager(fileIn.NameTable);
nsm.AddNamespace(String.Empty, "file:///someurl/goes.here");
XmlNode docElem = fileIn.DocumentElement;
string title = docElem.SelectSingleNode("levelone/leveltwo/title", nsm).InnerText;
Console.WriteLine("Title: " + title);
Console.WriteLine("Items: " + docElem.SelectNodes("//item", nsm).Count);
Environment.Exit(0);
}
If I run this, I get a NullReferenceException when I attempt to get the InnerText of the node I'm attempting to select for title. However, if I change my AddNamespace call to nsm.AddNamespace("a", "file:///someurl/goes.here") and update all of my XPaths to use that a namespace, then it works fine.
My understanding is that elements without the prefixes in an XPath refer to the empty namespace. String.Empty is said to be for the empty namespace in MSDN's docs, and I'm giving the NamespaceManager the correct namespace URI for the empty namespace. Yet it doesn't seem to be working and I find myself having to make a dummy namespace in the manager to get it to work right, which is really annoying. Why do I have to use this, or any workaround?
I am not looking for a way to get my code to run, I know of a couple ways and detailed one widely-accepted method above. I want to know why my understanding of the documentation is resulting in broken code - e.g. is this an issue on Microsoft's end or am I just misunderstanding something?
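Some background: in XPath 1.0, an unprefixed name in a location step always refers to "no namespace". The document's default namespace is not applied to it, and there is no XPath 1.0 mechanism for a default element namespace in the expression. Some hosts add one as an extension. Python's ElementTree (3.8+), for example, honours a '' key in its namespaces mapping. The Python sketch below contrasts the three cases on the document from the question. It illustrates ElementTree only and says nothing about how .NET treats an empty-prefix entry.

```python
import xml.etree.ElementTree as ET

XML = """<document xmlns="file:///someurl/goes.here">
  <levelone>
    <leveltwo>
      <title>Test Title</title>
      <item>Item 1</item>
      <item>Item 2</item>
      <item>Item 3</item>
    </leveltwo>
  </levelone>
</document>"""
root = ET.fromstring(XML)

# XPath 1.0-like behaviour: unprefixed names mean "no namespace", nothing matches.
no_mapping = root.find("levelone/leveltwo/title")
print(no_mapping)  # None

# Explicit prefix (the workaround described in the question).
a = {"a": "file:///someurl/goes.here"}
print(root.find("a:levelone/a:leveltwo/a:title", a).text)  # Test Title

# ElementTree-specific extension: '' applies a default namespace to unprefixed names.
d = {"": "file:///someurl/goes.here"}
print(root.find("levelone/leveltwo/title", d).text)  # Test Title
print(len(root.findall(".//item", d)))              # 3
```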

Creating an XmlElement without a namespace

I am using the CreateElement() method of my XmlDocument like this:
XmlElement jobElement = dom.CreateElement("job");
But what I get in the outcome is a job element with an empty namespace.
<job xmlns="">xyz</job>
And the program that is supposed to read this Xml will no longer detect this job element.
I thought this question gives an answer, but it does not. How can I create a pure job element?
<job>xyz</job>
Update: This is the root element of my document:
<job-scheduling-data xmlns="http://quartznet.sourceforge.net/JobSchedulingData" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="2.0">
"I thought this question gives an answer, but it does not."
Actually it does. Put the <job> element in the default namespace (the unprefixed namespace that, in this case, is declared at the root level):
XmlElement jobElement = dom.CreateElement("job", "http://quartznet.sourceforge.net/JobSchedulingData");
jobElement.InnerText = "xyz";
This way, in the XML markup, the <job> element simply inherits its ancestor's default namespace (no local empty default namespace declaration is created):
<job>xyz</job>
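The same principle shows up in Python's ElementTree (3.8+ for the default_namespace argument of tostring). This sketch does not use .NET. An element created in the root's namespace serializes as a plain <job> under the default namespace. An element created with no namespace cannot be placed there, and ElementTree raises ValueError instead of emitting xmlns="". The element names come from the question; everything else is illustrative.

```python
import xml.etree.ElementTree as ET

NS = "http://quartznet.sourceforge.net/JobSchedulingData"

# Create both elements in the same namespace (Clark notation: {uri}local).
root = ET.Element(f"{{{NS}}}job-scheduling-data")
job = ET.SubElement(root, f"{{{NS}}}job")
job.text = "xyz"

# Serialize with NS as the default namespace: <job> needs no declaration of its own.
out = ET.tostring(root, encoding="unicode", default_namespace=NS)
print(out)
# <job-scheduling-data xmlns="http://quartznet.sourceforge.net/JobSchedulingData"><job>xyz</job></job-scheduling-data>

# An element with no namespace cannot sit under that default namespace.
extra = ET.SubElement(root, "job")
try:
    ET.tostring(root, encoding="unicode", default_namespace=NS)
    error = None
except ValueError as exc:
    error = exc
print("ValueError:", error)
root.remove(extra)  # restore the valid tree
```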
