I am pretty new in XPath and in C# and I have the following problem:
I have to parse this file: http://static.nvd.nist.gov/feeds/xml/cpe/dictionary/official-cpe-dictionary_v2.3.xml
As you can see opening it in the browser this file have the following structure:
<?xml version='1.0' encoding='UTF-8'?>
<cpe-list xmlns:meta="http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2" xmlns:config="http://scap.nist.gov/schema/configuration/0.1" xmlns:ns6="http://scap.nist.gov/schema/scap-core/0.1" xmlns:scap-core="http://scap.nist.gov/schema/scap-core/0.3" xmlns="http://cpe.mitre.org/dictionary/2.0" xmlns:cpe-23="http://scap.nist.gov/schema/cpe-extension/2.3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://scap.nist.gov/schema/configuration/0.1 http://nvd.nist.gov/schema/configuration_0.1.xsd http://cpe.mitre.org/dictionary/2.0 http://scap.nist.gov/schema/cpe/2.3/cpe-dictionary_2.3.xsd http://scap.nist.gov/schema/scap-core/0.3 http://nvd.nist.gov/schema/scap-core_0.3.xsd http://scap.nist.gov/schema/scap-core/0.1 http://nvd.nist.gov/schema/scap-core_0.1.xsd http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2 http://nvd.nist.gov/schema/cpe-dictionary-metadata_0.2.xsd http://scap.nist.gov/schema/cpe-extension/2.3 http://scap.nist.gov/schema/cpe/2.3/cpe-dictionary-extension_2.3.xsd">
<generator>
<product_name>National Vulnerability Database (NVD)</product_name>
<product_version>2.22.0-SNAPSHOT (PRODUCTION)</product_version>
<schema_version>2.3</schema_version>
<timestamp>2014-03-05T05:13:33.550Z</timestamp>
</generator>
<cpe-item name="cpe:/a:1024cms:1024_cms:0.7">
<title xml:lang="en-US">1024cms.org 1024 CMS 0.7</title>
<cpe-23:cpe23-item name="cpe:2.3:a:1024cms:1024_cms:0.7:*:*:*:*:*:*:*"/>
</cpe-item>
<cpe-item name="cpe:/a:1024cms:1024_cms:1.2.5">
<title xml:lang="en-US">1024cms.org 1024 CMS 1.2.5</title>
<cpe-23:cpe23-item name="cpe:2.3:a:1024cms:1024_cms:1.2.5:*:*:*:*:*:*:*"/>
</cpe-item>
<cpe-item name="cpe:/a:1024cms:1024_cms:1.3.1">
<title xml:lang="en-US">1024cms.org 1024 CMS 1.3.1</title>
<cpe-23:cpe23-item name="cpe:2.3:a:1024cms:1024_cms:1.3.1:*:*:*:*:*:*:*"/>
</cpe-item>
.............................................................
.............................................................
.............................................................
<cpe-item name="cpe:/h:zyxel:p-660hw_t3:v2">
<title xml:lang="en-US">ZyXEL P-660HW T3 Model v2</title>
<cpe-23:cpe23-item name="cpe:2.3:h:zyxel:p-660hw_t3:v2:*:*:*:*:*:*:*"/>
</cpe-item>
</cpe-list>
So now, using XPath, I have to obtain the list of all tag (excluding the first tag situated as first tag into my tag
In my code I have something like it:
XmlDocument document = new XmlDocument(); // Represent an XML document
document.Load(sourceXML.FullName); // Loads the XML document from the specified stream
// Add the namespaces:
XmlNamespaceManager nsmgr = new XmlNamespaceManager(document.NameTable);
nsmgr.AddNamespace("ns6", "http://scap.nist.gov/schema/scap-core/0.1");
nsmgr.AddNamespace("cpe-23", "http://scap.nist.gov/schema/cpe-extension/2.3");
nsmgr.AddNamespace("ns", "http://cpe.mitre.org/dictionary/2.0");
nsmgr.AddNamespace("meta", "http://scap.nist.gov/schema/cpe-dictionary-metadata/0.2");
nsmgr.AddNamespace("scap-core", "http://scap.nist.gov/schema/scap-core/0.3");
nsmgr.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
nsmgr.AddNamespace("config", "http://scap.nist.gov/schema/configuration/0.1");
/* nodeList is the collection that contains all the <cpe-item> tag that are
* inside the root <cpe-list> tag in the XML document:
*/
XmlNodeList nodeList;
nodeList = document.DocumentElement.SelectNodes("//ns:cpe-list/ns:cpe-item", nsmgr);
long conta = 0;
So I am using this line to select all the tag that are into the tag:
nodeList = document.DocumentElement.SelectNodes("//ns:cpe-list/ns:cpe-item", nsmgr);
It seems to work but I am not sure if it is correct because when I look into using the Visual Studio Debugger it say to me that my XmlNodeList nodeList contains: 80588 element (the file is very big but it seems to me to much element !!!)
Another doubt is related to the use of the ns namespace that is into my previouse code (this is not my code, I have to work on it).
Why in the previous code there is the ns namepace ahead the cpe-list and cpe-item if in the XML code to parse I smply have something like:
<cpe-item name="cpe:/a:1024cms:1024_cms:1.3.1">
<title xml:lang="en-US">1024cms.org 1024 CMS 1.3.1</title>
<cpe-23:cpe23-item name="cpe:2.3:a:1024cms:1024_cms:1.3.1:*:*:*:*:*:*:*"/>
</cpe-item>
that don't begin with ns namespace? Why is it used?
The last question is about how can I access to the title inner text content?
I am trying to do something like this but in this way can't work:
XmlNodeList nodeList;
nodeList = document.DocumentElement.SelectNodes("//ns:cpe-list/ns:cpe-item", nsmgr);
long conta = 0;
DataModel.Vulnerability.CPE currentCPE;
foreach (XmlNode node in nodeList)
{
// Access to the name ATTRIBUTE of the <cpe-item> tag:
Debug.WriteLine(String.Format("[{0:N0}] CPE: {1} Title: {2}", conta, node.Attributes["name"].Value, node.FirstChild.FirstChild.Value));
// Access to the <title> tag content:
//Debug.WriteLine(String.Format("[{0:N0}] Title: {1} Title: {2}", conta, node.SelectSingleNode("./title", nsmgr)));
XmlNode titleNode = node.SelectSingleNode("./title", nsmgr);
conta++;
}
When this code is executed I have no problem to access to the name attributes of the current cpe element into my list but I can't access to the content of the tag because when execute this line:
XmlNode titleNode = node.SelectSingleNode("./title", nsmgr);
it return that the value is null
What is the problem? What am I missing? How can I solve?
Tnx
Andrea
Your XPath looks fine given XML snippet posted in this question. It should return correct number of elements as far as I can see. Can't tell more than that, you should check further yourself.
Your XML has default namespace (xmlns="....."). All elements in XML without prefix considered in default namespace. But in XPath, all element without prefix considered has no namespace. In the end, that different paradigm of both platform requires you to define ns prefix that point to default namespace url for use in XPath statement.
Related to point 2. Remember that all element without prefix is in default namespace. So is <title> element. Hence you need to add ns prefix in the XPath statement : ./ns:title
Ideally, one post has to contains no more than one specific question. Answering a bunch of questions in one post is rarely useful for future visitors, it is tend to confuse them instead. Remember that we are not only solving your problem here, but also trying to build knowledge-base that hopefully useful for others having similar problem.
Related
I am trying to edit the values of an xml document following the instructions found on another post here How to modify existing XML file with XmlDocument and XmlNode in C# .
here is my code
XmlDocument xml = new XmlDocument();
xml = xml.Load(#"https://www.aade.gr/sites/default/files/2020-09/SampleXML_1.1%20%20%28%CE%A4%CE%99%CE%9C-%CE%A0%CE%A9%CE%9B%CE%97%CE%A3%CE%97%CE%A3_%CE%91%CE%A5%CE%A4%CE%9F%CE%A4%CE%99%CE%9C%29%20.xml");
XmlNodeList aNodes = xml.SelectNodes("/InvoicesDoc/invoice/issuer/vatNumber");
foreach (XmlNode aNode in aNodes)
{
XmlAttribute vatAttribute = aNode.Attributes["vatNumber"];
vatAttribute.Value = "123456789";
}
xml.Save(#"C:\Users\Kostas\Desktop\mydata\infinal.xml");
My problem is that XmlNodeList aNodes will return empty; i have tried to change the xml.SelectNodes("/InvoicesDoc/invoice/issuer/vatNumber") to xml.SelectNodes("/InvoicesDoc/invoice/issuer") and all the way up to single xml.SelectNodes("/InvoicesDoc") but still XmlNodeList aNodes will return empty.
First attempts i loaded the XML doc from file and had the issue. Then i thought maybe something wrong with the file so changed the load of the file directly from the site provides this xml template i need to work on. Both options will load the file fine as i can see it when is saved but my changes will not complete since aNodes is empty and foreach loop will skip straight away.
What am i doing wrong?
thanks for your help in advance.
ps this is the xml i need to edit
https://www.aade.gr/sites/default/files/2020-09/SampleXML_1.1%20%20%28%CE%A4%CE%99%CE%9C-%CE%A0%CE%A9%CE%9B%CE%97%CE%A3%CE%97%CE%A3_%CE%91%CE%A5%CE%A4%CE%9F%CE%A4%CE%99%CE%9C%29%20.xml
Update: I just tried with another xlm found in microsoft example called books on https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms762271(v=vs.85)
XmlNodelist will also return null/empty when i look for /catalog/book . So the good side is that there is no problem with original xml file i need to edit and the bad side is that still i cannot figure out what i am doing wrong.
XmlNodeList aNodes returns null because the xml contains these namespace declarations:
<InvoicesDoc xmlns=\"http://www.aade.gr/myDATA/invoice/v1.0\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"
xsi:schemaLocation=\"http://www.aade.gr/myDATA/invoice/v1.0/InvoicesDoc-v0.6.xsd\"
xmlns:icls=\"https://www.aade.gr/myDATA/incomeClassificaton/v1.0\"
xmlns:ecls=\"https://www.aade.gr/myDATA/expensesClassificaton/v1.0\">
You need to manage your xml doing something like this:
XmlDocument xml = new XmlDocument();
xml.Load(#"https://www.aade.gr/sites/default/files/2020-09/SampleXML_1.1%20%20%28%CE%A4%CE%99%CE%9C-%CE%A0%CE%A9%CE%9B%CE%97%CE%A3%CE%97%CE%A3_%CE%91%CE%A5%CE%A4%CE%9F%CE%A4%CE%99%CE%9C%29%20.xml");
XmlNamespaceManager manager = new XmlNamespaceManager(xml.NameTable);
manager.AddNamespace("InvoicesDoc", "http://www.aade.gr/myDATA/invoice/v1.0");
//Example to get the root element
XmlNodeList root = xml.SelectNodes("/InvoicesDoc:InvoicesDoc", manager);
//Example to get the VatNumber tag
XmlNodeList aNodes =xml.SelectNodes("/InvoicesDoc:InvoicesDoc/InvoicesDoc:invoice/InvoicesDoc:issuer/InvoicesDoc:vatNumber", manager);
I have the following XML structure:
<?xml version="1.0" encoding="utf-16"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance
" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<StoreResponse xmlns="http://www.some-site.com">
<StoreResult>
<Message />
<Code>OK</Code>
</StoreResult>
</StoreResponse>
</soap:Body>
</soap:Envelope>
I need to get the InnerText from Codeout of this document and I need help with the appropriate XPATH statement.
I'm really confused by XML namespaces. While working on a previous namespace problem in another XML document, I learned, that even if there's nothing in front of Code (e.g. ns:Code), it is still part of a namespace defined by the xmlns attribute in its parent node. Now, there are multiple xmlns nodes defined in parents of Code. What is the namespace that I need to specify in an XPATH statement? Is there such a thing as a "primary namespace"? Do childnodes inherit the (primary) namespace of it's parents?
The namespace of the <Code> element is http://www.some-site.com. xmlsn:xxx means that names prefixed by xxx: (like soap:Body) have that namespace. xmlns by itself means that this is the default namespace for names without any prefix.
An example of using an XDocument (Linq) approach:
XNamespace ns = "http://www.some-site.com";
var document = XDocument.Parse("your-xml-string");
var elements = document.Descendants( ns + "StoreResult" )
Descendant elements will inherit the last immediate namespace. In your example you will need to create two namespaces one for the soap envelope and a second for "some-site".
Here's an option I found in this question: Weirdness with XDocument, XPath and namespaces
var xml = "<your xml>";
var doc = XDocument.Parse(xml); // Could use .Load() here too
var code = doc.XPathSelectElement("//*[local-name()='Code']");
I have a C# program that attempts to read the following xml, but can't read any elements:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Comments Here -->
<FileFeed
xmlns="http://www.mycompany.com/schemas/xxx/FileFeed/V1"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.somecompany.com/schemas/xxx/FileFeed/V1
FileFeed.xsd"
RecordCount = "1">
<Object>
<ID>PAMSMOKE110113xxx</ID>
<CorpID>12509</CorpID>
<AnotherID>201654702345</AnotherID>
<TimeStamp>2013-09-03</TimeStamp>
<Type>Some Type</Type>
<SIM_ID>89011704258012600767</SIM_ID>
<Code>ZZZ</Code>
<Year>2013</Year>
</Object>
</FileFeed>
With the above XML my C# program is unable to read any elements.. For instance the ID Element is always NULL.
Now if I simply remove the first xmlns from the above XML, my program can read all the elements without any issues. The problem is I have to process the XML file in the format that's given to me, and can't change the file format. My program reads the below XML just fine: Note the line xmlns="http://www.mycompany.com/schemas/xxx/FileFeed/V1" is removed.
<?xml version="1.0" encoding="UTF-8"?>
<!-- Comments Here -->
<FileFeed
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.somecompany.com/schemas/xxx/FileFeed/V1
FileFeed.xsd"
RecordCount = "1">
<Object>
<ID>PAMSMOKE110113xxx</ID>
<CorpID>12509</CorpID>
<AnotherID>201654702345</AnotherID>
<TimeStamp>2013-09-03</TimeStamp>
<Type>Some Type</Type>
<SomeNumber>89011704258012600767</SomeNumber>
<Code>ZZZ</Code>
<Year>2013</Year>
</Object>
</FileFeed>
I realize I'm not posting any code, but just wondering what possible issue could I be having, where simply removing the xmlns line resolves everything??
Your problem is with xml namespaces
Using Linq2Xml
XNamespace ns = "http://www.mycompany.com/schemas/xxx/FileFeed/V1";
var xDoc = XDocument.Load(fname);
var id = xDoc.Root.Element(ns + "Object").Element(ns + "ID").Value;
Your root element FileFeed has a namespace attribute. This means that each element inside it also uses that namespace.
The Element method takes an XName as its argument. Usually you use a string which gets implicitly converted into an XName.
If you want to include a namespace you create an XNamespace and add the string. Since XNamespace overloads the + operator this will also result in an XName.
XDocument doc = XDocument.Load("Test.xml");
// this will be null
XElement objectElementWithoutNS = doc.Root.Element("Object");
XNamespace ns = doc.Root.GetDefaultNamespace();
XElement objectElementWithNS = doc.Root.Element(ns + "Object");
Xml namespaces are more or less like C# namespaces. Would you be able to access a class when its namespace is set or not set?
public namespace My.Company.Schemas {
public class FileFeed
vs
public class FileFeed {
They are two DISTINCT classes! The same applies to XML - by setting a namespace you make it possible to have documents with similar or even the same internal structure but they represent two disctinct documents that are not exchangeable. This is really convenient.
If you'd like to get help on why your actual reading method doesn't consider the namespace, you have to present the C# code. The general rule though is that any reading API makes is possible to set the namespace for actual reading.
Got this xml:
<?xml version="1.0" encoding="UTF-8"?>
<video xmlns="UploadXSD">
<title>
A vid with Pete
</title>
<description>
Petes vid
</description>
<contributor>
Pete
</contributor>
<subject>
Cat 2
</subject>
</video>
And this xpath:
videoToAdd.Title = doc.SelectSingleNode(#"/video/title").InnerXml;
And im getting an 'object reference not set to an instance of an object'. Any ideas why this is a valid xpath from what I can see and it used to work...
Your XML contains namespace specification, you need to modify your source to take that into consideration.
Example:
XmlDocument doc = new XmlDocument();
doc.Load("doc.xml");
XmlNamespaceManager xmlnsManager = new XmlNamespaceManager(doc.NameTable);
xmlnsManager.AddNamespace("ns", "UploadXSD");
videoToAdd.Title = doc.SelectSingleNode(#"/ns:video/ns:title", xmlnsManager).InnerXml;
/video/title would return a title element with no namespace, from within a video element with no namespace.
You need to either remove xmlns="UploadXSD" from your xml, or set an appropriate selection namespace in your C#
It's the xmlns="UploadXSD" attribute causing you grief here. I think you'll need to use a XmlNamespaceManager to help the parser resolve the names, or remove the xmlns attribute if you don't need it.
Is it possible that the doc variable points to the <video> element? In that case you would need to write either
videoToAdd.Title = doc.SelectSingleNode(#"./title").InnerXml;
or
videoToAdd.Title = doc.SelectSingleNode(#"//video/title").InnerXml;
Try this:
videoToAdd.Title = doc.SelectSingleNode(#"//xmlns:video/xmlns:title").InnerXml;
Your XML document has an XML namespace and to find the elements you must prefix them with xmlns:.
I've got an Xml document that both defines and references some namespaces. I load it into an XmlDocument object and to the best of my knowledge I create a XmlNamespaceManager object with which to query Xpath against. Problem is I'm getting XPath exceptions that the namespace "my" is not defined. How do I get the namespace manager to see that the namespaces I am referencing are already defined. Or rather how do I get the namespace definitions from the document to the namespace manager.
Furthermore tt strikes me as strange that you have to provide a namespace manager to the document which you create from the documents nametable in the first place. Even if you need to hardcode manual namespaces why can't you add them directly to the document. Why do you always have to pass this namespace manager with every single query? What can't XmlDocument just know?
Code:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(programFiles + #"Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\FEATURES\HfscBookingWorkflow\template.xml");
XmlNamespaceManager ns = new XmlNamespaceManager(xmlDoc.NameTable);
XmlNode referenceNode = xmlDoc.SelectSingleNode("/my:myFields/my:ReferenceNumber", ns);
referenceNode.InnerXml = this.bookingData.ReferenceNumber;
XmlNode titleNode = xmlDoc.SelectSingleNode("/my:myFields/my:Title", ns);
titleNode.InnerXml = this.bookingData.FamilyName;
Xml:
<?xml version="1.0" encoding="UTF-8" ?>
<?mso-infoPathSolution name="urn:schemas-microsoft-com:office:infopath:Inspection:-myXSD-2010-01-15T18-21-55" solutionVersion="1.0.0.104" productVersion="12.0.0" PIVersion="1.0.0.0" ?>
<?mso-application progid="InfoPath.Document" versionProgid="InfoPath.Document.2"?>
<my:myFields xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-01-15T18:21:55" xmlns:xd="http://schemas.microsoft.com/office/infopath/2003">
<my:DateRequested xsi:nil="true" />
<my:DateVisited xsi:nil="true" />
<my:ReferenceNumber />
<my:FireCall>false</my:FireCall>
Update:
ns.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
ns.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml");
ns.AddNamespace("xd", "http://schemas.microsoft.com/office/infopath/2003");
ns.AddNamespace("my", "http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-01-15T18:21:55");
This does the job, but it mean's I have to hard code to this particular xml schema. This schema represents an infopath form template. In particular the my namespace url will be different for every form template so I really don't want to hardcode this. It would be nice to find a clean way to get this namespace from the xml without resorting to RegEx.
I was hoping that the XmlNamespaceManager would just sort of pick up the namespace definitions form the NameTable. I mean their in the Xml but I still have to define them.
ns.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
ns.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml");
ns.AddNamespace("xd", "http://schemas.microsoft.com/office/infopath/2003");
ns.AddNamespace("my", "http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-01-15T18:21:55");
This does the job, but it mean's I have to hard code to this particular xml schema. This schema represents an infopath form template. In particular the my namespace url will be different for every form template so I really don't want to hardcode this. It would be nice to find a clean way to get this namespace from the xml without resorting to Regex.
I was hoping that the XmlNamespaceManager would just sort of pick up the namespace definitions form the NameTable. I mean their in the Xml but I still have to define them.
Here is the answer to the "What can't XmlDocument just know?" question.
NameTable is just an optimization for storing names. It has actually nothing to do with namespaces.
And even if XmlNamespaceManager could infer all namespaces and prefixes from XML doc that won't help in general case because of XML namespaces nature, e.g. what would XmlNamespaceManager map "my" prefix in this case:
<root>
<foo xmlns:my="blah"/>
<foo xmlns:my="balh-blah-blah"/>
</root>
Have you defined "my" in the namespace-manager?
ns.AddNamespace("my", "http://schemas.microsoft.com/office/infopath/2003/myXSD/2010-01-15T18:21:55");
Or better - choose something that is unlikely to conflict. It does seem odd that it didn't pick it up from the name-table, though.
For me with InfoPath 2007 this solved the problem
static public XmlNamespaceManager GetNameSpaceManager(this XmlDocument document)
{
XmlNamespaceManager xmlNamespaceManager = new XmlNamespaceManager(document.NameTable);
xmlNamespaceManager.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
xmlNamespaceManager.AddNamespace("dfs", "http://schemas.microsoft.com/office/infopath/2003/dataFormSolution");
xmlNamespaceManager.AddNamespace("d", "http://schemas.microsoft.com/office/infopath/2003/ado/dataFields");
xmlNamespaceManager.AddNamespace("my", "http://schemas.microsoft.com/office/infopath/2003/myXSD/2012-03-29T06:28:28");
xmlNamespaceManager.AddNamespace("xd", "http://schemas.microsoft.com/office/infopath/2003");
return xmlNamespaceManager;
}