xpath to element with attribute and grandparent attribute - c#

The relevant chunk of my xml is this:
[... lots of xml up here, including ancestor elements...]
<category id="MyCatID" ... >
<option ... >
<property id="MyPropID">The magic value I need</property>
[... lots of xml down here...]
My objective: Find the value of a <property> with id of MyPropID whose parent is <option> and whose grandparent (through <option>) is <category> containing the id of MyCatID.
Here is my attempted xpath:
//property[@id='MyPropID']/ancestor::category[@id='MyCatID']
In my .NET 4.7.2 that xpath query brings back all the xml inside the <category> element, which misses the mark. My hoped-for result is that it would bring back the value The magic value I need.
How is it done?

Why not reverse it, get the category with the ID you want and then navigate to the property with the ID you want? I'm not really sure how your XML looks, here's my pseudo attempt...
//category[@id='MyCatID']/option/property[@id='MyPropID']

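And if for some reason you really want to do it bottom-up, select the property and filter it by its parent and grandparent:
//property[@id='MyPropID'][parent::option/parent::category[@id='MyCatID']]
or, testing the nearest <category> ancestor instead of the exact grandparent:
//property[@id='MyPropID'][ancestor::category[1]/@id='MyCatID']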
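A minimal sketch of how the top-down expression could be tried from C#. The sample document is invented for illustration (the question elides most of the real XML), and the code is written as top-level statements, so on .NET Framework 4.7.2 it would need to be wrapped in a Main method.

```csharp
using System;
using System.Xml;

// Hypothetical sample document; only the category/option/property shape comes from the question.
var doc = new XmlDocument();
doc.LoadXml(@"
<root>
  <category id=""OtherCat"">
    <option>
      <property id=""MyPropID"">Wrong value</property>
    </option>
  </category>
  <category id=""MyCatID"">
    <option>
      <property id=""MyPropID"">The magic value I need</property>
    </option>
  </category>
</root>");

// Top-down: pick the category first, then walk down to the property.
var node = doc.SelectSingleNode(
    "//category[@id='MyCatID']/option/property[@id='MyPropID']");

// SelectSingleNode returns null when nothing matches, so guard before reading.
var value = node?.InnerText;
Console.WriteLine(value);
```

Because the expression ends on the property element rather than on the category, InnerText yields only the property's text and not the whole category subtree.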

Related

xpath to find node by element and attribute containing a child element with a certain id

In c# I'm trying to find an xpath expression that will get me the value of a <property> element with id of ROBEGIN whose parent is <option> and that parent contains a child <property> with id of CEProductID and value of 5832198a-7cec-ea11-a817-000d3a191efa. The expected value I want to get is 777. Here is an xml fragment from a large xml file:
...
<option id="Whatever">
<property id="CEProductID">5832198a-7cec-ea11-a817-000d3a191efa</property>
...
<property id="ROBEGIN">777</property>
</option>
...
Important: For the <option> to be a correct match it must contain BOTH the child elements shown above, with correct id attribute values and correct element value of CEProductID. If it has one or the other matching <property> but not both, it should be ignored.
I have tried the following (and other permutations of it) without success:
xmlNode.SelectNodes($"//property[@id='CEProductID']='5832198a-7cec-ea11-a817-000d3a191efa'");
Admittedly, the above line of c# code (even if it worked) would have only gotten me the CEProductID <property> element, with which I could go programmatically up to the parent, and back down into the properties to see if <ROBEGIN> exists, and if it does, grab the value. But that seems super inefficient and I think xpath has more power than that.
How is it done?
This should get you exactly what you need:
//property[@id = 'ROBEGIN' and
    parent::option[property
        [@id = 'CEProductID' and text() = '5832198a-7cec-ea11-a817-000d3a191efa']]
]/text()
Let's break it down:
//property                                             descend to any node named property
[@id='ROBEGIN'                                         which has this matching attribute id
and parent::option                                     and has a parent node named option
[property                                              which in turn has a property child node
[@id='CEProductID'                                     which in turn has this attribute
and text()='5832198a-7cec-ea11-a817-000d3a191efa'      and whose inner text matches this value
]]]/text()                                             going back to the original node, take the inner text
Result:
777
If I understood correctly, you want to match an option element that has both properties described in your question (including the CEProductID value), then step down to its ROBEGIN property and extract the inner text.
//option[property[@id='CEProductID']='5832198a-7cec-ea11-a817-000d3a191efa' and property[@id='ROBEGIN']]/property[@id='ROBEGIN']/text()
# 777
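As a rough check of the "both children must match" requirement, the sketch below runs an equivalent expression against a made-up document containing one option with only the CEProductID property, one with a different product ID, and one that satisfies both conditions. The wrapper element <options> and the first two options are assumptions added for illustration; the code uses top-level statements.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(@"
<options>
  <option id=""OnlyProduct"">
    <property id=""CEProductID"">5832198a-7cec-ea11-a817-000d3a191efa</property>
  </option>
  <option id=""OtherProduct"">
    <property id=""CEProductID"">00000000-0000-0000-0000-000000000000</property>
    <property id=""ROBEGIN"">111</property>
  </option>
  <option id=""Whatever"">
    <property id=""CEProductID"">5832198a-7cec-ea11-a817-000d3a191efa</property>
    <property id=""ROBEGIN"">777</property>
  </option>
</options>");

const string productId = "5832198a-7cec-ea11-a817-000d3a191efa";

// Keep only options whose CEProductID child has the wanted value,
// then step to the ROBEGIN sibling inside that same option.
string xpath =
    $"//option[property[@id='CEProductID']='{productId}']" +
    "/property[@id='ROBEGIN']";

var node = doc.SelectSingleNode(xpath);
var roBegin = node?.InnerText;
Console.WriteLine(roBegin); // prints 777
```

The first option passes the predicate but has no ROBEGIN child, so it contributes nothing; the second fails the predicate outright.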

How do I modify a XML Node?

I want to modify a node. My XML file looks like this:
<Tasks>
<Task>
<Title>Title of the Task</Title>
<Description>Description of the Task</Description>
<Done>false</Done>
</Task>
<Task>
<Title>Title of anotherTask</Title>
<Description>Description of anotherTask</Description>
<Done>true</Done>
</Task>
</Tasks>
I could address the node like this:
xmlDoc.SelectSingleNode("/Tasks/Task/Description").InnerText = "My Description";
However, I have multiple Tasks. How do I indicate which is which? I want to change the "Done" state of a Task from false to true.
You could iterate through each Task node returned by something like this:
foreach (XmlNode xn in xmlDoc.SelectNodes("/Tasks/Task"))
{
// xn is one <Task> element; do something with it
}
and do what you need to do on each node. More info on SelectNodes here: https://msdn.microsoft.com/en-us/library/system.xml.xmlnode.selectnodes%28v=vs.110%29.aspx
If you have control over the design of the XML, perhaps you should consider adding an ID to each task. An ID will allow you to make changes to a specific Task node instead of iterating through them or looking one up by its Title.
You may also look at these articles:
https://msdn.microsoft.com/en-us/library/bb943906.aspx
How can I iterate though each child node in an XML file?
Having said all this, I feel your question is missing some information about the criteria for choosing which Task to change. Could you expand on that? You will get better answers that way.
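If the Title is what identifies a task, one possible approach is to put it in an XPath predicate and update the matching Done element directly. This is only a sketch under that assumption (the question does not say which field identifies a task), written as top-level statements.

```csharp
using System;
using System.Xml;

var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(@"
<Tasks>
  <Task>
    <Title>Title of the Task</Title>
    <Description>Description of the Task</Description>
    <Done>false</Done>
  </Task>
  <Task>
    <Title>Title of anotherTask</Title>
    <Description>Description of anotherTask</Description>
    <Done>true</Done>
  </Task>
</Tasks>");

// The predicate [Title='...'] keeps only the Task whose Title child has that text.
var done = xmlDoc.SelectSingleNode("/Tasks/Task[Title='Title of the Task']/Done");
if (done != null)
{
    done.InnerText = "true";
}

// Count how many tasks are still open after the update.
int open = xmlDoc.SelectNodes("/Tasks/Task[Done='false']").Count;
Console.WriteLine(open); // prints 0
```

A positional predicate such as /Tasks/Task[1]/Done would also work, but it breaks as soon as the order of tasks changes, which is why an ID or a title match tends to be more robust.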

XPath String that grabs an element with a specific id value

I am trying to create an XPath query/string that grabs a specific element from a XML document. I am attempting to grab the element with the id=38 but my code always returns nothing for some reason.
If you look at my code & the organisation of my XML file can you tell me what XPath I need to grab the element with the id=38?
My code is:
XmlDocument xdoc = new XmlDocument();
xdoc.Load(getProductURL());
XmlNode node = xdoc.DocumentElement.SelectSingleNode("id('38')");
// node always is null for some reason?
The way the xml is organised is like so:
<courseg>
<group isempty="False" isbranch="true" id="1" name="abc">
<group isempty="False" isbranch="true" id="38" name="def"></group>
</group>
</courseg>
The XPath you need is
//*[@id='38']
Here is the example with XDocument:
using System;
using System.Xml.Linq;
using System.Xml.XPath; // provides the XPathSelectElement extension method

XDocument xdoc = XDocument.Parse(@"
<courseg>
<group isempty=""False"" isbranch=""true"" id=""1"" name=""abc"">
<group isempty=""False"" isbranch=""true"" id=""38"" name=""def""></group>
</group>
</courseg>");
XElement node = xdoc.Root.XPathSelectElement("//*[@id='38']");
Console.WriteLine(node);
The function id('P38') would select an element with an ID value of P38. But this doesn't just mean "an attribute named 'id'". It means an attribute declared in the DTD or schema as being of type ID. You haven't shown a DTD or schema, and I suspect you don't have one. If you did, and if it declared the id attribute as being of type ID, then your document would be invalid, because an ID value cannot be all-numeric (for legacy SGML reasons, it has to take the form of a name).
In practice the id() function is probably best avoided unless you have severe performance requirements. It's too fragile - it only works when you are validating the source document against a schema or DTD. In XSLT, use key() instead. Alternatively, many processors now recognize the attribute name xml:id as a 'self declaring' ID value without reference to a schema or DTD: use that if your processor supports it.
Use this XPath query:
//*[@id = 38]
It selects every element whose id attribute equals 38. If you have to be more specific, i.e. select only group elements whose id attribute equals 38, use this one:
//group[@id = 38]
When you write
xdoc.DocumentElement.SelectSingleNode("id('38')")
you are calling the XPath id() function, which only matches attributes declared as type ID in a DTD or schema. It does not look at an ordinary attribute that merely happens to be named id.
So use //group[@id = '38'] to get every group element whose id attribute has the value 38.
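Since the question uses XmlDocument rather than XDocument, here is a small sketch of the attribute-based lookup with that API. It also tries both the quoted and the numeric form of the comparison, which should select the same element for this document. The XML is inlined instead of loaded from getProductURL(), and the code is written as top-level statements.

```csharp
using System;
using System.Xml;

var xdoc = new XmlDocument();
xdoc.LoadXml(@"
<courseg>
  <group isempty=""False"" isbranch=""true"" id=""1"" name=""abc"">
    <group isempty=""False"" isbranch=""true"" id=""38"" name=""def""></group>
  </group>
</courseg>");

// String comparison against the attribute value.
var byString = xdoc.DocumentElement.SelectSingleNode("//group[@id='38']");

// Numeric comparison: the attribute value is converted to a number first.
var byNumber = xdoc.DocumentElement.SelectSingleNode("//group[@id=38]");

var nameByString = byString?.Attributes["name"]?.Value;
var nameByNumber = byNumber?.Attributes["name"]?.Value;
Console.WriteLine(nameByString); // prints def
Console.WriteLine(nameByNumber); // prints def
```

The two forms differ for values like "038": the numeric test would match it, the string test would not.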

What's a better alternative to the following data structure: Dictionary<string, Dictionary<string, string>>

I have the following set of data
<ids>
<id1 attr1="value1" attr2="value2" />
<id2 attr3="value3" attr4="value4" />
<id3 attr2="value6" attr5="value7" />
</ids>
Basically, it's an XML that can have any node name with any attribute name with any attribute value.
After parsing the XML, I store the attribute data in a Dictionary.
Then I store that same Dictionary as a value with the node name as a key.
So my data structure would be a Dictionary<string, Dictionary<string, string>> (let's give this a variable name called "dict")
So if I wanted to get the value for attr2 in the id1 node, I would do:
string value = dict["id1"]["attr2"];
// value will be value2
I think this is a pretty simple and workable solution for my needs, but there just seems to be this voice at the back of my head telling me that there is a different data structure or simpler solution that I'm missing out on. What does everyone think?
I think your solution is a good one. It will provide very fast lookups, and matches exactly to your domain.
Is your main problem with the nested dictionaries? If so, I would suggest that you not worry about it - using collections of collections is often a very useful tool.
My only complaint would be this: if you're not using this frequently, you're going to be loading a lot of information into a data structure that may be unnecessary. If this is for one-time lookups, leaving it in XML and using XPath queries may be a better solution than pre-parsing and loading the entire thing into memory. If you're querying this frequently, though, the dictionary is the better solution.
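For the one-time-lookup case, a sketch of what the XPath route might look like. Lookup is a hypothetical helper, and it assumes the node and attribute names are trusted, since they are pasted into the expression unescaped. The code uses top-level statements.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(@"
<ids>
  <id1 attr1=""value1"" attr2=""value2"" />
  <id2 attr3=""value3"" attr4=""value4"" />
  <id3 attr2=""value6"" attr5=""value7"" />
</ids>");

// Hypothetical helper: returns the attribute value, or null when the
// element or the attribute does not exist.
string Lookup(string nodeName, string attrName) =>
    doc.SelectSingleNode($"/ids/{nodeName}/@{attrName}")?.Value;

var value = Lookup("id1", "attr2");
var missing = Lookup("id2", "attr2");

Console.WriteLine(value);           // prints value2
Console.WriteLine(missing == null); // prints True
```

Each call walks the document again, which is fine for occasional lookups but is the cost the dictionary avoids when lookups are frequent.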
How about a class?
public class YourId
{
public string Id { get; set; }
public string Attribute1 { get; set; }
public string Value { get; set; }
}
Then you could create a List<YourId> and populate it from your XML.
It would be easy to work with and you could use linq with it:
List<YourId> items = GetIdsFromXml();
var query = from i in items
where i.Id == "id1"
select i;
// or...
items.Where(i => i.Attribute1 == "blah").ToList();
// etc.
Just for grins - what if you kept the XML DOM and found your attributes with XPath queries? That way if you had duplicate node names you could accommodate that.
That XML doesn't look very good. It's not semantic XML at all. Semantic XML would be:
<data>
<item id="id1">
<value name="attr1">value1</value>
<!-- ... -->
</item>
<!-- ... -->
</data>
I know it's bigger, but that's XML for you. The reason I'm even saying this is that if you're not ready to go with semantic XML, you're probably looking for another data format. XML is a little bit bloated by nature. If you're looking for a compact format, have a look at JSON.
Anyways, using semantic XML, I would recommend XPath. Have a look in the MSDN documentation and look at the SelectNodes methods in the DOM objects.
Short example:
XmlDocument doc = new XmlDocument();
doc.Load("data.xml");
// Get a single item.
XmlNode item = doc.SelectSingleNode("//item[@id='myid']");
As long as all of the nodes have unique names, you should be OK. Note that it won't really work for XML like this:
<nodes>
<node id="id1" attr1="value1" attr2="value2" />
<node id="id2" attr3="value3" attr4="value4" />
<node id="id3" attr2="value6" attr5="value7" />
</nodes>
Given that the XML can have any node name and any attribute name I think your current solution is optimal.
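For reference, the nested dictionary from the question can be built in a few lines with LINQ to XML. This is a sketch rather than the asker's actual parsing code, and it assumes element names are unique, because ToDictionary throws on a duplicate key. It is written as top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

var root = XElement.Parse(@"
<ids>
  <id1 attr1=""value1"" attr2=""value2"" />
  <id2 attr3=""value3"" attr4=""value4"" />
  <id3 attr2=""value6"" attr5=""value7"" />
</ids>");

// Outer key: element name. Inner dictionary: attribute name -> attribute value.
Dictionary<string, Dictionary<string, string>> dict = root.Elements()
    .ToDictionary(
        e => e.Name.LocalName,
        e => e.Attributes().ToDictionary(a => a.Name.LocalName, a => a.Value));

string value = dict["id1"]["attr2"];
Console.WriteLine(value); // prints value2
```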
Why not use something that already exists?
Like Simple XML Parser in C#
If you need an XML grammar then create one for your needs. If you need a parser then use one of the many excellent ones provided in the .Net library. If you need to store the document in memory and access it use the DOM and XPath to select nodes. If you don't need any of this, then I would recommend against using XML and instead using something simpler like JSON.
If you need to keep the whole thing in memory, but just the values, then I suggest using DataSets and loading them with the XML loaders.

How can I use XPath to get elements?

My XML is like:
<root>
<section name="blah">
<item name="asdf">2222</item>
</section>
</root>
I will have multiple 'sections' in the XML, I want to fetch a particular section.
In this case, I need to get items that are in the section named "blah".
The xpath is then:
/root/section[@name='blah']/item
for example, in XmlDocument:
foreach (XmlElement item in doc.SelectNodes("/root/section[@name='blah']/item"))
{
Console.WriteLine(item.GetAttribute("name"));
Console.WriteLine(item.InnerText);
}
Edit re comments: if you just want the sections, then use:
/root/section[@name='blah']
but then you'll need to iterate the data manually (since you can theoretically have multiple sections named "blah", each of which can have multiple items).
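Iterating the data manually could look roughly like the sketch below. The extra sections and items in the sample document are invented to show the "multiple sections named blah" case; the code uses top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(@"
<root>
  <section name=""blah"">
    <item name=""asdf"">2222</item>
  </section>
  <section name=""other"">
    <item name=""zzzz"">9999</item>
  </section>
  <section name=""blah"">
    <item name=""qwer"">3333</item>
    <item name=""uiop"">4444</item>
  </section>
</root>");

var found = new List<string>();
foreach (XmlElement section in doc.SelectNodes("/root/section[@name='blah']"))
{
    // A relative path ("item") is evaluated against the current section only.
    foreach (XmlElement item in section.SelectNodes("item"))
    {
        found.Add(item.GetAttribute("name") + "=" + item.InnerText);
    }
}

Console.WriteLine(string.Join(", ", found)); // prints asdf=2222, qwer=3333, uiop=4444
```

The outer loop gives you a place to handle each section separately, which the single /root/section[@name='blah']/item query does not.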
