XmlNodeList Weird Behavior - C#

Take the following XML as example:
<root>
<lines>
<line>
<number>1</number>
</line>
<line>
<number>2</number>
</line>
</lines>
</root>
XmlNodeList nodeList = doc.SelectNodes("//lines/line");
foreach (XmlNode node in nodeList)
{
    // SelectSingleNode returns an XmlNode, so parse its text to get the number
    int index = int.Parse(node.SelectSingleNode("//number").InnerText);
}
The above code will result in index = 1 for both iterations.
foreach (XmlNode node in nodeList)
{
    // "number" is evaluated relative to the current line element
    int index = int.Parse(node.SelectSingleNode("number").InnerText);
}
The above code will result in 1 and 2 respectively. I know that // finds the first occurrence of the XPath, but I feel like the first occurrence should be relative to the node itself. The behavior appears to find the first occurrence from the root, even when selecting from a child node. Is this the way Microsoft intended this to work, or is this a bug?

Yeah, thanks, but just removing the slashes worked as well, as in my second example.
Removing the slashes only works because number is an immediate child element of line. If it were further down in the hierarchy:
<root>
<lines>
<line>
<other>
<number>1</number>
</other>
</line>
</lines>
</root>
you would still need to use .//number.
I just think it is confusing that when you are searching for a node within a node, // goes back to the whole document.
That's just how XPath syntax is designed. // at the beginning of an XPath expression means that the evaluation context is the document node - the outermost node of an XML document. .// means that the context of the path expression is the current context node.
If you think about it, it is actually useful to have a way to select from the whole document in any context.
Is this the way Microsoft intended this to work or is this a bug?
Microsoft is implementing the XPath standard, and yes, this is how the W3C intended an XPath library to work and it's not a bug.
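To make the difference concrete, here is a minimal sketch that evaluates the three expressions discussed above against each line element of the sample document. The class and method names (XPathContextDemo, NumbersFor) are made up for illustration; only XmlDocument, SelectNodes and SelectSingleNode come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathContextDemo
{
    // Same structure as the sample XML in the question.
    private const string Xml =
        "<root><lines>" +
        "<line><number>1</number></line>" +
        "<line><number>2</number></line>" +
        "</lines></root>";

    // Evaluates `xpath` with each <line> element as the context node
    // and collects the text of the node that was found.
    public static List<string> NumbersFor(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);

        var results = new List<string>();
        foreach (XmlNode line in doc.SelectNodes("//lines/line"))
        {
            results.Add(line.SelectSingleNode(xpath).InnerText);
        }
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", NumbersFor("//number")));  // 1,1 (searched from the document node)
        Console.WriteLine(string.Join(",", NumbersFor(".//number"))); // 1,2 (descendants of the current line)
        Console.WriteLine(string.Join(",", NumbersFor("number")));    // 1,2 (direct children of the current line)
    }
}
```

The leading dot is the only difference between the first two calls: it anchors the descendant search at the context node instead of the document node.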

Related

How to search for empty/void nodes/elements in XPath?

I have the following XML sample, for which I need an XPath query that returns only node1 and node3.
<root>
<node1 />
<node2 anyAttribute="anyText" />
<node3> </node3>
<node4>anyText</node4>
<node5>
<anyChildNode />
</node5>
</root>
In other words, an XPath query which returns all nodes that have (simultaneously):
no attributes
no child nodes
no or whitespace-only content
I've found some solutions (1 & 2), but each is only applicable to one of the points above at a time:
for 1. /root/node()[not(node())] - tested and works
for 2. /root/node()[not(@*)] - tested and works
for 3. /root/node()[string-length(normalize-space(text())) = 0] - not working (I don't know why)
Yes, I know I could use the three variants above together, but I would like to avoid that. I would think there should be an easy way to just search for empty nodes/elements.
I'm also limited to XPath 1.0 on .NET, since there is no progress on supporting newer versions.
This XPath,
/root/*[not(@* or * or text()[normalize-space()])]
will select only node1 and node3, as requested.
Explanation:
Select all element children of root (note the difference between * and node()) that have no attributes (@*), no element children (*), and no text children containing non-whitespace characters (text()[normalize-space()]).
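As a quick check, the expression can be run against the sample document with XmlDocument, which supports XPath 1.0. This is only a sketch: the class and method names are invented for the example, and PreserveWhitespace is switched on here (an assumption, not something the question specifies) so that the single space inside node3 is actually kept in the DOM.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class EmptyElementDemo
{
    private const string Xml =
        "<root>" +
        "<node1 />" +
        "<node2 anyAttribute=\"anyText\" />" +
        "<node3> </node3>" +
        "<node4>anyText</node4>" +
        "<node5><anyChildNode /></node5>" +
        "</root>";

    // Returns the names of the elements under <root> that have no attributes,
    // no element children, and no non-whitespace text.
    public static List<string> EmptyElementNames()
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = true; // keep the whitespace-only content of <node3>
        doc.LoadXml(Xml);

        var names = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("/root/*[not(@* or * or text()[normalize-space()])]"))
        {
            names.Add(node.Name);
        }
        return names;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", EmptyElementNames())); // node1,node3
    }
}
```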

Replace OuterXml OR generate indented string from InnerXml

I have a UI that uses the DataGridView to display the content of XML files.
If an XmlNode contains only InnerText, it's quite simple; however, I'm having a problem with nodes that contain child nodes (and not only a string).
Simple
<node>value</node>
Displayed as "value" in DataGridViewCell.
Complex
<node>
<foo>bar</foo>
<foo2>bar</foo2>
</node>
The problem is that the InnerXml string is not indented, and it's very hard to modify in the UI.
I've tried using XmlTextWriter to "beautify" the string. It works quite well, but it requires an XmlNode (the node itself, not only its child nodes), and I cannot assign the result back to InnerXml.
I would like to either see following in the UI:
<foo>bar</foo>
<foo2>bar</foo2>
(this can be assigned to InnerXml afterwards)
Or
<node>
<foo>bar</foo>
<foo2>bar</foo2>
</node>
(and find a way to replace OuterXml with this string).
Thanks for any ideas,
Martin
You can load the OuterXml into an XElement, then use String.Join() to join all child elements of the root node (in other words, the InnerXml), separated by line breaks. For example:
// Parse is a static method, so call it on the XElement type
XElement e = XElement.Parse(something.OuterXml);
var result = string.Join(
    Environment.NewLine,
    e.Elements().Select(o => o.ToString())
);
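A sketch of the full round trip may help: format the children for display, then assign the (possibly edited) string back to InnerXml. The helper name FormatInnerXml and the surrounding class are hypothetical. Note that Elements() only returns child elements, so any text sitting directly under the node next to its child elements would be dropped by this approach.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class InnerXmlFormatDemo
{
    // Returns the child elements of `node`, one per line.
    // XElement.ToString() indents nested content by default.
    public static string FormatInnerXml(XmlNode node)
    {
        XElement element = XElement.Parse(node.OuterXml);
        return string.Join(
            Environment.NewLine,
            element.Elements().Select(child => child.ToString()));
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<node><foo>bar</foo><foo2>bar</foo2></node>");
        XmlNode node = doc.DocumentElement;

        // This string is what would be shown in the grid cell.
        string editable = FormatInnerXml(node);
        Console.WriteLine(editable);

        // After editing in the UI, the string can be written back.
        node.InnerXml = editable;
        Console.WriteLine(node.SelectNodes("*").Count); // 2
    }
}
```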

XML querying particular node from C#

I have written a chunk of XML parsing which works successfully provided I use an absolute path.
I now need to take an XmlNode as an argument and run an XPath query against it.
Does anyone know how to do this?
I tried using relative XPath queries without any success!!
Should it be this hard??
It would help to see examples of XPath expressions that don't work as you think they should. Here are some possible causes (mistakes I frequently make).
Assume an XML document such as:
<A>
<B>
<C d='e'/>
</B>
<C/>
<D xmlns="http://foo"/>
</A>
forgetting to remove the top-level slash ('/') representing the document:
document.XPathSelectElements("/A") // selects a single A node
document.XPathSelectElements("//B") // selects a single B node
document.XPathSelectElements("//C") // selects two C nodes
but
aNode.XPathSelectElements("/B") // selects nothing (this looks for a root element named B)
aNode.XPathSelectElements("B") // selects a B node
bNode.XPathSelectElements("//C") // selects TWO C nodes - all C descendants of the root node
bNode.XPathSelectElements(".//C") // selects one C node - all C descendants of B
forgetting namespaces.
aNode.XPathSelectElements("D") // selects nothing (D is in a different namespace from A)
aNode.XPathSelectElements("*[local-name()='D' and namespace-uri()='http://foo']") // one D node
(This is often a problem when the root node carries a prefixless namespace - easy to miss)
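The cases above can be sketched as one small program. It uses LINQ to XML with the XPathSelectElements extension method from System.Xml.XPath; for the namespace case it binds a prefix with an XmlNamespaceManager instead of using local-name(). The prefix "f" and the class name are arbitrary choices made for this sketch.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class RelativeXPathDemo
{
    private const string Xml =
        "<A><B><C d='e'/></B><C/><D xmlns='http://foo'/></A>";

    // Returns how many elements each expression selects, in the order shown.
    public static int[] Counts()
    {
        XDocument document = XDocument.Parse(Xml);
        XElement aNode = document.Root;
        XElement bNode = aNode.Element("B");

        // Bind an arbitrary prefix to the namespace that D lives in.
        var ns = new XmlNamespaceManager(new NameTable());
        ns.AddNamespace("f", "http://foo");

        return new[]
        {
            aNode.XPathSelectElements("/B").Count(),      // 0: the root element is A, not B
            aNode.XPathSelectElements("B").Count(),       // 1: B is a child of A
            bNode.XPathSelectElements("//C").Count(),     // 2: every C in the document
            bNode.XPathSelectElements(".//C").Count(),    // 1: only the C below B
            aNode.XPathSelectElements("D").Count(),       // 0: D is in the http://foo namespace
            aNode.XPathSelectElements("f:D", ns).Count()  // 1: prefix resolves to http://foo
        };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Counts())); // 0,1,2,1,0,1
    }
}
```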

Calling the DescendantNodes without repeating each node

I have an XML document and would like to get all of its elements. I tried getting those elements with Descendants() or DescendantNodes(), but both of them returned repeated nodes.
For example, here is my xml:
<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<FirstElement xsi:type="myType">
<SecondElement>A</SecondElement>
</FirstElement>
</Root>
and when I use this snippet:
XElement Elements = XElement.Parse(XML);
IEnumerable<XElement> xElement = Elements.Descendants();
IEnumerable<XNode> xNodes = Elements.DescendantNodes();
foreach (XNode node in xNodes )
{
stringBuilder.Append(node);
}
It gives me two nodes, but <SecondElement> is repeated. I know Descendants() returns the children, and the children of each child, all the way down, but is there any way to avoid this?
Then, this is the content of my stringBuilder:
<FirstElement xsi:type="myType" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<SecondElement>A</SecondElement>
</FirstElement>
<SecondElement>A</SecondElement>
Well do you actually want all the descendants or just the top-level elements? If you only want the top level ones, then use the Elements() method - that returns all the elements directly under the current node.
The problem isn't that nodes are being repeated - it's that the higher-level nodes include the lower level nodes. So the higher-level node is being returned, then the lower-level one, and you're writing out the whole of both of those nodes, which means you're writing out the lower-level node twice.
If you just write out, say, the name of the node you're looking at, you won't see a problem. But you haven't said what you're really trying to do, so I don't know if that helps...
XmlDocument doc = new XmlDocument();
doc.LoadXml(XML);
XmlNodeList allElements = doc.SelectNodes("//*");
foreach(XmlElement element in allElements)
{
// your code here
}
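To illustrate the point that each element is returned only once, here is a small sketch that writes out just the name of every descendant instead of its full markup. The class and method names are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DescendantNamesDemo
{
    private const string Xml =
        "<Root xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" " +
        "xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">" +
        "<FirstElement xsi:type=\"myType\">" +
        "<SecondElement>A</SecondElement>" +
        "</FirstElement>" +
        "</Root>";

    // Returns the local name of every descendant element, in document order.
    public static List<string> ElementNames()
    {
        XElement root = XElement.Parse(Xml);
        return root.Descendants()
                   .Select(element => element.Name.LocalName)
                   .ToList();
    }

    public static void Main()
    {
        // Each element appears once; only its serialized markup contains its children.
        Console.WriteLine(string.Join(",", ElementNames())); // FirstElement,SecondElement
    }
}
```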

Using XPath in SelectSingleNode: Retrieving individual element from XML if it's present

My XML looks like :
<?xml version="1.0"?>
<itemSet>
<Item>one</Item>
<Item>two</Item>
<Item>three</Item>
.....maybe more Items here.
</itemSet>
Some of the individual Item may or may not be present. Say I want to retrieve the element <Item>two</Item> if it's present. I've tried the following XPaths (in C#).
XmlNode node = myXMLdoc.SelectSingleNode("/itemSet[Item='two']") --- If Item two is present, this returns only the first element, one. Maybe this query just points to the first element in itemSet if it has an Item of value two somewhere as a child. Is this interpretation correct?
So I tried:
XmlNode node = myXMLdoc.SelectSingleNode("/itemSet[Item='two']/Item[1]") --- I read this query as: return the first <Item> element within itemSet that has value = 'two'. Am I correct?
This still returns only the first element one. What am I doing wrong?
In both cases, I can traverse the child nodes using the siblings and get to two, but that's not what I am looking for. Also, if two is absent, SelectSingleNode returns null. So the very fact that I am getting a successful return node does indicate the presence of element two; had I wanted a boolean test to check for the presence of two, either of the above XPaths would suffice. But I actually need the full element <Item>two</Item> as my return node.
[My first question here, and my first time working with web programming, so I just learned the above XPaths and related xml stuff on the fly right now from past questions in SO. So be gentle, and let me know if I am a doofus or flouting any community rules. Thanks.]
I think you want:
myXMLdoc.SelectSingleNode("/itemSet/Item[text()='two']")
In other words, you want the Item which has text of two, not the itemSet containing it.
You can also use a single dot to indicate the context node, in your case:
myXMLdoc.SelectSingleNode("/itemSet/Item[.='two']")
EDIT: The difference between . and text() is that . means "this node" effectively, and text() means "all the text node children of this node". In both cases the comparison will be against the "string-value" of the LHS. For an element node, the string-value is "the concatenation of the string-values of all text node descendants of the element node in document order" and for a collection of text nodes, the comparison will check whether any text node is equal to the one you're testing against.
So it doesn't matter when the element content only has a single text node, but suppose we had:
<root>
<item name="first">x<foo/>y</item>
<item name="second">xy<foo/>ab</item>
</root>
Then an XPath expression of "root/item[.='xy']" will match the first item, but "root/item[text()='xy']" will match the second.
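The difference can be checked with a short sketch against that sample document. The class and method names here are hypothetical; the XPath expressions are the two from the answer, written as absolute paths.

```csharp
using System;
using System.Xml;

public static class StringValueDemo
{
    private const string Xml =
        "<root>" +
        "<item name=\"first\">x<foo/>y</item>" +
        "<item name=\"second\">xy<foo/>ab</item>" +
        "</root>";

    // Returns the name attribute of the first item matched by `xpath`,
    // or null when nothing matches.
    public static string MatchedName(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);

        XmlNode item = doc.SelectSingleNode(xpath);
        return item?.Attributes["name"].Value;
    }

    public static void Main()
    {
        // "." compares the whole string-value of the element: "x" + "y" = "xy".
        Console.WriteLine(MatchedName("/root/item[.='xy']"));      // first
        // text() compares each text node on its own: "xy" and "ab".
        Console.WriteLine(MatchedName("/root/item[text()='xy']")); // second
    }
}
```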
