How to get "real" ChildNodes of XmlNode, ignoring whitespace nodes?


Apparently the XmlNode.ChildNodes list (in C#/.NET) contains not only real child nodes but also special whitespace nodes. So even in the simplest case, with one tag nested inside another, you can get parentNode.ChildNodes.Count == 3. How can I get around this?
Already tried:
xmlDocument.PreserveWhitespace = false;
Also:
foreach (XmlNode node in xmlDocument.SelectNodes("//*"))
    if (node is XmlWhitespace)
        node.ParentNode.RemoveChild(node);

Text nodes are first-class children. I guess you want element nodes only. Can't you do
node.SelectNodes("*")
Or are you saying that <root><child/></root> results in root having three child nodes?

Why not just use the following? You won't be able to remove the node from its parent, because then you're modifying the collection while you're enumerating it, which isn't allowed.
foreach (XmlNode node in xmlDocument.SelectNodes("//*"))
{
    if (node is XmlWhitespace)
        continue;
    else
    {
        // A real node
    }
}

You can do something simple like this:
xmlDocument.SelectNodes("//*").OfType<XmlElement>();
This keeps only nodes of type XmlElement (meaning "real" nodes). It excludes CDATA, whitespace, text, etc.
Make sure to add the LINQ namespace:
using System.Linq;
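If only the direct children of one node matter, rather than every element in the document, the same OfType filter can be applied to ChildNodes. The following is a minimal sketch, not the original poster's code: the helper name ElementChildren and the sample XML are made up for illustration, and PreserveWhitespace is switched on only so that whitespace nodes are guaranteed to show up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class ChildNodeDemo
{
    // Hypothetical helper: returns only the element children of a node,
    // skipping whitespace, text, comment and CDATA nodes.
    public static List<XmlElement> ElementChildren(XmlNode parent)
    {
        return parent.ChildNodes.OfType<XmlElement>().ToList();
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = true; // keep whitespace nodes so the difference is visible
        doc.LoadXml("<root>\n  <child/>\n</root>");

        XmlNode root = doc.DocumentElement;
        Console.WriteLine(root.ChildNodes.Count);       // counts whitespace nodes as well
        Console.WriteLine(ElementChildren(root).Count); // counts element children only
    }
}
```

Because the filter builds a separate list, it is also safe to remove nodes from the parent while looping over the result.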

Related

How to find an XPath query to Element/Element without namespaces (XmlSerializer, fragment)?

Assume this simple XML fragment, which may or may not include the XML declaration. It has exactly one NodeElement as the root node, containing exactly one other NodeElement, which in turn may contain any number of elements of various kinds.
<?xml version="1.0"?>
<NodeElement xmlns="xyz">
    <NodeElement xmlns="">
        <SomeElement></SomeElement>
    </NodeElement>
</NodeElement>
How could I go about selecting the inner NodeElement and its contents without the namespace? For instance, "//*[local-name()='NodeElement/NodeElement[1]']" (and other variations I've tried) doesn't seem to yield results.
More generally, what I'm really trying to accomplish is to deserialize a fragment of a larger XML document contained in an XmlDocument. Something like the following:
var doc = new XmlDocument();
doc.LoadXml(File.ReadAllText(@"trickynodefile.xml")); // ReadAllText to avoid Unicode trouble.
var n = doc.SelectSingleNode("//*[local-name()='NodeElement/NodeElement[1]']");
using (var reader = XmlReader.Create(new StringReader(n.OuterXml)))
{
    var obj = new XmlSerializer(typeof(NodeElementNodeElement)).Deserialize(reader);
}
I believe I'm missing just the right XPath expression, which seems to be rather elusive. Any help much appreciated!

Try this:
/*/*
It selects children of the root node.
Or
/*/*[local-name() = 'NodeElement']
It selects children with local-name() = 'NodeElement' of the root node.
Anyway in your case both expressions select <NodeElement xmlns="">.
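To make the second expression concrete, here is a small sketch that runs it against the fragment from the question. The class and method names (InnerNodeDemo, SelectInner) are invented for this example, and the XML is inlined as a string instead of being read from a file.

```csharp
using System;
using System.Xml;

public static class InnerNodeDemo
{
    // Hypothetical helper: loads the XML and returns the child of the root
    // element whose local name is NodeElement, regardless of namespace.
    public static XmlNode SelectInner(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectSingleNode("/*/*[local-name() = 'NodeElement']");
    }

    public static void Main()
    {
        string xml =
            "<NodeElement xmlns=\"xyz\">" +
            "<NodeElement xmlns=\"\"><SomeElement></SomeElement></NodeElement>" +
            "</NodeElement>";

        XmlNode inner = SelectInner(xml);
        Console.WriteLine(inner.OuterXml); // the inner element with its contents
    }
}
```

The expression from the question fails because local-name() returns a single element name, so it can never equal a string that contains a path such as 'NodeElement/NodeElement[1]'.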

Walk the tree:
foreach (XmlNode node in doc.DocumentElement.ChildNodes[0].ChildNodes)
{
    // do something with node
}
This is hideously fragile, of course; you might want to check for nulls here and there.

XmlReader: is there a way of finding the first child, and next sibling?

I am porting code that uses an XmlDocument. XmlNode has a FirstChild
and NextSibling property. Using XmlReader, is there a way of parsing
until the first child, or the next sibling, given the xmlreader has parsed
to an arbitrary element in the xml?

With XmlReader, you are working under a different scheme. You are always at some current node, and you move forward to the next one. For the current node, you can check IsEmptyElement (which says whether the tag has the form <something attr="value"/>). If the element is empty, it clearly has no child items.
Consider the case when IsEmptyElement is false, so you have something like <something> <maybechild/> </something>. You can call ReadStartElement, which moves you to the next position. At that position you check IsStartElement. If it is true, you have a child and you are at it. If not, you are at </something> and there are no children.
Some more documentation: http://msdn.microsoft.com/en-us/library/t9bfea29.aspx.
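The steps above could be wrapped up roughly as follows. This is only a sketch: MoveToFirstChildElement and MoveToNextSiblingElement are hypothetical helper names, both assume the reader is currently positioned on an element start tag, and both stop at text content instead of skipping over it. The sibling helper uses XmlReader.Skip, which the answer does not mention.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReaderDemo
{
    // Hypothetical helper. Returns true and leaves the reader on the
    // first child element; returns false if there is none at that position.
    public static bool MoveToFirstChildElement(XmlReader reader)
    {
        if (reader.IsEmptyElement)
            return false;               // <something/> cannot have children
        reader.ReadStartElement();      // step past the start tag
        return reader.IsStartElement(); // skips whitespace, then tests for a start tag
    }

    // Hypothetical helper. Skips the current element with its whole subtree
    // and reports whether the reader landed on another start tag.
    public static bool MoveToNextSiblingElement(XmlReader reader)
    {
        reader.Skip();
        return reader.IsStartElement();
    }

    public static void Main()
    {
        string xml = "<something> <maybechild><deep/></maybechild> <other/> </something>";
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();     // position on <something>
            if (MoveToFirstChildElement(reader))
                Console.WriteLine(reader.Name); // first child
            if (MoveToNextSiblingElement(reader))
                Console.WriteLine(reader.Name); // next sibling
        }
    }
}
```

Since the reader is forward-only, there is no way to go back to a parent or previous sibling; anything needed from those nodes has to be captured before moving on.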

Build an XPathDocument and create an XPathNavigator using CreateNavigator. It has methods for (as the name implies) navigating XML. The equivalent methods are MoveToFirstChild and MoveToNext.
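A rough sketch of that approach, assuming the XML is available as a string; the helper name ChildNames and the sample document are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class NavigatorDemo
{
    // Hypothetical helper: lists the names of the root element's child elements.
    public static List<string> ChildNames(string xml)
    {
        var names = new List<string>();
        var document = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = document.CreateNavigator(); // starts at the document root

        nav.MoveToFirstChild();             // document root -> root element
        if (nav.MoveToFirstChild())         // plays the role of XmlNode.FirstChild
        {
            do
            {
                if (nav.NodeType == XPathNodeType.Element)
                    names.Add(nav.Name);
            } while (nav.MoveToNext());     // plays the role of XmlNode.NextSibling
        }
        return names;
    }

    public static void Main()
    {
        foreach (string name in ChildNames("<root><first/><second>text</second></root>"))
            Console.WriteLine(name);
    }
}
```

Unlike XmlReader, the navigator is a cursor over an in-memory document, so it can also move back up with MoveToParent.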

Get XmlNode Open Tag with Attributes

Is it possible to get the open tag from a XmlNode with all attributes, namespace, etc?
eg.
<root xmlns="urn:..." rattr="a">
<child attr="1">test</child>
</root>
I would like to retrieve the entire opening tag, exactly as retrieved from the original XML document if possible, from the XmlNode and later the closing tag. Both as strings.
Basically XmlNode.OuterXml without the child nodes.
EDIT
To elaborate, XmlNode.OuterXml on a node that was created with the XML above would return the entire XML fragment, including child nodes as a single string.
XmlNode.InnerXml on that same fragment would return the child nodes but not the parent node, again as a single string.
But I need the opening tag for the XML fragment without the children nodes. And without building it using the XmlAttribute array, LocalName, Namespace, etc.
This is C# 3.5
Thanks

Is there some reason you can't simply say:
string s = n.OuterXml.Substring(0, n.OuterXml.IndexOf(">") + 1);

I think the simplest way would be to call XmlNode.CloneNode(false) which (according to the docs) will clone all the attributes but not child nodes. You can then use OuterXml - although that will give you the closing tag as well.
For example:
using System;
using System.Xml;

public class Test
{
    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(@"<root xmlns='urn:xyz' rattr='a'>
<child attr='1'>test</child></root>");
        XmlElement root = doc.DocumentElement;
        // false = shallow clone: attributes are copied, child nodes are not
        XmlNode clone = root.CloneNode(false);
        Console.WriteLine(clone.OuterXml);
    }
}
Output:
<root xmlns="urn:xyz" rattr="a"></root>
Note that this may not be exactly as per the original XML document, in terms of the ordering of attributes etc. However, it will at least be equivalent.

How about:
xmlNode.OuterXml.Replace(xmlNode.InnerXml, String.Empty);
Poor man's solution :)

XML querying particular node from C#

I have written a chunk of XML parsing which works successfully provided I use an absolute path.
I now need to take an XMLNode as an argument and run an xpath against this.
Does anyone know how to do this?
I tried using relative XPath queries without any success!!
Should it be this hard??

It would help to see examples of XPath expressions that don't work as you think they should. Here are some possible causes (mistakes I frequently make).
Assume an XML document such as:
<A>
<B>
<C d='e'/>
</B>
<C/>
<D xmlns="http://foo"/>
</A>
Forgetting to remove the top-level slash ('/'), which represents the document root:
document.XPathSelectElements("/A") // selects a single A node
document.XPathSelectElements("//B") // selects a single B node
document.XPathSelectElements("//C") // selects two C nodes
but
aNode.XPathSelectElements("/B") // selects nothing (this looks for a root node named B)
aNode.XPathSelectElements("B") // selects a B node
bNode.XPathSelectElements("//C") // selects TWO C nodes: all C descendants of the root node
bNode.XPathSelectElements(".//C") // selects one C node: only the C descendants of B
Forgetting namespaces:
aNode.XPathSelectElements("D") // selects nothing (D is in a different namespace from A)
aNode.XPathSelectElements("*[local-name()='D' and namespace-uri()='http://foo']") // one D node
(This is often a problem when the root node carries a prefixless namespace, which is easy to miss.)
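Since the question is about XmlNode rather than LINQ to XML, here is a sketch of the same two pitfalls using XmlNode.SelectNodes. The class and helper names and the prefix "f" are arbitrary choices for this example; only the namespace URI has to match the document.

```csharp
using System;
using System.Xml;

public static class RelativeXPathDemo
{
    public const string Sample =
        "<A><B><C d='e'/></B><C/><D xmlns='http://foo'/></A>";

    // Hypothetical helper: runs an XPath expression relative to the given node.
    public static int Count(XmlNode context, string xpath)
    {
        return context.SelectNodes(xpath).Count;
    }

    // Hypothetical helper: finds D children by mapping a prefix to the namespace.
    public static int CountD(XmlNode context)
    {
        var ns = new XmlNamespaceManager(context.OwnerDocument.NameTable);
        ns.AddNamespace("f", "http://foo"); // "f" is an arbitrary prefix
        return context.SelectNodes("f:D", ns).Count;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Sample);
        XmlNode a = doc.DocumentElement;
        XmlNode b = a.SelectSingleNode("B");

        Console.WriteLine(Count(b, "//C"));  // searches the whole document
        Console.WriteLine(Count(b, ".//C")); // searches only below B
        Console.WriteLine(Count(a, "D"));    // no namespace, so no match
        Console.WriteLine(CountD(a));        // namespace-aware match
    }
}
```

Registering a prefix with XmlNamespaceManager is usually easier to read than spelling out local-name() and namespace-uri() in every step.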

Calling the DescendantNodes without repeating each node

I have an XML document and would like to get all of its elements. I tried getting them with Descendants() and DescendantNodes(), but both returned repeated nodes.
For example, here is my xml:
<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<FirstElement xsi:type="myType">
<SecondElement>A</SecondElement>
</FirstElement>
</Root>
and when I use this snippet:
XElement Elements = XElement.Parse(XML);
IEnumerable<XElement> xElement = Elements.Descendants();
IEnumerable<XNode> xNodes = Elements.DescendantNodes();
foreach (XNode node in xNodes)
{
    stringBuilder.Append(node);
}
It gives me two nodes, but <SecondElement> is repeated. I know Descendants visits the children, and the children of each child, all the way down, but is there a way to avoid the repetition?
This is the content of my stringBuilder:
<FirstElement xsi:type="myType" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<SecondElement>A</SecondElement>
</FirstElement>
<SecondElement>A</SecondElement>

Well do you actually want all the descendants or just the top-level elements? If you only want the top level ones, then use the Elements() method - that returns all the elements directly under the current node.
The problem isn't that nodes are being repeated - it's that the higher-level nodes include the lower level nodes. So the higher-level node is being returned, then the lower-level one, and you're writing out the whole of both of those nodes, which means you're writing out the lower-level node twice.
If you just write out, say, the name of the node you're looking at, you won't see a problem. But you haven't said what you're really trying to do, so I don't know if that helps...
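To illustrate the point about names, here is a small sketch that lists each descendant element once by name instead of appending its full markup. The helper name DescendantNames is made up for this example; the XML is the one from the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DescendantNamesDemo
{
    // Hypothetical helper: returns the local name of every descendant element,
    // in document order.
    public static string[] DescendantNames(string xml)
    {
        return XElement.Parse(xml)
                       .Descendants()
                       .Select(e => e.Name.LocalName)
                       .ToArray();
    }

    public static void Main()
    {
        string xml =
            "<Root xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" " +
            "xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">" +
            "<FirstElement xsi:type=\"myType\">" +
            "<SecondElement>A</SecondElement>" +
            "</FirstElement>" +
            "</Root>";

        foreach (string name in DescendantNames(xml))
            Console.WriteLine(name); // each element appears exactly once
    }
}
```

Each element is visited exactly once; the apparent duplication in the question comes from serializing a parent, which already includes its children's markup.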

XmlDocument doc = new XmlDocument();
doc.LoadXml(XML);
XmlNodeList allElements = doc.SelectNodes("//*");
foreach (XmlElement element in allElements)
{
    // your code here
}
