What is the difference between Node.SelectNodes(/*) and Node.childNodes? - c#

string XML1 = "<Root><InsertHere></InsertHere></Root>";
string XML2 = "<Root><child1><childnodes>data</childnodes><childnodes>data1</childnodes></child1><child2><childnodes>data</childnodes><childnodes>data1</childnodes></child2></Root>";
Of the two code samples below, the one that uses ChildNodes doesn't copy all the child nodes from XML2; only <child1> is copied.
string strXpath = "/Root/InsertHere";
XmlDocument xdxmlChildDoc = new XmlDocument();
XmlDocument ParentDoc = new XmlDocument();
ParentDoc.LoadXml(XML1);
xdxmlChildDoc.LoadXml(XML2);
XmlNode xnNewNode = ParentDoc.ImportNode(xdxmlChildDoc.DocumentElement.SelectSingleNode("/Root"), true);
if (xnNewNode != null)
{
XmlNodeList xnChildNodes = xnNewNode.SelectNodes("/*");
if (xnChildNodes != null)
{
foreach (XmlNode xnNode in xnChildNodes)
{
if (xnNode != null)
{
ParentDoc.DocumentElement.SelectSingleNode(strXpath).AppendChild(xnNode);
}
}
}
}
code2:
if (xnNewNode != null)
{
XmlNodeList xnChildNodes = xnNewNode.ChildNodes;
if (xnChildNodes != null)
{
foreach (XmlNode xnNode in xnChildNodes)
{
if (xnNode != null)
{
ParentDoc.DocumentElement.SelectSingleNode(strXpath).AppendChild(xnNode);
}
}
}
}
ParentDoc.OuterXml after executing the first code sample:
<Root>
<InsertHere>
<child1>
<childnodes>data</childnodes>
<childnodes>data1</childnodes>
</child1>
<child2>
<childnodes>data</childnodes>
<childnodes>data1</childnodes>
</child2>
</InsertHere>
</Root>
ParentDoc.OuterXml after executing the second code sample:
<Root>
<InsertHere>
<child1>
<childnodes>data</childnodes>
<childnodes>data1</childnodes>
</child1>
</InsertHere>
</Root>

I have done some debugging of the code, and it shows that xnNewNode.ChildNodes initially also returns 2 child nodes. After one iteration in the loop, the first child is however removed from ChildNodes, and therefore the loop ends prematurely.
If you want to use the ChildNodes property, one workaround is to "transfer" the child node references to an array or list, like this:
var xnChildNodes = xnNewNode.ChildNodes.Cast<XmlNode>().ToArray();
UPDATE
As Tomer W pointed out in his answer, when using XmlNode.AppendChild the inserted node is also removed from its original location. As stated in the MSDN documentation:
If the newChild is already in the tree, it is removed from
its original position and added to its target position.
With SelectNodes you have already created a new node collection, but with ChildNodes you are accessing the original collection.
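The behaviour described above can be reproduced in a small console program. This is only a sketch built from the XML strings in the question; the class name ChildNodesDemo and the method Copy are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class ChildNodesDemo
{
    const string Xml1 = "<Root><InsertHere></InsertHere></Root>";
    const string Xml2 = "<Root><child1><childnodes>data</childnodes><childnodes>data1</childnodes></child1>"
                      + "<child2><childnodes>data</childnodes><childnodes>data1</childnodes></child2></Root>";

    // Moves the children of the imported <Root> under <InsertHere> and
    // returns how many children ended up there.
    public static int Copy(bool snapshot)
    {
        var parentDoc = new XmlDocument();
        parentDoc.LoadXml(Xml1);
        var childDoc = new XmlDocument();
        childDoc.LoadXml(Xml2);

        XmlNode target = parentDoc.SelectSingleNode("/Root/InsertHere");
        XmlNode imported = parentDoc.ImportNode(childDoc.DocumentElement, true);

        // Without ToArray() this still enumerates the live ChildNodes list.
        IEnumerable<XmlNode> source = imported.ChildNodes.Cast<XmlNode>();
        if (snapshot)
        {
            source = source.ToArray(); // copy the references before moving anything
        }

        foreach (XmlNode child in source)
        {
            target.AppendChild(child); // removes child from imported.ChildNodes
        }
        return target.ChildNodes.Count;
    }

    public static void Main()
    {
        Console.WriteLine(Copy(snapshot: false)); // only <child1> is moved
        Console.WriteLine(Copy(snapshot: true));  // <child1> and <child2> are moved
    }
}
```

Another option, under the same assumptions, is to avoid foreach entirely and loop with while (imported.HasChildNodes) target.AppendChild(imported.FirstChild);.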

This clarifies what Anders G posted, with a more thorough explanation.
I am surprised the foreach does not fail (throw an exception) in this situation, but so be it.
In code 1:
1. A NEW COLLECTION of nodes is created.
2. The selected nodes are put into it.
3. Each node is appended to another node => it is removed from the original collection, but not from the newly created one.
4. The node you are appending is never removed from the new collection, so the loop visits every node.
In code 2:
1. You reference the ORIGINAL node collection
{child1, child2}
2. The first node is appended to another node => it is removed from the original collection
{child2}
3. When the foreach moves on to index 1, it sees that it has passed the end of the collection and exits.
This happens a lot when you change a collection that is being iterated.
Most of the time, though, the IEnumerator throws an exception when this happens.
I hope that makes it clear.

I had the same problem and observed that whitespace nodes seem to have a value attached to the node, which is not the case with other nodes (at least in my application). This method returns the node.ChildNodes list without the whitespace nodes:
private List<XmlNode> findChildnodes(XmlNode node)
{
List<XmlNode> result = new List<XmlNode>();
foreach (XmlNode childnode in node.ChildNodes)
{
if(childnode.Value == null)
{
result.Add(childnode);
}
}
return result;
}

In answer to your question: Node.ChildNodes is all of the child nodes, whereas Node.SelectNodes("/*") returns only the nodes that match /*. Only XML elements match *, so any attributes, CDATA nodes, text nodes, etc. will be excluded.
Nevertheless, the problem occurs because you are changing the collection of nodes while iterating over it. You cannot do that. The SelectNodes method returns a separate list of references to the nodes, which is why it works.
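To make the element-only filtering concrete, here is a small sketch that counts the children of one element both ways. It uses the relative expression * (element children of the context node) rather than the absolute /* from the question; the class name and the sample XML are invented for the example.

```csharp
using System;
using System.Xml;

public static class ChildNodesVersusXPath
{
    // A text node, an element, a comment and another element under <Root>.
    const string Xml = "<Root>text<a/><!-- note --><b/></Root>";

    static XmlNode LoadRoot()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc.DocumentElement;
    }

    // Every child node, whatever its type.
    public static int CountAllChildren()
    {
        return LoadRoot().ChildNodes.Count;
    }

    // Only the element children, because * matches elements only.
    public static int CountElementChildren()
    {
        return LoadRoot().SelectNodes("*").Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountAllChildren());     // 4
        Console.WriteLine(CountElementChildren()); // 2
    }
}
```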

Related

Prevent repeating root nodes or child nodes of root nodes using inputted directories

I was wondering how could I prevent the repetition of a root node or child nodes within that root node? As well as any child nodes of a child node etc. Keep in mind these nodes are based off inputted directories (I don't want to use Directory.GetDirectories or anything).
For example, if I inputted these three things:
cat\dog\monkey
cat\dog\tree
cat\dog\tree\monkey
Then the TreeView would look like this:
cat
dog
monkey
tree
monkey
I know this task may seem really easy, and for all I know it could be, but for some reason, I find it quite hard to do this. I also need to easily be able to associate the last node of each inputted directory with an object. So for example, "tree" would have data associated with it, etc. The process should be usable as many times as I like, in case I later decide to input more directories into the TreeView.
Thanks!
Edit: Figured it out.
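1. Split each input path into its segments.
2. Get the number of segments.
3. For each segment, check whether a node with that name already exists at that level.
4. Add the node if it is missing; otherwise skip it and descend into the existing one.
Example code looks like this. (This code may not be optimized.)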
string[] input = getinput(); // getinput() stands for however you collect the entered paths
TreeNode root = new TreeNode("Root");
this.treeview1.Nodes.Add(root);
foreach (string inputVal in input) {
    string[] sp = inputVal.Split('\\');
    TreeNode CurrentNode = root; // every path starts again at the root
    for (int i = 0; i < sp.Length; i++) {
        bool findSign = false;
        // ChildNodes is the ASP.NET TreeNode collection; the Windows Forms TreeNode exposes Nodes instead
        foreach (TreeNode TN in CurrentNode.ChildNodes) {
            // find out whether the current node contains a node named sp[i]
            if (TN.Text == sp[i]) {
                findSign = true;
                CurrentNode = TN; // set the current node to the node that was found
                break;
            }
        }
        if (findSign == false) {
            // if the name wasn't found, add a new node under the current node
            TreeNode newNode = new TreeNode(sp[i]);
            CurrentNode.ChildNodes.Add(newNode);
            CurrentNode = newNode; // add the node and set it as current
        }
    }
}
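The same add-or-reuse idea can be sketched without any UI control, which also shows one way to attach data to the last node of an input path. PathNode, AddPath and Tag are hypothetical names chosen for this example; the resulting structure would still have to be mapped onto whichever TreeView you use.

```csharp
using System;
using System.Collections.Generic;

public sealed class PathNode
{
    public string Name { get; }
    public object Tag { get; set; } // data associated with the last segment of an input path
    public Dictionary<string, PathNode> Children { get; } = new Dictionary<string, PathNode>();

    public PathNode(string name) { Name = name; }

    // Walks the segments of the path, reusing children that already exist
    // and creating the missing ones. Returns the node of the last segment.
    public PathNode AddPath(string path, object tag = null)
    {
        PathNode current = this;
        string[] segments = path.Split(new[] { '\\' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string segment in segments)
        {
            PathNode next;
            if (!current.Children.TryGetValue(segment, out next))
            {
                next = new PathNode(segment);
                current.Children.Add(segment, next);
            }
            current = next;
        }
        if (tag != null)
        {
            current.Tag = tag;
        }
        return current;
    }
}

public static class PathTreeDemo
{
    public static void Main()
    {
        var root = new PathNode("Root");
        root.AddPath(@"cat\dog\monkey");
        root.AddPath(@"cat\dog\tree", "data for tree");
        root.AddPath(@"cat\dog\tree\monkey");

        PathNode dog = root.Children["cat"].Children["dog"];
        Console.WriteLine(dog.Children.Count);        // monkey and tree, no duplicates
        Console.WriteLine(dog.Children["tree"].Tag);  // the object attached to "tree"
    }
}
```

Because AddPath only creates what is missing, it can be called again later with more paths without duplicating existing nodes.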

How can I remove duplicate, invalid, child nodes from an XML document using Linq to XML?

I'm creating XML from JSON retrieved from an HttpWebRequest call, using JsonConvert. The JSON I'm getting back sometimes has duplicate nodes, creating duplicate nodes in the XML after conversion, which I then have to remove.
The processing of the JSON to XML conversion is being done in a generic service call wrapper that has no knowledge of the underlying data structure and so can't do any XPath queries based on a named node. The duplicates could be at any level within the XML.
I've got to the stage where I have a list of the names of duplicate nodes at each level but am not sure of the Linq query to use this to remove all but the first node with that name.
My code:
protected virtual void RemoveDuplicateChildren(XmlNode node)
{
if (node.NodeType != XmlNodeType.Element || !node.HasChildNodes)
{
return;
}
var xNode = XElement.Load(node.CreateNavigator().ReadSubtree());
var duplicateNames = new List<string>();
foreach (XmlNode child in node.ChildNodes)
{
var isBottom = this.IsBottomElement(child); // Has no XmlNodeType.Element type children
if (!isBottom)
{
this.RemoveDuplicateChildren(child);
}
else
{
var count = xNode.Elements(child.Name).Count();
if (count > 1 && !duplicateNames.Contains(child.Name))
{
duplicateNames.Add(child.Name);
}
}
}
if (duplicateNames.Count > 0)
{
foreach (var duplicate in duplicateNames)
{
xNode.Elements(duplicate).SelectMany(d => d.Skip(1)).Remove();
}
}
}
The final line of code obviously isn't correct but I can't find an example of how to rework it to retrieve and then remove all but the first matching element.
UPDATE:
I have found two ways of doing this now, one using the XElement and one the XmlNode, but neither actually removes the nodes.
Method 1:-
foreach (var duplicate in duplicateNames)
{
xNode.Elements(duplicate).Skip(1).Remove();
}
Method 2:-
foreach (var duplicate in duplicateNames)
{
var nodeList = node.SelectNodes(duplicate);
if (nodeList.Count > 1)
{
for (int i=1; i<nodeList.Count; i++)
{
node.RemoveChild(nodeList[i]);
}
}
}
What am I missing?
If you don't want any duplicate names: (assuming no namespaces)
XElement root = XElement.Load(file); // .Parse(string)
List<string> names = root.Descendants().Select(x => x.Name.LocalName).Distinct().ToList();
names.ForEach(name => root.Descendants(name).Skip(1).Remove());
root.Save(file); // or root.ToString()
You might be trying to solve the problem at the wrong level. In XML it is perfectly valid to have multiple nodes with the same name. JSON structures with duplicate property names should be treated as invalid. You should try to do this sanitization at the JSON level, not after the data has already been transformed to XML.
For the XML cleanup this might be a starting point:
// Requires System.Linq; visits only the first child of each distinct node name
foreach (XmlNode child in node.ChildNodes.Cast<XmlNode>()
.GroupBy(n => n.Name)
.Select(g => g.First()))
{
.....
}
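One likely reason Method 1 in the question appears to do nothing is that XElement.Load builds a separate copy of the subtree, so removing elements from xNode leaves the original XmlNode untouched. The sketch below works on a single XElement tree throughout. Note that it differs from the question's code in one respect: it removes repeated names at every level rather than only among bottom elements. DuplicateRemover is a made-up name.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DuplicateRemover
{
    // Keeps the first child element of each name and removes the later ones,
    // then repeats the same step for every remaining child.
    public static void RemoveDuplicateChildren(XElement element)
    {
        var duplicates = element.Elements()
            .GroupBy(e => e.Name)
            .SelectMany(g => g.Skip(1))
            .ToList(); // materialize before removing anything

        foreach (XElement duplicate in duplicates)
        {
            duplicate.Remove();
        }

        foreach (XElement child in element.Elements())
        {
            RemoveDuplicateChildren(child);
        }
    }

    public static void Main()
    {
        XElement root = XElement.Parse("<r><a>1</a><a>2</a><b><c/><c/></b></r>");
        RemoveDuplicateChildren(root);
        Console.WriteLine(root);
    }
}
```

If the rest of the pipeline needs an XmlDocument, the cleaned XElement would have to be converted back afterwards, for example by loading root.ToString() into a new XmlDocument.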

Getting same value when I'm looking in different nodes HTMLAgility

I'm having a problem using Html Agility Pack.
I came across an img src / img alt that I have to take data from.
All is good when there is only one node I need to take my data from, but when there are more, it finds every node in the collection like it should, yet the data taken is always from the 1st node in the collection...
HtmlNodeCollection collection = doc.DocumentNode.SelectNodes("//div[@class='listHolder']//article[@class='brochure openBrochureAction']//div[@class='imgBrochure']");
foreach (HtmlNode node in collection)
{
//Tried these examples:
NomeFolheto = node.SelectSingleNode("//div[@class='imageRatioHorizontal']//img[@alt]").GetAttributeValue("alt", "none").Trim();
string testeNome = node.SelectSingleNode("//div[@class='imageRatioHorizontal']//img/@alt").Attributes["alt"].Value;
string testeimagem = node.SelectSingleNode("//div[@class='imageRatioHorizontal']//img/@src").Attributes["src"].Value;
imagem = node.SelectSingleNode("//div[@class='imageRatioHorizontal']//img[@src]").GetAttributeValue("src", "none").Trim();
}
Like I said, the collection finds all the nodes it should and gets the 1st value properly, but when it moves on to the other nodes, the values it gets are from the 1st node.
What am I doing wrong? I checked each node in the collection, and they have the same "alt" attribute and different "src" attributes, like they should, but I know from debugging that it's picking up the 1st node every time.
Thanks in advance
Your XPath expressions all start from the root of the document. Even when you have a reference to a single node, it is still a node within the entire tree, so an expression that begins with // searches the whole document.
You should use .// so that the expressions are evaluated relative to the current node:
HtmlNodeCollection collection = doc.DocumentNode.SelectNodes("//div[@class='listHolder']//article[@class='brochure openBrochureAction']//div[@class='imgBrochure']");
foreach (HtmlNode node in collection)
{
//Tried these examples:
NomeFolheto = node
.SelectSingleNode(".//div[@class='imageRatioHorizontal']//img[@alt]")
.GetAttributeValue("alt", "none").Trim();
string testeNome = node
.SelectSingleNode(".//div[@class='imageRatioHorizontal']//img/@alt")
.Attributes["alt"].Value;
string testeimagem = node
.SelectSingleNode(".//div[@class='imageRatioHorizontal']//img/@src")
.Attributes["src"].Value;
imagem = node
.SelectSingleNode(".//div[@class='imageRatioHorizontal']//img[@src]")
.GetAttributeValue("src", "none").Trim();
}
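The same rule holds for XPath in general, so it can be demonstrated with System.Xml alone, without Html Agility Pack. The following sketch uses an invented document and a made-up class name; it collects the alt attribute for each item, once with // and once with .//.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class RelativeXPathDemo
{
    const string Xml =
        "<list><item><img alt='first'/></item><item><img alt='second'/></item></list>";

    // Evaluates xpath once per <item> and collects the alt value of the match.
    public static List<string> Collect(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);

        var result = new List<string>();
        foreach (XmlNode item in doc.SelectNodes("//item"))
        {
            XmlNode img = item.SelectSingleNode(xpath);
            result.Add(img.Attributes["alt"].Value);
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Collect("//img")));  // first,first
        Console.WriteLine(string.Join(",", Collect(".//img"))); // first,second
    }
}
```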

how to programmatically add XmlNode to an XmlNodeList

I have an XmlNodeList of products whose values are put into a table. Now I want to add a new XmlNode to the list when a certain product is found, so that in the same loop the new product is treated the same as the items that are originally in the file. This way the structure of the function does not need to change; it just adds an extra node that is processed next.
But XmlNode is an abstract class and I can't figure out how to create the new node programmatically. Is this possible?
XmlNodeList list = productsXml.SelectNodes("/portfolio/products/product");
for (int i = 0; i < list.Count; i++)
{
XmlNode node = list[i];
if (node.Attributes["name"].InnerText.StartsWith("PB_"))
{
XmlNode newNode = ????
list.InsertAfter(???, node);
}
insertIntoTable(node);
}
XmlDocument is the factory for its nodes so you have to do this:
XmlNode newNode = document.CreateNode(XmlNodeType.Element, "product", "");
Or its shortcut:
XmlNode newNode = document.CreateElement("product");
Then to add newly created node to its parent:
node.ParentNode.AppendChild(newNode);
If the added node must be processed, you have to do it explicitly: the node list is a snapshot of the nodes that matched the search condition, so it won't be updated dynamically. Simply call insertIntoTable() for the added node:
insertIntoTable(node.ParentNode.AppendChild(newNode));
If your code is very different, you may need a little refactoring to make this process a two-step batch (first search for the nodes to add, then process them all). Of course you may even follow a completely different approach (for example, copying the nodes from the XmlNodeList to a List and then adding the new nodes to both lists).
Assuming this is enough (and you do not need refactoring), let's put everything together:
foreach (XmlNode node in productsXml.SelectNodes("/portfolio/products/product"))
{
if (node.Attributes["name"].InnerText.StartsWith("PB_"))
{
XmlNode newNode = document.CreateElement("product");
insertIntoTable(node.ParentNode.AppendChild(newNode));
}
// Move this before previous IF in case it must be processed
// before added node
insertIntoTable(node);
}
Refactoring
Refactoring time (and if you have a 200-line function you really need it, much more than what I present here). A first approach, even if not very efficient:
var list = productsXml
    .SelectNodes("/portfolio/products/product")
    .Cast<XmlNode>()
    .Where(x => x.Attributes["name"].InnerText.StartsWith("PB_"))
    .ToList(); // materialize the matches before the document is modified
foreach (var node in list)
    node.ParentNode.AppendChild(document.CreateElement("product"));
foreach (XmlNode node in productsXml.SelectNodes("/portfolio/products/product"))
    insertIntoTable(node); // Or your real code
If you do not like the two-pass approach you may use ToList() like this:
var list = productsXml
    .SelectNodes("/portfolio/products/product")
    .Cast<XmlNode>()
    .ToList();
for (int i = 0; i < list.Count; ++i)
{
    var node = list[i];
    // Attributes["name"] is null for the freshly created elements, hence the ?. check
    if (node.Attributes["name"]?.InnerText.StartsWith("PB_") == true)
        list.Add(node.ParentNode.AppendChild(document.CreateElement("product")));
    insertIntoTable(node);
}
Please note that in the second example the use of for instead of foreach is mandatory because you change the collection within the loop. Note that you may even keep your original XmlNodeList object in place...
You can't create an XmlNode directly, but only the subtypes (e.g. elements) by using one of the Create* methods of the parent XmlDocument. E.g. if you want to create a new element:
XmlElement el = doc.CreateElement("elementName");
node.ParentNode.InsertAfter(el, node);
Please note that this will not add the element to the XmlNodeList list so the node won't be processed. All necessary processing of the node should therefore be done right when adding the node.
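Putting the pieces together, a minimal sketch could look like the following. It copies the XmlNodeList into a List first, creates the extra element through the owning XmlDocument, and processes it right after inserting it. The class name ProductExpander, the "_extra" suffix and the returned list (which stands in for insertIntoTable) are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class ProductExpander
{
    // Returns the product names in the order in which they were processed.
    public static List<string> Process(XmlDocument doc)
    {
        var processed = new List<string>();

        // Copy the matches first, so that inserting siblings cannot disturb the loop.
        List<XmlElement> products = doc
            .SelectNodes("/portfolio/products/product")
            .Cast<XmlElement>()
            .ToList();

        foreach (XmlElement product in products)
        {
            string name = product.GetAttribute("name");
            processed.Add(name); // stands in for insertIntoTable(node)

            if (name.StartsWith("PB_", StringComparison.Ordinal))
            {
                // The document is the factory for its own nodes.
                XmlElement extra = doc.CreateElement("product");
                extra.SetAttribute("name", name + "_extra"); // hypothetical naming
                product.ParentNode.InsertAfter(extra, product);
                processed.Add(extra.GetAttribute("name"));   // process the new node right away
            }
        }
        return processed;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<portfolio><products><product name='A'/><product name='PB_1'/><product name='B'/></products></portfolio>");
        Console.WriteLine(string.Join(", ", Process(doc))); // A, PB_1, PB_1_extra, B
    }
}
```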

How to get all XML nodes with the same name without knowing their level?

I have a XML Example:
<Fruits>
<Red_fruits>
<Red_fruits></Red_fruits>
</Red_fruits>
<Yellow_fruits>
<banana></banana>
</Yellow_fruits>
<Red_fruits>
<Red_fruits></Red_fruits>
</Red_fruits>
</Fruits>
I have 4 Red_fruits tags; 2 of them share the same ParentNode (Fruits). I want to get those that have the same ParentNode.
But I only want those that have the same name (Red_fruits), which means the Yellow_fruits tag isn't included.
This is how I am doing it right now in C#:
XmlDocument doc = new XmlDocument();
string selectedTag = cmbX.Text;
if (File.Exists(txtFile.Text))
{
try
{
//Load
doc.Load(cmbFile.Text);
//Select Nodes
XmlNodeList selectedNodeList = doc.SelectNodes(".//" + selectedTag);
}
catch
{
MessageBox.Show("Some error message here");
}
}
This returns all the Red_fruits elements, not just the ones that belong to Fruits.
I can't use XmlNodeList = doc.SelectNodes("/Fruits/Red_fruits") because I want to use this code to read arbitrary XML files, so I don't know the exact name that a specific node will have. I just need to put all nodes with the same name and the same level into an XmlNodeList using C#.
Is there a way to achieve this without using LINQ? How can I do that?
An understanding of how the single slash / and the double slash // are used can help here.
Let's see how / and // work in relation to the root node. When / is used at the beginning of a path:
/a
it will define an absolute path to node a relative to the root. As such, in this case, it will only find a nodes at the root of the XML tree.
When // is used at the beginning of a path:
//a
it will define a path to node a anywhere within the XML document. As such, in this case, it will find a nodes located at any depth within the XML tree.
These XPath expressions can also be used in the middle of an XPath value to define ancestor-descendant relationships. When / is used in the middle of a path:
/a/b
it will define a path to node b that is an immediate direct descendant (ie. a child) of node a.
When // used in the middle of a path:
/a//b
it will define a path to node b that is ANY descendant of node a.
Coming back to your question:
// GetElementsByTagName() returns all elements named Red_fruits, at any depth
XmlDocument doc = new XmlDocument();
XmlNodeList allNodes = doc.GetElementsByTagName("Red_fruits");
// Using the SelectNodes() method
XmlNodeList childNodes = doc.SelectNodes("//Fruits/Red_fruits");
// This selects all <Red_fruits> elements that are direct children of a <Fruits> element.
In case <Fruits> is the root element, use the XPath /Fruits/Red_fruits [a single slash /]. Note that element names in XPath are case-sensitive.
If you're simply trying to find the "next" or "previous" occurrence of a single node, you can do the following and then compare the names:
XmlNode current = doc.SelectSingleNode("Fruits").SelectSingleNode("Red_fruits");
XmlNode previous = current.PreviousSibling;
XmlNode next = current.NextSibling;
and you can iterate until you find the proper sibling (next is null if there is none):
while (next != null && next.Name != current.Name)
{
next = next.NextSibling;
}
or you can even get your list through the ParentNode property:
XmlNodeList list = current.ParentNode.SelectNodes(current.Name);
Worst case scenario, you can cycle through the XMLNode items in selectedNodeList and check the ParentNode properties. If necessary you could go recursive on the ParentNode check and count the number of times it takes to get to the root node. This would give you the depth of a node. Or you could compare the ParentNode at each level to see if it is the parent you are interested in, if that parent is not the root.
public void Test()
{
    XmlDocument doc = new XmlDocument();
    string selectedTag = cmbX.Text;
    if (File.Exists(txtFile.Text))
    {
        try
        {
            // Load
            doc.Load(cmbFile.Text);
            // Select nodes
            XmlNodeList selectedNodeList = doc.SelectNodes(".//" + selectedTag);
            List<XmlNode> result = new List<XmlNode>();
            foreach (XmlNode node in selectedNodeList)
            {
                // Depth 2 = direct child of the root element
                // (its parent chain is: root element, then the document itself)
                if (depth(node) == 2)
                {
                    result.Add(node);
                }
            }
            // result now has all the selected tags of depth 2
        }
        catch
        {
            MessageBox.Show("Some error message here");
        }
    }
}
private int depth(XmlNode node)
{
    int depth = 0;
    XmlNode parent = node.ParentNode;
    while (parent != null)
    {
        parent = parent.ParentNode; // step up from the current parent, not from node again
        depth++;
    }
    return depth;
}
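If the level of interest is always "direct children of the root element", the wildcard step * can take the place of the unknown root name, so no depth calculation is needed. The sketch below runs against the sample XML from the question; SameLevelDemo and Count are names invented for the example.

```csharp
using System;
using System.Xml;

public static class SameLevelDemo
{
    const string Xml =
        "<Fruits>" +
        "<Red_fruits><Red_fruits/></Red_fruits>" +
        "<Yellow_fruits><banana/></Yellow_fruits>" +
        "<Red_fruits><Red_fruits/></Red_fruits>" +
        "</Fruits>";

    // Loads the sample document and counts the nodes matched by xpath.
    public static int Count(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        string selectedTag = "Red_fruits";
        Console.WriteLine(Count("//" + selectedTag));  // 4: matches at every depth
        Console.WriteLine(Count("/*/" + selectedTag)); // 2: children of the root element only
    }
}
```

For other levels the same pattern extends with more wildcard steps, for example /*/*/Red_fruits for grandchildren of the root element.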
