Confusion regarding BFS or DFS recursion in a treeview

I'm doing some processing in a treeview. I use neither a stack nor a queue to process all the nodes; I just do this:
void somemethod(TreeNode root){
    foreach(TreeNode item in root.Nodes)
    {
        //doSomething on item
        somemethod(item);
    }
}
I'm a little blocked right now (can't think clearly) and I can't see what kind of tree processing I'm doing. Is this BFS, DFS, or neither of them?
My guess was DFS, but I wasn't sure. The CLR doesn't do anything weird, like processing two siblings before descending in order to take advantage of multiprocessing? That weird thought came to mind and clouded my judgment.

You are doing a DFS (depth-first search/traversal) right now, using recursion.
It's depth first because recursion works the same way a stack would: you process the children of the current node before you process the next sibling, so you go for depth first instead of breadth.
Edit:
In response to your comment / updated question: your code will be processed sequentially item by item, there will be no parallel processing, no "magic" involved. The traversal using recursion is equivalent to using a stack (LIFO = last in, first out) - it is just implicit. So your method could also have been written like the following, which produces the same order of traversal:
public void SomeMethod(TreeNode root)
{
    Stack<TreeNode> nodeStack = new Stack<TreeNode>();
    nodeStack.Push(root);
    while (nodeStack.Count > 0)
    {
        TreeNode node = nodeStack.Pop();
        // do something on node
        // Push children in reverse order, so the first child is pushed last and popped first.
        // TreeNodeCollection is non-generic, so Cast<TreeNode>() (System.Linq) is needed before Reverse().
        foreach (TreeNode item in node.Nodes.Cast<TreeNode>().Reverse())
            nodeStack.Push(item);
    }
}
I hope this makes it clearer what is going on - it might be useful for you to write out the nodes to the console as they are being processed or actually walk through step by step with a debugger.
(Also, both the recursive method and the one using a stack assume there is no cycle and don't test for one, so the assumption is that this is a tree and not an arbitrary graph. For the latter, DFS introduces a visited flag to mark nodes already seen.)
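To make the remark about cycles concrete, here is a minimal sketch of the same stack-based traversal with a visited set added. The GraphNode type and the GraphDfs.Traverse method are hypothetical names invented for this illustration; they are not the WinForms TreeNode API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type for illustration only (not System.Windows.Forms.TreeNode).
public class GraphNode
{
    public string Name;
    public List<GraphNode> Neighbors = new List<GraphNode>();
    public GraphNode(string name) { Name = name; }
}

public static class GraphDfs
{
    // Returns node names in depth-first order, visiting each node at most once.
    public static List<string> Traverse(GraphNode start)
    {
        var order = new List<string>();
        var visited = new HashSet<GraphNode>();
        var stack = new Stack<GraphNode>();
        stack.Push(start);
        while (stack.Count > 0)
        {
            GraphNode node = stack.Pop();
            if (!visited.Add(node))
                continue; // already seen, e.g. reached again through a cycle
            order.Add(node.Name);
            // Push neighbors in reverse so the first neighbor is popped first.
            for (int i = node.Neighbors.Count - 1; i >= 0; i--)
                stack.Push(node.Neighbors[i]);
        }
        return order;
    }
}
```

Without the visited set, a graph in which b points back to a would keep the loop running forever; with it, each node is processed once.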

I'm pretty sure your example corresponds to "depth-first search", because the nodes on which you "do something" increase in depth before breadth.

Related

Going back one iteration in foreach loop in C#

I have a foreach loop that iterates over different levels of a TreeView in C#. A simplified version looks like this:
foreach (TreeNode childNode in currentNode.Nodes)
{
    if (!someCondition)
    {
        childNode.Remove();
    }
}
The problem is that when a node is removed (for example, from a list of node1, node2, node3 and node4), the list becomes shorter and the foreach loop skips one iteration. Say node2 is removed: the list becomes node1, node3 and node4, and the next node that the foreach loop considers will be node4 instead of node3. This is because the framework stores these nodes in an array/list, so I would like to know whether it is possible to make a foreach loop go back one iteration when I remove a node from the tree.
I'm pretty new to the .NET Framework, so your help is really appreciated.

The desired result can perhaps be achieved using LINQ by setting
currentNode.Nodes = currentNode.Nodes.Where( n => SomeCondition( n ) ).ToList();
or something similar, so that no explicit iteration is necessary. Note that this exact line will not compile against the Windows Forms TreeNode, whose Nodes property is read-only and non-generic, so treat it as pseudocode. A less elegant solution is an explicit for loop running backwards, so that the loop index cannot become invalid. However, I would consider this bad practice when a more structural approach is available.

You can use a for loop here:
// Pay attention to the reversed order:
// each childNode.Remove() changes currentNode.Nodes.Count
for (int i = currentNode.Nodes.Count - 1; i >= 0; --i) {
    TreeNode childNode = currentNode.Nodes[i];
    if (!someCondition) {
        childNode.Remove();
    }
}

No, this is not possible, because the iterations of a foreach loop aren't "indexed" in a strict sense.
A for loop is, however, indexed because you provide it with a counting mechanism yourself. There you can change your counter.

Usually it's not a great idea to modify the collection that you are iterating through within a foreach. You should consider using a for loop instead and manually keep track of the current index.
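The two safe patterns mentioned in these answers can be sketched on a plain List<int>, which keeps the example independent of Windows Forms. The class and method names below are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RemoveWhileIterating
{
    // Pattern 1: enumerate a snapshot (ToList), so removals cannot disturb the loop.
    public static void RemoveViaSnapshot(List<int> items, Func<int, bool> shouldRemove)
    {
        foreach (int item in items.Where(shouldRemove).ToList())
            items.Remove(item);
    }

    // Pattern 2: walk backwards, so removing index i never shifts
    // an index that is still waiting to be visited.
    public static void RemoveBackwards(List<int> items, Func<int, bool> shouldRemove)
    {
        for (int i = items.Count - 1; i >= 0; i--)
        {
            if (shouldRemove(items[i]))
                items.RemoveAt(i);
        }
    }
}
```

Applied to a TreeNode, the same ideas would presumably translate to iterating over currentNode.Nodes.Cast<TreeNode>().ToList() or over the indices in descending order.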

Mysterious failure removing nodes from an XML document

I'd be surprised if anyone can explain this, but it'd be interesting to know if others can reproduce the weirdness I'm experiencing...
We've got a thing based on InfoPath that processes a lot of forms. Form data should conform to an XSD, but InfoPath keeps adding its own metadata in the form of so-called "my-fields". We would like to remove the my-fields, and I wrote this simple method:
string StripMyFields(string xml)
{
var doc = new XmlDocument();
doc.LoadXml(xml);
var matches = doc.SelectNodes("//node()").Cast<XmlNode>().Where(n => n.NamespaceURI.StartsWith("http://schemas.microsoft.com/office/infopath/"));
Dbug("Found {0} nodes to remove.", matches.Count());
foreach (var m in matches)
m.ParentNode.RemoveChild(m);
return doc.OuterXml;
}
Now comes the really weird stuff! When I run this code it behaves as I expect it to, removing any nodes that are in InfoPath namespaces. However, if I comment out the call to Dbug, the code completes, but one "my-field" remains in the XML.
I've even commented out the content of the convenient Dbug method, and it still behaves this same way:
void Dbug(string s, params object[] args)
{
//if (args.Length > 0)
// s = string.Format(s, args);
//Debug.WriteLine(s);
}
Input XML:
<?xml version="1.0" encoding="UTF-8"?>
<skjema xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2008-03-03T22:25:25" xml:lang="en-us">
<Field-1643 orid="1643">data.</Field-1643>
<my:myFields>
<my:field1>Al</my:field1>
<my:group1>
<my:group2>
<my:field2 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">2009-01-01</my:field2>
<Field-1611 orid="1611">More data.</Field-1611>
<my:field3>true</my:field3>
</my:group2>
<my:group2>
<my:field2>2009-01-31</my:field2>
<my:field3>false</my:field3>
</my:group2>
</my:group1>
</my:myFields>
<Field-1612 orid="1612">Even more data.</Field-1612>
<my:field3>Blah blah</my:field3>
</skjema>
The "my:field3" element (at the bottom, text "Blah blah") is not removed unless I invoke Dbug.
Clearly the universe is not supposed to be like this, but I would be interested to know if others are able to reproduce.
I'm using VS2012 Premium (11.0.50727.1 RTMREL) and FW 4.5.50709 on Win8 Enterprise 6.2.9200.

First things first. LINQ uses a concept known as deferred execution. This means no results are fetched until you actually materialize the query (for example, by enumerating it).
Why would it matter for your node removal issue? Let's see what happens in your code:
SelectNodes creates an XPathNodeIterator, which is used by an XPathNavigator, which feeds data to the XmlNodeList returned by SelectNodes
The XPathNodeIterator walks the XML document tree based on the XPath expression provided
Cast and Where simply decide whether a node returned by the XPathNodeIterator should participate in the final result
We arrive right before the Dbug method call. For a moment, assume it's not there. At this point, nothing has actually happened yet. We only have an unmaterialized LINQ query.
Things change when we start iterating. All the iterators (Cast and Where have their own iterators too) start rolling. WhereIterator asks CastIterator for an item, which then asks XPathNodeIterator, which finally returns the first node (Field-1643). Unfortunately, this one fails the Where test, so we ask for the next one. We have more luck with my:myFields: it is a match, so we remove it.
We quickly proceed to my:field1 (again, WhereIterator → CastIterator → XPathNodeIterator), which is also removed. Stop here for a moment. Removing my:field1 detaches it from its parent, which results in its (my:field1's) sibling references being set to null (there are no longer any nodes before or after the removed node).
What's the current state of things? XPathNodeIterator knows its current element is the my:field1 node, which just got removed. Removed as in detached from its parent, but the iterator still holds a reference. Sounds great, let's ask it for the next node. What does XPathNodeIterator do? It checks its Current item and asks for NextSibling (since it has no children to walk first), which is null, given we just performed the detachment. And this means the iteration is over. Job done.
As a result, by altering the collection structure during iteration, you only removed two nodes from your document (in reality only one, as the second removed node was a child of the one already removed).
Same behavior can be observed with much simpler XML:
<Root>
<James>Bond</James>
<Jason>Bourne</Jason>
<Jimmy>Keen</Jimmy>
<Tom />
<Bob />
</Root>
Suppose we want to get rid of nodes starting with J, resulting in a document containing only honest men's names:
var doc = new XmlDocument();
doc.LoadXml(xml);
var matches = doc
.SelectNodes("//node()")
.Cast<XmlNode>()
.Where(n => n.Name.StartsWith("J"));
foreach (var node in matches)
{
node.ParentNode.RemoveChild(node);
}
Console.WriteLine(doc.InnerXml);
Unfortunately, Jason and Jimmy remain. James' next sibling (the one to be returned by the iterator) was originally meant to be Jason, but as soon as we detached James from the tree there are no siblings and the iteration ends.
Now, why does it work with Dbug? The Count call materializes the query. The iterators have run, so we have access to all the nodes we need when we start looping. The same thing happens with ToList called right after Where, or if you inspect the results while debugging (VS even notifies you that inspecting the results will enumerate the collection).
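As a sketch of the fix described above, the simple example can be rewritten so that ToList materializes the query before any node is detached. The class and method names here are invented for the illustration; the XmlDocument and LINQ calls are the same ones used in the snippets above.

```csharp
using System;
using System.Linq;
using System.Xml;

public static class RemoveNodesDemo
{
    // Removes every node whose name starts with "J" and returns the remaining XML.
    public static string RemoveJNames(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        // ToList() runs the whole query now, while the tree is still intact.
        var matches = doc
            .SelectNodes("//node()")
            .Cast<XmlNode>()
            .Where(n => n.Name.StartsWith("J", StringComparison.Ordinal))
            .ToList();
        // The loop now walks a plain list, so detaching nodes no longer
        // affects the underlying XPath iterator.
        foreach (XmlNode node in matches)
            node.ParentNode.RemoveChild(node);
        return doc.InnerXml;
    }
}
```

With the Root document shown earlier, James, Jason and Jimmy should all be gone from the result, leaving only Tom and Bob.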

I think this is down to the Schrödinger's cat problem that Where will not actually compute the results of the query until you view or act upon them. Meaning, until you call Count() (or any other function that fetches the results) or view the query in the debugger, the results don't exist. As a test, try putting it like this:
if (matches.Any())
foreach (var m in matches)
m.ParentNode.RemoveChild(m);

Very strange: it's only when you actually view the results while debugging that it removes the last node. Incidentally, converting the result to a List and then looping through it also works.
List<XmlNode> matches = doc.SelectNodes("//node()").Cast<XmlNode>().Where(n => n.NamespaceURI.StartsWith("http://schemas.microsoft.com/office/infopath/")).ToList();
foreach (var m in matches)
{
m.ParentNode.RemoveChild(m);
}

jimmy_keen's solution worked for me. I had just a simple
//d is an XmlDocument
XmlNodeList t = d.SelectNodes(xpath);
foreach (XmlNode x in t)
{
x.ParentNode.RemoveChild(x);
}
d.Save(outputpath);
This would remove only 3 nodes, while stepping through in debug mode would remove 1000+ nodes.
Just adding a Count before the foreach solved the problem:
var count = t.Count;

What are some alternatives to recursive search algorithms?

I am looking at alternatives to a deep search algorithm that I've been working on. My code is a bit too long to post here, but I've written a simplified version that captures the important aspects. First, I've created an object that I'll call 'BranchNode' that holds a few values as well as an array of other 'BranchNode' objects.
class BranchNode : IComparable<BranchNode>
{
public BranchNode(int depth, int parentValue, Random rnd)
{
_nodeDelta = rnd.Next(-100, 100);
_depth = depth + 1;
leafValue = parentValue + _nodeDelta;
if (depth < 10)
{
int children = rnd.Next(1, 10);
branchNodes = new BranchNode[children];
for (int i = 0; i < children; i++)
{
branchNodes[i] = new BranchNode(_depth, leafValue, rnd);
}
}
}
public int CompareTo(BranchNode other)
{
return other.leafValue.CompareTo(this.leafValue);
}
private int _nodeDelta;
public BranchNode[] branchNodes;
private int _depth;
public int leafValue;
}
In my actual program, I'm getting my data from elsewhere... but for this example, I'm just passing an instance of a Random object down the line that I'm using to generate values for each BranchNode... I'm also manually creating a depth of 10, whereas my actual data will have any number of generations.
As a quick explanation of my goals, _nodeDelta contains a value that is assigned to each BranchNode. Each instance also maintains a leafValue that is equal to the current BranchNode's _nodeDelta summed with the _nodeDeltas of all of its ancestors. I am trying to find the largest leafValue of a BranchNode with no children.
Currently, I am recursively traversing the hierarchy searching for BranchNodes whose child BranchNodes array is null (a.k.a. a 'childless' BranchNode), then comparing its leafValue to the current highest leafValue. If it's larger, it becomes the benchmark, and the search continues until it has looked at all BranchNodes.
I can post my recursive search algorithm if it'd help, but it's pretty standard and is working fine. My issue is, as expected, that for larger hierarchies my algorithm takes a long while to traverse the entire structure.
I was wondering if I had any other options that I could look into that may yield faster results... Specifically, I've been trying to wrap my head around LINQ, but I'm not even sure that it is built to do what I'm looking for, or whether it'd be any faster. Are there other things that I should be looking into as well?

Maybe you want to look into an alternative data index structure: Here
It always depends on the work you are doing with the data, but if you assign each element a unique ID that encodes its position in the hierarchy, and create an index of what you store, that optimization will make much more sense than micro-optimizing parts of what you do.
This also lends itself to a very different paradigm in search algorithms, one that uses no recursion, at the cost of additional memory for the IDs and possibly the index.

If you must visit all leaf nodes, you cannot speed up the search: it is going to go through all nodes no matter what. A typical trick played to speed up a search on trees is organizing them in some special way that simplifies the search of the tree. For example, by building a binary search tree, you make your search O(Log(N)). You could also store some helpful values in the non-leaf nodes from which you could later construct the answer to your search query.
For example, you could decide to store a _bestLeaf "pointing" to the leaf with the highest leafValue of all leaves under the current subtree. If you do that, your search would become an O(1) lookup. Your inserts and removals would become more expensive, however, because you would need to update up to log_b(N) items on the way back to the root with the new _bestLeaf (b is the branching factor of your tree).
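A minimal sketch of that caching idea follows, under the assumption that nodes are only ever added as leaves. CachedNode, BestLeafValue and AddLeaf are hypothetical names for this illustration, and the cache stores the best value itself rather than a reference to the leaf.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type: BestLeafValue caches the largest leaf value in this subtree.
public class CachedNode
{
    public int LeafValue;
    public int BestLeafValue;
    public CachedNode Parent;
    public List<CachedNode> Children = new List<CachedNode>();

    public CachedNode(int leafValue)
    {
        LeafValue = leafValue;
        BestLeafValue = leafValue; // a childless node is its own best leaf
    }

    // Adds a child leaf and refreshes the cached maximum on the path back to the root.
    public CachedNode AddLeaf(int delta)
    {
        var leaf = new CachedNode(LeafValue + delta) { Parent = this };
        Children.Add(leaf);
        for (CachedNode n = this; n != null; n = n.Parent)
        {
            int best = int.MinValue;
            foreach (CachedNode child in n.Children)
                best = Math.Max(best, child.BestLeafValue);
            if (best == n.BestLeafValue)
                break; // nothing changed here, so no ancestor can change either
            n.BestLeafValue = best;
        }
        return leaf;
    }
}
```

After the tree is built, root.BestLeafValue answers the query without any traversal; the price is the extra work done in AddLeaf on every insertion.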

I think the first thing you should consider is moving away from the N-ary tree to a binary search tree.
This means that all nodes have only 2 children, a greater child, and a lesser child.
From there, I would say look into balancing your search tree with something like a Red-Black tree or AVL. That way, searching your tree is O(log n).
Here are some links to get you started:
http://en.wikipedia.org/wiki/Binary_search_tree
http://en.wikipedia.org/wiki/AVL_tree
http://en.wikipedia.org/wiki/Red-black_tree
Now, if you are dead set on having each node able to have N child nodes, here are some things you should think about:
Think about ordering your child nodes so that you can quickly determine which has the highest leaf value. That way, when you enter a new node, you can check one child node and quickly determine whether it is worth recursively checking its children.
Think about ways that you can quickly eliminate as many nodes as you possibly can from the search, or break out of the recursive calls as early as you can. With the binary search tree, you can easily find the largest leaf node by always looking only at the greater child. This could eliminate N-log(n) children if the tree is balanced.
Think about inserting and deleting nodes. If you spend more time here, you could save a lot more time later.

As others mention, a different data structure might be what you want.
If you need to keep the data structure the same, the recursion can be unwound into loops. While this approach will probably be a little bit faster, it's not going to be orders of magnitude faster, but might take up less memory.
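For reference, unwinding the recursion into a loop could look roughly like the sketch below. Branch is a hypothetical, stripped-down stand-in for the question's BranchNode, and FindLargestLeafValue is an invented name; it still visits every node, so it is not asymptotically faster than the recursive version.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for BranchNode, reduced to the fields the search needs.
public class Branch
{
    public int LeafValue;
    public Branch[] Children; // null for a childless node, as in the question

    public Branch(int leafValue, params Branch[] children)
    {
        LeafValue = leafValue;
        Children = children.Length == 0 ? null : children;
    }
}

public static class LeafSearch
{
    // Finds the largest LeafValue among childless nodes using an explicit stack.
    public static int FindLargestLeafValue(Branch root)
    {
        int best = int.MinValue;
        var stack = new Stack<Branch>();
        stack.Push(root);
        while (stack.Count > 0)
        {
            Branch node = stack.Pop();
            if (node.Children == null)
            {
                best = Math.Max(best, node.LeafValue); // childless: candidate result
            }
            else
            {
                foreach (Branch child in node.Children)
                    stack.Push(child);
            }
        }
        return best;
    }
}
```

The explicit stack lives on the heap, so very deep hierarchies cannot overflow the call stack the way deep recursion can.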

Reverse Breadth First traversal in C#

Does anyone have a ready implementation of the reverse breadth-first traversal algorithm in C#?
By reverse breadth-first traversal, I mean that instead of searching a tree starting from a common node, I want to search the tree from the bottom and gradually converge on a common node.
Consider the figure below, which shows the output of a breadth-first traversal:
In my reverse breadth-first traversal, 9, 10, 11 and 12 will be the first few nodes found (their order is not important, as they all belong to the first group). 5, 6, 7 and 8 are the second group of nodes found, and so on. 1 would be the last node found.
Any ideas or pointers?
Edit: Change "Breadth First Search" to "Breadth First traversal" to clarify the question

Use a combination of a stack and queue.
Do the 'normal' BFS using the queue (which I presume you know to do already), and keep pushing nodes on the stack as you encounter them.
Once done with the BFS, the stack will contain the reverse BFS order.
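The queue-plus-stack approach can be sketched as follows. TreeItem and ReverseBfs are hypothetical names introduced for this example, not an existing library API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tree node for illustration.
public class TreeItem
{
    public int Id;
    public List<TreeItem> Children = new List<TreeItem>();

    public TreeItem(int id, params TreeItem[] children)
    {
        Id = id;
        Children.AddRange(children);
    }
}

public static class ReverseBfs
{
    // Runs a normal BFS with a queue, pushing every visited node onto a stack;
    // popping the stack afterwards yields the reverse BFS order.
    public static List<int> Traverse(TreeItem root)
    {
        var queue = new Queue<TreeItem>();
        var stack = new Stack<TreeItem>();
        queue.Enqueue(root);
        while (queue.Count > 0)
        {
            TreeItem node = queue.Dequeue();
            stack.Push(node);
            foreach (TreeItem child in node.Children)
                queue.Enqueue(child);
        }
        var result = new List<int>();
        while (stack.Count > 0)
            result.Add(stack.Pop().Id);
        return result;
    }
}
```

Note that nodes within one level also come out in reversed order (12 before 9, for instance), which the question says is acceptable.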

Run a normal BFS from rootNode and let depth[i] = linked list of nodes with depth i. So for your example you'll have:
depth[1] = {1}, depth[2] = {2, 3, 4} etc.. You can build this with a simple BFS search. Then print all the nodes in depth[maxDepth], then those in depth[maxDepth - 1] etc.
The depth of a node i is equal to the depth of its parent node + 1. The depth of the root node can be considered 1 or 0.

C# graph traversal - tracking path between any two nodes

I'm looking for a good approach to keep track of a breadth-first traversal between two nodes, without knowing anything about the graph. Unlike depth-first (where you can throw away the path if it doesn't pan out), you may have quite a few "open" possibilities during the traversal.

The naive approach is to build a tree with the source node as the root and all its connections as its children. Depending on the amount of space you have, you might need to eliminate cycles as you go. You can do that with a bitmap where each bit corresponds to a distinct node in the graph. When you reach the target node, you can follow the parent links back to the root and that is your path. Since you are going breadth first, you are assured that it is a shortest path even if you don't eliminate cycles.

For a breadth-first search you need to store at least two things. One is the set of already visited nodes, and the other is the set of nodes that are directly reachable from the visited nodes but have not been visited themselves. Then you keep moving states from the latter set to the former, adding newly reachable states to the latter. If you need to have a path from the root to some node(s), then you will also need to store a parent node for each node (except the root) in the aforementioned sets.
Usually the union of the set of visited nodes and the set of not-visited child nodes (i.e. the set of seen nodes) is stored in a hash table. This makes it possible to quickly determine whether or not a "new" state has been seen before, and to ignore it if so. If you have a really big number of states you might indeed need a bit array (as mentioned by Joseph Bui), but unless your states can be used (directly or indirectly) as indices into that array, you will need a hash function to map states to indices. In the latter case you might completely ignore certain states because they are mapped to the same index as a different (and already seen) node, so you might want to be careful with this. Also, to get a path you still need to store the parent information, which pretty much negates the benefit of the bit array.
The set of unvisited but seen nodes can be stored as a queue. (Bit arrays are of no use for this set because the array will be mostly empty and finding the next set bit is relatively expensive.)
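A compact sketch of this bookkeeping is shown below, assuming the graph is given as an adjacency dictionary keyed by node name (an assumption made for this example, as are the names BfsPath and FindPath). The parent dictionary doubles as the hash table of seen nodes, and the queue holds the seen-but-unvisited ones.

```csharp
using System;
using System.Collections.Generic;

public static class BfsPath
{
    // Returns a shortest path (fewest edges) from start to goal, or null if none exists.
    public static List<string> FindPath(
        Dictionary<string, List<string>> graph, string start, string goal)
    {
        // Maps each seen node to the node it was discovered from.
        var parent = new Dictionary<string, string>();
        parent[start] = null;
        var frontier = new Queue<string>();
        frontier.Enqueue(start);

        while (frontier.Count > 0)
        {
            string node = frontier.Dequeue();
            if (node == goal)
            {
                // Follow the parent links back to the start, then reverse.
                var path = new List<string>();
                for (string n = goal; n != null; n = parent[n])
                    path.Add(n);
                path.Reverse();
                return path;
            }
            List<string> neighbors;
            if (!graph.TryGetValue(node, out neighbors))
                continue; // node has no outgoing edges listed
            foreach (string next in neighbors)
            {
                if (parent.ContainsKey(next))
                    continue; // already seen; this also guards against cycles
                parent[next] = node;
                frontier.Enqueue(next);
            }
        }
        return null;
    }
}
```

Storing one parent per node is enough, because in a breadth-first search the first time a node is seen is always along a shortest route from the start.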

I just submitted a solution over here that also applies to this question.
Basically, I just keep a single list (a stack really) of visited nodes. Add a node to the list just before recursing or saving a solution. Always remove from the list directly after.

If you are using .NET 3.5, consider using a HashSet to prevent duplicate nodes from being expanded; this happens when there are cycles in your graph. If you have any knowledge about the contents of the graph, consider implementing an A* search to reduce the number of nodes that are expanded. Good luck, and I hope it works out for you.
If you are still a fan of treeware, there are many excellent books on the topic of graphs and graph search, such as Artificial Intelligence: A Modern Approach by Peter Norvig and Stuart Russell.
The links in my response appear to have a bug; they are HashSet: http://msdn.com/en-us/library/bb359438.aspx and A* search: http://en.wikipedia.org/wiki/A*_search_algorithm
