C# graph traversal - tracking path between any two nodes

I'm looking for a good approach to keeping track of the path during a breadth-first traversal between two nodes, without knowing anything about the graph in advance. Unlike depth-first (where you can throw away the path if it doesn't pan out), you may have quite a few "open" possibilities during the traversal.

The naive approach is to build a tree with the source node as the root and all its connections as its children. Depending on the amount of space you have, you might need to eliminate cycles as you go. You can do that with a bitmap where each bit corresponds to a distinct node in the graph. When you reach the target node, you can follow the parent links back to the root and that is your path. Since you are going breadth first, you are assured that it is a shortest path even if you don't eliminate cycles.

For a breadth-first search you need to store at least two things. One is the set of already visited nodes and the other is the set of nodes that are directly reachable from the visited nodes but have not been visited themselves. Then you keep moving states from the latter set to the former, adding newly reachable states to the latter. If you need to have a path from the root to some node(s), then you will also need to store a parent node for each node (except the root) in the aforementioned sets.
Usually the union of the set of visited nodes and the set of not-visited child nodes (i.e. the set of seen nodes) is stored in a hash table. This lets you quickly determine whether or not a "new" state has been seen before and ignore it if so. If you have a really big number of states you might indeed need a bit array (as mentioned by Joseph Bui (57509)), but unless your states can be used (directly or indirectly) as indices into that array, you will need a hash function to map states to indices. In the latter case you might wrongly ignore certain states because they are mapped to the same index as a different (and already seen) node, so be careful with this. Also, to get a path you still need to store the parent information, which pretty much negates the benefit of the bit array.
The set of unvisited but seen nodes can be stored as a queue. (Bit arrays are of no use for this set because the array will be mostly empty and finding the next set bit is relatively expensive.)
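
The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name BfsPath, the method Find and the neighbors callback are all made up for this example. One dictionary doubles as the "seen" hash table and the parent store, and a queue holds the seen-but-unvisited nodes.

```csharp
using System;
using System.Collections.Generic;

public static class BfsPath
{
    // Returns a shortest path (fewest edges) from start to goal, or null if goal is unreachable.
    // "neighbors" is a hypothetical callback that yields the nodes adjacent to a given node.
    public static List<T> Find<T>(T start, T goal, Func<T, IEnumerable<T>> neighbors)
    {
        // Seen set and parent links in one hash table: key = node, value = its parent.
        var parent = new Dictionary<T, T>();
        parent[start] = start;              // the root is recorded as its own parent
        var frontier = new Queue<T>();      // seen but not yet visited
        frontier.Enqueue(start);
        var cmp = EqualityComparer<T>.Default;

        while (frontier.Count > 0)
        {
            T current = frontier.Dequeue();
            if (cmp.Equals(current, goal))
            {
                // Follow the parent links back to the root, then reverse.
                var path = new List<T>();
                for (T n = goal; ; n = parent[n])
                {
                    path.Add(n);
                    if (cmp.Equals(n, start)) break;
                }
                path.Reverse();
                return path;
            }
            foreach (T next in neighbors(current))
            {
                if (parent.ContainsKey(next)) continue;   // seen before: ignore
                parent[next] = current;
                frontier.Enqueue(next);
            }
        }
        return null;
    }
}
```

A call might look like BfsPath.Find(1, 5, n => graph[n]), where graph is whatever adjacency structure you already have.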

I just submitted a solution over here that also applies to this question.
Basically, I just keep a single list (a stack really) of visited nodes. Add a node to the list just before recursing or saving a solution. Always remove from the list directly after.
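
As a rough sketch of that idea (the linked solution is not reproduced here, so the names DfsPaths, AllSimplePaths and Visit are invented for illustration), a depth-first search can keep one list as the current route, add a node just before recursing or saving, and remove it directly afterwards:

```csharp
using System;
using System.Collections.Generic;

public static class DfsPaths
{
    // Collects every cycle-free path from source to target in a directed graph.
    public static List<List<int>> AllSimplePaths(
        Dictionary<int, List<int>> graph, int source, int target)
    {
        var results = new List<List<int>>();
        var path = new List<int>();   // used as a stack: the current route from source
        Visit(graph, source, target, path, results);
        return results;
    }

    private static void Visit(Dictionary<int, List<int>> graph, int node, int target,
                              List<int> path, List<List<int>> results)
    {
        if (path.Contains(node)) return;        // already on the current route: a cycle
        path.Add(node);                         // add just before recursing or saving
        if (node == target)
        {
            results.Add(new List<int>(path));   // save a copy of the current route
        }
        else if (graph.TryGetValue(node, out List<int> next))
        {
            foreach (int n in next) Visit(graph, n, target, path, results);
        }
        path.RemoveAt(path.Count - 1);          // always remove directly after
    }
}
```

path.Contains is a linear scan, which is fine for short routes; a HashSet<int> kept alongside the list would make that check constant time.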

If you are using .NET 3.5, consider using HashSet<T> to prevent duplicate nodes from being expanded; this happens when there are cycles in your graph. If you have any knowledge about the contents of the graph, consider implementing an A* search to reduce the number of nodes that are expanded. Good luck and I hope it works out for you.
If you are still a fan of treeware, there are many excellent books on the topic of graphs and graph search, such as Artificial Intelligence: A Modern Approach by Peter Norvig and Stuart Russell.
The links in my response appear to have a bug. They are HashSet: http://msdn.com/en-us/library/bb359438.aspx and A* search: http://en.wikipedia.org/wiki/A*_search_algorithm

Related

How to construct BFS-suitable data structure for adjacency list?

Interview question on building and searching an adjacency tree, but I've never worked with them before, so I'm not sure where to begin.
I have a file containing data like:
2 13556 225 235
225 2212 226
2212 8888 2213
8888 144115 72629 141336 8889 146090 129948 167357
144115 160496 163089 144114 144116
...
formatted as such:
<parent node> <child node> [ <child node> [ …] ]
Every edge has length 1.
I then need to calculate the shortest path between two of the nodes (the two are specified in the question). Then, I need to provide the estimated complexity in big-O notation.
The latter I can probably fudge, though I've never even heard of it until now and wikipedia doesn't help me much in terms of understanding how to break down a search function into big-O, but I'll worry about that later (unless someone has a good link they could share).
My concern now is trying to model this data and then search it for the shortest path. Like I said, I've never worked with this kind of structure before so I'm kind of at a loss as to where to even begin. I found another question on adjacency lists here, but it doesn't appear to be quite what I'm looking for, unless I'm just totally missing the point. Seems to me, the input data would need to be re-organized to satisfy the structure used in that question, whereas I'm reading my data from a file so I would think I'd need to traverse every node and list of nodes to determine if I have already entered a parent and that could take a long time, potentially. I also don't see how I'd create a bfs search using that structure either.
There are lots of examples of searching out there, so I can likely sort out that part, but any help in getting a data model started that would be suitable for loading from the data file and suitable for a bfs search (or, if there's a better search option out there, please school me), would be of great help.

You'll likely be storing this data in a Dictionary<int, List<int>> (Links), a hash table whose key is an int (NodeID) and whose value is a List<int> holding the possible destinations from the node that is the key.
You'll need another Dictionary<int, int> (ShortestPathLastStep), which maps one NodeID to another. Each entry represents the last step in the shortest path to arrive at a given node. You need this to be able to play back the shortest path.
To perform a BFS (breadth-first search) you'll use a Queue<int> (bfsQueue). Enqueue the start node (given in your question). Now execute the following algorithm:
ShortestPathLastStep.Add(startNodeID, startNodeID)   // marks the start node as seen
while (bfsQueue is not empty)
    currentNodeID = bfsQueue.Dequeue()
    children = Links[currentNodeID]
    foreach (childNodeID in children)
        if (!ShortestPathLastStep.ContainsKey(childNodeID))
            ShortestPathLastStep.Add(childNodeID, currentNodeID)
            if (childNodeID == destinationNodeID)
                exit and play back the shortest path
            bfsQueue.Enqueue(childNodeID)
This solution assumes that traveling between any two linked nodes has a constant cost. That makes it ideal for BFS, because the first time you arrive at the destination you will have taken a shortest path (not true if links have variable length). If links do not have constant length, you'll have to add more logic when deciding whether to overwrite the ShortestPathLastStep value, and you won't be able to exit until your queue is empty. You'll only push a node onto the queue if you've never been to it (it won't exist in the shortest-path table) or if you've discovered that this new way of arriving there is shorter than the last one (in which case you have to recalculate the shortest distances for the nodes you can reach from it).
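
Putting the pieces together, a sketch of loading the file format from the question and running the unit-cost BFS could look like this. The class and method names (AdjacencyBfs, Load, ShortestPath) are invented for the example, and it assumes edges are directed from the first number on a line to the numbers that follow, which the question does not state explicitly.

```csharp
using System;
using System.Collections.Generic;

public static class AdjacencyBfs
{
    // Parses lines of the form "<parent> <child> [<child> ...]" into the Links table.
    public static Dictionary<int, List<int>> Load(IEnumerable<string> lines)
    {
        var links = new Dictionary<int, List<int>>();
        foreach (string line in lines)
        {
            string[] parts = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length == 0) continue;               // skip blank lines
            int parent = int.Parse(parts[0]);
            if (!links.TryGetValue(parent, out List<int> children))
            {
                children = new List<int>();
                links[parent] = children;
            }
            for (int i = 1; i < parts.Length; i++) children.Add(int.Parse(parts[i]));
        }
        return links;
    }

    // Returns the node IDs on a shortest path, or null if destination is unreachable.
    public static List<int> ShortestPath(Dictionary<int, List<int>> links, int start, int destination)
    {
        var lastStep = new Dictionary<int, int>();   // ShortestPathLastStep: node -> previous node
        var queue = new Queue<int>();
        lastStep[start] = start;
        queue.Enqueue(start);
        while (queue.Count > 0)
        {
            int current = queue.Dequeue();
            if (current == destination) break;
            if (!links.TryGetValue(current, out List<int> children)) continue;   // no outgoing links
            foreach (int child in children)
            {
                if (lastStep.ContainsKey(child)) continue;   // already reached by a path at least as short
                lastStep[child] = current;
                queue.Enqueue(child);
            }
        }
        if (!lastStep.ContainsKey(destination)) return null;

        // Play the path back from the destination to the start, then reverse it.
        var path = new List<int> { destination };
        while (path[path.Count - 1] != start) path.Add(lastStep[path[path.Count - 1]]);
        path.Reverse();
        return path;
    }
}
```

Reading from disk would be something like AdjacencyBfs.Load(System.IO.File.ReadLines("graph.txt")), where the file name is a placeholder. On the big-O part of the question: each node is enqueued at most once and each edge is examined at most once, so this BFS runs in O(V + E) time for V nodes and E edges.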

C# RushHour iterative deepening, optimization

I have to solve the Rush Hour problem using iterative deepening search. I'm generating a new node for every move and everything works fine, except that it takes too much time to compute everything, and the reason is that I'm generating duplicate nodes. Any ideas on how to check for duplicates?
First I start at the root. Then a method checks, for every car, whether it is possible to move it. If so, a new node is created from the current node, with the one car that has a valid move replaced by a new car with the new coordinates.
The problem is that the deeper the algorithm goes, the more duplicate moves there are.
I have tried not replacing the car and instead using the same collection as in the root node, but then the cars moved in only one direction.
I think that I need to tie the car collection together somehow, but I don't know how.
The code
Any ideas how to stop making duplicates?
Off topic: I'm new to C# (I read several tutorials and have been using it for 2 days), so can you tell me what I'm doing wrong or what I should not do?

If you want to stick with iterative deepening, then the simplest solution may be to build a hash table. Then all you need to do with each new node is something like
NewNode = GenerateNextNode
if not InHashTable(NewNode) then
    AddToHashTable(NewNode)
    Process(NewNode)
Alternately, the number of possible positions (nodes) in RushHour is fairly small (assuming you are using the standard board dimensions) and it is possible to generate all possible (and impossible!) boards fairly easily. Then rather than iterative deepening you can start with the 'solution' state and work backwards (ticking off all possible 'parent' states) until you reach the start state. By working on the table of possible states you never generate duplicates, and by tagging each state once it is visited you never re-visit states.
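
One possible shape for the hash-table check inside iterative deepening is sketched below. It assumes each board can be encoded as a string key (for instance the car positions written out in a fixed order); the names IterativeDeepening, Search, isGoal and successors are hypothetical. Note one deliberate difference from the plain "seen before" test: the table stores the depth at which a state was reached and only skips a state that was already reached at the same or a shallower depth, because with a depth limit a state first met deep in the tree may still need expanding when it is later met closer to the root.

```csharp
using System;
using System.Collections.Generic;

public static class IterativeDeepening
{
    // Returns the depth of the shallowest goal state, or -1 if none is found within maxDepth.
    public static int Search(string start, Func<string, bool> isGoal,
                             Func<string, IEnumerable<string>> successors, int maxDepth)
    {
        for (int limit = 0; limit <= maxDepth; limit++)
        {
            // Fresh table per iteration: state key -> shallowest depth at which it was reached.
            var bestDepth = new Dictionary<string, int>();
            if (Visit(start, 0, limit, isGoal, successors, bestDepth)) return limit;
        }
        return -1;
    }

    private static bool Visit(string state, int depth, int limit, Func<string, bool> isGoal,
                              Func<string, IEnumerable<string>> successors,
                              Dictionary<string, int> bestDepth)
    {
        // Skip the state only if it was already reached at the same or a shallower depth.
        if (bestDepth.TryGetValue(state, out int seenAt) && seenAt <= depth) return false;
        bestDepth[state] = depth;
        if (isGoal(state)) return true;
        if (depth == limit) return false;
        foreach (string next in successors(state))
        {
            if (Visit(next, depth + 1, limit, isGoal, successors, bestDepth)) return true;
        }
        return false;
    }
}
```

The table costs memory proportional to the number of distinct states reached in one iteration, which gives up some of iterative deepening's low-memory appeal in exchange for not re-expanding duplicates.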

Insert new leaves in R* Tree

What are the steps of the insertion algorithm of the R* Tree?
Note: I want to be able to construct the tree by insertion. IT ALWAYS GIVES ME CRAP TREES with maximum overlap and maximum area cover, no matter what condition I choose for selecting the best leaf (testing for minimum overlap area after adding at each level of the tree, minimum expansion ratio at each level of the tree, etc.).
So how is this R* Tree constructed so beautifully by insertion (from Wikipedia):

The R*-tree is not just a different strategy for choosing the insertion leaf.
The splitting strategy (perimeter!) is just as important, as it prefers "quadratic" (square-like) pages, as opposed to the slices produced by other strategies such as Ang-Tan.
Furthermore, and this is maybe the key to getting prettier trees, the R*-tree performs a kind of rebalancing to actively avoid bad splits. When a node is overfull, instead of splitting right away it removes the least central elements (or subtrees; you need this at all levels) and reinserts them. This doesn't always prevent the overflow, but it may reduce overlap in the tree.
But of course you can make various mistakes in the implementation, and the R-tree will still work; it just won't perform well because of the bad structure. How bad is your tree? Do you have a screenshot?
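
To make the reinsertion step a little more concrete, here is a small sketch of just the selection part: given the entries of an overfull node, pick the ones whose centers lie farthest from the center of the node's bounding box. The Rect type, the ForcedReinsert class and the default fraction of 0.3 are assumptions for this example (30% is the figure usually quoted for R*-trees, but treat it as tunable). Actually reinserting the removed entries from the root, and limiting this to once per level per insertion, is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public struct Rect
{
    public double MinX, MinY, MaxX, MaxY;
    public Rect(double minX, double minY, double maxX, double maxY)
    { MinX = minX; MinY = minY; MaxX = maxX; MaxY = maxY; }
    public double CenterX => (MinX + MaxX) / 2;
    public double CenterY => (MinY + MaxY) / 2;
}

public static class ForcedReinsert
{
    private static double Sq(double v) => v * v;

    // Splits the entries of an overfull node into those to keep and those to reinsert.
    // Assumes "entries" is non-empty.
    public static (List<Rect> Keep, List<Rect> Reinsert) Select(List<Rect> entries, double fraction = 0.3)
    {
        // Center of the bounding box that encloses all entries of the node.
        double cx = (entries.Min(e => e.MinX) + entries.Max(e => e.MaxX)) / 2;
        double cy = (entries.Min(e => e.MinY) + entries.Max(e => e.MaxY)) / 2;

        // Number of entries to take out: at least one, at most all of them.
        int count = Math.Min(entries.Count, Math.Max(1, (int)(entries.Count * fraction)));

        // Least central first: sort by squared distance of each entry's center from (cx, cy).
        var ordered = entries
            .OrderByDescending(e => Sq(e.CenterX - cx) + Sq(e.CenterY - cy))
            .ToList();
        return (ordered.Skip(count).ToList(), ordered.Take(count).ToList());
    }
}
```

The same selection applies to inner nodes if each entry's rectangle is the bounding box of a subtree.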

Reuse my old Depth-First Tree for a bigger depth while searching the longest path

I'm looking for the longest path through a map in a turn-based game. I get 1 s of computation time and need to move at that point.
Right now I'm regenerating the tree on every move.
Is it possible to reuse my old tree and stack (in which I store the nodes yet to be visited) to reach a greater depth and thus a better result?
For now my SearchClass is based on an interface, so changing the return type and the input variables of my function is a lot of work. Is there an easy solution for my problem?

If your map is static and not overly large, you could generate your tree in advance.
For each node, calculate the longest path to every other node on the map then store the path and its length against the original node. That way, you no longer need to compute paths during program execution; you only need to use the pre-computed longest path from your current node to your chosen destination.

Could you make your (player's) tree static? Or, if you are the only player, you could make it a global variable for the whole program on the player's side, but that depends on many things that you did not share with us. I would still suggest having a look at MCTS: Wikipedia description and Here, which has sample code.
With MCTS the idea is simple: you compute for your full 900 ms, then move the player to the node that has the highest winning probability. If you can persist the tree as a global or static (or both :D) variable, the first thing to do at the beginning of the next turn (or the next computation) is to get rid of all parts of the previous tree that you can no longer reach, because you are no longer at position [0][0] but at, let's say, position [1][3]. That shrinks the tree for you, which is good. So you replace the original tree with a new tree rooted at the node you are currently standing on. The good thing is that you already have some values precomputed; how the nodes get explored and/or their probabilities updated from there depends on your implementation. As the game goes on, the program should accumulate enough data to give it a very high winning probability.
This approach is exceptionally good because it does not compute the probabilities of steps once it is known that you are not going to take them (this is something you did not mention in your approach and I find it a necessity, which is why I responded).
Excuse any mistakes; I'll specify/update/correct things upon request.
Also, everything you are doing seems to match the pattern of a university assignment (see for example). If that is not your case, you could well take inspiration from there. If it is a school assignment, make sure you do not discuss it in too much detail and/or do not ask for technical implementation help.
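
The "replace the tree with the subtree you are standing on" step might be sketched like this. SearchNode, TreeReuse and the use of strings as move identifiers are all invented for the illustration; the statistics fields stand in for whatever your search stores per node.

```csharp
using System;
using System.Collections.Generic;

public sealed class SearchNode
{
    public int Visits;        // how often this node was explored
    public double Wins;       // accumulated result used to estimate winning probability
    public SearchNode Parent;
    public readonly Dictionary<string, SearchNode> Children = new Dictionary<string, SearchNode>();

    // Returns the child reached by "move", creating it if it does not exist yet.
    public SearchNode GetOrAddChild(string move)
    {
        if (!Children.TryGetValue(move, out SearchNode child))
        {
            child = new SearchNode { Parent = this };
            Children[move] = child;
        }
        return child;
    }
}

public static class TreeReuse
{
    // Makes the subtree of the move actually played the new root, keeping its statistics.
    public static SearchNode Advance(SearchNode root, string movePlayed)
    {
        if (root.Children.TryGetValue(movePlayed, out SearchNode next))
        {
            next.Parent = null;    // detach: the old root and the sibling subtrees become unreachable
            return next;
        }
        return new SearchNode();   // that move was never explored, so start from scratch
    }
}
```

Keeping the root in a field that survives between turns (static, or owned by a long-lived object) and calling Advance at the start of each turn lets the next search continue from the statistics already gathered.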

Implementing a tree from scratch

I'm trying to learn about trees by implementing one from scratch.
In this case I'd like to do it in C#, Java or C++ (without using built-in methods).
So each node will store a character and there will be a maximum of 26 child nodes per node.
What data structure would I use to contain the pointers to each of the nodes?
Basically I'm trying to implement a radix tree from scratch.
Thanks,

What data structure would I use to contain the pointers to each of the nodes?
A Node. Each Node should have references to (up to) 26 other Nodes in the Tree. Within the Node you can store them in an array, LinkedList, ArrayList, or just about any other collection you can think of.

What you describe isn't quite a radix tree... in a radix tree, you can have more than one character in a node, and there is no upper bound on the number of child nodes.
What you're describing sounds more limited by the alphabet... each node can be a-z, and can be followed by another letter, a-z, etc. The distinction is critical to the data structure you choose to hold your next-node pointers.
In the tree you describe, the easiest structure to use might be a simple array of pointers... all you need to do is convert the character (e.g. 'A') to its ASCII value (65), and subtract the starting offset (65) to determine which 'next node' you want. Takes up more space, but gives very fast insert and traversal.
In a true radix tree, you could have 3, 4, 78, or 0 child nodes, and your 'next node' list will have the overhead of sorting, inserting, and deleting. Much slower.
I can't speak to Java, but if I were implementing a custom radix tree in C#, I'd use one of the built-in .NET collections. Writing your own sorted list isn't really helping you learn the tree concepts, and the built-in optimizations of the .NET collections are tough to beat. Then, your code is simple: Look up your next node; if exists, grab it and go; if not, add it to the next-node collection.
Which collection you use depends on what exactly you're implementing through the tree... every type of tree involves tradeoffs between insertion time, lookup time, etc. The choices you make depend on what is most important to the application, not the tree.
Make sense?
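
As an illustration of the "look up your next node; if it exists, grab it and go; if not, add it" loop with a built-in collection, here is a sketch that keeps one character per node and stores the children in a Dictionary. The class name DictTrieNode is made up, and this does not do the multi-character node merging of a true radix tree; it only shows the next-node collection.

```csharp
using System;
using System.Collections.Generic;

public sealed class DictTrieNode
{
    // Next-node collection: no fixed upper bound on the number of children.
    public readonly Dictionary<char, DictTrieNode> Next = new Dictionary<char, DictTrieNode>();
    public bool IsTerminator;   // true if a stored word ends at this node

    public void Insert(string word)
    {
        DictTrieNode node = this;
        foreach (char c in word)
        {
            // Look up the next node; if it exists, grab it and go; if not, add it.
            if (!node.Next.TryGetValue(c, out DictTrieNode child))
            {
                child = new DictTrieNode();
                node.Next[c] = child;
            }
            node = child;
        }
        node.IsTerminator = true;
    }

    public bool Contains(string word)
    {
        DictTrieNode node = this;
        foreach (char c in word)
        {
            if (!node.Next.TryGetValue(c, out node)) return false;
        }
        return node.IsTerminator;
    }
}
```

Swapping Dictionary for SortedDictionary would keep children ordered for in-order traversal at the cost of slower lookups, which is the kind of tradeoff described above.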

Here's one I found recently that's not a bad API for trees - although I needed graphs it was handy to see how it was set up to separate the data structure for the data it was holding, so you could have a tree-equivalent to Iterator to navigate through the tree, and so on.
https://jsfcompounds.dev.java.net/treeutils/site/apidocs/com/truchsess/util/package-summary.html

If you are actually more interested in speed than space, and if each node represents exactly one letter (implied by your max of 26) then I'd just use a simple array of 26 slots, each referencing a "Node" (the Node is the object containing your array).
The nice thing about a fixed-sized array is that your look up would be much quicker. If you were looking up char "c" that was already guaranteed to be a lower cased letter, the look up would be as easy as:
nextNode=nodes[c-'a'];
A recursive lookup of a string would be trivial.
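
A minimal sketch of that fixed-array node, with the recursive lookup, might look as follows. The names TrieNode, Insert and Contains are chosen for the example, IsTerminator follows the flag mentioned in the question's follow-up, and the code assumes words consist only of the lowercase letters 'a' to 'z'.

```csharp
using System;

public sealed class TrieNode
{
    private readonly TrieNode[] nodes = new TrieNode[26];   // one slot per letter 'a'..'z'
    public bool IsTerminator;                               // true if a stored word ends here

    public void Insert(string word)
    {
        TrieNode node = this;
        foreach (char c in word)
        {
            int i = c - 'a';
            if (i < 0 || i >= 26) throw new ArgumentException("only 'a'..'z' are supported");
            if (node.nodes[i] == null) node.nodes[i] = new TrieNode();
            node = node.nodes[i];
        }
        node.IsTerminator = true;
    }

    // Recursive lookup: consume one character per call.
    public bool Contains(string word, int index = 0)
    {
        if (index == word.Length) return IsTerminator;
        int i = word[index] - 'a';
        if (i < 0 || i >= 26) return false;                 // not a lowercase letter
        TrieNode nextNode = nodes[i];
        return nextNode != null && nextNode.Contains(word, index + 1);
    }
}
```

No pointers or unsafe code are needed in C#: array slots hold object references, and an empty slot is simply null.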

Thanks for the quick replies.
Yes, what snogfish said was correct.
Basically, it's a tree with 26 nodes (A-Z) + a bool isTerminator.
Each node has these values and they are linked to each other.
I have not learned pointers in depth yet, so my attempts today to implement this from scratch using unsafe code in C# were futile.
Therefore, I'd be grateful if someone could provide me with the code to get started in C# using the internal tree class. Once I can get it started I can port the algorithms to the other languages and just change it to use pointers.
Thanks very much,
Michael

It doesn't really matter. You can use a linked list, an array (but this will have a fixed size), or a List type from the standard library of your language.
Using a List/array will mean doing some index book-keeping to traverse the tree, so it might be easiest to just keep references to the children in the parent.

Check out this Simeon Pilgrim Blog, the "Code Camp Puzzle Reviewed". One of the solutions uses a Radix in C# and you can download the solution.
