I have to solve the Rush Hour puzzle using iterative deepening search. I'm generating a new node for every move and everything works fine, except that it takes far too much time to compute everything, and the reason for this is that I'm generating duplicate nodes. Any ideas how to check for duplicates?
First I start at the root. Then there is a method which checks, for every car, whether it is possible to move it; if yes, a new node is created from the current node, but with the one car that has a valid move replaced by a new car with the new coordinates.
The problem is that the deeper the algorithm goes, the more duplicate moves there are.
I have tried not replacing the car and instead reusing the same collection as in the root node, but then the cars moved in only one direction.
I think I need to tie the car collections together somehow, but I don't know how.
Any ideas how to stop making duplicates?
Off topic: I'm new to C# (I read several tutorials and have been using it for two days), so can you tell me what I'm doing wrong or what I should avoid?
If you want to stick with iterative deepening, then the simplest solution may be to build a hash table. Then all you need to do with each new node is something like
NewNode = GenerateNextNode
if not InHashTable(NewNode) then
    AddToHashTable(NewNode)
    Process(NewNode)
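In C#, a minimal sketch of that check could look like the following; DuplicateFilter and TryVisit are hypothetical names, and the string key is something you would build yourself, e.g. by concatenating each car's position and orientation in a fixed order:

using System.Collections.Generic;

// Hedged sketch of the duplicate check. The caller encodes a board (the car
// collection) into a canonical string key before calling TryVisit.
class DuplicateFilter
{
    private readonly HashSet<string> seen = new HashSet<string>();

    // Returns true the first time a position is offered, false for repeats.
    public bool TryVisit(string boardKey)
    {
        return seen.Add(boardKey);   // HashSet<T>.Add is false if already present
    }

    // Clear this between deepening iterations so prunes made at a shallow depth
    // limit cannot block legitimate expansions at the next, larger limit.
    public void Reset()
    {
        seen.Clear();
    }
}

One subtlety with iterative deepening: within a single depth-limited pass, a state first reached by a long path can block a shorter route to it, so a more careful version stores the depth at which each key was first seen and only prunes when the new path is not shorter.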
Alternatively, the number of possible positions (nodes) in Rush Hour is fairly small (assuming you are using the standard board dimensions), and it is possible to generate all possible (and impossible!) boards fairly easily. Then, rather than iterative deepening, you can start with the 'solution' state and work backwards (ticking off all possible 'parent' states) until you reach the start state. By working on the table of possible states you never generate duplicates, and by tagging each state once it is visited you never revisit states.
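A rough sketch of that backward search, assuming boards are encoded as strings and that you supply a hypothetical predecessors delegate that yields every board one car move away from a given one:

using System;
using System.Collections.Generic;

// Hedged sketch: breadth-first search backwards from the solved positions,
// recording for each reachable board its distance (in moves) to a solution.
static Dictionary<string, int> DistancesToSolution(
    IEnumerable<string> solvedStates,
    Func<string, IEnumerable<string>> predecessors)
{
    var distance = new Dictionary<string, int>();
    var queue = new Queue<string>();

    foreach (var goal in solvedStates)
    {
        distance[goal] = 0;
        queue.Enqueue(goal);
    }

    while (queue.Count > 0)
    {
        var state = queue.Dequeue();
        foreach (var parent in predecessors(state))
        {
            if (!distance.ContainsKey(parent))   // tag each state only once
            {
                distance[parent] = distance[state] + 1;
                queue.Enqueue(parent);
            }
        }
    }
    return distance;   // distance[startState] is then the optimal number of moves
}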
Related
This is an interview question on building and searching an adjacency tree, but I've never worked with one before, so I'm not sure where to begin.
I have a file containing data like:
2 13556 225 235
225 2212 226
2212 8888 2213
8888 144115 72629 141336 8889 146090 129948 167357
144115 160496 163089 144114 144116
...
formatted as such:
<parent node> <child node> [ <child node> [ …] ]
Every edge has length 1.
I then need to calculate the shortest path between two of the nodes (the two are specified in the question). Then, I need to provide the estimated complexity in big-O notation.
The latter I can probably fudge, though I'd never even heard of it until now and Wikipedia doesn't help me much in terms of understanding how to express a search function in big-O, but I'll worry about that later (unless someone has a good link they could share).
My concern now is trying to model this data and then search it for the shortest path. Like I said, I've never worked with this kind of structure before, so I'm at a loss as to where to even begin. I found another question on adjacency lists here, but it doesn't appear to be quite what I'm looking for, unless I'm just totally missing the point. It seems to me the input data would need to be reorganized to satisfy the structure used in that question, whereas I'm reading my data from a file, so I'd have to traverse every node and its list of nodes to determine whether I have already entered a parent, and that could potentially take a long time. I also don't see how I'd build a BFS search using that structure.
There are lots of examples of searching out there, so I can likely sort out that part, but any help in getting a data model started that is suitable for loading from the data file and for a BFS search (or, if there's a better search option out there, please school me) would be of great help.
You'll likely be storing this data in a Dictionary<int, List<int>> (a hash table; call it Links), the key being an int (the NodeID) and the value being a List<int> of the possible destinations reachable from the key node.
You'll need another Dictionary<int, int> (ShortestPathLastStep), which maps a NodeID to the NodeID you arrived from, i.e. the last step on the shortest path to that node. You need this to be able to play back the shortest path.
To perform a BFS (breadth-first search) you'll use a Queue<int> (bfsQueue). Enqueue the start node (given in your question), then execute the following algorithm:
while bfsQueue is not empty
    currentNodeID = bfsQueue.Dequeue()
    foreach (childNodeID in Links[currentNodeID])
        if (!ShortestPathLastStep.ContainsKey(childNodeID))
            ShortestPathLastStep.Add(childNodeID, currentNodeID)
            if (childNodeID == destinationNodeID)
                exit and play back the shortest path
            bfsQueue.Enqueue(childNodeID)
This solution assumes that traveling between any two nodes has a constant cost. That is ideal for BFS, because the first time you arrive at the destination you will have taken a shortest path (not true if links have variable length). If links are not of constant length, you'll have to add more logic when deciding whether to overwrite the ShortestPathLastStep value: you won't be able to exit until your queue is empty, and you'll only push a node onto the queue if you've never been to it (it doesn't exist in the shortest-path list yet) or you've discovered that this new way of arriving there is shorter than the previous one (in which case you'll have to recalculate the shortest distances for the nodes reachable from this node).
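Putting those pieces together, a sketch of the constant-cost case in C# might look like this; the file name and the start/destination IDs are only examples, and the sketch treats the edges as undirected, which you should drop if your file already lists both directions:

using System;
using System.Collections.Generic;
using System.IO;

class ShortestPathDemo
{
    static void Main()
    {
        // Load the adjacency data: "<parent> <child> [<child> ...]" per line.
        var links = new Dictionary<int, List<int>>();
        foreach (var line in File.ReadAllLines("graph.txt"))   // example file name
        {
            var parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length < 2) continue;
            int parent = int.Parse(parts[0]);
            if (!links.ContainsKey(parent)) links[parent] = new List<int>();
            for (int i = 1; i < parts.Length; i++)
            {
                int child = int.Parse(parts[i]);
                links[parent].Add(child);
                // Assume undirected edges; remove these two lines if the file
                // already lists both directions.
                if (!links.ContainsKey(child)) links[child] = new List<int>();
                links[child].Add(parent);
            }
        }

        foreach (var id in FindShortestPath(links, 2, 144115))   // example node IDs
            Console.WriteLine(id);
    }

    static List<int> FindShortestPath(Dictionary<int, List<int>> links, int start, int destination)
    {
        var lastStep = new Dictionary<int, int>();   // child -> node we came from
        var queue = new Queue<int>();
        lastStep[start] = start;                     // marks the start as seen
        queue.Enqueue(start);

        while (queue.Count > 0)
        {
            int current = queue.Dequeue();
            if (current == destination) break;
            List<int> children;
            if (!links.TryGetValue(current, out children)) continue;
            foreach (int child in children)
            {
                if (lastStep.ContainsKey(child)) continue;   // already seen
                lastStep[child] = current;
                queue.Enqueue(child);
            }
        }

        if (!lastStep.ContainsKey(destination)) return new List<int>();   // unreachable

        // Play the path back from the destination to the start.
        var path = new List<int> { destination };
        while (path[path.Count - 1] != start)
            path.Add(lastStep[path[path.Count - 1]]);
        path.Reverse();
        return path;
    }
}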
Problem background
I am currently developing a framework of Ant Colony System algorithms. I thought I'd start out by trying them on the first problem they were applied to: Travelling Salesman Problem (TSP). I will be using C# for the task.
All TSP instances will consist of a complete undirected graph with 2 different weights associated with each edge.
Question
Until now I've only used adjacency-list representations, but I've read that they are recommended only for sparse graphs. As I am not the most knowledgeable of persons when it comes to data structures, I was wondering what the most efficient way to implement an undirected complete graph would be.
I can provide additional details if required.
Thank you for your time.
UPDATE
Weight clarification. Each edge will have two values associated with it:
the distance between the two cities (d(i,j) = d(j,i), i.e. the same distance in both directions)
the amount of pheromone deposited by ants on that particular edge
Operations. Small summary of the operations I will be doing on the graph:
for each node, the ant on that particular node will have to iterate through the values associated with all incident edges
Problem clarification
Ant Colony Optimization algorithms can "solve" the TSP, as this is the problem they were first applied to. I say "solve" because they belong to a family of algorithms called metaheuristics, so they never guarantee returning the optimal solution.
Regarding the problem at hand:
ants will know how to complete a tour because each ant will have a memory.
each time an ant visits a city it will store that city in its memory.
each time an ant considers visiting a new city it will search in its memory and pick an outgoing edge only if that edge will not lead it to an already visited city.
when there are no more edges the ant can choose, it has completed a tour; at this point we can retrace the tour created by the ant by backtracking through its memory.
Research article details: Ant Colony System article
Efficiency considerations
I am more worried about run time (speed) than memory.
First, there's a general guide to adjacency lists vs matrices here. That's a pretty low-level, non-specific discussion, though, so it might not tell you anything you don't already know.
The takeaway, I think, is this: If you often find yourself needing to answer the question, "I need to know the distance or the pheromone level of the edge between exactly node i and node j," then you probably want the matrix form, as that question can be answered in O(1) time.
You do mention needing to iterate over the edges adjacent to a node; here is where some cleverness and subtlety may come in. If you don't care about the order of the iteration, then you don't care about the data structure. If you care deeply about the order, and you know the order up front, and it never changes, you can probably code this directly into an adjacency list. If you find yourself always wanting, e.g., the largest or smallest concentration of pheromones, you may want to try something even more structured, like a priority queue. It really depends on what kind of operations you're doing.
Finally, I know you mention that you're more interested in speed than memory, but it's not clear to me how many graph representations you'll be using. If only one, then you truly don't care. But if each ant is building up its own representation of the graph as it goes along, you might care more than you think, and an adjacency list will let you carry around incomplete graph representations; the flip side is that it will take time to build those representations up while the ant is exploring its frontier.
Finally finally, I know you say you're dealing with complete graphs and TSP, but it is worth thinking about whether you will ever need to adapt these routines to some other problem on graphs that may not be complete, and if so, what then.
I lean toward adjacency lists and/or even more structure, but I don't think you will find a clean, crisp answer.
Since you have a complete graph I would think that the best representation would be a 2D array.
public class Edge
{
    // change types as appropriate
    public int Distance { get; set; }
    public int Pheromone { get; set; }
}

int numNodes;
Edge[,] graph = new Edge[numNodes, numNodes];
for (int i = 0; i < numNodes; i++)
{
    for (int j = 0; j < numNodes; j++)
    {
        // a 2D array is indexed as graph[i, j], not graph[i][j]
        graph[i, j] = new Edge();
        // initialize the Edge here
    }
}
If you have a LOT of nodes and don't "remember" nodes by their index into this graph, then it may be beneficial to have a Dictionary that maps a Node to its index in the graph. It may also be helpful to have the reverse lookup (a List would be the appropriate data structure here). That would give you the ability to get a Node object (if you have a lot of information to store about each node) from the index of that node in the graph.
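If it helps, here is a minimal sketch of such a mapping; Node, NodeIndex, IndexOf and NodeAt are hypothetical names rather than anything fixed by the answer above:

using System.Collections.Generic;

// Hedged sketch: a bidirectional Node <-> index mapping for the Edge[,] matrix.
// "Node" is a placeholder for whatever per-city type you use.
public class Node { public string Name; }

public class NodeIndex
{
    private readonly Dictionary<Node, int> nodeToIndex = new Dictionary<Node, int>();
    private readonly List<Node> indexToNode = new List<Node>();

    // Returns the matrix index of a node, registering it on first sight.
    public int IndexOf(Node node)
    {
        int index;
        if (!nodeToIndex.TryGetValue(node, out index))
        {
            index = indexToNode.Count;
            nodeToIndex.Add(node, index);
            indexToNode.Add(node);
        }
        return index;
    }

    // Reverse lookup: from a matrix index back to the Node object.
    public Node NodeAt(int index)
    {
        return indexToNode[index];
    }
}

Usage would then look something like graph[nodes.IndexOf(cityA), nodes.IndexOf(cityB)].Distance, where nodes is a NodeIndex instance.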
I have a large dataset with possibly over a million entries. All items have an assigned time stamp and items are added to the set at runtime (usually, but not always, with a newer time stamp).
I need to show a subset of this data for a given time range. This time range is usually quite small compared to the total data set, i.e. of the 1,000,000+ items no more than about 1000 are in that given time range. This time range moves at a constant pace, e.g. every second the time range is shifted by one second.
Additionally, the user may adjust the time range at any time ("move" through the data set) or set additional filters (e.g. filter by some text).
So far I haven't worried about performance, trying to get the other things right first, and have only worked with smaller test sets. I am not quite sure how to tackle this problem efficiently and would be glad for any input. Thanks.
Edit: Used language is C# 4.
Update: I am now using an interval tree; the implementation can be found here:
https://github.com/mbuchetics/RangeTree
It also comes with an asynchronous version which rebuilds the tree using the Task Parallel Library (TPL).
We had a similar problem in our development: we had to collect several million items sorted by some key and then export one page on demand from it. I see that your problem is somewhat similar.
For this purpose, we adapted the red-black tree structure, in the following ways:
we added an iterator to it, so we could get the 'next' item in O(1)
we added a way to find the iterator from an 'index', and managed to do that in O(log n)
RB Tree has O(log n) insertion complexity, so I guess that your insertions will fit in there nicely.
next() on the iterator was implemented by adding and maintaining a linked list of all leaf nodes; our original adopted RB tree implementation didn't include this.
The RB tree is also cool because it allows you to fine-tune the node size according to your needs. By experimenting you'll be able to figure out the right numbers that fit your problem.
Use a SortedList sorted by timestamp.
All you have to do is implement a binary search on the sorted keys inside the list to find the boundaries of your selection, which is pretty easy.
Insert new items into the sorted list; this lets you select a range pretty easily. You could potentially use LINQ as well if you're familiar with it.
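As a sketch (assuming timestamps are unique, since SortedList keys must be, and with TItem standing in for your data type), the range lookup can be a classic lower-bound binary search over the keys:

using System;
using System.Collections.Generic;

static class TimeRangeLookup
{
    // Hedged sketch: yield the items whose timestamps fall in [from, to].
    public static IEnumerable<TItem> ItemsInRange<TItem>(
        SortedList<DateTime, TItem> data, DateTime from, DateTime to)
    {
        IList<DateTime> keys = data.Keys;        // O(1) indexed access
        int index = LowerBound(keys, from);      // first key >= from
        while (index < keys.Count && keys[index] <= to)
            yield return data.Values[index++];
    }

    // Index of the first key that is >= value (keys.Count if there is none).
    static int LowerBound(IList<DateTime> keys, DateTime value)
    {
        int lo = 0, hi = keys.Count;
        while (lo < hi)
        {
            int mid = (lo + hi) / 2;
            if (keys[mid] < value) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }
}

Keep in mind that inserting into a SortedList is cheap only when the new item goes at the end (which matches the "usually newer timestamps" case); out-of-order inserts cost O(n).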
I'm looking for the longest path through a map in a turn-based game. I get 1 second of computation time and have to make my move within it.
Right now I'm generating the tree from scratch on every move.
Is it possible to reuse my old tree and stack (in which I store the nodes yet to be visited) to reach a greater depth and thus get a better result?
For now my SearchClass is based on an interface, so changing the return type and the input variables of my function is a lot of work. Is there an easy solution to my problem?
If your map is static and not overly large, you could generate your tree in advance.
For each node, calculate the longest path to every other node on the map then store the path and its length against the original node. That way, you no longer need to compute paths during program execution; you only need to use the pre-computed longest path from your current node to your chosen destination.
Can you perhaps make your (player's) tree static? Or, if you are the only player, you could make it a global variable for the whole program on the player's side, but that depends on many things you did not share with us. I would still suggest you have a look at MCTS: the Wikipedia description, and here, which has sample code.
With MCTS the idea is simple: you compute for all of your 900 ms, then make the player's move to the node that has the highest winning probability. If you can persist the tree as a global or static (or both :D) variable, the first thing you do at the beginning of the next turn (or the next computation) is get rid of all parts of the previous tree that you can no longer reach, because you are no longer at position [0][0] but at, say, position [1][3]. That shrinks the tree size for you, which is good. So what you have to do is replace the original tree with a new tree rooted at the node you are currently standing on. The good thing is that you already have some values precomputed; it then depends on your implementation how you want the nodes to be explored and/or their probabilities updated. As the game goes on, the program should have enough data to reach a very high winning probability.
This approach is exceptionally good because it does not compute the probabilities of steps that you already know you are not going to take (this is something you did not mention in your approach, and I find it a necessity, which is why I responded).
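A minimal sketch of that re-rooting step, with a hypothetical Node class whose statistics fields and Move type are placeholders for your own implementation:

using System.Collections.Generic;

// Hedged sketch of keeping the MCTS tree between turns.
public class Move { public int Row; public int Column; }

public class Node
{
    public Move MoveFromParent;
    public int Visits;
    public double TotalReward;
    public List<Node> Children = new List<Node>();
}

public static class TreeReuse
{
    // Persisted between turns (static, as suggested above).
    public static Node Root;

    // After a move has been played, keep only the subtree that corresponds to
    // the position we actually ended up in; the rest becomes unreachable.
    public static void Advance(Move played)
    {
        if (Root == null) return;
        Node next = null;
        foreach (var child in Root.Children)
        {
            if (child.MoveFromParent.Row == played.Row &&
                child.MoveFromParent.Column == played.Column)
            {
                next = child;
                break;
            }
        }
        // If we never simulated this move, start a fresh tree at the new position.
        Root = next ?? new Node { MoveFromParent = played };
    }
}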
Excuse any failures; I'll specify/update/correct things upon request.
Also, everything you are doing seems to follow the pattern of a university assignment (see for example); if that is not your case, you could well take inspiration there. If you do have to meet a course deadline, make sure you do not go into too much detail and/or do not ask for help with the technical implementation.
I'm looking for a good approach to keep track of a breadth-first traversal between two nodes, without knowing anything about the graph. Unlike depth-first (where you can throw away the path if it doesn't pan out), you may have quite a few "open" possibilities during the traversal.
The naive approach is to build a tree with the source node as the root and all its connections as its children. Depending on the amount of space you have, you might need to eliminate cycles as you go. You can do that with a bitmap where each bit corresponds to a distinct node in the graph. When you reach the target node, you can follow the parent links back to the root and that is your path. Since you are going breadth first, you are assured that it is a shortest path even if you don't eliminate cycles.
For a breadth-first search you need to store at least two things. One is the set of already visited nodes and the other is the set of nodes that are directly reachable from the visited nodes but have not been visited themselves. Then you keep moving nodes from the latter set to the former, adding newly reachable nodes to the latter. If you need a path from the root to some node(s), then you will also need to store a parent node for each node (except the root) in the aforementioned sets.
Usually the union of the set of visited nodes and the set of not-yet-visited child nodes (i.e. the set of seen nodes) is stored in a hash table, so you can quickly determine whether or not a "new" state has been seen before and ignore it if it has. If you have a really big number of states you might indeed need a bit array (as mentioned by Joseph Bui (57509)), but unless your states can be used (directly or indirectly) as indices into that array, you will need a hash function to map states to indices. In the latter case you might completely ignore certain states because they map to the same index as a different (already seen) node, so you might want to be careful with this. Also, to get a path you still need to store the parent information, which pretty much negates the use of the bit array.
The set of unvisited but seen nodes can be stored as a queue. (Bit arrays are of no use for this set because the array will be mostly empty and finding the next set bit is relatively expensive.)
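As a concrete illustration in C#, the bookkeeping described above might look like the following sketch, where TNode and the neighbours delegate stand in for whatever graph representation you actually have:

using System;
using System.Collections.Generic;

static class Bfs
{
    // Hedged sketch: a parent map doubles as the set of seen nodes, and the
    // queue holds the seen-but-not-yet-expanded frontier.
    public static List<TNode> BreadthFirstPath<TNode>(
        TNode source,
        TNode target,
        Func<TNode, IEnumerable<TNode>> neighbours)
    {
        var comparer = EqualityComparer<TNode>.Default;
        var parent = new Dictionary<TNode, TNode>();   // seen nodes and their parents
        var frontier = new Queue<TNode>();
        parent[source] = source;
        frontier.Enqueue(source);

        while (frontier.Count > 0)
        {
            TNode current = frontier.Dequeue();
            if (comparer.Equals(current, target))
            {
                // Follow parent links back to the source, then reverse.
                var path = new List<TNode> { target };
                while (!comparer.Equals(path[path.Count - 1], source))
                    path.Add(parent[path[path.Count - 1]]);
                path.Reverse();
                return path;
            }
            foreach (TNode next in neighbours(current))
            {
                if (parent.ContainsKey(next)) continue;   // already seen: skip
                parent[next] = current;
                frontier.Enqueue(next);
            }
        }
        return null;   // target not reachable from source
    }
}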
I just submitted a solution over here that also applies to this question.
Basically, I just keep a single list (a stack, really) of visited nodes. Add a node to the list just before recursing or saving a solution, and always remove it from the list directly afterwards.
If you are using .NET 3.5, consider using a HashSet to prevent duplicate nodes from being expanded; this happens when there are cycles in your graph. If you have any knowledge about the contents of the graph, consider implementing an A* search to reduce the number of nodes that are expanded. Good luck, and I hope it works out for you.
If you are still a fan of treeware there are many excellent books on the topic of graphs and graph search such as Artificial Intelligence: A Modern Approach by Peter Norvig and Stuart Russell.
The links in my response appear to have a bug; they are HashSet: http://msdn.com/en-us/library/bb359438.aspx and A* search: http://en.wikipedia.org/wiki/A*_search_algorithm