I have a certain subset of nodes of an undirected and unweighted graph. I am trying to determine whether there is a path between all of these nodes, and, if there is, what is the shortest path which includes the fewest nodes which are not in the subset of nodes.
I have been trying to think of a way to modify a minimum spanning tree algorithm to accomplish this, but so far I haven't come up with a workable solution.
Is there a good way to do this or is this a description of an already known algorithm?
"I am trying to determine whether there is a path between all of these nodes"
(I understand from this you are looking for a single path that visits all the marked nodes)
Well my friend, this could be a problem: you are describing a variation of the Traveling Salesman Problem and the Hamiltonian Path Problem. (If you are looking for a simple path, the reduction from Hamiltonian Path is straightforward: mark all the nodes.)
But I am afraid these problems are NP-Hard.
An NP-Hard problem is one for which no polynomial-time solution is known, and the general assumption is that none exists (1).
Thus, your best shot is probably going to be some exponential solution. There is an O(n^2 * 2^n) solution to TSP using dynamic programming, or a brute-force solution which is O(n!).
(1) Really not a formal definition, but this is enough information to understand the problem, there is really a lot more into NP-Hard problems.
Here is an approach that may get you some of the way there:
Use Floyd-Warshall or Dijkstra's to find the distance d(i, j) between node i and node j for every i and j such that node i and node j are in the subset of nodes.
(if d(i,j) = infinity then stop now, there is no solution)
Make a new graph which contains each node from the subset. For each d(i, j), add an edge between node i, node j in the new graph with the weight = d(i, j)
Now use a traveling salesman algorithm on this new graph to find the shortest path to visit all nodes.
This shortest path gives you the length of the path but the path may visit some nodes multiple times. This means we have an upper bound on the number of nodes outside of the subset required.
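The steps above can be sketched in Python as follows. This is a minimal sketch under assumptions, not a definitive implementation: the graph is assumed to be an adjacency dict, `bfs_distances` and `shortest_walk_through` are hypothetical names, BFS stands in for Floyd-Warshall or Dijkstra because every edge has the same length, and the travelling-salesman step uses the O(n^2 * 2^n) dynamic programme, so it is only practical for a small subset. It returns the length of the shortest walk, which may revisit nodes.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def shortest_walk_through(adj, subset):
    """Length of the shortest walk visiting every node in subset, or None."""
    nodes = list(subset)
    k = len(nodes)
    if k == 0:
        return 0
    # Steps 1-2: pairwise distances d(i, j) between subset nodes.
    d = []
    for u in nodes:
        from_u = bfs_distances(adj, u)
        if any(v not in from_u for v in nodes):
            return None  # d(i, j) = infinity: no solution
        d.append([from_u[v] for v in nodes])
    # Step 3: dynamic programme over (set of visited subset nodes, last node).
    inf = float("inf")
    best = [[inf] * k for _ in range(1 << k)]
    for j in range(k):
        best[1 << j][j] = 0
    for mask in range(1, 1 << k):
        for j in range(k):
            cur = best[mask][j]
            if cur == inf:
                continue
            for nxt in range(k):
                if mask & (1 << nxt):
                    continue  # already visited
                new_mask = mask | (1 << nxt)
                cand = cur + d[j][nxt]
                if cand < best[new_mask][nxt]:
                    best[new_mask][nxt] = cand
    return min(best[(1 << k) - 1])
```

For a path graph 1-2-3-4-5 with the subset {1, 3, 5}, the function returns 4, the number of edges walked from node 1 to node 5.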
Use Dijkstra's algorithm or a breadth-first search.
You should use Dijkstra's shortest path algorithm. First, you must assign weights (or distances) to all edges in the graph: every edge that connects two nodes that are not in the subset must be given weight 1, and every edge that connects one or two nodes from the subset must be given infinite weight. Second, you should run Dijkstra's algorithm on the resulting graph.
This algorithm will examine every edge of the graph.
Also, you can use A* (A-star) algorithm.
Update:
I didn't understand this problem at first. As @amit says, this is an NP-hard problem, a combination of HCP and TSP. Maybe some sort of stochastic search algorithm can solve this in polynomial time with high probability.
As someone who doesn't really have a background in graph theory, I have tackled this problem and found that in an unweighted, undirected graph the easiest method is Depth First Search. Implementations of algorithms such as Dijkstra's often expect a weighted graph, so you end up supplying an arbitrary value for the weight.
The solution I found to work was to traverse the nodes using DFS and log every successful journey; then it's simply a case of returning the shortest successful journey.
Here's the file that does the heavy lifting:
Depth First Search Algorithm
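As an illustration of the idea described above (not the linked file), here is a minimal Python sketch of a depth-first search that records every successful journey and keeps the shortest one. The adjacency-dict input and the name `shortest_path_dfs` are assumptions. Enumerating simple paths can take exponential time on dense graphs, so this is only reasonable for small inputs.

```python
def shortest_path_dfs(adj, start, goal):
    """Try every simple path from start to goal and keep the shortest."""
    best = None
    path = [start]      # the journey currently being explored
    on_path = {start}   # fast membership test for nodes already on it

    def visit(node):
        nonlocal best
        if node == goal:
            if best is None or len(path) < len(best):
                best = list(path)  # log this successful journey
            return
        if best is not None and len(path) >= len(best):
            return  # cannot beat the best journey found so far
        for nxt in adj.get(node, ()):
            if nxt not in on_path:
                path.append(nxt)
                on_path.add(nxt)
                visit(nxt)
                on_path.remove(nxt)
                path.pop()

    visit(start)
    return best
```

The function returns `None` when no journey reaches the goal, which also answers the "is it connected" part of the question for that pair of nodes.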
I created Graph, Node and Connection classes that not only show you the shortest path but can also tell you whether all nodes are connected:
var allNodesAreConnected = StartNode.AllNodes.All(n => n.IsConnectedToStartNode);
Or, if you want to know which nodes are not connected, change it a little bit:
var notConnectedNodes = StartNode.AllNodes.Where(n => !n.IsConnectedToStartNode);
More examples and full code in this post:
Create your own navigation system (with a Graph, Node and Connection class)
I have an interview question on building and searching an adjacency tree, but I've never worked with them before, so I'm not sure where to begin.
I have a file containing data like:
2 13556 225 235
225 2212 226
2212 8888 2213
8888 144115 72629 141336 8889 146090 129948 167357
144115 160496 163089 144114 144116
...
formatted as such:
<parent node> <child node> [ <child node> [ …] ]
Every edge has length 1.
I then need to calculate the shortest path between two of the nodes (the two are specified in the question). Then, I need to provide the estimated complexity in big-O notation.
The latter I can probably fudge, though I had never even heard of it until now, and Wikipedia doesn't help me much in understanding how to break down a search function into big-O, but I'll worry about that later (unless someone has a good link they could share).
My concern now is trying to model this data and then search it for the shortest path. Like I said, I've never worked with this kind of structure before so I'm kind of at a loss as to where to even begin. I found another question on adjacency lists here, but it doesn't appear to be quite what I'm looking for, unless I'm just totally missing the point. Seems to me, the input data would need to be re-organized to satisfy the structure used in that question, whereas I'm reading my data from a file so I would think I'd need to traverse every node and list of nodes to determine if I have already entered a parent and that could take a long time, potentially. I also don't see how I'd create a bfs search using that structure either.
There are lots of examples of searching out there, so I can likely sort out that part, but any help in getting a data model started that would be suitable for loading from the data file and suitable for a bfs search (or, if there's a better search option out there, please school me), would be of great help.
You'll likely be storing this data in a HashTable<int, List<int>> (Dictionary) called Links, the key being an int (NodeID) and the value being a List<int> of the possible destinations from the node which is the key.
You'll need to have another HashTable<int, int> (ShortestPathLastStep), which will store two NodeIDs. This will represent the last step in the shortest path to arrive at a given node. You need this to be able to play back the shortest path.
To perform a BFS (Breadth-First-Search) you'll use a Queue<int> (bfsQueue). Push the start node (given in your question) onto the queue. Now execute the following algorithm
-- ShortestPathLastStep.Add(startNodeID, startNodeID);   (marks the start node as visited)
-- while (bfsQueue is not empty)
---- currentNodeID = pop bfsQueue
---- children = Links[currentNodeID]
---- foreach (childNodeID in children)
------ if (!ShortestPathLastStep.contains(childNodeID))
-------- ShortestPathLastStep.Add(childNodeID, currentNodeID);
-------- if (childNodeID == destinationNodeID)
---------- exit and playback shortest path
-------- bfsQueue.Push(childNodeID);
This solution assumes that travelling between any two connected nodes has a constant cost. That makes it ideal for BFS, because the first time you arrive at the destination you will have taken a shortest path (not true if links have variable length). If links do not have constant length, you'll need more logic when deciding whether to overwrite the ShortestPathLastStep value: you won't be able to exit until your queue is empty, and you'll push a node onto the queue only if you have never been to it (it won't exist in the shortest-path table) or if the new way of arriving there is shorter than the previous one (in which case you'll have to recalculate the shortest distances for the nodes reachable from it).
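To make the data model and the search concrete, here is a minimal Python sketch of the approach described above. The function names `parse_links` and `shortest_path` are hypothetical, and treating each parent/child pair as an undirected edge is an assumption; drop the marked line if the links should only be followed from parent to child.

```python
from collections import deque


def parse_links(lines):
    """Build the Links table from '<parent> <child> [<child> ...]' lines."""
    links = {}
    for line in lines:
        ids = [int(token) for token in line.split()]
        if not ids:
            continue  # skip blank lines
        parent, children = ids[0], ids[1:]
        for child in children:
            links.setdefault(parent, []).append(child)
            links.setdefault(child, []).append(parent)  # assumption: undirected
    return links


def shortest_path(links, start, goal):
    """Breadth-first search; returns the list of node IDs or None."""
    last_step = {start: None}  # node -> previous node on its shortest path
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:  # play back the shortest path
                path.append(current)
                current = last_step[current]
            return path[::-1]
        for child in links.get(current, ()):
            if child not in last_step:
                last_step[child] = current
                queue.append(child)
    return None
```

Because each node enters the queue at most once and each edge is examined a constant number of times, the search runs in O(V + E) time for V nodes and E edges, which is the kind of big-O estimate the question asks for.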
I have to solve the Rush Hour problem using iterative deepening search. I generate a new node for every move and everything works fine, except that it takes too long to compute because I am generating duplicate nodes. Any ideas how to check for duplicates?
First I start at the root. Then a method checks every car to see whether it can be moved; if so, a new node is created from the current node, with the car that has a valid move replaced by a new car with the new coordinates.
The problem is that the deeper the algorithm goes, the more duplicate moves there are.
I have tried not replacing the car and instead using the same collection as the root node, but then the cars only moved in one direction.
I think that I need to tie the car collection together somehow, but I don't know how.
The code
Any ideas how to stop making duplicates?
Off topic: I'm new to C# (I read several tutorials and have been using it for 2 days), so can you tell me what I'm doing wrong or what I should not do?
If you want to stick with iterative deepening, then the simplest solution may be to build a hash table. Then all you need to do with each new node is something like
NewNode = GenerateNextNode
if not InHashTable(NewNode) then
    AddToHashTable(NewNode)
    Process(NewNode)
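A minimal Python sketch of the hash-table idea combined with iterative deepening follows. It is written generically and under assumptions: states must be hashable (for Rush Hour, a tuple of car positions would do), and `is_goal` and `successors` are hypothetical callbacks you would supply. One detail goes beyond the pseudocode above: the table stores the shallowest depth at which each state was reached, because a depth-limited search may need to re-expand a state that it reaches again by a shorter route.

```python
def depth_limited(state, is_goal, successors, limit, seen, depth=0):
    """Depth-first search to a fixed limit, skipping duplicate states."""
    if is_goal(state):
        return [state]
    if depth == limit:
        return None
    for nxt in successors(state):
        # Skip states already reached at the same or a shallower depth.
        if seen.get(nxt, limit + 1) <= depth + 1:
            continue
        seen[nxt] = depth + 1
        path = depth_limited(nxt, is_goal, successors, limit, seen, depth + 1)
        if path is not None:
            return [state] + path
    return None


def iterative_deepening(start, is_goal, successors, max_depth):
    """Run depth-limited searches with a growing limit."""
    for limit in range(max_depth + 1):
        seen = {start: 0}  # fresh table for every depth limit
        path = depth_limited(start, is_goal, successors, limit, seen)
        if path is not None:
            return path
    return None
```

The table is rebuilt for every depth limit, so memory use stays proportional to the states reachable within the current limit.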
Alternately, the number of possible positions (nodes) in RushHour is fairly small (assuming you are using the standard board dimensions) and it is possible to generate all possible (and impossible!) boards fairly easily. Then rather than iterative deepening you can start with the 'solution' state and work backwards (ticking off all possible 'parent' states) until you reach the start state. By working on the table of possible states you never generate duplicates, and by tagging each state once it is visited you never re-visit states.
Problem background
I am currently developing a framework of Ant Colony System algorithms. I thought I'd start out by trying them on the first problem they were applied to: Travelling Salesman Problem (TSP). I will be using C# for the task.
All TSP instances will consist of a complete undirected graph with 2 different weights associated with each edge.
Question
Until now I've only used adjacency-list representations but I've read that they are recommended only for sparse graphs. As I am not the most knowledgeable of persons when it comes to data structures I was wondering what would be the most efficient way to implement an undirected complete graph?
I can provide additional details if required.
Thank you for your time.
UPDATE
Weight clarification. Each edge will have the two values associated with them:
distance between two cities ( d(i,j) = d(j,i) same distance in both directions)
amount of pheromone deposited by ants on that particular edge
Operations. Small summary of the operations I will be doing on the graph:
for each node, the ant on that particular node will have to iterate through the values associated with all incident edges
Problem clarification
Ant Colony Optimization algorithms can "solve" TSP, as this is the problem they were first applied to. I say "solve" because they belong to a family of algorithms called metaheuristic optimizations, and thus never guarantee to return the optimal solution.
Regarding the problem at hand:
ants will know how to complete a tour because each ant will have a memory.
each time an ant visits a city it will store that city in its memory.
each time an ant considers visiting a new city it will search in its memory and pick an outgoing edge only if that edge will not lead it to an already visited city.
when there are no more edges the ant can choose, it has completed a tour; at this point we can retrace the tour created by the ant by backtracking through its memory.
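The memory mechanism described in the list above can be sketched in Python as follows. This is an illustration under assumptions rather than the rule from the cited article: the graph is stored as two square matrices (`distance` and `pheromone`, both hypothetical names), and the next city is chosen with probability proportional to pheromone divided by distance, which is a simplification.

```python
import random


def build_tour(distance, pheromone, start, rng=random):
    """Construct one ant's tour over a complete graph of n cities."""
    n = len(distance)
    tour = [start]     # the ant's memory, in visiting order
    visited = {start}  # same memory, for fast lookups
    current = start
    while len(tour) < n:
        # Only edges leading to cities not yet in memory are candidates.
        candidates = [c for c in range(n) if c not in visited]
        weights = [pheromone[current][c] / distance[current][c]
                   for c in candidates]
        current = rng.choices(candidates, weights=weights, k=1)[0]
        tour.append(current)
        visited.add(current)
    return tour  # no candidates left: the tour is complete
```

Each step iterates over all edges incident to the current city, which is the access pattern the question mentions, and both lookups are constant time with a matrix.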
Research article details: Ant Colony System article
Efficiency considerations
I am more worried about run time (speed) than memory.
First, there's a general guide to adjacency lists vs matrices here. That's a pretty low-level, non-specific discussion, though, so it might not tell you anything you don't already know.
The takeaway, I think, is this: If you often find yourself needing to answer the question, "I need to know the distance or the pheromone level of the edge between exactly node i and node j," then you probably want the matrix form, as that question can be answered in O(1) time.
You do mention needing to iterate over the edges adjacent to a node-- here is where some cleverness and subtlety may come in. If you don't care about the order of the iteration, then you don't care about the data structure. If you care deeply about the order, and you know the order up front, and it never changes, you can probably code this directly into an adjacency list. If you find yourself always wanting, e.g., the largest or smallest concentration of pheromones, you may want to try something even more structured, like a priority queue. It really depends on what kind of operations you're doing.
Finally, I know you mention that you're more interested in speed than memory, but it's not clear to me how many graph representations you'll be using. If only one, then you truly don't care. But, if each ant is building up its own representation of the graph as it goes along, you might care more than you think, and the adjacency list will let you carry around incomplete graph representations; the flip side of that is that it will take time to build the representations up when the ant is exploring on its frontier.
Finally finally, I know you say you're dealing with complete graphs and TSP, but it is worth thinking about whether you will ever need to adapt these routines to some other problem on possibly non-complete graphs, and if so, what then.
I lean toward adjacency lists and/or even more structure, but I don't think you will find a clean, crisp answer.
Since you have a complete graph I would think that the best representation would be a 2D array.
public class Edge
{
    // change types as appropriate
    public int Distance { get; set; }
    public int Pheromone { get; set; }
}
int numNodes = 5; // placeholder: set to the number of nodes in your instance
Edge[,] graph = new Edge[numNodes, numNodes];
for (int i = 0; i < numNodes; i++)
{
    for (int j = 0; j < numNodes; j++)
    {
        // a rectangular array is indexed as [i, j], not [i][j]
        graph[i, j] = new Edge();
        // initialize Edge
    }
}
If you have a LOT of nodes, and don't "remember" nodes by index in this graph, then it may be beneficial to have a Dictionary that maps a Node to its index in the graph. It may also be helpful to have the reverse lookup (a List would be the appropriate data structure here). This would give you the ability to get a Node object (if you have a lot of information to store about each node) based on the index of that node in the graph.
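The lookup tables described above might look like this in a minimal Python sketch. The class name `CompleteGraph` and its methods are hypothetical, and the sketch assumes the symmetric distances mentioned in the question, so every write updates both [i][j] and [j][i].

```python
class CompleteGraph:
    """Matrix storage for a complete undirected graph with two edge values."""

    def __init__(self, names, initial_pheromone=1.0):
        self.names = list(names)  # index -> node (the reverse lookup)
        self.index = {name: i for i, name in enumerate(self.names)}  # node -> index
        n = len(self.names)
        self.distance = [[0.0] * n for _ in range(n)]
        self.pheromone = [[initial_pheromone] * n for _ in range(n)]

    def set_distance(self, a, b, value):
        i, j = self.index[a], self.index[b]
        self.distance[i][j] = self.distance[j][i] = value

    def deposit(self, a, b, amount):
        """Add pheromone to the edge between a and b, in both directions."""
        i, j = self.index[a], self.index[b]
        self.pheromone[i][j] += amount
        self.pheromone[j][i] = self.pheromone[i][j]

    def edge(self, a, b):
        """Return (distance, pheromone) for the edge in O(1) time."""
        i, j = self.index[a], self.index[b]
        return self.distance[i][j], self.pheromone[i][j]
```

Storing both halves of the matrix doubles the memory but keeps every lookup a plain index operation, which matches the stated preference for speed over memory.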
I'm looking for the longest path through a map in a turn-based game. I get 1s of computation time and have to move at that point.
Right now I regenerate the tree on every move.
Is it possible to reuse my old tree and stack (in which I store the nodes yet to be visited) to reach a greater depth and thus a better result?
For now my SearchClass is based on an interface, so changing the return type and the input variables of my function is a lot of work. Is there an easy solution for my problem?
If your map is static and not overly large, you could generate your tree in advance.
For each node, calculate the longest path to every other node on the map then store the path and its length against the original node. That way, you no longer need to compute paths during program execution; you only need to use the pre-computed longest path from your current node to your chosen destination.
Could you perhaps make your (player's) tree static? Or, if you are the only player, you could make it a global variable for the whole program on the player's side, but that depends on many things that you did not share with us. I would still suggest you have a look at MCTS (Monte Carlo tree search): see the Wikipedia description and the linked page, which has sample code.
With MCTS the idea is simple: you compute for your full 900ms, then move to the node that has the highest winning probability. If you can persist the tree as a global or static (or both :D) variable, the first thing to do at the beginning of the next turn (or the next computation) is to get rid of all parts of the previous tree that you can no longer reach, because you are no longer at position [0][0] but at, let's say, position [1][3]. That shrinks the tree, which is good. So what you have to do is replace the original tree with a new tree rooted at the node you are currently standing on. The good thing is that you already have some values precomputed; how the nodes are then explored and/or their probabilities updated depends on your implementation. As the game goes on, the program should accumulate enough data to give it a very high winning probability.
This approach works well because it does not compute probabilities for steps that you already know you are not going to take (you did not mention this in your approach and I find it a necessity, which is why I responded).
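The tree reuse described above can be sketched in Python like this. It is a minimal illustration under assumptions: `TreeNode` and `advance_root` are hypothetical names, children are keyed by the move that leads to them, and the `visits` and `value` fields stand in for whatever statistics your search keeps.

```python
class TreeNode:
    """One position in the search tree, kept between turns."""

    def __init__(self, position, parent=None):
        self.position = position
        self.parent = parent
        self.children = {}  # move -> TreeNode
        self.visits = 0     # statistics gathered by earlier searches
        self.value = 0.0

    def child(self, move, position):
        """Return the child for this move, creating it if needed."""
        if move not in self.children:
            self.children[move] = TreeNode(position, parent=self)
        return self.children[move]


def advance_root(root, move):
    """Re-root the tree at the child reached by the move actually played.

    Returns None if that child was never explored, in which case the
    caller has to start a fresh tree.
    """
    new_root = root.children.get(move)
    if new_root is None:
        return None
    new_root.parent = None  # detach, so the unreachable rest can be freed
    return new_root
```

At the start of each turn you would call `advance_root` with the move that was just made and continue searching from the returned node, so the statistics in its subtree carry over.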
Excuse any failures, I'm gonna specify/update/correct things upon request.
And all the things you are doing seem to match the pattern of a university assignment (see the linked example); if that is not your case, you could well take inspiration from there. If this is for a school assignment, make sure you do not discuss it in too much detail and do not ask for help with the technical implementation.
Looking for a good approach to keep track of a Breadth-First traversal between two nodes, without knowing anything about the graph. Versus Depth-First (where you can throw away the path if it doesn't pan out) you may have quite a few "open" possibilities during the traversal.
The naive approach is to build a tree with the source node as the root and all its connections as its children. Depending on the amount of space you have, you might need to eliminate cycles as you go. You can do that with a bitmap where each bit corresponds to a distinct node in the graph. When you reach the target node, you can follow the parent links back to the root and that is your path. Since you are going breadth first, you are assured that it is a shortest path even if you don't eliminate cycles.
For a breadth-first search you need to store at least two things. One is the set of already visited nodes and the other is the set of nodes that are directly reachable from the visited nodes but are not visited themselves. Then you keep moving states from the latter set to the former, adding newly reachable states to the latter. If you need to have a path from the root to some node(s), then you will also need to store a parent node for each node (except the root) in the aforementioned sets.
Usually the union of the set of visited nodes and the set of not-visited child nodes (i.e. the set of seen nodes) is stored in a hash table. This makes it possible to determine quickly whether a "new" state has been seen before, and to ignore it if so. If you have a really big number of states you might indeed need a bit array (as mentioned by Joseph Bui (57509)), but unless your states can be used (directly or indirectly) as indices into that array, you will need a hash function to map states to indices. In the latter case you might completely ignore certain states because they map to the same index as a different (and already seen) node, so you might want to be careful with this. Also, to get a path you still need to store the parent information, which pretty much negates the benefit of the bit array.
The set of unvisited but seen nodes can be stored as a queue. (Bit arrays are of no use for this set because the array will be mostly empty and finding the next set bit is relatively expensive.)
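As a minimal Python sketch of the bookkeeping described above, assuming nothing about the graph except a `neighbours` callback (a hypothetical parameter) that lists the states adjacent to a given state: the `parent` dict doubles as the hash table of seen states, and a deque holds the seen-but-unvisited frontier.

```python
from collections import deque


def bfs_path(source, target, neighbours):
    """Shortest path from source to target in an implicitly defined graph."""
    parent = {source: None}     # seen states, each with its parent
    frontier = deque([source])  # seen but not yet visited
    while frontier:
        node = frontier.popleft()
        if node == target:
            path = []
            while node is not None:  # follow parent links back to the root
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbours(node):
            if nxt not in parent:  # ignore states seen before
                parent[nxt] = node
                frontier.append(nxt)
    return None
```

States only need to be hashable, so tuples, strings or frozen sets all work as keys without a separate bit array.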
I just submitted a solution over here that also applies to this question.
Basically, I just keep a single list (a stack really) of visited nodes. Add a node to the list just before recursing or saving a solution. Always remove from the list directly after.
If you are using .NET 3.5, consider using HashSet to prevent duplicate nodes from being expanded; this happens when there are cycles in your graph. If you have any knowledge about the contents of the graph, consider implementing an A* search to reduce the number of nodes that are expanded. Good luck and I hope it works out for you.
If you are still a fan of treeware, there are many excellent books on the topic of graphs and graph search, such as Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.
The links in my response appear to have a bug. They are HashSet: http://msdn.com/en-us/library/bb359438.aspx and A* search: http://en.wikipedia.org/wiki/A*_search_algorithm