Most efficient implementation for a complete undirected graph - C#

Problem background
I am currently developing a framework of Ant Colony System algorithms. I thought I'd start out by trying them on the first problem they were applied to: Travelling Salesman Problem (TSP). I will be using C# for the task.
All TSP instances will consist of a complete undirected graph with 2 different weights associated with each edge.
Question
Until now I've only used adjacency-list representations, but I've read that they are recommended only for sparse graphs. As I'm not the most knowledgeable person when it comes to data structures, I was wondering: what would be the most efficient way to implement an undirected complete graph?
I can provide additional details if required.
Thank you for your time.
UPDATE
Weight clarification. Each edge will have two values associated with it:
distance between the two cities (d(i,j) = d(j,i), i.e. the same distance in both directions)
amount of pheromone deposited by ants on that particular edge
Operations. Small summary of the operations I will be doing on the graph:
for each node, the ant on that particular node will have to iterate through the values associated with all incident edges
Problem clarification
Ant Colony Optimization algorithms can "solve" the TSP, as this is the first problem they were applied to. I say "solve" because they belong to a family of algorithms called metaheuristics, so they never guarantee to return the optimal solution.
Regarding the problem at hand:
ants will know how to complete a tour because each ant will have a memory.
each time an ant visits a city it will store that city in its memory.
each time an ant considers visiting a new city it will search its memory and pick an outgoing edge only if that edge does not lead to an already visited city.
when there are no more edges the ant can choose, it has completed a tour; at this point we can retrace the tour created by the ant by backtracking through its memory (a minimal sketch of this memory is shown below).
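Something like this, for example (just a sketch; the Ant class and member names below are placeholders, not code from the framework I'm building):

using System.Collections.Generic;

// Sketch only: placeholder Ant with the memory described above.
class Ant
{
    private readonly List<int> _memory = new List<int>();        // visited cities, in visiting order
    private readonly HashSet<int> _visited = new HashSet<int>(); // O(1) "already visited?" check

    public void Visit(int city)
    {
        _memory.Add(city);
        _visited.Add(city);
    }

    // Candidate cities: every city whose edge does not lead back to an
    // already visited city (the graph is complete, so all others qualify).
    public IEnumerable<int> Candidates(int numCities)
    {
        for (int j = 0; j < numCities; j++)
            if (!_visited.Contains(j))
                yield return j;
    }

    public bool TourComplete(int numCities)
    {
        return _memory.Count == numCities;   // no unvisited city left to choose
    }

    public IReadOnlyList<int> Tour
    {
        get { return _memory; }              // retrace the tour by reading the memory back
    }
}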
Research article details: Ant Colony System article
Efficiency considerations
I am more worried about run time (speed) than memory.

First, there's a general guide to adjacency lists vs matrices here. That's a pretty low-level, non-specific discussion, though, so it might not tell you anything you don't already know.
The takeaway, I think, is this: If you often find yourself needing to answer the question, "I need to know the distance or the pheromone level of the edge between exactly node i and node j," then you probably want the matrix form, as that question can be answered in O(1) time.
You do mention needing to iterate over the edges incident to a node; here is where some cleverness and subtlety may come in. If you don't care about the order of the iteration, then you don't care about the data structure. If you care deeply about the order, and you know the order up front, and it never changes, you can probably code this directly into an adjacency list. If you find yourself always wanting, e.g., the largest or smallest concentration of pheromones, you may want to try something even more structured, like a priority queue. It really depends on what kind of operations you're doing.
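For what it's worth, here is roughly what that iteration looks like with the matrix form; the Edge type and the desirability rule are assumptions for illustration, not something prescribed by the question:

using System.Collections.Generic;

// Sketch only: an assumed Edge type carrying the two per-edge values, plus the
// "scan row i of the matrix" pattern an ant would use to pick its next city.
// The desirability rule below is a placeholder, not the actual ACS formula.
class Edge
{
    public double Distance;
    public double Pheromone;
}

static class AntStep
{
    public static int PickNextCity(Edge[,] graph, int current, HashSet<int> visited)
    {
        int n = graph.GetLength(0);
        int bestCity = -1;
        double bestScore = double.NegativeInfinity;
        for (int j = 0; j < n; j++)
        {
            if (j == current || visited.Contains(j))
                continue;                                    // skip self and visited cities
            Edge e = graph[current, j];                      // O(1) matrix lookup
            double score = e.Pheromone / (1.0 + e.Distance); // placeholder desirability
            if (score > bestScore)
            {
                bestScore = score;
                bestCity = j;
            }
        }
        return bestCity;                                     // -1 once the tour is complete
    }
}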
Finally, I know you mention that you're more interested in speed than memory, but it's not clear to me how many graph representations you'll be using. If only one, then you truly don't care. But, if each ant is building up its own representation of the graph as it goes along, you might care more than you think, and the adjacency list will let you carry around incomplete graph representations; the flip side of that is that it will take time to build the representations up when the ant is exploring on its frontier.
Finally finally, I know you say you're dealing with complete graphs and TSP, but it is worth thinking about whether you will ever need to adapt these routines to some other problem on graphs that may not be complete, and if so, what then.
I lean toward adjacency lists and/or even more structure, but I don't think you will find a clean, crisp answer.

Since you have a complete graph I would think that the best representation would be a 2D array.
public class Edge
{
    // change types as appropriate
    public int Distance { get; set; }
    public int Pheromone { get; set; }
}

int numNodes;
Edge[,] graph = new Edge[numNodes, numNodes];
for (int i = 0; i < numNodes; i++)
{
    for (int j = 0; j < numNodes; j++)
    {
        // note: a rectangular array is indexed as [i, j], not [i][j]
        graph[i, j] = new Edge();
        // initialize Edge
    }
}
If you have a LOT of nodes, and don't "remember" nodes by index in this graph, then it may be beneficial to have a Dictionary that maps a Node to its index in the graph. It may also be helpful to have the reverse lookup (a List would be the appropriate data structure here). This would give you the ability to get a Node object (if you have a lot of information to store about each node) based on the index of that node in the graph.
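For example, a minimal sketch of that two-way mapping might look like this (the generic TNode stands in for whatever node class you end up with; this class is an illustration, not a standard type):

using System.Collections.Generic;

// Minimal sketch of the two-way node/index mapping.
public class NodeIndex<TNode>
{
    private readonly Dictionary<TNode, int> _indexOf = new Dictionary<TNode, int>(); // node -> matrix index
    private readonly List<TNode> _nodeAt = new List<TNode>();                        // matrix index -> node

    public int Register(TNode node)
    {
        int index = _nodeAt.Count;       // next free row/column in the Edge[,] matrix
        _indexOf[node] = index;
        _nodeAt.Add(node);
        return index;
    }

    public int IndexOf(TNode node)
    {
        return _indexOf[node];
    }

    public TNode NodeAt(int index)
    {
        return _nodeAt[index];
    }
}

// Usage: Edge e = graph[nodes.IndexOf(a), nodes.IndexOf(b)];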

Separate list / mesh into sub-lists / sub-meshes

EDIT: To give an idea of what type of mesh I have:
Imagine a LEGO brick with four knobs on the top. I read an STL file containing the surface of the whole brick. After identifying all nodes with unique coordinates (and saving their nearest neighbours in a list), I cut away most of the brick, so that only the four knobs remain. Unluckily for me, these four knobs are still in one big list (one for the nodes, one for the neighbours). I want the fastest way to get all nodes of one knob if I specify one node which I know belongs to that knob.
I have a (relatively) big List<cNode> nodes where
class cNode
{
    int nodeNumber;
    cVector vector;
}
and an even bigger (ca. 14e6 entries) List<cNodeCoincidence> coincidences where
class cNodeCoincidence
{
    cNode node1;
    cNode node2;
}
My nodes represent points in 3D, and my coincidences resemble what was formerly a mesh consisting of triangles, condensed from an STL file. I know for a fact (and the user made their input accordingly) that my node mesh is actually 4 separate meshes in one node/coincidence list. My goal is to extract the nodes of each sub-mesh into its own node list. To achieve this, I start with one node for each sub-mesh, which I know to be part of said sub-mesh. Cue a recursive function:
private void AssembleSubMesh(ReadOnlyCollection<cNode> in_nodesToRead, List<cNode> in_nodesAlreadyRead)
{
    List<cNode> newNodesToRead = new List<cNode>();   // (unused)
    // Scans the full coincidence list and, for every entry, the whole frontier:
    // this is the expensive part.
    List<cNodeCoincidence> foundCoincidences = coincidences
        .Where(x => in_nodesToRead.Any(y => y == x.node1) || in_nodesToRead.Any(z => z == x.node2))
        .ToList();
    in_nodesAlreadyRead.AddRange(in_nodesToRead);
    List<cNode> allRemainingNodes = new List<cNode>();
    foreach (cNodeCoincidence nc in foundCoincidences)
    {
        allRemainingNodes.Add(nc.node1);
        allRemainingNodes.Add(nc.node2);
    }
    allRemainingNodes = allRemainingNodes.Distinct().ToList();
    // Contains on a List is a linear scan, so this is O(frontier * alreadyRead).
    allRemainingNodes.RemoveAll(x => in_nodesAlreadyRead.Contains(x));
    if (allRemainingNodes.Count != 0)
        AssembleSubMesh(new ReadOnlyCollection<cNode>(allRemainingNodes), in_nodesAlreadyRead);
}
which is called by: AssembleSubMesh(new ReadOnlyCollection<cNode>(firstNodeIKnow), globalResultListForSubmesh); thus writing the results of the recursion to a more global list.
This procedure works (tested with small mesh), but is painfully slow (over 15 hours before I aborted the process).
Is there any way to separate the mesh in a faster and perhaps more elegant way?
I found this SO post and had a look at this lecture, and it seems that this (especially WQUPC, weighted quick-union with path compression) might be what I need, but I don't understand exactly how it could help, because they only have a node list, while I additionally have the coincidence list, which it would be a shame not to use, really (?).
Could a database help (because of indexing)?
You need to be able to identify an edge, which is a single connection between two vertices (with no other connections between them). I assume that the vertices of all triangles are present, so they are duplicated. It depends on the size of your mesh, sure, but it shouldn't take so much time.
You need to define dictionaries, which will increase your app's memory use, but will also dramatically increase speed thanks to guaranteed O(1) access.
In short:
1) load the data
2) scan it and construct appropriate data structures
3) use those data structures to get the information you need as fast as possible
If you observe any CAD modelling software, it takes much more time than it should to load meshes, for the same reason: it needs to scan the loaded data and construct appropriate data structures so it can process that data as fast as possible afterwards.
So choose your data structures and keys wisely, to meet the requirements of your application. For this particular problem, that could look something like the sketch below.
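A sketch of that approach: build the neighbour dictionary once from the coincidence list, then flood-fill from the known seed node with an explicit queue. It reuses the cNode/cNodeCoincidence classes and the coincidences field from the question, so it is meant to live in the same class as the recursive version (illustration only, not tested against your data):

// Sketch of the dictionary-based approach: one linear pass over the
// coincidence list to build adjacency lists, then an iterative flood fill.
private List<cNode> AssembleSubMeshFast(cNode seed)
{
    // 1) One pass over all coincidences to build the adjacency lists.
    var neighbours = new Dictionary<cNode, List<cNode>>();
    foreach (cNodeCoincidence c in coincidences)
    {
        if (!neighbours.TryGetValue(c.node1, out List<cNode> list1))
            neighbours[c.node1] = list1 = new List<cNode>();
        list1.Add(c.node2);
        if (!neighbours.TryGetValue(c.node2, out List<cNode> list2))
            neighbours[c.node2] = list2 = new List<cNode>();
        list2.Add(c.node1);
    }

    // 2) Breadth-first flood fill; the HashSet gives O(1) "already read" checks.
    var visited = new HashSet<cNode> { seed };
    var queue = new Queue<cNode>();
    queue.Enqueue(seed);
    while (queue.Count > 0)
    {
        cNode node = queue.Dequeue();
        if (!neighbours.TryGetValue(node, out List<cNode> adjacent))
            continue;
        foreach (cNode next in adjacent)
            if (visited.Add(next))       // Add returns false if already visited
                queue.Enqueue(next);
    }
    return new List<cNode>(visited);
}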

C# RushHour iterative deepening, optimization

I have to solve the Rush Hour problem using iterative deepening search. I'm generating a new node for every move, and everything works fine, except that it takes too much time to compute everything; the reason for this is that I'm generating duplicate nodes. Any ideas how to check for duplicates?
First I start at the root; then there is a method which checks, for every car, whether it is possible to move it. If yes, a new node is created from the current node, but with the one car that has a valid move replaced by a new car with the new coordinates.
The problem is that the deeper the algorithm goes, the more duplicate moves there are.
I have tried not replacing the car and instead using the same collection as in the root node, but then the cars moved only in one direction.
I think that I need to tie the car collections together somehow, but I don't know how.
The code
Any ideas how to stop making duplicates?
Off topic: I'm new to C# (I read several tutorials and have then been using it for 2 days), so can you tell me what I'm doing wrong or what I should not do?
If you want to stick with iterative deepening, then the simplest solution may be to build a hash table. Then all you need to do with each new node is something like
NewNode = GenerateNextNode
if not InHashTable(NewNode) then
    AddToHashTable(NewNode)
    Process(NewNode)
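In C#, the hash table could simply be a HashSet over some canonical encoding of the board. Board, Encode() and GenerateMoves() below are hypothetical stand-ins for your own classes:

// Sketch only: Board, Encode() and GenerateMoves() are hypothetical stand-ins
// for your own classes. The key point is reducing each board state to a value
// (here a string) so the HashSet can recognise duplicates.
private readonly HashSet<string> _seen = new HashSet<string>();

private void Process(Board node)
{
    string key = node.Encode();          // e.g. the concatenated car positions
    if (!_seen.Add(key))                 // Add returns false if the state was seen before
        return;                          // duplicate: prune this branch

    foreach (Board child in node.GenerateMoves())
        Process(child);
}

For iterative deepening specifically, a Dictionary<string, int> that remembers the shallowest depth at which each state was seen is often the safer choice, so that a state reached again by a shorter path is not wrongly pruned.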
Alternatively, the number of possible positions (nodes) in Rush Hour is fairly small (assuming you are using the standard board dimensions), and it is possible to generate all possible (and impossible!) boards fairly easily. Then, rather than iterative deepening, you can start with the 'solution' state and work backwards (ticking off all possible 'parent' states) until you reach the start state. By working on the table of possible states you never generate duplicates, and by tagging each state once it is visited you never re-visit states.

How does Lucene.Net store Indexed-only fields? [duplicate]

I read some documentation about Lucene; I also read the document at this link
(http://lucene.sourceforge.net/talks/pisa).
I don't really understand how Lucene indexes documents, and I don't understand which algorithm Lucene uses for indexing.
On the above link, it says Lucene uses this algorithm for indexing:
incremental algorithm:
    maintain a stack of segment indices
    create index for each incoming document
    push new indexes onto the stack
    let b=10 be the merge factor; M=8
    for (size = 1; size < M; size *= b) {
        if (there are b indexes with size docs on top of the stack) {
            pop them off the stack;
            merge them into a single index;
            push the merged index onto the stack;
        } else {
            break;
        }
    }
How does this algorithm provide optimized indexing?
Does Lucene use B-tree algorithm or any other algorithm like that for indexing
- or does it have a particular algorithm?
In a nutshell, Lucene builds an inverted index using Skip-Lists on disk, and then loads a mapping for the indexed terms into memory using a Finite State Transducer (FST). Note, however, that Lucene does not (necessarily) load all indexed terms to RAM, as described by Michael McCandless, the author of Lucene's indexing system himself. Note that by using Skip-Lists, the index can be traversed from one hit to another, making things like set and, particularly, range queries possible (much like B-Trees). And the Wikipedia entry on indexing Skip-Lists also explains why Lucene's Skip-List implementation is called a multi-level Skip-List - essentially, to make O(log n) look-ups possible (again, much like B-Trees).
So once the inverted (term) index - which is based on a Skip-List data structure - is built from the documents, the index is stored on disk. Lucene then loads (as already said: possibly, only some of) those terms into a Finite State Transducer, in an FST implementation loosely inspired by Morfologik.
Michael McCandless (also) does a pretty good and terse job of explaining how and why Lucene uses a (minimal acyclic) FST to index the terms Lucene stores in memory, essentially as a SortedMap<ByteSequence,SomeOutput>, and gives a basic idea for how FSTs work (i.e., how the FST compacts the byte sequences [i.e., the indexed terms] to make the memory use of this mapping grow sub-linear). And he points to the paper that describes the particular FST algorithm Lucene uses, too.
For those curious why Lucene uses Skip-Lists, while most databases use (B+)- and/or (B)-Trees, take a look at the right SO answer regarding this question (Skip-Lists vs. B-Trees). That answer gives a pretty good, deep explanation - essentially, not so much make concurrent updates of the index "more amenable" (because you can decide to not re-balance a B-Tree immediately, thereby gaining about the same concurrent performance as a Skip-List), but rather, Skip-Lists save you from having to work on the (delayed or not) balancing operation (ultimately) needed by B-Trees (In fact, as the answer shows/references, there is probably very little performance difference between B-Trees and [multi-level] Skip-Lists, if either are "done right.")
There's a fairly good article here: https://web.archive.org/web/20130904073403/http://www.ibm.com/developerworks/library/wa-lucene/
Edit 12/2014: Updated to an archived version due to the original being deleted, probably the best more recent alternative is http://lucene.apache.org/core/3_6_2/fileformats.html
There's an even more recent version at http://lucene.apache.org/core/4_10_2/core/org/apache/lucene/codecs/lucene410/package-summary.html#package_description, but it seems to have less information in it than the older one.
In a nutshell, when Lucene indexes a document it breaks it down into a number of terms. It then stores the terms in an index file where each term is associated with the documents that contain it. You could think of it as a bit like a hashtable.
Terms are generated using an analyzer which stems each word to its root form. The most popular stemming algorithm for the English language is the Porter stemming algorithm: http://tartarus.org/~martin/PorterStemmer/
When a query is issued it is processed through the same analyzer that was used to build the index and then used to look up the matching term(s) in the index. That provides a list of documents that match the query.
It seems your question is more about index merging than about indexing itself.
The indexing process is quite simple if you ignore low-level details. Lucene forms what is called an "inverted index" from documents. So if a document with the text "To be or not to be" and id=1 comes in, the inverted index would look like:
[to] → 1
[be] → 1
[or] → 1
[not] → 1
This is basically it: an index from each word to the list of documents containing that word. Each line of this index (one word's entry) is called a posting list. This index is then persisted to long-term storage.
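A toy version of that structure, just to illustrate the mapping (real Lucene posting lists are sorted, compressed and stored per segment on disk):

using System;
using System.Collections.Generic;

// Toy illustration only: term -> posting list of document ids.
var index = new Dictionary<string, List<int>>();

void AddDocument(int docId, string text)
{
    foreach (string term in text.ToLowerInvariant()
                                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
    {
        if (!index.TryGetValue(term, out List<int> postings))
            index[term] = postings = new List<int>();
        if (postings.Count == 0 || postings[postings.Count - 1] != docId)
            postings.Add(docId);          // record each document at most once per term
    }
}

AddDocument(1, "To be or not to be");
// index["to"] -> [1], index["be"] -> [1], index["or"] -> [1], index["not"] -> [1]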
In reality of course things are more complicated:
Lucene may skip some words based on the particular Analyzer given;
words can be preprocessed using a stemming algorithm to reduce the inflectional variety of the language;
a posting list can contain not only the identifiers of the documents, but also the offsets of the given word inside each document (potentially several occurrences) and some other additional information.
There are many more complications which are not so important for basic understanding.
It's important to understand, though, that a Lucene index is append-only. At some point in time the application decides to commit (publish) all the changes to the index. Lucene finishes all service operations on the index and closes it, so it's available for searching. After a commit the index is basically immutable. This index (or index part) is called a segment. When Lucene executes a search for a query, it searches in all available segments.
So the question arises: how can we change an already indexed document?
New documents, or new versions of already indexed documents, are indexed into new segments, and the old versions are invalidated in the previous segments using a so-called kill list. The kill list is the only part of a committed index which can change. As you might guess, index efficiency drops over time, because old segments might contain mostly removed documents.
This is where merging comes in. Merging is the process of combining several indexes to make a more efficient index overall. What basically happens during a merge is that live documents are copied to the new segment and the old segments are removed entirely.
Using this simple process, Lucene is able to keep the index in good shape in terms of search performance.
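As a much simplified illustration of that merge step, treating each segment as a term-to-postings dictionary and the kill list as a set of deleted doc ids (real Lucene merges also remap document numbers and rewrite files on disk):

using System.Collections.Generic;
using System.Linq;

// Simplified illustration of merging two segments, dropping killed documents.
static class SegmentMerge
{
    public static Dictionary<string, List<int>> Merge(
        Dictionary<string, List<int>> segmentA,
        Dictionary<string, List<int>> segmentB,
        HashSet<int> killedDocs)
    {
        var merged = new Dictionary<string, List<int>>();
        foreach (var segment in new[] { segmentA, segmentB })
        {
            foreach (var entry in segment)
            {
                if (!merged.TryGetValue(entry.Key, out List<int> postings))
                    merged[entry.Key] = postings = new List<int>();
                // Only live documents survive the merge.
                postings.AddRange(entry.Value.Where(id => !killedDocs.Contains(id)));
            }
        }
        foreach (List<int> postings in merged.Values)
            postings.Sort();              // keep posting lists in doc-id order
        return merged;
    }
}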
Hope it helps.
It is an inverted index, but that does not specify which structure it uses.
The index format documentation in Lucene has the complete information.
Start with 'Summary of File Extensions'.
You'll first notice that it talks about various different indexes.
As far as I could tell, none of these use, strictly speaking, a B-tree, but there are similarities: the above structures do resemble trees.

What are some alternatives to recursive search algorithms?

I am looking at alternatives to a deep search algorithm that I've been working on. My code is a bit too long to post here, but I've written a simplified version that captures the important aspects. First, I've created an object that I'll call 'BranchNode' that holds a few values as well as an array of other 'BranchNode' objects.
class BranchNode : IComparable<BranchNode>
{
    public BranchNode(int depth, int parentValue, Random rnd)
    {
        _nodeDelta = rnd.Next(-100, 100);
        _depth = depth + 1;
        leafValue = parentValue + _nodeDelta;
        if (depth < 10)
        {
            int children = rnd.Next(1, 10);
            branchNodes = new BranchNode[children];
            for (int i = 0; i < children; i++)
            {
                branchNodes[i] = new BranchNode(_depth, leafValue, rnd);
            }
        }
    }

    public int CompareTo(BranchNode other)
    {
        return other.leafValue.CompareTo(this.leafValue);
    }

    private int _nodeDelta;
    public BranchNode[] branchNodes;
    private int _depth;
    public int leafValue;
}
In my actual program, I'm getting my data from elsewhere... but for this example, I'm just passing an instance of a Random object down the line that I'm using to generate values for each BranchNode... I'm also manually creating a depth of 10, whereas my actual data will have any number of generations.
As a quick explanation of my goals, _nodeDelta contains a value that is assigned to each BranchNode. Each instance also maintains a leafValue that is equal to the current BranchNode's _nodeDelta summed with the _nodeDeltas of all of its ancestors. I am trying to find the largest leafValue of a BranchNode with no children.
Currently, I am recursively traversing the hierarchy searching for BranchNodes whose branchNodes array is null (i.e. a 'childless' BranchNode), then comparing its leafValue to the current highest leafValue. If it's larger, it becomes the benchmark and the search continues until it has looked at all BranchNodes.
I can post my recursive search algorithm if it'd help, but it's pretty standard and is working fine. My issue is, as expected, that for larger hierarchies my algorithm takes a long while to traverse the entire structure.
I was wondering if I had any other options that I could look into that may yield faster results... specifically, I've been trying to wrap my head around LINQ, but I'm not even sure that it is built to do what I'm looking for, or if it'd be any faster. Are there other things that I should be looking into as well?
Maybe you want to look into an alternative data index structure: Here
It always depends on the work you are doing with the data, but if you assign a unique ID to each element that encodes the hierarchical form, and create an index of what you store, that optimization will make much more sense than micro-optimizing parts of what you do.
Also, this lends itself to a very different paradigm of search algorithms that use no recursion, at the cost of additional memory for the IDs and possibly the index.
If you must visit all leaf nodes, you cannot speed up the search: it is going to go through all nodes no matter what. A typical trick played to speed up a search on trees is organizing them in some special way that simplifies the search of the tree. For example, by building a binary search tree, you make your search O(Log(N)). You could also store some helpful values in the non-leaf nodes from which you could later construct the answer to your search query.
For example, you could decide to store a _bestLeaf "pointer" in each node, pointing to the leaf with the highest leafValue among all leaves under the current subtree. If you do that, your search becomes an O(1) lookup. Your inserts and removals would become more expensive, however, because you would need to update up to log_b(N) items on the way back to the root with the new _bestLeaf (b is the branching factor of your tree).
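One way to picture the "store helpful values in non-leaf nodes" idea is a one-off bottom-up pass that caches the best leaf value of every subtree (here a dictionary stands in for the extra _bestLeaf field; keeping it up to date on inserts and removals is the extra cost mentioned above):

using System;
using System.Collections.Generic;

// Sketch: cache the best leafValue of every subtree; BranchNode is the class
// from the question.
static class BestLeafCache
{
    public static int CacheBestLeaf(BranchNode node, Dictionary<BranchNode, int> bestLeaf)
    {
        int best;
        if (node.branchNodes == null || node.branchNodes.Length == 0)
        {
            best = node.leafValue;                      // a childless node is its own best leaf
        }
        else
        {
            best = int.MinValue;
            foreach (BranchNode child in node.branchNodes)
                best = Math.Max(best, CacheBestLeaf(child, bestLeaf));
        }
        bestLeaf[node] = best;
        return best;
    }
}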
I think the first thing you should think about is maybe moving away from the N-ary tree and going to a binary search tree.
This means that all nodes have only 2 children, a greater child, and a lesser child.
From there, I would say look into balancing your search tree with something like a Red-Black tree or AVL. That way, searching your tree is O(log n).
Here are some links to get you started:
http://en.wikipedia.org/wiki/Binary_search_tree
http://en.wikipedia.org/wiki/AVL_tree
http://en.wikipedia.org/wiki/Red-black_tree
Now, if you are dead set on having each node able to have N child nodes, here are some things you should think about:
Think about ordering your child nodes so that you can quickly determine which has the highest leaf value. That way, when you enter a new node, you can check one child node and quickly determine whether it is worth recursively checking its children.
Think about ways that you can quickly eliminate as many nodes as you possibly can from the search, or break out of the recursive calls as early as you can. With the binary search tree, you can easily find the largest leaf node by always looking only at the greater child; this could eliminate N - log(N) nodes if the tree is balanced.
Think about inserting and deleting nodes. If you spend more time here, you could save a lot more time later.
As others mention, a different data structure might be what you want.
If you need to keep the data structure the same, the recursion can be unwound into loops. While this approach will probably be a little bit faster, it's not going to be orders of magnitude faster, but it might take up less memory.
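For example, a minimal sketch of the same exhaustive search with an explicit stack instead of recursion (BranchNode is the class from the question):

using System.Collections.Generic;

// Sketch: exhaustive search for the largest leafValue, no recursion.
static class TreeSearch
{
    public static int FindLargestLeafValue(BranchNode root)
    {
        int best = int.MinValue;
        var stack = new Stack<BranchNode>();
        stack.Push(root);
        while (stack.Count > 0)
        {
            BranchNode node = stack.Pop();
            if (node.branchNodes == null || node.branchNodes.Length == 0)
            {
                if (node.leafValue > best)
                    best = node.leafValue;        // childless node: candidate answer
            }
            else
            {
                foreach (BranchNode child in node.branchNodes)
                    stack.Push(child);
            }
        }
        return best;
    }
}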

Reuse my old Depth-First Tree for a bigger depth while searching the longest path

I'm looking for the longest path through a map in a game which is turn-based. I have 1 s of computation time and need to move at that point.
Right now I'm generating the tree again on every move.
Is it possible to reuse my old tree and stack (in which I store the nodes yet to be visited) to get a bigger depth and thus a better result?
For now my SearchClass is based on an interface, so changing the return type and the input variables of my function is a lot of work. Is there an easy solution to my problem?
If your map is static and not overly large, you could generate your tree in advance.
For each node, calculate the longest path to every other node on the map then store the path and its length against the original node. That way, you no longer need to compute paths during program execution; you only need to use the pre-computed longest path from your current node to your chosen destination.
Could you perhaps make your (player's) tree static? Or, if you are the only player, you could make it a global variable for the whole program on the player's side, but that depends on many things that you did not share with us. I would still suggest you have a look at MCTS: the Wikipedia description and here, which has sample code.
With MCTS the idea is simple: you compute for all of your 900 ms, then make the player's move to the node that has the highest winning probability. If you can persist the tree as a global or static (or both :D) variable, the first thing you do at the beginning of the next turn (or the next computation) is get rid of all the parts of the previous tree that you can no longer reach, because you are no longer at position [0][0] but at, let's say, position [1][3]. That shrinks the tree size for you, which is good. So what you have to do is replace the original tree with a new tree which starts at the node you are currently standing on. The good thing is that you have some values precomputed; now it depends on your implementation how you want the nodes to be explored and/or the probabilities updated. But as the game goes on, the program should have enough data that it can guarantee a very high winning probability.
This approach is exceptionally good because it does not compute the probabilities of steps that you already know you are not going to take (this is something you did not mention in your approach, and I find it a necessity, which is why I responded).
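A minimal sketch of that tree reuse between turns; the SearchNode class and the Move type are assumptions for illustration, not taken from your code:

using System.Collections.Generic;

// Sketch only: a hypothetical search-tree node with the statistics MCTS needs;
// Move is a placeholder for however the game encodes a move.
class SearchNode
{
    public Move MoveFromParent;
    public int Visits;
    public double Wins;
    public List<SearchNode> Children = new List<SearchNode>();

    // Keep the tree between turns (static/global as suggested above) and, at the
    // start of the next turn, re-root it at the child matching the move that was
    // actually played. Everything outside that subtree is dropped, but the
    // subtree's precomputed statistics are kept.
    public static SearchNode Reroot(SearchNode oldRoot, Move played)
    {
        foreach (SearchNode child in oldRoot.Children)
            if (child.MoveFromParent.Equals(played))
                return child;                 // reuse this subtree as the new root
        return new SearchNode();              // move was never expanded: start fresh
    }
}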
Excuse any mistakes; I'll specify/update/correct things upon request.
And everything you are doing seems to fit the pattern of a university assignment; see, for example. Even if this is not your case, you could still draw inspiration from there. If you do have to meet a school deadline, make sure you do not discuss it in too much detail and/or do not ask for help with the technical implementation.
