I am looking at alternatives to a deep search algorithm that I've been working on. My code is a bit too long to post here, but I've written a simplified version that captures the important aspects. First, I've created an object that I'll call 'BranchNode' that holds a few values as well as an array of other 'BranchNode' objects.
class BranchNode : IComparable<BranchNode>
{
    public BranchNode(int depth, int parentValue, Random rnd)
    {
        _nodeDelta = rnd.Next(-100, 100);
        _depth = depth + 1;
        leafValue = parentValue + _nodeDelta;
        if (depth < 10)
        {
            int children = rnd.Next(1, 10);
            branchNodes = new BranchNode[children];
            for (int i = 0; i < children; i++)
            {
                branchNodes[i] = new BranchNode(_depth, leafValue, rnd);
            }
        }
    }

    public int CompareTo(BranchNode other)
    {
        return other.leafValue.CompareTo(this.leafValue); // descending order
    }

    private int _nodeDelta;
    public BranchNode[] branchNodes;
    private int _depth;
    public int leafValue;
}
In my actual program, I'm getting my data from elsewhere... but for this example, I'm just passing an instance of a Random object down the line that I'm using to generate values for each BranchNode... I'm also manually creating a depth of 10, whereas my actual data will have any number of generations.
As a quick explanation of my goals, _nodeDelta contains a value that is assigned to each BranchNode. Each instance also maintains a leafValue that is equal to the current BranchNode's _nodeDelta summed with the _nodeDeltas of all of its ancestors. I am trying to find the largest leafValue of a BranchNode with no children.
Currently, I am recursively traversing the hierarchy searching for BranchNodes whose branchNodes array is null (a.k.a. a 'childless' BranchNode), then comparing its leafValue to the current highest leafValue. If it's larger, it becomes the benchmark and the search continues until it has looked at all BranchNodes.
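In simplified form, the search looks something like this (a minimal sketch; the method name is illustrative):

static BranchNode FindMaxLeaf(BranchNode node, BranchNode best = null)
{
    if (node.branchNodes == null) // childless node: candidate leaf
    {
        if (best == null || node.leafValue > best.leafValue)
            best = node;
    }
    else
    {
        foreach (BranchNode child in node.branchNodes)
            best = FindMaxLeaf(child, best);
    }
    return best;
}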
I can post my recursive search algorithm if it'd help, but it's pretty standard, and is working fine. My issue is, as expected, that for larger hierarchies, my algorithm takes a long while to traverse the entire structure.
I was wondering if I had any other options that I could look into that may yield faster results... specifically, I've been trying to wrap my head around LINQ, but I'm not even sure that it is built to do what I'm looking for, or if it'd be any faster. Are there other things that I should be looking into as well?
Maybe you want to look into an alternative data index structure.
It always depends on the work you are doing with the data, but if you assign each element a unique ID that encodes its position in the hierarchy, and build an index of what you store, your optimization effort will pay off far more than micro-optimizing parts of what you already do.
This also lends itself to a very different paradigm of search algorithms, one that uses no recursion, at the cost of additional memory for the IDs and possibly the index.
If you must visit all leaf nodes, you cannot speed up the search: it is going to go through all nodes no matter what. A typical trick played to speed up a search on trees is organizing them in some special way that simplifies the search of the tree. For example, by building a binary search tree, you make your search O(Log(N)). You could also store some helpful values in the non-leaf nodes from which you could later construct the answer to your search query.
For example, you could decide to store a _bestLeaf reference in each non-leaf node, "pointing" to the leaf with the highest leafValue in the subtree below it. If you do that, your search becomes an O(1) lookup at the root. Your inserts and removals would become more expensive, however, because you would need to update up to log_b(N) nodes on the way back to the root with the new _bestLeaf (where b is the branching factor of your tree).
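A minimal sketch of that idea, assuming a _bestLeaf field is added to the BranchNode class from the question and filled in at the end of its constructor (since children are constructed before their parent finishes, their _bestLeaf values are already set):

public BranchNode _bestLeaf; // best childless node in this subtree (assumed new field)

// ...added at the end of the BranchNode constructor:
if (branchNodes == null)
{
    _bestLeaf = this; // a childless node is its own best leaf
}
else
{
    _bestLeaf = branchNodes[0]._bestLeaf;
    for (int i = 1; i < branchNodes.Length; i++)
        if (branchNodes[i]._bestLeaf.leafValue > _bestLeaf.leafValue)
            _bestLeaf = branchNodes[i]._bestLeaf;
}

With that in place, the best leaf of the whole tree is just root._bestLeaf: an O(1) lookup after the O(N) build.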
I think the first thing you should think about is maybe moving away from the N-ary tree and going to a binary search tree.
This means that all nodes have only 2 children, a greater child, and a lesser child.
From there, I would say look into balancing your search tree with something like a Red-Black tree or AVL. That way, searching your tree is O(log n).
Here are some links to get you started:
http://en.wikipedia.org/wiki/Binary_search_tree
http://en.wikipedia.org/wiki/AVL_tree
http://en.wikipedia.org/wiki/Red-black_tree
Now, if you are dead set on having each node able to have N child nodes, here are some things you should think about:
Think about ordering your child nodes so that you can quickly determine which has the highest leaf value. That way, when you enter a new node, you can check one child node and quickly determine if it is worth recursively checking its children.
Think about ways to quickly eliminate as many nodes as possible from the search, or to break out of the recursive calls as early as you can. With the binary search tree, you can easily find the largest leaf node by always looking only at the greater child; this eliminates all but log(N) of the N nodes if the tree is balanced.
Think about inserting and deleting nodes. If you spend more time here, you could save a lot more time later.
As others mention, a different data structure might be what you want.
If you need to keep the data structure the same, the recursion can be unwound into loops. This will probably be a little bit faster, not orders of magnitude faster, but it may also take up less memory.
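For example, the recursive search sketched earlier could be rewritten with an explicit stack; a minimal sketch:

using System.Collections.Generic;

static BranchNode FindMaxLeafIterative(BranchNode root)
{
    BranchNode best = null;
    var stack = new Stack<BranchNode>();
    stack.Push(root);
    while (stack.Count > 0)
    {
        BranchNode node = stack.Pop();
        if (node.branchNodes == null) // childless: candidate leaf
        {
            if (best == null || node.leafValue > best.leafValue)
                best = node;
        }
        else
        {
            foreach (BranchNode child in node.branchNodes)
                stack.Push(child); // visit children later instead of recursing
        }
    }
    return best;
}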
Related
Given a binary tree, each of its nodes contains an item with a range; for instance, one particular node may cover the range (1, 1.23456].
If the query value is less than or greater than the described range, we inspect the respective child. For example, say the value is 1.3.
We would then descend into the right branch, performing two "if" checks to see whether the value fits that node's range.
Even though a balanced Binary Search Tree (BST) is an elegant way of traversing quickly through a dataset, the number of "if" checks grows significantly as there are more and more children. It becomes even more of a problem when this has to be done several million times per second.
Is there an elegant way of storing objects such that, given an element with a value (1.3 for example), the value can simply be fed into something like a Dictionary? This would quickly retrieve the element whose range the value fits, or null if it fits none.
However, a dictionary doesn't check against ranges; it expects a single exact key. So, is there a data structure which can provide an item if the supplied key fits within the item's range?
Here a person has a similar problem; however, he finds out that memory is wasted. He is advised to take the BST approach, but is that the only solution?
Sorry if there is an evident answer, I may have missed it.
Are you asking about interval trees? Interval trees allow you to get all the elements on the interval x..y within O(log n) time. For a C# implementation I have used the library called IntervalTreeLib and it worked nicely.
In computer science, an interval tree is an ordered tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for instance, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree.
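If the ranges happen to be contiguous and non-overlapping (as the example in the question suggests), a plain sorted array of range start points plus Array.BinarySearch already gives a near-dictionary lookup; a minimal sketch, with illustrative names and using half-open [a, b) ranges:

// Range i is [lowerBounds[i], lowerBounds[i + 1]); the last range is open-ended.
double[] lowerBounds = { 0.0, 1.0, 1.23456, 2.5 };  // illustrative boundaries
string[] items = { "low", "unit", "mid", "high" };  // item owning each range

string Lookup(double value)
{
    int i = Array.BinarySearch(lowerBounds, value);
    if (i < 0) i = ~i - 1;   // not an exact boundary: take the range it falls into
    if (i < 0) return null;  // value lies below every range
    return items[i];         // a caller may still check an explicit upper bound
}

For example, Lookup(1.3) lands in the range starting at 1.23456 and returns "mid". For overlapping ranges, the interval tree above is the right tool.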
Problem background
I am currently developing a framework of Ant Colony System algorithms. I thought I'd start out by trying them on the first problem they were applied to: the Travelling Salesman Problem (TSP). I will be using C# for the task.
All TSP instances will consist of a complete undirected graph with 2 different weights associated with each edge.
Question
Until now I've only used adjacency-list representations but I've read that they are recommended only for sparse graphs. As I am not the most knowledgeable of persons when it comes to data structures I was wondering what would be the most efficient way to implement an undirected complete graph?
I can provide additional details if required.
Thank you for your time.
UPDATE
Weight clarification. Each edge will have two values associated with it:
distance between two cities (d(i,j) = d(j,i), same distance in both directions)
amount of pheromone deposited by ants on that particular edge
Operations. Small summary of the operations I will be doing on the graph:
for each node, the ant on that particular node will have to iterate through the values associated with all incident edges
Problem clarification
Ant Colony Optimization algorithms can "solve" the TSP, since this is the problem they were first applied to. I say "solve" because they belong to a family of algorithms called metaheuristics, and thus never guarantee to return the optimal solution.
Regarding the problem at hand:
ants will know how to complete a tour because each ant will have a memory.
each time an ant visits a city it will store that city in its memory.
each time an ant considers visiting a new city it will search in its memory and pick an outgoing edge only if that edge will not lead it to an already visited city.
when there are no more edges the ant can choose, it has completed a tour; at this point we can retrace the tour created by the ant by backtracking through its memory.
Research article details: Ant Colony System article
Efficiency considerations
I am more worried about run time (speed) than memory.
First, there's a general guide to adjacency lists vs matrices here. That's a pretty low-level, non-specific discussion, though, so it might not tell you anything you don't already know.
The takeaway, I think, is this: If you often find yourself needing to answer the question, "I need to know the distance or the pheromone level of the edge between exactly node i and node j," then you probably want the matrix form, as that question can be answered in O(1) time.
You do mention needing to iterate over the edges adjacent to a node-- here is where some cleverness and subtlety may come in. If you don't care about the order of the iteration, then you don't care about the data structure. If you care deeply about the order, and you know the order up front, and it never changes, you can probably code this directly into an adjacency list. If you find yourself always wanting, e.g., the largest or smallest concentration of pheromones, you may want to try something even more structured, like a priority queue. It really depends on what kind of operations you're doing.
Finally, I know you mention that you're more interested in speed than memory, but it's not clear to me how many graph representations you'll be using. If only one, then you truly don't care. But, if each ant is building up its own representation of the graph as it goes along, you might care more than you think, and the adjacency list will let you carry around incomplete graph representations; the flip side of that is that it will take time to build the representations up when the ant is exploring on its frontier.
Finally finally, I know you say you're dealing with complete graphs and TSP, but it is worth thinking about whether you will ever need to adapt these routines to some other problem on other kinds of graphs, and if so, what then.
I lean toward adjacency lists and/or even more structure, but I don't think you will find a clean, crisp answer.
Since you have a complete graph I would think that the best representation would be a 2D array.
public class Edge
{
    // change types as appropriate
    public int Distance { get; set; }
    public int Pheromone { get; set; }
}

int numNodes = 10; // however many cities you have
Edge[,] graph = new Edge[numNodes, numNodes];

for (int i = 0; i < numNodes; i++)
{
    for (int j = 0; j < numNodes; j++)
    {
        graph[i, j] = new Edge(); // note: [i, j], not [i][j], for a 2D array
        // initialize Edge
    }
}
If you have a LOT of nodes and don't "remember" nodes by index in this graph, then it may be beneficial to have a Dictionary that maps a Node to its index in the graph. It may also be helpful to have the reverse lookup (a List would be the appropriate data structure here). This would give you the ability to get a Node object (if you have a lot of information to store about each node) from the index of that node in the graph.
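That mapping might look something like this (a sketch; Node and allNodes are placeholders for whatever your program uses):

using System.Collections.Generic;

Dictionary<Node, int> nodeToIndex = new Dictionary<Node, int>();
List<Node> indexToNode = new List<Node>();

// Register nodes once, in a fixed order:
foreach (Node n in allNodes)
{
    nodeToIndex[n] = indexToNode.Count;
    indexToNode.Add(n);
}

// Now graph[nodeToIndex[a], nodeToIndex[b]] gives the Edge between a and b,
// and indexToNode[i] recovers the Node stored at row/column i.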
How to construct/obtain a data structure with the following capabilities:
Stores (key,value) nodes, keys implement IComparable.
Fast (log N) insertion and retrieval.
Fast (log N) method to retrieve the next higher/next lower node from any node. [EXAMPLE: if the key values inserted are (7,cat), (4,dog), (12,ostrich), (13,goldfish), then if keyVal referred to (7,cat), keyVal.Next() should return a reference to (12,ostrich).]
A solution with an enumerator from an arbitrary key would of course also suffice. Note that standard SortedDictionary functionality will not suffice, since only an enumerator over the entire set can be returned, which makes finding keyVal.next require N operations at worst.
Could a self-implemented balanced binary search tree (red-black tree) be fitted with node.next() functionality? Any good references for doing this? Any less coding-time consuming solutions?
I once had similar requirements and was unable to find something suitable. So I implemented an AVL tree. Here is some advice for doing it with performance in mind:
Do not use recursion for walking the tree (insert, update, delete, next). Better to use a stack array to store the way up to the root, which is needed for the balancing operations.
Do not store parent nodes. All operations will start from the root node and walk further down. Parents are not needed, if implemented carefully.
In order to find the Next() node of an existing one, usually Find() is called first. The stack produced by that should then be reused for Next().
By following these rules, I was able to implement the AVL tree. It works very efficiently even for very large data sets. I would be willing to share it, but it would need some modifications, since it does not store values (very easy to fix) and does not rely on IComparable but on a fixed key type of int.
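To illustrate the "walk down from the root, no parent pointers" idea, here is one way the in-order successor (the keyVal.Next() of the question) can be found in a plain BST; a minimal sketch using int keys, as in the tree described above:

// Find the in-order successor of 'key' by walking down from the root,
// keeping the best candidate seen so far (no parent pointers needed).
class Node { public int Key; public Node Left, Right; }

static Node Next(Node root, int key)
{
    Node candidate = null;
    Node current = root;
    while (current != null)
    {
        if (key < current.Key)
        {
            candidate = current;    // current could be the successor
            current = current.Left; // look for something smaller but still > key
        }
        else
        {
            current = current.Right; // everything here is <= key
        }
    }
    return candidate; // null if key is the largest
}

In a balanced tree this is O(log N), matching the requirements in the question.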
The OrderedDictionary in PowerCollections provides a "get iterator starting at or before key" operation that takes O(log N) time to return the first value. That makes it very fast to, say, scan the 1,000 items near the middle of a 50 million item set (which with SortedDictionary would require guessing whether to start at the beginning or the end, both equally bad choices, and iterating around 25 million items). OrderedDictionary can do that with just 1,000 items iterated.
There is a problem in OrderedDictionary, though, in that it uses yield, which causes O(n^2) performance and out-of-memory conditions when iterating a 50 million item set in a 32-bit process. There is a quite simple fix for that, which I will document later.
I was asked this question in an interview. Although the interview was for a .NET position, he asked me this question in the context of Java, because I had also mentioned Java in my resume.
How to find the index of an element having value X in an array?
I said iterating from the first element till the last and checking whether the value is X would give the result. He asked about a method involving fewer iterations; I said binary search, but that is only possible for a sorted array. I tried suggesting the IndexOf function in the Array class. But nothing from my side answered that question.
Is there any fast way of getting the index of an element having value X in an array?
As long as there is no knowledge about the array (is it sorted? ascending or descending? etc etc), there is no way of finding an element without inspecting each one.
Also, that is exactly what indexOf does (when using lists).
How to find the index of an element having value X in an array ?
This would be fast:
int getXIndex(int x) {
    myArray[0] = x; // overwrite the first element with x...
    return 0;       // ...so x is now (trivially) found at index 0
}
A practical way of finding it faster is by parallel processing.
Just divide the array into N parts and assign every part to a thread that iterates through the elements of its part until the value is found. N should preferably be the number of processor cores.
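In C#, a sketch of that idea with Parallel.For (which partitions the index range across cores for you); note that this finds some occurrence, not necessarily the first, and for small arrays the threading overhead dominates:

using System.Threading;
using System.Threading.Tasks;

static int ParallelIndexOf(int[] array, int x)
{
    int found = -1;
    Parallel.For(0, array.Length, (i, state) =>
    {
        if (array[i] == x)
        {
            Interlocked.CompareExchange(ref found, i, -1); // record the first hit seen
            state.Stop(); // ask the other iterations to stop as soon as possible
        }
    });
    return found; // -1 if x is not present
}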
If a binary search isn't possible (because the array isn't sorted) and you don't have some kind of advanced search index, the only way I could think of that isn't O(n) is if the item's position in the array is a function of the item itself (like, if the array is [10, 20, 30, 40], the position of an element n is (n / 10) - 1).
Maybe he wants to test your knowledge about Java.
There is a utility class called Arrays; it contains various methods for manipulating arrays (such as sorting and searching):
http://download.oracle.com/javase/6/docs/api/java/util/Arrays.html
In 2 lines you can have an O(n log n) result:
Arrays.sort(list);                         // O(n log n)
int index = Arrays.binarySearch(list, 88); // O(log n)
(Note that sorting rearranges the array, so this returns the position in the sorted array, not the original one.)
Puneet - in .NET it's:
string[] testArray = { "fred", "bill" };
var indexOffset = Array.IndexOf(testArray, "fred");
[edit] - having read the question properly now, :) an alternative in LINQ would be:
string[] testArray = { "cat", "dog", "banana", "orange" };
int firstItem = testArray
    .Select((item, index) => new { ItemName = item, Position = index })
    .Where(i => i.ItemName == "banana")
    .First()
    .Position;
This of course would find the FIRST occurrence of the string. Subsequent duplicates would require additional logic, but then so would a looped approach.
jim
It's a question about data structures and algorithms (although a very simple data structure). It goes beyond the language you are using.
If the array is ordered you can get O(log n) using binary search, with a modified version of it for border cases (not always using (a+b)/2 as the pivot point, but that's a pretty sophisticated quirk).
If the array is not ordered then... good luck.
He may be asking about what methods Java offers to find an item. But anyway, they're not faster; they can only be simpler to use (than a for-each/compare/return loop).
There's another solution: creating an auxiliary structure to do a faster search (like a hashmap), but, OF COURSE, it's more expensive to create it and use it once than to do a simple linear search.
Take a perfectly unsorted array, just a list of numbers in memory. All the machine can do is look at individual numbers in memory, and check if they are the right number. This is the "password cracker problem". There is no faster way than to search from the beginning until the correct value is hit.
Are you sure about the question? I once got a question somewhat similar to yours:
Given a sorted array, there is one element "x" whose value is the same as its index; find the index of that element.
For example:
//          index: 0  1  2  3  4  5  6  7  8  9  10
int[] a = { 1, 3, 5, 5, 6, 6, 6, 8, 9, 10, 11 };
At index 6 the value and the index are the same, so for this array a, the answer should be 6.
This is not an answer; but in case something was missed in the original question, this may clarify it.
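If the values are distinct integers, f(i) = a[i] - i is non-decreasing in a sorted array, so the fixed point can be found with a binary search; a sketch under that assumption (with duplicates, as in the example above, f is no longer monotone and a linear scan may be needed):

// Find an index i with a[i] == i in a sorted array of DISTINCT integers.
static int FindFixedPoint(int[] a)
{
    int lo = 0, hi = a.Length - 1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == mid) return mid;  // fixed point found
        if (a[mid] < mid) lo = mid + 1; // a[i] - i only grows to the right
        else hi = mid - 1;
    }
    return -1; // no fixed point
}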
If the only information you have is the fact that it's an unsorted array, with no relationship between the index and value, and with no auxiliary data structures, then you have to potentially examine every element to see if it holds the information you want.
However, interviews are meant to separate the wheat from the chaff so it's important to realise that they want to see how you approach problems. Hence the idea is to ask questions to see if any more information is (or could be made) available, information that can make your search more efficient.
Questions like:
1/ Does the data change very often?
If not, then you can use an extra data structure.
For example, maintain a dirty flag which is initially true. When you want to find an item and it's true, build that extra structure (sorted array, tree, hash or whatever) which will greatly speed up searches, then set the dirty flag to false, then use that structure to find the item.
If you want to find an item and the dirty flag is false, just use the structure, no need to rebuild it.
Of course, any changes to the data should set the dirty flag to true so that the next search rebuilds the structure.
This will greatly speed up (through amortisation) queries for data that's read far more often than written.
In other words, the first search after a change will be relatively slow but subsequent searches can be much faster.
You'll probably want to wrap the array inside a class so that you can control the dirty flag correctly.
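A hedged sketch of such a wrapper (all names are illustrative):

using System.Collections.Generic;

// Array wrapper that lazily rebuilds a value -> first-index map.
class IndexedArray
{
    private readonly List<int> _items = new List<int>();
    private Dictionary<int, int> _firstIndex; // the extra structure
    private bool _dirty = true;               // rebuilt on the next search

    public void Add(int value) { _items.Add(value); _dirty = true; }

    public int IndexOf(int value)
    {
        if (_dirty)
        {
            _firstIndex = new Dictionary<int, int>();
            for (int i = 0; i < _items.Count; i++)
                if (!_firstIndex.ContainsKey(_items[i]))
                    _firstIndex[_items[i]] = i; // remember first occurrence only
            _dirty = false;
        }
        int index;
        return _firstIndex.TryGetValue(value, out index) ? index : -1;
    }
}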
2/ Are we allowed to use a different data structure than a raw array?
This will be similar to the first point given above. If we modify the data structure from an array into an arbitrary class containing the array, you can still get all the advantages such as quick random access to each element.
But we gain the ability to update extra information within the data structure whenever the data changes.
So, rather than using a dirty flag and doing a large update on the next search, we can make small changes to the extra information whenever the array is changed.
This gets rid of the slow response of the first search after a change by amortising the cost across all changes (each change having a small cost).
3/ How many items will typically be in the list?
This is actually more important than most people realise.
All talk of optimisation tends to be useless unless your data sets are relatively large and performance is actually important.
For example, if you have a 100-item array, it's quite acceptable to use even the brain-dead bubble sort since the difference in timings between that and the fastest sort you can find tend to be irrelevant (unless you need to do it thousands of times per second of course).
For this case, finding the first index for a given value, it's probably perfectly acceptable to do a sequential search as long as your array stays under a certain size.
The bottom line is that you're there to prove your worth, and the interviewer is (usually) there to guide you. Unless they're sadistic, they're quite happy for you to ask them questions to try and narrow down the scope of the problem.
Ask the questions (as you have for the possibility the data may be sorted). They should be impressed with your approach even if you can't come up with a solution.
In fact (and I've done this in the past), they may reject all your possible approaches (no, it's not sorted, no, no other data structures are allowed, and so on) just to see how far you get.
And maybe, just maybe, like the Kobayashi Maru, it may not be about winning, it may be how you deal with failure :-)
Looking for a good approach to keep track of a breadth-first traversal between two nodes, without knowing anything about the graph. Unlike depth-first (where you can throw away a path if it doesn't pan out), you may have quite a few "open" possibilities during the traversal.
The naive approach is to build a tree with the source node as the root and all its connections as its children. Depending on the amount of space you have, you might need to eliminate cycles as you go. You can do that with a bitmap where each bit corresponds to a distinct node in the graph. When you reach the target node, you can follow the parent links back to the root and that is your path. Since you are going breadth first, you are assured that it is a shortest path even if you don't eliminate cycles.
For a breadth-first search you need to store at least two things. One is the set of already visited nodes and the other is the set of nodes that are directly reachable from the visited nodes but are not visited themselves. Then you keep moving states from the latter set to the former, adding newly reachable states to the latter. If you need to have a path from the root to some node(s), then you will also need to store a parent node for each node (except the root) in the aforementioned sets.
Usually the union of the set of visited nodes and the set of not-yet-visited child nodes (i.e. the set of seen nodes) is stored in a hash table. This is to be able to quickly determine whether or not a "new" state has been seen before and ignore it if it has. If you have a really big number of states you might indeed need a bit array (as mentioned by Joseph Bui), but unless your states can be used (directly or indirectly) as indices into that array, you will need to use a hash function to map states to indices. In the latter case you might completely ignore certain states because they are mapped to the same index as a different (and seen) node, so you might want to be careful with this. Also, to get a path you still need to store the parent information, which pretty much negates the benefit of the bit array.
The set of unvisited but seen nodes can be stored as a queue. (Bit arrays are of no use for this set because the array will be mostly empty and finding the next set bit is relatively expensive.)
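Putting those pieces together, a minimal BFS sketch (Node and its Neighbors list are placeholders for whatever graph representation you have):

using System.Collections.Generic;

class Node
{
    public List<Node> Neighbors = new List<Node>(); // assumed adjacency accessor
}

// Shortest path by BFS, storing parent links to reconstruct the path.
static List<Node> BreadthFirstPath(Node source, Node target)
{
    var parent = new Dictionary<Node, Node>(); // doubles as the "seen" set
    var frontier = new Queue<Node>();
    parent[source] = null;
    frontier.Enqueue(source);

    while (frontier.Count > 0)
    {
        Node current = frontier.Dequeue();
        if (current == target)
        {
            var path = new List<Node>();
            for (Node n = current; n != null; n = parent[n])
                path.Add(n);       // walk parent links back to the source
            path.Reverse();        // they come out target -> source
            return path;
        }
        foreach (Node next in current.Neighbors)
        {
            if (!parent.ContainsKey(next)) // ignore already-seen nodes
            {
                parent[next] = current;
                frontier.Enqueue(next);
            }
        }
    }
    return null; // target unreachable
}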
I just submitted a solution over here that also applies to this question.
Basically, I just keep a single list (a stack really) of visited nodes. Add a node to the list just before recursing or saving a solution. Always remove from the list directly after.
If you are using .NET 3.5, consider using a HashSet to prevent duplicate nodes from being expanded; this happens when there are cycles in your graph. If you have any knowledge about the contents of the graph, consider implementing an A* search to reduce the number of nodes that are expanded. Good luck, and I hope it works out for you.
If you are still a fan of treeware, there are many excellent books on the topic of graphs and graph search, such as Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.
The links in my response appear to be broken. They are: HashSet: http://msdn.com/en-us/library/bb359438.aspx and A* search: http://en.wikipedia.org/wiki/A*_search_algorithm