What are the steps of the R*-tree insertion algorithm?
Note: I want to be able to construct the tree by insertion. It always gives me crap trees with maximum overlap and maximum area coverage, no matter which condition I choose for selecting the best leaf (minimum overlap area after adding at each level of the tree, minimum expansion ratio at each level of the tree, etc.).
So how is this R*-tree constructed so beautifully by insertion (from Wikipedia)?
The R*-Tree is not just a different insertion leaf strategy.
The splitting strategy (perimeter!) is just as important, as it prefers "quadratic" (roughly square) pages, as opposed to the slices produced by other strategies such as Ang-Tan.
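To make the perimeter idea concrete, here is a minimal sketch of the axis-selection step of a perimeter-based split, assuming 2-D rectangles stored as `(xmin, ymin, xmax, ymax)` tuples. The names `mbr`, `margin`, `choose_split_axis` and `min_fill` are made up for illustration; this is not taken from any particular library, and it covers only the choice of axis, not the final choice of distribution.

```python
def mbr(rects):
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def margin(r):
    """Half-perimeter of a rectangle; small values favour square-ish pages."""
    return (r[2] - r[0]) + (r[3] - r[1])


def choose_split_axis(rects, min_fill):
    """Return 0 (x) or 1 (y): the axis whose candidate splits have the
    smallest summed margin. min_fill is the minimum entries per group."""
    best_axis, best_sum = 0, float("inf")
    for axis in (0, 1):
        total = 0.0
        # Sort once by the lower and once by the upper coordinate on this axis.
        for key in (lambda r: r[axis], lambda r: r[axis + 2]):
            ordered = sorted(rects, key=key)
            # Try every split that leaves at least min_fill entries per side.
            for k in range(min_fill, len(ordered) - min_fill + 1):
                total += margin(mbr(ordered[:k])) + margin(mbr(ordered[k:]))
        if total < best_sum:
            best_axis, best_sum = axis, total
    return best_axis
```

Once the axis is chosen, the distribution along it would normally be picked by minimum overlap between the two groups, with area as the tie-breaker.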
Furthermore, and this is maybe the key to getting prettier trees, the R*-tree performs a kind of rebalancing to actively avoid bad splits. When a node is overfull, instead of splitting right away, it removes the least central elements (or subtrees; you need this at all levels) and reinserts them. This doesn't always prevent the overflow, but it may reduce overlap in the tree.
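A rough sketch of how the "least central" entries could be picked, assuming the same `(xmin, ymin, xmax, ymax)` tuples. The function name `pick_reinsert` and the 30% default fraction are assumptions for illustration (30% is a commonly cited value; treat it as tunable). Actually reinserting the removed entries from the root is left out.

```python
def mbr(rects):
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def center(r):
    return ((r[0] + r[2]) / 2.0, (r[1] + r[3]) / 2.0)


def pick_reinsert(rects, fraction=0.3):
    """Split an overfull node's entries into (keep, reinsert).

    The entries whose centers lie farthest from the center of the node's
    bounding rectangle are selected for reinsertion.
    """
    cx, cy = center(mbr(rects))

    def dist2(r):
        x, y = center(r)
        return (x - cx) ** 2 + (y - cy) ** 2

    ordered = sorted(rects, key=dist2, reverse=True)  # farthest first
    p = max(1, int(len(rects) * fraction))
    return ordered[p:], ordered[:p]
```

The node then keeps the first list (and shrinks its bounding rectangle accordingly), while the second list is inserted again from the root at the same level. This should happen at most once per level per insertion, otherwise it can loop.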
But of course you can make various mistakes in the implementation, and the R-tree will still work; it just won't perform well because of the bad structure. How bad is your tree? Do you have a screenshot?
Related
Given a binary tree, each of its nodes contains an item with a range; for instance, one particular node may contain the range ( 1 to 1.23456 ].
If the query element is less than or greater than that range, the search inspects the respective child. For example, suppose the query element is 1.3.
We would then follow the right branch, performing 2 "if" checks to see whether it fits in the range of the element.
Even though a balanced Binary Search Tree (BST) is an elegant way of traversing a dataset quickly, the number of "if" checks grows significantly as the number of children grows. It becomes even more of a problem when this has to be done several million times per second.
Is there an elegant way of storing objects such that an element's value (1.3, for example) can simply be fed into something like a Dictionary? This would quickly retrieve the element whose range contains the value, or null if it fits none.
However, a dictionary doesn't check against ranges; it expects a single value. So, is there a data structure that can return an item if the supplied key falls within the item's range?
Here a person has a similar problem, but he finds out that memory is wasted. He is advised to use the BST approach, but is it the only solution?
Sorry if there is an evident answer; I may have missed it.
Are you asking about interval trees? Interval trees let you get all the elements on the interval x..y in O(log n + k) time, where k is the number of elements returned. For a C# implementation I have used the library called IntervalTreeLib, and it worked nicely.
In computer science, an interval tree is an ordered tree data
structure to hold intervals. Specifically, it allows one to
efficiently find all intervals that overlap with any given interval or
point. It is often used for windowing queries, for instance, to find
all roads on a computerized map inside a rectangular viewport, or to
find all visible elements inside a three-dimensional scene. A similar
data structure is the segment tree.
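If the ranges never overlap (an assumption; the question does not say so), a full interval tree is not strictly needed: a sorted array of lower bounds plus a binary search gives a dictionary-like lookup. A minimal sketch in Python, with the hypothetical class name `RangeMap` and half-open ranges `(lo, hi]` as in the question:

```python
import bisect


class RangeMap:
    """Maps a value to the item whose range (lo, hi] contains it.

    Assumes the ranges do not overlap; overlapping ranges would need a
    real interval tree instead.
    """

    def __init__(self, ranges):
        # ranges: iterable of (lo, hi, item) tuples
        self._ranges = sorted(ranges, key=lambda r: r[0])
        self._los = [r[0] for r in self._ranges]

    def get(self, x):
        # Index of the last range whose lower bound is strictly below x.
        i = bisect.bisect_left(self._los, x) - 1
        if i >= 0:
            lo, hi, item = self._ranges[i]
            if x <= hi:
                return item
        return None
```

For example, `RangeMap([(1, 1.23456, "a"), (1.23456, 2.0, "b")]).get(1.3)` finds the second range with a single O(log n) search and one comparison against its upper bound.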
How to construct/obtain a data structure with the following capabilities:
Stores (key,value) nodes, keys implement IComparable.
Fast (log N) insertion and retrieval.
Fast (log N) method to retrieve the next higher/next lower node from any node. [EXAMPLE: if the key values inserted are (7,cat), (4,dog), (12,ostrich), (13,goldfish), then if keyVal referred to (7,cat), keyVal.Next() should return a reference to (12,ostrich).]
A solution with an enumerator from an arbitrary key would of course also suffice. Note that standard SortedDictionary functionality will not suffice, since only an enumerator over the entire set can be returned, which makes finding keyVal.next require N operations at worst.
Could a self-implemented balanced binary search tree (red-black tree) be fitted with node.next() functionality? Any good references for doing this? Any solutions that take less coding time?
I once had similar requirements and was unable to find something suitable, so I implemented an AVL tree. Here is some advice for doing it with performance in mind:
Do not use recursion for walking the tree (insert, update, delete, next). Better use a stack array to store the way up to the root which is needed for balancing operations.
Do not store parent nodes. All operations will start from the root node and walk further down. Parents are not needed, if implemented carefully.
In order to find the Next() node of an existing one, Find() is usually called first. The stack produced by that call should then be reused for Next().
By following these rules, I was able to implement the AVL tree. It is working very efficiently even for very large data sets. I would be willing to share, but it would need some modifications, since it does not store values (very easy) and does not rely on IComparable but on fixed key types of int.
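To illustrate the stack idea without parent pointers, here is a small sketch in Python. It uses a plain, unbalanced binary search tree to stay short (the AVL rotations are omitted, so this is not the implementation described above), and the names `find_path` and `next_node` are invented for the example.

```python
class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None


def insert(root, key, value):
    """Iterative insert into an unbalanced BST; returns the root."""
    if root is None:
        return Node(key, value)
    node = root
    while True:
        if key == node.key:
            node.value = value
            return root
        side = "left" if key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(key, value))
            return root
        node = child


def find_path(root, key):
    """Return the stack of nodes from the root down to key, or None."""
    path, node = [], root
    while node is not None:
        path.append(node)
        if key == node.key:
            return path
        node = node.left if key < node.key else node.right
    return None


def next_node(path):
    """Advance the stack in place to the in-order successor.

    Returns the successor node, or None if there is none."""
    node = path[-1]
    if node.right is not None:
        # Successor is the leftmost node of the right subtree.
        node = node.right
        while node is not None:
            path.append(node)
            node = node.left
        return path[-1]
    # Otherwise climb until we leave a left subtree.
    child = path.pop()
    while path and path[-1].right is child:
        child = path.pop()
    return path[-1] if path else None
```

With the keys from the question inserted, `find_path(root, 7)` followed by `next_node(path)` yields the (12, ostrich) node, and further calls keep walking in key order. In a balanced tree the stack never grows beyond O(log N) entries.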
The OrderedDictionary in PowerCollections provides a "get iterator starting at or before key" function that takes O(log N) time to return the first value. That makes it very fast to, say, scan the 1,000 items that are near the middle of a 50 million item set (which with SortedDictionary would require guessing whether to start at the start or the end, both of which are equally bad choices and would require iterating around 25 million items). OrderedDictionary can do that with just 1,000 items iterated.
There is a problem in OrderedDictionary, though, in that it uses yield, which causes O(n^2) performance and out-of-memory conditions when iterating a 50 million item set in a 32-bit process. There is a quite simple fix for that, which I will document later.
I have a large dataset with possibly over a million entries. All items have an assigned time stamp and items are added to the set at runtime (usually, but not always, with a newer time stamp).
I need to show a sub set of this data given a certain time range. This time range is usually quite small compared to the total data set, i.e. of the 1.000.000+ items not more than about 1000 are in that given time range. This time range moves at a constant pace, e.g. every second the time range is moved by one second.
Additionally, the user may adjust the time range at any time ("move" through the data set) or set additional filters (e.g. filter by some text).
So far I wasn't worried about performance, trying to get the other things right, and only worked with smaller test sets. I am not quite sure how to tackle this problem efficiently and would be glad for every input. Thanks.
Edit: Used language is C# 4.
Update: I am now using an interval tree; the implementation can be found here:
https://github.com/mbuchetics/RangeTree
It also comes with an asynchronous version which rebuilds the tree using the Task Parallel Library (TPL).
We had a similar problem in our development: we had to collect several million items sorted by some key and then export one page from them on demand. I see that your problem is somewhat similar.
For the purpose, we adapted the red-black tree structure, in the following ways:
we added an iterator to it, so we could get the 'next' item in O(1)
we added finding the iterator from the 'index', and managed to do that in O(log n)
RB Tree has O(log n) insertion complexity, so I guess that your insertions will fit in there nicely.
next() on the iterator was implemented by adding and maintaining the linked list of all leaf nodes - our original adopted RB Tree implementation didn't include this.
RB Tree is also cool because it allows you to fine-tune the node size according to your needs. By experimenting you'll be able to figure out the right numbers that fit your problem.
Use SortedList sorted by timestamp.
All you have to do is implement a binary search on the sorted keys inside the sorted list to find the boundaries of your selection, which is pretty easy.
Insert new items into a sorted list. This would let you select a range pretty easily. You could potentially use LINQ as well if you're familiar with it.
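The binary-search idea can be sketched as follows, in Python for brevity rather than C#. `TimeIndex`, `add` and `window` are hypothetical names, and timestamps are assumed to be plain comparable numbers.

```python
import bisect


class TimeIndex:
    """Items kept sorted by timestamp; a window is found by binary search."""

    def __init__(self):
        self._times = []
        self._items = []

    def add(self, timestamp, item):
        # Appending the newest item is cheap; an out-of-order insert
        # shifts the tail of the list and costs O(n).
        i = bisect.bisect_right(self._times, timestamp)
        self._times.insert(i, timestamp)
        self._items.insert(i, item)

    def window(self, start, end):
        """All items with start <= timestamp <= end, in time order."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return self._items[lo:hi]
```

Finding the two boundaries costs O(log n); only the roughly 1000 items inside the window are touched afterwards. Extra filters (such as text) can then be applied to that small slice.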
I'm looking for the longest path through a map in a turn-based game. I get 1s of computation time and need to move at that point.
Right now I'm generating the tree again on every move.
Is it possible to use my old tree and stack (in which I store the nodes yet to be visited) to get a bigger depth and thus a better result?
For now my SearchClass is based on an interface, so changing the return type and the input variables of my function is a lot of work. Is there an easy solution to my problem?
If your map is static and not overly large, you could generate your tree in advance.
For each node, calculate the longest path to every other node on the map then store the path and its length against the original node. That way, you no longer need to compute paths during program execution; you only need to use the pre-computed longest path from your current node to your chosen destination.
Can you make your (player's) tree static? Or, if you are the only player, you could make it a global variable for the whole program on the player's side, but that depends on many things that you did not share with us. I would still suggest you have a look at MCTS: Wikipedia description and Here, it has sample code.
With MCTS the idea is simple: you compute for all of your 900ms, then make the player's move to the node that has the highest winning probability. If you can persist the tree as a global or static (or both :D) variable, the first thing you do at the beginning of the next turn (or the next computation) is get rid of all parts of the previous tree that you cannot reach any more, because you are no longer at position [0][0] but at, let's say, position [1][3]. That shrinks the tree for you, which is good. So what you have to do is replace the original tree with a new tree that starts at the node you are currently standing on. The good thing is that you already have some values precomputed; it then depends on your implementation how you want the nodes to be explored and/or their probabilities updated. As the game goes on, the program should have enough data to guarantee it a very high winning probability.
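The re-rooting step could look roughly like this in Python. `Node`, `advance_root` and the statistics fields are assumptions made up for the sketch, and the selection, expansion and simulation phases of MCTS are deliberately left out.

```python
class Node:
    """One search-tree node; children are keyed by the move leading to them."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}   # move -> Node
        self.visits = 0      # statistics gathered by earlier searches
        self.wins = 0.0


def advance_root(root, move):
    """Return the subtree reached by `move` as the new root.

    Everything else in the old tree becomes unreachable and can be
    garbage-collected. If the move was never explored, start fresh.
    """
    child = root.children.get(move)
    if child is None:
        return Node()
    child.parent = None  # cut the link so the old tree can be freed
    return child
```

Each turn would then start with something like `root = advance_root(root, move_played)` before spending the time budget on further search, so the statistics already collected below that node are kept.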
This approach is exceptionally good because it does not compute the probabilities of steps that you already know you are not going to take (you did not mention this in your approach and I find it a necessity, which is why I responded).
Excuse any mistakes; I'm going to specify/update/correct things upon request.
Also, everything you are doing seems to match the pattern of a university assignment (see for example); if that is not your case, you could well take inspiration from there. If you do have to hand this in for school, make sure you do not discuss it in too much detail and/or do not ask for technical implementation help.
Looking for a good approach to keep track of a Breadth-First traversal between two nodes, without knowing anything about the graph. Versus Depth-First (where you can throw away the path if it doesn't pan out) you may have quite a few "open" possibilities during the traversal.
The naive approach is to build a tree with the source node as the root and all its connections as its children. Depending on the amount of space you have, you might need to eliminate cycles as you go. You can do that with a bitmap where each bit corresponds to a distinct node in the graph. When you reach the target node, you can follow the parent links back to the root and that is your path. Since you are going breadth first, you are assured that it is a shortest path even if you don't eliminate cycles.
For a breadth-first search you need to store at least two things. One is the set of already visited nodes and the other is the set of nodes that are directly reachable from the visited nodes but are not visited themselves. Then you keep moving states from the latter set to the former, adding newly reachable states to the latter. If you need to have a path from the root to some node(s), then you will also need to store a parent node for each node (except the root) in the aforementioned sets.
Usually the union of the set of visited nodes and the set of not-visited child nodes (i.e. the set of seen nodes) is stored in a hash table. This is to be able to quickly determine whether or not a "new" state has been seen before and ignore it if this is the case. If you have a really big number of states you might indeed need a bit array (as mentioned by Joseph Bui (57509)), but unless your states can be used (directly or indirectly) as indices into that array, you will need to use a hash function to map states to indices. In the latter case you might completely ignore certain states because they are mapped to the same index as a different (and seen) node, so you might want to be careful with this. Also, to get a path you still need to store the parent information, which pretty much negates the use of the bit array.
The set of unvisited but seen nodes can be stored as a queue. (Bit arrays are of no use for this set because the array will be mostly empty and finding the next set bit is relatively expensive.)
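Put together, the seen set, the parent links and the queue fit into a few lines. This is a sketch in Python; `bfs_path` and the `neighbors` callback are names chosen for the example, and nodes are assumed to be hashable.

```python
from collections import deque


def bfs_path(neighbors, source, target):
    """Shortest path (fewest edges) from source to target, or None.

    neighbors(node) must return an iterable of adjacent nodes.
    """
    parent = {source: None}   # doubles as the "seen" hash table
    queue = deque([source])   # seen but not yet visited
    while queue:
        node = queue.popleft()
        if node == target:
            # Follow the parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors(node):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None
```

The graph itself never has to be known up front; it is only explored through `neighbors`, for instance `bfs_path(lambda n: graph.get(n, []), "a", "d")` for an adjacency dictionary `graph`.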
I just submitted a solution over here that also applies to this question.
Basically, I just keep a single list (a stack really) of visited nodes. Add a node to the list just before recursing or saving a solution. Always remove from the list directly after.
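A small sketch of that single-list bookkeeping, written in Python as a depth-first enumeration of all simple paths. The function name `all_paths` and the `neighbors` callback are assumptions for the example, not the code from the linked answer.

```python
def all_paths(neighbors, source, target):
    """Every simple path from source to target, found depth-first.

    A single list serves as the stack of nodes on the current path.
    """
    results = []
    path = [source]

    def visit(node):
        if node == target:
            results.append(list(path))  # save a copy of the current path
            return
        for nxt in neighbors(node):
            if nxt in path:             # already on the path: skip cycles
                continue
            path.append(nxt)            # add just before recursing
            visit(nxt)
            path.pop()                  # remove directly after

    visit(source)
    return results
```

The `nxt in path` check is a linear scan; for long paths a companion set would make it O(1).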
If you are using .NET 3.5, consider using HashSet to prevent duplicate nodes from being expanded; this happens when there are cycles in your graph. If you have any knowledge about the contents of the graph, consider implementing an A* search to reduce the number of nodes that are expanded. Good luck, and I hope it works out for you.
If you are still a fan of treeware there are many excellent books on the topic of graphs and graph search such as Artificial Intelligence: A Modern Approach by Peter Norvig and Stuart Russell.
The links in my response appear to have a bug. They are HashSet: http://msdn.com/en-us/library/bb359438.aspx and A* search: http://en.wikipedia.org/wiki/A*_search_algorithm