Implementing a tree from scratch

Implementing a tree from scratch - c#

I'm trying to learn about trees by implementing one from scratch.
In this case I'd like to do it in C# Java or C++. (without using built in methods)
So each node will store a character and there will be a maximum of 26 nodes per node.
What data structure would I use to contain the pointers to each of the nodes?
Basically I'm trying to implement a radix tree from scratch.
Thanks,

What data structure would I use to contain the pointers to each of the nodes?
A Node. Each Node should have references to (up to) 26 other Nodes in the Tree. Within the Node you can store them in an array, LinkedList, ArrayList, or just about any other collection you can think of.

What you describe isn't quite a radix tree... in a radix tree, you can have more than one character in a node, and there is no upper bound on the number of child nodes.
What you're describing sounds more limited by the alphabet... each node can be a-z, and can be followed by another letter, a-z, etc. The distinction is critical to the data structure you choose to hold your next-node pointers.
In the tree you describe, the easiest structure to use might be a simple array of pointers... all you need to do is convert the character (e.g. 'A') to its ascii value ('65'), and subtract the starting offset (65) to determine which 'next node' you want. Takes up more space, but very fast insert and traversal.
In a true radix tree, you could have 3, 4, 78, or 0 child nodes, and your 'next node' list will have the overhead of sorting, inserting, and deleting. Much slower.
I can't speak to Java, but if I were implementing a custom radix tree in C#, I'd use one of the built-in .NET collections. Writing your own sorted list isn't really helping you learn the tree concepts, and the built-in optimizations of the .NET collections are tough to beat. Then, your code is simple: Look up your next node; if exists, grab it and go; if not, add it to the next-node collection.
Which collection you use depends on what exactly you're implementing through the tree... every type of tree involves tradeoffs between insertion time, lookup time, etc. The choices you make depend on what is most important to the application, not the tree.
Make sense?

Here's one I found recently that's not a bad API for trees - although I needed graphs it was handy to see how it was set up to separate the data structure for the data it was holding, so you could have a tree-equivalent to Iterator to navigate through the tree, and so on.
https://jsfcompounds.dev.java.net/treeutils/site/apidocs/com/truchsess/util/package-summary.html

If you are actually more interested in speed than space, and if each node represents exactly one letter (implied by your max of 26) then I'd just use a simple array of 26 slots, each referencing a "Node" (the Node is the object containing your array).
The nice thing about a fixed-sized array is that your look up would be much quicker. If you were looking up char "c" that was already guaranteed to be a lower cased letter, the look up would be as easy as:
nextNode=nodes[c-'a'];
A recursive lookup of a string would be trivial.

Thanks for the quick replies.
Yes was snogfish said was correct.
Basically, its a tree with 26 nodes (A-Z) + a bool isTerminator.
Each each node has theses values and they are linked to each other.
I have not learned pointers in depth yet so my tries today to implement this from scratch using unsafe code in C# where futile.
Therefore, I'd be grateful if someone could provide me with the code to get started in C# using the internal tree class. Once I can get it started I can port the algorithms to the other languages and just change it to use pointers.
Thanks very much,
Michael

It doesn't really matter. You can use a linked list, an array (but this will have a fixed size), or a List type from the standard library of your language.
Using a List/array will mean doing some index book-keeping to traverse the tree, so it might be easiest to use just keep references to the children in the parent.

Check out this Simeon Pilgrim Blog, the "Code Camp Puzzle Reviewed". One of the solutions uses a Radix in C# and you can download the solution.

Related

Data structure similar to Dictionary, but with range?

Given a binary tree, each of it's nodes contains an item with range, for instance, one, particular node may contain a range of ( 1 to 1.23456 ]
If the query element is less than or greater than the described range, it inspects the respective child. For example, it is 1.3
As follows, we will be looking over the right branch, performing 2 "if" checks to see if it fits in the range of the element.
Even though balanced Binary Search Tree (BST) is an elegant way of traversing quickly through a dataset, the amount of "if" checks grows significantly if there are more and more children. It becomes even more of a problem, when it has to be done several million times per second.
Is there an elegant way of storing objects such that given an element with a value (1.3 for example), its value can be simply fed into something as Dictionary? This would quickly retrieve the proper element to whose range this value fits or null if it fits none.
However, dictionary doesn't check against ranges, instead, it expects a single value. Therefore, is there a data structure which can provide an item if supplied key fits within the item's range?
Here a person has similar problem, however he finds out that the memory is wasted. He is being advised to BST approach, but is it the only solution?
Sorry if there is an evident answer, I may missed it.

Are you asking about interval trees? Interval trees allow you get all the elements on the interval x..y within O(logn) time. For C# implementation I have used the libary called IntervalTreeLib and it worked nicely.
In computer science, an interval tree is an ordered tree data
structure to hold intervals. Specifically, it allows one to
efficiently find all intervals that overlap with any given interval or
point. It is often used for windowing queries, for instance, to find
all roads on a computerized map inside a rectangular viewport, or to
find all visible elements inside a three-dimensional scene. A similar
data structure is the segment tree.

What are some alternatives to recursive search algorithms?

I am looking at alternatives to a deep search algorithm that I've been working on. My code is a bit too long to post here, but I've written a simplified version that captures the important aspects. First, I've created an object that I'll call 'BranchNode' that holds a few values as well as an array of other 'BranchNode' objects.
class BranchNode : IComparable<BranchNode>
{
public BranchNode(int depth, int parentValue, Random rnd)
{
_nodeDelta = rnd.Next(-100, 100);
_depth = depth + 1;
leafValue = parentValue + _nodeDelta;
if (depth < 10)
{
int children = rnd.Next(1, 10);
branchNodes = new BranchNode[children];
for (int i = 0; i < children; i++)
{
branchNodes[i] = new BranchNode(_depth, leafValue, rnd);
}
}
}
public int CompareTo(BranchNode other)
{
return other.leafValue.CompareTo(this.leafValue);
}
private int _nodeDelta;
public BranchNode[] branchNodes;
private int _depth;
public int leafValue;
}
In my actual program, I'm getting my data from elsewhere... but for this example, I'm just passing an instance of a Random object down the line that I'm using to generate values for each BranchNode... I'm also manually creating a depth of 10, whereas my actual data will have any number of generations.
As a quick explanation of my goals, _nodeDelta contains a value that is assigned to each BranchNode. Each instance also maintains a leafValue that is equal to current BranchNode's _nodeDelta summed with the _nodeDeltas of all of it's ancestors. I am trying to find the largest leafValue of a BranchNode with no children.
Currently, I am recursively transversing the heirarchy searching for BranchNodes whose child BranchNodes array is null (a.k.a: a 'childless' BranchNode), then comparing it's leafValue to that of the current highest leafValue. If it's larger, it becomes the benchmark and the search continues until it's looked at all BranchNodes.
I can post my recursive search algorithm if it'd help, but it's pretty standard, and is working fine. My issue is, as expected, that for larger heirarchies, my algorithm takes a long while to transverse the entier structure.
I was wondering if I had any other options that I could look into that may yield faster results... specificaly, I've been trying to wrap my head around linq, but I'm not even sure that it is built to do what I'm looking for, or if it'd be any faster. Are there other things that I should be looking into as well?

Maybe you want to look into an alternative data index structure: Here
It always depends on the work you are doing with the data, but if you assign a unique ID on each element that stores the hierarchical form, and creating an index of what you store, your optimization will make much more sense than micro-optimizing parts of what you do.
Also, this also lends itself a very different paradigm in search algorithms, that uses no recursion, but in the cost of additional memory for the IDs and possibly the index.

If you must visit all leaf nodes, you cannot speed up the search: it is going to go through all nodes no matter what. A typical trick played to speed up a search on trees is organizing them in some special way that simplifies the search of the tree. For example, by building a binary search tree, you make your search O(Log(N)). You could also store some helpful values in the non-leaf nodes from which you could later construct the answer to your search query.
For example, you could decide to store the _bestLeaf "pointing" to the leaf with the highest _nodeDelta of all leaves under the current subtree. If you do that, your search would become an O(1) lookup. Your inserts and removals would become more expensive, however, because you would need to update up to Log-b(N) items on the way back to root with the new _bestLeaf (b is the branching factor of your tree).

I think the first thing you should think about is maybe going away from the N-Tree and going to as Binary Search tree.
This means that all nodes have only 2 children, a greater child, and a lesser child.
From there, I would say look into balancing your search tree with something like a Red-Black tree or AVL. That way, searching your tree is O(log n).
Here are some links to get you started:
http://en.wikipedia.org/wiki/Binary_search_tree
http://en.wikipedia.org/wiki/AVL_tree
http://en.wikipedia.org/wiki/Red-black_tree
Now, if you are dead set on having each node able to have N child nodes, here are some things you should thing about:
Think about ordering your child nodes so that you can quickly determine which has the highest leaf number. that way, when you enter a new node, you can check one child node and quickly determine if it is worth recursively checking it's children.
Think about ways that you can quickly eliminate as many nodes as you possibly can from the search or break the recursive calls as early as you can. With the binary search tree, you can easily find the largest leaf node by always only looking at the greater child. this could eliminate N-log(n) children if the tree is balanced.
Think about inserting and deleting nodes. If you spend more time here, you could save a lot more time later

As others mention, a different data structure might be what you want.
If you need to keep the data structure the same, the recursion can be unwound into loops. While this approach will probably be a little bit faster, it's not going to be orders of magnitude faster, but might take up less memory.

C# Datastructure with SortedDictionary() and node.Next() functionality?

How to construct/obtain a datastructure with the following capabilities:
Stores (key,value) nodes, keys implement IComparable.
Fast (log N) insertion and retrieval.
Fast (log N) method to retrieve the next higher/next lower node from any node. [EXAMPLE: if
the key values inserted are (7,cat), (4,dog),(12,ostrich), (13,goldfish) then if keyVal referred to (7,cat), keyVal.Next() should return a reference to (12,ostrich) ].
A solution with an enumerator from an arbitrary key would of course also suffice. Note that standard SortedDictionary functionality will not suffice, since only an enumerator over the entire set can be returned, which makes finding keyVal.next require N operations at worst.
Could a self-implemented balanced binary search tree (red-black tree) be fitted with node.next() functionality? Any good references for doing this? Any less coding-time consuming solutions?

I once had similar requirements and was unable to find something suitable. So I implemented an AVL tree. Here come some advices to do it with performance in mind:
Do not use recursion for walking the tree (insert, update, delete, next). Better use a stack array to store the way up to the root which is needed for balancing operations.
Do not store parent nodes. All operations will start from the root node and walk further down. Parents are not needed, if implemented carefully.
In order to find the Next() node of an existing one, usually Find() is first called. The stack produced by that, should be reused for Next() than.
By following these rules, I was able to implement the AVL tree. It is working very efficiently even for very large data sets. I would be willing to share, but it would need some modifications, since it does not store values (very easy) and does not rely on IComparable but on fixed key types of int.

The OrderedDictionary in PowerCollections provides a "get iterator starting at or before key" function that takes O(log N) time to return the first value. That makes it very fast to, say, scan the 1,000 items that are near the middle of a 50 million item set (which with SortedDictionary would require guessing to start at the start or the end, both of which are equally bad choices and would require iterator around 25 million items). OrderedDictionary can to that with just 1,000 items iterated.
There is a problem in OrderedDictionary though in that it uses yield which causes O(n^2) performance and out of memory conditions when iterating a 50 million item set in a 32 bit process. There is a quite simple fix for that while I will document later.

Filtering a sub set of (potentially) 1.000.000+ items

I have a large dataset with possibly over a million entries. All items have an assigned time stamp and items are added to the set at runtime (usually, but not always, with a newer time stamp).
I need to show a sub set of this data given a certain time range. This time range is usually quite small compared to the total data set, i.e. of the 1.000.000+ items not more than about 1000 are in that given time range. This time range moves at a constant pace, e.g. every second the time range is moved by one second.
Additionally, the user may adjust the time range at any time ("move" through the data set) or set additional filters (e.g. filter by some text).
So far I wasn't worried about performance, trying to get the other things right, and only worked with smaller test sets. I am not quite sure how to tackle this problem efficiently and would be glad for every input. Thanks.
Edit: Used language is C# 4.
Update: I am now using a interval tree, implementation can be found here:
https://github.com/mbuchetics/RangeTree
It also comes with an asynchronous version which rebuilds the tree using the Task Parallel Library (TPL).

We had similar problem in our development - had to collect several million items sorted by some key and then export one page on demand from it. I see that your problem is somehow similar.
For the purpose, we adapted the red-black tree structure, in the following ways:
we added the iterator to it, so we could get 'next' item in o(1)
we added finding the iterator from the 'index', and managed to do that in O(log n)
RB Tree has O(log n) insertion complexity, so I guess that your insertions will fit in there nicely.
next() on the iterator was implemented by adding and maintaining the linked list of all leaf nodes - our original adopted RB Tree implementation didn't include this.
RB Tree is also cool because it allows you to fine-tune the node size according to your needs. By experimenting you'll be able to figure right numbers that just fit your problem.

Use SortedList sorted by timestamp.
All you have to is to have a implement a binary search on the sorted keys inside the sorted list to find the boundary of your selection which is pretty easy.

Insert new items into a sorted list. This would let you select a range pretty easily. You could potentially use linq as well if you're familiar with it.

C# graph traversal - tracking path between any two nodes

Looking for a good approach to keep track of a Breadth-First traversal between two nodes, without knowing anything about the graph. Versus Depth-First (where you can throw away the path if it doesn't pan out) you may have quite a few "open" possibilities during the traversal.

The naive approach is to build a tree with the source node as the root and all its connections as its children. Depending on the amount of space you have, you might need to eliminate cycles as you go. You can do that with a bitmap where each bit corresponds to a distinct node in the graph. When you reach the target node, you can follow the parent links back to the root and that is your path. Since you are going breadth first, you are assured that it is a shortest path even if you don't eliminate cycles.

For a breadth-first search you need to store at least two things. One is the set of already visited nodes and the other is the set of nodes that are directly reachable from the visited nodes but are not visited themselves. Then you keep moving states from the latter set to the former, adding newly reachable states to the latter. If you need the have a path from the root to some node(s), then you will also need to store a parent node for each node (except the root) in the aforementioned sets.
Usually the union of the set of visited nodes and the set of not-visited child nodes (i.e. the set of seen nodes) is stored in a hash table. This is to be able to quickly determine whether or not a "new" state has been seen before and ignore it if this is the case. If you have really big number of states you might indeed need a bit array (as mentioned by Joseph Bui (57509), but unless your states can be used (directly or indirectly) as indices to that array, you will need to use a hash function to map states to indices. In the latter case you might completely ignore certain states because they are mapped to the same index as a different (and seen) node, so you might want to be careful with this. Also, to get a path you still need to store the parent information which pretty much negates the use of the bit-array.
The set of unvisited but seen nodes can be stored as a queue. (Bit arrays are of no use for this set because the array will be mostly empty and finding the next set bit is relatively expensive.)

I just submitted a solution over here that also applies to this question.
Basically, I just keep a single list (a stack really) of visited nodes. Add a node to the list just before recursing or saving a solution. Always remove from the list directly after.

If you are using .NET 3.5 consider using the Hashset to prevent duplicate nodes from being expanded, this happens when there is cycles in your graph. If you have any knowledge about the contents of the graph consider implementing an A* search to reduce the number of nodes that are expanded. Good luck and I hope it works out for you.
If you are still a fan of treeware there are many excellent books on the topic of graphs and graph search such as Artificial Intelligence: A Modern Approach by Peter Norvig and Stuart Russell.
The links in my response appear to have a bug they are Hashset: http://msdn.com/en-us/library/bb359438.aspx and A* search: http://en.wikipedia.org/wiki/A*_search_algorithm

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.