Duplicate Entries in Binary Search Tree


I have a very simple question regarding BSTs. I have seen multiple definitions of BSTs regarding duplicate entries. Some define BSTs as not allowing duplicate entries; others say that a node's left child is <= the node's value and the right child is greater than the node's value; and some definitions are the opposite of that (left child is < the node, right child is >=).
So my question is: what is the official definition (if one exists) for BSTs regarding duplicate entries? For example, what would a BST look like after inserting the values 3, 5, 10, 8, 5, 10?
Thank you in advance for clarifying the definition and answering my question!

One of the well-known books in the algorithms and data structures area is the CLRS book, also known as the bible of data structures and algorithms.
According to the definition in this book, duplicate entries are placed in the right subtree of the node that contains the same key. The book's BST insertion algorithm sends a new key to the left only when it is strictly smaller than the current node's key, so an equal key always goes to the right.
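A minimal C# sketch of that insertion rule, written from the description above rather than copied from the book. The class and member names (`Node`, `Bst`, `Insert`, `InOrder`) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

class Node
{
    public int Key;
    public Node Left, Right;
    public Node(int key) { Key = key; }
}

class Bst
{
    public Node Root;

    // Iterative insert: descend left only when the new key is strictly
    // smaller, so a key equal to an existing one lands in the right subtree.
    public void Insert(int key)
    {
        Node parent = null;
        Node current = Root;
        while (current != null)
        {
            parent = current;
            current = key < current.Key ? current.Left : current.Right;
        }

        var node = new Node(key);
        if (parent == null) Root = node;              // tree was empty
        else if (key < parent.Key) parent.Left = node;
        else parent.Right = node;                     // equal keys go right
    }

    // In-order traversal; duplicates show up next to each other.
    public List<int> InOrder()
    {
        var result = new List<int>();
        Walk(Root, result);
        return result;
    }

    private static void Walk(Node n, List<int> acc)
    {
        if (n == null) return;
        Walk(n.Left, acc);
        acc.Add(n.Key);
        Walk(n.Right, acc);
    }
}
```

With this rule, inserting 3, 5, 10, 8, 5, 10 in that order gives: 3 at the root, 5 as the right child of 3, 10 as the right child of 5, 8 as the left child of 10, the second 5 as the left child of 8, and the second 10 as the right child of the first 10.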

The important point is that not having duplicates in the tree ensures fast lookup times.
If you have duplicates on one side of a node, your search time will suffer because you have to go through all the duplicates before you can continue.
http://en.wikipedia.org/wiki/Binary_search_tree

Related

Algorithm for sorting questions that have been re-arranged, knowing new position and old position

Say I have a bunch of question fields, and users can rearrange them and send to the server the new position and old position of the question(s) that have had their position altered. The server will then use this data to update the position/number value of all the affected questions in the database. As stated, I am only sending back the new position and old position of the question(s) that have been moved, not of those affected before or after them, so that calculation will have to be done on the server.
So let's say I have questions numbered:
1 2 3 4 5 6 7 8
If I rearrange them as such
8 1 2 3 4 5 6 7
Then all values between the new position of question 8 (which is now 1) and its old position (8) will need to be updated, with 1 - 7 getting a +1 and 8 being changed to 1. This is of course a simplified scenario; the questions can be rearranged in much more complex ways, where affected ranges overlap, ranges terminate within other ranges, and so on, so the required increments and decrements will be combined. This sounds like a fairly common scenario, so it would be great if someone could put up an example, preferably in C#.

Something along the lines of the answers above: Can you just add a 'position' property to the question object (independent of its position in the list)? The user changes this property when they rearrange. Then you can just do a ListOfQuestions.OrderBy(q => q.Position)?
This just

This looks to me like you're simply trying to do random insertions into and removals from a collection.
If that is the case, a linked list is a good option. A linked list essentially has a head node (the first node), and each node has a next attribute, which references the next node in the list.
That way, if you want to remove a node, you can just make that node's predecessor point to that node's successor, and if you want to insert a node, you can do the reverse.
C# has a built-in linked list class, LinkedList<T>, that you might want to check out.

An alternative would be to:
1. Create your fixed size list/collection (Collection A)
2. Put the questions with modified positions on the new positions/index
3. Iterate through the original question collection and place the unmoved questions in the empty positions of the new collection (Collection A). Update question indexes based on their positions in the new collection.
Crude I know.
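A rough C# sketch of those three steps. The names `Reorder.Apply` and `moves`, and the use of plain integer ids, are assumptions for illustration. It also assumes each moved question is reported with a distinct, in-range, zero-based new index.

```csharp
using System;
using System.Collections.Generic;

static class Reorder
{
    // original: question ids in their current order.
    // moves: hypothetical input mapping a moved question id to its
    //        new zero-based index, as reported by the client.
    public static List<int> Apply(IReadOnlyList<int> original,
                                  IReadOnlyDictionary<int, int> moves)
    {
        // Step 1: fixed-size target collection ("Collection A").
        var result = new int?[original.Count];

        // Step 2: put the moved questions at their new positions.
        foreach (var move in moves)
            result[move.Value] = move.Key;

        // Step 3: fill the remaining slots with the unmoved questions,
        // keeping their original relative order.
        int slot = 0;
        foreach (int id in original)
        {
            if (moves.ContainsKey(id)) continue;
            while (result[slot].HasValue) slot++;
            result[slot++] = id;
        }

        var ordered = new List<int>(original.Count);
        foreach (var id in result) ordered.Add(id.Value);
        return ordered;
    }
}
```

Each question's new position/number is then its index in the returned list plus one. For the example in the question (moving question 8 to the first position), the result is 8 1 2 3 4 5 6 7.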

Can a "pre-computed" map-reduce index (à la RavenDB/CouchDB) be used for this kind of algorithm?

I'm trying to see if a specific algorithm can be translated to the kind of map-reduce index RavenDB/CouchDB uses, ie, "pre-computed" map-reduce (which means the indexes are refreshed on insertion and updates, not when performing the actual query).
Let's say we have a typical online store with 50,000 products, grouped in categories. Every product has a collection of "Attribute Values", ie, something like "[Red, Round, Metal]".
Since we have so many products on our website, and there are probably a lot of items in each of the categories, we want to give the user another way to "filter" the products he's currently seeing.
For example, if a category is "Less than $20", there's a whole bunch of products in this category. But our user only needs to see products which are less than $20 and Red. Unfortunately, there's no sub-category "Red" in the "Less than $20" category.
Our algorithm would take the current list of products, and generate a list of "interesting" Attributes and Attribute Values, ie, given a list of products, it would output something like:
Color
Red (40)
Blue (32)
Yellow (17)
Material
Metal (37)
Plastic (36)
Wood (23)
Shape
Square (56)
Round (17)
Cylinder (12)
Could this sort of algorithm be somehow pre-computed à la RavenDB/CouchDB map-reduce index? If not, why exactly (so I can identify that kind of algorithm in the future) and if yes, how?
A C# 4.0 Visual Studio Test Solution is available that demonstrates the potential data structures and sample data, as well as a try at a map-reduce implementation (which doesn't seem to be pre-computable).

General case: It's always possible to use a CouchDB-style map-reduce view, but it's not necessarily practical.
In the end, it's mostly a counting-based argument: if you need to ask the question for any subset of your 500,000 products, then your database must be able to provide a distinct answer to each of 2^500,000 different possible questions, which uses a prohibitive amount of memory if you have to emit a B-tree leaf for every one of them (and you need to emit data unless the answer to most of these queries is zero, false, an empty set or a similar null value).
CouchDB provides a first small optimization through the existence of range queries (meaning that in an ideal case, it can use as little as N B-tree leaves to answer N^2 questions). However, in your example, this would only reduce the number of leaves down to 2^250,000 (and that's a theoretical lower bound).
CouchDB provides a second small optimization through key prefix queries, meaning that you can compress [A], [A,B] and [A,B,C] queries into a single [A,B,C] key. So, instead of your 2^250,000 possibilities, you're down to a "mere" 2^249,999 ...
So, while you could think up an emitting strategy for answering the question for any subset, it would take more storage space than is actually available on our planet. In the general case, to answer N different questions you need to emit at least sqrt(N/2) B-tree leaves, so count your questions and determine if that lower bound on the number of leaves is acceptable.
Only for categories and subcategories: if you give up on arbitrary lists of products and only ask questions of the form "give me the significant attributes in category A filtered by attributes B and C", then your number of emits drops to:
AvgCategories * AvgAttr * 2 ^ (AvgAttr - 1) * 500,000
You're basically emitting for each product the keys [Category,Attr,Attr,...] for all categories of the product and all combinations of attributes of the product, which lets you query by category + attributes. If you have on average 1 category and 3 attributes per product, this works out to about 6 million entries, which is fairly acceptable.
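Here is one way to read that formula as code, as a sketch only. For each category of a product, pick one attribute to be counted and combine it with every subset of the product's other attributes as filters. The key layout ([Category, filters..., counted attribute]) and the `FacetKeys.Emit` name are assumptions, not something CouchDB or RavenDB prescribes.

```csharp
using System;
using System.Collections.Generic;

static class FacetKeys
{
    // Emits one key per (category, counted attribute, subset of the other
    // attributes used as filters). Assumes fewer than 31 attributes per product.
    public static IEnumerable<string[]> Emit(string[] categories, string[] attributes)
    {
        int n = attributes.Length;
        foreach (string category in categories)
            for (int counted = 0; counted < n; counted++)
                for (int mask = 0; mask < (1 << n); mask++)
                {
                    // The counted attribute is never one of its own filters.
                    if ((mask & (1 << counted)) != 0) continue;

                    var key = new List<string> { category };
                    for (int i = 0; i < n; i++)
                        if ((mask & (1 << i)) != 0) key.Add(attributes[i]);
                    key.Add(attributes[counted]);
                    yield return key.ToArray();
                }
    }
}
```

A product with 1 category and 3 attributes yields 1 * 3 * 2^(3 - 1) = 12 keys, and 12 * 500,000 products gives the 6 million entries mentioned above.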

This should be quite straightforward to implement in something like CouchDB. Have the map phase of your index output one key, value pair for each attribute the object has, with the value simply being '1'. Then, have the reduce phase sum up all input values and output the sum. The end result will be an index of the form you describe.
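To make the shape of that map and reduce concrete, here is an in-memory imitation using LINQ. It is not CouchDB or RavenDB API code; `AttributeCounts.Count` and the representation of a product as a plain array of attribute values are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AttributeCounts
{
    // products: each product is represented only by its attribute values.
    public static Dictionary<string, int> Count(IEnumerable<string[]> products)
    {
        return products
            // "map": one (attribute value, 1) pair per attribute of each product
            .SelectMany(attrs => attrs.Select(a => new KeyValuePair<string, int>(a, 1)))
            // "reduce": sum the 1s per attribute value
            .GroupBy(pair => pair.Key)
            .ToDictionary(group => group.Key, group => group.Sum(pair => pair.Value));
    }
}
```

Note that this counts over whatever list of products is passed in. Restricting the counts to a category or to already-selected attributes would need those to be part of the emitted key.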

Implementing a tree from scratch

I'm trying to learn about trees by implementing one from scratch.
In this case I'd like to do it in C#, Java, or C++ (without using built-in methods).
So each node will store a character, and there will be a maximum of 26 child nodes per node.
What data structure would I use to contain the pointers to each of the nodes?
Basically I'm trying to implement a radix tree from scratch.
Thanks,

What data structure would I use to contain the pointers to each of the nodes?
A Node. Each Node should have references to (up to) 26 other Nodes in the Tree. Within the Node you can store them in an array, LinkedList, ArrayList, or just about any other collection you can think of.

What you describe isn't quite a radix tree... in a radix tree, you can have more than one character in a node, and there is no upper bound on the number of child nodes.
What you're describing sounds more limited by the alphabet... each node can be a-z, and can be followed by another letter, a-z, etc. The distinction is critical to the data structure you choose to hold your next-node pointers.
In the tree you describe, the easiest structure to use might be a simple array of pointers... all you need to do is convert the character (e.g. 'A') to its ASCII value (65), and subtract the starting offset (65) to determine which 'next node' you want. Takes up more space, but very fast insert and traversal.
In a true radix tree, you could have 3, 4, 78, or 0 child nodes, and your 'next node' list will have the overhead of sorting, inserting, and deleting. Much slower.
I can't speak to Java, but if I were implementing a custom radix tree in C#, I'd use one of the built-in .NET collections. Writing your own sorted list isn't really helping you learn the tree concepts, and the built-in optimizations of the .NET collections are tough to beat. Then, your code is simple: Look up your next node; if exists, grab it and go; if not, add it to the next-node collection.
Which collection you use depends on what exactly you're implementing through the tree... every type of tree involves tradeoffs between insertion time, lookup time, etc. The choices you make depend on what is most important to the application, not the tree.
Make sense?

Here's one I found recently that's not a bad API for trees - although I needed graphs it was handy to see how it was set up to separate the data structure for the data it was holding, so you could have a tree-equivalent to Iterator to navigate through the tree, and so on.
https://jsfcompounds.dev.java.net/treeutils/site/apidocs/com/truchsess/util/package-summary.html

If you are actually more interested in speed than space, and if each node represents exactly one letter (implied by your max of 26) then I'd just use a simple array of 26 slots, each referencing a "Node" (the Node is the object containing your array).
The nice thing about a fixed-sized array is that your look up would be much quicker. If you were looking up char "c" that was already guaranteed to be a lower cased letter, the look up would be as easy as:
nextNode=nodes[c-'a'];
A recursive lookup of a string would be trivial.
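A small C# sketch of that layout, assuming words contain only the lowercase letters 'a' to 'z'. The names `TrieNode`, `Trie`, `Insert` and `Contains` are made up for illustration, and the lookup is written as a loop rather than recursively.

```csharp
using System;

class TrieNode
{
    // One slot per letter; a null slot means "no child for that letter".
    public readonly TrieNode[] Children = new TrieNode[26];
    // True when the path from the root to this node spells a stored word.
    public bool IsTerminator;
}

class Trie
{
    private readonly TrieNode root = new TrieNode();

    public void Insert(string word)
    {
        TrieNode node = root;
        foreach (char c in word)
        {
            int i = c - 'a';                 // 'a' -> 0, 'b' -> 1, ...
            if (node.Children[i] == null)
                node.Children[i] = new TrieNode();
            node = node.Children[i];
        }
        node.IsTerminator = true;
    }

    public bool Contains(string word)
    {
        TrieNode node = root;
        foreach (char c in word)
        {
            node = node.Children[c - 'a'];
            if (node == null) return false;  // path breaks off early
        }
        return node.IsTerminator;
    }
}
```

No pointers or unsafe code are needed in C#, since a field of a class type is already a reference. Input outside 'a' to 'z' would index outside the array, so real code would need to validate or normalize the characters first.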

Thanks for the quick replies.
Yes, what snogfish said was correct.
Basically, it's a tree with 26 nodes (A-Z) + a bool isTerminator.
Each node has these values and they are linked to each other.
I have not learned pointers in depth yet, so my attempts today to implement this from scratch using unsafe code in C# were futile.
Therefore, I'd be grateful if someone could provide me with the code to get started in C# using the internal tree class. Once I can get it started I can port the algorithms to the other languages and just change it to use pointers.
Thanks very much,
Michael

It doesn't really matter. You can use a linked list, an array (but this will have a fixed size), or a List type from the standard library of your language.
Using a List/array will mean doing some index book-keeping to traverse the tree, so it might be easiest to just keep references to the children in the parent.

Check out this Simeon Pilgrim Blog, the "Code Camp Puzzle Reviewed". One of the solutions uses a Radix in C# and you can download the solution.

Rotating/moving objects

I have a situation that I am not quite sure where to start looking. I have been searching for the past four hours and I couldn't find anything that does what I am looking to do.
I have eight objects controlling individual lights. When an event occurs I generate an ID of it, store that value in the first available object, and start a method. I also store the ID in a list and match the object number to the index number of that list. What I would like to do is have those eight objects update and rotate depending if the matching item is removed from the list.
Example: There are five out of the eight objects active and I remove an item from the list indexed at 0. Object 0 is stopped then object 1 is moved to object 0 then 2 to 1, 3 to 2, etc.
So my question is what terms should I look up to help me accomplish that goal? I am relatively new to c# and with the results of my research today I just want to know what is the right question to ask.
If what I am looking to do is impossible just say so and I will come up with a more simple program on my end. Or if you have a solution to that situation I am all ears.

I think you're just describing a [stack data structure](https://en.wikipedia.org/wiki/Stack_(abstract_data_type)). Check it out and see if it's what you're looking for.
A stack doesn't "rotate" objects, but when you remove the top item, the "index" of all the other items will decrement as you described. In your example, it seems like the list would eventually become empty (after 8 events) which is consistent with a stack - but it's not clear why one would call this rotation.
If a stack is what you're looking for, the BCL defines a generic Stack<T> that can do this. You add items to a stack with the Push() operation, which has the side effect of incrementing the index of all other items in the structure. The dual Pop operation removes the top item from the collection and decrements the index of the other items, as you described.

C# graph traversal - tracking path between any two nodes

Looking for a good approach to keep track of a Breadth-First traversal between two nodes, without knowing anything about the graph. Versus Depth-First (where you can throw away the path if it doesn't pan out) you may have quite a few "open" possibilities during the traversal.

The naive approach is to build a tree with the source node as the root and all its connections as its children. Depending on the amount of space you have, you might need to eliminate cycles as you go. You can do that with a bitmap where each bit corresponds to a distinct node in the graph. When you reach the target node, you can follow the parent links back to the root and that is your path. Since you are going breadth first, you are assured that it is a shortest path even if you don't eliminate cycles.

For a breadth-first search you need to store at least two things. One is the set of already visited nodes and the other is the set of nodes that are directly reachable from the visited nodes but are not visited themselves. Then you keep moving states from the latter set to the former, adding newly reachable states to the latter. If you need to have a path from the root to some node(s), then you will also need to store a parent node for each node (except the root) in the aforementioned sets.
Usually the union of the set of visited nodes and the set of not-visited child nodes (i.e. the set of seen nodes) is stored in a hash table. This is to be able to quickly determine whether or not a "new" state has been seen before and ignore it if this is the case. If you have a really big number of states you might indeed need a bit array (as mentioned by Joseph Bui), but unless your states can be used (directly or indirectly) as indices to that array, you will need to use a hash function to map states to indices. In the latter case you might completely ignore certain states because they are mapped to the same index as a different (and seen) node, so you might want to be careful with this. Also, to get a path you still need to store the parent information, which pretty much negates the use of the bit array.
The set of unvisited but seen nodes can be stored as a queue. (Bit arrays are of no use for this set because the array will be mostly empty and finding the next set bit is relatively expensive.)
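As a sketch of the bookkeeping described above, in C#: the parent dictionary doubles as the hash table of seen nodes, and a queue holds the seen-but-unvisited ones. The names `Bfs.ShortestPath` and `neighbors` are made up for illustration, and the graph is assumed to be reachable through a caller-supplied function that lists a node's neighbours.

```csharp
using System;
using System.Collections.Generic;

static class Bfs
{
    // Returns the nodes on a shortest path from start to goal (inclusive),
    // or null if goal cannot be reached.
    public static List<T> ShortestPath<T>(T start, T goal,
                                          Func<T, IEnumerable<T>> neighbors)
    {
        var comparer = EqualityComparer<T>.Default;
        var parent = new Dictionary<T, T>();   // seen node -> node it was reached from
        var frontier = new Queue<T>();         // seen but not yet visited

        parent[start] = start;                 // the root is marked as its own parent
        frontier.Enqueue(start);

        while (frontier.Count > 0)
        {
            T node = frontier.Dequeue();
            if (comparer.Equals(node, goal))
            {
                // Follow the parent links back to the root, then reverse.
                var path = new List<T>();
                for (T n = goal; ; n = parent[n])
                {
                    path.Add(n);
                    if (comparer.Equals(n, start)) break;
                }
                path.Reverse();
                return path;
            }

            foreach (T next in neighbors(node))
            {
                if (parent.ContainsKey(next)) continue;  // already seen
                parent[next] = node;
                frontier.Enqueue(next);
            }
        }
        return null;
    }
}
```

Because a node's parent is recorded the first time it is seen, cycles are handled automatically and the recovered path has the fewest possible edges.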

I just submitted a solution over here that also applies to this question.
Basically, I just keep a single list (a stack really) of visited nodes. Add a node to the list just before recursing or saving a solution. Always remove from the list directly after.

If you are using .NET 3.5, consider using a HashSet to prevent duplicate nodes from being expanded; this happens when there are cycles in your graph. If you have any knowledge about the contents of the graph, consider implementing an A* search to reduce the number of nodes that are expanded. Good luck and I hope it works out for you.
If you are still a fan of treeware, there are many excellent books on the topic of graphs and graph search, such as Artificial Intelligence: A Modern Approach by Peter Norvig and Stuart Russell.
The links in my response appear to have a bug. They are HashSet: http://msdn.com/en-us/library/bb359438.aspx and A* search: http://en.wikipedia.org/wiki/A*_search_algorithm
