Sum of nodes between two BST leaves - c#

I am NOT looking for Maximum Sum Path of a tree. I can create and find the sum of the binary search tree, but I need to find the sum of all the nodes between two leaves.
For example, for the BST built with these nodes : 5, 10, 13, 8, 3, 4, 5, the tree looks like this :
        5
      /   \
    3       10
     \     /  \
      4   8    13
         /
        5
So if the input is 4 and 13, then the sum of nodes is 3+5+10=18.
The function could be something like this: Sum(tree, firstNumber, secondNumber);
I am thinking of creating two ordered lists of all the nodes up to the root node, seeing whether any of them are common, and then just adding up the values of all the unique nodes. It's terrible in terms of time complexity and memory management, so I am looking to see if there's an easier way.
Edit: M. Hazara is right, 5 is not a leaf, so I removed that example.

Well, probably the first idea that comes to mind is to build a list of the nodes traveled for each leaf. However, since it's a Binary Search Tree (BST), the condition for which path to take is known, so it is possible to track which nodes are on a common path and which aren't.
First of all, a BST search always starts from the root node, so both searches share a path from the root until they split. Only the last shared node (the lowest common ancestor of the two leaves) lies between them; in your example that node is the root, but any shared nodes above it are not part of the sum. Secondly, it's essential to do a BST search for each leaf (firstNumber, secondNumber) to guarantee that we visit every node on each path. Finally, all that's left is to determine where the two paths split, so we don't add the same value twice.
A node is in the common path if both BST searches take the same path. So, you will have to run both simultaneously inside your function. We can find the last common node by using a do-while and add it to the sum once the paths split:
    // Assumes each node exposes its value as .Value
    firstCurrentNode = tree.Root;
    secondCurrentNode = tree.Root;
    do {
        // Remember the most recent node both searches passed through
        lastCommonNode = firstCurrentNode;
        // FirstNumber path check:
        if (firstNumber < firstCurrentNode.Value) firstCurrentNode = firstCurrentNode.goLeftNode();
        else firstCurrentNode = firstCurrentNode.goRightNode();
        // SecondNumber path check:
        if (secondNumber < secondCurrentNode.Value) secondCurrentNode = secondCurrentNode.goLeftNode();
        else secondCurrentNode = secondCurrentNode.goRightNode();
    } while (firstCurrentNode == secondCurrentNode);
    // Only the last common node lies between the two leaves
    sum += lastCommonNode.Value;
Once they are not equal, i.e., they take different paths, the properties of a tree guarantee that they will never share a path again. Therefore, we only need to continue the BST searches individually, adding every node we find to the sum. However, we might have reached the leaves already, so it's necessary to check that first:
    // Search for firstNumber (assumes each node exposes its value as .Value)
    while (firstCurrentNode.Value != firstNumber) {
        sum += firstCurrentNode.Value;
        if (firstNumber < firstCurrentNode.Value) firstCurrentNode = firstCurrentNode.goLeftNode();
        else firstCurrentNode = firstCurrentNode.goRightNode();
    }
    // Search for secondNumber
    while (secondCurrentNode.Value != secondNumber) {
        sum += secondCurrentNode.Value;
        if (secondNumber < secondCurrentNode.Value) secondCurrentNode = secondCurrentNode.goLeftNode();
        else secondCurrentNode = secondCurrentNode.goRightNode();
    }
Of course, I assumed that both leaves exist in the tree and that all nodes contain unique values; you can make adjustments for that. Still, this is one way to get the sum without using extra memory or hurting performance.
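For reference, here is a minimal self-contained sketch of the two-search idea in C#. The `BstNode` and `LeafPathSum` types are hypothetical, since the question does not show its own tree classes. It assumes both values are distinct leaves that exist in the tree and that node values are unique. Of the shared part of the two paths it adds only the last common node (the lowest common ancestor), because shared nodes above that point do not lie between the leaves.

```csharp
using System;

// Hypothetical minimal node type; the question's own tree classes are not shown.
public sealed class BstNode
{
    public int Value;
    public BstNode Left, Right;
    public BstNode(int value) { Value = value; }
}

public static class LeafPathSum
{
    // Plain BST insert; values equal to a node go right, matching the search below.
    public static BstNode Insert(BstNode root, int value)
    {
        if (root == null) return new BstNode(value);
        if (value < root.Value) root.Left = Insert(root.Left, value);
        else root.Right = Insert(root.Right, value);
        return root;
    }

    // Sums the nodes strictly between two distinct leaves.
    // Assumes both values exist as leaves and all values are unique.
    public static int SumBetween(BstNode root, int first, int second)
    {
        BstNode a = root, b = root, lastCommon = root;
        do
        {
            lastCommon = a; // most recent node both searches passed through
            a = first < a.Value ? a.Left : a.Right;
            b = second < b.Value ? b.Left : b.Right;
        } while (a == b);

        int sum = lastCommon.Value;
        // Finish each search on its own, adding every node before the leaf.
        while (a.Value != first)
        {
            sum += a.Value;
            a = first < a.Value ? a.Left : a.Right;
        }
        while (b.Value != second)
        {
            sum += b.Value;
            b = second < b.Value ? b.Left : b.Right;
        }
        return sum;
    }
}
```

With a tree built from 5, 10, 13, 8, 3, 4, `LeafPathSum.SumBetween(root, 4, 13)` returns 18, matching the example in the question. No extra lists are allocated, and the work is proportional to the height of the tree.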

Related

Best way to search/filter (potentially several billion) combinations of items from multiple lists or arrays

I'm trying to write a program to optimize equipment configurations for a game. The problem is as follows:
There are six different equipment slots for a character to place an item. This is represented by 6 lists of individual items for each slot in the code containing all of the equipment owned by the player altogether.
The program will calculate the total stats of the character for each possible combination of equipment (1 from each list). These calculated stats can be filtered by specific stat min/max values and then also sorted by a specific stat to pinpoint a certain target set of stats for their character.
The program should be able to perform these queries without running out of memory or taking hours, and of course, the main problem is sifting through several billion possible combinations.
I'm not sure what the supporting data structures or search algorithms for this would be called (so that I can do more research towards a solution). I have come up with the following idea, but I'm not sure if I'm on the right track or if someone can point me in a more effective direction.
The idea I'm pursuing is to use recursion, where each list (one per equipment slot) is set into a tree structure, with each successive list acting as a child of the last. E.g.:
Weapons List
|
-----Armor List
     |
     ------Helm List... etc
Each layer of the tree would keep a dictionary of every child path it can take containing the IDs of 1 item from each list and progressively calculating the stats given to the character (simple addition of stats from weapon + armor + helm as it traverses the tree and so on...)
When any stat with a min/max filter applied hits its boundary (namely, if the stat goes over the maximum before the search reaches the bottom layer of the tree), that "path" is eliminated from the dictionary, removing that entire leg of possible results from being traversed.
The main goal here is to reduce the number of tree paths the search algorithm has to traverse, removing as many invalid results as possible before the tree needs to calculate them, so that the search is as fast as possible and avoids wasted cycles. This seems pretty straightforward when removing items based on a "maximum" filter, since when adding each item's stats progressively we can quickly tell when a stat has crossed its expected maximum. However, when it comes to stopping paths based on a minimum total stat, I can't wrap my head around how to predict and remove the paths that won't end up above the minimum by the sixth item.
To simplify the idea, think of it like this:
I have 3 arrays of numbers
[X]  [0]  [1]  [2]
[0]   5    3    2
[1]   1    0    8
[2]   3    2    7
[3]   2    1    0
I want to find all combinations from the 3 arrays (sums) that are minimum of 9 and maximum of 11 total.
Exactly one item must be selected from each array, and the sum of the selected values is what is being searched. This would need to scale up to searching 6+ arrays of 40+ values each. Is the above approach on the right track, or what is the best way to go about this (mainly using C#)?
You should be able to filter out a lot of items by using a lower and upper bound for each slot:
    var minimum = slots.Sum(slot => slot.Minimum);
    var maximum = slots.Sum(slot => slot.Maximum);
    foreach (var slot in slots)
    {
        // Best and worst totals the other slots can contribute
        var maxAvailable = maximum - slot.Maximum;
        var minAvailable = minimum - slot.Minimum;
        var filtered = slot.Items
            // Keep the item only if the strongest items in all the other slots can lift it to the minimum
            .Where(item => item.Value + maxAvailable >= request.MinimumValue)
            // Keep the item only if the weakest items in all the other slots keep it within the maximum
            .Where(item => item.Value + minAvailable <= request.MaximumValue);
    }
After doing this, every remaining item can be part of at least one combination that reaches the requested minimum and at least one that stays within the requested maximum. Individual combinations may still fall outside the range, so combine this with the logic you have so far and I think you should get pretty good performance.
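The same bounds can also be applied while walking the tree, which addresses the question of how to cut off paths that can never reach the minimum. The sketch below is one possible way to do that, not a definitive implementation: `BoundedSearch` and its members are hypothetical names, each slot is reduced to a plain array of integers for a single stat, and the columns of the small table in the question are treated as the three arrays.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BoundedSearch
{
    // Returns one index per slot for every combination whose total lies in [min, max].
    public static List<int[]> Find(int[][] slots, int min, int max)
    {
        int n = slots.Length;
        // restMin[i] / restMax[i]: smallest / largest total obtainable from slots i..n-1.
        var restMin = new int[n + 1];
        var restMax = new int[n + 1];
        for (int i = n - 1; i >= 0; i--)
        {
            restMin[i] = restMin[i + 1] + slots[i].Min();
            restMax[i] = restMax[i + 1] + slots[i].Max();
        }
        var results = new List<int[]>();
        Walk(slots, 0, 0, new int[n], restMin, restMax, min, max, results);
        return results;
    }

    private static void Walk(int[][] slots, int depth, int sum, int[] picked,
                             int[] restMin, int[] restMax, int min, int max,
                             List<int[]> results)
    {
        // Prune: the best remaining items cannot reach min,
        // or the weakest remaining items already overshoot max.
        if (sum + restMax[depth] < min || sum + restMin[depth] > max) return;
        if (depth == slots.Length)
        {
            results.Add((int[])picked.Clone());
            return;
        }
        for (int i = 0; i < slots[depth].Length; i++)
        {
            picked[depth] = i;
            Walk(slots, depth + 1, sum + slots[depth][i], picked,
                 restMin, restMax, min, max, results);
        }
    }
}
```

For the arrays `{5, 1, 3, 2}`, `{3, 0, 2, 1}` and `{2, 8, 7, 0}` with a range of 9 to 11, `BoundedSearch.Find` returns 16 combinations. With several filtered stats, the same check would be repeated per stat before descending.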

Finding a linked list neighbor rolling density

I have a linked list like this:
public class Node {
public Node next;
public Node prev;
public int length;
public int weight;
}
I am trying to find a rolling density for a non-circular linked list (one with a clear beginning and end) that uses a specific length as the window. This adds complexity because the nodes at the edges of the window contribute only a fraction of their weight.
That means given 3 nodes
A (L: 10, W:10) -> B (L: 5, W:10) -> C (L:20, W:5)
(where L means length and W means weight)
and a window of 9 for node B, it would use all of node B, leaving a window of 4. It would split that evenly before and after, taking 2 from A and 2 from C.
So the density would be:
[(2/10)*(10) + (5/5)*(10) + (2/20)*(5)] / 9 = 1.3889
This common case is not the part I am struggling with; it's the end points. When there is not enough length on the left side, the window should take more from the right side, and vice versa. There is also the case where there is not enough length on either side.
I am unsure whether I should implement this as a recursive function or a loop. I know a loop would require fewer calculations, but a recursive function could be easier to understand.
Case 1: There is just 1 node in the linked list
take the density of the 1 node ignoring the window
Case 2: There is not enough length on the left/right side
Take the remainder from the right/left side.
Case 3: There is not enough length on both sides, but there is more than just 1 node.
Take all nodes and not require the window to be met.
With all you wrote, it seems your only question is: "should I loop or should I recurse?" Depending on your needs, whichever is easiest to read and maintain (or if performance is your highest priority, whichever is more performant).
You're dealing with a linked list. I would recommend simply looping, rather than recursing. (If you were dealing with a tree, that would be a different story.) In either case, you may find a way to save a lot of computation by doing some form of memoization. If your window involves going through hundreds of nodes to the left and right, you can store much of your density calculation for node n, and it will almost all be reusable for n+1. Before you get into that, I'd test the non-memoized version first, and see if it's sufficiently performant.
One design pattern that might help you reduce the number of edge cases is to have a null node:
    // Sentinel nodes with no length or weight that point back at themselves
    Node nullNodeBeginning = new Node { length = 0, weight = 0 };
    nullNodeBeginning.prev = nullNodeBeginning;
    Node nullNodeEnding = new Node { length = 0, weight = 0 };
    nullNodeEnding.next = nullNodeEnding;
If you add nullNodeBeginning to the beginning of your linked list and nullNodeEnding to the ending of your linked list, you effectively have an infinite linked list. Then your logic becomes something like:
Get the length of the specific center node
For previous, next:
Get the length of n nodes in that direction (may total to 0)
If total length = total length of list, you can't fill the window
If length < n, get nodes from the other direction
There are other ways to do it (and this one requires maintaining the length of all the nodes), but by capping your list with a nullNode, you should be able to avoid all special cases other than "insufficient nodes for the window", and make your logic much cleaner.
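As a rough illustration of the loop-based approach, here is a sketch of the density calculation for one node. It is only one reading of the rules in the question: it uses plain `null` ends rather than the sentinel nodes described above, always takes the center node whole, and divides by the length actually covered (which equals the window whenever the window can be filled). `RollingDensity`, `At` and `Take` are hypothetical names, and node lengths are assumed to be positive.

```csharp
using System;

public class Node
{
    public Node next;
    public Node prev;
    public int length;
    public int weight;
}

public static class RollingDensity
{
    public static double At(Node center, double window)
    {
        // Total length available on each side of the center node.
        double leftAvail = 0, rightAvail = 0;
        for (Node n = center.prev; n != null; n = n.prev) leftAvail += n.length;
        for (Node n = center.next; n != null; n = n.next) rightAvail += n.length;

        // Split what is left of the window evenly, then let each side
        // absorb whatever the other side could not supply.
        double spare = Math.Max(0, window - center.length);
        double left = Math.Min(spare / 2, leftAvail);
        double right = Math.Min(spare - left, rightAvail);
        left = Math.Min(spare - right, leftAvail);

        double weight = center.weight
                      + Take(center.prev, left, true)
                      + Take(center.next, right, false);
        return weight / (center.length + left + right);
    }

    // Walks away from the center, taking a proportional share of each node's weight.
    private static double Take(Node n, double budget, bool goLeft)
    {
        double weight = 0;
        while (n != null && budget > 0)
        {
            double used = Math.Min(n.length, budget);
            if (n.length > 0) weight += used / n.length * n.weight;
            budget -= used;
            n = goLeft ? n.prev : n.next;
        }
        return weight;
    }
}
```

For the A, B, C example with a window of 9 at B this gives 12.5 / 9, about 1.3889. Summing the available length on both sides for every call is wasteful on long lists; running totals would avoid that, in the spirit of the memoization mentioned above.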

Find matching entries in list which are different

I have two lists. The first one contains entries like
RB Leipzig vs SV Darmstadt 98
Hertha Berlin vs Hoffenheim
..
and the second contains basically the same entries, but they may be written in different forms. For example:
Hertha BSC vs TSG Hoffenheim
RB Leipzig vs Darmstadt 98
..
and so on. Both lists represent the same sport games but they can use alternate team names and don't appear in the same order.
My goal (hehe pun) is to unify both lists to one and match the same entries and discard entries which don't appear in both lists.
I already tried using Levenshtein distance and fuzzy search.
I thought about using machine learning but have no idea how to start with that.
I would appreciate any help and ideas!
You can solve this problem using Linear Programming combined with the Levenshtein distance you already mentioned. Linear Programming is a commonly used technique for solving optimization problems like this one. Check this link for an example of how to use Solver Foundation in C#. The example isn't related to your specific problem, but it shows how the library works.
Hints:
You need to build a matrix of distances between each pair of teams/strings between 2 lists. Let's say both lists have N elements. In i-th row of the matrix you will have N values, the j-th value will indicate the Levenshtein Distance between i-th element from the first and j-th element from the second list. Then, you need to set the constraints. The constraints would be:
The sum in each row needs to equal 1
The sum in each column equals 1
Each of the coefficient (matrix entry) needs to be either 0 or 1
I have solved the same problem a couple of months ago and this approach worked perfectly for me.
And the cost function would be the sum: `sum(coef[i][j] * dist[i][j] for i in [1, n] and for j in [1, n])`. You want to minimize this function, because you want the overall "distance" between the 2 sets after the mapping to be as low as possible.
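The distance matrix described in the hints can be built with a few lines of C#. The following is a minimal sketch: `NameDistance`, `Levenshtein` and `Matrix` are hypothetical names, and the solver step that picks the 0/1 coefficients is not shown.

```csharp
using System;

public static class NameDistance
{
    // Classic dynamic-programming Levenshtein distance, keeping two rows in memory.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;
        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            var tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.Length];
    }

    // dist[i, j] is the distance between first[i] and second[j].
    public static int[,] Matrix(string[] first, string[] second)
    {
        var dist = new int[first.Length, second.Length];
        for (int i = 0; i < first.Length; i++)
            for (int j = 0; j < second.Length; j++)
                dist[i, j] = Levenshtein(first[i], second[j]);
        return dist;
    }
}
```

For the sample entries, "RB Leipzig vs SV Darmstadt 98" and "RB Leipzig vs Darmstadt 98" are 3 edits apart, far closer than either is to the Hertha fixture, which is what the minimization relies on.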
You can use a BK-tree (I googled C# implementations and found two: 1, 2). Use the Levenshtein distance as the metric. Optionally, delete the all-uppercase substrings from the names in the lists in order to improve the metric (just be careful that this doesn't accidentally leave you with empty strings for names).
1. Put the names from the first list in the BK-tree
2. Look up the names from the second list in the BK-tree
a. Assign an integer token to the name pair, stored in a Map<Integer, Tuple<String, String>>
b. Replace each team name with the token
3. Sort each token pair (so [8 vs 4] becomes [4 vs 8])
4. Sort each list by its first token in the token pair,
then by the second token in the token pair (so the list
would look like [[1 vs 2], [1 vs 4], [2 vs 4]])
Now you just iterate through the two lists
    int i1 = 0;
    int i2 = 0;
    while (i1 < list1.length && i2 < list2.length) {
        if (list1[i1].first == list2[i2].first && list1[i1].second == list2[i2].second) {
            // match
            i1++;
            i2++;
        } else if (list1[i1].first < list2[i2].first) {
            i1++;
        } else if (list1[i1].first > list2[i2].first) {
            i2++;
        } else if (list1[i1].second < list2[i2].second) {
            i1++;
        } else {
            i2++;
        }
    }

What are some alternatives to recursive search algorithms?

I am looking at alternatives to a deep search algorithm that I've been working on. My code is a bit too long to post here, but I've written a simplified version that captures the important aspects. First, I've created an object that I'll call 'BranchNode' that holds a few values as well as an array of other 'BranchNode' objects.
class BranchNode : IComparable<BranchNode>
{
    public BranchNode(int depth, int parentValue, Random rnd)
    {
        _nodeDelta = rnd.Next(-100, 100);
        _depth = depth + 1;
        leafValue = parentValue + _nodeDelta;
        if (depth < 10)
        {
            int children = rnd.Next(1, 10);
            branchNodes = new BranchNode[children];
            for (int i = 0; i < children; i++)
            {
                branchNodes[i] = new BranchNode(_depth, leafValue, rnd);
            }
        }
    }
    public int CompareTo(BranchNode other)
    {
        return other.leafValue.CompareTo(this.leafValue);
    }
    private int _nodeDelta;
    public BranchNode[] branchNodes;
    private int _depth;
    public int leafValue;
}
In my actual program, I'm getting my data from elsewhere... but for this example, I'm just passing an instance of a Random object down the line that I'm using to generate values for each BranchNode... I'm also manually creating a depth of 10, whereas my actual data will have any number of generations.
As a quick explanation of my goals, _nodeDelta contains a value that is assigned to each BranchNode. Each instance also maintains a leafValue that is equal to the current BranchNode's _nodeDelta summed with the _nodeDeltas of all of its ancestors. I am trying to find the largest leafValue of a BranchNode with no children.
Currently, I am recursively traversing the hierarchy searching for BranchNodes whose child BranchNodes array is null (a.k.a. a 'childless' BranchNode), then comparing its leafValue to the current highest leafValue. If it's larger, it becomes the benchmark, and the search continues until it has looked at all BranchNodes.
I can post my recursive search algorithm if it'd help, but it's pretty standard and is working fine. My issue is, as expected, that for larger hierarchies my algorithm takes a long while to traverse the entire structure.
I was wondering if I had any other options that I could look into that may yield faster results. Specifically, I've been trying to wrap my head around LINQ, but I'm not even sure that it is built to do what I'm looking for, or if it'd be any faster. Are there other things that I should be looking into as well?
Maybe you want to look into an alternative data index structure: Here
It always depends on the work you are doing with the data, but if you assign each element a unique ID that encodes its position in the hierarchy and build an index over what you store, that optimization will pay off far more than micro-optimizing parts of what you do.
This also lends itself to a very different paradigm in search algorithms, one that uses no recursion, at the cost of additional memory for the IDs and possibly the index.
If you must visit all leaf nodes, you cannot speed up the search: it is going to go through all nodes no matter what. A typical trick for speeding up a search on trees is organizing them in some special way that simplifies the search. For example, by building a balanced binary search tree, you make your search O(log N). You could also store some helpful values in the non-leaf nodes from which you could later construct the answer to your search query.
For example, you could decide to store a _bestLeaf "pointing" to the leaf with the highest leafValue of all leaves under the current subtree. If you do that, your search would become an O(1) lookup. Your inserts and removals would become more expensive, however, because you would need to update up to log_b(N) items on the way back to the root with the new _bestLeaf (b is the branching factor of your tree).
I think the first thing you should think about is maybe moving away from the N-ary tree and going to a Binary Search Tree.
This means that each node has at most 2 children: a greater child and a lesser child.
From there, I would say look into balancing your search tree with something like a Red-Black tree or an AVL tree. That way, searching your tree is O(log n).
Here are some links to get you started:
http://en.wikipedia.org/wiki/Binary_search_tree
http://en.wikipedia.org/wiki/AVL_tree
http://en.wikipedia.org/wiki/Red-black_tree
Now, if you are dead set on having each node able to have N child nodes, here are some things you should think about:
Think about ordering your child nodes so that you can quickly determine which has the highest leaf number. That way, when you enter a new node, you can check one child node and quickly determine whether it is worth recursively checking its children.
Think about ways that you can quickly eliminate as many nodes as you possibly can from the search, or break out of the recursive calls as early as you can. With the binary search tree, you can easily find the largest leaf node by always looking only at the greater child. This could eliminate N-log(n) children if the tree is balanced.
Think about inserting and deleting nodes. If you spend more time here, you could save a lot more time later.
As others mention, a different data structure might be what you want.
If you need to keep the data structure the same, the recursion can be unwound into loops. While this approach will probably be a little bit faster, it's not going to be orders of magnitude faster, but might take up less memory.
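As a sketch of what unwinding the recursion might look like, the version below replaces the call stack with an explicit `Stack<T>`. The `BranchNode` here is a stripped-down stand-in for the class in the question (only `leafValue` and `branchNodes` are kept), and `LeafSearch.MaxLeafValue` is a hypothetical name. It still visits every node, so the gain is avoiding deep call chains rather than doing less work.

```csharp
using System;
using System.Collections.Generic;

// Stripped-down stand-in for the question's BranchNode.
public class BranchNode
{
    public int leafValue;
    public BranchNode[] branchNodes; // null for a childless node, as in the question
}

public static class LeafSearch
{
    // Iterative depth-first search for the largest leafValue among childless nodes.
    public static int MaxLeafValue(BranchNode root)
    {
        int best = int.MinValue;
        var pending = new Stack<BranchNode>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            BranchNode node = pending.Pop();
            if (node.branchNodes == null || node.branchNodes.Length == 0)
            {
                // Childless node: candidate for the best leaf.
                if (node.leafValue > best) best = node.leafValue;
                continue;
            }
            foreach (BranchNode child in node.branchNodes) pending.Push(child);
        }
        return best;
    }
}
```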

C# Find the Next X and Previous Numbers in a sequence

I have a list of numbers, {1,2,3,4,...,End} where End is specified. I want to display the X closest numbers around a given number Find within the list. If x is odd I want the extra digit to go on the greater than side.
Example (Base Case)
End: 6
X: 2
Find: 3
The result should be: {2,3,4}
Another Example (Bound Case):
End: 6
X: 4
Find: 5
The result should be: {2,3,4,5,6}
Yet Another Example (Odd Case):
End: 6
X: 3
Find: 3
The result should be: {2,3,4,5}
I'm assuming it would be easier to simply find a start and stop value, rather than actually generating the list, but I don't really care one way or another.
I'm using C# 4.0 if that matters.
Edit: I can think of a way to do it, but it involves way too many if, else if cases.
    if (Find == 1)
    {
        Start = Find;
        Stop = (Find + X < End ? Find + X : End);
    }
    else if (Find == 2)
    {
        if (X == 1)
        {
            Start = Find;
            Stop = (Find + 1 < End ? Find + 1 : End);
        }
        ...
    }
You can hopefully see where this is going. I'm assuming I'm going to have to use (X % 2 == 0) for odd/even checking, then some bounds like less = Find - X/2 and more = Find + X/2. I just can't figure out the path with the fewest if cases.
Edit II: I should also clarify that I don't actually create a list of {1,2,3,4...End}, but maybe I just need to start at Find - X/2.
I realise that you are learning, and out of respect for this I will not provide you with the full solution. I will, however, do my best to nudge you in the right direction.
From looking at your attempted solution, I think you need to figure out the algorithm before trying to code up something that may or may not solve your problem. As you say yourself, writing one if statement for every possible permutation of the input is not a manageable solution. You need to find an algorithm that is general enough that you can use it for any input you get and still get the right results out.
Basically, there are two questions you need to answer before you'll be able to code up a working solution.
How do I find the lower bound of the list I want to return?
How do I find the upper bound of the list I want to return?
Considering the example base case, you know that the given parameter X contains a number that tells you how many numbers around Find you should display. Therefore you need to divide X equally on both sides of Find.
Thus:
If I get an input X = 4 and Find = 3, the lower bound will be 3 - 4/2 or Find - X/2.
The higher bound will be 3 + 4/2 or Find + X/2.
Start by writing a program that runs and works for the base case. Once that is done, sit down and figure out how you would find the higher and lower bounds for a more complicated case.
Good luck!
You can look at the LINQ extension methods Skip and Take.
    x.Skip(3).Take(4);
This will help you with what you are trying to do.
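Putting the hints from both answers together, one possible shape for the start/stop calculation is sketched below. `Neighbours.Around` is a hypothetical name, and the sketch assumes 1 <= Find <= End and X >= 0. It puts the extra number on the greater side when X is odd, then shifts the window back inside the list when it runs past either end.

```csharp
using System;
using System.Linq;

public static class Neighbours
{
    // Returns Find plus the X closest numbers from {1..end}.
    public static int[] Around(int end, int x, int find)
    {
        int start = find - x / 2;        // integer division: smaller half below
        int stop = find + (x - x / 2);   // larger half above when x is odd

        // Shift the window right if it starts before 1.
        if (start < 1) { stop += 1 - start; start = 1; }
        // Shift the window left if it runs past the end.
        if (stop > end) { start -= stop - end; stop = end; }
        // The list may simply be shorter than the window.
        if (start < 1) start = 1;

        return Enumerable.Range(start, stop - start + 1).ToArray();
    }
}
```

`Neighbours.Around(6, 4, 5)` yields 2 through 6 and `Neighbours.Around(6, 3, 3)` yields 2 through 5, matching the bound and odd cases in the question.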
