Appropriate data structure for fast searching "between" pairs of longs - c#

I currently (conceptually) have:
IEnumerable<Tuple<long, long, Guid>>
given a long, I need to find the "corresponding" GUID.
the pairs of longs should never overlap, although there may be gaps between pairs, for example:
1, 10, 366586BD-3980-4BD6-AFEB-45C19E8FC989
11, 15, 920EA34B-246B-41B0-92AF-D03E0AAA2692
20, 30, 07F9ED50-4FC7-431F-A9E6-783B87B78D0C
For every input long, there should be exactly 0 or 1 matching GUIDs.
so an input of 7, should return 366586BD-3980-4BD6-AFEB-45C19E8FC989
an input of 16 should return null
Update: I have about 90K pairs
How should I store this in-memory for fast searching?
Thanks

So long as they're stored in order, you can just do a binary search based on "start of range" vs candidate. Once you've found the entry with the highest "start of range" which is smaller than or equal to your target number, then either you've found an entry with the right GUID, or you've proved that you've hit a gap (because the entry with the highest smaller start of range has a lower end of range than your target).
You could potentially simplify the logic very slightly by making it a Dictionary<long, Guid?> and just record the start points, adding an entry with a null value for each gap. Then you just need to find the entry with the highest key which is less than or equal to your target number, and return the value.

Try this (I am sorry, not a solution for your IEnumerable):
public static Guid? Search(List<Tuple<long, long, Guid>> list, long input)
{
Tuple<long, long, Guid> item = new Tuple<long,long,Guid> { Item1 = input };
int index = list.BinarySearch(item, Comparer.Instance);
if (index >= 0) // Exact match found.
return list[index].Item3;
index = ~index;
if (index == 0)
return null;
item = list[index - 1];
if ((input >= item.Item1) && (input <= item.Item2))
return item.Item3;
return null;
}
public class Comparer : IComparer<Tuple<long, long, Guid>>
{
static public readonly Comparer Instance = new Comparer();
private Comparer()
{
}
public int Compare(Tuple<long,long,Guid> x, Tuple<long,long,Guid> y)
{
return x.Item1.CompareTo(y.Item1);
}
}

A B-tree is actually pretty good at this. Specifically, a B+-tree where each branch pointer has the start of your range as the key. The leaf data can contain the upper bound, so you deal with gaps correctly. I'm not sure if it's the best you could find anywhere, and you'd need to engineer it yourself, but it should certainly have very good performance.

Related

Using Random.range to generate level no without repetition

I tried to use recursion for the problem at hand as follows,
int newlevelgen()
{
int exampleno = Random.Range (1,4);
if (exampleno != lastlevelno)
{
lastlevelno = exampleno;
return exampleno;
}
else
{
newlevelgen();
}
return exampleno;
}
This is my code above, what I want to do is generate new number without repeating the previous one, but this simply does not work. Help!
The idea is that you get a value between a and b. If that value is greater of equal to previous value, then you increase that and return it. Other case, you return as is.
Think of it, it does work.
public int GetUniqueLevelInclusiveOrdinal(int a , int b, int previous){
// given the ordinal numbers from a to b INCLUSIVE,
// so including a and b,
// (*** NOTE, no random int calls in Unity or any system
// work inclusively so take care ***)
// choose one of the numbers from a to b inclusive
// but do not choose "previous"
// which top index to use with Random.Range which is exclusive?
int top = b+1;
// everyday algorithm for excluding one item from a random run
int topExcludeOne = top-1;
int value = Random.Range(a, topExcludeOne);
if (value >= previous) return value+1;
else return value;
}
This is an extremely well-known, basic, pattern in programming...
int result = UnityEngine.Random.Range(0,highest-1);
if (result>=exclude)
return result+1;
else
return result;
In Unity you must use extensions:
public static int RandomIndexButNotThis(this int highest, int exclude)
{
if (highest<2) Debug.Break();
int result = UnityEngine.Random.Range(0,highest-1);
if (result>=exclude)
return result+1;
else
return result;
}
To get a random index 0 to 99
Random.Range(0,100)
To get a random index 0 to 99, but excluding 61
100.RandomIndexButNotThis(61)
To get a random index 0 to 9
Random.Range(0,10)
To get a random index 0 to 9, but excluding 8
10.RandomIndexButNotThis(8)
If new to Unity, intro to extensions
Let me preface this by saying that the following is an answer to the problem with the original method that was posted. It is far from that best way to get the desired result but that's not the focus of this answer. Yes there is a chance that the recursion will go on for a few calls. 33% chance to need another recursive call, though that does not mean a 33% for an infinite loop as the likelihood of needing X recursive calls is 0.33^X so the likelihood of reaching 3 recursive calls is only 0.33^3 ~= 0.036.
Still, a different method is advised. See Joe Blow's answer for example.
There's a problem with your recursion. You aren't using the new value that the recursive call to newlevelgen() would give you in case exampleno is the same as lastlevelno and always returning the (possibly duplicate) exampleno value. Change it to:
int newlevelgen()
{
int exampleno = Random.Range (1,4);
if (exampleno != lastlevelno)
{
lastlevelno = exampleno;
return exampleno;
}
else
{
return newlevelgen();
}
}
I created utility to easely handle random without repetitions, flat distributed random, and weighted lists (universal property drawer included!). You can find it on GtiHub and use freely:
https://github.com/khadzhynov/RandomUtils

How to compare large string integer values

Currently I am working on a program that processes extremely large integernumbers .
To prevent hitting the intiger.maxvalue a script that processes strings as numbers, and splits them up into a List<int>as following
0 is the highest currently known value
list entry 0: 123 (hundred twenty three million)
list entry 1: 321 (three hundred twenty one thousand)
list entry 2: 777 (seven hundred seventy seven)
Now my question is: How would one check if the incoming string value is sub tractable from these values?
The start for subtraction I currently made is as following, but I am getting stuck on the subtracting part.
public bool Subtract(string value)
{
string cleanedNumeric = NumericAndSpaces(value);
List<string> input = new List<string>(cleanedNumeric.Split(' '));
// In case 1) the amount is bigger 2) biggest value exceeded by a 10 fold
// 3) biggest value exceeds the value
if (input.Count > values.Count ||
input[input.Count - 1].Length > values[0].ToString().Length ||
FastParseInt(input[input.Count -1]) > values[0])
return false;
// Flip the array for ease of comparison
input.Reverse();
return true;
}
EDIT
Current target for the highest achievable number in this program is a Googolplex And are limited to .net3.5 MONO
You should do some testing on this because I haven't run extensive tests but it has worked on the cases I've put it through. Also, it might be worth ensuring that each character in the string is truly a valid integer as this procedure would bomb given a non-integer character. Finally, it expects positive numbers for both subtrahend and minuend.
static void Main(string[] args)
{
// In subtraction, a subtrahend is subtracted from a minuend to find a difference.
string minuend = "900000";
string subtrahend = "900001";
var isSubtractable = IsSubtractable(subtrahend, minuend);
}
public static bool IsSubtractable(string subtrahend, string minuend)
{
minuend = minuend.Trim();
subtrahend = subtrahend.Trim();
// maybe loop through characters and ensure all are valid integers
// check if the original number is longer - clearly subtractable
if (minuend.Length > subtrahend.Length) return true;
// check if original number is shorter - not subtractable
if (minuend.Length < subtrahend.Length) return false;
// at this point we know the strings are the same length, so we'll
// loop through the characters, one by one, from the start, to determine
// if the minued has a higher value character in a column of the number.
int numberIndex = 0;
while (numberIndex < minuend.Length )
{
Int16 minuendCharValue = Convert.ToInt16(minuend[numberIndex]);
Int16 subtrahendCharValue = Convert.ToInt16(subtrahend[numberIndex]);
if (minuendCharValue > subtrahendCharValue) return true;
if (minuendCharValue < subtrahendCharValue) return false;
numberIndex++;
}
// number are the same
return true;
}
[BigInteger](https://msdn.microsoft.com/en-us/library/system.numerics.biginteger.aspx) is of aribtary size.
Run this code if you don't believe me
var foo = new BigInteger(2);
while (true)
{
foo = foo * foo;
}
Things get crazy. My debugger (VS2013) becomes unable to represent the number before it's done. ran it for a short time and got a number with 1.2 million digits in base 10 from ToString. It is big enough. There is a 2GB limit on object, which can be overriden in .NET 4.5 with the setting gcAllowVeryLargeObjects
Now what to do if you are using .NET 3.5? You basically need to reimplement BigInteger (obviously only taking what you need, there is a lot in there).
public class MyBigInteger
{
uint[] _bits; // you need somewhere to store the value to an arbitrary length.
....
You also need to perform maths on these arrays. here is the Equals method from BigInteger:
public bool Equals(BigInteger other)
{
AssertValid();
other.AssertValid();
if (_sign != other._sign)
return false;
if (_bits == other._bits)
// _sign == other._sign && _bits == null && other._bits == null
return true;
if (_bits == null || other._bits == null)
return false;
int cu = Length(_bits);
if (cu != Length(other._bits))
return false;
int cuDiff = GetDiffLength(_bits, other._bits, cu);
return cuDiff == 0;
}
It basically does cheap length and sign comparisons of the byte arrays, then, if that doesn't produce a difference hands off to GetDiffLength.
internal static int GetDiffLength(uint[] rgu1, uint[] rgu2, int cu)
{
for (int iv = cu; --iv >= 0; )
{
if (rgu1[iv] != rgu2[iv])
return iv + 1;
}
return 0;
}
Which does the expensive check of looping through the arrays looking for a difference.
All you math will have to follow this pattern and can largely be ripped of from the .Net source code.
Googleplex and 2GB:
Here the 2GB limit becomes a problem, because you will be needing an object size of 3.867×10^90 gigabyte. This the the point where you give up, or get clever and store objects as powers at the cost of not being able to represent a lot of them. *2
if you moderate your expectations, it doesn't actually change the maths of BigInteger to split _bits into multiple jagged arrays *1. You change the cheap checks a bit. Rather than checking the size of the array, you check the number of subarrays and then the size of the last one. Then the loop needs to be a bit more (but not much) more complex in that it does elementwise array comparison for each sub array. There are other changes as well, but it's by no means impossible and gets you out of the 2GB limit.
*1 Note use jagged arrays[][], not multidimensional arrays [,] which are still subject to the same limit.
*2 Ie give up on precision and store the mantissa and exponent. If you look how floating point numbers are implemented they can't represent all numbers between their max and min (as the number of real numbers in a range is 'bigger' than infinite). They make a complex trade off between precision and range. If you are wanting to do this, looking at float implementations will be a lot more useful than taking about integer representations like Biginteger.

How do make binary Search in list of structures in C#?

I want to ask how can I implement binary search in a List where I have 50,000 structures.
I sort this list by word like this listSub.OrderBy(s => s.word);
public struct SubItem
{
public string word;
public int count;
public SubItem(string word, int count)
{
this.word = word;
this.count = count;
}
}
I dont know how binary search in List<SubItem>. Can you help me?
The key to a binary search is that all of the items to the left of the center item are less than the center item, and to the right are all greater. The other key is that any given section of the list behaves as a sorted list in its own right, and thus that property still applies. So you keep breaking the list in half until you have found your element. This is usually done with recursion, with an empty list as the base case that the element does not exist in the list.
Here is some pseudocode to get you started:
find (list, start, end, target) {
if (start > end) // this is called a base case, look into recursion
return -1
center = (end - start) / 2
if list[center] == target // this is also a base case
return center;
else if list[center] > target
return find(list, start, center-1, target)
else
return find(list, center+1, end, target)
}
And we would run it like so:
list = [ 1 , 3, 7, 9, 12, 15, 17, 21, 28 ]
// need to find the target in the list
find(list, 0, list.length - 1, 3);
this would first look at 12, which is bigger than our target, so then it would split the list in half, look at the middle of the lesser half which is 3 and find it, returning its index. It tends to be more beneficial on larger and larger lists.
I said it uses recursion, which means two things: it will call itself until it finds an answer, and that when it finds the answer, it stops calling itself. There are two main kinds of recursion which you might find out about later, but the most important parts are those two: the recursive step (calling itself) and the base case (when it stops calling itself). Without those two parts something is not recursive.
Use ArrayList. It has a method named as BinarySearch.

C# fastest intersection of 2 sets of sorted numbers

I'm calculating intersection of 2 sets of sorted numbers in a time-critical part of my application. This calculation is the biggest bottleneck of the whole application so I need to speed it up.
I've tried a bunch of simple options and am currently using this:
foreach (var index in firstSet)
{
if (secondSet.BinarySearch(index) < 0)
continue;
//do stuff
}
Both firstSet and secondSet are of type List.
I've also tried using LINQ:
var intersection = firstSet.Where(t => secondSet.BinarySearch(t) >= 0).ToList();
and then looping through intersection.
But as both of these sets are sorted I feel there's a better way to do it. Note that I can't remove items from sets to make them smaller. Both sets usually consist of about 50 items each.
Please help me guys as I don't have a lot of time to get this thing done. Thanks.
NOTE: I'm doing this about 5.3 million times. So every microsecond counts.
If you have two sets which are both sorted, you can implement a faster intersection than anything provided out of the box with LINQ.
Basically, keep two IEnumerator<T> cursors open, one for each set. At any point, advance whichever has the smaller value. If they match at any point, advance them both, and so on until you reach the end of either iterator.
The nice thing about this is that you only need to iterate over each set once, and you can do it in O(1) memory.
Here's a sample implementation - untested, but it does compile :) It assumes that both of the incoming sequences are duplicate-free and sorted, both according to the comparer provided (pass in Comparer<T>.Default):
(There's more text at the end of the answer!)
static IEnumerable<T> IntersectSorted<T>(this IEnumerable<T> sequence1,
IEnumerable<T> sequence2,
IComparer<T> comparer)
{
using (var cursor1 = sequence1.GetEnumerator())
using (var cursor2 = sequence2.GetEnumerator())
{
if (!cursor1.MoveNext() || !cursor2.MoveNext())
{
yield break;
}
var value1 = cursor1.Current;
var value2 = cursor2.Current;
while (true)
{
int comparison = comparer.Compare(value1, value2);
if (comparison < 0)
{
if (!cursor1.MoveNext())
{
yield break;
}
value1 = cursor1.Current;
}
else if (comparison > 0)
{
if (!cursor2.MoveNext())
{
yield break;
}
value2 = cursor2.Current;
}
else
{
yield return value1;
if (!cursor1.MoveNext() || !cursor2.MoveNext())
{
yield break;
}
value1 = cursor1.Current;
value2 = cursor2.Current;
}
}
}
}
EDIT: As noted in comments, in some cases you may have one input which is much larger than the other, in which case you could potentially save a lot of time using a binary search for each element from the smaller set within the larger set. This requires random access to the larger set, however (it's just a prerequisite of binary search). You can even make it slightly better than a naive binary search by using the match from the previous result to give a lower bound to the binary search. So suppose you were looking for values 1000, 2000 and 3000 in a set with every integer from 0 to 19,999. In the first iteration, you'd need to look across the whole set - your starting lower/upper indexes would be 0 and 19,999 respectively. After you'd found a match at index 1000, however, the next step (where you're looking for 2000) can start with a lower index of 2000. As you progress, the range in which you need to search gradually narrows. Whether or not this is worth the extra implementation cost or not is a different matter, however.
Since both lists are sorted, you can arrive at the solution by iterating over them at most once (you may also get to skip part of one list, depending on the actual values they contain).
This solution keeps a "pointer" to the part of list we have not yet examined, and compares the first not-examined number of each list between them. If one is smaller than the other, the pointer to the list it belongs to is incremented to point to the next number. If they are equal, the number is added to the intersection result and both pointers are incremented.
var firstCount = firstSet.Count;
var secondCount = secondSet.Count;
int firstIndex = 0, secondIndex = 0;
var intersection = new List<int>();
while (firstIndex < firstCount && secondIndex < secondCount)
{
var comp = firstSet[firstIndex].CompareTo(secondSet[secondIndex]);
if (comp < 0) {
++firstIndex;
}
else if (comp > 0) {
++secondIndex;
}
else {
intersection.Add(firstSet[firstIndex]);
++firstIndex;
++secondIndex;
}
}
The above is a textbook C-style approach of solving this particular problem, and given the simplicity of the code I would be surprised to see a faster solution.
You're using a rather inefficient Linq method for this sort of task, you should opt for Intersect as a starting point.
var intersection = firstSet.Intersect(secondSet);
Try this. If you measure it for performance and still find it unwieldy, cry for further help (or perhaps follow Jon Skeet's approach).
I was using Jon's approach but needed to execute this intersect hundreds of thousands of times for a bulk operation on very large sets and needed more performance. The case I was running in to was heavily imbalanced sizes of the lists (eg 5 and 80,000) and wanted to avoid iterating the entire large list.
I found that detecting the imbalance and changing to an alternate algorithm gave me huge benifits over specific data sets:
public static IEnumerable<T> IntersectSorted<T>(this List<T> sequence1,
List<T> sequence2,
IComparer<T> comparer)
{
List<T> smallList = null;
List<T> largeList = null;
if (sequence1.Count() < Math.Log(sequence2.Count(), 2))
{
smallList = sequence1;
largeList = sequence2;
}
else if (sequence2.Count() < Math.Log(sequence1.Count(), 2))
{
smallList = sequence2;
largeList = sequence1;
}
if (smallList != null)
{
foreach (var item in smallList)
{
if (largeList.BinarySearch(item, comparer) >= 0)
{
yield return item;
}
}
}
else
{
//Use Jon's method
}
}
I am still unsure about the point at which you break even, need to do some more testing
try
firstSet.InterSect (secondSet).ToList ()
or
firstSet.Join(secondSet, o => o, id => id, (o, id) => o)

Binary Search on Keys of SortedList<K, V>

I need to write some code for linear interpolation and I am trying to figure out the most efficient way to search the Keys of a SortedList<K, V> for the upper and lower keys that surround my target key.
SortedList<int, double> xyTable = new SortedList<int, double>()
{
{1, 10}, {2, 20}, {3, 30}, {4,40}
};
double targetX = 3.5;
What is the most efficient way to search the list and determine that 3.5 is between 3 and 4? I have a method / cheat that works for integers (temporarily insert the target Key into the list then find the index) but I figured I'd ask the pros so I could produce quality code.
Thanks.
A binary search gives you decent performance on a list. However the Keys property on SortedList is of type IList, whereas BinarySearch is defined on List. Fortunately, you can find an implementation of binary search for IList in this related question:
How to perform a binary search on IList<T>?
In my case the source SortedList is not changing much, since its being used as a lookup table. So in this case it makes sense to convert the SortedList to a List<T> once. After that it is quite easy to use the built-in BinarySearch method of List<T>...
double targetX = 3.5;
// Assume keys are doubles, may need to convert to doubles if required here.
// The below line should only be performed sparingly as it is an O(n) operation.
// In my case I only do this once, as the list is unchanging.
List<double> keys = xyTable.Keys.ToList();
int ipos = keys.BinarySearch(targetX);
if (ipos >= 0)
{
// exact target found at position "ipos"
}
else
{
// Exact key not found: BinarySearch returns negative when the
// exact target is not found, which is the bitwise complement
// of the next index in the list larger than the target.
ipos = ~ipos;
if (ipos >= 0 && ipos < keys.Count)
{
if (ipos > 0)
{
// target is between positions "ipos-1" and "ipos"
}
else
{
// target is below position "ipos"
}
}
else
{
// target is above position "ipos"
}
}
public class Bounds
{
int lower;
int upper;
public Bounds(int lower, int upper)
{
this.lower = lower;
this.upper = upper;
}
}
public Bounds BinarySearch(List<int> keys, double target)
{
// lower boundary case returns the smallest key as the lower and upper bounds
if (target < keys[0])
return new Bounds(0, 0);
else if (target < keys[1])
return new Bounds(0, 1);
// upper boundary case returns the largest key as the lower and upper bounds
else if (target > keys[keys.Length - 1])
return new Bounds(keys.Length - 1, keys.Length - 1);
else if (target > keys[keys.Length - 2])
return new Bounds(keys.Length - 2, keys.Length - 1);
else
return BinarySearch(keys, target, 0, keys.Length - 1);
}
// 'keys' is a List storing all of the keys from your SortedList.
public Bounds BinarySearch(List<int> keys, double target, int lower, int upper)
{
int middle = (upper + lower)/2;
// target is equal to one of the keys
if (keys[middle] == target)
return new Bounds(middle - 1, middle + 1);
else if (keys[middle] < target && keys[middle + 1] > target)
return new Bounds(middle, middle + 1);
else if (keys[middle] > target && keys[middle - 1] < target)
return new Bounds(middle - 1, middle);
if (list[middle] < target)
return BinarySearch(list, target, lower, upper/2 - 1);
if (list[middle] > target)
return BinarySearch(list, target, upper/2 + 1, upper);
}
This might work..I didn't test it out. If not, hopefully it's close enough that you can use it with minor tweaks. This is a strange problem, so I handled all of the boundary cases so I didn't have to think about what the algorithm would do when the range was down to 2 elements or less.
From MSDN,
The elements of a SortedList object are sorted by the keys either according to a specific IComparer implementation specified when the SortedList is created, or according to the IComparable implementation provided by the keys themselves.
The index sequence is based on the sort sequence. When an element is added, it is inserted into SortedList in the correct sort order, and the indexing adjusts accordingly. When an element is removed, the indexing also adjusts accordingly. Therefore, the index of a specific key/value pair might change as elements are added or removed from the SortedList.
*****This method uses a binary search algorithm; therefore, this method is an O(log n) operation, where n is Count.*****
Starting with the .NET Framework 2.0, this method uses the collection’s objects’ Equals and CompareTo methods on item to determine whether item exists. In the earlier versions of the .NET Framework, this determination was made by using the Equals and CompareTo methods of the item parameter on the objects in the collection.
In other words, the method IndexOfKey in SortedList is actually using a binarySearch algorithm already, so there is no need to convert form SortedList to List in your case.
Hope it helps..

Categories