Find equal or nearest smaller value from an array - c#

Let's suppose I have this array (it is actually 255 long, values up to int.MaxValue):
int[] lows = {0,9,0,0,5,0,0,8,4,1,3,0,0,0,0};
From this array I would like to get the index of the value that is equal to my number or, failing that, the nearest smaller one.
number = 7 -> index = 4
number = 2 -> index = 9
number = 8 -> index = 7
number = 9 -> index = 1
What would be the fastest way of finding it?
So far I've used linear search, but that turned out to be too inefficient for my need, because even though this array is only 255 long, values will be searched for a few million times.
I need something equivalent to Java's TreeSet.floor(E). I wanted to use a Dictionary, but I don't know whether it can find the first smaller-or-equal value the way I need.

Sort the array and then do a binary search to find the values.
See:
https://en.wikipedia.org/wiki/Binary_search
and
Array.BinarySearch Method
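Sorting alone loses the original positions, which the question asks for. A minimal sketch of one way to keep them, assuming C# 7 tuples are available: sort the distinct values once, remember the first original index of each, and use Array.BinarySearch to locate the floor. The names FloorLookup, Build and FloorIndex are made up for this illustration.

```csharp
using System;
using System.Linq;

static class FloorLookup
{
    // Builds two parallel arrays: the distinct values in ascending order and,
    // for each value, the first index at which it occurs in the original data.
    public static (int[] Values, int[] Indexes) Build(int[] data)
    {
        var pairs = data
            .Select((value, index) => (value, index))
            .GroupBy(p => p.value)
            .Select(g => (Value: g.Key, Index: g.Min(p => p.index)))
            .OrderBy(p => p.Value)
            .ToArray();
        return (pairs.Select(p => p.Value).ToArray(),
                pairs.Select(p => p.Index).ToArray());
    }

    // Returns the original index of the largest value <= number, or -1 if none exists.
    public static int FloorIndex(int[] values, int[] indexes, int number)
    {
        int pos = Array.BinarySearch(values, number);
        if (pos < 0)
            pos = ~pos - 1; // ~pos is the position of the first larger value
        return pos >= 0 ? indexes[pos] : -1;
    }
}
```

Build runs once per change to the data; each FloorIndex call afterwards is a single binary search, so a few million lookups over 255 values stay cheap.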

If it's not sorted (or otherwise held in a data structure where there is a relationship between the members that can assist the search), then you will have to examine every member to find the right one.
The easiest solution is probably to sort it and then do a binary chop/search to find the element matching your criteria.
If you want efficiency with the ability to still take unsorted arrays, maintain a sorted flag somewhere for the array (i.e., turn the whole thing into a class containing the indicator and the array) that indicates that the list is sorted.
Then you set this flag to false whenever the array is changed.
At the point where you want to do your search, you first check the sorted flag and sort the array if it's set to false (setting it to true as part of that process). If the flag is true, just bypass the sort.
That way, you only sort when needed. If the array hasn't changed since the last sort, there's no point in re-sorting.
You can also maintain the original unsorted list if the user needs that, keeping the sorted list as an additional array within the class (another advantage of class-ifying your array). That way, you lose nothing. You have the original untouched data for the user to get at, and a fast means of finding your desired element.
Your object (when sorted) would then contain:
int[] lows = {0,9,0,0,5,0,0,8,4,1,3,0,0,0,0};
int[] sortedlows = {0,0,0,0,0,0,0,0,0,1,3,4,5,8,9};
bool isSorted = true;
If you then changed that_object[0] to 3, you'd end up with:
int[] lows = {3,9,0,0,5,0,0,8,4,1,3,0,0,0,0};
int[] sortedlows = {0,0,0,0,0,0,0,0,0,1,3,4,5,8,9};
bool isSorted = false;
indicating that a sort would be needed before searching through sortedlows.
And keep in mind it's not a requirement to turn this into a class. If you're worried about its performance (specifically accessing array elements through a getter method), you can maintain the arrays and flag yourself while still allowing direct access to the unsorted array. You just have to ensure that every place in your code that changes the array also sets the flag correctly.
But you should measure the performance before taking this path. The class-based way is "safer" since the object itself controls the whole thing.
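The class-based variant could look roughly like the sketch below. It is only an illustration of the sorted-flag idea: the class name LazySortedArray and its Floor method are invented here, and it returns the matching value rather than its original position.

```csharp
using System;

sealed class LazySortedArray
{
    private readonly int[] _values; // original, unsorted data
    private readonly int[] _sorted; // sorted copy, refreshed on demand
    private bool _isSorted;         // false whenever _values has changed

    public LazySortedArray(int[] values)
    {
        _values = (int[])values.Clone();
        _sorted = new int[_values.Length];
    }

    // Every write goes through the indexer, so the flag cannot be forgotten.
    public int this[int i]
    {
        get => _values[i];
        set { _values[i] = value; _isSorted = false; }
    }

    // Returns the largest stored value <= number, or null if there is none.
    public int? Floor(int number)
    {
        if (!_isSorted)
        {
            Array.Copy(_values, _sorted, _values.Length);
            Array.Sort(_sorted);
            _isSorted = true;
        }
        int pos = Array.BinarySearch(_sorted, number);
        if (pos < 0)
            pos = ~pos - 1; // step back from the first larger element
        return pos >= 0 ? _sorted[pos] : (int?)null;
    }
}
```

Repeated searches between writes reuse the same sorted copy; only the first search after a change pays for the sort.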

First, normalise the data:
public static Dictionary<int, int> GetNormalised(int[] data)
{
var normalised = data.Select((value, index) => new { value, index })
.GroupBy(p => p.value, p => p.index)
.Where(p => p.Key != 0)
.OrderBy(p => p.Key)
.ToDictionary(p => p.Key, p => p.Min());
return normalised;
}
The search method:
public static int GetNearest(Dictionary<int, int> normalised, int value)
{
var res = normalised.Where(p => p.Key <= value)
.OrderBy(p => value - p.Key)
.Select(p => (int?)p.Value)
.FirstOrDefault();
if (res == null)
{
throw new ArgumentOutOfRangeException("value", "Not found");
}
return res.Value;
}
The unit test:
[TestMethod]
public void GetNearestTest()
{
var data = new[] { 0, 9, 0, 0, 5, 0, 0, 8, 4, 1, 3, 0, 0, 0, 0 };
var normalised = Program.GetNormalised(data);
var value = 7;
var expected = 4;
var actual = Program_Accessor.GetNearest(normalised, value);
Assert.AreEqual(expected, actual);
value = 2;
expected = 9;
actual = Program_Accessor.GetNearest(normalised, value);
Assert.AreEqual(expected, actual);
value = 8;
expected = 7;
actual = Program_Accessor.GetNearest(normalised, value);
Assert.AreEqual(expected, actual);
value = 9;
expected = 1;
actual = Program_Accessor.GetNearest(normalised, value);
Assert.AreEqual(expected, actual);
}
To optimise performance, cache the results you have already computed.

To emulate Java's TreeSet in C#, use SortedDictionary or SortedSet. To emulate TreeSet's floor method, you can use LINQ to pick the largest qualifying element (use <= instead of < if an equal value should count):
var data = new SortedSet<int>(); // assuming int elements
var floor = data.Where(p => p < prices[i]).OrderByDescending(p => p).Take(1);
On the HackerRank minimum cost problem, the Java TreeSet solution works fine, but the C# SortedDictionary and SortedSet versions written this way time out; the LINQ query enumerates the set instead of using its tree structure.
For details, see my coding blog: http://juliachencoding.blogspot.ca/2016/11/c-sorteddictionary-minimum-cost.html
The SortedSet method GetViewBetween can do the same thing. See the GetViewBetween API.
There is also a related post: SortedSet / SortedList with better LINQ performance?
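As a rough sketch of the GetViewBetween approach, assuming a SortedSet<int>: take the view from int.MinValue up to the target and read its Max. The helper name SortedSetFloor.Floor is hypothetical, and it returns the value, not a position in any original array.

```csharp
using System.Collections.Generic;

static class SortedSetFloor
{
    // Returns the largest element <= value, or null when no such element exists.
    public static int? Floor(SortedSet<int> set, int value)
    {
        // View over the closed range [int.MinValue, value].
        SortedSet<int> view = set.GetViewBetween(int.MinValue, value);
        return view.Count > 0 ? view.Max : (int?)null;
    }
}
```

How fast this is depends on the runtime's implementation of the view, so it is worth measuring before relying on it in a hot loop.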

Related

C# why does binarysearch have to be made on sorted arrays and lists?

Is there any other method that does not require me to sort the list?
It kinda messes with my program in a way that I cannot sort the list for it to work as I want to.
A binary search works by repeatedly dividing the list of candidates in half, using a comparison against the middle element. Imagine the following set:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
We can also picture this as a balanced binary search tree with 8 at the root, to make it easier to visualise (diagram not reproduced here).
Now, say we want to find the number 3. We can do it like so:
Is 3 smaller than 8? Yes. OK, now we're looking at everything between 1 and 7.
Is 3 smaller than 4? Yes. OK, now we're looking at everything between 1 and 3.
Is 3 smaller than 2? No. OK, now we're looking at 3.
We found it!
Now, if your list isn't sorted, how will we divide the list in half? The simple answer is: we can't. If we swap 3 and 15 in the example above, it would work like this:
Is 3 smaller than 8? Yes. OK, now we're looking at everything between 1 and 7.
Is 3 smaller than 4? Yes. OK, now we're looking at everything between 1 and 3 (except we swapped it with 15).
Is 3 smaller than 2? No. OK, now we're looking at 15.
Huh? There's no more items to check but we didn't find it. I guess it's not in the list.
The solution is to use an appropriate data type instead. For fast lookups of key/value pairs, I'll use a Dictionary. For fast checks if something already exists, I'll use a HashSet. For general storage I'll use a List or an array.
Dictionary example:
var values = new Dictionary<int, string>();
values[1] = "hello";
values[2] = "goodbye";
var value2 = values[2]; // this lookup is fast because a Dictionary partitions its keys' hash codes into buckets.
HashSet example:
var mySet = new HashSet<int>();
mySet.Add(1);
mySet.Add(2);
if (mySet.Contains(2)) // this lookup is fast for the same reason as a dictionary.
{
// do something
}
List example:
var list = new List<int>();
list.Add(1);
list.Add(2);
if (list.Contains(2)) // this isn't fast because it has to visit each item in the list, but it works OK for small sets or places where performance isn't so important
{
}
var idx2 = list.IndexOf(2);
If you have multiple values with the same key, you could store a list in a Dictionary like this:
var values = new Dictionary<int, List<string>>();
if (!values.ContainsKey(key))
{
values[key] = new List<string>();
}
values[key].Add("value1");
values[key].Add("value2");
There is no way to use binary search on an unordered collection; a sorted collection is the core requirement of the algorithm. On every step you take the middle index m between l and r. Initially these are 0 and size - 1, and after each step one of them moves past the middle: if x > arr[m], then l becomes m + 1; otherwise r becomes m - 1. In effect, every step keeps half of the range you had and, of course, that half is still sorted. The code below is recursive; if you are not familiar with recursion (which is very important in programming), it is worth reviewing first.
// C# implementation of recursive Binary Search
using System;
class GFG {
// Returns index of x if it is present in
// arr[l..r], else return -1
static int binarySearch(int[] arr, int l,
int r, int x)
{
if (r >= l) {
int mid = l + (r - l) / 2;
// If the element is present at the
// middle itself
if (arr[mid] == x)
return mid;
// If element is smaller than mid, then
// it can only be present in left subarray
if (arr[mid] > x)
return binarySearch(arr, l, mid - 1, x);
// Else the element can only be present
// in right subarray
return binarySearch(arr, mid + 1, r, x);
}
// We reach here when element is not present
// in array
return -1;
}
// Driver method to test above
public static void Main()
{
int[] arr = { 2, 3, 4, 10, 40 };
int n = arr.Length;
int x = 10;
int result = binarySearch(arr, 0, n - 1, x);
if (result == -1)
Console.WriteLine("Element not present");
else
Console.WriteLine("Element found at index "
+ result);
}
}
Output:
Element found at index 3
Sure there is.
var list = new List<int>();
list.Add(42);
list.Add(1);
list.Add(54);
var index = list.IndexOf(1); //TADA!!!!
EDIT: Ok, I hoped the irony was obvious. But strictly speaking, if your array is not sorted, you are pretty much stuck with the linear search, readily available by means of IndexOf() or IEnumerable.First().

Find Index of List of Tuples from 1 item

Please consider a List of Tuples in C#. This relates to the original Tuple (not Value Tuple). How can I get the index of the List, if I know one of the Items within the List of Tuples?
List<Tuple<double, int>> ListOfTuples2 = new
List<Tuple<double, int>>();
double doubleTuple = 5000;
int intTuple = 7;
ListOfTuples2.Add(Tuple.Create(doubleTuple, intTuple));
ListOfTuples2.Add(Tuple.Create(5000.00, 2));
ListOfTuples2.Add(Tuple.Create(5000.25, 3));
ListOfTuples2.Add(Tuple.Create(5000.50, 4));
ListOfTuples2.Add(Tuple.Create(5000.25, 5));
/* How can I get the Index of the List if
doubleTuple = 5000.25 ? */
You can use the FindIndex method of the list accepting a predicate as argument
int index = ListOfTuples2.FindIndex(t => t.Item1 == 5000.25);
if (index >= 0) {
// found!
}
FindIndex returns -1 if no such item is found.
But you might consider using a dictionary instead. If the collection is big, it finds entries much faster than a list. The retrieval times in Big O notation: List<T> is O(n), Dictionary<K,V> is O(1). However, items in a dictionary are not ordered and have no index. In addition, keys must be unique. If you need ordered items, stick to the list.
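var dict = new Dictionary<double, int>{
[doubleTuple] = intTuple,
[5000.00] = 2, // same key as doubleTuple (5000), so this overwrites 7
[5000.25] = 3,
[5000.50] = 4,
[5000.25] = 5 // same key again, so this overwrites 3
};
if (dict.TryGetValue(5000.25, out int result)) {
// result is 5 (the last value assigned to this key); it is the value, not the index.
}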
You can also add entries with
dict.Add(5000.75, 8);
If you are sure that the dictionary contains an entry, you can simply retrieve it with
int result = dict[5000.25];
Also, if you are dealing with prices, consider using the decimal type. It was created specifically for financial and monetary calculations. The double type stores values as binary numbers. 0.1 (decimal) is 0.000110011001100110011001100110011... (binary), i.e., double introduces a rounding error solely by converting a decimal constant into its binary representation, whereas decimal stores each decimal digit of the constant as is. double is fine (and faster) for scientific calculations. It makes no difference whether a temperature is 29.7 or 29.69999999999 degrees, since you can only measure it with very limited precision anyway (maybe 1%).
C# 7.0 has added ValueTuple types plus a simple syntax for tuple types and tuple values. Consider replacing the Tuple class with this new feature.
var listOfValueTuples = new List<(double, int)> {
(doubleTuple, intTuple),
(5000.00, 2),
(5000.25, 3),
(5000.50, 4),
(5000.25, 5)
};
In case you want to get all indexes you can write following code:
var indexes = ListOfTuples2.Select((tuple, index) => new {tuple, index}).Where(o => Math.Abs(o.tuple.Item1 - 5000.25) < 1e-5).Select(o => o.index);
foreach (var index in indexes)
{
Console.WriteLine(index);
}
Note that comparing floating-point values for exact equality can give unexpected results, so I compare within a small tolerance using Math.Abs.

Get IndexOf Second int record in a sorted List in C#

I am having a problem getting the indexes of the first and second records (not the second highest/lowest integer) from a sorted List. Let's say the list consists of three records that, in order, are: 0, 0, 1.
I tried like this:
int FirstNumberIndex = MyList.IndexOf(MyList.OrderBy(item => item).Take(1).ToArray()[0]); //returns first record index, true
int SecondNumberIndex = MyList.IndexOf(MyList.OrderBy(item => item).Take(2).ToArray()[1]); //doesn't seem to work
As I explained, I am trying to get the indexes of first two zeros (they are not necessarily in ascending order before the sort) and not of zero and 1.
So if there was a list {0, 2, 4, 0} I need to get Indexes 0 and 3. But this may apply to any number that is smallest and repeats itself in the List.
However, it must also work when the smallest value does not repeat itself.
SecondNumberIndex is set to 0 because
MyList.OrderBy(item => item).Take(2).ToArray()[1] == 0
then you get
MyList.IndexOf(0)
that finds the first occurrence of 0. 0 is equal to every other 0. So every time you ask for IndexOf(0), the very first 0 on the list gets found.
You can get what you want by using that sort of approach:
int FirstNumberIndex = MyList.IndexOf(0); //returns first record index, true
int SecondNumberIndex = MyList.IndexOf(0, FirstNumberIndex + 1 ); //starts searching just after the previous occurrence
From your code I guess you are confusing some kind of "instance equality" with regular "equality".
int is a simple value type; IndexOf will not search for an occurrence of your specific instance of 0.
Keep in mind that this code, even if we will move in our thoughts to actual objects:
MyList.OrderBy(item => item).Take(2).ToArray()[1]
will not necessarily return equal objects in their original relative order from the input list.
EDIT
This cannot be adopted for general case, for getting indexes of ordered values from the original, unordered list.
If you are searching for indexes of any number of equal values, then setting bigger and bigger offset for the second parameter of IndexOf is OK.
But, let's consider a case when there are no duplicates. Such approach will work only when the input list is actually ordered ;)
You can preprocess your input list to have pairs (value = list[i],idx = i), then sort that pairs by value and then iterate over sorted pairs and print idx-es
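The pair-based preprocessing described above might be sketched as follows, assuming C# 7 tuples. The helper name IndexesOfSmallest is invented for this example; it pairs each value with its position, sorts by value (ties broken by position), and returns the original positions of the first few entries.

```csharp
using System.Collections.Generic;
using System.Linq;

static class SmallestIndexes
{
    // Returns the original indexes of the `count` smallest records,
    // in ascending order of value and, for equal values, of position.
    public static List<int> IndexesOfSmallest(IList<int> list, int count)
    {
        return list
            .Select((value, idx) => (value, idx))
            .OrderBy(p => p.value)
            .ThenBy(p => p.idx)
            .Take(count)
            .Select(p => p.idx)
            .ToList();
    }
}
```

For {0, 2, 4, 0} with count 2 this yields the indexes 0 and 3, and it also works when the smallest value is not repeated.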
You, probably, are asking about something like this:
var list = new List<int>{0,0,1};
var result = list.Select((val,i)=> new {value = val, idx = i}).Where(x=>x.value == 0);
foreach(var r in result) //anonymous type enumeration
Console.WriteLine(r.idx);
You can try using FindIndex.
var MyList = new List<int>() {3, 5, 1, 2, 4};
var sorted = MyList.OrderBy(item => item).ToArray(); // sort once instead of once per element tested
int firstIndex = MyList.FindIndex(a => a == sorted[0]);
int secondIndex = MyList.FindIndex(a => a == sorted[1]);
You could calculate the offset of the first occurrence, then use IndexOf on the list after skipping the offset.
int offset = ints.IndexOf(0) + 1;
int secondIndex = ints.Skip(offset).ToList().IndexOf(0) + offset;

Median Maintenance Algorithm - Same implementation yields different results depending on Int32 or Int64

I found something interesting while doing a HW question.
The homework question asks to code the Median Maintenance algorithm.
The formal statement is as follows:
The goal of this problem is to implement the "Median Maintenance" algorithm (covered in the Week 5 lecture on heap applications). The text file contains a list of the integers from 1 to 10000 in unsorted order; you should treat this as a stream of numbers, arriving one by one. Letting x_i denote the ith number of the file, the kth median m_k is defined as the median of the numbers x_1,…,x_k. (So, if k is odd, then m_k is the ((k+1)/2)th smallest number among x_1,…,x_k; if k is even, then m_k is the (k/2)th smallest number among x_1,…,x_k.)
In order to get O(n) running time, this should obviously be implemented using heaps. Anyway, I coded it using brute force, O(n^2), because the deadline was close and I needed an answer right away. The steps were:
Read data in
Sort array
Find Median
Add it to the running sum
I ran the algorithm through several test cases (with a known answer) and got the correct results; however, when I ran the same algorithm on a larger data set I got the wrong answer. I was doing all the operations using Int64 to represent the data.
Then I tried switching to Int32 and magically I got the correct answer which makes no sense to me.
The code is below, and it is also found here (the data is in the repo). The algorithm starts to give erroneous results after the 3810 index:
private static void Main(string[] args)
{
MedianMaintenance("Question2.txt");
}
private static void MedianMaintenance(string filename)
{
var txtData = File.ReadLines(filename).ToArray();
var inputData32 = new List<Int32>();
var medians32 = new List<Int32>();
var sums32 = new List<Int32>();
var inputData64 = new List<Int64>();
var medians64 = new List<Int64>();
var sums64 = new List<Int64>();
var sum = 0;
var sum64 = 0f;
var i = 0;
foreach (var s in txtData)
{
//Add to sorted list
var intToAdd = Convert.ToInt32(s);
inputData32.Add(intToAdd);
inputData64.Add(Convert.ToInt64(s));
//Compute sum
var count = inputData32.Count;
inputData32.Sort();
inputData64.Sort();
var index = 0;
if (count%2 == 0)
{
//Even number of elements
index = count/2 - 1;
}
else
{
//Number is odd
index = ((count + 1)/2) - 1;
}
var val32 = Convert.ToInt32(inputData32[index]);
var val64 = Convert.ToInt64(inputData64[index]);
if (i > 3810)
{
var t = sum;
var t1 = sum + val32;
}
medians32.Add(val32);
medians64.Add(val64);
//Debug.WriteLine("Median is {0}", val);
sum += val32;
sums32.Add(Convert.ToInt32(sum));
sum64 += val64;
sums64.Add(Convert.ToInt64(sum64));
i++;
}
Console.WriteLine("Median Maintenance result is {0}", (sum).ToString("N"));
Console.WriteLine("Median Maintenance result is {0}", (medians32.Sum()).ToString("N"));
Console.WriteLine("Median Maintenance result is {0} - Int64", (sum64).ToString("N"));
Console.WriteLine("Median Maintenance result is {0} - Int64", (medians64.Sum()).ToString("N"));
}
What's more interesting is that the running sum (in the sum64 variable) yields a different result than summing all items in the list with LINQ's Sum() function.
The results (the third one is the one that's wrong):
These are the computer details:
I'd appreciate it if someone could give me some insight into why this is happening.
Thanks,
0f initializes a 32-bit float variable; you meant 0d or 0.0 to get a 64-bit floating-point value.
As for linq, you'll probably get better results if you use strongly typed lists.
new List<int>()
new List<long>()
The first thing I notice is what the commenter did: var sum64 = 0f initializes sum64 as a float. As the median value of a collection of Int64s will itself be an Int64 (the specified rules don't use the mean between two midpoint values in a collection of even cardinality), you should instead declare this variable explicitly as a long. In fact, I would go ahead and replace all usages of var in this code example; the convenience of var is being lost here in causing type-related bugs.
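To see why a float accumulator can drift once the total gets large, here is a small sketch (the class and method names are made up for the illustration). A 32-bit float has a 24-bit significand, so above 2^24 = 16,777,216 it can no longer represent every integer, and adding a small value may be rounded away, while a long keeps counting exactly.

```csharp
static class FloatPrecisionDemo
{
    // Adds 1 using 32-bit float arithmetic; the explicit cast forces the
    // result to be rounded to float precision.
    public static float AddOneAsFloat(float start) => (float)(start + 1f);

    // Adds 1 using 64-bit integer arithmetic, which is exact.
    public static long AddOneAsLong(long start) => start + 1L;
}
```

Calling AddOneAsFloat(16777216f) gives back 16777216, whereas AddOneAsLong(16777216L) gives 16777217. A running sum of medians that passes this threshold would start losing small additions in the same way, which fits a result that is correct on small inputs and wrong on larger ones.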

Appropriate data structure for fast searching "between" pairs of longs

I currently (conceptually) have:
IEnumerable<Tuple<long, long, Guid>>
given a long, I need to find the "corresponding" GUID.
the pairs of longs should never overlap, although there may be gaps between pairs, for example:
1, 10, 366586BD-3980-4BD6-AFEB-45C19E8FC989
11, 15, 920EA34B-246B-41B0-92AF-D03E0AAA2692
20, 30, 07F9ED50-4FC7-431F-A9E6-783B87B78D0C
For every input long, there should be exactly 0 or 1 matching GUIDs.
so an input of 7, should return 366586BD-3980-4BD6-AFEB-45C19E8FC989
an input of 16 should return null
Update: I have about 90K pairs
How should I store this in-memory for fast searching?
Thanks
So long as they're stored in order, you can just do a binary search based on "start of range" vs candidate. Once you've found the entry with the highest "start of range" which is smaller than or equal to your target number, then either you've found an entry with the right GUID, or you've proved that you've hit a gap (because the entry with the highest smaller start of range has a lower end of range than your target).
You could potentially simplify the logic very slightly by making it a Dictionary<long, Guid?> and just record the start points, adding an entry with a null value for each gap. Then you just need to find the entry with the highest key which is less than or equal to your target number, and return the value.
Try this (I am sorry, not a solution for your IEnumerable):
public static Guid? Search(List<Tuple<long, long, Guid>> list, long input)
{
Tuple<long, long, Guid> item = Tuple.Create(input, 0L, Guid.Empty); // Tuple items are read-only, so build the probe via Create; only Item1 is compared
int index = list.BinarySearch(item, Comparer.Instance);
if (index >= 0) // Exact match found.
return list[index].Item3;
index = ~index;
if (index == 0)
return null;
item = list[index - 1];
if ((input >= item.Item1) && (input <= item.Item2))
return item.Item3;
return null;
}
public class Comparer : IComparer<Tuple<long, long, Guid>>
{
static public readonly Comparer Instance = new Comparer();
private Comparer()
{
}
public int Compare(Tuple<long,long,Guid> x, Tuple<long,long,Guid> y)
{
return x.Item1.CompareTo(y.Item1);
}
}
A B-tree is actually pretty good at this. Specifically, a B+-tree where each branch pointer has the start of your range as the key. The leaf data can contain the upper bound, so you deal with gaps correctly. I'm not sure if it's the best you could find anywhere, and you'd need to engineer it yourself, but it should certainly have very good performance.
