Please consider a List of Tuples in C#. This relates to the original Tuple (not Value Tuple). How can I get the index of the List, if I know one of the Items within the List of Tuples?
List<Tuple<double, int>> ListOfTuples2 = new List<Tuple<double, int>>();
double doubleTuple = 5000;
int intTuple = 7;
ListOfTuples2.Add(Tuple.Create(doubleTuple, intTuple));
ListOfTuples2.Add(Tuple.Create(5000.00, 2));
ListOfTuples2.Add(Tuple.Create(5000.25, 3));
ListOfTuples2.Add(Tuple.Create(5000.50, 4));
ListOfTuples2.Add(Tuple.Create(5000.25, 5));
/* How can I get the Index of the List if
doubleTuple = 5000.25 ? */
You can use the FindIndex method of the list, which accepts a predicate as argument:
int index = ListOfTuples2.FindIndex(t => t.Item1 == 5000.25);
if (index >= 0) {
// found!
}
FindIndex returns -1 if no such item is found.
But you might consider using a dictionary instead. If the collection is big, it finds entries much faster than a list. The retrieval times in Big O notation: List<T> is O(n), Dictionary<K,V> is O(1). However, items in a dictionary are not ordered and have no index. In addition, keys must be unique. If you need ordered items, stick to the list.
var dict = new Dictionary<double, int> {
    [doubleTuple] = intTuple,   // doubleTuple is 5000, so the next entry [5000.00] = 2 replaces this one (keys must be unique)
    [5000.00] = 2,
    [5000.25] = 3,
    [5000.50] = 4
    // [5000.25] = 5 is omitted: with the index initializer it would silently overwrite [5000.25] = 3
};
if (dict.TryGetValue(5000.25, out int result)) {
    // result is 3; note that this is the value, not the index.
}
You can also add entries with
dict.Add(5000.75, 8);
If you are sure that the dictionary contains an entry, you can simply retrieve it with
int result = dict[5000.25];
Also, if you are dealing with prices, consider using the decimal type. It has been created specifically for financial and monetary calculations. The double type stores values as binary numbers. 0.1 (decimal) is 0.000110011001100110011001100110011... (binary), i.e., double introduces a rounding error solely by converting a decimal constant into its binary representation, whereas decimal stores each decimal digit of the constant as is. double is okay (and faster) for scientific calculations. It makes no difference whether a temperature is 29.7 or 29.69999999999 degrees, since you can measure it with a very limited precision anyway (maybe 1%).
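A quick illustration of the difference (a minimal sketch; the literal 0.3 cannot be represented exactly as a double, so the first comparison fails):
double d = 0.1 + 0.2;
decimal m = 0.1m + 0.2m;
Console.WriteLine(d == 0.3);   // False: the binary representation introduces a rounding error
Console.WriteLine(m == 0.3m);  // True: decimal stores the decimal digits exactly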
C# 7.0 has added ValueTuple types plus a simple syntax for tuple types and tuple values. Consider replacing the Tuple class with this new feature.
var listOfValueTuples = new List<(double, int)> {
(doubleTuple, intTuple),
(5000.00, 2),
(5000.25, 3),
(5000.50, 4),
(5000.25, 5)
};
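FindIndex works the same way on the value-tuple list; the elements are accessible as Item1/Item2, or you can name them in the element type (a small sketch, not part of the original answer):
int valueTupleIndex = listOfValueTuples.FindIndex(t => t.Item1 == 5000.25);
// or, with named elements:
// var named = new List<(double Price, int Count)> { (5000.25, 3) };
// int i = named.FindIndex(t => t.Price == 5000.25);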
In case you want to get all matching indexes, you can write the following code:
var indexes = ListOfTuples2
    .Select((tuple, index) => new { tuple, index })
    .Where(o => Math.Abs(o.tuple.Item1 - 5000.25) < 1e-5)
    .Select(o => o.index);
foreach (var index in indexes)
{
Console.WriteLine(index);
}
Note that comparing two floating-point values for exact equality can give unpredictable results, so the comparison uses Math.Abs with a small tolerance instead.
C#: why does BinarySearch have to be made on sorted arrays and lists?
Is there any other method that does not require me to sort the list?
It kinda messes with my program in a way that I cannot sort the list for it to work as I want to.
A binary search works by repeatedly dividing the list of candidates in half, comparing the target against the middle element. Imagine the following set:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
We can also represent this as a binary tree, to make it easier to visualise:
(binary search tree diagram omitted: 8 at the root, 4 and 12 as its children, and so on down to the leaves)
Now, say we want to find the number 3. We can do it like so:
Is 3 smaller than 8? Yes. OK, now we're looking at everything between 1 and 7.
Is 3 smaller than 4? Yes. OK, now we're looking at everything between 1 and 3.
Is 3 smaller than 2? No. OK, now we're looking at 3.
We found it!
Now, if your list isn't sorted, how will we divide the list in half? The simple answer is: we can't. If we swap 3 and 15 in the example above, it would work like this:
Is 3 smaller than 8? Yes. OK, now we're looking at everything between 1 and 7.
Is 3 smaller than 4? Yes. OK, now we're looking at everything between 1 and 3 (except we swapped it with 15).
Is 3 smaller than 2? No. OK, now we're looking at 15.
Huh? There's no more items to check but we didn't find it. I guess it's not in the list.
The solution is to use an appropriate data type instead. For fast lookups of key/value pairs, I'll use a Dictionary. For fast checks if something already exists, I'll use a HashSet. For general storage I'll use a List or an array.
Dictionary example:
var values = new Dictionary<int, string>();
values[1] = "hello";
values[2] = "goodbye";
var value2 = values[2]; // this lookup is fast because the dictionary partitions keys' hash codes into buckets internally
HashSet example:
var mySet = new HashSet<int>();
mySet.Add(1);
mySet.Add(2);
if (mySet.Contains(2)) // this lookup is fast for the same reason as a dictionary.
{
// do something
}
List example:
var list = new List<int>();
list.Add(1);
list.Add(2);
if (list.Contains(2)) // this isn't fast because it has to visit each item in the list, but it works OK for small sets or places where performance isn't so important
{
}
var idx2 = list.IndexOf(2);
If you have multiple values with the same key, you could store a list in a Dictionary like this:
var values = new Dictionary<int, List<string>>();
if (!values.ContainsKey(key))
{
values[key] = new List<string>();
}
values[key].Add("value1");
values[key].Add("value2");
There is no way to use binary search on unordered collections; a sorted collection is the central requirement of binary search. The key idea is that on every step you take the middle index m between l and r. On the first step they are 0 and size - 1, and after every step one of them moves next to the middle: if x > arr[m] then l becomes m + 1, otherwise r becomes m - 1 (and if arr[m] == x, you have found it). So on every step you discard half of the array you had, and what remains is of course still sorted. The code below is recursive; if you are not familiar with recursion (which is an important concept in programming), it is worth reviewing first.
// C# implementation of recursive Binary Search
using System;
class GFG {
// Returns index of x if it is present in
// arr[l..r], else return -1
static int binarySearch(int[] arr, int l,
int r, int x)
{
if (r >= l) {
int mid = l + (r - l) / 2;
// If the element is present at the
// middle itself
if (arr[mid] == x)
return mid;
// If element is smaller than mid, then
// it can only be present in left subarray
if (arr[mid] > x)
return binarySearch(arr, l, mid - 1, x);
// Else the element can only be present
// in right subarray
return binarySearch(arr, mid + 1, r, x);
}
// We reach here when element is not present
// in array
return -1;
}
// Driver method to test above
public static void Main()
{
int[] arr = { 2, 3, 4, 10, 40 };
int n = arr.Length;
int x = 10;
int result = binarySearch(arr, 0, n - 1, x);
if (result == -1)
Console.WriteLine("Element not present");
else
Console.WriteLine("Element found at index "
+ result);
}
}
Output:
Element found at index 3
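For completeness, here is a minimal iterative sketch of the same l/r logic described above, which avoids the recursion:
static int BinarySearchIterative(int[] arr, int x)
{
    int l = 0, r = arr.Length - 1;
    while (l <= r)
    {
        int mid = l + (r - l) / 2;     // middle index between l and r
        if (arr[mid] == x)
            return mid;                // found
        if (arr[mid] < x)
            l = mid + 1;               // x can only be in the right half
        else
            r = mid - 1;               // x can only be in the left half
    }
    return -1;                         // not present
}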
Sure there is.
var list = new List<int>();
list.Add(42);
list.Add(1);
list.Add(54);
var index = list.IndexOf(1); //TADA!!!!
EDIT: Ok, I hoped the irony was obvious. But strictly speaking, if your array is not sorted, you are pretty much stuck with the linear search, readily available by means of IndexOf() or IEnumerable.First().
I have two lists of doubles that I need to compare for equality. There are obviously a million ways to do this, the simplest probably being list1.SequenceEqual(list2). However, I want some sort of error message indicating precisely every list index and value for both lists wherever there is a difference. This error message would hopefully be something like
list1 and list2 are not equal.
list1 has value 0.1 at index 2, list2 has value 0.05 at index 2
etc. etc. for every difference
I also have a Utilities method already called AreEqual that basically just compares the values.
My first thought was evidently to loop through the lists and use AreEqual (I already know the lists are the same length)
for (int index = 0; index < list1.Count; index++)
{
check.AreEqual(list1[index], list2[index]);
}
but this doesn't help much for generating a useful error message unless in the case they're not equal I call some method to generate an error message like this
public string ErrorMessage(List<double> oldList, List<double> newList)
{
// build some error message here by taking the list difference
// and using IndexOf or whatnot
}
This seems super overkill, though. I can think of a million ways to do this but I can't determine what an appropriate way to do it is.
Is looping over the values and calling an error-message generating method reasonable?
Or is using something like
list3 = list1.Except(list2)
and then checking whether or not list3 is empty or not and correspondingly using IndexOf to get the differing values in both lists appropriate?
Or am I losing my mind and there's a much more straightforward way to do this?
You can use the following LINQ query:
string sizeMsg = "";
if (list1.Count != list2.Count)
sizeMsg = String.Format("They have a different size, list1.Count:{0} list2.Count:{1}", list1.Count, list2.Count);
int count = Math.Min(list1.Count, list2.Count);
var differences = Enumerable.Range(0, count)
.Select(index => new { index, d1 = list1[index], d2 = list2[index] })
.Where(x => x.d1 != x.d2)
.Select(x => String.Format("list1 has value {0} at index {1}, list2 has value {2} at index {1}"
, x.d1, x.index, x.d2));
string differenceMessage = String.Join(Environment.NewLine, differences);
I think that using Linq here just makes it more complicated, when you can just do something like this:
public static IEnumerable<string> DifferenceErrors(List<double> list1, List<double> list2)
{
// I recommend defining a minimum difference below which you consider the values to be identical:
const double EPSILON = 0.00001;
for (int i = 0; i < list1.Count; ++i)
if (Math.Abs(list1[i] - list2[i]) >= EPSILON)
yield return $"At index {i}, list1 has value {list1[i]} and list2 has value {list2[i]}";
}
If you are using a C# version prior to C# 6, change the yield to this:
yield return string.Format("At index {0} list1 has value {1} and list2 has value {2}", i, list1[i], list2[i]);
For an equality test I would use this and check whether list3 is empty:
list3 = list1.Except(list2)
If list3 is not empty and the values are unique, we can loop through list3 and provide meaningful feedback.
This seems to be the easiest approach for me.
But a small test in LINQPad (6 entries are different):
var list1 = new List<double>{1,2,3,4,7,8,9,10,11};
var list2 = new List<double>{1,2,3,5,6,7,8,19,20};
var list3 = list1.Except(list2).Dump();
var list4 = list2.Except(list1).Dump();
IEnumerable (4 items) 4 9 10 11
IEnumerable (4 items) 5 6 19 20
but each result shows only four differing entries.
If you care about order, you need a loop; if not, go with Except.
I have an array of doubles:
Double[] array = new Double[5];
For example, if the array contains data like this:
{0.5 , 1.5 , 1.1 , 0.6 , 2}
How do I find the number that is closest to 1? The output should be 1.1, because it's the one that is closest to 1 in this case.
var result = source.OrderBy(x => Math.Abs(1 - x)).First();
Requires using System.Linq; at the top of the file. It's an O(n log n) solution.
Update
If you're really concerned about performance and want an O(n) solution, you can use the MinBy() extension method from the moreLINQ library.
Or you could use Aggregate() method:
var result = source.Aggregate(
new { val = 0d, abs = double.MaxValue },
(a, i) => Math.Abs(1 - i) > a.abs ? a : new { val = i, abs = Math.Abs(1 - i) },
a => a.val);
You can achieve this in a simple way using LINQ:
var closestTo1 = array.OrderBy(x => Math.Abs(x - 1)).First();
Something like this should be easy to understand by any programmer and has O(n) complexity (non-LINQ):
double minValue = array[0];
double minDifference = Math.Abs(array[0] - 1);
foreach (double val in array)
{
double dif = Math.Abs(val - 1);
if (dif < minDifference)
{
minDifference = dif;
minValue = val;
}
}
After this code executes, minValue will have your required value.
Code summary:
It will set the minimum value as the first element of the array. Then the difference will be the absolute value of the first element minus 1.
This loop does a linear search over the array to find the value closest to 1. If an element's difference from 1 is less than the current minimum difference, it becomes the new minimum difference and its value the new minimum value.
Let's suppose I have this array (it is actually 255 long, values up to int.MaxValue):
int[] lows = {0,9,0,0,5,0,0,8,4,1,3,0,0,0,0};
From this array I would like to get the index of the largest value that is equal to or smaller than my number.
number = 7 -> index = 4
number = 2 -> index = 9
number = 8 -> index = 7
number = 9 -> index = 1
What would be the fastest way of finding it?
So far I've used linear search, but that turned out to be too inefficient for my need, because even though this array is only 255 long, values will be searched for a few million times.
I would need something equivalent to TreeSet.floor(E) in Java. I wanted to use a Dictionary, but I don't know if it can find the closest smaller-or-equal value like I need.
Sort the array and then do a binary search to find the values.
See:
https://en.wikipedia.org/wiki/Binary_search
and
Array.BinarySearch Method
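A sketch of how that could look for the "largest value equal to or smaller than my number" requirement (pairing each value with its original index and the FloorIndex name are my additions, not part of the linked documentation). Array.BinarySearch returns the index of a match, or the bitwise complement of the index of the first larger element, which is exactly what we need to find the floor position:
// requires using System; using System.Linq;
int[] lows = { 0, 9, 0, 0, 5, 0, 0, 8, 4, 1, 3, 0, 0, 0, 0 };

// sort (value, original index) pairs once, reuse for the millions of look-ups
var pairs = lows.Select((value, index) => (Value: value, Index: index))
                .OrderBy(p => p.Value)
                .ToArray();
var sortedValues = pairs.Select(p => p.Value).ToArray();

int FloorIndex(int number)
{
    int pos = Array.BinarySearch(sortedValues, number);
    if (pos < 0)
        pos = ~pos - 1;                // ~pos = index of the first element larger than number
    if (pos < 0)
        throw new ArgumentOutOfRangeException(nameof(number), "No value <= number");
    return pairs[pos].Index;           // index in the original, unsorted array
}

Console.WriteLine(FloorIndex(7));      // 4 (value 5)
Console.WriteLine(FloorIndex(2));      // 9 (value 1)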
If it's not sorted (or otherwise held in a data structure where there is a relationship between the members that can assist the search), then you will have to examine every member to find the right one.
The easiest solution is probably to sort it and then do a binary chop/search to find the element matching your criteria.
If you want efficiency with the ability to still take unsorted arrays, maintain a sorted flag somewhere for the array (i.e., turn the whole thing into a class containing the indicator and the array) that indicates that the list is sorted.
Then you set this flag to false whenever the array is changed.
At the point where you want to do your search, you first check the sorted flag and sort the array if it's set to false (setting it to true as part of that process). If the flag is true, just bypass the sort.
That way, you only sort when needed. If the array hasn't changed since the last sort, there's no point in re-sorting.
You can also maintain the original unsorted list if the user needs that, keeping the sorted list as an additional array within the class (another advantage of class-ifying your array). That way, you lose nothing. You have the original untouched data for the user to get at, and a fast means of efficiently finding your desired element.
Your object (when sorted) would then contain:
int[] lows = {0,9,0,0,5,0,0,8,4,1,3,0,0,0,0};
int[] sortedlows = {0,0,0,0,0,0,0,0,0,1,3,4,5,8,9};
boolean isSorted = true;
If you then changed that_object[0] to 3, you'd end up with:
int[] lows = {3,9,0,0,5,0,0,8,4,1,3,0,0,0,0};
int[] sortedlows = {0,0,0,0,0,0,0,0,0,1,3,4,5,8,9};
boolean isSorted = false;
indicating that a sort would be needed before searching through sortedlows.
And keep in mind it's not a requirement to turn this into a class. If you're worried about its performance (specifically, accessing array elements through a getter method), you can maintain the arrays and flag yourself while still allowing direct access to the unsorted array. You just have to ensure that every place in your code that changes the array also sets the flag correctly.
But you should measure the performance before taking this path. The class-based way is "safer" since the object itself controls the whole thing.
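A minimal sketch of that class-based approach (names such as SearchableArray and FloorValue are illustrative; the point is the lazily maintained sorted copy and flag):
class SearchableArray
{
    private readonly int[] values;     // original, unsorted data
    private int[] sorted;              // sorted copy used for searching
    private bool isSorted;

    public SearchableArray(int[] values)
    {
        this.values = values;
        this.sorted = (int[])values.Clone();
        Array.Sort(this.sorted);
        this.isSorted = true;
    }

    public int this[int index]
    {
        get { return values[index]; }
        set { values[index] = value; isSorted = false; }   // any change invalidates the sorted copy
    }

    // largest stored value <= target, or -1 if there is none
    public int FloorValue(int target)
    {
        if (!isSorted)                 // re-sort lazily, only when a search actually happens
        {
            sorted = (int[])values.Clone();
            Array.Sort(sorted);
            isSorted = true;
        }
        int pos = Array.BinarySearch(sorted, target);
        if (pos < 0) pos = ~pos - 1;
        return pos >= 0 ? sorted[pos] : -1;
    }
}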
First, normalise the data:
public static Dictionary<int, int> GetNormalised(int[] data)
{
var normalised = data.Select((value, index) => new { value, index })
.GroupBy(p => p.value, p => p.index)
.Where(p => p.Key != 0)
.OrderBy(p => p.Key)
.ToDictionary(p => p.Key, p => p.Min());
return normalised;
}
The search method:
public static int GetNearest(Dictionary<int, int> normalised, int value)
{
var res = normalised.Where(p => p.Key <= value)
.OrderBy(p => value - p.Key)
.Select(p => (int?)p.Value)
.FirstOrDefault();
if (res == null)
{
throw new ArgumentOutOfRangeException("value", "Not found");
}
return res.Value;
}
The unit test:
[TestMethod]
public void GetNearestTest()
{
var data = new[] { 0, 9, 0, 0, 5, 0, 0, 8, 4, 1, 3, 0, 0, 0, 0 };
var normalised = Program.GetNormalised(data);
var value = 7;
var expected = 4;
var actual = Program_Accessor.GetNearest(normalised, value);
Assert.AreEqual(expected, actual);
value = 2;
expected = 9;
actual = Program_Accessor.GetNearest(normalised, value);
Assert.AreEqual(expected, actual);
value = 8;
expected = 7;
actual = Program_Accessor.GetNearest(normalised, value);
Assert.AreEqual(expected, actual);
value = 9;
expected = 1;
actual = Program_Accessor.GetNearest(normalised, value);
Assert.AreEqual(expected, actual);
}
To optimise performance, cache the results you have already computed.
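Since the question mentions a few million look-ups against an array of at most 255 entries, a small memoisation cache around GetNearest is one way to do that (a sketch; the cache dictionary and method name are my additions, not part of the answer above):
private static readonly Dictionary<int, int> cache = new Dictionary<int, int>();

public static int GetNearestCached(Dictionary<int, int> normalised, int value)
{
    int index;
    if (!cache.TryGetValue(value, out index))
    {
        index = GetNearest(normalised, value);   // GetNearest as defined above
        cache[value] = index;                    // remember the result for next time
    }
    return index;
}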
To emulate Java's TreeSet in C#, use the SortedDictionary or SortedSet class; to emulate TreeSet's floor method, you can use LINQ to get the largest value below a given one:
SortedSet<int> data = new SortedSet<int>();
data.Where(p => p < prices[i]).OrderByDescending(p => p).Take(1)
Based on the HackerRank minimum cost algorithm, the Java TreeSet implementation works fine, but the C# SortedDictionary and SortedSet versions time out.
Detail, see my coding blog: http://juliachencoding.blogspot.ca/2016/11/c-sorteddictionary-minimum-cost.html
However, the C# SortedSet method GetViewBetween can do the same thing; see the GetViewBetween API.
There is a post: SortedSet / SortedList with better LINQ performance?
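A small sketch of the GetViewBetween idea (using the non-zero values from the earlier question; the floor of 7 is the Max of the view that ends at 7):
var data = new SortedSet<int> { 9, 5, 8, 4, 1, 3 };
int value = 7;
var view = data.GetViewBetween(int.MinValue, value);
if (view.Count > 0)
{
    int floor = view.Max;   // 5, the equivalent of Java's TreeSet.floor(7)
}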
Basically I have a dictionary containing all the words of my vocabulary as keys, all with 0 as value.
To process a document into a bag-of-words representation I used to copy that dictionary with the appropriate IEqualityComparer and simply check whether the dictionary contained each word in the document, incrementing its value.
To get the array of the bag-of-words representation I simply used the ToArray method.
This seemed to work fine, but I was just told that the dictionary doesn't guarantee the same key order, so the resulting arrays might represent the words in a different order, making them useless.
My current idea to solve this problem is to copy all the keys of the word dictionary into an ArrayList, create an array of the proper size and then use the IndexOf method of the ArrayList to fill the array.
So my question is: is there any better way to solve this? Mine seems kinda crude... and won't I have issues because of the IEqualityComparer?
Let me see if I understand the problem. You have two documents D1 and D2 each containing a sequence of words drawn from a known vocabulary {W1, W2... Wn}. You wish to obtain two mappings indicating the number of occurrences of each word in each document. So for D1, you might have
W1 --> 0
W2 --> 1
W3 --> 4
indicating that D1 was perhaps "W3 W2 W3 W3 W3". Perhaps D2 is "W2 W1 W2", so its mapping is
W1 --> 1
W2 --> 2
W3 --> 0
You wish to take both mappings and determine the vectors [0, 1, 4] and [1, 2, 0] and then compute the angle between those vectors as a way of determining how similar or different the two documents are.
Your problem is that the dictionary does not guarantee that the key/value pairs are enumerated in any particular order.
OK, so order them.
vector1 = (from pair in map1 orderby pair.Key select pair.Value).ToArray();
vector2 = (from pair in map2 orderby pair.Key select pair.Value).ToArray();
and you're done.
Does that solve your problem, or am I misunderstanding the scenario?
If I understand correctly, you want to split a document by word frequency.
You could take the document and run a Regex over it to split out the words:
var words = Regex
    .Matches(input, @"\w+")
    .Cast<Match>()
    .Where(m => m.Success)
    .Select(m => m.Value);
To make the frequency map:
var map = words.GroupBy(w => w).Select(g => new { word = g.Key, frequency = g.Count() });
There are overloads of the GroupBy method that allow you to supply an alternative IEqualityComparer if this is important.
Reading your comments, to create a corresponding sequence of only frequencies:
map.Select(a=>a.frequency)
This sequence will be in exactly the same order as the sequence map above.
Is this any help at all?
There is also an OrderedDictionary:
Represents a collection of key/value pairs that are accessible by the key or index.
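A quick sketch of what that looks like (OrderedDictionary lives in System.Collections.Specialized and is non-generic, so values come back as object):
var od = new OrderedDictionary();
od.Add("apple", 3);
od.Add("banana", 1);

int byKey = (int)od["apple"];   // 3, access by key
int byIndex = (int)od[1];       // 1, access by position (second entry, "banana")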
Something like this might work, although it is definitely ugly, and I believe it is similar to what you were suggesting. GetWordCount() does the work.
class WordCounter
{
public Dictionary<string, int> dictionary = new Dictionary<string, int>();
public void CountWords(string text)
{
if (text != null && text != string.Empty)
{
text = text.ToLower();
string[] words = text.Split(' ');
if (dictionary.ContainsKey(words[0]))
{
if (text.Length > words[0].Length)
{
text = text.Substring(words[0].Length + 1);
CountWords(text);
}
}
else
{
int count = words.Count(
delegate(string s)
{
if (s == words[0]) { return true; }
else { return false; }
});
dictionary.Add(words[0], count);
if (text.Length > words[0].Length)
{
text = text.Substring(words[0].Length + 1);
CountWords(text);
}
}
}
}
public int[] GetWordCount(string text)
{
CountWords(text);
return dictionary.Values.ToArray<int>();
}
}
Would this be helpful to you:
SortedDictionary<string, int> dic = new SortedDictionary<string, int>();
for (int i = 0; i < 10; i++)
{
if (dic.ContainsKey("Word" + i))
dic["Word" + i]++;
else
dic.Add("Word" + i, 0);
}
//to get the array of words:
List<string> wordsList = new List<string>(dic.Keys);
string[] wordsArr = wordsList.ToArray();
//to get the array of values
List<int> valuesList = new List<int>(dic.Values);
int[] valuesArr = valuesList.ToArray();
If all you're trying to do is calculate cosine similarity, you don't need to convert your data to 20,000-length arrays, especially considering the data would likely be sparse with most entries being zero.
While processing the files, store each file's output data in a Dictionary keyed on the word. Then, to calculate the dot product and magnitudes, iterate through the words in the full word list, look up each word in each file's output data, and use the found value if it exists and zero if it doesn't.
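A minimal sketch of that sparse approach, assuming each file's output data is a Dictionary<string, int> of word counts (the method name is illustrative): only words present in both dictionaries contribute to the dot product, and missing words count as zero.
// requires using System; using System.Collections.Generic; using System.Linq;
static double CosineSimilarity(Dictionary<string, int> a, Dictionary<string, int> b)
{
    double dot = 0;
    foreach (var pair in a)
    {
        int bCount;
        if (b.TryGetValue(pair.Key, out bCount))
            dot += (double)pair.Value * bCount;   // words missing from b contribute 0
    }

    double magnitudeA = Math.Sqrt(a.Values.Sum(v => (double)v * v));
    double magnitudeB = Math.Sqrt(b.Values.Sum(v => (double)v * v));
    return dot / (magnitudeA * magnitudeB);
}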