LINQ and GroupBy


I haven't done much LINQ before, so I often find some aspects confusing. Recently someone created a query that looks like the following using the GroupBy operator. Here's what they did:
List<int> ranges = new List<int>() {100, 1000, 1000000};
List<int> sizes = new List<int>(new int[]{99,98,10,5,5454, 12432, 11, 12432, 992, 56, 222});
var xx = sizes.GroupBy (size => ranges.First(range => range >= size));
xx.Dump();
Basically, I am quite confused about how the key expression works, i.e. ranges.First(range => range >= size).
Can anyone shed some light? Can it be decomposed further to make this easier to understand? I thought that First would produce one result.
Thanks in advance.

size => ranges.First(range => range >= size) is the Func that builds the key on which sizes are grouped. It takes the current size and finds the first range that is greater than or equal to the current size.
How it works:
For size 99, the first range that is >= 99 is 100. So the calculated key value is 100, and the size goes into the group with key 100.
The next sizes, 98, 10 and 5, also get key 100 and go into that group.
For size 5454, the calculated key value is 1000000 (it's the first range that is greater than or equal to 5454). So a new key is created, and the size goes into the group with key 1000000.
Etc.
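To make the walkthrough concrete, here is a minimal sketch (top-level C# statements, assuming C# 9 or later) that first computes the key for each size on its own and then lets GroupBy do the grouping. GroupBy yields groups in the order in which each key is first produced, so the 1000000 group (first produced by 5454) comes out before the 1000 group (first produced by 992).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var ranges = new List<int> { 100, 1000, 1000000 };
var sizes = new List<int> { 99, 98, 10, 5, 5454, 12432, 11, 12432, 992, 56, 222 };

// Step 1: compute each size's key on its own, exactly as the key selector does.
var keyed = sizes
    .Select(size => (Size: size, Key: ranges.First(range => range >= size)))
    .ToList();
foreach (var item in keyed)
    Console.WriteLine($"{item.Size} -> {item.Key}");

// Step 2: GroupBy collects sizes that share a key; groups come out in the
// order their keys first appear, and each group keeps the source order.
foreach (var group in sizes.GroupBy(size => ranges.First(range => range >= size)))
    Console.WriteLine($"{group.Key}: {string.Join(", ", group)}");
```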

ranges.First(range => range >= size) returns an int, the first range that is >= the current size value. So every size belongs to one range. That is the group.
Note that First throws an InvalidOperationException if there's no range which is >= the given size.
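A short sketch of that failure mode, assuming a made-up size (2000000) that is larger than every range. Because GroupBy is deferred, the key selector (and therefore First) only runs when the result is enumerated, which is why the exception surfaces at ToList() rather than at the GroupBy call itself. Swapping in FirstOrDefault is one possible workaround: for int it returns 0 when nothing matches, which only works as an "out of range" key here because 0 is not one of the ranges.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var ranges = new List<int> { 100, 1000, 1000000 };
// 2000000 is a made-up value that exceeds every range.
var sizes = new List<int> { 99, 5454, 2000000 };

// GroupBy is deferred: First only runs (and throws) once ToList enumerates.
bool threw = false;
try
{
    sizes.GroupBy(size => ranges.First(range => range >= size)).ToList();
}
catch (InvalidOperationException)
{
    threw = true;
}
Console.WriteLine($"First threw: {threw}");

// FirstOrDefault returns default(int), i.e. 0, instead of throwing; 0 is not
// one of the ranges, so here it can act as an "out of range" key.
foreach (var group in sizes.GroupBy(size => ranges.FirstOrDefault(range => range >= size)))
    Console.WriteLine($"{group.Key}: {string.Join(", ", group)}");
```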

If you write the code with a foreach loop, it looks like this:
var myGroup = new Dictionary<int, List<int>>();
foreach (int size in sizes)
{
    // ranges.First(range => range >= size) picks the first range >= size;
    // since ranges is sorted ascending, that is also the smallest such range
    int range = ranges.First(r => r >= size);
    // this grouping is done automatically by calling GroupBy in your code:
    if (!myGroup.TryGetValue(range, out List<int> group))
    {
        group = new List<int>();
        myGroup[range] = group;
    }
    // each input value is added to the list for its key
    group.Add(size);
}
The only difference is that your LINQ query doesn't spell out the loop: GroupBy does this bookkeeping for you with a hash-based lookup (much like the dictionary above), and once you learn LINQ well, it's more readable.

Not sure whether it adds to the clarity, but if you really want to break it down, you could do the following (I'm guessing you are using LINQPad):
List<int> ranges = new List<int>() {100, 1000, 1000000};
List<int> sizes = new List<int>(new int[]{99,98,10,5,5454, 12432, 11, 12432, 992, 56, 222});
void Main()
{
var xx = sizes.GroupBy (size => GetRangeValue(size));
xx.Dump();
}
private int GetRangeValue(int size)
{
// find the first value in ranges which is bigger than or equal to our size
return ranges.First(range => range >= size);
}
And yes, you are correct, First does produce one result.

Indeed, First returns one value, which becomes the key for grouping.
What happens here is:
- First is called for each value in sizes, returning the first range greater than or equal to that size (100, 100, 100, 100, 1000000, 1000000, etc.)
- sizes are grouped by this value. For every distinct key a grouping is returned, for instance
100: 99, 98, 10, 5, 11...

GroupBy essentially builds a lookup table (dictionary): the items in your source that produce the same key are collected into a list, which is then assigned to that key in the lookup table.
Here is a sample program that replaces your call to xx.Dump() with a code block that pretty-prints the output in a way specific to your example. Notice the use of OrderBy to order both the keys (range values) and the group of items associated with each range.
using System;
using System.Collections.Generic;
using System.Linq;
class GroupByDemo
{
static public void Main(string[] args)
{
List<int> ranges = new List<int>() {100, 1000, 1000000};
List<int> sizes = new List<int>(
new int[]{99,98,10,5,5454, 12432, 11, 12432, 992, 56, 222});
var sizesByRange =
sizes.GroupBy(size => ranges.First(range => range >= size));
// Pretty-print the 'GroupBy' results.
foreach (var range in sizesByRange.OrderBy(r => r.Key))
{
Console.WriteLine("Sizes up to range limit '{0}':", range.Key);
foreach (var size in range.OrderBy(s => s))
{
Console.WriteLine(" {0}", size);
}
}
Console.WriteLine("--");
}
}
Expected Results
Notice that 12432 appears twice in the last group because that value appears twice in the original source list.
Sizes up to range limit '100':
5
10
11
56
98
99
Sizes up to range limit '1000':
222
992
Sizes up to range limit '1000000':
5454
12432
12432
--
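The "lookup table" description above maps directly onto LINQ's ToLookup operator. A minimal sketch using the question's data: unlike GroupBy, ToLookup runs immediately and returns an ILookup<int, int> that you can index by key, and indexing a key that has no items returns an empty sequence rather than throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var ranges = new List<int> { 100, 1000, 1000000 };
var sizes = new List<int> { 99, 98, 10, 5, 5454, 12432, 11, 12432, 992, 56, 222 };

// ToLookup applies the same key selector eagerly and returns an ILookup,
// which can be indexed by key like a read-only dictionary of lists.
ILookup<int, int> byRange = sizes.ToLookup(size => ranges.First(range => range >= size));

Console.WriteLine(string.Join(", ", byRange[1000]));   // 992, 222
Console.WriteLine(byRange[1000000].Count());           // 3
// Indexing a missing key returns an empty sequence rather than throwing.
Console.WriteLine(byRange[42].Any());                  // False
```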

Related

Algorithm to fit into goal sum using provided units of measure with least overage

Here is an example:
Customer orders 57 single pieces of an item. The company only sells in units of 15 and 6.
The algorithm has to figure out the best possible combination of UOMs (units of measure) with the following priorities, in order of importance:
- least amount of overage
- using the highest unit of measure
In this example the expected result is List<int>:
{ 15, 15, 15, 6, 6 } //sum=57
I've researched "bin packing" and "knapsack problem" but couldn't figure out how it could be applied in this case.
So far I have this, which clearly doesn't produce the best combination:
void Main()
{
var solver = new Solver();
var r = solver.Solve(57, new decimal [] {6, 15}).Dump();
}
public class Solver
{
public List<decimal> mResults;
public List<decimal> Solve(decimal goal, decimal[] elements)
{
mResults = new List<decimal>();
RSolve(goal, 0m, elements, elements.Where(e => e <= goal).OrderByDescending(x => x).FirstOrDefault());
return mResults;
}
public void RSolve(decimal goal, decimal currentSum, decimal[] elements, decimal toAdd)
{
if(currentSum >= goal)
return;
currentSum += toAdd;
mResults.Add(toAdd);
var remainder = goal - currentSum;
var nextToAdd = elements.Where(e => e <= remainder).OrderByDescending(e => e).FirstOrDefault();
if(nextToAdd == 0)
nextToAdd = elements.OrderBy(e => e).FirstOrDefault();
RSolve(goal, currentSum, elements, nextToAdd);
}
}

This is an instance of the change-making problem. It can be solved by dynamic programming; build an array where the value at index i is the largest coin that can be used in a solution totalling i, or -1 if no solution is possible. If you use larger units first then the solutions will satisfy the "highest unit of measure" requirement automatically; to satisfy the "least amount of overage" requirement, you can try to find a solution for 57, if that doesn't work try 58, and so on.
The running time is at most O((n + 1 - n₀)k) where n is the required sum, n₀ is the largest index in the cache, and k is the number of units of measure. In particular, after trying 57, it takes at most O(k) time to try 58, then at most O(k) time to try 59, and so on.
To build the output list for a sum n, initialize an empty list, then while n > 0 append the value in the cache at index n, and subtract that value from n.
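A minimal C# sketch of that dynamic-programming approach, under a few assumptions: units are positive integers (the question's code uses decimal), the helper name Solve is made up, and instead of growing the cache incrementally it is sized to goal plus the smallest unit, which always contains at least one reachable total because some multiple of the smallest unit falls in that range.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of the change-making DP described above. Assumes positive integer
// units; Solve is a hypothetical helper name, not an existing API.
static List<int> Solve(int goal, int[] units)
{
    // Try larger units first so the cache prefers the highest unit of measure.
    int[] desc = units.OrderByDescending(u => u).ToArray();
    int smallest = desc[desc.Length - 1];
    // Some multiple of the smallest unit lies in [goal, goal + smallest - 1],
    // so this cache always contains a reachable total >= goal.
    int limit = goal + smallest;
    // best[i] = largest unit usable in a solution totalling i, or -1 if none.
    var best = new int[limit + 1];
    best[0] = 0;
    for (int i = 1; i <= limit; i++)
    {
        best[i] = -1;
        foreach (int u in desc)
        {
            if (u <= i && best[i - u] != -1)
            {
                best[i] = u;
                break;
            }
        }
    }
    // Least overage: the first reachable total starting at goal.
    int n = goal;
    while (best[n] == -1) n++;
    // Rebuild the list: append the cached unit, subtract it, repeat.
    var result = new List<int>();
    while (n > 0)
    {
        result.Add(best[n]);
        n -= best[n];
    }
    return result;
}

Console.WriteLine(string.Join(", ", Solve(57, new[] { 6, 15 })));
```

For the question's input this rebuilds 15, 15, 15, 6, 6. Because larger units are tried first when filling the cache, the rebuilt list also comes out in non-increasing order.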

How do you compute for the moving average of a given dataset in C#?

So I use a randomly generated dataset and I need to find the moving average of the sample size the user inputted. For example, the dataset is a list with {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} and the sample size the user inputted is 2. The program must calculate first the mean of:
1 and 2 = 1.5, 2 and 3 = 2.5, 3 and 4 = 3.5,
and so on. How do I do this? Thanks! 😊

You can keep track of the sum and queue up all the values so you know what to subtract from the sum once you get to the sample size.
public static IEnumerable<decimal> MovingAverages(IEnumerable<decimal> values, int sample)
{
var queue = new Queue<decimal>(sample);
decimal sum = 0;
foreach(var x in values)
{
sum += x;
queue.Enqueue(x);
if(queue.Count == sample)
{
yield return sum / sample;
sum -= queue.Dequeue();
}
}
}
Basically, this queues up the first n values (n being sample) while keeping a running sum. Once it has the desired number of values to average, it yields the sum divided by the sample size, then removes the oldest value from the queue and subtracts it from the sum. Note that if the sample size is larger than the number of values, this returns an empty enumerable.

This can be done with a simple loop.
for (int i = 0; i <= array.Length - samplesize; i++)
Console.WriteLine(array.Skip(i).Take(samplesize).Average());
The Skip(i).Take(samplesize) portion selects only the elements you are interested in at the moment.
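As a sanity check on the question's data (1 to 10 with a sample size of 2), the sketch below runs the running-sum MovingAverages from the earlier answer alongside the Skip/Take loop, wrapped in a hypothetical helper WindowAverages that collects the averages instead of printing them, and confirms they agree. The Skip/Take version recomputes each window from scratch (about samplesize additions per window), whereas the queue version does a constant amount of work per element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Running-sum version from the first answer (same logic).
static IEnumerable<decimal> MovingAverages(IEnumerable<decimal> values, int sample)
{
    var queue = new Queue<decimal>(sample);
    decimal sum = 0;
    foreach (var x in values)
    {
        sum += x;
        queue.Enqueue(x);
        if (queue.Count == sample)
        {
            yield return sum / sample;
            sum -= queue.Dequeue();
        }
    }
}

// Skip/Take version from the second answer, collected into a list
// (WindowAverages is a hypothetical name used only for this comparison).
static List<decimal> WindowAverages(decimal[] array, int samplesize)
{
    var result = new List<decimal>();
    for (int i = 0; i <= array.Length - samplesize; i++)
        result.Add(array.Skip(i).Take(samplesize).Average());
    return result;
}

decimal[] data = Enumerable.Range(1, 10).Select(i => (decimal)i).ToArray();
var a = MovingAverages(data, 2).ToList();
var b = WindowAverages(data, 2);
Console.WriteLine(string.Join(" ", a));
Console.WriteLine(a.SequenceEqual(b));
```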

C# why does binarysearch have to be made on sorted arrays and lists?

Is there any other method that does not require me to sort the list?
It kinda messes with my program in a way that I cannot sort the list for it to work as I want to.

A binary search works by repeatedly dividing the list of candidates in half using comparisons. Imagine the following set:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
We can also picture this as a balanced binary tree, with 8 at the root, 4 and 12 as its children, and so on down to the leaves.
Now, say we want to find the number 3. We can do it like so:
Is 3 smaller than 8? Yes. OK, now we're looking at everything between 1 and 7.
Is 3 smaller than 4? Yes. OK, now we're looking at everything between 1 and 3.
Is 3 smaller than 2? No. OK, now we're looking at 3.
We found it!
Now, if your list isn't sorted, how will we divide the list in half? The simple answer is: we can't. If we swap 3 and 15 in the example above, it would work like this:
Is 3 smaller than 8? Yes. OK, now we're looking at everything between 1 and 7.
Is 3 smaller than 4? Yes. OK, now we're looking at everything between 1 and 3 (except we swapped it with 15).
Is 3 smaller than 2? No. OK, now we're looking at 15.
Huh? There are no more items to check, but we didn't find it. I guess it's not in the list.
The solution is to use an appropriate data type instead. For fast lookups of key/value pairs, I'll use a Dictionary. For fast checks if something already exists, I'll use a HashSet. For general storage I'll use a List or an array.
Dictionary example:
var values = new Dictionary<int, string>();
values[1] = "hello";
values[2] = "goodbye";
var value2 = values[2]; // this lookup is fast because a Dictionary partitions keys into buckets by their hash codes.
HashSet example:
var mySet = new HashSet<int>();
mySet.Add(1);
mySet.Add(2);
if (mySet.Contains(2)) // this lookup is fast for the same reason as a dictionary.
{
// do something
}
List example:
var list = new List<int>();
list.Add(1);
list.Add(2);
if (list.Contains(2)) // this isn't fast because it has to visit each item in the list, but it works OK for small sets or places where performance isn't so important
{
}
var idx2 = list.IndexOf(2);
If you have multiple values with the same key, you could store a list in a Dictionary like this:
var values = new Dictionary<int, List<string>>();
int key = 1; // example key
if (!values.ContainsKey(key))
{
values[key] = new List<string>();
}
values[key].Add("value1");
values[key].Add("value2");
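If the real constraint is only that the original list must keep its order, one option (a sketch with made-up sample values) is to binary-search a sorted copy instead. List<T>.BinarySearch returns the index within the sorted copy, or a negative number when the value is absent, so this answers "is it there?" but not "where was it in the original list?".

```csharp
using System;
using System.Collections.Generic;

// Original, unsorted data (made-up values); its order is left untouched.
var original = new List<int> { 42, 1, 54, 7, 19 };

// Sort a separate copy once; BinarySearch requires sorted input.
var sortedCopy = new List<int>(original);
sortedCopy.Sort();

// BinarySearch returns the index in the sorted copy, or a negative number
// (the bitwise complement of the insertion point) if the value is absent.
bool Contains(int value) => sortedCopy.BinarySearch(value) >= 0;

Console.WriteLine(Contains(54));                 // True
Console.WriteLine(Contains(3));                  // False
Console.WriteLine(string.Join(", ", original));  // 42, 1, 54, 7, 19
```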

There is no way to use binary search on unordered collections: a sorted collection is the main precondition of binary search. The key is that on every step you take the middle index m between l and r. On the first step they are 0 and size - 1; after every step one of them moves past the middle index. If x > arr[m], then l becomes m + 1; otherwise r becomes m - 1. Basically, on every step you keep half of the array you had and, of course, it remains sorted. The code below is recursive; if you don't know what recursion is (which is very important in programming), it is worth reviewing first.
// C# implementation of recursive Binary Search
using System;
class GFG {
// Returns index of x if it is present in
// arr[l..r], else return -1
static int binarySearch(int[] arr, int l,
int r, int x)
{
if (r >= l) {
int mid = l + (r - l) / 2;
// If the element is present at the
// middle itself
if (arr[mid] == x)
return mid;
// If element is smaller than mid, then
// it can only be present in left subarray
if (arr[mid] > x)
return binarySearch(arr, l, mid - 1, x);
// Else the element can only be present
// in right subarray
return binarySearch(arr, mid + 1, r, x);
}
// We reach here when element is not present
// in array
return -1;
}
// Driver method to test above
public static void Main()
{
int[] arr = { 2, 3, 4, 10, 40 };
int n = arr.Length;
int x = 10;
int result = binarySearch(arr, 0, n - 1, x);
if (result == -1)
Console.WriteLine("Element not present");
else
Console.WriteLine("Element found at index "
+ result);
}
}
Output:
Element found at index 3
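The l/r/mid updates described above can also be written as a loop instead of recursion. A minimal sketch (the name BinarySearchIterative is made up), checked against the framework's own Array.BinarySearch on the same sorted array:

```csharp
using System;

// Iterative binary search over a sorted array: the same l/r/mid updates as
// the recursive version, expressed as a loop.
static int BinarySearchIterative(int[] arr, int x)
{
    int l = 0, r = arr.Length - 1;
    while (l <= r)
    {
        // Computed this way to avoid overflow of (l + r) on huge arrays.
        int mid = l + (r - l) / 2;
        if (arr[mid] == x) return mid;
        if (x > arr[mid]) l = mid + 1;   // x can only be in the right half
        else r = mid - 1;                // x can only be in the left half
    }
    return -1; // x is not present
}

int[] data = { 2, 3, 4, 10, 40 };
Console.WriteLine(BinarySearchIterative(data, 10)); // 3
Console.WriteLine(Array.BinarySearch(data, 10));    // 3
```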

Sure there is.
var list = new List<int>();
list.Add(42);
list.Add(1);
list.Add(54);
var index = list.IndexOf(1); //TADA!!!!
EDIT: Ok, I hoped the irony was obvious. But strictly speaking, if your array is not sorted, you are pretty much stuck with the linear search, readily available by means of IndexOf() or IEnumerable.First().

Get IndexOf Second int record in a sorted List in C#

I am having a problem getting the indexes of the first and second records (not the second highest/lowest integer) from a sorted List. Let's say the list consists of three records that, in order, look like this: 0, 0, 1.
I tried like this:
int FirstNumberIndex = MyList.IndexOf(MyList.OrderBy(item => item).Take(1).ToArray()[0]); //returns first record index, true
int SecondNumberIndex = MyList.IndexOf(MyList.OrderBy(item => item).Take(2).ToArray()[1]); //doesn't seem to work
As I explained, I am trying to get the indexes of the first two zeros (they are not necessarily in ascending order before the sort), not of the zero and the 1.
So if there was a list {0, 2, 4, 0} I need to get Indexes 0 and 3. But this may apply to any number that is smallest and repeats itself in the List.
However, it must also work when the smallest value does not repeat itself.

SecondNumberIndex is set to 0 because
MyList.OrderBy(item => item).Take(2).ToArray()[1] == 0
then you get
MyList.IndexOf(0)
that finds the first occurrence of 0. 0 is equal to every other 0, so every time you ask for IndexOf(0), the very first 0 in the list gets found.
You can get what you want by using that sort of approach:
int FirstNumberIndex = MyList.IndexOf(0); //returns first record index, true
int SecondNumberIndex = MyList.IndexOf(0, FirstNumberIndex + 1); //starts searching just after the first occurrence
From your code I guess you are confusing some kind of "instance equality" with regular "equality".
int is a value type; IndexOf will not search for the occurrence of your specific instance of 0.
Keep in mind that this code, even if we imagine it running on actual objects:
MyList.OrderBy(item => item).Take(2).ToArray()[1]
returns an element, not its position in the input list. (OrderBy is a stable sort, so equal elements do keep their original relative order, but their original indexes are not carried along.)
EDIT
This cannot be adapted to the general case of getting the indexes of ordered values from the original, unordered list.
If you are searching for the indexes of any number of equal values, then passing a bigger and bigger start index as the second parameter of IndexOf is OK.
But consider a case where there are no duplicates: such an approach works only when the input list is actually ordered ;)
You can preprocess your input list into pairs (value = list[i], idx = i), sort those pairs by value, and then iterate over the sorted pairs and print the idx values.
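The pair-based approach from the edit above can be sketched in a few lines of LINQ, using the question's {0, 2, 4, 0} example. It relies on OrderBy being a stable sort, so equal values keep their original index order.

```csharp
using System;
using System.Linq;

var myList = new[] { 0, 2, 4, 0 };

// Pair each value with its original index, then sort the pairs by value.
// OrderBy is stable, so equal values stay in their original index order.
var sortedPairs = myList
    .Select((value, idx) => (value, idx))
    .OrderBy(p => p.value)
    .ToList();

// The first two pairs give the original indexes of the two smallest records.
int firstIndex = sortedPairs[0].idx;
int secondIndex = sortedPairs[1].idx;
Console.WriteLine($"{firstIndex}, {secondIndex}"); // 0, 3
```

The same code also works when the smallest value does not repeat; the second index is then the position of the second-smallest value.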

You are probably asking about something like this:
var list = new List<int>{0,0,1};
var result = list.Select((val,i)=> new {value = val, idx = i}).Where(x=>x.value == 0);
foreach(var r in result) //anonymous type enumeration
Console.WriteLine(r.idx);

You can try using FindIndex.
var MyList = new List<int>() {3, 5, 1, 2, 4};
int firstIndex = MyList.FindIndex(a => a == MyList.OrderBy(item => item).Take(1).ToArray()[0]);
int secondIndex = MyList.FindIndex(a => a == MyList.OrderBy(item => item).Take(2).ToArray()[1]);

You could calculate the offset of the first occurrence, then use IndexOf on the list after skipping the offset.
int offset = ints.IndexOf(0) + 1;
int secondIndex = ints.Skip(offset).ToList().IndexOf(0) + offset;

Given two sets of numbers, find the smallest set from each where the sum is equal

I am working on an application that needs to match two sets of data based on various criteria, including the sum of any number of items from each set. I've distilled the problem down to this statement:
Given a set of items and transactions, find the smallest set of items where the sum is equal to the sum of the smallest set of transactions. (There’s some complexity I’m ignoring for this post, but for now I’m only concerned about the total amounts matching, not dates, descriptions, clearing differences, etc.)
Or, mathematically: given two sets of numbers, find the smallest set from each where the sums are equal.
The other similar SO questions I've run across assume you know the sum ahead of time, or know the quantity from each set that you are going for.
And here is a test that (I think) illustrates what I'm going for.
[TestMethod]
public void StackOverflowTest()
{
var seta = new[]{10, 20, 30, 40, 50};
var setb = new[]{ 45, 45, 100, 200 };
var result = Magic(seta, setb);
CollectionAssert.AreEqual(new[] { 40, 50 }, result.SetA);
CollectionAssert.AreEqual(new[] { 45, 45 }, result.SetB);
}
class MagicResult
{
public int[] SetA { get; set; }
public int[] SetB { get; set; }
}
private MagicResult Magic(int[] seta, int[] setb)
{
throw new NotImplementedException();
}
I'm looking for an elegant solution that will make this pass, but will take any pseudocode or suggestion that gets me there ;)

Brute force:
var result = (from a in seta.Subsets()
from b in setb.Subsets()
where a.Count() > 0 && b.Count() > 0
where a.Sum() == b.Sum()
orderby a.Count() + b.Count()
select new MagicResult { SetA = a.ToArray(), SetB = b.ToArray() }
).First();
using the Subsets method from the EvenMoreLINQ project.
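EvenMoreLINQ's Subsets isn't shown in the answer, so here is a self-contained sketch with a hypothetical bitmask-based Subsets stand-in (not that project's code) plugged into the same query. It enumerates all 2^n subsets of each set, so it is only practical for small inputs. For the test data several answers use four items in total (for example {10, 40, 50} and {100}, or {40, 50} and {45, 45}), so which one First returns depends on enumeration order, not only on the orderby.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a Subsets() helper: enumerates all 2^n subsets
// of a small array using the bits of a counter.
static IEnumerable<int[]> Subsets(int[] items)
{
    for (long mask = 0; mask < (1L << items.Length); mask++)
    {
        // Keep the items whose bit is set in the current mask.
        yield return items.Where((_, i) => (mask & (1L << i)) != 0).ToArray();
    }
}

int[] seta = { 10, 20, 30, 40, 50 };
int[] setb = { 45, 45, 100, 200 };

// Same brute-force query as above, returning a named tuple.
var best = (from a in Subsets(seta)
            from b in Subsets(setb)
            where a.Length > 0 && b.Length > 0
            where a.Sum() == b.Sum()
            orderby a.Length + b.Length
            select (SetA: a, SetB: b)).First();

Console.WriteLine($"[{string.Join(",", best.SetA)}] [{string.Join(",", best.SetB)}]");
```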

This can be solved using dynamic programming in O(nW) time, where W is the size of the largest sum. Solve the knapsack problem for both sets to generate an array for each that contains all the possible sums and keeps track of the number of items used. Then compare equal sums in the two arrays to find the minimum for each.
Not tested, but this is the idea.
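arr1 = [10, 20, 30, 40, 50]
arr2 = [45, 45, 100, 200]
def min_items_per_sum(items, W):
    # dp[s] = fewest items (each used at most once) summing to s, or None
    dp = [None] * (W + 1)
    dp[0] = 0
    for item in items:
        # go downwards so each item is used at most once (0/1 knapsack)
        for s in range(W, item - 1, -1):
            if dp[s - item] is not None:
                cand = dp[s - item] + 1
                if dp[s] is None or cand < dp[s]:
                    dp[s] = cand
    return dp
# a common sum can never exceed the smaller of the two totals
W = min(sum(arr1), sum(arr2))
arr1dp = min_items_per_sum(arr1, W)
arr2dp = min_items_per_sum(arr2, W)
# find the smallest match (s = 0 would mean two empty sets)
min_val = None
for s in range(1, W + 1):
    if arr1dp[s] is not None and arr2dp[s] is not None:
        total = arr1dp[s] + arr2dp[s]
        if min_val is None or total < min_val:
            min_val = total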

If the two sets contain a number in common, there is a solution of size 1.
If not, try all sums of two numbers (there are N-choose-two, or N*(N-1)/2 in each set). Compare them against the collection of single-number and two-number sums.
If no joy, try all sums of three numbers, comparing them against 1, 2 or 3-number sums; and so on until all sums (2**N for a set of size N) have been tried.
Here's working code that stops searching as soon as it finds a solution. (There might be smaller sums with the same number of summands.) It's in Python, but that's practically pseudo-code :-)
from itertools import combinations
def found(val, dict1, dict2):
    print("Sum:", val)
    print("Sum 1", dict1[val])
    print("Sum 2", dict2[val])
def findsum(list1, list2):
    # To allow lists of different sizes: ensure list1 is never the short one
    if len(list1) < len(list2):
        list1, list2 = list2, list1
    # Each dict has sums as keys and tuples of summands as values.
    # We start with length 1:
    dict1 = dict()
    dict2 = dict()
    for n in range(1, max(len(list1), len(list2)) + 1):
        # Check all size n sums from list1 against size < n sums in list2
        for nums in combinations(list1, n):
            s = sum(nums)
            if s in dict1:  # Is this sum new for our list?
                continue
            dict1[s] = nums
            if s in dict2:
                found(s, dict1, dict2)
                return  # If you want to look for a smallest sum, keep going
        # If list2 is too short, nothing to do
        if len(list2) < n:
            continue
        # Check all size n sums from list2 against size <= n sums in list1
        for nums in combinations(list2, n):
            s = sum(nums)
            if s in dict2:  # Is this sum new for our list?
                continue
            dict2[s] = nums
            if s in dict1:
                found(s, dict1, dict2)
                return  # If you want to look for a smallest sum, keep going
findsum([10, 20, 30, 40, 50], [45, 45, 100, 200])
This is designed to find a solution in the smallest number of comparisons. If you also want the sum to be minimal, then at each size n generate all n-part sums at once, sort them and check them in increasing order.
