C#: How to use the Enumerable.Aggregate method

Let's say I have this trimmed-down Person class:
class Person
{
    public int Age { get; set; }
    public string Country { get; set; }
    public int SOReputation { get; set; }
    public TimeSpan TimeSpentOnSO { get; set; }
    ...
}
I can then group on Age and Country like this:
var groups = aListOfPeople.GroupBy(x => new { x.Country, x.Age });
Then I can output all the groups with their reputation totals like this:
foreach (var g in groups)
    Console.WriteLine("{0}, {1}:{2}",
        g.Key.Country,
        g.Key.Age,
        g.Sum(x => x.SOReputation));
My question is: how can I get a sum of the TimeSpentOnSO property? The Sum method won't work here, since it only has overloads for numeric types (int, long, float, double, decimal and their nullable forms). I thought I could use the Aggregate method, but I just can't figure out how to use it... I've tried all kinds of properties and types in various combinations, but the compiler just won't accept it.
foreach (var g in groups)
    Console.WriteLine("{0}, {1}:{2}",
        g.Key.Country,
        g.Key.Age,
        g.Aggregate( what goes here?? ));
Have I completely misunderstood the Aggregate method? Or what is going on? Is there some other method I should use instead? Or do I have to write my own Sum variant for TimeSpans?
And to add to the mess, what if Person is an anonymous type, for example the result of a Select or a GroupJoin?
I just figured out that I could make the Aggregate method work if I did a Select on the TimeSpan property first... but I find that kind of annoying, and I still don't feel I understand this method at all...
foreach (var g in groups)
    Console.WriteLine("{0}, {1}:{2}",
        g.Key.Country,
        g.Key.Age,
        g.Select(x => x.TimeSpentOnSO)
         .Aggregate((sum, x) => sum + x));

List<TimeSpan> list = new List<TimeSpan>
{
    new TimeSpan(1),
    new TimeSpan(2),
    new TimeSpan(3)
};
TimeSpan total = list.Aggregate(TimeSpan.Zero, (sum, value) => sum.Add(value));
System.Diagnostics.Debug.Assert(total.Ticks == 6);

g.Aggregate(TimeSpan.Zero, (i, p) => i + p.TimeSpentOnSO)
Basically, the first argument to Aggregate is the seed, which is used as the first value of "i" in the function passed as the second argument. Aggregate iterates over the list, and on each step "i" contains the total so far.
For example:
List<int> nums = new List<int>{1,2,3,4,5};
nums.Aggregate(0, (x,y) => x + y); // sums up the numbers, starting with 0 => 15
nums.Aggregate(0, (x,y) => x * y); // multiplies the numbers, starting with 0 => 0, because anything multiplied by 0 is 0
nums.Aggregate(1, (x,y) => x * y); // multiplies the numbers, starting with 1 => 120
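To see what Aggregate(seed, func) does, it can help to write the loop out by hand. The sketch below is a hypothetical Fold helper (not part of the BCL) that mirrors the seeded overload for these examples: the seed becomes the first accumulator value, and each element is combined with the running result from left to right.

```csharp
using System;
using System.Collections.Generic;

public static class FoldSketch
{
    // Hypothetical helper: a hand-written equivalent of
    // Enumerable.Aggregate(source, seed, func), for illustration only.
    public static TAccumulate Fold<TSource, TAccumulate>(
        IEnumerable<TSource> source,
        TAccumulate seed,
        Func<TAccumulate, TSource, TAccumulate> func)
    {
        TAccumulate accumulator = seed;            // the seed is the first "i"
        foreach (TSource item in source)
        {
            accumulator = func(accumulator, item); // "i" becomes the running result
        }
        return accumulator;                         // an empty source yields the seed
    }
}
```

With a TimeSpan seed the same shape gives FoldSketch.Fold(times, TimeSpan.Zero, (sum, t) => sum + t), which is exactly the call pattern used for TimeSpentOnSO.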

A combination of Chris's and Daniel's answers solved it for me. I needed to seed the aggregate with TimeSpan.Zero, and I was doing things in the wrong order. The solution is:
foreach (var g in groups)
    Console.WriteLine("{0}, {1}:{2}",
        g.Key.Country,
        g.Key.Age,
        g.Aggregate(TimeSpan.Zero, (sum, x) => sum + x.TimeSpentOnSO));
Thanks!
And also... D'oh!
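On the anonymous-type follow-up: the same seeded Aggregate call should work unchanged, because the compiler infers the type of the lambda's second parameter from the sequence. A sketch with made-up sample rows (the values and the AnonymousTypeSketch name are purely illustrative):

```csharp
using System;
using System.Linq;

public static class AnonymousTypeSketch
{
    // Builds a sequence of anonymous objects (as a Select or GroupJoin might)
    // and totals their TimeSpan property with the seeded Aggregate overload.
    // The anonymous type never has to be named.
    public static TimeSpan TotalTimeSpent()
    {
        var rows = new[]
        {
            new { Country = "NO", Age = 30, TimeSpentOnSO = TimeSpan.FromHours(2) },
            new { Country = "NO", Age = 30, TimeSpentOnSO = TimeSpan.FromMinutes(90) },
        }.Select(p => new { p.Country, p.TimeSpentOnSO }); // project to another anonymous type

        return rows.Aggregate(TimeSpan.Zero, (sum, x) => sum + x.TimeSpentOnSO);
    }
}
```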

You could write a TimeSpan Sum extension method...
// requires: using System; using System.Collections.Generic; using System.Linq;
public static class TimeSpanExtensions
{
    public static TimeSpan Sum(this IEnumerable<TimeSpan> times)
    {
        return TimeSpan.FromTicks(times.Sum(t => t.Ticks));
    }
    public static TimeSpan Sum<TSource>(this IEnumerable<TSource> source,
        Func<TSource, TimeSpan> selector)
    {
        return TimeSpan.FromTicks(source.Sum(t => selector(t).Ticks));
    }
}
Alternatively, MiscUtil has generic-enabled Sum methods, so Sum should work on a TimeSpan just fine (since there is a TimeSpan+TimeSpan=>TimeSpan operator defined).
Just please don't tell me the number... it would scare me...

You could sum one of the Total... properties of TimeSpan instead. For instance, you could get the total number of hours spent on SO like this:
g.Sum(x => x.TimeSpentOnSO.TotalHours)
This gives you the result as a double rather than a TimeSpan, with the caveat that you have to pick the unit of measure you need (hours, minutes, seconds, days, etc.).
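If a TimeSpan is needed at the end, the double can be converted back. A sketch assuming hours as the unit; SumViaHours is a hypothetical name, and because the intermediate value is a double, the round trip may lose a little precision:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TotalHoursSketch
{
    // Sums the durations as a double number of hours, then converts the
    // double back into a TimeSpan.
    public static TimeSpan SumViaHours(IEnumerable<TimeSpan> times)
    {
        double hours = times.Sum(t => t.TotalHours);
        return TimeSpan.FromHours(hours);
    }
}
```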

Related

Group dateTime by hour range

I have a list like this:
class Article
{
    ...
    public DateTime PubTime { get; set; }
    ...
}
List<Article> articles;
Now I want to group this list by hour range: [0-5, 6-11, 12-17, 18-23].
I know there is a cumbersome way to do this:
var firstRange = articles.Count(a => a.PubTime.Hour >= 0 && a.PubTime.Hour <= 5);
But I want a more elegant way. How can I do that, with LINQ or anything else?
Group by Hour / 6:
var grouped = articles.GroupBy(a => a.PubTime.Hour / 6);
IDictionary<int, int> CountsByHourGrouping = grouped.ToDictionary(g => g.Key, g => g.Count());
The key in the dictionary is the period (0 representing 0-5, 1 representing 6-11, 2 representing 12-17, and 3 representing 18-23). The value is the count of articles in that period.
Note that your dictionary will only contain values where those times existed in the source data, so it won't always contain 4 items.
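If all four periods should always be present, the missing keys can be filled with zero. A sketch, assuming the PubTime values have been selected into a plain sequence of DateTime (CountByPeriod is a hypothetical name):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HourBucketSketch
{
    // Counts timestamps per six-hour period (0 = 0-5, 1 = 6-11, 2 = 12-17, 3 = 18-23).
    // Every period appears in the result, with 0 when nothing falls in it.
    public static int[] CountByPeriod(IEnumerable<DateTime> pubTimes)
    {
        var counts = pubTimes
            .GroupBy(t => t.Hour / 6)
            .ToDictionary(g => g.Key, g => g.Count());

        return Enumerable.Range(0, 4)
            .Select(period => counts.TryGetValue(period, out int n) ? n : 0)
            .ToArray();
    }
}
```

Usage would be HourBucketSketch.CountByPeriod(articles.Select(a => a.PubTime)).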
You could write a CheckRange extension method that takes your value and returns a bool, to make your code more reusable and elegant.
Function example (like any extension method, it must be declared in a static class):
public static bool CheckRange(this int number, int min, int max)
    => number >= min && number <= max;
You could now use this function to check whether PubTime.Hour is in the correct time range.
Implementation example:
var firstRange = articles.Count(a => a.PubTime.Hour.CheckRange(0, 5));
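CheckRange can also drive a count for every range at once. A sketch, assuming the four ranges are passed as inclusive (Min, Max) tuples; CountPerRange is a hypothetical helper, and it takes plain DateTime values to stay self-contained. It scans the list once per range, which is fine for four ranges.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeExtensions
{
    // Inclusive range check, written as an extension method on int.
    public static bool CheckRange(this int number, int min, int max) =>
        number >= min && number <= max;

    // Hypothetical helper: counts timestamps whose Hour falls in each range.
    public static int[] CountPerRange(IEnumerable<DateTime> pubTimes, (int Min, int Max)[] ranges)
    {
        var times = pubTimes.ToList(); // enumerate the source only once
        return ranges
            .Select(r => times.Count(t => t.Hour.CheckRange(r.Min, r.Max)))
            .ToArray();
    }
}
```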

calculate sum of list properties excluding min and max value with linq

This is what I have so far:
decimal? total = list.Sum(item => item.Score);
What I would like to do is to exclude the min and max value in the list and then get the total value.
Is it possible to do all that in one linq statement?
list.OrderBy(item => item.Score)
.Skip(1)
.Reverse()
.Skip(1)
.Sum(item => item.Score);
You can try ordering the list first, then skip first item (minimum) and take all but the last (maximum) from the rest:
decimal? total = list.OrderBy(x => x.Score)
.Skip(1)
.Take(list.Count - 2)
.Sum(x => x.Score);
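One caveat with the ordering approach when Score is decimal?: Sum ignores nulls, but OrderBy sorts null before every value, so Skip(1) would drop a null instead of the minimum. A sketch that filters nulls first, assuming an Item class with a decimal? Score (a hypothetical stand-in for the question's element type) and returning a plain decimal:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's element type.
public class Item
{
    public decimal? Score { get; set; }
}

public static class TrimmedSumSketch
{
    // Sum without the single smallest and single largest non-null score.
    public static decimal TrimmedSum(IEnumerable<Item> list)
    {
        var scores = list
            .Where(x => x.Score.HasValue)   // drop nulls so they can't be "skipped" as the min
            .Select(x => x.Score.Value)
            .OrderBy(s => s)
            .ToList();

        // Fewer than three values leave nothing between the min and the max.
        return scores.Skip(1).Take(Math.Max(0, scores.Count - 2)).Sum();
    }
}
```

For scores 5, null, 1, 9, 3 this returns 8 (3 + 5); without the null filter the same Skip/Take chain would return 9 (1 + 3 + 5).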
I know you've already accepted an answer, but here it is: I'm using a single call to Enumerable.Aggregate.
public static long SumExcludingMinAndMax(IEnumerable<int> values)
{
    // first parameter: seed (Tuple<running minimum, running maximum, count, running total>)
    // second parameter: func that folds each value into the accumulator
    // third parameter: func that selects the final result from the accumulator
    var result = values.Aggregate(
        Tuple.Create<int, int, long, long>(int.MaxValue, int.MinValue, 0, 0),
        (accumulate, value) => Tuple.Create<int, int, long, long>(
            Math.Min(accumulate.Item1, value),
            Math.Max(accumulate.Item2, value),
            accumulate.Item3 + 1,
            accumulate.Item4 + value),
        accumulate => accumulate.Item3 < 2 ? 0 : accumulate.Item4 - accumulate.Item1 - accumulate.Item2);
    return result;
}
This is not the nicest code imaginable, but it does have these benefits:
It enumerates the entire collection only once.
It doesn't require much more memory than the IEnumerator and a couple of Tuple<int, int, long, long> accumulators at a time (a benefit you'd lose with OrderBy, ToList and sorting, etc.). This lets it work with arbitrarily large IEnumerable collections.
It is a single LINQ expression (which is what you wanted).
It handles the edge cases (values.Count() < 2) properly:
when there are no values, using Min() and Max() on an IEnumerable would throw an InvalidOperationException;
when there is one value, naïve implementations do something like Sum() - Min() - Max() on the IEnumerable, which returns the single value, negated.
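The same single-pass idea can be written for decimal scores with a C# 7 value tuple as the accumulator, which avoids allocating a Tuple per element. A sketch, assuming the scores have already been projected to non-null decimal values (for example with Where and Select); SinglePassSketch is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SinglePassSketch
{
    // Excludes one minimum and one maximum in a single pass over the values.
    public static decimal SumExcludingMinAndMax(IEnumerable<decimal> values)
    {
        return values.Aggregate(
            // seed: running min, running max, count, running total
            (Min: decimal.MaxValue, Max: decimal.MinValue, Count: 0L, Total: 0m),
            // fold each value into the accumulator
            (acc, v) => (Min: Math.Min(acc.Min, v), Max: Math.Max(acc.Max, v),
                         Count: acc.Count + 1, Total: acc.Total + v),
            // fewer than two values: nothing is left once min and max are removed
            acc => acc.Count < 2 ? 0m : acc.Total - acc.Min - acc.Max);
    }
}
```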
If you want to exclude every occurrence of the minimum and maximum values, pre-calculate both values and then use Enumerable.Where to exclude them:
decimal? min = list.Min(item => item.Score);
decimal? max = list.Max(item => item.Score);
decimal? total = list
.Where(item=> item.Score != min && item.Score != max)
.Sum(item => item.Score);
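The Where approach and the Skip/Take approach differ when the minimum or maximum is duplicated: the first drops every occurrence, the second drops exactly one of each. A small sketch on int values (both method names are hypothetical) to make the difference concrete; note that Min() and Max() throw on an empty sequence:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DuplicateMinMaxSketch
{
    // Drops every occurrence of the minimum and the maximum (the Where approach).
    public static int SumWithoutAllExtremes(IReadOnlyCollection<int> values)
    {
        int min = values.Min();
        int max = values.Max();
        return values.Where(v => v != min && v != max).Sum();
    }

    // Drops exactly one minimum and one maximum (the OrderBy/Skip/Take approach).
    public static int SumWithoutOneOfEach(IReadOnlyCollection<int> values) =>
        values.OrderBy(v => v).Skip(1).Take(values.Count - 2).Sum();
}
```

For { 1, 1, 5, 9, 9 } the first returns 5 and the second returns 15 (1 + 5 + 9).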
You should pre-process the list before summing, to exclude the min and max.

Get unique 3 digit zips from a list C#

I have a list of integers that represent US ZIP codes, and I want to get unique values based on the first three digits of the ZIP code. For example this is my list:
10433
30549
10456
54933
60594
30569
30659
My result should contain only:
10433
30549
54933
60594
30659
The US ZIP codes excluded from my list are 10456 and 30569, because I already have ZIPs that start with 104 and 305.
I really don't know how to get this done. I guess it's not that hard, but I have no idea. I made a function that saves the unique first three digits and then appends two random digits to each of them. But that didn't work out, because I got, for example, 10423, which is not in my list, and there is no pattern that keeps the last two digits of my numbers within a known range.
A little Linq should work. If using a list of ints:
var zips = new[] { 10433, 30549, 10456, 54933, 60594, 30569, 30659 };
var results = zips.GroupBy(z => z / 100).Select(g => g.First());
Or if using a list of strings:
var zips = new[] { "10433", "30549", "10456", "54933", "60594", "30569", "30659" };
var results = zips.GroupBy(z => z.Remove(3)).Select(g => g.First());
Another solution would be to use a custom IEqualityComparer<T>. For ints:
class ZipComparer : IEqualityComparer<int> {
public bool Equals(int x, int y) {
return x / 100 == y / 100;
}
public int GetHashCode(int x) {
return x / 100;
}
}
For strings:
class ZipComparer : IEqualityComparer<string> {
public bool Equals(string x, string y) {
return x.Remove(3) == y.Remove(3);
}
public int GetHashCode(string x) {
return x.Remove(3).GetHashCode();
}
}
Then to use it, you can simply call Distinct:
var result = zips.Distinct(new ZipComparer());
Finally, you can also use MoreLINQ's DistinctBy extension method (also available on NuGet):
var results = zips.DistinctBy(z => z / 100);
// or
var results = zips.DistinctBy(z => z.Remove(3));
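On .NET 6 and later, System.Linq has a built-in Enumerable.DistinctBy, so MoreLINQ isn't required for this. A sketch assuming .NET 6+ and ZIP codes stored as ints (FirstPerPrefix is a hypothetical name). Integer division by 100 gives the three-digit prefix of the zero-padded five-digit code, so a ZIP like 02134 stored as 2134 still lands in prefix 21 (that is, 021).

```csharp
using System.Linq;

public static class ZipPrefixSketch
{
    // Requires .NET 6 or later, where System.Linq includes DistinctBy.
    // Keeps the first ZIP encountered for each three-digit prefix, in input order.
    public static int[] FirstPerPrefix(int[] zips) =>
        zips.DistinctBy(z => z / 100).ToArray();
}
```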
The popular answer here gives code, but doesn't solve the problem. The problem is that you don't seem to have an algorithm. So...
How would you solve this problem on paper?
I would imagine the process would be something like this:
For each number, determine the 3 digit prefix
If you don't already have a zip with that prefix, keep the number
If you do already have a zip with that prefix, discard the number
How would you write this in code?
Well, there are a couple of things you need:
A bucket to keep track of what prefixes you have found, and which values you've kept.
A loop over all of the items
A way to determine the prefix
Here's one way to write this (you can convert it to using strings instead as an exercise):
ICollection<int> GetUniqueZipcodes(int[] zips)
{
Dictionary<int, int> bucket = new Dictionary<int,int>();
foreach (var zip in zips)
{
int prefix = GetPrefix(zip);
if(!bucket.ContainsKey(prefix))
{
bucket.Add(prefix, zip);
}
}
return bucket.Values;
}
int GetPrefix(int zip)
{
return zip / 100;
}
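The same keep-or-discard algorithm can also be written with a HashSet for the prefixes and a List for the kept values, which makes the output order (input order) explicit rather than relying on the order of Dictionary.Values. A sketch; ZipBucketSketch is a hypothetical name:

```csharp
using System.Collections.Generic;

public static class ZipBucketSketch
{
    // For each ZIP: compute the prefix; keep the ZIP only if the prefix is new.
    public static List<int> GetUniqueZipcodes(IEnumerable<int> zips)
    {
        var seenPrefixes = new HashSet<int>();
        var kept = new List<int>();
        foreach (int zip in zips)
        {
            // HashSet.Add returns false when the prefix was already recorded,
            // so the check and the bookkeeping happen in one call.
            if (seenPrefixes.Add(zip / 100))
            {
                kept.Add(zip);
            }
        }
        return kept;
    }
}
```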
Getting concise
Now, many programmers these days would say "OMG so many lines of code this could be a one liner". And they're right, it could. Borrowing from p.s.w.g's answer, this can be condensed (in a very readable manner) to:
var results = zips.GroupBy(z => z / 100).Select(g => g.First());
So how does this work? zips.GroupBy(z => z/100) does the bucketing operation. You end up with a collection of groups that looks like this:
{
104: { 10433, 10456 },
305: { 30549, 30569 },
549: { 54933 },
605: { 60594 },
306: { 30659 }
}
Then we use .Select(g => g.First()), which takes the first item from each group.

What's the most concise way to pick a random element by weight in C#?

Let's assume:
a List<Element>, where Element is:
public class Element {
    public int Weight { get; set; }
}
What I want to achieve is, select an element randomly by the weight.
For example:
Element_1.Weight = 100;
Element_2.Weight = 50;
Element_3.Weight = 200;
So
the chance Element_1 got selected is 100/(100+50+200)=28.57%
the chance Element_2 got selected is 50/(100+50+200)=14.29%
the chance Element_3 got selected is 200/(100+50+200)=57.14%
I know I can create a loop, calculate total, etc...
What I want to learn is the best way to do this with LINQ in ONE line (or as short as possible), thanks.
UPDATE
I found my answer below. The first thing I learned is: LINQ is NOT magic; it's slower than a well-designed loop.
So my question becomes: how do I find a random element by weight (never mind the "as short as possible" part :)
If you want a generic version (useful with a (singleton) randomizer helper; consider whether you need a constant seed or not):
usage:
randomizer.GetRandomItem(items, x => x.Weight)
code:
public T GetRandomItem<T>(IEnumerable<T> itemsEnumerable, Func<T, int> weightKey)
{
var items = itemsEnumerable.ToList();
var totalWeight = items.Sum(x => weightKey(x));
var randomWeightedIndex = _random.Next(totalWeight);
var itemWeightedIndex = 0;
foreach(var item in items)
{
itemWeightedIndex += weightKey(item);
if(randomWeightedIndex < itemWeightedIndex)
return item;
}
throw new ArgumentException("Collection count and weights must be greater than 0");
}
// assuming rnd is an already instantiated instance of the Random class
var max = list.Sum(y => y.Weight);
var rand = rnd.Next(max);
var res = list
.FirstOrDefault(x => rand >= (max -= x.Weight));
This is a fast solution with precomputation. The precomputation takes O(n), the search O(log(n)).
Precompute:
int[] lookup=new int[elements.Length];
lookup[0]=elements[0].Weight-1;
for(int i=1;i<lookup.Length;i++)
{
lookup[i]=lookup[i-1]+elements[i].Weight;
}
To generate:
int total=lookup[lookup.Length-1]+1; // lookup holds inclusive upper bounds, so add 1 to get the sum of the weights
int chosen=random.Next(total);
int index=Array.BinarySearch(lookup,chosen);
if(index<0)
index=~index;
return elements[index];
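The precomputed version can be packaged as a small class. A sketch, assuming positive integer weights and an injected Random; WeightedPicker is a hypothetical name. Because each entry stores an inclusive upper bound (cumulative weight minus one), the value passed to Random.Next is the last entry plus one, i.e. the sum of all weights.

```csharp
using System;
using System.Collections.Generic;

public sealed class WeightedPicker<T>
{
    private readonly T[] _items;
    private readonly int[] _upperBounds; // inclusive cumulative upper bound per item
    private readonly Random _random;

    // O(n) precomputation: _upperBounds[i] = (w0 + ... + wi) - 1.
    public WeightedPicker(IReadOnlyList<T> items, Func<T, int> weight, Random random)
    {
        if (items.Count == 0) throw new ArgumentException("At least one item is required.");
        _items = new T[items.Count];
        _upperBounds = new int[items.Count];
        _random = random;
        int running = 0;
        for (int i = 0; i < items.Count; i++)
        {
            int w = weight(items[i]);
            // Zero weights would create duplicate bounds and confuse the binary search.
            if (w <= 0) throw new ArgumentException("Weights must be positive.");
            running += w;
            _items[i] = items[i];
            _upperBounds[i] = running - 1;
        }
    }

    // O(log n) per pick.
    public T Pick()
    {
        int total = _upperBounds[_upperBounds.Length - 1] + 1; // sum of all weights
        int chosen = _random.Next(total);                        // 0 .. total - 1
        int index = Array.BinarySearch(_upperBounds, chosen);
        if (index < 0)
            index = ~index; // index of the first upper bound greater than chosen
        return _items[index];
    }
}
```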
But if the list changes between each search, you can instead use a simple O(n) linear search:
int total=elements.Sum(e=>e.Weight);
int chosen=random.Next(total);
int runningSum=0;
foreach(var element in elements)
{
    runningSum+=element.Weight;
    if(chosen<runningSum)
        return element;
}
throw new InvalidOperationException("No element selected; all weights must be positive.");
This could work:
int weightsSum = list.Sum(element => element.Weight);
int start = 1;
var partitions = list.Select(element =>
{
var oldStart = start;
start += element.Weight;
return new { Element = element, End = oldStart + element.Weight - 1};
});
var randomWeight = random.Next(weightsSum);
var randomElement = partitions.First(partition => partition.End > randomWeight).Element;
Basically, for each element a partition is created with an end weight.
In your example, Element1 would be associated with (1-->100), Element2 with (101-->150) and so on...
Then a random weight is generated, and we look for the range associated with it.
You could also compute the sum inside the Select lambda, but that would introduce another side effect...
Note that I'm not saying this is elegant or fast. But it does use LINQ (not in one line...)
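.NET 6 introduced Enumerable.MaxBy, which makes a one-liner possible.
Taking the maximum of rng.Next(x.Weight) is not proportional to the weights, though: with weights 1 and 2, rng.Next(1) always returns 0, so the outcome comes down to one coin flip plus tie-breaking rather than a 1:2 ratio. Using the key U^(1/weight), where U is uniform in [0, 1), does give each element a chance of weight / total:
list.MaxBy(x => Math.Pow(rng.NextDouble(), 1.0 / x.Weight));
Because the keys are doubles, ties are extremely unlikely, so the weights don't need to be scaled up.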
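A runnable sketch of MaxBy-based selection with the U^(1/w) key (a technique due to Efraimidis and Spirakis). Why it works: if X = U^(1/w), then P(X ≤ t) = t^w, and integrating shows the element with weight w_i has the largest key with probability w_i / Σw. This assumes .NET 6+ for MaxBy and positive weights; PickWeighted is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WeightedMaxBySketch
{
    // Each element gets the random key U^(1/w), U uniform in [0, 1);
    // the element with the largest key is chosen with probability w / (sum of weights).
    public static T PickWeighted<T>(IEnumerable<T> items, Func<T, double> weight, Random rng) =>
        items.MaxBy(x => Math.Pow(rng.NextDouble(), 1.0 / weight(x)));
}
```

This draws one random number per element on every pick, so it is O(n) per selection, like the linear search.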

Convert Sum to an Aggregate product expression

I have this expression:
group i by i.ItemId into g
select new
{
Id = g.Key,
Score = g.Sum(i => i.Score)
}).ToDictionary(o => o.Id, o => o.Score);
and instead of g.Sum I'd like to get the mathematical product using Aggregate.
To make sure it worked the same as .Sum (but as a product), I first tried to make an Aggregate function that would just return the sum...
Score = g.Aggregate(0.0, (sum, nextItem) => sum + nextItem.Score.Value)
However, this does not give the same result as using .Sum. Any ideas why?
nextItem.Score is of type double?.
public static class MyExtensions
{
public static double Product(this IEnumerable<double?> enumerable)
{
return enumerable
.Aggregate(1.0, (accumulator, current) => accumulator * current.Value);
}
}
The thing is that in your example you start the multiplication with 0.0. A multiplication by zero yields zero, so at the end the result will be zero.
The correct approach uses the identity property of multiplication: just as adding zero to a number leaves the number unchanged, multiplying it by 1 leaves it unchanged. Hence, the correct way to start a product aggregate is to kick off the multiplication with the number 1.0.
If you aren't sure about the initial value in your aggregate query and you don't actually need one (like in this example), I would recommend not using one at all.
You can use the Aggregate overload that doesn't take an initial value - http://msdn.microsoft.com/en-us/library/bb549218.aspx
Like this:
int product = sequence.Aggregate((acc, x) => acc * x);
Which evaluates to ((item1 * item2) * item3) * ... * itemN.
instead of
double product = sequence.Aggregate(1.0, (acc, x) => acc * x);
Which evaluates to (((1.0 * item1) * item2) * ...) * itemN.
Edit:
There is one important difference, though. The former throws an InvalidOperationException when the input sequence is empty. The latter returns the seed value, 1.0.
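Enumerable.Sum over double? skips null values, while the Product extension above calls .Value and would throw on a null. A sketch of a product that treats nulls the same way Sum does; ProductIgnoringNulls is a hypothetical name, and skipping nulls is an assumption about the desired behavior:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ProductExtensions
{
    // Skips nulls (as Sum does for double?) and seeds with 1.0, the
    // multiplicative identity, so an empty or all-null sequence yields 1.0.
    public static double ProductIgnoringNulls(this IEnumerable<double?> values) =>
        values
            .Where(v => v.HasValue)
            .Aggregate(1.0, (acc, v) => acc * v.Value);
}
```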
