I have a simple class:
class Balls
{
public int BallType;
}
And i have a really simple list:
var balls = new List<Balls>()
{
new Balls() { BallType = 1},
new Balls() { BallType = 1},
new Balls() { BallType = 1},
new Balls() { BallType = 2}
};
I've used GroupBy on this list and I want to get back the key which has the highest count/amount:
After I used x.GroupBy(q => q.BallType) I tried to use .Max(), but it returns 3 and I need the key which is 1.
I also tried to use Console.WriteLine(x.GroupBy(q => q.Balltype).Max().Key); but it throws System.ArgumentException.
Here's what I came up with:
var mostCommonBallType = balls
.GroupBy(k => k.BallType)
.OrderBy(g => g.Count())
.Last().Key
You group by the BallType, order by the count of items in the group, get the last value (since order by is in an ascending order, the most common value would be the last) and then return it's key
Some came up with the idea to order the sequence:
var mostCommonBallType = balls
.GroupBy(k => k.BallType)
.OrderBy(g => g.Count())
.Last().Key
Apart from that it is more efficient to OrderByDescending and then take the FirstOrDefault, you also get in trouble if your collection of Balls is empty.
If you use a different overload of GroupBy, you won't have these problems
var mostCommonBallType = balls.GroupBy(
// KeySelector:
k => k.BallType,
// ResultSelector:
(ballType, ballsWithThisBallType) => new
{
BallType = ballType,
Count = ballsWithThisBallType.Count(),
})
.OrderByDescending(group => group.Count)
.Select(group => group.BallType)
.FirstOrDefault();
This solves the previously mentioned problems. However, if you only need the 1st element, why would you order the 2nd and the 3rd element? Using Aggregate instead of OrderByDescending will enumerate only once:
Assuming your collection is not empty:
var result = ... GroupBy(...)
.Aggregate( (groupWithHighestBallCount, nextGroup) =>
(groupWithHighestBallCount.Count >= nextGroup.Count) ?
groupWithHighestBallCount : nextGroup)
.Select(...).FirstOrDefault();
Aggregate takes the first element of your non-empty sequence, and assigns it to groupWithHighestBallCount. Then it iterates over the rest of the sequence, and compare this nextGroup.Count with the groupWithHighestBallCount.Count. It keeps the one with the hightes value as the next groupWithHighestBallCount. The return value is the final groupWithHighestBallCount.
See that Aggregate only enumerates once?
Related
I would like to sort a List<string> in a particular way. Below is a unit test showing the input, the specific way (which I am calling a "hierarchy" - feel free to correct my terminology so that I may learn), and the desired output. The code should be self explanatory.
[Test]
public void CustomSortByHierarchy()
{
List<string> input = new List<string>{"TJ", "DJ", "HR", "HR", "TJ"};
List<string> hierarchy = new List<string>{"HR", "TJ", "DJ" };
List<string> sorted = input.Sort(hierarchy); // <-- does not compile. How do I sort by the hierarchy?
// ...and if the sort worked as desired, these assert statements would return true:
Assert.AreEqual("HR", sorted[0]);
Assert.AreEqual("HR", sorted[1]);
Assert.AreEqual("TJ", sorted[2]);
Assert.AreEqual("TJ", sorted[3]);
Assert.AreEqual("DJ", sorted[4]);
}
Another way to do it:
var hierarchy = new Dictionary<string, int>{
{ "HR", 1},
{ "TJ", 2},
{ "DJ", 3} };
var sorted = strings.OrderBy(s => hierarchy[s]).ToList();
There are so many ways to do this.
It's not great to create a static dictionary - especially when you have a static list of the values already in the order that you want (i.e. List<string> hierarchy = new List<string>{"HR", "TJ", "DJ" };). The problem with a static dictionary is that it is static - to change it you must recompile your program - and also it's prone to errors - you might mistype a number. It's best to dynamically create the dictionary. That way you can adjust your hierarchy at run-time and use it to order your input.
Here's the basic way to create the dictionary:
Dictionary<string, int> indices =
hierarchy
.Select((value, index) => new { value, index })
.ToDictionary(x => x.value, x => x.index);
Then it's an easy sort:
List<string> sorted = input.OrderBy(x => indices[x]).ToList();
However, if you have a missing value in the hierarchy then this will blow up with a KeyNotFoundException exception.
Try with this input:
List<string> input = new List<string> { "TJ", "DJ", "HR", "HR", "TJ", "XX" };
You need to decide if you are removing missing items from the list or concatenating them at the end of the list.
To remove you'd do this:
List<string> sorted =
input
.Where(x => indices.ContainsKey(x))
.OrderBy(x => indices[x])
.ToList();
Or to sort to the end you'd do this:
List<string> sorted =
input
.OrderBy(x => indices.ContainsKey(x) ? indices[x] : int.MaxValue)
.ThenBy(x => x) // groups missing items together and is optional
.ToList();
If you simply want to remove items from input that aren't in hierarchy then there are a couple of other options that might be appealing.
Try this:
List<string> sorted =
(
from x in input
join y in hierarchy.Select((value, index) => new { value, index })
on x equals y.value
orderby y.index
select x
).ToList();
Or this:
ILookup<string, string> lookup = input.ToLookup(x => x);
List<string> sorted = hierarchy.SelectMany(x => lookup[x]).ToList();
Personally, I like this last one. It's a two liner and it doesn't rely on indices at all.
I have got this assignment. I need to create method which works with JSON data in this form:
On input N, what is top N of movies? The score of a movie is its average rate
So I have a JSONfile with 5 mil. movies inside. Each row looks like this:
{ Reviewer:1, Movie:1535440, Grade:1, Date:'2005-08-18'},
{ Reviewer:1, Movie:1666666, Grade:2, Date:'2006-09-20'},
{ Reviewer:2, Movie:1535440, Grade:3, Date:'2008-05-10'},
{ Reviewer:3, Movie:1535440, Grade:5, Date:'2008-05-11'},
This file is deserialized and then saved as a IEnumerable. And then I wanted to create a method, which returns List<int> where int is MovieId. Movies in the list are ordered descending and the amount of "top" movies is specified as a parameter of the method.
My method looks like this:
public List<int> GetSpecificAmountOfBestMovies(int amountOfMovies)
{
var moviesAndAverageGradeSortedList = _deserializator.RatingCollection()
.GroupBy(movieId => movieId.Movie)
.Select(group => new
{
Key = group.Key,
Average = group.Average(g => g.Grade)
})
.OrderByDescending(a => a.Average)
.Take(amountOfMovies)
.ToList();
var moviesSortedList = new List<int>();
foreach (var movie in moviesAndAverageGradeSortedList)
{
var key = movie.Key;
moviesSortedList.Add(key);
}
return moviesSortedList;
}
So moviesAndAverageGradeSortedList returns List<{int,double}> because of the .select method. So I could not return this value as this method is type of List<int> because I want only movieIds not their average grades.
So I created a new List<int> and then foreach loop which go through the moviesAndAverageGradeSortedList and saves only Keys from that List.
I think this solution is not correct because foreach loop can be then very slow when I put big number as a parameter. Does somebody know, how can I get "Keys" (movieIds) from the first list and therefore avoid creating another List<int> and foreach loop?
I will be thankful for every solution.
You can avoid the second list creation by just adding another .Select after the ordering. Also to make it all a bit cleaner you could:
return _deserializator.RatingCollection()
.GroupBy(i => i.Movie)
.OrderByDescending(g => g.Average(i => i.Grade))
.Select(g => g.Key)
.Take(amountOfMovies)
.ToList();
Note that this won't really improve performance much (if at all) because even in your original implementation the creation of the second list is done only on the subset of the first n items. The expensive operations are the ordering by the averages of the group and that you want to perform on all items in the json file, regardless to the number of item you want to return
You could add another select after you have ordered the list by average
var moviesAndAverageGradeSortedList = _deserializator.RatingCollection()
.GroupBy(movieId => movieId.Movie)
.Select(group => new
{
Key = group.Key,
Average = group.Average(g => g.Grade)
})
.OrderByDescending(a => a.Average)
.Take(amountOfMovies)
.Select(s=> s.Key)
.ToList();
I want to access the first, second, third elements in a list. I can use built in .First() method for accessing first element.
My code is as follows:
Dictionary<int, Tuple<int, int>> pList = new Dictionary<int, Tuple<int, int>>();
var categoryGroups = pList.Values.GroupBy(t => t.Item1);
var highestCount = categoryGroups
.OrderByDescending(g => g.Count())
.Select(g => new { Category = g.Key, Count = g.Count() })
.First();
var 2ndHighestCount = categoryGroups
.OrderByDescending(g => g.Count())
.Select(g => new { Category = g.Key, Count = g.Count() })
.GetNth(1);
var 3rdHighestCount = categoryGroups
.OrderByDescending(g => g.Count())
.Select(g => new { Category = g.Key, Count = g.Count() })
.GetNth(2);
twObjClus.WriteLine("--------------------Cluster Label------------------");
twObjClus.WriteLine("\n");
twObjClus.WriteLine("Category:{0} Count:{1}",
highestCount.Category, highestCount.Count);
twObjClus.WriteLine("\n");
twObjClus.WriteLine("Category:{0} Count:{1}",
2ndHighestCount.Category, 2ndHighestCount.Count);
// Error here i.e. "Can't use 2ndHighestCount.Category here"
twObjClus.WriteLine("\n");
twObjClus.WriteLine("Category:{0} Count:{1}",
3rdHighestCount.Category, 3rdHighestCount.Count);
// Error here i.e. "Can't use 3rdHighestCount.Category here"
twObjClus.WriteLine("\n");
I have written extension method GetNth() as:
public static IEnumerable<T> GetNth<T>(this IEnumerable<T> list, int n)
{
if (n < 0)
throw new ArgumentOutOfRangeException("n");
if (n > 0){
int c = 0;
foreach (var e in list){
if (c % n == 0)
yield return e;
c++;
}
}
}
Can I write extension methods as .Second(), .Third() similar to
built in method .First() to access second and third indices?
If what you're looking for is a single object, you don't need to write it yourself, because a built-in method for that already exists.
foo.ElementAt(1)
will get you the second element, etc. It works similarly to First and returns a single object.
Your GetNth method seems to be returning every Nth element, instead of just the element at index N. I'm assuming that's not what you want since you said you wanted something similar to First.
Since #Eser gave up and doesn't want to post the correct way as an answer, here goes:
You should rather do the transforms once, collect the results into an array, and then get the three elements from that. The way you're doing it right now results in code duplication as well as grouping and ordering being done multiple times, which is inefficient.
var highestCounts = pList.Values
.GroupBy(t => t.Item1)
.OrderByDescending(g => g.Count())
.Select(g => new { Category = g.Key, Count = g.Count() })
.Take(3)
.ToArray();
// highestCounts[0] is the first count
// highestCounts[1] is the second
// highestCounts[2] is the third
// make sure to handle cases where there are less than 3 items!
As an FYI, if you some day need just the Nth value and not the top three, you can use .ElementAt to access values at an arbitrary index.
I have a some code to sort my collection in linq in C#. I want it to group by the houseName to sum over the volumes, order that collection, but also pass a third parameter, pctVol, to the new sorted collection. What am I doing wrong? I know that the problem lies in the pctVol = group.Selecct(item => item.pctVol) line.
var inBetween = this.GroupBy(item => item.houseName)
.Select(group =>
new DataItem
{
houseName = group.Key,
VOLUME = group.Sum(item => item.VOLUME),
pctVol = group.Select(item => item.pctVol)
})
.ToList();
ObservableCollection<DataItem> objSort = new ObservableCollection<DataItem>(inBetween.OrderBy(DataItem =>
DataItem.VOLUME));
return objSort;
What kind of value do you want pctVol to have? With that code, it looks like DataItem.pctVol will be an IEnumerable containing all the pctVol values in that group.
If you want a single value, and all the pctVol values in each group are guaranteed to be the same, then you could just take the value from the first element, like this: pctVol = group.First().pctVol
I am having a bit of problem in that I am trying to GroupBy using linq and although it works, it only works when I eliminate one element of the code.
nestedGroupedStocks = stkPositions.GroupBy(x => new { x.stockName,
x.stockLongshort,x.stockIsin, x.stockPrice })
.Select(y => new stockPos
{
stockName = y.Key.stockName,
stockLongshort = y.Key.stockLongshort,
stockIsin = y.Key.stockIsin,
stockPrice = y.Key.stockPrice,
stockQuantity = y.Sum(x => x.stockQuantity)
}).ToList();
The above code Groups my stock positions and the results in the list containing 47 entries but what it fails to do is sum duplicate stocks with different quantities...
nestedGroupedStocks = stkPositions.GroupBy(x => new { x.stockName,
x.stockIsin, x.stockPrice })
.Select(y => new stockPos
{
stockName = y.Key.stockName,
stockIsin = y.Key.stockIsin,
stockPrice = y.Key.stockPrice,
stockQuantity = y.Sum(x => x.stockQuantity)
}).ToList();
However, if I elimanate "x.longshort" then I get the desired result, 34 stocks summed up, but the then all longshort elements in the list are null...
Its driving me nuts :-)
This part
.GroupBy(x => new { x.stockName,x.stockLongshort,x.stockIsin, x.stockPrice })
is the problem. You are trying to group the elements by that new object as key, but x.stockLongshort will most likely change for every single element in the list, making the GroupBy fail unless the name and the stockLongshort will match in both elements ( as for the other 2 fields, but I assume those are always the same).
nestedGroupedStocks = stkPositions.GroupBy(x => x.stockName)
.Select(y => new stockPos
{
stockName = y.First().stockName,
stockLongshort = y.First().stockLongshort,
stockIsin = y.First().stockIsin,
stockPrice = y.First().stockPrice,
stockQuantity = y.Sum(z => z.stockQuantity)
}).ToList();
Note that the stockLongshort property is set to be equal to the value of the first element in the group. You could set it to 0 if that's more usefull to you.
Longer Explanation
GroupBy returns IEnumerable<IGrouping<TKey, TSource>> , that is, a "set" (that you can enumarte) of Groups, with each element of the same group sharing the same Key, that you have defined with the lambda expression in the argument.
If you put x.stockLongshort as a property of the Key object, that becomes a discriminant of the evaluation made by GroupBy, that, as a consequence, puts two elements that differ just by that property in two distinct groups.