How to aggregate a LINQ query by different groupings - C#

How do you perform multiple separate aggregations on different groupings in LINQ?
For example, I have a table:
UNO | YOS | Ranking | Score
123456 | 1 | 42 | 17
645123 | 3 | 84 | 20
I want to perform a set of aggregations on this data, both grouped and ungrouped, like:
var grouped = table.GroupBy(x => x.Score)
    .Select(x => new
    {
        Score = x.Key.ToString(),
        OverallAverageRank = x.Average(y => y.Ranking),
        Year1RankAvg = x.Where(y => y.YOS == 1).Average(y => y.Ranking),
        Year2RankAvg = x.Where(y => y.YOS == 2).Average(y => y.Ranking)
        //...etc
    });
I also want to perform different aggregations (such as standard deviation) on the same slices and on the whole data set.
I can't figure out how to both group and not group by YOS at the same time. And while this compiles fine, at runtime I get "Sequence contains no elements" if any of the YOS averages are included.

As with anything in programming, when you have a sequence of similar items, use a collection. In this case I left it as an IEnumerable, but you could make it a List, or a Dictionary keyed by YOS, if desired.
var ans = table.GroupBy(t => t.Score)
    .Select(tg => new {
        Score = tg.Key,
        OverallAverageRank = tg.Average(t => t.Ranking),
        YearRankAvgs = tg.GroupBy(t => t.YOS)
                         .Select(tyg => new { YOS = tyg.Key, RankAvg = tyg.Average(t => t.Ranking) })
    });
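A minimal in-memory sketch of the nested grouping above, runnable as a C# top-level program. It assumes the rows are value tuples, adds one hypothetical row so that a Score group contains two YOS values, and uses a hypothetical StdDev helper (population standard deviation), since LINQ has no built-in standard deviation operator; the helper is not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Rows from the question's table as (UNO, YOS, Ranking, Score).
// The third row is hypothetical, added so Score 17 has two YOS slices.
var table = new List<(int UNO, int YOS, int Ranking, int Score)>
{
    (123456, 1, 42, 17),
    (645123, 3, 84, 20),
    (777777, 2, 50, 17), // hypothetical row
};

// Hypothetical helper: population standard deviation of a non-empty sequence.
static double StdDev(IEnumerable<double> values)
{
    var list = values.ToList();
    double mean = list.Average();
    return Math.Sqrt(list.Average(v => (v - mean) * (v - mean)));
}

var ans = table
    .GroupBy(t => t.Score)
    .Select(tg => new
    {
        Score = tg.Key,
        OverallAverageRank = tg.Average(t => t.Ranking),
        OverallRankStdDev = StdDev(tg.Select(t => (double)t.Ranking)),
        // A YOS sub-group only exists if it has rows, so Average never sees an empty sequence.
        YearRankAvgs = tg.GroupBy(t => t.YOS)
                         .Select(tyg => new { YOS = tyg.Key, RankAvg = tyg.Average(t => t.Ranking) })
                         .ToList()
    })
    .ToList();

foreach (var a in ans)
    Console.WriteLine($"Score {a.Score}: avg {a.OverallAverageRank}, sd {a.OverallRankStdDev}; " +
                      string.Join(", ", a.YearRankAvgs.Select(y => $"YOS {y.YOS} -> {y.RankAvg}")));
```

The same helper could be applied inside each YOS sub-group to get per-year standard deviations.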
If you need the range of years from 1 to the maximum (or some other number) filled in, you can extend the answer:
var ans2 = ans.Select(soryr => new {
        soryr.Score,
        soryr.OverallAverageRank,
        YearRankDict = soryr.YearRankAvgs.ToDictionary(yr => yr.YOS),
        YearMax = soryr.YearRankAvgs.Max(yr => yr.YOS)
    })
    .Select(soryr => new {
        Score = soryr.Score,
        OverallAverageRank = soryr.OverallAverageRank,
        // Enumerable.Range(start, count) yields 1..YearMax; missing years get a 0.0 placeholder
        YearRankAvgs = Enumerable.Range(1, soryr.YearMax)
            .Select(yos => soryr.YearRankDict.ContainsKey(yos)
                ? soryr.YearRankDict[yos]
                : new { YOS = yos, RankAvg = 0.0 })
            .ToList()
    });
If you prefer, you could modify the original ans to return RankAvg as double? and use null instead of 0.0 when filling in missing years.
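The "Sequence contains no elements" error in the question comes from calling Average on an empty sequence of non-nullable numbers, which throws InvalidOperationException. A sketch of the question's original shape, assuming in-memory value tuples and hypothetical sample rows, shows that projecting to double? selects the nullable overload of Average, which returns null for an empty slice instead of throwing. Whether a particular database provider translates this the same way is an assumption to check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows as (YOS, Ranking, Score); note that no row has YOS == 2.
var table = new List<(int YOS, int Ranking, int Score)>
{
    (1, 42, 17),
    (1, 58, 17),
    (3, 84, 20),
};

var grouped = table
    .GroupBy(x => x.Score)
    .Select(x => new
    {
        Score = x.Key,
        OverallAverageRank = x.Average(y => y.Ranking),
        // Casting to double? picks Average(IEnumerable<double?>), which returns
        // null for an empty sequence instead of throwing InvalidOperationException.
        Year1RankAvg = x.Where(y => y.YOS == 1).Average(y => (double?)y.Ranking),
        Year2RankAvg = x.Where(y => y.YOS == 2).Average(y => (double?)y.Ranking)
    })
    .ToList();

foreach (var g in grouped)
    Console.WriteLine($"{g.Score}: overall {g.OverallAverageRank}, " +
                      $"Y1 {g.Year1RankAvg?.ToString() ?? "null"}, Y2 {g.Year2RankAvg?.ToString() ?? "null"}");
```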

Related

Using GroupBy to compute an average or count based on all data up to the corresponding date

I have an AssessmentItems DB object whose items record which user evaluated (EvaluatorId), which submission (SubmissionId), based on which rubric item or criterion (RubricItemId), and when (DateCreated).
I group this object by RubricItemId and DateCreated to compute some daily statistics for each assessment criterion (or rubric item).
For example, I compute the AverageScore, which works fine and returns an output like: RubricItem: 1, Day: 15/01/2019, AverageScore: 3.2.
var result = _context.AssessmentItems
    .Include(ai => ai.RubricItem)
    .Include(ai => ai.Assessment)
    .Where(ai => ai.RubricItem.RubricId == rubricId && ai.Assessment.Submission.ReviewRoundId == reviewRoundId)
    .Select(ai => new
    {
        ai.Id,
        DateCreated = ai.DateCreated.ToShortDateString(), //.ToString("yyyy-MM-dd"),
        ai.CurrentScore,
        ai.RubricItemId,
        ai.Assessment.SubmissionId,
        ai.Assessment.EvaluatorId
    })
    .GroupBy(ai => new { ai.RubricItemId, ai.DateCreated })
    .Select(g => new
    {
        g.Key.RubricItemId,
        g.Key.DateCreated,
        AverageScore = g.Average(ai => ai.CurrentScore),
        NumberOfStudentsEvaluating = g.Select(ai => ai.EvaluatorId).Distinct().Count(),
    }).ToList();
What I want is to compute the average up to each day. That is, instead of calculating the average for that day alone, I want the average up to and including that day (taking into account the assessment scores of the preceding days). In the same way, when I compute NumberOfStudentsEvaluating, I want it to show the total number of students who participated in the evaluation up to that day.
One approach could be to iterate through the result and compute these properties again:
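foreach (var i in result)
{
    // conceptual only: anonymous-type properties are read-only, and string dates can't be compared with <=
    i.AverageScore = result.Where(r => r.DateCreated <= i.DateCreated).Select(r => r.AverageScore).Average();
}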
But this is quite costly. I wonder whether it is possible to tweak the code to achieve this, or whether I should start from scratch with another approach.
If you split the query into two halves, you can compute the average the way you want (I also computed NumberOfStudentsEvaluating on the same basis), but I am not sure whether EF/EF Core will be able to translate it to SQL:
var base1 = _context.AssessmentItems
    .Include(ai => ai.RubricItem)
    .Include(ai => ai.Assessment)
    .Where(ai => ai.RubricItem.RubricId == rubricId && ai.Assessment.Submission.ReviewRoundId == reviewRoundId)
    .Select(ai => new {
        ai.Id,
        ai.DateCreated,
        ai.CurrentScore,
        ai.RubricItemId,
        ai.Assessment.SubmissionId,
        ai.Assessment.EvaluatorId
    })
    .GroupBy(ai => ai.RubricItemId);

var ans1 = base1
    // For each rubric item, pair every distinct DateCreated with all items created on or before it.
    // If DateCreated carries a time component, the distinct values are per timestamp, not per day.
    .SelectMany(rig => rig.Select(ai => ai.DateCreated).Distinct()
        .Select(DateCreated => new {
            RubricItemId = rig.Key,
            DateCreated,
            Items = rig.Where(b => b.DateCreated <= DateCreated)
        }))
    .Select(g => new {
        g.RubricItemId,
        DateCreated = g.DateCreated.ToShortDateString(), //.ToString("yyyy-MM-dd"),
        AverageScore = g.Items.Average(ai => ai.CurrentScore),
        NumberOfStudentsEvaluating = g.Items.Select(ai => ai.EvaluatorId).Distinct().Count(),
    }).ToList();
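The prefix filter above rescans all earlier rows for every date. If the filtered rows are first materialized in memory (for example with ToList()), a single pass with running totals gives the same kind of cumulative figures. A sketch, assuming hypothetical sample rows held as value tuples and grouping by calendar day via DateTime.Date instead of ToShortDateString(); note that it averages all individual scores up to each day, not the daily averages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows as (RubricItemId, DateCreated, CurrentScore, EvaluatorId);
// in practice these would be the materialized output of the filtered query.
var items = new List<(int RubricItemId, DateTime DateCreated, double CurrentScore, int EvaluatorId)>
{
    (1, new DateTime(2019, 1, 15, 9, 0, 0), 3.0, 10),
    (1, new DateTime(2019, 1, 15, 14, 0, 0), 4.0, 11),
    (1, new DateTime(2019, 1, 16, 10, 0, 0), 2.0, 10),
    (1, new DateTime(2019, 1, 17, 8, 0, 0), 5.0, 12),
};

var cumulative = new List<(int RubricItemId, DateTime Day, double AverageScore, int NumberOfStudentsEvaluating)>();

foreach (var rubric in items.GroupBy(i => i.RubricItemId))
{
    double runningSum = 0;
    int runningCount = 0;
    var evaluatorsSoFar = new HashSet<int>();

    // Visit the days in order, folding each day's rows into the running totals,
    // so every row is read once instead of re-scanning all earlier days.
    foreach (var day in rubric.GroupBy(i => i.DateCreated.Date).OrderBy(d => d.Key))
    {
        foreach (var row in day)
        {
            runningSum += row.CurrentScore;
            runningCount++;
            evaluatorsSoFar.Add(row.EvaluatorId);
        }
        cumulative.Add((rubric.Key, day.Key, runningSum / runningCount, evaluatorsSoFar.Count));
    }
}

foreach (var c in cumulative)
    Console.WriteLine($"RubricItem: {c.RubricItemId}, Day: {c.Day:dd/MM/yyyy}, " +
                      $"AverageScore: {c.AverageScore}, Evaluators: {c.NumberOfStudentsEvaluating}");
```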

Multiple fields in GroupBy clause in LINQ (where one is a computed field)

I have a LINQ query to which I need to add two fields as GROUP BY keys. I can easily group by as many column fields as I like, but the problem occurs when one of the fields is a calculated field. I can't get my head around how to add the second attribute in this case.
var values = intermediateValues
    //.GroupBy(x => new {x.Rate, x.ExpiryDate })
    .GroupBy(r => new { Rate = ((int)(r.Rate / BucketSize)) * BucketSize })
    .Select(y => new FXOptionScatterplotValue
    {
        Volume = y.Sum(z => z.TransactionType == "TERMINATION" ? -z.Volume : z.Volume),
        Rate = y.Key.Rate,
        ExpiryDate = y.Key.ExpiryDate,
        Count = y.Count()
    }).ToArray();
In the code sample above, I would like to add ExpiryDate to my existing GroupBy clause, which already contains the computed Rate field.
So just include it as you have in the commented-out code:
.GroupBy(r => new { Rate = ((int)(r.Rate / BucketSize)) * BucketSize,
r.ExpiryDate })
This might help:
var values = intermediateValues
    //.GroupBy(x => new {x.Rate, x.ExpiryDate })
    .GroupBy(r => new { Rate = ((int)(r.Rate / BucketSize)) * BucketSize, ExpiryDate1 = r.ExpiryDate })
    .Select(y => new FXOptionScatterplotValue
    {
        Volume = y.Sum(z => z.TransactionType == "TERMINATION" ? -z.Volume : z.Volume),
        Rate = y.Key.Rate,
        ExpiryDate = y.Key.ExpiryDate1,
        Count = y.Count()
    }).ToArray();
Just give the anonymous-type member an explicit name such as ExpiryDate1, and use that name when reading it from y.Key.
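A self-contained sketch of a composite key with one computed member and one plain member. BucketSize, the sample rows and their types are assumptions, and an anonymous result type stands in for FXOptionScatterplotValue, whose definition isn't shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

const decimal BucketSize = 0.05m; // hypothetical bucket width

// Hypothetical rows as (Rate, ExpiryDate, Volume, TransactionType).
var intermediateValues = new List<(decimal Rate, DateTime ExpiryDate, decimal Volume, string TransactionType)>
{
    (1.12m, new DateTime(2024, 3, 15), 100m, "NEW"),
    (1.13m, new DateTime(2024, 3, 15), 40m, "TERMINATION"),
    (1.12m, new DateTime(2024, 6, 21), 70m, "NEW"),
};

var values = intermediateValues
    // The computed bucket needs an explicit member name (Rate);
    // the plain column can reuse its own name (ExpiryDate).
    .GroupBy(r => new { Rate = ((int)(r.Rate / BucketSize)) * BucketSize, r.ExpiryDate })
    .Select(y => new
    {
        Volume = y.Sum(z => z.TransactionType == "TERMINATION" ? -z.Volume : z.Volume),
        y.Key.Rate,
        y.Key.ExpiryDate,
        Count = y.Count()
    })
    .ToArray();

foreach (var v in values)
    Console.WriteLine($"{v.Rate} {v.ExpiryDate:yyyy-MM-dd}: volume {v.Volume}, count {v.Count}");
```

Anonymous types compare member by member, so two rows land in the same group only when both the bucket and the expiry date match.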

Compare two lists to get waterfall chart data in LINQ

My object has three fields: Term, Subject and Mark. I want a list of the items whose marks differ for any subject.
For example:
First list
Term1, English, 90
Term1, Maths, 60
Term1, Physics, 30
Second list
Term2, English, 95
Term2, Maths, 60
Term2, Chemistry, 20
What I finally want is:
English : +5
Physics : +30
Chemistry : -20
I am using the query below to get the difference, but it fails if the key field values (Subject in this case) are not the same in both lists (e.g. Chemistry is present in list2 but not in list1):
var diffData = list1.Union(list2)
    .GroupBy(m => m.Subject)
    .Select(d => new
    {
        Subject = d.Key,
        Difference = d.OrderBy(m => m.Term).Select(s => s.Mark).Aggregate((t1, t2) => t2 - t1)
    }).Where(m => m.Difference != 0).ToList();
Please help
var diffs = list1.Union(list2)
    //Create groups where the key is subject and the value is the
    //list of positive marks for Term2 and negative marks for Term1
    .GroupBy(c => c.Subject, c => c.Term == "Term2" ? c.Mark : -c.Mark)
    .Select(s => new
    {
        Subject = s.Key,
        Difference = s.Sum()
    })
    .Where(s => s.Difference != 0);
var diffs2 = list1.Union(list2)
    .GroupBy(c => c.Subject)
    .Select(s =>
    {
        //For a more general and slightly different algorithm, you can
        //subtract all the marks for each subject except the last term
        //mark from the last term mark (e.g. 95 - 90 for English, or 30 -
        //n/a for Physics, because there's only one term for Physics)
        var marks = s.OrderByDescending(c => c.Term).Select(c => c.Mark);
        var lastTermMark = marks.First();
        return new
        {
            Subject = s.Key,
            Difference = marks.Skip(1)
                .Aggregate(lastTermMark, (diff, mark) => diff - mark)
        };
    })
    .Where(s => s.Difference != 0);
Try it this way and you will get your output:
var output = (from a in list1
              from b in list2
              where a.Subject == b.Subject
              // Term2 minus Term1; subjects present in only one list are dropped by this join
              select new { Subject = a.Subject, Marks = b.Mark - a.Mark, a.Term }).ToList();
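A runnable sketch of the signed-sum approach from the first answer, using the question's sample data as value tuples. It swaps Union for Concat (an assumption: Union would silently drop rows that are exact duplicates). With the convention Term2 minus Term1 it yields English +5, Physics -30 and Chemistry +20; the question lists the opposite signs for Physics and Chemistry, so flip the element selector if that is the convention you need.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The question's sample data as (Term, Subject, Mark) tuples.
var list1 = new List<(string Term, string Subject, int Mark)>
{
    ("Term1", "English", 90), ("Term1", "Maths", 60), ("Term1", "Physics", 30),
};
var list2 = new List<(string Term, string Subject, int Mark)>
{
    ("Term2", "English", 95), ("Term2", "Maths", 60), ("Term2", "Chemistry", 20),
};

var diffs = list1.Concat(list2)
    // Term2 marks count positively and Term1 marks negatively, so a subject found
    // in only one list still forms a group (the missing term effectively counts as 0).
    .GroupBy(c => c.Subject, c => c.Term == "Term2" ? c.Mark : -c.Mark)
    .Select(s => new { Subject = s.Key, Difference = s.Sum() })
    .Where(s => s.Difference != 0)
    .ToList();

foreach (var d in diffs)
    Console.WriteLine($"{d.Subject} : {d.Difference:+0;-0;0}");
```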

Find MAX/MIN list item using LINQ?

I have a list with multiple items and three properties: ID, DATE, COMMENT. The ID field is auto-incremented in the database.
Let's say the list contains:
2,16AUG,CommentMODIFIED
1,15AUG,CommentFIRST
3,18AUG,CommentLASTModified
I want to get a single item having the minimum DATE and the latest comment. In this case:
1,15AUG,CommentLASTModified
Is there an easy way to do this using LINQ?
var orderedItems = items.OrderBy(x => x.Date).ToList();
var result = orderedItems.First();
// note: this overwrites the Comment of the earliest item itself
result.Comment = orderedItems.Last().Comment;
To get a single item out of the list, you can order the items then take the first one, like this:
var result = items
.OrderByDescending(x => x.Date)
.First();
But First will throw an exception if the items collection is empty. This is a bit safer:
var result = items
.OrderByDescending(x => x.Date)
.FirstOrDefault();
To get the min / max of different columns you can do this:
var result =
    new Item {
        Id = 1,
        Date = items.Min(x => x.Date),
        // Max on a string compares alphabetically; it is not the comment with the latest date
        Comment = items.Max(x => x.Comment)
    };
But this will require two trips to the database. This might be a bit more efficient:
var result =
    (from x in items
     group x by 1 into g
     select new Item {
         Id = 1,
         Date = g.Min(i => i.Date),
         Comment = g.Max(i => i.Comment)
     })
    .First();
Or in fluent syntax:
var result = items
    .GroupBy(x => 1)
    .Select(g => new Item {
        Id = 1,
        Date = g.Min(i => i.Date),
        Comment = g.Max(i => i.Comment)
    })
    .First();
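The question asks for the earliest date combined with the comment of the most recent entry, which Max(x => x.Comment) does not guarantee: strings compare alphabetically, so on the sample data it would pick CommentMODIFIED. A minimal in-memory sketch, assuming the sample rows as value tuples and a hypothetical year (the question gives only day and month), that sorts once by date instead:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The question's sample rows as (Id, Date, Comment); the year 2019 is a hypothetical filler.
var items = new List<(int Id, DateTime Date, string Comment)>
{
    (2, new DateTime(2019, 8, 16), "CommentMODIFIED"),
    (1, new DateTime(2019, 8, 15), "CommentFIRST"),
    (3, new DateTime(2019, 8, 18), "CommentLASTModified"),
};

// Sort once by date: the earliest row supplies Id and Date,
// the latest row supplies the comment. An empty list yields default.
var ordered = items.OrderBy(x => x.Date).ToList();
(int Id, DateTime Date, string Comment) result = ordered.Count == 0
    ? default
    : (ordered[0].Id, ordered[0].Date, ordered[ordered.Count - 1].Comment);

Console.WriteLine($"{result.Id}, {result.Date:ddMMM}, {result.Comment}");
```

Building a new value rather than assigning to the first item avoids mutating the original list entry.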

Multiple group by with aggregate in Linq

I currently have this code:
foreach (var newsToPolitician in news.NewsToPoliticians)
{
    var politician = newsToPolitician.Politician;
    var votes = (from o in db.Scores
                 where o.IDPolitician == politician.IDPolitician
                    && o.IDNews == IDNews
                 group o by o.IDAtribute into g
                 select new {
                     Atribute = g.Key,
                     TotalScore = g.Sum(x => x.Score)
                 }).ToList();
}
It works alright, but I want to avoid making multiple queries to my database in foreach loop.
My table Scores looks like this:
IDScore | IDNews | IDUser | IDPolitician | IDAtribute | Score
1 | 40 | 1010 | 35 | 1 | 1
2 | 40 | 1010 | 35 | 2 | -1
3 | 40 | 1002 | 35 | 1 | 1
4 | 40 | 1002 | 35 | 2 | 1
5 | 40 | 1002 | 40 | 1 | -1
...
My goal is to aggregate all the scores for all politicians in a news item. A news item can have up to 7 politicians.
Is it expensive to call my database up to seven times in a foreach loop? I know that isn't best practice, so I'm interested in whether there is any way to avoid it in this particular case, making one call to the database and then processing the result on the server side.
Update: based on user comments, I have reworked this to try to ensure the aggregation happens on the server.
In this case we can group on the server by both IDPolitician and IDAtribute, and then pull the groups in locally with ToLookup, like so:
var result = db.Scores.Where(s => s.IDNews == IDNews)
    .Where(s => news.NewsToPoliticians
        .Select(n => n.Politician.IDPolitician)
        .Contains(s.IDPolitician))
    .GroupBy(s => new
        {
            s.IDPolitician,
            s.IDAtribute
        },
        (k, g) => new
        {
            k.IDPolitician,
            k.IDAtribute,
            Sum = g.Sum(x => x.Score)
        })
    .ToLookup(anon => anon.IDPolitician,
              anon => new { anon.IDAtribute, anon.Sum });
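An in-memory sketch of the group-then-ToLookup shape above, using the question's sample Scores rows as value tuples; the politicianIds list stands in for news.NewsToPoliticians and is hypothetical. Against a real DbContext, whether the Contains filter and the composite GroupBy become a single SQL query depends on the provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The question's Scores rows as (IDNews, IDUser, IDPolitician, IDAtribute, Score).
var scores = new List<(int IDNews, int IDUser, int IDPolitician, int IDAtribute, int Score)>
{
    (40, 1010, 35, 1, 1),
    (40, 1010, 35, 2, -1),
    (40, 1002, 35, 1, 1),
    (40, 1002, 35, 2, 1),
    (40, 1002, 40, 1, -1),
};
int IDNews = 40;
var politicianIds = new List<int> { 35, 40 }; // hypothetical: ids taken from news.NewsToPoliticians

// Aggregate per (politician, attribute) pair in one query,
// then index the small aggregated result by politician.
var result = scores
    .Where(s => s.IDNews == IDNews && politicianIds.Contains(s.IDPolitician))
    .GroupBy(s => new { s.IDPolitician, s.IDAtribute },
             (k, g) => new { k.IDPolitician, k.IDAtribute, Sum = g.Sum(x => x.Score) })
    .ToLookup(a => a.IDPolitician, a => new { a.IDAtribute, a.Sum });

foreach (var politician in result)
    foreach (var attr in politician)
        Console.WriteLine($"Politician {politician.Key}, attribute {attr.IDAtribute}: {attr.Sum}");
```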
Legacy:
You want to use GroupJoin here, it would be something along the lines of:
var result = news.NewsToPoliticians
    .GroupJoin(db.Scores.Where(s => s.IDNews == IDNews),
        p => p.IDPolitician,
        s => s.IDPolitician,
        (k, g) => new
        {
            PoliticianId = k.IDPolitician,
            GroupedVotes = g.GroupBy(s => s.IDAtribute,
                (id, group) => new
                {
                    Atribute = id,
                    TotalScore = group.Sum(x => x.Score)
                })
        })
    .ToList();
However, you are at the mercy of your provider as to how it translates this, so it might still result in multiple queries. To get around this, you could use something like:
var politicianIds = news.NewsToPoliticians.Select(p => p.IDPolitician).ToList();
var result = db.Scores.Where(s => s.IDNews == IDNews)
    .Where(s => politicianIds.Contains(s.IDPolitician))
    .GroupBy(p => p.IDPolitician,
        (k, g) => new
        {
            PoliticianId = k,
            GroupedVotes = g.GroupBy(s => s.IDAtribute,
                (id, group) => new
                {
                    Atribute = id,
                    TotalScore = group.Sum(x => x.Score)
                })
        })
    .ToList();
This should hopefully be at most two queries (depending on whether NewsToPoliticians is DB-dependent). You'll just have to try it out and see.
Use a stored procedure and get the SQL Server engine to do all the work. You can still use LINQ to call the stored procedure, and this will minimize the number of calls to the database.
