Multiple fields in GroupBy clause in LINQ (where one is computed field) - c#

I have a LINQ query to which I need to add two fields as group-by keys. Grouping by any number of plain column fields is easy, but the problem occurs when one of the fields is a calculated field. I can't work out how to add the second key in this case.
var values = intermediateValues
//.GroupBy(x => new {x.Rate, x.ExpiryDate })
.GroupBy(r => new { Rate = ((int)(r.Rate / BucketSize)) * BucketSize })
.Select(y => new FXOptionScatterplotValue
{
Volume = y.Sum(z => z.TransactionType == "TERMINATION" ? -z.Volume : z.Volume),
Rate = y.Key.Rate,
ExpiryDate = y.Key.ExpiryDate,
Count = y.Count()
}).ToArray();
In the code sample above, I would like to add ExpiryDate to my existing GroupBy clause, which already contains the computed Rate field.

So just include it as you have in the commented-out code:
.GroupBy(r => new { Rate = ((int)(r.Rate / BucketSize)) * BucketSize,
r.ExpiryDate })
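The composite key above can be tried on in-memory data. The following is a minimal sketch, not the asker's actual setup: the sample rows, the BucketSize value and the use of decimal are assumptions made for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical sample data; BucketSize and all field values are illustrative only.
const decimal BucketSize = 0.5m;
var intermediateValues = new[]
{
    new { Rate = 1.10m, ExpiryDate = new DateTime(2020, 1, 31), Volume = 100m, TransactionType = "NEW" },
    new { Rate = 1.30m, ExpiryDate = new DateTime(2020, 1, 31), Volume = 40m,  TransactionType = "TERMINATION" },
    new { Rate = 1.20m, ExpiryDate = new DateTime(2020, 2, 28), Volume = 10m,  TransactionType = "NEW" },
    new { Rate = 1.70m, ExpiryDate = new DateTime(2020, 1, 31), Volume = 5m,   TransactionType = "NEW" },
};

var values = intermediateValues
    // The computed member needs an explicit name (Rate = ...);
    // r.ExpiryDate is a plain member access, so its name is inferred.
    .GroupBy(r => new { Rate = ((int)(r.Rate / BucketSize)) * BucketSize, r.ExpiryDate })
    .Select(g => new
    {
        g.Key.Rate,
        g.Key.ExpiryDate,
        Volume = g.Sum(z => z.TransactionType == "TERMINATION" ? -z.Volume : z.Volume),
        Count = g.Count()
    })
    .ToArray();

foreach (var v in values)
    Console.WriteLine($"{v.Rate} {v.ExpiryDate:yyyy-MM-dd} volume={v.Volume} count={v.Count}");
```

Two rows land in the same group only when both the rate bucket and the expiry date match, because anonymous types compare equal member by member.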

This might help you
var values = intermediateValues
//.GroupBy(x => new {x.Rate, x.ExpiryDate })
.GroupBy(r => new { Rate = ((int)(r.Rate / BucketSize) ) * BucketSize,ExpiryDate1 = r.ExpiryDate })
.Select(y => new FXOptionScatterplotValue
{
Volume = y.Sum(z => z.TransactionType == "TERMINATION" ? -z.Volume : z.Volume),
Rate = y.Key.Rate,
ExpiryDate = y.Key.ExpiryDate1,
Count = y.Count()
}).ToArray();
Just give the anonymous type's property a name such as ExpiryDate1 and use that name when reading the key.

Related

Using GroupBy to compute average or count based on the whole data until the corresponding date

I have the AssessmentItems DB object, which records which user evaluated (EvaluatorId), which submission (SubmissionId), based on which rubric item or criterion (RubricItemId), and when (DateCreated).
I group this object by RubricItemId and DateCreated to compute some daily statistics for each assessment criterion (or rubric item).
For example, I compute the AverageScore, which works fine and returns an output like: RubricItem: 1, Day: 15/01/2019, AverageScore: 3.2.
_context.AssessmentItems
.Include(ai => ai.RubricItem)
.Include(ai => ai.Assessment)
.Where(ai => ai.RubricItem.RubricId == rubricId && ai.Assessment.Submission.ReviewRoundId == reviewRoundId)
.Select(ai => new
{
ai.Id,
DateCreated = ai.DateCreated.ToShortDateString(),//.ToString(@"yyyy-MM-dd"),
ai.CurrentScore,
ai.RubricItemId,
ai.Assessment.SubmissionId,
ai.Assessment.EvaluatorId
})
.GroupBy(ai => new { ai.RubricItemId, ai.DateCreated })
.Select(g => new
{
g.Key.RubricItemId,
g.Key.DateCreated,
AverageScore = g.Average(ai => ai.CurrentScore),
NumberOfStudentsEvaluating = g.Select(ai => ai.EvaluatorId).Distinct().Count(),
}).ToList();
What I want to do is compute the average up to that day. That is, instead of calculating the average for the day alone, I want the average that also takes the assessment scores of the preceding days into account. In the same way, when I compute NumberOfStudentsEvaluating, I want the total number of students who participated in the evaluation up to that day.
One approach to achieve this could be to iterate through the result object and compute these properties again:
foreach (var i in result)
{
i.AverageScore = result.Where(r => r.DateCreated <= i.DateCreated).Select(r => r.AverageScore).Average();
}
But, this is quite costly. I wonder if it is possible to tweak the code a bit to achieve this, or should I start from scratch with another approach.
If you split the query into two halves, you can compute the average the way you want (I also computed NumberOfStudentsEvaluating on the same criteria), but I am not sure whether EF/EF Core will be able to translate it to SQL:
var base1 = _context.AssessmentItems
.Include(ai => ai.RubricItem)
.Include(ai => ai.Assessment)
.Where(ai => ai.RubricItem.RubricId == rubricId && ai.Assessment.Submission.ReviewRoundId == reviewRoundId)
.Select(ai => new {
ai.Id,
ai.DateCreated,
ai.CurrentScore,
ai.RubricItemId,
ai.Assessment.SubmissionId,
ai.Assessment.EvaluatorId
})
.GroupBy(ai => ai.RubricItemId);
var ans1 = base1
.SelectMany(rig => rig.Select(ai => ai.DateCreated).Distinct().Select(DateCreated => new { RubricItemId = rig.Key, DateCreated, Items = rig.Where(b => b.DateCreated <= DateCreated) }))
.Select(g => new {
g.RubricItemId,
DateCreated = g.DateCreated.ToShortDateString(), //.ToString(@"yyyy-MM-dd"),
AverageScore = g.Items.Average(ai => ai.CurrentScore),
NumberOfStudentsEvaluating = g.Items.Select(ai => ai.EvaluatorId).Distinct().Count(),
}).ToList();
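If the filtered rows are small enough to load into memory, the "up to that day" statistics can also be computed in one pass per rubric item with running totals. This is only a sketch under that assumption: the sample rows are hypothetical, and it groups by calendar day (DateCreated.Date) rather than by the exact timestamp.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for the filtered AssessmentItems query result.
var items = new[]
{
    new { RubricItemId = 1, DateCreated = new DateTime(2019, 1, 15, 9, 0, 0),  CurrentScore = 3.0, EvaluatorId = 10 },
    new { RubricItemId = 1, DateCreated = new DateTime(2019, 1, 15, 14, 0, 0), CurrentScore = 4.0, EvaluatorId = 11 },
    new { RubricItemId = 1, DateCreated = new DateTime(2019, 1, 16, 10, 0, 0), CurrentScore = 5.0, EvaluatorId = 10 },
    new { RubricItemId = 2, DateCreated = new DateTime(2019, 1, 16, 11, 0, 0), CurrentScore = 2.0, EvaluatorId = 12 },
};

var cumulative = items
    .GroupBy(ai => ai.RubricItemId)
    .SelectMany(rubricGroup =>
    {
        // Running state for one rubric item, updated day by day.
        double scoreSum = 0;
        int scoreCount = 0;
        var evaluators = new HashSet<int>();

        return rubricGroup
            .GroupBy(ai => ai.DateCreated.Date)   // calendar day, time of day dropped
            .OrderBy(day => day.Key)
            .Select(day =>
            {
                foreach (var ai in day)
                {
                    scoreSum += ai.CurrentScore;
                    scoreCount++;
                    evaluators.Add(ai.EvaluatorId);
                }
                return new
                {
                    RubricItemId = rubricGroup.Key,
                    Day = day.Key,
                    AverageScore = scoreSum / scoreCount,
                    NumberOfStudentsEvaluating = evaluators.Count
                };
            })
            .ToList();   // materialize so the running totals are advanced exactly once
    })
    .ToList();

foreach (var c in cumulative)
    Console.WriteLine($"{c.RubricItemId} {c.Day:yyyy-MM-dd} avg={c.AverageScore} students={c.NumberOfStudentsEvaluating}");
```

Note that this averages all individual scores up to each day. Averaging the daily averages, as in the foreach loop from the question, generally gives a different number when days contain different numbers of assessments.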

How to aggregate a LINQ query by different groupings

How do you perform multiple separate aggregations on different groupings in LINQ?
For example, I have a table:
UNO YOS Ranking Score
123456 1 42 17
645123 3 84 20
I want to perform a set of aggregations on this data, both grouped and ungrouped, like:
var grouped = table.GroupBy(x => x.Score )
.Select(x => new
{
Score = x.Key.ToString(),
OverallAverageRank = x.Average(y => y.Ranking),
Year1RankAvg = x.Where(y => y.YOS == 1).Average(y => y.Ranking),
Year2RankAvg = x.Where(y => y.YOS == 2).Average(y => y.Ranking)
//...etc
});
I also want to perform different aggregations (standard deviation) on the same slices and whole-set data.
I can't figure out how to both group and not group by YOS at the same time. While this compiles fine, at runtime I get "Sequence contains no elements" if any of the YOS averages are included.
As with anything in programming, when you have a sequence of similar items, use a collection. In this case I left it as an IEnumerable, but you could make it a List, or a Dictionary keyed by YOS, if desired.
var ans = table.GroupBy(t => t.Score)
.Select(tg => new {
Score = tg.Key,
OverallAverageRank = tg.Average(t => t.Ranking),
YearRankAvgs = tg.GroupBy(t => t.YOS).Select(tyg => new { YOS = tyg.Key, RankAvg = tyg.Average(t => t.Ranking) })
});
If you need the range of years from 1 to max (or some other number) filled in, you can modify the answer:
var ans2 = ans.Select(soryr => new {
soryr.Score,
soryr.OverallAverageRank,
YearRankDict = soryr.YearRankAvgs.ToDictionary(yr => yr.YOS),
YearMax = soryr.YearRankAvgs.Max(yr => yr.YOS)
})
.Select(soryr => new {
Score = soryr.Score,
OverallAverageRank = soryr.OverallAverageRank,
YearRankAvgs = Enumerable.Range(1, soryr.YearMax).Select(yos => soryr.YearRankDict.ContainsKey(yos) ? soryr.YearRankDict[yos] : new { YOS = yos, RankAvg = 0.0 }).ToList()
});
If you preferred, you could modify the original ans to return RankAvg as double? and put null in place of 0.0 when adding missing years.
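The "Sequence contains no elements" error from the question comes from calling Average on an empty sequence of non-nullable values, which throws InvalidOperationException. Projecting to a nullable type makes Average return null instead. A minimal sketch, using hypothetical in-memory rows that mirror the table in the question:

```csharp
using System;
using System.Linq;

// Hypothetical rows mirroring the table in the question.
var table = new[]
{
    new { UNO = 123456, YOS = 1, Ranking = 42, Score = 17 },
    new { UNO = 645123, YOS = 3, Ranking = 84, Score = 20 },
};

var grouped = table
    .GroupBy(t => t.Score)
    .Select(g => new
    {
        Score = g.Key,
        OverallAverageRank = g.Average(t => t.Ranking),
        // Casting to int? makes Average return null (as double?) for an empty slice.
        Year1RankAvg = g.Where(t => t.YOS == 1).Average(t => (int?)t.Ranking),
        Year2RankAvg = g.Where(t => t.YOS == 2).Average(t => (int?)t.Ranking)
    })
    .ToList();

// For comparison: the non-nullable overload throws on an empty sequence.
bool threw = false;
try { Enumerable.Empty<int>().Average(); }
catch (InvalidOperationException) { threw = true; }

foreach (var g in grouped)
    Console.WriteLine($"{g.Score}: overall={g.OverallAverageRank} y1={g.Year1RankAvg} y2={g.Year2RankAvg}");
```

This keeps the fixed Year1RankAvg/Year2RankAvg properties from the question; the nested GroupBy shown in the answer avoids the empty-slice case altogether because it only produces years that actually occur.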

How to group the headers and add where condition inside the select statement

Please refer to the data table image below.
The Header column contains two sets of values:
Difficult to use/Understand
Delivery Issues
I want to group by header and get the category percentage, as in the code below:
var p = from linq_row in trend_data.AsEnumerable()
group linq_row by linq_row["Header"] into g
select new
{
categorypercentage = g.Select(s => s["categoryPercentage"].ToString()).ToArray()
};
The LINQ query above groups by header and returns the category percentages in the same property for every header.
What I want instead is two separate variables rather than a single one:
var p= from.....
select new
{
firstgroupedheadercategorypercentage= g.Select ....and some where clause
secondgroupedheadercategorypercentage= g.Select ....and some where clause
}
How can I put a where clause inside the select new statement?
I need to get the category percentage based on the header value.
If you want the sum of the percentage column of every group:
var headerPercentages = trend_data.AsEnumerable()
.GroupBy(r => r.Field<string>("Header"))
.Select(g => new { Header = g.Key, Percentage = g.Sum(r => r.Field<double>("categoryPercentage"))});
If you want a single anonymous type instance as the result, with two properties for these specific headers:
var x = new {
firstHeaderPercentage = trend_data.AsEnumerable()
.Where(r => r.Field<string>("Header") == "Difficult to use/Understand")
.Sum(r => r.Field<double>("categoryPercentage")),
secondHeaderPercentage = trend_data.AsEnumerable()
.Where(r => r.Field<string>("Header") == "Delivery Issues")
.Sum(r => r.Field<double>("categoryPercentage"))
};
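If the individual percentage values per header are wanted rather than their sum, one option is to index the rows by header once and then read each header from that index. This is a sketch only: the table contents are made up, and ToLookup is a swapped-in technique rather than what the answer above uses.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical table with the two columns used in the question.
var trend_data = new DataTable();
trend_data.Columns.Add("Header", typeof(string));
trend_data.Columns.Add("categoryPercentage", typeof(double));
trend_data.Rows.Add("Difficult to use/Understand", 12.5);
trend_data.Rows.Add("Delivery Issues", 30.0);
trend_data.Rows.Add("Difficult to use/Understand", 7.5);

// Index the rows by header once; a lookup returns an empty sequence for a missing key.
var byHeader = trend_data.AsEnumerable()
    .ToLookup(r => r.Field<string>("Header"), r => r.Field<double>("categoryPercentage"));

var p = new
{
    firstHeaderPercentages = byHeader["Difficult to use/Understand"].ToArray(),
    secondHeaderPercentages = byHeader["Delivery Issues"].ToArray()
};

Console.WriteLine(string.Join(", ", p.firstHeaderPercentages));
Console.WriteLine(string.Join(", ", p.secondHeaderPercentages));
```

Compared with filtering the table once per header, the lookup enumerates the rows a single time, which may matter when there are many headers.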

LINQ: two Select statements, where the second one uses the first one's result

This LINQ query works well.
var qry = context.Boxes
.GroupBy(k=>k.Box_ID)
.Select( group => new {
Box_ID = group.Key,
TotalA = group.Sum(p => p.A),
TotalC = group.Sum(p => p.C)
})
.Select(p => new {
Box_ID = p.Box_ID,
TotalA = p.TotalA,
TotalC = p.TotalC,
DiffAC = p.TotalA - p.TotalC
});
But I have seen this kind of query, where the second Select uses the first Select's anonymous type result, written like this:
var qry = context.Boxes
.GroupBy(k => k.Box_ID)
.Select(group => new
{
Box_ID = group.Key,
TotalA = group.Sum(p => p.A),
TotalC = group.Sum(p => p.C)
})
.Select(p => new
{
Box_ID, //*** compiler error
TotalA, //I'm asking about these 3 lines, is this syntax possible
TotalC, //TotalC = p.TotalC,
DiffAC = p.TotalA - p.TotalC // calculate
});
The comments contain the details.
When I try to compile the second query, the compiler gives me the error "The name 'Box_ID' does not exist in the current context".
There is nothing wrong with the first syntax, but the second one is more readable. How can I use the second syntax, or under which conditions can I use it?
.Select(p => new
{
p.Box_ID,
p.TotalA,
p.TotalC,
DiffAC = p.TotalA - p.TotalC // calculate
});
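The reason this works is that an anonymous-type member written without a name takes its name from the expression, which is only possible when the expression is a simple name or a member access such as p.Box_ID. A bare Box_ID would need a variable of that name in scope. A small sketch, with hypothetical in-memory rows standing in for context.Boxes:

```csharp
using System;
using System.Linq;

// Hypothetical in-memory rows standing in for context.Boxes.
var boxes = new[]
{
    new { Box_ID = 1, A = 10, C = 4 },
    new { Box_ID = 1, A = 5,  C = 1 },
    new { Box_ID = 2, A = 7,  C = 7 },
};

var qry = boxes
    .GroupBy(k => k.Box_ID)
    .Select(g => new { Box_ID = g.Key, TotalA = g.Sum(b => b.A), TotalC = g.Sum(b => b.C) })
    .Select(p => new
    {
        p.Box_ID,   // name inferred from the member access: Box_ID
        p.TotalA,   // inferred: TotalA
        p.TotalC,   // inferred: TotalC
        DiffAC = p.TotalA - p.TotalC   // computed value, so an explicit name is required
    })
    .ToList();

foreach (var row in qry)
    Console.WriteLine($"{row.Box_ID}: A={row.TotalA} C={row.TotalC} diff={row.DiffAC}");
```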

Use Linq to return first result for each category

I have a class (ApplicationHistory) with 3 properties:
ApplicantId, ProviderId, ApplicationDate
I return the data from the database into a list, however this contains duplicate ApplicantId/ProviderId keys.
I want to suppress the duplicates so that the list only contains the earliest ApplicationDate for each ApplicantId/ProviderId.
The example below is where I'm currently at, but I'm not sure how to ensure the earliest date is returned.
var supressed = history
.GroupBy(x => new
{
ApplicantId = x.ApplicantId,
ProviderId = x.ProviderId
})
.First();
All advice appreciated.
Recall that each group formed by the GroupBy call is an IGrouping<TKey, ApplicationHistory>, which implements IEnumerable<ApplicationHistory>. You can order the items in each group and pick the first one:
var oldestPerGroup = history
.GroupBy(x => new
{
ApplicantId = x.ApplicantId,
ProviderId = x.ProviderId
})
.Select(g => g.OrderBy(x => x.ApplicationDate).FirstOrDefault());
You are selecting the first group. Instead, select the first item from each group:
var supressed = history
.GroupBy(x => new {
ApplicantId = x.ApplicantId,
ProviderId = x.ProviderId
})
.Select(g => g.OrderBy(x => x.ApplicationDate).First());
Or, in query syntax (by the way, you don't need to specify names for the anonymous object's properties in this case):
var supressed = from h in history
group h by new {
h.ApplicantId,
h.ProviderId
} into g
select g.OrderBy(x => x.ApplicationDate).First();
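As a possible alternative to sorting each group, the earliest row can be picked with a single linear scan using Aggregate. This is a sketch on made-up data; anonymous objects stand in for ApplicationHistory, and for small groups the OrderBy version above is just as reasonable.

```csharp
using System;
using System.Linq;

// Hypothetical history rows; anonymous objects stand in for ApplicationHistory.
var history = new[]
{
    new { ApplicantId = 1, ProviderId = 100, ApplicationDate = new DateTime(2015, 3, 1) },
    new { ApplicantId = 1, ProviderId = 100, ApplicationDate = new DateTime(2015, 1, 10) },
    new { ApplicantId = 1, ProviderId = 200, ApplicationDate = new DateTime(2015, 2, 5) },
    new { ApplicantId = 2, ProviderId = 100, ApplicationDate = new DateTime(2015, 4, 20) },
};

var earliest = history
    .GroupBy(h => new { h.ApplicantId, h.ProviderId })
    // Keep whichever row has the smaller date; groups are never empty, so Aggregate cannot throw here.
    .Select(g => g.Aggregate((min, next) => next.ApplicationDate < min.ApplicationDate ? next : min))
    .ToList();

foreach (var h in earliest)
    Console.WriteLine($"{h.ApplicantId}/{h.ProviderId}: {h.ApplicationDate:yyyy-MM-dd}");
```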
