Include count = 0 in LINQ results - C#

I have a table with TeamName and CurrentStatus fields. I am writing a LINQ query to get the record count for each team and each status:
var teamStatusCounts = models.GroupBy(x => new { x.CurrentStatus, x.TeamName })
    .Select(g => new { g.Key, Count = g.Count() });
This query returns all the counts except those that are 0. I also need rows for team/status combinations that have no records (where count = 0).

You could keep a separate collection of the team names and statuses you are expecting and add the missing combinations to the result set:
// assuming allTeamNamesAndStatuses is a cross join of all 'CurrentStatus' and 'TeamName' values
var teamStatusCounts = models.GroupBy(x => new { x.CurrentStatus, x.TeamName })
    .Select(g => new { g.Key, Count = g.Count() })
    .ToList();

var missingTeamsAndStatuses = allTeamNamesAndStatuses
    .Where(a =>
        !teamStatusCounts.Any(b =>
            b.Key.CurrentStatus == a.CurrentStatus
            && b.Key.TeamName == a.TeamName))
    .Select(a => new
    {
        Key = new { a.CurrentStatus, a.TeamName },
        Count = 0
    });

teamStatusCounts.AddRange(missingTeamsAndStatuses);
I've created a fiddle demonstrating the answer as well.
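If you do not already have such a collection, here is a minimal sketch of building the cross join from the data itself (assuming models exposes the same TeamName and CurrentStatus properties as in the question):

// Every distinct team paired with every distinct status; pairs absent from
// models end up with Count = 0 via missingTeamsAndStatuses above.
var allTeamNamesAndStatuses = models.Select(m => m.TeamName).Distinct()
    .SelectMany(team => models.Select(m => m.CurrentStatus).Distinct(),
        (team, status) => new { CurrentStatus = status, TeamName = team });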

I would select the distinct team names and statuses first:
var teams = models.Select(x => x.TeamName).Distinct().ToList();
var statuses = models.Select(x => x.CurrentStatus).Distinct().ToList();
You can skip this if you know the list entries already.
Then you can select, for each team and each status, the number of matching models:
var teamStatusCounts = teams.SelectMany(team => statuses.Select(status =>
    new
    {
        TeamName = team,
        CurrentStatus = status,
        Count = models.Count(model =>
            model.TeamName == team && model.CurrentStatus == status)
    }));

Related

LINQ query optimization/summarize

I have the following query:
var countA = await _importContext.table1.CountAsync(ssc => ssc.ImportId == importId);
var countB = await _importContext.table2.CountAsync(ssc => ssc.ImportId == importId);
var countC = await _importContext.table3.CountAsync(ssc => ssc.ImportId == importId);
var countD = await _importContext.table4.CountAsync(ssc => ssc.ImportId == importId);
There are 9 more counts from different tables. Is there a way to consolidate the queries to optimize them and remove the redundancy?
I tried wrapping up the queries like:
var result = new
{
countA = context.table1.Count(),
countB = context.table2.Count(),
.....
};
but this takes more time than the first one above.
You can't really optimise it, as you need the counts from all of the tables. Your second method still calls the database the same number of times as the first, but also creates an object containing all of the counts, so it is likely to take longer.
The only thing you can really do to make it faster is to run the counts in parallel, but that might be overkill. I would just go with your first option unless it's really slow.
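If you do want to try the parallel route, here is a rough sketch assuming EF Core with an injected IDbContextFactory<ImportContext> (name assumed); a single DbContext cannot run queries concurrently, so each count gets its own short-lived context:

// Hypothetical helper: one context per count so the queries can overlap.
async Task<int> CountForImportAsync<T>(Func<ImportContext, IQueryable<T>> source)
{
    await using var ctx = _contextFactory.CreateDbContext();
    return await source(ctx).CountAsync();
}

var counts = await Task.WhenAll(
    CountForImportAsync(ctx => ctx.table1.Where(x => x.ImportId == importId)),
    CountForImportAsync(ctx => ctx.table2.Where(x => x.ImportId == importId)),
    CountForImportAsync(ctx => ctx.table3.Where(x => x.ImportId == importId)),
    CountForImportAsync(ctx => ctx.table4.Where(x => x.ImportId == importId)));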
You can create such a query by grouping by a constant and using the Concat operator:
Helper class:
public class TableResult
{
    public string Name { get; set; }
    public int Count { get; set; }
}
Query:
var query = _importContext.table1.Where(ssc => ssc.ImportId == importId).GroupBy(x => 1).Select(g => new TableResult { Name = "table1", Count = g.Count() })
    .Concat(_importContext.table2.Where(ssc => ssc.ImportId == importId).GroupBy(x => 1).Select(g => new TableResult { Name = "table2", Count = g.Count() }))
    .Concat(_importContext.table3.Where(ssc => ssc.ImportId == importId).GroupBy(x => 1).Select(g => new TableResult { Name = "table3", Count = g.Count() }))
    .Concat(_importContext.table4.Where(ssc => ssc.ImportId == importId).GroupBy(x => 1).Select(g => new TableResult { Name = "table4", Count = g.Count() }));
var result = await query.ToListAsync();
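Assuming the provider translates the Concat chain into one statement, the counts come back in a single round trip and can then be read by table name, for example:

// Read individual counts out of the combined result.
var countsByTable = result.ToDictionary(r => r.Name, r => r.Count);
var countA = countsByTable["table1"];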

LINQ Query Multiple Group and count of latest record - Oracle DB

I tried to split the LINQ query into 3 (total, success, fail), but so far only the "Total" query is working. Please help me get the "Success" and "Fail" columns (there are multiple statuses, and we have to check the latest record for each transaction and destination).
Note: you need to group by ProcessTime, TransactionId and Destination, check whether the latest record is Success or Fail, and then count (we are using Oracle as the backend).
LINQ for Total count
var query = (from filetrans in context.FILE_TRANSACTION
             join route in context.FILE_ROUTE on filetrans.FILE_TRANID equals route.FILE_TRANID
             where filetrans.PROCESS_STRT_TIME >= fromDateFilter && filetrans.PROCESS_STRT_TIME <= toDateFilter
             select new { PROCESS_STRT_TIME = DbFunctions.TruncateTime((DateTime)filetrans.PROCESS_STRT_TIME), filetrans.FILE_TRANID, route.DESTINATION })
            .GroupBy(p => new { p.PROCESS_STRT_TIME, p.FILE_TRANID, p.DESTINATION });

var result = query.GroupBy(x => x.Key.PROCESS_STRT_TIME)
    .Select(x => new { x.Key, Count = x.Count() })
    .ToDictionary(a => a.Key, a => a.Count);
Check this solution. If it gives the wrong result, I need more details.
var fileTransQuery =
    from filetrans in context.AFRS_FILE_TRANSACTION
    where accountIds.Contains(filetrans.ACNT_ID) &&
          filetrans.PROCESS_STRT_TIME >= fromDateFilter && filetrans.PROCESS_STRT_TIME <= toDateFilter
    select filetrans;

var routesQuery =
    from filetrans in fileTransQuery
    join route in context.AFRS_FILE_ROUTE on filetrans.FILE_TRANID equals route.FILE_TRANID
    select route;

// For each transaction/destination pair, keep only the route with the highest ROUTE_ID (the latest one).
var lastRouteQuery =
    from d in routesQuery.GroupBy(route => new { route.FILE_TRANID, route.DESTINATION })
        .Select(g => new
        {
            g.Key.FILE_TRANID,
            g.Key.DESTINATION,
            ROUTE_ID = g.Max(x => x.ROUTE_ID)
        })
    from route in routesQuery
        .Where(route => d.FILE_TRANID == route.FILE_TRANID && d.DESTINATION == route.DESTINATION && d.ROUTE_ID == route.ROUTE_ID)
    select route;

var recordsQuery =
    from filetrans in fileTransQuery
    join route in lastRouteQuery on filetrans.FILE_TRANID equals route.FILE_TRANID
    select new { filetrans.PROCESS_STRT_TIME, route.CRNT_ROUTE_FILE_STATUS_ID };

var result = recordsQuery
    .GroupBy(p => DbFunctions.TruncateTime((DateTime)p.PROCESS_STRT_TIME))
    .Select(g => new TrendData
    {
        TotalCount = g.Sum(x => x.CRNT_ROUTE_FILE_STATUS_ID != 7 && x.CRNT_ROUTE_FILE_STATUS_ID != 8 ? 1 : 0),
        SuccessCount = g.Sum(x => x.CRNT_ROUTE_FILE_STATUS_ID == 7 ? 1 : 0),
        FailCount = g.Sum(x => failureStatus.Contains(x.CRNT_ROUTE_FILE_STATUS_ID) ? 1 : 0),
        Date = g.Min(x => x.PROCESS_STRT_TIME)
    })
    .OrderBy(x => x.Date)
    .ToList();

How to use GroupBy on an index in RavenDB?

I have this document, a post :
{Content:"blabla",Tags:["test","toto"], CreatedOn:"2019-05-01 01:02:01"}
I want to have a page that displays the most used tags from the last 30 days.
So far I have tried to create an index like this:
public class Toss_TagPerDay : AbstractIndexCreationTask<TossEntity, TagByDayIndex>
{
    public Toss_TagPerDay()
    {
        Map = tosses => from toss in tosses
                        from tag in toss.Tags
                        select new TagByDayIndex()
                        {
                            Tag = tag,
                            CreatedOn = toss.CreatedOn.Date,
                            Count = 1
                        };

        Reduce = results => from result in results
                            group result by new { result.Tag, result.CreatedOn }
                            into g
                            select new TagByDayIndex()
                            {
                                Tag = g.Key.Tag,
                                CreatedOn = g.Key.CreatedOn,
                                Count = g.Sum(i => i.Count)
                            };
    }
}
And I query it like this:
await _session
    .Query<TagByDayIndex, Toss_TagPerDay>()
    .Where(i => i.CreatedOn >= firstDay)
    .GroupBy(i => i.Tag)
    .OrderByDescending(g => g.Sum(i => i.Count))
    .Take(50)
    .Select(t => new BestTagsResult()
    {
        CountLastMonth = t.Count(),
        Tag = t.Key
    })
    .ToListAsync()
But this gives me the error
Message: System.NotSupportedException : Could not understand expression: from index 'Toss/TagPerDay'.Where(i => (Convert(i.CreatedOn, DateTimeOffset) >= value(Toss.Server.Models.Tosses.BestTagsQueryHandler+<>c__DisplayClass3_0).firstDay)).GroupBy(i => i.Tag).OrderByDescending(g => g.Sum(i => i.Count)).Take(50).Select(t => new BestTagsResult() {CountLastMonth = t.Count(), Tag = t.Key})
---- System.NotSupportedException : GroupBy method is only supported in dynamic map-reduce queries
Any idea how I can make this work? I could query all the index data from the past 30 days and do the group-by / order / take in memory, but that could make my app load a lot of data.
The map-reduce index you created gives you the count per tag per day. You want the most popular tags from the last 30 days, so you need the following query:
var tagCountPerDay = session
    .Query<TagByDayIndex, Toss_TagPerDay>()
    .Where(i => i.CreatedOn >= DateTime.Now.AddDays(-30))
    .ToList();
Then you can do the client-side grouping by Tag:
var mostUsedTags = tagCountPerDay.GroupBy(x => x.Tag)
    .Select(t => new BestTagsResult()
    {
        // sum the per-day counts to get the total uses over the period
        CountLastMonth = t.Sum(x => x.Count),
        Tag = t.Key
    })
    .OrderByDescending(g => g.CountLastMonth)
    .ToList();
@Kuepper
Based on your index definition, you can handle that with the following index:
public class TrendingSongs : AbstractIndexCreationTask<TrackPlayedEvent, TrendingSongs.Result>
{
    public TrendingSongs()
    {
        Map = events => from e in events
                        where e.TypeOfTrack == TrackSubtype.song && e.Percentage >= 80 && !e.Tags.Contains(Podcast.Tags.FraKaare)
                        select new Result
                        {
                            TrackId = e.TrackId,
                            Count = 1,
                            Timestamp = new DateTime(e.TimestampStart.Year, e.TimestampStart.Month, e.TimestampStart.Day)
                        };

        Reduce = results => from r in results
                            group r by new { r.TrackId, r.Timestamp }
                            into g
                            select new Result
                            {
                                TrackId = g.Key.TrackId,
                                Count = g.Sum(x => x.Count),
                                Timestamp = g.Key.Timestamp
                            };
    }
}
and the query using facets:
from index TrendingSongs where Timestamp between $then and $now select facet(TrackId, sum(Count))
The reason for the error is that you can't use 'GroupBy' in a query made on an index.
'GroupBy' can be used when performing a 'dynamic query',
i.e. a query that is made on a collection, without specifying an index.
See:
https://ravendb.net/docs/article-page/4.1/Csharp/client-api/session/querying/how-to-perform-group-by-query
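For contrast, a dynamic group-by query (no index named, so RavenDB builds an auto map-reduce index behind the scenes) looks roughly like this; the grouping field here is only illustrative, and it would not help with the Tags collection, which is exactly why the static map-reduce index is needed:

// Sketch of the dynamic form from the linked docs; grouping on a plain scalar field.
var perContent = await _session
    .Query<TossEntity>()
    .GroupBy(t => t.Content)
    .Select(g => new { Content = g.Key, Count = g.Count() })
    .ToListAsync();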
I solved a similar problem by using AdditionalSources with dynamic values.
Then I update the index every morning to move the Earliest timestamp forward:
await IndexCreation.CreateIndexesAsync(new AbstractIndexCreationTask[] { new TrendingSongs() }, _store);
I still have to try it in production, but my tests so far suggest it's a lot faster than the alternatives. It does feel pretty hacky, though, and I'm surprised RavenDB doesn't offer a better solution.
public class TrendingSongs : AbstractIndexCreationTask<TrackPlayedEvent, TrendingSongs.Result>
{
    public DateTime Earliest = DateTime.UtcNow.AddDays(-16);

    public TrendingSongs()
    {
        Map = events => from e in events
                        where e.TypeOfTrack == TrackSubtype.song && e.Percentage >= 80 && !e.Tags.Contains(Podcast.Tags.FraKaare)
                              && e.TimestampStart > new DateTime(TrendingHelpers.Year, TrendingHelpers.Month, TrendingHelpers.Day)
                        select new Result
                        {
                            TrackId = e.TrackId,
                            Count = 1
                        };

        Reduce = results => from r in results
                            group r by new { r.TrackId }
                            into g
                            select new Result
                            {
                                TrackId = g.Key.TrackId,
                                Count = g.Sum(x => x.Count)
                            };

        AdditionalSources = new Dictionary<string, string>
        {
            {
                "TrendingHelpers",
                @"namespace Helpers
                {
                    public static class TrendingHelpers
                    {
                        public static int Day = " + Earliest.Day + @";
                        public static int Month = " + Earliest.Month + @";
                        public static int Year = " + Earliest.Year + @";
                    }
                }"
            }
        };
    }
}

LINQ query to retrieve pivoted data taking too long

I am working on a LINQ query that pivots some data, as below:
var q = data.GroupBy(x => new
{
    x.Med.Name,
    x.Med.GenericName,
}).ToList().Select(g =>
    new SummaryDto
    {
        Name = g.Key.Name,
        GenericName = g.Key.GenericName,
        Data2012 = g.Where(z => z.ProcessDate.Year == 2012).Count(),
        Data2013 = g.Where(z => z.ProcessDate.Year == 2013).Count(),
        Data2014 = g.Where(z => z.ProcessDate.Year == 2014).Count(),
        Data2015 = g.Where(z => z.ProcessDate.Year == 2015).Count(),
        Data2016 = g.Where(z => z.ProcessDate.Year == 2016).Count(),
        Data2017 = g.Where(z => z.ProcessDate.Year == 2017).Count(),
        TotalCount = g.Count(),
    }).AsQueryable();
return q;
The above LINQ takes too long because each group is enumerated once per year column; with 10,000 records that amounts to roughly 60,000 evaluations.
Is there a better way to make this faster?
Add year to the group key, then group again, and harvest per-group counts:
return data.GroupBy(x => new {
        x.Med.Name
        , x.Med.GenericName
        , x.ProcessDate.Year
    }).Select(g => new {
        g.Key.Name
        , g.Key.GenericName
        , g.Key.Year
        , Count = g.Count()
    }).GroupBy(g => new {
        g.Name
        , g.GenericName
    }).Select(g => new SummaryDto {
        Name = g.Key.Name
        , GenericName = g.Key.GenericName
        , Data2012 = g.SingleOrDefault(x => x.Year == 2012)?.Count ?? 0
        , Data2013 = g.SingleOrDefault(x => x.Year == 2013)?.Count ?? 0
        , Data2014 = g.SingleOrDefault(x => x.Year == 2014)?.Count ?? 0
        , Data2015 = g.SingleOrDefault(x => x.Year == 2015)?.Count ?? 0
        , Data2016 = g.SingleOrDefault(x => x.Year == 2016)?.Count ?? 0
        , Data2017 = g.SingleOrDefault(x => x.Year == 2017)?.Count ?? 0
        , TotalCount = g.Sum(x => x.Count)
    }).AsQueryable();
Note: This approach is problematic, because year is hard-coded in the SummaryDto class. You would be better off passing your DTO constructor an IDictionary<int,int> with counts for each year. If you make this change, the final Select(...) would look like this:
.Select(g => new SummaryDto {
    Name = g.Key.Name
    , GenericName = g.Key.GenericName
    , TotalCount = g.Sum(x => x.Count)
    , DataByYear = g.ToDictionary(i => i.Year, i => i.Count)
}).AsQueryable();
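For reference, a rough sketch of the reshaped DTO this implies; the question does not show the class, so everything beyond the DataByYear property used above is an assumption:

public class SummaryDto
{
    public string Name { get; set; }
    public string GenericName { get; set; }
    public int TotalCount { get; set; }
    // Count per year, e.g. DataByYear[2012]; absent years can be treated as 0.
    public IDictionary<int, int> DataByYear { get; set; }
}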
I suggest grouping each group by year and then converting that to a dictionary to access the counts. Whether it is faster to group by year first and count in memory depends on the distribution of the initial grouping; against the database it may also depend on how efficiently it can group by year, so I would test to see which is fastest.
In any case, grouping by year after the initial grouping is about 33% faster than your query when run in memory, but again it depends heavily on the distribution. As the number of initial groups increases, the group-by-year queries slow down to match the original query. Note that the original query without any year counts takes about one third of the time.
Here is the year grouping applied after the database grouping:
var q = data.GroupBy(x => new {
    x.Med.Name,
    x.Med.GenericName,
}).ToList().Select(g => {
    var gg = g.GroupBy(d => d.ProcessDate.Year).ToDictionary(d => d.Key, d => d.Count());
    return new SummaryDto {
        Name = g.Key.Name,
        GenericName = g.Key.GenericName,
        Data2012 = gg.GetValueOrDefault(2012),
        Data2013 = gg.GetValueOrDefault(2013),
        Data2014 = gg.GetValueOrDefault(2014),
        Data2015 = gg.GetValueOrDefault(2015),
        Data2016 = gg.GetValueOrDefault(2016),
        Data2017 = gg.GetValueOrDefault(2017),
        TotalCount = g.Count(),
    };
}).AsQueryable();

Getting duplicate data based on dynamic key

I have a list of Person objects:
List<PersonData> AllPersons
From this list I want all those person objects that are duplicated based on a certain property.
For example, this code gives all the duplicates based on the Id:
var duplicateKeys = AllPersons.GroupBy(p => p.Id)
    .Select(g => new { g.Key, Count = g.Count() })
    .Where(x => x.Count > 1)
    .ToList()
    .Select(d => d.Key);
duplicates = AllPersons.Where(p => duplicateKeys.Contains(p.Id)).ToList();
Can the part p.Id be dynamic?
Meaning if the user specifies the unique column in a config file and it's read like so:
string uniqueColumn = "FirstName";
How can the query be composed to add that functionality?
Regards.
You can use Reflection to achieve that:
List<PersonData> AllPersons = new List<PersonData>()
{
    new PersonData { Id = 1, FirstName = "Tom" },
    new PersonData { Id = 2, FirstName = "Jon" },
    new PersonData { Id = 3, FirstName = "Tom" }
};

string uniqueColumn = "FirstName";
var prop = typeof(PersonData).GetProperty(uniqueColumn);

var duplicateKeys = AllPersons.GroupBy(p => prop.GetValue(p, null))
    .Select(g => new { g.Key, Count = g.Count() })
    .Where(x => x.Count > 1)
    .Select(d => d.Key)
    .ToList();

var duplicates = AllPersons.Where(p => duplicateKeys.Contains(prop.GetValue(p, null))).ToList();
After execution, duplicates contains the two elements whose FirstName == "Tom".
You might want to look into Dynamic LINQ or PredicateBuilder.
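As an alternative to raw reflection, here is a hedged sketch that builds the key selector once with an expression tree; a compiled delegate is usually faster than calling GetValue per element:

// Requires using System.Linq.Expressions;
// Build p => (object)p.<uniqueColumn> once and reuse it as the GroupBy key selector.
var parameter = Expression.Parameter(typeof(PersonData), "p");
var property = Expression.Property(parameter, uniqueColumn);
var keySelector = Expression.Lambda<Func<PersonData, object>>(
    Expression.Convert(property, typeof(object)), parameter).Compile();

var duplicates = AllPersons
    .GroupBy(keySelector)
    .Where(g => g.Count() > 1)
    .SelectMany(g => g)
    .ToList();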
