Entity Framework - paging with group by skipping group - c#

I have the following query:
var enumerable = repository.Elemtents.Where((s) =>
DbFunctions.TruncateTime(s.Timestamp) <= parameter.To.Date &&
DbFunctions.TruncateTime(s.Timestamp) >= parameter.From.Date)
.OrderByDescending((s) => s.Timestamp)
.GroupBy((s) => new {Date = DbFunctions.TruncateTime(s.Timestamp), s.Timestamp.Hour})
.OrderByDescending((s) => s.Key.Date);
I now want to apply paging with Skip() and Take(). My table (protocol entries) can hold a large amount of data. I could do the following, but it would perform poorly because everything is loaded into memory first.
var result = enumerable
.ToList()
.SelectMany((x) => x)
.Skip(0)
.Take(2);
I want to apply Skip() and Take() on the query directly so that it will be done on the sql server. If I do the following, I get weird results:
var result = repository.Elemtents.Where((s) =>
DbFunctions.TruncateTime(s.Timestamp) <= parameter.To.Date &&
DbFunctions.TruncateTime(s.Timestamp) >= parameter.From.Date)
.OrderByDescending((s) => s.Timestamp)
.GroupBy((s) => new {Date = DbFunctions.TruncateTime(s.Timestamp), s.Timestamp.Hour})
.OrderByDescending((s) => s.Key.Date)
.Skip(0)
.Take(2)
.ToList();
Does anyone know how to resolve this?
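One possible source of the "weird results" is that Skip() and Take() placed after GroupBy() page over groups, not over rows, and that ordering by Key.Date alone leaves the order of the hours within a day undefined. The following is a minimal LINQ to Objects sketch of that difference, using made-up timestamps. It only illustrates the semantics. Whether Entity Framework translates the same query shape to SQL depends on the version, which is an assumption to verify.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: timestamps standing in for protocol entries.
var entries = new[]
{
    new DateTime(2020, 1, 2, 10, 15, 0),
    new DateTime(2020, 1, 2, 10, 45, 0),
    new DateTime(2020, 1, 2, 9, 30, 0),
    new DateTime(2020, 1, 1, 23, 5, 0),
    new DateTime(2020, 1, 1, 23, 50, 0),
};

// Group by (date, hour) and order by BOTH key parts so the page order is deterministic.
var groups = entries
    .GroupBy(t => new { t.Date, t.Hour })
    .OrderByDescending(g => g.Key.Date)
    .ThenByDescending(g => g.Key.Hour);

// Skip/Take after GroupBy pages over groups: this takes the first 2 groups.
var pageOfGroups = groups.Skip(0).Take(2).ToList();

// Paging over rows instead: flatten the ordered groups first, then Skip/Take.
var pageOfRows = groups
    .SelectMany(g => g.OrderByDescending(t => t))
    .Skip(0)
    .Take(2)
    .ToList();

Console.WriteLine(pageOfGroups.Sum(g => g.Count())); // rows contained in the two groups
Console.WriteLine(pageOfRows.Count);                 // rows in the row-based page
```

With this data the group-based page contains two groups holding three rows in total, while the row-based page contains exactly two rows.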

Related

ANY with ALL in Entity Framework evaluates locally

I have the following Entity Framework Core 2.0 query:
var user = context.Users.AsNoTracking()
.Include(x => x.UserSkills).ThenInclude(x => x.Skill)
.Include(x => x.UserSkills).ThenInclude(x => x.SkillLevel)
.FirstOrDefault(x => x.Id == userId);
var userSkills = user.UserSkills.Select(z => new {
SkillId = z.SkillId,
SkillLevelId = z.SkillLevelId
}).ToList();
Then I tried the following query:
var lessons = _context.Lessons.AsNoTracking()
.Where(x => x.LessonSkills.All(y =>
userSkills.Any(z => y.SkillId == z.SkillId && y.SkillLevelId <= z.SkillLevelId)))
.ToList();
This query evaluates locally and I get the message:
The LINQ expression 'where (([y].SkillId == [z].SkillId) AndAlso ([y].SkillLevelId <= [z].SkillLevelId))' could not be translated and will be evaluated locally.'.
I tried to solve it using userSkills instead of user.UserSkills but no luck.
Is there a way to run this query on the server?
Inside LINQ to Entities queries, try to limit the use of in-memory collections to Contains on a collection of primitive values, which is currently the only construct that can be translated to run on the server.
Since Contains is not applicable here, you should not use the memory collection, but the corresponding server side subquery:
var userSkills = context.UserSkills
.Where(x => x.UserId == userId);
var lessons = context.Lessons.AsNoTracking()
.Where(x => x.LessonSkills.All(y =>
userSkills.Any(z => y.SkillId == z.SkillId && y.SkillLevelId <= z.SkillLevelId)))
.ToList();
or even embed the first subquery into the main query:
var lessons = context.Lessons.AsNoTracking()
.Where(x => x.LessonSkills.All(y =>
context.UserSkills.Any(z => z.UserId == userId && y.SkillId == z.SkillId && y.SkillLevelId <= z.SkillLevelId)))
.ToList();
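To make the All/Any predicate itself easier to check, here is a minimal LINQ to Objects sketch with made-up skills and lessons (the tuple shapes and names are hypothetical stand-ins for the entities). It only demonstrates which lessons the predicate selects, not how EF Core translates it.

```csharp
using System;
using System.Linq;

// Hypothetical data: the skills (and levels) the user has.
var userSkills = new[] { (SkillId: 1, SkillLevelId: 3), (SkillId: 2, SkillLevelId: 1) };

// Hypothetical lessons, each with the skills (and minimum levels) it requires.
var lessons = new[]
{
    (Name: "A", Skills: new[] { (SkillId: 1, SkillLevelId: 2) }),
    (Name: "B", Skills: new[] { (SkillId: 1, SkillLevelId: 2), (SkillId: 2, SkillLevelId: 2) }),
    (Name: "C", Skills: new[] { (SkillId: 3, SkillLevelId: 1) }),
};

// A lesson matches when EVERY required skill is covered by SOME user skill
// with the same SkillId and an equal or higher level.
var matching = lessons
    .Where(l => l.Skills.All(y =>
        userSkills.Any(z => y.SkillId == z.SkillId && y.SkillLevelId <= z.SkillLevelId)))
    .Select(l => l.Name)
    .ToList();

Console.WriteLine(string.Join(",", matching));
```

Only lesson A matches here: B requires level 2 of skill 2 while the user has level 1, and C requires a skill the user lacks. Note that All() is vacuously true for a lesson with no required skills, so such lessons would also be returned.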
Use Contains on the server then filter further on the client:
var userSkillIds = userSkills.Select(s => s.SkillId).ToList();
var lessons = _context.Lessons.AsNoTracking()
.Where(lsn => lsn.LessonSkills.All(lsnskill => userSkillIds.Contains(lsnskill.SkillId)))
.AsEnumerable() // depending on EF Core translation, may not be needed
.Where(lsn => lsn.LessonSkills.All(lsnskill => userSkills.Any(uskill => uskill.SkillId == lsnskill.SkillId && lsnskill.SkillLevelId <= uskill.SkillLevelId)))
.ToList();

Getting the count of most repeated records in Linq

I am working on an application in which I have to store play history of a song in the data table. I have a table named PlayHistory which has four columns.
Id | SoundRecordingId(FK) | UserId(FK) | DateTime
Now I have to implement a query that returns the songs that are currently trending, i.e. the ones played most often. I have written the following SQL Server query, which returns data close to what I want.
select COUNT(*) as High,SoundRecordingId
from PlayHistory
where DateTime >= GETDATE()-30
group by SoundRecordingId
Having COUNT(*) > 1
order by SoundRecordingId desc
It returned me following data:
High SoundRecordingId
2 5
2 3
This means the songs with Ids 5 and 3 were played the most times, i.e. twice.
How can I implement this with LINQ in C#?
I have done this so far:
DateTime d = DateTime.Now;
var monthBefore = d.AddMonths(-1);
var list =
_db.PlayHistories
.OrderByDescending(x=>x.SoundRecordingId)
.Where(t => t.DateTime >= monthBefore)
.GroupBy(x=>x.SoundRecordingId)
.Take(20)
.ToList();
It returns a list of groups containing the full SoundRecording play records, but I want just the count for the most repeated records.
Thanks
There is an overload of the .GroupBy method which will solve your problem.
DateTime d = DateTime.Now;
var monthBefore = d.AddMonths(-1);
var list =
_db.PlayHistories
.OrderByDescending(x=>x.SoundRecordingId)
.Where(t => t.DateTime >= monthBefore)
.GroupBy(x=>x.SoundRecordingId, (key,values) => new {SoundRecordingID=key, High=values.Count()})
.Take(20)
.ToList();
I have simply added the result selector to the GroupBy method call here which does the same transformation you have written in your SQL.
The method overload in question is documented here
To go further into your problem, you will probably want to do another OrderByDescending to get your results in popularity order. To match the SQL statement you also have to filter for only counts > 1.
DateTime d = DateTime.Now;
var monthBefore = d.AddMonths(-1);
var list =
_db.PlayHistories
.Where(t => t.DateTime >= monthBefore)
.GroupBy(x=>x.SoundRecordingId, (key,values) => new {SoundRecordingID=key, High=values.Count()})
.Where(x=>x.High>1)
.OrderByDescending(x=>x.High)
.ToList();
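The result-selector overload can be tried out in memory. This is a minimal LINQ to Objects sketch with made-up play data (an array of ids standing in for PlayHistory rows), chosen so that the output matches the sample result in the question. The extra ThenByDescending is an assumption added only to make the order of ties deterministic.

```csharp
using System;
using System.Linq;

// Hypothetical play history: one entry per play, holding only the SoundRecordingId.
var plays = new[] { 5, 3, 5, 1, 3, 2 };

var trending = plays
    // Result selector: project each group straight to (id, play count).
    .GroupBy(id => id, (key, values) => new { SoundRecordingId = key, High = values.Count() })
    // Same as HAVING COUNT(*) > 1 in the SQL version.
    .Where(x => x.High > 1)
    // Most played first; ties broken by id, highest first.
    .OrderByDescending(x => x.High)
    .ThenByDescending(x => x.SoundRecordingId)
    .ToList();

foreach (var t in trending)
    Console.WriteLine($"{t.High} {t.SoundRecordingId}");
```

For this data the loop prints "2 5" and then "2 3", the same rows the SQL query returned.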
I like the LINQ query syntax; it's similar to SQL:
var query = from history in _db.PlayHistories
where history.DateTime >= monthBefore
group history by history.SoundRecordingId into historyGroup
where historyGroup.Count() > 1
orderby historyGroup.Key
select new { High = historyGroup.Count(), SoundRecordingId = historyGroup.Key };
var data = query.Take(20).ToList();
You're almost done. Just order your groups by the count and take the first:
var max =
_db.PlayHistories
.OrderByDescending(x=>x.SoundRecordingId)
.Where(t => t.DateTime >= monthBefore)
.GroupBy(x=>x.SoundRecordingId)
.OrderByDescending(x => x.Count())
.First();
This gives you a single grouping whose Key is your SoundRecordingId; calling Count() on it gives the number of its occurrences in your input list.
EDIT: To get all records with that count, choose this instead:
var grouped =
_db.PlayHistories
.OrderByDescending(x => x.SoundRecordingId)
.Where(t => t.DateTime >= monthBefore)
.GroupBy(x => x.SoundRecordingId)
.Select(x => new { Id = x.Key, Count = x.Count() })
.OrderByDescending(x => x.Count)
.ToList();
var maxCount = grouped.First().Count;
var result = grouped.Where(x => x.Count == maxCount);
This solves the problem with a single LINQ query that returns the 20 most-played SoundRecordingId values, most played first.
var monthBefore = DateTime.Now.AddMonths(-1);
var list = _db.PlayHistories.Where(x => x.DateTime > monthBefore)
.GroupBy(x => x.SoundRecordingId) // Count() needs a group; it cannot be called on the id itself
.OrderByDescending(g => g.Count())
.ThenBy(g => g.Key)
.Select(g => g.Key).Take(20).ToList();

LINQ - How to get subset of columns after GroupBy

This LINQ-to-SQL query works (testing in LINQpad):
var q5 = LOGs.Where(r => r.APP_NAME == "Toaster")
.GroupBy(pol => pol.CASE_NO)
.Select(grp => grp.First())
.OrderByDescending(l => l.WHEN);
q5.Dump();
However, that returns all columns for each row.
How can I refine the Select() part to specify certain columns?
I can do it in two steps by adding .ToList() to the query, then querying q5:
var q5a = q5.Select(r => new {CASE=r.CASE_NO, WHEN = r.WHEN});
q5a.Dump();
Can I accomplish that in one statement instead of two?
Thanks --
Why don't you project the columns right after the Where?
var q5 = LOGs.Where(r => r.APP_NAME == "Toaster")
.Select(r=> new{r.CASE_NO, r.WHEN})
.GroupBy(pol => pol.CASE_NO)
.Select(grp => grp.First())
.OrderByDescending(l => l.WHEN);
Remember that new {CASE = r.CASE_NO, WHEN = r.WHEN} creates an anonymous type with renamed properties (CASE and WHEN), whereas new {r.CASE_NO, r.WHEN} keeps the original property names (CASE_NO and WHEN).
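The project-then-group approach can be sketched in memory as follows. The rows are made up, with tuples standing in for the LOGs table and a hypothetical Detail column representing the columns that should be dropped. This is LINQ to Objects, so it shows the shape of the result rather than the SQL that LINQ to SQL would generate.

```csharp
using System;
using System.Linq;

// Hypothetical log rows standing in for the LOGs table.
var logs = new[]
{
    (APP_NAME: "Toaster", CASE_NO: 1, WHEN: new DateTime(2020, 1, 1), Detail: "a"),
    (APP_NAME: "Toaster", CASE_NO: 1, WHEN: new DateTime(2020, 1, 2), Detail: "b"),
    (APP_NAME: "Toaster", CASE_NO: 2, WHEN: new DateTime(2020, 1, 3), Detail: "c"),
    (APP_NAME: "Kettle",  CASE_NO: 3, WHEN: new DateTime(2020, 1, 4), Detail: "d"),
};

var q = logs.Where(r => r.APP_NAME == "Toaster")
    // Keep only the two wanted columns; property names stay CASE_NO and WHEN.
    .Select(r => new { r.CASE_NO, r.WHEN })
    .GroupBy(r => r.CASE_NO)
    // One row per case: the first one encountered in the group.
    .Select(grp => grp.First())
    .OrderByDescending(r => r.WHEN)
    .ToList();

foreach (var row in q)
    Console.WriteLine($"{row.CASE_NO} {row.WHEN:yyyy-MM-dd}");
```

The result holds one row per Toaster case, each with only CASE_NO and WHEN, ordered by WHEN descending.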

SQL Azure vs. On-Premises Timeout Issue - EF

I'm working on a report right now that runs great with our on-premises DB (just refreshed from PROD). However, when I deploy the site to Azure, I get a SQL Timeout during its execution. If I point my development instance at the SQL Azure instance, I get a timeout as well.
Goal: To output a list of customers that have had an activity created during the search range, and when that customer is found, get some other information about that customer regarding policies, etc. I've removed some of the properties below for brevity (as best I can)...
UPDATE
After lots of trial and error, I can get the entire query to run fairly consistently within 1000MS so long as this block of code is not executed.
CurrentStatus = a.Activities
.Where(b => b.ActivityType.IsReportable)
.OrderByDescending(b => b.DueDateTime)
.Select(b => b.Status.Name)
.FirstOrDefault(),
With this code in place, things begin to go haywire. I think this Where clause is a big part of it: .Where(b => b.ActivityType.IsReportable). What is the best way to grab the status name?
EXISTING CODE
Any thoughts as to why SQL Azure would timeout whereas on-premises would turn this around in less than 100MS?
return db.Customers
.Where(a => a.Activities.Where(
b => b.CreatedDateTime >= search.BeginDateCreated
&& b.CreatedDateTime <= search.EndDateCreated).Count() > 0)
.Where(a => a.CustomerGroup.Any(d => d.GroupId== search.GroupId))
.Select(a => new CustomCustomerReport
{
CustomerId = a.Id,
Manager = a.Manager.Name,
Customer = a.FirstName + " " + a.LastName,
ContactSource= a.ContactSource!= null ? a.ContactSource.Name : "Unknown",
ContactDate = a.DateCreated,
NewSale = a.Sales
.Where(p => p.Employee.IsActive)
.OrderByDescending(p => p.DateCreated)
.Select(p => new PolicyViewModel
{
//MISC PROPERTIES
}).FirstOrDefault(),
ExistingSale = a.Sales
.Where(p => p.CancellationDate == null || p.CancellationDate <= myDate)
.Where(p => p.SaleDate < myDate)
.OrderByDescending(p => p.DateCreated)
.Select(p => new SalesViewModel
{
//MISC PROPERTIES
}).FirstOrDefault(),
CurrentStatus = a.Activities
.Where(b => b.ActivityType.IsReportable)
.OrderByDescending(b => b.DueDateTime)
.Select(b => b.Disposition.Name)
.FirstOrDefault(),
CustomerGroup = a.CustomerGroup
.Where(cd => cd.GroupId == search.GroupId)
.Select(cd => new GroupViewModel
{
//MISC PROPERTIES
}).FirstOrDefault()
}).ToList();
I cannot give you a definite answer but I would recommend approaching the problem by:
Run SQL profiler locally when this code is executed and see what SQL is generated and run. Look at the query execution plan for each query and look for table scans and other slow operations. Add indexes as needed.
Check your lambdas for things that cannot be easily translated into SQL. You might be pulling the contents of a table into memory and running lambdas on the results, which will be very slow. Change your lambdas or consider writing raw SQL.
Is the Azure database the same as your local database? If not, pull the data locally so your local system is indicative.
Remove sections (i.e. CustomerGroup then CurrentDisposition then ExistingSale then NewSale) and see if there is a significant performance improvement after removing the last section. Focus on the last removed section.
Looking at the line itself:
You use ".Count() > 0" on line 4. Use ".Any()" instead, since the former goes through every row in the database to get you an accurate count when you just want to know if at least one row satisfies the requirements.
Ensure fields referenced in where clauses have indexes, such as IsReportable.
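The Count() > 0 versus Any() point can be illustrated in memory. This is a minimal sketch using a made-up iterator (Rows is hypothetical) that records how many items are pulled. It is only an analogy: on SQL Server the difference shows up as COUNT versus EXISTS, and the optimizer may narrow the gap.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int enumerated = 0;

// Yields 1..1000 and counts how many items were actually pulled.
IEnumerable<int> Rows()
{
    for (int i = 1; i <= 1000; i++)
    {
        enumerated++;
        yield return i;
    }
}

// Count() has to walk the whole sequence before the comparison can happen.
bool viaCount = Rows().Where(i => i >= 1).Count() > 0;
int pulledByCount = enumerated;

// Any() stops at the first item that satisfies the predicate.
enumerated = 0;
bool viaAny = Rows().Any(i => i >= 1);
int pulledByAny = enumerated;

Console.WriteLine($"Count() > 0 pulled {pulledByCount} rows, Any() pulled {pulledByAny}");
```

Both expressions evaluate to true, but the Count() version pulls all 1000 items while Any() stops after the first.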
Short answer: use memory.
Long answer:
Because of either bad maintenance plans or limited hardware, running this query in one big lump is what's causing it to fail on Azure. Even if that weren't the case, because of all the navigation properties you're using, this query would generate a staggering number of joins. The answer here is to break it down in smaller pieces that Azure can run. I'm going to try to rewrite your query into multiple smaller, easier to digest queries that use the memory of your .NET application. Please bear with me as I make (more or less) educated guesses about your business logic/db schema and rewrite the query accordingly. Sorry for using the query form of LINQ but I find things such as join and group by are more readable in that form.
var activityFilterCustomerIds = db.Activities
.Where(a =>
a.CreatedDateTime >= search.BeginDateCreated &&
a.CreatedDateTime <= search.EndDateCreated)
.Select(a => a.CustomerId)
.Distinct()
.ToList();
var groupFilterCustomerIds = db.CustomerGroup
.Where(g => g.GroupId == search.GroupId)
.Select(g => g.CustomerId)
.Distinct()
.ToList();
var customers = db.Customers
.AsNoTracking()
.Where(c =>
activityFilterCustomerIds.Contains(c.Id) &&
groupFilterCustomerIds.Contains(c.Id))
.ToList();
var customerIds = customers.Select(x => x.Id).ToList();
var newSales =
(from s in db.Sales
where customerIds.Contains(s.CustomerId)
&& s.Employee.IsActive
group s by s.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Sale = grouped
.OrderByDescending(x => x.DateCreated)
.Select(x => new PolicyViewModel
{
// properties
})
.FirstOrDefault()
}).ToList();
var existingSales =
(from s in db.Sales
where customerIds.Contains(s.CustomerId)
&& (s.CancellationDate == null || s.CancellationDate <= myDate)
&& s.SaleDate < myDate
group s by s.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Sale = grouped
.OrderByDescending(x => x.DateCreated)
.Select(x => new SalesViewModel
{
// properties
})
.FirstOrDefault()
}).ToList();
var currentStatuses =
(from a in db.Activities.AsNoTracking()
where customerIds.Contains(a.CustomerId)
&& a.ActivityType.IsReportable
group a by a.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Status = grouped
.OrderByDescending(x => x.DueDateTime)
.Select(x => x.Disposition.Name)
.FirstOrDefault()
}).ToList();
var customerGroups =
(from cg in db.CustomerGroups
where cg.GroupId == search.GroupId
group cg by cg.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Group = grouped
.Select(x =>
new GroupViewModel
{
// ...
})
.FirstOrDefault()
}).ToList();
return customers
.Select(c =>
new CustomCustomerReport
{
// ... simple props
// ...
// ...
NewSale = newSales
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Sale)
.FirstOrDefault(),
ExistingSale = existingSales
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Sale)
.FirstOrDefault(),
CurrentStatus = currentStatuses
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Status)
.FirstOrDefault(),
CustomerGroup = customerGroups
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Group)
.FirstOrDefault(),
})
.ToList();
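In the final in-memory assembly, each Where(...).FirstOrDefault() scans a whole list once per customer. One possible refinement, sketched here with made-up tuples in place of the real entities, is to index each pre-fetched list by CustomerId once and then look values up by key. It assumes at most one entry per customer, which holds for lists produced by grouping on CustomerId.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for the pre-fetched lists.
var customers = new[] { (Id: 1, Name: "Ann"), (Id: 2, Name: "Bob") };
var currentStatuses = new[] { (CustomerId: 1, Status: "Open") };

// Build the index once; ToDictionary throws if a CustomerId appears twice.
var statusByCustomer = currentStatuses.ToDictionary(s => s.CustomerId, s => s.Status);

var report = customers
    .Select(c => new
    {
        c.Id,
        c.Name,
        // Keyed lookup instead of scanning the list; null when the customer has no status.
        CurrentStatus = statusByCustomer.TryGetValue(c.Id, out var status) ? status : null
    })
    .ToList();

foreach (var r in report)
    Console.WriteLine($"{r.Name}: {r.CurrentStatus ?? "(none)"}");
```

For small result sets the difference is negligible; it starts to matter when the customer list and the lookup lists both grow large.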
It is hard to suggest anything without seeing the actual table definitions, especially the indexes and foreign keys on the Activities entity.
As far as I understand, Activity is (CustomerId, ActivityTypeId, DueDateTime, DispositionId). If this is a standard warehousing table (DateTime, ClientId, Activity), I'd suggest the following:
If number of Activities is reasonably small, then force the use of CONTAINS by
var activities = db.Activities.Where( x => x.IsReportable ).ToList();
...
.Where( b => activities.Contains(b.Activity) )
You can even help the optimiser by specifying that you want ActivityId.
Indexes on the Activity entity should be up to date. For this particular query I suggest (CustomerId, ActivityId, DueDateTime DESC).
Pre-cache the Disposition table; my crystal ball tells me that it's a dictionary (lookup) table.
For a similar task, to avoid constantly hitting the Activity table, I created another small table (CustomerId, LastActivity, LastValue) and updated it whenever the status changed.

Query execution time in Entity Framework vs in SQL Server

I have a query written in Linq To Entities:
db.Table<Operation>()
.Where(x => x.Date >= dateStart)
.Where(x => x.Date < dateEnd)
.GroupBy(x => new
{
x.EntityId,
x.EntityName,
x.EntityToken
})
.Select(x => new EntityBrief
{
EntityId = x.Key.EntityId,
EntityName = x.Key.EntityName,
EntityToken = x.Key.EntityToken,
Quantity = x.Count()
})
.OrderByDescending(x => x.Quantity)
.Take(5)
.ToList();
The problem is that it takes 4 seconds when executing in the application using EF. But when I take the created pure SQL Query from that query object (using Log) and fire it directly on SQL Server, then it takes 0 seconds. Is it a known problem?
Firstly, try improving your query:
var entityBriefs =
db.Table<Operation>().Where(x => x.Date >= dateStart && x.Date < dateEnd)
.GroupBy(x => x.EntityId)
.OrderByDescending(x => x.Count())
.Take(5)
.Select(x => new EntityBrief
{
EntityId = x.Key, // the group key is the EntityId itself
Quantity = x.Count()
});
var mapping = entityBriefs.ToDictionary(e => e.EntityId, e => e);
var entityInfo = db.Table<Operation>().Where(o => mapping.Keys.Contains(o.EntityId)).ToList();
foreach(var entity in entityInfo)
{
mapping[entity.EntityId].EntityName = entity.EntityName;
mapping[entity.EntityId].EntityToken = entity.EntityToken;
}
You may also compile queries with the help of CompiledQuery.Compile, and use it further with improved performance.
http://msdn.microsoft.com/en-us/library/bb399335%28v=vs.110%29.aspx
The problem was with database locks. I used the wrong isolation level, so my queries were blocked under some circumstances. Now I use read committed snapshot and the execution time looks good.
