Linq Count inside Select, nested, minimizing immediate execution database dependency calls - c#

I have a massive LINQ query that fetches hierarchical category information: first-level categories, which own second-level categories, which own third-level categories. For each category we also retrieve the number of listings it contains.
Here is the query:
categories = categoryRepository
    .Categories
    .Where(x => x.ParentID == null)
    .Select(x => new CategoryBrowseIndexViewModel
    {
        CategoryID = x.CategoryID,
        FriendlyName = x.FriendlyName,
        RoutingName = x.RoutingName,
        ListingCount = listingRepository
            .Listings
            .Where(y => y.SelectedCategoryOneID == x.CategoryID
                && y.Lister.Status != Subscription.StatusEnum.Cancelled.ToString())
            .Count(),
        BrowseCategoriesLevelTwoViewModels = categoryRepository
            .Categories
            .Where(a => a.ParentID == x.CategoryID)
            .Select(a => new BrowseCategoriesLevelTwoViewModel
            {
                CategoryID = a.CategoryID,
                FriendlyName = a.FriendlyName,
                RoutingName = a.RoutingName,
                ParentRoutingName = x.RoutingName,
                ListingCount = listingRepository
                    .Listings
                    .Where(n => n.SelectedCategoryTwoID == a.CategoryID
                        && n.Lister.Status != Subscription.StatusEnum.Cancelled.ToString())
                    .Count(),
                BrowseCategoriesLevelThreeViewModels = categoryRepository
                    .Categories
                    .Where(b => b.ParentID == a.CategoryID)
                    .Select(b => new BrowseCategoriesLevelThreeViewModel
                    {
                        CategoryID = b.CategoryID,
                        FriendlyName = b.FriendlyName,
                        RoutingName = b.RoutingName,
                        ParentRoutingName = a.RoutingName,
                        ParentParentID = x.CategoryID,
                        ParentParentRoutingName = x.RoutingName,
                        ListingCount = listingRepository
                            .Listings
                            .Where(n => n.SelectedCategoryThreeID == b.CategoryID
                                && n.Lister.Status != Subscription.StatusEnum.Cancelled.ToString())
                            .Count()
                    })
                    .Distinct()
                    .OrderBy(b => b.FriendlyName)
                    .ToList()
            })
            .Distinct()
            .OrderBy(a => a.FriendlyName)
            .ToList()
    })
    .Distinct()
    .OrderBy(x => x.FriendlyName == jobVacanciesFriendlyName)
    .ThenBy(x => x.FriendlyName == servicesLabourHireFriendlyName)
    .ThenBy(x => x.FriendlyName == goodsEquipmentFriendlyName)
    .ToList();
This was fast enough on my dev machine, but alas! Deployed to Azure, it's very slow. The reason seems to be that this query makes hundreds of dependency calls to the database, and I'm pretty sure that's because of the immediate execution of the Count statements. Although the app and the database are in the same datacenter, the calls add up in a way they didn't on my dev machine (~40 s vs < 1 s). So what I'd like to do is send this whole thing off to the database, let it crunch, and get it all back in one hit, if that's possible. How do I do this? Also, if I'm approaching this whole thing wrong, please tell me. This is the biggest bottleneck in my web app, so any help making it more efficient is appreciated. Thank you! (I'm less concerned about web app memory usage than about the cumulative effect of all the database calls.)

This is my suggestion for your massive query:
Don't use ToList() inside the inner queries.
Don't use Count() inside the inner queries.
Try to retrieve all the data at once, without the immediate-execution operators above. In other words, keep the query as an IQueryable until the single fetch. After loading the data into the app's memory, you can build your view models however you like. This should give your app a big performance boost, because it replaces many database round trips with one. So try that and let us know.
Update: about Count()
If your listings have a lot of columns, fetch just the one column you need with a projection instead of calling Count() in the query. Then call Count() on the resulting in-memory list, in other words in your app's memory after fetching it from the database.
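To make the difference between the two patterns concrete, here is a minimal LINQ to Objects sketch, not EF code. A hypothetical `Listings()` iterator stands in for the database and counts how many times it is enumerated, as a stand-in for round trips; the category IDs are made up. Whether EF really issues one query per nested `Count()` depends on the provider and version (it can sometimes translate such a count into a subquery), so treat this only as an illustration of deferred versus immediate execution.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical SelectedCategoryOneID values, standing in for the Listings table.
int trips = 0;
var rows = new List<int> { 1, 1, 2, 3, 3, 3 };
var categoryIds = new List<int> { 1, 2, 3 };

// Each enumeration of this iterator increments `trips`,
// the way each executed query would be a separate round trip.
IEnumerable<int> Listings()
{
    trips++;
    foreach (var id in rows) yield return id;
}

// Pattern 1: a Count() per category, each one re-enumerating the source.
var perCategory = categoryIds
    .Select(c => new { CategoryId = c, Count = Listings().Count(id => id == c) })
    .ToList();
int tripsNested = trips;

// Pattern 2: fetch the single projected column once, then count in memory.
trips = 0;
var fetched = Listings().ToList();
var inMemory = categoryIds
    .Select(c => new { CategoryId = c, Count = fetched.Count(id => id == c) })
    .ToList();
int tripsOnce = trips;

Console.WriteLine($"nested: {tripsNested} trips, fetch-once: {tripsOnce} trip");
```

Running it prints `nested: 3 trips, fetch-once: 1 trip`: with three categories the nested pattern enumerates the source three times, while fetching once and counting in memory enumerates it once, and both produce the same counts.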

Here's what I've got so far. It's working really well, but I'm still curious whether I can do this in one DB trip instead of two. That seems to be complicated by the fact that each repository has its own DbContext. If you have any more thoughts, I'd be more than happy to upvote you.
var allCategories = categoryRepository
    .Categories
    .Select(x => new
    {
        x.CategoryID,
        x.FriendlyName,
        x.RoutingName,
        x.ParentID
    })
    .ToList();
var allListings = listingRepository
    .Listings
    .Where(x => x.Lister.Status != Subscription.StatusEnum.Cancelled.ToString())
    .Select(x => new
    {
        x.SelectedCategoryOneID,
        x.SelectedCategoryTwoID,
        x.SelectedCategoryThreeID
    })
    .ToList();
categories = allCategories
    .Where(x => x.ParentID == null)
    .Select(a => new CategoryBrowseIndexViewModel
    {
        CategoryID = a.CategoryID,
        FriendlyName = a.FriendlyName,
        RoutingName = a.RoutingName,
        ListingCount = allListings
            .Where(x => x.SelectedCategoryOneID == a.CategoryID)
            .Count(),
        BrowseCategoriesLevelTwoViewModels = allCategories
            .Where(x => x.ParentID == a.CategoryID)
            .Select(b => new BrowseCategoriesLevelTwoViewModel
            {
                CategoryID = b.CategoryID,
                FriendlyName = b.FriendlyName,
                RoutingName = b.RoutingName,
                ParentRoutingName = a.RoutingName,
                ListingCount = allListings
                    .Where(x => x.SelectedCategoryTwoID == b.CategoryID)
                    .Count(),
                BrowseCategoriesLevelThreeViewModels = allCategories
                    .Where(x => x.ParentID == b.CategoryID)
                    .Select(c => new BrowseCategoriesLevelThreeViewModel
                    {
                        CategoryID = c.CategoryID,
                        FriendlyName = c.FriendlyName,
                        RoutingName = c.RoutingName,
                        ParentRoutingName = b.RoutingName,
                        ParentParentID = a.CategoryID,
                        ParentParentRoutingName = a.RoutingName,
                        ListingCount = allListings
                            .Where(x => x.SelectedCategoryThreeID == c.CategoryID)
                            .Count()
                    })
                    .OrderBy(x => x.FriendlyName)
            })
            .OrderBy(x => x.FriendlyName)
    })
    .OrderBy(x => x.FriendlyName == jobVacanciesFriendlyName)
    .ThenBy(x => x.FriendlyName == servicesLabourHireFriendlyName)
    .ThenBy(x => x.FriendlyName == goodsEquipmentFriendlyName);
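The version above still calls `allListings.Where(...).Count()` once per category, so every count scans the whole listings list in memory. A possible refinement (a suggestion, not part of the original post) is to compute all counts once into dictionaries keyed by category ID. The sketch below uses hypothetical nullable tuples in place of the projected anonymous type, plus two hypothetical helpers, `CountBy` and `ListingCount`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows shaped like the projected allListings items
// (SelectedCategoryOneID, SelectedCategoryTwoID, SelectedCategoryThreeID).
var allListings = new List<(int? One, int? Two, int? Three)>
{
    (1, 10, 100), (1, 10, 101), (1, 11, null), (2, 20, 200), (2, null, null)
};

// Build a lookup table once: category ID -> number of listings.
// OfType<int>() drops null IDs (listing not classified at that level).
Dictionary<int, int> CountBy(Func<(int? One, int? Two, int? Three), int?> key) =>
    allListings
        .Select(key)
        .OfType<int>()
        .GroupBy(id => id)
        .ToDictionary(g => g.Key, g => g.Count());

var levelOneCounts = CountBy(l => l.One);
var levelTwoCounts = CountBy(l => l.Two);
var levelThreeCounts = CountBy(l => l.Three);

// O(1) lookup replacing a full scan; categories without listings get 0.
int ListingCount(Dictionary<int, int> counts, int categoryId) =>
    counts.TryGetValue(categoryId, out var n) ? n : 0;

Console.WriteLine(ListingCount(levelOneCounts, 1));     // 3
Console.WriteLine(ListingCount(levelTwoCounts, 10));    // 2
Console.WriteLine(ListingCount(levelThreeCounts, 999)); // 0
```

Inside the view-model projection, `ListingCount = allListings.Where(x => x.SelectedCategoryOneID == a.CategoryID).Count()` would then become something like `ListingCount = ListingCount(levelOneCounts, a.CategoryID)`. It may also be worth letting the database do the grouping (a `GroupBy` on the category ID followed by `Count()`), so only one row per category is transferred; checking the generated SQL for your EF version would confirm whether that happens.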

Related

Is there a way to simplify these linq statements using an .Include()?

Currently I am doing a keyword search on the Plates table (Name column), but I also have a Search table (searched on its SearchTerm column) that contains plate IDs. I want to search that table too and return the corresponding plates.
The code below works, but I'd like to simplify the logic using an .Include statement if possible, although I'm not quite sure how. Any help would be greatly appreciated.
if (!string.IsNullOrEmpty(request.Keyword))
{
    // Query 1: IDs of plates whose search terms match the keyword
    var searchTermPlateIds = await _db.Search
        .Where(x => x.SearchTerm.ToLower().Contains(request.Keyword.Trim().ToLower()))
        .Select(x => x.PlatformId)
        .ToListAsync(ct);
    // Query 2: enabled plates with those IDs
    var platesFromPlateIds = await _db.Plates
        .OrderBy(x => x.Name)
        .Where(x => searchTermPlateIds.Contains(x.Id) && x.Status != PlateStatus.Disabled)
        .ToListAsync(ct);
    // Query 3: enabled plates whose name matches the keyword
    plates = await _db.Plates
        .OrderBy(x => x.Name)
        .Where(x => !string.IsNullOrEmpty(request.Keyword.Trim())
            && x.Name.ToLower().Contains(request.Keyword.Trim().ToLower())
            && x.Status != PlateStatus.Disabled)
        .ToListAsync(ct);
    plates = plates.Union(platesFromPlateIds).ToList();
}
Remember one simple thing: Include is ONLY for loading related data, not for filtering.
What we can do here is optimize the query so that it makes only one request to the database instead of three.
var query = _db.Plates
    .Where(x => x.Status != PlateStatus.Disabled);
if (!string.IsNullOrEmpty(request.Keyword))
{
    var keyword = request.Keyword.Trim().ToLower();
    // do not materialize Ids
    var searchTermPlateIds = _db.Search
        .Where(x => x.SearchTerm.ToLower().Contains(keyword))
        .Select(x => x.PlatformId);
    // queryable will be combined into one query; the name match is kept
    // so the result matches the original Union of both searches
    query = query
        .Where(x => x.Name.ToLower().Contains(keyword)
            || searchTermPlateIds.Contains(x.Id));
}
// final materialization, here you can add Includes if needed.
var plates = await query
    .OrderBy(x => x.Name)
    .ToListAsync(ct);
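The question merges two result sets with `Union`; the same set can be expressed as one predicate (name match OR ID match), which is what lets the database answer it in a single query. The in-memory sketch below checks that equivalence on made-up data; the tuples stand in for the `Plates` and `Search` tables, and with EF the deferred `searchTermPlateIds` should become a subquery rather than a separate round trip.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for the Plates and Search tables.
var plates = new List<(int Id, string Name, bool Disabled)>
{
    (1, "Alpha Steel", false), (2, "Beta Copper", false),
    (3, "Shiny Brass", false), (4, "Alpha Old", true)
};
var search = new List<(string SearchTerm, int PlatformId)>
{
    ("shiny", 2), ("Shiny alpha", 4), ("shiny brass", 3)
};
string keyword = " SHINY ".Trim().ToLower();

// IDs matched through the Search table, kept deferred (not materialized).
var searchTermPlateIds = search
    .Where(s => s.SearchTerm.ToLower().Contains(keyword))
    .Select(s => s.PlatformId);

// Single predicate: name match OR id match, plus the shared Disabled filter.
var combined = plates
    .Where(p => !p.Disabled)
    .Where(p => p.Name.ToLower().Contains(keyword) || searchTermPlateIds.Contains(p.Id))
    .OrderBy(p => p.Name)
    .ToList();

// The question's approach: two result sets merged with Union (duplicates removed).
var byName = plates.Where(p => !p.Disabled && p.Name.ToLower().Contains(keyword));
var byId = plates.Where(p => !p.Disabled && searchTermPlateIds.Contains(p.Id));
var unioned = byName.Union(byId).OrderBy(p => p.Name).ToList();

Console.WriteLine(string.Join(", ", combined.Select(p => p.Name)));
```

Plate 3 matches both by name and by search term, yet appears once in either result; plate 4 matches a search term but is filtered out as disabled.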

LINQ OrderBy for complex entity

I have a complex query:
var containers = this.Repository.Containers
    .Include(x => x.PostsContainers)
        .ThenInclude(x => x.Post)
            .ThenInclude(x => x.TasksPosts)
                .ThenInclude(x => x.Task)
                    .ThenInclude(x => x.AssignedToUser);
var items = containers.Where(x => x.PostsContainers
    .Any(y => y.Post.TasksPosts
        .Any(z => z.Task.DateDue <= DateTime.UtcNow.AddDays(7)
            && !z.Task.Completed
            && z.Task.AssignedToUserId.Value == userId)));
I got the items, but I also need to sort them by Task.DateDue and extract the AssignedToUser name.
What's the best way to do it (with good performance, without code duplication)? Maybe I need to rewrite my code?
Try to refactor it this way:
var dateDue = DateTime.UtcNow.AddDays(7);
var result = (from c in this.Repository.Containers
              from pc in c.PostsContainers
              from tp in pc.Post.TasksPosts
              where tp.Task.DateDue <= dateDue
                  && !tp.Task.Completed
                  && tp.Task.AssignedToUserId == userId
              orderby tp.Task.DateDue
              select new
              {
                  Container = c,
                  tp.Task.AssignedToUser
              })
    .ToList();
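The chained `from` clauses flatten the graph (they compile to `SelectMany`), so the result has one row per matching task, and a container with several matching tasks appears several times. The sketch below runs the same shape of query with LINQ to Objects on a hypothetical graph (tuples in place of the Container/Post/Task entities, a hypothetical `MakeTask` helper, made-up dates), then shows one optional way to collapse the rows back to one per container, ordered by its earliest due date. Doing that grouping after `ToList()` avoids depending on how a given EF version translates it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var now = new DateTime(2024, 1, 1);
int userId = 7;

// Hypothetical helper: builds one task row (a tuple standing in for the Task entity).
(DateTime DateDue, bool Completed, int? AssignedToUserId, string AssignedToUser) MakeTask(
    int days, bool completed, int? assignedToUserId, string assignedToUser) =>
    (now.AddDays(days), completed, assignedToUserId, assignedToUser);

// Hypothetical object graph: container -> posts -> tasks.
var containers = new[]
{
    (Name: "A", Posts: new[] { new[] { MakeTask(5, false, 7, "Ann"), MakeTask(2, false, 7, "Ann") } }),
    (Name: "B", Posts: new[] { new[] { MakeTask(1, false, 7, "Ann") },
                               new[] { MakeTask(3, false, 8, "Bob"), MakeTask(4, true, 7, "Ann") } }),
};

var dateDue = now.AddDays(7);

// Multiple "from" clauses flatten the graph: one row per matching task.
var rows = (from c in containers
            from post in c.Posts
            from task in post
            where task.DateDue <= dateDue
                && !task.Completed
                && task.AssignedToUserId == userId
            orderby task.DateDue
            select new { Container = c.Name, task.DateDue, task.AssignedToUser })
           .ToList();

// Optional: one row per container, ordered by its earliest matching due date.
var perContainer = rows
    .GroupBy(r => r.Container)
    .Select(g => new
    {
        Container = g.Key,
        FirstDue = g.Min(r => r.DateDue),
        Users = g.Select(r => r.AssignedToUser).Distinct().ToList()
    })
    .OrderBy(x => x.FirstDue)
    .ToList();

foreach (var p in perContainer)
    Console.WriteLine($"{p.Container}: first due {p.FirstDue:yyyy-MM-dd}, users {string.Join(", ", p.Users)}");
```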

ANY with ALL in Entity Framework evaluates locally

I have the following Entity Framework Core 2.0 query:
var user = context.Users.AsNoTracking()
    .Include(x => x.UserSkills).ThenInclude(x => x.Skill)
    .Include(x => x.UserSkills).ThenInclude(x => x.SkillLevel)
    .FirstOrDefault(x => x.Id == userId);
var userSkills = user.UserSkills.Select(z => new
{
    SkillId = z.SkillId,
    SkillLevelId = z.SkillLevelId
}).ToList();
Then I tried the following query:
var lessons = _context.Lessons.AsNoTracking()
    .Where(x => x.LessonSkills.All(y =>
        userSkills.Any(z => y.SkillId == z.SkillId && y.SkillLevelId <= z.SkillLevelId)))
    .ToList();
This query evaluates locally and I get the message:
The LINQ expression 'where (([y].SkillId == [z].SkillId) AndAlso ([y].SkillLevelId <= [z].SkillLevelId))' could not be translated and will be evaluated locally.
I tried to solve it using userSkills instead of user.UserSkills but no luck.
Is there a way to run this query on the server?
Try to limit the use of in-memory collections inside LINQ to Entities queries to Contains on a collection of primitive values, which is currently the only server-translatable construct.
Since Contains is not applicable here, don't use the in-memory collection; use the corresponding server-side subquery instead:
var userSkills = context.UserSkills
    .Where(x => x.UserId == userId);
var lessons = context.Lessons.AsNoTracking()
    .Where(x => x.LessonSkills.All(y =>
        userSkills.Any(z => y.SkillId == z.SkillId && y.SkillLevelId <= z.SkillLevelId)))
    .ToList();
or even embed the first subquery into the main query:
var lessons = context.Lessons.AsNoTracking()
    .Where(x => x.LessonSkills.All(y =>
        context.UserSkills.Any(z => z.UserId == userId
            && y.SkillId == z.SkillId
            && y.SkillLevelId <= z.SkillLevelId)))
    .ToList();
Use Contains on the server, then filter further on the client:
var userSkillIds = userSkills.Select(s => s.SkillId).ToList();
var lessons = _context.Lessons.AsNoTracking()
    .Include(lsn => lsn.LessonSkills) // the client-side filter below needs LessonSkills loaded
    // server side: every required skill is one the user has
    .Where(lsn => lsn.LessonSkills.All(lsnskill => userSkillIds.Contains(lsnskill.SkillId)))
    .AsEnumerable() // depending on EF Core translation, may not be needed
    // client side: also check the required skill level
    .Where(lsn => lsn.LessonSkills.All(lsnskill => userSkills.Any(uskill =>
        uskill.SkillId == lsnskill.SkillId && lsnskill.SkillLevelId <= uskill.SkillLevelId)))
    .ToList();
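The two-phase idea works because the server-side `Contains` test is a necessary condition: a lesson that requires a skill the user lacks can never pass the level check, so discarding it early changes nothing. The in-memory sketch below illustrates this with hypothetical data (tuples for skills and lessons, a hypothetical `Skills` helper and `Qualifies` predicate) and checks that prefilter plus exact filter gives the same lessons as the exact filter alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper to build a list of (SkillId, SkillLevelId) pairs.
List<(int SkillId, int SkillLevelId)> Skills(params (int SkillId, int SkillLevelId)[] pairs) =>
    pairs.ToList();

// Hypothetical data: the user's skills and each lesson's required skills.
var userSkills = Skills((1, 3), (2, 1));
var lessons = new List<(string Name, List<(int SkillId, int SkillLevelId)> LessonSkills)>
{
    ("Intro", Skills((1, 1))),
    ("Mixed", Skills((1, 2), (2, 1))),
    ("Hard", Skills((2, 3))),   // skill known, but required level too high
    ("Other", Skills((5, 1))),  // skill the user does not have at all
};

// Phase 1 (the part EF can translate): every required SkillId is one the user has.
var userSkillIds = userSkills.Select(s => s.SkillId).ToList();
var candidates = lessons
    .Where(l => l.LessonSkills.All(ls => userSkillIds.Contains(ls.SkillId)))
    .ToList();

// Phase 2 (in memory): the exact test, including the skill level.
bool Qualifies((string Name, List<(int SkillId, int SkillLevelId)> LessonSkills) lesson) =>
    lesson.LessonSkills.All(ls => userSkills.Any(us =>
        us.SkillId == ls.SkillId && ls.SkillLevelId <= us.SkillLevelId));

var eligible = candidates.Where(Qualifies).ToList();

// Phase 1 only drops lessons that would fail phase 2 anyway.
var direct = lessons.Where(Qualifies).ToList();

Console.WriteLine(string.Join(", ", eligible.Select(l => l.Name)));
```

Phase 1 keeps "Intro", "Mixed" and "Hard"; phase 2 then removes "Hard" because level 3 is required but the user has level 1.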

Group By Select New Object

I want to retrieve a list of games from my database, count the number of games that a specified team won and lost, and put the counts into an object with Wins and Losses properties. I was trying this, but it doesn't seem to be correct:
var winLoss = _teamService.GetGames()
    .Where(x => x.Result != "Tie")
    .GroupBy(x => x.Result)
    .Select(x => new
    {
        Wins = x.Count(a => a.Result == "Hello"),
        Losses = x.Count(a => a.Result != "Hello")
    });
The return type of this is an IQueryable, whereas I want it to be a single object with Wins and Losses properties.
Doing a GroupBy on Result puts all the wins for the current team into one group, and each team they lost to into a separate group of its own.
With a LINQ query you're going to end up with a collection, but what you care about is essentially a list of keys and values. I believe this will give you the information you're looking for:
var winLoss = _teamService.GetGames()
    .Where(x => x.Result != "Tie")
    .GroupBy(x => x.Result)
    .ToDictionary(e => e.Key, e => e.Count());
int wins = 0;
int losses = 0;
winLoss.TryGetValue("WIN", out wins);
winLoss.TryGetValue("LOSS", out losses);
I just went with two simple count calls to the SQL database.
Wins = _teamService.GetGames().Count(x => x.Result == "Name");
Losses = _teamService.GetGames().IsNotTie().Count(x => x.Result != "Name");
It's not 100% what I wanted, but doing it in one call involved more complicated LINQ and therefore more complicated SQL.
You need to count the wins and losses for each team:
var winLoss = _teamService.GetGames()
    .GroupBy(x => x.Team)
    .Where(gg => gg.Key == "Hello")
    .Select(gg => new
    {
        Wins = gg.Count(g => g.Result == "Hello"),
        Losses = gg.Count(g => g.Result != "Hello")
    });
Add FirstOrDefault() at the end of your LINQ query so you get only the first element:
var winLoss = _teamService.GetGames()
    .Where(x => x.Result != "Tie")
    .GroupBy(x => x.Result);
var wins = winLoss.Select(x => x.Count(a => a.Result == "Hello")).FirstOrDefault();
var losses = winLoss.Select(x => x.Count(a => a.Result != "Hello")).FirstOrDefault();
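A pattern that produces a single object with both counts is to group on a constant key, so every remaining game falls into one group, and compute both conditional counts on it. The sketch below shows the logic with LINQ to Objects on a hypothetical model: it assumes `Result` holds the winner's name or "Tie" (as in the question) and adds hypothetical `HomeTeam`/`AwayTeam` columns to select the team's games. Whether a given EF version turns this into a single SQL statement is worth confirming in the SQL log.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical game rows: Result holds the winning team's name, or "Tie".
var games = new List<(string HomeTeam, string AwayTeam, string Result)>
{
    ("Hello", "Lions", "Hello"),
    ("Tigers", "Hello", "Tigers"),
    ("Hello", "Bears", "Tie"),
    ("Hello", "Wolves", "Hello"),
    ("Lions", "Bears", "Lions"), // does not involve "Hello"
};
const string team = "Hello";

var winLoss = games
    .Where(g => g.HomeTeam == team || g.AwayTeam == team) // only this team's games
    .Where(g => g.Result != "Tie")
    .GroupBy(g => 1) // constant key: all remaining games form a single group
    .Select(grp => new
    {
        Wins = grp.Count(g => g.Result == team),
        Losses = grp.Count(g => g.Result != team)
    })
    .FirstOrDefault() ?? new { Wins = 0, Losses = 0 }; // team has no decided games

Console.WriteLine($"Wins: {winLoss.Wins}, Losses: {winLoss.Losses}");
```

With the sample rows this prints `Wins: 2, Losses: 1`; the tie and the game between two other teams are excluded before grouping.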

Query execution time in Entity Framework vs in SQL Server

I have a query written in LINQ to Entities:
db.Table<Operation>()
    .Where(x => x.Date >= dateStart)
    .Where(x => x.Date < dateEnd)
    .GroupBy(x => new
    {
        x.EntityId,
        x.EntityName,
        x.EntityToken
    })
    .Select(x => new EntityBrief
    {
        EntityId = x.Key.EntityId,
        EntityName = x.Key.EntityName,
        EntityToken = x.Key.EntityToken,
        Quantity = x.Count()
    })
    .OrderByDescending(x => x.Quantity)
    .Take(5)
    .ToList();
The problem is that it takes 4 seconds when executed in the application using EF. But when I take the generated SQL from that query object (using Log) and run it directly on SQL Server, it takes 0 seconds. Is this a known problem?
Firstly, try improving your query:
var entityBriefs = db.Table<Operation>()
    .Where(x => x.Date >= dateStart && x.Date < dateEnd)
    .GroupBy(x => x.EntityId)
    .OrderByDescending(x => x.Count())
    .Take(5)
    .Select(x => new EntityBrief
    {
        EntityId = x.Key, // the group key is the EntityId itself
        Quantity = x.Count()
    });
var mapping = entityBriefs.ToDictionary(e => e.EntityId, e => e);
var ids = mapping.Keys.ToList(); // a plain list translates to SQL IN (...)
var entityInfo = db.Table<Operation>()
    .Where(o => ids.Contains(o.EntityId))
    .Select(o => new { o.EntityId, o.EntityName, o.EntityToken })
    .Distinct()
    .ToList();
foreach (var entity in entityInfo)
{
    mapping[entity.EntityId].EntityName = entity.EntityName;
    mapping[entity.EntityId].EntityToken = entity.EntityToken;
}
var top5 = mapping.Values.OrderByDescending(e => e.Quantity).ToList();
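The two-step idea in this answer (group on the ID alone and take the top five, then fetch names and tokens only for those IDs) can be checked in memory with a small LINQ to Objects sketch. The tuple rows, dates and values below are made up, and the sketch says nothing about the EF timing itself; it only shows that the logic yields the same top entities with their details attached.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical operation rows: (EntityId, EntityName, EntityToken, Date).
var start = new DateTime(2024, 1, 1);
var end = start.AddDays(1);
var ops = new List<(int EntityId, string EntityName, string EntityToken, DateTime Date)>
{
    (1, "One", "t1", start), (1, "One", "t1", start.AddHours(1)),
    (2, "Two", "t2", start), (2, "Two", "t2", start), (2, "Two", "t2", start),
    (3, "Three", "t3", start),
    (4, "Four", "t4", end), // outside the range, because Date < end fails
};

// Step 1: group on the key alone and keep the top five by count.
var top = ops
    .Where(o => o.Date >= start && o.Date < end)
    .GroupBy(o => o.EntityId)
    .OrderByDescending(g => g.Count())
    .Take(5)
    .Select(g => new { EntityId = g.Key, Quantity = g.Count() })
    .ToList();

// Step 2: look up name and token only for the selected IDs.
var ids = top.Select(t => t.EntityId).ToList();
var details = ops
    .Where(o => ids.Contains(o.EntityId))
    .GroupBy(o => o.EntityId)
    .ToDictionary(g => g.Key, g => (g.First().EntityName, g.First().EntityToken));

// Combine, keeping the order of step 1 (highest quantity first).
var result = top
    .Select(t => (t.EntityId, details[t.EntityId].EntityName, details[t.EntityId].EntityToken, t.Quantity))
    .ToList();

foreach (var r in result)
    Console.WriteLine($"{r.EntityId} {r.EntityName} {r.EntityToken}: {r.Quantity}");
```

Building `result` from `top` rather than from the dictionary keeps the ranking explicit, since dictionary enumeration order is not something to rely on.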
You may also precompile queries with CompiledQuery.Compile and reuse them for better performance.
http://msdn.microsoft.com/en-us/library/bb399335%28v=vs.110%29.aspx
The problem was database locking. I was using the wrong isolation level, so my queries were blocked under some circumstances. Now I use read committed snapshot isolation, and the execution time looks good.
