Entity Framework - slow query after adding group by - c#

I have the following query, which runs very fast:
var query =
    (from art in ctx.Articles
     join phot in ctx.ArticlePhotos on art.Id equals phot.ArticleId
     join artCat in ctx.ArticleCategories on art.Id equals artCat.ArticleId
     join cat in ctx.Categories on artCat.CategoryId equals cat.Id
     where art.Active && art.ArticleCategories.Any(c => c.Category.MaterializedPath.StartsWith(categoryPath))
     orderby art.PublishDate descending
     select new ArticleSmallResponse
     {
         Id = art.Id,
         Title = art.Title,
         Active = art.Active,
         PublishDate = art.PublishDate ?? art.CreateDate,
         MainImage = phot.RelativePath,
         RootCategory = art.Category.Name,
         Summary = art.Summary
     })
    .AsNoTracking().Take(request.Take);
However, if I add a group by and change the query to the following statement, it runs much, much slower.
var query =
    (from art in ctx.Articles
     join phot in ctx.ArticlePhotos on art.Id equals phot.ArticleId
     join artCat in ctx.ArticleCategories on art.Id equals artCat.ArticleId
     join cat in ctx.Categories on artCat.CategoryId equals cat.Id
     where art.Active && art.ArticleCategories.Any(c => c.Category.MaterializedPath.StartsWith(categoryPath))
     orderby art.PublishDate descending
     select new ArticleSmallResponse
     {
         Id = art.Id,
         Title = art.Title,
         Active = art.Active,
         PublishDate = art.PublishDate ?? art.CreateDate,
         MainImage = phot.RelativePath,
         RootCategory = art.Category.Name,
         Summary = art.Summary
     })
    .GroupBy(m => m.Id)
    .Select(m => m.FirstOrDefault())
    .AsNoTracking().Take(request.Take);
The homepage runs the query 9 times, once for each category. With the first version of the query, without caching and over a remote SQL connection, the page loads in around 1.5 seconds, which makes it almost instant when the application runs on the server; the second version makes the homepage take around 39 seconds over the same remote connection.
Can this be fixed without rewriting the entire query as a view or stored procedure?

Grouping is an expensive operation on the database end. Without knowing what your database looks like and what indexes you've set up, it is difficult to say more. Why not just group on the client side after the data has arrived (assuming it's not an overwhelming amount)?
This question explains how.
Group by in LINQ
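A minimal sketch of that idea, assuming the first (fast) query from the question with the final Take left off, so duplicates can be removed in memory before taking the page:
var rows = query.ToList();                  // fast server-side query, materialized once
var page = rows
    .GroupBy(m => m.Id)                     // deduplicate client-side instead of in SQL
    .Select(g => g.First())
    .Take(request.Take)
    .ToList();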

Related

Improving LINQ query for many-to-many relation

I have a database with the following schema:
Now, I'm trying to pull all landingpages for a domain and sort those by the first UrlFilter's FilterType that matches a certain group. This is the LINQ I've come up with so far:
var baseQuery = DbSet.AsNoTracking()
    .Where(e => EF.Functions.Contains(EF.Property<string>(e, "Url"), $"\"{searchTerm}*\""))
    .Where(e => e.DomainLandingPages.Select(lp => lp.DomainId).Contains(domainId));
var count = baseQuery.Count();
var page = baseQuery
    .Select(e => new
    {
        LandingPage = e,
        UrlFilter = e.LandingPageUrlFilters.FirstOrDefault(f => f.UrlFilter.GroupId == groupId)
    })
    .Select(e => new
    {
        e.LandingPage,
        FilterType = e.UrlFilter == null ? UrlFilterType.NotCovered : e.UrlFilter.UrlFilter.UrlFilterType
    })
    .OrderBy(e => e.FilterType)
    .Skip(10).Take(75).ToList();
Now, while this technically works, it's quite slow with execution times ranging from 10-30 seconds, which is not good enough for the use case. The LINQ is translated to the following SQL:
SELECT [l1].[Id], [l1].[LastUpdated], [l1].[Url], CASE
WHEN (
SELECT TOP(1) [l].[LandingPageId]
FROM [LandingPageUrlFilters] AS [l]
INNER JOIN [UrlFilters] AS [u] ON [l].[UrlFilterId] = [u].[Id]
WHERE ([l1].[Id] = [l].[LandingPageId]) AND ([u].[GroupId] = @__groupId_3)) IS NULL THEN 4
ELSE (
SELECT TOP(1) [u0].[UrlFilterType]
FROM [LandingPageUrlFilters] AS [l0]
INNER JOIN [UrlFilters] AS [u0] ON [l0].[UrlFilterId] = [u0].[Id]
WHERE ([l1].[Id] = [l0].[LandingPageId]) AND ([u0].[GroupId] = @__groupId_3))
END AS [FilterType]
FROM [LandingPages] AS [l1]
WHERE CONTAINS([l1].[Url], @__Format_1) AND @__domainId_2 IN (
SELECT [d].[DomainId]
FROM [DomainLandingPages] AS [d]
WHERE [l1].[Id] = [d].[LandingPageId]
)
ORDER BY CASE
WHEN (
SELECT TOP(1) [l2].[LandingPageId]
FROM [LandingPageUrlFilters] AS [l2]
INNER JOIN [UrlFilters] AS [u1] ON [l2].[UrlFilterId] = [u1].[Id]
WHERE ([l1].[Id] = [l2].[LandingPageId]) AND ([u1].[GroupId] = @__groupId_3)) IS NULL THEN 4
ELSE (
SELECT TOP(1) [u2].[UrlFilterType]
FROM [LandingPageUrlFilters] AS [l3]
INNER JOIN [UrlFilters] AS [u2] ON [l3].[UrlFilterId] = [u2].[Id]
WHERE ([l1].[Id] = [l3].[LandingPageId]) AND ([u2].[GroupId] = @__groupId_3))
END
OFFSET @__p_4 ROWS FETCH NEXT @__p_5 ROWS ONLY
Now my question is: how can I improve the execution time of this, either in SQL or in LINQ?
EDIT: So I've been tinkering with some raw SQL and this is what I've come up with:
with matched_urls as (
select l.id, min(f.urlfiltertype) as Filter
from landingpages l
join landingpageurlfilters lpf on lpf.landingpageid = l.id
join urlfilters f on lpf.urlfilterid = f.id
where f.groupid = @groupId
and contains(Url, '"barz*"')
group by l.id
) select l.id, 5 as Filter
from landingpages l
where @domainId in (
select domainid
from domainlandingpages dlp
where l.id = dlp.landingpageid
) and l.id not in (select id from matched_urls ) and contains(Url, '"barz*"')
union select * from matched_urls
order by Filter
offset 10 rows fetch next 30 rows only
This performs somewhat okay, cutting the execution time down to ~5 seconds. As this is to be used for a table search, I would like to get it down even further. Is there any way to improve this SQL?
You're right to have a look at the generated SQL. In general, I would advise learning SQL, writing a well-performing SQL query, and working your way back from there (either use a stored procedure or raw SQL, or design your LINQ query with that same philosophy).
I suspect this will be better (not tested):
var page = (
    from e in baseQuery
    let urlFilter = e.LandingPageUrlFilters
        .OrderBy(f => f.UrlFilter.UrlFilterType)
        .FirstOrDefault(f => f.UrlFilter.GroupId == groupId)
    let filterType = urlFilter == null ? UrlFilterType.NotCovered : urlFilter.UrlFilter.UrlFilterType
    orderby filterType
    select new
    {
        LandingPage = e,
        FilterType = filterType
    }
).Skip(10).Take(75).ToList();
One way to improve the execution time is to look at the execution plan in SSMS (SQL Server Management Studio).
After looking at the execution plan you can design some indexes, or, if you have no experience with this, see whether SSMS recommends any indexes.
Next, create the indexes, execute the query again, and see whether the execution time has improved.
Note: this is only one of many possible ways to improve execution time...
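As a rough illustration of the index idea, something like the following could be declared with the EF Core fluent API (the entity and property names here are assumptions inferred from the generated SQL; the right columns to index should come from the actual execution plan):
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Supports the repeated "filter by GroupId" subqueries
    modelBuilder.Entity<UrlFilter>()
        .HasIndex(f => f.GroupId);

    // Supports the join from LandingPageUrlFilters back to LandingPages
    modelBuilder.Entity<LandingPageUrlFilter>()
        .HasIndex(f => new { f.LandingPageId, f.UrlFilterId });
}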

Get TOP5 records from each status using Linq C#

I have a VacancyApply table that contains status IDs, and I need the top 5 records for each status. Status is an int like 1, 2, 3.
My query:
var result = (from ui in _context.VacancyApply
              join s in _context.UserProfile on ui.UserId equals s.UserId
              join x in _context.Vacancy on ui.VacancyId equals x.VacancyId
              join st in _context.Status on ui.StatusId equals st.StatusId
              where ui.UserId == userId && ui.IsActive == true
              orderby ui.StatusId
              select new VacancyApply
              {
                  VacancyApplyId = ui.VacancyApplyId,
                  VacancyId = ui.VacancyId,
                  UserId = ui.UserId,
                  StatusId = ui.StatusId,
                  VacancyName = x.VacancyName,
                  VacancyStack = x.VacancyStack,
                  VacancyEndDate = x.VacancyEndDate,
                  StatusName = st.StatusName,
                  UserName = s.FirstName
              }).ToList();
What I can see from the output is that it contains one VacancyId and one VendorId.
I have a feeling that you have a many-to-many relationship between the Vacancy and Status tables.
Nevertheless, the answer is simple: you need the LINQ Take extension method (it should come after an OrderBy, because just taking the top items doesn't make sense without some ordering):
var output = (logic to join, filter, etc.).OrderBy(lambda).Take(N); // N is the number of items you want to select
Now, if you generally want to take the top items from Vacancy first and only then join them with Status, do this:
var output = Vacancy.OrderBy(lambda).Take(N).(now join, filter, etc. with other tables);
However, if you want to group all rows with the same status together with their vacancies, and only then take the top items from each group, use GroupBy:
var output = (logic to join, filter, etc.).GroupBy(st => st.StatusId)
    .Select(group => group.OrderBy(lambda).Take(N));
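Applied to the query from the question, a minimal sketch could look like this (assuming the grouping happens in memory on the already-materialized list, and that ordering within each status by VacancyApplyId is acceptable):
var top5PerStatus = result
    .GroupBy(v => v.StatusId)                        // one group per status (1, 2, 3, ...)
    .SelectMany(g => g
        .OrderByDescending(v => v.VacancyApplyId)    // assumed "top" ordering; adjust as needed
        .Take(5))
    .ToList();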

How to join a list and large lists/tables using LINQ

Initially I have a list like this:
List<Car> cars = db.Car.Where(x => x.ProductionYear == 2005).ToList();
Then I'm trying to join this list with two large tables using LINQ like this:
var joinedList = (from car in cars
                  join driver in db.Driver.ToList()
                      on car.Id equals driver.CarId
                  join building in db.Building.ToList()
                      on driver.BuildingId equals building.Id
                  select new Building
                  {
                      Name = building.Name,
                      Id = building.Id,
                      City = building.City
                  }).ToList();
Both the Driver and Building tables have about 1 million rows. When I run this join I get an out of memory exception. How can I make this join work? Should I do the join operation in the database? If so, how can I get the cars list to the database? Thanks in advance.
Even if you remove the .ToList() calls inside your join, your code will still pull all the data and perform the join in memory rather than in SQL Server. This is because you're using the local list cars in your join. The below should solve your problem:
var joinedList = (from car in db.Car.Where(x => x.ProductionYear == 2005)
                  join driver in db.Driver
                      on car.Id equals driver.CarId
                  join building in db.Building
                      on driver.BuildingId equals building.Id
                  select new Building
                  {
                      Name = building.Name,
                      Id = building.Id,
                      City = building.City
                  }).ToList();
You can remove the last .ToList() and do some paging if you expect to get too many records in the results.
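For example, a paging sketch on top of that query might look like this (pageIndex, pageSize, and the ordering column are assumptions for illustration):
int pageSize = 500;
int pageIndex = 0;
var page = (from car in db.Car.Where(x => x.ProductionYear == 2005)
            join driver in db.Driver on car.Id equals driver.CarId
            join building in db.Building on driver.BuildingId equals building.Id
            orderby building.Id                      // paging needs a stable ordering
            select new Building
            {
                Name = building.Name,
                Id = building.Id,
                City = building.City
            })
           .Skip(pageIndex * pageSize)
           .Take(pageSize)
           .ToList();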
Even if you have removed .ToList(), replace it with .AsQueryable().
AsQueryable is faster than ToList and AsEnumerable:
If you create an IQueryable, then the query may be converted to SQL and run on the database server.
If you create an IEnumerable, then all rows will be pulled into memory as objects before running the query.
In both cases, if you don't call ToList() or ToArray(), the query will be executed each time it is used; so, say you have an IQueryable and you fill 4 list boxes from it, then the query will be run against the database 4 times.
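A small sketch of that difference, using the tables from the question:
// Stays an IQueryable: the filter is translated to SQL and runs on the database server.
IQueryable<Car> serverSide = db.Car.Where(x => x.ProductionYear == 2005);

// Becomes an IEnumerable: every Car row is pulled into memory first, then filtered there.
IEnumerable<Car> clientSide = db.Car.AsEnumerable().Where(x => x.ProductionYear == 2005);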
So use the following LINQ query:
var joinedList = (from car in db.Car.Where(x => x.ProductionYear == 2005).AsQueryable()
                  join driver in db.Driver.AsQueryable()
                      on car.Id equals driver.CarId
                  join building in db.Building.AsQueryable()
                      on driver.BuildingId equals building.Id
                  select new Building
                  {
                      Name = building.Name,
                      Id = building.Id,
                      City = building.City,
                  }).ToList();
First, don't call ToList() eagerly while composing LINQ queries; use ToList() as little as possible and only in the rare scenarios where you actually need to materialize the results.
Otherwise you will get an OutOfMemoryException whenever a table contains many rows.
So, here is the code for your question:
var joinedList = (from car in db.Car.AsQueryable().Where(x => x.ProductionYear == 2005)
                  join driver in db.Driver.AsQueryable() on car.Id equals driver.CarId
                  join building in db.Building.AsQueryable() on driver.BuildingId equals building.Id
                  select new Building
                  {
                      Name = building.Name,
                      Id = building.Id,
                      City = building.City
                  }).ToList();

Get the "latest" datetime from a large linq query that currently returns every record that has a datetime

I have a fairly long LINQ query and everything works as it should, but in a final join I am doing an inner join on a table that has a log. The log returns more than 50 records, and I just want the latest one.
Here is an example:
var tst = from w in context.storage
          join p in context.products on w.id equals p.wid
          join l in context.logger on p.id equals l.pid
          select new
          {
              storageid = w.id,
              productid = p.id,
              productname = p.name,
              bought = l.when
          };
A quick explanation of what happens: each product is stored in a storage center, and there is a log entry for each time that product was bought; if it was bought 100 times, then there are 100 records in the logger.
So currently it returns 50 records for productid = 5, because it was bought 50 times, but I only want 1 record, i.e. only the latest datetime from the logger.
Can anyone help? I am a little stuck.
Use result.DistinctBy(x => x.Prop) (MoreLINQ, or built in from .NET 6) to get unique entries only.
Use result.Max(x => x.Prop) to get the latest date, and Min() to get the earliest.
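A minimal sketch of combining those ideas with a group-by, using the names from the question's query (one row per product with its latest purchase time):
var latestPerProduct = from l in context.logger
                       group l by l.pid into g
                       select new
                       {
                           productid = g.Key,
                           bought = g.Max(x => x.when)
                       };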
This is a case where you want to restrict the collection of records on which to join, which you can do by coding the join manually (sort of):
from w in context.storage
join p in context.products on w.id equals p.wid
// "manual" join:
from l in context.logger.Where(l => l.pid == p.id).OrderByDescending(l => l.when).Take(1)
select new
{
    storageid = w.id,
    productid = p.id,
    productname = p.name,
    bought = l.when
};
In fluent LINQ syntax this is a SelectMany with a result selector.
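For reference, an untested fluent-syntax sketch of the same query, built from the same names:
var tst = context.storage
    .Join(context.products, w => w.id, p => p.wid, (w, p) => new { w, p })
    .SelectMany(
        wp => context.logger
            .Where(l => l.pid == wp.p.id)
            .OrderByDescending(l => l.when)
            .Take(1),
        (wp, l) => new
        {
            storageid = wp.w.id,
            productid = wp.p.id,
            productname = wp.p.name,
            bought = l.when
        });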

Linq join efficiency question

// Loop over each user's profile
using (DataClassesDataContext db = new DataClassesDataContext())
{
    var q = (from P in db.tblProfiles
             orderby P.UserID descending
             select new { LastUpdated = P.ProfileLastUpdated, UserID = P.UserID }).ToList();
    foreach (var Rec in q)
    {
        string Username = db.tblForumAuthors.SingleOrDefault(author => author.Author_ID == Rec.UserID).Username;
        AddURL(("Users/" + Rec.UserID + "/" + Username), Rec.LastUpdated.Value, ChangeFrequency.daily, 0.4);
    }
}
This is for my sitemap, printing a URL for each user's profile on the system. But say we have 20,000 users: is the Username query going to slow this down significantly?
I'm used to having the join in the SQL query, but having it separated from the main query and inside the loop seems like it could be inefficient unless it compiles well.
It will probably be unbearably slow. In your case this will issue 20,000 separate SQL queries to the database. Since the queries run synchronously, you will incur the server communication overhead on each iteration. The delay will accumulate quite fast.
Go with a join.
from P in db.tblProfiles
join A in db.tblForumAuthors on P.UserID equals A.Author_ID
orderby P.UserID descending
select new { LastUpdated = P.ProfileLastUpdated, UserID = P.UserID, Username = A.Username };
By the way, SingleOrDefault(...).Username will throw a NullReferenceException if the author is missing. Better use Single() or check your logic.
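For example, a guarded version of that lookup could look like this (a sketch only, keeping the names from the question):
var author = db.tblForumAuthors.SingleOrDefault(a => a.Author_ID == Rec.UserID);
if (author != null)
{
    AddURL("Users/" + Rec.UserID + "/" + author.Username, Rec.LastUpdated.Value, ChangeFrequency.daily, 0.4);
}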
If you have constraints set up correctly in your database when designing the DataContext, then the designer should generate one-to-one association members in your Profile and Author classes.
If not, you can add them manually in the designer.
Then you will be able to do something like this:
var q =
    from profile in db.tblProfiles
    orderby profile.UserID descending
    select new
    {
        LastUpdated = profile.ProfileLastUpdated,
        profile.UserID,
        profile.Author.Username
    };
Do the JOIN!
Save yourself unnecessary database access: join and get everything you need in a single shot!
