Optimize LINQ query - c#

I have a report that shows orders made to a determined merchant, and it was working fine until I needed to add a filter for payment status.
This is how I build the query, filter by filter:
var queryOrder = context.Orders.Select(m=>m);
if (viewModel.InitialDate.HasValue)
queryOrder = queryOrder.Where(m => m.CreatedDate.Date >= viewModel.InitialDate.Value);
(...) /* continues building the query, filter by filter */
if (viewModel.SelectedPaymentStatus != null)
queryOrder = queryOrder.Where(m => viewModel.SelectedPaymentStatus.Contains(m.Payments.Select(p => p.PaymentStatusId).Single().ToString()));
queryOrder = queryOrder.Where(m => m.MerchantId == merchantId);
When I run queryOrder, even if it's only a queryOrder.Count(), it takes over 1 minute to execute. Using SQL Server's profiling tool, I extracted the generated query as this:
SELECT [t0].[Id], [t0].[CustomerId], [t0].[MerchantId], [t0].[OrderNumber], [t0].[Amount], [t0].[SoftDescriptor], [t0].[ShippingMethod], [t0].[ShippingPrice], [t0].[IpAddress], [t0].[SellerComment], [t0].[CreatedDate]
FROM [dbo].[Order] AS [t0]
WHERE ([t0].[MerchantId] = #p0)
AND ((CONVERT(NVarChar,(
SELECT [t1].[PaymentStatusId]
FROM [dbo].[Payment] AS [t1]
WHERE [t1].[OrderId] = [t0].[Id]
))) IN (#p1, #p2, #p3, #p4, #p5, #p6, #p7, #p8))
the #p0 parameter is a Guid for merchantId, and the #p1 thru #p8 are numeral strings "1" thru "8", representing the paymentStatusId's.
If I skip the line:
if (viewModel.SelectedPaymentStatus != null)
queryOrder = queryOrder.Where(m => viewModel.SelectedPaymentStatus.Contains(m.Payments.Select(p => p.PaymentStatusId).Single().ToString()));
The query runs in under 1 second. But when I use it, the performance hits the floor. Any tips on how to solve this?

All your queries are deferred and this is both the good/bad part of linq. Try to split the queries and use some in-memory results.
Try removing the first query (doesn't make much sense really, you're returning the same collection) and amend the second query to be like this and see if it makes any difference.
var clause = context.Orders.Payments.Select(p => p.PaymentStatusId).Single().ToString();
if (viewModel.SelectedPaymentStatus != null)
var queryOrder = context.Orders.queryOrder.Where(m => viewModel.SelectedPaymentStatus.Contains(clause));

I've annotated the problem snippet:
if (viewModel.SelectedPaymentStatus != null) {
// Give me only orders from queryOrder where...
queryOrder = queryOrder.Where(
// ...my viewModel's SelectedPaymentStatus collection...
m => viewModel.SelectedPaymentStatus.Contains(
// ...contains the order's payment's PaymentStatusId...
m.Payments.Select(p => p.PaymentStatusId).Single()
// ... represented as a string?!
.ToString()
// Why are database IDs strings?
)
);
}
viewModel.SelectedPaymentStatus appears to be a collection of strings; therefore, you're asking the database to convert PaymentStatusId to an nvarchar and do string comparisons with the elements of SelectedPaymentStatus. Yuck.
Since viewModel.SelectedPaymentStatus is small, it might be better to create a temporary List<int> and use that in your query:
if (viewModel.SelectedPaymentStatus != null) {
// Let's do the conversion once, in C#
List<int> statusIds = viewModel.SelectedPaymentStatus.Select( i => Convert.ToInt32(i) ).ToList();
// Now select the matching orders
queryOrder = queryOrder.Where(
m => statusIds.Contains(
m.Payments.Select(p => p.PaymentStatusId).Single())
)
);
}

Related

Improving LINQ query for many-to-many relation

I have a database with the following schema:
Now, I'm trying to pull all landingpages for a domain and sort those by the first UrlFilter's FilterType that matches a certain group. This is the LINQ I've come up with so far:
var baseQuery = DbSet.AsNoTracking()
.Where(e => EF.Functions.Contains(EF.Property<string>(e, "Url"), $"\"{searchTerm}*\""))
.Where(e => e.DomainLandingPages.Select(lp => lp.DomainId).Contains(domainId));
var count = baseQuery.Count();
var page = baseQuery
.Select(e => new
{
LandingPage = e,
UrlFilter = e.LandingPageUrlFilters.FirstOrDefault(f => f.UrlFilter.GroupId == groupId)
})
.Select(e => new
{
e.LandingPage,
FilterType = e.UrlFilter == null ? UrlFilterType.NotCovered : e.UrlFilter.UrlFilter.UrlFilterType
})
.OrderBy(e => e.FilterType)
.Skip(10).Take(75).ToList();
Now, while this technically works, it's quite slow with execution times ranging from 10-30 seconds, which is not good enough for the use case. The LINQ is translated to the following SQL:
SELECT [l1].[Id], [l1].[LastUpdated], [l1].[Url], CASE
WHEN (
SELECT TOP(1) [l].[LandingPageId]
FROM [LandingPageUrlFilters] AS [l]
INNER JOIN [UrlFilters] AS [u] ON [l].[UrlFilterId] = [u].[Id]
WHERE ([l1].[Id] = [l].[LandingPageId]) AND ([u].[GroupId] = #__groupId_3)) IS NULL THEN 4
ELSE (
SELECT TOP(1) [u0].[UrlFilterType]
FROM [LandingPageUrlFilters] AS [l0]
INNER JOIN [UrlFilters] AS [u0] ON [l0].[UrlFilterId] = [u0].[Id]
WHERE ([l1].[Id] = [l0].[LandingPageId]) AND ([u0].[GroupId] = #__groupId_3))
END AS [FilterType]
FROM [LandingPages] AS [l1]
WHERE CONTAINS([l1].[Url], #__Format_1) AND #__domainId_2 IN (
SELECT [d].[DomainId]
FROM [DomainLandingPages] AS [d]
WHERE [l1].[Id] = [d].[LandingPageId]
)
ORDER BY CASE
WHEN (
SELECT TOP(1) [l2].[LandingPageId]
FROM [LandingPageUrlFilters] AS [l2]
INNER JOIN [UrlFilters] AS [u1] ON [l2].[UrlFilterId] = [u1].[Id]
WHERE ([l1].[Id] = [l2].[LandingPageId]) AND ([u1].[GroupId] = #__groupId_3)) IS NULL THEN 4
ELSE (
SELECT TOP(1) [u2].[UrlFilterType]
FROM [LandingPageUrlFilters] AS [l3]
INNER JOIN [UrlFilters] AS [u2] ON [l3].[UrlFilterId] = [u2].[Id]
WHERE ([l1].[Id] = [l3].[LandingPageId]) AND ([u2].[GroupId] = #__groupId_3))
END
OFFSET #__p_4 ROWS FETCH NEXT #__p_5 ROWS ONLY
Now my question is, how can I improve the execution time of this? Either by SQL or LINQ
EDIT: So I've been tinkering with some raw SQL and this is what I've come up with:
with matched_urls as (
select l.id, min(f.urlfiltertype) as Filter
from landingpages l
join landingpageurlfilters lpf on lpf.landingpageid = l.id
join urlfilters f on lpf.urlfilterid = f.id
where f.groupid = #groupId
and contains(Url, '"barz*"')
group by l.id
) select l.id, 5 as Filter
from landingpages l
where #domainId in (
select domainid
from domainlandingpages dlp
where l.id = dlp.landingpageid
) and l.id not in (select id from matched_urls ) and contains(Url, '"barz*"')
union select * from matched_urls
order by Filter
offset 10 rows fetch next 30 rows only
This performs somewhat okay, cutting the execution time down to ~5 seconds. As this is to be used for a table search I would however like to get it down even further. Is there any way to improve this SQL?
You're right to have a look at the generated SQL. In general, I would advise to learn SQL, write a performing SQL query and work your way back (either use a stored procedure or raw SQL, or design your LINQ query with that same philosophy.
I suspect this will be better (not tested):
var page = (
from e in baseQuery
let urlFilter = e.LandingPageUrlFilters.OrderBy(f => f.UrlFilterType).FirstOrDefault(f => f.UrlFilter.GroupId == groupId)
let filterType = urlFilter == null ? UrlFilterType.NotCovered : e.UrlFilter.UrlFilter.UrlFilterType
select new
{
LandingPage = e,
FilterType = filterType
}
).Skip(10).Take(75).ToList();
one of the way to improve the execution time is see execution plan in SSMS (SQL Server Management Studio).
After look on the execution plan you can design some indexes, or if you have no experiences with this, you can see if SSMS recommends some indexes.
Next try to create the indexes and execute the query again and see if execution time was improved.
Note: this is only one of many possible ways to improve execution time...

Order by descending by row field in query

I want to write a EF query which does order by ascending or descending based on condition. Following is the my pseudo code:
var result= q.OrderByDescending(x => x.StatusId == 3)
if( x.StatusId == 3)
then order by x.ReserveDate
if( x.StatusId != 3 )
then order by descending x.LastUpdateDate
How can i do this?
Update
This is not same as q = condition ? q.OrderBy(..) : q.OrderByDescending(..) as marked in referenced duplicate question, sorting order differs based on value within the row instead of a flag outside query.
You can supply complex expressions in OrderBy...
// you might have to give bounding start,end for
// for this query to work correctly...
var end = DateTime.Now;
var start = end.AddYears(-100);
var result = q.OrderBy(
x => x.StatusId == 3 ?
// this will sort by in ascending order
DbFunctions.DiffDays(x.ReserveDate, start) :
// this will sort in descending order
DbFunctions.DiffDays(end, x.LastUpdateDate) );
SQL Generated will be
SELECT
...
...
FROM ( SELECT CASE
WHEN ([Extent2].[StatusId ] = 3)
THEN DATEDIFF (day, #p__linq__0, [Extent1].[ReserveDate])
ELSE
DATEDIFF (day, [Extent1].[LastUpdateDate], #p__linq__1)
END AS [C1]
FROM [dbo].[Table] AS [Extent1]
) AS [Project1]
ORDER BY [Project1].[C1]
As BoessB's comment has it, this is easiest with two concatenated queries:
var q1 = from x in source
where x.StatusId == 3
order by x.ReserveDate;
var q2 = from x in source
where x.StatusId != 3
order by x.LastUpdateDate descending;
var results = await q1.Concat(q2).ToListAsync();
It might be possible to do in a single expression if you can create a derived field (let clauses help) from ReserveDate and LastUpdateDate that sorts in the right way. But I would suggest splitting the query will be clearer.

Convert SQL to EF Linq

I have the following query:
SELECT COUNT(1)
FROM Warehouse.WorkItems wi
WHERE wi.TaskId = (SELECT TaskId
FROM Warehouse.WorkItems
WHERE WorkItemId = #WorkItemId)
AND wi.IsComplete = 0;
And since we are using EF, I'd like to be able to use the Linq functionality to generate this query. (I know that I can give it a string query like this, but I would like to use EF+Linq to generate the query for me, for refactoring reasons.)
I really don't need to know the results of the query. I just need to know if there are any results. (The use of an Any() would be perfect, but I can't get the write code for it.)
So... Basically, how do I write that SQL query as a LINQ query?
Edit: Table Structure
WorkItemId - int - Primary Key
TaskId - int - Foreign Key on Warehouse.Tasks
IsComplete - bool
JobId - int
UserName - string
ReportName - string
ReportCriteria - string
ReportId - int - Foreign Key on Warehouse.Reports
CreatedTime - DateTime
The direct translation could be something like this
var result = db.WorkItems.Any(wi =>
!wi.IsComplete && wi.TaskId == db.WorkItems
.Where(x => x.WorkItemId == workItemId)
.Select(x => x.TaskId)
.FirstOrDefault()));
Taking into account the fact that SQL =(subquery), IN (subquery) and EXISTS(subquery) in nowadays modern databases are handled identically, you can try this instead
var result = db.WorkItems.Any(wi =>
!wi.IsComplete && db.WorkItems.Any(x => x.WorkItemId == workItemId
&& x.TaskId == wi.TaskId));
Turns out that I just needed to approach the problem from a different angle.
I came up with about three solutions with varying Linq syntaxes:
Full method chain:
var q1 = Warehouse.WorkItems
.Where(workItem => workItem.TaskId == (from wis in Warehouse.WorkItems
where wis.WorkItemId == workItemId
select wis.TaskId).First())
.Any(workItem => !workItem.IsComplete);
Mixed query + method chain:
var q2 = Warehouse.WorkItems
.Where(workItem => workItem.TaskId == Warehouse.WorkItems
.Where(wis => wis.WorkItemId == workItemId)
.Select(wis => wis.TaskId)
.First())
.Any(workItem => !workItem.IsComplete);
Full query:
var q3 = (from wi in Warehouse.WorkItems
where wi.TaskId == (from swi in Warehouse.WorkItems
where swi.WorkItemId == workItemId
select swi.TaskId).First()
where !wi.IsComplete
select 1).Any();
The only problems with this is that it comes up with some really jacked up SQL:
SELECT
(CASE
WHEN EXISTS(
SELECT NULL AS [EMPTY]
FROM [Warehouse].[WorkItems] AS [t0]
WHERE (NOT ([t0].[IsComplete] = 1)) AND ([t0].[TaskId] = ((
SELECT TOP (1) [t1].[TaskId]
FROM [Warehouse].[WorkItems] AS [t1]
WHERE [t1].[WorkItemId] = #p0
)))
) THEN 1
ELSE 0
END) AS [value]
You can use the Any() function like so:
var result = Warehouse.WorkItems.Any(x => x.WorkItemId != null);
In short, you pass in your condition, which in this case is checking whether or not any of the items in your collection have an ID
The variable result will tell you whether or not all items in your collection have ID's.
Here's a helpful webpage to help you get started with LINQ: http://www.dotnetperls.com/linq
Subquery in the original SQL was a useless one, thus not a good sample for Any() usage. It is simply:
SELECT COUNT(*)
FROM Warehouse.WorkItems wi
WHERE WorkItemId = #WorkItemId
AND wi.IsComplete = 0;
It looks like, since the result would be 0 or 1 only, guessing the purpose and based on seeking how to write Any(), it may be written as:
SELECT CASE WHEN EXISTS ( SELECT *
FROM Warehouse.WorkItems wi
WHERE WorkItemId = #WorkItemId AND
wi.IsComplete = 0 ) THEN 1
ELSE 0
END;
Then it makes sense to use Any():
bool exists = db.WorkItems.Any( wi => wi.WorkItemId == workItemId & !wi.IsComplete );
EDIT: I misread the original query in a hurry, sorry. Here is an update on the Linq usage:
bool exists = db.WorkItems.Any( wi =>
db.WorkItems
.SingleOrDefault(wi.WorkItemId == workItemId).TaskId == wi.TaskId
&& !wi.IsComplete );
If the count was needed as in the original SQL:
var count = db.WorkItems.Count( wi =>
db.WorkItems
.SingleOrDefault(wi.WorkItemId == workItemId).TaskId == wi.TaskId
&& !wi.IsComplete );
Sorry again for the confusion.

Linq-to-SQL: arithmetic operation on consecutive elements

For example, I have a table:
Date |Value
----------|-----
2015/10/01|5
2015/09/01|8
2015/08/01|10
Is there any way using Linq-to-SQL to get a new sequence which will be an arithmetic operation between consecutive elements in the previously ordered set (for example, i.Value - (i-1).Value)? It must be executed on SQL Server 2008 side, not application side.
For example dataContext.GetTable<X>().OrderByDescending(d => d.Date).Something(.......).ToArray(); should return 3, 2.
Is it possible?
You can try this:
var q = (
from i in Items
orderby i.ItemDate descending
let prev = Items.Where(x => x.ItemDate < i.ItemDate).FirstOrDefault()
select new { Value = i.ItemValue - (prev == null ? 0 : prev.ItemValue) }
).ToArray();
EDIT:
If you slightly modify the above linq query to:
var q = (from i in Items
orderby i.ItemDate descending
let prev = Items.Where(x => x.ItemDate < i.ItemDate).FirstOrDefault()
select new { Value = (int?)i.ItemValue - prev.ItemValue }
).ToArray();
then you get the following TSQL query sent to the database:
SELECT ([t0].[ItemValue]) - ((SELECT [t2].[ItemValue]
FROM (SELECT TOP (1) [t1].[ItemValue]
FROM [Items] AS [t1]
WHERE [t1].[ItemDate] < [t0].[ItemDate]) AS [t2]
)) AS [Value]
FROM [Items] AS [t0]
ORDER BY [t0].[ItemDate] DESC
My guess now is if you place an index on ItemDate field this shouldn't perform too bad.
I wouldn't let SQL do this, it would create an inefficient SQL query (I think).
I could create a stored procedure, but if the amount of data is not too big I can also use Linq to objects:
List<x> items=dataContext.GetTable<X>().OrderByDescending(d => d.Date).ToList();//Bring data to memory
var res = items.Skip(1).Zip(items, (cur, prev) => cur.Value - prev.Value);
At the end, I might use a foreach for readability

using nHibernate QueryOver to join a subset

I am using nHibernate for our database access. I need to do a complicated query to find all member journal entries after a certain date with certain value, PreviousId, set for each member. I can easily write the SQL for it:
SELECT J.MemberId, J.PreviousId
FROM tblMemMemberStatusJournal J
INNER JOIN (
SELECT MemberId,
MIN(EffectiveDate) AS EffectiveDate
FROM tblMemMemberStatusJournal
WHERE EffectiveDate > #StartOfMonth
AND (PreviousId is NOT null)
GROUP BY MemberId
) AS X ON (X.EffectiveDate = J.EffectiveDate AND X.MemberId = J.MemberId)
However I am having a lot of trouble trying to get nHibernate to generate this information. There is not a lot of (any) documentation for how to use QueryOver.
I have been seeing information in other places, but none of it is very clear and very little has an actual explanation as to why things are done in certain ways. The answer for Selecting on Sub Queries in NHibernate with Critieria API did not give an adequate example as to what it is doing, so I haven't been able to replicate it.
I've gotten the inner part of the query created with this:
IList<object[]> result = session.QueryOver<MemberStatusJournal>()
.SelectList(list => list
.SelectGroup(a => a.Member.ID)
.SelectMin(a => a.EffectiveDate))
.Where(j => (j.EffectiveDate > firstOfMonth) && (j.PreviousId != null))
.List<object[]>();
Which, according to the profiler, makes this SQL:
SELECT this_.MemberId as y0_,
min(this_.EffectiveDate) as y1_
FROM tblMemMemberStatusJournal this_
WHERE (this_.EffectiveDate > '2014-08-01T00:00:00' /* #p0 */
and not (this_.PreviousLocalId is null))
GROUP BY this_.MemberId
But I am not finding a good example of how to actually do join this subset with a parent query. Does anyone have any suggestions?
You aren't actually joining on a subset, you're filtering on a subset. Knowing this, you have the option of filtering via other means, in this case, a correlated subquery.
The solution below first creates a detatched query to act as the inner subquery. We can correlate properties of the inner query with properties of the outer query through the use of an alias.
MemberStatusJournal memberStatusJournalAlias = null; // This will represent the
// object of the outer query
var subQuery = QueryOver.Of<MemberStatusJournal>()
.Select(Projections.GroupProperty(Projections.Property<MemberStatusJournal>(m => m.Member.ID)))
.Where(j => (j.EffectiveDate > firstOfMonth) && (j.PreviousId != null))
.Where(Restrictions.EqProperty(
Projections.Min<MemberStatusJournal>(j => j.EffectiveDate),
Projections.Property(() => memberStatusJournalAlias.EffectiveDate)
)
)
.Where(Restrictions.EqProperty(
Projections.GroupProperty(Projections.Property<MemberStatusJournal>(m => m.Member.Id)),
Projections.Property(() => memberStatusJournalAlias.Member.Id)
));
var results = session.QueryOver<MemberStatusJournal>(() => memberStatusJournalAlias)
.WithSubquery
.WhereExists(subQuery)
.List();
This would produce an SQL query like the following:
SELECT blah
FROM tblMemMemberStatusJournal J
WHERE EXISTS (
SELECT J2.MemberId
FROM tblMemberStatusJournal J2
WHERE J2.EffectiveDate > #StartOfMonth
AND (J2.PreviousId is NOT null)
GROUP BY J2.MemberId
HAVING MIN(J2.EffectiveDate) = J.EffectiveDate
AND J2.MemberId = J.MemberId
)
This looks less efficient than the inner join query you opened the question with. But my experience is that the SQL Query Optimizer is clever enough to convert this into an inner join. If you want to confirm this, you can use SQL Studio to generate and compare the execution plans of both queries.

Categories