See the query below. The object and property names have been obfuscated somewhat to not leak confidential/sensitive information, but the query structure is the same.
When the .OrderBy(p => "") is added, which looks like complete nonsense to me, the query runs much faster. Execution time drops from approx. 2000ms to approx. 400ms. I have tested this several times, adding and removing only the OrderBy call.
I am completely puzzled, how can this be? The query is executed on a SQL database in an Azure environment.
I can understand that ordering data on property A and then selecting records where property A equals some value could potentially speed up the query. But ordering on an empty string!? What is going on here?
I also want to note that the query without the OrderBy, rewritten using Expressions (as suggested in this post to circumvent SQL parameter sniffing), also drops to approx. 400ms. Adding the .OrderBy(p => "") then makes no noticeable difference.
var query = (from p in Context.Punders.Where(p => p.A == A)
.Where(p => null != p.SomeNumber)
.Where(p => p.StatusCode == Default ||
p.StatusCode == Cancelled)
.Where(p => p.DatePosted >= startDate && p.DatePosted <= endDate)
join f in Context.Founders.Where(f => f.A == A) on p.Code equals f.Code
join r in Context.Rounders.Where(r => r.A == A) on p.Code equals r.Code
into rg
from r in rg.DefaultIfEmpty()
join pt in Context.FishTypes.Where(ft => ft.A ==A) on p.Code equals pt.Code
where r == null
select new
{
p.Code,
f.B,
f.C,
p.D,
p.E,
pt.F,
pt.G,
p.H
})
.OrderBy(p => "");
Query without the .OrderBy(...
SELECT [Filter1].[q] AS [q],
[Filter1].[c1] AS [edoc],
[Filter1].[oc1] AS [wnrdc],
[Filter1].[otc1] AS [weener],
[Filter1].[ptc1] AS [pmtpdc],
[Extent4].[isr] AS [isr],
[Extent4].[rac] AS [rac],
[Filter1].[arn] AS [arn]
FROM (SELECT [Extent1].[pcid] AS [pcid1],
[Extent1].[edoc] AS [c1],
[Extent1].[pmtpdc] AS [ptc1],
[Extent1].[q] AS [q],
[Extent1].[arn] AS [arn],
[Extent1].[dateposted] AS [DatePosted],
[Extent2].[pcid] AS [pcid2],
[Extent2].[wnrdc] AS [oc1],
[Extent2].[weener] AS [otc1]
FROM [fnish].[post] AS [Extent1]
INNER JOIN [fnish].[olik] AS [Extent2]
ON [Extent1].[olikedoc] = [Extent2].[edoc]
LEFT OUTER JOIN [fnish].[receivable] AS [Extent3]
ON ( [Extent3].[pcid] = @p__linq__4 )
AND ( [Extent1].[edoc] =
[Extent3].[pepstedoc] )
WHERE ( [Extent1].[arn] IS NOT NULL )
AND ( [Extent1].[posttedoc] IN ( N'D', N'X' ) )
AND ( [Extent3].[id] IS NULL )) AS [Filter1]
INNER JOIN [fnish].[paymenttype] AS [Extent4]
ON [Filter1].[ptc1] = [Extent4].[edoc]
WHERE ( [Filter1].[pcid1] = @p__linq__0 )
AND ( [Filter1].[dateposted] >= @p__linq__1 )
AND ( [Filter1].[dateposted] <= @p__linq__2 )
AND ( [Filter1].[pcid2] = @p__linq__3 )
AND ( [Extent4].[pcid] = @p__linq__5 )
Query with the .OrderBy(...
SELECT [Project1].[q] AS [q],
[Project1].[edoc] AS [edoc],
[Project1].[wnrdc] AS [wnrdc],
[Project1].[weener] AS [weener],
[Project1].[pmtpdc] AS [pmtpdc],
[Project1].[isr] AS [isr],
[Project1].[rac] AS [rac],
[Project1].[arn] AS [arn]
FROM (SELECT N'' AS [C1],
[Filter1].[c1] AS [edoc],
[Filter1].[ptc1] AS [pmtpdc],
[Filter1].[q] AS [q],
[Filter1].[arn] AS [arn],
[Filter1].[oc1] AS [wnrdc],
[Filter1].[otc1] AS [weener],
[Extent4].[isr] AS [isr],
[Extent4].[rac] AS [rac]
FROM (SELECT [Extent1].[pcid] AS [pcid1],
[Extent1].[edoc] AS [c1],
[Extent1].[pmtpdc] AS [ptc1],
[Extent1].[q] AS [q],
[Extent1].[arn] AS [arn],
[Extent1].[dateposted] AS [DatePosted],
[Extent2].[pcid] AS [pcid2],
[Extent2].[wnrdc] AS [oc1],
[Extent2].[weener] AS [otc1]
FROM [fnish].[post] AS [Extent1]
INNER JOIN [fnish].[olik] AS [Extent2]
ON [Extent1].[olikedoc] = [Extent2].[edoc]
LEFT OUTER JOIN [fnish].[receivable] AS [Extent3]
ON ( [Extent3].[pcid] =
@p__linq__4 )
AND ( [Extent1].[edoc] =
[Extent3].[pepstedoc] )
WHERE ( [Extent1].[arn] IS NOT NULL )
AND ( [Extent1].[posttedoc] IN ( N'D', N'X' ) )
AND ( [Extent3].[id] IS NULL )) AS [Filter1]
INNER JOIN [fnish].[paymenttype] AS [Extent4]
ON [Filter1].[ptc1] = [Extent4].[edoc]
WHERE ( [Filter1].[pcid1] = @p__linq__0 )
AND ( [Filter1].[dateposted] >= @p__linq__1 )
AND ( [Filter1].[dateposted] <= @p__linq__2 )
AND ( [Filter1].[pcid2] = @p__linq__3 )
AND ( [Extent4].[pcid] = @p__linq__5 )) AS [Project1]
ORDER BY [Project1].[c1] ASC
Conclusion
From what I have learned, with a bit of guesswork: this is case-specific behavior. In my case, the performance gain is most likely due to SQL Server building a different, better-performing execution plan. Running the query without the OrderBy but with the SQL hint OPTION (RECOMPILE) produced a different execution plan with a similar performance gain. So adding the OrderBy to the LINQ query very likely (I think) results in a different execution plan that performs better.
Given your note
Also I want to note, that the query, without the OrderBy, using
Expressions ( as suggested in this post to circumvent SQL parameter
sniffing) lowers the execution time also to approx. 400ms. Adding the
.OrderBy(p => "") then doesn't make any noticeable difference.
The most reasonable explanation is that the OrderBy has the same effect as using explicit values instead of parameters. If a plan is already cached for the query, and that plan is not optimal for the particular parameter values (2 seconds), then changing the query text by adding a useless OrderBy forces SQL Server to compile a new execution plan, which sidesteps the old non-optimal plan. Of course, it should be clear that this is not a good way to work around plan caching.
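To make the "explicit values instead of parameters" point concrete, here is a minimal sketch of the two predicate shapes involved. It uses only System.Linq.Expressions and no database. The Punder class and the PredicateDemo helper are hypothetical stand-ins, not the asker's code. The assumption, worth confirming against the SQL your EF version actually generates, is that a captured variable is sent to the server as a parameter while a constant node is inlined as a literal.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-in for the obfuscated entity in the question.
public class Punder
{
    public int A { get; set; }
}

public static class PredicateDemo
{
    // Ordinary lambda: 'a' is captured in a compiler-generated closure,
    // so the right-hand side of the comparison is a member access.
    public static Expression<Func<Punder, bool>> Captured(int a)
    {
        return p => p.A == a;
    }

    // Hand-built expression: the value is embedded as a constant node.
    public static Expression<Func<Punder, bool>> Inlined(int a)
    {
        ParameterExpression p = Expression.Parameter(typeof(Punder), "p");
        BinaryExpression body = Expression.Equal(
            Expression.Property(p, nameof(Punder.A)),
            Expression.Constant(a));
        return Expression.Lambda<Func<Punder, bool>>(body, p);
    }

    public static void Main()
    {
        var captured = (BinaryExpression)Captured(42).Body;
        var inlined = (BinaryExpression)Inlined(42).Body;

        Console.WriteLine(captured.Right.NodeType); // MemberAccess
        Console.WriteLine(inlined.Right.NodeType);  // Constant
    }
}
```

Both predicates filter the same rows. Under the assumption above, only the SQL text sent to the server differs, and that in turn determines which cached plan (if any) SQL Server can reuse.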
Using C# and Linq to SQL, I found that my query with multiple where clauses is orders of magnitude slower than the same query with a single where combined using &.
Here is the query
using (TeradiodeDataContext dc = new TeradiodeDataContext())
{
var filterPartNumberID = 71;
var diodeIDsInBlades = (from bd in dc.BladeDiodes
select bd.DiodeID.Value).Distinct();
var diodesWithTestData = (from t in dc.Tests
join tt in dc.TestTypes on t.TestTypeID equals tt.ID
where tt.DevicePartNumberID == filterPartNumberID
select t.DeviceID.Value).Distinct();
var result = (from d in dc.Diodes
where d.DevicePartNumberID == filterPartNumberID
where diodesWithTestData.Contains(d.ID)
where !diodeIDsInBlades.Contains(d.ID)
orderby d.Name
select d);
var list = result.ToList();
// ~15 seconds
}
However, when the condition in the final query is this
where d.DevicePartNumberID == filterPartNumberID
& diodesWithTestData.Contains(d.ID)
& !diodeIDsInBlades.Contains(d.ID)
// milliseconds
it is very fast.
Comparing the SQL in result before calling ToList(), here are the two queries (with the value 71 manually substituted for the @params)
-- MULTIPLE WHERE
SELECT [t0].[ID], [t0].[Name], [t0].[M2MID], [t0].[DevicePartNumberID], [t0].[Comments], [t0].[Hold]
FROM [dbo].[Diode] AS [t0]
WHERE (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t2].[value]
FROM (
SELECT [t1].[DiodeID] AS [value]
FROM [dbo].[BladeDiode] AS [t1]
) AS [t2]
) AS [t3]
WHERE [t3].[value] = [t0].[ID]
))) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t6].[value]
FROM (
SELECT [t4].[DeviceID] AS [value], [t5].[DevicePartNumberID]
FROM [dbo].[Test] AS [t4]
INNER JOIN [dbo].[TestType] AS [t5] ON [t4].[TestTypeID] = ([t5].[ID])
) AS [t6]
WHERE [t6].[DevicePartNumberID] = (71)
) AS [t7]
WHERE [t7].[value] = [t0].[ID]
)) AND ([t0].[DevicePartNumberID] = 71)
ORDER BY [t0].[Name]
and
-- SINGLE WHERE
SELECT [t0].[ID], [t0].[Name], [t0].[M2MID], [t0].[DevicePartNumberID], [t0].[Comments], [t0].[Hold]
FROM [dbo].[Diode] AS [t0]
WHERE ([t0].[DevicePartNumberID] = 71) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t3].[value]
FROM (
SELECT [t1].[DeviceID] AS [value], [t2].[DevicePartNumberID]
FROM [dbo].[Test] AS [t1]
INNER JOIN [dbo].[TestType] AS [t2] ON [t1].[TestTypeID] = ([t2].[ID])
) AS [t3]
WHERE [t3].[DevicePartNumberID] = (71)
) AS [t4]
WHERE [t4].[value] = [t0].[ID]
)) AND (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t6].[value]
FROM (
SELECT [t5].[DiodeID] AS [value]
FROM [dbo].[BladeDiode] AS [t5]
) AS [t6]
) AS [t7]
WHERE [t7].[value] = [t0].[ID]
)))
ORDER BY [t0].[Name]
The two SQL queries execute in < 1 second in SSMS and produce the same results.
So I'm wondering why the first is slower on the LINQ side. It's worrying to me because I know I've used multiple where clauses elsewhere, without being aware of such a severe performance impact.
This question even has answers using both multiple & and where. And this answer even suggests using multiple where clauses.
Can anyone explain why this happens in my case?
Because writing like this
if (someParam1 != 0)
{
myQuery = myQuery.Where(q => q.SomeField1 == someParam1);
}
if (someParam2 != 0)
{
myQuery = myQuery.Where(q => q.SomeField2 == someParam2);
}
is NOT (updated) the same as the following, assuming someParam1 and someParam2 are both != 0
myQuery = from t in Table
where t.SomeField1 == someParam1
&& t.SomeField2 == someParam2
select t;
but it is (the NOT was removed) the same as
myQuery = from t in Table
where t.SomeField1 == someParam1
where t.SomeField2 == someParam2
select t;
UPD
Yes, I made a mistake. The second query is the same; the first is not.
The first and second queries are not EXACTLY the same. Let me show you what I mean.
The 1st query, written as a lambda expression:
t.Where(r => r.SomeField1 == someParam1 && r.SomeField2 == someParam2)
The 2nd query:
t.Where(r => r.SomeField1 == someParam1).Where(r => r.SomeField2 == someParam2)
In the second case, the predicate on SomeField2 comes first in the generated SQL (this is important, see below).
In the first case we get this SQL:
SELECT <all field from Table>
FROM table t
WHERE t.SomeField1 = :someParam1
AND t.SomeField2 = :someParam2
In the second case the SQL is:
SELECT <all field from Table>
FROM table t
WHERE t.SomeField2 = :someParam2
AND t.SomeField1 = :someParam1
As we can see, the two SQL statements are the 'same'. The OP's SQL statements are the 'same' in that sense too: they differ only in the order of the predicates in the WHERE clause (as in my example). My guess is that the SQL optimizer generates two different execution plans, and that maybe(!!!) evaluating NOT EXISTS, then EXISTS, and only then the simple filter takes more time than filtering first and evaluating EXISTS and NOT EXISTS afterwards.
UPD2
This is a 'problem' of the LINQ provider (ORM). I'm using a different ORM (linq2db), and it generates EXACTLY the same SQL in both cases.
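The difference between the two forms is already visible in the expression tree handed to the provider, before any SQL is generated. The following is a minimal sketch over in-memory data. The Row class and the CountWhereCalls helper are hypothetical and exist only for illustration. It shows that a single && predicate produces one Where call while the chained form produces two nested Where calls. How a provider flattens those calls into a WHERE clause, and in which order, is up to the provider.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical row type standing in for "Table" in the answer above.
public class Row
{
    public int SomeField1 { get; set; }
    public int SomeField2 { get; set; }
}

public static class WhereShapeDemo
{
    // Walks down the source chain and counts consecutive Where calls.
    public static int CountWhereCalls(Expression e)
    {
        int count = 0;
        while (e is MethodCallExpression call && call.Method.Name == "Where")
        {
            count++;
            e = call.Arguments[0]; // the source sequence of this Where
        }
        return count;
    }

    public static void Main()
    {
        IQueryable<Row> t = new[]
        {
            new Row { SomeField1 = 1, SomeField2 = 2 },
            new Row { SomeField1 = 1, SomeField2 = 3 },
            new Row { SomeField1 = 4, SomeField2 = 2 },
        }.AsQueryable();

        int someParam1 = 1, someParam2 = 2;

        var single = t.Where(r => r.SomeField1 == someParam1 && r.SomeField2 == someParam2);
        var chained = t.Where(r => r.SomeField1 == someParam1)
                       .Where(r => r.SomeField2 == someParam2);

        Console.WriteLine(CountWhereCalls(single.Expression));  // 1
        Console.WriteLine(CountWhereCalls(chained.Expression)); // 2

        // Same rows either way; only the tree shape differs.
        Console.WriteLine(single.ToList().SequenceEqual(chained.ToList())); // True
    }
}
```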
I have the following EF6 fetch
dgOrders.DataSource = Context.Orders
.Where(o => o.ProposedOrder == ProposedOrders
&& o.Inactive == false
&& o.OnHold == false
&& o.Archive == false
&& (!o.ManufactureSiteFlag.HasValue || (o.ManufactureSiteFlag & currentSite) > 0)
&& (FilterOnDispatch == ""
|| (FilterOnDispatch.Equals("YES") && o.Deliveries.Count(d => d.Dispatched == true) > 0)
|| (FilterOnDispatch.Equals("NO") && o.Deliveries.Count(d => d.Dispatched == false) > 0)));
When it executes it produces the following sequence of SQL on the server
(@p__linq__0 bit,@p__linq__1 int,@p__linq__2 nvarchar(4000),@p__linq__3 nvarchar(4000),@p__linq__4 nvarchar(4000))
SELECT
[Project3].[OrderID] AS [OrderID],
[Project3].[OrderNum] AS [OrderNum],
....
[Project3].[OrderDeliveryStatusID] AS [OrderDeliveryStatusID]
FROM ( SELECT
[Project2].[OrderID] AS [OrderID],
[Project2].[OrderNum] AS [OrderNum],
....
[Project2].[OrderDeliveryStatusID] AS [OrderDeliveryStatusID]
FROM ( SELECT
[Project1].[OrderID] AS [OrderID],
[Project1].[OrderNum] AS [OrderNum],
....
[Project1].[OrderDeliveryStatusID] AS [OrderDeliveryStatusID],
[Project1].[C1] AS [C1],
(SELECT
COUNT(1) AS [A1]
FROM [dbo].[Deliveries] AS [Extent3]
WHERE ([Project1].[OrderID] = [Extent3].[OrderID]) AND (0 = [Extent3].[Dispatched])) AS [C2]
FROM ( SELECT
[Extent1].[OrderID] AS [OrderID],
[Extent1].[OrderNum] AS [OrderNum],
....
[Extent1].[OrderDeliveryStatusID] AS [OrderDeliveryStatusID],
(SELECT
COUNT(1) AS [A1]
FROM [dbo].[Deliveries] AS [Extent2]
WHERE ([Extent1].[OrderID] = [Extent2].[OrderID]) AND (1 = [Extent2].[Dispatched])) AS [C1]
FROM [dbo].[Orders] AS [Extent1]
) AS [Project1]
) AS [Project2]
WHERE ([Project2].[ProposedOrder] = @p__linq__0) AND (0 = [Project2].[Inactive]) AND (0 = [Project2].[OnHold]) AND (0 = [Project2].[Archive]) AND (([Project2].[ManufactureSiteFlag] IS NULL) OR ((( CAST( [Project2].[ManufactureSiteFlag] AS int)) & (@p__linq__1)) > 0)) AND ((N'' = @p__linq__2) OR ((N'YES' = @p__linq__3) AND ([Project2].[C1] > 0)) OR ((N'NO' = @p__linq__4) AND ([Project2].[C2] > 0)))
) AS [Project3]
and then produces
SELECT
[Extent1].[OrderID] AS [OrderID],
[Extent1].[OrderNum] AS [OrderNum],
...
[Extent1].[OrderDeliveryStatusID] AS [OrderDeliveryStatusID]
FROM [dbo].[Orders] AS [Extent1];
SELECT
[Extent1].[OrderID] AS [OrderID],
[Extent1].[OrderNum] AS [OrderNum],
....
[Extent1].[OrderDeliveryStatusID] AS [OrderDeliveryStatusID]
FROM [dbo].[Orders] AS [Extent1]
WHERE [Extent1].[OrderID] IN (91,181,421,690,844,1544,2460,2682,2687,2736,2760,2806,2816,2817,2818,3134,3141,3154,3473,3726,4404,4583,4590,4641,4673,4677,4695,4737,4741,4789,4837,4885,4886,4887,4889,4993,5013,5018,5043,5046,5074,5090,5106,5134,5141,5231,5260,5261,5264,5265,5276,5369,5371,5421,5458,5513,5583,5688,5837,5863,5894,5895,5908,6002,6055,6084,6113,6128,6240,6432,6589,6590,6651,6676,6708,6733,6757,6772,6785,6831,6931,6934,6935,6936,7003,7004,7043,7068,7128,7135,7170,7172,7195,7223,7243,7325,7350,7360,7377,7452,7504,7508,7568,7613,7614,7641,7676,7714,7740,7764,7842,8008,8023,8174,8244,8250,8269,8312,8340,8346,8392,8437,8470,8488,8652,8664,8703,8710,8722,8750,8831,8920,9016,9181,9243,9262,9413,9421,9429,9621,9680,9707,9709,9710,9772,9787,9797,9832,9911,9918,9959,9961,9972,10042,10052,10056,10083,10120,10189,10221,10222,10253,10254,10293,10348,10413,10415,10430,10442,10452,10468,10491,10505,10529,10555,10573,10630,10662,10787,10791,10804,10838,10887,10933,10934,10955,10968,11010,11020,11059,11072,11078,11149,11151,11188,11281,11299,11421,11496,11502,11572,11647,11655,11758,11817,11948,12049,12082,12137,12201,12275,12406,12451,12466,12472,12516,12547,12581,12608,12650,12666,12720,12730,12732,12771,12775,12792,12807,12810,12843,12965,13074,13075,13085,13087,13102,13153,13198,13316,13326,13516,13763,13795,13800,13802,13867,13871,13878,13887,13891);
The second SQL statement is repeated multiple times with different primary key values.
Why is the first select statement not sufficient to satisfy the request? The subsequent statements appear to return a narrowed view of the same dataset, with no additional benefit.
Does it have something to do with the foreign key link to deliveries?
What can be done to improve performance here?
UPDATE: Apart from adopting some of the comments and answers below to improve performance of the original query, the additional fetches were traced back to the Context.Refresh option, which for some reason is the instigator of this behaviour.
If you are using parameters to turn parts of the query on or off, you should compose the query conditionally instead of putting those switches inside the query itself. This should simplify the generated query.
var query = Context.Orders
.Where(o => o.ProposedOrder == ProposedOrders
&& o.Inactive == false
&& o.OnHold == false
&& o.Archive == false
&& (!o.ManufactureSiteFlag.HasValue || (o.ManufactureSiteFlag & currentSite) > 0));
if (FilterOnDispatch.Equals("YES"))
query = query.Where(o=>o.Deliveries.Count(d => d.Dispatched == true) > 0);
else if (FilterOnDispatch.Equals("NO"))
query = query.Where(o=>o.Deliveries.Count(d => d.Dispatched == false) > 0);
dgOrders.DataSource = query;
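As a quick sanity check of that rewrite, here is a self-contained sketch that runs both forms against in-memory data and compares the results. The Order and Delivery classes are cut-down, hypothetical versions of the real entities, and only the Inactive flag is kept from the original filter. It uses LINQ to Objects via AsQueryable, so it says nothing about the SQL that EF would generate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, reduced versions of the entities in the question.
public class Delivery
{
    public bool Dispatched { get; set; }
}

public class Order
{
    public int OrderID { get; set; }
    public bool Inactive { get; set; }
    public List<Delivery> Deliveries { get; set; } = new List<Delivery>();
}

public static class ComposeDemo
{
    // Original style: the switch lives inside the predicate.
    public static IQueryable<Order> AllInOne(IQueryable<Order> orders, string filterOnDispatch)
    {
        return orders.Where(o => o.Inactive == false
            && (filterOnDispatch == ""
                || (filterOnDispatch == "YES" && o.Deliveries.Count(d => d.Dispatched) > 0)
                || (filterOnDispatch == "NO" && o.Deliveries.Count(d => !d.Dispatched) > 0)));
    }

    // Composed style: the switch is evaluated in C# before the query runs.
    public static IQueryable<Order> Composed(IQueryable<Order> orders, string filterOnDispatch)
    {
        var query = orders.Where(o => o.Inactive == false);
        if (filterOnDispatch == "YES")
            query = query.Where(o => o.Deliveries.Count(d => d.Dispatched) > 0);
        else if (filterOnDispatch == "NO")
            query = query.Where(o => o.Deliveries.Count(d => !d.Dispatched) > 0);
        return query;
    }

    public static List<Order> SampleOrders()
    {
        return new List<Order>
        {
            new Order { OrderID = 1, Deliveries = { new Delivery { Dispatched = true } } },
            new Order { OrderID = 2, Deliveries = { new Delivery { Dispatched = false } } },
            new Order { OrderID = 3, Inactive = true, Deliveries = { new Delivery { Dispatched = true } } },
            new Order { OrderID = 4 },
        };
    }

    public static void Main()
    {
        var orders = SampleOrders().AsQueryable();
        foreach (var filter in new[] { "", "YES", "NO" })
        {
            var a = AllInOne(orders, filter).Select(o => o.OrderID).ToList();
            var b = Composed(orders, filter).Select(o => o.OrderID).ToList();
            Console.WriteLine($"'{filter}': same result = {a.SequenceEqual(b)}");
        }
    }
}
```

One difference to be aware of: the two forms agree only when the filter is "", "YES" or "NO". For any other value the all-in-one predicate matches nothing, while the composed form applies no dispatch filter at all.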
Also, are any related entities included in the query? The subsequent queries might be caused by EF having to pull in related entities alongside the primary one.