So I have a Topic which has these related entities.
- List<Posts>
- List<Votes>
- List<Views>
I have the following query, with which I want to pull out and order popular Topics based on the counts of the 3 related entities over a specific date period.
var topics = _context.Topic
.OrderByDescending(x => x.Posts.Count(c => c.DateCreated >= from && c.DateCreated <= to))
.ThenByDescending(x => x.Votes.Count(c => c.DateCreated >= from && c.DateCreated <= to))
.ThenByDescending(x => x.Views.Count(c => c.DateCreated >= from && c.DateCreated <= to))
.Take(amountToShow)
.ToList();
I'm looking for the most efficient query for doing the above. Is what I am doing the best way to do this with Entity Framework, or am I missing something?
Any help appreciated.
If you put your above code into LINQPad, or check it with the profiler, you will see that it will probably generate something like the following SQL:
SELECT TOP (@amountToShow) [t0].[id] --and additional columns
FROM [Topic] AS [t0]
ORDER BY (
SELECT COUNT(*)
FROM [Posts] AS [t1]
WHERE ([t1].DateCreated >= @from AND [t1].DateCreated <= @to)
AND ([t1].[topicId] = [t0].[id])
) DESC, (
SELECT COUNT(*)
FROM [Votes] AS [t2]
WHERE ([t2].DateCreated >= @from AND [t2].DateCreated <= @to)
AND ([t2].[topicId] = [t0].[id])
) DESC, (
SELECT COUNT(*)
FROM [Views] AS [t3]
WHERE ([t3].DateCreated >= @from AND [t3].DateCreated <= @to)
AND ([t3].[topicId] = [t0].[id])
) DESC
GO
You could try rewriting the SQL a bit to GROUP the results of the subqueries and LEFT JOIN them to the original table, which does seem to be about 2x faster in the db itself:
SELECT TOP (@amountToShow) [t0].[id] --etc
FROM [Topic] AS [t0]
LEFT JOIN
(SELECT topicId, COUNT(*) AS num FROM Posts p
WHERE [p].DateCreated >= @from AND [p].DateCreated <= @to
GROUP BY topicId) [t1]
ON t0.id = t1.topicId
LEFT JOIN
(SELECT topicId, COUNT(*) AS num FROM Votes vo
WHERE [vo].DateCreated >= @from AND [vo].DateCreated <= @to
GROUP BY topicId) [t2]
ON t0.id = t2.topicId
LEFT JOIN
(SELECT topicId, COUNT(*) AS num FROM Views vi
WHERE [vi].DateCreated >= @from AND [vi].DateCreated <= @to
GROUP BY topicId) [t3]
ON t0.id = t3.topicId
ORDER BY t1.num DESC, t2.num DESC, t3.num DESC
But getting LINQ to generate code like this is iffy at best. LEFT JOINs are not exactly its strong suit, and the techniques that are out there for doing them will probably generate SQL that uses CROSS APPLY and/or OUTER APPLY instead, which will likely be as slow as or slower than your current code.
If you are that worried about speed, you might consider putting your fine-tuned SQL into a view so that you know that the query being used is the one you want.
Bear in mind, too, that you or someone else will have to come back to this code and maintain it later. Your current LINQ statement is very straightforward and easy to understand. A complicated query is going to be harder to maintain and will take more work to alter in the future.
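If readability is the main concern, one option is to compute the three counts once in a projection and then order by them, so the date filter is written in a single place. The following is only a minimal sketch run against in-memory objects: the Activity and Topic classes and the PopularTopics.Top helper are hypothetical stand-ins, not part of the original model. With Entity Framework the filter would have to be an expression rather than a Func, and whether this shape produces better SQL is something to check in the profiler.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the Post/Vote/View and Topic entities.
public sealed class Activity
{
    public DateTime DateCreated { get; set; }
}

public sealed class Topic
{
    public int Id { get; set; }
    public List<Activity> Posts { get; } = new List<Activity>();
    public List<Activity> Votes { get; } = new List<Activity>();
    public List<Activity> Views { get; } = new List<Activity>();
}

public static class PopularTopics
{
    // Ranks topics by post, vote and view counts inside [start, end].
    // LINQ-to-Objects only: a Func like this is not translated to SQL by EF.
    public static List<Topic> Top(IEnumerable<Topic> topics, DateTime start, DateTime end, int amountToShow)
    {
        Func<Activity, bool> inRange = a => a.DateCreated >= start && a.DateCreated <= end;

        return topics
            .Select(t => new
            {
                Topic = t,
                PostCount = t.Posts.Count(inRange),
                VoteCount = t.Votes.Count(inRange),
                ViewCount = t.Views.Count(inRange)
            })
            .OrderByDescending(x => x.PostCount)
            .ThenByDescending(x => x.VoteCount)
            .ThenByDescending(x => x.ViewCount)
            .Take(amountToShow)
            .Select(x => x.Topic)
            .ToList();
    }
}
```

Calling PopularTopics.Top(topics, start, end, amountToShow) returns the same ordering as the original three-level OrderByDescending chain when run over the same in-memory data.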
Related
See the query below. The object and property names have been obfuscated somewhat to not leak confidential/sensitive information, but the query structure is the same.
When the .OrderBy(p => "") is added, which makes no sense to me, the query runs much faster. The execution time goes from approx. 2000ms down to approx. 400ms. I have tested it a couple of times, adding and removing only the OrderBy statement.
I am completely puzzled, how can this be? The query is executed on a SQL database in an Azure environment.
I can understand that ordering data on property A, and then selecting records where property A equals some value, could potentially speed up the query. But ordering on an empty string!? What is going on here?
I also want to note that the query without the OrderBy, using Expressions (as suggested in this post to circumvent SQL parameter sniffing), also lowers the execution time to approx. 400ms. Adding the .OrderBy(p => "") then doesn't make any noticeable difference.
var query = (from p in Context.Punders.Where(p => p.A == A)
.Where(p => null != p.SomeNumber)
.Where(p => p.StatusCode == Default ||
p.StatusCode == Cancelled)
.Where(p => p.DatePosted >= startDate && p.DatePosted <= endDate)
join f in Context.Founders.Where(f => f.A == A) on p.Code equals f.Code
join r in Context.Rounders.Where(r => r.A == A) on p.Code equals r.Code
into rg
from r in rg.DefaultIfEmpty()
join pt in Context.FishTypes.Where(ft => ft.A == A) on p.Code equals pt.Code
where r == null
select new
{
p.Code,
f.B,
f.C,
p.D,
p.E,
pt.F,
pt.G,
p.H
})
.OrderBy(p => "");
Query without the .OrderBy(...
SELECT [Filter1].[q] AS [q],
[Filter1].[c1] AS [edoc],
[Filter1].[oc1] AS [wnrdc],
[Filter1].[otc1] AS [weener],
[Filter1].[ptc1] AS [pmtpdc],
[Extent4].[isr] AS [isr],
[Extent4].[rac] AS [rac],
[Filter1].[arn] AS [arn]
FROM (SELECT [Extent1].[pcid] AS [pcid1],
[Extent1].[edoc] AS [c1],
[Extent1].[pmtpdc] AS [ptc1],
[Extent1].[q] AS [q],
[Extent1].[arn] AS [arn],
[Extent1].[dateposted] AS [DatePosted],
[Extent2].[pcid] AS [pcid2],
[Extent2].[wnrdc] AS [oc1],
[Extent2].[weener] AS [otc1]
FROM [fnish].[post] AS [Extent1]
INNER JOIN [fnish].[olik] AS [Extent2]
ON [Extent1].[olikedoc] = [Extent2].[edoc]
LEFT OUTER JOIN [fnish].[receivable] AS [Extent3]
ON ( [Extent3].[pcid] = @p__linq__4 )
AND ( [Extent1].[edoc] =
[Extent3].[pepstedoc] )
WHERE ( [Extent1].[arn] IS NOT NULL )
AND ( [Extent1].[posttedoc] IN ( N'D', N'X' ) )
AND ( [Extent3].[id] IS NULL )) AS [Filter1]
INNER JOIN [fnish].[paymenttype] AS [Extent4]
ON [Filter1].[ptc1] = [Extent4].[edoc]
WHERE ( [Filter1].[pcid1] = @p__linq__0 )
AND ( [Filter1].[dateposted] >= @p__linq__1 )
AND ( [Filter1].[dateposted] <= @p__linq__2 )
AND ( [Filter1].[pcid2] = @p__linq__3 )
AND ( [Extent4].[pcid] = @p__linq__5 )
Query with the .OrderBy(...
SELECT [Project1].[q] AS [q],
[Project1].[edoc] AS [edoc],
[Project1].[wnrdc] AS [wnrdc],
[Project1].[weener] AS [weener],
[Project1].[pmtpdc] AS [pmtpdc],
[Project1].[isr] AS [isr],
[Project1].[rac] AS [rac],
[Project1].[arn] AS [arn]
FROM (SELECT N'' AS [C1],
[Filter1].[c1] AS [edoc],
[Filter1].[ptc1] AS [pmtpdc],
[Filter1].[q] AS [q],
[Filter1].[arn] AS [arn],
[Filter1].[oc1] AS [wnrdc],
[Filter1].[otc1] AS [weener],
[Extent4].[isr] AS [isr],
[Extent4].[rac] AS [rac]
FROM (SELECT [Extent1].[pcid] AS [pcid1],
[Extent1].[edoc] AS [c1],
[Extent1].[pmtpdc] AS [ptc1],
[Extent1].[q] AS [q],
[Extent1].[arn] AS [arn],
[Extent1].[dateposted] AS [DatePosted],
[Extent2].[pcid] AS [pcid2],
[Extent2].[wnrdc] AS [oc1],
[Extent2].[weener] AS [otc1]
FROM [fnish].[post] AS [Extent1]
INNER JOIN [fnish].[olik] AS [Extent2]
ON [Extent1].[olikedoc] = [Extent2].[edoc]
LEFT OUTER JOIN [fnish].[receivable] AS [Extent3]
ON ( [Extent3].[pcid] =
@p__linq__4 )
AND ( [Extent1].[edoc] =
[Extent3].[pepstedoc] )
WHERE ( [Extent1].[arn] IS NOT NULL )
AND ( [Extent1].[posttedoc] IN ( N'D', N'X' ) )
AND ( [Extent3].[id] IS NULL )) AS [Filter1]
INNER JOIN [fnish].[paymenttype] AS [Extent4]
ON [Filter1].[ptc1] = [Extent4].[edoc]
WHERE ( [Filter1].[pcid1] = @p__linq__0 )
AND ( [Filter1].[dateposted] >= @p__linq__1 )
AND ( [Filter1].[dateposted] <= @p__linq__2 )
AND ( [Filter1].[pcid2] = @p__linq__3 )
AND ( [Extent4].[pcid] = @p__linq__5 )) AS [Project1]
ORDER BY [Project1].[c1] ASC
Conclusion
From what I have learned, with a bit of guesswork: it is case-specific behavior. In my case, the performance gain is likely due to SQL Server constructing a different execution plan that performs better. Running the query without the OrderBy but with the SQL hint OPTION (RECOMPILE) also produced a different execution plan with a similar performance gain. So adding the OrderBy to the LINQ query very likely (I think) produces a different execution plan that yields a better-performing query.
Given your note
Also I want to note, that the query, without the OrderBy, using
Expressions ( as suggested in this post to circumvent SQL parameter
sniffing) lowers the execution time also to approx. 400ms. Adding the
.OrderBy(p => "") then doesn't make any noticeable difference.
The most reasonable explanation is that OrderBy has the same effect as using explicit values instead of parameters. If you had a cached plan for the given query, and that plan is not optimal for the particular parameter values (2 seconds), then changing the query text by adding a useless OrderBy forces SQL Server to create a new execution plan, which negates the effect of the old non-optimal plan. Of course, it should be clear that this is not a good way to work around plan caching.
I've done some extensive research and I've concluded that the DATEDIFF function is making my queries run very slow.
Below is the query generated by Entity Framework; hopefully it is readable enough.
Here's the Linq that generates the T-SQL:
model.NewTotal1Week = ( from sdo in context.SubscriberDebitOrders
where
(
sdo.CampaignId == campaignId &&
( sdo.Status == ( Int32 ) DebitOrderStatus.New_Faulty ) &&
( SqlFunctions.DateDiff( "week", sdo.Collections.FirstOrDefault( c => c.TxnStatus == "U" ).ProcessDate, DateTime.Now ) <= 1 )
)
select sdo ).Count();
In the query below, I would like to get a COUNT of all Collections which fall within 1 week from the time they were Processed to today's date.
Can anyone help me get rid of the DATEDIFF function? I've seen examples online but I couldn't adapt them to my scenario; forgive me, I'm no genius yet.
exec sp_executesql N'SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[SubscriberDebitOrder] AS [Extent1]
OUTER APPLY (SELECT TOP (1)
[Extent2].[ProcessDate] AS [ProcessDate]
FROM [dbo].[Collections] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[DebitOrderId]) AND (''U'' = [Extent2].[TxnStatus]) ) AS [Limit1]
WHERE ([Extent1].[CampaignId] = @p__linq__0) AND (3 = [Extent1].[Status]) AND ((DATEDIFF(week, [Limit1].[ProcessDate], SysDateTime())) <= 1)
) AS [GroupBy1]',N'@p__linq__0 int',@p__linq__0=3
go
Thanks in advance.
It's not just DATEDIFF: any function wrapped around the column will generally cause the query to SCAN the underlying table/index, because the predicate is no longer sargable.
DATEDIFF(week, [Limit1].[ProcessDate], SysDateTime()) <= 1
Is the above logic fetching the last week's data? You can write it without putting a function around the ProcessDate column:
[Limit1].[ProcessDate] > DATEADD(day, -7, SysDateTime())
This is your query:
SELECT GroupBy1.A1 AS C1
FROM (SELECT COUNT(1) AS [A1]
FROM dbo.SubscriberDebitOrder AS Extent1 OUTER APPLY
(SELECT TOP (1) Extent2.ProcessDate
FROM [dbo].Collections Extent2
WHERE (Extent1.Id = Extent2.DebitOrderId AND
'U' = Extent2.TxnStatus)
) AS [Limit1]
WHERE (Extent1.CampaignId = @p__linq__0) AND (3 = Extent1.Status) AND
(DATEDIFF(week, Limit1.ProcessDate, SysDateTime()) <= 1)
) GroupBy1;
As mentioned elsewhere, you should change the date logic and get rid of the outer query:
SELECT COUNT(1) AS A1
FROM dbo.SubscriberDebitOrder AS Extent1 OUTER APPLY
(SELECT TOP (1) Extent2.ProcessDate
FROM [dbo].Collections Extent2
WHERE (Extent1.Id = Extent2.DebitOrderId AND
'U' = Extent2.TxnStatus)
) AS Limit1
WHERE (Extent1.CampaignId = @p__linq__0) AND (3 = Extent1.Status) AND
Limit1.ProcessDate >= DATEADD(week, -1, GETDATE())
Very important note: This is not exactly equivalent to your query. Your original query counted the number of week boundaries between two dates. DATEDIFF always treats Sunday as the first day of the week (SET DATEFIRST has no effect on it), so that is the number of Saturday-to-Sunday midnights crossed.
Based on your description, the above is more correct.
Next, you want indexes on Collections(DebitOrderId, TxnStatus, ProcessDate) and SubscriberDebitOrder(CampaignId, Status).
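On the LINQ side, the same idea can be sketched by computing the cutoff date once in C# and comparing the column to it directly, so that no function wraps ProcessDate. This is a minimal sketch against an in-memory list, not the original model: CollectionRow, RecentCollections and CountProcessedSince are hypothetical names, and for simplicity it counts matching collection rows rather than debit orders. With Entity Framework a predicate of this shape would be expected to become a plain comparison against a parameter, but the generated SQL should be checked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of the Collections table.
public sealed class CollectionRow
{
    public int DebitOrderId { get; set; }
    public string TxnStatus { get; set; } = "";
    public DateTime ProcessDate { get; set; }
}

public static class RecentCollections
{
    // Counts "U" collections processed within the last `days` days before `now`.
    public static int CountProcessedSince(IEnumerable<CollectionRow> rows, DateTime now, int days)
    {
        // The cutoff is computed once, outside the predicate.
        DateTime cutoff = now.AddDays(-days);

        return rows
            .Where(c => c.TxnStatus == "U")
            // The bare column is compared to a value; no date function is applied to it.
            .Count(c => c.ProcessDate >= cutoff);
    }
}
```

Passing the current time in as a parameter, instead of calling DateTime.Now inside the method, keeps the helper deterministic and easy to test.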
I am using EF6 and I would like to get the records in a table which are in a group of IDs.
In my test for example I am using 4 IDs.
I tried two options; the first is with Any.
dbContext.MyTable
.Where(x => myIDS.Any(y=> y == x.MyID));
And the T-SQL that this LINQ expression generates is:
SELECT
*
FROM [dbo].[MiTabla] AS [Extent1]
WHERE EXISTS (SELECT
1 AS [C1]
FROM (SELECT
[UnionAll2].[C1] AS [C1]
FROM (SELECT
[UnionAll1].[C1] AS [C1]
FROM (SELECT
cast(130 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]
UNION ALL
SELECT
cast(139 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable2]) AS [UnionAll1]
UNION ALL
SELECT
cast(140 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable3]) AS [UnionAll2]
UNION ALL
SELECT
cast(141 as bigint) AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable4]) AS [UnionAll3]
WHERE [UnionAll3].[C1] = [Extent1].[MiID]
)
As can be seen, the T-SQL is a "where exists" that uses many subqueries and unions.
The second option is with Contains.
dbContext.MyTable
.Where(x => myIDS.Contains(x.MyID));
And the T-SQL:
SELECT
*
FROM [dbo].[MiTabla] AS [Extent1]
WHERE [Extent1].[MiID] IN (cast(130 as bigint), cast(139 as bigint), cast(140 as bigint), cast(141 as bigint))
The Contains is translated into "where in", and the query is much less complex.
I have read that Any tends to be faster, so I wonder whether Any, although it looks more complex at first glance, is actually faster or not.
Thanks so much.
EDIT: I have run some tests (I don't know if this is the best way to test this).
System.Diagnostics.Stopwatch miswContains = new System.Diagnostics.Stopwatch();
miswContains.Start();
for (int i = 0; i < 100; i++)
{
IQueryable<MyTable> iq = dbContext.MyTable
.Where(x => myIDS.Contains(x.MyID));
iq.ToArrayAsync();
}
miswContains.Stop();
System.Diagnostics.Stopwatch miswAny = new System.Diagnostics.Stopwatch();
miswAny.Start();
for (int i = 0; i < 20; i++)
{
IQueryable<MyTable> iq = dbContext.MyTable
.Where(x => myIDS.Any(y => y == x.MyID));
iq.ToArrayAsync();
}
miswAny.Stop();
The results are that miswAny takes about 850ms and miswContains about 4251ms.
So the second option, with Contains, is slower.
Your second option is the fastest solution I can think of (at least for not very large arrays of ids) provided your MiTabla.MiID is in an index.
If you want to read more about IN clause performance, see: Is SQL IN bad for performance?
If you know the ID, then using the LINQ2SQL Count() method would create much cleaner and faster SQL code (than both Any and Contains):
dbContext.MyTable
.Where(x => myIDS.Count(y=> y == x.MyID) > 0);
The generated SQL for the count should look something like this:
DECLARE @p0 Decimal(9,0) = 12345
SELECT COUNT(*) AS [value]
FROM [ids] AS [t0]
WHERE [t0].[id] = @p0
You can tell by the shape of the queries that Any is not scalable at all. It doesn't take many elements in myIDS (~50 probably) to get a SQL exception saying that the maximum nesting level has been exceeded.
Contains is much better in this respect. It can handle a couple of thousand elements before its performance gets severely affected.
So I would go for the scalable solution, even though Any may be faster with small numbers. It is possible to make Contains scale even better.
I have read that Any tends to be faster,
In LINQ-to-objects that's generally true, because the enumeration stops at the first hit. But with LINQ against a SQL backend, the generated SQL is what counts.
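One common way to keep Contains manageable for long id lists is to split the ids into batches and run one IN query per batch. The sketch below is an assumption about how that could look, not something from the answers above: BatchedContains, Batch and QueryInBatches are hypothetical helpers, and the queryBatch delegate stands in for a call such as ids => dbContext.MyTable.Where(x => ids.Contains(x.MyID)).ToList().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchedContains
{
    // Splits ids into consecutive batches of at most batchSize elements.
    public static IEnumerable<List<long>> Batch(IReadOnlyList<long> ids, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        for (int start = 0; start < ids.Count; start += batchSize)
        {
            yield return ids.Skip(start).Take(batchSize).ToList();
        }
    }

    // Runs queryBatch once per batch and concatenates the results.
    // queryBatch is a placeholder for the real database query.
    public static List<T> QueryInBatches<T>(
        IReadOnlyList<long> ids,
        int batchSize,
        Func<List<long>, IEnumerable<T>> queryBatch)
    {
        return Batch(ids, batchSize).SelectMany(queryBatch).ToList();
    }
}
```

The batch size is a tuning choice: smaller batches keep each IN list short at the cost of more round trips to the database.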
I'm having some trouble with SQL timeout for the following LINQ2SQL query:
DateTime date = DateTime.Parse("2013-08-01 00:00:00.000");
Clients.Where(e =>
(
!Orders.Any(f => f.ClientId.Equals(e.Id) && f.OrderDate >= date)
||
Comments.Any(f => f.KeyId.Equals(e.Id))
)
).Count().Dump();
When running this in LINQPad it takes forever to finish, and it ends in a SQL timeout when run on the server.
The SQL-code generated:
-- Region Parameters
DECLARE @p0 DateTime = '2013-08-01 00:00:00.000'
-- EndRegion
SELECT COUNT(*) AS [value]
FROM [Clients] AS [t0]
WHERE (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM [Orders] AS [t1]
WHERE ([t1].[ClientId] = [t0].[Id]) AND ([t1].[OrderDate] >= @p0)
))) OR (EXISTS(
SELECT NULL AS [EMPTY]
FROM [Comments] AS [t2]
WHERE [t2].[KeyId] = [t0].[Id]
))
Works fine in SQL-studio!
But:
SELECT COUNT(*) AS [value]
FROM [Clients] AS [t0]
WHERE
(NOT (EXISTS(SELECT NULL AS [EMPTY] FROM [Orders] AS [t1] WHERE ([t1].[ClientId] = [t0].[Id]) AND ([t1].[OrderDate] >= '2013-08-01 00:00:00.000'))))
OR
(EXISTS(SELECT NULL AS [EMPTY] FROM [Comments] AS [t2] WHERE [t2].[KeyId] = [t0].[Id]))
This gives me the same problem as actually running the query in LINQPad.
What is the difference between using DECLARE @p0 DateTime = '2013-08-01 00:00:00.000' and using the constant date, and how do I get my Linq2SQL to work?
EDIT:
See execution plans for both queries: [execution plan screenshots: the query that times out; the query that runs fine]
Some other things I've noticed is that if I remove the NOT it works fine:
SELECT COUNT(*) AS [value]
FROM [Clients] AS [t0]
WHERE
((EXISTS(SELECT NULL AS [EMPTY] FROM [Orders] AS [t1] WHERE ([t1].[ClientId] = [t0].[Id]) AND ([t1].[OrderDate] >= '2013-08-01 00:00:00.000'))))
OR
(EXISTS(SELECT NULL AS [EMPTY] FROM [Comments] AS [t2] WHERE [t2].[KeyId] = [t0].[Id]))
Or if I remove the OR EXISTS parts it also works fine:
SELECT COUNT(*) AS [value]
FROM [Clients] AS [t0]
WHERE
((EXISTS(SELECT NULL AS [EMPTY] FROM [Orders] AS [t1] WHERE ([t1].[ClientId] = [t0].[Id]) AND ([t1].[OrderDate] >= '2013-08-01 00:00:00.000'))))
Thanks
/Niels
Your Orders table must be fairly large. You have an index on OrderDate right?
SQL Server actually generates 2 different execution plans in this example. Or if it generates the same plan, SQL gives greatly different numbers of returned rows for the 2 statements.
DECLARE @p0 DateTime = '2013-08-01 00:00:00.000'
SELECT * FROM Orders WHERE OrderDate >= @p0
SELECT * FROM Orders WHERE OrderDate >= '2013-08-01 00:00:00.000'
The first statement is a parameterized query: the plan optimizer assumes the value of @p0 is unknown at compile time and chooses an execution plan that best fits unknown values.
For the second statement, the optimizer takes into account that you supplied a fixed value. SQL Server looks at the index distribution statistics and estimates how many rows will be filtered by >= '2013-08-01'.
The execution plan is not visible, but as a general SQL performance recommendation, avoid negations, as they often hurt performance. In your case, try to use <= instead of the negation with >=.
Using many ORs will hurt your performance as well. Try using subqueries as a workaround to avoid many ORs or negations.
The solution for me was to rebuild the index of OrderDate.
I'm newish to LinqToSQL and the project that I am working on cannot be changed to something else. I am translating some old SQL code to Linq. Not being that hot at linq, I used Linqer to do the translation for me. The query took about 90 seconds to run, so I thought it must be the linqToSQL. However, when I copied the query that the LinqToSQL produced and ran an ExecuteQuery on the datacontext it was super quick as I expected. I've copied the full queries, rather than trying to distil it down, but it looks like the issue is with something LinqToSQL is doing behind the scenes.
To summarise, if I copy the T-SQL created by linq and run
var results = DB.ExecuteQuery<InvoiceBalanceCheckDTO.InvoiceBalanceCheck>(@"T-SQL created by Linq - see below").ToList()
it completes with expected results in about 0.5 seconds.
It runs about the same time directly in SSMS. However, if I use the linqToSQL code that creates the T-SQL and do ToList() it takes ages. The result is only 9 records, although without the constraint to check the balance <> 0, there would be around 19,000 records. It's as if it's getting all 19,000 and then checking <> 0 after it's got the records.
I have also changed the Linq to project into the class used above, rather than into an anonymous type, but it makes no difference.
This is the original SQL :
SELECT InvoiceNum, Max(AccountCode), Sum(AmountInc) AS Balance
FROM
(SELECT InvoiceNum, AccountCode, AmountInc From TourBookAccount WHERE AccDetailTypeE IN(20,30) AND InvoiceNum >= 1000
UNION ALL
SELECT InvoiceNum, '<no matching invoice>' AS AccountCode, AccountInvoiceDetail.AmountInc
FROM AccountInvoiceDetail
INNER JOIN AccountInvoice ON AccountInvoiceDetail.InvoiceID=AccountInvoice.InvoiceID
WHERE AccDetailTypeE IN(20,30)
AND InvoiceNum >= 1000
) as t
GROUP BY InvoiceNum
HAVING (Sum(t.AmountInc)<>0)
ORDER BY InvoiceNum
and this is the linq
var test = (from t in
(
//this gets the TourBookAccount totals
from tba in DB.TourBookAccount
where
detailTypes.Contains(tba.AccDetailTypeE) &&
tba.InvoiceNum >= dto.CheckInvoiceNumFrom
select new
{
InvoiceNum = tba.InvoiceNum,
AccountCode = tba.AccountCode,
Balance = tba.AmountInc
}
)
.Concat //note that concat, since it's possible that the AccountInvoice record does not actually exist
(
//this gets the Invoice detail totals.
from aid in DB.AccountInvoiceDetail
where
detailTypes.Contains(aid.AccDetailTypeE) &&
aid.AccountInvoice.InvoiceNum >= dto.CheckInvoiceNumFrom
select new
{
InvoiceNum = aid.AccountInvoice.InvoiceNum,
AccountCode = "<No Account Records>",
Balance = aid.AmountInc
}
)
group t by t.InvoiceNum into g
where Convert.ToDecimal(g.Sum(p => p.Balance)) != 0m
select new
{
InvoiceNum = g.Key,
AccountCode = g.Max(p => p.AccountCode),
Balance = g.Sum(p => p.Balance)
}).ToList();
and this is the T-SQL that the linq produces
SELECT [t5].[InvoiceNum], [t5].[value2] AS [AccountCode], [t5].[value3] AS [Balance]
FROM (
SELECT SUM([t4].[AmountInc]) AS [value], MAX([t4].[AccountCode]) AS [value2], SUM([t4].[AmountInc]) AS [value3], [t4].[InvoiceNum]
FROM (
SELECT [t3].[InvoiceNum], [t3].[AccountCode], [t3].[AmountInc]
FROM (
SELECT [t0].[InvoiceNum], [t0].[AccountCode], [t0].[AmountInc]
FROM [dbo].[TourBookAccount] AS [t0]
WHERE ([t0].[AccDetailTypeE] IN (20, 30)) AND ([t0].[InvoiceNum] >= 1000)
UNION ALL
SELECT [t2].[InvoiceNum],'<No Account Records>' AS [value], [t1].[AmountInc]
FROM [dbo].[AccountInvoiceDetail] AS [t1]
INNER JOIN [dbo].[AccountInvoice] AS [t2] ON [t2].[InvoiceID] = [t1].[InvoiceID]
WHERE ([t1].[AccDetailTypeE] IN (20, 30)) AND ([t2].[InvoiceNum] >= 1000)
) AS [t3]
) AS [t4]
GROUP BY [t4].[InvoiceNum]
) AS [t5]
WHERE [t5].[value] <> 0
I would bet money, that the problem is in this line:
where Convert.ToDecimal(g.Sum(p => p.Balance)) != 0m
What is probably happening is that it can't translate this to SQL and silently pulls all rows from the db into memory, and then does the filtering on in-memory objects (LINQ to Objects).
Maybe try to change this to something like:
where g.Sum(p => p.Balance) != 0
Well, the answer turned out not to be LinqToSQL itself (although possibly the way it creates the query could be blamed), but the way SQL Server handles the query. When I was running the query on the database to check speed (and running the created T-SQL in DB.ExecuteQuery) I had all the variables hardcoded. When I changed it to use the exact SQL that Linq produces (i.e. with variables that are substituted) it ran just as slowly in SSMS.
Looking at the execution plans of the two, they are quite different. A quick search on SO brought me to this page: Why does a parameterized query produces vastly slower query plan vs non-parameterized query, which indicated that the problem was SQL Server's "parameter sniffing".
The culprit turned out to be the "No Account Records" string.
For completeness, here is the generated T-SQL that Linq creates.
Change @p10 to the actual hardcoded string, and it's back to full speed!
In the end I just removed the line from the Linq and set the account code afterwards and all was good.
Thanks @Botis, @Blorgbeard, @ElectricLlama & @Scott for suggestions.
DECLARE @p0 as Int = 20
DECLARE @p1 as Int = 30
DECLARE @p2 as Int = 1000
DECLARE @p3 as Int = 20
DECLARE @p4 as Int = 30
DECLARE @p5 as Int = 1000
DECLARE @p6 as Int = 40
DECLARE @p7 as Int = 10
DECLARE @p8 as Int = 0
DECLARE @p9 as Int = 1
DECLARE @p10 as NVarChar(4000)= '<No Account Records>' /*replace this parameter with the actual text in the SQL and it's way faster.*/
DECLARE @p11 as Decimal(33,4) = 0
SELECT [t5].[InvoiceNum], [t5].[value2] AS [AccountCode], [t5].[value3] AS [Balance]
FROM (
SELECT SUM([t4].[AmountInc]) AS [value], MAX([t4].[AccountCode]) AS [value2], SUM([t4].[AmountInc]) AS [value3], [t4].[InvoiceNum]
FROM (
SELECT [t3].[InvoiceNum], [t3].[AccountCode], [t3].[AmountInc]
FROM (
SELECT [t0].[InvoiceNum], [t0].[AccountCode], [t0].[AmountInc]
FROM [dbo].[TourBookAccount] AS [t0]
WHERE ([t0].[AccDetailTypeE] IN (@p0, @p1)) AND ([t0].[InvoiceNum] >= @p2)
UNION ALL
SELECT [t2].[InvoiceNum], @p10 AS [value], [t1].[AmountInc]
FROM [dbo].[AccountInvoiceDetail] AS [t1]
INNER JOIN [dbo].[AccountInvoice] AS [t2] ON [t2].[InvoiceID] = [t1].[InvoiceID]
WHERE ([t1].[AccDetailTypeE] IN (@p3, @p4)) AND ([t2].[InvoiceNum] >= @p5) AND ([t2].[InvoiceStatusE] <= @p6) AND ([t2].[InvoiceTypeE] = @p7) AND ([t1].[BookNum] <> @p8) AND ([t1].[AccDetailSourceE] = @p9)
) AS [t3]
) AS [t4]
GROUP BY [t4].[InvoiceNum]
) AS [t5]
WHERE [t5].[value] <> @p11
SELECT [t5].[InvoiceNum], [t5].[value2] AS [AccountCode], [t5].[value3] AS [Balance]
FROM (
SELECT SUM([t4].[AmountInc]) AS [value], MAX([t4].[AccountCode]) AS [value2], SUM([t4].[AmountInc]) AS [value3], [t4].[InvoiceNum]
FROM (
SELECT [t3].[InvoiceNum], [t3].[AccountCode], [t3].[AmountInc]
FROM (
SELECT [t0].[InvoiceNum], [t0].[AccountCode], [t0].[AmountInc]
FROM [dbo].[TourBookAccount] AS [t0]
WHERE ([t0].[AccDetailTypeE] IN (20, 30)) AND ([t0].[InvoiceNum] >= 1000)
UNION ALL
SELECT [t2].[InvoiceNum], '<No Account Records>' AS [value], [t1].[AmountInc]
FROM [dbo].[AccountInvoiceDetail] AS [t1]
INNER JOIN [dbo].[AccountInvoice] AS [t2] ON [t2].[InvoiceID] = [t1].[InvoiceID]
WHERE ([t1].[AccDetailTypeE] IN (20, 30)) AND ([t2].[InvoiceNum] >= 0) AND ([t2].[InvoiceStatusE] <= 40) AND ([t2].[InvoiceTypeE] = 10) AND ([t1].[BookNum] <> 0) AND ([t1].[AccDetailSourceE] = 1)
) AS [t3]
) AS [t4]
GROUP BY [t4].[InvoiceNum]
) AS [t5]
WHERE [t5].[value] <> 0