LinqToSQL - ToList() appears to be very slow

LinqToSQL - ToList() appears to be very slow - c#

I'm newish to LinqToSQL and the project that I am working on cannot be changed to something else. I am translating some old SQL code to Linq. Not being that hot at linq, I used Linqer to do the translation for me. The query took about 90 seconds to run, so I thought it must be the linqToSQL. However, when I copied the query that the LinqToSQL produced and ran an ExecuteQuery on the datacontext it was super quick as I expected. I've copied the full queries, rather than trying to distil it down, but it looks like the issue is with something LinqToSQL is doing behind the scenes.
To summarise, if I copy the T-SQL created by linq and run
var results = DB.ExecuteQuery<InvoiceBalanceCheckDTO.InvoiceBalanceCheck>(#"T-SQL created by Linq - see below").ToList()
it completes with expected results in about 0.5 seconds.
It runs about the same time directly in SSMS. However, if I use the linqToSQL code that creates the T-SQL and do ToList() it takes ages. The result is only 9 records, although without the constraint to check the balance <> 0, there would be around 19,000 records. It's as if it's getting all 19,000 and then checking <> 0 after it's got the records.
I have also changed the Linq to project into the class used above, rather than to an anonymous type, but it makes not difference
This is the original SQL :
SELECT InvoiceNum, Max(AccountCode), Sum(AmountInc) AS Balance
FROM
(SELECT InvoiceNum, AccountCode, AmountInc From TourBookAccount WHERE AccDetailTypeE IN(20,30) AND InvoiceNum >= 1000
UNION ALL
SELECT InvoiceNum, '<no matching invoice>' AS AccountCode, AccountInvoiceDetail.AmountInc
FROM AccountInvoiceDetail
INNER JOIN AccountInvoice ON AccountInvoiceDetail.InvoiceID=AccountInvoice.InvoiceID
WHERE AccDetailTypeE IN(20,30)
AND InvoiceNum >= 1000
) as t
GROUP BY InvoiceNum
HAVING (Sum(t.AmountInc)<>0)
ORDER BY InvoiceNum
and this is the linq
var test = (from t in
(
//this gets the TourBookAccount totals
from tba in DB.TourBookAccount
where
detailTypes.Contains(tba.AccDetailTypeE) &&
tba.InvoiceNum >= dto.CheckInvoiceNumFrom
select new
{
InvoiceNum = tba.InvoiceNum,
AccountCode = tba.AccountCode,
Balance = tba.AmountInc
}
)
.Concat //note that concat, since it's possible that the AccountInvoice record does not actually exist
(
//this gets the Invoice detail totals.
from aid in DB.AccountInvoiceDetail
where
detailTypes.Contains(aid.AccDetailTypeE) &&
aid.AccountInvoice.InvoiceNum >= dto.CheckInvoiceNumFrom &&
select new
{
InvoiceNum = aid.AccountInvoice.InvoiceNum,
AccountCode = "<No Account Records>",
Balance = aid.AmountInc
}
)
group t by t.InvoiceNum into g
where Convert.ToDecimal(g.Sum(p => p.Balance)) != 0m
select new
{
InvoiceNum = g.Key,
AccountCode = g.Max(p => p.AccountCode),
Balance = g.Sum(p => p.Balance)
}).ToList();
and this is the T-SQL that the linq produces
SELECT [t5].[InvoiceNum], [t5].[value2] AS [AccountCode], [t5].[value3] AS [Balance]
FROM (
SELECT SUM([t4].[AmountInc]) AS [value], MAX([t4].[AccountCode]) AS [value2], SUM([t4].[AmountInc]) AS [value3], [t4].[InvoiceNum]
FROM (
SELECT [t3].[InvoiceNum], [t3].[AccountCode], [t3].[AmountInc]
FROM (
SELECT [t0].[InvoiceNum], [t0].[AccountCode], [t0].[AmountInc]
FROM [dbo].[TourBookAccount] AS [t0]
WHERE ([t0].[AccDetailTypeE] IN (20, 30)) AND ([t0].[InvoiceNum] >= 1000)
UNION ALL
SELECT [t2].[InvoiceNum],'<No Account Records>' AS [value], [t1].[AmountInc]
FROM [dbo].[AccountInvoiceDetail] AS [t1]
INNER JOIN [dbo].[AccountInvoice] AS [t2] ON [t2].[InvoiceID] = [t1].[InvoiceID]
WHERE ([t1].[AccDetailTypeE] IN (20, 30)) AND ([t2].[InvoiceNum] >= 1000)
) AS [t3]
) AS [t4]
GROUP BY [t4].[InvoiceNum]
) AS [t5]
WHERE [t5].[value] <> 0

I would bet money, that the problem is in this line:
where Convert.ToDecimal(g.Sum(p => p.Balance)) != 0m
What is probably happening, is that it can't translate this to SQL and silently tries to get all rows from db to memory, and then do filtering on in memory objects (LINQ to objects)
Maybe try to change this to something like:
where g.Sum(p=>.Balance!=0)

Well, the answer turned out not to be LinqToSQL itself (although possibly the way it creates the query could be blamed) , but the way SQL server handles the query. When I was running the query on the database to check speed (and running the created T=SQL in DB.ExecuteQuery) I had all the variables hardcoded. When I changed it to use the exact sql that Linq produces (i.e. with variables that are substituted) it ran just as slow in SSMS.
Looking at the execution plans of the two, they are quite different. A quick search on SO brought me to this page : Why does a parameterized query produces vastly slower query plan vs non-parameterized query which indicated that the problem was SQL server's "Parameter sniffing".
The culprit turned out to be the "No Account Records" string
For completeness, here is the generated T-SQL that Linq creates.
Change #p10 to the actual hardcoded string, and it's back to full speed !
In the end I just removed the line from the linq and set the account code afterwards and all was good.
Thanks #Botis,#Blorgbeard,#ElectricLlama & #Scott for suggestions.
DECLARE #p0 as Int = 20
DECLARE #p1 as Int = 30
DECLARE #p2 as Int = 1000
DECLARE #p3 as Int = 20
DECLARE #p4 as Int = 30
DECLARE #p5 as Int = 1000
DECLARE #p6 as Int = 40
DECLARE #p7 as Int = 10
DECLARE #p8 as Int = 0
DECLARE #p9 as Int = 1
DECLARE #p10 as NVarChar(4000)= '<No Account Records>' /*replace this parameter with the actual text in the SQl and it's way faster.*/
DECLARE #p11 as Decimal(33,4) = 0
SELECT [t5].[InvoiceNum], [t5].[value2] AS [AccountCode], [t5].[value3] AS [Balance]
FROM (
SELECT SUM([t4].[AmountInc]) AS [value], MAX([t4].[AccountCode]) AS [value2], SUM([t4].[AmountInc]) AS [value3], [t4].[InvoiceNum]
FROM (
SELECT [t3].[InvoiceNum], [t3].[AccountCode], [t3].[AmountInc]
FROM (
SELECT [t0].[InvoiceNum], [t0].[AccountCode], [t0].[AmountInc]
FROM [dbo].[TourBookAccount] AS [t0]
WHERE ([t0].[AccDetailTypeE] IN (#p0, #p1)) AND ([t0].[InvoiceNum] >= #p2)
UNION ALL
SELECT [t2].[InvoiceNum], #p10 AS [value], [t1].[AmountInc]
FROM [dbo].[AccountInvoiceDetail] AS [t1]
INNER JOIN [dbo].[AccountInvoice] AS [t2] ON [t2].[InvoiceID] = [t1].[InvoiceID]
WHERE ([t1].[AccDetailTypeE] IN (#p3, #p4)) AND ([t2].[InvoiceNum] >= #p5) AND ([t2].[InvoiceStatusE] <= #p6) AND ([t2].[InvoiceTypeE] = #p7) AND ([t1].[BookNum] <> #p8) AND ([t1].[AccDetailSourceE] = #p9)
) AS [t3]
) AS [t4]
GROUP BY [t4].[InvoiceNum]
) AS [t5]
WHERE [t5].[value] <> #p11
SELECT [t5].[InvoiceNum], [t5].[value2] AS [AccountCode], [t5].[value3] AS [Balance]
FROM (
SELECT SUM([t4].[AmountInc]) AS [value], MAX([t4].[AccountCode]) AS [value2], SUM([t4].[AmountInc]) AS [value3], [t4].[InvoiceNum]
FROM (
SELECT [t3].[InvoiceNum], [t3].[AccountCode], [t3].[AmountInc]
FROM (
SELECT [t0].[InvoiceNum], [t0].[AccountCode], [t0].[AmountInc]
FROM [dbo].[TourBookAccount] AS [t0]
WHERE ([t0].[AccDetailTypeE] IN (20, 30)) AND ([t0].[InvoiceNum] >= 1000)
UNION ALL
SELECT [t2].[InvoiceNum], '<No Account Records>' AS [value], [t1].[AmountInc]
FROM [dbo].[AccountInvoiceDetail] AS [t1]
INNER JOIN [dbo].[AccountInvoice] AS [t2] ON [t2].[InvoiceID] = [t1].[InvoiceID]
WHERE ([t1].[AccDetailTypeE] IN (20, 30)) AND ([t2].[InvoiceNum] >= 0) AND ([t2].[InvoiceStatusE] <= 40) AND ([t2].[InvoiceTypeE] = 10) AND ([t1].[BookNum] <> 0) AND ([t1].[AccDetailSourceE] = 1)
) AS [t3]
) AS [t4]
GROUP BY [t4].[InvoiceNum]
) AS [t5]
WHERE [t5].[value] <> 0

Related

Why are multiple where in LINQ so slow?

Using C# and Linq to SQL, I found that my query with multiple where is orders of magnitude slower than with a single where / and.
Here is the query
using (TeradiodeDataContext dc = new TeradiodeDataContext())
{
var filterPartNumberID = 71;
var diodeIDsInBlades = (from bd in dc.BladeDiodes
select bd.DiodeID.Value).Distinct();
var diodesWithTestData = (from t in dc.Tests
join tt in dc.TestTypes on t.TestTypeID equals tt.ID
where tt.DevicePartNumberID == filterPartNumberID
select t.DeviceID.Value).Distinct();
var result = (from d in dc.Diodes
where d.DevicePartNumberID == filterPartNumberID
where diodesWithTestData.Contains(d.ID)
where !diodeIDsInBlades.Contains(d.ID)
orderby d.Name
select d);
var list = result.ToList();
// ~15 seconds
}
However, when the condition in the final query is this
where d.DevicePartNumberID == filterPartNumberID
& diodesWithTestData.Contains(d.ID)
& !diodeIDsInBlades.Contains(d.ID)
// milliseconds
it is very fast.
Comparing the SQL in result before calling ToList(), here are the queries (value 71 manually added in place of #params)
-- MULTIPLE WHERE
SELECT [t0].[ID], [t0].[Name], [t0].[M2MID], [t0].[DevicePartNumberID], [t0].[Comments], [t0].[Hold]
FROM [dbo].[Diode] AS [t0]
WHERE (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t2].[value]
FROM (
SELECT [t1].[DiodeID] AS [value]
FROM [dbo].[BladeDiode] AS [t1]
) AS [t2]
) AS [t3]
WHERE [t3].[value] = [t0].[ID]
))) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t6].[value]
FROM (
SELECT [t4].[DeviceID] AS [value], [t5].[DevicePartNumberID]
FROM [dbo].[Test] AS [t4]
INNER JOIN [dbo].[TestType] AS [t5] ON [t4].[TestTypeID] = ([t5].[ID])
) AS [t6]
WHERE [t6].[DevicePartNumberID] = (71)
) AS [t7]
WHERE [t7].[value] = [t0].[ID]
)) AND ([t0].[DevicePartNumberID] = 71)
ORDER BY [t0].[Name]
and
-- SINGLE WHERE
SELECT [t0].[ID], [t0].[Name], [t0].[M2MID], [t0].[DevicePartNumberID], [t0].[Comments], [t0].[Hold]
FROM [dbo].[Diode] AS [t0]
WHERE ([t0].[DevicePartNumberID] = 71) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t3].[value]
FROM (
SELECT [t1].[DeviceID] AS [value], [t2].[DevicePartNumberID]
FROM [dbo].[Test] AS [t1]
INNER JOIN [dbo].[TestType] AS [t2] ON [t1].[TestTypeID] = ([t2].[ID])
) AS [t3]
WHERE [t3].[DevicePartNumberID] = (71)
) AS [t4]
WHERE [t4].[value] = [t0].[ID]
)) AND (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t6].[value]
FROM (
SELECT [t5].[DiodeID] AS [value]
FROM [dbo].[BladeDiode] AS [t5]
) AS [t6]
) AS [t7]
WHERE [t7].[value] = [t0].[ID]
)))
ORDER BY [t0].[Name]
The two SQL queries execute in < 1 second in SSMS and produce the same results.
So I'm wondering why the first is slower on the LINQ side. It's worrying to me because I know I've used multiple where elsewhere, without being aware of a such a severe performance impact.
This question even has answered with both multiple & and where. And this answer even suggests using multiple where clauses.
Can anyone explain why this happens in my case?

Because writing like this
if (someParam1 != 0)
{
myQuery = myQuery.Where(q => q.SomeField1 == someParam1)
}
if (someParam2 != 0)
{
myQuery = myQuery.Where(q => q.SomeField2 == someParam2)
}
is NOT(upd) the same as (in case when someParam1 and someParam2 != 0)
myQuery = from t in Table
where t.SomeField1 == someParam1
&& t.SomeField2 == someParam2
select t;
is (NOT deleted) the same as
myQuery = from t in Table
where t.SomeField1 == someParam1
where t.SomeField2 == someParam2
select t;
UPD
Yes, I do mistake. Second query is same, first is not same.
First and Second queries not EXACTLY the same. Let me show you what I mean.
1st query with lamda-expression writen as
t.Where(r => t.SomeField1 == someParam1 && t.SomeField2 == someParam2)
2nd query as
t.Where(r => r.SomeField1 == someParam1).Where(r => r.SomeField2 == someParam2)
In this case in generated SQL Predicate with SomeField2 goes first (it is important, see below)
In 1st case we getting this SQL:
SELECT <all field from Table>
FROM table t
WHERE t.SomeField1 = :someParam1
AND t.SomeField2 = :someParam2
In 2 case the SQL is:
SELECT <all field from Table>
FROM table t
WHERE t.SomeField2 = :someParam2
AND t.SomeField1 = :someParam1
As we see there are 2 'same' SQLs. As we see, the OP's SQLs are also 'same', they are different in order of predicates in WHERE clause (as in my example). And I guess that SQL optimizer generate 2 different execution plans and may be(!!!) doing NOT EXISTS, then EXISTS and then filtering take more time than do first filtering and after that do EXISTS and NOT EXISTS
UPD2
It is a 'problem' of Linq Provider (ORM). I'm using another ORM (linq2db), and it generates for me EXACTLY the same SQLs in both cases.

SQL Server - DATEDIFF Function taking too long

I've done some extensive research and I've concluded that the DATEDIFF function is making my queries run very slow.
Below is the generated query by Entity Framework and it does look readable enough hopefully.
Here's the Linq that generates the T-SQL:
model.NewTotal1Week = ( from sdo in context.SubscriberDebitOrders
where
(
sdo.CampaignId == campaignId &&
( sdo.Status == ( Int32 ) DebitOrderStatus.New_Faulty ) &&
( SqlFunctions.DateDiff( "week", sdo.Collections.FirstOrDefault( c => c.TxnStatus == "U" ).ProcessDate, DateTime.Now ) <= 1 )
)
select sdo ).Count();
In the query below, I would like to get a COUNT of all Collections which fall within 1 week from the time they were Processed to today's date.
Is there anyone that can help me get rid of the DATEDIFF function? I've seen examples online but I couldn't adapt it to my scenario, forgive me I'm not very genius yet.
exec sp_executesql N'SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[SubscriberDebitOrder] AS [Extent1]
OUTER APPLY (SELECT TOP (1)
[Extent2].[ProcessDate] AS [ProcessDate]
FROM [dbo].[Collections] AS [Extent2]
WHERE ([Extent1].[Id] = [Extent2].[DebitOrderId]) AND (''U'' = [Extent2].[TxnStatus]) ) AS [Limit1]
WHERE ([Extent1].[CampaignId] = #p__linq__0) AND (3 = [Extent1].[Status]) AND ((DATEDIFF(week, [Limit1].[ProcessDate], SysDateTime())) <= 1)
) AS [GroupBy1]',N'#p__linq__0 int',#p__linq__0=3
go
Thanks in advance.

Its not the just DATEDIFF, any function on the column would cause query to do a SCAN on the underlying table/index
DATEDIFF(week, [Limit1].[ProcessDate], SysDateTime())) <=1
Above logic is fetching last week data? You can also write above without putting function around ProcessDate Column.
[Limit1].[ProcessDate] > SysDateTime()-7

This is your query:
SELECT GroupBy1.A1 AS C1
FROM (SELECT COUNT(1) AS[A1
FROM dbo.SubscriberDebitOrder AS Extent1 OUTER APPLY
(SELECT TOP (1) Extent2.ProcessDate
FROM [dbo].Collections Extent2
WHERE (Extent1.Id = Extent2.DebitOrderId AND
'U' = Extent2.TxnStatus
) AS [Limit1]
WHERE (Extent1.CampaignId = #p__linq__0) AND (3 = Extent1.Status) AND
(DATEDIFF(week, Limit1.ProcessDate, SysDateTime()) <= 1)
) GroupBy1;
As mentioned elsewhere, you should change the date logic and get rid of the outer query:
SELECT COUNT(1) AS A1
FROM dbo.SubscriberDebitOrder AS Extent1 OUTER APPLY
(SELECT TOP (1) Extent2.ProcessDate
FROM [dbo].Collections Extent2
WHERE (Extent1.Id = Extent2.DebitOrderId AND
'U' = Extent2.TxnStatus
) AS limit1
WHERE (Extent1.CampaignId = #p__linq__0) AND (3 = Extent1.Status) AND
Limit1.ProcessDate <= DATEADD(-1, week, GETDATE())
Very important note: This is not exactly equivalent to your query. Your original query counted the number of week boundaries between two dates. This depends on datefirst, but it woudld often be the number of Saturday or Sunday nights.
Based on your description, the above is more correct.
Next, you want indexes on Collections(DebitOrderId, TxnStatus, ProcessDate) and SubscriberDebitOrder(CampaignId, Status).

Trouble with some linq2sql generated T-SQL

I'm having some trouble with SQL timeout for the following LINQ2SQL query:
DateTime date = DateTime.Parse("2013-08-01 00:00:00.000");
Clients.Where(e =>
(
!Orders.Any(f => f.ClientId.Equals(e.Id) && f.OrderDate >= date)
||
Comments.Any(f => f.KeyId.Equals(e.Id))
)
).Count().Dump();
When running this in LinqPad it will take forever to finish and will become an SQL timeout if running on the server.
The SQL-code generated:
-- Region Parameters
DECLARE #p0 DateTime = '2013-08-01 00:00:00.000'
-- EndRegion
SELECT COUNT(*) AS [value]
FROM [Clients] AS [t0]
WHERE (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM [Orders] AS [t1]
WHERE ([t1].[ClientId] = [t0].[Id]) AND ([t1].[OrderDate] >= #p0)
))) OR (EXISTS(
SELECT NULL AS [EMPTY]
FROM [Comments] AS [t2]
WHERE [t2].[KeyId] = [t0].[Id]
))
Works fine in SQL-studio!
But:
SELECT COUNT(*) AS [value]
FROM [Clients] AS [t0]
WHERE
(NOT (EXISTS(SELECT NULL AS [EMPTY] FROM [Orders] AS [t1] WHERE ([t1].[ClientId] = [t0].[Id]) AND ([t1].[OrderDate] >= '2013-08-01 00:00:00.000'))))
OR
(EXISTS(SELECT NULL AS [EMPTY] FROM [Comments] AS [t2] WHERE [t2].[KeyId] = [t0].[Id]))
And will get me a the problem as actually running the query in LinqPad.
What is the difference of using DECLARE #p0 DateTime = '2013-08-01 00:00:00.000' compared to using the constant date and how do I get my Linq2SQL to work?
EDIT:
See execution plans for both queries:
Timeouts:
Fine:
Some other things I've noticed is that if I remove the NOT it works fine:
SELECT COUNT(*) AS [value]
FROM [Clients] AS [t0]
WHERE
((EXISTS(SELECT NULL AS [EMPTY] FROM [Orders] AS [t1] WHERE ([t1].[ClientId] = [t0].[Id]) AND ([t1].[OrderDate] >= '2013-08-01 00:00:00.000'))))
OR
(EXISTS(SELECT NULL AS [EMPTY] FROM [Comments] AS [t2] WHERE [t2].[KeyId] = [t0].[Id]))
Or if I remove the OR EXISTS parts it also works fine:
SELECT COUNT(*) AS [value]
FROM [Clients] AS [t0]
WHERE
((EXISTS(SELECT NULL AS [EMPTY] FROM [Orders] AS [t1] WHERE ([t1].[ClientId] = [t0].[Id]) AND ([t1].[OrderDate] >= '2013-08-01 00:00:00.000'))))
Thanks
/Niels

Your Orders table must be fairly large. You have an index on OrderDate right?
SQL Server actually generates 2 different execution plans in this example. Or if it generates the same plan, SQL gives greatly different numbers of returned rows for the 2 statements.
DECLARE #p0 DateTime = '2013-08-01 00:00:00.000'
SELECT * FROM Orders WHERE OrderDate >= #p0
SELECT * FROM Orders WHERE OrderDate >= '2013-08-01 00:00:00.000'
The first statement generates a parameterized query, plan optimizer will assume #p0 is unknown at the time and choose an execution plan that best fits unknown values.
The 2nd statement, optimizer will take into account that you supplied a fixed value. SQL will look at the index distribution and estimate how many rows will be filtered by >= '2013-08-01'

The execution plan is not visible, but in general sql performance recommendation don't use negations it always be a hit in performance. In your case try to use the <= instead the negation with the >=
And if you use many or it will hit your performance as well. Try to use subquerys as a work around to not use many or or negations.

The solution for me was to rebuild the index of OrderDate.

LINQ SQL Statement is returning wrong results

I have a SQL statement which afaik is correct but the response from the SQL server is incorrect. I've debugged this issue and found that if I execute the SQL statement without the wrapping store procedure I get different results. All I have done is replaced the variable with the actual values
Linq generated code:
exec sp_executesql N'SELECT [t0].[RoomId], [t0].[Title], [t0].[Detail], [t0].[ThumbnailPath], [t0].[PageId], [t0].[TypeId], [t0].[LocationId], [t0].[TimeStamp], [t0].[DeleteStamp]
FROM [dbo].[Room] AS [t0]
INNER JOIN [dbo].[RoomType] AS [t1] ON [t1].[RoomTypeId] = [t0].[TypeId]
WHERE ([t1].[Sleeps] >= #p0) AND ([t0].[DeleteStamp] IS NULL) AND (((
SELECT COUNT(*)
FROM [dbo].[Booking] AS [t2]
INNER JOIN [dbo].[Order] AS [t3] ON [t3].[OrderId] = [t2].[OrderId]
WHERE ([t2].[StartStamp] <= #p1)
AND ([t2].[EndStamp] >= #p2)
AND (([t3].[Status] = #p3)
OR ([t3].[Status] = #p4)
OR (([t3].[Status] = #p5) AND ([t3].[CreatedStamp] > #p6)))
AND ([t2].[RoomId] = [t0].[RoomId])
)) = #p7)
',N'#p0 int,#p1 datetime,#p2 datetime,#p3 int,#p4 int,#p5 int,#p6 datetime,#p7 int',
#p0=1,#p1='2011-04-05 00:00:00',#p2='2011-04-04 00:00:00',#p3=3,#p4=5,#p5=0,#p6='2011-04-04 12:36:09.490',#p7=0
Without the SP
SELECT [t0].[RoomId], [t0].[Title], [t0].[Detail], [t0].[ThumbnailPath], [t0].[PageId], [t0].[TypeId], [t0].[LocationId], [t0].[TimeStamp], [t0].[DeleteStamp]
FROM [dbo].[Room] AS [t0]
INNER JOIN [dbo].[RoomType] AS [t1] ON [t1].[RoomTypeId] = [t0].[TypeId]
WHERE ([t1].[Sleeps] >= 1) AND ([t0].[DeleteStamp] IS NULL) AND (((
SELECT COUNT(*)
FROM [dbo].[Booking] AS [t2]
INNER JOIN [dbo].[Order] AS [t3] ON [t3].[OrderId] = [t2].[OrderId]
WHERE ([t2].[StartStamp] <= '2011-04-05 00:00:00')
AND ([t2].[EndStamp] >= '2011-04-04 00:00:00')
AND (([t3].[Status] = 3)
OR ([t3].[Status] = 4)
OR (([t3].[Status] = 5) AND ([t3].[CreatedStamp] > '2011-04-04 12:36:09.490')))
AND ([t2].[RoomId] = [t0].[RoomId])
)) = 0)
The first result set returns 1 row where as the 2nd returns me 21!!
Can anybody spot the difference as its driving me crazy.

You made an error replacing the variables!
You replaced p4 with 4 when you should have replaced it with 5 and p5 with 5 instead of 0.

Well, one difference is #p5=0 while you have [t3].[Status] = 5 in the other.

Linq's static function 'Contain' is messing up with its content on a nested query?

I'm trying to reduce the number of database queries and in doing so I'm attempting to retrieve a series of hierarchical entities (that were previously fetched recursively) in one go.
I've got a Labels property holding a set of strings:
public string[] Labels { get; set; } // new string[] {"{{a}}", "{{b}}", "{{c}}", "{{d}}", "{{e}}"};
that I'm using to construct a first query:
var IDeferredTopLabels=
db.labels
.Where(l =>
l.site_id == this.site_id &&
this.Labels.Contains(l.name_for_code)
)
.Select(l => new LabelWithParentId { Label = l, ParentId = null });
The above, if inspected while debugging, would generate the following TSQL
SELECT [t0].[id], [t0].[name_for_code], [t0].[site_id], [t0].[priority_level]
FROM [dbo].[labels] AS [t0]
WHERE ([t0].[site_id] = 15) AND ([t0].[name_for_code] IN ('{{a}}', '{{b}}', '{{c}}', '{{d}}', '{{e}}'))
So far so good, following the IDeferredToplabels declaration I'm extending the first query with a Union specifying how child Label entities should be fetched:
var IDeferredWithChildLabels = IDeferredTopLabels
.Union(
db.label__labels
.Where(ll =>
db.labels
.Where(l =>
l.site_id == this.site_id &&
this.Labels.Contains(l.name_for_code)
)
.Select(l => l.id)
.Contains(ll.parent_label_id)
)
.Select(ll => new LabelWithParentId { Label = ll.label1, ParentId = ll.parent_label_id})
);
Now if I inspect the TSQL generate is:
SELECT [t4].[id], [t4].[name_for_code], [t4].[site_id], [t4].[priority_level], [t4].[value] AS [ParentId]
FROM (
SELECT [t0].[id], [t0].[name_for_code], [t0].[site_id], [t0].[priority_level], NULL AS [value]
FROM [dbo].[labels] AS [t0]
WHERE ([t0].[site_id] = 15) AND ([t0].[name_for_code] IN ('{{a}}', '{{b}}', '{{c}}', '{{d}}', '{{e}}'))
UNION
SELECT [t2].[id], [t2].[name_for_code], [t2].[site_id], [t2].[priority_level], [t1].[parent_label_id] AS [value]
FROM [dbo].[label__label] AS [t1]
INNER JOIN [dbo].[labels] AS [t2] ON [t2].[id] = [t1].[label_id]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[labels] AS [t3]
WHERE ([t3].[id] = [t1].[parent_label_id]) AND ([t3].[site_id] = 15) AND ([t3].[name_for_code] IN ('{{a}}', '{{b}}', '{{c}}', '{{a}}'0, '{{a}}'1))
)
) AS [t4]
If you see at the end of this second TSQL Query, the part IN ('{{a}}', '{{b}}', '{{c}}', '{{a}}'0, '{{a}}'1)), the set of strings have some unespected 0 and 1 next to the second last and last element plus it is no longer the initial set of strings (ie '{{a}}', '{{b}}', '{{c}}', '{{d}}', '{{e}}') but instead '{{a}}', '{{b}}', '{{c}}', '{{a}}', '{{a}}'.
What I'm I doing wrong!?
I'm clueless, I appreciate your help, thank you.
EDIT:
I just tried to use an int[] and compare the id rather then the name_for_code and I'm still getting 1 and 0 appended to the last 2 elements of the array as well as the array elements are wrong:
SELECT [t4].[id], [t4].[name_for_code], [t4].[site_id], [t4].[priority_level], [t4].[value] AS [ParentId]
FROM (
SELECT [t0].[id], [t0].[name_for_code], [t0].[site_id], [t0].[priority_level], NULL AS [value]
FROM [dbo].[labels] AS [t0]
WHERE ([t0].[site_id] = 15) AND ([t0].[id] IN (1, 2, 3, 4, 5))
UNION
SELECT [t2].[id], [t2].[name_for_code], [t2].[site_id], [t2].[priority_level], [t1].[parent_label_id] AS [value]
FROM [dbo].[label__label] AS [t1]
INNER JOIN [dbo].[labels] AS [t2] ON [t2].[id] = [t1].[label_id]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[labels] AS [t3]
WHERE ([t3].[id] = [t1].[parent_label_id]) AND ([t3].[site_id] = 15) AND ([t3].[id] IN (1, 2, 3, 10, 11))
)
) AS [t4]

OK I found the problem and actually it was not happening when running the code but only in debug mode and specifically it seems it is a bug in Scott Gu's LINQ to SQL Debug Visualizer
That is why the TSQL that I was grabbing was not parameterized, #David B you made a very good point!
I will leave a comment on Scott Gu's page pointing to this question.
Thanks all!

Try making a copy of Labels and using that in the second part of the query (the Union). EG:
string[] LabelsCopy = new string[5];
Labels.CopyTo(LabelsCopy, 0);
Then set a breakpoint after the query and inspect the contents of each array (or print out the contents of both arrays with Console.WriteLine(String.Join(',',Labels))).
It's extremely unlikely that .Contains() is changing the contents of Labels, but if it is then you'll have proof.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

LinqToSQL - ToList() appears to be very slow - c#

Related

Why are multiple where in LINQ so slow?

SQL Server - DATEDIFF Function taking too long

Trouble with some linq2sql generated T-SQL

LINQ SQL Statement is returning wrong results

Linq's static function 'Contain' is messing up with its content on a nested query?

Categories

Resources