Generated LinqtoSql Sql 5x slower than SAME EXACT hand-written sql - c#

I have a SQL statement which is hard-coded in an existing VB6 app. I'm building a new version in C# using LINQ to SQL. I was able to get LINQ to SQL to generate the same SQL (before I start refactoring), but for some reason the SQL generated by LINQ to SQL is 5x slower than the original. Both timings below come from running the generated SQL directly in LINQPad.
The only real difference my meager SQL eyes can spot is the WITH (NOLOCK), and adding that to the LINQ to SQL generated SQL makes no difference.
Can someone point out what I'm doing wrong here? Thanks!
Existing hard-coded SQL (5.0 seconds)
SELECT DISTINCT
CH.ClaimNum, CH.AcnProvID, CH.AcnPatID, CH.TinNum, CH.Diag1, CH.GroupNum, CH.AllowedTotal
FROM Claims.dbo.T_ClaimsHeader AS CH WITH (NOLOCK)
WHERE
CH.ContractID IN ('123A','123B','123C','123D','123E','123F','123G','123H')
AND ( ( (CH.Transmited Is Null or CH.Transmited = '')
AND CH.DateTransmit Is Null
AND CH.EobDate Is Null
AND CH.ProcessFlag IN ('Y','E')
AND CH.DataSource NOT IN ('A','EC','EU')
AND CH.AllowedTotal > 0 ) )
ORDER BY CH.AcnPatID, CH.ClaimNum
Generated SQL from LINQ to SQL (27.6 seconds)
-- Region Parameters
DECLARE @p0 NVarChar(4) SET @p0 = '123A'
DECLARE @p1 NVarChar(4) SET @p1 = '123B'
DECLARE @p2 NVarChar(4) SET @p2 = '123C'
DECLARE @p3 NVarChar(4) SET @p3 = '123D'
DECLARE @p4 NVarChar(4) SET @p4 = '123E'
DECLARE @p5 NVarChar(4) SET @p5 = '123F'
DECLARE @p6 NVarChar(4) SET @p6 = '123G'
DECLARE @p7 NVarChar(4) SET @p7 = '123H'
DECLARE @p8 VarChar(1) SET @p8 = ''
DECLARE @p9 NVarChar(1) SET @p9 = 'Y'
DECLARE @p10 NVarChar(1) SET @p10 = 'E'
DECLARE @p11 NVarChar(1) SET @p11 = 'A'
DECLARE @p12 NVarChar(2) SET @p12 = 'EC'
DECLARE @p13 NVarChar(2) SET @p13 = 'EU'
DECLARE @p14 Decimal(5,4) SET @p14 = 0
-- EndRegion
SELECT DISTINCT
[t0].[ClaimNum],
[t0].[acnprovid] AS [AcnProvID],
[t0].[acnpatid] AS [AcnPatID],
[t0].[tinnum] AS [TinNum],
[t0].[diag1] AS [Diag1],
[t0].[GroupNum],
[t0].[allowedtotal] AS [AllowedTotal]
FROM [Claims].[dbo].[T_ClaimsHeader] AS [t0]
WHERE
([t0].[contractid] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7))
AND (([t0].[Transmited] IS NULL) OR ([t0].[Transmited] = @p8))
AND ([t0].[DATETRANSMIT] IS NULL)
AND ([t0].[EOBDATE] IS NULL)
AND ([t0].[PROCESSFLAG] IN (@p9, @p10))
AND (NOT ([t0].[DataSource] IN (@p11, @p12, @p13)))
AND ([t0].[allowedtotal] > @p14)
ORDER BY [t0].[acnpatid], [t0].[ClaimNum]
New LINQ to SQL code (30+ seconds... times out)
var contractIds = T_ContractDatas
    .Where(x => x.EdiSubmissionGroupID == "123-01")
    .Select(x => x.CONTRACTID)
    .ToList();
var processFlags = new List<string> { "Y", "E" };
var dataSource = new List<string> { "A", "EC", "EU" };
var results = (from claims in T_ClaimsHeaders
               where contractIds.Contains(claims.contractid)
                   && (claims.Transmited == null || claims.Transmited == string.Empty)
                   && claims.DATETRANSMIT == null
                   && claims.EOBDATE == null
                   && processFlags.Contains(claims.PROCESSFLAG)
                   && !dataSource.Contains(claims.DataSource)
                   && claims.allowedtotal > 0
               select new
               {
                   ClaimNum = claims.ClaimNum,
                   AcnProvID = claims.acnprovid,
                   AcnPatID = claims.acnpatid,
                   TinNum = claims.tinnum,
                   Diag1 = claims.diag1,
                   GroupNum = claims.GroupNum,
                   AllowedTotal = claims.allowedtotal
               })
               .Distinct()
               .OrderBy(x => x.AcnPatID)
               .ThenBy(x => x.ClaimNum);
I'm using the lists of constants above to make LINQ to SQL generate IN ('xxx','xxx', ...); otherwise it uses subqueries, which are just as slow...

Compare the execution plans for the two queries. The LINQ to SQL query uses loads of parameters, so the query optimiser builds an execution plan based on what MIGHT be in those parameters; the hard-coded SQL has literal values, so the optimiser builds a plan based on the actual values, and it is probably producing a much more efficient plan for the literals. Your best bet is to spot the slow parts in the execution plan and try to get LINQ to SQL to produce a better query. If you can't, but you think you can build one by hand, then create a stored procedure, which you can expose as a method on your data context class in LINQ to SQL, as sketched below.
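A stored procedure doesn't have to go through the designer, by the way; you can also map it by hand on a partial class of your DataContext. A minimal sketch, assuming a hypothetical procedure name (usp_GetUntransmittedClaims) and result type, neither of which is from the question:
using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Reflection;

public partial class ClaimsDataContext
{
    // Maps the hand-tuned procedure as a method on the context.
    // Names here are illustrative, not from the question.
    [Function(Name = "dbo.usp_GetUntransmittedClaims")]
    public ISingleResult<UntransmittedClaimResult> GetUntransmittedClaims(
        [Parameter(DbType = "VarChar(20)")] string ediSubmissionGroupId)
    {
        IExecuteResult result = this.ExecuteMethodCall(
            this, (MethodInfo)MethodInfo.GetCurrentMethod(), ediSubmissionGroupId);
        return (ISingleResult<UntransmittedClaimResult>)result.ReturnValue;
    }
}
Dragging the procedure onto the O/R designer surface generates essentially this same code for you.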

The hard-coded values in the first SQL may be allowing the query optimizer to use indexes that it doesn't know it can use efficiently for the second, parameterised, SQL.
Another possibility is that if you're running the hand-crafted SQL in SQL Server Management Studio, the different default SET options of SSMS compared to the .NET SQL Server provider may be affecting performance. If this is the case, changing some of the SET options on the .NET connection prior to executing the command might help (e.g. SET ARITHABORT ON), although I don't know whether you can do this in LINQPad. See here for more info on this possibility.
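If you want to test that theory from application code, you can open the context's connection yourself and issue the SET before querying. A minimal sketch, assuming a hypothetical LINQ to SQL context named ClaimsDataContext:
using (var db = new ClaimsDataContext(connectionString))
{
    // Open the connection manually so LINQ to SQL reuses it (and our SET
    // option) for every query on this context instead of opening its own.
    db.Connection.Open();
    using (var cmd = db.Connection.CreateCommand())
    {
        cmd.CommandText = "SET ARITHABORT ON;";
        cmd.ExecuteNonQuery();
    }

    var rows = db.T_ClaimsHeaders.Take(10).ToList(); // runs with ARITHABORT ON
}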

The big difference is the parameters.
I can't know for sure without analyzing the plans, but L2S parameterizes queries so that their plans can be effectively reused, avoiding excessive query recompilation on the server. This is, in general, a Good Thing because it keeps the CPU time low on the SQL Server -- it doesn't have to keep generating and generating and generating the same plan.
But L2S goes a bit overboard when you use constants. It parameterizes them, too, which can be detrimental for performance in certain situations.
Putting on my Aluminum-Foil Clairvoyancy Hat, I'm visualizing the kinds of index structures you might have on that table. For example, you may have an index just on ProcessFlag, and there may be very few rows with ProcessFlag values of "Y" or "E", allowing the query with the hard-coded constants to scan only those rows. For the parameterized query, SQL Server generates a plan which is judged to be optimal for arbitrary input, which means the server can't take advantage of the little hint (the constants) that you give it.
My advice to you at this point is to take a good look at your indexes and favor composite indexes which cover more of your WHERE conditions together. I will bet that with a bit of that type of analysis, you will find that the query performance becomes far more similar. (and probably improves, in both cases!)

You might also check out compiled LINQ queries - http://www.jdconley.com/blog/archive/2007/11/28/linq-to-sql-surprise-performance-hit.aspx

Related

Alter EF Generated query parameters

I'm using EF 6.4.4 to query a SQL view. The view is not really performing optimally, but I don't control it.
I'm executing the following code with a WHERE clause on a string/nvarchar property:
_context.ViewObject
.Where(x => x.Name == nameFilter)
.ToList();
Similarly, I have the same SQL statement executed in SSMS:
SELECT [Id]
, [Name]
, ...
FROM [View]
WHERE [Name] = '<nameFilter>'
My problem is that the EF variant is way slower than the direct SQL query.
When checking the SQL query generated by EF, I see the following:
SELECT [Id]
, [Name]
, ...
FROM [View]
WHERE [Name] = @p__linq__0
with parameter @p__linq__0 of type NVARCHAR(4000) NULL,
even though my input variable is not NULL and has a maximum length of 6 characters.
When I execute the same SQL query with this parameter, it is slow in SSMS as well.
Apparently, this has something to do with how the parameter is declared.
So what I want to do is alter the SQL query parameter that EF uses to generate this query, to make sure that my parameter is represented more accurately and I get the same performance as directly in SSMS.
Is there a way to do this?
What's going on: parameter sniffing.
Execute the following in SSMS and you will probably see the same performance.
EXECUTE sp_executesql N'SELECT [Id]
, [Name]
, ...
FROM [View]
WHERE [Name] = @nameFilter'
,N'@nameFilter nvarchar(4000)'
,@nameFilter = '<namefilter>';
sp_executesql is what EF uses to execute queries against the database, so when you write .Where(x => x.Name == nameFilter) it is translated to the statement above, making you suffer from parameter sniffing.
You could fix this by adding OPTION (RECOMPILE) to your queries, as described here. But be aware that adding RECOMPILE to all queries might have a negative impact on other queries.
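For EF6 specifically, one way to apply the hint without touching every query is a command interceptor. This is a sketch, not a drop-in fix, and it appends the hint to every reader command, so the warning above applies in full:
using System.Data.Common;
using System.Data.Entity.Infrastructure.Interception;

public class RecompileInterceptor : DbCommandInterceptor
{
    public override void ReaderExecuting(
        DbCommand command,
        DbCommandInterceptionContext<DbDataReader> interceptionContext)
    {
        // Append OPTION (RECOMPILE) so each execution is planned for the
        // actual parameter values instead of a previously sniffed plan.
        if (!command.CommandText.TrimEnd().EndsWith("OPTION (RECOMPILE)"))
        {
            command.CommandText += " OPTION (RECOMPILE)";
        }
        base.ReaderExecuting(command, interceptionContext);
    }
}

// Registered once at startup:
// DbInterception.Add(new RecompileInterceptor());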
You can execute the following queries with actual execution plan to see the difference:
Query with WHERE Name = @NameFilter
Query with WHERE Name = '<NameFilter>'
Query with WHERE Name = @NameFilter OPTION(RECOMPILE)
If it's not parameter sniffing, it might be implicit conversions, but I'm guessing both types are NVARCHAR so this shouldn't matter.
99% of the time it's parameter sniffing.

Using collation in Linq to Sql

Imagine this sql query
select * from products order by name collate Persian_100_CI_AI asc
Now using Linq:
product = DB.Products.OrderBy(p => p.name); // what should I do here?
How can I apply collation?
This is now possible with EF Core 5.0 using the collate function.
In your example the code would be:
product = DB.Products.OrderBy(p => EF.Functions.Collate(p.name, "Persian_100_CI_AI"));
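Collate isn't limited to OrderBy; it works anywhere a string expression does. For example, a case-sensitive lookup (a sketch, assuming EF Core 5+ and that the named collation exists on your server):
var match = DB.Products
    .Where(p => EF.Functions.Collate(p.name, "Persian_100_CS_AI") == search)
    .ToList();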
There is no direct way.
Workaround:
Create a function in SQL Server:
CREATE FUNCTION [dbo].[fnsConvert]
(
@p NVARCHAR(2000) ,
@c NVARCHAR(2000)
)
RETURNS NVARCHAR(2000)
AS
BEGIN
IF ( @c = 'Persian_100_CI_AI' )
SET @p = @p COLLATE Persian_100_CI_AI
IF ( @c = 'Persian_100_CS_AI' )
SET @p = @p COLLATE Persian_100_CS_AI
RETURN @p
END
Import it into the model and use:
from o in DB.Products
orderby DB.fnsConvert(o.name, "Persian_100_CI_AI")
select o;
You can't change the collation through a LINQ statement. You'd better do the sorting in memory by applying a StringComparer that is initialized with the correct culture (at least... I hope it's correct) and ignores case (true).
DB.Products.AsEnumerable()
.OrderBy (x => x.name, StringComparer.Create(new CultureInfo("fa-IR"), true))
edit
Since people (understandably) don't seem to read comments, let me add that this was answered using the exact code of the question, in which there is no Where or Select. Of course I'm aware of the possibly huge data overhead when doing something like...
DB.Products.AsEnumerable().Where(...).Select(...).OrderBy(...)
...which first pulls the entire table contents into memory and then does the filtering and projection that the database itself could have done. That is avoided by moving AsEnumerable() after them:
DB.Products.Where(...).Select(...).AsEnumerable().OrderBy(...)
The point is that if the database doesn't support ordering by some desired character set/collation the only option using EF's DbSet is to do the ordering in memory.
The alternative is to run a SQL query having an ORDER BY with an explicit collation. If paging is used, this is the only option.
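With LINQ to SQL, that raw-SQL route can go through ExecuteQuery on the DataContext. A sketch, assuming SQL Server 2012+ for OFFSET/FETCH and that the Product entity maps the returned columns; the paging values are illustrative:
var page = DB.ExecuteQuery<Product>(
    @"SELECT * FROM Products
      ORDER BY name COLLATE Persian_100_CI_AI
      OFFSET {0} ROWS FETCH NEXT {1} ROWS ONLY",
    0, 50).ToList();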

Strange SQL generated from LINQ when checking BIT column

I have the following LINQtoSQL statement
from t1 in __table1
join t2 in __table2 on t1.Id equals t2.OtherTableId
where t2.BranchId == branchId
&& !t1.IsPersonal
select t1.Id
And this generates the following SQL
SELECT DISTINCT [t0].[Id]
FROM [__table1] AS [t0]
INNER JOIN [__table2] AS [t1] ON [t0].[Id] = [t1].[OtherTableId]
WHERE ([t1].[BranchId] = @p0) AND (NOT ([t0].[IsPersonal] = 1))
Now the issue that I have is this:
(NOT ([t0].[IsPersonal] = 1))
How can I write the LINQ to just say
[t0].[IsPersonal] = 0
NOTE: IsPersonal is not nullable.
Edit: I may have outsmarted the optimizer, but unfortunately, when using Linq2Sql, filtered indexes aren't used when the 'filter criteria' is a parameter, which is what this produces. So in the end I gave up and switched to a stored procedure. The alternatives were just too icky.
Note: the generated SQL does work with filtered indexes without an index hint, but since I was running it in SSMS the query plan cache doesn't apply.
Aha! Finally managed to outsmart the optimizer.
WHERE object.Equals(t.Voided, 0) or
WHERE object.Equals(t.Voided, "false")
Which generates
WHERE ([t0].[Voided] = @p0)
@p0 is sent as a string or number, which SQL Server casts to a bit for you.
This seems to work with a filtered index (and force hint), which is the reason I needed to get around the optimizer in the first place.
Note: For some reason "0" sometimes gives a boolean parse error, so 0 or "false" is probably better. It could depend on some subtleties of your query.
I prefer 0 because "false" ends up being a varchar(8000) which is a little overkill!
Right, so I think I have figured it out. The following line
t1.IsPersonal == false
gets optimised as
!t1.IsPersonal
Which is, in turn, literally translated into
(NOT ([t0].[IsPersonal] = 1))
It seems that the optimiser is to "blame".

Linq to SQL, update many rows

I need to update a table of alarms with LINQ to SQL, and it can contain over 100000 rows.
This means that a simple update such as:
foreach (var alarm in Alarms)
{
alarm.Alarm_Ack_UTC = DateTime.UtcNow;
}
SubmitChanges();
gives me a SQL query of
SELECT [t0].[Alarm_ID], [t0].[Alarm_Application_Number], [t0].[Alarm_Ack_UTC], [t0].[Alarm_DateTime_UTC], [t0].[Alarm_Message_Number], [t0].[Username], [t0].[Runtime_Message], [t0].[Alarm_Application_Name], [t0].[Alarm_Application_Computer], [t0].[Alarm_GUID], [t0].[Alarm_Comments]
FROM [Alarms] AS [t0]
GO
-- Region Parameters
DECLARE @p0 Int = 1
DECLARE @p1 DateTime = '2012-03-16 11:56:25.850'
-- EndRegion
UPDATE [Alarms]
SET [Alarm_Ack_UTC] = @p1
WHERE [Alarm_ID] = @p0
GO
-- Region Parameters
DECLARE @p0 Int = 2
DECLARE @p1 DateTime = '2012-03-16 11:56:25.851'
-- EndRegion
UPDATE [Alarms]
SET [Alarm_Ack_UTC] = @p1
WHERE [Alarm_ID] = @p0
GO
-- Region Parameters
DECLARE @p0 Int = 3
DECLARE @p1 DateTime = '2012-03-16 11:56:25.851'
-- EndRegion
UPDATE [Alarms]
SET [Alarm_Ack_UTC] = @p1
WHERE [Alarm_ID] = @p0
GO
Repeated 100000 times, which is really slow, inefficient and unoptimized.
The real query is more advanced: it updates more data, uses a .Where(a => a.Time != null), and other things.
But just to improve the query above: it could be replaced with this very efficient SQL query:
UPDATE [Alarms]
SET Alarm_Ack_UTC = GETUTCDATE()
GO
How can one achieve this with Linq to SQL? Or is it impossible?
You can't do this with LINQ to SQL (or any other O/RM). They will always fetch the objects you want to change from the database and issue a single update statement per entity. If you change 10,000 entities, you will get at least 10,000 update statements.
If this is too slow, switch to a stored procedure or a manual SQL statement.
I'd opt for writing a stored procedure.
You can then map to this stored procedure in your LINQ to SQL designer by dragging it onto your design surface. It will then appear as a method on your DataContext and will result in a much more efficient design.
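Once mapped, calling it is one line (the context and method names below are hypothetical):
using (var db = new AlarmsDataContext())
{
    // Executes the set-based UPDATE inside the stored procedure.
    db.AcknowledgeAllAlarms(DateTime.UtcNow);
}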
If you need to run specific optimized SQL like this (via LINQ to SQL) you will need to use ExecuteCommand (ExecuteQuery<T> is for statements that return rows; for an UPDATE, ExecuteCommand is the right call).
So using your example you could do:
db.ExecuteCommand("UPDATE [Alarms] SET Alarm_Ack_UTC = GETUTCDATE()");
If you want a more optimized way of updating multiple rows with different values, then you would need to think about using SqlBulkCopy, which is SQL Server specific but isn't LINQ to SQL.
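The usual SqlBulkCopy pattern for this is to bulk-load the new values into a staging table and then update with a single set-based join. A sketch with plain ADO.NET; the table and column names follow the question, but the staging table and alarmsToAck collection are made up:
using System.Data;
using System.Data.SqlClient;

var table = new DataTable();
table.Columns.Add("Alarm_ID", typeof(int));
table.Columns.Add("Alarm_Ack_UTC", typeof(DateTime));
foreach (var alarm in alarmsToAck)
    table.Rows.Add(alarm.Alarm_ID, DateTime.UtcNow);

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // Staging table lives for the lifetime of this connection.
    new SqlCommand(
        "CREATE TABLE #AckStaging (Alarm_ID int PRIMARY KEY, Alarm_Ack_UTC datetime)",
        conn).ExecuteNonQuery();

    using (var bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "#AckStaging";
        bulk.WriteToServer(table);
    }

    // One set-based UPDATE instead of 100000 single-row statements.
    new SqlCommand(
        @"UPDATE a SET a.Alarm_Ack_UTC = s.Alarm_Ack_UTC
          FROM Alarms a JOIN #AckStaging s ON a.Alarm_ID = s.Alarm_ID",
        conn).ExecuteNonQuery();
}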

Alternative to using IList.Contains(item.Id) in Entity Framework for large lists?

Is there an alternative to using .Contains() to select objects in Entity Framework that exist in a specified list? Contains() works great if your list is small, but once you get to a few thousand items the performance is terrible.
return (from item in context.Accounts
where accountIdList.Contains(item.AccountId)
select item).ToList();
I'm using EF 4.0, .Net Framework 4.0, and SQL Server 2005. I'm not opposed to a SQL solution either since the query that EF generates only takes a second to run on SQL for about 10k items.
I found an alternative that runs in about a second, using a SQL stored procedure and a comma-delimited string for the parameter. Much better than the 5+ minutes EF was taking using .Contains().
It is run from my code using the following:
string commaDelimitedList = string.Join(",", accountIdList);
return context.GetAccountsByList(commaDelimitedList).ToList();
The stored procedure (simplified) looks like this:
SELECT *
FROM Accounts as T1 WITH (NOLOCK)
INNER JOIN (
SELECT Num FROM dbo.StringToNumSet(@commaDelimitedAccountIds, ',')
) as [T2] ON [T1].[AccountId] = [T2].[num]
And the User-Defined function dbo.StringToNumSet() looks like this:
CREATE FUNCTION [dbo].[StringToNumSet] (
@TargetString varchar(MAX),
@SearchChar varchar(1)
)
RETURNS @Set TABLE (
num int not null
)
AS
BEGIN
DECLARE @SearchCharPos int, @LastSearchCharPos int
SET @SearchCharPos = 0
WHILE 1=1
BEGIN
SET @LastSearchCharPos = @SearchCharPos
SET @SearchCharPos = CHARINDEX( @SearchChar, @TargetString, @SearchCharPos + 1 )
IF @SearchCharPos = 0
BEGIN
INSERT @Set( num ) VALUES ( SUBSTRING( @TargetString, @LastSearchCharPos + 1, DATALENGTH( @TargetString ) ) )
BREAK
END
ELSE
INSERT @Set( num ) VALUES ( SUBSTRING( @TargetString, @LastSearchCharPos + 1, @SearchCharPos - @LastSearchCharPos - 1 ) )
END
RETURN
END
Would it be viable to just read your information into memory and then do the searches?
I've found that in most cases where you need to work with big amounts of data, if you can get away with reading it all into memory and then doing the lookups, it's much, much faster.
Contains already gets translated to a massive WHERE ... IN SQL statement, so that's not really the problem. However, you shouldn't eagerly evaluate the query, as that will execute it every time you call the method. Take advantage of the nature of LINQ to Entities and let the query be evaluated when you actually iterate over it, as in the sketch below.
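In other words, keep it as an IQueryable and compose on it; the SQL is sent once, when you enumerate. A sketch (the IsActive property is made up for illustration):
// Building the query does not touch the database.
IQueryable<Account> query =
    context.Accounts.Where(a => accountIdList.Contains(a.AccountId));

// Still no SQL executed; this composes onto the server-side query.
var active = query.Where(a => a.IsActive);

foreach (var account in active) // the single WHERE ... IN query runs here
{
    // ...
}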
