Strange SQL generated from LINQ when checking BIT column - c#

I have the following LINQtoSQL statement
from t1 in __table1
join t2 in __table2 on t1.Id equals t2.OtherTableId
where t2.BranchId == branchId
&& !t1.IsPersonal
select t1.Id
And this generates the following SQL
SELECT DISTINCT [t0].[Id]
FROM [__table1] AS [t0]
INNER JOIN [__table2] AS [t1] ON [t0].[Id] = [t1].[OtherTableId]
WHERE ([t1].[BranchId] = @p0) AND (NOT ([t0].[IsPersonal] = 1))
Now the issue that I have is this:
(NOT ([t0].[IsPersonal] = 1))
How can I write the LINQ to just say
[t0].[IsPersonal] = 0
NOTE: IsPersonal is not nullable.

Edit: I may have outsmarted the optimizer but unfortunately when using Linq2Sql the filtered indexes aren't used when the 'filter criteria' is a parameter - which is what this does. So in the end I gave up and switched to a stored procedure. Alternatives were just too icky.
Note: the generated SQL does work with filtered indexes without an index hint, but since I was running it in SSMS the query plan cache doesn't apply.
Aha! Finally managed to outsmart the optimizer.
WHERE object.Equals(t.Voided, 0) or
WHERE object.Equals(t.Voided, "false")
Which generates
WHERE ([t0].[Voided] = @p0)
@p0 is sent as a string or number, which SQL Server casts to a boolean for you.
This seems to work with a filtered index (and force hint), which is the reason I needed to get around the optimizer in the first place.
Note: For some reason sometimes "0" gives a boolean parse error, so 0 or "false" is probably better. It could depend on some subtleties of your query.
I prefer 0 because "false" ends up being a varchar(8000) which is a little overkill!
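Applied to the query from the question, the workaround looks something like this (a sketch using the same names as above):
var ids = (from t1 in __table1
           join t2 in __table2 on t1.Id equals t2.OtherTableId
           where t2.BranchId == branchId
              // object.Equals forces a parameter instead of a literal comparison,
              // so the generated SQL becomes [t0].[IsPersonal] = @p0
              && object.Equals(t1.IsPersonal, 0)
           select t1.Id).Distinct();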

Right so I think that I have figured it out. The following line
t1.IsPersonal == false
gets optimised as
!t1.IsPersonal
Which is, in turn, literally translated into
(NOT ([t0].[IsPersonal] = 1))
Seems that the optimiser is to "blame"

Related

SQL server returns no record for any query

A week ago, we logged an error where a query "Select columns From Table Where column = value" returned no records, even though the data was definitely there.
The same error happened again three times today. The data is certainly there: the query succeeds most of the time, and the data has never been touched.
Around the same time today, we also found another error. We use Entity Framework (6.1.40302.0) to do a count on a table, and we get the error below:
"System.InvalidOperationException: Sequence contains no elements
at System.Linq.Enumerable.Single[TSource](IEnumerable`1 source)"
Apparently, after Entity Framework retrieves the result of a "Select Count(1) From Table Where column = value" query, it always expects exactly one record, so it uses .Single() internally. But when this query executed, SQL Server returned no record.
For a "Select Count(1) From Table Where x = y", the server should always return a result, even when there are no matches (it returns 0).
The difficulty in tracking down this bug is that I can't reproduce it. It has only happened a couple of times, which were logged.
We've also noticed that when the bug happens, the SQL Server is either busy or there's a long-running transaction.
We're using SQL Server 2016.
Anyone else experienced it ever?
Thanks.
------- Update
I've attached the query that I managed to dig out from EF:
This query should always return something (0 or some number), never an empty result, even if another process is locking resources or has an uncommitted transaction.
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[FileRecords] AS [Extent1]
WHERE ([Extent1].[FileStatusId] IN (1000, 2000, 5000)) AND ([Extent1].[UserId] = @p__linq__0)
) AS [GroupBy1]
-- p__linq__0: '1' (Type = Int32, IsNullable = false)
It is generated by an EF Linq query:
Context.FileRecords.Count(f => openStatus.Contains(f.FileStatusId) && f.UserId == userId)

Using collation in Linq to Sql

Imagine this sql query
select * from products order by name collate Persian_100_CI_AI asc
Now using Linq:
product = DB.Products.OrderBy(p => p.name); // what should I do here?
How can I apply collation?
This is now possible with EF Core 5.0 using the collate function.
In your example the code would be:
product = DB.Products.OrderBy(p => EF.Functions.Collate(p.name, "Persian_100_CI_AI"));
There is no direct way.
Workaround:
Create function in Sql Server
CREATE FUNCTION [dbo].[fnsConvert]
(
@p NVARCHAR(2000) ,
@c NVARCHAR(2000)
)
RETURNS NVARCHAR(2000)
AS
BEGIN
IF ( @c = 'Persian_100_CI_AI' )
SET @p = @p COLLATE Persian_100_CI_AI
IF ( @c = 'Persian_100_CS_AI' )
SET @p = @p COLLATE Persian_100_CS_AI
RETURN @p
END
Import it into the model and use it:
from o in DB.Products
orderby DB.fnsConvert(o.name, "Persian_100_CI_AI")
select o;
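For LINQ to SQL, "importing" the function means surfacing it as a method on the DataContext; the designer generates something like the sketch below (IsComposable = true is what allows it to be composed into queries):
[Function(Name = "dbo.fnsConvert", IsComposable = true)]
public string fnsConvert(string p, string c)
{
    // Composed into the generated SQL when used inside a query; needs
    // System.Data.Linq.Mapping and System.Reflection. Names mirror the UDF above.
    return (string)ExecuteMethodCall(this, (MethodInfo)MethodInfo.GetCurrentMethod(), p, c).ReturnValue;
}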
You can't change the collation through a LINQ statement. You're better off doing the sorting in memory by applying a StringComparer that is initialized with the correct culture (at least... I hope it's correct) and ignores case (true).
DB.Products.AsEnumerable()
.OrderBy (x => x.name, StringComparer.Create(new CultureInfo("fa-IR"), true))
edit
Since people (understandably) don't seem to read comments let me add that this is answered using the exact code of the question, in which there is no Where or Select. Of course I'm aware of the possibly huge data overhead when doing something like...
DB.Products.AsEnumerable().Where(...).Select(...).OrderBy(...)
...which first pulls the entire table contents into memory and then does the filtering and projection the database itself could have done. That overhead is avoided by moving AsEnumerable() after the Where and Select:
DB.Products.Where(...).Select(...).AsEnumerable().OrderBy(...)
The point is that if the database doesn't support ordering by some desired character set/collation the only option using EF's DbSet is to do the ordering in memory.
The alternative is to run a SQL query having an ORDER BY with explicit collation. If paging is used, this is the only option.
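With LINQ to SQL, one way to do that while still materializing entities is DataContext.ExecuteQuery<T>; a sketch (assumes SQL Server 2012+ for OFFSET/FETCH, with skip and take as paging variables):
var page = DB.ExecuteQuery<Product>(
    @"SELECT * FROM products
      ORDER BY name COLLATE Persian_100_CI_AI
      OFFSET {0} ROWS FETCH NEXT {1} ROWS ONLY",
    skip, take).ToList();
The {0}/{1} placeholders are sent as real SQL parameters, not concatenated into the string.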

sp_executesql runs in milliseconds in SSMS but takes 3 seconds from ado.net [duplicate]

This question already has an answer here:
Stored Proc slower from application than Management Studio
(1 answer)
Closed 9 years ago.
This is my dynamic query, used on a search form, which runs in milliseconds in SSMS (roughly between 300 and 400 ms):
exec sp_executesql N'set arithabort off;
set transaction isolation level read uncommitted;
With cte as
(Select ROW_NUMBER() OVER
(Order By Case When d.OldInstrumentID IS NULL
THEN d.LastStatusChangedDateTime Else d.RecordingDateTime End
desc) peta_rn,
d.DocumentID
From Documents d
Inner Join Users u on d.UserID = u.UserID
Inner Join IGroupes ig on ig.IGroupID = d.IGroupID
Inner Join ITypes it on it.ITypeID = d.ITypeID
Where 1=1
And (CreatedByAccountID = @0 Or DocumentStatusID = @1 Or DocumentStatusID = @2 )
And (d.JurisdictionID = @3 Or DocumentStatusID = @4 Or DocumentStatusID = @5)
AND ( d.DocumentStatusID = 9 )
)
Select d.DocumentID, d.IsReEfiled, d.IGroupID, d.ITypeID, d.RecordingDateTime,
d.CreatedByAccountID, d.JurisdictionID,
Case When d.OldInstrumentID IS NULL THEN d.LastStatusChangedDateTime
Else d.RecordingDateTime End as LastStatusChangedDateTime,
dbo.FnCanChangeDocumentStatus(d.DocumentStatusID,d.DocumentID) as CanChangeStatus,
d.IDate, d.InstrumentID, d.DocumentStatusID,ig.Abbreviation as IGroupAbbreviation,
u.Username, j.JDAbbreviation, inf.DocumentName,
it.Abbreviation as ITypeAbbreviation, d.DocumentDate,
ds.Abbreviation as DocumentStatusAbbreviation,
Upper(dbo.GetFlatDocumentName(d.DocumentID)) as FlatDocumentName
From Documents d
Left Join IGroupes ig On d.IGroupID = ig.IGroupID
Left Join ITypes it On d.ITypeID = it.ITypeID
Left Join Users u On u.UserID = d.UserID
Left Join DocumentStatuses ds On d.DocumentStatusID = ds.DocumentStatusID
Left Join InstrumentFiles inf On d.DocumentID = inf.DocumentID
Left Join Jurisdictions j on j.JurisdictionID = d.JurisdictionID
Inner Join cte on cte.DocumentID = d.DocumentID
Where 1=1
And peta_rn>=@6 AND peta_rn<=@7
Order by peta_rn',
N'@0 int,@1 int,@2 int,@3 int,@4 int,@5 int,@6 bigint,@7 bigint',
@0=44,@1=5,@2=9,@3=1,@4=5,@5=9,@6=94200,@7=94250
This SQL is built in C# code, and the WHERE clauses are added dynamically based on the values the user searched for in the search form. It takes roughly 3 seconds to move from one page to the next. I already have the necessary indexes on most of the columns I search on.
Any idea why would my Ado.Net code be slow?
Update: Not sure if execution plans would help but here they are:
It is possible that SQL Server has created an inappropriate query plan for the ADO.NET connections. We have seen similar issues with ADO; the usual solution is to clear any cached query plans and run the slow query again - this may create a better plan.
The most general way to clear cached query plans is to update statistics on the involved tables. For example:
update statistics documents with fullscan
Do the same for the other tables involved, then run your slow query from ADO.NET (do not run it in SSMS first).
Note that such timing inconsistencies may hint at bad query or database design - at least for us that is usually so :)
If you run a query repeatedly in SSMS, the database may re-use a previously created execution plan, and the required data may already be cached in memory.
There are a couple of things I notice in your query:
the CTE joins Users, IGroupes and ITypes, but the joined records are not used in the SELECT
the CTE performs an ORDER BY on a calculated expression (notice the 85% cost in (unindexed) Sort)
replacing the CASE expression with a persisted computed column, which can be indexed, would probably speed up execution
note that the ORDER BY is executed on data resulting from joining 4 tables
the WHERE condition of the CTE fixes AND d.DocumentStatusID = 9, which makes the OR'ed DocumentStatusID comparisons in the other predicates redundant
paging is performed on the result of 8 JOINed tables.
creating an intermediate CTE that filters the first CTE on peta_rn would most likely improve performance
.NET strings are Unicode (UTF-16) by default, which equates to NVARCHAR as opposed to VARCHAR.
When you are doing a WHERE ID = @foo in .NET, you are likely to be implicitly doing
WHERE CONVERT(NVARCHAR, ID) = @foo
The result is that this where clause can't be indexed, and must be table scanned. The solution is to actually pass each parameter into the SqlCommand as a DbParameter with the DbType set to VARCHAR (in the case of string).
A similar situation could of course occur with Int types if the .net parameter is "wider" than the SQL column equivalent.
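A minimal ADO.NET sketch of that fix (table, column, and parameter names are placeholders):
using (var cmd = new SqlCommand("SELECT * FROM MyTable WHERE ID = @foo", connection))
{
    // Explicit VARCHAR matches the column type, so the index can be used.
    cmd.Parameters.Add("@foo", SqlDbType.VarChar, 50).Value = "foobar";
    // By contrast, cmd.Parameters.AddWithValue("@foo", "foobar") would infer
    // NVARCHAR and trigger the implicit conversion described above.
    using (var reader = cmd.ExecuteReader())
    {
        // read results here
    }
}
(SqlDbType lives in System.Data; SqlCommand in System.Data.SqlClient.)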
PS: The easiest way to "prove" this issue is to run your query in SSMS with the following above it:
DECLARE @p0 INT = 123
DECLARE @p1 NVARCHAR(100) = N'foobar' -- etc.
and compare with
DECLARE @p0 INT = 123
DECLARE @p1 VARCHAR(100) = 'foobar' -- etc.

Alternative to using IList.Contains(item.Id) in Entity Framework for large lists?

Is there an alternative to using .Contains() to select objects in Entity Framework that exist in a specified list? Contains() works great if your list is small, but once you get to a few thousand items the performance is terrible.
return (from item in context.Accounts
where accountIdList.Contains(item.AccountId)
select item).ToList();
I'm using EF 4.0, .Net Framework 4.0, and SQL Server 2005. I'm not opposed to a SQL solution either since the query that EF generates only takes a second to run on SQL for about 10k items.
I found an alternative that runs in about a second using a SQL Stored Procedure and a comma-delimited string for the parameter. Much better than the 5+ minutes EF was taking using .Contains()
It is run from my code using the following:
string commaDelimitedList = string.Join(",", accountIdList);
return context.GetAccountsByList(commaDelimitedList).ToList();
The StoredProcedure (simplified) looks like this:
SELECT *
FROM Accounts as T1 WITH (NOLOCK)
INNER JOIN (
SELECT Num FROM dbo.StringToNumSet(@commaDelimitedAccountIds, ',')
) as [T2] ON [T1].[AccountId] = [T2].[num]
And the User-Defined function dbo.StringToNumSet() looks like this:
CREATE FUNCTION [dbo].[StringToNumSet] (
@TargetString varchar(MAX),
@SearchChar varchar(1)
)
RETURNS @Set TABLE (
num int not null
)
AS
BEGIN
DECLARE @SearchCharPos int, @LastSearchCharPos int
SET @SearchCharPos = 0
WHILE 1=1
BEGIN
SET @LastSearchCharPos = @SearchCharPos
SET @SearchCharPos = CHARINDEX( @SearchChar, @TargetString, @SearchCharPos + 1 )
IF @SearchCharPos = 0
BEGIN
INSERT @Set( num ) VALUES ( SUBSTRING( @TargetString, @LastSearchCharPos + 1, DATALENGTH( @TargetString ) ) )
BREAK
END
ELSE
INSERT @Set( num ) VALUES ( SUBSTRING( @TargetString, @LastSearchCharPos + 1, @SearchCharPos - @LastSearchCharPos - 1 ) )
END
RETURN
END
Would it be viable to just read your information into memory and then do the searches?
I've found that in most cases where you need to work with large amounts of data, if you can get away with reading all the data into memory and then doing the lookups, it's much, much faster.
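A sketch of that approach with the names from the question (a HashSet makes each lookup O(1) instead of scanning the list):
var idSet = new HashSet<int>(accountIdList);
var accounts = context.Accounts
    .AsEnumerable()                          // pulls the table into memory
    .Where(a => idSet.Contains(a.AccountId)) // in-memory lookup, no SQL IN list
    .ToList();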
Contains already gets translated to a massive WHERE IN SQL statement, so that's not really the problem. However, you shouldn't eagerly evaluate the query, as this will execute the query every time you call that method. Take advantage of the nature of LINQ to Entities and let the query get evaluated when you actually iterate over it.
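In other words, return the IQueryable and let enumeration trigger execution; a sketch:
// Deferred: no SQL is sent until the caller actually enumerates the result.
public IQueryable<Account> GetAccounts(List<int> accountIdList)
{
    return from item in context.Accounts
           where accountIdList.Contains(item.AccountId)
           select item; // note: no ToList() here
}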

Generated LinqtoSql Sql 5x slower than SAME EXACT hand-written sql

I have a SQL statement which is hardcoded in an existing VB6 app. I'm building a new version in C# using LINQ to SQL. I was able to get LINQ to SQL to generate the same SQL (before I start refactoring), but for some reason the SQL generated by LINQ to SQL is 5x slower than the original. These timings are from running the generated SQL directly in LINQPad.
The only real difference my meager SQL eyes can spot is the
WITH (NOLOCK), which makes no difference if I add it into the LINQ to SQL generated SQL.
Can someone point out what I'm doing wrong here? Thanks!
Existing Hard Coded Sql (5.0 Seconds)
SELECT DISTINCT
CH.ClaimNum, CH.AcnProvID, CH.AcnPatID, CH.TinNum, CH.Diag1, CH.GroupNum, CH.AllowedTotal
FROM Claims.dbo.T_ClaimsHeader AS CH WITH (NOLOCK)
WHERE
CH.ContractID IN ('123A','123B','123C','123D','123E','123F','123G','123H')
AND ( ( (CH.Transmited Is Null or CH.Transmited = '')
AND CH.DateTransmit Is Null
AND CH.EobDate Is Null
AND CH.ProcessFlag IN ('Y','E')
AND CH.DataSource NOT IN ('A','EC','EU')
AND CH.AllowedTotal > 0 ) )
ORDER BY CH.AcnPatID, CH.ClaimNum
Generated Sql from LinqToSql (27.6 Seconds)
-- Region Parameters
DECLARE @p0 NVarChar(4) SET @p0 = '123A'
DECLARE @p1 NVarChar(4) SET @p1 = '123B'
DECLARE @p2 NVarChar(4) SET @p2 = '123C'
DECLARE @p3 NVarChar(4) SET @p3 = '123D'
DECLARE @p4 NVarChar(4) SET @p4 = '123E'
DECLARE @p5 NVarChar(4) SET @p5 = '123F'
DECLARE @p6 NVarChar(4) SET @p6 = '123G'
DECLARE @p7 NVarChar(4) SET @p7 = '123H'
DECLARE @p8 VarChar(1) SET @p8 = ''
DECLARE @p9 NVarChar(1) SET @p9 = 'Y'
DECLARE @p10 NVarChar(1) SET @p10 = 'E'
DECLARE @p11 NVarChar(1) SET @p11 = 'A'
DECLARE @p12 NVarChar(2) SET @p12 = 'EC'
DECLARE @p13 NVarChar(2) SET @p13 = 'EU'
DECLARE @p14 Decimal(5,4) SET @p14 = 0
-- EndRegion
SELECT DISTINCT
[t0].[ClaimNum],
[t0].[acnprovid] AS [AcnProvID],
[t0].[acnpatid] AS [AcnPatID],
[t0].[tinnum] AS [TinNum],
[t0].[diag1] AS [Diag1],
[t0].[GroupNum],
[t0].[allowedtotal] AS [AllowedTotal]
FROM [Claims].[dbo].[T_ClaimsHeader] AS [t0]
WHERE
([t0].[contractid] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7))
AND (([t0].[Transmited] IS NULL) OR ([t0].[Transmited] = @p8))
AND ([t0].[DATETRANSMIT] IS NULL)
AND ([t0].[EOBDATE] IS NULL)
AND ([t0].[PROCESSFLAG] IN (@p9, @p10))
AND (NOT ([t0].[DataSource] IN (@p11, @p12, @p13)))
AND ([t0].[allowedtotal] > @p14)
ORDER BY [t0].[acnpatid], [t0].[ClaimNum]
New LinqToSql Code (30+ seconds... Times out )
var contractIds = T_ContractDatas.Where(x => x.EdiSubmissionGroupID == "123-01").Select(x => x.CONTRACTID).ToList();
var processFlags = new List<string> {"Y","E"};
var dataSource = new List<string> {"A","EC","EU"};
var results = (from claims in T_ClaimsHeaders
where contractIds.Contains(claims.contractid)
&& (claims.Transmited == null || claims.Transmited == string.Empty )
&& claims.DATETRANSMIT == null
&& claims.EOBDATE == null
&& processFlags.Contains(claims.PROCESSFLAG)
&& !dataSource.Contains(claims.DataSource)
&& claims.allowedtotal > 0
select new
{
ClaimNum = claims.ClaimNum,
AcnProvID = claims.acnprovid,
AcnPatID = claims.acnpatid,
TinNum = claims.tinnum,
Diag1 = claims.diag1,
GroupNum = claims.GroupNum,
AllowedTotal = claims.allowedtotal
}).OrderBy(x => x.AcnPatID).ThenBy(x => x.ClaimNum).Distinct();
I'm using the list of constants above to make LINQ to SQL generate IN ('xxx','xxx', ...); otherwise it uses subqueries, which are just as slow...
Compare the execution plans for the two queries. The LINQ to SQL query uses loads of parameters; the query optimiser will build an execution plan based on what MIGHT be in the parameters, whereas the hard-coded SQL has literal values, so the optimiser builds a plan based on the actual values. It is probably producing a much more efficient plan for the literal values. Your best bet is to try to spot the slow bits in the execution plan and get LINQ to SQL to produce a better query. If you can't, but you think you can build one by hand, then create a stored procedure, which you can expose as a method on your DataContext class in LINQ to SQL.
The hard-coded values in the first SQL may be allowing the query optimizer to use indexes that it doesn't know it can efficiently use for the second, parameterised, SQL.
Another possibility is that if you're running the hand-crafted SQL in SQL Server Management Studio, the different default SET-tings of SSMS compared to the .NET SQL Server provider may be affecting performance. If this is the case, changing some of the SET-tings on the .NET connection prior to executing the command might help (e.g. SET ARITHABORT ON) but I don't know if you can do this in LinqPad. See here for more info on this possibility.
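With LINQ to SQL you can test this by opening the context's connection yourself, so the SET applies to the queries that follow; a sketch (assumes the DataContext instance is named context):
context.Connection.Open();                   // keep one connection for the session
context.ExecuteCommand("SET ARITHABORT ON"); // match SSMS's default setting
var results = query.ToList();                // runs on the same connection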
The big difference are the parameters.
I can't know for sure without analyzing the plans, but L2S parameterizes queries so that their plans can be effectively reused, avoiding excessive query recompilation on the server. This is, in general, a Good Thing because it keeps the CPU time low on the SQL Server -- it doesn't have to keep generating and generating and generating the same plan.
But L2S goes a bit overboard when you use constants. It parameterizes them, too, which can be detrimental for performance in certain situations.
Putting on my Aluminum-Foil Clairvoyancy Hat, I'm visualizing the kinds of index structures you might have on that table. For example, you may have an index just on ProcessFlag, and there may be very few values for "Y" and "E" for ProcessFlag, causing the query with the hard-coded constants to do a scan only of the values where ProcessFlag = "Y" and "E". For the parameterized query, SQL Server generates a plan which is judged to be optimal for arbitrary input. That means that the server can't take advantage of this little hint (the constants) that you give it.
My advice to you at this point is to take a good look at your indexes and favor composite indexes which cover more of your WHERE conditions together. I will bet that with a bit of that type of analysis, you will find that the query performance becomes far more similar. (and probably improves, in both cases!)
You might also check out compiled LINQ queries - http://www.jdconley.com/blog/archive/2007/11/28/linq-to-sql-surprise-performance-hit.aspx
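A compiled query caches the LINQ-to-SQL translation so it isn't rebuilt on every call; a sketch based on the first query above (MyDataContext stands in for your generated context type):
static readonly Func<MyDataContext, string, IQueryable<string>> ContractIdsByGroup =
    CompiledQuery.Compile((MyDataContext db, string groupId) =>
        db.T_ContractDatas
          .Where(x => x.EdiSubmissionGroupID == groupId)
          .Select(x => x.CONTRACTID));
// usage: var contractIds = ContractIdsByGroup(db, "123-01").ToList();
(CompiledQuery is in System.Data.Linq.)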
