Using collation in LINQ to SQL - C#

Imagine this SQL query:
select * from products order by name collate Persian_100_CI_AI asc
Now using LINQ:
product = DB.Products.OrderBy(p => p.name); // what should I do here?
How can I apply the collation?

This is now possible with EF Core 5.0, using the EF.Functions.Collate method.
In your example the code would be:
product = DB.Products.OrderBy(p => EF.Functions.Collate(p.name, "Persian_100_CI_AI"));

There is no direct way.
Workaround: create a function in SQL Server:
CREATE FUNCTION [dbo].[fnsConvert]
(
    @p NVARCHAR(2000) ,
    @c NVARCHAR(2000)
)
RETURNS NVARCHAR(2000)
AS
BEGIN
    IF ( @c = 'Persian_100_CI_AI' )
        SET @p = @p COLLATE Persian_100_CI_AI
    IF ( @c = 'Persian_100_CS_AI' )
        SET @p = @p COLLATE Persian_100_CS_AI
    RETURN @p
END
Import it into the model and use:
from o in DB.Products
orderby DB.fnsConvert(o.Description, "Persian_100_CI_AI")
select o;
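For reference, a minimal sketch of how such a function can be surfaced on the LINQ to SQL DataContext (this mirrors what the designer generates; the context class name is an assumption):

using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Reflection;

public partial class MyDataContext : DataContext
{
    public MyDataContext(string connection) : base(connection) { }

    // Composable scalar function: calls inside LINQ queries are translated
    // to dbo.fnsConvert in the generated SQL.
    [Function(Name = "dbo.fnsConvert", IsComposable = true)]
    public string fnsConvert(
        [Parameter(DbType = "NVarChar(2000)")] string p,
        [Parameter(DbType = "NVarChar(2000)")] string c)
    {
        return (string)ExecuteMethodCall(
            this, (MethodInfo)MethodInfo.GetCurrentMethod(), p, c).ReturnValue;
    }
}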

You can't change the collation through a LINQ statement. Your best option is to do the sorting in memory, applying a StringComparer that is initialized with the correct culture (at least... I hope it's correct) and that ignores case (the true argument):
DB.Products.AsEnumerable()
    .OrderBy(x => x.name, StringComparer.Create(new CultureInfo("fa-IR"), true))
Edit
Since people (understandably) don't seem to read comments, let me add that this answer uses the exact code of the question, in which there is no Where or Select. Of course I'm aware of the possibly huge data overhead when doing something like...
DB.Products.AsEnumerable().Where(...).Select(...).OrderBy(...)
...which first pulls the entire table contents into memory and then does the filtering and projection the database itself could have done. Moving AsEnumerable() after them avoids that:
DB.Products.Where(...).Select(...).AsEnumerable().OrderBy(...)
The point is that if the database doesn't support ordering by some desired character set/collation, the only option using EF's DbSet is to do the ordering in memory.
The alternative is to run a SQL query having an ORDER BY with an explicit collation. If paging is used, this is the only option.
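For example, with LINQ to SQL you could fall back to DataContext.ExecuteQuery. A minimal sketch, assuming a Product entity mapped to the products table and SQL Server 2012+ for OFFSET/FETCH:

// Sketch: keep the ordering (and paging) in the database by issuing raw SQL
// with an explicit collation. {0}/{1} are turned into SQL parameters.
var page = DB.ExecuteQuery<Product>(
    @"SELECT * FROM products
      ORDER BY name COLLATE Persian_100_CI_AI
      OFFSET {0} ROWS FETCH NEXT {1} ROWS ONLY",
    pageIndex * pageSize, pageSize
).ToList();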

Alter EF Generated query parameters

I'm using EF 6.4.4 to query a SQL view. Now the view is not really performing optimally, but I don't control it.
I'm executing the following code with a WHERE clause on a string/nvarchar property:
_context.ViewObject
    .Where(x => x.Name == nameFilter)
    .ToList();
Similarly, I have the same SQL statement executed in SSMS:
SELECT [Id]
, [Name]
, ...
FROM [View]
WHERE [Name] = '<nameFilter>'
My problem is that the EF variant is way slower than the direct SQL query.
When checking the SQL query generated by EF, I see the following:
SELECT [Id]
, [Name]
, ...
FROM [View]
WHERE [Name] = @p__linq__0
with parameter @p__linq__0 of type NVARCHAR(4000) NULL.
This even though my input variable is not NULL and has a maximum length of 6 characters.
When I execute the same SQL query with this parameter, it is slow in SSMS as well. Apparently, this has something to do with how the parameter is declared.
So what I want to do is alter the SQL query parameter that EF uses to generate this query, to make sure that my parameter is represented more accurately and that I get the same performance as directly in SSMS.
Is there a way to do this?
What's going on: parameter sniffing.
Execute the following in SSMS and you will probably see the same performance:
EXECUTE sp_executesql N'SELECT [Id]
, [Name]
, ...
FROM [View]
WHERE [Name] = @nameFilter'
,N'@nameFilter nvarchar(4000)'
,@nameFilter = '<namefilter>';
sp_executesql is what EF uses to execute queries against the database, so when you write .Where(x => x.Name == nameFilter) it is translated to the statement above, making you suffer from parameter sniffing.
You could fix this by adding a recompile hint to your queries as described here. But be aware that adding RECOMPILE to all queries might have a negative impact on other queries.
You can execute the following queries with the actual execution plan enabled to see the difference:
Query with WHERE Name = @NameFilter
Query with WHERE Name = '<NameFilter>'
Query with WHERE Name = @NameFilter OPTION(RECOMPILE)
If it's not parameter sniffing, it might be implicit conversions, but I'm guessing both types are NVARCHAR so this shouldn't matter.
99% of the time it's parameter sniffing.
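If OPTION (RECOMPILE) is not an option, one workaround is to bypass EF's parameter typing with a raw query and an explicitly sized parameter. A sketch, assuming a ViewObjectDto projection class for the result:

using System.Data;
using System.Data.SqlClient;

// Sketch: declare the parameter yourself with an explicit type and length so
// the plan is compiled for NVARCHAR(6) rather than EF's default NVARCHAR(4000).
var p = new SqlParameter("@nameFilter", SqlDbType.NVarChar, 6) { Value = nameFilter };
var rows = _context.Database
    .SqlQuery<ViewObjectDto>(
        "SELECT [Id], [Name] FROM [View] WHERE [Name] = @nameFilter", p)
    .ToList();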

sp_executesql runs in milliseconds in SSMS but takes 3 seconds from ado.net [duplicate]

This question already has an answer here:
Stored Proc slower from application than Management Studio
(1 answer)
Closed 9 years ago.
This is my dynamic query, used on a search form, which runs in SSMS in roughly 300 to 400 ms:
exec sp_executesql N'set arithabort off;
set transaction isolation level read uncommitted;
With cte as
(Select ROW_NUMBER() OVER
(Order By Case When d.OldInstrumentID IS NULL
THEN d.LastStatusChangedDateTime Else d.RecordingDateTime End
desc) peta_rn,
d.DocumentID
From Documents d
Inner Join Users u on d.UserID = u.UserID
Inner Join IGroupes ig on ig.IGroupID = d.IGroupID
Inner Join ITypes it on it.ITypeID = d.ITypeID
Where 1=1
And (CreatedByAccountID = @0 Or DocumentStatusID = @1 Or DocumentStatusID = @2 )
And (d.JurisdictionID = @3 Or DocumentStatusID = @4 Or DocumentStatusID = @5)
AND ( d.DocumentStatusID = 9 )
)
Select d.DocumentID, d.IsReEfiled, d.IGroupID, d.ITypeID, d.RecordingDateTime,
d.CreatedByAccountID, d.JurisdictionID,
Case When d.OldInstrumentID IS NULL THEN d.LastStatusChangedDateTime
Else d.RecordingDateTime End as LastStatusChangedDateTime,
dbo.FnCanChangeDocumentStatus(d.DocumentStatusID,d.DocumentID) as CanChangeStatus,
d.IDate, d.InstrumentID, d.DocumentStatusID,ig.Abbreviation as IGroupAbbreviation,
u.Username, j.JDAbbreviation, inf.DocumentName,
it.Abbreviation as ITypeAbbreviation, d.DocumentDate,
ds.Abbreviation as DocumentStatusAbbreviation,
Upper(dbo.GetFlatDocumentName(d.DocumentID)) as FlatDocumentName
From Documents d
Left Join IGroupes ig On d.IGroupID = ig.IGroupID
Left Join ITypes it On d.ITypeID = it.ITypeID
Left Join Users u On u.UserID = d.UserID
Left Join DocumentStatuses ds On d.DocumentStatusID = ds.DocumentStatusID
Left Join InstrumentFiles inf On d.DocumentID = inf.DocumentID
Left Join Jurisdictions j on j.JurisdictionID = d.JurisdictionID
Inner Join cte on cte.DocumentID = d.DocumentID
Where 1=1
And peta_rn>=@6 AND peta_rn<=@7
Order by peta_rn',
N'@0 int,@1 int,@2 int,@3 int,@4 int,@5 int,@6 bigint,@7 bigint',
@0=44,@1=5,@2=9,@3=1,@4=5,@5=9,@6=94200,@7=94250
This SQL is formed in C# code, and the WHERE clauses are added dynamically based on the values the user searched for in the search form. It takes roughly 3 seconds to move from one page to the next. I already have the necessary indexes on most of the columns I search on.
Any idea why my ADO.NET code would be slow?
Update: Not sure if execution plans would help, but here they are: [execution plan screenshots not included]
It is possible that SQL Server has created an inappropriate query plan for ADO.NET connections. We have seen similar issues with ADO; the usual solution is to clear any cached query plans and run the slow query again, which may create a better plan.
The most general way to clear query plans is to update statistics for the involved tables. Like this, in your case:
UPDATE STATISTICS Documents WITH FULLSCAN
Do the same for the other tables involved and then run your slow query from ADO.NET (do not run it in SSMS first).
Note that such timing inconsistencies may hint at bad query or database design - at least for us that is usually so :)
If you run a query repeatedly in SSMS, the database may re-use a previously created execution plan, and the required data may already be cached in memory.
There are a couple of things I notice in your query:
- the CTE joins Users, IGroupes and ITypes, but the joined records are not used in the SELECT
- the CTE performs an ORDER BY on a calculated expression (notice the 85% cost in the (unindexed) Sort); replacing the CASE expression with a persisted computed column, which can be indexed, would probably speed up execution
- the ORDER BY is executed on data resulting from joining 4 tables
- the WHERE condition of the CTE states AND d.DocumentStatusID = 9, yet also ANDs in conditions on other DocumentStatusIDs
- paging is performed on the result of 8 JOINed tables; most likely creating an intermediate CTE which filters the first CTE on peta_rn would improve performance
.NET by default uses Unicode strings, which equate to NVARCHAR as opposed to VARCHAR.
When you are doing a WHERE ID = @foo in .NET, you are likely to be implicitly doing
WHERE CONVERT(NVARCHAR, ID) = @foo
The result is that this WHERE clause can't use an index, and the table must be scanned. The solution is to actually pass each parameter into the SqlCommand as a DbParameter with the DbType set to VARCHAR (in the case of a string).
A similar situation can of course occur with int types if the .NET parameter is "wider" than the SQL column equivalent.
PS: The easiest way to "prove" this issue is to run your query in SSMS with the following declared above it:
DECLARE @p0 INT = 123
DECLARE @p1 NVARCHAR(4000) = 'foobar' -- etc.
and compare with
DECLARE @p0 INT = 123
DECLARE @p1 VARCHAR(4000) = 'foobar' -- etc.
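A minimal sketch of the ADO.NET side of that fix (the table and column names here are assumptions):

using System.Data;
using System.Data.SqlClient;

// Sketch: pass the parameter as VARCHAR with an explicit length so SQL Server
// compares it against the VARCHAR column without an implicit conversion.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "SELECT DocumentID FROM Documents WHERE ExternalRef = @foo", conn))
{
    cmd.Parameters.Add(new SqlParameter("@foo", SqlDbType.VarChar, 50) { Value = "foobar" });
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // process rows...
        }
    }
}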

OrderBy on a Nullable<int> with a default value in Entity Framework

We are migrating some code to use Entity Framework and have a query that sorts on a Nullable<int> field and supplies a default sort value when the value is null, using the Nullable<T>.GetValueOrDefault(T) method.
However, upon execution it returns the following error:
LINQ to Entities does not recognize the method 'Int32 GetValueOrDefault(Int32)' method, and this method cannot be translated into a store expression.
The query looks like:
int magicDefaultSortValue = 250;
var query = context.MyTable.OrderBy(t => t.MyNullableSortColumn
.GetValueOrDefault(magicDefaultSortValue));
From this answer I can see that there is a way to provide "translations" within your EDMX. Could we write a similar translation for this coalescing function?
NOTE: When I tried the ?? coalescing operator instead of GetValueOrDefault in the query, it did work. So perhaps whatever makes that work could be leveraged?
I believe you found your answer. When you use ??, EF generates SQL using a CASE to select your sort value if the value is null, and then sorts on that.
MyTable.OrderBy (t => t.MyNullableSortColumn ?? magicDefaultSortValue).ToArray();
will generate the following sql:
-- Region Parameters
DECLARE @p__linq__0 Int = 250
-- EndRegion
SELECT
[Project1].[MyColumn1] AS [MyColumn1],
[Project1].[MyNullableSortColumn] AS [MyNullableSortColumn]
FROM ( SELECT
CASE WHEN ([Extent1].[MyNullableSortColumn] IS NULL) THEN @p__linq__0 ELSE [Extent1].[MyNullableSortColumn] END AS [C1],
[Extent1].[MyColumn1] AS [MyColumn1],
[Extent1].[MyNullableSortColumn] AS [MyNullableSortColumn]
FROM [dbo].[MyTable] AS [Extent1]
) AS [Project1]
ORDER BY [Project1].[C1] ASC
As an aside, I would recommend getting LINQPad, which will let you work with your EF models and view the SQL being generated. Also, it is helpful to know about the EntityFunctions and SqlFunctions classes, as they provide access to several useful functions.

Alternative to using IList.Contains(item.Id) in Entity Framework for large lists?

Is there an alternative to using .Contains() to select objects in Entity Framework that exist in a specified list? Contains() works great if your list is small, but once you get to a few thousand items the performance is terrible.
return (from item in context.Accounts
where accountIdList.Contains(item.AccountId)
select item).ToList();
I'm using EF 4.0, .Net Framework 4.0, and SQL Server 2005. I'm not opposed to a SQL solution either since the query that EF generates only takes a second to run on SQL for about 10k items.
I found an alternative that runs in about a second using a SQL Stored Procedure and a comma-delimited string for the parameter. Much better than the 5+ minutes EF was taking using .Contains()
It is run from my code using the following:
string commaDelimitedList = string.Join(",", accountIdList);
return context.GetAccountsByList(commaDelimitedList).ToList();
The stored procedure (simplified) looks like this:
SELECT *
FROM Accounts as T1 WITH (NOLOCK)
INNER JOIN (
SELECT Num FROM dbo.StringToNumSet(@commaDelimitedAccountIds, ',')
) as [T2] ON [T1].[AccountId] = [T2].[num]
And the User-Defined function dbo.StringToNumSet() looks like this:
CREATE FUNCTION [dbo].[StringToNumSet] (
    @TargetString varchar(MAX),
    @SearchChar varchar(1)
)
RETURNS @Set TABLE (
    num int not null
)
AS
BEGIN
    DECLARE @SearchCharPos int, @LastSearchCharPos int
    SET @SearchCharPos = 0
    WHILE 1=1
    BEGIN
        SET @LastSearchCharPos = @SearchCharPos
        SET @SearchCharPos = CHARINDEX( @SearchChar, @TargetString, @SearchCharPos + 1 )
        IF @SearchCharPos = 0
        BEGIN
            INSERT @Set( num ) VALUES ( SUBSTRING( @TargetString, @LastSearchCharPos + 1, DATALENGTH( @TargetString ) ) )
            BREAK
        END
        ELSE
            INSERT @Set( num ) VALUES ( SUBSTRING( @TargetString, @LastSearchCharPos + 1, @SearchCharPos - @LastSearchCharPos - 1 ) )
    END
    RETURN
END
Would it be viable to just read your information into memory and then do the searches?
I've found that in most cases where you need to work with big amounts of data, if you can get away with reading all the data into memory and then doing the lookups, it's much, much faster.
Contains already gets translated to a massive WHERE ... IN SQL statement, so that's not really a problem. However, you shouldn't eagerly evaluate the query, as this will execute the query every time you call that method. Take advantage of the nature of LINQ to Entities and let the query be evaluated when you actually iterate over it.
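If a single IN list is still too slow, another approach (not from the answers above, just a common workaround) is to split the id list into chunks so each generated IN (...) stays small:

using System.Collections.Generic;
using System.Linq;

// Sketch: query in batches; the chunk size of 1000 is an assumption to tune.
const int chunkSize = 1000;
var results = new List<Account>();
for (int i = 0; i < accountIdList.Count; i += chunkSize)
{
    var chunk = accountIdList.Skip(i).Take(chunkSize).ToList();
    results.AddRange(context.Accounts.Where(a => chunk.Contains(a.AccountId)));
}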

Generated LinqtoSql Sql 5x slower than SAME EXACT hand-written sql

I have a SQL statement which is hardcoded in an existing VB6 app. I'm building a new version in C# using LINQ to SQL. I was able to get LINQ to SQL to generate the same SQL (before I start refactoring), but for some reason the SQL generated by LINQ to SQL is 5x slower than the original SQL. This is when running the generated SQL directly in LINQPad.
The only real difference my meager SQL eyes can spot is the WITH (NOLOCK), which, if I add it into the LINQ-to-SQL-generated SQL, makes no difference.
Can someone point out what I'm doing wrong here? Thanks!
Existing Hard Coded Sql (5.0 Seconds)
SELECT DISTINCT
CH.ClaimNum, CH.AcnProvID, CH.AcnPatID, CH.TinNum, CH.Diag1, CH.GroupNum, CH.AllowedTotal
FROM Claims.dbo.T_ClaimsHeader AS CH WITH (NOLOCK)
WHERE
CH.ContractID IN ('123A','123B','123C','123D','123E','123F','123G','123H')
AND ( ( (CH.Transmited Is Null or CH.Transmited = '')
AND CH.DateTransmit Is Null
AND CH.EobDate Is Null
AND CH.ProcessFlag IN ('Y','E')
AND CH.DataSource NOT IN ('A','EC','EU')
AND CH.AllowedTotal > 0 ) )
ORDER BY CH.AcnPatID, CH.ClaimNum
Generated Sql from LinqToSql (27.6 Seconds)
-- Region Parameters
DECLARE @p0 NVarChar(4) SET @p0 = '123A'
DECLARE @p1 NVarChar(4) SET @p1 = '123B'
DECLARE @p2 NVarChar(4) SET @p2 = '123C'
DECLARE @p3 NVarChar(4) SET @p3 = '123D'
DECLARE @p4 NVarChar(4) SET @p4 = '123E'
DECLARE @p5 NVarChar(4) SET @p5 = '123F'
DECLARE @p6 NVarChar(4) SET @p6 = '123G'
DECLARE @p7 NVarChar(4) SET @p7 = '123H'
DECLARE @p8 VarChar(1) SET @p8 = ''
DECLARE @p9 NVarChar(1) SET @p9 = 'Y'
DECLARE @p10 NVarChar(1) SET @p10 = 'E'
DECLARE @p11 NVarChar(1) SET @p11 = 'A'
DECLARE @p12 NVarChar(2) SET @p12 = 'EC'
DECLARE @p13 NVarChar(2) SET @p13 = 'EU'
DECLARE @p14 Decimal(5,4) SET @p14 = 0
-- EndRegion
SELECT DISTINCT
[t0].[ClaimNum],
[t0].[acnprovid] AS [AcnProvID],
[t0].[acnpatid] AS [AcnPatID],
[t0].[tinnum] AS [TinNum],
[t0].[diag1] AS [Diag1],
[t0].[GroupNum],
[t0].[allowedtotal] AS [AllowedTotal]
FROM [Claims].[dbo].[T_ClaimsHeader] AS [t0]
WHERE
([t0].[contractid] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7))
AND (([t0].[Transmited] IS NULL) OR ([t0].[Transmited] = @p8))
AND ([t0].[DATETRANSMIT] IS NULL)
AND ([t0].[EOBDATE] IS NULL)
AND ([t0].[PROCESSFLAG] IN (@p9, @p10))
AND (NOT ([t0].[DataSource] IN (@p11, @p12, @p13)))
AND ([t0].[allowedtotal] > @p14)
ORDER BY [t0].[acnpatid], [t0].[ClaimNum]
New LinqToSql Code (30+ seconds... times out)
var contractIds = T_ContractDatas.Where(x => x.EdiSubmissionGroupID == "123-01").Select(x => x.CONTRACTID).ToList();
var processFlags = new List<string> {"Y","E"};
var dataSource = new List<string> {"A","EC","EU"};
var results = (from claims in T_ClaimsHeaders
where contractIds.Contains(claims.contractid)
&& (claims.Transmited == null || claims.Transmited == string.Empty )
&& claims.DATETRANSMIT == null
&& claims.EOBDATE == null
&& processFlags.Contains(claims.PROCESSFLAG)
&& !dataSource.Contains(claims.DataSource)
&& claims.allowedtotal > 0
select new
{
ClaimNum = claims.ClaimNum,
AcnProvID = claims.acnprovid,
AcnPatID = claims.acnpatid,
TinNum = claims.tinnum,
Diag1 = claims.diag1,
GroupNum = claims.GroupNum,
AllowedTotal = claims.allowedtotal
}).Distinct().OrderBy(x => x.AcnPatID).ThenBy(x => x.ClaimNum);
I'm using the lists of constants above to make LINQ to SQL generate IN ('xxx','xxx', etc.); otherwise it uses subqueries, which are just as slow...
Compare the execution plans for the two queries. The LINQ to SQL query is using loads of parameters; the query optimiser will build an execution plan based on what MIGHT be in the parameters, whereas the hard-coded SQL has literal values, so the query optimiser will build an execution plan based on the actual values. It is probably producing a much more efficient plan for the literal values. Your best bet is to try and spot the slow bits in the execution plan and get LINQ to SQL to produce a better query. If you can't, but you think you can build one by hand, then create a stored procedure, which you can then expose as a method on your data context class in LINQ to SQL.
The hard-coded values in the first SQL may be allowing the query optimizer to use indexes that it doesn't know it can efficiently use for the second, parameterised, SQL.
Another possibility is that if you're running the hand-crafted SQL in SQL Server Management Studio, the different default SET options of SSMS compared to the .NET SQL Server provider may be affecting performance. If this is the case, changing some of the SET options on the .NET connection prior to executing the command might help (e.g. SET ARITHABORT ON), but I don't know if you can do this in LINQPad. See here for more info on this possibility.
The big difference are the parameters.
I can't know for sure without analyzing the plans, but L2S parameterizes queries so that their plans can be effectively reused, avoiding excessive query recompilation on the server. This is, in general, a Good Thing because it keeps the CPU time low on the SQL Server -- it doesn't have to keep generating and generating and generating the same plan.
But L2S goes a bit overboard when you use constants. It parameterizes them, too, which can be detrimental for performance in certain situations.
Putting on my Aluminum-Foil Clairvoyancy Hat, I'm visualizing the kinds of index structures you might have on that table. For example, you may have an index just on ProcessFlag, and there may be very few values for "Y" and "E" for ProcessFlag, causing the query with the hard-coded constants to do a scan only of the values where ProcessFlag = "Y" and "E". For the parameterized query, SQL Server generates a plan which is judged to be optimal for arbitrary input. That means that the server can't take advantage of this little hint (the constants) that you give it.
My advice to you at this point is to take a good look at your indexes and favor composite indexes which cover more of your WHERE conditions together. I will bet that with a bit of that type of analysis, you will find that the query performance becomes far more similar. (and probably improves, in both cases!)
You might also check out compiled LINQ queries - http://www.jdconley.com/blog/archive/2007/11/28/linq-to-sql-surprise-performance-hit.aspx
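For reference, a minimal sketch of a compiled LINQ to SQL query (the data context and entity type names are assumptions based on the question):

using System;
using System.Data.Linq;
using System.Linq;

// Sketch: CompiledQuery caches the LINQ-to-SQL translation so it is not
// rebuilt on every execution. Note this saves translation time only; the
// SQL that runs is still parameterized as before.
static readonly Func<ClaimsDataContext, string, IQueryable<T_ClaimsHeader>> ByProcessFlag =
    CompiledQuery.Compile((ClaimsDataContext db, string flag) =>
        db.T_ClaimsHeaders.Where(c => c.PROCESSFLAG == flag));

// Usage: var rows = ByProcessFlag(db, "Y").ToList();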
