I'm having a performance problem with a simple LINQ query:
var query = ctx.Set<AdministrativeProfile>().OrderBy(x => x.User.Lastname).Skip(9000).Take(1);
The generated SQL is as follows:
SELECT TOP (1)
[Join1].[ProfileID] AS [ProfileID],
[Join1].[Address] AS [Address],
[Join1].[BankAccountNumber] AS [BankAccountNumber],
[Join1].[BankAccountType] AS [BankAccountType],
[Join1].[BankBIC] AS [BankBIC],
[Join1].[BankIBAN] AS [BankIBAN],
[Join1].[BankName] AS [BankName],
[Join1].[City] AS [City],
[Join1].[CountryISO] AS [CountryISO],
[Join1].[IdentifiedUserID1] AS [IdentifiedUserID],
[Join1].[Phone] AS [Phone],
[Join1].[SocialSecurityID] AS [SocialSecurityID],
[Join1].[WithHoldingTaxRate] AS [WithHoldingTaxRate],
[Join1].[Zip] AS [Zip]
FROM ( SELECT [Extent1].[ProfileID] AS [ProfileID], [Extent1].[Address] AS [Address], [Extent1].[BankAccountNumber] AS [BankAccountNumber], [Extent1].[BankAccountType] AS [BankAccountType], [Extent1].[BankBIC] AS [BankBIC], [Extent1].[BankIBAN] AS [BankIBAN], [Extent1].[BankName] AS [BankName], [Extent1].[City] AS [City], [Extent1].[CountryISO] AS [CountryISO], [Extent1].[IdentifiedUserID] AS [IdentifiedUserID1], [Extent1].[Phone] AS [Phone], [Extent1].[SocialSecurityID] AS [SocialSecurityID], [Extent1].[WithHoldingTaxRate] AS [WithHoldingTaxRate], [Extent1].[Zip] AS [Zip], [Extent2].[Lastname] AS [Lastname], row_number() OVER (ORDER BY [Extent2].[Lastname] ASC) AS [row_number]
FROM [dbo].[AdministrativeProfile] AS [Extent1]
INNER JOIN [dbo].[vUsers] AS [Extent2] ON [Extent1].[IdentifiedUserID] = [Extent2].[IdentifiedUserId]
) AS [Join1]
WHERE [Join1].[row_number] > 9000
ORDER BY [Join1].[Lastname] ASC
The running time for this query is approximately 15 seconds.
I read a few things about SELECT TOP being slow because of some kind of sort, but I couldn't manage to find a solution to my problem.
Here's the execution plan
A few things to note:
1) The .Skip(n).Take(x) is added generically by a paging system, so the only part I can modify without breaking the generic paging is this:
ctx.Set<AdministrativeProfile>().OrderBy(x => x.User.Lastname)
2) I found a few ways to fix the SQL statement and make it lightning fast (like using an INNER HASH JOIN in the subquery, or adding an additional WHERE clause checking that [Join1].[row_number] < x), but since the SQL is generated by a LINQ query, that doesn't help me much.
3) When doing Skip(x) with a small number for x, it runs much quicker. The execution time increases with how big x is.
4) The tables I use do not have many rows: about 9000 each.
So basically, I know how to fix the SQL, but I don't know how to change the LINQ query to optimize it.
You should profile the query and see where the bottleneck is. However, I have dealt with a similar problem, and the slowness should be alleviated by adding a non-clustered index on the column you sort by ([Lastname] in your case).
Basically speaking: the more rows you have to skip, the more sorting has to be performed; the sort becomes much cheaper when an index is available.
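If you are using Code First, here is a minimal sketch of adding such an index through an EF Migration (assuming EF 4.3+ Migrations are available; the table and index names are illustrative, and since your ORDER BY goes through the vUsers view, the index would really belong on the underlying users table):
using System.Data.Entity.Migrations;

// Hypothetical migration; table, column and index names are illustrative.
public partial class AddLastnameIndex : DbMigration
{
    public override void Up()
    {
        // A non-clustered index on the sort column lets SQL Server read the
        // rows already ordered instead of sorting them for every page request.
        CreateIndex("dbo.Users", "Lastname", name: "IX_Users_Lastname");
    }

    public override void Down()
    {
        DropIndex("dbo.Users", "IX_Users_Lastname");
    }
}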
This is my dynamic query, used on a search form, which runs in milliseconds in SSMS (roughly 300 to 400 ms):
exec sp_executesql N'set arithabort off;
set transaction isolation level read uncommitted;
With cte as
(Select ROW_NUMBER() OVER
(Order By Case When d.OldInstrumentID IS NULL
THEN d.LastStatusChangedDateTime Else d.RecordingDateTime End
desc) peta_rn,
d.DocumentID
From Documents d
Inner Join Users u on d.UserID = u.UserID
Inner Join IGroupes ig on ig.IGroupID = d.IGroupID
Inner Join ITypes it on it.ITypeID = d.ITypeID
Where 1=1
And (CreatedByAccountID = @0 Or DocumentStatusID = @1 Or DocumentStatusID = @2 )
And (d.JurisdictionID = @3 Or DocumentStatusID = @4 Or DocumentStatusID = @5)
AND ( d.DocumentStatusID = 9 )
)
Select d.DocumentID, d.IsReEfiled, d.IGroupID, d.ITypeID, d.RecordingDateTime,
d.CreatedByAccountID, d.JurisdictionID,
Case When d.OldInstrumentID IS NULL THEN d.LastStatusChangedDateTime
Else d.RecordingDateTime End as LastStatusChangedDateTime,
dbo.FnCanChangeDocumentStatus(d.DocumentStatusID,d.DocumentID) as CanChangeStatus,
d.IDate, d.InstrumentID, d.DocumentStatusID,ig.Abbreviation as IGroupAbbreviation,
u.Username, j.JDAbbreviation, inf.DocumentName,
it.Abbreviation as ITypeAbbreviation, d.DocumentDate,
ds.Abbreviation as DocumentStatusAbbreviation,
Upper(dbo.GetFlatDocumentName(d.DocumentID)) as FlatDocumentName
From Documents d
Left Join IGroupes ig On d.IGroupID = ig.IGroupID
Left Join ITypes it On d.ITypeID = it.ITypeID
Left Join Users u On u.UserID = d.UserID
Left Join DocumentStatuses ds On d.DocumentStatusID = ds.DocumentStatusID
Left Join InstrumentFiles inf On d.DocumentID = inf.DocumentID
Left Join Jurisdictions j on j.JurisdictionID = d.JurisdictionID
Inner Join cte on cte.DocumentID = d.DocumentID
Where 1=1
And peta_rn >= @6 AND peta_rn <= @7
Order by peta_rn',
N'@0 int,@1 int,@2 int,@3 int,@4 int,@5 int,@6 bigint,@7 bigint',
@0=44,@1=5,@2=9,@3=1,@4=5,@5=9,@6=94200,@7=94250
This SQL is formed in C# code, and the WHERE clauses are added dynamically based on the values the user searched for in the search form. It takes roughly 3 seconds to move from one page to the next. I already have the necessary indexes on most of the columns I search on.
Any idea why my ADO.NET code would be slow?
Update: Not sure if execution plans would help but here they are:
It is possible that SQL Server has created an inappropriate query plan for the ADO.NET connections. We have seen similar issues with ADO; the usual solution is to clear any cached query plans and run the slow query again - this may create a better plan.
The most general way to clear the query plans is to update statistics on the involved tables. For you, something like:
UPDATE STATISTICS Documents WITH FULLSCAN
Do the same for the other tables involved, and then run your slow query from ADO.NET (do not run it in SSMS first).
Note that such timing inconsistencies may hint at bad query or database design - at least for us that is usually the case :)
If you run a query repeatedly in SSMS, the database may re-use a previously created execution plan, and the required data may already be cached in memory.
There are a couple of things I notice in your query:
- the CTE joins Users, IGroupes and ITypes, but the joined records are not used in the SELECT
- the CTE performs an ORDER BY on a calculated expression (notice the 85% cost of the (unindexed) Sort)
- replacing the CASE expression with a persisted computed column, which can be indexed, would probably speed up execution
- note that the ORDER BY is executed on data resulting from joining 4 tables
- the WHERE clause of the CTE fixes AND d.DocumentStatusID = 9, yet it is AND'ed with conditions that test other DocumentStatusID values
- paging is performed on the result of 8 JOINed tables
- most likely, creating an intermediate CTE that filters the first CTE on peta_rn would improve performance
.NET strings are Unicode (UTF-16) by default, which equates to NVARCHAR as opposed to VARCHAR.
When you do a WHERE ID = @foo in .NET, you are likely to be implicitly doing
WHERE CONVERT(NVARCHAR, ID) = @foo
The result is that this WHERE clause can't use an index and must be satisfied by a table scan. The solution is to pass each parameter into the SqlCommand as a DbParameter with the DbType set to VARCHAR (in the case of strings).
A similar situation could of course occur with int types if the .NET parameter is "wider" than the SQL column equivalent.
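A minimal sketch of that fix, assuming a hypothetical dbo.Documents table with a VARCHAR(50) column named Code (the names are made up; the key part is the explicit SqlDbType.VarChar):
using System.Data;
using System.Data.SqlClient;

static int CountByCode(SqlConnection connection, string code)
{
    using (var cmd = new SqlCommand(
        "SELECT COUNT(*) FROM dbo.Documents WHERE Code = @code", connection))
    {
        // Declaring the parameter as VARCHAR(50) matches the column type, so
        // SQL Server can seek the index instead of converting every row to
        // NVARCHAR and scanning.
        cmd.Parameters.Add("@code", SqlDbType.VarChar, 50).Value = code;

        // By contrast, cmd.Parameters.AddWithValue("@code", code) would send
        // the value as NVARCHAR and trigger the implicit conversion above.
        return (int)cmd.ExecuteScalar();
    }
}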
PS The easiest way to "prove" this issue is to run your query in SSMS with the following declared above it:
DECLARE @p0 INT = 123
DECLARE @p1 NVARCHAR(100) = 'foobar' -- etc.
and compare with
DECLARE @p0 INT = 123
DECLARE @p1 VARCHAR(100) = 'foobar' -- etc.
I have a LINQ query which selects several fields from my Customer table.
Applied to this method are multiple filters, using Func<IQueryable<T>, IQueryable<T>> with .Invoke.
The original query is essentially select * from customer.
The filter method is essentially select top 10
The output SQL is select top 10 from (select * from customer)
My customer table has over 1,000,000 rows which causes this query to take about 7 seconds to execute in SSMS. If I alter the output SQL to select top 10 from (select top 10 * from customer) by running it in SSMS then the query is instant (as you'd expect).
I am wondering if anyone knows what might cause LINQ to not combine these in a nice way, and if there is a best practice/workaround I can implement.
I should note that my actual code isn't select * it is selecting a few fields, but there is nothing more complex.
I am using SQL Server 2008 and MVC 3 with Entity Framework (not sure which version).
Edit: I should add that it's IQueryable all the way; nothing is evaluated until the end, and as a result the long execution is confined to that single line.
I don't know why it's not being optimised.
If the filter method really is equivalent to SELECT TOP 10 then you should be able to do it like this:
return query.Take(10);
which would resolve to select top 10 * from customer rather than the more convoluted thing you ended up with.
If this won't work then I'm afraid I'll need a little more detail.
EDIT: To clarify, if you do this in LINQ:
DataItems.Take(10).Take(10)
you would get this SQL:
SELECT TOP (10) [t1].[col1], [t1].[col2]
FROM (
SELECT TOP (10) [t0].[col1], [t0].[col2]
FROM [DataItem] AS [t0]
) AS [t1]
So if you can somehow use a Take(n) you will be okay.
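If the filters arrive in the Func<IQueryable<T>, IQueryable<T>> shape you describe, a Take-based filter composes cleanly; here is a minimal sketch (Customer and context stand in for your actual types):
// A paging filter in the same shape as the question's filter methods.
Func<IQueryable<Customer>, IQueryable<Customer>> top10 = q => q.Take(10);

// Applying it keeps everything IQueryable, so EF can fold it into a single
// SELECT TOP (10) rather than a TOP over an unrestricted subquery.
var customers = top10.Invoke(context.Customers).ToList();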
Consider this piece of code:
var question = context.Questionnaires.FirstOrDefault(q => q.id == 169).Categories.ToList()[1].Questions.ToList()[0];
This is just some exercise code for me to get familiar with how EF works. For that reason I have created a few tables: Questionnaire with a reference to Category, and Category with a reference to Question.
What I notice when I execute this code is that I see only the SELECT statement for the questionnaire in the profiler. But I am wondering: where is the query that fetches all the categories and questions? I can't find it. I assume it must be visible in the profiler, right?
EDIT:
This is what I can get from the profiler:
SELECT TOP (1)
[Extent1].[id] AS [id],
[Extent1].[actualFrom] AS [actualFrom],
[Extent1].[name] AS [name],
[Extent1].[version] AS [version],
[Extent1].[startDate] AS [startDate],
[Extent1].[endDate] AS [endDate],
[Extent1].[description] AS [description],
[Extent1].[createdOn] AS [createdOn],
[Extent1].[createdBy] AS [createdBy],
[Extent1].[showQuestionCode] AS [showQuestionCode],
[Extent1].[font] AS [font],
[Extent1].[removed] AS [removed],
[Extent1].[showAchievementsAppointmentTab] AS [showAchievementsAppointmentTab],
[Extent1].[showConceptTabs] AS [showConceptTabs],
[Extent1].[f_QuestionnaireBuilder_QuestionnaireType_Id] AS [f_QuestionnaireBuilder_QuestionnaireType_Id],
[Extent1].[f_QuestionnaireBuilder_Status_Id] AS [f_QuestionnaireBuilder_Status_Id],
[Extent1].[f_QuestionnaireBuilder_Questionnaire_ParentId] AS [f_QuestionnaireBuilder_Questionnaire_ParentId],
[Extent1].[f_QuestionnaireBuilder_QuestionnaireCategory_Id] AS [f_QuestionnaireBuilder_QuestionnaireCategory_Id],
[Extent1].[f_Careplan_VisionModel] AS [f_Careplan_VisionModel]
FROM [implementation].[QuestionnaireBuilder_Questionnaire] AS [Extent1]
WHERE 169 = [Extent1].[id]
Do you have lazy loading enabled?
If lazy loading is not enabled, I think your C# line will throw an exception. The only time the database is hit is at the call to FirstOrDefault. The returned Questionnaire will have an empty collection of Categories because you did not include them in the original query. So, requesting an index of 1 should throw an exception.
If lazy loading is enabled, then the line should work, but it will result in multiple queries hitting the database. The first will be at the call to FirstOrDefault, the second will be at the conversion of Categories to a list, and the third will be at the conversion of Questions to a list.
So, if you have lazy loading enabled, check the profiler for additional queries after the one you posted.
You might try this, if lazy loading is disabled:
var question = context.Questionnaires
    .Include("Categories.Questions")
    .FirstOrDefault(q => q.id == 169)
    .Categories.ToList()[1]
    .Questions.ToList()[0];
Here's a short blog post with some more information: Getting Started with Entity Framework 4 – Lazy Loading
I'm not sure if this is the root cause of your problem, but remove the .ToList() calls; instead of an indexer, use .ElementAt(). ToList stops the query from being executed as a SQL query and instead switches you to LINQ to Objects.
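The suggested rewrite would look like this sketch (same hard-coded id as in the question):
// ToList() calls removed; indexers replaced by ElementAt(), as described above.
var question = context.Questionnaires
    .FirstOrDefault(q => q.id == 169)
    .Categories.ElementAt(1)
    .Questions.ElementAt(0);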
I have a LINQ query like this:
var query = from c in Context.Customers select c;
var result = query.ToList();
The LINQ query generates this T-SQL code:
exec sp_executesql N'SELECT
[Project1].[Id] AS [Id],
[Project1].[Name] AS [Name],
[Project1].[Email] AS [Email]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent1].[Email] AS [Email]
FROM [dbo].[Customers] AS [Extent1] ) AS [Project1]
Is there a way to not generate the subquery?
Do you have any evidence that that query is causing performance problems? I imagine the query optimizer would easily flatten a subquery like that.
If you're certain after profiling that the query is a performance problem (doubtful) - and only then - you could simply turn the query into a stored procedure, and call that instead.
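With a DbContext, that could look like the following sketch (dbo.GetCustomers is a hypothetical procedure you would have to create; SqlQuery maps the result columns to Customer properties by name):
// Hypothetical stored-procedure call via EF (DbContext, EF 4.1+).
var result = Context.Database
    .SqlQuery<Customer>("EXEC dbo.GetCustomers")
    .ToList();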
You use a tool like LINQ because you don't want to write SQL; before abandoning that, you should at least compare the query plan of your proposed SQL with the one generated by the tool. I don't have access to SQL Server Management Studio at the moment, but I would be a bit surprised if the query plans aren't identical...
EDIT: having had a chance to check out the query plans, they are in fact identical.
Short answer: no, you cannot modify that query.
Long answer: if you want to reimplement the LINQ provider and query generator then perhaps there is a way, but I doubt you want to do that. You could also implement a custom EF provider wrapper which takes the query passed from EF and reformats it, but that would be hard as well - and slow. Are you going to write a custom interpreter for SQL queries?
I created an Entity Framework model that contains two tables from the Northwind database to test some of its functionality: Products and Categories.
It automatically created an association between Category and Product which is 0..1 to *.
I wrote this simple query:
var beverages = from p in db.Products.Include("Category")
where p.Category.CategoryName == "Beverages"
select p;
var beverageList = beverages.ToList();
I ran SQL Profiler and ran the code so I could see the SQL it generates, and this is what it generated:
SELECT
[Extent1].[ProductID] AS [ProductID],
[Extent1].[ProductName] AS [ProductName],
[Extent1].[SupplierID] AS [SupplierID],
[Extent1].[QuantityPerUnit] AS [QuantityPerUnit],
[Extent1].[UnitPrice] AS [UnitPrice],
[Extent1].[UnitsInStock] AS [UnitsInStock],
[Extent1].[UnitsOnOrder] AS [UnitsOnOrder],
[Extent1].[ReorderLevel] AS [ReorderLevel],
[Extent1].[Discontinued] AS [Discontinued],
[Extent3].[CategoryID] AS [CategoryID],
[Extent3].[CategoryName] AS [CategoryName],
[Extent3].[Description] AS [Description],
[Extent3].[Picture] AS [Picture]
FROM [dbo].[Products] AS [Extent1]
INNER JOIN [dbo].[Categories] AS [Extent2]
ON [Extent1].[CategoryID] = [Extent2].[CategoryID]
LEFT OUTER JOIN [dbo].[Categories] AS [Extent3]
ON [Extent1].[CategoryID] = [Extent3].[CategoryID]
WHERE N'Beverages' = [Extent2].[CategoryName]
I am curious why the query inner joins to Categories and then left joins to it. The select statement is using the fields from the left joined table. Can someone help me understand the reason for this? If I remove the left join and change the select list to pull from Extent2 I get the same results for this query. In what situation would this not be true?
[Extent3] is the realization of Include("Category"); an Include should not affect the result of the selection from the "main" table, Products, hence the LEFT JOIN (all records from Products, matching records from the right table, Categories).
[Extent2] is there to filter all records by the related Categories table on the name "Beverages", so in this case it is a strict restriction (INNER JOIN).
Why two? :) Because the expression tree is parsed expression by expression, with SQL generated separately for each construct (Include, Where).
You'll notice that the query is pulling all columns in the SELECT list from the copy of the Categories table aliased Extent3, but it's checking the CategoryName against the copy aliased Extent2.
In other words, in this scenario EF's query generation is not realizing that you're Include()ing and restricting the query via the same table, so it's blindly using two copies.
Unfortunately, beyond explaining what's going on, my experience with EF is not advanced enough to suggest a solution...
djacobson and igor explain pretty well why this happens. The way I personally use the Entity Framework, I avoid using Include altogether. Depending on what you're planning to do with the data, you could do something like this:
var beverages = from p in db.Products
                select new { p, p.Category } into pc
                where pc.Category.CategoryName == "Beverages"
                select pc;
return beverages.ToList().Select(pc => pc.p);
... which, at least in EF 4.0, will produce just a single inner join. Entity Framework is smart enough to make it so that the product's Category property is populated with the category that came back from the database with it.
Of course, it's very likely that SQL Server optimizes things away so this won't actually gain you anything.
(Not directly an answer to your question of whether the queries are the same, but the comment field is too restrictive for this.)
If you leave out the .Include(), doesn't it load it anyway (because of the where)? Generally it makes more sense to me to use projections instead of Include():
var beverages = from p in db.Products
                where p.Category.CategoryName == "Beverages"
                select new { Product = p, Category = p.Category };
var beverageList = beverages.ToList();