We are using LINQ and Entity Framework to access a SQL Server 2012 database. We were having some performance issues, and after some investigation we were able to fix some of the problems, but I would like the generated SQL to use OFFSET/FETCH instead of ROW_NUMBER() and BETWEEN.
The performance difference is not big: OFFSET/FETCH is about 10% faster. Do you have any idea why the generated query uses ROW_NUMBER() and BETWEEN? What can I do to force LINQ to generate an OFFSET/FETCH query?
C# code:
var orders = dc.Orders.OrderBy(q => q.LastModifiedTimestamp)
    .Skip(() => skipCount)   // lambda overloads take a parameterless Expression<Func<int>>
    .Take(() => takeCount)
    .ToList();
The currently generated query:
-- Region Parameters
DECLARE @p0 Int = 10
DECLARE @p1 Int = 10
-- EndRegion
SELECT [t2].[OrderId], [t2].[CustomerId]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t1].[OrderId], [t1].[CustomerId]) AS [ROW_NUMBER], [t1].[OrderId], [t1].[CustomerId]
FROM (
SELECT DISTINCT [t0].[OrderId], [t0].[CustomerId]
FROM [Order] AS [t0]
) AS [t1]
) AS [t2]
WHERE [t2].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t2].[ROW_NUMBER]
The preferred query:
SELECT *
FROM [Order]
ORDER BY LastModifiedTimestamp
OFFSET 10000 ROWS
FETCH NEXT 10000 ROWS ONLY
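Neither query can be run here without the database, but the fact that both paging forms select the same page can be sketched with LINQ to Objects. The sketch below assumes a hypothetical in-memory list standing in for the Order table; it numbers rows from 1 the way ROW_NUMBER() does, keeps those BETWEEN @p0 + 1 AND @p0 + @p1, and checks that this matches Skip(@p0).Take(@p1), which is what OFFSET/FETCH expresses. It says nothing about which SQL a particular provider emits.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the Order table: OrderId 1..25.
var orders = Enumerable.Range(1, 25)
    .Select(id => (OrderId: id, CustomerId: id % 3))
    .ToList();

// Skip/Take: what OFFSET @p0 ROWS FETCH NEXT @p1 ROWS ONLY expresses.
static List<int> OffsetFetch(List<(int OrderId, int CustomerId)> rows, int p0, int p1) =>
    rows.OrderBy(r => r.OrderId)
        .Skip(p0)
        .Take(p1)
        .Select(r => r.OrderId)
        .ToList();

// ROW_NUMBER() numbers rows from 1; keep ROW_NUMBER BETWEEN @p0 + 1 AND @p0 + @p1.
static List<int> RowNumberBetween(List<(int OrderId, int CustomerId)> rows, int p0, int p1) =>
    rows.OrderBy(r => r.OrderId)
        .Select((r, index) => (RowNumber: index + 1, r.OrderId))
        .Where(x => x.RowNumber >= p0 + 1 && x.RowNumber <= p0 + p1)
        .OrderBy(x => x.RowNumber)
        .Select(x => x.OrderId)
        .ToList();

var page = OffsetFetch(orders, 10, 10);
Console.WriteLine(string.Join(",", page));                               // 11,12,13,14,15,16,17,18,19,20
Console.WriteLine(page.SequenceEqual(RowNumberBetween(orders, 10, 10))); // True
```

The two forms differ only in how the database has to evaluate them, which is consistent with the modest (about 10%) difference reported above.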
Is there any difference between these two LINQ-to-Entities queries:
context.Table.Count(x => ...)
and
context.Table.Where(x => ...).Count()
in terms of performance and generated SQL?
I tried to look at the generated SQL myself, but I only know how to get the SQL from an IQueryable, and Count returns the value directly.
I have managed to see the SQL (thanks to @dasblinkenlight). The answer is no: both LINQ queries generate exactly the same SQL query, at least for a simple query without grouping:
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[Table] AS [Extent1]
WHERE <condition>
) AS [GroupBy1]
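A minimal sketch with LINQ to Objects shows where the two forms differ on the C# side, assuming a hypothetical integer sequence in place of context.Table and a made-up predicate in place of the elided `x => ...`: Count(predicate) returns an int straight away, while Where(predicate) is still an IQueryable whose expression can be inspected before Count() runs it. With EF6 the same idea applies: calling ToString() on the filtered DbQuery returns its SQL, and assigning a logger such as Console.Write to context.Database.Log also shows the SQL of queries that return a scalar directly.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-ins for context.Table and the elided predicate "x => ...".
IQueryable<int> table = Enumerable.Range(1, 100).AsQueryable();
Expression<Func<int, bool>> predicate = x => x % 7 == 0;

// Form 1: Count(predicate) executes immediately and returns an int.
int countWithPredicate = table.Count(predicate);

// Form 2: Where(predicate) is still an IQueryable, so its expression tree can be
// inspected (for an EF query, this is the object whose SQL you can look at)
// before Count() executes it.
IQueryable<int> filtered = table.Where(predicate);
Console.WriteLine(filtered.Expression);
int countAfterWhere = filtered.Count();

Console.WriteLine($"{countWithPredicate} {countAfterWhere}"); // 14 14
```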
I noticed that when using GroupBy in LINQ to SQL, the resulting query differs depending on whether the Key is a reference Id or the actual navigation property.
Example 1:
Employees.GroupBy(x => x.CompanyId).Select(g => g.Count())
Result SQL:
SELECT COUNT(*) AS [value]
FROM [Employees] AS [t0]
GROUP BY [t0].[CompanyId]
Example 2:
Employees.GroupBy(x => x.Company).Select(g => g.Count())
Result SQL:
SELECT [t1].[value]
FROM (
SELECT COUNT(*) AS [value], [t0].[DivisionDeductionID]
FROM [CheckDeductions] AS [t0]
GROUP BY [t0].[DivisionDeductionID]
) AS [t1]
LEFT OUTER JOIN [DivisionDeductions] AS [t2] ON [t2].[DivisionDeductionID] = [t1].[DivisionDeductionID]
Looking at Example #2, it is obvious that [t2] is never used other than in the LEFT JOIN itself. Why doesn't LINQ to SQL detect that and just use the same query as Example #1? It groups by the ID field anyway.
This looks like EF's SQL generator has missed an opportunity to optimize the query: since [t2] is not used outside the outer join, it could be thrown away, along with the nested select.
It appears that the EF writers added a join for [t2] because they did not want to differentiate between (1) a situation where a navigation property is used only for its PK (so the corresponding FK could be used in its place) and (2) a situation where the query actually pulls additional fields from it.
This practice is justified, given that the RDBMS typically optimizes the unnecessary join away anyway: a LEFT OUTER JOIN on a unique key whose columns are never referenced cannot change the result.
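The argument that the extra join is harmless rests on the joined key being unique. A LINQ to Objects sketch (not LINQ to SQL) with hypothetical in-memory Companies and Employees mirrors both query shapes: grouping by the foreign key alone, and grouping followed by a LEFT OUTER JOIN whose columns are never read. With a unique Id the results match; with a duplicated key the join multiplies rows, which is why an optimizer can only drop it when uniqueness is guaranteed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tables: Companies has a unique Id; Employees reference it by CompanyId.
var companies = new[] { (Id: 1, Name: "A"), (Id: 2, Name: "B"), (Id: 3, Name: "C") };
var employees = new[]
{
    (Id: 1, CompanyId: 1), (Id: 2, CompanyId: 1), (Id: 3, CompanyId: 2),
    (Id: 4, CompanyId: 3), (Id: 5, CompanyId: 3), (Id: 6, CompanyId: 3),
};

// Example 1 shape: GROUP BY the foreign key only.
var direct = employees.GroupBy(e => e.CompanyId)
    .Select(g => (Key: g.Key, Count: g.Count()))
    .OrderBy(x => x.Key)
    .ToList();

// Example 2 shape: the same grouping, then a LEFT OUTER JOIN whose columns are never used.
List<(int Key, int Count)> GroupThenLeftJoin((int Id, string Name)[] companyRows) =>
    (from g in employees.GroupBy(e => e.CompanyId)
     join c in companyRows on g.Key equals c.Id into matches
     from match in matches.DefaultIfEmpty()
     select (Key: g.Key, Count: g.Count()))
    .OrderBy(x => x.Key)
    .ToList();

Console.WriteLine(direct.SequenceEqual(GroupThenLeftJoin(companies))); // True

// If Id were not unique, the same join would duplicate group rows.
var duplicated = companies.Append((Id: 1, Name: "A again")).ToArray();
Console.WriteLine(GroupThenLeftJoin(duplicated).Count); // 4 rows instead of 3
```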
If I use a join, the Include() method no longer works, e.g.:
from e in dc.Entities.Include("Properties")
join i in dc.Items on e.ID equals i.Member.ID
where (i.Collection.ID == collectionID)
select e
e.Properties is not loaded.
Without the join, Include() works.
Lee
UPDATE: I recently added another tip that covers this and provides an alternative, probably better, solution. The idea is to delay the use of Include() until the end of the query; see Tip 22 - How to make include really include for more information.
There is a known limitation in the Entity Framework when using Include():
certain operations are just not supported with Include. In particular, if the shape of the query changes after Include() is applied (for example through a join or a projection), the Include is silently ignored.
It looks like you may have run into one of those limitations. To work around it, you could try something like this:
var results =
from e in dc.Entities //Notice no include
join i in dc.Items on e.ID equals i.Member.ID
where (i.Collection.ID == collectionID)
select new {Entity = e, Properties = e.Properties};
This will bring back the Properties, and if the relationship between Entity and Properties is one-to-many (but not many-to-many), you will find that each resulting anonymous type has the same values in:
anonType.Entity.Properties
anonType.Properties
This is a side-effect of a feature in the Entity Framework called relationship fixup.
See Tip 1 in my EF Tips series for more information.
Try this:
var query = (ObjectQuery<Entities>)(from e in dc.Entities
                                    join i in dc.Items on e.ID equals i.Member.ID
                                    where (i.Collection.ID == collectionID)
                                    select e);
return query.Include("Properties");
So what is the name of the navigation property on "Entity" that relates to "Item.Member" (i.e., the other end of the navigation)? You should be using it instead of the join. For example, if "Entity" had a property called Member with a cardinality of 1, and Member had a property called Items with a cardinality of many, you could do this:
from e in dc.Entities.Include("Properties")
where e.Member.Items.Any(i => i.Collection.ID == collectionID)
select e
I'm guessing at the properties of your model here, but this should give you the general idea. In most cases, using join in LINQ to Entities is wrong, because it suggests that either your navigational properties are not set up correctly, or you are not using them.
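To illustrate why the navigation-property form is usually preferable, here is a LINQ to Objects sketch with hypothetical entities and items, where MemberID stands in for i.Member.ID and the Any() call stands in for e.Member.Items.Any(...). The join yields one row per matching item, so an entity can appear more than once, whereas Any() only filters the entity sequence. In LINQ to Entities the Any() form also keeps the query a plain query over Entities with no join, which is presumably why Include("Properties") keeps working there.

```csharp
using System;
using System.Linq;

// Hypothetical data: MemberID plays the role of i.Member.ID in the original join.
var entities = new[] { (ID: 1, Name: "first"), (ID: 2, Name: "second"), (ID: 3, Name: "third") };
var items = new[]
{
    (MemberID: 1, CollectionID: 10), (MemberID: 1, CollectionID: 10),
    (MemberID: 2, CollectionID: 20), (MemberID: 3, CollectionID: 10),
};
int collectionID = 10;

// join: one output row per matching item, so entity 1 appears twice.
var viaJoin = (from e in entities
               join i in items on e.ID equals i.MemberID
               where i.CollectionID == collectionID
               select e.ID).ToList();

// Any(): a filter on the entity sequence itself, so each entity appears at most once.
var viaAny = (from e in entities
              where items.Any(i => i.MemberID == e.ID && i.CollectionID == collectionID)
              select e.ID).ToList();

Console.WriteLine(string.Join(",", viaJoin)); // 1,1,3
Console.WriteLine(string.Join(",", viaAny));  // 1,3
```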
So, I realise I am late to the party here, but I thought I'd add my findings. This should really be a comment on Alex James's post, but as I don't have the reputation it'll have to go here.
So my answer is: it doesn't seem to work at all as you would intend. Alex James gives two interesting solutions, but if you try them and check the SQL, it's horrible.
The example I was working on is:
var theRelease = from release in context.Releases
where release.Name == "Hello World"
select release;
var allProductionVersions = from prodVer in context.ProductionVersions
where prodVer.Status == 1
select prodVer;
var combined = (from release in theRelease
join p in allProductionVersions on release.Id equals p.ReleaseID
select release).Include(release => release.ProductionVersions);
var allProductionsForChosenRelease = combined.ToList();
This follows the simpler of the two examples. Without the Include it produces perfectly respectable SQL:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name]
FROM [dbo].[Releases] AS [Extent1]
INNER JOIN [dbo].[ProductionVersions] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ReleaseID]
WHERE ('Hello World' = [Extent1].[Name]) AND (1 = [Extent2].[Status])
But with the Include, OMG:
SELECT
[Project1].[Id1] AS [Id],
[Project1].[Id] AS [Id1],
[Project1].[Name] AS [Name],
[Project1].[C1] AS [C1],
[Project1].[Id2] AS [Id2],
[Project1].[Status] AS [Status],
[Project1].[ReleaseID] AS [ReleaseID]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent2].[Id] AS [Id1],
[Extent3].[Id] AS [Id2],
[Extent3].[Status] AS [Status],
[Extent3].[ReleaseID] AS [ReleaseID],
CASE WHEN ([Extent3].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
FROM [dbo].[Releases] AS [Extent1]
INNER JOIN [dbo].[ProductionVersions] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ReleaseID]
LEFT OUTER JOIN [dbo].[ProductionVersions] AS [Extent3] ON [Extent1].[Id] = [Extent3].[ReleaseID]
WHERE ('Hello World' = [Extent1].[Name]) AND (1 = [Extent2].[Status])
) AS [Project1]
ORDER BY [Project1].[Id1] ASC, [Project1].[Id] ASC, [Project1].[C1] ASC
Total garbage. The key point to note here is that it returns the outer-joined copy of the table ([Extent3]), which has not been limited by Status = 1.
This results in the WRONG data being returned:
Id  Id1  Name         C1  Id2  Status  ReleaseID
2   1    Hello World  1   1    2       1
2   1    Hello World  1   2    1       1
Note that the status of 2 is being returned there, despite our restriction. It simply does not work.
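The unexpected row comes from the two separate joins in the generated SQL. A LINQ to Objects sketch, using hypothetical rows that reproduce the result above, mimics them: the WHERE clause only restricts [Extent2], the inner join, while the Include is fed from [Extent3], the left outer join, which has no Status filter. This only mirrors the SQL's logic; it is not EF code.

```csharp
using System;
using System.Linq;

// Hypothetical rows matching the result shown above: one release, two production versions.
var releases = new[] { (Id: 1, Name: "Hello World") };
var versions = new[] { (Id: 1, Status: 2, ReleaseID: 1), (Id: 2, Status: 1, ReleaseID: 1) };

var rows = (from r in releases
            // [Extent2]: the INNER JOIN that the WHERE clause restricts to Status = 1
            join p2 in versions on r.Id equals p2.ReleaseID
            where r.Name == "Hello World" && p2.Status == 1
            // [Extent3]: the LEFT OUTER JOIN that feeds the Include, with no Status filter
            join v in versions on r.Id equals v.ReleaseID into included
            from p3 in included.DefaultIfEmpty()
            select (FilteredId: p2.Id, ReleaseId: r.Id, IncludedId: p3.Id, IncludedStatus: p3.Status))
           .ToList();

foreach (var row in rows)
    Console.WriteLine(row); // (2, 1, 1, 2) then (2, 1, 2, 1)
```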
If I have gone wrong somewhere, I would be delighted to find out, as this is making a mockery of Linq. I love the idea, but the execution doesn't seem to be usable at the moment.
Out of curiosity, I tried the LinqToSQL dbml rather than the LinqToEntities edmx that produced the mess above:
SELECT [t0].[Id], [t0].[Name], [t2].[Id] AS [Id2], [t2].[Status], [t2].[ReleaseID], (
SELECT COUNT(*)
FROM [dbo].[ProductionVersions] AS [t3]
WHERE [t3].[ReleaseID] = [t0].[Id]
) AS [value]
FROM [dbo].[Releases] AS [t0]
INNER JOIN [dbo].[ProductionVersions] AS [t1] ON [t0].[Id] = [t1].[ReleaseID]
LEFT OUTER JOIN [dbo].[ProductionVersions] AS [t2] ON [t2].[ReleaseID] = [t0].[Id]
WHERE ([t0].[Name] = @p0) AND ([t1].[Status] = @p1)
ORDER BY [t0].[Id], [t1].[Id], [t2].[Id]
Slightly more compact - weird count clause, but overall same total FAIL.
Has anybody actually ever used this stuff in a real business application? I'm really starting to wonder...
Please tell me I've missed something obvious, as I really want to like Linq!
Try the more verbose way, which obtains more or less the same results but with more data calls:
var mydata = from e in dc.Entities
join i in dc.Items
on e.ID equals i.Member.ID
where (i.Collection.ID == collectionID)
select e;
foreach (Entity ent in mydata) {
if(!ent.Properties.IsLoaded) { ent.Properties.Load(); }
}
Do you still get the same (unexpected) result?
EDIT: Changed the first sentence, as it was incorrect. Thanks for the pointer comment!
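The cost of the verbose approach is easy to count: one query for the entities, then one Load() call per entity whose Properties are not yet loaded. A sketch with hypothetical QueryEntityIds and LoadProperties helpers standing in for the database (no EF involved) counts those round trips.

```csharp
using System;
using System.Collections.Generic;

int roundTrips = 0;

// Hypothetical stand-ins for database calls; each invocation counts as one round trip.
List<int> QueryEntityIds() { roundTrips++; return new List<int> { 1, 2, 3 }; }
List<string> LoadProperties(int entityId) { roundTrips++; return new List<string> { $"prop-{entityId}" }; }

var loaded = new Dictionary<int, List<string>>();
foreach (var id in QueryEntityIds())     // one query for the entities
{
    if (!loaded.ContainsKey(id))         // plays the role of !ent.Properties.IsLoaded
        loaded[id] = LoadProperties(id); // one extra query per entity
}

Console.WriteLine(roundTrips); // 4, i.e. 1 + N with N = 3
```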
I am noticing that Entity Framework is generating some inefficient queries when using the Find() method. For example, here is my C# code:
Model model = unit.Repository.DbSet.Find(id); // id: the key being looked up (model.ID cannot be read while model is being declared)
Generated Find() query:
DECLARE @p0 int = 1
SELECT
[Limit1].[ID] AS [ID],
[Limit1].[UserID] AS [UserID],
[Limit1].[Started] AS [Started],
[Limit1].[Updated] AS [Updated],
[Limit1].[Completed] AS [Completed]
FROM ( SELECT TOP (2)
[Extent1].[ID] AS [ID],
[Extent1].[UserID] AS [UserID],
[Extent1].[Started] AS [Started],
[Extent1].[Updated] AS [Updated],
[Extent1].[Completed] AS [Completed]
FROM [dbo].[Table] AS [Extent1]
WHERE [Extent1].[ID] = @p0
) AS [Limit1]
It seems to wrap the query in a whole other SELECT, which looks unnecessary. Here is the output using the SingleOrDefault() method:
Generated SingleOrDefault() query:
DECLARE @p__linq__0 int = 1
SELECT TOP (2)
[Extent1].[ID] AS [ID],
[Extent1].[UserID] AS [UserID],
[Extent1].[Started] AS [Started],
[Extent1].[Updated] AS [Updated],
[Extent1].[Completed] AS [Completed]
FROM [dbo].[Table] AS [Extent1]
WHERE [Extent1].[ID] = @p__linq__0
Is there a reason why Find() is generating two selects? Should the Find() method be avoided in favor of the SingleOrDefault() method?
I doubt there is any performance difference between the two, for SQL Server at least. The first one just has an extra wrapper around the SELECT. Running a similar query on a database I have generates exactly the same plan, so I would imagine the outer SELECT gets optimized away in the execution plan.
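Both generated queries also use TOP (2), presumably for the same reason: Single/SingleOrDefault semantics need a second row only to detect that the key was not unique. A LINQ to Objects sketch with hypothetical rows (ID 2 duplicated on purpose) shows the idea; SingleById is a made-up helper, not an EF API.

```csharp
using System;
using System.Linq;

// Hypothetical rows; ID 2 is deliberately duplicated to show what the second row is for.
var rows = new[] { (ID: 1, UserID: 7), (ID: 2, UserID: 8), (ID: 2, UserID: 9) };

// Fetch at most two matches (the TOP (2)): zero means "not found", one is the answer,
// and a second row proves the key was not unique, so SingleOrDefault-style semantics throw.
(int ID, int UserID)? SingleById(int id)
{
    var firstTwo = rows.Where(r => r.ID == id).Take(2).ToList();
    if (firstTwo.Count > 1)
        throw new InvalidOperationException("Sequence contains more than one matching element");
    if (firstTwo.Count == 0)
        return null;
    return firstTwo[0];
}

Console.WriteLine(SingleById(1));          // (1, 7)
Console.WriteLine(SingleById(3).HasValue); // False
try { SingleById(2); }
catch (InvalidOperationException ex) { Console.WriteLine(ex.Message); }
```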