Why is LinqToEntities Skip/Take oracle implementation so slow - c#

Context is an Oracle Database, Entity Framework 5, LinqToEntities, Database First.
I'm trying to implement pagination over some large tables, my linqToEntities query looks like this :
context.MyDbSet
.Include("T2")
.Include("T3")
.Include("T4")
.Include("T5")
.Include("T6")
.Where(o => o.T3 != null)
.OrderBy(o => o.Id)
.Skip(16300)
.Take(50);
Fact is, depending on how many records i wanna skip (0 or 16300) it goes from 0.08s to 10.7s.
It seems weird to me so i checked the generated SQL and here is how it looks like :
SELECT *
FROM (
SELECT
[...]
FROM ( SELECT [...] row_number() OVER (ORDER BY "Extent1"."Id" ASC) AS "row_number"
FROM T1 "Extent1"
LEFT OUTER JOIN T2 "Extent2" ON [...]
LEFT OUTER JOIN T3 "Extent3" ON [...]
LEFT OUTER JOIN T4 "Extent4" ON [...]
LEFT OUTER JOIN T5 "Extent5" ON [...]
LEFT OUTER JOIN T6 "Extent6" ON [...]
WHERE ("Extent1"."SomeId" IS NOT NULL)
) "Filter1"
WHERE ("Filter1"."row_number" > 16300)
ORDER BY "Filter1"."Id" ASC
)
WHERE (ROWNUM <= (50) )
I checked if it actually was Oracle who took time through SQL Developper, and it is.
I then checked the execution plan and it's what appears :
All i actually understand is that there's no STOPKEY for the first filter over row_number and that it probably fetch the whole sub query.
I don't really know where to look at, if i'm not mistaken, the request is generated by ODT/ODP/.. and thus, should be optimized for an Oracle DB.. (and i can't change it myself)
Maybe my database model is rotten and i could add whetever indexes or optimization for it to work better ?

You see all those UNIQUE SCAN subqueries? That's doing a UNIQUE on every single one of those tables.
You should have a one-to-many relationship with T1 being the parent, and any tables you need to be in the query to have a FOREIGN KEY relationship to an indexed ID primary key column on T1.
Then you can use Join instead of Include and the subqueries will be unnecessary.

Related

LINQ to SQL generates a different query for similar group by expressions

I noticed that when using GroupBy in Linq to SQL, there's a difference in the result query when providing a reference Id as the Key versus using the actual navigation property as the Key.
Example 1:
Employees.GroupBy(x => x.CompanyId).Select(g => g.Count())
Result SQL:
SELECT COUNT(*) AS [value]
FROM [Employees] AS [t0]
GROUP BY [t0].[CompanyId]
Example 2:
Employees.GroupBy(x => x.Company).Select(g => g.Count())
Result SQL:
SELECT [t1].[value]
FROM (
SELECT COUNT(*) AS [value], [t0].[DivisionDeductionID]
FROM [CheckDeductions] AS [t0]
GROUP BY [t0].[DivisionDeductionID]
) AS [t1]
LEFT OUTER JOIN [DivisionDeductions] AS [t2] ON [t2].[DivisionDeductionID] = [t1].[DivisionDeductionID]
Looking at Example #2, it is obvious that [t2] is never used other than the LEFT JOIN itself. why doesn't LINQ to SQL detects that and just uses the same query as Example #1? it anyways groups by the ID field.
This looks like EF's SQL generator has missed an opportunity to optimize the query: indeed, since [t2] is not used outside the outer join, it could be thrown away, along with a nested select.
It appears that EF writers added a join for [t2] because they did not want to differentiate between a situation (1) when a navigation property is used only for its PK (so the corresponding FK could be used in its place) and (2) a situation when the query actually pull additional fields from it.
This practice is completely justified, given that RDBMS optimizes out the unnecessary join anyway.

NHibernate linq fetch query across multiple joins with subquery causes incorrect sql

I have four tables in a sql server databases:
Part
-----
Id (PK)
LineId (FK)
other fields...
Line
-----
Id (PK)
ProcessId (FK)
other fields...
Process
-----
Id (PK)
ProcessTypeId (FK)
other fields...
ProcessType
-----
Id (PK)
other fields...
I am trying to use a linq query with fetch to hydrate these entities then map the result to a view model dto.
I am using two queries, one is on Part, I am applying a filter on it to narrow down the result:
var partids = s.Query<Part>()
.Where(p => p.Line.Process.ProcessType.Id == processTypeId)
.Select(p => p.Id);
I then use this query to eager load the related entities and use the first query as a subquery:
var q = s.Query<Part>()
.Fetch(p => p.Line)
.ThenFetch(l => l.Process)
.ThenFetch(pr => pr.ProcessType)
.Where(p => partids.Contains(p.Id))
.ToList();
Though this query works, I noticed that was taking a very long time to load. So, using a profiler, I looked at the generated SQL which was:
SELECT part0_.Id AS Id0_0_,
line1_.Id AS Id1_1_,
process2_.Id AS Id2_2_,
process3_.Id AS Id3_3_,
part0_.Name AS Name0_0_,
part0_.LineId AS Line3_0_0_,
line1_.Name AS Name1_1_,
line1_.ProcessId AS Proccess3_1_1_,
process2_.Name AS Name2_2_,
process2_.ProcessTypeId AS Proccess3_2_2_,
process3_.Name AS Name3_3_
FROM part part0_
LEFT OUTER JOIN Line line1_ ON part0_.LineId=line1_.Id
LEFT OUTER JOIN Process process2_ ON line1_.ProcessId=process2_.Id
LEFT OUTER JOIN ProcessType process3_ ON process2_.ProcessTypeId=process3_.Id
WHERE part0_.Id IN
( SELECT part4_.Id
FROM Part Part4_
INNER JOIN Line Line5_ ON Part4_.LineId=Line5_.Id
WHERE process2_.ProcessTypeId= 126 );
Th subquery joining back onto the main query is causing this to run extremely slow in most cases.
I would have expected the generated SQL to be this:
SELECT part0_.Id AS Id0_0_,
line1_.Id AS Id1_1_,
process2_.Id AS Id2_2_,
process3_.Id AS Id3_3_,
part0_.Name AS Name0_0_,
part0_.LineId AS Line3_0_0_,
line1_.Name AS Name1_1_,
line1_.ProcessId AS Proccess3_1_1_,
process2_.Name AS Name2_2_,
process2_.ProcessTypeId AS Proccess3_2_2_,
process3_.Name AS Name3_3_
FROM part part0_
LEFT OUTER JOIN Line line1_ ON part0_.LineId=line1_.Id
LEFT OUTER JOIN Process process2_ ON line1_.ProcessId=process2_.Id
LEFT OUTER JOIN ProcessType process3_ ON process2_.ProcessTypeId=process3_.Id
WHERE part0_.Id IN
( SELECT part4_.Id
FROM Part part4_
INNER JOIN Line Line5_ ON part4_.LineId=Line5_.Id
INNER JOIN Process Process6_ ON Line5_.LineId=Process6_.Id
WHERE Process6_.ProcessTypeId= 126 );
I am using NHibernate 4 with the linq provider for all of my queries. Am I missing something in my linq query here?
The work around I use at the moment is to hydrate the partids query with ToList and then use the list of ids from memory. However, I would like to avoid two round trips to the database in this scenario if possible.
It is not currently feasible for me to use the QueryOver or HQL apis because all of my querying a filter code uses linq.
Please help!
Unfortunately, I think the workaround that you're currently using is your best option.
This is a bug in NHibernate, it currently affects version 4 and I'm pretty sure it was there in 3.3 too.
The bug has been reported.

Efficient way to count related entities of a many to many relation in EF

I would like to know how to efficiently count (SQL server side) the amount of distinct count of results for a specific range of a related entity that has a many to many relationship.
This is the current situation in entity Framework:
Table1 1<------->∞ Table2
Table2 ∞<------->∞ Table4
Table 2 and Table 4 have a many to many relationship and are linked with Table3 in SQL.
What I want is the distinct count of table4 results related to a specific range of Table1.
In LinQ to SQL the query is this:
(from dc in Table1
join vc in Table2 on dc.Table1Id equals vc.Table2Id
join vcac in Table3 on vc.Table2Id equals vcac.Table3Id
join ac in Table4 on vcac.Table3Id equals ac.Table4Id
where dc.Table1Id > 200000
group ac by ac.Table4Id into nieuw
select new { acid= nieuw.Key}).Count()
This lets SQL server return the count directly.
Because the extra table for the many to many relation ( Table3) is gone, I have had problems converting this query to L2E in query syntax. ( since I cannot join table 4 with table 2 with an inner join).
I have this in chained syntax, however, is this efficient ( does this fetch the whole list, or does it let SQLserver do the count, as I'm not sure this is an efficient way to select, Table 2 contains about 30.000 entries, I don't want it to fetch this result just to count it):
context.Table4.Where(a => a.Table2.Any(v => v.Table1Id > 200000)).Select(a => aTable4Id).Distinct().Count();
How would I go converting the Linq2SQL query into L2E in the query syntax ? Or is the chained syntax fine in this situation ?
The .Select() method uses deferred execution, meaning it won't actually run the query until you need the results. At that point in the chain it still exists only as a query definition. Then you modify with .Distinct() before getting .Count() - which does query the database using a SQL GROUP BY statement. So you should be good.

How can I call Include in Entity Framework after filter query

I'm trying call Include and loading related entities in EF !after WHERE query. How can I do this in EF? In SQL it's looks like this:
SELECT * FROM T1 INNER JOIN ( SELECT * FROM T WHERE ... (filtering data)) as T2 ON T1.A = T2.A (loading data to filtered data)
If i write db.Include(..).Where(..) or db.Where(..).Include(..) , in SQL server profiler I will see next query:
SELECT ...
FROM T1 AS [Extent1]
INNER JOIN T2 AS [Extent2] ON [Extent1].[A] = [Extent2].[A]
LEFT OUTER JOIN T2 AS [Extent3] ON [Extent1].[A] = [Extent3].[A]
WHERE N'B1' = [Extent1].[B]
But in this first performed join query and after filtering query.
Thanks in advance
AFAIK Unfortunately, this feature is not yet supported in Entity Framework (filtering on include).
Nearest you can get is to to perform separate queries.
You can check out Larislav's answer for more information.

Trouble understanding the SQL generated from this Entity Framework query

I created an Entity Framework model that contains two tables from the Northwind database to test some of its functionality: Products and CAtegories.
It automatically created an association between Category and Product which is 0..1 to *.
I wrote this simple query:
var beverages = from p in db.Products.Include("Category")
where p.Category.CategoryName == "Beverages"
select p;
var beverageList = beverages.ToList();
I ran SQL Profiler and ran the code so i could see the SQL that it generates and this is what it generated:
SELECT
[Extent1].[ProductID] AS [ProductID],
[Extent1].[ProductName] AS [ProductName],
[Extent1].[SupplierID] AS [SupplierID],
[Extent1].[QuantityPerUnit] AS [QuantityPerUnit],
[Extent1].[UnitPrice] AS [UnitPrice],
[Extent1].[UnitsInStock] AS [UnitsInStock],
[Extent1].[UnitsOnOrder] AS [UnitsOnOrder],
[Extent1].[ReorderLevel] AS [ReorderLevel],
[Extent1].[Discontinued] AS [Discontinued],
[Extent3].[CategoryID] AS [CategoryID],
[Extent3].[CategoryName] AS [CategoryName],
[Extent3].[Description] AS [Description],
[Extent3].[Picture] AS [Picture]
FROM [dbo].[Products] AS [Extent1]
INNER JOIN [dbo].[Categories] AS [Extent2]
ON [Extent1].[CategoryID] = [Extent2].CategoryID]
LEFT OUTER JOIN [dbo].[Categories] AS [Extent3]
ON [Extent1].[CategoryID] = [Extent3].[CategoryID]
WHERE N'Beverages' = [Extent2].[CategoryName]
I am curious why the query inner joins to Categories and then left joins to it. The select statement is using the fields from the left joined table. Can someone help me understand the reason for this? If I remove the left join and change the select list to pull from Extent2 I get the same results for this query. In what situation would this not be true?
[Extent3] is a realization of Include(Category) and Include should not impact on result of selection from "main" table Product, so LEFT JOIN (all records from Product and some records from the right table Category).
[Extent2] is really to filter all records by related table Category with name "Beverages", so in this case it is the strong restriction (INNER JOIN)
Why two? :) Because of parsing expression-by-expression and auto generation for every statement (Include, Where)
You'll notice that the query is pulling all columns in the SELECT list from the copy of the Categories table aliased Extent3, but it's checking the CategoryName against the copy aliased Extent2.
In other words, in this scenario EF's query generation is not realizing that you're Include()ing and restricting the query via the same table, so it's blindly using two copies.
Unfortunately, beyond explaining what's going on, my experience with EF is not advanced enough to suggest a solution...
djacobson and igor explain pretty well why this happens. The way I personally use the Entity Framework, I avoid using Include altogether. Depending on what you're planning to do with the data, you could do something like this:
var beverages = from p in db.Products
select new {p, p.Category} into pc
where pc.Category.CategoryName == "Beverages"
select pc;
return beverages.ToList().Select(pc => pc.p);
... which, at least in EF 4.0, will produce just a single inner join. Entity Framework is smart enough to make it so that the product's Category property is populated with the category that came back from the database with it.
Of course, it's very likely that SQL Server optimizes things away so this won't actually gain you anything.
(Not directly an answer to your question if the queries are the same, but the comment field is too restricting for this)
If you leave out the .Include(), doesn't it load it anyway (because of the where)? Generally it makes more sense to me to use projections instead of Include():
var beverages = from p in db.Products.Include("Category")
where p.Category.CategoryName == "Beverages"
select new { Product = p, Category = p.Category };
var beverageList = beverages.ToList();

Categories