Correct behavior of OrderBy - c#

I have encountered something that puzzles me and I would like to hear your opinion on the matter. It turns out that LINQ to SQL and Entity Framework treat consecutive order by's differently.
The following code is just an example; I am not claiming it makes any sense:
LINQ to SQL:
DataClasses1DataContext db = new DataClasses1DataContext();
var result = (from c in db.Products
orderby c.ProductName
orderby c.UnitPrice
orderby c.UnitsOnOrder
select c).ToList();
What it generates on the server side:
SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
ORDER BY [t0].[UnitsOnOrder], [t0].[UnitPrice], [t0].[ProductName]
The same test with Entity Framework generates this:
SELECT
[Extent1].[ProductID] AS [ProductID],
[Extent1].[ProductName] AS [ProductName],
[Extent1].[SupplierID] AS [SupplierID],
[Extent1].[CategoryID] AS [CategoryID],
[Extent1].[QuantityPerUnit] AS [QuantityPerUnit],
[Extent1].[UnitPrice] AS [UnitPrice],
[Extent1].[UnitsInStock] AS [UnitsInStock],
[Extent1].[UnitsOnOrder] AS [UnitsOnOrder],
[Extent1].[ReorderLevel] AS [ReorderLevel],
[Extent1].[Discontinued] AS [Discontinued]
FROM [dbo].[Products] AS [Extent1]
ORDER BY [Extent1].[UnitsOnOrder] ASC
As you can see, LINQ to SQL adds all the requested order by's, with the last one having the highest priority (which in my opinion is correct).
Entity Framework, on the other hand, respects only the last order by and disregards all the others.
Now, I know that OrderBy followed by ThenBy can be used, but I am just wondering which behavior is more correct. Also, as far as I remember, the query extenders used in ASP.NET work with a separate order by, which will not work correctly if applied to a query generated from a different data source (according to the above example, one of the order by's will be omitted).

My opinion is that EF is correct. I don't know why L2S would do what you're describing - in my opinion, if you add an OrderBy clause instead of using ThenBy, it should overwrite any existing OrderBys.
When you're working with Linq-To-Objects, you should see OrderBy replace any previous ones, so it makes more sense to me to have the data-driven LINQ act the same.
If the behavior changed the way you're describing, then it seems that Microsoft agrees, since EF was designed to replace L2S.

What I've learned is that the order by should be written like this:
DataClasses1DataContext db = new DataClasses1DataContext();
var result = (from c in db.Products
orderby c.UnitsOnOrder, c.UnitPrice, c.ProductName
select c).ToList();
That way the sort order is clear to everyone.
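For reference, a comma-separated orderby clause is the query-syntax form of OrderBy followed by ThenBy. Here is a minimal sketch using LINQ to Objects; the Product class and the in-memory sample data are assumptions standing in for db.Products, not the real model:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Products entity; only the three sort keys are modeled.
public class Product
{
    public string ProductName { get; set; }
    public decimal UnitPrice { get; set; }
    public int UnitsOnOrder { get; set; }
}

public static class OrderingSketch
{
    // Query syntax: one orderby clause, keys listed from highest to lowest priority.
    public static List<string> QuerySyntax(IEnumerable<Product> products) =>
        (from p in products
         orderby p.UnitsOnOrder, p.UnitPrice, p.ProductName
         select p.ProductName).ToList();

    // Method syntax: OrderBy sets the primary key, each ThenBy adds a tie-breaker.
    public static List<string> MethodSyntax(IEnumerable<Product> products) =>
        products.OrderBy(p => p.UnitsOnOrder)
                .ThenBy(p => p.UnitPrice)
                .ThenBy(p => p.ProductName)
                .Select(p => p.ProductName)
                .ToList();

    public static void Main()
    {
        var products = new[]
        {
            new Product { ProductName = "Chai", UnitPrice = 18m, UnitsOnOrder = 10 },
            new Product { ProductName = "Aniseed Syrup", UnitPrice = 10m, UnitsOnOrder = 10 },
            new Product { ProductName = "Chang", UnitPrice = 10m, UnitsOnOrder = 10 },
            new Product { ProductName = "Tofu", UnitPrice = 23m, UnitsOnOrder = 0 },
        };

        // Both lines print: Tofu, Aniseed Syrup, Chang, Chai
        Console.WriteLine(string.Join(", ", QuerySyntax(products)));
        Console.WriteLine(string.Join(", ", MethodSyntax(products)));
    }
}
```

Because the priority is spelled out in a single chain, the result should not depend on how a particular provider chooses to handle repeated OrderBy calls.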

Related

How to write EF query for Common Table Expression

This Sql query returns the expected data. I need to do the same in EF Query. I am not sure how to do it all in one EF query.
WITH cteproductactions (productkey, actionid) AS (
SELECT productkey, count(*) FROM productactions
GROUP BY productkey
HAVING count(*)>0
)
SELECT p.name,p.productkey,p.imageurl
FROM product p
INNER JOIN cteproductactions c on p.ProductKey=c.productkey
WHERE p.profileid=100
EF Query
var products = productRepo.Where(x => x.profileid == 100);
var productkeys = products.Select(x => x.ProductKey).ToList();
var productActions = productActionsRepo.Where(x => productkeys.Contains(x.ProductKey));
You'll tie yourself in knots trying to write SQL and then "convert" it into LINQ, or trying to force EF to generate SQL that is exactly the same. It's better to start by expressing what you want at a high level (in English) and then write the LINQ for it; forget the SQL unless there's a real problem.
The SQL as written doesn't really make sense or need a CTE: the HAVING clause is pointless (every group produced by the GROUP BY already has count(*) > 0), and none of the columns from the CTE are used in the output. The only purpose the CTE serves is to filter the product list down to those products that have at least one product action, so write the EF query from that: "all products where the profile ID is 100 and a related product action exists". Don't get bogged down in "how do I make EF do a CTE?", because these SQL statements express the same thing without a CTE:
-- Option 1: join to the distinct product keys
SELECT p.name,p.productkey,p.imageurl
FROM product p
INNER JOIN (SELECT DISTINCT productKey FROM productactions) c on p.ProductKey=c.productkey
WHERE p.profileid=100;
-- Option 2: join directly and de-duplicate the output
SELECT DISTINCT p.name,p.productkey,p.imageurl
FROM product p
INNER JOIN productactions c on p.ProductKey=c.productkey
WHERE p.profileid=100;
-- Option 3: filter with EXISTS
SELECT p.name,p.productkey,p.imageurl
FROM product p
WHERE p.profileid=100
AND EXISTS(SELECT null FROM productactions c WHERE p.ProductKey=c.productkey);
Assuming product and productactions are in a 1:M relationship connected by productKey, consider something like:
var products = productRepo
.Where(p => p.profileid==100 && p.ProductActions.Any())
.Select(p => new { p.Name, p.ProductKey, p.ImageUrl });
The main message here is: don't start from an SQL mindset and think "how can I make EF do this SQL?"; start from "what do I want, and how can I make EF do it?". Forget the SQL unless EF is generating something horrifically underperformant.
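To illustrate the "a related row exists" idea in isolation, here is a small sketch that runs against in-memory lists rather than a real context. The entity shapes and property names are assumptions made up for the example; against EF, a Where of this shape is typically translated into an EXISTS subquery.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity shapes; the real model's property names may differ.
public class ProductAction
{
    public int ProductKey { get; set; }
}

public class Product
{
    public int ProductKey { get; set; }
    public int ProfileId { get; set; }
    public string Name { get; set; }
    public List<ProductAction> ProductActions { get; set; } = new List<ProductAction>();
}

public static class ExistsSketch
{
    // "Products for a profile that have at least one action":
    // Any() on the navigation collection plays the role of EXISTS.
    public static List<string> NamesWithActions(IEnumerable<Product> products, int profileId) =>
        products.Where(p => p.ProfileId == profileId && p.ProductActions.Any())
                .Select(p => p.Name)
                .ToList();

    public static void Main()
    {
        var products = new List<Product>
        {
            new Product { ProductKey = 1, ProfileId = 100, Name = "Widget",
                          ProductActions = { new ProductAction { ProductKey = 1 } } },
            new Product { ProductKey = 2, ProfileId = 100, Name = "Gadget" }, // no actions
            new Product { ProductKey = 3, ProfileId = 200, Name = "Gizmo",
                          ProductActions = { new ProductAction { ProductKey = 3 } } },
        };

        // Prints: Widget
        Console.WriteLine(string.Join(", ", NamesWithActions(products, 100)));
    }
}
```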

LINQ to SQL generates a different query for similar group by expressions

I noticed that when using GroupBy in Linq to SQL, there's a difference in the result query when providing a reference Id as the Key versus using the actual navigation property as the Key.
Example 1:
Employees.GroupBy(x => x.CompanyId).Select(g => g.Count())
Result SQL:
SELECT COUNT(*) AS [value]
FROM [Employees] AS [t0]
GROUP BY [t0].[CompanyId]
Example 2:
Employees.GroupBy(x => x.Company).Select(g => g.Count())
Result SQL:
SELECT [t1].[value]
FROM (
SELECT COUNT(*) AS [value], [t0].[DivisionDeductionID]
FROM [CheckDeductions] AS [t0]
GROUP BY [t0].[DivisionDeductionID]
) AS [t1]
LEFT OUTER JOIN [DivisionDeductions] AS [t2] ON [t2].[DivisionDeductionID] = [t1].[DivisionDeductionID]
Looking at Example #2, it is obvious that [t2] is never used other than in the LEFT JOIN itself. Why doesn't LINQ to SQL detect that and just use the same query as in Example #1? It groups by the ID field anyway.
It looks like LINQ to SQL's SQL generator has missed an opportunity to optimize the query: indeed, since [t2] is not used outside the outer join, it could be thrown away, along with the nested select.
It appears that the LINQ to SQL authors added a join for [t2] because they did not want to differentiate between (1) a situation where a navigation property is used only for its PK (so the corresponding FK could be used in its place) and (2) a situation where the query actually pulls additional fields from it.
This practice is justified, given that the RDBMS will usually optimize out the unnecessary join anyway.

Improve query generated from entity framework

I have a LINQ query like this:
var query = from c in Context.Customers select c;
var result = query.ToList();
The LINQ query generates this T-SQL code:
exec sp_executesql N'SELECT
[Project1].[Id] AS [Id],
[Project1].[Name] AS [Name],
[Project1].[Email] AS [Email]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent1].[Email] AS [Email]
FROM [dbo].[Customers] AS [Extent1] ) AS [Project1]
Is there a way to avoid generating the subquery?
Do you have any evidence that that query is causing performance problems? I imagine the query-optimizer would easily recognize that.
If you're certain after profiling that the query is a performance problem (doubtful) - and only then - you could simply turn the query into a stored procedure, and call that instead.
You use a tool like LINQ because you don't want to write SQL; before abandoning that, you should at least compare the query plan of your proposed SQL with the one generated by the tool. I don't have access to SQL Studio at the moment, but I would be a bit surprised if the query plans weren't identical...
EDIT: having had a chance to check out the query plans, they are in fact identical.
Short answer: No, you cannot modify that query.
Long answer: If you want to reimplement the LINQ provider and query generator, then perhaps there is a way, but I doubt you want to do that. You could also implement a custom EF provider wrapper that takes the query passed from EF and reformats it, but that would be hard as well, and slow. Are you going to write a custom interpreter for SQL queries?

How do i modify Linq generated sql statement?

I have summarized my problem in the following code.
NorthwindDataContext dc = new NorthwindDataContext();
var query = from c in dc.Customers
select c;
The above code generates the following SQL statement:
SELECT [t0].[ID], [t0].[FirstName], [t0].[LastName]
FROM [dbo].[Customer] AS [t0]
Now I want to modify the generated query to something like this:
SELECT [t0].[ID], [t0].[FirstName], [t0].[LastName] FROM [dbo].[Customer]
AS [t0] WITH (nolock)
Is it possible in LINQ to modify the generated query? If yes, then how?
You will not be able to modify the generated L2S T-SQL code directly in the way you want (unless you modify the transaction isolation level). However, we've dealt with situations like this fairly simply by creating a view with the lock hints we want in place and querying the view instead of the table directly.
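For the isolation-level route mentioned above, here is a rough sketch. It assumes System.Transactions is available and that dirty reads are acceptable for the query; the ReadUncommittedSketch class and its Run helper are made-up names for illustration. Running at READ UNCOMMITTED affects every table touched inside the scope, which is broadly what a NOLOCK hint on each table would do.

```csharp
using System;
using System.Transactions;

public static class ReadUncommittedSketch
{
    // Runs the given query delegate inside an ambient READ UNCOMMITTED transaction.
    // The query must be fully materialized (e.g. ToList()) inside the delegate.
    public static T Run<T>(Func<T> query)
    {
        var options = new TransactionOptions
        {
            IsolationLevel = IsolationLevel.ReadUncommitted
        };

        using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
        {
            T result = query();
            scope.Complete();
            return result;
        }
    }

    public static void Main()
    {
        // No database involved here: the delegate only reports the ambient isolation level.
        var level = Run(() => Transaction.Current.IsolationLevel);
        Console.WriteLine(level); // ReadUncommitted
    }
}
```

With a real data context the call would look something like Run(() => dc.Customers.ToList()), so that the connection is opened and the rows are read while the scope is still active.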
I have found a very handy tip for modifying the LINQ-generated SQL statement.
NorthwindDataContext db = new NorthwindDataContext();
if (db.Connection.State == System.Data.ConnectionState.Closed)
db.Connection.Open();
var cmd = db.GetCommand(db.Customers.Where(p => p.ID == 1));
cmd.CommandText = cmd.CommandText.Replace("[Customers] AS [t0]", "[Customers] AS [t0] WITH (NOLOCK)");
var results = db.Translate<Customer>(cmd.ExecuteReader());
Maybe these pages will help you:
http://www.infoq.com/news/2008/03/linq-nolock
http://coolthingoftheday.blogspot.com/2008/03/linq-to-sql-nolock.html
which refers to Hanselman's blog entry:
http://www.hanselman.com/blog/GettingLINQToSQLAndLINQToEntitiesToUseNOLOCK.aspx
Or check out this question:
NOLOCK with Linq to SQL

Trouble understanding the SQL generated from this Entity Framework query

I created an Entity Framework model that contains two tables from the Northwind database to test some of its functionality: Products and Categories.
It automatically created an association between Category and Product which is 0..1 to *.
I wrote this simple query:
var beverages = from p in db.Products.Include("Category")
where p.Category.CategoryName == "Beverages"
select p;
var beverageList = beverages.ToList();
I started SQL Profiler and ran the code so I could see the SQL that it generates, and this is what it generated:
SELECT
[Extent1].[ProductID] AS [ProductID],
[Extent1].[ProductName] AS [ProductName],
[Extent1].[SupplierID] AS [SupplierID],
[Extent1].[QuantityPerUnit] AS [QuantityPerUnit],
[Extent1].[UnitPrice] AS [UnitPrice],
[Extent1].[UnitsInStock] AS [UnitsInStock],
[Extent1].[UnitsOnOrder] AS [UnitsOnOrder],
[Extent1].[ReorderLevel] AS [ReorderLevel],
[Extent1].[Discontinued] AS [Discontinued],
[Extent3].[CategoryID] AS [CategoryID],
[Extent3].[CategoryName] AS [CategoryName],
[Extent3].[Description] AS [Description],
[Extent3].[Picture] AS [Picture]
FROM [dbo].[Products] AS [Extent1]
INNER JOIN [dbo].[Categories] AS [Extent2]
ON [Extent1].[CategoryID] = [Extent2].[CategoryID]
LEFT OUTER JOIN [dbo].[Categories] AS [Extent3]
ON [Extent1].[CategoryID] = [Extent3].[CategoryID]
WHERE N'Beverages' = [Extent2].[CategoryName]
I am curious why the query inner joins to Categories and then left joins to it. The select statement is using the fields from the left joined table. Can someone help me understand the reason for this? If I remove the left join and change the select list to pull from Extent2 I get the same results for this query. In what situation would this not be true?
[Extent3] is the realization of Include("Category"). Include should not affect which rows are selected from the "main" table, Products, so it becomes a LEFT JOIN (all records from Products plus any matching records from the right table, Categories).
[Extent2] is there to filter the records by the related table Categories with the name "Beverages", so in this case it is a strict restriction (INNER JOIN).
Why two? :) Because the query is parsed expression by expression, and SQL is generated automatically for each statement (Include, Where).
You'll notice that the query is pulling all columns in the SELECT list from the copy of the Categories table aliased Extent3, but it's checking the CategoryName against the copy aliased Extent2.
In other words, in this scenario EF's query generation is not realizing that you're Include()ing and restricting the query via the same table, so it's blindly using two copies.
Unfortunately, beyond explaining what's going on, my experience with EF is not advanced enough to suggest a solution...
djacobson and igor explain pretty well why this happens. The way I personally use the Entity Framework, I avoid using Include altogether. Depending on what you're planning to do with the data, you could do something like this:
var beverages = from p in db.Products
select new {p, p.Category} into pc
where pc.Category.CategoryName == "Beverages"
select pc;
return beverages.ToList().Select(pc => pc.p);
... which, at least in EF 4.0, will produce just a single inner join. Entity Framework is smart enough to make it so that the product's Category property is populated with the category that came back from the database with it.
Of course, it's very likely that SQL Server optimizes things away so this won't actually gain you anything.
(Not directly an answer to your question if the queries are the same, but the comment field is too restricting for this)
If you leave out the .Include(), doesn't it load it anyway (because of the where)? Generally it makes more sense to me to use projections instead of Include():
var beverages = from p in db.Products
where p.Category.CategoryName == "Beverages"
select new { Product = p, Category = p.Category };
var beverageList = beverages.ToList();
