Count(x => ...) vs Where(x => ...).Count() - c#

Is there any difference between these two LINQ-to-Entities queries:
context.Table.Count(x => ...)
and
context.Table.Where(x => ...).Count()
in terms of performance and generated SQL?
I tried to look at the generated SQL myself, but I only know how to get the SQL from an IQueryable, and Count returns the value directly.

I have managed to see the SQL (thanks to @dasblinkenlight). The answer is no: both LINQ queries generate exactly the same SQL query, at least for a simple query without grouping:
SELECT
[GroupBy1].[A1] AS [C1]
FROM ( SELECT
COUNT(1) AS [A1]
FROM [dbo].[Table] AS [Extent1]
WHERE <condition>
) AS [GroupBy1]
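The reason Count is harder to inspect than Where can be sketched in memory. This is only an illustration, assuming a hypothetical integer array exposed through AsQueryable() in place of context.Table; it runs on LINQ to Objects, so it shows the deferred/immediate distinction and the equal results, not the SQL that a real provider would emit.

```csharp
using System;
using System.Linq;

public static class CountDemo
{
    // Hypothetical in-memory stand-in for context.Table.
    static readonly IQueryable<int> Table = new[] { 1, 2, 3, 4, 5, 6 }.AsQueryable();

    // Count(predicate) executes immediately and returns a plain int.
    public static int DirectCount() => Table.Count(x => x % 2 == 0);

    // Where(predicate) is deferred and still an IQueryable; only Count() executes it.
    public static int ChainedCount()
    {
        IQueryable<int> filtered = Table.Where(x => x % 2 == 0);
        // The pending query can still be inspected at this point.
        Console.WriteLine(filtered.Expression);
        return filtered.Count();
    }

    public static void Main()
    {
        // Both forms count the same three even numbers.
        Console.WriteLine(DirectCount() == ChainedCount());
    }
}
```

With a real context, a logging hook such as the db.Database.Log assignment used further down this page is one way to capture the SQL of queries that execute immediately.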

Related

Why does System.Data.Linq generate ROW_NUMBER() for Paging instead of OFFSET/FETCH for SQL Server 2012

We are using Linq and Entity Framework to access a SQL Server 2012 database. We are having some performance issues. After some investigation we were able to fix some of the problems, but I would like to use a SQL query with OFFSET/FETCH instead of the ROW_NUMBER() and BETWEEN syntax.
The performance difference is not so big. OFFSET/FETCH is quicker by about 10%. Do you have any idea why the generated query uses ROW_NUMBER() and BETWEEN syntax? What can I do to force Linq to generate OFFSET/FETCH query?
C# code:
var orders = dc.Orders.OrderBy(q => q.LastModifiedTimestamp)
.Skip(skipCount)
.Take(takeCount)
.ToList();
The currently generated query:
-- Region Parameters
DECLARE @p0 Int = 10
DECLARE @p1 Int = 10
-- EndRegion
SELECT [t2].[OrderId], [t2].[CustomerId]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t1].[OrderId], [t1].[CustomerId]) AS [ROW_NUMBER], [t1].[OrderId], [t1].[CustomerId]
FROM (
SELECT DISTINCT [t0].[OrderId], [t0].[CustomerId]
FROM [Order] AS [t0]
) AS [t1]
) AS [t2]
WHERE [t2].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p0 + @p1
ORDER BY [t2].[ROW_NUMBER]
The preferred query:
SELECT *
FROM [Order]
ORDER BY LastModifiedTimestamp
OFFSET 10000 ROWS
FETCH NEXT 10000 ROWS ONLY
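For reference, Skip and Take accept plain int counts rather than lambdas. The following is a minimal sketch of the paging call shape, assuming a hypothetical in-memory sequence of ids instead of dc.Orders; it runs on LINQ to Objects, so it only demonstrates which rows are selected. Whether a provider renders this as ROW_NUMBER()/BETWEEN or as OFFSET/FETCH depends on the provider and is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingDemo
{
    // Returns one page of an ordered sequence (hypothetical helper).
    public static List<int> Page(IEnumerable<int> source, int skipCount, int takeCount) =>
        source.OrderBy(x => x)
              .Skip(skipCount)   // int count of rows to skip
              .Take(takeCount)   // int count of rows to return
              .ToList();

    public static void Main()
    {
        IEnumerable<int> ids = Enumerable.Range(1, 25);
        // Skip 10 and take 10 selects rows 11..20,
        // the same window as BETWEEN @p0 + 1 AND @p0 + @p1 with both parameters set to 10.
        Console.WriteLine(string.Join(",", Page(ids, 10, 10)));
    }
}
```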

LINQ to SQL generates a different query for similar group by expressions

I noticed that when using GroupBy in Linq to SQL, there's a difference in the result query when providing a reference Id as the Key versus using the actual navigation property as the Key.
Example 1:
Employees.GroupBy(x => x.CompanyId).Select(g => g.Count())
Result SQL:
SELECT COUNT(*) AS [value]
FROM [Employees] AS [t0]
GROUP BY [t0].[CompanyId]
Example 2:
Employees.GroupBy(x => x.Company).Select(g => g.Count())
Result SQL:
SELECT [t1].[value]
FROM (
SELECT COUNT(*) AS [value], [t0].[DivisionDeductionID]
FROM [CheckDeductions] AS [t0]
GROUP BY [t0].[DivisionDeductionID]
) AS [t1]
LEFT OUTER JOIN [DivisionDeductions] AS [t2] ON [t2].[DivisionDeductionID] = [t1].[DivisionDeductionID]
Looking at Example #2, it is obvious that [t2] is never used other than in the LEFT JOIN itself. Why doesn't LINQ to SQL detect that and just use the same query as in Example #1? It groups by the ID field anyway.
It looks like EF's SQL generator has missed an opportunity to optimize the query: indeed, since [t2] is not used outside the outer join, it could be thrown away, along with the nested select.
It appears that the EF writers added a join for [t2] because they did not want to differentiate between (1) a situation where a navigation property is used only for its PK (so the corresponding FK could be used in its place) and (2) a situation where the query actually pulls additional fields from it.
This practice is completely justified, given that the RDBMS optimizes out the unnecessary join anyway.

How to optimize SQL query generated by Entity Framework in SQL Server Management Studio?

I created a LINQ query that returns a table of the most active salesmen in my shop:
ProjectDB3Context db = new ProjectDB3Context();
db.Database.Log = message => Trace.WriteLine(message);
var result = db.tblUsers.Join(db.tblSales,
u => u.ID,
sl => sl.tblUserId,
(u, sl) => new { u, sl })
.Select(o => new
{
UserId = o.u.ID,
Login = o.u.UserLogin,
FullName = o.u.Name + " " + o.u.Surname,
ItemsToSell = db.tblSales.Where(x => x.tblUserId == o.u.ID).Count()
})
.Distinct()
.OrderByDescending(x => x.ItemsToSell)
.ToList();
The generated SQL query looks like:
SELECT
[Distinct1].[C1] AS [C1],
[Distinct1].[ID] AS [ID],
[Distinct1].[UserLogin] AS [UserLogin],
[Distinct1].[C2] AS [C2],
[Distinct1].[C3] AS [C3]
FROM ( SELECT DISTINCT
[Project1].[ID] AS [ID],
[Project1].[UserLogin] AS [UserLogin],
1 AS [C1],
[Project1].[Name] + N' ' + [Project1].[Surname] AS [C2],
[Project1].[C1] AS [C3]
FROM ( SELECT
[Extent1].[ID] AS [ID],
[Extent1].[UserLogin] AS [UserLogin],
[Extent1].[Name] AS [Name],
[Extent1].[Surname] AS [Surname],
(SELECT
COUNT(1) AS [A1]
FROM [dbo].[tblSale] AS [Extent3]
WHERE [Extent3].[tblUserId] = [Extent1].[ID]) AS [C1]
FROM [dbo].[tblUser] AS [Extent1]
INNER JOIN [dbo].[tblSale] AS [Extent2] ON [Extent1].[ID] = [Extent2].[tblUserId]
) AS [Project1]
) AS [Distinct1]
ORDER BY [Distinct1].[C3] DESC
Statistics:
SQL Server Execution Times:
CPU time = 359 ms, elapsed time = 529 ms.
Execution plan screen shot
I want to optimize the generated SQL query and insert optimized query into a stored procedure. SQL Server Management Studio gives me a tip to create a nonclustered index (tblUserId) on tblSale (you can see this tip in image that I included).
When I create it using command:
CREATE NONCLUSTERED INDEX IX_ProductVendor_tblUserId
ON tblSale (tblUserId);
and then run the SQL query in SQL Server Management Studio I get:
SQL Server Execution Times:
CPU time = 328 ms, elapsed time = 631 ms.
So it takes longer after I added the index to optimize my SQL query.
Can anybody help me optimize this query in SQL Server using indexes?
"Can anybody help me optimize this query in SQL Server using indexes?"
First off, before trying to optimize the SQL query in the database, make sure your LINQ query is optimal, which is not the case with yours. There is an unnecessary join, which in turn requires Distinct, and tblSales is accessed twice (see the generated SQL).
What you are trying to achieve is to get users with sales, ordered by sales count descending. The following simple query should produce the desired result:
var result = db.tblUsers
.Select(u => new
{
UserId = u.ID,
Login = u.UserLogin,
FullName = u.Name + " " + u.Surname,
ItemsToSell = db.tblSales.Count(s => s.tblUserId == u.ID)
})
.Where(x => x.ItemsToSell > 0)
.OrderByDescending(x => x.ItemsToSell)
.ToList();
Try and see the new execution plan/time.
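To check that dropping the join and Distinct does not change the result, the two query shapes can be compared in memory. This is a rough sketch using hypothetical tuple arrays in place of db.tblUsers and db.tblSales and projecting only the id and the count; it runs on LINQ to Objects, so it says nothing about the SQL or the execution time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SalesDemo
{
    // Hypothetical sample data: user 3 has no sales.
    static readonly (int ID, string UserLogin)[] Users =
        { (1, "ann"), (2, "bob"), (3, "cat") };
    static readonly (int ID, int tblUserId)[] Sales =
        { (10, 1), (11, 2), (12, 2) };

    // Original shape: the join filters out users without sales, Distinct removes the repeats.
    public static List<(int UserId, int ItemsToSell)> WithJoin() =>
        Users.Join(Sales, u => u.ID, s => s.tblUserId, (u, s) => u)
             .Select(u => (UserId: u.ID, ItemsToSell: Sales.Count(x => x.tblUserId == u.ID)))
             .Distinct()
             .OrderByDescending(x => x.ItemsToSell)
             .ToList();

    // Simplified shape: count per user, then keep users with at least one sale.
    public static List<(int UserId, int ItemsToSell)> WithoutJoin() =>
        Users.Select(u => (UserId: u.ID, ItemsToSell: Sales.Count(s => s.tblUserId == u.ID)))
             .Where(x => x.ItemsToSell > 0)
             .OrderByDescending(x => x.ItemsToSell)
             .ToList();

    public static void Main()
    {
        // Prints True when both shapes return the same rows in the same order.
        Console.WriteLine(WithJoin().SequenceEqual(WithoutJoin()));
    }
}
```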
"I want to optimize the generated SQL query and insert optimized query into a stored procedure."
Bzzt. Wrong.
Your query is already "optimized" - in that there isn't anything you can do to the query itself to improve its runtime performance.
Stored procedures in SQL Server do not have any kind of magic-optimization or other real advantages over immediately-executed queries. Stored procedures do benefit from cached execution plans, but so do immediate-queries after their first execution, and execution-plan generation isn't that expensive an operation.
Anyway, using stored procedures for read-only SELECT operations is inadvisable. It is better to use a UDF (CREATE FUNCTION) so you can take advantage of function composition, which can be optimized and run far better than nested stored procedure calls.
If SQL Server's Show Execution Plan feature tells you to create an index, that is outside EF's responsibility, and outside a stored procedure's responsibility too. Just define the index in your database and include it in your setup script. Your EF-generated query will then run much faster without having to be a stored procedure.

Why is a linq-to-sql query translated into a subquery?

Why is this LINQ query:
(from c in Orders
select new
{
Id=c.Id,
DeliveryDate = c.DeliveryDate.Value
}).Take(10)
is translated into
SELECT TOP (10) [t1].[Id], [t1].[value] AS [DeliveryDate]
FROM (
SELECT [t0].[Id], [t0].[DeliveryDate] AS [value]
FROM [Orders] AS [t0]
) AS [t1]
but when I change DeliveryDate = c.DeliveryDate.Value into DeliveryDate = c.DeliveryDate SQL query looks as simple as:
SELECT TOP (10) [t0].[Id], [t0].[DeliveryDate]
FROM [Orders] AS [t0]
I think this is because the LINQ2SQL's translator is under-optimized. Use of a "property" (Value) triggers creation of a sub-query, which turns out to be unnecessary.
It is worth noting that any RDBMS worth its salt would generate identical query plans for both SQL queries, so in the end it would not matter either way.
Possibly a bug or a missed optimization. I cannot explain it any other way.

Improve query generated from entity framework

I have a LINQ query like this:
var query = from c in Context.Customers select c;
var result = query.ToList();
The LINQ query generates this T-SQL code:
exec sp_executesql N'SELECT
[Project1].[Id] AS [Id],
[Project1].[Name] AS [Name],
[Project1].[Email] AS [Email]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent1].[Email] AS [Email]
FROM [dbo].[Customers] AS [Extent1] ) AS [Project1]'
Is there a way to avoid generating the subquery?
Do you have any evidence that that query is causing performance problems? I imagine the query-optimizer would easily recognize that.
If you're certain after profiling that the query is a performance problem (doubtful) - and only then - you could simply turn the query into a stored procedure, and call that instead.
You use a tool like Linq because you don't want to write SQL. Before abandoning that, you should at least compare the query plan of your proposed SQL with the one generated by the tool. I don't have access to SQL Studio at the moment, but I would be a bit surprised if the query plans aren't identical...
EDIT: having had a chance to check out the query plans, they are in fact identical.
Short answer: No, you cannot modify that query.
Long answer: If you want to reimplement the Linq provider and query generator, then perhaps there is a way, but I doubt you want to do that. You could also implement a custom EF provider wrapper that takes the query passed from EF and reformats it, but that will be hard as well, and slow. Are you going to write a custom interpreter for SQL queries?
