Linq making very inefficient Entity Framework query - c#

Entity Framework generates very poorly performing SQL for the following LINQ query:
var query = _context.Sessions
.Where(s => s.OrganizationId == orgId && s.Device != null && s.Device.User != null)
.Select(s => s.Device.User)
.Distinct();
Generates this SQL:
exec sp_executesql N'SELECT
[Distinct1].[Id] AS [Id],
[Distinct1].[Email] AS [Email],
[Distinct1].[Sex] AS [Sex],
[Distinct1].[Age] AS [Age]
FROM ( SELECT DISTINCT
[Extent4].[Id] AS [Id],
[Extent4].[Email] AS [Email],
[Extent4].[Sex] AS [Sex],
[Extent4].[Age] AS [Age]
FROM (SELECT [Extent1].[OrganizationId] AS [OrganizationId], [Extent3].[UserId] AS [UserId1]
FROM [dbo].[Sessions] AS [Extent1]
INNER JOIN [dbo].[Devices] AS [Extent2] ON [Extent1].[DeviceId] = [Extent2].[Id]
LEFT OUTER JOIN [dbo].[Devices] AS [Extent3] ON [Extent1].[DeviceId] = [Extent3].[Id]
WHERE [Extent2].[UserId] IS NOT NULL ) AS [Filter1]
LEFT OUTER JOIN [dbo].[Users] AS [Extent4] ON [Filter1].[UserId1] = [Extent4].[Id]
WHERE [Filter1].[OrganizationId] = #p__linq__0
) AS [Distinct1]',N'#p__linq__0 int',#p__linq__0=2
The SQL I'm actually looking to execute is the following, which runs lightning fast:
select distinct u.*
from Sessions s
inner join Devices d on s.DeviceId = d.Id
inner join Users u on d.UserId = u.Id
where OrganizationId = 2
How can I get the Entity Framework-generated SQL to be as close to this query as possible?

Why select the whole User entity if you just want the email?
Try this:
var query = _context.Sessions
.Where(s => s.OrganizationId == orgId && s.Device != null && s.Device.User != null)
.Select(s => s.Device.User.Email)
.Distinct();

Try starting with the users table:
var query = (
from u in _context.Users
where u.Devices.Any(d => d.Sessions
.Any(s => s.OrganisationId == orgId)
)
select u
);
It won't do the query you specified but what it does return might have the same good performance.

You can do it pretty simply:
_context.Sessions
.Where(s => s.OrganizationId == 2)
.Select(s => s.Device.User)
.Distinct();
You do not need to check for null, as it will perform an INNER JOIN for you.

I dont like use the DISTINCT , if a query contains it then the query's wrong.
Other way to do it
var query = _context.Sessions.Include("Device.User.Email")
.Where(s => s.OrganizationId == orgId);

Related

How to better write this EF core 6 Group by Query

I am looking at the queries generated by EF core 6 and I am not that happy about it as it generates lo much blocking overhead and in efficient TSQL and I am hoping that I am the problem...
My EF Query:
var query = db.PurchaseOrders
.Include(i => i.Items)
.Where(w=>w.Items.Any(a=>a.InventoryItemId==item.Id))
.GroupBy(g=>g.SupplierId)
.Select(s => new CurrentInventoryItemSupplier{ SupplierId=s.Key
, LastOrder= s.Max(m=>m.OrderDate)
, FirstOrder= s.Min(m=>m.OrderDate)
, Orders= s.Count()
}
).ToList();
Generates this:
SELECT [p].[SupplierId], (
SELECT MAX([p1].[OrderDate])
FROM [PurchaseOrders] AS [p1]
WHERE EXISTS (
SELECT 1
FROM [PurchasedItem] AS [p2]
WHERE ([p1].[Id] = [p2].[PurchaseOrderId]) AND ([p2].[InventoryItemId] = #__item_Id_0)) AND ([p].[SupplierId] = [p1].[SupplierId])) AS [LastOrder], (
SELECT MIN([p4].[OrderDate])
FROM [PurchaseOrders] AS [p4]
WHERE EXISTS (
SELECT 1
FROM [PurchasedItem] AS [p5]
WHERE ([p4].[Id] = [p5].[PurchaseOrderId]) AND ([p5].[InventoryItemId] = #__item_Id_0)) AND ([p].[SupplierId] = [p4].[SupplierId])) AS [FirstOrder], (
SELECT COUNT(*)
FROM [PurchaseOrders] AS [p7]
WHERE EXISTS (
SELECT 1
FROM [PurchasedItem] AS [p8]
WHERE ([p7].[Id] = [p8].[PurchaseOrderId]) AND ([p8].[InventoryItemId] = #__item_Id_0)) AND ([p].[SupplierId] = [p7].[SupplierId])) AS [Orders]
FROM [PurchaseOrders] AS [p]
WHERE EXISTS (
SELECT 1
FROM [PurchasedItem] AS [p0]
WHERE ([p].[Id] = [p0].[PurchaseOrderId]) AND ([p0].[InventoryItemId] = #__item_Id_0))
GROUP BY [p].[SupplierId]
(plan: https://www.brentozar.com/pastetheplan/?id=rkOloR7S9)
here ideally I would expect
select
P.[SupplierId], MAX([p].[OrderDate]) as LastOrder, MIN([p].[OrderDate]) as FirtOrder, COUNT(P.Id) as Orders
FROM [PurchaseOrders] AS [p]
join [PurchasedItem] AS [p2] on [P2].[PurchaseOrderId]=[P].ID
where [p2].InventoryItemId= #__item_Id_0
group by P.[SupplierId]
(https://www.brentozar.com/pastetheplan/?id=rkOloR7S9)
Is there a way to improve the generated TSQL or make a parametrized Function and call the function from EF core?
There is no way this will survive production data volumes
ok, I figured it out, I had to use the Join method and not the Include
This:
var query = db.PurchaseOrders
.Join(db.PurchaseOrderItems, po => po.Id, pi => pi.PurchaseOrderId
, (po, pi) => new { SupplierId = po.SupplierId, OrderDate = po.OrderDate, pi.InventoryItemId })
.Where(w => w.InventoryItemId == item.Id)
.GroupBy(g => g.SupplierId)
.Select(s => new CurrentInventoryItemSupplier
{
SupplierId = s.Key,
LastOrder = s.Max(m => m.OrderDate),
FirstOrder = s.Min(m => m.OrderDate),
Orders = s.Count(),
});
Generates
SELECT [p].[SupplierId], MAX([p].[OrderDate]) AS [LastOrder], MIN([p].[OrderDate]) AS [FirstOrder], COUNT(*) AS [Orders]
FROM [PurchaseOrders] AS [p]
INNER JOIN [PurchaseOrderItems] AS [p0] ON [p].[Id] = [p0].[PurchaseOrderId]
WHERE [p0].[InventoryItemId] = #__item_Id_0
GROUP BY [p].[SupplierId]
when you loop over it, nothing wrong with it :-)

Linq Query With Multiple Joins Not Giving Correct Results

I have a Linq query which is being used to replace a database function. This is the first one with multiple joins and I can't seem to figure out why it returns 0 results.
If you can see any difference which could result in the incorrect return it would be greatly appreciated......I've been trying to solve it longer than I should have.
Linq Query
context.StorageAreaRacks
.Join(context.StorageAreas, sar => sar.StorageAreaId, sa => sa.Id, (sar, sa) => new { sar, sa })
.Join(context.StorageAreaTypes, xsar => xsar.sar.StorageAreaId, sat => sat.Id, (xsar, sat) => new { xsar, sat })
.Join(context.Racks, xxsar => xxsar.xsar.sar.RackId, r => r.Id, (xxsar, r) => new { xxsar, r })
.Where(x => x.xxsar.sat.IsManual == false)
.Where(x => x.r.IsEnabled == true)
.Where(x => x.r.IsVirtual == false)
.Select(x => new { x.xxsar.sat.Id, x.xxsar.sat.Name })
.Distinct()
.ToList();
This is the query which is generated by the LINQ query
SELECT
[Distinct1].[C1] AS [C1],
[Distinct1].[Id] AS [Id],
[Distinct1].[Name] AS [Name]
FROM ( SELECT DISTINCT
[Extent2].[Id] AS [Id],
[Extent2].[Name] AS [Name],
1 AS [C1]
FROM [dbo].[StorageAreaRacks] AS [Extent1]
INNER JOIN [dbo].[StorageAreaTypes] AS [Extent2] ON [Extent1].[StorageAreaId] = [Extent2].[Id]
INNER JOIN [dbo].[Racks] AS [Extent3] ON [Extent1].[RackId] = [Extent3].[Id]
WHERE (0 = [Extent2].[IsManual]) AND (1 = [Extent3].[IsEnabled]) AND (0 = [Extent3].[IsVirtual])
) AS [Distinct1]
Sql Query which produces required results
SELECT DISTINCT sat.Name, sat.Id
FROM StorageAreaRacks sar
JOIN StorageAreas sa on sa.id = sar.StorageAreaId
JOIN StorageAreaTypes sat on sat.id = sa.StorageAreaTypeId
JOIN Racks r on r.id = sar.RackId
WHERE sat.IsManual = 0
AND r.IsEnabled = 1
AND r.IsVirtual = 0
Using joins with LINQ method syntax is hard to read and error prone.
Using joins with LINQ query syntax is better, but still error prone (you can join by the wrong key as you did) and does not give you information about join cardinality.
The best for LINQ to Entities queries is to use navigation properties (as Gert Arnold suggested in the comments and not only - see Don’t use Linq’s Join. Navigate!) because they have none of the aforementioned drawbacks.
The whole query should be something like this:
var query = context.StorageAreaRacks
.Where(sar => !sar.StorageArea.StorageAreaType.IsManual
&& sar.Rack.IsEnabled && !sar.Rack.IsVirtual)
.Select(sar => new
{
sar.StorageArea.StorageAreaType.Id,
sar.StorageArea.StorageAreaType.Name,
})
.Distinct();
or
var query = (
from sar in context.StorageAreaRacks
let sat = sar.StorageArea.StorageAreaType
let r = sar.Rack
where !sat.IsManual && r.IsEnabled && !r.IsVirtual
select new { sat.Id, sat.Name })
.Distinct();
Simple, readable and almost no place for mistakes. Navigation properties are one of the most beautiful features of EF, don't miss them.
Your LINQ doesn't translate the SQL properly; it Joins the StorageAreaTypes on the StorageAreaRack.StorageAreaId instead of on the StorageAreas.StorageAreaTypeId, which is why EF drops the StorageAreas Join - it has no effect on the outcome.
I think it is clearer if you elevate the members of each join to flatten the anonymous objects and name them based on their members (that are the join tables). Also, no reason to separate the Where clauses, LINQ can use && as well as SQL using AND. Also, if you have boolean values, don't compare them to true or false. Also there is no reason to pass range variables through that aren't used later.
Putting it all together:
var ans = context.StorageAreaRacks
.Join(context.StorageAreas, sar => sar.StorageAreaId, sa => sa.Id, (sar, sa) => new { sar, sa })
.Join(context.StorageAreaTypes, sarsa => sarsa.sa.StorageAreaTypeId, sat => sat.Id, (sarsa, sat) => new { sarsa.sar, sat })
.Join(context.Racks, sarsat => sarsat.sar.RackId, r => r.Id, (sarsat, r) => new { sarsat.sat, r })
.Where(satr => !satr.sat.IsManual && satr.r.IsEnabled && !satr.r.IsVirtual)
.Select(satr => new { satr.sat.Id, satr.sat.Name })
.Distinct()
.ToList();
However, I think when multiple joins are involved and when translating SQL, LINQ comprehension syntax can be easier to understand:
var ans = (from sar in context.StorageAreaRacks
join sa in context.StorageAreas on sar.StorageAreaId equals sa.Id
join sat in context.StorageAreaTypes on sa.StorageAreaTypeId equals sat.Id
join r in context.Racks on sar.RackId equals r.Id
where !sat.IsManual && r.IsEnabled && !r.IsVirtual
select new {
sat.Name,
sat.Id
}).Distinct().ToList();
You are missing a Where for your rack ID != null in your LINQ statement, and a Distinct().

Why are multiple where in LINQ so slow?

Using C# and Linq to SQL, I found that my query with multiple where is orders of magnitude slower than with a single where / and.
Here is the query
using (TeradiodeDataContext dc = new TeradiodeDataContext())
{
var filterPartNumberID = 71;
var diodeIDsInBlades = (from bd in dc.BladeDiodes
select bd.DiodeID.Value).Distinct();
var diodesWithTestData = (from t in dc.Tests
join tt in dc.TestTypes on t.TestTypeID equals tt.ID
where tt.DevicePartNumberID == filterPartNumberID
select t.DeviceID.Value).Distinct();
var result = (from d in dc.Diodes
where d.DevicePartNumberID == filterPartNumberID
where diodesWithTestData.Contains(d.ID)
where !diodeIDsInBlades.Contains(d.ID)
orderby d.Name
select d);
var list = result.ToList();
// ~15 seconds
}
However, when the condition in the final query is this
where d.DevicePartNumberID == filterPartNumberID
& diodesWithTestData.Contains(d.ID)
& !diodeIDsInBlades.Contains(d.ID)
// milliseconds
it is very fast.
Comparing the SQL in result before calling ToList(), here are the queries (value 71 manually added in place of #params)
-- MULTIPLE WHERE
SELECT [t0].[ID], [t0].[Name], [t0].[M2MID], [t0].[DevicePartNumberID], [t0].[Comments], [t0].[Hold]
FROM [dbo].[Diode] AS [t0]
WHERE (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t2].[value]
FROM (
SELECT [t1].[DiodeID] AS [value]
FROM [dbo].[BladeDiode] AS [t1]
) AS [t2]
) AS [t3]
WHERE [t3].[value] = [t0].[ID]
))) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t6].[value]
FROM (
SELECT [t4].[DeviceID] AS [value], [t5].[DevicePartNumberID]
FROM [dbo].[Test] AS [t4]
INNER JOIN [dbo].[TestType] AS [t5] ON [t4].[TestTypeID] = ([t5].[ID])
) AS [t6]
WHERE [t6].[DevicePartNumberID] = (71)
) AS [t7]
WHERE [t7].[value] = [t0].[ID]
)) AND ([t0].[DevicePartNumberID] = 71)
ORDER BY [t0].[Name]
and
-- SINGLE WHERE
SELECT [t0].[ID], [t0].[Name], [t0].[M2MID], [t0].[DevicePartNumberID], [t0].[Comments], [t0].[Hold]
FROM [dbo].[Diode] AS [t0]
WHERE ([t0].[DevicePartNumberID] = 71) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t3].[value]
FROM (
SELECT [t1].[DeviceID] AS [value], [t2].[DevicePartNumberID]
FROM [dbo].[Test] AS [t1]
INNER JOIN [dbo].[TestType] AS [t2] ON [t1].[TestTypeID] = ([t2].[ID])
) AS [t3]
WHERE [t3].[DevicePartNumberID] = (71)
) AS [t4]
WHERE [t4].[value] = [t0].[ID]
)) AND (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t6].[value]
FROM (
SELECT [t5].[DiodeID] AS [value]
FROM [dbo].[BladeDiode] AS [t5]
) AS [t6]
) AS [t7]
WHERE [t7].[value] = [t0].[ID]
)))
ORDER BY [t0].[Name]
The two SQL queries execute in < 1 second in SSMS and produce the same results.
So I'm wondering why the first is slower on the LINQ side. It's worrying to me because I know I've used multiple where elsewhere, without being aware of a such a severe performance impact.
This question even has answered with both multiple & and where. And this answer even suggests using multiple where clauses.
Can anyone explain why this happens in my case?
Because writing like this
if (someParam1 != 0)
{
myQuery = myQuery.Where(q => q.SomeField1 == someParam1)
}
if (someParam2 != 0)
{
myQuery = myQuery.Where(q => q.SomeField2 == someParam2)
}
is NOT(upd) the same as (in case when someParam1 and someParam2 != 0)
myQuery = from t in Table
where t.SomeField1 == someParam1
&& t.SomeField2 == someParam2
select t;
is (NOT deleted) the same as
myQuery = from t in Table
where t.SomeField1 == someParam1
where t.SomeField2 == someParam2
select t;
UPD
Yes, I do mistake. Second query is same, first is not same.
First and Second queries not EXACTLY the same. Let me show you what I mean.
1st query with lamda-expression writen as
t.Where(r => t.SomeField1 == someParam1 && t.SomeField2 == someParam2)
2nd query as
t.Where(r => r.SomeField1 == someParam1).Where(r => r.SomeField2 == someParam2)
In this case in generated SQL Predicate with SomeField2 goes first (it is important, see below)
In 1st case we getting this SQL:
SELECT <all field from Table>
FROM table t
WHERE t.SomeField1 = :someParam1
AND t.SomeField2 = :someParam2
In 2 case the SQL is:
SELECT <all field from Table>
FROM table t
WHERE t.SomeField2 = :someParam2
AND t.SomeField1 = :someParam1
As we see there are 2 'same' SQLs. As we see, the OP's SQLs are also 'same', they are different in order of predicates in WHERE clause (as in my example). And I guess that SQL optimizer generate 2 different execution plans and may be(!!!) doing NOT EXISTS, then EXISTS and then filtering take more time than do first filtering and after that do EXISTS and NOT EXISTS
UPD2
It is a 'problem' of Linq Provider (ORM). I'm using another ORM (linq2db), and it generates for me EXACTLY the same SQLs in both cases.

How can i get primary of a new table generated from OrderBy / GroupBy?

How can I get primary of a new table generated from OrderBy / GroupBy?
var something = (from m in _db.Requests
where m.StoreID == myRequest.StoreID
where m.AcceptedTime != null
where System.Data.Entity.DbFunctions.TruncateTime(m.RequestTime) == today
group m by m.StaffID into g
let TotalPoints = g.Count()
orderby TotalPoints ascending
select new { User = g.Key});
then, I try to get the 1st result which will be the least times "m" appeared in my Requests table
var thisStaff = something.Select(o=>o.User).Take(1).ToString();
However, the value of "thisStaff" is not StaffID which is the Key of my Request table. The value in it is
SELECT TOP (1) [Project1].[StaffID] AS [StaffID]
FROM ( SELECT [GroupBy1].[A1] AS [C1], [GroupBy1].[K1] AS [StaffID]
FROM ( SELECT [Extent1].[StaffID] AS [K1], COUNT(1) AS [A1]
FROM [dbo].[Requests] AS [Extent1]
WHERE ([Extent1].[StoreID] = #p__linq__0) AND
([Extent1].[AcceptedTime] IS NOT NULL) AND
((convert (datetime2, convert(varchar(255), [Extent1].[RequestTime], 102) , 102)) = #p__linq__1)
GROUP BY [Extent1].[StaffID] ) AS [GroupBy1] ) AS [Project1]
ORDER BY [Project1].[C1] ASC
Please suggest how i should change it. By the way, I've also tried using the following and get almost same result.
var something2 = _db.Requests
.Where(o => o.StoreID == myRequest.StoreID)
.Where(o => o.AcceptedTime != null)
.Where(o => System.Data.Entity.DbFunctions.TruncateTime(o.RequestTime) == today)
.GroupBy(x => x.StaffID)
.Select(x => new
{
Count = x.Count(),
Name = x.Key,
})
.OrderBy(x => x.Count)
.Take(1);
Your query is fine, you're just lacking the method call that will call the database and return the result (a feature of EF is deferred execution), FirstOrDefault() should do the trick:
var thisStaff = something.Select(o => o.User).FirstOrDefault();
The value that you see is the value that the IQueryable you have constructed returns for its ToString method (it will be the SQL that is run against the database when the query is executed).

Returning (blanks) in many to many Linq query

A follow up to this question: Changing a linq query to filter on many-many
I have the following Linq query
public static List<string> selectedLocations = new List<string>();
// I then populate selectedLocations with a number of difference strings, each
// corresponding to a valid Location
viewModel.people = (from c in db.People
select c)
.OrderBy(x => x.Name)
.ToList();
// Here I'm basically filtering my dataset to include Locations from
// my array of selectedLocations
viewModel.people = from c in viewModel.people
where (
from a in selectedLocations
where c.Locations.Any(o => o.Name == a)
select a
).Any()
select c;
How can I modify the query so that it also returns people that have NO location set at all?
You can do filtering on database side:
viewModel.people =
(from p in db.People
where !p.Locations.Any() ||
p.Locations.Any(l => selectedLocations.Contains(l.Name))
orderby p.Name
select p).ToList();
Or lambda syntax:
viewModel.people =
db.People.Where(p => !p.Locations.Any() ||
p.Locations.Any(l => selectedLocations.Contains(l.Name)))
.OrderBy(p => p.Name)
.ToList();
EF will generate two EXISTS subqueries in this case. Something like:
SELECT [Extent1].[Name]
[Extent1].[Id]
-- other fields from People table
FROM [dbo].[People] AS [Extent1]
WHERE (NOT EXISTS (SELECT 1 AS [C1]
FROM [dbo].[PeopleLocations] AS [Extent2]
WHERE [Extent2].[PersonId] = [Extent1].[Id])
OR EXISTS (SELECT 1 AS [C1]
FROM [dbo].[PeopleLocations] AS [Extent3]
WHERE [Extent3].[PersonId] = [Extent1].[Id])
AND [Extent3].[Name] IN ('location1', 'location2')))
ORDER BY [Extent1].[Name] ASC

Categories