optimize Entity Framework Linq query (selected 1 unexpected field) - c#

All,
Can anyone help me optimize the following EF/Linq query:
The EF/Linq query (taken from LinqPad):
Articles
.AsNoTracking()
.Where(a => a.Active == "J")
.SelectMany(a => KerlServices
.Where(ks => ks.Service.SAPProductNumber == a.SAPProductNumber))
.Select(ks => new {
ks.KerlCode,
ks.Service.SAPProductNumber,
ks.Service.Type })
.ToList()
The relation between Articles and Services (ks.Service.SAPProductNumber == a.SAPProductNumber) is in theory a 1:optional relation with cannot be defined in EF. This is however not my question.
The resulting SQL query:
SELECT
[Join1].[F_SERVICESID] AS [F_SERVICESID],
[Join1].[F_KERLCOD] AS [F_KERLCOD],
[Join1].[F_SAPARTNUM] AS [F_SAPARTNUM],
[Join1].[F_TYPE] AS [F_TYPE]
FROM [dbo].[T_ART] AS [Extent1]
INNER JOIN (SELECT [Extent2].[F_KERLCOD] AS [F_KERLCOD], [Extent2].[F_SERVICESID] AS [F_SERVICESID], [Extent3].[F_SAPARTNUM] AS [F_SAPARTNUM], [Extent3].[F_TYPE] AS [F_TYPE]
FROM [dbo].[T_SERVICESKERL] AS [Extent2]
INNER JOIN [dbo].[T_SERVICES] AS [Extent3] ON [Extent2].[F_SERVICESID] = [Extent3].[F_ID] ) AS [Join1] ON [Extent1].[F_SAPARTNUM] = [Join1].[F_SAPARTNUM]
WHERE N'J' = [Extent1].[F_ACTIND]
Why does EF generate a query that selects [Join1].[F_SERVICESID]? I don't need this field. Does anyone know a way to prevent this?
Kind regards, Jan.
ADDITION 1:
KerlServices
.AsNoTracking()
.Select(ks => new {
ks.KerlCode,
ks.Service.SAPProductNumber,
ks.Service.Type })
.Join(
Articles,
ks => ks.SAPProductNumber,
a => a.SAPProductNumber,
(ks, a) => new { ks, a.Active })
.Where(ksa => ksa.Active == "J")
.Select(ksa => ksa.ks)
.ToList()
results in:
SELECT
[Extent1].[F_SERVICESID] AS [F_SERVICESID],
[Extent1].[F_KERLCOD] AS [F_KERLCOD],
[Extent2].[F_SAPARTNUM] AS [F_SAPARTNUM],
[Extent2].[F_TYPE] AS [F_TYPE]
FROM [dbo].[T_SERVICESKERL] AS [Extent1]
INNER JOIN [dbo].[T_SERVICES] AS [Extent2] ON [Extent1].[F_SERVICESID] = [Extent2].[F_ID]
INNER JOIN [dbo].[T_ART] AS [Extent3] ON [Extent2].[F_SAPARTNUM] = [Extent3].[F_SAPARTNUM]
WHERE N'J' = [Extent3].[F_ACTIND]
This 'improvement' does not answer my own question, but the result surely looks prettier to me.
UPDATE 1:
The query in Ivan Stoev's answer produces the following SQL:
SELECT
[Extent1].[F_SERVICESID] AS [F_SERVICESID],
[Extent1].[F_KERLCOD] AS [F_KERLCOD],
[Extent2].[F_SAPARTNUM] AS [F_SAPARTNUM],
[Extent2].[F_TYPE] AS [F_TYPE]
FROM [dbo].[T_SERVICESKERL] AS [Extent1]
INNER JOIN [dbo].[T_SERVICES] AS [Extent2] ON [Extent1].[F_SERVICESID] = [Extent2].[F_ID]
WHERE EXISTS (SELECT
1 AS [C1]
FROM [dbo].[T_ART] AS [Extent3]
WHERE (N'J' = [Extent3].[F_ACTIND]) AND ([Extent3].[F_SAPARTNUM] = [Extent2].[F_SAPARTNUM])
)

Why does EF generate a query that selects [Join1].[F_SERVICESID]? I don't need this field.
That's weird if true, I have no explanation for that.
Can anyone help me optimize the following EF/Linq query
It's worth trying the following, which for me represents the most logical way to retrieve the data in question:
KerlServices
.AsNoTracking()
.Select(ks => new {
ks.KerlCode,
ks.Service.SAPProductNumber,
ks.Service.Type })
.Where(ks => Articles.Any(a => a.Active == "J" && a.SAPProductNumber == ks.SAPProductNumber)
.ToList()
UPDATE: Recently I've encountered that EF includes some additional fields in the generated SQL query when dialing with foreign key relations. These fields are not included in the projected result, so I think you should not worry about. Take any of the queries above, execute it inside the real code environment (VS Debug) and check the the projected list - I'm pretty sure the field in question will not be there.

Related

Remove duplicated SUM subquery in Entity Framework Core query

I have the following LINQ code:
return from policy in db.Policy.Include(it => it.LedgerLines)
let balance = policy.LedgerLines.Sum(it => it.Amount)
where balance > 0m && balance < 5m
select policy;
This gets translated to
SELECT ...
FROM [Policy] AS [p]
LEFT JOIN [PolicyLedger] AS [p0] ON [p].[Id] = [p0].[PolicyId]
WHERE (((SELECT SUM([p1].[Amount])
FROM [PolicyLedger] AS [p1]
WHERE [p].[Id] = [p1].[PolicyId]) > 0.0))
AND ((SELECT SUM([p2].[Amount])
FROM [PolicyLedger] AS [p2]
WHERE [p].[Id] = [p2].[PolicyId]) < 5.0)
ORDER BY [p].[Id], [p0].[Id]
Is there any way to only execute the SUM([p1].[Amount]) subquery once?
(EF Core 3.1)
The line
let balance = policy.LedgerLines.Sum(it => it.Amount)
which is the equivalent of intermediate projection clearly indicates the intent to reuse the expression.
But EF Core query translator puts a lot of efforts to produce "pretty" queries by eliminating subqueries as much as possible. Unfortunately in this case it seems to go too much in that regard.
With that being said, you can consider it to be a translation defect, leave the LINQ query "as is" and wait for improved translation - EFC 5.x doesn't improve that, may be EFC 6.0 or later, if ever.
But here is one not so distracting trick to let EFC 3.1 / 5.x generate JOIN to GROUP BY subquery and reuse the SUM expression.
The only change to the original LINQ query is to replace the above let statement with the following
from balance in policy.LedgerLines
.GroupBy(it => it.PolicyId)
.Select(g => g.Sum(it => it.Amount))
which gets translated to
SELECT ...
FROM [Policy] AS [p]
INNER JOIN (
SELECT SUM([p0].[Amount]) AS [c], [p0].[PolicyId]
FROM [PolicyLedger] AS [p0]
GROUP BY [p0].[PolicyId]
) AS [t] ON [p].[Id] = [t].[PolicyId]
LEFT JOIN [PolicyLedger] AS [p1] ON [p].[Id] = [p1].[PolicyId]
WHERE ([t].[c] > 0.0) AND ([t].[c] < 5.0)
ORDER BY [p].[Id], [p1].[Id]
You could start your query from the LedgerLine entity and use a GroupBy() to build the sum of the Amount column for each policy. However, you can't group on a navigation property, so you have to group on the PolicyId instead. This means you need to join the PolicyId column with the Policies table/DbSet afterwards to get the actual Policy entity (with any required included collection properties).
The code can look like this:
var result = context.LedgerLines
.Include(it => it.Policy)
.GroupBy(it => it.PolicyId)
.Select(it => new {
policyId = it.Key,
sum = it.Sum(a => a.Amount)
})
.Join(context.Policies.Include(it => it.LedgerLines),
it => it.policyId,
it => it.Id,
(a,b) => new {
a.sum,
policy=b
})
.Where(it => it.sum > 0m && it.sum < 5m)
.Select(it => it.policy)
.ToList();
This will generate a query like this (for MySQL):
SELECT `p`.`Id`, `p`.`Name`, `l0`.`Id`, `l0`.`Amount`, `l0`.`PolicyId`
FROM (
SELECT `l`.`PolicyId`, SUM(`l`.`Amount`) AS `c`
FROM `LedgerLines` AS `l`
GROUP BY `l`.`PolicyId`
) AS `t`
INNER JOIN `Policies` AS `p` ON `t`.`PolicyId` = `p`.`Id`
LEFT JOIN `LedgerLines` AS `l0` ON `p`.`Id` = `l0`.`PolicyId`
WHERE (CAST(`t`.`c` AS decimal(18, 2)) > 0) AND (CAST(`t`.`c` AS decimal(18, 2)) < 5)
ORDER BY `p`.`Id`, `l0`.`Id`
As you see only one SUM() call is used, but I'm unsure about the performance as you JOIN over the LedgerLines table twice, not to mention that this code looks weird and cumbersome.

EF Core LINQ query failing due to limitation?

I am trying to do quite a simple group by, and sum, with EF Core 3.0
However am getting a strange error:
System.InvalidOperationException: 'Processing of the LINQ expression
'AsQueryable((Unhandled parameter:
y).TransactionLines)' by 'NavigationExpandingExpressionVisitor'
failed. This may indicate either a bug or a limitation in EF Core.
var creditBalances = await context.Transaction
.Include(x => x.TransactionLines)
.Include(x=>x.CreditAccount)
.Where(x => x.CreditAccount.UserAccount.Id == userAccount.Id)
.GroupBy(x => new
{
x.CreditAccount.ExternalId
})
.Select(x => new
{
x.Key.ExternalId,
amount = x.Sum(y => y.TransactionLines.Sum(z => z.Amount))
})
.ToListAsync();
I'm battling to see where an issue can arise, so not even sure where to start. I am trying to get a sum of all the transaction amounts (Which is a Sum of all the TransactionLines for each transaction - i.e. A Transaction amount is made of the lines associated to it).
I then sum up all the transactions, grouping by then CreditAccount ID.
The line, Unhandled parameter: y is worrying. Maybe my grouping and summing is out.
So start at the TransactionLines level and this is as simple as:
var q = from c in context.TransactionLines
where c.Transaction.CreditAccount.UserAccount.Id == userAccount.Id
group c by c.Transaction.CreditAccount.ExternalId into g
select new
{
ExternalId = g.Key,
Amount = g.Sum(x => x.Amount)
};
var creditBalances = await q.ToListAsync();
( You don't need any Include() since you're not returning an Entity with related data. You're projecting a custom data shape. )
Which translates to:
SELECT [c].[ExternalId], SUM([t].[Amount]) AS [Amount]
FROM [TransactionLines] AS [t]
LEFT JOIN [Transaction] AS [t0] ON [t].[TransactionId] = [t0].[Id]
LEFT JOIN [CreditAccounts] AS [c] ON [t0].[CreditAccountId] = [c].[Id]
LEFT JOIN [UserAccount] AS [u] ON [c].[UserAccountId] = [u].[Id]
WHERE [u].[Id] = #__userAccount_Id_0
GROUP BY [c].[ExternalId]

Linq Query With Multiple Joins Not Giving Correct Results

I have a Linq query which is being used to replace a database function. This is the first one with multiple joins and I can't seem to figure out why it returns 0 results.
If you can see any difference which could result in the incorrect return it would be greatly appreciated......I've been trying to solve it longer than I should have.
Linq Query
context.StorageAreaRacks
.Join(context.StorageAreas, sar => sar.StorageAreaId, sa => sa.Id, (sar, sa) => new { sar, sa })
.Join(context.StorageAreaTypes, xsar => xsar.sar.StorageAreaId, sat => sat.Id, (xsar, sat) => new { xsar, sat })
.Join(context.Racks, xxsar => xxsar.xsar.sar.RackId, r => r.Id, (xxsar, r) => new { xxsar, r })
.Where(x => x.xxsar.sat.IsManual == false)
.Where(x => x.r.IsEnabled == true)
.Where(x => x.r.IsVirtual == false)
.Select(x => new { x.xxsar.sat.Id, x.xxsar.sat.Name })
.Distinct()
.ToList();
This is the query which is generated by the LINQ query
SELECT
[Distinct1].[C1] AS [C1],
[Distinct1].[Id] AS [Id],
[Distinct1].[Name] AS [Name]
FROM ( SELECT DISTINCT
[Extent2].[Id] AS [Id],
[Extent2].[Name] AS [Name],
1 AS [C1]
FROM [dbo].[StorageAreaRacks] AS [Extent1]
INNER JOIN [dbo].[StorageAreaTypes] AS [Extent2] ON [Extent1].[StorageAreaId] = [Extent2].[Id]
INNER JOIN [dbo].[Racks] AS [Extent3] ON [Extent1].[RackId] = [Extent3].[Id]
WHERE (0 = [Extent2].[IsManual]) AND (1 = [Extent3].[IsEnabled]) AND (0 = [Extent3].[IsVirtual])
) AS [Distinct1]
Sql Query which produces required results
SELECT DISTINCT sat.Name, sat.Id
FROM StorageAreaRacks sar
JOIN StorageAreas sa on sa.id = sar.StorageAreaId
JOIN StorageAreaTypes sat on sat.id = sa.StorageAreaTypeId
JOIN Racks r on r.id = sar.RackId
WHERE sat.IsManual = 0
AND r.IsEnabled = 1
AND r.IsVirtual = 0
Using joins with LINQ method syntax is hard to read and error prone.
Using joins with LINQ query syntax is better, but still error prone (you can join by the wrong key as you did) and does not give you information about join cardinality.
The best for LINQ to Entities queries is to use navigation properties (as Gert Arnold suggested in the comments and not only - see Don’t use Linq’s Join. Navigate!) because they have none of the aforementioned drawbacks.
The whole query should be something like this:
var query = context.StorageAreaRacks
.Where(sar => !sar.StorageArea.StorageAreaType.IsManual
&& sar.Rack.IsEnabled && !sar.Rack.IsVirtual)
.Select(sar => new
{
sar.StorageArea.StorageAreaType.Id,
sar.StorageArea.StorageAreaType.Name,
})
.Distinct();
or
var query = (
from sar in context.StorageAreaRacks
let sat = sar.StorageArea.StorageAreaType
let r = sar.Rack
where !sat.IsManual && r.IsEnabled && !r.IsVirtual
select new { sat.Id, sat.Name })
.Distinct();
Simple, readable and almost no place for mistakes. Navigation properties are one of the most beautiful features of EF, don't miss them.
Your LINQ doesn't translate the SQL properly; it Joins the StorageAreaTypes on the StorageAreaRack.StorageAreaId instead of on the StorageAreas.StorageAreaTypeId, which is why EF drops the StorageAreas Join - it has no effect on the outcome.
I think it is clearer if you elevate the members of each join to flatten the anonymous objects and name them based on their members (that are the join tables). Also, no reason to separate the Where clauses, LINQ can use && as well as SQL using AND. Also, if you have boolean values, don't compare them to true or false. Also there is no reason to pass range variables through that aren't used later.
Putting it all together:
var ans = context.StorageAreaRacks
.Join(context.StorageAreas, sar => sar.StorageAreaId, sa => sa.Id, (sar, sa) => new { sar, sa })
.Join(context.StorageAreaTypes, sarsa => sarsa.sa.StorageAreaTypeId, sat => sat.Id, (sarsa, sat) => new { sarsa.sar, sat })
.Join(context.Racks, sarsat => sarsat.sar.RackId, r => r.Id, (sarsat, r) => new { sarsat.sat, r })
.Where(satr => !satr.sat.IsManual && satr.r.IsEnabled && !satr.r.IsVirtual)
.Select(satr => new { satr.sat.Id, satr.sat.Name })
.Distinct()
.ToList();
However, I think when multiple joins are involved and when translating SQL, LINQ comprehension syntax can be easier to understand:
var ans = (from sar in context.StorageAreaRacks
join sa in context.StorageAreas on sar.StorageAreaId equals sa.Id
join sat in context.StorageAreaTypes on sa.StorageAreaTypeId equals sat.Id
join r in context.Racks on sar.RackId equals r.Id
where !sat.IsManual && r.IsEnabled && !r.IsVirtual
select new {
sat.Name,
sat.Id
}).Distinct().ToList();
You are missing a Where for your rack ID != null in your LINQ statement, and a Distinct().

using nHibernate QueryOver to join a subset

I am using nHibernate for our database access. I need to do a complicated query to find all member journal entries after a certain date with certain value, PreviousId, set for each member. I can easily write the SQL for it:
SELECT J.MemberId, J.PreviousId
FROM tblMemMemberStatusJournal J
INNER JOIN (
SELECT MemberId,
MIN(EffectiveDate) AS EffectiveDate
FROM tblMemMemberStatusJournal
WHERE EffectiveDate > #StartOfMonth
AND (PreviousId is NOT null)
GROUP BY MemberId
) AS X ON (X.EffectiveDate = J.EffectiveDate AND X.MemberId = J.MemberId)
However I am having a lot of trouble trying to get nHibernate to generate this information. There is not a lot of (any) documentation for how to use QueryOver.
I have been seeing information in other places, but none of it is very clear and very little has an actual explanation as to why things are done in certain ways. The answer for Selecting on Sub Queries in NHibernate with Critieria API did not give an adequate example as to what it is doing, so I haven't been able to replicate it.
I've gotten the inner part of the query created with this:
IList<object[]> result = session.QueryOver<MemberStatusJournal>()
.SelectList(list => list
.SelectGroup(a => a.Member.ID)
.SelectMin(a => a.EffectiveDate))
.Where(j => (j.EffectiveDate > firstOfMonth) && (j.PreviousId != null))
.List<object[]>();
Which, according to the profiler, makes this SQL:
SELECT this_.MemberId as y0_,
min(this_.EffectiveDate) as y1_
FROM tblMemMemberStatusJournal this_
WHERE (this_.EffectiveDate > '2014-08-01T00:00:00' /* #p0 */
and not (this_.PreviousLocalId is null))
GROUP BY this_.MemberId
But I am not finding a good example of how to actually do join this subset with a parent query. Does anyone have any suggestions?
You aren't actually joining on a subset, you're filtering on a subset. Knowing this, you have the option of filtering via other means, in this case, a correlated subquery.
The solution below first creates a detatched query to act as the inner subquery. We can correlate properties of the inner query with properties of the outer query through the use of an alias.
MemberStatusJournal memberStatusJournalAlias = null; // This will represent the
// object of the outer query
var subQuery = QueryOver.Of<MemberStatusJournal>()
.Select(Projections.GroupProperty(Projections.Property<MemberStatusJournal>(m => m.Member.ID)))
.Where(j => (j.EffectiveDate > firstOfMonth) && (j.PreviousId != null))
.Where(Restrictions.EqProperty(
Projections.Min<MemberStatusJournal>(j => j.EffectiveDate),
Projections.Property(() => memberStatusJournalAlias.EffectiveDate)
)
)
.Where(Restrictions.EqProperty(
Projections.GroupProperty(Projections.Property<MemberStatusJournal>(m => m.Member.Id)),
Projections.Property(() => memberStatusJournalAlias.Member.Id)
));
var results = session.QueryOver<MemberStatusJournal>(() => memberStatusJournalAlias)
.WithSubquery
.WhereExists(subQuery)
.List();
This would produce an SQL query like the following:
SELECT blah
FROM tblMemMemberStatusJournal J
WHERE EXISTS (
SELECT J2.MemberId
FROM tblMemberStatusJournal J2
WHERE J2.EffectiveDate > #StartOfMonth
AND (J2.PreviousId is NOT null)
GROUP BY J2.MemberId
HAVING MIN(J2.EffectiveDate) = J.EffectiveDate
AND J2.MemberId = J.MemberId
)
This looks less efficient than the inner join query you opened the question with. But my experience is that the SQL Query Optimizer is clever enough to convert this into an inner join. If you want to confirm this, you can use SQL Studio to generate and compare the execution plans of both queries.

QueryOver - adding additional criteria to a Join

I'm quite new to NHibernate and QueryOver and I can't get NHibernate to generate the SQL I need.
I need to make a join and have an extra criteria on so I avoid getting to much data from the table I'm joining with.
The SQL I receive from QueryOver is:
SELECT * FROM adresse this_
left outer join r580_test.afvigelse remarkalia1_ on this_.id=remarkalia1_.adrid
left outer join r580_test.afvigelseklagepunkter remarkcomp5_ on remarkalia1_.id=remarkcomp5_.afvigelseid
left outer join r580_test.klagepunkter complainta2_ on remarkcomp5_.klagepunktid=complainta2_.id
WHERE this_.id = 16633 and remarkalia1_.dato between '2009-03-13 00:00:00' and '02-03-2012 16:34:35'
What I would like is this(the where date between has been moved to the end for the first left outer join):
SELECT * FROM adresse this_
left outer join r580_test.afvigelse remarkalia1_ on this_.id=remarkalia1_.adrid and remarkalia1_.dato between '2009-03-13 00:00:00' and '02-03-2012 16:34:35'
left outer join r580_test.afvigelseklagepunkter remarkcomp5_ on remarkalia1_.id=remarkcomp5_.afvigelseid
left outer join r580_test.klagepunkter complainta2_ on remarkcomp5_.klagepunktid=complainta2_.id
WHERE this_.id = 16633
My QueryOver looks like this:
adr = session.QueryOver<Address>()
.Where(x => x.Id == 16633)
.JoinQueryOver<Remark>(y => y.Remarks).Where(y => y.Created > DateTime.Now.AddDays(-14))
.JoinAlias(y => y.RemarkComplaint, () => complaintAlias, JoinType.LeftOuterJoin)
.SingleOrDefault();
Anyone got an idea about how to fix this?
There are several overloads for joinqueryover - I believe you want something like:
Remark remark = null;
adr = session.QueryOver<Address>()
.Where(x => x.Id == 16633)
.JoinQueryOver<Remark>(y => y.Remarks, () => remark, y => y.Created > DateTime.Now.AddDays(-14))
.JoinAlias(y => y.RemarkComplaint, () => complaintAlias, JoinType.LeftOuterJoin)
.SingleOrDefault();
In this case the third parameter is the withClause which, I believe, will add the restriction to the join.

Categories