Using DISTINCT on a subquery to remove duplicates in Entity Framework


I have a question about the use of Distinct with Entity Framework, using SQL Server 2005. In this example:
practitioners = from p in context.Practitioners
join pn in context.ProviderNetworks on
p.ProviderId equals pn.ProviderId
where notNetworkIds.Contains(pn.Network)
select p;
practitioners = practitioners
.Distinct()
.OrderByDescending(p => p.UpdateDate);
data = practitioners.Skip(PageSize * (pageOffset ?? 0)).Take(PageSize).ToList();
It all works fine, but the use of Distinct is very inefficient: larger result sets incur unacceptable performance. The Distinct is only needed because multiple networks can be queried, causing Provider records to be duplicated. In effect I need to ask the DB "only return providers ONCE even if they're in multiple networks". If I could place the DISTINCT on ProviderNetworks, the query would run much faster.
How can I cause EF to add the DISTINCT only to the subquery, not to the entire result set?
The resulting simplified SQL I DON'T want is:
select DISTINCT p.* from Providers p
inner join Networks pn on p.ProviderId = pn.ProviderId
where NetworkName in ('abc','def')
The IDEAL SQL is:
select p.* from Providers p
inner join (select DISTINCT ProviderId from Networks
where NetworkName in ('abc','def'))
as pn on p.ProviderId = pn.ProviderId
Thanks
Dave

I don't think you need a Distinct here but an Exists (or Any, as it is called in LINQ).
Try this:
var q = (from p in context.Practitioners
where context.ProviderNetworks.Any(pn => pn.ProviderId == p.ProviderId && notNetworkIds.Contains(pn.Network))
orderby p.UpdateDate descending
select p).Skip(PageSize * (pageOffset ?? 0)).Take(PageSize).ToList();
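To see why Any avoids the duplicates in the first place, here is a minimal in-memory sketch that uses LINQ to Objects rather than Entity Framework. The Practitioner and ProviderNetwork classes below, and the AnyVersusJoin helper, are hypothetical stand-ins for the entities in the question; the SQL that EF would generate is not modelled here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the entities in the question.
public class Practitioner
{
    public int ProviderId { get; set; }
    public DateTime UpdateDate { get; set; }
}

public class ProviderNetwork
{
    public int ProviderId { get; set; }
    public string Network { get; set; }
}

public static class AnyVersusJoin
{
    // The join yields one row per matching network, so a provider that is
    // in two of the wanted networks appears twice.
    public static List<int> JoinIds(IEnumerable<Practitioner> practitioners,
                                    IEnumerable<ProviderNetwork> networks,
                                    ICollection<string> wanted)
    {
        return (from p in practitioners
                join pn in networks on p.ProviderId equals pn.ProviderId
                where wanted.Contains(pn.Network)
                select p.ProviderId).ToList();
    }

    // Any only asks whether at least one matching network exists, so each
    // provider appears at most once and no Distinct is needed.
    public static List<int> AnyIds(IEnumerable<Practitioner> practitioners,
                                   IEnumerable<ProviderNetwork> networks,
                                   ICollection<string> wanted)
    {
        return (from p in practitioners
                where networks.Any(pn => pn.ProviderId == p.ProviderId
                                         && wanted.Contains(pn.Network))
                orderby p.UpdateDate descending
                select p.ProviderId).ToList();
    }
}
```

With a provider that belongs to both 'abc' and 'def', JoinIds returns its id twice while AnyIds returns it once, which is the behaviour the question is after.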

Related

LINQ Left Join SQL equivalent via DefaultIfEmpty fails to return results

I have had an extensive look around on SE, tried all of the suggestions, checked MSDN for how to perform the equivalent of a left join in LINQ to SQL, and constructed my LINQ query according to the MSDN example.
However, the result is not what SQL would return, and I am completely lost as to where I am going wrong.
Here are some details:
I have two tables, Customers and Reports. A customer can submit many reports or none. In the current state I have many more reports than customers.
LINQ code:
var query = (from c in customers
join r in reports on c.Id equals r.Id into temp
from items in temp.DefaultIfEmpty()
select new {
c.Id,
LastReportDate = items?.DateCreated ?? DateTime.MinValue
}).ToList();
SQL code:
SELECT [Customers].[Id], R.LastReport AS LastReportDate FROM [Customers]
LEFT JOIN (
SELECT Reports.Id, MAX( [Reports].[Created] ) AS LastReport
FROM Reports GROUP BY Reports.Id
) AS r ON [Customers].[Id] = r.[Id]
The problem is that the query returns a number of elements equal to the number of reports. What I want instead is a list of all customers: for those who have submitted a report I wish to display the date of the most recent report; for those who have not submitted anything, I am happy to leave it NULL or DateTime.MinValue.
Any help would be greatly appreciated. I guess I am missing a group by call somewhere in my LINQ code...

I'm thinking probably something like this:
var query =
from c in customers
join r in reports on c.Id equals r.Id into g
select new
{
c.Id,
LastReportDate = g.Max(x => (DateTime?)x.Created)
};
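The group join above can be tried out in memory with LINQ to Objects. This is only a sketch: the Customer, Report and CustomerLastReport classes are hypothetical, and it assumes the reports carry a CustomerId foreign key, whereas the question joins on r.Id. The cast to DateTime? is what lets Max return null for a customer with no reports instead of throwing on an empty sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the tables in the question.
public class Customer
{
    public int Id { get; set; }
}

public class Report
{
    public int CustomerId { get; set; }   // assumed foreign key
    public DateTime Created { get; set; }
}

public class CustomerLastReport
{
    public int Id { get; set; }
    public DateTime? LastReportDate { get; set; }
}

public static class LastReportPerCustomer
{
    // One result row per customer, whether or not any reports exist.
    public static List<CustomerLastReport> Query(IEnumerable<Customer> customers,
                                                 IEnumerable<Report> reports)
    {
        return (from c in customers
                join r in reports on c.Id equals r.CustomerId into g
                select new CustomerLastReport
                {
                    Id = c.Id,
                    // Max over a nullable type yields null for an empty group.
                    LastReportDate = g.Max(x => (DateTime?)x.Created)
                }).ToList();
    }
}
```

Because the join uses into g without a following from ... in g, the result is not flattened, so its length matches the number of customers rather than the number of reports.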

You are now joining with join r in reports on c.Id equals r.Id into temp.
That joins Customers.Id to Reports.Id. Since you say there is a one-to-many relationship, I think your table will have a Reports.CustomerId. Is this correct?
So your query should look something like:
var results = customers.Where(c => c.Reports.Any())
.Select(c => new { c, LastReportDate = c.Reports.Max(r => r.Created) })
.ToList();
The select comes off the top of my head, so I am probably missing something ;)
Have you tried LINQPad? There you can type your LINQ queries and directly see the generated SQL and the results. Works like a charm!

Entity Framework LINQ to Entities Join Query Timeout

I am executing the following LINQ to Entities query, but it hangs and does not return a response before the timeout. I executed the same query on SQL Server and it returns 92000 rows in 3 seconds.
var query = (from r in WinCtx.PartsRoutings
join s in WinCtx.Tab_Processes on r.ProcessName equals s.ProcessName
join p in WinCtx.Tab_Parts on r.CustPartNum equals p.CustPartNum
select new { r}).ToList();
SQL Generated:
SELECT [ I omitted columns]
FROM [dbo].[PartsRouting] AS [Extent1]
INNER JOIN [dbo].[Tab_Processes] AS [Extent2] ON ([Extent1].[ProcessName] = [Extent2].[ProcessName]) OR (([Extent1].[ProcessName] IS NULL) AND ([Extent2].[ProcessName] IS NULL))
INNER JOIN [dbo].[Tab_Parts] AS [Extent3] ON ([Extent1].[CustPartNum] = [Extent3].[CustPartNum]) OR (([Extent1].[CustPartNum] IS NULL) AND ([Extent3].[CustPartNum] IS NULL))
The PartsRouting table has 100,000+ records, Parts has 15,000+, and Processes has 200.
I tried many things I found online, but nothing worked; I cannot get the same performance as the raw SQL.

Based on the comments, it looks like the issue is caused by the additional OR with IS NULL conditions in the joins generated by the EF SQL translator. They were added in EF to emulate the C# == operator semantics, which differ from SQL = for NULL values.
You can start by turning that EF behavior off through the UseDatabaseNullSemantics property (it's false by default):
WinCtx.Configuration.UseDatabaseNullSemantics = true;
Unfortunately that's not enough, because it fixes the normal comparison operators, but they simply forgot to do the same for join conditions.
If you are using the joins just for filtering (as it seems), you can replace them with LINQ Any conditions, which translate to SQL EXISTS; modern database query optimizers treat that the same way as an inner join:
var query = (from r in WinCtx.PartsRoutings
where WinCtx.Tab_Processes.Any(s => r.ProcessName == s.ProcessName)
where WinCtx.Tab_Parts.Any(p => r.CustPartNum == p.CustPartNum)
select new { r }).ToList();
You might also consider using just select r, since creating an anonymous type with a single property only introduces additional memory overhead with no advantages.
Update: Looking at the latest comment, you do need fields from the joined tables (that's why it's important not to omit relevant parts of the query in the question). In that case, you could try the alternative join syntax with where clauses:
WinCtx.Configuration.UseDatabaseNullSemantics = true;
var query = (from r in WinCtx.PartsRoutings
from s in WinCtx.Tab_Processes where r.ProcessName == s.ProcessName
from p in WinCtx.Tab_Parts where r.CustPartNum == p.CustPartNum
select new { r, s.Foo, p.Bar }).ToList();

Convert SQL to Linq Lambda

I have two entities (EF 6) named Purchases and Packets. I am able to join these two, but I am not quite sure how to count the Packets contained in a given Purchase. I have this SQL query to be converted to LINQ (lambda expression preferred).
Thank you
SELECT
Pur.*,
Pac.Price,
(SELECT COUNT(ID) FROM Packets WHERE PurchaseID = Pur.ID) AS PacketCount
FROM
Purchases AS Pur
INNER JOIN
Packets AS Pac
ON
Pur.ID = Pac.PurchaseID
NOTE: I checked the answered questions, but none of them addresses my issue.

I am no expert in LINQ (far from it) but I have done something similar. You say you have done the join already. If it is along the lines of:
var joinList = (from Item1 in Purchases
join Item2 in Packets
on Item1.Id equals Item2.PurchaseId
select new { Item1, Item2 }).ToList();
Then you can go:
var subList = joinList.Where(j => j.Item1.Id == myId).Select(s => new { s.Item1, s.Item2.Price, Count = joinList.Count(j => j.Item1.Id == myId) }).ToList();
This will give you a List similar to the recordset returned by your SQL. Note that you will need to break out all the fields in Item1 (equivalent to Purchases.*). The alternative is to name them all in the Select (as s.Item2.Price).
HTH
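As an alternative that covers every purchase at once instead of a single myId, the following is a rough lambda-syntax sketch run against in-memory collections (LINQ to Objects). The Purchase, Packet and PurchasePacketRow classes and the PacketCounts helper are hypothetical, and whether EF 6 translates the same shape into efficient SQL is not verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the two entities in the question.
public class Purchase
{
    public int Id { get; set; }
}

public class Packet
{
    public int Id { get; set; }
    public int PurchaseId { get; set; }
    public decimal Price { get; set; }
}

public class PurchasePacketRow
{
    public int PurchaseId { get; set; }
    public decimal Price { get; set; }
    public int PacketCount { get; set; }
}

public static class PacketCounts
{
    // One row per packet, each carrying the packet count of its purchase.
    // Purchases without packets produce no rows, as with the INNER JOIN.
    public static List<PurchasePacketRow> Query(IEnumerable<Purchase> purchases,
                                                IEnumerable<Packet> packets)
    {
        return purchases
            .GroupJoin(packets,
                       pur => pur.Id,
                       pac => pac.PurchaseId,
                       (pur, grp) => new { Purchase = pur, Packets = grp.ToList() })
            .SelectMany(x => x.Packets,
                        (x, pac) => new PurchasePacketRow
                        {
                            PurchaseId = x.Purchase.Id,
                            Price = pac.Price,
                            PacketCount = x.Packets.Count
                        })
            .ToList();
    }
}
```

GroupJoin gathers the packets of each purchase once, so the count is computed per purchase rather than by re-scanning the whole joined list for every row.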

LINQ Group by and having where clause

Below is the SQL Query I am trying to translate
SELECT dbo.Contracts.Supplier
FROM dbo.Contracts INNER JOIN dbo.Products ON dbo.Contracts.Product = dbo.Products.Product
where dbo.Products.ProductGroup='Crude'
GROUP BY dbo.Contracts.Supplier
Am I doing something wrong? I do not get the same results with the following LINQ:
var result = from c in context.Contracts
join p in context.Products on c.Product equals p.Product1
where p.Product1.Equals("Crude")
group c by c.Supplier into g
select new { supplier = g.Key };
It is generating a weird statement:
SELECT
1 AS [C1],
[Distinct1].[Supplier] AS [Supplier]
FROM ( SELECT DISTINCT
[Extent1].[Supplier] AS [Supplier]
FROM [dbo].[Contracts] AS [Extent1]
WHERE N'Crude' = [Extent1].[Product]
) AS [Distinct1]
Using distinct would work, but to get the same results LINQ should be generating a statement like this (it's as if it is ignoring the join):
SELECT distinct dbo.Contracts.Supplier
FROM dbo.Contracts INNER JOIN dbo.Products ON dbo.Contracts.Product = dbo.Products.Product
where dbo.Products.ProductGroup='Crude'

I'm assuming that you are using Entity Framework or LINQ to SQL. If so, you should be able to use navigation properties to navigate to the product and filter invalid results out. This way your query might look something like this:
var result = (from c in context.Contracts
where c.Products.Any(p => p.ProductGroup == "Crude")
select c.Supplier).Distinct();
It will automatically convert into a correct query (in this case possibly even without a join, just using the SQL EXISTS keyword) and return distinct suppliers. This is if I understand your objective correctly: you want to obtain all suppliers assigned to contracts that contain a product from the 'Crude' product group.
Basically, you should try to avoid joins in LINQ to SQL or LINQ to Entities whenever you can use navigation properties instead. The system will probably be better at converting them into specific SQL.
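For comparison, here is a small in-memory sketch (LINQ to Objects, no navigation properties) of the same idea. Note that the SQL in the question filters on ProductGroup, while the LINQ in the question filters on Product1; this sketch filters on the group, as the SQL does. The Contract and Product classes and the CrudeSuppliers helper are hypothetical, and Name stands in for the Product1 property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the two tables in the question.
public class Contract
{
    public string Supplier { get; set; }
    public string Product { get; set; }
}

public class Product
{
    public string Name { get; set; }          // Product1 in the question
    public string ProductGroup { get; set; }
}

public static class CrudeSuppliers
{
    // Distinct suppliers that have at least one contract whose product
    // belongs to the requested product group.
    public static List<string> Query(IEnumerable<Contract> contracts,
                                     IEnumerable<Product> products,
                                     string productGroup)
    {
        return (from c in contracts
                where products.Any(p => p.Name == c.Product
                                        && p.ProductGroup == productGroup)
                select c.Supplier)
               .Distinct()
               .ToList();
    }
}
```

Calling CrudeSuppliers.Query(contracts, products, "Crude") returns each qualifying supplier once, which matches the GROUP BY in the original SQL.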

Is this LINQ Query "correct"?

I have the following LINQ query, that is returning the results that I expect, but it does not "feel" right.
Basically it is a left join. I need ALL records from the UserProfile table.
LastWinnerDate is a single record from the winner table (which may hold multiple records per user), indicating the DateTime the last record was entered in that table for the user.
WinnerCount is the number of records for the user in the winner table (possibly multiple records).
Video1 is basically a bool indicating whether there is a record for the user in the winner table matching a third table, Objective (should be 1 or 0 rows).
Quiz1 is the same as Video1, matching another record from the Objective table (should be 1 or 0 rows).
Video and Quiz are repeated 12 times because this is for a report that lists all user records and indicates whether they have met the objectives.
var objectiveIds = new List<int>();
objectiveIds.AddRange(GetObjectiveIds(objectiveName, false));
var q =
from up in MetaData.UserProfile
select new RankingDTO
{
UserId = up.UserID,
FirstName = up.FirstName,
LastName = up.LastName,
LastWinnerDate = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner.CreatedOn).First(),
WinnerCount = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner).Count(),
Video1 = (
from winner in MetaData.Winner
join o in MetaData.Objective on winner.ObjectiveID equals o.ObjectiveID
where o.ObjectiveNm == Constants.Promotions.SecVideo1
where winner.Active
where winner.UserID == up.UserID
select winner).Count(),
Quiz1 = (
from winner2 in MetaData.Winner
join o2 in MetaData.Objective on winner2.ObjectiveID equals o2.ObjectiveID
where o2.ObjectiveNm == Constants.Promotions.SecQuiz1
where winner2.Active
where winner2.UserID == up.UserID
select winner2).Count(),
};

You're repeating the join to the winners table several times. To avoid that, you can break the query into several consecutive selects. So instead of one huge select, you can write two selects with less code. In your example I would first select the winners collection before selecting the other result properties:
var q1 =
from up in MetaData.UserProfile
select new {up,
winners = from winner in MetaData.Winner
where winner.Active
where winner.UserID == up.UserID
select winner};
var q = from upWinnerPair in q1
select new RankingDTO
{
UserId = upWinnerPair.up.UserID,
FirstName = upWinnerPair.up.FirstName,
LastName = upWinnerPair.up.LastName,
LastWinnerDate = /* Here you will have more simple and less repeatable code
using winners collection from "upWinnerPair.winners"*/

The query itself is pretty simple: just a main outer query and a series of subselects to retrieve actual column data. While it's not the most efficient means of querying the data you're after (joins and using windowing functions will likely get you better performance), it's the only real way to represent that query using either the query or expression syntax (windowing functions in SQL have no mapping in LINQ or the LINQ-supporting extension methods).
Note that you aren't doing any actual outer joins (left or right) in your code; you're creating subqueries to retrieve the column data. It might be worth looking at the actual SQL being generated by your query. You don't specify which ORM you're using (which would determine how to examine it client-side) or which database you're using (which would determine how to examine it server-side).
If you're using the ADO.NET Entity Framework, you can cast your query to an ObjectQuery and call ToTraceString().
If you're using SQL Server, you can use SQL Server Profiler (assuming you have access to it) to view the SQL being executed, or you can run a trace manually to do the same thing.
To perform an outer join in LINQ query syntax, do this:
Assuming we have two sources alpha and beta, each having a common Id property, you can select from alpha and perform a left join on beta in this way:
from a in alpha
join btemp in beta on a.Id equals btemp.Id into bleft
from b in bleft.DefaultIfEmpty()
select new { IdA = a.Id, IdB = b.Id }
Admittedly, the syntax is a little oblique. Nonetheless, it works and will be translated into something like this in SQL:
select
a.Id as IdA,
b.Id as Idb
from alpha a
left join beta b on a.Id = b.Id
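A runnable in-memory version of this pattern (LINQ to Objects) is sketched below. The Alpha, Beta and IdPair classes and the LeftJoinSketch helper are hypothetical. One difference from the database case is assumed here: in LINQ to Objects, b is a null reference when nothing matched, so the sketch guards the access and exposes IdB as a nullable int.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element types for the two sources.
public class Alpha
{
    public int Id { get; set; }
}

public class Beta
{
    public int Id { get; set; }
}

public class IdPair
{
    public int IdA { get; set; }
    public int? IdB { get; set; }   // null when alpha has no match in beta
}

public static class LeftJoinSketch
{
    public static List<IdPair> Query(IEnumerable<Alpha> alpha, IEnumerable<Beta> beta)
    {
        return (from a in alpha
                join btemp in beta on a.Id equals btemp.Id into bleft
                // DefaultIfEmpty supplies a single null when bleft is empty,
                // which keeps the unmatched alpha row in the result.
                from b in bleft.DefaultIfEmpty()
                select new IdPair
                {
                    IdA = a.Id,
                    IdB = b == null ? (int?)null : b.Id
                }).ToList();
    }
}
```

Every element of alpha appears at least once in the output, and an element with several matches in beta appears once per match, just as with a SQL left join.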

It looks fine to me, though I could see why the multiple sub-queries could trigger inefficiency worries in the eyes of a coder.
Take a look at what SQL is produced though (I'm guessing you're running this against a database source from your saying "table" above), before you start worrying about that. The query providers can be pretty good at producing nice efficient SQL that in turn produces a good underlying database query, and if that's happening, then happy days (it will also give you another view on being sure of the correctness).
