Entity Framework LINQ to Entities Join Query Timeout - c#

I am executing the following LINQ to Entities query but it is stuck and does not return response until timeout. I executed the same query on SQL Server and it return 92000 in 3 sec.
var query = (from r in WinCtx.PartsRoutings
join s in WinCtx.Tab_Processes on r.ProcessName equals s.ProcessName
join p in WinCtx.Tab_Parts on r.CustPartNum equals p.CustPartNum
select new { r}).ToList();
SQL Generated:
SELECT [ I omitted columns]
FROM [dbo].[PartsRouting] AS [Extent1]
INNER JOIN [dbo].[Tab_Processes] AS [Extent2] ON ([Extent1].[ProcessName] = [Extent2].[ProcessName]) OR (([Extent1].[ProcessName] IS NULL) AND ([Extent2].[ProcessName] IS NULL))
INNER JOIN [dbo].[Tab_Parts] AS [Extent3] ON ([Extent1].[CustPartNum] = [Extent3].[CustPartNum]) OR (([Extent1].[CustPartNum] IS NULL) AND ([Extent3].[CustPartNum] IS NULL))
PartsRouting Table has 100,000+ records, Parts = 15000+, Processes = 200.
I tried too many things found online but nothing worked for me as to how I can achieve the result with same performance of SQL.

Based on the comments, looks like the issue is caused by the additional OR with IS NULL conditions in joins generated by the EF SQL translator. They were added in EF in order to emulate the C# == operator semantics which are different from SQL = for NULL values.
You can start by turning that EF behavior off through UseDatabaseNullSemantics property (it's false by default):
WinCtx.Configuration.UseDatabaseNullSemantics = true;
Unfortunately that's not enough, because it fixes the normal comparison operators, but they simply forgot to do the same for join conditions.
In case you are using joins just for filtering (as it seems), you can replace them with LINQ Any conditions which translates to SQL EXISTS and nowadays database query optimizers are treating it the same way as if it was an inner join:
var query = (from r in WinCtx.PartsRoutings
where WinCtx.Tab_Processes.Any(s => r.ProcessName == s.ProcessName)
where WinCtx.Tab_Parts.Any(p => r.CustPartNum == p.CustPartNum)
select new { r }).ToList();
You might also consider using just select r since creating anonymous type with single property just introdeces additional memory overhead with no advantages.
Update: Looking at the latest comment, you do need fields from joined tables (that's why it's important to not omit relevant parts of the query in question). In such case, you could try the alternative join syntax with where clauses:
WinCtx.Configuration.UseDatabaseNullSemantics = true;
var query = (from r in WinCtx.PartsRoutings
from s in WinCtx.Tab_Processes where r.ProcessName == s.ProcessName
from p in WinCtx.Tab_Parts where r.CustPartNum == p.CustPartNum
select new { r, s.Foo, p.Bar }).ToList();

Related

Multiple LINQ Join conditions with differing comparison operators

I am trying to build a LINQ query that will accomodate a dynamic list of predicates, but also provide multiple non-equity join conditions between two tables. The ORM I am using is Telerik Open Access/Data Access going against an Oracle database.
Here is the PL-SQL Query I am trying to build in Linq.
SELECT DISTINCT
"asset".asset_number
, "hdr".revision
, "hdr".syscfg_booth_num
, "hdr".message_id
, "hdr".msg_num
, "hdr".msg_create_date
, "hdr".msg_xmit_date
, "hdr".skytel_date
FROM xfe_rep.wi_transmits "hdr"
INNER JOIN xfe_rep.pin2pagerid "asset" ON
"hdr".pin = "asset".wireless_pager_pin
AND
"hdr".msg_create_date >= "asset".effective_start_date
AND
"hdr".msg_create_date <= "asset".effective_end_date
WHERE
"hdr".field_one = ??
AND
"hdr".field_two = ??;
The WHERE clause will be added dynamically from individual predicates provided in a list. My first pass looked something like this ...
IQueryable<WITransmits> query = from wiTransmits in ctx.WITransmits
join assetMap in ctx.AssetNumberMaps
on wiTransmits.PIN equals assetMap.PIN
where
assetMap.EffectiveStartDate <= wiTransmits.MessageCreateDate &&
assetMap.EffectiveEndDate >= wiTransmits.MessageCreateDate
select wiTransmits;
foreach (Expression<Func<WITransmits, Boolean>> filter in messagePredicate)
{
query = query.Where(filter);
}
The problem with the above code is that it raises the exception "ORA-01013: user requested cancel of current operation". I believe that this is due to the number of records being returned. I have learned from other tests that when the WHERE clause is added via both the expression and method syntax to the same query it generates multiple SQL Select statements and executes each separately against the database performing the actual join in memory instead. With millions of records this is latent and impractical, I believe giving rise to the exception.
Of course when I removed the WHERE clause from the Expression syntax only one SQL statement was created, and executes as expected.
IQueryable<WITransmits> query = from wiTransmits in ctx.WITransmits
join assetMap in ctx.AssetNumberMaps
on wiTransmits.PIN equals assetMap.PIN
select wiTransmits;
foreach (Expression<Func<WITransmits, Boolean>> filter in messagePredicate)
{
query = query.Where(filter);
}
But, of course I lose the JOIN filters on the dates. The other alternative I had considered was to add multiple conditions to the JOIN. This, however, only allows me to compare for EQUALITY between two object definitons. I need to compare each property with a different comparison operator ( ==, >=, <= ) though ... so again this does not work.
IQueryable<WITransmits> query = from wiTransmits in ctx.WITransmits
join assetMap in ctx.AssetNumberMaps on
new { wiTransmits.PIN, wiTransmits.MessageCreateDate, wiTransmits.MessageCreateDate } equals
new { assetMap.PIN, assetMap.EffectiveStartDate, assetMap.EffectiveEndDate }
select wiTransmits;
foreach (Expression<Func<WITransmits, Boolean>> filter in messagePredicate)
{
query = query.Where(filter);
}
The final thought I had was to simply add the conditions as a couple of additional WHERE filters. The problem here is that the WHERE clause is between two table values and not a static value provided by the consumer. The query does not return a Queryable typed for both tables involved in the comparison so this will not work either.
IQueryable<WITransmits> query = from wiTransmits in ctx.WITransmits
join assetMap in ctx.AssetNumberMaps
on wiTransmits.PIN equals assetMap.PIN
select wiTransmits;
var joinPredicate = new List<Expression<Func<WITransmits, AssetNumberMaps, Boolean>>>();
joinPredicate.Add((wiTransmits, assetMap) => assetMap.EffectiveStartDate <= wiTransmits.MessageCreateDate);
joinPredicate.Add((wiTransmits, assetMap) => assetMap.EffectiveEndDate >= wiTransmits.MessageCreateDate);
foreach (Expression<Func<WITransmits, AssetNumberMaps, Boolean>> filter in joinPredicate)
{
query = query.Where(filter); // DOES NOT WORK
}
foreach (Expression<Func<WITransmits, Boolean>> filter in messagePredicate)
{
query = query.Where(filter);
}
Anybody have any ideas on how to do this? I am fresh out of direction ....

Duplicated joins in Npgsql for the same navigation property

This question already exists here, but for Entity Framework to TSQL
When using the same navigation property multiple times on a select, Npgsql query results in multiple joins, one for every use of the navigation property. This result in an awful performance hit (tested)
I've read that this is a problem with EF 4, but this problem also occurs on EF 6.
I think that this is an issue with the Npgsql LINQ to SQL translator
This is the code that Npgsql generate for the same navigation property used mutliple times, obviously, only one join is needed (copied from the other questions because is exactly the same case)
LEFT OUTER JOIN [dbo].[Versions] AS [Extent4] ON [Extent1].[IDVersionReported] = [Extent4].[ID]
LEFT OUTER JOIN [dbo].[Versions] AS [Extent5] ON [Extent1].[IDVersionReported] = [Extent5].[ID]
LEFT OUTER JOIN [dbo].[Versions] AS [Extent6] ON [Extent1].[IDVersionReported] = [Extent6].[ID]
LEFT OUTER JOIN [dbo].[Versions] AS [Extent7] ON [Extent1].[IDVersionReported] = [Extent7].[ID]
Is it posible to tune PostgreSql to optimize repeated joins?
If not, which option is the best for solving this problem?
Wait until Npgsql gets fixed
Download Npgsql code and find the way to fix it
Intercept generated SQL before reaching the database, parse it, and remove duplicated joins. (Read here )
Do not use navigation properties, use LINQ joins instead
Indeed, this is a problem of Entity Framework, but I found a workaround, hope that this helps someone.
This was the original where part of the LINQ query:
from cr in Creditos
where cr.validado == 1 &&
cr.fecharegistro >= Desde &&
cr.fecharegistro <= Hasta &&
!ProductosExcluidos.Contains(cr.idproducto.Value) &&
cr.amortizaciones.Sum(am => am.importecapital - am.pagoscap - am.capcancel) > 1
//All references to the navigation property cr.numcliente
//results on a separated LEFT OUTTER JOIN between this the 'creditos' and 'clientes' tables
select new ArchivoCliente
{
RFC = cr.numcliente.rfc,
Nombres = cr.numcliente.nombres,
ApellidoPaterno = cr.numcliente.apellidopaterno,
ApellidoMaterno = cr.numcliente.apellidomaterno,
}
Note the last condition of the where, that condition does a sum of all child entities of cr, if we take out that last condition, all the duplicated LEFT OUTTER JOIN are replaced by one single JOIN, for some reason, Entity Framework doesn't like subqueries or aggregates on the where part of the query
If instead we replace the original query with this other equivalent query only a single LEFT OUTTER JOIN is generated.
(from cr in Creditos
where cr.validado == 1 &&
cr.fecharegistro >= Desde &&
cr.fecharegistro <= Hasta &&
!ProductosExcluidos.Contains(cr.idproducto.Value) &&
//Excluded aggregate function condition from the first where
//the value is now on the select and used for posterior filtering
select new ArchivoCliente
{
RFC = cr.numcliente.rfc,
Nombres = cr.numcliente.nombres,
ApellidoPaterno = cr.numcliente.apellidopaterno,
ApellidoMaterno = cr.numcliente.apellidomaterno,
SumaAmort = cr.amortizaciones.Sum(am => am.importecapital - am.pagoscap - am.capcancel)
}).Where (x => x.SumaAmort > 1);
Now instead of directly filtering on the first where statement, the aggregate result is stored as part of the projection, and then, a second where is applied to the resulting query.
This results in a much faster query, with only the necessary joins on the translated SQL statement.

LINQ Group by and having where clause

Below is the SQL Query I am trying to translate
SELECT dbo.Contracts.Supplier
FROM dbo.Contracts INNER JOIN dbo.Products ON dbo.Contracts.Product = dbo.Products.Product
where dbo.Products.ProductGroup='Crude'
GROUP BY dbo.Contracts.Supplier
Am I doing something wrong because I do not get same results with the following LINQ
var result = from c in context.Contracts
join p in context.Products on c.Product equals p.Product1
where p.Product1.Equals("Crude")
group c by c.Supplier into g
select new { supplier = g.Key };
It is generating a weird statement
SELECT
1 AS [C1],
[Distinct1].[Supplier] AS [Supplier]
FROM ( SELECT DISTINCT
[Extent1].[Supplier] AS [Supplier]
FROM [dbo].[Contracts] AS [Extent1]
WHERE N'Crude' = [Extent1].[Product]
) AS [Distinct1]
Using distinct would work but to get same results, LINQ should be generating a statement like so (it's like it is ignoring the join):
SELECT distinct dbo.Contracts.Supplier
FROM dbo.Contracts INNER JOIN dbo.Products ON dbo.Contracts.Product = dbo.Products.Product
where dbo.Products.ProductGroup='Crude'
I'm assuming that you are using 'EntityFramework' or 'Linq To SQL'. If so, you should be able to use navigation properties to navigate to product and filter invalit results out. This way your query might look something like this:
var result = (from c in context.Contracts
where c.Products.Any(p => p.ProductGroup == "Crude")
select c.Supplier).Distinct();
It will automatically convert into correct query (in this case possibly without join even, just using Exists sql keyword) and return distinct suppliers. This is if I understand your objective correctly - you want to obtain all suppliers assigned to contracts that contain product from 'Crude' product group.
Basically you should try to avoid using joins from linq to sql or linq to entities as much as possible when you can use navigation properties. System will probably be better at converting them into specific sql.

Using DISTINCT on a subquery to remove duplicates in Entity Framework

I have question about use of Distinct with Entity Framework, using Sql 2005. In this example:
practitioners = from p in context.Practitioners
join pn in context.ProviderNetworks on
p.ProviderId equals pn.ProviderId
(notNetworkIds.Contains(pn.Network))
select p;
practitioners = practitioners
.Distinct()
.OrderByDescending(p => p.UpdateDate);
data = practitioners.Skip(PageSize * (pageOffset ?? 0)).Take(PageSize).ToList();
It all works fine, but the use of distinct is very inefficient. Larger result sets incur unacceptable performance. The DISTINCT is killing me. The distinct is only needed because multiple networks can be queried, causing Providers records to be duplicated. In effect I need to ask the DB "only return providers ONCE even if they're in multiple networks". If I could place the DISTINCT on the ProviderNetworks, the query runs much faster.
How can I cause EF to add the DISTINCT only the subquery, not to the entire resultset?
The resulting simplified sql I DON'T want is:
select DISTINCT p.* from Providers
inner join Networks pn on p.ProviderId = pn.ProviderId
where NetworkName in ('abc','def')
IDEAL sql is:
select p.* from Providers
inner join (select DISTINCT ProviderId from Networks
where NetworkName in ('abc','def'))
as pn on p.ProviderId = pn.ProviderId
Thanks
Dave
I dont think you need a Distinct here but a Exists (or Any as it is called in Linq)
Try this:
var q = (from p in context.Practitioners
where context.ProviderNetworks.Any(pn => pn.ProviderId == p.ProviderId && notNetworkIds.Contains(pn.Network))
orderby p.UpdateDate descending
select p).Skip(PageSize * (pageOffset ?? 0)).Take(PageSize).ToList();

Is this LINQ Query "correct"?

I have the following LINQ query, that is returning the results that I expect, but it does not "feel" right.
Basically it is a left join. I need ALL records from the UserProfile table.
Then the LastWinnerDate is a single record from the winner table (possible multiple records) indicating the DateTime the last record was entered in that table for the user.
WinnerCount is the number of records for the user in the winner table (possible multiple records).
Video1 is basically a bool indicating there is, or is not a record for the user in the winner table matching on a third table Objective (should be 1 or 0 rows).
Quiz1 is same as Video 1 matching another record from Objective Table (should be 1 or 0 rows).
Video and Quiz is repeated 12 times because it is for a report to be displayed to a user listing all user records and indicate if they have met the objectives.
var objectiveIds = new List<int>();
objectiveIds.AddRange(GetObjectiveIds(objectiveName, false));
var q =
from up in MetaData.UserProfile
select new RankingDTO
{
UserId = up.UserID,
FirstName = up.FirstName,
LastName = up.LastName,
LastWinnerDate = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner.CreatedOn).First(),
WinnerCount = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner).Count(),
Video1 = (
from winner in MetaData.Winner
join o in MetaData.Objective on winner.ObjectiveID equals o.ObjectiveID
where o.ObjectiveNm == Constants.Promotions.SecVideo1
where winner.Active
where winner.UserID == up.UserID
select winner).Count(),
Quiz1 = (
from winner2 in MetaData.Winner
join o2 in MetaData.Objective on winner2.ObjectiveID equals o2.ObjectiveID
where o2.ObjectiveNm == Constants.Promotions.SecQuiz1
where winner2.Active
where winner2.UserID == up.UserID
select winner2).Count(),
};
You're repeating join winners table part several times. In order to avoid it you can break it into several consequent Selects. So instead of having one huge select, you can make two selects with lesser code. In your example I would first of all select winner2 variable before selecting other result properties:
var q1 =
from up in MetaData.UserProfile
select new {up,
winners = from winner in MetaData.Winner
where winner.Active
where winner.UserID == up.UserID
select winner};
var q = from upWinnerPair in q1
select new RankingDTO
{
UserId = upWinnerPair.up.UserID,
FirstName = upWinnerPair.up.FirstName,
LastName = upWinnerPair.up.LastName,
LastWinnerDate = /* Here you will have more simple and less repeatable code
using winners collection from "upWinnerPair.winners"*/
The query itself is pretty simple: just a main outer query and a series of subselects to retrieve actual column data. While it's not the most efficient means of querying the data you're after (joins and using windowing functions will likely get you better performance), it's the only real way to represent that query using either the query or expression syntax (windowing functions in SQL have no mapping in LINQ or the LINQ-supporting extension methods).
Note that you aren't doing any actual outer joins (left or right) in your code; you're creating subqueries to retrieve the column data. It might be worth looking at the actual SQL being generated by your query. You don't specify which ORM you're using (which would determine how to examine it client-side) or which database you're using (which would determine how to examine it server-side).
If you're using the ADO.NET Entity Framework, you can cast your query to an ObjectQuery and call ToTraceString().
If you're using SQL Server, you can use SQL Server Profiler (assuming you have access to it) to view the SQL being executed, or you can run a trace manually to do the same thing.
To perform an outer join in LINQ query syntax, do this:
Assuming we have two sources alpha and beta, each having a common Id property, you can select from alpha and perform a left join on beta in this way:
from a in alpha
join btemp in beta on a.Id equals btemp.Id into bleft
from b in bleft.DefaultIfEmpty()
select new { IdA = a.Id, IdB = b.Id }
Admittedly, the syntax is a little oblique. Nonetheless, it works and will be translated into something like this in SQL:
select
a.Id as IdA,
b.Id as Idb
from alpha a
left join beta b on a.Id = b.Id
It looks fine to me, though I could see why the multiple sub-queries could trigger inefficiency worries in the eyes of a coder.
Take a look at what SQL is produced though (I'm guessing you're running this against a database source from your saying "table" above), before you start worrying about that. The query providers can be pretty good at producing nice efficient SQL that in turn produces a good underlying database query, and if that's happening, then happy days (it will also give you another view on being sure of the correctness).

Categories