This question already exists here, but for Entity Framework to TSQL
When using the same navigation property multiple times on a select, Npgsql query results in multiple joins, one for every use of the navigation property. This result in an awful performance hit (tested)
I've read that this is a problem with EF 4, but this problem also occurs on EF 6.
I think that this is an issue with the Npgsql LINQ to SQL translator
This is the code that Npgsql generate for the same navigation property used mutliple times, obviously, only one join is needed (copied from the other questions because is exactly the same case)
LEFT OUTER JOIN [dbo].[Versions] AS [Extent4] ON [Extent1].[IDVersionReported] = [Extent4].[ID]
LEFT OUTER JOIN [dbo].[Versions] AS [Extent5] ON [Extent1].[IDVersionReported] = [Extent5].[ID]
LEFT OUTER JOIN [dbo].[Versions] AS [Extent6] ON [Extent1].[IDVersionReported] = [Extent6].[ID]
LEFT OUTER JOIN [dbo].[Versions] AS [Extent7] ON [Extent1].[IDVersionReported] = [Extent7].[ID]
Is it posible to tune PostgreSql to optimize repeated joins?
If not, which option is the best for solving this problem?
Wait until Npgsql gets fixed
Download Npgsql code and find the way to fix it
Intercept generated SQL before reaching the database, parse it, and remove duplicated joins. (Read here )
Do not use navigation properties, use LINQ joins instead
Indeed, this is a problem of Entity Framework, but I found a workaround, hope that this helps someone.
This was the original where part of the LINQ query:
from cr in Creditos
where cr.validado == 1 &&
cr.fecharegistro >= Desde &&
cr.fecharegistro <= Hasta &&
!ProductosExcluidos.Contains(cr.idproducto.Value) &&
cr.amortizaciones.Sum(am => am.importecapital - am.pagoscap - am.capcancel) > 1
//All references to the navigation property cr.numcliente
//results on a separated LEFT OUTTER JOIN between this the 'creditos' and 'clientes' tables
select new ArchivoCliente
{
RFC = cr.numcliente.rfc,
Nombres = cr.numcliente.nombres,
ApellidoPaterno = cr.numcliente.apellidopaterno,
ApellidoMaterno = cr.numcliente.apellidomaterno,
}
Note the last condition of the where, that condition does a sum of all child entities of cr, if we take out that last condition, all the duplicated LEFT OUTTER JOIN are replaced by one single JOIN, for some reason, Entity Framework doesn't like subqueries or aggregates on the where part of the query
If instead we replace the original query with this other equivalent query only a single LEFT OUTTER JOIN is generated.
(from cr in Creditos
where cr.validado == 1 &&
cr.fecharegistro >= Desde &&
cr.fecharegistro <= Hasta &&
!ProductosExcluidos.Contains(cr.idproducto.Value) &&
//Excluded aggregate function condition from the first where
//the value is now on the select and used for posterior filtering
select new ArchivoCliente
{
RFC = cr.numcliente.rfc,
Nombres = cr.numcliente.nombres,
ApellidoPaterno = cr.numcliente.apellidopaterno,
ApellidoMaterno = cr.numcliente.apellidomaterno,
SumaAmort = cr.amortizaciones.Sum(am => am.importecapital - am.pagoscap - am.capcancel)
}).Where (x => x.SumaAmort > 1);
Now instead of directly filtering on the first where statement, the aggregate result is stored as part of the projection, and then, a second where is applied to the resulting query.
This results in a much faster query, with only the necessary joins on the translated SQL statement.
Related
I am converting our Data Access Layer from Telerik ORM (deprecated) to EF 6. It connects to an Oracle 12c datasource with a 30 character limit. Some table names and column names are exactly 30 characters (as in this example below), some are shorter.
The Linq to Sql queries already exist and worked find with Telerik because it used alias for column names as t1, t2, etc.. whereas EF apparently copies the original column name as the alias. In most cases this works except for code table joins where both the PK of the code table and the FK of the parent table are exactly the same, then it appends a 1 to the end of the column alais thus making it 31 characters. As you can guess Oracle complains.
I need to find a way to force EF to use a shorter alias, any way to do this?
Here is the Linq query:
var dbElevationsHistory = (from elevationHist in dbContext.ElevationDataReadingJournals
join it in dbContext.MeasurementIssueTypes on elevationHist.MeasurementIssueTypeId equals it.MeasurementIssueTypeId into iType
from issueType in iType.DefaultIfEmpty()
join mt in dbContext.ElevationMeasureMethodTypes on elevationHist.ElevationMeasureMethodTypeId equals mt.ElevationMeasureMethodTypeId into mType
from methodType in mType.DefaultIfEmpty()
join at in dbContext.ElevationAccuracyTypes on elevationHist.ElevationAccuracyTypeId equals at.ElevationAccuracyTypeId into aType
from accuracyType in aType.DefaultIfEmpty()
join cAgency in dbContext.Organizations on elevationHist.CoopAgencyOrganizationId equals cAgency.OrganizationId into foundAgencies
from coopAgency in foundAgencies.DefaultIfEmpty()
where elevationHist.StationId == wellKey
select new { elevationHist, issueType, accuracyType, methodType, coopAgency })
.OrderByDescending(eh => eh.elevationHist.ModifiedDate)
.OrderByDescending(eh => eh.elevationHist.ElevationDataReadingJournalId);
And the sql query it produces:
SELECT
"Project1"."C1" AS "C1",
"Project1"."EWM_ELEVATION_DATA_READ_JRL_ID" AS "EWM_ELEVATION_DATA_READ_JRL_ID",
"Project1"."EWM_ELEVATION_DATA_READING_ID" AS "EWM_ELEVATION_DATA_READING_ID",
"Project1"."CRUD_TYPE" AS "CRUD_TYPE",
"Project1"."CRUD_DT" AS "CRUD_DT",
"Project1"."MEASUREMENT_DT" AS "MEASUREMENT_DT",
"Project1"."ORG_ID" AS "ORG_ID",
"Project1"."EWM_MEASUREMENT_ISSUE_TYPE_ID" AS "EWM_MEASUREMENT_ISSUE_TYPE_ID",
"Project1"."EWM_STATION_ID" AS "EWM_STATION_ID",
"Project1"."EWM_ELEV_MEASURE_METHOD_TYP_ID" AS "EWM_ELEV_MEASURE_METHOD_TYP_ID",
"Project1"."EWM_ELEVATION_ACCURACY_TYPE_ID" AS "EWM_ELEVATION_ACCURACY_TYPE_ID",
"Project1"."REFERENCE_POINT_ELEVATION" AS "REFERENCE_POINT_ELEVATION",
"Project1"."GROUND_SURFACE_ELEVATION" AS "GROUND_SURFACE_ELEVATION",
"Project1"."WATER_SURFACE_READING" AS "WATER_SURFACE_READING",
"Project1"."REFERENCE_POINT_READING" AS "REFERENCE_POINT_READING",
"Project1"."MANDATORY_READING" AS "MANDATORY_READING",
"Project1"."COMMENTS" AS "COMMENTS",
"Project1"."MODIFIED_DATE" AS "MODIFIED_DATE",
"Project1"."MODIFIED_USER" AS "MODIFIED_USER",
"Project1"."MODIFIED_PROC" AS "MODIFIED_PROC",
"Project1"."COOPERATING_AGENCY_ORG_ID" AS "COOPERATING_AGENCY_ORG_ID",
"Project1"."APPL_ID" AS "APPL_ID",
"Project1"."SUBMISSION_DATE" AS "SUBMISSION_DATE",
"Project1"."EWM_MEASUREMENT_ISSUE_TYPE_ID1" AS "EWM_MEASUREMENT_ISSUE_TYPE_ID1",
"Project1"."EWM_MEASURE_ISSUE_TYPE_CODE" AS "EWM_MEASURE_ISSUE_TYPE_CODE",
"Project1"."EWM_MEASURE_ISSUE_TYPE_DESC" AS "EWM_MEASURE_ISSUE_TYPE_DESC",
"Project1"."EWM_MEASURE_ISSUE_TYPE_ACTV" AS "EWM_MEASURE_ISSUE_TYPE_ACTV",
"Project1"."EWM_MEASURE_ISSUE_TYPE_ORDER" AS "EWM_MEASURE_ISSUE_TYPE_ORDER",
"Project1"."EWM_MEASURE_ISSUE_TYPE_CLASS" AS "EWM_MEASURE_ISSUE_TYPE_CLASS",
"Project1"."MODIFIED_DATE1" AS "MODIFIED_DATE1",
"Project1"."MODIFIED_USER1" AS "MODIFIED_USER1",
"Project1"."MODIFIED_PROC1" AS "MODIFIED_PROC1",
"Project1"."APPL_ID1" AS "APPL_ID1",
"Project1"."EWM_ELEVATION_ACCURACY_TYPE_ID1" AS "EWM_ELEVATION_ACCURACY_TYPE_ID1",
"Project1"."EWM_ELEVATION_ACCURACY_DESC" AS "EWM_ELEVATION_ACCURACY_DESC",
"Project1"."EWM_ELEVATION_ACCURACY_ACTV" AS "EWM_ELEVATION_ACCURACY_ACTV",
"Project1"."EWM_ELEVATION_ACCURACY_ORDER" AS "EWM_ELEVATION_ACCURACY_ORDER",
"Project1"."MODIFIED_DATE3" AS "MODIFIED_DATE2",
"Project1"."MODIFIED_USER3" AS "MODIFIED_USER2",
"Project1"."MODIFIED_PROC3" AS "MODIFIED_PROC2",
"Project1"."EWM_ELEVATION_ACCURACY_CD" AS "EWM_ELEVATION_ACCURACY_CD",
"Project1"."APPL_ID3" AS "APPL_ID2",
"Project1"."EWM_ELEV_MEASURE_METHOD_TYP_ID1" AS "EWM_ELEV_MEASURE_METHOD_TYP_ID1",
"Project1"."EWM_ELEV_MEASURE_METHOD_DESC" AS "EWM_ELEV_MEASURE_METHOD_DESC",
"Project1"."EWM_ELEV_MEASURE_METHOD_ACTV" AS "EWM_ELEV_MEASURE_METHOD_ACTV",
"Project1"."EWM_ELEV_MEASURE_METHOD_ORDER" AS "EWM_ELEV_MEASURE_METHOD_ORDER",
"Project1"."MODIFIED_DATE2" AS "MODIFIED_DATE3",
"Project1"."MODIFIED_USER2" AS "MODIFIED_USER3",
"Project1"."MODIFIED_PROC2" AS "MODIFIED_PROC3",
"Project1"."EWM_ELEV_MEASURE_METHOD_CD" AS "EWM_ELEV_MEASURE_METHOD_CD",
"Project1"."APPL_ID2" AS "APPL_ID3",
"Project1"."ORG_ID1" AS "ORG_ID1",
"Project1"."ORG_NAME" AS "ORG_NAME",
"Project1"."ORG_ABBR" AS "ORG_ABBR",
"Project1"."ORG_TYPE_ID" AS "ORG_TYPE_ID",
"Project1"."MODIFIED_DATE4" AS "MODIFIED_DATE4",
"Project1"."MODIFIED_USER4" AS "MODIFIED_USER4",
"Project1"."MODIFIED_PROC4" AS "MODIFIED_PROC4",
"Project1"."ORG_TIN" AS "ORG_TIN"
FROM ( SELECT
"Extent1"."EWM_ELEVATION_DATA_READ_JRL_ID" AS "EWM_ELEVATION_DATA_READ_JRL_ID",
"Extent1"."EWM_ELEVATION_DATA_READING_ID" AS "EWM_ELEVATION_DATA_READING_ID",
"Extent1"."CRUD_TYPE" AS "CRUD_TYPE",
"Extent1"."CRUD_DT" AS "CRUD_DT",
"Extent1"."MEASUREMENT_DT" AS "MEASUREMENT_DT",
"Extent1"."ORG_ID" AS "ORG_ID",
"Extent1"."EWM_MEASUREMENT_ISSUE_TYPE_ID" AS "EWM_MEASUREMENT_ISSUE_TYPE_ID",
"Extent1"."EWM_STATION_ID" AS "EWM_STATION_ID",
"Extent1"."EWM_ELEV_MEASURE_METHOD_TYP_ID" AS "EWM_ELEV_MEASURE_METHOD_TYP_ID",
"Extent1"."EWM_ELEVATION_ACCURACY_TYPE_ID" AS "EWM_ELEVATION_ACCURACY_TYPE_ID",
"Extent1"."REFERENCE_POINT_ELEVATION" AS "REFERENCE_POINT_ELEVATION",
"Extent1"."GROUND_SURFACE_ELEVATION" AS "GROUND_SURFACE_ELEVATION",
"Extent1"."WATER_SURFACE_READING" AS "WATER_SURFACE_READING",
"Extent1"."REFERENCE_POINT_READING" AS "REFERENCE_POINT_READING",
"Extent1"."MANDATORY_READING" AS "MANDATORY_READING",
"Extent1"."COMMENTS" AS "COMMENTS",
"Extent1"."MODIFIED_DATE" AS "MODIFIED_DATE",
"Extent1"."MODIFIED_USER" AS "MODIFIED_USER",
"Extent1"."MODIFIED_PROC" AS "MODIFIED_PROC",
"Extent1"."COOPERATING_AGENCY_ORG_ID" AS "COOPERATING_AGENCY_ORG_ID",
"Extent1"."APPL_ID" AS "APPL_ID",
"Extent1"."SUBMISSION_DATE" AS "SUBMISSION_DATE",
1 AS "C1",
"Extent2"."EWM_MEASUREMENT_ISSUE_TYPE_ID" AS "EWM_MEASUREMENT_ISSUE_TYPE_ID1",
"Extent2"."EWM_MEASURE_ISSUE_TYPE_CODE" AS "EWM_MEASURE_ISSUE_TYPE_CODE",
"Extent2"."EWM_MEASURE_ISSUE_TYPE_DESC" AS "EWM_MEASURE_ISSUE_TYPE_DESC",
"Extent2"."EWM_MEASURE_ISSUE_TYPE_ACTV" AS "EWM_MEASURE_ISSUE_TYPE_ACTV",
"Extent2"."EWM_MEASURE_ISSUE_TYPE_ORDER" AS "EWM_MEASURE_ISSUE_TYPE_ORDER",
"Extent2"."EWM_MEASURE_ISSUE_TYPE_CLASS" AS "EWM_MEASURE_ISSUE_TYPE_CLASS",
"Extent2"."MODIFIED_DATE" AS "MODIFIED_DATE1",
"Extent2"."MODIFIED_USER" AS "MODIFIED_USER1",
"Extent2"."MODIFIED_PROC" AS "MODIFIED_PROC1",
"Extent2"."APPL_ID" AS "APPL_ID1",
"Extent3"."EWM_ELEV_MEASURE_METHOD_TYP_ID" AS "EWM_ELEV_MEASURE_METHOD_TYP_ID1",
"Extent3"."EWM_ELEV_MEASURE_METHOD_DESC" AS "EWM_ELEV_MEASURE_METHOD_DESC",
"Extent3"."EWM_ELEV_MEASURE_METHOD_ACTV" AS "EWM_ELEV_MEASURE_METHOD_ACTV",
"Extent3"."EWM_ELEV_MEASURE_METHOD_ORDER" AS "EWM_ELEV_MEASURE_METHOD_ORDER",
"Extent3"."MODIFIED_DATE" AS "MODIFIED_DATE2",
"Extent3"."MODIFIED_USER" AS "MODIFIED_USER2",
"Extent3"."MODIFIED_PROC" AS "MODIFIED_PROC2",
"Extent3"."EWM_ELEV_MEASURE_METHOD_CD" AS "EWM_ELEV_MEASURE_METHOD_CD",
"Extent3"."APPL_ID" AS "APPL_ID2",
**"Extent4"."EWM_ELEVATION_ACCURACY_TYPE_ID" AS "EWM_ELEVATION_ACCURACY_TYPE_ID1"**,
"Extent4"."EWM_ELEVATION_ACCURACY_DESC" AS "EWM_ELEVATION_ACCURACY_DESC",
"Extent4"."EWM_ELEVATION_ACCURACY_ACTV" AS "EWM_ELEVATION_ACCURACY_ACTV",
"Extent4"."EWM_ELEVATION_ACCURACY_ORDER" AS "EWM_ELEVATION_ACCURACY_ORDER",
"Extent4"."MODIFIED_DATE" AS "MODIFIED_DATE3",
"Extent4"."MODIFIED_USER" AS "MODIFIED_USER3",
"Extent4"."MODIFIED_PROC" AS "MODIFIED_PROC3",
"Extent4"."EWM_ELEVATION_ACCURACY_CD" AS "EWM_ELEVATION_ACCURACY_CD",
"Extent4"."APPL_ID" AS "APPL_ID3",
"Extent5"."ORG_ID" AS "ORG_ID1",
"Extent5"."ORG_NAME" AS "ORG_NAME",
"Extent5"."ORG_ABBR" AS "ORG_ABBR",
"Extent5"."ORG_TYPE_ID" AS "ORG_TYPE_ID",
"Extent5"."MODIFIED_DATE" AS "MODIFIED_DATE4",
"Extent5"."MODIFIED_USER" AS "MODIFIED_USER4",
"Extent5"."MODIFIED_PROC" AS "MODIFIED_PROC4",
"Extent5"."ORG_TIN" AS "ORG_TIN"
FROM "EWM_ADM"."EWM_ELEVATION_DATA_READING_JRL" "Extent1"
LEFT OUTER JOIN "EWM_ADM"."EWM_MEASUREMENT_ISSUE_TYPE" "Extent2" ON "Extent1"."EWM_MEASUREMENT_ISSUE_TYPE_ID" = "Extent2"."EWM_MEASUREMENT_ISSUE_TYPE_ID"
LEFT OUTER JOIN "EWM_ADM"."EWM_ELEV_MEASURE_METHOD_TYP" "Extent3" ON "Extent1"."EWM_ELEV_MEASURE_METHOD_TYP_ID" = "Extent3"."EWM_ELEV_MEASURE_METHOD_TYP_ID"
LEFT OUTER JOIN "EWM_ADM"."EWM_ELEVATION_ACCURACY_TYPE" "Extent4" ON "Extent1"."EWM_ELEVATION_ACCURACY_TYPE_ID" = "Extent4"."EWM_ELEVATION_ACCURACY_TYPE_ID"
LEFT OUTER JOIN "BUS_ADM"."ORGANIZATION" "Extent5" ON "Extent1"."COOPERATING_AGENCY_ORG_ID" = "Extent5"."ORG_ID"
WHERE (("Extent1"."EWM_STATION_ID" = 52370) OR (("Extent1"."EWM_STATION_ID" IS NULL) AND (52370 IS NULL)))
) "Project1"
ORDER BY "Project1"."EWM_ELEVATION_DATA_READ_JRL_ID" DESC
The line in question is this one (marked with asterisks in the query since I cannot bold it).
"Extent4"."EWM_ELEVATION_ACCURACY_TYPE_ID" AS "EWM_ELEVATION_ACCURACY_TYPE_ID1"
As you can see it appended a 1 to the alias. I need to prevent that or force a custom alias name.
I am executing the following LINQ to Entities query but it is stuck and does not return response until timeout. I executed the same query on SQL Server and it return 92000 in 3 sec.
var query = (from r in WinCtx.PartsRoutings
join s in WinCtx.Tab_Processes on r.ProcessName equals s.ProcessName
join p in WinCtx.Tab_Parts on r.CustPartNum equals p.CustPartNum
select new { r}).ToList();
SQL Generated:
SELECT [ I omitted columns]
FROM [dbo].[PartsRouting] AS [Extent1]
INNER JOIN [dbo].[Tab_Processes] AS [Extent2] ON ([Extent1].[ProcessName] = [Extent2].[ProcessName]) OR (([Extent1].[ProcessName] IS NULL) AND ([Extent2].[ProcessName] IS NULL))
INNER JOIN [dbo].[Tab_Parts] AS [Extent3] ON ([Extent1].[CustPartNum] = [Extent3].[CustPartNum]) OR (([Extent1].[CustPartNum] IS NULL) AND ([Extent3].[CustPartNum] IS NULL))
PartsRouting Table has 100,000+ records, Parts = 15000+, Processes = 200.
I tried too many things found online but nothing worked for me as to how I can achieve the result with same performance of SQL.
Based on the comments, looks like the issue is caused by the additional OR with IS NULL conditions in joins generated by the EF SQL translator. They were added in EF in order to emulate the C# == operator semantics which are different from SQL = for NULL values.
You can start by turning that EF behavior off through UseDatabaseNullSemantics property (it's false by default):
WinCtx.Configuration.UseDatabaseNullSemantics = true;
Unfortunately that's not enough, because it fixes the normal comparison operators, but they simply forgot to do the same for join conditions.
In case you are using joins just for filtering (as it seems), you can replace them with LINQ Any conditions which translates to SQL EXISTS and nowadays database query optimizers are treating it the same way as if it was an inner join:
var query = (from r in WinCtx.PartsRoutings
where WinCtx.Tab_Processes.Any(s => r.ProcessName == s.ProcessName)
where WinCtx.Tab_Parts.Any(p => r.CustPartNum == p.CustPartNum)
select new { r }).ToList();
You might also consider using just select r since creating anonymous type with single property just introdeces additional memory overhead with no advantages.
Update: Looking at the latest comment, you do need fields from joined tables (that's why it's important to not omit relevant parts of the query in question). In such case, you could try the alternative join syntax with where clauses:
WinCtx.Configuration.UseDatabaseNullSemantics = true;
var query = (from r in WinCtx.PartsRoutings
from s in WinCtx.Tab_Processes where r.ProcessName == s.ProcessName
from p in WinCtx.Tab_Parts where r.CustPartNum == p.CustPartNum
select new { r, s.Foo, p.Bar }).ToList();
I currently have some stored procedures in T-SQL that I would like to translate to LINQ-To-Entities for greater maintainability. However, no matter how I seem to construct the LINQ query, when I inspect the generated code, it's an abhorrent terribly-performing monstrosity. I've investigated into various combinations of "let" clauses, joins, shifting around the "where" clause to both inside and out of the anonymous type selection, or even using the extension ".Where<>()" method piecemeal in the other indiciations themselves, and nothing seems to generate code anywhere as close to what I would expect or need.
The difficulties seem threefold:
The joins that I need to do are on combinations of booleans, whereas LINQ-To-Entities seems to only have join functionality for equijoins. How can I translate these joins?
I need to join on the same table multiple different times with different join/where clauses, so that I can select a different value for each record. How do I accomplish this in LINQ-To-Entities?
How do I prevent my joins on entity collections become nested messes?
The T-SQL query I'm attempting to translate is this (where the hardcoded numbers are specific hardcoded types):
SELECT Transport.*,
[Address].Street1,
Carrier1Insurance.InsuranceNumber,
Carrier2Insurance.InsuranceNumber,
Carrier3Insurance.InsuranceNumber
FROM Transport
INNER JOIN Appoint ON Transport.AppointKEY = Appoint.AppointKEY
INNER JOIN Patient ON Appoint.PatientKEY = Patient.PatientKEY
LEFT OUTER JOIN [Address] ON [Address].AddressFKEY = Patient.PatientKEY AND [Address].AddressTypeByTableKEY = 1
LEFT OUTER JOIN PatientInsurance Carrier1Insurance ON Carrier1Insurance.PatientKEY = Patient.PatientKEY AND Carrier1Insurance.CarrierKEY = 7
LEFT OUTER JOIN PatientInsurance Carrier2Insurance ON Carrier2Insurance.PatientKEY = Patient.PatientKEY AND Carrier2Insurance.CarrierKEY = 8
LEFT OUTER JOIN PatientInsurance Carrier3Insurance ON Carrier3Insurance.PatientKEY = Patient.PatientKEY AND (Carrier3Insurance.CarrierKEY <> 7 AND Carrier3Insurance.CarrierKEY = 8)
WHERE (Transport.TransportDate >= '07-01-2013' AND Transport.TransportDate <= '07-31-2013')
AND EXISTS (SELECT TOP 1 *
FROM Remit
WHERE Remit.CarrierKEY = 8
AND Remit.AppointKEY = Transport.AppointKEY
AND Remit.PaidAmt > 0)
And the latest in many, many attempts at LINQ is this:
var medicareTransportList = from transportItem in ClientEDM.Transports
join patientAddress in ClientEDM.Addresses on transportItem.Appoint.PatientKEY equals patientAddress.AddressFKEY
join carrier1Insurance in ClientEDM.PatientInsurances on transportItem.Appoint.PatientKEY equals carrier1Insurance.PatientKEY
join carrier2Insurance in ClientEDM.PatientInsurances on transportItem.Appoint.PatientKEY equals carrier2Insurance.PatientKEY
join otherInsurance in ClientEDM.PatientInsurances on transportItem.Appoint.PatientKEY equals otherInsurance.PatientKEY
where (transportItem.TransportDate > fromDate ** transportItem.TransportDate <= toDate) && transportItem.Appoint.Remits.Any(remit => remit.CarrierKEY == 0 && remit.PaidAmt > 0.00M) &&
(carrier1Insurance.CarrierKEY == 7) &&
(carrier2Insurance.CarrierKEY == 8 ) &&
(otherInsurance.CarrierKEY != 7 &&
otherInsurance.CarrierKEY != 8 ) &&
(patientAddress.AddressTypeByTableKEY == 1)
select new
{
transport = transportItem,
patient = patientAddress,
medicare = medicareInsurance,
medicaid = medicaidInsurance,
other = otherInsurance
};
The LINQ .Join() operator is equivalent to the INNER JOIN SQL operator.
For any other case, use the .GroupJoin() operator.
But do you really need to use join? In many case, using LINQ, SQL JOIN (inner or outer) can be expressed using navigation properties between entities.
Please explains your conceptual data model for a precise answer.
I have the following code, which is misbehaving:
TPM_USER user = UserManager.GetUser(context, UserId);
var tasks = (from t in user.TPM_TASK
where t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
The first line, UserManager.GetUser just does a simple lookup in the database to get the correct TPM_USER record. However, the second line causes all sorts of SQL chaos.
First off, it's executing two SQL statements here. The first one grabs every single row in TPM_TASK which is linked to that user, which is sometimes tens of thousands of rows:
SELECT
-- Columns
FROM TPMDBO.TPM_USERTASKS "Extent1"
INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
WHERE "Extent1".USERID = :EntityKeyValue1
This query takes about 18 seconds on users with lots of tasks. I would expect the WHERE clause to contain the STAGEID filters too, which would remove the majority of the rows.
Next, it seems to execute a new query for each TPM_PROJECTVERSION pair in the list above:
SELECT
-- Columns
FROM TPMDBO.TPM_PROJECTVERSION "Extent1"
WHERE ("Extent1".PROJECTID = :EntityKeyValue1) AND ("Extent1".VERSIONID = :EntityKeyValue2)
Even though this query is fast, it's executed several hundred times if the user has tasks in a whole bunch of projects.
The query I would like to generate would look something like:
SELECT
-- Columns
FROM TPMDBO.TPM_USERTASKS "Extent1"
INNER JOIN TPMDBO.TPM_TASK "Extent2" ON "Extent1".TASKID = "Extent2".TASKID
INNER JOIN TPMDBO.TPM_PROJECTVERSION "Extent3" ON "Extent2".PROJECTID = "Extent3".PROJECTID AND "Extent2".VERSIONID = "Extent3".VERSIONID
WHERE "Extent1".USERID = 5 and "Extent2".STAGEID > 0 and "Extent2".STAGEID <> 3 and "Extent3".STAGEID <= 10
The query above would run in about 1 second. Normally, I could specify that JOIN using the Include method. However, this doesn't seem to work on properties. In other words, I can't do:
from t in user.TPM_TASK.Include("TPM_PROJECTVERSION")
Is there any way to optimize this LINQ statement? I'm using .NET4 and Oracle as the backend DB.
Solution:
This solution is based on Kirk's suggestions below, and works since context.TPM_USERTASK cannot be queried directly:
var tasks = (from t in context.TPM_TASK.Include("TPM_PROJECTVERSION")
where t.TPM_USER.Any(y => y.USERID == UserId) &&
t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
It does result in a nested SELECT rather than querying TPM_USERTASK directly, but it seems fairly efficient none-the-less.
Yes, you are pulling down a specific user, and then referencing the relationship TPM_TASK. That it is pulling down every task attached to that user is exactly what it's supposed to be doing. There's no ORM SQL translation when you're doing it this way. You're getting a user, then getting all his tasks into memory, and then performing some client-side filtering. This is all done using lazy-loading, so the SQL is going to be exceptionally inefficient as it can't batch anything up.
Instead, rewrite your query to go directly against TPM_TASK and filter against the user:
var tasks = (from t in context.TPM_TASK
where t.USERID == user.UserId && t.STAGEID > 0 && t.STAGEID != 3 && t.TPM_PROJECTVERSION.STAGEID <= 10
orderby t.DUEDATE, t.PROJECTID
select t);
Note how we're checking t.USERID == user.UserId. This produces the same effect as user.TPM_TASK but now all the heavy lifting is done by the database rather than in memory.
I have the following LINQ query, that is returning the results that I expect, but it does not "feel" right.
Basically it is a left join. I need ALL records from the UserProfile table.
Then the LastWinnerDate is a single record from the winner table (possible multiple records) indicating the DateTime the last record was entered in that table for the user.
WinnerCount is the number of records for the user in the winner table (possible multiple records).
Video1 is basically a bool indicating there is, or is not a record for the user in the winner table matching on a third table Objective (should be 1 or 0 rows).
Quiz1 is same as Video 1 matching another record from Objective Table (should be 1 or 0 rows).
Video and Quiz is repeated 12 times because it is for a report to be displayed to a user listing all user records and indicate if they have met the objectives.
var objectiveIds = new List<int>();
objectiveIds.AddRange(GetObjectiveIds(objectiveName, false));
var q =
from up in MetaData.UserProfile
select new RankingDTO
{
UserId = up.UserID,
FirstName = up.FirstName,
LastName = up.LastName,
LastWinnerDate = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner.CreatedOn).First(),
WinnerCount = (
from winner in MetaData.Winner
where objectiveIds.Contains(winner.ObjectiveID)
where winner.Active
where winner.UserID == up.UserID
orderby winner.CreatedOn descending
select winner).Count(),
Video1 = (
from winner in MetaData.Winner
join o in MetaData.Objective on winner.ObjectiveID equals o.ObjectiveID
where o.ObjectiveNm == Constants.Promotions.SecVideo1
where winner.Active
where winner.UserID == up.UserID
select winner).Count(),
Quiz1 = (
from winner2 in MetaData.Winner
join o2 in MetaData.Objective on winner2.ObjectiveID equals o2.ObjectiveID
where o2.ObjectiveNm == Constants.Promotions.SecQuiz1
where winner2.Active
where winner2.UserID == up.UserID
select winner2).Count(),
};
You're repeating join winners table part several times. In order to avoid it you can break it into several consequent Selects. So instead of having one huge select, you can make two selects with lesser code. In your example I would first of all select winner2 variable before selecting other result properties:
var q1 =
from up in MetaData.UserProfile
select new {up,
winners = from winner in MetaData.Winner
where winner.Active
where winner.UserID == up.UserID
select winner};
var q = from upWinnerPair in q1
select new RankingDTO
{
UserId = upWinnerPair.up.UserID,
FirstName = upWinnerPair.up.FirstName,
LastName = upWinnerPair.up.LastName,
LastWinnerDate = /* Here you will have more simple and less repeatable code
using winners collection from "upWinnerPair.winners"*/
The query itself is pretty simple: just a main outer query and a series of subselects to retrieve actual column data. While it's not the most efficient means of querying the data you're after (joins and using windowing functions will likely get you better performance), it's the only real way to represent that query using either the query or expression syntax (windowing functions in SQL have no mapping in LINQ or the LINQ-supporting extension methods).
Note that you aren't doing any actual outer joins (left or right) in your code; you're creating subqueries to retrieve the column data. It might be worth looking at the actual SQL being generated by your query. You don't specify which ORM you're using (which would determine how to examine it client-side) or which database you're using (which would determine how to examine it server-side).
If you're using the ADO.NET Entity Framework, you can cast your query to an ObjectQuery and call ToTraceString().
If you're using SQL Server, you can use SQL Server Profiler (assuming you have access to it) to view the SQL being executed, or you can run a trace manually to do the same thing.
To perform an outer join in LINQ query syntax, do this:
Assuming we have two sources alpha and beta, each having a common Id property, you can select from alpha and perform a left join on beta in this way:
from a in alpha
join btemp in beta on a.Id equals btemp.Id into bleft
from b in bleft.DefaultIfEmpty()
select new { IdA = a.Id, IdB = b.Id }
Admittedly, the syntax is a little oblique. Nonetheless, it works and will be translated into something like this in SQL:
select
a.Id as IdA,
b.Id as Idb
from alpha a
left join beta b on a.Id = b.Id
It looks fine to me, though I could see why the multiple sub-queries could trigger inefficiency worries in the eyes of a coder.
Take a look at what SQL is produced though (I'm guessing you're running this against a database source from your saying "table" above), before you start worrying about that. The query providers can be pretty good at producing nice efficient SQL that in turn produces a good underlying database query, and if that's happening, then happy days (it will also give you another view on being sure of the correctness).