Entity Framework performing different on multiple computers - c#

I am at a loss, So I came here to see what everyone thinks about this.
Background: Textbox using jquery autocomplete w/ EF 5
Problem: Entity is performing a inner join, but the syntax shows a left join is being performing. (Both PC's are running Win 7 and latest framework)
Anomaly: It works the right way (left outer join) on my PC. But on my buddys PC, it shows an inner join. Both files are binary equals. (Actually being pulled from git)
Here is the code:
public JsonResult AutoCompleteName(string term)
{
using (var db = new PersonnelContext())
{
return this.Json((from r in db.Personnel
join per in db.PersonnelEmployee on r.Id equals per.Personnel_Id
join dep in db.RefDepartment on per.Department equals dep.Department into rfdp
from g in rfdp.DefaultIfEmpty()
where r.First_Name.ToLower().Contains(term.ToLower()) | r.Last_Name.ToLower().Contains(term.ToLower()) | (r.First_Name.ToLower() + " " + r.Last_Name.ToLower()).Contains(term.ToLower())
select new { firstname = r.First_Name, lastname = r.Last_Name, department = g.DeptDesc ?? "None", per.Personnel_Id, per.Pernr }).OrderBy(a => a.firstname).ToArray(), JsonRequestBehavior.AllowGet);
}
}
Here is the intellitrace from the PC where its working fine: (Note I took out some due to confidentiality)
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[First_Name] AS [First_Name],
[Extent1].[Last_Name] AS [Last_Name],
[Extent2].[Personnel_Id] AS [Personnel_Id],
[Extent2].[Pernr] AS [Pernr],
CASE WHEN ([Extent3].[Dept_Desc] IS NULL) THEN N'None' ELSE [Extent3].[Dept_Desc] END AS [C1]
FROM [Core].[Personnel] AS [Extent1]
INNER JOIN [Core].[Personnel_Employee] AS [Extent2] ON [Extent1].[Id] = [Extent2].[Personnel_Id]
LEFT OUTER JOIN [Core].[Ref_Department] AS [Extent3] ON [Extent2].[Department] = [Extent3].[Department]
WHERE (( CAST(CHARINDEX(LOWER(#p__linq__0), LOWER([Extent1].[First_Name])) AS int)) > 0) OR (( CAST(CHARINDEX(LOWER(#p__linq__1), LOWER([Extent1].[Last_Name])) AS int)) > 0) OR (( CAST(CHARINDEX(LOWER(#p__linq__2), LOWER([Extent1].[First_Name]) + N' ' + LOWER([Extent1].[Last_Name])) AS int)) > 0)
) AS [Project1]
ORDER BY [Project1].[First_Name] ASC"
As you can see, its performing a Left Outer Join, just as the syntax suggests.
Here is the Intellitrace from PC 2: (Exact same code!)
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[First_Name] AS [First_Name],
[Extent1].[Last_Name] AS [Last_Name],
[Extent2].[Personnel_Id] AS [Personnel_Id],
[Extent2].[Pernr] AS [Pernr],
CASE WHEN ([Extent3].[Dept_Desc] IS NULL) THEN N'None' ELSE [Extent3].[Dept_Desc] END AS [C1]
FROM [Core].[Personnel] AS [Extent1]
INNER JOIN [Core].[Personnel_Employee] AS [Extent2] ON [Extent1].[Id] = [Extent2].[Personnel_Id]
INNER JOIN [Core].[Ref_Department] AS [Extent3] ON [Extent2].[Department] = [Extent3].[Department]
WHERE (( CAST(CHARINDEX(LOWER(#p__linq__0), LOWER([Extent1].[First_Name])) AS int)) > 0) OR (( CAST(CHARINDEX(LOWER(#p__linq__1), LOWER([Extent1].[Last_Name])) AS int)) > 0) OR (( CAST(CHARINDEX(LOWER(#p__linq__2), LOWER([Extent1].[First_Name]) + N' ' + LOWER([Extent1].[Last_Name])) AS int)) > 0)
) AS [Project1]
ORDER BY [Project1].[First_Name] ASC"
Here is another strange anomaly:
When deployed to QA, it works fine on both PC's!! There is no consistency here for us to know whats going on.
Any ideas here? I have no idea what going on. We have completely wiped away our working versions, and pulled down again from git. Same thing happens. If no one has any ideas, we may need to use a proc. But I would at least like to know whats going on.
** Edit ** Answer:
I really have no idea why this works, but i got it working. Here is the query now:
return this.Json((from r in db.Personnel
join per in db.PersonnelEmployee on r.Id equals per.Personnel_Id
where r.First_Name.ToLower().Contains(term.ToLower()) | r.Last_Name.ToLower().Contains(term.ToLower()) | (r.First_Name.ToLower() + " " + r.Last_Name.ToLower()).Contains(term.ToLower())
join dep in db.RefDepartment
.Where(x => x.DeptDesc != null || x.DeptDesc == null) on per.Department equals dep.Department into rfdp
from g in rfdp.DefaultIfEmpty()
select new {
firstname = r.First_Name,
lastname = r.Last_Name,
department = (g == null) ? "None" : g.DeptDesc,
per.Personnel_Id,
per.Pernr }).ToArray(), JsonRequestBehavior.AllowGet);
I think this one works because it explicitly looks for DeptDesc being null and not null.... I am just glad its working. Thanks for everyone who looked at this.

I was able to get this fixed with the following modified query:
return this.Json((from r in db.Personnel
join per in db.PersonnelEmployee on r.Id equals per.Personnel_Id
where r.First_Name.ToLower().Contains(term.ToLower()) | r.Last_Name.ToLower().Contains(term.ToLower()) | (r.First_Name.ToLower() + " " + r.Last_Name.ToLower()).Contains(term.ToLower())
join dep in db.RefDepartment
.Where(x => x.DeptDesc != null || x.DeptDesc == null) on per.Department equals dep.Department into rfdp
from g in rfdp.DefaultIfEmpty()
select new {
firstname = r.First_Name,
lastname = r.Last_Name,
department = (g == null) ? "None" : g.DeptDesc,
per.Personnel_Id,
per.Pernr }).ToArray(), JsonRequestBehavior.AllowGet);
As I said in the edits: I'm not 100% sure why this works instead of the first, but im glad it does!

This sounds suspiciously like an already 'fixed' issue with EF... have a look at this duplicated join issue
They note that removing the orderby removes the duplicate... worth a test eh

Related

Why are multiple where in LINQ so slow?

Using C# and Linq to SQL, I found that my query with multiple where is orders of magnitude slower than with a single where / and.
Here is the query
using (TeradiodeDataContext dc = new TeradiodeDataContext())
{
var filterPartNumberID = 71;
var diodeIDsInBlades = (from bd in dc.BladeDiodes
select bd.DiodeID.Value).Distinct();
var diodesWithTestData = (from t in dc.Tests
join tt in dc.TestTypes on t.TestTypeID equals tt.ID
where tt.DevicePartNumberID == filterPartNumberID
select t.DeviceID.Value).Distinct();
var result = (from d in dc.Diodes
where d.DevicePartNumberID == filterPartNumberID
where diodesWithTestData.Contains(d.ID)
where !diodeIDsInBlades.Contains(d.ID)
orderby d.Name
select d);
var list = result.ToList();
// ~15 seconds
}
However, when the condition in the final query is this
where d.DevicePartNumberID == filterPartNumberID
& diodesWithTestData.Contains(d.ID)
& !diodeIDsInBlades.Contains(d.ID)
// milliseconds
it is very fast.
Comparing the SQL in result before calling ToList(), here are the queries (value 71 manually added in place of #params)
-- MULTIPLE WHERE
SELECT [t0].[ID], [t0].[Name], [t0].[M2MID], [t0].[DevicePartNumberID], [t0].[Comments], [t0].[Hold]
FROM [dbo].[Diode] AS [t0]
WHERE (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t2].[value]
FROM (
SELECT [t1].[DiodeID] AS [value]
FROM [dbo].[BladeDiode] AS [t1]
) AS [t2]
) AS [t3]
WHERE [t3].[value] = [t0].[ID]
))) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t6].[value]
FROM (
SELECT [t4].[DeviceID] AS [value], [t5].[DevicePartNumberID]
FROM [dbo].[Test] AS [t4]
INNER JOIN [dbo].[TestType] AS [t5] ON [t4].[TestTypeID] = ([t5].[ID])
) AS [t6]
WHERE [t6].[DevicePartNumberID] = (71)
) AS [t7]
WHERE [t7].[value] = [t0].[ID]
)) AND ([t0].[DevicePartNumberID] = 71)
ORDER BY [t0].[Name]
and
-- SINGLE WHERE
SELECT [t0].[ID], [t0].[Name], [t0].[M2MID], [t0].[DevicePartNumberID], [t0].[Comments], [t0].[Hold]
FROM [dbo].[Diode] AS [t0]
WHERE ([t0].[DevicePartNumberID] = 71) AND (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t3].[value]
FROM (
SELECT [t1].[DeviceID] AS [value], [t2].[DevicePartNumberID]
FROM [dbo].[Test] AS [t1]
INNER JOIN [dbo].[TestType] AS [t2] ON [t1].[TestTypeID] = ([t2].[ID])
) AS [t3]
WHERE [t3].[DevicePartNumberID] = (71)
) AS [t4]
WHERE [t4].[value] = [t0].[ID]
)) AND (NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM (
SELECT DISTINCT [t6].[value]
FROM (
SELECT [t5].[DiodeID] AS [value]
FROM [dbo].[BladeDiode] AS [t5]
) AS [t6]
) AS [t7]
WHERE [t7].[value] = [t0].[ID]
)))
ORDER BY [t0].[Name]
The two SQL queries execute in < 1 second in SSMS and produce the same results.
So I'm wondering why the first is slower on the LINQ side. It's worrying to me because I know I've used multiple where elsewhere, without being aware of a such a severe performance impact.
This question even has answered with both multiple & and where. And this answer even suggests using multiple where clauses.
Can anyone explain why this happens in my case?
Because writing like this
if (someParam1 != 0)
{
myQuery = myQuery.Where(q => q.SomeField1 == someParam1)
}
if (someParam2 != 0)
{
myQuery = myQuery.Where(q => q.SomeField2 == someParam2)
}
is NOT(upd) the same as (in case when someParam1 and someParam2 != 0)
myQuery = from t in Table
where t.SomeField1 == someParam1
&& t.SomeField2 == someParam2
select t;
is (NOT deleted) the same as
myQuery = from t in Table
where t.SomeField1 == someParam1
where t.SomeField2 == someParam2
select t;
UPD
Yes, I do mistake. Second query is same, first is not same.
First and Second queries not EXACTLY the same. Let me show you what I mean.
1st query with lamda-expression writen as
t.Where(r => t.SomeField1 == someParam1 && t.SomeField2 == someParam2)
2nd query as
t.Where(r => r.SomeField1 == someParam1).Where(r => r.SomeField2 == someParam2)
In this case in generated SQL Predicate with SomeField2 goes first (it is important, see below)
In 1st case we getting this SQL:
SELECT <all field from Table>
FROM table t
WHERE t.SomeField1 = :someParam1
AND t.SomeField2 = :someParam2
In 2 case the SQL is:
SELECT <all field from Table>
FROM table t
WHERE t.SomeField2 = :someParam2
AND t.SomeField1 = :someParam1
As we see there are 2 'same' SQLs. As we see, the OP's SQLs are also 'same', they are different in order of predicates in WHERE clause (as in my example). And I guess that SQL optimizer generate 2 different execution plans and may be(!!!) doing NOT EXISTS, then EXISTS and then filtering take more time than do first filtering and after that do EXISTS and NOT EXISTS
UPD2
It is a 'problem' of Linq Provider (ORM). I'm using another ORM (linq2db), and it generates for me EXACTLY the same SQLs in both cases.

Linq to SQL: Query runs fine in SQL Server Management Studio, but times out on the application

I'm maintaining the code of a legacy application. It is using Entity Framework 3.5 and updating it to a newer version at the moment is not possible.
There is a query that lately has started to time out. This is the query in LINQ:
var compoQueryResult = (from composition in MyAppDataContext.Compositions
join compositionStatus in MyAppDataContext.CompositionStatus on composition.StatusID equals compositionStatus.CompositionStatusID
join compoUnit in MyAppDataContext.CompositionUnits on composition.CompositionID equals compoUnit.CompositionID
join unit in MyAppDataContext.Units on compoUnit.UnitID equals unit.UnitID
join carUnit in MyAppDataContext.Car_Units on unit.UnitID equals carUnit.UnitId
join car in MyAppDataContext.Cars on carUnit.CarId equals car.CarID
join location in MyAppDataContext.Locations on unit.ParkedLocationID equals location.LocationID
from road in MyAppDataContext.Roads.Where(r => r.RoadID == unit.RoadID).DefaultIfEmpty()
from ta in MyAppDataContext.Allocations.Where(ta => ta.CompositionId == composition.CompositionID).DefaultIfEmpty()
from arrivalLocation in MyAppDataContext.Locations.Where(al => al.LocationID == ta.EndLocationID).DefaultIfEmpty()
where (MyApp.IsAdministrator || MyApp.GetLineIds().Contains(car.LineId.Value))
&& car.LineId.Value == LineId
&& (statusId == null || statusId == compositionStatus.CompositionStatusID)
&& (locationId == null || unit.ParkedLocationID == locationId)
&& (!onlyATMS.HasValue
|| (onlyATMS == true && composition.CompositionName.Contains(Common.Constants.ATMSSymbol))
|| (onlyATMS == false && !composition.CompositionName.Contains(Common.Constants.ATMSSymbol)))
select new CompositionQueryResultItem
{
CompositionId = composition.CompositionID,
CompositionName = composition.CompositionName,
CompositionNumber = composition.CompositionNumber,
DateTimeModified = composition.DataModified,
Status = compositionStatus.CompositionStatusDesc,
Location = location.LocationName,
Road = road.Name,
Number = ta.Number,
DepartureTime = ta.DepartTime,
ArrivalLocation = arrivalLocation.LocationName,
ArrivalTime = ta.ArriveTime
}).Distinct().ToList();
After debugging the app I can see that the following query is being executed:
SELECT DISTINCT [t10].[CompositionID] AS [CompositionId], [t10].[CompositionName], [t10].[CompositionNumber], [t10].[value] AS [DateTimeModified], [t10].[CompositionStatusDesc] AS [Status], [t10].[LocationName] AS [Location], [t10].[value2] AS [Road], [t10].[value3] AS [Number], [t10].[value4] AS [DepartureTime], [t10].[value5] AS [ArrivalLocation], [t10].[value6] AS [ArrivalTime]
FROM (
SELECT [t0].[CompositionID], [t0].[CompositionName], [t0].[CompositionNumber], [t0].[DataModified] AS [value], [t1].[CompositionStatusDesc], [t6].[LocationName], [t7].[Name] AS [value2], [t8].[Number] AS [value3], [t8].[DepartTime] AS [value4], [t9].[LocationName] AS [value5], [t8].[ArriveTime] AS [value6], [t5].[LineId]
FROM [dbo].[Composition] AS [t0]
INNER JOIN [dbo].[CompositionStatus] AS [t1] ON [t0].[StatusID] = [t1].[CompositionStatusID]
INNER JOIN [dbo].[CompositionUnit] AS [t2] ON [t0].[CompositionID] = [t2].[CompositionID]
INNER JOIN [dbo].[Unit] AS [t3] ON [t2].[UnitID] = [t3].[UnitID]
INNER JOIN [dbo].[Car_Unit] AS [t4] ON [t3].[UnitID] = [t4].[UnitId]
INNER JOIN [dbo].[Car] AS [t5] ON [t4].[CarId] = [t5].[CarID]
INNER JOIN [dbo].[Location] AS [t6] ON [t3].[ParkedLocationID] = ([t6].[LocationID])
LEFT OUTER JOIN [dbo].[Road] AS [t7] ON ([t7].[RoadID]) = [t3].[RoadID]
LEFT OUTER JOIN [dbo].[Allocation] AS [t8] ON [t8].[CompositionId] = ([t0].[CompositionID])
LEFT OUTER JOIN [dbo].[Location] AS [t9] ON [t9].[LocationID] = [t8].[EndLocationID]
) AS [t10]
WHERE ([t10].[LineId]) = #p0
If I execute that query on SQL Server Manager Studio I get no issues at all. It executes in less than one second. However, the application is timing out and I have no idea why.
I tried doing
MyAppDataContext.ObjectTrackingEnabled = false;
but I still get the same error.
I'm about to give up and create an stored procedure instead, but I can't grasp why the timeout is happening.
The query execution plan that is being used by the SQL Server can be different when the same query is run through the application. Check your execution plans. You can use hints like option(recompile) at the end of the query or you can optimize it for a particular Lineid if for different id's the rows returned are significantly different.
Sometimes, using such hints make it slow in SSMS but not in application. You can find a solution after some hit and trial.

Entity framework performance join vs navigation property

I am trying to understand why is join in my case faster than statement which use navigation property. I have two queries.
First with navigation property :
var result = (from users in context.MetricBloodPreasure
orderby users.User.LastName, users.User.FirstName
select new
{
UserName = users.User.LastName + ", " + users.User.FirstName,
Date = users.DateOfValue,
}).ToList();
Generatet sql :
SELECT
[Project1].[C1] AS [C1],
[Project1].[C2] AS [C2],
[Project1].[DateOfValue] AS [DateOfValue]
FROM ( SELECT
[Extent1].[DateOfValue] AS [DateOfValue],
[Extent2].[FirstName] AS [FirstName],
[Extent2].[LastName] AS [LastName],
1 AS [C1],
CASE WHEN ([Extent2].[LastName] IS NULL) THEN N'' ELSE [Extent2].[LastName] END + N', ' + CASE WHEN ([Extent2].[FirstName] IS NULL) THEN N'' ELSE [Extent2].[FirstName] END AS [C2]
FROM [dbo].[MetricBloodPreasure] AS [Extent1]
INNER JOIN [dbo].[User] AS [Extent2] ON [Extent1].[UserId] = [Extent2].[Id]
) AS [Project1]
ORDER BY [Project1].[LastName] ASC, [Project1].[FirstName] ASC
Second with join:
var result1 = (from u in context.User
orderby u.LastName, u.FirstName
join us in context.MetricBloodPreasure
on u.Id equals us.UserId into users
from s in users
select new
{
UserName = s.User.LastName + ", " + s.User.FirstName,
Date = s.DateOfValue,
}).ToList();
Generated sql:
SELECT
1 AS [C1],
CASE WHEN ([Extent1].[LastName] IS NULL) THEN N'' ELSE [Extent1].[LastName] END + N', ' + CASE WHEN ([Extent1].[FirstName] IS NULL) THEN N'' ELSE [Extent1].[FirstName] END AS [C2],
[Extent2].[DateOfValue] AS [DateOfValue]
FROM [dbo].[User] AS [Extent1]
INNER JOIN [dbo].[MetricBloodPreasure] AS [Extent2] ON ([Extent1].[Id] = [Extent2].[UserId]) AND ([Extent2].[UserId] = [Extent1].[Id])
Before running first query, call var user = context.User.FirstOrDefault(); because I think open connection to database take some time.
Results :
Navigation property query : 00:00:00.6719646
Join query : 00:00:00.4941169
Looking at results it seems that Linq queries that use joins instead of navigation properties are faster. Is that true or I'm doing something wrong ?
To get a better insight into what it is doing you should get the raw SQL and you can check out the execution plan yourself.
To do this, you can either use SQL Profiler to see what query is being run, or you can log the SQL query itself by doing something like this before you run your query:
context.Database.Log = s => System.Diagnostics.Debug.WriteLine(s);
Also doing a simple benchmark like you did by running each once isn't going to necessarily be reliable. You'll want to run it multiple times and average things out. You should also run them in the opposite order to see if that changes things as well.

LINQ query optimization for slow grouping

I have a LINQ query that gets data via Entity Framework Code First from an SQL database. This works, but it works very very slow.
This is the original query:
var tmpResult = from mdv in allMetaDataValues
where mdv.Metadata.InputType == MetadataInputType.String && mdv.Metadata.ShowInFilter && !mdv.Metadata.IsHidden && !string.IsNullOrEmpty(mdv.ValueString)
group mdv by new
{
mdv.ValueString,
mdv.Metadata
} into g
let first = g.FirstOrDefault()
select new
{
MetadataTitle = g.Key.Metadata.Title,
MetadataID = g.Key.Metadata.ID,
CollectionColor = g.Key.Metadata.Collection.Color,
CollectionID = g.Key.Metadata.Collection.ID,
MetadataValueCount = 0,
MetadataValueTitle = g.Key.ValueString,
MetadataValueID = first.ID
};
This is the generated SQL from the original query:
{SELECT
0 AS [C1],
[Project4].[Title] AS [Title],
[Project4].[ID] AS [ID],
[Extent9].[Color] AS [Color],
[Project4].[Collection_ID] AS [Collection_ID],
[Project4].[ValueString] AS [ValueString],
[Project4].[C1] AS [C2]
FROM (SELECT
[Project2].[ValueString] AS [ValueString],
[Project2].[ID] AS [ID],
[Project2].[Title] AS [Title],
[Project2].[Collection_ID] AS [Collection_ID],
(SELECT TOP (1)
[Filter4].[ID1] AS [ID]
FROM ( SELECT [Extent6].[ID] AS [ID1], [Extent6].[ValueString] AS [ValueString], [Extent7].[Collection_ID] AS [Collection_ID1], [Extent8].[ID] AS [ID2], [Extent8].[InputType] AS [InputType], [Extent8].[ShowInFilter] AS [ShowInFilter], [Extent8].[IsHidden] AS [IsHidden1]
FROM [dbo].[MetadataValue] AS [Extent6]
LEFT OUTER JOIN [dbo].[Media] AS [Extent7] ON [Extent6].[Media_ID] = [Extent7].[ID]
INNER JOIN [dbo].[Metadata] AS [Extent8] ON [Extent6].[Metadata_ID] = [Extent8].[ID]
WHERE ( NOT (([Extent6].[ValueString] IS NULL) OR (( CAST(LEN([Extent6].[ValueString]) AS int)) = 0))) AND ([Extent7].[IsHidden] <> cast(1 as bit))
) AS [Filter4]
WHERE (2 = CAST( [Filter4].[InputType] AS int)) AND ([Filter4].[ShowInFilter] = 1) AND ([Filter4].[IsHidden1] <> cast(1 as bit)) AND ([Filter4].[Collection_ID1] = #p__linq__0) AND (([Project2].[ValueString] = [Filter4].[ValueString]) OR (([Project2].[ValueString] IS NULL) AND ([Filter4].[ValueString] IS NULL))) AND (([Project2].[ID] = [Filter4].[ID2]) OR (1 = 0))) AS [C1]
FROM ( SELECT
[Distinct1].[ValueString] AS [ValueString],
[Distinct1].[ID] AS [ID],
[Distinct1].[Title] AS [Title],
[Distinct1].[Collection_ID] AS [Collection_ID]
FROM ( SELECT DISTINCT
[Filter2].[ValueString] AS [ValueString],
[Filter2].[ID3] AS [ID],
[Filter2].[InputType1] AS [InputType],
[Filter2].[Title1] AS [Title],
[Filter2].[ShowInFilter1] AS [ShowInFilter],
[Filter2].[IsHidden2] AS [IsHidden],
[Filter2].[Collection_ID2] AS [Collection_ID]
FROM ( SELECT [Filter1].[ValueString], [Filter1].[Collection_ID3], [Filter1].[IsHidden3], [Filter1].[ID3], [Filter1].[InputType1], [Filter1].[Title1], [Filter1].[ShowInFilter1], [Filter1].[IsHidden2], [Filter1].[Collection_ID2]
FROM ( SELECT [Extent1].[ValueString] AS [ValueString], [Extent2].[Collection_ID] AS [Collection_ID3], [Extent4].[IsHidden] AS [IsHidden3], [Extent5].[ID] AS [ID3], [Extent5].[InputType] AS [InputType1], [Extent5].[Title] AS [Title1], [Extent5].[ShowInFilter] AS [ShowInFilter1], [Extent5].[IsHidden] AS [IsHidden2], [Extent5].[Collection_ID] AS [Collection_ID2]
FROM [dbo].[MetadataValue] AS [Extent1]
LEFT OUTER JOIN [dbo].[Media] AS [Extent2] ON [Extent1].[Media_ID] = [Extent2].[ID]
INNER JOIN [dbo].[Metadata] AS [Extent3] ON [Extent1].[Metadata_ID] = [Extent3].[ID]
LEFT OUTER JOIN [dbo].[Metadata] AS [Extent4] ON [Extent1].[Metadata_ID] = [Extent4].[ID]
LEFT OUTER JOIN [dbo].[Metadata] AS [Extent5] ON [Extent1].[Metadata_ID] = [Extent5].[ID]
WHERE ( NOT (([Extent1].[ValueString] IS NULL) OR (( CAST(LEN([Extent1].[ValueString]) AS int)) = 0))) AND ([Extent2].[IsHidden] <> cast(1 as bit)) AND (2 = CAST( [Extent3].[InputType] AS int)) AND ([Extent3].[ShowInFilter] = 1)
) AS [Filter1]
WHERE [Filter1].[IsHidden3] <> cast(1 as bit)
) AS [Filter2]
WHERE [Filter2].[Collection_ID3] = #p__linq__0
) AS [Distinct1]
) AS [Project2] ) AS [Project4]
LEFT OUTER JOIN [dbo].[Collection] AS [Extent9] ON [Project4].[Collection_ID] = [Extent9].[ID]}
If we remove the "let first = g.FirstOrDefault()" and change "MetadataValueID = first.ID" to "MetadataValueID = 0" so that we just have a fixed ID = 0 for testing purposes, then the data loads very fast and the generated query itself is half the size compared to the original
So it seems that this part is making the query very slow:
let first = g.FirstOrDefault()
...
MetadataValueID = first.ID
};
How can this be rewritten?
If I try to rewrite the code, it is still slow:
MetadataValueID = g.Select(x => x.ID).FirstOrDefault()
or
let first = g.Select(x => x.ID).FirstOrDefault()
...
MetadataValueID = first
};
Any suggestions?
Using EF I have allways felt that it has problems efficiently translating stuff like g.Key.Metadata.Collection, so I try to join more explicitly and to include only fields, that are neccessary for your result. You can use include instead of join using repository pattern.
Then your query would look like this:
from mdv in allMetaDataValues.Include("Metadata").Include("Metadata.Collection")
where mdv.Metadata.InputType == MetadataInputType.String &&
mdv.Metadata.ShowInFilter &&
!mdv.Metadata.IsHidden &&
!string.IsNullOrEmpty(mdv.ValueString)
group mdv by new
{
MetadataID = mdv.Metadata.ID,
CollectionID = mdv.Metadata.Collection.ID,
mdv.Metadata.Title,
mdv.Metadata.Collection.Color,
mdv.ValueString
} into g
let first = g.FirstOrDefault().ID
select new
{
MetadataTitle = g.Key.Title,
MetadataID = g.Key.MetadataID,
CollectionColor = g.Key.Color,
CollectionID = g.Key.CollectionID,
MetadataValueCount = 0,
MetadataValueTitle = g.Key.ValueString,
MetadataValueID = first
}
Good tool for playing with linq is LinqPad.
The problem is also that:
let first = g.FirstOrDefault().ID
cannot be easily translated to SQL see this answer. But this rewrite simplifies the underlying query for it at least. It remains to me unclear, why you need first ID from a set without using orderby.
It could be rewriten like this:
let first = (from f in allMetaDataValues
where f.Metadata.ID == g.Key.MetadataID &&
f.ValuesString == g.Key.ValuesString select f.ID)
.FirstOrDefault()
This way you do not let EF write the query for you and you can specify exactly how to do the select.
To speed up the query you can also consider adding indexes to database according to the generated query - namely index using both colums used in where clause of this let first query.
Try the following solution.
Replace FirstOrDefault() with .Take(1). FirstOrDefault() is not lazy loaded.
var tmpResult = from mdv in allMetaDataValues
where mdv.Metadata.InputType == MetadataInputType.String && mdv.Metadata.ShowInFilter && !mdv.Metadata.IsHidden && !string.IsNullOrEmpty(mdv.ValueString)
group mdv by new
{
mdv.ValueString,
mdv.Metadata
} into g
let first = g.Take(1)
select new
{
MetadataTitle = g.Key.Metadata.Title,
MetadataID = g.Key.Metadata.ID,
CollectionColor = g.Key.Metadata.Collection.Color,
CollectionID = g.Key.Metadata.Collection.ID,
MetadataValueCount = 0,
MetadataValueTitle = g.Key.ValueString,
MetadataValueID = first.ID
};

Optimising LINQ-to-SQL queries

I have a very heavy LINQ-to-SQL query, which does a number of joins onto different tables to return an anonymous type. The problem is, if the amount of rows returned is fairly large (> 200), then the query becomes awfully slow and ends up timing out. I know I can increase the data context timeout setting, but that's a last resort.
I'm just wondering if my query would work better if I split it up, and do my comparisons as LINQ-to-Objects queries so I can possibly even use PLINQ to maximise the the processing power. But I'm that's a foreign concept to me, and I can't get my head around on how I would split it up. Can anyone offer any advice? I'm not asking for the code to be written for me, just some general guidance on how I could improve this would be great.
Note I've ensured the database has all the correct keys that I'm joining on, and I've ensured these keys are up to date.
The query is below:
var cons = (from c in dc.Consignments
join p in dc.PODs on c.IntConNo equals p.Consignment into pg
join d in dc.Depots on c.DeliveryDepot equals d.Letter
join sl in dc.Accounts on c.Customer equals sl.LegacyID
join ss in dc.Accounts on sl.InvoiceAccount equals ss.LegacyID
join su in dc.Accounts on c.Subcontractor equals su.Name into sug
join sub in dc.Accountsubbies on ss.ID equals sub.AccountID into subg
where (sug.FirstOrDefault() == null
|| sug.FirstOrDefault().Customer == false)
select new
{
ID = c.ID,
IntConNo = c.IntConNo,
LegacyID = c.LegacyID,
PODs = pg.DefaultIfEmpty(),
TripNumber = c.TripNumber,
DropSequence = c.DropSequence,
TripDate = c.TripDate,
Depot = d.Name,
CustomerName = c.Customer,
CustomerReference = c.CustomerReference,
DeliveryName = c.DeliveryName,
DeliveryTown = c.DeliveryTown,
DeliveryPostcode = c.DeliveryPostcode,
VehicleText = c.VehicleReg + c.Subcontractor,
SubbieID = sug.DefaultIfEmpty().FirstOrDefault().ID.ToString(),
SubbieList = subg.DefaultIfEmpty(),
ScanType = ss.PODScanning == null ? 0 : ss.PODScanning
});
Here's the generated SQL as requested:
{SELECT [t0].[ID], [t0].[IntConNo], [t0].[LegacyID], [t6].[test], [t6].[ID] AS [ID2], [t6].[Consignment], [t6].[Status], [t6].[NTConsignment], [t6].[CustomerRef], [t6].[Timestamp], [t6].[SignedBy], [t6].[Clause], [t6].[BarcodeNumber], [t6].[MainRef], [t6].[Notes], [t6].[ConsignmentRef], [t6].[PODedBy], (
SELECT COUNT(*)
FROM (
SELECT NULL AS [EMPTY]
) AS [t10]
LEFT OUTER JOIN (
SELECT NULL AS [EMPTY]
FROM [dbo].[PODs] AS [t11]
WHERE [t0].[IntConNo] = [t11].[Consignment]
) AS [t12] ON 1=1
) AS [value], [t0].[TripNumber], [t0].[DropSequence], [t0].[TripDate], [t1].[Name] AS [Depot], [t0].[Customer] AS [CustomerName], [t0].[CustomerReference], [t0].[DeliveryName], [t0].[DeliveryTown], [t0].[DeliveryPostcode], [t0].[VehicleReg] + [t0].[Subcontractor] AS [VehicleText], CONVERT(NVarChar,(
SELECT [t16].[ID]
FROM (
SELECT TOP (1) [t15].[ID]
FROM (
SELECT NULL AS [EMPTY]
) AS [t13]
LEFT OUTER JOIN (
SELECT [t14].[ID]
FROM [dbo].[Account] AS [t14]
WHERE [t0].[Subcontractor] = [t14].[Name]
) AS [t15] ON 1=1
ORDER BY [t15].[ID]
) AS [t16]
)) AS [SubbieID],
(CASE
WHEN [t3].[PODScanning] IS NULL THEN #p0
ELSE [t3].[PODScanning]
END) AS [ScanType], [t3].[ID] AS [ID3]
FROM [dbo].[Consignments] AS [t0]
INNER JOIN [dbo].[Depots] AS [t1] ON [t0].[DeliveryDepot] = [t1].[Letter]
INNER JOIN [dbo].[Account] AS [t2] ON [t0].[Customer] = [t2].[LegacyID]
INNER JOIN [dbo].[Account] AS [t3] ON [t2].[InvoiceAccount] = [t3].[LegacyID]
LEFT OUTER JOIN ((
SELECT NULL AS [EMPTY]
) AS [t4]
LEFT OUTER JOIN (
SELECT 1 AS [test], [t5].[ID], [t5].[Consignment], [t5].[Status], [t5].[NTConsignment], [t5].[CustomerRef], [t5].[Timestamp], [t5].[SignedBy], [t5].[Clause], [t5].[BarcodeNumber], [t5].[MainRef], [t5].[Notes], [t5].[ConsignmentRef], [t5].[PODedBy]
FROM [dbo].[PODs] AS [t5]
) AS [t6] ON 1=1 ) ON [t0].[IntConNo] = [t6].[Consignment]
WHERE ((NOT (EXISTS(
SELECT TOP (1) NULL AS [EMPTY]
FROM [dbo].[Account] AS [t7]
WHERE [t0].[Subcontractor] = [t7].[Name]
ORDER BY [t7].[ID]
))) OR (NOT (((
SELECT [t9].[Customer]
FROM (
SELECT TOP (1) [t8].[Customer]
FROM [dbo].[Account] AS [t8]
WHERE [t0].[Subcontractor] = [t8].[Name]
ORDER BY [t8].[ID]
) AS [t9]
)) = 1))) AND ([t2].[Customer] = 1) AND ([t3].[Customer] = 1)
ORDER BY [t0].[ID], [t1].[ID], [t2].[ID], [t3].[ID], [t6].[ID]
}
Try moving the subcontractor join up higher and push the where clause along with it. That way you're not unnecessarily making joins which would fail at the end.
I would also modify the select for the subcontractor id, so you don't get the Id of a potentially null value.
var cons = (from c in dc.Consignments
join su in dc.Accounts on c.Subcontractor equals su.Name into sug
where (sug.FirstOrDefault() == null || sug.FirstOrDefault().Customer == false)
join p in dc.PODs on c.IntConNo equals p.Consignment into pg
join d in dc.Depots on c.DeliveryDepot equals d.Letter
join sl in dc.Accounts on c.Customer equals sl.LegacyID
join ss in dc.Accounts on sl.InvoiceAccount equals ss.LegacyID
join sub in dc.Accountsubbies on ss.ID equals sub.AccountID into subg
let firstSubContractor = sug.DefaultIfEmpty().FirstOrDefault()
select new
{
ID = c.ID,
IntConNo = c.IntConNo,
LegacyID = c.LegacyID,
PODs = pg.DefaultIfEmpty(),
TripNumber = c.TripNumber,
DropSequence = c.DropSequence,
TripDate = c.TripDate,
Depot = d.Name,
CustomerName = c.Customer,
CustomerReference = c.CustomerReference,
DeliveryName = c.DeliveryName,
DeliveryTown = c.DeliveryTown,
DeliveryPostcode = c.DeliveryPostcode,
VehicleText = c.VehicleReg + c.Subcontractor,
SubbieID = firstSubContractor == null ? "" : firstSubContractor.ID.ToString(),
SubbieList = subg.DefaultIfEmpty(),
ScanType = ss.PODScanning == null ? 0 : ss.PODScanning
});

Categories