EF6 cascaded objects very slow to set up - C#

I have a problem with Entity Framework when trying to retrieve data.
I split the work into multiple steps:
Generate the query.
Execute it and retrieve the dataset from the DB.
Fill in my ViewModel using the dataset.
Steps 1 and 2 are actually very fast; step 3 can take up to a minute (for 200 records). So it is not SQL related (I copied the query from the debugger into SSMS, and it executed in less than a second).
At first I was using step 3B: to keep it simple, I retrieve a Job entity and transform it into a MapMarker object, and I thought it was the ConvertAll that was slowing down the process.
After some SO reading, I tested using Select, but the result was the same.
The only thing is, if I use only the "main object" (in this example, Job), everything is fast; as a test, I put Job.Job_ID into every field, and the execution time is normal (less than a second).
Then I put back ,Latitude = _Job.Maintenance.Equipement.Location.GPS.GPS_Latitude.Value and the slowness returns.
I even tried step 3C using a foreach loop (which I knew was not better, but OK...) and it is as slow as the other solutions.
The main question is:
What am I missing in the EF6 configuration (or somewhere else?) that makes this process so slow?
I am tempted to do it the good old way and execute my own SQL query. I started using EF on the assumption that these entities would be usable; "simple" objects work really well, but if you cannot cascade through them, what is the added value?
Below are the steps I am talking about.
Step 1:
IEnumerable<Job> Jobs = db.Job.Include(e => e.Maintenance.MaintenancePlan.MaintenanceType)
.Include(e => e.Maintenance.MaintenancePlan.MaintenanceType.Shape)
.Include(e => e.Maintenance.MaintenanceStatus)
.Include(e => e.Users)
.Include(e => e.Users.Color)
.Include(e => e.Maintenance.Equipement.Location.GPS);
Step 2:
List<Job> listJobs = Jobs.ToList();
Step 3A:
IEnumerable<MapMarker> IEMarkerJobsA = Jobs.AsEnumerable().Select(_Job => new MapMarker
{
ID = string.Format("Job_{0}", _Job.Job_ID)
,Latitude = _Job.Maintenance.Equipement.Location.GPS.GPS_Latitude.Value
,Longitude = _Job.Maintenance.Equipement.Location.GPS.GPS_Longitude.Value
});
List<MapMarker> listMarkerJobsA = IEMarkerJobsA.ToList();
Step 3B:
List<MapMarker> listMarkerJobs = listJobs.ConvertAll(
new Converter<Job, MapMarker>(MapMarker.MapMarkerFactory));
where the factory is defined like this:
public static MapMarker MapMarkerFactory(Job _Job)
{
MapMarker A = new MapMarker();
A.ID = String.Format("Job_{0}", _Job.Job_ID);
A.Latitude = _Job.Maintenance.Equipement.Location.GPS.GPS_Latitude.Value;
A.Longitude = _Job.Maintenance.Equipement.Location.GPS.GPS_Longitude.Value;
A.Title = String.Format("{0} {1}", (_Job.Users != null) ? String.Format("[{0}]", _Job.Users.Users_NickName) : "", _Job.Maintenance.Equipement.Equipement_Name);
A.Icon = GetIconePath((_Job.Users != null) ? _Job.Users.Color.Color_Name : "red", _Job.Maintenance.MaintenancePlan.MaintenanceType.Shape.Shape_Name, _Job.Maintenance.MaintenanceStatus.MaintenanceStatus_Description, "13px");
A.IconSize = new Size(13, 13);
A.WindowInfoContent = String.Format("JobID= {0}", _Job.Job_ID);
return A;
}
Step 3C:
List<MapMarker> listMarkerJobs = new List<MapMarker>();
foreach (Job _Job in Jobs)
{
MapMarker A = new MapMarker();
A.ID = String.Format("Job_{0}", _Job.Job_ID);
A.Latitude = _Job.Maintenance.Equipement.Location.GPS.GPS_Latitude.Value;
A.Longitude = _Job.Maintenance.Equipement.Location.GPS.GPS_Longitude.Value;
A.Title = String.Format("{0} {1}", (_Job.Users != null) ? String.Format("[{0}]", _Job.Users.Users_NickName) : "", _Job.Maintenance.Equipement.Equipement_Name);
A.Icon = MapMarker.GetIconePath((_Job.Users != null) ? _Job.Users.Color.Color_Name : "red", _Job.Maintenance.MaintenancePlan.MaintenanceType.Shape.Shape_Name, _Job.Maintenance.MaintenanceStatus.MaintenanceStatus_Description, "13px");
A.IconSize = new System.Drawing.Size(13, 13);
A.WindowInfoContent = String.Format("JobID= {0}", _Job.Job_ID);
listMarkerJobs.Add(A);
}

Try this for part 3. Keep Jobs typed as IQueryable<Job> (don't call ToList() first), so the whole projection is translated to SQL and EF skips change tracking:
List<MapMarker> listMarkerJobs = Jobs.AsNoTracking().Select(_Job => new MapMarker
{
ID = string.Format("Job_{0}", _Job.Job_ID),
Latitude = _Job.Maintenance.Equipement.Location.GPS.GPS_Latitude.Value,
Longitude = _Job.Maintenance.Equipement.Location.GPS.GPS_Longitude.Value
}).ToList();
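If it is still slow after that, the usual culprit in EF6 is lazy-loading proxies firing one extra query per navigation property per row. A minimal sketch, assuming db is your DbContext, that turns both features off so navigations are only populated by your Include() calls:
db.Configuration.LazyLoadingEnabled = false; // no hidden per-row queries
db.Configuration.ProxyCreationEnabled = false; // plain POCO entities instead of lazy-loading proxies
With lazy loading off, any navigation you forgot to Include is simply null instead of triggering a hidden query, which makes an N+1 problem easy to spot.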

Related

How do I convert this looped code to a single LINQ implementation?

I am trying to optimise the code below, which loops through objects one by one and does a database lookup. I want to make a LINQ statement that will do the same task in one transaction.
This is my inefficient looped code:
IStoreUnitOfWork uow = StoreRepository.UnitOfWorkSource.GetUnitOfWorkFactory().CreateUnitOfWork();
var localRunners = new List<Runners>();
foreach(var remoteRunner in m.Runners) {
var localRunner = uow.CacheMarketRunners.Where(x => x.SelectionId == remoteRunner.SelectionId && x.MarketId == m.MarketId).FirstOrDefault();
localRunners.Add(localRunner);
}
This is my very feeble attempt at a single query to do the same thing. Well, it's not even an attempt; I don't know where to start. The remoteRunners object has a composite key.
IStoreUnitOfWork uow = StoreRepository.UnitOfWorkSource.GetUnitOfWorkFactory().CreateUnitOfWork();
var localRunners = new List<Runners>();
var localRunners = uow.CacheMarketRunners.Where(x =>
x.SelectionId in remoteRunners.SelectionId &&
x.MarketId in remoteRunners.MarketId);
Thank you for looking
So you have an object m, which has a property MarketId. Object m also has a sequence of Runners, where every Runner has a property SelectionId.
Your database has CacheMarketRunners. Every CacheMarketRunner has a MarketId and a SelectionId.
Your query should return all CacheMarketRunners with a MarketId equal to m.MarketId and a SelectionId that is contained in the SelectionIds of m.Runners.
If your m does not have too many Runners, say fewer than 250, consider using Queryable.Contains:
var requestedSelectionIds = m.Runners.Select(runner => runner.SelectionId);
var result = CacheMarketRunners.Where(cacheMarketRunner =>
cacheMarketRunner.MarketId == m.MarketId
&& requestedSelectionIds.Contains(cacheMarketRunner.SelectionId));
To improve performance, cache the transaction's results first:
var marketRunners = uow.CacheMarketRunners.Where(x => x.MarketId == m.MarketId).ToList();
The results of the uow query are stored in the list, so there is no database transaction inside the for loop; hence performance should improve:
var localRunners = new List<Runners>();
foreach(var remoteRunner in m.Runners) {
var localRunner = marketRunners.FirstOrDefault(x => x.SelectionId == remoteRunner.SelectionId);
localRunners.Add(localRunner);
}
You can even remove the for loop:
var localRunners = m.Runners.Select(remoteRunner => marketRunners.FirstOrDefault(x => x.SelectionId == remoteRunner.SelectionId)).ToList();
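If m.Runners can be large, those repeated FirstOrDefault scans are O(n·m); here is a small sketch of a dictionary-based variant, under the same assumptions plus the assumption that SelectionId is unique within one market:
var bySelectionId = marketRunners.ToDictionary(x => x.SelectionId); // one pass to index the cached rows
var localRunners = m.Runners
.Select(r => bySelectionId.TryGetValue(r.SelectionId, out var match) ? match : null) // null when no match, mirroring FirstOrDefault
.ToList();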

Most efficient way to search enumerable

I am writing a small program that takes a .csv file as input with about 45k rows. I am trying to compare the contents of this file with the contents of a table on a database (SQL Server through Dynamics CRM using Xrm.Sdk, if it makes a difference).
In my current program (which takes about 25 minutes to do the comparison; the file and database are identical here, both 45k rows with no differences), I have all existing records from the database in a DataCollection<Entity>, which inherits Collection<T> and implements IEnumerable<T>.
In my code below I am filtering using the Where method and then branching on the count of matches. The Where seems to be the bottleneck here. Is there a more efficient approach than this? I am by no means a LINQ expert.
foreach (var record in inputDataLines)
{
var fields = record.Split(',');
var fund = fields[0];
var bps = Convert.ToDecimal(fields[1]);
var withdrawalPct = Convert.ToDecimal(fields[2]);
var percentile = Convert.ToInt32(fields[3]);
var age = Convert.ToInt32(fields[4]);
var bombOutTerm = Convert.ToDecimal(fields[5]);
var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
&& Convert.ToDecimal(r["field_2"]) == bps
&& Convert.ToDecimal(r["field_3"]) == withdrawalPct
&& Convert.ToDecimal(r["field_4"]) == percentile
&& Convert.ToDecimal(r["field_5"]) == age);
entitiesFound.AddRange(matchingRows);
if (matchingRows.Count() == 0)
{
rowsToAdd.Add(record);
}
else if (matchingRows.Count() == 1)
{
if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
{
rowsToUpdate.Add(record);
entitiesToUpdate.Add(matchingRows.First());
}
}
else
{
entitiesToDelete.AddRange(matchingRows);
rowsToAdd.Add(record);
}
}
EDIT: I can confirm that all existingRecords are in memory before this code is executed. There is no IO or DB access in the above loop.
Himbrombeere is right: you should execute the query first and put the result into a collection before you use Any, Count, AddRange or any other method that will execute the query again. In your code, the query may be executed five times in every loop iteration.
Watch out for the term deferred execution in the documentation. If a method is implemented that way, it means the method can be used to construct a LINQ query (you can chain it with other methods, and at the end you have a query). Only methods that don't use deferred execution, like Count, Any, and ToList (or a plain foreach), will actually execute it. If you don't want the whole query to be executed every time you access it, it's better to store the result in a collection (e.g. with ToList).
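A tiny self-contained illustration of the difference (the names are only for the example):
var numbers = Enumerable.Range(0, 1000);
var query = numbers.Where(n => n > 10); // deferred: nothing has run yet
var any = query.Any(); // executes the query
var count = query.Count(); // executes it again from scratch
var results = query.ToList(); // execute once and materialize
var anyAgain = results.Any(); // now just reads the in-memory list
var countAgain = results.Count; // a property read, no re-execution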
However, you could use a different approach that should be much more efficient: a Lookup<TKey, TElement>, which is similar to a dictionary and can be used with an anonymous type as the key:
var lookup = existingRecords.Entities.ToLookup(r => new
{
fund = r["field_1"].ToString(),
bps = Convert.ToDecimal(r["field_2"]),
withdrawalPct = Convert.ToDecimal(r["field_3"]),
percentile = Convert.ToInt32(r["field_4"]),
age = Convert.ToInt32(r["field_5"])
});
Now you can access this lookup in the loop very efficiently.
foreach (var record in inputDataLines)
{
var fields = record.Split(',');
var fund = fields[0];
var bps = Convert.ToDecimal(fields[1]);
var withdrawalPct = Convert.ToDecimal(fields[2]);
var percentile = Convert.ToInt32(fields[3]);
var age = Convert.ToInt32(fields[4]);
var bombOutTerm = Convert.ToDecimal(fields[5]);
var matchingRows = lookup[new {fund, bps, withdrawalPct, percentile, age}].ToList();
entitiesFound.AddRange(matchingRows);
if (matchingRows.Count() == 0)
{
rowsToAdd.Add(record);
}
else if (matchingRows.Count() == 1)
{
if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
{
rowsToUpdate.Add(record);
entitiesToUpdate.Add(matchingRows.First());
}
}
else
{
entitiesToDelete.AddRange(matchingRows);
rowsToAdd.Add(record);
}
}
Note that this will work even if the key does not exist(an empty list is returned).
Add a ToList after your Convert.ToDecimal(r["field_5"]) == age) line to force immediate execution of the query.
var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
&& Convert.ToDecimal(r["field_2"]) == bps
&& Convert.ToDecimal(r["field_3"]) == withdrawalPct
&& Convert.ToDecimal(r["field_4"]) == percentile
&& Convert.ToDecimal(r["field_5"]) == age)
.ToList();
The Where doesn't actually execute your query; it just prepares it. The actual execution happens later, in a deferred way. In your case that happens when calling Count, which itself iterates the entire collection. If the first condition fails, the second one is checked, leading to a second iteration of the complete collection when calling Count again. And you actually execute the query a third time when calling matchingRows.First().
By forcing immediate execution you execute the query, and thus iterate the entire collection, only once, which will decrease the overall time.
Another option, which is basically along the same lines as the other answers, is to prepare your data first, so that you're not repeatedly calling things like r["field_2"] (which are relatively slow to look up).
This is a (1) clean your data, (2) query/join your data, (3) process your data approach.
Do this:
(1)
var inputs =
inputDataLines
.Select(record =>
{
var fields = record.Split(',');
return new
{
fund = fields[0],
bps = Convert.ToDecimal(fields[1]),
withdrawalPct = Convert.ToDecimal(fields[2]),
percentile = Convert.ToInt32(fields[3]),
age = Convert.ToInt32(fields[4]),
bombOutTerm = Convert.ToDecimal(fields[5]),
record
};
})
.ToArray();
var entities =
existingRecords
.Entities
.Select(entity => new
{
fund = entity["field_1"].ToString(),
bps = Convert.ToDecimal(entity["field_2"]),
withdrawalPct = Convert.ToDecimal(entity["field_3"]),
percentile = Convert.ToInt32(entity["field_4"]),
age = Convert.ToInt32(entity["field_5"]),
bombOutTerm = Convert.ToDecimal(entity["field_6"]),
entity
})
.ToArray()
.GroupBy(x => new
{
x.fund,
x.bps,
x.withdrawalPct,
x.percentile,
x.age
}, x => new
{
x.bombOutTerm,
x.entity,
});
(2)
var query =
from i in inputs
join e in entities on new { i.fund, i.bps, i.withdrawalPct, i.percentile, i.age } equals e.Key into matches
select new { input = i, matchingRows = matches.SelectMany(g => g).ToList() };
(3)
foreach (var x in query)
{
entitiesFound.AddRange(x.matchingRows.Select(y => y.entity));
if (x.matchingRows.Count() == 0)
{
rowsToAdd.Add(x.input.record);
}
else if (x.matchingRows.Count() == 1)
{
if (x.matchingRows.First().bombOutTerm != x.input.bombOutTerm)
{
rowsToUpdate.Add(x.input.record);
entitiesToUpdate.Add(x.matchingRows.First().entity);
}
}
else
{
entitiesToDelete.AddRange(x.matchingRows.Select(y => y.entity));
rowsToAdd.Add(x.input.record);
}
}
I suspect this will be among the fastest approaches presented.

Why does EF load data from the database and ignores local changes?

I have the following code:
var existingParticipant = Context.CaseParticipants.Where(p => p.CaseId == caseId);
foreach (var cp in existingParticipant)
{
var ncp = caseParticipantList.First(a => a.Id == cp.Id);
cp.IsIncompetent = ncp.IsIncompetent;
cp.IsLeave = ncp.IsLeave;
cp.SubstituteUserId = ncp.IsPresent ? null : ncp.SubstituteUserId;
}
var withSubs = existingParticipant.Where(c => c.SubstituteUserId != null).ToList();
What surprised me is that the last line fetches the rows from the DB a second time, ignoring the changes I have just made in the previous lines. Why is that, and how do I avoid it?
I think your problem is that existingParticipant is a query, not a list. That query gets executed for the foreach, but existingParticipant remains a query that will be executed against the database again when you call ToList(). To solve it, execute the initial query straight away; that way you work in memory on your changed entities.
IList<...> existingParticipant = Context.CaseParticipants.Where(p => p.CaseId == caseId).ToList(); // explicitly execute the query
foreach (var cp in existingParticipant)
{
var ncp = caseParticipantList.First(a => a.Id == cp.Id);
cp.IsIncompetent = ncp.IsIncompetent;
cp.IsLeave = ncp.IsLeave;
cp.SubstituteUserId = ncp.IsPresent ? null : ncp.SubstituteUserId;
}
var withSubs = existingParticipant.Where(c => c.SubstituteUserId != null).ToList(); // Working in memory on list
The type of existingParticipant is IQueryable; that means you don't get the objects into memory but only a query, which runs against the database directly.
If you want to process your objects in memory, append .ToList():
Context.CaseParticipants.Where(p => p.CaseId == caseId).ToList()

The query processor ran out of internal resources and could not produce a query plan in EF

I have a query in EF where a List of string values is checked for existence in another table.
Please consider the query below for more details.
Code
List<string> ItmsStock = item.Select(ds => ds.ItemNum).ToList(); // currently this list holds 80,000 records
this.Db.Database.CommandTimeout = 180;
var existsStckList = Db.Stocktakes.Where(ds => ItmsStock.Contains(ds.ItemNo)).Select(ds => ds.ItemNo).ToList();
item.RemoveAll(ds => existsStckList.Contains(ds.ItemNum));
var ItmsExists = Db.Items.Where(ds => ItmsStock.Contains(ds.ItemNo)).Select(ds => ds.ItemNo).ToList();
ItmsExists = Db.Stocktakes.Where(ds => !ItmsExists.Contains(ds.ItemNo)).Select(ds => ds.ItemNo).ToList();
I searched the internet and found that the generated SQL uses IN to check for existence, so the limit on the IN list size causes the problem. My question is: how can I efficiently perform the above actions without using a for loop?
I'll appreciate it if anybody can help me out.
Edit
Previously, I had the code below. After facing performance issues with it, I wrote the version above.
foreach (var stockitems in item)
{
if (Db.Stocktakes.Any(a => a.ItemNo == stockitems.ItemNum))
{
StockResult ss = new StockResult();
ss.ItemNumber = stockitems.ItemNum;
ss.FileName = stockitems.FileName;
Stockres.Add(ss);
}
else if (!Db.Stocktakes.Any(a => a.ItemNo == stockitems.ItemNum) && Db.Items.Any(a => a.ItemNo == stockitems.ItemNum))
{
var ItemNo = stockitems.ItemNum;
var AdminId = Convert.ToInt32(Session["AccId"]);
var CreatedOn = System.DateTime.Now;
int dbres = Db.Database.ExecuteSqlCommand("insert into Stocktake values({0},{1},{2})", ItemNo, AdminId, CreatedOn);
Db.SaveChanges();
totalcount = totalcount + 1;
}
else
{
StockResult sss = new StockResult();
sss.ItemNumber = stockitems.ItemNum;
sss.FileName = stockitems.FileName;
Stockitemsdup.Add(sss);
}
}
Thanks.
Issue batches of 1000 item IDs to the database (see the sketch below), or use native SQL and submit a table-valued parameter, or a temp table filled with SqlBulkCopy.
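As a hedged sketch of the batching option, reusing ItmsStock and Db from the question:
const int batchSize = 1000; // stays well under SQL Server's 2100-parameter limit
var existsStckList = new List<string>();
for (int i = 0; i < ItmsStock.Count; i += batchSize)
{
var batch = ItmsStock.Skip(i).Take(batchSize).ToList();
existsStckList.AddRange(
Db.Stocktakes.Where(ds => batch.Contains(ds.ItemNo))
.Select(ds => ds.ItemNo)); // one IN (...) query per batch
}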
I'm surprised you got this particular message; the limit is about 2100 parameters per query, so I would have expected your query to be rejected outright.

Complexity limits of Linq queries

I'm a big fan of Linq, and I have been really enjoying the power of expression trees etc. But I have found that whenever I try to get too clever with my queries, I hit some kind of limitation in the framework: while the query can take a very short time to run on the database (as shown by the performance analyzer), the results take ages to materialize. When that happens I know I've been too fancy, and I start breaking the query up into smaller, bite-sized chunks - so I have a solution for that, though it might not always be the most optimal.
But I'd like to understand:
What is it that pushes the Linq framework over the edge in terms of materializing the query results?
Where can I read about the mechanism of materializing query results?
Is there a certain measurable complexity limit for Linq queries that should be avoided?
What design patterns are known to cause this problem, and what patterns can remedy it?
EDIT: As requested in the comments, here's an example of a query that I measured to run on SQL Server in a few seconds but took almost two minutes to materialize. I'm not going to try to explain all the stuff in context; it's here just so you can view the constructs and see an example of what I'm talking about:
Expression<Func<Staff, TeacherInfo>> teacherInfo =
st => new TeacherInfo
{
ID = st.ID,
Name = st.FirstName + " " + st.LastName,
Email = st.Email,
Phone = st.TelMobile,
};
var step1 =
currentReportCards.AsExpandable()
.GroupJoin(db.ScholarReportCards,
current =>
new { current.ScholarID, current.AcademicTerm.AcademicYearID },
past => new { past.ScholarID, past.AcademicTerm.AcademicYearID },
(current, past) => new
{
Current = current,
PastCards =
past.Where(
rc =>
rc.AcademicTerm.StartDate <
current.AcademicTerm.StartDate &&
rc.AcademicTerm.Grade == current.AcademicTerm.Grade &&
rc.AcademicTerm.SchoolID == current.AcademicTerm.SchoolID)
});
// This materialization is what takes a long time:
var subjects = step1.SelectMany(x => from key in x.Current.Subjects
.Select(s => new { s.Subject.SubjectID, s.Subject.SubjectCategoryID })
.Union(x.PastCards.SelectMany(c => c.Subjects)
.Select(
s => new { s.Subject.SubjectID, s.Subject.SubjectCategoryID }))
join cur in x.Current.Subjects on key equals
new { cur.Subject.SubjectID, cur.Subject.SubjectCategoryID } into jcur
from cur in jcur.DefaultIfEmpty()
join past in x.PastCards.SelectMany(p => p.Subjects) on key equals
new { past.Subject.SubjectID, past.Subject.SubjectCategoryID } into past
select new
{
x.Current.ScholarID,
IncludeInContactSection =
// ReSharper disable ConstantNullCoalescingCondition
(bool?)cur.Subject.IncludeInContactSection ?? false,
IncludeGrades = (bool?)cur.Subject.IncludeGrades ?? true,
// ReSharper restore ConstantNullCoalescingCondition
SubjectName =
cur.Subject.Subject.Name ?? past.FirstOrDefault().Subject.Subject.Name,
SubjectCategoryName = cur.Subject.SubjectCategory.Description,
ClassInfo = (from ce in myDb.ClassEnrollments
.Where(
ce =>
ce.Class.SubjectID == cur.Subject.SubjectID
&& ce.ScholarID == x.Current.ScholarID)
.Where(enrollmentExpr)
.OrderByDescending(ce => ce.TerminationDate ?? DateTime.Today)
let teacher = ce.Class.Teacher
let secTeachers = ce.Class.SecondaryTeachers
select new
{
ce.Class.Nickname,
Primary = teacherInfo.Invoke(teacher),
Secondaries = secTeachers.AsQueryable().AsExpandable()
.Select(ti => teacherInfo.Invoke(ti))
})
.FirstOrDefault(),
Comments = cur.Comments
.Select(cc => new
{
Staff = cc.Staff.FirstName + " "
+ cc.Staff.LastName,
Comment = cc.CommentTemplate.Text ??
cc.CommentFreeText
}),
// ReSharper disable ConstantNullCoalescingCondition
DisplayOrder = (byte?)cur.Subject.DisplayOrder ?? (byte)99,
// ReSharper restore ConstantNullCoalescingCondition
cur.Percentile,
cur.Score,
cur.Symbol,
cur.MasteryLevel,
PastScores = past.Select(p => new
{
p.Score,
p.Symbol,
p.MasteryLevel,
p.ScholarReportCard
.AcademicTermID
}),
Assessments = cur.Assessments
.Select(a => new
{
a.ScholarAssessment.AssessmentID,
a.ScholarAssessment.Assessment.Description,
a.ScholarAssessment.Assessment.Type.Nickname,
a.ScholarAssessment.AssessmentDate,
a.ScoreDesc,
a.ScorePerc,
a.MasteryLevel,
a.ScholarAssessment.Assessment.Type.AssessmentFormat,
a.ScholarAssessment.PublishedStatus,
a.ScholarAssessment.FPScore,
a.ScholarAssessment.TotalScore,
a.ScholarAssessment.Assessment.Type.ScoreType,
a.ScholarAssessment.Assessment.Type.OverrideBelowLabel,
a.ScholarAssessment.Assessment.Type.OverrideApproachingLabel,
a.ScholarAssessment.Assessment.Type.OverrideMeetingLabel,
a.ScholarAssessment.Assessment.Type.OverrideExceedingLabel,
})
})
.ToList();
Linq uses deferred execution for some tasks, for example while iterating through an IEnumerable<T>, so what you call materialization includes some actual data fetching.
var reportCards = db.ScholarReportCards.Where(cr => ...); // this prepares the query
foreach (var rc in reportCards) {} // this executes your query and calls the DB
I think that if you trace/time queries on your SQL Server you may see some queries arriving during the "materialization" step. This problem may even be exacerbated by anti-patterns such as the "Select N+1" problem: for example, it looks like you're not including the AcademicTerm objects in your request; if you don't, resolving these will result in a select N+1, that is, for every ScholarReportCard there will be a call to the DB to lazily resolve the attached AcademicTerm.
If we focus on the Linq-to-DB aspect, at least try not to:
select N+1: Include the related tables you will need (see the sketch after this list)
select too much data: project only the columns you actually need
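As a hedged sketch of both points, using the entity names from the query above:
// (1) Avoid select N+1: eager-load the navigations the code will touch.
var cards = db.ScholarReportCards
.Include(rc => rc.AcademicTerm) // requires using System.Data.Entity;
.ToList(); // one SQL query; AcademicTerm is already populated
// (2) Avoid selecting too much: project only the columns you need.
// EF translates this into a narrower SELECT, and no Include is needed.
var slim = db.ScholarReportCards
.Select(rc => new { rc.ScholarID, rc.AcademicTerm.StartDate })
.ToList();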
