I am trying to optimise the code below which loops through objects one by one and does a database lookup. I want to make a LINQ statement that will do the same task in one transaction.
This is my inefficient looped code:
IStoreUnitOfWork uow = StoreRepository.UnitOfWorkSource.GetUnitOfWorkFactory().CreateUnitOfWork();
var localRunners = new List<Runners>();
foreach(var remoteRunner in m.Runners) {
var localRunner = uow.CacheMarketRunners.Where(x => x.SelectionId == remoteRunner.SelectionId && x.MarketId == m.MarketId).FirstOrDefault();
localRunners.Add(localRunner);
}
This is my very feeble attempt at a single query to do the same thing. Well, it's not even an attempt; I don't know where to start. The remoteRunners object has a composite key.
IStoreUnitOfWork uow = StoreRepository.UnitOfWorkSource.GetUnitOfWorkFactory().CreateUnitOfWork();
var localRunners = new List<Runners>();
var localRunners = uow.CacheMarketRunners.Where(x =>
x.SelectionId in remoteRunners.SelectionId &&
x.MarketId in remoteRunners.MarketId);
Thank you for looking
So you have an object m, which has a property MarketId. Object m also has a sequence of Runners, where every Runner has a property SelectionId.
Your database has CacheMarketRunners. Every CacheMarketRunner has a MarketId and a SelectionId.
Your query should return all CacheMarketRunners with a MarketId equal to m.MarketId and a SelectionId that is contained in the sequence of SelectionIds of m.Runners.
If your m does not have too many Runners, say fewer than 250, consider using Queryable.Contains:
// Materialize the ids once so the query provider sees a plain local list
var requestedSelectionIds = m.Runners.Select(runner => runner.SelectionId).ToList();
var result = uow.CacheMarketRunners.Where(cacheMarketRunner =>
cacheMarketRunner.MarketId == m.MarketId
&& requestedSelectionIds.Contains(cacheMarketRunner.SelectionId));
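A minimal, self-contained sketch of the Contains approach, run against in-memory data rather than a real database. The Runner and CacheMarketRunner records, the string type of MarketId and the FindLocalRunners helper are assumptions made for illustration; they are not the asker's actual types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the entities in the question.
public record Runner(int SelectionId);
public record CacheMarketRunner(string MarketId, int SelectionId);

public static class ContainsSketch
{
    // Returns every cached runner of the given market whose SelectionId
    // appears among the remote runners, using a single query.
    public static List<CacheMarketRunner> FindLocalRunners(
        IQueryable<CacheMarketRunner> cacheMarketRunners,
        string marketId,
        IEnumerable<Runner> remoteRunners)
    {
        // Collect the requested ids once, as a plain local list.
        List<int> requestedSelectionIds = remoteRunners
            .Select(runner => runner.SelectionId)
            .Distinct()
            .ToList();

        return cacheMarketRunners
            .Where(c => c.MarketId == marketId
                     && requestedSelectionIds.Contains(c.SelectionId))
            .ToList();
    }

    public static void Main()
    {
        // In-memory data standing in for the database table.
        IQueryable<CacheMarketRunner> cache = new[]
        {
            new CacheMarketRunner("M1", 1),
            new CacheMarketRunner("M1", 2),
            new CacheMarketRunner("M2", 1),
            new CacheMarketRunner("M1", 3),
        }.AsQueryable();

        var remote = new[] { new Runner(1), new Runner(3), new Runner(7) };

        foreach (var runner in FindLocalRunners(cache, "M1", remote))
        {
            Console.WriteLine(runner);
        }
    }
}
```

Unlike the original loop, this returns only the runners that were found, so there are no null entries for remote runners that have no local counterpart.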
To improve performance, cache the query results first:
var marketRunners = uow.CacheMarketRunners.Where(x => x.MarketId == m.MarketId).ToList();
The results of the uow query are now stored in a List, so there is no database access inside the loop, which should improve performance:
var localRunners = new List<Runners>();
foreach(var remoteRunner in m.Runners) {
var localRunner = marketRunners.FirstOrDefault(x => x.SelectionId == remoteRunner.SelectionId);
localRunners.Add(localRunner);
}
You can even remove the foreach loop:
var localRunners = m.Runners.Select(remoteRunner => marketRunners.FirstOrDefault(x => x.SelectionId == remoteRunner.SelectionId)).ToList();
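If the market has many runners, the FirstOrDefault scan per remote runner can itself become the slow part. One possible variation, sketched here under assumptions, indexes the cached list by SelectionId first. The record types and the MatchRunners helper are hypothetical; like FirstOrDefault, it keeps the first entry when a SelectionId occurs more than once and yields null when there is no match.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the types in the question.
public record Runner(int SelectionId);
public record CacheMarketRunner(string MarketId, int SelectionId);

public static class DictionaryMatchSketch
{
    // Matches each remote runner to a cached runner by SelectionId.
    // The result has one entry per remote runner; unmatched ones are null.
    public static List<CacheMarketRunner?> MatchRunners(
        IEnumerable<CacheMarketRunner> marketRunners,
        IEnumerable<Runner> remoteRunners)
    {
        // Build the index once; keep the first runner per SelectionId.
        Dictionary<int, CacheMarketRunner> bySelectionId = marketRunners
            .GroupBy(x => x.SelectionId)
            .ToDictionary(g => g.Key, g => g.First());

        return remoteRunners
            .Select(r => bySelectionId.TryGetValue(r.SelectionId, out var local) ? local : null)
            .ToList();
    }

    public static void Main()
    {
        var cached = new List<CacheMarketRunner>
        {
            new CacheMarketRunner("M1", 1),
            new CacheMarketRunner("M1", 2),
        };
        var remote = new[] { new Runner(2), new Runner(9) };

        foreach (var match in MatchRunners(cached, remote))
        {
            Console.WriteLine(match?.ToString() ?? "no local runner");
        }
    }
}
```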
Why does line 2 of my code have no effect?
var allResarchs = db.Researchs;
allResarchs.Where(a => a.ChiefManagerId == 1);
allResarchs.ToList();
You never assign the result of the Where call to a variable.
var allResarchs = db.Researchs.Where(a => a.ChiefManagerId == 1).ToList();
You should assign results of LINQ queries:
var allResarchs = db.Researchs;
var filtered = allResarchs.Where(a => a.ChiefManagerId == 1);
var list = filtered.ToList();
You can also do it in a simpler way (if you do not need the intermediate results):
var list = db.Researchs.Where(a => a.ChiefManagerId == 1).ToList();
Linq never changes the input sequence!
allResearches.Where(research => research.ChiefManagerId == 1);
This statement won't change allResearches. You could do the following. (by the way, I've changed the var into the actual returned types, so you understand better what is going on.)
IQueryable<Research> queryResearches = db.Researches;
IQueryable<Research> queryResearchesWithId1 = queryResearches
.Where(research => research.ChiefManagerId == 1);
List<Research> researchesWithId1 = queryResearchesWithId1.ToList();
Be aware that the query is not executed until the last statement; before that there is no communication with the database. Only the last statement actually contacts the database.
Of course you can write it all in one statement. However, this won't improve performance very much:
var researchesWithId1 = db.Researches
.Where(research => research.ChiefManagerId == 1)
.ToList();
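Both points (Where does not change its input, and nothing runs until the query is enumerated) can be seen with plain in-memory LINQ. The following is only a sketch: the Research record and the DeferredSketch class are made up for the demonstration, and a List stands in for the database table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity for the demonstration.
public record Research(int Id, int ChiefManagerId);

public static class DeferredSketch
{
    public static (int SourceCount, int FilteredCount) Run()
    {
        var all = new List<Research>
        {
            new Research(1, 1),
            new Research(2, 2),
            new Research(3, 1),
        };

        // The result is thrown away and "all" is left untouched.
        all.Where(r => r.ChiefManagerId == 1);

        // Only defines the query; nothing has been filtered yet.
        IEnumerable<Research> query = all.Where(r => r.ChiefManagerId == 1);

        // Added after the query was defined, but before it is executed.
        all.Add(new Research(4, 1));

        // The query runs here, so it also sees the row added above.
        List<Research> filtered = query.ToList();

        return (all.Count, filtered.Count);
    }

    public static void Main()
    {
        var (sourceCount, filteredCount) = Run();
        Console.WriteLine($"source: {sourceCount}, filtered: {filteredCount}");
    }
}
```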
I am writing a small program that takes in a .csv file as input with about 45k rows. I am trying to compare the contents of this file with the contents of a table on a database (SQL Server through dynamics CRM using Xrm.Sdk if it makes a difference).
In my current program (which takes about 25 minutes to compare - the file and database are the exact same here both 45k rows with no differences), I have all existing records on the database in a DataCollection<Entity> which inherits Collection<T> and IEnumerable<T>
In my code below I am filtering using the Where method and then applying logic based on the count of matches. The Where seems to be the bottleneck here. Is there a more efficient approach than this? I am by no means a LINQ expert.
foreach (var record in inputDataLines)
{
var fields = record.Split(',');
var fund = fields[0];
var bps = Convert.ToDecimal(fields[1]);
var withdrawalPct = Convert.ToDecimal(fields[2]);
var percentile = Convert.ToInt32(fields[3]);
var age = Convert.ToInt32(fields[4]);
var bombOutTerm = Convert.ToDecimal(fields[5]);
var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
&& Convert.ToDecimal(r["field_2"]) == bps
&& Convert.ToDecimal(r["field_3"]) == withdrawalPct
&& Convert.ToDecimal(r["field_4"]) == percentile
&& Convert.ToDecimal(r["field_5"]) == age);
entitiesFound.AddRange(matchingRows);
if (matchingRows.Count() == 0)
{
rowsToAdd.Add(record);
}
else if (matchingRows.Count() == 1)
{
if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
{
rowsToUpdate.Add(record);
entitiesToUpdate.Add(matchingRows.First());
}
}
else
{
entitiesToDelete.AddRange(matchingRows);
rowsToAdd.Add(record);
}
}
EDIT: I can confirm that all existingRecords are in memory before this code is executed. There is no IO or DB access in the above loop.
Himbrombeere is right, you should execute the query first and put the result into a collection before you use Any, Count, AddRange or whatever method will execute the query again. In your code it's possible that the query is executed 5 times in every loop iteration.
Watch out for the term deferred execution in the documentation. If a method is implemented that way, it can be used to construct a LINQ query (so you can chain it with other methods, and at the end you have a query). Only methods that don't use deferred execution, like Count, Any, ToList (or a plain foreach), will actually execute it. If you don't want the whole query to be executed every time and you have to access it multiple times, it's better to store the result in a collection (for example with ToList).
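To make the cost of repeated execution visible, here is a small sketch that counts how often the Where predicate is evaluated. It uses a list of 100 integers instead of the entities from the question; the ReEnumerationSketch class and its counter are purely illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReEnumerationSketch
{
    public static (int DeferredCalls, int MaterializedCalls) Compare()
    {
        List<int> source = Enumerable.Range(1, 100).ToList();
        int calls = 0;
        Func<int, bool> isMatch = n => { calls++; return n == 42; };

        // Deferred: every Count() or First() runs the filter again.
        IEnumerable<int> query = source.Where(isMatch);
        bool none = query.Count() == 0;   // scans all 100 items
        bool one = query.Count() == 1;    // scans all 100 items again
        int first = query.First();        // scans until the first match
        int deferredCalls = calls;

        // Materialized: the filter runs exactly once, in ToList().
        calls = 0;
        List<int> matches = source.Where(isMatch).ToList();
        none = matches.Count == 0;        // no predicate calls
        one = matches.Count == 1;         // no predicate calls
        first = matches[0];               // no predicate calls
        int materializedCalls = calls;

        return (deferredCalls, materializedCalls);
    }

    public static void Main()
    {
        var (deferredCalls, materializedCalls) = Compare();
        Console.WriteLine($"deferred: {deferredCalls} predicate calls");
        Console.WriteLine($"materialized: {materializedCalls} predicate calls");
    }
}
```

With 45k input rows and 45k existing records, every avoided pass over the collection matters.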
However, you could use a different approach that should be much more efficient: a Lookup<TKey, TElement>, which is similar to a dictionary and can be used with an anonymous type as the key:
var lookup = existingRecords.Entities.ToLookup(r => new
{
fund = r["field_1"].ToString(),
bps = Convert.ToDecimal(r["field_2"]),
withdrawalPct = Convert.ToDecimal(r["field_3"]),
// int, so the key type matches the key built inside the loop
percentile = Convert.ToInt32(r["field_4"]),
age = Convert.ToInt32(r["field_5"])
});
Now you can access this lookup in the loop very efficiently.
foreach (var record in inputDataLines)
{
var fields = record.Split(',');
var fund = fields[0];
var bps = Convert.ToDecimal(fields[1]);
var withdrawalPct = Convert.ToDecimal(fields[2]);
var percentile = Convert.ToInt32(fields[3]);
var age = Convert.ToInt32(fields[4]);
var bombOutTerm = Convert.ToDecimal(fields[5]);
var matchingRows = lookup[new {fund, bps, withdrawalPct, percentile, age}].ToList();
entitiesFound.AddRange(matchingRows);
if (matchingRows.Count() == 0)
{
rowsToAdd.Add(record);
}
else if (matchingRows.Count() == 1)
{
if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
{
rowsToUpdate.Add(record);
entitiesToUpdate.Add(matchingRows.First());
}
}
else
{
entitiesToDelete.AddRange(matchingRows);
rowsToAdd.Add(record);
}
}
Note that this will work even if the key does not exist (an empty list is returned).
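A reduced sketch of the lookup behaviour, assuming a made-up ExistingRow record with only two key fields instead of the five from the question. It shows that two anonymous objects with the same property names, types and values are equal as keys, and that indexing with an unknown key yields an empty sequence rather than an exception.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for the CRM entities.
public record ExistingRow(string Fund, int Age, decimal BombOutTerm);

public static class LookupSketch
{
    public static (int Hits, int Misses) Run()
    {
        var rows = new List<ExistingRow>
        {
            new ExistingRow("A", 30, 1.5m),
            new ExistingRow("A", 30, 2.0m),
            new ExistingRow("B", 40, 3.0m),
        };

        // One pass over the rows builds the index.
        var lookup = rows.ToLookup(r => new { r.Fund, r.Age });

        // Same property names, types and order, so the keys compare equal.
        int hits = lookup[new { Fund = "A", Age = 30 }].Count();

        // An unknown key returns an empty sequence instead of throwing.
        int misses = lookup[new { Fund = "Z", Age = 99 }].Count();

        return (hits, misses);
    }

    public static void Main()
    {
        var (hits, misses) = Run();
        Console.WriteLine($"hits: {hits}, misses: {misses}");
    }
}
```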
Add a ToList after your Convert.ToDecimal(r["field_5"]) == age);-line to force an immediate execution of the query.
var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
&& Convert.ToDecimal(r["field_2"]) == bps
&& Convert.ToDecimal(r["field_3"]) == withdrawalPct
&& Convert.ToDecimal(r["field_4"]) == percentile
&& Convert.ToDecimal(r["field_5"]) == age)
.ToList();
The Where doesn't actually execute your query; it just prepares it. The actual execution is deferred. In your case it happens when calling Count, which itself iterates the entire collection of items. If the first condition fails, the second one is checked, leading to a second iteration of the complete collection when calling Count again. In that case you actually execute the query a third time when calling matchingRows.First().
When you force immediate execution, the query is executed only once, so the entire collection is iterated only once, which will reduce your overall time.
Another option, which is basically along the same lines as the other answers, is to prepare your data first, so that you're not repeatedly calling things like r["field_2"] (which are relatively slow to look up).
This is a (1) clean your data, (2) query/join your data, (3) process your data approach.
Do this:
(1)
var inputs =
inputDataLines
.Select(record =>
{
var fields = record.Split(',');
return new
{
fund = fields[0],
bps = Convert.ToDecimal(fields[1]),
withdrawalPct = Convert.ToDecimal(fields[2]),
percentile = Convert.ToInt32(fields[3]),
age = Convert.ToInt32(fields[4]),
bombOutTerm = Convert.ToDecimal(fields[5]),
record
};
})
.ToArray();
var entities =
existingRecords
.Entities
.Select(entity => new
{
fund = entity["field_1"].ToString(),
bps = Convert.ToDecimal(entity["field_2"]),
withdrawalPct = Convert.ToDecimal(entity["field_3"]),
percentile = Convert.ToInt32(entity["field_4"]),
age = Convert.ToInt32(entity["field_5"]),
bombOutTerm = Convert.ToDecimal(entity["field_6"]),
entity
})
.ToArray()
.GroupBy(x => new
{
x.fund,
x.bps,
x.withdrawalPct,
x.percentile,
x.age
}, x => new
{
x.bombOutTerm,
x.entity,
});
(2)
var query =
from i in inputs
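join e in entities on new { i.fund, i.bps, i.withdrawalPct, i.percentile, i.age } equals e.Key into matches
// group join: inputs without a match are kept, with an empty matchingRows
select new { input = i, matchingRows = matches.SelectMany(g => g).ToList() };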
(3)
foreach (var x in query)
{
entitiesFound.AddRange(x.matchingRows.Select(y => y.entity));
if (x.matchingRows.Count() == 0)
{
rowsToAdd.Add(x.input.record);
}
else if (x.matchingRows.Count() == 1)
{
if (x.matchingRows.First().bombOutTerm != x.input.bombOutTerm)
{
rowsToUpdate.Add(x.input.record);
entitiesToUpdate.Add(x.matchingRows.First().entity);
}
}
else
{
entitiesToDelete.AddRange(x.matchingRows.Select(y => y.entity));
rowsToAdd.Add(x.input.record);
}
}
I suspect that this will be among the fastest approaches presented.
I've attempted to modify my connection string to include an extended timeout and I've confirmed that on the sql server side the view that feeds my EF Object executes within seconds and returns a total of 3000 or less records.
BUT when I attempt to run it via code I am now running into Timeout issues and I was seeking some advice to fix this issue. I get "Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding."
Most solutions I find on the specific error recommend connection string modifications OR something along this.context.CommandTimeout... which I cannot figure out how to use in this situation.
I've included the Method I use to acquire the desired data. If there is a more efficient way please let me know.
The input arguments are:
int? inputSKU = null
int? inputStoreNum = null
DateTime? inputStartDate = null
The intent is to return the full list.
Because all the conditional filters are skipped, it hangs at:
var qUniqueOffers = query.GroupBy(q => q.Plan_Number).ToList();
Thank you.
private List<PromotionItem> QueryPromotion(int? inputSKU, int? inputStoreNum, DateTime? inputStartDate)
{
log.Info("Client requested QueryPromotion");
List<PromotionItem> resultQuery = new List<PromotionItem>();
try
{
using (DWH_Entities db = new DWH_Entities())
{
var query = db.vw_Web_Promotion.AsQueryable();
// filter promotion results that don't match SKU#
if (inputSKU != null)
query = query.Where(q => q.Sku_Number == inputSKU);
// filter promotion results that don't match Store Num
if (inputStoreNum != null)
query = query.Where(q => q.Store_Number == inputStoreNum);
// filter promotion results that don't match Promotion Start Date
if (inputStartDate != null)
query = query.Where(q => q.Start_Date >= inputStartDate);
// Group promotions By Plan Number ('Promotion ID')
var qUniqueOffers = query
.GroupBy(q => q.Plan_Number)
.ToList();
// Select first from each group to get unique details
var qOffers = qUniqueOffers
.Select(g => g.OrderBy(gi => gi.Plan_Number).First())
.ToList();
foreach (var qo in qOffers)
{
resultQuery.Add(new PromotionItem
{
PromotionNumber = qo.Plan_Number.Trim(),
PromotionDescription = qo.Plan_Description.Trim(),
StartDate = qo.Start_Date,
EndDate = qo.End_Date
});
}
}
}
catch (Exception e)
{
log.Error("[" + e.TargetSite + "] | " + e.Message);
throw; // rethrow without resetting the stack trace
}
return resultQuery;
}
If you are using a recent EF version, do the following to increase the timeout:
using (DWH_Entities db = new DWH_Entities())
{
db.Database.CommandTimeout = 300;
...
If you want the records in the minimum time, try the following:
var temp = query.ToList();
var qUniqueOffers = temp.GroupBy(q => q.Plan_Number)
.ToList();
// Group promotions By Plan Number ('Promotion ID')
var qUniqueOffers = query
.GroupBy(q => q.Plan_Number)
.ToList();
// Select first from each group to get unique details
var qOffers = qUniqueOffers
.Select(g => g.OrderBy(gi => gi.Plan_Number).First())
.ToList();
The way you have written the above LINQ means you are pulling a lot of data over the wire (the first ToList) and then getting a subset of the data (using First and the second ToList). Consider changing it to:
// Group promotions By Plan Number ('Promotion ID')
var qUniqueOffers = query
.GroupBy(q => q.Plan_Number);
// Select first from each group to get unique details
var qOffers = qUniqueOffers
.Select(g => g.OrderBy(gi => gi.Plan_Number).First())
.ToList();
This should result in much less data being sent from the database - which will hopefully make it faster.
As https://stackoverflow.com/a/13827077/34092 states:
ToList() always forces everything ahead of it to evaluate immediately,
as opposed to deferred execution.
I have the following code:
var existingParticipant = Context.CaseParticipants.Where(p => p.CaseId == caseId);
foreach (var cp in existingParticipant)
{
var ncp = caseParticipantList.First(a => a.Id == cp.Id);
cp.IsIncompetent = ncp.IsIncompetent;
cp.IsLeave = ncp.IsLeave;
cp.SubstituteUserId = ncp.IsPresent ? null : ncp.SubstituteUserId;
}
var withSubs = existingParticipant.Where(c => c.SubstituteUserId != null).ToList();
What surprised me is that the last line fetches the rows from the DB a second time, ignoring any changes I've just done in the previous lines, why is that, and how do I avoid it?
I think your problem is that your existingParticipant is a query and not a list. That query gets executed for the foreach, but existingParticipant still stays a query that will get executed on the database when calling ToList() again. To solve it execute the initial query straight away and that way you work in memory on your changed entities.
IList<...> existingParticipant = Context.CaseParticipants.Where(p => p.CaseId == caseId).ToList(); // Explicit executing of query
foreach (var cp in existingParticipant)
{
var ncp = caseParticipantList.First(a => a.Id == cp.Id);
cp.IsIncompetent = ncp.IsIncompetent;
cp.IsLeave = ncp.IsLeave;
cp.SubstituteUserId = ncp.IsPresent ? null : ncp.SubstituteUserId;
}
var withSubs = existingParticipant.Where(c => c.SubstituteUserId != null).ToList(); // Working in memory on list
The type of existingParticipant is IQueryable, which means you don't have the objects in memory, only a query that runs against the database each time it is enumerated.
If you want to work with your objects in memory, call .ToList() after
Context.CaseParticipants.Where(p => p.CaseId == caseId)
I'm a big fan of Linq, and I have been really enjoying the power of expression trees etc. But I have found that whenever I try to get too clever with my queries, I hit some kind of limitation in the framework: while the query can take a very short time to run on the database (as shown by performance analyzer), the results take ages to materialize. When that happens I know I've been too fancy, and I start breaking the query up into smaller, bite sized chunks - so I have a solution for that, though it might not always be the most optimal.
But I'd like to understand:
What is it that pushes the Linq framework over the edge in terms of materializing the query results?
Where can I read about the mechanism of materializing query results?
Is there a certain measurable complexity limit for Linq queries that should be avoided?
What design patterns are known to cause this problem, and what patterns can remedy it?
EDIT: As requested in comments, here's an example of a query that I measured to run on SQL Server in a few seconds, but took almost 2 minutes to materialize. I'm not going to try explaining all the stuff in context; it's here just so you can view the constructs and see an example of what I'm talking about:
Expression<Func<Staff, TeacherInfo>> teacherInfo =
st => new TeacherInfo
{
ID = st.ID,
Name = st.FirstName + " " + st.LastName,
Email = st.Email,
Phone = st.TelMobile,
};
var step1 =
currentReportCards.AsExpandable()
.GroupJoin(db.ScholarReportCards,
current =>
new { current.ScholarID, current.AcademicTerm.AcademicYearID },
past => new { past.ScholarID, past.AcademicTerm.AcademicYearID },
(current, past) => new
{
Current = current,
PastCards =
past.Where(
rc =>
rc.AcademicTerm.StartDate <
current.AcademicTerm.StartDate &&
rc.AcademicTerm.Grade == current.AcademicTerm.Grade &&
rc.AcademicTerm.SchoolID == current.AcademicTerm.SchoolID)
});
// This materialization is what takes a long time:
var subjects = step1.SelectMany(x => from key in x.Current.Subjects
.Select(s => new { s.Subject.SubjectID, s.Subject.SubjectCategoryID })
.Union(x.PastCards.SelectMany(c => c.Subjects)
.Select(
s => new { s.Subject.SubjectID, s.Subject.SubjectCategoryID }))
join cur in x.Current.Subjects on key equals
new { cur.Subject.SubjectID, cur.Subject.SubjectCategoryID } into jcur
from cur in jcur.DefaultIfEmpty()
join past in x.PastCards.SelectMany(p => p.Subjects) on key equals
new { past.Subject.SubjectID, past.Subject.SubjectCategoryID } into past
select new
{
x.Current.ScholarID,
IncludeInContactSection =
// ReSharper disable ConstantNullCoalescingCondition
(bool?)cur.Subject.IncludeInContactSection ?? false,
IncludeGrades = (bool?)cur.Subject.IncludeGrades ?? true,
// ReSharper restore ConstantNullCoalescingCondition
SubjectName =
cur.Subject.Subject.Name ?? past.FirstOrDefault().Subject.Subject.Name,
SubjectCategoryName = cur.Subject.SubjectCategory.Description,
ClassInfo = (from ce in myDb.ClassEnrollments
.Where(
ce =>
ce.Class.SubjectID == cur.Subject.SubjectID
&& ce.ScholarID == x.Current.ScholarID)
.Where(enrollmentExpr)
.OrderByDescending(ce => ce.TerminationDate ?? DateTime.Today)
let teacher = ce.Class.Teacher
let secTeachers = ce.Class.SecondaryTeachers
select new
{
ce.Class.Nickname,
Primary = teacherInfo.Invoke(teacher),
Secondaries = secTeachers.AsQueryable().AsExpandable()
.Select(ti => teacherInfo.Invoke(ti))
})
.FirstOrDefault(),
Comments = cur.Comments
.Select(cc => new
{
Staff = cc.Staff.FirstName + " "
+ cc.Staff.LastName,
Comment = cc.CommentTemplate.Text ??
cc.CommentFreeText
}),
// ReSharper disable ConstantNullCoalescingCondition
DisplayOrder = (byte?)cur.Subject.DisplayOrder ?? (byte)99,
// ReSharper restore ConstantNullCoalescingCondition
cur.Percentile,
cur.Score,
cur.Symbol,
cur.MasteryLevel,
PastScores = past.Select(p => new
{
p.Score,
p.Symbol,
p.MasteryLevel,
p.ScholarReportCard
.AcademicTermID
}),
Assessments = cur.Assessments
.Select(a => new
{
a.ScholarAssessment.AssessmentID,
a.ScholarAssessment.Assessment.Description,
a.ScholarAssessment.Assessment.Type.Nickname,
a.ScholarAssessment.AssessmentDate,
a.ScoreDesc,
a.ScorePerc,
a.MasteryLevel,
a.ScholarAssessment.Assessment.Type.AssessmentFormat,
a.ScholarAssessment.PublishedStatus,
a.ScholarAssessment.FPScore,
a.ScholarAssessment.TotalScore,
a.ScholarAssessment.Assessment.Type.ScoreType,
a.ScholarAssessment.Assessment.Type.OverrideBelowLabel,
a.ScholarAssessment.Assessment.Type.OverrideApproachingLabel,
a.ScholarAssessment.Assessment.Type.OverrideMeetingLabel,
a.ScholarAssessment.Assessment.Type.OverrideExceedingLabel,
})
})
.ToList();
Linq uses deferred execution for some tasks, for example while iterating through an IEnumerable<>, so what you call materialization includes some actual data fetching.
var reportCards = db.ScholarReportCards.Where(cr => ...); // this prepares the query
foreach (var rc in reportCards) {} // this executes your query and calls the DB
I think that if you trace/time queries on your SQL server, you may see some queries arriving during the "materialization" step. This problem may even be exacerbated by anti-patterns such as the "Select N+1" problem: for example, it looks like you're not including the AcademicTerm objects in your request; if you don't, resolving these will result in a select N+1, that is, for every ScholarReportCard there will be a call to the DB to lazily resolve the attached AcademicTerm.
If we focus on the LINQ-to-DB aspect, at least try to avoid:
select N+1: Include the related tables you will need
selecting too much data: include only the columns you need in your selection (Include only on the tables you need)