Optimize LINQ to Objects query


I have around 200K records in a list, and I'm looping through them to form another collection. This works fine on my local 64-bit Windows 7 machine, but when I move it to Windows Server 2008 R2 it takes much longer. The difference is almost an hour!
I tried looking at compiled queries and am still figuring them out.
For various reasons, we can't do a database join to retrieve the child values.
Here is the code:
//listOfDetails is another collection
List<SomeDetails> myDetails = null;
foreach (CustomerDetails myItem in customerDetails)
{
var myList = from ss in listOfDetails
where ss.CustomerNumber == myItem.CustomerNum
&& ss.ID == myItem.ID
select ss;
myDetails = (List<SomeDetails>)(myList.ToList());
myItem.SomeDetails = myDetails;
}

I would do this differently:
var lookup = listOfDetails.ToLookup(x => new { x.CustomerNumber, x.ID });
foreach(var item in customerDetails)
{
var key = new { CustomerNumber = item.CustomerNum, item.ID };
item.SomeDetails = lookup[key].ToList();
}
The big benefit of this code is that it only has to loop through the listOfDetails once to build the lookup - which is nothing more than a hash map. After that we just get the values using the key, which is very fast as that is what hash maps are built for.
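As a minimal, self-contained sketch of the same idea: the sample data and the Note property below are hypothetical stand-ins for listOfDetails and customerDetails, and the code is written as top-level statements. It relies on anonymous types comparing by value, which is what lets a freshly built key find its group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for listOfDetails and customerDetails.
var listOfDetails = new[]
{
    new { CustomerNumber = 1, ID = 10, Note = "a" },
    new { CustomerNumber = 1, ID = 10, Note = "b" },
    new { CustomerNumber = 2, ID = 20, Note = "c" },
};
var customers = new[]
{
    (CustomerNum: 1, ID: 10),
    (CustomerNum: 3, ID: 30),
};

// One pass over the details builds the hash-based lookup.
var lookup = listOfDetails.ToLookup(d => new { d.CustomerNumber, d.ID });

// Anonymous types compare by value, so a newly created key matches its group.
// A key with no group yields an empty sequence rather than throwing.
var matches = customers.ToDictionary(
    c => c,
    c => lookup[new { CustomerNumber = c.CustomerNum, c.ID }]
        .Select(d => d.Note)
        .ToList());

foreach (var pair in matches)
    Console.WriteLine($"{pair.Key.CustomerNum}/{pair.Key.ID}: {pair.Value.Count} detail(s)");
```

Indexing a lookup with a missing key returns an empty sequence, so customers without details simply get an empty list.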

I don't know why you have the difference in performance, but you should be able to make that code perform better.
//listOfDetails is another collection
List<SomeDetails> myDetails = ...;
var detailsGrouped = myDetails.ToLookup(x => new { x.CustomerNumber, x.ID });
foreach (CustomerDetails myItem in customerDetails)
{
var myList = detailsGrouped[new { CustomerNumber = myItem.CustomerNum, myItem.ID }];
myItem.SomeDetails = myList.ToList();
}
The idea here is to avoid the repeated looping on myDetails, and build a hash based lookup instead. Once that is built, it is very cheap to do a lookup.

The inner ToList() forces an evaluation on each loop iteration, which has got to hurt. SelectMany might let you avoid the ToList, something like this:
// Flattens all matching details for all customers into one lazy sequence
var details = customerDetails.SelectMany( item => listOfDetails
.Where( detail => detail.CustomerNumber == item.CustomerNum)
.Where( detail => detail.ID == item.ID)
);
If you first get all the SomeDetails and then assign them to the items, it might speed up. Or it might not. You should really profile to see where the time is being taken.

I think you'd probably benefit from a join here, so:
var mods = customerDetails
.Join(
listOfDetails,
x => Tuple.Create(x.ID, x.CustomerNum),
x => Tuple.Create(x.ID, x.CustomerNumber),
(a, b) => new {custDet = a, listDet = b})
.GroupBy(x => x.custDet)
.Select(g => new{custDet = g.Key,items = g.Select(x => x.listDet).ToList()});
foreach(var mod in mods)
{
mod.custDet.SomeDetails = mod.items;
}
I didn't compile this code...
With a join the matching of items from one list against another is done by building a hashtable-like collection (Lookup) of the second list in O(n) time. Then it's a matter of iterating the first list and pulling items from the Lookup. As pulling data from a hashtable is O(1), the iterate/match phase also only takes O(n), as does the subsequent GroupBy. So in all the operation should take ~O(3n) which is equivalent to O(n), where n is the length of the longer list.
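A related option, swapped in here rather than taken from the answer, is GroupJoin, which performs the hash-based match and the grouping in one step and also keeps customers that have no matching details (Join followed by GroupBy drops them). The sketch below uses hypothetical tuple data and ValueTuple.Create as the composite key, written as top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical sample data; the names mirror the question's properties.
var customers = new[]
{
    (ID: 10, CustomerNum: 1),
    (ID: 30, CustomerNum: 3),
};
var details = new[]
{
    (ID: 10, CustomerNumber: 1, Note: "a"),
    (ID: 10, CustomerNumber: 1, Note: "b"),
    (ID: 20, CustomerNumber: 2, Note: "c"),
};

// GroupJoin hashes the inner sequence once, then pairs each outer element
// with all of its matches (possibly none).
var grouped = customers
    .GroupJoin(
        details,
        c => ValueTuple.Create(c.ID, c.CustomerNum),
        d => ValueTuple.Create(d.ID, d.CustomerNumber),
        (c, ds) => new { Customer = c, Details = ds.ToList() })
    .ToList();

foreach (var g in grouped)
    Console.WriteLine($"{g.Customer.ID}: {g.Details.Count} detail(s)");
```

Value tuples compare by value, so they work as composite keys in the same way as the Tuple.Create keys above.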

Related

How to join two different databases' tables in C# with Linq?

I am trying to join tables from two different databases in C#, but it gives me an error. How can I handle that?
this is my query:
var list = (from h in db.database1.AsEnumerable()
join j in NV_DB.database2.AsEnumerable()
on h.Creation_Date equals j.Creation_Date
where j.Ship_Status == 3 && h.CustomerNo == CustomerNo
select new
{
shipName = h.ShipName,
creationDate = j.Creation_Date,
endingDate = j.Ending_Date
}
).ToList();
If I do it like this, it gives me a System.OverflowException. But when I run the equivalent query in SQL, it gives me just 30 records.

You need to remove AsEnumerable. While it does not run the SQL itself, everything composed after it is evaluated by LINQ to Objects, so both tables are brought into memory in full and the join and where parts of your query run on the client.
Your answer is basically the first comment in the accepted answer here: Am I misunderstanding LINQ to SQL .AsEnumerable()?
While AsEnumerable doesn't evaluate the query at the time that it's called , it definitely has an effect. Anything further called on the query will be evaluated using LINQ to objects, so you can't compose additional elements onto the query (another Where or an OrderBy or anything of that nature) that will become part of the SQL statement.
In depth explanation here: https://www.codeproject.com/Articles/732425/IEnumerable-Vs-IQueryable
When querying a database through IEnumerable, the select query runs on the server, the data is loaded into memory on the client, and only then is it filtered. It therefore does more work and is slower.
When querying a database through IQueryable, the select query runs on the server with all filters applied. It therefore does less work and is faster.
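The switch that AsEnumerable causes can be observed without a database. The sketch below is only an illustration under that assumption: it uses an in-memory array wrapped with AsQueryable as a stand-in for a database table, and checks which interface each composed query implements.

```csharp
using System;
using System.Linq;

// In-memory stand-in for a database table (an assumption for illustration).
IQueryable<int> source = new[] { 1, 2, 3, 4 }.AsQueryable();

// Composed on IQueryable: Where binds to Queryable.Where and the filter
// stays part of the query's expression tree, which a provider could translate.
var composed = source.Where(x => x > 2);

// After AsEnumerable: Where binds to Enumerable.Where and the filter
// runs in memory over whatever the source yields.
var local = source.AsEnumerable().Where(x => x > 2);

bool composedIsQueryable = composed is IQueryable<int>;
bool localIsQueryable = local is IQueryable<int>;

Console.WriteLine($"composed is IQueryable: {composedIsQueryable}");
Console.WriteLine($"local is IQueryable: {localIsQueryable}");
```

Both queries return the same elements here; the difference is where the filter would run when the source is a real database provider.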

To debug this, start dividing your statements into smaller steps:
var list1 = db.database1.AsEnumerable().ToList();
var list2 = NV_DB.database2.AsEnumerable().ToList();
var joinResult = list1.Join(list2, // join list1 and list2
list1Row => list1Row.Creation_Date, // from every row in list1 take the Creation_Date
list2Row => list2Row.Creation_Date, // from every row in list2 take the Creation_Date
(list1Row, list2Row) => new // when they match, make one new object
{
// You only need the following properties:
ShipName = list1Row.ShipName,
CreationDate = list2Row.Creation_Date,
EndingDate = list2Row.Ending_Date,
ShipStatus = list2Row.Ship_Status,
CustomerNo = list1Row.CustomerNo,
})
.ToList();
var whereResult = joinResult
.Where(joinedRow => joinedRow.ShipStatus == 3
&& joinedRow.CustomerNo == customerNo)
.ToList();
var selectResult = whereResult.Select(whereResultRow => new
{
ShipName = whereResultRow.ShipName,
CreationDate = whereResultRow.CreationDate,
EndingDate = whereResultRow.EndingDate,
})
.ToList();
This is executed completely as enumerable (in your local process, not by the database management system). My guess would be that this runs smoothly.
Now combine the first few statements:
var joinResult = db.database1.AsEnumerable()
.Join(NV_DB.database2.AsEnumerable(), // join list1 and list2
list1Row => list1Row.Creation_Date, // from every row in list1 take the Creation_Date
list2Row => list2Row.Creation_Date, // from every row in list2 take the Creation_Date
(list1Row, list2Row) => new // when they match, make one new object
{
// You only need the following properties:
ShipName = list1Row.ShipName,
CreationDate = list2Row.Creation_Date,
EndingDate = list2Row.Ending_Date,
ShipStatus = list2Row.Ship_Status,
CustomerNo = list1Row.CustomerNo,
})
.ToList();
When this works, add the Where:
var whereResult = db.database1.AsEnumerable()
.Join(NV_DB.database2.AsEnumerable(), ...)
.Where(joinedRow => joinedRow.ShipStatus == 3
&& joinedRow.CustomerNo == customerNo)
.ToList();
Etc.
Using your debugger, you'll find the problem within a few minutes (depending on your compilation time). My guess is that it is within your join.

How can I convert this linq to bool

How to convert a query to bool?
I used the "ALL (x => x)" but did not give the answer I needed.
Code Line
checkItemInventory.Where(x => listCost.Contains(x.Id));
In this case, the listcost would have 2 items, I needed to check if the checkItemInventory has these 2 items.

"All items in the inventory have an id that present in listcost". listCost needs to have the same number of items as inventory (assuming Id is unique) possibly more, to stand a chance of returning true
checkItemInventory.All(x => listCost.Contains(x.Id))
"At least one item in the inventory has an id that is also in listCost". Listcost could minimally have only one id in it, to stand a chance of returning true
checkItemInventory.Any(x => listCost.Contains(x.Id))
As you can see, neither of these is what you want, as you seem to be saying you want to check whether every item in listCost is also present in the inventory. This is like the top code, but the other way round ("all items in listCost are present in inventory" vs "all items in inventory are present in listCost").
I think I'd make a dictionary out of the inventory first, unless it's already something that supports a fast lookup:
var d = checkItemInventory.ToDictionary(x => x.Id); // assumes Id is unique in the inventory
var boolResult = listCost.All(lc => d.ContainsKey(lc));
If inventory is small, you could use this approach:
listCost.All(lc => checkItemInventory.Any(cii => cii.Id == lc));
Just be mindful that internally it might do something like:
bool all = true;
foreach(var lc in listCost){
bool found = false;
foreach(var cci in checkItemInventory)
if(lc == cci.Id){
found = true;
break;
}
all &= found;
if(!all)
return false;
}
return true;
That is a lot of repeated comparisons (for every item in listCost, the whole inventory may be scanned), so it could be slow.
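One way to avoid the repeated scan, sketched here with hypothetical data and assuming listCost holds ids (as listCost.Contains(x.Id) in the question suggests), is to put the inventory ids into a HashSet once and probe it for each required id:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: inventory items with an Id, and the ids a cost requires.
var checkItemInventory = new[] { new { Id = 1 }, new { Id = 2 }, new { Id = 5 } };
var listCost = new List<int> { 1, 2 };
var costWithMissingItem = new List<int> { 1, 9 };

// Build the set of inventory ids once; each Contains is then a hash probe
// instead of a scan over the whole inventory.
var inventoryIds = new HashSet<int>(checkItemInventory.Select(x => x.Id));

bool hasAll = listCost.All(id => inventoryIds.Contains(id));
bool hasAllWithMissing = costWithMissingItem.All(id => inventoryIds.Contains(id));

Console.WriteLine($"listCost satisfied: {hasAll}");
Console.WriteLine($"costWithMissingItem satisfied: {hasAllWithMissing}");
```

Unlike ToDictionary, a HashSet tolerates duplicate ids in the inventory, which may or may not matter for your data.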
Edit
I asked for clarification of how you store your inventory and your costs of building items. Here's one assumption I made, and how a solution based on it might work:
Assuming your inventory has the kind of item and a count saying how many of that item the player is carrying:
class InventoryItem{
public int ItemKindId { get; set;}
public int CountOf { get; set; }
}
player.Inventory.Add(new InventoryItem() {
ItemKindId = Constants.WOOD, //1
CountOf = 10 //holding 10 items of wood
});
player.Inventory.Add(new InventoryItem() {
ItemKindId = Constants.STONE, //2
CountOf = 5 //holding 5 items of stone
});
Assuming you have a Recipe for making e.g. an axe, it needs 1 wood and 2 stone, but it lists them in simple order:
int[] axeRecipe = new int[] { Constants.WOOD, Constants.STONE, Constants.STONE };
Might be easiest to group the recipe:
var recipe = axeRecipe.GroupBy(item => item);
/*
now we have a grouping of the recipe[item].Key as the material and a
recipe[item].Count() of how much. The group is like a dictionary:
recipe[Constants.WOOD] = new List<int>{ Constants.WOOD };
recipe[Constants.STONE] = new List<int>{ Constants.STONE, Constants.STONE, };
A group item has a Key and a list of objects that have that key
Because my recipe was simply ints, the Key is the same number as all the
items in the list
*/
//for all items in the recipe
bool canBuild = recipe.All(groupItem =>
//does the player inventory contain any item
playerInventory.Any(inventoryItem =>
//where the material kind is the same as the recipe key (material)
inventoryItem.ItemKindId == groupItem.Key &&
//and the count they have of it, is enough to make the recipe
inventoryItem.CountOf >= groupItem.Count()
));
You can of course reduce this to a single line if you want: axeRecipe.GroupBy(...).All(...)
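Put together as a runnable sketch, with hypothetical constants and an anonymous-type inventory standing in for the classes above (both are assumptions for illustration), the single-expression form might look like this:

```csharp
using System;
using System.Linq;

// Hypothetical material ids; the answer's Constants class is not shown.
const int Wood = 1;
const int Stone = 2;

// Hypothetical inventory: one entry per material kind with a held count.
var playerInventory = new[]
{
    new { ItemKindId = Wood, CountOf = 10 },
    new { ItemKindId = Stone, CountOf = 1 },
};

int[] axeRecipe = { Wood, Stone, Stone };   // needs 1 wood and 2 stone
int[] stickRecipe = { Wood };               // needs 1 wood

// Group the recipe by material, then require an inventory entry of the
// same kind holding at least as many items as the group contains.
Func<int[], bool> canBuild = recipe =>
    recipe.GroupBy(kind => kind)
          .All(g => playerInventory.Any(i =>
              i.ItemKindId == g.Key && i.CountOf >= g.Count()));

bool canBuildAxe = canBuild(axeRecipe);
bool canBuildStick = canBuild(stickRecipe);

Console.WriteLine($"axe: {canBuildAxe}, stick: {canBuildStick}");
```

With only one stone held, the axe recipe fails on the stone group while the stick recipe succeeds.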

You could project the inventory to its ids and then use Except() and Any() to check whether all items are contained (this assumes listCost holds ids, as listCost.Contains(x.Id) in the question suggests):
bool containsAll = !listCost.Except(checkItemInventory.Select(x => x.Id)).Any();

[UPDATE]
You are telling us the following:
How to convert a query to bool? I used the "ALL (x => x)" but did not give the answer I needed.
checkItemInventory.Where(x => listCost.Contains(x.Id));
In this case, the listcost would have 2 items, I needed to check if
the checkItemInventory has these 2 items.
if you need to check if there is any result then you can use:
bool hasItems = checkItemInventory.Where(x => listCost.Contains(x.Id)).Any();
if you need to count the result you can use
checkItemInventory.Where(x => listCost.Contains(x.Id)).Count();

You could use a Join to create a method based Linq query and use the results to check if the length of the list is greater than 0. Then turn that into a boolean.
var query = checkItemInventory.Join(listCost,
inventory => inventory.Id,
cost => cost.Id,
(inventory, cost) => new { id = inventory.Id });
var count = query.ToList().Count();
var b = (count > 0);

If I get it correctly, listCost can have less elements than checkItemInventory. You want to check that all elements in listCost have a corresponding element in checkItemInventory. Correct? If yes, try this:
listCost.All(x => checkItemInventory.Contains(x));
I don't know the type of these lists, so you might need to use x.id in some places

How do I convert this looped code to a single LINQ implementation?

I am trying to optimise the code below which loops through objects one by one and does a database lookup. I want to make a LINQ statement that will do the same task in one transaction.
This is my inefficient looped code;
IStoreUnitOfWork uow = StoreRepository.UnitOfWorkSource.GetUnitOfWorkFactory().CreateUnitOfWork();
var localRunners = new List<Runners>();
foreach(var remoteRunner in m.Runners) {
var localRunner = uow.CacheMarketRunners.Where(x => x.SelectionId == remoteRunner.SelectionId && x.MarketId == m.MarketId).FirstOrDefault();
localRunners.Add(localRunner);
}
This is my very feeble attempt at a single query to do the same thing. Well, it's not even an attempt. I don't know where to start. The remoteRunners object has a composite key.
IStoreUnitOfWork uow = StoreRepository.UnitOfWorkSource.GetUnitOfWorkFactory().CreateUnitOfWork();
var localRunners = new List<Runners>();
var localRunners = uow.CacheMarketRunners.Where(x =>
x.SelectionId in remoteRunners.SelectionId &&
x.MarketId in remoteRunners.MarketId);
Thank you for looking

So you have an object m, which has a property MarketId. Object m also has a sequence of Runners, where every Runner has a property SelectionId.
Your database has CacheMarketRunners. Every CacheMarketRunner has a MarketId and a SelectionId.
Your query should return all CacheMarketRunners with a MarketId equal to m.MarketId and a SelectionId that is contained in the SelectionIds of m.Runners.
If your m does not have too many Runners, say fewer than 250, consider using Queryable.Contains:
var requestedSelectionIds = m.Runners.Select(runner => runner.SelectionId);
var result = CacheMarketRunners.Where(cacheMarketRunner =>
cacheMarketRunner.MarketId == m.MarketId
&& requestedSelectionIds.Contains(cacheMarketRunner.SelectionId));

To improve performance, cache the query results first:
var marketRunners = uow.CacheMarketRunners.Where(x => x.MarketId == m.MarketId).ToList();
The results from uow are now held in a List, so no database query runs inside the loop, which should improve performance:
var localRunners = new List<Runners>();
foreach(var remoteRunner in m.Runners) {
var localRunner = marketRunners.FirstOrDefault(x => x.SelectionId == remoteRunner.SelectionId);
localRunners.Add(localRunner);
}
You can even remove the for loop:
var localRunners = m.Runners.Select(remoteRunner => marketRunners.FirstOrDefault(x => x.SelectionId == remoteRunner.SelectionId)).ToList();
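If the cached list is large, each FirstOrDefault still scans it. A possible refinement, sketched below with hypothetical data and assuming SelectionId is unique within one market (ToDictionary throws on duplicate keys), is to index the cached runners by SelectionId first:

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for the cached market runners and the remote ids.
var marketRunners = new[]
{
    new { SelectionId = 101, Name = "A" },
    new { SelectionId = 102, Name = "B" },
};
var remoteSelectionIds = new[] { 102, 999, 101 };

// Index once by SelectionId (assumed unique within the market).
var bySelectionId = marketRunners.ToDictionary(r => r.SelectionId);

// Like FirstOrDefault, a missing id yields null in the result list.
var localRunners = remoteSelectionIds
    .Select(id => bySelectionId.TryGetValue(id, out var runner) ? runner : null)
    .ToList();

foreach (var runner in localRunners)
    Console.WriteLine(runner?.Name ?? "(no match)");
```

The result keeps the order of the remote ids, with null wherever no local runner exists, matching the behaviour of the loop above.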

How to group a list with Linq

I have a list which I get from a database. The structure looks like (which I'm representing with JSON as it's easier for me to visualise)
{id:1
value:"a"
},
{id:1
value:"b"
},
{id:1
value:"c"
},
{id:2
value:"t"
}
As you can see, I have 2 unique ID's, ID 1 and 2. I want to group by the ID. The end result I'd like is
{id:1,
values:["a","b","c"],
},
{id:2,
values:["t"]
}
Is this possible with Linq? At the moment, I have a massive complex foreach, which first sorts the list (by ID) and then detects if it's already been added etc but this monstrous loop made me realise I'm doing wrong and honestly, it's too embarrassing to share.

You can group by the item Id and have the resulting type be a Dictionary<int, List<string>>
var result = myList.GroupBy(item => item.Id)
.ToDictionary(item => item.Key,
item => item.Select(i => i.Value).ToList());
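Applied to the data from the question, this might look like the sketch below. The tuple array is a hypothetical stand-in for the list loaded from the database, and the code is written as top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the rows loaded from the database.
var rows = new[]
{
    (Id: 1, Value: "a"),
    (Id: 1, Value: "b"),
    (Id: 1, Value: "c"),
    (Id: 2, Value: "t"),
};

// Group by Id, then turn each group into a list of its values.
Dictionary<int, List<string>> result = rows
    .GroupBy(row => row.Id)
    .ToDictionary(
        group => group.Key,
        group => group.Select(row => row.Value).ToList());

foreach (var pair in result)
    Console.WriteLine($"{pair.Key}: [{string.Join(",", pair.Value)}]");
```

GroupBy preserves the order of elements within each group, so the values for id 1 stay in their original order.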

You can either use the GroupBy method on IEnumerable to create IGrouping objects, each containing a key and its grouped objects, or you can use ToLookup to create exactly what you want in the result:
yourList.ToLookup(m => m.id, m => m.value);
This creates a hashed collection of keys with their values.
For more information please see below post:
https://www.c-sharpcorner.com/UploadFile/d3e4b1/practical-usage-of-using-tolookup-method-in-linq-C-Sharp/

Just a little more detail to emphasize the difference between the ToLookup approach and the GroupBy approach:
// class definition
public class Item
{
public long Id { get; set; }
public string Value { get; set; }
}
// create your list
var items = new List<Item>
{
new Item{Id = 0, Value = "value0a"},
new Item{Id = 0, Value = "value0b"},
new Item{Id = 1, Value = "value1"}
};
// this approach results in a List<string> (a collection of the values)
var lookup = items.ToLookup(i => i.Id, i => i.Value);
var groupOfValues = lookup[0].ToList();
// this approach results in a List<Item> (a collection of the objects)
var itemsGroupedById = items.GroupBy(i => i.Id).ToList();
var groupOfItems = itemsGroupedById[0].ToList();
So, if you want to work with values only after grouping, then you could take the first approach; if you want to work with objects after grouping, you could take the second approach. And, these are just a couple example implementations, there are plenty of ways to accomplish your goal.

First convert to a Lookup then select into a list, like so:
var groups = list
.ToLookup
(
item => item.ID,
item => item.Value
)
.Select
(
item => new
{
ID = item.Key,
Values = item.ToList()
}
)
.ToList();
The resulting JSON looks like this:
[{"ID":1,"Values":["a","b","c"]},{"ID":2,"Values":["t"]}]

Complexity limits of Linq queries

I'm a big fan of Linq, and I have been really enjoying the power of expression trees etc. But I have found that whenever I try to get too clever with my queries, I hit some kind of limitation in the framework: while the query can take a very short time to run on the database (as shown by performance analyzer), the results take ages to materialize. When that happens I know I've been too fancy, and I start breaking the query up into smaller, bite sized chunks - so I have a solution for that, though it might not always be the most optimal.
But I'd like to understand:
What is it that pushes the Linq framework over the edge in terms of materializing the query results?
Where can I read about the mechanism of materializing query results?
Is there a certain measurable complexity limit for Linq queries that should be avoided?
What design patterns are known to cause this problem, and what patterns can remedy it?
EDIT: As requested in comments, here's an example of a query that I measured to run on SQL Server in a few seconds, but took almost 2 minutes to materialize. I'm not going to try explaining all the stuff in context; it's here just so you can view the constructs and see an example of what I'm talking about:
Expression<Func<Staff, TeacherInfo>> teacherInfo =
st => new TeacherInfo
{
ID = st.ID,
Name = st.FirstName + " " + st.LastName,
Email = st.Email,
Phone = st.TelMobile,
};
var step1 =
currentReportCards.AsExpandable()
.GroupJoin(db.ScholarReportCards,
current =>
new { current.ScholarID, current.AcademicTerm.AcademicYearID },
past => new { past.ScholarID, past.AcademicTerm.AcademicYearID },
(current, past) => new
{
Current = current,
PastCards =
past.Where(
rc =>
rc.AcademicTerm.StartDate <
current.AcademicTerm.StartDate &&
rc.AcademicTerm.Grade == current.AcademicTerm.Grade &&
rc.AcademicTerm.SchoolID == current.AcademicTerm.SchoolID)
});
// This materialization is what takes a long time:
var subjects = step1.SelectMany(x => from key in x.Current.Subjects
.Select(s => new { s.Subject.SubjectID, s.Subject.SubjectCategoryID })
.Union(x.PastCards.SelectMany(c => c.Subjects)
.Select(
s => new { s.Subject.SubjectID, s.Subject.SubjectCategoryID }))
join cur in x.Current.Subjects on key equals
new { cur.Subject.SubjectID, cur.Subject.SubjectCategoryID } into jcur
from cur in jcur.DefaultIfEmpty()
join past in x.PastCards.SelectMany(p => p.Subjects) on key equals
new { past.Subject.SubjectID, past.Subject.SubjectCategoryID } into past
select new
{
x.Current.ScholarID,
IncludeInContactSection =
// ReSharper disable ConstantNullCoalescingCondition
(bool?)cur.Subject.IncludeInContactSection ?? false,
IncludeGrades = (bool?)cur.Subject.IncludeGrades ?? true,
// ReSharper restore ConstantNullCoalescingCondition
SubjectName =
cur.Subject.Subject.Name ?? past.FirstOrDefault().Subject.Subject.Name,
SubjectCategoryName = cur.Subject.SubjectCategory.Description,
ClassInfo = (from ce in myDb.ClassEnrollments
.Where(
ce =>
ce.Class.SubjectID == cur.Subject.SubjectID
&& ce.ScholarID == x.Current.ScholarID)
.Where(enrollmentExpr)
.OrderByDescending(ce => ce.TerminationDate ?? DateTime.Today)
let teacher = ce.Class.Teacher
let secTeachers = ce.Class.SecondaryTeachers
select new
{
ce.Class.Nickname,
Primary = teacherInfo.Invoke(teacher),
Secondaries = secTeachers.AsQueryable().AsExpandable()
.Select(ti => teacherInfo.Invoke(ti))
})
.FirstOrDefault(),
Comments = cur.Comments
.Select(cc => new
{
Staff = cc.Staff.FirstName + " "
+ cc.Staff.LastName,
Comment = cc.CommentTemplate.Text ??
cc.CommentFreeText
}),
// ReSharper disable ConstantNullCoalescingCondition
DisplayOrder = (byte?)cur.Subject.DisplayOrder ?? (byte)99,
// ReSharper restore ConstantNullCoalescingCondition
cur.Percentile,
cur.Score,
cur.Symbol,
cur.MasteryLevel,
PastScores = past.Select(p => new
{
p.Score,
p.Symbol,
p.MasteryLevel,
p.ScholarReportCard
.AcademicTermID
}),
Assessments = cur.Assessments
.Select(a => new
{
a.ScholarAssessment.AssessmentID,
a.ScholarAssessment.Assessment.Description,
a.ScholarAssessment.Assessment.Type.Nickname,
a.ScholarAssessment.AssessmentDate,
a.ScoreDesc,
a.ScorePerc,
a.MasteryLevel,
a.ScholarAssessment.Assessment.Type.AssessmentFormat,
a.ScholarAssessment.PublishedStatus,
a.ScholarAssessment.FPScore,
a.ScholarAssessment.TotalScore,
a.ScholarAssessment.Assessment.Type.ScoreType,
a.ScholarAssessment.Assessment.Type.OverrideBelowLabel,
a.ScholarAssessment.Assessment.Type.OverrideApproachingLabel,
a.ScholarAssessment.Assessment.Type.OverrideMeetingLabel,
a.ScholarAssessment.Assessment.Type.OverrideExceedingLabel,
})
})
.ToList();

Linq uses deferred execution for some tasks, for example while iterating through an IEnumerable<>, so what you call materialization includes some actual data fetching.
var reportCards = db.ScholarReportCards.Where(cr => ...); // this prepares the query
foreach (var rc in reportCards) {} // this executes your query and calls the DB
I think that if you trace/time queries on your SQL server you may see some queries arriving during the "materialization" step. This problem may even be exacerbated by anti-patterns such as the "Select N+1" problem: for example, it looks like you're not including the AcademicTerm objects in your request; if you don't, resolving these will result in a select N+1, that is, for every ScholarReportCard there will be a call to the DB to lazily resolve the attached AcademicTerm.
If we focus on the Linq to DB aspect, at least try to avoid:
Select N+1: Include the related tables you will need.
Selecting too much data: include only the columns you need in your selection (and Include only the tables you need).
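Deferred execution itself can be observed without a database. The sketch below is an in-memory illustration only (the array and the counter are assumptions standing in for a table and for database round trips): defining the query does no work, and every enumeration runs it again.

```csharp
using System;
using System.Linq;

int evaluations = 0;
var source = new[] { 1, 2, 3 };

// Defining the query does not run the predicate yet.
var query = source.Where(x => { evaluations++; return x > 1; });
int beforeEnumeration = evaluations;

// Each enumeration re-runs the whole query from scratch.
var firstPass = query.ToList();
int afterFirstPass = evaluations;

var secondPass = query.ToList();
int afterSecondPass = evaluations;

Console.WriteLine($"{beforeEnumeration}, {afterFirstPass}, {afterSecondPass}");
```

With a database-backed query, each of those enumerations would be a separate round trip, which is why materializing once with ToList and reusing the list can matter.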
