I'm trying to load a lot of records from the DB and I would like to process them in parallel to speed things up.
Below is some example code which breaks when it tries to access the Applicants property, which is null. However, in a non-parallel loop the Applicants property is either populated or an empty list, but never null. Lazy loading is definitely enabled.
var testList = new List<string>();
Context.Jobs
    .AsParallel()
    .WithDegreeOfParallelism(5)
    .ForAll(
        x => testList.Add(x.Applicants.Count().ToString())
    );
Can I do something like this? Is it related to the entity framework connection? Can I make it parallel friendly and pass an instance of it into the task or something? I'm just shooting out ideas but really I haven't a clue.
Edit:
Is this post related to mine? My issue sounds kind of similar: "Entity Framework lazy loading doesn't work from other thread"
PLINQ does not offer a way to parallelize LINQ-to-SQL and LINQ-to-Entities queries. So when you call AsParallel, EF first has to materialize the query.
Furthermore, it doesn't make any sense to parallelize a query that executes on the database, because the database can do that itself.
But if you want to parallelize client-side work, the code below may help:
Context.Jobs
    .Select(x => x.Applicants.Count().ToString())
    .AsParallel()
    .WithDegreeOfParallelism(5)
    .ForAll(
        // List<T>.Add is not thread-safe, so guard the shared list
        x => { lock (testList) { testList.Add(x); } }
    );
Note that you can access navigation properties only before the query is materialized (in your case, before the AsParallel() call). So use Select to fetch everything you need.
Context.Jobs
    .Select(x => new { Job = x, Applicants = x.Applicants })
    .AsParallel()
    .WithDegreeOfParallelism(5)
    .ForAll(
        // List<T>.Add is not thread-safe, so guard the shared list
        x => { lock (testList) { testList.Add(x.Applicants.Count().ToString()); } }
    );
You also can use Include method to include navigation properties into results of the query...
Context.Jobs
    .Include("Applicants")
    .AsParallel()
    .WithDegreeOfParallelism(5)
    .ForAll(
        // List<T>.Add is not thread-safe, so guard the shared list
        x => { lock (testList) { testList.Add(x.Applicants.Count().ToString()); } }
    );
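When collecting results from a parallel query, keep in mind that List<T> is not safe for concurrent Add calls. One alternative is to let PLINQ build the result list itself. Here is a minimal sketch, assuming in-memory data: the jobs list is hypothetical and only stands in for the already materialized query, with each inner list playing the role of a job's Applicants.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory data standing in for the materialized Jobs query:
// each inner list plays the role of one job's Applicants collection.
var jobs = Enumerable.Range(1, 100)
    .Select(i => Enumerable.Range(0, i % 7).ToList())
    .ToList();

// Let PLINQ gather the results instead of calling List<T>.Add
// (which is not thread-safe) from several worker threads.
List<string> testList = jobs
    .AsParallel()
    .WithDegreeOfParallelism(5)
    .Select(applicants => applicants.Count.ToString())
    .ToList();

Console.WriteLine(testList.Count); // one entry per job
```

The order of the entries is not guaranteed here; adding AsOrdered() after AsParallel() should preserve the source order if that matters.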
Related
I'm trying to use the .ToLookup() method with an EF Core query and I'm wondering what the best practice is. Should I buffer the query into a list first, or call .ToLookup() directly on the IQueryable?
var lookup = DbContext.Foo.Where(f => f.Id > 1).ToLookup(f => f.Id);
//vs:
var lookup = (await DbContext.Foo.Where(f => f.Id > 1).ToListAsync(cancellation)).ToLookup(f => f.Id);
My main concern is the ToListAsync approach will execute the query asynchronously whereas the direct .ToLookup call looks like it will block until results of the query are returned.
However, as @Tim mentioned, the ToListAsync approach will end up creating two collections in memory.
Thanks
ToLookup creates an in-memory collection similar to a Dictionary<TKey, TValue>, so there is no need to create a list and then call ToLookup. You would just waste CPU and memory for no reason.
So it's similar to ToDictionary, but different from GroupBy. The latter uses "deferred execution", which means you are still in a database context when you consume it, whereas a lookup or dictionary is a collection that is already filled.
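The difference between the deferred GroupBy and the immediately filled lookup can be seen without a database. This is a small sketch using LINQ to Objects on a hypothetical list of ids, not EF Core itself: the source changes after both calls, and only the deferred query notices.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var ids = new List<int> { 1, 2, 2, 3 };

var grouped = ids.GroupBy(id => id);  // deferred: nothing is evaluated yet
var lookup = ids.ToLookup(id => id);  // immediate: the lookup is filled right now

ids.Add(3); // change the source after both calls

// GroupBy runs only now, so it sees the item added above.
int groupedThrees = grouped.Single(g => g.Key == 3).Count();
// The lookup is a snapshot taken before the change.
int lookupThrees = lookup[3].Count();

Console.WriteLine($"{groupedThrees} vs {lookupThrees}"); // 2 vs 1
```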
I'm running into some speed issues in my project and it seems like the primary cause is the calls to the database using Entity Framework. Every time I call the database, it is always done as
database.Include(...).Where(...)
and I'm wondering if that is different than
database.Where(...).Include(...)?
My thinking is that the first way includes everything for all the elements in the target table, then filters out the ones I want, while the second one filters out the ones I want, then only includes everything for those. I don't fully understand entity framework, so is my thinking correct?
Entity Framework delays its querying as long as it can, up until the point where your code starts working on the data. To illustrate with an example:
var query = db.People
    .Include(p => p.Cars)
    .Where(p => p.Employer.Name == "Globodyne")
    .Select(p => p.Employer.Founder.Cars);
With all these chained calls, EF has not yet called the database. Instead, it has kept track of what you're trying to fetch, and it knows what query to run if you start working with the data. If you never do anything else with query after this point, then you will never hit the database.
However, if you do any of the following:
var result = query.ToList();
var firstCar = query.FirstOrDefault();
var founderHasCars = query.Any();
Now, EF is forced to look at the database because it cannot answer your question unless it actually fetches the data. Only at this point, and not before, does EF actually hit the database.
For reference, this trigger to fetch the data is often referred to as "enumerating the collection", i.e. turning a query into an actual result set.
By deferring the execution of that query for as long as possible, EF is able to wait and see if you're going to filter/order/paginate/transform/... the result set, which could lead to EF needing to return less data than when it executes every command immediately.
This also means that when you call Include, you're not actually hitting the database, so you're not going to be loading data from items that will later be filtered by your Where clause, if you didn't enumerate the collection.
Take these two examples:
var list1 = db.People
    .Include(p => p.Cars)
    .ToList() // <= enumeration
    .Where(p => p.Name == "Bob");
var list2 = db.People
    .Include(p => p.Cars)
    .Where(p => p.Name == "Bob")
    .ToList(); // <= enumeration
These lists will eventually yield the same result. However, the first list will fetch data before you filter it because you called ToList before Where. This means you're going to be loading all people and their cars in memory, only to then filter that list in memory.
The second list, however, will only enumerate the collection when it already knows about the Where clause, and therefore EF will only load people named Bob and their cars into memory. The filtering will happen on the database before it gets sent back to your runtime.
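To see why the position of ToList matters, here is a rough sketch that uses an in-memory IQueryable as a hypothetical stand-in for db.People (no EF involved). It only shows where the filter ends up: inside the expression tree that a query provider would receive, or in plain in-process LINQ to Objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical stand-in for db.People: an in-memory IQueryable of names.
IQueryable<string> people = new[] { "Bob", "Alice", "Bob", "Carol" }.AsQueryable();

// Where before ToList: the filter is part of the expression tree handed to
// the query provider, so a database provider could translate it to SQL.
IQueryable<string> filterInQuery = people.Where(p => p == "Bob");
string outermostCall = ((MethodCallExpression)filterInQuery.Expression).Method.Name;

// ToList before Where: everything is materialized first, and the filter is
// ordinary LINQ to Objects running in process.
IEnumerable<string> filterInMemory = people.ToList().Where(p => p == "Bob");
bool stillQueryable = filterInMemory is IQueryable<string>;

Console.WriteLine(outermostCall);  // Where
Console.WriteLine(stillQueryable); // False
Console.WriteLine(filterInQuery.ToList().SequenceEqual(filterInMemory)); // True
```

Both variants produce the same items; what differs is how much data has to be loaded before the filter is applied.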
You did not show enough code for me to verify whether you are prematurely enumerating the collection. I hope this answer helps you in determining whether this is the cause of your performance issues.
database.Include(...).Where(...) and I'm wondering if that is different than database.Where(...).Include(...)?
Assuming this code is verbatim (except for the missing db set) and there is nothing happening in between the Include and Where, the order does not change the execution and therefore it is not the source of your performance issue.
I generally advise you to put your Include statements before anything else (i.e. right after db.MyTable), as a matter of readability. The order of the other operations depends on the specific query you're trying to construct.
Most of the time, the order of the clauses will not make any difference.
An Include statement typically tells SQL to JOIN one table with another,
while Where results in, yes, a SQL WHERE.
When you write something like database.Include(...).Where(...), you are building an IQueryable object that is translated to SQL only when you access its results, for example with .ToList() or .FirstOrDefault(), and those queries are already optimized.
So if you still have performance issues, you should use a profiler to look for bottlenecks and maybe consider using stored procedures (which can be integrated with EF).
I'm fairly new to C# and looking for best practices on how to write code. Right now I'm working with EF Core and have the following code
var details = _dbContext.Details.Where(x => x.Name == "Button");
foreach(var detail in details)
{
    ...
}
To make it more responsive, I tried using ToListAsync() like this
var details = await _dbContext.Details.Where(x => x.Name == "Button").ToListAsync();
If I understand correctly, this should be the more efficient way. Should I always use ToListAsync() before foreach?
The same goes for deleting. The first option is
var details = _dbContext.Details.Where(x => x.Id == "Button");
_dbContext.Details.RemoveRange(details);
or
var details = await _dbContext.Details.Where(x => x.Id == "Button").ToListAsync();
_dbContext.Details.RemoveRange(details);
Which one is better? And if I don't add ToListAsync(), will the query run synchronously?
Returning IEnumerable from an action results in synchronous collection iteration by the serializer. The result is the blocking of calls and a potential for thread pool starvation. To avoid synchronous enumeration, use ToListAsync before returning the enumerable.
Beginning with ASP.NET Core 3.0, IAsyncEnumerable can be used as an alternative to IEnumerable that enumerates asynchronously. For more information, see Controller action return types.
For more information, check ASP.NET Core Performance Best Practices.
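As a rough illustration of asynchronous enumeration, here is a self-contained sketch. GetDetailsAsync is a hypothetical stand-in for an asynchronous data source (with EF Core, a query exposed through AsAsyncEnumerable() would play that role); it is not the code from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for an asynchronous data source.
static async IAsyncEnumerable<string> GetDetailsAsync()
{
    foreach (var name in new[] { "Button", "Lever", "Button" })
    {
        await Task.Delay(10); // simulates waiting on I/O without blocking a thread
        yield return name;
    }
}

var buttons = new List<string>();

// await foreach pulls items one at a time and frees the thread while waiting.
await foreach (var detail in GetDetailsAsync())
{
    if (detail == "Button")
    {
        buttons.Add(detail);
    }
}

Console.WriteLine(buttons.Count); // 2
```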
In this case "ToListAsync()" will hinder your performance, just as ToList() would.
The foreach loop is more than capable of handling the "details" variable without the added step of converting it to a list. Doing it asynchronously or synchronously is irrelevant here.
"details" is already a "IQueryable<T>", and turning it into a "List<T>" is unnecessary.
The same applies for your deletion, the conversion is unnecessary.
If you would like to increase your performance, there is something you can do:
If the only time you use the variable "details" is for iteration in your foreach loop, you can remove the variable declaration altogether and just do this:
foreach(var detail in _dbContext.Details.Where(x => x.Name == "Button"))
{
    ...
}
Now instead of:
Create memory -> Read memory from _dbContext.Details... -> Copy that to the newly created memory -> Read newly created memory -> Pass memory to the foreach loop
You simply:
Read memory from _dbContext.Details... -> Pass memory to the foreach loop
Many examples you will read have many var assignments. These simply break the steps down so they are easier for learners to understand, but they are often not the more performant way of coding.
This is my first time working with Entity Framework (EF) and I'm trying to learn what exactly executes a query on my database and what doesn't.
This is the code I'm working with. Don't mind the functionality, it isn't important for this question.
using (var db = new Context())
{
    //Check if any reviews have been given.
    if (combinedReviews.Any())
    {
        var restaurantsReviewedIds = combinedReviews.Select(rev => rev.RestaurantId);
        //(1)
        ratedRestaurants = db.Restaurants.Where(rest => restaurantsReviewedIds.Contains(rest.Id))
            .DistinctBy(rest => rest.Id)
            .ToList();
    }
    //(2)
    var restsClose = db.Restaurants.Where(rest => db.Reviews.Any(rev => rev.RestaurantId == rest.Id))
        .OrderBy(rest => rest.Location.Distance(algorithmParams.Location))
        .Take(algorithmParams.AmountOfRecommendations);
    //(3)
    tempList = ratedRestaurants.Union(restsClose).ToList();
    var tempListIds = tempList.Select(rest => rest.Id); //Temporary list.
    //(4)
    restsWithAverage = db.Reviews.Where(rev => tempListIds.Contains(rev.RestaurantId))
        .GroupBy(rev => rev.RestaurantId)
        .ToList();
}
I have marked each piece of code with a number, so I'll refer to them by that. Below is what I think happens.
(1) This executes a query since I'm calling .ToList() here.
(2) This returns an IQueryable, so this won't execute a query against the database.
(3) This executes the query from (2).
(4) This executes another query since I'm calling .ToList().
How close to the truth am I? Is all of this correct? If this doesn't make sense, could you give an example what executes a query and what doesn't?
I'm sorry for asking so many questions in one question, but I thought I wouldn't need to create so many questions since all of this is about a single topic.
If you don't want to execute a query you can use AsEnumerable.
ToList vs AsEnumerable
ToList – converts an IEnumerable<T> to a List<T>. The advantage of using AsEnumerable vs. ToList is that AsEnumerable does not execute the query. AsEnumerable preserves deferred execution and does not build an often useless intermediate list.
On the other hand, when forced execution of a LINQ query is desired, ToList can be a way to do that.
You could also force execution by putting a foreach loop immediately after the query expression, but by calling ToList or ToArray you cache all the data in a single collection object.
ToLookup and ToDictionary also execute the query.
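The caching point can be sketched with plain LINQ to Objects. The counter below is a hypothetical stand-in for "work done by the query": AsEnumerable does no work by itself and the pipeline re-runs on every enumeration, while ToList runs it once and keeps the results.

```csharp
using System;
using System.Linq;

int evaluations = 0;

// Each evaluation of the selector stands in for work done by the query.
var source = new[] { 1, 2, 3 }.Select(n => { evaluations++; return n * 10; });

var deferred = source.AsEnumerable();   // no work yet
int afterAsEnumerable = evaluations;    // 0

var cached = source.ToList();           // runs the pipeline once
int afterToList = evaluations;          // 3

_ = cached.Sum();
_ = cached.Sum();                       // reads the list, no re-evaluation
int afterCachedReads = evaluations;     // still 3

_ = deferred.Sum();
_ = deferred.Sum();                     // re-runs the pipeline each time
int afterDeferredReads = evaluations;   // 9

Console.WriteLine($"{afterAsEnumerable}, {afterToList}, {afterCachedReads}, {afterDeferredReads}");
```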
Here you can find a list of operators and whether each one executes the query:
https://msdn.microsoft.com/en-us/library/mt693095.aspx.
LINQ query execution differs per query. I recommend reading the following page: https://msdn.microsoft.com/en-us/library/bb738633(v=vs.110).aspx
I have a few mongo queries.
var threads = postCollection.AsQueryable<PostMongoEntity>()
    .Select(w => w.ThreadId);
var entities = threadCollection.AsQueryable<ThreadMongoEntity>()
    .Where(e => e.ThreadId.In(threads))
    .OrderBy(e => e.Time)
    .Skip(page * ThreadPageSize)
    .Take(ThreadPageSize);
The first query finds all thread ids from a posts collection, and the second gets all threads with those ids. I wanted to know if this will do everything on the actual database. This isn't the complete query, but most of the important stuff is here. The part I'm worried about is Where(e => e.ThreadId.In(threads)). Will it send the thread list to the database, or will it get all threads and do the filtering locally?
It will send the list of threadIds to MongoDB. It will NOT pull all the records back and do the filtering locally. I assume this is what you want.
Well, from a type-compatibility standpoint it looks legal. threads is an IQueryable, which implements IEnumerable, and the In operation accepts exactly an IEnumerable (http://api.mongodb.org/csharp/1.9.2/)
Sorry, I have just read your question more carefully.
But!
Obviously you need to use long (or whatever type is used for the Id in PostMongoEntity). So it becomes legal only if In accepts an IEnumerable of primitive types instead of entities.
P.S. This method has some restriction on the number of PostMongoEntity keys; I cannot quickly find the exact reference.