I have 200k rows in my table, and I need to filter them and then show the result in a DataTable. The SQL itself runs fast, but getting the row count and running ToList() both take a long time, even though the filtered result contains only 15 rows, so it is not a huge amount of data.
public static List<Books> GetBooks()
{
    List<Books> bookList = new List<Books>();
    var v = from a in ctx.Books select a;
    int allBooksCount = v.Count(); // I need the count of all books before the filter, but it is so slow -- my first problem
    if (isFilter)
    {
        v = v.Where(a => a.startdate <= DateTime.Now && a.enddate >= DateTime.Now);
    }
    // ...
    bookList = v.ToList(); // ToList is also so slow -- my second problem
    return bookList;
}
There's nothing wrong with the code you've shown. So either you have some trouble in the database itself, or you're ruining the query by using IEnumerable instead of IQueryable.
My guess is that either ctx.Books is IEnumerable<Books> (instead of IQueryable<Books>), or that the Count (and Where etc.) method you're calling is the Enumerable version, rather than the Queryable version.
Which version of Count are you actually calling?
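For illustration, a minimal sketch of that difference (assuming ctx.Books is a DbSet<Books>); the declared type of the variable decides which Count overload resolution picks:

IQueryable<Books> q = ctx.Books;
int serverCount = q.Count(); // Queryable.Count: translated to SELECT COUNT(*) and runs on the server

IEnumerable<Books> e = ctx.Books;
int clientCount = e.Count(); // Enumerable.Count: streams all 200k rows to the client and counts them in memory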
First, to get help you need to provide quantitative values for "fast" vs. "too long". Loading entities from EF will take longer than running a raw SQL statement in a client tool like TOAD etc. Are you seeing differences of 15ms vs. 15 seconds, or 15ms vs. 150ms?
To help identify and eliminate possible culprits:
Eliminate the possibility of a long-running DbContext instance tracking too many entities bogging down performance. The longer a DbContext is used and the more entities it tracks, the slower it gets. Temporarily change the code to:
List<Books> bookList = new List<Books>();
using (var context = new YourDbContext())
{
    var v = from a in context.Books select a;
    int allBooksCount = v.Count(); // the count of all books before the filter
    if (isFilter)
    {
        v = v.Where(a => a.startdate <= DateTime.Now && a.enddate >= DateTime.Now);
    }
    // ...
    bookList = v.ToList();
}
Using a fresh DbContext ensures queries are not sifting through in-memory entities after running a query to find tracked instances to return. This also ensures we are running against IQueryable off the Books DbSet within the context. We can only guess what "ctx" in your code actually represents.
Next: use a profiler for MySQL, or have your database log SQL statements, to capture exactly what EF is requesting. Check that Count and ToList each trigger just one query against the database, then run those exact statements against the database yourself. If more than these two queries are being run, something odd is happening behind the scenes that you need to investigate; for instance, your example may not really represent what your real code is doing, or you could be tripping client-side evaluation (if using EF Core 2) or lazy loading. After that, if possible, look at the execution plan for these queries for hints like missing indexes. (My DB experience is primarily SQL Server, so I cannot advise on tools for MySQL.)
I would log the actual SQL queries here. You can then use DESCRIBE to see how many rows each one hits. There are various tools that can analyse the queries further if DESCRIBE isn't sufficient. This way you can see whether the problem is the queries or the (lack of) indices. The next step has to be guided by that.
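A sketch of the logging step, assuming EF6 (on EF Core 5+ you could instead call ToQueryString() on the query):

// Capture the SQL EF sends, then run EXPLAIN/DESCRIBE on it in the MySQL client.
ctx.Database.Log = sql => System.Diagnostics.Debug.WriteLine(sql);
int count = ctx.Books.Count(); // should log exactly one SELECT COUNT(*)
var books = ctx.Books
               .Where(b => b.startdate <= DateTime.Now && b.enddate >= DateTime.Now)
               .ToList();      // should log exactly one SELECT ... WHERE ...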
Related
I have a function in my ASP.NET Core app which updates a bunch of records based on certain criteria I write in a where clause ... I read that ToList() has bad performance, so is there a better and faster way than using ToList and foreach?
This is my current way of doing it; I would appreciate it if someone could provide a more efficient way.
public async Task UpdateCatalogOnTenantApproval(int tenantID)
{
    var catalogQuery = GetQueryable();
    var catalog = await catalogQuery.Where(x => x.IdTenant == tenantID).ToListAsync();
    catalog.ForEach(c => { c.IsApprovedByAdmin = true; c.IsActive = true; });
    Context.UpdateRange(catalog);
    await Context.SaveChangesAsync();
}
"I read that ToList() has bad performance"
That is wrong. ToList performs as well as you will get; it is only slow if you submit an overly complex query that results in bad SQL which SQL Server takes ages to execute.
Also, many people think ToList is slow (as in: in the profiler). You see, you start with a db context, take a set of entities from it, add some where clauses, all fast. Then you call ToList and it takes "long" (compared to the rest). Well, THAT is when the query is sent to the SQL server. Where(x => whatever) takes "no time" because all it does is add some nodes to the expression tree; it does not execute the query. THAT is what people mostly mix up: deferred execution, which executes only when you ask for the results.
And third, some people call ToList().Where() and complain about performance. Filter as much as possible on the DB.
All three reasons are why people think ToList is slow - but all it shows is a lack of understanding of how LINQ and SQL operate.
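A minimal sketch of that deferred execution, reusing the Books example from the earlier question (names assumed from there):

IQueryable<Books> query = ctx.Books;                  // no SQL sent yet
query = query.Where(b => b.startdate <= DateTime.Now  // still no SQL; this only
                      && b.enddate >= DateTime.Now);  // grows the expression tree
var list = query.ToList();                            // THIS is where the single SELECT ... WHERE ... executes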
Entity Framework does not handle bulk update operations by default -- hence your existing code. If you really want to do these bulk operations, then you have two options:
Write the SQL yourself and use the ExecuteSqlCommand() method to execute it (a sketch follows below); or
Look at 3rd party extensions, such as https://entityframework-extensions.net/
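For option 1, a hedged sketch of what that raw SQL could look like (table and column names assumed from the posted code; ExecuteSqlCommandAsync is the EF6/early EF Core spelling, later renamed ExecuteSqlRawAsync in EF Core):

// One UPDATE statement; no entities are loaded or tracked.
await Context.Database.ExecuteSqlCommandAsync(
    "UPDATE Catalog SET IsApprovedByAdmin = 1, IsActive = 1 WHERE IdTenant = {0}",
    tenantID);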
We can reduce query cost by selecting a subset of data before attaching for EF to track, and then updating.
However, it may be pointless micro-optimization that does not perform significantly better unless you are processing a massive number of records.
// Select the PK plus the 2 fields to be modified. EF cannot construct a
// mapped entity type inside a LINQ-to-Entities query, so project to an
// anonymous type first and map to entity stubs in memory.
var rows = await catalogQuery.Where(x => x.IdTenant == tenantID)
    .Select(x => new { x.CatalogId, x.IsApprovedByAdmin, x.IsActive })
    .ToListAsync();
var catalog = rows
    .Select(r => new Catalog { CatalogId = r.CatalogId, IsApprovedByAdmin = r.IsApprovedByAdmin, IsActive = r.IsActive })
    .ToList();
// Next we attach the stubs so EF tracks the list.
Context.AttachRange(catalog);
// Perform your update as usual; properties are flagged as modified if changed.
catalog.ForEach(c => { c.IsApprovedByAdmin = true; c.IsActive = true; });
// Save and let EF update based on the modified fields.
await Context.SaveChangesAsync();
Let me explain what you have done and what you are trying to do.
You are partially right about the performance issues related to ToList and ToListAsync: they are what actually load the entities into memory and make EF track them.
Based on that, if your request is expected to deal only with light data, you are not required to change your code. If it is not, however, there are many open approaches; each one has its pros and cons, and you have to weigh them for each case where you want to avoid the dual app-to-SQL round trips.
Let's be more realistic by talking about your case:
1- We assume that your method is resource-consuming (loading a high volume of data, being called intensively, or both).
2- I see the modification is entirely static: every row is updated with c.IsApprovedByAdmin = true; c.IsActive = true;.
From (1) and (2) I suggest writing a stored procedure or using ExecuteSqlCommand (as Bryan Lewis suggested) to do this for you,
keeping in mind (3): stored procedures, triggers, and all SQL-based operations are hard to maintain and are prone to hidden exceptions. In your case, however, you are less likely to run into that, since your code is very basic, and you could reduce the risk further by constructing your query from dynamic elements such as nameof(YourEntityClass) for the table name, nameof(YourProperty) for column names, and the like...
Anyway, this is an example to show that there is no ideal approach; you have to study each case on its own.
Finally, I do not agree with the 3rd party extensions: most of the freely provided ones are developed by non-professionals, and tracking down exceptions caused by them is a nightmare, while the paid versions are expensive and are not zero-exception extensions either. The 3rd party extensions are more oriented toward complex bulk updates/deletes and/or huge data.
e.g.
await Context.UpdateAsync(e => new Catalog
{
    Archived = e.LastUpdate > DateTime.UtcNow.AddYears(-99) ? false : true
});
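For comparison, the Entity Framework Extensions library linked above documents its bulk update in roughly this shape (set and entity names assumed from the question; treat this as an illustrative sketch, not a guaranteed signature):

// Translated directly to a single UPDATE statement; no entities are loaded.
context.Catalogs
       .Where(c => c.IdTenant == tenantID)
       .UpdateFromQuery(c => new Catalog { IsApprovedByAdmin = true, IsActive = true });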
I have a little question about performance with Entity Framework.
Something like
using (MyContext context = new MyContext())
{
    Document DocObject = context.Document.Find(_id);
    int GroupCount = context.Document.Where(w => w.Group == DocObject.Group).ToList().Count();
}
takes about 2 seconds on my database (about 30k records), while this one
using (MyContext context = new MyContext())
{
    Document DocObject = context.Document.Find(_id);
    int GroupCount = context.Document.Where(w => w.Group == DocObject.Group).Count();
}
takes 0.02 seconds.
When my filter for 10 documents took 20 seconds, I checked my code and changed it to not use ToList() before Count().
Any ideas why it needs 2 seconds for this line with the ToList()?
Calling ToList() then Count() will:
execute the whole SELECT FROM WHERE against your database
then materialize all the resulting entities as .Net objects
create a new List<T> object containing all the results
return the result of the Count property of the .Net list you just created
Calling Count() against an IQueryable will:
execute SELECT COUNT FROM WHERE against your database
return an Int32 with the number of rows
Obviously, if you're only interested in the number of items (not the items themselves), then you shouldn't ever call ToList() first, as it will require a lot of resources for nothing.
Yes, ToList() will evaluate the query (retrieving the objects from the database); if you do not use ToList(), the objects aren't retrieved from the database.
Linq-To-Entities uses lazy (deferred) execution by default.
It works something like this:
When you query your underlying DB connection using Linq-To-Entities you will get a proxy object back on which you can perform a number of operations (count being one). This means that you do not get all the data from the DB at once but rather the objects are retrieved from the DB at the time of evaluation. One way of evaluating the object is by using ToList().
Maybe you should read this.
Because ToList() will query the database for the whole objects (it will do a SELECT *, so to say), and then you'll use Count() on the in-memory list holding all the records, whereas if you use Count() on the IQueryable (and not on the List), EF will translate it to a simple SELECT COUNT(*) SQL query.
Your first query isn't fully translated to SQL: when you call .ToList().Count(), you are basically saying "download everything, materialize it to POCOs, and call the extension method named Count()", which, of course, takes some time.
Your second query, however, is translated to something like select count(*) from Documents where GroupId = @DocObjectGroup, which is much faster to execute, and you aren't materializing anything, just a simple scalar.
Using the extension method Enumerable.ToList() will construct a new List object from the IEnumerable<T> source collection, which means there is an associated cost to calling ToList().
I am using ASP.NET MVC 4.5 and EF6 with code-first migrations.
I have this code, which takes about 6 seconds.
var filtered = _repository.Requests.Where(r => some conditions); // this is fast, conditions match only 8 items
var list = filtered.ToList(); // this takes 6 seconds, has 8 items inside
I thought this was because of relations, that it must build them in memory, but that is not the case, because even when I return 0 fields, it is still just as slow.
var filtered = _repository.Requests.Where(r => some conditions).Select(e => new {}); // this is fast, conditions match only 8 items
var list = filtered.ToList(); // this takes still around 5-6 seconds, has 8 items inside
Now the Requests table is quite complex, lots of relations and has ~16k items. On the other hand, the filtered list should only contain proxies to 8 items.
Why is the ToList() method so slow? I actually think the problem is not in the ToList() method itself, but is probably an EF issue or a bad design problem.
Anyone has had experience with anything like this?
EDIT:
These are the conditions:
_repository.Requests.Where(r => ids.Any(a => a == r.Student.Id) && r.StartDate <= cycle.EndDate && r.EndDate >= cycle.StartDate)
So basically, I am checking if the Student id is in my id list and checking if the dates match.
Your filtered variable contains a query which is a question, and it doesn't contain the answer. If you request the answer by calling .ToList(), that is when the query is executed. And that is the reason why it is slow, because only when you call .ToList() is the query executed by your database.
It is called deferred execution. A Google search will give you some more information about it.
If you show some of your conditions, we might be able to say why it is slow.
In addition to Maarten's answer, I think the problem is one of two different situations:
some condition is complex and results in complex and heavy joins or queries in your database
some condition filters on a column which does not have an index, and this causes a full table scan and makes your query slow.
I suggest you start monitoring the queries generated by Entity Framework. It's very simple: you just need to set the Log property of your context and see the results,
using (var context = new MyContext())
{
    context.Database.Log = Console.Write;
    // Your code here...
}
If you see something strange in the generated query, try to make it better by breaking it into parts; sometimes the queries Entity Framework generates are not so good.
If the query is okay, then the problem lies in your database (assuming no network problems).
Run your query with an SQL profiler and check what's wrong.
UPDATE
I suggest you:
add an index for the StartDate column and one for the EndDate column in your table (one for each, not one for both); a code-first sketch follows below
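A hedged sketch of adding those indexes with EF6 code-first (IndexAttribute requires EF 6.1+; the class and property names are assumed from the question):

using System.ComponentModel.DataAnnotations.Schema;

public class Request
{
    public int Id { get; set; }

    [Index] // a separate index for StartDate
    public DateTime StartDate { get; set; }

    [Index] // and another for EndDate
    public DateTime EndDate { get; set; }
}

After adding the attributes, add a migration so the indexes are actually created in the database.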
ToList executes the query against the DB, while the first line does not.
Can you show some conditions code here?
To increase performance you need to optimize the query and/or create indexes on the DB tables.
Your first line of code only returns an IQueryable. This is a representation of a query that you want to run, not the result of the query. The query itself only runs on the database when you call .ToList() on your IQueryable, because that is the first point at which you actually ask for data.
Your adjustment to add the .Select only adds to the existing IQueryable query definition. It doesn't change which conditions have to execute. You have essentially changed the following, where you get back 8 records:
select * from Requests where [some conditions];
to something like:
select '' from Requests where [some conditions];
You will still have to perform the full query with the conditions giving you 8 records, but for each one, you only asked for an empty string, so you get back 8 empty strings.
The long and the short of this is that any performance problem you are having is coming from your "some conditions". Without seeing them, it is difficult to know. But I have seen people in the past add .Where clauses inside a loop before calling .ToList() and inadvertently create a massively complicated query; a sketch of that trap follows below.
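An illustrative sketch of that trap (the loop and the Description filter are hypothetical; each iteration composes one more predicate into the single query that finally runs at ToList):

IQueryable<Request> query = _repository.Requests;
foreach (var term in searchTerms) // hypothetical list of filter values
{
    query = query.Where(r => r.Description.Contains(term)); // each iteration ANDs another predicate
}
var result = query.ToList(); // the one combined, potentially enormous query executes here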
Jaanus, the most likely reason for this issue is the complexity of the SQL query generated by Entity Framework. I guess that your filter condition contains checks against other tables.
Try to inspect the generated query with SQL Server Profiler, then copy the query into Management Studio and check the estimated execution plan. As a rule, Management Studio generates index recommendations for your query; try to follow those recommendations.
I'm using BatchDelete, found in the answer to this question: EF Code First Delete Batch From IQueryable<T>?
The method seems to be wasting too much time building the delete clause from the IQueryable. Specifically, deleting 20,000 elements using the IQueryable below takes almost two minutes.
context.DeleteBatch(context.SomeTable.Where(x => idList.Contains(x.Id)));
All the time is spent on this line:
var sql = clause.ToString();
The line is part of this method, available on the original question linked above but pasted here for convenience:
private static string GetClause<T>(DbContext context, IQueryable<T> clause) where T : class
{
    const string Snippet = "FROM [dbo].[";

    var sql = clause.ToString();
    var sqlFirstPart = sql.Substring(sql.IndexOf(Snippet, System.StringComparison.OrdinalIgnoreCase));

    sqlFirstPart = sqlFirstPart.Replace("AS [Extent1]", string.Empty);
    sqlFirstPart = sqlFirstPart.Replace("[Extent1].", string.Empty);

    return sqlFirstPart;
}
I imagine making context.SomeTable.Where(x => idList.Contains(x.Id)) into a compiled query could help, but AFAIK you can't compile queries while using DbContext on EF 5. In theory they should be cached, but I see no sign of improvement on a second execution of the same BatchDelete.
Is there a way to make this faster? I would like to avoid manually building the SQL delete statement.
The IQueryable isn't cached, and each time you evaluate it you're going out to SQL. Running ToList() or ToArray() on it will evaluate it once, and then you can work with the list as the cached version.
If you want to preserve your interfaces, you'd use ToList().AsQueryable(), and this would pass in a cached version.
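A small sketch of what that suggestion amounts to (after the ToList, further operators run in memory via LINQ-to-Objects rather than on SQL Server):

// Hits the database exactly once.
List<SomeTable> cached = context.SomeTable
                                .Where(x => idList.Contains(x.Id))
                                .ToList();

// A queryable-shaped view over the in-memory copy; no further DB round trips.
IQueryable<SomeTable> cachedQuery = cached.AsQueryable();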
Related post.
How do I cache an IQueryable object?
It seems there is no way to cache the IQueryable in this case, because the query contains a list of ids to check against, and the list changes on every call.
The only way I found to avoid the two minute delay in building the query every time I had to mass-delete objects was to use ExecuteSqlCommand as below:
var list = string.Join("','", ids.Select(x => x.ToString()));
var qry = string.Format("DELETE FROM SomeTable WHERE Id IN ('{0}')", list);
context.Database.ExecuteSqlCommand(qry);
I'll mark this as the answer for now. If any other technique is suggested that doesn't rely on ExecuteSqlCommand, I'll gladly change the answer.
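If the id list can grow very large, one possible refinement (an untested sketch; it assumes ids is a List of the same values used above) is to issue the DELETE in chunks so each statement stays a manageable size:

const int batchSize = 1000; // illustrative; trial different values
for (int i = 0; i < ids.Count; i += batchSize)
{
    var chunk = string.Join("','", ids.Skip(i).Take(batchSize).Select(x => x.ToString()));
    context.Database.ExecuteSqlCommand(
        string.Format("DELETE FROM SomeTable WHERE Id IN ('{0}')", chunk));
}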
There is an EF pattern that works OK.
It uses projection to return ONLY the keys from the DB (projections are not added to the context), so this is pretty quick.
Then you populate the context with KEY-only stub POCOs and light the fuse....
Basically:
var deleteMagazine = context.Set<DeadMeat>()
                            .Where(t => t.IhateYou == true)
                            .Select(t => t.THEKEY)
                            .ToList();

// Now instantiate a dummy POCO with KEY only for each entry in the list
var processed = 0;
foreach (var key in deleteMagazine)
{
    var bullet = new DeadMeat { THEKEY = key };
    context.Set<DeadMeat>().Attach(bullet);
    context.Set<DeadMeat>().Remove(bullet);

    // consider saving changes every 1000 records .... performance, trial different values
    if (++processed % 1000 == 0)
        context.SaveChanges();
}

// shoot anyone still moving
context.SaveChanges();
Check SQL Server Profiler....
Is there a way to get the count of a result set but return only the top 5 records, while making just one DB hit instead of 2 (one for the count and a second for the data)?
There is not a particularly good way to do this in Entity Framework, at least as of v4. @Tobias writes a single LINQ query, but his suspicions are correct: you'll see multiple queries roll by in SQL Profiler.
Ignoring EF for a minute, this is a relatively complicated problem for SQL Server. Well, it's complicated once your data size gets large or your query gets complicated. You can get a flavor for what's involved here.
With that said, I wouldn't worry about it being 2 queries just yet. Don't optimize until you know it is an actual performance problem. You'll likely end up working around EF, maybe using the EF extensions and creating a stored proc that can take advantage of windowed functions and CTEs. Or maybe it will just return two result sets in a single procedure.
This little query should do the trick (I'm not sure if that's really just one physical query though, and it could be that the grouping is done in the code rather than in the DB), but it's definitely more convenient:
var obj = (from x in entities.SomeTable
           let item = new { N = 1, x }
           group item by item.N into g
           select new { Count = g.Count(), First = g.Take(5) }).FirstOrDefault();
Nonetheless, just doing this in two queries will definitely be much faster (especially if you define them in one stored procedure, as proposed here).
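For reference, the plain two-query version being compared against would look something like this (the entity name is reused from the snippet above; the filter is hypothetical):

var query = entities.SomeTable.Where(x => x.IsActive); // hypothetical filter
int totalCount = query.Count();                        // SELECT COUNT(*) ...
var topFive = query.Take(5).ToList();                  // SELECT TOP (5) ...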