Entity Framework - Performance in count

Entity Framework - Performance in count - c#

I've a little question about performance with Entity Framework.
Something like
using (MyContext context = new MyContext())
{
Document DocObject = context.Document.Find(_id);
int GroupCount = context.Document.Where(w=>w.Group == DocObject.Group).ToList().Count();
}
takes about 2 seconds in my database (about 30k datasets), while this one
using (MyContext context = new MyContext())
{
Document DocObject = context.Document.Find(_id);
int GroupCount = context.Document.Where(w=>w.Group == DocObject.Group).Count();
}
takes 0,02 seconds.
When my filter for 10 documents had 20 seconds to wait, I checked my code, and changed this to not use ToList() before Count().
Any ideas why it needs 2 seconds for this line with the ToList()?

Calling ToList() then Count() will:
execute the whole SELECT FROM WHERE against your database
then materialize all the resulting entities as .Net objects
create a new List<T> object containing all the results
return the result of the Count property of the .Net list you just created
Calling Count() against an IQueryable will:
execute SELECT COUNT FROM WHERE against your database
return an Int32 with the number of rows
Obviously, if you're only interested in the number of items (not the items themselves), then you shouldn't ever call ToList() first, as it will require a lot of resources for nothing.

Yes, ToList() will evaluate the results (retrieving the objects from database), if you do not use ToList(), the objects aren´t retrieved from the database.
Linq-To-Entities uses LazyLoading per default.
It works something like this;
When you query your underlying DB connection using Linq-To-Entities you will get a proxy object back on which you can perform a number of operations (count being one). This means that you do not get all the data from the DB at once but rather the objects are retrieved from the DB at the time of evaluation. One way of evaluating the object is by using ToList().
Maybe you should read this.

Because ToList() will query the database for the whole objects (will do a SELECT * so to say), and then you'll use Count() on the list in memory with all the records, whereas if you use Count() on the IQueryable (and not on the List), EF will translate it to a simple SELECT COUNT(*) SQL query

Your first query isnt fully transalted to sql - when you call .ToList().Count(), you are basically saying "download all, materialize it to POCO and call extension method named Count()" which, of course, take some time.
Your second query is, however, transalted to something like select count(*) from Documents where GroupId = #DocObjectGroup which is much faster to execute and you arent materializing anything, just simple scalar.

Using the extension method Enumerable.ToList() will construct a new List object from the IEnumerable<T> source collection which means that there is an associated cost with doing ToList().

Related

Linq ToList() and Count() performance problem

I have 200k rows in my table and I need to filter the table and then show in datatable. When I try to do that, my sql run fast. But when I want to get row count or run the ToList(), it takes long time. Also when I try to convert it to list it has 15 rows after filter, it has not huge data.
public static List<Books> GetBooks()
{
List<Books> bookList = new List<Books>();
var v = from a in ctx.Books select a);
int allBooksCount = v.Count(); // I need all books count before filter. but it is so slow is my first problem
if (isFilter)
{
v = v.Where(a => a.startdate <= DateTime.Now && a.enddate>= DateTime.Now);
}
.
.
bookList = v.ToList(); // also toList is so slow is my second problem
}

There's nothing wrong with the code you've shown. So either you have some trouble in the database itself, or you're ruining the query by using IEnumerable instead of IQueryable.
My guess is that either ctx.Books is IEnumerable<Books> (instead of IQueryable<Books>), or that the Count (and Where etc.) method you're calling is the Enumerable version, rather than the Queryable version.
Which version of Count are you actually calling?

First, to get help you need to provide quantitative values for "fast" vs. "too long". Loading entities from EF will take longer than running a raw SQL statement in a client tool like TOAD etc. Are you seeing differences of 15ms vs. 15 seconds, or 15ms vs. 150ms?
To help identify and eliminate possible culprits:
Eliminate the possibility of a long-running DbContext instance tracking too many entities bogging down performance. The longer a DbContext is used and the more entities it tracks, the slower it gets. Temporarily change the code to:
List<Books> bookList = new List<Books>();
using (var context = new YourDbContext())
{
var v = from a in context.Books select a);
int allBooksCount = v.Count(); // I need all books count before filter. but it is so slow is my first problem
if (isFilter)
{
v = v.Where(a => a.startdate <= DateTime.Now && a.enddate>= DateTime.Now);
}
.
.
bookList = v.ToList();
}
Using a fresh DbContext ensures queries are not sifting through in-memory entities after running a query to find tracked instances to return. This also ensures we are running against IQueryable off the Books DbSet within the context. We can only guess what "ctx" in your code actually represents.
Next: Look at a profiler for MySQL, or have your database log out SQL statements to capture exactly what EF is requesting. Check that the Count and ToList each trigger just one query against the database, and then run these exact statements against the database. If there are more than these two queries being run then something odd is happening behind the scenes that you need to investigate, such as that your example doesn't really represent what your real code is doing. You could be tripping client side evaluation (if using EF Core 2) or lazy loading. The next thing I would look at is if possible to look at the execution plan for these queries for hints like missing indexes or such. (my DB experience is primarily SQL Server so I cannot provide advice on tools to use for MySQL)

I would log the actual SQL queries here. You can then use DESCRIBE to look at how many rows it hits. There are various tools that can further analyse the queries if DESCRIBE isn't sufficient. This way you can see whether it's the queries or the (lack of) indices that is the problem. Next step has to be guided by that.

C# LINQ to Entities does not recognize the method 'System.String method, and this method cannot be translated into a store expression

It is just an example of my situation
var query =
Employee.Join(department,
emp => emp.depId,
dep=> dep.Id,
(emp, dep) =>
new EmployeeModel{ Name = emp.Name, Total = GetTotal(emp)});
public string GetTotal(emp)
{
//dynamically decide which column to pass to the stored procedure, that is
//why I am passing the whole object
total = sp(emp.column1, emp.column2);//here I pass the parameters to SP
return total;
}
And I get an exception here, I do not know how to solve it.
Any help would be greatly appreciated.
Thanks

Of course this is going to happen. Linq to entities doesn't know what GetTotal() is, and given how it is going to call a stored procedure for each row returned I suggest you move the join itself to a stored procedure and forget LINQ for a while here.

You have to understand the difference between IEnumerable and IQueryable.
IEnumerable
An object that implements IEnumerable, represents and enumerable sequence. It holds everything to ask for the first element of the sequence, and as long as you've got an element, you can ask for the next one.
At its lowest level this is done by calling GetEnumerator and repeatedly calling MoveNext / Current, until there are no more elements.
Higher level methods as foreach and IEnumerable LINQ methods that don't return IEnumerable<TResult>, like ToList / ToDictionary / Count / FirstOrDefault / Any / etc. all deep inside use GetEnumerator / MoveNext / Current.
IQueryable
An object that implements IQueryable, doesn't represent the enumerable sequence itself, it represents the potential to get an Enumerable sequence.
For this, an IQueryable has an Expression and a Provider.
The Expression represents what data must be fetched in some generic format. The Provider knows who will provide the data (quite often a database management system) and what language is used to communicate with this DBMS (usually SQL).
As long as you concatenate LINQ methods that return IQueryable<...>, you are only changing the Expression. The DBMS is not contacted.
When you start enumerating the sequence, either directly by calling GetEnumerator / MoveNext, or indirectly by using foreach, or calling a LINQ method that returns anything but IQueryable<...>, then the Expression is sent to the Provider who will translate the Expression into SQL and fetch the data. The fetched data is represented as an enumerable sequence, which your code can access using MoveNext / Current repeatedly.
The problem is, that your provider doesn't know how to translate the call to GetTotal(...) into SQL. In fact, although the guys who wrote Entity Framework did a marvellous job, there are several LINQ methods that your Provider can't handle. See Supported and Unsupported LINQ Methods (LINQ to Entities).
So you can't call your own methods when using IQueryable. Luckily there are several solutions for this.
AsEnumerable
Your DBMS is extremely optimized to select the data in the query. One of the slower parts of a database query is the transfer from the selected data from the DBMS to your local process. Hence it is wise to limit the amount of data being transferred.
If your DBMS doesn't need the output of your method, consider using AsEnumerable(). This will transfer your selected data to your local process in a smart way.
Depending on the Provider, it can fetch the data "per page", instead of fetching all data, so if you end your LINQ with Take(3).ToList(), it will not have fetched all 1000 Customers from your database.
The result of the fetched data is represented as an IEnumerable<...> to you. The LINQ statements after that are executed on your local machine, so you can call your local methods.
Alas, as you use GetTotal in your Join, you can't use this method. If you would use AsEnumerable before the Join, you would transfer all Employees and all Departments to your local process, who had to do the join.
Convert GetEmp
You use GetEmp in parameter resultSelector of your Join. This parameter has the following generic format
System.Linq.Expressions.Expression<Func<TOuter,TInner,TResult>> resultSelector
Of in your case:
Expression<Employee, Department, EmployeeModel>
So you'll have to change your GetEmp, such that it creates this expression.
Expression<Employee, Department, EmployeeModel> EmployeeModelExpression =
(employee, department) => new EmployeeModel
{
Name = employee.Name,
Total = ???
};
Alas you forgot to tell us how you calculate the Total in SP.
You should only use fairly simple calculations, like:
Total = employee.Column1 + employee.Column2,
Total = (employee.Column1 < 50) ? employee.Column1 : employee.Column2,
Total = employee.Column1 ?? employee.Column2

C# - Concatenate an in memory IList and IQueryable?

Suppose I have a List containing one string value. Suppose I also have an IQueryable that contains several strings from a database. I want to be able to concatenate these two containers into one list and then be able to call methods such as .Skip or .Take on the list. I want to be able to do this in such a way that when I combine the two containers I don't load all of the DB data into memory (only after I call .Skip and .Take). Basically, I want to do something like this (pseudocode):
IQueryable someQuery = myEntities.GetDBQuery(); // Gets "test2", "test3"
IList inMemoryList = new List();
inMemoryList.Add("test");
IList finalList = inMemoryList.Union(someQuery) // Can I do something like this without loading DB data into memory? finalList should contain all 3 strings.
// At this point it is fine to load the filtered query into memory.
foreach (string myString in finalList.Skip(100).Take(200))
{
// Do work...
}
How can I achieve this?

If I didn't misunderstand, you are trying to query the data, part of which comes from memory and others from database, like this:
//the following code will not compile, just for example
var dbQuery = BuildDbQuery();
var list = BuildListInMemory();
var myQuery = (dbQuery + list).OrderBy(aa).Skip(bb).Take(cc).Select(dd);
//and you don't want to load all records into memory by dbQuery
//because you only need some of them
The short answer is NO, you can't. Consider the .OrderBy method, all data have to be in a same "place", otherwise the code can't sort them. So the code loads all records in database by dbQuery into memory(now they are in a same place) and then sorts all of them including those in list. That probably causes a memory issue when dbQuery gives thousands of rows.
HOW TO RESOLVE
Pass the data in list into database (as parameters of dbQuery) so that the query happens in database. This is easy if your list has only a few items.
If list also has lots of records that will makes dbQuery too complex, you can try to query twice, one for dbQuery and one for list. For example, you have 10,000 users in database and 1,000 users in your memory list, and you want to get the top 10 youngest users. You don't need to load 10,000 users into memory and then find the youngest 10. Instead, you find 10 youngest (ResultA) in dbQuery and load into memory, and 10 youngest (ResultB) in memory list, and then compare between ResultA and ResultB.

I entirely agree with Danny's answer when he says you need to somehow find a way to include in memory user list into db so that you achieve what you want. As for the example which you sought in your comment, without knowing data structure of your User object, seems difficult. However assuming you would be able to connect the dots. Here is my suggested approach:
Create temporary table with identical structure that of your regular user table in your db and insert all your inmemory users into it
Write a query to Union temporary and regular table both identical in structure so that should be easy.
Return the result in your application and use it performing standard Linq operations
If you want exact code which you can use as it is then you will have to provide your User object structure - fields type etc in db to enable me to write the code.

You specify that your query and your list are both sequences of strings. someQuery can be performed completely on the database side (not in-memory)
Let's make your sequences less generic:
IQueryable<string> someQuery = ...
IList<string> myList = ...
You also specify that myList contains only one element.
string myOneAndOnlyString = myList.Single();
As your list is in-memory, this has to be performed in-memory. But because the list has only one element, this won't take any time.
The query that you request:
IQueryable<string> correctQuery = someQuery
.Where(item => item.Equals(myOneandOnlyString)
.Skip(skipCount)
.Take(takeCount)
Use your SQL server profiler to check the used SQL and see that the request is completely performed in one SQL statement.

How Dapper.NET works internally with .Count() and SingleOrDefault()?

I am new to Dapper though I am aware about ORMs and DAL and have implemented DAL with NHibernate earlier.
Example Query: -
string sql = "SELECT * FROM MyTable";
public int GetCount()
{
var result = Connection.Query<MyTablePoco>(sql).Count();
return result;
}
Will Dapper convert this query (internally) to SELECT COUNT(*) FROM MyTable looking at .Count() at the end?
Similarly, will it convert to SELECT TOP 1 * FROM MyTable in case of SingleOrDefault()?
I came from NHibernate world where it generates query accordingly. I am not sure about Dapper though. As I am working with MS Access, I do not see a way to check the query generated.

No, dapper will not adjust your query. The immediate way to tell this is simply: does the method return IEnumerable... vs IQueryable...? If it is the first, then it can only use local in-memory mechanisms.
Specifically, by default, Query will actually return a fully populated List<>. LINQ's Count() method recognises that and just accesses the .Count property of the list. So all the data is fetched from the database.
If you want to ask the database for the count, ask the database for the count.
As for mechanisms to view what is actually sent to the database: we use mini-profiler for this. It works great.
Note: when you are querying exactly one row: QueryFirstOrDefault (and the other variants you would expect) exist and have optimizations internally (including hints to ADO.NET, although not all providers can act on those things) to do things as efficiently as possible, but it does not adjust your query. In some cases the provider itself (not dapper) can help, but ultimately: if you only want the first row, ask the database for the first row (TOP or similar).

Deferred execution value unchanged

Say I have IQueryable<Car> cars.
I iterate it such as:
foreach (Car c in cars)
{
c.Doors = 2;
}
Why does c.Doors contain the original value after the foreach, instead of the changed value?
Thank you in advance

You indicated that IQueryable is the result of a linq to entities query which is not a correct statement. MyDatabaseContext.Cars.Where(x => x.Name == "Test") returns a IQueryable which on iteration perfoms a query on the database. (Iteration is when you perform a foreach over it). So it doesnt contain the result set yet, just the query.
looping twice over cars generates 2 identical queries to the database and returns 2 identical result sets. If you want to preserve the data. you need to call ToArray or ToList or after manipulation, perform a save of the changes before you iterate again.

Iterating over an IQueryable retrieves the result set from the database as it goes, as I'm sure you know. What I've observed in this and other situations is that this result is not cached, so you'll often find that iterating over the IQueryable again will actually run the query again, thus your modifications aren't preserved because you're getting a new result set which was never affected by that code.
The reason it works when you call ToList() first and iterate over the result of that is because ToList() retrieves and materialises the entire result set, which is then no longer linked to the database content and is really just a copy of what the database returned.
The only way to be sure your changes will stick is to operate on the local copy of the data, i.e. lift it out of IQueryable land. This can be as simple as saving an IEnumerable of the result set (i.e. the result of ToEnumerable()), which will then return the same result set each time you enumerate it, but unlike ToList() it won't cause evaluation immediately.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.