This question already has answers here:
Optimize entity framework query
(5 answers)
Closed 5 years ago.
I am using Entity Framework in my ASP.NET MVC application, and I am facing an issue in loading data from SQL Server via LINQ. My query returns the result in 4 seconds, but I need to get in less time, still searching for better solution.
Here is my linq query:
var user =
context.CreateObjectSet<DAL.ProductMaster>()
.AsNoTracking()
.Include("Product1").AsNoTracking()
.Include("Product2").AsNoTracking()
.Include("Product3").AsNoTracking()
.Include("Product4").AsNoTracking()
.Include("Product5").AsNoTracking()
.Where(x => x.Id == request.Id)
).FirstOrDefault();
No need to get multiple entities fulfilling the predicate in Where() only to then select the first one; simply call FirstOrDefault() directly.
if (request == null || organizationsList == null || !organizationsList.Any())
return;
var user = context.CreateObjectSet<DAL.User>()
.AsNoTracking()
.Include("UsersinPrograms.FollowUps")
.Include() ...
.FirstOrDefault(x => x.Id == request.userId && organizationsList.Contains(x.OrganizationId)));
Also, remove the Include()'s that you don't need.
For this and any large scale business logic operations one should create Stored Procedures instead of basing them in Linq. Then in EF one creates a DTO object to handle the result and consumes the procedure in EF.
At that point the speed gained or lost is up to how the procedure's sql has been structured, with an eye on indexes used within the database to possibly provide an enhancement on speed.
Linq SQL is really boilerplate and not designed for speed.
I have done this with multiple EF projects.
Also one can build their linq query in Linqpad and then switch to view the SQL generated. That would provide an idea on how to structure one's SQL in a stored proc.
Also you could look into using SQL CTEs (Common Table Experssions) to build your query one step at a time until the data is just as needed.
Related
I have a web app, and I'm connecting to a DB with entity framework.
To select all Employee records of a particular department, for example, I can quite easily write:
....Employees.Where(o => o.Department == "HR").ToList();
Works fine. But is it most optimal?
Should this Where clause be incorporated into a stored procedure or view? Or does my entity framework code do the job of converting it to SQL anyway?
We've had performance problems in our team in the past from when people pull records into memory and then do the filtering in .net instead of at a database level. I'm trying to avoid this happening again so want to be crystal clear on what I must avoid.
If Employees is provided by Entity Framework then the Where() will be translated into SQL and sent to the database. It is only the point that you materialise the objects does it take the filters you have applied before and turn them into SQL. Anything after that point is just plain LINQ to objects.
Methods that cause materialisation to happen include things like .ToList() and .ToArray() (there are more, but these two are probably the most common).
If you want to see what is happening on SQL Server, you should open up the SQL Profiler and have a look at the queries that are being sent.
We've had performance problems in our team in the past from when people pull records into memory and then do the filtering in .net instead of at a database level.
Just as an addendum to Colin's answer, and to target the quote above, the way to avoid this is to make sure your database queries are fully constructed with IQueryable<T> first, before enumerating the results with a call such as .ToList(), or .ToArray().
As an example, consider the following:
IEnumerable<Employee> employees = context.Employees;
// other code, before executing the following
var hrEmployees = employees.Where(o => o.Department == "HR").ToList();
The .ToList() will enumerate the results grabbing all of the employees from the context first, and then performing the filtering on the client. This isn't going to perform very well if you've got a lot of employees to contend with, and it's certainly not going to scale very well.
Compare that with this:
IQueryable<Employee> employees = context.Employees;
// other code, before executing the following
var hrEmployees = employees.Where(o => o.Department == "HR").ToList();
IQueryable<T> derives from IEnumerable<T>. The difference between them is that IQueryable<T> has a query provider built in, and the query you construct is represented as an expression tree. That means it's not evaluated until a call that enumerates the results of the query, such as .ToList().
In the second example above, the query provider will execute SQL to fetch only those employees that belong to the HR department, performing the filtering on the database itself.
I had the following:
List<Message> unreadMessages = this.context.Messages
.Where( x =>
x.AncestorMessage.MessageID == ancestorMessageID &&
x.Read == false &&
x.SentTo.Id == userID ).ToList();
foreach(var unreadMessage in unreadMessages)
{
unreadMessage.Read = true;
}
this.context.SaveChanges();
But there must be a way of doing this without having to do 2 SQL queries, one for selecting the items, and one for updating the list.
How do i do this?
Current idiomatic support in EF
As far as I know, there is no direct support for "bulk updates" yet in Entity Framework (there has been an ongoing discussion for bulk operation support for a while though, and it is likely it will be included at some point).
(Why) Do you want to do this?
It is clear that this is an operation that, in native SQL, can be achieved in a single statement, and provides some significant advantages over the approach followed in your question. Using the single SQL statement, only a very small amount of I/O is required between client and DB server, and the statement itself can be completely executed and optimized by the DB server. No need to transfer to and iterate through a potentially large result set client side, just to update one or two fields and send this back the other way.
How
So although not directly supported by EF, it is still possible to do this, using one of two approaches.
Option A. Handcode your SQL update statement
This is a very simple approach, that does not require any other tools/packages and can be performed Async as well:
var sql = "UPDATE TABLE x SET FIELDA = #fieldA WHERE FIELDB = #fieldb";
var parameters = new SqlParameter[] { ..., ... };
int result = db.Database.ExecuteSqlCommand(sql, parameters);
or
int result = await db.Database.ExecuteSqlCommandAsync(sql, parameters);
The obvious downside is, well breaking the nice linqy paradigm and having to handcode your SQL (possibly for more than one target SQL dialect).
Option B. Use one of the EF extension/utility packages
Since a while, a number of open source nuget packages are available that offer specific extensions to EF. A number of them do provide a nice "linqy" way to issue a single update SQL statement to the server. Two examples are:
Entity Framework Extended Library that allows performing a bulk update using a statement like:
context.Messages.Update(
x => x.Read == false && x.SentTo.Id == userID,
x => new Message { Read = true });
It is also available on github
EntityFramework.Utilities that allows performing a bulk update using a statement like:
EFBatchOperation
.For(context, context.Messages)
.Where(x => x.Read == false && x.SentTo.Id == userID)
.Update(x => x.Read, x => x.Read = true);
It is also available on github
And there are definitely other packages and libraries out there that provide similar support.
Even SQL has to do this in two steps in a sense, in that an UPDATE query with a WHERE clause first runs the equivalent of a SELECT behind the scenes, filtering via the WHERE clause, then applying the update. So really, I don't think you need to be worried about improving this.
Further, the reason why it's broken into two steps like this in LINQ is precisely for performance reasons. You want that "select" to be as minimal as possible, i.e. you don't want to load any more objects from the database into in memory objects than you have to. Only then do you alter objects (in the foreach).
If you really want to run a native UPDATE on the SQL side, you could use a System.Data.SqlClient.SqlCommand to issue the update, instead of having LINQ give you back objects that you then update. That will be faster, but then you conceptually move some of your logic out of your C# code object model space into the database model space (you are doing things in the database, not in your object space), even if the SqlCommand is being issued from your code.
I am using ASP NET MVC 4.5 and EF6, code first migrations.
I have this code, which takes about 6 seconds.
var filtered = _repository.Requests.Where(r => some conditions); // this is fast, conditions match only 8 items
var list = filtered.ToList(); // this takes 6 seconds, has 8 items inside
I thought that this is because of relations, it must build them inside memory, but that is not the case, because even when I return 0 fields, it is still as slow.
var filtered = _repository.Requests.Where(r => some conditions).Select(e => new {}); // this is fast, conditions match only 8 items
var list = filtered.ToList(); // this takes still around 5-6 seconds, has 8 items inside
Now the Requests table is quite complex, lots of relations and has ~16k items. On the other hand, the filtered list should only contain proxies to 8 items.
Why is ToList() method so slow? I actually think the problem is not in ToList() method, but probably EF issue, or bad design problem.
Anyone has had experience with anything like this?
EDIT:
These are the conditions:
_repository.Requests.Where(r => ids.Any(a => a == r.Student.Id) && r.StartDate <= cycle.EndDate && r.EndDate >= cycle.StartDate)
So basically, I can checking if Student id is in my id list and checking if dates match.
Your filtered variable contains a query which is a question, and it doesn't contain the answer. If you request the answer by calling .ToList(), that is when the query is executed. And that is the reason why it is slow, because only when you call .ToList() is the query executed by your database.
It is called Deferred execution. A google might give you some more information about it.
If you show some of your conditions, we might be able to say why it is slow.
In addition to Maarten's answer I think the problem is about two different situation
some condition is complex and results in complex and heavy joins or query in your database
some condition is filtering on a column which does not have an index and this cause the full table scan and make your query slow.
I suggest start monitoring the query generated by Entity Framework, it's very simple, you just need to set Log function of your context and see the results,
using (var context = new MyContext())
{
context.Database.Log = Console.Write;
// Your code here...
}
if you see something strange in generated query try to make it better by breaking it in parts, some times Entity Framework generated queries are not so good.
if the query is okay then the problem lies in your database (assuming no network problem).
run your query with an SQL profiler and check what's wrong.
UPDATE
I suggest you to:
add index for StartDate and EndDate Column in your table (one for each, not one for both)
ToList executes the query against DB, while first line is not.
Can you show some conditions code here?
To increase the performance you need to optimize query/create indexes on the DB tables.
Your first line of code only returns an IQueryable. This is a representation of a query that you want to run not the result of the query. The query itself is only runs on the databse when you call .ToList() on your IQueryable, because its the first point that you have actually asked for data.
Your adjustment to add the .Select only adds to the existing IQueryable query definition. It doesnt change what conditions have to execute. You have essentially changed the following, where you get back 8 records:
select * from Requests where [some conditions];
to something like:
select '' from Requests where [some conditions];
You will still have to perform the full query with the conditions giving you 8 records, but for each one, you only asked for an empty string, so you get back 8 empty strings.
The long and the short of this is that any performance problem you are having is coming from your "some conditions". Without seeing them, its is difficult to know. But I have seen people in the past add .Where clauses inside a loop, before calling .ToList() and inadvertently creating a massively complicated query.
Jaanus. The most likely reason of this issue is complecity of generated SQL query by entity framework. I guess that your filter condition contains some check of other tables.
Try to check generated query by "SQL Server Profiler". And then copy this query to "Management Studio" and check "Estimated execution plan". As a rule "Management Studio" generatd index recomendation for your query try to follow these recomendations.
Trying to refactor some code that has gotten really slow recently and I came across a code block that is taking 5+ seconds to execute.
The code consists of 2 statements:
IEnumerable<int> StudentIds = _entities.Filters
.Where(x => x.TeacherId == Profile.TeacherId.Value && x.StudentId != null)
.Select(x => x.StudentId)
.Distinct<int>();
and
_entities.StudentClassrooms
.Include("ClassroomTerm.Classroom.School.District")
.Include("ClassroomTerm.Teacher.Profile")
.Include("Student")
.Where(x => StudentIds.Contains(x.StudentId)
&& x.ClassroomTerm.IsActive
&& x.ClassroomTerm.Classroom.IsActive
&& x.ClassroomTerm.Classroom.School.IsActive
&& x.ClassroomTerm.Classroom.School.District.IsActive).AsQueryable<StudentClassroom>();
So it's a bit messy but first I get a Distinct list of Id's from one Table (Filters), then I query another Table using it.
These are relatively small tables, but it's still 5+ seconds of query time.
I put this in LINQPad and it showed that it was doing the bottom query first then running 1000 "distinct" queries afterwards.
On a whim I changed the "StudentIds" code by just adding .ToArray() at the end. This improved the speed 1000x ... it now takes like 100ms to complete the same query.
What's the deal? What am I doing wrong?
This is one of the pitfalls of deferred execution in Linq: In your first approach StudentIds is really an IQueryable, not an in-memory collection. That means using it in the second query will run the query again on the database - each and every time.
Forcing execution of the first query by using ToArray() makes StudentIds an in-memory collection and the Contains part in your second query will run over this collection that contains a fixed sequence of items - This gets mapped to something equivalent to a SQL where StudentId in (1,2,3,4) query.
This query will of course, be much much faster since you determined this sequence once up-front, and not every time the Where clause is executed. Your second query without using ToArray() (I would think) would be mapped to a SQL query with an where exists (...) sub-query that gets evaluated for each row.
ToArray() Materializes the initial query to the server memory.
My guess would be the query provider is not able to parse the expression StudentIds.Contains(x.StudentId). Hence it probably thinks that the studentIds is an array already loaded to memory. So it's probably querying the database over and over again during the parsing phase. The only way to know for sure is to setup the profiler.
If you need to do this on the db server, use a join, instead of "contains". If you need to use contains to do what looks like a join problem, you are likely to be missing a surrogate primary key or a foreign key somewhere.
You could also declare studentIds as IQueryable instead of IEnumerable. This might give the query provider the hint it needs to interpret the studentIds as expression aka. data not already loaded to memory. I somehow doubt this but worth a try.
If all else fails, use ToArray(). This will load the initial studentIds to memory.
Using the Entity Framework, when one executes a query on lets say 2000 records requiring a groupby and some other calculations, does the query get executed on the server and only the results sent over to the client or is it all sent over to the client and then executed?
This using SQL Server.
I'm looking into this, as I'm going to be starting a project where there will be loads of queries required on a huge database and want to know if this will produce a significant load on the network, if using the Entity Framework.
I would think all database querying is done on the server side (where the database is!) and the results are passed over. However, in Linq you have what's known as Delayed Execution (lazily loaded) so your information isn't actually retrieved until you try to access it e.g. calling ToList() or accessing a property (related table).
You have the option to use the LoadWith to do eager loading if you require it.
So in terms of performance if you only really want to make 1 trip to the Database for your query (which has related tables) I would advise using the LoadWith options. However, it does really depend on the particular situation.
It's always executed on SQL Server. This also means sometimes you have to change this:
from q in ctx.Bar
where q.Id == new Guid(someString)
select q
to
Guid g = new Guid(someString);
from q in ctx.Bar
where q.Id == g
select q
This is because the constructor call cannot be translated to SQL.
Sql's groupby and linq's groupby return differently shaped results.
Sql's groupby returns keys and aggregates (no group members)
Linq's groupby returns keys and group members.
If you use those group members, they must be (re-)fetched by the grouping key. This can result in +1 database roundtrip per group.
well, i had the same question some time ago.
basically: your linq-statement is converted to a sql-statement. however: some groups will get translated, others not - depending on how you write your statement.
so yes - both is possible
example:
var a = (from entity in myTable where entity.Property == 1 select entity).ToList();
versus
var a = (from entity in myTable.ToList() where entity.Property == 1 select entity).ToList();