Improving performance of slow query - Potentially by disabling change tracking - c#

I have a LINQ query on a DbSet that hits a table and grabs 65k rows. The query takes about 3 minutes, which seems obviously too long to me. I don't have a baseline for comparison, but I'm certain this can be improved. I'm relatively new to EF and LINQ, so I suspect I may also be structuring my query in a way that is a big "NO".
I read that change tracking is where EF spends most of its time, and it is enabled on the entity in question, so perhaps I should turn it off (if so, how)?
Here's the code:
ReportTarget reportTarget = repository.GetById(reportTargetId);
if (reportTarget != null)
{
    ReportsBundle targetBundle = reportTarget.SavedReportsBundles
        .SingleOrDefault(rb => rb.ReportsBundleId == target.ReportsBundleId);
    if (targetBundle != null)
    {
        // ... the query shown below runs here
    }
}
This next statement takes 3 minutes to execute (65k records):
IPoint[] pointsData = targetBundle.ReportEntries
    .Where(e => ... a few conditions )
    .Select((entry, i) => new
    {
        rowID = entry.EntryId,
        x = entry.Profit,
        y = i,
        weight = target.HiddenPoints.Contains(entry.EntryId) ? 0 : 1,
        group = 0
    }.ActLike<IPoint>())
    .ToArray();
Note: ActLike() is from the Impromptu Interface library, which uses the .NET DLR to create dynamic proxies of objects that implement an interface on the fly. I doubt this is the bottleneck.
How can I optimize performance for this particular DbSet (TradesReportEntries), as I'll be querying this table for large data sets (IPoint[]s) often?

Well, it looks like you're loading an entity object and then querying its navigation properties. When this happens, EF loads all related entities FIRST (via lazy loading), and then your query runs in memory over the entire collection. This may be why you're having performance issues.
Try querying against the collection by using the following:
context.Entry(targetBundle)
    .Collection(p => p.TradesReportEntries)
    .Query()
    .Where( e => <your filter here> )
    .Select( <your projection here> )
This lets you specify a filter in addition to the behind-the-scenes filter that loads the nav property by default. Let us know how that works out.
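To also address the change-tracking part of the question: for a read-only query you can switch tracking off with AsNoTracking() and filter before anything is materialized. A rough sketch combining both, assuming EF6 and the property names from the question (the Where condition is still a placeholder, and AsEnumerable() is needed because the indexed Select overload can't be translated to SQL):
IPoint[] pointsData = context.Entry(targetBundle)
    .Collection(b => b.ReportEntries)
    .Query()
    .AsNoTracking()                 // read-only: skip the change tracker entirely
    .Where(e => /* ... a few conditions ... */ true)
    .AsEnumerable()                 // switch to LINQ to Objects for the indexed Select
    .Select((entry, i) => new
    {
        rowID = entry.EntryId,
        x = entry.Profit,
        y = i,
        weight = target.HiddenPoints.Contains(entry.EntryId) ? 0 : 1,
        group = 0
    }.ActLike<IPoint>())
    .ToArray();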

Related

Lazy Loading Subquery Result - LINQ

I am building a system to load a large amount of data from a database and display them to the user in a grid. I'd like to make the query as efficient as possible, and wanted to know if lazy-loading a subquery was possible.
For example, I have a query resembling the following:
List<RowVm> results = db.Entity1
    .Select(e1 => new RowVm {
        Id = e1.Id,
        SubqueryCount = db.Entity2.Where(e2 => e2.e1Id == e1.Id).Count(),
        OtherSubquery = db.Entity3.Where(e3 => e3.Name == e1.Name).ToList()
    })
    .Where(row => row.SubqueryCount > 10)
    .ToList();
My issue is that I would like OtherSubquery to delay loading until after the .Where(row => row.SubqueryCount > 10), because the number of entities will be far fewer by then, reducing the computation required.
Crucially, however, my actual query has a dynamic filter instead of the static where clause above. This means I do not know which subqueries will need to be used to evaluate the filter, and hence moving the unused subqueries to a second select after the filter is not an option. The actual query also has many more subqueries (for different columns in the grid) so the performance difference would be rather large, I imagine.
Is this form of delaying execution of a subquery possible in LINQ, or do I just have to settle for the slower performance of loading all the subqueries in advance of filtering?
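For comparison, the two-stage shape described above (filter on the cheap subquery first, then project the heavier ones afterwards) would look roughly like this for the static filter. This is only a sketch of the pattern being ruled out, using the hypothetical entity names from the question:
// Stage 1: project only what the filter needs, and filter on the server
var filtered = db.Entity1
    .Select(e1 => new
    {
        Entity = e1,
        SubqueryCount = db.Entity2.Count(e2 => e2.e1Id == e1.Id)
    })
    .Where(x => x.SubqueryCount > 10);

// Stage 2: the heavier subquery only runs for the rows that survived the filter
List<RowVm> results = filtered
    .Select(x => new RowVm
    {
        Id = x.Entity.Id,
        SubqueryCount = x.SubqueryCount,
        OtherSubquery = db.Entity3.Where(e3 => e3.Name == x.Entity.Name).ToList()
    })
    .ToList();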

Linq ToList() and Count() performance problem

I have 200k rows in my table and I need to filter the table and then show the results in a datatable. When I try this, my SQL runs fast, but when I want to get the row count or run ToList(), it takes a long time. Also, after filtering, the list only has 15 rows, so the result set is not huge.
public static List<Books> GetBooks()
{
    List<Books> bookList = new List<Books>();
    var v = from a in ctx.Books select a;
    int allBooksCount = v.Count(); // I need the count of all books before filtering - this is slow (my first problem)
    if (isFilter)
    {
        v = v.Where(a => a.startdate <= DateTime.Now && a.enddate >= DateTime.Now);
    }
    .
    .
    bookList = v.ToList(); // ToList() is also slow (my second problem)
    return bookList;
}
There's nothing wrong with the code you've shown. So either you have some trouble in the database itself, or you're ruining the query by using IEnumerable instead of IQueryable.
My guess is that either ctx.Books is IEnumerable<Books> (instead of IQueryable<Books>), or that the Count (and Where etc.) method you're calling is the Enumerable version, rather than the Queryable version.
Which version of Count are you actually calling?
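To make that concrete, a small sketch of the difference (variable names are hypothetical): when the source stays IQueryable, Count() is translated to SQL; when it degrades to IEnumerable, the rows are streamed to the client and counted there.
// Stays IQueryable: Count() becomes a single SELECT COUNT(*) on the server
IQueryable<Books> queryable = ctx.Books;
int serverCount = queryable.Count();

// Falls back to IEnumerable: all 200k rows are pulled and counted client-side
IEnumerable<Books> enumerable = ctx.Books;
int clientCount = enumerable.Count();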
First, to get help you need to provide quantitative values for "fast" vs. "too long". Loading entities from EF will take longer than running a raw SQL statement in a client tool like TOAD etc. Are you seeing differences of 15ms vs. 15 seconds, or 15ms vs. 150ms?
To help identify and eliminate possible culprits:
Eliminate the possibility that a long-running DbContext instance tracking too many entities is bogging down performance. The longer a DbContext is used and the more entities it tracks, the slower it gets. Temporarily change the code to:
List<Books> bookList = new List<Books>();
using (var context = new YourDbContext())
{
    var v = from a in context.Books select a;
    int allBooksCount = v.Count(); // count of all books before filtering
    if (isFilter)
    {
        v = v.Where(a => a.startdate <= DateTime.Now && a.enddate >= DateTime.Now);
    }
    .
    .
    bookList = v.ToList();
}
Using a fresh DbContext ensures queries are not sifting through tracked in-memory entities after running a query to find instances to return. It also ensures we are running against IQueryable off the Books DbSet on the context. We can only guess what "ctx" in your code actually represents.
Next: Look at a profiler for MySQL, or have your database log SQL statements, to capture exactly what EF is requesting. Check that the Count and the ToList each trigger just one query against the database, then run those exact statements against the database. If more than these two queries are being run, something odd is happening behind the scenes that you need to investigate, such as your example not really representing what your real code is doing. You could be tripping client-side evaluation (if using EF Core 2) or lazy loading. The next thing I would look at, if possible, is the execution plan for these queries, for hints such as missing indexes. (My DB experience is primarily SQL Server, so I cannot advise on tools to use for MySQL.)
I would log the actual SQL queries here. You can then use DESCRIBE to look at how many rows each one hits. There are various tools that can further analyse the queries if DESCRIBE isn't sufficient. This way you can see whether it's the queries or the (lack of) indices that is the problem. The next step has to be guided by that.
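If it helps, here is one way to capture the SQL without an external profiler, assuming EF6 (in EF Core the equivalent is configured on the options builder instead):
using (var context = new YourDbContext())
{
    var v = context.Books.Where(a => a.startdate <= DateTime.Now && a.enddate >= DateTime.Now);

    // EF6: ToString() on the query returns the SELECT that ToList() would send
    Console.WriteLine(v.ToString());

    // EF6: log every command the context sends, including the COUNT query
    context.Database.Log = s => Console.Write(s);
    int count = v.Count();
    List<Books> books = v.ToList();
}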

Include() vs Select() performance

I have a parent entity with a navigation property to a child entity. The parent entity may not be removed as long as there are associated records in the child entity. The child entity can contain hundreds of thousands of records.
I'm wondering what will be the most efficient to do in Entity Framework to do this:
var parentRecord = _context.Where(x => x.Id == request.Id)
    .Include(x => x.ChildTable)
    .FirstOrDefault();

// check if parentRecord exists
if (parentRecord.ChildTable.Any()) {
    // cannot remove
}
or
var parentRecord = _context.Where(x => x.Id == request.Id)
    .Select(x => new {
        ParentRecord = x,
        HasChildRecords = x.ChildTable.Any()
    })
    .FirstOrDefault();

// check if parentRecord exists
if (parentRecord.HasChildRecords) {
    // cannot remove
}
The first query may include thousands of records while the second query will not, however, the second one is more complex.
Which is the best way to do this?
I would say it depends. It depends on which DBMS you're using, on how well the optimizer works, etc.
So one single statement with a JOIN could be far faster than a lot of SELECT statements.
In general I would say: when you need the rows from your child table, use .Include(); otherwise, don't include them.
Or in simple words, just read the data you need.
The answer depends on your database design. Which columns are indexed? How much data is in the table?
Include() offloads work to your C# layer but means a simpler query. It's probably the better choice here, but you should consider extracting the SQL that is generated by Entity Framework and running each query through an optimisation check.
You can output the SQL generated by Entity Framework to your Visual Studio console as noted here.
That example might help you craft a better SQL query that suits your needs.
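For reference, getting that SQL into the Visual Studio output window is a one-liner in EF6 (a sketch, assuming _context is a plain DbContext; EF Core exposes a similar hook on the options builder):
// EF6: write every generated SQL command to the Debug output window
_context.Database.Log = sql => System.Diagnostics.Debug.WriteLine(sql);

// EF Core equivalent (configured once in OnConfiguring):
// optionsBuilder.LogTo(sql => System.Diagnostics.Debug.WriteLine(sql));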

How do I update multiple Entity models in one SQL statement?

I had the following:
List<Message> unreadMessages = this.context.Messages
    .Where( x =>
        x.AncestorMessage.MessageID == ancestorMessageID &&
        x.Read == false &&
        x.SentTo.Id == userID ).ToList();

foreach(var unreadMessage in unreadMessages)
{
    unreadMessage.Read = true;
}

this.context.SaveChanges();
But there must be a way of doing this without having to do 2 SQL queries, one for selecting the items, and one for updating the list.
How do I do this?
Current idiomatic support in EF
As far as I know, there is no direct support for "bulk updates" yet in Entity Framework (there has been an ongoing discussion for bulk operation support for a while though, and it is likely it will be included at some point).
(Why) Do you want to do this?
It is clear that this is an operation that, in native SQL, can be achieved in a single statement, and that provides significant advantages over the approach followed in your question. With a single SQL statement, only a very small amount of I/O is required between client and DB server, and the statement itself can be completely executed and optimized by the DB server. There is no need to transfer a potentially large result set to the client and iterate through it, just to update one or two fields and send the data back the other way.
How
So although not directly supported by EF, it is still possible to do this, using one of two approaches.
Option A. Handcode your SQL update statement
This is a very simple approach, that does not require any other tools/packages and can be performed Async as well:
var sql = "UPDATE TABLE x SET FIELDA = #fieldA WHERE FIELDB = #fieldb";
var parameters = new SqlParameter[] { ..., ... };
int result = db.Database.ExecuteSqlCommand(sql, parameters);
or
int result = await db.Database.ExecuteSqlCommandAsync(sql, parameters);
The obvious downside is, well, breaking the nice LINQ-y paradigm and having to hand-code your SQL (possibly for more than one target SQL dialect).
Option B. Use one of the EF extension/utility packages
For a while now, a number of open-source NuGet packages have been available that offer specific extensions to EF. Several of them provide a nice "LINQ-y" way to issue a single UPDATE statement to the server. Two examples are:
Entity Framework Extended Library, which allows performing a bulk update with a statement like:
context.Messages.Update(
    x => x.Read == false && x.SentTo.Id == userID,
    x => new Message { Read = true });
It is also available on GitHub.
EntityFramework.Utilities, which allows performing a bulk update with a statement like:
EFBatchOperation
    .For(context, context.Messages)
    .Where(x => x.Read == false && x.SentTo.Id == userID)
    .Update(x => x.Read, x => true);
It is also available on GitHub.
And there are definitely other packages and libraries out there that provide similar support.
Even SQL has to do this in two steps in a sense: an UPDATE query with a WHERE clause first runs the equivalent of a SELECT behind the scenes, filtering via the WHERE clause, and then applies the update. So really, I don't think you need to worry about improving this.
Further, it's broken into two steps like this in LINQ precisely for performance reasons. You want that "select" to be as minimal as possible, i.e. you don't want to load any more objects from the database into in-memory objects than you have to. Only then do you alter the objects (in the foreach).
If you really want to run a native UPDATE on the SQL side, you could use a System.Data.SqlClient.SqlCommand to issue the update, instead of having LINQ give you back objects that you then update. That will be faster, but then you conceptually move some of your logic out of your C# code object model space into the database model space (you are doing things in the database, not in your object space), even if the SqlCommand is being issued from your code.
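For completeness, the SqlCommand route mentioned above might look like the sketch below. The table and column names are hypothetical, and note that EF's change tracker won't know about rows updated this way:
using (var connection = new System.Data.SqlClient.SqlConnection(connectionString))
using (var command = new System.Data.SqlClient.SqlCommand(
    "UPDATE Messages SET [Read] = 1 " +
    "WHERE AncestorMessageID = @ancestorId AND [Read] = 0 AND SentToId = @userId",
    connection))
{
    command.Parameters.AddWithValue("@ancestorId", ancestorMessageID);
    command.Parameters.AddWithValue("@userId", userID);

    connection.Open();
    int rowsAffected = command.ExecuteNonQuery(); // number of messages marked as read
}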

Speed up entity framework

I've been looking to optimize a web application which uses MVC 2 and EF4. Listing queries took ~22 seconds for ~10k rows with 14 columns, which is obviously too slow.
So as part of this I've upgraded to MVC 4 and EF 6.1 (highest I could go with VS2010).
For read-only queries I've added .AsNoTracking() to the queries, which dropped the time to ~3 seconds. I'm wondering if there's anything more I could do to get it down to ~1 second.
My code so far is:
category = CategoryHelper.MapToOldFormat(category);
var mainIds = Repository.Categories
    .Include(o => o.LinkedCategories)
    .Where(o => o.Category1.Contains(category))
    .AsNoTracking()
    .ToList();

var linkedCats = mainIds.SelectMany(o => o.LinkedCategories).Union(mainIds).Select(c => c.Id);

var notifications = Repository.Notifications
    .Include(o => o.Country)
    .Include(o => o.NonEUCountries)
    .Include(o => o.Language)
    .Include(o => o.RAW)
    .Include(o => o.RAW.Classification)
    .Include(o => o.RAW.TransactionPN)
    .AsNoTracking();

if (id != null)
{
    notifications = notifications.Where(o => o.Id == id);
}
if (!string.IsNullOrWhiteSpace(category))
{
    notifications = notifications.Where(o => linkedCats.Contains(o.RAW.Classification.CategoryID));
}
return notifications.Logged(MethodBase.GetCurrentMethod()).ToList();
In the benchmarks, category and id were null, so the IN clause for category doesn't get generated. I will be replacing that with an int flag in the future as a fast way to support multiple categories.
Are there any other big performance problems with this example query?
First of all, listing 10k results is painful. You need to use paging with large datasets.
Imagine the cost of moving relational data into 10k instances of some class and injecting run-time features like self-tracking or lazy loading: a loop of 10k iterations where each iteration runs complex code. It should be slow by default, shouldn't it?
Thus, it seems that you need to leverage LINQ's extension methods like .Skip(...) and .Take(...).
Another improvement would come from analyzing your current data schema in your DB and your object model, because a 1-table-to-1-class mapping (with 14 columns/properties) could be a problem: maybe a segmented design would improve your scenario.
Anyway, paging will be your friend. This should reduce query times to a fraction of a second.
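A minimal paging sketch along those lines, reusing the notifications query from the question (page and pageSize are hypothetical parameters coming from the request):
int pageSize = 50;   // rows per page
int page = 0;        // zero-based page index

var pageOfNotifications = notifications
    .OrderBy(o => o.Id)          // Skip/Take requires a stable ordering
    .Skip(page * pageSize)
    .Take(pageSize)
    .ToList();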
Update
@Phyx said in a comment:
I inherited the project and don't have the budget to change ALL the listings.
If you can't change that, I would say caching should be the solution. Only one user receives the impact of these unoptimized (non-optimizable...) queries, and the rest consume an output cache. That cache might only last for short intervals, but it might be enough to speed up your application and reduce load times.
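In MVC 4 that output caching can be as simple as an attribute on the listing action. A sketch, where the action name, service call, and cache settings are just examples:
[OutputCache(Duration = 60, VaryByParam = "id;category")]
public ActionResult List(int? id, string category)
{
    // the expensive listing query runs at most once per minute per (id, category) combination
    var notifications = notificationService.GetNotifications(id, category);
    return View(notifications);
}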
