Lazy Loading Subquery Result - LINQ - C#

I am building a system that loads a large amount of data from a database and displays it to the user in a grid. I'd like to make the query as efficient as possible, and wanted to know whether lazy-loading a subquery is possible.
For example, I have a query resembling the following:
List<RowVm> results = db.Entity1
    .Select(e1 => new RowVm {
        Id = e1.Id,
        SubqueryCount = db.Entity2.Where(e2 => e2.e1Id == e1.Id).Count(),
        OtherSubquery = db.Entity3.Where(e3 => e3.Name == e1.Name).ToList()
    })
    .Where(row => row.SubqueryCount > 10)
    .ToList();
My issue is that I would like OtherSubquery to delay loading until after the .Where(row => row.SubqueryCount > 10), because by that point the number of entities will be far fewer, reducing the computation required.
Crucially, however, my actual query has a dynamic filter instead of the static where clause above. This means I do not know in advance which subqueries will be needed to evaluate the filter, so moving the unused subqueries to a second Select after the filter is not an option. The actual query also has many more subqueries (for different columns in the grid), so I imagine the performance difference would be rather large.
Is this form of delaying execution of a subquery possible in LINQ, or do I just have to settle for the slower performance of loading all the subqueries in advance of filtering?
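One pattern worth sketching, under a big assumption: if the columns the dynamic filter can touch are computable in a first pass, you can project only those columns, apply the filter, and then load the remaining subqueries for just the surviving rows. A minimal sketch reusing the names from the question, not a drop-in solution:

// Pass 1: project the Id plus whatever the dynamic filter needs,
// apply the filter, and pull back just the surviving Ids.
var survivingIds = db.Entity1
    .Select(e1 => new
    {
        e1.Id,
        SubqueryCount = db.Entity2.Count(e2 => e2.e1Id == e1.Id)
    })
    .Where(row => row.SubqueryCount > 10) // the dynamic filter goes here
    .Select(row => row.Id)
    .ToList();

// Pass 2: run the expensive subqueries only for the filtered rows.
List<RowVm> results = db.Entity1
    .Where(e1 => survivingIds.Contains(e1.Id))
    .Select(e1 => new RowVm
    {
        Id = e1.Id,
        OtherSubquery = db.Entity3.Where(e3 => e3.Name == e1.Name).ToList()
    })
    .ToList();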

Related

EF Core query is extremely slow with many relations between tables

I have an EF Core query like this:
var existingViolations = await _context.Parent
.Where(p => p.ProjectId == projectId)
.Include(p => p.Relation1)
.Include(p => p.Relation2)
.ThenInclude(r => r.Relation21)
.Include(p => p.Relation3)
.AsSplitQuery()
.ToListAsync();
This query usually takes 55-65 seconds, which can sometimes cause database timeouts. All the tables included in the query, including the parent table, contain anywhere from 30k-60k rows and 3-6 columns each. I have tried splitting it up into smaller queries using LoadAsync() like this:
_context.ChangeTracker.LazyLoadingEnabled = false;
_context.ChangeTracker.AutoDetectChangesEnabled = false;
await _context.Relation1.Where(r1 => r1.Parent.ProjectId == projectId).LoadAsync();
await _context.Relation2.Where(r2 => r2.Parent.ProjectId == projectId).Include(r2 => r2.Relation21).LoadAsync();
await _context.Relation3.Where(r3 => r3.Parent.ProjectId == projectId).LoadAsync();
var result = await _context.Parent.Where(p => p.ProjectId == projectId).ToListAsync();
That shaves about 5 seconds off the query time, so nothing to brag about. I've done some timings, and it's the last line (var result = await _context.Parent.Where(p => p.ProjectId == projectId).ToListAsync();) that takes by far the longest to complete, about 90% of the time spent.
How can I optimize this further?
EDIT: Here is the generated SQL query:
SELECT [v].[Id], [v].[Description], [v].[ProjectId], [v].[RuleId], [v].[StateStatus], [v0].[Id], [v0].[ElementId], [v0].[Role], [v0].[ParentId], [t].[Id], [t].[ActivatedDate], [t].[StateStatus], [t].[ParentId], [t].[Id0], [t].[RunId], [t].[SerializedState], [t].[StateId], [p].[Id], [p].[ActualValue], [p].[CurrentValue], [p].[ParameterId], [p].[ParentId]
FROM [Parent] AS [v]
LEFT JOIN [Relation1] AS [v0] ON [v].[Id] = [v0].[ParentId]
LEFT JOIN (
SELECT [s].[Id], [s].[ActivatedDate], [s].[StateStatus], [s].[ParentId], [s0].[Id] AS [Id0], [s0].[RunId], [s0].[SerializedState], [s0].[StateId]
FROM [Relation2] AS [s]
LEFT JOIN [Relation21] AS [s0] ON [s].[Id] = [s0].[StateId]
) AS [t] ON [v].[Id] = [t].[ParentId]
LEFT JOIN [Relation3] AS [p] ON [v].[Id] = [p].[ParentId]
WHERE [v].[ProjectId] = @__projectId_0
ORDER BY [v].[Id], [v0].[Id], [t].[Id], [t].[Id0], [p].[Id]
When running the SQL query directly in the database, it takes about 3-4 seconds to complete, so the problem seems to be with how EF processes the results.
Without seeing the real entities and how they might be configured, it's anyone's guess.
Generally speaking, when looking at performance issues like this, the first thing I would tackle is "what is this data being loaded for?" Typically, when I see queries using a lot of Includes, it is something like a read operation feeding a view, or a computation based on the selected data. Projecting down to a simpler model can help significantly here if you really only need a few columns from each table.
The benefit of projection, using a Select across the related data to fill either a DTO/ViewModel class for a view or an anonymous type for a computation, is that Include will pull all columns of all eagerly loaded tables in one go, whereas projection will only pass back the columns referenced. This can be critically important where tables contain things like large text/binary columns that you don't need at all, or don't need right away. It also matters where the database server is some distance from the consuming client or web server: less data over the wire means faster performance, though the issue right now sounds like the DB query itself.
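As a rough illustration of that projection advice, using the table and column names visible in the generated SQL above (ParentSummaryDto is a hypothetical DTO, not from the question):

// Sketch: select only the columns the view needs instead of Include-ing
// whole tables; EF Core queries just these columns and does not track
// the projected results.
var summaries = await _context.Parent
    .Where(p => p.ProjectId == projectId)
    .Select(p => new ParentSummaryDto
    {
        Id = p.Id,
        Description = p.Description,
        StateStatus = p.StateStatus,
        Relation1Roles = p.Relation1.Select(r => r.Role).ToList(),
        Relation2Count = p.Relation2.Count()
    })
    .ToListAsync();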
The next thing to check would be the relationships between all of the tables and any relevant configuration in EF vs. the table design. Waiting a minute to pull a few records from 30-60k rows is ridiculously long, and I would be highly suspicious of some flawed relationship mapping that isn't using FKs/indexes. Another avenue is to run a profiler against the database to capture the exact SQL statement(s) being run, then execute those manually and inspect their execution plan, which might reveal schema problems or some weirdness in the entity relationship mapping producing very inefficient queries.
After that, use a process of elimination to find a bad relationship: remove the eager-load Include statements one by one and see how long each query scenario takes. If a particular Include is responsible for a drastic slow-down, drill down into that relationship to see why.
That should give you a couple avenues to check. Consider revising your question with the actual entities and any further troubleshooting results.

Simulate Entity Framework's .Last() when using SQL Server

SQL Server can translate EF's .First() into its TOP(1) function. But Entity Framework's .Last() throws an exception: SQL Server does not recognize such a function, for obvious reasons.
I used to work around it by sorting descending and taking the first matching row:
var v = db.Table.OrderByDescending(t => t.ID).FirstOrDefault(t => t.ClientNumber == ClientNumberDetected);
This does it with a single query, but it sorts the whole table (a million rows) before querying...
Do I have good reason to think there will be speed issues if I abuse this technique?
I thought of something similar... but it requires two queries:
int maxID_of_Client = db.Table.Where(t => t.ClientNumber == ClientNumberDetected).Max(t => t.ID);
var v = db.Table.First(t => t.ID == maxID_of_Client);
It consists of retrieving the client's max ID, then using that ID to retrieve the client's last row.
Querying twice doesn't seem any faster...
There must be a way to optimize this and use a single query without sorting millions of rows.
Unless there is something I don't understand, I'm probably not the first to think about this problem, and I want to solve it for good!
Thanks in advance.
The assumption driving this question is that result sets with no ordering clause come back from your DB in any predictable order at all.
In reality, result sets that come back from SQL have no implicit ordering and none should be assumed.
Therefore, the result of
db.Table.FirstOrDefault(t => t.ClientNumber == ClientNumberDetected)
is actually indeterminate.
Whether you're taking first or last, without ordering it's all meaningless anyway.
Now, what goes to SQL when you add an ordering clause to your LINQ? It will be something similar to...
SELECT TOP(1) something FROM somewhere WHERE foo=bar ORDER BY somevalue
or, in the descending/last case
SELECT TOP(1) something FROM somewhere WHERE foo=bar ORDER BY somevalue DESC
From SQL's POV, there's no significant difference here and your DB will be optimized for this sort of query. The index can be scanned in either direction, and the cost of each query above is the same.
TL;DR:
db.Table.OrderByDescending(t => t.ID)
.FirstOrDefault(t => t.ClientNumber == ClientNumberDetected)
is just fine.

What is the performance overhead of Entity Framework compared to raw SQL execution time?

In my application (EF6 + SQL Server) I am dynamically creating EF queries to enable rich search functionality.
These queries are created by chaining a number of Where() predicates, and by projecting the results, using a few aggregations, into known CLR types. In all cases EF generates a single SQL query that returns a small number of results (about 10).
Using SQL Profiler I can see that the execution time of these generated queries, when executed by the database, is within a few milliseconds. However, unless the query is trivially simple, the total execution time (calling ToList() or Count() from my code) is within a few HUNDRED milliseconds! The code is built in Release mode and tested without a debugger attached.
Can anyone give me any hints about what might be wrong with my approach? Can EF's overhead really be two orders of magnitude compared to the raw SQL execution time?
EDIT:
These are some code samples that I am using to filter the result set:
if (p.PriceMin != null)
query = query.Where(a => a.Terms.Any(t => t.Price >= p.PriceMin.Value));
if (p.StartDate != null && p.EndDate != null)
query = query.Where(a => a.Terms.Any(t => t.Date >= p.StartDate.Value && t.Date <= p.EndDate.Value));
if (p.DurationMin != null)
query = query.Where(a => a.Itinerary.OfType<DayElement>().Count() > p.DurationMin.Value - 2);
if (p.Locations != null && p.Locations.Count > 0)
{
var locs = p.Locations.Select(l => new Nullable<int>(l)).ToList();
query = query.Where(a => a.Itinerary.OfType<MoveToElement>().Any(e => locs.Contains(e.LocationId)) ||
a.Itinerary.OfType<StartElement>().Any(e => locs.Contains(e.LocationId)) ||
a.Itinerary.OfType<EndElement>().Any(e => locs.Contains(e.LocationId)));
}
Then I order the results like this:
if (p.OrderById)
query = query.OrderBy(a => a.Id);
else if (p.OrderByPrice)
query = query.OrderByDescending(a => a.Terms.Average(t => t.Price));
The execution time is roughly the same if I execute the same query multiple times in a row (calling query.Count() repeatedly within the same DbContext), so I guess EF's query generation is pretty efficient in this case. It seems that something else is the bottleneck...
In general, yes, EF is slower than raw SQL, because it's difficult to predict how EF will build up a query, and EF has no knowledge of your database's indexes or structure.
There's no way to say exactly what the overhead is, as it will vary from query to query.
If you want to optimize your query with EF, you will have to try out various ways of chaining your Where predicates and benchmark the results. Even a seemingly slight change can make a big difference.
I ran into an issue myself where there was a huge difference between using .Any() and .Contains() which you can see here: Check if list contains item from other list in EntityFramework
The result was the same, but the second query was about 100 times faster. So yes, for certain queries EF can be two orders of magnitude slower than raw SQL; other times it will only be a few milliseconds slower.
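To illustrate the shape of that comparison, here is a sketch with hypothetical entities (the point is the generated SQL, not the names):

// Shape 1: a correlated Any() against another queryable — this can
// translate into a nested EXISTS/JOIN and perform poorly.
var slow = db.Orders
    .Where(o => db.Customers.Any(c => c.Id == o.CustomerId && c.IsActive))
    .ToList();

// Shape 2: Contains() on an in-memory list — this translates into a
// single IN (...) clause.
var activeIds = db.Customers.Where(c => c.IsActive).Select(c => c.Id).ToList();
var fast = db.Orders
    .Where(o => activeIds.Contains(o.CustomerId))
    .ToList();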
According to the documentation, it also matters whether you run a unique query for the first time or the second time, because the first run includes query compilation. So it depends on your benchmark code. You should:
execute the query for the first time
start the stopwatch
execute the query for the second time
stop the stopwatch
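A minimal sketch of that measurement pattern, assuming the query variable from the question:

// First run: includes EF's expression-tree translation (query compilation).
var warmup = query.Count();

// Second run: measures the steady-state cost only.
var sw = System.Diagnostics.Stopwatch.StartNew();
var count = query.Count();
sw.Stop();
Console.WriteLine($"Second run took {sw.ElapsedMilliseconds} ms");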

Improving performance of slow query - Potentially by disabling change tracking

I have a LINQ query on a DbSet that hits a table and grabs 65k rows. The query takes about 3 minutes, which seems obviously too long to me. I don't have a baseline for comparison, but I'm certain this can be improved. I'm relatively new to EF and LINQ, so I suspect I may also be structuring my query in a way that is a big "NO".
I read that change tracking is where EF spends most of its time, and it is enabled on the entity in question, so perhaps I should turn that off (if so, how)?
Here's the code:
ReportTarget reportTarget = repository.GetById(reportTargetId);
if (reportTarget != null)
{
ReportsBundle targetBundle = reportTarget.SavedReportsBundles.SingleOrDefault(rb => rb.ReportsBundleId == target.ReportsBundleId);
if (targetBundle != null)
{
}
}
This next line takes 3 minutes to execute (65k records):
IPoint[] pointsData = targetBundle.ReportEntries
.Where(e => ... a few conditions )
.Select((entry, i) => new
{
rowID = entry.EntryId,
x = entry.Profit,
y = i,
weight = target.HiddenPoints.Contains(entry.EntryId) ? 0 : 1,
group = 0
}.ActLike<IPoint>())
.ToArray();
Note: ActLike() is from the Impromptu Interface library, which uses the .NET DLR to make dynamic proxies of objects that implement an interface on the fly. I doubt this is the bottleneck.
How can I optimize performance for this particular DbSet (TradesReportEntries), given that I'll often be querying this table for large data sets (IPoint[]s)?
Well, it looks like you're loading an entity object and then querying its navigation properties. When this occurs, EF loads ALL related entities first (via lazy loading), and only then is your query performed, on the entire in-memory collection. This may be why you're having performance issues.
Try querying against the collection by using the following:
context.Entry(targetBundle)
.Collection(p => p.TradesReportEntries)
.Query()
.Where( e => <your filter here> )
.Select( <your projection here> )
This allows you to specify a filter in addition to the behind-the-curtain filter that handles loading the nav property by default. Let us know how that works out.
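On the "disabling change tracking" part of the question title: for a read-only query, you can also add AsNoTracking() to this pattern. A sketch under the same assumptions (in EF6 the extension method lives in System.Data.Entity):

using System.Data.Entity; // for AsNoTracking() in EF6

// Read-only variant: EF creates no change-tracking entries for the
// 65k materialized rows, which cuts per-row overhead.
var entries = context.Entry(targetBundle)
    .Collection(p => p.TradesReportEntries)
    .Query()
    .AsNoTracking()
    .Where(e => /* your filter here */ true)
    .ToArray();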

Are different IQueryable objects combined?

I have a little program that needs to do some calculations on a data range. The range may contain about half a million records. I just looked at my db and saw that a GROUP BY was executed.
I thought the query was executed at the first line, and that afterwards I was just working with the data in RAM. But now I think the query builder combines the expressions.
var Test = db.Test.Where(x => x.Date > DateTime.Now.AddDays(-7)); // assuming a Date column
var Test2 = (from p in Test
             group p by p.CustomerId into g
             select new { UniqueCount = g.Count() });
In my real-world app I have more subqueries that are based on the range selected by the first query. I think I just added a big overhead by letting the DB run several separate selects.
Now I basically just call .ToList() after the first expression.
So my question is am I right about that the query builder combine different IQueryable when it builds the expression tree?
Yes, you are correct. LINQ queries are lazily evaluated: nothing runs against the database until you enumerate them (via .ToList(), for example). At that point, Entity Framework will look at the total query and build one SQL statement to represent it.
In this particular case, it's probably wiser to not evaluate the first query, because the SQL database is optimized for performing set-based operations like grouping and counting. Rather than forcing the database to send all the Test objects across the wire, deserializing the results into in-memory objects, and then performing the grouping and counting locally, you will likely see better performance by having the SQL database just return the resulting Counts.
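In code, that means deferring enumeration so both expressions compose into one round-trip; a minimal sketch using the names from the question:

// No ToList() on Test: both expressions combine into a single SQL
// statement with WHERE + GROUP BY, and only the counts come back.
var counts = (from p in db.Test
              where p.Date > DateTime.Now.AddDays(-7) // assuming a Date column
              group p by p.CustomerId into g
              select new { CustomerId = g.Key, UniqueCount = g.Count() })
             .ToList();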
