Entity Framework Query is too slow - c#

I have to run a complex query against my database, but the query takes 8000 ms to finish. Am I doing something wrong? I use .NET 1.1 and Entity Framework Core 1.1.2.
var fol = _context.UserRelations
    .Where(u => u.FollowerId == id && u.State == true)
    .Select(p => p.FollowingId)
    .ToArray();
var Votes = await _context.Votes
    .OrderByDescending(c => c.CreationDate)
    .Skip(pageSize * pageIndex)
    .Take(pageSize)
    .Where(fo => fol.Contains(fo.UserId))
    .Select(vote => new
    {
        Id = vote.Id,
        VoteQuestions = vote.VoteQuestions,
        VoteImages = _context.VoteMedias.Where(m => m.VoteId == vote.Id)
            .Select(k => k.MediaUrl.ToString()),
        Options = _context.VoteOptions.Where(m => m.VoteId == vote.Id).Select(ques => new
        {
            OptionsID = ques.Id,
            OptionsName = ques.VoteOption,
            OptionsCount = ques.VoteRating.Count(cout => cout.VoteOptionsId == ques.Id),
        }),
        User = _context.Users.Where(u => u.Id == vote.UserId).Select(usr => new
        {
            Id = usr.Id,
            Name = usr.UserProperties.Where(o => o.UserId == vote.UserId)
                .Select(l => l.Name.ToString()).First(),
            Surname = usr.UserProperties.Where(o => o.UserId == vote.UserId)
                .Select(l => l.SurName.ToString()).First(),
            ProfileImage = usr.UserProfileImages.Where(h => h.UserId == vote.UserId && h.State == true)
                .Select(n => n.ImageUrl.ToString()).First()
        }),
        NextPage = nextPage
    }).ToListAsync();

Have a look at the SQL queries you send to the server (and at the results of these queries). For SQL Server the best option is SQL Server Profiler; there are equivalent tools for other servers too.
You create two queries. The first builds the fol array, and you then pass it into the second query using Contains. Do you know how this works? You probably generate a query with as many parameters as there are items in the array. That is neither pretty nor efficient, and it is not necessary here: merge it into the main query and you will have only one parameter.
You paginate before filtering. Is this really how it should work? Also have a look at other ways of paginating, based on filtering by ids rather than simple skipping.
You run too many side queries in one query. When you query three sublists of 100 items each, you do not get 300 rows. To get them in one query you create a join and actually get 100*100*100 = 1000000 rows. Unless you are sure the framework can split it into multiple queries (it probably cannot), you should query the sublists in separate queries. This is probably your main performance problem.
Please use singular names for tables, not plural.
For performance analysis, the index structure and the execution plan are vital information; you cannot really say much without them.
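The remark about paginating before filtering can be illustrated in memory. The following is only a sketch using LINQ to Objects, not EF Core: the class PagingOrderDemo, its methods and the sample data are hypothetical names invented for the illustration, and the tuples stand in for rows of the Votes table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingOrderDemo
{
    // Hypothetical stand-in for the Votes table: ids 1..20, already in display order,
    // with UserId cycling through 1, 2, 3, 0.
    public static List<(int Id, int UserId)> SampleVotes() =>
        Enumerable.Range(1, 20).Select(i => (Id: i, UserId: i % 4)).ToList();

    // Page first, filter second: the order used in the question.
    public static List<int> PageThenFilter(
        IEnumerable<(int Id, int UserId)> votes, int[] followed, int pageIndex, int pageSize) =>
        votes.Skip(pageSize * pageIndex)
             .Take(pageSize)
             .Where(v => followed.Contains(v.UserId))
             .Select(v => v.Id)
             .ToList();

    // Filter first, page second.
    public static List<int> FilterThenPage(
        IEnumerable<(int Id, int UserId)> votes, int[] followed, int pageIndex, int pageSize) =>
        votes.Where(v => followed.Contains(v.UserId))
             .Skip(pageSize * pageIndex)
             .Take(pageSize)
             .Select(v => v.Id)
             .ToList();
}
```

With followed = { 1 }, page index 0 and page size 5, PageThenFilter returns only the ids 1 and 5, while FilterThenPage returns a full page: 1, 5, 9, 13, 17. Paging first can therefore hand back short or even empty pages although matching rows exist further down.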

As noted in the comments, you are potentially executing 100, 1000 or 10000 queries. For every Vote in your database that matches the first result you do 3 other queries.
For 1000 votes which result from the first query you need to do 3000 other queries to fetch the data. That's insane!
You have to use EF Core's eager loading feature to fetch this data with very few queries. If your models are designed well, with relations and navigation properties, it's easy.
When you load flat models without a projection (using .Select), you have to use .Include to tell EF which other related entities it should load.
// Assuming your navigation property is called VoteMedia
await _context.Votes
    .Include(vote => vote.VoteMedia)
    ...
This would load all VoteMedia objects with the vote, so an extra query to get them is not necessary.
But if you use projections, the .Include calls are not necessary (in fact they are even ignored when you reference navigation properties in the projection).
// Assuming your navigation property is called VoteMedia
await _context.Votes
    .Include(vote => vote.VoteMedia)
    ...
    .Select(vote => new
    {
        Id = vote.Id,
        VoteQuestions = vote.VoteQuestions,
        // Here you reference VoteMedia through your model;
        // EF Core recognizes that and will load VoteMedia too.
        //
        // When using _context.VoteMedias.Where(...), EF won't do that
        // because you call directly into the context
        VoteImages = vote.VoteMedia.Where(m => m.VoteId == vote.Id)
            .Select(k => k.MediaUrl.ToString()),
        // Same here
        Options = vote.VoteOptions.Where(m => m.VoteId == vote.Id).Select(ques => ... )
    }).ToListAsync();

Related

Why is Entity Framework having performance issues when calculating a sum

I am using Entity Framework in a C# application and I am using lazy loading. I am experiencing performance issues when calculating the sum of a property in a collection of elements. Let me illustrate it with a simplified version of my code:
public decimal GetPortfolioValue(Guid portfolioId) {
    var portfolio = DbContext.Portfolios.FirstOrDefault(x => x.Id.Equals(portfolioId));
    if (portfolio == null) return 0m;
    return portfolio.Items
        .Where(i =>
            i.Status == ItemStatus.Listed
            &&
            _activateStatuses.Contains(i.Category.Status)
        )
        .Sum(i => i.Amount);
}
So I want to fetch the value for all my items that have a certain status of which their parent has a specific status as well.
When logging the queries generated by EF I see it is first fetching my Portfolio (which is fine). Then it does a query to load all Item entities that are part of this portfolio. And then it starts fetching ALL Category entities for each Item one by one. So if I have a portfolio that contains 100 items (each with a category), it literally does 100 SELECT ... FROM categories WHERE id = ... queries.
So it seems like it's just fetching all info, storing it in its memory and then calculating the sum. Why does it not do a simple join between my tables and calculate it like that?
Instead of doing 102 queries to calculate the sum of 100 items I would expect something along the lines of:
SELECT
    i.id, i.amount
FROM
    items i
    INNER JOIN categories c ON c.id = i.category_id
WHERE
    i.portfolio_id = @portfolioId
    AND i.status = 'listed'
    AND c.status IN ('active', 'pending', ...);
on which it could then calculate the sum (if it is not able to use the SUM directly in the query).
What is the problem and how can I improve the performance other than writing a pure ADO query instead of using Entity Framework?
To be complete, here are my EF entities:
public class ItemConfiguration : EntityTypeConfiguration<Item> {
    public ItemConfiguration() {
        ToTable("items");
        ...
        HasRequired(p => p.Portfolio);
    }
}
public class CategoryConfiguration : EntityTypeConfiguration<Category> {
    public CategoryConfiguration() {
        ToTable("categories");
        ...
        HasMany(c => c.Products).WithRequired(p => p.Category);
    }
}
EDIT based on comments:
I didn't think it was important but the _activeStatuses is a list of enums.
private CategoryStatus[] _activeStatuses = new[] { CategoryStatus.Active, ... };
But probably more important is that I left out that the status in the database is a string ("active", "pending", ...) but I map them to an enum used in the application. And that is probably why EF cannot evaluate it? The actual code is:
... && _activateStatuses.Contains(CategoryStatusMapper.MapToEnum(i.Category.Status)) ...
EDIT2
Indeed the mapping is a big part of the problem but the query itself seems to be the biggest issue. Why is the performance difference so big between these two queries?
// Slow query
var portfolio = DbContext.Portfolios.FirstOrDefault(p => p.Id.Equals(portfolioId));
var value = portfolio.Items
    .Where(i => i.Status == ItemStatusConstants.Listed &&
                _activeStatuses.Contains(i.Category.Status))
    .Select(i => i.Amount).Sum();
// Fast query
var value = DbContext.Portfolios.Where(p => p.Id.Equals(portfolioId))
    .SelectMany(p => p.Items.Where(i =>
        i.Status == ItemStatusConstants.Listed &&
        _activeStatuses.Contains(i.Category.Status)))
    .Select(i => i.Amount).Sum();
The first query does a LOT of small SQL queries whereas the second one just combines everything into one bigger query. I'd expect even the first query to run one query to get the portfolio value.
Calling portfolio.Items lazy-loads the whole Items collection, and the subsequent calls, including the Where and Sum expressions, are then executed on the loaded objects. See also the Loading Related Entities article.
You need to execute the call directly on the DbContext so that the Sum expression can be evaluated on the database server.
var portfolio = DbContext.Portfolios
    .Where(x => x.Id.Equals(portfolioId))
    .SelectMany(x => x.Items
        .Where(i => i.Status == ItemStatus.Listed && _activateStatuses.Contains(i.Category.Status))
        .Select(i => i.Amount))
    .Sum();
You also have to use the appropriate type for _activateStatuses instance as the contained values must match the type persisted in the database. If the database persists string values then you need to pass a list of string values.
var _activateStatuses = new string[] {"Active", "etc"};
You could use a LINQ expression to convert the enums to their string representations.
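A minimal sketch of such a conversion follows. The enum members, the StatusNames class and the ToDatabaseStrings method are hypothetical, and the lower-casing assumes the database stores values such as "active" and "pending", as the question describes; adjust it to whatever casing your column really uses.

```csharp
using System;
using System.Linq;

// Hypothetical enum; the real CategoryStatus members are not shown in the question.
public enum CategoryStatus { Active, Pending, Closed }

public static class StatusNames
{
    // Converts enum values to the lower-case strings assumed to be stored in the database.
    public static string[] ToDatabaseStrings(params CategoryStatus[] statuses) =>
        statuses.Select(s => s.ToString().ToLowerInvariant()).ToArray();
}
```

Calling StatusNames.ToDatabaseStrings(CategoryStatus.Active, CategoryStatus.Pending) yields the array { "active", "pending" }. Because the conversion happens before the query is built, the Contains check compares plain strings and no mapper has to run inside the query.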
Notes
I would recommend you turn off lazy loading on your DbContext type. As soon as you do that you will start to catch issues like this at run time via Exceptions and can then write more performant code.
I did not include error checking for if no portfolio was found but you could extend this code accordingly.
Yep, CategoryStatusMapper.MapToEnum cannot be converted to SQL, which forces the Where to run in .NET. Rather than mapping the status to the enum, _activeStatuses should contain the list of integer values from the enum so that the mapping is not required.
private int[] _activeStatuses = new[] { (int)CategoryStatus.Active, ... };
so that the Contains call becomes
... && _activeStatuses.Contains(i.Category.Status) ...
and can be converted to SQL in its entirety.
UPDATE
Given that i.Category.Status is a string in the database, then
private string[] _activeStatuses = new[] { CategoryStatus.Active.ToString(), ... };

Replacing Include() calls to Select()

I'm trying to eliminate the use of the Include() calls in this IQueryable definition:
return ctx.timeDomainDataPoints.AsNoTracking()
    .Include(dp => dp.timeData)
    .Include(dp => dp.RecordValues.Select(rv => rv.RecordKind).Select(rk => rk.RecordAlias).Select(fma => fma.RecordAliasGroup))
    .Include(dp => dp.RecordValues.Select(rv => rv.RecordKind).Select(rk => rk.RecordAlias).Select(fma => fma.RecordAliasUnit))
    .Where(dp => dp.RecordValues.Any(rv => rv.RecordKind.RecordAlias != null))
    .Where(dp => dp.Source == 235235)
    .Where(dp => dp.timeData.time >= start && dp.timeData.time <= end)
    .OrderByDescending(cd => cd.timeData.time);
I have been having issues with the database where the run times are far too long, and the primary cause is that the Include() calls are pulling everything.
This is evident from the table returned by the generated SQL query, which shows lots of unnecessary information being returned.
One of the things that you learn, I guess.
The database has a large collection of data points, each of which has many recorded values.
Each recorded value is mapped to a Record Kind, which may have a Record Alias.
I have tried creating a Select() as an alternative, but I just can't figure out how to construct the right Select and also keep the entity hierarchy correctly loaded, i.e. the related entities are loaded with unnecessary calls to the DB.
Does anyone have alternative solutions that may jump-start me toward solving this problem?
I'll add more detail if needed.
You are right. One of the slower parts of a database query is the transport of the selected data from the DBMS to your local process. Hence it is wise to limit this.
Every TimeDomainDataPoint has a primary key. All RecordValues of this TimeDomainDataPoint have a foreign key TimeDomainDataPointId with a value equal to this primary key.
So if the TimeDomainDataPoint with Id 4 has a thousand RecordValues, then every RecordValue will have a foreign key with a value of 4. It would be a waste to transfer this value 4 a thousand and one times when you only need it once.
When querying data, always use Select and select only the properties you actually plan to use. Only use Include if you plan to update the fetched included items.
The following will be much faster:
var result = dbContext.timeDomainDataPoints
    // first limit the datapoints you want to select
    .Where(datapoint => datapoint.RecordValues.Any(rv => rv.RecordKind.RecordAlias != null))
    .Where(datapoint => datapoint.Source == 235235)
    .Where(datapoint => datapoint.timeData.time >= start
                     && datapoint.timeData.time <= end)
    .OrderByDescending(datapoint => datapoint.timeData.time)
    // then select only the properties you actually plan to use
    .Select(dataPoint => new
    {
        Id = dataPoint.Id,
        RecordValues = dataPoint.RecordValues
            .Where(recordValue => ...) // if you don't want all RecordValues
            .Select(recordValue => new
            {
                // again: select only the properties you actually plan to use:
                Id = recordValue.Id,
                // not needed, you know the value: DataPointId = recordValue.DataPointId,
                RecordKinds = recordValue.RecordKinds
                    .Where(recordKind => ...) // if you don't want all recordKinds
                    .Select(recordKind => new
                    {
                        ... // only the properties you really need!
                    })
                    .ToList(),
                ...
            })
            .ToList(),
        TimeData = dataPoint.TimeData.Select(...),
        ...
    });
Possible improvement
The part:
.Where(datapoint => datapoint.RecordValues.Any(rv => rv.RecordKind.RecordAlias != null))
is used to fetch only datapoints that have recordValues with a non-null RecordAlias. If you are selecting the RecordAlias anyway, consider doing this Where after your select:
.Select(...)
// assumes the projection above keeps RecordKind.RecordAlias on each record value
.Where(dataPoint => dataPoint.RecordValues
    .Any(recordValue => recordValue.RecordKind.RecordAlias != null));
I'm not really sure whether this is faster. If your database management system internally first creates a complete table with all columns of all joined tables and then throws away the columns that are not selected, then it won't make a difference. However, if it only creates a table with the columns it actually uses, then the internal table will be smaller. This could be faster.
Your problem is the hierarchy of joins in your query. To reduce it, create a separate query that gets the results from the related table, as follows:
var items = ctx.timeDomainDataPoints.AsNoTracking().Include(dp => dp.timeData).Include(dp => dp.RecordValues);
var ids = items.SelectMany(item => item.RecordValues).Select(i => i.Id);
and in another request to the database:
var otherItems = ctx.RecordAlias.AsNoTracking().Select(dp => dp.RecordAlias).Where(s => ids.Contains(s.RecordKindId)).SelectMany(s => s.RecordAliasGroup);
With this approach your query does not contain nested joins.

EF 6. Complex Query. Search and Filter

I'm using Entity Framework 6 in my ASP.NET MVC application.
I have a complex database query that touches about 15 tables.
The query includes searching and filtering, and it executes slowly (about 800 ms on my local machine).
query.Include(i => i.Customer)
    .Include(i => i.Address)
    ...
    .Include(i => i.Photos)
    .Select(x => new {
        ...
        x.Address.City,
        CustomerName = x.Customer != null ? x.Customer.Name : "",
        ...
    });
...
//searching & filtering
// searchFilter.PropertyName and searchFilter.PropertyValue - strings!
// for example searchFilter.PropertyName = 'CustomerName'
query = query.Where(String.Format("{0} == {1}", searchFilter.PropertyName, searchFilter.PropertyValue));
// PageIndex = 20
query = query.Skip(PageIndex * PageSize).Take(PageSize)
...
var result = query.ToList();
...
The problems:
Eager loading is not working properly: MiniProfiler shows duplicate requests for the Address table.
Such searching (using 'Contains') is very limited, because I have to create an anonymous-type object with properties like Customer to check whether Customer is not null (or for more complex actions), and I have to hard-code the PropertyName strings somewhere (for instance in the JavaScript file that makes the AJAX request).
Are there other ways to do it?
Some remarks:
You don't need Include(i => i.Address) to select x.Address.City, nor to test any Address property in a where clause.
As long as you have an IQueryable bound to SQL Server, CustomerName = x.Customer != null ? x.Customer.Name : "" can be replaced by CustomerName = x.Customer.Name ?? "".
I'm not sure this has a significant performance impact, but...
Otherwise, as is often the case, creating indexes is a path to better performance.

How to do join with objects in Entity Framework

So I am converting an old project that uses ordinary SQL queries to an ORM, using Entity Framework. I have created a database model like this:
I have this old query, which I want to translate to a LINQ expression:
SELECT UGLINK.USERNAME
FROM GMLINK
INNER JOIN UGLINK
ON GMLINK.GROUPID = UGLINK.GROUPID
WHERE (((GMLINK.MODULEID)=%ID%))
And the problem I have is that I can't figure out how to do a join query using the objects.
Instead I have to go through the properties like this (which seems to work):
// So this is one of the module objects that is located in a listView in the GUI
Module m = ModuleList.selectedItem as Module;
/* Now I want to fetch all the User objects that,
* via a group, is connected to a certain module */
var query = context.gmLink
    .Join(context.ugLink,
        gmlink => gmlink.GroupId,
        uglink => uglink.GroupId,
        (gmlink, uglink) => new { gmLink = gmlink, ugLink = uglink })
    .Where(gmlink => gmlink.gmLink.ModuleId == m.ModuleId)
    .Select(x => x.ugLink.User);
So as I said, this works, but as you can see I have to connect the modules via the link tables' properties (.GroupId, .ModuleId and so on). Instead I would like to go through the objects created by EF.
I wanted to write a query a bit like this, but I can't figure out how to do it. Is it possible at all?
var query = context.User
.Select(u => u.ugLink
.Select(uglink => uglink.Group.gmLink
.Where(gmLink => gmLink.Module == m)));
This should be working:
var query = context.gmLink
.Where(gmlink => gmlink.ModuleId == m.ModuleId)
.SelectMany(gmlink => gmlink.Group.ugLink)
.Select(uglink => uglink.User);
It's impossible to filter gmLinks using .Where(gmlink => gmlink.Module == m) in EF, so this comparison needs to be done using identifiers. Another option is .Where(gmlink => gmlink.Module.ModuleId == m.ModuleId)
If you have lazy loading enabled, you do not need to apply specific join notation (you can access the navigation properties directly), but the queries that are run against SQL are inefficient (generally the results are returned in a number of different select statements).
My preference is to disable lazy loading on the context, and use .Include() notation to join tables together manually, resulting in generally more efficient queries. .Include() is used to explicitly join entities in Entity Framework.
Join() is misleading, and not appropriate for joining tables in EF.
So, to replicate this statement:
SELECT UGLINK.USERNAME
FROM GMLINK
INNER JOIN UGLINK
ON GMLINK.GROUPID = UGLINK.GROUPID
WHERE (((GMLINK.MODULEID)=%ID%))
You would use the following:
var query = context.gmLink
    .Include(x => x.Group.ugLink)
    .Where(x => x.ModuleId == myIdVariable)
    // Group.ugLink is a collection, so flatten it before projecting
    .SelectMany(x => x.Group.ugLink)
    .Select(uglink => new {
        UserName = uglink.UserName
    });
Assuming that your navigation properties are correctly set up. I have not tested this, so I'm not 100% on the syntax.
You should really run SQL profiler while you write and run LINQ to Entity queries against your database, so you can understand what's actually being generated and run against your database. A lot of the time, an EF query may be functioning correctly, but you may experience performance issues when deployed to a production system.
This whitepaper might help you out.
I haven't tested it, but something like this:
var users = context.User
    .Where(x => x.ugLink
        .Any(y => context.gmLink
            .Where(z => z.ModuleId == m.ModuleId)
            .Select(z => z.GroupId)
            .Contains(y.GroupId)
        )
    )
    .ToList();

LINQ Query - Only get Order and MAX Date from Child Collection

I'm trying to get a list that displays 2 values in a label from a parent and child (1-*) entity collection model.
I have 3 entities:
[Customer]: CustomerId, Name, Address, ...
[Order]: OrderId, OrderDate, EmployeeId, Total, ...
[OrderStatus]: OrderStatusId, StatusLevel, StatusDate, ...
A Customer can have MANY Orders, and an Order in turn can have MANY OrderStatuses, i.e.
[Customer] 1--* [Order] 1--* [OrderStatus]
Given a CustomerId, I want to get all of the Orders (just OrderId) and the LATEST (MAX?) OrderStatus.StatusDate for that Order.
I've made a couple of attempts, but can't seem to get the results I want.
private IQueryable<Customer> GetOrderData(string customerId)
{
var ordersWithLatestStatusDate = Context.Customers
// Note: I am not sure if I should add the .Expand() extension methods here for the other two entity collections since I want these queries to be as performant as possible and since I am projecting below (only need to display 2 fields for each record in the IQueryable<T>, but thinking I should now after some contemplation.
.Where(x => x.CustomerId == SelectedCustomer.CustomerId)
.Select(x => new Custom
{
CustomerId = x.CustomerId,
...
// I would like to project my Child and GrandChild Collections, i.e. Orders and OrderStatuses here but don't know how to do that. I learned that by projecting, one does not need to "Include/Expand" these extension methods.
});
return ordersWithLatestStatusDate ;
}
---- UPDATE 1 ----
After the great solution from User: lazyberezovsky, I tried the following:
var query = Context.Customers
.Where(c => c.CustomerId == SelectedCustomer.CustomerId)
.Select(o => new Customer
{
Name = c.Name,
LatestOrderDate = o.OrderStatus.Max(s => s.StatusDate)
});
In my haste when first posting, I didn't paste everything in correctly, since it was mostly from memory and I didn't have the exact code for reference at the time. My method returns a strongly-typed IQueryable, where I need it to return a collection of items of type T because of a constraint within a rigid API that I have to go through, which takes an IQueryable query as one of its parameters. I am aware I can add other entities/attributes by using the extension methods .Expand() and/or .Select(). One will notice that my latest UPDATED query above has an added "new Customer" within the .Select(), where it was once anonymous. I'm positive that is why the query failed: it couldn't be turned into a valid Uri because LatestOrderDate is not a property of Customer at the server level. FYI, upon seeing the first answer below, I had added that property to my client-side Customer class with a simple { get; set; }. So given this, can I somehow still have a Customer collection that brings back only those 2 fields from 2 different entities? The solution below looked so promising and ingenious!
---- END UPDATE 1 ----
FYI, the technologies I'm using are OData (WCF), Silverlight, C#.
Any tips/links will be appreciated.
This will give you list of { OrderId, LatestDate } objects
var query = Context.Customers
.Where(c => c.CustomerId == SelectedCustomer.CustomerId)
.SelectMany(c => c.Orders)
.Select(o => new {
OrderId = o.OrderId,
LatestDate = o.Statuses.Max(s => s.StatusDate) });
UPDATE construct objects in-memory
var query = Context.Customers
.Where(c => c.CustomerId == SelectedCustomer.CustomerId)
.SelectMany(c => c.Orders)
.AsEnumerable() // goes in-memory
.Select(o => new {
OrderId = o.OrderId,
LatestDate = o.Statuses.Max(s => s.StatusDate) });
Also grouping could help here.
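As a rough in-memory sketch of the grouping idea (LINQ to Objects only, once the data is already on the client, not a query sent through OData): the class LatestStatusDemo, its method name and the tuple shape are hypothetical and stand in for flattened OrderStatus rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LatestStatusDemo
{
    // Groups flat (OrderId, StatusDate) rows by order and keeps the latest date per order.
    public static Dictionary<int, DateTime> LatestStatusDateByOrder(
        IEnumerable<(int OrderId, DateTime StatusDate)> statuses) =>
        statuses.GroupBy(s => s.OrderId)
                .ToDictionary(g => g.Key, g => g.Max(s => s.StatusDate));
}
```

The result maps each OrderId to its most recent StatusDate, which are the two values the question wants to display.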
If I read this correctly you want a Customer entity and then a single value computed from its Orders property. Currently this is not supported in OData. OData doesn't support computed values in the queries. So no expressions in the projections, no aggregates and so on.
Unfortunately even with two queries this is currently not possible since OData doesn't support any way of expressing the MAX functionality.
If you have control over the service, you could write a server side function/service operation to execute this kind of query.
