I'm using C# and Entity Framework 6.
I have an entity (Animal) in my database that has a list of another entity (Feeding) attached to it. The Feeding entity has a DateTime associated with it.
My query looks like this:
var sum = animal.Feedings.Where(f => (DateTimeOffset.Now.Date - f.DateTime.Date).Days == 1).Select(f => f.Amount).Sum();
I basically want the sum of the total feeding amount from yesterday. But if my Animal object has a significantly large number of feedings (say 5000, which is expected and could be even higher), this line takes 20+ seconds to complete.
Is there a way to refactor this line of code to be more efficient?
EDIT
It appears the Feedings collection is being lazy loaded from the animal object.
var feedings = animal.Feedings
This line is what now takes the excessive time. The animal object was originally created from an AsQueryable() object, by selecting the animal by an ID.
EDIT #2
This logic is inside the Animal repository and cannot access the DbContext; it can only access an IQueryable of the Animal collection. I also do not have access to EF's Include() function to try to include the feedings with the query.
The slow part is loading the Feedings collection. If it's already loaded, the LINQ to Objects query will run very fast.
If you have access to the DbContext, instead of loading the collection you can execute the query inside the database like this:
var date = new DateTimeOffset(DateTime.Today.AddDays(-1));
var sum = dbContext.Entry(animal).Collection(e => e.Feedings).Query()
    .Where(f => DbFunctions.TruncateTime(f.DateTime) == date)
    .Sum(f => f.Amount);
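This way the database computes the sum and only a single scalar value travels over the wire, instead of the whole Feedings collection.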
Having access to IQueryable<Animal> (as mentioned in the comments and the update) will also work. Just instead of
dbContext.Entry(animal).Collection(e => e.Feedings).Query()
you would use something like this:
animals.Where(a => a.Id == animal.Id).SelectMany(a => a.Feedings)
The rest is the same.
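Putting it together, here is a minimal sketch of the repository method under the stated constraints. The method name, the animalId parameter, and the animals field are assumptions for illustration, not from the original post; it also assumes Amount is a decimal and Feeding.DateTime maps to a datetime column:
// A sketch only: "animals" is the IQueryable<Animal> the repository can access.
public decimal GetYesterdaysFeedingTotal(int animalId)
{
    var startDate = DateTime.Today.AddDays(-1); // midnight, yesterday
    var endDate = DateTime.Today;               // midnight, today

    // The whole chain stays IQueryable, so EF translates it into a single
    // SELECT SUM(...) with a range predicate the database can use an index for.
    return animals
        .Where(a => a.Id == animalId)
        .SelectMany(a => a.Feedings)
        .Where(f => f.DateTime >= startDate && f.DateTime < endDate)
        .Select(f => (decimal?)f.Amount)
        .Sum() ?? 0m; // SUM over an empty set yields NULL in SQL
}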
If you have an index on the DateTime column, then try:
var startDate = DateTimeOffset.Now.AddDays(-1).Date;
var endDate = DateTimeOffset.Now.Date;
var sum = animal.Feedings
    .Where(f => f.DateTime >= startDate && f.DateTime < endDate)
    .Select(f => f.Amount)
    .DefaultIfEmpty(0) // don't throw an error if the query has no results
    .Sum();
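The range comparison also matters for the index: a predicate like f.DateTime >= startDate && f.DateTime < endDate is sargable, so the database can seek the index directly, whereas wrapping the column in a function such as DbFunctions.TruncateTime generally prevents an index seek.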
I am using Entity Framework with lazy loading in a C# application, and I am experiencing performance issues when calculating the sum of a property over a collection of elements. Let me illustrate with a simplified version of my code:
public decimal GetPortfolioValue(Guid portfolioId) {
    var portfolio = DbContext.Portfolios.FirstOrDefault(x => x.Id.Equals(portfolioId));
    if (portfolio == null) return 0m;

    return portfolio.Items
        .Where(i =>
            i.Status == ItemStatus.Listed
            && _activateStatuses.Contains(i.Category.Status)
        )
        .Sum(i => i.Amount);
}
So I want to fetch the value of all my items that have a certain status and whose parent has a specific status as well.
When logging the queries generated by EF, I see it first fetches my Portfolio (which is fine). Then it runs a query to load all Item entities that are part of this portfolio. And then it starts fetching ALL Category entities for each Item, one by one. So if I have a portfolio that contains 100 items (each with a category), it literally runs 100 SELECT ... FROM categories WHERE id = ... queries.
So it seems like it's just fetching all info, storing it in its memory and then calculating the sum. Why does it not do a simple join between my tables and calculate it like that?
Instead of doing 102 queries to calculate the sum of 100 items I would expect something along the lines of:
SELECT
    i.id, i.amount
FROM
    items i
    INNER JOIN categories c ON c.id = i.category_id
WHERE
    i.portfolio_id = @portfolioId
    AND i.status = 'listed'
    AND c.status IN ('active', 'pending', ...);
on which it could then calculate the sum (if it is not able to use the SUM directly in the query).
What is the problem and how can I improve the performance other than writing a pure ADO query instead of using Entity Framework?
To be complete, here are my EF entities:
public class ItemConfiguration : EntityTypeConfiguration<Item> {
    public ItemConfiguration() {
        ToTable("items");
        ...
        HasRequired(p => p.Portfolio);
    }
}

public class CategoryConfiguration : EntityTypeConfiguration<Category> {
    public CategoryConfiguration() {
        ToTable("categories");
        ...
        HasMany(c => c.Products).WithRequired(p => p.Category);
    }
}
EDIT based on comments:
I didn't think it was important, but _activeStatuses is an array of enum values.
private CategoryStatus[] _activeStatuses = new[] { CategoryStatus.Active, ... };
But probably more important is what I left out: the status in the database is a string ("active", "pending", ...), but I map it to an enum used in the application. That is probably why EF cannot translate it? The actual code is:
... && _activateStatuses.Contains(CategoryStatusMapper.MapToEnum(i.Category.Status)) ...
EDIT2
Indeed the mapping is a big part of the problem, but the query itself seems to be the biggest issue. Why is the performance difference between these two queries so big?
// Slow query
var portfolio = DbContext.Portfolios.FirstOrDefault(p => p.Id.Equals(portfolioId));
var value = portfolio.Items.Where(i => i.Status == ItemStatusConstants.Listed &&
_activeStatuses.Contains(i.Category.Status))
.Select(i => i.Amount).Sum();
// Fast query
var value = DbContext.Portfolios.Where(p => p.Id.Equals(portfolioId))
.SelectMany(p => p.Items.Where(i =>
i.Status == ItemStatusConstants.Listed &&
_activeStatuses.Contains(i.Category.Status)))
.Select(i => i.Amount).Sum();
The first query runs a LOT of small SQL queries, whereas the second combines everything into one bigger query. I'd expect even the first approach to fetch the portfolio value with a single query.
Calling portfolio.Items lazy loads the Items collection and then executes the subsequent Where and Sum calls in memory. See also the Loading Related Entities article.
You need to execute the call directly on the DbContext so the Sum expression can be evaluated database-server side.
var portfolio = DbContext.Portfolios
    .Where(x => x.Id.Equals(portfolioId))
    .SelectMany(x => x.Items
        .Where(i => i.Status == ItemStatus.Listed
                    && _activateStatuses.Contains(i.Category.Status))
        .Select(i => i.Amount))
    .Sum();
You also have to use the appropriate element type for _activateStatuses, as the contained values must match the type persisted in the database. If the database persists string values, then you need to pass a list of string values.
var _activateStatuses = new string[] {"Active", "etc"};
You could use a LINQ expression to convert the enums to their string representations.
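For example, a minimal sketch, assuming the persisted strings match the enum names:
// Build the string array from the enum values so the two stay in sync.
// Assumes the database stores the enum names verbatim (e.g. "Active").
var _activateStatuses = new[] { CategoryStatus.Active /*, ... */ }
    .Select(status => status.ToString())
    .ToArray();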
Notes
I would recommend you turn off lazy loading on your DbContext type (a sketch follows after these notes). As soon as you do that, you will start to catch issues like this at run time via exceptions, and can then write more performant code.
I did not include error checking for when no portfolio is found, but you could extend this code accordingly.
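For the first note, a minimal sketch of turning lazy loading off in an EF6 DbContext (the context name is hypothetical):
public class PortfolioContext : DbContext
{
    public PortfolioContext()
    {
        // Navigation properties are no longer loaded on first access,
        // so any forgotten eager load surfaces immediately instead of
        // silently issuing extra queries.
        Configuration.LazyLoadingEnabled = false;
        Configuration.ProxyCreationEnabled = false; // optional: skip proxies entirely
    }
}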
Yep, CategoryStatusMapper.MapToEnum cannot be converted to SQL, forcing EF to run the Where in .NET. Rather than mapping the status to the enum, _activeStatuses should contain the list of integer values from the enum so the mapping is not required.
private int[] _activeStatuses = new[] { (int)CategoryStatus.Active, ... };
So that the contains becomes
... && _activateStatuses.Contains(i.Category.Status) ...
and the whole predicate can be converted to SQL.
UPDATE
Given that i.Category.Status is a string in the database, then
private string[] _activeStatuses = new[] { CategoryStatus.Active.ToString(), ... };
I'm trying to eliminate the Include() calls in this IQueryable definition:
return ctx.timeDomainDataPoints.AsNoTracking()
    .Include(dp => dp.timeData)
    .Include(dp => dp.RecordValues.Select(rv => rv.RecordKind).Select(rk => rk.RecordAlias).Select(fma => fma.RecordAliasGroup))
    .Include(dp => dp.RecordValues.Select(rv => rv.RecordKind).Select(rk => rk.RecordAlias).Select(fma => fma.RecordAliasUnit))
    .Where(dp => dp.RecordValues.Any(rv => rv.RecordKind.RecordAlias != null))
    .Where(dp => dp.Source == 235235)
    .Where(dp => dp.timeData.time >= start && dp.timeData.time <= end)
    .OrderByDescending(dp => dp.timeData.time);
I have been having issues with the database where the run times are far too long, and the primary cause is that the Include() calls pull everything. This is evident when viewing the result set of the generated SQL query, which returns a lot of unnecessary information.
One of the things that you learn, I guess.
The database has a large collection of data points, each of which has many recorded values.
Each recorded value is mapped to a record kind, which may have a record alias.
I have tried creating a Select() as an alternative, but I just can't figure out how to construct the right Select while keeping the entity hierarchy correctly loaded, i.e. without the related entities being loaded through unnecessary calls to the DB.
Does anyone have alternative solutions that might jump-start me on this problem?
I'll add more detail if needed.
You are right. One of the slower parts of a database query is the transport of the selected data from the DBMS to your local process. Hence it is wise to limit this.
Every TimeDomainDataPoint has a primary key. All RecordValues of this TimeDomainDataPoint have a foreign key TimeDomainDataPointId with a value equal to this primary key.
So if a TimeDomainDataPoint with Id 4 has a thousand RecordValues, every one of those RecordValues has a foreign key value of 4. It would be a waste to transfer this value 4 a thousand times over, while you only need it once.
When querying data, always use Select and select only the properties you actually plan to use. Only use Include if you plan to update the fetched included items.
The following will be much faster:
var result = dbContext.timeDomainDataPoints
    // first limit the datapoints you want to select
    .Where(dataPoint => dataPoint.RecordValues.Any(rv => rv.RecordKind.RecordAlias != null))
    .Where(dataPoint => dataPoint.Source == 235235)
    .Where(dataPoint => dataPoint.timeData.time >= start
                     && dataPoint.timeData.time <= end)
    .OrderByDescending(dataPoint => dataPoint.timeData.time)
    // then select only the properties you actually plan to use
    .Select(dataPoint => new
    {
        Id = dataPoint.Id,
        RecordValues = dataPoint.RecordValues
            .Where(recordValue => ...) // if you don't want all RecordValues
            .Select(recordValue => new
            {
                // again: select only the properties you actually plan to use:
                Id = recordValue.Id,
                // not needed, you know the value: DataPointId = recordValue.DataPointId,
                RecordKinds = recordValue.RecordKinds
                    .Where(recordKind => ...) // if you don't want all recordKinds
                    .Select(recordKind => new
                    {
                        ... // only the properties you really need!
                    })
                    .ToList(),
                ...
            })
            .ToList(),
        TimeData = dataPoint.TimeData.Select(...),
        ...
    });
Possible improvement
The part:
.Where(dataPoint => dataPoint.RecordValues.Any(rv => rv.RecordKind.RecordAlias != null))
is used to fetch only data points that have RecordValues with a non-null RecordAlias. If you are selecting the RecordAlias anyway, consider applying this Where after your Select:
.Select(...)
// assumes RecordAlias is among the properties selected for each record value
.Where(dataPoint => dataPoint.RecordValues
    .Any(recordValue => recordValue.RecordAlias != null));
I'm not really sure whether this is faster. If your database management system internally first creates a complete table with all columns of all joined tables and then throws away the columns that are not selected, then it won't make a difference. However, if it only creates a table with the columns it actually uses, then the internal table will be smaller. This could be faster.
Your problem is the hierarchy of joins in your query. To reduce it, run a separate query to get the results from the related table, as follows:
var items = ctx.timeDomainDataPoints.AsNoTracking()
    .Include(dp => dp.timeData)
    .Include(dp => dp.RecordValues);
var ids = items.SelectMany(item => item.RecordValues).Select(i => i.Id);
and in a second request to the db:
var otherItems = ctx.RecordAlias.AsNoTracking()
    .Select(dp => dp.RecordAlias)
    .Where(s => ids.Contains(s.RecordKindId))
    .SelectMany(s => s.RecordAliasGroup);
With this approach your query does not need the nested joins.
I have to run a complex query against my database, but it takes 8000 ms to complete. Am I doing something wrong? I am using .NET Core 1.1 and Entity Framework Core 1.1.2.
var fol = _context.UserRelations
.Where(u => u.FollowerId == id && u.State == true)
.Select(p => p.FollowingId)
.ToArray();
var Votes = await _context.Votes
.OrderByDescending(c => c.CreationDate)
.Skip(pageSize * pageIndex)
.Take(pageSize)
.Where(fo => fol.Contains(fo.UserId))
.Select(vote => new
{
Id = vote.Id,
VoteQuestions = vote.VoteQuestions,
VoteImages = _context.VoteMedias.Where(m => m.VoteId == vote.Id)
.Select(k => k.MediaUrl.ToString()),
Options = _context.VoteOptions.Where(m => m.VoteId == vote.Id).Select( ques => new
{
OptionsID = ques.Id,
OptionsName = ques.VoteOption,
OptionsCount = ques.VoteRating.Count(cout => cout.VoteOptionsId == ques.Id),
}),
User = _context.Users.Where(u => u.Id == vote.UserId).Select(usr => new
{
Id = usr.Id,
Name = usr.UserProperties.Where(o => o.UserId == vote.UserId).Select(l => l.Name.ToString())
.First(),
Surname = usr.UserProperties.Where(o => o.UserId == vote.UserId)
.Select(l => l.SurName.ToString()).First(),
ProfileImage = usr.UserProfileImages.Where(h => h.UserId == vote.UserId && h.State == true)
.Select(n => n.ImageUrl.ToString()).First()
}),
NextPage = nextPage
}).ToListAsync();
Have a look at the SQL queries you send to the server (and the results of those queries). For SQL Server the best option is SQL Server Profiler; there are similar tools for other servers too.
You create two queries. The first creates the fol array, which you then pass into the second query using Contains. Do you know how this works? You probably generate a query with as many parameters as there are items in the array, which is neither pretty nor efficient. It is also unnecessary here: merge it into the main query and you will have only one parameter (see the sketch after this list).
You paginate before filtering; is this really the way it should work? Also have a look at other ways of paginating, based on filtering by ids rather than simply skipping rows.
You run too many side queries inside one query. When you query three sublists of 100 items each, you do not get 300 rows; to get them in one query, a join is created and you actually get 100*100*100 = 1,000,000 rows. Unless you are sure the framework can split it into multiple queries (it probably cannot), you should query the sublists separately. This is probably your main performance problem.
Please use singular names for tables, not plural.
For performance analysis, index structure and execution plans are vital information; you cannot really say much without them.
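For the first two points, a minimal sketch of merging the follower lookup into the main query and filtering before paging (entity and variable names are taken from the question; the projection is shortened):
// One round trip: the follower check becomes a correlated subquery,
// and the Where runs before OrderBy/Skip/Take so paging applies to filtered rows.
var votes = await _context.Votes
    .Where(vote => _context.UserRelations.Any(u =>
        u.FollowerId == id && u.State == true && u.FollowingId == vote.UserId))
    .OrderByDescending(vote => vote.CreationDate)
    .Skip(pageSize * pageIndex)
    .Take(pageSize)
    .Select(vote => new
    {
        vote.Id,
        vote.VoteQuestions
        // project the rest through navigation properties, as the next answer shows
    })
    .ToListAsync();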
As noted in the comments, you are potentially executing 100, 1000 or 10000 queries. For every Vote in your database that matches the first result you do 3 other queries.
For 1000 votes which result from the first query you need to do 3000 other queries to fetch the data. That's insane!
You have to use EF Core's eager loading feature to fetch this data with very few queries. If your models are designed well, with relations and navigation properties, it's easy.
When you load flat models without a projection (i.e. without using .Select), you have to use .Include to tell EF which other related entities it should load.
// Assuming your navigation property is called VoteMedia
await _context.Votes
    .Include(vote => vote.VoteMedia)
...
This would load all VoteMedia objects with the vote, so an extra query to get them is not necessary.
But if you use projections, the .Include calls are not necessary (in fact they are even ignored when you reference navigation properties in the projection).
// Assuming your navigation property is called VoteMedia
await _context.Votes
    .Include(vote => vote.VoteMedia)
    ...
    .Select(vote => new
    {
        Id = vote.Id,
        VoteQuestions = vote.VoteQuestions,
        // here you reference VoteMedia from your model;
        // EF Core recognizes that and will load VoteMedia too.
        //
        // When using _context.VoteMedias.Where(...), EF won't do that,
        // because you call directly into the context.
        VoteImages = vote.VoteMedias.Where(m => m.VoteId == vote.Id)
            .Select(k => k.MediaUrl.ToString()),
        // Same here
        Options = vote.VoteOptions.Where(m => m.VoteId == vote.Id).Select(ques => ...)
    });
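With EF Core 1.x in particular, it is worth checking the logged SQL to confirm the projection is actually translated to the server rather than silently evaluated on the client, since client evaluation was enabled by default in those versions.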
Context: EF4, C#, .NET4, SQL2008/R2
Tables/entities to repro the problem:
Account (long Id, string Name, etc.)
Order (long Id, DateTime DateToExecute, int OrderStatus, etc.)
AccountOrder (long Id, long AccountId, long OrderId) <- Yes, one account may have many orders and, likewise, one order may be associated with many accounts.
OrderedItem (long Id, long OrderId, long ItemId, etc.) <- One order may have many items, and we want to eager-load these items (I realize this has performance/data size implications).
Pseudocode (nearly real code) that would ideally work:
DateTime startDateInclusive = xxxx;
DateTime stopDateExclusive = yyy;
var query = Db.Accounts.Include(a => a.AccountOrders.Select(ao => ao.Order.Ordereditems.Select(oi => oi.Item)))
.Where(account =>
account.AccountOrders.Where(ao => ao.OrderStatus != 42)
.Max(ao => ao.DateToExecute).IsBetween(startDateInclusive, stopDateExclusive))
.OrderBy(account =>
account.AccountOrders.Where(ao => ao.OrderStatus != 42)
.Max(ao => ao.DateToExecute));
var results = query.Take(5).ToList();
In English, this looks for the next 5 accounts whose last order to execute falls within a date range. However, orders can also be cancelled, so we must exclude OrderStatus 42 when computing that Max.
The problem revolves around this filtered Max date across the many-to-many tables. An added complexity is that we need to sort by that filtered max value, and we must do all of the above without breaking our eager loading (i.e. joins must be done via projection in the Where and not with a .Join). I'm not sure how to write this query without the result being 10x more complex than it should be. I'd hate to repeat the join that filters ao.OrderStatus and takes the Max of DateToExecute three times (once for startDate, once for stopDate, and once for the sort). And clearly IsBetween isn't a real function.
Any ideas on how to perform this query, sorted this way, in a fairly-efficient way for the generated SQL?
It may be helpful to use an anonymous type here:
DateTime startDateInclusive = xxxx;
DateTime stopDateExclusive = yyy;
var query = Db.Accounts
.Select(account => new {
Account = account,
MaxDate = account.AccountOrders.Select(ao => ao.Order).Where(o => o.OrderStatus != 42).Max(o => o.DateToExecute)
})
.Where(a => a.MaxDate >= startDateInclusive && a.MaxDate < stopDateExclusive)
.OrderBy(a => a.MaxDate)
.Select(a => a.Account)
.Include(a => a.AccountOrders.Select(ao => ao.Order.Ordereditems.Select(oi => oi.Item)));
var results = query.Take(5).ToList();
This is untested as I don't have a data source to test against, but it's probably the simplest approach for what you need to do.
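One caveat: Include is only honored when the final shape of the query is the entity itself. Since the last Select returns the Account entity here, the eager load should still apply, but it is worth verifying against the generated SQL.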
Is there a way to get the whole count when using the Take operator?
You can do both.
IEnumerable<T> query = ...complicated query;
int c = query.Count();
query = query.Take(n);
Just execute the Count before the Take. This will cause the query to be executed twice, but I believe that is unavoidable.
If this is in a LINQ to SQL context, as your comment implies, then it will in fact query the database twice. As far as lazy loading goes, though, it depends on how the result of the query is actually used.
For example: suppose you have two tables, Product and ProductVersion, where each Product has multiple ProductVersions associated via a foreign key.
If this is your query:
var query = db.Products.Where(p => complicated condition).OrderBy(p => p.Name).ThenBy(...).Select(p => p);
where you are just selecting Products. But after executing the query:
var results = query.ToList(); // forces query execution
var versions = results[0].ProductVersions; // <-- lazy loading occurs
If you reference any foreign key or related object that was not part of the original query, it will be lazy loaded in. In your case, the Count will not cause any lazy loading because it simply returns an int. But depending on what you actually do with the result of the Take(), you may or may not have lazy loading occur. It can sometimes be difficult to tell whether lazy loading is occurring; to check, you should log your queries using the DataContext.Log property.
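For instance, a minimal sketch, assuming db is your DataContext:
// Send the generated SQL to the console as queries execute (LINQ to SQL).
db.Log = Console.Out;
var results = query.Take(n).ToList(); // the SELECT shows up in the console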
The easiest way would be to just do a Count of the query, and then do Take:
var q = ...;
var count = q.Count();
var result = q.Take(...);
It is possible to do this in a single LINQ to SQL query (where only one SQL statement will be executed). The generated SQL does look unpleasant, though, so your performance may vary.
If this is your query:
IQueryable<Person> yourQuery = People
.Where(x => /* complicated query .. */);
You can append the following to it:
var result = yourQuery
    .GroupBy(x => true) // This will match all of the rows from your query ..
    .Select(g => new {
        // .. so 'g', the group, will then contain all of the rows from your query.
        CountAll = g.Count(),
        TakeFive = g.Take(5),
        // We could also query for a max value.
        MaxAgeFromAll = g.Max(x => x.PersonAge)
    })
    .FirstOrDefault();
Which will let you access your data like so:
// Check that result is not null before accessing it.
// If there are no rows to find, 'result' will be null (because of the grouping).
if(result != null) {
var count = result.CountAll;
var firstFiveRows = result.TakeFive;
var maxPersonAge = result.MaxAgeFromAll;
}