Include vs Select projections in EF Core - C#

I have this query
var answers = await context.QuestionnaireUserAnswers
.Include(a => a.QuestionAnsweres)
.Include(a => a.QuestionnaireSentOut).ThenInclude(q => q.Questionnaire)
.Include(a => a.User).ThenInclude(w => w.Organization)
.Where(a => a.QuestionnaireSentOut.Questionnaire.Status != QuestionnaireStatus.ArchivedDraft &&
a.QuestionnaireSentOut.Questionnaire.Status != QuestionnaireStatus.ArchivedPublished)
.Where(a => a.QuestionnaireSentOut.QuestionnaireType.Equals(QuestionnaireSentType.Normal))
.Where(a => a.Status.Equals(QuestionAnsweredStatus.Draft))
.Where(a => a.NextReminderDate < DateTime.UtcNow)
.ToListAsync();
which I changed to this
var answers = await context.QuestionnaireUserAnswers
.Where(a => a.QuestionnaireSentOut.Questionnaire.Status != QuestionnaireStatus.ArchivedDraft &&
a.QuestionnaireSentOut.Questionnaire.Status != QuestionnaireStatus.ArchivedPublished)
.Where(a => a.QuestionnaireSentOut.QuestionnaireType.Equals(QuestionnaireSentType.Normal))
.Where(a => a.Status.Equals(QuestionAnsweredStatus.Draft))
.Where(a => a.NextReminderDate < DateTime.UtcNow)
.Select(ans => new QuestionnaireApiServiceReminderModel
{
AnswerId = ans.Id,
Created = ans.Created,
NextReminderDate = ans.NextReminderDate,
QuestionnaireSentOutQuestionnaireTitle = ans.QuestionnaireSentOut.Questionnaire.Title,
QuestionnaireSentOutTitle = ans.QuestionnaireSentOut.Questionnaire.Title,
User = new SimpleUserWithOrganizationId
{
OrganizationId = ans.User.OrganizationId,
UserId = ans.UserId,
Email = ans.User.Email,
Name = ans.User.Name,
}
})
.ToListAsync();
so that I only get those properties that I need.
Down below my code does this
foreach (var answer in answers) {
.. // some calls
answer.NextReminderDate = DateTime.UtcNow.AddDays(7);
}
await context.SaveChangesAsync();
Now if I go with my updated version, I need to fetch the corresponding QuestionnaireUserAnswers before updating them, which means multiple round trips to the database. Does this mean that in this case it is better to use the first query with the includes?
One way to handle this could be to write something like this
await context.Database.ExecuteSqlInterpolatedAsync($"UPDATE QuestionnaireUserAnswers SET NextReminderDate = {newNextReminderDate} WHERE Id IN ({string.Join(',', answers.Select(x => x.AnswerId))})");
but is this an approved EF Core way?
So is this a better solution in this scenario?
var answers = await context.QuestionnaireUserAnswers
.Where(a => a.QuestionnaireSentOut.Questionnaire.Status != QuestionnaireStatus.ArchivedDraft &&
a.QuestionnaireSentOut.Questionnaire.Status != QuestionnaireStatus.ArchivedPublished)
.Where(a => a.QuestionnaireSentOut.QuestionnaireType.Equals(QuestionnaireSentType.Normal))
.Where(a => a.Status.Equals(QuestionAnsweredStatus.Draft))
.Where(a => a.NextReminderDate < DateTime.UtcNow)
.Select(ans => new QuestionnaireApiServiceReminderModel
{
Answer = ans,
AnswerId = ans.Id,
Created = ans.Created,
NextReminderDate = ans.NextReminderDate,
QuestionnaireSentOutQuestionnaireTitle = ans.QuestionnaireSentOut.Questionnaire.Title,
QuestionnaireSentOutTitle = ans.QuestionnaireSentOut.Questionnaire.Title,
User = new SimpleUserWithOrganizationId
{
OrganizationId = ans.User.OrganizationId,
UserId = ans.UserId,
Email = ans.User.Email,
Name = ans.User.Name,
}
})
.ToListAsync();
//Sending reminders
foreach (var answer in answers)
{
try
{
answer.Answer.NextReminderDate = DateTime.UtcNow.AddDays(7);
QuestionnaireReminder(
answer.User.Name,
answer.User.Email,
answer.QuestionnaireSentOutQuestionnaireTitle,
answer.QuestionnaireSentOutTitle,
answer.Created.ToString(),
answer.AnswerId,
appServer,
answer.User.OrganizationId, _configuration);
}
catch (Exception)
{
await ErrorHandleService.HandleError("Could not send questionnaire reminder",
"Could not send questionnaire reminder email to: " + answer.User.Email,
null, null, null, _configuration, sendDeveloperMails, currentVersion, whoAmIName);
}
}
await context.SaveChangesAsync();

The short answer is Yes, and No.
When reading data, project to DTO/ViewModels like your updated code. This reduces the size of the resulting data to pull back just the info from the related entities that is needed.
When updating data, load the tracked entity/entities.
The "No" part of the answer is that the Include statements in your first query are not needed. Remove them to avoid sending extra data across the wire and potentially forming a Cartesian product across one-to-many tables via JOINs.
var answers = await context.QuestionnaireUserAnswers
.Where(a => a.QuestionnaireSentOut.Questionnaire.Status != QuestionnaireStatus.ArchivedDraft &&
a.QuestionnaireSentOut.Questionnaire.Status != QuestionnaireStatus.ArchivedPublished)
.Where(a => a.QuestionnaireSentOut.QuestionnaireType.Equals(QuestionnaireSentType.Normal))
.Where(a => a.Status.Equals(QuestionAnsweredStatus.Draft))
.Where(a => a.NextReminderDate < DateTime.UtcNow)
.ToListAsync();
Include is not needed to perform Where conditions. EF will work those out into the query just fine. It is only needed when you want to access those related entities while working with the result set; otherwise they would be lazy loaded, potentially queuing up several DB round trips.
This holds provided the "some calls" don't attempt to access related entities on the answer. If your logic does need some data from QuestionnaireSentOut, Questionnaire, etc., then you can mix projection with the entity reference:
.Select(ans => new
{
    QuestionnaireSentOutQuestionnaireTitle = ans.QuestionnaireSentOut.Questionnaire.Title,
    QuestionnaireSentOutTitle = ans.QuestionnaireSentOut.Questionnaire.Title,
    Answer = ans, // the tracked entity, used later for the update
    // User is a single reference navigation, so project it directly instead of calling Select on it
    User = new
    {
        OrganizationId = ans.User.OrganizationId,
        UserId = ans.UserId,
        Email = ans.User.Email,
        Name = ans.User.Name,
    }
})
Similar to your projection, you can return the answer entity in the resulting query for processing purposes. Inside the loop, if the logic needs details from the user or questionnaire, it gets them from the projection; when you go to update the answer itself, update it via the .Answer reference, which is the tracked EF entity. This approach should not be used for read-type operations such as populating a ViewModel to be sent back to a view. It's advisable not to mix view models and entities, to avoid errors and performance issues with serialization and the like. Where I need details like this, I use anonymous types to discourage passing objects with entity references around.

You should be able to do this with the free library Entity Framework Plus, using its Batch Update feature. The feature was created to work on multiple records, but it works just as well for a single record.
It constructs an UPDATE ... SET ... WHERE ... query similar to what you wrote yourself, without loading any data from the database into the DbContext.
Sample code:
using Z.EntityFramework.Plus;
var id = .... ;
var newDate = DateTime.UtcNow.AddDays(7);
context.QuestionnaireUserAnswers
.Where(a => a.Id == id)
.Update(a => new QuestionnaireUserAnswer() { NextReminderDate = newDate });
Or for a collection of Id values:
using Z.EntityFramework.Plus;
var ids = new[] { ... , ... , ... };
var newDate = DateTime.UtcNow.AddDays(7);
context.QuestionnaireUserAnswers
.Where(a => ids.Contains(a.Id))
.Update(a => new QuestionnaireUserAnswer() { NextReminderDate = newDate });
A note: I believe this will not update any QuestionnaireUserAnswer objects that are already loaded in your DbContext. This may not be much of a problem in ASP.NET Core (or an API), where contexts are short-lived, but it could be an issue if you use the context for a longer period.

Related

C# linq Contains method with List

I need help with the LINQ Contains method. Here's the code below.
This code does work, but it outputs an empty set.
var query = _context.RegistrationCodes.Select(x => x);
if (request.OperatorId != null && request.OperatorId != Guid.Empty)
{
var checkOperator = _context.Operators.Include(a => a.OperatorLevel).Include(a => a.City).Include("City.StateRegion.Country").FirstOrDefault(a => a.Id == request.OperatorId);
List<String> Cities = new List<String>();
if (checkOperator.OperatorLevel.Name == "City")
{
Cities = await _context.Cities
.Where(a => (checkOperator.CityId) == (a.Id))
.Select(a => a.Code)
.ToListAsync();
}
else if (checkOperator.OperatorLevel.Name == "Regional")
{
Cities = await _context.Cities
.Where(a => checkOperator.City.StateRegionId == a.StateRegionId)
.Select(a => a.Code)
.ToListAsync();
}
else if (checkOperator.OperatorLevel.Name == "National")
{
List<Guid> StateRegion = await _context.StateRegions
.Where(a => checkOperator.City.StateRegion.CountryId == a.CountryId)
.Select(a => a.Id)
.ToListAsync();
Cities = await _context.Cities
.Where(a => StateRegion.Contains(a.StateRegionId))
.Select(a => a.Code)
.ToListAsync();
}
var nullableStrings = Cities.Cast<String?>().ToList();
query = query.Where(a => nullableStrings.Contains(a.Code));
}
I need to compare nullableStrings against a.Code, something like the following, but it does not work:
query = query.Where(a => a.Code.Contains(nullableStrings));
Error : Argument 1: cannot convert from 'System.Collections.Generic.List' to 'char'
I need a method that would replace
query = query.Where(a => nullableStrings.Contains(a.Code));
Any help would be appreciated. Thanks.
Looking at the code, my guess is the requirement is to get a list of operators depending on the current (check) operator's level. I suspect the issue you are encountering is that some cities may not have a code. You then want to apply all found codes to another query that you are building up.
My guess is that the crux of the problem is that some cities might not have a code, hence the concern for nullable strings, while others might have multiple codes hacked into a field intended for a single code. The solution there would typically be to remove any null values.
Firstly, this line:
var checkOperator = _context.Operators.Include(a => a.OperatorLevel).Include(a => a.City).Include("City.StateRegion.Country").FirstOrDefault(a => a.Id == request.OperatorId);
can be simplified to:
var checkOperator = _context.Operators
    .Where(a => a.Id == request.OperatorId) // filter on the entity before projecting; the projection has no Id
    .Select(a => new
    {
        Level = a.OperatorLevel.Name,
        CityId = a.City.Id,
        CityCode = a.City.Code,
        StateRegionId = a.City.StateRegion.Id,
        CountryId = a.City.StateRegion.Country.Id
    }).FirstOrDefault();
This builds a faster query: rather than fetching an entire operator object graph, it selects just the fields that we need.
Now to handle the operator level. Here I don't recommend trying to force every scenario into a single pattern. The goal is just to apply a filter to the built query, so have the scenarios do just that:
switch (checkOperator.Level)
{
    case "City":
        query = query.Where(a => a.Code == checkOperator.CityCode);
        break;
    case "Regional":
    {
        // Braces give each case its own scope so cityCodes can be declared twice.
        var cityCodes = await _context.Cities
            .Where(a => a.Code != null && a.StateRegion.Id == checkOperator.StateRegionId)
            .Select(a => a.Code)
            .ToListAsync();
        query = query.Where(a => cityCodes.Contains(a.Code));
        break;
    }
    case "National": // the level name used in the question
    {
        var cityCodes = await _context.Cities
            .Where(a => a.Code != null && a.StateRegion.Country.Id == checkOperator.CountryId)
            .Select(a => a.Code)
            .ToListAsync();
        query = query.Where(a => cityCodes.Contains(a.Code));
        break;
    }
}
Now, based on the comments, it sounds like your data for cities and codes breaks proper normalization: Code was intended as a 1-to-1 value but was later hacked to handle one city having multiple codes, so multiple values were concatenated with hyphens (e.g. ABC-DEF). If this represents two codes for the city, then you will need to handle it:
private List<string> splitCityCodes(List<string> cityCodes)
{
    if (cityCodes == null) throw new ArgumentNullException(nameof(cityCodes));
    if (!cityCodes.Any()) throw new ArgumentException("At least one city code is expected.");
    // Nothing hyphenated: return the list unchanged.
    if (!cityCodes.Any(x => x.Contains("-")))
        return cityCodes;
    // Split hyphenated values such as "ABC-DEF" into separate codes and drop duplicates.
    return cityCodes
        .SelectMany(x => x.Split('-'))
        .Distinct()
        .ToList();
}
That can probably be optimized, but the gist is to take the city codes, look for hyphenated values and split them up, then return a distinct list to remove any duplicates.
List<string> cityCodes = new List<string>();
switch (checkOperator.Level)
{
case "City":
cityCodes = splitCityCodes(new []{checkOperator.CityCode}.ToList());
if(cityCodes.Count == 1)
query = query.Where(a => a.Code == cityCodes[0]);
else
query = query.Where(a => cityCodes.Contains(a.Code));
break;
case "Regional":
cityCodes = await _context.Cities
.Where(a => a.Code != null && a.StateRegion.Id == checkOperator.StateRegionId)
.Select(a => a.Code)
.ToListAsync();
cityCodes = splitCityCodes(cityCodes);
query = query.Where(a => cityCodes.Contains(a.Code));
break;
case "National":
cityCodes = await _context.Cities
.Where(a => a.Code != null && a.StateRegion.Country.Id == checkOperator.CountryId)
.Select(a => a.Code)
.ToListAsync();
cityCodes = splitCityCodes(cityCodes);
query = query.Where(a => cityCodes.Contains(a.Code));
break;
}
... and I suspect that would about do it for handling the possibility of a city code containing multiple values.
If your search argument is in the form "ABC-DEF" and you want that to match "ABC" OR "DEF" then it can be done, but it is not clear from your data setup how that scenario comes about.
Let's assume these codes are airport codes, and that a city with multiple airports stores City.Code as a hyphenated list of the airport codes. If the checkOperator is in Australia and their OperatorLevel is "National", then this might build the following nullableStrings:
var Cities = new List<string> {
"PER",
"ADE",
"DRW",
"MEL-AVV",
"SYD",
"BNE",
"OOL",
"HBA"
};
If your query is then a listing of airports and you want to search the airports by these codes, specifically to match both "MEL" and "AVV", then you can use syntax like this:
var nullableStrings = Cities.Cast<String?>().ToList();
query = query.Where(ap => nullableStrings.Any(n => n.Contains(ap.Code)));
But if you intend this to be translated to SQL via LINQ to Entities (and so executed server-side), then we can make the query more efficient by normalizing the search arguments so that we can do an exact-match lookup:
var nullableStrings = Cities.Where(x => !String.IsNullOrWhiteSpace (x))
.SelectMany(x => x.Split('-'))
.Cast<String?>()
.ToList();
query = query.Where(ap => nullableStrings.Contains(ap.Code));
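The normalization step can be tried on its own with plain LINQ to Objects, outside EF. The following is a minimal sketch under stated assumptions: the `Normalize` helper, the sample codes and the `airports` array are hypothetical stand-ins for the city codes and the rows behind `query`, not part of the original code.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public static class CityCodeDemo
{
    // Hypothetical helper: drops null/blank entries, splits hyphenated
    // values such as "MEL-AVV" into single codes and removes duplicates.
    public static List<string> Normalize(IEnumerable<string?> codes) =>
        codes.Where(c => !string.IsNullOrWhiteSpace(c))
             .SelectMany(c => c!.Split('-'))
             .Distinct()
             .ToList();

    public static void Main()
    {
        var cities = new List<string?> { "PER", "MEL-AVV", null, "SYD" };
        var lookup = Normalize(cities);

        // Hypothetical airport codes standing in for the rows of `query`.
        var airports = new[] { "MEL", "AVV", "BNE", "SYD" };
        var matches = airports.Where(code => lookup.Contains(code)).ToList();

        Console.WriteLine(string.Join(",", lookup));   // PER,MEL,AVV,SYD
        Console.WriteLine(string.Join(",", matches));  // MEL,AVV,SYD
    }
}
```

Because the lookup list now holds single codes only, the final filter is an exact-match `Contains`, which is the shape a query provider can usually turn into an `IN (...)` clause.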
As this routine is called as part of a larger set and your checkOperator goes out of scope, you should try to reduce the fields that you retrieve from the database to the specific set that this query needs, through a projection.
Using .Select() to project out specific fields can help improve the overall efficiency of the database, not just each individual query. If the additional fields are minimal or natural surrogate keys, and your projections are common to other query scenarios then they can make good candidates for specific index optimizations.
Instead of effectively running SELECT * against every table in this include list:
var checkOperator = _context.Operators.Include(a => a.OperatorLevel)
.Include(a => a.City.StateRegion.Country)
.FirstOrDefault(a => a.Id == request.OperatorId);
we can load just the fields that our logic needs, rather than every field from OperatorLevel, City, StateRegion and Country:
var checkOperator = _context.Operators.Where(o => o.Id == request.OperatorId)
.Select(o => new {
OperatorLevelName = o.OperatorLevel.Name,
o.CityId,
o.City.StateRegionId,
o.City.StateRegion.CountryId
})
.FirstOrDefault();
Many of the "EF has poor performance" opinions out there stem from the poorly defined examples that proliferate on the web. Eager loading is the same as executing SELECT * FROM ...; for simple tables it only wastes bandwidth and memory, but for complex tables that have computed columns or custom expressions there can be significant server CPU costs.
The improvements you can get from projections cannot be overstated: expose only the specific subset of the data that you need, especially if you will not be modifying the results of the query.
Be a good corporate citizen, only take what you need!
So let's put this back into your logic:
if (request.OperatorId != null && request.OperatorId != Guid.Empty)
{
var checkOperator = _context.Operators.Where(o => o.Id == request.OperatorId)
.Select(o => new {
OperatorLevelName = o.OperatorLevel.Name,
o.CityId,
o.City.StateRegionId,
o.City.StateRegion.CountryId
})
.FirstOrDefault();
IQueryable<City> cityQuery = null;
if (checkOperator.OperatorLevelName == "City")
cityQuery = _context.Cities
.Where(a => checkOperator.CityId == a.Id);
else if (checkOperator.OperatorLevelName == "Regional")
cityQuery = _context.Cities
.Where(a => checkOperator.StateRegionId == a.StateRegionId);
else if (checkOperator.OperatorLevelName == "National")
cityQuery = _context.Cities
.Where(c => c.StateRegion.CountryId == checkOperator.CountryId);
// TODO: is there any default filter when operator level is something else?
if (cityQuery != null)
{
var nullableStrings = cityQuery.Select(a => a.Code)
.ToList()
.Where(x => !String.IsNullOrWhiteSpace(x))
.SelectMany(x => x.Split('-'))
.Cast<String?>()
.ToList();
query = query.Where(ap => nullableStrings.Contains(ap.Code));
}
}
If you don't want or need to normalize the strings, then you can defer this whole expression without realizing the city query at all:
// No nullable string, but we can still remove missing Codes
cityQuery = cityQuery.Where(c => c.Code != null);
query = query.Where(ap => cityQuery.Any(c => c.Code.Contains(ap.Code)));

Optimize ef-core query

Does anyone have any ideas on how to improve or optimize this query in terms of performance?
Include cannot be used because the foreign keys / navigation properties are missing; this is a scaffolded model.
using (var session = new Typo3DBContext())
{
var countryList = session.TxNeustageodataDomainModelCountry
.Where(x => x.Deleted == 0)
.Join(session.TxNeustameinereiseDomainModelTripCountryMm,
country => (uint)country.Uid,
tripMM => tripMM.UidForeign,
(country, tripMM) =>
new
{
country = country,
tripMM = tripMM
})
.Join(session.TxNeustameinereiseDomainModelTrip,
combinedEntry => combinedEntry.tripMM.UidLocal,
trip => trip.Uid,
(combinedEntry, trip) =>
new
{
combinedEntry = combinedEntry,
trip = trip
})
.GroupBy(
temp =>
new
{
Name = temp.combinedEntry.country.Name,
Iso = temp.combinedEntry.country.Iso,
Id = temp.combinedEntry.tripMM.UidForeign,
Status = temp.trip.Status,
Deleted = temp.trip.Deleted
},
temp => temp.combinedEntry.tripMM
)
.Where(x => x.Key.Status == 2 && x.Key.Deleted == 0)
.Select(
group =>
new CountryHelperClass
{
Count = group.Count(),
Iso = group.Key.Iso,
Name = group.Key.Name,
Id = group.Key.Id
})
.ToList();
return countryList;
}
You may analyze the generated SQL first and check whether optimal SQL is being generated; you may follow this link to start. Another good tool for working with LINQ queries is LINQPad. Some of the common issues with LINQ queries are:
The ‘N+1 Select’ problem (if you are using EF Core 3, this and other SQL-related issues are being optimized)
Being too greedy with rows and columns
Change-tracking related issues
Missing indexes
Details of these issues can be found in the link above and elsewhere on the internet.
Normally I go for the stored procedure approach for complex queries, as it saves a lot of time optimizing queries.

Linq - Order by in Include

I have a situation where an OrderBy needs to be applied to an included collection. This is what I have tried so far:
Customers query = null;
try
{
query = _context.Customers
.Include(x => x.CustomerStatus)
.ThenInclude(x => x.StatusNavigation)
.Select(x => new Customers()
{
Id = x.Id,
Address = x.Address,
Contact = x.Contact,
Name = x.Name,
CustomerStatus = new List<CustomerStatus>
{
x.CustomerStatus.OrderByDescending(y => y.Date).FirstOrDefault()
}
})
.FirstOrDefault(x => x.Id == 3);
}
catch (Exception ex)
{
throw;
}
The above code successfully orders the included element, but it does not include its child table.
E.g. Customer includes CustomerStatus, but CustomerStatus does not include the StatusNavigation table.
I even tried this, but it did not help either:
_context.Customers
.Include(x => x.CustomerStatus.OrderByDescending(y => y.Date).FirstOrDefault())
.ThenInclude(x => x.StatusNavigation).FirstOrDefault(x => x.Id == 3);
What am I doing wrong? Please guide me.
I even tried it this way:
var query = _context.CustomerStatus
.GroupBy(x => x.CustomerId)
.Select(x => x.OrderByDescending(y => y.Date).FirstOrDefault())
.Include(x => x.StatusNavigation)
.Join(_context.Customers, first => first.CustomerId, second => second.Id, (first, second) => new Customers
{
Id = second.Id,
Name = second.Name,
Address = second.Address,
Contact = second.Contact,
CustomerStatus = new List<CustomerStatus> {
new CustomerStatus
{
Id = first.Id,
CustomerId = first.CustomerId,
Date = first.Date,
StatusNavigation = first.StatusNavigation
}
},
}).FirstOrDefault(x => x.Id == 3);
but this hits the database three times and filters the result in memory.
It first selects all data from customer status, then from status and then from customer, and then filters all the data in memory. Is there a more efficient way to do this?
This is how I have prepared my entity class
As @Chris Pratt mentioned, once you do new Customers() inside the Select you are creating a new model and discarding the models built by Entity Framework. My suggestion would be to have the query be just:
query = _context.Customers
.Include(x => x.CustomerStatus)
.ThenInclude(x => x.StatusNavigation);
This gives you an IQueryable, which is not executed until you materialize a result from it:
var customer3 = query.FirstOrDefault(x => x.Id == 3);
This returns the customer and the linked tables (CustomerStatus and StatusNavigation). Then you can create the object that you want:
var customer = new Customers()
{
    Id = customer3.Id,
    Address = customer3.Address,
    Contact = customer3.Contact,
    Name = customer3.Name,
    CustomerStatus = new List<CustomerStatus>
    {
        // keep only the most recent status
        customer3.CustomerStatus.OrderByDescending(y => y.Date).FirstOrDefault()
    }
};
In this way you can reuse the query for creating different response objects and make a single query to the database. The downside is that more memory is used than with the original query (though it shouldn't be too much of an issue).
If the model that is originally returned from the database doesn't meet the requirements (i.e. you always need to do: CustomerStatus = new List {...} ), it might indicate that the database schema is not well suited to the needs of the application, so a refactoring might be needed.
What I think is happening is that you are actually overriding the Include and ThenInclude. Include is explicitly to eager-load a navigation property. However, you're doing a couple of things that are likely hindering this.
First, you're selecting into a new Customers. That alone may be enough to break the logic of Include. Second, you're overriding what gets put in the CustomerStatus collection. That should ideally be just loaded in automatically via Include, but by altering it to just have the first entity, you're essentially throwing away the effect of Include. (Selecting a relationship is enough to cause a join to be issued, without explicitly calling Include.) Third, the ThenInclude is predicated on the Include, so overriding that is probably throwing out the ThenInclude as well.
All this is conjecture. I haven't done anything exactly like what you're doing here before, but nothing else makes sense.
Try selecting into a new CustomerStatus as well:
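CustomerStatus = x.CustomerStatus.OrderByDescending(o => o.Date).Select(s => new CustomerStatus
{
    // A named type needs explicit member assignments, taken from the inner lambda parameter s
    Id = s.Id,
    Status = s.Status,
    Date = s.Date,
    CustomerId = s.CustomerId,
    Customer = s.Customer,
    StatusNavigation = s.StatusNavigation
}).ToList()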
You can remove the Include/ThenInclude at that point, because the act of selecting these relationships will cause the join.
After reading a couple of sources (Source 1 and Source 2), I think what is happening is that if you use Select after Include, the Include is disregarded, even if you use the included data in the Select. To work around this, call .AsEnumerable() before calling Select. Be aware that everything after AsEnumerable() runs in memory, so the SQL query is issued without the Id filter.
query = _context.Customers
.Include(x => x.CustomerStatus)
.ThenInclude(x => x.StatusNavigation)
.AsEnumerable()
.Select(x => new Customers()
{
Id = x.Id,
Address = x.Address,
Contact = x.Contact,
Name = x.Name,
CustomerStatus = new List<CustomerStatus>
{
x.CustomerStatus.OrderByDescending(y => y.Date).FirstOrDefault()
}
})
.FirstOrDefault(x => x.Id == 3);

How can I reuse a subquery inside a select expression?

In my database I have two tables Organizations and OrganizationMembers, with a 1:N relationship.
I want to express a query that returns each organization with the first and last name of the first organization owner.
My current select expression works, but it's neither efficient nor does it look right to me, since every subquery gets defined multiple times.
await dbContext.Organizations
.AsNoTracking()
.Select(x =>
{
return new OrganizationListItem
{
Id = x.Id,
Name = x.Name,
OwnerFirstName = (x.Members.OrderBy(member => member.CreatedAt).First(member => member.Role == RoleType.Owner)).FirstName,
OwnerLastName = (x.Members.OrderBy(member => member.CreatedAt).First(member => member.Role == RoleType.Owner)).LastName,
OwnerEmailAddress = (x.Members.OrderBy(member => member.CreatedAt).First(member => member.Role == RoleType.Owner)).EmailAddress
};
})
.ToArrayAsync();
Is it somehow possible to summarize or reuse the subqueries, so I don't need to define them multiple times?
Note that I've already tried storing the subquery result in a variable. This doesn't work, because it requires converting the expression into a statement body, which results in a compiler error.
The subquery can be reused by introducing an intermediate projection (Select), which is the equivalent of the let operator in query syntax.
For instance:
dbContext.Organizations.AsNoTracking()
// intermediate projection
.Select(x => new
{
Organization = x,
Owner = x.Members
.Where(member => member.Role == RoleType.Owner)
.OrderBy(member => member.CreatedAt)
.FirstOrDefault()
})
// final projection
.Select(x => new OrganizationListItem
{
Id = x.Organization.Id,
Name = x.Organization.Name,
OwnerFirstName = x.Owner.FirstName,
OwnerLastName = x.Owner.LastName,
OwnerEmailAddress = x.Owner.EmailAddress
})
Note that in pre EF Core 3.0 you have to use FirstOrDefault instead of First if you want to avoid client evaluation.
Also, this does not make the generated SQL query better or faster: it still contains a separate inline subquery for each property included in the final select. Hence it improves readability, but not efficiency.
That's why it's usually better to project the nested object into an unflattened DTO property, i.e. instead of OwnerFirstName, OwnerLastName, OwnerEmailAddress, have a class with properties FirstName, LastName, EmailAddress and a property, say Owner, of that type in OrganizationListItem (similar to an entity with a reference navigation property). This way you will be able to use something like
dbContext.Organizations.AsNoTracking()
.Select(x => new
{
Id = x.Id,
Name = x.Name,
Owner = x.Members
.Where(member => member.Role == RoleType.Owner)
.OrderBy(member => member.CreatedAt)
.Select(member => new OwnerInfo // the new class
{
FirstName = member.FirstName,
LastName = member.LastName,
EmailAddress = member.EmailAddress
})
.FirstOrDefault()
})
Unfortunately in pre 3.0 versions EF Core will generate N + 1 SQL queries for this LINQ query, but in 3.0+ it will generate a single and quite efficient SQL query.
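The correspondence between an intermediate projection and `let` can be seen with in-memory data. This is only a sketch under stated assumptions: the `Member` and `Organization` records and the sample values are hypothetical, and it runs as LINQ to Objects, so it says nothing about the SQL that EF Core would generate. The `?.` operator used here is fine in memory but is not allowed inside an expression tree, so an EF query would need a different null check.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the entities in the question.
public record Member(string FirstName, string Role, DateTime CreatedAt);
public record Organization(int Id, string Name, List<Member> Members);

public static class LetDemo
{
    public static void Main()
    {
        var orgs = new List<Organization>
        {
            new(1, "Acme", new List<Member>
            {
                new("Bob", "Owner", new DateTime(2021, 5, 1)),
                new("Ann", "Owner", new DateTime(2020, 1, 1)),
                new("Cid", "Member", new DateTime(2019, 1, 1)),
            }),
        };

        // `let` names the subquery result once; the compiler rewrites it
        // into an intermediate Select, like the method-syntax version.
        var items =
            from org in orgs
            let owner = org.Members
                .Where(m => m.Role == "Owner")
                .OrderBy(m => m.CreatedAt)
                .FirstOrDefault()
            select new { org.Id, org.Name, OwnerFirstName = owner?.FirstName };

        foreach (var item in items)
            Console.WriteLine($"{item.Id} {item.Name} {item.OwnerFirstName}"); // 1 Acme Ann
    }
}
```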
How about this:
await dbContext.Organizations
.AsNoTracking()
.Select(x =>
{
var firstMember = x.Members.OrderBy(member => member.CreatedAt).First(member => member.Role == RoleType.Owner);
return new OrganizationListItem
{
Id = x.Id,
Name = x.Name,
OwnerFirstName = firstMember.FirstName,
OwnerLastName = firstMember.LastName,
OwnerEmailAddress = firstMember.EmailAddress
};
})
.ToArrayAsync();
How about doing this like
await dbContext.Organizations
.AsNoTracking()
.Select(x => new OrganizationListItem
{
Id = x.Id,
Name = x.Name,
OwnerFirstName = x.Members.FirstOrDefault(member => member.Role == RoleType.Owner).FirstName,
OwnerLastName = x.Members.FirstOrDefault(member => member.Role == RoleType.Owner).LastName,
OwnerEmailAddress = x.Members.FirstOrDefault(member => member.Role == RoleType.Owner).EmailAddress
})
.ToArrayAsync();

SQL Azure vs. On-Premises Timeout Issue - EF

I'm working on a report right now that runs great with our on-premises DB (just refreshed from PROD). However, when I deploy the site to Azure, I get a SQL Timeout during its execution. If I point my development instance at the SQL Azure instance, I get a timeout as well.
Goal: To output a list of customers that have had an activity created during the search range, and when that customer is found, get some other information about that customer regarding policies, etc. I've removed some of the properties below for brevity (as best I can)...
UPDATE
After lots of trial and error, I can get the entire query to run fairly consistently within 1000 ms so long as this block of code is not executed.
CurrentStatus = a.Activities
.Where(b => b.ActivityType.IsReportable)
.OrderByDescending(b => b.DueDateTime)
.Select(b => b.Status.Name)
.FirstOrDefault(),
With this code in place, things begin to go haywire. I think this Where clause is a big part of it: .Where(b => b.ActivityType.IsReportable). What is the best way to grab the status name?
EXISTING CODE
Any thoughts as to why SQL Azure would time out whereas on-premises would turn this around in less than 100 ms?
return db.Customers
.Where(a => a.Activities.Where(
b => b.CreatedDateTime >= search.BeginDateCreated
&& b.CreatedDateTime <= search.EndDateCreated).Count() > 0)
.Where(a => a.CustomerGroup.Any(d => d.GroupId== search.GroupId))
.Select(a => new CustomCustomerReport
{
CustomerId = a.Id,
Manager = a.Manager.Name,
Customer = a.FirstName + " " + a.LastName,
ContactSource= a.ContactSource!= null ? a.ContactSource.Name : "Unknown",
ContactDate = a.DateCreated,
NewSale = a.Sales
.Where(p => p.Employee.IsActive)
.OrderByDescending(p => p.DateCreated)
.Select(p => new PolicyViewModel
{
//MISC PROPERTIES
}).FirstOrDefault(),
ExistingSale = a.Sales
.Where(p => p.CancellationDate == null || p.CancellationDate <= myDate)
.Where(p => p.SaleDate < myDate)
.OrderByDescending(p => p.DateCreated)
.Select(p => new SalesViewModel
{
//MISC PROPERTIES
}).FirstOrDefault(),
CurrentStatus = a.Activities
.Where(b => b.ActivityType.IsReportable)
.OrderByDescending(b => b.DueDateTime)
.Select(b => b.Disposition.Name)
.FirstOrDefault(),
CustomerGroup = a.CustomerGroup
.Where(cd => cd.GroupId == search.GroupId)
.Select(cd => new GroupViewModel
{
//MISC PROPERTIES
}).FirstOrDefault()
}).ToList();
I cannot give you a definite answer but I would recommend approaching the problem by:
Run SQL profiler locally when this code is executed and see what SQL is generated and run. Look at the query execution plan for each query and look for table scans and other slow operations. Add indexes as needed.
Check your lambdas for things that cannot be easily translated into SQL. You might be pulling the contents of a table into memory and running lambdas on the results, which will be very slow. Change your lambdas or consider writing raw SQL.
Is the Azure database the same as your local database? If not, pull the data locally so your local system is indicative.
Remove sections (i.e. CustomerGroup then CurrentDisposition then ExistingSale then NewSale) and see if there is a significant performance improvement after removing the last section. Focus on the last removed section.
Looking at the code itself:
You use ".Count() > 0" on line 4. Use ".Any()" instead: the former asks for an exact count of all matching rows when you only want to know whether at least one row satisfies the condition.
Ensure fields referenced in where clauses have indexes, such as IsReportable.
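To illustrate the Count() versus Any() point, here is a minimal in-memory sketch using LINQ to Objects. The class AnyVersusCountDemo and its 1000-row Rows() source are hypothetical stand-ins, not part of the original code. Against a database the provider would typically translate the two calls to a COUNT and an EXISTS respectively, so the exact cost depends on the query plan, but the short-circuiting idea is the same.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AnyVersusCountDemo
{
    // Number of rows pulled from the source by the most recent check.
    public static int Visited;

    // Hypothetical stand-in for a table of 1000 rows; counts each row it hands out.
    private static IEnumerable<int> Rows()
    {
        for (var i = 1; i <= 1000; i++)
        {
            Visited++;
            yield return i;
        }
    }

    // Count() must walk every row before the comparison with 0 can be made.
    public static bool ExistsUsingCount()
    {
        Visited = 0;
        return Rows().Count(x => x > 0) > 0;
    }

    // Any() stops at the first row that satisfies the predicate.
    public static bool ExistsUsingAny()
    {
        Visited = 0;
        return Rows().Any(x => x > 0);
    }
}
```

After ExistsUsingCount() the Visited counter is 1000, while after ExistsUsingAny() it is 1, even though both return true.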
Short answer: use memory.
Long answer:
Because of either bad maintenance plans or limited hardware, running this query in one big lump is what's causing it to fail on Azure. Even if that weren't the case, all the navigation properties you're using would make this query generate a staggering number of joins. The answer here is to break it down into smaller pieces that Azure can run. I'm going to try to rewrite your query as multiple smaller, easier-to-digest queries that use the memory of your .NET application. Please bear with me as I make (more or less) educated guesses about your business logic and database schema and rewrite the query accordingly. Sorry for using the query form of LINQ, but I find that things such as join and group by are more readable in that form.
var activityFilterCustomerIds = db.Activities
.Where(a =>
a.CreatedDateTime >= search.BeginDateCreated &&
a.CreatedDateTime <= search.EndDateCreated)
.Select(a => a.CustomerId)
.Distinct()
.ToList();
var groupFilterCustomerIds = db.CustomerGroup
.Where(g => g.GroupId == search.GroupId)
.Select(g => g.CustomerId)
.Distinct()
.ToList();
var customers = db.Customers
.AsNoTracking()
.Where(c =>
activityFilterCustomerIds.Contains(c.Id) &&
groupFilterCustomerIds.Contains(c.Id))
.ToList();
var customerIds = customers.Select(x => x.Id).ToList();
var newSales =
(from s in db.Sales
where customerIds.Contains(s.CustomerId)
&& s.Employee.IsActive
group s by s.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Sale = grouped
.OrderByDescending(x => x.DateCreated)
.Select(x => new PolicyViewModel
{
// properties
})
.FirstOrDefault()
}).ToList();
var existingSales =
(from s in db.Sales
where customerIds.Contains(s.CustomerId)
&& (s.CancellationDate == null || s.CancellationDate <= myDate)
&& s.SaleDate < myDate
group s by s.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Sale = grouped
.OrderByDescending(x => x.DateCreated)
.Select(x => new SalesViewModel
{
// properties
})
.FirstOrDefault()
}).ToList();
var currentStatuses =
(from a in db.Activities.AsNoTracking()
where customerIds.Contains(a.CustomerId)
&& a.ActivityType.IsReportable
group a by a.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Status = grouped
.OrderByDescending(x => x.DueDateTime)
.Select(x => x.Disposition.Name)
.FirstOrDefault()
}).ToList();
var customerGroups =
(from cg in db.CustomerGroups
where cg.GroupId == search.GroupId
group cg by cg.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Group = grouped
.Select(x =>
new GroupViewModel
{
// ...
})
.FirstOrDefault()
}).ToList();
return customers
.Select(c =>
new CustomCustomerReport
{
// ... simple props
// ...
// ...
NewSale = newSales
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Sale)
.FirstOrDefault(),
ExistingSale = existingSales
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Sale)
.FirstOrDefault(),
CurrentStatus = currentStatuses
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Status)
.FirstOrDefault(),
CustomerGroup = customerGroups
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Group)
.FirstOrDefault(),
})
.ToList();
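One possible refinement of the final in-memory step: each Where(...).FirstOrDefault() above rescans the whole result list for every customer, so the stitching could instead go through a dictionary keyed by CustomerId. The sketch below is only an illustration under assumptions. The Customer, StatusRow and ReportRow records are hypothetical simplified shapes, not the real entities, and it relies on each CustomerId appearing at most once, which holds for the grouped queries above.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

public static class InMemoryStitching
{
    // Hypothetical, simplified shapes standing in for the real entities and view models.
    public record Customer(int Id, string Name);
    public record StatusRow(int CustomerId, string Status);
    public record ReportRow(int CustomerId, string Name, string? CurrentStatus);

    public static List<ReportRow> Build(
        IEnumerable<Customer> customers,
        IEnumerable<StatusRow> statuses)
    {
        // Build the lookup once; ToDictionary throws if a CustomerId repeats.
        var statusByCustomer = statuses.ToDictionary(s => s.CustomerId, s => s.Status);

        // Each customer now costs one hash lookup instead of a scan of the list.
        return customers
            .Select(c => new ReportRow(
                c.Id,
                c.Name,
                statusByCustomer.TryGetValue(c.Id, out var status) ? status : null))
            .ToList();
    }
}
```

Customers without a matching row get a null status, which mirrors what FirstOrDefault() returns in the version above.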
It is hard to suggest anything without seeing the actual table definitions, especially the indexes and foreign keys on the Activities entity.
As far as I understand, Activity looks like (CustomerId, ActivityTypeId, DueDateTime, DispositionId). If this is a standard warehousing table (DateTime, ClientId, Activity), I'd suggest the following:
If the number of Activities is reasonably small, then force the use of Contains by
var activities = db.Activities.Where( x => x.IsReportable ).ToList();
...
.Where( b => activities.Contains(b.Activity) )
You can even help the optimiser by specifying that you want only ActivityId.
Indexes on the Activity entity should be up to date. For this particular query I suggest (CustomerId, ActivityId, DueDateTime DESC).
Precache the Disposition table; my crystal ball tells me that it's a dictionary table.
For a similar task, to avoid constantly hitting the Activity table, I made another small table (CustomerId, LastActivity, LastValue) and updated it whenever the status changed.
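The summary-table idea can be sketched in memory as follows. This is only an illustration of the bookkeeping, assuming one row per customer; the class LastActivitySummary and its member names are hypothetical, and in practice the rows would live in a real table updated alongside each activity write.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public sealed class LastActivitySummary
{
    // One entry per customer: the most recent activity time seen so far and its value.
    private readonly Dictionary<int, (DateTime LastActivity, string LastValue)> _rows = new();

    // Call whenever an activity is recorded; an older activity never overwrites a newer one.
    public void Record(int customerId, DateTime activityTime, string value)
    {
        if (!_rows.TryGetValue(customerId, out var current)
            || activityTime >= current.LastActivity)
        {
            _rows[customerId] = (activityTime, value);
        }
    }

    // Returns the current value for a customer, or null when nothing has been recorded.
    public string? CurrentValue(int customerId) =>
        _rows.TryGetValue(customerId, out var row) ? row.LastValue : null;
}
```

Reading the current status then becomes a single keyed lookup rather than an ordered scan of the customer's activities.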
