I am going through this tutorial to help me better understand the EF Structure. I currently use SQL.
https://learn.microsoft.com/en-us/aspnet/core/data/ef-rp/read-related-data?view=aspnetcore-2.1&tabs=visual-studio
This example shows the instructor, office, student, course, grade, and assignments:
public async Task OnGetAsync(int? id, int? courseID)
{
Instructor = new InstructorIndexData();
Instructor.Instructors = await _context.Instructors
.Include(i => i.OfficeAssignment)
.Include(i => i.CourseAssignments)
.ThenInclude(i => i.Course)
.ThenInclude(i => i.Department)
.Include(i => i.CourseAssignments)
.ThenInclude(i => i.Course)
.ThenInclude(i => i.Enrollments)
.ThenInclude(i => i.Student)
.AsNoTracking()
.OrderBy(i => i.LastName)
.ToListAsync();
if (id != null)
{
InstructorID = id.Value;
Instructor instructor = Instructor.Instructors.Where(
i => i.ID == id.Value).Single();
Instructor.Courses = instructor.CourseAssignments.Select(s => s.Course);
}
if (courseID != null)
{
CourseID = courseID.Value;
Instructor.Enrollments = Instructor.Courses.Where(
x => x.CourseID == courseID).Single().Enrollments;
}
}
To help me better understand the syntax, would this SQL statement be the equivalent?
SELECT *
FROM Instructor INNER JOIN
OfficeAssignment ON Instructor.ID = OfficeAssignment.InstructorID INNER JOIN
Department ON Instructor.ID = Department.InstructorID INNER JOIN
Course ON Department.DepartmentID = Course.DepartmentID INNER JOIN
Enrollment ON Course.CourseID = Enrollment.CourseID INNER JOIN
CourseAssignment ON Course.CourseID = CourseAssignment.CourseID INNER JOIN
Student ON Enrollment.StudentID = Student.ID
WHERE Instructor.ID = @ID AND Course.CourseID = @CourseID ORDER BY Instructor.Lastname
It helps to use entities as objects rather than thinking of them as tables. Yes, they typically correlate directly to the underlying tables, but that is a means to an end. You can leverage the relationships more directly than simply treating it as another way to write SQL.
For example:
Instructor.Instructors = await _context.Instructors
.Include(i => i.OfficeAssignment)
.Include(i => i.CourseAssignments)
.ThenInclude(i => i.Course)
.ThenInclude(i => i.Department)
.Include(i => i.CourseAssignments)
.ThenInclude(i => i.Course)
.ThenInclude(i => i.Enrollments)
.ThenInclude(i => i.Student)
.AsNoTracking()
.OrderBy(i => i.LastName)
.ToListAsync();
This will correspond roughly to an SQL statement with a bunch of joins and an ORDER BY clause. In the realm of EF though, this would be considered bad practice. The reason is that, like an SQL statement with joins, you are effectively doing a "SELECT *" across all of those tables. Do you really want all of the columns of all of the joined tables?
AsNoTracking() merely tells EF that you aren't going to modify the data retrieved, so it doesn't need to track dirty state. This is a performance tweak for read operations.
ToListAsync() performs the query as an awaitable operation, which frees up the thread the method was called on. There is no magic multi-threaded execution here: the call hands off to SQL Server, releases its thread, and is then assigned a thread again at the continuation point after the await.
One warning sign I see in the example is the use of nullable parameters. Can this method validly be called with:
Neither an ID nor a course ID?
and
An ID with no course ID?
and
A course ID with no ID?
and
Both an ID and course ID?
If any of these combinations is invalid then the method should be split up or refined.
Getting back to the "SELECT *" behaviour, using EF you have a lot of power hiding behind the scenes ready to turn Linq map/reduce operations into SQL to run against the server and return you a meaningful, minimal set of data.
For example:
var query = _context.Instructors.AsQueryable();
if (id.HasValue)
    query = query.Where(i => i.ID == id.Value);
query = query.OrderBy(i => i.LastName);

// First query: project only the columns the view needs.
var instructors = await query.Select(i => new InstructorIndexData
{
    InstructorId = i.ID,
    // ...
    Courses = i.CourseAssignments.Select(ca => new CourseData
    {
        CourseId = ca.Course.CourseID,
        CourseName = ca.Course.Title,
        // ...
    }).ToList()
}).ToListAsync();

if (courseID.HasValue)
{
    // Second query: enrollments for the requested course only.
    var enrollments = await query
        .SelectMany(i => i.CourseAssignments
            .Where(ca => ca.CourseID == courseID.Value)
            .SelectMany(ca => ca.Course.Enrollments)
            .Select(e => new EnrollmentData
            {
                InstructorId = i.ID,
                EnrollmentId = e.EnrollmentID,
                CourseId = e.CourseID,
                // ...
            }))
        .ToListAsync();

    // From here, group the enrollments by instructor ID and add them to the instructor index data.
    foreach (var group in enrollments.GroupBy(e => e.InstructorId))
    {
        var instructor = instructors.Single(i => i.InstructorId == group.Key);
        instructor.Enrollments = group.ToList();
    }
}
Now the caveat here is that I'm basing this on memory and with a rough guess of your structure and desired output. The key points would be leveraging the IQueryable and issuing Select statements to just pull back the exact data you need to populate the objects you want to provide to a view.
I do this in 2 query executions, one to get the instructor(s), then the second to get the enrollments if requested based on the provided course ID. Personally I'd split this into two methods since I'd expect the enrollments would be optional. Also there is a difference between fetching one instructor, and all instructors. In cases where potentially large amounts of data are returned, you should look at establishing pagination with Skip() and Take() to avoid expensive queries bogging down the CPU, network, and memory usage.
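To make the Skip() and Take() suggestion concrete, here is a minimal sketch of a paging helper. The name GetPage and the 1-based pageNumber convention are my own assumptions, not anything from the tutorial. It takes an IQueryable so that, against EF, both operators become part of the generated SQL rather than being applied in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingSketch
{
    // Hypothetical helper: returns one page of an already-ordered query.
    // pageNumber is 1-based. The query must be ordered before paging,
    // otherwise the page contents are not well defined.
    public static List<T> GetPage<T>(IQueryable<T> orderedQuery, int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        return orderedQuery
            .Skip((pageNumber - 1) * pageSize) // skip the rows on earlier pages
            .Take(pageSize)                    // then read at most one page
            .ToList();
    }
}
```

With the query above this would be called as something like PagingSketch.GetPage(query, 2, 25) after the OrderBy; in an async page handler you would more likely inline the Skip/Take and finish with ToListAsync().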
Related
I have a situation where an OrderBy needs to be applied to an included collection. This is what I have tried so far:
Customers query = null;
try
{
query = _context.Customers
.Include(x => x.CustomerStatus)
.ThenInclude(x => x.StatusNavigation)
.Select(x => new Customers()
{
Id = x.Id,
Address = x.Address,
Contact = x.Contact,
Name = x.Name,
CustomerStatus = new List<CustomerStatus>
{
x.CustomerStatus.OrderByDescending(y => y.Date).FirstOrDefault()
}
})
.FirstOrDefault(x => x.Id == 3);
}
catch (Exception ex)
{
throw;
}
The above code successfully orders the included collection, but it does not include its child table.
E.g. Customer includes CustomerStatus, but CustomerStatus does not include the StatusNavigation table.
I also tried this, but it didn't help either:
_context.Customers
.Include(x => x.CustomerStatus.OrderByDescending(y => y.Date).FirstOrDefault())
.ThenInclude(x => x.StatusNavigation).FirstOrDefault(x => x.Id == 3);
What am I doing wrong? Could someone please guide me?
I also tried it this way:
var query = _context.CustomerStatus
.GroupBy(x => x.CustomerId)
.Select(x => x.OrderByDescending(y => y.Date).FirstOrDefault())
.Include(x => x.StatusNavigation)
.Join(_context.Customers, first => first.CustomerId, second => second.Id, (first, second) => new Customers
{
Id = second.Id,
Name = second.Name,
Address = second.Address,
Contact = second.Contact,
CustomerStatus = new List<CustomerStatus> {
new CustomerStatus
{
Id = first.Id,
CustomerId = first.CustomerId,
Date = first.Date,
StatusNavigation = first.StatusNavigation
}
},
}).FirstOrDefault(x => x.Id == 3);
but this hits the database 3 times and filters the result in memory.
It first selects all the data from customer status, then from status, then from customer, and only then filters everything in memory. Is there a more efficient way to do this?
This is how I have prepared my entity class
As @Chris Pratt mentioned, once you do new Customers inside the Select you are creating a new model and discarding the models built by Entity Framework. My suggestion would be to keep the query as just:
query = _context.Customers
.Include(x => x.CustomerStatus)
.ThenInclude(x => x.StatusNavigation);
Like this you have an IQueryable object, which is not executed until you materialize it, for example with FirstOrDefault:
var customer3 = query.FirstOrDefault(x => x.Id == 3);
Which returns the customer and the interlinked tables (CustomerStatus and StatusNavigation). Then you can create the object that you want:
var customer = new Customers()
{
    Id = customer3.Id,
    Address = customer3.Address,
    Contact = customer3.Contact,
    Name = customer3.Name,
    CustomerStatus = new List<CustomerStatus>
    {
        customer3.CustomerStatus.OrderByDescending(y => y.Date).FirstOrDefault()
    }
};
This way you can reuse the query to create different response objects with a single round trip to the database. The downside is that it uses more memory than the original query (though that shouldn't be much of an issue).
If the model originally returned from the database doesn't meet the requirements (i.e. you always need to do CustomerStatus = new List<CustomerStatus> {...}), it might indicate that the database schema does not fit the needs of the application, so a refactoring might be needed.
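The point about the query not running until it is materialized can be seen without a database. Below is a small in-memory sketch (LINQ to Objects rather than EF, with a made-up class name) that counts how often the filter is evaluated: nothing happens when the query is composed, and FirstOrDefault stops as soon as it finds a match. EF behaves analogously in that composing the IQueryable sends no SQL.

```csharp
using System;
using System.Linq;

public static class DeferredExecutionSketch
{
    // Returns how many times the predicate ran before and after materializing.
    public static (int beforeEnumeration, int afterFirstOrDefault) Run()
    {
        var evaluated = 0;
        var source = new[] { 1, 2, 3, 4, 5 };

        // Composing the query does not run the predicate.
        var query = source.Where(x => { evaluated++; return x >= 3; });
        var before = evaluated;

        // Materializing runs it, and only as far as needed: 1, 2, then 3 matches.
        _ = query.FirstOrDefault();

        return (before, evaluated);
    }
}
```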
What I think is happening is that you are actually overriding the Include and ThenInclude. Include is explicitly to eager-load a navigation property. However, you're doing a couple of things that are likely hindering this.
First, you're selecting into a new Customers object. That alone may be enough to break the logic of Include. Second, you're overriding what gets put in the CustomerStatus collection. That should ideally be just loaded in automatically via Include, but by altering it to just have the first entity, you're essentially throwing away the effect of Include. (Selecting a relationship is enough to cause a join to be issued, without explicitly calling Include.) Third, the ThenInclude is predicated on the Include, so overriding that is probably throwing out the ThenInclude as well.
All this is conjecture. I haven't done anything exactly like what you're doing here before, but nothing else makes sense.
Try selecting into a new CustomerStatus as well:
CustomerStatus = x.CustomerStatus.OrderByDescending(o => o.Date).Select(s => new CustomerStatus
{
    Id = s.Id,
    Status = s.Status,
    Date = s.Date,
    CustomerId = s.CustomerId,
    Customer = s.Customer,
    StatusNavigation = s.StatusNavigation
}).ToList()
You can remove the Include/ThenInclude at that point, because the act of selecting these relationships will cause the join.
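After reading a couple of sources (Source 1) and (Source 2), I think what is happening is that if you use Select after Include, it disregards the Include, even if the Select uses the included data. To work around this, call .AsEnumerable() before Select. Be aware that everything after AsEnumerable() runs in memory, so this loads every customer with its statuses before the FirstOrDefault filter is applied.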
query = _context.Customers
.Include(x => x.CustomerStatus)
.ThenInclude(x => x.StatusNavigation)
.AsEnumerable()
.Select(x => new Customers()
{
Id = x.Id,
Address = x.Address,
Contact = x.Contact,
Name = x.Name,
CustomerStatus = new List<CustomerStatus>
{
x.CustomerStatus.OrderByDescending(y => y.Date).FirstOrDefault()
}
})
.FirstOrDefault(x => x.Id == 3);
I have 2 tables:
USERS
    UserId
    Name
    Scores (collection of table Scores)
SCORES
    UserId
    CategoryId
    Points
I need to show all the users and a SUM of their points, but also I need to show the name of the user. It can be filtered by CategoryId or not.
Context.Scores
.Where(p => p.CategoryId == categoryId) // optional
.GroupBy(p => p.UserId)
.Select(p => new
{
UserId = p.Key,
Points = p.Sum(s => s.Points),
Name = p.Select(s => s.User.Name).FirstOrDefault()
}).OrderBy(p => p.Points).ToList();
The problem is that when I add the line
Name = p.Select(s => s.User.Name).FirstOrDefault()
the query takes very long. I don't know how to access properties that are not part of the GroupBy key or an aggregate such as SUM. This example is simplified: besides Name, I need other properties from the User table as well.
How can I solve this?
It takes so long because the query is causing client evaluation. See Client evaluation performance issues and how to use Client evaluation logging to identify related issues.
If you are really on EF Core 2.0, there is nothing you can do other than upgrade to v2.1, which contains improved LINQ GroupBy translation. Even then the solution is not straightforward: the query still uses client evaluation. But it can be rewritten by separating the GroupBy part into a subquery and joining it to the Users table to get the additional information needed.
Something like this:
var scores = db.Scores.AsQueryable();
// Optional
// scores = scores.Where(p => p.CategoryId == categoryId);
var points = scores
.GroupBy(s => s.UserId)
.Select(g => new
{
UserId = g.Key,
Points = g.Sum(s => s.Points),
});
var result = db.Users
.Join(points, u => u.UserId, p => p.UserId, (u, p) => new
{
u.UserId,
u.Name,
p.Points
})
.OrderBy(p => p.Points)
.ToList();
This still produces a warning
The LINQ expression 'orderby [p].Points asc' could not be translated and will be evaluated locally.
but at least the query is translated and executes as single SQL:
SELECT [t].[UserId], [t].[Points], [u].[UserId] AS [UserId0], [u].[Name]
FROM [Users] AS [u]
INNER JOIN (
SELECT [s].[UserId], SUM([s].[Points]) AS [Points]
FROM [Scores] AS [s]
GROUP BY [s].[UserId]
) AS [t] ON [u].[UserId] = [t].[UserId]
The goal is to get the first and last DateTime from a collection on an entity (foreign key). My entity is an Organization and my collection is Invoices. Organizations are unfortunately not unique: I'm dealing with duplicate data, so I'm grouping by a Number field on my entity.
I'm using .NET Core 2.1.2 with Entity Framework.
I'm trying to get the following query generated from LINQ:
SELECT MIN([organization].[Id]) AS Id, MIN([organization].[Name]) AS Name,
MIN([organization].[Number]) AS Number, MIN([invoice].[Date])
AS First, MAX([invoice].[Date]) AS Last
FROM [organization]
INNER JOIN [invoice] ON [invoice].[OrganizationId] = [organization].[Id]
GROUP BY [organization].[Number], [organization].[Name]
ORDER BY [organization].[Name]
However I have no idea how to get to write the LINQ query to get it to generate this result.
I got as far as:
await _context
.Organization
.Where(z => z.Invoices.Any())
.GroupBy(organization => new
{
organization.Number,
organization.Name
})
.Select(grouping => new
{
Id = grouping.Min(organization => organization.Id),
Name = grouping.Min(organization => organization.Name),
Number= grouping.Min(organization => organization.Number),
//First = ?,
//Last = ?
})
.OrderBy(z => z.Name)
.ToListAsync();
I have no clue how to write the LINQ query in such a way that it generates the above.
I have a couple questions still:
Are the Min statements for Id, Name and Number correct ways of getting the first element in the grouping?
Do I need a join statement or is "WHERE EXISTS" better (this got generated before I changed the code)?
Does anyone know how to finish writing the LINQ statement? Because I have to get the first and last Date from the Invoices Collection on my Organization Entity:
organization.Invoices.Min(invoice => invoice.Date)
organization.Invoices.Max(invoice => invoice.Date)
Here is the trick.
To make an inner join using a collection navigation property, simply use SelectMany and project all the primitive properties that you need later (this is important for the current EF Core query translator). Then perform the GroupBy and project the key properties / aggregates. Finally do the ordering.
So
var query = _context
.Organization
.SelectMany(organization => organization.Invoices, (organization, invoice) => new
{
organization.Id,
organization.Number,
organization.Name,
invoice.Date
})
.GroupBy(e => new
{
e.Number,
e.Name
})
.Select(g => new
{
Id = g.Min(e => e.Id),
Name = g.Key.Name,
Number = g.Key.Number,
First = g.Min(e => e.Date),
Last = g.Max(e => e.Date),
})
.OrderBy(e => e.Name);
is translated to
SELECT MIN([organization].[Id]) AS [Id], [organization].[Name], [organization].[Number],
MIN([organization.Invoice].[Date]) AS [First], MAX([organization.Invoice].[Date]) AS [Last]
FROM [Organization] AS [organization]
INNER JOIN [Invoice] AS [organization.Invoice] ON [organization].[Id] = [organization.Invoice].[OrganizationId]
GROUP BY [organization].[Number], [organization].[Name]
ORDER BY [organization].[Name]
I have an entity with a few ordinary 1 to many relationships and one collection with implicit 1 to many relationship without any relationship defined.
What I'm trying to do is to make it work together like:
IQueryable<Employee> leftJoin =
_dbContext.EmployeeList
.GroupJoin(
inner: _dbContext.ContactList.Where(x => x.TableType == TableType.Employee),
outerKeySelector: employee => employee.EmployeeId,
innerKeySelector: contact => contact.TableId,
resultSelector: (employee, contacts) => new { employee, contacts = contacts.DefaultIfEmpty() }
)
.AsEnumerable()
.Select(x =>
{
x.employee.ContactList = x.contacts;
return x.employee;
})
.AsQueryable()
.Include(x => x.EmployeeRoleMapList);
The main hassle is that I need to initialize Employee.ContactList from the joined set. That isn't possible within IQueryable, so I'm forced to switch to IEnumerable, but after that the ordinary Include no longer works, which is logical as well. Is there a workaround or a different way to achieve this?
I'm working on a report right now that runs great with our on-premises DB (just refreshed from PROD). However, when I deploy the site to Azure, I get a SQL Timeout during its execution. If I point my development instance at the SQL Azure instance, I get a timeout as well.
Goal: To output a list of customers that have had an activity created during the search range, and when that customer is found, get some other information about that customer regarding policies, etc. I've removed some of the properties below for brevity (as best I can)...
UPDATE
After lots of trial and error, I can get the entire query to run fairly consistently within 1000 ms so long as this block of code is not executed.
CurrentStatus = a.Activities
.Where(b => b.ActivityType.IsReportable)
.OrderByDescending(b => b.DueDateTime)
.Select(b => b.Status.Name)
.FirstOrDefault(),
With this code in place, things begin to go haywire. I think this Where clause is a big part of it: .Where(b => b.ActivityType.IsReportable). What is the best way to grab the status name?
EXISTING CODE
Any thoughts as to why SQL Azure would time out whereas on-premises would turn this around in less than 100 ms?
return db.Customers
.Where(a => a.Activities.Where(
b => b.CreatedDateTime >= search.BeginDateCreated
&& b.CreatedDateTime <= search.EndDateCreated).Count() > 0)
.Where(a => a.CustomerGroup.Any(d => d.GroupId== search.GroupId))
.Select(a => new CustomCustomerReport
{
CustomerId = a.Id,
Manager = a.Manager.Name,
Customer = a.FirstName + " " + a.LastName,
ContactSource= a.ContactSource!= null ? a.ContactSource.Name : "Unknown",
ContactDate = a.DateCreated,
NewSale = a.Sales
.Where(p => p.Employee.IsActive)
.OrderByDescending(p => p.DateCreated)
.Select(p => new PolicyViewModel
{
//MISC PROPERTIES
}).FirstOrDefault(),
ExistingSale = a.Sales
.Where(p => p.CancellationDate == null || p.CancellationDate <= myDate)
.Where(p => p.SaleDate < myDate)
.OrderByDescending(p => p.DateCreated)
.Select(p => new SalesViewModel
{
//MISC PROPERTIES
}).FirstOrDefault(),
CurrentStatus = a.Activities
.Where(b => b.ActivityType.IsReportable)
.OrderByDescending(b => b.DueDateTime)
.Select(b => b.Disposition.Name)
.FirstOrDefault(),
CustomerGroup = a.CustomerGroup
.Where(cd => cd.GroupId == search.GroupId)
.Select(cd => new GroupViewModel
{
//MISC PROPERTIES
}).FirstOrDefault()
}).ToList();
I cannot give you a definite answer but I would recommend approaching the problem by:
Run SQL profiler locally when this code is executed and see what SQL is generated and run. Look at the query execution plan for each query and look for table scans and other slow operations. Add indexes as needed.
Check your lambdas for things that cannot be easily translated into SQL. You might be pulling the contents of a table into memory and running lambdas on the results, which will be very slow. Change your lambdas or consider writing raw SQL.
Is the Azure database the same as your local database? If not, pull the data locally so your local system is indicative.
Remove sections (i.e. CustomerGroup then CurrentDisposition then ExistingSale then NewSale) and see if there is a significant performance improvement after removing the last section. Focus on the last removed section.
Looking at the line itself:
You use ".Count() > 0" on line 4. Use ".Any()" instead: the former counts every matching row to produce an exact total, when all you need to know is whether at least one row satisfies the condition.
Ensure fields referenced in where clauses have indexes, such as IsReportable.
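The Count() versus Any() difference is easy to see in memory. The sketch below is an analogy using LINQ to Objects with made-up names, not the SQL that EF generates: it records how many items each operator pulls from a lazily produced sequence. On the database side the corresponding difference is roughly COUNT(*) versus EXISTS, and how much it matters depends on what the optimizer does with each.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyVersusCountSketch
{
    // Yields 1..total and reports every item the consumer actually pulls.
    public static IEnumerable<int> CountingSource(int total, Action onItemRead)
    {
        for (var i = 1; i <= total; i++)
        {
            onItemRead();
            yield return i;
        }
    }

    // Count() > 0 has to walk the whole sequence before it can compare.
    public static int ItemsReadByCount(int total)
    {
        var read = 0;
        _ = CountingSource(total, () => read++).Count() > 0;
        return read;
    }

    // Any() stops after the first item.
    public static int ItemsReadByAny(int total)
    {
        var read = 0;
        _ = CountingSource(total, () => read++).Any();
        return read;
    }
}
```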
Short answer: use memory.
Long answer:
Because of either bad maintenance plans or limited hardware, running this query in one big lump is what's causing it to fail on Azure. Even if that weren't the case, because of all the navigation properties you're using, this query would generate a staggering number of joins. The answer here is to break it down in smaller pieces that Azure can run. I'm going to try to rewrite your query into multiple smaller, easier to digest queries that use the memory of your .NET application. Please bear with me as I make (more or less) educated guesses about your business logic/db schema and rewrite the query accordingly. Sorry for using the query form of LINQ but I find things such as join and group by are more readable in that form.
var activityFilterCustomerIds = db.Activities
.Where(a =>
a.CreatedDateTime >= search.BeginDateCreated &&
a.CreatedDateTime <= search.EndDateCreated)
.Select(a => a.CustomerId)
.Distinct()
.ToList();
var groupFilterCustomerIds = db.CustomerGroup
.Where(g => g.GroupId == search.GroupId)
.Select(g => g.CustomerId)
.Distinct()
.ToList();
var customers = db.Customers
.AsNoTracking()
.Where(c =>
activityFilterCustomerIds.Contains(c.Id) &&
groupFilterCustomerIds.Contains(c.Id))
.ToList();
var customerIds = customers.Select(x => x.Id).ToList();
var newSales =
(from s in db.Sales
where customerIds.Contains(s.CustomerId)
&& s.Employee.IsActive
group s by s.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Sale = grouped
.OrderByDescending(x => x.DateCreated)
.Select(x => new PolicyViewModel
{
// properties
})
.FirstOrDefault()
}).ToList();
var existingSales =
(from s in db.Sales
where customerIds.Contains(s.CustomerId)
&& (s.CancellationDate == null || s.CancellationDate <= myDate)
&& s.SaleDate < myDate
group s by s.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Sale = grouped
.OrderByDescending(x => x.DateCreated)
.Select(x => new SalesViewModel
{
// properties
})
.FirstOrDefault()
}).ToList();
var currentStatuses =
(from a in db.Activities.AsNoTracking()
where customerIds.Contains(a.CustomerId)
&& a.ActivityType.IsReportable
group a by a.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Status = grouped
.OrderByDescending(x => x.DueDateTime)
.Select(x => x.Disposition.Name)
.FirstOrDefault()
}).ToList();
var customerGroups =
(from cg in db.CustomerGroups
where cg.GroupId == search.GroupId
group cg by cg.CustomerId into grouped
select new
{
CustomerId = grouped.Key,
Group = grouped
.Select(x =>
new GroupViewModel
{
// ...
})
.FirstOrDefault()
}).ToList();
return customers
.Select(c =>
new CustomCustomerReport
{
// ... simple props
// ...
// ...
NewSale = newSales
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Sale)
.FirstOrDefault(),
ExistingSale = existingSales
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Sale)
.FirstOrDefault(),
CurrentStatus = currentStatuses
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Status)
.FirstOrDefault(),
CustomerGroup = customerGroups
.Where(s => s.CustomerId == c.Id)
.Select(x => x.Group)
.FirstOrDefault(),
})
.ToList();
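One possible refinement of the final stitching step, offered as a sketch rather than as part of the answer above: each Where(...).FirstOrDefault() scans an entire in-memory list once per customer, which grows quadratically with the number of customers. Since the grouped queries return one row per customer, the lists can be indexed by CustomerId once and then looked up. StatusRow and ReportRow are hypothetical stand-ins for the anonymous types and for CustomCustomerReport.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StitchingSketch
{
    // Hypothetical stand-in for one row of the currentStatuses query.
    public sealed class StatusRow
    {
        public int CustomerId { get; set; }
        public string Status { get; set; }
    }

    // Hypothetical stand-in for the report type.
    public sealed class ReportRow
    {
        public int CustomerId { get; set; }
        public string CurrentStatus { get; set; }
    }

    public static List<ReportRow> Build(IEnumerable<int> customerIds, IEnumerable<StatusRow> statuses)
    {
        // Index once by key. This assumes one row per customer, which holds
        // when the rows come from a query grouped by CustomerId.
        var statusByCustomer = statuses.ToDictionary(s => s.CustomerId, s => s.Status);

        return customerIds
            .Select(id => new ReportRow
            {
                CustomerId = id,
                // Customers without a row get null, matching FirstOrDefault().
                CurrentStatus = statusByCustomer.TryGetValue(id, out var status) ? status : null
            })
            .ToList();
    }
}
```

The same pattern would apply to newSales, existingSales and customerGroups; for lists that may contain several rows per key, ToLookup would be the safer choice than ToDictionary.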
Hard to suggest anything without seeing the actual table definitions, especially the indexes and foreign keys on the Activities entity.
As far as I understand, Activity is (CustomerId, ActivityTypeId, DueDateTime, DispositionId). If this is a standard warehousing table (DateTime, ClientId, Activity), I'd suggest the following:
If the number of Activities is reasonably small, then force the use of an IN clause via Contains:
var activities = db.Activities.Where( x => x.IsReportable ).ToList();
...
.Where( b => activities.Contains(b.Activity) )
You can even help the optimiser by specifying that you want ActivityId.
Indexes on the Activity entity should be up to date. For this particular query I suggest (CustomerId, ActivityId, DueDateTime DESC).
Precache the Disposition table; my crystal ball tells me that it's a dictionary table.
For a similar task, to avoid constantly hitting the Activity table, I made another small table (CustomerId, LastActivity, LastValue) and updated it whenever the status changed.