LINQ Query - Only get Order and MAX Date from Child Collection - c#

I'm trying to get a list that displays 2 values in a label from a parent and child (1-*) entity collection model.
I have 3 entities:
[Customer]: CustomerId, Name, Address, ...
[Order]: OrderId, OrderDate, EmployeeId, Total, ...
[OrderStatus]: OrderStatusId, StatusLevel, StatusDate, ...
A Customer can have MANY Order, which in turn an Order can have MANY OrderStatus, i.e.
[Customer] 1--* [Order] 1--* [OrderStatus]
Given a CustomerId, I want to get all of the Orders (just OrderId) and the LATEST (MAX?) OrderStatus.StatusDate for that Order.
I've tried a couple of attempts, but can seem to get the results I want.
private IQueryable<Customer> GetOrderData(string customerId)
{
var ordersWithLatestStatusDate = Context.Customers
// Note: I am not sure if I should add the .Expand() extension methods here for the other two entity collections since I want these queries to be as performant as possible and since I am projecting below (only need to display 2 fields for each record in the IQueryable<T>, but thinking I should now after some contemplation.
.Where(x => x.CustomerId == SelectedCustomer.CustomerId)
.Select(x => new Custom
{
CustomerId = x.CustomerId,
...
// I would like to project my Child and GrandChild Collections, i.e. Orders and OrderStatuses here but don't know how to do that. I learned that by projecting, one does not need to "Include/Expand" these extension methods.
});
return ordersWithLatestStatusDate ;
}
---- UPDATE 1 ----
After the great solution from User: lazyberezovsky, I tried the following:
var query = Context.Customers
.Where(c => c.CustomerId == SelectedCustomer.CustomerId)
.Select(o => new Customer
{
Name = c.Name,
LatestOrderDate = o.OrderStatus.Max(s => s.StatusDate)
});
In my hastiness from my initial posting, I didn't paste everything in correctly since it was mostly from memory and didn't have the exact code for reference at the time. My method is a strongly-typed IQueryabled where I need it to return a collection of items of type T due to a constraint within a rigid API that I have to go through that has an IQueryable query as one of its parameters. I am aware I can add other entities/attributes by either using the extension methods .Expand() and/or .Select(). One will notice that my latest UPDATED query above has an added "new Customer" within the .Select() where it was once anonymous. I'm positive that is why the query failed b/c it couldn't be turn into a valid Uri due to LatestOrderDate not being a property of Customer at the Server level. FYI, upon seeing the first answer below, I had added that property to my client-side Customer class with simple { get; set; }. So given this, can I somehow still have a Customer collection with the only bringing back those 2 fields from 2 different entities? The solution below looked so promising and ingenious!
---- END UPDATE 1 ----
FYI, the technologies I'm using are OData (WCF), Silverlight, C#.
Any tips/links will be appreciated.

This will give you list of { OrderId, LatestDate } objects
var query = Context.Customers
.Where(c => c.CustomerId == SelectedCustomer.CustomerId)
.SelectMany(c => c.Orders)
.Select(o => new {
OrderId = o.OrderId,
LatestDate = o.Statuses.Max(s => s.StatusDate) });
.
UPDATE construct objects in-memory
var query = Context.Customers
.Where(c => c.CustomerId == SelectedCustomer.CustomerId)
.SelectMany(c => c.Orders)
.AsEnumerable() // goes in-memory
.Select(o => new {
OrderId = o.OrderId,
LatestDate = o.Statuses.Max(s => s.StatusDate) });
Also grouping could help here.

If I read this correctly you want a Customer entity and then a single value computed from its Orders property. Currently this is not supported in OData. OData doesn't support computed values in the queries. So no expressions in the projections, no aggregates and so on.
Unfortunately even with two queries this is currently not possible since OData doesn't support any way of expressing the MAX functionality.
If you have control over the service, you could write a server side function/service operation to execute this kind of query.

Related

LINQ Query - Proper "Where" clause to select only children that meet a condition

I have an IQueryable of a complex EF model, let's call it GeneralForm. This GeneralForm entity aggregates a member called Section. The Section contains a list of FormFields and each FormField has a name. I want to select only the FormFields whose names are in a list of given names.
IQueryable<GeneralForm> query = InitializeMyQuery();
What is the correct "Where" clause to do so. something like this:
if (criteria.FormFieldNames.Any())
{
query = query.Where(gf => gf.Section.FormFields.Where(x => criteria.FormFieldNames.Contains(x.FormField.FieldName)).Any());
}
does not work, as it still retrieves all FormFields, not just the ones I want.
Any suggestion would be highly appreciated.
Thanks,
Ed
Edit 1: This is how the query is built (for privacy reason, I renamed some entities and I also removed the ones that do not really pertain to the issue I am trying to resolve):
query = (from genFormEntry in _context.GeneralForms
.Include(r => r.Sections)
.Include(r => r.Form.FormFields)
.Include(r => r.Form.FormFields.Select(x => x.FormField))
select genFormEntry);
This query retrieves Sections that have any matching form name. It doesn't do any filtering on the FormField side.
You may try to join those tables manually, or depending on your Ef version, you can try using filtered includes:
if (criteria.FormFieldNames.Any())
{
query = query
.Include(gf => gf.Section.FormFields.Where(x => criteria.FormFieldNames.Contains(x.FormField.FieldName)) // Include the FormFields that match the criteria
.Where(gf => gf.Section.FormFields.Where(x => criteria.FormFieldNames.Contains(x.FormField.FieldName)).Any());
}
Edit:
As Ef 6.1 doesn't support filtered includes. Only two options left. 1 is mentioned above which is manual linq joins (which is pretty ugly and not versatile) and the other is to rewrite the query like below :
// guessing navigation property names here.
query = _context.FormFields.Include(r => r.Form.Section.GeneralForm);
// and later in your code
if (criteria.FormFieldNames.Any())
{
query = query.Where(f => criteria.FormFieldNames.Contains(f.FieldName));
}

How to select top result for a given ID then join that into another table EF Core LINQ

For the life of me I am unable to google my way out of this one.
I have 2 tables within a database
1. Computers
2. UserLogins
Essentially, I'm trying to get the latest login entry from the "UserLogins" table, and join it with the corresponding entry in the "Computers" table.
This sounds simple enough, but I haven't sat through enough LINQ/EF Core courses yet to figure out how to do this correctly it seems.
Here is some SQL that I know functions how I expect it to:
SELECT * FROM ComputerInfo
LEFT JOIN (
SELECT LoginID, UserID, l.ComputerName, IpAddress, l.LoginTime FROM UserLogins as l
INNER JOIN (
SELECT ComputerName, MAX(LoginTime) as LoginTime
FROM UserLogins
GROUP BY ComputerName) as max on max.ComputerName = l.ComputerName and max.LoginTime = l.LoginTime
) as toplogin on toplogin.ComputerName = ComputerInfo.ComputerName
For reference, I am going to be implementing this in my Controller.cs class, and I am using :
EF Core (3.1.2)
ASP.NET Core (3.1)
I do have a couple queries I was experimenting with that return the results, but I can't join them without errors:
var computerQuery = _context.ComputerInfo
.OrderBy(on => on.ComputerName)
var userQuery = _context.UserLogins
.Select(p => p.ComputerName)
.Distinct()
.Select(id => _context.UserLogins
.OrderByDescending(p => p.LoginTime)
.FirstOrDefault(p => p.ComputerName == id))
.ToListAsync();
So I kind of found a shotty way to get this done I think. Not sure if it is correct but here is what I came up with:
I Created a new class called "ComputerInfoFull" which basically was just "ComputerInfo" && "UserLogins" combined, and used this for the linq query:
var initial = from computerInfo in _context.ComputerInfo
from userInfo in _context.UserLogins
.Where(o => o.ComputerName == computerInfo.ComputerName)
.OrderByDescending(o => o.LoginTime).Take(1)
select new ComputerInfoFull(computerInfo, userInfo);
I'm very sure there is a cleaner Lambda way of writing this, but I can't figure out how to make it work right. Too much stuff going on for my tiny brain to handle lol. If anyone has any ideas on how I can make this cleaner please let me know so I can learn.

Why is Entity Framework having performance issues when calculating a sum

I am using Entity Framework in a C# application and I am using lazy loading. I am experiencing performance issues when calculating the sum of a property in a collection of elements. Let me illustrate it with a simplified version of my code:
public decimal GetPortfolioValue(Guid portfolioId) {
var portfolio = DbContext.Portfolios.FirstOrDefault( x => x.Id.Equals( portfolioId ) );
if (portfolio == null) return 0m;
return portfolio.Items
.Where( i =>
i.Status == ItemStatus.Listed
&&
_activateStatuses.Contains( i.Category.Status )
)
.Sum( i => i.Amount );
}
So I want to fetch the value for all my items that have a certain status of which their parent has a specific status as well.
When logging the queries generated by EF I see it is first fetching my Portfolio (which is fine). Then it does a query to load all Item entities that are part of this portfolio. And then it starts fetching ALL Category entities for each Item one by one. So if I have a portfolio that contains 100 items (each with a category), it literally does 100 SELECT ... FROM categories WHERE id = ... queries.
So it seems like it's just fetching all info, storing it in its memory and then calculating the sum. Why does it not do a simple join between my tables and calculate it like that?
Instead of doing 102 queries to calculate the sum of 100 items I would expect something along the lines of:
SELECT
i.id, i.amount
FROM
items i
INNER JOIN categories c ON c.id = i.category_id
WHERE
i.portfolio_id = #portfolioId
AND
i.status = 'listed'
AND
c.status IN ('active', 'pending', ...);
on which it could then calculate the sum (if it is not able to use the SUM directly in the query).
What is the problem and how can I improve the performance other than writing a pure ADO query instead of using Entity Framework?
To be complete, here are my EF entities:
public class ItemConfiguration : EntityTypeConfiguration<Item> {
ToTable("items");
...
HasRequired(p => p.Portfolio);
}
public class CategoryConfiguration : EntityTypeConfiguration<Category> {
ToTable("categories");
...
HasMany(c => c.Products).WithRequired(p => p.Category);
}
EDIT based on comments:
I didn't think it was important but the _activeStatuses is a list of enums.
private CategoryStatus[] _activeStatuses = new[] { CategoryStatus.Active, ... };
But probably more important is that I left out that the status in the database is a string ("active", "pending", ...) but I map them to an enum used in the application. And that is probably why EF cannot evaluate it? The actual code is:
... && _activateStatuses.Contains(CategoryStatusMapper.MapToEnum(i.Category.Status)) ...
EDIT2
Indeed the mapping is a big part of the problem but the query itself seems to be the biggest issue. Why is the performance difference so big between these two queries?
// Slow query
var portfolio = DbContext.Portfolios.FirstOrDefault(p => p.Id.Equals(portfolioId));
var value = portfolio.Items.Where(i => i.Status == ItemStatusConstants.Listed &&
_activeStatuses.Contains(i.Category.Status))
.Select(i => i.Amount).Sum();
// Fast query
var value = DbContext.Portfolios.Where(p => p.Id.Equals(portfolioId))
.SelectMany(p => p.Items.Where(i =>
i.Status == ItemStatusConstants.Listed &&
_activeStatuses.Contains(i.Category.Status)))
.Select(i => i.Amount).Sum();
The first query does a LOT of small SQL queries whereas the second one just combines everything into one bigger query. I'd expect even the first query to run one query to get the portfolio value.
Calling portfolio.Items this will lazy load the collection in Items and then execute the subsequent calls including the Where and Sum expressions. See also Loading Related Entities article.
You need to execute the call directly on the DbContext the Sum expression can be evaluated database server side.
var portfolio = DbContext.Portfolios
.Where(x => x.Id.Equals(portfolioId))
.SelectMany(x => x.Items.Where(i => i.Status == ItemStatus.Listed && _activateStatuses.Contains( i.Category.Status )).Select(i => i.Amount))
.Sum();
You also have to use the appropriate type for _activateStatuses instance as the contained values must match the type persisted in the database. If the database persists string values then you need to pass a list of string values.
var _activateStatuses = new string[] {"Active", "etc"};
You could use a Linq expression to convert enums to their string representative.
Notes
I would recommend you turn off lazy loading on your DbContext type. As soon as you do that you will start to catch issues like this at run time via Exceptions and can then write more performant code.
I did not include error checking for if no portfolio was found but you could extend this code accordingly.
Yep CategoryStatusMapper.MapToEnum cannot be converted to SQL, forcing it to run the Where in .Net. Rather than mapping the status to the enum, _activeStatuses should contain the list of integer values from the enum so the mapping is not required.
private int[] _activeStatuses = new[] { (int)CategoryStatus.Active, ... };
So that the contains becomes
... && _activateStatuses.Contains(i.Category.Status) ...
and can all be converted to SQL
UPDATE
Given that i.Category.Status is a string in the database, then
private string[] _activeStatuses = new[] { CategoryStatus.Active.ToString(), ... };

Entity Framework Query is too slow

I have to put a complex query on your database. But the query ends at 8000 ms. Do I do something wrong? I use .net 1.1 and Entity Framework core 1.1.2 version.
var fol = _context.UserRelations
.Where(u => u.FollowerId == id && u.State == true)
.Select(p => p.FollowingId)
.ToArray();
var Votes = await _context.Votes
.OrderByDescending(c => c.CreationDate)
.Skip(pageSize * pageIndex)
.Take(pageSize)
.Where(fo => fol.Contains(fo.UserId))
.Select(vote => new
{
Id = vote.Id,
VoteQuestions = vote.VoteQuestions,
VoteImages = _context.VoteMedias.Where(m => m.VoteId == vote.Id)
.Select(k => k.MediaUrl.ToString()),
Options = _context.VoteOptions.Where(m => m.VoteId == vote.Id).Select( ques => new
{
OptionsID = ques.Id,
OptionsName = ques.VoteOption,
OptionsCount = ques.VoteRating.Count(cout => cout.VoteOptionsId == ques.Id),
}),
User = _context.Users.Where(u => u.Id == vote.UserId).Select(usr => new
{
Id = usr.Id,
Name = usr.UserProperties.Where(o => o.UserId == vote.UserId).Select(l => l.Name.ToString())
.First(),
Surname = usr.UserProperties.Where(o => o.UserId == vote.UserId)
.Select(l => l.SurName.ToString()).First(),
ProfileImage = usr.UserProfileImages.Where(h => h.UserId == vote.UserId && h.State == true)
.Select(n => n.ImageUrl.ToString()).First()
}),
NextPage = nextPage
}).ToListAsync();
Have a look at the SQL queries you generate to the server (and results of this queries). For SQL Server the best option is SQL Server Profiler, there are ways for other servers too.
you create two queries. First creates fol array and then you pass it into the second query using Contains. Do you know how this works? You probably generate query with as many parameters as many items you have in the array. It is neither pretty or efficient. It is not necessary here, merge it into the main query and you would have only one parameter.
you do paginating before filtering, is this really the way it should work? Also have a look at other ways of paginating based on filtering by ids rather than simple skipping.
you do too much side queries in one query. When you query three sublists of 100 items each, you do not get 300 rows. To get it in one query you create join and get actually 100*100*100 = 1000000 rows. Unless you are sure the frameworks can split it into multiple queries (probably can not), you should query the sublists in separate queries. This would be probably the main performance problem you have.
please use singular to name tables, not plural
for performance analysis, indexes structure and execution plan are vital information and you can not really say much without them
As noted in the comments, you are potentially executing 100, 1000 or 10000 queries. For every Vote in your database that matches the first result you do 3 other queries.
For 1000 votes which result from the first query you need to do 3000 other queries to fetch the data. That's insane!
You have to use EF Cores eager loading feature to fetch this data with very few queries. If your models are designed well with relations and navigation properties its easy.
When you load flat models without a projection (using .Select), you have to use .Include to tell EF Which other related entities it should load.
// Assuming your navigation property is called VoteMedia
await _context.Votes.
.Include(vote => vote.VoteMedia)
...
This would load all VoteMedia objects with the vote. So extra query to get them is not necessary.
But if you use projects, the .Include calls are not necessary (in fact they are even ignored, when you reference navigation properties in the projection).
// Assuming your navigation property is called VoteMedia
await _context.Votes.
.Include(vote => vote.VoteMedia)
...
.Select( vote => new
{
Id = vote.Id,
VoteQuestions = vote.VoteQuestions,
// here you reference to VoteMedia from your Model
// EF Core recognize that and will load VoteMedia too.
//
// When using _context.VoteMedias.Where(...), EF won't do that
// because you directly call into the context
VoteImages = vote.VoteMedias.Where(m => m.VoteId == vote.Id)
.Select(k => k.MediaUrl.ToString()),
// Same here
Options = vote.VoteOptions.Where(m => m.VoteId == vote.Id).Select( ques => ... );
}

Performance - get data through navigation property vs compiled query

I have compiled queries for both, the main entity customer and for related entities(orders).
var customer = MyService.GetCustomer<Customer>().where(c => c.Id == fooId).ToList();
var customerOrders = MyService.GetOrders<Order>().where(o => o.CustomerId == fooId && o.IsActive).ToList();
But I think I can get all orders through navigation properties instead of compiled query calls since customer is already loaded in memory by the code below:
var customerOrders = customer.Orders.where(o => o.IsActive).ToList(); // I didn't do filter further more
But when I measure the timings I couldn't find any considerable amount of difference (The DB has 500 customers and 4000 orders. And every particular customer has 30 active orders and around 400 inactive orders).
Which one of these two would have a better performance?
I could not fully understand this related question
Linq to Entities translates the Linq queries to SQL.
var customer = MyService.GetCustomer<Customer>().where(c => c.Id == fooId).ToList();
This can actually be simplified since c.Id is unique:
Customer customer = MyService.GetCustomer<Customer>().SingleOrDefault(c=> c.Id == fooId);
What you do here is just get a customer with a certain Id first. (constant and does not depend on the Count of the Orders you query)
Then you query the orders of this customer(this query is dependend on how many orders this customer have):
var customerOrders = customer.Orders.where(o => o.IsActive).ToList();
Then you do another query for which will lead to the exact same SQL statement as the one above.
var customerOrders = MyService.GetOrders<Order>().where(o => o.CustomerId == fooId && o.IsActive).ToList();
This is why the performance difference is only the first query.
You way depend of your case.
if you're going to actively use related entities:
Best way include this in you query:
using System.Data.Entity;
...
var customer = MyService.GetCustomer<Customer>().where(c => c.Id == fooId).Include(u => u.Orders).ToList();
In other cases prefer lazy loading.

Categories