Using EF core 5 and ASP.NET Core 3.1, I am trying to get a filtered collection based on a condition on its grandchildren collection.
I have the following entities:
public class Organisation
{
public int Id { get; set; }
public int? OrganisationId { get; set; }
public IEnumerable<Customer> Customers { get; set; }
}
public partial class Customer
{
[Key]
public uint Id { get; set; }
public int? EmployerId { get; set; }
public int? OrganisationId { get; set; }
public List<TimecardProperties> TimecardsProperties { get; set; }
}
public partial class TimecardProperties
{
[Key]
public int Id { get; set; }
public int? EmployerId { get; set; }
public int? Week { get; set; }
public short? Year { get; set; }
}
The goal is to get all Organisations that have at least one customer and the customer has at least 1 timecard property that is in week=34 and year=2021.
So far I have tried the following:
////necessary join to get Organisations for user id
IQueryable<Organisation> ouQuery = (from cou in _dbContext.Organisations
join uou in _dbContext.table2 on cou.OrganisationId equals uou.OrganisationId
where uou.UsersId == int.Parse(userId)
select cou)
.Where(cou => cou.Customers.Where(c => c.TimecardsProperties.Count > 0).Any())
.Include(cou => cou.Customers.Where(c => c.TimecardsProperties.Count > 0))
.ThenInclude(c => c.TimecardsProperties.Where(tc => tc.tWeek == 34 && tc.Year > 2020))
;
This returns a organisation list that each have a customers list but some customers have a count of timecards 0. I don't want to have organisation in the returned list that does not have at least one item in the timecards collection.
Also, it is too slow, and if I try to filter the produced list its even
slower (over 15 seconds)
I have also tried a raw sql query on the organisation db context but it is again very slow:
select distinct count(id) from organisation a where organisation_id in (
select organisation_id from customers where employer_id in (select distinct employer_id from timecards a
inner join timecard_components b on a.id=b.timecards_id
where week IN(
34) and year in (2021,2021) and invoice !=0 and type = 'time'
group by employer_id, week)
);
In general, I want to know the the total
count of the returned organisation collection for pagination (so I don't need to include all attributes of each entity)
as well as return only a part of the correct results, which satisfy the conditions,
an organisation list that has at least 1 timecards in
their customers by executing the query in the end like so:
ouQuery.Skip((page - 1) * pageSize).Take(pageSize).ToListAsync();
I have also tried the EntityFramework.Plus and projection with no results.
How could I write this to achieve getting the total count of the organisation list and a part of these results (first 10) to display to the user?
Use navigation properties. This is the query you want:
var orgsQuery = dbContext.Organizations
.Where( o => o.Customers.Any( c =>
c.TimecardProperties.Any( tp =>
tp.Year = 2021
&& tp.Week = 34 ) ) );
Add includes and other predicates as needed
I need to fetch from the database this:
rack
it's type
single shelf with all its boxes and their box types
single shelf above the previous shelf without boxes and with shelf type
Shelves have VerticalPosition which is in centimeters from the ground - when I am querying for e.g. second shelf in rack, I need to order them and select shelf on index 1.
I have this ugly EF query now:
var targetShelf = await _warehouseContext.Shelves
.Include(s => s.Rack)
.ThenInclude(r => r.Shelves)
.ThenInclude(s => s.Type)
.Include(s => s.Rack)
.ThenInclude(r => r.Type)
.Include(s => s.Rack)
.ThenInclude(r => r.Shelves)
.Include(s => s.Boxes)
.ThenInclude(b => b.BoxType)
.Where(s => s.Rack.Aisle.Room.Number == targetPosition.Room)
.Where(s => s.Rack.Aisle.Letter == targetPosition.Aisle)
.Where(s => s.Rack.Position == targetPosition.Rack)
.OrderBy(s => s.VerticalPosition)
.Skip(targetPosition.ShelfNumber - 1)
.FirstOrDefaultAsync();
but this gets all boxes from all shelves and it also shows warning
Compiling a query which loads related collections for more than one collection navigation, either via 'Include' or through projection, but no 'QuerySplittingBehavior' has been configured. By default, Entity Framework will use 'QuerySplittingBehavior.SingleQuery', which can potentially result in slow query performance.
Also I would like to use AsNoTracking(), because I don't need change tracker for these data.
First thing: for AsNoTracking() I would need to query Racks, because it complains about circular include.
Second thing: I tried conditional include like this:
.Include(r => r.Shelves)
.ThenInclude(s => s.Boxes.Where(b => b.ShelfId == b.Shelf.Rack.Shelves.OrderBy(sh => sh.VerticalPosition).Skip(shelfNumberFromGround - 1).First().Id))
but this won't even translate to SQL.
I have also thought of two queries - one will retrieve rack with shelves and second only boxes, but I still wonder if there is some single call command for this.
Entities:
public class Rack
{
public Guid Id { get; set; }
public Guid RackTypeId { get; set; }
public RackType Type { get; set; }
public ICollection<Shelf> Shelves { get; set; }
}
public class RackType
{
public Guid Id { get; set; }
public ICollection<Rack> Racks { get; set; }
}
public class Shelf
{
public Guid Id { get; set; }
public Guid ShelfTypeId { get; set; }
public Guid RackId { get; set; }
public int VerticalPosition { get; set; }
public ShelfType Type { get; set; }
public Rack Rack { get; set; }
public ICollection<Box> Boxes { get; set; }
}
public class ShelfType
{
public Guid Id { get; set; }
public ICollection<Shelf> Shelves { get; set; }
}
public class Box
{
public Guid Id { get; set; }
public Guid ShelfId { get; set; }
public Guid BoxTypeId { get; set; }
public BoxType BoxType { get; set; }
public Shelf Shelf { get; set; }
}
public class BoxType
{
public Guid Id { get; set; }
public ICollection<Box> Boxes { get; set; }
}
I hope I explained it good enough.
Query Splitting
First, I'd recommend benchmarking the query as-is before deciding whether to attempt any optimization.
It can be faster to perform multiple queries than one large query with many joins. While you avoid a single complex query, you have additional network round-trips if your DB isn't on the same machine, and some databases (e.g. SQL Server without MARS enabled) only support one active query at a time. Your mileage may vary in terms of actual performance.
Databases do not generally guarantee consistency between separate queries (SQL Server allows you to mitigate that with the performance-expensive options of serializable or snapshot transactions). You should be cautious using a multiple-query strategy if intervening data modifications are possible.
To split a specific query, use the AsSplitQuery() extension method.
To use split queries for all queries against a given DB context,
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
optionsBuilder
.UseSqlServer(
#"Server=(localdb)\mssqllocaldb;Database=EFQuerying;Trusted_Connection=True;ConnectRetryCount=0",
o => o.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery));
}
Reference.
Query that won't translate
.Include(r => r.Shelves)
.ThenInclude(s => s.Boxes.Where(b => b.ShelfId == b.Shelf.Rack.Shelves.OrderBy(sh => sh.VerticalPosition).Skip(shelfNumberFromGround - 1).First().Id))
Your expression
s.Boxes.Where(b => b.ShelfId == b.Shelf.Rack.Shelves.OrderBy(sh => sh.VerticalPosition).Skip(shelfNumberFromGround - 1).First().Id
resolves to an Id. ThenInclude() expects an expression that ultimately specifies a collection navigation (in other words, a table).
Ok, from your question I'm assuming you have a method where you need these bits of information:
single shelf with all its boxes and their box types
single shelf above the previous shelf without boxes and with shelf type
rack and it's type
Whether EF breaks up the queries or you do doesn't really make much of a difference performance-wise. What matters is how well the code is later understood and can adapt if/when requirements change.
The first step I would recommend is to identify the scope of detail you actually need. You mention that you don't need tracking, so I would expect you intend to deliver these results or otherwise consume the information without persisting changes. Project this down to just the details from the various tables that you need to be served by a DTO or ViewModel, or an anonymous type if the data doesn't really need to travel. For instance you will have a shelf & shelf type which is effectively a many-to-one so the shelf type details can probably be part of the shelf results. Same with the Box and BoxType details. A shelf would then have an optional set of applicable box details. The Rack & Racktype details can come back with one of the shelf queries.
[Serializable]
public class RackDTO
{
public int RackId { get; set; }
public int RackTypeId { get; set; }
public string RackTypeName { get; set; }
}
[Serializable]
public class ShelfDTO
{
public int ShelfId { get; set; }
public int VerticalPosition { get; set; }
public int ShelfTypeId { get; set; }
public string ShelfTypeName { get; set; }
public ICollection<BoxDTO> Boxes { get; set; } = new List<BoxDTO>();
public RackDTO Rack { get; set; }
}
[Serializable]
public class BoxDTO
{
public int BoxId { get; set; }
public int BoxTypeId { get; set; }
public string BoxTypeName { get; set; }
}
Then when reading the information, I'd probably split it into two queries. One to get the "main" shelf, then a second optional one to get the "previous" one if applicable.
ShelfDTO shelf = await _warehouseContext.Shelves
.Where(s => s.Rack.Aisle.Room.Number == targetPosition.Room
&& s.Rack.Aisle.Letter == targetPosition.Aisle
&& s.Rack.Position == targetPosition.Rack)
.Select(s => new ShelfDTO
{
ShelfId = s.ShelfId,
VerticalPosition = s.VerticalPosition,
ShelfTypeId = s.ShelfType.ShelfTypeId,
ShelfTypeName = s.ShelfType.Name,
Rack = s.Rack.Select(r => new RackDTO
{
RackId = r.RackId,
RackTypeId = r.RackType.RackTypeId,
RackTypeName = r.RackType.Name
}).Single(),
Boxes = s.Boxes.Select(b => new BoxDTO
{
BoxId = b.BoxId,
BoxTypeId = b.BoxType.BoxTypeId,
BoxTypeName = b.BoxType.Name
}).ToList()
}).OrderBy(s => s.VerticalPosition)
.Skip(targetPosition.ShelfNumber - 1)
.FirstOrDefaultAsync();
ShelfDTO previousShelf = null;
if (targetPosition.ShelfNumber > 1 && shelf != null)
{
previousShelf = await _warehouseContext.Shelves
.Where(s => s.Rack.RackId == shelf.RackId
&& s.VerticalPosition < shelf.VerticalPosition)
.Select(s => new ShelfDTO
{
ShelfId = s.ShelfId,
VerticalPosition = s.VerticalPosition,
ShelfTypeId = s.ShelfType.ShelfTypeId,
ShelfTypeName = s.ShelfType.Name,
Rack = s.Rack.Select(r => new RackDTO
{
RackId = r.RackId,
RackTypeId = r.RackType.RackTypeId,
RackTypeName = r.RackType.Name
}).Single()
}).OrderByDescending(s => s.VerticalPosition)
.FirstOrDefaultAsync();
}
Two fairly simple to read queries that should return what you need without much problem. Because we project down to a DTO we don't need to worry about eager loading and potential cyclical references if we wanted to load an entire detached graph. Obviously this would need to be fleshed out to include the details from the shelf, box, and rack that are relevant to the consuming code/view. This can be trimmed down even more by leveraging Automapper and it's ProjectTo method to take the place of that whole Select projection as a one-liner.
In SQL raw it could look like
WITH x AS(
SELECT
r.*, s.Id as ShelfId, s.Type as ShelfType
ROW_NUMBER() OVER(ORDER BY s.verticalposition) as shelfnum
FROM
rooms
JOIN aisles on aisles.RoomId = rooms.Id
JOIN racks r on r.AisleId = aisles.Id
JOIN shelves s ON s.RackId = r.Id
WHERE
rooms.Number = #roomnum AND
aisles.Letter = #let AND
r.Position = #pos
)
SELECT *
FROM
x
LEFT JOIN boxes b
ON
b.ShelfId = x.ShelfId AND x.ShelfNum = #shelfnum
WHERE
x.ShelfNum BETWEEN #shelfnum AND #shelfnum+1
The WITH uses room/aisle/rack joins to locate the rack; you seem to have these identifiers. Shelves are numbered in increasing height off ground. Outside the WITH, boxes are left joined only if they are on the shelf you want, but two shelves are returned; the shelf you want with all it's boxes and the shelf above but box data will be null because the left join fails
As an opinion, if your query is getting this level of depth, you might want to consider either using views as a shortcut in your database or use No-SQL as a read store.
Having to do lots of joins, and doing taxing operations like order by during runtime with LINQ is something I'd try my best to avoid.
So I'd approach this as a design problem, rather than a code/query problem.
In EF, All related entities loaded with Include, ThenInclude etc. produce joins on the database end. This means that when we load related master tables, the list values will get duplicated across all records, thus causing what is called "cartesian explosion". Due to this, there was a need to split huge queries into multiple calls, and eventually .AsSplitQuery() was introduced.
Eg:
var query = Context.DataSet<Transactions>()
.Include(x => x.Master1)
.Include(x => x.Master2)
.Include(x => x.Master3)
.ThenInclude(x => x.Master3.Masterx)
.Where(expression).ToListAsync();
Here we can introduce splitquery
var query = Context.DataSet<Transactions>()
.Include(x => x.Master1)
.Include(x => x.Master2)
.Include(x => x.Master3)
.ThenInclude(x => x.Master3.Masterx)
.Where(expression).AsSplitQuery.ToListAsync();
As an alternate to include this to all existing queries, which could be time consuming, we could specify this globally like
services.AddDbContextPool<EntityDataLayer.ApplicationDbContext>(options =>
{
options.EnableSensitiveDataLogging(true);
options.UseMySql(mySqlConnectionStr,
ServerVersion.AutoDetect(mySqlConnectionStr), x =>
x.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery)
x.EnableRetryOnFailure(
maxRetryCount: 10,
maxRetryDelay: TimeSpan.FromSeconds(30),
errorNumbersToAdd: null));
});
This will ensure that all queries are called as split queries.
Now in case we need single query, we can just override this by stating single query explicitly in individual queries. This may be done vice-versa though.
var data = await query.AsSingleQuery().ToListAsync();
I am trying to extract list of Categories with the corresponding Tickets for a specific userId using Linq Lambda expression.
Category:
public class Category
{
public int Id { get; set; }
public string CategoryName { get; set; }
public ICollection<Ticket> Tickets { get; set; }
}
Ticket:
public class Ticket
{
public int Id { get; set; }
public string Title { get; set; }
public User User { get; set; }
public Category Category { get; set; }
}
User
public class User
{
public string Id { get; set; }
public string UserName { get; set; }
public ICollection<Ticket> Tickets { get; set; }
}
This is what I tried so far. But it is not even close.
public async Task<IEnumerable<Category>> GetCategories(string userid)
{
var categories = _context.Categories
.Include(c => c.Tickets)
.AsQueryable();
categories = categories
.Where(c => c.Tickets.Any(t =>t.User.Id.Equals(id)));
return await categories.ToListAsync();
}
How can I have a list of categories with corresponding tickets for a specific userId?
The issue is this line of code:
categories = categories.Where(c => c.Tickets.Any(t =>t.User.Id.Equals(id)));
This will return all categories that contain at least 1 ticket with the specified user id, but all tickets in the category will be included. My understanding is, you want to keep only the tickets that belong to the specified user id.
The fact that your Tickets are represented by an ICollection<Ticket> instead of IEnumerable<Ticket>, make this a bit more difficult - since LINQ works on IEnumerables. That's the reason for the ToList() call, which causes enumeration - something to be aware of here - Tickets collection is no longer lazy past that point.
This code will do what you want:
IEnumerable<Category> selection = (from c in categories
select new Category
{
Id = c.Id,
CategoryName = c.CategoryName,
Tickets = (from t in c.Tickets
where t.User.Id.Equals(id)
select t).ToList()
}).Where(c => c.Tickets.Count() > 0);
but, I think, both performance-wise and readability-wise, you might want a loop instead:
List<Category> selection = new List<Category>();
foreach (var category in categories)
{
category.Tickets = category.Tickets.Where(t => t.User.Id.Equals(id)).ToList();
if (category.Tickets.Count > 0)
{
selection.Add(category);
}
}
Finally, since your Ticket class carries the Category info anyway, it may be worth to flatten the list using SelectMany:
var selection = categories.SelectMany(c => c.Tickets.Where(t => t.User.Id.Equals(id)));
This returns a flattened list of Tickets - but only the ones that match the specified user id, from all categories. This has the advantage of staying lazy (it's still an IEnumerable), being simple, and very likely of higher performance than the other options; but, you no longer have the nested list.
Take your pick!
I'm building a simple tour booking app.
It has 3 linked tables, setup as below.
Given a specific TourCategoryId and a date (dte), I'm trying to build an object which I can then go on to populate a viewmodel with.
However I'm not understanding how to check the TourDate field against dte.
Am I trying to do too much in one statement? Can anyone help me with what I thought would be a simple query?
Thanks for any advice:
My linq is:
var tours = db.Tours
.Include(x => x.TourDate)
.Include(x => x.TourDate.Select(b => b.Booking))
.Where(t => t.TourCategoryId == id &&
t.TourDate.Any(td => td.Date==dte));
(also pseudo code)
.Where(v => v.Booking.NumberBooked.Sum() < v.Tours.PlacesAvailable)
I've updated this with suggestions below, but it is returning all td.Date - and not filtering, as you can see from the screenshot below, showing all 3 records returned, and not just those where td.Date == dte:
Is it the .Any that is maybe not correct?
The end result should be
List of Tours => List of Dates => List of Bookings - where the dates are filtered to the dte eg:
Tour1- night tours
--TourDate - filtered to 02/03/2015
----Booking1
----Booking2
----Booking3
Tour2- day tours
--TourDate - filtered to 02/03/2015
----Booking1
----Booking2
Tour3- multi-day tours
--TourDate - filtered to 02/03/2015
----Booking1
----Booking2
----Booking3
----Booking4
Thanks again, Mark
public class Tour
{
public int TourId { get; set; }
public int TourCategoryId { get; set; }
public string TourName { get; set; }
public int PlacesAvailable { get; set; }
public virtual ICollection<TourDate> TourDate { get; set; }
}
public class TourDate
{
public int TourDateId { get; set; }
public int TourId { get; set; }
public DateTime Date { get; set; }
public virtual Tour Tour { get; set; }
public virtual ICollection<Booking> Booking { get; set; }
}
public class Booking
{
public int BookingId { get; set; }
public int TourDateId { get; set; }
public string Name { get; set; }
public string Email { get; set; }
public int NumberBooked { get; set; }
public virtual TourDate TourDate { get; set; }
}
From what I understand you want any Tour in TourCategoryId which has a TourDate on dte.
This may work for you.
var tours =
db.Tours
.Include(x => x.TourDate)
.Include(x => x.TourDate.Select(b => b.Booking))
.Where(t => t.TourCategoryId == id &&
t.TourDate.Any(td => td.Date == dte));
To filter the TourDate.
foreach(var tour in tours)
{
tour.TourDate = tour.TourDate.Where(td => td.Date == dte).ToList()
}
The problem you are facing is that the business object you work with does not fit in the class hierarchy you use.
Try not to work strictly with the database mapped classes. Your business objects in this case are something different from just a list of instances of the Tour class that are matched to the db strucure. Just use the db classes to form a query, with the help of all the navigation properties you created, but dont try to fit all the queries you need to the same class hierarchy as you need for a view model.
You should build your query around bookings, not tours. select the bookings matching your criteria, including the one in pseudocode, then group by the dates and tours and select some kind of anonimous object, representing the groups. Then translate those to view model classes you need, not necessary all of which are database mapped.
In many cases, you do not even need ro hydrate an entity just to get some property loaded.
No need to load an elephant just to be able to take a picture.
Include is really just a workaround, like killing the elephant and loading by parts.
I am using EntityFramework 6 and running into some major speed issues -- this query is taking over two seconds to run. I have spent the better part of the day using LinqPad in order to speed up the query but I could only get it down from 4 to two seconds. I have tried grouping, joins, etc. but the generated SQL looks overly complicated to me. I am guessing that I am just taking the wrong approach to writing the LINQ.
Here is what I am attempting to do
Find all A where Valid is null and AccountId isn't the current user
Make sure the Collection of B does not contain any B where AccountId is the current user
Order the resulting A by the number of B in its collection in descending order
Any A that doesn't have any B should be at the end of the returned results.
I have to models which look like this:
public class A
{
public int Id { get; set; }
public bool? Valid { get; set; }
public string AccountId { get; set; }
public virtual ICollection<B> Collection { get; set; }
}
public class B
{
public int Id { get; set; }
public bool Valid { get; set; }
public string AccountId { get; set; }
public DateTime CreatedDate { get; set; }
public virtual A Property { get; set; }
}
The table for A has about one million rows and B will eventually have around ten million. Right now B is sitting at 50,000.
Here is what the query currently looks like. It gives me the expected results but I have to run an orderby multiple times and do other unnecessary steps:
var filterA = this.context.A.Where(gt => gt.Valid == null && !gt.AccountId.Contains(account.Id));
var joinedQuery = from b in this.context.B.Where(gv => !gv.AccountId.Contains(account.Id))
join a in filterA on gv.A equals a
where !a.Collection.Any(v => v.AccountId.Contains(account.Id))
let count = gt.Collection.Count()
orderby count descending
select new { A = gt, Count = count };
IQueryable<GifTag> output = joinedQuery
.Where(t => t.A != null)
.Select(t => t.A)
.Distinct()
.Take(20)
.OrderBy(t => t.Collection.Count);
Thanks
Well you could always try to remove these two lines from the joinQuery
where !a.Collection.Any(v => v.AccountId.Contains(account.Id))
and
orderby count descending
the first line have already been filtered in the first Query
and the orderline, well do do the ordering on the last Query so there is no point in doing it twice