I need help speeding up this EF LINQ query - c#

I am using EntityFramework 6 and running into some major speed issues -- this query is taking over two seconds to run. I have spent the better part of the day using LinqPad in order to speed up the query but I could only get it down from 4 to two seconds. I have tried grouping, joins, etc. but the generated SQL looks overly complicated to me. I am guessing that I am just taking the wrong approach to writing the LINQ.
Here is what I am attempting to do
Find all A where Valid is null and AccountId isn't the current user
Make sure the Collection of B does not contain any B where AccountId is the current user
Order the resulting A by the number of B in its collection in descending order
Any A that doesn't have any B should be at the end of the returned results.
I have to models which look like this:
public class A
{
public int Id { get; set; }
public bool? Valid { get; set; }
public string AccountId { get; set; }
public virtual ICollection<B> Collection { get; set; }
}
public class B
{
public int Id { get; set; }
public bool Valid { get; set; }
public string AccountId { get; set; }
public DateTime CreatedDate { get; set; }
public virtual A Property { get; set; }
}
The table for A has about one million rows and B will eventually have around ten million. Right now B is sitting at 50,000.
Here is what the query currently looks like. It gives me the expected results but I have to run an orderby multiple times and do other unnecessary steps:
var filterA = this.context.A.Where(gt => gt.Valid == null && !gt.AccountId.Contains(account.Id));
var joinedQuery = from b in this.context.B.Where(gv => !gv.AccountId.Contains(account.Id))
join a in filterA on gv.A equals a
where !a.Collection.Any(v => v.AccountId.Contains(account.Id))
let count = gt.Collection.Count()
orderby count descending
select new { A = gt, Count = count };
IQueryable<GifTag> output = joinedQuery
.Where(t => t.A != null)
.Select(t => t.A)
.Distinct()
.Take(20)
.OrderBy(t => t.Collection.Count);
Thanks

Well you could always try to remove these two lines from the joinQuery
where !a.Collection.Any(v => v.AccountId.Contains(account.Id))
and
orderby count descending
the first line have already been filtered in the first Query
and the orderline, well do do the ordering on the last Query so there is no point in doing it twice

Related

Filtering parent collection by grandchildren properties in EF Core

Using EF core 5 and ASP.NET Core 3.1, I am trying to get a filtered collection based on a condition on its grandchildren collection.
I have the following entities:
public class Organisation
{
public int Id { get; set; }
public int? OrganisationId { get; set; }
public IEnumerable<Customer> Customers { get; set; }
}
public partial class Customer
{
[Key]
public uint Id { get; set; }
public int? EmployerId { get; set; }
public int? OrganisationId { get; set; }
public List<TimecardProperties> TimecardsProperties { get; set; }
}
public partial class TimecardProperties
{
[Key]
public int Id { get; set; }
public int? EmployerId { get; set; }
public int? Week { get; set; }
public short? Year { get; set; }
}
The goal is to get all Organisations that have at least one customer and the customer has at least 1 timecard property that is in week=34 and year=2021.
So far I have tried the following:
////necessary join to get Organisations for user id
IQueryable<Organisation> ouQuery = (from cou in _dbContext.Organisations
join uou in _dbContext.table2 on cou.OrganisationId equals uou.OrganisationId
where uou.UsersId == int.Parse(userId)
select cou)
.Where(cou => cou.Customers.Where(c => c.TimecardsProperties.Count > 0).Any())
.Include(cou => cou.Customers.Where(c => c.TimecardsProperties.Count > 0))
.ThenInclude(c => c.TimecardsProperties.Where(tc => tc.tWeek == 34 && tc.Year > 2020))
;
This returns a organisation list that each have a customers list but some customers have a count of timecards 0. I don't want to have organisation in the returned list that does not have at least one item in the timecards collection.
Also, it is too slow, and if I try to filter the produced list its even
slower (over 15 seconds)
I have also tried a raw sql query on the organisation db context but it is again very slow:
select distinct count(id) from organisation a where organisation_id in (
select organisation_id from customers where employer_id in (select distinct employer_id from timecards a
inner join timecard_components b on a.id=b.timecards_id
where week IN(
34) and year in (2021,2021) and invoice !=0 and type = 'time'
group by employer_id, week)
);
In general, I want to know the the total
count of the returned organisation collection for pagination (so I don't need to include all attributes of each entity)
as well as return only a part of the correct results, which satisfy the conditions,
an organisation list that has at least 1 timecards in
their customers by executing the query in the end like so:
ouQuery.Skip((page - 1) * pageSize).Take(pageSize).ToListAsync();
I have also tried the EntityFramework.Plus and projection with no results.
How could I write this to achieve getting the total count of the organisation list and a part of these results (first 10) to display to the user?
Use navigation properties. This is the query you want:
var orgsQuery = dbContext.Organizations
.Where( o => o.Customers.Any( c =>
c.TimecardProperties.Any( tp =>
tp.Year = 2021
&& tp.Week = 34 ) ) );
Add includes and other predicates as needed

How could I make this EF Core query better?

I need to fetch from the database this:
rack
it's type
single shelf with all its boxes and their box types
single shelf above the previous shelf without boxes and with shelf type
Shelves have VerticalPosition which is in centimeters from the ground - when I am querying for e.g. second shelf in rack, I need to order them and select shelf on index 1.
I have this ugly EF query now:
var targetShelf = await _warehouseContext.Shelves
.Include(s => s.Rack)
.ThenInclude(r => r.Shelves)
.ThenInclude(s => s.Type)
.Include(s => s.Rack)
.ThenInclude(r => r.Type)
.Include(s => s.Rack)
.ThenInclude(r => r.Shelves)
.Include(s => s.Boxes)
.ThenInclude(b => b.BoxType)
.Where(s => s.Rack.Aisle.Room.Number == targetPosition.Room)
.Where(s => s.Rack.Aisle.Letter == targetPosition.Aisle)
.Where(s => s.Rack.Position == targetPosition.Rack)
.OrderBy(s => s.VerticalPosition)
.Skip(targetPosition.ShelfNumber - 1)
.FirstOrDefaultAsync();
but this gets all boxes from all shelves and it also shows warning
Compiling a query which loads related collections for more than one collection navigation, either via 'Include' or through projection, but no 'QuerySplittingBehavior' has been configured. By default, Entity Framework will use 'QuerySplittingBehavior.SingleQuery', which can potentially result in slow query performance.
Also I would like to use AsNoTracking(), because I don't need change tracker for these data.
First thing: for AsNoTracking() I would need to query Racks, because it complains about circular include.
Second thing: I tried conditional include like this:
.Include(r => r.Shelves)
.ThenInclude(s => s.Boxes.Where(b => b.ShelfId == b.Shelf.Rack.Shelves.OrderBy(sh => sh.VerticalPosition).Skip(shelfNumberFromGround - 1).First().Id))
but this won't even translate to SQL.
I have also thought of two queries - one will retrieve rack with shelves and second only boxes, but I still wonder if there is some single call command for this.
Entities:
public class Rack
{
public Guid Id { get; set; }
public Guid RackTypeId { get; set; }
public RackType Type { get; set; }
public ICollection<Shelf> Shelves { get; set; }
}
public class RackType
{
public Guid Id { get; set; }
public ICollection<Rack> Racks { get; set; }
}
public class Shelf
{
public Guid Id { get; set; }
public Guid ShelfTypeId { get; set; }
public Guid RackId { get; set; }
public int VerticalPosition { get; set; }
public ShelfType Type { get; set; }
public Rack Rack { get; set; }
public ICollection<Box> Boxes { get; set; }
}
public class ShelfType
{
public Guid Id { get; set; }
public ICollection<Shelf> Shelves { get; set; }
}
public class Box
{
public Guid Id { get; set; }
public Guid ShelfId { get; set; }
public Guid BoxTypeId { get; set; }
public BoxType BoxType { get; set; }
public Shelf Shelf { get; set; }
}
public class BoxType
{
public Guid Id { get; set; }
public ICollection<Box> Boxes { get; set; }
}
I hope I explained it good enough.
Query Splitting
First, I'd recommend benchmarking the query as-is before deciding whether to attempt any optimization.
It can be faster to perform multiple queries than one large query with many joins. While you avoid a single complex query, you have additional network round-trips if your DB isn't on the same machine, and some databases (e.g. SQL Server without MARS enabled) only support one active query at a time. Your mileage may vary in terms of actual performance.
Databases do not generally guarantee consistency between separate queries (SQL Server allows you to mitigate that with the performance-expensive options of serializable or snapshot transactions). You should be cautious using a multiple-query strategy if intervening data modifications are possible.
To split a specific query, use the AsSplitQuery() extension method.
To use split queries for all queries against a given DB context,
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
optionsBuilder
.UseSqlServer(
#"Server=(localdb)\mssqllocaldb;Database=EFQuerying;Trusted_Connection=True;ConnectRetryCount=0",
o => o.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery));
}
Reference.
Query that won't translate
.Include(r => r.Shelves)
.ThenInclude(s => s.Boxes.Where(b => b.ShelfId == b.Shelf.Rack.Shelves.OrderBy(sh => sh.VerticalPosition).Skip(shelfNumberFromGround - 1).First().Id))
Your expression
s.Boxes.Where(b => b.ShelfId == b.Shelf.Rack.Shelves.OrderBy(sh => sh.VerticalPosition).Skip(shelfNumberFromGround - 1).First().Id
resolves to an Id. ThenInclude() expects an expression that ultimately specifies a collection navigation (in other words, a table).
Ok, from your question I'm assuming you have a method where you need these bits of information:
single shelf with all its boxes and their box types
single shelf above the previous shelf without boxes and with shelf type
rack and it's type
Whether EF breaks up the queries or you do doesn't really make much of a difference performance-wise. What matters is how well the code is later understood and can adapt if/when requirements change.
The first step I would recommend is to identify the scope of detail you actually need. You mention that you don't need tracking, so I would expect you intend to deliver these results or otherwise consume the information without persisting changes. Project this down to just the details from the various tables that you need to be served by a DTO or ViewModel, or an anonymous type if the data doesn't really need to travel. For instance you will have a shelf & shelf type which is effectively a many-to-one so the shelf type details can probably be part of the shelf results. Same with the Box and BoxType details. A shelf would then have an optional set of applicable box details. The Rack & Racktype details can come back with one of the shelf queries.
[Serializable]
public class RackDTO
{
public int RackId { get; set; }
public int RackTypeId { get; set; }
public string RackTypeName { get; set; }
}
[Serializable]
public class ShelfDTO
{
public int ShelfId { get; set; }
public int VerticalPosition { get; set; }
public int ShelfTypeId { get; set; }
public string ShelfTypeName { get; set; }
public ICollection<BoxDTO> Boxes { get; set; } = new List<BoxDTO>();
public RackDTO Rack { get; set; }
}
[Serializable]
public class BoxDTO
{
public int BoxId { get; set; }
public int BoxTypeId { get; set; }
public string BoxTypeName { get; set; }
}
Then when reading the information, I'd probably split it into two queries. One to get the "main" shelf, then a second optional one to get the "previous" one if applicable.
ShelfDTO shelf = await _warehouseContext.Shelves
.Where(s => s.Rack.Aisle.Room.Number == targetPosition.Room
&& s.Rack.Aisle.Letter == targetPosition.Aisle
&& s.Rack.Position == targetPosition.Rack)
.Select(s => new ShelfDTO
{
ShelfId = s.ShelfId,
VerticalPosition = s.VerticalPosition,
ShelfTypeId = s.ShelfType.ShelfTypeId,
ShelfTypeName = s.ShelfType.Name,
Rack = s.Rack.Select(r => new RackDTO
{
RackId = r.RackId,
RackTypeId = r.RackType.RackTypeId,
RackTypeName = r.RackType.Name
}).Single(),
Boxes = s.Boxes.Select(b => new BoxDTO
{
BoxId = b.BoxId,
BoxTypeId = b.BoxType.BoxTypeId,
BoxTypeName = b.BoxType.Name
}).ToList()
}).OrderBy(s => s.VerticalPosition)
.Skip(targetPosition.ShelfNumber - 1)
.FirstOrDefaultAsync();
ShelfDTO previousShelf = null;
if (targetPosition.ShelfNumber > 1 && shelf != null)
{
previousShelf = await _warehouseContext.Shelves
.Where(s => s.Rack.RackId == shelf.RackId
&& s.VerticalPosition < shelf.VerticalPosition)
.Select(s => new ShelfDTO
{
ShelfId = s.ShelfId,
VerticalPosition = s.VerticalPosition,
ShelfTypeId = s.ShelfType.ShelfTypeId,
ShelfTypeName = s.ShelfType.Name,
Rack = s.Rack.Select(r => new RackDTO
{
RackId = r.RackId,
RackTypeId = r.RackType.RackTypeId,
RackTypeName = r.RackType.Name
}).Single()
}).OrderByDescending(s => s.VerticalPosition)
.FirstOrDefaultAsync();
}
Two fairly simple to read queries that should return what you need without much problem. Because we project down to a DTO we don't need to worry about eager loading and potential cyclical references if we wanted to load an entire detached graph. Obviously this would need to be fleshed out to include the details from the shelf, box, and rack that are relevant to the consuming code/view. This can be trimmed down even more by leveraging Automapper and it's ProjectTo method to take the place of that whole Select projection as a one-liner.
In SQL raw it could look like
WITH x AS(
SELECT
r.*, s.Id as ShelfId, s.Type as ShelfType
ROW_NUMBER() OVER(ORDER BY s.verticalposition) as shelfnum
FROM
rooms
JOIN aisles on aisles.RoomId = rooms.Id
JOIN racks r on r.AisleId = aisles.Id
JOIN shelves s ON s.RackId = r.Id
WHERE
rooms.Number = #roomnum AND
aisles.Letter = #let AND
r.Position = #pos
)
SELECT *
FROM
x
LEFT JOIN boxes b
ON
b.ShelfId = x.ShelfId AND x.ShelfNum = #shelfnum
WHERE
x.ShelfNum BETWEEN #shelfnum AND #shelfnum+1
The WITH uses room/aisle/rack joins to locate the rack; you seem to have these identifiers. Shelves are numbered in increasing height off ground. Outside the WITH, boxes are left joined only if they are on the shelf you want, but two shelves are returned; the shelf you want with all it's boxes and the shelf above but box data will be null because the left join fails
As an opinion, if your query is getting this level of depth, you might want to consider either using views as a shortcut in your database or use No-SQL as a read store.
Having to do lots of joins, and doing taxing operations like order by during runtime with LINQ is something I'd try my best to avoid.
So I'd approach this as a design problem, rather than a code/query problem.
In EF, All related entities loaded with Include, ThenInclude etc. produce joins on the database end. This means that when we load related master tables, the list values will get duplicated across all records, thus causing what is called "cartesian explosion". Due to this, there was a need to split huge queries into multiple calls, and eventually .AsSplitQuery() was introduced.
Eg:
var query = Context.DataSet<Transactions>()
.Include(x => x.Master1)
.Include(x => x.Master2)
.Include(x => x.Master3)
.ThenInclude(x => x.Master3.Masterx)
.Where(expression).ToListAsync();
Here we can introduce splitquery
var query = Context.DataSet<Transactions>()
.Include(x => x.Master1)
.Include(x => x.Master2)
.Include(x => x.Master3)
.ThenInclude(x => x.Master3.Masterx)
.Where(expression).AsSplitQuery.ToListAsync();
As an alternate to include this to all existing queries, which could be time consuming, we could specify this globally like
services.AddDbContextPool<EntityDataLayer.ApplicationDbContext>(options =>
{
options.EnableSensitiveDataLogging(true);
options.UseMySql(mySqlConnectionStr,
ServerVersion.AutoDetect(mySqlConnectionStr), x =>
x.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery)
x.EnableRetryOnFailure(
maxRetryCount: 10,
maxRetryDelay: TimeSpan.FromSeconds(30),
errorNumbersToAdd: null));
});
This will ensure that all queries are called as split queries.
Now in case we need single query, we can just override this by stating single query explicitly in individual queries. This may be done vice-versa though.
var data = await query.AsSingleQuery().ToListAsync();

Filter list of entity objects by string child object property using Linq Lambda

I am trying to return an IQueryable lands filtered by a child object property Owner.Name. Is working well with the query style solution, but I want to use a lambda one.
On short these are my classes mapped by EntityFramework:
public class Land
{
public int Id { get; set; }
public virtual ICollection<Owner> Owners { get; set; }
}
public class Owner
{
public int Id { get; set; }
public string Name { get; set; }
public int LandId { get; set; }
public virtual Land Lands { get; set; }
}
The query which is working fine:
var list = from land in db.Lands
join owner in db.Owners on land.Id equals Owner.LandId
where owner.Name.Contains("Smit")
select land;
I was trying using this:
var list = db.Lands.Where(lnd => lnd.Owners.Count() > 0 &&
lnd.Owners.Where(own => own.Name.Contains("Smit")).Count() > 0);
It works only for small lists, but for some with thousands of records it gives timeout.
Well, one issue which may be causing the speed problem is that your lambda version and your non-lambda versions do very different things. You're non lambda is doing a join with a where on one side of the join.
Why not just write the lambda equivalent of it?
var list = db.Lands.Join(db.Owners.Where(x=> x.Name.Contains("Smit")), a=> a.Id, b => b.LandId, (a,b) => a).toList();
I mean, that is the more direct equivalent of your non lambda
I think you can use this one:
var list = db.Lands.Where(lnd => lnd.Owners.Any(x => x.Name.Contains("Smit")));
Try something more straightforward:
var lands = db.Owners.Where(o => o.Name.Contains("Smit")).Select(o => o.Lands);
You just need to make sure that Owner.Name is not null and LINQ will do the rest.

Simple Linq query that joins two tables and returns a collection

I have two tables LookUpCodes and LookUpValues they are defined as below:
public partial class LookUpCodes
{
public int Id { get; set; }
public int? CodeId { get; set; }
public string Code { get; set; }
public string Description { get; set; }
}
public partial class LookUpValues
{
public int Id { get; set; }
public int CodeId { get; set; }
public string CodeValue { get; set; }
public bool? IsActive { get; set; }
}
Each LookUpCode can have multiple Values associated with it. I want to pass in a code and get associated list of values back.
This is probably a common question as I have seen this everywhere, I am not looking for an answer per se, if someone can just explain how to build the proper query I would be obliged.
Here is what I have done so far:
public IEnumerable<LookUpValues> GetValuesByCode(string cd)
{
var query = from code in _context.LookUpCodes
join values in _context.LookUpValues on code.CodeId equals values.CodeId
where code.Code == cd
select new { LookUpValues = values };
return (IEnumerable<LookUpValues>) query.ToList();
}
You are very close to that you are looking for:
public IEnumerable<LookUpValues> GetValuesByCode(string cd)
{
var query = from code in _context.LookUpCodes
join values in _context.LookUpValues
on code.CodeId equals values.CodeId
where code.Code == cd
select values;
return query;
}
Since you have written the join, I assume that you have understood how it works. However let's revisit it:
from a in listA
join b in listB
on a.commonId equals b.commonId
In the above snippet we join the contents of listA with the contents of listB and we base their join on a commonId property existing in items of both lists. Apparently the pair of a and b that fulfill the join criterion it would form one of the possible many results.
Then the where clause is applied on the results of the join. The joined items that pass thewherefilter is the new result. Even at this point the results is still pairs ofaandb`.
Last you project, using the select keyword each pair of the results to a new object. In your case, for each pair of code and values that passed also the where filter, you return only the values.

Select value from group with the highest count

Say I have two sets; Candy and Region. For each Candy I have one-to-many names of what that candy is called in a certain Region (CandyRegion).
I want to build a list of Candy objects where only a single CandyRegion.Name is selected. The CandyRegion.Name selected is determined by the most popular Region (where the number of candies available in that Region is the greatest).
Whats a suitable way to perform the query to find the most popular Region? What I have so far:
context.Candys
.Select(c => c.CandyRegions
.OrderByDescending(cr =>
/*
* Order CandyRegion by Candy.CandyRegion.Count outside
* the context of c?
*/
context.CandyRegions.Count(cr2 => cr2.RegionId == cr.RegionId)
)
.FirstOrDefault()
)
.ToList();
I have a feeling that the performance of the above query is going to be a problem.
Edit: Classes
public class Candy
{
public int Id { get; set; }
...
public virtual List<CandyRegion> CandyRegions { get; set; }
}
public class Region
{
public int Id { get; set; }
...
public virtual List<CandyRegion> CandyRegions { get; set; }
}
public class CandyRegion
{
public int Id { get; set; }
public string Name { get; set; }
...
public int RegionId { get; set; }
public virtual Region Region { get; set; }
public int CandyId { get; set; }
public virtual Candy Candy { get; set; }
}
This should do the trick, although I haven't tested it. Let me know this solves your problem.
Context.Candys
.Where(c => c.Id == c.CandyRegions
.FirstOrDefault(cr => cr.RegionId == c.CandyRegions
.GroupBy(grp => grp.RegionId)
.Select(r => new
{
RegionId = r.Key,
Total = r.Count()
}
).OrderByDescending(r => r.Total)
.Take(1)
.First().RegionId
).CandyId
)
.ToList();
Explanation of what's going on above...
Since the Region and the Candy both use CandyRegion as a bridging table, you can group the Candy's foreign key collection of CandyRegion by it's RegionId.
This provides an easy way of determining what the count is for each grouped set. Now that you have the counts, we want to order them from highest to lowest, and grab just the topmost item. We don't care about the rest.
Once that's done, it would just be a matter of selecting the first CandyRegion contained in the list that matches the determined RegionId and then comparing it's CandyId to the CandyID of the same CandyRegion.
Finally, when that's all done, you return the result as a list which would be the said Candies that you are after.
Any specific reason to query against context for getting count while you can have c.CandyRegion with you?
Below query will result in join and query with context will result in inner query.
context.Candy
.Select(c => c.CandyRegion
.OrderByDescending(cr =>
/*
* Order CandyRegion by Candy.CandyRegion.Count outside
* the context of c?
*/
c.CandyRegion.Count()
)
.FirstOrDefault()
)
.ToList();

Categories