How could I make this EF Core query better? - c#

I need to fetch from the database this:
rack
its type
single shelf with all its boxes and their box types
single shelf above the previous shelf without boxes and with shelf type
Shelves have VerticalPosition which is in centimeters from the ground - when I am querying for e.g. second shelf in rack, I need to order them and select shelf on index 1.
I have this ugly EF query now:
var targetShelf = await _warehouseContext.Shelves
.Include(s => s.Rack)
.ThenInclude(r => r.Shelves)
.ThenInclude(s => s.Type)
.Include(s => s.Rack)
.ThenInclude(r => r.Type)
.Include(s => s.Rack)
.ThenInclude(r => r.Shelves)
.Include(s => s.Boxes)
.ThenInclude(b => b.BoxType)
.Where(s => s.Rack.Aisle.Room.Number == targetPosition.Room)
.Where(s => s.Rack.Aisle.Letter == targetPosition.Aisle)
.Where(s => s.Rack.Position == targetPosition.Rack)
.OrderBy(s => s.VerticalPosition)
.Skip(targetPosition.ShelfNumber - 1)
.FirstOrDefaultAsync();
but this loads all boxes from all shelves, and it also produces this warning:
Compiling a query which loads related collections for more than one collection navigation, either via 'Include' or through projection, but no 'QuerySplittingBehavior' has been configured. By default, Entity Framework will use 'QuerySplittingBehavior.SingleQuery', which can potentially result in slow query performance.
Also I would like to use AsNoTracking(), because I don't need change tracker for these data.
First thing: for AsNoTracking() I would need to query Racks, because it complains about circular include.
Second thing: I tried conditional include like this:
.Include(r => r.Shelves)
.ThenInclude(s => s.Boxes.Where(b => b.ShelfId == b.Shelf.Rack.Shelves.OrderBy(sh => sh.VerticalPosition).Skip(shelfNumberFromGround - 1).First().Id))
but this won't even translate to SQL.
I have also thought of two queries - one will retrieve rack with shelves and second only boxes, but I still wonder if there is some single call command for this.
Entities:
public class Rack
{
public Guid Id { get; set; }
public Guid RackTypeId { get; set; }
public RackType Type { get; set; }
public ICollection<Shelf> Shelves { get; set; }
}
public class RackType
{
public Guid Id { get; set; }
public ICollection<Rack> Racks { get; set; }
}
public class Shelf
{
public Guid Id { get; set; }
public Guid ShelfTypeId { get; set; }
public Guid RackId { get; set; }
public int VerticalPosition { get; set; }
public ShelfType Type { get; set; }
public Rack Rack { get; set; }
public ICollection<Box> Boxes { get; set; }
}
public class ShelfType
{
public Guid Id { get; set; }
public ICollection<Shelf> Shelves { get; set; }
}
public class Box
{
public Guid Id { get; set; }
public Guid ShelfId { get; set; }
public Guid BoxTypeId { get; set; }
public BoxType BoxType { get; set; }
public Shelf Shelf { get; set; }
}
public class BoxType
{
public Guid Id { get; set; }
public ICollection<Box> Boxes { get; set; }
}
I hope I explained it well enough.

Query Splitting
First, I'd recommend benchmarking the query as-is before deciding whether to attempt any optimization.
It can be faster to perform multiple queries than one large query with many joins. While you avoid a single complex query, you have additional network round-trips if your DB isn't on the same machine, and some databases (e.g. SQL Server without MARS enabled) only support one active query at a time. Your mileage may vary in terms of actual performance.
Databases do not generally guarantee consistency between separate queries (SQL Server allows you to mitigate that with the performance-expensive options of serializable or snapshot transactions). You should be cautious using a multiple-query strategy if intervening data modifications are possible.
To split a specific query, use the AsSplitQuery() extension method.
To use split queries for all queries against a given DB context, configure the behavior in the provider options:
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder
        .UseSqlServer(
            @"Server=(localdb)\mssqllocaldb;Database=EFQuerying;Trusted_Connection=True;ConnectRetryCount=0",
            o => o.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery));
}
Reference.
Query that won't translate
.Include(r => r.Shelves)
.ThenInclude(s => s.Boxes.Where(b => b.ShelfId == b.Shelf.Rack.Shelves.OrderBy(sh => sh.VerticalPosition).Skip(shelfNumberFromGround - 1).First().Id))
Your expression
s.Boxes.Where(b => b.ShelfId == b.Shelf.Rack.Shelves.OrderBy(sh => sh.VerticalPosition).Skip(shelfNumberFromGround - 1).First().Id
resolves to an Id. ThenInclude() expects an expression that ultimately specifies a collection navigation (in other words, a table).

Ok, from your question I'm assuming you have a method where you need these bits of information:
single shelf with all its boxes and their box types
single shelf above the previous shelf without boxes and with shelf type
rack and its type
Whether EF breaks up the queries or you do doesn't really make much of a difference performance-wise. What matters is how well the code is later understood and can adapt if/when requirements change.
The first step I would recommend is to identify the scope of detail you actually need. You mention that you don't need tracking, so I would expect you intend to deliver these results or otherwise consume the information without persisting changes. Project this down to just the details from the various tables that you need to be served by a DTO or ViewModel, or an anonymous type if the data doesn't really need to travel. For instance you will have a shelf & shelf type which is effectively a many-to-one so the shelf type details can probably be part of the shelf results. Same with the Box and BoxType details. A shelf would then have an optional set of applicable box details. The Rack & Racktype details can come back with one of the shelf queries.
[Serializable]
public class RackDTO
{
    public Guid RackId { get; set; }
    public Guid RackTypeId { get; set; }
    public string RackTypeName { get; set; }
}
[Serializable]
public class ShelfDTO
{
    public Guid ShelfId { get; set; }
    public int VerticalPosition { get; set; }
    public Guid ShelfTypeId { get; set; }
    public string ShelfTypeName { get; set; }
    public ICollection<BoxDTO> Boxes { get; set; } = new List<BoxDTO>();
    public RackDTO Rack { get; set; }
}
[Serializable]
public class BoxDTO
{
    public Guid BoxId { get; set; }
    public Guid BoxTypeId { get; set; }
    public string BoxTypeName { get; set; }
}
Then when reading the information, I'd probably split it into two queries. One to get the "main" shelf, then a second optional one to get the "previous" one if applicable.
// The Name properties on the type entities are assumed; the question only shows their Ids.
ShelfDTO shelf = await _warehouseContext.Shelves
    .Where(s => s.Rack.Aisle.Room.Number == targetPosition.Room
        && s.Rack.Aisle.Letter == targetPosition.Aisle
        && s.Rack.Position == targetPosition.Rack)
    .OrderBy(s => s.VerticalPosition)
    .Skip(targetPosition.ShelfNumber - 1)
    .Select(s => new ShelfDTO
    {
        ShelfId = s.Id,
        VerticalPosition = s.VerticalPosition,
        ShelfTypeId = s.ShelfTypeId,
        ShelfTypeName = s.Type.Name,
        // Rack is a single reference, so it is projected directly.
        Rack = new RackDTO
        {
            RackId = s.Rack.Id,
            RackTypeId = s.Rack.RackTypeId,
            RackTypeName = s.Rack.Type.Name
        },
        Boxes = s.Boxes.Select(b => new BoxDTO
        {
            BoxId = b.Id,
            BoxTypeId = b.BoxTypeId,
            BoxTypeName = b.BoxType.Name
        }).ToList()
    })
    .FirstOrDefaultAsync();
ShelfDTO previousShelf = null;
if (targetPosition.ShelfNumber > 1 && shelf != null)
{
    // Nearest shelf below the target. For the shelf above instead,
    // use '>' with OrderBy and drop the ShelfNumber > 1 check.
    previousShelf = await _warehouseContext.Shelves
        .Where(s => s.RackId == shelf.Rack.RackId
            && s.VerticalPosition < shelf.VerticalPosition)
        .OrderByDescending(s => s.VerticalPosition)
        .Select(s => new ShelfDTO
        {
            ShelfId = s.Id,
            VerticalPosition = s.VerticalPosition,
            ShelfTypeId = s.ShelfTypeId,
            ShelfTypeName = s.Type.Name,
            Rack = new RackDTO
            {
                RackId = s.Rack.Id,
                RackTypeId = s.Rack.RackTypeId,
                RackTypeName = s.Rack.Type.Name
            }
        })
        .FirstOrDefaultAsync();
}
Two fairly simple-to-read queries that should return what you need without much problem. Because we project down to a DTO, we don't need to worry about eager loading or the potential cyclical references we would hit if we loaded an entire detached graph. Obviously this would need to be fleshed out to include the details from the shelf, box, and rack that are relevant to the consuming code/view. This can be trimmed down even more by leveraging AutoMapper and its ProjectTo method to take the place of that whole Select projection as a one-liner.

In raw SQL it could look like
WITH x AS (
    SELECT
        r.*, s.Id AS ShelfId, s.Type AS ShelfType,
        ROW_NUMBER() OVER (ORDER BY s.VerticalPosition) AS ShelfNum
    FROM
        rooms
        JOIN aisles ON aisles.RoomId = rooms.Id
        JOIN racks r ON r.AisleId = aisles.Id
        JOIN shelves s ON s.RackId = r.Id
    WHERE
        rooms.Number = @roomnum AND
        aisles.Letter = @let AND
        r.Position = @pos
)
SELECT *
FROM
    x
    LEFT JOIN boxes b
    ON
        b.ShelfId = x.ShelfId AND x.ShelfNum = @shelfnum
WHERE
    x.ShelfNum BETWEEN @shelfnum AND @shelfnum + 1
The WITH uses room/aisle/rack joins to locate the rack; you seem to have these identifiers. Shelves are numbered by increasing height off the ground. Outside the WITH, boxes are left joined only if they are on the shelf you want, but two shelves are returned: the shelf you want with all its boxes, and the shelf above, whose box columns will be null because the left join condition doesn't match for it.
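
To make the numbering logic concrete, here is a minimal in-memory sketch of the same idea using LINQ to Objects rather than SQL. The tuples and their values are made up for illustration, and the rack is assumed to be already located; it only mirrors the ROW_NUMBER / LEFT JOIN shape, not the actual database query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory rows standing in for one rack's shelves and boxes.
var shelves = new[]
{
    (Id: 1, VerticalPosition: 120),
    (Id: 2, VerticalPosition: 0),
    (Id: 3, VerticalPosition: 60),
};
var boxes = new[]
{
    (Id: 10, ShelfId: 3),
    (Id: 11, ShelfId: 3),
    (Id: 12, ShelfId: 1),
};

int shelfNum = 2; // second shelf from the ground

// Counterpart of ROW_NUMBER() OVER (ORDER BY VerticalPosition).
var numbered = shelves
    .OrderBy(s => s.VerticalPosition)
    .Select((s, index) => (Shelf: s, ShelfNum: index + 1));

// Keep the target shelf and the one above it; attach boxes only to the target.
var result = numbered
    .Where(x => x.ShelfNum >= shelfNum && x.ShelfNum <= shelfNum + 1)
    .Select(x => (
        Id: x.Shelf.Id,
        ShelfNum: x.ShelfNum,
        Boxes: x.ShelfNum == shelfNum
            ? boxes.Where(b => b.ShelfId == x.Shelf.Id).Select(b => b.Id).ToList()
            : new List<int>()))
    .ToList();

foreach (var row in result)
    Console.WriteLine($"shelf {row.Id} (#{row.ShelfNum}): {row.Boxes.Count} boxes");
```

With this sample data the target shelf (Id 3) comes back with its two boxes, while the shelf above (Id 1) comes back with none even though it holds a box.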

As an opinion, if your query is getting to this level of depth, you might want to consider either using views as a shortcut in your database or using NoSQL as a read store.
Having to do lots of joins, and doing taxing operations like order by during runtime with LINQ is something I'd try my best to avoid.
So I'd approach this as a design problem, rather than a code/query problem.

In EF, all related entities loaded with Include, ThenInclude etc. produce joins on the database end. This means that when we load related master tables, the list values get duplicated across all records, causing what is called "cartesian explosion". Because of this, there was a need to split huge queries into multiple calls, and eventually .AsSplitQuery() was introduced.
E.g.:
var query = Context.Set<Transactions>()
    .Include(x => x.Master1)
    .Include(x => x.Master2)
    .Include(x => x.Master3)
        .ThenInclude(m => m.Masterx) // the lambda parameter here is Master3
    .Where(expression);
var result = await query.ToListAsync();
Here we can introduce a split query:
var result = await query.AsSplitQuery().ToListAsync();
As an alternative to adding this to every existing query, which could be time consuming, we could specify it globally like
services.AddDbContextPool<EntityDataLayer.ApplicationDbContext>(options =>
{
    options.EnableSensitiveDataLogging(true);
    options.UseMySql(mySqlConnectionStr,
        ServerVersion.AutoDetect(mySqlConnectionStr), x =>
            x.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery)
                .EnableRetryOnFailure(
                    maxRetryCount: 10,
                    maxRetryDelay: TimeSpan.FromSeconds(30),
                    errorNumbersToAdd: null));
});
This will ensure that all queries are called as split queries.
If we then need a single query somewhere, we can override the global setting by requesting a single query explicitly on that individual query. The reverse works too: keep single query as the default and opt in to split queries per query.
var data = await query.AsSingleQuery().ToListAsync();

Related

.NET Core Dapper: Get data by joining multiple tables on combined primary keys as object with list of objects

I've been using Dapper to access data from an older database with a specific structure and am having trouble with some issues. Many of them have some sort of answer but I can't seem to combine the solutions, here we go:
The object is to be created out of 3 different tables (inner join)
The objects are to be created in a nested object (a main object with lists of sub objects)
The required tables have combined primary keys (no separate unique keys)
The objects have identical property names (I'm not sure that this is an issue, haven't come far enough)
public class MainObject {
public long Id1 { get; set; }
public long Id2 { get; set; }
public string Id3 { get; set; }
public List<SubObject1> Subobject1 { get; set; }
public List<SubObject2> Subobject2 { get; set; }
public string OtherProps { get; set; }
}
public class SubObject1 {
public long Id1 { get; set; }
public long Id2 { get; set; }
public string Id3 { get; set; }
}
public class SubObject2 {
public long Id1 { get; set; }
public long Id2 { get; set; }
public string Id3 { get; set; }
}
I've been trying to combine issues 1 and 2 as described in this StackOverflow answer. After that I've been trying to add issue 3 as described in this StackOverflow answer, but haven't been able to make it work. An error I get frequently is System.ArgumentException: When using the multi-mapping APIs ensure you set the splitOn param if you have keys other than Id, so I'm not sure I even understand the entire concept.
My query (which is able to return multiple rows) is structured as:
SELECT MainObject.*, SubObject1.*, SubObject2.*
FROM MainObject
INNER JOIN SubObject1 ON MainObject.Id1 = SubObject1.Id1
AND MainObject.Id2 = SubObject1.Id2 AND MainObject.Id3 = SubObject1.Id3
INNER JOIN SubObject2 ON MainObject.Id1 = SubObject2.Id1
AND MainObject.Id2 = SubObject2.Id2 AND MainObject.Id3 = SubObject2.Id3
WHERE MainObject.OtherProps = 'SomeValue'
Preferably the output would be of type List<MainObject>.
I'm open to all remarks and hints.
E: The reason we chose Dapper is that we're reluctant to use Entity Framework and our current mapping has performance issues. At the moment we query for a List<MainObject>, then loop over it and query for every SubObject1 and SubObject2 separately (thus executing a lot of queries).
Good morning, mate.
Since your query is simple, I suggest using LINQ (with Entity Framework) instead; it would be easier and gives you the nested objects you want.
Dapper is better suited to more complex, heavier queries where you select the fields explicitly, e.g. "SELECT a.id AS a_id, b.id AS b_id", and then map them manually.
LINQ example:
var blogs1 = context.Blogs
.Include(b => b.Posts.Select(p => p.Comments))
.Include(b => b.Users)
.Include(b => b.Users.City)
.ToList();
After trying to combine the different solutions for a while I stumbled on an article that took a different approach. Instead of trying to map the objects from a single query, it split up the joins in a QueryMultiple statement. I then used Linq to map the multiple objects as one.
public List<MainObject> GetMainObjects(string conn)
{
    List<MainObject> mos;
    using (IDbConnection connection = new SqlConnection(conn))
    {
        // One round trip, three result sets, read back in the same order.
        SqlMapper.GridReader results = connection.QueryMultiple(
            "SELECT * FROM MainObject; " +
            "SELECT * FROM SubObject1; " +
            "SELECT * FROM SubObject2;");
        mos = results.Read<MainObject>().ToList();
        // Attach each sub-object to its parent by matching the combined key.
        List<SubObject1> so1s = results.Read<SubObject1>().ToList();
        mos.ForEach(mo => mo.Subobject1 = so1s
            .Where(so1 => mo.Id1 == so1.Id1 && mo.Id2 == so1.Id2 && mo.Id3.Equals(so1.Id3))
            .ToList());
        List<SubObject2> so2s = results.Read<SubObject2>().ToList();
        mos.ForEach(mo => mo.Subobject2 = so2s
            .Where(so2 => mo.Id1 == so2.Id1 && mo.Id2 == so2.Id2 && mo.Id3.Equals(so2.Id3))
            .ToList());
    }
    return mos;
}
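
The matching step above scans every sub-object for every main object. If the tables grow, one possible refinement is to index the children once by their combined key with ToLookup. The following is only a sketch on made-up in-memory tuples (the Value field and the sample rows are hypothetical), not Dapper code; it relies on value tuples comparing by value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows, shaped like what the Read<T>() calls would return.
var mains = new[]
{
    (Id1: 1L, Id2: 1L, Id3: "A"),
    (Id1: 1L, Id2: 2L, Id3: "B"),
};
var subs = new[]
{
    (Id1: 1L, Id2: 1L, Id3: "A", Value: "first"),
    (Id1: 1L, Id2: 1L, Id3: "A", Value: "second"),
    (Id1: 1L, Id2: 2L, Id3: "B", Value: "third"),
};

// Index the children once by their combined key.
var subsByKey = subs.ToLookup(s => (s.Id1, s.Id2, s.Id3));

// Each parent fetches its children with a single lookup;
// a key with no children yields an empty sequence rather than throwing.
var stitched = mains
    .Select(m => (
        Main: m,
        Children: subsByKey[(m.Id1, m.Id2, m.Id3)].Select(s => s.Value).ToList()))
    .ToList();

foreach (var item in stitched)
    Console.WriteLine($"{item.Main.Id1}/{item.Main.Id2}/{item.Main.Id3}: {string.Join(", ", item.Children)}");
```

The same lookup could be built from the SubObject1 and SubObject2 lists inside GetMainObjects, replacing the nested Where calls.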

Use Where Clause on navigation property. Many-to-many relationship

I have been looking through other examples on SO and I am still unable to resolve this.
I have the following model structure
public class Event
{
[Key]
public int ID { get; set; }
public ICollection<EventCategory> EventCategories{ get; set; }
}
public class Category
{
[Key]
public int ID { get; set; }
public ICollection<EventCategory> EventCategories{ get; set; }
}
public class EventCategory
{
[Key]
public int ID { get; set; }
public int EventID{ get; set; }
public Event Event{ get; set; }
public int CategoryID{ get; set; }
public Category Category{ get; set; }
}
From my Events controller I am trying to use a LINQ query to show only Events where the CategoryID is equal to 1, but I keep running into errors, I think with my WHERE clause.
UPDATE:
I have been trying multiple queries but at present it is
var eventsContext = _context.Events
.Include(e => e.EventCategories)
.Include(e=>e.EventCategories.Select(ms => ms.Category))
.Where(e=>e.ID==1)
.Take(15)
.OrderByDescending(o => o.StartDate);
This is the error I get
TIA
First, the lambda passed to Include must be a model expression. Specifically, that means you cannot use something like Select. If you're trying to include EventCategories.Category, then you should actually do:
.Include(e => e.EventCategories).ThenInclude(ms => ms.Category)
That will fix your immediate error. The next issue is that the way in which you're attempting to query the category ID is incorrect. The lambdas don't carry over from one clause to the next. In other words, when you're doing Where(e => e.ID == 1), e is Event, not Category. The fact that you just included Category doesn't limit the where clause to that context. Therefore, what you actually need is:
.Where(e => e.EventCategories.Any(c => c.CategoryID == 1))
For what it's worth, you could also write that as:
.Where(e => e.EventCategories.Any(c => c.Category.ID == 1))
Notice the . between Category and ID. Now this where clause requires joins to be made between all of Event, EventCategories, and Category, which then means you don't actually need your Include(...).ThenInclude(...) statement, since all this does is tell EF to make the same JOINs it's already making. I will still usually do the includes explicitly, though, as otherwise, if your where clause were to change in some future iteration, you may end up no longer implicitly including everything you actually want included. Just food for thought.
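
The difference between filtering on the event's own ID and filtering on its related categories can be seen on plain in-memory data. This is just a sketch with hypothetical tuples standing in for Event and its EventCategories rows; it runs as LINQ to Objects, not against EF.

```csharp
using System;
using System.Linq;

// Hypothetical data: each event carries the category IDs of its EventCategories rows.
var events = new[]
{
    (ID: 1, CategoryIDs: new[] { 1, 2 }),
    (ID: 2, CategoryIDs: new[] { 3 }),
    (ID: 3, CategoryIDs: new int[0]),
};

// Comparing e.ID tests the event's own key, not its categories.
var byEventId = events.Where(e => e.ID == 1).Select(e => e.ID).ToList();

// Any(...) asks whether at least one related row matches the category.
var inCategory3 = events
    .Where(e => e.CategoryIDs.Any(c => c == 3))
    .Select(e => e.ID)
    .ToList();

Console.WriteLine(string.Join(",", byEventId));   // prints 1
Console.WriteLine(string.Join(",", inCategory3)); // prints 2
```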

Simplify linq expression that orders children

I am trying to write a LINQ query that fetches a list of Course entities and their mapped Skill children. They are related through a join table with corresponding CourseId and SkillId. I wish to sort the Skill children by the Weight property and then by the SkillId property. I am using .NET Core 2.0.
After looking at similar questions on how to sort/order a list of children to an entity here:
LINQ ".Include" orderby in subquery
Entity Framework Ordering Includes
I have come up with this:
// Create query to get all courses with their skills
var coursesWithUnorderedSkills= db.Courses
.Include(i => i.Skills)
.ThenInclude(i => i.Skill);
// Order the skills for each course
await coursesWithUnorderedSkills.ForEachAsync(x => x.Skills = x.Skills
.OrderBy(o => o.Weight)
.ThenBy(o => o.SkillId)
.ToList());
// Get a list of courses from the query
var coursesWithOrderedSkills = await coursesWithUnorderedSkills.ToListAsync();
How can this be simplified into a single query and will this query have any unexpected performance issues since I am calling ToList in the ForEachAsync call?
Models
public class Course
{
[Key]
public int Id { get; set; }
public List<CourseSkill> Skills { get; set; }
}
public class CourseSkill
{
public Course Course { get; set; }
public int CourseId { get; set; }
public Skill Skill { get; set; }
public int SkillId { get; set; }
public int Weight { get; set; } = 0;
}
public class Skill
{
[Key]
public int Id { get; set; }
}
Sorry about the comments, now with the model it looks clear to me what you are looking for. And you are right, the second statement would sort the Skills list.
Anyway, if you want to sort the child collection without risking calling twice to the database through your IQueryable, you can take first the list of courses asynchronously, and then sort the Skills in memory:
// Create the list with all courses with their skills
var coursesWithSkills= await db.Courses
.Include(i => i.Skills)
.ThenInclude(i => i.Skill)
.ToListAsync();
// Order the skills for each course once data is in memory
foreach (var x in coursesWithSkills)
{
    x.Skills = x.Skills.OrderBy(o => o.Weight)
        .ThenBy(o => o.SkillId)
        .ToList();
}
If you need the sorting not to block the current thread, run it with Task.Run rather than an async operation, since the sorting is CPU-bound work done in memory. But I wouldn't optimize prematurely; I would leave the foreach block as it is until you see an actual performance issue.

Entity.HasRequired returning items with NULL property

I have two related entities built and linked with Fluent API.
public class EDeal : IEntityBase
{
public int ID { get; set; }
public string Customer_id { get; set; }
public virtual ECustomer Customer { get; set; }
...etc
}
public class ECustomer : IEntityBase
{
public int ID { get; set; }
public string Customer_id { get; set; }
public string Customer_name { get; set; }
public virtual ICollection<EDeal> Deals { get; set; }
...etc
}
linked with
modelBuilder.Entity<ECustomer>().HasKey(c => c.Customer_id);
modelBuilder.Entity<EDeal>().HasRequired<ECustomer>(s => s.Customer)
.WithMany(r => r.Deals)
.HasForeignKey(s => s.Customer_id);
I recognize that this is inefficient linking but I had to link it in this way because I don't have control over the db structure.
The important thing to note is that the EDeal requires an ECustomer (.HasRequired). The database contains many rows in EDeal that have a null Customer_id field and I do not want to ever pull those lines when I query the entity.
I thought that the .HasRequired would make sure that I never got back any EDeals that do not have ECustomers associated with them but that doesn't seem to be the case. Instead, it only seems to ignore those lines with NULL Customer_id values when I try to order by a property in the Customer. And even then, returning the .Count() of the query behaves strangely.
var count1 = db.Set<EDeal>().Count(); //returns 1112
var count2 = db.Set<EDeal>().ToList().Count(); //returns 1112
var count3 = db.Set<EDeal>().OrderBy(c => c.Customer.Customer_name).Count(); //returns 1112
var count4 = db.Set<EDeal>().OrderBy(c => c.Customer.Customer_name).ToList().Count(); //returns 967
I know I can add a .Where(c => c.Customer.Customer_id != null) to make sure I only get back what I'm looking for, but I'm hoping for a solution in the Entity's configuration because I have many generic functions acting on my IEntityBase class that build dynamic queries on generic Entities and I don't want to use a workaround for this case.
Questions:
1) Is there a way to limit the entity to only return those EDeals that have a corresponding ECustomer?
2) In my example above, why do count3 and count4 differ?
Thanks in advance.

Filter linq/entity query results by related data

I'm using MVC5 EF6 and Identity 2.1.
I have two classes:
public class Incident
{
public int IncidentId {get; set;}
...//Title, Description, etc
public virtual ICollection<FollowedIncident> FollowedIncidents { get; set; }
public virtual ApplicationUser User { get; set; }
}
public class FollowedIncident
{
public int FollowedIncidentId { get; set; }
public string UserId { get; set; }
public int IncidentId { get; set; }
public virtual Incident Incident { get; set; }
public virtual ApplicationUser User { get; set; }
}
So, the users will have the ability to follow an incident. (For starters, I'm not entirely sure if I need the ICollection and public virtual relationship references, but added them just in case for the time being.)
I'm trying to create the query that will show users the results of their followed incidents. In my controller, my query starts like this (I'm using Troy Goode's paging package... i.e. listUnpaged):
IQueryable<Incident> listUnpaged = db.Incidents.OrderByDescending(d => d.IncidentDate);
Then I want to filter by followed incidents. So, I want to show incidents where userId (parameter I pass to it) is equal to UserId in FollowedIncident. I've tried like this (error about conversion to bool from IEnumerable):
listUnpaged = listUnpaged.Where(s => s.FollowedIncidents.Where(t => t.UserId == userId));
And this (no error, but doesn't filter at all):
listUnpaged = listUnpaged.Where(s => s.FollowedIncidents.All(t => t.UserId == userId));
To me, it seems it should be as simple as this:
listUnpaged = listUnpaged.Where(s => s.FollowedIncidents.UserId == userId));
But, the linq extensions don't seem to like related data child properties? (I apologize for my programming terminology as I haven't quite pieced together all the names for everything yet.)
Anyone know how to accomplish this? It seems I may not even be thinking about it correct? (...since in the past, I've always used related data to supplement or add properties to a result. This will be the first time I want to narrow results by related data.)
Thank you.
Actually, you're going about getting the Incidents the wrong way. Since Incident is a navigation property of FollowedIncident, you can just use
IQueryable<Incident> listUnpaged = db.FollowedIncidents
    .Where(a => a.UserId == userId)
    .Select(a => a.Incident)
    .OrderByDescending(d => d.IncidentDate);
Another option is to use Any()
IQueryable<Incident> listUnpaged = db.Incidents
    .Where(a => a.FollowedIncidents.Any(b => b.UserId == userId))
    .OrderByDescending(d => d.IncidentDate);
which would be like saying
SELECT *
FROM Incidents
WHERE IncidentId IN (SELECT IncidentId
                     FROM FollowedIncidents
                     WHERE UserId = @UserId)
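
As for why the All() attempt in the question seemed not to filter: All returns true for an empty collection, so every incident with no followers at all passes the test. A small in-memory sketch (hypothetical tuples and user names, LINQ to Objects only) shows the difference from Any:

```csharp
using System;
using System.Linq;

string userId = "alice";

// Hypothetical data: each incident carries the user IDs of its FollowedIncident rows.
var incidents = new[]
{
    (IncidentId: 1, Followers: new[] { "alice", "bob" }),
    (IncidentId: 2, Followers: new[] { "bob" }),
    (IncidentId: 3, Followers: new string[0]),
};

// All(...) is vacuously true when there are no followers at all.
var withAll = incidents
    .Where(i => i.Followers.All(f => f == userId))
    .Select(i => i.IncidentId)
    .ToList();

// Any(...) requires at least one matching follower.
var withAny = incidents
    .Where(i => i.Followers.Any(f => f == userId))
    .Select(i => i.IncidentId)
    .ToList();

Console.WriteLine(string.Join(",", withAll)); // prints 3
Console.WriteLine(string.Join(",", withAny)); // prints 1
```

If most incidents have no followers yet, the All() version returns nearly everything, which would look like no filtering.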
