EF Core Cascading Deletes for Speed? - c#

I'm working on an established (but changeable, assuming existing data survives any changes) code base and investigating some very slow deletes. So far I've only succeeded in making things worse, so here we are. I've backed out most of my attempted changes below to avoid adding extra unnecessary confusion.
There's a data class ProductDefinition which models a same-object hierarchy similar to e.g. a folder structure: every PD (except the root) will have one parent, but like a folder can have multiple children.
public class ProductDefinition
{
    public int ID { get; set; }

    // each tree of PDs should have a 'head' which will have no parent,
    // but most will have a ParentProductDefinitionId and corresponding ParentProductDefinition
    public virtual ProductDefinition ParentProductDefinition { get; set; }
    public int? ParentProductDefinitionId { get; set; }

    public virtual List<ProductDefinition> ProductDefinitions { get; set; }
        = new List<ProductDefinition>();

    [Required]
    [StringLength(100)]
    public string Name { get; set; }

    // etc. Fields. Nothing so large you'd expect speed issues
}
The corresponding table has been specifically declared in the Context:
public DbSet<ProductDefinition> ProductDefinitions { get; set; }
Along with a Fluent API relationship defined in Context.OnModelCreating:
modelBuilder.Entity<ProductDefinition>()
    .HasMany(productDefinition => productDefinition.ProductDefinitions)
    .WithOne(childPd => childPd.ParentProductDefinition)
    .HasForeignKey(childPd => childPd.ParentProductDefinitionId)
    .HasPrincipalKey(productDefinition => productDefinition.ID);
It looks like an attempt has already been made to firm up deletion in the ProductDefinitionManager class:
public static async Task ForceDelete(int ID, ProductContext context)
{
    // wrap the recursion in a save so that it only happens once
    await ForceDeleteNoSave(ID, context);
    await context.SaveChangesAsync();
}
And
private static async Task ForceDeleteNoSave(int ID, ProductContext context)
{
    var pd = await context.ProductDefinitions
        .AsNoTracking()
        .Include(x => x.ProductDefinitions)
        .SingleAsync(x => x.ID == ID);

    if (pd.ProductDefinitions != null && pd.ProductDefinitions.Count != 0)
    {
        var childIDs = pd.ProductDefinitions.Select(x => x.ID).ToList();
        // delete the children recursively
        foreach (var child in childIDs)
        {
            // EDITED HERE TO CORRECTLY REFLECT THE CURRENT CODE BASE
            await ForceDeleteNoSave(child, context);
        }
    }

    // delete the PD
    // mark Supplier as edited
    var supplier = await context.Suppliers.FindAsync(pd.SupplierID);
    supplier.Edited = true;

    // reload with tracking
    pd = await context.ProductDefinitions.FirstOrDefaultAsync(x => x.ID == ID);
    context.ProductDefinitions.Remove(pd);
}
At present, the above solution 'works', but:
a) takes over 2 minutes to complete
b) seems to be giving the React front end a 502 error (but see above); certainly the FE is claiming a 502
My primary question is: is there a way to improve the deletion speed, e.g. by defining a cascading delete in FluentAPI (my attempt hit an issue when trying to apply the migration)? But I welcome any discussion of what might be causing the FE to report Bad Gateway.
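For reference, my cascade attempt was roughly along these lines (a sketch - I've since backed it out):

modelBuilder.Entity<ProductDefinition>()
    .HasMany(productDefinition => productDefinition.ProductDefinitions)
    .WithOne(childPd => childPd.ParentProductDefinition)
    .HasForeignKey(childPd => childPd.ParentProductDefinitionId)
    .HasPrincipalKey(productDefinition => productDefinition.ID)
    .OnDelete(DeleteBehavior.Cascade);

Applying the migration then fails on SQL Server with an error along the lines of "may cause cycles or multiple cascade paths", since the FK is self-referencing.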

Unfortunately this is a self-referencing relationship, and cascade delete cannot be used due to the "multiple cascade paths" issue - a limitation of SQL Server (and probably other databases; Oracle has no such issue).
The best way to handle this in databases which do not support "multiple cascade paths" is to use a database trigger ("instead of delete").
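For illustration, a sketch of installing such a trigger from an EF Core migration - the trigger and migration names are made up, and it assumes SQL Server with the table/column names from the model above (note also SQL Server's default MAXRECURSION limit of 100 levels for the CTE):

public partial class AddProductDefinitionDeleteTrigger : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // INSTEAD OF DELETE: gather each deleted row's whole subtree with a
        // recursive CTE, then delete everything in a single statement.
        migrationBuilder.Sql(@"
CREATE TRIGGER TR_ProductDefinitions_Delete
ON ProductDefinitions INSTEAD OF DELETE AS
BEGIN
    WITH Subtree AS (
        SELECT ID FROM deleted
        UNION ALL
        SELECT pd.ID FROM ProductDefinitions pd
        INNER JOIN Subtree s ON pd.ParentProductDefinitionId = s.ID
    )
    DELETE FROM ProductDefinitions WHERE ID IN (SELECT ID FROM Subtree);
END");
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.Sql("DROP TRIGGER TR_ProductDefinitions_Delete");
    }
}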
But let's say we want to handle it via client code in EF Core. The question then is how to efficiently load a recursive tree-like structure (another non-trivial task in EF Core, due to the lack of recursive query support).
The problem with your code is that it uses a depth-first algorithm, which executes a lot of database queries. The more appropriate and performant way is to use a breadth-first algorithm - in simple words, loading the items level by level. This way the number of database queries equals the maximum depth of the tree, which is far less than the number of elements.
One way to implement that is to start with a query with the initial filter applied, and then use SelectMany to get the next level (each SelectMany adds a join to the previous query). The process ends when the query returns no data:
public static async Task ForceDelete(int ID, ProductContext context)
{
    var items = new List<ProductDefinition>();

    // Collect the items by level
    var query = context.ProductDefinitions.Where(e => e.ID == ID);
    while (true)
    {
        var nextLevel = await query
            .Include(e => e.Supplier)
            .ToListAsync();
        if (nextLevel.Count == 0) break;
        items.AddRange(nextLevel);
        query = query.SelectMany(e => e.ProductDefinitions);
    }

    foreach (var item in items)
        item.Supplier.Edited = true;

    context.RemoveRange(items);
    await context.SaveChangesAsync();
}
Note that the executed queries eager load the related Supplier so it can easily be updated.
Once the items are collected, they are simply marked for deletion via the RemoveRange method. The order doesn't matter, because EF Core will apply the commands in dependency order anyway.
Another way to collect the items is to use the IDs from the previous level as a filter (SQL IN):
// Collect the items by level
Expression<Func<ProductDefinition, bool>> filter = e => e.ID == ID;
while (true)
{
    var nextLevel = await context.ProductDefinitions
        .Include(e => e.Supplier)
        .Where(filter)
        .ToListAsync();
    if (nextLevel.Count == 0) break;
    items.AddRange(nextLevel);
    var parentIds = nextLevel.Select(e => e.ID);
    filter = e => parentIds.Contains(e.ParentProductDefinitionId.Value);
}
I prefer the former. The drawback is that EF Core generates huge table-name aliases, and it could also hit a SQL join-count limit for very deep trees. The latter has no depth limitation, but might have issues with a big IN clause. You should check which one is more appropriate for your case.
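A third option, for completeness: let the database walk the tree with a recursive CTE. This is a sketch only, assuming SQL Server, EF Core 3.0+ (FromSqlRaw) and the default table name; the Supplier would still need to be loaded separately, and no further LINQ operators should be composed on top, since a CTE cannot be wrapped in a subquery:

var items = await context.ProductDefinitions
    .FromSqlRaw(@"
WITH Subtree AS (
    SELECT * FROM ProductDefinitions WHERE ID = {0}
    UNION ALL
    SELECT pd.* FROM ProductDefinitions pd
    INNER JOIN Subtree s ON pd.ParentProductDefinitionId = s.ID
)
SELECT * FROM Subtree", ID)
    .ToListAsync();
// items now holds the whole subtree, fetched in a single round trip
context.RemoveRange(items);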

OK. It is a bit hard to understand exactly why this is slow without knowing how big the data structure is, etc.
The first thing that springs to my eye when I look at the above code is the following:
public static async Task ForceDelete(int ID, ProductContext context)
{
    // wrap the recursion in a save so that it only happens once
    await ForceDeleteNoSave(ID, context);
    await context.SaveChangesAsync();
}
This method is called recursively, and every time you are done with a bunch of children it will call context.SaveChangesAsync(). That means when you run the code you get multiple saves and multiple calls to the database.
This seems like an anti-pattern, because if your program crashes halfway through, it has already deleted some of the children.
Instead have an InitForceDelete() that in the end will call the context.SaveChangesAsync() so it is all done in one operation.
Something like this:
public static async Task InitForceDelete(int ID, ProductContext context)
{
    // wrap the recursion in a save so that it only happens once
    await ForceDeleteNoSave(ID, context);
    await context.SaveChangesAsync();
}

private static async Task ForceDeleteNoSave(int ID, ProductContext context)
{
    var pd = await context.ProductDefinitions
        .AsNoTracking()
        .Include(x => x.ProductDefinitions)
        .SingleAsync(x => x.ID == ID);

    if (pd.ProductDefinitions != null && pd.ProductDefinitions.Count != 0)
    {
        var childIDs = pd.ProductDefinitions.Select(x => x.ID).ToList();
        // delete the children recursively
        foreach (var child in childIDs)
        {
            await ForceDeleteNoSave(child, context);
        }
    }

    var supplier = await context.Suppliers.FindAsync(pd.SupplierID);
    supplier.Edited = true;

    // reload with tracking
    pd = await context.ProductDefinitions.FirstOrDefaultAsync(x => x.ID == ID);
    context.ProductDefinitions.Remove(pd);
}
Now, secondly, you should inspect the SQL being executed on your SQL Server. You should be able to find the execution plans triggered by your LINQ statements and see whether the SQL is completely crazy. Maybe your code is executing one call per ProductDefinition, which would make it super slow.
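If it helps, EF Core can print the generated SQL for you. A minimal sketch, assuming EF Core 5+ (on older versions, wire up a LoggerFactory instead) and that you configure the context yourself; requires using Microsoft.Extensions.Logging;:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder
        .UseSqlServer(connectionString)                  // your existing provider setup
        .LogTo(Console.WriteLine, LogLevel.Information)  // print each SQL command
        .EnableSensitiveDataLogging();                   // include parameter values
}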
I am sorry I cannot be more precise, but from the code you have presented it is hard to give direct pointers, except for the repeated calls to context.SaveChangesAsync().

Related

Infinite loop in a circular reference with Entity Framework and .NET Core

My question is: why does Entity Framework Core automatically load related entities when using .Include? From my understanding I am supposed to specify .ThenInclude for nested data, but even without it, it loads.
The details
I have an unavoidable circular reference and specifically for that reason I avoided lazy loading, so that I could hard specify what should be included.
My data model - I have highlighted the circular reference:
A user should be able to subscribe to a bunch of categories.
A job should belong to one or more category
What I want to achieve
I built an endpoint that takes 3 optional parameters, a userId, JobId and a collection of categories.
I want to return a collection of jobs filtered by the above parameters
My attempt that is causing the infinite loop
var user = await UserAccessor.GetAppUser();
var result = DataContext.Jobs
    .Include(job => job.Customer)
    .Include(job => job.Categories)
    .AsNoTracking()
    .AsQueryable();

if (request.JobId != null)
{
    result = result.Where(job => job.Id == request.JobId);
}
if (request.UserEmail != null)
{
    result = result.Where(job => job.Customer.Email == request.UserEmail);
}
if (request.Categories != null)
{
    var categories = await CategoryService.ConvertStringToCategories(request.Categories);
    result = result.Where(job => job.Categories.Intersect(categories).Any());
}

return Mapper.Map<List<Job>, List<JobDto>>(await result.ToListAsync());
If I put a result.ToList() anywhere in the code and look at the actual values, it seems to go into an infinite loop without me specifying .ThenInclude().
The infinite loop goes as follows:
Collection of jobs (this is the result variable)
Job has a property collection of categories (this I included with .Include(job => job.Categories))
Category has a property collection of jobs (this should be null as I didn't include it with a .ThenInclude(), but it is not, arrrggg)
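What the debugger is showing is EF Core relationship fix-up rather than extra loading: when Job and Category instances are materialized by the same query, EF Core wires up both sides of their navigations, so Category.Jobs gets populated with the jobs already in the result set. The object graph is cyclic but finite; it only becomes a practical problem when something walks it exhaustively, such as a JSON serializer. A minimal mitigation sketch, assuming Newtonsoft.Json is the serializer in use:

// In ConfigureServices - stop the serializer at reference cycles instead of recursing
services.AddControllers()
    .AddNewtonsoftJson(options =>
        options.SerializerSettings.ReferenceLoopHandling = ReferenceLoopHandling.Ignore);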

Entity Framework Core Nested List Performance

I'm struggling with Entity Framework Core performance while trying to add an additional item to a nested list.
Let's say, as an example:
I have multiple projects, each project contains multiple houses, each house has multiple facades, and each facade has multiple windows.
If I now want to add an additional window to a specific project, house and facade, I do it like this:
public async Task SaveWindowAsync(Guid projectId, Guid houseId, Guid facadeId, WindowEntity windowEntity)
{
    using (ProjectsDbContext context = new ProjectsDbContext())
    {
        var windowList = context.ProjectSet
            .Include(p => p.Houses)
            .ThenInclude(h => h.Facades)
            .ThenInclude(f => f.Windows)
            .First(p => p.Id == projectId).Houses
            .First(h => h.Id == houseId).Facades
            .First(f => f.Id == facadeId).Windows;
        windowList.Add(windowEntity);
        await context.SaveChangesAsync();
    }
}
This works fine regarding the functionality. However, performance gets slower and slower as the database grows. Is there a more performant way to add an item to a nested list?
Update 1
I created a simple test database with these test objects: 50 projects, each project has 10 houses, each house has 10 facades, and each facade has 10 windows. This results in a database size of about 10 MB.
In the test I add 1000 windows one after another (no bulk):
The solution mentioned above requires a total time of 145 s.
The solution mentioned by @David Browne - Microsoft takes about 54 s:
var facadeEntity = context.Set<FacadeEntity>()
    .Include(f => f.Windows)
    .Single(f => f.Id == facadeId);
facadeEntity.Windows.Add(windowEntity);
await context.SaveChangesAsync();
Update 2
As recommended by @David Browne, I added a ForeignKey to the window:
modelBuilder.Entity<FacadeEntity>()
    .HasMany(f => f.Windows).WithOne()
    .HasForeignKey(w => w.FacadeId)
    .OnDelete(DeleteBehavior.Cascade);
The save is executed like that:
context.Entry(windowEntity).Property(nameof(WindowEntity.FacadeId)).CurrentValue = facadeId;
context.Set<WindowEntity>().Add(windowEntity);
await context.SaveChangesAsync();
The issue is the same: the more windows I have, the longer the add takes.
The duration for 1000 windows is around 53 s.
I currently have only a "DbSet ProjectSet" - would you add an additional DbSet to the context?
If you don't have a DbSet<T> declared for an entity, access it through DbContext.Set<T>(), something like:
public static async Task SaveWindowAsync(Guid projectId, Guid houseId, Guid facadeId, Window windowEntity)
{
    using (ProjectsDbContext context = new ProjectsDbContext())
    {
        var facade = context.Set<Facade>()
            .Where(f => f.FacadeId == facadeId)
            .Single();
        facade.Windows.Add(windowEntity);
        await context.SaveChangesAsync();
    }
}
This translates to:
SELECT TOP(2) [f].[FacadeId], [f].[HouseId]
FROM [Facade] AS [f]
WHERE [f].[FacadeId] = @__facadeId_0
and then:
INSERT INTO [Window] ([WindowId], [FacadeId])
VALUES (@p0, @p1)
Assuming Facade has a single-column primary key. If it has a compound key of (ProjectId,HouseId,FacadeId), then add those to the Where.
The best way to do this, however, is to set the foreign key property Window.FacadeId and not load the Facade at all. In EF Core you can do this with shadow properties if you don't have a foreign key property. E.g.:
public static async Task SaveWindowAsync(Guid projectId, Guid houseId, Guid facadeId, Window windowEntity)
{
    using (ProjectsDbContext context = new ProjectsDbContext())
    {
        context.Entry(windowEntity).Property("FacadeId").CurrentValue = facadeId;
        context.Set<Window>().Add(windowEntity);
        await context.SaveChangesAsync();
    }
}
You should first select from your Windows DbSet, then filter by facade, house and project. Your query should look like this:
var windowList = context.Set<WindowEntity>()
    .Where(w => w.Facade.Id == facadeId
        && w.Facade.House.Id == houseId
        && w.Facade.House.Project.Id == projectId)
    .ToList();
If you're trying to add a Window to a specific project/house/facade, can you just set the FacadeId on the WindowEntity directly and save that? Presumably, the Window has a FacadeId property, just as Facade has a HouseId, and House has a ProjectId. If Window has the Ids of all its parents (unnecessary), then just set the House and Project Ids as well.
public async Task SaveWindowAsync(Guid projectId, Guid houseId, Guid facadeId, WindowEntity windowEntity)
{
    using (ProjectsDbContext context = new ProjectsDbContext())
    {
        windowEntity.FacadeId = facadeId;
        context.WindowSet.Add(windowEntity);
        await context.SaveChangesAsync();
    }
}
Update:
If you don't want to set the facadeId on the entity directly, then you can load just the facade and set the window's Facade property rather than the Id. You can optionally add the windowEntity to the facade's Windows collection as well, if you are going to continue working with that instance of the collection.
public async Task SaveWindowAsync(Guid facadeId, WindowEntity windowEntity)
{
    using (ProjectsDbContext context = new ProjectsDbContext())
    {
        var facade = context.Set<FacadeEntity>().Single(x => x.Id == facadeId);
        windowEntity.Facade = facade;
        facade.Windows.Add(windowEntity);
        await context.SaveChangesAsync();
    }
}

Select() decline in performance

I'm working on a small app written in C# .NET Core, and I'm populating one property in code because that information is not available in the database. The code looks like this:
public async Task<IEnumerable<ProductDTO>> GetData(Request request)
{
    IQueryable<Product> query = _context.Products;
    var products = await query.ToListAsync();

    // WARNING - THIS SOLUTION LOOKS EXPENSIVE TO ME!
    return MapDataAsDTO(products).Select(c =>
    {
        c.HasBrandStock = products.Any(cc => cc.ParentProductId == c.Id);
        return c;
    });
}

private IEnumerable<ProductDTO> MapDataAsDTO(IEnumerable<Product> products)
{
    return products.Select(p => MapData(p)).ToList();
}
What is bothering me here is this code:
return MapDataAsDTO(products).Select(c =>
{
    c.HasBrandStock = products.Any(cc => cc.ParentProductId == c.Id);
    return c;
});
I've tested it on about 300k rows and it seems slow. I'm wondering: is there a better solution in this situation?
Thanks guys!
Cheers
First up, this method is loading all products, and generally that is a bad idea unless you can guarantee that the total number of records, and the total size of those records, will remain reasonable. If the system can grow, add support for server-side pagination now (page number and page size, leveraging Skip and Take). 300k products is not a reasonable number to load in one hit. Any way you skin this cat, it will be slow, expensive, and error prone due to server load without paging. One user making a request will make the DB server allocate for and load up 300k rows, transmit that data over the wire to the app server, which will allocate memory for those 300k rows, and then transmit that data over the wire to the client, who literally does not need those 300k rows at once. What do you think happens when 10 users hit this page? 100? And what happens when it's "too slow" and they start hammering the F5 key a few times. >:)
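A minimal paging sketch along those lines (assumption: page and pageSize are new parameters carried alongside the request, and Id is the key column; the HasBrandStock flag would then be computed per page, e.g. with the child-collection approach shown further below):

public async Task<IEnumerable<ProductDTO>> GetData(Request request, int page, int pageSize)
{
    // Only materialize one page of products instead of all 300k rows.
    var products = await _context.Products
        .OrderBy(p => p.Id)              // paging requires a deterministic order
        .Skip((page - 1) * pageSize)     // page is 1-based here
        .Take(pageSize)
        .ToListAsync();
    return MapDataAsDTO(products);
}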
Second, async is not a silver bullet. It doesn't make queries faster, it actually makes them a bit slower. What it does do is allow your web server to be more responsive to other requests while those slower queries are running. Default to synchronous queries, get them running as efficiently as possible, then for the larger ones that are justified, switch them to asynchronous. MS made async extremely easy to implement, perhaps too easy to treat as a default. Keep it simple and synchronous to start, then re-factor methods to async as needed.
From what I can see, you want to load all products into DTOs, and for products that are recognized as being a "parent" of at least one other product, you want to set their DTO's HasBrandStock to true. So given product IDs 1 and 2, where 2's parent ID is 1, the DTO for product 1 would have HasBrandStock = true, while product 2 would have HasBrandStock = false.
One option would be to tackle this operation in 2 queries:
var parentProductIds = _context.Products
    .Where(x => x.ParentProductId != null)
    .Select(x => x.ParentProductId)
    .Distinct()
    .ToList();

var dtos = _context.Products
    .Select(x => new ProductDTO
    {
        ProductId = x.ProductId,
        ProductName = x.ProductName,
        // ...
        HasBrandStock = parentProductIds.Contains(x.ProductId)
    }).ToList();
I'm using a manual Select here because I don't know what your MapDataAsDTO method is actually doing. I'd highly recommend using AutoMapper and its ProjectTo<T> method if you want to simplify the mapping code. Custom mapping functions can too easily hide expensive bugs, like ToList calls, when someone hits a scenario that EF cannot translate.
The first query gets a distinct list of just the Product IDs that are the parent ID of at least one other product. The second query maps out all products into DTOs, setting the HasBrandStock based on whether each product appears in the parentProductIds list or not.
This option will work if a relatively limited number of products are recognized as "parents". That first list can only get so big before it risks failing with too many items to translate into an IN clause.
The better option would be to look at your mapping. You have a ParentProductId, does a product entity have an associated ChildProducts collection?
public class Product
{
    public int ProductId { get; set; }
    public string ProductName { get; set; }
    // ...
    public virtual Product ParentProduct { get; set; }
    public virtual ICollection<Product> ChildProducts { get; set; } = new List<Product>();
}

public class ProductConfiguration : EntityTypeConfiguration<Product>
{
    public ProductConfiguration()
    {
        HasKey(x => x.ProductId);
        HasOptional(x => x.ParentProduct)
            .WithMany(x => x.ChildProducts)
            .Map(x => x.MapKey("ParentProductId"));
    }
}
This example maps the ParentProductId without exposing a field in the entity (recommended). Otherwise, if you do expose a ParentProductId, substitute the .Map(...) call with .HasForeignKey(x => x.ParentProductId).
This assumes EF6, as per your tags; if you're using EF Core, you use HasForeignKey("ParentProductId") in place of Map(...) to establish a shadow property for the FK without exposing a property. The entity configuration is a bit different with Core.
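For reference, a sketch of the equivalent EF Core configuration with a shadow FK (names assumed to match the entity above):

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Product>()
        .HasOne(x => x.ParentProduct)
        .WithMany(x => x.ChildProducts)
        .HasForeignKey("ParentProductId")  // shadow property - no CLR property exposed
        .IsRequired(false);                // optional parent, like HasOptional above
}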
This allows your queries to leverage the relationship between parent products and any related children products. Populating the DTOs can be accomplished with one query:
var dtos = _context.Products
    .Select(x => new ProductDTO
    {
        ProductId = x.ProductId,
        ProductName = x.ProductName,
        // ...
        HasBrandStock = x.ChildProducts.Any()
    }).ToList();
This leverages the relationship to populate your DTO and its flag in one pass. The caveat here is that there is now a cyclical relationship between Product and itself represented in the entity. This means: don't feed entities to something like a serializer. That includes avoiding adding entities as members of DTOs/ViewModels.

Using Include with Intersect/Union/Exclude in Linq

What seemed like it should be a relatively straight-forward task has turned into something of a surprisingly complex issue. To the point that I'm starting to think that my methodology perhaps is simply out of scope with the capabilities of Linq.
What I'm trying to do is piece-together a Linq query and then invoke .Include() in order to pull-in values from a number of child entities. For example, let's say I have these entities:
public class Parent
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Location { get; set; }
    public ISet<Child> Children { get; set; }
}

public class Child
{
    public int Id { get; set; }
    public int ParentId { get; set; }
    public Parent Parent { get; set; }
    public string Name { get; set; }
}
And let's say I want to perform a query to retrieve records from Parent, where Name is some value and Location is some other value, and then include Child records too. But for whatever reason I don't know the query values for Name and Location at the same time, so I have to take two separate queryables and join them, as such:
MyDbContext C = new MyDbContext();
var queryOne = C.Parent.Where(p => p.Name == myName);
var queryTwo = C.Parent.Where(p => p.Location == myLocation);
var finalQuery = queryOne.Intersect(queryTwo);
That works fine, producing results exactly as if I had just done:
var query = C.Parent.Where(p => p.Name == myName && p.Location == myLocation);
And similarly, I can:
var finalQuery = queryOne.Union(queryTwo);
To give me results just as if I had:
var query = C.Parent.Where(p => p.Name == myName || p.Location == myLocation);
What I cannot do, however, once the Intersect() or Union() is applied, is then go about mapping the Child using Include(), as in:
finalQuery.Include(p => p.Children);
This code will compile, but produces results as follows:
In the case of a Union(), a result set will be produced, but no Child entities will be enumerated.
In the case of an Intersect(), a run-time error is generated upon attempt to apply Include(), as follows:
Expression of type
'System.Collections.Generic.IEnumerable`1[Microsoft.EntityFrameworkCore.Query.Internal.AnonymousObject]'
cannot be used for parameter of type
'System.Collections.Generic.IEnumerable`1[System.Object]' of method
'System.Collections.Generic.IEnumerable`1[System.Object]
Intersect[Object](System.Collections.Generic.IEnumerable`1[System.Object],
System.Collections.Generic.IEnumerable`1[System.Object])'
The thing that baffles me is that this code will work exactly as expected:
var query = C.Parent.Where(p => p.Name == myName).Where(p => p.Location == myLocation);
query.Include(p => p.Children);
I.e., with the results as desired, including the Child entities enumerated.
my methodology perhaps is simply out of scope with the capabilities of Linq
The problem is not LINQ, but EF Core query translation - specifically the lack of SQL translation for the Intersect / Union / Concat / Except methods, tracked by #6812 Query: Translate IQueryable.Concat/Union/Intersect/Except/etc. to server.
In short, such queries currently use client evaluation, which, in combination with how EF Core handles Include, leads to many unexpected runtime exceptions (like your case #2) or wrong behaviors (like the ignored Includes of your case #1).
So while your approach technically makes perfect sense, according to the EF Core team leader's response:
Changing this to producing a single SQL query on the server isn't currently a top priority
so this currently is not even planned for the 3.0 release, although there are plans to change (rewrite) the whole query translation pipeline, which might allow implementing it as well.
For now, you have no good options. You may try processing the query expression trees yourself, but that's a complicated task, and you'll probably discover why it is not implemented yet :) If you can convert your queries to the equivalent single query with a combined Where condition, then applying Include will work fine.
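A sketch of that last suggestion - combining the two filters into a single Where so the query stays translatable. The OrElse helper below is hand-rolled (libraries like LinqKit offer the same thing); an AndAlso variant would cover the Intersect case the same way:

// requires using System; using System.Linq.Expressions;
public static class PredicateExtensions
{
    // Builds "x => left(x) || right(x)" by rebinding the right lambda's
    // parameter onto the left one's, keeping the tree EF-translatable.
    public static Expression<Func<T, bool>> OrElse<T>(
        this Expression<Func<T, bool>> left, Expression<Func<T, bool>> right)
    {
        var param = left.Parameters[0];
        var rightBody = new ReplaceVisitor(right.Parameters[0], param).Visit(right.Body);
        return Expression.Lambda<Func<T, bool>>(Expression.OrElse(left.Body, rightBody), param);
    }

    private sealed class ReplaceVisitor : ExpressionVisitor
    {
        private readonly Expression _from, _to;
        public ReplaceVisitor(Expression from, Expression to) { _from = from; _to = to; }
        public override Expression Visit(Expression node) => node == _from ? _to : base.Visit(node);
    }
}

// Usage - one server-side query, and Include works:
Expression<Func<Parent, bool>> byName = p => p.Name == myName;
Expression<Func<Parent, bool>> byLocation = p => p.Location == myLocation;
var query = C.Parent.Where(byName.OrElse(byLocation)).Include(p => p.Children);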
P.S. Note that even now your approach technically "works" without Include; performance-wise, though, the way it is evaluated client side makes it absolutely non-equivalent to the corresponding single query.
A long time has gone by, but this .Include problem still exists in EF 6. However, there is a workaround: append every child request with .Include before intersecting/unionizing.
MyDbContext C = new MyDbContext();
var queryOne = C.Parent.Where(p => p.Name == parent.Name).Include("Children");
var queryTwo = C.Parent.Where(p => p.Location == parent.Location).Include("Children");
var finalQuery = queryOne.Intersect(queryTwo);
As stated by @Ivan Stoev, intersection/union is performed on the fetched data afterwards, while .Include is fine at request time.
So, as of now, you have this one option available.

How to update a Collection in Many-Many by assigning a new Collection?

In Entity Framework Core 2.0, I have a many-to-many relationship between Post and Category (the binding class is PostCategory).
When the user updates a Post, the whole Post object (with its PostCategory collection) is sent to the server, and here I want to reassign the newly received PostCategory collection (the user may have changed it significantly, adding new categories and removing others).
Simplified code I use to update that collection (I just assign a completely new collection):
var post = await dbContext.Posts
    .Include(p => p.PostCategories)
    .ThenInclude(pc => pc.Category)
    .SingleOrDefaultAsync(p => p.Id == someId);

post.PostCategories = ... Some new collection ...; // <<<

dbContext.Posts.Update(post);
await dbContext.SaveChangesAsync();
This new collection contains objects with the same Ids as objects in the previous collection (e.g. the user removed some, but not all, categories). Because of this, I get an exception:
System.InvalidOperationException: The instance of entity type 'PostCategory' cannot be tracked because another instance with the same key value for {'CategoryId', 'PostId'} is already being tracked.
How can I rebuild the new collection (or simply assign a new collection) efficiently, without getting this exception?
UPDATE
The answer in this link seems to be related to what I want, but is it a good and efficient method? Is there a better approach?
UPDATE 2
I get my post (to edit / overwrite its values) like this:
public async Task<Post> GetPostAsync(Guid postId)
{
    return await dbContext.Posts
        .Include(p => p.Writer)
            .ThenInclude(u => u.Profile)
        .Include(p => p.Comments)
        .Include(p => p.PostCategories)
            .ThenInclude(pc => pc.Category)
        .Include(p => p.PostPackages)
            .ThenInclude(pp => pp.Package)
        //.AsNoTracking()
        .SingleOrDefaultAsync(p => p.Id == postId);
}
UPDATE 3 (The code in my controller, which tries to update the post):
var writerId = User.GetUserId();
var categories = await postService.GetOrCreateCategoriesAsync(
    vm.CategoryViewModels.Select(cvm => cvm.Name), writerId);
var post = await postService.GetPostAsync(vm.PostId);
post.Title = vm.PostTitle;
post.Content = vm.ContentText;
post.PostCategories = categories?
    .Select(c => new PostCategory { CategoryId = c.Id, PostId = post.Id })
    .ToArray();
await postService.UpdatePostAsync(post); // Check the implementation in Update 4.
UPDATE 4:
public async Task<Post> UpdatePostAsync(Post post)
{
    // Find (load from the database) the existing post
    var existingPost = await dbContext.Posts
        .SingleOrDefaultAsync(p => p.Id == post.Id);

    // Apply primitive property modifications
    dbContext.Entry(existingPost).CurrentValues.SetValues(post);

    // Apply many-to-many link modifications
    dbContext.Set<PostCategory>().UpdateLinks(
        pc => pc.PostId, post.Id,
        pc => pc.CategoryId,
        post.PostCategories.Select(pc => pc.CategoryId)
    );

    // Apply all changes to the db
    await dbContext.SaveChangesAsync();

    return existingPost;
}
The main challenge when working with disconnected link entities is to detect and apply the added and deleted links. And EF Core (as of the time of writing) provides little if any help with that.
The answer from the link is OK (the custom Except method is too heavy for what it does, IMO), but it has some traps - the existing links have to be retrieved in advance using eager / explicit loading (though with EF Core 2.1 lazy loading that might not be an issue), and the new links should have only their FK properties populated - if they contain reference navigation properties, EF Core will try to create new linked entities when calling Add / AddRange.
A while ago I answered a similar, but slightly different question - Generic method for updating EFCore joins. Here is a more generalized and optimized version of the custom generic extension method from that answer:
public static class EFCoreExtensions
{
    public static void UpdateLinks<TLink, TFromId, TToId>(this DbSet<TLink> dbSet,
        Expression<Func<TLink, TFromId>> fromIdProperty, TFromId fromId,
        Expression<Func<TLink, TToId>> toIdProperty, IEnumerable<TToId> toIds)
        where TLink : class, new()
    {
        // link => link.FromId == fromId
        Expression<Func<TFromId>> fromIdVar = () => fromId;
        var filter = Expression.Lambda<Func<TLink, bool>>(
            Expression.Equal(fromIdProperty.Body, fromIdVar.Body),
            fromIdProperty.Parameters);
        var existingLinks = dbSet.AsTracking().Where(filter);

        var toIdSet = new HashSet<TToId>(toIds);
        if (toIdSet.Count == 0)
        {
            // The new set is empty - delete all existing links
            dbSet.RemoveRange(existingLinks);
            return;
        }

        // Delete the existing links which do not exist in the new set
        var toIdSelector = toIdProperty.Compile();
        foreach (var existingLink in existingLinks)
        {
            if (!toIdSet.Remove(toIdSelector(existingLink)))
                dbSet.Remove(existingLink);
        }

        // Create new links for the remaining items in the new set
        if (toIdSet.Count == 0) return;

        // toId => new TLink { FromId = fromId, ToId = toId }
        var toIdParam = Expression.Parameter(typeof(TToId), "toId");
        var createLink = Expression.Lambda<Func<TToId, TLink>>(
            Expression.MemberInit(
                Expression.New(typeof(TLink)),
                Expression.Bind(((MemberExpression)fromIdProperty.Body).Member, fromIdVar.Body),
                Expression.Bind(((MemberExpression)toIdProperty.Body).Member, toIdParam)),
            toIdParam);
        dbSet.AddRange(toIdSet.Select(createLink.Compile()));
    }
}
It uses a single database query to retrieve the existing links from the database. The overhead is a few dynamically built expressions and compiled delegates (in order to keep the calling code as simple as possible) plus a single temporary HashSet for fast lookup. The performance effect of the expression / delegate building should be negligible, and they can be cached if needed.
The idea is to pass a single existing key for one of the linked entities and the list of existing keys for the other linked entity. So, depending on which of the linked entities' links you are updating, it will be called differently.
In your sample, assuming you are receiving IEnumerable<PostCategory> postCategories, the process would be something like this:
var post = await dbContext.Posts
    .SingleOrDefaultAsync(p => p.Id == someId);

dbContext.Set<PostCategory>().UpdateLinks(
    pc => pc.PostId, post.Id,
    pc => pc.CategoryId, postCategories.Select(pc => pc.CategoryId));

await dbContext.SaveChangesAsync();
Note that this method allows you to change the requirement and accept IEnumerable<int> postCategoryIds:
dbContext.Set<PostCategory>().UpdateLinks(
    pc => pc.PostId, post.Id,
    pc => pc.CategoryId, postCategoryIds);
or IEnumerable<Category> postCategories:
dbContext.Set<PostCategory>().UpdateLinks(
    pc => pc.PostId, post.Id,
    pc => pc.CategoryId, postCategories.Select(c => c.Id));
or similar DTOs / ViewModels.
Category posts can be updated in a similar manner, with corresponding selectors swapped.
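For example, to replace a category's posts (assuming a Category category instance and its new IEnumerable<int> postIds):

dbContext.Set<PostCategory>().UpdateLinks(
    pc => pc.CategoryId, category.Id,
    pc => pc.PostId, postIds);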
Update: In case you are receiving a (potentially) modified Post post entity instance, the whole update procedure could look like this:
// Find (load from the database) the existing post
var existingPost = await dbContext.Posts
    .SingleOrDefaultAsync(p => p.Id == post.Id);
if (existingPost == null)
{
    // Handle the invalid call
    return;
}

// Apply primitive property modifications
dbContext.Entry(existingPost).CurrentValues.SetValues(post);

// Apply many-to-many link modifications
dbContext.Set<PostCategory>().UpdateLinks(
    pc => pc.PostId, post.Id,
    pc => pc.CategoryId, post.PostCategories.Select(pc => pc.CategoryId));

// Apply all changes to the db
await dbContext.SaveChangesAsync();
Note that EF Core uses a separate database query for eager loading related collections. Since the helper method does the same, there is no need to Include the link data when retrieving the main entity from the database.
