GetAllWithChildren() performance issue - c#

I used SQLite-Net Extensions
in the following code to retrieve 1000 rows with their children relationships from an Sqlite database:
var list =
SQLiteNetExtensions.Extensions.ReadOperations.GetAllWithChildren<DataModel>(connection);
The problem is that the performance is poor, because GetAllWithChildren() returns a List rather than an Enumerable. Is there any way to load the records into an Enumerable using SQLite-Net Extensions?
For now I use the Table() method from SQLite.Net, which loads the fetched rows into an Enumerable, but I don't want to use it because it does not understand the relationships and does not load the child entities at all.

GetAllWithChildren suffers from the N+1 problem, and in your specific scenario it performs especially badly. It's not clear from your question what you're trying to do, but you could try these solutions:
Use the filter parameter in GetAllWithChildren:
Instead of loading all the objects into memory and then filtering, you can use the filter parameter, which internally performs a Table<T>().Where(filter) query that SQLite-Net converts to a SELECT-WHERE clause, so it's very efficient:
var list = connection.GetAllWithChildren<DataModel>(d => d.Name == "Jason");
Perform the query and then load the relationships
If you look at the GetAllWithChildren code you'll realize that it just performs the query and then loads the existing relationships. You can do that by yourself to avoid automatically loading unwanted relationships:
// Load elements from database
var list = connection.Table<DataModel>().Where(d => d.Name == "Jason").ToList();
// Iterate elements and load relationships
foreach (DataModel element in list) {
    connection.GetChildren(element, recursive: false);
}
Load relationships manually
To completely work around the N+1 problem you can manually fetch relationships using a Contains filter with the foreign keys. This depends heavily on your entity model, but would look like this:
// Load elements from database
var list = connection.Table<DataModel>().Where(d => d.Name == "Jason").ToList();
// Get list of dependency IDs
var dependencyIds = list.Select(d => d.DependencyId).ToList();
// Load all dependencies from database in a single query
var dependencies = connection.Table<Dependency>().Where(d => dependencyIds.Contains(d.Id)).ToList();
// Assign relationships back to the elements
foreach (DataModel element in list) {
    element.Dependency = dependencies.FirstOrDefault(d => d.Id == element.DependencyId);
}
This solution solves the N+1 problem, because it performs only two database queries.
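When the element list is large, the FirstOrDefault call inside the loop rescans the dependency list for every element. A minimal sketch of the same assignment step using a dictionary keyed by ID follows. The in-memory lists stand in for the two query results, and the DataModel and Dependency shapes (and the RelationshipStitcher name) are assumptions inferred from the snippet above, not part of SQLite-Net Extensions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity shapes, inferred from the snippet above.
public class Dependency
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class DataModel
{
    public int Id { get; set; }
    public int DependencyId { get; set; }
    public Dependency Dependency { get; set; }
}

public static class RelationshipStitcher
{
    // Assigns each element its dependency with one dictionary lookup per element.
    public static void Attach(IEnumerable<DataModel> elements, IEnumerable<Dependency> dependencies)
    {
        // Index the dependencies once; assumes Ids are unique (primary key).
        var byId = dependencies.ToDictionary(d => d.Id);

        foreach (var element in elements)
        {
            // Leaves Dependency null when no row matches, as FirstOrDefault would.
            byId.TryGetValue(element.DependencyId, out var dependency);
            element.Dependency = dependency;
        }
    }
}
```

The two lists passed to Attach would be the results of the two Table<T>() queries shown earlier.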

Another method to load relationships manually
Imagine we have these classes:
public class Parent
{
[PrimaryKey, AutoIncrement] public int Id { get; set; }
public string Name { get; set; }
public List<Child> children { get; set; }
public override bool Equals(object obj)
{
// Two Parent instances are equal when their Ids match
return obj is Parent other && Id.Equals(other.Id);
}
public override int GetHashCode()
{
return Id.GetHashCode();
}
}
and
public class Child
{
[PrimaryKey, AutoIncrement] public int Id { get; set; }
public string Name { get; set; }
public int ParentId { get; set; }
}
Note that these classes have a one-to-many relationship. An inner join between them would then be:
var parents = databaseSync.Table<Parent>().ToList();
var children = databaseSync.Table<Child>().ToList();
List<Parent> parentsWithChildren = parents.GroupJoin(children, parent => parent.Id, child => child.ParentId,
(parent, children1) =>
{
parent.children = children1.ToList();
return parent;
}).Where(parent => parent.children.Any()).ToList();
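GroupJoin on its own behaves like a left outer join: every parent is returned, with an empty list when it has no children. It is the trailing Where that narrows the result to parents with at least one child. A small in-memory sketch of that difference, using plain classes without the SQLite attributes (ParentRow, ChildRow and JoinDemo are illustrative names, not from the original answer):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ChildRow
{
    public int Id { get; set; }
    public int ParentId { get; set; }
}

public class ParentRow
{
    public int Id { get; set; }
    public List<ChildRow> Children { get; set; } = new List<ChildRow>();
}

public static class JoinDemo
{
    // Attaches children to parents; optionally drops parents that have none.
    public static List<ParentRow> AttachChildren(
        IEnumerable<ParentRow> parents,
        IEnumerable<ChildRow> children,
        bool onlyWithChildren)
    {
        var joined = parents.GroupJoin(
            children,
            parent => parent.Id,
            child => child.ParentId,
            (parent, matches) =>
            {
                parent.Children = matches.ToList();
                return parent;
            });

        if (onlyWithChildren)
        {
            // Equivalent to the Where(...) in the snippet above.
            joined = joined.Where(parent => parent.Children.Any());
        }

        return joined.ToList();
    }
}
```

Note that both tables are still loaded in full, so this trades the N+1 queries for two unfiltered ones.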

Related

EF Core - Enforce priority in executing commands in a transaction

I want to delete two sets of data from the database using EF Core.
All code is hypothetical.
Data models:
class Parent
{
public int Id { get; set; }
}
class Child
{
public int Id { get; set; }
public int ParentId { get; set; }
public virtual Parent Parent { get; set; }
public bool Flag { get; set; }
}
Let's assume I want to delete all Child records with ParentId = 100 and Flag = false. After that, if no Child records with ParentId = 100 remain, I want to delete the parent itself too.
So, here is the service class:
class Service
{
public void Command(int parentId)
{
Parent parent = GetParent(parentId);
List<Child> children = GetChildren(parent);
List<Child> toDelete = children.Where(x => !x.Flag).ToList();
foreach(var child in toDelete)
{
var entry = DbContext.Entry(child);
entry.State = EntityState.Deleted;
}
List<Child> remainChildren = children.Where(x => x.Flag).ToList();
if (!remainChildren.Any())
{
var entry = DbContext.Entry(parent );
entry.State = EntityState.Deleted;
}
SaveChanges();
}
}
I have multiple scenarios that call the Service.Command method.
Because I call SaveChanges() only once, I assume that all delete operations will be executed in a single transaction, and that they will of course run in this order:
Delete child records
Delete parent
but EF sends the queries to the database like this:
Delete parent
Delete child records
Obviously this throws a foreign key exception.
Is there any way to force EF Core to execute the queries in the order in which I wrote the code?
Set the parent child relationship to cascade delete at the DB level.
Query the needed data in one hit...
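// Assumes Parent exposes a Children navigation property
var data = context.Parents.Where(p => p.Id == parentId)
.Select(p => new
{
Parent = p,
// Children with Flag == false are the ones to delete
ChildrenToRemove = p.Children.Where(c => !c.Flag).ToList(),
// Children with Flag == true stay, so the parent must stay too
HasRemainingChildren = p.Children.Any(c => c.Flag)
}).Single();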
Then it's just a matter of inspecting the data and acting accordingly. If there are no remaining children, delete the parent and let cascade take care of it. Otherwise, just delete the children from the context.
if(!data.HasRemainingChildren)
context.Parents.Remove(data.Parent);
else
context.Children.RemoveRange(data.ChildrenToRemove);
For big entities you can further optimize this by selecting just the IDs then associating them to new Entity instances, attach them to a fresh DbContext, and then issue the Remove/RemoveRange calls. This option is an optimization for dealing with large numbers of items, or "big" entities that would otherwise result in a lot of data across the wire.

Linq EF Split Parent into multiple Parents

Using Entity Framework to query a database with a Parent table and Child table with a 1-n relationship:
public class Parent {
public int id { get; set; }
public IList<Child> Children { get; set; }
}
public class Child {
public int id { get; set; }
}
Using EF, here's a quick sample query:
var parents = context.Parents;
Which returns:
parent id = 1, children = { (id = 1), (id = 2), (id = 3) }
What we need is for this to flatten into a 1-1 relationship, but as a list of parents with a single child each:
parent id = 1, children = { (id = 1) }
parent id = 1, children = { (id = 2) }
parent id = 1, children = { (id = 3) }
We're using an OData service layer which hits EF. So performance is an issue -- don't want it to perform a ToList() or iterate the entire result for example.
We've tried several different things, and the closest we can get is creating an anonymous type like such:
var results = from p in context.Parents
from c in p.Children
select new { Parent = p, Child = c }
But this isn't really what we're looking for. It creates an anonymous type of parent and child, not parent with child. So we can't return an IEnumerable<Parent> any longer, but rather an IEnumerable<anonymous>. The anonymous type isn't working with our OData service layer.
Also tried with SelectMany and got 3 results, but all of Children which again isn't quite what we need:
context.Parents.SelectMany(p => p.Children)
Is what we're trying to do possible? With the sample data provided, we'd want 3 rows returned -- representing a List each with a single Child. When normally it returns 1 Parent with 3 Children, we want the Parent returned 3 times with a single child each.
Your requirements don't make much sense to me: EF and LINQ are designed so that you don't get the repeated parent data that a flat SQL join returns. But you know your situation better and we don't see the whole picture, so I will try to answer your question, hoping I understood it correctly.
If like you said, your problem is that IEnumerable<anonymous> doesn't work with your OData service layer, then create a class for the relationship:
public class ParentChild {
public Parent Parent { get; set; }
public Child Child { get; set; }
}
And then you can use it in your LINQ query:
var results = from p in context.Parents
from c in p.Children
select new ParentChild { Parent = p, Child = c }
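As an illustration of what that projection produces, here is a minimal LINQ-to-Objects sketch using the sample data from the question (one parent with three children). The Flattener class is a hypothetical helper, and an in-memory list stands in for context.Parents, so this shows only the shape of the result, not how EF translates the query:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Child
{
    public int Id { get; set; }
}

public class Parent
{
    public int Id { get; set; }
    public IList<Child> Children { get; set; } = new List<Child>();
}

public class ParentChild
{
    public Parent Parent { get; set; }
    public Child Child { get; set; }
}

public static class Flattener
{
    // Yields one ParentChild row per (parent, child) pair, lazily.
    public static IEnumerable<ParentChild> Flatten(IEnumerable<Parent> parents)
    {
        return from p in parents
               from c in p.Children
               select new ParentChild { Parent = p, Child = c };
    }
}
```

Because the query is deferred, nothing is enumerated until the caller iterates the result, which matches the requirement of not calling ToList() up front.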

EF Update Many-to-Many in Detached Scenario

I was trying to create a generic method to update an entity and all its collection properties from a detached object. For example:
public class Parent{
public int Id { get; set; }
public string ParentProperty { get; set; }
public List<Child> Children1 { get; set; }
public List<Child> Children2 { get; set; }
}
public class Child{
public int Id { get; set; }
public string ChildProperty { get; set; }
}
So, my first intention was to use something like this:
Repository<Parent>.Update(parentObj);
It would be perfect to have some magic inside this method that updates the Parent properties, compares the Children lists of parentObj with the current values in the database, and adds/updates/removes them accordingly. That is too complex for my knowledge of EF/reflection/generics, so I tried a second, easier way:
Repository<Parent>.Update(parentObj, parent => parent.Children1,
parent => parent.Children2);
This method would be a little harder to use, but still acceptable. However, since I think the second parameter would have to be params Expression<Func<TEntity, ICollection<TRelatedEntity>>>[] relatedEntities, I had problems specifying multiple TRelatedEntity types. So I moved on to a third approach, with no success yet.
Now I call one method to update Parent and a sequence of methods to update the children, like this:
Repository<Parent>.Update(parentObj);
Repository<Parent>.UpdateChild(parentObj, parent => parent.Id, parent => parent.Children1);
Repository<Parent>.UpdateChild(parentObj, parent => parent.Id, parent => parent.Children2);
And the code:
public virtual void Update(TEntity entityToUpdate)
{
context.Entry(entityToUpdate).State = EntityState.Modified;
}
public virtual void UpdateChild<TRelatedEntity>(TEntity entityToUpdate, Func<TEntity, object> keySelector, Expression<Func<TEntity, ICollection<TRelatedEntity>>> relatedEntitySelector) where TRelatedEntity: class
{
var entityInDb = dbSet.Find(keySelector.Invoke(entityToUpdate));
var current = relatedEntitySelector.Compile().Invoke(entityToUpdate);
var original = relatedEntitySelector.Compile().Invoke(entityInDb);
foreach (var created in current.Except(original))
{
context.Set<TRelatedEntity>().Add(created);
}
foreach (var removed in original.Except(current))
{
context.Set<TRelatedEntity>().Remove(removed);
}
foreach (var updated in current.Intersect(original))
{
context.Entry(updated).State = EntityState.Modified;
}
context.Entry(entityInDb).State = EntityState.Detached;
}
The first problem was getting the original values, because when I call dbSet.Find the entity is already in the context (context.Entry(entityToUpdate).State = EntityState.Modified;).
So I tried changing the order, calling the child updates first:
Repository<Parent>.UpdateChild(parentObj, parent => parent.Id, parent => parent.Children1);
Repository<Parent>.UpdateChild(parentObj, parent => parent.Id, parent => parent.Children2);
Repository<Parent>.Update(parentObj);
And now I have the error:
Store update, insert, or delete statement affected an unexpected number of rows (0). Entities may have been modified or deleted since entities were loaded. See http://go.microsoft.com/fwlink/?LinkId=472540 for information on understanding and handling optimistic concurrency exceptions.
In summary, the first way would be very nice, but I would be satisfied with the second or third too.
Thanks very much
Edit 1
Please, I need a native solution or one using AutoMapper (which we already use in the project), because my customer doesn't like external dependencies, and we may need to adapt the solution to the project, for example to work with attached objects when updating their related entities. So GraphDiff, mentioned in the comments, doesn't fit our needs (and VS 2015 RC crashed when I tried to install the package for tests).
Have you considered getting the object from the DB and using AutoMapper to modify all the property values?
I mean:
var obj = GetObjectFromDB(...);
AutoMapObj(obj, modifiedObj);
SaveInDb();
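One thing to keep in mind with either approach: Child in the question does not override Equals, so Except and Intersect compare object references, and a detached child never matches the instance loaded from the database. A minimal sketch of reconciling the two collections by key instead is shown below. CollectionDiff is a hypothetical helper, not an EF or AutoMapper API; the caller would still translate the three result lists into Add, Remove, and property-copy operations:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CollectionDiff
{
    // Splits two collections into added, removed and matched items by key.
    // Assumes keys in 'original' are unique (they come from the database).
    public static (List<T> Added, List<T> Removed, List<(T Current, T Original)> Updated)
        ByKey<T, TKey>(IEnumerable<T> current, IEnumerable<T> original, Func<T, TKey> keySelector)
    {
        var originalByKey = original.ToDictionary(keySelector);
        var currentKeys = new HashSet<TKey>();
        var added = new List<T>();
        var updated = new List<(T Current, T Original)>();

        foreach (var item in current)
        {
            var key = keySelector(item);
            currentKeys.Add(key);

            if (originalByKey.TryGetValue(key, out var match))
            {
                // Present on both sides: copy values from item onto match.
                updated.Add((item, match));
            }
            else
            {
                // Only in the detached graph: a new child.
                added.Add(item);
            }
        }

        // Only in the database: the child was removed from the detached graph.
        var removed = originalByKey
            .Where(pair => !currentKeys.Contains(pair.Key))
            .Select(pair => pair.Value)
            .ToList();

        return (added, removed, updated);
    }
}
```

For each pair in Updated, the values can be copied from Current onto the tracked Original instance (for example with AutoMapper), so that only one instance per key is ever attached to the context.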

NHibernate join does not fully populate objects within transactions

We have a situation where a transaction is started on an NHibernate session, some rows are populated into a couple of tables, and a query is executed which performs a join on the two tables.
Models:
public class A
{
public virtual string ID { get; set; } // Primary key
public IList<B> Bs { get; set; }
}
public class B
{
public virtual string ID { get; set; } // Foreign key
}
NHibernate maps:
public class AMap: ClassMap<A>
{
public AMap()
{
Table("dbo.A");
Id(x => x.ID).Not.Nullable();
HasMany(u => u.Bs).KeyColumn("ID");
}
}
public class BMap: ClassMap<B>
{
public BMap()
{
Table("dbo.B");
Map(x => x.ID, "ID").Not.Nullable();
}
}
A transaction is started and the following code is executed:
var a1 = new A
{
ID = "One"
};
session.Save(a1);
var a2 = new A
{
ID = "Two"
};
session.Save(a2);
session.Flush();
var b1 = new B
{
ID = a1.ID
};
session.Save(b1);
var b2 = new B
{
ID = a2.ID
};
session.Save(b2);
session.Flush();
A a = null;
B b = null;
var result = _session.QueryOver(() => a)
.JoinQueryOver(() => a.Bs, () => b,JoinType.LeftOuterJoin)
.List();
The result is a list of A. In the list, objects of A do not have Bs populated.
Although this example is simplified, the actual objects in question have additional properties associated with corresponding table columns; all those properties populate as expected; the issue is confined to the property mapped as HasMany (foreign key association).
If the table is populated first, and then the query is performed (either as separate processes or in consecutive transactions), the objects of A do have their Bs correctly populated. In other words, it seems as though queries executed in a transaction are not able to see the complete effect of inserts previously performed within the same transaction.
Inspection of the SQL generated by NHibernate confirms that it correctly performed all the inserts and correctly formulated the join query; it appears that it simply did not correctly populate the objects from the query result.
Are there any special steps required to ensure that database inserts/updates performed via NHibernate are fully visible to subsequent fetches in the same transaction?
HasMany(u => u.Bs).KeyColumn("ID");
looks wrong to me. The id of a one-to-many relation should be A_ID.
You do lots of strange things in your code. I hope your real code doesn't look like this. You should not set foreign keys directly. They are managed by NH. You should not Flush all the time. Normally you never flush.
Also note that the left outer join is not used to populate the list of Bs in A. (There is no information for NHibernate that this would be a valid option.) There are mapping tricks to load entities and one of its collections in one query, but this is most of the time not such a good idea and I suggest to not try this unless you really know NH and how queries are processed very well. You'll only get the same A multiple times and some performance problems if you do not break it completely. If you are afraid of the N+1 problem (I hope you are), use batch-size instead.
Figured out the solution. The gist of it is to add the "child" items to the "parent" and then save that.
So... classes now look like:
public class A
{
public virtual string ID { get; set; } // Primary key
public virtual IList<B> Bs { get; set; }
}
public class B
{
public virtual A A { get; set; } // Foreign key now expressed as reference to "parent" object instead of property containing key value
}
ClassMaps for both parent and child express the relationship as object/list:
public class AMap: ClassMap<A>
{
public AMap()
{
Table("dbo.A");
Id(x => x.ID).Not.Nullable();
HasMany(u => u.Bs).KeyColumn("ID").Cascade.SaveUpdate();
}
}
public class BMap: ClassMap<B>
{
public BMap()
{
Table("dbo.B");
Map(x => x.ID, "ID").Not.Nullable();
References(x => x.A, "ID").Not.Nullable();
}
}
Finally, data is saved by constructing the objects and their relationship before saving them i.e. relationships are saved with the objects:
var a1 = new A
{
ID = "One"
};
var b1 = new B
{
A = a1
};
a1.Bs = new []{b1};
session.Save(a1);
var a2 = new A
{
ID = "Two"
};
var b2 = new B
{
A = a2
};
a2.Bs = new []{b2};
session.Save(a2);
session.Flush();
This query:
A a = null;
B b = null;
var result = _session.QueryOver(() => a)
.JoinQueryOver(() => a.Bs, () => b,JoinType.LeftOuterJoin)
.List();
Now returns the expected result, and within the same session/transaction.

Best way to load navigation properties in new entity

I am trying to add a new record to a SQL database using EF. The code looks like this:
public void Add(QueueItem queueItem)
{
var entity = queueItem.ApiEntity;
var statistic = new Statistic
{
Ip = entity.Ip,
Process = entity.ProcessId,
ApiId = entity.ApiId,
Result = entity.Result,
Error = entity.Error,
Source = entity.Source,
DateStamp = DateTime.UtcNow,
UserId = int.Parse(entity.ApiKey),
};
_statisticRepository.Add(statistic);
unitOfWork.Commit();
}
The Statistic entity has Api and User navigation properties, which I want to load on the new Statistic entity. I have tried to load the navigation properties using the code below, but it produces large queries and hurts performance. Any suggestions on how to load the navigation properties another way?
public Statistic Add(Statistic statistic)
{
_context.Statistic.Include(p => p.Api).Load();
_context.Statistic.Include(w => w.User).Load();
_context.Statistic.Add(statistic);
return statistic;
}
Some of you may ask why I want to load navigation properties while adding a new entity. It's because I perform some calculations in DbContext.SaveChanges() before the entity is written to the database. The code looks like this:
public override int SaveChanges()
{
var addedStatistics = ChangeTracker.Entries<Statistic>().Where(e => e.State == EntityState.Added).ToList().Select(p => p.Entity).ToList();
var userCreditsGroup = addedStatistics
.Where(w => w.User != null)
.GroupBy(g => g.User )
.Select(s => new
{
User = s.Key,
Count = s.Sum(p=>p.Api.CreditCost)
})
.ToList();
//Skip code
}
So the LINQ above will not work without loading the navigation properties, because it uses them.
I am also adding the Statistic entity for completeness:
public class Statistic : Entity
{
public Statistic()
{
DateStamp = DateTime.UtcNow;
}
public int Id { get; set; }
public string Process { get; set; }
public bool Result { get; set; }
[Required]
public DateTime DateStamp { get; set; }
[MaxLength(39)]
public string Ip { get; set; }
[MaxLength(2083)]
public string Source { get; set; }
[MaxLength(250)]
public string Error { get; set; }
public int UserId { get; set; }
[ForeignKey("UserId")]
public virtual User User { get; set; }
public int ApiId { get; set; }
[ForeignKey("ApiId")]
public virtual Api Api { get; set; }
}
As you say, the following operations against your context will generate large queries:
_context.Statistic.Include(p => p.Api).Load();
_context.Statistic.Include(w => w.User).Load();
These materialise the object graphs for all statistics and their associated Api entities, and then all statistics and their associated users, into the context.
Just replacing this with a single call as follows will reduce this to a single round trip:
_context.Statistic.Include(p => p.Api).Include(w => w.User).Load();
Once these have been loaded, the entity framework change tracker will fixup the relationships on the new statistics entities, and hence populate the navigation properties for api and user for all new statistics in one go.
Depending on how many new statistics are being created in one go versus the number of existing statistics in the database I quite like this approach.
However, looking at the SaveChanges method it looks like the relationship fixup is happening once per new statistic. I.e. each time a new statistic is added you are querying the database for all statistics and associated api and user entities to trigger a relationship fixup for the new statistic.
In which case I would be more inclined to do the following:
_context.Statistics.Add(statistic);
_context.Entry(statistic).Reference(s => s.Api).Load();
_context.Entry(statistic).Reference(s => s.User).Load();
This will only query for the Api and User of the new statistic rather than for all statistics. I.e you will generate 2 single row database queries for each new statistic.
Alternatively, if you are adding a large number of statistics in one batch, you could make use of the Local cache on the context by preloading all users and api entities upfront. I.e. take the hit upfront to pre cache all user and api entities as 2 large queries.
// preload all api and user entities
_context.Apis.Load();
_context.Users.Load();
// batch add new statistics
foreach (var statistic in statisticsToAdd)
{
    statistic.User = _context.Users.Local.Single(x => x.Id == statistic.UserId);
    statistic.Api = _context.Apis.Local.Single(x => x.Id == statistic.ApiId);
    _context.Statistics.Add(statistic);
}
Would be interested to find out if Entity Framework does relationship fixup from its local cache.
I.e. if the following would populate the navigation properties from the local cache on all the new statistics. Will have a play later.
_context.ChangeTracker.DetectChanges();
Disclaimer: all code entered directly into browser so beware of the typos.
Sorry, I don't have the time to test this, but EF maps entities to objects. So shouldn't simply assigning the objects work?
public void Add(QueueItem queueItem)
{
var entity = queueItem.ApiEntity;
var statistic = new Statistic
{
Ip = entity.Ip,
Process = entity.ProcessId,
//ApiId = entity.ApiId,
Api = _context.Apis.Single(a => a.Id == entity.ApiId),
Result = entity.Result,
Error = entity.Error,
Source = entity.Source,
DateStamp = DateTime.UtcNow,
//UserId = int.Parse(entity.ApiKey),
User = _context.Users.Single(u => u.Id == int.Parse(entity.ApiKey))
};
_statisticRepository.Add(statistic);
unitOfWork.Commit();
}
I did a little guessing at your names; you should adjust them before testing.
How about making a lookup and loading only the necessary columns?
private readonly Dictionary<int, UserKeyType> _userKeyLookup = new Dictionary<int, UserKeyType>();
I'm not sure how you create the repository; you might need to clear the lookup once the changes have been saved, or at the beginning of the transaction.
_userKeyLookup.Clear();
First look in the lookup; if the key is not found, load it from the context.
public Statistic Add(Statistic statistic)
{
// _context.Statistic.Include(w => w.User).Load();
UserKeyType key;
if (_userKeyLookup.ContainsKey(statistic.UserId))
{
key = _userKeyLookup[statistic.UserId];
}
else
{
key = _context.Users.Where(u => u.Id == statistic.UserId).Select(u => u.Key).FirstOrDefault();
_userKeyLookup.Add(statistic.UserId, key);
}
statistic.User = new User { Id = statistic.UserId, Key = key };
// similar code for api..
// _context.Statistic.Include(p => p.Api).Load();
_context.Statistic.Add(statistic);
return statistic;
}
Then change the grouping a little.
var userCreditsGroup = addedStatistics
.Where(w => w.User != null)
.GroupBy(g => g.User.Id)
.Select(s => new
{
User = s.First().User,
Count = s.Sum(p=>p.Api.CreditCost)
})
.ToList();
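The find-or-load step in the Add method above can be factored into a small reusable cache, which also avoids the double lookup of ContainsKey followed by the indexer. This is only a sketch: KeyLookup is a hypothetical helper, and the loader delegate stands in for the _context.Users query.

```csharp
using System;
using System.Collections.Generic;

// Caches values by key and calls the loader only on a cache miss.
public sealed class KeyLookup<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _cache = new Dictionary<TKey, TValue>();
    private readonly Func<TKey, TValue> _load;

    public KeyLookup(Func<TKey, TValue> load)
    {
        _load = load ?? throw new ArgumentNullException(nameof(load));
    }

    public TValue Get(TKey key)
    {
        // TryGetValue performs a single hash lookup on a hit.
        if (!_cache.TryGetValue(key, out var value))
        {
            value = _load(key);
            _cache.Add(key, value);
        }
        return value;
    }

    // Call after SaveChanges or at the start of a unit of work.
    public void Clear()
    {
        _cache.Clear();
    }
}
```

In the repository, the loader would be something like id => _context.Users.Where(u => u.Id == id).Select(u => u.Key).FirstOrDefault(), with the names adjusted to the real model.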
