EntityFramework: Reusability of '.Where' clauses for nested data - c#

The example is as follows: I have a database with all kinds of information about code reviews: reviewers, file infos, revision infos, ... I am using Entity Framework for data access. For that, I have the following models (simplified):
public class ReviewItem {
public virtual ICollection<ExpandedRevision> ExpandedRevisions { get; set; }
public virtual int? AuthorId { get; set; }
}
// ...
public class ExpandedRevision {
public int ChangedLines { get; set; }
public string FilePath { get; set; }
}
Now I want to get the total number of changed lines that a particular user reviewed. I do it like this:
public int GetLinesReviewed(int userId) {
    using (var ctx = new CrucibleDBContext()) {
        return ctx.ReviewItems
            .Include(file => file.ExpandedRevisions)
            .Where(file => file.AuthorId == userId)
            .Sum(file => file
                .ExpandedRevisions
                .Where(er => !er.FilePath.Contains("third party") &&
                             !er.FilePath.Contains("ThirdParty")
                )
                .Sum(er => er.ChangedLines)
            );
    }
}
So the most interesting part in the above example is this:
.Where(er => !er.FilePath.Contains("third party") &&
             !er.FilePath.Contains("ThirdParty")
)
Keep in mind that this is a simplified example; the real filter has many more conditions.
I need this piece of code in a lot of places, so I tried to create a function that encapsulates and centralizes the logic. But because ExpandedRevisions is declared as ICollection<ExpandedRevision>, the compiler does not let me pass an Expression<Func<ExpandedRevision, bool>> to its Where, only a Func<ExpandedRevision, bool>. That results in materialization of the whole thing, which is obviously something I don't want.
Question: how do I achieve code reusability in this case without having to load all the things into memory?
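One possible direction, offered as a sketch rather than a confirmed fix: keep the filter as an Expression<Func<ExpandedRevision, bool>> and call AsQueryable() on the navigation collection inside the query, so that the Queryable.Where overload that accepts an expression is selected. The names RevisionFilters, NotThirdParty and LinesReviewed are hypothetical. The code below runs against in-memory data only; whether a particular Entity Framework version translates the nested AsQueryable() call into SQL is an assumption that should be checked against the generated SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class ExpandedRevision
{
    public int ChangedLines { get; set; }
    public string FilePath { get; set; }
}

public class ReviewItem
{
    public virtual ICollection<ExpandedRevision> ExpandedRevisions { get; set; }
        = new List<ExpandedRevision>();
    public virtual int? AuthorId { get; set; }
}

// Hypothetical helper: the filter lives in one place as an expression tree,
// so a LINQ provider can inspect it instead of receiving an opaque delegate.
public static class RevisionFilters
{
    public static readonly Expression<Func<ExpandedRevision, bool>> NotThirdParty =
        er => !er.FilePath.Contains("third party") &&
              !er.FilePath.Contains("ThirdParty");

    public static int LinesReviewed(IQueryable<ReviewItem> items, int userId)
    {
        // Copy to a local so the query captures a plain variable.
        var filter = NotThirdParty;

        return items
            .Where(file => file.AuthorId == userId)
            .Sum(file => file.ExpandedRevisions
                .AsQueryable()   // switches to the Queryable.Where overload
                .Where(filter)   // accepts Expression<Func<ExpandedRevision, bool>>
                .Sum(er => er.ChangedLines));
    }
}
```

With a real context the call would look like RevisionFilters.LinesReviewed(ctx.ReviewItems, userId); the same filter expression can then be passed to any other query that needs it.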

Related

Entity Framework Core 2.2 use scalar DBFunction to get property on list of foreign keys

I have a model with a linked list of foreign keys i.e.
[Table("a")]
public class A {
[Key]
[Column("a_id")]
public int Id { get; set; }
public List<B> Bs { get; set; } = new List<B>();
}
[Table("b")]
public class B {
[Key]
[Column("b_id")]
public int Id { get; set; }
[NotMapped]
public string MyFunctionValue { get; set; }
[ForeignKey("a_id")]
public A A { get; set; }
}
I've then defined a function which links to a scalar sql function like so...
public static class MySqlFunctions {
[DbFunction("MyFunction", "dbo")]
public static string MyFunction(int bId) {
throw new NotImplementedException();
}
}
and registered in my context like so...
modelBuilder.HasDbFunction(() => MySqlFunctions.MyFunction(default));
What I want to be able to do in my repository class is to grab the A records with the linked B records in a List, with each B's MyFunctionValue set to the return value of the function when run against the id of that B. Something like...
myContext.A
    .Include(a => a.Bs.Select(b => new B {
        Id = b.Id,
        MyFunctionValue = MySqlFunctions.MyFunction(b.Id)
    }));
However, with all the options I've tried so far I'm getting either an InvalidOperationException or a NotImplementedException, I guess because it can't properly convert the query to SQL?
Is there any way I can write a query like this or is it too complex for EF to generate SQL for? I know there's a possibility I could use .FromSql but I'd rather avoid it if possible as it's a bit messy.
EDIT:
So I've managed to get it working with the following code but it's obviously a bit messy, if anyone has a better solution I'd be grateful.
myContext.A
    .Include(a => a.Bs)
    .Select(a => new {
        A = a,
        // ToList() so the values can be indexed below
        MyFunctionValues = a.Bs.Select(b => MySqlFunctions.MyFunction(b.Id)).ToList()
    })
    .AsEnumerable()
    .Select(aWithMfvs => {
        // relies on a.Bs and MyFunctionValues coming back in the same order
        for (int i = 0; i < aWithMfvs.MyFunctionValues.Count; i++) {
            aWithMfvs.A.Bs[i].MyFunctionValue = aWithMfvs.MyFunctionValues[i];
        }
        return aWithMfvs.A;
    })
    .AsQueryable();
There are several things you should consider with DB functions:
When you declare a DbFunction as a static method on your DbContext class, you don't have to register it with the modelBuilder; a static method on any other class, like MySqlFunctions here, does need the HasDbFunction registration
Otherwise, explicit registration is only needed when you use the Fluent API instead of the attribute (which IMHO I recommend anyway, in order to keep your entities free of any dependencies)
The return type, the method name and the count, type and order of the method parameters must match the user-defined function (UDF)
You named the method parameter bId. Is it exactly the same in your UDF, or is it named as in the table, like b_id?

Entity Framework core select causes too many queries

I have the following method which is meant to build me up a single object instance, where its properties are built via recursively calling the same method:
public ChannelObjectModel GetChannelObject(Guid id, Guid crmId)
{
var result = (from channelObject in _channelObjectRepository.Get(x => x.Id == id)
select new ChannelObjectModel
{
Id = channelObject.Id,
Name = channelObject.Name,
ChannelId = channelObject.ChannelId,
ParentObjectId = channelObject.ParentObjectId,
TypeId = channelObject.TypeId,
ChannelObjectType = channelObject.ChannelObjectTypeId.HasValue ? GetChannelObject(channelObject.ChannelObjectTypeId.Value, crmId) : null,
ChannelObjectSearchType = channelObject.ChannelObjectSearchTypeId.HasValue ? GetChannelObject(channelObject.ChannelObjectSearchTypeId.Value, crmId) : null,
ChannelObjectSupportingObject = channelObject.ChannelObjectSupportingObjectId.HasValue ? GetChannelObject(channelObject.ChannelObjectSupportingObjectId.Value, crmId) : null,
Mapping = _channelObjectMappingRepository.Get().Where(mapping => mapping.ChannelObjectId == channelObject.Id && mapping.CrmId == crmId).Select(mapping => new ChannelObjectMappingModel
{
CrmObjectId = mapping.CrmObjectId
}).ToList(),
Fields = _channelObjectRepository.Get().Where(x => x.ParentObjectId == id).Select(field => GetChannelObject(field.Id, crmId)).ToList()
}
);
return result.First();
}
public class ChannelObjectModel
{
public ChannelObjectModel()
{
Mapping = new List<ChannelObjectMappingModel>();
Fields = new List<ChannelObjectModel>();
}
public Guid Id { get; set; }
public Guid ChannelId { get; set; }
public string Name { get; set; }
public List<ChannelObjectMappingModel> Mapping { get; set; }
public int TypeId { get; set; }
public Guid? ParentObjectId { get; set; }
public ChannelObjectModel ParentObject { get; set; }
public List<ChannelObjectModel> Fields { get; set; }
public Guid? ChannelObjectTypeId { get; set; }
public ChannelObjectModel ChannelObjectType { get; set; }
public Guid? ChannelObjectSearchTypeId { get; set; }
public ChannelObjectModel ChannelObjectSearchType { get; set; }
public Guid? ChannelObjectSupportingObjectId { get; set; }
public ChannelObjectModel ChannelObjectSupportingObject { get; set; }
}
This connects to a SQL database using Entity Framework Core 2.1.1.
Whilst it technically works, it causes loads of database queries to be made. I realise it's because of the ToList() and First() etc. calls.
Because of the nature of the object, I could instead build one huge IQueryable<anonymous> query with a from ... select new {...} and call First on it, but that code was over 300 lines long going just 5 tiers deep in the hierarchy. So I am trying to replace it with something like the code above, which is much cleaner, albeit much slower.
ChannelObjectType, ChannelObjectSearchType and ChannelObjectSupportingObject are all ChannelObjectModel instances, and Fields is a list of ChannelObjectModel instances.
The query currently takes about 30 seconds to execute and generates a lot of database calls when I run it. That is far too slow, and it is on a small localhost database too, so it will only get worse with a larger number of db records.
The 300+ line version generates far fewer queries and is reasonably quick, but is obviously horrible, horrible code (which I didn't write!)
Can anyone suggest a way I can recursively build up an object in a similar way to the above method, but drastically cut the number of database calls so it's quicker?
I work with EF6, not Core, but as far as I know the same things apply here.
First of all, move this function to your repository, so that all calls share the DbContext instance.
Secondly, use Include on your DbSet to eager-load its navigation properties:
ctx.Set<ChannelObjectModel>()
.Include(x => x.Fields)
.Include(x => x.Mapping)
.Include(x => x.ParentObject)
...
Good practice is to make this a method of the context (or an extension method), called for example BuildChannelObject(), that returns the IQueryable with just the includes applied.
Then you can start the recursive part:
public ChannelObjectModel GetChannelObjectModel(Guid id)
{
var set = ctx.BuildChannelObject(); // ctx is this
var channelModel = set.FirstOrDefault(x => x.Id == id); // this loads the first level
LoadRecursive(channelModel, set);
return channelModel;
}
private void LoadRecursive(ChannelObjectModel c, IQueryable<ChannelObjectModel> set)
{
if(c == null)
return; // recursion end condition
c.ParentObject = set.FirstOrDefault(x => x.Id == c.ParentObjectId); // look up by the foreign key; ParentObject itself is not loaded yet
// all other properties
LoadRecursive(c.ParentObject, set);
// all other properties
}
If all this code uses the same instance of DbContext, it should be quite fast. If not, you can use another trick:
ctx.BuildChannelObject().Load();
This loads all objects into the in-memory cache of your DbContext. Unfortunately, the cache dies with the context instance, but it makes those recursive calls much faster, since no database trip is made.
If this is still too slow, you can add AsNoTracking() as the last instruction of BuildChannelObject().
If this is still too slow, just implement an application-wide memory cache of those objects and use that instead of querying the database every time. This works great if your app is a service that can afford a long startup but then has to work fast.
A whole other approach is to enable lazy loading by marking navigation properties as virtual, but remember that the returned type will then be a generated proxy type derived from ChannelObjectModel, not your original ChannelObjectModel! Also, properties will load only as long as you don't dispose the context; after that you get an exception. Loading all properties within the context and then returning the complete object is also a little bit tricky. The easiest (but not the best!) way to do it is to serialize the object to JSON (watch out for circular references) before returning it.
If that does not satisfy you, switch to NHibernate, which I hear has an application-wide cache by default.
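The "load everything once, then recurse in memory" idea above can be sketched without any database at all. This is only an illustration under assumptions: ChannelObjectRow, ChannelObjectNode and ChannelObjectTree are hypothetical names, the rows are assumed to have been fetched in a single query beforehand, and only one of the reference properties (ChannelObjectType) is wired up to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat row, standing in for the entity loaded from the database.
public class ChannelObjectRow
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public Guid? ParentObjectId { get; set; }
    public Guid? ChannelObjectTypeId { get; set; }
}

// Hypothetical result node, a trimmed-down ChannelObjectModel.
public class ChannelObjectNode
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public ChannelObjectNode ChannelObjectType { get; set; }
    public List<ChannelObjectNode> Fields { get; } = new List<ChannelObjectNode>();
}

public static class ChannelObjectTree
{
    // Builds the object graph for rootId from rows that are already in memory.
    public static ChannelObjectNode Build(IEnumerable<ChannelObjectRow> rows, Guid rootId)
    {
        var rowList = rows.ToList();
        var byId = rowList.ToDictionary(r => r.Id);
        var childrenByParent = rowList
            .Where(r => r.ParentObjectId.HasValue)
            .ToLookup(r => r.ParentObjectId.Value);
        return BuildNode(rootId, byId, childrenByParent, new HashSet<Guid>());
    }

    private static ChannelObjectNode BuildNode(
        Guid id,
        IReadOnlyDictionary<Guid, ChannelObjectRow> byId,
        ILookup<Guid, ChannelObjectRow> childrenByParent,
        HashSet<Guid> path)
    {
        // Stop on unknown ids and on ids already on the current path (cycle guard).
        if (!byId.TryGetValue(id, out var row) || !path.Add(id))
            return null;

        var node = new ChannelObjectNode { Id = row.Id, Name = row.Name };

        if (row.ChannelObjectTypeId.HasValue)
            node.ChannelObjectType =
                BuildNode(row.ChannelObjectTypeId.Value, byId, childrenByParent, path);

        foreach (var child in childrenByParent[id])
        {
            var childNode = BuildNode(child.Id, byId, childrenByParent, path);
            if (childNode != null)
                node.Fields.Add(childNode);
        }

        path.Remove(id);
        return node;
    }
}
```

The recursion keeps the shape of the original method, but every lookup hits a dictionary instead of the database, so the number of queries no longer grows with the depth of the hierarchy.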

Linq extension methods that compile to store methods

I have a code-first EF database where objects have "statuses" to track history. They're implemented something like this:
public class Example
{
public Example()
{
this.Statuses = new HashSet<Status>();
}
public Guid Id { get; set; }
public virtual ICollection<Status> Statuses { get; set; }
}
public class Status
{
public Guid Id { get; set; }
public DateTimeOffset SetOn { get; set; }
public string SetBy { get; set; }
}
We have a few instances in code where we need to get either the oldest or newest status. Currently we've been using chained linq expressions like the following:
var setBy = example.Statuses.OrderByDescending(s => s.SetOn).FirstOrDefault().SetBy;
I think it would be more readable if we could do some of that with extensions, since getting the newest or oldest status is just a difference of whether it's sorted by ascending or descending.
A simple extension method like this works with linq-to-objects, if we've already gotten results from the database:
public static Status Newest(this IEnumerable<Status> items)
{
return items.OrderByDescending(s => s.SetOn).FirstOrDefault();
}
However, this doesn't work if I'm running it on an IQueryable representing our database, since EF is unable to translate it to a store expression. For instance, if "repository" below is an IQueryable<Example> representing Examples in a SQL backend, the following will fail:
var date = DateTimeOffset.Parse("4/1/2018");
var query = repository.Where(ex => ex.Statuses.Newest().SetOn > date).FirstOrDefault();
Is there a way I can refactor this into an extension method or expression that can be translated to a store expression?
This can be done with LINQKit by defining an expression that returns a Status from an Example, and wrapping the IQueryable with LINQKit's AsExpandable(). Using the above classes, I could do something like
private Expression<Func<Example, Status>> Newest =
e => e.Statuses.OrderByDescending(s => s.SetOn).FirstOrDefault();
And invoke it like
var results = from example in repository.AsExpandable()
select new
{
Example = example,
LatestStatus = Newest.Invoke(example)
};
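If pulling in LINQKit is not an option, a similar effect can be had by composing expression trees by hand. The sketch below is a different technique from the LINQKit answer, not a description of how LINQKit works internally: it inlines one expression into another with an ExpressionVisitor so the provider only ever sees ordinary LINQ operators. StatusExpressions, Then and WithNewestStatusAfter are hypothetical names, and the example is exercised against in-memory data only; translation by a specific EF version is an assumption to verify.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Status
{
    public DateTimeOffset SetOn { get; set; }
    public string SetBy { get; set; }
}

public class Example
{
    public Guid Id { get; set; }
    public virtual ICollection<Status> Statuses { get; set; } = new HashSet<Status>();
}

public static class StatusExpressions
{
    // Reusable "newest status" selector, kept as an expression tree.
    public static readonly Expression<Func<Example, Status>> Newest =
        e => e.Statuses.OrderByDescending(s => s.SetOn).FirstOrDefault();

    // Builds x => outer(inner(x)) by substituting inner's body for outer's parameter.
    public static Expression<Func<TIn, TOut>> Then<TIn, TMid, TOut>(
        this Expression<Func<TIn, TMid>> inner,
        Expression<Func<TMid, TOut>> outer)
    {
        var body = new ParameterReplacer(outer.Parameters[0], inner.Body).Visit(outer.Body);
        return Expression.Lambda<Func<TIn, TOut>>(body, inner.Parameters);
    }

    // Hypothetical query helper built on top of the composed predicate.
    public static IQueryable<Example> WithNewestStatusAfter(
        this IQueryable<Example> source, DateTimeOffset date)
    {
        return source.Where(Newest.Then(s => s.SetOn > date));
    }

    private sealed class ParameterReplacer : ExpressionVisitor
    {
        private readonly ParameterExpression _from;
        private readonly Expression _to;

        public ParameterReplacer(ParameterExpression from, Expression to)
        {
            _from = from;
            _to = to;
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            return node == _from ? _to : base.VisitParameter(node);
        }
    }
}
```

Usage would then read repository.WithNewestStatusAfter(date).FirstOrDefault(). Note that when run in memory, an Example without any status makes FirstOrDefault() return null and the comparison throw, whereas a database provider would typically evaluate the comparison with SQL null semantics.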

Easier way of avoiding duplicates in entity framework

Can anyone suggest an easier, more automatic way of doing this?
I have the following save method for a FilterComboTemplate model. The data has been converted from JSON to a C# model entity by Web API.
So that I don't create duplicate entries in the DeviceProperty table, I have to go through each filter in turn, retrieve the assigned DeviceFilterProperty from the context and override the object in the filter. See the code below.
I have all the object Ids if they already exist, so it seems like this should be handled automatically, but perhaps that's just wishful thinking.
public void Save(FilterComboTemplate comboTemplate)
{
// Set the Device Properties so we don't create dupes
foreach (var filter in comboTemplate.Filters)
{
filter.DeviceFilterProperty = context.DeviceFilterProperties.Find(filter.DeviceFilterProperty.DeviceFilterPropertyId);
}
context.FilterComboTemplates.Add(comboTemplate);
context.SaveChanges();
}
From here I'm going to have to check whether any of the filters exist too and then manually update them if they are different to what's in the database so as not to keep creating a whole new set after an edit of a FilterComboTemplate.
I'm finding myself writing a lot of this type of code. I've included the other model classes below for a bit of context.
public class FilterComboTemplate
{
public FilterComboTemplate()
{
Filters = new Collection<Filter>();
}
[Key]
public int FilterComboTemplateId { get; set; }
[Required]
public string Name { get; set; }
[Required]
public ICollection<Filter> Filters { get; set; }
}
public class Filter
{
[Key]
public int FilterId { get; set; }
[Required]
public DeviceFilterProperty DeviceFilterProperty { get; set; }
[Required]
public bool Exclude { get; set; }
[Required]
public string Data1 { get; set; }
}
public class DeviceFilterProperty
{
[Key]
public int DeviceFilterPropertyId { get; set; }
[Required]
public string Name { get; set; }
}
Judging from some similar questions on SO, this does not seem to be something EF does automatically...
It's probably not a massive cut in code, but you could do something like this: an extension method on DbContext (or on your particular data context):
public static bool Exists<TEntity>(this MyDataContext context, int id)
    where TEntity : Entity // only needed for the first return: a base class with an int Id property
{
    // your code here, something similar to
    return context.Set<TEntity>().Any(x => x.Id == id);
    // or with reflection (evaluated in memory, so every row of the set is loaded first):
    var keyProp = typeof(TEntity).GetProperties()
        .First(p => p.GetCustomAttributes(typeof(KeyAttribute), true).Length > 0);
    return context.Set<TEntity>().AsEnumerable().Any(x => Equals(keyProp.GetValue(x), id));
}
This will check if an object with that key exists in the DbContext. Naturally a similar method can be created to actually return that entity as well.
There are two "returns" in the code, just use the one you prefer. The former will force you to have all entities inherit from an "Entity" object with an Id Property (which is not necessarily a bad thing, but I can see the pain in this... you will also need to force the TEntity param: where TEntity : Entity or similar).
Take the "reflection" solution with a pinch of salt: first of all the performance may be a problem, and second of all I don't have VS running right now, so I don't even know if it compiles OK, let alone works!
Let me know if that works :)
It seems that you have some common operations to perform on parameters after they are bound from the request.
You may consider writing custom parameter bindings to reuse the code. HongMei's blog is a good starting point: http://blogs.msdn.com/b/hongmeig1/archive/2012/09/28/how-to-customize-parameter-binding.aspx
You may use the code in Scenario 2 to get the formatter binding to deserialize the model from the body and perform the operations you want after that.
See the final step in the blog to specify the parameter type you want to customize.

Does AsQueryable() on ICollection really makes lazy execution?

I am using Entity Framework Code First, where I have modelled parent-child relations using ICollection as
public class Person
{
public string UserName { get; set; }
public ICollection<Blog> Blogs { get; set;}
}
public class Blog
{
public int id { get; set; }
public string Subject { get; set; }
public string Body { get; set; }
}
OK, so far everything is working, but my concern is this: whenever I want to get the Blogs of a person, I get them as
var thePerson = _context.Persons.Where(x => x.UserName == "xxx").SingleOrDefault();
var theBlogs = thePerson.Blogs.OrderBy(b => b.id).Take(5);
Now, I understand that when the second line is executed, all Blogs for that person are loaded into memory, and then the sorting and selecting is done in memory. That is not ideal for a Person who has a large number of blogs. I want to make the Blog child collection an IQueryable so that the sorting and selecting is done in the SQL database before anything is pulled into memory.
I know I could declare the Blogs as IQueryable in my context so that I could directly query as
var theBlogs = _context.Blogs.Where(.....)
but that is not feasible for me due to a design choice: I want to avoid circular references as much as possible because of serialization problems, so I did not add any reference to the parent entity in my child.
I found that I can call the AsQueryable() method on the blogs as
var theBlogs = thePerson.Blogs.AsQueryable().OrderBy(b => b.id).Take(5);
That looks like magic to me and seems too good to be true. So my question: does AsQueryable() really turn the ICollection into an IQueryable whose query is processed in SQL Server (lazy loading), or is it just a cast where the Blogs are loaded into memory as before and only the interface changes from ICollection to IQueryable?
So actually it appears that declaring your navigation property as IQueryable<T> is not possible.
What you could do is add a navigation property to Blog:
public class Blog
{
public int id { get; set; }
public string Subject { get; set; }
public string Body { get; set; }
public virtual Person Owner { get; set; }
}
From that, you can query as follows so it won't load everything into memory:
var thePerson = _context.Persons.Where(x => x.UserName == "xxx").SingleOrDefault();
var results = _context.Blogs.Where(z => z.Owner.UserName == thePerson.UserName).OrderBy(z => z.id).Take(5);
I suggest you try LINQPad to see how LINQ is translated into SQL, and what is actually requested from the DB.
A better approach is described in Ladislav's answer. In your case:
var theBlogs = _context.Entry(thePerson)
.Collection(x => x.Blogs)
.Query()
.OrderBy(x => x.id)
.Take(5);
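To see why AsQueryable() on an already-loaded collection is only a change of interface, it helps to look at the query provider it hands back. The sketch below uses a plain in-memory list and hypothetical names (AsQueryableDemo, ProviderNameFor, TopIds): for a collection that is not already an IQueryable, AsQueryable() wraps it in the LINQ to Objects provider, so OrderBy and Take still run in memory. It does not involve Entity Framework, and it assumes the navigation collection behaves like an ordinary collection once it has been accessed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Blog
{
    public int id { get; set; }
    public string Subject { get; set; }
    public string Body { get; set; }
}

public static class AsQueryableDemo
{
    // Name of the provider type that will execute queries over the collection.
    public static string ProviderNameFor(ICollection<Blog> blogs)
    {
        return blogs.AsQueryable().Provider.GetType().Name;
    }

    // Same shape as the query in the question; all work happens in memory.
    public static List<int> TopIds(ICollection<Blog> blogs, int count)
    {
        return blogs.AsQueryable()
            .OrderBy(b => b.id)
            .Take(count)
            .Select(b => b.id)
            .ToList();
    }
}
```

The provider here is the in-memory EnumerableQuery implementation, not a database provider, which is why the Entry(...).Collection(...).Query() approach above is the one that actually pushes the sorting and paging to the database.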
