I have a long-time burning question about how to avoid null errors with data queried via Entity Framework (version 6 - not Core yet, sadly).
Let's say you have a table Employees, and it has a relationship with another table, EmployeePayments (one employee has many employee payments).
On your Employee domain object you create a property TotalPayments which relies on you having loaded the EmployeePayments for that object.
I try to ensure that any time I do a query, I "include" the dependency, for example:
var employees = context.Employees.Include(e => e.EmployeePayments);
The problem is, I have a lot of queries around the place (I use the generic repository pattern, so I call repository functions like GetAll or GetSingle from my service library), and so that's a lot of places to remember to add the includes. If I don't include them, I run the risk of a NullReferenceException whenever the TotalPayments property is used.
What's the best way to handle this?
Note 1: we have a lot of tables and I don't really want to have to revert to using specific repositories for each one, since we take advantage of the generic repository in a lot of ways. That said, I will be happy to hear strong arguments for the alternative :)
Note 2: I do not have lazy loading turned on, and don't plan on turning it on, for performance reasons.
This is one reason I consider the Generic Repository an anti-pattern for EF. I use a repository pattern, but scope it like I would a controller, i.e. a CreateOrderController would have a CreateOrderRepository. This repository would provide access to all relevant entities via IQueryable. Common stuff like lookups would have its own secondary repository. Using generic repositories that are geared to working with a single entity type means adding references to several repositories to do specific things, and running into issues like this when attempting to load entities. Sometimes you want related data, sometimes you don't. Simply adding convenience methods to top-level entities effectively breaks the rule that an object should always be considered complete, or complete-able, without relying on lazy loading, which brings significant performance costs.
Having repositories return IQueryable avoids many of these problems by giving the calling code control over how entities are consumed. For instance, I don't put helper methods in the entities; instead, code that needs to populate a view model relies on LINQ to build it. If my view model wants a sum of payments for an employee, then my repository returning IQueryable can do the following:
public IQueryable<Employee> GetEmployeeById(int employeeId)
{
    return Context.Employees.Where(x => x.EmployeeId == employeeId);
}
then in the controller / service:
using (var contextScope = ContextScopeFactory.Create())
{
    var employeeViewModel = EmployeeRepository.GetEmployeeById(employeeId)
        .Select(x => new EmployeeSummaryViewModel
        {
            EmployeeId = x.EmployeeId,
            EmployeeName = x.LastName + ", " + x.FirstName,
            TotalPayments = x.Payments.Where(p => p.IsActive).Sum(p => p.Amount)
        }).Single();
}
I use a repository because it is easier to mock out than the DbContext and its DbSets. For synchronous code I just have the mock populate and return a List<Employee>().AsQueryable(). For async code I need to add a wrapper for an async list.
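A minimal sketch of the synchronous mocking approach, assuming a hypothetical IEmployeeRepository interface and simplified Employee / Payment classes (none of these type definitions come from the answer itself). The fake wraps a List<Employee> with AsQueryable(), so the same projection can run against in-memory data:

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the entities; their shapes are assumptions.
public class Payment
{
    public bool IsActive { get; set; }
    public decimal Amount { get; set; }
}

public class Employee
{
    public int EmployeeId { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public List<Payment> Payments { get; set; } = new List<Payment>();
}

public class EmployeeSummaryViewModel
{
    public int EmployeeId { get; set; }
    public string EmployeeName { get; set; }
    public decimal TotalPayments { get; set; }
}

// Hypothetical repository contract: returns IQueryable so callers can project.
public interface IEmployeeRepository
{
    IQueryable<Employee> GetEmployeeById(int employeeId);
}

// Test double backed by an in-memory list instead of a DbContext.
public class FakeEmployeeRepository : IEmployeeRepository
{
    private readonly List<Employee> _employees;

    public FakeEmployeeRepository(List<Employee> employees)
    {
        _employees = employees;
    }

    public IQueryable<Employee> GetEmployeeById(int employeeId)
    {
        return _employees.AsQueryable().Where(x => x.EmployeeId == employeeId);
    }
}

// Code under test: the same projection as in the answer.
public static class EmployeeSummaryService
{
    public static EmployeeSummaryViewModel GetSummary(IEmployeeRepository repository, int employeeId)
    {
        return repository.GetEmployeeById(employeeId)
            .Select(x => new EmployeeSummaryViewModel
            {
                EmployeeId = x.EmployeeId,
                EmployeeName = x.LastName + ", " + x.FirstName,
                TotalPayments = x.Payments.Where(p => p.IsActive).Sum(p => p.Amount)
            })
            .Single();
    }
}
```

One caveat with this kind of fake: LINQ to Objects will happily run expressions that EF could not translate to SQL, so a passing test here does not prove the query works against the real provider.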
This pattern may go against more traditional views of a repository and separation of concerns, in that the calling code needs to "know" about the entities and EF-isms are leaked. However, no matter what approach you try to rationalize to get around the inefficiencies of trying to "hide" EF behind a repository, you end up in one of two places. Either you are left with very inefficient code where repositories return pre-populated DTOs or contain dozens of near-identical methods to return different DTOs (or worse, entities in various degrees of completeness), or you add complexity such as passing magic strings or expression trees into your methods to tell EF how to filter, how to sort, what to include, how to page, etc. Passing in expressions or strings still requires the calling code to "know" about the entities, and it leaks EF restrictions, because whatever is passed in ultimately has to be understood by EF.
So this may not be a viable answer for the current state of your project, but it might be worth looking into whether your dependency on the repositories can be better managed without splitting them up with the Generic pattern, and/or leveraging EF's excellent IQueryable / LINQ capabilities to let your controllers/services project the entities into view models / DTOs rather than embedding these reduce-style calculations in the entities themselves.
Related
My solution has several layers, one of them is the DataAccess layer where I've implemented the repository pattern.
My main focus is that EntityFramework is only referenced in the DataAccess layer.
I needed to include relations in my queries, so I adapted my query methods to receive the includes as an input.
/// <inheritdoc/>
public IQueryable<T> AsQueryable(params Expression<Func<T, object>>[] includes)
{
    var query = _dbSet.AsQueryable();
    if (includes != null)
    {
        query = includes.Aggregate(query,
            (current, include) => current.Include(include));
    }
    return query;
}
Example use:
// Books :: ICollection<Book>
var query = _repository.AsQueryable(e => e.Books);
Using the example above, how can I include, for example, the Book -> Author relation in the query? The property Books is a collection, therefore I cannot reference the Author property directly.
Example: .AsQueryable(e => e.Books, e => Books.Author) OR .AsQueryable(e => e.Books.ChildInclude(b => b.Author))
My main focus is that EntityFramework is only referenced in the DataAccess layer.
The only safe way I can recommend to accomplish this is to define a boundary between the code that must be isolated from knowledge of EF and the code that can be aware of EF. This means that only materialized DTOs or non-entity models travel across this boundary. This would generally be applicable where you want multiple consumers to access data in an isolated and identical way (e.g. a web site plus an API). Even then, this imposes trade-offs in flexibility and performance. That boundary normally wouldn't be the repository, but a service which can be aware of EF, manage the DbContext scope (via a Unit of Work, or by managing when SaveChanges() is called, etc.), access a repository that exposes IQueryable<TEntity>, and then project results into materialized List<TDTO> or TDTO instances for the consumers.
Passing entities outside of the scope of the DbContext they are tracked by leads to all kinds of complexity and problems within systems. This means designing a separation layer that returns DTOs or IEnumerable<TDTO> rather than IQueryable<TEntity> or even IEnumerable<TEntity>. Any code that accepts entities should always have a complete entity graph, or a complete-able (lazy-loadable) entity graph. Bugs are almost guaranteed when functions accepting entity graphs might or might not get references or all properties populated, and callers are left guessing whether some fetched or constructed entity is "complete enough" to pass to an existing method.
If this abstraction is merely "highly desired" to satisfy a personal preference, or some uncertain concern about a future requirement that might require you to replace EF with some other mechanism, my advice would be simply "Don't". By implementing patterns to abstract code from EF you are imposing absolute performance constraints and usually significant complexity into your system for no immediate benefit. Leveraging IQueryable allows your code to build efficient and performant queries against the data layer. Materializing objects to pass back to a consuming layer will mean a lot of either very similar code, or very complex code to handle filtering, eager loading, projection, sorting, and pagination, and generally return far more data than a more direct method would return.
Eager loading can be facilitated using magic strings or expressions. Filtering and sorting can be facilitated by expressions as well. However, a big caveat of developing complex expression-based abstractions is that while this may isolate calling code from EF references, it still doesn't isolate that code from EF-specific rules. For instance, the code building those expressions needs to know and obey EF rules, such as not calling functions or anything else that ultimately cannot be converted down to SQL.
Abstractions like this often lead to performance issues, which in turn reinforce the desire to replace EF with another ORM or data access method later. It becomes a self-fulfilling prophecy.
I'm starting to get my head into Domain Driven Design and I'm having some issues with the repositories and the fact that, when entities are loaded explicitly, EF Core automatically fills my navigational properties.
I have a repository that I use to load my aggregate root and its children. However, some of the aggregate children need to be loaded later on (I need to load those entities based on a date range).
Example:
Load schedule owners
Calculate a date range
Load schedule owner's schedules
I'm trying to keep my data access layer isolated from the core layer and this is where I have some questions.
Imagine this method on my repository:
public List<Schedule> GetSchedules(Guid scheduleOwnerPk, DateRange dateRange)
{
    var schedules = dbContext.Schedules.Where(x => x.PkScheduleOwner == scheduleOwnerPk && x.StartDate >= dateRange.Start && x.EndDate <= dateRange.End).ToList();
    return schedules;
}
I can call this method from the core layer in two ways:
//Take advantage of EF core ability to fill the navigational property automatically
scheduleOwnerRepository.GetSchedules(scheduleOwner.Pk, dateRange)
or
var schedules = scheduleOwnerRepository.GetSchedules(scheduleOwner.Pk, dateRange);
//At this moment EF core already loaded the navigational property, so I need to clear it to avoid duplicated results
scheduleOwner.Schedules.Clear();
//Schedules is implemented as an IEnumerable to protect it from being changed outside the aggregate root
scheduleOwner.AddSchedules(schedules);
The problem with the first approach is that it leaks EF core to the core layer, meaning that the property ScheduleOwner.Schedules will no longer be filled if I move away from EF core.
The second approach abstracts EF core but requires some extra steps to get ScheduleOwner.Schedules filled. Since EF core will automatically load the navigational property after the repository method is called, I'm forced to clear it before adding the results, otherwise I'll be inserting duplicated results.
How do you guys deal with this kind of situation? Do you take advantage of EF core features or do you follow the more natural approach of calling a repository method and use its results to fill some property?
Thanks for the help.
There are a couple of things to consider here.
Try to avoid using your domain model for querying. Rather use a read model through a query layer.
An aggregate is a complete unit as it were so when loaded you load everything. When you run into a scenario where you do not need all of the related data it may indicate that the data is not part of the aggregate but it may, in fact, only be related in a weaker sense.
An example is Order to Customer. Although an Order may very well require a Customer, the Order is an aggregate in its own right. The Customer may have a list of OrderIds, but that may become large rather quickly. One would typically not require a complete list of orders to determine whether an aggregate is valid or complete. However, you may very well need a list of ActiveOrder value objects of sorts if that is required for, say, keeping a maximum order amount, although there are various ways to deal with that case too.
Back to your scenario. An EF entity is not your domain model, and when I have had to make use of EF in the past I would load the entity and then map it to my domain entity in the repository. The repository would only deal with domain aggregates, and you should avoid query methods on the repository. As a minimum, a repository would typically have a Get(id) and a Save(aggregate) method.
I would recommend querying using a separate layer that returns as simple a result as possible. For something like a Count I may return an int, whereas for something like IScheduleQuery.Search(specification) I may return IEnumerable<DataRow> or, if it contains more complex data or I need a read model, IEnumerable<Query.Schedule>.
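A minimal sketch of that mapping approach, using an in-memory dictionary in place of a DbContext so the example stays self-contained. ScheduleOwnerEntity, ScheduleEntity, ScheduleOwner and the repository shape are hypothetical names chosen for illustration, not types taken from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Persistence shape (what EF would map to tables); hypothetical.
public class ScheduleEntity
{
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
}

public class ScheduleOwnerEntity
{
    public Guid Pk { get; set; }
    public List<ScheduleEntity> Schedules { get; set; } = new List<ScheduleEntity>();
}

// Domain aggregate: no persistence concerns, guards its own invariants.
public class ScheduleOwner
{
    private readonly List<(DateTime Start, DateTime End)> _schedules =
        new List<(DateTime Start, DateTime End)>();

    public ScheduleOwner(Guid id) { Id = id; }

    public Guid Id { get; }
    public IReadOnlyList<(DateTime Start, DateTime End)> Schedules => _schedules;

    public void AddSchedule(DateTime start, DateTime end)
    {
        if (end < start) throw new ArgumentException("End must not precede start.");
        _schedules.Add((start, end));
    }
}

// Repository exposing only Get and Save; the entity-to-domain mapping stays inside it.
public class ScheduleOwnerRepository
{
    // Stand-in for the DbContext / DbSet.
    private readonly Dictionary<Guid, ScheduleOwnerEntity> _store =
        new Dictionary<Guid, ScheduleOwnerEntity>();

    public ScheduleOwner Get(Guid id)
    {
        if (!_store.TryGetValue(id, out var entity)) return null;

        // Map the persistence entity onto a fresh domain aggregate.
        var owner = new ScheduleOwner(entity.Pk);
        foreach (var s in entity.Schedules) owner.AddSchedule(s.StartDate, s.EndDate);
        return owner;
    }

    public void Save(ScheduleOwner owner)
    {
        // Map the aggregate back to the persistence shape and overwrite the stored copy.
        _store[owner.Id] = new ScheduleOwnerEntity
        {
            Pk = owner.Id,
            Schedules = owner.Schedules
                .Select(s => new ScheduleEntity { StartDate = s.Start, EndDate = s.End })
                .ToList()
        };
    }
}
```

Because the caller only ever sees ScheduleOwner, nothing outside the repository depends on how (or whether) EF fills navigation properties.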
I have read a lot of posts about the repository pattern, but there are a few practical problems that they don't seem to solve or explain. This is what I understand about these two patterns:
The repository and the query pattern are complementary: query objects represent business logic (WHERE clauses), and the repository has a Get(IPredicate) method that takes a query object and returns a SELECT WHERE result
The repository should not have business logic: all the business logic must go in the query objects
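The arrangement described in these two points can be sketched roughly as follows. IPredicate<T>, LargeUnshippedOrders, Order and Repository<T> are hypothetical names for illustration, and an in-memory collection stands in for the database table:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical query-object contract: the business rule lives here as an expression.
public interface IPredicate<T>
{
    Expression<Func<T, bool>> Criteria { get; }
}

public class Order
{
    public int Id { get; set; }
    public decimal Total { get; set; }
    public bool IsShipped { get; set; }
}

// One business rule per query object.
public class LargeUnshippedOrders : IPredicate<Order>
{
    private readonly decimal _threshold;

    public LargeUnshippedOrders(decimal threshold) { _threshold = threshold; }

    public Expression<Func<Order, bool>> Criteria =>
        o => !o.IsShipped && o.Total >= _threshold;
}

// The repository only knows how to apply a predicate; it holds no business rules.
public class Repository<T>
{
    private readonly IQueryable<T> _source; // with an ORM this would be the table/set

    public Repository(IEnumerable<T> source) { _source = source.AsQueryable(); }

    public List<T> Get(IPredicate<T> predicate)
    {
        return _source.Where(predicate.Criteria).ToList();
    }
}
```

This covers the simple SELECT WHERE case only; joins, grouping and projections do not fit a bare predicate.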
Currently I have a class wrapping each logical object (almost always a single entity) that implements multiple "Get" methods for the more complex queries (joins, GROUP BY, etc.). This isn't a good pattern: the classes tend to grow a lot because of boilerplate code for similar queries, and their public methods depend on the context in which the class is used. That makes these classes unusable across multiple projects that depend on the same database, which is my primary goal for this refactoring.
How are queries that are more complex than a single SELECT WHERE implemented with these two patterns without leaking business logic into the repository?
Or, if this business logic fits in neither the repository nor the query objects, where does it belong?
The Repository pattern works well for standard CRUD applications, where you need to implement the classic set of create, read, update and delete operations against a single table in a database. In that case you create a repository for each table and allow the read operation to take extra parameters so that filtering can be applied.
At the next level up you have the Unit of Work pattern. These are used to span multiple repositories and perform business operations. So for example, you would read values from multiple repositories, perform calculations and then write back changes to multiple repositories. All of that would occur inside a transaction so that you always have a consistent state in the database.
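A deliberately simplified sketch of that idea: two repositories stage their writes with a shared unit of work, and nothing reaches the "database" until Commit() is called. All type names here are hypothetical, the store is in-memory, and there is no real transaction (a production implementation would rely on the database's transaction support to get atomicity):

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the database: two "tables" held in memory.
public class InMemoryDatabase
{
    public Dictionary<int, decimal> Balances { get; } = new Dictionary<int, decimal>();
    public List<string> AuditLog { get; } = new List<string>();
}

// Collects changes from any number of repositories and applies them together.
public class UnitOfWork
{
    private readonly List<Action> _pending = new List<Action>();

    public void Enqueue(Action change) { _pending.Add(change); }

    // Applies every staged change in order. This sketch does not undo
    // earlier changes if a later one throws.
    public void Commit()
    {
        foreach (var change in _pending) change();
        _pending.Clear();
    }

    // Discards everything staged since the last commit.
    public void Rollback() { _pending.Clear(); }
}

public class BalanceRepository
{
    private readonly InMemoryDatabase _db;
    private readonly UnitOfWork _unitOfWork;

    public BalanceRepository(InMemoryDatabase db, UnitOfWork unitOfWork)
    {
        _db = db;
        _unitOfWork = unitOfWork;
    }

    // Reads see committed state only.
    public decimal Read(int accountId) { return _db.Balances[accountId]; }

    public void Update(int accountId, decimal balance)
    {
        _unitOfWork.Enqueue(() => _db.Balances[accountId] = balance);
    }
}

public class AuditRepository
{
    private readonly InMemoryDatabase _db;
    private readonly UnitOfWork _unitOfWork;

    public AuditRepository(InMemoryDatabase db, UnitOfWork unitOfWork)
    {
        _db = db;
        _unitOfWork = unitOfWork;
    }

    public void Add(string entry)
    {
        _unitOfWork.Enqueue(() => _db.AuditLog.Add(entry));
    }
}
```

The business operation reads from the repositories, calculates, stages writes on each of them, and then calls Commit() once at the end.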
The problem is when you have complex queries that span multiple tables. In that case you would place the query into the repository that is the first table in the query from clause. Then you would need to provide parameters to that repository method so it can be parameterised as needed.
There are quite a few implementations of the repository and unit of work patterns floating around on the internet. Some of them are quite simple, where the developer basically implements one for each table manually; some are generic but not advanced; and some are really cool, generic, and still offer you the ability to do a decent where, projection and the like.
An example of what is, in my opinion, a good implementation can be found here:
https://genericunitofworkandrepositories.codeplex.com/
It targets MVC, as the interface shows. I am focusing on WPF applications, so I needed to tune it a bit. But the ideas behind this unit of work implementation are pretty good.
There is a downside to this implementation. Because it relies on some advanced LINQ and EF functionality, one could argue that your underlying access layer is infecting the repository layer and the layers using the repositories.
The point is that if, for instance, you want to move away from EF, chances are that you would have to change the interface of your repositories.
Some code snippets to show the power of this library:
_fotoRepository = unitOfWork.RepositoryAsync<Foto>();
var fotos = await _fotoRepository
    .Query(r => r.BestelBonId == bestelBonId || werkstukids.Contains(r.WerkstukMetBewerkingenId.Value))
    .SelectAsync()
    .ConfigureAwait(false);
or using projection:
IRepository<Relatie> relatieRepository = unitOfWork.RepositoryAsync<Relatie>();
var relatiesOverviewsEnumerable = relatieRepository
    .Query()
    .NoTracking()
    .OrderBy(q => q.OrderBy(d => d.RelatieId))
    .Select(b => new RelatieOverview
    {
        RelatieId = b.RelatieId,
        Naam = b.Naam,
        BTW = b.BTW,
        HoofdAdres = b.Adressen.FirstOrDefault(a => a.AdresTypeId == HoofdadresType)
    });
_relatieOverviews = new ObservableCollection<RelatieOverview>(relatiesOverviewsEnumerable);
I have inherited a code base that uses DTOs in the business layer; these are populated from Entity Framework via a set of mappers. This has some quite serious limitations in terms of querying, so I am working on a new "optimised" querying service.
My first issue is that I need to translate a LINQ query written against my DTO so that it works with my Entity object, but the calling context has no knowledge of the EF entities. Let's assume we can rely on the properties on each object having matching names.
This is where I have got to in terms of stubbing out what I want:
public static List<TDataObject> GetFiltered<TDataObject>(Expression<Func<TDataObject, TDataObject>> projection, Func<TDataObject, bool> filter)
{
    // 1. translate the filter parameter to work with my equivalent Entity object
    // 2. build the EF query with the modified filter expression and also a Select() projection so we only return the properties we need. (this should generate an optimised SQL query under the hood)
    // 3. map the results from the EF query back onto my TDataObject and return (I already have AutoMapper maps in place for this)
    throw new NotImplementedException(); // stub only
}
It is item 1 that I am struggling with, so if anyone has any code examples or blog posts for achieving this I'd appreciate it.
Also if anyone has any alternate suggestions I'd be happy to hear them.
One way to handle this is to build primitives around queries (instead of layers with repositories etc). Here's what we do:
http://lostechies.com/jimmybogard/2013/10/29/put-your-controllers-on-a-diet-gets-and-queries/
The calling code (controller) knows about a query and the result (DTO), but the piece doing the mapping knows exactly about EF Context/NHibernate ISession. Works very well for us and keeps our controllers light and thin.
Alternatively, get rid of your layers, and expose the EF objects directly:
var dtos = context.Employees.Where(e => e.IsActive)
.Project().ToArray<EmployeeShowViewModel>();
Put this in the controller because who cares, layers and abstractions are productivity preventers and time wasters.
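For step 1 in the question (translating a filter written against the DTO into one against the entity), one possible sketch uses an ExpressionVisitor that swaps the lambda parameter and rebinds member access by name. It assumes, as the question does, that property names match, and additionally that property types match. EmployeeDto, EmployeeEntity and TypeSwapVisitor are hypothetical names. Note that the filter has to arrive as an Expression<Func<TDataObject, bool>>; a plain Func<TDataObject, bool> is already compiled and cannot be inspected or translated:

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical DTO and entity with matching property names.
public class EmployeeDto
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public class EmployeeEntity
{
    public string Name { get; set; }
    public int Age { get; set; }
    public string InternalNotes { get; set; }
}

// Rewrites a predicate over TSource into one over TTarget by matching member names.
public class TypeSwapVisitor<TSource, TTarget> : ExpressionVisitor
{
    private readonly ParameterExpression _targetParameter =
        Expression.Parameter(typeof(TTarget), "e");

    public Expression<Func<TTarget, bool>> Translate(Expression<Func<TSource, bool>> predicate)
    {
        var body = Visit(predicate.Body);
        return Expression.Lambda<Func<TTarget, bool>>(body, _targetParameter);
    }

    // Replace the DTO parameter with the entity parameter.
    protected override Expression VisitParameter(ParameterExpression node)
    {
        return node.Type == typeof(TSource) ? _targetParameter : base.VisitParameter(node);
    }

    // Rebind members accessed on the DTO to the same-named member on the entity.
    // Everything else (captured variables, members of other types) is left alone.
    protected override Expression VisitMember(MemberExpression node)
    {
        if (node.Expression != null && node.Expression.Type == typeof(TSource))
        {
            var target = Visit(node.Expression);
            // Throws ArgumentException if the entity has no member with this name.
            return Expression.PropertyOrField(target, node.Member.Name);
        }
        return base.VisitMember(node);
    }
}
```

The translated expression can then be passed to Where() on the entity set. Nested DTOs, renamed properties and type differences would all need extra handling that this sketch does not attempt.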
I'm kind of struggling with whether this is best practice or not.
Let's say I have a repository that returns an IQueryable. Is this correct usage in the controller?
var whatever = ObjectRepository.GetWhatever(id);
var videoId = whatever.UsersInObject1InObjects2.First().Object.Video.ExternalVideoId;
In the second line above, ".Object" and ".Video" are references to tables that are related to the "whatever" table.
Or should I be making another function in a different repository to get the ExternalVideoId?
The way I usually do this is as follows. I create a separate model class that encapsulates the data the controller needs from the database (i.e. as a rule I do not use the ORM class for this). This allows me to annotate the model class members with whatever attributes I might need. In the repository code I query the ORM and then return the model class (or an IQueryable/IEnumerable of the model) as the result.
I've heard the opinion that returning IQueryable from an ORM such as Entity Framework is not advised, because by doing this you risk executing a query against your back end several times if you are not careful with your model in the controller/view. This is because in EF an IQueryable represents a query that has not been executed yet. By returning IEnumerable instead of IQueryable you make sure that query execution is confined to your repository class.
I find, however, that sometimes it's more convenient to return IQueryable. For example, for table paging/sorting scenarios I can apply page number, sort direction, page size, etc. to the IQueryable before executing it. I feel that this logic belongs more to the controller than to the repository. Some may disagree.
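The repeated-execution risk can be illustrated without a database. In this sketch a counter stands in for a round trip to the back end, and all names are hypothetical. It also shows that what actually confines execution to the repository is materializing the result (for example with ToList()) before returning it; a lazy sequence typed as IEnumerable is still re-run on every enumeration:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how often the underlying source is read; stands in for a database round trip.
    public static int SourceReads { get; private set; }

    private static IEnumerable<int> ReadSource()
    {
        SourceReads++; // runs each time enumeration starts, not when the method is called
        yield return 1;
        yield return 2;
        yield return 3;
    }

    // Returns a lazy query: nothing runs until the caller enumerates it,
    // and it runs again on every enumeration.
    public static IEnumerable<int> GetLazy()
    {
        return ReadSource().Where(x => x > 1);
    }

    // Materializes inside the "repository": exactly one read, whatever the caller does.
    public static IReadOnlyList<int> GetMaterialized()
    {
        return ReadSource().Where(x => x > 1).ToList();
    }
}
```

Calling Count() and then ToList() on the result of GetLazy() reads the source twice, whereas the list from GetMaterialized() can be inspected any number of times after its single read.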
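As a rough sketch of that paging/sorting case, the helper below composes ordering, Skip and Take onto a query that has not been executed yet. The Page extension method and its parameter names are made up for illustration; OrderBy, OrderByDescending, Skip and Take are the standard System.Linq.Queryable operators:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class PagingExtensions
{
    // Applies sort direction, a 1-based page number and a page size to a
    // not-yet-executed query. Nothing runs until the caller enumerates the result.
    public static IQueryable<T> Page<T, TKey>(
        this IQueryable<T> query,
        Expression<Func<T, TKey>> sortKey,
        bool descending,
        int pageNumber,
        int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        var ordered = descending
            ? query.OrderByDescending(sortKey)
            : query.OrderBy(sortKey);

        return ordered.Skip((pageNumber - 1) * pageSize).Take(pageSize);
    }
}
```

Because the repository returned an IQueryable, the provider sees the ordering and the Skip/Take as part of the query and can translate them, rather than the application loading every row and paging in memory.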
What you're actually doing is using a collection to do a query.
I think you need to take a closer look at LINQ queries.
You don't want to add custom queries to your repository.
You just want to expose the queryable interface, like so:
public IQueryable<T> Get( IUser identity )
{
    return Context.Set<T>();
}
public IQueryable<IBindable> GetItem( IUser identity )
{
    return Context.Set<T>().Cast<IBindable>();
}
Then you can use LINQ against it.