Repository and query objects pattern. How to implement complex queries

Repository and query objects pattern. How to implement complex queries - c#

I have read a lot of posts of the repository pattern but there are a few practical problems that they doesn't seem to solve or explain. This is what I understand about this two patterns:
The repository and the query pattern are complementary: Query objects represents business logic (WHERE clauses) and the repository pattern has a Get(IPredicate) method that takes a query object and returns a SELECT WHERE result
The repository should not have business logic: All the business logic must go on the query objects
Currently I have a class that wraps each logical object (which almost always is a single entity object) that implements multiple "Get" methods that implement most complex queries (joins, groupBy, etc...), this isn't a good pattern because classes tend to grow a lot because of boilerplate code for similar queries and its public methods are dependent on context that this class will be used, thus, making this classes unusable for multiple projects that depends on the same database, which is my primary goal for this project refactoring.
How queries that are more complex than a single SELECT WHERE are implemented with this two patterns without leaking business logic into the repository?
Or if this business logic doesn't fit into the repository nor the query objects
where does this logic fit?

The Repository pattern works well for standard CRUD applications. Where you need to implement the classic set of create, read, update and delete operations against a single table in a database. In that case you create a repository for each table and allow the read operation to have extra values so that filtering can be applied.
At the next level up you have the Unit of Work pattern. These are used to span multiple repositories and perform business operations. So for example, you would read values from multiple repositories, perform calculations and then write back changes to multiple repositories. All of that would occur inside a transaction so that you always have a consistent state in the database.
The problem is when you have complex queries that span multiple tables. In that case you would place the query into the repository that is the first table in the query from clause. Then you would need to provide parameters to that repository method so it can be parameterised as needed.

There are quite a few implementations of repository patterns and unit of work flying around on the internet. Some of them are quite simple where developer basically implements his own for each table himself manually, some are generic but not advanced, and some are really cool, generic and still offer you the ability to do a decent where, projection and the like.
An example of an in my opinion good implementation can be found here :
https://genericunitofworkandrepositories.codeplex.com/
It is targetting MVC, which is shown by the interface. I am focussing on WPF applications so I needed to tune it a bit. But the ideas of this unit of work implementation are pretty good.
There is down side to this implementation. Because it is relying on some advanced LINQ and EF functionality one could argue that your underlying access layer is infecting the repository layer and the layers using the repositories.
The point being that when for instance you want to move away from EF, chances are that you would have to change the interface of your repositories.
To show the power of this library some code snippets to prove this :
_fotoRepository = unitOfWork.RepositoryAsync<Foto>();
var fotos = await _fotoRepository
.Query(r => r.BestelBonId == bestelBonId || werkstukids.Contains(r.WerkstukMetBewerkingenId.Value))
.SelectAsync()
.ConfigureAwait(false);
or using projection:
IRepository<Relatie> relatieRepository = unitOfWork.RepositoryAsync<Relatie>();
var relatiesOverviewsEnumerable = relatieRepository
.Query()
.NoTracking()
.OrderBy(q => q.OrderBy(d => d.RelatieId))
.Select(b => new RelatieOverview
{
RelatieId = b.RelatieId,
Naam = b.Naam,
BTW = b.BTW,
HoofdAdres = b.Adressen.FirstOrDefault(a => a.AdresTypeId == HoofdadresType)
});
_relatieOverviews = new ObservableCollection<RelatieOverview>(relatiesOverviewsEnumerable);

Related

Generic Repository pattern with ThenIncludes

My solution has several layers, one of them is the DataAccess layer where I've implemented the repository pattern.
My main focus is that EntityFramework is only referenced in the DataAccess layer.
I had the need to include relations on my queries so I adapted my querie methods to receive the Includes as an input.
/// <inheritdoc/>
public IQueryable<T> AsQueryable(params Expression<Func<T, object>>[] includes)
{
var query = _dbSet.AsQueryable();
if (includes != null)
{
query = includes.Aggregate(query,
(current, include) => current.Include(include));
}
return query;
}
Example use:
// Books :: ICollection<Book>
var query = _repository.AsQueryable(e => e.Books);
Using the example above how can I, for example, include the Book -> Author relation in the query? The property Books is a collection therefore I cannot reference the Author property.
Example: .AsQueryable(e => e.Books, e => Books.Author) OR .AsQueryable(e => e.Books.ChildInclude(b => b.Author))

My main focus is that EntityFramework is only referenced in the DataAccess layer.
The only safe way to accomplish this is that I can recommend is that you define a boundary between your code that must be isolated from knowledge of EF and the code that can be aware of EF. This means that only materialized DTOs or non-entity models travel across this boundary. This would generally be applicable in cases where you want multiple consumers to be accessing data in an isolated and identical way. (I.e. a web site + API) Even then this imposes trade-offs for flexibility and performance. That boundary normally wouldn't be the repository, but a service which can be aware of EF, manage the DbContext scope, (via Unit of Work, or managing when SaveChanges() is called, etc.) access a Repository that leverages IQueryable<TEntity>, then projects results into materialized List<TDTO> or TDO instances for the consumers.
Passing entities outside of the scope of the DBContext they are tracked by leads to all kinds of complexity and problems within systems. This means designing a separation layer that returns DTOs or IEnumerable<TDTO> rather than IQueryable<TEntity> or even IEnumerable<TEntity>. Any code that accepts entities should always have a complete entity graph, or a complete-able entity graph. (Lazy load-able) Bug conditions are ripe when functions accepting entity graphs might, or might not get references or all properties populated, and guessing whether some fetched or constructed entity is "complete enough" to pass to an existing method.
If this abstraction is merely "highly desired" to satisfy a personal preference, or some uncertain concern about a future requirement that might require you to replace EF with some other mechanism, my advice would be simply "Don't". By implementing patterns to abstract code from EF you are imposing absolute performance constraints and usually significant complexity into your system for no immediate benefit. Leveraging IQueryable allows your code to build efficient and performant queries against the data layer. Materializing objects to pass back to a consuming layer will mean a lot of either very similar code, or very complex code to handle filtering, eager loading, projection, sorting, and pagination, and generally return far more data than a more direct method would return.
Eager loading can be facilitated using magic strings, or expressions. Filtering and sorting can be facilitated by expressions as well. However, a big caveat of developing complex expression-based abstractions is that while this may isolate calling code from EF references, it still doesn't isolate that code from EF-specific rules. For instance those expressions need knowledge of and obey EF rules such as not calling functions or anything that ultimately cannot be converted down to SQL.
Abstractions like this often lead to performance issues that reinforce the desire to later replace EF with another ORM or data access method due to performance issues. It becomes a self-fulfilling prophecy.

Entity Framework dependencies loading

I have a long-time burning question about how to avoid null errors with data queried via Entity Framework (version 6 - not Core yet, sadly).
Let's say you have a table Employees, and it has a relationship with another table, EmployeePayments (one employee has many employee payments).
On your Employee domain object you create a property TotalPayments which relies on you having loaded the EmployeePayments for that object.
I try to ensure that any time I do a query, I "include" the dependency, for example:
var employees = context.Employees.Include(e => e.EmployeePayments);
The problem is, I have a lot of queries around the place (I use the generic repository pattern, so I call repository functions like GetAll or GetSingle from my service library), and so that's a lot of places to remember to add the includes. If I don't include them, I run the risk of having a null exception if the TotalPayments property is used.
What's the best way to handle this?
Note 1: we have a lot of tables and I don't really want to have to revert to using specific repositories for each one, we take advantage of the generic repository in a lot of ways.... but I will be happy to hear strong arguments for the alternative :)
Note 2: I do not have lazy loading turned on, and don't plan on turning it on, for performance reasons.

This is one reason I consider the Generic Repository an anti-pattern for EF. I use a repository pattern, but scope it like I would a controller. I.e. a CreateOrderController would have a CreateOrderRepository. This repository would provide access to all relevant entities via IQueryable. Common stuff like lookups etc. would have their own secondary repository. Using generic repositories that are geared to working with a single entity type mean adding references to several repositories to do specific things and running into issues like this when attempting to load entities. Sometimes you want related data, sometimes you don't. Simply adding convenient methods in top level entities effectively "breaks" that an object should always be considered complete or complete-able without relying on lazy-loading which brings significant performance costs.
Having repositories return IQueryable avoids many of the problems by giving control to the calling code how entities are consumed. For instance I don't put helper methods in the entities, but rather code needing to populate a view model relies on Linq to build the view model. If my view model wants a sum of payments for an employee, then my repository returning IQueryable can do the following:
public IQueryable<Employee> GetEmployeeById(int employeeId)
{
return Context.Employees.Where(x => x.EmployeeId == employeeId);
}
then in the controller / service:
using (var contextScope = ContextScopeFactory.Create())
{
var employeeViewModel = EmployeeRepository.GetEmployeeById(employeeId)
.Select(x => new EmployeeSummaryViewModel
{
EmployeeId = x.EmployeeId,
EmployeeName = x.LastName + ", " + x.FirstName,
TotalPayments = x.Payments.Where(p => p.IsActive).Sum(p => p.Amount)
}).Single();
}
I use a repository because it is easier to mock out than the DbContext and it's DbSets. For Synchronous code I just have the mock to populate and return List<Employee>().AsQueryable(). For Async code I need to add a wrapper for an Async List.
This pattern may go against more traditional views of a repository and separation of concerns that the calling code needs to "know" about the entities, that EF-isms are leaked. However, no matter what approach you try to rationalize to get around the inefficiencies of trying to "hide" EF behind a repository, either you will be left with very inefficient code where repositories return pre-populated DTOs or contain dozens of near identical methods to return different DTOs (or worse, entities in various degrees of completeness) or you are adding complexities like passing in magic strings or expression trees into your methods to tell EF how to filter, how to sort, what to include, paging, etc. Passing in expressions or strings requires the calling code to "know" about the entities and leaks EF restrictions. (Passed in expressions / strings still have to be able to be ultimately understood by EF)
So this may not be a viable answer to the current state of your project, but it might be worth looking into whether your dependency on the repositories can be better managed without splitting them with the Generic pattern, and/or leveraging EF's excellent IQueryable / Linq capabilities to let your controllers/services project the entities into view models / DTOs rather than embedding these reduce elements in the entities themselves.

The responsibilities of my service and repository layer

The other day I asked this question:
Should the repository layer return data-transfer-objects (DTO)?
The answer (well by just one person, but I already had a hunch that it wasn't a good idea) was that no, the repository later should not have to deal with the DTO objects (their purpose is purely to be sent over the wire) and the service layer should deal with that.
Now I've come up with a construction in the meantime that I need your opinion on. The idea is that, when it makes sense to do so, the repository layer can return an interface type I've defined called IProjectable. This wraps the query (the repository layer does not execute the query yet) but does not allow the consumer to change the query (it's not IQueryable), just to perform projection operations on it (so far for me only First and ToPagedList) that would perform the projection and actually execute the query.
So something like this in the repository:
public IProjectable<User> GetUser(int id)
{
var query = from u in Set<User>()
where u.UserID == id
select u;
return query.AsProjectable();
}
And in the service layer something like this:
var dto = repository.GetUser(16).Single(u => new SimpleUserDto
{
FullName = u.FirstName + " " + u.LastName,
DisplayAddress = u.Address.Street + u.Address.HouseNumber,
OrderCount = u.Orders.Count()
});
return dto;
Am I correct in saying that doing the actual data access here is still the responsibility of the repository layer (as it should be) and that the projection to a serializable form is the responsibility of the service layer (as it should be)?
The only other way I see to do this efficiently (returning a User from the repository and doing the Count() on his Orders in the service layer would result in an extra query to the database) is to define a type that has all these properties and return it from the repository layer and just don't call it "Dto", which seems silly as it would be identical to the DTO just not named the same for the sake of "purity". This way, it seems, I can have my cake and eat it for the most part too.
The downside I see is that you can get a mismatch where the service layer performs projections that can't actually be translated to SQL which it shouldn't have to worry about, or where it performs such complex projections that makes it questionable what layer is doing the actual data access.
I'm using Entity Framework 4 by the way, if it matters.

Am I correct in saying that doing the
actual data access here is still the
responsibility of the repository layer
(as it should be) and that the
projection to a serializable form is
the responsibility of the service
layer (as it should be)?
Yes you are, the service layer still has no idea how the actual DataAccess is being performed (as it should not have not). Are the calls send to SQL? is there a caching layer in between?
The downside I see is that you can get
a mismatch where the service layer
performs projections that can't
actually be translated to SQL which it
shouldn't have to worry about, or
where it performs such complex
projections that makes it questionable
what layer is doing the actual data
access.
For this problem i use a pipeline pattern which basicly is just a set of extension methods over IProjectable which can perform tested projections. Next, in your serviceLayer you can just write your query using a composition of these pipeline methods, for example:
var users = repository.GetUsers().FilterByName("Polity").OrderByAge().ToTransferObjects();

One of developers I most respect ayende (http://ayende.com/Blog/Default.aspx) said: "ORM is your repository" video here -> http://community.devexpress.com/blogs/seth/archive/2011/03/09/interview-with-ayende-rahien-aka-oren-eini.aspx
Question is do you really need Repository pattern?
Just my opinion :)

Does Queryability and Lazy Loading in C# blur the lines of Data Access vs Business Logic?

I am experiencing a mid-career philosophical architectural crisis. I see the very clear lines between what is considered client code (UI, Web Services, MVC, MVP, etc) and the Service Layer. The lines from the Service layer back, though, are getting more blurred by the minute. And it all started with the ability to query code with Linq and the concept of Lazy loading.
I have created a Business Layer that consists of Contracts and Implementations. The Implementations then could have dependencies to other Contracts and so on. This is handled via an IoC Container with DI. There is one service that handles the DataAccess and all it does is return a UnitOfWork. This UnitOfWork creates a transaction when extantiated and commits the data on the Commit method. [View this Article (Testability and Entity Framework 4.0)]:
public interface IUnitOfWork : IDisposable {
IRepository<T> GetRepository<T>() where T : class;
void Commit();
}
The Repository is generic and works against two implementations (EF4 and an InMemory DataStore). T is made up of POCOs that get generated from the database schema or the EF4 mappings. Testability is built into the Repository design. We can leverage the in-memory implementation to assert results with expectations.
public interface IRepository<T> where T : class {
IQueryable<T> Table { get; }
void Add(T entity);
void Remove(T entity);
}
While the Data Source is abstracted, IQueryable still gives me the ability to create queries anywhere I want within the Business logic. Here is an example.
public interface IFoo {
Bar[] GetAll();
}
public class FooImpl : IFoo {
IDataAccess _dataAccess;
public FooImpl(IDataAccess dataAccess) {
_dataAccess = dataAccess;
}
public Bar[] GetAll() {
Bar[] output;
using (var work = _dataAccess.DoWork()) {
output = work.GetRepository<Bar>().Table.ToArray();
}
return output;
}
}
Now you can see how the queries could get even more complex as you perform joins with complex filters.
Therefore, my questions are:
Does it matter that there is no clear distinction between BLL and the DAL?.
Is queryability considered data access or business logic when behind a Repository layer that acts like an InMemory abstraction?
Addition: The more I think about it, maybe the second question was the only one that should have been asked.

I think the best way to answer your questions is to step back a moment and consider why separation between business logic layers and data access layers is the recommended practice.
In my mind, the reasons are simple: keep the business logic separate from the data layer because the business logic is where the value is, the data layer and business logic will need to change over time more or less independently of each other, and and the business logic needs to be readable without having to have detailed knowledge of what all the data access layer does.
So the litmus test for your query gymnastics boils down to this:
Can you make a change to the data schema in your system without upsetting a significant portion of the business logic?
Is your business logic readable to you and to other C# developers?

1. Only if you care more about philosophy than getting stuff done. :)
2. I'd say it's business logic because you have an abstraction in between. I would call that repository layer part of DAL, and anything that uses it, BL.
But yeah, this is blurry to me as well. I don't think it matters though. The point of using patterns like this is to write a clean, usable code that easy to communicate at the same time, and that goal is accomplished either way.

1.Does it matter that there is no clear distinction between BLL and the DAL?.
It sure does matter! Any programmer that uses your Table property needs to understand the ramifications (database roundtrip, query translation, object tracking). That goes for programmers reading the business logic classes as well.
2.Is queryability considered data access or business logic when behind a Repository layer that acts like an InMemory abstraction?
Abstraction is a blanket that we hide our problems under.
If your abstraction is perfect, then the queries could be abstractly considered as operating against in-memory collections and therefore they are not data access.
However, abstractions leak. If you want queries that make sense in the data world, there must be effort to work above and beyond the abstraction. That extra effort (which defeats abstraction) produces data access code.
Some examples:
output = work.GetRepository<Bar>().Table.ToArray();
This is code is (abstractly) fine. But in the data world it results in scanning an entire table and is (at least generally) dumb!
badquery = work.GetRepository<Customer>().Table.Where(c => c.Name.Contains("Bob")).ToArray();
goodquery = work.GetRepository<Customer>().Table.Where(c => c.Name.StartsWith("Bob")).ToArray();
Goodquery is better than bad query when there's an index on Customer.Name. But that fact is not available to us unless we lift the abstraction.
badquery = work.GetRepository<Customer>().Table
.GroupBy(c => c.Orders.Count())
.Select(g => new
{
TheCount = g.Key,
TheCustomers = g.ToList()
}).ToArray();
goodquery = work.GetRepository<Customer>().Table
.Select(c => new {Customer = c, theCount = c.Orders.Count())
.ToArray()
.GroupBy(x => x.theCount)
.Select(g => new
{
TheCount = g.Key,
TheCustomers = g.Select(x => x.Customer).ToList()
})
.ToArray();
goodquery is better than bad query since badquery will requery the database by group key, for each group (and worse, it is highly unlikely there is an index to help with filtering customers by c.Orders.Count() ).
Testability is built into the Repository design. We can leverage the InMemory implementation to assert results with expectations.
Be under no illusions that your queries are being tested if you actually run them against in-memory collections. Those queries are untestable unless a database is involved.

Should a Repository be responsible for "flattening" a domain?

Disclaimer: I'm pretty new to DDD and its associated terminology, so if i'm mislabeling any concepts, please correct me.
I'm currently working on a site with a relatively simple domain model (Catalog items, each of which stores a collection of CatalogImage items.)
My repository follows the standard interface of FindbyID(int ID) GetAll() etc...
The problem arises when trying to find a particular image by its ID; I end up with methods such as FindImagebyID(int CatalogItemID, int ImgID)
As new requirments develop, and the object graph becomes more heavily nested, I could see an explosion of methods such as Find{NestedType}ByID(int catalogItemID,.....,int nestedTypeID)
Should I simply be returning an IEnumerable from the FindAll() method, and using Linq in a higher layer to form these queries? Or will that be a violation of SoC?

It sounds to me like you have a justification for building multiple repositories.
Example
interface CatalogRepository
{
Catalog FindByID(int ID);
}
interface CatalogImageRepository
{
CatalogImage FindByID(int ID);
}
This will properly separate out your concerns, since each repository is only responsible for knowing how to deal with that specific entity.

I would filter the model at a layer above the repository, with LINQ if you like. Makes the repository simple. If you are using LINQ to get the data from the database this method works very well, if you are having to use ADO or some other legacy data access layer than it might make it more difficult to make the repository so simple. Linq makes it easy so that you can have the repository return IQueryable and let the next layer add the filtering and the actual retrieval of data does not happen until it is asked for. This makes it possible to have a method on the repository like GetImages() that gets all images, and the next layer adds the filtering for a specific image. If you are using ADO, you are probably not going to want to bring back all images then filter....so could be a trade off.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.