I'm a little familiar with Entity Framework from some simple projects, but now I want to go deeper and write better code.
There are plenty of topics discussing whether or not to use static methods in a DAL. For the moment I side with the people who think static methods are fine.
But I'm still wondering whether some practices are good or not.
A lot of people do it like this:
public IList<Person> GetAll()
{
    using (var dbContext = new MyDbContext())
    {
        return dbContext.Persons.ToList();
    }
}
But I'm wondering whether doing it like this is a good practice:
public static IQueryable<Person> GetAll()
{
    var dbContext = new MyDbContext();
    return dbContext.Persons;
}
The goal is to use only static methods in a static class, which I think is legitimate because this class is just a DAL and will never hold any state. I also need to do it this way, instead of using a using() scope, to avoid disposing the context, since this method returns an IQueryable.
I'm sure some people are already thinking "OMG no, your context will never be disposed", so please read this article: http://blog.jongallant.com/2012/10/do-i-have-to-call-dispose-on-dbcontext.html
I tried it myself, and yes, the context is disposed only once I no longer need it.
I repeat: the goal here is to use static methods, so I can't use a dbContext field that the constructor instantiates.
So why do people always use the using() scope?
Is it a bad practice to do it the way I would like to?
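To make the question concrete, here is a self-contained sketch (plain LINQ-to-Objects, with a hypothetical stand-in for a DbContext that refuses reads after Dispose()) of how deferred execution interacts with the using() scope: the query only runs when the caller enumerates it, so disposing first can break the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a DbContext: a disposable source that can
// no longer be read once Dispose() has been called.
class FakeContext : IDisposable
{
    private readonly List<int> data = new List<int> { 1, 2, 3 };
    public bool Disposed { get; private set; }

    public IEnumerable<int> Items
    {
        get
        {
            foreach (var item in data)
            {
                if (Disposed) throw new ObjectDisposedException(nameof(FakeContext));
                yield return item;
            }
        }
    }

    public void Dispose() { Disposed = true; }
}

class Program
{
    // Mirrors the second pattern: the query is returned, not executed.
    static IEnumerable<int> GetAll(FakeContext ctx) { return ctx.Items.Where(i => i > 1); }

    static void Main()
    {
        IEnumerable<int> query;
        using (var ctx = new FakeContext())
        {
            query = GetAll(ctx);
        } // the context is disposed here, before the query has run

        try
        {
            Console.WriteLine(query.ToList().Count); // deferred execution hits the disposed source
        }
        catch (ObjectDisposedException)
        {
            Console.WriteLine("ObjectDisposedException");
        }
    }
}
```

Whether EF tolerates this depends on version and configuration, which is exactly why the linked article's result surprises people; the using() idiom removes the ambiguity.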
Another bonus question: where is the [NotMapped] attribute in EF6? I've checked both System.ComponentModel.DataAnnotations and System.ComponentModel.DataAnnotations.Schema but can't find it; the attribute is not recognized by the compiler.
Thanks for your answers.
Following the Repository pattern, an IQueryable<T> should never be returned anyway.
Repository pattern, done right
Besides, your repositories depend on your DbContext. Let's say you have to work on customers in an accounting system.
Customer
public class Customer {
    public int Id { get; protected set; }
    public string GivenName { get; set; }
    public string Surname { get; set; }
    public string Address { get; set; }
}
CustomerRepository
public class CustomerRepository {
    public CustomerRepository(DbContext context) {
        if (context == null) throw new ArgumentNullException("context");
        this.context = context;
    }

    public IList<Customer> GetAll() { return context.Set<Customer>().ToList(); }

    public IList<Invoice> GetInvoicesFor(Customer customer) {
        return context.Set<Invoice>()
            .Where(invoice => invoice.Customer.Id == customer.Id)
            .ToList();
    }

    private readonly DbContext context;
}
So in fact, to answer your question more concisely and precisely: I think neither approach is good. I would rather use a DbContext per business concern. When you enter, say, the Customers Management features, instantiate a single DbContext that is shared across all the repositories involved, then dispose of that DbContext once you exit this set of features. This way you don't have to use using statements, and your contexts are still managed adequately.
Here's another short and simple good reference for the Repository pattern:
Repository (Martin Fowler)
In response to comments from the OP
But actually the point is I don't want to follow the repository pattern. People say "what if your data source changes?" I want to answer: what if it never changes? What's the point of having such a powerful class but not using it, just in case the database provider may change one day?
Actually, the Repository pattern doesn't only serve the purpose of making a data source change easier; it also encourages better separation of concerns and a more functional approach closer to the business domain, since the members of a repository all revolve around business terminology.
For sure the repository itself cannot take control of disposing the data context (or whatever object it uses to access the underlying data source), since it does not own that object; it is only lent to it so that it can fulfill its tasks.
As for your point about whether the data source will ever change: no one can predict it. It is most likely never to change in most of the systems I have worked on. A database change is more likely to be seen ten years after the initial development, for modernization purposes. That day, however, you'll understand how the Repository pattern saves you time and headaches compared to tightly coupled code. I work with tightly coupled code in legacy systems, and I appreciate the benefits firsthand. Prevention is better than cure.
But please let's focus on instantiating the dbContext in methods without the using() statement. Is it really bad? I mean, when we inject the context into the constructor we don't handle the Dispose() either; we let Entity Framework do it, and it manages it pretty well.
No, it isn't necessarily bad not to use using statements, as long as you dispose of all resources as soon as they are no longer used. The using statement serves this purpose by doing it automatically, instead of you having to take care of it yourself.
As for the Repository pattern: it can't dispose of the context that is passed to it, nor should the context be disposed there, because the context is scoped to a certain matter and is used across other features within a given business context.
Let's say you have Customer management features. Within them, you might also need the invoices for a customer, along with the transaction history. A single data context should be used for all of that data access as long as the user works within the customer management business context. You would then have one DbContext injected into your Customer management feature, and that same DbContext is shared across all the repositories used to access your data source.
Once the user exits the Customer management functionality, the DbContext should be disposed of accordingly, as it may otherwise cause memory leaks. It is false to believe that everything gets garbage collected as soon as it is no longer used. You never know how the .NET Framework manages its resources or how long it will take to dispose of your DbContext. You only know that it might get disposed somehow, someday.
If the DbContext got disposed immediately after each data access, you would have to instantiate a new instance every time you need to access the underlying data source. It's a matter of common sense: define the context under which the DbContext shall be used, share it across the identified resources, and dispose of it as soon as it is no longer needed. Otherwise, it could cause memory leaks and other such problems.
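A runnable sketch of that lifetime (with plain stand-ins instead of the real EF types; the repository names are just for illustration):

```csharp
using System;

// Stand-in for a DbContext, so the lifetime idea runs on its own.
class DataContext : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() { Disposed = true; }
}

class CustomerRepository
{
    private readonly DataContext context;
    public CustomerRepository(DataContext context) { this.context = context; }
    public bool UsesLiveContext { get { return !context.Disposed; } }
}

class InvoiceRepository
{
    private readonly DataContext context;
    public InvoiceRepository(DataContext context) { this.context = context; }
    public bool UsesLiveContext { get { return !context.Disposed; } }
}

class Program
{
    static void Main()
    {
        // One context per business concern, shared by every repository it needs.
        using (var context = new DataContext())
        {
            var customers = new CustomerRepository(context);
            var invoices = new InvoiceRepository(context);
            Console.WriteLine(customers.UsesLiveContext && invoices.UsesLiveContext);
        } // leaving the feature disposes the single shared context
    }
}
```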
In response to comment by Mick
I would go further and suggest you should always return IQueryable, to enable you to reuse that result by passing it into other calls on your repositories. Sorry, but your argument makes absolutely no sense to me. Repositories are not meant to be stand-alone one-stop shops; they should be used to break logic up into small, understandable, encapsulated, easily maintained chunks.
I disagree with always returning IQueryable<T> from a repository; otherwise, what good is it to have multiple methods? To retrieve the data within your repository, one could simply do:
public class Repository<T> where T : class {
    public Repository(DbContext dataContext) { context = dataContext; }

    public IQueryable<T> GetAll() { return context.Set<T>(); }

    private readonly DbContext context;
}
and place predicates everywhere in your code to filter the data as each view needs. When it is time to change the filter criteria, you'll have to browse all of your code to make sure no one used a filter that was actually unexpected and might cause the system to misbehave.
On a side note, I do understand your point, and I might admit that for some of the reasons described in your comment it might be useful to return IQueryable<T>. Still, I wonder what good it is, since a repository's responsibility is to provide a class with everything it needs to get its data. If one needs to pass IQueryable<T>s along to another repository, it sounds to me as if every possible way to retrieve the data hadn't been fully investigated. If one needs some data to process another query, lazy loading can do it, with no need to return IQueryables. As its name says, an IQueryable is made to perform queries, and it is the repository's responsibility to access the data, that is, to perform the queries.
I ran into a major architectural problem.
CONTEXT
I'm trying to build an ASP.NET Core microservice application that implements the strategy pattern.
The application communicates with other microservices.
I have a main entity that aggregates all the information I need to work with; let's call it "MainContext". The goal is that this entity is loaded and built only once (as we need to get that information from other microservices) and then processed throughout the whole application.
public class MainContext
{
    public DeterminerAttribute Attribute { get; set; }
    public OtherContextA ContextA { get; set; }
    public OtherContextB ContextB { get; set; }
}
As you can see, the MainContext aggregates other contexts. These 'OtherContexts' are base classes that have their own child classes. They are somewhat different and have different types and numbers of fields.
The application builds the MainContext in one separate place. The process looks something like this:
We get a specific attribute from another microservice and use this attribute as the determiner in a switch expression. The attribute is also saved in the MainContext.
In the switch expression we load specific implementations of the OtherContextA and OtherContextB classes and wrap them up in their base classes. This step is important, as I don't want to ask other services for information that I don't need.
The method returns MainContext with all information loaded, ready to use.
Then, I use strategy pattern, because different contexts require different treatment.
THE PROBLEM
The strategies share the same interface, and thus should implement the same methods with the same signature. In my case, there is only one method, which looks something like this:
public class SomeStrategyToProcessContext : StrategyInterface
{
    public async Task ProcessContext(MainContext mainContext, ...);
}
Now, in the strategies I want to work with the concrete implementations of the contexts. This makes sense because I KNOW, as the programmer who made this mess, that the strategies are chosen based on the same attribute that I used to load the contexts, and therefore they should work with the concrete implementations, as I need the data stored in them. But this:
var concreteContext = (OtherConcreteContextA) mainContext.ContextA;
is considered a bad practice, AFAIK.
Obviously, the base classes hold only basic, unspecific data. In the strategy classes, I want to provide access only to the NEEDED data, no more, no less.
My question is: is there any safe and sustainable way of implementing this within the OOP (or another) paradigm? I want to avoid the casting, as it breaks the abstraction and contradicts every programming principle I've learned. Any advice, even if it's harsh and/or suggests changing the whole architecture, is as good as gold. Thanks!
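For illustration, here is a compilable sketch of one direction I've considered: keep the strategy interface non-generic, but confine the single cast to one generic base class, so each concrete strategy only ever sees its concrete context type. All type names here are simplified stand-ins for the real ones.

```csharp
using System;

class OtherContextA { }
class ConcreteContextA : OtherContextA { public int NeededValue = 42; }

interface IStrategy
{
    int Process(OtherContextA context);
}

// The one place where the downcast happens; every concrete strategy
// derived from this base works with its concrete context type only.
abstract class Strategy<TContext> : IStrategy where TContext : OtherContextA
{
    public int Process(OtherContextA context) { return ProcessCore((TContext)context); }
    protected abstract int ProcessCore(TContext context);
}

class ConcreteStrategyA : Strategy<ConcreteContextA>
{
    protected override int ProcessCore(ConcreteContextA context)
    {
        return context.NeededValue; // only the NEEDED data is visible here
    }
}

class Program
{
    static void Main()
    {
        // Chosen by the same attribute switch that loaded the concrete context,
        // so the pairing is enforced in exactly one place.
        IStrategy strategy = new ConcreteStrategyA();
        Console.WriteLine(strategy.Process(new ConcreteContextA()));
    }
}
```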
Let’s say I have some DDD service that requires some IEnumerable<Foo> to perform some calculations. I came up with two designs:
Abstract the data access with an IFooRepository interface, which is quite typical
public class FooService
{
    private readonly IFooRepository _fooRepository;

    public FooService(IFooRepository fooRepository)
        => _fooRepository = fooRepository;

    public int Calculate()
    {
        var fooModels = _fooRepository.GetAll();
        return fooModels.Sum(f => f.Bar);
    }
}
Do not rely on the IFooRepository abstraction and inject IEnumerable<Foo> directly
public class FooService
{
    private readonly IEnumerable<Foo> _foos;

    public FooService(IEnumerable<Foo> foos)
        => _foos = foos;

    public int Calculate()
        => _foos.Sum(f => f.Bar);
}
This second design seems better to me, as FooService no longer cares where the data comes from, and Calculate becomes pure domain logic (ignoring the fact that the IEnumerable may come from an impure source).
Another argument for the second design is that when IFooRepository performs asynchronous IO over the network, it will usually be desirable to use async/await, like:
public class AsyncDbFooRepository : IFooRepository
{
    public async Task<IEnumerable<Foo>> GetAll()
    {
        // Asynchronously fetch results from database
    }
}
But as you need to go async all the way down, FooService is now forced to change its signature to async Task<int> Calculate(). This seems to violate the dependency inversion principle.
However, there are also issues with the second design. First of all, you have to rely on the DI container (using Simple Injector as an example here) or the composition root to resolve the data access code, like:
public class CompositionRoot
{
    public void ComposeDependencies()
    {
        container.Register<IFooRepository, AsyncDbFooRepository>(Lifestyle.Scoped);
        // Not sure if the syntax is right, but it demonstrates the concept
        container.Register<FooService>(async () => new FooService(await GetFoos(container)));
    }

    private async Task<IEnumerable<Foo>> GetFoos(Container container)
    {
        var fooRepository = container.GetInstance<IFooRepository>();
        return await fooRepository.GetAll();
    }
}
Also in my specific scenario, AsyncDbFooRepository requires some sort of runtime parameter to construct, and that means you need an abstract factory to construct AsyncDbFooRepository.
With the abstract factory, now I have to manage the life cycles of all dependencies under AsyncDbFooRepository (the object graph under AsyncDbFooRepository is not trivial). I have a hunch that I am using DI incorrectly if I opt for the second design.
In summary, my questions are:
Am I using DI incorrectly in my second design?
How can I compose my dependencies satisfactorily for my second design?
One aspect of async/await is that, by definition, it needs to be applied "all the way down", as you rightfully state. You can't, however, prevent the use of Task<T> by injecting an IEnumerable<T>, as you suggest in your second option. You would have to inject a Task<IEnumerable<T>> into constructors to ensure data is retrieved asynchronously. Injecting an IEnumerable<T> means either that your thread gets blocked when the collection is enumerated, or that all data must be loaded during object graph construction.
Loading data during object graph construction, however, is problematic, for the reasons I explained here. Besides that, since we're dealing with collections of data, all of the data would have to be fetched from the database on each request, even though not all of it might be required or even used. This can cause quite a performance penalty.
Am I using DI incorrectly in my second design?
That's hard to say. An IEnumerable<T> is a stream, so you could consider it a factory, which means that injecting an IEnumerable<T> does not require the runtime data to be loaded during object construction. As long as that condition is met, injecting an IEnumerable<T> can be fine, but it still makes it impossible to make the system asynchronous.
However, when injecting an IEnumerable<T> you might end up with ambiguity, because it might not be very clear what it means to inject an IEnumerable<T>. Is the collection a stream that is lazily evaluated or not? Does it contain all elements of T? Is T runtime data or a service?
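The "stream" point can be shown with a tiny stand-alone sketch (an iterator stands in for the database call): construction does not touch the data; only enumeration does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class FooService
{
    private readonly IEnumerable<int> foos;
    public FooService(IEnumerable<int> foos) { this.foos = foos; }
    public int Sum() { return foos.Sum(); }
}

class Program
{
    static bool loaded = false;

    // Lazily evaluated; stands in for the actual data access.
    static IEnumerable<int> LoadFoos()
    {
        loaded = true;
        yield return 1;
        yield return 2;
    }

    static void Main()
    {
        var service = new FooService(LoadFoos()); // nothing loaded yet
        Console.WriteLine(loaded);                // construction did no IO
        Console.WriteLine(service.Sum());         // loading happened here
        Console.WriteLine(loaded);
    }
}
```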
To prevent this confusion, moving the loading of this runtime information behind an abstraction is typically the best thing to do. To make your life easier, you could make the repository abstraction generic as well:
public interface IRepository<T> where T : Entity
{
    Task<IEnumerable<T>> GetAll();
}
This allows you to have one generic implementation and make one single registration for all entities in the system.
How can I compose my dependencies satisfactorily for my second design?
You can't. To be able to do this, your DI container would have to resolve object graphs asynchronously. For instance, it would require the following API:
Task<T> GetInstanceAsync<T>()
But Simple Injector doesn't have such an API, and neither does any other existing DI container, and that's for good reason. The reason is that object construction must be simple, fast and reliable, and you lose that when doing I/O during object graph construction.
So not only is your second design undesirable, it is impossible to achieve when data is loaded during object construction without breaking the asynchronicity of the system and causing threads to block while using a DI container.
I try as much as possible (until now I've succeeded every time) not to inject any service that does IO into my domain models, as I like to keep them pure, with no side effects.
That being said, the second solution seems better, but there is a problem with the signature of the method public int Calculate(): it uses some hidden data to perform the calculation, so it is not explicit. In cases like this I like to pass the transient input data directly to the method as a parameter, like this:
public int Calculate(IEnumerable<Foo> foos)
In this way it is very clear what the method needs and what it returns (based on the combination of class name and method name).
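A minimal runnable version of this shape (Foo reduced to just the Bar the calculation needs):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Foo { public int Bar { get; set; } }

class FooService
{
    // The transient data is an explicit parameter, so the method's needs
    // are visible in its signature and the calculation stays pure.
    public int Calculate(IEnumerable<Foo> foos)
    {
        return foos.Sum(f => f.Bar);
    }
}

class Program
{
    static void Main()
    {
        var service = new FooService();
        var foos = new[] { new Foo { Bar = 1 }, new Foo { Bar = 2 } };
        Console.WriteLine(service.Calculate(foos));
    }
}
```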
What I have:
public interface IRepository
{
    IDisposable CreateConnection();
    User GetUser();
    // other methods, doesn't matter
}
public class Repository : IRepository
{
    private SqlConnection _connection;

    public IDisposable CreateConnection()
    {
        _connection = new SqlConnection();
        _connection.Open();
        return _connection;
    }

    public User GetUser()
    {
        // using _connection, gets the User from the database
        // assumes _connection is not null and open
    }

    // other methods, doesn't matter
}
This enables classes that use IRepository to be easily testable and IoC-container friendly. However, someone using this class has to call CreateConnection before calling any method that gets something from the database; otherwise an exception will be thrown. This in itself is kind of good - we don't want long-lasting connections in the application. So I use this class like this:
using (_repository.CreateConnection())
{
    var user = _repository.GetUser();
    // do something with user
}
Unfortunately this is not a very good solution, because people using this class (including me!) often forget to call _repository.CreateConnection() before calling methods that get something from the database.
To resolve this I was looking at Mark Seemann's blog post SUT Double, where he implements the Repository pattern the correct way. Unfortunately, he makes the Repository implement IDisposable, which means I cannot simply inject it via IoC/DI into classes and use it afterwards, because after just one usage it will be disposed. He uses it once per request and relies on ASP.NET Web API capabilities to dispose of it after request processing is done. This is something I cannot do, because I have class instances that use the Repository working all the time.
What is the best possible solution here? Should I use some kind of factory that gives me a disposable IRepository? Will it be easily testable then?
There are a few problematic spots in your design. First of all, your IRepository interface mixes multiple levels of abstraction. Getting a user is a much higher-level concept than connection management. By placing these behaviours together you are breaking the Single Responsibility Principle, which dictates that a class should have only one responsibility, one reason to change. You are also violating the Interface Segregation Principle, which pushes us toward narrow role interfaces.
On top of that, the CreateConnection() and GetUser() methods are temporally coupled. Temporal coupling is a code smell, and you are already witnessing it being a problem, because you are able to forget the call to CreateConnection.
Besides this, the creation of the connection is something you will start to see in every repository in the system, and every piece of business logic will need to either create a connection or get an existing one from the outside. This becomes unmaintainable in the long run. Connection management is a cross-cutting concern; you don't want the business logic to be involved in such a low-level concern.
You should start by splitting the IRepository into two different interfaces:
public interface IRepository
{
    User GetUser();
}

public interface IConnectionFactory
{
    IDisposable CreateConnection();
}
Instead of letting the business logic manage the connection itself, you can manage the transaction at a higher level. This could be the request, but that might be too coarse-grained. What you need is to start the transaction somewhere between the presentation layer code and the business layer code, but without having to duplicate yourself. In other words, you want to be able to apply this cross-cutting concern transparently, without writing it over and over again.
This is one of the many reasons I started to use application designs as described here a few years ago, where business operations are defined using message objects and their corresponding business logic is hidden behind a generic interface. After applying these patterns, you have a very clear interception point where you can start transactions with their corresponding connections and let the whole business operation run within that same transaction. For instance, you can use the following generic code, which can be applied around every piece of business logic in your application:
public class TransactionCommandHandlerDecorator<TCommand> : ICommandHandler<TCommand>
{
    private readonly ICommandHandler<TCommand> decorated;

    public TransactionCommandHandlerDecorator(ICommandHandler<TCommand> decorated) {
        this.decorated = decorated;
    }

    public void Handle(TCommand command) {
        using (var scope = new TransactionScope()) {
            this.decorated.Handle(command);
            scope.Complete();
        }
    }
}
This code wraps everything in a TransactionScope. This allows your repository to simply open and close a connection per method call; the wrapper ensures that the same underlying connection is used nonetheless. This way you can inject an IConnectionFactory abstraction into your repository and let the repository close the connection at the end of each method call, while under the covers .NET keeps the real connection open.
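To show how the decorator composes around a handler, here is a stand-alone sketch with the transaction replaced by console markers (the command and inner handler are made up for the demo; in the real version the markers are where the TransactionScope starts and completes):

```csharp
using System;

interface ICommandHandler<TCommand>
{
    void Handle(TCommand command);
}

class MoveCustomerCommand { public int CustomerId; }

class MoveCustomerHandler : ICommandHandler<MoveCustomerCommand>
{
    public void Handle(MoveCustomerCommand command)
    {
        Console.WriteLine("handled " + command.CustomerId);
    }
}

// Same shape as TransactionCommandHandlerDecorator; the console markers
// stand in for the TransactionScope and scope.Complete() calls.
class TransactionMarkerDecorator<TCommand> : ICommandHandler<TCommand>
{
    private readonly ICommandHandler<TCommand> decorated;

    public TransactionMarkerDecorator(ICommandHandler<TCommand> decorated)
    {
        this.decorated = decorated;
    }

    public void Handle(TCommand command)
    {
        Console.WriteLine("begin transaction");
        decorated.Handle(command);
        Console.WriteLine("commit");
    }
}

class Program
{
    static void Main()
    {
        ICommandHandler<MoveCustomerCommand> handler =
            new TransactionMarkerDecorator<MoveCustomerCommand>(new MoveCustomerHandler());
        handler.Handle(new MoveCustomerCommand { CustomerId = 7 });
    }
}
```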
Create a repository factory that creates IDisposable repositories.
public interface IRepository : IDisposable {
    User GetUser();
    // other methods, doesn't matter
}

public interface IRepositoryFactory {
    IRepository Create();
}
You create them within a using block, and they are disposed of when done.
using (var repository = factory.Create()) {
    var user = repository.GetUser();
    // do something with user
}
You can inject the factory and create the repositories as needed.
So, you already mentioned that
we don't want to have long-lasting connections in the application
which is absolutely right!
You need to open the connection in each repository method implementation, execute the queries or commands against the database, and then close the connection. I don't see why you would expose anything like a connection to the domain layer. In other words, remove the CreateConnection() method from the repositories. It is not needed; each method will open and close the connection internally.
There are times when you want to wrap several repository method calls into something, but that is related only to transactions, not connections. In that case there are 2 answers:
Check the correctness of your Repository pattern implementation. You should have repositories only for aggregate roots. Not every entity qualifies as an aggregate root. An aggregate root is the guaranteed transaction boundary, so you should not be worried about transactions outside of a repository anyway - each repository method call naturally follows the boundary, since it handles only a single aggregate root at a time.
If you still need to execute operations against several aggregate roots in one go, then you will have to implement a pattern called Unit of Work. This is essentially a business-layer transaction implementation. I don't recommend relying on the transaction features built into storage technologies for this specific case (several aggregates in one go), because they differ from vendor to vendor (while relational DBs can guarantee several aggregate roots in one go, NoSQL DBs only guarantee a single aggregate at a time).
From my experience, you should only need to modify a single aggregate at a time. Unit of Work is a pattern needed only in very rare cases. So just rethink your repositories and aggregate roots; that should do the trick for you.
Just for the completeness of the answer: you do need repository interfaces, which you already have. Thus, your approach is already unit-testable.
You are mixing apples with oranges and peaches.
There are three concepts at play here:
The repository contract
The implementation details
Repository lifetime management
Your repository conceptually holds users, but it has a CreateConnection() method that exposes details of the implementation (a connection is needed). Not good.
What you need to do is remove the CreateConnection() method from the interface. Now you have a true definition of what a user repository is (by the way, you should call it that: IUserRepository).
On to the implementation details:
You have a user repository that talks to a database, so you implement a DatabaseUserRepository class. This is where the details of creating and handling a connection live. You may decide to keep a connection open for the lifetime of the object, or you may decide it's best to open and close a connection for every operation.
On to the lifetime of the object:
You have a dependency container. You may have decided you want your repository to be used as a singleton because your DatabaseUserRepository class implements atomic, thread-safe operations; or you may want your repository to be transient, so that a new instance is created each time, because it implements a unit-of-work pattern, meaning all changes are saved together (e.g. EF's SaveChanges()).
See the difference now?
The interface allows for unit testing. Any component that needs data from the database can use a mock repository that loads garbage from memory (e.g. MemoryUserRepository).
The implementation provides a repository that stores users in a database. You may even decide to have two versions of this class that implement the interface along with different strategies or patterns.
The lifetime of the repository will be set up in the dependency container according to the implementation details.
I would create a Connection Factory...
public class ConnectionFactory
{
    public IDbConnection Create()
    {
        // your logic here
    }
}
Now make it a dependency of your repositories, and use it inside your repositories as well. You don't need an IDisposable repository; you need to dispose of the connection.
I'm on my cellphone, so it's hard to give a more detailed example. If you need one, I can edit this later with more detail.
Environment: ASP.NET MVC3 C#
Say I have some repository (semi-pseudocode):
public interface IRepository
{
    create();
    read();
    update();
    delete();
    opendb();
    closedb();
}

public class CarRepository : IRepository
{
    private DbContext namedDbContext;

    public void opendb()
    {
        namedDbContext = new DbContext();
    }

    public void closedb()
    {
        namedDbContext.Dispose();
    }
}
And then in a controller the repository is injected and used as follows to manually control the db connection lifetime:
public class SomeController : Controller
{
    private IRepository CarRepository;

    public SomeController(IRepository _carRepository)
    {
        CarRepository = _carRepository;
    }

    public ActionResult SomeAction(int CarId)
    {
        CarRepository.opendb();
        var car = CarRepository.read(CarId);
        CarRepository.closedb();
        return View(car);
    }
}
Is this considered bad practice because it is taking the control of the connection from the repository and placing it in the controller? I am worried about memory leaks from using dependency injection and want to ensure duplicate connections are not opened, nor long running and unused.
Yes. Sure. Most ADO.NET drivers use connection pooling, so the actual connection process isn't that heavy. And you have TransactionScope, which can take care of a transaction over multiple connections, but it won't be as fast as one transaction over one connection.
I am worried about memory leaks from using dependency injection and want to ensure duplicate connections are not opened, nor long running and unused.
An IoC container is guaranteed to clean up the connection (a large user base has made sure of that). There is no guarantee that a programmer will do the cleanup in all places.
The Repository pattern provides an abstraction of the persistence layer. It shouldn't expose any of the persistence details, such as a db connection. What if the storage is an XML file, or cloud storage?
So yes, it is bad practice. If you want more control, you might make the repository use the unit of work pattern, so that a higher level decides when a transaction is committed, but that's it. No knowledge of the database should be exposed by the repository.
As for memory leaks: make the repository implement IDisposable (where you close any outstanding open connections) and just make sure that the DI container manages one repository instance per request; it will call Dispose on it.
Part of a repository is abstracting away the details of persistence.
I see two problems with your proposal:
You are leaking the abstraction more than necessary by naming these methods "opendb" and "closedb", and
If you go down this route, you should return IDisposable (the connection object) from the opendb() method and wrap the action in a using block to ensure that the connection gets closed.
Typically, you can just let the repository create a connection for each method, so you just have to get it right in your repository methods. The challenge comes when you want to perform multiple actions against the repository, without using a separate connection for each piece.
To achieve that, you could expose the notion of a unit of work from the repository. Your unit of work will implement the interface for the repository's methods, so you can't call them outside of a unit of work. It will also implement IDisposable, so whenever you call into your repository you will use a using block. Internally, the repository will manage the connection, but will neither expose it nor "talk about it".
For example:
public ActionResult SomeAction(int CarId)
{
    using (var repo = CarRepository.BeginUnitOfWork())
    {
        var car = repo.read(CarId);
        // do something meaningful with the car, do more with the repo, etc.
    }
}
I've been googling a ton on repository patterns with Linq over the last few days. There's a lot of info out there but it's often contradictory and I'm still looking for a definitive source.
One of the things I'm still not sure about is whether the repository should instantiate its own DataContext and have a SubmitChanges method, or whether the DataContext should be injected and the submission handled externally. I've seen both designs, but no real comment on the reasoning.
Anyway, the following pattern is pretty common:
class Repository<T>
{
    DataContext db = new LinqDataContext();

    public IEnumerable<T> GetAll() { ... }
    public T GetById() { ... }
    // ... etc

    public void SubmitChanges() { ... }
}
So my main question is: with the above implementation, why does the repository not need to implement IDisposable? I've seen literally hundreds of examples like the one above, and none of them seem to bother disposing the DataContext. Isn't this a memory leak?
Disposing a DataContext closes the underlying connection if you have autoclose set to false. If you do not call Dispose, you have to wait for the GC to call it for you. You should implement IDisposable and dispose of your repositories, which should in turn dispose of their DataContext.
Another solution is to create a new data context for each method in your repository, if your methods don't need to work together within a single transaction. Then you can dispose of each context as soon as it has been used, via a using() statement.
It's not strictly necessary, but you probably should implement IDisposable.