Mixing Repository implementations for different data sources - c#

A Repository as defined by Martin Fowler is supposed to act like an in-memory domain object collection. This allows the application (in theory) to be ignorant of the persistence mechanism.
So under normal circumstances you'd have something like this:
public void MyBusinessLogicMethod () {
    ...
    IRepository<Customer> repository = myIocContainer.Resolve<IRepository<Customer>>();
    repository.Add(customer);
}
If however you have a series of inserts/updates that you wish to do and want a mechanism to roll back should any of them fail you'd need some sort of UnitOfWork implementation:
public void MyBusinessLogicMethod () {
    ...
    using (IUnitOfWork uow = new UnitOfWork()) {
        IRepository<Customer> customerRepo = myIocContainer.Resolve<IRepository<Customer>>(uow);
        customerRepo.Add(customer);

        IRepository<Order> orderRepo = myIocContainer.Resolve<IRepository<Order>>(uow);
        orderRepo.Add(order);

        IRepository<Invoice> invoiceRepo = myIocContainer.Resolve<IRepository<Invoice>>(uow);
        invoiceRepo.Update(invoice);

        uow.Save();
    }
}
However if you had some bizarre requirement that your Customer Repository was acting against a SqlServer database, your Order Repository against a MySql database and your Invoice Repository against a PostgreSQL database, how would you go about handling the Transactions for each database session?
Now this is a bit of a contrived example for sure, but every Repository implementation I've come across seems to know at some level that it's really a particular database and ORM being used.
Imagine another scenario where you have 2 repositories, where one is going to a database and the other is calling a web service. The whole point of Repositories is that the application shouldn't care what data source you are going to, but without jumping through some massive hoops I don't see how these scenarios can be accounted for without the application knowing at some level, "FYI, this is going to data source x, so we'd better treat it differently".
Is there a pattern or implementation that addresses this issue? It seems to me if you are using Database x and ORM y for your entire application then Repositories work splendidly, but if due to technical debt that course deviates then the benefits of repositories are greatly reduced.

In your UnitOfWork, as suggested, you should use a TransactionScope transaction.
It escalates, in your case, to MSDTC and ensures all enlisted operations execute correctly before the commit, or otherwise rolls everything back.
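A minimal sketch of what that could look like, assuming your IUnitOfWork exposes Save() and is disposable as in the snippet above (the class body here is illustrative, not a complete implementation):

using System;
using System.Transactions;

public class UnitOfWork : IUnitOfWork
{
    // Any connection opened while this scope is active enlists in the same
    // ambient transaction; if the connections span multiple servers the
    // transaction escalates to MSDTC.
    private readonly TransactionScope _scope = new TransactionScope();

    public void Save()
    {
        // Marks the transaction as ready to commit; the actual commit
        // happens when the scope is disposed.
        _scope.Complete();
    }

    public void Dispose()
    {
        // Disposing without Complete() having been called rolls everything back.
        _scope.Dispose();
    }
}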

How do I maintain referential transparency between related entities without relying on a common data context instance?

Thanks for looking.
Background
In my .NET applications I usually have a Business Logic Layer (BLL) containing my business methods and a Data Access Layer (DAL) which contains my Entity classes and any methods for dealing with atomic entities (i.e. CRUD methods for a single entity). This is a pretty typical design pattern.
Here is a pseudocode example of what I mean:
BLL
public static int CreateProduct(ProductModel product){
    return DAL.SomeClass.CreateProduct(new DAL.Product{
        Name = product.Name,
        Price = product.Price
    });
}
DAL
public int CreateProduct(Product p){
    var db = new MyDataContext();
    db.Products.AddObject(p);
    db.SaveChanges();
    return p.Id;
}
No problems with this simple example.
Ideally, all the business of instantiating a data context and using that data context lives in the DAL. But this becomes a problem if I attempt to deal with slightly more complex objects:
BLL
public static int CreateProduct(ProductModel product){
    return DAL.SomeClass.CreateProduct(new DAL.Product{
        Name = product.Name,
        Price = product.Price,
        ProductType = DAL.SomeClass.GetProductTypeById(product.ProductTypeId) //<--PROBLEM
    });
}
Now, instead of saving the entity, I get the following error:
An entity object cannot be referenced by multiple instances of IEntityChangeTracker
Ok, so the answer to dealing with that is to pass a common data context to both calls:
BLL
public static int CreateProduct(ProductModel product){
    using (var db = new DAL.MyDataContext()){
        return DAL.SomeClass.CreateProduct(new DAL.Product{
            Name = product.Name,
            Price = product.Price,
            ProductType = DAL.SomeClass.GetProductTypeById(product.ProductTypeId, db) //<--CONTEXT
        }, db); //<--CONTEXT
    }
}
Problem
This solves the immediate problem, but now my referential transparency is blown because I have to:
Instantiate the data context in the BLL
Pass the data context to the DAL from the BLL
Create overloaded methods in the DAL that accept a data context as a parameter.
This may not be a problem for some but for me, since I write my code in a more functional style, it is a big problem. It's all the same database after all, so why the heck can't I deal with unique entities regardless of their data context instance?
Other Notes
I realize that some may be tempted to simply suggest creating one common data context for all calls. This won't fly, as doing so is bad practice for a multitude of reasons and ultimately causes a connection pool overflow. See this great answer for more details.
Any constructive input is appreciated.
Personally, I track my unit of work and associate a data context with it via static methods. This works great as long as you aren't talking about operations with long lifetimes. In my current project, an ASP.NET application, every request is a (mostly) distinct unit, and the start and end of the request coincide with the start and end of the unit of work. I store the data context in the request's CurrentContext, which, if you aren't familiar with it, is basically a dictionary managed by the system that provides request-specific storage accessible from static methods. The work's already done for me there, but you can find lots of examples of implementing your own unit of work pattern. See "One DbContext per web request... why?"
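For illustration, a bare-bones version of that per-request idea might look like this (the RequestDataContext name and the key string are made up; MyDataContext is the context from the question above):

using System.Web;

public static class RequestDataContext
{
    private const string Key = "__dataContext";

    // HttpContext.Current.Items is a per-request dictionary, so every call
    // within the same request gets the same context instance.
    public static MyDataContext Current
    {
        get
        {
            var items = HttpContext.Current.Items;
            if (items[Key] == null)
                items[Key] = new MyDataContext();
            return (MyDataContext)items[Key];
        }
    }

    // Call this from Application_EndRequest to close out the unit of work.
    public static void DisposeCurrent()
    {
        var context = HttpContext.Current.Items[Key] as MyDataContext;
        if (context != null)
            context.Dispose();
    }
}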
Another equally workable answer for many is injection. Used for this purpose (injecting a data context), it basically mimics the code you wrote at the end of your question, but shields you from the "non-functional" stuff you dislike.
Yes, you are only accessing one database, but if you look closely, you will see the database is not the constraint here. The constraint arises from the cache, which is designed to permit multiple, differing, concurrent copies of the data. If you don't wish to permit that, then you have a whole host of other solutions available.

Is Db access from my Services Layer a bad thing?

My last app implemented UoW, DI, IoC, Repository Pattern, Factories, all sorts of stuff that seemed neat, but made maintenance and debugging a pain.
I'm taking the opposite approach with my most recent app - no DI, no IoC, no UoW, just MVC, Services Layer, and DB. I'm probably thinking about Repository Pattern all wrong, but the reading that I've done suggests that it's responsible only for Db access, and not Business logic, to keep the two concerns separated.
In implementing a repository pattern, I feel like I'm just duplicating so much of my Service layer. For example, in my UserService class, I have the following:
public void UpdateAboutMe(AboutMeDto request)
{
    using (var db = CreateContext())
    {
        var user = db.Users.FirstOrDefault(s => s.Username.Equals(request.Username, StringComparison.OrdinalIgnoreCase));
        if (user != null)
        {
            user.AboutMe = request.AboutMe;
            SaveChanges(db);
        }
        else
        {
            throw new InvalidDataException("Null User");
        }
    }
}
This way, the Service grabs the object, updates a single field, and commits the changes to the DB, and disposes the context.
In my UserService, I have other methods like this:
GetUserByUserName
GetUserById
GetUsersWithChildEntities
GetUsersWithoutChildEntities (faster than the former, right?)
UpdateUserThumbnail
UpdateUserBio
UpdateUserInterests
Wouldn't every one of these need a corresponding Repo method?
If I implement a repository method, the above service might look like this:
public void UpdateAboutMe(AboutMeDto request)
{
    _userRepository.UpdateAboutMe(request);
}
Which seems cleaner, but not a lot cleaner, since I'm just moving stuff around - and if I decide to change one of my Get methods to include some child entity, I now have to create another method in the Repo, update the Interface, and update the Service method, instead of just doing it directly from my service method.
I'm basically interested in learning whether or not I should implement the Repository Pattern, based on the limited understanding I've demonstrated above. It seems like the choice is either adding a vertical layer of complexity to your app or just making your service layer a little beefier.
IMO - with EF lazy loading and per-field updates - Repository Pattern seems like so much more overhead.
And, I'm not huge on TDD in this case, so I'd like to keep testability out of the equation if possible.
Patterns exist to solve problems. If the way the pattern solves the problem introduces others that aren't acceptable in your environment, then either you are doing it wrong or you just need to go down a different path.
Along with this, just because something is a pattern doesn't mean you should blindly use it. There are many "patterns" that I consider to be pure garbage due to introducing large swaths of code for relatively little gain.
I'm not sure why you have a method call to update a single field on a single record. That seems to make things a bit difficult and certainly can cause lots of DB queries to fire off when just one would do, essentially undermining performance for no gain.
Two examples:
GetUser(String userName, Int32 id, Boolean withEntities);
or
GetUser(String userName, Boolean withEntities);
GetUser(Int32 id, Boolean withEntities);
The first one combines your common ways of acquiring a specific user account. The second one duplicates code, but splits it out. Later you might decide to add a GetUser(String email, Boolean withEntities) at some point.
The various UpdateUser... methods you have I'd roll into one: pass a full User object into it and let that one method update the entire thing. There are very, very few circumstances where I'd have methods update just a single field.
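As a rough sketch, reusing the CreateContext/SaveChanges helpers from the question (the Bio, Interests and ThumbnailUrl properties are placeholders inferred from the method names above, not known fields):

public void UpdateUser(User user)
{
    using (var db = CreateContext())
    {
        var existing = db.Users.FirstOrDefault(u => u.Id == user.Id);
        if (existing == null)
            throw new InvalidDataException("Null User");

        // Copy everything the caller is allowed to change, in one place.
        existing.AboutMe = user.AboutMe;
        existing.Bio = user.Bio;                   // placeholder property
        existing.Interests = user.Interests;       // placeholder property
        existing.ThumbnailUrl = user.ThumbnailUrl; // placeholder property

        SaveChanges(db);
    }
}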
If you aren't interested in TDD, IoC/DI, or reuseability, there's no need to have excess layers. Each layer has a purpose, but if you do not have that purpose you do not need that layer.
However, it will become more difficult to rewrite things once people start dying during a server outage.

Repository and Unit of Work patterns - How to save changes

I'm struggling to understand the relationship between the Repository and Unit of Work patterns despite this kind of question being asked so many times. Essentially I still don't understand which part would save/commit data changes - the repository or the unit of work?
Since every example I've seen relates to using these in conjunction with a database/OR mapper, let's make a more interesting example - let's persist the data to the file system in data files; according to the patterns, I should be able to do this because where the data goes is irrelevant.
So for a basic entity:
public class Account
{
    public int Id { get; set; }
    public string Name { get; set; }
}
I imagine the following interfaces would be used:
public interface IAccountRepository
{
    Account Get(int id);
    void Add(Account account);
    void Update(Account account);
    void Remove(Account account);
}

public interface IUnitOfWork
{
    void Save();
}
And I think in terms of usage it would look like this:
IUnitOfWork unitOfWork = // Create concrete implementation here
IAccountRepository repository = // Create concrete implementation here
// Add a new account
Account account = new Account() { Name = "Test" };
repository.Add(account);
// Commit changes
unitOfWork.Save();
Bearing in mind that all data will be persisted to files, where does the logic go to actually add/update/remove this data?
Does it go in the repository via the Add(), Update() and Remove() methods? It sounds logical to me to have all the code which reads/writes files in one place, but then what is the point of the IUnitOfWork interface?
Does it go in the IUnitOfWork implementation, which for this scenario would also be responsible for data change tracking too? To me this would suggest that the repository can read files while the unit of work has to write files but that the logic is now split into two places.
A Repository can work without a Unit Of Work, so it can also have a Save method.
public interface IRepository<T>
{
    T Get(int id);
    void Add(T entity);
    void Update(T entity);
    void Remove(T entity);
    void Save();
}
Unit Of Work is used when you have multiple repositories (which may have different data contexts). It keeps track of all changes in a transaction until you call the Commit method to persist all changes to the database (a file in this case).
So, when you call Add/Update/Remove on the Repository, it only changes the status of the entity, marking it as Added, Removed or Dirty... When you call Commit, the Unit Of Work will loop through the repositories and perform the actual persistence:
If the repositories share the same data context, the Unit Of Work can work directly with the data context for higher performance (open and write the file in this case).
If the repositories have different data contexts (different databases or files), the Unit Of Work will call each repository's Save method within the same TransactionScope.
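A minimal sketch of that second case follows, with the question's Save method playing the role of Commit. The IRepositoryWithSave marker interface is invented purely for the sketch (the generic IRepository<T> above can't easily sit in one list), and note that plain file I/O does not automatically enlist in a TransactionScope, so for files this coordinates the writes but does not give true atomicity:

using System.Collections.Generic;
using System.Transactions;

// Invented for this sketch: a non-generic view of the repositories' Save method.
public interface IRepositoryWithSave
{
    void Save();
}

public class UnitOfWork : IUnitOfWork
{
    private readonly List<IRepositoryWithSave> _repositories = new List<IRepositoryWithSave>();

    public void Register(IRepositoryWithSave repository)
    {
        _repositories.Add(repository);
    }

    public void Save()
    {
        using (var scope = new TransactionScope())
        {
            // Ask every registered repository to persist its tracked changes.
            foreach (var repository in _repositories)
                repository.Save();

            scope.Complete(); // the commit happens when the scope is disposed
        }
    }
}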
I'm actually quite new to this but as nobody wiser has posted:
The code which does the CRUD happens in the repositories, as you would expect, but when Add (for example) is called on the account repository, all that happens is that an Account object is added to the list of things to be added later (the change is tracked).
When unitOfWork.Save() is called, the repositories are allowed to look through their list of what has changed, or the UoW's list of what has changed (depending on how you choose to implement the pattern), and act appropriately - so in your case there might be a List<Account> NewItemsToAdd field that has been tracking what to add based on calls to .Add(). When the UoW says it's OK to save, the repository can actually persist the new items as files, and if successful, clear the list of new items to add.
AFAIK the point of the UoW is to manage the Save across multiple repositories (which combined are the logical unit of work that we want to commit).
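To make that concrete, here is one way the file-backed repository could track its pending adds and only touch the disk when the unit of work says so (the PersistPendingChanges method and the pipe-delimited file format are purely illustrative):

using System.Collections.Generic;
using System.IO;
using System.Linq;

public class FileAccountRepository : IAccountRepository
{
    private readonly string _path;
    private readonly List<Account> _newItemsToAdd = new List<Account>();

    public FileAccountRepository(string path)
    {
        _path = path;
    }

    public Account Get(int id)
    {
        // Reads can hit the file directly; only writes are deferred.
        if (!File.Exists(_path)) return null;
        return File.ReadAllLines(_path)
            .Select(line => line.Split('|'))
            .Where(parts => int.Parse(parts[0]) == id)
            .Select(parts => new Account { Id = int.Parse(parts[0]), Name = parts[1] })
            .FirstOrDefault();
    }

    public void Add(Account account)
    {
        // Nothing touches the disk yet; the change is only tracked.
        _newItemsToAdd.Add(account);
    }

    public void Update(Account account) { /* track in a dirty list, same idea */ }
    public void Remove(Account account) { /* track in a removed list, same idea */ }

    // Called by the unit of work when Save() is invoked.
    public void PersistPendingChanges()
    {
        File.AppendAllLines(_path, _newItemsToAdd.Select(a => a.Id + "|" + a.Name));
        _newItemsToAdd.Clear();
    }
}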
I really like your question.
I've used the UoW / Repository Pattern with Entity Framework, and it shows how much EF actually does (how the context tracks the changes until SaveChanges is finally called). To implement this design pattern in your example you need to write quite a bit of code to manage the changes.
Ehe, things are tricky. Imagine this scenario: one repo saves something in a db, another on the file system and the third somewhere in the cloud. How do you commit that?
As a guideline, the UoW should commit things; however, in the above scenario, Commit is just an illusion, as you have 3 very different things to update. Enter eventual consistency, which means that all things will be consistent eventually (not at the same moment, as you're used to with an RDBMS).
That UoW is called a Saga in a message-driven architecture. The point is that each bit of the saga can be executed at a different time. The saga completes only when all 3 repositories are updated.
You don't see this approach as often, because most of the time you'll work with an RDBMS, but nowadays NoSql is quite common, so a classic transactional approach is very limited.
So, if you're sure you work ONLY with ONE RDBMS, use a transaction with the UoW and pass the associated connection to each repository. At the end, the UoW will call commit.
If you know or expect you might have to work with more than one RDBMS or a storage that doesn't support transactions, try to familiarize yourself with a message-driven architecture and with the saga concept.
Using the file system can complicate things quite a lot if you want to do it yourself.
Only write when the UoW is committed.
What you have to do is to let the repositories enqueue all IO operations in the UnitOfWork. Something like:
public class UserFileRepository : IUserRepository
{
    private readonly IEnquableUnitOfWork _uow;

    public UserFileRepository(IUnitOfWork unitOfWork)
    {
        _uow = unitOfWork as IEnquableUnitOfWork;
        if (_uow == null) throw new NotSupportedException("This repository only works with IEnquableUnitOfWork implementations.");
    }

    public void Add(User user)
    {
        _uow.Append(() => AppendToFile(user));
    }

    public void Update(User user)
    {
        _uow.Append(() => ReplaceInFile(user));
    }
}
By doing so you can get all changes written to the file(s) at the same time.
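For completeness, the unit-of-work side of that could be as simple as a queue of deferred actions. The IEnquableUnitOfWork interface here is the one the repository above casts to, and its shape is my guess rather than a standard API:

using System;
using System.Collections.Generic;

public interface IEnquableUnitOfWork : IUnitOfWork
{
    void Append(Action ioOperation);
}

public class EnquableFileUnitOfWork : IEnquableUnitOfWork
{
    private readonly Queue<Action> _pendingOperations = new Queue<Action>();

    public void Append(Action ioOperation)
    {
        _pendingOperations.Enqueue(ioOperation);
    }

    public void Save()
    {
        // All the file writes queued by the repositories run here, back to back.
        // Real rollback (undoing writes that already succeeded) needs more work.
        while (_pendingOperations.Count > 0)
            _pendingOperations.Dequeue().Invoke();
    }
}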
The reason that you don't need to do that with DB repositories is that the transaction support is built into the DB. Hence you can tell the DB to start a transaction directly and then just use it to fake a Unit Of Work.
Transaction support will be complex, as you have to be able to roll back changes in the files and also prevent different threads/transactions from accessing the same files during simultaneous transactions.
Normally, repositories handle all reads and the unit of work handles all writes, but you can certainly handle all reads and writes using only one of the two
(though if you use only the repository pattern, it will be very tedious to maintain maybe 10 repositories; worse still, you may end up with inconsistent reads and with writes being overwritten).
The advantage of mixing the two is ease of tracking status changes and ease of handling concurrency and consistency problems.
For a better understanding, you can refer to these links: Repository Pattern with Entity Framework 4.1 and Parent/Child Relationships
and
https://softwareengineering.stackexchange.com/questions/263502/unit-of-work-concurrency-how-is-it-handled

Data access architectures with Raven DB

What data access architectures are available that I can use with Raven DB?
Basically, I want to separate persistence via interfaces, so I don't expose the underlying storage to the upper layers. I.e. I don't want my domain to see IDocumentStore or IDocumentSession, which are from Raven DB.
I have implemented the generic repository pattern and that seems to work. However, I am not sure that is actually the correct approach. Maybe I shall go towards command-query segregation or something else?
What are your thoughts?
Personally, I'm not really experienced with the Command Pattern. I saw that it was used in Rob Ashton's excellent tutorial.
For myself, I'm going to try using the following :-
Repository Pattern (as you've done)
Dependency Injection with StructureMap
Moq for mock testing
Service layer for isolating business logic (not sure of the pattern here... or even if this is a pattern).
So when I wish to get any data from RavenDB (the persistence source), I'll use Services, which will then call the appropriate repository. This way, I'm not exposing the repository to the Application, nor is the repository very heavy or complex -> it's basically a FindAll / Save / Delete.
eg.
public SomeController(IUserService userService, ILoggingService loggingService)
{
    UserService = userService;
    LoggingService = loggingService;
}

public ActionResult Index()
{
    // Find all active users, page 1 and 15 records.
    var users = UserService.FindWithIsActive(1, 15);
    return View(new IndexViewModel(users));
}
public class UserService : IUserService
{
    public UserService(IGenericRepository<User> userRepository,
                       ILoggingService loggingService)
    {
        Repository = userRepository;
        LoggingService = loggingService;
    }

    public IEnumerable<User> FindWithIsActive(int page, int count)
    {
        // Note: Repository.Find() returns an IQueryable<User> in this case.
        // Think of it as a SELECT * FROM User table, if it was an RDBMS.
        return Repository.Find()
            .WithIsActive()
            .Skip((page - 1) * count) // convert the 1-based page number into a row offset
            .Take(count)
            .ToList();
    }
}
So that's a very simple and contrived example with no error/validation checking, try/catch, etc... and it's pseudocode... but you can see how the services are rich while the repository is (supposed to be, for me at least) simple or lighter. And then I only expose any data via services.
That's what I do right now with .NET and Entity Framework and I'm literally hours away from giving this a go with RavenDb (WOOT!)
What are you trying to achieve by that?
You can't build an application which makes use of both an RDBMS and a DocDB, not efficiently at least. You have to decide for yourself which database you are going to use, and then go all the way with it. If you decide to go with an RDBMS, you can use NHibernate, for example, and then again - no need for any other abstraction layer.

Does Queryability and Lazy Loading in C# blur the lines of Data Access vs Business Logic?

I am experiencing a mid-career philosophical architectural crisis. I see the very clear lines between what is considered client code (UI, Web Services, MVC, MVP, etc) and the Service Layer. The lines from the Service layer back, though, are getting more blurred by the minute. And it all started with the ability to query code with Linq and the concept of Lazy loading.
I have created a Business Layer that consists of Contracts and Implementations. The Implementations then could have dependencies to other Contracts and so on. This is handled via an IoC Container with DI. There is one service that handles the DataAccess and all it does is return a UnitOfWork. This UnitOfWork creates a transaction when instantiated and commits the data on the Commit method. [View this Article (Testability and Entity Framework 4.0)]:
public interface IUnitOfWork : IDisposable {
    IRepository<T> GetRepository<T>() where T : class;
    void Commit();
}
The Repository is generic and works against two implementations (EF4 and an InMemory DataStore). T is made up of POCOs that get generated from the database schema or the EF4 mappings. Testability is built into the Repository design. We can leverage the in-memory implementation to assert results with expectations.
public interface IRepository<T> where T : class {
    IQueryable<T> Table { get; }
    void Add(T entity);
    void Remove(T entity);
}
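For reference, the in-memory implementation mentioned above can be little more than a wrapper around a List<T>. This is a sketch, not the author's actual code, and as the last answer below points out, queries that pass against it will not necessarily behave the same against a real database provider:

using System.Collections.Generic;
using System.Linq;

public class InMemoryRepository<T> : IRepository<T> where T : class
{
    private readonly List<T> _items = new List<T>();

    // Exposes the backing list as IQueryable so business code can compose queries.
    public IQueryable<T> Table
    {
        get { return _items.AsQueryable(); }
    }

    public void Add(T entity)
    {
        _items.Add(entity);
    }

    public void Remove(T entity)
    {
        _items.Remove(entity);
    }
}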
While the Data Source is abstracted, IQueryable still gives me the ability to create queries anywhere I want within the Business logic. Here is an example.
public interface IFoo {
    Bar[] GetAll();
}

public class FooImpl : IFoo {
    IDataAccess _dataAccess;

    public FooImpl(IDataAccess dataAccess) {
        _dataAccess = dataAccess;
    }

    public Bar[] GetAll() {
        Bar[] output;
        using (var work = _dataAccess.DoWork()) {
            output = work.GetRepository<Bar>().Table.ToArray();
        }
        return output;
    }
}
Now you can see how the queries could get even more complex as you perform joins with complex filters.
Therefore, my questions are:
Does it matter that there is no clear distinction between the BLL and the DAL?
Is queryability considered data access or business logic when behind a Repository layer that acts like an InMemory abstraction?
Addition: The more I think about it, maybe the second question was the only one that should have been asked.
I think the best way to answer your questions is to step back a moment and consider why separation between business logic layers and data access layers is the recommended practice.
In my mind, the reasons are simple: keep the business logic separate from the data layer because the business logic is where the value is; the data layer and the business logic will need to change over time more or less independently of each other; and the business logic needs to be readable without requiring detailed knowledge of everything the data access layer does.
So the litmus test for your query gymnastics boils down to this:
Can you make a change to the data schema in your system without upsetting a significant portion of the business logic?
Is your business logic readable to you and to other C# developers?
1. Only if you care more about philosophy than getting stuff done. :)
2. I'd say it's business logic because you have an abstraction in between. I would call that repository layer part of DAL, and anything that uses it, BL.
But yeah, this is blurry to me as well. I don't think it matters, though. The point of using patterns like this is to write clean, usable code that is easy to communicate at the same time, and that goal is accomplished either way.
1. Does it matter that there is no clear distinction between the BLL and the DAL?
It sure does matter! Any programmer that uses your Table property needs to understand the ramifications (database roundtrip, query translation, object tracking). That goes for programmers reading the business logic classes as well.
2. Is queryability considered data access or business logic when behind a Repository layer that acts like an InMemory abstraction?
Abstraction is a blanket that we hide our problems under.
If your abstraction is perfect, then the queries could be abstractly considered as operating against in-memory collections and therefore they are not data access.
However, abstractions leak. If you want queries that make sense in the data world, there must be effort to work above and beyond the abstraction. That extra effort (which defeats abstraction) produces data access code.
Some examples:
output = work.GetRepository<Bar>().Table.ToArray();
This code is (abstractly) fine. But in the data world it results in scanning an entire table and is (at least generally) dumb!
badquery = work.GetRepository<Customer>().Table.Where(c => c.Name.Contains("Bob")).ToArray();
goodquery = work.GetRepository<Customer>().Table.Where(c => c.Name.StartsWith("Bob")).ToArray();
Goodquery is better than bad query when there's an index on Customer.Name. But that fact is not available to us unless we lift the abstraction.
badquery = work.GetRepository<Customer>().Table
    .GroupBy(c => c.Orders.Count())
    .Select(g => new
    {
        TheCount = g.Key,
        TheCustomers = g.ToList()
    }).ToArray();

goodquery = work.GetRepository<Customer>().Table
    .Select(c => new { Customer = c, theCount = c.Orders.Count() })
    .ToArray()
    .GroupBy(x => x.theCount)
    .Select(g => new
    {
        TheCount = g.Key,
        TheCustomers = g.Select(x => x.Customer).ToList()
    })
    .ToArray();
goodquery is better than bad query since badquery will requery the database by group key, for each group (and worse, it is highly unlikely there is an index to help with filtering customers by c.Orders.Count() ).
Testability is built into the Repository design. We can leverage the InMemory implementation to assert results with expectations.
Be under no illusions that your queries are being tested if you actually run them against in-memory collections. Those queries are untestable unless a database is involved.
