I am struggling to understand how to best query a repository.
The three factors that are throwing me through a loop right now are:
Return type of data
Columns to run query on
Number of records to return
Point 1
In regards to question one:
I have Repositories with lot of methods that return a combination of both Entities and scalar values. This seems to lead to "method explosion". Should I always return an Entity object? How should I query for objects where I only need one column?
Point 2
When running a query should I include every column in the table even if I only need one, or two columns? If I create specific queries for this it leads to more methods in the Repository
Point 3
How should I provide conditions for the query? I read about Specifications, but my understanding is that you loop through the returned records and filter out the ones that pass into a new collection. This doesn't seem like a good idea performance wise. Right Now I just make a new method in the Repo like getNameById() which encapsulates the condition.
Please not that I am not using an ORM, I just have raw sql in my Repositories.
Update
Point 1:
Based on the answers and a bit more research would this be a good implementation?
Right now I have a large repository that return a mix of scalar and entity type objects (all same entity). I'm thinking I could reduce this greatly if I just use a GetUser(userId) method and forget writing methods that just return single column values.
For example if I need to return a user name I could call the GetUser(userId) method that hydrates the User object and then in the service layer just filter it down to the username.
Another way would be to use some sort of QueryBuilder class I could pass into the Repository which could be parsed to generate the proper sql.
Point 2
Looking back this is pretty similar to point one and my current solution would be to just grab all table fields. It's a tradeoff between performance and maintainability.
Point 3
I would need to provide some sort of where clause. I'm not sure if this make sense doing via Specification or just a sql string. My current solution is to make new methods for these types, but I would like something more generic for the Repository
Overall, still researching into this... I'd love to hear more input into this or links to books or references that kind of tie this all together.
I have Repositories with lot of methods that return a combination of both Entities and scalar values. This seems to lead to "method explosion". Should I always return an Entity object? How should I query for objects where I only need one column?
You can fight repository method explosion similar to how you would fight other SRP violations. You can create another repository for the same entity. See this answer to a similar question.
When running a query should I include every column in the table even if I only need one, or two columns? If I create specific queries for this it leads to more methods in the Repository
This is not a DDD question. Domain driven design does not deal with 'rows and columns'. There is always some redundancy in how much data you load to 'hydrate' the domain object, but you have to measure whether this really affects your performance. If this is really a performance bottleneck than it maybe a symptom of incorrect domain model.
How should I provide conditions for the query? I read about Specifications, but my understanding is that you loop through the returned records and filter out the ones that pass into a new collection. This doesn't seem like a good idea performance wise. Right Now I just make a new method in the Repo like getNameById() which encapsulates the condition.
This again is a data access issue. Nothing in DDD says that your repository can not convert Specification to a SQL query. It is up to you whether you do this or iterate over records in memory (as long as repository consumer only sees Specification and Repository and stays unaware of the actual implementation).
Regarding 'Raw SQL vs. ORM in DDD' you may find this answer interesting.
I agree with everything Dmitry says, but perhaps think you should have a read of CQRS.
I used to ask similar questions when getting started with DDD (regarding 'method explosion', not your SQL issues), and this lead me to CQRS. Personally, I don't really see how DDD is practical without it, and it answers a lot of these sorts of questions when it comes to querying data. Using it's principles what I'd suggest is:
Only use domain repositories when committing a transaction. That is, you don't use repositories to display data in the UI. You only fetch aggregates from your repository when you want to perform an operation against them.
Your repositories only return aggregates, not individual entities separately. This makes sense now as we are only using repositories in a transactional sense, and entities can only be mutated via atomic operations and persisted by the aggregate as a whole.
You create separate repositories (or 'query services') which provide tailor made queries and data types for whatever data you need. These can return dumb DTOs with no logic.
This keeps your proper domain & repositories clean, whilst providing the means to create a thin data access layer that provides high performing queries.
Regarding the specification pattern: rather than converting it to a SQL query in code, you could provide public properties on the specification that represent the criteria. These values could then be added in the where clause of your SQL or sent as parameters to a SPROC.
First of all, you haven't really explained what you are using all these queries for. Chances are it's for user interface needs. If so, there's no need to jump through all these hoops (service->repository->domain->dto->client), just query the database as directly as possible. And what do you know, gone are the questions whether you can query for scalars or just the columns you need. Just use plain sql and return what you need. Don't create abstractions that cause friction.
Chobo,
We need remember two things about Repository [Fowler PoEAA][Evans DDD] pattern:
Use a Repository pattern as a simple collection. Repository abstract these infrastructure details because aren't from the domain.
If a Repository pattren is a collection, it is a cluster of objects of the same type.
Two others types may help your Repository: the Query Object [Fowler PoEAA] and Data Mapper [Fowler PoEAA] patterns. Query Object pattern aggregate criterias using object oriented methods and know how to translate them as a SQL statement. Data Mapper pattern know map the object states from application and the table columns from databases.
You can use Lazy Load pattern [Fowler PoEAA] to mitigate the problem of large object in memory.
Success for you!
Related
I have two separate databases. Database A is small and contains 6 tables and some stored procedures, database B is larger and contains 52 tables, 159 views, and some stored procedures. I have been tasked with creating an application that does operations on both of these databases. There is already code that has been written that accesses these databases but it is not testable and I would like to create the DAL code from scratch. So I made a code first Entity Framework connection and let it create all of the POCOs for me as well as the Context classes. Now I have 3 DatabaseContext classes. 1 for database A which is just the 6 separate tables. 1 for the tables from database B and 1 for the views from database B. I split the Context class for database B into 2 separate context classes because the tables and the views are used in 2 different scenarios.
I have created 2 projects in a single solution. DomainModel and DataAccessLayer. The DomainModel project holds all of the POCOs which the Entity Framework code first generated for me. DataAccessLayer is where the DatabaseContexts are and where I would like to create the code which other projects can use to access the database. My goal being to make a reusable and testable data access layer code.
The QUESTION is: What do I create in the DataAccessLayer project to make the DAL resusable in other projects that want to be able to talk to the database in an abstract way and what is the best way to structure it?
What I have done:
In my search on the internet I have seen that people recommend the repository pattern (sometimes with UnitOfWork) and it seems like that is the way to go, so I created an interface called IGenericRepository and an implementation called GenericEntityFrameworkRepository. The interface has methods such as GetByID, Insert, Delete, and Update. The implementation takes a DbContext as a parameter in the constructor and implements all of the methods. The next step might be to create specific implementations for the different tables and views but that is going to be a very tedious thing to do since there are so many tables and views.
I have also read that Entity Framework IS the DAL. Then it might be enough to just create another project that holds the business logic and returns IEnumerables. But that seems like it will be hard to test.
What I am trying to achieve is a well structured and testable foundation to access our database that can be tested thoroughly and expanded as other projects start to require other functionality to the database.
The folder structure for the project can be seen in the following picture:
Design and architecture is all about trade-offs. Be careful when you want to consider abstracting your data access for purposes of reuse. Within an application this leads to overly complex and poorly performing solutions to what can be a very simple and fast solution.
For instance, if I want to build a business domain layer as a set of services to be consumed by a number of different applications (web applications and services/APIs that will serve mobile apps or third party interests) then that domain layer becomes a boundary that defines what information will come in, and what information will go out. Everything in and out of that abstraction should be DTOs that reflect the domain that I am making available via the actions I make available. Within that boundary, EF should be leveraged to perform the projection needed to produce the relevant DTOs, or update entities based on the specific details passed in. (After verification) The trade-off is that every consumer will need to make due with this common access point, so there is little wiggle room to customize what comes out unless you build it in. This is preferable to having everything just do their own thing accessing the database and running custom queries or calling Stored Procedures.
What I see most often though is that within a single solution, developers want to abstract the system not to "know" that the data domain is provided by EF. The application passes around POCO Entities or DTOs and they implement things like Generic Repositories with GetById, GetAll, Update, Upsert, etc. This is a very, very poor design decision because it means that your data domain cannot take advantage of things like projection, pagination, custom filtering, ordering, eager loading related data, etc. Abstracting the EF DbContext is certainly a valid objective for enabling unit testing, but I strongly recommend avoiding the performance pitfall of a Generic Repository pattern. IMO attempting to abstract away EF just for the sake of hiding it or making it swappable is either a poor performing and/or an overly complex mess.
Take for example a typical GetAll() method.
public IEnumerable<T> GetAll()
{
return _context.Set<T>().ToList();
}
What happens if you only want orders from the last 3 months?
var orders = orderRepository.GetAll()
.Where(x => x.OrderDate >= startDate).ToList();
The issue here is that GetAll() will always return all rows. If you have 4 million orders you can appreciate that is not desirable.
You could make an OrderRepository that extends Repository to implement a method like GetOrdersAfter(DateTime) but soon, what is the point of the Generic Repository? The next option would be to pass something like an Expression<Func<TEntity>> Where clause into the GetAll() method. There are plenty of examples of that out there. However, doing that is leaking EF-isms into the consuming code. The Where clause expression has to conform to EF rules. For instance passing order => order.SomeUnmappedProperty == someValue would cause the query to fail as EF won't be able to convert that down to SQL. Even if you're Ok with that the GetAll method will need to start looking like:
IEnumerable<TEntity> GetAll(Expression<Func<TEntity>> whereClause = null,
IEnumerable<OrderByClause> orderByClauses = null,
IEnumerable<string> includes = null,
int? pageNumber = null, int? pageSize = null )
or some similar monstrosity to handle filtering, ordering, eager loading, and pagination, and this doesn't even cover projection. For that you end up with additional methods like:
IEnumerable<TEntity, TDTO> GetAll(Expression<Func<TEntity>> whereClause = null,
IEnumerable<OrderByClause> orderByClauses = null,
IEnumerable<string> includes = null,
int? pageNumber = null, int? pageSize = null )
which will call the other GetAll then perform a Mapper.Map to return a desired DTO/ViewModel that is hopefully registered in a mapping dependency. Then you need to consider async vs. synchronous flavours of the methods. (or forcing all calls to be synchronous or async)
Overall my advice would be to avoid Generic Repository patterns and don't abstract away EF just for the sake of abstraction, instead look for ways to leverage the most you can out of it. Most of the problems developers run into with performance and odd behaviour with EF stem from efforts to abstract it. They are so worried about needing to abstract it to be able to replace it if they need to, that they end up with those performance and complexity problems entirely because of the limitations imposed by the abstraction. (A self-fulfilling prophecy) "Architecture" can be a dirty word when you try to prematurely optimize a solution to solve a problem you don't currently, and probably will never face. Always keep YAGNI and KISS at the forefront of all design considerations.
I guess you tumbled between selecting right architecture for a given solution. a software must be created for meet the business needs. so depending on the business, complexity of the domain, we have to choose the write architecture.
Microsoft elaborated this in a detail PDF you will find it here
https://learn.microsoft.com/en-us/dotnet/architecture/modern-web-apps-azure/
You want some level of abstraction between your Data and the BL then "N-Layer" architecture is not the best solution. even though i am not telling the N-Layer entirely out dated but there is more suitable solutions for this problem .
And if you using EF or EF core then there is no need to implement Repository cus EF itself contain UOW and Repositories.
Clean Architecture introduce higher level of abstraction between layers. it abstract away your data layer and make it reusable over different project or different solutions. Clean architecture is a big topic i can't describe here in more details. but i learn the subject from him.
Jason Taylor- Clean Architecture
https://www.youtube.com/watch?v=dK4Yb6-LxAk
https://github.com/jasontaylordev/CleanArchitecture
I might be way off here, and this question probably bordering subjective, but here goes anyway.
Currently I use IList<T> to cache information from the database in memory so I can use LINQ to query information from them. I have a ORM'ish layer I've written with the help of some questions here on SO, to easily query the information I need from the DB. For example:
IList<Customer> customers = DB.GetDataTable("Select * FROM Customers").ToList<Customer>();
Its been working fine. I also have extension methods to do CRUD updates on single items within these lists:
DB.Update<Customer>(customers(0));
Again working quite well.
Now in the GUI layer of my app, specifically when binding DataGridView's for the user to edit the data, i find myself bypassing this DAL layer and directly using TableAdapters within the forms which kind of breaks the layered architecture which smells a bit to me. I've also found the fact that I'm using TableAdapters here and ILists there, there are differing standards followed throughout my code which I would like to consolidate into one.
Ideally, I would like to be able to bind to these lists and then have the DAL update the list's 'dirty' data for me. To me, this process would involve the following:
Traversing the list for any 'dirty' items
For each of these, see if there is already an item with the PK in the DB
If (2), then update, else insert
Finally, perform a Delete FROM * WHERE ID NOT IN('all ids in list') query
I'm not entirely sure how this is handled in a TableAdapter, but I can see the performance of this method dropping significantly and quite quickly with increasing items in the list.
So my question is this:
Is there an easier way of committing List to a database? Note the word commit, as it may be an insert/update or delete.
Should I maybe convert to DataTable? e.g. here
I'm sure some of the more advanced ORM's will perform this type of thing, however is there any mini-orm (e.g. dapper/Petapoco/Simple.data etc) that can do this for me? I want to keep it simple (as is with my current DAL) and flexible (I don't mind writing the SQL if its gets me exactly what I need).
Currently I use IList to cache information from the database in memory so I can use LINQ to query information from them.
Linq also has a department called Linq-to-Datasets so this is not a compelling reason.
Better decide what you really want/need:
a full ORM like Entity Framework
use DataSets with DataDapters
use basic ADO.NET (DataReader and List<>) and implement your own change-tracking.
You can mix them to some extent but like you noted it's better to pick one.
I have to develop a fairly large ASP.NET MVC project very quickly and I would like to get some opinions on my DAL design to make sure nothing will come back to bite me since the BL is likely to get pretty complex. A bit of background: I am working with an Oracle backend so the built-in LINQ to SQL is out; I also need to use production-level libraries so the Oracle EF provider project is out; finally, I am unable to use any GPL or LGPL code (Apache, MS-PL, BSD are okay) so NHibernate/Castle Project are out. I would prefer - if at all possible - to avoid dishing out money but I am more concerned about implementing the right solution. To summarize, there are my requirements:
Oracle backend
Rapid development
(L)GPL-free
Free
I'm reasonably happy with DataSets but I would benefit from using POCOs as an intermediary between DataSets and views. Who knows, maybe at some point another DAL solution will show up and I will get the time to switch it out (yeah, right). So, while I could use LINQ to convert my DataSets to IQueryable, I would like to have a generic solution so I don't have to write a custom query for each class.
I'm tinkering with reflection right now, but in the meantime I have two questions:
Are there any problems I overlooked with this solution?
Are there any other approaches you would recommend to convert DataSets to POCOs?
Thanks in advance.
There's no correct answer, though you'll find people who will try to give you one. Some things to keep in mind:
Since you can't get the advantages of EF or Linq-to-SQL, don't worry about using the IQuerable interface; you won't be getting the main advantage of it. Of course, once you've got your pocos, LINQ to object will be a great way of dealing with them! Many of your repository methods will return IQueryable<yourType>.
As long as you have a good repository to return your pocos, using reflection to fill them out is a good strategy, at first. If you have a well-encapsulated repository, I say again. You can always switch out the reflection-filled entity object code for more efficient code later, and nothing in you BL will know the difference. If you make yourself dependent on straight reflection (not optimized reflection like nHibernate), you might regret the inefficiency later.
I would suggest looking into T4 templates. I generated entity classes (and all the code to populate them, and persist them) from T4 templates a few months ago, for the first time. I'm sold! My code in my T4 template is pretty horrible this first try, but it spits out some nice, consistent code.
You will have to have a plan for your repository methods, and closely monitor all the methods your team creates. You can't have a general .GetOrders() method, because it will get all the customers every time, and then your LINQ to object will look nice, but will be covering some bad data access! Have methods like .GetOrderById(int OrderID) and .GetOrderByCustomer(int CustomerID). Make sure each method that returns entities uses an index at least in the DB. If the basic query returns some wasted records, that's fine, but it can't do table scans and return thousands of wasted records.
An example:
var Order = From O in rOrders.GetOrderByCustomer(CustID)
Where O.OrderDate > PromoBeginDate
Select O
In this example, all the Order for a customer would be retrieved, just to get some of the orders. But there won't be a huge amount of waste, and CustomerID should certainly be an indexed field on Orders. You have to decide whether this is acceptable, or whether to add a date distinction to your repository, either as a new method or with overloading other methods. There's no shortcut to this; you have walk the line between efficiency and maintaining your data abstraction. You don't want to have a method in your repository for every single data inquiry in your entire solution.
Some recent articles I've found where people are wrestling with how exactly to do this.:
http://mikehadlow.blogspot.com/2009/01/should-my-repository-expose-iqueryable.html
http://www.west-wind.com/WebLog/posts/160237.aspx
Devart dotConnect for Oracle supports the entity framework, you could then use LINQ to Entities.
Don't worry about using reflection to build DTOs from Datasets. They just work great.
An area of pain will be implementation of IComparer for each business object. Only load the data that is minimum requirement at the presentation layer. I burnt my fingers badly on in-memory sorting.
Also, plan in advanced for lazy-loading on DTOs.
We wrote our on Generic library to convert datatable/datarow into entitycollection/entityobjects. And they work pretty fast.
I thought I would rewrite this question (same iteration). The original was how to wrap a repository pattern around an EAV/CR database. I am trying a different approach.
Question: How could you code a data repository in a "factory" design pattern way? I have a fixed number of entities, but the attributes to these entities are fairly customer specific. They advertise Products which are all similar, but each customer attaches different information to them based on their business model. For example, some care about the percent slab off waste while others care about the quantity of pounds sold. Every time we find another customer, we add a bunch of fields, remove a bunch of fields, and then spend hours keeping each solution current to the latest common release.
I thought we could put the repository classes in a factory pattern, so that when I know the customer type, then I know what fields they would use. Practical? Better way? The web forms use User Controls which are modified to reflect what fields are on the layouts. We currently "join" the fields found on the layout to the fields found in the product table, then CRUD common fields.
Previous question content:
We have an EAV/CR data model that allows different classes for the same entity. This tracks products where customers have wildly different products. Customers can define a "class" of product, load it up with fields, then populate it with data. For example,
Product.Text_Fields.Name
Product.Text_Fields.VitaminEContent
Any suggestion on how to wrap a repository pattern around this?
We have a three table EAV: a Product table, a value table, and a meta table that lists the field names and data types (we list the data types because we have other tables like Product.Price and Product.Price Meta data, along with others like Product.Photo.) Customers track all kinds of prices like a competitor's percent off difference as well as on the fly calculations.
We currently use Linq to SQL with C#.
Edit:
I like the "Dynamic Query" Linq below. The idea behind our DB is like a locker room locker storage. Each athlete (or customer) organizes their own locker in the way they wish, and we handle storing it for them. We don't care what's in the locker as long as they can do what they need with it.
Very interesting... The objects passed to the repository could be dynamic? This almost makes sense, almost like a factory pattern. Would it be possible for the customer to put their own class definitions in a text file, then we inherit them and store them in the DB?
As I understand it the repository pattern abstracts the physical implementation of the database from the application. Are you planning to store the data in differing data stores? If you are happy with Linq to SQL then I'd suggest that perhaps you don't need to abstract in this way as it looks to be extremely complex. That said, I can see that providing an EAV-style repository, i.e. a query would need to be passed Table, Field Type, and Field Name along with any conditional required, might offer you the abstraction you are seeking.
I'm not sure if that would still qualify as a repository pattern in the strictest terms as you aren't really abstracting the storage from the application. It's going to be a toss-up between benefit and effort and there I'm unable to help.
You might want to take a look at the Dynamic Linq extensions found in the dynamic data preview on Codeplex.
I think about having a class clsConnection which we can take advantage of in order to execute every SQL query like select, insert, update, delete, .... is pretty good.
But how complete it could be? How?
You could use LINQ to SQL as AB Kolan suggested or, if you don't have time for the learning curve, I'd suggest taking a look at the Microsoft Enterprise Library Data Access Application Blocks.
You can use the DAB (SQlHelper) from the enterprise Library. This has all the methods/properties necessary for database operation. You dont need to create you own code.
Alternately you can use a ORM like LINQ or NHibernate.
It sounds to me like you're just re-writing the ADO.NET SqlConnection (which already has an attached property of type SqlCommand). Or Linq to SQL (or, even, Linq to Entities).
When doing data access i tend to split it into 2 tiers - purely for testability.
totally seperate the logic for getting key values and managing the really low level data collection from the atomic inserts, updates, selects deletes etc.
This way you can test the logic of the low level data collection very easily without needing to read and write from a database.
this way one layer of classes effectively manages writes to individual tables whilst the other is concerned with getting the data from lookups etc to populate these tables
The Business logic layer that sits on top of these 2 dal layers obviously manages the actual business logic - this means that the datastructure is as seperated from the business logic as is realistically possible ... Ie you could replace the dal and not feel the pain so much.
the 2 routes you can take that work well are
ADO.Net
this is very powerful as you have total control, but at the same time it is time consuming and feels repetative. Also its old school so most people are bored of it hence all the linq 2 sql comments. With this you open a connection to the DB and then execute a command against it.
Basically you create a class to interface with the database and use this to use stored procedures that are in the database. The lowest level class essentially fires off the command with its parameters and then populates itself with the returned values.
and Linq 2 SQL
This is a cool system. Essentially it makes SP's redundant for 90% of cases in return for allowing strongly typed sqlesque statements in your code - save time and are more reliable. I still use 2 dal layers with this but take advantage of the fact that it will generate the basic class with properties for you and simply add functionality to actually do the atomic operations. The higher level then implements the read and write logic for multiple objects.
The nicest part is that you can generate collections of collections easily with linq 2 sql and then write all the inserts and updates with one command (altohguh in reality you tend to do things seperatley).
L2S is powerful once you start playing with it wheras generating a collection of objects from ado.net can be a real pain in comparison - especially when you have to do it again and again.
Another alternative is Linq 2 entities
I ahve had problems with this due to linked servers, also it doesn't like views much and if your tables dont have pk's or constraints then it doesn't like life much either. Id stay clear of it for a while.
Of course if you mean that you want a generic class for writing and reading data from a database I think you will be adding complexity rather than solving a problem. Really you can;t avoid writing code ;) - each bit of data access is unique, trying to genericise it past ado.net or l2s is really asking for trouble imo.
Small project:
A singleton class (like DatabaseConnection) might be good for what you're doing.
Large project:
Enterprise Library has some database code; NHibernate or Entities Framework, perhaps.
Your question wasn't specific enough to give a very definitive answer on this.