I'm learning DDD (domain-driven design) and the repository pattern (in C#). I would like to use the repository pattern to persist an entity without caring which database is actually used (Oracle, MySQL, MongoDB, RavenDB, etc.). I am, however, not sure how to handle the database-specific ids that most (all?) databases use. RavenDB, for example, requires that each entity it stores has an id property of type string. Others may require an id property of type int. Since this is handled differently by different databases, I cannot make the database id a part of the entity class. But it would have to exist at some point, at least when I store the actual entity. What is the best practice here?
The idea I am currently pursuing is to implement, for each database I want to support, database-specific "value objects" for each business object type. These value objects would then have the database-specific id property, and I would map between the two upon reads and writes. Does this seem like a good idea?
This is the classic case of a leaky abstraction. You can't abstract away the type of database behind a repository interface unless you are willing to lose all the good things that come with each database. The requirements on the ID type (string, Guid or whatever) are only the tip of a huge iceberg, with the majority of its mass under the muddy water.
Think about transaction handling, concurrency and other stuff. I understand your point about persistence ignorance. It is certainly a good thing for the domain model not to depend on a specific database technology. But you also can't get rid of every dependency on persistence technology.
It's relatively easy to make your domain model work well with any RDBMS. Most of them have standardized data types. Using an ORM like NHibernate will help you a lot. It's much harder to achieve the same among NoSQL databases because they tend to differ a lot (which is actually a very good thing).
So my advice would be to research which persistence technologies you will realistically have to deal with, and then choose an appropriate level of abstraction for the persistence subsystem.
If this won't work for you, think about Event Sourcing. The event store is one of the least demanding persistence techniques. Using a library such as Jonathan Oliver's EventStore will allow you to use virtually any storage technology, including the file system.
I would go ahead and create an int Id field in the entity and then convert that to a string in the repository where the Id must be a string. I think the effort to abstract your persistence is very worthwhile and actually eases maintenance.
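A minimal sketch of that conversion, assuming a store that only accepts string keys. The Dictionary stands in for the real database, and the names Customer, StringKeyedCustomerRepository and the "customers/" key prefix are made up for illustration:

```csharp
using System.Collections.Generic;
using System.Globalization;

// Domain entity: knows only about its own int identity.
public class Customer
{
    public int Id { get; }
    public string Name { get; }

    public Customer(int id, string name)
    {
        Id = id;
        Name = name;
    }
}

// Hypothetical repository for a store that requires string keys.
// A Dictionary stands in for the real database here.
public class StringKeyedCustomerRepository
{
    private readonly Dictionary<string, Customer> _store =
        new Dictionary<string, Customer>();

    // The only place that knows about the string key format.
    private static string ToKey(int id) =>
        "customers/" + id.ToString(CultureInfo.InvariantCulture);

    public void Save(Customer customer) => _store[ToKey(customer.Id)] = customer;

    // Returns null when no customer with that id has been saved.
    public Customer FindById(int id) =>
        _store.TryGetValue(ToKey(id), out var customer) ? customer : null;
}
```

Callers only ever pass and receive the int id; swapping in a store with a different key type would mean changing ToKey and nothing in Customer.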
You are doing the right thing! Abstract yourself away from the constraints of the database's primary key types!
Don't try to translate types, just use a different field.
Specifically: Do not try to use the database's primary key, except in your data access logic. If you need a friendly ID for an object, just create an additional field, of whatever type you like, and require your database to store that. Only in your data access layer would you need to find & update the DB record(s) based on your object's friendly ID. Easy.
Then, your constraints on which databases can persist your objects have changed from 'must be able to have a primary key of type xxxx' to simply 'must be able to store type xxxx'. I think you'll then find you can use any database in the world. Happy coding! DDD is the best!
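Here is a rough sketch of the "different field" idea, assuming an in-memory list in place of a real table. Invoice, InvoiceRow and InvoiceRepository are hypothetical names; the point is only that the database's primary key never leaves the data access code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Domain object: carries only the identifier chosen by the domain.
public class Invoice
{
    public Guid InvoiceId { get; }
    public decimal Amount { get; set; }

    public Invoice(Guid invoiceId, decimal amount)
    {
        InvoiceId = invoiceId;
        Amount = amount;
    }
}

// Storage row: the database's own primary key lives only here.
public class InvoiceRow
{
    public long PrimaryKey { get; set; }  // assigned by the (simulated) database
    public Guid InvoiceId { get; set; }   // the friendly id, stored as ordinary data
    public decimal Amount { get; set; }
}

// Data access layer: finds and updates rows by the friendly id.
public class InvoiceRepository
{
    private readonly List<InvoiceRow> _table = new List<InvoiceRow>();
    private long _nextKey = 1;

    public int RowCount => _table.Count;

    public void Save(Invoice invoice)
    {
        var row = _table.FirstOrDefault(r => r.InvoiceId == invoice.InvoiceId);
        if (row == null)
        {
            row = new InvoiceRow { PrimaryKey = _nextKey++, InvoiceId = invoice.InvoiceId };
            _table.Add(row);
        }
        row.Amount = invoice.Amount;
    }

    // Returns null when no row carries that friendly id.
    public Invoice Find(Guid invoiceId)
    {
        var row = _table.FirstOrDefault(r => r.InvoiceId == invoiceId);
        return row == null ? null : new Invoice(row.InvoiceId, row.Amount);
    }
}
```

In a real database you would probably want an index on the friendly id column, since every lookup goes through it.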
You can potentially keep the ids in the entity without exposing them as part of the entity's public interface. This is possible with NHibernate because it allows you to map a table column to a private field.
So you can potentially have something like
class Customer {
private readonly Int32? _relationalId;
private readonly String? _documentId;
...
This is not ideal because your persistence logic 'bleeds' into the business logic, but given the requirements it is probably easier and more robust than maintaining a mapping between the entity and its id somewhere outside the entity.
I would also highly recommend that you evaluate a "database agnostic" approach, which would be more realistic if you only want to support relational databases. In that case you can at least reuse an ORM like NHibernate for your repository implementation, and most relational databases support the same id types. In your scenario you need not only an ORM but also something like an "Object-Document-Mapper", so I can see that you will have to write tons and tons of infrastructure code. I highly recommend that you reevaluate your requirements and choose between relational and document databases. Read this: Pros/cons of document-based databases vs. relational databases
While trying to build a sample project using DDD, I'm facing an issue:
To validate zip codes, addresses, etc., I have a set of DB tables (20 tables, hundreds of columns, 26 MB) that I would like to query.
Those tables are not related to my domain. They have their own connection string and can be stored outside of the persistence DB.
I was thinking of adding a connection string to the Core and using a simple ORM or raw SQL query to validate the data.
The process is easier to write in C# than in SQL, so there is no stored procedure to do the job.
There is no modification on those data. Only querying.
I think it's important to remember that DDD doesn't have to apply to everything you do. If you have a complex problem domain that is worthy of the complexities DDD brings, that's fine. However it's also fine to have other areas of your software (other boundaries, essentially) that are CRUD. In fact, CRUD is best where you can get away with it because of the simplicity. As @D.R. said, you can load data using something more akin to a Transaction Script (I can see something like IZipCodeValidator in your future) and pass the results of that in where you need them, or you might consider your Application Service being allowed to go and get that ZipCode data using CRUD (IZipCodeRepository) and passing that in to a full-on Domain Object that has complex rules for the validation.
I believe it's a DDD purist view to try and avoid passing things to methods on Domain Objects that do things (e.g. DomainObject.ValidateAddress(address, IZipCodeRepository repo)), instead preferring to pass in the values useful for the validation (e.g. DomainObject.ValidateAddress(address, IEnumerable<ZipCode> zipcodes)). I think anyone could see the potential for performance issues there, so your mileage may vary. I'll just say to resist it if you can.
This sounds like a bounded context of its own. I'd probably query it from the core domain using an anti-corruption layer in-between. So your domain simply uses an interface to a service. Your application layer would implement this interface with the anti-corruption layer to the other bounded context. The implementation can use simple DB query mechanisms like ADO.NET queries.
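As a sketch of what that could look like: the domain owns a small interface, and an adapter outside the domain translates the reference tables into a yes/no answer. IZipCodeValidator is the name floated above; ReferenceDataZipCodeValidator and the lookup delegate are assumptions, with the delegate standing in for whatever ADO.NET query you would actually run:

```csharp
using System;
using System.Collections.Generic;

// Domain-side interface: the core domain only ever sees this.
public interface IZipCodeValidator
{
    bool IsValid(string zipCode, string city);
}

// Hypothetical anti-corruption layer. The delegate stands in for a real
// query against the reference tables (e.g. an ADO.NET command); nothing
// about those tables' schema leaks past this class.
public class ReferenceDataZipCodeValidator : IZipCodeValidator
{
    private readonly Func<string, IEnumerable<string>> _lookupCitiesByZip;

    public ReferenceDataZipCodeValidator(Func<string, IEnumerable<string>> lookupCitiesByZip)
    {
        _lookupCitiesByZip = lookupCitiesByZip;
    }

    public bool IsValid(string zipCode, string city)
    {
        foreach (var knownCity in _lookupCitiesByZip(zipCode))
        {
            if (string.Equals(knownCity, city, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}
```

Because the connection string belongs to the adapter's constructor arguments rather than to the Core, the domain project never needs to know that a second database exists.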
I'm developing a .NET web service while trying to maintain a layered architecture that keeps my Models in one project and DB access (DAL) in a different one. The idea behind this is that if I have to change the DB technology, it's just a matter of creating a different DAL, while the rest of the application remains unchanged.
In the data-access layer that I'm developing, I am using the MongoDB C# driver.
I've seen that:
Properties named "ID" will be mapped by the C# driver as the database's "_id" (Convention over configuration);
Int + Auto-increment in MongoDB is not a good idea;
Using Guids as IDs in MongoDB isn't a good idea either;
The recommended data type for the ID of documents stored in MongoDB is ObjectId. The C# driver provides a class to represent this;
However, if I use this data type (from MongoDB.Bson) in my Models, then they will become dependent on the MongoDB C# Driver and I don't want that: I want my models to be DB-independent; only my DALs can depend on whatever data access technologies I use.
So what data type should I use for my POCOs' IDs in order to guarantee uniqueness in the DB? Would a string representation of a Guid be horrible in terms of performance?
Your feedback is welcome.
Good question.
From experience, I can say that you're right: neither GUIDs nor auto-increment are the best idea (with GUIDs being a lot better than auto-increments), not only for the reason mentioned in the SO question you linked to, but mostly because you need to be aware of the implications of monotonic vs. non-monotonic keys.
With the ObjectIds, I see three options:
Map between domain model and DAL. In the domain model, you could use the ObjectId's string representation. That's a bit annoying, but it forces you into a separation of concerns.
Use your own data type and implement a type converter / mongodb serializer. I haven't tried that but I don't see why this wouldn't work.
Accept the MongoDB dependency. After all, if you really swap out your database, that will be a huge task. Different databases have very different characteristics and require very different data models. The whole "swap out the database" in a minute is bogus IMHO, it's never that easy and a database is a much leakier abstraction than anyone wants to admit. Trying to keep independent is a PITA. Anyway, doing a seek-and-destroy on the word ObjectId will be less than 1% of the other work.
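The first option could be sketched roughly as below. To keep the sketch compilable without the driver, a 12-byte array stands in for MongoDB.Bson.ObjectId (whose string form is 24 hex characters); in a real DAL you would use the driver's own type and its parsing and ToString methods instead of the hand-rolled hex helpers. Product, ProductDocument and ProductMapper are made-up names:

```csharp
using System;

// Domain model: no reference to any database driver; the id is a plain string.
public class Product
{
    public string Id { get; }
    public string Name { get; }

    public Product(string id, string name)
    {
        Id = id;
        Name = name;
    }
}

// DAL-side document. Assumption: byte[12] is a stand-in for the driver's
// ObjectId type so this sketch has no external dependency.
public class ProductDocument
{
    public byte[] Id { get; set; }
    public string Name { get; set; }
}

// The mapper lives in the DAL and is the only code that sees both shapes.
public static class ProductMapper
{
    public static Product ToDomain(ProductDocument doc) =>
        new Product(ToHex(doc.Id), doc.Name);

    public static ProductDocument ToDocument(Product product) =>
        new ProductDocument { Id = FromHex(product.Id), Name = product.Name };

    private static string ToHex(byte[] bytes) =>
        BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();

    private static byte[] FromHex(string hex)
    {
        if (hex.Length != 24)
            throw new FormatException("Expected a 24-character hex id.");
        var bytes = new byte[12];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return bytes;
    }
}
```

The annoying part mentioned above is visible here: every document type needs a pair of mapping methods, which is the price of keeping the Models project free of the driver.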
I am currently in the process of developing a rather big web application and am using domain-driven design.
I have currently run into some trouble with tracking changes to my Product entity. The thing is, products are constructed partly from data in SQL Azure, partly from data in Azure Table Storage. If certain properties are changed, I will need to persist to both; other changes only to one.
As a result I cannot use NHibernate or Entity Framework for tracking changes. For instance the Price argument on the
public void AddPrice(Price price)
method on the Product entity must be persisted to SQL Azure; calculations on a range of prices will take place and the result will be saved to Azure Table Storage.
How would you solve this?
Thoughts:
1) I thought about implementing my own change tracker based on Castle.DynamicProxy, but that seems rather tedious.
2) Implement events internally in the domain entities. This is not a good thing.
Scattering one entity across several persistent stores might not be a good idea. To be more precise, it might mean that it's not one and the same entity and could be split up in smaller, more accurately designed parts instead.
calculations on a range of prices will take place
Are you sure these calculations affect the Product entity and should be handled by the same NHibernate/EF session used in the Product repository? Since they have to be stored elsewhere, don't they make up a first-class notion in the ubiquitous language, resulting in a separate entity with a persistence logic of its own?
See http://ayende.com/blog/153699/ask-ayende-repository-for-abstracting-multiple-data-sources
What do ORMs do? They take a copy of the data that's used to restore your object into its current state, just before they hand you a reference to the object. When behavior has been applied to the object and you're asking to persist it, the ORM will compare its copy of the data to the data currently inside the object and flush changes accordingly. Why not do the same? The only difference is that not all detected changes will be flushed to the same datastore.
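A bare-bones sketch of that snapshot-and-compare idea, under some loud assumptions: the Product below has just three properties, the snapshot is a hand-written dictionary rather than anything reflection-based, and the routing table that says which store owns which property ("sql" vs. "table-storage") is invented for illustration:

```csharp
using System.Collections.Generic;

public class Product
{
    public string Name { get; set; }
    public decimal Price { get; set; }
    public decimal AveragePrice { get; set; }
}

// Takes a copy of the entity's data when tracking starts and
// compares against it later, the way an ORM's change tracker does.
public class TrackedProduct
{
    private readonly Dictionary<string, object> _snapshot;

    public Product Entity { get; }

    public TrackedProduct(Product entity)
    {
        Entity = entity;
        _snapshot = Capture(entity);
    }

    private static Dictionary<string, object> Capture(Product p) =>
        new Dictionary<string, object>
        {
            ["Name"] = p.Name,
            ["Price"] = p.Price,
            ["AveragePrice"] = p.AveragePrice
        };

    // Names of the properties whose current value differs from the snapshot.
    public List<string> ChangedProperties()
    {
        var changed = new List<string>();
        foreach (var pair in Capture(Entity))
        {
            if (!Equals(pair.Value, _snapshot[pair.Key]))
                changed.Add(pair.Key);
        }
        return changed;
    }
}

public static class ChangeRouter
{
    // Hypothetical routing table: which store owns which property.
    private static readonly Dictionary<string, string> StoreByProperty =
        new Dictionary<string, string>
        {
            ["Name"] = "sql",
            ["Price"] = "sql",
            ["AveragePrice"] = "table-storage"
        };

    // The set of stores that need a write for the given changes.
    public static HashSet<string> StoresToFlush(IEnumerable<string> changedProperties)
    {
        var stores = new HashSet<string>();
        foreach (var property in changedProperties)
            stores.Add(StoreByProperty[property]);
        return stores;
    }
}
```

The repository would wrap each loaded Product in a TrackedProduct and, on save, write only to the stores that StoresToFlush returns. Note that writes to two stores are not atomic, which is where the concurrency question becomes relevant.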
HTH.
BTW, any concurrency going on here?
I am starting my first project using DDD (using C#) and at this stage I feel we will probably go with MongoDB or maybe CouchDB for the persistence (an ORM like Entity Framework seems like overkill for what we want), but having said that, I have pretty much zero experience with MongoDB or CouchDB at this stage.
As I am creating my domain I thought about using GUIDs as the IDs for my entities (coming from a relational database world, I am still having trouble moving away from it).
If I go down this route, will I be able to easily plug in my persistence layer (MongoDB/CouchDB), or would I have to change my domain model? (Currently the constructors on my entity objects take a string ID as a parameter, which will be the GUID.)
JD
With MongoDB you probably want to have a collection per aggregate root, which means that your aggregate roots need ids, since they will be the documents in the DB. If you want to keep your domain model free of MongoDB-specific code, those ids can be strings.
I would not include the ids in the constructor arguments. I would just let them be writable properties. As with an ORM I would consider handling reading and storing of entities via repositories. And keep the MongoDB code in there.
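Roughly, that could look like the sketch below. Order, IOrderRepository and the in-memory implementation are hypothetical; a MongoDB- or CouchDB-backed class would implement the same interface and be the only place that references the driver. Generating the id from a Guid inside Save is one possible choice, not a requirement:

```csharp
using System;
using System.Collections.Generic;

// Aggregate root: the id is a writable string so the repository can
// assign it, and the model stays free of any driver-specific type.
public class Order
{
    public string Id { get; set; }
    public string CustomerName { get; set; }
}

public interface IOrderRepository
{
    void Save(Order order);
    Order GetById(string id);
}

// In-memory stand-in for a document-database-backed repository.
public class InMemoryOrderRepository : IOrderRepository
{
    private readonly Dictionary<string, Order> _documents =
        new Dictionary<string, Order>();

    public void Save(Order order)
    {
        // New aggregates get their id on first save.
        if (string.IsNullOrEmpty(order.Id))
            order.Id = Guid.NewGuid().ToString("N");
        _documents[order.Id] = order;
    }

    // Returns null when no order with that id has been saved.
    public Order GetById(string id) =>
        _documents.TryGetValue(id, out var order) ? order : null;
}
```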
I'm trying to brush up on my design pattern skills, and I'm curious what are the differences between these patterns? All of them seem like they are the same thing - encapsulate the database logic for a specific entity so the calling code has no knowledge of the underlying persistence layer. From my brief research all of them typically implement your standard CRUD methods and abstract away the database-specific details.
Apart from naming conventions (e.g. CustomerMapper vs. CustomerDAO vs. CustomerGateway vs. CustomerRepository), what is the difference, if any? If there is a difference, when would you choose one over the other?
In the past I would write code similar to the following (simplified, naturally - I wouldn't normally use public properties):
public class Customer
{
    public long ID;
    public string FirstName;
    public string LastName;
    public string CompanyName;
}
public interface ICustomerGateway
{
    IList<Customer> GetAll();
    Customer GetCustomerByID(long id);
    bool AddNewCustomer(Customer customer);
    bool UpdateCustomer(Customer customer);
    bool DeleteCustomer(long id);
}
and have a CustomerGateway class that implements the specific database logic for all of the methods. Sometimes I would not use an interface and make all of the methods on the CustomerGateway static (I know, I know, that makes it less testable) so I can call it like:
Customer cust = CustomerGateway.GetCustomerByID(42);
This seems to be the same principle for the Data Mapper and Repository patterns; the DAO pattern (which is the same thing as Gateway, I think?) also seems to encourage database-specific gateways.
Am I missing something? It seems a little weird to have 3-4 different ways of doing the same exact thing.
Your example terms (DataMapper, DAO, TableDataGateway and Repository) all have a similar purpose (when I use one, I expect to get back a Customer object), but different intent/meaning and resulting implementation.
A Repository "acts like a collection, except with more elaborate querying capability" [Evans, Domain Driven Design] and may be considered as an "objects in memory facade" (Repository discussion)
A DataMapper "moves data between objects and a database while keeping them independent of each other and the mapper itself" (Fowler, PoEAA, Mapper)
A TableDataGateway is "a Gateway (object that encapsulates access to an external system or resource) to a database table. One instance handles all the rows in the table" (Fowler, PoEAA, TableDataGateway)
A DAO "separates a data resource's client interface from its data access mechanisms / adapts a specific data resource's access API to a generic client interface" allowing "data access mechanisms to change independently of the code that uses the data" (Sun Blueprints)
Repository seems very generic, exposing no notion of database interaction.
A DAO provides an interface enabling different underlying database implementations to be used.
A TableDataGateway is specifically a thin wrapper around a single table.
A DataMapper acts as an intermediary enabling the Model object to evolve independently of the database representation (over time).
There is a tendency in software design world (at least, I feel so) to invent new names for well-known old things and patterns. And when we have a new paradigm (which perhaps slightly differs from already existing things), it usually comes with a whole set of new names for each tier. So "Business Logic" becomes "Services Layer" just because we say we do SOA, and DAO becomes Repository just because we say we do DDD (and each of those isn't actually something new and unique at all, but again: new names for already known concepts gathered in the same book). So I am not saying that all these modern paradigms and acronyms mean EXACTLY the same thing, but you really shouldn't be too paranoid about it. Mostly these are the same patterns, just from different families.
Data Mapper vs Table Data Gateway
To make a long story short:
the Data Mapper receives the Domain Model object (Entity) as a parameter and uses it to implement the CRUD operations
the Table Data Gateway receives all the parameters for its methods as primitives and knows nothing about the Domain Model object (Entity).
In the end both of them will act as mediator between the in-memory objects and the database.
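The difference shows up most clearly in the method signatures. In this sketch both classes use a Dictionary as a stand-in for the database table, and the names CustomerMapper and CustomerGateway are only illustrative:

```csharp
using System.Collections.Generic;

public class Customer
{
    public long Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Data Mapper: takes and returns domain objects.
public class CustomerMapper
{
    private readonly Dictionary<long, string[]> _rows =
        new Dictionary<long, string[]>();

    public void Insert(Customer customer) =>
        _rows[customer.Id] = new[] { customer.FirstName, customer.LastName };

    // Rebuilds a Customer from the stored row, or returns null if absent.
    public Customer Find(long id) =>
        _rows.TryGetValue(id, out var row)
            ? new Customer { Id = id, FirstName = row[0], LastName = row[1] }
            : null;
}

// Table Data Gateway: takes primitives and returns raw row data;
// it has no knowledge of the Customer class at all.
public class CustomerGateway
{
    private readonly Dictionary<long, string[]> _rows =
        new Dictionary<long, string[]>();

    public void Insert(long id, string firstName, string lastName) =>
        _rows[id] = new[] { firstName, lastName };

    // Returns the raw column values, or null if the row is absent.
    public string[] FindRow(long id) =>
        _rows.TryGetValue(id, out var row) ? row : null;
}
```

With the gateway, the calling code is responsible for turning rows into objects; with the mapper, that translation is the mapper's job.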
You have a good point. Pick the one you are most familiar with. I would like to point out a few things that may help clarify.
The Table Data Gateway is used mainly for a single table or view. It contains all the selects, inserts, updates, and deletes. So Customer is a table or a view in your case, and one instance of a table data gateway object handles all the rows in the table. Usually this means one object per database table.
The Data Mapper, by contrast, is more independent of any domain logic and is less coupled (although I believe there either is coupling or there is not). It is merely an intermediary layer that transfers data between objects and a database while keeping them independent of each other and of the mapper itself.
So, typically in a mapper, you see methods like insert, update, delete and in table data gateway you will find getcustomerbyId, getcustomerbyName, etc.
The Data Transfer Object differs from the above two patterns, mainly because it is a distribution pattern and not a data source pattern like the other two. Use it mainly when you are working with a remote interface and need to make your calls less chatty, as each call can get expensive. So you usually design a DTO that can be serialized over the wire and can carry all the data back to the server for applying further business rules or processing.
I am not well versed in the repository pattern as I have not had a chance to use it so far, but I will be looking at others' answers.
Below is just my understanding.
TableGateWay/RowDataGateWay:
In this context, Gateway refers to a specific implementation in which each "domain object" maps to its own "domain object gateway". For example, if we have Person, then we will have a PersonGateway to store the domain object Person in the database. If we have Person, Employee, Customer, etc., we will have PersonGateway, EmployeeGateway, and CustomerGateway. Each gateway has specific CRUD functions for its object and has nothing to do with the other gateways. There is no reusable code/module here. The gateway can be further divided into RowDataGateway or TableGateway, depending on whether you pass an "id" or an "object". Gateway is usually compared with Active Record. It ties your domain model to the database schema.
Repository/DataMapper/DAO: They are the same thing. They all refer to the persistence layer that translates database entities into the domain model. Unlike a gateway, the Repository/DataMapper/DAO hides the implementation. You don't know if there is a PersonGateway behind Person. There may or may not be one; you don't care. All you know is that it must support CRUD operations for each domain object. It decouples the data source from the domain model.