I'm starting with DDD and trying to apply it in my current project, and as you can imagine I have thousands of questions.
Here I present a sample domain so I can ask different questions; it also serves as an exercise through which you can explain how things can be done.
Our hypothetical system must keep track of companies and which people work at each one.
Domain.
Company(id, name, address)
Employee(id, name, surname, age)
A person can work at only one company, and a company can have many employees working at it.
Operations
The system must allow adding a new employee to a company. For this it receives the company id and the name, surname, and age of the new employee. There are some restrictions to satisfy:
There cannot be another employee with the same name, surname, and age in the same company.
The employee may already be working at another company.
Questions
I have a mess in my mind :)
To implement the operation I'm thinking of:
The service receives all the parameters.
The service calls CompanyRepository->findCompanyById to retrieve the company instance.
The service creates a new instance of Employee using the specified parameters.
The service calls company->addEmployee to attach the employee to the company.
Within company->addEmployee, it checks that the new employee satisfies the conditions (specifications).
The service calls CompanyRepository->save(company) to persist the company along with the employee.
Because company+employee is managed as a cluster (aggregate), I'm considering the company the aggregate root.
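The flow above can be sketched end to end. Here is a minimal in-memory illustration in Java; all class, method, and field names are assumptions for the exercise, not a definitive implementation:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Employee is a simple entity inside the Company aggregate.
class Employee {
    final String name, surname;
    final int age;
    Employee(String name, String surname, int age) {
        this.name = name; this.surname = surname; this.age = age;
    }
}

// Company is the aggregate root; it owns the invariant "no duplicate employee".
class Company {
    final long id;
    private final List<Employee> employees = new ArrayList<>();

    Company(long id) { this.id = id; }

    void addEmployee(Employee candidate) {
        boolean duplicate = employees.stream().anyMatch(e ->
                e.name.equals(candidate.name)
                && e.surname.equals(candidate.surname)
                && e.age == candidate.age);
        if (duplicate) {
            throw new IllegalArgumentException("Duplicate employee in company " + id);
        }
        employees.add(candidate);
    }

    int employeeCount() { return employees.size(); }
}

// In-memory stand-in for CompanyRepository.
class CompanyRepository {
    private final Map<Long, Company> store = new HashMap<>();
    Company findCompanyById(long id) { return store.get(id); }
    void save(Company company) { store.put(company.id, company); }
}

// The application service wires the steps together.
class EmployeeService {
    private final CompanyRepository repository;
    EmployeeService(CompanyRepository repository) { this.repository = repository; }

    void addEmployeeToCompany(long companyId, String name, String surname, int age) {
        Company company = repository.findCompanyById(companyId); // step 2
        Employee employee = new Employee(name, surname, age);    // step 3
        company.addEmployee(employee);                           // steps 4-5: invariant checked in the aggregate
        repository.save(company);                                // step 6: persist the cluster
    }
}
```

Note the uniqueness check here scans the in-memory collection, which is exactly the scaling problem raised in the questions that follow; a real implementation would push that check down into the data store.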
Is this a good implementation?
If I consider company+employee an aggregate, then just as I described saving the company+employee cluster, must I also retrieve all the related employees when I retrieve the company instance from the repository?
Regarding specifications, I can easily understand how to check, for example, whether an employee name has more than 10 characters, but how do I check whether the employee already exists in the same company when the company has thousands of employees?
Can a specification call repository operations? If so, and if treating company+employee as a cluster is right, what would be the right place: CompanyRepository->findEmployeeByName(idCompany, nameEmployee), or is it better to create a specific EmployeeRepository?
1
What is good or bad is just an opinion. DDD shouldn't be a dogma; take the best of it, add your own ideas, and build a good software architecture.
2
No. The company can either lazily load employees (i.e. the first time the runtime accesses the Employees property after the company has been retrieved), or the company can implement a Load method that supports pagination so only the required employees are loaded.
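A minimal sketch of the lazy-loading option, in Java; the Supplier stands in for the repository's (possibly paginated) query, and all names are invented:

```java
import java.util.List;
import java.util.function.Supplier;

// Employees are represented as plain strings to keep the sketch short.
class LazilyLoadedCompany {
    private List<String> employees;               // null until first access
    private final Supplier<List<String>> loader;  // provided by the repository

    LazilyLoadedCompany(Supplier<List<String>> loader) {
        this.loader = loader;
    }

    List<String> getEmployees() {
        if (employees == null) {      // first access triggers the load
            employees = loader.get(); // a paginated Load method would go here instead
        }
        return employees;
    }
}
```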
3
You should implement this at the repository level (for example ICompanyRepository.ContainsEmployee(Employee)) and let the underlying data mapper perform this heavy operation.
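One way to sketch that repository-level check, in Java with an in-memory stand-in; a real implementation would delegate to an indexed query in the data store, and all names here are assumptions:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

interface CompanyEmployeeIndex {
    boolean containsEmployee(long companyId, String name, String surname, int age);
}

// In-memory stand-in: companyId -> set of identity keys, mimicking an indexed
// table so the existence check never loads thousands of employee objects.
class InMemoryCompanyEmployeeIndex implements CompanyEmployeeIndex {
    private final Map<Long, Set<String>> index = new HashMap<>();

    private static String key(String name, String surname, int age) {
        return name + "|" + surname + "|" + age;
    }

    void addEmployee(long companyId, String name, String surname, int age) {
        index.computeIfAbsent(companyId, k -> new HashSet<>())
             .add(key(name, surname, age));
    }

    @Override
    public boolean containsEmployee(long companyId, String name, String surname, int age) {
        return index.getOrDefault(companyId, Set.of())
                    .contains(key(name, surname, age));
    }
}
```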
Also, I tend to treat specifications as pre-/post-conditions in repositories, because that's the only way to be sure they will be fulfilled in all cases.
4
I would avoid it, but if you need to enforce a lot of domain rules, you'll need to use them in specifications. BTW, I wouldn't use repositories for this; I would use services instead.
The right place depends on the requirement. If you want to check that domain objects are stored in the repository in a valid state, I see no issue in calling a specification from within ICompanyRepository.Add, ICompanyRepository.Update, or ICompanyRepository.AddOrUpdate. The point is that you shouldn't verify object state both when objects are stored in the repository and when they're retrieved. If your specifications and your code are reliable, and the repository has filtered domain objects before storing them in the underlying data store, you can be sure that read operations will return valid domain objects.
Side note: while you shouldn't model your domain based on the underlying data store (relational, NoSQL, file system...), again, you shouldn't apply DDD as a dogma. If the underlying data store provides a better way to define data constraints, I would use it instead of implementing complex specifications that might need to access the data anyway.
Your solution should balance an optimal software architecture against runtime performance.
Related
First of all I'm using manual queries.
I'm having some issues organizing my service and repository layers. The problem is that I have some rather large aggregate objects that only need basic data from their constituent objects in display mode, but when being processed need all their child objects fully developed to avoid null errors and other business-logic errors.
So for example a Job has Customer, Supplier, Site, Machine, User, Run, Store, Component, Document, CheckSheet, Requisitions, OrderItems.
When displaying a Job we want to have things like customer name, supplier name, user, component description, store name, machine number, site name, number of checksheets. As you can see it's mostly just descriptions. However when I want to process a Job I need all of those objects fully developed.
Currently (and I don't like this) when fetching a Job from the JobRepository, the repository fully develops the Job object itself but fills the child objects with display values only; when I want to process a job I then call the respective repository for each child object to get a full object. There are some problems with this approach:
If we add a field to an object, I potentially have to look for changes in more than one place, which somewhat defeats the purpose of a repository.
I can only have two 'modes' of retrieval: fully developed or display only. What if I want some of the child objects to be more developed than others? For example, when displaying OrderItems I'd like the Part object within each one fully developed, with prices and all.
I feel like things will soon get out of control, with duplication everywhere, and I need a better long-term solution. Any help would be appreciated.
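One common answer to this bind is to split reads from processing: a flat summary DTO served by a dedicated query for display, and the fully developed aggregate from the repository only when a Job is actually processed. A minimal Java sketch; every name here is invented for illustration:

```java
// Flat read model: only the descriptions the screen needs.
class JobSummary {
    final String customerName;
    final String supplierName;
    final int checkSheetCount;

    JobSummary(String customerName, String supplierName, int checkSheetCount) {
        this.customerName = customerName;
        this.supplierName = supplierName;
        this.checkSheetCount = checkSheetCount;
    }
}

interface JobQueries {
    JobSummary getSummary(long jobId); // display mode: one flat query
}

// Stand-in implementation; in real code this would be a single projection/join,
// leaving the repository to return the fully developed Job for processing only.
class InMemoryJobQueries implements JobQueries {
    @Override
    public JobSummary getSummary(long jobId) {
        return new JobSummary("ACME Corp", "Parts Ltd", 3);
    }
}
```

With this split, adding a field to a domain object only touches the repository, and each screen can get its own tailored summary instead of a one-size-fits-all "display mode".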
So I am trying to follow the Domain Driven Design approach and am not sure how to handle lookup tables that require CRUD operations. Here is a contrived example to demonstrate my question.
Let's say I have a Person class
public class Person
{
public string Address { get; private set; }
}
My database has a table of People and a table of Addresses. The People table has a column for Address_Id which is a foreign key to the Address table. The obvious idea being you can't add a person record to the People table with an address value that doesn't exist in the Addresses table since there is a foreign key relationship.
In my application, Person is an aggregate root and thus has an associated repository. Using the repository, I load people which also loads the associated address.
My question is how do I do CRUD operations on the Addresses table? Following DDD principles, I don't think I am supposed to create a repository for the Addresses table.
The requirement for my application is that when creating a new Person object, a drop-down list of addresses is presented from which the user will select an address. They are not allowed to hand type addresses, they must pick from one that already exists. Hence, the CRUD operations on the Addresses table. The administrator of the system will need to manage the Addresses lookup table so that the proper selections are presented to a user creating Person objects.
Again, this is a very contrived example and I get that nobody would ever design such a system. It is simply to help illustrate the question.
IMO, you have two use cases: 1) saving Person objects, and, before that, 2) listing all available Addresses so the right one can be selected.
So, I would create an AddressRepository, maybe not a CRUD one, but one used only for fetching entities.
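Such a fetch-only AddressRepository could look like this (Java sketch with an in-memory backing list; names are assumptions):

```java
import java.util.List;
import java.util.Optional;

class Address {
    final long id;
    final String line;
    Address(long id, String line) { this.id = id; this.line = line; }
}

// Fetch-only: enough to fill the drop-down and resolve a selection,
// deliberately exposing no add/update/delete.
interface AddressLookup {
    List<Address> findAll();
    Optional<Address> findById(long id);
}

class InMemoryAddressLookup implements AddressLookup {
    private final List<Address> data;
    InMemoryAddressLookup(List<Address> data) { this.data = data; }

    @Override
    public List<Address> findAll() { return data; }

    @Override
    public Optional<Address> findById(long id) {
        return data.stream().filter(a -> a.id == id).findFirst();
    }
}
```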
Are you ever editing or retrieving addresses on their own? The repository is essentially a mapper from a business object to a relational database; it is supposed to encapsulate how the object is persisted so you don't have to worry about it.
If the object is persisted across multiple tables, the business logic does not have to know that; so unless you need to edit Address objects on their own, I wouldn't add a repository for Address.
Have a look at this: Repository Pattern Step by Step Explanation
Well, I guess the property string Address should really be Address Address (i.e. typed as an Address entity rather than a string).
In that case, when you store a Person via PersonRepository and a given Address doesn't exist in the underlying store, the repository, using its tech-specific implementation, will create the whole address record in your Addresses relational table for you.
Also, I guess you'll be using a repository over an existing data mapper (an OR/M), which should manage these cases easily: it's just a matter of mapping the whole property as a many-to-one association.
Actually, I believe a repository should store aggregate roots, as you mention in your own question.
It depends on your own domain. If an address can live on its own because it can be associated with 0 or more persons, you should consider adding addresses through a specific AddressRepository, registering them with it, and later associating one with some Person.
I'm currently in the process of migrating a legacy application to a domain driven design model and I came across this issue:
The application is about managing a large amount of contacts in real time including checking for duplicates. Whenever someone saves a new contact, it has to pass a duplicate check with a 3rd party software (it's basically a similarity search software). If it passes the check, the contact will be created in SQL and a small subset of the contact (some core fields which are relevant for duplicate checking) has to be stored in the database of the 3rd party software.
So the entity "contact" lives in two (synchronized) storage systems, but one system only has a small subset of fields whereas SQL has 50+ fields for the contact.
Now I was wondering whether it would be OK to create two types for "contact" (Contact and ContactShort). As a result I'd also have to create two repositories for those entities and use them in a domain service, which would ultimately be used to perform the operations that need the duplicate-checking software (like the Save/Insert methods).
Is there a good rule of thumb of how to approach such a scenario?
EDIT: I still haven't found a definitive solution but thought a bit more about it:
Maybe it was wrong to separate the duplicate-checking storage system from the SQL DB in this case. Actually, I think it is wrong to expose the methods of the 3rd-party software at all; it is pure infrastructure. Since a save operation must never be performed without the duplicate check, the calls to the 3rd-party software should be internal to the SQLRepository. They must never leave the infrastructure layer, since the 3rd-party system can never return a valid contact entity. What do you think?
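The conclusion in the edit can be sketched like this (Java; DuplicateChecker stands in for the 3rd-party similarity-search software, contacts are reduced to a string of core fields, and all names are invented):

```java
import java.util.ArrayList;
import java.util.List;

interface DuplicateChecker {
    boolean isDuplicate(String coreFields); // wraps the 3rd-party software
}

// The check is internal to the repository, so no caller can save a contact
// without passing it; the 3rd-party API never leaks out of infrastructure.
class ContactRepository {
    private final DuplicateChecker checker;
    private final List<String> sqlStore = new ArrayList<>();

    ContactRepository(DuplicateChecker checker) { this.checker = checker; }

    boolean save(String contactCoreFields) {
        if (checker.isDuplicate(contactCoreFields)) {
            return false;                    // rejected; nothing persisted
        }
        sqlStore.add(contactCoreFields);     // SQL insert would happen here,
        return true;                         // plus syncing core fields to the 3rd-party DB
    }

    int count() { return sqlStore.size(); }
}
```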
To me your suggested solution sounds good. At the lower level (data access layer) you should have two independent objects that wrap access to the two different databases: two repositories, since you require different connection strings. These can be two instances of the same XXXRepository if you use the same database engine, or two different repositories, XXXRepository and YYYRepository, if you access two different database engines.
On the upper level (domain layer and GUI), however, you shouldn't be bothered with how and where these data go. As you said, you have a service that hides the plumbing so that the application domain and upper layers (like the GUI) won't see what's happening below, in the data access layer.
First off, I am using web forms without any ORM framework.
I have been struggling with how to make my domain objects as "smart" and "rich" as possible without giving them access to my service and repository layers. My most recent attempt was creating a model for gift certificates for an online store.
The main recurring issues that I am seeing is that:
More and more logic keeps being introduced in the service layer. All calls to the repository must pass through the service layer, and each time the parameters are validated (e.g., exists in DB, etc.). As a result my service layer keeps growing, while my domain objects have just some simple contractual validations. Even object validation lives in the service layer, since if the ID of the item is null it checks the DB to ensure the code is unique. IMHO, the consumer of the system should not care whether the functionality they need touches persistence or not.
I have a separate POCO for transaction log entries for when a gift certificate is redeemed. I assume I should put a list or collection of these transactions as a property of my GiftCertificate model, but I am still unsure of when that property should be filled. Do I add a separate service method for loading the transactions into an object on demand (e.g., LoadTransactions(gc object)), or should the transactions be loaded automatically any time an existing gift certificate or list of gift certificates is requested (or maybe an option on getGCs to load transactions as well)?
What about computed fields like "Available Balance"... should I even have properties like this on my object? Any time I am working with the object, I would need to keep updating that property to ensure it is up to date. Right now I simply have a service method GetBalanceByCode(gc code).
Even actions like redeeming a gift certificate are basically 100% data-centric (take some input parameters, validate them, and add a transaction log entry to the DB).
More and more logic keeps being introduced in the service layer (...) Even object validation is in the service layer (...)
Validation is not the best candidate for a domain model element. Input (my personal preference is to represent it as commands) should be validated at the application service level. Domain logic should model how the business works and assume that all arguments are already valid. Good candidates for domain logic are computations, for example: you want them in one single place and well tested.
I have a separate POCO for transaction log entries for when a gift certificate is redeemed.
This kind of object is known as an Event. You can learn about Events from Eric Evans' 'What I learnt since the Blue Book' presentation. An Event is basically an entity which is immutable. Events are quite often aggregates in their own right, because usually there are lots of them. By making them aggregates, you avoid the problems of lazy loading them as part of another object's collection.
What about computed fields like "Available Balance"... should I even have properties like this on my object?
Computed properties are the kind of logic that naturally fits in the domain model; however, it's debatable whether it's better to compute the value each time or to compute it when the object changes and persist it in the DB.
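The compute-each-time option might look like this (Java sketch; the alternative would cache the balance on each redemption and persist it; all names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

class GiftCertificate {
    private final long initialCents;
    private final List<Long> redemptions = new ArrayList<>();

    GiftCertificate(long initialCents) { this.initialCents = initialCents; }

    void redeem(long cents) {
        if (cents > availableBalanceCents()) {
            throw new IllegalStateException("Insufficient balance");
        }
        redemptions.add(cents); // the transaction log entry
    }

    // Derived each time from the transaction log; never stored, so never stale.
    long availableBalanceCents() {
        return initialCents - redemptions.stream().mapToLong(Long::longValue).sum();
    }
}
```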
Even actions like redeeming a gift certificate are basically 100% data-centric (take some input parameters, validate them and add a transaction log entry to db).
This action would be modeled as creating a CertificateRedeemed event. The event would probably be created by the Certificate aggregate or some other object. This blog post by Udi Dahan can be helpful.
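An immutable CertificateRedeemed event could be as simple as this (Java sketch; the fields and accessor names are assumptions):

```java
import java.time.Instant;

// Immutable: all fields final, set once in the constructor, no setters.
final class CertificateRedeemed {
    private final String certificateCode;
    private final long amountCents;
    private final Instant occurredAt;

    CertificateRedeemed(String certificateCode, long amountCents, Instant occurredAt) {
        this.certificateCode = certificateCode;
        this.amountCents = amountCents;
        this.occurredAt = occurredAt;
    }

    String certificateCode() { return certificateCode; }
    long amountCents() { return amountCents; }
    Instant occurredAt() { return occurredAt; }
}
```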
This is not an entirely easy question to answer, given that domain models are very subjective and rely a lot on your... well, domain. It sounds like you are actually creating something similar to the Onion Architecture (and Part 2) described by Jeffrey Palermo. This is not a bad pattern to use, though DDD purists will tell you it leads to "anemic" domain models (where your domain objects are basically data holders with no behavior). The thing is, that may be exactly what you need in your scenario. A "full, rich" domain model may be overkill for what you are doing (and given your last bullet point, it sounds like that could be the case).
You may not need a domain model for your system at all. You could be well served with some view models (simple data models that describe your view) and have your UI send DTOs through your services to put the data in the database. If you find something that requires a more complex approach, you can apply a richer domain model to that component. Also remember that you don't necessarily have one domain model in your system. There can, and in many cases should, be different models that describe things differently (often grouped into Bounded Contexts). The overall goal of DDD is to simplify otherwise complex systems. If it's causing you additional complexity, you may be taking the long way round.
There is an approach called DCI (Data, Context, Interaction) which is proposed as an alternative to old-school OOP. Although it does not explicitly address persistence ignorance, your question brought it to mind because it deals with similar issues.
In DCI, domain objects are small data holders with only a little logic, as in your case, and the interactions between them are implemented separately. The algorithm of an interaction is not spread across small methods of several objects; it lives in one place, which can make it more lucid and understandable.
I think it is still a rather academic idea rather than a solution we should start implementing tomorrow, but someone who comes across this question might be interested.
I'm trying to brush up on my design pattern skills, and I'm curious what are the differences between these patterns? All of them seem like they are the same thing - encapsulate the database logic for a specific entity so the calling code has no knowledge of the underlying persistence layer. From my brief research all of them typically implement your standard CRUD methods and abstract away the database-specific details.
Apart from naming conventions (e.g. CustomerMapper vs. CustomerDAO vs. CustomerGateway vs. CustomerRepository), what is the difference, if any? If there is a difference, when would you choose one over the other?
In the past I would write code similar to the following (simplified, naturally - I wouldn't normally use public properties):
public class Customer
{
public long ID;
public string FirstName;
public string LastName;
public string CompanyName;
}
public interface ICustomerGateway
{
IList<Customer> GetAll();
Customer GetCustomerByID(long id);
bool AddNewCustomer(Customer customer);
bool UpdateCustomer(Customer customer);
bool DeleteCustomer(long id);
}
and have a CustomerGateway class that implements the specific database logic for all of the methods. Sometimes I would not use an interface and would make all of the methods on CustomerGateway static (I know, I know, that makes it less testable), so I could call it like:
Customer cust = CustomerGateway.GetCustomerByID(42);
This seems to be the same principle as the Data Mapper and Repository patterns; the DAO pattern (which is the same thing as Gateway, I think?) also seems to encourage database-specific gateways.
Am I missing something? It seems a little weird to have 3-4 different ways of doing exactly the same thing.
Your example terms, DataMapper, DAO, DataTableGateway and Repository, all have a similar purpose (when I use one, I expect to get back a Customer object), but different intent/meaning and resulting implementations.
A Repository "acts like a collection, except with more elaborate querying capability" [Evans, Domain Driven Design] and may be considered as an "objects in memory facade" (Repository discussion)
A DataMapper "moves data between objects and a database while keeping them independent of each other and the mapper itself" (Fowler, PoEAA, Mapper)
A TableDataGateway is "a Gateway (object that encapsulates access to an external system or resource) to a database table. One instance handles all the rows in the table" (Fowler, PoEAA, TableDataGateway)
A DAO "separates a data resource's client interface from its data access mechanisms / adapts a specific data resource's access API to a generic client interface" allowing "data access mechanisms to change independently of the code that uses the data" (Sun Blueprints)
Repository seems very generic, exposing no notion of database interaction.
A DAO provides an interface enabling different underlying database implementations to be used.
A TableDataGateway is specifically a thin wrapper around a single table.
A DataMapper acts as an intermediary enabling the Model object to evolve independently of the database representation (over time).
There is a tendency in the software design world (at least, I feel so) to invent new names for well-known old things and patterns. And when we have a new paradigm (which perhaps differs slightly from existing things), it usually comes with a whole set of new names for each tier. So "Business Logic" becomes the "Service Layer" just because we say we're doing SOA, and DAO becomes Repository just because we say we're doing DDD (and neither is actually something new and unique at all; again: new names for already-known concepts gathered in the same book). So I am not saying that all these modern paradigms and acronyms mean EXACTLY the same thing, but you really shouldn't be too paranoid about it. Mostly these are the same patterns, just from different families.
Data Mapper vs Table Data Gateway
To make a long story short:
the Data Mapper receives the Domain Model object (Entity) as a parameter and uses it to implement the CRUD operations;
the Table Data Gateway receives all the parameters (as primitives) for its methods and knows nothing about the Domain Model object (Entity).
In the end, both act as mediators between the in-memory objects and the database.
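The two signatures side by side, as in-memory Java stand-ins (illustrative only, not real persistence code):

```java
import java.util.HashMap;
import java.util.Map;

class Customer {
    final long id;
    final String name;
    Customer(long id, String name) { this.id = id; this.name = name; }
}

// Data Mapper: receives the domain object itself.
class CustomerDataMapper {
    private final Map<Long, Customer> table = new HashMap<>();
    void insert(Customer customer) { table.put(customer.id, customer); }
    Customer find(long id) { return table.get(id); }
}

// Table Data Gateway: receives primitives only; knows nothing about Customer.
class CustomerTableGateway {
    private final Map<Long, String> table = new HashMap<>();
    void insert(long id, String name) { table.put(id, name); }
    String findNameById(long id) { return table.get(id); }
}
```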
You have a good point. Pick the one you are most familiar with. I'd like to point out a few things that may help clarify.
The Table Data Gateway is used mainly for a single table or view. It contains all the selects, inserts, updates, and deletes. So in your case, Customer is a table or a view, and one instance of the table data gateway object handles all the rows in the table. Usually this means one gateway object per database table.
The Data Mapper, on the other hand, is more independent of any domain logic and less coupled (although I believe coupling is either there or it isn't). It is merely an intermediary layer that transfers data between objects and a database while keeping them independent of each other and of the mapper itself.
So, typically in a mapper you see methods like Insert, Update, and Delete, while in a table data gateway you will find GetCustomerById, GetCustomerByName, etc.
The Data Transfer Object differs from the above two patterns mainly because it is a distribution pattern, not a data source pattern like the other two. Use it mainly when you are working with a remote interface and need to make your calls less chatty, as each call can be expensive. So you usually design a DTO that can be serialized over the wire and can carry all the data back to the server for further business rules or processing.
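A DTO in this sense is just a flat, serializable bag of data, e.g. (Java sketch; the field names are invented):

```java
import java.io.Serializable;

// Carries everything one remote call needs, so the interface stays less chatty.
// No behavior, no domain logic: just data to serialize over the wire.
class OrderDto implements Serializable {
    long orderId;
    String customerName;
    String[] itemCodes;
}
```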
I am not well versed in the Repository pattern, as I have not had a chance to use it until now, but I will be looking at the other answers.
Below is just my understanding.
TableGateWay/RowDataGateWay:
In this context, Gateway refers to a specific implementation in which each "domain object" maps to its own "domain object gateway". For example, if we have Person, then we will have a PersonGateway to store the domain object Person in the database. If we have Person, Employee, and Customer, we will have PersonGateway, EmployeeGateway, and CustomerGateway. Each gateway has specific CRUD functions for that object and has nothing to do with the other gateways; there is no reusable code/module here. The gateway can be further divided into RowDataGateway or TableGateway, depending on whether you pass an "id" or an "object". Gateway is usually compared with Active Record; it ties your domain model to the database schema.
Repository/DataMapper/DAO: they are the same thing. They all refer to a persistence layer that transfers database entities to the domain model. Unlike a gateway, the Repository/DataMapper/DAO hides the implementation. You don't know whether there is a PersonGateway behind Person: maybe there is, maybe there isn't; you don't care. All you know is that it must support CRUD operations for each domain object. It decouples the data source from the domain model.