How to implement entity validation across instances of the same entity


In our project we use DDD with clean architecture.
Let's say I have an entity called A. A has a property called B.
Now I want a validation rule: when another A is created, its B must be unique across all instances of A in the store.
My idea was to implement a domain service for this, using the repository. The question then is whether this domain service should implement the validation itself or just provide the data for it (to be used for validation in the interactor/use case).
Example code (code is kept simple):
public class A
{
public A(string b)
{
B = b;
}
public string B {get; private set;}
}

The problem you are trying to solve is sometimes known as set validation.
The easy answer: introduce an index that tracks the mapping of each value of B to the specific A that is allowed to own it.
Of course, that introduces contention; you'll need to handle the case where two different A's are being modified at the same time. The index and all of the A's become part of a single consistency boundary that needs to be managed. This is pretty much what happens when we store our entities in a single RDBMS: we can introduce a constraint to ensure that there are no duplicates.
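A minimal in-memory sketch of that single consistency boundary, assuming a single process: the hypothetical `UniqueBIndex` maps each B value to the id of the A that owns it, and one lock serialises every claim, which is exactly where the contention comes from. In a real deployment the RDBMS unique constraint would play this role.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical index: maps each B value to the id of the A that owns it.
// The single lock makes every claim part of one consistency boundary.
public sealed class UniqueBIndex
{
    private readonly object _gate = new object();
    private readonly Dictionary<string, Guid> _owners =
        new Dictionary<string, Guid>(StringComparer.Ordinal);

    // Returns true if 'owner' now holds 'b' (or already held it),
    // false if a different A already owns that value.
    public bool TryClaim(string b, Guid owner)
    {
        if (b == null) throw new ArgumentNullException(nameof(b));
        lock (_gate)
        {
            Guid current;
            if (_owners.TryGetValue(b, out current))
                return current == owner;
            _owners.Add(b, owner);
            return true;
        }
    }

    // Frees the value, e.g. when the owning A is deleted or its B changes.
    public void Release(string b, Guid owner)
    {
        lock (_gate)
        {
            Guid current;
            if (_owners.TryGetValue(b, out current) && current == owner)
                _owners.Remove(b);
        }
    }
}
```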
You can split that single consistency boundary into separate A entities, and also individual B->A entities. But now you have the possible problem of trying to modify two different consistency boundaries at the same time, and that introduces race conditions.
A third possibility is to relax the consistency constraint -- allow conflicts to be stored, and resolve them later. See, for example, Greg Young on warehouse systems and Udi Dahan on race conditions.
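If conflicts are allowed to be stored, something still has to find them afterwards. A minimal sketch, assuming each stored A also carries an `Id` (not part of the original class): the hypothetical `DuplicateBReport` groups the stored A's by B and reports every value owned by more than one of them, leaving the resolution to a person or a compensating process.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stored shape of A; the Id is an assumption for this sketch.
public sealed class StoredA
{
    public StoredA(Guid id, string b) { Id = id; B = b; }
    public Guid Id { get; }
    public string B { get; }
}

public static class DuplicateBReport
{
    // Returns each B value owned by more than one A, with the ids involved.
    public static IDictionary<string, List<Guid>> FindConflicts(IEnumerable<StoredA> all)
    {
        return all
            .GroupBy(a => a.B, StringComparer.Ordinal)
            .Where(g => g.Count() > 1)
            .ToDictionary(g => g.Key, g => g.Select(a => a.Id).ToList(), StringComparer.Ordinal);
    }
}
```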
The usual answer from domain-driven design is to push back really hard on that requirement, to make sure that it is real: what is the actual cost to the business if the constraint is violated?
Think airplane seat maps: obviously only one passenger should be sitting in a seat. But that doesn't mean it's a critical failure for the seat to be assigned to more than one person, because the human operators (gate agents) have ways of mitigating these problems. See also Greg Young's talk Stop Over Engineering.

I think a domain service is the option to take. Take a look at this blog (blog.sapiensworks.com/post/2017/08/23/…), where the 'username must be unique' scenario, which resembles the issue in my initial post, is discussed.
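A minimal sketch of the domain-service option in which the service owns the rule, assuming a hypothetical `IARepository` with an `ExistsWithB` query (the names are invented for illustration). The use case simply asks the service to create an A. Note that this check-then-insert sequence is still racy unless the store also enforces uniqueness, as discussed above.

```csharp
using System;
using System.Collections.Generic;

public class A
{
    public A(string b) { B = b; }
    public string B { get; private set; }
}

// Hypothetical repository abstraction; a real one would query the store.
public interface IARepository
{
    bool ExistsWithB(string b);
    void Add(A a);
}

// Hypothetical domain service that owns the "B is unique" rule.
public sealed class ACreationService
{
    private readonly IARepository _repository;

    public ACreationService(IARepository repository)
    {
        _repository = repository ?? throw new ArgumentNullException(nameof(repository));
    }

    public A Create(string b)
    {
        // Check-then-act: still needs a unique constraint in the store to be race-free.
        if (_repository.ExistsWithB(b))
            throw new InvalidOperationException($"An A with B '{b}' already exists.");
        var a = new A(b);
        _repository.Add(a);
        return a;
    }
}

// In-memory stand-in for the store, e.g. for unit tests.
public sealed class InMemoryARepository : IARepository
{
    private readonly List<A> _items = new List<A>();
    public bool ExistsWithB(string b) => _items.Exists(a => a.B == b);
    public void Add(A a) => _items.Add(a);
    public int Count => _items.Count;
}
```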

Related

DDD : create repository for an entity and its status

I have an entity in my domain whose status I need to track, and I have a handler for this. The status is something like InProgress, Completed or Deleted. I use Cosmos DB (SQL API) to store the data.
In Cosmos DB I have created one container for the entities and another container for their statuses. Therefore, in the code, I have two repositories for those two containers:
internal interface EntityRepository
{
Task AddAsync(Entity entity);
}
internal interface EntityStatusRepository
{
Task AddAsync(EntityStatus entityStatus);
}
For each repository, I have created a service:
public interface EntityService
{
Task AddAsync(Entity entity);
}
public interface EntityStatusService
{
Task AddStatusAsync(EntityStatus entityStatus);
}
These services, rather than the repositories, are exposed to the handler as public interfaces.
Now I wonder:
Based on DDD, given an entity and its status, should I create two separate repositories, or should they be one repository since they belong to one context?
Do I need to expose the entity and its status through one service?
I wonder if anyone has a suggestion or even a better solution?

I'm not a DDD expert (I'm just reading through Vernon's Implementing Domain-Driven Design), but from my experience you have an issue with your bounded context. Your models Entity and EntityStatus are probably closely related. In that case, you should create an EntityStatusRepository only if there is a place where you need EntityStatus objects by themselves. If you need both of them, just go with EntityRepository.

It appears EntityStatus should be a property on Entity, but let’s go through the logic to make sure. Note that these are not hard rules, just my rules of thumb when I’m going through these decisions; extenuating circumstances may supersede them.
Should EntityStatus be an Aggregate Root? Would it make sense to work with an EntityStatus by itself with no relationship to anything else, or with only references to child objects? If not, then it is not an Aggregate Root. That means it’s either a supporting entity or a property.
If the parent entity always has exactly one current value of EntityStatus, and no logic needs to be embedded inside the status, then it is best to leave it as a property on the Entity.
If the EntityStatus needs logic built into it, then it should probably be a value object. For example, if status can only change from X to Y in some circumstances but not others, or if some external process must be launched when a status changes, it should be a value object whose value is set by the Entity. Being a value object doesn't necessarily mean it's a separate entity, though.
Finally, I prefer to tie my repositories to Aggregate Roots even if there are value objects owned by the AR. An AR update should be saved entirely or not at all, and extending a single DB transaction across repositories is less than ideal. If you’re using the Unit of Work pattern, then an AR update should be a single unit. I’ve tried creating a separate repo per table, with the AR repo using the individual table repos, and it felt too granular with all the plumbing code. It was also easy to lose sight of the business goal you’re trying to accomplish when dealing with all the pieces floating around. In the end, though, there’s no rule governing this, so do what you think is right.
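A minimal sketch of one repository per aggregate root, assuming the status is stored inside the same record as the entity (for Cosmos DB, that would mean one document per entity rather than two containers). `IEntityRepository` and the in-memory implementation are hypothetical; the point is that the handler loads and saves the whole aggregate through one repository.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public enum EntityStatus { InProgress, Completed, Deleted }

// The aggregate: status is a property, not a separately stored entity.
public sealed class Entity
{
    public Entity(Guid id) { Id = id; Status = EntityStatus.InProgress; }
    public Guid Id { get; }
    public EntityStatus Status { get; private set; }
    public void Complete() => Status = EntityStatus.Completed;
}

// One repository for the aggregate root; no EntityStatusRepository.
public interface IEntityRepository
{
    Task AddAsync(Entity entity);
    Task<Entity> GetAsync(Guid id);
    Task UpdateAsync(Entity entity);
}

// In-memory stand-in; a Cosmos DB version would write one document per Entity.
public sealed class InMemoryEntityRepository : IEntityRepository
{
    private readonly Dictionary<Guid, Entity> _store = new Dictionary<Guid, Entity>();

    public Task AddAsync(Entity entity)
    {
        _store.Add(entity.Id, entity);
        return Task.CompletedTask;
    }

    public Task<Entity> GetAsync(Guid id)
    {
        Entity found;
        _store.TryGetValue(id, out found); // null if the id is unknown
        return Task.FromResult(found);
    }

    public Task UpdateAsync(Entity entity)
    {
        // Entity and status are replaced together: all or nothing.
        _store[entity.Id] = entity;
        return Task.CompletedTask;
    }
}
```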

object == object instead of object.id == object.id potential problems

I have inherited a very sloppy project and I am tasked with explaining why it's bad. All over the code, I've noticed comparisons like this:
(IQueryable).FirstOrDefault(x => x.Facility == facility && x.Carrier == shipCode.Carrier)
In the example above, x.Facility is from the database and facility is from shipment.facility, which is mapped as a complex object in NHibernate.
I would have expected to see .FirstOrDefault(x => x.Facility.ID == facility.ID)
At first I thought comparing the whole object might cause issues: if the facility record was changed in the DB, then the facility stored in the shipment would obviously be different.
After thinking about it more, I realized that shipment.facility was populated from the id, so it should match even if that facility record changed.
It still feels wrong to me; I can see this being very buggy and hard to track down. Is there something specifically wrong with comparing the entire object vs. an id?

Expanding my previous comments into an answer:
I think this is actually good practice when the ORM allows it. I'm not experienced with NHibernate, so for the rest of the answer I'll assume that it does in fact implement equality in a sensible way (such as comparing primary keys). You should ensure this is the case, otherwise the code would be not only bad but potentially buggy.
As an analogy, forget about SQL for the moment and imagine you are dealing with a POCO that is not part of any ORM. You want to choose between the following:
Approach 1: IEquatable
public class Facility : IEquatable<Facility>
{
public int Id {get; private set;}
//The rest of the properties
public bool Equals(Facility other)
{
return other != null && other.Id == Id;
}
}
(You'd also want to override Object.Equals, but I'll exclude that for brevity)
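For completeness, a sketch of Approach 1 with the members elided above filled in: the `Object.Equals` override, a matching `GetHashCode`, and `==`/`!=` operators so that `facility1 == facility2` also compares ids. The constructor is added only for the example. Whether NHibernate's LINQ provider uses this operator when translating a query is a separate matter; this sketch only covers in-memory comparisons.

```csharp
using System;

public class Facility : IEquatable<Facility>
{
    public Facility(int id) { Id = id; }

    public int Id { get; private set; }
    //The rest of the properties

    // Identity equality: two facilities are equal when their ids match.
    public bool Equals(Facility other)
    {
        return !ReferenceEquals(other, null) && other.Id == Id;
    }

    public override bool Equals(object obj) => Equals(obj as Facility);

    // Must agree with Equals, so hash only the id.
    public override int GetHashCode() => Id.GetHashCode();

    public static bool operator ==(Facility left, Facility right)
    {
        if (ReferenceEquals(left, null)) return ReferenceEquals(right, null);
        return left.Equals(right);
    }

    public static bool operator !=(Facility left, Facility right) => !(left == right);
}
```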
Approach 2: IEqualityComparer
public class Facility
{
public int Id {get; private set;}
//The rest of the properties
}
public class FacilityIdsMatchEqualityComparer : IEqualityComparer<Facility>
{
public bool Equals(Facility x, Facility y)
{
return x.Id == y.Id;
}
}
(GetHashCode also excluded for brevity).
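A sketch of Approach 2 with `GetHashCode` filled in, plus a small helper that makes the repetition concrete: every call such as `Distinct` has to be handed the comparer explicitly, or it falls back to reference equality. `FacilityIdsMatchEqualityComparer` is from the answer; the constructor and `FacilityQueries` are illustrative additions.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Facility
{
    public Facility(int id) { Id = id; }
    public int Id { get; private set; }
    //The rest of the properties
}

public class FacilityIdsMatchEqualityComparer : IEqualityComparer<Facility>
{
    public bool Equals(Facility x, Facility y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id;
    }

    // Equal facilities must produce equal hash codes, so hash the id only.
    public int GetHashCode(Facility obj) => obj.Id.GetHashCode();
}

public static class FacilityQueries
{
    // Every caller has to remember to pass the comparer...
    public static int CountDistinctWithComparer(IEnumerable<Facility> facilities) =>
        facilities.Distinct(new FacilityIdsMatchEqualityComparer()).Count();

    // ...otherwise Distinct compares references instead of ids.
    public static int CountDistinctWithoutComparer(IEnumerable<Facility> facilities) =>
        facilities.Distinct().Count();
}
```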
Which of the two approaches is better? Well, I'd say it's Approach 1 by a clear margin. It adheres to two important principles that Approach 2 doesn't:
Don't repeat yourself. In the second approach, any code anywhere trying to compare facilities would have to use the FacilityIdsMatchEqualityComparer. This fact, that particular equality comparison logic needs to be used, would be sprayed all over the solution, repeated every time you want to compare them.
Single responsibility principle. Not only does every class doing the comparison need to repeat code, but that code is taking on a responsibility that doesn't belong to the class. It should be up to the Facility to express the implementation of equality, not up to every class that wants to use it to say that it's done by comparing Ids. (Yes I realise that you could just make it, say, FacilityEqualityComparer so that the calling classes remain agnostic about how the equality comparison is done, but the purpose of this is as an analogy with the code in the OP, where the exact comparison logic is hard coded into every comparison)
So, bringing it back to the actual question, this analogy very closely mirrors the situation you have here. It's not quite as simple as the Facility implementing IEquatable, but the principle is exactly the same: the ORM is taking its own responsibility for how equality checking is implemented, rather than pushing that responsibility out to code using it.
This means that if I'm writing code, say outside of the data access layer, I can say "I want to check if these two objects are equal, so I'll write object1 == object2", rather than "I want to check if these two objects are equal, and these two objects are entities, which because of the way we implement our persistence means that when I check for equality that will be converted into a SQL query, so I need to write this check as if I were writing SQL, which means I need to compare their primary keys, which either by checking attributes or maybe through my knowledge of conventions in the data access layer, I know means comparing their Id properties. So I'll write object1.Id == object2.Id".
ORMs aren't perfect, and you can't always completely abstract away the underlying SQL database, but when you can, you should!

I am not sure what you are using, but Entity Framework would normally give you an error like this if you compare the objects:
Unable to create a constant value of type '...'. Only primitive types or enumeration types are supported in this context.
This is because you are passing a class instance that would need to be converted into SQL. In any case, comparing objects is not ideal if you are trying to send a query to SQL Server.

Reducing Repositories to Aggregate Roots

I currently have a repository for just about every table in the database and would like to further align myself with DDD by reducing them to aggregate roots only.
Let’s assume that I have the following tables, User and Phone. Each user might have one or more phones. Without the notion of aggregate root I might do something like this:
//assuming I have the userId in session for example and I want to update a phone number
List<Phone> phones = PhoneRepository.GetPhoneNumberByUserId(userId);
phones[0].Number = "911";
PhoneRepository.Update(phones[0]);
The concept of aggregate roots is easier to understand on paper than in practice. I will never have phone numbers that do not belong to a User, so would it make sense to do away with the PhoneRepository and incorporate phone related methods into the UserRepository? Assuming the answer is yes, I’m going to rewrite the prior code sample.
Am I allowed to have a method on the UserRepository that returns phone numbers? Or should it always return a reference to a User, and then traverse the relationship through the User to get to the phone numbers:
List<Phone> phones = UserRepository.GetPhoneNumbers(userId);
// Or
User user = UserRepository.GetUserWithPhoneNumbers(userId); //this method will join to Phone
Regardless of which way I acquire the phones, assuming I modified one of them, how do I go about updating them? My limited understanding is that objects under the root should be updated through the root, which would steer me towards choice #1 below. Although this will work perfectly well with Entity Framework, it seems extremely non-descriptive, because reading the code I have no idea what I’m actually updating, even though Entity Framework is keeping tabs on changed objects within the graph.
UserRepository.Update(user);
// Or
UserRepository.UpdatePhone(phone);
Lastly, assume I have several lookup tables that are not really tied to anything, such as CountryCodes, ColorsCodes and SomethingElseCodes, which I might use to populate drop-downs or for some other reason. Are these standalone repositories? Can they be combined into some sort of logical grouping/repository such as CodesRepository? Or is that against best practices?

You are allowed to have any method you want in your repository :) In both of the cases you mention, it makes sense to return the user with the phone list populated. Normally the user object would not be fully populated with all of its sub-information (say, all addresses and phone numbers), and we may have different methods for getting the user object populated with different kinds of information. This is related to lazy loading, where related data is loaded only when it is needed.
User GetUserDetailsWithPhones()
{
// Populate User along with Phones
}
For updating, in this case the user is being updated, not the phone number itself. The storage model may store the phones in a different table, so you may think that only the phones are being updated, but that is not the case if you think from a DDD perspective. As far as readability is concerned, while the line
UserRepository.Update(user)
alone doesn't convey what is being updated, the code above it would make it clear. It would also most likely be part of a front-end method call whose name signifies what is being updated.
For the lookup tables, and actually in general, it is useful to have a GenericRepository and use that. The custom repository can inherit from the GenericRepository.
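public class UserRepository : GenericRepository<User>
{
public IEnumerable<User> GetUserByCustomCriteria()
{
// Query users by custom criteria
throw new NotImplementedException();
}
public User GetUserDetailsWithPhones()
{
// Populate User along with Phones
throw new NotImplementedException();
}
public User GetUserDetailsWithAllSubInfo()
{
// Populate User along with all sub information e.g. phones, addresses etc.
throw new NotImplementedException();
}
}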
Search for "Generic Repository Entity Framework" and you will find many good implementations. Use one of those or write your own.
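As a rough idea of the shape, a minimal in-memory `GenericRepository<T>` sketch; the `IAggregateRoot` constraint, the integer `Id` key and the member names are assumptions, and an EF-backed version would delegate to a `DbSet<T>` instead of a dictionary. `GetUsersWithPhone` is a hypothetical custom query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical marker for types allowed to have a repository.
public interface IAggregateRoot
{
    int Id { get; }
}

public class GenericRepository<T> where T : class, IAggregateRoot
{
    private readonly Dictionary<int, T> _items = new Dictionary<int, T>();

    public virtual T GetById(int id)
    {
        T item;
        return _items.TryGetValue(id, out item) ? item : null;
    }

    public virtual IEnumerable<T> Find(Func<T, bool> predicate) =>
        _items.Values.Where(predicate).ToList();

    public virtual void Add(T item) => _items.Add(item.Id, item);
    public virtual void Update(T item) => _items[item.Id] = item;
    public virtual void Remove(T item) => _items.Remove(item.Id);
}

public class Phone
{
    public string Number { get; set; }
}

public class User : IAggregateRoot
{
    public User(int id) { Id = id; }
    public int Id { get; }
    public List<Phone> Phones { get; } = new List<Phone>();
}

// Custom repository: inherits the generic plumbing, adds user-specific queries.
public class UserRepository : GenericRepository<User>
{
    public IEnumerable<User> GetUsersWithPhone(string number) =>
        Find(u => u.Phones.Any(p => p.Number == number));
}
```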

Your example of the Aggregate Root repository is perfectly fine, i.e. any entity that cannot reasonably exist without depending on another shouldn't have its own repository (in your case, Phone). Without this consideration, you can quickly find yourself with an explosion of repositories in a 1:1 mapping to DB tables.
You should look at using the Unit of Work pattern for data changes rather than the repositories themselves as I think they're causing you some confusion around intent when it comes to persisting changes back to the db. In an EF solution the Unit of Work is essentially an interface wrapper around your EF Context.
With regards to your repository for lookup data we simply create a ReferenceDataRepository that becomes responsible for data that doesn't specifically belong to a domain entity (Countries, Colours etc).

If a phone makes no sense without a user, it's an entity (if you care about its identity) or a value object, and it should always be modified through the user and retrieved/updated together with it.
Think of aggregate roots as context definers: they draw local contexts but are themselves in the global context (your application).
If you follow domain-driven design, repositories are supposed to be 1:1 with aggregate roots.
No excuses.
I bet these are the problems you are facing:
Technical difficulties: object-relational impedance mismatch. You are struggling to persist whole object graphs easily, and Entity Framework kind of fails to help.
The domain model is data-centric (as opposed to behavior-centric). Because of that, you lose knowledge about the object hierarchy (the previously mentioned contexts), and magically everything becomes an aggregate root.
I'm not sure how to fix the first problem, but I've noticed that fixing the second one fixes the first well enough. To understand what I mean by behavior-centric, give this paper a try.
P.S. Reducing repositories to aggregate roots makes no sense.
P.P.S. Avoid "CodeRepositories". That leads to data-centric, and ultimately procedural, code.
P.P.P.S. Avoid the unit of work pattern. Aggregate roots should define transaction boundaries.

This is an old question, but I thought it worth posting a simple solution.
EF Context is already giving you both Unit of Work (tracks changes) and Repositories (in-memory reference to stuff from DB). Further abstraction is not mandatory.
Remove the DbSet<Phone> from your context class, as Phone is not an aggregate root.
Use the 'Phones' navigation property on User instead.
static void UpdateNumber(int userId, string oldNumber, string newNumber)
{
using (MyContext uow = new MyContext()) // Unit of Work
{
DbSet<User> repo = uow.Users; // Repository
User user = repo.Find(userId);
if (user == null)
throw new InvalidOperationException("User not found.");
// Reach the phone through the aggregate root (needs lazy loading or an explicit load of Phones)
Phone oldPhone = user.Phones.SingleOrDefault(x => x.Number.Trim() == oldNumber);
if (oldPhone == null)
throw new InvalidOperationException("Phone number not found.");
oldPhone.Number = newNumber;
uow.SaveChanges();
}
}

If a Phone entity only makes sense together with an aggregate root User, then I think it also makes sense that adding a new Phone record is the responsibility of the User domain object, through a specific method (DDD behavior). That makes sense for several reasons. The immediate one is that we should check that the User object exists, since the Phone entity depends on its existence, and perhaps keep a transaction lock on it while doing further validation checks, to ensure no other process has deleted the aggregate root before we are done validating the operation. With other kinds of aggregate roots, you might want to aggregate or calculate some value and persist it in columns of the aggregate root for more efficient processing by other operations later on. Note that although I suggest the User domain object have a method that adds the Phone, that doesn't mean it should know about the existence of the database or EF. One of the great features of EF and Hibernate is that they can track changes made to entity classes transparently, and that includes adding new related entities through their navigation collection properties.
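A minimal sketch of that idea, with the collection encapsulated so phones can only be added, changed or removed through `User`. The duplicate-number rule and the method names are illustrative assumptions; an ORM would still need to be configured to map the private backing list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Phone
{
    internal Phone(string number) { Number = number; }
    public string Number { get; internal set; }
}

public class User
{
    private readonly List<Phone> _phones = new List<Phone>();

    public User(int id) { Id = id; }
    public int Id { get; }

    // Read-only view: callers cannot add to or remove from the list directly.
    public IReadOnlyCollection<Phone> Phones => _phones.AsReadOnly();

    public Phone AddPhone(string number)
    {
        if (string.IsNullOrWhiteSpace(number))
            throw new ArgumentException("A phone number is required.", nameof(number));
        // Example invariant enforced by the aggregate root (an assumption).
        if (_phones.Any(p => p.Number == number))
            throw new InvalidOperationException($"User {Id} already has {number}.");
        var phone = new Phone(number);
        _phones.Add(phone);
        return phone;
    }

    public void ChangePhoneNumber(string oldNumber, string newNumber)
    {
        var phone = _phones.SingleOrDefault(p => p.Number == oldNumber)
            ?? throw new InvalidOperationException($"User {Id} has no phone {oldNumber}.");
        if (_phones.Any(p => p.Number == newNumber))
            throw new InvalidOperationException($"User {Id} already has {newNumber}.");
        phone.Number = newNumber;
    }

    public void RemovePhone(string number) => _phones.RemoveAll(p => p.Number == number);
}
```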
Also, if you want methods that retrieve all phones regardless of which users own them, you can still do that through the User repository. You only need one method that returns all users as an IQueryable; you can then project them to get all users' phones and refine the query from there. So you don't even need a PhoneRepository in this case. Besides, if I wanted to abstract queries behind methods, I would rather use a class of extension methods for IQueryable that I can use anywhere, not just from a repository class.
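A sketch of that suggestion: query extension methods over `IQueryable<User>` that reach phones through the aggregate with `SelectMany`, so no `PhoneRepository` is needed. The entity shapes (including the `IsActive` flag) and the method names are assumptions. Against an ORM these would compose into a single query; here `AsQueryable()` over a list stands in for the context.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Phone
{
    public string Number { get; set; }
}

public class User
{
    public int Id { get; set; }
    public bool IsActive { get; set; } // hypothetical flag used for refinement
    public List<Phone> Phones { get; set; } = new List<Phone>();
}

public static class UserQueryExtensions
{
    // Every phone, regardless of owner, reached through the aggregate root.
    public static IQueryable<Phone> AllPhones(this IQueryable<User> users) =>
        users.SelectMany(u => u.Phones);

    // Composable refinement: phones of active users only.
    public static IQueryable<Phone> PhonesOfActiveUsers(this IQueryable<User> users) =>
        users.Where(u => u.IsActive).AllPhones();
}
```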
One caveat: to be able to delete Phone entities using only the domain object and not a Phone repository, you need to make sure the UserId is part of the Phone primary key. In other words, the primary key of a Phone record is a composite key made up of UserId and some other property of the Phone entity (I suggest an auto-generated identity). This makes sense intuitively, as the Phone record is "owned" by the User record, and its removal from the User navigation collection should equal its complete removal from the database.

DDD: Can a Value Object have lists inside them?

I'm not well versed in domain-driven design and I've recently started creating a domain model for a project. I still haven't decided on an ORM (though I will likely go with NHibernate), and I am currently trying to ensure that my Value Objects really are just that.
I have a few VOs that have almost no behavior other than to encapsulate "like" terms, for instance:
public class Referral {
public Case Case { get; set; } // this is a reference to the aggregate root
public ReferralType ReferralType { get; set; } // this is an enum
public string ReferralTypeOther { get; set; }
} // etc, etc.
This particular class has a reference to "Case" which is two levels up, so if say I were going to access a Referral I could go: case.social.referral (Case, Social and Referral are all classes, there is a single Social inside a Case and there is a single Referral inside a Social). Now that I am looking at it as I type it, I don't think I need a Case in the Referral since it will be accessible through the Social entity, correct?
Now, there is no doubt in my mind this is something that should be a VO, and the method I plan to use to persist this to the database is to either have NHibernate assign it a surrogate identifier (which I am still not too clear on, if anyone could please elaborate on that too it would help me out, since I don't know if the surrogate identifier requires that I have an Id in my VO already or if it can operate without one) and/or a protected Id property that would not be exposed outside the Referral class (for the sole purpose of persisting to the DB).
Now on to my title question: should a VO have a collection (in my case a List) inside it? I can only think of this as a one-to-many relationship in the database, but since there is no identity, it didn't seem adequate to make the class an entity. Below is the code:
public class LivingSituation {
private IList<AdultAtHome> AdultsAtHome { get; set; }
public ResidingWith CurrentlyResidingWith { get; set; } // this is an enum
} // etc, etc.
This class currently doesn't have an Id and the AdultsAtHome class just has intrinsic types (string, int). So I am not sure if this should be an entity or if it can remain as a VO and I just need to configure my ORM to use a 1:m relationship for this using their own tables and a private/protected Id field so that the ORM can persist to the DB.
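For comparison, a minimal sketch of `LivingSituation` kept as an immutable value object: the list is copied on construction, exposed read-only, and included in equality, so two instances with the same adults compare equal. The `AdultAtHome` fields and the `ResidingWith` members are assumptions, since the question only says they are intrinsic types and an enum. How the ORM maps the list to its own table is a separate configuration concern not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum ResidingWith { Alone, Family, Friends } // members are assumptions

// Hypothetical fields: the question only says AdultAtHome holds strings and ints.
public sealed class AdultAtHome : IEquatable<AdultAtHome>
{
    public AdultAtHome(string name, int age) { Name = name; Age = age; }
    public string Name { get; }
    public int Age { get; }

    public bool Equals(AdultAtHome other) =>
        other != null && Name == other.Name && Age == other.Age;
    public override bool Equals(object obj) => Equals(obj as AdultAtHome);
    public override int GetHashCode() => (Name?.GetHashCode() ?? 0) * 31 + Age;
}

public sealed class LivingSituation : IEquatable<LivingSituation>
{
    public LivingSituation(ResidingWith currentlyResidingWith, IEnumerable<AdultAtHome> adultsAtHome)
    {
        CurrentlyResidingWith = currentlyResidingWith;
        // Defensive copy: the value object cannot change after construction.
        AdultsAtHome = adultsAtHome.ToList().AsReadOnly();
    }

    public ResidingWith CurrentlyResidingWith { get; }
    public IReadOnlyList<AdultAtHome> AdultsAtHome { get; }

    // Equality covers the list contents, in order.
    public bool Equals(LivingSituation other) =>
        other != null
        && CurrentlyResidingWith == other.CurrentlyResidingWith
        && AdultsAtHome.SequenceEqual(other.AdultsAtHome);

    public override bool Equals(object obj) => Equals(obj as LivingSituation);

    public override int GetHashCode() =>
        AdultsAtHome.Aggregate((int)CurrentlyResidingWith, (h, a) => h * 31 + a.GetHashCode());
}
```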
Also, should I go with normalized tables for each of my classes, or not? I think I would only need to use a table per class when there is a possibility of having multiple instances of the class assigned to an entity or value object and/or there is the possibility of having collections 1:m relationships with some of those objects. I have no problem with using a single table for certain value objects that have intrinsic types but with nested types I think it would be advantageous to use normalized tables. Any suggestions on this as well?
Sorry for being so verbose with the multiple questions:
1) Do I need a surrogate identifier (with say NHibernate) for my value objects?
2) If #1 is yes, then does this need to be private/protected so that my value object "remains" a value object in concept?
3) Can a value object have other value objects (in say, a List) or would that constitute an entity? (I think the answer to this is no, but I'd prefer to be sure before I proceed further.)
4) Do I need a reference to the aggregate root from a value object that is a few levels down from the aggregate root? (I don't think I do, this is likely an oversight on my part when writing the model, anyone agree?)
5) Is it OK to use normalized tables for certain things (like nested types and/or types with collections as properties which would need their own tables anyway for the 1:m relationship) while having the ORM do the mapping for the simpler value objects to the same table that belongs to my entity?
Thanks again.

Take a look at the answers to related questions here and here
1) Yes - If you're storing VOs in their own table
2) If you can use a private/protected ID property, then great. Alternatively, you might use explicit interfaces to 'hide' the ID property.
But, reading into your question, are you suggesting that developers who see an ID property will automatically assume the object is an entity? If so, they need (re)training.
3) Yes it can, but with the following restrictions:
It should be quite rare
It should only reference other VOs
Also, consider this: VOs shouldn't stick around. Would it be easy/efficient to re-create the entire VO every time it's needed? If not, make it an Entity.
4) Depends on how you want to implement your Aggregate Locking. If you want to use Ayende's solution, the answer is yes. Otherwise, you would need a mechanism to traverse the object graph back to the Aggregate Root.
5) Yes. Don't forget that DDD is Persistence Ignorant (in an ideal world!).
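Regarding answer 2, a minimal sketch of the explicit-interface option, assuming a hypothetical `IPersistentId` interface that only persistence code references: the surrogate id is not visible through the `Referral` type itself and is ignored by equality, which stays value-based. The `ReferralType` members are invented for the example, and how NHibernate would map the hidden id (for example through field access) is not shown here.

```csharp
using System;

// Hypothetical interface used only by the persistence layer.
public interface IPersistentId
{
    int Id { get; set; }
}

public enum ReferralType { Self, Doctor, Other } // members are assumptions

public sealed class Referral : IPersistentId, IEquatable<Referral>
{
    private int _persistentId;

    public Referral(ReferralType referralType, string referralTypeOther)
    {
        ReferralType = referralType;
        ReferralTypeOther = referralTypeOther;
    }

    public ReferralType ReferralType { get; }
    public string ReferralTypeOther { get; }

    // Explicit implementation: only reachable through IPersistentId.
    int IPersistentId.Id
    {
        get => _persistentId;
        set => _persistentId = value;
    }

    // Value equality ignores the surrogate id.
    public bool Equals(Referral other) =>
        other != null
        && ReferralType == other.ReferralType
        && string.Equals(ReferralTypeOther, other.ReferralTypeOther, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as Referral);

    public override int GetHashCode() =>
        ((int)ReferralType * 397) ^ (ReferralTypeOther?.GetHashCode() ?? 0);
}
```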
However...
I believe Referral should be an Entity. Imagine these conversations:
Conversation 1:
Tom: "Hey Joe! Can you give me David Jones's referral?"
Joe: "Which one?"
Tom: "Sorry, I mean Referral No.123"
Conversation 2:
Tom: "Hey Joe! Can you give me David Jones's referral?"
Joe: "Which one?"
Tom: "I don't care - just give me any"
Conversation 1 suggests that Referral is an Entity, whereas conversation 2 suggests it's a VO.
One more thing: does Referral.ReferralType change during its lifetime? (That's another hint that it should be an Entity.) If it doesn't change, consider using polymorphism and let NH handle it.
Hope that helps!

Reconstituting domain objects from database: identity problem

We are using Linq to SQL to read and write our domain objects to a SQL Server database.
We are exposing a number of services (via WCF) to do various operations. Conceptually, the implementation of these operations consists of three steps: reconstitute the necessary domain objects from the database; execute the operation on the domain objects; persist the (now changed) domain objects back to the database.
The problem is that sometimes there are two or more instances of the same entity object, which can lead to inconsistencies when saving the objects back to the DB. A little made-up example:
public void Move(string sourceLocationId, string destinationLocationId, string itemId);
which is supposed to move the item with the given id from the source to the destination location (actual services are more complicated, often involving many locations, items etc). Now, it could be that both source and destination location id are the same - a naive implementation would just reconstitute two instances of the entity object, which would lead to problems.
This issue is currently "solved" by checking for it manually, i.e. we reconstitute the first location, check whether the id of the second is different from it, and if so reconstitute the second, and so on. This is obviously difficult and error-prone.
Anyway, I was actually surprised that there does not seem to be a "standard" solution for this in domain driven design. In particular, repositories or factories do not seem to solve this problem (unless they maintain their own cache, which then needs to be updated etc).
My idea would be to make a DomainContext object per operation, which tracks and caches the domain objects used in that particular method. Instead of reconstituting and saving individual domain objects, such an object would be reconstituted and saved as a whole (possibly using repositories), and it could act as a cache for the domain objects used in that particular operation.
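A minimal sketch of the proposed per-operation `DomainContext`, acting as an identity map: it hands out at most one instance per type and id, so requesting the same location twice yields the same object and only one load. The `Location` type and the loader delegate are hypothetical stand-ins for the real repositories.

```csharp
using System;
using System.Collections.Generic;

public class Location
{
    public Location(string id) { Id = id; }
    public string Id { get; }
}

// Per-operation identity map: one instance per (type, id) for its lifetime.
public sealed class DomainContext
{
    private readonly Dictionary<(Type, string), object> _loaded =
        new Dictionary<(Type, string), object>();

    public T Get<T>(string id, Func<string, T> load) where T : class
    {
        object existing;
        if (_loaded.TryGetValue((typeof(T), id), out existing))
            return (T)existing;
        var entity = load(id); // hits the database only on first request
        _loaded[(typeof(T), id)] = entity;
        return entity;
    }

    // Everything touched in this operation, to be saved as a whole.
    public IEnumerable<object> TrackedObjects => _loaded.Values;
}
```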
Anyway, it seems that this is a common problem, so how is this usually dealt with? What do you think of the idea above?

The DataContext in Linq-To-Sql supports the Identity Map concept out of the box and should be caching the objects you retrieve. The objects will only be different if you are not using the same DataContext for each GetById() operation.
Linq to Sql objects aren't really valid outside of the lifetime of the DataContext. You may find Rick Strahl's Linq to SQL DataContext Lifetime Management a good background read.
Also, the ORM is not responsible for logic in the domain. It's not going to disallow your example Move operation. It's up to the domain to decide what that means: should it be ignored, or is it an error? It's your domain logic, and it needs to be implemented at the service boundary you are creating.
However, Linq-To-Sql does know when an object changes, and from what I've looked at, it won't record the change if you are re-assigning the same value. e.g. if Item.LocationID = 12, setting the locationID to 12 again won't trigger an update when SubmitChanges() is called.
Based on the example given, I'd be tempted to return early without ever loading an object if the source and destination are the same.
public void Move(string sourceLocationId, string destinationLocationId, string itemId)
{
if( sourceLocationId == destinationLocationId )
return;
// MyDataContext: hypothetical name for your generated LINQ to SQL DataContext subclass
using( MyDataContext ctx = new MyDataContext() )
{
Item item = ctx.Items.First( o => o.ItemID == itemId );
Location destination =
ctx.Locations.First( o => o.LocationID == destinationLocationId );
item.Location = destination;
ctx.SubmitChanges();
}
}
Another small point, which may or may not be applicable, is you should make your interfaces as chunky as possible. e.g. If you're typically going to perform 10 move operations at once, it's better to call 1 service method to perform all 10 operations at once, rather than 1 operation at a time. ref: chunky vs chatty

Many ORMs use two concepts that, if I understand you correctly, address your issue. The first and most relevant is the Context, which is responsible for ensuring that only one object represents an entity (a database table row, in the simple case), no matter how many times or ways it's requested from the database. The second is the Unit of Work, which ensures that updates to the database for a group of entities either all succeed or all fail.
Both of these are implemented by the ORM I'm most familiar with (LLBLGen Pro), however I believe NHibernate and others also implement these concepts.
