How to query aggregate root by some other property apart from Id? - c#

For clarification: BackupableThing is some hardware device with a program written into it (which is what gets backed up).
Updated clarification: This question is more about CQRS/ES implementation than about DDD modelling.
Say I have 3 aggregate roots:
public class BackupableThing
{
    public Guid Id { get; }
}

public class Project
{
    public Guid Id { get; }
    public string Description { get; }
    public byte[] Data { get; }
}

public class Backup
{
    public Guid Id { get; }
    public Guid ThingId { get; }
    public Guid ProjectId { get; }
    public DateTime PerformedAt { get; }
}
Whenever I need to back up a BackupableThing, I need to create a new Project first and then create a new Backup with ProjectId set to this new Project's Id. Everything works as long as there is a new Project for each new Backup.
But really I need to create a Project only if it doesn't already exist, where the unique identity of an existing Project should be its Data property (some kind of hash of the byte[] array). So when any other BackupableThing gets backed up and the system sees that another BackupableThing has already been backed up with the same result (Data), it should show the already created and working Project with all descriptions and everything set.
First I thought of approaching this problem by encoding the hash in the Guid somehow, but this seems hacky and not straightforward, and it also increases the chance of collision with randomly generated Guids.
Then I came up with the idea of a separate table (with a separate repository) which holds two columns: a hash of the data (some int/long) and PlcProjectId (Guid). But this looks very much like a projection, and it is in fact going to be a kind of projection, so in theory I could rebuild it from my domain events in the event store. I have read that it's bad to query the read side from domain services / aggregates / repositories (from the write side), and I couldn't come up with anything else for some time.
Update
So basically I create a read side inside the domain to which only the domain has access, and I query it before adding a new Project so that if one already exists I just use the existing one? Yes, I thought about this overnight, and it seems that not only do I have to create such domain storage and query it before creating a new aggregate, I also have to introduce some compensating action. For example, if multiple requests to create the same Project are sent simultaneously, two identical Projects would be created. So I need my domain storage to be an event handler, and if a user created a duplicate Project I need to fire a compensating command to remove/move/recreate this Project using the existing one...
Update 2
I'm also thinking of creating another aggregate for this purpose: an aggregate for the scope of uniqueness of my Project (in this specific scenario a GlobalScopeAggregate or DomainAggregate) which will hold a {name, Guid} key-value reference. A separate GlobalScopeHandler will be responsible for the ProjectCreated, ProjectArchived and ProjectRenamed events and will ultimately fire compensating actions if a ProjectCreated event occurs with a name that has already been used. But I am confused about compensating actions. How should I react if the user has already made a backup and has a view related to the project in his interface? He can change the description, name, etc. of the wrong Project, which has already been removed by the compensating action. Also, my compensating action will remove the Project and Backup aggregates and create a new Backup aggregate with the existing ProjectId, because my Backup aggregate has no setter on the ProjectId field (it is an immutable record of a performed backup). Is this normal?
Update 3 - DOMAIN clarification
There is a number of industrial devices (BackupableThings, programmable controllers) on a wide network, each with some firmware programmed into it. Customers update the firmware and upload it into the controllers (backupable things). It is this program that gets backed up. But there are a lot of controllers of the same type, and it's very likely that customers will upload the same program over and over again to multiple controllers as well as to the same controller (as a means to reverse some changes). The user needs to repeatedly back up all those controllers. A Backup is some binary data (the program stored in the controller) and the date the backup occurred. A Project is an entity that encapsulates the binary data as well as all information related to the backup. Given that I can't back up the program in the form it was originally uploaded (I can only get unreadable raw binary data, which I can also upload back into a controller), I require a separate Project aggregate which holds the Data property as well as a number of attached files (for example, firmware project files), description, name and other fields. Now, whenever some controller is backed up, I don't want to show "just binary data without any description" and force the user to fill in all the descriptive fields again. I want to look up whether a backup with the same binary data has already been made, and then just link that Project to this Backup, so that a user who backed up another controller would instantly see lots of information about what lives in that controller right now :)
So, I guess this is a case of set-based validation which occurs very often (as opposed to regular unique constraints), and also I would have lots of backups, so a separate aggregate which holds it all in memory would be unwise.
Also, I just realized another problem arises. I can't compute a hash of the binary data and tolerate even a small risk of two different backups being considered the same project. This is an industrial domain which needs a precise and robust solution. At the same time, I can't enforce a unique constraint on the binary data column (varbinary in SQL), because my binary data can be relatively big. So I guess I need to create a separate table of [int (hash of binary data), Guid (id of the project)] relations, and if the hash of a new backup's binary data is found, I need to load the related aggregate and make sure the binary data really is the same. And if it's not, I also need some kind of mechanism to store more than one relation with the same hash.
Current implementation
I ended up creating a separate table with two columns: DataHash (int) and AggregateId (Guid). Then I created a domain service with a factory method GetOrCreateProject(Guid id, byte[] data). This method gets the aggregate id by the calculated data hash (it gets multiple values if there are multiple rows with the same hash), loads each candidate aggregate and compares the data parameter with the aggregate.Data property. If they are equal, the existing, already loaded aggregate is returned. If they are not equal, a new hash entry is added to the hash table and a new aggregate is created.
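Roughly sketched in code, the service looks like this (the hash-index and repository abstractions below are illustrative stand-ins, and a Project constructor taking the id and data is assumed):

using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative abstraction over the [DataHash, AggregateId] table.
public interface IProjectHashIndex
{
    IReadOnlyList<Guid> GetAggregateIds(int dataHash);
    void Add(int dataHash, Guid aggregateId);
}

// Illustrative repository abstraction for the Project aggregate.
public interface IProjectRepository
{
    Project Load(Guid id);
    void Save(Project project);
}

public class ProjectFactoryService
{
    private readonly IProjectHashIndex _hashIndex;
    private readonly IProjectRepository _projects;

    public ProjectFactoryService(IProjectHashIndex hashIndex, IProjectRepository projects)
    {
        _hashIndex = hashIndex;
        _projects = projects;
    }

    public Project GetOrCreateProject(Guid id, byte[] data)
    {
        int hash = ComputeHash(data);

        // Several aggregates may share the same hash, so compare the actual
        // bytes to rule out collisions.
        foreach (var aggregateId in _hashIndex.GetAggregateIds(hash))
        {
            var existing = _projects.Load(aggregateId);
            if (existing != null && existing.Data.SequenceEqual(data))
                return existing;
        }

        // No project with identical data exists yet: create one and index it.
        var project = new Project(id, data); // assumes such a constructor exists
        _projects.Save(project);
        _hashIndex.Add(hash, project.Id);
        return project;
    }

    private static int ComputeHash(byte[] data)
    {
        // Any stable hash works here; this one is only for illustration.
        unchecked
        {
            int hash = 17;
            foreach (var b in data)
                hash = hash * 31 + b;
            return hash;
        }
    }
}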
This hash table is part of the domain now, and so part of the domain is no longer event sourced. Every future need for uniqueness validation (the name of a BackupableThing, for example) would imply creating more such tables, which add state-based storage to the domain side. This increases overall complexity and couples the domain tightly. This is the point where I'm starting to wonder whether event sourcing even applies here, and if not, where does it apply at all? I tried to apply it to a simple system as a means to increase my knowledge and fully understand CQRS/ES patterns, but now I'm fighting the complexities of set-based validation and I can see that simple state-based relational tables with some kind of ORM would be a much better fit (since I don't even need an event log).

You are prematurely shoehorning your problem into DDD patterns when major aspects of the domain haven't been fully analyzed or expressed. This is a dangerous mix.
What is a Project, if you ask an expert of your domain? (hint: probably not "Project is some entity to encapsulate binary data")
What is a Backup, if you ask an expert of your domain?
What constraints about them should be satisfied in the real world?
What is a typical use case around backing things up?
We're progressively finding out more about some of these as you add updates and comments to your question, but it's the wrong way around.
Don't take Aggregates and Repositories and projections and unique keys as a starting point. Instead, first write clear definitions of your domain terms. What business processes are users carrying out? Since you say you want to use Event Sourcing, what events are happening? Figure out if your domain is rich enough for DDD to be a relevant modelling approach. When all of this is clearly stated, you will have the words to describe your backup uniqueness problem and approach it from a more relevant angle. I don't think you have them now.

No need to "query read-side" - as that is a bad idea. What you do is create a domain storage model for just the domain.
So you'll have the domain objects saved to EventStore and some special things saved somewhere else SQL, Key-Value, etc. Then a read consumer building your read models in SQL.
For instance in my app my domain instances listen to events to build domain query models which I save to riak kv.
A simple example which should illustrate my meaning. Queries are handled via a query processor, a popular pattern
class Handler :
    IHandleMessages<Events.Added>,
    IHandleMessages<Events.Removed>,
    IHandleQueries<Queries.ObjectsByName>
{
    private readonly IOrm _orm; // whatever data-access component backs the query model (type name illustrative)

    public Handler(IOrm orm) { _orm = orm; }

    public void Handle(Events.Added e) {
        _orm.Add(new { ObjectId = e.ObjectId, Name = e.Name });
    }

    public void Handle(Events.Removed e) {
        _orm.Remove(x => x.ObjectId == e.ObjectId && x.Name == e.Name);
    }

    public IEnumerable<object> Handle(Queries.ObjectsByName q) {
        // The query processor returns these results to the caller.
        return _orm.Query(x => x.Name == q.Name);
    }
}

My answer is quite generic as I'm not sure I fully understand your problem domain, but there are only two main ways to tackle set validation problems.
1. Enforce strong consistency
Enforcing strong consistency means that the invariant will be protected transactionally and therefore can never be violated.
Enforcing strong consistency will most likely limit the scalability of your system, but if you can afford it then it may be the simplest way to go: preventing the conflict from occurring rather than dealing with the conflict after the fact is usually easier.
There are numerous ways strong consistency can be enforced, but here's two common ones:
Rely on a database unique constraint: If you have a datastore that supports them, and both your event store and this datastore can participate in the same transaction, then you can use this approach.
E.g. (pseudo-code)
transaction {
    uniquenessService.reserve(uniquenessKey); // writes to a DB unique index
    // save aggregate that holds uniquenessKey
}
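Concretely, in C# the reservation half might look roughly like this (a sketch only: the UniquenessKeys table, its unique index and the shared ADO.NET transaction are assumptions):

using System.Data;

public class UniquenessService
{
    private readonly IDbConnection _connection;
    private readonly IDbTransaction _transaction; // same transaction the event store append enlists in

    public UniquenessService(IDbConnection connection, IDbTransaction transaction)
    {
        _connection = connection;
        _transaction = transaction;
    }

    // Throws on a unique-index violation if the key is already reserved,
    // which rolls back the whole transaction, including the aggregate save.
    public void Reserve(string uniquenessKey)
    {
        using (var command = _connection.CreateCommand())
        {
            command.Transaction = _transaction;
            command.CommandText =
                "INSERT INTO UniquenessKeys (UniquenessKey) VALUES (@key)";

            var parameter = command.CreateParameter();
            parameter.ParameterName = "@key";
            parameter.Value = uniquenessKey;
            command.Parameters.Add(parameter);

            command.ExecuteNonQuery();
        }
    }
}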
Use an aggregate root: This approach is very similar to the one described above, but one difference is that the rule lives explicitly in the domain rather than in the DB. The aggregate will be responsible for maintaining an in-memory set of uniqueness keys.
Given that the entire set of keys will have to be brought into memory every time you need to record a new one, you should probably cache these kinds of aggregates in memory at all times.
I usually use this approach only when there's a very small set of potential unique keys. It could also be useful in scenarios where the uniqueness rule is very complex in itself and not a simple key lookup.
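For the question's scenario, a minimal sketch of such an aggregate could look like this (names and the exception type are illustrative; an event-sourced version would raise and apply events instead of mutating the dictionary directly):

using System;
using System.Collections.Generic;

// An aggregate whose only job is to protect the uniqueness invariant
// for project names within one consistency boundary.
public class ProjectNameRegistry
{
    private readonly Dictionary<string, Guid> _projectsByName =
        new Dictionary<string, Guid>(StringComparer.OrdinalIgnoreCase);

    public void Reserve(string name, Guid projectId)
    {
        if (_projectsByName.ContainsKey(name))
            throw new InvalidOperationException(
                $"A project named '{name}' already exists.");

        _projectsByName.Add(name, projectId);
    }

    public void Release(string name)
    {
        _projectsByName.Remove(name);
    }
}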
Please note that even when enforcing strong consistency the UI should probably prevent invalid commands from being sent. Therefore, you could also have the uniqueness information available through a read model which would be consumed by the UI to detect conflicts early.
2. Eventual consistency
Here you would allow the rule to get violated, but then perform some compensating actions (either automated or manual) to resolve the problem.
Sometimes it's just overly limiting or challenging to enforce strong consistency. In these scenarios, you can ask the business if they would accept resolving the broken rule after the fact. Duplicates are usually extremely rare, especially if the UI validates the command before sending it, as it should (hackers could abuse the client-side check, but that is another story).
Events are great hooks when it comes to resolving consistency problems. You could listen to events such as SomeThingThatShouldBeUniqueCreated and then issue a query to check if there are duplicates.
Duplicates would be handled in the way the business wants them to be. For instance, you could send a message to an administrator so that he can manually resolve the problem.
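As an illustration only (the event, read-model and notifier contracts below are made up), such a handler might look like this:

using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative types: the real event and read-model contracts would live elsewhere.
public class ProjectCreated
{
    public Guid ProjectId { get; set; }
    public int DataHash { get; set; }
}

public interface IProjectReadModel
{
    IEnumerable<(Guid Id, int DataHash)> FindByDataHash(int dataHash);
}

public interface IAdminNotifier
{
    void NotifyDuplicate(Guid newProjectId, IEnumerable<Guid> existingProjectIds);
}

public class DuplicateDetectionHandler
{
    private readonly IProjectReadModel _readModel;
    private readonly IAdminNotifier _notifier;

    public DuplicateDetectionHandler(IProjectReadModel readModel, IAdminNotifier notifier)
    {
        _readModel = readModel;
        _notifier = notifier;
    }

    public void Handle(ProjectCreated e)
    {
        // Look for other projects sharing the same data hash.
        var duplicates = _readModel
            .FindByDataHash(e.DataHash)
            .Where(p => p.Id != e.ProjectId)
            .Select(p => p.Id)
            .ToList();

        // The business decides what happens next; here we simply escalate.
        if (duplicates.Any())
            _notifier.NotifyDuplicate(e.ProjectId, duplicates);
    }
}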
Even though we may think that strong consistency is always needed, in many scenarios it is not. You have to explore the risks of allowing a rule to get violated for a period of time with business experts and determine how often that would occur. Sometimes you may realize that there is no real risk for the business and that the strong consistency was artificially imposed by the developer.

Related

Optimal way to overcome inconsistency when storing two domain models (User) with same unique identifier ("username")

When creating users I want to avoid duplicate usernames.
Prior to creating, I check whether the username already exists and throw an exception if it does; however, when user requests are executed in parallel my code won't prevent storing duplicate usernames.
Edited
By the way, I want to maintain consistency not in the database but in the application layer. I don't want to depend on a specific database.
What do you think will be the optimal way to solve this problem?
Prior to creating, I check whether the username already exists and throw an exception if it does; however, when user requests are executed in parallel my code won't prevent storing duplicate usernames.
The general term for what you are trying to achieve is set validation.
If you need to ensure that any change to a member of the set satisfies some invariant, then that implies that the set itself is a thing that you need to be able to load into memory. So your domain model might include a User Registry entity, and all modifications to users pass through the registry.
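A minimal sketch of such a registry, assuming the whole set of usernames is small enough to load into memory (names are illustrative):

using System;
using System.Collections.Generic;

// The set itself becomes the aggregate, so all registrations are serialized
// through a single consistency boundary.
public class UserRegistry
{
    private readonly Dictionary<string, Guid> _usersByName =
        new Dictionary<string, Guid>(StringComparer.OrdinalIgnoreCase);

    public Guid Register(string username)
    {
        if (_usersByName.ContainsKey(username))
            throw new InvalidOperationException(
                $"Username '{username}' is already taken.");

        var userId = Guid.NewGuid();
        _usersByName.Add(username, userId);
        return userId;
    }
}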
When you are dealing with uniqueness, another possibility is to use the unique property itself as the primary key (either as a natural key, or a hash), then write your constraints so as to ensure that you don't get two different users stored under the same key.
(Do users in your domain have multiple email addresses? do they change addresses?)
It might be that the mapping of an email address to a user is a separate relationship from the user itself. Or that a user claiming to control an email address is a separate piece of information from verifying that the user controls that email address.
(In short, modeling information that your system controls differs from modeling information that some other system controls).
If you are using a relational data store, a unique constraint on the user name column will guarantee you can't store two users with the same user name.
Additionally, the check you perform could be transactional with the insert, so you check that the name does not exist and store the new user within a single transaction.
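A rough sketch of that idea (the Users table, its UNIQUE constraint on Username, and the T-SQL error handling are assumptions; adjust for your database):

using System.Data;

public class UserWriter
{
    private readonly IDbConnection _connection; // assumed to be an open connection

    public UserWriter(IDbConnection connection) => _connection = connection;

    public void CreateUser(string username)
    {
        using (var tx = _connection.BeginTransaction(IsolationLevel.Serializable))
        using (var cmd = _connection.CreateCommand())
        {
            cmd.Transaction = tx;

            // The check and the insert run in one transaction; the UNIQUE
            // constraint on Username remains the final safety net.
            cmd.CommandText =
                @"IF EXISTS (SELECT 1 FROM Users WHERE Username = @name)
                      THROW 50001, 'Username already exists.', 1;
                  INSERT INTO Users (Username) VALUES (@name);";

            var p = cmd.CreateParameter();
            p.ParameterName = "@name";
            p.Value = username;
            cmd.Parameters.Add(p);

            cmd.ExecuteNonQuery();
            tx.Commit();
        }
    }
}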
On a human level, using email as a user name often makes it easier for people to remember their user name (rather than being Ako542 in one place and Ako392 in another). It is unlikely that two people would attempt to use it in parallel, thus making it unlikely they will see the message generated by the technical solutions provided above.
It's better and easier to ensure uniqueness at the DB level, but if you absolutely must do it at the application level, and uniqueness must be guaranteed at the service-cluster level (between all running instances of the service, so working with memory won't help), and you need a solution for the race condition, you can use a distributed locking mechanism. Normally, microservice ecosystems have a component that provides locking functionality as a service (Consul, for example), but it is still a somewhat more complex solution, and instead of being coupled to the DB the solution will be coupled to the lock-providing service.
Again, this is only relevant for very specific cases, and will help avoid race-condition problems during record creation between different instances of the service at the application level (the issue you described).

Implementing object change tracking in an N-Tier WCF MVC application

Most of the examples I've seen online show object change tracking in a WinForms/WPF context. Or, if it's on the web, connected objects are used, and therefore the changes made to each object can be tracked.
In my scenario, the objects are disconnected once they leave the data layer (mapped into business objects in WCF, and mapped into DTOs in the MVC application).
When the users make changes to the object in MVC (e.g., changing one field), how do I send that change from the view all the way down to the DB?
I would like to have an audit table that saves the changes made to a particular object. What I would like to save is the before and after values of an object, but only for the properties that were modified.
I can think of a few ways to do this:
1) Implement an IsDirty flag for each property for all models in the MVC layer (or in the JavaScript?). Propagate that information all the way back down to the service layer, and finally the data layer.
2) Having this change tracking mechanism within the service layer would be great, but how would I then keep track of the "original" values after the modified values have been passed back from MVC?
3) Database triggers? But I'm not sure how to get started. Is this even possible?
Are there any known object change tracking implementations out there for an n-tier mvc-wcf solution?
Example of the audit table:
Audit table
Id   Object     Property   OldValue   NewValue
----------------------------------------------
1    Customer   Name       Bob        Joe
2    Customer   Age        21         22
Possible solutions to this problem will depend in large part on what changes you allow in the database while the user is editing the data.
In other words, once it "leaves" the database, is it locked exclusively for the user, or can other users or processes update it in the meantime?
For example, if the user can get the data and sit on it for a couple of hours or days, but the database continues to allow updates to the data, then you really want to track the changes the user has made to the version currently in the database, not the changes that the user made to the data they are viewing.
The way that we handle this scenario is to start a transaction, read the entire existing object, and then use reflection to compare the old and new values, logging the changes into an audit log. This gets a little complex when dealing with nested records, but is well worth the time spent to implement.
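A simplified sketch of that reflection-based comparison (flat, readable properties only; nested records would need recursion):

using System;
using System.Collections.Generic;
using System.Reflection;

public class AuditEntry
{
    public string ObjectName { get; set; }
    public string Property { get; set; }
    public string OldValue { get; set; }
    public string NewValue { get; set; }
}

public static class AuditDiff
{
    // Compares two instances of the same type and returns one entry per changed property.
    public static IEnumerable<AuditEntry> Compare<T>(T original, T modified)
    {
        foreach (PropertyInfo property in typeof(T).GetProperties(
                     BindingFlags.Public | BindingFlags.Instance))
        {
            // Skip indexers and write-only properties.
            if (!property.CanRead || property.GetIndexParameters().Length > 0)
                continue;

            object oldValue = property.GetValue(original);
            object newValue = property.GetValue(modified);

            if (!Equals(oldValue, newValue))
            {
                yield return new AuditEntry
                {
                    ObjectName = typeof(T).Name,
                    Property = property.Name,
                    OldValue = oldValue?.ToString(),
                    NewValue = newValue?.ToString()
                };
            }
        }
    }
}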
If, on the other hand, no other users or processes are allowed to alter the data, then you have a couple of different options that vary in complexity, data storage, and impact to existing data structures.
For example, you could modify each property in each of your classes to record when it has changed and keep a running tally of these changes in the class (obviously a base class implementation helps substantially here).
However, depending on the point at which you capture the user's changes (every time they update the field in the form, for example), this could generate a substantial amount of non-useful log information because you probably only want to know what changed from the database perspective, not from the UI perspective.
You could also deep clone the object and pass that around the layers. Then, when it is time to determine what has changed, you can again use reflection. However, depending on the size of your business objects, this approach can impose a hefty performance penalty since a complete copy has to be moved over the wire and retained with the original record.
You could also implement the same approach as the "updates allowed while editing" approach. This, in my mind, is the cleanest solution because the original data doesn't have to travel with the edited data, there is no possibility of tampering with the original data, and it supports numerous clients without having to implement change tracking at the UI level.
There are two parts to your question:
How to do it in MVC:
The usual way: you send the changes back to the server, a controller handles them, etc.
There is nothing unusual in your use case that mandates a change in the way MVC usually works.
It is better for your use-case scenario for the changes to be encoded as individual change operations, not as a modified object where you need to use reflection to find out what changes, if any, the user made.
How to do it on the database:
This is probably your intended question:
First of all, stay away from ORM frameworks; life is too complex as it is.
On the last step of the save operation you should have the following information:
The objects and fields that need to change and their new values.
You need to keep track of the following information:
What the last change was to the object you intend to modify in the database.
This can be obtained from the Audit table and needs to be saved in a Session (or Session-like object).
Then you need to do the following in a transaction:
Obtain the last change to the object(s) being modified from the database.
If the objects have changed, abort and inform the user of the collision.
If not, obtain the current values of the fields being changed.
Save the new values.
Update the Audit table.
I would use a stored procedure for this to make the process less chatty, and for greater separation of concerns between the database code and the application code.
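Those steps, roughly sketched from the application side (the stored procedure name, its parameters and the single-column update are placeholders):

using System;
using System.Data;

public class AuditedSaver
{
    private readonly IDbConnection _connection; // assumed to be an open connection

    public AuditedSaver(IDbConnection connection) => _connection = connection;

    // lastKnownChangeId comes from the Audit table and was stored in the
    // session when the user started editing.
    public void Save(Guid customerId, string newName, long lastKnownChangeId)
    {
        using (var tx = _connection.BeginTransaction())
        using (var cmd = _connection.CreateCommand())
        {
            cmd.Transaction = tx;
            cmd.CommandText = "dbo.SaveCustomerNameWithAudit"; // hypothetical procedure
            cmd.CommandType = CommandType.StoredProcedure;

            AddParameter(cmd, "@CustomerId", customerId);
            AddParameter(cmd, "@NewName", newName);
            AddParameter(cmd, "@LastKnownChangeId", lastKnownChangeId);

            // The procedure is expected to: re-read the latest audit id, raise an
            // error if it no longer matches, otherwise write the new value and
            // append the old/new pair to the Audit table.
            cmd.ExecuteNonQuery();
            tx.Commit();
        }
    }

    private static void AddParameter(IDbCommand cmd, string name, object value)
    {
        var p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value;
        cmd.Parameters.Add(p);
    }
}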

Change tracking entities from multiple sources in Domain Driven Design

I am currently in the process of developing a rather big web application and am using domain-driven design.
I have currently run into some trouble with tracking changes to my Product entity. The thing is, products are constructed partly from data in SQL Azure and partly from data in Azure Table Storage. If certain properties are changed, I will need to persist to both; other changes go to only one.
As a result I cannot use NHibernate or Entity Framework for tracking changes. For instance, the Price argument of the
public void AddPrice(Price price)
method on the Product entity must be persisted to SQL Azure, calculations on a range of prices will take place, and the result will be saved to Azure Table Storage.
How would you solve this?
Thoughts:
1) I thought about implementing my own change tracker based on Castle.DynamicProxy, but that seems rather tedious.
2) Implement events internally in the domain entities. This is not a good thing.
Scattering one entity across several persistent stores might not be a good idea. To be more precise, it might mean that it's not one and the same entity and could be split up into smaller, more accurately designed parts instead.
calculations on a range of prices will take place
Are you sure these calculations affect the Product entity and should be handled by the same NHibernate/EF session used in the Product repository? Since they have to be stored elsewhere, don't they make up a first-class notion in the ubiquitous language, resulting in a separate entity with persistence logic of its own?
See http://ayende.com/blog/153699/ask-ayende-repository-for-abstracting-multiple-data-sources
What do ORMs do? They take a copy of the data that's used to restore your object into its current state, just before they hand you a reference to the object. When behavior has been applied to the object and you ask to persist it, the ORM will compare its copy of the data to the data currently inside the object and flush changes accordingly. Why not do the same? The only difference is that not all detected changes will be flushed to the same datastore.
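A hedged sketch of that idea applied to the prices example: take a snapshot at load time, diff it on save, and route each detected change to the store it belongs to (both store interfaces below are assumptions):

using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative contracts; the real ones would wrap SQL Azure and Azure Table Storage.
public interface IPriceStore { void SavePrices(Guid productId, IReadOnlyList<decimal> prices); }
public interface ICalculationStore { void SaveAverage(Guid productId, decimal averagePrice); }

public class ProductChangeTracker
{
    private readonly Guid _productId;
    private readonly List<decimal> _snapshot; // copy of the data taken at load time
    private readonly List<decimal> _current;

    public ProductChangeTracker(Guid productId, IEnumerable<decimal> loadedPrices)
    {
        _productId = productId;
        _snapshot = loadedPrices.ToList();
        _current = loadedPrices.ToList();
    }

    public void AddPrice(decimal price) => _current.Add(price);

    // On save, diff snapshot and current state like an ORM would, but route
    // each detected change to the store it belongs to.
    public void Flush(IPriceStore priceStore, ICalculationStore calculationStore)
    {
        if (_current.SequenceEqual(_snapshot))
            return; // nothing changed, nothing to persist

        priceStore.SavePrices(_productId, _current);

        if (_current.Count > 0)
            calculationStore.SaveAverage(_productId, _current.Average());

        _snapshot.Clear();
        _snapshot.AddRange(_current); // the new state becomes the baseline
    }
}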
HTH.
BTW, any concurrency going on here?

DDD: How to handle one entity stored in multiple storage systems?

I'm currently in the process of migrating a legacy application to a domain driven design model and I came across this issue:
The application is about managing a large number of contacts in real time, including checking for duplicates. Whenever someone saves a new contact, it has to pass a duplicate check with 3rd-party software (it's basically similarity-search software). If it passes the check, the contact will be created in SQL, and a small subset of the contact (some core fields which are relevant for duplicate checking) has to be stored in the database of the 3rd-party software.
So the entity "contact" lives in two (synchronized) storage systems, but one system only has a small subset of fields whereas SQL has 50+ fields for the contact.
Now I was wondering whether it would be OK to create two types for "contact" (Contact and ContactShort). As a result I'd also have to create two repositories for those entities and use them in a domain service which is ultimately used to perform the operations where I need the duplicate-checking software (like the Save/Insert methods).
Is there a good rule of thumb of how to approach such a scenario?
EDIT: I still haven't found a definitive solution but thought a bit more about it:
Maybe it was wrong to separate the duplicate-checking storage system from the SQL DB in this case. Actually, I think it is wrong to expose the methods of the 3rd-party software. It is pure infrastructure. Since a save operation must never be performed without the duplicate check, I think the calls to the 3rd-party software should be internal to the SQLRepository. It must never leave the infrastructure layer, since it can never return a valid contact entity. What do you think?
To me your suggested solution sounds good. At the lower level (the data access layer) you should have two independent objects that wrap access to two different databases (two repositories, as you require different connection strings; these can be two instances of the same XXXRepository if you use the same database engine, or two different repositories, XXXRepository and YYYRepository, to access two different database engines).
On the upper level (the domain layer and GUI), however, you shouldn't be bothered with how and where these data go. As you said, you have a service that separates the plumbing so that the application domain and upper layers (like the GUI) won't see what's happening below (in the data access layer).
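A rough sketch of how these pieces could fit together, combining the answer's layering with the question's edit (all contracts and field names are illustrative; only a couple of the 50+ contact fields are shown):

using System;

// Pure infrastructure: the similarity-search client never leaks outside the repository.
public interface IDuplicateChecker
{
    bool IsDuplicate(string name, string email);            // subset of core fields
    void Register(Guid contactId, string name, string email);
}

public interface IContactSqlStore
{
    void Insert(Contact contact);                            // the full contact record
}

public class Contact
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
    // ...plus the remaining fields stored only in SQL
}

public class ContactRepository
{
    private readonly IContactSqlStore _sql;
    private readonly IDuplicateChecker _duplicateChecker;

    public ContactRepository(IContactSqlStore sql, IDuplicateChecker duplicateChecker)
    {
        _sql = sql;
        _duplicateChecker = duplicateChecker;
    }

    public void Save(Contact contact)
    {
        // The save operation can never bypass the duplicate check.
        if (_duplicateChecker.IsDuplicate(contact.Name, contact.Email))
            throw new InvalidOperationException("A similar contact already exists.");

        _sql.Insert(contact);
        _duplicateChecker.Register(contact.Id, contact.Name, contact.Email);
    }
}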

How to create rich domain objects while maintaing persistence ignorance?

First off, I am using web forms without any ORM framework.
I have been struggling with how to make my domain objects as "smart" and "rich" as they can be without allowing them access to my service and repository layer. My most recent attempt was in creating a model for gift certificates for an online store.
The main recurring issues that I am seeing are:
More and more logic keeps being introduced in the service layer. All the calls to the repository must pass through the service layer, and each time the parameters are validated (e.g. exists in the DB, etc.). As a result my service layer is growing, but my domain objects just have some simple contractual validations. Even object validation is in the service layer, since if the ID of the item is null, it will check the DB to ensure that the code is unique. IMHO, the consumer of the system should not care whether the functionality they need deals with persistence or not.
I have a separate POCO for transaction log entries for when a gift certificate is redeemed. I assume that I should put a list or collection of these transactions as a property of my Gift Certificate model, but I am still unsure of when that property should be filled. Do I add a separate method on the service for loading the transactions into an object on demand (e.g. LoadTransactions(gc object)), or should the transactions be automatically loaded any time an existing gift certificate or list of gift certificates is requested (or maybe an option in getGCs to load transactions as well)?
What about computed fields like "Available Balance"... should I even have properties like this on my object? Anytime I am working with the object, I will need to keep updating that property to ensure it is up to date. Right now I simply have a service method GetBalanceByCode(gc code).
Even actions like redeeming a gift certificate are basically 100% data-centric (take some input parameters, validate them and add a transaction log entry to db).
More and more logic keeps being introduced in the service layer (...)
Even object validation is in the service layer (...)
Validation is not the best candidate for a domain model element. Input (my personal preference is that it's represented as commands) should be validated at the application service level. Domain logic should model how the business works and assume that all the arguments are valid. Good candidates for domain logic are computations, for example: you want to have them in one single place and have them well tested.
I have a separate POCO for transaction log entries for when a gift certificate is redeemed.
This kind of object is known as an Event. You can learn about Events from Eric Evans' 'What I learnt since the Blue Book' presentation. An Event is basically an entity which is immutable. Events are quite often aggregates in their own right because usually there are lots of them. By making them aggregates, you don't have any problems with lazy loading them as part of another object's collection.
What about computed fields like "Available Balance"... should I even have properties like this on my object?
Computed properties are the kind of logic that naturally fits in the domain model; however, it's debatable whether the better approach is to compute the value each time or to compute it when the object changes and persist it in the DB.
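For example (a sketch only, assuming the certificate holds its own redemption history), the balance can simply be derived from the aggregate's state:

using System;
using System.Collections.Generic;
using System.Linq;

public class GiftCertificate
{
    private readonly List<decimal> _redemptions = new List<decimal>();

    public GiftCertificate(decimal initialValue) => InitialValue = initialValue;

    public decimal InitialValue { get; }

    // Computed each time from the aggregate's own state, so it can never
    // drift out of sync with the transaction log it is derived from.
    public decimal AvailableBalance => InitialValue - _redemptions.Sum();

    public void Redeem(decimal amount)
    {
        if (amount <= 0 || amount > AvailableBalance)
            throw new InvalidOperationException("Invalid redemption amount.");

        _redemptions.Add(amount);
    }
}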
Even actions like redeeming a gift certificate are basically 100% data-centric (take some input parameters, validate them and add a transaction log entry to db).
This action would be modeled as creating a CertificateRedeemed event. This event would probably be created by the Certificate aggregate or some other object. This blog post by Udi Dahan can be helpful.
This is not an entirely easy question to answer given the fact that domain models are very subjective and rely a lot on your... well, domain. It sounds like you are actually creating something similar to The Onion Architecture (and Part 2) described by Jeffrey Palermo. This is not a bad pattern to use, though DDD purists will tell you it leads to "anemic" domain models (where your domain objects are basically data holders with no behavior). The thing is, that may be exactly what you need in your scenario. A "full, rich" domain model may be overkill for what you are doing (and given your last bullet point, it sounds like that could be the case).
You may not need a domain model for your system at all. You could be well served with some view models (that is, simple data models that describe your view) and have your UI send some DTOs through your services to put the data in the database. If you find something that requires a more complex approach, then you can apply a richer domain model to that component. Also remember that you don't necessarily have one domain model in your system. There can, and in many cases should, be different models that describe things differently (often grouped into Bounded Contexts). The overall goal of DDD is to simplify otherwise complex systems. If it's causing you additional complexity, then you may be taking the long way round.
There is an approach called DCI (Data, Context, Interaction) which is supposed to be an alternative to old-school OOP. Although it does not explicitly address the issue of persistence ignorance, your question brought it to my mind because it deals with similar issues.
In DCI, domain objects are small data holders with only a little logic, like in your case, and interactions between them are implemented separately. The algorithm of an interaction is not spread through small methods of several objects; it lives in one place, which might make it more lucid and understandable.
I think it is still more of an academic idea than a solution we should start implementing tomorrow, but someone who comes across this question might be interested.
