The Whole Values (1) that quantify a domain model have been checked to
ensure that they are recognizable values, may have been further edited
for suitability by the domain model and have been Echoed Back (4) to
the user. All of these checks are immediate on entry. There is,
however, a class of checking that should be deferred until the last
possible moment.
In The CHECKS Pattern Language of Information Integrity, Ward Cunningham describes Deferred Validation (6) for whole objects, but this is still not fully clear to me :(
My understanding is that deferred validation is a very detailed validation of a complex object. So, when following DDD, should I put this validation in a test method or inside the domain property? Can it be implemented in the UI?
Also, when should I avoid it? What are the downsides of deferred validation? Can anyone explain this with an example? Thanks in advance.
There are various opinions on this, and validation is a pretty large subject, but usually you never want to allow a domain object to be in an invalid state. Therefore, validation occurs at object construction and exceptions are thrown immediately.
E.g. a Person object cannot exist without a name in most domains.
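To make the construction-time approach concrete, here is a minimal C# sketch. The Person class and its single Name property are assumptions for illustration, not code from the answer; the constructor rejects a missing name, so no Person can ever exist in an invalid state.

```csharp
using System;

// Hypothetical always-valid entity: the constructor enforces the invariant.
public sealed class Person
{
    public string Name { get; }

    public Person(string name)
    {
        // Fail fast: in most domains a Person cannot exist without a name.
        if (string.IsNullOrWhiteSpace(name))
            throw new ArgumentException("A person must have a name.", nameof(name));

        Name = name;
    }
}
```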
However, it's not always possible to validate an object's invariants at construction time. This is the case when an object must be allowed to exist in an incomplete/transient state.
E.g.
You are building an application which allows users to post an ad. All the fields are required before posting the ad, but there are a lot of details to fill and you want to give the user the option to save their unfinished work and continue later.
In the example above, it is not possible to validate the Ad entity at construction time, since you must allow incomplete ads to be saved.
In this case, the ad's validation would occur only when it is about to be posted.
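A rough sketch of deferring the validation to the posting step, assuming a hypothetical Ad with three required fields (Title, Description, Price). Drafts can be created and saved with any subset of fields; only Post() enforces the rules, and it reports every missing field at once.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public sealed class Ad
{
    // Any of these may be missing while the ad is still a draft.
    public string? Title { get; set; }
    public string? Description { get; set; }
    public decimal? Price { get; set; }
    public bool IsPosted { get; private set; }

    // Deferred validation: collect every problem in one pass.
    public IReadOnlyList<string> ValidateForPosting()
    {
        var errors = new List<string>();
        if (string.IsNullOrWhiteSpace(Title)) errors.Add("Title is required.");
        if (string.IsNullOrWhiteSpace(Description)) errors.Add("Description is required.");
        if (Price is null || Price <= 0) errors.Add("Price must be greater than zero.");
        return errors;
    }

    // The invariants are enforced only at the moment the ad is posted.
    public void Post()
    {
        var errors = ValidateForPosting();
        if (errors.Count > 0)
            throw new InvalidOperationException("The ad cannot be posted: " + string.Join(" ", errors));
        IsPosted = true;
    }
}
```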
Keep in mind that there are many other ways to solve the above issue in your domain. For example, one might not want to allow the Ad entity ever to be in an invalid state and could instead introduce a persistent AdBuilder object that represents the stateful ad-creation process.
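The AdBuilder alternative might look roughly like this. AdBuilder and PostedAd are hypothetical names: the builder is what gets persisted while the user works, and the entity can only be constructed once every required field is present.

```csharp
#nullable enable
using System;

// Always-valid entity: it can only be created with every required field.
public sealed class PostedAd
{
    public string Title { get; }
    public string Description { get; }
    public decimal Price { get; }

    internal PostedAd(string title, string description, decimal price)
    {
        Title = title;
        Description = description;
        Price = price;
    }
}

// Hypothetical persistent builder representing the unfinished creation process.
public sealed class AdBuilder
{
    public string? Title { get; set; }
    public string? Description { get; set; }
    public decimal? Price { get; set; }

    public bool IsComplete =>
        !string.IsNullOrWhiteSpace(Title) &&
        !string.IsNullOrWhiteSpace(Description) &&
        Price > 0;

    public PostedAd Build()
    {
        if (!IsComplete)
            throw new InvalidOperationException("The ad is not complete yet.");
        return new PostedAd(Title!, Description!, Price!.Value);
    }
}
```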
Someone could also decide that saving incomplete work is not a domain concern and that incomplete information should be stored on the client (e.g. localStorage in a web browser) until it is ready to be posted.
When creating users, I want to avoid duplicate usernames.
Before creating a user, I check whether the username already exists and throw an exception if it does; however, when user requests are executed in parallel, my code won't prevent duplicate usernames from being stored.
Edit:
By the way, I want to maintain consistency not in the database but in the application layer. I don't want to depend on a specific database.
What do you think would be the optimal way to solve this problem?
The general term for what you are trying to achieve is set validation.
If you need to ensure that any change to a member of the set satisfies some invariant, then that implies that the set itself is a thing that you need to be able to load into memory. So your domain model might include a User Registry entity, and all modifications to users pass through the registry.
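A minimal sketch of such a registry, assuming the set of usernames fits in memory and that usernames compare case-insensitively (both are assumptions). The check and the add happen inside one lock, so two parallel requests in the same process cannot both succeed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical registry aggregate: every registration goes through it.
public sealed class UserRegistry
{
    private readonly HashSet<string> _usernames =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    private readonly object _gate = new object();

    public UserRegistry(IEnumerable<string> existingUsernames)
    {
        foreach (var name in existingUsernames) _usernames.Add(name);
    }

    public void Register(string username)
    {
        // Check and add as one step; HashSet.Add returns false if the name is present.
        lock (_gate)
        {
            if (!_usernames.Add(username))
                throw new InvalidOperationException($"Username '{username}' is already taken.");
        }
    }
}
```

The lock only covers a single process. If several instances load and save the registry, saving it would also need an optimistic-concurrency check (e.g. a version number), so that two instances cannot both commit a change based on the same snapshot.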
When you are dealing with uniqueness, another possibility is to use the unique property itself as the primary key (either as a natural key or a hash), then write your constraints so as to ensure that you don't get two different users stored under the same key.
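The key-based idea can be sketched with an in-memory stand-in for the store. Here a ConcurrentDictionary keyed on the normalized username plays the role of the primary key, and its atomic TryAdd refuses a second user under the same key; the UserStore class and the normalization rule (trim plus upper-case) are assumptions for illustration.

```csharp
using System.Collections.Concurrent;

public sealed class User
{
    public User(string username) => Username = username;
    public string Username { get; }
}

// Stand-in for a store whose primary key is the normalized username.
public sealed class UserStore
{
    private readonly ConcurrentDictionary<string, User> _byKey =
        new ConcurrentDictionary<string, User>();

    // Natural key: "Ako542" and " ako542 " map to the same key.
    private static string KeyFor(string username) => username.Trim().ToUpperInvariant();

    // Atomic insert-if-absent: of several parallel requests for one key, exactly one wins.
    public bool TryCreate(User user) => _byKey.TryAdd(KeyFor(user.Username), user);
}
```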
(Do users in your domain have multiple email addresses? Do they change addresses?)
It might be that the mapping of an email address to a user is a separate relationship from the user itself. Or that a user claiming to control an email address is a separate piece of information from verifying that the user controls that email address.
(In short, modeling information that your system controls differs from modeling information that some other system controls.)
If you are using a relational data store, a unique constraint on the user name column will guarantee you can't store two users with the same user name.
Additionally, the check you perform could be transactional with the insert, so you check that the name does not exist and store the new user within a single transaction.
On a human level, using email as a user name often makes it easier for people to remember their user name (rather than being Ako542 in one place and Ako392 in another). It is unlikely that two people would attempt to register the same email address in parallel, which makes it unlikely that they will ever see the message generated by the technical solutions above.
It's better and easier to ensure uniqueness at the DB level, but if you absolutely must do it at the application level, uniqueness must be guaranteed across the whole service cluster (between all running instances of the service, so working with memory won't help), and you need a solution for the race condition, you can use a distributed locking mechanism. Most microservice ecosystems have a component that provides locking as a service (Consul, for example), but it is still a somewhat more complex solution, and instead of being coupled to the DB, the solution will be coupled to the lock-providing service.
Again, this is only relevant for very specific cases, and it will help avoid race-condition problems during record creation between different instances of the service at the application level (the issue you described).
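A sketch of the distributed-lock approach, with the lock service hidden behind a hypothetical IDistributedLock interface (not the API of Consul or any real client library). The InMemoryLock fake only works within one process and stands in for the real service in tests; the point is that the lock key is derived from the username, so the check-then-insert for one name can never interleave.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical abstraction over a lock service; illustrative only.
public interface IDistributedLock
{
    Task<IAsyncDisposable> AcquireAsync(string resource, CancellationToken ct = default);
}

// In-process fake. A real implementation would acquire the lock from the external
// service so that every running instance sees it.
public sealed class InMemoryLock : IDistributedLock
{
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _locks =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    public async Task<IAsyncDisposable> AcquireAsync(string resource, CancellationToken ct = default)
    {
        var semaphore = _locks.GetOrAdd(resource, _ => new SemaphoreSlim(1, 1));
        await semaphore.WaitAsync(ct);
        return new Releaser(semaphore);
    }

    private sealed class Releaser : IAsyncDisposable
    {
        private readonly SemaphoreSlim _semaphore;
        public Releaser(SemaphoreSlim semaphore) => _semaphore = semaphore;
        public ValueTask DisposeAsync() { _semaphore.Release(); return default; }
    }
}

public sealed class RegistrationService
{
    private readonly IDistributedLock _lock;
    private readonly ConcurrentDictionary<string, bool> _users; // stand-in for the shared user table

    public RegistrationService(IDistributedLock distributedLock, ConcurrentDictionary<string, bool> users)
    {
        _lock = distributedLock;
        _users = users;
    }

    public async Task<bool> RegisterAsync(string username)
    {
        string key = username.ToLowerInvariant();
        // Holding the per-username lock makes the check and the insert one unit.
        await using (var handle = await _lock.AcquireAsync("username:" + key))
        {
            if (_users.ContainsKey(key)) return false;
            _users[key] = true;
            return true;
        }
    }
}
```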
I'm wondering what's the best way to do validation of database constraints (e.g. UNIQUE) in an ASP.NET MVC application built with DDD in mind, where the underlying layers are the Application Layer (application services), the Domain Layer (domain model) and the Infrastructure Layer (persistence logic, logging, etc.).
I've been looking through lots of DDD samples, but many of them don't mention how to do validation in the repository (I suppose this is where this type of validation fits). If you know of any samples doing this, please share them; it will be much appreciated.
More specifically, I have two questions. How would you perform the actual validation? Would you explicitly check whether a customer name already exists by querying the database, or would you try inserting it directly into the database and catch the error, if any (which seems messy)? I prefer the first one, and if choosing it, should it be done in the repository, or should it be the job of an application service?
When the error is detected, how would you pass it to ASP.NET MVC so the user can be informed nicely about the error? Preferably using the ModelStateDictionary, so the error is easily highlighted on the form.
In the N-Layered app by Microsoft Spain, they use the IValidatableObject interface, and the simplest property validation is placed on the entity itself, such as:
public IEnumerable<ValidationResult> Validate(ValidationContext validationContext)
{
    var validationResults = new List<ValidationResult>();

    // Simple property rule, checked by the entity itself.
    if (String.IsNullOrWhiteSpace(this.FirstName))
        validationResults.Add(new ValidationResult(Messages.validation_CustomerFirstNameCannotBeNull, new string[] { "FirstName" }));

    return validationResults;
}
Before the entity is persisted, the Validate method is called to ensure that the properties are valid:
void SaveCustomer(Customer customer)
{
    var validator = EntityValidatorFactory.CreateValidator();

    if (validator.IsValid(customer)) // runs the entity's validation rules
    {
        _customerRepository.Add(customer);
        _customerRepository.UnitOfWork.Commit();
    }
    else
    {
        // Surface the messages to the caller (e.g. the MVC controller).
        throw new ApplicationValidationErrorsException(validator.GetInvalidMessages<Customer>(customer));
    }
}
The ApplicationValidationErrorsException can then be caught in the MVC application, and the validation error messages can be parsed and inserted into the ModelStateDictionary.
I could add all the validation logic to the SaveCustomer method, e.g. querying the database to check whether a customer already exists with a given value in a given column (the UNIQUE one).
Maybe this is okay, but I would rather have validator.IsValid (or something similar) do this for me, or have validation performed once again in the Infrastructure layer (if it belongs there; I'm not sure).
What do you think? How do you do it? I'm very interested in gaining more insight into different validation techniques in layered applications.
Possible solution #1
In the case where the validation logic can't be done in the presentation layer (like Iulian Margarintescu suggests) and needs to be done in the service layer, how would you pass validation errors up to the presentation layer?
Microsoft has a suggestion here (see listing 5). What do you think about that approach?
You mention DDD, yet there is a lot more to DDD than entities and repositories. I assume you are familiar with Mr Eric Evans's book Domain-Driven Design, and I would strongly suggest you re-read the chapters about strategic design and bounded contexts. Mr Evans also has a very nice talk called "What I've learned about DDD since the book" that you can find here. Talks about SOA, CQRS and event sourcing from Greg Young or Udi Dahan also contain a lot of information about DDD and applying it. I must warn you that you might discover things that will change the way you think about applying DDD.
Now for your question about validation: one approach might be to query the DB (using an Ajax call directed to an app service) as soon as the user types something in the "name" field, and to suggest an alternative name if the one he entered already exists. When the user submits the form, try to insert the record in the DB and handle any duplicate-key exception (at the repository or app service level). Since you are already checking for duplicates ahead of time, the cases where you get an exception should be fairly rare, so any decent "We are sorry, please retry" message should do; unless you have A LOT of users, they will probably never see it.
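A rough sketch of the "check early, insert and catch on submit" flow, using an in-memory stand-in where a HashSet plays the unique index. In a real repository, Add would catch the data provider's duplicate-key error and translate it into the hypothetical DuplicateCustomerNameException, so the application service never depends on database-specific exceptions.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical repository-level exception hiding the store-specific error.
public sealed class DuplicateCustomerNameException : Exception
{
    public DuplicateCustomerNameException(string name)
        : base($"A customer named '{name}' already exists.") { }
}

// In-memory stand-in for the database: the HashSet plays the unique index.
public sealed class CustomerRepository
{
    private readonly HashSet<string> _names = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    // Backs the early "is this name taken?" Ajax check.
    public bool NameExists(string name) => _names.Contains(name);

    public void Add(string name)
    {
        if (!_names.Add(name)) throw new DuplicateCustomerNameException(name);
    }
}

public sealed class CustomerService
{
    private readonly CustomerRepository _repository;
    public CustomerService(CustomerRepository repository) => _repository = repository;

    // Returns null on success, or a friendly message in the rare race where
    // someone took the name between the early check and the submit.
    public string? Register(string name)
    {
        try
        {
            _repository.Add(name);
            return null;
        }
        catch (DuplicateCustomerNameException)
        {
            return "We are sorry, that name was just taken. Please retry.";
        }
    }
}
```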
This post from Udi Dahan also has some information on approaching validation. Remember that this might be a constraint you are imposing on the business instead of a constraint that the business imposes on you; maybe it provides more value for the business to allow customers with the same name to register instead of rejecting them.
Also remember that DDD is a lot more about business than it is about technology. You can do DDD and deploy your app as a single assembly. Layers of client code on top of services on top of entities on top of repositories on top of databases have been abused so many times in the name of "good" design, without any reasons for why it is a good design.
I'm not sure this will answer your question(s), but I hope it will guide you to find the answers yourself.
I'm wondering what's the best way to do validation of database constraints (e.g. UNIQUE)
and if choosing it, should it be done in the repository, or should it be the job of an application service?
It depends on what you are validating.
If it's an aggregate root creation you are trying to validate, then there is nothing more global than the app itself that "holds" it. In this case, I apply validation directly in the repository.
If it's an entity, it lives in the context of its aggregate root. In this case, I validate entity uniqueness in the aggregate root itself, against all the other entities in that particular aggregate root. The same goes for value objects in entities/roots.
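For the entity case, a minimal sketch: a hypothetical Order aggregate root that allows only one line per product. The rule is checked against the root's own collection, so no query is needed; the one-line-per-product rule itself is an assumed example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class OrderLine
{
    public OrderLine(string productCode, int quantity)
    {
        ProductCode = productCode;
        Quantity = quantity;
    }

    public string ProductCode { get; }
    public int Quantity { get; }
}

// The aggregate root validates uniqueness among its own child entities.
public sealed class Order
{
    private readonly List<OrderLine> _lines = new List<OrderLine>();
    public IReadOnlyList<OrderLine> Lines => _lines;

    public void AddLine(string productCode, int quantity)
    {
        if (quantity <= 0)
            throw new ArgumentOutOfRangeException(nameof(quantity), "Quantity must be positive.");
        if (_lines.Any(l => string.Equals(l.ProductCode, productCode, StringComparison.OrdinalIgnoreCase)))
            throw new InvalidOperationException($"Product '{productCode}' is already on this order.");
        _lines.Add(new OrderLine(productCode, quantity));
    }
}
```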
P.S. A repository is a service. Do not look at services as a universal store for necessary but hard-to-name code. Naming matters. The same goes for names like "Helpers", "Managers", "Common", "Utilities", etc.; they are pretty much meaningless.
Also - you don't need to pollute your code base with pattern names: AllProducts > ProductRepository; OrderRegistrator > OrderService; order.isCompleted > IsOrderCompletedSpecification.IsSatisfiedBy.
More specifically, I have two questions. How would you perform the actual validation? Would you explicitly check whether a customer name already exists by querying the database, or would you try inserting it directly into the database and catch the error, if any (which seems messy)?
I would query the database. However, if high performance is a concern and customer name availability is the only thing the database should enforce, I would rely on the database (one less round trip).
When the error is detected, how would you pass it to ASP.NET MVC so the user can be informed nicely about the error? Preferably using the ModelStateDictionary so the error is easily highlighted on the form.
Usually it is not a good idea to use exceptions to control the flow of an application, but since I want the UI to offer only the actions that can actually be performed, I just throw an exception when validation fails. In the UI layer, there's a handler that neatly picks it up and renders it as HTML.
Also, it is important to understand the scope of a command (e.g. a product-ordering command might check two things: that the customer isn't a debtor and that the product is in stock). If a command has multiple associated validations, they should be grouped together so the UI receives them simultaneously. Otherwise it leads to an annoying user experience (going through multiple errors while trying to order that damn product over and over again).
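One way to couple the validations of a command, sketched with hypothetical names: the handler runs every check, collects the failures, and throws a single CommandValidationException carrying all of them, which a UI-layer handler could then render (for example, into ModelState). The debtor and stock lookups are passed in as delegates only to keep the sketch self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical exception carrying every failed rule of one command.
public sealed class CommandValidationException : Exception
{
    public CommandValidationException(IReadOnlyList<string> errors)
        : base(string.Join(" ", errors))
    {
        Errors = errors;
    }

    public IReadOnlyList<string> Errors { get; }
}

// Hypothetical "order product" command handler.
public sealed class OrderProductHandler
{
    private readonly Func<int, bool> _isDebtor;
    private readonly Func<string, int> _stockLevel;

    public OrderProductHandler(Func<int, bool> isDebtor, Func<string, int> stockLevel)
    {
        _isDebtor = isDebtor;
        _stockLevel = stockLevel;
    }

    public void Handle(int customerId, string productCode, int quantity)
    {
        // Run every check first, then fail once with all reasons together.
        var errors = new List<string>();
        if (_isDebtor(customerId))
            errors.Add("You have an outstanding balance.");
        int inStock = _stockLevel(productCode);
        if (inStock < quantity)
            errors.Add($"Only {inStock} of {productCode} in stock.");
        if (errors.Count > 0)
            throw new CommandValidationException(errors);

        // ...place the order here...
    }
}
```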
This is a beginner pattern question for a web forms-over-data sort of thing. I read Exposing database IDs - security risk? and the accepted answer has me thinking that this is a waste of time, but wait...
I have an MVC project referencing a business logic library, and an assembly of NHibernate SQL repositories referencing the same. If something forced my hand to go and reference those repositories directly from my controller codebase, I'd know what went wrong. But when those controllers talk in URL parameters with the database record IDs, does it only seem wrong?
I can't conceive of those IDs ever becoming unusable (by MVC actions). I don't think I'd ever need two UI entities corresponding to the same row in the database. I don't intend for the controller to interpret the ID in any way. Surrogate keys would make zero difference. Still, I want to solve the problem, because assumptions about the relational design aren't any better than layer-skipping dependencies.
How would you make a web application that only references the business logic assembly and talks in BL objects and GUIDs that only have meaning for that session, while the assembly persists transactions using database IDs?
You can encrypt or hash your IDs if you want, using the session ID as a salt. It depends on the context. On a public shopping site, you want the catalog page URLs to be clear and easily copyable. For user account admin, it's fine to encrypt the IDs, so users can't URL-hack into someone else's account.
I would not consider this to be security by obscurity. If a malicious user has one compromised account they can look at all the form fields, url ids, and cookie values set while logged in as that user. They can then try using those when logged in as a different user to escalate permissions. But by protecting them using session id as a salt, you have locked that data down so it's only useful in one session. The pages can't even be bookmarked. Could they figure out your protection? Possibly. But likely they'd just move on to another site. Locking your car door doesn't actually keep anyone out of your car if they want to get in, but it makes it harder, so everyone does it.
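A sketch of binding IDs to a session. It deliberately swaps the salted hash for an HMAC keyed with a server-side per-session secret: a hash salted only with the session ID from the cookie could be recomputed by anyone who knows the scheme, whereas the HMAC key never leaves the server. The "id.signature" token format and the SessionIdProtector name are assumptions. The ID itself stays readable; only tampering and reuse in another session are detected.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class SessionIdProtector
{
    public static string Protect(int id, byte[] sessionKey)
    {
        string idText = id.ToString(CultureInfo.InvariantCulture);
        return idText + "." + Sign(idText, sessionKey);
    }

    public static bool TryUnprotect(string token, byte[] sessionKey, out int id)
    {
        id = 0;
        int dot = token.IndexOf('.');
        if (dot <= 0) return false;

        string idText = token.Substring(0, dot);
        byte[] expected = Encoding.ASCII.GetBytes(Sign(idText, sessionKey));
        byte[] actual = Encoding.ASCII.GetBytes(token.Substring(dot + 1));

        // Constant-time comparison, so timing does not reveal how much of a forged signature matched.
        return CryptographicOperations.FixedTimeEquals(expected, actual)
            && int.TryParse(idText, NumberStyles.Integer, CultureInfo.InvariantCulture, out id);
    }

    private static string Sign(string idText, byte[] sessionKey)
    {
        using (var hmac = new HMACSHA256(sessionKey))
        {
            byte[] mac = hmac.ComputeHash(Encoding.UTF8.GetBytes(idText));
            // URL-safe Base64 so the token can sit in a route or query string.
            return Convert.ToBase64String(mac).TrimEnd('=').Replace('+', '-').Replace('/', '_');
        }
    }
}
```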
I'm no security expert, but I have no problem exposing certain IDs to the user, those such as Product IDs, User IDs, and anything that the user could normally read, meaning if I display a product to the user, displaying its Product ID is not a problem.
Things that are internal to the system that the users do not directly interact with, like Transaction IDs, I do not display to the user, not in fear of them editing it somehow, but just because that is not information that is useful to them.
Quite often in forms, I have the action point to "mysite.com/messages/view/5", where 5 is the message they want to view. In all of these actions, I always ensure that the user has access to view it (or modify or delete it, whichever functionality is required) by doing a simple database check that the logged-in user is the message's owner.
Be very very very careful as parameter tampering can lead to data modification. Rules on 'who can access what ids' must be very very carefully built into your application when exposing these ids.
For instance, if you are updating an Order based on OrderId, include in the where clause of your loads and updates:
where order.orderid=passedInOrderId and Order.CustomerId=
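The idea behind that where clause, sketched with LINQ: the ownership condition is part of the same query that loads the record. CustomerOrder and OrderQueries are hypothetical names, and the current customer's ID is assumed to come from the authenticated session rather than from the request.

```csharp
#nullable enable
using System.Linq;

public sealed class CustomerOrder
{
    public int OrderId { get; set; }
    public int CustomerId { get; set; }
}

public sealed class OrderQueries
{
    private readonly IQueryable<CustomerOrder> _orders; // e.g. an ORM-provided queryable

    public OrderQueries(IQueryable<CustomerOrder> orders) => _orders = orders;

    // Both conditions are in one query, so a tampered orderId that belongs
    // to another customer simply matches nothing.
    public CustomerOrder? LoadForCustomer(int orderId, int currentCustomerId) =>
        _orders.SingleOrDefault(o => o.OrderId == orderId && o.CustomerId == currentCustomerId);
}
```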
I developed an extension to help with stored ids in MVC available here:
http://mvcsecurity.codeplex.com/
Also I talk about this a bit in my security course at: Hack Proofing your ASP.NET MVC and Web Forms Applications
Other than those responses, sometimes it's good to use obvious IDs so people can hack the URL for the information they want. For example, www.music.com/artist/acdc or www.music.com/artist/smashing-pumpkins. If the ID is meaningful to your users and increases what they understand about the page from the URL alone, all the better; especially if your market segment is young or tech-savvy, use the ID to your advantage. This will also boost your SEO.
I would say that when an ID is not of use to the user, encode it. It only takes one developer making one mistake, not checking a customer ID against the session, to expose your entire customer base.
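If you go the readable-URL route, you need a slug for each name. A minimal sketch (the Slug helper is hypothetical): runs of spaces and punctuation become a single dash, so "AC/DC" turns into "ac-dc"; adjust the rule if you prefer "acdc". The slug must itself be unique, or be combined with the ID, for the lookup to work.

```csharp
using System.Text;

public static class Slug
{
    // Turns "Smashing Pumpkins" into "smashing-pumpkins".
    public static string From(string name)
    {
        var sb = new StringBuilder();
        bool pendingDash = false;
        foreach (char c in name.ToLowerInvariant())
        {
            if (char.IsLetterOrDigit(c))
            {
                // Emit one dash for the preceding separator run, but never a leading one.
                if (pendingDash && sb.Length > 0) sb.Append('-');
                sb.Append(c);
                pendingDash = false;
            }
            else
            {
                pendingDash = true;
            }
        }
        return sb.ToString();
    }
}
```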
But of course, your unit tests should catch that!
While you will find some people who say that IDs are just an implementation detail, in most systems you need a way of uniquely identifying a domain entity, and most likely you will generate an ID for that identifier. The fact that the ID is generated by the database is an implementation detail; but once it has been generated it becomes an attribute of the domain entity, and it is therefore perfectly reasonable to use it wherever you need to reference the entity.
Should I wrap calls to a repository with try-catch block (aiming to catch/handle StaleObjectStateException) inside a corresponding controller in ASP.NET MVC application or should it take place inside a repository implementation?
Also, how do I handle the exception and inform the user? As far as I understand, no rollback is intended?
Thanks!
The problem boils down to a different question: where and how should concurrent modification of entities be handled? That is: user A and user B edit the same record, and the one who saves it later (user B) gets a StaleObjectStateException because the version he edited is now out of date.
Here are some ideas:
1. Brute force: make user B's version the "right" one, e.g. by retrieving the current version of the record from the DB and applying the whole state of user B's version to it. This is problematic if user A has changed e.g. the "E-Mail address" field and user B has changed the "User name" field: with this approach, everything user A has done is lost. Here you'd catch StaleObjectStateException and fix everything inside the repository.
2. The "smart" approach: as in approach 1, everything is fixed inside the repository (i.e. it catches and handles StaleObjectStateException completely), but it uses domain knowledge to selectively apply only some of the changes user B made. E.g. if user A changed the e-mail address and user B changed the user name, those changes don't exclude each other, so the repository could apply only user B's user-name change on top of user A's version. This works well if the aspects of the record that were changed concurrently do not directly rely on each other. Implementing this solution can be rather complex, depending on how "smart" you want to be.
3. Reject concurrent changes inside the repository. In this case, if the StaleObjectStateException occurs, the repository needs to report back that it couldn't save the record. It could just let the exception bubble up, but then you're leaking NHibernate up to, e.g., the controller. Instead, you could throw your own exception with more useful details that are meaningful to your domain. In this situation, the controller is a good place to catch that exception. You then have different options for what to do, e.g.:
   - Inform the user that the record couldn't be saved due to concurrent changes by another user and throw all changes away, forcing him to do everything from scratch. This is of course painful for the user and should only be done if this really occurs rarely.
   - Inform the user about the issue and let him decide what to do, e.g. force his changes or start over.
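Approach 3 could be sketched as follows. Since NHibernate is not part of the sketch, an in-memory repository with a version number stands in for the ORM's optimistic-concurrency check; in an NHibernate-backed repository, Save would catch StaleObjectStateException and throw the hypothetical ConcurrencyConflictException instead, so the controller only has to know about the domain exception.

```csharp
using System;
using System.Collections.Generic;

public sealed class UserRecord
{
    public int Id { get; set; }
    public string Email { get; set; } = "";
    public string UserName { get; set; } = "";
    public int Version { get; set; } // plays the role of the ORM's version column
}

// Domain-level exception: callers never see the ORM's exception type.
public sealed class ConcurrencyConflictException : Exception
{
    public ConcurrencyConflictException(int id)
        : base($"Record {id} was changed by another user.") { }
}

// In-memory stand-in for an ORM-backed repository with optimistic concurrency.
public sealed class UserRepository
{
    private readonly Dictionary<int, UserRecord> _rows = new Dictionary<int, UserRecord>();

    public void Insert(UserRecord r) => _rows[r.Id] = Copy(r);

    // Each user edits a detached copy, as in a web request.
    public UserRecord Load(int id) => Copy(_rows[id]);

    public void Save(UserRecord edited)
    {
        var current = _rows[edited.Id];
        if (current.Version != edited.Version)
            throw new ConcurrencyConflictException(edited.Id); // reject the stale write
        edited.Version++;
        _rows[edited.Id] = Copy(edited);
    }

    private static UserRecord Copy(UserRecord r) => new UserRecord
    {
        Id = r.Id, Email = r.Email, UserName = r.UserName, Version = r.Version
    };
}
```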
While this goes beyond your question, I hope this still helps you.
If you are properly using DI, putting this kind of exception management in the controller would break your separation of concerns.
The controllers belong to the presentation layer, the presentation layer is not aware of what you are using for data storage, and StaleObjectStateException is NHibernate-specific.
If I have a 3-layer web forms application that takes user input, I know I can validate that input using validation controls in the presentation layer. Should I also validate in the business and data layers to protect against SQL injection and other issues? What validations should go in each layer?
Another example would be passing an ID to return a record. Should the data layer ensure that the ID is valid, or should that happen in the BLL/UI?
You should validate in all layers of your application.
What validation will occur at each layer is specific to the layer itself. Each layer should be safe to send "bad" requests to and get a meaningful response, but which checks to perform at each layer will depend on your specific requirements.
Broadly:
User Interface - Should validate user input, provide helpful error messages and visual clues to correcting them; it should be protecting your lower layers against invalid user input.
Business / Domain Layer - Should check arguments to methods are valid (throwing ArgumentException and similar when they are not) and should check that operations are possible within the constraints of your business rules; it should be protecting your domain against programming mistakes.
Data Layer - Should check the data you are trying to insert or update is valid within the context of your database, that it meets all the relational constraints and check constraints; it should be protecting your database against mistakes in data-access.
Validation at each layer will ensure that only data and operations the layer believes to be correct are allowed to enter. This gives you a great deal of predictability, knowing information had to meet certain criteria to make it through to your database, that operations had to be logical to make it through your domain layer, and that user input has been sanitized and is easier to work with.
It also gives you security knowing that if any of your layers was subverted, there is another layer performing checks behind it which should prevent anything entering which you don't want to.
Should I also validate in the business and data layers to protect against SQL injection and other issues?
Yes and Yes.
In your business layer code, you need to validate the input again (as client side can be spoofed), and also for your business logic, making sure the entries make sense for your application.
As for the data layer, you again need to ensure the data is valid for the DB. Use parameterized queries; this will pretty much ensure that no SQL injection happens.
As for your specific question regarding the ID: the DB will know whether an ID exists or not. Whether it is valid depends on whether it has meaning for your business layer. If it is purely a DB artefact (not part of your object model), then the DB needs to handle it; if it is part of your object model and significant to it, the business layer should handle it.
You absolutely need to validate in your business and data layers. The UI is an untrusted layer, it is always possible for somebody to bypass your client-side validation and in some cases your server-side UI validation.
Preventing SQL injection is simply a matter of parameterizing your queries. The phrase "SQL Injection" shouldn't even exist anymore, it's been a solved problem for years and years, and yet every day I see people writing queries using string concatenation. Don't do this. Parameterize the commands and you will be fine.
One of the main reasons you separate your app into multiple tiers is so that each tier is reusable. If individual tiers don't do their own validation, then they are not autonomous and you don't have proper separation of concerns. You also can't do any thorough testing without individual components doing built-in validation.
I tend to relax these restrictions for classes or methods that are internal or private because they're not getting directly tested or used. As long as the public API is fully-validated, private APIs can generally assume that the class is in a valid state.
So, basically, yes, every layer, in fact every public class and method needs to validate its own data/arguments.
Semantic validation, like checking whether or not a particular Customer ID is valid, is going to depend on your design requirements. Obviously the business layer has no way of knowing whether or not an ID exists until said ID actually hits the data layer, so it can't perform this check in advance. Whether it throws an exception for a missing ID or simply returns null/ignores the error depends on exactly what the class/method is designed to do.
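The two conventions for a missing ID, sketched side by side on a hypothetical in-memory CustomerDirectory: Find treats a missing ID as normal and returns null, while Get treats it as an error and throws KeyNotFoundException. Which one a method uses should be part of its documented contract.

```csharp
#nullable enable
using System.Collections.Generic;

public sealed class Customer
{
    public Customer(int id, string name)
    {
        Id = id;
        Name = name;
    }

    public int Id { get; }
    public string Name { get; }
}

public sealed class CustomerDirectory
{
    private readonly Dictionary<int, Customer> _byId = new Dictionary<int, Customer>();

    public void Add(Customer c) => _byId.Add(c.Id, c);

    // "Absence is normal": return null and let the caller decide.
    public Customer? Find(int id) => _byId.TryGetValue(id, out var c) ? c : null;

    // "Absence is an error": fail loudly.
    public Customer Get(int id) =>
        _byId.TryGetValue(id, out var c)
            ? c
            : throw new KeyNotFoundException($"No customer with ID {id}.");
}
```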
However, if this ID needs to be in a special format - for example, maybe you're using specially-coded account numbers ("R-12345-A-678") - then it does become the responsibility of the domain/business layer to validate the input and make sure it conforms to the correct format, especially if the consumer of your business class is trying to create a new account.
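A sketch of the format check as a value object. The pattern (one letter, five digits, one letter, three digits) is only an assumption inferred from the single "R-12345-A-678" example; the real rule would come from the business.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical value object for a specially coded account number.
public sealed class AccountNumber
{
    // [0-9] rather than \d (which also matches non-ASCII digits in .NET);
    // \z rather than $ (which would also accept a trailing newline).
    private static readonly Regex Format = new Regex(@"^[A-Z]-[0-9]{5}-[A-Z]-[0-9]{3}\z");

    public string Value { get; }

    public AccountNumber(string value)
    {
        if (value == null || !Format.IsMatch(value))
            throw new FormatException($"'{value}' is not a valid account number.");
        Value = value;
    }
}
```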
No layer should trust data coming from another layer. The analogy I use for this is one of fiefdoms. Say you want to send a message to the king. If the message is not in the proper format it will be rejected before it ever gets to his ears. You could continue to send messages until you eventually get the format right or you could use an emissary. The job of the emissary is to help you verify that your message will be in the acceptable format so that the king will hear it.
Each layer in a system is a fiefdom. Each layer acts as an emissary to the layer to which it will send data by verifying that it will be accepted. No layer trusts data coming from outside that layer (no one trusts messages from outside the fiefdom). The database does not trust the middle layer. The middle-layer does not trust the database or the presentation layer. The presentation does not trust the user or the middle layer.
So the answer is that absolutely you should check and re-check the data in each layer.
Short answer: yes.
Validate input as it is received in each new layer and before it is acted upon. Generally, I validate such input just before it gets used or passed on to the next layer (JavaScript checks that it's a valid email and free of malicious input; likewise, the business layer does before constructing a query with it).
To your last question: if the ID returns a record, then it is valid, and you'd have to look up the record to confirm whether or not the ID is valid, so you'd be making a lot of unnecessary lookups if you tried to check it in advance.
I hope that helps.
I do all of my validation at the Presenter layer in Model-View-Presenter. Validation is somewhat tricky because in many cases it's really a cross-cutting concern.
I prefer to do it at the presenter layer because I can then short-circuit calls to the model.
The other approach is to do the validation in the model layer, but then you have the problem of communicating errors, because you cannot easily inform other layers of errors except through exceptions. You can always pack exceptions with data, or create your own custom exception to which you can attach a list of error messages or a similar construct, but that always seems dirty to me.
Later, when I expose my model through a web service, I will implement double validation, checking both in the Presenter and in the Model, since it will be possible to bypass the presenter layer by calling the web service. The other big advantage of this is that it decouples my presenter-layer validations from the model, since the model might only require raw validation of types to match the database, whereas for users of my UI I want more granular rules about what they input, not just what they physically can enter.
As for your other questions: the SQL injection portion is a model concern and should not be in any middle layer. However, most SQL injection attacks are completely nullified when text fields don't allow special characters. The other part of this is that you should almost always be using parameterized SQL, which makes SQL injection unusable.
The question about the ID is a model concern: either it can get a record with that ID, or it should return null or throw a record-not-found exception, depending on which convention you wish to establish.