C# / Object oriented design - maintaining valid object state


When designing a class, should logic to maintain valid state be incorporated in the class or outside of it? That is, should properties throw exceptions on invalid states (e.g. a value out of range), or should this validation be performed when the instance of the class is being constructed/modified?

It belongs in the class. Nothing but the class itself (and any helpers it delegates to) should know, or be concerned with, the rules that determine valid or invalid state.

Yes, properties should check for valid/invalid values when they are set. That's what they are for.
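A minimal sketch of a validating setter, using a hypothetical Discount class whose Percent must stay between 0 and 100. A rejected assignment throws before the backing field is written, so the object keeps its previous valid value.

```csharp
using System;

// Hypothetical class: Percent must always be within 0..100.
public class Discount
{
    private decimal _percent;

    public decimal Percent
    {
        get { return _percent; }
        set
        {
            // Validate first; the field is only written if the value is acceptable.
            if (value < 0m || value > 100m)
                throw new ArgumentOutOfRangeException(nameof(value), value, "Percent must be between 0 and 100.");
            _percent = value;
        }
    }
}
```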

It should be impossible to put a class into an invalid state, regardless of the code outside it. That should make it clear.
On the other hand, the code outside it is still responsible for using the class correctly, so frequently it will make sense to check twice. The class's methods may throw an ArgumentException if passed something they don't like, and the calling code should ensure that this doesn't happen by having the right logic in place to validate input, etc.
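One way to picture checking twice, as a sketch with hypothetical names: Thermostat guards its own range and throws, while a caller-side helper, ThermostatInput.TryApply, validates raw user text first and reports a message. The exception then only fires if the calling code itself has a bug.

```csharp
using System;

// Hypothetical class: guards its own range (the check inside the class).
public class Thermostat
{
    public const int MinCelsius = 5;
    public const int MaxCelsius = 30;

    private int _targetCelsius = 20;

    public int TargetCelsius
    {
        get { return _targetCelsius; }
        set
        {
            if (value < MinCelsius || value > MaxCelsius)
                throw new ArgumentOutOfRangeException(nameof(value), value, "Target is out of range.");
            _targetCelsius = value;
        }
    }
}

// Hypothetical caller: validates raw input before touching the class
// (the check outside the class), so bad input becomes a message, not an exception.
public static class ThermostatInput
{
    public static bool TryApply(Thermostat thermostat, string userText, out string error)
    {
        if (!int.TryParse(userText, out int celsius))
        {
            error = "Please enter a whole number.";
            return false;
        }
        if (celsius < Thermostat.MinCelsius || celsius > Thermostat.MaxCelsius)
        {
            error = "Please enter a value between 5 and 30.";
            return false;
        }
        thermostat.TargetCelsius = celsius; // cannot throw for input that passed the checks above
        error = null;
        return true;
    }
}
```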
There are also more complex cases where there are different "levels" of client involved in a system. An example is an OS - an application runs in "User mode" and ought to be incapable of putting the OS into an invalid state. But a driver runs in "Kernel mode" and is perfectly capable of corrupting the OS state, because it is part of a team that is responsible for implementing the services used by the applications.
This kind of dual-level arrangement can occur in object models; there can be "exterior" clients of the model that only see valid states, and "interior" clients (plug-ins, extensions, add-ons) which have to be able to see what would otherwise be regarded as "invalid" states, because they have a role to play in implementing state transitions. The definition of invalid/valid is different depending on the role being played by the client.
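In C#, one possible way to express the exterior/interior split is accessibility: public members leave the object valid after every call, while internal members (visible only inside the same assembly, or to assemblies granted access with InternalsVisibleTo) expose intermediate steps. The Order type and all its members below are hypothetical.

```csharp
using System;

// Hypothetical model. Exterior invariant: Total == Subtotal + Tax.
public class Order
{
    public decimal Subtotal { get; private set; }
    public decimal Tax { get; private set; }
    public decimal Total { get; private set; }

    // Exterior clients: one call, and the invariant holds when it returns.
    public void SetAmounts(decimal subtotal, decimal tax)
    {
        if (subtotal < 0m) throw new ArgumentOutOfRangeException(nameof(subtotal));
        if (tax < 0m) throw new ArgumentOutOfRangeException(nameof(tax));
        SetSubtotalUnchecked(subtotal);
        SetTaxUnchecked(tax);
        Recalculate();
    }

    // Interior clients (same assembly): step-by-step access. Between these
    // calls Total is stale, a state exterior clients never observe.
    internal void SetSubtotalUnchecked(decimal subtotal) { Subtotal = subtotal; }
    internal void SetTaxUnchecked(decimal tax) { Tax = tax; }
    internal void Recalculate() { Total = Subtotal + Tax; }
}
```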

Generally this belongs in the class itself, but to some extent it also has to depend on your definition of 'valid'. For example, consider the System.IO.FileInfo class. Is it valid if it refers to a file that no longer exists? How would it know?

I would agree with @Joel. Typically this would be found in the class. However, I would not have the property accessors implement the validation logic. Rather, I'd recommend a validation method for the persistence layer to call when the object is being persisted. This allows you to localize the validation logic in a single place and make different choices for valid/invalid based on the persistence operation being performed. If, for example, you are planning to delete an object from the database, do you care that some of its properties are invalid? Probably not: as long as the ID and row version match those in the database, you just go ahead and delete it. Likewise, you may have different rules for inserts and updates, e.g., some fields may be null on insert but required on update.
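A sketch of that approach, with hypothetical names throughout (Customer, PersistenceOperation, Validate): the setters accept anything, and the persistence layer asks for the rules that match the operation it is about to perform. The specific rules are illustrative only.

```csharp
using System.Collections.Generic;

// Hypothetical operations the persistence layer might perform.
public enum PersistenceOperation { Insert, Update, Delete }

public class Customer
{
    public int Id { get; set; }
    public byte[] RowVersion { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }

    // Called by the persistence layer just before saving; setters stay unchecked.
    public IList<string> Validate(PersistenceOperation operation)
    {
        var errors = new List<string>();
        switch (operation)
        {
            case PersistenceOperation.Delete:
                // Only identity and concurrency data matter for a delete.
                if (Id <= 0) errors.Add("Id is required.");
                if (RowVersion == null) errors.Add("RowVersion is required.");
                break;
            case PersistenceOperation.Insert:
                // Example rule: Email may still be null on insert.
                if (string.IsNullOrWhiteSpace(Name)) errors.Add("Name is required.");
                break;
            case PersistenceOperation.Update:
                if (Id <= 0) errors.Add("Id is required.");
                if (string.IsNullOrWhiteSpace(Name)) errors.Add("Name is required.");
                if (string.IsNullOrWhiteSpace(Email)) errors.Add("Email is required on update.");
                break;
        }
        return errors;
    }
}
```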

It depends.
If the validation is simple, and can be checked using only information contained in the class, then most of the time it's worth while to add the state checks to the class.
There are sometimes, however, where it's not really possible or desirable to do so.
A great example is a compiler. Checking the state of abstract syntax trees (ASTs) to make sure a program is valid is usually not done by either property setters or constructors. Instead, the validation is usually done by a tree visitor, or a series of mutually recursive methods in some sort of "semantic analysis class". In either case, however, properties are validated long after their values are set.
Also, with objects used to hold UI state, it's usually a bad idea (from a usability perspective) to throw exceptions when invalid values are set. This is particularly true for apps that use WPF data binding. In that case you want to display some sort of modeless feedback to the user rather than throwing an exception.
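A sketch of modeless validation using IDataErrorInfo from System.ComponentModel. A WPF binding with ValidatesOnDataErrors=True queries the indexer and shows the message instead of receiving an exception; INotifyDataErrorInfo is the newer alternative. The PersonViewModel class and its age rule are hypothetical.

```csharp
using System.ComponentModel;

// Hypothetical view model: the setter accepts any value; errors are
// reported through IDataErrorInfo instead of being thrown.
public class PersonViewModel : IDataErrorInfo
{
    public int Age { get; set; }

    // Object-level error; null means "no error".
    public string Error { get { return null; } }

    // Per-property error message, looked up by property name.
    public string this[string columnName]
    {
        get
        {
            if (columnName == nameof(Age) && (Age < 0 || Age > 150))
                return "Age must be between 0 and 150.";
            return null;
        }
    }
}
```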

The class really should maintain valid values. It shouldn't matter whether these are entered through the constructor or through properties; both should reject invalid values. If a constructor parameter and a property require the same validation, you can either use a common private method to validate the value for both, or do the validation in the property setter and assign through the property (rather than the backing field) inside your constructor. Personally, I would recommend using a common validation method.
Your class should throw an exception if it receives invalid values. All in all, good design can help reduce the chances of this happening.
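A sketch of the shared-validator option, using a hypothetical Account class: the constructor and the Owner setter both call one private method, so the rule is written once.

```csharp
using System;

// Hypothetical class: one private validator serves constructor and setter.
public class Account
{
    private string _owner;

    public Account(string owner)
    {
        _owner = ValidateOwner(owner, nameof(owner));
    }

    public string Owner
    {
        get { return _owner; }
        set { _owner = ValidateOwner(value, nameof(value)); }
    }

    // Throws on invalid input; returns the normalized value otherwise.
    private static string ValidateOwner(string owner, string paramName)
    {
        if (string.IsNullOrWhiteSpace(owner))
            throw new ArgumentException("Owner must be a non-empty name.", paramName);
        return owner.Trim();
    }
}
```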

The valid state of a class is best expressed with the concept of a class invariant: a boolean expression that must hold true for the objects of that class to be valid.
The Design by Contract approach suggests that you, as a developer of class C, should guarantee that the class invariant holds:
After construction
After a call to a public method
Since the object is encapsulated (no one can modify it except via calls to public methods), this implies that the invariant will also be satisfied on entry to any public method, or on entry to the destructor (in languages with destructors), if any.
Each public method states preconditions that the caller must satisfy, and postconditions that the class will satisfy at the end of the method. Violating a precondition violates the contract of the class: the class can still be correct, yet it does not have to behave in any particular way, nor maintain the invariant, when it is called with a precondition violation. A class that fulfills its contract in the absence of caller violations can be said to be correct.
A concept different from correctness but complementary to it (and certainly one of the many factors of software quality) is robustness. In our context, a robust class will detect when one of its methods is called without the method's preconditions being fulfilled. In such cases an assertion violation exception will typically be thrown, so that the caller knows that he blew it.
So, answering your question, both the class and its caller have obligations as part of the class contract. A robust class will detect contract violations and spit. A correct caller will not violate the contract.
Classes belonging to the public interface of a code library should be compiled as robust, while inner classes could be tested as robust but then run in the released product as just correct, without the precondition checks on. This depends on a number of things and was discussed elsewhere.
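A rough C# sketch of these ideas, using a hypothetical BoundedCounter whose invariant is 0 <= Count <= Capacity. Preconditions are checked with exceptions (the "robust" part), and the invariant is checked with Debug.Assert after construction and after each public method; because that check is conditional on the DEBUG symbol, it disappears from Release builds. This only approximates Design by Contract; it is not a contracts framework.

```csharp
using System;
using System.Diagnostics;

// Hypothetical class. Invariant: 0 <= Count <= Capacity.
public class BoundedCounter
{
    public int Capacity { get; }
    public int Count { get; private set; }

    public BoundedCounter(int capacity)
    {
        // Precondition, checked robustly: a caller error raises an exception.
        if (capacity <= 0)
            throw new ArgumentOutOfRangeException(nameof(capacity), "Capacity must be positive.");
        Capacity = capacity;
        AssertInvariant(); // invariant holds after construction
    }

    public void Increment()
    {
        // Precondition: the counter must not already be full.
        if (Count >= Capacity)
            throw new InvalidOperationException("Counter is full.");
        Count++;
        AssertInvariant(); // invariant holds after every public method
    }

    // Calls to this method are removed by the compiler unless DEBUG is defined.
    [Conditional("DEBUG")]
    private void AssertInvariant()
    {
        Debug.Assert(Count >= 0 && Count <= Capacity, "Class invariant violated");
    }
}
```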

Related

How do get set methods stop dependencies?

So I understand that if we want to change the implementation details of a class, using those details outside of the class will cause errors when things are changed; this is why we make those fields private. However, if we use get/set methods with a private field, doesn't this do the same thing? If I decided I didn't want my class to have a name and a username, just a name, and I deleted the private username field, the get/set methods would break, and so would the places where those methods are used. Isn't referencing one class a dependency no matter what, in case we change that class's methods or fields? What is the point of get/set methods then, and how do they stop code from breaking like this?

However, if we use get set methods with a private field doesn't this do the same thing?
Yes. Arguably, yes. The original idea of Object Oriented Programming, as Alan Kay (who coined the term) initially thought about it, has been distorted. Alan Kay has expressed his dislike for setters:
Lots of so called object oriented languages have setters and when you have a setter on an object you turned it back into a data structure.
-- Alan Kay - Programming and Scaling (video).
Isn't referencing one class a dependency no matter what, in case we change that class's methods or fields?
Correct. If you are referencing a class from another, your classes are tightly coupled. In that case a change in one class will propagate to the other, regardless of whether the change is in public fields, getters, setters or something else.
If you are using an interface or similar indirection, they are loosely coupled. This looseness gives you an opportunity to stop the propagation of the change. Which you may or may not do.
Finally, if you are using an observer pattern or similar (e.g. events or listeners), you can have classes decoupled. This is, in a way, retrofitting the idea of passing messages as originally conceived by Alan Kay.
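A sketch of the last two levels, with hypothetical names: ReportGenerator depends only on the IUserDirectory interface (loose coupling), and any code can subscribe to UserDirectory.UserRenamed without UserDirectory knowing who is listening (decoupling through events).

```csharp
using System;
using System.Collections.Generic;

// Loose coupling: ReportGenerator knows only this interface.
public interface IUserDirectory
{
    string GetDisplayName(int userId);
}

public class UserDirectory : IUserDirectory
{
    private readonly Dictionary<int, string> _names = new Dictionary<int, string>();

    // Decoupling: subscribers are notified, but this class does not know who they are.
    public event Action<int> UserRenamed;

    public string GetDisplayName(int userId) =>
        _names.TryGetValue(userId, out var name) ? name : "(unknown)";

    public void Rename(int userId, string newName)
    {
        _names[userId] = newName;
        UserRenamed?.Invoke(userId);
    }
}

public class ReportGenerator
{
    private readonly IUserDirectory _users;

    public ReportGenerator(IUserDirectory users)
    {
        _users = users ?? throw new ArgumentNullException(nameof(users));
    }

    public string Header(int userId) => "Report for " + _users.GetDisplayName(userId);
}
```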
What is the point of Get Set methods then and how do they stop code from breaking like this?
They allow you to change the internal representation of the class. While the common approach is to have setters and getters correspond to a field, that does not have to be the case. A getter might return a constant, or compute a value from multiple fields. Similarly, a setter might update multiple fields (or even do nothing).
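A small illustration, using a hypothetical Temperature class: callers only see Celsius and Fahrenheit, while the class stores a single Kelvin value. The storage could later switch to, say, a Celsius field without any caller noticing.

```csharp
// Hypothetical class: the public properties are computed; the only field is Kelvin.
public class Temperature
{
    private double _kelvin;

    public double Celsius
    {
        get { return _kelvin - 273.15; }
        set { _kelvin = value + 273.15; }
    }

    // No field of its own: converts through Celsius in both directions.
    public double Fahrenheit
    {
        get { return Celsius * 9.0 / 5.0 + 32.0; }
        set { Celsius = (value - 32.0) * 5.0 / 9.0; }
    }
}
```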
Reasons to have setters:
They give you an opportunity to implement validations.
They give you an opportunity to raise "changed" events.
They might be necessary to work with other systems (e.g. some Dependency Injection frameworks, also some User Interface frameworks).
You need to update multiple fields to keep an invariant. Presumably updating those other fields does not result in some public property changing value in an unexpected way (also, don't break the single responsibility principle, but that should be obvious). See Principle of least astonishment.
Reasons to have getters:
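A sketch of the first two setter reasons together, using a hypothetical Slider class and the standard INotifyPropertyChanged interface that data-binding frameworks listen to. The 0..10 range is an arbitrary example rule.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.CompilerServices;

// Hypothetical class: the setter validates and raises a "changed" event.
public class Slider : INotifyPropertyChanged
{
    private int _value;

    public event PropertyChangedEventHandler PropertyChanged;

    public int Value
    {
        get { return _value; }
        set
        {
            // Reason 1: validation.
            if (value < 0 || value > 10)
                throw new ArgumentOutOfRangeException(nameof(value), value, "Value must be between 0 and 10.");
            if (value == _value) return; // unchanged: no event
            _value = value;
            // Reason 2: a "changed" notification.
            OnPropertyChanged();
        }
    }

    // CallerMemberName supplies "Value" when called from the Value setter.
    private void OnPropertyChanged([CallerMemberName] string propertyName = null)
    {
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
    }
}
```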
They give you an opportunity to implement lazy initialization.
They give you an opportunity to return computed values.
They might make debugging easier. Consider some getters for DEBUG builds only.
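A sketch of the first two getter uses, with a hypothetical Report class: Count is computed on every read, and the sorted copy behind Median is built on first access via Lazy&lt;T&gt;, whose default mode is also safe when several threads read it at once. Median returns the upper middle element for even counts, a simplification.

```csharp
using System;
using System.Linq;

// Hypothetical class showing a computed getter and a lazily initialized one.
public class Report
{
    private readonly int[] _values;
    private readonly Lazy<int[]> _sorted; // built on first access to Median

    public Report(params int[] values)
    {
        _values = (int[])values.Clone();
        _sorted = new Lazy<int[]>(() => _values.OrderBy(v => v).ToArray());
    }

    // Computed value: derived on every read, nothing extra stored.
    public int Count { get { return _values.Length; } }

    // Lazy initialization: sorting happens only the first time this is read.
    public int Median
    {
        get
        {
            if (_values.Length == 0) throw new InvalidOperationException("No values.");
            int[] sorted = _sorted.Value;
            return sorted[sorted.Length / 2];
        }
    }

    // Lets the laziness be observed, e.g. while debugging.
    public bool IsSortedCopyCreated { get { return _sorted.IsValueCreated; } }
}
```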
If you had public fields and then decided you needed anything like what I described above, you may want to change to getters and setters. Plus, that change requires recompiling the code that uses it (even if its source stays the same, which would be the case with C# properties). That is one reason it is advised to do it preemptively, in particular in code libraries (so that an application that uses the library does not have to be recompiled when the library moves to a newer version that needed these changes).
These are reasons not to have getters: Often, getters exist to access a member in order to call a method on it, which leads to very awkward interfaces (see Law of Demeter). Or to make a decision, which may lead to a Time-of-check to time-of-use bug, which also means the interface is not thread-safe ready. Or to do a computation, which is often better done by the class itself in a method (Tell, Don't Ask).
As for setters, aside from being a code smell of bad encapsulation, they could be indicative of an unintended state machine. If code needs to call a setter (change the state) to make sure it has the intended value before calling a method, just make it a parameter (yes, even if you are going to repeat that parameter in a lot of methods). Such an interface is easy to misuse, and it is not thread-safe ready. In general, avoid any interface design in which the code using it has to call things in an order that the interface does not force on you. A good design will not let you call things in an order that results in an invalid state (see poka-yoke). Of course, not every contract can be expressed in the interface; we have exceptions for the rest.
A thread-safe ready interface is one that can be implemented in a thread-safe fashion. If an interface is not thread-safe ready, the only way to avoid threading problems while using it is to wrap access to it with locks external to it, regardless of how the interface is implemented. Often this is because the interface prevents consolidating reads and writes, leading to a Time-of-check to time-of-use bug or an ABA problem.
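A sketch contrasting the two designs, with hypothetical Printer types. In StatefulPrinter the caller must remember to set Copies before Print, and two threads sharing one instance can overwrite each other's Copies in between; in Printer the value travels with the call.

```csharp
using System;

// Setter-based design: hidden ordering requirement and shared mutable state.
public class StatefulPrinter
{
    public int Copies { get; set; } = 1;
    public string Print(string document) { return $"{document} x{Copies}"; }
}

// Parameter-based design: no ordering requirement, nothing shared to race on.
public class Printer
{
    public string Print(string document, int copies = 1)
    {
        if (copies < 1) throw new ArgumentOutOfRangeException(nameof(copies));
        return $"{document} x{copies}";
    }
}
```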
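A sketch of the difference, using a hypothetical Stock class. Checking Available and then removing items takes two calls, so no implementation can stop another thread from acting in between; TryRemove merges the check and the update into one call, which lets the implementation make them atomic with a lock.

```csharp
using System;

// Hypothetical stock counter offering both interface shapes.
public class Stock
{
    private readonly object _gate = new object();
    private int _available;

    public Stock(int available) { _available = available; }

    // Not thread-safe ready on its own: "if (Available >= n) ..." followed by a
    // separate removal call can race, however this getter is implemented.
    public int Available { get { lock (_gate) return _available; } }

    // Thread-safe ready: check and update happen in a single call.
    public bool TryRemove(int quantity)
    {
        if (quantity <= 0) throw new ArgumentOutOfRangeException(nameof(quantity));
        lock (_gate)
        {
            if (_available < quantity) return false;
            _available -= quantity;
            return true;
        }
    }
}
```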
There is value in public fields, when appropriate, too: in particular for performance, and for interoperability with native code. You will find, for example, that vector types in game development libraries often have public fields for their coordinates.
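For instance, a hypothetical game-style vector written as a struct with plain public fields. System.Numerics.Vector2 in .NET is designed the same way, exposing X and Y as public fields.

```csharp
// Hypothetical value type: plain public fields, no accessors.
public struct Vector2
{
    public float X;
    public float Y;

    public Vector2(float x, float y) { X = x; Y = y; }

    public static Vector2 operator +(Vector2 a, Vector2 b) => new Vector2(a.X + b.X, a.Y + b.Y);
}
```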
As you can see, there can be good reasons for both having and not having getters and setters. Similarly, there can be good reasons for both having or not having public fields. Plus, either case can be problematic if not used appropriately.
We have guidelines and "best practices" to avoid the pitfalls. Not having public fields is a very good default. And not every field needs getters and setters. However, you can make getters and setters, and you can make fields public. Do that if you have a good reason to do it.
If you make every field public you will likely run into trouble by breaking encapsulation. If you make getters and setters for each and every field, it is not much better. Use them thoughtfully.

Separate Options class, overloaded constructor, or public properties with default values?

I have been working on a project where I have a Worker class that generates a lot of data in a multi-threaded fashion. The type, size, and location of the data is variable based on a large set of parameters that can be set by an end user. Essentially this is a big test harness that I am using to investigate how certain things perform based on a variation of the data. Right now I have at least 12 different parameters for the Worker class. I was thinking about switching over to a separate WorkerOptions class that contains all of these values, and then have the UI create the WorkerOptions object and then pass that into the Worker. However, I could also expose public properties on the Worker class to allow the options to be set appropriately at Worker creation as well.
What is the best way to go about this, and why? I am sure this will generate some different opinions but I am open to listen to debate about why different people might do it a different way. Some things to consider are that currently once a Worker is created and running, its configuration doesn't change unless it stops. This could be subject to change, but I don't think it will.
EDIT
I am not a C# developer normally, I know enough to be able to write applications that function and follow common design patterns, but my expertise is in SQL Server, so I might ask follow up questions to clarify your meaning.

I have as guideline that the parameters that are necessary to use the instance should be passed in the constructor and all 'optional' parameters should be properties.
The properties will be initialized of course in the constructor to their default values.
If the number of arguments is not high I use default value arguments, but 12 is quite a lot.
I forgot to mention the separate class for options. Mostly I don't do such a thing, unless there is some 'business logic' inside the options (like checking whether certain option combinations are impossible). If it is just for storage, you end up with a lot of extra references to instances of this options class.
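A sketch of that guideline with hypothetical members: OutputDirectory is required and goes through the constructor, while ThreadCount and RowsPerBatch are optional properties with defaults that a caller can override with an object initializer.

```csharp
using System;

// Hypothetical Worker: required data via constructor, optional data via properties.
public class Worker
{
    public Worker(string outputDirectory)
    {
        if (string.IsNullOrWhiteSpace(outputDirectory))
            throw new ArgumentException("An output directory is required.", nameof(outputDirectory));
        OutputDirectory = outputDirectory;
    }

    public string OutputDirectory { get; }                               // required
    public int ThreadCount { get; set; } = Environment.ProcessorCount;   // optional, defaulted
    public int RowsPerBatch { get; set; } = 1000;                        // optional, defaulted
}
```

Usage would look like new Worker("out") { RowsPerBatch = 5000 }, leaving ThreadCount at its default.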

I'd combine the two approaches.
Make your WorkerOptions class use a constructor that requires all the required parameters, and allows the optional parameters to be set either via an overload, optional arguments, or properties, then pass that in as an argument.
Having the WorkerOptions class gives you a nice DTO to pass around in case refactoring leads you to create an additional layer between the UI and the worker class itself. Using required parameters in its constructor gives you compile-time checking to prevent runtime errors.
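A sketch of the combined approach, with hypothetical parameter names: required values are constructor parameters, optional ones are optional arguments with defaults, and the finished WorkerOptions object is handed to Worker.

```csharp
using System;

// Hypothetical options DTO: required values first, optional values defaulted.
public class WorkerOptions
{
    public WorkerOptions(string outputDirectory, int rowCount,
                         int threadCount = 4, bool compress = false)
    {
        if (string.IsNullOrWhiteSpace(outputDirectory))
            throw new ArgumentException("Output directory is required.", nameof(outputDirectory));
        if (rowCount <= 0) throw new ArgumentOutOfRangeException(nameof(rowCount));
        if (threadCount <= 0) throw new ArgumentOutOfRangeException(nameof(threadCount));
        OutputDirectory = outputDirectory;
        RowCount = rowCount;
        ThreadCount = threadCount;
        Compress = compress;
    }

    public string OutputDirectory { get; }
    public int RowCount { get; }
    public int ThreadCount { get; }
    public bool Compress { get; }
}

// Hypothetical worker that receives its configuration as one object.
public class Worker
{
    public Worker(WorkerOptions options)
    {
        Options = options ?? throw new ArgumentNullException(nameof(options));
    }

    public WorkerOptions Options { get; }
}
```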

Personally, from what you have said, I prefer the WorkerOptions approach. For the following reasons:
It's cleaner: 12 constructor parameters is not out of the question, but it is perhaps a little excessive.
You can apply polymorphism and all the other OO goodness to your WorkerOptions. You might want to define an IWorkerOptions at some stage, or use a Builder to construct different subclasses of WorkerOptions.
I would also make all WorkerOptions instances immutable, or at least come up with a 'lock' or 'freeze' mechanism to prevent changes once a Worker has started execution.
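A sketch of the 'freeze' variant, with a hypothetical FreezableWorkerOptions class: settable while the UI fills it in, read-only after Freeze() is called (for example by the worker when it starts).

```csharp
using System;

// Hypothetical options object with a one-way freeze switch.
public class FreezableWorkerOptions
{
    private int _threadCount = 4;

    public bool IsFrozen { get; private set; }

    public int ThreadCount
    {
        get { return _threadCount; }
        set
        {
            ThrowIfFrozen();
            if (value <= 0) throw new ArgumentOutOfRangeException(nameof(value));
            _threadCount = value;
        }
    }

    // One-way switch: there is intentionally no Unfreeze().
    public void Freeze() { IsFrozen = true; }

    private void ThrowIfFrozen()
    {
        if (IsFrozen)
            throw new InvalidOperationException("Options cannot be changed after the worker has started.");
    }
}
```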

How is validating a parameter "high up" on the callstack done?

I've been reading the Framework Design Guidelines book, a book on designing frameworks in .NET, with excerpts from the framework designers on the decisions they made regarding each section (E.g. parameter design, exception handling, etc).
One of the tips, under parameter design, is to validate parameters as "high up on the callstack" as possible. This is because the work here is not as expensive as it is low on the callstack, so a performance penalty is not as costly when validating high up in the callstack.
Does this mean that when I pass parameters into a method or constructor, I validate them before doing anything else, or do I do so just before using the parameters (So there could be 100 lines of code between the parameter in the definition and the usage of the parameter)?
Thanks

Prefer to validate in the public API of an assembly. That means the public methods of the public classes.
Prefer to validate in the public methods of your classes. So if your class requires a non-null reference to another object to work correctly, you could enforce this by requiring it as a constructor parameter and throwing an exception when null is supplied. From that point forward none of the member methods need to test whether the reference is non-null.
The idea is that no user can break your class (or assembly) by feeding invalid data. Of course the code won't work either way, but if you fail in a controlled way, it's more clear to the calling code what is wrong, and you won't have unpleasant side effects like resource leaks (or worse).
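A sketch of that pattern, using a hypothetical AuditTrail class that cannot work without a TextWriter: the public constructor rejects null once, the field is readonly, and afterwards no member re-checks it.

```csharp
using System;
using System.IO;

// Hypothetical class: the dependency is validated once, at construction.
public class AuditTrail
{
    private readonly TextWriter _log;

    public AuditTrail(TextWriter log)
    {
        _log = log ?? throw new ArgumentNullException(nameof(log));
    }

    public void Record(string message)
    {
        if (message == null) throw new ArgumentNullException(nameof(message));
        _log.WriteLine(message); // _log is known to be non-null here
    }
}
```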

Failing fast is generally a good practice. All arguments passed to a method should be validated as soon as possible, without any unnecessary calculations being performed before, because that eases debugging and allows for easier recovery from the faulty situation.
In respect to input validation I consider performance a minor concern.

I haven't read the specific guidelines you mention, but I expect they're talking about the case where method A calls method B, which calls method C and a parameter value gets passed through all three calls. It's better to validate that parameter at the start of method A than somewhere in the middle of method C because if it's invalid, then you get to skip all of the stuff that happens in A and B and the start of C. This is especially true if B or C are called inside loops because then the low-level validation would occur many times instead of just once at the start of A.
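A sketch of that A, B, C chain with hypothetical names: only the public entry point validates, and the private helpers it calls inside the loop trust their input, so the check runs once rather than once per element.

```csharp
using System;

// Hypothetical A -> B -> C call chain.
public static class Scaler
{
    // A: the public entry point validates everything once, up front.
    public static double[] ScaleAll(double[] values, double divisor)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));
        if (!(divisor > 0) || double.IsInfinity(divisor)) // also rejects NaN
            throw new ArgumentOutOfRangeException(nameof(divisor), "Divisor must be a positive, finite number.");

        var result = new double[values.Length];
        for (int i = 0; i < values.Length; i++)
            result[i] = ScaleOne(values[i], divisor); // B is called in a loop
        return result;
    }

    // B: private, trusts that A already validated divisor.
    private static double ScaleOne(double value, double divisor) => Divide(value, divisor);

    // C: private, no re-check of divisor on every iteration.
    private static double Divide(double value, double divisor) => value / divisor;
}
```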
Of course you have to balance that with how complicated the validation of the parameter is. It may just be way easier to understand if you validate it in the same place you use it.

Validate them as early as you can in your method!

What I believe this to mean is that you should validate data that could be invalid as soon as you receive it. Once it has been validated then no more checks are needed. If you wait until the bottom of the call stack then you may have to validate many times because your call tree may have many branches.
I would whole-heartedly agree with this advice, but not on the grounds of performance. By validating at the point of entry you are in a much better position to give a meaningful error message to the client who supplied the data. And by reducing the amount of validation that you do, you will end up with much clearer code.

Constructor or properties: which one is the better choice while assigning values

When should we use a constructor rather than properties (or vice versa) to assign values?

A constructor is a very convenient and powerful type of contract - a way to require consumers to provide certain information before they can even use your object. So for information that is necessary for the instance to function properly, use constructor parameters. This is the underlying concept of dependency injection - anything you depend on to do your job, must be injected (provided) to you before you begin.
Properties can represent an interesting problem. In general, experience has taught me that wherever possible, properties should be read-only and objects should generally be as externally immutable as possible. Adding a public setter to a property is a multiplier of complexity for your class. There are of course always types of objects - entities are a good example - where setters make sense. But for most objects, the pattern of "write-to via constructor" / "read-from via properties" for state has vastly reduced complexity and bug risks in the applications I've been responsible for.
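A sketch of the "write via constructor, read via properties" pattern, using a hypothetical Money class: there are no public setters, so once constructed it cannot become invalid, and a "change" produces a new instance.

```csharp
using System;

// Hypothetical immutable value object.
public sealed class Money
{
    public Money(decimal amount, string currency)
    {
        if (amount < 0m) throw new ArgumentOutOfRangeException(nameof(amount));
        if (string.IsNullOrWhiteSpace(currency) || currency.Length != 3)
            throw new ArgumentException("Currency must be a 3-letter code.", nameof(currency));
        Amount = amount;
        Currency = currency.ToUpperInvariant();
    }

    public decimal Amount { get; }
    public string Currency { get; }

    // Returns a new instance instead of mutating this one.
    public Money Add(decimal extra) => new Money(Amount + extra, Currency);
}
```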

Use constructor if the parameter values are really required for your object to be constructed (without them the object cannot start to live). Use properties for the parameters which have an acceptable default value, so it's OK not to assign them at all. You can provide some extra constructors which will assign some properties as a shorthand, courtesy to your users.

You use a constructor when you need arbitrary sane initial values, and properties when you want the values to be changeable later.

There are a few cases where mutable properties may be preferable:
For 'pure' mutable Data objects where merely setting the properties can have no side effects. For instance, you might have an object that represents some Entity in the database, but modifying its properties will not have any effect until you explicitly perform a Commit operation. The object is a package for containing data, but nothing directly reacts to changes in the data.
If you have a large amount of configurable state that will affect some operation and many of the configurable properties have meaningful default values. If these are properties of the class that performs the operation, it's typical to have some notion of 'freezing' the state so that the mutable properties throw exceptions while the operation is running.
If you're developing a class that will be consumed by a visual designer or other system that relies on Reflection over properties. For instance, the data binding system in WPF makes extensive use of mutable properties as a way to communicate UI interactions. With a proper design to manage these mutations, you can create some very powerful and responsive interfaces.

What guidelines are appropriate for determining when to implement a class member as a property versus a method?

The .NET coding standards PDF from SubMain that have started showing up in the "Sponsored By" area seems to indicate that properties are only appropriate for logical data members (see pages 34-35 of the document). Methods are deemed appropriate in the following cases:
The operation is a conversion, such as Object.ToString().
The operation is expensive enough that you want to communicate to the user that they should consider caching the result.
Obtaining a property value using the get accessor would have an observable side effect.
Calling the member twice in succession produces different results.
The order of execution is important.
The member is static but returns a value that can be changed.
The member returns an array.
Do most developers agree on the properties vs. methods argument above? If so, why? If not, why not?

They seem sound, and basically in line with MSDN member design guidelines:
http://msdn.microsoft.com/en-us/library/ms229059.aspx
One point that people sometimes seem to forget (*) is that callers should be able to set properties in any order. Particularly important for classes that support designers, as you can't be sure of the order generated code will set properties.
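A sketch of why this matters, with a hypothetical RangeControl. If the Minimum setter threw whenever the value exceeded Maximum, generated code that happens to set Minimum = 150 before Maximum = 200 would fail on a fresh instance (Maximum is still 100), while the reverse order would succeed. This version pulls the other bound along instead; other options include a single method that sets both bounds, or deferring validation until initialization ends (System.ComponentModel.ISupportInitialize).

```csharp
// Hypothetical control. Each setter adjusts the other bound, so any final
// pair with Minimum <= Maximum is reached whichever property is set first.
public class RangeControl
{
    private int _minimum = 0;
    private int _maximum = 100;

    public int Minimum
    {
        get { return _minimum; }
        set
        {
            _minimum = value;
            if (_maximum < value) _maximum = value;
        }
    }

    public int Maximum
    {
        get { return _maximum; }
        set
        {
            _maximum = value;
            if (_minimum > value) _minimum = value;
        }
    }
}
```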
(*) I remember early versions of the Ajax Control Toolkit on Codeplex had numerous bugs due to developers forgetting this one.
As for "Calling the member twice in succession produces different results", every rule has an exception, as the property DateTime.Now illustrates.

Those are interesting guidelines, and I agree with them. It's interesting in that they are setting the rules based on "everything is a property except the following". That said, they are good guidelines for avoiding problems by defining something as a property that can cause issues later.
At the end of the day a property is just a structured method, so the rule of thumb I use is based on Object Orientation -- if the member represents data owned by the entity, it should be defined as a property; if it represents behavior of the entity it should be implemented as a method.

Fully agreed.
According to the coding guidelines properties are "nouns" and methods are "verbs". Keep in mind that a user may call the property very often while thinking it would be a "cheap" operation.
On the other hand, it's usually expected that a method may "take more time", so a user considers caching method results.
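A sketch of those guidelines on a hypothetical Document class: a cheap noun is a property, state changes and potentially expensive work are verbs (methods), and the array is returned by a method that hands out a fresh copy so callers cannot modify internal state.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class applying the property-versus-method guidelines.
public class Document
{
    private readonly List<string> _lines = new List<string>();

    // Noun, cheap, no side effects: a property.
    public int LineCount => _lines.Count;

    // Verb that changes state: a method.
    public void AppendLine(string line) =>
        _lines.Add(line ?? throw new ArgumentNullException(nameof(line)));

    // Scans all text, so a method: callers may choose to cache the result.
    public int CountWords() =>
        _lines.Sum(l => l.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length);

    // Returns an array: a method that allocates a new copy on every call.
    public string[] GetLines() => _lines.ToArray();
}
```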

What's so interesting about those guidelines is that they are clearly an argument for having extension properties as well as extension methods. Shame.

I never personally came to the conclusion or had the gut feeling that properties are fast, but the guidelines say they should be, so I just accept it.
I always struggle with what to name my slow "get" methods while avoiding FxCop warnings. GetPeopleList() sounds good to me, but then FxCop tells me it might be better as a property.
