How do you validate an object's internal state? - c#

I'm interested in hearing what technique(s) you're using to validate the internal state of an object during an operation that, from its own point of view, can only fail because of bad internal state or an invariant breach.
My primary focus is on C++, since in C# the official and prevalent way is to throw an exception, and in C++ there's not just one single way to do this (ok, not really in C# either, I know that).
Note that I'm not talking about function parameter validation, but more like class invariant integrity checks.
For instance, let's say we want a Printer object to Queue a print job asynchronously. To the user of Printer, that operation can only succeed, because an asynchronous queue result will arrive at another time. So, there's no relevant error code to convey to the caller.
But to the Printer object, this operation can fail if the internal state is bad, i.e., the class invariant is broken, which basically means: a bug. This condition is not necessarily of any interest to the user of the Printer object.
Personally, I tend to mix three styles of internal state validation and I can't really decide which one's the best, if any, only which one is absolutely the worst. I'd like to hear your views on these and also that you share any of your own experiences and thoughts on this matter.
The first style I use - better fail in a controllable way than corrupt data:
void Printer::Queue(const PrintJob& job)
{
    // Validate the state in both release and debug builds.
    // Never proceed with the queuing in a bad state.
    if(!IsValidState())
    {
        throw InvalidOperationException();
    }
    // Continue with queuing, parameter checking, etc.
    // Internal state is guaranteed to be good.
}
The second style I use - better to crash uncontrollably than to corrupt data:
void Printer::Queue(const PrintJob& job)
{
    // Validate the state in debug builds only.
    // Break into the debugger in debug builds.
    // Always proceed with the queuing, also in a bad state.
    DebugAssert(IsValidState());
    // Continue with queuing, parameter checking, etc.
    // Generally, behavior is now undefined, because of bad internal state.
    // But, specifically, this often means an access violation when
    // a NULL pointer is dereferenced, or something similar, and that crash will
    // generate a dump file that can be used to find the error cause during
    // testing before shipping the product.
}
The third style I use - better silently and defensively bail out than corrupt data:
void Printer::Queue(const PrintJob& job)
{
    // Validate the state in both release and debug builds.
    // Break into the debugger in debug builds.
    // Never proceed with the queuing in a bad state.
    // This object will likely never again succeed in queuing anything.
    if(!IsValidState())
    {
        DebugBreak();
        return;
    }
    // Continue with defenestration.
    // Internal state is guaranteed to be good.
}
My comments to the styles:
I think I prefer the second style, where the failure isn't hidden, provided that an access violation actually causes a crash.
If it's not a NULL pointer involved in the invariant, then I tend to lean towards the first style.
I really dislike the third style, since it will hide lots of bugs, but I know people who prefer it in production code, because it creates the illusion of robust software that doesn't crash (features will just stop functioning, as in the queuing on the broken Printer object).
Do you prefer any of these or do you have other ways of achieving this?

You can use a technique called NVI (Non-Virtual Interface) together with the template method pattern. This is probably how I would do it (of course, it's only my personal opinion, which is indeed debatable):
class Printer {
public:
    // checks invariant, and calls the actual queuing
    void Queue(const PrintJob&);
private:
    // state check used by Queue (the same IsValidState as in the question)
    bool IsValidState() const;
    virtual void DoQueue(const PrintJob&);
};
void Printer::Queue(const PrintJob& job) // not virtual
{
    // Validate the state in both release and debug builds.
    // Never proceed with the queuing in a bad state.
    if(!IsValidState()) {
        throw std::logic_error("Printer not ready");
    }
    // call virtual method DoQueue which does the job
    DoQueue(job);
}
void Printer::DoQueue(const PrintJob& job) // virtual
{
    // Do the actual Queuing. State is guaranteed to be valid.
}
Because Queue is non-virtual, the invariant is still checked if a derived class overrides DoQueue for special handling.
To your options: I think it depends on the condition you want to check.
If it is an internal invariant: it should not be possible for a user of your class to violate it. The class should take care of its invariant itself. Therefore, I would assert(CheckInvariant()); in such a case.
If it's merely a pre-condition of a method: if the user of the class has to guarantee it (say, only printing after the printer is ready), I would throw std::logic_error as shown above.
I would really discourage checking a condition but then doing nothing.
The user of the class could itself assert before calling a method that the pre-conditions of it are satisfied. So generally, if a class is responsible for some state, and it finds a state to be invalid, it should assert. If the class finds a condition to be violated that doesn't fall in its responsibility, it should throw.
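The split described above (assert on what the class owns, throw on what the caller owns) can be sketched in one place. This is only an illustration under assumptions: PrintJob's contents, SetReady, PendingJobs and the particular invariant (a counter that mirrors the queue size) are made up for the example and do not come from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical job type; the posts above never define PrintJob.
struct PrintJob {
    std::string document;
};

class Printer {
public:
    virtual ~Printer() = default;

    // Hypothetical way for the caller to establish the pre-condition.
    void SetReady(bool ready) { ready_ = ready; }

    // Non-virtual entry point: every call passes through these checks,
    // even when a derived class overrides DoQueue.
    void Queue(const PrintJob& job) {
        // Pre-condition owned by the caller: report it with an exception.
        if (!ready_) {
            throw std::logic_error("Printer not ready");
        }
        // Invariant owned by the class: a failure here is a bug in Printer.
        assert(CheckInvariant());
        DoQueue(job);
        assert(CheckInvariant());
    }

    std::size_t PendingJobs() const { return jobs_.size(); }

private:
    // Assumed invariant for this sketch: the counter mirrors the queue size.
    bool CheckInvariant() const { return queued_count_ == jobs_.size(); }

    virtual void DoQueue(const PrintJob& job) {
        jobs_.push_back(job);
        ++queued_count_;
    }

    bool ready_ = false;
    std::vector<PrintJob> jobs_;
    std::size_t queued_count_ = 0;
};
```

Calling Queue on a printer that is not ready throws std::logic_error and leaves the queue untouched; after SetReady(true) the same call succeeds. The asserts disappear when NDEBUG is defined, which is exactly the trade-off the question's second style accepts.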

The question is best considered in combination with how you test your software.
It's important that hitting a broken invariant during testing is filed as a high severity bug, just as a crash would be. Builds for testing during development can be made to stop dead and output diagnostics.
It can be appropriate to add defensive code, rather like your style 3: your DebugBreak would dump diagnostics in test builds, but just be a break point for developers. This makes less likely the situation where a developer is prevented from working by a bug in unrelated code.
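One possible way to get that build-dependent behaviour is to route every failed invariant check through a replaceable handler. The sketch below is an assumption about how that could look, not something from the answer: InvariantHandler, DumpAndAbort, LogOnly, SetInvariantHandler and CheckInvariant are all hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical hook: what to do when an invariant check fails.
using InvariantHandler = std::function<void(const std::string&)>;

// Assumed default for test builds: print diagnostics and stop dead.
inline void DumpAndAbort(const std::string& what) {
    std::cerr << "Broken invariant: " << what << '\n';
    std::abort();
}

// Assumed choice for a developer's machine: log and keep going (a real
// build might trigger a debugger breakpoint here instead).
inline void LogOnly(const std::string& what) {
    std::cerr << "Broken invariant (continuing): " << what << '\n';
}

inline InvariantHandler& CurrentHandler() {
    static InvariantHandler handler = DumpAndAbort;
    return handler;
}

inline void SetInvariantHandler(InvariantHandler handler) {
    assert(handler && "handler must be callable");
    CurrentHandler() = std::move(handler);
}

// Returns true when the invariant holds; otherwise reports the breach and
// returns false so the caller can bail out defensively.
inline bool CheckInvariant(bool holds, const std::string& what) {
    if (!holds) {
        CurrentHandler()(what);
    }
    return holds;
}
```

A method would then write something like if (!CheckInvariant(IsValidState(), "printer state")) return; and the build configuration decides, through SetInvariantHandler, whether a breach stops the process or only gets logged.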
Sadly, I've often seen it done the other way round, where developers get all the inconvenience, but test builds sail through broken invariants. Lots of strange behaviour bugs get filed, where in fact a single bug is the cause.

It's a fine and very relevant question. IMHO, any application architecture should provide a strategy to report broken invariants. One can decide to use exceptions, to use an 'error registry' object, or to explicitly check the result of any action. Maybe there are even other strategies - that's not the point.
Depending on a possibly loud crash is a bad idea: you cannot guarantee the application is going to crash if you don't know the cause of the invariant breach. In case it doesn't, you still have corrupt data.
The NonVirtual Interface solution from litb is a neat way to check invariants.

Tough question this one :)
Personally, I tend to just throw an exception since I'm usually too much into what I'm doing when implementing stuff to take care of what should be taken care of by your design. Usually this comes back and bites me later on...
My personal experience with the "Do-some-logging-and-then-don't-do-anything-more"-strategy is that it too comes back to bite you - especially if it's implemented like in your case (no global strategy, every class could potentially do it different ways).
What I would do, as soon as I discover a problem like this, would be to speak to the rest of my team and tell them that we need some kind of global error-handling. What the handling will do depends on your product (you don't want to just do nothing and log something in a subtle developer-minded file in an Air Traffic Controller-system, but it would work fine if you were making a driver for, say, a printer :) ).
I guess what I'm saying is that, IMHO, this question is something you should resolve at the design level of your application rather than at the implementation level. And sadly there are no magic solutions :(

Related

Error communication and recovery approaches in .NET

I am trying to do error communication and recovery in my C# code without using Exceptions.
To give an example, suppose there is a Func A, which can be called by Func B or Func C or other functions. Func A has to be designed keeping reuse in mind. (This application has an evolving library where new features will keep getting added over a period of time)
If Func A is not able to do what it is supposed to do, it returns an int, where any non-zero value indicates failure. I also want to communicate the reason for failure. The caller function can use this information in multiple ways:
It can show the error message to the user,
It may display its own error message more relevant to its context
It may itself return an int value indicating failure to further ancestor caller functions.
It may try to recover from the error, using some intelligent algorithm.
Hypothetically, any function on which other functions depend, may need to communicate multiple things to its caller function to take appropriate action, including status code, error message, and other variables indicating the state of data. Returning everything as a delimited string may not allow the caller function to retrieve the information without parsing the string (which will lead to its own problems and is not recommended).
The only other way is to return an object containing member variables for all required information. This may lead to too many 'state' objects, as each function will need to have its state object.
I want to understand how this requirement can be designed in the most elegant way. Note that at the time of coding, Func A may not know whether the caller function will have the intelligence to recover from the error or not, so I do not want to throw exceptions. Also, I want to see whether such a design is possible (and elegant at the same time) without using exceptions.
If the only way is to communicate using data objects for each function, then is it the way professional libraries are written? Can there be a generic data object? Note new functions may be added in future, which may have different state variables, or supporting information about their errors.
Also note that since the function's return value is a 'state' object, the actual data what it is supposed to return may need to be passed as a ref or out parameter.
Is there a design pattern for this?
I have read the following articles before posting this question:
http://blogs.msdn.com/b/ricom/archive/2003/12/19/44697.aspx
Do try/catch blocks hurt performance when exceptions are not thrown?
Error Handling without Exceptions
I have read many other articles also, which suggest not using exceptions for code flow control or for errors which are recoverable. Also, throwing exceptions has its own cost. Moreover, if the caller function wants to recover from an exception thrown by each of the called functions, it will have to surround each function call with a try catch block, as a generic try catch block will not allow execution to 'continue' from the line after the one that failed.
EDIT:
A few specific questions:
I need to write an application which will synchronize 2 different databases: one is a proprietary database, and the other is a SQL Server database. I want to encapsulate reusable functions in a separate layer.
The functionality is like this: The proprietary application can have many databases. Some information from each of these databases needs to be pushed to a single common SQL Server database. The proprietary application's databases can be read only when the application's GUI is open and it can be read only through XML.
The algorithm is like this:
Read List of Open databases in Proprietary Application
For each database, start Sync process.
Check whether the user currently logged in, in this database has the Sync Permission. (Note: each database may be opened using a different user id).
Read data from this database.
Transfer data to SQL Server
Proceed to next database.
While developing this application, I will be writing several reusable functions, like ReadUserPermission, ReadListOfDatabases, etc.
In this case, if ReadUserPermission finds that the permission does not exist, the caller should log this and proceed to next open database. If ReadListOfDatabases is not able to establish a connection with the Proprietary Application, the caller should automatically start the application, etc.
So which error conditions should be communicated using exceptions and which using return codes?
Note the reusable functions may be used in other projects, where the caller may have different error recovery requirements or capabilities, so that has to be kept in mind.
EDIT:
For all those advocating exceptions, I ask them:
If Func A calls Func B,C,D,E,F,G and Func B throws an exception on some error condition, but Func A can recover from this error and would like to continue the rest of the execution, i.e. call Func C,D,..., how does exception handling allow this to be done 'elegantly'? The only solution will be to wrap calls to each of B,C,D,... within a try catch block, so that remaining statements get executed.
Please also read these 2 comments:
https://stackoverflow.com/a/1279137/1113579
https://stackoverflow.com/a/1272547/1113579
Note I am not averse to using exceptions, if error recovery and remaining code execution can be achieved elegantly and without impacting performance. Also, slight performance impact is not a concern, but I prefer the design should be scalable and elegant.
EDIT:
Ok, based on "Zdeslav Vojkovic"'s comments, I am now thinking about using exceptions.
If I were to use exceptions, can you give some use case when not to use exception, but use return codes? Note: I am talking about return codes, not the data which function is supposed to return. Is there any use case of using return codes to indicate success / failure, or no use case? That will help me understand better.
One use case of exceptions what I have understood from "Zdeslav Vojkovic" is when the callee function wants to compulsorily notify caller function of some condition and interrupt the caller execution. In the absence of exception, the caller may or may not choose to examine the return codes. But in case of exceptions, the caller function must necessarily handle the exception, if it wants to continue execution.
EDIT:
I had another interesting idea.
Any callee function which wants to support the idea of caller function recovering from its error can raise event, and check the event data after the event has been handled, and then decide to throw or not to throw exception. Error codes will not be used at all. Exceptions will be used for unrecovered errors. Basically when a callee function is unable to do what its contract says, it asks for "help" in the form of any available event handlers. Still if it is not able to perform the contract, it throws an exception. The advantage is that the added overhead of throwing exceptions is reduced, and exceptions are thrown only when the callee function or any of its caller functions are not able to recover from the error.
Suppose if the caller function does not want to handle the error, but rather the caller's caller function wants to handle the error, the custom event dispatcher will ensure that event handlers are called in the reverse order of event registration, i.e. the most recently registered event handler should be called prior to other registered event handlers, and if this event handler is able to resolve the error, the subsequent event handlers are not at all called. On the other hand, if the most recent event handler can not resolve the error, the event chain will propagate to the next handler.
Please give feedback on this approach.
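To make the proposal concrete, here is a minimal sketch of such a dispatcher. It is written in C++ rather than C# to match the other examples on this page, and every name in it (ErrorInfo, RecoveryChain, Register, TryRecover, the NOT_CONNECTED code and the simplified ReadListOfDatabases signature) is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of a problem the callee cannot solve alone.
struct ErrorInfo {
    std::string code;
};

class RecoveryChain {
public:
    // A handler returns true when it resolved the problem.
    using Handler = std::function<bool(const ErrorInfo&)>;

    void Register(Handler handler) {
        assert(handler && "handler must be callable");
        handlers_.push_back(std::move(handler));
    }

    // Ask handlers for help, most recently registered first, and stop at
    // the first one that reports success.
    bool TryRecover(const ErrorInfo& error) const {
        for (auto it = handlers_.rbegin(); it != handlers_.rend(); ++it) {
            if ((*it)(error)) {
                return true;
            }
        }
        return false;
    }

private:
    std::vector<Handler> handlers_;
};

// Hypothetical callee: asks for help first and throws only if nobody
// could resolve the problem.
inline void ReadListOfDatabases(const RecoveryChain& chain, bool connected) {
    if (connected) {
        return;
    }
    if (!chain.TryRecover(ErrorInfo{"NOT_CONNECTED"})) {
        throw std::runtime_error("could not connect to the application");
    }
    // A real implementation would retry the connection here.
}
```

With an outer handler registered first and an inner one registered second, the inner handler is asked first; if it returns false the outer one gets its turn, and the exception is thrown only when both decline.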
How about a common FunctionResult object that you use as an out param on all your methods that you don't want to throw exceptions in?
public class FuncResultInfo
{
    public bool ExecutionSuccess { get; set; }
    public string ErrorCode { get; set; }
    public ErrorEnum Error { get; set; }
    public string CustomErrorMessage { get; set; }
    public FuncResultInfo()
    {
        this.ExecutionSuccess = true;
    }
    public enum ErrorEnum
    {
        ErrorFoo,
        ErrorBar,
    }
}
public static class Factory
{
    public static int GetNewestItemId(out FuncResultInfo funcResInfo)
    {
        var i = 0;
        funcResInfo = new FuncResultInfo();
        if (true) // whatever you are doing to decide if the function fails
        {
            funcResInfo.Error = FuncResultInfo.ErrorEnum.ErrorFoo;
            funcResInfo.ErrorCode = "234";
            funcResInfo.CustomErrorMessage = "Er mah gawds, it done blewed up!";
        }
        else
        {
            i = 5; // whatever.
        }
        return i;
    }
}
Make sure all of your functions that can fail without exceptions have that out param for FuncResultInfo
"is it the way professional libraries are written?"
No, professional libraries are written by using exceptions for error handling - I am not sure if there is a pattern for using your suggested approach, but I consider it an anti-pattern (in .NET). After all, .NET itself is a professional framework and it uses exceptions. Besides, .NET developers are used to exceptions. Do you think that your library is really so special that it should force its users to learn a completely different way of error handling?
What you just did is reinvent the COM error handling. If that is what you want to do then check this and ISupportErrorInfo interface for some ideas.
Why do you want to do this? I bet it is a performance 'optimization'.
Fear of performance issues with regard to exception handling is almost always a premature optimization. You will create an awkward API where each return value must be handled via ref/out parameters and which will hurt every user of your lib, just to solve the problem which likely doesn't exist at all.
"Func A may not know whether the caller function will have the
intelligence to recover from the error or not, so I do not want to
throw exceptions"
So you want to ensure that caller silently allows FuncA to mess up the system invariants and caller just goes on happily? It will just make it much harder to debug seemingly impossible bug which happens in another function later on due to this.
There are scenarios where it makes sense to avoid exceptions, but there should be a good justification for that. Exceptions are a good idea.
EDIT: I see that you have added that you "have read many other articles also, which suggest not to use exceptions for code flow control". That is correct, exceptions in .NET are not for code flow but for error handling.
You ask:
If Func A calls Func B,C,D,E,F and it has to encapsulate each call
with try catch because it can recover from error or it will still like
to execute remaining function calls, then is not so many try catch
statements awkward
Not more than the alternative. You are making the mistake of assuming that you can simply handle all errors returned from functions in the same way, but you usually can't.
Consider the worst case, where you need to handle every function separately (code is usually not written like that):
Result x, y;
try {
    x = Function1();
}
catch (SomeException e) {
    // handle error
}
try {
    y = Function2();
}
catch (SomeOtherException e) {
    // handle error
}
against:
int error;
Result x, y;
error = Function1(out x);
if (error == SOME_KNOWN_ISSUE) {
    // handle error
}
error = Function2(out y);
if (error == SOME_KNOWN_ISSUE) {
    // handle error
}
Not a big difference. Please don't tell me that you would not check the error code.
However, if you decide to ignore all errors (a horrible idea) then exceptions are simpler:
try {
    var x = Function1();
    var y = Function2();
    var z = Function3();
}
catch (Exception e) {
    // you can still see the message here and possibly rethrow
}
vs
Result1 r1;
Function1(out r1);
Result2 r2;
Function2(out r2);
Result3 r3;
Function3(out r3);
// and here you still don't know whether there was an error
Can you elaborate what do you mean by "I need predictability with regard to time constraints"?
In some system-level software or realtime stuff, you can't afford stack unwinding related to exception handling, as you can't guarantee the duration, and that could violate your timing requirements. But this is never the case in .NET as garbage collection is far worse in this regard.
Also, when you say "In .NET I would always use the exceptions for
error handling", can you explain how or what do you define as an error
condition? Is a recoverable situation an error condition or not an
error condition? –
@shambulater already gave a great example in comments. In FileStream, a missing file is not recoverable and it will throw. In the client of FileStream it might be recoverable or not depending on context. Some clients will ignore it, some will exit the app, some will wrap it in another exception and let someone upstream decide.
When will you not use exceptions?
In those cases where I would also not return an error code.
I use the FunctionResult approach extensively in ms-access and it works wonderfully. I consider it far better than error handling. For a start, each error message is application specific and is not the usually off target default error message. If the error propagates up a call list of functions, the error messages can be daisy chained together. This eventual error message looks like a call stack but is cleaner e.g. [Could not read photos from Drive F:, Could not read files, Drive not ready]. Wacko, I have just discovered that some Drives can be mounted but not ready. I could not have unit tested for that error as I didn't know that such an error could occur (means SD card reader is empty). Yet even without prior knowledge of this error, I could write an application that handled it gracefully.
My method is to call a method in a class that is written as a function that returns a boolean value. The return value is set to True in the last line of the function, so if the function is exited before the last line, it is by default unsuccessful. In code, calling the function looks like if getphotos(folderID) then...do something .. Else report error. Inside the class module is a module level error variable (Str mEM) and it is read via a getter, so the class has an .em property which holds the error message. I also have a comment variable which is sometimes used like an error message, for example if the folder is empty, the code that looked for photos worked but did not return any photos. That would not be an error but it is something that I might want to communicate to the calling program. If there was an error, the user would get an error message and the calling procedure would exit. In contrast, if there was a cmt, such as 'no photos', then I might skip trying to read the photo metadata, for example. How does Zdeslav Vojkovic handle subtleties like that with exceptions?
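The daisy-chaining of messages described above can be sketched as a small result type. This is an illustration in C++ (to match the other examples on this page), not the poster's ms-access code; Outcome, WithContext, Describe and the three functions are hypothetical names, and the messages are taken from the example chain quoted earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: success is an empty chain; a failure carries
// a chain of messages, outermost context first.
class Outcome {
public:
    static Outcome Ok() { return Outcome(); }

    static Outcome Fail(const std::string& message) {
        Outcome result;
        result.messages_.push_back(message);
        return result;
    }

    bool ok() const { return messages_.empty(); }

    // Wrap a failure from a lower level with this level's own context.
    Outcome WithContext(const std::string& message) const {
        assert(!ok() && "only failures carry context");
        Outcome result;
        result.messages_.push_back(message);
        result.messages_.insert(result.messages_.end(),
                                messages_.begin(), messages_.end());
        return result;
    }

    // Renders the chain as "[outer, inner, innermost]".
    std::string Describe() const {
        std::string text = "[";
        for (std::size_t i = 0; i < messages_.size(); ++i) {
            if (i > 0) {
                text += ", ";
            }
            text += messages_[i];
        }
        return text + "]";
    }

private:
    std::vector<std::string> messages_;
};

// Three hypothetical layers, each adding its own application-specific message.
inline Outcome CheckDrive(bool ready) {
    return ready ? Outcome::Ok() : Outcome::Fail("Drive not ready");
}

inline Outcome ReadFiles(bool ready) {
    Outcome r = CheckDrive(ready);
    return r.ok() ? r : r.WithContext("Could not read files");
}

inline Outcome ReadPhotos(bool ready) {
    Outcome r = ReadFiles(ready);
    return r.ok() ? r : r.WithContext("Could not read photos from Drive F:");
}
```

A non-error comment such as 'no photos' could travel in a second, separate field of the same type; that part is left out here to keep the sketch short.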
I am moving to C#, hence finding this thread. I like the certainty of knowing why function calls failed (I interact with databases and filing systems all the time so I struggle to cover my projects with Unit Tests). I do agree with Zdeslav Vojkovic about using exceptions where their use is standard, but will not be doing so in my own code. I am looking for a clean design pattern that allows me to validate parameters within the called function and to inform the caller if the parameters were not right.

What does "throw new NotImplementedException();" do exactly?

I have a class 'b' that inherits from class 'a'. In class 'a' there is some code that performs an action if an event is not null. I need that code to fire in class 'b' during specific times in the application. So in 'b' I subscribed to a new Handler(event).
If I leave the autogenerated event 'as is' in class 'b' with the throw new NotImplementedException(); line, the code works/runs as expected. As soon as I remove the throw statement, the application no longer works as expected.
So, what is the throw new NotImplementedException doing besides throwing the exception?
I realize I'm probably trying to solve my coding problem the wrong way at this point, and I am sure I will find a better way to do it (I'm still learning), but my question remains. Why does that line change the outcome of code?
EDIT:
I realize I wasn't very specific with my code. Unfortunately, because of strict policies, I can't be. I have in class 'a' an if statement.
if (someEvent != null)
When the code 'works', the if statement is returning true. When it isn't working as expected, it is returning 'false'. In class 'b', the only time the application 'works' (or the if statement returns true), is when I have the throw new NotImplementedException(); line in class 'b's event method that is autogenerated when I attached the new event.
Think about this: what if you want to add two integers with the following method...
private int Add(int x, int y)
{
}
...and have no code inside to do such (the method doesn't even return an integer). This is what NotImplementedException is used for.
NotImplementedException is simply an exception that is used when the code for what you're trying to do isn't written yet. It's often used in snippets as a placeholder until you fill in the body of whatever has been generated with actual code.
It's good practice to use a NotImplementedException as opposed to an empty block of code, because it will very clearly throw an error alerting you that section of your code isn't complete yet. If it was blank, then the method might run and nothing would happen, and that's a pain to debug sometimes.
It is simply an exception. Why it makes your application "work" depends entirely on the code handling any exceptions.
It is not a "special" exception as opposed to a normal exception (other than being derived from Exception like the rest). You tend to see it with code generation as a placeholder for implementing the member it is throwing inside. It is a lot easier to do this than have code generation try to understand the member structure in order to output compiling code.
When you say "no longer works as expected", I am assuming it compiles. If removing this stops the code from compiling then the chances are good you have a compilation error about a return value.
Perhaps the code that triggers the event expects a certain response from handlers, or if there are no handlers or exceptions occur it defaults the response and carries on. In your case, there is a handler and no exception so it expects a better response?
Complete guesswork.
If there is code in a that you need to use in b, consider making the method that houses the code protected and optionally virtual if you need to override the behaviour.
NotImplementedException, I believe, has a slightly special meaning: "this code is still in development and needs to be changed". Developers use this exception to make some code compilable but not executable (to avoid data damage).
Information about this exception type can be found in the documentation, which explains the meaning in detail: https://learn.microsoft.com/en-us/dotnet/api/system.notimplementedexception?view=netcore-3.1
Some development tools, like Resharper, for example, generate new members with NotImplementedException inside, thus protecting you from executing code which is not ready. At the same time they highlight these exceptions the same way as a "//todo:" comment.
For other situations, for example, when you don't need to implement an interface or virtual member, or maybe you don't implement some paths in switch/case, if/else etc. statements, you will probably use NotSupportedException, ArgumentOutOfRangeException, ArgumentNullException, InvalidOperationException etc.
At the end, consider this situation to understand the purpose of NotImplementedException:
We are writing banking application, and, at the moment, implementing money transferring feature, we have:
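// Account is a hypothetical type; the original snippet omitted the parameter types.
public void Transfer(Account sourceAccount, Account destAccount, decimal sum)
{
    sourceAccount.Credit(sum);
    destAccount.Debit(sum);
}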
Here we are calling two methods which do not exist yet. We generate both with the default NotImplementedException and go to the first one (Credit) to implement it. Let's say the implementation took some time, we wrote tests for it, and even did several manual tests. We completely forget about the second method, Debit, and deploy our application to beta, or even to production. Testers or users start using the application and soon reach the money transfer functionality. They try to call Transfer, which shows them a general message, "We are so sorry, we got some errors", and at the same time we, as the developer team, receive a notification that a NotImplementedException happened, with a stack trace pointing us to the method Debit. Now we implement it and release a new version, which can do money transfers.
Now imagine what would happen if there were no exception there: users would spend a lot of money trying to make those transfers, trying several times, each time throwing money into a hole.
Depending on the purpose of the application, this can be a bigger or smaller problem: maybe a button on your calculator does not work, or maybe it was a scientific calculator and we just missed some important math while calculating a vaccine code against some aggressive virus.
The NotImplementedException is a way of declaring that a particular method of an interface or base class is simply not implemented in your type. This is the exception form of the E_NOTIMPL error code.
In general an implementation shouldn't be throwing a NotImplementedException unless it's a specifically supported scenario for that particular interface. In the vast majority of scenarios this is not the case and types should fully implement interfaces.
In terms of what it's doing, though: it's simply throwing an exception. It's hard to say why the program keeps functioning in the face of the exception and breaks without it unless you give us a bit more information.

CRUD operations; do you notify whether the insert,update etc. went well?

I have a simple question for you (i hope) :)
I have pretty much always used void as a "return" type when doing CRUD operations on data.
Eg. Consider this code:
public void Insert(IAuctionItem item) {
    if (item == null) {
        AuctionLogger.LogException(new ArgumentNullException("item is null"));
    }
    _dataStore.DataContext.AuctionItems.InsertOnSubmit((AuctionItem)item);
    _dataStore.DataContext.SubmitChanges();
}
and then consider this code:
public bool Insert(IAuctionItem item) {
    if (item == null) {
        AuctionLogger.LogException(new ArgumentNullException("item is null"));
    }
    _dataStore.DataContext.AuctionItems.InsertOnSubmit((AuctionItem)item);
    _dataStore.DataContext.SubmitChanges();
    return true;
}
It actually just comes down to whether you should notify that something was inserted (and went well) or not ?
I typically go with the first option there.
Given your code, if something goes wrong with the insert there will be an Exception thrown.
Since you have no try/catch block around the Data Access code, the calling code will have to handle that Exception...thus it will know both if and why it failed. If you just returned true/false, the calling code will have no idea why there was a failure (it may or may not care).
I think it would make more sense if in the case where "item == null" that you returned "false". That would indicate that it was a case that you expect to happen not infrequently, and that therefore you don't want it to raise an exception but the calling code could handle the "false" return value.
As it stands, you'll return "true" or there'll be an exception - that doesn't really help you much.
Don't fight the framework you happen to be in. If you are writing C code, where return values are the most common mechanism for communicating errors (for lack of a better built in construct), then use that.
.NET base class libraries use exceptions to communicate errors, and their absence means everything is okay. Because almost all code uses the BCL, much of it is written to expect exceptions. When it meets a library written as if C# were C with no support for exceptions, each invocation needs to be wrapped in an if (!myObject.DoSomething()) { Console.WriteLine("Damn"); } block.
For the next developer to use your code (which could be you after a few years when you've forgotten how you originally did it), it will be a pain to start writing all the calling code to take advantage of having error conditions passed as return values, as changes to values in an output parameter, as custom events, as callbacks, as messages to queue or any of the other imaginable ways to communicate failure or lack thereof.
I think it depends. Imagine that your user wants to add a new post to a forum, and the add fails for some reason. If you don't tell the user, they will never know that something went wrong. The best way is to throw another exception with a nice message for them.
And if it does not relate to the user, and you have already written it to the database log, you shouldn't care whether anything is returned any more.
I think it is a good idea to notify the user whether the operation went well or not. Regardless of how much you test your code and try to think outside the box, it is most likely that during its existence the software will encounter a problem you did not cater for, making it behave incorrectly. Notifications, in my opinion, allow the user to take action, a sort of Plan B if you like, when the program fails. This action can be a simple workaround, or informing people in the IT department so that they can fix it.
I'd rather click that extra "OK" button than learn that something went wrong when it is too late.
You should stick with void. If you need more data, use variables for it, since you may need specific data (and it can be more than one number or string), and an exception mechanism is a good solution for handling errors.
So if you want to know how many rows were affected, whether a stored procedure returned something, etc., a return type will limit you.

Business Validation Logic Code Smell

Consider the following code:
partial class OurBusinessObject {
partial void OnOurPropertyChanged() {
if(ValidateOurProperty(this.OurProperty) == false) {
this.OurProperty = OurBusinessObject.Default.OurProperty;
}
}
}
That is, when the value of OurProperty in OurBusinessObject is changed, if the value is not valid, set it to be the default value. This pattern strikes me as code smell but others here (at my employer) do not agree. What are your thoughts?
Edited to add: I've been asked to add an explanation for why this is thought to be okay. The idea was that rather than having the producers of the business object validate the data, the business object could validate its own properties, and set clean default values in cases when the validation failed. Further, it was thought, if the validation rules change, the business object producers won't have to change their logic as the business object will take care of validating and cleaning the data.
It's absolutely horrible. Good luck trying to debug issues in Production. The only thing it can lead to is covering up bugs, which will just pop up somewhere else, where it will not be obvious at all where they are coming from.
I think I have to agree with you. This could definitely lead to issues where the logic unexpectedly returns to the defaults, which could be very difficult to debug.
At the very least, this behavior should be logged, but this seems more like a case for throwing an exception.
To me this looks like the symptom, rather than the actual problem. What's really going on is that the setter for OurProperty fails to preserve the original value for use in the OnOurPropertyChanged event. If you do that, suddenly it becomes easier to make better choices about how to proceed.
For that matter, what you really want is an OnOurPropertyChanging event that is raised from the setter before the assignment actually takes place. This way you can allow or deny the assignment in the first place. Otherwise there is a small amount of time where your object is not valid, which means the type is not thread safe and you can't count on consistency if concurrency is a concern.
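A "changing" hook of that kind might look roughly like the following. This is a sketch in C++ rather than C#, and `BusinessObject`, `on_changing` and `set_value` are made-up names; it also does nothing about synchronization, so it only illustrates the allow-or-deny-before-assignment part.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical business object with a "changing" hook that runs
// before the assignment, so a denied value is never stored.
class BusinessObject {
public:
    using ChangingHandler = std::function<bool(const std::string& proposed)>;

    explicit BusinessObject(std::string initial) : value_(std::move(initial)) {}

    // The handler returns true to allow the assignment, false to deny it.
    void on_changing(ChangingHandler handler) { changing_ = std::move(handler); }

    // Returns true if the value was assigned, false if it was denied.
    bool set_value(const std::string& proposed) {
        if (changing_ && !changing_(proposed)) {
            return false;  // denied: the old value stays in place
        }
        value_ = proposed;
        return true;
    }

    const std::string& value() const { return value_; }

private:
    std::string value_;
    ChangingHandler changing_;
};
```

Because the check happens before `value_` is touched, there is no window in which the object holds the rejected value.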
Definitely a questionable practice.
How would an invalid value ever get assigned to this property? Wouldn't that indicate there's a bug somewhere in the calling code, in which case you'd probably want to know right away? Or that a user input something incorrectly in which case they should be informed right away?
In general, "failing fast" makes tracking down bugs a lot easier. Silently assigning a default behind the scenes is akin to "magic" and is only going to cause confusion to whoever has to maintain the codebase.
Distaste for the term 'code smell' aside, you might be right - depending on where it's coming from, silently changing the value is probably not a good thing. It would be better to ensure your value is valid instead of just reverting to the default.
I would highly recommend refactoring it to validate before setting the property.
You could always have a method that was more like:
T GetValidValueForProperty<T>(T suggestedValue, T currentValue);
or even:
T GetValidValueForProperty<T>(string propertyName, T suggestedValue, T currentValue);
If you do that, before you set the property, you could pass it to the business logic to validate, and the business logic could return the default property value (your current behavior) OR (more reasonable in most cases), return the currentValue, so setting had no effect.
This would be used more like:
T OurProperty
{
get
{
return this.propertyBackingField;
}
set
{
this.propertyBackingField = this.GetValidValueForProperty(value, this.propertyBackingField);
}
}
It doesn't really matter what you do, but it is important to validate before you change your current value. If you change your value before you determine whether the new value is good, you're asking for trouble in the long term.
It may or may not "smell", but I'm leaning more towards, "Yes it smells".
Does setting OurProperty to the default have a logical reason for doing so or is it simply convenient to do so in code? It is possible, however unlikely in practice, to contrive a scenario where this would be expected behavior, but I'm guessing that in most cases you should be throwing an exception and handling it cleanly somewhere.
Does setting the value to default get you closer to or move you away from the functional specifications description of how the application is supposed to work?
You are validating a change after it has been made? Validation should be done before the business property is altered.
Answering your question: the solution presented in that code snippet can cause big issues in production, because you don't know whether the default value appeared there due to invalid input or just because something else set the value to the default.
It's hard to say without knowing the context or business rules. Generally speaking though, you should just validate at time of input, and maybe once more before persistence, but the way you're doing it won't really allow you to validate since you're not allowing a property to contain an invalid value.
I think your validation logic should raise an exception if asked to use an invalid value. If your consumer wants to use a default value, it should ask for it explicitly, either through a special, documented value or through another method.
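As a rough illustration of "throw on invalid, ask for the default explicitly", here is a sketch in C++. The class `OrderLine`, its 1 to 100 range and the method names are invented for the example and are not taken from the question.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical business object: quantity must be in [1, 100].
class OrderLine {
public:
    static constexpr int kDefaultQuantity = 1;

    int quantity() const { return quantity_; }

    // Rejects invalid values instead of silently replacing them.
    void set_quantity(int value) {
        if (value < 1 || value > 100) {
            throw std::out_of_range("quantity must be between 1 and 100");
        }
        quantity_ = value;
    }

    // The default is applied only when the caller asks for it by name.
    void reset_quantity_to_default() { quantity_ = kDefaultQuantity; }

private:
    int quantity_ = kDefaultQuantity;
};
```

A failed `set_quantity` leaves the previous value untouched, and a default only ever appears because some caller requested it, which keeps the two cases distinguishable when debugging.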
The only exceptions I can think of that would be forgivable are things like normalizing case, as in email fields to detect duplicates.
Furthermore, why in the world is this partial? Using partial classes for anything but a generated code framework is itself a code smell, since you're likely using them to hide complexity which should be split up anyway!
I agree with Grzenio and would add that the best way to handle a validation error down in the domain layer (aka business objects) is to generate an exception. That exception could propagate all the way up into the UI layer where it could be handled and interactively rectified with the user. However, depending on the capabilities and technologies involved, this may not always be feasible, in which case, you probably should be validating up in the UI layer (possibly in addition to the domain layer). It's less than ideal, but might be your only viable option. In any case, setting it to a default value is a horrible thing to do and will lead to subtle bugs that will be near impossible to diagnose. If done on a broad scale, you'll have an unmaintainable system in no time (especially if you have no unit tests backing you up).
An argument that I have against this is the following. Suppose the user/producer of the business object accidentally inputs an invalid value. Then this pattern will gloss over that fact and default to clean data. But the right way to handle this is to throw an error and have the user/producer verify/clean their input data.
I'd say, implement PropertyChanging and allow the business logic to approve/deny a value, and then afterwards, throw an exception for invalid values.
This way, you don't ever have an invalid value. That, and you should never change a user's information. What if a user adds an entry to the database, and keeps track of it for his own records? Your code would re-assign the value to the default, and he's now tracking the wrong information. It's better to inform the user ASAP.

How much null checking is enough?

What are some guidelines for when it is not necessary to check for a null?
A lot of the inherited code I've been working on as of late has null-checks ad nauseam. Null checks on trivial functions, null checks on API calls that state non-null returns, etc. In some cases, the null-checks are reasonable, but in many places a null is not a reasonable expectation.
I've heard a number of arguments ranging from "You can't trust other code" to "ALWAYS program defensively" to "Until the language guarantees me a non-null value, I'm always gonna check." I certainly agree with many of those principles up to a point, but I've found excessive null-checking causes other problems that usually violate those tenets. Is the tenacious null checking really worth it?
Frequently, I've observed code with excess null checking to actually be of poorer quality, not of higher quality. Much of the code seems to be so focused on null-checks that the developer has lost sight of other important qualities, such as readability, correctness, or exception handling. In particular, I see a lot of code ignore the std::bad_alloc exception, but do a null-check on a new.
In C++, I understand this to some extent due to the unpredictable behavior of dereferencing a null pointer; null dereference is handled more gracefully in Java, C#, Python, etc. Have I just seen poor examples of vigilant null-checking or is there really something to this?
This question is intended to be language agnostic, though I am mainly interested in C++, Java, and C#.
Some examples of null-checking that I've seen that seem to be excessive include the following:
This example seems to be accounting for non-standard compilers as C++ spec says a failed new throws an exception. Unless you are explicitly supporting non-compliant compilers, does this make sense? Does this make any sense in a managed language like Java or C# (or even C++/CLR)?
try {
MyObject* obj = new MyObject();
if(obj!=NULL) {
//do something
} else {
//??? most code I see has log-it and move on
//or it repeats what's in the exception handler
}
} catch(std::bad_alloc) {
//Do something? normally--this code is wrong as it allocates
//more memory and will likely fail, such as writing to a log file.
}
Another example is when working on internal code. Particularly, if it's a small team who can define their own development practices, this seems unnecessary. On some projects or legacy code, trusting documentation may not be reasonable... but for new code that you or your team controls, is this really necessary?
If a method, which you can see and can update (or can yell at the developer who is responsible) has a contract, is it still necessary to check for nulls?
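#include <stdexcept> //for std::invalid_argument
//x must be non-negative.
//Returns an object or throws an exception.
MyObject* create(int x) {
if(x<0) throw std::invalid_argument("x must be non-negative");
return new MyObject();
}
try {
MyObject* x = create(unknownVar);
if(x!=NULL) {
//is this null check really necessary?
}
} catch(const std::exception&) {
//do something
}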
When developing a private or otherwise internal function, is it really necessary to explicitly handle a null when the contract calls for non-null values only? Why would a null-check be preferable to an assert?
(obviously, on your public API, null-checks are vital as it's considered impolite to yell at your users for incorrectly using the API)
//Internal use only--non-public, not part of public API
//input must be non-null.
//returns non-negative value, or -1 if failed
int ParseType(String input) {
if(input==null) return -1;
//do something magic
return value;
}
Compared to:
//Internal use only--non-public, not part of public API
//input must be non-null.
//returns non-negative value
int ParseType(String input) {
assert input!=null : "Input must be non-null.";
//do something magic
return value;
}
One thing to remember is that the code you write today, while it may be for a small team with good documentation, will turn into legacy code that someone else will have to maintain. I use the following rules:
If I'm writing a public API that will be exposed to others, then I will do null checks on all reference parameters.
If I'm writing an internal component to my application, I write null checks when I need to do something special when a null exists, or when I want to make it very clear. Otherwise I don't mind getting the null reference exception since that is also fairly clear what is going on.
When working with return data from other people's frameworks, I only check for null when it is possible and valid to have a null returned. If their contract says it doesn't return nulls, I won't do the check.
First note that this is a special case of contract-checking: you're writing code that does nothing other than validate at runtime that a documented contract is met. Failure means that some code somewhere is faulty.
I'm always slightly dubious about implementing special cases of a more generally useful concept. Contract checking is useful because it catches programming errors the first time they cross an API boundary. What's so special about nulls that means they're the only part of the contract you care to check? Still,
On the subject of input validation:
null is special in Java: a lot of Java APIs are written such that null is the only invalid value that it's even possible to pass into a given method call. In such cases a null check "fully validates" the input, so the full argument in favour of contract checking applies.
In C++, on the other hand, NULL is only one of nearly 2^32 (2^64 on newer architectures) invalid values that a pointer parameter could take, since almost all addresses are not of objects of the correct type. You can't "fully validate" your input unless you have a list somewhere of all objects of that type.
The question then becomes, is NULL a sufficiently common invalid input to get special treatment that (foo *)(-1) doesn't get?
Unlike Java, fields don't get auto-initialized to NULL, so a garbage uninitialized value is just as plausible as NULL. But sometimes C++ objects have pointer members which are explicitly NULL-inited, meaning "I don't have one yet". If your caller does this, then there is a significant class of programming errors which can be diagnosed by a NULL check. An exception may be easier for them to debug than a page fault in a library they don't have the source for. So if you don't mind the code bloat, it might be helpful. But it's your caller you should be thinking of, not yourself - this isn't defensive coding, because it only 'defends' against NULL, not against (foo *)(-1).
If NULL isn't a valid input, you could consider taking the parameter by reference rather than pointer, but a lot of coding styles disapprove of non-const reference parameters. And if the caller passes you *fooptr, where fooptr is NULL, then it has done nobody any good anyway. What you're trying to do is squeeze a bit more documentation into the function signature, in the hope that your caller is more likely to think "hmm, might fooptr be null here?" when they have to explicitly dereference it, than if they just pass it to you as a pointer. It only goes so far, but as far as it goes it might help.
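The pointer-versus-reference trade-off can be shown with a small sketch. The two function names are hypothetical, and the example uses a const reference to sidestep the style objection to non-const reference parameters mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Pointer parameter: NULL is representable, so the contract has to
// say whether it is allowed, and a check or assert has to enforce it.
std::size_t length_via_pointer(const std::string* text) {
    assert(text != nullptr && "text must not be null");
    return text->size();
}

// Reference parameter: the signature itself says "an object is required".
// A caller holding a pointer has to write *ptr, which is the prompt to
// ask whether ptr can be null at that point.
std::size_t length_via_reference(const std::string& text) {
    return text.size();
}
```

Nothing stops a caller from writing `length_via_reference(*ptr)` with a null `ptr`; the reference only moves the question to the call site, which is the "bit more documentation" being described.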
I don't know C#, but I understand that it's like Java in that references are guaranteed to have valid values (in safe code, at least), but unlike Java in that not all types have a NULL value. So I'd guess that null checks there are rarely worth it: if you're in safe code then don't use a nullable type unless null is a valid input, and if you're in unsafe code then the same reasoning applies as in C++.
On the subject of output validation:
A similar issue arises: in Java you can "fully validate" the output by knowing its type, and that the value isn't null. In C++, you can't "fully validate" the output with a NULL check - for all you know the function returned a pointer to an object on its own stack which has just been unwound. But if NULL is a common invalid return due to the constructs typically used by the author of the callee code, then checking it will help.
In all cases:
Use assertions rather than "real code" to check contracts where possible - once your app is working, you probably don't want the code bloat of every callee checking all its inputs, and every caller checking its return values.
In the case of writing code which is portable to non-standard C++ implementations, then instead of the code in the question which checks for null and also catches the exception, I'd probably have a function like this:
template<typename T>
static inline void nullcheck(T *ptr) {
#if PLATFORM_TRAITS_NEW_RETURNS_NULL
if (ptr == NULL) throw std::bad_alloc();
#endif
}
Then as one of the list of things you do when porting to a new system, you define PLATFORM_TRAITS_NEW_RETURNS_NULL (and maybe some other PLATFORM_TRAITS) correctly. Obviously you can write a header which does this for all the compilers you know about. If someone takes your code and compiles it on a non-standard C++ implementation that you know nothing about, they're fundamentally on their own for bigger reasons than this, so they'll have to do it themselves.
If you write the code and its contract, you are responsible for using it in terms of its contract and ensuring the contract is correct. If you say "returns a non-null" x, then the caller should not check for null. If a null pointer exception then occurs with that reference/pointer, it is your contract that is incorrect.
Null checking should only go to the extreme when using a library that is untrusted, or does not have a proper contract. If it is your development team's code, stress that the contracts must not be broken, and track down the person who uses the contract incorrectly when bugs occur.
Part of this depends on how the code is used -- if it is a method available only within a project vs. a public API, for example. API error checking requires something stronger than an assertion.
So while this is fine within a project where it's supported with unit tests and stuff like that:
internal void DoThis(Something thing)
{
Debug.Assert(thing != null, "Arg [thing] cannot be null.");
//...
}
in a method where you don't have control over who calls it, something like this may be better:
public void DoThis(Something thing)
{
if (thing == null)
{
throw new ArgumentNullException("thing", "Arg [thing] cannot be null.");
}
//...
}
It depends on the situation. The rest of my answer assumes C++.
I never test the return value of new since all the implementations I use throw bad_alloc on failure. If I see a legacy test for new returning null in any code I'm working on, I cut it out and don't bother to replace it with anything.
Unless small minded coding standards prohibit it, I assert documented preconditions. Broken code which violates a published contract needs to fail immediately and dramatically.
If the null arises from a runtime failure which isn't due to broken code, I throw. fopen failure and malloc failure (though I rarely if ever use them in C++) would fall into this category.
I don't attempt to recover from allocation failure. Bad_alloc gets caught in main().
If the null test is for an object which is collaborator of my class, I rewrite the code to take it by reference.
If the collaborator really might not exist, I use the Null Object design pattern to create a placeholder to fail in well defined ways.
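The last two points, taking the collaborator by reference and substituting a Null Object when it might not exist, could be sketched like this. `Logger`, `MemoryLogger`, `NullLogger` and this `Printer` are hypothetical, and the Null Object here simply does nothing; a variant that fails loudly in a defined way would follow the same structure.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical collaborator interface.
class Logger {
public:
    virtual ~Logger() = default;
    virtual void log(const std::string& message) = 0;
};

// Real implementation: remembers every message.
class MemoryLogger : public Logger {
public:
    void log(const std::string& message) override { messages.push_back(message); }
    std::vector<std::string> messages;
};

// Null Object: a valid Logger whose well-defined behavior is to do nothing.
class NullLogger : public Logger {
public:
    void log(const std::string&) override {}
};

// The collaborator is taken by reference, so there is no null to test for.
class Printer {
public:
    explicit Printer(Logger& logger) : logger_(logger) {}

    void queue(const std::string& job) {
        logger_.log("queued " + job);
        ++queued_;
    }

    int queued() const { return queued_; }

private:
    Logger& logger_;
    int queued_ = 0;
};
```

`Printer::queue` contains no null test at all: a caller without a real logger passes a `NullLogger` instead of a null pointer.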
NULL checking in general is evil, as it adds a small negative to the code's testability. With NULL checks everywhere you can't use the "pass null" technique, and it will hit you when unit testing. It's better to have a unit test for the method than a null check.
Check out the decent presentation on that issue, and on unit testing in general, by Misko Hevery at http://www.youtube.com/watch?v=wEhu57pih5w&feature=channel
Older versions of Microsoft C++ (and probably others) did not throw an exception for failed allocations via new, but returned NULL. Code that had to run in both standard-conforming and older versions would have the redundant checking that you point out in your first example.
It would be cleaner to make all failed allocations follow the same code path:
if(obj==NULL)
throw std::bad_alloc();
It's widely known that there are procedure-oriented people (focus on doing things the right way) and results-oriented people (get the right answer). Most of us lie somewhere in the middle. Looks like you've found an outlier for procedure-oriented. These people would say "anything's possible unless you understand things perfectly; so prepare for anything." For them, what you see is done properly. For them if you change it, they'll worry because the ducks aren't all lined up.
When working on someone else's code, I try to make sure I know two things.
1. What the programmer intended
2. Why they wrote the code the way they did
For following up on Type A programmers, maybe this helps.
So "How much is enough" ends up being a social question as much as a technical question - there's no agreed-upon way to measure it.
(It drives me nuts too.)
Personally I think null testing is unnecessary in the great majority of cases. If new fails or malloc fails you have bigger issues, and the chance of recovering is just about nil unless you're writing a memory checker! Also, null testing hides a lot of bugs in the development phases, since the "null" clauses are frequently just empty and do nothing.
When you can specify which compiler is being used, for system functions such as "new" checking for null is a bug in the code. It means that you will be duplicating the error handling code. Duplicate code is often a source of bugs because often one gets changed and the other doesn't. If you can not specify the compiler or compiler versions, you should be more defensive.
As for internal functions, you should specify the contract and make sure that contract is enforced via unit tests. We had a problem in our code a while back where we either threw an exception or returned null in case of a missing object from our database. This just made things confusing for the caller of the API, so we went through and made it consistent throughout the entire code base and removed the duplicate checks.
The important thing (IMHO) is to not have duplicate error logic where one branch will never be invoked. If you can never invoke code, then you can't test it, and you will never know if it is broken or not.
I'd say it depends a little on your language, but I use Resharper with C# and it basically goes out of its way to tell me "this reference could be null", in which case I add a check. If it tells me "this will always be true" for "if (null != oMyThing && ....)", then I listen to it and don't test for null.
Whether to check for null or not greatly depends on the circumstances.
For example in our shop we check parameters to methods we create for null inside the method. The simple reason is that as the original programmer I have a good idea of exactly what the method should do. I understand the context even if the documentation and requirements are incomplete or less than satisfactory. A later programmer tasked with maintenance may not understand the context and may assume, wrongly, that passing null is harmless. If I know null would be harmful and I can anticipate that someone may pass null, I should take the simple step of making sure that the method reacts in a graceful way.
public MyObject MyMethod(object foo)
{
if (foo == null)
{
throw new ArgumentNullException("foo");
}
// do whatever if foo was non-null
}
I only check for NULL when I know what to do when I see NULL. "Know what to do" here means "know how to avoid a crash" or "know what to tell the user besides the location of the crash". For example, if malloc() returns NULL, I usually have no option but to abort the program. On the other hand, if fopen() returns NULL, I can let the user know the name of the file that could not be opened, and maybe errno. And if find() returns end(), I usually know how to continue without crashing.
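The find()/end() case is the easiest of the three to show. A minimal sketch, where `setting_or` and its fallback parameter are hypothetical:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical lookup: returns the configured value, or the fallback
// when the key is absent. find() == end() is the "not found" signal,
// and here there is an obvious way to continue.
std::string setting_or(const std::map<std::string, std::string>& settings,
                       const std::string& key,
                       const std::string& fallback) {
    const auto it = settings.find(key);
    if (it == settings.end()) {
        return fallback;
    }
    return it->second;
}
```

The check earns its place because the function knows what to do in the "missing" case; a check with an empty or log-only branch would not meet that bar.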
Lower level code should check use from higher level code. Usually this means checking arguments, but it can mean checking return values from upcalls. Upcall arguments need not be checked.
The aim is to catch bugs in immediate and obvious ways, as well as documenting the contract in code that does not lie.
I don't think it's bad code. A fair amount of Windows/Linux API calls return NULL on failure of some sort. So, of course, I check for failure in the manner the API specifies. Usually I wind up passing control flow to an error module of some fashion instead of duplicating error-handling code.
If I receive a pointer that is not guaranteed by the language to be non-null, and I am going to dereference it in a way that null will break me, or pass it out of my function where I said I wouldn't produce NULLs, I check for NULL.
It is not just about NULLs, a function should check pre- and post-conditions if possible.
It doesn't matter at all if a contract of the function that gave me the pointer says it'll never produce nulls. We all make bugs. There's a good rule that a program shall fail early and often, so instead of passing the bug to another module and have it fail, I'll fail in place. Makes things so much easier to debug when testing. Also in critical systems makes it easier to keep the system sane.
Also, if an exception escapes main, the stack may not be unwound, preventing destructors from running at all (see the C++ standard on terminate()). That may be serious. So leaving bad_alloc uncaught can be more dangerous than it seems.
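One way to make sure nothing escapes main is to run the real program body through a small wrapper. This is only a sketch: `run_guarded` and `Guard` are hypothetical names, and the handlers deliberately do no work that might allocate.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <new>

// Hypothetical wrapper for the body of main(): because every exception
// is caught here, the stack is unwound and destructors run before the
// process exits with a failure code.
int run_guarded(const std::function<void()>& body) {
    try {
        body();
        return EXIT_SUCCESS;
    } catch (const std::bad_alloc&) {
        // Avoid allocating here; memory is what just ran out.
        return EXIT_FAILURE;
    } catch (...) {
        return EXIT_FAILURE;
    }
}

// Records whether its destructor ran, to make the unwinding visible.
struct Guard {
    explicit Guard(bool& flag) : destroyed(flag) {}
    ~Guard() { destroyed = true; }
    bool& destroyed;
};
```

A real main() would then be little more than `return run_guarded(real_main);`, where `real_main` stands for whatever function holds the program logic.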
Fail with assert vs. fail with a run time error is quite a different topic.
Checking for NULL after new() if standard new() behavior has not been altered to return NULL instead of throwing seems obsolete.
There's another problem, which is that even if malloc returned a valid pointer, it doesn't yet mean you have allocated memory and can use it. But that is another story.
My first problem with this is that it leads to code which is littered with null checks and the like. It hurts readability, and I’d even go as far as to say that it hurts maintainability, because it really is easy to forget a null check if you’re writing a piece of code where a certain reference really should never be null. And you just know that the null checks will be missing in some places. Which actually makes debugging harder than it needs to be. Had the original exception not been caught and replaced with a faulty return value, then we would’ve gotten a valuable exception object with an informative stacktrace. What does a missing null check give you? A NullReferenceException in a piece of code that makes you go: wtf? this reference should never be null!
So then you need to start figuring out how the code was called, and why the reference could possibly be null. This can take a lot of time and really hurts the efficiency of your debugging efforts. Eventually you’ll figure out the real problem, but odds are that it was hidden pretty deeply and you spent a lot more time searching for it than you should have.
Another problem with null checks all over the place is that some developers don’t really take the time to properly think about the real problem when they get a NullReferenceException. I’ve actually seen quite a few developers just add a null check above the code where the NullReferenceException occurred. Great, the exception no longer occurs! Hurray! We can go home now! Umm… how bout ‘no you can’t and you deserve an elbow to the face’? The real bug might not cause an exception anymore, but now you probably have missing or faulty behavior… and no exception! Which is even more painful and takes even more time to debug.
At first, this seemed like a strange question: null checks are great and a valuable tool. Checking that new returns null is definitely silly. I'm just going to ignore the fact that there are languages that allow that. I'm sure there are valid reasons, but I really don't think I can handle living in that reality :) All kidding aside, it seems like you should at least have to specify that the new should return null when there isn't enough memory.
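Standard C++ does in fact work that way: plain `new` reports failure by throwing `std::bad_alloc`, and a null result has to be requested explicitly with `new (std::nothrow)`. The sketch below demonstrates both. Since real allocation failure is hard to trigger on demand, it uses a hypothetical class `Buffer` with class-specific allocation functions and a made-up `simulate_failure` switch.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical type whose allocation can be made to fail on demand.
struct Buffer {
    static bool simulate_failure;  // test hook, not a standard facility
    char data[64];

    // Throwing form: failure is reported with std::bad_alloc.
    static void* operator new(std::size_t size) {
        if (simulate_failure) throw std::bad_alloc();
        return ::operator new(size);
    }
    // Non-throwing form: failure is reported with a null pointer.
    static void* operator new(std::size_t size, const std::nothrow_t&) noexcept {
        if (simulate_failure) return nullptr;
        return ::operator new(size, std::nothrow);
    }
    static void operator delete(void* p) noexcept { ::operator delete(p); }
    static void operator delete(void* p, const std::nothrow_t&) noexcept {
        ::operator delete(p);
    }
};
bool Buffer::simulate_failure = false;

// True if plain new reported failure (by throwing std::bad_alloc).
bool plain_new_failed() {
    try {
        Buffer* b = new Buffer;
        delete b;
        return false;
    } catch (const std::bad_alloc&) {
        return true;
    }
}

// True if nothrow new reported failure (by returning a null pointer).
bool nothrow_new_failed() {
    Buffer* b = new (std::nothrow) Buffer;
    if (b == nullptr) {
        return true;  // the only form where a null check is meaningful
    }
    delete b;
    return false;
}
```

A null check after plain `new` is dead code on a conforming implementation; the check belongs only next to the `std::nothrow` form.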
Anyway, checking for null where appropriate leads to cleaner code. I'd go so far as to say that never assigning function parameters default values is the next logical step. To go even further, returning empty arrays, etc. where appropriate leads to even cleaner code. It is nice to not have to worry about getting nulls except where they are logically meaningful. Nulls as error values are better avoided.
Using asserts is a really great idea. Especially if it gives you the option of turning them off at runtime. Plus, it is a more explicitly contractual style :)
