A better way to handle NullReferenceExceptions in C#

I recently had a coding bug where, under certain conditions, a variable wasn't being initialized and I was getting a NullReferenceException. This took a while to debug, as I had to find the bits of data that would recreate the error, and the exception doesn't give the variable name.
Obviously I could check every variable before use and throw an informative exception but is there a better (read less coding) way of doing this? Another thought I had was shipping with the pdb files so that the error information would contain the code line that caused the error. How do other people avoid / handle this problem?
Thanks

Firstly: don't do too much in a single statement. If you have huge numbers of dereferencing operations in one line, it's going to be much harder to find the culprit. The Law of Demeter helps with this too - if you've got something like order.SalesClerk.Manager.Address.Street.Length then you've got a lot of options to wade through when you get an exception. (I'm not dogmatic about the Law of Demeter, but everything in moderation...)
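If you do have to traverse a chain like that while debugging, one hedged trick is to split it across statements so the line number in the stack trace identifies the null link (a sketch; the intermediate variables are assumed from the example above):
// Each dereference on its own line: the NRE's line number now
// pinpoints which link in the chain was null.
var clerk = order.SalesClerk;        // throws here => order was null
var manager = clerk.Manager;         // throws here => SalesClerk was null
var address = manager.Address;       // throws here => Manager was null
int length = address.Street.Length;  // throws here => Address or Street was null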
Secondly: prefer casting over using as, unless it's valid for the object to be a different type, which normally involves a null check immediately afterwards. So here:
// What if foo is actually a Control, but we expect it to be String?
string text = foo as string;
// Several lines later
int length = text.Length; // Bang!
Here we'd get a NullReferenceException and eventually trace it back to text being null - but then you wouldn't know whether that's because foo was null, or because it was an unexpected type. If it should really, really be a string, then cast instead:
string text = (string) foo;
Now you'll be able to tell the difference between the two scenarios.
Thirdly: as others have said, validate your data - typically arguments to public and potentially internal APIs. I do this in enough places in Noda Time that I've got a utility class to help me declutter the check. So for example (from Period):
internal LocalInstant AddTo(LocalInstant localInstant,
                            CalendarSystem calendar, int scalar)
{
    Preconditions.CheckNotNull(calendar, "calendar");
    ...
}
You should document what can and can't be null, too.
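For illustration, here is a minimal sketch of such a Preconditions helper (a simplified stand-in, not the actual Noda Time source):
internal static class Preconditions
{
    internal static T CheckNotNull<T>(T argument, string paramName) where T : class
    {
        if (argument == null)
        {
            throw new ArgumentNullException(paramName);
        }
        // Returning the argument lets callers validate and assign in one expression.
        return argument;
    }
}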

In a lot of cases it's near impossible to plan and account for every type of exception that might happen at any given point in the execution flow of your application. Defensive coding is effective only to a certain point. The trick is to have a solid diagnostics stack incorporated into your application that can give you meaningful information about unhandled errors and crashes. Having a good top-level (last ditch) handler at the app-domain level will help a lot with that.
Yes, shipping the PDBs (even with a release build) is a good way to obtain a complete stack trace that can pinpoint the exact location and causes of errors. But whatever diagnostics approach you pick, it needs to be baked into the design of the application to begin with (ideally). Retrofitting an existing app can be tedious and time/money-intensive.
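As a sketch of what that last-ditch handler can look like (AppDomain.UnhandledException is the real hook; the logging target here is an assumption):
static void Main()
{
    // Last-chance diagnostics for anything that escapes every other handler.
    AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
    {
        var ex = e.ExceptionObject as Exception;
        // With PDBs deployed, ex.ToString() includes file and line numbers.
        Console.Error.WriteLine(ex != null ? ex.ToString() : e.ExceptionObject);
    };

    // ... rest of the application ...
}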

Sorry to say that I will always make a check to verify that any object I am using in a particular method is not null.
It's as simple as
if (this.SubObject == null)
{
    throw new Exception("Could not perform METHOD - SubObject is null.");
}
else
{
    ...
}
Otherwise I can't think of any way to be thorough. Wouldn't make much sense to me not to make these checks anyway; I feel it's just good practice.

First of all you should always validate your inputs. If null is not allowed, throw an ArgumentNullException.
Now, I know how that can be painful, so you could look into assembly rewriting tools that do it for you. The idea is that you'd have a kind of attribute that would mark those arguments that can't be null:
public void Method([NotNull] string name) { ...
And the rewriter would fill in the blanks...
Or a simple extension method could make it easier
name.CheckNotNull();
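A minimal sketch of that extension method (the name and shape are assumptions, mirroring the call above):
public static class NullCheckExtensions
{
    // Unlike instance methods, extension methods can run on a null receiver,
    // so this can safely inspect its own "this" argument.
    public static void CheckNotNull(this object value)
    {
        if (value == null)
        {
            throw new ArgumentNullException();
        }
    }
}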

If you are just looking for a more compact way to code against null references, don't overlook the null-coalescing operator ?? (documented on MSDN).
Obviously, it depends what you are doing but it can be used to avoid extra if statements.
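A trivial sketch:
// One statement instead of an if/else around the null case:
string displayName = userName ?? "(anonymous)";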

Related

Good practices when handling Exceptions in C#

I've read in The Pragmatic Programmer, and in some other articles (including one from Joel Spolsky), that you should only throw Exceptions in exceptional cases. Otherwise, you should return an error.
Sometimes that is possible (returning -1, 0, or a positive number, for example), but there are other cases where it is not. I mean, if you are returning a class you can always return null, but I think the idea is to return something that will alert the caller to what happened.
If I always return null, I don't think it is useful to say: If this method returns null, it could be because of A, B, C, D, or E
So, how exactly can this be implemented in C#?
Edit:
Hours after I posted this question, I saw another question here where the question itself was about whether the code posted in it was good practice or not.
I see that is another way of doing what I was asking here. Here is the link:
Generic property disadvantages?
A far better rule for when to throw exceptions is this:
Throw an exception when your method can't do what its name says that it does.
A null value can be used to indicate that you've asked for something that does not exist. It should not be returned for any other error condition.
The rule "only throw exceptions in cases that are exceptional" is unhelpful waffle IMO since it gives you no benchmark to indicate what is exceptional and what isn't. It's like saying "only eat food that is edible."
By using numbers you're directly contradicting what Microsoft recommends for exceptions. The best source of information I've found that de-mystifies the whole subject is Jeffrey Richter's CLR via C# 3. The .NET Framework Design Guidelines book is also worth a read for its material on exceptions.
To quote MSDN:
Do not return error codes. Exceptions are the primary means of reporting errors in frameworks.
One solution you could adopt is an out parameter that receives the result. Your method then returns a bool, much like the many TryParse methods you meet in the framework.
One example to consider is int.TryParse. That uses an out parameter for the parsed value, and a bool return value to indicate success or failure. The bool could be replaced with an enum or a more complex object if the situation merits it. (For example, data validation could fail in ways that would really warrant a collection of failures etc.)
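Typical usage of the real int.TryParse looks like:
int value;
if (int.TryParse(text, out value))
{
    // parse succeeded - use value
}
else
{
    // handle the bad input without an exception
}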
An alternative to this with .NET 4 is Tuple... so int.TryParse could have been:
public static Tuple<int, bool> TryParse(string text)
The next possibility is to have an object encapsulating the whole result, including the failure mode if appropriate. For example, you can ask a Task<T> for its result, its status, and its exception if it failed.
All of this is only appropriate if it's not really an error for this to occur, as such. It doesn't indicate a bug, just something like bad user input. I really don't like returning error codes - exceptions are much more idiomatic in .NET.
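A hedged sketch of that "whole result" option (Result<T> here is illustrative, not a framework type):
public sealed class Result<T>
{
    private Result() { }

    public bool Succeeded { get; private set; }
    public T Value { get; private set; }
    public string FailureReason { get; private set; }

    public static Result<T> Success(T value)
    {
        return new Result<T> { Succeeded = true, Value = value };
    }

    public static Result<T> Failure(string reason)
    {
        return new Result<T> { Succeeded = false, FailureReason = reason };
    }
}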
Microsoft has documented best practices for error handling in .NET
http://msdn.microsoft.com/en-us/library/8ey5ey87(VS.71).aspx
I'd recommend following those guidelines, as it's the standard for .NET and you'll have far fewer conflicts with other .NET developers.
(Note, I realize I posted to an older link, but the recommendations have remained.)
I also realize that different platforms vary on how they view proper error handling. This could be a subjective question, but I'll stick to what I said above - when developing in .NET follow the .NET guidelines.
You may want to consider following the TryXXX pattern with a couple simple overloads for the client to consider.
// Exception
public void Connect(Options o);
// Error Code
public bool TryConnect(Options o, out Error e);
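One way to keep such a pair honest is to implement the throwing overload on top of the Try overload; a sketch (Options and Error are the hypothetical types from the signatures above):
public bool TryConnect(Options o, out Error e)
{
    // ... do the real work; set e and return false on failure ...
    e = null;
    return true;
}

public void Connect(Options o)
{
    Error e;
    if (!TryConnect(o, out e))
    {
        // Single implementation, two surfaces: the Try version reports,
        // this version escalates.
        throw new InvalidOperationException("Connect failed: " + e);
    }
}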
Joel Spolsky is wrong. Returning status via error/return codes means that you can't trust the results of any method invocation — the returned value must be tested and dealt with.
This means that every method invocation returning such a value introduces at least one choice point (or more, depending on the domain of the returned value), and thus more possible execution paths through the code. All of which must be tested.
This code, then, assuming that the contract of foo.Method() and bar.Method() is to either succeed or throw an exception, has one possible flow through it:
foo.Method(...);
bar.Method(...);
If these methods indicate status via a return code, things get very messy very quickly. Just returning a binary value:
bool fooSuccess = foo.Method(...);
if (fooSuccess)
{
    bool barSuccess = bar.Method(...);
    if (barSuccess)
    {
        // The normal course of events -- do the usual thing
    }
    else
    {
        // deal with bar.Method() failure
    }
}
else // foo.Method() failed
{
    // deal with foo.Method() failure
}
Returning status codes rather than throwing exceptions
complicates testing
complicates comprehension of the code
will almost certainly introduce bugs because developers aren't going to capture and test every possible case (after all, how often have you actually seen an I/O error occur?).
The caller should be checking to make sure things are hunky-dory prior to invoking the method (e.g., check for file existence; if the file doesn't exist, don't try to open it).
Enforce the contracts of your method:
Preconditions. Caller warrants that these are true prior to method invocation.
Postconditions. Callee warrants that these are true following method invocation.
Invariant conditions. Callee warrants that these are always true.
Throw an exception if the contract is violated.
Then throw an exception if anything unexpected happens.
Your code should be a strict disciplinarian.
I return such values as null references or negative index values only if I'm sure that there will be no misunderstanding when the code is used by some other developer. Something like the LINQ method IEnumerable<T>.FirstOrDefault: IEnumerable<T>.First throws an exception for empty collections, because it is expected to return the first element, and the empty collection is an exceptional case.
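For example (both are real LINQ operators; requires System.Linq):
var empty = Enumerable.Empty<int>();

int firstOrZero = empty.FirstOrDefault(); // returns default(int), i.e. 0
int first = empty.First();                // throws InvalidOperationException: emptiness is exceptional here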

General function question (C++ / Java / C#)

This question is probably language-agnostic, but I'll focus on the specified languages.
While working with some legacy code, I often saw examples of functions which (to my mind, obviously) do too much work inside them. I'm talking not about 5000 LoC monsters, but about functions which implement prerequisite checks inside themselves.
Here is a small example:
void WorriedFunction(...) {
    // Of course, this is a bit exaggerated, but I guess this helps
    // to understand the idea.
    if (argument1 != null) return;
    if (argument2 + argument3 < 0) return;
    if (stateManager.currentlyDrawing()) return;

    // Actual function implementation starts here.
    // do_what_the_function_is_used_for
}
Now, when this kind of function is called, the caller doesn't have to worry about all those prerequisites being fulfilled and can simply say:
// Call the function.
WorriedFunction(...);
Now - how should one deal with the following problem?
Generally speaking, should this function only do what it is asked to do, with the prerequisite checks moved to the caller side:
if (argument1 != null && argument2 + argument3 < 0 && ...) {
    // Now all the checks inside can be removed.
    NotWorriedFunction();
}
Or should it simply throw exceptions for every prerequisite mismatch?
if (argument1 != null) throw NullArgumentException;
I'm not sure this problem can be generalized, but I still want to hear your thoughts about it - perhaps there is something I should rethink.
If you have alternative solutions, feel free to tell me about them :)
Thank you.
Every function/method/code block should have a precondition (the precise circumstances under which it is designed to work) and a postcondition (the state of the world when the function returns). These help your fellow programmers understand your intentions.
By definition, the code is not expected to work if the precondition is false, and is considered buggy if the postcondition is false.
Whether you write these down in your head, on paper in a design document, in comments, or in actual code checks is sort of a matter of taste.
As a practical issue, though, life is easier long-term if you code the precondition and postcondition as explicit checks. Because the code is not expected to work with a false precondition, and is buggy with a false postcondition, such checks should cause the program to report an error in a way that makes it easy to discover the point of failure. What code should NOT do is simply "return" having done nothing, as your example shows, as this implies that it has somehow executed correctly.
(Code may of course be defined to exit having done nothing, but if that's the case, the pre- and post- conditions should reflect this.)
You can obviously write such checks with an if statement (your example comes dangerously close):
if (!precondition) die("Precondition failure in WorriedFunction"); // die never comes back
But often the presence of a pre- or post-condition is indicated in code by a special function/macro/statement defined for the language, called an assertion, and such a construct typically causes a program abort and backtrace when the assertion is false.
The way the code should have been written is as follows:
void WorriedFunction(...)
{
    assert(argument1 != null);  // fail/abort if false [I think your example had the test backwards]
    assert(argument2 + argument3 >= 0);
    assert(!stateManager.currentlyDrawing());

    /* body of function goes here */
}
Sophisticated functions may be willing to tell their callers that some condition has failed. That's the real purpose of exceptions. If exceptions are present, technically the postcondition should say something to the effect of "the function may exit with an exception under condition xyz".
That's an interesting question. Check the concept of "Design By Contract", you may find it helpful.
It depends.
I'd like to separate my answer between cases 1 and 3, and case 2.
case 1, 3
If you can safely recover from an argument problem, don't throw exceptions. A good example is the TryParse methods - if the input is badly formatted, they simply return false. Another example (for the exception side) is the LINQ methods - they throw if the source is null or one of the mandatory Func<> delegates is null. However, if they accept a custom IEqualityComparer<T> or IComparer<T>, they don't throw, and simply fall back to the default implementation from EqualityComparer<T>.Default or Comparer<T>.Default. It all depends on the context in which the argument is used and whether you can safely recover from it.
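The comparer fallback described above looks like this in practice; a sketch in the style of those LINQ methods (ContainsValue is a made-up name):
public static bool ContainsValue<T>(IEnumerable<T> source, T value,
                                    IEqualityComparer<T> comparer)
{
    if (source == null) throw new ArgumentNullException("source"); // mandatory argument: throw
    comparer = comparer ?? EqualityComparer<T>.Default;            // optional argument: recover

    foreach (T item in source)
    {
        if (comparer.Equals(item, value)) return true;
    }
    return false;
}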
case 2
I'd only use this way if the code is in an infrastructure like environment. Recently, I started reimplementing the LINQ stack, and you have to implement a couple of interfaces - those implementations will never be used outside my own classes and methods, so you can make assumptions inside them - the outside will always access them via the interface and can't create them on their own.
If you make that assumption for API methods, your code will throw all sorts of exceptions on wrong input, and the user doesn't have a clue what is happening as he doesn't know the inside of your method.
"Or - should it simply throw exceptions per every prerequisity mismatch?"
Yes.
You should do checks before calling a function, and if you own the function, you should make it throw exceptions if the arguments passed are not as expected.
In your calling code these exceptions should be handled. Of course, arguments should be verified before the call.

null objects vs. empty objects

[ This is a result of Best Practice: Should functions return null or an empty object? but I'm trying to be very general. ]
In a lot of legacy (um...production) C++ code that I've seen, there is a tendency to write a lot of NULL (or similar) checks to test pointers. Many of these get added near the end of a release cycle when adding a NULL-check provides a quick fix to a crash caused by the pointer dereference--and there isn't a lot of time to investigate.
To combat this, I started to write code that took a (const) reference parameter instead of the (much) more common technique of passing a pointer. No pointer, no desire to check for NULL (ignoring the corner case of actually having a null reference).
In C#, the same C++ "problem" is present: the desire to check every unknown reference against null (ArgumentNullException) and to quickly fix NullReferenceExceptions by adding a null check.
It seems to me, one way to prevent this is to avoid null objects in the first place by using empty objects (String.Empty, EventArgs.Empty) instead. Another would be to throw an exception rather than return null.
I'm just starting to learn F#, but it appears there are far fewer null objects in that environment. So maybe you don't really have to have a lot of null references floating around?
Am I barking up the wrong tree here?
Passing non-null just to avoid a NullReferenceException is trading a straightforward, easy-to-solve problem ("it blows up because it's null") for a much more subtle, hard-to-debug problem ("something several calls down the stack is not behaving as expected because much earlier it got some object which has no meaningful information but isn't null").
NullReferenceException is a wonderful thing! It fails hard, loud, fast, and it's almost always quick and easy to identify and fix. It's my favorite exception, because I know when I see it, my task is only going to take about 2 minutes. Contrast this with a confusing QA or customer report trying to describe strange behavior that has to be reproduced and traced back to the origin. Yuck.
It all comes down to what you, as a method or piece of code, can reasonably infer about the code which called you. If you are handed a null reference, and you can reasonably infer what the caller might have meant by null (maybe an empty collection, for example?) then you should definitely just deal with the nulls. However, if you can't reasonably infer what to do with a null, or what the caller means by null (for example, the calling code is telling you to open a file and gives the location as null), you should throw an ArgumentNullException.
If you maintain proper coding practices like this at every "gateway" point - the logical bounds of functionality in your code - NullReferenceExceptions should be much rarer.
I tend to be dubious of code with lots of NULLs, and try to refactor them away where possible with exceptions, empty collections, Java Optionals, and so on.
The "Introduce Null Object" pattern in Martin Fowler's Refactoring (page 260) may also be helpful. A Null Object responds to all the methods a real object would, but in a way that "does the right thing". So rather than always check an Order to see if order.getDiscountPolicy() is NULL, make sure the Order has a NullDiscountPolicy in these cases. This streamlines the control logic.
Null gets my vote. Then again, I'm of the 'fail-fast' mindset.
String.IsNullOrEmpty(...) is very helpful too; it catches either situation: null or empty strings. You could write a similar function for the classes you're passing around.
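For example, a collection-flavoured counterpart might look like this (a sketch; requires System.Linq):
public static class CollectionChecks
{
    public static bool IsNullOrEmpty<T>(IEnumerable<T> source)
    {
        return source == null || !source.Any();
    }
}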
If you are writing code that returns null as an error condition, then don't: generally, you should throw an exception instead - far harder to miss.
If you are consuming code that you fear may return null, then mostly these are boneheaded exceptions: perhaps do some Debug.Assert checks at the caller to sense-check the output during development. You shouldn't really need vast numbers of null checks in your production code, but if some 3rd party library returns lots of nulls unpredictably, then sure: do the checks.
In 4.0, you might want to look at code-contracts; this gives you much better control to say "this argument should never be passed in as null", "this function never returns null", etc - and have the system validate those claims during static analysis (i.e. when you build).
The thing about null is that it doesn't come with meaning. It is merely the absence of an object.
So, if you really mean an empty string/collection/whatever, always return the relevant object and never null. If the language in question allows you to specify that, do so.
In the case where you want to return something that means not a value specifiable with the static type, then you have a number of options. Returning null is one answer, but without a meaning is a little dangerous. Throwing an exception may actually be what you mean. You might want to extend the type with special cases (probably with polymorphism, that is to say the Special Case Pattern (a special case of which is the Null Object Pattern)). You might want to wrap the return value in an type with more meaning. Or you might want to pass in a callback object. There usually are many choices.
I'd say it depends. For a method returning a single object, I'd generally return null. For a method returning a collection, I'd generally return an empty collection (non-null). These are more along the lines of guidelines than rules, though.
If you are serious about wanting to program in a "nullless" environment, consider using extension methods more often, they are immune to NullReferenceExceptions and at least "pretend" that null isn't there anymore:
// Requires System.IO; extension methods must be declared in a static class.
// (Path.GetExtension is used here because new FileInfo("") would throw.)
public static string GetExtension(this string s)
{
    return Path.GetExtension(s ?? "");
}
which can be called as:
// this code will never throw, not even when somePath becomes null
string somePath = GetDataFromElseWhereCanBeNull();
textBoxExtension.Text = somePath.GetExtension();
I know, this is only a convenience, and many people correctly consider it a violation of OO principles (though the "founder" of OO, Bertrand Meyer, considers null evil and banished it completely from his OO design, which is applied in the Eiffel language, but that's another story). EDIT: Dan mentions that Bill Wagner (More Effective C#) considers it bad practice, and he's right. Ever considered the IsNull extension method ;-) ?
To make your code more readable, another hint may be in place: use the null-coalescing operator more often to designate a default when an object is null:
// load settings
WriteSettings(currentUser.Settings ?? new Settings());
// example of some readonly property
public string DisplayName
{
    get
    {
        return (currentUser ?? User.Guest).DisplayName;
    }
}
None of these take away the occasional check for null (and ?? is nothing more than a hidden if-branch). I prefer as little null in my code as possible, simply because I believe it makes the code more readable. When my code gets cluttered with if-statements for null, I know there's something wrong in the design and I refactor. I suggest anybody do the same, but I know that opinions vary wildly on the matter.
(Update) Comparison with exceptions
Not mentioned in the discussion so far is the similarity with exception handling. When you find yourself ubiquitously ignoring null whenever you consider it's in your way, it is basically the same as writing:
try
{
    //...code here...
}
catch (Exception) {}
which has the effect of removing any trace of the exceptions only to find it raises unrelated exceptions much later in the code. Though I consider it good to avoid using null, as mentioned before in this thread, having null for exceptional cases is good. Just don't hide them in null-ignore-blocks, it will end up having the same effect as the catch-all-exceptions blocks.
As for the exception protagonists: those arguments usually stem from transactional programming and strong exception-safety guarantees, or from blindly applied guidelines. In code of any decent complexity - async workflows, I/O and especially networking code - exceptions are simply inappropriate. That's why you see Google-style docs on the matter in C++, and why all good async code "doesn't enforce them" (think of your favourite managed pools as well).
There is more to it, and while it might look like a simplification, it really is that simple. For one, you will get a lot of exceptions in something that wasn't designed for heavy exception use... anyway, I digress; read up on this from the world's top library designers. The usual place is Boost (just don't mix it up with the other camp in Boost that loves exceptions, because they had to write music software :-).
In your instance, and this is not Fowler's expertise, an efficient "empty object" idiom is only possible in C++ due to the available casting mechanisms (perhaps, but certainly not always, by means of dominance). On the other hand, in your null type you are capable of throwing exceptions and doing whatever you want while preserving the clean call site and structure of the code.
In C# your choice can be a single instance of a type that is either good or malformed; as such it is capable of throwing exceptions or simply running as-is. So it might or might not violate other contracts (up to you what you think is better, depending on the quality of the code you're facing).
In the end, it does clean up call sites, but don't forget you will face a clash with many libraries (especially returns from containers/dictionaries - end iterators spring to mind - and any other code "interfacing" with the outside world). Plus, null-as-value checks compile to extremely optimised machine code, something to keep in mind; but I will agree any day that wild pointer usage without understanding constness, references and more is going to lead to different kinds of mutability, aliasing and performance problems.
To add, there is no silver bullet: crashing on a null reference, using a null reference in managed space, or throwing and not handling an exception are all the same problem, despite what the managed and exception worlds will try to sell you. Any decent environment offers protection from those (heck, you can install any filter on any OS you want; what else do you think VMs do), and there are so many other attack vectors that this one has been overhammered to pieces. Enter x86 verification from Google yet again, their own way of doing much faster and better "IL", "dynamic"-friendly code, etc.
Go with your instinct on this, weigh the pros and cons, and localise the effects. In the future your compiler will optimise all that checking anyway, and far more efficiently than any runtime or compile-time human method (but not as easily for cross-module interaction).
I try to avoid returning null from a method wherever possible. There are generally two kinds of situations - when null result would be legal, and when it should never happen.
In the first case, when no result is legal, there are several solutions available to avoid null results and null checks that are associated with them: Null Object pattern and Special Case pattern are there to return substitute objects that do nothing, or do some specific thing under specific circumstances.
If it is legal to return no object, but still there are no suitable substitutes in terms of Null Object or Special Case, then I typically use the Option functional type - I can then return an empty option when there is no legal result. It is then up to the client to see what is the best way to deal with empty option.
Finally, if it is not legal to have any object returned from a method, simply because the method cannot produce its result if something is missing, then I choose to throw an exception and cut further execution.
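C# has no built-in Option type, so here is a minimal illustrative version of the idea described above (a sketch, not a library type):
public struct Option<T>
{
    private readonly bool hasValue;
    private readonly T value;

    private Option(T value)
    {
        this.hasValue = true;
        this.value = value;
    }

    public static Option<T> Some(T value) { return new Option<T>(value); }

    // default(Option<T>) is the empty option: hasValue is false.
    public static Option<T> None { get { return default(Option<T>); } }

    public bool HasValue { get { return hasValue; } }

    public T Value
    {
        get
        {
            if (!hasValue) throw new InvalidOperationException("Option is empty.");
            return value;
        }
    }
}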
How are empty objects better than null objects? You're just renaming the symptom. The problem is that the contracts for your functions are too loosely defined "this function might return something useful, or it might return a dummy value" (where the dummy value might be null, an "empty object", or a magic constant like -1.) But no matter how you express this dummy value, callers still have to check for it before they use the return value.
If you want to clean up your code, the solution should be to narrow down the function so that it doesn't return a dummy value in the first place.
If you have a function which might return a value, or might return nothing, then pointers are a common (and valid) way to express this. But often, your code can be refactored so that this uncertainty is removed. If you can guarantee that a function returns something meaningful, then callers can rely on it returning something meaningful, and then they don't have to check the return value.
You can't always return an empty object, because 'empty' is not always defined. For example what does it mean for an int, float or bool to be empty?
Returning a NULL pointer is not necessarily a bad practice, but I think it's a better practice to return a (const) reference (where it makes sense to do so of course).
And recently I've often used a Fallible class:
Fallible<std::string> theName = obj.getName();
if (theName)
{
// ...
}
There are various implementations available for such a class (check Google Code Search), I also created my own.

How much null checking is enough?

What are some guidelines for when it is not necessary to check for a null?
A lot of the inherited code I've been working on as of late has null-checks ad nauseam. Null checks on trivial functions, null checks on API calls that state non-null returns, etc. In some cases, the null-checks are reasonable, but in many places a null is not a reasonable expectation.
I've heard a number of arguments ranging from "You can't trust other code" to "ALWAYS program defensively" to "Until the language guarantees me a non-null value, I'm always gonna check." I certainly agree with many of those principles up to a point, but I've found excessive null-checking causes other problems that usually violate those tenets. Is the tenacious null checking really worth it?
Frequently, I've observed that code with excessive null checking is actually of poorer quality, not higher quality. Much of the code seems to be so focused on null-checks that the developer has lost sight of other important qualities, such as readability, correctness, or exception handling. In particular, I see a lot of code ignore the std::bad_alloc exception, but do a null-check on the result of new.
In C++, I understand this to some extent due to the unpredictable behavior of dereferencing a null pointer; null dereference is handled more gracefully in Java, C#, Python, etc. Have I just seen poor-examples of vigilant null-checking or is there really something to this?
This question is intended to be language agnostic, though I am mainly interested in C++, Java, and C#.
Some examples of null-checking that I've seen that seem to be excessive include the following:
This example seems to be accounting for non-standard compilers, as the C++ spec says a failed new throws an exception. Unless you are explicitly supporting non-compliant compilers, does this make sense? Does this make any sense in a managed language like Java or C# (or even C++/CLR)?
try {
    MyObject* obj = new MyObject();
    if (obj != NULL) {
        //do something
    } else {
        //??? most code I see has log-it and move on
        //or it repeats what's in the exception handler
    }
} catch (const std::bad_alloc&) {
    //Do something? normally--this code is wrong as it allocates
    //more memory and will likely fail, such as writing to a log file.
}
Another example is when working on internal code. Particularly, if it's a small team who can define their own development practices, this seems unnecessary. On some projects or legacy code, trusting documentation may not be reasonable... but for new code that you or your team controls, is this really necessary?
If a method, which you can see and can update (or can yell at the developer who is responsible) has a contract, is it still necessary to check for nulls?
//X is non-negative.
//Returns an object or throws exception.
MyObject* create(int x) {
    if (x < 0) throw std::invalid_argument("x"); // a bare "throw;" outside a catch would call terminate()
    return new MyObject();
}
try {
    MyObject* x = create(unknownVar);
    if (x != NULL) {
        //is this null check really necessary?
    }
} catch (...) {
    //do something
}
When developing a private or otherwise internal function, is it really necessary to explicitly handle a null when the contract calls for non-null values only? Why would a null-check be preferable to an assert?
(obviously, on your public API, null-checks are vital as it's considered impolite to yell at your users for incorrectly using the API)
//Internal use only--non-public, not part of public API
//input must be non-null.
//returns non-negative value, or -1 if failed
int ParseType(String input) {
    if (input == null) return -1;
    //do something magic
    return value;
}
Compared to:
//Internal use only--non-public, not part of public API
//input must be non-null.
//returns non-negative value
int ParseType(String input) {
    assert input != null : "Input must be non-null.";
    //do something magic
    return value;
}
One thing to remember is that the code you write today, while the team may be small and the documentation good, will turn into legacy code that someone else will have to maintain. I use the following rules:
If I'm writing a public API that will be exposed to others, then I will do null checks on all reference parameters.
If I'm writing an internal component to my application, I write null checks when I need to do something special when a null exists, or when I want to make it very clear. Otherwise I don't mind getting the null reference exception since that is also fairly clear what is going on.
When working with return data from other people's frameworks, I only check for null when it is possible and valid for a null to be returned. If their contract says it doesn't return nulls, I won't do the check.
First, note that this is a special case of contract-checking: you're writing code that does nothing other than validate at runtime that a documented contract is met. Failure means that some code somewhere is faulty.
I'm always slightly dubious about implementing special cases of a more generally useful concept. Contract checking is useful because it catches programming errors the first time they cross an API boundary. What's so special about nulls that means they're the only part of the contract you care to check? Still,
On the subject of input validation:
null is special in Java: a lot of Java APIs are written such that null is the only invalid value that it's even possible to pass into a given method call. In such cases a null check "fully validates" the input, so the full argument in favour of contract checking applies.
In C++, on the other hand, NULL is only one of nearly 2^32 (2^64 on newer architectures) invalid values that a pointer parameter could take, since almost all addresses are not of objects of the correct type. You can't "fully validate" your input unless you have a list somewhere of all objects of that type.
The question then becomes, is NULL a sufficiently common invalid input to get special treatment that (foo *)(-1) doesn't get?
Unlike Java, fields don't get auto-initialized to NULL, so a garbage uninitialized value is just as plausible as NULL. But sometimes C++ objects have pointer members which are explicitly NULL-inited, meaning "I don't have one yet". If your caller does this, then there is a significant class of programming errors which can be diagnosed by a NULL check. An exception may be easier for them to debug than a page fault in a library they don't have the source for. So if you don't mind the code bloat, it might be helpful. But it's your caller you should be thinking of, not yourself - this isn't defensive coding, because it only 'defends' against NULL, not against (foo *)(-1).
If NULL isn't a valid input, you could consider taking the parameter by reference rather than pointer, but a lot of coding styles disapprove of non-const reference parameters. And if the caller passes you *fooptr, where fooptr is NULL, then it has done nobody any good anyway. What you're trying to do is squeeze a bit more documentation into the function signature, in the hope that your caller is more likely to think "hmm, might fooptr be null here?" when they have to explicitly dereference it, than if they just pass it to you as a pointer. It only goes so far, but as far as it goes it might help.
I don't know C#, but I understand that it's like Java in that references are guaranteed to have valid values (in safe code, at least), but unlike Java in that not all types have a NULL value. So I'd guess that null checks there are rarely worth it: if you're in safe code then don't use a nullable type unless null is a valid input, and if you're in unsafe code then the same reasoning applies as in C++.
On the subject of output validation:
A similar issue arises: in Java you can "fully validate" the output by knowing its type, and that the value isn't null. In C++, you can't "fully validate" the output with a NULL check - for all you know the function returned a pointer to an object on its own stack which has just been unwound. But if NULL is a common invalid return due to the constructs typically used by the author of the callee code, then checking it will help.
In all cases:
Use assertions rather than "real code" to check contracts where possible - once your app is working, you probably don't want the code bloat of every callee checking all its inputs, and every caller checking its return values.
If you're writing code which is portable to non-standard C++ implementations, then instead of the code in the question (which checks for null and also catches the exception), I'd probably have a function like this:
template<typename T>
static inline void nullcheck(T *ptr) {
#if PLATFORM_TRAITS_NEW_RETURNS_NULL
    if (ptr == NULL) throw std::bad_alloc();
#endif
}
Then as one of the list of things you do when porting to a new system, you define PLATFORM_TRAITS_NEW_RETURNS_NULL (and maybe some other PLATFORM_TRAITS) correctly. Obviously you can write a header which does this for all the compilers you know about. If someone takes your code and compiles it on a non-standard C++ implementation that you know nothing about, they're fundamentally on their own for bigger reasons than this, so they'll have to do it themselves.
If you write the code and its contract, you are responsible for using it in terms of its contract and ensuring the contract is correct. If you say "returns a non-null" x, then the caller should not check for null. If a null pointer exception then occurs with that reference/pointer, it is your contract that is incorrect.
Null checking should only go to the extreme when using a library that is untrusted, or does not have a proper contract. If it is your development team's code, stress that the contracts must not be broken, and track down the person who uses the contract incorrectly when bugs occur.
Part of this depends on how the code is used -- if it is a method available only within a project vs. a public API, for example. API error checking requires something stronger than an assertion.
So while this is fine within a project where it's supported with unit tests and stuff like that:
internal void DoThis(Something thing)
{
    Debug.Assert(thing != null, "Arg [thing] cannot be null.");
    //...
}
in a method where you don't have control over who calls it, something like this may be better:
public void DoThis(Something thing)
{
    if (thing == null)
    {
        throw new ArgumentException("Arg [thing] cannot be null.");
    }
    //...
}
It depends on the situation. The rest of my answer assumes C++.
I never test the return value of new, since all the implementations I use throw bad_alloc on failure. If I see a legacy test for new returning null in any code I'm working on, I cut it out and don't bother to replace it with anything.
Unless small-minded coding standards prohibit it, I assert documented preconditions. Broken code which violates a published contract needs to fail immediately and dramatically.
If the null arises from a runtime failure which isn't due to broken code, I throw. fopen failure and malloc failure (though I rarely if ever use them in C++) would fall into this category.
I don't attempt to recover from allocation failure. bad_alloc gets caught in main().
If the null test is for an object which is a collaborator of my class, I rewrite the code to take it by reference. If the collaborator really might not exist, I use the Null Object design pattern to create a placeholder that fails in well-defined ways.
NULL checking in general is evil, as it adds a small negative to the code's testability. With NULL checks everywhere you can't use the "pass null" technique, and it will bite you when unit testing. It's better to have a unit test for the method than a null check.
Check out a decent presentation on that issue, and on unit testing in general, by Misko Hevery at http://www.youtube.com/watch?v=wEhu57pih5w&feature=channel
Older versions of Microsoft C++ (and probably others) did not throw an exception for failed allocations via new, but returned NULL. Code that had to run in both standard-conforming and older versions would have the redundant checking that you point out in your first example.
It would be cleaner to make all failed allocations follow the same code path:
if (obj == NULL)
    throw std::bad_alloc();
It's widely known that there are procedure-oriented people (focus on doing things the right way) and results-oriented people (get the right answer). Most of us lie somewhere in the middle. Looks like you've found an outlier for procedure-oriented. These people would say "anything's possible unless you understand things perfectly; so prepare for anything." For them, what you see is done properly. For them if you change it, they'll worry because the ducks aren't all lined up.
When working on someone else's code, I try to make sure I know two things.
1. What the programmer intended
2. Why they wrote the code the way they did
For following up on Type A programmers, maybe this helps.
So "How much is enough" ends up being a social question as much as a technical question - there's no agreed-upon way to measure it.
(It drives me nuts too.)
Personally I think null testing is unnecessary in the great majority of cases. If new or malloc fails you have bigger issues, and the chance of recovering is just about nil unless you're writing a memory checker! Null testing also hides bugs a lot in the development phases, since the "null" clauses are frequently just empty and do nothing.
When you can specify which compiler is being used, for system functions such as "new" checking for null is a bug in the code. It means that you will be duplicating the error handling code. Duplicate code is often a source of bugs because often one gets changed and the other doesn't. If you can not specify the compiler or compiler versions, you should be more defensive.
As for internal functions, you should specify the contract and make sure that contract is enforced via unit tests. We had a problem in our code a while back where we either threw an exception or returned null in case of a missing object from our database. This just made things confusing for the caller of the API, so we went through and made it consistent throughout the entire code base and removed the duplicate checks.
The important thing (IMHO) is to not have duplicate error logic where one branch will never be invoked. If you can never invoke code, then you can't test it, and you will never know if it is broken or not.
I'd say it depends a little on your language, but I use ReSharper with C# and it basically goes out of its way to tell me "this reference could be null", in which case I add a check; if it tells me "this will always be true" for "if (null != oMyThing && ....)" then I listen to it and don't test for null.
Whether to check for null or not greatly depends on the circumstances.
For example in our shop we check parameters to methods we create for null inside the method. The simple reason is that as the original programmer I have a good idea of exactly what the method should do. I understand the context even if the documentation and requirements are incomplete or less than satisfactory. A later programmer tasked with maintenance may not understand the context and may assume, wrongly, that passing null is harmless. If I know null would be harmful and I can anticipate that someone may pass null, I should take the simple step of making sure that the method reacts in a graceful way.
public MyObject MyMethod(object foo)
{
    if (foo == null)
    {
        throw new ArgumentNullException("foo");
    }
    // do whatever if foo was non-null
}
I only check for NULL when I know what to do when I see NULL. "Know what to do" here means "know how to avoid a crash" or "know what to tell the user besides the location of the crash". For example, if malloc() returns NULL, I usually have no option but to abort the program. On the other hand, if fopen() returns NULL, I can let the user know the file name that could not be opened, and maybe the errno. And if find() returns end(), I usually know how to continue without crashing.
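The same principle in C# terms might look like this (File.OpenRead and FileNotFoundException.FileName are real BCL members; the handling policy is just a sketch):
try
{
    using (var stream = File.OpenRead(path)) // path: some user-supplied file name
    {
        // ... read the file ...
    }
}
catch (FileNotFoundException ex)
{
    // We know what to tell the user here: which file was missing.
    Console.Error.WriteLine("Could not open: " + ex.FileName);
}
// Note: nothing like OutOfMemoryException is caught here, because
// there is usually nothing sensible to do about it but abort.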
Lower-level code should check usage by higher-level code. Usually this means checking arguments, but it can mean checking return values from upcalls. Upcall arguments need not be checked.
The aim is to catch bugs in immediate and obvious ways, as well as documenting the contract in code that does not lie.
I don't think it's bad code. A fair amount of Windows/Linux API calls return NULL on failure of some sort. So, of course, I check for failure in the manner the API specifies. Usually I wind up passing control flow to an error module of some fashion instead of duplicating error-handling code.
If I receive a pointer that is not guaranteed by the language to be non-null, and I am going to de-reference it in a way that null will break, or pass it out of my function where I said I wouldn't produce NULLs, I check for NULL.
It is not just about NULLs, a function should check pre- and post-conditions if possible.
It doesn't matter at all if the contract of the function that gave me the pointer says it'll never produce nulls. We all make bugs. There's a good rule that a program shall fail early and often, so instead of passing the bug to another module and having it fail there, I'll fail in place. It makes things so much easier to debug when testing. Also, in critical systems, it makes it easier to keep the system sane.
Also, if an exception escapes main(), the stack may not be unwound, preventing destructors from running at all (see the C++ standard on terminate()). Which may be serious. So leaving bad_alloc unhandled can be more dangerous than it seems.
Fail with assert vs. fail with a run time error is quite a different topic.
Checking for NULL after new(), when the standard new() behavior has not been altered to return NULL instead of throwing, seems obsolete.
There's another problem, which is that even if malloc returned a valid pointer, it doesn't yet mean you have allocated memory and can use it. But that is another story.
My first problem with this is that it leads to code which is littered with null checks and the like. It hurts readability, and I'd even go as far as to say that it hurts maintainability, because it really is easy to forget a null check if you're writing a piece of code where a certain reference really should never be null. And you just know that the null checks will be missing in some places. Which actually makes debugging harder than it needs to be. Had the original exception not been caught and replaced with a faulty return value, then we would've gotten a valuable exception object with an informative stack trace. What does a missing null check give you? A NullReferenceException in a piece of code that makes you go: wtf? this reference should never be null!
So then you need to start figuring out how the code was called, and why the reference could possibly be null. This can take a lot of time and really hurts the efficiency of your debugging efforts. Eventually you’ll figure out the real problem, but odds are that it was hidden pretty deeply and you spent a lot more time searching for it than you should have.
Another problem with null checks all over the place is that some developers don’t really take the time to properly think about the real problem when they get a NullReferenceException. I’ve actually seen quite a few developers just add a null check above the code where the NullReferenceException occurred. Great, the exception no longer occurs! Hurray! We can go home now! Umm… how bout ‘no you can’t and you deserve an elbow to the face’? The real bug might not cause an exception anymore, but now you probably have missing or faulty behavior… and no exception! Which is even more painful and takes even more time to debug.
At first, this seemed like a strange question: null checks are great and a valuable tool. Checking that new returns null is definitely silly. I'm just going to ignore the fact that there are languages that allow that. I'm sure there are valid reasons, but I really don't think I can handle living in that reality :) All kidding aside, it seems like you should at least have to specify that the new should return null when there isn't enough memory.
Anyway, checking for null where appropriate leads to cleaner code. I'd go so far as to say that never assigning function parameters default values is the next logical step. To go even further, returning empty arrays, etc. where appropriate leads to even cleaner code. It is nice to not have to worry about getting nulls except where they are logically meaningful. Nulls as error values are better avoided.
Using asserts is a really great idea. Especially if it gives you the option of turning them off at runtime. Plus, it is a more explicitly contractual style :)

Throw/do-not-throw an exception based on a parameter - why is this not a good idea?

I was digging around in MSDN and found this article which had one interesting bit of advice: Do not have public members that can either throw or not throw exceptions based on some option.
For example:
Uri ParseUri(string uriValue, bool throwOnError)
Now of course I can see that in 99% of cases this would be horrible, but is its occasional use justified?
One case I have seen it used is with an "AllowEmpty" parameter when accessing data in the database or a configuration file. For example:
object LoadConfigSetting(string key, bool allowEmpty);
In this case, the alternative would be to return null. But then the calling code would be littered with null reference checks. (And the method would also preclude the ability to actually allow null as a specifically configurable value, if you were so inclined.)
What are your thoughts? Why would this be a big problem?
I think it's definitely a bad idea to have a throw/no-throw decision based on a boolean, mainly because it requires developers looking at a piece of code to have a functional knowledge of the API to determine what the boolean means. This is bad on its own, but when it changes the underlying error handling it can make it very easy for developers to make mistakes while reading code.
It would be much better and more readable to have 2 APIs in this case.
Uri ParseUriOrThrow(string value);
bool TryParseUri(string value, out Uri uri);
In this case it's 100% clear what these APIs do.
Article on why booleans are bad as parameters: http://blogs.msdn.com/jaredpar/archive/2007/01/23/boolean-parameters.aspx
It's usually best to choose one error handling mechanism and stick with it consistently. Allowing this sort of flip-flop code can't really improve the life of developers.
In the above example, what happens if parsing fails and throwOnError is false? Now the user has to guess whether NULL is going to be returned, or god knows...
True there's an ongoing debate between exceptions and return values as the better error handling method, but I'm pretty certain there's a consensus about being consistent and sticking with whatever choice you make. The API can't surprise its users and error handling should be part of the interface, and be as clearly defined as the interface.
It's kind of nasty from a readability standpoint. Developers tend to expect every method to throw an exception, and if they want to ignore the exception, they'll catch it themselves. With the "boolean flag" approach, every single method needs to implement this exception-inhibiting semantic.
However, I think the MSDN article is strictly referring to 'throwOnError' flags. In these cases either the error is ignored inside the method itself (bad, as it's hidden) or some kind of null/error object is returned (bad, because you're not using exceptions to handle the error, which is inconsistent and itself error-prone).
Whereas your example seems fine to me. An exception indicates a failure of the method to perform its duty - there is no return value. However the 'allowEmpty' flag changes the semantics of the method - so what would have been an exception ('Empty value') is now expected and legal. Plus, if you had thrown an exception, you wouldn't easily be able to return the config data. So it seems OK in this case.
In any public API it is really a bad idea to have two ways to check for a faulty condition because then it becomes non-obvious what will happen if the error occurs. Just by looking at the code will not help. You have to understand the semantics of the flag parameter (and nothing prevents it from being an expression).
If checking for null is not an option, and if I need to recover from this specific failure, I prefer to create a specific exception so that I can catch it later and handle it appropriately. In any other case I throw a general exception.
Another example in line with this could be the set of TryParse methods on some of the value types:
bool DateTime.TryParse(string text, out DateTime result)
Having a dontThrowException parameter defeats the whole point of exceptions (in any language). If the calling code wants to have:
public static void Main()
{
    FileStream myFile = File.Open("NonExistent.txt", FileMode.Open, FileAccess.Read);
}
they're welcome to (C# doesn't even have checked exceptions). In Java the same thing would be accomplished with:
public static void main(String[] args) throws FileNotFoundException
{
    FileInputStream fs = new FileInputStream("NonExistent.txt");
}
Either way, it's the caller's job to decide how to handle (or not) the exception, not the callee's.
In the linked article there is a note that exceptions should not be used for flow of control - which seems to be implied in the example questions. Exceptions should reflect method-level failure. To have a signature that makes throwing an error optional suggests the design is not thought out.
Jeffrey Richter's book CLR via C# points out: "you should throw an exception when the method cannot complete its task as indicated by its name".
His book also pointed out a very common error. People tend to write code to catch everything (his words "A ubiquitous mistake of developers who have not been properly trained on the proper use of exceptions tend to use catch blocks too often and improperly. When you catch an exception, you're stating that you expected this exception, you understand why it occurred, and you know how to deal with it.")
That has made me try to code for exceptions that I can expect and can handle in my logic otherwise it should be an error.
Validate your arguments and prevent the exceptions, and only catch what you can handle.
I would aver that it's often useful to have a parameter which indicates whether a failure should cause an exception or simply return an error indication, since such a parameter can easily be passed from an outer routine to an inner one. Consider something like:
Byte[] ReadPacket(bool DontThrowIfNone) // Documented as returning null if none
{
int len = ReadByte(DontThrowIfNone); // Documented as returning -1 if nothing
if (len
If something like a TimeoutException while reading the data should cause an exception, that exception should be thrown within ReadByte() or ReadMultiBytesbytes(). If, however, such a lack of data should be considered normal, then the ReadByte() or ReadMultiBytesbytes() routine should not throw an exception. If one simply used the do/try pattern, the ReadPacket and TryReadPacket routines would need to have almost identical code, but with one using Read* methods and the other using TryRead* methods. Icky.
It may be better to use an enumeration rather than a boolean.
