c#: warning if results discarded

I've had several time-wasting bugs caused by my own mistakes where the result of an expression is not used and so is silently discarded, something like this
x.something(...)
instead of the intended
var y = x.something(...)
This is trivial to catch and I want the compiler to do so. If I really want to throw away the result, I'd be happy to cast to void or use a discard (_ = something()). Is there any way of having it catch such silly mistakes? If so, are there other checks I can enable as well that may prove useful?
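For reference, here is a minimal sketch of the three forms involved (silent discard, assignment, explicit discard). The analyzer rule IDs in the comments (IDE0058 "expression value is never used" and CA1806 "do not ignore method results") come from the .NET SDK analyzers rather than from the compiler itself, so treat them as an assumption to verify against your SDK version; they can be raised to warnings or errors in .editorconfig, e.g. dotnet_diagnostic.IDE0058.severity = warning.

using System;

class Example
{
    static int Compute() => 42;

    static void Main()
    {
        Compute();             // result silently discarded: the mistake in question (IDE0058/CA1806 territory)
        int y = Compute();     // intended usage
        _ = Compute();         // explicit discard: "I really do want to ignore this result"
        Console.WriteLine(y);
    }
}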

Related

Use of unassigned local variable on finally block

When could i in this example be unassigned?
int i;
try
{
i = 2;
}
catch
{
i = 3;
}
finally
{
string a = i.ToString();
}
You could get a ThreadAbortException before i = 2 runs, for example. Anyway, the C# compiler is not exceptionally smart, so it's easy to fool with contrived examples like the one above. It doesn't have to recognize every situation, and if it's not sure the variable is assigned, even if you are sure, it will complain.
EDIT: I was a bit quick with my first assumption, so to refine it, here's what I think. Code is guaranteed to run in order, or, if an exception happens, to jump to the handlers. So i = 2 may not run if an exception happens before it. I still claim that a ThreadAbortException is one of the few ways this can happen even if you have no code which could produce exceptions. Generally, if you have any number of different exception handlers, the compiler cannot know in advance which one will run, so it doesn't try to make any assumptions about that. It could know that if 1) there is only one catch block and 2) it is typeless, then, and only then, that one catch block is guaranteed to run. Or, if there were multiple catch handlers and you assigned your variable in every one of them, it could also work, but I guess the compiler doesn't care about that either. However simple it may seem, it is a special case, and the C# compiler team has a tendency to ignore those special cases.
It's highly unlikely it will happen with the example you have posted. However, the compiler is going to be 'helpful' in this situation. As Hadas said, just initialize i to 0.
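A minimal sketch of that fix: once i has an initial value, the definite-assignment analysis is satisfied no matter which blocks actually run.

int i = 0;   // initialized up front, so it is definitely assigned by the time the finally block runs
try
{
    i = 2;
}
catch
{
    i = 3;
}
finally
{
    string a = i.ToString();   // compiles: no "use of unassigned local variable" error
}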
There are lots of code samples that one can write in which it is provably the case that a variable is assigned but that the compiler simply cannot prove that it is definitely assigned.
Just consider this much simpler case:
int i;
if ((bool)(object)true)
i = 0;
Console.WriteLine(i);
It is provably impossible for that case to ever access an unassigned i as well, yet it won't compile.
It is also provably impossible for the compiler to solve this problem in the general case. There are cases where it can prove a variable is certainly not definitely assigned, and there are cases where it can prove it definitely is assigned, but there are also cases where it just doesn't know either way. In those cases it chooses to fail, because it sees a few false positive errors as less harmful than false negatives.
To speak more about your specific case: you're saying that if a variable is assigned in both the try and catch blocks, it is definitely assigned. While that may be true of your specific code, it's certainly not true in the general case. You need to consider exceptions that aren't handled by the catch block (even in your case, where no exception type is specified, something like a stack overflow cannot be caught), and you need to consider the catch block itself throwing an exception (again, it won't happen in your case, but the compiler would need to prove that to compile the code).
I think it would help to look at another type besides a primitive. Consider:
int i;
MyClass ed = new MyClass();
try
{
int newI = ed.getIntFromFunctionThatWillThrow();
i = newI;
}
catch (Exception e)
{
i = 3;
// do some abortion code.
}
finally
{
string a = i.ToString();
...
}
So in this block of code, there is no guarantee about which branch of execution you get. You have (at least) two branches to consider: the try-finally branch and the try-catch branch. Since the function getIntFromFunctionThatWillThrow is going to throw (see what I did there?), i will be left unassigned, even though I try to use it later. That isn't something that can be recognized before runtime; just as I would have no idea which path the code will take without inside information about the member on ed, the compiler has no idea what value i will have. It only knows that it exists and is of type int.
If this is an issue, a fix is to give i an initial value, which will suppress the error.
I hope this helps somewhat!

Compile time constants and reference types

Ok, consider the following code:
const bool trueOrFalse = false;
const object constObject = null;
void Foo()
{
if (trueOrFalse)
{
int hash = constObject.GetHashCode();
}
if (constObject != null)
{
int hash = constObject.GetHashCode();
}
}
trueOrFalse is a compile-time constant, and as such the compiler correctly warns that int hash = constObject.GetHashCode(); is not reachable.
Also, constObject is a compile-time constant, and as such the compiler again correctly warns that int hash = constObject.GetHashCode(); is not reachable, since constObject != null will never be true.
So why doesn't the compiler figure out that:
if (true)
{
int hash = constObject.GetHashCode();
}
is 100% sure to be a runtime exception and thus issue a compile-time error? I know this is probably a stupid corner case, but the compiler seems pretty smart at reasoning about compile-time constant value types, and as such I was expecting it could also figure out this small corner case with reference types.
UPDATE: This question was the subject of my blog on July 17th 2012. Thanks for the great question!
Why doesn't the compiler figure out that my code is 100% sure to be a runtime exception and thus issue a compile-time error?
Why should the compiler make code that is guaranteed to throw into a compile-time error? Wouldn't that make:
int M()
{
throw new NotImplementedException();
}
into a compile-time error? But that's exactly the opposite of what you want it to be; you want this to be a runtime error so that the incomplete code compiles.
Now, you might say, well, dereferencing null is clearly undesirable always, whereas a "not implemented" exception is clearly desirable. So could the compiler detect just this specific situation of there being a null ref exception guaranteed to happen, and give an error?
Sure, it could. We'd just have to spend the budget on implementing a data flow analyzer that tracks when a given expression is known to be always null, and then make it a compile time error (or warning) to dereference that expression.
The questions to answer then are:
How much does that feature cost?
How much benefit does the user accrue?
Is there any other possible feature that has a better cost-to-benefit ratio, and provides more value to the user?
The answer to the first question is "rather a lot" -- code flow analyzers are expensive to design and build. The answer to the second question is "not very much" -- the number of situations in which you can prove that null is going to be dereferenced are very small. The answer to the third question has, over the last twelve years, always been "yes".
Therefore, no such feature.
Now, you might say, well, C# does have some limited ability to detect when an expression is always/never null; the nullable arithmetic analyzer uses this analysis to generate more optimal nullable arithmetic code (*), and clearly the flow analyzer uses it to determine reachability. So why not just use the already existing nullability and flow analyzer to detect when you've always dereferenced a null constant?
That would be cheap to implement, sure. But the corresponding user benefit is now tiny. How many times in real code do you initialize a constant to null, and then dereference it? It seems unlikely that anyone would actually do that.
Moreover: yes, it is always better to detect a bug at compile time instead of run time, because it is cheaper. But the bug here -- a guaranteed dereference of null -- will be caught the first time the code is tested, and subsequently fixed.
So basically the feature request here is to detect at compile time a very unlikely and obviously wrong situation that will always be immediately caught and fixed the first time the code is run anyway. It is therefore not a very good candidate for spending budget on; we have lots of higher priorities.
(*) See the long series of articles on how the Roslyn compiler does so which begins at http://ericlippert.com/2012/12/20/nullable-micro-optimizations-part-one/
While unreachable code is useless and does not affect your execution, code that throws an exception is executed. So
if (true) { int hash = constObject.GetHashCode();}
is more or less the same as
throw new NullReferenceException();
You might very well want to throw that null reference exception, whereas the unreachable code would just be taking up space if it were compiled.
It also won't warn about the following code that has the same effect:
throw new NullReferenceException();
There's a balance with warnings. Most compiler errors happen when the compiler can't produce anything meaningful from the code.
Some happen with things that affect verifiability, or which cross a threshold of how likely they are to be a bug. For example, the following code:
private void DoNothing(out string str)
{
return;
}
private void UseNothing()
{
string s;
DoNothing(out s);
}
Won't compile, though if it did it would do no harm (the only place DoNothing is called doesn't use the string passed, so the fact that it is never assigned isn't a problem). There's just too high a risk that I'm doing something stupid here to let it go.
Warnings are for things that are almost certainly foolish or at least not what you wanted to happen. Dead code is likely enough to be a bug to make a warning worthwhile, but likely enough to be sensible (e.g. trueOrFalse may change as the application is developed) to make an error inappropriate.
Warnings are meant to be useful, rather than nuisances, so the bar for them is put quite high. There's no exact science, but it was deemed that unreachable code made the cut, and trying to deduce when throwing exceptions wasn't the desired behaviour didn't.
It no doubt helps that the compiler already detects unreachable code (and doesn't compile it) but sees one deliberate throw much like another, no matter how convoluted on the one hand or direct on the other.
Why would you even want that to be a compile-time error? Why would you want code that is guaranteed to throw an exception to be invalid at compile time? What if I, the programmer, want the semantics of my program to be:
static void Main(string[] args) {
throw new NullReferenceException();
}
It's my program. The compiler has no business telling me this isn't valid C# code.

null objects vs. empty objects

[ This is a result of Best Practice: Should functions return null or an empty object? but I'm trying to be very general. ]
In a lot of legacy (um...production) C++ code that I've seen, there is a tendency to write a lot of NULL (or similar) checks to test pointers. Many of these get added near the end of a release cycle when adding a NULL-check provides a quick fix to a crash caused by the pointer dereference--and there isn't a lot of time to investigate.
To combat this, I started to write code that took a (const) reference parameter instead of the (much) more common technique of passing a pointer. No pointer, no desire to check for NULL (ignoring the corner case of actually having a null reference).
In C#, the same C++ "problem" is present: the desire to check every unknown reference against null (ArgumentNullException) and to quickly fix NullReferenceExceptions by adding a null check.
It seems to me, one way to prevent this is to avoid null objects in the first place by using empty objects (String.Empty, EventArgs.Empty) instead. Another would be to throw an exception rather than return null.
I'm just starting to learn F#, but it appears there are far fewer null objects in that environment. So maybe you don't really have to have a lot of null references floating around?
Am I barking up the wrong tree here?
Passing non-null just to avoid a NullReferenceException is trading a straightforward, easy-to-solve problem ("it blows up because it's null") for a much more subtle, hard-to-debug problem ("something several calls down the stack is not behaving as expected because much earlier it got some object which has no meaningful information but isn't null").
NullReferenceException is a wonderful thing! It fails hard, loud, fast, and it's almost always quick and easy to identify and fix. It's my favorite exception, because I know when I see it, my task is only going to take about 2 minutes. Contrast this with a confusing QA or customer report trying to describe strange behavior that has to be reproduced and traced back to the origin. Yuck.
It all comes down to what you, as a method or piece of code, can reasonably infer about the code which called you. If you are handed a null reference, and you can reasonably infer what the caller might have meant by null (maybe an empty collection, for example?) then you should definitely just deal with the nulls. However, if you can't reasonably infer what to do with a null, or what the caller means by null (for example, the calling code is telling you to open a file and gives the location as null), you should throw an ArgumentNullException.
If you maintain proper coding practices like this at every "gateway" point (the logical bounds of functionality in your code), NullReferenceExceptions should be much more rare.
I tend to be dubious of code with lots of NULLs, and try to refactor them away where possible with exceptions, empty collections, Java Optionals, and so on.
The "Introduce Null Object" pattern in Martin Fowler's Refactoring (page 260) may also be helpful. A Null Object responds to all the methods a real object would, but in a way that "does the right thing". So rather than always check an Order to see if order.getDiscountPolicy() is NULL, make sure the Order has a NullDiscountPolicy in these cases. This streamlines the control logic.
Null gets my vote. Then again, I'm of the 'fail-fast' mindset.
String.IsNullOrEmpty(...) is very helpful too; I guess it catches either situation, null or empty strings. You could write a similar function for all the classes you're passing around.
If you are writing code that returns null as an error condition, then don't: generally, you should throw an exception instead - far harder to miss.
If you are consuming code that you fear may return null, then mostly these are boneheaded exceptions: perhaps do some Debug.Assert checks at the caller to sense-check the output during development. You shouldn't really need vast numbers of null checks in your production code, but if some 3rd party library returns lots of nulls unpredictably, then sure: do the checks.
In .NET 4.0, you might want to look at code contracts; these give you much better control to say "this argument should never be passed in as null", "this function never returns null", etc., and have the system validate those claims during static analysis (i.e. when you build).
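For illustration, a minimal sketch using the .NET 4.0 Code Contracts API (System.Diagnostics.Contracts); the repository and Customer types are invented, and enforcement depends on the Contracts tooling (static checker or binary rewriter) being enabled:

using System.Diagnostics.Contracts;

public class Customer
{
    public Customer(string id) { Id = id; }
    public string Id { get; private set; }
}

public class CustomerRepository
{
    public Customer Load(string id)
    {
        Contract.Requires(id != null);                          // "this argument should never be passed in as null"
        Contract.Ensures(Contract.Result<Customer>() != null);  // "this function never returns null"

        return new Customer(id);
    }
}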
The thing about null is that it doesn't come with meaning. It is merely the absence of an object.
So, if you really mean an empty string/collection/whatever, always return the relevant object and never null. If the language in question allows you to specify that, do so.
In the case where you want to return something that means "not a value specifiable with the static type", then you have a number of options. Returning null is one answer, but null without a meaning is a little dangerous. Throwing an exception may actually be what you mean. You might want to extend the type with special cases (probably with polymorphism, that is to say the Special Case pattern, a special case of which is the Null Object pattern). You might want to wrap the return value in a type with more meaning. Or you might want to pass in a callback object. There usually are many choices.
I'd say it depends. For a method returning a single object, I'd generally return null. For a method returning a collection, I'd generally return an empty collection (non-null). These are more along the lines of guidelines than rules, though.
If you are serious about wanting to program in a "nullless" environment, consider using extension methods more often; they are immune to NullReferenceExceptions and at least "pretend" that null isn't there anymore:
public static string GetExtension(this string s)
{
return (new FileInfo(s ?? "")).Extension;
}
which can be called as:
// this code will never throw, not even when somePath becomes null
string somePath = GetDataFromElseWhereCanBeNull();
textBoxExtension.Text = somePath.GetExtension();
I know, this is only a convenience and many people correctly consider it a violation of OO principles (though the "founder" of OO, Bertrand Meyer, considers null evil and banished it completely from his OO design, which is applied in the Eiffel language, but that's another story). EDIT: Dan mentions that Bill Wagner (More Effective C#) considers it bad practice, and he's right. Ever considered the IsNull extension method ;-) ?
To make your code more readable, another hint may be in place: use the null-coalescing operator more often to designate a default when an object is null:
// load settings
WriteSettings(currentUser.Settings ?? new Settings());
// example of some readonly property
public string DisplayName
{
get
{
return (currentUser ?? User.Guest).DisplayName;
}
}
None of these take the occasional check for null away (and ?? is nothing more than a hidden if-branch). I prefer as little null in my code as possible, simply because I believe it makes the code more readable. When my code gets cluttered with if-statements for null, I know there's something wrong in the design and I refactor. I suggest anybody do the same, but I know that opinions vary wildly on the matter.
(Update) Comparison with exceptions
Not mentioned in the discussion so far is the similarity with exception handling. When you find yourself ubiquitously ignoring null whenever you consider it's in your way, it is basically the same as writing:
try
{
//...code here...
}
catch (Exception) {}
which has the effect of removing any trace of the exceptions, only for unrelated exceptions to be raised much later in the code. Though I consider it good to avoid using null, as mentioned before in this thread, having null for exceptional cases is good. Just don't hide them in null-ignore blocks; it will end up having the same effect as the catch-all-exceptions blocks.
For the exception protagonists: exceptions usually stem from transactional programming and strong exception-safety guarantees, or from blindly followed guidelines. In anything of decent complexity, i.e. async workflows, I/O and especially networking code, they are simply inappropriate. That is why you see Google-style docs on the matter in C++, and why all good async code 'doesn't enforce it' (think of your favourite managed pools as well).
There is more to it, and while it might look like a simplification, it really is that simple. For one, you will get a lot of exceptions in something that wasn't designed for heavy exception use... anyway, I digress; read up on this from the world's top library designers. The usual place is Boost (just don't mix it up with the other camp in Boost that loves exceptions, because they had to write music software :-).
In your instance, and this is not Fowler's expertise, an efficient 'empty object' idiom is only possible in C++ because of the available casting mechanisms (perhaps, but certainly not always, by means of dominance). On the other hand, in your null type you are capable of throwing exceptions and doing whatever you want while preserving the clean call site and structure of the code.
In C# your choice can be a single instance of a type that is either good or malformed; as such it is capable of throwing exceptions or simply running as-is. So it might or might not violate other contracts (it's up to you what you think is better, depending on the quality of the code you're facing).
In the end, it does clean up call sites, but don't forget you will face a clash with many libraries (especially returns from containers/Dictionaries, end iterators spring to mind, and any other code 'interfacing' with the outside world). Plus, null-as-value checks are extremely optimised pieces of machine code, something to keep in mind; but I will agree any day that wild pointer usage without understanding constness, references and more will lead to different kinds of mutability, aliasing and performance problems.
To add, there is no silver bullet: crashing on a null reference, using a null reference in managed space, or throwing and not handling an exception are essentially the same problem, despite what the managed and exception worlds will try to sell you. Any decent environment offers protection from those (heck, you can install any filter on any OS you want; what else do you think VMs do), and there are so many other attack vectors that this one has been overhammered to pieces. Enter x86 verification from Google yet again, their own way of doing much faster and better 'IL', 'dynamic'-friendly code, etc.
Go with your instinct on this, weigh the pros and cons, and localise the effects. In the future your compiler will optimise all that checking anyway, and far more efficiently than any runtime or compile-time human method (but not as easily for cross-module interaction).
I try to avoid returning null from a method wherever possible. There are generally two kinds of situations - when null result would be legal, and when it should never happen.
In the first case, when no result is legal, there are several solutions available to avoid null results and null checks that are associated with them: Null Object pattern and Special Case pattern are there to return substitute objects that do nothing, or do some specific thing under specific circumstances.
If it is legal to return no object, but still there are no suitable substitutes in terms of Null Object or Special Case, then I typically use the Option functional type - I can then return an empty option when there is no legal result. It is then up to the client to see what is the best way to deal with empty option.
Finally, if it is not legal to have any object returned from a method, simply because the method cannot produce its result if something is missing, then I choose to throw an exception and cut further execution.
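C# has no built-in Option type, so the following is a minimal hand-rolled sketch of the idea (the names are invented); the point is that the caller is forced to deal with the empty case explicitly instead of stumbling over a null:

public struct Option<T>
{
    private readonly T value;
    private readonly bool hasValue;

    private Option(T value)
    {
        this.value = value;
        this.hasValue = true;
    }

    public static Option<T> Some(T value) => new Option<T>(value);
    public static Option<T> None => default(Option<T>);

    public bool HasValue => hasValue;

    // The client decides how to handle the empty case.
    public T GetValueOrDefault(T fallback) => hasValue ? value : fallback;
}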
How are empty objects better than null objects? You're just renaming the symptom. The problem is that the contracts for your functions are too loosely defined "this function might return something useful, or it might return a dummy value" (where the dummy value might be null, an "empty object", or a magic constant like -1.) But no matter how you express this dummy value, callers still have to check for it before they use the return value.
If you want to clean up your code, the solution should be to narrow down the function so that it doesn't return a dummy value in the first place.
If you have a function which might return a value, or might return nothing, then pointers are a common (and valid) way to express this. But often, your code can be refactored so that this uncertainty is removed. If you can guarantee that a function returns something meaningful, then callers can rely on it returning something meaningful, and then they don't have to check the return value.
You can't always return an empty object, because 'empty' is not always defined. For example what does it mean for an int, float or bool to be empty?
Returning a NULL pointer is not necessarily a bad practice, but I think it's a better practice to return a (const) reference (where it makes sense to do so of course).
And recently I've often used a Fallible class:
Fallible<std::string> theName = obj.getName();
if (theName)
{
// ...
}
There are various implementations available for such a class (check Google Code Search), I also created my own.

Which is preferable and less expensive: class matching vs exception?

Which is less expensive and preferable: put1 or put2?
Map<String, Animal> map = new HashMap<String, Animal>();
void put1(){
for (.....)
if (Animal.class.isAssignableFrom(item[i].getClass()))
map.put(key[i], item[i]);
void put2(){
for (.....)
try{
map.put(key[i], item[i]);}
catch (...){}
Question revision:
The question wasn't that clear, so let me revise it a little. I forgot the casting, so put2 depends on a cast exception failure. isAssignableFrom(), isInstanceOf() and instanceof are functionally similar and therefore incur the same expense; one is a method that includes subclasses, while the second is for exact type matching and the third is the operator version. Both reflective methods and exceptions are expensive operations.
My question is for those who have done some benchmarking in this area - which is less expensive and preferable: instanceof/isassignablefrom vs cast exception?
void put1(){
for (.....)
if (Animal.class.isAssignableFrom(item[i].getClass()))
map.put(key[i], (Animal)item[i]);
void put2(){
for (.....)
try{
map.put(key[i], (Animal)item[i]);}
catch (...){}
Probably you want:
if (item[i] instanceof Animal)
map.put(key[i], (Animal) item[i]);
This is almost certainly much better than calling isAssignableFrom.
Or in C# (since you added the c# tag):
var a = item[i] as Animal;
if (a != null)
map[key[i]] = a;
EDIT: The updated question is which is better: instanceof or cast-and-catch. The functionality is basically the same. The performance difference might not be significant and I would have to measure it; generally throwing an exception is slow, but I don't know about the rest. So I would decide based on style. Say what you mean.
If you always expect item[i] to be an Animal, and you're just being extra careful, cast-and-catch. Otherwise I find it much clearer to use instanceof, because that plainly says what you mean: "if this object is an Animal, put it in the map".
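As a side note, a minimal C# sketch of the same check written with pattern matching (C# 7 and later), which folds the type test and the cast into one step; it assumes item, key and map are declared as in the question (e.g. object[] item, string[] key, Dictionary<string, Animal> map):

if (item[i] is Animal a)
    map[key[i]] = a;   // only stored when the runtime type check succeeds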
I'm confused. If item[i] is not an Animal, then how does map.put(key[i], item[i]) even compile?
That said, the first method says what you're intending to do, although I believe instanceof would be an even better check.
Typically exception handling will be significantly slower: since it is supposed to be used for exceptional (rarely occurring) things, VM makers don't spend much work on speeding it up.
The try/catch version of your code I would consider an abuse of exception handling and would never consider doing it. The fact that you are thinking of doing something like this probably means you have a poor design; items should probably be an Animal[], not something else, in which case you don't need to check at runtime at all. Let the compiler do the work for you.
I agree with a previous answer - this will not compile.
But, in my opinion, whether it is an exception or a check depends on the purpose of the function.
Is item[i] not being an Animal an error/exceptional case? Is it expected to happen rarely? In this case, it should be an exception.
If it is part of the logic - meaning you expect item[i] to be many things - and only if it is an Animal you want to put in a map. In this case, the instanceof check is the right way.
UPDATE:
I'll also add an example (a bit lame):
Which is better:
(1)
if ( aNumber < 100 ) {
processNumber(aNumber);
}
or (2)
try {
processNumber(aNumber); //Throws exception if aNumber >= 100
} catch () {
}
This depends on what the program does. (1) may be used for counting numbers < 100 for any integer input. (2) will be used if processNumber expects a percentage value which cannot be greater than 100.
The difference is, it is an error for program (2) to get aNumber > 100. However, for program (1) aNumber > 100 is valid, but "something" happens only when aNumber is < 100.
PS - This may not be helpful to you at all, and I apologize if this is the case.
Your two alternatives are not really equivalent. Which one to choose depends totally on what your code is supposed to do:
If the item is expected to always be an Animal, then you should use put2 (which will throw if that's not the case...).
If the item may or may not be an Animal, you should use put1 (which checks a condition, not an error...).
Never worry about performance in the first place when you're writing code!

How much null checking is enough?

What are some guidelines for when it is not necessary to check for a null?
A lot of the inherited code I've been working on as of late has null-checks ad nauseam. Null checks on trivial functions, null checks on API calls that state non-null returns, etc. In some cases, the null-checks are reasonable, but in many places a null is not a reasonable expectation.
I've heard a number of arguments ranging from "You can't trust other code" to "ALWAYS program defensively" to "Until the language guarantees me a non-null value, I'm always gonna check." I certainly agree with many of those principles up to a point, but I've found excessive null-checking causes other problems that usually violate those tenets. Is the tenacious null checking really worth it?
Frequently, I've observed code with excessive null checking to actually be of poorer quality, not higher quality. Much of the code seems so focused on null checks that the developer has lost sight of other important qualities, such as readability, correctness, or exception handling. In particular, I see a lot of code ignore the std::bad_alloc exception but do a null check on the result of new.
In C++, I understand this to some extent due to the unpredictable behavior of dereferencing a null pointer; null dereference is handled more gracefully in Java, C#, Python, etc. Have I just seen poor-examples of vigilant null-checking or is there really something to this?
This question is intended to be language agnostic, though I am mainly interested in C++, Java, and C#.
Some examples of null-checking that I've seen that seem to be excessive include the following:
This example seems to be accounting for non-standard compilers, as the C++ spec says a failed new throws an exception. Unless you are explicitly supporting non-compliant compilers, does this make sense? Does it make any sense in a managed language like Java or C# (or even C++/CLI)?
try {
MyObject* obj = new MyObject();
if(obj!=NULL) {
//do something
} else {
//??? most code I see has log-it and move on
//or it repeats what's in the exception handler
}
} catch(std::bad_alloc) {
//Do something? normally--this code is wrong as it allocates
//more memory and will likely fail, such as writing to a log file.
}
Another example is when working on internal code. Particularly, if it's a small team who can define their own development practices, this seems unnecessary. On some projects or legacy code, trusting documentation may not be reasonable... but for new code that you or your team controls, is this really necessary?
If a method, which you can see and can update (or can yell at the developer who is responsible) has a contract, is it still necessary to check for nulls?
//X is non-negative.
//Returns an object or throws exception.
MyObject* create(int x) {
if(x<0) throw;
return new MyObject();
}
try {
MyObject* x = create(unknownVar);
if(x!=null) {
//is this null check really necessary?
}
} catch {
//do something
}
When developing a private or otherwise internal function, is it really necessary to explicitly handle a null when the contract calls for non-null values only? Why would a null-check be preferable to an assert?
(obviously, on your public API, null-checks are vital as it's considered impolite to yell at your users for incorrectly using the API)
//Internal use only--non-public, not part of public API
//input must be non-null.
//returns non-negative value, or -1 if failed
int ParseType(String input) {
if(input==null) return -1;
//do something magic
return value;
}
Compared to:
//Internal use only--non-public, not part of public API
//input must be non-null.
//returns non-negative value
int ParseType(String input) {
assert(input!=null : "Input must be non-null.");
//do something magic
return value;
}
One thing to remember is that the code you write today, while it may be written by a small team with good documentation, will turn into legacy code that someone else will have to maintain. I use the following rules:
If I'm writing a public API that will be exposed to others, then I will do null checks on all reference parameters.
If I'm writing an internal component for my application, I write null checks when I need to do something special when a null exists, or when I want to make it very clear. Otherwise I don't mind getting the null reference exception, since it is also fairly clear what is going on.
When working with return data from other people's frameworks, I only check for null when it is possible and valid to have a null returned. If their contract says it doesn't return nulls, I won't do the check.
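A minimal sketch of the first rule above (null-checking all reference parameters of a public API). The method and its parameters are invented; ArgumentNullException.ThrowIfNull is only available on newer .NET versions, so treat it as an assumption and fall back to the explicit check elsewhere:

using System;

public class CustomerService
{
    public void Register(object customer, string displayName)
    {
        // .NET 6+ shorthand (verify against your target framework):
        ArgumentNullException.ThrowIfNull(customer);

        // Equivalent explicit form that works on older frameworks:
        if (displayName == null)
            throw new ArgumentNullException(nameof(displayName));

        // ... the rest of the method can now rely on non-null arguments ...
    }
}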
First note that this is a special case of contract checking: you're writing code that does nothing other than validate at runtime that a documented contract is met. Failure means that some code somewhere is faulty.
I'm always slightly dubious about implementing special cases of a more generally useful concept. Contract checking is useful because it catches programming errors the first time they cross an API boundary. What's so special about nulls that means they're the only part of the contract you care to check? Still,
On the subject of input validation:
null is special in Java: a lot of Java APIs are written such that null is the only invalid value that it's even possible to pass into a given method call. In such cases a null check "fully validates" the input, so the full argument in favour of contract checking applies.
In C++, on the other hand, NULL is only one of nearly 2^32 (2^64 on newer architectures) invalid values that a pointer parameter could take, since almost all addresses are not of objects of the correct type. You can't "fully validate" your input unless you have a list somewhere of all objects of that type.
The question then becomes, is NULL a sufficiently common invalid input to get special treatment that (foo *)(-1) doesn't get?
Unlike Java, fields don't get auto-initialized to NULL, so a garbage uninitialized value is just as plausible as NULL. But sometimes C++ objects have pointer members which are explicitly NULL-inited, meaning "I don't have one yet". If your caller does this, then there is a significant class of programming errors which can be diagnosed by a NULL check. An exception may be easier for them to debug than a page fault in a library they don't have the source for. So if you don't mind the code bloat, it might be helpful. But it's your caller you should be thinking of, not yourself - this isn't defensive coding, because it only 'defends' against NULL, not against (foo *)(-1).
If NULL isn't a valid input, you could consider taking the parameter by reference rather than pointer, but a lot of coding styles disapprove of non-const reference parameters. And if the caller passes you *fooptr, where fooptr is NULL, then it has done nobody any good anyway. What you're trying to do is squeeze a bit more documentation into the function signature, in the hope that your caller is more likely to think "hmm, might fooptr be null here?" when they have to explicitly dereference it, than if they just pass it to you as a pointer. It only goes so far, but as far as it goes it might help.
I don't know C#, but I understand that it's like Java in that references are guaranteed to have valid values (in safe code, at least), but unlike Java in that not all types have a NULL value. So I'd guess that null checks there are rarely worth it: if you're in safe code then don't use a nullable type unless null is a valid input, and if you're in unsafe code then the same reasoning applies as in C++.
On the subject of output validation:
A similar issue arises: in Java you can "fully validate" the output by knowing its type, and that the value isn't null. In C++, you can't "fully validate" the output with a NULL check - for all you know the function returned a pointer to an object on its own stack which has just been unwound. But if NULL is a common invalid return due to the constructs typically used by the author of the callee code, then checking it will help.
In all cases:
Use assertions rather than "real code" to check contracts where possible - once your app is working, you probably don't want the code bloat of every callee checking all its inputs, and every caller checking its return values.
In the case of writing code which is portable to non-standard C++ implementations, then instead of the code in the question which checks for null and also catches the exception, I'd probably have a function like this:
template<typename T>
static inline void nullcheck(T *ptr) {
#if PLATFORM_TRAITS_NEW_RETURNS_NULL
if (ptr == NULL) throw std::bad_alloc();
#endif
}
Then as one of the list of things you do when porting to a new system, you define PLATFORM_TRAITS_NEW_RETURNS_NULL (and maybe some other PLATFORM_TRAITS) correctly. Obviously you can write a header which does this for all the compilers you know about. If someone takes your code and compiles it on a non-standard C++ implementation that you know nothing about, they're fundamentally on their own for bigger reasons than this, so they'll have to do it themselves.
If you write the code and its contract, you are responsible for using it in terms of its contract and ensuring the contract is correct. If you say "returns a non-null x", then the caller should not check for null. If a null pointer exception then occurs with that reference/pointer, it is your contract that is incorrect.
Null checking should only go to the extreme when using a library that is untrusted, or does not have a proper contract. If it is your development team's code, stress that the contracts must not be broken, and track down the person who uses the contract incorrectly when bugs occur.
Part of this depends on how the code is used -- if it is a method available only within a project vs. a public API, for example. API error checking requires something stronger than an assertion.
So while this is fine within a project where it's supported with unit tests and stuff like that:
internal void DoThis(Something thing)
{
Debug.Assert(thing != null, "Arg [thing] cannot be null.");
//...
}
in a method where you don't have control over who calls it, something like this may be better:
public void DoThis(Something thing)
{
if (thing == null)
{
throw new ArgumentException("Arg [thing] cannot be null.");
}
//...
}
It depends on the situation. The rest of my answer assumes C++.
I never test the return value of new, since all the implementations I use throw bad_alloc on failure. If I see a legacy test for new returning null in any code I'm working on, I cut it out and don't bother to replace it with anything.
Unless small-minded coding standards prohibit it, I assert documented preconditions. Broken code which violates a published contract needs to fail immediately and dramatically.
If the null arises from a runtime failure which isn't due to broken code, I throw. fopen failure and malloc failure (though I rarely if ever use them in C++) would fall into this category.
I don't attempt to recover from allocation failure. bad_alloc gets caught in main().
If the null test is for an object which is a collaborator of my class, I rewrite the code to take it by reference. If the collaborator really might not exist, I use the Null Object design pattern to create a placeholder that fails in well-defined ways.
NULL checking in general is evil, as it adds a small negative token to the code's testability. With NULL checks everywhere you can't use the "pass null" technique, and it will hit you when unit testing. It's better to have a unit test for the method than a null check.
Check out a decent presentation on that issue and unit testing in general by Misko Hevery at http://www.youtube.com/watch?v=wEhu57pih5w&feature=channel
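For example, here is a hedged sketch of the "pass null" technique the answer alludes to (the class names and the xUnit test framework are assumptions): in a test you pass null for a collaborator that the tested code path never touches, which a constructor null check would make impossible.

using Xunit;

public interface IAuditLog { void Write(string message); }

public class OrderCalculator
{
    private readonly IAuditLog audit;   // hypothetical collaborator

    public OrderCalculator(IAuditLog audit)
    {
        // Deliberately no null check, so tests can pass null
        // for a dependency the tested path never uses.
        this.audit = audit;
    }

    public decimal Total(decimal price, int quantity) => price * quantity;

    public void Checkout(decimal total) => audit.Write($"checkout: {total}");
}

public class OrderCalculatorTests
{
    [Fact]
    public void Total_multiplies_price_by_quantity()
    {
        var calculator = new OrderCalculator(null);   // "pass null" for the unused dependency
        Assert.Equal(20m, calculator.Total(10m, 2));
    }
}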
Older versions of Microsoft C++ (and probably others) did not throw an exception for failed allocations via new, but returned NULL. Code that had to run in both standard-conforming and older versions would have the redundant checking that you point out in your first example.
It would be cleaner to make all failed allocations follow the same code path:
if(obj==NULL)
throw std::bad_alloc();
It's widely known that there are procedure-oriented people (focus on doing things the right way) and results-oriented people (get the right answer). Most of us lie somewhere in the middle. Looks like you've found an outlier on the procedure-oriented end. These people would say "anything's possible unless you understand things perfectly, so prepare for anything." For them, what you see is done properly. For them, if you change it, they'll worry because the ducks aren't all lined up.
When working on someone else's code, I try to make sure I know two things.
1. What the programmer intended
2. Why they wrote the code the way they did
For following up on Type A programmers, maybe this helps.
So "How much is enough" ends up being a social question as much as a technical question - there's no agreed-upon way to measure it.
(It drives me nuts too.)
Personally I think null testing is unnecessary in the great majority of cases. If new fails or malloc fails you have bigger issues, and the chance of recovering is just about nil in cases where you're not writing a memory checker! Also, null testing hides a lot of bugs in the development phases, since the "null" clauses are frequently just empty and do nothing.
When you can specify which compiler is being used, checking for null from system functions such as new is a bug in the code. It means that you will be duplicating the error-handling code. Duplicate code is often a source of bugs, because often one copy gets changed and the other doesn't. If you cannot specify the compiler or compiler versions, you should be more defensive.
As for internal functions, you should specify the contract and make sure that contract is enforced via unit tests. We had a problem in our code a while back where we either threw an exception or returned null in case of a missing object from our database. This just made things confusing for the caller of the API, so we went through and made it consistent throughout the entire code base and removed the duplicate checks.
The important thing (IMHO) is to not have duplicate error logic where one branch will never be invoked. If you can never invoke code, then you can't test it, and you will never know if it is broken or not.
I'd say it depends a little on your language, but I use ReSharper with C# and it basically goes out of its way to tell me "this reference could be null", in which case I add a check; if it tells me "this will always be true" for "if (null != oMyThing && ....)", then I listen to it and don't test for null.
Whether to check for null or not greatly depends on the circumstances.
For example in our shop we check parameters to methods we create for null inside the method. The simple reason is that as the original programmer I have a good idea of exactly what the method should do. I understand the context even if the documentation and requirements are incomplete or less than satisfactory. A later programmer tasked with maintenance may not understand the context and may assume, wrongly, that passing null is harmless. If I know null would be harmful and I can anticipate that someone may pass null, I should take the simple step of making sure that the method reacts in a graceful way.
public MyObject MyMethod(object foo)
{
if (foo == null)
{
throw new ArgumentNullException("foo");
}
// do whatever if foo was non-null
}
I only check for NULL when I know what to do when I see NULL. "Know what to do" here means "know how to avoid a crash" or "know what to tell the user besides the location of the crash". For example, if malloc() returns NULL, I usually have no option but to abort the program. On the other hand, if fopen() returns NULL, I can let the user know the file name that could not be opened and maybe errno. And if find() returns end(), I usually know how to continue without crashing.
Lower-level code should check how it is used by higher-level code. Usually this means checking arguments, but it can mean checking return values from upcalls. Upcall arguments need not be checked.
The aim is to catch bugs in immediate and obvious ways, as well as documenting the contract in code that does not lie.
I don't think it's bad code. A fair amount of Windows/Linux API calls return NULL on failure of some sort. So, of course, I check for failure in the manner the API specifies. Usually I wind up passing control flow to an error module of some fashion instead of duplicating error-handling code.
If I receive a pointer that is not guaranteed by the language to be non-null, and I am going to dereference it in a way that null would break, or pass it out of my function where I said I wouldn't produce NULLs, I check for NULL.
It is not just about NULLs, a function should check pre- and post-conditions if possible.
It doesn't matter at all if a contract of the function that gave me the pointer says it'll never produce nulls. We all make bugs. There's a good rule that a program shall fail early and often, so instead of passing the bug to another module and having it fail there, I'll fail in place. It makes things so much easier to debug when testing. Also, in critical systems it makes it easier to keep the system sane.
Also, if an exception escapes main, the stack may not be unwound, preventing destructors from running at all (see the C++ standard on terminate()). Which may be serious. So leaving bad_alloc unchecked can be more dangerous than it seems.
Fail with assert vs. fail with a run time error is quite a different topic.
Checking for NULL after new(), if the standard new() behavior has not been altered to return NULL instead of throwing, seems obsolete.
There's another problem, which is that even if malloc returned a valid pointer, it doesn't yet mean you have allocated memory and can use it. But that is another story.
My first problem with this is that it leads to code which is littered with null checks and the like. It hurts readability, and I'd even go as far as to say that it hurts maintainability, because it really is easy to forget a null check if you're writing a piece of code where a certain reference really should never be null. And you just know that the null checks will be missing in some places. Which actually makes debugging harder than it needs to be. Had the original exception not been caught and replaced with a faulty return value, then we would've gotten a valuable exception object with an informative stack trace. What does a missing null check give you? A NullReferenceException in a piece of code that makes you go: wtf? This reference should never be null!
So then you need to start figuring out how the code was called, and why the reference could possibly be null. This can take a lot of time and really hurts the efficiency of your debugging efforts. Eventually you’ll figure out the real problem, but odds are that it was hidden pretty deeply and you spent a lot more time searching for it than you should have.
Another problem with null checks all over the place is that some developers don’t really take the time to properly think about the real problem when they get a NullReferenceException. I’ve actually seen quite a few developers just add a null check above the code where the NullReferenceException occurred. Great, the exception no longer occurs! Hurray! We can go home now! Umm… how bout ‘no you can’t and you deserve an elbow to the face’? The real bug might not cause an exception anymore, but now you probably have missing or faulty behavior… and no exception! Which is even more painful and takes even more time to debug.
At first, this seemed like a strange question: null checks are great and a valuable tool. Checking that new returns null is definitely silly. I'm just going to ignore the fact that there are languages that allow that. I'm sure there are valid reasons, but I really don't think I can handle living in that reality :) All kidding aside, it seems like you should at least have to specify that the new should return null when there isn't enough memory.
Anyway, checking for null where appropriate leads to cleaner code. I'd go so far as to say that never assigning function parameters default values is the next logical step. To go even further, returning empty arrays, etc. where appropriate leads to even cleaner code. It is nice to not have to worry about getting nulls except where they are logically meaningful. Nulls as error values are better avoided.
Using asserts is a really great idea. Especially if it gives you the option of turning them off at runtime. Plus, it is a more explicitly contractual style :)
