NullReference at seemingly innocent WeakReference access?

So, I have a piece of code using WeakReferences. I know of the common .IsAlive race condition, so I didn't use that. I basically have something like this:
WeakReference lastString = new WeakReference(null);
public override string ToString()
{
    if (lastString != null)
    {
        var s = lastString.Target as string; //error here
        if (s != null)
        {
            return s;
        }
    }
    var str = base.ToString();
    lastString = new WeakReference(str);
    return str;
}
Somehow, I'm getting a null reference exception at the marked line. By debugging it, I can confirm that lastString is indeed null, despite being wrapped in a null check and lastString never actually being set to null.
This only happens in a complex flow as well, which makes me think garbage collection is somehow taking my actual WeakReference object, and not just its target.
Can someone enlighten me as to how this is happening and what the best course of action is?
EDIT:
I can't determine the cause of this at all. I ended up wrapping the offending code in a try-catch just to fix it for now, but I'm very interested in the root cause. I've been trying to reproduce this in a simple test case, but it's proven very difficult to do. Also, this only appears to happen when running under a unit test runner. If I take the code and trim it down to the minimum, it will continue to crash when run using TestDriven and Gallio, but will not fail when put into a console application.

This ended up being a very hard to spot logic bug that was in plain sight.
The offending if statement really was more like this:
if (lastString != null && limiter == null || limiter == lastLimiter)
The true grouping of this is more like this:
if ((lastString != null && limiter == null) || limiter == lastLimiter)
And as Murphy's law would dictate, in this one unrelated test case, lastLimiter and lastString both got set to null by a method used nowhere but this single test case.
So yeah, no bug in the CLR; just my own logic bug that was very hard to spot.
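For anyone who hits the same thing: && binds tighter than ||, so the fix is just explicit parentheses to restore the intended grouping. A sketch based on the condition above:
if (lastString != null && (limiter == null || limiter == lastLimiter))
{
    var s = lastString.Target as string; // lastString is guaranteed non-null here
    ...
}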

Related

A better way to handle NullReferenceExceptions in C#

I recently had a coding bug where, under certain conditions, a variable wasn't being initialized and I was getting a NullReferenceException. This took a while to debug, as I had to find the bits of data that would let me recreate the error, because the exception doesn't give the variable name.
Obviously I could check every variable before use and throw an informative exception but is there a better (read less coding) way of doing this? Another thought I had was shipping with the pdb files so that the error information would contain the code line that caused the error. How do other people avoid / handle this problem?
Thanks
Firstly: don't do too much in a single statement. If you have huge numbers of dereferencing operations in one line, it's going to be much harder to find the culprit. The Law of Demeter helps with this too - if you've got something like order.SalesClerk.Manager.Address.Street.Length then you've got a lot of options to wade through when you get an exception. (I'm not dogmatic about the Law of Demeter, but everything in moderation...)
Secondly: prefer casting over using as, unless it's valid for the object to be a different type, which normally involves a null check immediately afterwards. So here:
// What if foo is actually a Control, but we expect it to be String?
string text = foo as string;
// Several lines later
int length = text.Length; // Bang!
Here we'd get a NullReferenceException and eventually trace it back to text being null - but then you wouldn't know whether that's because foo was null, or because it was an unexpected type. If it should really, really be a string, then cast instead:
string text = (string) foo;
Now you'll be able to tell the difference between the two scenarios.
Thirdly: as others have said, validate your data - typically arguments to public and potentially internal APIs. I do this in enough places in Noda Time that I've got a utility class to help me declutter the check. So for example (from Period):
internal LocalInstant AddTo(LocalInstant localInstant,
                            CalendarSystem calendar, int scalar)
{
    Preconditions.CheckNotNull(calendar, "calendar");
    ...
}
You should document what can and can't be null, too.
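For reference, a minimal sketch of that kind of helper (this is an assumed shape, not the actual Noda Time source):
using System;

internal static class Preconditions
{
    // Throws ArgumentNullException naming the offending parameter, and
    // returns the value so the check can be used inside expressions.
    internal static T CheckNotNull<T>(T argument, string paramName) where T : class
    {
        if (argument == null)
        {
            throw new ArgumentNullException(paramName);
        }
        return argument;
    }
}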
In a lot of cases it's near impossible to plan and account for every type of exception that might happen at any given point in the execution flow of your application. Defensive coding is effective only to a certain point. The trick is to have a solid diagnostics stack incorporated into your application that can give you meaningful information about unhandled errors and crashes. Having a good top-level (last ditch) handler at the app-domain level will help a lot with that.
Yes, shipping the PDBs (even with a release build) is a good way to obtain a complete stack trace that can pinpoint the exact location and causes of errors. But whatever diagnostics approach you pick, it needs to be baked into the design of the application to begin with (ideally). Retrofitting an existing app can be tedious and time/money-intensive.
Sorry to say that I will always make a check to verify that any object I am using in a particular method is not null.
It's as simple as
if (this.SubObject == null)
{
    throw new Exception("Could not perform METHOD - SubObject is null.");
}
else
{
    ...
}
Otherwise I can't think of any way to be thorough. Wouldn't make much sense to me not to make these checks anyway; I feel it's just good practice.
First of all you should always validate your inputs. If null is not allowed, throw an ArgumentNullException.
Now, I know how that can be painful, so you could look into assembly rewriting tools that do it for you. The idea is that you'd have a kind of attribute that would mark those arguments that can't be null:
public void Method([NotNull] string name) { ...
And the rewriter would fill in the blanks...
Or a simple extension method could make it easier
name.CheckNotNull();
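A sketch of what such an extension method might look like (the name comes from the line above; the body is an assumption). Note that extension methods may be invoked on a null receiver, which is exactly why this works:
using System;

public static class NullCheckExtensions
{
    // Usage: name.CheckNotNull("name"); throws if the receiver is null.
    public static void CheckNotNull<T>(this T obj, string paramName = "value") where T : class
    {
        if (obj == null)
        {
            throw new ArgumentNullException(paramName);
        }
    }
}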
If you are just looking for a more compact way to code against null references, don't overlook the null-coalescing operator ?? (documented on MSDN).
Obviously, it depends what you are doing but it can be used to avoid extra if statements.
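For example (the variable names here are just for illustration):
// Instead of: if (name == null) { name = "unknown"; }
string display = name ?? "unknown";

// It chains, too; the first non-null operand wins:
string path = overridePath ?? configuredPath ?? defaultPath;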

C# Boilerplate code

I'm thinking of building some generic extensions that will take away all these null/throw checks and asserts, and instead use fluent APIs to handle this.
So I'm thinking of doing something like this.
Shall() - Not quite sure about this one yet
.Test(...) - Determines whether the contained logic executed without any errors
.Guard(...) - Guards the contained logic from throwing any exception
.Assert(...) - Asserts before the execution of the code
.Throw(...) - Throws an exception based on a certain condition
.Assume(...) - Similar to assert but calls to Contract.Assume
Usage: father.Shall().Guard(f => f.Shop())
The thing is that I don't want these extra calls at run-time, and I know AOP can solve this for me; I want to inline these calls directly into the caller. If you have a better way of doing that, please do tell.
Now, before I research or do anything, I wonder whether someone has already done this, or knows of a tool that does it?
I really want to build something like this and release it to the public, because I think it can save a lot of time and headaches.
Some examples.
DbSet<TEntity> set = Set<TEntity>();
if (set != null)
{
    if (Contains(entity))
    {
        set.Remove(entity);
    }
    else
    {
        set.Attach(entity);
        set.Remove(entity);
    }
}
Changes to the following.
Set<TEntity>().Shall().Guard(set =>
{
    if (Contains(entity))
    {
        set.Remove(entity);
    }
    else
    {
        set.Attach(entity);
        set.Remove(entity);
    }
});
Instead of being funny and trying to make fun of other people, some people could really learn something about maturity; share your experience and tell me what's so good or bad about it, and that I'll accept.
I'm not trying to recreate Code Contracts; I know what it is, and I'm using it every day. I'm trying to move the boilerplate code that gets written into one place.
Sometimes you have methods where you have to check the returned object on every call. When it's not your code, you can't ensure that the callee won't return null, so the caller has to perform null checks on the returned object. I thought of something that may allow me to perform these checks easily when chaining calls.
Update: I'll have to think about it some more and change the API to make the intentions clear and the code more readable.
I think that the idea is not polished at all and that indeed I went too far with all these methods.
Anyways, I'll leave it for now.
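For reference, the naive runtime-only version of the idea would look something like this (a hypothetical sketch; it pays for the extra calls at run-time, which is exactly what I want to avoid via inlining or AOP):
using System;

public sealed class ShallWrapper<T>
{
    private readonly T _value;
    public ShallWrapper(T value) { _value = value; }

    // Guard: runs the contained logic and swallows any exception it throws,
    // including the NullReferenceException a null _value would cause.
    public void Guard(Action<T> logic)
    {
        try { logic(_value); }
        catch { /* deliberately swallowed, as proposed above */ }
    }
}

public static class ShallExtensions
{
    public static ShallWrapper<T> Shall<T>(this T value)
    {
        return new ShallWrapper<T>(value);
    }
}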
It sounds like you're describing something like Code Contracts: http://msdn.microsoft.com/en-us/devlabs/dd491992
If I understand what you're looking for, then the closest thing I've come up with is the extension method:
public static T Chain<T>(this T obj, Action<T> act)
{
    act(obj);
    return obj;
}
This allows you to do the following:
Set.Remove(Set.FirstOrDefault(entity) ?? entity.Chain(a => Set.Add(a)));
As you can see, though, this isn't the most readable code. That isn't to say the Chain extension method is bad (it certainly has its uses), but it can definitely be abused, so use it cautiously or the ghost of programming past will come back to haunt you.

How do I test code that should never be executed?

The following method shall only be called once it has been verified (by calling another method) that there are invalid digits. How can I test-cover the throw line in the following snippet? I know one way would be to merge VerifyThereAreInvalidDigits and this method; I'm looking for any other ideas.
public int FirstInvalidDigitPosition {
    get {
        for (int index = 0; index < this.positions.Count; ++index) {
            if (!this.positions[index].Valid) return index;
        }
        throw new InvalidOperationException("Attempt to get invalid digit position when there are no invalid digits.");
    }
}
I also would not want to write a unit test that exercises code that should never be executed.
If the "throw" statement in question is truly unreachable under any possible scenario then it should be deleted and replaced with:
Debug.Fail("This should be unreachable; please find and fix the bug that caused this to be reached.");
If the code is reachable then write a unit test that tests that scenario. Error-reporting scenarios for public-accessible methods are perfectly valid scenarios. You have to handle all inputs correctly, even bad inputs. If the correct thing to do is to throw an exception then test that you are throwing an exception.
UPDATE: according to the comments, it is in fact impossible for the error to be hit, and therefore the code is unreachable. But now the Debug.Fail is unreachable too, and the method no longer compiles, because the compiler notes that a method that returns a value has a reachable end point.
The first problem should not actually be a problem; surely the code coverage tool ought to be configurable to ignore unreachable debug-only code. But both problems can be solved by rewriting the loop:
public int FirstInvalidDigitPosition
{
    get
    {
        int index = 0;
        while (true)
        {
            Debug.Assert(index < this.positions.Count, "Attempt to get invalid digit position but there are no invalid digits!");
            if (!this.positions[index].Valid) return index;
            index++;
        }
    }
}
An alternative approach would be to reorganize the code so that you don't have the problem in the first place:
public int? FirstInvalidDigitPosition {
    get {
        for (int index = 0; index < this.positions.Count; ++index) {
            if (!this.positions[index].Valid) return index;
        }
        return null;
    }
}
and now you don't need to restrict the callers to call AreThereInvalidDigits first; just make it legal to call this method any time. That seems like the safer thing to do. Methods that blow up when you don't do some expensive check to verify that they are safe to call are fragile, dangerous methods.
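A caller then checks HasValue instead of calling a verification method first (a sketch; the digits variable is assumed):
int? pos = digits.FirstInvalidDigitPosition;
if (pos.HasValue)
{
    Console.WriteLine("First invalid digit at position " + pos.Value);
}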
I don't understand why you wouldn't want to write a unit test that exercises "code that should not happen".
If it "should not happen", then why are you writing the code at all? Because you think it might happen of course - perhaps an error in a future refactoring will break your other validation and this code will be executed.
If you think it's worth writing the code, it's worth unit testing it.
It's sometimes necessary to write small bits of code that can't execute, just to appease a compiler. For example, if a function which always throws an exception (or exits the application, etc.) is called from a function which is supposed to return a value, the compiler may insist that the caller include a "return 0", "Return Nothing", etc. statement which can never execute. I wouldn't think one should be required to test such "code", since it may be impossible to make it execute without seriously breaking other parts of the system.
A more interesting question comes with code that the processor should never be able to execute under normal circumstances, but which exists to minimize harm from abnormal circumstances. For example, some safety-critical applications I've written start with code something like (real application is machine code; given below is pseudo-code)
register = 0
test register for zero
if non-zero goto dead
register = register - 1
test register for zero
if non-zero goto okay1
dead:
    shut everything down
    goto dead
okay1:
    register = register + 1
    test register for zero
    if non-zero goto dead
I'm not sure how exactly one would go about testing that code in the failure case. The purpose of the code is to ensure that if the register has suffered electrostatic or other damage, or is otherwise not working, the system will shut down safely. Absent some very fancy equipment, though, I have no idea how to test such code.
Part of the purpose of testing is to test both things that should happen and things that can happen.
If you have code that should never execute, but could under the right (or wrong) conditions, you can verify that the exception occurs in the right conditions (in MS's Unit Test Framework) by decorating your test method with:
[ExpectedException(typeof(InvalidOperationException))]
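A complete test might look like this (MSTest; the type and scenario are assumed for illustration):
[TestMethod]
[ExpectedException(typeof(InvalidOperationException))]
public void FirstInvalidDigitPosition_Throws_WhenAllDigitsAreValid()
{
    var number = new DigitSequence("12345"); // hypothetical type; all digits valid
    int unused = number.FirstInvalidDigitPosition; // expected to throw
}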
To rephrase the issue, basically: how do you test an invalid state of your system when you've already gone to a lot of trouble to make sure that the system can't get into that invalid state in the first place?
It will make for slightly slower tests, but I would suggest using reflection to put your system into the invalid state in order to test it.
Also notice that reflection is precisely one of the possible ways for your system to get into that invalid state, which will execute your supposedly impossible to reach code.
If your code was truly impossible to reach, which is not quite the case in your example as far as I see, then you should remove the code. Eg:
public int GetValue() {
    return 5;
    throw new InvalidOperationException("Somehow the return statement did not work!");
}
I also would not want to write a unit test that exercises code that should never be executed.
What do you mean by "should" here? Like, that it ought not to be executed? Surely it should be ok for the code to be executed in a test. It should be executed in a test.

null objects vs. empty objects

[ This is a result of Best Practice: Should functions return null or an empty object? but I'm trying to be very general. ]
In a lot of legacy (um...production) C++ code that I've seen, there is a tendency to write a lot of NULL (or similar) checks to test pointers. Many of these get added near the end of a release cycle when adding a NULL-check provides a quick fix to a crash caused by the pointer dereference--and there isn't a lot of time to investigate.
To combat this, I started to write code that took a (const) reference parameter instead of the (much) more common technique of passing a pointer. No pointer, no desire to check for NULL (ignoring the corner case of actually having a null reference).
In C#, the same C++ "problem" is present: the desire to check every unknown reference against null (ArgumentNullException) and to quickly fix NullReferenceExceptions by adding a null check.
It seems to me, one way to prevent this is to avoid null objects in the first place by using empty objects (String.Empty, EventArgs.Empty) instead. Another would be to throw an exception rather than return null.
I'm just starting to learn F#, but it appears there are far fewer null objects in that environment. So maybe you don't really have to have a lot of null references floating around?
Am I barking up the wrong tree here?
Passing non-null just to avoid a NullReferenceException is trading a straightforward, easy-to-solve problem ("it blows up because it's null") for a much more subtle, hard-to-debug problem ("something several calls down the stack is not behaving as expected because much earlier it got some object which has no meaningful information but isn't null").
NullReferenceException is a wonderful thing! It fails hard, loud, fast, and it's almost always quick and easy to identify and fix. It's my favorite exception, because I know when I see it, my task is only going to take about 2 minutes. Contrast this with a confusing QA or customer report trying to describe strange behavior that has to be reproduced and traced back to the origin. Yuck.
It all comes down to what you, as a method or piece of code, can reasonably infer about the code which called you. If you are handed a null reference, and you can reasonably infer what the caller might have meant by null (maybe an empty collection, for example?) then you should definitely just deal with the nulls. However, if you can't reasonably infer what to do with a null, or what the caller means by null (for example, the calling code is telling you to open a file and gives the location as null), you should throw an ArgumentNullException.
If you maintain proper coding practices like this at every "gateway" point (the logical bounds of functionality in your code), NullReferenceExceptions should become much rarer.
I tend to be dubious of code with lots of NULLs, and try to refactor them away where possible with exceptions, empty collections, Java Optionals, and so on.
The "Introduce Null Object" pattern in Martin Fowler's Refactoring (page 260) may also be helpful. A Null Object responds to all the methods a real object would, but in a way that "does the right thing". So rather than always check an Order to see if order.getDiscountPolicy() is NULL, make sure the Order has a NullDiscountPolicy in these cases. This streamlines the control logic.
Null gets my vote. Then again, I'm of the 'fail-fast' mindset.
String.IsNullOrEmpty(...) is very helpful too; it catches either situation: null or empty strings. You could write a similar function for all the classes you're passing around.
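Such a helper might be sketched like this (the Order type and its Items property are hypothetical):
public static class OrderExtensions
{
    // Mirrors String.IsNullOrEmpty: true when there's nothing useful inside.
    // As an extension method, it is safe to call even on a null reference.
    public static bool IsNullOrEmpty(this Order order)
    {
        return order == null || order.Items.Count == 0;
    }
}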
If you are writing code that returns null as an error condition, then don't: generally, you should throw an exception instead - far harder to miss.
If you are consuming code that you fear may return null, then mostly these are boneheaded exceptions: perhaps do some Debug.Assert checks at the caller to sense-check the output during development. You shouldn't really need vast numbers of null checks in your production code, but if some 3rd party library returns lots of nulls unpredictably, then sure: do the checks.
In 4.0, you might want to look at code-contracts; this gives you much better control to say "this argument should never be passed in as null", "this function never returns null", etc - and have the system validate those claims during static analysis (i.e. when you build).
The thing about null is that it doesn't come with meaning. It is merely the absence of an object.
So, if you really mean an empty string/collection/whatever, always return the relevant object and never null. If the language in question allows you to specify that, do so.
In the case where you want to return something that means not a value specifiable with the static type, then you have a number of options. Returning null is one answer, but without a meaning is a little dangerous. Throwing an exception may actually be what you mean. You might want to extend the type with special cases (probably with polymorphism, that is to say the Special Case Pattern (a special case of which is the Null Object Pattern)). You might want to wrap the return value in an type with more meaning. Or you might want to pass in a callback object. There usually are many choices.
I'd say it depends. For a method returning a single object, I'd generally return null. For a method returning a collection, I'd generally return an empty collection (non-null). These are more along the lines of guidelines than rules, though.
If you are serious about wanting to program in a "null-less" environment, consider using extension methods more often; they are immune to NullReferenceExceptions and at least "pretend" that null isn't there anymore:
public static string GetExtension(this string s)
{
    // Path.GetExtension tolerates an empty string, where new FileInfo("") would throw
    return Path.GetExtension(s ?? "");
}
which can be called as:
// this code will never throw, not even when somePath becomes null
string somePath = GetDataFromElseWhereCanBeNull();
textBoxExtension.Text = somePath.GetExtension();
I know, this is only a convenience, and many people rightly consider it a violation of OO principles (though the "founder" of OO, Bertrand Meyer, considers null evil and banished it completely from his OO design, which is applied in the Eiffel language, but that's another story). EDIT: Dan mentions that Bill Wagner (More Effective C#) considers it bad practice, and he's right. Ever considered the IsNull extension method ;-) ?
To make your code more readable, another hint may be in place: use the null-coalescing operator more often to designate a default when an object is null:
// load settings
WriteSettings(currentUser.Settings ?? new Settings());

// example of some readonly property
public string DisplayName
{
    get
    {
        return (currentUser ?? User.Guest).DisplayName;
    }
}
None of these take the occasional check for null away (and ?? is nothing more than a hidden if-branch). I prefer as little null in my code as possible, simply because I believe it makes the code more readable. When my code gets cluttered with if-statements for null, I know there's something wrong in the design and I refactor. I suggest anybody do the same, but I know that opinions vary wildly on the matter.
(Update) Comparison with exceptions
Not mentioned in the discussion so far is the similarity with exception handling. When you find yourself ubiquitously ignoring null whenever you consider it's in your way, it is basically the same as writing:
try
{
    //...code here...
}
catch (Exception) {}
which has the effect of removing any trace of the exceptions, only to find that unrelated exceptions are raised much later in the code. Though I consider it good to avoid using null, as mentioned before in this thread, having null for exceptional cases is fine. Just don't hide them in null-ignore blocks; that ends up having the same effect as catch-all-exception blocks.
As for the exception protagonists, their views usually stem from transactional programming and strong exception-safety guarantees, or from blind guidelines. In code of any decent complexity, i.e. async workflows, I/O and especially networking code, exceptions are simply inappropriate. That is why you see Google-style docs on the matter in C++, as well as all good async code 'not enforcing it' (think of your favourite managed pools as well).
There is more to it, and while it might look like a simplification, it really is that simple. For one, you will get a lot of exceptions in something that wasn't designed for heavy exception use. Anyway, I digress; read up on this from the world's top library designers. The usual place is Boost (just don't mix it up with the other camp in Boost that loves exceptions, because they had to write music software :-).
In your instance, and this is not Fowler's expertise, an efficient 'empty object' idiom is only possible in C++ thanks to the available casting mechanisms (perhaps, but certainly not always, by means of dominance). In your null type, on the other hand, you are capable of throwing exceptions and doing whatever you want while preserving the clean call site and structure of the code.
In C#, your choice can be a single instance of a type that is either good or malformed; as such it is capable of throwing exceptions or simply running as-is. So it might or might not violate other contracts (it's up to you which you think is better, depending on the quality of the code you're facing).
In the end it does clean up call sites, but don't forget you will face a clash with many libraries (especially returns from containers/dictionaries; end iterators spring to mind, and any other code 'interfacing' with the outside world). Plus, null-as-value checks compile to extremely optimised machine code, which is something to keep in mind; but I will agree any day that wild pointer usage without understanding constness, references and more is going to lead to all kinds of mutability, aliasing and perf problems.
To add: there is no silver bullet. Crashing on a null reference, using a null reference in managed space, or throwing and not handling an exception are all the same problem, despite what the managed and exception worlds will try to sell you. Any decent environment offers protection from those (heck, you can install any filter on any OS you want; what else do you think VMs do?), and there are so many other attack vectors that this one has been overhammered to pieces. Enter x86 verification from Google yet again: their own way of doing much faster and better 'IL', 'dynamic'-friendly code, etc.
Go with your instinct on this; weigh the pros and cons and localise the effects. In the future your compiler will optimise all that checking anyway, and far more efficiently than any runtime or compile-time human method (though not as easily for cross-module interaction).
I try to avoid returning null from a method wherever possible. There are generally two kinds of situations - when null result would be legal, and when it should never happen.
In the first case, when no result is legal, there are several solutions available to avoid null results and null checks that are associated with them: Null Object pattern and Special Case pattern are there to return substitute objects that do nothing, or do some specific thing under specific circumstances.
If it is legal to return no object, but there are still no suitable substitutes in terms of Null Object or Special Case, then I typically use the Option functional type: I can then return an empty option when there is no legal result. It is then up to the client to decide the best way to deal with an empty option.
Finally, if it is not legal to have any object returned from a method, simply because the method cannot produce its result if something is missing, then I choose to throw an exception and cut further execution.
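C# has no built-in Option type, but a minimal sketch of the idea looks like this (all names here are assumed):
public struct Option<T>
{
    private readonly T _value;
    private readonly bool _hasValue;

    private Option(T value) { _value = value; _hasValue = true; }

    public bool HasValue { get { return _hasValue; } }

    public static Option<T> Some(T value) { return new Option<T>(value); }
    public static Option<T> None { get { return new Option<T>(); } }

    // The client decides what "no result" means at the call site.
    public T ValueOr(T fallback) { return _hasValue ? _value : fallback; }
}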
How are empty objects better than null objects? You're just renaming the symptom. The problem is that the contracts for your functions are too loosely defined "this function might return something useful, or it might return a dummy value" (where the dummy value might be null, an "empty object", or a magic constant like -1.) But no matter how you express this dummy value, callers still have to check for it before they use the return value.
If you want to clean up your code, the solution should be to narrow down the function so that it doesn't return a dummy value in the first place.
If you have a function which might return a value, or might return nothing, then pointers are a common (and valid) way to express this. But often, your code can be refactored so that this uncertainty is removed. If you can guarantee that a function returns something meaningful, then callers can rely on it returning something meaningful, and then they don't have to check the return value.
You can't always return an empty object, because 'empty' is not always defined. For example, what does it mean for an int, a float or a bool to be empty?
Returning a NULL pointer is not necessarily a bad practice, but I think it's a better practice to return a (const) reference (where it makes sense to do so of course).
And recently I've often used a Fallible class:
Fallible<std::string> theName = obj.getName();
if (theName)
{
// ...
}
There are various implementations available for such a class (check Google Code Search), I also created my own.

How much null checking is enough?

What are some guidelines for when it is not necessary to check for a null?
A lot of the inherited code I've been working on as of late has null-checks ad nauseam. Null checks on trivial functions, null checks on API calls that state non-null returns, etc. In some cases, the null-checks are reasonable, but in many places a null is not a reasonable expectation.
I've heard a number of arguments ranging from "You can't trust other code" to "ALWAYS program defensively" to "Until the language guarantees me a non-null value, I'm always gonna check." I certainly agree with many of those principles up to a point, but I've found excessive null-checking causes other problems that usually violate those tenets. Is the tenacious null checking really worth it?
Frequently, I've observed that code with excessive null checking is actually of poorer quality, not higher quality. Much of the code seems so focused on null checks that the developer has lost sight of other important qualities, such as readability, correctness, or exception handling. In particular, I see a lot of code ignore the std::bad_alloc exception, but do a null-check on a new.
In C++, I understand this to some extent due to the unpredictable behavior of dereferencing a null pointer; null dereference is handled more gracefully in Java, C#, Python, etc. Have I just seen poor-examples of vigilant null-checking or is there really something to this?
This question is intended to be language agnostic, though I am mainly interested in C++, Java, and C#.
Some examples of null-checking that I've seen that seem to be excessive include the following:
This example seems to be accounting for non-standard compilers as C++ spec says a failed new throws an exception. Unless you are explicitly supporting non-compliant compilers, does this make sense? Does this make any sense in a managed language like Java or C# (or even C++/CLR)?
try {
    MyObject* obj = new MyObject();
    if (obj != NULL) {
        //do something
    } else {
        //??? most code I see has log-it and move on
        //or it repeats what's in the exception handler
    }
} catch (const std::bad_alloc&) {
    //Do something? normally--this code is wrong as it allocates
    //more memory and will likely fail, such as writing to a log file.
}
Another example is when working on internal code. Particularly, if it's a small team who can define their own development practices, this seems unnecessary. On some projects or legacy code, trusting documentation may not be reasonable... but for new code that you or your team controls, is this really necessary?
If a method, which you can see and can update (or can yell at the developer who is responsible) has a contract, is it still necessary to check for nulls?
//X is non-negative.
//Returns an object or throws an exception.
MyObject* create(int x) {
    if (x < 0) throw std::invalid_argument("x must be non-negative");
    return new MyObject();
}

try {
    MyObject* x = create(unknownVar);
    if (x != NULL) {
        //is this null check really necessary?
    }
} catch (...) {
    //do something
}
When developing a private or otherwise internal function, is it really necessary to explicitly handle a null when the contract calls for non-null values only? Why would a null-check be preferable to an assert?
(obviously, on your public API, null-checks are vital as it's considered impolite to yell at your users for incorrectly using the API)
//Internal use only--non-public, not part of public API
//input must be non-null.
//returns non-negative value, or -1 if failed
int ParseType(String input) {
    if (input == null) return -1;
    //do something magic
    return value;
}
Compared to:
//Internal use only--non-public, not part of public API
//input must be non-null.
//returns non-negative value
int ParseType(String input) {
    assert input != null : "Input must be non-null.";
    //do something magic
    return value;
}
One thing to remember is that the code you write today, while you may be on a small team with good documentation, will turn into legacy code that someone else will have to maintain. I use the following rules:
If I'm writing a public API that will be exposed to others, then I will do null checks on all reference parameters.
If I'm writing an internal component to my application, I write null checks when I need to do something special when a null exists, or when I want to make it very clear. Otherwise I don't mind getting the null reference exception since that is also fairly clear what is going on.
When working with return data from other people's frameworks, I only check for null when it is possible and valid to have a null returned. If their contract says it doesn't return nulls, I won't do the check.
First, note that this is a special case of contract checking: you're writing code that does nothing other than validate at runtime that a documented contract is met. Failure means that some code somewhere is faulty.
I'm always slightly dubious about implementing special cases of a more generally useful concept. Contract checking is useful because it catches programming errors the first time they cross an API boundary. What's so special about nulls that means they're the only part of the contract you care to check? Still,
On the subject of input validation:
null is special in Java: a lot of Java APIs are written such that null is the only invalid value that it's even possible to pass into a given method call. In such cases a null check "fully validates" the input, so the full argument in favour of contract checking applies.
In C++, on the other hand, NULL is only one of nearly 2^32 (2^64 on newer architectures) invalid values that a pointer parameter could take, since almost all addresses are not of objects of the correct type. You can't "fully validate" your input unless you have a list somewhere of all objects of that type.
The question then becomes, is NULL a sufficiently common invalid input to get special treatment that (foo *)(-1) doesn't get?
Unlike Java, fields don't get auto-initialized to NULL, so a garbage uninitialized value is just as plausible as NULL. But sometimes C++ objects have pointer members which are explicitly NULL-inited, meaning "I don't have one yet". If your caller does this, then there is a significant class of programming errors which can be diagnosed by a NULL check. An exception may be easier for them to debug than a page fault in a library they don't have the source for. So if you don't mind the code bloat, it might be helpful. But it's your caller you should be thinking of, not yourself - this isn't defensive coding, because it only 'defends' against NULL, not against (foo *)(-1).
If NULL isn't a valid input, you could consider taking the parameter by reference rather than pointer, but a lot of coding styles disapprove of non-const reference parameters. And if the caller passes you *fooptr, where fooptr is NULL, then it has done nobody any good anyway. What you're trying to do is squeeze a bit more documentation into the function signature, in the hope that your caller is more likely to think "hmm, might fooptr be null here?" when they have to explicitly dereference it, than if they just pass it to you as a pointer. It only goes so far, but as far as it goes it might help.
I don't know C#, but I understand that it's like Java in that references are guaranteed to have valid values (in safe code, at least), but unlike Java in that not all types have a NULL value. So I'd guess that null checks there are rarely worth it: if you're in safe code then don't use a nullable type unless null is a valid input, and if you're in unsafe code then the same reasoning applies as in C++.
On the subject of output validation:
A similar issue arises: in Java you can "fully validate" the output by knowing its type, and that the value isn't null. In C++, you can't "fully validate" the output with a NULL check - for all you know the function returned a pointer to an object on its own stack which has just been unwound. But if NULL is a common invalid return due to the constructs typically used by the author of the callee code, then checking it will help.
In all cases:
Use assertions rather than "real code" to check contracts where possible - once your app is working, you probably don't want the code bloat of every callee checking all its inputs, and every caller checking its return values.
When writing code which is portable to non-standard C++ implementations, instead of the code in the question (which checks for null and also catches the exception), I'd probably have a function like this:
template<typename T>
static inline void nullcheck(T *ptr) {
#if PLATFORM_TRAITS_NEW_RETURNS_NULL
    if (ptr == NULL) throw std::bad_alloc();
#endif
}
Then as one of the list of things you do when porting to a new system, you define PLATFORM_TRAITS_NEW_RETURNS_NULL (and maybe some other PLATFORM_TRAITS) correctly. Obviously you can write a header which does this for all the compilers you know about. If someone takes your code and compiles it on a non-standard C++ implementation that you know nothing about, they're fundamentally on their own for bigger reasons than this, so they'll have to do it themselves.
If you write the code and its contract, you are responsible for using it in terms of its contract and ensuring the contract is correct. If you say "returns a non-null" x, then the caller should not check for null. If a null pointer exception then occurs with that reference/pointer, it is your contract that is incorrect.
Null checking should only go to the extreme when using a library that is untrusted, or does not have a proper contract. If it is your development team's code, stress that the contracts must not be broken, and track down the person who uses the contract incorrectly when bugs occur.
Part of this depends on how the code is used -- if it is a method available only within a project vs. a public API, for example. API error checking requires something stronger than an assertion.
So while this is fine within a project where it's supported with unit tests and stuff like that:
internal void DoThis(Something thing)
{
    Debug.Assert(thing != null, "Arg [thing] cannot be null.");
    //...
}
in a method where you don't have control over who calls it, something like this may be better:
public void DoThis(Something thing)
{
    if (thing == null)
    {
        throw new ArgumentException("Arg [thing] cannot be null.");
    }
    //...
}
It depends on the situation. The rest of my answer assumes C++.
1. I never test the return value of new, since all the implementations I use throw bad_alloc on failure. If I see a legacy test for new returning null in any code I'm working on, I cut it out and don't bother to replace it with anything.
2. Unless small-minded coding standards prohibit it, I assert documented preconditions. Broken code which violates a published contract needs to fail immediately and dramatically.
3. If the null arises from a runtime failure which isn't due to broken code, I throw. fopen failure and malloc failure (though I rarely if ever use them in C++) would fall into this category.
4. I don't attempt to recover from allocation failure; bad_alloc gets caught in main().
5. If the null test is for an object which is a collaborator of my class, I rewrite the code to take it by reference.
6. If the collaborator really might not exist, I use the Null Object design pattern to create a placeholder that fails in well-defined ways.
NULL checking in general is evil, as it adds a small negative token to the code's testability. With NULL checks everywhere you can't use the "pass null" technique, and it will hit you when unit testing. It's better to have a unit test for the method than a null check.
Check out decent presentation on that issue and unit testing in general by Misko Hevery at http://www.youtube.com/watch?v=wEhu57pih5w&feature=channel
Older versions of Microsoft C++ (and probably others) did not throw an exception for failed allocations via new, but returned NULL. Code that had to run in both standard-conforming and older versions would have the redundant checking that you point out in your first example.
It would be cleaner to make all failed allocations follow the same code path:
if (obj == NULL)
    throw std::bad_alloc();
It's widely known that there are procedure-oriented people (focus on doing things the right way) and results-oriented people (get the right answer). Most of us lie somewhere in the middle. It looks like you've found an outlier on the procedure-oriented side. These people would say "anything's possible unless you understand things perfectly, so prepare for anything." For them, what you see is done properly. For them, if you change it, they'll worry because the ducks aren't all lined up.
When working on someone else's code, I try to make sure I know two things.
1. What the programmer intended
2. Why they wrote the code the way they did
For following up on Type A programmers, maybe this helps.
So "How much is enough" ends up being a social question as much as a technical question - there's no agreed-upon way to measure it.
(It drives me nuts too.)
Personally I think null testing is unnecessary in the great majority of cases. If new fails or malloc fails, you have bigger issues, and the chance of recovering is just about nil unless you're writing a memory checker! Also, null testing hides bugs a lot during the development phases, since the "null" clauses are frequently just empty and do nothing.
When you can specify which compiler is being used, checking for null from system functions such as new is a bug in the code. It means that you will be duplicating the error-handling code, and duplicate code is often a source of bugs, because one copy gets changed and the other doesn't. If you cannot specify the compiler or compiler versions, you should be more defensive.
As for internal functions, you should specify the contract and make sure that the contract is enforced via unit tests. We had a problem in our code a while back where we either threw an exception or returned null in case of a missing object from our database. This just made things confusing for the callers of the API, so we went through the entire code base, made it consistent, and removed the duplicate checks.
The important thing (IMHO) is to not have duplicate error logic where one branch will never be invoked. If you can never invoke code, then you can't test it, and you will never know if it is broken or not.
I'd say it depends a little on your language, but I use ReSharper with C#, and it basically goes out of its way to tell me "this reference could be null", in which case I add a check. If it tells me "this will always be true" for "if (null != oMyThing && ....)", then I listen to it and don't test for null.
Whether to check for null or not greatly depends on the circumstances.
For example in our shop we check parameters to methods we create for null inside the method. The simple reason is that as the original programmer I have a good idea of exactly what the method should do. I understand the context even if the documentation and requirements are incomplete or less than satisfactory. A later programmer tasked with maintenance may not understand the context and may assume, wrongly, that passing null is harmless. If I know null would be harmful and I can anticipate that someone may pass null, I should take the simple step of making sure that the method reacts in a graceful way.
public MyObject MyMethod(object foo)
{
    if (foo == null)
    {
        throw new ArgumentNullException("foo");
    }
    // do whatever if foo was non-null
}
I only check for NULL when I know what to do when I see NULL. "Know what to do" here means "know how to avoid a crash" or "know what to tell the user besides the location of the crash". For example, if malloc() returns NULL, I usually have no option but to abort the program. On the other hand, if fopen() returns NULL, I can let the user know the file name that could not be opened, and maybe the errno. And if find() returns end(), I usually know how to continue without crashing.
Lower-level code should check how it is being used by higher-level code. Usually this means checking arguments, but it can mean checking return values from upcalls. Upcall arguments need not be checked.
The aim is to catch bugs in immediate and obvious ways, as well as documenting the contract in code that does not lie.
I don't think it's bad code. A fair amount of Windows/Linux API calls return NULL on failure of some sort. So, of course, I check for failure in the manner the API specifies. Usually I wind up passing control flow to an error module of some fashion instead of duplicating error-handling code.
If I receive a pointer that is not guaranteed by the language to be non-null, and I am going to dereference it in a way that a null would break me, or pass it out of my function where I said I wouldn't produce NULLs, I check for NULL.
It is not just about NULLs; a function should check pre- and post-conditions if possible.
It doesn't matter at all if the contract of the function that gave me the pointer says it'll never produce nulls. We all make bugs. There's a good rule that a program shall fail early and often: instead of passing the bug to another module and having it fail there, I'll fail in place. That makes things so much easier to debug when testing. In critical systems, it also makes it easier to keep the system sane.
Also, if an exception escapes main, the stack may not be unwound, preventing destructors from running at all (see the C++ standard on terminate()). That can be serious, so leaving bad_alloc unchecked may be more dangerous than it seems.
Fail with assert vs. fail with a run time error is quite a different topic.
Checking for NULL after new(), when the standard behavior of new() has not been altered to return NULL instead of throwing, seems obsolete.
There's another problem: even if malloc returned a valid pointer, it doesn't yet mean you have allocated memory you can actually use. But that is another story.
My first problem with this is that it leads to code which is littered with null checks and the like. It hurts readability, and I'd even go as far as to say that it hurts maintainability, because it really is easy to forget a null check when you're writing a piece of code where a certain reference really should never be null. And you just know that the null checks will be missing in some places, which actually makes debugging harder than it needs to be. Had the original exception not been caught and replaced with a faulty return value, we would have gotten a valuable exception object with an informative stack trace. What does a missing null check give you? A NullReferenceException in a piece of code that makes you go: wtf? This reference should never be null!
So then you need to start figuring out how the code was called, and why the reference could possibly be null. This can take a lot of time and really hurts the efficiency of your debugging efforts. Eventually you'll figure out the real problem, but odds are that it was hidden pretty deeply and you spent a lot more time searching for it than you should have.
Another problem with null checks all over the place is that some developers don't really take the time to properly think about the real problem when they get a NullReferenceException. I've actually seen quite a few developers just add a null check above the code where the NullReferenceException occurred. Great, the exception no longer occurs! Hurray! We can go home now! Umm... how about 'no you can't, and you deserve an elbow to the face'? The real bug might not cause an exception anymore, but now you probably have missing or faulty behavior... and no exception! Which is even more painful and takes even more time to debug.
At first, this seemed like a strange question: null checks are great and a valuable tool. Checking that new returns null is definitely silly. I'm just going to ignore the fact that there are languages that allow that. I'm sure there are valid reasons, but I really don't think I can handle living in that reality :) All kidding aside, it seems like you should at least have to specify that new should return null when there isn't enough memory.
Anyway, checking for null where appropriate leads to cleaner code. I'd go so far as to say that never assigning function parameters default values is the next logical step. To go even further, returning empty arrays, etc. where appropriate leads to even cleaner code. It is nice to not have to worry about getting nulls except where they are logically meaningful. Nulls as error values are better avoided.
Using asserts is a really great idea. Especially if it gives you the option of turning them off at runtime. Plus, it is a more explicitly contractual style :)
