When could i in this example be unassigned?
int i;
try
{
i = 2;
}
catch
{
i = 3;
}
finally
{
string a = i.ToString();
}
You could get a ThreadAbortException before i = 2 runs, for example. Anyway, the C# compiler is not exceptionally smart, so it's easy to fool with contrived examples like the one above. It doesn't have to recognize every situation, and if it isn't sure that i is assigned, it will complain, even if you are sure that it is.
EDIT: I was a bit quick with my first assumption, so to refine it, here's what I think. Code is guaranteed to run in order, or, if an exception happens, to jump to the handlers. So i = 2 may not run if an exception happens before it. I still claim that a ThreadAbortException is one of the few ways this can happen even when you have no code that could produce exceptions. Generally, if you have any number of different exception handlers, the compiler cannot know in advance which one will run, so it doesn't try to make any assumptions about that. It could know that if 1) there is only one catch block and 2) it is typeless, then, and only then, that one catch block is guaranteed to run. Or, if there were multiple catch handlers and you assigned your variable in every one of them, it could also work, but I guess the compiler doesn't care about that either. However simple it may seem, it is a special case, and the C# compiler team has a tendency to ignore those special cases.
It's highly unlikely to happen with the example you have posted. However, the compiler is going to be 'helpful' in this situation. As Hadas said, just initialize i to 0.
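For completeness, here's a minimal sketch of that fix; once i is definitely assigned at its declaration, the finally block compiles without complaint:

int i = 0;   // definitely assigned from the declaration onwards
try
{
    i = 2;
}
catch
{
    i = 3;
}
finally
{
    // no CS0165 ("use of unassigned local variable") here any more,
    // because i has a value even if an exception fired before i = 2 ran
    string a = i.ToString();
}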
There are lots of code samples that one can write in which it is provably the case that a variable is assigned but that the compiler simply cannot prove that it is definitely assigned.
Just consider this much simpler case:
int i;
if ((bool)(object)true)
i = 0;
Console.WriteLine(i);
It is provably impossible for that case to ever access an unassigned i as well, yet it won't compile.
It is also provably impossible for the compiler to solve this problem in the general case. There are cases where it can prove a variable is certainly not definitely assigned, and there are cases where it can prove it definitely is assigned, but there are also cases where it just doesn't know either way. In those cases it chooses to fail, because it sees a few false positive errors as less harmful than false negatives.
To speak more about your specific case; you're saying that if a variable is assigned in both the try and catch blocks it is definitely assigned. While that may be true of your specific code, it's certainly not true in the general case. You need to consider exceptions that aren't handled by the catch block (even in your case, where none is specified, exceptions such as a stack overflow or out of memory won't be caught), you need to consider the catch block itself throwing an exception (again, it won't happen in your case, but the compiler would need to prove that to compile the code).
I think it would help if you look at another type besides a primitive. Consider:
int i;
MyClass ed = new MyClass();
try
{
int newI = ed.getIntFromFunctionThatWillThrow();
i = newI;
}
catch (Exception e)
{
i = 3;
// do some abortion code.
}
finally
{
string a = i.ToString();
...
}
So in this block of code, there is no lexical guarantee for any one branch of execution. You have (at least) two branches to consider: the try-finally branch and the try-catch-finally branch. Since the function getIntFromFunctionThatWillThrow is going to throw (see what I did there?), the assignment in the try block will never run, even though I try to use i later. But that isn't something that can be recognized until runtime; without inside information about that member on ed, there is no way to know which path the code will take. So the compiler has no idea what value i will have. It only knows that it exists, and is of type int.
If it is an issue, the fix is to give i an initial value; this will suppress the error.
I hope this helps somewhat!
I recently had a coding bug where, under certain conditions, a variable wasn't being initialized and I was getting a NullReferenceException. This took a while to debug, as I had to find the bits of data that would generate it in order to recreate the error, and the exception doesn't give the variable name.
Obviously I could check every variable before use and throw an informative exception but is there a better (read less coding) way of doing this? Another thought I had was shipping with the pdb files so that the error information would contain the code line that caused the error. How do other people avoid / handle this problem?
Thanks
Firstly: don't do too much in a single statement. If you have huge numbers of dereferencing operations in one line, it's going to be much harder to find the culprit. The Law of Demeter helps with this too - if you've got something like order.SalesClerk.Manager.Address.Street.Length then you've got a lot of options to wade through when you get an exception. (I'm not dogmatic about the Law of Demeter, but everything in moderation...)
Secondly: prefer casting over using as, unless it's valid for the object to be a different type, which normally involves a null check immediately afterwards. So here:
// What if foo is actually a Control, but we expect it to be String?
string text = foo as string;
// Several lines later
int length = text.Length; // Bang!
Here we'd get a NullReferenceException and eventually trace it back to text being null - but then you wouldn't know whether that's because foo was null, or because it was an unexpected type. If it should really, really be a string, then cast instead:
string text = (string) foo;
Now you'll be able to tell the difference between the two scenarios.
Thirdly: as others have said, validate your data - typically arguments to public and potentially internal APIs. I do this in enough places in Noda Time that I've got a utility class to help me declutter the check. So for example (from Period):
internal LocalInstant AddTo(LocalInstant localInstant,
CalendarSystem calendar, int scalar)
{
Preconditions.CheckNotNull(calendar, "calendar");
...
}
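If you don't use Noda Time, a helper like that is only a few lines to write yourself; here's a minimal sketch (illustrative only, not the actual Noda Time implementation):

using System;

internal static class Preconditions
{
    // Throws ArgumentNullException if the argument is null; otherwise returns it,
    // so the check can be used inline in assignments and constructor chains.
    internal static T CheckNotNull<T>(T argument, string paramName) where T : class
    {
        if (argument == null)
        {
            throw new ArgumentNullException(paramName);
        }
        return argument;
    }
}

Returning the argument means the check can be used inline, e.g. this.calendar = Preconditions.CheckNotNull(calendar, "calendar");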
You should document what can and can't be null, too.
In a lot of cases it's near impossible to plan and account for every type of exception that might happen at any given point in the execution flow of your application. Defensive coding is effective only to a certain point. The trick is to have a solid diagnostics stack incorporated into your application that can give you meaningful information about unhandled errors and crashes. Having a good top-level (last ditch) handler at the app-domain level will help a lot with that.
Yes, shipping the PDBs (even with a release build) is a good way to obtain a complete stack trace that can pinpoint the exact location and causes of errors. But whatever diagnostics approach you pick, it needs to be baked into the design of the application to begin with (ideally). Retrofitting an existing app can be tedious and time/money-intensive.
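As a rough sketch of that kind of last-ditch handler (this assumes a plain console or service host; the exact hook differs for ASP.NET, WinForms, WPF, and so on):

static void Main()
{
    AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
    {
        // Last-chance logging: e.ExceptionObject carries the unhandled exception,
        // and its ToString() includes the message and full stack trace.
        Console.Error.WriteLine("Unhandled exception: {0}", e.ExceptionObject);
    };

    // ... rest of the application ...
}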
Sorry to say that I will always make a check to verify that any object I am using in a particular method is not null.
It's as simple as
if( this.SubObject == null )
{
throw new Exception("Could not perform METHOD - SubObject is null.");
}
else
{
...
}
Otherwise I can't think of any way to be thorough. Wouldn't make much sense to me not to make these checks anyway; I feel it's just good practice.
First of all you should always validate your inputs. If null is not allowed, throw an ArgumentNullException.
Now, I know how that can be painful, so you could look into assembly rewriting tools that do it for you. The idea is that you'd have a kind of attribute that would mark those arguments that can't be null:
public void Method([NotNull] string name) { ...
And the rewriter would fill in the blanks...
Or a simple extension method could make it easier
name.CheckNotNull();
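Such an extension method is only a few lines; here's a hedged sketch (the class name and the optional parameter-name argument are my own additions, not part of any rewriter tool):

using System;

public static class NullGuard
{
    // Extension methods dispatch statically, so calling this on a null
    // reference does not itself throw NullReferenceException.
    public static T CheckNotNull<T>(this T value, string paramName = null) where T : class
    {
        if (value == null)
        {
            throw new ArgumentNullException(paramName ?? "value");
        }
        return value;
    }
}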
If you are just looking for a more compact way to code against having null references, don't overlook the null-coalescing operator ?? MSDN
Obviously, it depends what you are doing but it can be used to avoid extra if statements.
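For instance (the names here are made up), a default can be substituted in a single expression instead of an extra if-statement:

// settings may be null; fall back to a default without branching
Settings effective = settings ?? new Settings();

// chains are allowed too: the first non-null operand wins
string title = customTitle ?? defaultTitle ?? "(untitled)";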
[ This is a result of Best Practice: Should functions return null or an empty object? but I'm trying to be very general. ]
In a lot of legacy (um...production) C++ code that I've seen, there is a tendency to write a lot of NULL (or similar) checks to test pointers. Many of these get added near the end of a release cycle when adding a NULL-check provides a quick fix to a crash caused by the pointer dereference--and there isn't a lot of time to investigate.
To combat this, I started to write code that took a (const) reference parameter instead of the (much) more common technique of passing a pointer. No pointer, no desire to check for NULL (ignoring the corner case of actually having a null reference).
In C#, the same C++ "problem" is present: the desire to check every unknown reference against null (ArgumentNullException) and to quickly fix NullReferenceExceptions by adding a null check.
It seems to me, one way to prevent this is to avoid null objects in the first place by using empty objects (String.Empty, EventArgs.Empty) instead. Another would be to throw an exception rather than return null.
I'm just starting to learn F#, but it appears there are far fewer null objects in that environment. So maybe you don't really have to have a lot of null references floating around?
Am I barking up the wrong tree here?
Passing non-null just to avoid a NullReferenceException is trading a straightforward, easy-to-solve problem ("it blows up because it's null") for a much more subtle, hard-to-debug problem ("something several calls down the stack is not behaving as expected because much earlier it got some object which has no meaningful information but isn't null").
NullReferenceException is a wonderful thing! It fails hard, loud, fast, and it's almost always quick and easy to identify and fix. It's my favorite exception, because I know when I see it, my task is only going to take about 2 minutes. Contrast this with a confusing QA or customer report trying to describe strange behavior that has to be reproduced and traced back to the origin. Yuck.
It all comes down to what you, as a method or piece of code, can reasonably infer about the code which called you. If you are handed a null reference, and you can reasonably infer what the caller might have meant by null (maybe an empty collection, for example?) then you should definitely just deal with the nulls. However, if you can't reasonably infer what to do with a null, or what the caller means by null (for example, the calling code is telling you to open a file and gives the location as null), you should throw an ArgumentNullException.
If you maintain proper coding practices like this at every "gateway" point (the logical bounds of functionality in your code), NullReferenceExceptions should be much more rare.
I tend to be dubious of code with lots of NULLs, and try to refactor them away where possible with exceptions, empty collections, Java Optionals, and so on.
The "Introduce Null Object" pattern in Martin Fowler's Refactoring (page 260) may also be helpful. A Null Object responds to all the methods a real object would, but in a way that "does the right thing". So rather than always check an Order to see if order.getDiscountPolicy() is NULL, make sure the Order has a NullDiscountPolicy in these cases. This streamlines the control logic.
Null gets my vote. Then again, I'm of the 'fail-fast' mindset.
String.IsNullOrEmpty(...) is very helpful too, I guess it catches either situation: null or empty strings. You could write a similar function for all your classes you're passing around.
If you are writing code that returns null as an error condition, then don't: generally, you should throw an exception instead - far harder to miss.
If you are consuming code that you fear may return null, then mostly these are boneheaded exceptions: perhaps do some Debug.Assert checks at the caller to sense-check the output during development. You shouldn't really need vast numbers of null checks in your production code, but if some 3rd party library returns lots of nulls unpredictably, then sure: do the checks.
In 4.0, you might want to look at code-contracts; this gives you much better control to say "this argument should never be passed in as null", "this function never returns null", etc - and have the system validate those claims during static analysis (i.e. when you build).
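A rough sketch of what that looks like with .NET 4 Code Contracts (System.Diagnostics.Contracts); the method itself is invented for illustration:

using System.Diagnostics.Contracts;

public static string Canonicalize(string path)
{
    Contract.Requires(path != null);                      // "never passed in as null"
    Contract.Ensures(Contract.Result<string>() != null);  // "this function never returns null"

    return path.Trim().ToLowerInvariant();
}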
The thing about null is that it doesn't come with meaning. It is merely the absence of an object.
So, if you really mean an empty string/collection/whatever, always return the relevant object and never null. If the language in question allows you to specify that, do so.
In the case where you want to return something that means not a value specifiable with the static type, then you have a number of options. Returning null is one answer, but without a meaning is a little dangerous. Throwing an exception may actually be what you mean. You might want to extend the type with special cases (probably with polymorphism, that is to say the Special Case Pattern (a special case of which is the Null Object Pattern)). You might want to wrap the return value in an type with more meaning. Or you might want to pass in a callback object. There usually are many choices.
I'd say it depends. For a method returning a single object, I'd generally return null. For a method returning a collection, I'd generally return an empty collection (non-null). These are more along the lines of guidelines than rules, though.
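A small sketch of that guideline (the repository and entity types are made up and assumed to exist elsewhere):

using System.Collections.Generic;
using System.Linq;

public class CustomerRepository
{
    private readonly Dictionary<int, Customer> customersById = new Dictionary<int, Customer>();
    private readonly Dictionary<int, List<Order>> ordersByCustomer = new Dictionary<int, List<Order>>();

    // Single object: null can reasonably mean "not found"
    public Customer FindCustomer(int id)
    {
        Customer c;
        return customersById.TryGetValue(id, out c) ? c : null;
    }

    // Collection: return an empty sequence instead of null, so callers
    // can foreach over the result without a null check
    public IEnumerable<Order> GetOrders(int customerId)
    {
        List<Order> orders;
        return ordersByCustomer.TryGetValue(customerId, out orders)
            ? orders
            : Enumerable.Empty<Order>();
    }
}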
If you are serious about wanting to program in a "nullless" environment, consider using extension methods more often, they are immune to NullReferenceExceptions and at least "pretend" that null isn't there anymore:
public static string GetExtension(this string s)
{
return (new FileInfo(s ?? "")).Extension;
}
which can be called as:
// this code will never throw, not even when somePath becomes null
string somePath = GetDataFromElseWhereCanBeNull();
textBoxExtension.Text = somePath.GetExtension();
I know, this is only a convenience and many people correctly consider it a violation of OO principles (though the "founder" of OO, Bertrand Meyer, considers null evil and completely banished it from his OO design, which is applied in the Eiffel language, but that's another story). EDIT: Dan mentions that Bill Wagner (More Effective C#) considers it bad practice, and he's right. Ever considered the IsNull extension method ;-) ?
To make your code more readable, another hint may be in place: use the null-coalescing operator more often to designate a default when an object is null:
// load settings
WriteSettings(currentUser.Settings ?? new Settings());
// example of some readonly property
public string DisplayName
{
get
{
return (currentUser ?? User.Guest).DisplayName;
}
}
None of these take the occasional check for null away (and ?? is nothing more than a hidden if-branch). I prefer as little null in my code as possible, simply because I believe it makes the code more readable. When my code gets cluttered with if-statements for null, I know there's something wrong in the design and I refactor. I suggest anybody do the same, but I know that opinions vary wildly on the matter.
(Update) Comparison with exceptions
Not mentioned in the discussion so far is the similarity with exception handling. When you find yourself ubiquitously ignoring null whenever you consider it's in your way, it is basically the same as writing:
try
{
//...code here...
}
catch (Exception) {}
which has the effect of removing any trace of the exceptions only to find it raises unrelated exceptions much later in the code. Though I consider it good to avoid using null, as mentioned before in this thread, having null for exceptional cases is good. Just don't hide them in null-ignore-blocks, it will end up having the same effect as the catch-all-exceptions blocks.
As for the exception protagonists, their arguments usually stem from transactional programming and strong exception-safety guarantees, or from blind guidelines. In anything of decent complexity, i.e. async workflows, I/O and especially networking code, exceptions are simply inappropriate. That is why you see Google-style docs on the matter in C++, and why all good async code tends not to enforce them (think of your favourite managed pools as well).
There is more to it, and while it might look like a simplification, it really is that simple. For one, you will get a lot of exceptions in something that wasn't designed for heavy exception use... anyway, I digress; read up on this from the world's top library designers, the usual place being Boost (just don't mix it up with the other camp in Boost that loves exceptions, because they had to write music software :-).
In your instance, and this is not Fowler's expertise, an efficient 'empty object' idiom is only possible in C++ due to the available casting mechanisms (perhaps, but certainly not always, by means of dominance). On the other hand, in your null type you are capable of throwing exceptions and doing whatever you want while preserving the clean call site and structure of the code.
In C#, your choice can be a single instance of a type that is either good or malformed; as such it is capable of throwing exceptions or simply running as-is. So it might or might not violate other contracts (it's up to you what you think is better, depending on the quality of the code you're facing).
In the end, it does clean up call sites, but don't forget you will face a clash with many libraries (and especially returns from containers/Dictionaries, end iterators spring to mind, and any other 'interfacing' code to the outside world ). Plus null-as-value checks are extremely optimised pieces of machine code, something to keep in mind but I will agree any day wild pointer usage without understanding constness, references and more is going to lead to different kind of mutability, aliasing and perf problems.
To add, there is no silver bullet, and crashing on null reference or using a null reference in managed space, or throwing and not handling an exception is an identical problem, despite what managed and exception world will try to sell you. Any decent environment offers a protection from those (heck you can install any filter on any OS you want, what else do you think VMs do), and there are so many other attack vectors that this one has been overhammered to pieces. Enter x86 verification from Google yet again, their own way of doing much faster and better 'IL', 'dynamic' friendly code etc..
Go with your instinct on this, weight the pros and cons and localise the effects.. in the future your compiler will optimise all that checking anyway, and far more efficiently than any runtime or compile-time human method (but not as easily for cross-module interaction).
I try to avoid returning null from a method wherever possible. There are generally two kinds of situations - when null result would be legal, and when it should never happen.
In the first case, when no result is legal, there are several solutions available to avoid null results and null checks that are associated with them: Null Object pattern and Special Case pattern are there to return substitute objects that do nothing, or do some specific thing under specific circumstances.
If it is legal to return no object, but still there are no suitable substitutes in terms of Null Object or Special Case, then I typically use the Option functional type - I can then return an empty option when there is no legal result. It is then up to the client to decide the best way to deal with an empty option.
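If your language has no built-in Option type, a bare-bones version is only a few lines; here's a minimal C# sketch (not a full-featured library type):

public struct Option<T>
{
    private readonly T value;
    private readonly bool hasValue;

    private Option(T value) { this.value = value; this.hasValue = true; }

    public static Option<T> Some(T value) { return new Option<T>(value); }
    public static Option<T> None { get { return default(Option<T>); } }

    public bool HasValue { get { return hasValue; } }

    // Forces the caller to decide at the call site what "no result" means
    public T GetValueOrDefault(T fallback) { return hasValue ? value : fallback; }
}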
Finally, if it is not legal to have any object returned from a method, simply because the method cannot produce its result if something is missing, then I choose to throw an exception and cut further execution.
How are empty objects better than null objects? You're just renaming the symptom. The problem is that the contracts for your functions are too loosely defined "this function might return something useful, or it might return a dummy value" (where the dummy value might be null, an "empty object", or a magic constant like -1.) But no matter how you express this dummy value, callers still have to check for it before they use the return value.
If you want to clean up your code, the solution should be to narrow down the function so that it doesn't return a dummy value in the first place.
If you have a function which might return a value, or might return nothing, then pointers are a common (and valid) way to express this. But often, your code can be refactored so that this uncertainty is removed. If you can guarantee that a function returns something meaningful, then callers can rely on it returning something meaningful, and then they don't have to check the return value.
You can't always return an empty object, because 'empty' is not always defined. For example what does it mean for an int, float or bool to be empty?
Returning a NULL pointer is not necessarily a bad practice, but I think it's a better practice to return a (const) reference (where it makes sense to do so of course).
And recently I've often used a Fallible class:
Fallible<std::string> theName = obj.getName();
if (theName)
{
// ...
}
There are various implementations available for such a class (check Google Code Search), I also created my own.
What are some guidelines for when it is not necessary to check for a null?
A lot of the inherited code I've been working on as of late has null-checks ad nauseam. Null checks on trivial functions, null checks on API calls that state non-null returns, etc. In some cases, the null-checks are reasonable, but in many places a null is not a reasonable expectation.
I've heard a number of arguments ranging from "You can't trust other code" to "ALWAYS program defensively" to "Until the language guarantees me a non-null value, I'm always gonna check." I certainly agree with many of those principles up to a point, but I've found excessive null-checking causes other problems that usually violate those tenets. Is the tenacious null checking really worth it?
Frequently, I've observed that code with excessive null checking is actually of poorer quality, not higher quality. Much of the code seems to be so focused on null checks that the developer has lost sight of other important qualities, such as readability, correctness, or exception handling. In particular, I see a lot of code ignore the std::bad_alloc exception, but do a null check on the result of new.
In C++, I understand this to some extent due to the unpredictable behavior of dereferencing a null pointer; null dereference is handled more gracefully in Java, C#, Python, etc. Have I just seen poor-examples of vigilant null-checking or is there really something to this?
This question is intended to be language agnostic, though I am mainly interested in C++, Java, and C#.
Some examples of null-checking that I've seen that seem to be excessive include the following:
This example seems to be accounting for non-standard compilers as C++ spec says a failed new throws an exception. Unless you are explicitly supporting non-compliant compilers, does this make sense? Does this make any sense in a managed language like Java or C# (or even C++/CLR)?
try {
MyObject* obj = new MyObject();
if(obj!=NULL) {
//do something
} else {
//??? most code I see has log-it and move on
//or it repeats what's in the exception handler
}
} catch(std::bad_alloc) {
//Do something? normally--this code is wrong as it allocates
//more memory and will likely fail, such as writing to a log file.
}
Another example is when working on internal code. Particularly, if it's a small team who can define their own development practices, this seems unnecessary. On some projects or legacy code, trusting documentation may not be reasonable... but for new code that you or your team controls, is this really necessary?
If a method, which you can see and can update (or can yell at the developer who is responsible) has a contract, is it still necessary to check for nulls?
//X is non-negative.
//Returns an object or throws exception.
MyObject* create(int x) {
if(x<0) throw;
return new MyObject();
}
try {
MyObject* x = create(unknownVar);
if(x!=null) {
//is this null check really necessary?
}
} catch {
//do something
}
When developing a private or otherwise internal function, is it really necessary to explicitly handle a null when the contract calls for non-null values only? Why would a null-check be preferable to an assert?
(obviously, on your public API, null-checks are vital as it's considered impolite to yell at your users for incorrectly using the API)
//Internal use only--non-public, not part of public API
//input must be non-null.
//returns non-negative value, or -1 if failed
int ParseType(String input) {
if(input==null) return -1;
//do something magic
return value;
}
Compared to:
//Internal use only--non-public, not part of public API
//input must be non-null.
//returns non-negative value
int ParseType(String input) {
assert(input!=null : "Input must be non-null.");
//do something magic
return value;
}
One thing to remember is that the code you write today, even if you're on a small team with good documentation, will turn into legacy code that someone else will have to maintain. I use the following rules:
If I'm writing a public API that will be exposed to others, then I will do null checks on all reference parameters.
If I'm writing an internal component to my application, I write null checks when I need to do something special when a null exists, or when I want to make it very clear. Otherwise I don't mind getting the null reference exception since that is also fairly clear what is going on.
When working with return data from other people's frameworks, I only check for null when it is possible and valid to have a null returned. If their contract says it doesn't return nulls, I won't do the check.
First note that this a special case of contract-checking: you're writing code that does nothing other than validate at runtime that a documented contract is met. Failure means that some code somewhere is faulty.
I'm always slightly dubious about implementing special cases of a more generally useful concept. Contract checking is useful because it catches programming errors the first time they cross an API boundary. What's so special about nulls that means they're the only part of the contract you care to check? Still,
On the subject of input validation:
null is special in Java: a lot of Java APIs are written such that null is the only invalid value that it's even possible to pass into a given method call. In such cases a null check "fully validates" the input, so the full argument in favour of contract checking applies.
In C++, on the other hand, NULL is only one of nearly 2^32 (2^64 on newer architectures) invalid values that a pointer parameter could take, since almost all addresses are not of objects of the correct type. You can't "fully validate" your input unless you have a list somewhere of all objects of that type.
The question then becomes, is NULL a sufficiently common invalid input to get special treatment that (foo *)(-1) doesn't get?
Unlike Java, fields don't get auto-initialized to NULL, so a garbage uninitialized value is just as plausible as NULL. But sometimes C++ objects have pointer members which are explicitly NULL-inited, meaning "I don't have one yet". If your caller does this, then there is a significant class of programming errors which can be diagnosed by a NULL check. An exception may be easier for them to debug than a page fault in a library they don't have the source for. So if you don't mind the code bloat, it might be helpful. But it's your caller you should be thinking of, not yourself - this isn't defensive coding, because it only 'defends' against NULL, not against (foo *)(-1).
If NULL isn't a valid input, you could consider taking the parameter by reference rather than pointer, but a lot of coding styles disapprove of non-const reference parameters. And if the caller passes you *fooptr, where fooptr is NULL, then it has done nobody any good anyway. What you're trying to do is squeeze a bit more documentation into the function signature, in the hope that your caller is more likely to think "hmm, might fooptr be null here?" when they have to explicitly dereference it, than if they just pass it to you as a pointer. It only goes so far, but as far as it goes it might help.
I don't know C#, but I understand that it's like Java in that references are guaranteed to have valid values (in safe code, at least), but unlike Java in that not all types have a NULL value. So I'd guess that null checks there are rarely worth it: if you're in safe code then don't use a nullable type unless null is a valid input, and if you're in unsafe code then the same reasoning applies as in C++.
On the subject of output validation:
A similar issue arises: in Java you can "fully validate" the output by knowing its type, and that the value isn't null. In C++, you can't "fully validate" the output with a NULL check - for all you know the function returned a pointer to an object on its own stack which has just been unwound. But if NULL is a common invalid return due to the constructs typically used by the author of the callee code, then checking it will help.
In all cases:
Use assertions rather than "real code" to check contracts where possible - once your app is working, you probably don't want the code bloat of every callee checking all its inputs, and every caller checking its return values.
In the case of writing code which is portable to non-standard C++ implementations, then instead of the code in the question which checks for null and also catches the exception, I'd probably have a function like this:
template<typename T>
static inline void nullcheck(T *ptr) {
#if PLATFORM_TRAITS_NEW_RETURNS_NULL
if (ptr == NULL) throw std::bad_alloc();
#endif
}
Then as one of the list of things you do when porting to a new system, you define PLATFORM_TRAITS_NEW_RETURNS_NULL (and maybe some other PLATFORM_TRAITS) correctly. Obviously you can write a header which does this for all the compilers you know about. If someone takes your code and compiles it on a non-standard C++ implementation that you know nothing about, they're fundamentally on their own for bigger reasons than this, so they'll have to do it themselves.
If you write the code and its contract, you are responsible for using it in terms of its contract and ensuring the contract is correct. If you say "returns a non-null" x, then the caller should not check for null. If a null pointer exception then occurs with that reference/pointer, it is your contract that is incorrect.
Null checking should only go to the extreme when using a library that is untrusted, or does not have a proper contract. If it is your development team's code, stress that the contracts must not be broken, and track down the person who uses the contract incorrectly when bugs occur.
Part of this depends on how the code is used -- if it is a method available only within a project vs. a public API, for example. API error checking requires something stronger than an assertion.
So while this is fine within a project where it's supported with unit tests and stuff like that:
internal void DoThis(Something thing)
{
Debug.Assert(thing != null, "Arg [thing] cannot be null.");
//...
}
in a method where you don't have control over who calls it, something like this may be better:
public void DoThis(Something thing)
{
if (thing == null)
{
throw new ArgumentException("Arg [thing] cannot be null.");
}
//...
}
It depends on the situation. The rest of my answer assumes C++.
I never test the return value of new, since all the implementations I use throw bad_alloc on failure. If I see a legacy test for new returning null in any code I'm working on, I cut it out and don't bother to replace it with anything.
Unless small-minded coding standards prohibit it, I assert documented preconditions. Broken code which violates a published contract needs to fail immediately and dramatically.
If the null arises from a runtime failure which isn't due to broken code, I throw. fopen failure and malloc failure (though I rarely if ever use them in C++) would fall into this category.
I don't attempt to recover from allocation failure. bad_alloc gets caught in main().
If the null test is for an object which is a collaborator of my class, I rewrite the code to take it by reference. If the collaborator really might not exist, I use the Null Object design pattern to create a placeholder that fails in well-defined ways.
NULL checking in general is evil, as it adds a small negative mark against the code's testability. With NULL checks everywhere you can't use the "pass null" technique, and that will hit you when unit testing. It's better to have a unit test for the method than a null check.
Check out decent presentation on that issue and unit testing in general by Misko Hevery at http://www.youtube.com/watch?v=wEhu57pih5w&feature=channel
Older versions of Microsoft C++ (and probably others) did not throw an exception for failed allocations via new, but returned NULL. Code that had to run in both standard-conforming and older versions would have the redundant checking that you point out in your first example.
It would be cleaner to make all failed allocations follow the same code path:
if(obj==NULL)
throw std::bad_alloc();
It's widely known that there are procedure-oriented people (focus on doing things the right way) and results-oriented people (get the right answer). Most of us lie somewhere in the middle. Looks like you've found an outlier for procedure-oriented. These people would say "anything's possible unless you understand things perfectly; so prepare for anything." For them, what you see is done properly. For them if you change it, they'll worry because the ducks aren't all lined up.
When working on someone else's code, I try to make sure I know two things.
1. What the programmer intended
2. Why they wrote the code the way they did
For following up on Type A programmers, maybe this helps.
So "How much is enough" ends up being a social question as much as a technical question - there's no agreed-upon way to measure it.
(It drives me nuts too.)
Personally I think null testing is unnecessary in the great majority of cases. If new fails or malloc fails you have bigger issues, and the chance of recovering is just about nil unless you're writing a memory checker! Also, null testing hides bugs a lot in the development phases, since the "null" clauses are frequently just empty and do nothing.
When you can specify which compiler is being used, for system functions such as "new" checking for null is a bug in the code. It means that you will be duplicating the error handling code. Duplicate code is often a source of bugs because often one gets changed and the other doesn't. If you can not specify the compiler or compiler versions, you should be more defensive.
As for internal functions, you should specify the contract and make sure that contract is enforced via unit tests. We had a problem in our code a while back where we either threw an exception or returned null in case of a missing object from our database. This just made things confusing for the caller of the API, so we went through and made it consistent throughout the entire code base and removed the duplicate checks.
The important thing (IMHO) is to not have duplicate error logic where one branch will never be invoked. If you can never invoke code, then you can't test it, and you will never know if it is broken or not.
I'd say it depends a little on your language, but I use ReSharper with C#, and it basically goes out of its way to tell me "this reference could be null", in which case I add a check; if it tells me "this will always be true" for "if (null != oMyThing && ....)", then I listen to it and don't test for null.
Whether to check for null or not greatly depends on the circumstances.
For example in our shop we check parameters to methods we create for null inside the method. The simple reason is that as the original programmer I have a good idea of exactly what the method should do. I understand the context even if the documentation and requirements are incomplete or less than satisfactory. A later programmer tasked with maintenance may not understand the context and may assume, wrongly, that passing null is harmless. If I know null would be harmful and I can anticipate that someone may pass null, I should take the simple step of making sure that the method reacts in a graceful way.
public MyObject MyMethod(object foo)
{
if (foo == null)
{
throw new ArgumentNullException("foo");
}
// do whatever if foo was non-null
}
I only check for NULL when I know what to do when I see NULL. "Know what to do" here means "know how to avoid a crash" or "know what to tell the user besides the location of the crash". For example, if malloc() returns NULL, I usually have no option but to abort the program. On the other hand, if fopen() returns NULL, I can let the user know the file name that could not be open and may be errno. And if find() returns end(), I usually know how to continue without crashing.
Lower level code should check use from higher level code. Usually this means checking arguments, but it can mean checking return values from upcalls. Upcall arguments need not be checked.
The aim is to catch bugs in immediate and obvious ways, as well as documenting the contract in code that does not lie.
I don't think it's bad code. A fair amount of Windows/Linux API calls return NULL on failure of some sort. So, of course, I check for failure in the manner the API specifies. Usually I wind up passing control flow to an error module of some fashion instead of duplicating error-handling code.
If I receive a pointer that is not guaranteed by the language to be non-null, and I am going to dereference it in a way that null will break, or pass it out of my function where I said I wouldn't produce NULLs, I check for NULL.
It is not just about NULLs, a function should check pre- and post-conditions if possible.
It doesn't matter at all if a contract of the function that gave me the pointer says it'll never produce nulls. We all make bugs. There's a good rule that a program shall fail early and often, so instead of passing the bug to another module and have it fail, I'll fail in place. Makes things so much easier to debug when testing. Also in critical systems makes it easier to keep the system sane.
Also, if an exception escapes main, the stack may not be unwound, preventing destructors from running at all (see the C++ standard on terminate()). That can be serious, so leaving bad_alloc unchecked can be more dangerous than it seems.
Fail with assert vs. fail with a run time error is quite a different topic.
Checking for NULL after new, when the standard behavior of new has not been altered to return NULL instead of throwing, seems obsolete.
There's another problem, which is that even if malloc returned a valid pointer, it doesn't yet mean you have allocated memory and can use it. But that is another story.
My first problem with this is that it leads to code which is littered with null checks and the like. It hurts readability, and I'd even go as far as to say that it hurts maintainability, because it really is easy to forget a null check if you're writing a piece of code where a certain reference really should never be null. And you just know that the null checks will be missing in some places, which actually makes debugging harder than it needs to be. Had the original exception not been caught and replaced with a faulty return value, then we would've gotten a valuable exception object with an informative stack trace. What does a missing null check give you? A NullReferenceException in a piece of code that makes you go: wtf? this reference should never be null!
So then you need to start figuring out how the code was called, and why the reference could possibly be null. This can take a lot of time and really hurts the efficiency of your debugging efforts. Eventually you’ll figure out the real problem, but odds are that it was hidden pretty deeply and you spent a lot more time searching for it than you should have.
Another problem with null checks all over the place is that some developers don’t really take the time to properly think about the real problem when they get a NullReferenceException. I’ve actually seen quite a few developers just add a null check above the code where the NullReferenceException occurred. Great, the exception no longer occurs! Hurray! We can go home now! Umm… how bout ‘no you can’t and you deserve an elbow to the face’? The real bug might not cause an exception anymore, but now you probably have missing or faulty behavior… and no exception! Which is even more painful and takes even more time to debug.
At first, this seemed like a strange question: null checks are great and a valuable tool. Checking that new returns null is definitely silly. I'm just going to ignore the fact that there are languages that allow that. I'm sure there are valid reasons, but I really don't think I can handle living in that reality :) All kidding aside, it seems like you should at least have to specify that the new should return null when there isn't enough memory.
Anyway, checking for null where appropriate leads to cleaner code. I'd go so far as to say that never assigning function parameters default values is the next logical step. To go even further, returning empty arrays, etc. where appropriate leads to even cleaner code. It is nice to not have to worry about getting nulls except where they are logically meaningful. Nulls as error values are better avoided.
Using asserts is a really great idea. Especially if it gives you the option of turning them off at runtime. Plus, it is a more explicitly contractual style :)
I don't really know much about the internals of compiler and JIT optimizations, but I usually try to use "common sense" to guess what could be optimized and what couldn't. So there I was writing a simple unit test method today:
@Test // [Test] in C#
public void testDefaultConstructor() {
new MyObject();
}
This method is actually all I need. It checks that the default constructor exists and runs without exceptions.
But then I started to think about the effect of compiler/JIT optimizations. Could the compiler/JIT optimize this method by eliminating the new MyObject(); statement completely? Of course, it would need to determine that the call graph does not have side effects to other objects, which is the typical case for a normal constructor that simply initializes the internal state of the object.
I presume that only the JIT would be allowed to perform such an optimization. This probably means that it's not something I should worry about, because the test method is being performed only once. Are my assumptions correct?
Nevertheless, I'm trying to think about the general subject. When I thought about how to prevent this method from being optimized, I thought I may assertTrue(new MyObject().toString() != null), but this is very dependent on the actual implementation of the toString() method, and even then, the JIT can determine that toString() method always returns a non-null string (e.g. if actually Object.toString() is being called), and thus optimize the whole branch. So this way wouldn't work.
I know that in C# I can use [MethodImpl(MethodImplOptions.NoOptimization)], but this is not what I'm actually looking for. I'm hoping to find a (language-independent) way of making sure that some specific part(s) of my code will actually run as I expect, without the JIT interfering in this process.
Additionally, are there any typical optimization cases I should be aware of when creating my unit tests?
Thanks a lot!
Don't worry about it. It's not allowed to ever optimize anything that can make a difference to your system (except for speed). If you new an object, code gets called, memory gets allocated, it HAS to work.
If you had it protected by an if(false), where false is a final, it could be optimized out of the system completely, then it could detect that the method doesn't do anything and optimize IT out (in theory).
Edit: by the way, it can also be smart enough to determine that this method:
static void newIfTrue(boolean b) {
    if (b)
        new ThisClass();
}
will always do nothing if b is false, and eventually figure out that at one point in your code b is always false, and so compile this routine out of that code completely.
This is where the JIT can do stuff that's virtually impossible in any non-managed language.
I think if you are worried about it getting optimized away, you may be doing a bit of testing overkill.
In a static language, I tend to think of the compiler as a test. If it passes compilation, that means that certain things are there (like methods). If you don't have another test that exercises your default constructor (which would prove it won't throw exceptions), you may want to think about why you are writing that default constructor in the first place (YAGNI and all that).
I know there are people that don't agree with me, but I feel like this sort of thing is just something that will bloat out your number of tests for no useful reason, even looking at it through TDD goggles.
Think about it this way:
Let's assume that the compiler can determine that the call graph doesn't have any side effects (I don't think it is possible; I vaguely remember something about P=NP from my CS courses). It would then optimize away any method that doesn't have side effects. Since most tests don't have, and shouldn't have, any side effects, the compiler could optimize them all away.
The JIT is only allowed to perform operations that do not affect the guaranteed semantics of the language. Theoretically, it could remove the allocation and call to the MyObject constructor if it can guarantee that the call has no side effects and can never throw an exception (not counting OutOfMemoryError).
In other words, if the JIT optimizes the call out of your test, then your test would have passed anyway.
PS: Note that this applies because you are doing functionality testing as opposed to performance testing. In performance testing, it's important to make sure the JIT does not optimize away the operation you are measuring, else your results become useless.
It seems that in C# I could do this:
[Test]
public void testDefaultConstructor() {
GC.KeepAlive(new MyObject());
}
AFAIU, the GC.KeepAlive method will not be inlined by the JIT, so the code will be guaranteed to work as expected. However, I don't know a similar construct in Java.
Every I/O is a side effect, so you can just put
Object obj = new MyObject();
System.out.println(obj.toString());
and you're fine.
Why should it matter? If the compiler/JIT can statically determine no asserts are going to be hit (which could cause side effects), then you're fine.