Which is less expensive and preferable: put1 or put2?
Map<String, Animal> map = new HashMap<String, Animal>();
void put1(){
    for (.....)
        if (Animal.class.isAssignableFrom(item[i].getClass()))
            map.put(key[i], item[i]);
}
void put2(){
    for (.....)
        try {
            map.put(key[i], item[i]);
        } catch (...) {}
}
Question revision:
The question wasn't that clear, so let me revise it a little. I forgot the cast, so that put2 depends on a cast exception being thrown. isAssignableFrom(), isInstanceOf() and instanceof are similar functionally and therefore incur the same expense; one is a method that includes subclasses, while the 2nd is for exact type matching and the 3rd is the operator version. Both reflective methods and exceptions are expensive operations.
My question is for those who have done some benchmarking in this area: which is less expensive and preferable, instanceof/isAssignableFrom or a cast exception?
void put1(){
    for (.....)
        if (Animal.class.isAssignableFrom(item[i].getClass()))
            map.put(key[i], (Animal)item[i]);
}
void put2(){
    for (.....)
        try {
            map.put(key[i], (Animal)item[i]);
        } catch (...) {}
}
Probably you want:
if (item[i] instanceof Animal)
map.put(key[i], (Animal) item[i]);
This is almost certainly much better than calling isAssignableFrom.
Or in C# (since you added the c# tag):
var a = item[i] as Animal;
if (a != null)
map[key[i]] = a;
EDIT: The updated question is which is better: instanceof or cast-and-catch. The functionality is basically the same. The performance difference might not be significant and I would have to measure it; generally throwing an exception is slow, but I don't know about the rest. So I would decide based on style. Say what you mean.
If you always expect item[i] to be an Animal, and you're just being extra careful, use cast-and-catch. Otherwise I find it much clearer to use instanceof, because that plainly says what you mean: "if this object is an Animal, put it in the map".
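A minimal Java sketch of the two styles, for illustration only. Animal and Dog are stand-in classes, and the helper names putWithInstanceof and putWithCastAndCatch are made up here; this is not a benchmark. It also shows one behavioural difference that is easy to miss: a null item fails the instanceof test but passes the cast.

```java
import java.util.HashMap;
import java.util.Map;

class PutDemo {
    // Stand-in types, assumed for this example.
    static class Animal {}
    static class Dog extends Animal {}

    // Check first: only objects that really are Animals reach the map.
    static Map<String, Animal> putWithInstanceof(String[] keys, Object[] items) {
        Map<String, Animal> map = new HashMap<>();
        for (int i = 0; i < items.length; i++) {
            if (items[i] instanceof Animal) {
                map.put(keys[i], (Animal) items[i]);
            }
        }
        return map;
    }

    // Cast and catch: a non-Animal raises ClassCastException, which is swallowed.
    static Map<String, Animal> putWithCastAndCatch(String[] keys, Object[] items) {
        Map<String, Animal> map = new HashMap<>();
        for (int i = 0; i < items.length; i++) {
            try {
                map.put(keys[i], (Animal) items[i]);
            } catch (ClassCastException e) {
                // not an Animal: skip it
            }
        }
        return map;
    }

    public static void main(String[] args) {
        String[] keys = {"dog", "text", "nothing"};
        Object[] items = {new Dog(), "not an animal", null};
        System.out.println(putWithInstanceof(keys, items).size());   // 1
        System.out.println(putWithCastAndCatch(keys, items).size()); // 2
    }
}
```

The sizes differ because casting null to Animal succeeds, so the cast-and-catch version stores a null value under "nothing", while instanceof is false for null and skips it.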
I'm confused. If item[i] is not an Animal, then how does map.put(key[i], item[i]) even compile?
That said, the first method says what you're intending to do, although I believe instanceof would be an even better check.
Typically exception handling will be significantly slower because, since it is supposed to be used for exceptional things (rarely occurring) not much work is spent by VM makers on speeding it up.
I would consider the try/catch version of your code to be an abuse of exception handling and would never consider doing it. The fact that you are thinking of doing something like this probably means you have a poor design: items should probably be an Animal[] rather than something else, in which case you don't need to check at runtime at all. Let the compiler do the work for you.
I agree with a previous answer - this will not compile.
But, in my opinion, whether it is an exception or a check depends on the purpose of the function.
Is item[i] not being an Animal an error or exceptional case? Is it expected to happen rarely? In that case, it should be an exception.
If it is part of the logic, meaning you expect item[i] to be many things and only want to put it in the map when it is an Animal, then the instanceof check is the right way.
UPDATE :
I'll also add an example (bit lame) :
Which is better :
(1)
if ( aNumber < 100 ) {
    processNumber(aNumber);
}
or (2)
try {
    processNumber(aNumber); //Throws exception if aNumber >= 100
} catch (Exception e) {
}
This depends on what the program does. (1) may be used for counting numbers < 100 for any integer input. (2) will be used if processNumber expects a percentage value which cannot be greater than 100.
The difference is that it is an error for program (2) to get aNumber >= 100. For program (1), however, aNumber >= 100 is valid; "something" simply happens only when aNumber is < 100.
PS - This may not be helpful to you at all, and I apologize if this is the case.
Your two alternatives are not really equivalent. Which one to choose, depends totally on what your code is supposed to do:
If the item is expected to always be an Animal, then you should use put2 (which will throw if that's not the case).
If the item may or may not be an Animal, you should use put1 (which checks a condition, not an error).
Never care about performance in the first place, if you're writing code!
I came across this implementation in Enumerable.cs by reflector.
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    //check parameters
    TSource local = default(TSource);
    long num = 0L;
    foreach (TSource local2 in source)
    {
        if (predicate(local2))
        {
            local = local2;
            num += 1L;
            //I think they should do something here like:
            //if (num >= 2L) throw Error.MoreThanOneMatch();
            //no necessary to continue
        }
    }
    //return different results by num's value
}
I think they should break out of the loop as soon as two items meet the condition; why do they always loop through the whole collection? In case Reflector disassembled the DLL incorrectly, I wrote a simple test:
class DataItem
{
    private int _num;
    public DataItem(int num)
    {
        _num = num;
    }
    public int Num
    {
        get { Console.WriteLine("getting " + _num); return _num; }
    }
}
var source = Enumerable.Range(1, 10).Select(x => new DataItem(x));
var result = source.Single(x => x.Num < 5);
For this test case, I expected it to print "getting 1, getting 2" and then throw an exception. In fact, it prints everything from "getting 1" to "getting 10" and only then throws.
Is there any algorithmic reason they implement this method like this?
EDIT: Some of you thought it's because of side effects of the predicate expression. After some deep thought and a few test cases, I have concluded that side effects don't matter in this case. Please provide an example if you disagree with this conclusion.
Yes, I do find it slightly strange especially because the overload that doesn't take a predicate (i.e. works on just the sequence) does seem to have the quick-throw 'optimization'.
In the BCL's defence, however, I would say that the InvalidOperationException that Single throws is a boneheaded exception that shouldn't normally be used for control-flow. It's not necessary for such cases to be optimized by the library.
Code that uses Single where zero or multiple matches is a perfectly valid possibility, such as:
try
{
    var item = source.Single(predicate);
    DoSomething(item);
}
catch(InvalidOperationException)
{
    DoSomethingElseUnexceptional();
}
should be refactored to code that doesn't use the exception for control-flow, such as (only a sample; this can be implemented more efficiently):
var firstTwo = source.Where(predicate).Take(2).ToArray();
if(firstTwo.Length == 1)
{
    // Note that this won't fail. If it does, this code has a bug.
    DoSomething(firstTwo.Single());
}
else
{
    DoSomethingElseUnexceptional();
}
In other words, we should leave the use of Single to cases where we expect the sequence to contain only one match. It should behave identically to First, but with the additional run-time assertion that the sequence doesn't contain multiple matches. Like any other assertion, failure, i.e. cases when Single throws, should be used to represent bugs in the program (either in the method running the query or in the arguments passed to it by the caller).
This leaves us with two cases:
The assertion holds: There is a single match. In this case, we want Single to consume the entire sequence anyway to assert our claim. There's no benefit to the 'optimization'. In fact, one could argue that the sample implementation of the 'optimization' provided by the OP will actually be slower because of the check on every iteration of the loop.
The assertion fails: There are zero or multiple matches. In this case, we do throw later than we could, but this isn't such a big deal since the exception is boneheaded: it is indicative of a bug that must be fixed.
To sum up, if the 'poor implementation' is biting you performance-wise in production, either:
You are using Single incorrectly.
You have a bug in your program. Once the bug is fixed, this particular performance problem will go away.
EDIT: Clarified my point.
EDIT: Here's a valid use of Single, where failure indicates bugs in the calling code (bad argument):
public static User GetUserById(this IEnumerable<User> users, string id)
{
    if(users == null)
        throw new ArgumentNullException("users");
    // Perfectly fine if documented that a failure in the query
    // is treated as an exceptional circumstance. Caller's job
    // to guarantee pre-condition.
    return users.Single(user => user.Id == id);
}
Update:
I got some very good feedback to my answer, which has made me re-think. Thus I will first provide the answer that states my "new" point of view; you can still find my original answer just below. Make sure to read the comments in-between to understand where my first answer misses the point.
New answer:
Let's assume that Single should throw an exception when its pre-condition is not met; that is, when Single detects that either none, or more than one, of the items in the collection match the predicate.
Single can only succeed without throwing an exception by going through the whole collection. It has to make sure that there is exactly one matching item, so it will have to check all items in the collection.
This means that throwing an exception early (as soon as it finds a second matching item) is essentially an optimization that you can only benefit from when Single's pre-condition cannot be met and when it will throw an exception.
As user CodeInChaos says clearly in a comment below, the optimization wouldn't be wrong, but it is meaningless, because one usually introduces optimizations that will benefit correctly-working code, not optimizations that will benefit malfunctioning code.
Thus, it is actually correct that Single could throw an exception early; but it doesn't have to, because there's practically no added benefit.
Old answer:
I cannot give a technical reason why that method is implemented the way it is, since I didn't implement it. But I can state my understanding of the Single operator's purpose, and from there draw my personal conclusion that it is indeed badly implemented:
My understanding of Single:
What is the purpose of Single, and how is it different from e.g. First or Last?
Using the Single operator basically expresses one's assumption that exactly one item must be returned from the collection:
If you don't specify a predicate, it should mean that the collection is expected to contain exactly one item.
If you do specify a predicate, it should mean that exactly one item in the collection is expected to satisfy that condition. (Using a predicate should have the same effect as items.Where(predicate).Single().)
This is what makes Single different from other operators such as First, Last, or Take(1). None of those operators have the requirement that there should be exactly one (matching) item.
When should Single throw an exception?
Basically, when it finds that your assumption was wrong; i.e. when the underlying collection does not yield exactly one (matching) item. That is, when there are zero or more than one items.
When should Single be used?
The use of Single is appropriate when your program's logic can guarantee that the collection will yield exactly one item, and one item only. If an exception gets thrown, that should mean that your program's logic contains a bug.
If you process "unreliable" collections, such as I/O input, you should first validate the input before you pass it to Single. Single, together with an exception catch block, is not appropriate for making sure that the collection has only one matching item. By the time you invoke Single, you should already have made sure that there'll be only one matching item.
Conclusion:
The above states my understanding of the Single LINQ operator. If you follow and agree with this understanding, you should come to the conclusion that Single ought to throw an exception as early as possible. There is no reason to wait until the end of the (possibly very large) collection, because the pre-condition of Single is violated as soon as it detects a second (matching) item in the collection.
When considering this implementation we must remember that this is the BCL: general code that is supposed to work well enough in all sorts of scenarios.
First, take these scenarios:
Iterate over 10 numbers, where the first and second elements are equal
Iterate over 1.000.000 numbers, where the first and third elements are equal
The original algorithm will work well enough for 10 items, but 1M will have a severe waste of cycles. So in these cases where we know that there are two or more early in the sequences, the proposed optimization would have a nice effect.
Then, look at these scenarios:
Iterate over 10 numbers, where the first and last elements are equal
Iterate over 1.000.000 numbers, where the first and last elements are equal
In these scenarios the algorithm is still required to inspect every item in the lists. There is no shortcut. The original algorithm performs well enough and fulfills the contract. Changing the algorithm by introducing an if on each iteration will actually decrease performance. For 10 items it will be negligible, but for 1M it will be a big hit.
IMO, the original implementation is the correct one, since it is good enough for most scenarios. Knowing the implementation of Single is good though, because it enables us to make smart decisions based on what we know about the sequences we use it on. If performance measurements in one particular scenario show that Single is causing a bottleneck, well: then we can implement our own variant that works better in that particular scenario.
Update: as CodeInChaos and Eamon have correctly pointed out, the if test introduced in the optimization is indeed not performed on each item, only within the predicate match block. I have in my example completely overlooked the fact that the proposed changes will not affect the overall performance of the implementation.
I agree that introducing the optimization would probably benefit all scenarios. It is good to see though that eventually, the decision to implement the optimization is made on the basis of performance measurements.
I think it's a premature optimization "bug".
Why this is NOT reasonable behavior due to side effects
Some have argued that due to side effects, it should be expected that the entire list is evaluated. After all, in the correct case (the sequence indeed has just 1 element) it is completely enumerated, and for consistency with this normal case it's nicer to enumerate the entire sequence in all cases.
Although that's a reasonable argument, it flies in the face of the general practice throughout the LINQ libraries: they use lazy evaluation everywhere. It's not general practice to fully enumerate sequences except where absolutely necessary; indeed, several methods prefer using IList.Count when available over any iteration at all - even when that iteration may have side effects.
Further, .Single() without predicate does not exhibit this behavior: that terminates as soon as possible. If the argument were that .Single() should respect side-effects of enumeration, you'd expect all overloads to do so equivalently.
Why the case for speed doesn't hold
Peter Lillevold made the interesting observation that it may be faster to do...
foreach(var elem in elems)
    if(pred(elem)) {
        retval=elem;
        count++;
    }
if(count!=1)...
than
foreach(var elem in elems)
    if(pred(elem)) {
        retval=elem;
        count++;
        if(count>1) ...
    }
if(count==0)...
After all, the second version, which would exit the iteration as soon as the first conflict is detected, would require an extra test in the loop - a test which in the "correct" case is purely ballast. Neat theory, right?
Except that's not borne out by the numbers; for example, on my machine (YMMV) Enumerable.Range(0,100000000).Where(x=>x==123).Single() is actually faster than Enumerable.Range(0,100000000).Single(x=>x==123)!
It's possibly a JITter quirk of this precise expression on this machine - I'm not claiming that Where followed by predicateless Single is always faster.
But whatever the case, the fail-fast solution is very unlikely to be significantly slower. After all, even in the normal case, we're dealing with a cheap branch: a branch that is never taken and thus easy on the branch predictor. And of course; the branch is further only ever encountered when pred holds - that's once per call in the normal case. That cost is simply negligible compared to the cost of the delegate call pred and its implementation, plus the cost of the interface methods .MoveNext() and .get_Current() and their implementations.
It's simply extremely unlikely that you'll notice the performance degradation caused by one predictable branch in comparison to all that other abstraction penalty - not to mention the fact that most sequences and predicates actually do something themselves.
It seems very clear to me.
Single is intended for the case where the caller knows that the enumeration contains exactly one match, since in any other case an expensive exception is thrown.
For this use case, the overload that takes a predicate must iterate over the whole enumeration. It is slightly faster to do so without an additional condition on every loop.
In my view the current implementation is correct: it is optimized for the expected use case of an enumeration that contains exactly one matching element.
That does appear to be a bad implementation, in my opinion.
Just to illustrate the potential severity of the problem:
var oneMillion = Enumerable.Range(1, 1000000)
    .Select(x => { Console.WriteLine(x); return x; });
int firstEven = oneMillion.Single(x => x % 2 == 0);
The above will output all the integers from 1 to 1000000 before throwing an exception.
It's a head-scratcher for sure.
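To put rough numbers on the example above, here is a small Java sketch (a port for illustration, not the BCL code). The names singleFullScan, singleFailFast, visitedBy and the visited counter are all made up for this example; the two methods only mimic the "scan everything" and "throw on the second match" strategies being compared, and IllegalStateException stands in for .NET's InvalidOperationException.

```java
import java.util.function.Predicate;
import java.util.stream.IntStream;

class SingleDemo {
    // Counts how many source elements a variant pulls before it returns or throws.
    static long visited;

    // Mimics the strategy in the question: always consume the whole source.
    static <T> T singleFullScan(Iterable<T> source, Predicate<? super T> predicate) {
        T found = null;
        long count = 0;
        for (T item : source) {
            visited++;
            if (predicate.test(item)) {
                found = item;
                count++;
            }
        }
        if (count != 1) {
            throw new IllegalStateException(count == 0 ? "no match" : "more than one match");
        }
        return found;
    }

    // Fail-fast variant: throws as soon as a second match is seen.
    static <T> T singleFailFast(Iterable<T> source, Predicate<? super T> predicate) {
        T found = null;
        boolean seen = false;
        for (T item : source) {
            visited++;
            if (predicate.test(item)) {
                if (seen) {
                    throw new IllegalStateException("more than one match");
                }
                found = item;
                seen = true;
            }
        }
        if (!seen) {
            throw new IllegalStateException("no match");
        }
        return found;
    }

    // Runs one variant over 1..1000000 looking for "the" even number
    // and reports how many elements were visited before it threw.
    static long visitedBy(boolean failFast) {
        Iterable<Integer> oneMillion = () -> IntStream.rangeClosed(1, 1_000_000).iterator();
        visited = 0;
        try {
            if (failFast) {
                singleFailFast(oneMillion, x -> x % 2 == 0);
            } else {
                singleFullScan(oneMillion, x -> x % 2 == 0);
            }
        } catch (IllegalStateException expected) {
            // both variants throw here: there are many even numbers
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(visitedBy(false)); // 1000000
        System.out.println(visitedBy(true));  // 4
    }
}
```

When there is exactly one match, both variants visit every element, so the difference only shows up in the failing case.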
I only found this question after filing a report at https://connect.microsoft.com/VisualStudio/feedback/details/810457/public-static-tsource-single-tsource-this-ienumerable-tsource-source-func-tsource-bool-predicate-doesnt-throw-immediately-on-second-matching-result#
The side-effect argument doesn't hold water, because:
Having side-effects isn't really functional, and they're called Func for a reason.
If you do want side-effects, it makes no more sense to claim the version that has the side-effects throughout the whole sequence is desirable than it does to claim so for the version that throws immediately.
It does not match the behaviour of First or the other overload of Single.
It does not match at least some other implementations of Single, e.g. Linq2SQL uses TOP 2 to ensure that only the two matching cases needed to test for more than one match are returned.
We can construct cases where we should expect a program to halt, but it does not halt.
We can construct cases where OverflowException is thrown, which is not documented behaviour, and hence clearly a bug.
Most importantly of all, if we're in a condition where we expected the sequence to have only one matching element, and yet we're not, then something has clearly gone wrong. Apart from the general principle that the only thing you should do upon detecting an error state is clean-up (and this implementation delays that) before throwing, the case of a sequence having more than one matching element is going to overlap with the case of a sequence having more elements in total than expected - perhaps because the sequence has a bug that is causing it to loop unexpectedly. So it's precisely in one possible set of bugs that should trigger the exception that the exception is most delayed.
Edit:
Peter Lillevold's mention of a repeated test may be a reason why the author chose to take the approach they did, as an optimisation for the non-exceptional case. If so it was needless though, even aside from Eamon Nerbonne showing it wouldn't improve much. There's no need to have a repeated test in the initial loop, as we can just change what we're testing for upon the first match:
public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if(source == null)
        throw new ArgumentNullException("source");
    if(predicate == null)
        throw new ArgumentNullException("predicate");
    using(IEnumerator<TSource> en = source.GetEnumerator())
    {
        while(en.MoveNext())
        {
            TSource cur = en.Current;
            if(predicate(cur))
            {
                while(en.MoveNext())
                    if(predicate(en.Current))
                        throw new InvalidOperationException("Sequence contains more than one matching element");
                return cur;
            }
        }
    }
    throw new InvalidOperationException("Sequence contains no matching element");
}
Code is read more often than updated. Writing more readable code is better than writing powerful and geeky code when compilers can optimize for best execution.
For example see below code - this code can be compressed by combining the nested if statements, but will the compiler not optimize this code for best execution anyway while we get to maintain the readability of it?
// yield sunRays when sky is blue.
// yield sunRays when sky is not blue and sun is not present.
if (yieldWhenSkyIsBlue)
{
    // if sky is blue and sun is present -> yield sunRaysObjB.
    if (sunObjA != null)
    {
        yield return sunRaysObjB;
    }
    else
    {
        // do not yield ;
    }
}
else
{
    // if sky is not blue and sun is not present -> yield sunRaysObjB.
    if (sunObjA == null)
    {
        yield return sunRaysObjB;
    }
}
As opposed to something like this :
// yield sunRays when (sky is blue) or (sun is not present and sky is blue).
// (this interpretation is a bit misleading as compared to first one?)
if ((sunObjA == null && yieldWhenSkyIsBlue == false) || (yieldWhenSkyIsBlue && sunObjA != null))
{
    yield return sunRaysObjB;
}
Does reading the first version depict the use case better for future enhancements or updates? The second version of the code is shorter, but reading it does not make the use case very apparent, or does it? Are there other advantages to the second version apart from concise code?
update #1: yes, it returns ObjB in both cases, but depending on the condition it may not yield at all, so the strategy decides when to yield and when not to (one more reason why readability is important).
update #2: updated to cite a better example; copied the syntax from stripplingWarrior.
update #3 : updated for "What do you expect to happen when the sun is out and the sky is blue".
I think the second code example is much more readable, and has the advantage of being pretty optimal anyway.
Most programmers will find this logic flow to be obvious and natural: you will return ObjB if ObjA is null, or if it's not null and howtoYieldFalg is set.
But if I had to choose between making code like this more readable and making it optimal, I'd make it readable first. Only if I discovered that it's the source of a bottleneck would I bother optimizing it. In this particular case, I can pretty much guarantee that your use of yield return will introduce way more overhead than a suboptimal evaluation of your conditionals.
Update
Take another look at your code samples: they are not logically equivalent. What do you expect to happen when the sun is out and the sky is blue? The second code sample correctly allows sun rays to shine in that case, whereas the first example does not.
The fact that it was so easy to introduce a bug in the first case which so many people failed to catch for so long should be ample evidence to show that the second approach is better. All those nested if/else statements can be tricky to keep straight, even to an experienced programmer. Simple boolean logic is a lot easier to keep straight, especially once you use variable names that give it meaning.
Update 2
Based on the further explanation, and with a little creativity, I'm going to suggest an approach that uses both comments and variable names to increase clarity:
/* Explanation: We live on a strange planet where the sun's
 * rays can shine if the sky is blue while the sun is out,
 * or if the sky is not blue and there is no sun. */
bool sunIsPresent = sunObjA != null;
if ((skyIsBlue && sunIsPresent) ||
    (!skyIsBlue && !sunIsPresent))
{
    yield return sunRaysObjB;
}
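As a quick sanity check, the nested form and the combined boolean form can be compared over all four input combinations. The following Java sketch does that; the method names nested, combined and agree are made up for illustration, and the booleans stand in for the null checks on sunObjA. It also shows that, as the conditions are currently written, both reduce to "the two flags agree".

```java
class SunRaysDemo {
    // Nested form, mirroring the nested if/else version.
    static boolean nested(boolean skyIsBlue, boolean sunIsPresent) {
        if (skyIsBlue) {
            return sunIsPresent;
        } else {
            return !sunIsPresent;
        }
    }

    // Combined form, mirroring the single boolean expression.
    static boolean combined(boolean skyIsBlue, boolean sunIsPresent) {
        return (skyIsBlue && sunIsPresent) || (!skyIsBlue && !sunIsPresent);
    }

    // The same condition stated as plain equality of the two flags.
    static boolean agree(boolean skyIsBlue, boolean sunIsPresent) {
        return skyIsBlue == sunIsPresent;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean sky : values) {
            for (boolean sun : values) {
                // One row of the truth table per combination of inputs.
                System.out.println(sky + " " + sun + " -> "
                        + nested(sky, sun) + " " + combined(sky, sun) + " " + agree(sky, sun));
            }
        }
    }
}
```

An exhaustive check like this is cheap for two booleans and would have caught the kind of mismatch between the two versions discussed in the first update.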
The compiler optimizes right through any way you organize your program's control flow, so you really do not have to worry about it.
The weakness of compilers though, is they only optimize based on preserving code semantics, not preserving the meaning you intend. I compiled both your examples in LLVM, and here are the control flow graphs generated:
[Two control flow graph images, one for each example, appeared here.]
I was surprised to find the two CFG's are slightly different. You will note that first is an instruction smaller, but in the second graph, there exists a path to the exit node which only passes through one comparison, whereas in the first, two comparisons are always necessary.
In fact, further tracing of possible routes yields that the first example has possible routes of 6,8,8,6 instructions long, while the second has routes of 8,10,10 respectively. In BOTH cases the average run length is 7 instructions long, but we can see that the first case has better best-time run lengths. Without more information the compiler cannot tell which is better.
tldr: Compilers do magic stuff, don't worry about it, code how you think is best.
This is probably not the popular opinion but I'd definitely not rely on the compiler to perform optimizations of this type. (It may do it, I don't know.) I don't see the second example as geeky - for me it describes more clearly that the two conditions are connected.
Typically I try to write as optimal code as possible without making it very cryptic and then let the compiler optimize that.
Though I haven't tested this particular case, I'm willing to bet that there will be no significant difference between the generated code, if any at all.
Unless you're doing it for fun or a specialized use case, I would argue human-readability is by far the more important quality of good code. The compiler is going to collapse much of your expressive code into more efficient forms, and what it misses you probably won't ever notice.
Given that, idiomatic code is easier to read even when it's less concise. Experienced readers of a language are going to recognize a common pattern more quickly than unfamiliar code that is, arguably 'more human' but breaks the familiar pattern. Looping/incrementing constructs are a good example of code that should be unsurprising. So, my approach is: Be expressive but not too clever.
[ This is a result of Best Practice: Should functions return null or an empty object? but I'm trying to be very general. ]
In a lot of legacy (um...production) C++ code that I've seen, there is a tendency to write a lot of NULL (or similar) checks to test pointers. Many of these get added near the end of a release cycle when adding a NULL-check provides a quick fix to a crash caused by the pointer dereference--and there isn't a lot of time to investigate.
To combat this, I started to write code that took a (const) reference parameter instead of the (much) more common technique of passing a pointer. No pointer, no desire to check for NULL (ignoring the corner case of actually having a null reference).
In C#, the same C++ "problem" is present: the desire to check every unknown reference against null (ArgumentNullException) and to quickly fix NullReferenceExceptions by adding a null check.
It seems to me, one way to prevent this is to avoid null objects in the first place by using empty objects (String.Empty, EventArgs.Empty) instead. Another would be to throw an exception rather than return null.
I'm just starting to learn F#, but it appears there are far fewer null objects in that environment. So maybe you don't really have to have a lot of null references floating around?
Am I barking up the wrong tree here?
Passing non-null just to avoid a NullReferenceException is trading a straightforward, easy-to-solve problem ("it blows up because it's null") for a much more subtle, hard-to-debug problem ("something several calls down the stack is not behaving as expected because much earlier it got some object which has no meaningful information but isn't null").
NullReferenceException is a wonderful thing! It fails hard, loud, fast, and it's almost always quick and easy to identify and fix. It's my favorite exception, because I know when I see it, my task is only going to take about 2 minutes. Contrast this with a confusing QA or customer report trying to describe strange behavior that has to be reproduced and traced back to the origin. Yuck.
It all comes down to what you, as a method or piece of code, can reasonably infer about the code which called you. If you are handed a null reference, and you can reasonably infer what the caller might have meant by null (maybe an empty collection, for example?) then you should definitely just deal with the nulls. However, if you can't reasonably infer what to do with a null, or what the caller means by null (for example, the calling code is telling you to open a file and gives the location as null), you should throw an ArgumentNullException.
If you maintain proper coding practices like this at every "gateway" point (the logical bounds of functionality in your code), NullReferenceExceptions should be much rarer.
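The two cases described above might look like this in Java, where the closest counterpart to throwing ArgumentNullException is Objects.requireNonNull, which throws a NullPointerException with the given message. The class and method names (GatewayDemo, countItems, describeFile) are hypothetical and exist only for this sketch.

```java
import java.util.Collections;
import java.util.List;
import java.util.Objects;

class GatewayDemo {
    // Here null has an obvious meaning ("no items"), so the method tolerates it.
    static int countItems(List<String> items) {
        List<String> safe = (items != null) ? items : Collections.<String>emptyList();
        return safe.size();
    }

    // Here a null location has no sensible meaning, so fail immediately at the gateway.
    static String describeFile(String location) {
        Objects.requireNonNull(location, "location must not be null");
        return "file at " + location;
    }

    public static void main(String[] args) {
        System.out.println(countItems(null)); // 0
        try {
            describeFile(null);
        } catch (NullPointerException e) {
            System.out.println(e.getMessage()); // location must not be null
        }
    }
}
```

The point of the second method is that the failure names the offending argument at the boundary, not several calls further down the stack.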
I tend to be dubious of code with lots of NULLs, and try to refactor them away where possible with exceptions, empty collections, Java Optionals, and so on.
The "Introduce Null Object" pattern in Martin Fowler's Refactoring (page 260) may also be helpful. A Null Object responds to all the methods a real object would, but in a way that "does the right thing". So rather than always check an Order to see if order.getDiscountPolicy() is NULL, make sure the Order has a NullDiscountPolicy in these cases. This streamlines the control logic.
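A minimal Java sketch of the idea, assuming a DiscountPolicy interface with a single apply method. Order, getDiscountPolicy() and NullDiscountPolicy come from the description above; PercentageDiscount, apply and total are invented here for illustration and are not taken from the book.

```java
class NullObjectDemo {
    interface DiscountPolicy {
        double apply(double price);
    }

    // Hypothetical "real" policy: takes a fraction off the price.
    static class PercentageDiscount implements DiscountPolicy {
        private final double fraction;

        PercentageDiscount(double fraction) {
            this.fraction = fraction;
        }

        public double apply(double price) {
            return price * (1.0 - fraction);
        }
    }

    // The Null Object: responds to the same method, but leaves the price unchanged.
    static class NullDiscountPolicy implements DiscountPolicy {
        public double apply(double price) {
            return price;
        }
    }

    static class Order {
        private final double price;
        private final DiscountPolicy discountPolicy;

        Order(double price, DiscountPolicy discountPolicy) {
            this.price = price;
            // Normalise a missing policy once, at construction time.
            this.discountPolicy = (discountPolicy != null) ? discountPolicy : new NullDiscountPolicy();
        }

        DiscountPolicy getDiscountPolicy() {
            return discountPolicy; // never null
        }

        double total() {
            return getDiscountPolicy().apply(price); // no null check needed here
        }
    }

    public static void main(String[] args) {
        System.out.println(new Order(100.0, null).total());                         // 100.0
        System.out.println(new Order(100.0, new PercentageDiscount(0.25)).total()); // 75.0
    }
}
```

The single null check lives in the constructor, so every caller of getDiscountPolicy() can use the result directly.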
Null gets my vote. Then again, I'm of the 'fail-fast' mindset.
String.IsNullOrEmpty(...) is very helpful too, I guess it catches either situation: null or empty strings. You could write a similar function for all your classes you're passing around.
If you are writing code that returns null as an error condition, then don't: generally, you should throw an exception instead - far harder to miss.
If you are consuming code that you fear may return null, then mostly these are boneheaded exceptions: perhaps do some Debug.Assert checks at the caller to sense-check the output during development. You shouldn't really need vast numbers of null checks in your production code, but if some 3rd party library returns lots of nulls unpredictably, then sure: do the checks.
In 4.0, you might want to look at code-contracts; this gives you much better control to say "this argument should never be passed in as null", "this function never returns null", etc - and have the system validate those claims during static analysis (i.e. when you build).
The thing about null is that it doesn't come with meaning. It is merely the absence of an object.
So, if you really mean an empty string/collection/whatever, always return the relevant object and never null. If the language in question allows you to specify that, do so.
In the case where you want to return something that means "not a value specifiable with the static type", you have a number of options. Returning null is one answer, but a null without a meaning is a little dangerous. Throwing an exception may actually be what you mean. You might want to extend the type with special cases (probably with polymorphism, that is to say the Special Case Pattern, a special case of which is the Null Object Pattern). You might want to wrap the return value in a type with more meaning. Or you might want to pass in a callback object. There usually are many choices.
I'd say it depends. For a method returning a single object, I'd generally return null. For a method returning a collection, I'd generally return an empty collection (non-null). These are more along the lines of guidelines than rules, though.
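For illustration, here is how the collection guideline might look in Java. The class ReturnDemo and its methods are hypothetical. For the single-object case this sketch uses Optional, as suggested elsewhere in this thread, where the guideline above would return a bare null; that substitution is a choice made for the example, not part of the guideline.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class ReturnDemo {
    private final Map<String, List<String>> ordersByUser = new HashMap<>();

    void addOrder(String user, String order) {
        ordersByUser.computeIfAbsent(user, k -> new ArrayList<>()).add(order);
    }

    // Collection result: an unknown user yields an empty list, never null.
    List<String> ordersFor(String user) {
        return ordersByUser.getOrDefault(user, Collections.emptyList());
    }

    // Single-object result: Optional makes "nothing there" explicit to the caller.
    Optional<String> latestOrderFor(String user) {
        List<String> orders = ordersFor(user);
        return orders.isEmpty()
                ? Optional.empty()
                : Optional.of(orders.get(orders.size() - 1));
    }

    public static void main(String[] args) {
        ReturnDemo demo = new ReturnDemo();
        demo.addOrder("alice", "book");
        demo.addOrder("alice", "lamp");
        System.out.println(demo.ordersFor("bob").size());            // 0
        System.out.println(demo.latestOrderFor("alice").get());      // lamp
        System.out.println(demo.latestOrderFor("bob").isPresent());  // false
    }
}
```

Callers can loop over ordersFor(...) without a null check, and the Optional return type forces them to decide what an absent order means.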
If you are serious about wanting to program in a "nullless" environment, consider using extension methods more often, they are immune to NullReferenceExceptions and at least "pretend" that null isn't there anymore:
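// Must be declared in a static class; requires using System.IO.
public static string GetExtension(this string s)
{
    // Path.GetExtension("") returns "", whereas new FileInfo("") throws ArgumentException.
    return Path.GetExtension(s ?? "");
}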
which can be called as:
// this code will not throw a NullReferenceException, even when somePath is null
string somePath = GetDataFromElseWhereCanBeNull();
textBoxExtension.Text = somePath.GetExtension();
I know, this is only a convenience, and many people correctly consider it a violation of OO principles (though the "founder" of OO, Bertrand Meyer, considers null evil and completely banished it from his OO design, which is applied to the Eiffel language, but that's another story). EDIT: Dan mentions that Bill Wagner (More Effective C#) considers it bad practice and he's right. Ever considered the IsNull extension method ;-) ?
To make your code more readable, another hint may be in order: use the null-coalescing operator more often to designate a default when an object is null:
// load settings
WriteSettings(currentUser.Settings ?? new Settings());
// example of some readonly property
public string DisplayName
{
    get
    {
        return (currentUser ?? User.Guest).DisplayName;
    }
}
None of these take the occasional check for null away (and ?? is nothing more than a hidden if-branch). I prefer as little null in my code as possible, simply because I believe it makes the code more readable. When my code gets cluttered with if-statements for null, I know there's something wrong in the design and I refactor. I'd suggest everybody do the same, but I know that opinions vary wildly on the matter.
(Update) Comparison with exceptions
Not mentioned in the discussion so far is the similarity with exception handling. When you find yourself ubiquitously ignoring null whenever you consider it's in your way, it is basically the same as writing:
try
{
//...code here...
}
catch (Exception) {}
which has the effect of removing any trace of the exceptions, only for unrelated exceptions to be raised much later in the code. Though I consider it good to avoid using null, as mentioned before in this thread, having null for exceptional cases is good. Just don't hide them in null-ignore blocks: that ends up having the same effect as catch-all-exceptions blocks.
As for the exception proponents: their preference usually stems from transactional programming and strong exception-safety guarantees, or from blindly followed guidelines. In code of any real complexity, i.e. async workflows, I/O and especially networking code, exceptions are simply inappropriate. That is why you see Google-style guidance on the matter in C++, and why all good async code does not enforce exceptions (think of your favourite managed pools as well).
There is more to it, and while it might look like a simplification, it really is that simple. For one, you will get a lot of exceptions in something that wasn't designed for heavy exception use. Anyway, I digress; read up on this from the world's top library designers. The usual place is Boost (just don't mix it up with the other camp in Boost that loves exceptions, because they had to write music software :-).
In your instance, and this is not Fowler's expertise, an efficient 'empty object' idiom is only possible in C++ due to the available casting mechanism (perhaps, but certainly not always, by means of dominance). On the other hand, in your null type you are capable of throwing exceptions and doing whatever you want while preserving the clean call site and structure of the code.
In C# your choice can be a single instance of a type that is either good or malformed; as such it is capable of throwing exceptions or simply running as is. So it might or might not violate other contracts (it is up to you to decide which is better, depending on the quality of the code you're facing).
In the end, it does clean up call sites, but don't forget you will face a clash with many libraries (returns from containers/Dictionaries and end iterators spring to mind, as does any other code 'interfacing' with the outside world). Plus, null-as-value checks are extremely optimised pieces of machine code. That is something to keep in mind, but I will agree any day that wild pointer usage without understanding constness, references and more is going to lead to a different kind of mutability, aliasing and perf problems.
To add, there is no silver bullet: crashing on a null reference, using a null reference in managed space, and throwing an exception without handling it are identical problems, despite what the managed and exception worlds will try to sell you. Any decent environment offers protection from those (heck, you can install any filter on any OS you want; what else do you think VMs do?), and there are so many other attack vectors that this one has been overhammered to pieces. Enter x86 verification from Google yet again, their own way of doing much faster and better 'IL', 'dynamic'-friendly code, etc.
Go with your instinct on this, weigh the pros and cons and localise the effects. In the future your compiler will optimise all that checking anyway, and far more efficiently than any runtime or compile-time human method (but not as easily for cross-module interaction).
I try to avoid returning null from a method wherever possible. There are generally two kinds of situations - when null result would be legal, and when it should never happen.
In the first case, when no result is legal, there are several solutions available to avoid null results and null checks that are associated with them: Null Object pattern and Special Case pattern are there to return substitute objects that do nothing, or do some specific thing under specific circumstances.
If it is legal to return no object, but still there are no suitable substitutes in terms of Null Object or Special Case, then I typically use the Option functional type - I can then return an empty option when there is no legal result. It is then up to the client to see what is the best way to deal with empty option.
Finally, if it is not legal to have any object returned from a method, simply because the method cannot produce its result if something is missing, then I choose to throw an exception and cut further execution.
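The last two cases above (an Option when "no result" is legal, an exception when it is not) might look roughly like this in Java, using java.util.Optional as the Option type. This is only a sketch: UserDirectory, findEmail and requireEmail are hypothetical names invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical directory, invented purely for illustration.
final class UserDirectory {
    private final Map<String, String> emailsByName = new HashMap<>();

    UserDirectory() {
        emailsByName.put("alice", "alice@example.com");
    }

    // "No result" is legal here, so the absence is expressed in the return type.
    Optional<String> findEmail(String name) {
        return Optional.ofNullable(emailsByName.get(name));
    }

    // A missing result is not legal here, so the method throws and cuts execution.
    String requireEmail(String name) {
        return findEmail(name).orElseThrow(
            () -> new IllegalStateException("No email registered for " + name));
    }

    public static void main(String[] args) {
        UserDirectory directory = new UserDirectory();
        // The client decides how to deal with the empty option.
        System.out.println(directory.findEmail("bob").orElse("(none)"));
    }
}
```

The point of the design is that the signature of findEmail tells the caller that an empty result is possible, whereas requireEmail promises a value or fails loudly.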
How are empty objects better than null objects? You're just renaming the symptom. The problem is that the contracts for your functions are too loosely defined: "this function might return something useful, or it might return a dummy value" (where the dummy value might be null, an "empty object", or a magic constant like -1). But no matter how you express this dummy value, callers still have to check for it before they use the return value.
If you want to clean up your code, the solution should be to narrow down the function so that it doesn't return a dummy value in the first place.
If you have a function which might return a value, or might return nothing, then pointers are a common (and valid) way to express this. But often, your code can be refactored so that this uncertainty is removed. If you can guarantee that a function returns something meaningful, then callers can rely on it returning something meaningful, and then they don't have to check the return value.
You can't always return an empty object, because 'empty' is not always defined. For example what does it mean for an int, float or bool to be empty?
Returning a NULL pointer is not necessarily a bad practice, but I think it's a better practice to return a (const) reference (where it makes sense to do so of course).
And recently I've often used a Fallible class:
Fallible<std::string> theName = obj.getName();
if (theName)
{
// ...
}
There are various implementations available for such a class (check Google Code Search), I also created my own.
I was wondering if there is a performance difference when using ifs in C#, and they are nested or not. Here's an example:
if(hello == true) {
if(index == 34) {
DoSomething();
}
}
Is this faster or slower than this:
if(hello == true && index == 34) {
DoSomething();
}
Any ideas?
Probably the compiler is smart enough to generate the same, or very similar code, for both versions. Unless performance is really a critical factor for your application, I would automatically choose the second version, for the sake of code readability.
Even better would be
if(SomethingShouldBeDone()) {
DoSomething();
}
...meanwhile in another part of the city...
private bool SomethingShouldBeDone()
{
return this.hello == true && this.index == 34;
}
In 99% of real-life situations this will have little or no performance impact, and provided you name things meaningfully it will be much easier to read, understand and (therefore) maintain.
Use whichever is most readable and still correct (sometimes juggling around boolean expressions will get you different behavior - especially if short-circuiting is involved). The execution time will be the same (or too close to matter).
Just for the record, sometimes I find nesting to be more readable (if the expression turns out to be too long or to have too many components) and sometimes I find it to be less readable (as in your short example).
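To make the short-circuiting caveat concrete, here is a small Java sketch (the bool operators && and & behave the same way in C#). The indexMatches helper and its call counter are hypothetical, added only so the number of evaluations can be observed.

```java
final class ShortCircuitDemo {
    // Counts how often the second condition is actually evaluated.
    static int calls = 0;

    // Hypothetical check with a visible side effect.
    static boolean indexMatches(int index) {
        calls++;
        return index == 34;
    }

    // Nested form: the inner check runs only when hello is true.
    static boolean nested(boolean hello, int index) {
        if (hello) {
            if (indexMatches(index)) {
                return true;
            }
        }
        return false;
    }

    // Combined with &&: short-circuits, so it evaluates exactly what the nested form does.
    static boolean combined(boolean hello, int index) {
        return hello && indexMatches(index);
    }

    // Combined with &: always evaluates both operands, so the side effect always happens.
    static boolean eager(boolean hello, int index) {
        return hello & indexMatches(index);
    }
}
```

With hello set to false, nested and combined never call indexMatches, while eager does. All three return the same result; only the side effects differ.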
Any modern compiler, and by that I mean anything built in the past 20 years, will compile these to the same code.
As to which you should use, it depends on whichever is more readable and logical in the context of the project. Generally I would go for the second myself, but that would vary.
A strong point worth consideration though arises from maintenance. One of the more common bugs I have hunted down is a dangling if/else in the middle of a block of nested ifs. This arises if you have a complex series of if else conditions which has been amended by different programmers over a period - often several years. For example using pseudo-code for a simple case:
IF condition_a
    IF condition_b
        Do something
    ELSE
        Do something
    END IF
ELSE
    IF condition_b
        Do something
    END IF
END IF
you'll notice for the combination !condition_a && !condition_b the code will fall through the conditions doing nothing. This is quite easy to spot for just the pair of conditions, but can get very easy to miss very quickly once you have 3, 4 or more if/else conditions to check. What commonly happens is the nested structure is correct when first coded, but becomes incorrect (in terms of the business outputs) at some later point because the maintenance programmers will not understand or allow for the full range of options.
It's therefore generally more robust, over time, to use combined conditions in the if structure, adopting the flattest feasible structure and keeping nesting to a minimum. In your example there's no logical reason not to combine the two conditions into a single statement, so you should do so.
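One possible reading of that advice, sketched in Java: give each combination of the two conditions its own flat branch, so that the "neither" case cannot silently fall through. The returned labels are placeholders standing in for the "Do something" steps of the pseudo-code.

```java
final class FlatConditions {
    // Each combination of the two conditions gets an explicit branch,
    // so a forgotten case (here: neither condition holds) is visible at a glance.
    static String handle(boolean conditionA, boolean conditionB) {
        if (conditionA && conditionB) {
            return "both";
        } else if (conditionA) {
            return "only a";
        } else if (conditionB) {
            return "only b";
        } else {
            return "neither";
        }
    }
}
```

A maintenance programmer adding a third condition to this shape has to decide explicitly where each new combination goes, which is harder to overlook than a missing ELSE deep inside nested blocks.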
I can't see that there will be any great performance difference with either, but I do think that option two is MUCH more readable.
I don't believe there is any noticeable performance difference between the two implementations.
Anyway, I'd go for the latter implementation because it is more readable.
Depends on the compiler. The difference will be more apparent when you have code after the close of the nested if, but before the close of the outer.
I've wondered about this often myself. However, it seems there really is no difference (or not much to speak of) between the options. Readability-wise, the second option is more readable and so I usually choose that one unless I anticipate having to code specifically for each condition for some reason.
I'm having a disagreement with someone over how best to implement a simple method that takes an array of integers, and returns the highest integer (using C# 2.0).
Below are the two implementations - I have my own opinion of which is better, and why, but I'd appreciate any impartial opinions.
Option A
public int GetLargestValue(int[] values)
{
try {
Array.Sort(values);
return values[values.Length - 1];
}
catch (Exception){ return -1;}
}
Option B
public int GetLargestValue(int[] values)
{
if(values == null)
return -1;
if(values.Length < 1)
return -1;
int highestValue = values[0];
foreach(int value in values)
if(value > highestValue)
highestValue = value;
return highestValue;
}
Option B of course.
A is ugly:
Catch(Exception) is a really bad practice
You should not rely on exceptions for null references, out-of-range indexes, ...
Sorting is far more complex than iteration
Complexity:
A will be O(n log(n)) and even O(n²) in the worst case
B's worst case is O(n)
A has the side effect that it sorts the array. This might be unexpected by the caller.
Edit: I don't like to return -1 for empty or null array (in both solutions), since -1 might be a legal value in the array. This should really generate an exception (perhaps ArgumentException).
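Putting those two points together (no sorting side effect, and an exception instead of -1), a variant of Option B might look like the following Java sketch. In C# the natural exception type would be ArgumentException; IllegalArgumentException is its closest Java counterpart. The class name is invented for the example.

```java
final class LargestValue {
    // Single pass, no sorting, and the caller's array is left untouched.
    static int getLargestValue(int[] values) {
        if (values == null) {
            throw new IllegalArgumentException("values must not be null");
        }
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        int highest = values[0];
        for (int value : values) {
            if (value > highest) {
                highest = value;
            }
        }
        return highest;
    }
}
```

Because invalid input now raises an exception, a result of -1 can only mean that -1 really is the largest element.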
I prefer Option B as it only traverses the collection exactly once.
In Option A, you may have to access many elements more than once (the number of times is dependent upon the implementation of the sort algorithm).
Option A is an inefficient implementation, but results in a fairly clear algorithm. It does however use a fairly ugly Exception catch which would only ever be triggered if a null or empty array is passed in (so it could probably be written more clearly with a pre-sort check).
PS, you should never simply catch "Exception" and then correct things. There are many types of Exceptions and generally you should catch each possible one and handle accordingly.
The second one is better.
The complexity of the first is O(N LogN), and for the second it is O(N).
I have to choose option B - not that it's perfect but because option A uses exceptions to represent logic.
I would say that it depends on what your goal is, speed or readability.
If processing speed is your goal, I'd say the second solution, but if readability is the goal, I'd pick the first one.
I'd probably go for speed for this type of function, so I'd pick the second one.
There are many factors to consider here. Both options should include the bounds checks that are in option B and do away with using Exception handling in that manner. The second option should perform better most of the time as it only needs to traverse the array once. However, if the data was already sorted or needed to be sorted anyway, then Option A would be preferable.
No comparison-based sorting algorithm runs in O(n) time in the general case, so Option B will be the fastest on average.
Edit: Article on sorting
I see two points here:
Parameter testing as opposed to exception handling: Better use explicit checking, should also be faster.
Sorting and picking the largest value as opposed to walking the whole array. Since sorting involves handling each integer in the array at least once, it will not perform as well as walking the whole array (only) once.
Which is better? For the first point, definitely explicit checking. For the second, it depends...
The first example is shorter, makes it quicker to write and read/understand. The second is faster. So: If runtime efficiency is an issue, choose the second option. If fast coding is your goal, use the first one.