The best way to conditionally execute a function? - c#

Yes, I know the wording is hard to understand, but this is something that bugs me a lot. On a recent project, I have a function that recurses, and there are a number of conditions that would cause it to stop recursing (three at the moment). Which of these approaches would be optimal (i.e. best performance or easiest maintenance)?
1) Conditional return:
void myRecursingFunction (int i, int j){
    if (conditionThatWouldStopRecursing) return;
    if (anotherConditionThatWouldStopRecursing) return;
    if (thirdConditionThatWouldStopRecursing) return;
    doSomeCodeHere();
    myRecursingFunction(i + 1, j);
    myRecursingFunction(i, j + 1);
}
2) Wrap the whole thing in an if statement
void myRecursingFunction (int i, int j){
    if (
        !conditionThatWouldStopRecursing &&
        !anotherConditionThatWouldStopRecursing &&
        !thirdConditionThatWouldStopRecursing
    ){
        doSomeCodeHere();
        myRecursingFunction(i + 1, j);
        myRecursingFunction(i, j + 1);
    }
}
3) You're doing it wrong noob, no sane algorithm would ever use recursion.

Both of those approaches should compile to the same (or nearly the same) IL behind the scenes, since they are equivalent boolean expressions. Note that the termination conditions are evaluated in the order you write them (the compiler can't tell which is most likely), so you will want to put the most common termination condition first.
Even though structured programming dictates the second approach is better, personally I prefer to code return conditions as a separate block at the top of the recursive method. I find that easier to read and follow (though I am not a fan of returns in random areas of the method body).
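To make the point about evaluation order concrete, here is a minimal sketch. The class name, the Check helper and the three boolean parameters are hypothetical stand-ins for the question's conditions; the log simply records which conditions were actually evaluated under each style.

```csharp
using System;
using System.Collections.Generic;

public static class GuardOrderDemo
{
    // Records the name of every condition that actually gets evaluated.
    public static readonly List<string> Log = new List<string>();

    // Hypothetical condition: logs its name, then returns the given value.
    static bool Check(string name, bool value)
    {
        Log.Add(name);
        return value;
    }

    // Style 1: early returns. Returns true if the method body would run.
    public static bool WithGuards(bool a, bool b, bool c)
    {
        if (Check("a", a)) return false;
        if (Check("b", b)) return false;
        if (Check("c", c)) return false;
        return true;
    }

    // Style 2: one wrapped condition. && short-circuits left to right,
    // so evaluation stops at the first condition that is true.
    public static bool WithWrappedIf(bool a, bool b, bool c)
    {
        if (!Check("a", a) && !Check("b", b) && !Check("c", c))
        {
            return true;
        }
        return false;
    }
}
```

With a false and b true, both methods evaluate "a" and "b" and never reach "c", which is why the most common stop condition belongs first in either style.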

I would opt for the first solution, since this makes it perfectly clear what the conditions are to stop the recursion.
It is more readable and more maintainable imho.

If it's something that needs to be fast, I'd recommend hitting the most common case as quickly as possible (i.e. put the base case at the end, because you'll only hit it once). Also think about testing for the base case one step early: perform the test before you call the function again, rather than checking it on entry to the subsequent call, if that will make a difference.
And, with all things, don't optimise unless it's a problem. I'd go for clarity first.
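As a rough sketch of "test before you call" versus "check on entry", the following walks a small grid in the same i + 1 / j + 1 pattern as the question. The grid, the seen array and the call counter are assumptions made up for illustration, not part of the original code.

```csharp
using System;

public static class PreCheckDemo
{
    // Number of times a recursive method was entered.
    public static int Calls;

    // Check on entry: every call starts by testing the stop conditions.
    public static void VisitCheckOnEntry(bool[,] seen, int i, int j)
    {
        Calls++;
        if (i >= seen.GetLength(0)) return;
        if (j >= seen.GetLength(1)) return;
        if (seen[i, j]) return;
        seen[i, j] = true;
        VisitCheckOnEntry(seen, i + 1, j);
        VisitCheckOnEntry(seen, i, j + 1);
    }

    // Check before the call: the caller tests first, so calls that would
    // stop immediately are never made. The starting (i, j) must be valid.
    public static void VisitCheckBeforeCall(bool[,] seen, int i, int j)
    {
        Calls++;
        seen[i, j] = true;
        if (i + 1 < seen.GetLength(0) && !seen[i + 1, j])
            VisitCheckBeforeCall(seen, i + 1, j);
        if (j + 1 < seen.GetLength(1) && !seen[i, j + 1])
            VisitCheckBeforeCall(seen, i, j + 1);
    }
}
```

On a 3x3 grid both versions mark all nine cells, but the first is entered 19 times and the second 9 times. Whether the saved calls matter is something to measure; the first version keeps the stop conditions in one obvious place.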

I like the variant 1 better...
It is much easier to read than variant 2, where you have to understand three negations and chain them all together with and. I know it's not "hard", but it takes much longer than glancing at variant 1.

My vote is for option #1 as well. It looks clearer to me.

Related

Is it possible to always eliminate goto's?

While making everything with goto's is easy (as evidenced by e.g. IL), I was wondering if it is also possible to eliminate all goto statements with higher level expressions and statements - say - using everything that's supported in Java.
Or if you like: what I'm looking for is 'rewrite rules' that will always work, regardless of the way the goto is created.
It's mostly intended as a theoretical question, purely out of interest, not as a question about good or bad practices.
The obvious solution that I've thought about is to use something like this:
while (true)
{
    switch (state) {
        case [label]: // here's where all your goto's will be
            state = [label];
            continue;
        default:
            // here's the rest of the program.
    }
}
While this will probably work and does fit my 'formal' question, I don't like my solution a single bit. For one, it's dead ugly, and for another, it basically wraps the goto into a switch that does exactly the same as a goto would do.
So, is there a better solution?
Update 1
Since a lot of people seem to think the question is 'too broad', I'm going to elaborate a bit more... the reason I mentioned Java is because Java doesn't have a 'goto' statement. As one of my hobby projects, I was attempting to transform C# code into Java, which is proving to be quite challenging (partly because of this limitation in Java).
That got me thinking. If you have f.ex. the implementation of the 'remove' method in Open addressing (see: http://en.wikipedia.org/wiki/Open_addressing - note 1), it's quite convenient to have the 'goto' in the exceptional case, although in this particular case you could rewrite it by introducing a 'state' variable. Note that this is just one example, I've implemented code generators for continuations, which produce tons and tons of goto's when you're attempting to decompile them.
I'm also not sure if rewriting in this manner will always eliminate the 'goto' statement and if it is allowed in every case. While I'm not looking for a formal 'proof', some evidence that elimination is possible in this manner would be great.
So about the 'broadness', I challenge all the people that think there are 'too many answers' or 'many ways to rewrite a goto' to provide an algorithm or an approach to rewriting the general case please, since the only answer I found so far is the one I've posted.
This 1994 paper, "Taming Control Flow: A Structured Approach to Eliminating Goto Statements", proposes an algorithm to eradicate all goto statements in a C program. The method is applicable to any program written in C# or any language that uses common constructs like if/switch/loop/break/continue (AFAIK, but I don't see why it wouldn't).
It begins with the two simplest transformations:
Case 1
Stuff1();
if (cond) goto Label;
Stuff2();
Label:
Stuff3();
becomes:
Stuff1();
if (!cond)
{
    Stuff2();
}
Stuff3();
Case 2
Stuff1();
Label:
Stuff2();
Stuff3();
if (cond) goto Label;
becomes:
Stuff1();
do
{
    Stuff2();
    Stuff3();
} while (cond);
and builds on that to examine each complex case and apply iterative transformations that lead to those trivial cases. It then rounds off with the ultimate gotos/labels eradication algorithm.
This is a very interesting read.
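The two basic rewrites can be tried out directly in C#, which still has goto. The sketch below is only an illustration of Case 1 and Case 2, not the paper's algorithm; the class, the method names and the trace values are made up so that the goto version and the structured version can be compared.

```csharp
using System;
using System.Collections.Generic;

public static class GotoRewriteDemo
{
    // Case 1 with goto: a forward jump skips the middle statement.
    public static string SkipWithGoto(bool cond)
    {
        string trace = "1";
        if (cond) goto Label;
        trace += "2";
    Label:
        trace += "3";
        return trace;
    }

    // Case 1 rewritten: the skipped statement moves under if (!cond).
    public static string SkipWithIf(bool cond)
    {
        string trace = "1";
        if (!cond)
        {
            trace += "2";
        }
        trace += "3";
        return trace;
    }

    // Case 2 with goto: a backward jump forms a loop.
    public static List<int> CountdownWithGoto(int n)
    {
        List<int> seen = new List<int>();
    Label:
        seen.Add(n);
        n--;
        if (n > 0) goto Label;
        return seen;
    }

    // Case 2 rewritten: the jumped-over region becomes a do/while body.
    public static List<int> CountdownWithDoWhile(int n)
    {
        List<int> seen = new List<int>();
        do
        {
            seen.Add(n);
            n--;
        } while (n > 0);
        return seen;
    }
}
```

Note that the body runs at least once in both Case 2 versions, which is why the rewrite uses do/while rather than while.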
UPDATE: Some other interesting papers on the subject (not easy to get your hands on, so I copy direct links here for reference):
A Formal Basis for Removing Goto Statements
A Goto-Elimination Method And Its Implementation For The McCat C Compiler
A situation where a goto can be avoided, but I think it is better to use it:
When I need to exit an inner loop and the outer loop:
for(int i = 0; i < list.Count; i++)
{
    // some code that initializes inner
    for(int j = 0; j < inner.Count; j++)
    {
        // Some code.
        if (condition) goto Finished;
    }
}
Finished:
// Some more code.
To avoid the goto you should do something like this:
for(int i = 0; i < list.Count; i++)
{
    // some code that initializes inner
    bool condition = false;
    for(int j = 0; j < inner.Count; j++)
    {
        // Some code that might change condition
        if (condition) break;
    }
    if (condition) break;
}
// Some more code.
I think it looks much nicer with the goto statement.
The goto-free version is okay if the inner loop is moved into a different method.
void OuterLoop(list)
{
    for(int i = 0; i < list.Count; i++)
    {
        // some code that initializes inner
        if (InnerLoop(inner)) break;
    }
}
bool InnerLoop(inner)
{
    for(int j = 0; j < inner.Count; j++)
    {
        // Some code that might change condition
        if (condition) return true;
    }
    return false;
}
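For a version of the separate-method idea that actually compiles, here is a small sketch. The task (searching a jagged array for a target value), the names and the comparison counter are all assumptions chosen for illustration; the counter only exists to make the early exit observable.

```csharp
using System;

public static class NestedLoopExitDemo
{
    // Counts element comparisons so the early exit can be observed.
    public static int Comparisons;

    // Outer loop: returning from here leaves both loops at once.
    public static bool ContainsTarget(int[][] rows, int target)
    {
        for (int i = 0; i < rows.Length; i++)
        {
            if (RowContains(rows[i], target)) return true;
        }
        return false;
    }

    // Inner loop in its own method: return replaces the goto/flag.
    static bool RowContains(int[] row, int target)
    {
        for (int j = 0; j < row.Length; j++)
        {
            Comparisons++;
            if (row[j] == target) return true;
        }
        return false;
    }
}
```

No flag variable and no label are needed, because return already exits the inner loop and tells the outer loop whether to stop.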
I have some practical experience of attempting to take an unstructured program (in COBOL, no less) and render it as structured by removing every instance of GOTO. The original programmer was a reformed Assembly programmer, and though he may have known about the PERFORM statement, he didn't use it. It was GOTO GOTO GOTO. And it was serious spaghetti code -- several hundred lines worth of spaghetti code. I spent a couple of weeks worth of spare time trying to rewrite this monstrous construct, and eventually I had to give up. It was a huge steaming pile of insanity. It worked, though! Its job was to parse user instructions sent in to the mainframe in a textual format, and it did it well.
So, NO, it is not always possible to completely eliminate GOTO -- if you are using manual methods. This is an edge case, however -- existing code that was written by a man with an apparently twisted programming mind. In modern times, there are tools available which can solve formerly intractable structural problems.
Since that day I have coded in Modula-2, C, Revelation Basic, three flavors of VB, and C# and have never found a situation that required or even suggested GOTO as a solution. For the original BASIC, however, GOTO was unavoidable.
Since you said it's a theoretical question, here is the theoretical answer.
Java is Turing complete so of course, yes. You can express any C# program in Java. You can also express it in Minecraft Redstone or Minesweeper. Compared to these alternatives expressing it in Java should be easy.
An obviously more practical answer though, namely an intelligible algorithm to do the transformation, has been given by Patrice.
I don't like my solution a single bit. For one, it's dead ugly and for two, it basically wraps the goto into a switch that does exactly the same as a goto would do.
That's what you're always going to get when looking for a general pattern which can replace any use of goto: something which does exactly the same as a goto would do. What else could it be? Different uses of goto should be replaced with the best matching language construct. That's why constructs such as switch statements and for loops exist in the first place, to make it easier and less error prone to create such a program flow. The compiler will still generate goto's (or jumps), but it will do so consistently where we will screw up. On top of that we don't have to read what the compiler generates, but we get to read (and write) something that's easier to understand.
You'll find that most language constructs exist to generalize a specific use of goto; those constructs were created based on the common patterns of goto usage which existed before them. The paper Patrice Gahide mentions sort of shows that process in reverse. If you are finding goto's in your code, you are either looking at one of those patterns, in which case you should replace it with the matching language construct, or you are looking at essentially unstructured code, in which case you should actually structure the code (or leave it alone). Changing it to something unstructured but without goto can only make matters worse. (The same generalization process is still going on by the way, consider how foreach is being added to languages to generalize a very common usage of for...)

Is it better to use if/then/else to flip a boolean, or negation?

Can I switch a boolean with one statement as effectively as with an if/then/else ?
Found this in another piece of code that is going into my app...
private void whatever()
{
    ////
    //// a bunch of stuff
    ////
    if (SomeBooleanValue)
    {
        SomeBooleanValue = false;
    }
    else
    {
        SomeBooleanValue = true;
    }
}
Out of curiosity, I tried this...
private void whatever_whatever()
{
    ////
    //// the same stuff
    ////
    SomeBooleanValue = !SomeBooleanValue;
}
...and walked through it in debug, and it appears that I get the same result.
Is there a good reason to use the if/then/else instead of the single line way ?
Is there a good reason to use the if/then/else instead of the single line way
Not any that I can think of. Using the ! operator is cleaner and more intuitive for most programmers.
The 1-line way is perfectly fine, and the only reason why you'd use the if/ else structure is if you were doing other things aside from just toggling the boolean.
I would say the second one is better since it is more readable and compact (if/then/else IMO just adds unnecessary lines of code). That would be the only (but strong!) reason to prefer one over the other.
Due to compiler optimizations, it will be the same as using the ! operator, which is easier to read for other programmers.
However,
To improve performance, the CPU will try to predict the execution path ahead of time. For conditional (if/else) statements, it will try to predict the result of the condition and start executing that branch speculatively. If it predicts incorrectly, it must discard that work and resume from the correct branch, which decreases performance.
http://en.wikipedia.org/wiki/Branch_predictor
Please, please, please write the code as a negation; it is the most succinct and concise expression of intent. If you write the code the first way, every reader of your code is going to waste time wondering why you did NOT write the simple negation.
Most code will be read, by you and others, many more times than it is written. Writing code that eases the reading process is, almost by definition, good code.

how to make code more readable with compiler optimizations in place?

Code is read more often than it is updated. Writing more readable code is better than writing powerful and geeky code when compilers can optimize for best execution.
For example see below code - this code can be compressed by combining the nested if statements, but will the compiler not optimize this code for best execution anyway while we get to maintain the readability of it?
// yield sunRays when sky is blue and sun is present.
// yield sunRays when sky is not blue and sun is not present.
if (yieldWhenSkyIsBlue)
{
    // if sky is blue and sun is present -> yield sunRaysObjB.
    if (sunObjA != null)
    {
        yield return sunRaysObjB;
    }
    else
    {
        // do not yield ;
    }
}
else
{
    // if sky is not blue and sun is not present -> yield sunRaysObjB.
    if (sunObjA == null)
    {
        yield return sunRaysObjB;
    }
}
As opposed to something like this :
// yield sunRays when (sky is blue) or (sun is not present and sky is blue).
// (this interpretation is a bit misleading as compared to first one?)
if ((sunObjA == null && yieldWhenSkyIsBlue == false) || (yieldWhenSkyIsBlue && sunObjA != null))
{
    yield return sunRaysObjB;
}
Does the first version depict the use case better for future enhancements and updates? The second version of the code is shorter, but does reading it make the use case apparent? Are there other advantages to the second version apart from concise code?
update #1: yes, it returns ObjB in both cases, but depending on the condition it may not yield at all, so the strategy decides when to yield and when not to (one more reason why readability is important).
update #2: updated to cite a better example. copied the syntax from stripplingWarrior
update #3 : updated for "What do you expect to happen when the sun is out and the sky is blue".
I think the second code example is much more readable, and has the advantage of being pretty optimal anyway.
Most programmers will find this logic flow to be obvious and natural: you will return ObjB if ObjA is null, or if it's not null and howtoYieldFalg is set.
But if I had to choose between making code like this more readable and making it optimal, I'd make it readable first. Only if I discovered that it's the source of a bottleneck would I bother optimizing it. In this particular case, I can pretty much guarantee that your use of yield return will introduce way more overhead than a suboptimal evaluation of your conditionals.
Update
Take another look at your code samples: they are not logically equivalent. What do you expect to happen when the sun is out and the sky is blue? The second code sample correctly allows sun rays to shine in that case, whereas the first example does not.
The fact that it was so easy to introduce a bug in the first case which so many people failed to catch for so long should be ample evidence to show that the second approach is better. All those nested if/else statements can be tricky to keep straight, even to an experienced programmer. Simple boolean logic is a lot easier to keep straight, especially once you use variable names that give it meaning.
Update 2
Based on the further explanation, and with a little creativity, I'm going to suggest an approach that uses both comments and variable names to increase clarity:
/* Explanation: We live on a strange planet where the sun's
 * rays can shine if the sky is blue while the sun is out,
 * or if the sky is not blue and there is no sun. */
bool sunIsPresent = sunObjA != null;
if ((skyIsBlue && sunIsPresent) ||
    (!skyIsBlue && !sunIsPresent))
{
    yield return sunRaysObjB;
}
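One further simplification worth considering: the condition above is true exactly when the two flags agree, so it can be written as a plain equality. The sketch below checks that claim over all four combinations; the class, the method names and the string standing in for sunRaysObjB are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SunRaysDemo
{
    // Spelled-out form, as in the snippet above.
    public static bool ShouldYieldLongForm(bool skyIsBlue, bool sunIsPresent)
    {
        return (skyIsBlue && sunIsPresent) || (!skyIsBlue && !sunIsPresent);
    }

    // Compact form: rays shine exactly when the two flags agree.
    public static bool ShouldYieldCompact(bool skyIsBlue, bool sunIsPresent)
    {
        return skyIsBlue == sunIsPresent;
    }

    // Hypothetical iterator: yields one item or nothing at all.
    public static IEnumerable<string> SunRays(bool skyIsBlue, object sunObjA)
    {
        bool sunIsPresent = sunObjA != null;
        if (ShouldYieldCompact(skyIsBlue, sunIsPresent))
        {
            yield return "sunRaysObjB";
        }
    }
}
```

Whether the equality reads better than the spelled-out form is a matter of taste; the explanatory comment is still what tells the reader why the flags should agree.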
The compiler optimizes right through any way you organize your program's control flow, so you really do not have to worry about it.
The weakness of compilers though, is they only optimize based on preserving code semantics, not preserving the meaning you intend. I compiled both your examples in LLVM, and here are the control flow graphs generated:
[Figure: control flow graph of the first example]
and
[Figure: control flow graph of the second example]
I was surprised to find the two CFG's are slightly different. You will note that first is an instruction smaller, but in the second graph, there exists a path to the exit node which only passes through one comparison, whereas in the first, two comparisons are always necessary.
In fact, further tracing of possible routes yields that the first example has possible routes of 6,8,8,6 instructions long, while the second has routes of 8,10,10 respectively. In BOTH cases the average run length is 7 instructions long, but we can see that the first case has better best-time run lengths. Without more information the compiler cannot tell which is better.
tldr: Compilers do magic stuff, don't worry about it, code how you think is best.
This is probably not the popular opinion but I'd definitely not rely on the compiler to perform optimizations of this type. (It may do it, I don't know.) I don't see the second example as geeky - for me it describes more clearly that the two conditions are connected.
Typically I try to write as optimal code as possible without making it very cryptic and then let the compiler optimize that.
Though I haven't tested this particular case, I'm willing to bet that there will be no significant difference between the generated code, if any at all.
Unless you're doing it for fun or a specialized use case, I would argue human-readability is by far the more important quality of good code. The compiler is going to collapse much of your expressive code into more efficient forms, and what it misses you probably won't ever notice.
Given that, idiomatic code is easier to read even when it's less concise. Experienced readers of a language are going to recognize a common pattern more quickly than unfamiliar code that is, arguably 'more human' but breaks the familiar pattern. Looping/incrementing constructs are a good example of code that should be unsurprising. So, my approach is: Be expressive but not too clever.

Nested or not nested if-blocks?

I was wondering if there is a performance difference when using ifs in C#, and they are nested or not. Here's an example:
if(hello == true) {
    if(index == 34) {
        DoSomething();
    }
}
Is this faster or slower than this:
if(hello == true && index == 34) {
    DoSomething();
}
Any ideas?
Probably the compiler is smart enough to generate the same, or very similar code, for both versions. Unless performance is really a critical factor for your application, I would automatically choose the second version, for the sake of code readability.
Even better would be
if(SomethingShouldBeDone()) {
    DoSomething();
}
...meanwhile in another part of the city...
private bool SomethingShouldBeDone()
{
    return this.hello == true && this.index == 34;
}
In 99% of real-life situations this will have little or no performance impact, and provided you name things meaningfully it will be much easier to read, understand and (therefore) maintain.
Use whichever is most readable and still correct (sometimes juggling around boolean expressions will get you different behavior - especially if short-circuiting is involved). The execution time will be the same (or too close to matter).
Just for the record, sometimes I find nesting to be more readable (if the expression turns out to be too long or to have too many components) and sometimes I find it to be less readable (as in your short example).
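To illustrate the short-circuiting caveat, here is a minimal sketch. The class, the IndexIs34 helper and its call counter are hypothetical; the counter stands in for any side effect or expensive check in the second condition.

```csharp
using System;

public static class ShortCircuitDemo
{
    // Counts how often the second condition is actually evaluated.
    public static int IndexChecks;

    static bool IndexIs34(int index)
    {
        IndexChecks++;
        return index == 34;
    }

    // Nested ifs: the inner test only runs when hello is true.
    public static bool Nested(bool hello, int index)
    {
        if (hello)
        {
            if (IndexIs34(index)) return true;
        }
        return false;
    }

    // Combined with &&: same behaviour, the right side is skipped
    // when hello is false.
    public static bool Combined(bool hello, int index)
    {
        return hello && IndexIs34(index);
    }

    // Combined with &: same result, but both sides are always evaluated.
    public static bool NonShortCircuit(bool hello, int index)
    {
        return hello & IndexIs34(index);
    }
}
```

All three return the same value, but only the & version calls IndexIs34 when hello is false. That difference matters as soon as the second operand has side effects or depends on the first one being true.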
Any modern compiler, and by that I mean anything built in the past 20 years, will compile these to the same code.
As to which you should use, it depends on whichever is more readable and logical in the context of the project. Generally I would go for the second myself, but that would vary.
A strong point worth consideration though arises from maintenance. One of the more common bugs I have hunted down is a dangling if/else in the middle of a block of nested ifs. This arises if you have a complex series of if else conditions which has been amended by different programmers over a period - often several years. For example using pseudo-code for a simple case:
IF condition_a
    IF condition_b
        Do something
    ELSE
        Do something
    END IF
ELSE
    IF condition_b
        Do something
    END IF
END IF
You'll notice that for the combination !condition_a && !condition_b the code will fall through the conditions doing nothing. This is quite easy to spot for just the pair of conditions, but can get very easy to miss very quickly once you have 3, 4 or more if/else conditions to check. What commonly happens is that the nested structure is correct when first coded, but becomes incorrect (in terms of the business outputs) at some later point because the maintenance programmers do not understand or allow for the full range of options.
It's therefore generally more robust, over time, to code using combined conditions, adopting the flattest feasible structure and keeping nesting to a minimum. With your example, as there's no logical reason not to combine the two conditions into a single statement, you should do so.
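As a rough C# rendering of the pseudo-code above, the sketch below puts the nested form next to a flat one. The class name and the returned strings are placeholders for the "Do something" actions and are assumptions for illustration only.

```csharp
using System;

public static class FlatConditionsDemo
{
    // Nested form mirroring the pseudo-code: the (!a, !b) combination
    // silently falls through without an explicit branch.
    public static string Nested(bool a, bool b)
    {
        string result = "nothing";
        if (a)
        {
            if (b) result = "a and b";
            else result = "a only";
        }
        else
        {
            if (b) result = "b only";
        }
        return result;
    }

    // Flat form: every combination is a visible line, including the
    // one that does nothing.
    public static string Flat(bool a, bool b)
    {
        if (a && b) return "a and b";
        if (a && !b) return "a only";
        if (!a && b) return "b only";
        return "nothing"; // !a && !b is now an explicit decision
    }
}
```

Both methods behave identically; the flat one just makes it harder for a later change to lose track of the fourth combination.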
I can't see that there will be any great performance difference with either, but I do think that option two is MUCH more readable.
I don't believe there is any performance difference between the two implementations.
Anyway, I go for the latter implementation because it is more readable.
Depends on the compiler. The difference will be more apparent when you have code after the close of the nested if, but before the close of the outer.
I've wondered about this often myself. However, it seems there really is no difference (or not much to speak of) between the options. Readability-wise, the second option is more readable and so I usually choose that one unless I anticipate having to code specifically for each condition for some reason.

Which of these two GetLargestValue C# implementations is better, and why?

I'm having a disagreement with someone over how best to implement a simple method that takes an array of integers, and returns the highest integer (using C# 2.0).
Below are the two implementations - I have my own opinion of which is better, and why, but I'd appreciate any impartial opinions.
Option A
public int GetLargestValue(int[] values)
{
    try {
        Array.Sort(values);
        return values[values.Length - 1];
    }
    catch (Exception){ return -1;}
}
Option B
public int GetLargestValue(int[] values)
{
    if(values == null)
        return -1;
    if(values.Length < 1)
        return -1;
    int highestValue = values[0];
    foreach(int value in values)
        if(value > highestValue)
            highestValue = value;
    return highestValue;
}
Option B of course.
A is ugly:
Catch(Exception) is a really bad practice.
You should not rely on exceptions for null ref, out of range, ...
Sorting is far more complex than iteration.
Complexity:
A will be O(n log(n)), and even O(n²) in the worst case.
B's worst case is O(n).
A has the side effect that it sorts the array. This might be unexpected by the caller.
Edit: I don't like to return -1 for empty or null array (in both solutions), since -1 might be a legal value in the array. This should really generate an exception (perhaps ArgumentException).
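A minimal sketch of Option B with the suggested change, throwing instead of returning -1, might look like this. The class name and the exception messages are assumptions; the exception types are the standard ArgumentNullException and ArgumentException.

```csharp
using System;

public static class LargestValueDemo
{
    // Single pass, no sorting, so the caller's array is left untouched.
    public static int GetLargestValue(int[] values)
    {
        if (values == null)
            throw new ArgumentNullException("values");
        if (values.Length == 0)
            throw new ArgumentException("Array must not be empty.", "values");

        int highestValue = values[0];
        foreach (int value in values)
        {
            if (value > highestValue)
                highestValue = value;
        }
        return highestValue;
    }
}
```

With this version an array such as { -5, -1, -3 } correctly returns -1 as its largest value, and that result can no longer be confused with an error code.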
I prefer Option B as it only traverses the collection exactly once.
In Option A, you may have to access many elements more than once (the number of times is dependent upon the implementation of the sort algorithm).
Option A is an inefficient implementation, but results in a fairly clear algorithm. It does however use a fairly ugly Exception catch, which would only ever be triggered if a null or empty array is passed in (so it could probably be written more clearly with a pre-sort check).
PS, you should never simply catch "Exception" and then correct things. There are many types of Exceptions and generally you should catch each possible one and handle accordingly.
The second one is better.
The complexity of the first is O(N LogN), and for the second it is O(N).
I have to choose option B - not that it's perfect but because option A uses exceptions to represent logic.
I would say that it depends on what your goal is, speed or readability.
If processing speed is your goal, I'd say the second solution, but if readability is the goal, I'd pick the first one.
I'd probably go for speed for this type of function, so I'd pick the second one.
There are many factors to consider here. Both options should include the bounds checks that are in option B and do away with using Exception handling in that manner. The second option should perform better most of the time as it only needs to traverse the array once. However, if the data was already sorted or needed to be sorted; then Option A would be preferable.
No comparison-based sorting algorithm runs in O(n) time in the general case, so Option B will be the fastest on average.
Edit: Article on sorting
I see two points here:
Parameter testing as opposed to exception handling: Better use explicit checking, should also be faster.
Sorting and picking the largest value as opposed to walking the whole array. Since sorting involves handling each integer in the array at least once, it will not perform as well as walking the whole array (only) once.
Which is better? For the first point, definitely explicit checking. For the second, it depends...
The first example is shorter, makes it quicker to write and read/understand. The second is faster. So: If runtime efficiency is an issue, choose the second option. If fast coding is your goal, use the first one.
