I was wondering whether there is a performance difference in C# between nested ifs and a single if with a combined condition. Here's an example:
if(hello == true) {
    if(index == 34) {
        DoSomething();
    }
}
Is this faster or slower than this:
if(hello == true && index == 34) {
    DoSomething();
}
Any ideas?
Probably the compiler is smart enough to generate the same, or very similar code, for both versions. Unless performance is really a critical factor for your application, I would automatically choose the second version, for the sake of code readability.
Even better would be
if(SomethingShouldBeDone()) {
    DoSomething();
}
...meanwhile in another part of the city...
private bool SomethingShouldBeDone()
{
    return this.hello == true && this.index == 34;
}
In 99% of real-life situations this will have little or no performance impact, and provided you name things meaningfully it will be much easier to read, understand and (therefore) maintain.
Use whichever is most readable and still correct (sometimes juggling around boolean expressions will get you different behavior - especially if short-circuiting is involved). The execution time will be the same (or too close to matter).
Just for the record, sometimes I find nesting to be more readable (if the expression turns out to be too long or to have too many components) and sometimes I find it to be less readable (as in your short example).
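To illustrate the short-circuiting point, here is a minimal sketch (the class and helper names are made up for the example, and the counter exists only to make the evaluation order visible). It suggests that the combined form skips the second test exactly when the nested form does:

```csharp
using System;

public static class ConditionDemo
{
    // Counts how often the second condition is actually evaluated.
    public static int IndexChecks;

    private static bool IndexIs34(int index)
    {
        IndexChecks++;
        return index == 34;
    }

    // Nested form: the inner test runs only when hello is true.
    public static bool Nested(bool hello, int index)
    {
        if (hello)
        {
            if (IndexIs34(index))
            {
                return true;
            }
        }
        return false;
    }

    // Combined form: && short-circuits, so the right-hand side is
    // likewise skipped when hello is false.
    public static bool Combined(bool hello, int index)
    {
        return hello && IndexIs34(index);
    }

    public static void Main()
    {
        IndexChecks = 0;
        Nested(false, 34);
        Combined(false, 34);
        Console.WriteLine(IndexChecks); // prints 0: neither call evaluated the index test
    }
}
```

Swapping && for the non-short-circuiting & would change this behaviour, which is the kind of juggling to be careful about.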
Any modern compiler, and by that I mean anything built in the past 20 years, will compile these to the same code.
As to which you should use, it depends on whichever is more readable and logical in the context of the project. Generally I would go for the second myself, but that would vary.
A strong point worth consideration though arises from maintenance. One of the more common bugs I have hunted down is a dangling if/else in the middle of a block of nested ifs. This arises if you have a complex series of if else conditions which has been amended by different programmers over a period - often several years. For example using pseudo-code for a simple case:
IF condition_a
    IF condition_b
        Do something
    ELSE
        Do something
    END IF
ELSE
    IF condition_b
        Do something
    END IF
END IF
You'll notice that for the combination !condition_a && !condition_b the code falls through the conditions and does nothing. This is quite easy to spot with just a pair of conditions, but becomes very easy to miss once you have 3, 4 or more if/else conditions to check. What commonly happens is that the nested structure is correct when first coded, but becomes incorrect (in terms of the business outputs) at some later point because the maintenance programmers do not understand or allow for the full range of options.
It's therefore generally more robust, over time, to code using combined conditions in the if structure, adopting the flattest feasible structure and keeping nesting to a minimum. In your example there's no logical reason not to combine the two conditions into a single statement, so you should do so.
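As a rough sketch of what the flat structure might look like for the pseudo-code case above (the method name and the returned labels are purely illustrative), every combination gets its own visible branch, including the one the nested version silently skipped:

```csharp
using System;

public static class FlatConditions
{
    // Hypothetical outcomes; the strings stand in for the "Do something" steps.
    public static string Decide(bool conditionA, bool conditionB)
    {
        if (conditionA && conditionB)
        {
            return "both";
        }
        else if (conditionA && !conditionB)
        {
            return "only A";
        }
        else if (!conditionA && conditionB)
        {
            return "only B";
        }
        else
        {
            // The case the nested version fell through without handling.
            return "neither";
        }
    }

    public static void Main()
    {
        Console.WriteLine(Decide(false, false)); // prints "neither"
    }
}
```

With the four cases laid out side by side, a maintenance programmer adding a third condition at least has to decide what each branch should do.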
I can't see that there will be any great performance difference with either, but I do think that option two is MUCH more readable.
I don't believe you will experience any performance difference between the two implementations.
Anyway, I go for the latter implementation because it is more readable.
Depends on the compiler. The difference will be more apparent when you have code after the close of the nested if, but before the close of the outer.
I've wondered about this often myself. However, it seems there really is no difference (or not much to speak of) between the options. Readability-wise, the second option is more readable and so I usually choose that one unless I anticipate having to code specifically for each condition for some reason.
Scenario: my query variable is dynamic; there are 4 possible values for it depending on the report type (_reportType). That means there are 4 different queries, and some of them don't have #STAFF in the where condition. So my question is: is it safe to just leave my
dBCommand.AddParameter("#STAFF", staff)
there or should I include if else condition just to be safe?
Like this
if(_reportType == 1)
{
    dBCommand.AddParameter("#STAFF", staff);
}
else if (_reportType == 2)
{
    //code
}
else if (_reportType == 3)
{
    //code
}
else
{
    //Don't add dBCommand.AddParameter("#STAFF", staff);
}
Is it safe just to leave addParameter("#STAFF", staff) even though I'm not going to use it in a query?
Example I'm going to write
dBCommand.Initialize(string.Format(query, "RetailTable"), batch);
dBCommand.AddParameter("#STAFF", staff);
But the query value doesn't have #STAFF in the WHERE condition
It should generally be ok to specify unused parameters, aside from the minor overhead of sending the value to the server. The exception is if you execute DDL queries that have a restriction of being the only statement in the batch (e.g. CREATE VIEW). Those would fail due to the parameter.
There are 2 glaring bad practices in your approach:
1. Generating dynamic query within the code.
This approach has many drawbacks and possible security loopholes. You should almost always avoid doing that.
Please go through the following links to understand this more:
https://codingsight.com/dynamic-sql-vs-stored-procedure/
https://www.ecanarys.com/Blogs/ArticleID/112/SQL-injection-attack-and-prevention-using-stored-procedure
2. Trying to use generic Where Clause that fits all your variations.
This approach is a disaster in waiting, regardless of whether the query is written in your application code or in a stored procedure.
This is an ugly code smell and a maintenance nightmare.
No developer can ever be 100% sure that no change will be required during the lifespan of the application, for the simple reason that the client WILL need enhancements on a regular basis.
So, even if this approach may work for you for a short time, it will come back to bite you.
Assume that, over time, a few more filter parameters are added due to new requirements. Now imagine what your code would look like and the problems it could cause if they are not handled properly, especially when YOU are not the one making those changes. Scary, right?
Always write code that is not only easy to read and understand, but also easy to enhance and maintain, regardless of who is writing the code.
So, IMHO, you should add those if-else conditions or use switch-case blocks to safeguard yourself and your client. It may look like overkill at the start, but it will surely pay off in the future.
Hope this helps!
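The switch-case suggestion could be sketched roughly as follows. This is only an illustration: the question's dBCommand wrapper is not shown in full, so a plain Dictionary stands in for it, and the assumption that only report type 1 needs #STAFF is taken from the question's own if/else outline.

```csharp
using System;
using System.Collections.Generic;

public static class ReportParameters
{
    // Assumption: only report type 1 uses #STAFF; the other cases are
    // placeholders for whatever those reports actually need.
    public static Dictionary<string, object> Build(int reportType, string staff)
    {
        var parameters = new Dictionary<string, object>();
        switch (reportType)
        {
            case 1:
                parameters.Add("#STAFF", staff);
                break;
            case 2:
            case 3:
                // Parameters specific to these reports would go here.
                break;
            default:
                // No #STAFF parameter for the remaining report type.
                break;
        }
        return parameters;
    }

    public static void Main()
    {
        Console.WriteLine(Build(1, "alice").Count); // prints 1
        Console.WriteLine(Build(4, "alice").Count); // prints 0
    }
}
```

Keeping the per-report parameters in one place like this makes it easier to see which query expects which value when a new filter is added later.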
Am I correct in saying that this:
public static void MethodName(bool first, bool second, bool third)
{
//Do something
}
Is more efficient than this:
public static void MethodName(bool[] boolArray)
{
bool first = boolArray[0];
bool second = boolArray[1];
bool third = boolArray[2];
//Do something
}
My thoughts are that for both they would have to declare first, second and third - just in different places. But for the second one it has to add it into an array and then unpack it again.
Unless you declared the array like this:
MethodName(new[] { true, true, true });
In which case I am not sure which is faster?
I ask because I am thinking of using the second one but wanted to know if/what the implications are on performance.
In this case performance is not particularly important, but it would be helpful for me to clarify this point.
Also, the second one has the advantage that you can pass as many values as you like to it, and it is also easier to read I think?
The reason I am thinking of using this is because there are already about 30 parameters being passed into the method and I feel it is becoming confusing to keep adding more. All these bools are closely related so I thought it may make the code more manageable to package them up.
I am working on existing code and it is not in my project scope to spend time reworking the method to decrease the number of parameters that are passed into the method, but I thought it would be good practice to understand the implications of this change.
In terms of performance, there's just one answer to your question:
"Programmers waste enormous amounts of time thinking about, or
worrying about, the speed of noncritical parts of their programs, and
these attempts at efficiency actually have a strong negative impact
when debugging and maintenance are considered. We should forget about
small efficiencies, say about 97% of the time: premature optimization
is the root of all evil. Yet we should not pass up our opportunities
in that critical 3%."
In terms of productivity, parameters > arrays.
Side note
Everyone should know that this was said by Donald Knuth in 1974. More than 40 years after this statement, we still fall into premature optimization (or even pointless optimization) very often!
Further reading
I would take a look at this other Q&A on Software Engineering
Am I correct in saying that this:
Is more efficient than this:
In isolation, yes. Unless the caller already has that array, in which case the second is the same or even (for larger argument types or more arguments) minutely faster.
I ask because I am thinking of using the second one but wanted to know if/what the implications are on performance.
Why are you thinking about the second one? If it is more natural at the point of the call, then whatever makes it more natural is also likely to have a performance benefit in the wider context that outweighs this difference, making the second the better choice.
If you're starting off with three separate bools and you're wrapping them just to unwrap them again then I don't see what this offers in practice except for more typing.
So your reason for considering this at all is the more important thing here.
In this case performance is not particularly important
Then really don't worry about it. It is certainly common for hot-path code that uses params to also offer overloads that take fixed numbers of individual parameters, but it really does only make a difference in hot paths. If you aren't in a hot path, the lifetime saving in computing time from picking whichever of the two is indeed more efficient is unlikely to add up to the amount of time it took you to write your post here.
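For what it's worth, the overload pattern mentioned here might look something like this minimal sketch (BoolCounter and CountTrue are invented names, not from any library):

```csharp
using System;

public static class BoolCounter
{
    // Fixed-arity overload: no array is allocated at the call site.
    public static int CountTrue(bool first, bool second, bool third)
    {
        return (first ? 1 : 0) + (second ? 1 : 0) + (third ? 1 : 0);
    }

    // params fallback for any other number of arguments; the compiler
    // builds a bool[] for each call that passes the values individually.
    public static int CountTrue(params bool[] values)
    {
        int count = 0;
        foreach (bool value in values)
        {
            if (value) count++;
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountTrue(true, false, true));      // binds to the three-bool overload, prints 2
        Console.WriteLine(CountTrue(true, true, true, true)); // binds to the params overload, prints 4
    }
}
```

Callers write the same thing either way; overload resolution picks the fixed-arity version when the argument count matches.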
If you are in a hot path and really need to shave off every nanosecond you can because you're looping so much that it will add up to something real, then you have to measure. Isolated changes have non-isolated effects when it comes to performance, so it doesn't matter whether the people on the Internet tell you A is faster than B if the wider context means the code calling A is slower than B. Measure. Measurement number one is "can I even notice?", if the answer to that measurement is "no" then leave it alone and find somewhere where the performance impact is noticeable to optimise instead.
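A very rough way to take that first "can I even notice?" measurement is a Stopwatch loop like the sketch below. The method names are invented for the example, and a loop like this is a crude instrument (a dedicated benchmarking tool will give more trustworthy numbers), so treat the output as a hint rather than a verdict:

```csharp
using System;
using System.Diagnostics;

public static class CallTiming
{
    public static int WithParameters(bool first, bool second, bool third)
    {
        return (first ? 1 : 0) + (second ? 1 : 0) + (third ? 1 : 0);
    }

    public static int WithArray(bool[] values)
    {
        return (values[0] ? 1 : 0) + (values[1] ? 1 : 0) + (values[2] ? 1 : 0);
    }

    // Runs the action repeatedly and returns the elapsed milliseconds.
    public static long TimeMs(Action action, int iterations)
    {
        action(); // warm-up call so JIT compilation is not part of the timing
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int iterations = 10000000;
        int sink = 0; // accumulate results so the calls are not trivially dead code
        long separate = TimeMs(() => sink += WithParameters(true, false, true), iterations);
        long packed = TimeMs(() => sink += WithArray(new[] { true, false, true }), iterations);
        Console.WriteLine("parameters: " + separate + " ms, array: " + packed + " ms (sink " + sink + ")");
    }
}
```

Note that the array version here also pays for allocating the array on every call, which is part of the "wider context" the answer is talking about.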
Write "natural" code to start with, before seeing if little tweaks can have a performance impact in the bits that are actually hurting you. This isn't just because of the importance of readability and so on, but also because:
The more "natural" code in a given language very often is the more efficient. Even if you think it can't be, it's more likely to benefit from some compiler optimisation behind the scenes.
The more "natural" code is a lot easier to tweak for performance when it is necessary than code doing a bunch of strange things.
I don't think this would affect the performance of your app at all.
Personally
I'd go with the first option for two reasons:
Naming each parameter: this matters if the project is large-scale with a lot of code, or for possible future edits and enhancements.
Usability: if you are sending a list of similar parameters then you must use an array or a list; if it is just a couple of parameters that happen to be of the same type then you should be sending them separately.
A third way would be to use params (Params - MSDN).
In the end I don't think it will change much in performance.
An array, though, inherits from the abstract Array class, which implements ICloneable, IList, ICollection, IEnumerable, IStructuralComparable and IStructuralEquatable (single-dimensional arrays also provide IEnumerable<T>). This means an array is a heavier object than three value-type parameters, which will make it somewhat slower.
Array - MSDN
You could test performance differences on both, but I doubt there would be much difference.
You have to consider maintainability: is another programmer, or even yourself, going to understand why you did it that way in a few weeks' or months' time when it's time for review? Is it easily extended? Can you pass different object types through to your method?
If you're passing a collection of items, then certainly packing them into an array would be quicker than specifying a new parameter for each additional item.
If you have to, you can do it that way, but have you considered a params array?
Why use the params keyword?
public static void MethodName(params bool[] boolArray)
{
//extract data here
}
Agreed with Matias' answer.
I also want to add that you need to add error checking: you are passed an array, and nothing states how many elements it will contain. So you must first check that the array has three elements. This will offset any small perf gain you may have earned.
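The kind of checking meant here could be sketched like this (the class and method names are placeholders, and the exact exceptions thrown are a design choice rather than a requirement):

```csharp
using System;

public static class ArrayArguments
{
    public static string Describe(bool[] boolArray)
    {
        // Guard clauses the three-parameter signature would not need.
        if (boolArray == null)
        {
            throw new ArgumentNullException(nameof(boolArray));
        }
        if (boolArray.Length != 3)
        {
            throw new ArgumentException("Exactly three values are expected.", nameof(boolArray));
        }

        bool first = boolArray[0];
        bool second = boolArray[1];
        bool third = boolArray[2];
        return first + ", " + second + ", " + third;
    }

    public static void Main()
    {
        Console.WriteLine(Describe(new[] { true, false, true })); // prints "True, False, True"
        try
        {
            Describe(new[] { true });
        }
        catch (ArgumentException)
        {
            Console.WriteLine("rejected"); // the short array is refused
        }
    }
}
```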
Also, if you ever want to make this method available to other developers (as part of an API, public or private), IntelliSense will not help them at all with which parameters they're supposed to set...
While using three parameters, you can do this :
///<summary>
///This method does something
///</summary>
///<param name="first">The first parameter</param>
///<param name="second">The second parameter</param>
///<param name="third">The third parameter</param>
public static void MethodName(bool first, bool second, bool third)
{
//Do something
}
And it will be displayed nicely and helpfully to others...
I would take a different approach and use Flags;
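private const int FIRST = 1; // assumed bit value for the first flag

public static void MethodName(int Flag)
{
    // C# has no implicit int-to-bool conversion, so compare against 0.
    if ((Flag & FIRST) != 0) { }
}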
Chances are the compiler will do its own optimizations;
Check http://rextester.com/QRFL3116 Added method from Jamiec comment
M1 took 5ms
M2 took 23ms
M3 took 4ms
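The flags idea above is often written in C# with a [Flags] enum rather than a bare int. A minimal sketch, with invented names (Options, HasFirst) standing in for whatever the real bools represent:

```csharp
using System;

[Flags]
public enum Options
{
    None = 0,
    First = 1,
    Second = 2,
    Third = 4
}

public static class FlagsDemo
{
    // Tests a single bit of the combined value.
    public static bool HasFirst(Options options)
    {
        return (options & Options.First) != 0;
    }

    public static void Main()
    {
        Options options = Options.First | Options.Third;
        Console.WriteLine(HasFirst(options)); // prints True
        Console.WriteLine(options);           // prints "First, Third"
    }
}
```

This keeps a single parameter while still giving each option a name at the call site.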
Code is read more often than it is updated. Writing more readable code is better than writing powerful and geeky code when compilers can optimize for best execution.
For example see below code - this code can be compressed by combining the nested if statements, but will the compiler not optimize this code for best execution anyway while we get to maintain the readability of it?
// yield sunRays when sky is blue.
// yield sunRays when sky is not blue and sun is not present.
if (yieldWhenSkyIsBlue)
{
    // if sky is blue and sun is present -> yield sunRaysObjB.
    if (sunObjA != null)
    {
        yield return sunRaysObjB;
    }
    else
    {
        // do not yield ;
    }
}
else
{
    // if sky is not blue and sun is not present -> yield sunRaysObjB.
    if (sunObjA == null)
    {
        yield return sunRaysObjB;
    }
}
As opposed to something like this :
// yield sunRays when (sky is blue) or (sun is not present and sky is blue).
// (this interpretation is a bit misleading as compared to first one?)
if ((sunObjA == null && yieldWhenSkyIsBlue == false) || (yieldWhenSkyIsBlue && sunObjA != null))
{
    yield return sunRaysObjB;
}
Does reading the first version depict the use case better for future enhancements/updates? The second version of the code is shorter, but does reading it make the use case apparent? Are there other advantages of the second version apart from concise code?
update #1: yes, it returns ObjB in both cases, but depending on the condition it may not yield at all, so the strategy decides when to yield and when not to (one more reason why readability is important).
update #2: updated to cite a better example. Copied the syntax from stripplingWarrior.
update #3: updated for "What do you expect to happen when the sun is out and the sky is blue".
I think the second code example is much more readable, and has the advantage of being pretty optimal anyway.
Most programmers will find this logic flow to be obvious and natural: you will return ObjB if ObjA is null, or if it's not null and howtoYieldFalg is set.
But if I had to choose between making code like this more readable and making it optimal, I'd make it readable first. Only if I discovered that it's the source of a bottleneck would I bother optimizing it. In this particular case, I can pretty much guarantee that your use of yield return will introduce way more overhead than a suboptimal evaluation of your conditionals.
Update
Take another look at your code samples: they are not logically equivalent. What do you expect to happen when the sun is out and the sky is blue? The second code sample correctly allows sun rays to shine in that case, whereas the first example does not.
The fact that it was so easy to introduce a bug in the first case which so many people failed to catch for so long should be ample evidence to show that the second approach is better. All those nested if/else statements can be tricky to keep straight, even to an experienced programmer. Simple boolean logic is a lot easier to keep straight, especially once you use variable names that give it meaning.
Update 2
Based on the further explanation, and with a little creativity, I'm going to suggest an approach that uses both comments and variable names to increase clarity:
/* Explanation: We live on a strange planet where the sun's
* rays can shine if the sky is blue while the sun is out,
* or if the sky is not blue and there is no sun. */
bool sunIsPresent = sunObjA != null;
if ((skyIsBlue && sunIsPresent) ||
(!skyIsBlue && !sunIsPresent))
{
yield return sunRaysObjB;
}
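Since nested and flattened versions drifted apart once already in this thread, it may be worth checking them against each other over all four input combinations. A small sketch of such a check (the method names are made up, and plain bool-returning methods stand in for the iterator's yield return):

```csharp
using System;

public static class SunLogic
{
    // Nested form, following the structure of the question's first sample.
    public static bool NestedShouldYield(bool skyIsBlue, bool sunIsPresent)
    {
        if (skyIsBlue)
        {
            if (sunIsPresent)
            {
                return true;
            }
        }
        else
        {
            if (!sunIsPresent)
            {
                return true;
            }
        }
        return false;
    }

    // Flattened form, matching the condition suggested above.
    public static bool FlatShouldYield(bool skyIsBlue, bool sunIsPresent)
    {
        return (skyIsBlue && sunIsPresent) || (!skyIsBlue && !sunIsPresent);
    }

    // Walks the full truth table and reports whether both forms agree.
    public static bool Agree()
    {
        foreach (bool sky in new[] { false, true })
        {
            foreach (bool sun in new[] { false, true })
            {
                if (NestedShouldYield(sky, sun) != FlatShouldYield(sky, sun))
                {
                    return false;
                }
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(Agree()); // prints True
    }
}
```

As a side observation, both forms reduce to skyIsBlue == sunIsPresent, though whether that reads more clearly than the spelled-out version is a matter of taste.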
The compiler optimizes right through any way you organize your program's control flow, so you really do not have to worry about it.
The weakness of compilers though, is they only optimize based on preserving code semantics, not preserving the meaning you intend. I compiled both your examples in LLVM, and here are the control flow graphs generated:
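[Figure: control flow graph for the first example]
and
[Figure: control flow graph for the second example]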
I was surprised to find the two CFG's are slightly different. You will note that first is an instruction smaller, but in the second graph, there exists a path to the exit node which only passes through one comparison, whereas in the first, two comparisons are always necessary.
In fact, further tracing of possible routes yields that the first example has possible routes of 6,8,8,6 instructions long, while the second has routes of 8,10,10 respectively. In BOTH cases the average run length is 7 instructions long, but we can see that the first case has better best-time run lengths. Without more information the compiler cannot tell which is better.
tldr: Compilers do magic stuff, don't worry about it, code how you think is best.
This is probably not the popular opinion but I'd definitely not rely on the compiler to perform optimizations of this type. (It may do it, I don't know.) I don't see the second example as geeky - for me it describes more clearly that the two conditions are connected.
Typically I try to write as optimal code as possible without making it very cryptic and then let the compiler optimize that.
Though I haven't tested this particular case, I'm willing to bet that there will be no significant difference between the generated code, if any at all.
Unless you're doing it for fun or a specialized use case, I would argue human-readability is by far the more important quality of good code. The compiler is going to collapse much of your expressive code into more efficient forms, and what it misses you probably won't ever notice.
Given that, idiomatic code is easier to read even when it's less concise. Experienced readers of a language are going to recognize a common pattern more quickly than unfamiliar code that is, arguably 'more human' but breaks the familiar pattern. Looping/incrementing constructs are a good example of code that should be unsurprising. So, my approach is: Be expressive but not too clever.
OK, so the title may have been confusing, so I have posted 2 code snippets to illustrate what I mean.
NOTE: allUsers is just a collection.
RegularUser regUser = new RegularUser(userName, password, name, emailAddress);
allUsers.Add(regUser);
VS
allUsers.Add(new RegularUser(userName, password, name, emailAddress));
Which snippet A or B is better and why?
What are the advantages or disadvantages?
The example I wrote was C# but does the language (C#, Java etc) make a difference?
As far as C# is concerned, both of your code examples are practically identical at the IL level. The second example still creates a reference to the created object and pushes it onto the stack; you just don't have a local variable hooked up to it. This will not create any performance problems at all.
1) Which snippet A or B is better and why?
They're really identical. The compiled code will be nearly identical, since a temporary object is pushed onto the stack, then used in the method call.
2) What are the advantages or disadvantages?
The main advantages and disadvantages to the approach are really just readability.
Your first example has the advantage of keeping a single "operation" per line of code, which, in many ways, is more maintainable.
The second example removes the unnecessary variable declaration, which may be more maintainable.
Personally, I feel that the number of parameters in your RegularUser constructor would probably push me, in this instance, towards your first option. I typically find that, when a line of code gets to be more than about half a screen width on an average monitor, it's easier to read and understand if it's split up. Splitting this up by introducing the temporary and calling Add separately makes this more clear.
However, if you're just adding an integer or a class that's very small, I'd probably vote to skip the unnecessary variable. This is completely a personal preference, however - your mileage may (and probably will) vary.
3) The example I wrote was C# but does the language (C#, Java etc) make a difference?
No, for the most part. This is really language/implementation dependent, but most languages will have the same basic behavior and performance in both cases. It is possible (and highly likely) that some languages may treat this differently, but most mainstream languages will not.
I really like to create them the first way unless I really really know what is going on. It is much harder to do debugging if you don't create the object first...
The compiler will just turn the 2nd version into the 1st for you, anyway, so there isn't a net negative effect.
Pros of #1:
easier to debug (!)
theoretically easier to read, clearer
can use the object later
Cons:
more verbose
can be unnecessary, especially for trivial objects
Result:
1 for anything complex to create, or that may need to be inspected easily at debug time
2 for lots of annoying little stuff, like the following.
var list = new List<NameValuePair>(3);
list.Add(new NameValuePair("name", "valuable"));
list.Add(new NameValuePair("age", "valuable"));
list.Add(new NameValuePair("height", "not valuable"));
var dates = new List<DateTime>();
dates.Add(DateTime.Now);
dates.Add(DateTime.Now.Date.AddYears(-2));
As far as I know there isn't a real difference between languages when it comes to this. Some may not allow it, though.
Both are equal in terms of performance.
In terms of maintainability the second case is a nightmare: it is (nearly) impossible to trace in a debugger. So I tend to prefer the first one. In my early OOP days I was always writing the second, because "I knew that they were objects and I was sooo good at grasping objects that I ... blah blah blah", but that wore off with time, and especially with maintenance time.
Also, suppose that someone wants you to
FilterClass.FilterUser(regUser)
or
Database.AddToDatabase(regUser)
because it is the right place to do so, the first scenario is better.
Finally, when do you stop?
allUsers.Add(new RegularUser(new ReadFromInput(new EscapedName(new Name(new String(userName)))), password, name, emailAddress));
I've been poking around mscorlib to see how the generic collection optimized their enumerators and I stumbled on this:
// in List<T>.Enumerator
public bool MoveNext()
{
    List<T> list = this.list;
    if ((this.version == list._version) && (this.index < list._size))
    {
        this.current = list._items[this.index];
        this.index++;
        return true;
    }
    return this.MoveNextRare();
}
The stack size is 3, and the size of the bytecode should be 80 bytes. The naming of the MoveNextRare method got me on my toes: it contains an error case as well as an empty-collection case, so obviously this is breaching separation of concerns.
I assume the MoveNext method is split this way to optimize stack space and help the JIT, and I'd like to do the same for some of my perf bottlenecks, but without hard data, I don't want my voodoo programming turning into cargo-cult ;)
Thanks!
Florian
If you're going to think about ways in which List<T>.Enumerator is "odd" for the sake of performance, consider this first: it's a mutable struct. Feel free to recoil with horror; I know I do.
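For anyone wondering why a mutable struct is worth recoiling from, here is a small sketch of the classic surprise (the demo class and method names are invented): assigning the enumerator to another variable copies its state, so advancing the copy leaves the original where it was.

```csharp
using System;
using System.Collections.Generic;

public static class MutableStructDemo
{
    // Returns { original.Current, copy.Current } after advancing only the copy.
    public static int[] Run()
    {
        var list = new List<int> { 1, 2, 3 };

        List<int>.Enumerator original = list.GetEnumerator();
        original.MoveNext();                    // original.Current is now 1

        List<int>.Enumerator copy = original;   // struct assignment copies the state
        copy.MoveNext();                        // advances only the copy

        return new[] { original.Current, copy.Current };
    }

    public static void Main()
    {
        int[] result = Run();
        Console.WriteLine(result[0]); // prints 1
        Console.WriteLine(result[1]); // prints 2
    }
}
```

With a class-based enumerator both variables would refer to the same object and both would report 2.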
Ultimately, I wouldn't start mimicking optimisations from the BCL without benchmarking/profiling what difference they make in your specific application. It may well be appropriate for the BCL but not for you; don't forget that the BCL goes through the whole NGEN-alike service on install. The only way to find out what's appropriate for your application is to measure it.
You say you want to try the same kind of thing for your performance bottlenecks: that suggests you already know the bottlenecks, which suggests you've got some sort of measurement in place. So, try this optimisation and measure it, then see whether the gain in performance is worth the pain of readability/maintenance which goes with it.
There's nothing cargo-culty about trying something and measuring it, then making decisions based on that evidence.
Separating it into two functions has some advantages:
If the method were to be inlined, only the fast path would be inlined and the error handling would still be a function call. This prevents inlining from costing too much extra space. But 80 bytes of IL is probably still above the threshold for inlining (it was once documented as 32 bytes, don't know if it's changed since .NET 2.0).
Even if it isn't inlined, the function will be smaller and fit within the CPU's instruction cache more easily, and since the slow path is separate, it won't have to be fetched into cache every time the fast path is.
It may help the CPU branch predictor optimize for the more common path (returning true).
I think that MoveNextRare is always going to return false, but by structuring it like this it becomes a tail call, and if it's private and can only be called from here then the JIT could theoretically build a custom calling convention between these two methods that consists of just a jmp instruction with no prologue and no duplication of epilogue.
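If you want to experiment with the same split in your own code, a sketch along these lines might be a starting point. ArrayCursor is an invented type, and the MethodImpl attributes are my own addition to make the intent explicit; the snippet quoted in the question shows no such attributes, and whether any of this helps is exactly what you would need to measure.

```csharp
using System;
using System.Runtime.CompilerServices;

public sealed class ArrayCursor
{
    private readonly int[] items;
    private int index;
    private int current;

    public ArrayCursor(int[] items)
    {
        this.items = items;
    }

    public int Current
    {
        get { return current; }
    }

    // Fast path: small body, hinted as an inlining candidate.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public bool MoveNext()
    {
        if (index < items.Length)
        {
            current = items[index];
            index++;
            return true;
        }
        return MoveNextRare();
    }

    // Slow path: kept out of line so it does not bloat the fast path.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private bool MoveNextRare()
    {
        index = items.Length + 1;
        current = 0;
        return false;
    }
}

public static class CursorDemo
{
    public static int Sum(int[] values)
    {
        var cursor = new ArrayCursor(values);
        int total = 0;
        while (cursor.MoveNext())
        {
            total += cursor.Current;
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Sum(new[] { 1, 2, 3 })); // prints 6
    }
}
```

The attributes are only hints or restrictions for the JIT, so the actual code generated can differ between runtime versions.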