Back in the day, when I was learning C and assembly, we were taught that it is better to use simple comparisons to increase speed. So, for example, if you say:
if(x <= 0)
versus
if(x < 1)
which would execute faster? My argument (which may be wrong) is that the second would almost always execute faster because there is only a single comparison, i.e. is it less than one, yes or no.
The first, on the other hand, will execute fast if the number is less than 0, because that alone equates to true and there is no need to check for equality, making it as fast as the second. However, it will always be slower if the number is 0 or more, because it then has to do a second comparison to see whether the number is equal to 0.
I am now using C#, and while developing for desktops speed is not an issue (at least not to the degree that this point is worth arguing), I still think such arguments need to be considered, as I am also developing for mobile devices, which are much less powerful than desktops, and speed does become an issue on such devices.
For further consideration, I am talking about whole numbers (no decimals) and values that cannot legitimately be negative, like -1 or -12,345 (unless there is an error). For example, when dealing with lists or arrays, you can't have a negative number of items, but you may want to check whether a list is empty, or use a negative value to signal a problem (say, there are some items in a list but you cannot retrieve the whole list for some reason, and you indicate this by setting the count to a negative number, which is not the same as saying there are no items).
For the reason above I deliberately left out the obvious
if(x == 0)
and
if(x.isnullorempty())
and other such items for detecting a list with no items.
Again, for consideration, we are talking about retrieving items from a database, perhaps using SQL stored procedures that behave as described above (i.e. the standard, at least in this company, is to return a negative number to indicate a problem).
So in such cases, is it better to use the first or the second item above?
They're identical. Neither is faster than the other. They both ask precisely the same question, assuming x is an integer. C# is not assembly. You're asking the compiler to generate the best code to get the effect you are asking for. You aren't specifying how it gets that result.
See also this answer.
My argument (which may be wrong) is the second would almost always execute faster because there is only a single comparison) i.e. is it less than one, yes or no.
Clearly that's wrong. Watch what happens if you assume that's true:
< is faster than <= because it asks fewer questions. (Your argument.)
> is the same speed as <= because it asks the same question, just with an inverted answer.
Thus < is faster than >! But this same argument shows > is faster than <.
"just with an inverted answer" seems to sneak in an additional boolean operation so I'm not sure I follow this answer.
That's wrong (for silicon, it is sometimes correct for software) for the same reason. Consider:
3 != 4 is more expensive to compute than 3 == 4, because it's 3 == 4 with an inverted answer, an additional boolean operation.
3 == 4 is more expensive than 3 != 4, because it's 3 != 4 with an inverted answer, an additional boolean operation.
Thus, 3 != 4 is more expensive than itself.
An inverted answer is just the opposite question, not an additional boolean operation. Or, to be a bit more precise, it's the same comparison with a different mapping of comparison results to the final answer. Both 3 == 4 and 3 != 4 require you to compare 3 and 4. That comparison results in either "equal" or "unequal". The questions just map "equal" and "unequal" to "true" and "false" differently. Neither mapping is more expensive than the other.
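As a tiny illustration (a sketch; the x86 notes describe typical codegen, not a guarantee), both forms reduce to the same compare, and only the condition tested afterwards differs:
static bool AreEqual(int x, int y)    { return x == y; }  // typically cmp + sete on x86
static bool AreNotEqual(int x, int y) { return x != y; }  // typically cmp + setne on x86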
At least in most cases, no, there's no advantage to one over the other.
A <= does not normally get implemented as two separate comparisons. On a typical (e.g., x86) CPU, you'll have two separate flags, one to indicate equality, and one to indicate negative (which can also mean "less than"). Along with that, you'll have branches that depend on a combination of those flags, so < translates to a jl or jb (jump if less or jump if below -- the former is for signed numbers, the latter for unsigned). A <= will translate to a jle or jbe (jump if less than or equal, jump if below or equal).
Different CPUs will use different names/mnemonics for the instructions, but most still have equivalent instructions. In every case of which I'm aware, all of those execute at the same speed.
Edit: Oops -- I meant to mention one possible exception to the general rule I mentioned above. Although it's not exactly from < vs. <=, if/when you can compare to 0 instead of any other number, you can sometimes gain a little (minuscule) advantage. For example, let's assume you had a variable you were going to count down until you reached some minimum. In a case like this, you might well gain a little advantage if you can count down to 0 instead of counting down to 1. The reason is fairly simple: the flags I mentioned previously are affected by most instructions. Let's assume you had something like:
do {
    // whatever
} while (--i >= 1);
A compiler might translate this to something like:
loop_top:
; whatever
dec i
cmp i, 1
jge loop_top
If, instead, you compare to 0 (while (--i > 0) or while (--i != 0)), it might translate to something like this instead:
loop_top:
; whatever
dec i
jg loop_top
; or: jnz loop_top
Here the dec sets/clears the zero flag to indicate whether the result of the decrement was zero or not, so the condition can be based directly on the result from the dec, eliminating the cmp used in the other code.
I should add, however, that while this was quite effective, say, 30+ years ago, most modern compilers can handle translations like this without your help (though some compilers may not, especially for things like small embedded systems). IOW, if you care about optimization in general, it's barely possible that you might someday care -- but at least to me, application to C# seems doubtful at best.
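If you want to see the idiom in C# terms, here is a minimal sketch (the method names are mine); with a modern JIT both versions usually end up equally fast, and the point is only what the count-down form makes easy for the code generator:
static long SumCountingUp(int n)
{
    long sum = 0;
    for (int i = 1; i <= n; i++)   // the loop test compares i against n every iteration
        sum += i;
    return sum;
}

static long SumCountingDown(int n)
{
    long sum = 0;
    for (int i = n; i > 0; --i)    // the loop test compares against zero, which the
        sum += i;                  // decrement's own flags can answer (dec + jnz style)
    return sum;
}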
Most modern hardware has built-in instructions for checking the less-than-or-equal condition in a single instruction that executes exactly as fast as the one checking the less-than condition. The argument that applied to the (much) older hardware no longer applies - choose the alternative that you think is most readable, i.e. the one that better conveys your idea to the readers of your code.
Here are my functions:
public static void TestOne()
{
    Boolean result;
    Int32 i = 2;

    for (Int32 j = 0; j < 1000000000; ++j)
        result = (i < 1);
}

public static void TestTwo()
{
    Boolean result;
    Int32 i = 2;

    for (Int32 j = 0; j < 1000000000; ++j)
        result = (i <= 0);
}
Here is the IL code, which is identical:
L_0000: ldc.i4.2
L_0001: stloc.0
L_0002: ldc.i4.0
L_0003: stloc.1
L_0004: br.s L_000a
L_0006: ldloc.1
L_0007: ldc.i4.1
L_0008: add
L_0009: stloc.1
L_000a: ldloc.1
L_000b: ldc.i4 1000000000
L_0010: blt.s L_0006
L_0012: ret
After a few test runs, the result is, unsurprisingly, that neither is faster than the other. The difference is only a few milliseconds, which can't be considered a real difference, and the produced IL output is the same anyway.
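For anyone who wants to reproduce the measurement, a Stopwatch-based harness along these lines should work (a sketch; the names are mine, and the result is consumed so the JIT can't discard the loops):
using System;
using System.Diagnostics;

static class ComparisonBenchmark
{
    static void Main()
    {
        const int iterations = 1000000000;
        int i = 2;
        int hits = 0;

        Stopwatch sw = Stopwatch.StartNew();
        for (int j = 0; j < iterations; ++j)
            if (i < 1) hits++;
        sw.Stop();
        Console.WriteLine("i < 1  : {0} ms ({1})", sw.ElapsedMilliseconds, hits);

        sw.Restart();
        for (int j = 0; j < iterations; ++j)
            if (i <= 0) hits++;
        sw.Stop();
        Console.WriteLine("i <= 0 : {0} ms ({1})", sw.ElapsedMilliseconds, hits);
    }
}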
Both ARM and x86 processors have dedicated instructions for both "less than" and "less than or equal" (which could also be evaluated as "NOT greater than"), so there will be absolutely no real-world difference if you use any semi-modern compiler.
While refactoring, if you change your mind about the logic, if(x<=0) is quicker (and less error prone) to negate in your head: its negation is simply x>0, whereas the negation of if(x<1) is x>=1, which is easier to get wrong. But that's probably not the performance you're referring to. ;-)
If x<1 were faster, modern compilers would change x<=0 into x<1 (assuming x is an integral type). So for modern compilers this should not matter; they should produce identical machine code.
Even if x<=0 compiled to different instructions than x<1, the performance difference would be so minuscule as to not be worth worrying about most of the time; there will very likely be other more productive areas for optimizations in your code. The golden rule is to profile your code and optimise the bits that ARE actually slow in the real world, not the bits that you think hypothetically may be slow, or are not as fast as they theoretically could be. Also concentrate on making your code readable to others, and not phantom micro-optimisations that disappear in a puff of compiler smoke.
@Francis Rodgers, you said:
Whereas the first will execute fast if the number is less than 0
because this equates to true there is no need to check the equals
making it as fast as the second, however, it will always be slower if
the number is 0 or more because it has to then do a second comparison
to see if it is equal to 0.
and (in comments),
Can you explain where > is the same as <= because this doesn't make
sense in my logical world. For example, <=0 is not the same as >0 in
fact, totally opposite. I would just like an example so I can
understand your answer better
You asked for help, and you need it. I really want to help you, and I suspect many other people need this help too.
Let's begin with the most basic thing. Your idea that testing for > is not the same as testing for <= is logically wrong (not just in any particular programming language). Look at these diagrams, relax, and think about it: what happens if you know that X <= Y in A and in B? What happens if you know that X > Y in each diagram?
Right, nothing changes; they are equivalent. The key detail of the diagrams is that true and false in A and B are on opposite sides. That means the compiler (or, in general, the coder) has the freedom to reorganize the program flow so that both questions are equivalent. In other words, there is no need to split <= into two steps, only to reorganize the flow a little. Only a very bad compiler or interpreter would be unable to do that, and none of this has anything to do with assembler yet. Even for CPUs without sufficient flags for every comparison, the compiler can generate (pseudo) assembler code that uses whichever test best suits the CPU's characteristics. And since CPUs can check more than one flag in parallel at the electronic level, the compiler's job is even simpler.
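A small C# sketch of the same point (the names are mine): the two methods encode exactly the same decision, only which branch is the "taken" one is swapped, so one comparison is enough either way:
static string FormA(int x, int y)
{
    if (x <= y) return "less or equal";
    return "greater";
}

static string FormB(int x, int y)
{
    if (x > y) return "greater";
    return "less or equal";
}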
You may find it curious/interesting to read pages 3-14 to 3-15 and 5-3 to 5-5 (the latter include the jump instructions, which may surprise you): http://download.intel.com/products/processor/manual/325462.pdf
Anyway, I’d like to discuss more about related situations.
Comparing with 0 or with 1: @Jerry Coffin has a very good explanation at the assembler level. Going deeper, at the machine-code level, the variant that compares with 1 needs to hard-code the 1 into the CPU instruction and load it into the CPU, while the other variant manages to avoid that. Either way the gain is absolutely small; I don't think it would be measurable in speed in any real-life situation. As a side comment, the instruction cmp i, 1 just performs a sort of subtraction, i - 1 (without saving the result), while setting the flags, so you end up comparing against 0 anyway!
More important could be this situation: comparing X<=Y versus Y>=X. The two are obviously logically equivalent, but they can have very different side effects if X and Y are expressions that need to be evaluated and can influence each other's results. Very bad style, and in some languages potentially undefined.
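A hypothetical C# sketch of that trap (Next() is mine; note that C# does define left-to-right evaluation, so the result is deterministic, but the two spellings still behave differently, and in C or C++ the situation can be worse):
using System;

static class OperandOrderDemo
{
    static int counter;
    static int Next() { return ++counter; }   // observable side effect

    static void Main()
    {
        counter = 0;
        Console.WriteLine(Next() <= Next());  // left call runs first: 1 <= 2 -> True

        counter = 0;
        Console.WriteLine(Next() >= Next());  // now the other side runs first: 1 >= 2 -> False
    }
}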
Now, coming back to the diagrams and looking at the assembler examples from @Jerry Coffin as well, I see the following issue. Real software is a sort of linear chain in memory: one of the conditions makes you jump to another program-memory position, while the opposite one just falls through. It could make sense to make the more frequent condition the one that falls through. I don't see how we can give the compiler a hint about this, and obviously the compiler can't figure it out by itself. Please correct me if I'm wrong, but this sort of optimization problem is fairly general, and the programmer has to decide it without the compiler's help.
But again, in almost any situation I'll write my code with the overall style and readability in mind, not these small local optimizations.
Related
I'm doing a bit of coding, where I have to write this sort of code:
if (array[i] == false)
    array[i] = true;
I wonder if it should be re-written as
array[i]=true;
This raises the question: are comparisons faster than assignments?
What about differences from language to language? (contrast between java & cpp, eg.)
NOTE: I've heard that "premature optimization is the root of all evil." I don't think that applies here :)
This isn't just premature optimization, this is micro-optimization, which is an irrelevant distraction.
Assuming your array is of boolean type then your comparison is unnecessary, which is the only relevant observation.
Well, since you say you're sure that this matters you should just write a test program and measure to find the difference.
Comparison can be faster if this code is executed on multiple variables allocated at scattered addresses in memory. With a comparison you only read data from memory into the processor cache, and if you don't change the variable's value, then when the cache decides to flush the line it will see that the line was not changed and there's no need to write it back to memory. This can speed up execution.
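A sketch of that "read before you write" pattern in C# (the names are mine): skip the store when the value is already correct, so an untouched cache line never needs to be written back:
static void SetAllTrue(bool[] flags)
{
    for (int i = 0; i < flags.Length; i++)
    {
        if (!flags[i])        // only a read when the flag is already true
            flags[i] = true;  // write only when something actually changes
    }
}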
Edit: I wrote a script in PHP. I just noticed that there was a glaring error in it meaning the best-case runtime was being calculated incorrectly (scary that nobody else noticed!)
Best case just beats outright assignment but worst case is a lot worse than plain assignment. Assignment is likely fastest in terms of real-world data.
Output:
assignment in 0.0119960308075 seconds
worst case comparison in 0.0188510417938 seconds
best case comparison in 0.0116770267487 seconds
Code:
<?php
$arr = array();

$mtime = explode(" ", microtime());
$starttime = $mtime[1] + $mtime[0];

reset_arr($arr);
for ($i = 0; $i < 10000; $i++)
    $arr[$i] = true;

$mtime = explode(" ", microtime());
$firsttime = $mtime[1] + $mtime[0];
$totaltime = ($firsttime - $starttime);
echo "assignment in ".$totaltime." seconds<br />";

reset_arr($arr);
for ($i = 0; $i < 10000; $i++)
    if (!$arr[$i])        // always true after the reset: compare and assign every time
        $arr[$i] = true;

$mtime = explode(" ", microtime());
$secondtime = $mtime[1] + $mtime[0];
$totaltime = ($secondtime - $firsttime);
echo "worst case comparison in ".$totaltime." seconds<br />";

reset_arr($arr);
for ($i = 0; $i < 10000; $i++)
    if ($arr[$i])         // always false after the reset: compare only, never assign
        $arr[$i] = true;

$mtime = explode(" ", microtime());
$thirdtime = $mtime[1] + $mtime[0];
$totaltime = ($thirdtime - $secondtime);
echo "best case comparison in ".$totaltime." seconds<br />";

function reset_arr(&$arr) {
    for ($i = 0; $i < 10000; $i++)
        $arr[$i] = false;
}
I believe that if comparison and assignment statements are both atomic (i.e. one processor instruction each) and the loop executes n times, then in the worst case comparing-then-assigning would require n+1 instructions (a comparison on every iteration plus one assignment), whereas unconditionally assigning the bool would require n instructions. Therefore the second one is more efficient.
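In C#-ish terms (a sketch; the names are mine), the two strategies for a single flag look like this:
static bool CheckThenSet(int n)
{
    bool flag = false;
    for (int i = 0; i < n; i++)
    {
        if (!flag)        // n comparisons...
            flag = true;  // ...but at most one assignment
    }
    return flag;
}

static bool AlwaysSet(int n)
{
    bool flag = false;
    for (int i = 0; i < n; i++)
        flag = true;      // n unconditional assignments
    return flag;
}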
Depends on the language. However, looping through arrays can be costly as well. If the array is in consecutive memory, the fastest option is to write 1 bits (255s) across the entire array with memset, assuming your language/compiler lets you do this.
That way you perform 0 reads and 1 write in total, instead of reading/writing the loop variable and the array element (2 reads / 2 writes per iteration) several hundred times.
I really wouldn't expect there to be any kind of noticeable performance difference for something as trivial as this, so surely it comes down to what gives you clearer, more readable code. In my opinion, that would be always assigning true.
Might give this a try:
if (!array[i])
    array[i] = true;
But really the only way to know for sure is to profile, I'm sure pretty much any compiler would see the comparison to false as unnecessary and optimize it out.
It all depends on the data type. Assigning booleans is faster than first comparing them. But that may not be true for larger value-based datatypes.
As others have noted, this is micro-optimization.
(In politics or journalism, this is known as navel-gazing ;-)
Is the program large enough to have more than a couple layers of function/method/subroutine calls?
If so, it probably has some avoidable calls, and those can waste hundreds of times as much time as low-level inefficiencies.
On the assumption that you have removed those (which few people do), then by all means run it 10^9 times under a stopwatch, and see which is faster.
Why would you even write the first version? What's the benefit of checking to see if something is false before setting it to true? If you are always going to set it to true, then always set it to true.
When you have a performance bottleneck that you've traced back to setting a single boolean value unnecessarily, come back and talk to us.
I remember that in one book about assembly language the author claimed that an if condition should be avoided if possible.
It is much slower when the condition is false and execution has to jump to another line, which considerably slows down performance. Also, since programs are executed as machine code, I think 'if' is slower in every (compiled) language, unless its condition is true almost all the time.
If you just want to flip the values, then do:
array[i] = !array[i];
Performance using this is actually worse, though: instead of doing a single check for a true/false value and then setting it, it effectively checks twice.
If you declare a 1,000,000-element array with a true, false, true, false pattern, the comparison approach is slower. (var b = !b) essentially does a check twice instead of once.
I'm working on optimization techniques performed by the .NET Native compiler.
I've created a sample loop:
for (int i = 0; i < 100; i++)
{
Function();
}
And I've compiled it with .NET Native. Then I disassembled the resulting .dll file, which contains the machine code, in IDA. (I've removed a few unnecessary lines from the listing, so don't worry that the address lines are inconsistent.)
I understand that add esi, 0FFFFFFFFh really means subtract one from esi and update the zero flag accordingly, so we can jump back to the beginning if zero hasn't been reached yet.
What I don't understand is why did the compiler reverse the loop?
I came to the conclusion that
LOOP:
add esi, 0FFFFFFFFh
jnz LOOP
is just faster than for example
LOOP:
inc esi
cmp esi, 064h
jl LOOP
But is it really because of that and is the speed difference really significant?
inc might be slower than add because of the partial flags update (inc leaves the carry flag untouched). Moreover, since the loop now counts down to zero, the add itself sets the zero flag, so you don't need a separate cmp instruction against the bound; you can jump directly.
This is one famous type of loop optimization:
reversal: Loop reversal reverses the order in which values are assigned to the index variable. This is a subtle optimization which can help eliminate dependencies and thus enable other optimizations. Also, certain architectures utilize looping constructs at Assembly language level that count in a single direction only (e.g. decrement-jump-if-not-zero (DJNZ)).
Is it faster to count down than it is to count up?
GCC Loop optimization
You can see the result for other compilers here.
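As a concrete C# illustration of the transformation (a sketch; Function here is just a stand-in for the question's method):
static void Function() { /* stand-in for the question's method */ }

static void RunForward()
{
    for (int i = 0; i < 100; i++)   // needs a compare against 100 on every iteration
        Function();
}

static void RunReversed()
{
    for (int i = 100; i != 0; i--)  // the decrement's own flags already answer "reached zero?"
        Function();
}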
Your conclusion is correct: the inverted loop counts toward 0 (the loop ends when the register value reaches 0), so the add sets the zero flag that the conditional branch then uses.
This way you don't need a dedicated cmp, which leads to: 1) a size optimization; 2) faster code as well (a conclusion you can draw from the compiler writers' decision and from the other answer).
Writing loops that count toward 0 is a pretty common assembler trick. I am surprised you understand assembler but didn't already know about it.
In order to evaluate a multiplication you have to evaluate the first term, then the second term and finally multiply the two values.
Given that every number multiplied by 0 is 0, if the evaluation of the first term returns 0, I would expect the entire multiplication to be evaluated to 0 without evaluating the second term.
However if you try this code:
var x = 0 * ComplexOperation();
The function ComplexOperation is called despite the fact that we know that x is 0.
The optimized behavior would be also consistent with the Boolean Operator '&&' that evaluates the second term only if the first one is evaluated as true. (The '&' operator evaluates both terms in any case)
I tested this behavior in C# but I guess it is the same for almost all languages.
Firstly, for floating-point, your assertion isn't even true! Consider that 0 * inf is not 0, and 0 * nan is not 0.
But more generally, if you're talking about optimizations, then I guess the compiler is free to not evaluate ComplexOperation if it can prove there are no side-effects.
However, I think you're really talking about short-circuit semantics (i.e. a language feature, not a compiler feature). If so, then the real justification is that C# is copying the semantics of earlier languages (originally C) to maintain consistency.
C# is not a functional language, so functions can have side effects. For example, you can print something from inside ComplexOperation or change global static variables. So, whether it is called is defined by the * operator's contract.
You found yourself an example of different contracts with & and &&.
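To see the two contracts side by side, a small sketch (SideEffect is a hypothetical helper with an observable side effect):
using System;

static class ShortCircuitDemo
{
    static int callCount;

    static int SideEffect()
    {
        callCount++;      // observable side effect
        return 7;
    }

    static void Main(string[] args)
    {
        bool gate = args.Length < 0;            // false at runtime, but not a compile-time constant

        bool b = gate && SideEffect() > 0;      // SideEffect() is NOT called: && short-circuits
        int  x = 0 * SideEffect();              // SideEffect() IS called: * evaluates both operands

        Console.WriteLine("{0} {1} {2}", b, x, callCount);  // prints "False 0 1"
    }
}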
The language defines which operators have short-circuit semantics and which do not. Your ComplexOperation function may have side effects, those side effects may be deliberate, and the compiler is not free to assume that they should not occur just because the result of the function is effectively not used.
I will also add that this would be confusing language design. There would be oodles of SO questions to the effect of...
// why is foo only called 9 times?????????
for (int i = 0; i < 10; i++) {
    print((i - 5) * foo());
}
Why allow short-circuiting booleans and not short-circuiting 0*? Well, firstly I will say that mixing short-circuit boolean with side-effects is a common source of bugs in code - if used well among maintainers who understand it as an obvious pattern then it may be okay, but it's very hard for me to imagine programmers becoming at all used to a hole in the integers at 0.
I have two questions:
1) I need some expert views on writing code that is sound in terms of performance and memory consumption.
2) Performance and memory consumption wise, how good/bad is the following piece of code, and why?
I need to increment a counter that could go up to a maximum of 100, and I'm writing code like this:
Some sample code is as follows:
for(int i=0;i=100;i++)
{
Some Code
}
for(long i=0;i=1000;i++)
{
Some Code
}
How good is it to use Int16 or anything else instead of int or long, if the requirement is the same?
Need to increment the counter that could go maximum by 100 and writing code like this:
Options given:
for(int i=0;i=100;i++)
for(long i=0;i=1000;i++)
EDIT: As noted, neither of these would even actually compile, due to the middle expression being an assignment rather than an expression of type bool.
This demonstrates a hugely important point: get your code working before you make it fast. Your two loops don't do the same thing - one has an upper bound of 1000, the other has an upper bound of 100. If you have to choose between "fast" and "correct", you almost always want to pick "correct". (There are exceptions to this, of course - but that's usually in terms of absolute correctness of results across large amounts of data, not code correctness.)
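For reference, a version of the first loop that actually compiles (assuming the intended bound really was 100) looks like:
for (int i = 0; i < 100; i++)
{
    // Some Code
}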
Changing between the variable types here is unlikely to make any measurable difference. That's often the case with micro-optimizations. When it comes to performance, architecture is usually much more important than in-method optimizations - and it's also a lot harder to change later on. In general, you should:
Write the cleanest code you can, using types that represent your data most correctly and simply
Determine reasonable performance requirements
Measure your clean implementation
If it doesn't perform well enough, use profiling etc to work out how to improve it
DateTime dtStart = DateTime.Now;
for (int i = 0; i < 10000; i++)
{
    // Some Code
}
Response.Write((DateTime.Now - dtStart).TotalMilliseconds.ToString());
Do the same with long as well, and you'll know which one is better... ;)
When you are doing things that require a number representing iterations, or the quantity of something, you should always use int unless you have a good semantic reason to use a different type (i.e. the data can never be negative, or it could be bigger than 2^31). Additionally, worrying about this sort of nano-optimization will basically never matter when writing C# code.
That being said, if you are wondering about the differences between things like this (incrementing a 4-byte register versus incrementing 8 bytes), you can always consult Agner Fog's wonderful instruction tables.
On an AMD64 machine, incrementing a long takes the same amount of time as incrementing an int.**
On a 32 bit x86 machine, incrementing int will take less time.
** The same is true for almost all logic and math operations, as long as the value is not both memory bound and unaligned. In .NET a long will always be aligned, so the two will always be the same.
I am working on optimizing a physics simulation program using Red Gate's Performance Profiler. One part of the code dealing with collision detection had around 52 of the following little checks, dealing with cells in 26 directions in 3 dimensions, under two cases.
CollisionPrimitiveList cell = innerGrid[cellIndex + 1];
if (cell.Count > 0)
    contactsMade += collideWithCell(obj, cell, data, ref attemptedContacts);

cell = innerGrid[cellIndex + grid.XExtent];
if (cell.Count > 0)
    contactsMade += collideWithCell(obj, cell, data, ref attemptedContacts);

cell = innerGrid[cellIndex + grid.XzLayerSize];
if (cell.Count > 0)
    contactsMade += collideWithCell(obj, cell, data, ref attemptedContacts);
As an extremely tight loop of the program, all of this had to be in the same method, but I found that, after I had extended the area from two dimensions to three (raising the count from 16 checks to 52), cell.Count was suddenly no longer being inlined, even though it is a simple getter.
public int Count { get { return count; } }
This caused a humongous performance hit, and it took me a considerable time to find that, when cell.Count appeared in the method 28 times or less, it was inlined every time, but once cell.Count appeared in the method 29 times or more, it was not inlined a single time (even though the vast majority of calls were from worst-case scenario parts of the code that were rarely executed.)
So back to my question: does anybody have any idea how to get around this limit? I think the easy solution is just to make the count field internal rather than private, but I would like a better solution than that, or at least a better understanding of the situation. I wish this sort of thing had been mentioned on Microsoft's Writing High-Performance Managed Applications page at http://msdn.microsoft.com/en-us/library/ms973858.aspx but sadly it is not (possibly because of how arbitrary the 28-count limit is?)
I am using .NET 4.0.
EDIT: It looks like I misinterpreted my little test. I found that the failure to inline was caused not by the methods themselves being called 28+ times, but by the method they ought to be inlined into being "too long" by some standard. This still confuses me, because I don't see how a simple getter could rationally not be inlined (and performance is significantly better with them inlined, as my profiler clearly shows me), but apparently the CLI JIT compiler refuses to inline anything once the method is already large (playing around with slight variations showed me that this limit is a code size, as reported by ildasm, of 1500, above which no inlining is done, even for my getters, which some testing showed add no code-size overhead when inlined).
Thank you.
I haven't tested this, but it seems like one possible workaround is to have multiple properties that all return the same thing. Conceivably you could then get 28 inlines per property.
Note that the number of times a method is inlined most likely depends on the size of the native code for that method (see http://blogs.msdn.com/b/vancem/archive/2008/08/19/to-inline-or-not-to-inline-that-is-the-question.aspx), so the number 28 is specific to that one property. A simple property would likely get inlined more times than a more complex method.
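A sketch of that workaround (the extra property names are mine): several identical getters over the same field, so no single property is referenced 29+ times from the hot method. Ugly, and only worth it if profiling shows the inlining win:
public sealed class CollisionPrimitiveList
{
    private int count;

    public int Count  { get { return count; } }   // spread the 52 call sites across
    public int CountB { get { return count; } }   // these so each property stays under
    public int CountC { get { return count; } }   // the observed 28-reference limit
}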
Straight off, this doesn't explain why 28 is the magic number, but I'm curious what would happen if you collated all your candidate CollisionPrimitiveList instances into an array and then ran your "if count > 0" block in a loop over the array?
Is the cell.Count call then made inline again?
e.g.
CollisionPrimitiveList[] cells = new CollisionPrimitiveList[] {
    innerGrid[cellIndex + 1],
    innerGrid[cellIndex + grid.XExtent],
    innerGrid[cellIndex + grid.XzLayerSize]
    // and all the rest
};

// Loop over cells - for demo only. Use for loop or LINQ'ify if faster
foreach (CollisionPrimitiveList cell in cells)
{
    if (cell.Count > 0)
        contactsMade += collideWithCell(obj, cell, data, ref attemptedContacts);
}
I know performance is the issue, and you'll have overheads constructing the array and looping through it, but if cell.Count is inlined again, might the performance still be better / good enough overall?
I'm guessing (though in no way positive) that this might have to do with the enregistration issue mentioned -- it's possible that the CLR is allocating a new variable for each if statement, and that those are exceeding a total of 64 variables. Do you think this might be the case?