Is the ternary operator (?:) thread safe in C#?

Consider the following two alternatives of getting the higher number between currentPrice and 100...
int price = currentPrice > 100 ? currentPrice : 100;
int price = Math.Max(currentPrice, 100);
I raised this question because I was thinking about a context where the currentPrice variable could be edited by other threads.
In the first case... could price obtain a value lower than 100?
I'm thinking about the following:
if (currentPrice > 100) {
    // currentPrice is edited here.
    price = currentPrice;
}

It is not thread safe.
?: is just a shortcut for a normal if, so your if sample is equivalent to the ?: one: you can get a price lower than 100 if there is no locking outside this code.

I'm not a C# specialist, but even var++ is not thread safe, since it may be translated into a read into a register and a write back from it in assembly.
The ternary operator is far more complicated. It has three parts, and each part can be arbitrarily large (e.g. a call to some function). Therefore it's pretty easy to conclude that the ternary operator is not thread safe.
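As an aside, the usual C# fix for lost updates on var++ is an atomic increment (a minimal sketch; the class, method and field names are made up):
using System.Threading;

class Stats
{
    private int counter;   // shared by multiple threads

    public void Hit()
    {
        // counter++ is a read, an add, and a write; two threads can interleave
        // those steps and lose an update. Interlocked.Increment performs the
        // whole read-modify-write as one atomic operation.
        Interlocked.Increment(ref counter);
    }
}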

In theory, currentPrice is read twice. Once for comparison, once for assignment.
In practice, the compiler may cache the access to the variable. I don't know about C# but in C++ on x86:
MOV AX, [currentPrice]
MOV BX, 100 ;cache the immediate
CMP AX, BX
JLE $1 ;if(currentPrice > 100){
MOV AX, BX
$1: ;}
MOV [BP+price], AX ;price is on the stack.
The same load-once optimisation happens in Java bytecode unless currentPrice is declared volatile.
So, in theory, it can happen. In practice, on most platforms, it won't, but you cannot count on that.
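A common way to avoid the double read in C# is to copy the shared variable into a local first (a minimal sketch, assuming currentPrice is an int field written by other threads; the class and method names are made up):
using System.Threading;

class PriceHolder
{
    private int currentPrice;   // hypothetical shared field, written by other threads

    public int GetFloorPrice()
    {
        // Read the shared variable exactly once into a local. The comparison and
        // the result then both use the same snapshot, so the returned value can
        // never drop below 100 even if currentPrice changes concurrently.
        int snapshot = Volatile.Read(ref currentPrice);
        return snapshot > 100 ? snapshot : 100;
    }
}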

As stated by others, it might be cached but the language does not require it.
You can use Interlocked.CompareExchange if you need lock-free threadsafe assignments. But given the example, I'd go for a more coarse grained locking strategy.
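For example, a coarse-grained version might look like this (a rough sketch; the lock object, field and method names are illustrative, and every writer of currentPrice would have to take the same lock):
private readonly object priceLock = new object();  // illustrative lock object
private int currentPrice;                           // shared state
private int price;

public void UpdateFloorPrice()
{
    lock (priceLock)
    {
        // No other thread can change currentPrice while we hold the lock,
        // as long as all writers take the same lock.
        price = currentPrice > 100 ? currentPrice : 100;
    }
}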

Related

Can the compiler/JIT optimize away short-circuit evaluation if there are no side-effects?

I have a test which goes:
if(variable==SOME_CONSTANT || variable==OTHER_CONSTANT)
In this circumstances, on a platform where branching over the second test would take more cycles than simply doing it, would the optimizer be allowed to treat the || as a simple |?
Yes, that is permitted, and in fact the C# compiler will perform this optimization in some cases on && and ||, reducing them to & and |. As you note, there must be no side effects of evaluating the right side.
Consult the compiler source code for the exact details of when the optimization is generated.
The compiler will also perform that optimization when the logical operation involves lifted-to-nullable operands. Consider for example
int? z = x + y;
where x and y are also nullable ints; this will be generated as
int? z;
int? temp1 = x;
int? temp2 = y;
z = temp1.HasValue & temp2.HasValue ?
    new int?(temp1.GetValueOrDefault() + temp2.GetValueOrDefault()) :
    new int?();
Note that it's & and not &&. I knew that calling HasValue is so fast that it would not be worth the extra branching logic to avoid it.
If you're interested in how I wrote the nullable arithmetic optimizer, I've written a detailed explanation of it here: https://ericlippert.com/2012/12/20/nullable-micro-optimizations-part-one/
Yes, the compiler can make that optimization. Indeed, every language of interest generally has an explicit or implicit "as if" clause that permits any optimization whose effects are not observable, without needing a specific rule for it. That allows it to implement the checks in a non-short-circuit manner, in addition to a whole host of more extreme optimizations, such as combining multiple conditions into one, eliminating the check entirely, or implementing the check without any branch at all using predicated instructions.
The other side, however, is that the specific optimization you mention of unconditionally performing the second check isn't performed very often on most common platforms, because on many instruction sets the branching approach is the fastest, assuming it doesn't change the predictability of the branch. For example, on x86 you can use cmp to compare a variable to a known value (as in your example), but the "result" ends up in the EFLAGS register (of which there is only one, architecturally). How do you implement the || between the two comparison results in that case? The second comparison will overwrite the flags set by the first, so you'll be stuck saving the flags somewhere, then doing the second comparison, and then trying to "combine" the flags somehow just so you can do your single test1.
The truth is, ignoring prediction, the conditional branch is often almost free, especially when the compiler organizes it to be "not taken". For example, on x86 your condition could look like two cmp operations, each immediately followed by a jump over the code in the if() block. So that's just two branch instructions versus the hoops you'd have to jump through to reduce it to one. Going further, these cmp and subsequent branches often macro-fuse into a single operation that has about the same cost as the comparison alone (and takes a single cycle). There are various caveats, but the overall assumption that "branching over the second test" will take much time is probably not well founded.
The main caveat is branch prediction. In the case that each individual clause is unpredictable but the whole condition is predictable, combining everything into a single branch can be very profitable. Imagine, for example, that in your (variable==SOME_CONSTANT || variable==OTHER_CONSTANT) check, variable was equal to SOME_CONSTANT 50% of the time and OTHER_CONSTANT 49% of the time. The if will thus be taken 99% of the time, but the first check variable==SOME_CONSTANT will be totally unpredictable: branching exactly half the time! In this case it would be a great idea to combine the checks, even at some cost, since misprediction is expensive.
Now there are certain cases where the compiler can combine checks together simply due to the form of the check. Peter shows a range-check example in his answer, and there are others.
Here's an interesting one I stumbled across where your SOME_CONSTANT is 2 and OTHER_CONSTANT is 4:
void test(int a) {
    if (a == 2 || a == 4) {
        call();
    }
}
Both clang and icc implement this as a series of two checks and two branches, but recent gcc uses another trick:
test(int):
    sub edi, 2
    and edi, -3
    je .L4
    rep ret
.L4:
    jmp call()
Essentially it subtracts 2 from a and then checks if any bit other than 0b10 is set. The values 2 and 4 are the only values accepted by that check. Interesting transformation! It's not that much better than the two branch approach, for predictable inputs, but for the unpredictable clauses but predictable final outcome case it will be a big win.
This isn't really a case of doing both checks unconditionally however: just a clever case of being able to combine multiple checks into fewer, possibly with a bit of math. So I don't know if it meets your criteria for a "yes, they actually do in practice" answer. Perhaps compilers do make this optimization, but I haven't seen it on x86. If it exists there it might only be triggered by profile-guided optimization, where the compiler has an idea of the probability of various clauses.
1 On platforms with fast cmov two cmovs to implement || is probably not a terrible choice, and && can be implemented similarly.
Compilers are allowed to optimize short-circuit comparisons into asm that isn't two separate test & branch. But sometimes it's not profitable (especially on x86 where compare-into-register takes multiple instructions), and sometimes compilers miss the optimization.
Or if compilers choose to make branchless code using a conditional-move, both conditions are always evaluated. (This is of course only an option when there are no side-effects).
One special case is range-checks: compilers can transform x > min && x < max (especially when min and max are compile-time constants) into a single check. This can be done with 2 instructions instead of branching on each condition separately. Subtracting the low end of the range will wrap to a large unsigned number if the input was lower, so a subtract + unsigned-compare gives you a range check.
The range-check optimization is easy / well-known (by compiler developers), so I'd assume C# JIT and ahead-of-time compilers would do it, too.
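For illustration, the same trick written out by hand in C# looks something like this (a sketch; the helper names are made up, and normally you would just write the plain comparisons and let the compiler do this):
// The straightforward form: 10 < x && x < 100, i.e. x in 11..99.
static bool InRangeTwoChecks(int x) => 10 < x && x < 100;

// Subtract the low end of the range (11). If x was below the range, the
// unsigned result wraps to a huge value, so a single unsigned compare against
// the range width (89 values: 11..99) covers both bounds at once.
static bool InRangeOneCheck(int x) => (uint)(x - 11) < 89u;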
To take a C example (which has the same short-circuit evaluation rules as C#):
int foo(int x, int a, int b) {
    if (10 < x && x < 100) {
        return a;
    }
    return b;
}
Compiled (with gcc7.3 -O3 for the x86-64 Windows ABI, on the Godbolt compiler explorer. You can see output for ICC, clang, or MSVC; or for gcc on ARM, MIPS, etc.):
foo(int, int, int):
sub ecx, 11 # x-11
mov eax, edx # retval = a;
cmp ecx, 89
cmovnb eax, r8d # retval = (x-11U) < 89U ? retval : b;
ret
So the function is branchless, using cmov (conditional mov). @HansPassant says .NET's compiler only tends to do this for assignment operations, so maybe you'd only get that asm if you wrote it in the C# source as retval = (10 < x && x < 100) ? a : b;.
Or to take a branching example, we get the same optimization of the range check into a sub and then an unsigned compare/branch instead of compare/cmov.
int ext(void);
int bar(int x) {
    if (10 < x && x < 100) {
        return ext();
    }
    return 0;
}
# gcc -O3
sub ecx, 11
cmp ecx, 88
jbe .L7 # jump if ((unsigned)x-11U) <= 88U
xor eax, eax # return 0;
ret
.L7:
jmp ext() # tailcall ext()
IDK if existing C# implementations make this optimization the same way, but it's easy and valid for all possible inputs, so they should.
Godbolt doesn't have a C# compiler; if there is a convenient online C# compiler that shows you the asm, it would be interesting to try these functions there. (I think they're valid C# syntax as well as valid C and valid C++).
Other cases
Some cases other than range-checks can be profitable to optimize into a single branch or cmov on multiple conditions. x86 can't compare into a register very efficiently (xor-zero / cmp / setcc), but in some cases you only need 0 / non-zero instead of a 0 / 1 boolean to combine later. x86's OR instruction sets flags, so you can or / jnz to jump if either register was non-zero. (But note that saving the test reg,reg before a jcc only saves code-size; macro-fusion works for test/jcc but not or/jcc, so or/test/jcc is the same number of uops as or/jcc. It saves a uop with cmovcc or setcc, though.)
If branches predict perfectly, two cmp / jcc are probably still cheapest (because of macro-fusion: cmp / jne is a single uop on recent CPUs), but if not then two conditions together may well predict better, or be better with CMOV.
int foo(int x, int a, int b) {
    if ((a-10) || (x!=5)) {
        return a;
    }
    return b;
}
On Godbolt with gcc7.3, clang5.0, ICC18, and MSVC CL19
gcc compiles it the obvious way, with 2 branches and a couple mov instructions. clang5.0 spots the opportunity to transform it:
# compiled for the x86-64 System V ABI this time: args in edi=x, esi=a, edx=b
mov eax, esi
xor eax, 10
xor edi, 5
or edi, eax # flags set from edi=(a^10) | (x^5)
cmovne edx, esi # edx = (edi!=0) ? a : b
mov eax, edx # return edx
ret
Other compilers need some hand-holding if you want them to emit code like this. (And clang could use the same help to realize that it can use lea to copy-and-subtract instead of needing a mov before xor to avoid destroying an input that's needed later).
int should_optimize_to(int x, int a, int b) {
    // writing x!=5 (instead of x-5) fools compilers into missing the optimization
    if ((a-10) | (x-5)) {
        return a;
    }
    return b;
}
gcc, clang, msvc, and ICC all compile this to basically the same thing:
# gcc7.3 -O3
lea eax, [rsi-10] # eax = a-10
sub edi, 5 # x-=5
or eax, edi # set flags
mov eax, edx
cmovne eax, esi
ret
This is smarter than clang's code: putting the mov to eax before the cmov creates instruction-level parallelism. If mov has non-zero latency, that latency can happen in parallel with the latency of creating the flag input for cmov.
If you want this kind of optimization, you usually have to hand-hold compilers toward it.
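In C#, the closest equivalent to that hand-holding is the non-short-circuiting & and | operators on bool operands, which tell the compiler that both sides are always evaluated (a sketch, assuming both tests are cheap and side-effect free; whether the JIT then emits branchless code is still up to the JIT):
// Both (a != 10) and (x != 5) are evaluated unconditionally, mirroring the
// (a-10) | (x-5) form above; with || the right-hand side may be skipped.
static int Pick(int x, int a, int b)
{
    return ((a != 10) | (x != 5)) ? a : b;
}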

Why does .NET Native compile loop in reverse order?

I'm working on optimization techniques performed by the .NET Native compiler.
I've created a sample loop:
for (int i = 0; i < 100; i++)
{
    Function();
}
And I've compiled it with .NET Native. Then I disassembled the resulting .dll (which contains machine code) in IDA. The relevant part of the result (I've removed a few unnecessary lines, so don't worry that the address lines are inconsistent) is the decrementing loop shown below.
I understand that add esi, 0FFFFFFFFh really means subtract one from esi and update the Zero Flag, so we can jump back to the beginning if zero hasn't been reached yet.
What I don't understand is why did the compiler reverse the loop?
I came to the conclusion that
LOOP:
add esi, 0FFFFFFFFh
jnz LOOP
is just faster than for example
LOOP:
inc esi
cmp esi, 064h
jl LOOP
But is it really because of that and is the speed difference really significant?
inc can be slower than add because of its partial flag update. Moreover, since the reversed loop counts down to zero, the add itself sets the zero flag the branch needs, so you don't need a separate cmp instruction; you can jump directly.
This is one famous type of loop optimization, loop reversal:
Loop reversal reverses the order in which values are assigned to the index variable. This is a subtle optimization which can help eliminate dependencies and thus enable other optimizations. Also, certain architectures utilize looping constructs at the assembly language level that count in a single direction only (e.g. decrement-jump-if-not-zero (DJNZ)).
Is it faster to count down than it is to count up?
GCC Loop optimization
You can see the result for other compilers here.
Your conclusion is correct: the inverted loop counts down to 0 (the loop ends when the register value reaches 0), so the add sets the zero flag that the conditional branch then uses.
This way you don't need a dedicated cmp, which gives you: 1) a size optimization, 2) a faster loop (a conclusion drawn from the compiler writers' decision and from the other answer).
Writing loops that count down toward 0 is a pretty common assembler trick. I'm surprised you can read assembler but didn't know about it.
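In C# source terms, the transformation amounts to something like the following (a sketch; the compiler does this for you when the loop counter isn't otherwise used, so there is normally no reason to write it by hand):
// Original, counting up: needs a compare against 100 on every iteration.
for (int i = 0; i < 100; i++)
{
    Function();
}

// Reversed, counting down to zero: the "reached zero?" flag produced by the
// decrement itself drives the branch, so no separate compare is needed.
for (int i = 100; i != 0; i--)
{
    Function();
}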

Why increment and decrement are unary operations

It looks like a very strange question, because I've read a lot of documentation where increment and decrement are listed as unary operations, without any explanation.
I could be wrong, but ++i seems similar to i += 1 (if there isn't any overloading):
int i = 1;
Console.WriteLine(++i); // 2
int j = 1;
Console.WriteLine(j+=1); // 2
In this case, preincrement is just syntactic sugar hiding the binary plus operator with 1 as the second argument.
Isn't it?
Why are increment and decrement independent unary operations? Aren't they just the binary plus operator with a predefined second argument of 1?
Your question boils down to why ++ and -- exist in the first place, when normal + and - could do the job.
With today's compiler optimisation capabilities, it's really all for historical reasons. ++ and -- date back to the early (but not earliest) days of C. The Development of the C Language by the late Dennis Ritchie, author of the C language, gives some interesting historical insights:
Thompson went a step further by inventing the ++ and -- operators,
which increment or decrement;
[...]
They were not in the earliest versions of B, but
appeared along the way.
[...]
a stronger motivation
for the innovation was probably his observation that the translation
of ++x was smaller than that of x=x+1.
So the definite reason seems to be lost in the mists of history, but this article by Ritchie strongly suggests that increment and decrement operators owe their existence to performance issues with early compilers.
When C++ was invented, compatibility with C was one of its inventor Bjarne Stroustrup's major design goals, so it goes without saying that all C operators also exist in C++. As Stroustrup himself says in his FAQ:
I wanted C++ to be compatible with a complete language with sufficient
performance and flexibility for even the most demanding systems
programming.
As for C#, one of its inventors, Eric Lippert, once stated here on the Stack Exchange network that the only reason for them being supported in C# is consistency with older languages:
[...] these operators are horrid features. They're very confusing; after
over 25 years I still get pre- and post- semantics mixed up. They
encourage bad habits like combining evaluation of results with
production of side effects. Had these features not been in
C/C++/Java/JavaScript/etc, they would not have been invented for C#.
P.S.: C++ is special because, as you mentioned (even with the incorrect word "overriding"), you can overload all of those operators, which has led to ++ and -- taking on slightly different semantics in the minds of many programmers. They sometimes read as "go ahead" ("go back") or "make one step forward" ("make one step backward"), typically with iterators. If you look at the ForwardIterator concept in C++, you will see that only the unary ++ is required by it.
The answer is very simple.
A unary operation means the operator does its work on only one operand.
Also, i++ and i += 1 are not necessarily the same action:
-> When i++ executes, the compiler goes to the variable's location and increments the value in place.
-> With i += 1, i and 1 are loaded into registers/temporary variables, the addition is done, and the new value is copied back to i's address location.
So, compared to i++, i += 1 can cost more.
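For completeness, here is a small C# snippet showing where ++i, i++ and i += 1 actually differ as expressions (illustrative only):
int i = 1;
Console.WriteLine(++i);    // 2: prefix increments first, then yields the new value

int j = 1;
Console.WriteLine(j++);    // 1: postfix yields the old value, then increments
Console.WriteLine(j);      // 2

int k = 1;
Console.WriteLine(k += 1); // 2: compound assignment yields the new value, like ++k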

Restricting post/pre increment operator over a value rather than a variable, property and indexer

From this post (and not only that one) we got to the point that the ++ operator cannot be applied to expressions that merely return a value.
And it's really obvious that 5++ is better written as 5 + 1. I just want to summarize the whole thing around the increment/decrement operator, so let's go through these snippets of code that could be helpful to somebody stuck with ++ for the first time.
// Literal
int x = 0++; // Error
// Constant
const int Y = 1;
double f = Y++; // error. makes sense, constants are not variables actually.
int z = AddFoo()++; // Error
Summary: ++ works for variables, properties (through syntactic sugar) and indexers (the same).
Now the interesting part: any literal expression is optimized by CSC, hence when we write, say
int g = 5 + 1; // This is compiled to 6 in IL as one could expect.
IL_0001: ldc.i4.6 // Pushes the integer value of 6 onto the evaluation stack as an int32.
5++ doesn't have to mean that 5 becomes 6; it could be shorthand for 5 + 1, just as x++ is shorthand for x = x + 1.
What's the real reason behind this restriction?
int p = Foo()++; // ? you increase the return value of Foo() by 1, what's wrong with that?
Examples of code that can lead to logical issues are appreciated.
One real-life example could be performing one more iteration than there are items:
for (int i = 0; i < GetCount()++; i++) { }
Maybe the lack of use cases prompted the compiler team to avoid features like this?
I'm not insisting this is a feature we lack; I just want to understand the dark side of this from the compiler writers' perspective, though I'm not one. I do know that C++ allows this when the method returns a reference. I'm not a C++ guy either (very poor knowledge); I just want to get the real gist of the restriction.
That is, did the C# designers simply opt to restrict ++ to non-value expressions, or are there definite cases leading to unpredictable results?
In order for a feature to be worth supporting, it really needs to be useful. The code you've presented is in every case less readable than the alternative, which is just to use the normal binary addition operator. For example:
for (int i = 0; i < GetCount() + 1; i++) { }
I'm all in favour of the language team preventing you from writing unreadable code when, in every case where you could write it, there's a simpler alternative.
Well, before using these operators you should read up on how they do what they do. In particular, you should understand the difference between postfix and prefix, which helps in figuring out what is and isn't allowed.
The ++ and -- operators modify their operands, which means the operand must be modifiable. If you can assign a value to the expression in question then it is modifiable, and in C# it is probably a variable.
Taking a look at what these operators actually do: the postfix operators increment after your line of code executes, while the prefix operators need access to the value before the rest of the expression is evaluated. The way I read the syntax, ++lvalue (or ++variable) converts to the memory operations [read, write, read], and lvalue++ to [read, read, write], though many compilers probably optimize away the secondary reads.
So looking at foo()++: the value is going to be plopped dead in the center of the executing code, which means the compiler would need to save that value somewhere longer-lived so operations could be performed on it after the line of code has finished executing. That is no doubt the exact reason C++ does not support this syntax either.
If you were returning a reference, the compiler wouldn't have any trouble with the postfix. Of course, in C# value types (i.e. int, char, float, etc.) cannot be passed by reference here, as they are value types.
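To see why ++ needs something assignable, consider what the compiler expands a property increment into (a sketch; Counter is a made-up type):
class Counter
{
    public int Value { get; set; }
}

// counter.Value++ is effectively compiled as:
//   int temp = counter.Value;    // call the getter
//   counter.Value = temp + 1;    // call the setter
//   ...the expression's value is temp (the old value).
// A plain value such as Foo()'s return has no "setter" to call, so there is
// nowhere to store the incremented result, and the compiler rejects it.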

Which is more performant: <= 0 or <1?

Back in the day when I was learning C and assembly we were taught it is better to use simple comparisons to increase speed. So for example if you say:
if(x <= 0)
versus
if(x < 1)
which would execute faster? My argument (which may be wrong) is that the second would almost always execute faster because there is only a single comparison: i.e. is it less than one, yes or no.
Whereas the first will execute fast if the number is less than 0, because that alone equates to true and there is no need to check for equality, making it as fast as the second; however, it will always be slower if the number is 0 or more, because it then has to do a second comparison to see whether it is equal to 0.
I am now using C#, and while developing for desktops speed is not an issue (at least not to the degree that this point is worth arguing), I still think such arguments need to be considered, because I am also developing for mobile devices, which are much less powerful than desktops, and speed does become an issue on such devices.
For further consideration, I am talking about whole numbers (no decimals) and values that cannot legitimately be negative, like -1 or -12,345 (unless there is an error); for example, when dealing with lists or arrays you can't have a negative number of items, but you may want to check whether a list is empty, or use a negative value of x to indicate a problem (say, there are some items in a list but you cannot retrieve the whole list for some reason, and to indicate this you set the count to a negative number, which is not the same as saying there are no items).
For the reason above I deliberately left out the obvious
if(x == 0)
and
if(x.isnullorempty())
and other such items for detecting a list with no items.
Again, for consideration, we are talking about the possibility of retrieving items from a database, perhaps using SQL stored procedures that have the functionality mentioned (i.e. the standard, at least in this company, is to return a negative number to indicate a problem).
So in such cases, is it better to use the first or the second item above?
They're identical. Neither is faster than the other. They both ask precisely the same question, assuming x is an integer. C# is not assembly. You're asking the compiler to generate the best code to get the effect you are asking for. You aren't specifying how it gets that result.
See also this answer.
My argument (which may be wrong) is that the second would almost always execute faster because there is only a single comparison: i.e. is it less than one, yes or no.
Clearly that's wrong. Watch what happens if you assume that's true:
< is faster than <= because it asks fewer questions. (Your argument.)
> is the same speed as <= because it asks the same question, just with an inverted answer.
Thus < is faster than >! But this same argument shows > is faster than <.
"just with an inverted answer" seems to sneak in an additional boolean operation so I'm not sure I follow this answer.
That's wrong (for silicon, it is sometimes correct for software) for the same reason. Consider:
3 != 4 is more expensive to compute than 3 == 4, because it's 3 == 4 with an inverted answer, an additional boolean operation.
3 == 4 is more expensive than 3 != 4, because it's 3 != 4 with an inverted answer, an additional boolean operation.
Thus, 3 != 4 is more expensive than itself.
An inverted answer is just the opposite question, not an additional boolean operation. Or, to be a bit more precise, it's the same comparison with a different mapping of comparison results to the final answer. Both 3 == 4 and 3 != 4 require you to compare 3 and 4. That comparison results in either "equal" or "unequal". The two questions just map "equal" and "unequal" to "true" and "false" differently. Neither mapping is more expensive than the other.
At least in most cases, no, there's no advantage to one over the other.
A <= does not normally get implemented as two separate comparisons. On a typical (e.g., x86) CPU, you'll have two separate flags, one to indicate equality, and one to indicate negative (which can also mean "less than"). Along with that, you'll have branches that depend on a combination of those flags, so < translates to a jl or jb (jump if less or jump if below --the former is for signed numbers, the latter for unsigned). A <= will translate to a jle or jbe (jump if less than or equal, jump if below or equal).
Different CPUs will use different names/mnemonics for the instructions, but most still have equivalent instructions. In every case of which I'm aware, all of those execute at the same speed.
Edit: Oops -- I meant to mention one possible exception to the general rule I mentioned above. Although it's not exactly from < vs. <=, if/when you can compare to 0 instead of any other number, you can sometimes gain a little (minuscule) advantage. For example, let's assume you had a variable you were going to count down until you reached some minimum. In a case like this, you might well gain a little advantage if you can count down to 0 instead of counting down to 1. The reason is fairly simple: the flags I mentioned previously are affected by most instructions. Let's assume you had something like:
do {
    // whatever
} while (--i >= 1);
A compiler might translate this to something like:
loop_top:
; whatever
dec i
cmp i, 1
jge loop_top
If, instead, you compare to 0 (while (--i > 0) or while (--i != 0)), it might translate to something like this instead:
loop_top:
; whatever
dec i
jg loop_top
; or: jnz loop_top
Here the dec sets/clears the zero flag to indicate whether the result of the decrement was zero or not, so the condition can be based directly on the result from the dec, eliminating the cmp used in the other code.
I should add, however, that while this was quite effective, say, 30+ years ago, most modern compilers can handle translations like this without your help (though some compilers may not, especially for things like small embedded systems). IOW, if you care about optimization in general, it's barely possible that you might someday care -- but at least to me, application to C# seems doubtful at best.
Most modern hardware has built-in instructions for checking the less-than-or-equal condition in a single instruction that executes exactly as fast as the one checking the less-than condition. The argument that applied to (much) older hardware no longer applies - choose the alternative you think is most readable, i.e. the one that better conveys your idea to the readers of your code.
Here are my functions:
public static void TestOne()
{
    Boolean result;
    Int32 i = 2;
    for (Int32 j = 0; j < 1000000000; ++j)
        result = (i < 1);
}

public static void TestTwo()
{
    Boolean result;
    Int32 i = 2;
    for (Int32 j = 0; j < 1000000000; ++j)
        result = (i <= 0);
}
Here is the IL code, which is identical:
L_0000: ldc.i4.2
L_0001: stloc.0
L_0002: ldc.i4.0
L_0003: stloc.1
L_0004: br.s L_000a
L_0006: ldloc.1
L_0007: ldc.i4.1
L_0008: add
L_0009: stloc.1
L_000a: ldloc.1
L_000b: ldc.i4 1000000000
L_0010: blt.s L_0006
L_0012: ret
After a few testing sessions, the result, obviously, is that neither is faster than the other. The difference is only a few milliseconds, which can't be considered a real difference, and the produced IL output is the same anyway.
Both ARM and x86 processors have dedicated instructions for both "less than" and "less than or equal" (which could also be evaluated as "NOT greater than"), so there will be absolutely no real-world difference if you use any semi-modern compiler.
While refactoring, if you change your mind about the logic, if (x <= 0) is faster (and less error prone) to negate (i.e. if (!(x <= 0))) than if (x < 1), which is easy to negate incorrectly; but that's probably not the performance you're referring to. ;-)
If x < 1 were faster, then modern compilers would change x <= 0 into x < 1 anyway (assuming x is an integral type). So for modern compilers this should not matter, and they should produce identical machine code.
Even if x <= 0 compiled to different instructions than x < 1, the performance difference would be so minuscule as to not be worth worrying about most of the time; there will very likely be other, more productive areas for optimization in your code. The golden rule is to profile your code and optimize the bits that ARE actually slow in the real world, not the bits that you think hypothetically may be slow or are not as fast as they theoretically could be. Also concentrate on making your code readable to others, not on phantom micro-optimisations that disappear in a puff of compiler smoke.
@Francis Rodgers, you said:
Whereas the first will execute fast if the number is less than 0
because this equates to true there is no need to check the equals
making it as fast as the second, however, it will always be slower if
the number is 0 or more because it has to then do a second comparison
to see if it is equal to 0.
and (in commennts),
Can you explain where > is the same as <= because this doesn't make
sense in my logical world. For example, <=0 is not the same as >0 in
fact, totally opposite. I would just like an example so I can
understand your answer better
You ask for help, and you need help. I really want to help you, and I’m afraid that many other people need this help too.
Begin with the most basic thing. Your idea that testing for > is not the same as testing for <= is logically wrong (not only in programming languages). Look at these diagrams, relax, and think about it. What happens if you know that X <= Y in A and in B? What happens if you know that X > Y in each diagram?
Right, nothing changes; they are equivalent. The key detail of the diagrams is that true and false in A and B are on opposite sides. What that means is that the compiler (or, in general, the coder) has the freedom to reorganize the program flow so that both questions are equivalent. That means there is no need to split <= into two steps, only to reorganize your flow a little. Only a very bad compiler or interpreter would be unable to do that. Nothing to do yet with any assembler. The idea is that even for CPUs without sufficient flags for all comparisons, the compiler can generate (pseudo) assembler code that uses whichever test best suits the CPU's characteristics. Add the ability of CPUs to check more than one flag in parallel at the electronic level, and the compiler's job becomes much simpler.
You may find it curious/interesting to read pages 3-14 to 3-15 and 5-3 to 5-5 (the latter include the jump instructions, which could be surprising for you): http://download.intel.com/products/processor/manual/325462.pdf
Anyway, I’d like to discuss more about related situations.
Comparing with 0 or with 1: @Jerry Coffin has a very good explanation at the assembler level. Going deeper, at the machine-code level, the variant that compares with 1 needs to hard-code the 1 into the CPU instruction and load it into the CPU, while the other variant manages to avoid that. Anyway, the gain here is absolutely small; I don't think it would be measurable in speed in any real-life situation. As a side comment, the instruction cmp i, 1 just performs a sort of subtraction i - 1 (without saving the result) while setting the flags, so you end up comparing with 0 anyway!
More important could be this situation: comparing X <= Y or Y >= X, which are obviously logically equivalent, could have severe side effects if X and Y are expressions that need to be evaluated and can influence each other's results! That is very bad, and potentially undefined.
Now, coming back to the diagrams and looking at the assembler examples from @Jerry Coffin too, I see the following issue. Real software is a sort of linear chain in memory: you select one of the conditions and jump to another program-memory position to continue, while in the opposite case execution just continues. It could make sense to select the more frequent condition to be the one that just continues. I don't see how we can give the compiler a hint in these situations, and obviously the compiler can't figure it out by itself. Please correct me if I'm wrong, but this sort of optimization problem is pretty general, and the programmer must decide it himself without the compiler's help.
But again, in almost any situation I'd write my code with an eye to the general style and readability, not to these small local optimizations.
