Constants and compile time evaluation - Why change this behaviour - c#

If you forward to approximately 13 minutes into this video by Eric Lippert he describes a change that was made to the C# compiler that renders the following code invalid (Apparently prior to and including .NET 2 this code would have compiled).
int y;
int x = 10;
if (x * 0 == 0)
y = 123;
Console.Write(y);
Now I understand that clearly any execution of the above code actually evaluates to
int y;
int x = 10;
y = 123;
Console.Write(y);
But what I dont understand is why it is considered "desirable" to make the following code in-compilable? IE: What are the risks with allowing such inferences to run their course?

I'm still finding this question a bit confusing but let me see if I can rephrase the question into a form that I can answer. First, let me re-state the background of the question:
In C# 2.0, this code:
int x = 123;
int y;
if (x * 0 == 0)
y = 345;
Console.WriteLine(y);
was treated as though you'd written
int x = 123;
int y;
if (true)
y = 345;
Console.WriteLine(y);
which in turn is treated as:
int x = 123;
int y;
y = 345;
Console.WriteLine(y);
Which is a legal program.
But in C# 3.0 we took the breaking change to prevent this. The compiler no longer treats the condition as being "always true" despite the fact that you and I both know that it is always true. We now make this an illegal program, because the compiler reasons that it does not know that the body of the "if" is always executed, and therefore does not know that the local variable y is always assigned before it is used.
Why is the C# 3.0 behaviour correct?
It is correct because the specification states that:
a constant expression must contain only constants. x * 0 == 0 is not a constant expression because it contains a non-constant term, x.
the consequence of an if is only known to be always reachable if the condition is a constant expression equal to true.
Therefore, the code given should not classify the consequence of the conditional statement to be always reachable, and therefore should not classify the local y as being definitely assigned.
Why is it desirable that a constant expression contain only constants?
We want the C# language to be clearly understandable by its users, and correctly implementable by compiler writers. Requiring that the compiler make all possible logical deductions about the values of expressions works against those goals. It should be simple to determine whether a given expression is a constant, and if so, what its value is. Put simply, the constant evaluation code should have to know how to perform arithmetic, but should not need to know facts about arithmetical manipulations. The constant evaluator knows how to multiply 2 * 1, but it does not need to know the fact that "1 is the multiplicative identity on integers".
Now, it is possible that a compiler writer might decide that there are areas in which they can be clever, and thereby generate more optimal code. Compiler writers are permitted to do so, but not in a way that changes whether code is legal or illegal. They are only allowed to make optimizations that make the output of the compiler better when given legal code.
How did the bug happen in C# 2.0?
What happened was the compiler was written to run the arithmetic optimizer too early. The optimizer is the bit that is supposed to be clever, and it should have run after the program was determined to be legal. It was running before the program was determined to be legal, and was therefore influencing the result.
This was a potential breaking change: though it brought the compiler into line with the specification, it also potentially turned working code into error code. What motivated the change?
LINQ features, and specifically expression trees. If you said something like:
(int x)=>x * 0 == 0
and converted that to an expression tree, do you expect that to generate the expression tree for
(int x)=>true
? Probably not! You probably expected it to produce the expression tree for "multiply x by zero and compare the result to zero". Expression trees should preserve the logical structure of the expression in the body.
When I wrote the expression tree code it was not clear yet whether the design committee was going to decide whether
()=>2 + 3
was going to generate the expression tree for "add two to three" or the expression tree for "five". We decided on the latter -- constants are folded before expression trees are generated, but arithmetic should not be run through the optimizer before expression trees are generated.
So, let's consider now the dependencies that we've just stated:
Arithmetic optimization has to happen before codegen.
Expression tree rewriting has to happen before arithmetic optimizations
Constant folding has to happen before expression tree rewriting
Constant folding has to happen before flow analysis
Flow analysis has to happen before expression tree rewriting (because we need to know if an expression tree uses an uninitialized local)
We've got to find an order to do all this work in that honours all those dependencies. The compiler in C# 2.0 did them in this order:
constant folding and arithmetic optimization at the same time
flow analysis
codegen
Where can expression tree rewriting go in there? Nowhere! And clearly this is buggy, because flow analysis is now taking into account facts deduced by the arithmetic optimizer. We decided to rework the compiler so that it did things in the order:
constant folding
flow analysis
expression tree rewriting
arithmetic optimization
codegen
Which obviously necessitates the breaking change.
Now, I did consider preserving the existing broken behaviour, by doing this:
constant folding
arithmetic optimization
flow analysis
arithmetic de-optimization
expression tree rewriting
arithmetic optimization again
codegen
Where the optimized arithmetic expression would contain a pointer back to its unoptimized form. We decided that this was too much complexity in order to preserve a bug. We decided that it would be better to instead fix the bug, take the breaking change, and make the compiler architecture more easily understood.

The specification states that the definite assignment of something that is only assigned inside an if block is undetermined. The spec says nothing about compiler magic that removes the unnecessary if block. In particular, it makes for a very confusing error message as you change the if condition, and suddenly get an error about y not being assigned "huh? I haven't changed when y is assigned!".
The compiler is free to perform any obvious code removal it wants to, but first it needs to follow the specification for the rules.
Specifically, section 5.3.3.5 (MS 4.0 spec):
5.3.3.5 If statements
For an if statement stmt of the form:
if ( expr ) then-stmt else else-stmt
v has the same definite assignment state at the beginning of expr as at the beginning of stmt.
If v is definitely assigned at the end of expr, then it is definitely assigned on the control flow transfer to then-stmt and to either else-stmt or to the end-point of stmt if there is no else clause.
If v has the state “definitely assigned after true expression” at the end of expr, then it is definitely assigned on the control flow transfer to then-stmt, and not definitely assigned on the control flow transfer to either else-stmt or to the end-point of stmt if there is no else clause.
If v has the state “definitely assigned after false expression” at the end of expr, then it is definitely assigned on the control flow transfer to else-stmt, and not definitely assigned on the control flow transfer to then-stmt. It is definitely assigned at the end-point of stmt if and only if it is definitely assigned at the end-point of then-stmt.
Otherwise, v is considered not definitely assigned on the control flow transfer to either the then-stmt or else-stmt, or to the end-point of stmt if there is no else
For an initially unassigned variable to be considered definitely assigned at a certain location, an assignment to the variable must occur in every possible execution path leading to that location.
technically, the execution path exists where the if condition is false; if y was also assigned in the else, then fine, but... the specification explicitly makes no demand of spotting the if condition is always true.

Related

Prepare syntax tree (ast) to easily perform short circuit operations

What is the best way to prepare a syntax tree containing conditions to allow easy and fast short curcuit usage?
The rules of short circuit in general are very easy:
If one component in an and block returns false, the complete block will return false and execution can be exited
If one component in an or block returns true, the complete block will return true and execution can be exited
So for example this simple statement will be evaluated to the following syntax tree 1 = 0 and 1 = 1:
and
/ \
= =
/ \ / \
1 0 1 1
In this case it is easy. After executing the first part of the tree (branch),
the execution will be exited it only can return false. But if the tree gets more
complex, there must be a way to be more efficient. Or is this already the most
efficient way?
For example, how does the c# compiler evaluate the syntax tree in this cases?
EDIT
Should I write all conditions in a simple list, and branch to the end if true or false is not possible? So that i have no and and or parts at the end?
Thank you all a lot!
Your definition of short circuiting doesn't quite match that of C# and other languages. In most (presumably all) languages that have short circuiting the behavior depends on the value of the left operand only, that is:
left && right always evaluates left and only evaluates right if left was true.
left || right always evaluates left and only evaluates right if left was false.
So the rules of the language guarantee you that the right operand will never be tried first, even if the compiler may think that trying the right operand first would be more efficient. This way you know that list == null || list.IsEmpty() can't ever throw a null pointer exception.
So to answer your question, the compiler won't generate anything more efficient than "evaluate the left operand and then evaluate the right operand only if you have to" because anything else would break the rules of the language.
PS: In theory it would be possible for the compiler to reorder the operands if it can prove that they don't have any side-effects, but to the best of my knowledge that is not done. Either way that would not happen at the AST level.
PPS: The C# compiler does not evaluate the AST, it generates code from it. It's a compiler, not an interpreter.

Pointer arithmetic difference in C# and Visual C++ [duplicate]

What are "sequence points"?
What is the relation between undefined behaviour and sequence points?
I often use funny and convoluted expressions like a[++i] = i;, to make myself feel better. Why should I stop using them?
If you've read this, be sure to visit the follow-up question Undefined behavior and sequence points reloaded.
(Note: This is meant to be an entry to Stack Overflow's C++ FAQ. If you want to critique the idea of providing an FAQ in this form, then the posting on meta that started all this would be the place to do that. Answers to that question are monitored in the C++ chatroom, where the FAQ idea started out in the first place, so your answer is very likely to get read by those who came up with the idea.)
C++98 and C++03
This answer is for the older versions of the C++ standard. The C++11 and C++14 versions of the standard do not formally contain 'sequence points'; operations are 'sequenced before' or 'unsequenced' or 'indeterminately sequenced' instead. The net effect is essentially the same, but the terminology is different.
Disclaimer : Okay. This answer is a bit long. So have patience while reading it. If you already know these things, reading them again won't make you crazy.
Pre-requisites : An elementary knowledge of C++ Standard
What are Sequence Points?
The Standard says
At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations
shall be complete and no side effects of subsequent evaluations shall have taken place. (§1.9/7)
Side effects? What are side effects?
Evaluation of an expression produces something and if in addition there is a change in the state of the execution environment it is said that the expression (its evaluation) has some side effect(s).
For example:
int x = y++; //where y is also an int
In addition to the initialization operation the value of y gets changed due to the side effect of ++ operator.
So far so good. Moving on to sequence points. An alternation definition of seq-points given by the comp.lang.c author Steve Summit:
Sequence point is a point in time at which the dust has settled and all side effects which have been seen so far are guaranteed to be complete.
What are the common sequence points listed in the C++ Standard?
Those are:
at the end of the evaluation of full expression (§1.9/16) (A full-expression is an expression that is not a subexpression of another expression.)1
Example :
int a = 5; // ; is a sequence point here
in the evaluation of each of the following expressions after the evaluation of the first expression (§1.9/18) 2
a && b (§5.14)
a || b (§5.15)
a ? b : c (§5.16)
a , b (§5.18) (here a , b is a comma operator; in func(a,a++) , is not a comma operator, it's merely a separator between the arguments a and a++. Thus the behaviour is undefined in that case (if a is considered to be a primitive type))
at a function call (whether or not the function is inline), after the evaluation of all function arguments (if any) which
takes place before execution of any expressions or statements in the function body (§1.9/17).
1 : Note : the evaluation of a full-expression can include the evaluation of subexpressions that are not lexically
part of the full-expression. For example, subexpressions involved in evaluating default argument expressions (8.3.6) are considered to be created in the expression that calls the function, not the expression that defines the default argument
2 : The operators indicated are the built-in operators, as described in clause 5. When one of these operators is overloaded (clause 13) in a valid context, thus designating a user-defined operator function, the expression designates a function invocation and the operands form an argument list, without an implied sequence point between them.
What is Undefined Behaviour?
The Standard defines Undefined Behaviour in Section §1.3.12 as
behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements 3.
Undefined behavior may also be expected when this
International Standard omits the description of any explicit definition of behavior.
3 : permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or with-
out the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
In short, undefined behaviour means anything can happen from daemons flying out of your nose to your girlfriend getting pregnant.
What is the relation between Undefined Behaviour and Sequence Points?
Before I get into that you must know the difference(s) between Undefined Behaviour, Unspecified Behaviour and Implementation Defined Behaviour.
You must also know that the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified.
For example:
int x = 5, y = 6;
int z = x++ + y++; //it is unspecified whether x++ or y++ will be evaluated first.
Another example here.
Now the Standard in §5/4 says
Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.
What does it mean?
Informally it means that between two sequence points a variable must not be modified more than once.
In an expression statement, the next sequence point is usually at the terminating semicolon, and the previous sequence point is at the end of the previous statement. An expression may also contain intermediate sequence points.
From the above sentence the following expressions invoke Undefined Behaviour:
i++ * ++i; // UB, i is modified more than once btw two SPs
i = ++i; // UB, same as above
++i = 2; // UB, same as above
i = ++i + 1; // UB, same as above
++++++i; // UB, parsed as (++(++(++i)))
i = (i, ++i, ++i); // UB, there's no SP between `++i` (right most) and assignment to `i` (`i` is modified more than once btw two SPs)
But the following expressions are fine:
i = (i, ++i, 1) + 1; // well defined (AFAIK)
i = (++i, i++, i); // well defined
int j = i;
j = (++i, i++, j*i); // well defined
Furthermore, the prior value shall be accessed only to determine the value to be stored.
What does it mean? It means if an object is written to within a full expression, any and all accesses to it within the same expression must be directly involved in the computation of the value to be written.
For example in i = i + 1 all the access of i (in L.H.S and in R.H.S) are directly involved in computation of the value to be written. So it is fine.
This rule effectively constrains legal expressions to those in which the accesses demonstrably precede the modification.
Example 1:
std::printf("%d %d", i,++i); // invokes Undefined Behaviour because of Rule no 2
Example 2:
a[i] = i++ // or a[++i] = i or a[i++] = ++i etc
is disallowed because one of the accesses of i (the one in a[i]) has nothing to do with the value which ends up being stored in i (which happens over in i++), and so there's no good way to define--either for our understanding or the compiler's--whether the access should take place before or after the incremented value is stored. So the behaviour is undefined.
Example 3 :
int x = i + i++ ;// Similar to above
Follow up answer for C++11 here.
This is a follow up to my previous answer and contains C++11 related material..
Pre-requisites : An elementary knowledge of Relations (Mathematics).
Is it true that there are no Sequence Points in C++11?
Yes! This is very true.
Sequence Points have been replaced by Sequenced Before and Sequenced After (and Unsequenced and Indeterminately Sequenced) relations in C++11.
What exactly is this 'Sequenced before' thing?
Sequenced Before(§1.9/13) is a relation which is:
Asymmetric
Transitive
between evaluations executed by a single thread and induces a strict partial order1
Formally it means given any two evaluations(See below) A and B, if A is sequenced before B, then the execution of A shall precede the execution of B. If A is not sequenced before B and B is not sequenced before A, then A and B are unsequenced 2.
Evaluations A and B are indeterminately sequenced when either A is sequenced before B or B is sequenced before A, but it is unspecified which3.
[NOTES]
1 : A strict partial order is a binary relation "<" over a set P which is asymmetric, and transitive, i.e., for all a, b, and c in P, we have that:
........(i). if a < b then ¬ (b < a) (asymmetry);
........(ii). if a < b and b < c then a < c (transitivity).
2 : The execution of unsequenced evaluations can overlap.
3 : Indeterminately sequenced evaluations cannot overlap, but either could be executed first.
What is the meaning of the word 'evaluation' in context of C++11?
In C++11, evaluation of an expression (or a sub-expression) in general includes:
value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and
initiation of side effects.
Now (§1.9/14) says:
Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.
Trivial example:
int x;
x = 10;
++x;
Value computation and side effect associated with ++x is sequenced after the value computation and side effect of x = 10;
So there must be some relation between Undefined Behaviour and the above-mentioned things, right?
Yes! Right.
In (§1.9/15) it has been mentioned that
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced4.
For example :
int main()
{
int num = 19 ;
num = (num << 3) + (num >> 3);
}
Evaluation of operands of + operator are unsequenced relative to each other.
Evaluation of operands of << and >> operators are unsequenced relative to each other.
4: In an expression that is evaluated more than once during the execution
of a program, unsequenced and indeterminately sequenced evaluations of its subexpressions need not be performed consistently in different evaluations.
(§1.9/15)
The value computations of the operands of an
operator are sequenced before the value computation of the result of the operator.
That means in x + y the value computation of x and y are sequenced before the value computation of (x + y).
More importantly
(§1.9/15) If a side effect on a scalar object is unsequenced relative to either
(a) another side effect on the same scalar object
or
(b) a value computation using the value of the same scalar object.
the behaviour is undefined.
Examples:
int i = 5, v[10] = { };
void f(int, int);
i = i++ * ++i; // Undefined Behaviour
i = ++i + i++; // Undefined Behaviour
i = ++i + ++i; // Undefined Behaviour
i = v[i++]; // Undefined Behaviour
i = v[++i]: // Well-defined Behavior
i = i++ + 1; // Undefined Behaviour
i = ++i + 1; // Well-defined Behaviour
++++i; // Well-defined Behaviour
f(i = -1, i = -1); // Undefined Behaviour (see below)
When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. [Note: Value computations and side effects associated with different argument expressions are unsequenced. — end note]
Expressions (5), (7) and (8) do not invoke undefined behaviour. Check out the following answers for a more detailed explanation.
Multiple preincrement operations on a variable in C++0x
Unsequenced Value Computations
Final Note :
If you find any flaw in the post please leave a comment. Power-users (With rep >20000) please do not hesitate to edit the post for correcting typos and other mistakes.
C++17 (N4659) includes a proposal Refining Expression Evaluation Order for Idiomatic C++
which defines a stricter order of expression evaluation.
In particular, the following sentence
8.18 Assignment and compound assignment operators:....
In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression.
The right operand is sequenced before the left operand.
together with the following clarification
An expression X is said to be sequenced before an expression Y if every
value computation and every side effect associated with the expression X is sequenced before every value
computation and every side effect associated with the expression Y.
make several cases of previously undefined behavior valid, including the one in question:
a[++i] = i;
However several other similar cases still lead to undefined behavior.
In N4140:
i = i++ + 1; // the behavior is undefined
But in N4659
i = i++ + 1; // the value of i is incremented
i = i++ + i; // the behavior is undefined
Of course, using a C++17 compliant compiler does not necessarily mean that one should start writing such expressions.
I am guessing there is a fundamental reason for the change, it isn't merely cosmetic to make the old interpretation clearer: that reason is concurrency. Unspecified order of elaboration is merely selection of one of several possible serial orderings, this is quite different to before and after orderings, because if there is no specified ordering, concurrent evaluation is possible: not so with the old rules. For example in:
f (a,b)
previously either a then b, or, b then a. Now, a and b can be evaluated with instructions interleaved or even on different cores.
In C99(ISO/IEC 9899:TC3) which seems absent from this discussion thus far the following steteents are made regarding order of evaluaiton.
[...]the order of evaluation of subexpressions and the order in which
side effects take place are both unspecified. (Section 6.5 pp 67)
The order of evaluation of the operands is unspecified. If an attempt
is made to modify the result of an assignment operator or to access it
after the next sequence point, the behavior[sic] is undefined.(Section
6.5.16 pp 91)

Why are arithmetical expressions not optimized for multiplication by 0 in C#

In order to evaluate a multiplication you have to evaluate the first term, then the second term and finally multiply the two values.
Given that every number multiplied by 0 is 0, if the evaluation of the first term returns 0 I would expect that the entire multiplication is evaluated to 0 without evaluating the second term.
However if you try this code:
var x = 0 * ComplexOperation();
The function ComplexOperation is called despite the fact that we know that x is 0.
The optimized behavior would be also consistent with the Boolean Operator '&&' that evaluates the second term only if the first one is evaluated as true. (The '&' operator evaluates both terms in any case)
I tested this behavior in C# but I guess it is the same for almost all languages.
Firstly, for floating-point, your assertion isn't even true! Consider that 0 * inf is not 0, and 0 * nan is not 0.
But more generally, if you're talking about optimizations, then I guess the compiler is free to not evaluate ComplexOperation if it can prove there are no side-effects.
However, I think you're really talking about short-circuit semantics (i.e. a language feature, not a compiler feature). If so, then the real justification is that C# is copying the semantics of earlier languages (originally C) to maintain consistency.
C# is not functional, so functions can have side effects. For example, you can print something from inside ComlpexOperation or change global static variables. So, whether it is called is defined by * contract.
You found yourself an example of different contracts with & and &&.
The language defines which operators have short-circuit semantics and which do not. Your ComplexOperation function may have side effects, those side effects may be deliberate, and the compiler is not free to assume that they should not occur just because the result of the function is effectively not used.
I will also add this would be obfuscated language design. There would be oodles of SO questions to the effect of...
//why is foo only called 9 times?????????
for(int i = 0; i < 10; i++) {
print((i-5)*foo());
}
Why allow short-circuiting booleans and not short-circuiting 0*? Well, firstly I will say that mixing short-circuit boolean with side-effects is a common source of bugs in code - if used well among maintainers who understand it as an obvious pattern then it may be okay, but it's very hard for me to imagine programmers becoming at all used to a hole in the integers at 0.

Restricting post/pre increment operator over a value rather than a variable, property and indexer

From this post (and not only) we got to the point that the ++ operator cannot be applied on expressions returning value.
And it's really obvious that 5++ is better to write as 5 + 1. I just want to summarize the whole thing around the increment/decrement operator. So let's go through these snippets of code that could be helpful to somebody stuck with the ++ first time at least.
// Literal
int x = 0++; // Error
// Constant
const int Y = 1;
double f = Y++; // error. makes sense, constants are not variables actually.
int z = AddFoo()++; // Error
Summary: ++ works for variables, properties (through a synthetic sugar) and indexers(the same).
Now the interest part - any literal expressions are optimized in CSC and, hence when we write, say
int g = 5 + 1; // This is compiled to 6 in IL as one could expect.
IL_0001: ldc.i4.6 // Pushes the integer value of 6 onto the evaluation stack as an int32.
For 5++ doesn't mean 5 becomes 6, it could be a shorthand for 5 + 1, like for x++ = x + 1
What's the real reason behind this restriction?
int p = Foo()++ //? yes you increase the return value of Foo() with 1, what's wrong with that?
Examples of code that can lead to logical issues are appreciated.
One of real-life example could be, perform one more actions than in the array.
for (int i = 0; i < GetCount()++; i++) { }
Maybe the lack of usage opts compiler teams to avoid similar features?
I don't insist this is a feature we lack of, just want to understand the dark side of this for compiler writers perhaps, though I'm not. But I know c++ allows this when returning a reference in the method. I'm neither a c++ guy(very poor knowledge) just want to get the real gist of the restriction.
Like, is it just because c# guys opted to restrict the ++ over value expressions or there are definite cases leading to unpredictable results?
In order for a feature to be worth supporting, it really needs to be useful. The code you've presented is in every case less readable than the alternative, which is just to use the normal binary addition operator. For example:
for (int i = 0; i < GetCount() + 1; i++) { }
I'm all in favour of the language team preventing you from writing unreadable code when in every case where you could do it, there's a simpler alternative.
Well before using these operators you should try to read up on how they do what they do. In particular you should understand the difference between postfix and prefix, which could help figure out what is and isn't allowed.
The ++ and -- operators modify their operands. Which means that the operand must be modifiable. If you can assign a value to the expression in question then it is modifiable, and is probably a variable(c#).
Taking a look at what these operators actually do. The postfix operators should increment after your line of code executes. As for the prefix operators, well they would need to have access to the value before the method had even been called yet. The way I read the syntax is ++lvalue (or ++variable) converting to memory operations:[read, write, read] or for lvalue++ [read, read, write] Though many compilers probably optimize secondary reads.
So looking at foo()++; the value is going to be plopped dead in the center of executing code. Which would mean the compiler would need to save the value somewhere more long-term in order for operations to be performed on said value, after the line of code has finished executing. Which is no doubt the exact reason C++ does not support this syntax either.
If you were to be returning a reference the compiler wouldn't have any trouble with the postfix. Of course in C# value types (ie. int, char, float, etc) cannot be passed by reference as they are value types.

Is (--i == i++) an Undefined Behavior?

this question is related to my previous problem. The answer I got was "It is an Undefined behavior."
Please anyone explain:
What is an undefined behavior?
how can I know my code has an undefined behavior?
Example code:
int i = 5;
if (--i == i++)
Console.WriteLine("equal and i=" + i);
else
Console.WriteLine("not equal and i=" + i);
//output: equal and i=6
What is an Undefined-Behaviour?
It's quite simply any behaviour that is not specifically defined by the appropriate language specification. Some specs will list certain things as explicitly undefined, but really anything that's not described as being defined is undefined.
how can I know my code has an undefined behavior?
Hopefully your compiler will warn you - if that's not the case, you need to read the language specification and learn about all the funny corner cases and nooks & crannies that cause these sorts of problems.
Be careful out there!
It's undefined in C, but well-defined in C#:
From C# (ECMA-334) specification "Operator precedence and associativity" section (§14.2.1):
Except for the assignment operators and the null coalescing operator, all
binary operators are left-
associative, meaning that operations
are performed from left to right.
[Example: x + y + z is evaluated as (x + y) + z. end example]
So --i is evaluated first, changing i to 4 and evaluating to 4. Then i++ is evaluating, changing i to 5, but evaluating to 4.
Yes, that expression is undefined behavior as well (in C and C++). See http://en.wikipedia.org/wiki/Sequence_point for some information on the rules; you can also search for "sequence point" more generally (that is the set of rules that your code violates).
(This assumes C or C++.)
Carl's answer is exact in general.
In specific, the problem is what Jeremiah pointed out: sequence points.
To clarify, the chunk of code (--i == ++i) is a single "happening". It's a chunk of code that's evaluated all at once. There is no defined order of what happens first. The left side could be evaluated first, or the right side could, or maybe the equality is compared, then i is incremented, then decremented. Each of these behaviors could cause this expression to have different results. It's "undefined" what will happen here. You don't know what the answer will be.
Compare this to the statement i = i+1; Here, the right side is always evaluated first, then its result is stored into i. This is well-defined. There's no ambiguity.
Hope that helps a little.
In C the result is undefined, in C# it's defined.
In C, the comparison is interpreted as:
Do all of these, in any order:
- Decrease i, then get value of i into x
- Get value of i into y, then increase i
Then compare x and y.
In C# there are more operation boundaries, so the comparison is interpreted as:
Decrease i
then get value of i into x
then get value of i into y
then increase i
then compare x and y.
It's up to the compiler to choose in which order the operations are done within an operation boundary, so putting contradictory operations within the same boundary causes the result to be undefined.
Because the C standard states so. And your example clearly shows an undefined behabiour.
Depending on the order of evaluation, the comparison should be 4 == 5 or 5 == 6. And yet the condition returns True.
Your previous question was tagged [C], so I'm answering based on C, even though the code in your current question doesn't look like C.
The definition of undefined behavior in C99 says (§3.4.3):
1 undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
Appendix J.2 of the C standard has a (long -- several pages) list of undefined behavior, though even that still isn't exhaustive. For the most part, undefined behavior means you broke the rules, so the way to know it is to know the rules.
Undefined behavior == the result cannot be guaranteed to always be the same whenever you run it in the exact same conditions, or the result cannot be guaranteed to always be the same whenever you use different compilers or runtimes to execute it.
In your code, since it is using a equal comparison operator which does not specify which side of the operands should be executed first, --i or i++ may end up running first, and your answer will depend on the actual implementation of the compiler. If --i is executed first, it will be 4 == 4, i=5; if i++ is implemented first, it will be 5 == 5, i=5.
The fact that the answer may turn out to be the same does not prevent the compiler from warning you that this is an undefined operation.
Now if this is a language that defines that the left hand side (or right hand side) should always be executed first, then the behavior will no longer be undefined.

Categories