Nothing != null - or does it? - c#

In a recent project I came across a peculiar difference between VB.NET and C#.
Consider the following C# expression:
null <= 2
This expression evaluates to False, which is what I would expect.
Then the corresponding VB.NET expression:
Nothing <= 2
I was surprised to learn that this expression actually evaluates to True.
It seems like a fairly fundamental design decision between the two languages and it certainly caught me out.
Is anyone able to tell me why?
Are null and Nothing one and the same?
If so, why do they behave differently?

Nothing in VB evaluates to the default value for a given type. (See this link for details.)
For an integer comparison (which the compiler will assume from the right hand operand), Nothing will thus be 0. 0 <= 2 is true for more obvious reasons :-)
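For comparison from the C# side, here is a minimal sketch (the variable name is mine): C# lifts the comparison to int?, and a lifted comparison in which either operand is null always yields false, whereas VB.NET substitutes the type's default value for Nothing.
int? nothing = null;                 // the C# null literal is lifted the same way
Console.WriteLine(nothing <= 2);     // False: a lifted comparison is false whenever either operand is null
// VB.NET instead converts Nothing to the default Integer value 0,
// so Nothing <= 2 becomes 0 <= 2, which is True.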

Related

Why is checking a variable with is against a non-type not an error

I have seen many colleagues mistakenly use
x is true
to check if the variable is true.
Regardless of whether it is appropriate to use is rather than the more conventional == to check whether a boolean is true, I'd like to understand why this is not a compilation error.
The documentation of is states:
The is operator checks if the result of an expression is compatible with a given type.
AFAIK true is not a type.
Interestingly this also works for integers.
Update:
I tried it out in the online editor provided by W3, and there it used to be an error. Has pattern matching changed this behaviour in C# 8?
So does x is true do the same as x == true in C# 8 (even without the { })?
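A hedged illustration (the variables below are mine, not from the question): as far as I can tell, the true in x is true is parsed as a constant pattern, which the is operator has accepted since pattern matching arrived in C# 7.0, so the right-hand side of is no longer has to be a type.
bool flag = DateTime.Now.Second % 2 == 0;
int answer = 42;
// true and 42 are constant patterns here, not types, so both lines compile:
Console.WriteLine(flag is true);     // same result as flag == true for a plain bool
Console.WriteLine(answer is 42);     // same result as answer == 42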

Prepare syntax tree (AST) to easily perform short-circuit operations

What is the best way to prepare a syntax tree containing conditions so that short-circuit evaluation is easy and fast?
The rules of short circuit in general are very easy:
If one component in an and block returns false, the complete block will return false and execution can be exited
If one component in an or block returns true, the complete block will return true and execution can be exited
So, for example, the simple statement 1 = 0 and 1 = 1 will be parsed into the following syntax tree:
       and
      /   \
     =     =
    / \   / \
   1   0 1   1
In this case it is easy: after evaluating the first branch of the tree, execution can stop, because the whole expression can only return false. But if the tree gets more complex, there must be a way to be more efficient. Or is this already the most efficient way?
For example, how does the C# compiler handle the syntax tree in such cases?
EDIT
Should I write all conditions into a simple list and branch to the end as soon as the outcome is already determined, so that there are no and and or nodes left at the end?
Thank you all a lot!
Your definition of short circuiting doesn't quite match that of C# and other languages. In most (presumably all) languages that have short circuiting the behavior depends on the value of the left operand only, that is:
left && right always evaluates left and only evaluates right if left was true.
left || right always evaluates left and only evaluates right if left was false.
So the rules of the language guarantee you that the right operand will never be tried first, even if the compiler may think that trying the right operand first would be more efficient. This way you know that list == null || list.IsEmpty() can't ever throw a null pointer exception.
So to answer your question, the compiler won't generate anything more efficient than "evaluate the left operand and then evaluate the right operand only if you have to" because anything else would break the rules of the language.
PS: In theory it would be possible for the compiler to reorder the operands if it can prove that they don't have any side-effects, but to the best of my knowledge that is not done. Either way that would not happen at the AST level.
PPS: The C# compiler does not evaluate the AST, it generates code from it. It's a compiler, not an interpreter.
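To make this concrete, here is a minimal sketch (the node types and names are mine, not from the question) of an evaluator for such a tree that gets short circuiting for free by mapping the and/or nodes onto C#'s own && and ||:
using System;

abstract record Node;
record Leaf(Func<bool> Comparison) : Node;      // stands in for a comparison such as 1 = 0
record And(Node Left, Node Right) : Node;
record Or(Node Left, Node Right) : Node;

static class Demo
{
    static bool Eval(Node node) => node switch
    {
        Leaf l => l.Comparison(),
        And a  => Eval(a.Left) && Eval(a.Right),   // right child skipped when the left one is false
        Or o   => Eval(o.Left) || Eval(o.Right),   // right child skipped when the left one is true
        _      => throw new InvalidOperationException("unknown node type")
    };

    static void Main()
    {
        // 1 = 0 and 1 = 1: the right comparison never runs
        var tree = new And(
            new Leaf(() => 1 == 0),
            new Leaf(() => { Console.WriteLine("right side evaluated"); return 1 == 1; }));

        Console.WriteLine(Eval(tree));             // prints only "False"
    }
}
Flattening the conditions into a list with jumps, as the edit asks, is essentially what the compiler itself does later when it lowers && and || to conditional branches; at the tree level, the recursive form above already skips everything the language allows it to skip.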

Why are arithmetical expressions not optimized for multiplication by 0 in C#

In order to evaluate a multiplication you have to evaluate the first term, then the second term and finally multiply the two values.
Given that every number multiplied by 0 is 0, if the evaluation of the first term returns 0 I would expect that the entire multiplication is evaluated to 0 without evaluating the second term.
However if you try this code:
var x = 0 * ComplexOperation();
The function ComplexOperation is called despite the fact that we know that x is 0.
The optimized behavior would be also consistent with the Boolean Operator '&&' that evaluates the second term only if the first one is evaluated as true. (The '&' operator evaluates both terms in any case)
I tested this behavior in C# but I guess it is the same for almost all languages.
Firstly, for floating-point, your assertion isn't even true! Consider that 0 * inf is not 0, and 0 * nan is not 0.
But more generally, if you're talking about optimizations, then I guess the compiler is free to not evaluate ComplexOperation if it can prove there are no side-effects.
However, I think you're really talking about short-circuit semantics (i.e. a language feature, not a compiler feature). If so, then the real justification is that C# is copying the semantics of earlier languages (originally C) to maintain consistency.
C# is not a functional language, so functions can have side effects. For example, you can print something from inside ComplexOperation or change global static variables. So whether it is called is defined by the contract of the * operator.
You found yourself an example of different contracts with & and &&.
The language defines which operators have short-circuit semantics and which do not. Your ComplexOperation function may have side effects, those side effects may be deliberate, and the compiler is not free to assume that they should not occur just because the result of the function is effectively not used.
I will also add that this would make for confusing language design. There would be oodles of SO questions to the effect of...
// why is foo only called 9 times?????????
for (int i = 0; i < 10; i++)
{
    Console.WriteLine((i - 5) * foo());
}
Why allow short-circuiting booleans and not a short-circuiting 0*? Well, firstly I will say that mixing short-circuit booleans with side effects is a common source of bugs in code - if used well among maintainers who understand it as an obvious pattern then it may be okay, but it's very hard for me to imagine programmers ever becoming used to a hole in the integers at 0.
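To make the difference in contracts visible, here is a minimal sketch (ComplexOperation below is my own stand-in with a deliberately observable side effect, not code from the question):
using System;

static class Demo
{
    static int ComplexOperation()
    {
        Console.WriteLine("ComplexOperation ran");    // an observable side effect
        return 42;
    }

    static void Main()
    {
        var x = 0 * ComplexOperation();               // prints the message: * never short-circuits
        var y = false && ComplexOperation() > 0;      // prints nothing: && skips its right operand
        var z = false & ComplexOperation() > 0;       // prints the message: & evaluates both operands

        Console.WriteLine((x, y, z));                 // (0, False, False)
    }
}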

Why does 'Submissions.Where(s => (false && s.Status == Convert.ToInt16("")))' raise a FormatException?

I thought the query was quite trivial, but it's raising a FormatException ("Input string was not in a correct format") nonetheless:
Submissions.Where(s => (false && s.Status == Convert.ToInt16("")))
(of course, in my code, another expression that evaluates to 'false' is located before '&&')
So why is the part after '&&' evaluated, since the first part is always false and the total expression can never evaluate to true?
The situation is particularly strange because only the Convert.ToInt16("") part seems to raise an exception - other parts of my original query of more or less the same structure, like
Submissions.Where(s => (false && s.SubmissionDate <= DateTime.Now))
are evaluated correctly.
As the others have pointed out, LINQ to SQL code gets pulled apart into an expression tree before being run as SQL code against the database. Since SQL does not necessarily follow the same short-circuit boolean rules as C#, the right side of your expression code might get parsed so that the SQL can be constructed.
From MSDN:
C# specifies short circuit semantics based on lexical order of operands for logical operators && and ||. SQL on the other hand is targeted for set-based queries and therefore provides more freedom for the optimizer to decide the order of execution.
As for why you're getting an exception with this code, Convert.ToInt16("") will always throw precisely that exception because there's no way to convert an empty string into an integer. Your other example doesn't attempt an invalid conversion, hence it runs without a problem.
If Submissions is an IQueryable<T>, then this isn't a regular C# delegate, but is an expression tree. Some code (the LINQ provider) has to pull this tree apart and understand it - so if you have oddities in the expressions, then expect odd output.
Well, based on your answer to my question in the comments: since it's LINQ to SQL, it's not actually a delegate. I tried recreating it using LINQ to Objects, and sure enough there was no issue at all; VS actually pointed out "Unreachable code detected". Since in your case it is LINQ to SQL, it's building up an expression tree, in which case the provider has to decipher all of it and all bets are off.
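To see the delegate-versus-expression-tree difference in isolation, here is a self-contained sketch (the Submission class is a stand-in I wrote, not the question's actual mapping):
using System;
using System.Linq.Expressions;

class Submission { public short Status { get; set; } }

static class Demo
{
    static void Main()
    {
        // Compiled to an ordinary delegate (LINQ to Objects territory): && short-circuits,
        // so Convert.ToInt16("") is never called and nothing throws.
        Func<Submission, bool> asDelegate = s => false && s.Status == Convert.ToInt16("");
        Console.WriteLine(asDelegate(new Submission()));     // False

        // Compiled to an expression tree (what an IQueryable provider receives):
        // Convert.ToInt16("") is just a node in the tree, and it is the provider that
        // decides when, or whether, to evaluate it while translating the query.
        Expression<Func<Submission, bool>> asTree = s => false && s.Status == Convert.ToInt16("");
        Console.WriteLine(asTree);                            // prints the tree; nothing has run yet
    }
}
Against a real IQueryable, a provider walks a tree like the second one and may evaluate the Convert.ToInt16("") node while translating the query, which is where the FormatException can come from.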
Suggestion: use a static Int16 to hold the result of Convert.ToInt16(""), then refer to the static in the predicate.
Better still, do you know what the result of Convert.ToInt16("") is? Yes? Then use that instead. For instance, if it's 0, then say s.Status == 0. You could even make that a constant.
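And a short sketch of that idea (statusFilter is an illustrative name I chose; Submissions and Status come from the question, and since Convert.ToInt16("") itself always throws, the value you really mean has to be supplied directly):
// Hoist the value out of the predicate so the LINQ provider only ever sees a constant
// instead of a Convert call it has to translate or evaluate.
const short statusFilter = 0;

var query = Submissions.Where(s => s.Status == statusFilter);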

I don't like this... Is this cheating the language?

I have seen something like the following a couple times... and I hate it. Is this basically 'cheating' the language? Or.. would you consider this to be 'ok' because the IsNullOrEmpty is evaluated first, all the time?
(We could argue whether or not a string should be NULL when it comes out of a function, but that isn't really the question.)
string someString;
someString = MagicFunction();
if (!string.IsNullOrEmpty(someString) && someString.Length > 3)
{
// normal string, do whatever
}
else
{
// On a NULL string, it drops to here, because first evaluation of IsNullOrEmpty fails
// However, the Length function, if used by itself, would throw an exception.
}
EDIT:
Thanks again to everyone for reminding me of this language fundamental. While I knew "why" it worked, I can't believe I didn't know/remember the name of the concept.
(In case anyone wants any background.. I came upon this while troubleshooting exceptions generated by NULL strings and .Length > x exceptions... in different places of the code. So when I saw the above code, in addition to everything else, my frustration took over from there.)
You're taking advantage of a language feature known as short circuiting. This is not cheating the language but in fact using a feature exactly how it was designed to be used.
If you are asking whether it's OK to depend on the "short circuit" conditional operators && and ||, then yes, that's totally fine.
There is nothing wrong with this, as you just want to make certain you won't get a null reference exception.
I think it is reasonable to do.
With extension methods you can make it cleaner, but the basic concept would still be valid.
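A minimal sketch of that idea (IsLongerThan is a name I made up for illustration; MagicFunction comes from the question):
public static class StringExtensions
{
    // The null check and the length check are wrapped in one call;
    // the short circuiting still happens, it is just hidden inside the method.
    public static bool IsLongerThan(this string s, int length) =>
        !string.IsNullOrEmpty(s) && s.Length > length;
}

// usage
if (MagicFunction().IsLongerThan(3))
{
    // normal string, do whatever
}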
This code is totally valid, but I like to use the null-coalescing operator to avoid null checks.
string someString = MagicFunction() ?? string.Empty;
if (someString.Length > 3)
{
// normal string, do whatever
}
else
{
// NULL strings will be converted to Length = 0 and will end up here.
}
There's nothing wrong with this.
Conditions in an if are evaluated from left to right, so it's perfectly fine to stack them like this.
This is valid code, in my opinion (although declaring a variable and assigning it on the next line is pretty annoying), but you should probably realize that you can also enter the else block when the string is non-null but its length is 3 or less.
That looks to me like a perfectly reasonable use of logical short-circuiting--if anything, it's cheating with the language. I've only recently come from VB6, which didn't ever short-circuit, and that really annoyed me.
One problem to watch out for is that you might need to test for null again in that else clause, since--as written--you're winding up there with null strings, empty strings, and strings of length three or less.
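A sketch of that caveat, reusing the question's names (someString and MagicFunction come from the question):
string someString = MagicFunction();

if (!string.IsNullOrEmpty(someString) && someString.Length > 3)
{
    // normal string, do whatever
}
else if (someString == null)
{
    // genuinely null strings land here
}
else
{
    // empty strings and strings of length 3 or less land here
}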
This is perfectly valid and there is nothing wrong with using it that way. If you are following documented behaviour for the language then all is well. In C#, the operators you are using are the conditional logical operators, and their documented behaviour can be found on MSDN.
For me it's the same as not using parentheses when doing multiplication and addition in the same statement, because the language documents that the multiplication will be carried out first.
Relying on short-circuiting is the "right thing" to do in most cases. It leads to terser code with fewer moving parts. Which generally means easier to maintain. This is especially true in C and C++.
I would seriously reconsider hiring someone who is not familiar with (and does not know how to use) short-circuiting operations.
I find it OK :) You're just making sure that you don't access a NULL variable.
Actually, I always do such checking before doing any operation on my variables (also when indexing collections and so on) - it's safer, a best practice, that's all.
It makes sense because C# by default short circuits the conditions, so I think it's fine to use that to your advantage. In VB there may be some issues if the developer uses AND instead of ANDALSO.
I don't think it's any different than something like this:
int* pNumber = GetAddressOfNumber();
if ((pNumber != NULL) && (*pNumber > 0))
{
    // valid number, do whatever
}
else
{
    // On a null pointer, it drops to here, because (pNumber != NULL) fails
    // However, (*pNumber > 0), if used by itself, would throw an exception when dereferencing NULL
}
It's just taking advantage of a feature of the language. This kind of idiom has been in common use, I think, since C started evaluating Boolean expressions in this manner (or whatever language did it first).
If it were C code that you compiled into assembly, not only is short-circuiting the right behavior, it's faster: in machine language the parts of the if statement are evaluated one after another, and not short-circuiting is slower.
Writing code costs a company a lot of money, but maintaining it costs more!
So I'm OK with your point: chances are that this line of code will not be understood immediately by the person who has to read and correct it in two years.
Of course, he will be asked to fix a critical production bug. He will search here and there and may not notice this.
We should always code for the next guy, and he may be less clever than we are. To me, this is the only thing to remember.
And this implies that we should use obvious language features and avoid the others.
All the best, Sylvain.
A bit off topic, but if you ran the same example in VB.NET like this:
dim someString as string
someString = MagicFunction()
if not string.IsNullOrEmpty(someString) and someString.Length > 3 then
' normal string, do whatever
else
' do someting else
end if
this would go bang on a null (Nothing) string, but in VB.NET you code it as follows to do the same as the C#:
dim someString as string
someString = MagicFunction()
if not string.IsNullOrEmpty(someString) andalso someString.Length > 3 then
' normal string, do whatever
else
' do someting else
end if
Adding the AndAlso makes it behave the same way, and it also reads better. As someone who does both VB and C# development, the second VB version makes it visible that the logic is slightly different, and is therefore easier to use to explain to someone that there is a difference.
Drux
