Nullable Types and Operators

Nullable Types and Operators - c#

I was reading about Nullable Types and Operators from Wrox and came across the following statement:
When comparing nullable types, if only one of the operands is null, the comparison will always equate to false. This means that you cannot assume a condition is true just because its opposite is false.
Now, I get what the first statement means but didn't get the second statement.
Could you please elaborate?

It appears like the quote is saying that any comparison at all with a null type will return null, regardless of the operand.
So something like (null != 5) Would return false, where (null == 5) would also return false.
Now, the funny thing is, that when I ran a program, null != 5 returned true, so while I can't verify that statement for c# 2.0, it's definitely not true anymore in c# 4.0 +
This is the sample code I've used:
int? a = null;
int? b = 5;
if (a != b)
{
Console.WriteLine("A != B");
}
if (a == b)
{
Console.WriteLine("A == B");
}
This is the output
A != B
Press any key to continue . . .

That quote is wrong, MSDN is the official source: http://msdn.microsoft.com/en-us/library/2cf62fcy.aspx
When you perform comparisons with nullable types, if the value of one
of the nullable types is null and the other is not, all comparisons
evaluate to false except for != (not equal).
The second statement simply means that a comparison and its opposite may be both false, eg a>=b and a<b are both false.
Here's the logic. Null kind of means an unknown value. Now try to calculate 5<null, 5<=null, 5==null, 5>=null, 5>null. You can't exactly calculate any of these, they are not true or false either, they are unknown, but false makes a bit more sense than true. The exception is 5!=null. An exact number is not unknown, so it makes a bit more sense that it's true than false. So == and != are the opposite of each other, it's the easy case, but you can't say that for greater-less comparisons. It was a design choice when they made C#/.NET.
SQL on the other hand got it quite right I think with 3-valued-logic, see here: http://en.wikipedia.org/wiki/Null_%28SQL%29#Comparisons_with_NULL_and_the_three-valued_logic_.283VL.29

Related

How can "x & y" be false when both x and y are true?

Context:
I'm learning C# and have been messing about on the Pex for fun site. The site challenges you to re-implement a secret algorithm, by typing code into the site and examining how the inputs and outputs differ between your implementation and the secret implementation.
Problem:
Anyway, I got stuck on a basic code duel called XAndY.
From the name it seemed obvious that the answer was just:
public static bool Puzzle(bool x, bool y)
{
return x && y;
}
However, this was incorrect and Pex informed me that the following inputs produced a different result than the secret implementation:
Input:
x:true y:true (0x02)
Output:
my implementation: true (0x02)
secret implementation: false
Mismatch Your puzzle method produced the wrong result.
Code:
Puzzle(true, PexSafeHelpers.ByteToBoolean((byte)2));
After a lot of confusion trying to compare different types of true, I realised that the implementation that Pex was looking for was actually just using a bitwise AND:
return x & y;
Questions:
I thought that for both semantic and short-circuiting reasons you should use logical && for comparing boolean values, but regardless:
Does this mean that x & y and x && y definitively do not have the same outputs for all possible bool arguments? (or could it be something buggy in Pex?)
Does this mean you can differentiate between different values of bool true in C#? If so, how?

The puzzle is exploiting what, in my opinion, is a bug in the C# compiler. (The bug affects VB.NET as well.)
In the C# 5.0 specification, §4.1.8 says that "The possible values of type bool are true and false", and §7.11.3 says that operator &(bool x, bool y) is a logical operator:
The result of x & y is true if both x and y are true. Otherwise, the result is false.
It's obviously a violation of the specification for true & true to yield false. What's going on?
At run time, a bool is represented by a 1-byte integer. The C# compiler uses 0 to represent false and 1 to represent true. To implement the & operator, the C# compiler emits a bitwise AND instruction in the generated IL. At first glance, this seems to be okay: bitwise AND operations involving 0 and 1 correspond exactly with logical AND operations involving false and true.
However, §III.1.1.2 of the CLI specification explicitly allows a bool to be represented by an integer other than 0 or 1:
A CLI Boolean type occupies 1 byte in memory. A bit pattern of all zeroes denotes a value of false. A bit pattern with any one or more bits set (analogous to a non-zero integer) denotes a value of true.
By going beyond the scope of C#, it is indeed possible—and perfectly legal—to create a bool whose value is, say, 2, thus causing & to behave unexpectedly. This is what the Pex site is doing.
Here's a demonstration:
using System;
using System.Reflection.Emit;
class Program
{
static void Main()
{
DynamicMethod method =
new DynamicMethod("ByteToBoolean", typeof(bool), new[] { typeof(byte) });
ILGenerator il = method.GetILGenerator();
il.Emit(OpCodes.Ldarg_0); // Load the byte argument...
il.Emit(OpCodes.Ret); // and "cast" it directly to bool.
var byteToBoolean =
(Func<byte, bool>)method.CreateDelegate(typeof(Func<byte, bool>));
bool x = true;
bool y = byteToBoolean(2);
Console.WriteLine(x); // True
Console.WriteLine(y); // True
Console.WriteLine(x && y); // True
Console.WriteLine(x & y); // False (!) because 1 & 2 == 0
Console.WriteLine(y.Equals(false)); // False
Console.WriteLine(y.Equals(true)); // False (!) because 2 != 1
}
}
So the answers to your questions are:
Currently, it's possible for x & y and x && y to have different values. However, this behavior violates the C# specification.
Currently, you can use Boolean.Equals (as shown above) to differentiate between true values. However, this behavior violates the CLI specification of Boolean.Equals.

I noticed that this issue has been raised to the Roslyn compiler group. Further discussion.
It had the following resolution:
This is effectively by design and you can find the document detailing
that here:
https://github.com/dotnet/roslyn/blob/master/docs/compilers/Boolean%20Representation.md
You can see more details and past conversations about this here:
#24652
The noted document states:
Representation of Boolean Values
The C# and VB compilers represent true (True) and false (False) bool
(Boolean) values with the single byte values 1 and 0, respectively,
and assume that any boolean values that they are working with are
restricted to being represented by these two underlying values. The
ECMA 335 CLI specification permits a "true" boolean value to be
represented by any nonzero value. If you use boolean values that have
an underlying representation other than 0 or 1, you can get unexpected
results. This can occur in unsafe code in C#, or by interoperating
with a language that permits other values. To avoid these unexpected
results, it is the programmer's responsibility to normalize such
incoming values.
(All italics are my own emphasis)

& in C# is not a bitwise operator, assuming that the input values are Boolean values. It is overloaded. There are two entirely separate implementations of the operator. A non-short circuiting logical boolean operator if the inputs are booleans, and a bitwise AND if the values are non-boolean values.
In the code that you have shown, then input is a boolean variable. It's not a numeric value, it's not an expression that resolves to a boolean value (which may have side effects), or anything else.
When the input is two boolean variables there is never going to be any different in the output between & and &&. The only way to have any observable difference between these two is to have a boolean expression that is more complex than just resolving a variable to its value, or some non-boolean input.
If the operands could be of some type other than bool then its pretty trivial to provide a type that has different results for either operator, such as a super mean type that overrides the true operator in a manor inconsistent with it's implicit conversion to bool:

Since the implementation to satisfy the puzzle is entirely up to you, why dont you try:
public static bool Puzzle(bool x, bool y)
{
return x & !y;
}

Why is there a difference in checking null against a value in VB.NET and C#?

In VB.NET this happens:
Dim x As System.Nullable(Of Decimal) = Nothing
Dim y As System.Nullable(Of Decimal) = Nothing
y = 5
If x <> y Then
Console.WriteLine("true")
Else
Console.WriteLine("false") '' <-- I got this. Why?
End If
But in C# this happens:
decimal? x = default(decimal?);
decimal? y = default(decimal?);
y = 5;
if (x != y)
{
Debug.WriteLine("true"); // <-- I got this -- I'm with you, C# :)
}
else
{
Debug.WriteLine("false");
}
Why is there a difference?

VB.NET and C#.NET are different languages, built by different teams who have made different assumptions about usage; in this case the semantics of a NULL comparison.
My personal preference is for the VB.NET semantics, which in essence gives NULL the semantics "I don't know yet". Then the comparison of 5 to "I don't know yet". is naturally "I don't know yet"; ie NULL. This has the additional advantage of mirroring the behaviour of NULL in (most if not all) SQL databases. This is also a more standard (than C#'s) interpretation of three-valued logic, as explained here.
The C# team made different assumptions about what NULL means, resulting in the behaviour difference you show. Eric Lippert wrote a blog about the meaning of NULL in C#. Per Eric Lippert: "I also wrote about the semantics of nulls in VB / VBScript and JScript here and here".
In any environment in which NULL values are possible, it is imprtant to recognize that the Law of the Excluded Middle (ie that A or ~A is tautologically true) no longer can be relied on.
Update:
A bool (as opposed to a bool?) can only take the values TRUE and FALSE. However a language implementation of NULL must decide on how NULL propagates through expressions. In VB the expressions 5=null and 5<>null BOTH return false. In C#, of the comparable expressions 5==null and 5!=null only the second first [updated 2014-03-02 - PG] returns false. However, in ANY environment that supports null, it is incumbent on the programmer to know the truth tables and null-propagation used by that language.
Update
Eric Lippert's blog articles (mentioned in his comments below) on semantics are now at:
Sep. 30, 2003 - A Whole Lot of Nothing
Oct. 1, 2003 - A Little More on Nothing

Because x <> y returns Nothing instead of true. It is simply not defined since x is not defined. (similar to SQL null).
Note: VB.NET Nothing <> C# null.
You also have to compare the value of a Nullable(Of Decimal) only if it has a value.
So the VB.NET above compares similar to this(which looks less incorrect):
If x.HasValue AndAlso y.HasValue AndAlso x <> y Then
Console.WriteLine("true")
Else
Console.WriteLine("false")
End If
The VB.NET language specification:
7.1.1 Nullable Value Types
... A nullable value type can contain the same values as the non-nullable
version of the type as well as the null value. Thus, for a nullable
value type, assigning Nothing to a variable of the type sets the value
of the variable to the null value, not the zero value of the value
type.
For example:
Dim x As Integer = Nothing
Dim y As Integer? = Nothing
Console.WriteLine(x) ' Prints zero '
Console.WriteLine(y) ' Prints nothing (because the value of y is the null value) '

Look at the generated CIL (I've converted both to C#):
C#:
private static void Main(string[] args)
{
decimal? x = null;
decimal? y = null;
y = 5M;
decimal? CS$0$0000 = x;
decimal? CS$0$0001 = y;
if ((CS$0$0000.GetValueOrDefault() != CS$0$0001.GetValueOrDefault()) ||
(CS$0$0000.HasValue != CS$0$0001.HasValue))
{
Console.WriteLine("true");
}
else
{
Console.WriteLine("false");
}
}
Visual Basic:
[STAThread]
public static void Main()
{
decimal? x = null;
decimal? y = null;
y = 5M;
bool? VB$LW$t_struct$S3 = new bool?(decimal.Compare(x.GetValueOrDefault(), y.GetValueOrDefault()) != 0);
bool? VB$LW$t_struct$S1 = (x.HasValue & y.HasValue) ? VB$LW$t_struct$S3 : null;
if (VB$LW$t_struct$S1.GetValueOrDefault())
{
Console.WriteLine("true");
}
else
{
Console.WriteLine("false");
}
}
You'll see that the comparison in Visual Basic returns Nullable<bool> (not bool, false or true!). And undefined converted to bool is false.
Nothing compared to whatever is always Nothing, not false in Visual Basic (it is the same as in SQL).

The problem that's observed here is a special case of a more general problem, which is that the number of different definitions of equality that may be useful in at least some circumstances exceeds the number of commonly-available means to express them. This problem is in some cases made worse by an unfortunate belief that it is confusing to have different means of testing equality yield different results, and such confusion might be avoided by having the different forms of equality yield the same results whenever possible.
In reality, the fundamental cause of confusion is a misguided belief that the different forms of equality and inequality testing should be expected to yield the same result, notwithstanding the fact that different semantics are useful in different circumstances. For example, from an arithmetic standpoint, it's useful to be able to have Decimal which differ only in the number of trailing zeroes compare as equal. Likewise for double values like positive zero and negative zero. On the other hand, from a caching or interning standpoint, such semantics can be deadly. Suppose, for example, one had a Dictionary<Decimal, String> such that myDict[someDecimal] should equal someDecimal.ToString(). Such an object would seem reasonable if one had many Decimal values that one wanted to convert to string and expected there to be many duplicates. Unfortunately, if used such caching to convert 12.3 m and 12.40 m, followed by 12.30 m and 12.4 m, the latter values would yield "12.3", and "12.40" instead of "12.30" and "12.4".
Returning to the matter at hand, there is more than one sensible way of comparing nullable objects for equality. C# takes the standpoint that its == operator should mirror the behavior of Equals. VB.NET takes the standpoint that its behavior should mirror that of some other languages, since anyone who wants the Equals behavior could use Equals. In some sense, the right solution would be to have a three-way "if" construct, and require that if the conditional expression returns a three-valued result, code must specify what should happen in the null case. Since that is not an option with languages as they are, the next best alternative is to simply learn how different languages work and recognize that they are not the same.
Incidentally, Visual Basic's "Is" operator, which is lacking in C, can be used to test for whether a nullable object is, in fact, null. While one might reasonably question whether an if test should accept a Boolean?, having the normal comparison operators return Boolean? rather than Boolean when invoked on nullable types is a useful feature. Incidentally, in VB.NET, if one attempts to use the equality operator rather than Is, one will get a warning that the result of the comparison will always be Nothing, and one should use Is if one wants to test if something is null.

May be
this
post well help you:
If I remember correctly, 'Nothing' in VB means "the default value". For a value type, that's the default value, for a reference type, that would be null. Thus, assigning nothing to a struct, is no problem at all.

This is a definite weirdness of VB.
In VB, if you want to compare two nullable types, you should use Nullable.Equals().
In your example, it should be:
Dim x As System.Nullable(Of Decimal) = Nothing
Dim y As System.Nullable(Of Decimal) = Nothing
y = 5
If Not Nullable.Equals(x, y) Then
Console.WriteLine("true")
Else
Console.WriteLine("false")
End If

Your VB code is simply incorrect - if you change the "x <> y" to "x = y" you will still have "false" as the result. The most common way of expression this for nullable instances is "Not x.Equals(y)", and this will yield the same behavior as "x != y" in C#.

Disadvantages of using "variable !== FALSE" [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Reading a job ad, a requirement (sigh) was that the applicant should more or less hate the usage of variable !== FALSE. I, however, cannot see the reason of this, since I find it quite handy.
Say that a function (get_user( int user_id )) returns FALSE if it doesn't succeed (find the requested user), I can simply use:
user = get_user(823);
if(user !== FALSE) {
// User found
} else {
// No user found
}
I could, of course, simply use if(!user), however, I don't always find it suiting, especially when I have a few conditions to meet.
Are there any disadventages of writing code like this?
Clarification: This is a more global question, as the ad were against PHP usage of !== FALSE and C# usage of != FALSE.

You say get_user returns false when it doesn't succeed - and that's most of the problem. If a function call does not succeed, it should throw an exception, not return an answer that is not type safe ("false" as opposed to a User).

You could do on accident
if($variable = FALSE)
And this would change the value of $variable

In Javascript, the two following are equivalents:
user != FALSE
!user
However, this is slightly different:
user !== FALSE
as it is the negation of ===, which checks for both values and types to be the same.
Reference:
http://www.w3schools.com/js/js_comparisons.asp

If you have in Java
Boolean flag1 = new Boolean(true);
Boolean flag2 = new Boolean(false);
if (flag1 == Boolean.TRUE || flag1 == Boolean.FALSE) // is false.
if (flag2 == Boolean.TRUE || flag2 == Boolean.FALSE) // is false.
if (flag1 != Boolean.FALSE && flag2 != Boolean.FALSE) // is true
Using == FALSE or != FALSE may not work the way you think so while it is more verbose, it is also error prone.

From a dev point of view, it's better to design your methods that do not return different values that evaluate to FALSE. Eg your function should not return multiple possible values of 0 or NULL or FALSE or "" (empty string) or empty array() because this would require some unobvious logic to check the result after.
If we talk about PHP on the other hand it's good to know that some functions return both zero or false, eg strpos. You should not assume if (!strpos(...)) means the string is not contained in the other string, it could be on zero position, which evaluates to FALSE. Sometimes also a function might return FALSE on failure or empty array on success (that did not produce result that evaluates to TRUE).
So, point out there are cases this is neccessary or useful, but in general its bad practice to return multiple values that evaluate to FALSE, thus requiring to include the type in the comparisment operator.

What is asked for a requirement might be related to taste and to field of use.
In a strict logical system, one might be inclined to select ID values that represent objects (in any kind of store or memory) to not be 0 (and to be positive numbers only).
That leaves room for having 0 to represent a non-existing object (so called NULL object).
Also negative numbers could be used to signal meta information like errors. But a strict system should not use any negative numbers and pass meta-information on some context instead of the mainline of data-flow.
Probably this is what is wanted. In that case the value FALSE would not exist, because the whole system is about numbers only.
The benefit of such a system is that it works nearly in every programming language and across different paradigms.
Also commonly the number zero expresses a falsy condition, and non-zero numbers a truthy one.

Actually it depends on the language.
C
In the programm language C there is no bool and then TRUE or FALSE are defined values, mostly 1 for TRUE and 0 for FALSE. However, since this is just a sort of convention, some define FALSE as -1 and TRUE as 255. You see this results in strange behavior, therefor in C always check against a defined value and make sure those values match.
Also in C some people think the best way is to use 'FALSE == a' instead of 'a == FALSE'. The reason is that if one = is forgotten, the first (FALSE == a) gives a compilation error, and the second is treated as an assignment, which clearly is unintended.
C#
However, in C#, true and false are predefined and checking a variable against true and false is quite useless. a == true means exactly the same as just writing a.
Python
In Python however, there is a change. a == true means that the variable a should be a boolean, if you write just 'a' then it means it can also be 0, [], {} or any inited value.
A bit off-topic: I think if such detail is part of a 'requirement for a job', the job requirements would be like 10,000 pages.

Is relying on && short-circuiting safe in .NET?

Assume myObj is null. Is it safe to write this?
if(myObj != null && myObj.SomeString != null)
I know some languages won't execute the second expression because the && evaluates to false before the second part is executed.

Yes. In C# && and || are short-circuiting and thus evaluates the right side only if the left side doesn't already determine the result. The operators & and | on the other hand don't short-circuit and always evaluate both sides.
The spec says:
The && and || operators are called the conditional logical operators. They are also called the “shortcircuiting” logical operators.
...
The operation x && y corresponds to the operation x & y, except that y is evaluated only if x is true
...
The operation x && y is evaluated as (bool)x ? (bool)y : false. In other words, x is first evaluated and converted to type bool. Then, if x is true, y is evaluated and converted to type bool, and this becomes the result of the operation. Otherwise, the result of the operation is false.
(C# Language Specification Version 4.0 - 7.12 Conditional logical operators)
One interesting property of && and || is that they are short circuiting even if they don't operate on bools, but types where the user overloaded the operators & or | together with the true and false operator.
The operation x && y is evaluated as T.false((T)x) ? (T)x : T.&((T)x, y), where
T.false((T)x) is an invocation of the operator false declared in T, and T.&((T)x, y) is an invocation of the selected operator &. In addition, the value (T)x shall only be evaluated once.
In other words, x is first evaluated and converted to type T and operator false is invoked on the result to determine if x is definitely false.
Then, if x is definitely false, the result of the operation is the value previously computed for x converted to type T.
Otherwise, y is evaluated, and the selected operator & is invoked on the value previously computed for x converted to type T and the value computed for y to produce the result of the operation.
(C# Language Specification Version 4.0 - 7.12.2 User-defined conditional logical operators)

Yes, C# uses logical short-circuiting.
Note that although C# (and some other .NET languages) behave this way, it is a property of the language, not the CLR.

I know I'm late to the party, but in C# 6.0 you can do this too:
if(myObj?.SomeString != null)
Which is the same thing as above.
Also see:
What does question mark and dot operator ?. mean in C# 6.0?

Your code is safe - && and || are both short-circuited. You can use non-short-circuited operators & or |, which evaluate both ends, but I really don't see that in much production code.

sure, it's safe on C#, if the first operand is false then the second is never evaluated.

It is perfectly safe. C# is one of those languages.

In C#, && and || are short-circuited, meaning that the first condition is evaluated and the rest is ignored if the answer is determined.
In VB.NET, AndAlso and OrElse are also short-circuited.
In javaScript, && and || are short-circuited too.
I mention VB.NET to show that the ugly red-headed step-child of .net also has cool stuff too, sometimes.
I mention javaScript, because if you are doing web development then you probably might also use javaScript.

Yes, C# and most languages compute the if sentences from left to right.
VB6 by the way will compute the whole thing, and throw an exception if it's null...

an example is
if(strString != null && strString.Length > 0)
This line would cause a null exception if both sides executed.
Interesting side note. The above example is quite a bit faster than the IsNullorEmpty method.

In C++ and C# are multiple condition checks performed in a predetermined or random sequence?

Situation: condition check in C++ or C# with many criteria:
if (condition1 && condition2 && condition3)
{
// Do something
}
I've always believed the sequence in which these checks are performed is not guaranteed. So it is not necessarily first condition1 then condition2 and only then condition3. I learned it in my times with C++. I think I was told that or read it somewhere.
Up until know I've always written secure code to account for possible null pointers in the following situation:
if ((object != null) && (object.SomeFunc() != value))
{
// A bad way of checking (or so I thought)
}
So I was writing:
if (object != null)
{
if (object.SomeFunc() != value)
{
// A much better and safer way
}
}
Because I was not sure the not-null check will run first and only then the instance method will be called to perform the second check.
Now our greatest community minds are telling me the sequence in which these checks are performed is guaranteed to run in the left-to-right order.
I'm very surprised. Is it really so for both C++ and C# languages?
Has anybody else heard the version I heard before now?

Short Answer is left to right with short-circuit evaluation. The order is predictable.
// perfectly legal and quite a standard way to express in C++/C#
if( x != null && x.Count > 0 ) ...
Some languages evaluate everything in the condition before branching (VB6 for example).
// will fail in VB6 if x is Nothing.
If x Is Not Nothing And x.Count > 0 Then ...
Ref: MSDN C# Operators and their order or precedence.

They are defined to be evaluated from left-to-right, and to stop evaluating when one of them evaluates to false. That's true in both C++ and C#.

I don't think there is or has been any other way. That would be like the compiler deciding to run statements out of order for no reason. :) Now, some languages (like VB.NET) have different logical operators for short-circuiting and not short-circuiting. But, the order is always well defined at compile time.
Here is the operator precedence from the C# language spec. From the spec ...
Except for the assignment operators,
all binary operators are
left-associative, meaning that
operations are performed from left to
right. For example, x + y + z is
evaluated as (x + y) + z.

They have to be performed from left to right. This allows short circuit evaluation to work.
See the Wikipedia article for more information.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.