How is Nullable<T> different from a similar custom C# struct?

How is Nullable<T> different from a similar custom C# struct? - c#

In Nullable micro-optimizations, part one, Eric mentions that Nullable<T> has a strange boxing behaviour that could not be achieved by a similar user-defined type.
What are the special features that the C# language grants to the predefined Nullable<T> type? Especially the ones that could not be made to work on a MyNullable type.
Of course, Nullable<T> has special syntactic sugar T?, but my question is more about semantics.

What I was getting at is: there is no such thing as a boxed nullable. When you box an int, you get a reference to a boxed int. When you box an int?, you get either a null reference or a reference to a boxed int. You never get a boxed int?.
You can easily make your own Optional<T> struct, but you can't implement a struct that has that boxing behaviour. Nullable<T>'s special behaviour is baked into the runtime.
This fact leads to a number of oddities. For example:
C# 4: Dynamic and Nullable<>
C# Reflection: How to get the type of a Nullable<int>?
Cannot change type to nullable in generic method
And FYI there are other ways in which the Nullable<T> type is "magical". For instance, though it is a struct type, it does not satisfy the struct constraint. There's no way for you to make your own struct that has that property.

I found these two in the C# specifications:
The is operator works on T? as it would have on T, and the as operator can convert to nullable types.
Predefined and user-defined operators that operate on non-nullable value types are lifted to the nullable forms of those types.
Now, here are the features that I think are not limited to Nullable<T>:
The value in a switch can be of a nullable type. I don't think this counts, because switch also accepts user-defined implicit conversions that could be defined on a MyNullable type.
Nullable IDisposable types are supported, with a null check being inserted before the generated calls to Dispose(). I don't think this counts, because I could define MyNullable as a class and then it would be the same.
Here is what I am not sure about:
The specs mentions boxing/unboxing and implicit/explicit conversions, but I do not understand whether the same results can be achieved with a MyNullable.

C# lifts operators on nullable types. For example:
int? SumNullableInts(int? a, int? b)
{
return a + b;
}
You would have to do a lot of reflection work in MyNullable<T> to support that, and then the following would compile, where it shouldn't:
MyNullable<List<string>.Enumerator> SumNullableEnumerators(MyNullable<List<string>.Enumerator> a, MyNullable<List<string>.Enumerator> b)
{
return a + b;
}

Related

Is there a coalesce operator for parameters?

If F were "on" a, I could do this...
var obj = a?.F();
If F is not on a, I have to do this...
var obj = a == null ? null : MyFunc.F((A) a);
Or do I? Is there a more succinct way of skipping the method call if a parameter value is null?

The short answer is no, there's no succinct way to do that.
The slightly longer answer is still no, but there's an interesting language design point here. C# was designed by people with extremely good taste, if I say so myself, but it was designed over a very long period of time. Nowhere is this more obvious than with respect to its treatment of nullability.
In C# 1.0 we had a straightforward language in the tradition of C, C++, Java, JavaScript, and so on, where there are references and values, and references can be null. This has benefits; if it did not, Sir Tony would not have invented the null reference in the first place. But it has downsides: we have the possibility of null references, dereferencing null leads to program crashes, and there is an inconsistency between reference and value types: reference types have a natural "no value" value, and value types do not.
In C# 2.0 we added nullable value types, but nullable value types do not behave like nullable reference types. Of course nullable values types are not references, so you cannot "dereference" them, but if we squint a little, the .Value property looks a lot like "dereferencing", and it leads to a similar crash if the value is null. In that sense, they behave the same, but in other senses, they do not. Adding together two nullable integers does not crash if one of them is null; instead, the result is also null.
So at this point we have a contradiction built into the language:
Using a null value of a nullable value type usually automatically propagates the null, but using a null reference can crash.
And of course C# has then gone on to add a variety of features that make null references behave more like null values, like ?. and the related operations. There are also proposals for C# 8 that are very exciting, and will support "non nullable reference type" scenarios.
But the bolded text above is the fundamental problem you've pinpointed: the semantics of operators on nullable reference types are almost always "lift the non-nullable version of the operator to nullable types; if all the operands are non-null then the result is the same as the unlifted version; otherwise, the result is null". However, those semantics are not automatically extended to the . member access operator or the () call operator, regardless of whether the operands are nullable value types or nullable reference types. . can be lifted explicitly by ?. but the () operator does not get lifted to nullable semantics, ever.
Imagine a language like C# 1.0, but with Nullable<T> built in from the start, such that it applied to both reference and value types. In that world, you can see natural ways to implement generalized lifting, where if you have a method
class C { string M(double, int[]) }
and you call it with a Nullable<C> receiver, or Nullable<double> and Nullable<int[]> arguments, you automatically get the code that we build for you for nullable integer arithmetic: check whether the receiver or any arguments are null, and if they are, result in a null Nullable<string>. Otherwise, call the function normally and use the non-nullable result.
The C# compiler already implements these semantics for all user-defined operators declared on struct types; it would not be hardly any difficulty at all to extend those semantics to other kinds of methods. But we can't do it now; there are far too many backwards-compatibility issues to solve.
This design choice would also have the nice property that it would be a correct implementation of the "maybe monad".
But that's not the world we are in, and it's largely because these design considerations evolved over time, rather than being invented all at once. The next time you invent a new language, consider carefully how to represent nullability!

Do interface variables have value-type or reference-type semantics?

Do interface variables have value-type or reference-type semantics?
Interfaces are implemented by types, and those types are either value types or reference types. Obviously, both int and string implement IComparable, and int is a value type, and string is a reference type. But what about this:
IComparable x = 42;
IComparable y = "Hello, World!";
(The question I was trying to answer was presumably deleted because it asked whether interfaces are stored on the stack or the heap, and, as we should all be aware, it's more constructive to think of differences between value and reference types in terms of their semantics rather than their implementation. For a discussion, see Eric Lippert's The stack is an implementation detail.)

Usually, as per the existing answers, it is a reference-type and requires boxing; there is an exception though (isn't there always?). In a generic method with a where constraint, it can be both:
void Foo<T>(T obj) where T : ISomeInterface {
obj.SomeMethod();
}
This is a constrained operation, and is not boxed even if it is a value-type. This is achieved via constrained. Instead, the JIT performs the operation as virtual-call for reference-types, and static-call for value-types. No boxing.

This is about understanding boxing and unboxing of types. In your example, the int is boxed upon assignment and a reference to that "box" or object is what is assigned to x. The value type int is defined as a struct which implements IComparable. However, once you use that interface reference to refer to the value type int, it will be boxed and placed on the heap. That's how it works in practice. The fact that using an interface reference causes boxing to occur by definition makes this reference type semantics.
MSDN: Boxing and Unboxing

A variable or field whose type is IComparable is a reference-type variable or field, regardless of the type of the value assigned to that field. This means that x in the sample code is boxed.
A simple test will demonstrate that. The test is based on the fact that you can only unbox a value type to its original type (and the nullable version of that type):
[TestMethod, ExpectedException(typeof(InvalidCastException))]
public void BoxingTest()
{
IComparable i = 42;
byte b = (byte)i; //exception: not allowed to unbox an int to any type other than int
Assert.AreEqual(42, b);
Assert.Fail();
}
EDIT
On top of that, the C# specification specifically defines reference-type as comprising class types, interface types, array types and delegate types.
EDIT 2
As Marc Gravell points out in his answer, a generic type with an interface constraint is a different case. This doesn't cause boxing.

Variables of interface type will have always have either immutable semantics, mutable reference semantics, or "oddball" semantics (something other than normal reference or value semantics). If variable1 and variable2 are both declared as the same interface type, one performs variable2 = variable1, and one never again writes to either variable, the instance referred to by variable1 will always be indistinguishable from the one referred to be variable2 (since it will be the same instance).
Generic types with interface constraints may have immutable semantics, mutable reference semantics, or "quirky" semantics, but may also have mutable value semantics. This can be dangerous if the interface is not documented as having mutable value semantics. Unfortunately, there is no way to constrain an interface to have either immutable semantics or mutable value semantics (meaning that following variable2 = variable1, it should not be possible to change variable1 by writing variable2, nor vice versa). One could add a "struct" constraint along with the interface constraint, but that would exclude classes which have immutable semantics while not excluding structs that have reference semantics.

Why is Nullable type supported by CLR?

I heard that the addition of Nullable<T> type to C# 2.0 need to revise CLR(runtime) a little, is this change necessary? Could it make the same goal if a new Nullable<T> generic class was added only?

Nullable isn't a generic class as you indicate, Nullable<T> is generic (it has a type parameter, T). That's why Nullable<T> only arrived in C# 2.0: it came with the addition of generics to the CLR.
You could do the same thing with a general Nullable, but you couldn't do things like this:
int? myInt = 123;
int result = myInt.Value;
You would instead have to:
int result = (int)myInt.Value;
...and it might not be type-safe, by that I mean what if myInt.Value is a string? The generic version, Nullable<T>, would only allow an int into the Value property.
I don't completely understand what you're asking, though.. "why are generic classes useful"?

If I understand you correctly, you are asking why can't it just be Type introduced in the framework library? but instead the CLR need to be changed?
Well based on my understanding Nullable is a special type, not quite like other container type. First it is actually a value type - defined as struct, not a class. Separately it allows to be assigned a value of null (for value type, that is a special case), plus it support the use of '?' and operator '??' which is all new. The Nullable also become part of the Common type system. So I guess from these perspective, the specification, compiler and CLR will need to be changed.

There are only two reasons I know of for the special handling of Nullable<T>:
To allow a typecast from null Object reference to a Nullable<T> with HasValue = false.
To allow a Nullable<T> where HasValue is false, to compare equal to null.
Frankly, think it would have been better to let Nullable<T> box like any other value type, and define Nullable<T>.Empty as a value which may be compared against (for those cases where one might want to compare against a variable that might be null or might hold a value). To my mind, there's no reason why Object.Equals should report that an "int?" which is equal to null is equal to a "long?" which is also equal to null. The first should be viewed as an empty int-sized box, and the latter as an empty long-sized box.

What is the integer reference type in C#?

I'd like to have an integer variable which can be set to null and don't want to have to use the int? myVariable syntax. I tried using int and Int16 to no avail. Do I have to use int? myVariable?
I mentioned this because in Java there is both an 'int' type (a primitive) and 'Integer' (a reference type). I wanted to be sure that there isn't a built-in integer reference type that I could be using. I'll use 'int?' for what I'm doing.

For info, int? / Nullable<T> is not a reference-type; it is simply a "nullable type", meaning: a struct (essentially and int and a bool flag) with special compiler rules (re null checks, operators, etc) and CLI rules (for boxing/unboxing). There is no "integer reference-type" in .NET, unless you count boxing:
int i = 123;
object o = i; // box
but this creates an unnecessary object and has lots of associated other issues.
For what you want, int? should be ideal. You could use the long-hand syntax (Nullable<int>) but IMO this is unnecessarily verbose, and I've seen it confuse people.

Yes you should use nullable types
See Nullable
Nullable<int> Nullable<float>
or simply
int? float?
PS:
If you don't want to use ? notation or Nullable at all - simply use special structures for such a thing. For example DataTable:
var table = new DataTable();
table.Columns.Add('intCol',typeof(int));
var row = table.NewRow();
row['intCol'] = null; //

Nullable types are instances of the System.Nullable(T) struct.
Therefore int, int?, Nullable<int> are all value types, not reference types

Yes, it's the only way, as Int32/Int16/int is a primitive type (regardless of boxing/unboxing of these).

You will have to use a nullable type, i.e:
Nullable<int> or int?
The purpose of nullable type is, to enable you to assign null to a value type.
From MSDN:
Nullable types address the scenario where you want to be able to have a primitive type with a null (or unknown) value. This is common in database scenarios, but is also useful in other situations.
In the past, there were several ways of doing this:
A boxed value type. This is not strongly-typed at compile-time, and involves doing a heap allocation for every type.
A class wrapper for the value type. This is strongly-typed, but still involves a heap allocation, and the you have to write the wrapper.
A struct wrapper that supports the concept of nullability. This is a good solution, but you have to write it yourself.

You could use System.Nullable<int> if you hate ? so much.

Yes. That's why Microsoft added that. People were creating their own Nullable class so they included this pattern in .NET.

The <var>? syntax makes it nullable. If you dig a bit deeper, you'll see it's just a generic object that gets used.
If you want to do it like C++ you must use unsafe operations. Then, pointers like in C are available, but your code doesn't conform to CLR...
For using pure .NET without pointers, the ? is the way to go.

Nullable<T> confusion

Why is the following forbidden?
Nullable<Nullable<int>>
whereas
struct MyNullable <T>
{
}
MyNullable<Nullable<int>>
is NOT

This is because the struct constraint actually means 'not nullable' since Nullable, despite being a struct, is nullable (can accept the value null) the Nullable<int> is not a valid type parameter to the outer Nullable.
This is made explicit in the constraints documentation
where T: struct
The type argument must be a value type. Any value type except Nullable can be specified.
See Using Nullable Types (C# Programming Guide) for more information.
If you want the rationale for that you would need the actual language designer's comments on it which I can't find. However I would postulate that:
the compiler and platform changes required to achieve Nullable in it's current form are quite extensive (and were a relatively last minute addition to the 2.0 release).
They have several potentially confusing edge cases.
Allowing the equivalent of int?? would only confuse that since the language provides no way of distinguishing Nullable<Nullable<null>> and Nullable<null> nor any obvious solution to the following.
Nullable<Nullable<int>> x = null;
Nullable<int> y = null;
Console.WriteLine(x == null); // true
Console.WriteLine(y == null); // true
Console.WriteLine(x == y); // false or a compile time error!
Making that return true would be very complex and significant overhead on many operations involving the Nullable type.
Some types in the CLR are 'special', examples are strings and primitives in that the compiler and runtime know a lot about the implementation used by each other. Nullable is special in this way as well. Since it is already special cased in other areas special casing the where T : struct aspect is not such a big deal. The benefit of this is in dealing with structs in generic classes because none of them, apart from Nullable, can be compared against null. This means the jit can safely consider t == null to be false always.
Where languages are designed to allow two very different concepts to interact you tend to get weird, confusing or down right dangerous edge cases. As an example consider Nullable and the equality operators
int? x = null;
int? y = null;
Console.WriteLine(x == y); // true
Console.WriteLine(x >= y); // false!
By preventing Nullables when using struct generic constraint many nasty (and unclear) edge cases can be avoided.
As to the exact part of the specification that mandates this from section 25.7 (emphasis mine):
The value type constraint specifies that a type argument used for the type parameter
must be a value type (§25.7.1). Any non-nullable struct type, enum type, or type
parameter having the value type constraint satisfies this constraint. A type parameter
having the value type constraint shall not also have the constructor-constraint.
The System.Nullable type specifies the non-nullable value type constraint for T.
Thus, recursively constructed types of the forms T?? and Nullable<Nullable<T>> are prohibited.

I believe you can only use non-nullable value types in Nullables. Since Nullable itself is nullable, nesting this way is prohibited.
From http://msdn.microsoft.com/en-us/library/kwxxazwb.aspx
public Nullable(
T value
)
Type: T A value type.

Nullable is special because there's explicit support for boxing and unboxing of Nullable types built into the CLR:
If you use the MSIL box instruction against a Nullable<T>, you will actually get a null as the result. There's no other value type which will produce a null when boxed.
There's similar and symmetrical support for unboxing.

The generic type parameter for Nullable must itself be a non-nullable type (i.e. value type). This is the C# compiler warning I get, and it would seem to make sense. Tell me, why would you want to do such a thing anyway? I personally can see no use, and little meaning to such a declaration even.

Nullable allows you to take a value type and make it like a reference type, in the sense that the value either exists or not (is null). Since a reference type is already nullable it is not allowed.
Quoted from MSDN:
The Nullable(T) structure supports
using only a value type as a nullable
type because reference types are
nullable by design.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.