I'm trying to collect all of the situations in which boxing occurs in C#:
Converting value type to System.Object type:
struct S { }
object box = new S();
Converting value type to System.ValueType type:
struct S { }
System.ValueType box = new S();
Converting value of enumeration type to System.Enum type:
enum E { A }
System.Enum box = E.A;
Converting value type into interface reference:
interface I { }
struct S : I { }
I box = new S();
Using value types in C# string concatenation:
char c = F();
string s1 = "char value will box" + c;
note: constants of char type are concatenated at compile time
note: since version 6.0 C# compiler optimizes concatenation involving bool, char, IntPtr, UIntPtr types
Creating delegate from value type instance method:
struct S { public void M() {} }
Action box = new S().M;
Calling non-overridden virtual methods on value types:
enum E { A }
E.A.GetHashCode();
Using C# 7.0 constant patterns under is expression:
int x = …;
if (x is 42) { … } // boxes both 'x' and '42'!
Boxing in C# tuple types conversions:
(int, byte) _tuple;
public (object, object) M() {
return _tuple; // 2x boxing
}
Optional parameters of object type with value type default values:
void M([Optional, DefaultParameterValue(42)] object o);
M(); // boxing at call-site
Checking value of unconstrained generic type for null:
bool M<T>(T t) => t != null;
string M<T>(T t) => t?.ToString(); // ?. checks for null
M(42);
note: this may be optimized by JIT in some .NET runtimes
Type testing value of unconstrained or struct generic type with is/as operators:
bool M<T>(T t) => t is int;
int? M<T>(T t) => t as int?;
IEquatable<T> M<T>(T t) => t as IEquatable<T>;
M(42);
note: this may be optimized by JIT in some .NET runtimes
Are there any more situations of boxing, maybe hidden, that you know of?
That’s a great question!
Boxing occurs for exactly one reason: when we need a reference to a value type. Everything you listed falls into this rule.
For example since object is a reference type, casting a value type to object requires a reference to a value type, which causes boxing.
If you wish to list every possible scenario, you should also include derivatives, such as returning a value type from a method that returns object or an interface type, because this automatically casts the value type to the object / interface.
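One way to make the "reference to a value type" rule visible is to compare references after the conversion. The following is only a minimal sketch; `BoxIdentityDemo` and its method names are made up for illustration and do not come from the question.

```csharp
using System;

static class BoxIdentityDemo
{
    // Each argument is converted to object on its own, so two separate boxes are allocated.
    public static bool SameBoxTwice()
    {
        int value = 42;
        return object.ReferenceEquals(value, value); // false
    }

    // Copying an existing box copies the reference; no new box is created.
    public static bool SameReferenceReused()
    {
        int value = 42;
        object boxed = value;  // one box
        object alias = boxed;  // reference copy only
        return object.ReferenceEquals(boxed, alias); // true
    }

    // The box holds a copy, so later changes to the variable do not reach it.
    public static bool BoxIsACopy()
    {
        int value = 42;
        object boxed = value;
        value = 43;
        return (int)boxed == 42; // true
    }
}
```

The same effect applies to the "derivative" cases (return values, arguments): every implicit conversion to object or an interface type produces a fresh box.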
By the way, the string concatenation case you astutely identified also derives from casting to object. The + operator is translated by the compiler to a call to the Concat method of string, which accepts an object for the value type you pass, so casting to object and hence boxing occurs.
Over the years I’ve always advised developers to remember the single reason for boxing (stated above) instead of memorizing every single case, because the list is long and hard to remember. This also promotes understanding of what IL code the compiler generates for our C# code (for example, + on string yields a call to String.Concat). When you’re in doubt about what the compiler generates and whether boxing occurs, you can use IL Disassembler (ILDASM.exe). Typically you should look for the box opcode (there is just one case where boxing might occur even though the IL doesn't include the box opcode; more detail below).
But I do agree that some boxing occurrences are less obvious. You listed one of them: calling a non-overridden method of a value type. In fact, this is less obvious for another reason: when you check the IL code you don’t see the box opcode, but the constrained prefix (constrained.), so even in the IL it’s not obvious that boxing happens! I won't get into the exact details of why, to keep this answer from becoming even longer...
Another case for less obvious boxing is when calling a base class method from a struct. Example:
struct MyValType
{
public override string ToString()
{
return base.ToString();
}
}
Here ToString is overridden, so calling ToString on MyValType won’t generate boxing. However, the implementation calls the base ToString and that causes boxing (check the IL!).
By the way, these two non-obvious boxing scenarios also derive from the single rule above. When a method is invoked on the base class of a value type, there must be something for the this keyword to refer to. Since the base class of a value type is (always) a reference type, the this keyword must refer to a reference type, and so we need a reference to a value type and so boxing occurs due to the single rule.
Here is a direct link to the section of my online .NET course that discusses boxing in detail: http://motti.me/mq
If you are only interested in more advanced boxing scenarios here is a direct link there (though the link above will take you there as well once it discusses the more basic stuff): http://motti.me/mu
I hope this helps!
Motti
Calling non-virtual GetType() method on value type:
struct S { };
S s = new S();
s.GetType();
Mentioned in Motti's answer, just illustrating with code samples:
Parameters involved
public void Bla(object obj)
{
}
Bla(valueType)
public void Bla(IBla i) //where IBla is interface
{
}
Bla(valueType)
But this is safe:
public void Bla<T>(T obj) where T : IBla
{
}
Bla(valueType)
Return type
public object Bla()
{
return 1;
}
public IBla Bla() // IBla is an interface that the returned value type implements
{
return valueType;
}
Checking unconstrained T against null
public void Bla<T>(T obj)
{
if (obj == null) { } // boxes
}
Use of dynamic
dynamic x = 42; // boxes
Another one
enumValue.HasFlag
Using the non-generic collections in System.Collections such as
ArrayList or Hashtable.
Granted these are specific instances of your first case, but they can be hidden gotchas. It's amazing how much code I still come across today that uses these instead of List<T> and Dictionary<TKey,TValue>.
Adding any value type value into the ArrayList causes boxing:
ArrayList items = ...
items.Add(1); // boxing to object
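To illustrate the difference, here is a rough sketch comparing the non-generic and generic collections. The class and method names are invented for this example; only `ArrayList` and `List<int>` are real types.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class CollectionBoxingDemo
{
    // ArrayList stores object, so every int is boxed on Add and unboxed on read.
    public static int SumUntyped()
    {
        var items = new ArrayList();
        items.Add(1); // boxed
        items.Add(2); // boxed
        int sum = 0;
        foreach (object item in items)
        {
            sum += (int)item; // unboxed
        }
        return sum;
    }

    // List<int> stores the ints directly; no boxing is involved.
    public static int SumTyped()
    {
        var items = new List<int> { 1, 2 };
        int sum = 0;
        foreach (int item in items)
        {
            sum += item;
        }
        return sum;
    }

    // A boxed int can only be unboxed to int, even though int converts to long implicitly.
    public static bool UnboxToWrongTypeThrows()
    {
        var items = new ArrayList { 1 };
        try
        {
            _ = (long)items[0];
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

The last method shows a second gotcha that comes with the boxing: the cast back must name the exact value type that was boxed.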
Please have a look at the following code. Why do I get a compile-error?
I don't get it!
Casting is a way of telling the compiler that I know more about the objects than it does. And in this case, I know for a fact that "x" does actually contain an instance of "SomeClass". But the compiler seems to be unwilling to accept that information.
https://dotnetfiddle.net/0DlmXf
public class StrangeConversion
{
public class SomeClass { }
public interface ISomeInterface { }
public class Implementation : SomeClass, ISomeInterface { }
public void Foo<T>() where T : class
{
T x = (T)Factory();
//Compile-error: Cannot convert type 'T' to 'SomeClass'
SomeClass a = (SomeClass)x;
//This is perfectly fine:
SomeClass b = (SomeClass)(object)x;
if (x is SomeClass c)
{
//This works as well and 'c' contains the reference.
}
}
private object Factory()
{
return new Implementation();
}
}
Edit:
@Charles Mager has the correct answer in the comment: There does not seem to be a valid reason. The language designers just didn't want to allow this cast.
I fixed it by using an as cast, e.g.
SomeClass a = x as SomeClass;
This answer explains it very well: https://stackoverflow.com/a/132467/16690008
Essentially, it's because the cast would throw an exception if T is not of that class's type.
It's hard to make sense of exactly what you're trying to achieve, but it seems like a generic constraint is what you're after
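public void Foo<T>()
where T : SomeClass, new() // constrain T to inherit from SomeClass and to have a parameterless constructor
{
T x = Factory<T>(); // x is guaranteed to be a SomeClass
}
private T Factory<T>()
where T : SomeClass, new() // same here
{
return new T(); // returning 'new Implementation()' would not compile: an Implementation is not necessarily a T
}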
You constrain the generic to only reference types by specifying where T : class, but the compiler needs to know with certainty whether the cast is possible. This means you are asking the compiler to trust that any reference type you pass can be cast to SomeClass, which is something it won't do. The Microsoft docs state the following for the class generic type constraint:
The type argument must be a reference type. This constraint applies also to any class, interface, delegate, or array type.
It's important to note that SomeClass b = (SomeClass)(object)x; works because of the cast to object, which is the root of the object hierarchy. But as you can see from the list of supported reference types, SomeClass a = (SomeClass)x; has to account for things such as delegates, array types, etc., at which point the compiler reports the error.
Don't do SomeClass b = (SomeClass)(object)x;. It is much cleaner to make proper use of type constraints along with the as and is operators, which were designed for this exact purpose of type checking and safe casting.
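As a rough sketch of those options side by side (the `CastDemo` class and its method names are hypothetical; `SomeClass` and `Implementation` mirror the question's types):

```csharp
using System;

public class SomeClass { }
public class Implementation : SomeClass { }

static class CastDemo
{
    // 'as' yields null when x is not a SomeClass.
    public static SomeClass ViaAs<T>(T x) where T : class
        => x as SomeClass;

    // Pattern matching: same outcome, with the typed variable scoped to the match.
    public static SomeClass ViaPattern<T>(T x) where T : class
        => x is SomeClass c ? c : null;

    // The double cast compiles, but failure only shows up at run time.
    public static SomeClass ViaObject<T>(T x) where T : class
        => (SomeClass)(object)x;

    // Helper for demonstration: reports whether the double cast throws for a string.
    public static bool ObjectCastThrowsForString()
    {
        try
        {
            ViaObject("not a SomeClass");
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

With `as` or `is`, a mismatch is an ordinary null or false that the caller can handle; with the cast through object, it becomes an InvalidCastException.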
Short answer:
This behaviour is correct according to the spec. The spec is just bad here since this might convert a compile-error into a runtime-error.
Long answer:
I did some more research on the matter. This is an oversight in the language's spec.
C# uses the same syntax for two totally different things:
int i = (int)1.9;
This converts the double 1.9 to an integer. The value is actually changed.
object o = "abc";
string s = (string) o;
This looks the same, but does not change the object referenced by "o" at all. It does only convert the type of the reference.
When it comes to generics, this kind of ambiguity is an issue:
void F<T>(T x) {
var s = (string)x; // not allowed
}
What should the language do if T is "int"?
That's why the spec forces the developer to cast to object first:
void F<T>(T x) {
var s = (string)(object)x;
}
Now the behaviour is clear: x might still be a value type. But if it is, it will be converted to a reference type (boxed) first.
This ambiguity does not exist in my example, since T is guaranteed to be a reference type:
public void Foo<T>() where T : class
Thus the cast to object is not necessary. It could even be harmful if the "where" specifies an actual type. In that case, the forced cast to object might convert a compile-time-error (impossible cast) to a runtime-error.
Unfortunately, the people who wrote the spec did not see this issue and did not account for it.
Take this code:
interface ISomeInterface
{
int SomeProperty { get; }
}
struct SomeStruct : ISomeInterface
{
int someValue;
public int SomeProperty { get { return someValue; } }
public SomeStruct(int value)
{
someValue = value;
}
}
and then I do this somewhere:
ISomeInterface someVariable = new SomeStruct(2);
is the SomeStruct boxed in this case?
Jon's point is true, but as a side note there is one slight exception to the rule; generics. If you have where T : ISomeInterface, then this is constrained, and uses a special opcode. This means the interface can be used without boxing. For example:
public static void Foo<T>(T obj) where T : ISomeInterface {
obj.Bar(); // Bar defined on ISomeInterface
}
This does not involve boxing, even for value-type T. However, if (in the same Foo) you do:
ISomeInterface asInterface = obj;
asInterface.Bar();
then that boxes as before. The constrained only applies directly to T.
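The difference can be made observable with a mutable struct. This is only a sketch under assumed names: `ICounter`, `Counter` and `ConstrainedCallDemo` are invented for the illustration and stand in for ISomeInterface and its implementer.

```csharp
using System;

interface ICounter
{
    void Increment();
    int Count { get; }
}

struct Counter : ICounter
{
    private int _count;
    public int Count => _count;
    public void Increment() => _count++;
}

static class ConstrainedCallDemo
{
    // Constrained generic call: works on the caller's struct in place, no box.
    public static void IncrementGeneric<T>(ref T counter) where T : ICounter
        => counter.Increment();

    // Interface-typed parameter: the struct is boxed at the call site,
    // so Increment() changes the boxed copy, not the caller's variable.
    public static void IncrementBoxed(ICounter counter)
        => counter.Increment();

    public static (int afterGeneric, int afterBoxed) Run()
    {
        var c = new Counter();
        IncrementGeneric(ref c);
        int afterGeneric = c.Count; // 1
        IncrementBoxed(c);
        int afterBoxed = c.Count;   // still 1
        return (afterGeneric, afterBoxed);
    }
}
```

The second call silently operates on a heap copy, which is one reason mutable structs behind interfaces are easy to get wrong.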
Yes, it is. Basically whenever you need a reference and you've only got a value type value, the value is boxed.
Here, ISomeInterface is an interface, which is a reference type. Therefore the value of someVariable is always a reference, so the newly created struct value has to be boxed.
I'm adding this to hopefully shed a little more light on the answers offered by Jon and Marc.
Consider this non-generic method:
public static void SetToNull(ref ISomeInterface obj) {
obj = null;
}
Hmm... setting a ref parameter to null. That's only possible for a reference type, correct? (Well, or for a Nullable<T>; but let's ignore that case to keep things simple.) So the fact that this method compiles tells us that a variable declared to be of some interface type must be treated as a reference type.
The key phrase here is "declared as": consider this attempt to call the above method:
var x = new SomeStruct();
// This line does not compile:
// "Cannot convert from ref SomeStruct to ref ISomeInterface" --
// since x is declared to be of type SomeStruct, it cannot be passed
// to a method that wants a parameter of type ref ISomeInterface.
SetToNull(ref x);
Granted, the reason you can't pass x in the above code to SetToNull is that x would need to be declared as an ISomeInterface for you to be able to pass ref x -- and not because the compiler magically knows that SetToNull includes the line obj = null. But in a way that just reinforces my point: the obj = null line is legal precisely because it would be illegal to pass a variable not declared as an ISomeInterface to the method.
In other words, if a variable is declared as an ISomeInterface, it can be set to null, pure and simple. And that's because interfaces are reference types. Hence, declaring a variable as an interface and assigning a value type object to it boxes that value.
Now, on the other hand, consider this hypothetical generic method:
// This method does not compile:
// "Cannot convert null to type parameter 'T' because it could be
// a non-nullable value type. Consider using 'default(T)' instead." --
// since this method could take a variable declared as, e.g., a SomeStruct,
// the compiler cannot assume a null assignment is legal.
public static void SetToNull<T>(ref T obj) where T : ISomeInterface {
obj = null;
}
The MSDN documentation tells us that structs are value, not reference types. They are boxed when converting to/from a variable of type object. But the central question here is: what about a variable of an interface type? Since the interface can also be implemented by a class, then this must be tantamount to converting from a value to a reference type, as Jon Skeet already said, therefore yes boxing would occur. More discussion on an msdn blog.
When I run the following:
ParentClass foo = serializer.Deserialize(xmlReader) as ParentClass;
The XML document loaded in xmlReader is of a type inherited from ParentClass. When examined in the debugger, foo is showing as being an instance of the inherited class, not the parent class. Of course, the inherited class is also of type ParentClass, but why does the as keyword have this behavior? Why doesn't C# strip out all the other object information not required to convert to ParentClass?
This is not a problem, but more or less a question out of curiosity.
The object itself is not modified, which is why the object's type is still displayed as the inherited class in the debugger.
Consider the following example, which I think is illustrative. What do you think is output to the console here?
class Program
{
public class ParentClass
{
public virtual void foo()
{
Console.WriteLine("parent.foo");
}
public virtual void bar()
{
Console.WriteLine("parent.bar");
}
}
public class InheritedClass : ParentClass
{
public new void foo()
{
Console.WriteLine("inherited.foo");
}
public override void bar()
{
Console.WriteLine("inherited.bar");
}
}
static void Main(string[] args)
{
var inherited = new InheritedClass();
var parent = inherited as ParentClass;
var d = parent as dynamic;
parent.foo();
inherited.foo();
d.foo();
parent.bar();
inherited.bar();
d.bar();
Console.Read();
}
}
Only one object is created, and then two more references to it are created: one with the static type "ParentClass", and one with the "dynamic" type. That all references refer to the same object is demonstrated by the fact that invoking "bar" invokes "InheritedClass.bar" regardless of the static type (the runtime type is always the same).
However, notice the difference between using "override" and "new": you will see that "parent.foo()" invokes the "ParentClass.foo" method. That is because the "parent" variable has the static type "ParentClass", and so the C# compiler emits IL instructions to call the method on "ParentClass". You can see further that the "dynamic" reference still calls "InheritedClass.foo", because dynamic member lookup is resolved at runtime, against the actual runtime type, which is "InheritedClass".
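For reference, here is a condensed sketch of the same experiment that returns the results instead of printing them. The method names are capitalized and `DispatchDemo` is a name introduced just for this sketch; the behaviour shown is the one described above.

```csharp
using System;
using System.Collections.Generic;

public class ParentClass
{
    public virtual string Foo() => "parent.foo";
    public virtual string Bar() => "parent.bar";
}

public class InheritedClass : ParentClass
{
    public new string Foo() => "inherited.foo";      // hides ParentClass.Foo
    public override string Bar() => "inherited.bar"; // overrides ParentClass.Bar
}

public static class DispatchDemo
{
    public static List<string> Run()
    {
        var inherited = new InheritedClass();
        ParentClass parent = inherited; // same object, different static type
        dynamic d = parent;             // same object, lookup deferred to run time

        return new List<string>
        {
            parent.Foo(),       // "parent.foo": bound to the static type
            inherited.Foo(),    // "inherited.foo"
            (string)d.Foo(),    // "inherited.foo": bound to the runtime type
            parent.Bar(),       // "inherited.bar": virtual dispatch
            inherited.Bar(),    // "inherited.bar"
            (string)d.Bar(),    // "inherited.bar"
        };
    }

    // All three variables point at one and the same object.
    public static bool SameObject()
    {
        var inherited = new InheritedClass();
        ParentClass parent = inherited;
        return object.ReferenceEquals(parent, inherited); // true
    }
}
```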
Edit: @InBetween pointed out an important distinction that I didn't consider. In the case of casting from a value type to a reference type (or vice versa), a new object is actually created, as new memory must be allocated on the heap or stack respectively (the "boxing" process). Of course, partly for this reason, structs and other value types cannot declare virtual methods.
as can only perform reference conversions, nullable conversions, boxing conversions, and unboxing conversions. It won't perform any other type of conversion, such as user-defined conversions.
In your case it's performing a compatible reference conversion; the object remains the same, you are only changing the reference.
But as can "modify" an object, in the sense I think you mean, when, for example, it boxes, which entails more than simply converting the reference.
var o = 1 as object;
o is an altogether different object than the integer 1.
It is important to note, though, that in any successful as conversion GetType() will still return the original type of the object, which is not the general behavior of the cast operator.
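A small sketch of these `as` behaviours, with `AsDemo` and its method names invented for the example:

```csharp
using System;

static class AsDemo
{
    // Boxing through 'as': the box still reports the original type.
    public static Type BoxedType()
    {
        object o = 1 as object;
        return o.GetType(); // typeof(int)
    }

    // Unboxing through 'as' needs a nullable target so that failure can be null.
    public static int? UnboxMatching()
    {
        object o = 1;
        return o as int?; // 1
    }

    // No numeric conversion is attempted: a boxed int is not a long.
    public static long? UnboxMismatch()
    {
        object o = 1;
        return o as long?; // null
    }
}
```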
Is there anything wrong with using an implicit operator like the following:
//linqpad c# program example
void Main()
{
var testObject = new MyClass<int>() { Value = 1 };
var add = 10 + testObject; //implicit conversion to int here
add.Dump(); // 11
}
class MyClass<T>
{
public T Value { get; set; }
public static implicit operator T (MyClass<T> myClassToConvert)
{
return myClassToConvert.Value;
}
}
I was thinking I could treat an instance of the object as a value type this way, but since I've never seen an example of this, I thought maybe there was a reason not to do something like this that someone could point out?
In my actual code I was thinking of doing this as part of a data abstraction layer, so that I could return objects with information describing the underlying data, but allow the logic code to treat it as a value type when all it needs to know about is the value, and at the same time keep it all nice and type safe with the generics.
If all of the following are true:
all possible values of your MyClass<T> type (including null if it’s not a value type!) map to a valid value of T
the implicit operator never throws (not even for null!)
the implicit conversion makes semantic sense and is not confusing to the client programmer
then there is nothing wrong with this. Of course you could violate any of these three conditions, but it would be bad design. In particular, an implicit operator that throws can be very hard to debug because the place where it is called doesn’t say that it is being called.
For example, consider that T? has no implicit conversion to T (where T is, of course, a value type). If there was such an implicit operator, it would have to throw when the T? is null, as there is no obvious value to convert null to that would make sense for any value type T.
Let me give an example where I had trouble debugging an issue where the implicit operator threw:
public string Foo()
{
return some_condition ? GetSomething() : null;
}
Here, GetSomething returned something of a type I wrote which has a user-defined implicit conversion to string. I made absolutely sure that GetSomething could never return null, and yet I got a NullReferenceException! Why? Because the above code is not equivalent to
return some_condition ? (string)GetSomething() : (string)null;
but to
return (string)(some_condition ? GetSomething() : (Something)null);
Now you can see where the null came from!
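The situation can be reproduced with a small sketch. `Something` here is a stand-in for the type mentioned above, and the field, the `ConditionalDemo` class and the helper method are assumptions made for the illustration.

```csharp
using System;

sealed class Something
{
    private readonly string _text;
    public Something(string text) { _text = text; }

    // Dereferences its argument, so it throws NullReferenceException for null.
    public static implicit operator string(Something s) => s._text;
}

static class ConditionalDemo
{
    static Something GetSomething() => new Something("value");

    // The conditional has type Something; the conversion runs on its result,
    // including the null branch.
    public static string Foo(bool someCondition)
        => someCondition ? GetSomething() : null;

    // Converting inside the branch keeps the null away from the operator.
    public static string FooSafe(bool someCondition)
        => someCondition ? (string)GetSomething() : null;

    // Helper for demonstration: reports whether Foo(false) throws.
    public static bool FooThrowsOnFalse()
    {
        try
        {
            Foo(false);
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }
}
```

Making the operator null-tolerant (returning null for a null argument) would be another way out, in line with the second condition in the list above.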
That's a great pattern. Just keep in mind that in order to use it as a variable of type T, you have to either explicitly cast it to T, or assign it to a variable of type T. The cast will take place automatically in method calls and other things (such as your addition example) that take a T.
I want to make a method:
object Execute()
{
return type.InvokeMember(..);
}
to accept a generic parameter:
T Execute<T>()
{
return Execute() as T;
/* doesn't work:
The type parameter 'T' cannot be used with the 'as' operator because
it does not have a class type constraint nor a 'class' constraint */
// also neither typeof(T), nor T.GetType() are possible
return (T) Execute(); // ok
}
But I think operator as will be very useful: if result type isn't T method will return null, instead of an exception! Is it possible to do?
You need to add
where T : class
to your method declaration, e.g.
T Execute<T>() where T : class
{
By the way, as a suggestion, that generic wrapper doesn't really add much value. The caller can write:
MyClass c = whatever.Execute() as MyClass;
Or if they want to throw on fail:
MyClass c = (MyClass)whatever.Execute();
The generic wrapper method looks like this:
MyClass c = whatever.Execute<MyClass>();
All three versions have to specify exactly the same three entities, just in different orders, so none are any simpler or any more convenient, and yet the generic version hides what is happening, whereas the "raw" versions each make it clear whether there will be a throw or a null.
(This may be irrelevant to you if your example is simplified from your actual code).
You cannot use the as operator with a generic type with no restriction. Since the as operator uses null to represent that it was not of the type, you cannot use it on value types. If you want to use obj as T, T will have to be a reference type.
T Execute<T>() where T : class
{
return Execute() as T;
}
This small piece of code is an exception safe substitution for the as-keyword:
return Execute() is T value ? value : default(T);
It uses the pattern matching feature introduced with C# 7.
Use it, if you don't want to restrict the generic parameter to a reference type
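Wrapped in a method, that could look roughly like the sketch below. `ExecuteDemo`, `ExecuteAs` and the pass-through `Execute(object)` are hypothetical stand-ins, since the real Execute() from the question is not shown.

```csharp
using System;

static class ExecuteDemo
{
    // Hypothetical stand-in for the question's Execute(): returns what it is given.
    static object Execute(object raw) => raw;

    // Works for reference and value types alike; a mismatch yields default(T).
    public static T ExecuteAs<T>(object raw)
        => Execute(raw) is T value ? value : default(T);
}
```

One caveat worth keeping in mind: for value types default(T) is a legitimate value (0 for int), so the caller cannot tell "wrong type" apart from "result was 0".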
It seems like you are just adding a wrapper method for casting to the type the user wants, thus only adding overhead to the execution. For the user, writing
int result = Execute<int>();
isn't much different from
int result = (int)Execute();
You can use the out modifier to write the result into a variable in the caller's scope, and return a boolean flag to tell whether it succeeded:
bool Execute<T>(out T result) where T : class
{
result = Execute() as T;
return result != null;
}
Is there a chance that Execute() might return a value type? If so, then you need Earwicker's method for class types, and another generic method for value types. Might look like this:
Nullable<T> ExecuteForValueType<T>() where T : struct
The logic inside that method would say
object rawResult = Execute();
Then, you'd have to get the type of rawResult and see if it can be assigned to T:
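Nullable<T> finalReturnValue = null;
if (rawResult != null) // guard: calling GetType() on null would throw
{
Type theType = rawResult.GetType();
Type tType = typeof(T);
if (tType.IsAssignableFrom(theType))
{
finalReturnValue = (T)rawResult; // unbox the result itself, not the Type object
}
}
return finalReturnValue;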
Finally, make your original Execute method figure out which kind of T it has (class or struct type), and call the appropriate implementation.
Note: This is from rough memory. I did this about a year ago and probably don't remember every detail. Still, I hope pointing you in the general direction helps.