Unboxing Values Types from Objects

Unboxing Values Types from Objects - c#

I've been trying to understand this paragraph, but somehow I couldn't virtualize it in my mind, some one please elaborate it little bit:
Unboxing is not the exact opposite of boxing. The unboxing operation
is much less costly than boxing. Unboxing is really just the operation
of obtaining a pointer to the raw value type (data fields) contained
within an object. In effect, the pointer refers to the unboxed portion
in the boxed instance. So, unlike boxing, unboxing doesn't involve the
copying of any bytes in memory. Having made this important
clarification, it is important to note that an unboxing operation is
typically followed by copying the fields.
Richter, Jeffrey (2010-02-05). CLR via C# (Kindle Locations
4167-4171). OReilly Media - A. Kindle Edition.

In order to box an int you need to create an object on the heap large enough to hold all of the data that the struct holds. Allocating a new object on the heap means work for the GC to find a spot, and work for the GC to clean it up/move it around during and after its lifetime. These operations, while not super expensive, aren't cheap either.
To unbox a value type all you're doing is de-reference the pointer, so to speak. You simply need to look at the reference (which is what the object you have is) to find the location of the actual values. Looking up a value in memory is very cheap, which is why that paragraph is saying 'unboxing' is cheap.
Update:
While an unboxed value type will usually be copied to some other location right after being unboxed, that isn't always the case. Consider the following example:
public struct MyStruct
{
private int value = 42;
public void Foo()
{
Console.WriteLine(value);
}
}
static void Main()
{
object obj = new MyStruct();
((MyStruct)obj).Foo();
}
The MyStruct is boxed into obj but when it's unboxed it's never copied anywhere, a method is simply invoked on it. LIkewise you could pull a property/field out of the struct and copy just that part of it without needing to copy the whole thing. This might look a bit contrived, but it's still not entirely absurd. That said, as your quote implies, it's still likely to copy the struct after you unbox it.

Related

Generic base class implementation for both struct and object [duplicate]

I know that structs in .NET do not support inheritance, but its not exactly clear why they are limited in this way.
What technical reason prevents structs from inheriting from other structs?

The reason value types can't support inheritance is because of arrays.
The problem is that, for performance and GC reasons, arrays of value types are stored "inline". For example, given new FooType[10] {...}, if FooType is a reference type, 11 objects will be created on the managed heap (one for the array, and 10 for each type instance). If FooType is instead a value type, only one instance will be created on the managed heap -- for the array itself (as each array value will be stored "inline" with the array).
Now, suppose we had inheritance with value types. When combined with the above "inline storage" behavior of arrays, Bad Things happen, as can be seen in C++.
Consider this pseudo-C# code:
struct Base
{
public int A;
}
struct Derived : Base
{
public int B;
}
void Square(Base[] values)
{
for (int i = 0; i < values.Length; ++i)
values [i].A *= 2;
}
Derived[] v = new Derived[2];
Square (v);
By normal conversion rules, a Derived[] is convertible to a Base[] (for better or worse), so if you s/struct/class/g for the above example, it'll compile and run as expected, with no problems. But if Base and Derived are value types, and arrays store values inline, then we have a problem.
We have a problem because Square() doesn't know anything about Derived, it'll use only pointer arithmetic to access each element of the array, incrementing by a constant amount (sizeof(A)). The assembly would be vaguely like:
for (int i = 0; i < values.Length; ++i)
{
A* value = (A*) (((char*) values) + i * sizeof(A));
value->A *= 2;
}
(Yes, that's abominable assembly, but the point is that we'll increment through the array at known compile-time constants, without any knowledge that a derived type is being used.)
So, if this actually happened, we'd have memory corruption issues. Specifically, within Square(), values[1].A*=2 would actually be modifying values[0].B!
Try to debug THAT!

Imagine structs supported inheritance. Then declaring:
BaseStruct a;
InheritedStruct b; //inherits from BaseStruct, added fields, etc.
a = b; //?? expand size during assignment?
would mean struct variables don't have fixed size, and that is why we have reference types.
Even better, consider this:
BaseStruct[] baseArray = new BaseStruct[1000];
baseArray[500] = new InheritedStruct(); //?? morph/resize the array?

Structs do not use references (unless they are boxed, but you should try to avoid that) thus polymorphism isn't meaningful since there is no indirection via a reference pointer. Objects normally live on the heap and are referenced via reference pointers, but structs are allocated on the stack (unless they are boxed) or are allocated "inside" the memory occupied by a reference type on the heap.

Class like inheritance is not possible, as a struct is laid directly on the stack. An inheriting struct would be bigger then it parent, but the JIT doesn't know so, and tries to put too much on too less space. Sounds a little unclear, let's write a example:
struct A {
int property;
} // sizeof A == sizeof int
struct B : A {
int childproperty;
} // sizeof B == sizeof int * 2
If this would be possible, it would crash on the following snippet:
void DoSomething(A arg){};
...
B b;
DoSomething(b);
Space is allocated for the sizeof A, not for the sizeof B.

Here's what the docs say:
Structs are particularly useful for small data structures that have value semantics. Complex numbers, points in a coordinate system, or key-value pairs in a dictionary are all good examples of structs. Key to these data structures is that they have few data members, that they do not require use of inheritance or referential identity, and that they can be conveniently implemented using value semantics where assignment copies the value instead of the reference.
Basically, they're supposed to hold simple data and therefore do not have "extra features" such as inheritance. It would probably be technically possible for them to support some limited kind of inheritance (not polymorphism, due to them being on the stack), but I believe it is also a design choice to not support inheritance (as many other things in the .NET languages are.)
On the other hand, I agree with the benefits of inheritance, and I think we all have hit the point where we want our struct to inherit from another, and realize that it's not possible. But at that point, the data structure is probably so advanced that it should be a class anyway.

Structs are allocated on the stack. This means the value semantics are pretty much free, and accessing struct members is very cheap. This doesn't prevent polymorphism.
You could have each struct start with a pointer to its virtual function table. This would be a performance issue (every struct would be at least the size of a pointer), but it's doable. This would allow virtual functions.
What about adding fields?
Well, when you allocate a struct on the stack, you allocate a certain amount of space. The required space is determined at compile time (whether ahead of time or when JITting). If you add fields and then assign to a base type:
struct A
{
public int Integer1;
}
struct B : A
{
public int Integer2;
}
A a = new B();
This will overwrite some unknown part of the stack.
The alternative is for the runtime to prevent this by only writing sizeof(A) bytes to any A variable.
What happens if B overrides a method in A and references its Integer2 field? Either the runtime throws a MemberAccessException, or the method instead accesses some random data on the stack. Neither of these is permissible.
It's perfectly safe to have struct inheritance, so long as you don't use structs polymorphically, or so long as you don't add fields when inheriting. But these aren't terribly useful.

There is a point I would like to correct. Even though the reason structs cannot be inherited is because they live on the stack is the right one, it is at the same a half correct explanation. Structs, like any other value type can live in the stack. Because it will depend on where the variable is declared they will either live in the stack or in the heap. This will be when they are local variables or instance fields respectively.
In saying that, Cecil Has a Name nailed it correctly.
I would like to emphasize this, value types can live on the stack. This doesn't mean they always do so. Local variables, including method parameters, will. All others will not. Nevertheless, it still remains the reason they can't be inherited. :-)

This seems like a very frequent question. I feel like adding that value types are stored "in place" where you declare the variable; apart from implementation details, this means that there is no object header that says something about the object, only the variable knows what kind of data resides there.

Structs do support interfaces, so you can do some polymorphic things that way.

IL is a stack-based language, so calling a method with an argument goes something like this:
Push the argument onto the stack
Call the method.
When the method runs, it pops some bytes off the stack to get its argument. It knows exactly how many bytes to pop off because the argument is either a reference type pointer (always 4 bytes on 32-bit) or it is a value type for which the size is always known exactly.
If it is a reference type pointer then the method looks up the object in the heap and gets its type handle, which points to a method table which handles that particular method for that exact type. If it is a value type, then no lookup to a method table is necessary because value types do not support inheritance, so there is only one possible method/type combination.
If value types supported inheritance then there would be extra overhead in that the particular type of the struct would have to placed on the stack as well as its value, which would mean some sort of method table lookup for the particular concrete instance of the type. This would eliminate the speed and efficiency advantages of value types.

Why string shows the behavior of value type, but it is reference type in .net? [duplicate]

A String is a reference type even though it has most of the characteristics of a value type such as being immutable and having == overloaded to compare the text rather than making sure they reference the same object.
Why isn't string just a value type then?

Strings aren't value types since they can be huge, and need to be stored on the heap. Value types are (in all implementations of the CLR as of yet) stored on the stack. Stack allocating strings would break all sorts of things: the stack is only 1MB for 32-bit and 4MB for 64-bit, you'd have to box each string, incurring a copy penalty, you couldn't intern strings, and memory usage would balloon, etc...
(Edit: Added clarification about value type storage being an implementation detail, which leads to this situation where we have a type with value sematics not inheriting from System.ValueType. Thanks Ben.)

It is not a value type because performance (space and time!) would be terrible if it were a value type and its value had to be copied every time it were passed to and returned from methods, etc.
It has value semantics to keep the world sane. Can you imagine how difficult it would be to code if
string s = "hello";
string t = "hello";
bool b = (s == t);
set b to be false? Imagine how difficult coding just about any application would be.

A string is a reference type with value semantics. This design is a tradeoff which allows certain performance optimizations.
The distinction between reference types and value types are basically a performance tradeoff in the design of the language. Reference types have some overhead on construction and destruction and garbage collection, because they are created on the heap. Value types on the other hand have overhead on assignments and method calls (if the data size is larger than a pointer), because the whole object is copied in memory rather than just a pointer. Because strings can be (and typically are) much larger than the size of a pointer, they are designed as reference types. Furthermore the size of a value type must be known at compile time, which is not always the case for strings.
But strings have value semantics which means they are immutable and compared by value (i.e. character by character for a string), not by comparing references. This allows certain optimizations:
Interning means that if multiple strings are known to be equal, the compiler can just use a single string, thereby saving memory. This optimization only works if strings are immutable, otherwise changing one string would have unpredictable results on other strings.
String literals (which are known at compile time) can be interned and stored in a special static area of memory by the compiler. This saves time at runtime since they don't need to be allocated and garbage collected.
Immutable strings does increase the cost for certain operations. For example you can't replace a single character in-place, you have to allocate a new string for any change. But this is a small cost compared to the benefit of the optimizations.
Value semantics effectively hides the distinction between reference type and value types for the user. If a type has value semantics, it doesn't matter for the user if the type is a value type or reference type - it can be considered an implementation detail.

This is a late answer to an old question, but all other answers are missing the point, which is that .NET did not have generics until .NET 2.0 in 2005.
String is a reference type instead of a value type because it was of crucial importance for Microsoft to ensure that strings could be stored in the most efficient way in non-generic collections, such as System.Collections.ArrayList.
Storing a value-type in a non-generic collection requires a special conversion to the type object which is called boxing. When the CLR boxes a value type, it wraps the value inside a System.Object and stores it on the managed heap.
Reading the value from the collection requires the inverse operation which is called unboxing.
Both boxing and unboxing have non-negligible cost: boxing requires an additional allocation, unboxing requires type checking.
Some answers claim incorrectly that string could never have been implemented as a value type because its size is variable. Actually it is easy to implement string as a fixed-length data structure containing two fields: an integer for the length of the string, and a pointer to a char array. You can also use a Small String Optimization strategy on top of that.
If generics had existed from day one I guess having string as a value type would probably have been a better solution, with simpler semantics, better memory usage and better cache locality. A List<string> containing only small strings could have been a single contiguous block of memory.

Not only strings are immutable reference types.
Multi-cast delegates too.
That is why it is safe to write
protected void OnMyEventHandler()
{
delegate handler = this.MyEventHandler;
if (null != handler)
{
handler(this, new EventArgs());
}
}
I suppose that strings are immutable because this is the most safe method to work with them and allocate memory.
Why they are not Value types? Previous authors are right about stack size etc. I would also add that making strings a reference types allow to save on assembly size when you use the same constant string in the program. If you define
string s1 = "my string";
//some code here
string s2 = "my string";
Chances are that both instances of "my string" constant will be allocated in your assembly only once.
If you would like to manage strings like usual reference type, put the string inside a new StringBuilder(string s). Or use MemoryStreams.
If you are to create a library, where you expect a huge strings to be passed in your functions, either define a parameter as a StringBuilder or as a Stream.

In a very simple words any value which has a definite size can be treated as a value type.

Also, the way strings are implemented (different for each platform) and when you start stitching them together. Like using a StringBuilder. It allocats a buffer for you to copy into, once you reach the end, it allocates even more memory for you, in the hopes that if you do a large concatenation performance won't be hindered.
Maybe Jon Skeet can help up out here?

It is mainly a performance issue.
Having strings behave LIKE value type helps when writing code, but having it BE a value type would make a huge performance hit.
For an in-depth look, take a peek at a nice article on strings in the .net framework.

How can you tell string is a reference type? I'm not sure that it matters how it is implemented. Strings in C# are immutable precisely so that you don't have to worry about this issue.

Actually strings have very few resemblances to value types. For starters, not all value types are immutable, you can change the value of an Int32 all you want and it it would still be the same address on the stack.
Strings are immutable for a very good reason, it has nothing to do with it being a reference type, but has a lot to do with memory management. It's just more efficient to create a new object when string size changes than to shift things around on the managed heap. I think you're mixing together value/reference types and immutable objects concepts.
As far as "==": Like you said "==" is an operator overload, and again it was implemented for a very good reason to make framework more useful when working with strings.

The fact that many mention the stack and memory with respect to value types and primitive types is because they must fit into a register in the microprocessor. You cannot push or pop something to/from the stack if it takes more bits than a register has....the instructions are, for example "pop eax" -- because eax is 32 bits wide on a 32-bit system.
Floating-point primitive types are handled by the FPU, which is 80 bits wide.
This was all decided long before there was an OOP language to obfuscate the definition of primitive type and I assume that value type is a term that has been created specifically for OOP languages.

Isn't just as simple as Strings are made up of characters arrays. I look at strings as character arrays[]. Therefore they are on the heap because the reference memory location is stored on the stack and points to the beginning of the array's memory location on the heap. The string size is not known before it is allocated ...perfect for the heap.
That is why a string is really immutable because when you change it even if it is of the same size the compiler doesn't know that and has to allocate a new array and assign characters to the positions in the array. It makes sense if you think of strings as a way that languages protect you from having to allocate memory on the fly (read C like programming)

C# heap space allocation when boxing and unboxing

Been using C# for a while and I've been thinking this:
public static void Main(Strings[] args){
...
Person p = new Person("Bob",23,"Male");
List<Object> al = new List<Object>();
al.Add(p);
Person p = (Person)al[0];
}
A typical example of boxing and unboxing in Collection, but question is: when boxing the variable, the CLR allocates a extra space in GC heap and treat p as object, yet the Person class is "larger" than System.Object
So according to that, that may lose some values that Person class owns additionally, it will fail to get some data after unboxing.
How CLR work that out?
Any solutions are welcomed

Person is a class, so used by reference. No boxing or unboxing.
As opposed to a value type that may require boxing.

If the Person type is really a structure, what you haven't declared explicitly, the space on the heap is surely larger than the space needed for an object of System.Object class. However, at the moment when the data are moved to heap, the object itself is not the value you are giving to the Add method, this value is only a reference to the boxed object. If System.Object was a structure, then yes, the data will have to be truncated to fit the size of the structure. This is the reason why inherited structures aren't allowed.

Unable to understand how boxing is done

I was studying boxing and unboxing.
I went through this example, I am unable to understand the answer.
Can anyone explain to me please.
I know what boxing and unboxing does now, by looking at a simple example, but this example, confuses a bit.
An example of boxing and then unboxing, a tricky example.
[struct|class] Point {
public int x, y;
public Point(int x, int y) {
this.x = x;
this.y = y;
}
}
Point p = new Point(1, 1);
object o = p; p.x = 2;
Console.WriteLine(((Point)o).x);
I read the answer as:
It depends! If Point is a struct then the output is 1 but if Point is a class then the output is 2! A boxing conversion makes a copy of the value being boxed explaining the difference in behavior.
Here is ((point)o).x a boxing or unboxing?
Didn't understand, can anyone explain to me please.
I know that the answer should come 1, but if class then how 2?

I don't know why everyone is writing an essay, it's pretty simple to explain:
When you cast a struct into object, it is copied into a new object.
When you cast an object into a struct, it is copied into a new struct.
When you cast between classes, the object's contents are not copied; only the reference is copied.
Hope that helps.

Although C# tries to pretend that structure types derive from Object, that's only half true. According to the CLI spec, a structure type specification actually defines two kinds of thing: a type of heap object which derives from System.ValueType (and in turn System.Object), and a kind of storage location (local variable, static variable, class field, struct field, parameter, or array slot) which does not derive from anything, but is implicitly convertible to the heap object type.
Every heap object instance contains all the fields defined by the type or its parent classes (if any), along with a header which identifies its type and some other information about the instance. Every struct-type storage location contains either the bytes necessary to hold its value (if a primitive type), or else holds the concatenated values of all its fields; in neither case does it contain any sort of header that identifies its type. Instead, value types rely upon information in the generated code to know what they are.
If one stores a value type to a storage location of that value type, the compiler will overwrite all the bytes occupied by the destination with values taken from the original value type. If, however, one tries to store a value type to a reference-type storage location (like Object), the runtime will generate a new heap object with enough space to hold all the data from the value type, along with a header identifying its type, and store in the destination location a reference to that new object. If one tries to typecast a reference type to a value type, the runtime will first verify that the object is of the proper type and, if so, copy the data from the heap object to the destination.
There are a couple of tricky scenarios involving interfaces and generics. Interface types are reference types, but if a struct implements an interface, the implementing methods may act directly upon a boxed struct instance without having to unbox and rebox it. Further, interface types used as generic constraints do not require boxing. If one passes a variable of a value type like List<int>.Enumerator to a function EnumerateThings<TEnumerator>(ref TEnumerator it) where TEnumerator: IEnumerator<int>, that method will be able to accept a reference to that variable without boxing.

Since Point is Struct and Structs are value types which means they are copied when they are passed around.
So if you change a copy you are changing only that copy, not the original.
However if Point was a class, then it was passed by reference.
So if you change a copy you are changing only that copy and also the original.
As for your confusion
object o = p; is boxing
whereas
(Point)o is unboxing

To understand boxing you need to understand the difference between Value types and Reference types.
I think the simplest way to understand it is:
"Value types are allocated in-line. Reference types are always
allocated on the Heap"
meaning, that if you add a value type (struct, int, float, bool) inside of a reference-type as a class variable (public or private), that value-type's data is embedded wherever that reference-type lives on the Heap.
If you create a value-type inside of a function, but do NOT assign it to a public/private variable, that value-type is allocated in the Function-Stack (meaning once you leave that function it will get collected)
So, given that background knowledge, it should be pretty self-explanatory what happens when you "box" a value-type: you have to take that value type (wherever it was in-line allocated) and turn it into a reference-type (create a new object for it on the Heap).

First you need to know where objects are stored. Structs, enums and other value types are stored on the stack, in registers, or on the heap; classes and other reference types are stored on the heap. A good tutorial is here.
Boxing is done when a value type is being stored in a heap. A copy is being made from stack to heap. Unboxing is the other way around, a copy of a value will be made from heap to stack.
1 Point p = new Point(1, 1);
2 object o = p;
3 p.x = 2;
4 Console.WriteLine(((Point)o).x);
In your code above, if Point is a struct, a copy will be made to object "o". On your line 3, what you modified is is the Point in stack. On the last line you unboxed the object "o" but the value you will get is the copied value from the heap.
If the Point is class, in line 1, a space is created for the Point in the heap. Line 2, creates a new variable "o" that references to the same memory space. Remember that "p" & "o" are referencing on the same memory address location, so if you modify any of the variable just like in line 3, you will get the modified value on both variable.

clearing doubts about value and reference types

value type variables contain the actual data directly and reference type variables contain the reference to the actual data.
I think of this as:
l.h.s is value type and r.h.s is reference type
on the left hand side, if I copy i into j, a new memory location is filled with same original data (45).
on the right hand side, if I copy k to l, a new memory location is filled with the reference to the object; and this reference points to the actual object in memory.
now, I am confused about this reference type copying. here's a slight different thing:
Here, the copy on the r.h.s makes l points to same location as k.
My question is 1. "Which one is truer?" or is there more to it than I imagined?
Also, value types may be allocated on heap, depending upon the how jitter sees it fit, then 2. Can we force a reference type to be allocated on stack?
Sorry for sloppy image editing.

The first picture is better, l and k are different variables, occupying different places in memory.
value types may be allocated on heap, depending upon the how jitter sees it fit
Actually it depends more on the context and the way a value is used. A value-type field would always be allocated on the heap, boxing and closures are other reasons.
However, the 2nd picture applies when l is a ref parameter:
MyClass k = new ...;
M(ref k);
void M(ref MyClass l) { /* Here l is an alias for k */ }
then 2. Can we force a reference type to be allocated on stack?
There is something like stackalloc but it's an optimization that is 'invisible' to a C# programmer.
The simple and most useful answer is: No.

Neither, and both. The problem is that you are talking about an implementation detail that is not specified in the C# language itself.
In fact you could be programming against a machine that only has a stack, or you might have registers available. At the end of the day, it is all just an implementation detail. The model that most resembles reality is dependent on the machine architecture you are running on.

Can we force a reference type to be allocated on stack?
We cannot force anything in that manner, the reference is on the stack when you initialize a variable inside a method for example. The reference type, actually the object initialized with new keyword, including the value types inside it is allocated on the heap.
Although it is a topic that one could write a book about, it all comes down to this:
We have two kind of type behavior in .NET. Value behavior and reference behavior. The distinction between the two is in their concept. Value types represent the value itself, the actual data and references are memory locations. Memory locations that are addresses in the actual object instances created on the heap. They represent a sort of a link to the actual object in the virtual address space.
I have written a blog post that goes somewhat into detail and tries to conceptually explain how it all works on a lower level. But my explenation in there is based primarily on the x86 architecture and is not exactly how everything is implemented. How C# and the .NET framework together with the JIT can change, but hope it helps to make it clearer a little bit.

Nice drawing! I think the first image is "truer". They reference the same object, but to store the references they also need variables. References or pointers are also variables, which means they have their own memory location. 2. I don't think so. (I am not sure)

the first is the most right. a reference is a number, and there for when you copy you copy the address. in c you could actually put a reference into a integer, in c# the look over you a beat more.
if you declare a struct instead of a class it well be in the stack (as far as I remember)

The first picture is most correct because you consider
l = k;
If instead you have this situation
class MyClass
{
internal string k;
internal void Test()
{
OtherClass.Method(ref k);
}
}
class OtherClass
{
internal static Method(ref string l)
{
// do a lot of stuff using l
}
}
then in this case I would say the second picture is more correct because this parameter has the ref keyword with it. That means that if someone changes the k reference to point to another string object, while the OtherClass.Method is running, then all of a sudden the l variable will point to the new object too.
But this is only true because of the ref keyword.

First picture definitely... Otherwise an invocation with a "by-ref" argument would affect both references, i.e. Xxx(ref k); possibly "redirecting" both k and l.
Regarding stack et al, you might want to read Eric Lippert (The Stack Is An
Implementation Detail)

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.