Common Type System - Structures - c#

i am stuck in programming terminology here, which is getting me confused and i cannot gather my thoughts on how to actually and correctly express(write) these few MSDN theory sentences from the Common Type System page.
Would anyone help me on this one, i want to understand this!
And if someone would be so kind to write some code and comment on this issue,
it would be awesome and praiseworthy of you!
//This is the text(it is taken from the "Structures" paragraph):
https://msdn.microsoft.com/en-us/library/zcx1eb1e(v=vs.110).aspx#
"For each value type, the common language runtime supplies a corresponding boxed type, which is a class that has the same state and behavior as the value type.
An instance of a value type is boxed when it is passed to a method that accepts a parameter of type System.Object.
It is unboxed (that is, converted from an instance of a class back to an instance of a value type) when control returns from a method call that accepts a value type as a by-reference parameter.
Some languages require that you use special syntax when the boxed type is required; others automatically use the boxed type when it is needed.
When you define a value type, you are defining both the boxed and the unboxed type."
Thank You in advance, best regards!

An object and a value type are stored differently. An object is a pointer to memory in the heap which contains the binary representation of that object. The stack is memory allocated to store pointers and value types. So a function doesn't have a pointer to an integer or bool. It is passed a copy of the actual value.
But if you have a method like this:
string GetString(object o)
{
return o.ToString();
}
That method expects an object, a pointer to a location in memory, even if you pass a value type to it. So in order to do that, the framework has to create an object stored on the heap containing that int so that it can pass a reference (pointer) to that value to the function. That's boxing.
Boxing is implicit. You don't have to call some conversion function to convert an int to an object.
Unboxing occurs when you take that object and cast it as a value type. For example,
object x = 5; //Boxes the value to create an object with a pointer
var y = (int)x; //Unboxes the value, creating an int on the stack.
When you unbox the object stored in the heap and referenced by x is inspected and its value retrieved. Unboxing is explicit. When you convert anything from object to a value type you must specify the type to which you are converting it.

.NET is language agnostic which allows programmers to write code in different languages (which can be compiled to IL), and that code can interact with other code written in different languages.
This feature is provided by the CTS (Common Type System), a standard that specifies how type definitions are represented in the memory and how types are declared, used, and managed in the CLR (Common Language Runtime).
Example
C# has an int data type and VB.NET has a Integer data type. After the compilation, both instances of int and Integer will use the same structure Int32 from CTS.

Related

Why is Array Covariance not safe?

i was reading about the covariance & contravariance from this blog and the
covariance on Array got me confused
now, if i have this
object[] obj= new string[5];
obj[0]=4;
why am i getting error during run time? Theoretically obj is a variable of type Object and Object can store any type, as all the types are inherited from the Object class. Now when i run this code i am not getting any run time error, can anyone explain me why
class baseclass
{
}
class client
{
static void Main()
{
object obj = new baseclass();
obj = 4;
Console.Read();
}
}
It is in fact confusing.
When you say object[] objs = new string[4] {}; then objs is actually an array of strings. Unsafe array covariance is unsafe because the type system is lying to you. That's why it is unsafe. You think that your array can hold a boxed integer, but it is really an array of strings and it cannot actually hold anything but strings.
Your question is "why is this not safe", and then you give an example of why it is not safe. It is not safe because it crashes at runtime when you do something that looks like it should be safe. It's a violation of the most basic rule of the type system: that a variable actually contains a value of the type of the variable.
For a variable of type object, that's not a lie. You can store any object in that variable, so it's safe. But a variable of type object[] is a lie. You can store things in that variable that are not object[].
This is in my opinion the worst feature of C# and the CLR. C# has this feature because the CLR has it. The CLR has it because Java has it, and the CLR designers wanted to be able to implement Java-like languages in the CLR. I do not know why Java has it; it's a terrible idea and they should not have done it.
Is that now clear?
An object of array type T[] has three notable abilities:
Any value read from the array may be stored in a container of type T.
Any value read from the array may be stored back into the same array.
Any value that fits in a container of type T may be stored into the array.
A non-null reference of type T[] will be capable of holding a reference to any object of type U[], where U derives from T. For any possible type U derived from T, any value read from a U[] may be stored into a container of type T, and may also be stored back into the same array. If a container of type T holds an reference to an object which is derived from T, but is not of type U nor any type derived from U, then a U[] would be incapable of holding that reference.
It would be awkward to allow code to read an item from one array and write it back to the same array, without also allowing it to ask for an item be read from one array and written into another. Rather than trying to limit such operations via compile-time constraints, C# opts instead to say that if code tries to store a value held in a T into an array identified via T[], such an operation will succeed if the T is null or identifies an object of a type not derived from the element type of the actual array identified by the T[].

Primitive type and reference type objects

I have an exam question from a past paper that I'm trying to answer:
Discuss variables of type primitive, reference and static in the context of a programming language. Give suitable examples [8].
The answer I have so far is:
A primitive type is an object which the language has given a predefined value. These types include int, bool and float. Reference type objects refer to these primitive types in a particular sequence when instantiated. Examples of these are strings and arrays. The static keyword, when assigned to a variable, means that there is only one instance of this variable and the value assigned applies to all references of the variable.
I'm fairly new to programming so I don't know if this is exactly right, so if anyone could give me some tips on how to improve the mark I would get for this question I'd greatly appreciate it.
A primitive type is an object which the language has given a
predefined value
Why? Even references can have predefined values as noted. For primitive (built in) types you may want to say these are types that a language provides built in support for. What your instructor might be glad to hear about is if you say that most primitive types are also value types in C# and you might want to discuss value types semantics (e.g., value type variable directly contains value - whereas a reference variable just contains an address to some object in memory).
About reference types again you may say that a reference variable doesn't contain the value or object directly - rather just a reference to it. Now again you may want to discuss reference semantics. For example if you have two reference variables pointing to same object - and you change the object from one reference change will be visible from another reference too - because both references point to same object. This is not the case with value types. If you assign same value type object to two different value type variables and change one variable - this change will not be visible in the second value type variable because each of them holds the value directly (e.g. each will have its own copy of the value type variable it was assigned to).
Static types you have already described.
You are on the right track for sure, but you are missing some fundamental concepts about these. Also, the 3 are not mutually exclusive:
A primitive type is simply a syntax shortcut defined by the compiler for Framework Class Library or FCL types.
A reference type is a pointer that represents an instance of a class. The objects they point to are allocated on the heap and the value of the variable is the memory address of that object rather than the class itself.
Static is not a type at all, but really defines where and when fields, properties, methods, and classes can be used. A static variable lives on the class rather then an instance. A static constructor is called the first time you access the class. A static method can be called from the class definition. That explains the persistence you see on static variables as you create and destroy them.
The answer to that question, in my opinion -- has not a thing to do with OOP and everything to do with the compiler and microprocessor.
The simplest and most accurate definition of the term that subsumes all of the qualities of a primitive type -- as I understand it -- is:
A primitive type must fit into the register used for operations on it -- IOW, in an X86 system -- the Accumulator.
So, primitive types are limited to the size of the Accumulator and can be operated upon by native processor instructions. (Basic math and Boolean/bit-shifting operations). Yes, it fits into heap memory and on the stack, but those are still essentially 8-bit entities and the registers are not.
OOP languages do not use primitive types for their managed memory processes, they use structures that mimic primitive types. (Even in .NET, when you use the keyword int -- it uses System.Int32 to wrap that.)

System.ValueType to System.Object why boxing is necessary

As Int32 is a struct which means it is a System.ValueType (which inherits System.Object), when I pass an Integer to a function which expects Object, why should CLR box it?
Does CLR assumes that Object is always a reference type?
It is a bit confusing to think that ValueType "is" an Object but when you have to pass it "as" object, you need box it...
Am I the only one who is wondering about this?
It's not that a type derived from Object is always a reference type, but rather that a variable of type Object always contains a reference. Suppose you wanted to store the actual value in the Object; how then would you decide how big the Object value would need to be?
A variable of a compile-time-known value type has a known size for which space can be allocated, but an Object, being able to 'contain' any value type, cannot be sized in advance. One logical solution then is to have the Object variable contain a special type of reference to a boxed object, whereby the size of the 'box' is allocated dynamically depending on what type is being boxed.
Some slightly more technical notes:
Another solution to the above problem would be to treat the Object as a reference to an arbitrary location in memory, which would prevent having to create a boxed copy. This is how it's done in C, where you can create a pointer to a value on the stack, for instance, then pass that to another function for use. This can be quite dangerous though, as what happens, for instance, if the function decides to keep that pointer around and use it at some undefined later time. Since the call stack has changed, that pointer is now pointing to something entirely different than was originally intended and writing to it will almost certainly have disastrous side effects.
Part of the goal of .NET, as a managed runtime, is to provide a 'safe' environment where these particular kinds of failures can't happen. Part of that trade-off is disallowing persisted direct references to stack memory, necessitating boxing when you want to 'persist' the contents of a value type in a variable containing a reference. This used to be a performance problem with collections in .NET 1.1, but the addition of Generics in .NET 2.0 meant that boxing was far less common an occurrence.

Passing a value using the ref keyword

After reading the MSDN article on the ref keyword, I am confused as to what C# does when you pass a value type using the ref keyword. The documentation states that the ValueTypes are not boxed. My question is how does C# handle passing a value type as a reference? Is it passing some copy to the data that is allocated on the Stack? Thanks.
Is it passing some copy to the data that is allocated on the Stack?
No, it does not make a copy. ref and out keyword can be compared to passing by pointer in C or passing by reference in C++, when the memory location (i.e. an address) of the variable is passed to the target method. The method that takes a reference would then modify the value directly in place using the memory location passed in.
Knowing that the variable is passed by reference, compiler inserts instructions that treat the ref variable as an address, allowing in-place modifications.
tl;dr: Boxing isn't "how you create a reference"; it's "how you package a primitive value type for consumers who don't expect that exact type".
In .NET, reference types are class instances on the heap. Value types like int or double are just the bytes: A 32-bit int is just four bytes worth of zeroes and ones. When you put it in, say a System.List (the old-timey pre-generic kind, that Granpaw whittled out down at the General Store), then take it back out, how will the compiler know what to do if you call GetType() on it? It would just have four bytes of... what? Who knows? If it stored a pointer in the List, it would have a pointer to four bytes of... who knows?
In your own method, the generated code knows what your variable is. Regular strong type-checking. But that doesn't work when you send your variable's value it to somebody else who only knows he's expecting Object.
So when you add an int to a List, or pass it to a function that takes Object as an argument, the compiler has to add some information to it so everybody else knows what he's getting.
So "Boxing" means packaging a non-reference value into an object that can be treated as an instance of Object. For ordinary ref parameters, that's not necessary, because the type is known the whole way: The code generated for the guts of the function doesn't have to be prepared to deal with any arbitrary reference type. It knows it's getting (for example) a pointer to an integer, and that's all it's going to get. Boxing provides capability that's not required in this case, and so the compiler doesn't waste your users' cycles on it.
Boxing isn't the only way to have a reference (in the broadest sense of the term) to, for example, a double. Rather, boxing is the only way to treat a double as an object that can be stored in a System.List: It has to be on the heap, it has to be castable to Object, has to have run-time type information, etc. etc.
For the following, all all the caller or the callee need is the address of 64 zeroes and ones somewhere:
void f(ref double d) { d *= 2; }

Unable to understand how boxing is done

I was studying boxing and unboxing.
I went through this example, I am unable to understand the answer.
Can anyone explain to me please.
I know what boxing and unboxing does now, by looking at a simple example, but this example, confuses a bit.
An example of boxing and then unboxing, a tricky example.
[struct|class] Point {
public int x, y;
public Point(int x, int y) {
this.x = x;
this.y = y;
}
}
Point p = new Point(1, 1);
object o = p; p.x = 2;
Console.WriteLine(((Point)o).x);
I read the answer as:
It depends! If Point is a struct then the output is 1 but if Point is a class then the output is 2! A boxing conversion makes a copy of the value being boxed explaining the difference in behavior.
Here is ((point)o).x a boxing or unboxing?
Didn't understand, can anyone explain to me please.
I know that the answer should come 1, but if class then how 2?
I don't know why everyone is writing an essay, it's pretty simple to explain:
When you cast a struct into object, it is copied into a new object.
When you cast an object into a struct, it is copied into a new struct.
When you cast between classes, the object's contents are not copied; only the reference is copied.
Hope that helps.
Although C# tries to pretend that structure types derive from Object, that's only half true. According to the CLI spec, a structure type specification actually defines two kinds of thing: a type of heap object which derives from System.ValueType (and in turn System.Object), and a kind of storage location (local variable, static variable, class field, struct field, parameter, or array slot) which does not derive from anything, but is implicitly convertible to the heap object type.
Every heap object instance contains all the fields defined by the type or its parent classes (if any), along with a header which identifies its type and some other information about the instance. Every struct-type storage location contains either the bytes necessary to hold its value (if a primitive type), or else holds the concatenated values of all its fields; in neither case does it contain any sort of header that identifies its type. Instead, value types rely upon information in the generated code to know what they are.
If one stores a value type to a storage location of that value type, the compiler will overwrite all the bytes occupied by the destination with values taken from the original value type. If, however, one tries to store a value type to a reference-type storage location (like Object), the runtime will generate a new heap object with enough space to hold all the data from the value type, along with a header identifying its type, and store in the destination location a reference to that new object. If one tries to typecast a reference type to a value type, the runtime will first verify that the object is of the proper type and, if so, copy the data from the heap object to the destination.
There are a couple of tricky scenarios involving interfaces and generics. Interface types are reference types, but if a struct implements an interface, the implementing methods may act directly upon a boxed struct instance without having to unbox and rebox it. Further, interface types used as generic constraints do not require boxing. If one passes a variable of a value type like List<int>.Enumerator to a function EnumerateThings<TEnumerator>(ref TEnumerator it) where TEnumerator: IEnumerator<int>, that method will be able to accept a reference to that variable without boxing.
Since Point is Struct and Structs are value types which means they are copied when they are passed around.
So if you change a copy you are changing only that copy, not the original.
However if Point was a class, then it was passed by reference.
So if you change a copy you are changing only that copy and also the original.
As for your confusion
object o = p; is boxing
whereas
(Point)o is unboxing
To understand boxing you need to understand the difference between Value types and Reference types.
I think the simplest way to understand it is:
"Value types are allocated in-line. Reference types are always
allocated on the Heap"
meaning, that if you add a value type (struct, int, float, bool) inside of a reference-type as a class variable (public or private), that value-type's data is embedded wherever that reference-type lives on the Heap.
If you create a value-type inside of a function, but do NOT assign it to a public/private variable, that value-type is allocated in the Function-Stack (meaning once you leave that function it will get collected)
So, given that background knowledge, it should be pretty self-explanatory what happens when you "box" a value-type: you have to take that value type (wherever it was in-line allocated) and turn it into a reference-type (create a new object for it on the Heap).
First you need to know where objects are stored. Structs, enums and other value types are stored on the stack, in registers, or on the heap; classes and other reference types are stored on the heap. A good tutorial is here.
Boxing is done when a value type is being stored in a heap. A copy is being made from stack to heap. Unboxing is the other way around, a copy of a value will be made from heap to stack.
1 Point p = new Point(1, 1);
2 object o = p;
3 p.x = 2;
4 Console.WriteLine(((Point)o).x);
In your code above, if Point is a struct, a copy will be made to object "o". On your line 3, what you modified is is the Point in stack. On the last line you unboxed the object "o" but the value you will get is the copied value from the heap.
If the Point is class, in line 1, a space is created for the Point in the heap. Line 2, creates a new variable "o" that references to the same memory space. Remember that "p" & "o" are referencing on the same memory address location, so if you modify any of the variable just like in line 3, you will get the modified value on both variable.

Categories