int[] a = new int[5];
string[] b = new string[1];
The types of both a and b inherit from the abstract System.Array, but there are no real classes for them in the built-in library (there seem to be some runtime types, but you can't find a type definition class for int[]). Can you tell me what happens at compile time? And why did they (the C# team) make this design? I mean, why isn't it something like Array<T>; instead they use an abstract class with compiler magic.
Trying to reason this out within the .NET type system doesn't get you very far. There is core support built into the JIT compiler and the CLR to deal with creating arrays. A statement like this:
var arr = new int[5];
Generates this IL:
IL_0001: ldc.i4.5
IL_0002: newarr [mscorlib]System.Int32
Which the JIT compiler then translates into this machine code:
00000035 mov edx,5 ; arg2 = array size
0000003a mov ecx,6F535F06h ; arg1 = typeof(int)
0000003f call FFD52128 ; call JIT_NewArr1(type, size)
Core ingredients here are the dedicated IL opcode, newarr, instead of the usual newobj opcode that creates an instance of a class, and the simple translation to a CLR helper function that actually gets the object created. You can have a look-see at this helper function in the SSCLI20 source code, clr\src\vm\jithelpers.cpp. It's too large to post here, but it is heavily optimized to make this kind of code run as fast as possible, having direct access to the type internals available to CLR code.
There are two of these helpers available, JIT_NewArr1() creates one-dimensional (vector) arrays and JIT_NewMDArr() creates multi-dimensional arrays. Compare to the two overloads available for Type.MakeArrayType().
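To see the same split from C#, here is a minimal sketch using Type.MakeArrayType() (standard reflection API; the class name is mine and the expected output is in the comments):

using System;

class MakeArrayTypeDemo
{
    static void Main()
    {
        // No-argument overload: a vector (SZ) array type, the kind newarr / JIT_NewArr1() creates.
        Console.WriteLine(typeof(int).MakeArrayType());     // System.Int32[]

        // Rank overload: a general multi-dimensional array type, the kind JIT_NewMDArr() creates.
        Console.WriteLine(typeof(int).MakeArrayType(2));    // System.Int32[,]

        // A rank-1 general array is a distinct type from a vector; it prints as System.Int32[*].
        Console.WriteLine(typeof(int).MakeArrayType(1));
    }
}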
And why did they (the C# team) make this design (I mean why it's not something like Array...
Generics are ideal for defining a container, as they constrain the element type so you can't insert type A and try to retrieve type B.
But generics were not added until CLR2/C#2. So arrays had to provide type safety in their own way.
Even so, it's not that different from generics. You note that there is no special class for int[]. But nor would there be for Array<int>. In generics there would only be the generic class Array<T>, and the CLR "magically" creates specialised versions for each distinct type argument you use. So it would be no less "magic" if generics were used.
Despite this, in the CLR the type of any object is reified (it exists as a value you can manipulate), of type Type, and can be obtained with typeof. So although there is no code declaration of any array type (and why would you need to see it?) there is a Type object that you can query.
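For example, a minimal sketch of querying that Type object (assuming a using System; directive; expected output in comments):

Type t = typeof(int[]);
Console.WriteLine(t.FullName);          // System.Int32[]
Console.WriteLine(t.BaseType);          // System.Array
Console.WriteLine(t.IsArray);           // True
Console.WriteLine(t.GetElementType());  // System.Int32

// The same information is available from an instance:
int[] xs = new int[3];
Console.WriteLine(xs.GetType() == typeof(int[]));   // True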
By the way, there was a design flaw in the way arrays constrain element types. (It only affects arrays of reference types; an int[] is not convertible to object[] at all.) You can declare an array:
Uri[] uris = ...
You can then store it in a looser variable:
object[] objs = uris;
But that means you can insert a string (at least it appears so at compile time):
objs[3] = "Oh dear";
At runtime it throws an exception. The idea of static type checking is to catch this kind of thing at compile time, not runtime. Generics would not have had this problem because they don't give assignment compatibility to generic class instances based on the compatibility of their type parameters. (Since C#4/CLR4 they have gained the ability to do that where it makes sense, but that wouldn't make sense for a mutable array.)
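Here is a minimal, self-contained sketch of both halves of that argument (the class name is mine; the exception type is the standard ArrayTypeMismatchException):

using System;

class ArrayCovarianceFlaw
{
    static void Main()
    {
        Uri[] uris = new Uri[4];
        object[] objs = uris;        // legal: array covariance (reference element types only)

        try
        {
            objs[3] = "Oh dear";     // compiles, but the CLR checks the real element type at runtime
        }
        catch (ArrayTypeMismatchException)
        {
            Console.WriteLine("Caught ArrayTypeMismatchException");
        }

        // The generic equivalent is rejected at compile time instead:
        // IList<object> list = new List<Uri>();   // error CS0266: no implicit conversion
    }
}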
Look at the Array class.
When you declare an array using the [] syntax, the compiler will, behind the scenes, use this class for you.
For C#, [] becomes a type that inherits from System.Array.
From the C# 4.0 spec:
§12.1.1 The System.Array type
The type System.Array is the abstract base type of all array types. An implicit reference conversion (§6.1.6) exists from any array type to System.Array, and an explicit reference conversion (§6.2.4) exists from System.Array to any array type. Note that System.Array is not itself an array-type. Rather, it is a class-type from which all array-types are derived.
There is such a class. You cannot inherit from it yourself, but when you write int[] the compiler creates a type that inherits from System.Array. So if you declare a variable:
int[] x;
This variable will have a type that inherits System.Array, and therefore has all its methods and properties.
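A small sketch of what that buys you (assuming using System;; expected output in comments):

int[] x = { 1, 2, 3 };

Console.WriteLine(x.Length);              // 3, inherited from System.Array
Console.WriteLine(x.Rank);                // 1
Console.WriteLine(x.GetLowerBound(0));    // 0
Console.WriteLine(Array.IndexOf(x, 2));   // 1, a static helper on System.Array
Console.WriteLine(x.GetType().BaseType);  // System.Array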
This is also similar to delegates. When you define a delegate:
delegate void Foo(int x);
delegate int Bar(double x);
Then the types Foo and Bar are actually classes that inherit from System.MulticastDelegate (which in turn inherits from System.Delegate).
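You can confirm this with reflection; a minimal sketch (the demo class name is mine, the rest is standard):

delegate void Foo(int x);
delegate int Bar(double x);

class DelegateBaseDemo
{
    static void Main()
    {
        System.Console.WriteLine(typeof(Foo).BaseType);           // System.MulticastDelegate
        System.Console.WriteLine(typeof(Bar).BaseType);           // System.MulticastDelegate
        System.Console.WriteLine(typeof(Foo).BaseType.BaseType);  // System.Delegate
    }
}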
I would recommend getting the ECMA 335 spec and looking for Arrays if you want to know the low level detail: http://www.ecma-international.org/publications/standards/Ecma-335.htm
I went digging through the ECMA 335 spec, so I figured I'd share what I read.
Exact array types are created automatically by the VES when they are required. Hence, the operations on an array type are defined by the CTS. These generally are: allocating the array based on size and lower-bound information, indexing the array to read and write a value, computing the address of an element of the array (a managed pointer), and querying for the rank, bounds, and the total number of values stored in the array.
The VES creates one array type for each distinguishable array type.
Vectors are subtypes of System.Array, an abstract class pre-defined by the CLI. It provides several methods that can be applied to all vectors. See Partition IV.
While vectors (§II.14.1) have direct support through CIL instructions, all other arrays are supported by the VES by creating subtypes of the abstract class System.Array (see Partition IV).
The class that the VES creates for arrays contains several methods whose implementation is supplied by the VES:
It goes on to state, quite verbosely, that the methods supplied are:
Two constructors
Get
Set
Address (returns a managed pointer)
VES means Virtual Execution System, and the CLR is an implementation of it.
The spec also details how to store the data of the array (contiguously in row-major order), what indexing is allowed in arrays (0-based only), when a vector is created (single dimensional, 0-based arrays) as opposed to a different array type, when the CIL instruction newarr is used as opposed to newobj (creating a 0-based, single dimensional array).
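The vector/general-array distinction is observable from C#; here is a minimal sketch (the non-zero-lower-bound case uses the standard Array.CreateInstance overload; expected output in comments):

using System;

class VectorVsArray
{
    static void Main()
    {
        var vector = new int[5];                // single-dimensional, zero-based: created with newarr
        Console.WriteLine(vector.GetType());    // System.Int32[]

        var rect = new int[2, 3];               // multi-dimensional: not a vector
        Console.WriteLine(rect.GetType());      // System.Int32[,]

        // A single-dimensional array with a non-zero lower bound is not a vector either,
        // and gets its own type, printed as System.Int32[*].
        var nonZero = Array.CreateInstance(typeof(int), new[] { 5 }, new[] { 1 });
        Console.WriteLine(nonZero.GetType());
        Console.WriteLine(nonZero.GetLowerBound(0));   // 1
    }
}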
Basically, everything that the compiler has to do to build the method lookup tables etc. for a regular type, it also has to do for arrays; they just programmed a more versatile and slightly special behavior into the compiler / JIT.
Why did they do it? Probably because arrays are special, widely used, and can be stored in an optimized fashion. The C# team did not necessarily make this decision, though. It's more of a .NET thing, and .NET is a cousin to Mono and Portable.NET, all of which are implementations of the CIL.
Arrays are special to the CLR. They're allocated with the newarr instruction, and elements are accessed with the ldelem* and stelem* instructions, not via System.Array methods;
see http://msdn.microsoft.com/en-us/library/system.reflection.emit.opcodes.newarr.aspx
You can check out ildasm output to see how arrays are represented.
So, to answer your question - no new type declaration is generated for any particular array.
[] is the C# syntax (syntactic sugar) for declaring arrays. The compiler doesn't literally call CreateInstance (it emits the newarr instruction instead, as shown above), but the result is equivalent:
Array a = Array.CreateInstance(typeof(int), 5);
is the same as
int[] a = new int[5];
Source for CreateInstance (taken from reflector)
public static unsafe Array CreateInstance(Type elementType, int length)
{
    if (elementType == null)
    {
        throw new ArgumentNullException("elementType");
    }
    RuntimeType underlyingSystemType = elementType.UnderlyingSystemType as RuntimeType;
    if (underlyingSystemType == null)
    {
        throw new ArgumentException(Environment.GetResourceString("Arg_MustBeType"), "elementType");
    }
    if (length < 0)
    {
        throw new ArgumentOutOfRangeException("length", Environment.GetResourceString("ArgumentOutOfRange_NeedNonNegNum"));
    }
    return InternalCreate((void*) underlyingSystemType.TypeHandle.Value, 1, &length, null);
}
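A quick check of the equivalence claimed above (a sketch assuming using System;; expected output in comments):

Array a = Array.CreateInstance(typeof(int), 5);
int[] b = new int[5];

Console.WriteLine(a.GetType() == b.GetType());   // True: both are System.Int32[]
Console.WriteLine(a is int[]);                   // True
int[] c = (int[])a;                              // explicit conversion back to int[]
c[0] = 42;
Console.WriteLine(((int[])a)[0]);                // 42: c and a are the same object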
Related
I know that structs in .NET do not support inheritance, but it's not exactly clear why they are limited in this way.
What technical reason prevents structs from inheriting from other structs?
The reason value types can't support inheritance is because of arrays.
The problem is that, for performance and GC reasons, arrays of value types are stored "inline". For example, given new FooType[10] {...}, if FooType is a reference type, 11 objects will be created on the managed heap (one for the array, and one for each of the 10 instances). If FooType is instead a value type, only one instance will be created on the managed heap -- for the array itself (as each array value is stored "inline" within the array).
Now, suppose we had inheritance with value types. When combined with the above "inline storage" behavior of arrays, Bad Things happen, as can be seen in C++.
Consider this pseudo-C# code:
struct Base
{
    public int A;
}

struct Derived : Base
{
    public int B;
}

void Square(Base[] values)
{
    for (int i = 0; i < values.Length; ++i)
        values[i].A *= 2;
}

Derived[] v = new Derived[2];
Square(v);
By normal conversion rules, a Derived[] is convertible to a Base[] (for better or worse), so if you s/struct/class/g for the above example, it'll compile and run as expected, with no problems. But if Base and Derived are value types, and arrays store values inline, then we have a problem.
We have a problem because Square() doesn't know anything about Derived; it will use only pointer arithmetic to access each element of the array, incrementing by a constant amount (sizeof(Base)). The generated code would be vaguely like:
for (int i = 0; i < values.Length; ++i)
{
    Base* value = (Base*) (((char*) values) + i * sizeof(Base));
    value->A *= 2;
}
(Yes, that's abominable pseudo-code, but the point is that we'll increment through the array by a compile-time constant, without any knowledge that a derived type is being used.)
So, if this actually happened, we'd have memory corruption issues. Specifically, within Square(), values[1].A*=2 would actually be modifying values[0].B!
Try to debug THAT!
Imagine structs supported inheritance. Then declaring:
BaseStruct a;
InheritedStruct b; //inherits from BaseStruct, added fields, etc.
a = b; //?? expand size during assignment?
would mean struct variables don't have fixed size, and that is why we have reference types.
Even better, consider this:
BaseStruct[] baseArray = new BaseStruct[1000];
baseArray[500] = new InheritedStruct(); //?? morph/resize the array?
Structs do not use references (unless they are boxed, but you should try to avoid that) thus polymorphism isn't meaningful since there is no indirection via a reference pointer. Objects normally live on the heap and are referenced via reference pointers, but structs are allocated on the stack (unless they are boxed) or are allocated "inside" the memory occupied by a reference type on the heap.
Class-like inheritance is not possible, as a struct is laid out directly on the stack. An inheriting struct would be bigger than its parent, but the JIT doesn't know that and would try to put too much in too little space. That sounds a little unclear, so let's write an example:
struct A {
    int property;
} // sizeof A == sizeof int

struct B : A {
    int childproperty;
} // sizeof B == sizeof int * 2
If this were possible, it would crash on the following snippet:
void DoSomething(A arg){};
...
B b;
DoSomething(b);
Space is allocated for the sizeof A, not for the sizeof B.
Here's what the docs say:
Structs are particularly useful for small data structures that have value semantics. Complex numbers, points in a coordinate system, or key-value pairs in a dictionary are all good examples of structs. Key to these data structures is that they have few data members, that they do not require use of inheritance or referential identity, and that they can be conveniently implemented using value semantics where assignment copies the value instead of the reference.
Basically, they're supposed to hold simple data and therefore do not have "extra features" such as inheritance. It would probably be technically possible for them to support some limited kind of inheritance (not polymorphism, due to them being on the stack), but I believe it is also a design choice to not support inheritance (as many other things in the .NET languages are.)
On the other hand, I agree with the benefits of inheritance, and I think we all have hit the point where we want our struct to inherit from another, and realize that it's not possible. But at that point, the data structure is probably so advanced that it should be a class anyway.
Structs are allocated on the stack. This means the value semantics are pretty much free, and accessing struct members is very cheap. This doesn't prevent polymorphism.
You could have each struct start with a pointer to its virtual function table. This would be a performance issue (every struct would be at least the size of a pointer), but it's doable. This would allow virtual functions.
What about adding fields?
Well, when you allocate a struct on the stack, you allocate a certain amount of space. The required space is determined at compile time (whether ahead of time or when JITting). If you add fields and then assign to a base type:
struct A
{
    public int Integer1;
}

struct B : A
{
    public int Integer2;
}
A a = new B();
This will overwrite some unknown part of the stack.
The alternative is for the runtime to prevent this by only writing sizeof(A) bytes to any A variable.
What happens if B overrides a method in A and references its Integer2 field? Either the runtime throws a MemberAccessException, or the method instead accesses some random data on the stack. Neither of these is permissible.
It's perfectly safe to have struct inheritance, so long as you don't use structs polymorphically, or so long as you don't add fields when inheriting. But these aren't terribly useful.
There is a point I would like to correct. The explanation that structs cannot be inherited because they live on the stack is only half correct. Structs, like any other value type, can live on the stack, but whether they do depends on where the variable is declared: they live on the stack when they are local variables, and on the heap when they are instance fields.
In saying that, Cecil Has a Name nailed it correctly.
I would like to emphasize this: value types can live on the stack. This doesn't mean they always do. Local variables, including method parameters, will; all others will not. Nevertheless, it still remains the reason they can't be inherited. :-)
This seems like a very frequent question. I feel like adding that value types are stored "in place" where you declare the variable; apart from implementation details, this means that there is no object header that says something about the object, only the variable knows what kind of data resides there.
Structs do support interfaces, so you can do some polymorphic things that way.
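For instance, a minimal sketch of that interface-based polymorphism (IShape and Circle are made-up names; note that converting the struct to the interface boxes it):

using System;

interface IShape
{
    double Area();
}

struct Circle : IShape
{
    public double Radius;
    public double Area() { return Math.PI * Radius * Radius; }
}

class StructInterfaceDemo
{
    static void Main()
    {
        IShape shape = new Circle { Radius = 2 };   // boxes the struct
        Console.WriteLine(shape.Area());            // dispatched through the interface
    }
}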
IL is a stack-based language, so calling a method with an argument goes something like this:
Push the argument onto the stack
Call the method.
When the method runs, it pops some bytes off the stack to get its argument. It knows exactly how many bytes to pop off because the argument is either a reference type pointer (always 4 bytes on 32-bit) or it is a value type for which the size is always known exactly.
If it is a reference type pointer then the method looks up the object in the heap and gets its type handle, which points to a method table which handles that particular method for that exact type. If it is a value type, then no lookup to a method table is necessary because value types do not support inheritance, so there is only one possible method/type combination.
If value types supported inheritance then there would be extra overhead, in that the particular type of the struct would have to be placed on the stack as well as its value, which would mean some sort of method table lookup for the particular concrete instance of the type. This would eliminate the speed and efficiency advantages of value types.
AFAIK at the infrastructure (CLI/.Net/Mono) level there is a single type to represent arrays: System.Array.
Physically this is a linear sequence of values, but logically they can be viewed as being organized in more than one dimension.
At the language level (e.g. C#) this logical view benefits from some syntactic sugar:
2D: T[,]
3D: T[,,]
42D: T[,,,...,,,]
There is an obvious parametric polymorphism, since behind a 1D array, for example, more than one type of array can hide:
Array of integers: int[]
Array of strings references: string[]
Array of objects references: object[]
But how would you describe the structural polymorphism, the fact that the array can have multiple dimensions?
At the infrastructure level this is no longer part of the type system and is only a logical view, so I think there is no polymorphism at all.
But at the language level it might be considered as some kind of inclusion polymorphism as all the arrays are logically presented as different types inheriting from a common base class.
Any input and correction is welcome.
The System.Array class has an essentially-infinite family of derivatives, one for each element type and possible number of dimensions. Because arrays predate generics, they have a special "quasi-hardcoded" means of accomplishing this. Basically, the .NET Framework contains a special hard-coded "recipe" so that given a base-item type and a number of dimensions, it can generate a type, derived from System.Array, which will act as an array with that specified base-item type and dimensionality. Further, because of the way array covariance works, arrays can behave as though they implement interfaces which are not actually part of their class, in ways that no other type can. For example, because a reference of type Cat[] is also a reference of type Animal[], Cat[] implements not only IList<Cat> but also IList<Animal>.
There is no mechanism via which any type other than System.Array can support such covariance, nor is there any way via which user code can define a family of classes such that for any integer n there will be a derivative whose get method takes n parameters. The generic types in .NET can do many things which arrays cannot, but since arrays are implemented via a mechanism other than the generic type facility, they can support some features that facility cannot.
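A minimal sketch of that Cat[] / IList<Animal> behaviour (Cat and Animal are placeholder classes; expected output in the comment):

using System;
using System.Collections.Generic;

class Animal { }
class Cat : Animal { }

class ArrayInterfaceCovariance
{
    static void Main()
    {
        Cat[] cats = new Cat[2];

        IList<Cat> asCats = cats;          // arrays implement IList<T> for their element type
        IList<Animal> asAnimals = cats;    // and, via array covariance, for base types too

        Console.WriteLine(cats is IList<Animal>);   // True
    }
}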
All derived array types descend from System.Array and override the appropriate methods. Arrays are a bit special, though, in the sense that there is baked-in support in the CLR in the form of IL opcodes that interact directly with concrete array types. Still, though, this is the same kind of base class polymorphism as with other classes, at least when accessed via the Array class.
However, arrays are not actually 'polymorphic' in dimensionality in the sense you're thinking. A particular array instance is assigned a particular dimensionality when it's constructed and it can only be used with that dimensionality. This occurs both at the op code level and the GetValue methods. The IList implementation for 'Item' (the indexer) also only supports a single dimension; if you attempt to use it on a multi-dimensional array, you'll get an ArgumentException.
Also, just in case it's not clear, the A[x,y] syntax is actually not syntactic sugar. Multi-dimensional arrays are actually excluded from the built-in CLR support and instead are accessed via methods. This compiles to something like the following IL:
Ldloc A //(load array A onto the operation stack)
Ldloc x //(load value of variable x onto the operation stack)
Ldloc y //(load value of variable y onto the operation stack)
call int32[,]::Get(int, int) //(return a 4-byte signed integer from the array at indices [x,y], putting the result onto the operation stack)
You can compare this to the same for a single-dimensional array A[x]:
Ldloc A //(load array A onto the operation stack)
Ldloc x //(load value of variable x onto the operation stack)
Ldelem_I4 //(return the 4-byte signed integer from the array at index [x] and put the result onto the operation stack)
But how would you describe the structural polymorphism, the fact that the array can have multiple dimensions?
Three points:
At the language level, multidimensional arrays would qualify as internal polymorphism
Since multidimensional arrays can be emulated with associative arrays or lists, it need not depend on a specific data structure
Since multidimensional arrays can be created at runtime, it need not depend on type checking
References
Create a two-dimensional array at runtime
OO Uber-Zealotry Considered Silly
how c++ implements the polymorphism internally?
Refactoring a Switch Statement
C2 Wiki: External Polymorphism
Why can you do things like
int i = 10;
i.ToString();
'c'.Equals('d');
1.ToString();
true.GetType();
in C#? Those things right there are either primitive, literal, unboxed, or any combination of those things; so why do they have methods? They are not objects and so should not have methods. Is this syntax sugar for something else? If so, what? I can understand having functions that do these things, for example:
string ToString(int number)
{
    // Do mad code
    return newString;
}
but in that case you would call it as a function, not a method:
string ranch = ToString(1);
What's going on here?
edit:
Just realised C# isn't a java clone anymore and the rules are totally different. oops :P
They act like that because the spec says so (and it's pretty nice):
1.28 Value types
A value type is either a struct type or an enumeration type. C# provides a set of predefined struct types called the simple types. The simple types are identified through reserved words.
...
1.28.4 Simple types
C# provides a set of predefined struct types called the simple types. The simple types are identified through reserved words, but these reserved words are simply aliases for predefined struct types in the System namespace, as described in the table below.
...
Because a simple type aliases a struct type, every simple type has members. For example, int has the members declared in System.Int32 and the members inherited from System.Object, and the following statements are permitted:
int i = int.MaxValue; // System.Int32.MaxValue constant
string s = i.ToString(); // System.Int32.ToString() instance method
string t = 123.ToString(); // System.Int32.ToString() instance method
The simple types differ from other struct types in that they permit certain additional operations:
Most simple types permit values to be created by writing literals (§1.16.4). For example, 123 is a literal of type int and 'a' is a literal of type char. C# makes no provision for literals of struct types in general, and nondefault values of other struct types are ultimately always created through instance constructors of those struct types.
As the spec explains, simple types have some superpowers, like the ability to be const, a special literal syntax that can be used instead of new, and the capacity to be computed at compile time (2+2 is actually written as 4 in the final MSIL stream).
But methods (as well as operators) aren't a special superpower; all structs can have them.
The specification (for C# 4.0; my copy-paste is from an earlier version) can be downloaded from the Microsoft website: C# Language Specification 4.0
Eric Lippert's recent article Inheritance and Representation explains. (Spoiler: you are confusing inheritance and representation.)
Not sure why you claim that the integer i, the character 'c' and the integer 1 are not objects. They are.
In C# all primitive types are actually structures.
So that you can use them!
It's convenient to be able to do so, so you can.
Now, in order to do so, primitives can be treated as structs. E.g. a 32-bit integer can be processed as a raw 32-bit value, but it can also be processed as public struct Int32 : IComparable, IFormattable, IConvertible, IComparable<int>, IEquatable<int>. We mostly get the best of both worlds.
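For example, a small sketch of treating an int through both of those faces (assuming using System;; expected output in comments):

int i = 10;

Console.WriteLine(i.ToString());        // instance method on System.Int32
Console.WriteLine(i.CompareTo(5));      // 1, via IComparable<int>
Console.WriteLine(int.MaxValue);        // a constant defined on System.Int32
Console.WriteLine(123.GetType());       // System.Int32

IComparable<int> cmp = i;               // using the value through an interface boxes it
Console.WriteLine(cmp.CompareTo(20));   // -1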
When creating an array, such as int[], does this inherit from anything? I thought it might inherit from System.Array, but after looking at the compiled CIL it doesn't appear so.
I thought it might inherit from System.Array or something similar, considering you can call methods and access properties on an array.
I.e.
int[] arr = {1, 2};
arr.Initialize();
arr.Length;
All arrays derive from System.Array. From an (admittedly ancient) edition of MSDN magazine:
All array types are implicitly derived from System.Array, which itself is derived from System.Object. This means that all arrays are always reference types which are allocated on the managed heap, and your app's variable contains a reference to the array and not the array itself.
From section 19.1.1 of the C# Language Specification (emphasis mine):
The type System.Array is the abstract base type of all array types. An implicit reference conversion (§13.1.4) exists from any array type to System.Array and to any interface type implemented by System.Array. An explicit reference conversion (§13.2.3) exists from System.Array and any interface type implemented by System.Array to any array type. System.Array is not itself an array-type. Rather, it is a class-type from which all array-types are derived.
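Those two conversions in code (a small sketch assuming using System;; expected output in comments):

int[] ints = new int[5];

Array asArray = ints;              // implicit reference conversion: int[] -> System.Array
int[] backAgain = (int[])asArray;  // explicit reference conversion: System.Array -> int[]

Console.WriteLine(object.ReferenceEquals(ints, backAgain));  // True: same object
Console.WriteLine(asArray.GetType().BaseType);               // System.Array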
An array does inherit from System.Array. It's a specialisation of a generic type, kind of like System.Array<int>, except that the runtime treats arrays as "special" - they are a special case of generics that existed in .NET 1.0 before the "general" generics were introduced in .NET 2.0.
Edit: Just checked my answer using Reflection and it looks like the base type of an array actually is System.Array. Corrected.
One thing I found interesting is that arrays also implement ICollection<>:
int[] foo = new int[]{ };
ICollection<int> bar = foo;
// So, arrays would have their type inferred to use the ICollection<> overload
IEnumerable<T> Foo<T>(Func<IEnumerable<T>> bar); // .Count()
ICollection<T> Foo<T>(Func<ICollection<T>> bar); // .Count
When you use generic collections in C# (or .NET in general), does the compiler basically do the leg-work developers used to have to do of making a collection for a specific type? So basically... it just saves us work?
Now that I think about it, that can't be right. Because without generics, we used to have to make collections that used a non-generic array internally, and so there was boxing and unboxing (if it was a collection of value types), etc.
So, how are generics rendered in CIL? What does it do to implement things when we say we want a generic collection of something? I don't necessarily want CIL code examples (though that would be OK); I want to know, conceptually, how the compiler takes our generic collections and renders them.
Thanks!
P.S. I know that I could use ildasm to look at this, but CIL still looks like Chinese to me, and I am not ready to tackle that. I just want the concepts of how C# (and I guess other languages too) render generics in CIL.
Forgive my verbose post, but this topic is quite broad. I'm going to attempt to describe what the C# compiler emits and how that's interpreted by the JIT compiler at runtime.
ECMA-335 (it's a really well-written design document; check it out) is where it's at for knowing how everything, and I mean everything, is represented in a .NET assembly. There are a few CLI metadata tables related to generic information in an assembly:
GenericParam - Stores information about a generic parameter (index, flags, name, owning type/method).
GenericParamConstraint - Stores information about a generic parameter constraint (owning generic parameter, constraint type).
MethodSpec - Stores instantiated generic method signatures (e.g. Bar.Method<int> for Bar.Method<T>).
TypeSpec - Stores instantiated generic type signatures (e.g. Bar<int> for Bar<T>).
So with this in mind, let's walk through a simple example using this class:
class Foo<T>
{
    public T SomeProperty { get; set; }
}
When the C# compiler compiles this example, it will define Foo in the TypeDef metadata table, like it would for any other type. Unlike a non-generic type, it will also have an entry in the GenericParam table that will describe its generic parameter (index = 0, flags = ?, name = (index into String heap, "T"), owner = type "Foo").
One of the columns of data in the TypeDef table is the starting index into the MethodDef table that is the continuous list of methods defined on this type. For Foo, we've defined three methods: a getter and a setter to SomeProperty and a default constructor supplied by the compiler. As a result, the MethodDef table would hold a row for each of these methods. One of the important columns in the MethodDef table is the "Signature" column. This column stores a reference to a blob of bytes that describes the exact signature of the method. ECMA-335 goes into great detail about these metadata signature blobs, so I won't regurgitate that information here.
The method signature blob contains type information about the parameters as well as the return value. In our example, the setter takes a T and the getter returns a T. Well, what is a T then? In the signature blob, it's going to be a special value that means "the generic type parameter at index 0". This means the row in the GenericParams table that has index=0 with owner=type "Foo", which is our "T".
The same thing goes for the auto-property backing store field. Foo's entry in the TypeDef table will have a starting index into the Field table and the Field table has a "Signature" column. The field's signature will denote that the field's type is "the generic type parameter at index 0".
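Much of this metadata can be observed from C# through reflection. Here is a hedged sketch (the reflection calls are standard; the mapping to the metadata tables in the comments is my reading of the description above, and the demo class name is mine):

using System;

class GenericMetadataDemo
{
    class Foo<T>
    {
        public T SomeProperty { get; set; }
    }

    static void Main()
    {
        // The open generic type corresponds to the TypeDef row plus its GenericParam entry:
        Type open = typeof(Foo<>);
        Console.WriteLine(open.IsGenericTypeDefinition);          // True
        Console.WriteLine(open.GetGenericArguments()[0].Name);    // T

        // A constructed type such as Foo<int> corresponds to a TypeSpec signature:
        Type closed = typeof(Foo<int>);
        Console.WriteLine(closed.IsConstructedGenericType);            // True
        Console.WriteLine(closed.GetGenericTypeDefinition() == open);  // True

        // The property on the open type still reports its type as the generic parameter T:
        Console.WriteLine(open.GetProperty("SomeProperty").PropertyType.IsGenericParameter);  // True
    }
}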
This is all well and good, but where does the code generation come into play when T is different types? It's actually the responsibility of the JIT compiler to generate the code for the generic instantiations and not the C# compiler.
Let's take a look at an example:
Foo<int> f1 = new Foo<int>();
f1.SomeProperty = 10;
Foo<string> f2 = new Foo<string>();
f2.SomeProperty = "hello";
This will compile to something like this CIL:
newobj <MemberRefToken1> // new Foo<int>()
stloc.0 // Store in local "f1"
ldloc.0 // Load local "f1"
ldc.i4.s 10 // Load a constant 32-bit integer with value 10
callvirt <MemberRefToken2> // Call f1.set_SomeProperty(10)
newobj <MemberRefToken3> // new Foo<string>()
stloc.1 // Store in local "f2"
ldloc.1 // Load local "f2"
ldstr <StringToken> // Load "hello" (which is in the user string heap)
callvirt <MemberRefToken4> // Call f2.set_SomeProperty("hello")
So what's this MemberRefToken business? A MemberRefToken is a metadata token (tokens are four byte values with the most-significant-byte being a metadata table identifier and the remaining three bytes are the row number, 1-based) that references a row in the MemberRef metadata table. This table stores a reference to a method or field. Before generics, this is the table that would store information about methods/fields you're using from types defined in referenced assemblies. However, it can also be used to reference a member on a generic instantiation. So let's say that MemberRefToken1 refers to the first row in the MemberRef table. It might contain this data: class = TypeSpecToken1, name = ".ctor", blob = <reference to expected signature blob of .ctor>.
TypeSpecToken1 would refer to the first row in the TypeSpec table. From above we know this table stores the instantiations of generic types. In this case, this row would contain a reference to a signature blob for "Foo<int>". So this MemberRefToken1 is really saying we are referencing "Foo<int>.ctor()".
MemberRefToken1 and MemberRefToken2 would share the same class value, i.e. TypeSpecToken1. They would differ, however, on the name and signature blob (MemberRefToken2 would be for "set_SomeProperty"). Likewise, MemberRefToken3 and MemberRefToken4 would share TypeSpecToken2, the instantiation of "Foo<string>", but differ on the name and blob in the same way.
When the JIT compiler compiles the above CIL, it notices that it's seeing a generic instantiation it hasn't seen before (i.e. Foo<int> or Foo<string>). What happens next is covered pretty well by Shiv Kumar's answer, so I won't repeat it in detail here. Simply put, when the JIT compiler encounters a new instantiated generic type, it may emit a whole new type into its type system with a field layout using the actual types in the instantiation in place of the generic parameters. They would also have their own method tables and JIT compilation of each method would involve replacing references to the generic parameters with the actual types from the instantiation. It's also the responsibility of the JIT compiler to enforce correctness and verifiability of the CIL.
So to sum up: C# compiler emits metadata describing what's generic and how generic types/methods are instantiated. The JIT compiler uses this information to emit new types (assuming it isn't compatible with an existing instantiation) at runtime for instantiated generic types and each type will have its own copy of the code that has been JIT compiled based on the actual types used in the instantiation.
Hopefully this made sense in some small way.
For value types, there is a specific "class" defined at run time for each value-type instantiation of a generic class. For reference types, there is only one class definition that is reused across the different type arguments.
I'm simplifying here, but that's the concept.
Design and Implementation of Generics for the .NET Common Language Runtime
Our scheme runs roughly as follows:
When the runtime requires a particular instantiation of a parameterized class, the loader checks to see if the instantiation is compatible with any that it has seen before; if not, then a field layout is determined and a new vtable is created, to be shared between all compatible instantiations. The items in this vtable are entry stubs for the methods of the class. When these stubs are later invoked, they will generate ("just-in-time") code to be shared for all compatible instantiations. When compiling the invocation of a (non-virtual) polymorphic method at a particular instantiation, we first check to see if we have compiled such a call before for some compatible instantiation; if not, then an entry stub is generated, which will in turn generate code to be shared for all compatible instantiations.
Two instantiations are compatible if for any parameterized class its compilation at these instantiations gives rise to identical code and other execution structures (e.g. field layout and GC tables), apart from the dictionaries described below in Section 4.4. In particular, all reference types are compatible with each other, because the loader and JIT compiler make no distinction for the purposes of field layout or code generation. On the implementation for the Intel x86, at least, primitive types are mutually incompatible, even if they have the same size (floats and ints have different parameter passing conventions). That leaves user-defined struct types, which are compatible if their layout is the same with respect to garbage collection, i.e. they share the same pattern of traced pointers.
http://research.microsoft.com/pubs/64031/designandimplementationofgenerics.pdf