How is a virtual generic method call implemented? - C#

I'm interested in how the CLR implements calls like this:
abstract class A {
    public abstract void Foo<T, U, V>();
}

A a = ...
a.Foo<int, string, decimal>(); // <=== ?
Does this call cause some kind of hash-map lookup, with the type parameters' tokens as the keys and the compiled generic method specializations (one for all reference types and different code for each value type) as the values?

I didn't find much exact information about this, so much of this answer is based on the excellent paper on .NET generics from 2001 (even before .NET 1.0 came out!), one short note in a follow-up paper, and what I gathered from the SSCLI v2.0 source code (even though I wasn't able to find the exact code for calling virtual generic methods).
Let's start simple: how is a non-generic, non-virtual method called? By directly calling the method code, so the compiled code contains the address directly. The compiler gets the method address from the method table (see the next paragraph). Can it be that simple? Well, almost. The fact that methods are JITed makes it a little more complicated: what is actually called is either code that compiles the method and only then executes it (if it wasn't compiled yet), or a single instruction that directly calls the compiled code (if it already exists). I'm going to ignore this detail from here on.
Now, how is a non-generic virtual method called? Similarly to polymorphism in languages like C++, there is a method table accessible from the this pointer (reference). Each derived class has its own method table with its methods in it. So, to call a virtual method: take the this reference (passed in as a parameter), get the reference to the method table from it, look at the correct entry in it (the entry number is constant for a specific method) and call the code the entry points to. Calling methods through interfaces is slightly more complicated, but not interesting for us now.
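To make the shape of that dispatch concrete, here is a minimal C# sketch of my own (the class, GetMethodTable and FOO_SLOT are illustrative names, not the CLR's real structures):

// Conceptual model only: one table of "method addresses" per type,
// where derived types overwrite the slots of the methods they override.
class MethodTableSketch
{
    public Action<object>[] Slots; // one delegate per virtual-method slot
}

// A virtual call like a.Foo() then behaves roughly like:
//   MethodTableSketch mt = GetMethodTable(a); // reached via the object header
//   mt.Slots[FOO_SLOT](a);                    // FOO_SLOT is a compile-time constant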
Now we need to know about code sharing. Code can be shared between two “instances” of the same method if reference-type type parameters correspond to any other reference types, and value types are exactly the same. So, for example, C<string>.M<int>() shares code with C<object>.M<int>(), but not with C<string>.M<byte>(). There is no difference between a type's type parameters and a method's type parameters here. (The original paper from 2001 mentions that code can also be shared when both parameters are structs with the same layout, but I'm not sure this is true in the actual implementation.)
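To spell the sharing rule out on a tiny example (the comments state my reading of the rule above, not something I verified against the runtime):

class C<T>
{
    public void M<U>() { }
}

// C<string>.M<int>() and C<object>.M<int>(): one shared native body
//   (string and object are both reference types; U is the same value type).
// C<string>.M<int>() and C<string>.M<byte>(): two separate native bodies
//   (int and byte are different value types).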
Let's make an intermediate step on our way to generic methods: non-generic methods in generic types. Because of code sharing, we need to get the type parameters from somewhere (e.g. for executing code like new T[]). For this reason, each instantiation of a generic type (e.g. C<string> and C<object>) has its own type handle, which contains the type parameters and also the method table. Ordinary methods can access this type handle (technically a structure confusingly called MethodTable, even though it contains more than just the method table) from the this reference. There are two kinds of methods that can't do that: static methods and methods on value types. For those, the type handle is passed in as a hidden argument.
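A small example of where the handle comes from in each case (the comments reflect my understanding of the scheme described above):

class C<T>
{
    // Instance method: the shared code recovers T from the type handle
    // reachable through the 'this' reference.
    public T[] FromInstance(int n) { return new T[n]; }

    // Static method: there is no 'this', so the shared code receives the
    // type handle for the exact C<T> instantiation as a hidden argument.
    public static T[] FromStatic(int n) { return new T[n]; }
}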
For non-virtual generic methods, the type handle is not enough, so they get a different hidden argument, a MethodDesc, which contains the type parameters. Also, the compiler can't store the instantiations in the ordinary method table, because that is static. So it creates a second, different method table for generic methods, which is indexed by type parameters, and gets the method address from there if an entry with compatible type parameters already exists, or creates a new entry otherwise.
Virtual generic methods are now simple: the compiler doesn't know the concrete type, so it has to use the method table at runtime. And the normal method table can't be used, so it has to look in the special method table for generic methods. Of course, the hidden parameter containing the type parameters is still present.
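So, to come back to the question: conceptually yes, something very much like a lookup keyed by the type arguments. A loose sketch of the dispatch, where the Dictionary and CompileSpecialization stand in for the CLR's internal generic-method table and the JIT (illustrative only):

// Per-method cache of compiled specializations, keyed by the runtime
// type arguments. The real CLR keys on type handles, not System.Type.
static readonly Dictionary<(Type, Type, Type), Action> Specializations =
    new Dictionary<(Type, Type, Type), Action>();

static void CallFoo(Type t, Type u, Type v)
{
    Action code;
    if (!Specializations.TryGetValue((t, u, v), out code))
    {
        code = CompileSpecialization(t, u, v); // hypothetical: JIT-compile, or reuse shared code
        Specializations[(t, u, v)] = code;
    }
    code(); // invoke the specialization
}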
One interesting tidbit I learned while researching this: because the JITer is very lazy, the following (completely useless) code works:
object Lift<T>(int count) where T : new()
{
    if (count == 0)
        return new T();

    return Lift<List<T>>(count - 1);
}
The equivalent C++ code causes the compiler to give up with a stack overflow.

Yes. The code for a specific type is generated at runtime by the CLR, which keeps a hash table (or similar) of the compiled implementations.
Page 372 of CLR via C#:
When a method that uses generic type parameters is JIT-compiled, the CLR takes the method's IL, substitutes the specified type arguments, and then creates native code that is specific to that method operating on the specified data types. This is exactly what you want and is one of the main features of generics. However, there is a downside to this: the CLR keeps generating native code for every method/type combination. This is referred to as code explosion. This can end up increasing the application's working set substantially, thereby hurting performance.

Fortunately, the CLR has some optimizations built into it to reduce code explosion. First, if a method is called for a particular type argument, and later, the method is called again using the same type argument, the CLR will compile the code for this method/type combination just once. So if one assembly uses List<String>, and a completely different assembly (loaded in the same AppDomain) also uses List<String>, the CLR will compile the methods for List<String> just once. This reduces code explosion substantially.

EDIT
I have now come across https://msdn.microsoft.com/en-us/library/sbh15dya.aspx, which clearly states that generics instantiated with reference types reuse the same code, so I would accept that as the definitive authority.
ORIGINAL ANSWER
I am seeing here two disagreeing answers, and both have references to their side, so I will try to add my two cents.
First, CLR via C# by Jeffrey Richter, published by Microsoft Press, is as valid a source as an MSDN blog, especially as the blog is already outdated (for more of his books, take a look at http://www.amazon.com/Jeffrey-Richter/e/B000APH134; one must agree that he is an expert on Windows and .NET).
Now let me do my own analysis.
Clearly, two generic types that contain different reference-type arguments cannot share the same code.
For example, List<TypeA> and List<TypeB> cannot share the same code, as this would allow adding an object of TypeA to a List<TypeB> via reflection, and the CLR is strongly typed on generics as well (unlike Java, in which only the compiler validates generics; the underlying JVM has no clue about them).
And this does not apply only to types, but to methods as well: a generic method of type T can create an object of type T (for example, nothing prevents it from creating a new List<T>), in which case reusing the same code would cause havoc.
Furthermore, the GetType method is not overridable, and it in fact always returns the correct generic type, proving that each type argument has indeed its own code.
(This point is even more important than it looks: the CLR and JIT work based on the type object created for that object, obtained via GetType(), which simply means that for each type argument there must be a separate type object, even for reference types.)
Other issues would result from code reuse: the is and as operators would no longer work correctly, and in general all kinds of casting would have serious problems.
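For what it's worth, this part is easy to observe directly:

object a = new List<string>();
object b = new List<object>();
Console.WriteLine(a.GetType() == b.GetType()); // False: two distinct Type objects
Console.WriteLine(a is List<object>);          // False: 'is' distinguishes the instantiations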
NOW TO ACTUAL TESTING:
I have tested it by having a generic type that contained a static member, and then created two objects with different type parameters; the static fields were clearly not shared, proving that code is not shared even for reference types.
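A minimal reconstruction of that test (the names are mine, not the original code):

class Holder<T>
{
    public static int Counter;
}

Holder<string>.Counter = 1;
Holder<Version>.Counter = 2;               // both type arguments are reference types
Console.WriteLine(Holder<string>.Counter); // prints 1: statics are per instantiation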
EDIT:
See http://blogs.msdn.com/b/csharpfaq/archive/2004/03/12/how-do-c-generics-compare-to-c-templates.aspx on how it is implemented:
Space Use

The use of space is different between C++ and C#. Because C++ templates are done at compile time, each use of a different type in a template results in a separate chunk of code being created by the compiler.

In the C# world, it's somewhat different. The actual implementations using a specific type are created at runtime. When the runtime creates a type like List<int>, the JIT will see if that has already been created. If it has, it merely uses that code. If not, it will take the IL that the compiler generated and do appropriate replacements with the actual type.

That's not quite correct. There is a separate native code path for every value type, but since reference types are all reference-sized, they can share their implementation.

This means that the C# approach should have a smaller footprint on disk and in memory, so that's an advantage for generics over C++ templates.

In fact, the C++ linker implements a feature known as “template folding”, where the linker looks for native code sections that are identical, and if it finds them, folds them together. So it's not as clear-cut as it would seem to be.
As one can see, the CLR "can" reuse the implementation for reference types, as current C++ compilers do; however, there is no guarantee of that, and for unsafe code using stackalloc and pointers it is probably not the case, and there might be other situations as well.
However, what we do have to know is that in the CLR type system they are treated as different types: they get separate calls to static constructors, separate static fields, separate type objects, and an object with type argument T1 should not be able to access a private field of another object with type argument T2 (although for objects of the very same instantiation it is indeed possible to access private fields of another object of the same type).
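The static-constructor point is easy to demonstrate (a small sketch of mine):

class Tagged<T>
{
    static Tagged() { Console.WriteLine("cctor for " + typeof(T)); }
}

var a = new Tagged<string>(); // prints: cctor for System.String
var b = new Tagged<object>(); // prints: cctor for System.Object (it runs again)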

Related

How is 'new T()' or 'typeof(T)' compiled?

I do not know an awful lot about IL, the CLR, or the JIT compiler. But what I've read is that generic code is compiled to generic IL, and only the JIT compiler eventually instantiates the generic code with the provided concrete type arguments. It makes sense that there might be multiple versions of some generic code fragment (method or class), because they have different representations in memory. For example, there is one version for each value type (or set thereof), and another for the set of all reference types. The type checking has already happened, so the JIT compiler is just concerned with representation.
However, what about statements like new T() or typeof(T), where T is a generic type argument in some surrounding class/method? My assumption is that these statements have to be re-compiled for absolutely every possible concrete type parameter for T, since the code actually is different for each of them. If you don't know T, you also cannot know which constructor to call in new T(), right?
It would be great to get some insight into this.
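One detail worth noting here (a fact about the C# compiler, not about the JIT): for new T() the compiler does not need a per-T constructor address at all; it emits a call to Activator.CreateInstance<T>(), which finds the parameterless constructor at runtime:

T Create<T>() where T : new()
{
    // The C# compiler emits this as (roughly) Activator.CreateInstance<T>(),
    // so even fully shared code can create a T without knowing the
    // constructor address at JIT time.
    return new T();
}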

What's the method representation in memory?

While thinking a bit about programming in Java/C#, I wondered how methods which belong to objects are represented in memory and how this concerns multithreading.
1. Is a method instantiated for each object in memory separately, or do all objects of the same type share one instance of the method?
2. If the latter, how does the executing thread know which object's attributes to use?
3. Is it possible to modify the code of a method in C# with reflection for one, and only one, object among many objects of the same type?
4. Is a static method which does not use class attributes always thread-safe?
I tried to make up my mind about these questions, but I'm very unsure about their answers.
Each method in your source code (in Java, C#, C++, Pascal, I think every OO and procedural language...) has only one copy in binaries and in memory.
Multiple instances of one object have separate fields but all share the same method code. Technically there is a procedure that takes a hidden this parameter to provide the illusion of executing a method on an object. In reality you are calling a procedure and passing a structure (a bag of fields) to it along with the other parameters. Here is a simple Java object and more-or-less equivalent pseudo-C code:
class Foo {
    private int x;

    int mulBy(int y) {
        return x * y;
    }
}

Foo foo = new Foo();
foo.mulBy(3);
which is translated to this pseudo-C code (the encapsulation is enforced by the compiler and runtime/VM):
struct Foo {
    int x = 0;
};

int Foo_mulBy(Foo *this, int y) {
    return this->x * y;
}

Foo* foo = new Foo();
Foo_mulBy(foo, 3);
You have to draw a distinction between the code and the local variables and parameters it operates on (the data). Data is stored on the call stack, local to each thread. Code can be executed by multiple threads; each thread has its own copy of the instruction pointer (the place in the method it is currently executing). Also, because this is a parameter, it is thread-local, so each thread can operate on a different object concurrently, even though it runs the same code.
That being said, you cannot modify a method of only one instance, because the method code is shared among all instances.
The Java specifications don't dictate how to do memory layout, and different implementations can do whatever they like, providing it meets the spec where it matters.
Having said that, the mainstream Oracle JVM (HotSpot) works off of things called oops - Ordinary Object Pointers. These consist of two words of header followed by the data which comprises the instance member fields (stored inline for primitive types, and as pointers for reference member fields).
One of the two header words - the class word - is a pointer to a klassOop. This is a special type of oop which holds pointers to the instance methods of the class (basically, the Java equivalent of a C++ vtable). The klassOop is kind-of a VM-level representation of the Class object corresponding to the Java type.
If you're curious about the low-level detail, you can find out a lot more by looking in the OpenJDK source for the definition of some of the oop types (klassOop is a good place to start).
tl;dr Java holds one blob of code for each method of each type. The blobs of code are shared among all instances of the type, and hidden this pointers are used to know which instance's members to use.
I am going to try to answer this in the context of C#. There are basically three different kinds of methods:
virtual
non-virtual
static
When your code is executed, you basically have two kinds of objects that are formed on the heap.
The object corresponding to the type of the object. This is called the Type Object. It holds the type object pointer, the sync block index, the static fields, and the method table.
The object corresponding to the object itself, which contains all the non static fields.
In response to your questions,
1. Is a method instantiated for each object in memory separately, or do all objects of the same type share one instance of the method?
This is a wrong way of understanding objects. All methods are per type only. Look at it this way: a method is just a set of instructions. The first time you call a particular method, the IL code is JITed into native instructions and saved in memory. The next time it is called, the address is picked up from the method table and the same instructions are executed again.
2. If the latter, how does the executing thread know which object's attributes to use?
Each static method call on a type results in looking up the method table of the corresponding Type Object and finding the address of the JITed instructions. For methods that are not static, the relevant object on which the method is called is passed on the thread's stack as a hidden argument (the this reference), so the code always knows which object's fields to operate on.
3. Is it possible to modify the code of a method in C# with reflection for one, and only one, object of many objects of the same type?
No, it is not possible (and I am thankful for that). The reason is that reflection only allows code inspection: even once you figure out what some method does, there is no way to change its code in the loaded assembly.

Why are delegates reference types?

Quick note on the accepted answer: I disagree with a small part of Jeffrey's answer, namely the point that since Delegate had to be a reference type, it follows that all delegates are reference types. (It simply isn't true that a multi-level inheritance chain rules out value types; all enum types, for example, inherit from System.Enum, which in turn inherits from System.ValueType, which inherits from System.Object, all reference types.) However I think the fact that, fundamentally, all delegates in fact inherit not just from Delegate but from MulticastDelegate is the critical realization here. As Raymond points out in a comment to his answer, once you've committed to supporting multiple subscribers, there's really no point in not using a reference type for the delegate itself, given the need for an array somewhere.
See update at bottom.
It has always seemed strange to me that if I do this:
Action foo = obj.Foo;
I am creating a new Action object, every time. I'm sure the cost is minimal, but it involves allocation of memory to later be garbage collected.
Given that delegates are inherently themselves immutable, I wonder why they couldn't be value types? Then a line of code like the one above would incur nothing more than a simple assignment to a memory address on the stack*.
Even considering anonymous functions, it seems (to me) this would work. Consider the following simple example.
Action foo = () => { obj.Foo(); };
In this case foo does constitute a closure, yes. And in many cases, I imagine this does require an actual reference type (such as when local variables are closed over and are modified within the closure). But in some cases, it shouldn't. For instance, in the above case, it seems that a type to support the closure could look like this: I take back my original point about this. The below really does need to be a reference type (or: it doesn't need to be, but if it's a struct it's just going to get boxed anyway). So, disregard the below code example. I leave it only to provide context for answers that specifically mention it.
struct CompilerGenerated
{
    Obj obj;

    public CompilerGenerated(Obj obj)
    {
        this.obj = obj;
    }

    public void CallFoo()
    {
        obj.Foo();
    }
}
// ...elsewhere...
// This would not require any long-term memory allocation
// if Action were a value type, since CompilerGenerated
// is also a value type.
Action foo = new CompilerGenerated(obj).CallFoo;
Does this question make sense? As I see it, there are two possible explanations:
Implementing delegates properly as value types would have required additional work/complexity, since support for things like closures that do modify values of local variables would have required compiler-generated reference types anyway.
There are some other reasons why, under the hood, delegates simply can't be implemented as value types.
In the end, I'm not losing any sleep over this; it's just something I've been curious about for a little while.
Update: In response to Ani's comment, I see why the CompilerGenerated type in my above example might as well be a reference type, since if a delegate is going to comprise a function pointer and an object pointer it'll need a reference type anyway (at least for anonymous functions using closures, since even if you introduced an additional generic type parameter—e.g., Action<TCaller>—this wouldn't cover types that can't be named!). However, all this does is kind of make me regret bringing the question of compiler-generated types for closures into the discussion at all! My main question is about delegates, i.e., the thing with the function pointer and the object pointer. It still seems to me that could be a value type.
In other words, even if this...
Action foo = () => { obj.Foo(); };
...requires the creation of one reference type object (to support the closure, and give the delegate something to reference), why does it require the creation of two (the closure-supporting object plus the Action delegate)?
*Yes, yes, implementation detail, I know! All I really mean is short-term memory storage.
The question boils down to this: the CLI (Common Language Infrastructure) specification says that delegates are reference types. Why is this so?
One reason is clearly visible in the .NET Framework today. In the original design, there were two kinds of delegates: normal delegates and "multicast" delegates, which could have more than one target in their invocation list. The MulticastDelegate class inherits from Delegate. Since you can't inherit from a value type, Delegate had to be a reference type.
In the end, all actual delegates ended up being multicast delegates, but at that stage in the process, it was too late to merge the two classes. See this blog post about this exact topic:
We abandoned the distinction between Delegate and MulticastDelegate towards the end of V1. At that time, it would have been a massive change to merge the two classes so we didn't do so. You should pretend that they are merged and that only MulticastDelegate exists.
In addition, delegates currently have 4-6 fields, all pointers. 16 bytes is usually considered the upper bound where saving memory still wins out over extra copying. A 64-bit MulticastDelegate takes up 48 bytes. Given this, and the fact that they were already using inheritance, a class was the natural choice.
There is only one reason that Delegate needs to be a class, but it's a big one: while a delegate could be small enough to allow efficient storage as a value type (8 bytes on 32-bit systems, or 16 bytes on 64-bit systems), there's no way it could be small enough to guarantee that if one thread writes a delegate while another thread executes it, the executing thread won't end up invoking either the old method on the new target, or the new method on the old target. Allowing such a thing to occur would be a major security hole. Having delegates be reference types avoids this risk, because swapping a single reference is atomic.
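A sketch of that hazard (FakeDelegate is hypothetical, just to show the two words involved):

// If a delegate were this two-word struct, 'd = other' would be two
// memory writes rather than one atomic reference store:
struct FakeDelegate
{
    public object Target;  // word 1: the 'this' for the wrapped method
    public IntPtr Method;  // word 2: the code address
}

// Thread A: d = new FakeDelegate { Target = t2, Method = m2 };
// Thread B, reading d concurrently, could observe { Target = t2, Method = m1 }:
// the new target paired with the old method.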
Actually, even better than having delegates be structure types would be having them be interfaces. Creating a closure requires creating two heap objects: a compiler-generated object to hold any closed-over variables, and a delegate to invoke the proper method on that object. If delegates were interfaces, the object which held the closed-over variables could itself be used as the delegate, with no other object required.
Imagine if delegates were value types.
public delegate void Notify();

void SignalTwice(Notify notify) { notify(); notify(); }

int counter = 0;
Notify handler = () => { counter++; };
SignalTwice(handler);
System.Console.WriteLine(counter); // what should this print?
Per your proposal, this would internally be converted to
struct CompilerGenerated
{
    public int counter;
    public void Execute() { ++counter; }
}

Notify handler = new CompilerGenerated();
SignalTwice(handler);
System.Console.WriteLine(counter); // what should this print?
If delegates were value types, then SignalTwice would get a copy of handler, which means that a brand-new CompilerGenerated would be created (a copy of handler) and passed to SignalTwice. SignalTwice would execute the delegate twice, which increments the counter twice in the copy. Then SignalTwice returns, and the function prints 0, because the original was not modified.
Here's an uninformed guess:
If delegates were implemented as value types, instances would be very expensive to copy around, since a delegate instance is relatively heavy. Perhaps MS felt it would be safer to design them as immutable reference types: copying a machine-word-sized reference to an instance is relatively cheap.
A delegate instance needs, at the very least:
An object reference (the "this" reference for the wrapped method if it is an instance method).
A pointer to the wrapped function.
A reference to the object containing the multicast invocation list. Note that a delegate-type should support, by design, multicast using the same delegate type.
Let's assume that value-type delegates were implemented in a similar manner to the current reference-type implementation (this is perhaps somewhat unreasonable; a different design may well have been chosen to keep the size down) to illustrate. Using Reflector, here are the fields required in a delegate instance:
System.Delegate: _methodBase, _methodPtr, _methodPtrAux, _target
System.MulticastDelegate: _invocationCount, _invocationList
If implemented as a struct (no object header), these would add up to 24 bytes on x86 and 48 bytes on x64, which is massive for a struct.
On another note, I want to ask how, in your proposed design, making the CompilerGenerated closure-type a struct helps in any way. Where would the created delegate's object pointer point to? Leaving the closure type instance on the stack without proper escape analysis would be extremely risky business.
I would say that making delegates reference types was definitely a bad design choice. They could have been value types and still support multicast delegates.
Imagine that Delegate is a struct composed of, let's say:
object target;
pointer to the method
It can be a struct, right?
The boxing will only occur if the target is a struct (but the delegate itself will not be boxed).
You may think it will not support MulticastDelegate, but then we can:
Create a new object that will hold the array of normal delegates.
Return a Delegate (as a struct) pointing to that new object, which will implement Invoke by iterating over all its values and calling Invoke on them.
So, for normal delegates that are never going to call two or more handlers, it could work as a struct.
Unfortunately, that is not going to change in .NET.
As a side note, variance does not require delegates to be reference types. The parameters of the delegate should be reference types. After all, if you pass a string where an object is required (for input, not ref or out), no cast is needed, as a string is already an object.
I saw this interesting conversation on the Internet:
Immutable doesn't mean it has to be a value type. And something that is a value type is not required to be immutable. The two often go hand-in-hand, but they are not actually the same thing, and there are in fact counter-examples of each in the .NET Framework (the String class, for example).
And the answer:
The difference being that while immutable reference types are reasonably common and perfectly reasonable, making value types mutable is almost always a bad idea, and can result in some very confusing behaviour!
Taken from here
So, in my opinion the decision was made by language usability aspects, and not by compiler technological difficulties. I love nullable delegates.
I guess one reason is support for multicast delegates. Multicast delegates are more complex than simply a few fields indicating target and method.
Another thing that's only possible in this form is delegate variance. This kind of variance requires a reference conversion between the two types.
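For example (this conversion allocates nothing; it is a reference conversion, so both variables end up pointing at the same delegate object):

Func<string> fs = () => "hello";
Func<object> fo = fs;                       // covariant reference conversion
Console.WriteLine(ReferenceEquals(fs, fo)); // True: the very same object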
Interestingly, F# defines its own function type that's similar to delegates, but more lightweight. I'm not sure, though, whether it's a value or a reference type.

Is there a name for this pattern? (C# compile-time type-safety with "params" args of different types)

Is there a name for this pattern?
Let's say you want to create a method that takes a variable number of arguments, each of which must be one of a fixed set of types (in any order or combination), and some of those types you have no control over. A common approach would be to have your method take arguments of type Object, and validate the types at runtime:
void MyMethod(params object[] args)
{
    foreach (object arg in args)
    {
        if (arg is SomeType)
            DoSomethingWith((SomeType) arg);
        else if (arg is SomeOtherType)
            DoSomethingElseWith((SomeOtherType) arg);
        // ... etc.
        else
            throw new Exception("bogus arg");
    }
}
However, let's say that, like me, you're obsessed with compile-time type safety, and want to be able to validate your method's argument types at compile time. Here's an approach I came up with:
void MyMethod(params MyArg[] args)
{
    // ... etc.
}

struct MyArg
{
    public readonly object TheRealArg;

    private MyArg(object obj) { this.TheRealArg = obj; }

    // For each type (represented below by "GoodType") that you want your
    // method to accept, define an implicit cast operator as follows:
    public static implicit operator MyArg(GoodType x)
    {
        return new MyArg(x);
    }
}
The implicit casts allow you to pass arguments of valid types directly to your routine, without having to explicitly cast or wrap them. If you try to pass a value of an unacceptable type, the error will be caught at compile time.
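Here is a usage sketch with two concrete operators filled in (string and int are stand-ins for whatever your "good types" are):

struct MyArg
{
    public readonly object TheRealArg;
    private MyArg(object obj) { this.TheRealArg = obj; }

    public static implicit operator MyArg(string s) { return new MyArg(s); }
    public static implicit operator MyArg(int i) { return new MyArg(i); }
}

void MyMethod(params MyArg[] args) { /* ... */ }

MyMethod("text", 42);       // compiles: both argument types convert implicitly
// MyMethod(DateTime.Now);  // compile-time error: no conversion defined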
I'm sure others have used this approach, so I'm wondering if there's a name for this pattern.
There doesn't seem to be a named pattern on the Interwebs, but based on Ryan's comment to your question, I vote the name of the pattern should be Variadic Typesafety.
Generally, I would use it very sparingly, but I'm not judging the pattern as good or bad. Many of the commenters have made good points pro and con, which we see for other patterns such as Factory, Service Locator, Dependency Injection, MVVM, etc. It's all about context. So here's a stab at that...
Context
A variable set of disparate objects must be processed.
Use When
Your method can accept a variable number of arguments of disparate types that don't have a common base type.
Your method is widely used (i.e. in many places in code and/or by a great number of users of your framework). The point being that the type-safety provides enough of a benefit to warrant its use.
The arguments can be passed in any order, yet the set of disparate types is finite and the only set acceptable to the method.
Expressiveness is your design goal and you don't want to put the onus on the user to create wrappers or adapters (see Alternatives).
Implementation
You already provided that.
Examples
LINQ to XML (e.g. new XElement(...))
Other builders such as those that build SQL parameters.
Processor facades (e.g. those that could accept different types of delegates or command objects from different frameworks) to execute the commands without the need to create explicit command adapters.
Alternatives
Adapter. Accept a variable number of arguments of some adapter type (e.g. Adapter<T> or subclasses of a non-generic Adapter) that the method can work with to produce the desired results. This widens the set your method can use (the types are no longer finite), but nothing is lost if the adapter does the right thing to enable the processing to still work. The disadvantage is that the user has the additional burden of specifying existing and/or creating new adapters, which perhaps detracts from the intent (i.e. adds "ceremony" and weakens "essence").
Remove Type Safety. This entails accepting a very base type (e.g. Object) and placing runtime checks. Burden on what to know to pass is passed to the user, but the code is still expressive. Errors don't reveal themselves until runtime.
Composite. Pass a single object that is a composite of others. This requires a pre-method-call build up, but devolves back to using one of the above patterns for the items in the composite's collection.
Fluent API. Replace the single call with a series of specific calls, one for each type of acceptable argument. A canonical example is StringBuilder.
This is actually an anti-pattern, more commonly known as Poltergeist.
Update:
If the types of the args are always constant and the order does not matter, then create overloads that each take a collection (IEnumerable<T>), changing T in each method to the type you need to operate on (a sketch follows this list). This will reduce your code complexity by:

removing the MyArg class;
eliminating the need for type casting in your MyMethod;
adding additional compile-time type safety, in that if you try to call the method with a list of args that it can't handle, you will get a compile-time error.
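A sketch of that alternative, reusing SomeType/SomeOtherType from the question:

void MyMethod(IEnumerable<SomeType> args)
{
    foreach (var arg in args) DoSomethingWith(arg);
}

void MyMethod(IEnumerable<SomeOtherType> args)
{
    foreach (var arg in args) DoSomethingElseWith(arg);
}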
This looks like a subset of the Composite Pattern. Quoting from Wikipedia:
The composite pattern describes that a group of objects are to be treated in the same way as a single instance of an object.

What is the Implementation of Generics for the NET Common Language Runtime

When you use generic collections in C# (or .NET in general), does the compiler basically do the legwork developers used to have to do of making a collection for a specific type? So basically... it just saves us work?
Now that I think about it, that can't be right. Because without generics, we used to have to make collections that used a non-generic array internally, and so there was boxing and unboxing (if it was a collection of value types), etc.
So, how are generics rendered in CIL? What does it do to implement a generic collection when we say we want one? I don't necessarily want CIL code examples (though that would be OK); I want to know the concepts of how the compiler takes our generic collections and renders them.
Thanks!
P.S. I know that I could use ildasm to look at this, but CIL still looks like Chinese to me, and I am not ready to tackle that. I just want the concepts of how C# (and I guess other languages too) render generics in CIL.
Forgive my verbose post, but this topic is quite broad. I'm going to attempt to describe what the C# compiler emits and how that's interpreted by the JIT compiler at runtime.
ECMA-335 (it's a really well written design document; check it out) is where it's at for knowing how everything, and I mean everything, is represented in a .NET assembly. There are a few related CLI metadata tables for generic information in an assembly:
GenericParam - Stores information about a generic parameter (index, flags, name, owning type/method).
GenericParamConstraint - Stores information about a generic parameter constraint (owning generic parameter, constraint type).
MethodSpec - Stores instantiated generic method signatures (e.g. Bar.Method<int> for Bar.Method<T>).
TypeSpec - Stores instantiated generic type signatures (e.g. Bar<int> for Bar<T>).
So with this in mind, let's walk through a simple example using this class:
class Foo<T>
{
    public T SomeProperty { get; set; }
}
When the C# compiler compiles this example, it will define Foo in the TypeDef metadata table, like it would for any other type. Unlike a non-generic type, it will also have an entry in the GenericParam table that will describe its generic parameter (index = 0, flags = ?, name = (index into String heap, "T"), owner = type "Foo").
One of the columns of data in the TypeDef table is the starting index into the MethodDef table that is the continuous list of methods defined on this type. For Foo, we've defined three methods: a getter and a setter to SomeProperty and a default constructor supplied by the compiler. As a result, the MethodDef table would hold a row for each of these methods. One of the important columns in the MethodDef table is the "Signature" column. This column stores a reference to a blob of bytes that describes the exact signature of the method. ECMA-335 goes into great detail about these metadata signature blobs, so I won't regurgitate that information here.
The method signature blob contains type information about the parameters as well as the return value. In our example, the setter takes a T and the getter returns a T. Well, what is a T then? In the signature blob, it's going to be a special value that means "the generic type parameter at index 0". This means the row in the GenericParams table that has index=0 with owner=type "Foo", which is our "T".
The same thing goes for the auto-property backing store field. Foo's entry in the TypeDef table will have a starting index into the Field table and the Field table has a "Signature" column. The field's signature will denote that the field's type is "the generic type parameter at index 0".
This is all well and good, but where does the code generation come into play when T is different types? It's actually the responsibility of the JIT compiler to generate the code for the generic instantiations and not the C# compiler.
Let's take a look at an example:
Foo<int> f1 = new Foo<int>();
f1.SomeProperty = 10;
Foo<string> f2 = new Foo<string>();
f2.SomeProperty = "hello";
This will compile to something like this CIL:
newobj <MemberRefToken1> // new Foo<int>()
stloc.0 // Store in local "f1"
ldloc.0 // Load local "f1"
ldc.i4.s 10 // Load a constant 32-bit integer with value 10
callvirt <MemberRefToken2> // Call f1.set_SomeProperty(10)
newobj <MemberRefToken3> // new Foo<string>()
stloc.1 // Store in local "f2"
ldloc.1 // Load local "f2"
ldstr <StringToken> // Load "hello" (which is in the user string heap)
callvirt <MemberRefToken4> // Call f2.set_SomeProperty("hello")
So what's this MemberRefToken business? A MemberRefToken is a metadata token (tokens are four-byte values, with the most significant byte being a metadata table identifier and the remaining three bytes being a 1-based row number) that references a row in the MemberRef metadata table. This table stores a reference to a method or field. Before generics, this was the table that would store information about methods/fields you use from types defined in referenced assemblies. However, it can also be used to reference a member on a generic instantiation. So let's say that MemberRefToken1 refers to the first row in the MemberRef table. It might contain this data: class = TypeSpecToken1, name = ".ctor", blob = <reference to expected signature blob of .ctor>.
TypeSpecToken1 would refer to the first row in the TypeSpec table. From above we know this table stores the instantiations of generic types. In this case, this row would contain a reference to a signature blob for "Foo<int>". So this MemberRefToken1 is really saying we are referencing "Foo<int>.ctor()".
MemberRefToken1 and MemberRefToken2 would share the same class value, i.e. TypeSpecToken1. They would differ, however, in the name and signature blob (MemberRefToken2 would be for "set_SomeProperty"). Likewise, MemberRefToken3 and MemberRefToken4 would share TypeSpecToken2, the instantiation of "Foo<string>", but differ in the name and blob in the same way.
When the JIT compiler compiles the above CIL, it notices that it's seeing a generic instantiation it hasn't seen before (i.e. Foo<int> or Foo<string>). What happens next is covered pretty well by Shiv Kumar's answer, so I won't repeat it in detail here. Simply put, when the JIT compiler encounters a new instantiated generic type, it may emit a whole new type into its type system with a field layout using the actual types in the instantiation in place of the generic parameters. They would also have their own method tables and JIT compilation of each method would involve replacing references to the generic parameters with the actual types from the instantiation. It's also the responsibility of the JIT compiler to enforce correctness and verifiability of the CIL.
So to sum up: C# compiler emits metadata describing what's generic and how generic types/methods are instantiated. The JIT compiler uses this information to emit new types (assuming it isn't compatible with an existing instantiation) at runtime for instantiated generic types and each type will have its own copy of the code that has been JIT compiled based on the actual types used in the instantiation.
Hopefully this made sense in some small way.
For value types, a specific "class" is defined at runtime for each value-type instantiation of a generic class. For reference types, there is only one class definition that is reused across the different types.
I'm simplifying here, but that's the concept.
Design and Implementation of Generics for the .NET Common Language Runtime
Our scheme runs roughly as follows: When the runtime requires a particular instantiation of a parameterized class, the loader checks to see if the instantiation is compatible with any that it has seen before; if not, then a field layout is determined and a new vtable is created, to be shared between all compatible instantiations. The items in this vtable are entry stubs for the methods of the class. When these stubs are later invoked, they will generate (“just-in-time”) code to be shared for all compatible instantiations. When compiling the invocation of a (non-virtual) polymorphic method at a particular instantiation, we first check to see if we have compiled such a call before for some compatible instantiation; if not, then an entry stub is generated, which will in turn generate code to be shared for all compatible instantiations.

Two instantiations are compatible if for any parameterized class its compilation at these instantiations gives rise to identical code and other execution structures (e.g. field layout and GC tables), apart from the dictionaries described below in Section 4.4. In particular, all reference types are compatible with each other, because the loader and JIT compiler make no distinction for the purposes of field layout or code generation. On the implementation for the Intel x86, at least, primitive types are mutually incompatible, even if they have the same size (floats and ints have different parameter passing conventions). That leaves user-defined struct types, which are compatible if their layout is the same with respect to garbage collection, i.e. they share the same pattern of traced pointers.
http://research.microsoft.com/pubs/64031/designandimplementationofgenerics.pdf
