How do generics implement structs?


I was thinking about this. Classes are obviously passed around by pointer. I suspect structs are passed around by copying, but I don't know for sure. (It seems like a waste for an int array to have every element be a pointer, and to pass pointers for ints.)
But thinking about it, List<MyStruct> cannot know the size of my struct. What happens when I do this? Are there multiple copies of "List`1", and every time I use it with a storage size it does not yet have, does it create a new implementation (adjusting for the new offsets of T and such)?
That could make sense, since the source would be in the CIL inside a DLL. But I am completely guessing. How is it done? Perhaps a reference or page number in the ECMA standards?

Generics use the concept of open and closed generic types. A parameterized generic class definition (e.g. List<T>) is an open generic type, from which the runtime generates a closed generic type for each different use in your code; that is, a different type is created for List<int> and for List<MyStruct>. For each closed generic type, the size and type of T are known at run time.
Clarification from MSDN:
When a generic type or method is compiled into Microsoft intermediate language (MSIL), it contains metadata that identifies it as having type parameters. How the MSIL for a generic type is used differs based on whether the supplied type parameter is a value type or reference type.
When a generic type is first constructed with a value type as a parameter, the runtime creates a specialized generic type with the supplied parameter or parameters substituted in the appropriate locations in the MSIL. Specialized generic types are created one time for each unique value type that is used as a parameter.
Generics work somewhat differently for reference types. The first time a generic type is constructed with any reference type, the runtime creates a specialized generic type with object references substituted for the parameters in the MSIL. Then, every time that a constructed type is instantiated with a reference type as its parameter, regardless of what type it is, the runtime reuses the previously created specialized version of the generic type. This is possible because all references are the same size.

The CLR compiles 1 version of the generic class and uses it for all reference types. It also compiles 1 version for every value type usage to optimize for performance.
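The open/closed distinction described in these answers can be observed through reflection. The following is only a minimal sketch; the class name ClosedTypesDemo and the helper SameDefinition are made up for illustration. It shows that List<int> and List<string> are distinct closed types built from one open definition. It does not show whether the native code is shared, which is a JIT detail that reflection does not expose.

```csharp
using System;
using System.Collections.Generic;

public static class ClosedTypesDemo
{
    // Hypothetical helper: true when both closed types come from the same open definition.
    public static bool SameDefinition(Type a, Type b) =>
        a.GetGenericTypeDefinition() == b.GetGenericTypeDefinition();

    public static void Main()
    {
        Type open = typeof(List<>);            // open type: T is still unbound
        Type ofInt = typeof(List<int>);        // closed over a value type
        Type ofString = typeof(List<string>);  // closed over a reference type

        Console.WriteLine(open.ContainsGenericParameters);   // True
        Console.WriteLine(ofInt.ContainsGenericParameters);  // False
        Console.WriteLine(ofInt == ofString);                // False
        Console.WriteLine(SameDefinition(ofInt, ofString));  // True
    }
}
```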

Related

cast objects to a known interface with generic type parameter where that type parameter is given as type variable at runtime

I am doing some reflection stuff. I have the following situation:
I have a variable Type myType with some runtime value (e.g. string or List<int>).
I have two variables object a and object b for which I know that they are of type IImmutableSet<myType>. (So if myType=string they'd be IImmutableSet<string>, if myType=List<int> they'd be IImmutableSet<List<int>> and so on.)
How can I cast them?
Motivation:
I want to compare a and b by content, i.e.:
check that both their size is equal. For that, I need a Count property (or equivalent property/method).
check that a contains all elements of b. For that, I need a Contains(...) method.
If I had ISet<T>s, I'd just cast them to ICollection and use that interface's Count and Contains(...).
But since IImmutableSet<T> does not implement any non-generic collection interface, I need to cast a and b to IImmutableSet<myType> (or IReadOnlyCollection<myType>). That syntax doesn't work because it expects a compile-time constant type.
--
Another thought... if casting isn't possible, I'd be happy with being able to call the said methods. Since I'm already doing heavy reflection, I don't care about speed.
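One possible way to do this, sketched under assumptions: instead of casting, write a strongly typed generic helper and close it over myType with MethodInfo.MakeGenericMethod. The names ImmutableSetComparer, ContentEqualsCore and ContentEquals are hypothetical, and the sketch assumes both objects really do implement IImmutableSet<T> for the given element type.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;
using System.Reflection;

public static class ImmutableSetComparer
{
    // Strongly typed worker: same size, and every element of b is contained in a.
    private static bool ContentEqualsCore<T>(IImmutableSet<T> a, IImmutableSet<T> b) =>
        a.Count == b.Count && b.All(item => a.Contains(item));

    // Closes the worker over a Type known only at run time, then invokes it.
    public static bool ContentEquals(Type elementType, object a, object b)
    {
        MethodInfo open = typeof(ImmutableSetComparer).GetMethod(
            nameof(ContentEqualsCore), BindingFlags.NonPublic | BindingFlags.Static);
        MethodInfo closed = open.MakeGenericMethod(elementType);
        return (bool)closed.Invoke(null, new[] { a, b });
    }

    public static void Main()
    {
        object a = ImmutableHashSet.Create("x", "y");
        object b = ImmutableHashSet.Create("y", "x");
        Console.WriteLine(ContentEquals(typeof(string), a, b)); // True
    }
}
```

The reflection cost is paid once per call to ContentEquals; everything inside the worker runs as ordinary typed code.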

When does type checking of generic definitions and instantiations happen in C#?

In C#,
does type checking of generic definitions happen at compile time?
does type checking of instantiations of generics happen at run time?
Thanks.
The above questions are for me to understand the quotes in bold from C# in a Nutshell:
However, with C# generics, producer types (i.e., open types such as
List<T>) can be compiled into a library (such as mscorlib.dll).
This works because the synthesis between the producer and the
consumer that produces closed types doesn’t actually happen until
runtime.
To dig deeper into why this is the case, consider the Max method in
C#, once more:
static T Max <T> (T a, T b) where T : IComparable<T>
=> a.CompareTo (b) > 0 ? a : b;
Why couldn’t we have implemented it like this?
static T Max <T> (T a, T b)
=> (a > b ? a : b); // Compile error
The reason is that Max needs to be compiled once and work for all
possible values of T . Compilation cannot succeed, because there is
no single meaning for > across all values of T —in fact, not
every T even has a > operator.
I also have the same question for Java.

Both generic definitions and instantiations are checked at compile time. Further, they can be checked separately. Unlike C++ where you may have an error in your template that you don't discover until you later try to instantiate it, in C#, any compile errors in a generic declaration will be found when the declaration itself is compiled.
The magic that enables this that C++ lacks is constraints. This is what the example is showing you.
When you define a generic method or class, you can put constraints on the type parameters. Those limit which instantiations are allowed but also determine what operations you can take advantage of in the body of the generic declaration.
When the declaration is compiled, the compiler checks that you don't do anything with a type parameter that its constraints don't allow. So, for example, you'd get an error here:
int Foo<T>(T a, T b) => a.CompareTo(b);
You're trying to call CompareTo on a, whose type is T. The compiler has no way of knowing that a user will only instantiate Foo with types that do have that method, so it pessimistically assumes Foo could be instantiated with a type that lacks it and refuses to compile this declaration.
When you change it to:
int Foo<T>(T a, T b) where T : IComparable => a.CompareTo(b);
Now it knows every type used for T must have a CompareTo() method, so it compiles this.
Later when someone tries to instantiate Foo with some type, if the type does not implement IComparable, they get a compile error. Since the method says "You can only use me with types that implement IComparable", the compiler ensures they meet that constraint.
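To make both checks concrete, here is a small sketch that reuses the Max method quoted in the question; the wrapper class ConstraintDemo is a made-up name. The declaration compiles because of the constraint, and each call site is checked against that same constraint.

```csharp
using System;

public static class ConstraintDemo
{
    // Compiles once: the constraint guarantees every T has CompareTo(T).
    public static T Max<T>(T a, T b) where T : IComparable<T> =>
        a.CompareTo(b) > 0 ? a : b;

    public static void Main()
    {
        Console.WriteLine(Max(3, 7));            // 7
        Console.WriteLine(Max("pear", "apple")); // pear

        // The next line would be rejected at compile time, because
        // object does not implement IComparable<object>:
        // Max(new object(), new object());
    }
}
```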

The main purpose of type-checking is to detect errors at compile-time i.e. before the software is used. This prevents bugs from reaching the users.
C# generic types are checked at compile-time. See the Benefits of Generics.
C# also does type-checking at run-time in certain situations, but that is too late to prevent bugs - the application is already running and being used.

Well, the compiler checks the usage of generic arguments at compile time and generates a "generic-aware" type; there are special instructions in IL for this kind of operation. The C# compiler uses the constraints in the generic definition to decide which operations are allowed on a generic argument. So the answer to the first question is yes, it does check the generic argument type. When a regular type is defined from a generic type, the compiler checks whether the type argument satisfies all constraints, and if so, another, internal type is generated for that combination of generic type and arguments. That generated type is used for all other instances of the same combination of generic type and argument type, but this second phase happens at run time, when creation of a particular generic type with a given argument is requested.
Once that type is created, it is just a regular, non-generic type, even if its base class is generic, and the compiler uses its usual approach to check types for any instance of it.

Why is specialization in C# generics limited?

The question "What is reification?" has a comment on C#'s generics:
Type information is maintained, which allows specialization to an extent, by examining type arguments using reflection. However, the degree of specialization is limited, as a result of the fact that a generic type definition is compiled before any reification happens (this is done by compiling the definition against the constraints on the type parameters - thus, the compiler has to be able "understand" the definition even in the absence of specific type arguments).
What does it mean by "specialization"? Is it not the same as instantiation of a generic type with a specific type argument?
What does it mean by "the degree of specialization is limited"?
Why is it "a result of the fact that a generic type definition is compiled before any reification happens"?

What does it mean by "specialization"? Is it not the same as instantiation of a generic type with a specific type argument?
Author explains in the portion of his answer dedicated to Java generics that
specialization of a generic type [is] the ability to use specialized source code for any particular generic argument combination.
In other words, it is an ability to do something special if a generic type parameter is of a specific type. Supplying an implementation of List<T> that represents individual elements as bits when you instantiate the type as List<bool> would be an example of specialization.
What does it mean by "the degree of specialization is limited"?
Author means that although you can write things like
if (typeof(T) == typeof(bool)) {
...
}
your abilities to respond to a combination of type arguments are limited, because any decision on a type combination has to be made at run-time.
Why is it "a result of the fact that a generic type definition is compiled before any reification happens"?
Because reification is done in CLR, well after C# compiler is out of the picture. The compiler must produce a generic type definition for CLR to use as a "template" for making closed constructed types for instances of a generic class.
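A small sketch of the limited, run-time style of specialization described above. Describer<T> and its output strings are invented for illustration. There is a single generic body, and the bool-specific branch is selected with a typeof test rather than by supplying separate source code for the bool instantiation.

```csharp
using System;

public static class Describer<T>
{
    // One generic body; the "specialized" branch is chosen by a run-time type test.
    public static string Describe(T value)
    {
        if (typeof(T) == typeof(bool))
        {
            // Cast through object: the compiler cannot prove that T is bool here.
            bool flag = (bool)(object)value;
            return flag ? "bit:1" : "bit:0";
        }
        return "generic:" + value;
    }
}

public static class SpecializationDemo
{
    public static void Main()
    {
        Console.WriteLine(Describer<bool>.Describe(true)); // bit:1
        Console.WriteLine(Describer<int>.Describe(42));    // generic:42
    }
}
```

Note that the storage layout of Describer<bool> cannot be changed this way; only the behavior inside the shared definition can branch on T.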

I believe the meaning is as follows:
When you define a generic type e.g. MyGenericType<T> your definition has to make sense for any value of T, as the generic type is compiled before you actually use it in a specific implementation ("the degree of specialization is limited, as a result of the fact that a generic type definition is compiled before any reification happens").
Later on, when you actually use a MyGenericType<int> the compiler/jit will create a new class which is pretty much MyGenericType<T> with every mention of T replaced with int. This is the process of reification. This means that at runtime, you can use the fact that the generic type is using an int, but your ability to make use of this (specialisation) is limited, since when you defined MyGenericType<T> you didn't know this.

Specialization is used as the antonym of generalization. When you create a generic type, you generalize a type definition. When you instantiate it with a type argument, you specialize the compiled generic type so that objects of that type can be created at run time.
The generic type is compiled to IL. At run time, this compiled generic type is combined with a specific type argument to produce the specified class.
Yes, specialization is the same as instantiating a generic type with a specific type argument at run time.
With generics come constraints, which restrict the scope of the generic type. You can require that T be a struct, a class, have a specific base class, and so on. You cannot create an instance with a type argument that the constraints on the generic type do not allow.
You can instantiate the same generic type definition with int, string, or another class, as long as it satisfies the constraints on the generic class.
You cannot directly create an object of the class while T has not yet been replaced by a concrete type (a primitive type like int, string, or your own class or interface), and the code inside must be compatible with whatever type is passed in as T for it to work.
Refer (Links from same question you mentioned above):
.NET Generics and Code Bloat
Generics are not templates (as in C++)

how virtual generic method call is implemented?

I'm interested in how the CLR implements calls like this:
abstract class A {
public abstract void Foo<T, U, V>();
}
A a = ...
a.Foo<int, string, decimal>(); // <=== ?
Does this call cause some kind of hash map lookup, with the type parameter tokens as the keys and the compiled generic method specializations (one for all reference types and different code for each value type) as the values?

I didn't find much exact information about this, so much of this answer is based on the excellent paper on .Net generics from 2001 (even before .Net 1.0 came out!), one short note in a follow-up paper and what I gathered from SSCLI v. 2.0 source code (even though I wasn't able to find the exact code for calling virtual generic methods).
Let's start simple: how is a non-generic non-virtual method called? By directly calling the method code, so the compiled code contains direct address. The compiler gets the method address from the method table (see next paragraph). Can it be that simple? Well, almost. The fact that methods are JITed makes it a little more complicated: what is actually called is either code that compiles the method and only then executes it, if it wasn't compiled yet; or it's one instruction that directly calls the compiled code, if it already exists. I'm going to ignore this detail further on.
Now, how is a non-generic virtual method called? Similar to polymorphism in languages like C++, there is a method table accessible from the this pointer (reference). Each derived class has its own method table and its methods there. So, to call a virtual method, get the reference to this (passed in as a parameter), from there, get the reference to the method table, look at the correct entry in it (the entry number is constant for specific function) and call the code the entry points to. Calling methods through interfaces is slightly more complicated, but not interesting for us now.
Now we need to know about code sharing. Code can be shared between two “instances” of the same method, if reference types in type parameters correspond to any other reference types, and value types are exactly the same. So, for example C<string>.M<int>() shares code with C<object>.M<int>(), but not with C<string>.M<byte>(). There is no difference between type parameters of types and type parameters of methods. (The original paper from 2001 mentions that code can be shared also when both parameters are structs with the same layout, but I'm not sure this is true in the actual implementation.)
Let's make an intermediate step on our way to generic methods: non-generic methods in generic types. Because of code sharing, we need to get the type parameters from somewhere (e.g. for calling code like new T[]). For this reason, each instantiation of generic type (e.g. C<string> and C<object>) has its own type handle, which contains the type parameters and also method table. Ordinary methods can access this type handle (technically a structure confusingly called MethodTable, even though it contains more than just the method table) from the this reference. There are two types of methods that can't do that: static methods and methods on value types. For those, the type handle is passed in as a hidden argument.
For non-virtual generic methods, the type handle is not enough, so they get a different hidden argument, MethodDesc, that contains the type parameters. Also, the compiler can't store the instantiations in the ordinary method table, because that's static. So it creates a second, different method table for generic methods, which is indexed by type parameters, and gets the method address from there, if it already exists with compatible type parameters, or creates a new entry.
Virtual generic methods are now simple: the compiler doesn't know the concrete type, so it has to use the method table at runtime. And the normal method table can't be used, so it has to look in the special method table for generic methods. Of course, the hidden parameter containing type parameters is still present.
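From the C# side, the two lookups are invisible; what can be observed is only that the override is picked by the run-time type of the receiver while the instantiation is picked by the type argument at the call site. A minimal sketch, with made-up names Shape and Circle:

```csharp
using System;

public abstract class Shape
{
    // Virtual generic method: the override depends on the run-time type of 'this',
    // the instantiation on the type argument supplied by the caller.
    public abstract string Describe<T>();
}

public sealed class Circle : Shape
{
    public override string Describe<T>() => "Circle<" + typeof(T).Name + ">";
}

public static class VirtualGenericDemo
{
    public static void Main()
    {
        Shape s = new Circle();
        Console.WriteLine(s.Describe<int>());    // Circle<Int32>
        Console.WriteLine(s.Describe<string>()); // Circle<String>
    }
}
```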
One interesting tidbit learned while researching this: because the JITer is very lazy, the following (completely useless) code works:
object Lift<T>(int count) where T : new()
{
    if (count == 0)
        return new T();
    return Lift<List<T>>(count - 1);
}
The equivalent C++ code causes the compiler to give up with a stack overflow.
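For anyone who wants to try it, here is the same method wrapped in a runnable program; the class name LiftDemo and the call with count 3 are illustrative choices. Each recursive call requests a new instantiation (List<int>, List<List<int>>, and so on), and the runtime only creates the ones that are actually reached.

```csharp
using System;
using System.Collections.Generic;

public static class LiftDemo
{
    // Each level of recursion wraps T in one more List<>.
    public static object Lift<T>(int count) where T : new()
    {
        if (count == 0)
            return new T();
        return Lift<List<T>>(count - 1);
    }

    public static void Main()
    {
        object result = Lift<int>(3);
        // Three levels of nesting were instantiated on demand.
        Console.WriteLine(result.GetType() == typeof(List<List<List<int>>>)); // True
    }
}
```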

Yes. The code for a specific type is generated at run time by the CLR, which keeps a hashtable (or similar) of implementations.
Page 372 of CLR via C#:
When a method that uses generic type parameters is JIT-compiled, the CLR takes the method's IL, substitutes the specified type arguments, and then creates native code that is specific to that method operating on the specified data types. This is exactly what you want and is one of the main features of generics. However, there is a downside to this: the CLR keeps generating native code for every method/type combination. This is referred to as code explosion. This can end up increasing the application's working set substantially, thereby hurting performance.
Fortunately, the CLR has some optimizations built into it to reduce code explosion. First, if a method is called for a particular type argument, and later, the method is called again using the same type argument, the CLR will compile the code for this method/type combination just once. So if one assembly uses List<DateTime>, and a completely different assembly (loaded in the same AppDomain) also uses List<DateTime>, the CLR will compile the methods for List<DateTime> just once. This reduces code explosion substantially.

EDIT
I have now come across https://msdn.microsoft.com/en-us/library/sbh15dya.aspx, which clearly states that generics instantiated with reference types reuse the same code, so I would accept that as the definitive authority.
ORIGINAL ANSWER
I am seeing here two disagreeing answers, and both have references to their side, so I will try to add my two cents.
First, CLR via C# by Jeffrey Richter, published by Microsoft Press, is as valid as an MSDN blog, especially as the blog is already outdated (for more books from him, take a look at http://www.amazon.com/Jeffrey-Richter/e/B000APH134; one must agree that he is an expert on Windows and .NET).
Now let me do my own analysis.
Clearly, two generic types that have different reference type arguments cannot share the same code.
For example, List<TypeA> and List<TypeB> cannot share the same code, as this would make it possible to add an object of TypeA to a List<TypeB> via reflection, and the CLR is strongly typed on generics as well (unlike Java, in which only the compiler validates generics and the underlying JVM has no clue about them).
And this applies not only to types but to methods as well, since, for example, a generic method with type parameter T can create an object of type T (nothing prevents it from creating a new List<T>), in which case reusing the same code would cause havoc.
Furthermore, the GetType method is not overridable, and it in fact always returns the correct generic type, proving that each type argument has indeed its own code.
(This point is even more important than it looks, as the CLR and JIT work based on the type object created for that object, obtained via GetType(), which simply means that for each type argument there must be a separate type object, even for reference types.)
Another issue would result from code reuse: the is and as operators would no longer work correctly, and in general all kinds of casting would have serious problems.
NOW TO ACTUAL TESTING:
I tested it with a generic type that contained a static member, and then created two objects with different type parameters. The static fields were clearly not shared, clearly proving that code is not shared even for reference types.
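The experiment is easy to reproduce. The sketch below uses invented names (Counter<T>, StaticsDemo) and is not the original test code. Note what it does and does not establish: it shows that each closed type gets its own static storage, which is a statement about type identity. On its own it says nothing about whether the JIT-compiled native code is shared between reference-type instantiations.

```csharp
using System;

public static class Counter<T>
{
    public static int Value; // one copy per closed type Counter<...>
}

public static class StaticsDemo
{
    // Writes through two reference-type instantiations and reads both back.
    public static int[] Run()
    {
        Counter<string>.Value = 1;
        Counter<object>.Value = 2;
        return new[] { Counter<string>.Value, Counter<object>.Value };
    }

    public static void Main()
    {
        int[] values = Run();
        Console.WriteLine(values[0] + " " + values[1]); // 1 2
    }
}
```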
EDIT:
See http://blogs.msdn.com/b/csharpfaq/archive/2004/03/12/how-do-c-generics-compare-to-c-templates.aspx on how it is implemented:
Space Use
The use of space is different between C++ and C#. Because C++ templates are done at compile time, each use of a different type in a template results in a separate chunk of code being created by the compiler.
In the C# world, it's somewhat different. The actual implementations using a specific type are created at runtime. When the runtime creates a type like List<int>, the JIT will see if that has already been created. If it has, it merely uses that code. If not, it will take the IL that the compiler generated and do appropriate replacements with the actual type.
That's not quite correct. There is a separate native code path for every value type, but since reference types are all reference-sized, they can share their implementation.
This means that the C# approach should have a smaller footprint on disk, and in memory, so that's an advantage for generics over C++ templates.
In fact, the C++ linker implements a feature known as “template folding”, where the linker looks for native code sections that are identical, and if it finds them, folds them together. So it's not as clear-cut as it would seem to be.
As one can see, the CLR "can" reuse the implementation for reference types, as do current C++ compilers. However, there is no guarantee of that, and for unsafe code using stackalloc and pointers it is probably not the case, and there might be other situations as well.
However, what we do have to know is that in the CLR type system they are treated as different types: they have separate calls to static constructors, separate static fields, and separate type objects, and an object with type argument T1 should not be able to access a private field of another object with type argument T2 (although an object can indeed access private fields of another object of the same type).

What is the Implementation of Generics for the .NET Common Language Runtime

When you use generic collections in C# (or .NET in general), does the compiler basically do the legwork developers used to have to do of making a collection for a specific type? So basically... it just saves us work?
Now that I think about it, that can't be right. Without generics, we used to have to make collections that used a non-generic array internally, and so there was boxing and unboxing (if it was a collection of value types), etc.
So, how are generics rendered in CIL? What does it do to implement a generic collection of something when we ask for one? I don't necessarily want CIL code examples (though that would be OK); I want to know the concepts of how the compiler takes our generic collections and renders them.
Thanks!
P.S. I know that I could use ildasm to look at this, but CIL still looks like Chinese to me, and I am not ready to tackle that. I just want the concepts of how C# (and other languages too, I guess) is rendered in CIL to handle generics.

Forgive my verbose post, but this topic is quite broad. I'm going to attempt to describe what the C# compiler emits and how that's interpreted by the JIT compiler at runtime.
ECMA-335 (it's a really well written design document; check it out) is where it's at for knowing how everything, and I mean everything, is represented in a .NET assembly. There are a few related CLI metadata tables for generic information in an assembly:
GenericParam - Stores information about a generic parameter (index, flags, name, owning type/method).
GenericParamConstraint - Stores information about a generic parameter constraint (owning generic parameter, constraint type).
MethodSpec - Stores instantiated generic method signatures (e.g. Bar.Method<int> for Bar.Method<T>).
TypeSpec - Stores instantiated generic type signatures (e.g. Bar<int> for Bar<T>).
So with this in mind, let's walk through a simple example using this class:
class Foo<T>
{
    public T SomeProperty { get; set; }
}
When the C# compiler compiles this example, it will define Foo in the TypeDef metadata table, like it would for any other type. Unlike a non-generic type, it will also have an entry in the GenericParam table that will describe its generic parameter (index = 0, flags = ?, name = (index into String heap, "T"), owner = type "Foo").
One of the columns of data in the TypeDef table is the starting index into the MethodDef table that is the continuous list of methods defined on this type. For Foo, we've defined three methods: a getter and a setter to SomeProperty and a default constructor supplied by the compiler. As a result, the MethodDef table would hold a row for each of these methods. One of the important columns in the MethodDef table is the "Signature" column. This column stores a reference to a blob of bytes that describes the exact signature of the method. ECMA-335 goes into great detail about these metadata signature blobs, so I won't regurgitate that information here.
The method signature blob contains type information about the parameters as well as the return value. In our example, the setter takes a T and the getter returns a T. Well, what is a T then? In the signature blob, it's going to be a special value that means "the generic type parameter at index 0". This means the row in the GenericParam table that has index=0 with owner=type "Foo", which is our "T".
The same thing goes for the auto-property backing store field. Foo's entry in the TypeDef table will have a starting index into the Field table and the Field table has a "Signature" column. The field's signature will denote that the field's type is "the generic type parameter at index 0".
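The "generic type parameter at index 0" idea can be seen without reading metadata tables, because reflection surfaces the same information. This is a sketch only: it reuses the Foo<T> class from the example (made public here), and MetadataDemo with its helper ParameterPosition is a hypothetical name.

```csharp
using System;
using System.Reflection;

public class Foo<T>
{
    public T SomeProperty { get; set; }
}

public static class MetadataDemo
{
    // Position of the first generic parameter of an open generic type.
    public static int ParameterPosition(Type openType) =>
        openType.GetGenericArguments()[0].GenericParameterPosition;

    public static void Main()
    {
        Type open = typeof(Foo<>);
        Type t = open.GetGenericArguments()[0];
        Console.WriteLine(t.Name);                   // T
        Console.WriteLine(ParameterPosition(open));  // 0

        // On the open type, the property's type is still the placeholder T.
        PropertyInfo openProp = open.GetProperty("SomeProperty");
        Console.WriteLine(openProp.PropertyType.IsGenericParameter); // True

        // On a closed type, the placeholder has been replaced by the argument.
        PropertyInfo closedProp = typeof(Foo<int>).GetProperty("SomeProperty");
        Console.WriteLine(closedProp.PropertyType == typeof(int));   // True
    }
}
```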
This is all well and good, but where does the code generation come into play when T is different types? It's actually the responsibility of the JIT compiler to generate the code for the generic instantiations and not the C# compiler.
Let's take a look at an example:
Foo<int> f1 = new Foo<int>();
f1.SomeProperty = 10;
Foo<string> f2 = new Foo<string>();
f2.SomeProperty = "hello";
This will compile to something like this CIL:
newobj <MemberRefToken1> // new Foo<int>()
stloc.0 // Store in local "f1"
ldloc.0 // Load local "f1"
ldc.i4.s 10 // Load a constant 32-bit integer with value 10
callvirt <MemberRefToken2> // Call f1.set_SomeProperty(10)
newobj <MemberRefToken3> // new Foo<string>()
stloc.1 // Store in local "f2"
ldloc.1 // Load local "f2"
ldstr <StringToken> // Load "hello" (which is in the user string heap)
callvirt <MemberRefToken4> // Call f2.set_SomeProperty("hello")
So what's this MemberRefToken business? A MemberRefToken is a metadata token (tokens are four byte values with the most-significant-byte being a metadata table identifier and the remaining three bytes are the row number, 1-based) that references a row in the MemberRef metadata table. This table stores a reference to a method or field. Before generics, this is the table that would store information about methods/fields you're using from types defined in referenced assemblies. However, it can also be used to reference a member on a generic instantiation. So let's say that MemberRefToken1 refers to the first row in the MemberRef table. It might contain this data: class = TypeSpecToken1, name = ".ctor", blob = <reference to expected signature blob of .ctor>.
TypeSpecToken1 would refer to the first row in the TypeSpec table. From above we know this table stores the instantiations of generic types. In this case, this row would contain a reference to a signature blob for "Foo<int>". So this MemberRefToken1 is really saying we are referencing "Foo<int>.ctor()".
MemberRefToken1 and MemberRefToken2 would share the same class value, i.e. TypeSpecToken1. They would differ, however, on the name and signature blob (MemberRefToken2 would be for "set_SomeProperty"). Likewise, MemberRefToken3 and MemberRefToken4 would share TypeSpecToken2, the instantiation of "Foo<string>", but differ on the name and blob in the same way.
When the JIT compiler compiles the above CIL, it notices that it's seeing a generic instantiation it hasn't seen before (i.e. Foo<int> or Foo<string>). What happens next is covered pretty well by Shiv Kumar's answer, so I won't repeat it in detail here. Simply put, when the JIT compiler encounters a new instantiated generic type, it may emit a whole new type into its type system with a field layout using the actual types in the instantiation in place of the generic parameters. They would also have their own method tables and JIT compilation of each method would involve replacing references to the generic parameters with the actual types from the instantiation. It's also the responsibility of the JIT compiler to enforce correctness and verifiability of the CIL.
So to sum up: C# compiler emits metadata describing what's generic and how generic types/methods are instantiated. The JIT compiler uses this information to emit new types (assuming it isn't compatible with an existing instantiation) at runtime for instantiated generic types and each type will have its own copy of the code that has been JIT compiled based on the actual types used in the instantiation.
Hopefully this made sense in some small way.

For value types, there is a specific "class" defined at run time for each value type generic class. For reference types there is only one class definition that is reused across the different types.
I'm simplifying here, but that's the concept.
Design and Implementation of Generics for the .NET Common Language Runtime
Our scheme runs roughly as follows:
When the runtime requires a particular instantiation of a parameterized class, the loader checks to see if the instantiation is compatible with any that it has seen before; if not, then a field layout is determined and new vtable is created, to be shared between all compatible instantiations. The items in this vtable are entry stubs for the methods of the class. When these stubs are later invoked, they will generate (“just-in-time”) code to be shared for all compatible instantiations.
When compiling the invocation of a (non-virtual) polymorphic method at a particular instantiation, we first check to see if we have compiled such a call before for some compatible instantiation; if not, then an entry stub is generated, which will in turn generate code to be shared for all compatible instantiations.
Two instantiations are compatible if for any parameterized class its compilation at these instantiations gives rise to identical code and other execution structures (e.g. field layout and GC tables), apart from the dictionaries described below in Section 4.4. In particular, all reference types are compatible with each other, because the loader and JIT compiler make no distinction for the purposes of field layout or code generation. On the implementation for the Intel x86, at least, primitive types are mutually incompatible, even if they have the same size (floats and ints have different parameter passing conventions). That leaves user-defined struct types, which are compatible if their layout is the same with respect to garbage collection i.e. they share the same pattern of traced pointers.
http://research.microsoft.com/pubs/64031/designandimplementationofgenerics.pdf
