I have been playing with Reflector and Reflexil, but when I created a static method, it didn't work. On inspection, I found that methods have two different parameters/flags: IsStatic and HasThis.
What is the difference? Or is there no difference and one of the flags is simply unused? I have looked at extension methods and constructors, but extension methods are marked as normal static methods, and constructors as normal member methods (with regard to these two flags).
Reflexil displays two sets of flags together: method attributes from the method definition in the case of IsStatic, and calling conventions from the method signature in the case of HasThis.
Method attributes contain general information about a specific method, like its accessibility, abstract/virtual/sealed status etc. (e.g. Static), while the signature is what the method takes and returns and how, which can be separated into several calling conventions.
The managed calling conventions are DEFAULT, VARARG, HASTHIS and EXPLICITTHIS. HASTHIS (instance in CIL) simply means that the this instance is internally passed as the first argument to the method (referenced by ldarg.0).
HASTHIS is useful in places where you would only be able to use a method signature, like in function pointers or the calli instruction (both unavailable in C# before version 9, though available in C++/CLI). However, EXPLICITTHIS would be more appropriate on these occasions.
The Static flag may not be needed for methods, but is certainly required for fields, as they have no calling convention, so probably consistency is the reason.
So, conceptually they are a bit different, but technically mean the same thing.
While in theory a non-static method may not need a this reference, setting both flags at the same time is prohibited, and ilasm doesn't allow me to construct a method with both flags on or off, setting them both based only on the presence of the static keyword.
Extension methods are only a C# thing, the "this" reference is the actual first parameter of the method and the rest is syntactic sugar.
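Both flags can also be observed from C# through reflection. The following is a minimal sketch, assuming that MethodBase.IsStatic mirrors the Static method attribute and that MethodBase.CallingConvention mirrors the signature's calling convention; the Sample and FlagsDemo types are hypothetical names used only for illustration.

```csharp
using System;
using System.Reflection;

// Hypothetical type with one instance method and one static method.
class Sample
{
    public void Instance() { }
    public static void Static() { }
}

static class FlagsDemo
{
    // Reports both flags for the named public method on Sample.
    public static string Describe(string name)
    {
        MethodInfo m = typeof(Sample).GetMethod(name);
        // IsStatic comes from the method attributes,
        // HasThis from the calling convention of the signature.
        bool hasThis = (m.CallingConvention & CallingConventions.HasThis) != 0;
        return $"IsStatic={m.IsStatic}, HasThis={hasThis}";
    }

    static void Main()
    {
        Console.WriteLine(Describe("Instance"));
        Console.WriteLine(Describe("Static"));
    }
}
```

For methods produced by the C# compiler the two flags should always come out as opposites, which matches the observation that ilasm sets both from the static keyword alone.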
This question already has answers here:
Why are extension methods only allowed in non-nested, non-generic static class?
(3 answers)
Closed 9 years ago.
I understand that C# extension methods must be static. What I don't understand is why these extensions can't be defined in non-static classes or generic ones.
Update: I am interested in the reason behind this design decision.
This is more of an observation than an answer, but...
When you call an instance method, a reference to the object you are calling is pushed onto the stack as the first argument in your method call. That first argument is "this" and is done implicitly.
When you define an extension method, you explicitly define a "this" as the first argument.
Is it possible that method resolution would be confusing if you could define extension methods and instance methods in the same class, i.e. define methods with the same name and, in effect, the same parameters once the "this" parameter is included?
Take a look at this piece of the C# specification:
When the first parameter of a method includes the this modifier, that
method is said to be an extension method. Extension methods can only
be declared in non-generic, non-nested static classes. The first
parameter of an extension method can have no modifiers other than
this, and the parameter type cannot be a pointer type.
And this fragment from Jon Skeet's answer:
It's not clear to me why all of these restrictions are necessary -
other than potentially for compiler (and language spec) simplicity. I
can see why it makes sense to restrict it to non-generic types, but I
can't immediately see why they have to be non-nested and static. I
suspect it makes the lookup rules considerably simpler if you don't
have to worry about types contained within the current type etc, but I
dare say it would be possible.
Because the spec says so... Now there are probably good reasons why they wrote the spec this way.
The reason why they can't be declared in generic classes is quite obvious: given the way extension methods are called, where would you specify the type argument for the class?
The reason why it must be a static class is less obvious, but I think it makes sense. The main use case for static classes is to group helper methods together (e.g. Path, Directory, ProtectedData...), and extension methods are basically helper methods. It wouldn't make sense to be able to create an instance of Enumerable or Queryable, for example.
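The restriction applies to the containing class, not to the method: an extension method may itself be generic, and its type argument is then inferred from the receiver. A small sketch, with SequenceExtensions and FirstOrFallback as made-up names for illustration:

```csharp
using System;
using System.Collections.Generic;

// Non-generic, non-nested static class: the only place C# allows extension methods.
static class SequenceExtensions
{
    // The method is generic; T is inferred from the receiver at the call site,
    // so there is never a class-level type argument to specify.
    public static T FirstOrFallback<T>(this IEnumerable<T> source, T fallback)
    {
        foreach (T item in source)
            return item;
        return fallback;
    }
}

static class GenericExtensionDemo
{
    static void Main()
    {
        var empty = new List<int>();
        Console.WriteLine(empty.FirstOrFallback(42));
        // The same method called as a plain static method.
        Console.WriteLine(SequenceExtensions.FirstOrFallback(new[] { "a" }, "z"));
    }
}
```

If the class were declared as SequenceExtensions<T>, the call empty.FirstOrFallback(42) would have nowhere to name the class's T, which is the problem described above.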
I'm interested in how the CLR implements calls like this:
abstract class A {
public abstract void Foo<T, U, V>();
}
A a = ...
a.Foo<int, string, decimal>(); // <=== ?
Does this call cause some kind of hash map lookup, with the type parameter tokens as the keys and the compiled generic method specializations (one for all reference types and different code for each value type) as the values?
I didn't find much exact information about this, so much of this answer is based on the excellent paper on .Net generics from 2001 (even before .Net 1.0 came out!), one short note in a follow-up paper and what I gathered from SSCLI v. 2.0 source code (even though I wasn't able to find the exact code for calling virtual generic methods).
Let's start simple: how is a non-generic non-virtual method called? By directly calling the method code, so the compiled code contains direct address. The compiler gets the method address from the method table (see next paragraph). Can it be that simple? Well, almost. The fact that methods are JITed makes it a little more complicated: what is actually called is either code that compiles the method and only then executes it, if it wasn't compiled yet; or it's one instruction that directly calls the compiled code, if it already exists. I'm going to ignore this detail further on.
Now, how is a non-generic virtual method called? Similar to polymorphism in languages like C++, there is a method table accessible from the this pointer (reference). Each derived class has its own method table and its methods there. So, to call a virtual method, get the reference to this (passed in as a parameter), from there, get the reference to the method table, look at the correct entry in it (the entry number is constant for specific function) and call the code the entry points to. Calling methods through interfaces is slightly more complicated, but not interesting for us now.
Now we need to know about code sharing. Code can be shared between two “instances” of the same method if reference types in type parameters correspond to any other reference types, and value types are exactly the same. So, for example, C<string>.M<int>() shares code with C<object>.M<int>(), but not with C<string>.M<byte>(). There is no difference between type parameters of a type and type parameters of a method. (The original paper from 2001 mentions that code can be shared also when both parameters are structs with the same layout, but I'm not sure this is true in the actual implementation.)
Let's make an intermediate step on our way to generic methods: non-generic methods in generic types. Because of code sharing, we need to get the type parameters from somewhere (e.g. for calling code like new T[]). For this reason, each instantiation of generic type (e.g. C<string> and C<object>) has its own type handle, which contains the type parameters and also method table. Ordinary methods can access this type handle (technically a structure confusingly called MethodTable, even though it contains more than just the method table) from the this reference. There are two types of methods that can't do that: static methods and methods on value types. For those, the type handle is passed in as a hidden argument.
For non-virtual generic methods, the type handle is not enough and so they get different hidden argument, MethodDesc, that contains the type parameters. Also, the compiler can't store the instantiations in the ordinary method table, because that's static. So it creates a second, different method table for generic methods, which is indexed by type parameters, and gets the method address from there, if it already exists with compatible type parameters, or creates a new entry.
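The sharing rule and the lookup keyed by type parameters can be modelled in a few lines. This is only a toy model under my own assumptions, not the CLR's actual data structure: SharingModel, Canon, Key and GetOrCompile are all hypothetical names, and an integer id stands in for a compiled method body.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SharingModel
{
    // Toy stand-in for "any reference type"; not a real CLR API.
    sealed class Canon { }

    // Reference types collapse to one placeholder; value types stay distinct.
    public static string Key(string method, params Type[] typeArgs) =>
        method + "<" + string.Join(",", typeArgs.Select(
            t => t.IsValueType ? t.FullName : nameof(Canon))) + ">";

    // Maps a sharing key to the id of the (pretend) compiled body.
    static readonly Dictionary<string, int> Compiled = new Dictionary<string, int>();

    // Returns the existing body for compatible type arguments, or creates one.
    public static int GetOrCompile(string method, params Type[] typeArgs)
    {
        string key = Key(method, typeArgs);
        if (!Compiled.TryGetValue(key, out int id))
        {
            id = Compiled.Count + 1;
            Compiled[key] = id;
        }
        return id;
    }

    static void Main()
    {
        // string and object share one body; int and byte each get their own.
        Console.WriteLine(GetOrCompile("M", typeof(string)) == GetOrCompile("M", typeof(object)));
        Console.WriteLine(GetOrCompile("M", typeof(int)) == GetOrCompile("M", typeof(byte)));
    }
}
```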
Virtual generic methods are now simple: the compiler doesn't know the concrete type, so it has to use the method table at runtime. And the normal method table can't be used, so it has to look in the special method table for generic methods. Of course, the hidden parameter containing type parameters is still present.
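For reference, here is a runnable version of the call from the question. The derived class B and the string return value are my additions so that the result can be observed; the dispatch itself is the virtual generic call discussed above.

```csharp
using System;

abstract class A
{
    // Returns a string (instead of void) only so the result can be printed.
    public abstract string Foo<T, U, V>();
}

class B : A
{
    // The override sees the concrete type arguments chosen by the caller.
    public override string Foo<T, U, V>() =>
        $"B.Foo<{typeof(T).Name}, {typeof(U).Name}, {typeof(V).Name}>";
}

static class VirtualGenericDemo
{
    // The static type is A, so the target is resolved at run time.
    public static string Call(A a) => a.Foo<int, string, decimal>();

    static void Main()
    {
        Console.WriteLine(Call(new B()));
    }
}
```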
One interesting tidbit learned while researching this: because the JITer is very lazy, the following (completely useless) code works:
object Lift<T>(int count) where T : new()
{
    if (count == 0)
        return new T();
    return Lift<List<T>>(count - 1);
}
The equivalent C++ code causes the compiler to give up with a stack overflow.
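To try it, the method can be wrapped in a class and called with a small count. In this sketch LiftDemo is a hypothetical wrapper, and object is used as the starting type argument because it satisfies the new() constraint.

```csharp
using System;
using System.Collections.Generic;

static class LiftDemo
{
    // Each recursive call names a new instantiation, Lift<List<T>>,
    // which the runtime only creates when that call actually executes.
    public static object Lift<T>(int count) where T : new()
    {
        if (count == 0)
            return new T();
        return Lift<List<T>>(count - 1);
    }

    static void Main()
    {
        object result = Lift<object>(3);
        // Three levels of nesting were created on demand.
        Console.WriteLine(result.GetType() == typeof(List<List<List<object>>>));
    }
}
```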
Yes. The code for specific type is generated at the runtime by CLR and keeps a hashtable (or similar) of implementations.
Page 372 of CLR via C#:
When a method that uses generic type
parameters is JIT-compiled, the CLR
takes the method's IL, substitutes the
specified type arguments, and then
creates native code that is specific
to that method operating on the
specified data types. This is exactly
what you want and is one of the main
features of generics. However, there
is a downside to this: the CLR keeps
generating native code for every
method/type combination. This is
referred to as code explosion. This
can end up increasing the
application's working set
substantially, thereby hurting
performance.
Fortunately, the CLR has some
optimizations built into it to reduce
code explosion. First, if a method is
called for a particular type argument,
and later, the method is called again
using the same type argument, the CLR
will compile the code for this
method/type combination just once. So
if one assembly uses List,
and a completely different assembly
(loaded in the same AppDomain) also
uses List, the CLR will
compile the methods for List
just once. This reduces code explosion
substantially.
EDIT
I have now come across https://msdn.microsoft.com/en-us/library/sbh15dya.aspx which clearly states that generics instantiated with reference types reuse the same code, so I would accept that as the definitive authority.
ORIGINAL ANSWER
I am seeing here two disagreeing answers, and both have references to their side, so I will try to add my two cents.
First, CLR via C# by Jeffrey Richter, published by Microsoft Press, is as valid as an MSDN blog, especially as the blog is already outdated (for more of his books, take a look at http://www.amazon.com/Jeffrey-Richter/e/B000APH134; one must agree that he is an expert on Windows and .NET).
Now let me do my own analysis.
Clearly, two generic types that contain different reference type arguments cannot share the same code.
For example, List<TypeA> and List<TypeB> cannot share the same code, as this would make it possible to add an object of TypeA to a List<TypeB> via reflection, and the CLR is strongly typed on generics as well (unlike Java, in which only the compiler validates generics and the underlying JVM has no clue about them).
And this does not apply only to types, but to methods as well, since for example a generic method with type parameter T can create an object of type T (for example, nothing prevents it from creating a new List<T>), in which case reusing the same code would cause havoc.
Furthermore, the GetType method is not overridable, and it in fact always returns the correct generic type, proving that each type argument does indeed have its own code.
(This point is even more important than it looks, as the CLR and JIT work based on the type object created for that object, obtained via GetType(), which simply means that for each type argument there must be a separate type object, even for reference types.)
Another issue would result from code reuse: the is and as operators would no longer work correctly, and in general all kinds of casting would have serious problems.
NOW TO ACTUAL TESTING:
I have tested it with a generic type that contained a static member: I created two objects with different type parameters, and the static fields were clearly not shared, proving that code is not shared even for reference types.
EDIT:
See http://blogs.msdn.com/b/csharpfaq/archive/2004/03/12/how-do-c-generics-compare-to-c-templates.aspx on how it is implemented:
Space Use
The use of space is different between C++ and C#. Because C++
templates are done at compile time, each use of a different type in a
template results in a separate chunk of code being created by the
compiler.
In the C# world, it's somewhat different. The actual implementations
using a specific type are created at runtime. When the runtime creates
a type like List, the JIT will see if that has already been
created. If it has, it merely uses that code. If not, it will take
the IL that the compiler generated and do appropriate replacements
with the actual type.
That's not quite correct. There is a separate native code path for
every value type, but since reference types are all reference-sized,
they can share their implementation.
This means that the C# approach should have a smaller footprint on
disk, and in memory, so that's an advantage for generics over C++
templates.
In fact, the C++ linker implements a feature known as “template
folding“, where the linker looks for native code sections that are
identical, and if it finds them, folds them together. So it's not a
clear-cut as it would seem to be.
As one can see, the CLR "can" reuse the implementation for reference types, as do current C++ compilers. However, there is no guarantee of that; for unsafe code using stackalloc and pointers it is probably not the case, and there might be other situations as well.
However, what we do have to know is that in the CLR type system they are treated as different types: each has its own static constructor call, separate static fields, and a separate type object, and an object with type argument T1 should not be able to access a private field of another object with type argument T2 (although an object can indeed access private fields of another object of exactly the same type).
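The static-field test described above can be reproduced with a few lines. This is a sketch with made-up names (Counter<T>, StaticsDemo); note that it shows the two instantiations are distinct types with separate static storage, which by itself says nothing about whether their native code is shared.

```csharp
using System;

// Each closed type (Counter<string>, Counter<object>, ...) gets its own Count.
static class Counter<T>
{
    public static int Count;
}

static class StaticsDemo
{
    // Writes to two instantiations and reads both values back.
    public static (int, int) Run()
    {
        Counter<string>.Count = 1;
        Counter<object>.Count = 2;
        return (Counter<string>.Count, Counter<object>.Count);
    }

    static void Main()
    {
        Console.WriteLine(Run());
        // The type objects differ as well.
        Console.WriteLine(typeof(Counter<string>) == typeof(Counter<object>));
    }
}
```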
I was just curious to know how extension methods are hooked up to the original class. I know that in IL code it results in a call to a static method, but how does it do that, and why doesn't it break encapsulation?
They don't "hook up".
The Visual Studio IDE just makes it look like they do by showing them in the IntelliSense lists.
The compiler "knows" how to deal with the references in order to make the right method calls with the correct parameters.
This is simply syntactic sugar: the methods are ordinary static methods on a separate static class. The this modifier tells the compiler to add the ExtensionAttribute to the method (and to its containing class and assembly) to mark it as an extension method.
Since extension methods do not in fact change the class and can only access public members on it, encapsulation is retained.
From MSDN:
Extension methods are a special kind of static method, but they are called as if they were instance methods on the extended type.
(emphasis mine)
Extension methods are specified by putting the this keyword in front of the first parameter of a static method:
public static void SomeExtension(this string s)
{
...
}
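That is syntactic sugar for decorating the method with System.Runtime.CompilerServices.ExtensionAttribute. Conceptually, the compiled method looks like this (the C# compiler will not let you write the attribute yourself; it reports an error and tells you to use the this keyword instead):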
[Extension]
public static void SomeExtension(string s)
{
...
}
When the compiler sees that attribute, it knows to translate the extension method call to the appropriate static method call, passing the instance as the first parameter.
Since the calls are just normal static method calls, there is no chance to break encapsulation; the methods, like all static methods, only have access to the public interfaces of the extended types.
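The translation is easy to see by writing both forms side by side. In this sketch StringExtensions and Shout are hypothetical names used only for illustration:

```csharp
using System;

static class StringExtensions
{
    // Hypothetical helper: upper-cases the string and appends "!".
    public static string Shout(this string s) => s.ToUpperInvariant() + "!";
}

static class ExtensionCallDemo
{
    static void Main()
    {
        // Extension syntax...
        Console.WriteLine("abc".Shout());
        // ...is compiled to the same plain static call as this line.
        Console.WriteLine(StringExtensions.Shout("abc"));
    }
}
```

Inside Shout only the members that any other outside code could reach are available, so the class's private state stays private.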
Extension methods are just syntactic sugar, they are just static methods. You are only able to access public fields or properties in them, just like normal static methods.
The key ingredient is that an instance method of a class isn't fundamentally different from a static method. With one small detail, they have a hidden argument. For example, the String.IndexOf(char) method actually looks like this to the CLR:
public static int IndexOf(string thisRef, char value) {
// etc...
}
The thisRef argument is what supplies the string reference whenever you use this in your code or access a member of the class. As you can see, it is a very small step from an extension method to an instance method. No changes were necessary in the CLR to support the feature.
One other minor difference is that the compiler emits code that checks if this is null for an instance method but does not do so for an extension method. You can call an extension method on a null object. While that might look like a feature, it is actually a restriction induced by the extension method not actually being a member of the class.
Internally, the CLR keeps a list of methods for the class, the MethodTable. Extension methods are not in them, preventing the compiler from emitting the callvirt IL instruction, the 'trick' that it uses to get the cheap null check. Explicitly emitting code to make the null check would have been possible but they elected not to do so. Not quite sure why.
Another automatic consequence of this is that an extension method cannot be virtual.
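The null behaviour is easy to demonstrate. A small sketch, with NullExtensions, IsMissing and InstanceCallThrows as made-up names: the extension method runs with a null receiver, while an instance call on the same reference throws.

```csharp
using System;

static class NullExtensions
{
    // Runs even when s is null, because s is just an ordinary argument.
    public static bool IsMissing(this string s) => s == null || s.Length == 0;
}

static class NullReceiverDemo
{
    // Returns true if calling an instance method on s throws.
    public static bool InstanceCallThrows(string s)
    {
        try
        {
            s.Trim(); // instance call: the receiver is null-checked
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    static void Main()
    {
        string nothing = null;
        Console.WriteLine(nothing.IsMissing());          // extension call on null
        Console.WriteLine(InstanceCallThrows(nothing));  // instance call on null
    }
}
```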
I think you should have a look at http://go.microsoft.com/fwlink/?LinkId=112388
I wrote an extension method for String to get a char argument, string.Remove(char). But when I used this, it instead called the default string.Remove(int) method.
Shouldn't the presence of an actual method have higher priority than an implicit conversion?
Instance methods have priority over extension methods. Your observation is proof of the same.
When resolving which method to call, it will always pick a matching instance method over an extension method... which is intuitive in a way.
Paraphrased from C# in depth,
When the compiler sees that you're
trying to call a method which looks
like an instance method but is unable
to find one, it then looks for
extension methods (that are visible
based on your using directives). In
case of multiple candidates as the
target extension method, the one with
"better conversion" similar to
overloading (e.g. if IChild and IBase
both have a similar extension method
defined.. IChild.ExtensionMethod is
chosen)
There is also a hidden code-breaker. Let's say TypeA didn't have SecretMethod as an instance method in Lib v1.0, so you write an extension method SecretMethod. If the author introduces an instance method of the same name and signature in v2.0 (sans the this param), and you recompile your source against the latest-and-greatest Lib v2.0, all existing calls to the extension method would now silently be routed to the new instance method.
This behavior is correct. The reason is that introducing an extension method should not change the way existing code executes. Code should behave exactly the same with or without this "superfluous" extension method. It may seem counter-intuitive in certain cases (like yours), but happens for a reason.
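The same effect can be reproduced without string.Remove. In this sketch Widget and WidgetExtensions are hypothetical types: the instance method takes an int, the extension method takes a char, and a char argument still binds to the instance method because an implicit char-to-int conversion makes it applicable.

```csharp
using System;

class Widget
{
    // Applicable to a char argument through the implicit char -> int conversion.
    public string Describe(int value) => "instance(int)";
}

static class WidgetExtensions
{
    // Exact match for a char argument, but only considered when
    // no instance method is applicable.
    public static string Describe(this Widget w, char value) => "extension(char)";
}

static class PriorityDemo
{
    static void Main()
    {
        var w = new Widget();
        // Binds to the instance method, not the extension method.
        Console.WriteLine(w.Describe('a'));
        // The extension method is still reachable through static syntax.
        Console.WriteLine(WidgetExtensions.Describe(w, 'a'));
    }
}
```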