Why does the C# compiler still compile code in unused methods?

Say I have two methods, and one calls the other. The second method has code that will generate a compile-time error. Since it's not called, why does the compiler still bother to process it?
void method1()
{
    var i = 1;
    //method2();
}
void method2()
{
    int i = "2";
}

You can't be sure that someone else won't call that method at runtime using reflection. Your code MUST compile or it's not valid code - if it's never used... comment it out!
To expand on this:
Basically, at compile time C# is strongly typed: the compiler type-checks everything to ensure that what you are trying to do is legal. However, you can still get exceptions at run time due to null references, bad casts, etc.
Reflection is a component of the .NET Framework that allows a developer to inspect the properties, fields, methods, etc. of an assembly's types via the assembly metadata.
Reflection allows runtime type discovery and inspection of these types; it also allows invocation of methods/properties, modification of fields, etc. (You can even create new generic types at runtime, or completely new types altogether.)
In other words, you can't guarantee that code you think won't be called isn't called somewhere else at some point. For reflection to be possible, every bit of code needs to be valid and compilable.
Whether that code will succeed at runtime is another story - but that's why we have exception handling.
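To make the reflection point concrete, here is a minimal sketch (Widget, Secret and the CallPrivate helper are hypothetical names for illustration). Secret is never called anywhere in the C# source, yet it can still be invoked at runtime through System.Reflection, which is only possible because its body was compiled like any other method.

```csharp
using System;
using System.Reflection;

public class Widget
{
    // Never called anywhere in the C# source.
    private int Secret(int x) => x * 10;
}

public static class ReflectionCaller
{
    // Hypothetical helper: looks up a non-public instance method by name and invokes it.
    public static object CallPrivate(object target, string methodName, params object[] args)
    {
        MethodInfo method = target.GetType().GetMethod(
            methodName, BindingFlags.Instance | BindingFlags.NonPublic);
        if (method == null)
            throw new MissingMethodException(target.GetType().FullName, methodName);
        return method.Invoke(target, args);
    }
}
```

Calling ReflectionCaller.CallPrivate(new Widget(), "Secret", 4) returns 40, even though no C# code ever calls Secret directly.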

And then what if somebody else uses your compiled code later on and decides to use it?
Even with private methods, reflection can complicate matters.
If you don't use it, lose it. (or at least comment it out)

Related

Instantiating object with full name

Quick theoretical question. I have the following code:
if (partnership != null && partnership.UseCustomNotifier)
{
    //some behavior
}
else
{
    Integration.Protocol.Client.Notifier protocolNotifier = new Integration.Protocol.Client.Notifier();
}
I have two implementations for partnership that are chosen using reflection. Integration.Protocol is not in the usings. The implementation should be chosen dynamically; the thing is, if I comment out that last line (the instantiation of protocolNotifier), it will only choose one implementation (the one that does not come from Integration.Protocol, because it is the only one available). Otherwise, it will be chosen dynamically using reflection.
I know that this code sucks (I've improved it already), but I was curious about why this behavior was occurring. I would guess that when the solution compiles before running, it checks the line where protocolNotifier is instantiated and adds the using at compile time. Is this correct? Does it only happen within the scope of the method? Or the whole class? I am curious about how the .NET compiler works in those situations.

If I understand you correctly:
partnership is an object of a type that is chosen by using reflection to find all available classes with that name (we'll call it Partnership) and creating an object of the found type
The Integration.Protocol namespace has a class called Partnership (or whatever it's actually called), but there is also another class called Partnership in either this project's code or some other library that you're already using elsewhere
Depending on if you include that last line or not, your reflection code picks one or the other
If I'm correct in that, then I believe the behaviour you're seeing is simply because it's loading the Integration.Protocol library (assuming that is a separate DLL and not actually part of the current project code).
If you show us the code that sets partnership, then we can confirm this. But if I'm correct, then if you don't use Integration.Protocol in that last line, that library simply isn't loaded, and your reflection code won't find anything from it.
It's not a matter of "adding the using at compile time", because using directives just allow you to omit the namespace when referring to classes. They have nothing to do with whether libraries are loaded at runtime or not.
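If the partnership type is found by scanning the assemblies loaded in the current AppDomain (an assumption, since that code wasn't shown), the effect described above is easy to reproduce. The sketch below, with a hypothetical TypeFinder.FindLoadedTypesNamed helper, only ever sees assemblies that are already loaded; a library that nothing has touched yet is simply not in the list. Referencing one of its types from code that runs (as the protocolNotifier line presumably does) is one way the CLR ends up loading it; calling Assembly.Load explicitly is a more deliberate one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class TypeFinder
{
    // Hypothetical helper: returns every type with the given simple name
    // among the assemblies that are currently loaded into this AppDomain.
    // Assemblies that have not been loaded yet are invisible to it.
    public static List<Type> FindLoadedTypesNamed(string simpleName)
    {
        var result = new List<Type>();
        foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
        {
            Type[] types;
            try
            {
                types = assembly.GetTypes();
            }
            catch (ReflectionTypeLoadException ex)
            {
                // Keep whatever types could be loaded; unloadable ones come back as null.
                types = ex.Types.OfType<Type>().ToArray();
            }
            result.AddRange(types.Where(t => t.Name == simpleName));
        }
        return result;
    }
}
```

Comparing the result before and after the Integration.Protocol assembly has been loaded would be one way to confirm the diagnosis.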

Why can't Mono support generic interface instantiation with AOT?

The Mono documentation has a code example about full AOT not supporting generic interface instantiation:
interface IFoo<T> {
...
void SomeMethod();
}
It says:
Since Mono has no way of determining from the static analysis what method will implement the interface IFoo<int>.SomeMethod, this particular pattern is not supported.
So I think the compiler can't work with this method under type inference. But I still can't understand the underlying reason for the full AOT limitation.
There is still a similar problem with the Unity AOT script restrictions. In the following code:
using UnityEngine;
using System;
public class AOTProblemExample : MonoBehaviour, IReceiver
{
    public enum AnyEnum {
        Zero,
        One,
    }
    void Start() {
        // Subtle trigger: The type of manager *must* be
        // IManager, not Manager, to trigger the AOT problem.
        IManager manager = new Manager();
        manager.SendMessage(this, AnyEnum.Zero);
    }
    public void OnMessage<T>(T value) {
        Debug.LogFormat("Message value: {0}", value);
    }
}
public class Manager : IManager {
    public void SendMessage<T>(IReceiver target, T value) {
        target.OnMessage(value);
    }
}
public interface IReceiver {
    void OnMessage<T>(T value);
}
public interface IManager {
    void SendMessage<T>(IReceiver target, T value);
}
I am confused by this:
The AOT compiler does not realize that it should generate code for the generic method OnMessage with a T of AnyEnum, so it blissfully continues, skipping this method. When that method is called, and the runtime can’t find the proper code to execute, it gives up with this error message.
Why does the AOT compiler not know the type when the JIT compiler can infer it? Can anyone offer a detailed answer?

Before describing the issues, consider this excerpt from another answer of mine that describes the generics situation on platforms that do support dynamic code generation:
In C# generics, the generic type definition is maintained in memory at runtime. Whenever a new concrete type is required, the runtime environment combines the generic type definition and the type arguments and creates the new type (reification). So we get a new type for each combination of the type arguments, at runtime.
The phrase at runtime is key to this, because it brings us to another point:
This implementation technique depends heavily on runtime support and JIT-compilation (which is why you often hear that C# generics have some limitations on platforms like iOS, where dynamic code generation is restricted).
So is it possible for a full AOT compiler to do that as well? It most certainly is possible. But is it easy?
There is a paper from Microsoft Research on pre-compiling .NET generics that describes the interaction of generics with AOT compilation, highlights some potential problems and proposes solutions. In this answer, I will use that paper to try to demonstrate why .NET generics aren't widely pre-compiled (yet).
Everything must be instantiated
Consider your example:
IManager manager = new Manager();
manager.SendMessage(this, AnyEnum.Zero);
Clearly we're calling the method IManager.SendMessage<AnyEnum> here, so the full AOT compiler needs to compile that method.
But this is an interface call, which is effectively a virtual call, which means that we can't know ahead of time which implementation of the interface method will be called.
The JIT compiler doesn't care about this problem. When someone attempts to run a method that hasn't been compiled yet, the JIT will be notified and it will compile the method lazily.
On the contrary, a fully AOT compiler doesn't have access to all this runtime type information. So it has to pessimistically compile all possible instantiations of the generic method on all implementations of the interface. (Or just give up and not offer that feature.)
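One way to see why static analysis cannot enumerate every instantiation: on a JIT runtime the type argument of a generic interface method does not even have to be known at compile time. The sketch below is adapted from the Unity example (the methods return strings instead of logging, and Receiver and the RuntimeInstantiation helper are hypothetical); it closes IManager.SendMessage<T> over whatever type the value happens to have, using MethodInfo.MakeGenericMethod. A JIT simply compiles that instantiation on first use; a full AOT compiler would have had to predict it ahead of time.

```csharp
using System;
using System.Reflection;

// Adapted from the Unity example: methods return strings instead of logging.
public interface IReceiver { string OnMessage<T>(T value); }
public interface IManager { string SendMessage<T>(IReceiver target, T value); }

public class Receiver : IReceiver
{
    public string OnMessage<T>(T value) => $"{typeof(T).Name}:{value}";
}

public class Manager : IManager
{
    public string SendMessage<T>(IReceiver target, T value) => target.OnMessage(value);
}

public static class RuntimeInstantiation
{
    // Hypothetical helper: T is chosen from the runtime type of 'value', so no
    // compile-time analysis of this code can tell which SendMessage<T> is needed.
    public static string SendBoxed(IManager manager, IReceiver receiver, object value)
    {
        MethodInfo open = typeof(IManager).GetMethod("SendMessage");
        MethodInfo closed = open.MakeGenericMethod(value.GetType());
        // Invoking the interface method dispatches to Manager's implementation.
        return (string)closed.Invoke(manager, new object[] { receiver, value });
    }
}
```

For example, SendBoxed(new Manager(), new Receiver(), DayOfWeek.Monday) produces "DayOfWeek:Monday", an instantiation that appears nowhere in the source.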
Generics can be infinitely recursive
object M<T>(long n)
{
    if (n == 1)
    {
        return new T[0];
    }
    else
    {
        return M<T[]>(n - 1);
    }
}
To instantiate M<int>(), the compiler needs to instantiate int[] and M<int[]>(). To instantiate M<int[]>(), the compiler needs to instantiate int[][] and M<int[][]>(). To instantiate M<int[][]>(), the compiler needs to instantiate int[][][] and M<int[][][]>().
This can be solved by using representative instantiations (just as the JIT compiler does). This means that all generic arguments that are reference types can share their code. So:
int[][], int[][][], int[][][][] (and so on) can all share the same code, because they are arrays of references.
M<int[]>, M<int[][]>, M<int[][][]> (and so on) can all share the same code, because they operate on references.
Assemblies need to own their generics...
Since C# programs are compiled in assemblies, it's hard to tell exactly who should "own" which instantiation of each type.
Assembly1 declares the type G<T>.
Assembly2 (references Assembly1) instantiates the type G<int>.
Assembly3 (references Assembly1) instantiates the type G<int> as well.
AssemblyX (references all the above) wants to use G<int>.
Which assembly gets to compile the actual G<int>? If they happen to be standalone libraries, neither Assembly2 nor Assembly3 can be compiled without each owning a copy of G<int>. So we're already looking at duplicated native code.
...and those generics must still be compatible with each other
But then, when AssemblyX is compiled, which copy of G<int> should it use? Clearly, it has to be able to handle both, because it may need to receive a G<int> from or send a G<int> to either assembly.
But more importantly, in C# you can't have two types with identical fully qualified names that turn out to be incompatible. In other words:
G<int> obj = new G<int>();
The above can never fail on the grounds that G<int> (the variable's type) is the G<int> from Assembly2 while G<int> (the constructor's type) is the G<int> from Assembly3. If it fails for a reason like that, we're not in C# anymore!
So both types need to exist and they need to be made transparently compatible, even though they are compiled separately. For this to happen, the type handles need to be manipulated at link time in such a way that the semantics of the language are retained, including the fact that they should be assignable to each other, their type handles should compare as equal (e.g. when using typeof), and so on.
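The invariants that the linker has to preserve are easy to state in code. On an ordinary JIT-based runtime all of the checks below hold trivially, because there is exactly one List<int> per process; the sketch (using List<int> in place of the hypothetical G<int>, with a hypothetical TypeIdentity class) shows what separately pre-compiled copies would also have to satisfy.

```csharp
using System;
using System.Collections.Generic;

public static class TypeIdentity
{
    // Checks that three routes to "List<int>" all denote one and the same type.
    public static bool SameClosedType()
    {
        object obj = new List<int>();
        Type fromTypeof = typeof(List<int>);
        Type fromConstruction = typeof(List<>).MakeGenericType(typeof(int));

        return obj.GetType() == fromTypeof      // constructor's type == variable's type
            && fromTypeof == fromConstruction   // built at runtime, still identical
            && fromTypeof.TypeHandle.Equals(fromConstruction.TypeHandle)
            && obj is List<int>;                // 'is' and casts agree
    }
}
```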

Unity is using an old version of Mono's Full-AOT that does not support generic interface methods.
This is due to how generics are represented in the JIT vs in native code. (I would like to elaborate, but frankly, I do not trust myself to be accurate)
Newer versions of Mono's AOT compiler address this issue (of course, with other limitations), but Unity keeps an old version of Mono. (I think I remember hearing that they changed their approach from AOT to something a bit different, but I'm not sure how it works anymore.)
WARNING: I don't fully understand the topic.
The way "generics" are handled in C++ (for example), which compiles to native binary code, is through a language mechanism called templates. These templates are more like glorified macros, and different code is actually generated for each type used. (Edit: Actually there are more differences between C# generics and C++ templates, but for the purpose of this answer I'll treat them as equivalent.)
For example, for the following code:
template<typename T>
class Foo
{
public:
    T GetValue() { return value; }
    void SetValue(T a) { value = a; }
private:
    T value;
};
int main()
{
    Foo<int> a;
    Foo<char *> b;
    a.SetValue(0);
    b.SetValue((char*)0);
    a.GetValue();
    b.GetValue();
    return 0;
}
The following functions will be generated (got this by running nm --demangle)
00000000004005e4 W Foo<int>::GetValue()
00000000004005b2 W Foo<int>::SetValue(int)
00000000004005f4 W Foo<char*>::GetValue()
00000000004005ca W Foo<char*>::SetValue(char*)
This means that for every type you use this class with, another instance of practically the same code will be generated (although I'm sure that GCC is smart enough to optimize some of the obvious cases, like getters and setters, and maybe more).
C#'s generics are a bit more complex.
Here's a very interesting article by Eric Lippert. The summary is that compiled C# generic code has only one instance that is, well, generic, and whatever depends on the type is calculated at runtime.
When translating C# code to native/machine code (which is essentially what AOT does), there's a problem translating generics.
This is where the subject gets a bit fuzzy to me. I can only assume that AOT'd code does not retain runtime type information, so it needs code-per-type for generic cases.
When receiving an object of type IFooable, it is possible that the native virtual table format is not verbose enough to enable finding the correct implementation, although I admit I have no idea why that would be, or the exact details of the virtual table of AOT'd code (is it identical to C++'s?).

Weird compilation error when indirectly refer to an assembly that declares a generic extension method with type restriction

Well, it's clear to me that the title of my question is too complicated. I just tried to make it as specific as possible. So, I'll try to explain the problem better.
Problem context
Let's assume we have three .NET projects in a solution. The main project is a simple console application, ApplicationAssembly. This project has a reference to another managed library, DirectlyReferencedLibrary. In turn, DirectlyReferencedLibrary references IndirectlyUsedLibrary.
So the project dependencies look like this:
ApplicationAssembly --> DirectlyReferencedLibrary --> IndirectlyUsedLibrary.
Notice that ApplicationAssembly doesn't directly use any type declared in IndirectlyUsedLibrary. Let's also assume that all types declared in these assemblies reside in the same namespace.
This solution compiles and works fine.
Weird problem
The problem occurs when both of the following conditions hold:
The ApplicationAssembly project uses LINQ, for example by invoking Select() on any enumerable object.
DirectlyReferencedLibrary declares a class which has a generic extension method with a type constraint. The constraint says that the generic type must derive from a class in IndirectlyUsedLibrary.
Here is an example of such a class.
using System;
namespace Test
{
    public static class UnUsedType
    {
        // It's a generic extension method with a type constraint.
        public static void Magic<T>(this T @this)
            // The constraint uses a type from IndirectlyUsedLibrary.
            where T : ProblemType
        {
            Console.WriteLine("I do nothing actually.");
        }
    }
}
When I try to compile this project, I get the following error:
Error The type 'Test.ProblemType' is defined in an assembly that is not referenced. You must add a reference to assembly 'IndirectlyUsedLibrary, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'. C:\Projects\Test\ApplicationAssembly\Program.cs 22 13 ApplicationAssembly
Question
Can anyone help me to understand why is it so?
P.S.
I've made a tiny solution for investigation. If you are kind enough to help me, you can download the archived solution here
P.P.S.
Sorry for my poor English.
UPD1
Another strange thing is that different invocations of the LINQ method may or may not produce the compile time error:
// Ok. Let's do some work using LINQ we love so much!
var strings = new[] { "aaa", "bbb", "ccc" };
Func<string, object> converter = item => (object) item;
// The following line makes problems.
var asObjects1 = strings.Select(converter);
// Everything is OK if we use the following line:
var asObjects2 = Enumerable.Select(strings, converter);

Can anyone help me to understand why is it so?
The C# compiler has the reasonable expectation that the transitive closure of the referenced assemblies is available at compile time. It does not have any kind of advanced logic that reasons about what it definitely needs, might need, might not need, or definitely does not need to know in order to solve all the problems in type analysis that your program is going to throw at it. If you reference an assembly directly or indirectly, the compiler assumes that there might be type information in there that it needs.
Also, if you don't have the set of referenced assemblies at compile time then what expectation is there that users will have them at runtime? It seems reasonable to expect that the compile time environment has at least the set of assemblies that are going to be required at runtime.
I don't want to do it.
We all have to do things we don't want to do in life.

I think you knew this, but because the type ProblemType is defined in "IndirectlyUsedLibrary" and is required by the definition of the Magic extension method, that assembly must be referenced to be available at compile time.
As to "why"... Well, the compiler needs to know details about what it's compiling, doesn't it? It makes sense to me that the compiler requires the same minimum set of references at compile time that the program requires at run time...

You won't get the error if you use different namespaces for the libraries. It's really weird to use the same namespace across different libraries.
It looks like the compiler starts scanning your Test namespace for extension methods the first time you use one. Hence the reference is required.
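The two calls from UPD1 behave identically at runtime; they differ only in how the compiler binds them. The instance-style call requires an extension method lookup across the namespaces in scope, while the static call names Enumerable.Select directly. Whether that lookup is what forces the compiler to resolve ProblemType is this answer's hypothesis; the sketch below (a hypothetical SelectForms class reusing the question's variables) only shows that the two forms are otherwise equivalent.

```csharp
using System;
using System.Linq;

public static class SelectForms
{
    public static bool BothFormsAgree()
    {
        var strings = new[] { "aaa", "bbb", "ccc" };
        Func<string, object> converter = item => (object)item;

        // Extension-method syntax: the compiler has to search the in-scope
        // namespaces for candidate extension methods named Select.
        var asObjects1 = strings.Select(converter);

        // Plain static call: binds directly to Enumerable.Select, no extension lookup.
        var asObjects2 = Enumerable.Select(strings, converter);

        return asObjects1.SequenceEqual(asObjects2);
    }
}
```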

How is a virtual generic method call implemented?

I'm interested in how the CLR implements calls like this:
abstract class A {
    public abstract void Foo<T, U, V>();
}
A a = ...
a.Foo<int, string, decimal>(); // <=== ?
Does this call cause some kind of hash map lookup, with the type parameter tokens as the keys and the compiled generic method specializations (one for all reference types and different code for each value type) as the values?

I didn't find much exact information about this, so much of this answer is based on the excellent paper on .Net generics from 2001 (even before .Net 1.0 came out!), one short note in a follow-up paper and what I gathered from SSCLI v. 2.0 source code (even though I wasn't able to find the exact code for calling virtual generic methods).
Let's start simple: how is a non-generic non-virtual method called? By directly calling the method code, so the compiled code contains direct address. The compiler gets the method address from the method table (see next paragraph). Can it be that simple? Well, almost. The fact that methods are JITed makes it a little more complicated: what is actually called is either code that compiles the method and only then executes it, if it wasn't compiled yet; or it's one instruction that directly calls the compiled code, if it already exists. I'm going to ignore this detail further on.
Now, how is a non-generic virtual method called? Similar to polymorphism in languages like C++, there is a method table accessible from the this pointer (reference). Each derived class has its own method table and its methods there. So, to call a virtual method, get the reference to this (passed in as a parameter), from there, get the reference to the method table, look at the correct entry in it (the entry number is constant for a specific function) and call the code the entry points to. Calling methods through interfaces is slightly more complicated, but not interesting for us now.
Now we need to know about code sharing. Code can be shared between two “instances” of the same method if the reference types among the type arguments correspond to any other reference types and the value types are exactly the same. So, for example, C<string>.M<int>() shares code with C<object>.M<int>(), but not with C<string>.M<byte>(). There is no difference between type parameters of types and type parameters of methods. (The original paper from 2001 mentions that code can be shared also when both parameters are structs with the same layout, but I'm not sure this is true in the actual implementation.)
Let's make an intermediate step on our way to generic methods: non-generic methods in generic types. Because of code sharing, we need to get the type parameters from somewhere (e.g. for calling code like new T[]). For this reason, each instantiation of generic type (e.g. C<string> and C<object>) has its own type handle, which contains the type parameters and also method table. Ordinary methods can access this type handle (technically a structure confusingly called MethodTable, even though it contains more than just the method table) from the this reference. There are two types of methods that can't do that: static methods and methods on value types. For those, the type handle is passed in as a hidden argument.
For non-virtual generic methods, the type handle is not enough, so they get a different hidden argument, MethodDesc, that contains the type parameters. Also, the compiler can't store the instantiations in the ordinary method table, because that's static. So it creates a second, different method table for generic methods, which is indexed by type parameters, and gets the method address from there if it already exists with compatible type parameters, or creates a new entry.
Virtual generic methods are now simple: the compiler doesn't know the concrete type, so it has to use the method table at runtime. And the normal method table can't be used, so it has to look in the special method table for generic methods. Of course, the hidden parameter containing type parameters is still present.
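A small sketch of the call from the question, with Foo changed to return a string (an adaptation, not the asker's code) so the outcome is observable. The call site only knows the static type A, yet the override in the hypothetical derived class B runs and still sees the method's type arguments; according to the description above, the runtime supplies them through the hidden instantiation argument.

```csharp
using System;

public abstract class A
{
    // Adapted from the question: returns a description instead of void.
    public abstract string Foo<T, U, V>();
}

public class B : A
{
    // The override still sees the concrete type arguments chosen at the call site.
    public override string Foo<T, U, V>() =>
        $"B.Foo<{typeof(T).Name}, {typeof(U).Name}, {typeof(V).Name}>";
}

public static class VirtualGenericDemo
{
    // The call site only knows the static type A; dispatch happens at runtime.
    public static string Call(A a) => a.Foo<int, string, decimal>();
}
```

VirtualGenericDemo.Call(new B()) returns "B.Foo<Int32, String, Decimal>".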
One interesting tidbit learned while researching this: because the JITer is very lazy, the following (completely useless) code works:
object Lift<T>(int count) where T : new()
{
    if (count == 0)
        return new T();
    return Lift<List<T>>(count - 1);
}
The equivalent C++ code causes the compiler to give up with a stack overflow.
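Wrapping Lift in a class makes it runnable. Each recursion level needs a new instantiation (Lift<List<int>>, Lift<List<List<int>>>, and so on), and because the depth is just an int argument, the set of instantiations is not bounded at compile time; the JIT only compiles the ones a particular call actually reaches. LazyRecursion is a hypothetical container name.

```csharp
using System;
using System.Collections.Generic;

public static class LazyRecursion
{
    // Same body as Lift above: each level wraps T in another List<>.
    public static object Lift<T>(int count) where T : new()
    {
        if (count == 0)
            return new T();
        return Lift<List<T>>(count - 1);
    }
}
```

For example, LazyRecursion.Lift<int>(2) returns an empty List<List<int>>, and a depth read from user input at runtime works just as well.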

Yes. The code for a specific type is generated at runtime by the CLR, which keeps a hashtable (or similar) of implementations.
Page 372 of CLR via C#:
When a method that uses generic type parameters is JIT-compiled, the CLR takes the method's IL, substitutes the specified type arguments, and then creates native code that is specific to that method operating on the specified data types. This is exactly what you want and is one of the main features of generics. However, there is a downside to this: the CLR keeps generating native code for every method/type combination. This is referred to as code explosion. This can end up increasing the application's working set substantially, thereby hurting performance.
Fortunately, the CLR has some optimizations built into it to reduce code explosion. First, if a method is called for a particular type argument, and later, the method is called again using the same type argument, the CLR will compile the code for this method/type combination just once. So if one assembly uses List, and a completely different assembly (loaded in the same AppDomain) also uses List, the CLR will compile the methods for List just once. This reduces code explosion substantially.

EDIT
I have now come across https://msdn.microsoft.com/en-us/library/sbh15dya.aspx, which clearly states that generics using reference types reuse the same code, so I would accept that as the definitive authority.
ORIGINAL ANSWER
I am seeing two disagreeing answers here, and both have references on their side, so I will try to add my two cents.
First, CLR via C# by Jeffrey Richter, published by Microsoft Press, is as valid as an MSDN blog, especially as the blog is already outdated (for more books by him, take a look at http://www.amazon.com/Jeffrey-Richter/e/B000APH134; one must agree that he is an expert on Windows and .NET).
Now let me do my own analysis.
Clearly, two generic types that have different reference type arguments cannot share the same code.
For example, List<TypeA> and List<TypeB> cannot share the same code, as this would make it possible to add an object of TypeA to a List<TypeB> via reflection, and the CLR is strongly typed on generics as well (unlike Java, in which only the compiler validates generics and the underlying JVM has no clue about them).
And this does not apply only to types, but to methods as well, since, for example, a generic method with type parameter T can create an object of type T (for example, nothing prevents it from creating a new List<T>), in which case reusing the same code would cause havoc.
Furthermore, the GetType method is not overridable, and it in fact always returns the correct generic type, proving that each type argument indeed has its own code.
(This point is even more important than it looks, as the CLR and JIT work based on the type object created for that object, via GetType(), which simply means that for each type argument there must be a separate type object, even for reference types.)
Another issue would result from code reuse: the is and as operators would no longer work correctly, and in general all kinds of casting would have serious problems.
NOW TO ACTUAL TESTING:
I tested it by having a generic type that contained a static member and then creating two objects with different type parameters; the static fields were clearly not shared, proving that code is not shared even for reference types.
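The experiment described above can be reproduced with a few lines (Holder<T> and StaticPerInstantiation are hypothetical names). Note, though, that it demonstrates separate static storage per closed type, which the CLR guarantees; by itself it does not show whether the JIT-compiled native code is separate, since code shared between reference-type instantiations can still locate each type's own static fields through its type handle.

```csharp
public class Holder<T>
{
    // One copy of this field exists per closed type (Holder<string>, Holder<object>, ...).
    public static int Counter;
}

public static class StaticPerInstantiation
{
    public static (int StringCounter, int ObjectCounter) Bump()
    {
        Holder<string>.Counter += 1;
        Holder<object>.Counter += 5;
        return (Holder<string>.Counter, Holder<object>.Counter);
    }
}
```

On the first call, Bump() returns (1, 5): the two counters never interfere, even though string and object are both reference types.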
EDIT:
See http://blogs.msdn.com/b/csharpfaq/archive/2004/03/12/how-do-c-generics-compare-to-c-templates.aspx on how it is implemented:
Space Use
The use of space is different between C++ and C#. Because C++ templates are done at compile time, each use of a different type in a template results in a separate chunk of code being created by the compiler.
In the C# world, it's somewhat different. The actual implementations using a specific type are created at runtime. When the runtime creates a type like List, the JIT will see if that has already been created. If it has, it merely uses that code. If not, it will take the IL that the compiler generated and do appropriate replacements with the actual type.
That's not quite correct. There is a separate native code path for every value type, but since reference types are all reference-sized, they can share their implementation.
This means that the C# approach should have a smaller footprint on disk, and in memory, so that's an advantage for generics over C++ templates.
In fact, the C++ linker implements a feature known as “template folding”, where the linker looks for native code sections that are identical, and if it finds them, folds them together. So it's not as clear-cut as it would seem to be.
As one can see, the CLR "can" reuse the implementation for reference types, as do current C++ compilers; however, there is no guarantee of that, and for unsafe code using stackalloc and pointers it is probably not the case, and there might be other situations as well.
However, what we do need to know is that in the CLR type system they are treated as different types: there are different calls to static constructors, separate static fields, separate type objects, and an object with type argument T1 should not be able to access a private field of another object with type argument T2 (although an object can indeed access the private fields of another object of the same type).

ReSharper hints that I should do static methods in WebForms - Why? Am I missing something?

ReSharper sometimes hints that I can make some of my random utility methods in my WebForms static. Why would I do this? As far as I can tell, there's no benefit in doing so... or is there? Am I missing something as far as static members in WebForms go?

The real reason is not the performance reason -- that will be measured in billionths of a second, if it has any effect at all.
The real reason is that an instance method which makes no use of its instance is logically a design flaw. Suppose I wrote you a method:
class C
{
    public int DoubleIt(int x, string y, Type z)
    {
        return x * 2;
    }
}
Is this a well-designed method? No. It takes all kinds of information in which it then ignores and does not use to compute the result or execute a side effect. Why force the caller to pass in an unnecessary string and type?
Now, notice that this method also takes in a C, in the form of the "this" passed into the call. That is also ignored. This method should be static, and take one parameter.
A well-designed method takes in exactly the information it needs to compute its results and execute its side effects, no more, no less. Resharper is telling you that you have a design flaw in your code: you have a method that is taking in information that it is ignoring. Either fix the method so that it starts using that information, or stop passing in useless data by making the method static.
Again, the performance concern is a total red herring; you'll never notice a difference that small unless what you're doing takes on the order of a handful of processor cycles. The reason for the warning is to call your attention to a logical design flaw. Getting the program logic right is far more important than shaving off a nanosecond here and there.
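Following that advice, the DoubleIt example shrinks to a static method that takes only what it uses. A minimal sketch:

```csharp
public class C
{
    // Takes exactly the information it needs: no ignored string or Type,
    // and, being static, no ignored 'this' either.
    public static int DoubleIt(int x)
    {
        return x * 2;
    }
}
```

Callers now write C.DoubleIt(21) without needing an instance of C or dummy arguments.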

I wouldn't worry about any performance improvement, but what you might like is that static methods have no side effects on the instance. So unless you're keeping a lot of static state (are you?), this gives away your intention that the method is similar to a function, only looking at its parameters and (optionally) returning a result.
For me this is a nice hint when I read someone else's code. I don't worry too much about shared state and can see the flow of information more easily. It's much more constrained in what it can do by declaring it static, which is less to worry about for me, the reader.

You will get a performance improvement; FxCop rule CA1822 says the same.
From MSDN:
Methods that do not access instance data or call instance methods can be marked as static (Shared in Visual Basic). After you mark the methods as static, the compiler will emit non-virtual call sites to these members. Emitting non-virtual call sites will prevent a check at runtime for each call that ensures that the current object pointer is non-null. This can result in a measurable performance gain for performance-sensitive code. In some cases, the failure to access the current object instance represents a correctness issue.

ReSharper suggests converting methods to static if they don't use any non-static variables or methods of the class.
The benefit could be a minor performance increase (the application will use less memory), and there will be one less ReSharper warning ;)
