Why are delegates reference types? - c#

Quick note on the accepted answer: I disagree with a small part of Jeffrey's answer, namely the point that since Delegate had to be a reference type, it follows that all delegates are reference types. (It simply isn't true that a multi-level inheritance chain rules out value types; all enum types, for example, inherit from System.Enum, which in turn inherits from System.ValueType, which inherits from System.Object, all reference types.) However I think the fact that, fundamentally, all delegates in fact inherit not just from Delegate but from MulticastDelegate is the critical realization here. As Raymond points out in a comment to his answer, once you've committed to supporting multiple subscribers, there's really no point in not using a reference type for the delegate itself, given the need for an array somewhere.
See update at bottom.
It has always seemed strange to me that if I do this:
Action foo = obj.Foo;
I am creating a new Action object, every time. I'm sure the cost is minimal, but it involves allocation of memory to later be garbage collected.
Given that delegates are inherently themselves immutable, I wonder why they couldn't be value types? Then a line of code like the one above would incur nothing more than a simple assignment to a memory address on the stack*.
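To make the allocation concrete, here is a minimal sketch. The Widget class is a hypothetical stand-in for the type of obj, and the "separate objects" result reflects how current compilers behave for instance methods rather than a guarantee.

```csharp
using System;

class Widget
{
    public void Foo() { }
}

static class DelegateAllocationDemo
{
    // Converts obj.Foo to a delegate twice and reports whether the two
    // results are the same object and whether they compare equal.
    public static (bool Same, bool Equal) Compare()
    {
        var obj = new Widget();
        Action first = obj.Foo;   // allocates a delegate object
        Action second = obj.Foo;  // in practice allocates another one
        return (ReferenceEquals(first, second), first.Equals(second));
    }

    static void Main()
    {
        var (same, equal) = Compare();
        Console.WriteLine($"same object: {same}, equal: {equal}");
    }
}
```

The two delegates compare equal because they have the same target and method, even though they are typically distinct heap objects.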
Even considering anonymous functions, it seems (to me) this would work. Consider the following simple example.
Action foo = () => { obj.Foo(); };
In this case foo does constitute a closure, yes. And in many cases, I imagine this does require an actual reference type (such as when local variables are closed over and are modified within the closure). But in some cases, it shouldn't. For instance, in the above case it seems that a type to support the closure could look like the code below.
Edit: I take back my original point about this. The type below really does need to be a reference type (or: it doesn't need to be, but if it's a struct it's just going to get boxed anyway). So, disregard the code example below. I leave it only to provide context for the answers that specifically mention it.
struct CompilerGenerated
{
    Obj obj;

    public CompilerGenerated(Obj obj)
    {
        this.obj = obj;
    }

    public void CallFoo()
    {
        obj.Foo();
    }
}
// ...elsewhere...
// This would not require any long-term memory allocation
// if Action were a value type, since CompilerGenerated
// is also a value type.
Action foo = new CompilerGenerated(obj).CallFoo;
Does this question make sense? As I see it, there are two possible explanations:
Implementing delegates properly as value types would have required additional work/complexity, since support for things like closures that do modify values of local variables would have required compiler-generated reference types anyway.
There are some other reasons why, under the hood, delegates simply can't be implemented as value types.
In the end, I'm not losing any sleep over this; it's just something I've been curious about for a little while.
Update: In response to Ani's comment, I see why the CompilerGenerated type in my above example might as well be a reference type, since if a delegate is going to comprise a function pointer and an object pointer it'll need a reference type anyway (at least for anonymous functions using closures, since even if you introduced an additional generic type parameter—e.g., Action<TCaller>—this wouldn't cover types that can't be named!). However, all this does is kind of make me regret bringing the question of compiler-generated types for closures into the discussion at all! My main question is about delegates, i.e., the thing with the function pointer and the object pointer. It still seems to me that could be a value type.
In other words, even if this...
Action foo = () => { obj.Foo(); };
...requires the creation of one reference type object (to support the closure, and give the delegate something to reference), why does it require the creation of two (the closure-supporting object plus the Action delegate)?
*Yes, yes, implementation detail, I know! All I really mean is short-term memory storage.

The question boils down to this: the CLI (Common Language Infrastructure) specification says that delegates are reference types. Why is this so?
One reason is clearly visible in the .NET Framework today. In the original design, there were two kinds of delegates: normal delegates and "multicast" delegates, which could have more than one target in their invocation list. The MulticastDelegate class inherits from Delegate. Since you can't inherit from a value type, Delegate had to be a reference type.
In the end, all actual delegates ended up being multicast delegates, but at that stage in the process, it was too late to merge the two classes. See this blog post about this exact topic:
We abandoned the distinction between Delegate and MulticastDelegate
towards the end of V1. At that time, it would have been a massive
change to merge the two classes so we didn’t do so. You should
pretend that they are merged and that only MulticastDelegate exists.
In addition, delegates currently have 4-6 fields, all pointers. 16 bytes is usually considered the upper bound where saving memory still wins out over extra copying. A 64-bit MulticastDelegate takes up 48 bytes. Given this, and the fact that they were using inheritance, a class was the natural choice.
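The point that every concrete delegate is a multicast delegate can be checked directly. The following is a small sketch; the class and method names are made up for illustration.

```csharp
using System;

static class MulticastDemo
{
    // True if the given delegate type derives directly from MulticastDelegate.
    public static bool IsMulticast(Type delegateType)
        => delegateType.BaseType == typeof(MulticastDelegate);

    // Combines two targets into one delegate and invokes it once.
    public static (int Targets, int Calls) CombineAndInvoke()
    {
        int calls = 0;
        Action handlers = () => calls += 1;
        handlers += () => calls += 10;   // builds a delegate with two targets
        handlers();                      // runs both targets in order
        return (handlers.GetInvocationList().Length, calls);
    }

    static void Main()
    {
        Console.WriteLine(IsMulticast(typeof(Action)));                             // True
        Console.WriteLine(typeof(MulticastDelegate).BaseType == typeof(Delegate)); // True
        var (targets, calls) = CombineAndInvoke();
        Console.WriteLine($"{targets} targets, calls = {calls}");                   // 2 targets, calls = 11
    }
}
```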

There is only one reason that Delegate needs to be a class, but it's a big one: while a delegate could be small enough to allow efficient storage as a value type (8 bytes on 32-bit systems, or 16 bytes on 64-bit systems), there's no way it could be small enough to efficiently guarantee that, if one thread attempts to write a delegate while another thread attempts to execute it, the latter thread wouldn't end up either invoking the old method on the new target, or the new method on the old target. Allowing such a thing to occur would be a major security hole. Having delegates be reference types avoids this risk.
Actually, even better than having delegates be structure types would be having them be interfaces. Creating a closure requires creating two heap objects: a compiler-generated object to hold any closed-over variables, and a delegate to invoke the proper method on that object. If delegates were interfaces, the object which held the closed-over variables could itself be used as the delegate, with no other object required.
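As a rough sketch of that interface idea (IAction, FooClosure and Widget are hypothetical names, and FooClosure is written by hand where a compiler would generate it), the object holding the captured state can itself be the callable thing, so only one heap object is needed:

```csharp
using System;

// Hypothetical interface playing the role that Action plays today.
interface IAction
{
    void Invoke();
}

class Widget
{
    public int Calls;
    public void Foo() { Calls++; }
}

// Hand-written stand-in for a compiler-generated closure class.
// It holds the captured variable AND is directly invocable.
sealed class FooClosure : IAction
{
    private readonly Widget obj;
    public FooClosure(Widget obj) { this.obj = obj; }
    public void Invoke() { obj.Foo(); }
}

static class InterfaceDelegateDemo
{
    public static int RunTwice()
    {
        var widget = new Widget();
        IAction foo = new FooClosure(widget); // a single allocation
        foo.Invoke();
        foo.Invoke();
        return widget.Calls;
    }

    static void Main()
    {
        Console.WriteLine(RunTwice()); // 2
    }
}
```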

Imagine if delegates were value types.
public delegate void Notify();
void SignalTwice(Notify notify) { notify(); notify(); }
int counter = 0;
Notify handler = () => { counter++; };
SignalTwice(handler);
System.Console.WriteLine(counter); // what should this print?
Per your proposal, this would internally be converted to
struct CompilerGenerated
{
    int counter = 0;
    public void Execute() { ++counter; }
}
Notify handler = new CompilerGenerated();
SignalTwice(handler);
System.Console.WriteLine(counter); // what should this print?
If delegates were value types, then SignalTwice would get a copy of handler, which means that a brand new CompilerGenerated would be created (a copy of handler) and passed to SignalTwice. SignalTwice would execute the delegate twice, which increments the counter twice in the copy. And then SignalTwice returns, and the function prints 0, because the original was not modified.
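The copy behaviour described above can be reproduced today with an ordinary mutable struct. This is only an illustration; CounterHandler is a hypothetical hand-written stand-in for the compiler-generated type.

```csharp
using System;

// Hypothetical value-type "handler" that carries its own state.
struct CounterHandler
{
    public int Counter;
    public void Execute() { Counter++; }
}

static class CopySemanticsDemo
{
    // Receives a COPY of the struct, so the caller's value is untouched.
    static void SignalTwiceByValue(CounterHandler handler)
    {
        handler.Execute();
        handler.Execute();
    }

    static void SignalTwice(Action notify)
    {
        notify();
        notify();
    }

    public static int WithStruct()
    {
        var handler = new CounterHandler();
        SignalTwiceByValue(handler);
        return handler.Counter;   // still 0: only the copy was incremented
    }

    public static int WithDelegate()
    {
        int counter = 0;
        Action handler = () => counter++;
        SignalTwice(handler);
        return counter;           // 2: the closure state is shared by reference
    }

    static void Main()
    {
        Console.WriteLine(WithStruct());   // 0
        Console.WriteLine(WithDelegate()); // 2
    }
}
```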

Here's an uninformed guess:
If delegates were implemented as value types, instances would be very expensive to copy around, since a delegate instance is relatively heavy. Perhaps MS felt it would be safer to design them as immutable reference types: copying machine-word-sized references to instances is relatively cheap.
A delegate instance needs, at the very least:
An object reference (the "this" reference for the wrapped method if it is an instance method).
A pointer to the wrapped function.
A reference to the object containing the multicast invocation list. Note that a delegate-type should support, by design, multicast using the same delegate type.
Let's assume that value-type delegates were implemented in a similar manner to the current reference-type implementation (this is perhaps somewhat unreasonable; a different design may well have been chosen to keep the size down) to illustrate. Using Reflector, here are the fields required in a delegate instance:
System.Delegate: _methodBase, _methodPtr, _methodPtrAux, _target
System.MulticastDelegate: _invocationCount, _invocationList
If implemented as a struct (no object header), these would add up to 24 bytes on x86 and 48 bytes on x64, which is massive for a struct.
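If you don't have Reflector at hand, a reflection sketch like the following lists the same information. Note that the field names and count are runtime implementation details and may differ between .NET versions, so no particular output is assumed here.

```csharp
using System;
using System.Reflection;

static class DelegateLayoutDemo
{
    const BindingFlags Flags = BindingFlags.Instance | BindingFlags.Public |
                               BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

    // Number of instance fields declared on Delegate plus MulticastDelegate.
    public static int CountInstanceFields()
        => typeof(Delegate).GetFields(Flags).Length
         + typeof(MulticastDelegate).GetFields(Flags).Length;

    static void Main()
    {
        foreach (Type type in new[] { typeof(Delegate), typeof(MulticastDelegate) })
        {
            foreach (FieldInfo field in type.GetFields(Flags))
            {
                Console.WriteLine($"{type.Name}.{field.Name} : {field.FieldType.Name}");
            }
        }

        // Rough estimate only: assumes every field is pointer-sized.
        int fields = CountInstanceFields();
        Console.WriteLine($"{fields} fields x {IntPtr.Size} bytes = about {fields * IntPtr.Size} bytes");
    }
}
```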
On another note, I want to ask how, in your proposed design, making the CompilerGenerated closure-type a struct helps in any way. Where would the created delegate's object pointer point to? Leaving the closure type instance on the stack without proper escape analysis would be extremely risky business.

I can tell that making delegates as reference types is definitely a bad design choice. They could be value types and still support multi-cast delegates.
Imagine that Delegate is a struct composed of, let's say:
object target;
pointer to the method
It can be a struct, right?
The boxing will only occur if the target is a struct (but the delegate itself will not be boxed).
You may think it will not support MultiCastDelegate, but then we can:
Create a new object that will hold the array of normal delegates.
Return a Delegate (as struct) to that new object, which will implement Invoke iterating over all its values and calling Invoke on them.
So, for normal delegates, that are never going to call two or more handlers, it could work as a struct.
Unfortunately, that is not going to change in .Net.
As a side note, variance does not require Delegate to be a reference type. The parameters of the delegate should be reference types. After all, if you pass a string where an object is required (for input, not ref or out), then no cast is needed, as a string is already an object.

I saw this interesting conversation on the Internet:
Immutable doesn't mean it has to be a value type. And something that
is a value type is not required to be immutable. The two often go
hand-in-hand, but they are not actually the same thing, and there are
in fact counter-examples of each in the .NET Framework (the String
class, for example).
And the answer:
The difference being that while immutable reference types are
reasonably common and perfectly reasonable, making value types mutable
is almost always a bad idea, and can result in some very confusing
behaviour!
Taken from here
So, in my opinion the decision was made by language usability aspects, and not by compiler technological difficulties. I love nullable delegates.

I guess one reason is support for multicast delegates. Multicast delegates are more complex than simply a few fields indicating target and method.
Another thing that's only possible in this form is delegate variance. This kind of variance requires a reference conversion between the two types.
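A short sketch of what that reference conversion means in practice (names are illustrative): converting a Func<string> to Func<object> does not create a new delegate, it just views the same object through a different static type.

```csharp
using System;

static class VarianceDemo
{
    static string Greet() => "hello";

    public static bool SameObjectAfterConversion()
    {
        Func<string> typed = Greet;
        Func<object> widened = typed;  // covariant reference conversion
        return ReferenceEquals(typed, widened);
    }

    static void Main()
    {
        Console.WriteLine(SameObjectAfterConversion()); // True

        // The equivalent for a value type does not compile, because
        // int -> object is a boxing conversion, not a reference conversion:
        // Func<int> number = () => 1;
        // Func<object> boxed = number;
    }
}
```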
Interestingly, F# defines its own function pointer type that's similar to delegates, but more lightweight. But I'm not sure if it's a value or reference type.

Related

Delegate.CreateDelegate won't box a return value - deliberate, or an omission?

I have a static method:
public class Example
{
    //for demonstration purposes - just returns default(T)
    public static T Foo<T>() { return default(T); }
}
I need to be able to invoke it using a Type parameter, and the calls could be numerous, so my standard pattern is to create a thread-safe cache of delegates (using ConcurrentDictionary in .Net 4) which dynamically invoke the Foo<T> method with the correct T. Without the caching, though, the code is this:
static object LateFoo(Type t)
{
    //creates the delegate and invokes it in one go
    //(the extra parentheses are needed so the cast applies before the invocation)
    return ((Func<object>)Delegate.CreateDelegate(
        typeof(Func<object>),
        typeof(Example).GetMethod("Foo", BindingFlags.Public | BindingFlags.Static).
            MakeGenericMethod(t)))();
}
This is not the first time I've had to do this - and in the past I have used Expression trees to build and compile a proxy to invoke the target method - to ensure that return type conversion and boxing from int -> object (for example) is handled correctly.
Update - example of Expression code that works
static object LateFoo(Type t)
{
    var method = typeof(Example)
        .GetMethod("Foo", BindingFlags.Public | BindingFlags.Static)
        .MakeGenericMethod(t);
    //in practise I cache the delegate, invoking it freshly built or from the cache
    //the lambda takes no parameters, so the delegate type is Func<object>
    return Expression.Lambda<Func<object>>(Expression.Convert(
        Expression.Call(method), typeof(object))).Compile()();
}
What's slightly amusing is that I learned early on with expressions that an explicit Convert was required and accepted it - and in light of the answers here it does now make sense why the .Net framework doesn't automatically stick the equivalent in.
End update
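For completeness, the caching pattern mentioned above might look roughly like this. It is a sketch under assumptions: LateBinder, Cache and Build are hypothetical names, and Example is repeated only so the snippet is self-contained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq.Expressions;
using System.Reflection;

public static class Example
{
    public static T Foo<T>() { return default(T); }
}

public static class LateBinder
{
    // One compiled delegate per requested type, built on first use.
    private static readonly ConcurrentDictionary<Type, Func<object>> Cache =
        new ConcurrentDictionary<Type, Func<object>>();

    public static object LateFoo(Type t)
    {
        return Cache.GetOrAdd(t, Build)();
    }

    private static Func<object> Build(Type t)
    {
        MethodInfo method = typeof(Example)
            .GetMethod("Foo", BindingFlags.Public | BindingFlags.Static)
            .MakeGenericMethod(t);

        // The explicit Convert emits the box instruction for value types.
        return Expression.Lambda<Func<object>>(
            Expression.Convert(Expression.Call(method), typeof(object))).Compile();
    }

    static void Main()
    {
        Console.WriteLine(LateFoo(typeof(int)));            // 0
        Console.WriteLine(LateFoo(typeof(string)) == null); // True
    }
}
```

GetOrAdd may run Build more than once under contention, which is harmless here because the resulting delegates are interchangeable.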
However, this time I thought I'd just use Delegate.CreateDelegate as it makes great play of the fact that (from MSDN):
Similarly, the return type of a delegate is compatible with the return type of a method if the return type of the method is more restrictive than the return type of the delegate, because this guarantees that the return value of the method can be cast safely to the return type of the delegate.
Now - if I pass typeof(string) to LateFoo method, everything is fine.
If, however, I pass typeof(int) I get an ArgumentException on the CreateDelegate call, message: Error binding to target method. There is no inner exception or further information.
So it would seem that, for method binding purposes, object is not considered more restrictive than int. Obviously, this must be to do with boxing being a different operation than a simple type conversion and value types not being treated as covariant to object in the .Net framework; despite the actual type relationship at runtime.
The C# compiler seems to agree with this (just shortest way I can model the error, ignore what the code would do):
public static int Foo()
{
    Func<object> f = new Func<object>(Foo);
    return 0;
}
Does not compile because the Foo method 'has the wrong return type' - given the CreateDelegate problem, C# is simply following .Net's lead.
It seems to me that .Net is inconsistent in its treatment of covariance - either a value type is an object or it's not; & if it's not it should not expose object as a base (despite how much more difficult it would make our lives). Since it does expose object as a base (or is it only the language that does that?), then according to logic a value type should be covariant to object (or whichever way around you're supposed to say it) making this delegate bind correctly. If that covariance can only be achieved via a boxing operation; then the framework should take care of that.
I dare say the answer here will be that CreateDelegate doesn't say that it will treat a box operation in covariance because it only uses the word 'cast'. I also expect there are whole treatises on the wider subject of value types and object covariance, and I'm shouting about a long-defunct and settled subject. I think there's something I either don't understand or have missed, though - so please enlighten!
If this is unanswerable - I'm happy to delete.
You can only convert a delegate in this way if the parameters and return value can be converted using a representation conserving conversion.
Reference types can only be converted to other reference types in this way
Integral values can be converted to other integer values of the same size (int, uint, and enums of the same size are compatible)
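The rule can be observed with a small probe. In this sketch (BindingDemo, Text, Number and CanBind are made-up names) a string-returning method binds to Func<object>, while an int-returning method is rejected with an ArgumentException, matching the behaviour described in the question.

```csharp
using System;
using System.Reflection;

static class BindingDemo
{
    public static string Text() => "text";
    public static int Number() => 42;

    // Tries to bind the named static method to Func<object>.
    public static bool CanBind(string methodName)
    {
        MethodInfo method = typeof(BindingDemo).GetMethod(methodName);
        try
        {
            Delegate.CreateDelegate(typeof(Func<object>), method);
            return true;
        }
        catch (ArgumentException)
        {
            // Binding failed: the return type would need a
            // representation-changing conversion (boxing).
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(CanBind("Text"));   // True
        Console.WriteLine(CanBind("Number")); // False
    }
}
```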
A few more relevant blog articles:
This dichotomy motivates yet another classification scheme for conversions (†). We can divide conversions into representation-preserving conversions (B to D) and representation-changing conversions (T to U). (‡) We can think of representation-preserving conversions on reference types as those conversions which preserve the identity of the object. When you cast a B to a D, you’re not doing anything to the existing object; you’re merely verifying that it is actually the type you say it is, and moving on. The identity of the object and the bits which represent the reference stay the same. But when you cast an int to a double, the resulting bits are very different.
This is why covariant and contravariant conversions of interface and delegate types require that all varying type arguments be of reference types. To ensure that a variant reference conversion is always identity-preserving, all of the conversions involving type arguments must also be identity-preserving. The easiest way to ensure that all the non-trivial conversions on type arguments are identity-preserving is to restrict them to be reference conversions.
http://blogs.msdn.com/b/ericlippert/archive/2009/03/19/representation-and-identity.aspx
"but how can a value type, like int, which is 32 bits of memory, no more, no less, possibly inherit from object? An object laid out in memory is way bigger than 32 bits; it's got a sync block and a virtual function table and all kinds of stuff in there." Apparently lots of people think that inheritance has something to do with how a value is laid out in memory. But how a value is laid out in memory is an implementation detail, not a contractual obligation of the inheritance relationship! When we say that int inherits from object, what we mean is that if object has a member -- say, ToString -- then int has that member as well.
http://ericlippert.com/2011/09/19/inheritance-and-representation/
It seems to me that .Net is inconsistent in its treatment of covariance - either a value type is an object or it's not; if it's not it should not expose object as a base
It depends on what the meaning of "is" is, as President Clinton famously said.
For the purposes of covariance, int is not object because int is not assignment compatible with object. A variable of type object expects a particular bit pattern with a particular meaning to be stored in it. A variable of type int expects a particular bit pattern with a particular meaning, but a different meaning than the meaning of a variable of object type.
However, for the purposes of inheritance, an int is an object because every member of object is also a member of int. If you want to invoke a method of object -- ToString, say -- on int, you are guaranteed that you can do so, because an int is a kind of object, and an object has ToString.
It is unfortunate, I agree, that the truth value of "an int is an object" varies depending on whether you mean "is assignment-compatible with" or "is a kind of".
If that covariance can only be achieved via a boxing operation; then the framework should take care of that.
OK. Where? Where should the boxing operation go? Someone, somewhere has to generate a hunk of IL that has a boxing instruction. Are you suggesting that when the framework sees:
Func<int> f1 = ()=>1;
Func<object> f2 = f1;
then the framework should automatically pretend that you said:
Func<object> f2 = ()=>(object)f1();
and thereby generate the boxing instruction?
That's a reasonable feature, but what are the consequences? Func<int> and Func<object> are reference types. If you do f2 = f1 on reference types like this, do you not expect that f2 and f1 have reference identity? Would it not be exceedingly strange for this test case to fail?
f2 = f1;
Debug.Assert(object.ReferenceEquals(f1, f2));
Because if the framework implemented that feature, it would.
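Writing the wrapper by hand shows the effect. This is a sketch of the hypothetical rewrite, not something the framework does for you:

```csharp
using System;

static class BoxingWrapperDemo
{
    public static (bool SameObject, bool SameMethod, object Result) Run()
    {
        Func<int> f1 = () => 1;
        Func<object> f2 = () => f1();  // the box instruction lives in this wrapper lambda

        return (ReferenceEquals(f1, f2), f1.Method == f2.Method, f2());
    }

    static void Main()
    {
        var (sameObject, sameMethod, result) = Run();
        Console.WriteLine(sameObject); // False: f2 is a different delegate object
        Console.WriteLine(sameMethod); // False: it points at the wrapper, not at f1's method
        Console.WriteLine(result);     // 1 (boxed)
    }
}
```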
Similarly, if you said:
f1 = MyMethod;
f2 = f1;
and you asked the two delegates whether they referred to the same method or not, would it not be exceedingly weird if they referred to different methods?
I think that would be weird. However, the VB designers do not. If you try to pull shenanigans like that in VB, the compiler will not stop you. The VB code generator will generate non-reference-equal delegates for you that refer to different methods. Try it!
Moral of the story: maybe C# is not the language for you. Maybe you prefer a language like VB, where the language is designed to take a "make a guess about what the user probably meant and just make it work" attitude. That's not the attitude of the C# designers. We are more "tell the user when something looks suspiciously wrong and let them figure out how they want to fix it" kind of people.
Even though I think @CodeInChaos is absolutely right, I can't help pointing out this blog post by Eric Lippert. In reply to the last comment on his post (at the very bottom of the page) Eric explains the rationale for such behaviour, and I think this is exactly what you're interested in.
UPDATE: As @Sheepy pointed out, Microsoft moved the old MSDN blogs into an archive and removed all comments. Luckily, the Wayback Machine preserved the blog post in its original form.

What's the method representation in memory?

While thinking a little bit about programming in Java/C#, I wondered how methods which belong to objects are represented in memory and how this affects multithreading.
1. Is a method instantiated for each object in memory separately, or do all objects of the same type share one instance of the method?
2. If the latter, how does the executing thread know which object's attributes to use?
3. Is it possible to modify the code of a method in C# with reflection for one, and only one, object of many objects of the same type?
4. Is a static method which does not use class attributes always thread safe?
I tried to make up my mind about these questions, but I'm very unsure about their answers.
Each method in your source code (in Java, C#, C++, Pascal, I think every OO and procedural language...) has only one copy in binaries and in memory.
Multiple instances of one object have separate fields but all share the same method code. Technically there is a procedure that takes a hidden this parameter to provide an illusion of executing a method on an object. In reality you are calling a procedure and passing structure (a bag of fields) to it along with other parameters. Here is a simple Java object and more-or-less equivalent pseudo-C code:
class Foo {
    private int x;
    int mulBy(int y) {
        return x * y;
    }
}
Foo foo = new Foo();
foo.mulBy(3);
is translated to this pseudo-C code (the encapsulation is enforced by the compiler and runtime/VM):
struct Foo {
    int x;  /* zero-initialized by the runtime */
};
int Foo_mulBy(Foo *this, int y) {
    return this->x * y;
}
Foo* foo = new Foo();
Foo_mulBy(foo, 3);
You have to distinguish between code and the data it operates on (local variables and parameters). That data is stored on the call stack, which is local to each thread. Code can be executed by multiple threads; each thread has its own instruction pointer (the place in the method it is currently executing). Also, because this is a parameter, it is thread-local, so each thread can operate on a different object concurrently, even though it runs the same code.
That being said you cannot modify a method of only one instance because the method code is shared among all instances.
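In C# the hidden this parameter can be made visible with an "open instance" delegate. The sketch below mirrors the Java Foo example; HiddenThisDemo is a made-up name. A single method is bound once, and the object is passed in as an ordinary first argument.

```csharp
using System;
using System.Reflection;

class Foo
{
    private readonly int x;
    public Foo(int x) { this.x = x; }
    public int MulBy(int y) { return x * y; }
}

static class HiddenThisDemo
{
    // Open instance delegate: the first parameter plays the role of "this".
    public static readonly Func<Foo, int, int> MulBy =
        (Func<Foo, int, int>)Delegate.CreateDelegate(
            typeof(Func<Foo, int, int>),
            typeof(Foo).GetMethod("MulBy"));

    static void Main()
    {
        // Same code, different "this" each time.
        Console.WriteLine(MulBy(new Foo(2), 3)); // 6
        Console.WriteLine(MulBy(new Foo(5), 3)); // 15
    }
}
```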
The Java specifications don't dictate how to do memory layout, and different implementations can do whatever they like, providing it meets the spec where it matters.
Having said that, the mainstream Oracle JVM (HotSpot) works off of things called oops - Ordinary Object Pointers. These consist of two words of header followed by the data which comprises the instance member fields (stored inline for primitive types, and as pointers for reference member fields).
One of the two header words - the class word - is a pointer to a klassOop. This is a special type of oop which holds pointers to the instance methods of the class (basically, the Java equivalent of a C++ vtable). The klassOop is kind-of a VM-level representation of the Class object corresponding to the Java type.
If you're curious about the low-level detail, you can find out a lot more by looking in the OpenJDK source for the definition of some of the oop types (klassOop is a good place to start).
tl;dr Java holds one blob of code for each method of each type. The blobs of code are shared among each instance of the type, and hidden this pointers are used to know which instance's members to use.
I am going to try to answer this in the context of C#. There are basically 3 different types of methods:
virtual
non-virtual
static
When your code is executed, you basically have two kinds of objects that are formed on the heap.
The object corresponding to the type of the object. This is called Type Object. This holds the type object pointer, the sync block index, the static fields and the method table.
The object corresponding to the object itself, which contains all the non static fields.
In response to your questions,
1. Is a method instantiated for each object in memory separately or do all objects of the same type share one instance of the method?
This is a wrong way of understanding objects. All methods are per type only. Look at it this way. A method is just a set of instructions. The first time you call a particular method, the IL code is JITed into native instructions and saved in memory. The next time this is called, the address is picked up from the method table and the same instructions are executed again.
2. If the latter, how does the executing thread know which object's attributes to use?
Each static method call on a Type results in looking up the method table from the corresponding Type Object and finding the address of the JITed instructions. In the case of methods that are not static, the relevant object on which the method is called is maintained on the thread's local stack. Basically, you get the nearest object on the stack. That is always the object on which we want the method to be called.
3. Is it possible to modify the code of a method in C# with reflection for one, and only one object of many objects of the same type?
No, it is not possible now. (And I am thankful for that.) The reason is that reflection only allows code inspection. If you figure out what some method actually means, there is no way you are going to be able to change the code in the same assembly.

Immutable objects that reference each other?

Today I was trying to wrap my head around immutable objects that reference each other. I came to the conclusion that you can't possibly do that without using lazy evaluation but in the process I wrote this (in my opinion) interesting code.
public class A
{
    public string Name { get; private set; }
    public B B { get; private set; }

    public A()
    {
        B = new B(this);
        Name = "test";
    }
}

public class B
{
    public A A { get; private set; }

    public B(A a)
    {
        //a.Name is null
        A = a;
    }
}
What I find interesting is that I cannot think of another way to observe object of type A in a state that is not yet fully constructed and that includes threads. Why is this even valid? Are there any other ways to observe the state of an object that is not fully constructed?
Why is this even valid?
Why do you expect it to be invalid?
Because a constructor is supposed to guarantee that the code it contains is executed before outside code can observe the state of the object.
Correct. But the compiler is not responsible for maintaining that invariant. You are. If you write code that breaks that invariant, and it hurts when you do that, then stop doing that.
Are there any other ways to observe the state of an object that is not fully constructed?
Sure. For reference types, all of them involve somehow passing "this" out of the constructor, obviously, since the only user code that holds the reference to the storage is the constructor. Some ways the constructor can leak "this" are:
Put "this" in a static field and reference it from another thread
make a method call or constructor call and pass "this" as an argument
make a virtual call -- particularly nasty if the virtual method is overridden by a derived class, because then it runs before the derived class ctor body runs.
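The virtual-call case is easy to reproduce. In this sketch (Base and Derived are hypothetical types) the override runs before the derived constructor body has assigned its field:

```csharp
using System;

class Base
{
    public string SeenDuringConstruction { get; }

    protected Base()
    {
        // Virtual call: dispatches to Derived.Describe before
        // the body of Derived's constructor has run.
        SeenDuringConstruction = Describe();
    }

    protected virtual string Describe() => "base";
}

class Derived : Base
{
    private readonly string name;

    public Derived()
    {
        name = "ready";
    }

    protected override string Describe() => name ?? "<not set yet>";
}

static class VirtualCallDemo
{
    static void Main()
    {
        Console.WriteLine(new Derived().SeenDuringConstruction); // <not set yet>
    }
}
```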
I said that the only user code that holds a reference is the ctor, but of course the garbage collector also holds a reference. Therefore, another interesting way in which an object can be observed to be in a half-constructed state is if the object has a destructor, and the constructor throws an exception (or gets an asynchronous exception like a thread abort; more on that later.) In that case, the object is about to be dead and therefore needs to be finalized, but the finalizer thread can see the half-initialized state of the object. And now we are back in user code that can see the half-constructed object!
Destructors are required to be robust in the face of this scenario. A destructor must not depend on any invariant of the object set up by the constructor being maintained, because the object being destroyed might never have been fully constructed.
Another crazy way that a half-constructed object could be observed by outside code is of course if the destructor sees the half-initialized object in the scenario above, and then copies a reference to that object to a static field, thereby ensuring that the half-constructed, half-finalized object is rescued from death. Please do not do that. Like I said, if it hurts, don't do it.
If you're in the constructor of a value type then things are basically the same, but there are some small differences in the mechanism. The language requires that a constructor call on a value type creates a temporary variable that only the ctor has access to, mutate that variable, and then do a struct copy of the mutated value to the actual storage. That ensures that if the constructor throws, then the final storage is not in a half-mutated state.
Note that since struct copies are not guaranteed to be atomic, it is possible for another thread to see the storage in a half-mutated state; use locks correctly if you are in that situation. Also, it is possible for an asynchronous exception like a thread abort to be thrown halfway through a struct copy. These non-atomicity problems arise regardless of whether the copy is from a ctor temporary or a "regular" copy. And in general, very few invariants are maintained if there are asynchronous exceptions.
In practice, the C# compiler will optimize away the temporary allocation and copy if it can determine that there is no way for that scenario to arise. For example, if the new value is initializing a local that is not closed over by a lambda and not in an iterator block, then S s = new S(123); just mutates s directly.
For more information on how value type constructors work, see:
Debunking another myth about value types
And for more information on how C# language semantics try to save you from yourself, see:
Why Do Initializers Run In The Opposite Order As Constructors? Part One
Why Do Initializers Run In The Opposite Order As Constructors? Part Two
I seem to have strayed from the topic at hand. In a struct you can of course observe an object to be half-constructed in the same ways -- copy the half-constructed object to a static field, call a method with "this" as an argument, and so on. (Obviously calling a virtual method on a more derived type is not a problem with structs.) And, as I said, the copy from the temporary to the final storage is not atomic and therefore another thread can observe the half-copied struct.
Now let's consider the root cause of your question: how do you make immutable objects that reference each other?
Typically, as you've discovered, you don't. If you have two immutable objects that reference each other then logically they form a directed cyclic graph. You might consider simply building an immutable directed graph! Doing so is quite easy. An immutable directed graph consists of:
An immutable list of immutable nodes, each of which contains a value.
An immutable list of immutable node pairs, each of which has the start and end point of a graph edge.
Now the way you make nodes A and B "reference" each other is:
A = new Node("A");
B = new Node("B");
G = Graph.Empty.AddNode(A).AddNode(B).AddEdge(A, B).AddEdge(B, A);
And you're done, you've got a graph where A and B "reference" each other.
The problem, of course, is that you cannot get to B from A without having G in hand. Having that extra level of indirection might be unacceptable.
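A minimal sketch of such a graph follows. Node, Edge, Graph and Successors are illustrative names, and the array copying is chosen for brevity rather than efficiency; a real implementation would more likely use persistent collections.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

sealed class Node
{
    public string Value { get; }
    public Node(string value) { Value = value; }
}

sealed class Edge
{
    public Node Start { get; }
    public Node End { get; }
    public Edge(Node start, Node end) { Start = start; End = end; }
}

sealed class Graph
{
    public static readonly Graph Empty = new Graph(new Node[0], new Edge[0]);

    private readonly Node[] nodes;
    private readonly Edge[] edges;

    private Graph(Node[] nodes, Edge[] edges)
    {
        this.nodes = nodes;
        this.edges = edges;
    }

    // Every "mutation" returns a new graph; existing graphs never change.
    public Graph AddNode(Node node)
        => new Graph(nodes.Concat(new[] { node }).ToArray(), edges);

    public Graph AddEdge(Node start, Node end)
        => new Graph(nodes, edges.Concat(new[] { new Edge(start, end) }).ToArray());

    // Getting from one node to another requires the graph itself.
    public IEnumerable<Node> Successors(Node node)
        => edges.Where(e => ReferenceEquals(e.Start, node)).Select(e => e.End);
}

static class GraphDemo
{
    static void Main()
    {
        var a = new Node("A");
        var b = new Node("B");
        Graph g = Graph.Empty.AddNode(a).AddNode(b).AddEdge(a, b).AddEdge(b, a);

        Console.WriteLine(g.Successors(a).Single().Value); // B
        Console.WriteLine(g.Successors(b).Single().Value); // A
    }
}
```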
Yes, this is the only way for two immutable objects to refer to each other - at least one of them must see the other in a not-fully-constructed way.
It's generally a bad idea to let this escape from your constructor but in cases where you're confident of what both constructors do, and it's the only alternative to mutability, I don't think it's too bad.
"Fully constructed" is defined by your code, not by the language.
This is a variation on calling a virtual method from the constructor; the general guideline is: don't do that.
To correctly implement the notion of "fully constructed", don't pass this out of your constructor.
Indeed, leaking the this reference out during the constructor will allow you to do this; it may cause problems if methods get invoked on the incomplete object, obviously. As for "other ways to observe the state of an object that is not fully constructed":
invoke a virtual method in a constructor; the subclass constructor will not have been called yet, so an override may try to access incomplete state (fields declared or initialized in the subclass, etc)
reflection, perhaps using FormatterServices.GetUninitializedObject (which creates an object without calling the constructor at all)
If you consider the initialization order
Derived static fields
Derived static constructor
Derived instance fields
Base static fields
Base static constructor
Base instance fields
Base instance constructor
Derived instance constructor
Clearly, through up-casting you can access the class BEFORE the derived instance constructor is called. This is the reason you shouldn't call virtual methods from constructors: they could easily access derived fields that the derived constructor has not yet initialized, because the derived constructor has not yet brought the object into a "consistent" state.
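As a rough illustration of that ordering, here is a small sketch in which a base constructor makes a virtual call. The class and member names (Base, Derived, Describe, Observed) are made up for the example.

```csharp
using System;

public class Base
{
    protected Base()
    {
        // Virtual dispatch reaches Derived.Describe even though the body of
        // Derived's constructor has not started yet.
        Describe();
    }

    public virtual void Describe() { }
}

public class Derived : Base
{
    private readonly string fromInitializer = "set"; // runs before Base's constructor
    private readonly string fromConstructor;         // assigned only in the constructor body

    public string Observed;

    public Derived()
    {
        fromConstructor = "set";
    }

    public override void Describe()
    {
        // Records what the override can see at the moment Base calls it.
        Observed = $"{fromInitializer}/{fromConstructor ?? "null"}";
    }
}
```

After new Derived(), Observed holds "set/null": the field initializer had already run, but the assignment in the derived constructor body had not.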
You can avoid the problem by instantiating B last in your constructor:
public A()
{
Name = "test";
B = new B(this);
}
If what you suggest was not possible, then A would not be immutable.
Edit: fixed, thanks to leppie.
The principle is: don't let your this object escape from the constructor body.
Another way to observe such a problem is by calling virtual methods inside the constructor.
As noted, the compiler has no means of knowing at what point an object has been constructed well enough to be useful; it therefore assumes that a programmer who passes this from a constructor will know whether an object has been constructed well enough to satisfy his needs.
I would add, however, that for objects which are intended to be truly immutable, one must avoid passing this to any code which will examine the state of a field before it has been assigned its final value. This implies that this not be passed to arbitrary outside code, but does not imply that there is anything wrong with having an object under construction pass itself to another object for the purpose of storing a back-reference which will not actually be used until after the first constructor has completed.
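A sketch of that back-reference pattern, with hypothetical Document and Page types: the outer object assigns its own state first and then passes this to a constructor that only stores the reference.

```csharp
public sealed class Document
{
    public string Title { get; }
    public Page FirstPage { get; }

    public Document(string title, string pageText)
    {
        Title = title;                        // own state is assigned first
        FirstPage = new Page(pageText, this); // 'this' is handed out for storage only
    }
}

public sealed class Page
{
    public string Text { get; }
    public Document Owner { get; }

    internal Page(string text, Document owner)
    {
        Text = text;
        Owner = owner; // stored, but not dereferenced while Document is still under construction
    }
}
```

Once the Document constructor returns, doc.FirstPage.Owner is doc itself, and no code ever saw a field before it received its final value. The safety rests entirely on Page's constructor not reading anything from owner, which the compiler does not check.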
If one were designing a language to facilitate the construction and use of immutable objects, it may be helpful for it to declare methods as being usable only during construction, only after construction, or either; fields could be declared as being non-dereferenceable during construction and read-only afterward; parameters could likewise be tagged to indicate that they should be non-dereferenceable. Under such a system, it would be possible for a compiler to allow the construction of data structures which referred to each other, but where no property could ever change after it was observed. As to whether the benefits of such static checking would outweigh the cost, I'm not sure, but it might be interesting.
Incidentally, a related feature which would be helpful would be the ability to declare parameters and function returns as ephemeral, returnable, or (the default) persistable. If a parameter or function return were declared ephemeral, it could not be copied to any field nor passed as a persistable parameter to any method. Additionally, passing an ephemeral or returnable value as a returnable parameter to a method would cause the return value of the function to inherit the restrictions of that value (if a function has two returnable parameters, its return value would inherit the more restrictive constraint from its parameters). A major weakness with Java and .net is that all object references are promiscuous; once outside code gets its hands on one, there's no telling who may end up with it. If parameters could be declared ephemeral, it would more often be possible for code which held the only reference to something to know it held the only reference, and thus avoid needless defensive copy operations. Additionally, things like closures could be recycled if the compiler could know that no references to them existed after they returned.

How is a virtual generic method call implemented?

I'm interested in how the CLR implements calls like this:
abstract class A {
public abstract void Foo<T, U, V>();
}
A a = ...
a.Foo<int, string, decimal>(); // <=== ?
Does this call cause some kind of hash map lookup, with the type parameter tokens as the keys and the compiled generic method specializations (one shared by all reference types and different code for each value type) as the values?
I didn't find much exact information about this, so much of this answer is based on the excellent paper on .Net generics from 2001 (even before .Net 1.0 came out!), one short note in a follow-up paper and what I gathered from SSCLI v. 2.0 source code (even though I wasn't able to find the exact code for calling virtual generic methods).
Let's start simple: how is a non-generic non-virtual method called? By directly calling the method code, so the compiled code contains the direct address. The compiler gets the method address from the method table (see next paragraph). Can it be that simple? Well, almost. The fact that methods are JITed makes it a little more complicated: what is actually called is either code that compiles the method and only then executes it, if it wasn't compiled yet; or it's one instruction that directly calls the compiled code, if it already exists. I'm going to ignore this detail further on.
Now, how is a non-generic virtual method called? Similar to polymorphism in languages like C++, there is a method table accessible from the this pointer (reference). Each derived class has its own method table and its methods there. So, to call a virtual method, get the reference to this (passed in as a parameter), from there, get the reference to the method table, look at the correct entry in it (the entry number is constant for a specific function) and call the code the entry points to. Calling methods through interfaces is slightly more complicated, but not interesting for us now.
Now we need to know about code sharing. Code can be shared between two “instances” of the same method, if reference types in type parameters correspond to any other reference types, and value types are exactly the same. So, for example C<string>.M<int>() shares code with C<object>.M<int>(), but not with C<string>.M<byte>(). There is no difference between type parameters of types and type parameters of methods. (The original paper from 2001 mentions that code can be shared also when both parameters are structs with the same layout, but I'm not sure this is true in the actual implementation.)
Let's make an intermediate step on our way to generic methods: non-generic methods in generic types. Because of code sharing, we need to get the type parameters from somewhere (e.g. for calling code like new T[]). For this reason, each instantiation of a generic type (e.g. C<string> and C<object>) has its own type handle, which contains the type parameters and also the method table. Ordinary methods can access this type handle (technically a structure confusingly called MethodTable, even though it contains more than just the method table) from the this reference. There are two types of methods that can't do that: static methods and methods on value types. For those, the type handle is passed in as a hidden argument.
For non-virtual generic methods, the type handle is not enough and so they get a different hidden argument, MethodDesc, that contains the type parameters. Also, the compiler can't store the instantiations in the ordinary method table, because that's static. So it creates a second, different method table for generic methods, which is indexed by type parameters, and gets the method address from there, if it already exists with compatible type parameters, or creates a new entry.
Virtual generic methods are now simple: the compiler doesn't know the concrete type, so it has to use the method table at runtime. And the normal method table can't be used, so it has to look in the special method table for generic methods. Of course, the hidden parameter containing type parameters is still present.
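The observable side of this can be sketched in C#. The example below reuses the shape of A.Foo from the question, but returns a string instead of void so the result can be inspected; the derived class B is hypothetical. It does not show the runtime's internal tables, only that the override is chosen from the receiver's run-time type while the caller's type arguments remain available inside the body.

```csharp
using System;

public abstract class A
{
    public abstract string Foo<T, U, V>();
}

public sealed class B : A
{
    // The concrete override is found at run time; T, U and V are whatever
    // the call site supplied, even when shared code is executing.
    public override string Foo<T, U, V>() =>
        $"{typeof(T).Name},{typeof(U).Name},{typeof(V).Name}";
}
```

Given A a = new B(), the call a.Foo<int, string, decimal>() yields "Int32,String,Decimal".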
One interesting tidbit learned while researching this: because the JITer is very lazy, the following (completely useless) code works:
object Lift<T>(int count) where T : new()
{
if (count == 0)
return new T();
return Lift<List<T>>(count - 1);
}
The equivalent C++ code causes the compiler to give up with a stack overflow.
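To try the snippet, it can be wrapped in a static class; the LiftDemo name is just a placeholder. Each recursive step asks for an instantiation that need not have existed before the call, which is why this only works when instantiations are created on demand.

```csharp
using System;
using System.Collections.Generic;

public static class LiftDemo
{
    public static object Lift<T>(int count) where T : new()
    {
        if (count == 0)
            return new T();
        // Lift<List<T>> is a new instantiation at every level of recursion.
        return Lift<List<T>>(count - 1);
    }
}
```

For instance, LiftDemo.Lift<object>(2) returns an empty List<List<object>>, with the nesting depth decided by a run-time value.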
Yes. The code for a specific type is generated at runtime by the CLR, which keeps a hashtable (or similar) of implementations.
Page 372 of CLR via C#:
When a method that uses generic type parameters is JIT-compiled, the CLR takes the method's IL, substitutes the specified type arguments, and then creates native code that is specific to that method operating on the specified data types. This is exactly what you want and is one of the main features of generics. However, there is a downside to this: the CLR keeps generating native code for every method/type combination. This is referred to as code explosion. This can end up increasing the application's working set substantially, thereby hurting performance.
Fortunately, the CLR has some optimizations built into it to reduce code explosion. First, if a method is called for a particular type argument, and later, the method is called again using the same type argument, the CLR will compile the code for this method/type combination just once. So if one assembly uses List<DateTime>, and a completely different assembly (loaded in the same AppDomain) also uses List<DateTime>, the CLR will compile the methods for List<DateTime> just once. This reduces code explosion substantially.
EDIT
I have now come across https://msdn.microsoft.com/en-us/library/sbh15dya.aspx, which clearly states that generics instantiated with reference types reuse the same code, so I would accept that as the definitive authority.
ORIGINAL ANSWER
I am seeing here two disagreeing answers, and both have references to their side, so I will try to add my two cents.
First, CLR via C# by Jeffrey Richter, published by Microsoft Press, is as valid a source as an MSDN blog, especially as the blog is already outdated. (For more books by him, take a look at http://www.amazon.com/Jeffrey-Richter/e/B000APH134; one must agree that he is an expert on Windows and .NET.)
Now let me do my own analysis.
Clearly two generic types that contain different reference type arguments cannot share the same code.
For example, List<TypeA> and List<TypeB> cannot share the same code, as this would make it possible to add an object of TypeA to a List<TypeB> via reflection, and the CLR is strongly typed on generics as well (unlike Java, in which only the compiler validates generics, and the underlying JVM has no clue about them).
And this does not apply only to types, but to methods as well, since for example a generic method with type parameter T can create an object of type T (for example, nothing prevents it from creating a new List<T>), in which case reusing the same code would cause havoc.
Furthermore, the GetType method is not overridable, and it in fact always returns the correct generic type, proving that each type argument has indeed its own code.
(This point is even more important than it looks, as the CLR and JIT work based on the type object created for that object, obtained via GetType(), which simply means that for each type argument there must be a separate type object, even for reference types.)
Another issue would result from code reuse: the is and as operators would no longer work correctly, and in general all kinds of casting would have serious problems.
NOW TO ACTUAL TESTING:
I tested this with a generic type that contained a static member, then created two objects with different type parameters; the static fields were clearly not shared, clearly proving that code is not shared even for reference types.
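A test along those lines might look like the sketch below; PerType<T> and StaticsDemo are invented names, not the code that was actually run. Note that what it can observe is separate static storage and separate type objects per instantiation, which on its own says nothing either way about whether the JIT-compiled native code is shared.

```csharp
using System;

public static class PerType<T>
{
    // One copy of this field exists per constructed type.
    public static int Value;
}

public static class StaticsDemo
{
    public static (int ForString, int ForObject, bool SameType) Run()
    {
        PerType<string>.Value = 5; // PerType<object>.Value is deliberately left untouched
        return (PerType<string>.Value,
                PerType<object>.Value,
                typeof(PerType<string>) == typeof(PerType<object>));
    }
}
```

StaticsDemo.Run() returns (5, 0, false): two reference-type instantiations, two independent fields, two distinct Type objects.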
EDIT:
See http://blogs.msdn.com/b/csharpfaq/archive/2004/03/12/how-do-c-generics-compare-to-c-templates.aspx on how it is implemented:
Space Use
The use of space is different between C++ and C#. Because C++ templates are done at compile time, each use of a different type in a template results in a separate chunk of code being created by the compiler.
In the C# world, it's somewhat different. The actual implementations using a specific type are created at runtime. When the runtime creates a type like List<int>, the JIT will see if that has already been created. If it has, it merely uses that code. If not, it will take the IL that the compiler generated and do appropriate replacements with the actual type.
That's not quite correct. There is a separate native code path for every value type, but since reference types are all reference-sized, they can share their implementation.
This means that the C# approach should have a smaller footprint on disk, and in memory, so that's an advantage for generics over C++ templates.
In fact, the C++ linker implements a feature known as “template folding“, where the linker looks for native code sections that are identical, and if it finds them, folds them together. So it's not as clear-cut as it would seem to be.
As one can see, the CLR "can" reuse the implementation for reference types, as do current C++ compilers. However, there is no guarantee of that, and for unsafe code using stackalloc and pointers it is probably not the case; there might be other such situations as well.
However, what we do have to know is that in the CLR type system they are treated as different types: there are separate calls to static constructors, separate static fields, and separate type objects, and an object with type argument T1 should not be able to access a private field of another object with type argument T2 (although an object can indeed access the private fields of another object of the same type).

Why do BCL Collections use struct enumerators, not classes?

We all know mutable structs are evil in general. I'm also pretty sure that because IEnumerable<T>.GetEnumerator() returns type IEnumerator<T>, the structs are immediately boxed into a reference type, costing more than if they were simply reference types to begin with.
So why, in the BCL generic collections, are all the enumerators mutable structs? Surely there had to have been a good reason. The only thing that occurs to me is that structs can be copied easily, thus preserving the enumerator state at an arbitrary point. But adding a Copy() method to the IEnumerator interface would have been less troublesome, so I don't see this as being a logical justification on its own.
Even if I don't agree with a design decision, I would like to be able to understand the reasoning behind it.
Indeed, it is for performance reasons. The BCL team did a lot of research on this point before deciding to go with what you rightly call out as a suspicious and dangerous practice: the use of a mutable value type.
You ask why this doesn't cause boxing. It's because the C# compiler does not generate code to box stuff to IEnumerable or IEnumerator in a foreach loop if it can avoid it!
When we see
foreach(X x in c)
the first thing we do is check to see if c has a method called GetEnumerator. If it does, then we check to see whether the type it returns has a method MoveNext and a property Current. If it does, then the foreach loop is generated entirely using direct calls to those methods and properties. Only if "the pattern" cannot be matched do we fall back to looking for the interfaces.
This has two desirable effects.
First, if the collection is, say, a collection of ints, but was written before generic types were invented, then it does not take the boxing penalty of boxing the value of Current to object and then unboxing it to int. If Current is a property that returns an int, we just use it.
Second, if the enumerator is a value type then it does not box the enumerator to IEnumerator.
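A small sketch of "the pattern" in action, using made-up Countdown and CountdownEnumerator types that implement neither IEnumerable nor IEnumerator. The foreach below compiles purely because GetEnumerator, MoveNext and Current exist with suitable shapes, and the struct enumerator is never converted to an interface.

```csharp
public struct CountdownEnumerator
{
    private int current;

    public CountdownEnumerator(int start) { current = start + 1; }

    public int Current => current;

    public bool MoveNext()
    {
        current--;
        return current >= 1;
    }
}

public sealed class Countdown
{
    private readonly int start;

    public Countdown(int start) { this.start = start; }

    // No interface involved: foreach binds to this method by name.
    public CountdownEnumerator GetEnumerator() => new CountdownEnumerator(start);
}

public static class ForeachDemo
{
    public static int Sum(Countdown c)
    {
        int total = 0;
        foreach (int x in c) // direct calls to MoveNext and Current on the struct
            total += x;
        return total;
    }
}
```

ForeachDemo.Sum(new Countdown(3)) visits 3, 2 and 1 and returns 6.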
Like I said, the BCL team did a lot of research on this and discovered that the vast majority of the time, the penalty of allocating and deallocating the enumerator was large enough that it was worth making it a value type, even though doing so can cause some crazy bugs.
For example, consider this:
struct MyHandle : IDisposable { ... }
...
using (MyHandle h = whatever)
{
h = somethingElse;
}
You would quite rightly expect the attempt to mutate h to fail, and indeed it does. The compiler detects that you are trying to change the value of something that has a pending disposal, and that doing so might cause the object that needs to be disposed to actually not be disposed.
Now suppose you had:
struct MyHandle : IDisposable { ... }
...
using (MyHandle h = whatever)
{
h.Mutate();
}
What happens here? You might reasonably expect that the compiler would do what it does if h were a readonly field: make a copy, and mutate the copy in order to ensure that the method does not throw away stuff in the value that needs to be disposed.
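The readonly-field behaviour referred to here can be seen in isolation with a sketch like this one; Tally and Holder are hypothetical names. It only demonstrates the copy made for a readonly field outside a constructor, not what the compiler does for a using variable.

```csharp
public struct Tally
{
    public int Count;

    public void Increment() { Count++; }
}

public sealed class Holder
{
    private readonly Tally fixedTally = new Tally();
    private Tally plainTally = new Tally();

    public (int Fixed, int Plain) IncrementBoth()
    {
        fixedTally.Increment(); // runs on a temporary copy; the field itself is unchanged
        plainTally.Increment(); // mutates the field in place
        return (fixedTally.Count, plainTally.Count);
    }
}
```

new Holder().IncrementBoth() returns (0, 1): the mutation through the readonly field is silently lost, which is the kind of surprise that makes mutable value types dangerous.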
However, that conflicts with our intuition about what ought to happen here:
using (Enumerator enumtor = whatever)
{
...
enumtor.MoveNext();
...
}
We expect that doing a MoveNext inside a using block will move the enumerator to the next one regardless of whether it is a struct or a ref type.
Unfortunately, the C# compiler today has a bug. If you are in this situation we choose which strategy to follow inconsistently. The behaviour today is:
if the value-typed variable being mutated via a method is a normal local then it is mutated normally
but if it is a hoisted local (because it's a closed-over variable of an anonymous function or in an iterator block) then the local is actually generated as a read-only field, and the gear that ensures that mutations happen on a copy takes over.
Unfortunately the spec provides little guidance on this matter. Clearly something is broken because we're doing it inconsistently, but what the right thing to do is not at all clear.
Struct methods can be inlined when the type of the struct is known at compile time, whereas calling a method via an interface is slow. So the answer is: for performance reasons.
