One thing I am concerned with is that I discovered two ways of registering delegates to events.
OnStuff += this.Handle;
OnStuff += new StuffEventHandler(this.Handle);
The first one is clean, and it makes sense doing "OnStuff -= this.Handle;" to unregister from the event... But in the latter case, should I do "OnStuff -= new StuffEventHandler(this.Handle);"? It feels like I am not removing anything at all, since I am passing in a different StuffEventHandler instance. Does the event compare delegates by reference? I am worried I could create a nasty memory leak here: I don't have a reference to the "new StuffEventHandler" I previously registered.
What is the downside of doing #1?
What is the benefit of doing #2?
Number one is just shorthand that generates the same MSIL as the second: at compile time the compiler looks at this.Handle and infers the delegate type to instantiate. There is no need to unsubscribe by using new, although (as explained below) it works fine.
So there is no difference between the two, just some syntactic sugar to make our code cleaner.
You don't need to worry about keeping a reference to the originally registered delegate, and you will not start a nasty memory leak.
When you call "OnStuff -= new StuffEventHandler(this.Handle);", the removal code does not compare the delegate you are removing by reference: it checks for equality by comparing the target object and the method that each delegate will call, and removes the matching delegate from "OnStuff".
By the way, "OnStuff" itself is a delegate object (the event keyword that I assume you have in your declaration simply restricts the accessibility of the delegate).
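A minimal sketch that shows this value-style equality; StuffEventHandler and Handle are stand-in names mirroring the question:

using System;

delegate void StuffEventHandler();

class Subscriber
{
    public void Handle() { }

    static void Main()
    {
        var s = new Subscriber();
        StuffEventHandler a = new StuffEventHandler(s.Handle);
        StuffEventHandler b = new StuffEventHandler(s.Handle);

        Console.WriteLine(ReferenceEquals(a, b)); // False: two distinct objects
        Console.WriteLine(a == b);                // True: same target, same method
    }
}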
I was under the impression that 2 is just syntax sugar. They should be exactly the same thing.
If I remember correctly, the first alternative is merely syntactic sugar for the second.
Related
In the talk "Lowering in C#: What's really going on in your code? - David Wengier" https://youtu.be/gc1AxbNybvw?t=1606 at the 26:39 mark, the presenter says that if the following code:
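(The exact snippet isn't reproduced in this excerpt; the following is a reconstruction of the kind of code shown, not the presenter's verbatim example:)

using System;

class C
{
    public void M()
    {
        // A lambda that captures nothing, not even "this":
        Action<string> print = s => Console.WriteLine(s);
        print("hello");
    }
}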
was lowered by the compiler to a method:
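(Again a reconstruction: a hypothetical lowering to an instance method on the same class; Lambda0 is an illustrative stand-in for the compiler's mangled name:)

using System;

class C
{
    public void M()
    {
        // Binding an instance method silently binds "this" as well:
        Action<string> print = this.Lambda0;
        print("hello");
    }

    private void Lambda0(string s) => Console.WriteLine(s);
}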
this could be a potential problem for memory usage. I quote:
Because the compiler doesn't know what the body of the lambda does, this could cause the whole class C to have to live on in memory forever. If the class would be Windows.Form, it could have a lot of resources.
And then he says that, for this reason, the compiler actually generates a class.
I fail to understand the reasoning: how could the memory leak appear?
For example, the C# 7 local functions actually turn into methods, so why wouldn't it work for non-capturing lambdas too?
If the compiler generates a non-static method and binds it to an Action<string>, the this pointer has to be bound along with it whether it is used in the lambda or not. You cannot call a non-static method without a this pointer; this is a VM-level restriction that the compiler cannot circumvent. So the this pointer is captured and thus keeps the class instance alive as long as the Action is alive, which, if it is bound to a long-lived event handler, for example, could be a long time.
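A small sketch of that capture, using a hypothetical form class (the names are made up for illustration):

using System;

class BigExpensiveForm
{
    public void Log(string message) => Console.WriteLine(message);
}

static class Program
{
    static Action<string> handler;

    static void Main()
    {
        var form = new BigExpensiveForm();
        handler = form.Log; // the delegate's Target field now references "form"
        Console.WriteLine(ReferenceEquals(handler.Target, form)); // True
        // As long as "handler" is reachable, "form" cannot be collected.
    }
}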
But IMO the whole argument starts from the wrong place. He says, "You kind of think it conceptually gets lowered to [a method in the same class]." And then he goes on to explain why that's a bad idea. But I think this is a strawman argument. Why would the compiler generate a non-static method on the same class in the first place? There is no reason to do that. It is certainly not the line of thinking the C# designers used when coming up with a lowering for lambdas.
The argument should instead go like this: a lambda can capture things it uses (it's a closure). In order to capture them, a place to store them is needed. So the compiler generates a class, which has fields for the captured things.
If the lambda happens to capture nothing, there are no fields, but there's still a class. Now, the compiler could generate a static method in the containing class if the lambda captures nothing, or a non-static method if it only captures this. But there are two problems with this:
Those methods could be discovered by reflection.
It increases the complexity of the compiler (completely separate ways of lowering the lambda for different capture patterns) for no gain (this other way of lowering has no real advantage).
So the compiler simply doesn't do that. The only thing it does, as an optimization, is reuse the same instance of the lambda class every time if the lambda is stateless (i.e. captures nothing). This is just a tiny branch in the existing lowering code and thus has far less complexity.
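Roughly what that lowering looks like for a stateless lambda; the names below mimic, but are not exactly, the compiler's mangled names (the real compiler also caches the Action delegate itself in a static field, omitted here):

using System;

class C
{
    public void M()
    {
        Action<string> print = DisplayClass.Instance.Print;
        print("hello");
    }

    // Stand-in for the compiler-generated closure class (really named "<>c"):
    private sealed class DisplayClass
    {
        // The single cached instance reused every time the lambda is evaluated:
        public static readonly DisplayClass Instance = new DisplayClass();

        public void Print(string s) => Console.WriteLine(s);
    }
}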
MSDN states the following two event subscriptions are exactly equivalent (C# 2.0 vs 1.0 syntax):
publisher.CustomEvent += HandleCustomEvent;
publisher.CustomEvent += new CustomEventHandler(HandleCustomEvent);
I note that the newer syntax hides instantiation of a delegate object.
Do I need to keep a reference to a delegate so that I can properly unsubscribe later?
// Retain reference to delegate used to subscribe.
this.handleCustomEvent = new CustomEventHandler(HandleCustomEvent);
publisher.CustomEvent += this.handleCustomEvent;
...
// Use earlier reference to unsubscribe.
publisher.CustomEvent -= this.handleCustomEvent;
Or, is this the same thing?
publisher.CustomEvent += HandleCustomEvent;
...
publisher.CustomEvent -= HandleCustomEvent;
If they are the same, why?
Does -= HandleCustomEvent also create a new()? If so, isn't this object different than the object created by += HandleCustomEvent?
This is the exact same thing, I think; the first form is just the shorthand syntax.
How to correctly unregister an event handler
I won't answer that directly, but I wanted to add this...
If you'd like, you might want to take a look at Rx (Reactive Extensions), which frees you of these very problems: unsubscribing is basically not mandatory unless you want to "stop" the event sooner (this is simplified; there are more details to it).
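A sketch of what that looks like, assuming the System.Reactive NuGet package, and assuming CustomEvent follows the standard (object sender, TEventArgs e) pattern (CustomEventArgs is a stand-in name here):

using System;
using System.Reactive.Linq;

IDisposable subscription = Observable
    .FromEventPattern<CustomEventArgs>(publisher, nameof(publisher.CustomEvent))
    .Subscribe(pattern => HandleCustomEvent(pattern.Sender, pattern.EventArgs));

// Later: disposing the subscription detaches the underlying handler for you.
subscription.Dispose();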
Does -= HandleCustomEvent also create a new()? If so, isn't this object different than the object created by += HandleCustomEvent?
Yes and yes.
As far as I can tell, the MSDN documentation for Delegate.Remove does not specifically explain how it determines whether two delegates are equal. However, MulticastDelegate.RemoveImpl appears to use Delegate.Equals to determine equality, and that is documented:
The methods and targets are compared for equality as follows:
If the two methods being compared are both static and are the same method on the same class, the methods are considered equal and the targets are also considered equal.
If the two methods being compared are instance methods and are the same method on the same object, the methods are considered equal and the targets are also considered equal.
Otherwise, the methods are not considered to be equal and the targets are also not considered to be equal.
So even though the delegate passed to -= is not the same instance as the delegate passed to +=, the event will still be successfully unsubscribed.
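You can convince yourself with the publisher from the question: each use of new creates a distinct object, but the two instances compare equal, so the handler really is removed.

publisher.CustomEvent += new CustomEventHandler(HandleCustomEvent);
// A different instance, but an equal delegate (same target, same method):
publisher.CustomEvent -= new CustomEventHandler(HandleCustomEvent);
// The subscriber list no longer contains HandleCustomEvent.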
Quick note on the accepted answer: I disagree with a small part of Jeffrey's answer, namely the point that since Delegate had to be a reference type, it follows that all delegates are reference types. (It simply isn't true that a multi-level inheritance chain rules out value types; all enum types, for example, inherit from System.Enum, which in turn inherits from System.ValueType, which inherits from System.Object, all reference types.) However I think the fact that, fundamentally, all delegates in fact inherit not just from Delegate but from MulticastDelegate is the critical realization here. As Raymond points out in a comment to his answer, once you've committed to supporting multiple subscribers, there's really no point in not using a reference type for the delegate itself, given the need for an array somewhere.
See update at bottom.
It has always seemed strange to me that if I do this:
Action foo = obj.Foo;
I am creating a new Action object, every time. I'm sure the cost is minimal, but it involves allocation of memory to later be garbage collected.
Given that delegates are inherently immutable, I wonder why they couldn't be value types? Then a line of code like the one above would incur nothing more than a simple assignment to a memory address on the stack*.
Even considering anonymous functions, it seems (to me) this would work. Consider the following simple example.
Action foo = () => { obj.Foo(); };
In this case foo does constitute a closure, yes. And in many cases, I imagine this does require an actual reference type (such as when local variables are closed over and are modified within the closure). But in some cases, it shouldn't. For instance in the above case, it seems that a type to support the closure could look like this: I take back my original point about this. The below really does need to be a reference type (or: it doesn't need to be, but if it's a struct it's just going to get boxed anyway). So, disregard the below code example. I leave it only to provide context for answers that specifically mention it.
struct CompilerGenerated
{
    Obj obj;

    public CompilerGenerated(Obj obj)
    {
        this.obj = obj;
    }

    public void CallFoo()
    {
        obj.Foo();
    }
}

// ...elsewhere...
// This would not require any long-term memory allocation
// if Action were a value type, since CompilerGenerated
// is also a value type.
Action foo = new CompilerGenerated(obj).CallFoo;
Does this question make sense? As I see it, there are two possible explanations:
Implementing delegates properly as value types would have required additional work/complexity, since support for things like closures that do modify values of local variables would have required compiler-generated reference types anyway.
There are some other reasons why, under the hood, delegates simply can't be implemented as value types.
In the end, I'm not losing any sleep over this; it's just something I've been curious about for a little while.
Update: In response to Ani's comment, I see why the CompilerGenerated type in my above example might as well be a reference type, since if a delegate is going to comprise a function pointer and an object pointer it'll need a reference type anyway (at least for anonymous functions using closures, since even if you introduced an additional generic type parameter—e.g., Action<TCaller>—this wouldn't cover types that can't be named!). However, all this does is kind of make me regret bringing the question of compiler-generated types for closures into the discussion at all! My main question is about delegates, i.e., the thing with the function pointer and the object pointer. It still seems to me that could be a value type.
In other words, even if this...
Action foo = () => { obj.Foo(); };
...requires the creation of one reference type object (to support the closure, and give the delegate something to reference), why does it require the creation of two (the closure-supporting object plus the Action delegate)?
*Yes, yes, implementation detail, I know! All I really mean is short-term memory storage.
The question boils down to this: the CLI (Common Language Infrastructure) specification says that delegates are reference types. Why is this so?
One reason is clearly visible in the .NET Framework today. In the original design, there were two kinds of delegates: normal delegates and "multicast" delegates, which could have more than one target in their invocation list. The MulticastDelegate class inherits from Delegate. Since you can't inherit from a value type, Delegate had to be a reference type.
In the end, all actual delegates ended up being multicast delegates, but at that stage in the process, it was too late to merge the two classes. See this blog post about this exact topic:
We abandoned the distinction between Delegate and MulticastDelegate towards the end of V1. At that time, it would have been a massive change to merge the two classes so we didn't do so. You should pretend that they are merged and that only MulticastDelegate exists.
In addition, delegates currently have 4-6 fields, all pointers. 16 bytes is usually considered the upper bound at which saving memory still wins out over the extra copying. A 64-bit MulticastDelegate takes up 48 bytes. Given this, and the fact that the design already relied on inheritance, a class was the natural choice.
There is only one reason that Delegate needs to be a class, but it's a big one: while a delegate could be small enough to allow efficient storage as a value type (8 bytes on 32-bit systems, or 16 bytes on 64-bit systems), there's no way it could be small enough to efficiently guarantee that if one thread attempts to write a delegate while another thread attempts to execute it, the latter thread won't end up either invoking the old method on the new target, or the new method on the old target. Allowing such a thing to occur would be a major security hole. Having delegates be reference types avoids this risk.
Actually, even better than having delegates be structure types would be having them be interfaces. Creating a closure requires creating two heap objects: a compiler-generated object to hold any closed-over variables, and a delegate to invoke the proper method on that object. If delegates were interfaces, the object which held the closed-over variables could itself be used as the delegate, with no other object required.
Imagine if delegates were value types.
public delegate void Notify();

void SignalTwice(Notify notify) { notify(); notify(); }

int counter = 0;
Notify handler = () => { counter++; };
SignalTwice(handler);
System.Console.WriteLine(counter); // what should this print?
Per your proposal, this would internally be converted to
struct CompilerGenerated
{
    int counter; // starts at 0 (struct fields cannot have initializers)

    public void Execute() { ++counter; }
}

Notify handler = new CompilerGenerated(); // pretend a struct could be a delegate
SignalTwice(handler);
System.Console.WriteLine(counter); // what should this print?
If delegates were value types, then SignalTwice would get a copy of handler, which means that a brand new CompilerGenerated (a copy of handler) would be created and passed in. SignalTwice would execute the delegate twice, which increments the counter twice in the copy. And then SignalTwice returns, and the function prints 0, because the original was not modified.
Here's an uninformed guess:
If delegates were implemented as value types, instances would be very expensive to copy around, since a delegate instance is relatively heavy. Perhaps MS felt it would be safer to design them as immutable reference types: copying a machine-word-sized reference to an instance is relatively cheap.
A delegate instance needs, at the very least:
An object reference (the "this" reference for the wrapped method if it is an instance method).
A pointer to the wrapped function.
A reference to the object containing the multicast invocation list. Note that a delegate-type should support, by design, multicast using the same delegate type.
Let's assume that value-type delegates were implemented in a similar manner to the current reference-type implementation (this is perhaps somewhat unreasonable; a different design may well have been chosen to keep the size down) to illustrate. Using Reflector, here are the fields required in a delegate instance:
System.Delegate: _methodBase, _methodPtr, _methodPtrAux, _target
System.MulticastDelegate: _invocationCount, _invocationList
If implemented as a struct (no object header), these would add up to 24 bytes on x86 and 48 bytes on x64, which is massive for a struct.
On another note, I want to ask how, in your proposed design, making the CompilerGenerated closure-type a struct helps in any way. Where would the created delegate's object pointer point to? Leaving the closure type instance on the stack without proper escape analysis would be extremely risky business.
I'd say that making delegates reference types was definitely a bad design choice. They could be value types and still support multicast delegates.
Imagine that Delegate is a struct composed of, let's say:
object target;
pointer to the method
It can be a struct, right?
The boxing will only occur if the target is a struct (but the delegate itself will not be boxed).
You may think it would not support MulticastDelegate, but then we can:
Create a new object that will hold the array of normal delegates.
Return a Delegate (as a struct) pointing to that new object, which implements Invoke by iterating over all its entries and calling Invoke on each of them.
So, for normal delegates, that are never going to call two or more handlers, it could work as a struct.
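A rough sketch of the data shape this answer proposes (reflection stands in for the real method pointer; this is not how .NET actually implements delegates):

using System.Reflection;

struct ValueDelegate
{
    public object Target;     // null for static methods, or the multicast holder object
    public MethodInfo Method; // the wrapped method

    public object Invoke(params object[] args)
    {
        // Reflection stands in for the raw function pointer a real
        // implementation would store and call directly.
        return Method.Invoke(Target, args);
    }
}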
Unfortunately, that is not going to change in .NET.
As a side note, variance does not require the Delegate itself to be a reference type; it is the parameters of the delegate that should be reference types. After all, if you pass a string where an object is required (for input, not ref or out), then no cast is needed, as a string already is an object.
I saw this interesting conversation on the Internet:
Immutable doesn't mean it has to be a value type. And something that is a value type is not required to be immutable. The two often go hand-in-hand, but they are not actually the same thing, and there are in fact counter-examples of each in the .NET Framework (the String class, for example).
And the answer:
The difference being that while immutable reference types are reasonably common and perfectly reasonable, making value types mutable is almost always a bad idea, and can result in some very confusing behaviour!
Taken from here
So, in my opinion, the decision was driven by language-usability considerations, not by compiler-technology difficulties. I love nullable delegates.
I guess one reason is support for multicast delegates. Multicast delegates are more complex than simply a few fields indicating target and method.
Another thing that's only possible in this form is delegate variance. This kind of variance requires a reference conversion between the two types.
Interestingly, F# defines its own function type that's similar to delegates, but more lightweight. I'm not sure whether it's a value or reference type, though.
When calling a delegate you always have to check that it is not null. This is a frequent cause of errors.
Since delegates are more or less just a list of functions, I would assume that this could have been easily checked by the delegate itself.
Does anybody know, why it has been implemented as it is?
This may be stating the obvious, but you can declare your event to point to a no-op handler and then you don't need to check for null when invoking.
public event EventHandler MyEvent = delegate { };
Then you can call the handlers pointed to by MyEvent without checking for null.
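For example:

// Safe to raise even when no real subscribers exist; at worst the
// empty anonymous handler runs:
MyEvent(this, EventArgs.Empty);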
The delegate itself can't check it. Since it is null you can't call methods on it.
The C# compiler, on the other hand, could automatically insert the null check. But it's unclear what behavior to use if the delegate is a function with a return value.
One might argue that the C# compiler should insert that check for void functions, to avoid boilerplate code in that common case. But that decision is now in the past; you could only fix it without breaking existing programs if you had a time machine.
You can use extension methods(since they can be called on a null object) to automate the null-check for events:
http://blogs.microsoft.co.il/blogs/shayf/archive/2009/05/25/a-handy-extension-method-raise-events-safely.aspx
And since a parameter is a temp variable, you don't need to manually assign the event to a temp variable for thread safety.
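A minimal sketch of such an extension method (the linked post's version may differ in details):

using System;

static class EventExtensions
{
    // Extension methods are plain static methods under the hood,
    // so this can be invoked even when "handler" is null.
    public static void Raise(this EventHandler handler, object sender, EventArgs e)
    {
        // "handler" is a private copy of the delegate reference,
        // so this null check is free of the usual race condition.
        if (handler != null)
            handler(sender, e);
    }
}

// Usage: MyEvent.Raise(this, EventArgs.Empty);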
Internally, the compiler will generate two methods, add_MyEvent and remove_MyEvent, for each event declared in a class. When you write MyEvent += ..., the compiler actually generates a call to add_MyEvent, which in turn calls System.Delegate.Combine.
Combine takes two delegates as parameters and creates a new (multicast) delegate from the two originals, handling the case where one of them is null (which is the case the first time you call +=).
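This null handling is easy to observe directly:

using System;

EventHandler first = null;
EventHandler second = (s, e) => Console.WriteLine("handler ran");

// With one null operand, Combine simply returns the other delegate:
var combined = (EventHandler)Delegate.Combine(first, second);
Console.WriteLine(combined == second); // True

combined(null, EventArgs.Empty); // prints "handler ran"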
I guess the compiler could have been a bit smarter and also handled the event call, so that writing MyEvent() would generate a null check and invoke the delegate only if it is not null. (It would be nice to have Eric Lippert's view on that one.)
The reason is performance (actually, that's my best guess). Events and delegates are made with a lot of compiler magic, stuff that cannot be replicated by mere C# code. But underneath it goes something like this:
A delegate is a compiler-generated class that inherits from MulticastDelegate class, which itself derives from the Delegate class. Note these two classes are magic and you can't inherit from them yourself (well, maybe you can, but you won't be able to use them very well).
An event however is implemented something like this:
private MyEventDelegateClass __cgbe_MyEvent; // Compiler generated backing field

public event MyEventDelegateClass MyEvent
{
    add
    {
        this.__cgbe_MyEvent = (MyEventDelegateClass)Delegate.Combine(this.__cgbe_MyEvent, value);
    }
    remove
    {
        this.__cgbe_MyEvent = (MyEventDelegateClass)Delegate.Remove(this.__cgbe_MyEvent, value);
    }
    get
    {
        return this.__cgbe_MyEvent;
    }
}
OK, so this isn't real code (nor is it exactly the way it is in real life), but it should give you an idea of what's going on.
The point is that the backing field initially really is null. That way there is no overhead for creating an instance of MyEventDelegateClass when creating your object. And that's why you have to check for null before invoking. Later on, when handlers get added/removed, an instance of MyEventDelegateClass is created and assigned to the field. And when the last handler is removed, this instance is also lost and the backing field is reset to null again.
This is the principle of "you don't pay for what you don't use". As long as you don't use the event, there will be no overhead for it. No extra memory, no extra CPU cycles.
I'd guess the reasons are history and consistency.
It is consistent with the way other types are handled; e.g., there's no language or platform assistance in dealing with nulls; it's the programmer's responsibility to ensure the null placeholder is never actually used. There's no means of optionally calling a method only if the object reference you wish to call it on is non-null either, after all.
It's history, since null references happen to be included by default in most types, even though this isn't necessary. That is, rather than treating reference types like value types and require an additional "nullability" annotation to permit nullability, reference types are always nullable. I'm sure there were reasons for this back in the day when Java and .NET were designed, but it introduces many unnecessary bugs and complexities which are easy to avoid in a strongly typed language (e.g., .NET value types). But given the historical inclusion of null as a type-system wide "invalid" value, so to speak, it's only natural to do so for delegates as well.
If a delegate instance is null, how can it "check itself"? It's null... that means it does not exist, basically. There is nothing there to check itself in the first place. That's why your code which attempts to call the delegate must do that checking, first.
"Since delegates are more or less just
a list of functions, I would assume,
that this could have been easily
checked by the delegate itself".
Your assumption is wrong. Plain and simple.
I was going to post a question, but figured it out ahead of time and decided to post the question and the answer - or at least my observations.
When using an anonymous delegate as the WaitCallback, with ThreadPool.QueueUserWorkItem called in a foreach loop, it appears that the same foreach value is passed to every thread.
List< Thing > things = MyDb.GetTheThings();
foreach( Thing t in Things)
{
localLogger.DebugFormat( "About to queue thing [{0}].", t.Id );
ThreadPool.QueueUserWorkItem(
delegate()
{
try
{
WorkWithOneThing( t );
}
finally
{
Cleanup();
localLogger.DebugFormat("Thing [{0}] has been queued and run by the delegate.", t.Id );
}
});
}
For a collection of 16 Thing instances in things, I observed that each 'Thing' passed to WorkWithOneThing corresponded to the last item in the 'things' list.
I suspect this is because the delegate is accessing the 't' outer variable. Note that I also experimented with passing the Thing as a parameter to the anonymous delegate, but the behavior remained incorrect.
When I refactored the code to use a named WaitCallback method and passed the Thing 't' to it, voilà... the i-th instance of things was correctly passed into WorkWithOneThing.
A lesson in parallelism I guess. I also imagine that the Parallel.For family addresses this, but that library was not an option for us at this point.
Hope this saves someone else some time.
Howard Hoffman
This is correct, and describes how C# captures outside variables inside closures. It's not directly an issue about parallelism, but rather about anonymous methods and lambda expressions.
This question discusses this language feature and its implications in detail.
This is a common occurrence when using closures, and it is especially evident when constructing LINQ queries. The closure references the variable, not its contents. Therefore, to make your example work, you can just declare a variable inside the loop that takes the value of t, and reference that in the closure, as in the sketch below. This ensures each instance of your anonymous delegate references a different variable.
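A minimal sketch of that fix, reusing the names from the question (note that C# 5 changed foreach to scope its iteration variable per iteration, so current compilers no longer exhibit this pitfall for foreach, though for loops still do):

foreach (Thing t in things)
{
    Thing copy = t; // fresh variable each iteration; the closure captures this one
    ThreadPool.QueueUserWorkItem(delegate { WorkWithOneThing(copy); });
}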
Below is a link detailing why that happens. It's written for VB but C# has the same semantics.
http://blogs.msdn.com/jaredpar/archive/2007/07/26/closures-in-vb-part-5-looping.aspx