Are lambda expressions (and to a degree, anonymous functions) closures?
My understanding of closures are that they are functions that are treated as objects, which seems to be an accurate representation of what anonymous functions and Lambda expressions do.
And is it correct to call them closures? I understand that closures came about (or became popular) due to the lisp dialect, but is it also a general programming term?
Thanks for any clarification that you can provide!
A lambda may be implemented using a closure, but it is not itself necessarily a closure.
A closure is "a function together with a referencing environment for the non-local variables of that function.".
When you make a lambda expression that uses variables defined outside of the method, then the lambda must be implemented using a closure. For example:
int i = 42;
Action lambda = () => { Console.WriteLine(i); };
In this case, the compiler generated method must have access to the variable (i) defined in a completely different scope. In order for this to work, the method it generates is a "function together with the referencing environment" - basically, it's creating a "closure" to retrieve access to the variable.
However, this lambda:
Action lambda2 = () => { Console.WriteLine("Foo"); }
does not rely on any "referencing environment", since it's a fully contained method. In this case, the compiler generates a normal static method, and there is no closure involved at all.
In both cases, the lambda is creating a delegate ("function object"), but it's only creating a closure in the first case, as the lambda doesn't necessarily need to "capture" the referencing environment in all cases.
Reed's answer is correct; I would just add few additional details:
lambda expressions and anonymous methods both have closure semantics; that is, they "capture" their outer variables and extend the lifetimes of those variables.
anonymous function is the term we use when we mean a lambda expression or an anonymous method. Yes, that is confusing. Sorry. It was the best we could come up with.
a function that can be treated as an object is just a delegate. What makes a lambda a closure is that it captures its outer variables.
lambda expressions converted to expression trees also have closure semantics, interestingly enough. And implementing that correctly was a pain in the neck, I tell you!
"this" is considered an "outer variable" for the purpose of creating a closure even though "this" is not a variable.
It's "closure" not "clojure."
That is not what a closure is. A closure is basically a representation of a function together with any non-local variables that the function consumes.
In that sense, lambdas are not closures, but they do cause closures to be generated by the compiler if they close over any variables.
If you use ILDASM on an assembly that contains a lambda that closes over some variables, you will see in that assembly a compiler generated class that repsresents the function and those variables that were closed over. That is the closure.
When you say
functions that are treated as objects,
that's normally just "function object" (in C# we'd say "delegate") and is common in functional programming.
Yes. Closures typically capture variables from the outer scope. Lambdas can do that. However if your lambda does not capture anything, it is not a closure.
Related
In the talk "Lowering in C#: What's really going on in your code? - David Wengier" https://youtu.be/gc1AxbNybvw?t=1606 at the 26:39 mark, the presenter says that if the following code:
was lowered by the compiler to a method:
this could be a potential problem for memory usage, I quote:
Because the compiler doesn't know what the body of the lambda does,
this could cause the whole class C to have to live on in memory
forever. If the class would be Windows.Form, it could have a lot of
resources
And then he says, that for this reason, the compiler actually generates a class.
I fail to understand the reasoning how could the memory leak appear.
For example, the C# 7 local functions actually turn into methods, so why wouldn't it work for non-capturing lambdas too?
If the compiler generates a non-static method and binds it to an Action<string>, the this pointer has to be bound along with it whether it is used in the lambda or not. You cannot call a non-static method without a this pointer; this is a VM-level restriction that the compiler cannot circumvent. So the this pointer is captured and thus keeps the class instance alive as long as the Action is alive, which, if it is bound to a long-lived event handler, for example, could be a long time.
But IMO the whole argument starts from the wrong place. He says, "You kind think it conceptually gets lowered to [a method in the same class]." And then he goes on to explain why that's a bad idea. But I think this is a strawman argument. Why would the compiler generate a non-static method on the same class in the first place? There is no reason to do that. It is certainly not the line of thinking the C# designers used when coming up with a lowering for lambdas.
The argument should instead go like this: a lambda can capture things it uses (it's a closure). In order to capture them, a place to store them is needed. So the compiler generates a class, which has fields for the captured things.
If the class happens to capture nothing, there are no fields, but there's still a class. Now, the compiler could, if the lambda captures nothing, generate a static method in the containing class, or a non-static method if the lambda only catches this. But there are two problems with this:
Those methods could be discovered by reflection.
It increases the complexity of the compiler (completely separate ways of lowering the lambda for different capture patterns) for no gain (this other way of lowering has no real advantage).
So the compiler simply doesn't do that. The only thing it does, as an optimization, is reuse the same instance of the lambda class every time if the lambda is stateless (i.e. captures nothing). This is just a tiny branch in the existing lowering code and thus has far less complexity.
I'm new to SO and programming and learning day by day with bits and pieces of tech (C#) jargons.
After Googling for a while, below is what I've researched about methods
A Method is a block of statements, which serves for code reusability
& it also supports overloading with different SIGNATURE....for ex:
drawShape(2pts), drawShape(3pts) etc...
An Anonymous method is one with block of statements, but no
name....(as its premature to ask, in wt situation we come across
anonymous method...any articles, samples ...)
Named method: Here's a link but at the end i didn't get what Named Method actually is...
Can anyone explain what a "Named" method is, and where do we use anonymous method?
A named method is a method you can call by its name (e.g. it is a function that has a name). For example, you have defined a function to add two numbers:
int f(int x, int y)
{
return x+y;
}
You would call this method by its name like so: f(1, 2);.
Anonymous method is a method that is passed as an argument to a function without the need for its name. These methods can be constructed at runtime or evaluated from a lambda expression at compile time.
These methods are often used in LINQ queries, for example:
int maxSmallerThan10 = array.Where(x => x < 10).Max();
The expression x => x < 10 is called a lambda expression and its result is an anonymous function that will be run by the method Where.
If you are a beginner, I would suggest you first read about more basic stuff. Check out the following links:
http://www.completecsharptutorial.com/
http://www.csharp-station.com/tutorial.aspx
http://www.homeandlearn.co.uk/csharp/csharp.html
Let's start from a simple method.
void MyMethod()
{
Console.WriteLine("Inside MyMethod"); //Write to output
}
The above method is a named-method which just writes Inside MyMethod to the output window.
Anonymous methods are some methods used in some special scenarios (when using delegates) where the method definition is usually smaller where you don't specify the name of the method.
For example, (delegate) => { Console.WriteLine("Inside Mymethod");}
Just start writing some simple programs and in the due course, when you use delegates or some advanced concepts, you will yourself learn. :)
Explanation by Analogy
Normally when we tell stories we refer to people by name:
"Freddie"
"Who's Freddie?"
"You know, Freddie, Freddie from Sales - the male guy with the red hair, who burned the building down...?"
In reality nobody cares who the person is, department he works etc. it's not like we'll refer to him every again. We want to be able to say: "Some guy burned down our building". All the other stuff (hair color, name etc.) is irrelevant and/or can be inferred.
What does this have to do with c#?
Typically in c# you would have to define a method if you want to use it: you must tell the compiler (typically):
what it is called,
and what goes into it (parameters + their types),
as well as what should come out (return type),
and whether it is something you can do in the privacy of your home or whether you can do it in public. (scope)
When you do that with methods, you are basically using named methods. But writing them out: that's a lot of effort. Especially if all of that can be inferred and you're never going to use it again.
That's basically where anonymous methods come in. It's like a disposable method - something quick and dirty - it reduces the amount you have to type in. That's basically the purpose of them.
Anonymous methods or anonymous functions, what seems to be the same, basically are delegates. As the link you point out: http://msdn.microsoft.com/en-us/library/bb882516.aspx describes, anonymous methods provide a simplified way to pass method to be executed by another method. Like a callback.
Another way to see it, is think about lambda expressions.
A named by the contrast is any common method.
From MSDN:
A delegate can be associated with a named method. When you instantiate a delegate by using a named method, the method is passed as a parameter. This is called using a named method. Delegates constructed with a named method can encapsulate either a static method or an instance method. Named methods are the only way to instantiate a delegate in earlier versions of C#. However, in a situation where creating a new method is unwanted overhead, C# enables you to instantiate a delegate and immediately specify a code block that the delegate will process when it is called. The block can contain either a lambda expression or an anonymous method.
and
In versions of C# before 2.0, the only way to declare a delegate was to use named methods. C# 2.0 introduced anonymous methods and in C# 3.0 and later, lambda expressions supersede anonymous methods as the preferred way to write inline code. However, the information about anonymous methods in this topic also applies to lambda expressions. There is one case in which an anonymous method provides functionality not found in lambda expressions. Anonymous methods enable you to omit the parameter list. This means that an anonymous method can be converted to delegates with a variety of signatures. This is not possible with lambda expressions. For more information specifically about lambda expressions, see Lambda Expressions (C# Programming Guide). Creating anonymous methods is essentially a way to pass a code block as a delegate parameter. By using anonymous methods, you reduce the coding overhead in instantiating delegates because you do not have to create a separate method.
So in answer to your question about when to use anonymous methods, then MSDN says: in a situation where creating a new method is unwanted overhead.
In my experience it's more down to a question of code reuse and readability.
Links:
http://msdn.microsoft.com/en-us/library/98dc08ac.aspx
http://msdn.microsoft.com/en-us/library/0yw3tz5k.aspx
Hope that helps
This is all about the Compile method of Expression Type. Sorry being naïve since I am a late comer. I have been reading about building expression in order to enables dynamic modification of executable code. And It make sense for me when it comes to emitting lambda expression from a given expression tree as long as for the varying inputs/environment (say for different values for any given constant/parameter/member expression). I presume, it would be ideal if I could cache (re-use) lambda those are generated/compiled from an expression tree provided there is no change in the environment.
Question: Does CLR always emit lambda expression even if I have there is no change in the environment? If so, what is the best was to avoid compilation of expression from lambda if there is no change in the environment?
CLR doesn't cache lambda expressions, Compile() each time returns a new delegate.
But it should be easy to cache, via something like this:
public Func<T> Get<T>(Expression<Func<T>> expression)
{
string key = expression.Body.ToString();
Func<T> result;
if (!_cache.TryGetValue(key, out result)) {
result = expression.Compile();
_cache.Add(key, result);
}
return result;
}
Lambda expressions is just a way to represent a piece of code: call this, call that, compare these arguments, return something, etc. Almost the same way you do it from code editor, or JIT does from IL.
Compile emits a delegate from particular lambda expression. Once you have compiled lambda into delegate, the delegate remains unchanged (the lambda remains unchanged too, because it is immutable).
This doesn't mean, that delegate can't accept any arguments or call any method of any object. This just means, that delegate's IL doesn't change. And yes, you can cache compiled delegate instance.
Calling Compile() will return a new delegate each time, and each time new MSIL code is emitted. This is slow and effectively creates a memory leak, since MSIL code is not subject to garbage collection. I created a library that offers cached compilation, which in fact compares the structure of the expressions correctly and enables reusing the cached delegates. All constants and closures are automatically replaced by parameters and get re-inserted into the delegate in an outer closure. That avoids the leaking memory and is much faster. Check it out here: https://github.com/Miaplaza/expression-utils
Could you give me some reasons for limitations of the dynamic type in C#? I read about them in "Pro C# 2010 and the .NET 4 platform". Here is an excerpt (if quoting books is illegal here, tell me and I will remove the excerpt):
While a great many things can be
defined using the dynamic keyword,
there are some limitations regarding
its usage. While they are not show
stoppers, do know that a dynamic data
item cannot make use of lambda
expressions or C# anonymous methods
when calling a method. For example,
the following code will always result
in errors, even if the target method
does indeed take a delegate parameter
which takes a string value and returns
void.
dynamic a = GetDynamicObject();
// Error! Methods on dynamic data can’t use lambdas!
a.Method(arg => Console.WriteLine(arg));
To circumvent this restriction, you
will need to work with the underlying
delegate directly, using the
techniques described in Chapter 11
(anonymous methods and lambda
expressions, etc). Another limitation
is that a dynamic point of data cannot
understand any extension methods (see
Chapter 12). Unfortunately, this would
also include any of the extension
methods which come from the LINQ APIs.
Therefore, a variable declared with
the dynamic keyword has very limited
use within LINQ to Objects and other
LINQ technologies:
dynamic a = GetDynamicObject();
// Error! Dynamic data can’t find the Select() extension method!
var data = from d in a select d;
Thanks in advance.
Tomas's conjectures are pretty good. His reasoning on extension methods is spot on. Basically, to make extension methods work we need the call site to at runtime somehow know what using directives were in force at compile time. We simply did not have enough time or budget to develop a system whereby this information could be persisted into the call site.
For lambdas, the situation is actually more complex than the simple problem of determining whether the lambda is going to expression tree or delegate. Consider the following:
d.M(123)
where d is an expression of type dynamic. *What object should get passed at runtime as the argument to the call site "M"? Clearly we box 123 and pass that. Then the overload resolution algorithm in the runtime binder looks at the runtime type of d and the compile-time type of the int 123 and works with that.
Now what if it was
d.M(x=>x.Foo())
Now what object should we pass as the argument? We have no way to represent "lambda method of one variable that calls an unknown function called Foo on whatever the type of x turns out to be".
Suppose we wanted to implement this feature: what would we have to implement? First, we'd need a way to represent an unbound lambda. Expression trees are by design only for representing lambdas where all types and methods are known. We'd need to invent a new kind of "untyped" expression tree. And then we'd need to implement all of the rules for lambda binding in the runtime binder.
Consider that last point. Lambdas can contain statements. Implementing this feature requires that the runtime binder contain the entire semantic analyzer for every possible statement in C#.
That was orders of magnitude out of our budget. We'd still be working on C# 4 today if we'd wanted to implement that feature.
Unfortunately this means that LINQ doesn't work very well with dynamic, because LINQ of course uses untyped lambdas all over the place. Hopefully in some hypothetical future version of C# we will have a more fully-featured runtime binder and the ability to do homoiconic representations of unbound lambdas. But I wouldn't hold my breath waiting if I were you.
UPDATE: A comment asks for clarification on the point about the semantic analyzer.
Consider the following overloads:
class C {
public void M(Func<IDisposable, int> f) { ... }
public void M(Func<int, int> f) { ... }
...
}
and a call
d.M(x=> { using(x) { return 123; } });
Suppose d is of compile time type dynamic and runtime type C. What must the runtime binder do?
The runtime binder must determine at runtime whether the expression x=>{...} is convertible to each of the delegate types in each of the overloads of M.
In order to do that, the runtime binder must be able to determine that the second overload is not applicable. If it were applicable then you could have an int as the argument to a using statement, but the argument to a using statement must be disposable. That means that the runtime binder must know all the rules for the using statement and be able to correctly report whether any possible use of the using statement is legal or illegal.
Clearly that is not restricted to the using statement. The runtime binder must know all the rules for all of C# in order to determine whether a given statement lambda is convertible to a given delegate type.
We did not have time to write a runtime binder that was essentially an entire new C# compiler that generates DLR trees rather than IL. By not allowing lambdas we only have to write a runtime binder that knows how to bind method calls, arithmetic expressions and a few other simple kinds of call sites. Allowing lambdas makes the problem of runtime binding on the order of dozens or hundreds of times more expensive to implement, test and maintain.
Lambdas: I think that one reason for not supporting lambdas as parameters to dynamic objects is that the compiler wouldn't know whether to compile the lambda as a delegate or as an expression tree.
When you use a lambda, the compiler decides based on the type of the target parameter or variable. When it is Func<...> (or other delegate) it compiles the lambda as an executable delegate. When the target is Expression<...> it compiles lambda into an expression tree.
Now, when you have a dynamic type, you don't know whether the parameter is delegate or expression, so the compiler cannot decide what to do!
Extension methods: I think that the reason here is that finding extension methods at runtime would be quite difficult (and perhaps also inefficient). First of all, the runtime would need to know what namespaces were referenced using using. Then it would need to search all classes in all loaded assemblies, filter those that are accessible (by namespace) and then search those for extension methods...
Eric (and Tomas) says it well, but here is how I think of it.
This C# statement
a.Method(arg => Console.WriteLine(arg));
has no meaning without a lot of context. Lambda expressions themselves have no types, rather they are convertible to delegate (or Expression) types. So the only way to gather the meaning is to provide some context which forces the lambda to be converted to a specific delegate type. That context is typically (as in this example) overload resolution; given the type of a, and the available overloads Method on that type (including extension members), we can possibly place some context that gives the lambda meaning.
Without that context to produce the meaning, you end up having to bundle up all kinds of information about the lambda in the hopes of somehow binding the unknowns at runtime. (What IL could you possibly generate?)
In vast contrast, one you put a specific delegate type there,
a.Method(new Action<int>(arg => Console.WriteLine(arg)));
Kazam! Things just got easy. No matter what code is inside the lambda, we now know exactly what type it has, which means we can compile IL just as we would any method body (we now know, for example, which of the many overloads of Console.WriteLine we're calling). And that code has one specific type (Action<int>), which means it is easy for the runtime binder to see if a has a Method that takes that type of argument.
In C#, a naked lambda is almost meaningless. C# lambdas need static context to give them meaning and rule out ambiguities that arise from many possible coercisons and overloads. A typical program provides this context with ease, but the dynamic case lacks this important context.
I was going to post a question, but figured it out ahead of time and decided to post the question and the answer - or at least my observations.
When using an anonymous delegate as the WaitCallback, where ThreadPool.QueueUserWorkItem is called in a foreach loop, it appears that the same one foreach-value is passed into each thread.
List< Thing > things = MyDb.GetTheThings();
foreach( Thing t in Things)
{
localLogger.DebugFormat( "About to queue thing [{0}].", t.Id );
ThreadPool.QueueUserWorkItem(
delegate()
{
try
{
WorkWithOneThing( t );
}
finally
{
Cleanup();
localLogger.DebugFormat("Thing [{0}] has been queued and run by the delegate.", t.Id );
}
});
}
For a collection of 16 Thing instances in Things I observed that each 'Thing' passed to WorkWithOneThing corresponded to the last item in the 'things' list.
I suspect this is because the delegate is accessing the 't' outer variable. Note that I also experimented with passing the Thing as a parameter to the anonymous delegate, but the behavior remained incorrect.
When I re-factored the code to use a named WaitCallback method and passed the Thing 't' to the method, voilà ... the i'th instance of Things was correctly passed into WorkWithOneThing.
A lesson in parallelism I guess. I also imagine that the Parallel.For family addresses this, but that library was not an option for us at this point.
Hope this saves someone else some time.
Howard Hoffman
This is correct, and describes how C# captures outside variables inside closures. It's not directly an issue about parallelism, but rather about anonymous methods and lambda expressions.
This question discusses this language feature and its implications in detail.
This is a common occurrence when using closures and is especially evident when constructing LINQ queries. The closure references the variable, not its contents, therefore, to make your example work, you can just specify a variable inside the loop that takes the value of t and then reference that in the closure. This will ensure each version of your anonymous delegate references a different variable.
Below is a link detailing why that happens. It's written for VB but C# has the same semantics.
http://blogs.msdn.com/jaredpar/archive/2007/07/26/closures-in-vb-part-5-looping.aspx