I just read MSDN and found something need any advise here.
http://msdn.microsoft.com/en-us/library/0yw3tz5k.aspx
A reference to the outer variable n is said to be captured when the delegate is created. Unlike local variables, the lifetime of a captured variable extends until the delegates that reference the anonymous methods are eligible for garbage collection.
Does the "Captured" means it will be copy by value?
But however I try to write a sample program as follow:
class Program
{
class async_class
{
private int n = 0;
public async_class()
{
for (int i = 0; i <= 9; i++)
{
System.Console.WriteLine("Outer n={0} address={1}", n, n.GetHashCode());
System.Threading.Thread thread1 = new System.Threading.Thread( () =>
{
System.Console.WriteLine("Inner after n={0} address={1}", ++n, n.GetHashCode());
});
thread1.Start();
//n = 10;
}
}
}
static void Main(string[] args)
{
async_class class1 = new async_class();
}
}
}
In this sample, the inner "++n" will write back to original outer "n". So the result will be.
Outer n=0 address=0
Outer n=0 address=0
Inner after n=1 address=1
Outer n=1 address=1
Anyone could explain more detail about the "captured" outer variable?
No, the whole point of saying that it's captured is that it's not just copying the values. Closures close over variables, not over values. Every single access of n in your program is accessing the same variable, there are never any copies made of it.
That said, your program is a confusing example of this case; it is using multiple threads, and that is introducing all sorts of race conditions as you are not safely manipulating the variable from multiple threads. This will cause all sorts of undefined behaviors. If you want to study closures, do so from a single thread; it will make your program much, much, much easier to reason about.
You can write much simpler programs to demonstrate that closures close over variables, not values. Here's a simple snippet:
int n = 2;
Action a = () => Console.WriteLine(n);
n = 5;
a();
If the closure captured the value of n, this would print 2. If it closes over the variable instead of its value, it will print 5. Go ahead and run it to see what happens.
Related
I' studying about ThreadPool in C# recently. And I wrote a small function to test it. Just as the code shown below:
static void threadTest()
{
int totalCnt = 2;
ManualResetEvent manual = new ManualResetEvent(false);
for(int i = 0; i < 2; ++i)
{
int cur = i;
Console.WriteLine("Before thread:" + cur);
ThreadPool.QueueUserWorkItem((s) => {
Console.WriteLine("In thread:" + cur);
if (Interlocked.Decrement(ref totalCnt) == 0) manual.Set();
});
Console.WriteLine("End thread:" + cur);
}
manual.WaitOne();
}
The output of this code is:
Before thread:0
End thread:0
Before thread:1
End thread:1
Before thread:2
End thread:2
Before thread:3
End thread:3
In thread:0
In thread:3
In thread:2
In thread:1
"In thread:xx" is printed after all loops are over. But every thread needs a temporary variable "cur" to output this line. However,as far as I know, a temporary variable in one loop will be removed from stack after the loop is over. In this example, thread can still visit "cur" when its loop is over. Can anyone explain this to me. Thanks!!
From the documentation about lambdas
Lambdas can refer to outer variables. These are the variables that are in scope in the method that defines the lambda expression, or in scope in the type that contains the lambda expression. Variables that are captured in this manner are stored for use in the lambda expression even if the variables would otherwise go out of scope and be garbage collected. An outer variable must be definitely assigned before it can be consumed in a lambda expression.
Emphasis mine.
In effect the lambda expression will be rewritten by the compiler to a class, and any outer variable used in the expression will be added as fields in that class.
What is a closure? Do we have them in .NET?
If they do exist in .NET, could you please provide a code snippet (preferably in C#) explaining it?
I have an article on this very topic. (It has lots of examples.)
In essence, a closure is a block of code which can be executed at a later time, but which maintains the environment in which it was first created - i.e. it can still use the local variables etc of the method which created it, even after that method has finished executing.
The general feature of closures is implemented in C# by anonymous methods and lambda expressions.
Here's an example using an anonymous method:
using System;
class Test
{
static void Main()
{
Action action = CreateAction();
action();
action();
}
static Action CreateAction()
{
int counter = 0;
return delegate
{
// Yes, it could be done in one statement;
// but it is clearer like this.
counter++;
Console.WriteLine("counter={0}", counter);
};
}
}
Output:
counter=1
counter=2
Here we can see that the action returned by CreateAction still has access to the counter variable, and can indeed increment it, even though CreateAction itself has finished.
If you are interested in seeing how C# implements Closure read "I know the answer (its 42) blog"
The compiler generates a class in the background to encapsulate the anoymous method and the variable j
[CompilerGenerated]
private sealed class <>c__DisplayClass2
{
public <>c__DisplayClass2();
public void <fillFunc>b__0()
{
Console.Write("{0} ", this.j);
}
public int j;
}
for the function:
static void fillFunc(int count) {
for (int i = 0; i < count; i++)
{
int j = i;
funcArr[i] = delegate()
{
Console.Write("{0} ", j);
};
}
}
Turning it into:
private static void fillFunc(int count)
{
for (int i = 0; i < count; i++)
{
Program.<>c__DisplayClass1 class1 = new Program.<>c__DisplayClass1();
class1.j = i;
Program.funcArr[i] = new Func(class1.<fillFunc>b__0);
}
}
Closures are functional values that hold onto variable values from their original scope. C# can use them in the form of anonymous delegates.
For a very simple example, take this C# code:
delegate int testDel();
static void Main(string[] args)
{
int foo = 4;
testDel myClosure = delegate()
{
return foo;
};
int bar = myClosure();
}
At the end of it, bar will be set to 4, and the myClosure delegate can be passed around to be used elsewhere in the program.
Closures can be used for a lot of useful things, like delayed execution or to simplify interfaces - LINQ is mainly built using closures. The most immediate way it comes in handy for most developers is adding event handlers to dynamically created controls - you can use closures to add behavior when the control is instantiated, rather than storing data elsewhere.
Func<int, int> GetMultiplier(int a)
{
return delegate(int b) { return a * b; } ;
}
//...
var fn2 = GetMultiplier(2);
var fn3 = GetMultiplier(3);
Console.WriteLine(fn2(2)); //outputs 4
Console.WriteLine(fn2(3)); //outputs 6
Console.WriteLine(fn3(2)); //outputs 6
Console.WriteLine(fn3(3)); //outputs 9
A closure is an anonymous function passed outside of the function in which it is created.
It maintains any variables from the function in which it is created that it uses.
A closure is when a function is defined inside another function (or method) and it uses the variables from the parent method. This use of variables which are located in a method and wrapped in a function defined within it, is called a closure.
Mark Seemann has some interesting examples of closures in his blog post where he does a parallel between oop and functional programming.
And to make it more detailed
var workingDirectory = new DirectoryInfo(Environment.CurrentDirectory);//when this variable
Func<int, string> read = id =>
{
var path = Path.Combine(workingDirectory.FullName, id + ".txt");//is used inside this function
return File.ReadAllText(path);
};//the entire process is called a closure.
Here is a contrived example for C# which I created from similar code in JavaScript:
public delegate T Iterator<T>() where T : class;
public Iterator<T> CreateIterator<T>(IList<T> x) where T : class
{
var i = 0;
return delegate { return (i < x.Count) ? x[i++] : null; };
}
So, here is some code that shows how to use the above code...
var iterator = CreateIterator(new string[3] { "Foo", "Bar", "Baz"});
// So, although CreateIterator() has been called and returned, the variable
// "i" within CreateIterator() will live on because of a closure created
// within that method, so that every time the anonymous delegate returned
// from it is called (by calling iterator()) it's value will increment.
string currentString;
currentString = iterator(); // currentString is now "Foo"
currentString = iterator(); // currentString is now "Bar"
currentString = iterator(); // currentString is now "Baz"
currentString = iterator(); // currentString is now null
Hope that is somewhat helpful.
Closures are chunks of code that reference a variable outside themselves, (from below them on the stack), that might be called or executed later, (like when an event or delegate is defined, and could get called at some indefinite future point in time)... Because the outside variable that the chunk of code references may gone out of scope (and would otherwise have been lost), the fact that it is referenced by the chunk of code (called a closure) tells the runtime to "hold" that variable in scope until it is no longer needed by the closure chunk of code...
Basically closure is a block of code that you can pass as an argument to a function. C# supports closures in form of anonymous delegates.
Here is a simple example:
List.Find method can accept and execute piece of code (closure) to find list's item.
// Passing a block of code as a function argument
List<int> ints = new List<int> {1, 2, 3};
ints.Find(delegate(int value) { return value == 1; });
Using C#3.0 syntax we can write this as:
ints.Find(value => value == 1);
If you write an inline anonymous method (C#2) or (preferably) a Lambda expression (C#3+), an actual method is still being created. If that code is using an outer-scope local variable - you still need to pass that variable to the method somehow.
e.g. take this Linq Where clause (which is a simple extension method which passes a lambda expression):
var i = 0;
var items = new List<string>
{
"Hello","World"
};
var filtered = items.Where(x =>
// this is a predicate, i.e. a Func<T, bool> written as a lambda expression
// which is still a method actually being created for you in compile time
{
i++;
return true;
});
if you want to use i in that lambda expression, you have to pass it to that created method.
So the first question that arises is: should it be passed by value or reference?
Pass by reference is (I guess) more preferable as you get read/write access to that variable (and this is what C# does; I guess the team in Microsoft weighed the pros and cons and went with by-reference; According to Jon Skeet's article, Java went with by-value).
But then another question arises: Where to allocate that i?
Should it actually/naturally be allocated on the stack?
Well, if you allocate it on the stack and pass it by reference, there can be situations where it outlives it's own stack frame. Take this example:
static void Main(string[] args)
{
Outlive();
var list = whereItems.ToList();
Console.ReadLine();
}
static IEnumerable<string> whereItems;
static void Outlive()
{
var i = 0;
var items = new List<string>
{
"Hello","World"
};
whereItems = items.Where(x =>
{
i++;
Console.WriteLine(i);
return true;
});
}
The lambda expression (in the Where clause) again creates a method which refers to an i. If i is allocated on the stack of Outlive, then by the time you enumerate the whereItems, the i used in the generated method will point to the i of Outlive, i.e. to a place in the stack that is no longer accessible.
Ok, so we need it on the heap then.
So what the C# compiler does to support this inline anonymous/lambda, is use what is called "Closures": It creates a class on the Heap called (rather poorly) DisplayClass which has a field containing the i, and the Function that actually uses it.
Something that would be equivalent to this (you can see the IL generated using ILSpy or ILDASM):
class <>c_DisplayClass1
{
public int i;
public bool <GetFunc>b__0()
{
this.i++;
Console.WriteLine(i);
return true;
}
}
It instantiates that class in your local scope, and replaces any code relating to i or the lambda expression with that closure instance. So - anytime you are using the i in your "local scope" code where i was defined, you are actually using that DisplayClass instance field.
So if I would change the "local" i in the main method, it will actually change _DisplayClass.i ;
i.e.
var i = 0;
var items = new List<string>
{
"Hello","World"
};
var filtered = items.Where(x =>
{
i++;
return true;
});
filtered.ToList(); // will enumerate filtered, i = 2
i = 10; // i will be overwriten with 10
filtered.ToList(); // will enumerate filtered again, i = 12
Console.WriteLine(i); // should print out 12
it will print out 12, as "i = 10" goes to that dispalyclass field and changes it just before the 2nd enumeration.
A good source on the topic is this Bart De Smet Pluralsight module (requires registration) (also ignore his erroneous use of the term "Hoisting" - what (I think) he means is that the local variable (i.e. i) is changed to refer to the the new DisplayClass field).
In other news, there seems to be some misconception that "Closures" are related to loops - as I understand "Closures" are NOT a concept related to loops, but rather to anonymous methods / lambda expressions use of local scoped variables - although some trick questions use loops to demonstrate it.
A closure aims to simplify functional thinking, and it allows the runtime to manage
state, releasing extra complexity for the developer. A closure is a first-class function
with free variables that are bound in the lexical environment. Behind these buzzwords
hides a simple concept: closures are a more convenient way to give functions access
to local state and to pass data into background operations. They are special functions
that carry an implicit binding to all the nonlocal variables (also called free variables or
up-values) referenced. Moreover, a closure allows a function to access one or more nonlocal variables even when invoked outside its immediate lexical scope, and the body
of this special function can transport these free variables as a single entity, defined in
its enclosing scope. More importantly, a closure encapsulates behavior and passes it
around like any other object, granting access to the context in which the closure was
created, reading, and updating these values.
Just out of the blue,a simple and more understanding answer from the book C# 7.0 nutshell.
Pre-requisit you should know :A lambda expression can reference the local variables and parameters of the method
in which it’s defined (outer variables).
static void Main()
{
int factor = 2;
//Here factor is the variable that takes part in lambda expression.
Func<int, int> multiplier = n => n * factor;
Console.WriteLine (multiplier (3)); // 6
}
Real part:Outer variables referenced by a lambda expression are called captured variables. A lambda expression that captures variables is called a closure.
Last Point to be noted:Captured variables are evaluated when the delegate is actually invoked, not when the variables were captured:
int factor = 2;
Func<int, int> multiplier = n => n * factor;
factor = 10;
Console.WriteLine (multiplier (3)); // 30
A closure is a function, defined within a function, that can access the local variables of it as well as its parent.
public string GetByName(string name)
{
List<things> theThings = new List<things>();
return theThings.Find<things>(t => t.Name == name)[0];
}
so the function inside the find method.
t => t.Name == name
can access the variables inside its scope, t, and the variable name which is in its parents scope. Even though it is executed by the find method as a delegate, from another scope all together.
I really don't understand Tasks and Threads well.
I have a method inside three levels of nested for that I want to run multiple times in different threads/tasks, but the variables I pass to the method go crazy, let me explain with some code:
List<int> numbers=new List<int>();
for(int a=0;a<=70;a++)
{
for(int b=0;b<=6;b++)
{
for(int c=0;b<=10;c++)
{
Task.Factory.StartNew(()=>MyMethod(numbers,a,b,c));
}
}
}
private static bool MyMethod(List<int> nums,int a,int b,int c)
{
//Really a lot of stuff here
}
This is the nest, myMethod really does a lot of things, like calculating the factorial of some numbers, writing into different documents and matching responses with a list of combinations and calling other little methods, it has also some return value (booleans), but I don't care about them at the moment.
The problem is that no task reach an end, it's like everytime the nest call the method it refreshes itself, removing previous instances.
It also give an error, "try to divide for 0", with values OVER the ones delimited by FORs, for example a=71, b=7, c=11 and all variables empty(that's why divided by zero). I really don't know how to solve it.
The problem is, that you are using a variable that has been or will be modifed outside your closure/lambda. You should get a warning, saying "Access to modified closure".
You can fix it by putting your loop variables into locals first and use those:
namespace ConsoleApplication9
{
using System.Collections.Generic;
using System.Threading.Tasks;
class Program
{
static void Main()
{
var numbers = new List<int>();
for(int a=0;a<=70;a++)
{
for(int b=0;b<=6;b++)
{
for(int c=0;c<=10;c++)
{
var unmodifiedA = a;
var unmodifiedB = b;
var unmodifiedC = c;
Task.Factory.StartNew(() => MyMethod(numbers, unmodifiedA, unmodifiedB, unmodifiedC));
}
}
}
}
private static void MyMethod(List<int> nums, int a, int b, int c)
{
//Really a lot of stuffs here
}
}
}
Check your for statements. b and c are never incremented.
You then have a closure over the loop variables which is likely to be the cause of other problems.
Captured variable in a loop in C#
Why is it bad to use an iteration variable in a lambda expression
I find myself doing the following a lot, and i don't know if there is any side effects or not but consider the following in a WinForms C# app.
(please excuse any errors as i am typing the code in, not copy pasting anything)
int a = 1;
int b = 2;
int c = 3;
this.Invoke((MethodInvoker)delegate()
{
int lol = a + b + c;
});
Is there anything wrong with that? Or should i be doing the long way >_<
int a = 1;
int b = 2;
int c = 3;
TrippleIntDelegate ffs = new TrippleIntDelegate(delegate(int a_, int b_, int c_)
{
int lol = a_ + b_ + c_;
});
this.Invoke(ffs);
The difference being the parameters are passed in instead of using the local variables, some pretty sweet .net magic. I think i looked at reflector on it once and it created an entirely new class to hold those variables.
So does it matter? Can i be lazy?
Edit: Note, do not care about the return value obviously. Otherwise i'd have to use my own typed delegate, albeit i could still use the local variables without passing it in!
The way you use it, it doesn't really make a difference. However, in the first case, your anonymous method is capturing the variables, which can have pretty big side effects if you don't know what your doing. For instance :
// No capture :
int a = 1;
Action<int> action = delegate(int a)
{
a = 42; // method parameter a
});
action(a);
Console.WriteLine(a); // 1
// Capture of local variable a :
int a = 1;
Action action = delegate()
{
a = 42; // captured local variable a
};
action();
Console.WriteLine(a); // 42
There's nothing wrong with passing in local variables as long as you understand that you're getting deferred execution. If you write this:
int a = 1;
int b = 2;
int c = 3;
Action action = () => Console.WriteLine(a + b + c);
c = 10;
action(); // Or Invoke(action), etc.
The output of this will be 13, not 6. I suppose this would be the counterpart to what Thomas said; if you read locals in a delegate, it will use whatever values the variables hold when the action is actually executed, not when it is declared. This can produce some interesting results if the variables hold reference types and you invoke the delegate asynchronously.
Other than that, there are lots of good reasons to pass local variables into a delegate; among other things, it can be used to simplify threading code. It's perfectly fine to do as long as you don't get sloppy with it.
Well, all of the other answers seem to ignore the multi-threading context and the issues that arise in that case. If you are indeed using this from WinForms, your first example could throw exceptions. Depending on the actual data you are trying to reference from your delegate, the thread that code is actually invoked on may or may not have the right to access the data you close around.
On the other hand, your second example actually passes the data via parameters. That allows the Invoke method to properly marshal data across thread boundaries and avoid those nasty threading issues. If you are calling Invoke from, say, a background worker, then then you should use something like your second example (although I would opt to use the Action<T, ...> and Func<T, ...> delegates whenever possible rather than creating new ones).
From a style perspective I'd choose the paramater passing variant. It's expresses the intent much easier to pass args instad of take ambients of any sort (and also makes it easier to test). I mean, you could do this:
public void Int32 Add()
{
return this.Number1 + this.Number2
}
but it's neither testable or clear. The sig taking params is much clearer to others what the method is doing... it's adding two numbers: not an arbatrary set of numbers or whatever.
I regularly do this with parms like collections which are used via ref anyway and don't need to be explicitlly 'returned':
public List<string> AddNames(List<String> names)
{
names.Add("kevin");
return names;
}
Even though the names collection is passed by ref and thus does not need to be explicitly returned, it is to me much clearer that the method takes the list and adds to it, then returns it back. In this case, there is no technical reason to write the sig this way, but, to me, good reasons as far as clarity and therefore maintainablity are concerned.
I found the following rather strange. Then again, I have mostly used closures in dynamic languages which shouldn't be suspectable to the same "bug". The following makes the compiler unhappy:
VoidFunction t = delegate { int i = 0; };
int i = 1;
It says:
A local variable named 'i' cannot be
declared in this scope because it
would give a different meaning to 'i',
which is already used in a 'child'
scope to denote something else
So this basically means that variables declared inside a delegate will have the scope of the function declared in. Not exactly what I would have expected. I havn't even tried to call the function. At least Common Lisp has a feature where you say that a variable should have a dynamic name, if you really want it to be local. This is particularly important when creating macros that do not leak, but something like that would be helpful here as well.
So I'm wondering what other people do to work around this issue?
To clarify I'm looking for a solution where the variables I declare in the delegete doesn't interfere with variables declared after the delegate. And I want to still be able to capture variables declared before the delegate.
It has to be that way to allow anonymous methods (and lambdas) to use local variables and parameters scoped in the containing method.
The workarounds are to either use different names for the variable, or create an ordinary method.
The "closure" created by an anonymous function is somewhat different from that created in other dynamic languages (I'll use Javascript as an example).
function thing() {
var o1 = {n:1}
var o2 = {dummy:"Hello"}
return function() { return o1.n++; }
}
var fn = thing();
alert(fn());
alert(fn());
This little chunk of javascript will display 1 then 2. The anonymous function can access the o1 variable because it exists on its scope chain. However the anonymous function has an entirely independant scope in which it could create another o1 variable and thereby hide any other further down the scope chain. Note also that all variables in the entire chain remain, hence o2 would continue to exist holding an object reference for as long as the fn varialbe holds the function reference.
Now compare with C# anonymous functions:-
class C1 { public int n {get; set;} }
class C2 { public string dummy { get; set; } }
Func<int> thing() {
var o1 = new C1() {n=1};
var o2 = new C2() {dummy="Hello"};
return delegate { return o1.n++; };
}
...
Func<int> fn = thing();
Console.WriteLine(fn());
Console.WriteLine(fn());
In this case the anonymous function is not creating a truely independant scope any more than variable declaration in any other in-function { } block of code would be (used in a foreach, if, etc.)
Hence the same rules apply, code outside the block cannot access variables declared inside the block but you cannot reuse an identifier either.
A closure is created when the anonymous function is passed outside of the function that it was created in. The variation from the Javascript example is that only those variables actually used by the anonymous function will remain, hence in this case the object held by o2 will be available for GC as soon as thing completes,
You'll also get CS0136 from code like this:
int i = 0;
if (i == 0) {
int i = 1;
}
The scope of the 2nd declaration of "i" is unambiguous, languages like C++ don't have any beef with it. But the C# language designers decided to forbid it. Given the above snippet, do you think still think that was a bad idea? Throw in a bunch of extra code and you could stare at this code for a while and not see the bug.
The workaround is trivial and painless, just come up with a different variable name.
It's because the delegate can reference variables outside the delegate:
int i = 1;
VoidFunction t = delegate { Console.WriteLine(i); };
If I remember correctly, the compiler creates a class member of the outside variables referenced in the anonymous method, in order to make this work.
Here is a workaround:
class Program
{
void Main()
{
VoidFunction t = RealFunction;
int i = 1;
}
delegate void VoidFunction();
void RealFunction() { int i = 0; }
}
Actually, the error doesn't seem to have anything to do with anonymous delegates or lamda expressions. If you try to compile the following program ...
using System;
class Program
{
static void Main()
{
// Action t = delegate
{
int i = 0;
};
int i = 1;
}
}
... you get exactly the same error, no matter whether you comment in the line or not. The error help shows a very similar case. I think it is reasonable to disallow both cases on the grounds that programmers could confuse the two variables.