I' studying about ThreadPool in C# recently. And I wrote a small function to test it. Just as the code shown below:
static void threadTest()
{
int totalCnt = 2;
ManualResetEvent manual = new ManualResetEvent(false);
for(int i = 0; i < 2; ++i)
{
int cur = i;
Console.WriteLine("Before thread:" + cur);
ThreadPool.QueueUserWorkItem((s) => {
Console.WriteLine("In thread:" + cur);
if (Interlocked.Decrement(ref totalCnt) == 0) manual.Set();
});
Console.WriteLine("End thread:" + cur);
}
manual.WaitOne();
}
The output of this code is:
Before thread:0
End thread:0
Before thread:1
End thread:1
Before thread:2
End thread:2
Before thread:3
End thread:3
In thread:0
In thread:3
In thread:2
In thread:1
"In thread:xx" is printed after all loops are over. But every thread needs a temporary variable "cur" to output this line. However,as far as I know, a temporary variable in one loop will be removed from stack after the loop is over. In this example, thread can still visit "cur" when its loop is over. Can anyone explain this to me. Thanks!!
From the documentation about lambdas
Lambdas can refer to outer variables. These are the variables that are in scope in the method that defines the lambda expression, or in scope in the type that contains the lambda expression. Variables that are captured in this manner are stored for use in the lambda expression even if the variables would otherwise go out of scope and be garbage collected. An outer variable must be definitely assigned before it can be consumed in a lambda expression.
Emphasis mine.
In effect the lambda expression will be rewritten by the compiler to a class, and any outer variable used in the expression will be added as fields in that class.
Related
What is a closure? Do we have them in .NET?
If they do exist in .NET, could you please provide a code snippet (preferably in C#) explaining it?
I have an article on this very topic. (It has lots of examples.)
In essence, a closure is a block of code which can be executed at a later time, but which maintains the environment in which it was first created - i.e. it can still use the local variables etc of the method which created it, even after that method has finished executing.
The general feature of closures is implemented in C# by anonymous methods and lambda expressions.
Here's an example using an anonymous method:
using System;
class Test
{
static void Main()
{
Action action = CreateAction();
action();
action();
}
static Action CreateAction()
{
int counter = 0;
return delegate
{
// Yes, it could be done in one statement;
// but it is clearer like this.
counter++;
Console.WriteLine("counter={0}", counter);
};
}
}
Output:
counter=1
counter=2
Here we can see that the action returned by CreateAction still has access to the counter variable, and can indeed increment it, even though CreateAction itself has finished.
If you are interested in seeing how C# implements Closure read "I know the answer (its 42) blog"
The compiler generates a class in the background to encapsulate the anoymous method and the variable j
[CompilerGenerated]
private sealed class <>c__DisplayClass2
{
public <>c__DisplayClass2();
public void <fillFunc>b__0()
{
Console.Write("{0} ", this.j);
}
public int j;
}
for the function:
static void fillFunc(int count) {
for (int i = 0; i < count; i++)
{
int j = i;
funcArr[i] = delegate()
{
Console.Write("{0} ", j);
};
}
}
Turning it into:
private static void fillFunc(int count)
{
for (int i = 0; i < count; i++)
{
Program.<>c__DisplayClass1 class1 = new Program.<>c__DisplayClass1();
class1.j = i;
Program.funcArr[i] = new Func(class1.<fillFunc>b__0);
}
}
Closures are functional values that hold onto variable values from their original scope. C# can use them in the form of anonymous delegates.
For a very simple example, take this C# code:
delegate int testDel();
static void Main(string[] args)
{
int foo = 4;
testDel myClosure = delegate()
{
return foo;
};
int bar = myClosure();
}
At the end of it, bar will be set to 4, and the myClosure delegate can be passed around to be used elsewhere in the program.
Closures can be used for a lot of useful things, like delayed execution or to simplify interfaces - LINQ is mainly built using closures. The most immediate way it comes in handy for most developers is adding event handlers to dynamically created controls - you can use closures to add behavior when the control is instantiated, rather than storing data elsewhere.
Func<int, int> GetMultiplier(int a)
{
return delegate(int b) { return a * b; } ;
}
//...
var fn2 = GetMultiplier(2);
var fn3 = GetMultiplier(3);
Console.WriteLine(fn2(2)); //outputs 4
Console.WriteLine(fn2(3)); //outputs 6
Console.WriteLine(fn3(2)); //outputs 6
Console.WriteLine(fn3(3)); //outputs 9
A closure is an anonymous function passed outside of the function in which it is created.
It maintains any variables from the function in which it is created that it uses.
A closure is when a function is defined inside another function (or method) and it uses the variables from the parent method. This use of variables which are located in a method and wrapped in a function defined within it, is called a closure.
Mark Seemann has some interesting examples of closures in his blog post where he does a parallel between oop and functional programming.
And to make it more detailed
var workingDirectory = new DirectoryInfo(Environment.CurrentDirectory);//when this variable
Func<int, string> read = id =>
{
var path = Path.Combine(workingDirectory.FullName, id + ".txt");//is used inside this function
return File.ReadAllText(path);
};//the entire process is called a closure.
Here is a contrived example for C# which I created from similar code in JavaScript:
public delegate T Iterator<T>() where T : class;
public Iterator<T> CreateIterator<T>(IList<T> x) where T : class
{
var i = 0;
return delegate { return (i < x.Count) ? x[i++] : null; };
}
So, here is some code that shows how to use the above code...
var iterator = CreateIterator(new string[3] { "Foo", "Bar", "Baz"});
// So, although CreateIterator() has been called and returned, the variable
// "i" within CreateIterator() will live on because of a closure created
// within that method, so that every time the anonymous delegate returned
// from it is called (by calling iterator()) it's value will increment.
string currentString;
currentString = iterator(); // currentString is now "Foo"
currentString = iterator(); // currentString is now "Bar"
currentString = iterator(); // currentString is now "Baz"
currentString = iterator(); // currentString is now null
Hope that is somewhat helpful.
Closures are chunks of code that reference a variable outside themselves, (from below them on the stack), that might be called or executed later, (like when an event or delegate is defined, and could get called at some indefinite future point in time)... Because the outside variable that the chunk of code references may gone out of scope (and would otherwise have been lost), the fact that it is referenced by the chunk of code (called a closure) tells the runtime to "hold" that variable in scope until it is no longer needed by the closure chunk of code...
Basically closure is a block of code that you can pass as an argument to a function. C# supports closures in form of anonymous delegates.
Here is a simple example:
List.Find method can accept and execute piece of code (closure) to find list's item.
// Passing a block of code as a function argument
List<int> ints = new List<int> {1, 2, 3};
ints.Find(delegate(int value) { return value == 1; });
Using C#3.0 syntax we can write this as:
ints.Find(value => value == 1);
If you write an inline anonymous method (C#2) or (preferably) a Lambda expression (C#3+), an actual method is still being created. If that code is using an outer-scope local variable - you still need to pass that variable to the method somehow.
e.g. take this Linq Where clause (which is a simple extension method which passes a lambda expression):
var i = 0;
var items = new List<string>
{
"Hello","World"
};
var filtered = items.Where(x =>
// this is a predicate, i.e. a Func<T, bool> written as a lambda expression
// which is still a method actually being created for you in compile time
{
i++;
return true;
});
if you want to use i in that lambda expression, you have to pass it to that created method.
So the first question that arises is: should it be passed by value or reference?
Pass by reference is (I guess) more preferable as you get read/write access to that variable (and this is what C# does; I guess the team in Microsoft weighed the pros and cons and went with by-reference; According to Jon Skeet's article, Java went with by-value).
But then another question arises: Where to allocate that i?
Should it actually/naturally be allocated on the stack?
Well, if you allocate it on the stack and pass it by reference, there can be situations where it outlives it's own stack frame. Take this example:
static void Main(string[] args)
{
Outlive();
var list = whereItems.ToList();
Console.ReadLine();
}
static IEnumerable<string> whereItems;
static void Outlive()
{
var i = 0;
var items = new List<string>
{
"Hello","World"
};
whereItems = items.Where(x =>
{
i++;
Console.WriteLine(i);
return true;
});
}
The lambda expression (in the Where clause) again creates a method which refers to an i. If i is allocated on the stack of Outlive, then by the time you enumerate the whereItems, the i used in the generated method will point to the i of Outlive, i.e. to a place in the stack that is no longer accessible.
Ok, so we need it on the heap then.
So what the C# compiler does to support this inline anonymous/lambda, is use what is called "Closures": It creates a class on the Heap called (rather poorly) DisplayClass which has a field containing the i, and the Function that actually uses it.
Something that would be equivalent to this (you can see the IL generated using ILSpy or ILDASM):
class <>c_DisplayClass1
{
public int i;
public bool <GetFunc>b__0()
{
this.i++;
Console.WriteLine(i);
return true;
}
}
It instantiates that class in your local scope, and replaces any code relating to i or the lambda expression with that closure instance. So - anytime you are using the i in your "local scope" code where i was defined, you are actually using that DisplayClass instance field.
So if I would change the "local" i in the main method, it will actually change _DisplayClass.i ;
i.e.
var i = 0;
var items = new List<string>
{
"Hello","World"
};
var filtered = items.Where(x =>
{
i++;
return true;
});
filtered.ToList(); // will enumerate filtered, i = 2
i = 10; // i will be overwriten with 10
filtered.ToList(); // will enumerate filtered again, i = 12
Console.WriteLine(i); // should print out 12
it will print out 12, as "i = 10" goes to that dispalyclass field and changes it just before the 2nd enumeration.
A good source on the topic is this Bart De Smet Pluralsight module (requires registration) (also ignore his erroneous use of the term "Hoisting" - what (I think) he means is that the local variable (i.e. i) is changed to refer to the the new DisplayClass field).
In other news, there seems to be some misconception that "Closures" are related to loops - as I understand "Closures" are NOT a concept related to loops, but rather to anonymous methods / lambda expressions use of local scoped variables - although some trick questions use loops to demonstrate it.
A closure aims to simplify functional thinking, and it allows the runtime to manage
state, releasing extra complexity for the developer. A closure is a first-class function
with free variables that are bound in the lexical environment. Behind these buzzwords
hides a simple concept: closures are a more convenient way to give functions access
to local state and to pass data into background operations. They are special functions
that carry an implicit binding to all the nonlocal variables (also called free variables or
up-values) referenced. Moreover, a closure allows a function to access one or more nonlocal variables even when invoked outside its immediate lexical scope, and the body
of this special function can transport these free variables as a single entity, defined in
its enclosing scope. More importantly, a closure encapsulates behavior and passes it
around like any other object, granting access to the context in which the closure was
created, reading, and updating these values.
Just out of the blue,a simple and more understanding answer from the book C# 7.0 nutshell.
Pre-requisit you should know :A lambda expression can reference the local variables and parameters of the method
in which it’s defined (outer variables).
static void Main()
{
int factor = 2;
//Here factor is the variable that takes part in lambda expression.
Func<int, int> multiplier = n => n * factor;
Console.WriteLine (multiplier (3)); // 6
}
Real part:Outer variables referenced by a lambda expression are called captured variables. A lambda expression that captures variables is called a closure.
Last Point to be noted:Captured variables are evaluated when the delegate is actually invoked, not when the variables were captured:
int factor = 2;
Func<int, int> multiplier = n => n * factor;
factor = 10;
Console.WriteLine (multiplier (3)); // 30
A closure is a function, defined within a function, that can access the local variables of it as well as its parent.
public string GetByName(string name)
{
List<things> theThings = new List<things>();
return theThings.Find<things>(t => t.Name == name)[0];
}
so the function inside the find method.
t => t.Name == name
can access the variables inside its scope, t, and the variable name which is in its parents scope. Even though it is executed by the find method as a delegate, from another scope all together.
I just read MSDN and found something need any advise here.
http://msdn.microsoft.com/en-us/library/0yw3tz5k.aspx
A reference to the outer variable n is said to be captured when the delegate is created. Unlike local variables, the lifetime of a captured variable extends until the delegates that reference the anonymous methods are eligible for garbage collection.
Does the "Captured" means it will be copy by value?
But however I try to write a sample program as follow:
class Program
{
class async_class
{
private int n = 0;
public async_class()
{
for (int i = 0; i <= 9; i++)
{
System.Console.WriteLine("Outer n={0} address={1}", n, n.GetHashCode());
System.Threading.Thread thread1 = new System.Threading.Thread( () =>
{
System.Console.WriteLine("Inner after n={0} address={1}", ++n, n.GetHashCode());
});
thread1.Start();
//n = 10;
}
}
}
static void Main(string[] args)
{
async_class class1 = new async_class();
}
}
}
In this sample, the inner "++n" will write back to original outer "n". So the result will be.
Outer n=0 address=0
Outer n=0 address=0
Inner after n=1 address=1
Outer n=1 address=1
Anyone could explain more detail about the "captured" outer variable?
No, the whole point of saying that it's captured is that it's not just copying the values. Closures close over variables, not over values. Every single access of n in your program is accessing the same variable, there are never any copies made of it.
That said, your program is a confusing example of this case; it is using multiple threads, and that is introducing all sorts of race conditions as you are not safely manipulating the variable from multiple threads. This will cause all sorts of undefined behaviors. If you want to study closures, do so from a single thread; it will make your program much, much, much easier to reason about.
You can write much simpler programs to demonstrate that closures close over variables, not values. Here's a simple snippet:
int n = 2;
Action a = () => Console.WriteLine(n);
n = 5;
a();
If the closure captured the value of n, this would print 2. If it closes over the variable instead of its value, it will print 5. Go ahead and run it to see what happens.
I have multiple tasks and each starts at a given LinkedListNode. Something like this:
first = linkedList.First;
int counter = 0;
while (iterator != null) {
counter++;
if (counter == threshold) {
Task.Factory.StartNew(() => run(first, iterator));
counter = 0;
first = iterator.Next;
}
iterator = iterator.Next;
}
The idea is that I want to run through a LinkedList and not convert it to an array because of memory requirements. So, I figured I'd just pass in the start and the end and iterate over that.
When my tasks actually start, it seems like the parameters are where ever they left off in the loop. Is there some way I can form a closure over the variables so that the task starts with the correct nodes from the LinkedList?
Or, maybe a better way of accomplishing this goal with a LinkedList?
Consider using Parallel.ForEach on your linked list instead. This looks like it will save you a lot of trouble.
Due to variable closure, you should declare the variable that is captured in your lambda expression within the body of your loop. Otherwise, the task's reads of the first variable would suffer a race condition with the main thread's subsequent updates to it.
first = linkedList.First;
while (iterator != null)
{
// ...
var current = first;
Task.Factory.StartNew(() => run(current));
// ...
}
You can pass state to your Task through the appropriate StartNew method which takes an object parameter. A closure is automatically performed over the data passed in this way to the Task.
I just encountered the following behavior:
for (var i = 0; i < 50; ++i) {
Task.Factory.StartNew(() => {
Debug.Print("Error: " + i.ToString());
});
}
Will result in a series of "Error: x", where most of the x are equal to 50.
Similarly:
var a = "Before";
var task = new Task(() => Debug.Print("Using value: " + a));
a = "After";
task.Start();
Will result in "Using value: After".
This clearly means that the concatenation in the lambda expression does not occur immediately. How is it possible to use a copy of the outer variable in the lambda expression, at the time the expression is declared? The following will not work better (which is not necessarily incoherent, I admit):
var a = "Before";
var task = new Task(() => {
var a2 = a;
Debug.Print("Using value: " + a2);
});
a = "After";
task.Start();
This has more to do with lambdas than threading. A lambda captures the reference to a variable, not the variable's value. This means that when you try to use i in your code, its value will be whatever was stored in i last.
To avoid this, you should copy the variable's value to a local variable when the lambda starts. The problem is, starting a task has overhead and the first copy may be executed only after the loop finishes. The following code will also fail
for (var i = 0; i < 50; ++i) {
Task.Factory.StartNew(() => {
var i1=i;
Debug.Print("Error: " + i1.ToString());
});
}
As James Manning noted, you can add a variable local to the loop and copy the loop variable there. This way you are creating 50 different variables to hold the value of the loop variable, but at least you get the expected result. The problem is, you do get a lot of additional allocations.
for (var i = 0; i < 50; ++i) {
var i1=i;
Task.Factory.StartNew(() => {
Debug.Print("Error: " + i1.ToString());
});
}
The best solution is to pass the loop parameter as a state parameter:
for (var i = 0; i < 50; ++i) {
Task.Factory.StartNew(o => {
var i1=(int)o;
Debug.Print("Error: " + i1.ToString());
}, i);
}
Using a state parameter results in fewer allocations. Looking at the decompiled code:
the second snippet will create 50 closures and 50 delegates
the third snippet will create 50 boxed ints but only a single delegate
That's because you are running the code in a new thread, and the main thread immediately goes on to change the variable. If the lambda expression were executed immediately, the entire point of using a task would be lost.
The thread doesn't get its own copy of the variable at the time the task is created, all the tasks use the same variable (which actually is stored in the closure for the method, it's not a local variable).
Lambda expressions do capture not the value of the outer variable but a reference to it. That is the reason why you do see 50 or After in your tasks.
To solve this create before your lambda expression a copy of it to capture it by value.
This unfortunate behaviour will be fixed by the C# compiler with .NET 4.5 until then you need to live with this oddity.
Example:
List<Action> acc = new List<Action>();
for (int i = 0; i < 10; i++)
{
int tmp = i;
acc.Add(() => { Console.WriteLine(tmp); });
}
acc.ForEach(x => x());
Lambda expressions are by definition lazily evaluated so they will not be evaluated until actually called. In your case by the task execution. If you close over a local in your lambda expression the state of the local at the time of execution will be reflected. Which is what you see. You can take advantage of this. E.g. your for loop really don't need a new lambda for every iteration assuming for the sake of the example that the described result was what you intended you could write
var i =0;
Action<int> action = () => Debug.Print("Error: " + i);
for(;i<50;+i){
Task.Factory.StartNew(action);
}
on the other hand if you wished that it actually printed "Error: 1"..."Error 50" you could change the above to
var i =0;
Func<Action<int>> action = (x) => { return () => Debug.Print("Error: " + x);}
for(;i<50;+i){
Task.Factory.StartNew(action(i));
}
The first closes over i and will use the state of i at the time the Action is executed and the state is often going to be the state after the loop finishes. In the latter case i is evaluated eagerly because it's passed as an argument to a function. This function then returns an Action<int> which is passed to StartNew.
So the design decision makes both lazily evaluation and eager evaluation possible. Lazily because locals are closed over and eagerly because you can force locals to be executed by passing them as an argument or as shown below declaring another local with a shorter scope
for (var i = 0; i < 50; ++i) {
var j = i;
Task.Factory.StartNew(() => Debug.Print("Error: " + j));
}
All the above is general for Lambdas. In the specific case of StartNew there's actually an overload that does what the second example does so that can be simplified to
var i =0;
Action<object> action = (x) => Debug.Print("Error: " + x);}
for(;i<50;+i){
Task.Factory.StartNew(action,i);
}
When using lambda expressions or anonymous methods in C#, we have to be wary of the access to modified closure pitfall. For example:
foreach (var s in strings)
{
query = query.Where(i => i.Prop == s); // access to modified closure
...
}
Due to the modified closure, the above code will cause all of the Where clauses on the query to be based on the final value of s.
As explained here, this happens because the s variable declared in foreach loop above is translated like this in the compiler:
string s;
while (enumerator.MoveNext())
{
s = enumerator.Current;
...
}
instead of like this:
while (enumerator.MoveNext())
{
string s;
s = enumerator.Current;
...
}
As pointed out here, there are no performance advantages to declaring a variable outside the loop, and under normal circumstances the only reason I can think of for doing this is if you plan to use the variable outside the scope of the loop:
string s;
while (enumerator.MoveNext())
{
s = enumerator.Current;
...
}
var finalString = s;
However variables defined in a foreach loop cannot be used outside the loop:
foreach(string s in strings)
{
}
var finalString = s; // won't work: you're outside the scope.
So the compiler declares the variable in a way that makes it highly prone to an error that is often difficult to find and debug, while producing no perceivable benefits.
Is there something you can do with foreach loops this way that you couldn't if they were compiled with an inner-scoped variable, or is this just an arbitrary choice that was made before anonymous methods and lambda expressions were available or common, and which hasn't been revised since then?
The compiler declares the variable in a way that makes it highly prone to an error that is often difficult to find and debug, while producing no perceivable benefits.
Your criticism is entirely justified.
I discuss this problem in detail here:
Closing over the loop variable considered harmful
Is there something you can do with foreach loops this way that you couldn't if they were compiled with an inner-scoped variable? or is this just an arbitrary choice that was made before anonymous methods and lambda expressions were available or common, and which hasn't been revised since then?
The latter. The C# 1.0 specification actually did not say whether the loop variable was inside or outside the loop body, as it made no observable difference. When closure semantics were introduced in C# 2.0, the choice was made to put the loop variable outside the loop, consistent with the "for" loop.
I think it is fair to say that all regret that decision. This is one of the worst "gotchas" in C#, and we are going to take the breaking change to fix it. In C# 5 the foreach loop variable will be logically inside the body of the loop, and therefore closures will get a fresh copy every time.
The for loop will not be changed, and the change will not be "back ported" to previous versions of C#. You should therefore continue to be careful when using this idiom.
What you are asking is thoroughly covered by Eric Lippert in his blog post Closing over the loop variable considered harmful and its sequel.
For me, the most convincing argument is that having new variable in each iteration would be inconsistent with for(;;) style loop. Would you expect to have a new int i in each iteration of for (int i = 0; i < 10; i++)?
The most common problem with this behavior is making a closure over iteration variable and it has an easy workaround:
foreach (var s in strings)
{
var s_for_closure = s;
query = query.Where(i => i.Prop == s_for_closure); // access to modified closure
My blog post about this issue: Closure over foreach variable in C#.
Having been bitten by this, I have a habit of including locally defined variables in the innermost scope which I use to transfer to any closure. In your example:
foreach (var s in strings)
query = query.Where(i => i.Prop == s); // access to modified closure
I do:
foreach (var s in strings)
{
string search = s;
query = query.Where(i => i.Prop == search); // New definition ensures unique per iteration.
}
Once you have that habit, you can avoid it in the very rare case you actually intended to bind to the outer scopes. To be honest, I don't think I have ever done so.
In C# 5.0, this problem is fixed and you can close over loop variables and get the results you expect.
The language specification says:
8.8.4 The foreach statement
(...)
A foreach statement of the form
foreach (V v in x) embedded-statement
is then expanded to:
{
E e = ((C)(x)).GetEnumerator();
try {
while (e.MoveNext()) {
V v = (V)(T)e.Current;
embedded-statement
}
}
finally {
… // Dispose e
}
}
(...)
The placement of v inside the while loop is important for how it is
captured by any anonymous function occurring in the
embedded-statement. For example:
int[] values = { 7, 9, 13 };
Action f = null;
foreach (var value in values)
{
if (f == null) f = () => Console.WriteLine("First value: " + value);
}
f();
If v was declared outside of the while loop, it would be shared
among all iterations, and its value after the for loop would be the
final value, 13, which is what the invocation of f would print.
Instead, because each iteration has its own variable v, the one
captured by f in the first iteration will continue to hold the value
7, which is what will be printed. (Note: earlier versions of C#
declared v outside of the while loop.)