C# lambda, local variable value not taken when you think?

Suppose we have the following code:
void AFunction()
{
foreach(AClass i in AClassCollection)
{
listOfLambdaFunctions.AddLast( () => { PrintLine(i.name); } );
}
}
void Main()
{
AFunction();
foreach( var i in listOfLambdaFunctions)
i();
}
One might think that the above code would output the same as the following:
void Main()
{
foreach(AClass i in AClassCollection)
PrintLine(i.name);
}
However, it doesn't. Instead, it prints the name of the last item in AClassCollection every time.
It appears as if the same item was being used in each lambda function. I suspect there might be some delay from when the lambda was created to when the lambda took a snapshot of the external variables used in it.
Essentially, the lambda is holding a reference to the local variable i, instead of taking a "snapshot" of i's value when the lambda was created.
To test this theory, I tried this code:
string astr = "a string";
AFunc fnc = () => { System.Diagnostics.Debug.WriteLine(astr); };
astr = "changed";
fnc();
and, surprise, it outputs changed!
I am using XNA 3.1, and whichever version of C# that comes with it.
My questions are:
What is going on?
Does the lambda function somehow store a 'reference' to the variable or something?
Is there any way around this problem?

This is a modified closure
See: similar questions like Access to Modified Closure
To work around the issue, store a copy of the variable inside the scope of the foreach loop:
foreach(AClass i in AClassCollection)
{
AClass anotherI= i;
listOfLambdaFunctions.AddLast( () => { PrintLine(anotherI.name); } );
}

does the lambda function somehow store a 'reference' to the variable or something?
Close. The lambda function captures the variable itself. There is no need to store a reference to a variable, and in fact, in .NET it is impossible to permanently store a reference to a variable. You just capture the entire variable. You never capture the value of the variable.
Remember, a variable is a storage location. The name "i" refers to a particular storage location, and in your case, it always refers to the same storage location.
Is there any way around this problem?
Yes. Create a new variable every time through the loop. The closure then captures a different variable every time.
This is one of the most frequently reported problems with C#. We're considering changing the semantics of the loop variable declaration so that a new variable is created every time through the loop.
For more details on this issue see my articles on the subject:
http://ericlippert.com/2009/11/12/closing-over-the-loop-variable-considered-harmful-part-one/
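A minimal sketch of that fix, using only the base class library. The names ClosureDemo, SharedVariable and FreshVariable are made up for illustration. The first method declares the variable once, outside the loop, which mimics how the foreach loop variable behaved in the compiler discussed here; the second declares a new variable on every pass.

```csharp
using System;
using System.Collections.Generic;

public static class ClosureDemo
{
    // One variable declared outside the loop: every lambda captures that same variable.
    public static List<string> SharedVariable(string[] names)
    {
        var readers = new List<Func<string>>();
        string current = string.Empty;
        for (int index = 0; index < names.Length; index++)
        {
            current = names[index];
            readers.Add(() => current);
        }
        return InvokeAll(readers);
    }

    // A new variable declared inside the loop body: each lambda captures its own variable.
    public static List<string> FreshVariable(string[] names)
    {
        var readers = new List<Func<string>>();
        for (int index = 0; index < names.Length; index++)
        {
            string current = names[index];
            readers.Add(() => current);
        }
        return InvokeAll(readers);
    }

    // Runs the lambdas only after the loop has finished.
    private static List<string> InvokeAll(List<Func<string>> readers)
    {
        var results = new List<string>();
        foreach (Func<string> reader in readers)
            results.Add(reader());
        return results;
    }

    public static void Main()
    {
        string[] names = { "a", "b", "c" };
        Console.WriteLine(string.Join(",", SharedVariable(names))); // c,c,c
        Console.WriteLine(string.Join(",", FreshVariable(names)));  // a,b,c
    }
}
```

The only difference between the two methods is where current is declared, which is the whole point: the lambda body is identical in both.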

what is going on? does the lambda function somehow store a 'reference' to the variable or something?
Yes, exactly that; C# lambdas capture the variable itself, not the value of the variable. You can usually get around this by introducing a temp variable and binding to that:
string astr = "a string";
var tmp = astr;
AFunc fnc = () => { System.Diagnostics.Debug.WriteLine(tmp); };
especially in foreach where this is notorious.
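To see both captures side by side, here is a small sketch. SnapshotDemo and the two delegate names are hypothetical, and Func<string> stands in for the question's AFunc delegate so the results can be returned rather than written to the debug output.

```csharp
using System;

public static class SnapshotDemo
{
    public static string[] Run()
    {
        string astr = "a string";
        string tmp = astr;                    // copy the current value into a second variable

        Func<string> readsAstr = () => astr;  // captures the variable astr
        Func<string> readsTmp = () => tmp;    // captures the variable tmp

        astr = "changed";                     // only astr is reassigned; tmp is left alone

        return new[] { readsAstr(), readsTmp() };
    }

    public static void Main()
    {
        string[] results = Run();
        Console.WriteLine(results[0]); // changed
        Console.WriteLine(results[1]); // a string
    }
}
```

The temp variable behaves like a snapshot only because nothing assigns to tmp after the lambda is created; it is still captured as a variable, not as a value.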

Yes, the lambda stores a reference to the variable (conceptually speaking, anyway).
A very simple workaround is this:
foreach(AClass i in AClassCollection)
{
AClass j = i;
listOfLambdaFunctions.AddLast( () => { PrintLine(j.name); } );
}
In every iteration of the foreach loop, a new j gets created, which the lambda captures.
i, on the other hand, is the same variable throughout, but gets updated with every iteration (so all the lambdas end up seeing the last value).
And I agree that this is a bit surprising. :)

I've been caught by this one as well. As Calgary Coder said, it is a modified closure. I really had trouble spotting them until I got ReSharper. Since this is one of the warnings that ReSharper watches for, I am much better at identifying them as I code.

Related

Why doesn't my C# compiler (Visual Studio) let me do this with a try block?

I have many scenarios during my development where I want to do something such as
try
{
long presult = EvalInner(eqtn,new Tuple<int, int>(++begidx,curidx-1),eqtnArgs);
}
catch ( Exception e )
{
throw e;
}
result = evalNewResult(result,lastop,presult,ref negateNextNum,ref negateNextOp);
// ...
return presult;
but then my compiler flags the presult on the line
result = evalNewResult(result,lastop,presult,ref negateNextNum,ref negateNextOp);
saying
The name 'presult' does not exist in the current context
If it were smart, it would understand that presult is either initialized in the try block, or the procedure is exited before presult is ever used.
Possible workarounds (none of them good):
1. Declare long presult; right before the try statement. This makes the compiler mad because it wrongly thinks there's a possibility of returning an uninitialized variable.
2. Initialize it with long presult = default(long). This works, but it's bad practice because someone reading the code doesn't know whether initializing it to the default value is to work around the problem described in 1. or because presult being set to the default long has some real meaning in the context of the program.
3. Initialize it with long? presult = null. This is semantically better because it's clear that it means "presult is meant to have no value at this point" whereas in 2. the reader has to figure out that presult has a meaningless value. The problem here is that, not only does it take extra memory to make a value nullable, but I then have to change the function EvalInner to return a long? and this results in a chain of needing to change many more longs to long?s and my program ends up splattered with nullable variables; it's a complete mess of question marks haha.
Anyways, how should I be handling a case like this?
I'll go over your points one by one:
Declare long presult; right before the try statement. This makes the
compiler mad because it wrongly thinks there's a possibility of
returning an uninitialized variable.
Actually, the compiler correctly determines that there is the possibility of returning an uninitialized variable. Since the variable is only set if the function on the right hand side succeeds, and since you have it in a try..catch block then there is the possibility that the function may throw and not return, therefore not initializing the variable. What the compiler is not smart enough to see is that you are catching the top level exception and throwing (in a bad way, losing the stack trace) and it should not reach the return. However there are ways to get around that (mostly during debug by dragging the execution cursor).
Initialize it with long presult = default(long). This works, but
it's bad practice because someone reading the code doesn't know
whether initializing it to the default value is to work around the
problem described in 1. or because presult being set
to the default long has some real meaning in the context of the
program.
Since value types like long, int, short etc must have a value, this is not bad practice. If you want to represent them as not having a value, use the nullable versions of those types (i.e. long? presult = null).
Initialize it with long? presult = null. This is semantically better
because it's clear that it means "presult is meant to have no value
at this point" whereas in 2. the reader has to figure out that
presult has a meaningless value. The problem here is that, not only
does it take extra memory to nullify a value, but I then have to
change the function EvalInner to return a long? and this results in
a chain of needing to change many more longs to long?s and my
program ends up splattered with nullified variables; it's a complete
mess of question marks haha.
Again, the function must return a value that is a valid long, so if you want to return something that can easily be identified as an incorrect value, then return the nullable version, otherwise you have to return a valid value. Only float and double have NaN members...
Another option would be some kind of TryXXX method, where the return value is a boolean and you use an out long as a parameter to store the result.
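A rough sketch of that TryXXX shape follows. TryEvalInner and EvaluateOrThrow are hypothetical names, and parsing a number with long.TryParse is only a stand-in for whatever the real EvalInner computes.

```csharp
using System;

public static class EvalDemo
{
    // Hypothetical stand-in for the question's EvalInner: reports failure
    // through the return value instead of throwing.
    public static bool TryEvalInner(string expression, out long result)
    {
        return long.TryParse(expression, out result);
    }

    public static long EvaluateOrThrow(string expression)
    {
        long presult; // no dummy initial value and no long? needed
        if (!TryEvalInner(expression, out presult))
            throw new FormatException("Could not evaluate: " + expression);

        // presult is definitely assigned here, because an out parameter
        // must be assigned before TryEvalInner returns.
        return presult;
    }

    public static void Main()
    {
        Console.WriteLine(EvaluateOrThrow("42")); // 42
    }
}
```

With this shape the caller decides what failure means, and the variable never carries a placeholder value that a reader has to interpret.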
I don't understand your problem. The compiler can't know the value of presult when you call evalNewResult; that's why you need to declare it outside the try block. It's a general rule of scopes in C# and a lot of other languages.
The solution is to declare and initialize it before the try block. The question is "what value should presult have in case an exception occurs". The compiler can't answer this question itself.
How about:
try
{
long presult = EvalInner(eqtn,new Tuple<int, int>(++begidx,curidx-1),eqtnArgs);
result = evalNewResult(result,lastop,presult,ref negateNextNum,ref negateNextOp);
// ...
return presult;
}
catch ( Exception e )
{
//Do some useful logging
throw; //Don't lose stacktrace!
}
Please check this link for more enlightenment:
Compilers are in the business of generating code which manages the storage of the data manipulated by that program. There are lots of different ways of generating code to manage memory, but over time two basic techniques have become entrenched.
The first is to have some sort of "long lived" storage area where the "lifetime" of each byte in the storage -- that is, the period of time when it is validly associated with some program variable -- cannot be easily predicted ahead of time. The compiler generates calls into a "heap manager" that knows how to dynamically allocate storage when it is needed and reclaim it when it is no longer needed.
The second is to have some sort of "short lived" storage area where the lifetime of each byte in the storage is well known, and, in particular, lifetimes of storages follow a "nesting" pattern. That is, the allocation of the longest-lived of the short-lived variables strictly overlaps the allocations of shorter-lived variables that come after it.
Local variables follow the latter pattern; when a method is entered, its local variables come alive. When that method calls another method, the new method's local variables come alive. They'll be dead before the first method's local variables are dead. The relative order of the beginnings and endings of lifetimes of storages associated with local variables can be worked out ahead of time.
For this reason, local variables are usually generated as storage on a "stack" data structure, because a stack has the property that the first thing pushed on it is going to be the last thing popped off.
So overall, local variables are short-lived and usually stored on the stack, because a stack is more efficient than the alternatives. Link for more info on why the stack is used.
Why doesn't my C# compiler (Visual Studio) let me do this with a try block?
That is because braces define a scope. From Variable and Method Scope in Microsoft .NET:
If you declare a variable within a block construct such as an If statement, that variable's scope is only until the end of the block. The lifetime is until the procedure ends.
how should I be handling a case like this?
Go for option 1.
This makes the compiler mad because it wrongly thinks there's a possibility of returning an uninitialized variable
Option 1 does not make the compiler mad. The compiler is always right :-)
I created the following SSCCE and it absolutely works:
using System;
namespace app1
{
class Program
{
static void Main(string[] args)
{
Presult();
}
private static long Presult()
{
long presult;
try
{
object eqtn = null;
char begidx = '\0';
int curidx = 0;
object eqtnArgs = null;
presult = EvalInner(eqtn, new Tuple<int, int>(++begidx, curidx - 1), eqtnArgs);
}
catch (Exception e)
{
throw e;
}
int result = 0;
object lastop = null;
object negateNextNum = null;
object negateNextOp = null;
result = evalNewResult(result, lastop, presult, ref negateNextNum, ref negateNextOp);
// ...
return presult;
}
private static int evalNewResult(int result, object lastop, long presult, ref object negateNextNum, ref object negateNextOp)
{
return 0;
}
private static long EvalInner(object eqtn, Tuple<int, int> tuple, object eqtnArgs)
{
return 0;
}
}
}
how should I be handling a case like this?
The correct way is your Option 1. That doesn't make the compiler "mad", because declaring a variable and initializing it later has been an allowed construct (not only for this particular case) from the very beginning, and the compiler handles it correctly without a problem.
IMO, the only drawback (or better say inconvenience) is that the keyword var cannot be used, so the type must be specified explicitly. But some people that are against using var in general would say this is indeed a good thing :-)

C# ForEach Loop (string declaration in loop)

I have this string array of names.
string[] someArray = { "Messi", "Ronaldo", "Bale", "Neymar", "Pele" };
foreach (string item in someArray)
{
Console.WriteLine(item);
}
When I run in debug mode with a breakpoint on the foreach line and use Step Into, string item is highlighted on each pass, as if it were being declared again. I would expect only item to be highlighted. Does this mean that item is declared again on each pass?
Note: using VS 2012.
The answer here is subtle because the question is (accidentally) unclear. Let me clarify it by breaking it apart into multiple questions.
Is the loop variable logically redeclared every time I go through a foreach loop?
As the other answer correctly notes, in C# 5 and higher, yes. In C# 1 through 4, no.
What is the observable consequence of that change?
In C# 1 there is no way to tell the difference. (And the spec is actually unclear as to where the variable is declared.)
In C# 2 the team added closures in the form of anonymous methods. If there are multiple closures that capture a loop variable in C# 2, 3 and 4 they all capture the same variable, and they all see it mutate as the loop runs. This is almost never what you want.
In C# 5 we took the breaking change to say that a new loop variable is declared every time through the loop, and so each closure captures a different variable and therefore does not observe it changing.
Why does the debugger go back to the loop variable declaration every time I step through the loop?
To inform you that the loop variable is being mutated. If you have
IEnumerable<string> someCollection = whatever;
foreach(string item in someCollection)
{
Console.WriteLine(item);
}
That is logically equivalent to
IEnumerable<string> someCollection = whatever;
{
IEnumerator<string> enumerator = someCollection.GetEnumerator();
try
{
while( enumerator.MoveNext() )
{
string item; // In C# 4.0 and before, this is outside the while
item = enumerator.Current;
{
Console.WriteLine(item);
}
}
}
finally
{
if (enumerator != null)
((IDisposable)enumerator).Dispose();
}
}
If you were debugging around that expanded version of the code, you would see the debugger highlight the invocations of MoveNext and Current. Well, the debugger does the same thing in the compact form; when MoveNext is about to be called the debugger highlights the in, for lack of anything better to highlight. When Current is about to be called it highlights item. Again, what would be the better thing to highlight?
So the fact that the debugger highlights the variable has nothing to do with whether the variable is redeclared?
Correct. It highlights the variable when Current is about to be called, because that will mutate the variable. It mutates the variable regardless of whether it is a new variable, in C# 5, or the same variable as before, in C# 4.
I noticed that you changed my example to use a generic sequence rather than an array. Why did you do that?
Because if the C# compiler knows that the collection in the foreach is an array, it actually just generates a good old fashioned for loop rather than calling MoveNext and Current and all that. Doing so is typically faster and produces less collection pressure, so it's a win all around.
However, the C# team wants the debugging experience to feel the same regardless. So again, the code generator creates stepping points on in -- this time where the generated for's loop variable is mutated -- and again, on the code which mutates the foreach loop variable. That way the debugging experience is consistent.
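The array case described above can be approximated by hand. This is only a sketch of the general shape; the names of the temporaries and the exact code the compiler emits are assumptions, and ArrayLoopDemo is a made-up class.

```csharp
using System;
using System.Collections.Generic;

public static class ArrayLoopDemo
{
    // What the source code says.
    public static List<string> WithForeach(string[] items)
    {
        var seen = new List<string>();
        foreach (string item in items)
            seen.Add(item);
        return seen;
    }

    // Hand-written approximation of the plain for loop used for arrays.
    public static List<string> WithIndexedFor(string[] items)
    {
        var seen = new List<string>();
        string[] array = items;             // evaluate the collection expression once
        for (int index = 0; index < array.Length; index++)
        {
            string item = array[index];     // the assignment the debugger shows as "item"
            seen.Add(item);
        }
        return seen;
    }

    public static void Main()
    {
        string[] names = { "Messi", "Ronaldo", "Bale" };
        Console.WriteLine(string.Join(",", WithForeach(names)));
        Console.WriteLine(string.Join(",", WithIndexedFor(names)));
    }
}
```

Both methods visit the same elements in the same order; no enumerator object is created in the second one.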
It depends on which version of C# you are targeting. In C# 5, it's a new variable. Before that, the variable was reused.
See Eric Lippert's blog post here: http://blogs.msdn.com/b/ericlippert/archive/2009/11/12/closing-over-the-loop-variable-considered-harmful.aspx
From the post:
UPDATE: We are taking the breaking change. In C# 5, the loop variable
of a foreach will be logically inside the loop, and therefore closures
will close over a fresh copy of the variable each time. The "for" loop
will not be changed. We return you now to our original article.
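The difference between the two loop kinds in the quoted update can be observed directly. A minimal sketch, assuming a C# 5 or later compiler; LoopKindsDemo and its method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LoopKindsDemo
{
    public static string CaptureInForeach()
    {
        var readers = new List<Func<int>>();
        foreach (int value in new[] { 0, 1, 2 })
            readers.Add(() => value);   // C# 5 and later: a fresh variable per iteration
        return string.Join(",", readers.Select(read => read().ToString()));
    }

    public static string CaptureInFor()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
            readers.Add(() => i);       // one variable i shared by the whole loop
        return string.Join(",", readers.Select(read => read().ToString()));
    }

    public static void Main()
    {
        Console.WriteLine(CaptureInForeach()); // 0,1,2
        Console.WriteLine(CaptureInFor());     // 3,3,3
    }
}
```

The for loop prints 3 each time because the lambdas run after the loop has ended, when the shared i has already been incremented past the last index.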

How closure in c# works when using lambda expressions?

In the following tutorial: http://www.albahari.com/threading/
they say that the following code:
for (int i = 0; i < 10; i++)
new Thread (() => Console.Write (i)).Start();
is nondeterministic and can produce the following output:
0223557799
I thought that when one uses lambda expressions, the compiler creates some kind of anonymous class that captures the variables in use by creating matching members in the capturing class.
But i is a value type, so I thought it should be copied by value.
Where is my mistake?
It would be very helpful if the answer explained how closures work, how one holds a "pointer" to a specific int, and what code is generated in this specific case.
The key point here is that closures close over variables, not over values. As such, the value of a given variable at the time you close over it is irrelevant. What matters is the value of that variable at the time the anonymous method is invoked.
How this happens is easy enough to see when you see what the compiler transforms the closure into. It'll create something morally similar to this:
public class ClosureClass1
{
    public int i;
    public void AnonymousMethod1()
    {
        Console.WriteLine(i);
    }
}
static void Main(string[] args)
{
    ClosureClass1 closure1 = new ClosureClass1();
    for (closure1.i = 0; closure1.i < 10; closure1.i++)
        new Thread(closure1.AnonymousMethod1).Start();
}
So here we can see a bit more clearly what's going on. There is one copy of the variable, and that variable has now been promoted to a field of a new class, instead of being a local variable. Anywhere that would have modified the local variable now modifies the field of this instance. We can now see why your code prints what it does. After starting the new thread, but before it can actually execute, the for loop in the main thread is going back and incrementing the variable in the closure. The variable that hasn't yet been read by the closure.
To produce the desired result, make sure that each iteration of the loop closes over its own variable instead of every iteration closing over a single shared one:
for (int i = 0; i < 10; i++)
{
    int copy = i;
    new Thread(() => Console.WriteLine(copy)).Start();
}
Now the copy variable is never changed after it is closed over, and our program will print out 0-9 (although in an arbitrary order, because threads can be scheduled however the OS wants).
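Following the same hand-written style as the closure class above, the fixed loop can be pictured as one closure object per iteration. This is a sketch only: CopyClosure is a made-up stand-in for the compiler-generated class, and the delegates are invoked after the loop instead of on threads so that the output order is predictable.

```csharp
using System;
using System.Collections.Generic;

// Hand-written stand-in for the class the compiler would generate for "copy".
public class CopyClosure
{
    public int copy;
    public int Read() { return copy; }
}

public static class PerIterationClosureDemo
{
    public static List<int> Run()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 10; i++)
        {
            // "int copy = i;" inside the body becomes a new closure object
            // on every iteration, so no two delegates share a field.
            CopyClosure closure = new CopyClosure();
            closure.copy = i;
            readers.Add(closure.Read);
        }

        var results = new List<int>();
        foreach (Func<int> reader in readers)
            results.Add(reader());
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Run())); // 0,1,2,3,4,5,6,7,8,9
    }
}
```

Because i itself is never captured here, it can stay an ordinary local; only copy is promoted to a field.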
As Albahari states, although the captured variables are value types, each thread captures the memory location, which leads to unexpected results.
This happens because the loop has already changed the value stored in i before the thread has had time to start.
To avoid that, use a temp variable as Albahari suggests, or capture a variable only when you know it is not going to change.
i in Console.Write(i) is evaluated right when that statement is about to be executed. That statement is executed once the thread has been fully created, has started running, and has reached that code. By that time the loop has moved forward a few times, so i can have any value. Closures, unlike regular functions, have visibility into the local variables of the function in which they are defined (which is what makes them useful, and also a way to shoot oneself in the foot).

C# local copied variable value keep changing

I am facing this strange problem with strings.
I assigned a string like this:
string temp = DateTime.UtcNow.ToString("s");
_snapShotTime = string.Copy(temp);
//here threads started....
//while thread progressing I am passing _snapShotTime to create a directory.
//same in second threads.
But the value of the private variable _snapShotTime keeps changing, and I don't know why. I used a local variable and copied the value into it.
Thanks
I suspect your thread uses a lambda expression (or anonymous function) which captures _snapShotTime. That would indeed allow it to be changed. It's hard to say for sure without any code though.
If this is the problem, it's typically that you're referring to a captured variable which is declared outside the loop, but changed on every iteration of a loop. You can fix this by declaring a new variable which takes a copy of the original variable inside the loop, and only using that copy variable in the lambda expression. You'll get a "new" variable inside the loop on each iteration, so you won't have problems.
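Since the question shows no thread code, the following is only a guess at the shape of the problem. It assumes _snapShotTime is an instance field; SnapshotJob and its members are hypothetical. A lambda that mentions a field reads that field when it runs, whereas a lambda over a local copy that is never reassigned keeps seeing the copied value.

```csharp
using System;

public class SnapshotJob
{
    private string _snapShotTime = "first";

    // The lambda reads the field when it is invoked, so later assignments are visible.
    public Func<string> ReaderOfField()
    {
        return () => _snapShotTime;
    }

    // The lambda captures a local variable that is never reassigned.
    public Func<string> ReaderOfLocalCopy()
    {
        string copy = _snapShotTime;
        return () => copy;
    }

    public void Update(string value)
    {
        _snapShotTime = value;
    }
}

public static class SnapshotJobDemo
{
    public static void Main()
    {
        var job = new SnapshotJob();
        Func<string> fromField = job.ReaderOfField();
        Func<string> fromCopy = job.ReaderOfLocalCopy();

        job.Update("second"); // simulates a later iteration or another thread

        Console.WriteLine(fromField()); // second
        Console.WriteLine(fromCopy());  // first
    }
}
```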
Strings are immutable, they do not change unless a variable is reassigned to a new string.
We need to see more code in order to help pinpoint the problem.
Why don't you just do
_snapShotTime = DateTime.UtcNow.ToString("s");
Also, place a breakpoint on that line and see when it is being called.
When it does break, see the stack and it will clarify things.
I suspect that your threads change the value of _snapShotTime

Access to modified closure... but why?

Saw several similar questions here, but none of them seemed to quite be my issue...
I understand (or thought I understood) the concept of closure, and understand what would cause Resharper to complain about access to a modified closure, but in the below code I don't understand how I'm breaching closure.
Because primaryApps is declared within the context of the for loop, primary isn't going to change while I'm processing primaryApps. If I had declared primaryApps outside the for loop, then absolutely, I have closure issues. But why in the code below?
var primaries = (from row in openRequestsDataSet.AppPrimaries
select row.User).Distinct();
foreach (string primary in primaries) {
// Complains because 'primary' is accessing a modified closure
var primaryApps = openRequestsDataSet.AppPrimaries.Select(x => x.User == primary);
Is Resharper just not smart enough to figure out it's not an issue, or is there a reason closure is an issue here that I'm not seeing?
The problem is in the following statement
Because primaryApps is declared within the context of the for loop, primary isn't going to change while I'm processing primaryApps.
There is simply no way for Resharper to 100% verify this. The lambda which references the closure here is passed to a function outside the context of this loop: the AppPrimaries.Select method. This function could itself store the resulting delegate expression, execute it later and run straight into the capture of the iteration variable issue.
Properly detecting whether or not this is possible is quite an undertaking and frankly not worth the effort. Instead ReSharper is taking the safe route and warning about the potentially dangerous capture of the iteration variable.
Because primaryApps is declared within the context of the for loop, primary isn't going to change while I'm processing primaryApps. If I had declared primaryApps outside the for loop, then absolutely, I have closure issues. But why in the code below?
Jared is right; to demonstrate why your conclusion does not follow logically from your premise, let's make a program that declares primaryApps within the context of the for loop, and still suffers from a captured loop variable problem. Easy enough to do that.
using System;
using System.Collections.Generic;

static class Extensions
{
    // Stashes the selector before delegating to the real LINQ Select.
    public static IEnumerable<bool> Select(this IEnumerable<int> items, Func<int, bool> selector)
    {
        C.list.Add(selector);
        return System.Linq.Enumerable.Select(items, selector);
    }
}
class C
{
    public static List<Func<int, bool>> list = new List<Func<int, bool>>();
    public static void M()
    {
        int[] primaries = { 10, 20, 30 };
        int[] secondaries = { 11, 21, 30 };
        foreach (int primary in primaries)
        {
            var primaryApps = secondaries.Select(x => x == primary);
            // do something with primaryApps
        }
        C.N();
    }
    public static void N()
    {
        Console.WriteLine(C.list[0](10)); // true or false?
    }
}
Where "primaryApps" is declared is completely irrelevant. The only thing that is relevant is that the closure might survive the loop, and therefore someone might invoke it later, incorrectly expecting that the variable captured in the closure was captured by value.
Resharper has no way to know that a particular implementation of Select does not stash away the selector for later; in fact, that is exactly what all of them do. How is Resharper supposed to know that they happen to stash it away in a place that won't be accessible later?
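The standard LINQ operators show the same stashing without any custom extension method. In this sketch (DeferredQueryDemo is a made-up name, and Where is used in place of Select so the result is a list of matching numbers), the lambda is stored when the query is built and only runs when the query is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredQueryDemo
{
    public static List<int> Run()
    {
        int[] secondaries = { 11, 21, 30 };
        int primary = 11;

        // Where stores the lambda; nothing is compared yet.
        IEnumerable<int> matches = secondaries.Where(x => x == primary);

        primary = 30; // reassigned before the query is enumerated

        // Enumeration runs the stored lambda now, and it reads primary as 30.
        return matches.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Run().Select(n => n.ToString()))); // 30
    }
}
```

Calling ToList inside the loop body, before the captured variable can change, is one way to make sure the stored lambda has already done its work.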
As far as I know Resharper generates the warning every time you access the foreach variable even if it does not really cause closure.
Yes, it's just a warning.
See:
http://devnet.jetbrains.net/thread/273042
