Saw several similar questions here, but none of them seemed to quite be my issue...
I understand (or thought I understood) the concept of closures, and I understand what would cause ReSharper to complain about access to a modified closure, but in the code below I don't see how I'm running afoul of it.
Because primaryApps is declared within the context of the for loop, primary isn't going to change while I'm processing primaryApps. If I had declared primaryApps outside the for loop, then absolutely, I have closure issues. But why in the code below?
var primaries = (from row in openRequestsDataSet.AppPrimaries
                 select row.User).Distinct();

foreach (string primary in primaries) {
    // Complains because 'primary' is accessing a modified closure
    var primaryApps = openRequestsDataSet.AppPrimaries.Select(x => x.User == primary);
    // ...
}
Is Resharper just not smart enough to figure out it's not an issue, or is there a reason closure is an issue here that I'm not seeing?
The problem is in the following statement:
Because primaryApps is declared within the context of the for loop, primary isn't going to change while I'm processing primaryApps.
There is simply no way for ReSharper to verify this 100%. The lambda which references the closure is passed to a function outside the context of this loop: the AppPrimaries.Select method. That function could itself store the resulting delegate, execute it later, and run straight into the captured-iteration-variable issue.
Properly detecting whether or not this is possible is quite an undertaking and frankly not worth the effort. Instead, ReSharper takes the safe route and warns about the potentially dangerous capture of the iteration variable.
Because primaryApps is declared within the context of the for loop, primary isn't going to change while I'm processing primaryApps. If I had declared primaryApps outside the for loop, then absolutely, I have closure issues. But why in the code below?
Jared is right; to demonstrate why your conclusion does not follow logically from your premise, let's make a program that declares primaryApps within the context of the for loop and still suffers from a captured loop variable problem. Easy enough to do that.
using System;
using System.Collections.Generic;

static class Extensions
{
    // A perfectly legal Select that also stashes the selector away for later use.
    public static IEnumerable<bool> Select(this IEnumerable<int> items, Func<int, bool> selector)
    {
        C.list.Add(selector);
        return System.Linq.Enumerable.Select(items, selector);
    }
}

class C
{
    public static List<Func<int, bool>> list = new List<Func<int, bool>>();

    public static void M()
    {
        int[] primaries = { 10, 20, 30 };
        int[] secondaries = { 11, 21, 30 };
        foreach (int primary in primaries)
        {
            var primaryApps = secondaries.Select(x => x == primary);
            // do something with primaryApps
        }
        C.N();
    }

    public static void N()
    {
        // The stashed delegate still references the loop variable.
        // C# 4 and earlier: False (the single shared variable is 30 by now); C# 5 and later: True.
        Console.WriteLine(C.list[0](10)); // true or false?
    }
}
Where "primaryApps" is declared is completely irrelevant. The only thing that is relevant is that the closure might survive the loop, and therefore someone might invoke it later, incorrectly expecting that the variable captured in the closure was captured by value.
Resharper has no way to know that a particular implementation of Select does not stash away the selector for later; in fact, that is exactly what all of them do. How is Resharper supposed to know that they happen to stash it away in a place that won't be accessible later?
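For intuition, here is a minimal sketch (not the real LINQ source) of why a deferred Select keeps the selector alive inside the object it returns, which is how a closure can outlive the loop that created it:

// Sketch only: a deferred Select written as an iterator. The compiler turns this into
// a class whose fields hold 'source' and 'selector', so the delegate (and anything it
// captured) lives as long as the returned sequence does.
static IEnumerable<TResult> MySelect<TSource, TResult>(
    IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
    foreach (TSource item in source)
        yield return selector(item); // not invoked until someone enumerates the result
}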
As far as I know, ReSharper generates the warning every time you access the foreach variable from a lambda, even when it does not actually cause a problem.
Yes, it's just a warning.
See:
http://devnet.jetbrains.net/thread/273042
Related
I have this string array of names.
string[] someArray = { "Messi", "Ronaldo", "Bale", "Neymar", "Pele" };
foreach (string item in someArray)
{
Console.WriteLine(item);
}
When I run in debug mode with a breakpoint on the foreach line and use Step Into, the whole of string item is highlighted on each pass, as if it were being declared again. I would expect only item to be highlighted. Does this mean that item is declared again each time?
Note: using VS 2012.
The answer here is subtle because the question is (accidentally) unclear. Let me clarify it by breaking it apart into multiple questions.
Is the loop variable logically redeclared every time I go through a foreach loop?
As the other answer correctly notes, in C# 5 and higher, yes. In C# 1 through 4, no.
What is the observable consequence of that change?
In C# 1 there is no way to tell the difference. (And the spec is actually unclear as to where the variable is declared.)
In C# 2 the team added closures in the form of anonymous methods. If there are multiple closures that capture a loop variable in C# 2, 3 and 4 they all capture the same variable, and they all see it mutate as the loop runs. This is almost never what you want.
In C# 5 we took the breaking change to say that a new loop variable is declared every time through the loop, and so each closure captures a different variable and therefore does not observe it changing.
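A minimal sketch of the observable difference, assuming the delegates are collected and invoked after the loop has finished:

var actions = new List<Action>();
foreach (int x in new[] { 1, 2, 3 })
{
    actions.Add(() => Console.Write(x + " "));
}
foreach (var action in actions)
    action(); // C# 2-4: prints "3 3 3" (one shared variable); C# 5+: prints "1 2 3" (fresh variable each iteration)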
Why does the debugger go back to the loop variable declaration every time I step through the loop?
To inform you that the loop variable is being mutated. If you have
IEnumerable<string> someCollection = whatever;
foreach(string item in someCollection)
{
Console.WriteLine(item);
}
That is logically equivalent to
IEnumerable<string> someCollection = whatever;
{
    IEnumerator<string> enumerator = someCollection.GetEnumerator();
    try
    {
        while (enumerator.MoveNext())
        {
            string item; // In C# 4.0 and before, this is outside the while
            item = enumerator.Current;
            {
                Console.WriteLine(item);
            }
        }
    }
    finally
    {
        if (enumerator != null)
            ((IDisposable)enumerator).Dispose();
    }
}
If you were debugging around that expanded version of the code, you would see the debugger highlight the invocations of MoveNext and Current. Well, the debugger does the same thing in the compact form; when MoveNext is about to be called the debugger highlights the in, for lack of anything better to highlight. When Current is about to be called it highlights item. Again, what would be the better thing to highlight?
So the fact that the debugger highlights the variable has nothing to do with whether the variable is redeclared?
Correct. It highlights the variable when Current is about to be called, because that will mutate the variable. It mutates the variable regardless of whether it is a new variable, in C# 5, or the same variable as before, in C# 4.
I noticed that you changed my example to use a generic sequence rather than an array. Why did you do that?
Because if the C# compiler knows that the collection in the foreach is an array, it actually just generates a good old fashioned for loop rather than calling MoveNext and Current and all that. Doing so is typically faster and produces less collection pressure, so it's a win all around.
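Roughly speaking, the array version is lowered to something like this (a sketch, not the exact generated code):

string[] someArray = { "Messi", "Ronaldo", "Bale", "Neymar", "Pele" };
// approximately what the compiler emits for a foreach over an array
for (int index = 0; index < someArray.Length; index++)
{
    string item = someArray[index];
    Console.WriteLine(item);
}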
However, the C# team wants the debugging experience to feel the same regardless. So again, the code generator creates stepping points on in -- this time where the generated for's loop variable is mutated -- and again, on the code which mutates the foreach loop variable. That way the debugging experience is consistent.
It depends on which version of C# you are targeting. In C# 5, it's a new variable. Before that, the variable was reused.
See Eric Lippert's blog post here: http://blogs.msdn.com/b/ericlippert/archive/2009/11/12/closing-over-the-loop-variable-considered-harmful.aspx
From the post:
UPDATE: We are taking the breaking change. In C# 5, the loop variable
of a foreach will be logically inside the loop, and therefore closures
will close over a fresh copy of the variable each time. The "for" loop
will not be changed. We return you now to our original article.
In the following tutorial: http://www.albahari.com/threading/
it says that the following code:
for (int i = 0; i < 10; i++)
new Thread (() => Console.Write (i)).Start();
is non deterministic and can produce the following answer :
0223557799
I thought that when one uses lambda expressions, the compiler creates some kind of anonymous class that captures the variables in use by creating members like them in the capturing class.
But i is a value type, so I thought it would be copied by value.
Where is my mistake?
It would be very helpful if the answer explained how closures work, how they hold a "pointer" to a specific int, and what code is generated in this specific case.
The key point here is that closures close over variables, not over values. As such, the value of a given variable at the time you close over it is irrelevant. What matters is the value of that variable at the time the anonymous method is invoked.
How this happens is easy enough to see when you see what the compiler transforms the closure into. It'll create something morally similar to this:
public class ClosureClass1
{
    public int i;

    public void AnonymousMethod1()
    {
        Console.WriteLine(i);
    }
}

static void Main(string[] args)
{
    ClosureClass1 closure1 = new ClosureClass1();
    for (closure1.i = 0; closure1.i < 10; closure1.i++)
        new Thread(closure1.AnonymousMethod1).Start();
}
So here we can see a bit more clearly what's going on. There is one copy of the variable, and that variable has now been promoted to a field of a new class instead of being a local variable. Anywhere that would have modified the local variable now modifies the field of this instance. We can now see why your code prints what it does: after starting the new thread, but before it can actually execute, the for loop in the main thread goes back and increments the variable in the closure; a variable the closure hasn't yet read.
To produce the desired result, you need to make sure that, instead of every iteration of the loop closing over a single shared variable, each iteration closes over its own variable:
for (int i = 0; i < 10; i++)
{
    int copy = i;
    new Thread(() => Console.WriteLine(copy)).Start();
}
Now the copy variable is never changed after it is closed over, and our program will print out 0-9 (although in an arbitrary order, because threads can be scheduled however the OS wants).
As Albahari states, although the value involved is a value type, each thread captures the variable itself (its memory location), which leads to unexpected results.
This happens because, before the thread has had time to start, the loop has already changed the value inside i.
To avoid that, you should use a temp variable, as Albahari suggests, or only capture a variable when you know it is not going to change.
i in Console.Write(i) is evaluated right when that statement is about to be executed. That statement runs once the thread has been fully created, has started running, and has reached that code. By then the loop has moved forward a few times, so i can be any value. Closures, unlike regular functions, have visibility into the local variables of the function in which they are defined (which is what makes them useful, and also a way to shoot yourself in the foot).
I'm testing performance differences using various lambda expression syntaxes. If I have a simple method:
public IEnumerable<Item> GetItems(int point)
{
return this.items.Where(i => i.IsApplicableFor(point));
}
then there's some variable lifting going on here related to the point parameter, because it's a free variable from the lambda's perspective. If I were to call this method a million times, would it be better to keep it as it is, or to change it in some way to improve its performance?
What options do I have, and which ones are actually feasible? As I understand it, I have to get rid of free variables so the compiler won't have to create a closure class and instantiate it on every call to this method. This instantiation usually takes a significant amount of time compared to non-closure versions.
The thing is I would like to come up with some sort of lambda writing guidelines that would generally work, because it seems I'm wasting some time every time I write a heavily hit lambda expression. I have to manually test it to make sure it will work, because I don't know what rules to follow.
Alternative method & example console application code
I've also written a different version of the same method that doesn't need any variable lifting (at least I think it doesn't, but you guys who understand this let me know if that's the case):
public IEnumerable<Item> GetItems(int point)
{
Func<int, Func<Item, bool>> buildPredicate = p => i => i.IsApplicableFor(p);
return this.items.Where(buildPredicate(point));
}
Check out Gist here. Just create a console application and copy the whole code into Program.cs file inside namespace block. You will see that the second example is much much slower even though it doesn't use free variables.
A contradictory example
The reason I would like to construct some lambda best-usage guidelines is that I've run into this problem before, and to my surprise that case turned out to run faster when a predicate-builder lambda expression was used.
Now explain that. I'm completely lost here, because it may well turn out that I shouldn't use lambdas at all in methods I know will be heavily used. But I would like to avoid such a situation and get to the bottom of it all.
Edit
Your suggestions don't seem to work
I've tried implementing a custom lookup class that internally works similarly to what the compiler does with a free-variable lambda. But instead of a closure class, I've implemented instance members that simulate a similar scenario. This is the code:
private int Point { get; set; }
private bool IsItemValid(Item item)
{
return item.IsApplicableFor(this.Point);
}
public IEnumerable<TItem> GetItems(int point)
{
this.Point = point;
return this.items.Where(this.IsItemValid);
}
Interestingly enough, this works just as slowly as the slow version. I don't know why, because it doesn't seem to do anything more than the fast one; it reuses the same functionality because these additional members are part of the same object instance. Anyway, I'm now extremely confused!
I've updated the Gist source with this latest addition, so you can test it for yourself.
What makes you think that the second version doesn't require any variable lifting? You're defining the Func with a Lambda expression, and that's going to require the same bits of compiler trickery that the first version requires.
Furthermore, you're creating a Func that returns a Func, which bends my brain a little bit and will almost certainly require re-evaluation with each call.
I would suggest that you compile this in release mode and then use ILDASM to examine the generated IL. That should give you some insight into what code is generated.
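For reference, the first version compiles into something roughly like this. This is only a sketch of the generated code; the real compiler-generated names look like <>c__DisplayClass and are not valid C#, and PointClosure here is an illustrative name:

// Rough sketch of the closure the compiler generates for GetItems(int point)
private sealed class PointClosure
{
    public int point;                        // the lifted 'point' parameter

    public bool Predicate(Item i)            // the body of the lambda
    {
        return i.IsApplicableFor(point);
    }
}

public IEnumerable<Item> GetItems(int point)
{
    var closure = new PointClosure();        // one small allocation per call
    closure.point = point;
    return this.items.Where(closure.Predicate); // plus one delegate allocation
}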
Another test that you should run, which will give you more insight, is to make the predicate call a separate function that uses a variable at class scope. Something like:
private DateTime dayToCompare;
private bool LocalIsDayWithinRange(TItem i)
{
return i.IsDayWithinRange(dayToCompare);
}
public override IEnumerable<TItem> GetDayData(DateTime day)
{
dayToCompare = day;
return this.items.Where(i => LocalIsDayWithinRange(i));
}
That will tell you if hoisting the day variable is actually costing you anything.
Yes, this requires more code and I wouldn't suggest that you use it. As you pointed out in your response to a previous answer that suggested something similar, this creates what amounts to a closure using local variables. The point is that either you or the compiler has to do something like this in order to make things work. Beyond writing the pure iterative solution, there is no magic you can perform that will prevent the compiler from having to do this.
My point here is that "creating the closure" in my case is a simple variable assignment. If this is significantly faster than your version with the Lambda expression, then you know that there is some inefficiency in the code that the compiler creates for the closure.
I'm not sure where you're getting your information about having to eliminate the free variables, and the cost of the closure. Can you give me some references?
Your second method runs 8 times slower than the first for me. As @DanBryant says in the comments, this is to do with constructing and calling the delegate inside the method, not with variable lifting.
Your question is confusing, as it reads to me like you expected the second sample to be faster than the first. I also read it as the first somehow being unacceptably slow due to 'variable lifting'. The second sample still has a free variable (point) but adds additional overhead; I don't understand why you'd think it removes the free variable.
As the code you have posted confirms, the first sample above (using a simple inline predicate) performs just 10% slower than a simple loop - from your code:
foreach (TItem item in this.items)
{
if (item.IsDayWithinRange(day))
{
yield return item;
}
}
So, in summary:
The for loop is the simplest approach and is "best case".
The inline predicate is slightly slower, due to some additional overhead.
Constructing and calling a Func that returns Func within each iteration is significantly slower than either.
I don't think any of this is surprising. The 'guideline' is to use an inline predicate - if it performs poorly, simplify by moving to a straight loop.
I profiled your benchmark for you and determined many things:
First of all, it spends half its time on the line return this.GetDayData(day).ToList(); calling ToList. If you remove that and instead manually iterate over the results, you can measure the relative differences between the methods.
Second, because IterationCount = 1000000 and RangeCount = 1, you are timing the initialization of the different methods rather than the amount of time it takes to execute them. This means your execution profile is dominated by creating the iterators, escaping variable records, and delegates, plus the hundreds of subsequent gen0 garbage collections that result from creating all that garbage.
Third, the "slow" method is really slow on x86, but about as fast as the "fast" method on x64. I believe this is due to how the different JITters create delegates. If you discount the delegate creation from the results, the "fast" and "slow" methods are identical in speed.
Fourth, if you actually invoke the iterators a significant number of times (on my computer, targeting x64, with RangeCount = 8), "slow" is actually faster than "foreach" and "fast" is faster than all of them.
In conclusion, the "lifting" aspect is negligible. Testing on my laptop shows that capturing a variable like you do requires an extra 10ns every time the lambda gets created (not every time it is invoked), and that includes the extra GC overhead. Furthermore, while creating the iterator in your "foreach" method is somewhat faster than creating the lambdas, actually invoking that iterator is slower than invoking the lambdas.
If the few extra nanoseconds required to create delegates is too much for your application, consider caching them. If you require parameters to those delegates (i.e. closures), consider creating your own closure classes such that you can create them once and then just change the properties when you need to reuse their delegates. Here's an example:
public class SuperFastLinqRangeLookup<TItem> : RangeLookupBase<TItem>
    where TItem : RangeItem
{
    public SuperFastLinqRangeLookup(DateTime start, DateTime end, IEnumerable<TItem> items)
        : base(start, end, items)
    {
        // create delegate only once
        predicate = i => i.IsDayWithinRange(day);
    }

    DateTime day;
    Func<TItem, bool> predicate;

    public override IEnumerable<TItem> GetDayData(DateTime day)
    {
        this.day = day; // set captured day to correct value
        return this.items.Where(predicate);
    }
}
When a LINQ expression that uses deferred execution executes within the same scope that encloses the free variables it references, the compiler should detect that and not create a closure over the lambda, because it's not needed.
The way to verify that would be by testing it using something like this:
public class Test
{
    public static void ExecuteLambdaInScope()
    {
        // here, the lambda executes only within the scope
        // of the referenced variable 'add'
        var items = Enumerable.Range(0, 100000).ToArray();
        int add = 10; // free variable referenced from lambda
        Func<int, int> f = x => x + add;
        // measure how long this takes:
        var array = items.Select(f).ToArray();
    }

    static Func<int, int> GetExpression()
    {
        int add = 10;
        return x => x + add; // this needs a closure
    }

    static void ExecuteLambdaOutOfScope()
    {
        // here, the lambda executes outside the scope
        // of the referenced variable 'add'
        Func<int, int> f = GetExpression();
        var items = Enumerable.Range(0, 100000).ToArray();
        // measure how long this takes:
        var array = items.Select(f).ToArray();
    }
}
Suppose we have the following code:
void AFunction()
{
    foreach (AClass i in AClassCollection)
    {
        listOfLambdaFunctions.AddLast( () => { PrintLine(i.name); } );
    }
}

void Main()
{
    AFunction();
    foreach (var i in listOfLambdaFunctions)
        i();
}
One might think that the above code would output the same as the following:
void Main()
{
foreach(AClass i in AClassCollection)
PrintLine(i.name);
}
However, it doesn't. Instead, it prints the name of the last item in AClassCollection every time.
It appears as if the same item was being used in each lambda function. I suspect there might be some delay from when the lambda was created to when the lambda took a snapshot of the external variables used in it.
Essentially, the lambda is holding a reference to the local variable i, instead of taking a "snapshot" of i's value when the lambda was created.
To test this theory, I tried this code:
string astr = "a string";
AFunc fnc = () => { System.Diagnostics.Debug.WriteLine(astr); };
astr = "changed";
fnc();
and, surprise, it outputs changed!
I am using XNA 3.1, and whichever version of C# that comes with it.
My questions are:
What is going on?
Does the lambda function somehow store a 'reference' to the variable or something?
Is there any way around this problem?
This is a modified closure
See: similar questions like Access to Modified Closure
To work around the issue you have to store a copy of the variable inside the scope of the for loop:
foreach(AClass i in AClassCollection)
{
AClass anotherI= i;
listOfLambdaFunctions.AddLast( () => { PrintLine(anotherI.name); } );
}
does the lambda function somehow store a 'reference' to the variable or something?
Close. The lambda function captures the variable itself. There is no need to store a reference to a variable, and in fact, in .NET it is impossible to permanently store a reference to a variable. You just capture the entire variable. You never capture the value of the variable.
Remember, a variable is a storage location. The name "i" refers to a particular storage location, and in your case, it always refers to the same storage location.
Is there any way around this problem?
Yes. Create a new variable every time through the loop. The closure then captures a different variable every time.
This is one of the most frequently reported problems with C#. We're considering changing the semantics of the loop variable declaration so that a new variable is created every time through the loop.
For more details on this issue see my articles on the subject:
http://ericlippert.com/2009/11/12/closing-over-the-loop-variable-considered-harmful-part-one/
what is going on? does the lambda function somehow store a 'reference' to the variable or something?
Yes, exactly that; C# closures capture the variable itself, not the value of the variable. You can usually get around this by introducing a temp variable and binding to that:
string astr = "a string";
var tmp = astr;
AFunc fnc = () => { System.Diagnostics.Debug.WriteLine(tmp); };
This is especially notorious in foreach loops.
Yes, the lambda stores a reference to the variable (conceptually speaking, anyway).
A very simple workaround is this:
foreach(AClass i in AClassCollection)
{
AClass j = i;
listOfLambdaFunctions.AddLast( () => { PrintLine(j.name); } );
}
In every iteration of the foreach loop, a new j gets created, which the lambda captures.
i, on the other hand, is the same variable throughout, but it gets updated with every iteration (so all the lambdas end up seeing the last value).
And I agree that this is a bit surprising. :)
I've been caught by this one as well. As Calgary Coder said, it is a modified closure. I really had trouble spotting them until I got ReSharper; since it is one of the warnings ReSharper watches for, I am now much better at identifying them as I code.
I was hoping someone could explain to me what bad thing could happen in this code, which causes ReSharper to give an 'Access to modified closure' warning:
bool result = true;
foreach (string key in keys.TakeWhile(key => result))
{
result = result && ContainsKey(key);
}
return result;
Even if the code above seems safe, what bad things could happen in other 'modified closure' instances? I often see this warning as a result of using LINQ queries, and I tend to ignore it because I don't know what could go wrong. ReSharper tries to fix the problem by making a second variable that seems pointless to me, e.g. it changes the foreach line above to:
bool result1 = result;
foreach (string key in keys.TakeWhile(key => result1))
Update: on a side note, apparently that whole chunk of code can be converted to the following statement, which causes no modified closure warnings:
return keys.Aggregate(
true,
(current, key) => current && ContainsKey(key)
);
In that particular code, nothing. Take the following as an example:
int myVal = 2;
var results = myDatabase.Table.Where(record => record.Value == myVal);
myVal = 3;
foreach( var myResult in results )
{
// TODO: stuff
}
It looks like results will contain all records where Value is 2, because that's what myVal was set to when you declared the query. However, due to deferred execution it will actually contain all records where Value is 3, because the query isn't executed until you iterate over it.
In your example the modification is intentional and harmless, but ReSharper is warning you that this pattern, combined with deferred execution, can cause problems.
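If you do want a query to use the value at declaration time, one common workaround (reusing the hypothetical myDatabase example above) is to materialize the query immediately so that later changes to the captured variable no longer matter:

int myVal = 2;
// ToList() runs the query right now, while myVal is still 2
var results = myDatabase.Table.Where(record => record.Value == myVal).ToList();
myVal = 3;
// results still holds the records where Value == 2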
When you modify the result variable, the closure (the use of the variable inside the lambda expression) will pick up the change.
This frequently comes as a surprise to programmers who don't fully understand closures, so ReSharper has a warning for it.
By making a separate result1 variable which is only used in the lambda expression, it will ignore any subsequent changes to the original result variable.
In your code, you're relying on the closure picking up the changes to the original variable so that it will know when to stop.
By the way, the simplest way to write your function without LINQ is like this:
foreach (string key in keys) {
    if (!ContainsKey(key))
        return false;
}
return true;
Using LINQ, you can simply call All():
return keys.All(ContainsKey);
See
http://ericlippert.com/2009/11/12/closing-over-the-loop-variable-considered-harmful-part-one/
for an extensive discussion of this issue and what we might do in hypothetical future versions of C# to mitigate it.
I'm not sure if ReSharper will give the exact same warning for this, but the following illustrates a similar situation. The loop's iteration variable is used in a LINQ clause, but the clause isn't actually evaluated until after the loop has finished, by which time the iteration variable has changed. The following is a contrived example that looks like it should print all odd numbers from 1 to 100, but actually prints all numbers from 1 to 99.
var notEven = Enumerable.Range(1,100);
foreach (int i in Enumerable.Range(1, 50))
{
notEven = notEven.Where(s => s != i * 2);
}
Console.WriteLine(notEven.Count());
Console.WriteLine(string.Join(", ", notEven.Select(s => s.ToString()).ToArray()));
This can be quite an easy mistake to make. I've heard people say that if you truly understand closures/functional programming/etc. you should never make this mistake. I've also seen professional developers who certainly do have a good grasp of those concepts make this exact mistake.
Well, the warning is there because result could be changed before the closure is executed (variables are captured, and read, at execution time, not at definition time). In your case, you are actually taking advantage of that fact. If you applied the ReSharper recommendation, it would change the behaviour of your code: key => result1 would always return true, so the TakeWhile would no longer stop early.