I've read a number of other questions about Access to Modified closure so I understand the basic principle. Still, I couldn't tell - does Parallel.ForEach have the same issues?
Take the following snippet where I recompute the usage stats for users for the last week as an example:
var startTime = DateTime.Now;
var endTime = DateTime.Now.AddHours(6);
for (var i = 0; i < 7; i++)
{
// this next line gives me "Access To Modified Closure"
Parallel.ForEach(allUsers, user => UpdateUsageStats(user, startTime, endTime));
// move back a day and continue the process
startTime = startTime.AddDays(-1);
endTime = endTime.AddDays(-1);
}
From what I know of this code the foreach should run my UpdateUsageStats routine right away and start/end time variables won't be updated till the next time around the loop. Is that correct or should I use local variables to make sure there aren't issues?
You are accessing a modified closure, so the warning does apply. However, the captured values do not change while the lambda is using them, so as long as nothing inside UpdateUsageStats changes them either, you don't have a problem here.
Parallel.ForEach blocks until every iteration has finished, and only then do you change startTime and endTime.
"Access to modified closure" only leads to problems if the lambda outlives the loop iteration in which it was created and is invoked later, after the captured variable has changed. For example,
var list = new List<Action>();
for (var i = 0; i < 7; i++)
{
list.Add(() => Console.WriteLine(i));
}
list.ForEach(a => a()); // prints "7" 7 times: every lambda captured the same variable `i`, which is 7 once the loop has ended
In your case the lambda doing the capture doesn't leave the loop (the Parallel.ForEach call is executed completely within the loop, each time around).
You still get the warning because the analyzer (this particular message comes from ReSharper rather than the C# compiler) cannot know whether Parallel.ForEach stores the lambda for later invocation. Since we know it does not, the warning can safely be ignored here.
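If you would rather silence the warning than ignore it, copying the values into per-iteration locals works. The following is only a sketch: UsageWindowDemo, Run and the recorded tuples are hypothetical stand-ins for the question's UpdateUsageStats call, used so the effect can be observed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class UsageWindowDemo
{
    // Hypothetical stand-in for the question's loop: instead of updating
    // stats, it records the time window each user was processed with.
    public static List<(string User, DateTime Start, DateTime End)> Run(
        IEnumerable<string> allUsers, DateTime firstStart, int days)
    {
        var seen = new ConcurrentBag<(string User, DateTime Start, DateTime End)>();
        var startTime = firstStart;
        var endTime = firstStart.AddHours(6);

        for (var i = 0; i < days; i++)
        {
            // Per-iteration copies: the lambda captures these, not the
            // variables that are reassigned at the bottom of the loop.
            var windowStart = startTime;
            var windowEnd = endTime;

            // Parallel.ForEach returns only after every user was processed.
            Parallel.ForEach(allUsers, user => seen.Add((user, windowStart, windowEnd)));

            // Move back a day for the next pass.
            startTime = startTime.AddDays(-1);
            endTime = endTime.AddDays(-1);
        }

        return seen.OrderByDescending(x => x.Start).ThenBy(x => x.User).ToList();
    }
}
```

Calling UsageWindowDemo.Run(new[] { "a", "b" }, DateTime.Now, 7) should yield 14 entries, two per day, each with a six-hour window. The behaviour matches the original loop; the locals only make the intent explicit.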
Here goes a tricky one (maybe not for experts). I'm learning about concurrency in C# and I'm playing around with some dummy code to test the fundamentals of threads. I'm surprised that the following code sometimes prints the value 10.
Thread[] pool = new Thread[10];
for(int i = 0; i < 10; i++)
{
pool[i] = new Thread(() => Console.Write(i));
pool[i].Start();
}
Typical output looks like 76881079777. I know it is the value 10 and not a 1 followed by a 0, because the for loop body executes only 10 times (I'm not strictly sure, but I couldn't break C#). I'm even more surprised that this does not throw an IndexOutOfRangeException in the statement pool[i] = new Thread(() => Console.Write(i));
As far as I know, the for loop executes as follows:
Check the condition (i < 10)
Execute the body if the condition is true
Increment the control variable (i++)
Repeat
So, assuming that, it is impossible for me to understand how the body can be executed with the value 10. Any ideas?
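When you call pool[i].Start();, it does not mean that () => Console.Write(i) gets executed immediately. Instead, the operating system gets to decide when the passed method gets executed (this is also called "scheduling"). All ten lambdas capture the same variable i rather than a copy of its value, so each thread prints whatever i holds at the moment it actually runs. For instance, the operating system may decide to execute one of your threads after your loop is done. In this case, i is 10, which is why an output of 10 is possible. The loop body itself never runs with i equal to 10, which is also why no IndexOutOfRangeException is thrown.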
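A common way to get one value per thread is to copy i into a variable declared inside the loop body, so each lambda captures its own variable. The sketch below assumes that approach; ThreadCaptureDemo and Run are hypothetical names, and the values are collected rather than printed so they can be inspected.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class ThreadCaptureDemo
{
    // Starts `count` threads and returns the values they observed, sorted.
    public static List<int> Run(int count)
    {
        var observed = new ConcurrentBag<int>();
        var pool = new Thread[count];

        for (int i = 0; i < count; i++)
        {
            int copy = i; // a fresh variable for every iteration
            pool[i] = new Thread(() => observed.Add(copy));
            pool[i].Start();
        }

        // Wait for all threads before reading the results.
        foreach (var thread in pool)
        {
            thread.Join();
        }

        return observed.OrderBy(x => x).ToList();
    }
}
```

With the copy in place, Run(10) should return 0 through 9 exactly once each, regardless of the order in which the threads get scheduled.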
var counter=0;
var array = new int[] {0, 1, 2, 3,4};
var test = array.Select(a => counter++);
foreach (var item in test)
{
Console.WriteLine(item);
}
Console.ReadLine();
When I run the code above the console prints 0,1,2,3,4.
However, when I expand test in the debugger I can see the numbers 10 to 14. Why?
Also, can you help me understand why the console does not print 1,2,3,4,5, given that the lambda returns the incremented counter?
The reason the output keeps changing is that test isn't actually evaluated until you enumerate it. Opening the debug view enumerates it, and every further enumeration runs the selector again, increasing the counter variable each time. So you can get some funny results by running the foreach loop multiple times or by printing test.First() multiple times.
You can prevent this by forcing the enumerable to materialise into a list:
var test = array.Select(a => counter++).ToList();
// ^^^^^^^^^
As for why it starts at zero, that's because ++ in this context is a post-increment operator meaning it returns the value and then increments. If you want it to start at 1, prefix the variable instead:
var test = array.Select(a => ++counter).ToList();
That's normal. When you just call Select, you get a lazy sequence, which means it is evaluated when you access it. Here you access it more than once: when you execute the foreach and each time you look at it in the debugger. Every access executes your selector Func again, thus incrementing your counter.
If you replace it with
var test = array.Select(a => counter++).ToList();
It won't be lazy anymore, and the selector will be executed exactly once per element when you call ToList(). Staying lazy can still be useful, especially if you want to add conditions later, for example by appending Where clauses: you wouldn't want your query to be executed before you have finished building it.
Your counter starts at zero because counter++ first gives you the current value and only then increments it. If you want to start at one, you can either initialize counter to 1 or replace counter++ with ++counter, which increments first and then returns the new value.
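To see the difference between the lazy and the materialised version side by side, here is a small sketch. DeferredSelectDemo and Run are hypothetical names; the array and counter mirror the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredSelectDemo
{
    public static (int[] First, int[] Second, int[] ListReadA, int[] ListReadB) Run()
    {
        var counter = 0;
        var array = new[] { 0, 1, 2, 3, 4 };

        // Lazy: nothing runs until the sequence is enumerated.
        IEnumerable<int> lazy = array.Select(a => counter++);
        int[] first = lazy.ToArray();   // runs the selector 5 times
        int[] second = lazy.ToArray();  // runs it 5 more times

        // Materialised: the selector runs once per element, right here.
        List<int> list = array.Select(a => counter++).ToList();
        int[] listReadA = list.ToArray(); // reading the list does not touch counter
        int[] listReadB = list.ToArray();

        return (first, second, listReadA, listReadB);
    }
}
```

Here First is 0 to 4, Second is 5 to 9, and both reads of the list give 10 to 14. This is the same effect the debugger produced in the question: each extra enumeration of the lazy sequence moves the counter on by five.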
EDIT: I realized that I had been going about it completely the wrong way and after an overhaul, got it working. Thanks for the tips guys, I'll keep them in mind for the future.
I've hit an unusual problem in my program. What I need to do is find the difference between two times, divide it by 1.5 hours, then return the starting time followed by each 1.5 hour increment of the starting time. So if the time was 11:45 am - 2:45 pm, the time difference is three hours, 3/1.5 = 2, then return 11:45 am and 1:15 pm. At the moment, I can do everything except return more than one time. Depending on what I've tried, it returns either the initial time (11:45 am), the first increment (1:15 pm) or the end time (2:45 pm). So far I've tried a few different types of for and do/while loops. The closest I've come was simply concatenating the start time and the incremented time, but the start and end times can range anywhere from 3 - 6 hours so that's not a practical way to do it.
The latest thing I tried:
int i = 0;
do{
i++;
//Start is the starting time, say 11:45 am
start = start.AddMinutes(90);
return start.ToShortTimeString();
} while (i < totalSessions); //totalSessions is the result of hours / 1.5
and I'm calling the function on a dynamic label (which is also in a for loop):
z[i] = new Label();
z[i].Location = new Point(PointX, PointZ);
z[i].Name = "sessionTime_" + i;
z[i].Text = getPlayTimes(dt.Rows[i][1].ToString());
tabPage1.Controls.Add(z[i]);
z[i].BringToFront();
PointZ += z[i].Height;
I'm pretty new to C# so I guess I've just misunderstood something somewhere.
I think you're trying to solve the problem the wrong way. A return statement exits the method on the first pass through the loop, so you only ever get one value. Instead of returning each value as you come to it, create a collection, e.g. a List<string>, and add each of your results to the list until your finishing condition is met.
Then return the whole list as your return value. This way you will have a nice self-contained function that doesn't have cross-concerns with other logic: it does only its one little task, but does it well.
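A rough sketch of that idea, assuming the start and end times are already available as DateTime values (PlayTimes and GetSessionStarts are hypothetical names, not taken from the question's code):

```csharp
using System;
using System.Collections.Generic;

public static class PlayTimes
{
    // Returns the start of every full 90-minute session between start and end.
    public static List<DateTime> GetSessionStarts(DateTime start, DateTime end)
    {
        var sessionLength = TimeSpan.FromMinutes(90);
        var starts = new List<DateTime>();

        // Keep adding sessions while a complete one still fits before `end`.
        for (var t = start; t + sessionLength <= end; t += sessionLength)
        {
            starts.Add(t);
        }

        return starts;
    }
}
```

For 11:45 am to 2:45 pm this gives 11:45 am and 1:15 pm. The label text could then be built with something like string.Join(", ", starts.Select(s => s.ToShortTimeString())), which needs System.Linq.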
Good luck!
It's a bit difficult to determine your exact use case, as you haven't offered the complete code you are using. However, you can do what you are asking by using the yield return functionality:
public IEnumerable<string> GetPlayTimes()
{
int i = 0;
do
{
i++;
//Start is the starting time, say 11:45 am
start = start.AddMinutes(90);
yield return start.ToShortTimeString();
} while (i < totalSessions); //totalSessions is the result of hours / 1.5
}
And then use it like so:
foreach (var time in GetPlayTimes())
{
// Do something with time
}
So I have 3 nested for loops with the inner two doing little work. I want to convert the outer-most loop into a parallel one.
My question is:
If I have a variable inside the loop that is used as a temporary value holder and takes a new value on each iteration, do I need to worry about that variable once the loop runs in parallel?
I mean, are all the threads going to be overwriting the same variable?
for (int i = 0; i < persons.number; i++) //Loop all the people
var Dates = persons[i].Appointments.bydate(DateStep);
Do I need to worry about the Dates variable in the parallel loop ?
Sorry for the bad formatting of my question but it's only my second question and I'm getting there.
In short: No.
Because this variable is scoped inside the loop, it will be reassigned for every iteration of the loop anyways. It is not a value which is shared among different threads.
The only variables which you should worry about are those scoped outside of the loop.
Dates will be local to each loop iteration, so each thread will have a private copy on its own stack. No interference.
Be careful about variables declared outside the loop though.
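The two cases can be put next to each other in a small sketch. LoopLocalDemo and its members are hypothetical; temp plays the role of Dates, and Interlocked.Add is one possible way to protect a variable that really is shared.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LoopLocalDemo
{
    public static (int[] PerIteration, int SharedTotal) Run(int n)
    {
        var perIteration = new int[n];
        int sharedTotal = 0; // declared outside the loop: shared by all iterations

        Parallel.For(0, n, i =>
        {
            // Declared inside the loop body: every iteration gets its own
            // variable, so no other thread can overwrite it.
            int temp = i * 2;
            perIteration[i] = temp; // each iteration writes a distinct slot

            // Shared state needs synchronisation; a plain `sharedTotal += temp`
            // here would be a data race.
            Interlocked.Add(ref sharedTotal, temp);
        });

        return (perIteration, sharedTotal);
    }
}
```

Run(5) should return 0, 2, 4, 6, 8 in the array and 20 as the total, however the iterations are distributed across threads.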
I've been learning how to use thread pools, but I'm not sure that each of the threads in the pool is being executed properly, and I suspect some are being executed more than once. I've cut the code down to the bare minimum and have been using Debug.WriteLine to try to work out what is going on, but this produces some odd results.
My code is as follows (based on code from "WaitAll for multiple handles on a STA thread is not supported"):
public void ThreadCheck()
{
string[] files;
classImport Import;
CountdownEvent done = new CountdownEvent(1);
ManualResetEvent[] doneEvents = new ManualResetEvent[10];
try
{
files = Directory.GetFiles(importDirectory, "*.ZIP");
for (int j = 0; j < doneEvents.Length; j++)
{
done.AddCount();
Import = new classImport(j, files[j], workingDirectory + @"\" + j.ToString(), doneEvents[j]);
ThreadPool.QueueUserWorkItem(
(state) =>
{
try
{
Import.ThreadPoolCallBack(state);
Debug.WriteLine("Thread " + j.ToString() + " started");
}
finally
{
done.Signal();
}
}, j);
}
done.Signal();
done.Wait();
}
catch (Exception ex)
{
Debug.WriteLine("Error in ThreadCheck():\n" + ex.ToString());
}
}
The classImport.ThreadPoolCallBack doesn't actually do anything at the minute.
If I step through the code manually I get:
Thread 1 started
Thread 2 started
.... all the way to ....
Thread 10 started
However, if I run it normally (without stepping), the Output window is filled with "Thread 10 started".
My question is: is there something wrong with the way my code uses the thread pool, or are the Debug.WriteLine results being confused by the multiple threads?
The problem is that you're using the loop variable (j) within a lambda expression.
The details of why this is a problem are quite longwinded - see Eric Lippert's blog post for details (also read part 2).
Fortunately the fix is simple: just create a new local variable inside the loop and use that within the lambda expression:
for (int j = 0; j < doneEvents.Length; j++)
{
int localCopyOfJ = j;
... use localCopyOfJ within the lambda ...
}
For the rest of the loop body it's fine to use just j - it's only when it's captured by a lambda expression or anonymous method that it becomes a problem.
This is a common issue which trips up a lot of people. The C# team changed the behaviour for the foreach loop in C# 5 (where it really looks like you're already declaring a separate variable on each iteration), so each foreach iteration now gets a fresh variable; the for loop used here is unaffected and still shares a single variable. The change does have interesting compatibility consequences: you could write C# 5 code which works fine, and with C# 4 it might compile fine but really be broken.
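For reference, from C# 5 onwards foreach gives every iteration its own variable, while for still shares one variable across the whole loop. The sketch below shows the difference; LoopCaptureDemo and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LoopCaptureDemo
{
    // for: one variable `i` for the whole loop, captured by every lambda.
    public static List<int> CaptureWithFor(int n)
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < n; i++)
        {
            actions.Add(() => i);
        }
        // The lambdas run after the loop, when i has reached n.
        return actions.Select(a => a()).ToList();
    }

    // foreach (C# 5 and later): a fresh variable `i` per iteration.
    public static List<int> CaptureWithForeach(int n)
    {
        var actions = new List<Func<int>>();
        foreach (int i in Enumerable.Range(0, n))
        {
            actions.Add(() => i);
        }
        return actions.Select(a => a()).ToList();
    }
}
```

CaptureWithFor(3) returns 3, 3, 3, while CaptureWithForeach(3) returns 0, 1, 2 when compiled with C# 5 or later.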
Essentially the local variable j you've got there is captured by the lambda expression, resulting in the old modified closure problem. You'll have to read that post to get a broad understanding of the issue, but I can speak about some specifics in this context.
It might appear as though each thread-pool task is seeing its own "version" of j, but it isn't. In other words, subsequent mutations to j after a task has been created are visible to the task.
When you step through your code slowly, the thread-pool executes each task before the variable has an opportunity to change, which is why you get the expected result (one value for the variable is effectively "associated" with one task). In production, this isn't the case. It appears that for your specific test run, the loop completed before any of the tasks had an opportunity to run. This is why all of the tasks happened to see the same "last" value for j (Given the time it takes to schedule a job on the thread-pool, I would imagine this output to be typical.) But this isn't guaranteed by any means; you could see pretty much any output, depending on the particular timing characteristics of the environment you're running this code on.
Fortunately, the fix is simple:
for (int j = 0; j < doneEvents.Length; j++)
{
int jCopy = j;
// work with jCopy instead of j
}
Now, each task will "own" a particular value of the loop-variable.
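Put together, a stripped-down version of the pattern could look like the sketch below. It leaves out the question's classImport and file handling; ThreadPoolCaptureDemo and Run are hypothetical names, and the observed values are collected instead of being written with Debug.WriteLine.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class ThreadPoolCaptureDemo
{
    // Queues `workItems` callbacks and returns the values they saw, sorted.
    public static List<int> Run(int workItems)
    {
        var seen = new ConcurrentBag<int>();

        using (var done = new CountdownEvent(workItems))
        {
            for (int j = 0; j < workItems; j++)
            {
                int jCopy = j; // each work item captures its own copy

                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        seen.Add(jCopy);
                    }
                    finally
                    {
                        done.Signal(); // always count down, even on failure
                    }
                });
            }

            // Block until every queued work item has signalled.
            done.Wait();
        }

        return seen.OrderBy(x => x).ToList();
    }
}
```

Run(10) should return 0 through 9, each exactly once. Capturing j directly instead of jCopy would make the result depend on timing, as described above.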
The problem is that j is a captured variable, so every lambda expression refers to the same variable rather than to its own copy of the value.