I've been toying with parallelism and I'm having some trouble understanding what's going on in my program.
I'm trying to replicate some of the functionality of the XNA framework. I'm using a component-style setup and one way I thought of making my program a little more efficient was to call the Update method of each component in a separate Task. However, I'm obviously doing something horribly wrong.
The code I'm using in my loop for an update call is:
public void Update(GameTime gameTime)
{
Task[] tasks = new Task[engineComponents.Count];
for (int i = 0; i < tasks.Length; i++)
{
tasks[i] = new Task(() => engineComponents[i].Update(gameTime));
tasks[i].Start();
}
Task.WaitAll(tasks);
}
This throws a weird error:
An unhandled exception of type 'System.AggregateException' occurred in mscorlib.dll
The inner exception talks about an index being out of range.
If I change
Task[] tasks = new Task[engineComponents.Count];
to
Task[] tasks = new Task[engineComponents.Count - 1];
then this seems to work (or at least the program executes without an exception), but there isn't enough room in the array for all of the components. Nonetheless, all of the components are updated, despite there not being enough room in the tasks array to hold them all.
However, the gameTime object that is passed as a parameter goes somewhat insane when the game is running. I've found it hard to pin-point the problem, but I have two components that both simply move a circle's x-position using
x += (float)(gameTime.ElapsedGameTime.TotalSeconds * 10);
However, when using Tasks, their x-positions very quickly become disparate from one another, when they should in fact be the same. Each engineComponent.Update(gameTime) is called once per-update cycle, and the same gameTime object is passed.
When using tasks[i].RunSynchronously(); in place of tasks[i].Start();, the program runs exactly as expected.
I understand that using Tasks in this manner may not be a particularly efficient programming practice, so my question is one of curiosity: why isn't the above code working as I would expect? I know I'm missing something obvious, but I've been unable to track down what specifically is wrong with this implementation.
Apologies for the long question, and thanks for reading ;)
The problem is that your lambda expression is capturing i - not the value of i, but the variable itself.
That means that by the time your task executes, the loop may well be on the next iteration (or even later). So some of your components may be updated more than once, some may not be updated at all, and the final tasks are likely to execute when i is outside the range of engineComponents, hence the exception. For more details, see Eric Lippert's blog posts:
Closing over the loop variable considered harmful
Closing over the loop variable, part two
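To see the capture behaviour without any threading involved, here is a minimal sketch. The class and method names (CaptureDemo, Shared, Copied) are made up for illustration. The lambdas are only invoked after the loop has ended, which is roughly what can happen when a task is scheduled late:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CaptureDemo
{
    // Every lambda captures the one loop variable i, not its value.
    public static string Shared()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
            readers.Add(() => i);
        // The loop has finished, so i is 3 for every reader.
        return string.Join(" ", readers.Select(r => r()));
    }

    // Each iteration declares a fresh variable, captured separately.
    public static string Copied()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            readers.Add(() => copy);
        }
        return string.Join(" ", readers.Select(r => r()));
    }

    static void Main()
    {
        Console.WriteLine(Shared()); // 3 3 3
        Console.WriteLine(Copied()); // 0 1 2
    }
}
```

With real tasks the timing is not deterministic, so you would see a mix of stale and current values rather than always the final one.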
Three options to fix this:
Take a copy of the variable inside the loop. Each variable declared inside the loop will be captured separately:
for (int i = 0; i < tasks.Length; i++)
{
int copyOfI = i;
tasks[i] = new Task(() => engineComponents[copyOfI].Update(gameTime));
tasks[i].Start();
}
Capture engineComponents[i] in a separate variable instead:
for (int i = 0; i < tasks.Length; i++)
{
var component = engineComponents[i];
tasks[i] = new Task(() => component.Update(gameTime));
tasks[i].Start();
}
If you're using C# 5 or later, using a foreach loop will do what you want:
var tasks = new List<Task>();
foreach (var component in engineComponents)
{
Task task = new Task(() => component.Update(gameTime));
tasks.Add(task);
task.Start();
}
Task.WaitAll(tasks.ToArray());
Note that the last solution will not work with the C# 4 compiler, as the behaviour of the foreach iteration variable was for it to be a single variable, just like i. You don't need to target .NET 4.5 or higher for it to work, but you do need to use a C# 5 compiler.
Another option is to not use tasks explicitly at all - use Parallel.ForEach instead:
// This replaces your entire method body
Parallel.ForEach(engineComponents, component => component.Update(gameTime));
Much simpler!
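As a rough, self-contained sketch of the Parallel.ForEach approach: the Component class below is a hypothetical stand-in for your engine components, and a plain double stands in for GameTime. It only checks that every component is updated exactly once before the call returns:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for an engine component.
class Component
{
    public int Updates;
    public void Update(double elapsedSeconds) { Updates++; }
}

static class ParallelUpdateDemo
{
    public static bool EachUpdatedOnce()
    {
        List<Component> engineComponents =
            Enumerable.Range(0, 8).Select(_ => new Component()).ToList();

        // The lambda parameter is a fresh value per call, so there is
        // no shared loop variable to capture. ForEach blocks until done.
        Parallel.ForEach(engineComponents, component => component.Update(0.016));

        return engineComponents.All(c => c.Updates == 1);
    }

    static void Main()
    {
        Console.WriteLine(EachUpdatedOnce()); // True
    }
}
```

This still assumes the components do not touch shared state inside Update; if they do, running them in parallel needs its own synchronisation.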
Try the following:
for (int i = 0; i < tasks.Length; i++)
{
var innerI = i;
tasks[i] = new Task(() => engineComponents[innerI].Update(gameTime));
tasks[i].Start();
}
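You need a new variable for each task, which will be captured by the lambda expression and will hold the index for that task. As written, all your tasks share the single variable i, so each one reads whatever value i has when it runs (often the final one) rather than the value it had when the task was created.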
Your loop variable is captured in a closure. Here is an article by Eric Lippert for a more detailed explanation. You can easily solve the issue by declaring an inner variable inside the loop:
public void Update(GameTime gameTime)
{
Task[] tasks = new Task[engineComponents.Count];
for (int i = 0; i < tasks.Length; i++)
{
int inner = i; // Declare another temp variable
tasks[i] = new Task(() => engineComponents[inner].Update(gameTime));
tasks[i].Start();
}
Task.WaitAll(tasks);
}
Trying to pass an element of an array of strings into a function which is being called in a Task.Run. Anyone know what is the error here?
The code here doesn't work, it behaves as if ProcessElem never gets called.
string[] arr = message.Split(new string[] {"\n"}, StringSplitOptions.None);
for (int i = 0; i < arr.Length; i++) {
if(arr[i] != "") {
var t = Task.Run(() => this.ProcessElem(arr[i]));
}
}
However the code below works
string[] arr = message.Split(new string[] {"\n"}, StringSplitOptions.None);
for (int i = 0; i < arr.Length; i++) {
if(arr[i] != "") {
var tmp = arr[i];
var t = Task.Run(() => this.ProcessElem(tmp));
}
}
I'm very new to how C# does things, but it seems like both patterns are unsafe because the function that calls Task.Run() might return before the ProcessElem function executes, and if the strings are passed by reference then they will be destroyed before ProcessElem is called.
If this is the case, what would be the best way to pass the string into ProcessElem?
Also, why doesn't the first version actually "call" ProcessElem? I have a print statement at the top of ProcessElem and it only gets printed in the second version.
Welcome to captured variables.
Task.Run(() => this.ProcessElem(arr[i]))
This essentially means:
Take my lambda action: () => this.ProcessElem(arr[i])
Run it after you've found/created a thread to do so. i.e. some time later.
However, there's only one variable involved, i, and it's defined outside your lambda's scope. It's not being copied; the same variable is captured and referenced.
By the time that thread gets around to executing, the value of i has most likely changed. Usually, the loop finishes before the threads perform their work.
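That means that by that time, i usually equals arr.Length, so the lambdas try to access arr[arr.Length], which results in an IndexOutOfRangeException. The exception is thrown while the argument is being evaluated, before ProcessElem is entered, and because nothing waits on the task it goes unobserved. That is why ProcessElem appears never to be called.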
When you do var tmp = arr[i];, you are creating a fresh variable per loop iteration, copying the loop variable and capturing that copy in your lambda, which is why it works.
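To make the swallowed exception visible, here is a small sketch that forces the situation deterministically by setting i to the value it holds once the loop has finished. The class name UnobservedDemo is made up, and Console.WriteLine stands in for ProcessElem:

```csharp
using System;
using System.Threading.Tasks;

static class UnobservedDemo
{
    public static string Run()
    {
        string[] arr = { "a", "b" };
        int i = arr.Length; // the value i holds after the for loop exits

        // arr[i] is evaluated inside the task, when the lambda runs.
        Task t = Task.Run(() => Console.WriteLine(arr[i]));
        try
        {
            t.Wait(); // waiting is what surfaces the failure
            return "no exception";
        }
        catch (AggregateException ex)
        {
            return ex.InnerExceptions[0].GetType().Name;
        }
    }

    static void Main()
    {
        Console.WriteLine(Run()); // IndexOutOfRangeException
    }
}
```

Without the Wait call (or an await), the task simply ends up faulted and nothing is printed, which matches what the question describes.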
The source of your problem is how closures work in C#.
i is not passed as its current value but effectively by reference (as if it were ref i), which means that your Action will always see whatever value i has at the moment it executes.
Chances are that when you run this code, the tasks don't start until the loop has finished. In that case each task sees the value i had when the loop exited, which is arr.Length.
To demonstrate:
for (int i = 0; i < arr.Length; i++)
{
if (arr[i] != "")
{
var j = i;
var t = Task.Run(() => ProcessElem(arr[j]));
tasklist.Add(t);
}
}
This will work perfectly fine (unless you have some problems in your ProcessElem method :P).
As for string destruction: unless you have an object that implements IDisposable, you should be fine passing it into a lambda.
It will stay alive until the lambda itself is garbage collected, because the lambda holds a reference to the object (in this case, arr).
Your problem is an age-old issue; it's how lambdas work, and it's very well documented.
However, assuming you are just creating and awaiting a bunch of tasks, save yourself the code, hassle, and task creation and just use the TPL's Parallel.For or PLINQ's AsParallel:
Parallel.For(0, arr.Length, (i) => ProcessElem(arr[i]));
Or
arr.AsParallel().ForAll(ProcessElem);
Or, if you really don't want empty strings:
arr.Where(x => !string.IsNullOrEmpty(x))
.AsParallel()
.ForAll(ProcessElem);
I'm using Task to process multiple requests in parallel and passing a different parameter to each task, but it seems all the tasks take the one final parameter value and execute the method using that.
Below is the sample code. I was expecting output as:
0 1 2 3 4 5 6 ..99
but I get:
100 100 100 ..100
Maybe i's value is already 100 before the print method is called, but shouldn't each call print the parameter passed to it? Why would the print method take the final value of i?
class Program
{
static void Main(string[] args)
{
Task[]t = new Task[100];
for (int i = 0; i < 100; i++)
{
t[i] = Task.Factory.StartNew(() => print(i));
}
Task.WaitAll(t);
Console.WriteLine("complete");
Console.ReadLine();
}
private static void print(object i)
{
Console.WriteLine((int)i);
}
}
You're a victim of a closure. The simplest fix for this issue is:
for (int i = 0; i < 100; i++)
{
int v = i;
t[i] = Task.Factory.StartNew(() => print(v));
}
You can find more detailed explanations here and here.
Problems occur when you reference a variable without considering its scope.
Task[]t = new Task[100];
for (int i = 0; i < 100; i++)
{
t[i] = Task.Factory.StartNew(() => print(i));
}
Task.WaitAll(t);
You might think that each task will see the value i had when the task was created. But that won't happen, because task execution starts some time in the future, and the variable i is shared by all the closures created by the iterations of the for loop. By the time the tasks start, the single, shared variable i has usually reached its final value. This is why all the tasks print the same value.
The solution is to introduce an additional temporary variable in the appropriate scope.
Task[]t = new Task[100];
for (int i = 0; i < 100; i++)
{
var temp=i;
t[i] = Task.Factory.StartNew(() => print(temp));
}
Task.WaitAll(t);
This version prints the numbers 0, 1, 2, ..., 99 in an arbitrary order, but each number will be printed exactly once. The reason is that the variable temp is declared within the block scope of the for loop's body. This causes a new variable named temp to be instantiated with each iteration of the for loop. (In contrast, all iterations of the for loop share a single instance of the variable i.)
For info, another fix here is to use the state parameter of the Task API, i.e.
t[i] = Task.Factory.StartNew(state => print((int)state), i);
Unfortunately, since the state parameter is object, this still boxes the value, but it avoids needing an entire closure and separate delegate per call (with the code shown immediately above, the compiler is smart enough to use a single delegate instance for all the iterations; this is not possible if you add a local variable (like the v in BartoszKP's answer), as the target is the closure instance, and that then varies per iteration).
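A minimal, self-contained sketch of the state-parameter approach follows. The class name StateParameterDemo, the Seen bag, and the Record method are made up for illustration; Record plays the role of print but collects the values so they can be listed in order afterwards:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

static class StateParameterDemo
{
    // Thread-safe collection; a static field, so the lambda captures nothing.
    static readonly ConcurrentBag<int> Seen = new ConcurrentBag<int>();

    static void Record(int value) => Seen.Add(value);

    public static string Run()
    {
        var tasks = new Task[10];
        for (int i = 0; i < tasks.Length; i++)
        {
            // i is read (and boxed) here, when StartNew is called,
            // not later when the task body runs.
            tasks[i] = Task.Factory.StartNew(state => Record((int)state), i);
        }
        Task.WaitAll(tasks);
        return string.Join(" ", Seen.OrderBy(n => n));
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 0 1 2 3 4 5 6 7 8 9
    }
}
```

The tasks still run in an arbitrary order; the values are sorted here only to make the result easy to check.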
In this example, I'm attempting to pass by value, but the reference is passed instead.
for (int i = 0; i < 10; i++)
{
Thread t = new Thread(() => new PhoneJobTest(i));
t.Start();
}
This can be remedied like so:
for (int i = 0; i < 10; i++)
{
int jobNum = i;
Thread t = new Thread(() => new PhoneJobTest(jobNum));
t.Start();
}
What is going on here? Why does the original example pass the reference?
Well, that's just how C# works. The lambda expression in your statement constructs a lexical closure, which stores a single reference to i that persists even after the loop has concluded.
To remedy it, you can do just the thing that you did.
Feel free to read more on this particular issue all around the Web; my choice would be Eric Lippert's discussion here.
This is easier to understand if you look at what happens, in terms of scope:
for (int i = 0; i < 10; i++)
{
Thread t = new Thread(() => new PhoneJobTest(i));
t.Start();
}
Basically translates to something very close to this:
int i = 0;
while (i < 10)
{
Thread t = new Thread(() => new PhoneJobTest(i));
t.Start();
i++;
}
When you use a lambda expression, and it uses a variable declared outside of the lambda (in your case, i), the compiler creates something called a closure - a temporary class that "wraps" the i variable up and provides it to the delegate generated by the lambda.
The closure is constructed at the same level as the variable (i), so in your case:
int i = 0;
// Pseudocode: closure = new ClosureClass(ref i); defined here! (of course, not really called this)
while (i < 10)
{
Thread t = new Thread(() => new PhoneJobTest(i));
t.Start();
i++;
}
Because of this, each Thread gets the same closure defined.
When you rework your loop to use a temporary, the closure is generated at that level instead:
for (int i = 0; i < 10; i++)
{
int jobNum = i;
// Pseudocode: closure = new ClosureClass(ref jobNum); defined here!
Thread t = new Thread(() => new PhoneJobTest(jobNum));
t.Start();
}
Now, each Thread gets its own instance, and everything works properly.
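For a runnable approximation of that idea, the sketch below writes the closure class by hand. It is only a rough model of what the compiler generates; the names HandWrittenClosure, Value, and Read are invented. The captured variable becomes a field, so sharing one instance means sharing one variable:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hand-written stand-in for a compiler-generated closure class.
class HandWrittenClosure
{
    public int Value;            // the "captured variable" lives here
    public int Read() => Value;  // the "lambda body"
}

static class ClosureModel
{
    // One closure instance outside the loop: every delegate shares Value.
    public static string OneInstance()
    {
        var closure = new HandWrittenClosure();
        var readers = new List<Func<int>>();
        for (closure.Value = 0; closure.Value < 3; closure.Value++)
            readers.Add(closure.Read);
        return string.Join(" ", readers.Select(r => r()));
    }

    // One closure instance per iteration: each delegate has its own Value.
    public static string InstancePerIteration()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            var closure = new HandWrittenClosure { Value = i };
            readers.Add(closure.Read);
        }
        return string.Join(" ", readers.Select(r => r()));
    }

    static void Main()
    {
        Console.WriteLine(OneInstance());          // 3 3 3
        Console.WriteLine(InstancePerIteration()); // 0 1 2
    }
}
```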
Short answer: closures. Long answer given here (among other places): Differing behavior when starting a thread: ParameterizedThreadStart vs. Anonymous Delegate. Why does it matter?
You definitely want to read Eric Lippert's "Closing over the loop variable considered harmful":
Part 1
Part 2
In Short: The behavior you see is exactly how C# works.
It happens because of the way C# passes parameters to a lambda. It wraps the variable access in a class which is created during compilation, and exposes it as a field to the lambda body.
When using an anonymous delegate or a lambda expression a closure is created so outside variables can be referenced. When the closure is created the stack (value) variables are promoted to the heap.
One way to avoid this is to start the thread with a ParameterizedThreadStart delegate. For example:
static void Main()
{
for (int i = 0; i < 10; i++)
{
bool flag = false;
var parameterizedThread = new Thread(ParameterizedDisplayIt);
parameterizedThread.Start(flag);
flag = true;
}
Console.ReadKey();
}
private static void ParameterizedDisplayIt(object flag)
{
Console.WriteLine("Param:{0}", flag);
}
Coincidentally I ran into this concept just yesterday: Link
I have multiple tasks and each starts at a given LinkedListNode. Something like this:
first = linkedList.First;
int counter = 0;
while (iterator != null) {
counter++;
if (counter == threshold) {
Task.Factory.StartNew(() => run(first, iterator));
counter = 0;
first = iterator.Next;
}
iterator = iterator.Next;
}
The idea is that I want to run through a LinkedList and not convert it to an array because of memory requirements. So, I figured I'd just pass in the start and the end and iterate over that.
When my tasks actually start, it seems like the parameters are wherever they left off in the loop. Is there some way I can form a closure over the variables so that the task starts with the correct nodes from the LinkedList?
Or, maybe a better way of accomplishing this goal with a LinkedList?
Consider using Parallel.ForEach on your linked list instead. This looks like it will save you a lot of trouble.
Due to variable closure, you should declare the variables that are captured in your lambda expression within the body of your loop. Otherwise, the task's reads of first and iterator would race with the main thread's subsequent updates to them.
first = linkedList.First;
while (iterator != null)
{
// ...
// Fresh copies per iteration, so each task keeps its own range.
var start = first;
var end = iterator;
Task.Factory.StartNew(() => run(start, end));
// ...
}
You can pass state to your Task through the appropriate StartNew overload, which takes an object parameter. The value passed this way is read when StartNew is called, so the task sees it as it was at that moment and no closure over the loop variables is needed.
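Putting the per-iteration copies together with the chunking loop from the question, a sketch might look like the following. SumRange is a hypothetical stand-in for run, the list contents and threshold are arbitrary, and the extra check for the last node (so a trailing partial chunk is not dropped) is my own addition rather than something from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class LinkedListChunks
{
    // Hypothetical worker: sums the values from start to end inclusive.
    static int SumRange(LinkedListNode<int> start, LinkedListNode<int> end)
    {
        int sum = 0;
        for (var node = start; node != null; node = node.Next)
        {
            sum += node.Value;
            if (node == end) break;
        }
        return sum;
    }

    public static string Run()
    {
        var list = new LinkedList<int>(Enumerable.Range(1, 10));
        const int threshold = 4;
        var tasks = new List<Task<int>>();

        var first = list.First;
        int counter = 0;
        for (var iterator = list.First; iterator != null; iterator = iterator.Next)
        {
            counter++;
            if (counter == threshold || iterator.Next == null)
            {
                // Copies declared inside the loop body: one pair per task.
                var start = first;
                var end = iterator;
                tasks.Add(Task.Run(() => SumRange(start, end)));
                counter = 0;
                first = iterator.Next;
            }
        }

        int[] sums = Task.WhenAll(tasks).GetAwaiter().GetResult();
        return string.Join(", ", sums);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 10, 26, 19
    }
}
```

The list itself is never copied to an array; each task only holds references to its two boundary nodes. This assumes the list is not modified while the tasks are running.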