I'm trying to pass an element of a string array into a function that is called inside Task.Run. Does anyone know what the error is here?
The code here doesn't work; it behaves as if ProcessElem never gets called.
string[] arr = message.Split(new string[] {"\n"}, StringSplitOptions.None);
for (int i = 0; i < arr.Length; i++) {
if(arr[i] != "") {
var t = Task.Run(() => this.ProcessElem(arr[i]));
}
}
However the code below works
string[] arr = message.Split(new string[] {"\n"}, StringSplitOptions.None);
for (int i = 0; i < arr.Length; i++) {
if(arr[i] != "") {
var tmp = arr[i];
var t = Task.Run(() => this.ProcessElem(tmp));
}
}
I'm very new to how C# does things, but it seems like both patterns are unsafe, because the function that calls Task.Run() might return before ProcessElem executes, and if the strings are passed by reference then they will be destroyed before ProcessElem is called.
If this is the case, what would be the best way to pass the string into ProcessElem?
Also, why doesn't the first version actually "call" ProcessElem? I have a print statement at the top of ProcessElem and it only gets printed in the second version.
Welcome to captured variables.
Task.Run(() => this.ProcessElem(arr[i]))
This essentially means:
Take my lambda action: () => this.ProcessElem(arr[i])
Run it after you've found/created a thread to do so. i.e. some time later.
However, there is only one variable i involved, and it is declared outside your lambda's scope. It is not copied; the lambda captures and references that same variable.
By the time that thread gets around to executing, the value of i has most likely changed. Usually, the loop finishes before the threads perform their work.
That means that by that time i equals arr.Length, and every lambda tries to access arr[arr.Length], which throws an IndexOutOfRangeException. The exception is thrown while evaluating arr[i], before ProcessElem is entered, and because nothing waits on the task it goes unobserved. That is why it looks as if ProcessElem is never called.
When you do var tmp = arr[i];, you are creating a fresh variable per loop iteration, copying the loop variable and capturing that copy in your lambda, which is why it works.
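To see the difference without any threading involved, here is a minimal sketch. The class and method names (CaptureDemo, SharedVariable, PerIterationCopy) are made up for illustration. The lambdas are only invoked after the loop has finished, which is roughly what happens when Task.Run gets around to them late.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaptureDemo
{
    // Every lambda captures the single loop variable i, so all of them
    // observe whatever value i holds when they are finally invoked.
    public static List<int> SharedVariable(int count)
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            readers.Add(() => i);
        }
        // The loop has finished, so i == count for every reader.
        return readers.Select(r => r()).ToList();
    }

    // Each iteration declares its own copy, so each lambda captures a
    // different variable holding that iteration's value.
    public static List<int> PerIterationCopy(int count)
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            int copy = i;
            readers.Add(() => copy);
        }
        return readers.Select(r => r()).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", SharedVariable(3)));   // prints "3 3 3"
        Console.WriteLine(string.Join(" ", PerIterationCopy(3))); // prints "0 1 2"
    }
}
```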
The source of your problem is how lambdas capture variables in C#.
i is not passed as its current value but effectively by reference (much like ref i), which means that your lambda always sees the current value of i at the moment it executes.
Chances are that when you run this code, the tasks do not start until the loop has finished. Each task then reads the current value of i, which in most simple cases is the value at which the loop exits: arr.Length.
To prove it:
for (int i = 0; i < arr.Length; i++)
{
if (arr[i] != "")
{
var j = i;
var t = Task.Run(() => ProcessElem(arr[j]));
tasklist.Add(t);
}
}
will work perfectly fine (unless you have some problems in your ProcessElem method :P)
As for the strings being destroyed: unless you are dealing with an object that implements IDisposable, you are fine passing it into a lambda.
The object stays alive for as long as the lambda does, because the lambda retains a reference to it (in this case, to arr).
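If you keep the tasks and wait on them, a failure inside the lambda stops being silent. The sketch below keeps the captured-i bug on purpose and uses hypothetical names (ObserveFaults, FirstFault). The tasks are started only after the loop, so the outcome is deterministic rather than timing-dependent.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ObserveFaults
{
    // Returns the type name of the first exception thrown inside the
    // tasks, or an empty string if none of them faulted.
    public static string FirstFault(string[] arr)
    {
        var tasks = new List<Task>();
        for (int i = 0; i < arr.Length; i++)
        {
            // Bug kept on purpose: the lambda captures the variable i,
            // not the value it has in this iteration.
            tasks.Add(new Task(() => Console.WriteLine(arr[i])));
        }

        // Starting after the loop guarantees i == arr.Length in every task.
        foreach (var task in tasks)
        {
            task.Start();
        }

        try
        {
            // Waiting rethrows task failures wrapped in an AggregateException.
            Task.WaitAll(tasks.ToArray());
            return "";
        }
        catch (AggregateException ex)
        {
            return ex.InnerExceptions[0].GetType().Name;
        }
    }
}
```

Calling FirstFault(new[] { "a", "b" }) returns "IndexOutOfRangeException", which is the error the fire-and-forget version in the question never gets to report.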
Your problem is an age-old issue. It is how lambdas work, and it is very well documented.
However, assuming you are just creating and waiting on a bunch of tasks, save yourself the code, hassle, and task creation and just use Parallel.For or AsParallel (PLINQ):
Parallel.For(0, arr.Length, (i) => ProcessElem(arr[i]));
Or
arr.AsParallel().ForAll(ProcessElem);
Or if you really don't want empty strings
arr.Where(x => !string.IsNullOrEmpty(x))
.AsParallel()
.ForAll(ProcessElem);
I'm using Task to process multiple requests in parallel and passing a different parameter to each task but it seems all the tasks takes one final parameter and execute the method using that.
Below is the sample code. I was expecting output as:
0 1 2 3 4 5 6 ..99
but I get:
100 100 100 .. 100
Maybe i's value is already 100 before the print method is called, but shouldn't each call print the parameter passed to it? Why would the print method take the final value of i?
class Program
{
static void Main(string[] args)
{
Task[]t = new Task[100];
for (int i = 0; i < 100; i++)
{
t[i] = Task.Factory.StartNew(() => print(i));
}
Task.WaitAll(t);
Console.WriteLine("complete");
Console.ReadLine();
}
private static void print(object i)
{
Console.WriteLine((int)i);
}
}
You're a victim of a closure. The simplest fix for this issue is:
for (int i = 0; i < 100; i++)
{
int v = i;
t[i] = Task.Factory.StartNew(() => print(v));
}
You can find more detailed explanations here and here.
Problems occur when you reference a variable without considering
its scope.
Task[]t = new Task[100];
for (int i = 0; i < 100; i++)
{
t[i] = Task.Factory.StartNew(() => print(i));
}
Task.WaitAll(t);
You might think that each task will use the value i had when the task was created. But that won't happen, since task execution starts some time in the future. The variable i is shared by all the closures created by the iterations of the for loop. By the time the tasks start, that single, shared variable already holds its final value. This is why all the tasks print the same value of i.
The solution is to introduce an additional temporary variable in
the appropriate scope.
Task[]t = new Task[100];
for (int i = 0; i < 100; i++)
{
var temp=i;
t[i] = Task.Factory.StartNew(() => print(temp));
}
Task.WaitAll(t);
This version prints the numbers 0, 1, 2, ..., 99 in an arbitrary order, but each
number will be printed. The reason is that the variable temp is declared
within the block scope of the for loop’s body. This causes a new
variable named temp to be instantiated with each iteration of the for
loop. (In contrast, all iterations of the for loop share a single instance
of the variable i.)
For info, another fix here is to use the state parameter of the Task API, i.e.
t[i] = Task.Factory.StartNew(state => print((int)state), i);
Unfortunately, since the state parameter is object, this still boxes the value, but it avoids needing an entire closure and separate delegate per call (with the code shown immediately above, the compiler is smart enough to use a single delegate instance for all the iterations; this is not possible if you add a local variable (like the v in BartoszKP's answer), as the target is the closure instance, and that then varies per iteration).
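A self-contained sketch of the state-parameter approach, for anyone who wants to try it. The names StateParameterDemo and Run are hypothetical, and multiplying by 10 is just a stand-in for real work. It uses the StartNew overload that takes a Func<object, TResult> plus a state object, so the lambda captures nothing at all.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class StateParameterDemo
{
    public static int[] Run(int count)
    {
        var tasks = new Task<int>[count];
        for (int i = 0; i < count; i++)
        {
            // i is evaluated here, when StartNew is called, and boxed into
            // the state object. The lambda only ever sees that boxed value.
            tasks[i] = Task.Factory.StartNew(state => (int)state * 10, i);
        }

        // Reading Result blocks until the corresponding task has finished.
        return tasks.Select(t => t.Result).ToArray();
    }
}
```

Run(4) returns { 0, 10, 20, 30 }: every task got the value i had in its own iteration.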
I've been toying with parallelism and I'm having some trouble understanding what's going on in my program.
I'm trying to replicate some of the functionality of the XNA framework. I'm using a component-style setup and one way I thought of making my program a little more efficient was to call the Update method of each component in a separate Task. However, I'm obviously doing something horribly wrong.
The code I'm using in my loop for an update call is:
public void Update(GameTime gameTime)
{
Task[] tasks = new Task[engineComponents.Count];
for (int i = 0; i < tasks.Length; i++)
{
tasks[i] = new Task(() => engineComponents[i].Update(gameTime));
tasks[i].Start();
}
Task.WaitAll(tasks);
}
This throws a weird error:
An unhandled exception of type 'System.AggregateException' occurred in mscorlib.dll
The inner exception talks about an index being out of range.
If I change
Task[] tasks = new Task[engineComponents.Count];
to
Task[] tasks = new Task[engineComponents.Count - 1];
then this seems to work (or at least the program executes without an exception), but there isn't enough room in the array for all of the components. Nonetheless, all of the components are updated, despite there not being enough room in the tasks array to hold them all.
However, the gameTime object that is passed as a parameter goes somewhat insane when the game is running. I've found it hard to pin-point the problem, but I have two components that both simply move a circle's x-position using
x += (float)(gameTime.ElapsedGameTime.TotalSeconds * 10);
However, when using Tasks, their x-positions very quickly become disparate from one another, when they should in fact be the same. Each engineComponent.Update(gameTime) is called once per-update cycle, and the same gameTime object is passed.
When using tasks[i].RunSynchronously(); in place of tasks[i].Start();, the program runs exactly as expected.
I understand that using Tasks in this manner may not be a particularly efficient programming practice, so my question is one of curiosity: why isn't the above code working as I would expect? I know I'm missing something obvious, but I've been unable to track down what specifically is wrong with this implementation.
Apologies for the long question, and thanks for reading ;)
The problem is that your lambda expression is capturing i - not the value of i, but the variable itself.
That means that by the time your task executes, the loop may well be on the next iteration (or even later). So some of your components may be updated more than once, some may not be updated at all, and the final tasks are likely to execute when i is outside the range of engineComponents, hence the exception. For more details, see Eric Lippert's blog posts:
Closing over the loop variable considered harmful
Closing over the loop variable, part two
Three options to fix this:
Take a copy of the variable inside the loop. Each variable declared inside the loop will be captured separately:
for (int i = 0; i < tasks.Length; i++)
{
int copyOfI = i;
tasks[i] = new Task(() => engineComponents[copyOfI].Update(gameTime));
tasks[i].Start();
}
Capture engineComponents[i] in a separate variable instead:
for (int i = 0; i < tasks.Length; i++)
{
var component = engineComponents[i];
tasks[i] = new Task(() => component.Update(gameTime));
tasks[i].Start();
}
If you're using C# 5, using a foreach loop will do what you want:
var tasks = new List<Task>();
foreach (var component in engineComponents)
{
Task task = new Task(() => component.Update(gameTime));
tasks.Add(task);
task.Start();
}
Task.WaitAll(tasks.ToArray());
Note that the last solution will not work with the C# 4 compiler, as the behaviour of the foreach iteration variable was for it to be a single variable, just like i. You don't need to target .NET 4.5 or higher for it to work, but you do need to use a C# 5 compiler.
Another option is to not use tasks explicitly at all - use Parallel.ForEach instead:
// This replaces your entire method body
Parallel.ForEach(engineComponents, component => component.Update(gameTime));
Much simpler!
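For completeness, a runnable sketch of the Parallel.ForEach version. Counter is a made-up stand-in for the engine components in the question, and a plain double replaces GameTime so the example does not depend on XNA.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an engine component.
public sealed class Counter
{
    private int updates;
    public int Updates => updates;

    // Thread-safe increment so the count stays correct under parallel calls.
    public void Update(double elapsedSeconds) => Interlocked.Increment(ref updates);
}

public static class ParallelUpdate
{
    public static void UpdateAll(IReadOnlyList<Counter> components, double elapsedSeconds)
    {
        // component is a lambda parameter, so each invocation receives its
        // own value; there is no shared loop variable to capture. The call
        // returns only after every component has been updated.
        Parallel.ForEach(components, component => component.Update(elapsedSeconds));
    }
}
```

After one call to UpdateAll, every component has been updated exactly once, which is what the Task-per-index loop was trying to achieve.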
Try the following:
for (int i = 0; i < tasks.Length; i++)
{
var innerI = i;
tasks[i] = new Task(() => engineComponents[innerI].Update(gameTime));
tasks[i].Start();
}
You need a new variable for each task, which will be captured by the lambda expression and will hold the index of that task's part of the work. As written, all your tasks share the variable i and read whatever value it holds when they run.
Your loop variable is captured in a closure. Here is an article by Eric Lippert for a more detailed explanation. You can easily solve the issue by declaring an inner variable inside the loop:
public void Update(GameTime gameTime)
{
Task[] tasks = new Task[engineComponents.Count];
for (int i = 0; i < tasks.Length; i++)
{
int inner = i; // Declare another temp variable
tasks[i] = new Task(() => engineComponents[inner].Update(gameTime));
tasks[i].Start();
}
Task.WaitAll(tasks);
}
I have multiple tasks and each starts at a given LinkedListNode. Something like this:
first = linkedList.First;
int counter = 0;
while (iterator != null) {
counter++;
if (counter == threshold) {
Task.Factory.StartNew(() => run(first, iterator));
counter = 0;
first = iterator.Next;
}
iterator = iterator.Next;
}
The idea is that I want to run through a LinkedList and not convert it to an array because of memory requirements. So, I figured I'd just pass in the start and the end and iterate over that.
When my tasks actually start, it seems like the parameters are wherever they left off in the loop. Is there some way I can form a closure over the variables so that the task starts with the correct nodes from the LinkedList?
Or, maybe a better way of accomplishing this goal with a LinkedList?
Consider using Parallel.ForEach on your linked list instead. This looks like it will save you a lot of trouble.
Due to variable closure, you should declare the variable that is captured in your lambda expression within the body of your loop. Otherwise, the task's reads of the first variable would suffer a race condition with the main thread's subsequent updates to it.
first = linkedList.First;
while (iterator != null)
{
// ...
var current = first;
Task.Factory.StartNew(() => run(current));
// ...
}
You can pass state to your task through the StartNew overload that takes an object parameter. The argument is evaluated when StartNew is called, so the task receives the value from that moment rather than a captured variable.
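Here is a rough sketch of the local-copy approach applied to both ends of a chunk. The names (LinkedListChunks, SumChunks, Sum) are hypothetical, summing integers stands in for the question's run method, and handling a final partial chunk is my own assumption since the question does not say what should happen to it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class LinkedListChunks
{
    // Sums each run of `threshold` nodes in its own task and returns the
    // sums in list order. The list itself is never copied to an array.
    public static int[] SumChunks(LinkedList<int> list, int threshold)
    {
        var tasks = new List<Task<int>>();
        var first = list.First;
        int counter = 0;
        for (var iterator = list.First; iterator != null; iterator = iterator.Next)
        {
            counter++;
            if (counter == threshold || iterator.Next == null)
            {
                // Fresh locals per chunk: the lambda captures these, not
                // `first` and `iterator`, which keep changing in the loop.
                var chunkStart = first;
                var chunkEnd = iterator;
                tasks.Add(Task.Run(() => Sum(chunkStart, chunkEnd)));
                counter = 0;
                first = iterator.Next;
            }
        }
        return tasks.Select(t => t.Result).ToArray();
    }

    // Walks from start to end (inclusive) and adds up the values.
    private static int Sum(LinkedListNode<int> start, LinkedListNode<int> end)
    {
        int total = 0;
        for (var node = start; node != null; node = node.Next)
        {
            total += node.Value;
            if (ReferenceEquals(node, end)) break;
        }
        return total;
    }
}
```

For a list holding 1 through 7 and a threshold of 3, SumChunks returns { 6, 15, 7 }. The tasks only read the list, so this is safe as long as nothing modifies it while they run.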
I just encountered the following behavior:
for (var i = 0; i < 50; ++i) {
Task.Factory.StartNew(() => {
Debug.Print("Error: " + i.ToString());
});
}
Will result in a series of "Error: x", where most of the x are equal to 50.
Similarly:
var a = "Before";
var task = new Task(() => Debug.Print("Using value: " + a));
a = "After";
task.Start();
Will result in "Using value: After".
This clearly means that the concatenation in the lambda expression does not occur immediately. How is it possible to use a copy of the outer variable in the lambda expression, at the time the expression is declared? The following will not work better (which is not necessarily incoherent, I admit):
var a = "Before";
var task = new Task(() => {
var a2 = a;
Debug.Print("Using value: " + a2);
});
a = "After";
task.Start();
This has more to do with lambdas than threading. A lambda captures the reference to a variable, not the variable's value. This means that when you try to use i in your code, its value will be whatever was stored in i last.
A tempting fix is to copy the variable's value to a local variable when the lambda starts. The problem is that starting a task has overhead, and the first copy may be executed only after the loop finishes. The following code will also fail:
for (var i = 0; i < 50; ++i) {
Task.Factory.StartNew(() => {
var i1=i;
Debug.Print("Error: " + i1.ToString());
});
}
As James Manning noted, you can add a variable local to the loop and copy the loop variable there. This way you are creating 50 different variables to hold the value of the loop variable, but at least you get the expected result. The problem is, you do get a lot of additional allocations.
for (var i = 0; i < 50; ++i) {
var i1=i;
Task.Factory.StartNew(() => {
Debug.Print("Error: " + i1.ToString());
});
}
The best solution is to pass the loop parameter as a state parameter:
for (var i = 0; i < 50; ++i) {
Task.Factory.StartNew(o => {
var i1=(int)o;
Debug.Print("Error: " + i1.ToString());
}, i);
}
Using a state parameter results in fewer allocations. Looking at the decompiled code:
the second snippet will create 50 closures and 50 delegates
the third snippet will create 50 boxed ints but only a single delegate
That's because you are running the code in a new thread, and the main thread immediately goes on to change the variable. If the lambda expression were executed immediately, the entire point of using a task would be lost.
The thread doesn't get its own copy of the variable at the time the task is created, all the tasks use the same variable (which actually is stored in the closure for the method, it's not a local variable).
Lambda expressions capture not the value of the outer variable but a reference to it. That is the reason why you see 50 or After in your tasks.
To solve this, create a copy of the variable before your lambda expression and capture the copy instead.
The C# 5 compiler (shipped alongside .NET 4.5) changes this behaviour for foreach loops only; the variable of a for loop is still shared across iterations, so there you still need the copy.
Example:
List<Action> acc = new List<Action>();
for (int i = 0; i < 10; i++)
{
int tmp = i;
acc.Add(() => { Console.WriteLine(tmp); });
}
acc.ForEach(x => x());
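The two loop kinds differ in how they treat the captured variable when compiled with C# 5 or later. The sketch below makes that visible; LoopKinds, WithFor and WithForeach are names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LoopKinds
{
    // for: one variable i for the whole loop, shared by every lambda.
    public static List<int> WithFor(int[] items)
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < items.Length; i++)
        {
            actions.Add(() => i);
        }
        // Invoked after the loop, so each lambda reads i == items.Length.
        return actions.ConvertAll(a => a());
    }

    // foreach (C# 5 and later): a fresh item variable per iteration.
    public static List<int> WithForeach(int[] items)
    {
        var actions = new List<Func<int>>();
        foreach (int item in items)
        {
            actions.Add(() => item);
        }
        return actions.ConvertAll(a => a());
    }
}
```

With the input { 10, 20, 30 }, WithFor returns 3, 3, 3 while WithForeach returns 10, 20, 30.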
Lambda expressions are by definition lazily evaluated, so they are not evaluated until actually called; in your case, when the task executes. If you close over a local in your lambda expression, the lambda reflects the state of that local at the time of execution, which is what you see. You can take advantage of this. For example, your for loop doesn't really need a new lambda for every iteration. Assuming, for the sake of the example, that the described result was what you intended, you could write
var i = 0;
Action action = () => Debug.Print("Error: " + i);
for (; i < 50; ++i) {
Task.Factory.StartNew(action);
}
On the other hand, if you wanted it to print "Error: 0" ... "Error: 49", you could change the above to
var i = 0;
Func<int, Action> action = x => () => Debug.Print("Error: " + x);
for (; i < 50; ++i) {
Task.Factory.StartNew(action(i));
}
The first closes over i and uses the state of i at the time the Action is executed, which is often the state after the loop finishes. In the latter case i is evaluated eagerly because it is passed as an argument to a function. This function then returns an Action, which is passed to StartNew.
So the design decision makes both lazy evaluation and eager evaluation possible: lazy because locals are closed over, and eager because you can force locals to be evaluated by passing them as an argument or, as shown below, by declaring another local with a shorter scope
for (var i = 0; i < 50; ++i) {
var j = i;
Task.Factory.StartNew(() => Debug.Print("Error: " + j));
}
All of the above is general to lambdas. In the specific case of StartNew there is an overload that takes a state argument and does what the second example does, so it can be simplified to
var i = 0;
Action<object> action = x => Debug.Print("Error: " + x);
for (; i < 50; ++i) {
Task.Factory.StartNew(action, i);
}