Jagged array of tasks - Concurrency Issues - c#

I am defining a jagged array of tasks (so that each task can operate on its own directory tree) in this manner:
Task[][] threads = new Task[InstancesDir.Length][];
for (int i = 0; i < InstancesDir.Length; i++)
{
    threads[i] = new Task[InstancesDir[i].Length];
}
for (int i = 0; i < FilesDir.Length; i++)
{
    for (int j = 0; j < FilesDir[i].Length; j++)
    {
        threads[i][j] = Task.Run(() =>
        {
            Calculate(i, j, InstancesDir, FilesDir, PointSum);
        });
    }
    Task.WaitAll(threads[i]);
}
But in Calculate I always get a value of j >= FilesDir[i].Length. I have also checked that objects are passed by value, except arrays. What causes this behavior, and what would be a workaround?
PS. Introducing a shared lock might help in mitigating the concurrency issue, but I want to know the reason for this behavior.

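"But in Calculate I always get a value of j >= FilesDir[i].Length"
This isn't a concurrency issue in the usual sense, since your for loop runs on a single thread. It happens because the lambda expression closes over the variables i and j themselves, not over the values they had when Task.Run was called. By the time a task actually runs, the loop has usually moved on, so the task sees a later value of j, often FilesDir[i].Length. This is known as capturing a modified closure.
To avoid it, copy both variables into locals declared inside the loop body and use those in the lambda: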
var tempJ = j;
var tempI = i;
threads[tempI][tempJ] = Task.Run(() =>
{
    Calculate(tempI, tempJ, InstancesDir, FilesDir, PointSum);
});
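The capture behavior can be seen without any threads at all. The following is a minimal sketch, written as C# top-level statements (C# 9 or later); the names shared and copied are made up for the illustration and do not come from the question.

```csharp
using System;
using System.Collections.Generic;

// Delegates that read the loop variable itself (one variable for the whole loop).
var shared = new List<Func<int>>();
// Delegates that read a per-iteration copy.
var copied = new List<Func<int>>();

for (int j = 0; j < 3; j++)
{
    shared.Add(() => j);      // captures the variable j, not its current value

    int tempJ = j;            // a new variable is created on every iteration
    copied.Add(() => tempJ);  // captures that iteration's own copy
}

// The loop has finished, so every delegate in 'shared' now sees j == 3.
var sharedValues = shared.ConvertAll(f => f());
var copiedValues = copied.ConvertAll(f => f());

Console.WriteLine(string.Join(", ", sharedValues)); // 3, 3, 3
Console.WriteLine(string.Join(", ", copiedValues)); // 0, 1, 2
```

With Task.Run the effect is the same, except that how far the loop has advanced when a task reads j depends on scheduling.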

Related

Single-Threaded enumeration works, but Multi-Threaded does not, not sure why

Some more details:
I have a list of arrays of circle objects. Each circle has a list area that corresponds to a pixel on a bitmap.
When I call diffCircles, I'm comparing the vector2's in each area of each circle in each array iteratively to remove their area from the original circle.
I'm trying to add multithreading to a project of mine (to process a lot of short operations in a small span of time).
The original method is:
for (int i = 0; i < generations.Count - 1; i++)
{
    for (int j = 0; j < generations[i].Length; j++)
    {
        foreach (Circle circle in generations[i][j].children)
        {
            Circle.DiffCircles(generations[0][0], circle);
        }
    }
}
And with the above implementation, the program works perfectly as intended (it's just slow).
However if I try to change it to:
for (int i = 0; i < generations.Count - 1; i++)
{
    for (int j = 0; j < generations[i].Length; j++)
    {
        foreach (Circle circle in generations[i][j].children)
        {
            Thread diffCirc = new Thread(() => Circle.DiffCircles(generations[0][0], circle));
            diffCirc.Start();
        }
    }
}
The program stops working as intended.
The DiffCircles method is
public static void DiffCircles(Circle original, Circle toSubtract)
{
    for (int i = 0; i < original.area.Count - 1; i++)
    {
        for (int j = 0; j > toSubtract.area.Count - 1; j++)
        {
            if (original.area[i-1].X == toSubtract.area[j-1].X
                && original.area[i-1].Y == toSubtract.area[j-1].Y)
            {
                original.area.Remove(original.area[i-1]);
            }
        }
    }
}
Could I get any advice? I've never worked with multi-threading before. I can't understand why the multithreaded implementation doesn't work, but the singlethreaded implementation does. Would appreciate any advice!
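This is more likely a race condition than a deadlock: every thread removes items from the same generations[0][0].area list while other threads are still iterating over it, and a list is not safe for that kind of concurrent modification.
You can try a lock, but note that lock (new object()) protects nothing, because every lock statement would lock a different object, and a lock around the creation of the thread does not cover the work the thread does. The lock has to use one shared object and wrap the call itself:
// Declared once and shared by every thread.
object areaLock = new object();
Thread diffCirc = new Thread(() =>
{
    lock (areaLock)
    {
        Circle.DiffCircles(generations[0][0], circle);
    }
});
diffCirc.Start();
Because the calls then run one at a time, this can restore correctness but will not make the program faster.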

Parallel array processing in C#

I have an array of 921600 numbers between 0 and 255.
I need to check each number whether it's above a threshold or not.
Is it possible to check the first- and second half of the array at the same time, to cut down on run time?
What I mean is, is it possible to run the following two for loops in parallel?
for (int i = 0; i < 921600 / 2; i++)
{
    if (arr[i] > 240) counter++;
}
for (int j = 921600 / 2; j < 921600; j++)
{
    if (arr[j] > 240) counter++;
}
Thank you in advance!
I suggest using Parallel LINQ (PLINQ) for this:
int[] source = ...
int count = source
    .AsParallel() // comment this out if you want the sequential version
    .Count(item => item > 240);
What you are asking for is technically possible, as shown below.
int counter = 0;
var tasks = new List<Task>();
var arr = Enumerable.Range(0, 921600).ToArray();
tasks.Add(Task.Factory.StartNew(() =>
{
    for (int i = 0; i < 921600 / 2; i++)
    {
        if (arr[i] > 240) counter++;
    }
}));
tasks.Add(Task.Factory.StartNew(() =>
{
    for (int j = 921600 / 2; j < 921600; j++)
    {
        if (arr[j] > 240) counter++;
    }
}));
Task.WaitAll(tasks.ToArray());
Do not use this code! It has a race condition on the increment: counter++ is a read followed by a write, so when two threads interleave as read, read, write, write, one of the increments is lost. Running this in LINQPad, I ended up with counter being anything between 600000 and 800000, which is nowhere near the actual value.
One fix for this race condition is to introduce a lock so that only one thread can touch the variable at any one time. This serializes the increments, which gives up most of the benefit of multithreading, but it produces the correct answer. (For reference, this takes 0.042s on my machine.)
int counter = 0;
var tasks = new List<Task>();
var arr = Enumerable.Range(0, 921600).ToArray();
var locker = new Object();
tasks.Add(Task.Factory.StartNew(() =>
{
    for (int i = 0; i < 921600 / 2; i++)
    {
        if (arr[i] > 240)
            lock (locker)
                counter++;
    }
}));
tasks.Add(Task.Factory.StartNew(() =>
{
    for (int j = 921600 / 2; j < 921600; j++)
    {
        if (arr[j] > 240)
            lock (locker)
                counter++;
    }
}));
Task.WaitAll(tasks.ToArray());
The better solution is indeed to use Parallel LINQ, as Dmitry has suggested:
Enumerable.Range(0, 921600).AsParallel().Count(x=>x>240);
This takes 0.031s, which is quicker than the locking code and still returns the correct answer, but removing the AsParallel call makes it run in 0.024s. Running code in parallel introduces overhead to manage the threads. Sometimes the performance improvement outweighs this overhead, but surprisingly often it doesn't.
The moral of the story is to always time your implementation against representative data to check whether there is actually a performance benefit.
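If you do want explicit parallel loops rather than PLINQ, one way to avoid both the race and the per-increment lock is to let each worker keep a private subtotal and merge it into the shared counter once at the end. This is only a sketch of that idea using the thread-local overload of Parallel.For, with the same generated test data as above; it is not taken from the answers and has not been timed here.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Same test data as in the answer above: the values 0 .. 921599.
int[] arr = Enumerable.Range(0, 921600).ToArray();
int counter = 0;

Parallel.For<int>(
    0, arr.Length,
    () => 0,                                   // localInit: each worker starts its own subtotal at 0
    (i, state, subtotal) =>                    // body: only touches the worker's private subtotal
        arr[i] > 240 ? subtotal + 1 : subtotal,
    subtotal => Interlocked.Add(ref counter, subtotal)); // localFinally: one atomic merge per worker

Console.WriteLine(counter); // 921359
```

Whether this beats the sequential loop still has to be measured, for the reasons given above.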
While googling for parallel concepts, I came across your question. The little trick below might help you; it handles both halves in a single loop:
int n = 921600 / 2;
for (int i = 0; i < n; i++)
{
    if (arr[i] > 240) counter++;
    if (arr[n + i] > 240) counter++;
}

reimplement a loop using Parallel.For

How can I reimplement the loop below using Parallel.For?
for (int i = 0; i < data.Length; ++i)
{
    int cluster = clustering[i];
    for (int j = 0; j < data[i].Length; ++j)
        means[cluster][j] += data[i][j]; // accumulate sum
}
The goal is better performance and a speedup.
You can mostly just replace the outer loop. However, you need to take care with the accumulation, because you're now updating shared values from multiple threads:
Parallel.For(0, data.Length, i =>
{
    int cluster = clustering[i];
    for (int j = 0; j < data[i].Length; ++j)
        Interlocked.Add(ref means[cluster][j], data[i][j]);
});
However, this may not run any faster, and may actually run significantly slower: every iteration reads from and writes to the same arrays, so you could easily introduce false sharing.
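One way to reduce the contention is to give each worker its own private sums and merge them into means once per worker. Note also that Interlocked.Add only has overloads for integer types, so the version above does not compile if means holds double values. The sketch below assumes double[][] arrays; the sample data and the names clusterCount, dim, NewSums and mergeLock are hypothetical and only there to make it runnable.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sample input: 4 rows of 2 values, assigned to 2 clusters.
double[][] data =
{
    new[] { 1.0, 2.0 },
    new[] { 3.0, 4.0 },
    new[] { 5.0, 6.0 },
    new[] { 7.0, 8.0 },
};
int[] clustering = { 0, 1, 0, 1 };
int clusterCount = 2, dim = 2;

double[][] means = NewSums();
object mergeLock = new object();

Parallel.For<double[][]>(
    0, data.Length,
    NewSums,                                   // each worker gets its own zeroed sums
    (i, state, local) =>
    {
        int cluster = clustering[i];
        for (int j = 0; j < data[i].Length; ++j)
            local[cluster][j] += data[i][j];   // private to this worker, no synchronization
        return local;
    },
    local =>
    {
        lock (mergeLock)                       // merge once per worker, not once per element
        {
            for (int c = 0; c < clusterCount; ++c)
                for (int j = 0; j < dim; ++j)
                    means[c][j] += local[c][j];
        }
    });

Console.WriteLine(string.Join(" ", means[0])); // 6 8
Console.WriteLine(string.Join(" ", means[1])); // 10 12

// Allocates a zeroed clusterCount x dim array.
double[][] NewSums()
{
    var sums = new double[clusterCount][];
    for (int c = 0; c < clusterCount; ++c)
        sums[c] = new double[dim];
    return sums;
}
```

As in the original loop, means holds sums at this point, not averages. For small inputs the sequential loop may well remain faster, so it is worth measuring both.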

Thread safe pass integer to Action with Task

The following code does not work as I expect it. What am I doing wrong? Output is different on every run. Is there a better way of doing this? Assume action does something more complex than what's below.
Action<int> action = (int m) =>
{
    if ((m % 2) == 0)
        Console.WriteLine("Even");
    else
        Console.WriteLine("Odd");
};
const int n = 10;
Task[] tasks = new Task[n];
for (int i = 0; i < n; i++)
{
    tasks[i] = Task.Factory.StartNew(() => action(i+1));
}
Task.WaitAll(tasks);
The lambda in your loop is capturing a reference to the same i variable every time through the loop, not its value.
Change your loop to something like:
for (int i = 0; i < n; i++)
{
    var j = i;
    tasks[i] = Task.Factory.StartNew(() => action(j+1));
}
Note that the output will still be different on every run, but you should get exactly five even and five odd outputs.
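An alternative to the local copy is to hand the value to the task as its state argument, so the lambda captures nothing from the loop. This is a sketch using the Task.Factory.StartNew overload that takes a state object; returning the string from the task instead of printing it is a change made here so that the result order is predictable.

```csharp
using System;
using System.Threading.Tasks;

const int n = 10;
Task<string>[] tasks = new Task<string>[n];

for (int i = 0; i < n; i++)
{
    // i + 1 is evaluated here, on the looping thread, and passed to the task
    // as its state argument; the lambda does not capture i at all.
    tasks[i] = Task.Factory.StartNew(
        state => ((int)state) % 2 == 0 ? "Even" : "Odd",
        i + 1);
}

// WhenAll returns the results in the order of 'tasks', not in completion order.
string[] results = Task.WhenAll(tasks).Result;

Console.WriteLine(string.Join(" ", results));
// Odd Even Odd Even Odd Even Odd Even Odd Even
```

The state value is boxed as object, which is why the lambda casts it back to int.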

Use array of threads

I'm new to threads so it might be an easy one for you, but I've spent some hours trying to figure it out.
Let's say I have a function
public double Gain(List<int> lRelevantObsIndex, ushort uRelevantAttribute)
which takes some time to finish, but is a read-only function.
I have a list of ushort values, and I want to get the ushort value that achieves the minimum value of the Gain function.
Here is what I've got so far, but it's not working:
lRelevantObsIndex is a read only index.
lRelevantAttributes is the list of ushort values.
// Initialize the threads
double[] aGains = new double[lRelevantAttributes.Count];
Thread[] aThreads = new Thread[lRelevantAttributes.Count];
for (int i = 0; i < lRelevantAttributes.Count; i++)
{
    aThreads[i] = new Thread(() => aGains[i] = Gain(lRelevantObsIndex, lRelevantAttributes[i]));
    aThreads[i].Start();
}
// Join the threads
for (int i = 0; i < lRelevantAttributes.Count; i++)
    aThreads[i].Join();
// The easy part - find the minimum once all threads are done
ushort uResult = 0;
double dMinGain = UInt16.MaxValue;
for (int i = 0; i < lRelevantAttributes.Count; i++)
{
    if (aGains[i] < dMinGain)
    {
        dMinGain = aGains[i];
        uResult = lRelevantAttributes[i];
    }
}
return uResult;
I know this is a simple multithreading question - but still need your brains since I'm new to this.
This one is somewhat tricky: the lambda in your for loop uses a variable that is modified after the lambda is created (a so-called access to modified closure):
for (int i = 0; i < lRelevantAttributes.Count; i++)
{
    aThreads[i] = new Thread(() => aGains[i] = Gain(lRelevantObsIndex, lRelevantAttributes[i]));
    aThreads[i].Start();
}
By the time the thread starts, i may already have been incremented, so the lambda accesses the wrong item (or an index past the end of the list). Modify your loop as follows:
for (int ii = 0; ii < lRelevantAttributes.Count; ii++)
{
    var i = ii; // i is now declared inside the loop body, so each iteration gets its own variable
    aThreads[i] = new Thread(() => aGains[i] = Gain(lRelevantObsIndex, lRelevantAttributes[i]));
    aThreads[i].Start();
}
This fixes the problem because each iteration of the loop declares a fresh variable i, and each lambda captures its own one, which is never modified afterwards.
I'm not sure if this is your problem, but it is a problem:
for (int i = 0; i < lRelevantAttributes.Count; i++)
{
    aThreads[i] = new Thread(() => aGains[i] = Gain(lRelevantObsIndex, lRelevantAttributes[i]));
    aThreads[i].Start();
}
When a lambda refers to a loop variable, the binding is delayed, so that when your lambda actually runs, it takes the value of i at the time the lambda runs, not the value it had when the lambda was created. To fix this, declare a secondary variable inside the loop, and use that in the lambda:
for (int i = 0; i < lRelevantAttributes.Count; i++)
{
    int j = i;
    aThreads[i] = new Thread(() => aGains[j] = Gain(lRelevantObsIndex, lRelevantAttributes[j]));
    aThreads[i].Start();
}
You can do the same with tasks:
[Fact]
public void Test()
{
    List<Task<int>> tasks = Enumerable.Range(0, 5) // 5 is the number of tasks to start
        .Select(x => Task.Run(() => DoWork(x)))
        .ToList();
    int[] result = Task.WhenAll(tasks).Result; // waits for all tasks, like joining threads
    result.ToList().ForEach(Console.WriteLine);
}
private int DoWork(int taskId)
{
    return taskId;
}
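Result output (Task.WhenAll returns the results in the order the tasks were supplied, not the order in which they completed):
0
1
2
3
4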
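Applied to the question, the task-based approach could look roughly like the sketch below. The Gain body and the sample values are hypothetical stand-ins, since the real function is not shown; only the overall shape follows the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's expensive, read-only Gain function.
double Gain(List<int> obs, ushort attr)
    => Math.Abs(attr - 7.5) + obs.Count;

var lRelevantObsIndex = new List<int> { 1, 2, 3 };
var lRelevantAttributes = new List<ushort> { 12, 3, 8, 20 };

// Select hands each lambda its own 'attribute' parameter, so there is no
// shared loop variable to capture.
Task<double>[] tasks = lRelevantAttributes
    .Select(attribute => Task.Run(() => Gain(lRelevantObsIndex, attribute)))
    .ToArray();

double[] aGains = Task.WhenAll(tasks).Result; // same order as lRelevantAttributes

// Index of the smallest gain.
int best = 0;
for (int i = 1; i < aGains.Length; i++)
    if (aGains[i] < aGains[best]) best = i;

ushort uResult = lRelevantAttributes[best];
Console.WriteLine(uResult); // 8
```

Starting the search from the first element also avoids having to pick an initial minimum such as UInt16.MaxValue.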
