Parallel array processing in C#

I have an array of 921600 numbers between 0 and 255.
I need to check each number whether it's above a threshold or not.
Is it possible to check the first and second halves of the array at the same time, to cut down on run time?
What I mean is, is it possible to run the following two for loops in parallel?
for (int i = 0; i < 921600 / 2; i++)
{
    if (arr[i] > 240) counter++;
}
for (int j = 921600 / 2; j < 921600; j++)
{
    if (arr[j] > 240) counter++;
}
Thank you in advance!

I suggest using Parallel LINQ (PLINQ) for this:
int[] source = ...
int count = source
    .AsParallel() // comment this out if you want the sequential version
    .Count(item => item > 240);

What you are asking is technically possible, as shown below.
int counter = 0;
var tasks = new List<Task>();
var arr = Enumerable.Range(0, 921600).ToArray();
tasks.Add(Task.Factory.StartNew(() =>
{
    for (int i = 0; i < 921600 / 2; i++)
    {
        if (arr[i] > 240) counter++;
    }
}));
tasks.Add(Task.Factory.StartNew(() =>
{
    for (int j = 921600 / 2; j < 921600; j++)
    {
        if (arr[j] > 240) counter++;
    }
}));
Task.WaitAll(tasks.ToArray());
Do not use this code! You will encounter a race condition when incrementing the integer: one thread's increment can be lost due to a read-read-write-write interleaving. Running this in LINQPad, I ended up with counter being anything between 600000 and 800000. Obviously that range is nowhere near the actual value.
The solution to this race condition is to introduce a lock so that only one thread can touch the variable at any one time. This serializes the increments, so that part of the work is no longer parallel, but it produces the correct answer. (For reference, this takes 0.042s on my machine.)
int counter = 0;
var tasks = new List<Task>();
var arr = Enumerable.Range(0, 921600).ToArray();
var locker = new Object();
tasks.Add(Task.Factory.StartNew(() =>
{
    for (int i = 0; i < 921600 / 2; i++)
    {
        if (arr[i] > 240)
            lock (locker)
                counter++;
    }
}));
tasks.Add(Task.Factory.StartNew(() =>
{
    for (int j = 921600 / 2; j < 921600; j++)
    {
        if (arr[j] > 240)
            lock (locker)
                counter++;
    }
}));
Task.WaitAll(tasks.ToArray());
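Instead of taking the lock on every increment, each task could also count into a local variable and publish its partial result once via Interlocked.Add. This is a sketch of my own, not from the original answer:
int counter = 0;
var arr = Enumerable.Range(0, 921600).ToArray();
int half = arr.Length / 2;
Task CountRange(int start, int end) => Task.Run(() =>
{
    int local = 0; // private per-task counter, so there is no contention while counting
    for (int i = start; i < end; i++)
        if (arr[i] > 240) local++;
    Interlocked.Add(ref counter, local); // publish the partial count exactly once
});
Task.WaitAll(CountRange(0, half), CountRange(half, arr.Length));
This keeps the counting itself lock-free while still producing the correct total.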
The solution is indeed to use Parallel LINQ as Dmitry has suggested:
Enumerable.Range(0, 921600).AsParallel().Count(x=>x>240);
This takes 0.031s, which is quicker than our locking code and still returns the correct answer, but removing the AsParallel call makes it run in 0.024s. Running a piece of code in parallel introduces an overhead to manage the threads. Sometimes the performance improvement outweighs this, but a surprisingly large amount of the time it doesn't.
The moral of the story is to always run some metrics/timings of your expected data against your implementation of any code to check whether there is actually a performance benefit.

While googling for parallel concepts, I came across your query. Maybe the little trick below will help you:
int n = 921600 / 2;
for (int i = 0; i < n; i++)
{
    if (arr[i] > 240) counter++;
    if (arr[n + i] > 240) counter++;
}

Related

Using Modulo instead of two loops

I'm new to programming in general and am learning C# right now.
I just wrote a little program where I have to step through an int[] in a specific pattern. The pattern is as follows:
Start at the last entry of the array (int i)
Form the sum of i and (if available) the three entries above (e.g. i += i-1 ... i += i-3)
Change i to i -= 4 (if available)
Repeat from step 2 until i = 0;
Therefore I wrote the following loop:
for (int i = intArray.Length - 1; i >= 0; i -= 4)
{
    for (int a = 1; a <= 3; a++)
    {
        if (i - a >= 0)
        {
            intArray[i] += intArray[i - a];
            intArray[i - a] = 0;
        }
    }
}
Now my new assignment is to change my code to use only one loop, with the help of modulo operations. I understand what modulo does, but I can't figure out how to use it to get rid of the second loop.
Maybe somebody can explain it to me? Thank you very much in advance.
While iterating over the array, the idea would be to use the modulo 4 operation to calculate the next index to which you will add the current value.
This should work with any array length:
for (int i = 0; i < intArray.Length; i++)
{
    // check how far away we are from the next index which stores the sum
    var offset = (intArray.Length - 1 - i) % 4;
    if (offset == 0) continue;
    intArray[i + offset] += intArray[i];
    intArray[i] = 0;
}
Iterating over the array in reverse makes it a bit more complicated, but getting rid of the inner loop with modulo, which is what you want, is probably this:
var intArray = Enumerable.Range(1, 15).ToArray();
for (int i = 1; i < intArray.Length; i++)
{
    if (i % 4 == 0) continue; // indices 0, 4, 8, ... hold the running sums; skip them
    intArray[i - (i % 4)] += intArray[i];
    intArray[i] = 0;
}
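A quick way to sanity-check the result (the expected output below is worked out by hand for the loop above):
Console.WriteLine(string.Join(", ", intArray));
// prints: 10, 0, 0, 0, 26, 0, 0, 0, 42, 0, 0, 0, 42, 0, 0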

Why is writing by column slow in a two-dimensional array in C#?

I have a two-dimensional array. When I fill it by column, it writes very slowly compared to filling it by row:
class Program
{
    static void Main(string[] args)
    {
        TwoDimArrayPerfomrance.GetByColumns();
        TwoDimArrayPerfomrance.GetByRows();
    }
}
class TwoDimArrayPerfomrance
{
    public static void GetByRows()
    {
        int maxLength = 20000;
        int[,] a = new int[maxLength, maxLength];
        DateTime dt = DateTime.Now;
        Console.WriteLine("The current time is: " + dt.ToString());
        //fill value
        for (int i = 0; i < maxLength; i++)
        {
            for (int j = 0; j < maxLength; j++)
            {
                a[i, j] = i + j;
            }
        }
        DateTime end = DateTime.Now;
        Console.WriteLine("Total: " + end.Subtract(dt).TotalSeconds);
    }
    public static void GetByColumns()
    {
        int maxLength = 20000;
        int[,] a = new int[maxLength, maxLength];
        DateTime dt = DateTime.Now;
        Console.WriteLine("The current time is: " + dt.ToString());
        for (int i = 0; i < maxLength; i++)
        {
            for (int j = 0; j < maxLength; j++)
            {
                a[j, i] = j + i;
            }
        }
        DateTime end = DateTime.Now;
        Console.WriteLine("Total: " + end.Subtract(dt).TotalSeconds);
    }
}
Filling by column takes around 4.2 seconds, while filling by row takes 1.53.
It is the "cache proximity" problem mentioned in the first comment. There are memory caches that any data must go through to be accessed by the CPU. Those caches store blocks of memory, so if you first access memory N and then memory N+1, the cache does not change. But if you first access memory N and then memory N+M (where M is big enough), a new memory block must be added to the cache, and some existing block must be removed to make room. If you then have to access the removed block, you pay the cost of fetching it again.
I concur fully with what @Dialecticus wrote... I'll just add that there are bad ways to write a microbenchmark, and there are worse ways. There are many things to get right when microbenchmarking: remember to run in Release mode without the debugger attached; remember that there is a GC, and that it is better if it runs when you want it to run rather than casually in the middle of your measurement; remember that sometimes code is compiled only after it has been executed at least once, so at least one full warmup round is a good idea... and so on. There is even a full library about benchmarking (https://benchmarkdotnet.org/articles/overview.html) that is used by the Microsoft .NET Core teams to check that there are no speed regressions in the code they write.
class Program
{
    static void Main(string[] args)
    {
        if (Debugger.IsAttached)
        {
            Console.WriteLine("Warning, debugger attached!");
        }
#if DEBUG
        Console.WriteLine("Warning, Debug version!");
#endif
        Console.WriteLine($"Running at {(Environment.Is64BitProcess ? 64 : 32)}bits");
        Console.WriteLine(RuntimeInformation.FrameworkDescription);
        Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
        Console.WriteLine();
        const int MaxLength = 10000;
        for (int i = 0; i < 10; i++)
        {
            Console.WriteLine($"Round {i + 1}:");
            TwoDimArrayPerfomrance.GetByRows(MaxLength);
            GC.Collect();
            GC.WaitForPendingFinalizers();
            TwoDimArrayPerfomrance.GetByColumns(MaxLength);
            GC.Collect();
            GC.WaitForPendingFinalizers();
            Console.WriteLine();
        }
    }
}
class TwoDimArrayPerfomrance
{
    public static void GetByRows(int maxLength)
    {
        int[,] a = new int[maxLength, maxLength];
        Stopwatch sw = Stopwatch.StartNew();
        //fill value
        for (int i = 0; i < maxLength; i++)
        {
            for (int j = 0; j < maxLength; j++)
            {
                a[i, j] = i + j;
            }
        }
        sw.Stop();
        Console.WriteLine($"By Rows, size {maxLength} * {maxLength}, {sw.ElapsedMilliseconds / 1000.0:0.00} seconds");
        // So that the assignment isn't optimized out, we do some fake operation on the array
        for (int i = 0; i < maxLength; i++)
        {
            for (int j = 0; j < maxLength; j++)
            {
                if (a[i, j] == int.MaxValue)
                {
                    throw new Exception();
                }
            }
        }
    }
    public static void GetByColumns(int maxLength)
    {
        int[,] a = new int[maxLength, maxLength];
        Stopwatch sw = Stopwatch.StartNew();
        //fill value
        for (int i = 0; i < maxLength; i++)
        {
            for (int j = 0; j < maxLength; j++)
            {
                a[j, i] = i + j;
            }
        }
        sw.Stop();
        Console.WriteLine($"By Columns, size {maxLength} * {maxLength}, {sw.ElapsedMilliseconds / 1000.0:0.00} seconds");
        // So that the assignment isn't optimized out, we do some fake operation on the array
        for (int i = 0; i < maxLength; i++)
        {
            for (int j = 0; j < maxLength; j++)
            {
                if (a[i, j] == int.MaxValue)
                {
                    throw new Exception();
                }
            }
        }
    }
}
Ah... and multi-dimensional arrays of the type FooType[,] went the way of the dodo with .NET 3.5, when LINQ came out and didn't support them. You should use jagged arrays, FooType[][].
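For comparison, the jagged equivalent has to be allocated row by row (a small sketch using the maxLength from the question, not part of the original answer):
int[][] a = new int[maxLength][];
for (int i = 0; i < maxLength; i++)
{
    a[i] = new int[maxLength]; // each row is an independent int[]
}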
If you try to map your two-dimensional array to a one-dimensional one, it might be a bit easier to see what is going on.
The mapping gives
var a = new int[maxLength * maxLength];
Now the lookup calculation is up to you.
for (int i = 0; i < maxLength; i++)
{
    for (int j = 0; j < maxLength; j++)
    {
        //var rowBased = j + i * maxLength;
        var colBased = i + j * maxLength;
        //a[rowBased] = i + j;
        a[colBased] = i + j;
    }
}
So observe the following:
On the column-based lookup, there are 20,000 * 20,000 multiplications, because j changes on every iteration of the inner loop.
On the row-based lookup, the i * maxLength is optimised by the compiler and only happens 20,000 times.
Now that a is a one-dimensional array, it is also easier to see how the memory is accessed. With the row-based index the memory is accessed sequentially, whereas the column-based access is almost random, and depending on the size of the array the overhead will vary, as you have seen.
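To make that hoisting explicit, the row-based loop effectively becomes the following (a sketch of the transformation, not code from the original answer):
for (int i = 0; i < maxLength; i++)
{
    var rowOffset = i * maxLength; // computed once per row instead of once per cell
    for (int j = 0; j < maxLength; j++)
    {
        a[rowOffset + j] = i + j;
    }
}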
Looking a bit at what BenchmarkDotNet produces:
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
AMD Ryzen 9 3900X, 1 CPU, 24 logical and 12 physical cores
.NET Core SDK=5.0.101
| Method       | MaxLength |             Mean |         Error |        StdDev |
|------------- |----------:|-----------------:|--------------:|--------------:|
| GetByRows    |       100 |         23.60 us |      0.081 us |      0.076 us |
| GetByColumns |       100 |         23.74 us |      0.357 us |      0.334 us |
| GetByRows    |      1000 |      2,333.20 us |     13.150 us |     12.301 us |
| GetByColumns |      1000 |      2,784.43 us |     10.027 us |      8.889 us |
| GetByRows    |     10000 |    238,599.37 us |  1,592.838 us |  1,412.009 us |
| GetByColumns |     10000 |    516,771.56 us |  4,272.849 us |  3,787.770 us |
| GetByRows    |     50000 |  5,903,087.26 us | 13,822.525 us | 12,253.308 us |
| GetByColumns |     50000 | 19,623,369.45 us | 92,325.407 us | 86,361.243 us |
You will see that while MaxLength is reasonably small (100x100 and 1000x1000), the differences are almost negligible, because I expect that the CPU can keep the entire allocated two-dimensional array in its fast cache memory, so the differences are only related to the number of multiplications.
When the matrix becomes larger, the CPU can no longer keep all the allocated memory in its internal cache, and we start to see cache misses and fetches from external memory instead, which is always going to be a lot slower.
That overhead just increases as the size of the matrix grows.

How to implement Tasks in a function? C#

I have to multiply a matrix by a vector using the Task class from the TPL (it's laboratory work, and I have to). I did it using Parallel.For and it works, but now I'm stuck because I have no idea how to implement it using tasks.
public static int[] matxvecParallel(int[,] mat, int[] vec)
{
    int[] res = new int[mat.GetLength(0)];
    Parallel.For(0, mat.GetLength(0), i =>
    {
        for (int k = 0; k < mat.GetLength(1); k++)
        {
            res[i] += mat[i, k] * vec[k];
        }
    });
    return res;
}
I did something stupid to find out how tasks work, and I still don't understand.
How do I change my code?
public static int[] matxvecTask(int[,] mat, int[] vec)
{
    int[] res = new int[mat.GetLength(0)];
    int countTasks = 4;
    Task[] arrayOfTasks = new Task[countTasks];
    for (int k = 0; k < mat.GetLength(0); k++)
    {
        for (int i = 0; i < countTasks; i++)
        {
            int index = i;
            arrayOfTasks[index] = Task.Run(() =>
            {
                for (int j = 0; j < mat.GetLength(1); j++)
                {
                    res[i] += mat[i, j] * vec[j];
                }
            });
        }
    }
    return res;
}
To make it work, change this line to use index instead of i:
res[index] += mat[index, j] * vec[j];
When you use i, you fall into the trap of closures.
Then, you should also wait for all the tasks to complete before moving on to the next iteration:
Task.WaitAll(arrayOfTasks);
Now, you will gain absolutely nothing by replacing Parallel.For with tasks. You only make your code more complicated. Both tasks and Parallel execute your computations on the thread pool. However, Parallel is optimized especially for this purpose, and it is easier to write and read.
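For completeness, here is a minimal sketch of what a working task-based version could look like, striping the rows over a fixed number of tasks so that no two tasks ever write to the same res[i] (my own sketch under those assumptions, not code from the question):
public static int[] matxvecTaskStriped(int[,] mat, int[] vec)
{
    int rows = mat.GetLength(0);
    int cols = mat.GetLength(1);
    int[] res = new int[rows];
    const int countTasks = 4;
    var tasks = new Task[countTasks];
    for (int t = 0; t < countTasks; t++)
    {
        int taskIndex = t; // copy to avoid closing over the loop variable
        tasks[taskIndex] = Task.Run(() =>
        {
            // each task handles every countTasks-th row, so writes never overlap
            for (int i = taskIndex; i < rows; i += countTasks)
            {
                for (int j = 0; j < cols; j++)
                {
                    res[i] += mat[i, j] * vec[j];
                }
            }
        });
    }
    Task.WaitAll(tasks);
    return res;
}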

Jagged array of tasks - Concurrency Issues

I am defining a jagged array of tasks (such that each task can operate on a directory tree of its own) in this manner:
Task[][] threads = new Task[InstancesDir.Length][];
for (int i = 0; i < InstancesDir.Length; i++)
{
    threads[i] = new Task[InstancesDir[i].Length];
}
for (int i = 0; i < FilesDir.Length; i++)
{
    for (int j = 0; j < FilesDir[i].Length; j++)
    {
        threads[i][j] = Task.Run(() =>
        {
            Calculate(i, j, InstancesDir, FilesDir, PointSum);
        });
    }
    Task.WaitAll(threads[i]);
}
But in Calculate I always get a value of j >= FilesDir[i].Length. I have also checked that objects are passed by value, except arrays. What could be a workaround for this, and what could be the reason for this behavior?
PS. Introducing a shared lock might help in mitigating the concurrency issue, but I want to know the reason for such behavior.
But in Calculate I always get a value of j >= FilesDir[i].Length
This isn't a concurrency issue, as your for loop is executing on a single thread. This happens because the lambda expression is closing over your i and j variables. This effect is called a closure.
In order to avoid it, create a temp copy before passing both variables to Task.Run:
var tempJ = j;
var tempI = i;
threads[tempI][tempJ] = Task.Run(() =>
{
    Calculate(tempI, tempJ, InstancesDir, FilesDir, PointSum);
});

Reimplement a loop using Parallel.For

How do I reimplement the loop below using Parallel.For?
for (int i = 0; i < data.Length; ++i)
{
    int cluster = clustering[i];
    for (int j = 0; j < data[i].Length; ++j)
        means[cluster][j] += data[i][j]; // accumulate sum
}
Getting better performance and speed-up is the goal.
You can mostly just replace the outer loop. However, you need to take care with the updates, as you're writing values from multiple threads:
Parallel.For(0, data.Length, i =>
{
    int cluster = clustering[i];
    for (int j = 0; j < data[i].Length; ++j)
        Interlocked.Add(ref means[cluster][j], data[i][j]);
});
However, this may not run any faster, and may actually run significantly slower: you could easily introduce false sharing, since everything is reading from and writing to the same arrays.
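One way to reduce both the interlocked traffic and the sharing (a sketch I am adding here, assuming means is int[][] and per-thread scratch memory is acceptable) is the Parallel.For overload with thread-local state, so each thread accumulates into private buffers and touches the shared arrays only once:
var gate = new object();
Parallel.For(0, data.Length,
    // localInit: each thread gets its own private accumulator
    () => means.Select(m => new int[m.Length]).ToArray(),
    (i, state, local) =>
    {
        int cluster = clustering[i];
        for (int j = 0; j < data[i].Length; ++j)
            local[cluster][j] += data[i][j];
        return local;
    },
    // localFinally: merge each thread's accumulator into means under a lock
    local =>
    {
        lock (gate)
        {
            for (int c = 0; c < means.Length; c++)
                for (int j = 0; j < means[c].Length; j++)
                    means[c][j] += local[c][j];
        }
    });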
