Parallel algorithm slower than sequential - c#

I spent the last few days creating a parallel version of some code (college work), but I came to a dead end (at least for me): the parallel version is nearly twice as slow as the sequential one, and I have no clue why. Here is the code:
Variables.GetMatrix();
int ThreadNumber = Environment.ProcessorCount / 2;
int SS = Variables.PopSize / ThreadNumber;
//GeneticAlgorithm GA = new GeneticAlgorithm();
Stopwatch stopwatch = new Stopwatch(), st = new Stopwatch(), st1 = new Stopwatch();
List<Thread> ThreadList = new List<Thread>();
//List<Task> TaskList = new List<Task>();
GeneticAlgorithm[] SubPop = new GeneticAlgorithm[ThreadNumber];
Thread t;
//Task t;
ThreadVariables Instance = new ThreadVariables();
stopwatch.Start();
st.Start();
PopSettings();
InitialPopulation();
st.Stop();
//Lots of attributions...
int SPos = 0, EPos = SS;
for (int i = 0; i < ThreadNumber; i++)
{
    int temp = i, StartPos = SPos, EndPos = EPos;
    t = new Thread(() =>
    {
        SubPop[temp] = new GeneticAlgorithm(Population, NumSeq, SeqSize, MaxOffset,
                                            PopFit, Child, Instance, StartPos, EndPos);
        SubPop[temp].RunGA();
        SubPop[temp].ShowPopulation();
    });
    t.Start();
    ThreadList.Add(t);
    SPos = EPos;
    EPos += SS;
}
foreach (Thread a in ThreadList)
    a.Join();
double BestFit = SubPop[0].BestSol;
string BestAlign = SubPop[0].TV.Debug;
for (int i = 1; i < ThreadNumber; i++)
{
    if (BestFit < SubPop[i].BestSol)
    {
        BestFit = SubPop[i].BestSol;
        BestAlign = SubPop[i].TV.Debug;
        Variables.ResSave = SubPop[i].TV.ResSave;
        Variables.NumSeq = SubPop[i].TV.NumSeq;
    }
}
Basically the code creates an array of GeneticAlgorithm objects, instantiates and runs the algorithm in each position of the array, and collects the best value from the array at the end. The algorithm works on a three-dimensional data array, and in the parallel version I assign each thread one range of that array, so there is no concurrency on the data. Still, I'm getting the slow timing... Any ideas?
I'm using a Core i5, which exposes four logical cores (two physical cores plus hyper-threading), but any number of threads greater than one makes the code run slower.
What I can explain about the code I'm running in parallel is this:
The second method called in the code I posted runs about 10,000 iterations, and each iteration calls one function. That function may or may not call others (spread across two different objects for each thread) and does a lot of calculation; it depends on a bunch of factors particular to the algorithm. All of these methods for one thread work on a region of the data array that the other threads never access.
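One thing visible in the snippet that is worth ruling out: every thread receives the same ThreadVariables object (Instance). If RunGA synchronizes on it, or if the threads write to fields of it that sit on the same cache line, the threads will serialize or suffer false sharing even though the population ranges are disjoint. A quick experiment, assuming ThreadVariables can simply be constructed once per thread (an assumption, since its internals aren't shown):
// Hypothetical experiment: give each thread its own ThreadVariables
// instead of sharing one instance across all of them.
for (int i = 0; i < ThreadNumber; i++)
{
    int temp = i, StartPos = SPos, EndPos = EPos;
    ThreadVariables local = new ThreadVariables(); // per-thread copy (assumes a usable default ctor)
    t = new Thread(() =>
    {
        SubPop[temp] = new GeneticAlgorithm(Population, NumSeq, SeqSize, MaxOffset,
                                            PopFit, Child, local, StartPos, EndPos);
        SubPop[temp].RunGA();
        SubPop[temp].ShowPopulation();
    });
    t.Start();
    ThreadList.Add(t);
    SPos = EPos;
    EPos += SS;
}
// If the timing improves, the shared Instance was the bottleneck.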

With System.Linq (PLINQ, via AsParallel) there is a lot you can simplify:
int ThreadNumber = Environment.ProcessorCount / 2;
int SS = Variables.PopSize / ThreadNumber;
int numberOfTotalIterations = ThreadNumber; // one partition per thread, mirroring the original loop
var doneAlgorithms = Enumerable.Range(0, numberOfTotalIterations)
    .AsParallel() // Runs the whole pipeline in parallel.
    .WithDegreeOfParallelism(ThreadNumber) // Omit this line if you want the runtime to manage the degree of parallelism.
    .Select(index => _runAlgorithmAndReturn(index, SS))
    .ToArray(); // Redundant if you only enumerate doneAlgorithms once to determine the best one;
                // keep it otherwise to prevent multiple enumerations.
// Sort the algorithms by BestSol descending and take the first one to determine the "best"
// (the original loop keeps the *largest* BestSol, so OrderByDescending, not OrderBy).
// The ordering causes a full enumeration, hence the note about ToArray() above.
GeneticAlgorithm best = doneAlgorithms.OrderByDescending(algo => algo.BestSol).First();
BestFit = best.BestSol;
BestAlign = best.TV.Debug;
Variables.ResSave = best.TV.ResSave;
Variables.NumSeq = best.TV.NumSeq;
And declare a helper method to make it a bit more readable:
/// <summary>
/// Runs a single algorithm and returns it.
/// </summary>
private GeneticAlgorithm _runAlgorithmAndReturn(int index, int SS)
{
    int startPos = index * SS;
    int endPos = startPos + SS;
    var algo = new GeneticAlgorithm(Population, NumSeq, SeqSize, MaxOffset,
                                    PopFit, Child, Instance, startPos, endPos);
    algo.RunGA();
    algo.ShowPopulation();
    return algo;
}

There is a big overhead in creating threads.
Instead of creating new threads, use the ThreadPool, as shown below:
Variables.GetMatrix();
int ThreadNumber = Environment.ProcessorCount / 2;
int SS = Variables.PopSize / ThreadNumber;
//GeneticAlgorithm GA = new GeneticAlgorithm();
Stopwatch stopwatch = new Stopwatch(), st = new Stopwatch(), st1 = new Stopwatch();
List<WaitHandle> WaitList = new List<WaitHandle>();
//List<Task> TaskList = new List<Task>();
GeneticAlgorithm[] SubPop = new GeneticAlgorithm[ThreadNumber];
//Task t;
ThreadVariables Instance = new ThreadVariables();
stopwatch.Start();
st.Start();
PopSettings();
InitialPopulation();
st.Stop();
//lots of attributions...
int SPos = 0, EPos = SS;
for (int i = 0; i < ThreadNumber; i++)
{
    int temp = i, StartPos = SPos, EndPos = EPos;
    ManualResetEvent wg = new ManualResetEvent(false);
    WaitList.Add(wg);
    ThreadPool.QueueUserWorkItem((unused) =>
    {
        SubPop[temp] = new GeneticAlgorithm(Population, NumSeq, SeqSize, MaxOffset,
                                            PopFit, Child, Instance, StartPos, EndPos);
        SubPop[temp].RunGA();
        SubPop[temp].ShowPopulation();
        wg.Set(); // signal that this partition is done
    });
    SPos = EPos;
    EPos += SS;
}
WaitHandle.WaitAll(WaitList.ToArray()); // WaitAll is a static member of WaitHandle
double BestFit = SubPop[0].BestSol;
string BestAlign = SubPop[0].TV.Debug;
for (int i = 1; i < ThreadNumber; i++)
{
    if (BestFit < SubPop[i].BestSol)
    {
        BestFit = SubPop[i].BestSol;
        BestAlign = SubPop[i].TV.Debug;
        Variables.ResSave = SubPop[i].TV.ResSave;
        Variables.NumSeq = SubPop[i].TV.NumSeq;
    }
}
Note that instead of using Join to wait for the threads to finish, I'm using WaitHandles.
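For comparison, a sketch of the same pattern with tasks, which also run on the thread pool and remove the ManualResetEvent bookkeeping (same assumed types as above):
using System.Threading.Tasks;

Task[] tasks = new Task[ThreadNumber];
for (int i = 0; i < ThreadNumber; i++)
{
    int temp = i;
    int StartPos = temp * SS; // same partitioning as above
    int EndPos = StartPos + SS;
    tasks[temp] = Task.Run(() =>
    {
        SubPop[temp] = new GeneticAlgorithm(Population, NumSeq, SeqSize, MaxOffset,
                                            PopFit, Child, Instance, StartPos, EndPos);
        SubPop[temp].RunGA();
        SubPop[temp].ShowPopulation();
    });
}
Task.WaitAll(tasks); // replaces the ManualResetEvent/WaitAll pair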

You're creating the threads yourself, so there's some extreme overhead there. Parallelise as the comments suggested. Also make sure the time a single work unit takes is long enough; a single thread/work unit should be alive for at least ~20 ms.
Pretty basic things really. I'd suggest you really read up on how multi-threading in .NET works.
I see you don't create too many threads. But the optimal thread count can't be determined from the processor count alone. The built-in Parallel class has advanced algorithms to reduce the overall time.
Partitioning and threading are pretty complex things that require a lot of knowledge to get right, so unless you REALLY know what you're doing, rely on the Parallel class to handle it for you (a sketch follows).
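For concreteness, a minimal sketch of the question's partition loop rewritten with Parallel.For; the GeneticAlgorithm constructor arguments are taken from the question, and Parallel.For takes care of scheduling the partitions onto pool threads:
using System.Threading.Tasks;

GeneticAlgorithm[] SubPop = new GeneticAlgorithm[ThreadNumber];
Parallel.For(0, ThreadNumber, i =>
{
    int startPos = i * SS; // each partition works on its own range
    int endPos = startPos + SS;
    SubPop[i] = new GeneticAlgorithm(Population, NumSeq, SeqSize, MaxOffset,
                                     PopFit, Child, Instance, startPos, endPos);
    SubPop[i].RunGA();
    SubPop[i].ShowPopulation();
});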

Related

How to fill an array in multiple threads?

I have the array below, and I know I need to use Thread, but I don't understand how to do it. Do I need to split the array into parts, or can I work on the whole thing at once?
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
int[] a = new int[10000];
Random rand = new Random();
for (int i = 0; i < a.Length; i++)
{
    a[i] = rand.Next(-100, 100);
}
foreach (var p in a)
    Console.WriteLine(p);
TimeSpan ts = stopWatch.Elapsed;
stopWatch.Stop();
string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}",
    ts.Hours, ts.Minutes, ts.Seconds,
    ts.Milliseconds / 10);
Console.WriteLine("RunTime " + elapsedTime);
Another approach, compared to John Wu's, is to use a custom partitioner. I think that it is a little more readable.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

int[] a = new int[10000];
int batchSize = 1000;
Parallel.ForEach(Partitioner.Create(0, a.Length, batchSize), range =>
{
    // Random is not thread-safe, so give each range its own instance
    // (seeded distinctly so the ranges don't produce identical sequences).
    Random rand = new Random(Guid.NewGuid().GetHashCode());
    for (int i = range.Item1; i < range.Item2; i++)
    {
        a[i] = rand.Next(-100, 100);
    }
});
In modern C#, you should almost never have to use Thread objects themselves; they are fraught with peril, and there are other language features that will do the job just as well (see async and the TPL). I'll show you a way to do it with the TPL.
Note: due to the problem of false sharing, you need to rig things so that the different threads work on different memory areas. Otherwise you will see no gain in performance; indeed, performance could get considerably worse. In this example I divide the array into blocks of 4,000 bytes (1,000 elements) each and work on each block in a separate thread.
using System;
using System.Linq;
using System.Threading.Tasks;

var array = new int[10000];
var offsets = Enumerable.Range(0, 10).Select(x => x * 1000);
Parallel.ForEach(offsets, offset =>
{
    // Random is not thread-safe, so each block uses its own instance.
    var random = new Random(Guid.NewGuid().GetHashCode());
    for (int i = 0; i < 1000; i++)
    {
        array[offset + i] = random.Next(-100, 100);
    }
});
That all being said, I doubt you'll see much of a performance gain in this example; the array is much too small to be worth the additional overhead.

c# performance changes rapidly by looping just 0,001% more often

I'm working on a UI project which has to work with huge datasets (35 new values every second) which will then be displayed in a graph. The user shall be able to change the view from 10 minutes up to a month view. To achieve this I wrote myself a helper function which condenses a lot of data into a 600-element array, which is then displayed on a LiveView chart.
I found out that at the beginning the software works very well and fast, but the longer the software runs (e.g. for a month) and the higher the memory usage rises (to ca. 600 MB), the slower the function gets (up to 8x).
So I made some tests to find the source of this. Quite surprised, I found out that there is something like a magic number where the function gets 2x slower, just by changing the loop count from 71494 to 71495: the runtime goes from 19 ms to 39 ms.
I'm really confused. Even when you comment out the second for loop (where the arrays are getting truncated), it is a lot slower.
Maybe this has something to do with the garbage collector? Or does C# compact memory automatically?
Using Visual Studio 2017 with newest updates.
The Code
using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace TempoaryTest
{
    class ProductNameStream
    {
        public struct FileValue
        {
            public DateTime Time;
            public ushort[] Value;
            public ushort[] Avg1;
            public ushort[] Avg2;
            public ushort[] DAvg;
            public ushort AlarmDelta;
            public ushort AlarmAverage;
            public ushort AlarmSum;
        }
    }

    public static class Program
    {
        private const int MAX_MEASURE_MODEL = 600;
        private const int TEST = 71494;
        //private const int TEST = 71495; //this one doubles the consuming time!

        public static void Main(string[] bleg)
        {
            List<ProductNameStream.FileValue> fileValues = new List<ProductNameStream.FileValue>();
            ProductNameStream.FileValue fil = new ProductNameStream.FileValue();
            DateTime testTime = DateTime.Now;
            Console.WriteLine("TEST: {0} {1:X}", TEST, TEST);

            //Creating example List
            for (int n = 0; n < TEST; n++)
            {
                fil = new ProductNameStream.FileValue
                {
                    Time = testTime = testTime.AddSeconds(1),
                    Value = new ushort[8],
                    Avg1 = new ushort[8],
                    Avg2 = new ushort[8],
                    DAvg = new ushort[8]
                };
                for (int i = 0; i < 8; i++)
                {
                    fil.Value[i] = (ushort)(n + i);
                    fil.Avg1[i] = (ushort)(TEST - n - i);
                    fil.Avg2[i] = (ushort)(n / (i + 1));
                    fil.DAvg[i] = (ushort)(n * (i + 1));
                }
                fil.AlarmDelta = (ushort)DateTime.Now.Ticks;
                fil.AlarmAverage = (ushort)(fil.AlarmDelta / 2);
                fil.AlarmSum = (ushort)(n);
                fileValues.Add(fil);
            }

            var sw = Stopwatch.StartNew();

            /* May look like the same as MAX_MEASURE_MODEL but since we use int
             * as counter we must be aware of the int round down. */
            int cnt = (fileValues.Count / (fileValues.Count / MAX_MEASURE_MODEL)) + 1;
            ProductNameStream.FileValue[] newFileValues = new ProductNameStream.FileValue[cnt];
            ProductNameStream.FileValue[] fileValuesArray = fileValues.ToArray();

            //Truncate the big list to a 600-element array
            for (int n = 0; n < fileValues.Count; n++)
            {
                if ((n % (fileValues.Count / MAX_MEASURE_MODEL)) == 0)
                {
                    cnt = n / (fileValues.Count / MAX_MEASURE_MODEL);
                    newFileValues[cnt] = fileValuesArray[n];
                    newFileValues[cnt].Value = new ushort[8];
                    newFileValues[cnt].Avg1 = new ushort[8];
                    newFileValues[cnt].Avg2 = new ushort[8];
                    newFileValues[cnt].DAvg = new ushort[8];
                }
                else
                {
                    for (int i = 0; i < 8; i++)
                    {
                        if (newFileValues[cnt].Value[i] < fileValuesArray[n].Value[i])
                            newFileValues[cnt].Value[i] = fileValuesArray[n].Value[i];
                        if (newFileValues[cnt].Avg1[i] < fileValuesArray[n].Avg1[i])
                            newFileValues[cnt].Avg1[i] = fileValuesArray[n].Avg1[i];
                        if (newFileValues[cnt].Avg2[i] < fileValuesArray[n].Avg2[i])
                            newFileValues[cnt].Avg2[i] = fileValuesArray[n].Avg2[i];
                        if (newFileValues[cnt].DAvg[i] < fileValuesArray[n].DAvg[i])
                            newFileValues[cnt].DAvg[i] = fileValuesArray[n].DAvg[i];
                    }
                    if (newFileValues[cnt].AlarmSum < fileValuesArray[n].AlarmSum)
                        newFileValues[cnt].AlarmSum = fileValuesArray[n].AlarmSum;
                    if (newFileValues[cnt].AlarmDelta < fileValuesArray[n].AlarmDelta)
                        newFileValues[cnt].AlarmDelta = fileValuesArray[n].AlarmDelta;
                    if (newFileValues[cnt].AlarmAverage < fileValuesArray[n].AlarmAverage)
                        newFileValues[cnt].AlarmAverage = fileValuesArray[n].AlarmAverage;
                }
            }

            Console.WriteLine(sw.ElapsedMilliseconds);
        }
    }
}
This is most likely being caused by the garbage collector, as you suggested.
I can offer two pieces of evidence to indicate that this is so:
If you put GC.Collect() just before you start the stopwatch, the difference in times goes away.
If you instead change the initialisation of the list to new List<ProductNameStream.FileValue>(TEST);, the difference in time also goes away.
(Initialising the list's capacity to the final size in its constructor prevents multiple reallocations of its internal array while items are being added to it, which will reduce pressure on the garbage collector.)
Therefore, I assert based on this evidence that it is indeed the garbage collector that is impacting your timings.
Incidentally, the threshold value was slightly different for me, and for at least one other person too (which isn't surprising if the timing differences are being caused by the garbage collector).
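For concreteness, a minimal sketch of the two single-line changes described above, applied to the question's Main:
// Option 1: pre-size the list so its internal array is never reallocated
// while the example data is being added.
List<ProductNameStream.FileValue> fileValues =
    new List<ProductNameStream.FileValue>(TEST);

// ... build the list as before ...

// Option 2: force a collection before timing, so garbage left over from the
// setup phase isn't collected inside the measured region.
GC.Collect();
var sw = Stopwatch.StartNew();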
Your data structure is inefficient and forces a lot of allocations during the computation. Have a look at fixed-size buffers inside a struct (the fixed keyword); a sketch follows. Also preallocate the list: don't rely on the list to constantly grow its internal array, which also creates garbage.
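A sketch of that restructuring, using the field names from the question; note that fixed-size buffers require an unsafe struct and compiling with /unsafe:
// Each FileValue now embeds its eight ushorts inline instead of holding
// references to four separately allocated ushort[8] arrays.
public unsafe struct FileValue
{
    public DateTime Time;
    public fixed ushort Value[8];
    public fixed ushort Avg1[8];
    public fixed ushort Avg2[8];
    public fixed ushort DAvg[8];
    public ushort AlarmDelta;
    public ushort AlarmAverage;
    public ushort AlarmSum;
}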

How to run Forloop under a SECOND without altering the loops

I am required to make this for loop run in under a SECOND without making any changes to the loops or to the for loop as a whole.
var someLongDataString = "";
const int sLen = 30, loops = 50000;
var source = new string('X', sLen);
Console.WriteLine();
for (var i = 0; i < loops; i++)
{
someLongDataString += source;
}
EDIT: the restriction doesn't apply to someLongDataString += source;
As I indicated in the comment, it implies a change to the statement within the loop.
This code:
class Program
{
    static void Main(string[] args)
    {
        var someLongDataString = "";
        var builder = new StringBuilder(); // <---- use a StringBuilder instead
        const int sLen = 30, loops = 50000;
        var source = new string('X', sLen);
        var stopwatch = Stopwatch.StartNew();
        Console.WriteLine();
        for (var i = 0; i < loops; i++)
        {
            //someLongDataString += source;
            builder.Append(source);
        }
        someLongDataString = builder.ToString();
        Console.WriteLine("Elapsed: {0}", stopwatch.Elapsed);
        Console.WriteLine("Press any key");
        Console.ReadLine();
    }
}
Why so fast?
Performing mass concatenation onto a string is terribly inefficient, because every += allocates a brand-new string object; remember, string is immutable. StringBuilder isn't, and in addition it uses a pre-allocated buffer to accommodate potential additions. So instead of allocating on each and every call, it only regrows its buffer once the buffer becomes full.
MSDN has this to say on StringBuilder:
For routines that perform extensive string manipulation (such as apps that modify a string numerous times in a loop), modifying a string repeatedly can exact a significant performance penalty. The alternative is to use StringBuilder, which is a mutable string class. Mutability means that once an instance of the class has been created, it can be modified by appending, removing, replacing, or inserting characters. A StringBuilder object maintains a buffer to accommodate expansions to the string. New data is appended to the buffer if room is available; otherwise, a new, larger buffer is allocated, data from the original buffer is copied to the new buffer, and the new data is then appended to the new buffer. MSDN
Pay particular attention to this too:
Although the StringBuilder class generally offers better performance than the String class, you should not automatically replace String with StringBuilder whenever you want to manipulate strings. Performance depends on the size of the string, the amount of memory to be allocated for the new string, the system on which your app is executing, and the type of operation. You should be prepared to test your app to determine whether StringBuilder actually offers a significant performance improvement. MSDN
One way is to use a StringBuilder, but this requires a change within the for loop:
var starttime = DateTime.Now;
var someLongDataString = new StringBuilder(100000);
const int sLen = 30, loops = 50000;
var source = new string('X', sLen);
Console.WriteLine();
for (var i = 0; i < loops; i++)
{
    someLongDataString.Append(source);
}
// TotalMilliseconds, not Milliseconds: the latter is only the 0-999 ms
// component of the TimeSpan.
Console.WriteLine((DateTime.Now - starttime).TotalMilliseconds);
Console.ReadLine();

How To Make This For Loop Faster

I have the problem that this for loop takes so much time to complete.
I want a faster way to complete it.
ArrayList arrayList = new ArrayList();
byte[] encryptedBytes = null;
for (int i = 0; i < iterations; i++)
{
    encryptedBytes = Convert.FromBase64String(
        inputString.Substring(base64BlockSize * i, base64BlockSize));
    arrayList.AddRange(rsaCryptoServiceProvider.Decrypt(encryptedBytes, true));
}
The iterations variable is sometimes larger than 100,000, and that takes forever.
Have you considered running the decryption in a parallel loop? The input strings have to be prepared first in a regular loop, but that's quick. Then you run the decryption in Parallel.For:
var inputs = new List<string>(iterations);
var results = new byte[iterations][]; // Decrypt returns byte[], so collect into a jagged array
// Create the inputs from the input string.
for (int i = 0; i < iterations; ++i)
{
    inputs.Add(inputString.Substring(base64BlockSize * i, base64BlockSize));
}
Parallel.For(0, iterations, i =>
{
    var encryptedBytes = Convert.FromBase64String(inputs[i]);
    // Each iteration writes only to its own slot, so no locking is needed.
    results[i] = rsaCryptoServiceProvider.Decrypt(encryptedBytes, true);
});
Decrypt returns a byte[], so the results are collected in a jagged byte array, ordered by block index; because each iteration writes only its own slot, no concurrent collection is needed.
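If you need the decrypted blocks as one contiguous buffer afterwards (which is what the original ArrayList was accumulating), a LINQ one-liner flattens the jagged array, at the cost of one extra copy:
using System.Linq;

byte[] allDecrypted = results.SelectMany(block => block).ToArray();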

Dynamically run more than one thread in c#

I have a program where I need to run a number of threads at the same time
int defaultMaxworkerThreads = 0;
int defaultmaxIOThreads = 0;
ThreadPool.GetMaxThreads(out defaultMaxworkerThreads, out defaultmaxIOThreads);
ThreadPool.SetMaxThreads(defaultMaxworkerThreads, defaultmaxIOThreads);
// Assuming ReadPasswordFile returns a list of words (strings).
List<string> Data1 = PasswordFileHandler.ReadPasswordFile("Data1.txt");
List<string> Data2 = PasswordFileHandler.ReadPasswordFile("Data2.txt");
while (Data1.Count > 0) // ">= 0" would loop forever
{
    List<string> Data1Subset = (from sub in Data1 select sub).Take(NumberOfWordPrThead).ToList();
    Data1 = Data1.Except(Data1Subset).ToList();
    _NumberOfTheadsRunning++;
    ThreadPool.QueueUserWorkItem(new WaitCallback(ThreadCompleted), new TaskInfo(Data1Subset, Data2));
    //Start threads based on how many we'd like to start
}
How can I run more than one thread at a time? I would like to decide the number of threads at run-time, based on the number of cores and a config setting, but my code only ever seems to run on one thread.
How should I change it to run on more than one thread?
As @TomTom pointed out, your code will work properly if you call both SetMinThreads and SetMaxThreads. In accordance with MSDN you also have to take care not to quit the main thread too early, before the ThreadPool work items have run:
// used to simulate different work times
static Random random = new Random();

// worker
static private void callback(Object data)
{
    Console.WriteLine(String.Format("Called from {0}", data));
    System.Threading.Thread.Sleep(random.Next(100, 1000));
}

//
int minWorker, minIOC;
ThreadPool.GetMinThreads(out minWorker, out minIOC);
ThreadPool.SetMaxThreads(5, minIOC);
ThreadPool.SetMinThreads(3, minIOC);
for (int i = 0; i < 3; i++)
{
    ThreadPool.QueueUserWorkItem(new WaitCallback(callback), i.ToString());
}
// give the ThreadPool a chance to run before the main thread exits
Thread.Sleep(1000);
A good alternative to the standard ThreadPool is the Task Parallel Library which introduces the concept of Tasks. Using the Task object you could for example easily start multiple tasks like this:
// global variable, used to simulate different work times
Random random = new Random();

// unit of work
private void callback(int i)
{
    Console.WriteLine(String.Format("Nr. {0}", i));
    System.Threading.Thread.Sleep(random.Next(100, 1000));
}

const int max = 5;
var tasks = new System.Threading.Tasks.Task[max];
for (int i = 0; i < max; i++)
{
    var copy = i;
    // create the tasks and init the work units
    tasks[i] = new System.Threading.Tasks.Task(() => callback(copy));
}
// start the parallel execution
foreach (var task in tasks)
{
    task.Start();
}
// optionally wait for all tasks to finish
System.Threading.Tasks.Task.WaitAll(tasks);
You could also start the code execution immediately using Task.Factory like this:
const int max = 5;
var tasks = new System.Threading.Tasks.Task[max];
for (int i = 0; i < max; i++)
{
    var copy = i;
    // start execution immediately
    tasks[i] = System.Threading.Tasks.Task.Factory.StartNew(() => callback(copy));
}
System.Threading.Tasks.Task.WaitAll(tasks);
Have a look at this SO post to see the difference between ThreadPool.QueueUserWorkItem vs. Task.Factory.StartNew.
