Simple parallelisation for hashset - c#

I have two nested loops and am trying to do a simple parallelisation.
Pseudocode:
for item1 in data1 (~100 million rows)
    for item2 in data2 (~100 rows)
        result = process(item1, item2) // a couple of if conditions
        hashset.add(result) // while adding, in case of a duplicate I also decide which one to retain
To be precise, process(item1, item2) has 4 if conditions based on values in item1 and item2 (time taken is less than 50 ms).
data1 size is Nx17
data2 size is Nx17
result size is 1x17 (the result is joined into a string before it is added to the hashset)
Max output size: unknown, but I would like to be ready for at least 500 million,
which means the hashset would be holding 500 million items. (How to handle so much data in a hashset would be another question, I guess.)
Should I just use a concurrent hashset to make it thread safe and go with Parallel.ForEach, or should I go with the Task concept?
Please provide some code samples based on your opinion.

The answer depends a lot on the cost of process(item1, item2). If this is a CPU-intensive operation, then you can surely benefit from Parallel.ForEach. Of course, you should then use a concurrent collection such as ConcurrentDictionary, or lock around your hash set. You should benchmark to see what works best for you. If process contributes too little to the total run time, you will probably gain nothing from parallelization: the locking on the hash set will kill it all.
You should also check whether enumerating data2 in the outer loop is faster. It might give you another benefit: you can keep a separate hash set for each item of data2 and then merge the results into one hash set. This avoids locks.
Again, you need to run your own tests; there is no universal answer here.
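A minimal sketch of the lock-free variant described above, assuming integer data and a caller-supplied process function. The class name PerItemHashSets and its Run method are made up for illustration and are not part of the question. Each data2 item gets its own HashSet inside the parallel outer loop, and the sets are merged on a single thread at the end:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PerItemHashSets
{
    // Hypothetical helper: the names and the int element type are assumptions.
    public static HashSet<int> Run(int[] data1, int[] data2, Func<int, int, int> process)
    {
        var partials = new HashSet<int>[data2.Length];

        // data2 drives the parallel outer loop. Each iteration owns one set,
        // so no locking is needed while adding.
        Parallel.For(0, data2.Length, j =>
        {
            var local = new HashSet<int>();
            foreach (int item1 in data1)
            {
                local.Add(process(item1, data2[j]));
            }
            partials[j] = local;
        });

        // Single-threaded merge of the per-item sets.
        var merged = new HashSet<int>();
        foreach (var partial in partials)
        {
            merged.UnionWith(partial);
        }
        return merged;
    }
}
```

The trade-off is memory: every partial set stays alive until the merge, which matters at the sizes mentioned in the question, so this needs measuring just like the locking variants.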

My suggestion is to separate the processing of the data from the saving of the results to the HashSet, because the first is parallelizable but the second is not. You could achieve this separation with the producer-consumer pattern, using a BlockingCollection and threads (or tasks). But I'll show a solution using a more specialized tool, the TPL Dataflow library. I'll assume that the data are two arrays of integers, and the processing function can produce up to 500,000,000 different results:
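// Requires: using System.Collections.Generic; using System.Linq; using System.Threading.Tasks.Dataflow;
var data1 = Enumerable.Range(1, 100_000_000).ToArray();
var data2 = Enumerable.Range(1, 100).ToArray();
static int Process(int item1, int item2)
{
    // unchecked: the product may overflow int and wrap around
    return unchecked(item1 * item2) % 500_000_000;
}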
The dataflow pipeline will have two blocks. The first block is a TransformBlock that accepts an item from the data1 array, processes it with all items of the data2 array, and returns a batch of the results (as an int array).
var processBlock = new TransformBlock<int, int[]>(item1 =>
{
    int[] batch = new int[data2.Length];
    for (int j = 0; j < data2.Length; j++)
    {
        batch[j] = Process(item1, data2[j]);
    }
    return batch;
}, new ExecutionDataflowBlockOptions()
{
    BoundedCapacity = 100,
    MaxDegreeOfParallelism = 3 // Configurable
});
The second block is an ActionBlock that receives the processed batches from the first block, and adds the individual results to the HashSet.
var results = new HashSet<int>();
var saveBlock = new ActionBlock<int[]>(batch =>
{
    for (int i = 0; i < batch.Length; i++)
    {
        results.Add(batch[i]);
    }
}, new ExecutionDataflowBlockOptions()
{
    BoundedCapacity = 100,
    MaxDegreeOfParallelism = 1 // Mandatory
});
The line below links the two blocks together, so that the data will flow automatically from the first block to the second:
processBlock.LinkTo(saveBlock,
    new DataflowLinkOptions() { PropagateCompletion = true });
The last step is to feed the first block with the items of the data1 array, and wait for the completion of the whole operation.
for (int i = 0; i < data1.Length; i++)
{
    processBlock.SendAsync(data1[i]).Wait();
}
processBlock.Complete();
saveBlock.Completion.Wait();
The HashSet now contains the results.
A note about the BoundedCapacity option: it controls the flow of data, so that a fast upstream block will not flood a slow downstream block with data. Configuring this option properly improves the memory and CPU efficiency of the pipeline.
The TPL Dataflow library is built into .NET Core, and is available as a package for .NET Framework.
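For comparison, here is a rough sketch of the producer-consumer alternative mentioned at the start of this answer, using a bounded BlockingCollection instead of Dataflow blocks. It is only an illustration under assumptions: the class name, the Run signature and the producers parameter are invented here, and the data is taken to be int arrays as above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BlockingCollectionPipeline
{
    // Hypothetical helper: parallel producers, one consumer that owns the HashSet.
    public static HashSet<int> Run(int[] data1, int[] data2,
        Func<int, int, int> process, int producers = 3)
    {
        var results = new HashSet<int>();
        using (var queue = new BlockingCollection<int[]>(boundedCapacity: 100))
        {
            // The single consumer is the only thread that touches the HashSet.
            var consumer = Task.Run(() =>
            {
                foreach (int[] batch in queue.GetConsumingEnumerable())
                {
                    foreach (int result in batch)
                    {
                        results.Add(result);
                    }
                }
            });

            try
            {
                var options = new ParallelOptions { MaxDegreeOfParallelism = producers };
                Parallel.ForEach(data1, options, item1 =>
                {
                    var batch = new int[data2.Length];
                    for (int j = 0; j < data2.Length; j++)
                    {
                        batch[j] = process(item1, data2[j]);
                    }
                    queue.Add(batch); // blocks while the queue is full
                });
            }
            finally
            {
                queue.CompleteAdding(); // lets the consumer loop finish
            }
            consumer.Wait();
        }
        return results;
    }
}
```

The bounded capacity plays the same role as BoundedCapacity in the Dataflow version: producers block when the consumer falls behind.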

Related

C# - Multi-threaded producer-consumer pattern where the consumer also produces work

Assume I have Producer-Consumer pattern where the consumer can also produce additional work. Essentially, imagine a list with 1000 integers:
var LL = new List<int> {1, 2, 3, ....., 1000};
I want to multi-thread sum - so I am taking 2 numbers at a time, summing them and adding the result back to LL. I would do this until there is only 1 entry left in LL when the last outstanding thread returns.
My experimental code looks like this:
var LL = Enumerable.Range(1, 1000).ToList();
Func<int, int, int> sum = (a, b) => { return a + b; };
object o = new object();
int outstandingThreads = 0;
while (LL.Count > 1 || outstandingThreads > 0)
{
    // Note that I set an upper bound of 8 simultaneous threads
    if (LL.Count > 1 && outstandingThreads < 8)
    {
        var l1 = LL[0];
        LL.RemoveAt(0);
        var l2 = LL[0];
        LL.RemoveAt(0);
        Interlocked.Increment(ref outstandingThreads);
        var t = Task.Factory.StartNew(() =>
        {
            var rr = l1 + l2;
            // In practice I would use a ConcurrentBag and not explicitly lock
            lock (o)
            {
                LL.Add(rr);
            }
            Interlocked.Decrement(ref outstandingThreads);
        }, CancellationToken.None,
        TaskCreationOptions.DenyChildAttach,
        TaskScheduler.Default);
    }
}
I'm scratching my head as this is not working. I get a different result almost every time. I must be hitting a race condition that I cannot see. Please note, that processing a List is not my actual test case, just a simplification. If there's a better pattern I could be using, I'm also all ears. Multithreading, as you can see, is not my forte.
You've got a lock around Add, but RemoveAt also modifies the list.
Why is there no lock around that?
A race can happen between Add on a worker thread and RemoveAt (or a read of Count) on the main thread. Both Add and RemoveAt do two things: they modify the stored items and update the count that the List keeps (it stores the count rather than walking the whole list each time, which would be overkill). If the two interleave, the list may not crash, but its state can end up inconsistent, so I think that is the cause.
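A minimal sketch of what "lock around everything" could look like, assuming the same pairwise-sum toy problem. The class and method names are invented for illustration. Every access to the list and to the outstanding-work counter happens under one lock, and Monitor.Wait/Pulse replaces the busy loop:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LockedPairwiseSum
{
    // Hypothetical helper: sums the source by repeatedly combining two entries on worker tasks.
    public static int Sum(IEnumerable<int> source, int maxWorkers = 8)
    {
        if (maxWorkers < 1) throw new ArgumentOutOfRangeException(nameof(maxWorkers));

        var list = source.ToList();
        var gate = new object();
        int outstanding = 0; // only read or written while holding gate

        while (true)
        {
            int a = 0, b = 0;
            lock (gate)
            {
                if (list.Count <= 1 && outstanding == 0) break; // all work is done
                if (list.Count < 2 || outstanding >= maxWorkers)
                {
                    Monitor.Wait(gate); // releases gate until a worker pulses
                    continue;
                }
                // Remove two entries from the end of the list, still under the lock.
                a = list[list.Count - 1]; list.RemoveAt(list.Count - 1);
                b = list[list.Count - 1]; list.RemoveAt(list.Count - 1);
                outstanding++;
            }

            Task.Run(() =>
            {
                int partial = a + b; // the actual work runs outside the lock
                lock (gate)
                {
                    list.Add(partial);
                    outstanding--;
                    Monitor.Pulse(gate); // wake the dispatching loop
                }
            });
        }

        int result;
        lock (gate)
        {
            result = list.Count == 1 ? list[0] : 0; // empty input yields 0
        }
        return result;
    }
}
```

With this shape the loop condition, both removals, the Add and the counter are all observed consistently, which is the property the original code lacks.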

Parallelism control incremental variable

I have the following TPL function:
int arrayIndex = 0;
Dictionary<string, int> customModel = new Dictionary<string, int>();
Task task = Task.Factory.StartNew(() =>
    // process each employee holiday
    Parallel.ForEach<EmployeeHolidaysModel>(holidays,
        new ParallelOptions() {
            MaxDegreeOfParallelism = System.Environment.ProcessorCount
        },
        item => {
            customModel.Add(item.HolidayName, arrayIndex);
            // increment the index
            arrayIndex++;
        })
);
// wait for all Tasks to finish
Task.WaitAll(task);
The problem is that arrayIndex won't have unique values because of the Parallelism.
Is there a way I can control the arrayIndex variable so between parallel tasks the value is unique?
Basically in my customModel I can't have a duplicate arrayIndex value.
Appreciate any help.
Three problems here:
1. You are writing to shared variables (both the int and the dictionary). This is unsafe. You must either synchronize or use thread-safe collections.
2. The amount of work that you're doing per iteration is so small that the overhead of parallelism will be multiple orders of magnitude bigger. This is not a good case for parallelism. Expect major slowdowns.
3. You start a task, then wait for it. What did you mean to accomplish by doing that?
I think you need a basic tutorial about threading. These are very basic issues. You won't be having fun using multi-threading at your current level of knowledge...
You'll need to use Interlocked.Increment(). You should probably also use ConcurrentDictionary to be safe, assuming that's not just sample-code you cooked up for the question.
Similarly, the Task isn't necessary here, since you're just waiting on it to finish filling customModel. Obviously, your scenario may be more complex.
But given the code you posted, I'd do something like:
int arrayIndex = 0;
ConcurrentDictionary<string, int> customModel
    = new ConcurrentDictionary<string, int>();
Parallel.ForEach<EmployeeHolidaysModel>(
    holidays,
    new ParallelOptions() {
        MaxDegreeOfParallelism = System.Environment.ProcessorCount
    },
    item => customModel.TryAdd(
        item.HolidayName,
        // Interlocked.Increment returns the incremented value,
        // so subtract 1 to start at 0 as in the original code
        Interlocked.Increment(ref arrayIndex) - 1
    )
);
NowYouCanDoSomethingWith(customModel);
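Since the work per item is tiny here, a plain sequential loop is also worth considering, as the first answer suggests. A minimal sketch, assuming that EmployeeHolidaysModel exposes a string HolidayName; the stand-in type and the HolidayIndex helper below are made up for illustration:

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the question's model type (assumed shape).
public sealed class EmployeeHolidaysModel
{
    public string HolidayName { get; set; }
}

public static class HolidayIndex
{
    // Hypothetical helper: assigns consecutive indexes without any threading.
    public static Dictionary<string, int> Build(IEnumerable<EmployeeHolidaysModel> holidays)
    {
        var model = new Dictionary<string, int>();
        int index = 0;
        foreach (var holiday in holidays)
        {
            // Skip repeated names so that no index is wasted on a duplicate key.
            if (!model.ContainsKey(holiday.HolidayName))
            {
                model.Add(holiday.HolidayName, index++);
            }
        }
        return model;
    }
}
```

Unlike the parallel version, the indexes here are deterministic: they follow the order of the input.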

Consuming IEnumerable in parallel with a fixed number of Tasks/Threads

I have a source IEnumerable<T> which I would like to process in a parallel way having a fixed number of tasks/threads (close to the number of processors) each of them grabbing the next item from the source and process it until all the elements have been iterated.
Parallel.For is not a candidate as the number of elements is unknown.
Parallel.ForEach is not a candidate because it creates many Tasks even when MaxDegreeOfParallelism is specified: this parameter only limits the number of tasks running concurrently, not the number of tasks created.
Each Task must be notified that the source is traversed until its end so it can run some wrapping up logic.
The elements of the source list cannot be held in memory but must be processed and discarded continuously.
Sounds like a producer/consumer problem with the simplification that the producer can be single-threaded and once the IEnumerable is finished, no more element will be added.
What would a solution for this problem look like using the TPL? Do I have to implement my own shareable thread-safe IEnumerable or does the framework provide something?
EDIT: this is my attempt with Parallel.ForEach and MaxDegreeOfParallelism, which does not prevent the TPL from creating many tasks.
int nbTasks = 0;
Parallel.ForEach(positions, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    () => { return new List<IPositionData>(); },
    (position, loop, list) =>
    {
        Thread.Sleep(1);
        list.Add(position);
        return list;
    },
    list => Interlocked.Add(ref nbTasks, 1));
Trace.WriteLine(string.Format("Tasks: {0}", nbTasks));
Comment: positions is my source IEnumerable<IPositionData>. I've just run this and, for example, nbTasks is 64 (and not the expected 4 on my 4 cores).
You can limit the number of tasks in Parallel.ForEach by using an overload that expects a ParallelOptions object and setting the MaxDegreeOfParallelism property.
You can limit the number of tasks in Parallel.ForEach:
int maxNumberOfTasks = 4;
Parallel.ForEach(collection, new ParallelOptions { MaxDegreeOfParallelism = maxNumberOfTasks },
    i => {
        // Your action here
    });
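For the fixed-number-of-workers requirement in the question, one possible shape is a bounded BlockingCollection<T> drained by exactly N long-running tasks. This is a sketch under assumptions, not something taken from the answers: FixedWorkers and Consume are made-up names, and the per-worker counter merely stands in for the "wrapping up" logic.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class FixedWorkers
{
    // Hypothetical helper: returns how many items each worker processed.
    public static int[] Consume<T>(IEnumerable<T> source, int workerCount, Action<T> process)
    {
        var processed = new int[workerCount];
        // The small bound keeps only a few items in memory at any time.
        using (var queue = new BlockingCollection<T>(boundedCapacity: workerCount * 2))
        {
            Task[] workers = Enumerable.Range(0, workerCount)
                .Select(id => Task.Factory.StartNew(() =>
                {
                    foreach (T item in queue.GetConsumingEnumerable())
                    {
                        process(item);
                        processed[id]++; // each worker writes only its own slot
                    }
                    // The source is exhausted here: per-worker wrap-up logic would go at this point.
                }, TaskCreationOptions.LongRunning))
                .ToArray();

            try
            {
                // Single-threaded producer: walks the IEnumerable exactly once.
                foreach (T item in source)
                {
                    queue.Add(item);
                }
            }
            finally
            {
                queue.CompleteAdding(); // signals the workers that no more items will come
            }
            Task.WaitAll(workers);
        }
        return processed;
    }
}
```

Exactly workerCount tasks are created, and each one leaves its loop only after the source has been fully traversed. Error handling is deliberately left out: if every worker fails, the producer would block on a full queue.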

How to loop a list while using Multi-threading in c#

I have a List to loop over using multiple threads. I take the first item of the List, do some processing, then remove the item.
When the count of the List is no longer greater than 0, I fetch more data from the database.
In a word:
I have a lot of records in my database. I need to publish them to my server. Publishing must be multithreaded, and the number of threads may be 10 or fewer.
For example:
private List<string> list;
void LoadDataFromDatabase()
{
    list = ...; // load data from database...
}
void DoMethod()
{
    while (list.Count > 0)
    {
        var item = list.FirstOrDefault();
        list.RemoveAt(0);
        DoProcess(); // how to use multiple threads (with a custom thread count)?
        if (list.Count <= 0)
        {
            LoadDataFromDatabase();
        }
    }
}
Please help me. I'm a beginner in C#; I have searched a lot of solutions, but found nothing similar.
Also, I need to be able to set the number of threads.
Must your processing of the list be sequential? In other words, is it impossible to process element n + 1 before element n has finished processing? If so, multi-threading is not the right solution.
Otherwise, if the elements can be processed fully independently, you can use m threads, giving each thread Elements.Count / m elements to work on.
Example: printing a list:
// Requires: using System; using System.Collections.Generic; using System.Threading;
// Note: a and thread_elements must be visible to Work (e.g. as fields, or with Work as a local function).
List<int> a = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
int num_threads = 2;
int thread_elements = a.Count / num_threads;
// start the threads
Thread[] threads = new Thread[num_threads];
for (int i = 0; i < num_threads; ++i)
{
    // ParameterizedThreadStart is needed because Work takes the argument passed to Start(i)
    threads[i] = new Thread(new ParameterizedThreadStart(Work));
    threads[i].Start(i);
}
// this works fine if the total number of elements is divisible by num_threads
// but if we have 500 elements, 7 threads, then thread_elements = 500 / 7 = 71
// but 71 * 7 = 497, so that there are 3 elements not processed
// process them here:
int actual = thread_elements * num_threads;
for (int i = actual; i < a.Count; ++i)
    Console.WriteLine(a[i]);
// wait for all threads to finish
for (int i = 0; i < num_threads; ++i)
{
    threads[i].Join();
}
void Work(object arg)
{
    Console.WriteLine("Thread #" + arg + " has begun...");
    // calculate my working range [start, end)
    int id = (int)arg;
    int mystart = id * thread_elements;
    int myend = (id + 1) * thread_elements;
    // start work on my range !!
    for (int i = mystart; i < myend; ++i)
        Console.WriteLine("Thread #" + arg + " Element " + a[i]);
}
Addendum: for your case (uploading to a server), it is the same as the code above. You choose a number of threads, and each thread is assigned a number of elements (which is calculated automatically in the variable thread_elements, so you only need to change num_threads). In the method Work, all you need to do is replace the line Console.WriteLine("Thread #" + arg + " Element " + a[i]); with your uploading code.
One more thing to keep in mind: multi-threading performance depends on your machine's CPU. If your CPU has 4 cores, for example, then CPU-bound work performs best with at most 4 threads, one per core. With 10 threads, for example, it would be slower than with 4, because the threads compete for the CPU cores (unless the threads are mostly idle, waiting for some event such as an upload to complete; in that case 10 threads can run, because they don't take 100% of the CPU).
WARNING: DO NOT modify the list (add, remove, set an element...) while any thread is working on it, and do not assign the same element to two threads. Such things cause a lot of bugs and exceptions!
This is a simple scenario that can be expanded in multiple ways if you add some details to your requirements:
IEnumerable<Data> LoadDataFromDatabase()
{
    return ...
}
void ProcessInParallel()
{
    while (true)
    {
        var data = LoadDataFromDatabase().ToList();
        if (!data.Any()) break;
        // PLINQ's terminal operator is ForAll (ParallelQuery has no ForEach method)
        data.AsParallel().ForAll(ProcessSingleData);
    }
}
void ProcessSingleData(Data d)
{
    // do something with data
}
There are many ways to approach this. You can create threads and partition the list yourself, or you can take advantage of the TPL and use Parallel.ForEach. In the example on the link you can see that an Action is called for each member of the list being iterated over. If this is your first taste of threading, I would also attempt to do it the old-fashioned way.
Here is my opinion ;)
You can avoid multithreading if your "List" is not really huge.
Instead of a List, you can use a Queue (FIFO - First In First Out). Then use only the Dequeue() method to take one element from the Queue, do some work, and take the next one. Something like:
while (queue.Count > 0)
{
    var temp = DoSomeWork(queue.Dequeue());
}
I think this will be better for your purpose.
I will get the first item of the List and do some processing, then remove the item.
Bad.
First, you want a queue, not a list.
Second, you do not process then remove, you remove THEN process.
Why?
So that you keep the locks small. Lock list access (note that you need to synchronize access), remove, THEN unlock immediately and then process. This way you keep the locks short. If you take, process, then remove, you are basically single-threaded, because you have to hold the lock while processing so that the next thread does not take the same item again.
And since you need to synchronize access and want multiple threads, this is about the only way.
Read up on the lock statement for a start (you can later move to something like SpinLock). Do NOT use threads unless you have to; schedule Tasks instead (using the Task API introduced in .NET 4.0), which gives you more flexibility.
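The "remove THEN process" advice could be sketched roughly as follows, assuming a fixed batch of items and a caller-supplied process delegate. QueueWorkers and ProcessAll are made-up names. Each worker holds the lock only long enough to dequeue one item:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class QueueWorkers
{
    // Hypothetical helper: workerCount tasks drain a shared queue.
    public static void ProcessAll<T>(IEnumerable<T> items, int workerCount, Action<T> process)
    {
        var queue = new Queue<T>(items);
        var gate = new object();
        var workers = new Task[workerCount];

        for (int i = 0; i < workerCount; i++)
        {
            workers[i] = Task.Run(() =>
            {
                while (true)
                {
                    T item;
                    lock (gate)
                    {
                        if (queue.Count == 0) return; // nothing left for this worker
                        item = queue.Dequeue();       // remove first, under a short lock
                    }
                    process(item);                    // then process outside the lock
                }
            });
        }
        Task.WaitAll(workers);
    }
}
```

For the database scenario in the question, one could call this once per loaded batch and reload when it returns. A ConcurrentQueue<T> with TryDequeue would remove the explicit lock altogether.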

Should I always use Parallel.Foreach because more threads MUST speed up everything?

Does it make sense to replace every normal foreach with a Parallel.ForEach loop?
When should I start using Parallel.ForEach: only when iterating over 1,000,000 items?
No, it doesn't make sense for every foreach. Some reasons:
- Your code may not actually be parallelizable (for example, if you're using the "results so far" for the next iteration and the order is important).
- If you're aggregating (e.g. summing values) then there are ways of using Parallel.ForEach for this, but you shouldn't just do it blindly.
- If your work will complete very fast anyway, there's no benefit, and it may well slow things down.
Basically nothing in threading should be done blindly. Think about where it actually makes sense to parallelize. Oh, and measure the impact to make sure the benefit is worth the added complexity. (It will be harder for things like debugging.) TPL is great, but it's no free lunch.
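For the aggregation case mentioned above, one way is the Parallel.ForEach overload that takes a thread-local initializer and finalizer. A minimal sketch, assuming an int array as input; ParallelSum is a made-up name. Each task sums into its own subtotal, and only the final merge is synchronized:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSum
{
    // Hypothetical helper: sums values using per-task subtotals.
    public static long Sum(int[] values)
    {
        long total = 0;
        Parallel.ForEach(
            values,
            () => 0L,                                         // localInit: subtotal per task
            (value, loopState, subtotal) => subtotal + value, // body: no shared state touched
            subtotal => { Interlocked.Add(ref total, subtotal); }); // localFinally: merge once per task
        return total;
    }
}
```

Whether this beats a plain loop still has to be measured: for something as cheap as an addition, the sequential version may well win.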
No, you should definitely not do that. The important point here is not really the number of iterations, but the work to be done. If your work is really simple, executing 1000000 delegates in parallel will add a huge overhead and will most likely be slower than a traditional single threaded solution. You can get around this by partitioning the data, so you execute chunks of work instead.
E.g. consider the situation below:
Input = Enumerable.Range(1, Count).ToArray();
Result = new double[Count];
Parallel.ForEach(Input, (value, loopState, index) => { Result[index] = value*Math.PI; });
The operation here is so simple, that the overhead of doing this in parallel will dwarf the gain of using multiple cores. This code runs significantly slower than a regular foreach loop.
By using a partition we can reduce the overhead and actually observe a gain in performance.
Parallel.ForEach(Partitioner.Create(0, Input.Length), range => {
    for (var index = range.Item1; index < range.Item2; index++) {
        Result[index] = Input[index]*Math.PI;
    }
});
The moral of the story here is that parallelism is hard and you should only employ it after looking closely at the situation at hand. Additionally, you should profile the code both before and after adding parallelism.
Remember that regardless of any potential performance gain parallelism always adds complexity to the code, so if the performance is already good enough, there's little reason to add the complexity.
The short answer is no, you should not just use Parallel.ForEach or related constructs on each loop that you can.
Parallel has some overhead, which is not justified in loops with few, fast iterations. Also, break is significantly more complex inside these loops.
Parallel.ForEach is a request to schedule the loop as the task scheduler sees fit, based on number of iterations in the loop, number of CPU cores on the hardware and current load on that hardware. Actual parallel execution is not always guaranteed, and is less likely if there are fewer cores, the number of iterations is low and/or the current load is high.
See also Does Parallel.ForEach limits the number of active threads? and Does Parallel.For use one Task per iteration?
The long answer:
We can classify loops by how they fall on two axes:
Few iterations through to many iterations.
Each iteration is fast through to each iteration is slow.
A third factor is if the tasks vary in duration very much – for instance if you are calculating points on the Mandelbrot set, some points are quick to calculate, some take much longer.
When there are few, fast iterations it's probably not worth using parallelisation in any way, most likely it will end up slower due to the overheads. Even if parallelisation does speed up a particular small, fast loop, it's unlikely to be of interest: the gains will be small and it's not a performance bottleneck in your application so optimise for readability not performance.
Where a loop has very few, slow iterations and you want more control, you may consider using Tasks to handle them, along the lines of:
var tasks = new List<Task>(actions.Length);
foreach (var action in actions)
{
    tasks.Add(Task.Factory.StartNew(action));
}
Task.WaitAll(tasks.ToArray());
Where there are many iterations, Parallel.ForEach is in its element.
The Microsoft documentation states that
When a parallel loop runs, the TPL partitions the data source so that
the loop can operate on multiple parts concurrently. Behind the
scenes, the Task Scheduler partitions the task based on system
resources and workload. When possible, the scheduler redistributes
work among multiple threads and processors if the workload becomes
unbalanced.
This partitioning and dynamic re-scheduling is going to be harder to do effectively as the number of loop iterations decreases, and is more necessary if the iterations vary in duration and in the presence of other tasks running on the same machine.
I ran some code.
The test results below show a machine with nothing else running on it, and no other threads from the .Net Thread Pool in use. This is not typical (in fact in a web-server scenario it is wildly unrealistic). In practice, you may not see any parallelisation with a small number of iterations.
The test code is:
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
namespace ParallelTests
{
    class Program
    {
        private static int Fibonacci(int x)
        {
            if (x <= 1)
            {
                return 1;
            }
            return Fibonacci(x - 1) + Fibonacci(x - 2);
        }
        private static void DummyWork()
        {
            var result = Fibonacci(10);
            // inspect the result so it is not optimised away.
            // We know that the exception is never thrown. The compiler does not.
            if (result > 300)
            {
                throw new Exception("failed to do it");
            }
        }
        private const int TotalWorkItems = 2000000;
        private static void SerialWork(int outerWorkItems)
        {
            int innerLoopLimit = TotalWorkItems / outerWorkItems;
            for (int index1 = 0; index1 < outerWorkItems; index1++)
            {
                InnerLoop(innerLoopLimit);
            }
        }
        private static void InnerLoop(int innerLoopLimit)
        {
            for (int index2 = 0; index2 < innerLoopLimit; index2++)
            {
                DummyWork();
            }
        }
        private static void ParallelWork(int outerWorkItems)
        {
            int innerLoopLimit = TotalWorkItems / outerWorkItems;
            var outerRange = Enumerable.Range(0, outerWorkItems);
            Parallel.ForEach(outerRange, index1 =>
            {
                InnerLoop(innerLoopLimit);
            });
        }
        private static void TimeOperation(string desc, Action operation)
        {
            Stopwatch timer = new Stopwatch();
            timer.Start();
            operation();
            timer.Stop();
            string message = string.Format("{0} took {1:mm}:{1:ss}.{1:ff}", desc, timer.Elapsed);
            Console.WriteLine(message);
        }
        static void Main(string[] args)
        {
            TimeOperation("serial work: 1", () => Program.SerialWork(1));
            TimeOperation("serial work: 2", () => Program.SerialWork(2));
            TimeOperation("serial work: 3", () => Program.SerialWork(3));
            TimeOperation("serial work: 4", () => Program.SerialWork(4));
            TimeOperation("serial work: 8", () => Program.SerialWork(8));
            TimeOperation("serial work: 16", () => Program.SerialWork(16));
            TimeOperation("serial work: 32", () => Program.SerialWork(32));
            TimeOperation("serial work: 1k", () => Program.SerialWork(1000));
            TimeOperation("serial work: 10k", () => Program.SerialWork(10000));
            TimeOperation("serial work: 100k", () => Program.SerialWork(100000));
            TimeOperation("parallel work: 1", () => Program.ParallelWork(1));
            TimeOperation("parallel work: 2", () => Program.ParallelWork(2));
            TimeOperation("parallel work: 3", () => Program.ParallelWork(3));
            TimeOperation("parallel work: 4", () => Program.ParallelWork(4));
            TimeOperation("parallel work: 8", () => Program.ParallelWork(8));
            TimeOperation("parallel work: 16", () => Program.ParallelWork(16));
            TimeOperation("parallel work: 32", () => Program.ParallelWork(32));
            TimeOperation("parallel work: 64", () => Program.ParallelWork(64));
            TimeOperation("parallel work: 1k", () => Program.ParallelWork(1000));
            TimeOperation("parallel work: 10k", () => Program.ParallelWork(10000));
            TimeOperation("parallel work: 100k", () => Program.ParallelWork(100000));
            Console.WriteLine("done");
            Console.ReadLine();
        }
    }
}
The results on a 4-core Windows 7 machine are:
serial work: 1 took 00:02.31
serial work: 2 took 00:02.27
serial work: 3 took 00:02.28
serial work: 4 took 00:02.28
serial work: 8 took 00:02.28
serial work: 16 took 00:02.27
serial work: 32 took 00:02.27
serial work: 1k took 00:02.27
serial work: 10k took 00:02.28
serial work: 100k took 00:02.28
parallel work: 1 took 00:02.33
parallel work: 2 took 00:01.14
parallel work: 3 took 00:00.96
parallel work: 4 took 00:00.78
parallel work: 8 took 00:00.84
parallel work: 16 took 00:00.86
parallel work: 32 took 00:00.82
parallel work: 64 took 00:00.80
parallel work: 1k took 00:00.77
parallel work: 10k took 00:00.78
parallel work: 100k took 00:00.77
done
Running the code compiled for .NET 4 and for .NET 4.5 gives much the same results.
The serial work runs are all the same. It doesn't matter how you slice it, it runs in about 2.28 seconds.
The parallel work with 1 iteration takes slightly longer than no parallelism at all. With 2 items it is shorter, with 3 shorter again, and with 4 or more iterations it settles at about 0.8 seconds.
It is using all cores, but not with 100% efficiency. If the serial work were divided 4 ways with no overhead it would complete in 0.57 seconds (2.28 / 4 = 0.57).
In other scenarios I saw no speed-up at all with 2-3 parallel iterations. You do not have fine-grained control over that with Parallel.ForEach, and the algorithm may decide to "partition" them into just 1 chunk and run it on 1 core if the machine is busy.
There is no lower limit for doing parallel operations. If you have only 2 items to work on but each one will take a while, it might still make sense to use Parallel.ForEach. On the other hand if you have 1000000 items but they don't do very much, the parallel loop might not go any faster than the regular loop.
For example, I wrote a simple program to time nested loops where the outer loop ran both with a for loop and with Parallel.ForEach. I timed it on my 4-CPU (dual-core, hyperthreaded) laptop.
Here's a run with only 2 items to work on, but each takes a while:
2 outer iterations, 100000000 inner iterations:
for loop: 00:00:00.1460441
ForEach : 00:00:00.0842240
Here's a run with millions of items to work on, but they don't do very much:
100000000 outer iterations, 2 inner iterations:
for loop: 00:00:00.0866330
ForEach : 00:00:02.1303315
The only real way to know is to try it.
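The program behind these numbers is not shown, so the following is only a guess at its shape: a small harness that runs the same nested loop with a plain for loop and with Parallel.ForEach on the outer loop. All names here (NestedLoopTiming, WithFor, WithParallel, Time) are invented for the sketch.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class NestedLoopTiming
{
    // Trivial inner work: sums 0 .. innerIterations - 1.
    private static long Inner(int innerIterations)
    {
        long acc = 0;
        for (int j = 0; j < innerIterations; j++)
        {
            acc += j;
        }
        return acc;
    }

    public static long WithFor(int outerIterations, int innerIterations)
    {
        long total = 0;
        for (int i = 0; i < outerIterations; i++)
        {
            total += Inner(innerIterations);
        }
        return total;
    }

    public static long WithParallel(int outerIterations, int innerIterations)
    {
        long total = 0;
        Parallel.ForEach(Enumerable.Range(0, outerIterations), i =>
        {
            // The shared total must be updated atomically.
            Interlocked.Add(ref total, Inner(innerIterations));
        });
        return total;
    }

    // Measures one run of the given work.
    public static TimeSpan Time(Func<long> work)
    {
        var timer = Stopwatch.StartNew();
        work();
        return timer.Elapsed;
    }
}
```

Calling, say, NestedLoopTiming.Time(() => NestedLoopTiming.WithParallel(2, 100000000)) and the WithFor equivalent gives a comparison of the kind shown above; the actual timings will differ by machine and load.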
In general, once you go above a thread per core, each extra thread involved in an operation will make it slower, not faster.
However, if part of each operation will block (the classic example being waiting on disk or network I/O, another being producers and consumers that are out of synch with each other) then more threads than cores can begin to speed things up again, because tasks can be done while other threads are unable to make progress until the I/O operation returns.
For this reason, when single-core machines were the norm, the only real justifications in multi-threading were when either there was blocking of the sort I/O introduces or else to improve responsiveness (slightly slower to perform a task, but much quicker to start responding to user-input again).
Still, these days single-core machines are increasingly rare, so it would appear that you should be able to make everything at least twice as fast with parallel processing.
This will still not be the case if order is important, or if something inherent to the task forces it to have a synchronised bottleneck, or if the number of operations is so small that the increase in speed from parallel processing is outweighed by the overheads involved in setting up that parallel processing. It may or may not be the case if a shared resource requires threads to block on other threads performing the same parallel operation (depending on the degree of lock contention).
Also, if your code is inherently multithreaded to begin with, you can be in a situation where you are essentially competing for resources with yourself (a classic case being ASP.NET code handling simultaneous requests). Here the advantage of parallel operation may mean that a single test operation on a 4-core machine approaches 4 times the performance. But once the number of requests needing the same task reaches 4, each of those 4 requests is trying to use every core, so the result is little better than if each request had one core to itself (perhaps slightly better, perhaps slightly worse). The benefits of parallel operation hence disappear as the use changes from a single-request test to a real-world multitude of requests.
You shouldn't blindly replace every single foreach loop in your application with the parallel foreach. More threads doesn't necessarily mean that your application will work faster. You need to slice the task into smaller tasks which can run in parallel if you really want to benefit from multiple threads. If your algorithm is not parallelizable you won't get any benefit.
No. You need to understand what the code is doing and whether it is amenable to parallelization. Dependencies between your data items can make it hard to parallelize, i.e., if a thread uses the value calculated for the previous element it has to wait until the value is calculated anyway and can't run in parallel. You also need to understand your target architecture, though, you will typically have a multicore CPU on just about anything you buy these days. Even on a single core, you can get some benefits from more threads but only if you have some blocking tasks. You should also keep in mind that there is overhead in creating and organizing the parallel threads. If this overhead is a significant fraction of (or more than) the time your task takes you could slow it down.
These are my benchmarks showing pure serial is slowest, along with various levels of partitioning.
class Program
{
    static void Main(string[] args)
    {
        NativeDllCalls(true, 1, 400000000, 0); // Seconds: 0.67 |) 595,203,995.01 ops
        NativeDllCalls(true, 1, 400000000, 3); // Seconds: 0.91 |) 439,052,826.95 ops
        NativeDllCalls(true, 1, 400000000, 4); // Seconds: 0.80 |) 501,224,491.43 ops
        NativeDllCalls(true, 1, 400000000, 8); // Seconds: 0.63 |) 635,893,653.15 ops
        NativeDllCalls(true, 4, 100000000, 0); // Seconds: 0.35 |) 1,149,359,562.48 ops
        NativeDllCalls(true, 400, 1000000, 0); // Seconds: 0.24 |) 1,673,544,236.17 ops
        NativeDllCalls(true, 10000, 40000, 0); // Seconds: 0.22 |) 1,826,379,772.84 ops
        NativeDllCalls(true, 40000, 10000, 0); // Seconds: 0.21 |) 1,869,052,325.05 ops
        NativeDllCalls(true, 1000000, 400, 0); // Seconds: 0.24 |) 1,652,797,628.57 ops
        NativeDllCalls(true, 100000000, 4, 0); // Seconds: 0.31 |) 1,294,424,654.13 ops
        NativeDllCalls(true, 400000000, 0, 0); // Seconds: 1.10 |) 364,277,890.12 ops
    }
    static void NativeDllCalls(bool useStatic, int nonParallelIterations, int parallelIterations = 0, int maxParallelism = 0)
    {
        if (useStatic) {
            Iterate<string, object>(
                (msg, cntxt) => {
                    ServiceContracts.ForNativeCall.SomeStaticCall(msg);
                }
                , "test", null, nonParallelIterations, parallelIterations, maxParallelism);
        }
        else {
            var instance = new ServiceContracts.ForNativeCall();
            Iterate(
                (msg, cntxt) => {
                    cntxt.SomeCall(msg);
                }
                , "test", instance, nonParallelIterations, parallelIterations, maxParallelism);
        }
    }
    static void Iterate<T, C>(Action<T, C> action, T testMessage, C context, int nonParallelIterations, int parallelIterations = 0, int maxParallelism = 0)
    {
        var start = DateTime.UtcNow;
        if (nonParallelIterations == 0)
            nonParallelIterations = 1; // normalize values
        if (parallelIterations == 0)
            parallelIterations = 1;
        if (parallelIterations > 1) {
            ParallelOptions options;
            if (maxParallelism == 0) // default max parallelism
                options = new ParallelOptions();
            else
                options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };
            if (nonParallelIterations > 1) {
                Parallel.For(0, parallelIterations, options
                    , (j) => {
                        for (int i = 0; i < nonParallelIterations; ++i) {
                            action(testMessage, context);
                        }
                    });
            }
            else { // no nonParallel iterations
                Parallel.For(0, parallelIterations, options
                    , (j) => {
                        action(testMessage, context);
                    });
            }
        }
        else {
            for (int i = 0; i < nonParallelIterations; ++i) {
                action(testMessage, context);
            }
        }
        var end = DateTime.UtcNow;
        Console.WriteLine("\tSeconds: {0,8:0.00} |) {1,16:0,000.00} ops",
            (end - start).TotalSeconds, (Math.Max(parallelIterations, 1) * nonParallelIterations / (end - start).TotalSeconds));
    }
}
