Difference between ThreadPool.QueueUserWorkItem and Parallel.ForEach?

Difference between ThreadPool.QueueUserWorkItem and Parallel.ForEach? - c#

What is the main difference between two of following approaches:
ThreadPool.QueueUserWorkItem
Clients objClient = new Clients();
List<Clients> objClientList = Clients.GetClientList();
foreach (var list in objClientList)
{
ThreadPool.QueueUserWorkItem(new WaitCallback(SendFilesToClient), list);
}
System.Threading.Tasks.Parallel ForEach
Clients objClient = new Clients();
List<Clients> objClientList = Clients.GetClientList();
Parallel.ForEach<Clients>(objClientList, list =>
{
SendFilesToClient(list);
});
I am new to multi-threading and want to know what's going to happen in each case (in terms of execution process) what's the level of multi-threading for each approach? Help me visualize both the processes.
SendFilesToClient: Gets data from database, converts to Excel and sends the Excel file to respective client.
Thanks!

The main difference is functional. Parallel.ForEach will block (by design), so it will not return until all of the objects have been processed. Your foreach queuing threadpool thread work will push the work onto background threads, and not block.
Also, the Parallel.ForEach version will have another major advantages - unhandled exceptions will be pushed back to the call site here, instead of left unhandled on a ThreadPool thread.
In general, Parallel.ForEach will be more efficient. Both options use the ThreadPool, but Parallel.ForEach does intelligent partitioning to prevent overthreading and to reduce the amount of overhead required by the scheduler. Individual tasks (which will map to ThreadPool threads) get reused, and effectively "pooled" to lower overhead, especially if SendFilesToClient is a fast operation (which, in this case, will not be true).
Note that you can also, as a third option, use PLINQ:
objClientList.AsParallel().ForAll(SendFilesToClient);
This will be very similar to the Parallel.ForEach method in terms of performance and functionality.

Related

What does the Parallel.Foreach do behind the scenes?

So I just cant grasp the concept here.
I have a Method that uses the Parallel class with the Foreach method.
But the thing I dont understand is, does it create new threads so it can run the function faster?
Let's take this as an example.
I do a normal foreach loop.
private static void DoSimpleWork()
{
foreach (var item in collection)
{
//DoWork();
}
}
What that will do is, it will take the first item in the list, assign the method DoWork(); to it and wait until it finishes. Simple, plain and works.
Now.. There are three cases I am curious about
If I do this.
Parallel.ForEach(stringList, simpleString =>
{
DoMagic(simpleString);
});
Will that split up the Foreach into let's say 4 chunks?
So what I think is happening is that it takes the first 4 lines in the list, assigns each string to each "thread" (assuming parallel creates 4 virtual threads) does the work and then starts with the next 4 in that list?
If that is wrong please correct me I really want to understand how this works.
And then we have this.
Which essentially is the same but with a new parameter
Parallel.ForEach(stringList, new ParallelOptions() { MaxDegreeOfParallelism = 32 }, simpleString =>
{
DoMagic(simpleString);
});
What I am curious about is this
new ParallelOptions() { MaxDegreeOfParallelism = 32 }
Does that mean it will take the first 32 strings from that list (if there even is that many in the list) and then do the same thing as I was talking about above?
And for the last one.
Task.Factory.StartNew(() =>
{
Parallel.ForEach(stringList, simpleString =>
{
DoMagic(simpleString);
});
});
Would that create a new task, assigning each "chunk" to it's own task?

Do not mix async code with parallel. Task is for async operations - querying a DB, reading file, awaiting some comparatively-computation-cheap operation such that your UI won't be blocked and unresponsive.
Parallel is different. That's designed for 1) multi-core systems and 2) computational-intensive operations. I won't go in details how it works, that kind of info could be found in an MS documentation. Long story short, Parallel.For most probably will make it's own decision on what exactly when and how to run. It might disobey you parameters, i.e. MaxDegreeOfParallelism or somewhat else. The whole idea is to provide the best possible parallezation, thus complete your operation as fast as possible.

Parallel.ForEach perform the equivalent of a C# foreach loop, but with each iteration executing in parallel instead of sequentially. There is no sequencing, it depends on whether the OS can find an available thread, if there is it will execute
MaxDegreeOfParallelism
By default, For and ForEach will utilize as many threads as the OS provides, so changing MaxDegreeOfParallelism from the default only limits how many concurrent tasks will be used by the application.
You do not need to modify this parameter in general but may choose to change it in advanced scenarios:
When you know that a particular algorithm you're using won't scale
beyond a certain number of cores. You can set the property to avoid
wasting cycles on additional cores.
When you're running multiple algorithms concurrently and want to
manually define how much of the system each algorithm can utilize.
When the thread pool's heuristics is unable to determine the right
number of threads to use and could end up injecting too many
threads. e.g. in long-running loop body iterations, the
thread pool might not be able to tell the difference between
reasonable progress or livelock or deadlock, and might not be able
to reclaim threads that were added to improve performance. You can set the property to ensure that you don't use more than a reasonable number of threads.
Task.StartNew is usually used when you require fine-grained control for a long-running, compute-bound task, and like what #Сергей Боголюбов mentioned, do not mix them up
It creates a new task, and that task will create threads asynchronously to run the for loop
You may find this ebook useful: http://www.albahari.com/threading/#_Introduction

does the work and then starts with the next 4 in that list?
This depends on your machine's hardware and how busy the machine's cores are with other processes/apps your CPU is working on
Does that mean it will take the first 32 strings from that list (if there even if that many in the list) and then do the same thing as I was talking about above?
No, there's is no guarantee that it will take first 32, could be less. It will vary each time you execute the same code
Task.Factory.StartNew creates a new tasks but it will not create a new one for each chunk as you expect.
Putting a Parallel.ForEach inside a new Task will not help you further reduce the time taken for the parallel tasks themselves.

How to store results from tasks running in threadpool?

I have a problem with a threadpool efficiency. I'm not sure I understand the whole concept. I did a lot of reading before asking that question and I know that threadpool is a good solution if you have a lot of small, relatively quick functions AND what's more important - non-blocking tasks. Using lock is very bad in threadpool.
And here is my question: How to return values from threadpool functions? If you have functions to run they probably produce some results, right? It's good to store those results somewhere. Where?
I'm running c.a. 200k very quick functions in a threadpool. The results I store in the List. Of course I have to do:
lock(lockobj)
{
myList.Add(result);
}
So, is this the right way? I mean, if your functions returns SOMETHING, you have to store them in some kind of collection. It has to be a blocking collection. So, I started thinking... "Blocking is very bead in threadpool, but you have to do this, at least once - at the end of every function
How to store/return results from functions running in threadpool?
Thanks!
JB
EDIT: By "function" I mean...
ThreadPool.QueueUserWorkItem(state =>
{
Result r = function(); // previously named "Task"
lock(lockobj)
{
allResults.Add(r);
}
}

If you don't want to block the ThreadPool threads use a lock-free approach. ConcurrentQueue is currently lock-free (as of .NET 4.6.2) when you enqueue items.
So simply do this:
public static ConcurrentQueue<Result> AllResults { get; } = new ConcurrentQueue<Result>();
ThreadPool.QueueUserWorkItem(state =>
{
Result r = function();
AllResults.Enqueue(r);
}
This will guarantee you don't block ThreadPool threads.

Any kind of collection that is thread safe/synchronized will do. There are plenty in .net framework.
You can also use volatile variables to store data between multiple threads - but this is usually considered a bad practice.
Another approach can be to schedule those operations on tasks that can produce results, they run by default on the thread pool and you can get the return values by awaiting the methods and checking the Result of the Task that is returned.
Finally you can write your own code in order to synchronize access to certain regions of code/variables etc using stuff like lock, semaphores, mutex etc

Parallelizing multiple long-running tasks with async/await

I have a helper method returns IEnumerable<string>. As the collection grows, it's slowing down dramatically. My current approach is to do essentially the following:
var results = new List<string>();
foreach (var item in items)
{
results.Add(await item.Fetch());
}
I'm not actually sure whether this asynchronicity gives me any benefit (it sure doesn't seem like it), but all methods up the stack and to my controller's actions are asynchronous:
public async Task<IHttpActionResult> FetchAllItems()
As this code is ultimately used by my API, I'd really like to parallelize these all for what I hope would be great speedup. I've tried .AsParallel:
var results = items
.AsParallel()
.Select(i => i.Fetch().Result)
.AsList();
return results;
And .WhenAll (returning a string[]):
var tasks = items.Select(i => i.Fetch());
return Task<string>.WhenAll<string>(tasks).Result;
And a last-ditch effort of firing off all long-running jobs and sequentially awaiting them (hoping that they were all running in parallel, so waiting on one would let all others nearly complete):
var tasks = new LinkedList<Task<string>>();
foreach (var item in items)
tasks.AddLast(item.Fetch());
var results = new LinkedList<string>();
foreach (var task in tasks)
results.AddLast(task.Result);
In every test case, the time it takes to run is directly proportional to the number of items. There's no discernable speedup by doing this. What am I missing in using Tasks and await/async?

There's a difference between parallel and concurrent. Concurrency just means doing more than one thing at a time, whereas parallel means doing more than one thing on multiple threads. async is great for concurrency, but doesn't (directly) help you with parallelism.
As a general rule, parallelism on ASP.NET should be avoided. This is because any parallel work you do (i.e., AsParallel, Parallel.ForEach, etc) shares the same thread pool as ASP.NET, so that reduces ASP.NET's capability to handle other requests. This impacts the scalability of your web service. It's best to leave the thread pool to ASP.NET.
However, concurrency is just fine - specifically, asynchronous concurrency. This is where Task.WhenAll comes in. Code like this is what you should be looking for (note that there is no call to Task<T>.Result):
var tasks = items.Select(i => i.Fetch());
return await Task<string>.WhenAll<string>(tasks);
Given your other code samples, it would be good to run through your call tree starting at Fetch and replace all Result calls with await. This may be (part of) your problem, because Result forces synchronous execution.
Another possible problem is that the underlying resource being fetched does not support concurrent access, or there may be throttling that you're not aware of. E.g., if Fetch retrieves data from another web service, check out System.Net.ServicePointManager.DefaultConnectionLimit.

There is also a configurable limitation on the max connections to a single server that can make download performance independent to the number of client threads.
To change the connection limit use
ServicePointManager.DefaultConnectionLimit
Maximum concurrent requests for WebClient, HttpWebRequest, and HttpClient

Parallel.For vs regular threads

I'm trying to understand why Parallel.For is able to outperform a number of threads in the following scenario: consider a batch of jobs that can be processed in parallel. While processing these jobs, new work may be added, which then needs to be processed as well. The Parallel.For solution would look as follows:
var jobs = new List<Job> { firstJob };
int startIdx = 0, endIdx = jobs.Count;
while (startIdx < endIdx) {
Parallel.For(startIdx, endIdx, i => WorkJob(jobs[i]));
startIdx = endIdx; endIdx = jobs.Count;
}
This means that there are multiple times where the Parallel.For needs to synchronize. Consider a bread-first graph algorithm algorithm; the number of synchronizations would be quite large. Waste of time, no?
Trying the same in the old-fashioned threading approach:
var queue = new ConcurrentQueue<Job> { firstJob };
var threads = new List<Thread>();
var waitHandle = new AutoResetEvent(false);
int numBusy = 0;
for (int i = 0; i < maxThreads; i++)
threads.Add(new Thread(new ThreadStart(delegate {
while (!queue.IsEmpty || numBusy > 0) {
if (queue.IsEmpty)
// numbusy > 0 implies more data may arrive
waitHandle.WaitOne();
Job job;
if (queue.TryDequeue(out job)) {
Interlocked.Increment(ref numBusy);
WorkJob(job); // WorkJob does a waitHandle.Set() when more work was found
Interlocked.Decrement(ref numBusy);
}
}
// others are possibly waiting for us to enable more work which won't happen
waitHandle.Set();
})));
threads.ForEach(t => t.Start());
threads.ForEach(t => t.Join());
The Parallel.For code is of course much cleaner, but what I cannot comprehend, it's even faster as well! Is the task scheduler just that good? The synchronizations were eleminated, there's no busy waiting, yet the threaded approach is consistently slower (for me). What's going on? Can the threading approach be made faster?
Edit: thanks for all the answers, I wish I could pick multiple ones. I chose to go with the one that also shows an actual possible improvement.

The two code samples are not really the same.
The Parallel.ForEach() will use a limited amount of threads and re-use them. The 2nd sample is already starting way behind by having to create a number of threads. That takes time.
And what is the value of maxThreads ? Very critical, in Parallel.ForEach() it is dynamic.
Is the task scheduler just that good?
It is pretty good. And TPL uses work-stealing and other adaptive technologies. You'll have a hard time to do any better.

Parallel.For doesn't actually break up the items into single units of work. It breaks up all the work (early on) based on the number of threads it plans to use and the number of iterations to be executed. Then has each thread synchronously process that batch (possibly using work stealing or saving some extra items to load-balance near the end). By using this approach the worker threads are virtually never waiting on each other, while your threads are constantly waiting on each other due to the heavy synchronization you're using before/after every single iteration.
On top of that since it's using thread pool threads many of the threads it needs are likely already created, which is another advantage in its favor.
As for synchronization, the entire point of a Parallel.For is that all of the iterations can be done in parallel, so there is almost no synchronization that needs to take place (at least in their code).
Then of course there is the issue of the number of threads. The threadpool has a lot of very good algorithms and heuristics to help it determine how many threads are need at that instant in time, based on the current hardware, the load from other applications, etc. It's possible that you're using too many, or not enough threads.
Also, since the number of items that you have isn't known before you start I would suggest using Parallel.ForEach rather than several Parallel.For loops. It is simply designed for the situation that you're in, so it's heuristics will apply better. (It also makes for even cleaner code.)
BlockingCollection<Job> queue = new BlockingCollection<Job>();
//add jobs to queue, possibly in another thread
//call queue.CompleteAdding() when there are no more jobs to run
Parallel.ForEach(queue.GetConsumingEnumerable(),
job => job.DoWork());

Your creating a bunch of new threads and the Parallel.For is using a Threadpool. You'll see better performance if you were utilizing the C# threadpool but there really is no point in doing that.
I would shy away from rolling out your own solution; if there is a corner case where you need customization use the TPL and customize..

Performance of running Parallel.Foreach on several threads

I have 3 main processing threads, each of them performing operations on the values of ConcurrentDictionaries by means of Parallel.Foreach. The dictionaries vary in size from 1,000 elements to 250,000 elements
TaskFactory factory = new TaskFactory();
Task t1 = factory.StartNew(() =>
{
Parallel.ForEach(dict1.Values, item => ProcessItem(item));
});
Task t2 = factory.StartNew(() =>
{
Parallel.ForEach(dict2.Values, item => ProcessItem(item));
});
Task t3 = factory.StartNew(() =>
{
Parallel.ForEach(dict3.Values, item => ProcessItem(item));
});
t1.Wait();
t2.Wait();
t3.Wait();
I compared the performance (total execution time) of this construct with just running the Parallel.Foreach in the main thread and the performance improved a lot (the execution time was reduced approximately 5 times)
My questions are:
Is there something wrong with the
approach above? If yes, what and how
can it be improved?
What is the reason for the different execution times?
What is a good way to debug/analyze such a situation?
EDIT: To further clarify the situation: I am mocking the client calls on a WCF service, that each comes on a separate thread (the reason for the Tasks). I also tried to use ThreadPool.QueueUserWorkItem instead of Task, without a performance improvement. The objects in the dictionary have between 20 and 200 properties (just decimals and strings) and there is no I/O activity
I solved the problem by queuing the processing requests in a BlockingCollection and processing them one at the time

You're probably over-parallelizing.
You don't need to create 3 tasks if you already use a good (and balanced) parallelization inside each one of them.
Parallel.Foreach already try to use the right number of threads to exploit the full CPU potential without saturating it. And by creating other tasks having Parallel.Foreach you're probably saturating it.
(EDIT: as Henk said, they probably have some problems in coordinating the number of threads to spawn when run in parallel, and at least this leads to a bigger overhead).
Have a look here for some hints.

First of all, a Task is not a Thread.
Your Parallel.ForEach() calls are run by a scheduler that uses the ThreadPool and should try to optimize Thread usage. The ForEach applies a Partitioner. When you run these in parallel they cannot coordinate very well.
Only if there is a performance problem, consider helping with extra tasks or DegreeOfParallelism directives. And then always profile and analyze first.
An explanation of your results is difficult, it could be caused by many factors (I/O for example) but the advantage of the 'single main task' is that the scheduler has more control and the CPU and Cache are used better (locality).

The dictionaries vary widely in size and by the looks of it (given everything finishes in <5s) the amount of processing work is small. Without knowing more it's hard to say what's actually going on. How big are your dictionary items? The main thread scenario you're comparing this to looks like this right?
Parallel.ForEach(dict1.Values, item => ProcessItem(item));
Parallel.ForEach(dict2.Values, item => ProcessItem(item));
Parallel.ForEach(dict3.Values, item => ProcessItem(item));
By adding the Tasks around each ForEach your adding more overhead to manage the tasks and probably causing memory contention as dict1, dict2 and dict3 all try and be in memory and hot in cache at the same time. Remember, CPU cycles are cheap, cache misses are not.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.