I have a scenario where I need to run a Parallel.ForEach within a while loop. I need to understand the impact of this implementation in terms of how the processing will take place. I will have an implementation something like this:
ConcurrentQueue<MyTable> queue = new ConcurrentQueue<MyTable>();
Here, I have initially added a lot of items to the queue, but more items can also be added while it is being processed.
while (true)
{
    Parallel.ForEach(queue, myTable => { /* some processing */ });
    Thread.Sleep(someTime);
}
Each time an item is de-queued, a new thread will be spawned to work on it; in the meantime new items will be added, which is why I need to keep an infinite while loop.
Now, since ConcurrentQueue is thread safe, I think each item will be processed only once despite the while around the foreach. What I am not sure about is whether there will be multiple instances of the foreach itself, each spawning child threads, or a single copy of the foreach running within the while loop. I do not know how Parallel.ForEach itself is implemented.
I have a scenario where I need to run a Parallel.ForEach within a while loop.
I don't think you do. You want to process new items in parallel as they come in, but I think this is not the best way to do that.
I think the best way is to use ActionBlock from TPL Dataflow. It won't waste CPU or threads when there are no items to process and if you set its MaxDegreeOfParallelism, it will process items in parallel:
ActionBlock<MyTable> actionBlock = new ActionBlock<MyTable>(
    myTable => /* some processing */,
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
    });
...
actionBlock.Post(someTable);
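For completeness, here is a minimal, self-contained sketch of the ActionBlock approach; MyTable and the processing body are stand-ins for your own types:

using System;
using System.Threading.Tasks.Dataflow;

class MyTable { public int Id; }

class Program
{
    static void Main()
    {
        var actionBlock = new ActionBlock<MyTable>(
            myTable => Console.WriteLine("Processing {0}", myTable.Id),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
            });

        // Producers can post from any thread, at any time.
        for (int i = 0; i < 10; i++)
            actionBlock.Post(new MyTable { Id = i });

        // When no more items will ever arrive, signal completion
        // and wait for the in-flight work to drain.
        actionBlock.Complete();
        actionBlock.Completion.Wait();
    }
}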
If you don't want to or can't use TPL Dataflow (it's .Net 4.5 only), another option would be to use a single Parallel.ForEach() (no while) together with BlockingCollection and GetConsumingPartitioner() (not GetConsumingEnumerable()!).
Using this, the Parallel.ForEach() threads will be blocked when there are no items to process, but there also won't be any delays in processing (like the ones caused by your Sleep()):
BlockingCollection<MyTable> queue = new BlockingCollection<MyTable>();
...
Parallel.ForEach(
    queue.GetConsumingPartitioner(), myTable => /* some processing */);
...
queue.Add(someTable);
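Note that GetConsumingPartitioner() is not part of the BCL; it comes from Microsoft's ParallelExtensionsExtras samples. Here is a sketch of that extension method, close to the published sample:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class BlockingCollectionExtensions
{
    public static Partitioner<T> GetConsumingPartitioner<T>(
        this BlockingCollection<T> collection)
    {
        return new BlockingCollectionPartitioner<T>(collection);
    }

    // A partitioner whose dynamic partitions consume directly from the
    // collection, so Parallel.ForEach doesn't try to buffer the (potentially
    // endless) enumerable before handing out items.
    private class BlockingCollectionPartitioner<T> : Partitioner<T>
    {
        private readonly BlockingCollection<T> _collection;

        internal BlockingCollectionPartitioner(BlockingCollection<T> collection)
        {
            if (collection == null) throw new ArgumentNullException("collection");
            _collection = collection;
        }

        public override bool SupportsDynamicPartitions
        {
            get { return true; }
        }

        public override IList<IEnumerator<T>> GetPartitions(int partitionCount)
        {
            if (partitionCount < 1)
                throw new ArgumentOutOfRangeException("partitionCount");
            var dynamicPartitioner = GetDynamicPartitions();
            return Enumerable.Range(0, partitionCount)
                .Select(_ => dynamicPartitioner.GetEnumerator())
                .ToArray();
        }

        public override IEnumerable<T> GetDynamicPartitions()
        {
            // Each partition blocks on the collection when it is empty.
            return _collection.GetConsumingEnumerable();
        }
    }
}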
I think each item will be processed only once despite the while around the foreach, but I am not sure
That's one more reason to use one of the options above: you don't need to know much about the details of how they work, they just work.
Related
So I just can't grasp the concept here.
I have a method that uses the Parallel class with the ForEach method.
But the thing I don't understand is: does it create new threads so it can run the function faster?
Let's take this as an example.
I do a normal foreach loop.
private static void DoSimpleWork()
{
    foreach (var item in collection)
    {
        //DoWork();
    }
}
What that will do is take the first item in the list, call DoWork() on it, and wait until it finishes. Simple, plain, and it works.
Now... There are three cases I am curious about.
If I do this:
Parallel.ForEach(stringList, simpleString =>
{
    DoMagic(simpleString);
});
Will that split up the ForEach into, let's say, 4 chunks?
So what I think is happening is that it takes the first 4 strings in the list, assigns each string to one "thread" (assuming Parallel creates 4 virtual threads), does the work, and then starts with the next 4 in the list?
If that is wrong please correct me I really want to understand how this works.
And then we have this, which is essentially the same but with an extra parameter:
Parallel.ForEach(stringList, new ParallelOptions() { MaxDegreeOfParallelism = 32 }, simpleString =>
{
    DoMagic(simpleString);
});
What I am curious about is this:
new ParallelOptions() { MaxDegreeOfParallelism = 32 }
Does that mean it will take the first 32 strings from that list (if there are even that many in the list) and then do the same thing as I was talking about above?
And for the last one.
Task.Factory.StartNew(() =>
{
    Parallel.ForEach(stringList, simpleString =>
    {
        DoMagic(simpleString);
    });
});
Would that create a new task, assigning each "chunk" to its own task?
Do not mix async code with parallel code. Task is for async operations - querying a DB, reading a file, awaiting some comparatively computation-cheap operation - so that your UI won't be blocked and unresponsive.
Parallel is different. It is designed for 1) multi-core systems and 2) computation-intensive operations. I won't go into the details of how it works; that kind of info can be found in the MS documentation. Long story short, Parallel.For will most probably make its own decision about what exactly to run, when, and how. It may use fewer threads than your parameters allow (MaxDegreeOfParallelism is an upper bound, not a target). The whole idea is to provide the best possible parallelization, and thus complete your operation as fast as possible.
Parallel.ForEach performs the equivalent of a C# foreach loop, but with each iteration executing in parallel instead of sequentially. There is no fixed sequencing; it depends on whether the scheduler can find an available thread, and if there is one, the iteration will execute.
MaxDegreeOfParallelism
By default, For and ForEach will utilize as many threads as the underlying scheduler provides, so changing MaxDegreeOfParallelism from the default only limits how many concurrent tasks will be used by the application.
You do not need to modify this parameter in general but may choose to change it in advanced scenarios:
- When you know that a particular algorithm you're using won't scale beyond a certain number of cores. You can set the property to avoid wasting cycles on additional cores.
- When you're running multiple algorithms concurrently and want to manually define how much of the system each algorithm can utilize.
- When the thread pool's heuristics are unable to determine the right number of threads to use and could end up injecting too many threads. For example, with long-running loop-body iterations, the thread pool might not be able to tell the difference between reasonable progress and livelock or deadlock, and might not be able to reclaim threads that were added to improve performance. You can set the property to ensure that you don't use more than a reasonable number of threads.
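To make those scenarios concrete, here is a small sketch (reusing stringList and DoMagic from the question; the cap of 4 is an arbitrary example) that limits a CPU-bound loop to at most four concurrent iterations:

var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(stringList, options, simpleString =>
{
    DoMagic(simpleString); // at most 4 iterations run at any one time
});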
Task.Factory.StartNew is usually used when you require fine-grained control for a long-running, compute-bound task, and, as #Сергей Боголюбов mentioned, do not mix them up.
It creates a new task, and that task runs the Parallel.ForEach, which in turn schedules its iterations on ThreadPool threads.
You may find this ebook useful: http://www.albahari.com/threading/#_Introduction
does the work and then starts with the next 4 in that list?
This depends on your machine's hardware and on how busy its cores are with other processes and applications.
Does that mean it will take the first 32 strings from that list (if there are even that many in the list) and then do the same thing as I was talking about above?
No, there is no guarantee that it will take the first 32; it could be fewer. It will vary each time you execute the same code.
Task.Factory.StartNew creates a new task, but it will not create a new one for each chunk as you expect.
Putting a Parallel.ForEach inside a new Task will not help you further reduce the time taken for the parallel tasks themselves.
So my problem is as follows: I have a list of items to process and I'd like to process the items in parallel then commit the processed items.
The Barrier class in C# will allow me to do this - I can run threads in parallel to process the list of items, and when SignalAndWait is called and all participants hit the barrier, I can commit the processed items.
The Task class will also allow me to do this - on the Task.WaitAll call I can wait for all tasks to complete and then commit the processed items. If I understand correctly, each task will run on its own thread, not a bunch of tasks in parallel on the same thread.
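Roughly, the Task-based version I have in mind looks like this (items, ProcessItem and Commit are placeholders; requires System.Linq):

var tasks = items
    .Select(item => Task.Factory.StartNew(() => ProcessItem(item)))
    .ToArray();
Task.WaitAll(tasks); // blocks until every item has been processed
Commit(items);       // commit the processed items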
Is my understanding correct on both usages for the problem?
Is there any advantage of one over the other?
Is there any way a hybrid solution is better (barrier and tasks)?
Is my understanding correct on both usages for the problem?
I think you have a misunderstanding of the Barrier class. The docs say:
A barrier is a user-defined synchronization primitive that enables multiple threads (known as participants) to work concurrently on an algorithm in phases.
A barrier is a synchronization primitive. Comparing it to a unit of work that may be computed in parallel, such as a Task, isn't correct.
A barrier can signal all threads to wait until all others have completed some work, and then check on that work. By itself, it has no parallel computation capabilities and no threading model behind it.
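To illustrate, here is a hedged sketch (assuming System.Threading and System.Threading.Tasks; the phase-work methods are hypothetical placeholders): three participants each complete a phase of work and wait for the others before moving on.

var barrier = new Barrier(participantCount: 3,
    postPhaseAction: b => Console.WriteLine("Phase {0} done", b.CurrentPhaseNumber));

Action work = () =>
{
    DoPhaseOneWork();        // hypothetical work method
    barrier.SignalAndWait(); // wait until all 3 participants get here
    DoPhaseTwoWork();        // hypothetical work method
    barrier.SignalAndWait();
};

Task.WaitAll(
    Task.Factory.StartNew(work),
    Task.Factory.StartNew(work),
    Task.Factory.StartNew(work));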
Is there any advantage of one over the other?
As you can see from the answer to question 1, this comparison is irrelevant: they serve different purposes.
Is there any way a hybrid solution is better (barrier and tasks)?
In your case, I'm not sure it's needed at all. If you simply want to do CPU-bound computation in parallel on a collection of items, you have Parallel.ForEach for exactly that purpose. It will partition the enumerable, process the partitions in parallel, and block until the entire collection has been computed.
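To make that concrete, a minimal sketch (items and ProcessItem are placeholders for your collection and processing step): because Parallel.ForEach blocks until the whole collection has been processed, the commit can simply follow it.

Parallel.ForEach(items, item => ProcessItem(item));
Commit(items); // safe: every item has been processed by this point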
I'm not directly answering your question because I think that working with barriers and tasks is just making your code more complex than it needs to be.
I'd suggest using Microsoft's Reactive Framework for this - NuGet "Rx-Main" - as it just makes the whole problem super simple.
Here's the code:
var query =
    from item in items.ToObservable()
    from processed in Observable.Start(() => processItem(item))
    select new { item, processed };

query
    .ToArray()
    .Subscribe(processedItems =>
    {
        /* commit the processed items */
    });
The query turns the list of items into an observable and then processes each item using Observable.Start(...). This optimally fires off new threads as needed. The .ToArray() takes the sequence of individual results and changes it into a single array of results. The .Subscribe(...) method then allows you to process the results.
The code is much simpler than using tasks or barriers.
What is the main difference between two of following approaches:
ThreadPool.QueueUserWorkItem
Clients objClient = new Clients();
List<Clients> objClientList = Clients.GetClientList();

foreach (var list in objClientList)
{
    ThreadPool.QueueUserWorkItem(new WaitCallback(SendFilesToClient), list);
}
System.Threading.Tasks.Parallel ForEach
Clients objClient = new Clients();
List<Clients> objClientList = Clients.GetClientList();

Parallel.ForEach<Clients>(objClientList, list =>
{
    SendFilesToClient(list);
});
I am new to multi-threading and want to know what's going to happen in each case (in terms of the execution process) and what the level of multi-threading is for each approach. Help me visualize both processes.
SendFilesToClient: Gets data from database, converts to Excel and sends the Excel file to respective client.
Thanks!
The main difference is functional. Parallel.ForEach will block (by design), so it will not return until all of the objects have been processed. Your foreach loop that queues ThreadPool work will push the work onto background threads and will not block.
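If you do need "wait for all" semantics with ThreadPool.QueueUserWorkItem, you have to add the signaling yourself; a CountdownEvent is one way to do it (a sketch, not the only option):

using (var done = new CountdownEvent(objClientList.Count))
{
    foreach (var client in objClientList)
    {
        ThreadPool.QueueUserWorkItem(state =>
        {
            try { SendFilesToClient(state); }
            finally { done.Signal(); } // count down even if the work throws
        }, client);
    }
    done.Wait(); // blocks here, much like Parallel.ForEach does
}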
Also, the Parallel.ForEach version has another major advantage: unhandled exceptions are pushed back to the call site, instead of being left unhandled on a ThreadPool thread.
In general, Parallel.ForEach will be more efficient. Both options use the ThreadPool, but Parallel.ForEach does intelligent partitioning to prevent overthreading and to reduce the amount of overhead required by the scheduler. Individual tasks (which will map to ThreadPool threads) get reused, and effectively "pooled" to lower overhead, especially if SendFilesToClient is a fast operation (which, in this case, will not be true).
Note that you can also, as a third option, use PLINQ:
objClientList.AsParallel().ForAll(SendFilesToClient);
This will be very similar to the Parallel.ForEach method in terms of performance and functionality.
Today I tried to optimize a foreach statement that works on an XDocument.
Before optimization:
foreach (XElement elem in xDoc.Descendants("APSEvent").ToList())
{
    //some operations
}
After optimization:
Parallel.ForEach(xDoc.Descendants("APSEvent").ToList(), elem =>
{
    //same operations
});
I saw that in Parallel.ForEach(...) .NET used ONLY one thread! As a result, the Parallel version took longer than the standard foreach.
Why do you think .NET only used one thread? Is it because of locking of the file?
Thanks
It's by design that Parallel.ForEach may use fewer threads than requested to achieve better performance. According to MSDN [link]:
By default, the Parallel.ForEach and Parallel.For methods can use a variable number of tasks. That's why, for example, the ParallelOptions class has a MaxDegreeOfParallelism property instead of a "MinDegreeOfParallelism" property. The idea is that the system can use fewer threads than requested to process a loop.
The .NET thread pool adapts dynamically to changing workloads by allowing the number of worker threads for parallel tasks to change over time. At run time, the system observes whether increasing the number of threads improves or degrades overall throughput and adjusts the number of worker threads accordingly.
From the problem description, there is nothing that explains why the TPL is not spawning more threads.
There is no evidence in the question that this is even the problem. That can be checked quite easily: you could log the thread id before you enter the loop, and as the first thing you do inside your loop.
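A sketch of that diagnostic, using the code from the question (requires System.Threading for the thread id):

Console.WriteLine("Before loop: thread {0}",
    Thread.CurrentThread.ManagedThreadId);

Parallel.ForEach(xDoc.Descendants("APSEvent").ToList(), elem =>
{
    Console.WriteLine("Inside loop: thread {0}",
        Thread.CurrentThread.ManagedThreadId);
    //same operations
});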
If it is always the same number, the TPL is failing to spawn threads. You should then try different versions of your code and see what change triggers the TPL to serialize everything. One reason could be a small number of elements in your list. The TPL partitions your collection, and if you have only a few items, you might end up with only one batch. This behavior is configurable, by the way.
It could also be that you are inadvertently taking a lock in the loop; then you would see lots of different thread ids, but no speedup. In that case, simplify the code until the problem vanishes.
The parallel way is not always faster than the "old-fashioned way":
http://social.msdn.microsoft.com/Forums/en-US/parallelextensions/thread/c860cf3f-f7a6-46b5-8a07-ca2f413258dd
Use it like this:
int parallelThreads = 10;

Parallel.ForEach(
    xDoc.Descendants("APSEvent").ToList(),
    new ParallelOptions() { MaxDegreeOfParallelism = parallelThreads },
    (myXDoc, loopState, index) =>
    {
        //do whatever you want here
    });
Yes, exactly: Document.Load(...) locks the file, and due to resource contention between threads, the TPL is unable to use the power of multiple threads. Try loading the XML into a Stream first and then use Parallel.For(...) on it.
Do you happen to have a single processor? The TPL may limit the number of threads to one in that case. The same thing may happen if the collection is very small. Try a bigger collection.
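A quick way to check what the TPL sees (one line, nothing beyond the standard library):

Console.WriteLine(Environment.ProcessorCount); // logical processors visible to .NET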
See this answer for more details on how the degree of parallelism is determined.
I have 3 main processing threads, each of them performing operations on the values of ConcurrentDictionaries by means of Parallel.ForEach. The dictionaries vary in size from 1,000 elements to 250,000 elements:
TaskFactory factory = new TaskFactory();

Task t1 = factory.StartNew(() =>
{
    Parallel.ForEach(dict1.Values, item => ProcessItem(item));
});

Task t2 = factory.StartNew(() =>
{
    Parallel.ForEach(dict2.Values, item => ProcessItem(item));
});

Task t3 = factory.StartNew(() =>
{
    Parallel.ForEach(dict3.Values, item => ProcessItem(item));
});

t1.Wait();
t2.Wait();
t3.Wait();
I compared the performance (total execution time) of this construct with just running the Parallel.ForEach calls in the main thread, and the performance improved a lot (the execution time was reduced approximately 5 times).
My questions are:
1. Is there something wrong with the approach above? If yes, what is it and how can it be improved?
2. What is the reason for the different execution times?
3. What is a good way to debug/analyze such a situation?
EDIT: To further clarify the situation: I am mocking the client calls to a WCF service, each of which comes in on a separate thread (the reason for the Tasks). I also tried to use ThreadPool.QueueUserWorkItem instead of Task, without any performance improvement. The objects in the dictionary have between 20 and 200 properties (just decimals and strings) and there is no I/O activity.
I solved the problem by queuing the processing requests in a BlockingCollection and processing them one at a time.
You're probably over-parallelizing.
You don't need to create 3 tasks if you already use good (and balanced) parallelization inside each one of them.
Parallel.ForEach already tries to use the right number of threads to exploit the full CPU potential without saturating it. By creating other tasks that each run a Parallel.ForEach, you're probably saturating it.
(EDIT: as Henk said, they probably have some problems coordinating the number of threads to spawn when run in parallel, and at the least this leads to bigger overhead.)
Have a look here for some hints.
First of all, a Task is not a Thread.
Your Parallel.ForEach() calls are run by a scheduler that uses the ThreadPool and should try to optimize Thread usage. The ForEach applies a Partitioner. When you run these in parallel they cannot coordinate very well.
Only if there is a performance problem should you consider helping with extra tasks or DegreeOfParallelism directives. And then always profile and analyze first.
An explanation of your results is difficult; they could be caused by many factors (I/O, for example), but the advantage of the 'single main task' approach is that the scheduler has more control and the CPU and cache are used better (locality).
The dictionaries vary widely in size and, by the looks of it (given everything finishes in under 5 s), the amount of processing work is small. Without knowing more, it's hard to say what's actually going on. How big are your dictionary items? The main-thread scenario you're comparing this to looks like this, right?
Parallel.ForEach(dict1.Values, item => ProcessItem(item));
Parallel.ForEach(dict2.Values, item => ProcessItem(item));
Parallel.ForEach(dict3.Values, item => ProcessItem(item));
By adding the Tasks around each ForEach you're adding more overhead to manage the tasks and probably causing memory contention as dict1, dict2 and dict3 all try to be in memory and hot in cache at the same time. Remember: CPU cycles are cheap, cache misses are not.