What's a good strategy for processing a queue in parallel?

I'm writing a program which needs to recursively search through a folder structure, and would like to do so in parallel with several threads.
I've written the rather trivial synchronous method already - adding the root directory to the queue initially, then dequeuing a directory, queuing its subdirectories, etc., until the queue is empty. I'll use a ConcurrentQueue<T> for my queue, but have already realized that my loops will stop prematurely. The first thread will dequeue the root directory, and immediately every other thread could see that the queue is empty and exit, leaving the first thread as the only one running. I would like each thread to loop until the queue is empty, then wait until another thread queues some more directories, and keep going. I need some sort of checkpoint in my loop so that none of the threads will exit until every thread has reached the end of the loop, but I'm not sure the best way to do this without deadlocking when there really are no more directories to process.

Use the Task Parallel Library.
Create a Task to process the first folder. In this create a Task to process each subfolder (recursively) and a task for each relevant file. Then wait on all the tasks for this folder.
The TPL runtime will make use of the thread pool, so it avoids creating threads (an expensive operation) for small pieces of work.
Note:
If the work per file is trivial do it inline rather than creating another task (IO performance will be the limiting factor).
This approach will generally work best if blocking operations are avoided, but if IO performance is the limit then this might not matter anyway—start simple and measure.
Before .NET 4 much of this can be done with the thread pool, but you'll need to use events to wait for tasks to complete, and that waiting will tie up thread pool threads.(1)
(1) As I understand it, in the TPL, when waiting on tasks (using a TPL method), TPL will reuse that thread for other tasks until the wait is fulfilled.
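A minimal sketch of the task-per-folder approach, assuming .NET 6 or later (top-level statements). It awaits Task.WhenAll rather than calling Task.WaitAll, so no pool thread sits blocked while its subfolders are processed. ProcessFile is a hypothetical placeholder for the per-file work, and error handling (for example UnauthorizedAccessException on protected folders) is left out.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical per-file work: here it just records the path.
static void ProcessFile(string path, ConcurrentBag<string> results) => results.Add(path);

// One task per folder: process its files inline, start a task per subfolder,
// then wait (asynchronously) for all of those subfolder tasks.
static async Task ProcessFolderAsync(string folder, ConcurrentBag<string> results)
{
    foreach (var file in Directory.EnumerateFiles(folder))
        ProcessFile(file, results);

    Task[] children = Directory.EnumerateDirectories(folder)
        .Select(sub => Task.Run(() => ProcessFolderAsync(sub, results)))
        .ToArray();
    await Task.WhenAll(children);
}

// Demo on a throwaway tree: root/top.txt and root/a/b/deep.txt.
var root = Path.Combine(Path.GetTempPath(), "tpl-demo-" + Guid.NewGuid());
Directory.CreateDirectory(Path.Combine(root, "a", "b"));
File.WriteAllText(Path.Combine(root, "top.txt"), "");
File.WriteAllText(Path.Combine(root, "a", "b", "deep.txt"), "");

var found = new ConcurrentBag<string>();
await ProcessFolderAsync(root, found);
Console.WriteLine(found.Count); // 2
Directory.Delete(root, recursive: true);
```

Because each folder's task only completes after its children do, awaiting the root task is enough to know the whole tree has been processed.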

If you want to stick to the concept of an explicit queue, have a look at the BlockingCollection<T> class. Its GetConsumingEnumerable() method returns an IEnumerable<T> that blocks when the collection has run out of items and continues as soon as new items are available. This means that whenever the collection is empty, the thread is blocked rather than stopping prematurely; the enumeration only ends once CompleteAdding() has been called and the remaining items have been consumed.
However, this is mainly useful for producer-consumer scenarios, and I am not sure your problem falls into that category.
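To tie this back to the original problem: in the sketch below the workers block in GetConsumingEnumerable while the queue is empty, and a shared counter of outstanding directories decides when to call CompleteAdding, which lets every worker exit without a premature stop or a deadlock. It assumes .NET 6+; the pending counter and the Enqueue/Worker helpers are illustrative names, not part of any API.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var queue = new BlockingCollection<string>();
var visited = new ConcurrentBag<string>();
int pending = 0; // directories queued but not yet finished

void Enqueue(string dir)
{
    Interlocked.Increment(ref pending);
    queue.Add(dir);
}

void Worker()
{
    // Blocks while the queue is empty; the loop only ends after CompleteAdding.
    foreach (var dir in queue.GetConsumingEnumerable())
    {
        try
        {
            visited.Add(dir);
            foreach (var sub in Directory.EnumerateDirectories(dir))
                Enqueue(sub); // counted before this directory is marked finished
        }
        finally
        {
            // The last outstanding directory tells every worker to stop.
            if (Interlocked.Decrement(ref pending) == 0)
                queue.CompleteAdding();
        }
    }
}

var root = Path.Combine(Path.GetTempPath(), "bc-demo-" + Guid.NewGuid());
Directory.CreateDirectory(Path.Combine(root, "a", "b"));
Directory.CreateDirectory(Path.Combine(root, "c"));

Enqueue(root);
Task[] workers = Enumerable.Range(0, Environment.ProcessorCount)
    .Select(_ => Task.Run(() => Worker()))
    .ToArray();
Task.WaitAll(workers);
Console.WriteLine(visited.Count); // 4: root, a, a/b and c
Directory.Delete(root, recursive: true);
```

The key ordering is that subdirectories are counted before the current directory is decremented, so the counter can only reach zero when no directory is queued or in progress.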

It would seem that in this case your best bet would be to create one thread to start, then whenever you load subdirectories, take threads from the thread pool to handle them. Allow your threads to exit when they are done, and request new ones from the pool every time you go one step further into the directories. This way there is no deadlock and your system uses threads as it needs them. You could even specify how many threads to start based upon how many folders were found.
Edit: Changed the above to make it clearer that you don't want to explicitly create new threads, but instead want to take advantage of the thread pool to add and remove threads as needed without the overhead.
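A sketch of the thread-pool variant, assuming .NET 6+: each folder is a ThreadPool.QueueUserWorkItem work item that queues its subfolders and then returns. The CountdownEvent used to track outstanding folders is my choice (the answer does not name a completion mechanism); exceptions inside a work item are not handled here and would terminate the process.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

var visited = new ConcurrentBag<string>();
// Starts at 1 for the root folder; +1 per queued subfolder, -1 per finished folder.
var remaining = new CountdownEvent(1);

void Visit(string dir)
{
    try
    {
        visited.Add(dir);
        foreach (var sub in Directory.EnumerateDirectories(dir))
        {
            remaining.AddCount(); // register the child before queuing it
            ThreadPool.QueueUserWorkItem(_ => Visit(sub));
        }
    }
    finally
    {
        remaining.Signal(); // this folder is done; its work item ends here
    }
}

var root = Path.Combine(Path.GetTempPath(), "pool-demo-" + Guid.NewGuid());
Directory.CreateDirectory(Path.Combine(root, "x", "y"));

ThreadPool.QueueUserWorkItem(_ => Visit(root));
remaining.Wait(); // returns once every queued folder has signalled
Console.WriteLine(visited.Count); // 3: root, x and x/y
Directory.Delete(root, recursive: true);
```

No work item ever blocks waiting for its children, so pool threads are returned as soon as each folder has been listed.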

Related

UWP handling many threads

So I am developing a UWP application that has a large number of threads. Previously I would start all the threads with System.Threading.Tasks.Task.Run(), save their thread handles to an array, then Task.WaitAll() for completion and get the results. This currently is taking too much memory though.
I have changed my code to only wait for a smaller number of threads and copy their results out before continuing on to more of the threads. Since the UWP implementation of Task does not implement IDisposable, what is the proper way to signal the framework that I am done with a task so it can be garbage collected? I would like to read out the results of the threads after a certain number of them come in and dispose of the threads' resources to make space for the next threads.
Thanks so much!

Just to point out an issue which might be degrading the performance of your application: you are deliberately blocking the thread until all tasks complete rather than awaiting them. That would make sense if you are not performing asynchronous work inside them, but if you are, you should definitely switch to
Task.WhenAll rather than Task.WaitAll, like this:
List<Task> tasks = new List<Task> { Method1(), Method2(), ... };
await Task.WhenAll(tasks);
This way, you are actually leveraging the asynchrony of your app, and you will not block the current thread until all the tasks are completed, like Task.WaitAll() does.
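For the memory concern in the question, here is a minimal sketch of batching with Task.WhenAll, assuming .NET 6+ and that the work is itself asynchronous; ComputeAsync and the batch size are hypothetical stand-ins. Results are copied out after each batch; once `batch` is reassigned nothing references the previous tasks, so the garbage collector can reclaim them without any Dispose call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical unit of work standing in for whatever each task computes.
static async Task<int> ComputeAsync(int input)
{
    await Task.Delay(10); // simulated asynchronous work
    return input * input;
}

// Starts at most `batchSize` tasks at a time and copies their results out
// before starting the next batch.
static async Task<List<int>> RunInBatchesAsync(IReadOnlyList<int> inputs, int batchSize)
{
    var results = new List<int>(inputs.Count);
    for (int start = 0; start < inputs.Count; start += batchSize)
    {
        Task<int>[] batch = inputs.Skip(start).Take(batchSize)
            .Select(x => ComputeAsync(x))
            .ToArray();
        results.AddRange(await Task.WhenAll(batch)); // await instead of blocking with WaitAll
    }
    return results;
}

List<int> squares = await RunInBatchesAsync(Enumerable.Range(1, 10).ToList(), batchSize: 4);
Console.WriteLine(string.Join(",", squares)); // 1,4,9,16,25,36,49,64,81,100
```

Task.WhenAll returns the results in the same order as the tasks passed in, so the output keeps the input order even though tasks within a batch finish in any order.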
Since you are using the Task.Run() method instead of Task.Factory.StartNew(), the TaskScheduler used is the default one, which uses threads from the thread pool. So the tasks themselves will not run on the UI thread (although Task.WaitAll still blocks whichever thread calls it), but blocking many thread pool threads is also not good.
From the Microsoft documentation, one of the cases where thread pool threads should not be used:
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large number of blocked thread pool threads might prevent tasks from starting.
Edit:
I do not need anything else but I will look into that! Thanks! So is there any way I can get it to run the Tasks like a FIFO with just the APIs available with the default thread pool?
You should take a look into continuations.
A continuation is nothing other than a task which is activated whenever its antecedent task or tasks have completed. If you have specific tasks which you only want to execute after another task has completed, you should look into continuations: they are extremely flexible, and you can create really complex flows of tasks to better suit your needs.
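A small sketch of both forms of continuation, assuming .NET 6+; the names and values are illustrative. Note that ContinueWith without an explicit scheduler runs on TaskScheduler.Current, and chaining continuations one after another is one way to get FIFO-style ordering.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Antecedent: some work that produces a value (hypothetical).
Task<int> first = Task.Run(() => 21);

// Continuation: starts only after `first` has completed and receives it as its argument.
Task<int> second = first.ContinueWith(antecedent => antecedent.Result * 2);

// Continuation over several antecedents: starts once all of them have completed.
Task<int>[] batch = { Task.Run(() => 1), Task.Run(() => 2), Task.Run(() => 3) };
Task<int> sum = Task.Factory.ContinueWhenAll(batch, done => done.Sum(t => t.Result));

Console.WriteLine(await second); // 42
Console.WriteLine(await sum);    // 6
```

Reading `Result` inside a continuation does not block, because the antecedent is guaranteed to have finished by the time the continuation runs.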

Garbage collection in a .NET application always works the same way: when an object is no longer reachable (nothing references it anymore), it becomes eligible for collection.
Why do you think the threads are consuming the memory? It is much more likely that the work done inside the threads is what consumes the memory.

How can I make a thread wait until another thread is waiting (C#)

I have a consumer thread that creates some worker threads. These threads must switch between active and waiting states. When all worker threads are in the waiting states, it means that the current job is done. How can I make the consumer thread wait for all the worker threads to be in the waiting state? I want a behavior very similar to Thread.Join() on all worker threads, however, I want the threads to keep running for the next job. I cannot create new threads because the jobs are in a tight loop and creating new threads is costly.

As far as I am aware, there is no mechanism that does exactly what you wish. (Thread.Join comes close, but it waits for the thread to terminate, and your threads must keep running, so that is not an option.)
From the info you provided, it sounds like you're really building a state machine, just across multiple threads.
I would create a singleton and have it act as the state machine. Threads could signal their status to the singleton.
It sounds like you have an indeterminate number of threads, so you would need to put the status of each in a collection. I would look at the Thread-Safe Collections documentation to find the right fit for how you wish to store your state information.
Hope this helps.

Apologies for the brief answer (may expand later), but you probably want the WaitHandle.WaitAll method, combined with ManualResetEvent objects. You would pass a ManualResetEvent into each worker thread when it is created, signal it when the worker becomes idle, and pass the entire set of handles into the WaitHandle.WaitAll method to wake the observing thread once all of them are idle. You can also use the timeout feature of this method if you want to periodically run some kind of task while waiting, or perform some kind of operation if the task is taking too long.
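A sketch of the ManualResetEvent plus WaitHandle.WaitAll pattern with long-lived threads that are reused for several jobs, assuming a .NET 6+ console app. WaitHandle.WaitAll with several handles is not supported on an STA thread, and it accepts at most 64 handles. The per-worker AutoResetEvent used to hand out each job is an assumption of this sketch; the answer only covers the idle signal.

```csharp
using System;
using System.Linq;
using System.Threading;

const int WorkerCount = 4;
// Set by a worker when it has finished its part of the current job.
ManualResetEvent[] idle = Enumerable.Range(0, WorkerCount).Select(_ => new ManualResetEvent(false)).ToArray();
// Set by the consumer to hand each worker the next job (assumption of this sketch).
AutoResetEvent[] go = Enumerable.Range(0, WorkerCount).Select(_ => new AutoResetEvent(false)).ToArray();
int jobsDone = 0;

for (int i = 0; i < WorkerCount; i++)
{
    int id = i; // copy for the closure
    var worker = new Thread(() =>
    {
        while (true)
        {
            go[id].WaitOne();                    // waiting state
            Interlocked.Increment(ref jobsDone); // stand-in for the real work
            idle[id].Set();                      // report "waiting again"
        }
    }) { IsBackground = true };                  // don't keep the process alive
    worker.Start();
}

// Three jobs, all run on the same four threads.
for (int job = 0; job < 3; job++)
{
    foreach (var e in idle) e.Reset(); // nobody is idle until they finish this job
    foreach (var e in go) e.Set();     // start the job on every worker
    WaitHandle.WaitAll(idle);          // like Join on all workers, but they keep running
}
Console.WriteLine(jobsDone); // 12
```

Resetting the idle events before releasing the workers is what prevents the consumer from seeing a stale "idle" left over from the previous job.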
Note that if your worker threads are intended to terminate when the operation is complete (wasn't totally clear if this is the case), it might be more appropriate to spawn them as tasks and use Task.WaitAll instead.
Edit: On a quick re-read, it sounds like you do want to be using tasks rather than trying to re-use full worker threads. Tasks use threads which have been allocated from the thread pool, eliminating that thread creation overhead you were worried about, because the threads will (generally) be ready and waiting for work. You can simply spawn each task and wait for them all to be finished.

The "bag of tasks" concept in C#: enqueue, pause, cancel logical tasks

The app I'm developing is composed this way:
A producer task scans the file system for text files and puts a reference to them in a bag.
Many consumer tasks take file refs from the bag concurrently and read the files (and do some short work with their content)
I must be able to pause and resume the whole process.
I've tried using TPL, creating a task for every file ref as it is put in the bag (in this case the bag is just a concept: the producer directly creates the consumer tasks as it finds files), but this way I don't have control over the tasks I create; I can't (or I don't know how to) pause them. I could write some code to suspend the thread currently executing the task, but that would ruin the point of working with logical tasks instead of manually creating threads, wouldn't it? I would want something like "tasks already assigned to a physical thread can complete, but waiting logical tasks should not start until the resume command".
How can I achieve this? Can it be done with TPL or should I use something else?
EDIT:
Your answers are all valid, but my main doubt remains unanswered. We are talking about tasks: if I use TPL, my producer and my many consumers will be tasks (right?), not threads (well, OK, at execution time tasks will be mapped onto threads). Every synchronization mechanism I've found (like the "ManualResetEventSlim" proposed in the comments) works at thread level.
E.g. the description of the Wait() method of "ManualResetEventSlim" is "Blocks the current thread until the current ManualResetEventSlim is set."
My knowledge of tasks is purely academic; I don't know how things work in the "real world", but it seems logical to me that I need a way to coordinate (wait/signal/...) tasks at task level, or things could get weird. For example, two tasks may be mapped onto the same thread, one of them supposed to signal the other, which is waiting, and then deadlock. I'm a bit confused. This is why I asked if my app could use TPL instead of old-style simple threads.

Yes, you can do that. First, you have a main thread, your application. There you have two workers, represented by threads. The first worker would be a producer and the second worker would be a consumer.
When your application starts, you start the workers. Both of them operate on the concurrent collection, the bag. The producer searches for files and puts references into the bag, and the consumer takes references from the bag and starts a task per reference.
When you want to signal pause, simply pause the producer. If you do that, the consumer also stops working once there is nothing left in the bag. If this is not the desired behaviour, you can define that pausing the producer also clears the bag: back up your bag first and then clear it. This way all running tasks will finish their job and the consumer will not start new tasks, but it can still run and wait for the results.
EDIT:
Based on your edit: I don't know how to achieve it the way you want, but although it is nice to try new technologies, don't let your mind be clouded. Using a ThreadPool is also a nice thing. It will take more time to start the application, but once it is running, consuming will be faster, because you already have workers ready.
It is not a bad idea; you can specify a maximum number of workers. If you create a task for every item in the bag, it will be more memory-consuming because you will keep allocating and releasing memory. This will not happen with ThreadPool.

Sure, you can use TPL for this, and maybe also Reactive Extensions and LINQ to simplify grouping and pausing/resuming the work.
If you have just a short job on each file, it is a pretty good idea not to disturb the handler function with cancellations. You can just suspend queueing the workers instead.
I imagine something like this:
Your directory scanner thread puts the found files into an observable collection.
The consumer thread subscribes to the collection changes, gets/removes the files and assigns them to workers.
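A minimal sketch of "suspend handing out work", assuming .NET 6+ and swapping in BlockingCollection<T> and ManualResetEventSlim for the observable collection suggested above. Each consumer checks the gate after taking an item and before starting it, so work already running finishes while new items wait for resume. ManualResetEventSlim.Wait blocks the thread the consumer task runs on; a task that blocks keeps its thread, so no other task is scheduled onto that thread in the meantime, and the cost is one parked pool thread per paused consumer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var bag = new BlockingCollection<string>(new ConcurrentBag<string>());
var gate = new ManualResetEventSlim(initialState: true); // set = running, reset = paused
var processed = new ConcurrentBag<string>();

void Consume()
{
    foreach (var fileRef in bag.GetConsumingEnumerable())
    {
        gate.Wait();            // a taken item does not start while paused
        processed.Add(fileRef); // stand-in for reading the file and doing the short work
    }
}

Task[] consumers = Enumerable.Range(0, 3).Select(_ => Task.Run(() => Consume())).ToArray();

gate.Reset();                                    // pause
foreach (var f in new[] { "a.txt", "b.txt", "c.txt" })
    bag.Add(f);                                  // the producer may keep scanning
Thread.Sleep(100);
Console.WriteLine(processed.Count);              // 0: nothing starts while paused
gate.Set();                                      // resume
bag.CompleteAdding();                            // producer has finished scanning
Task.WaitAll(consumers);
Console.WriteLine(processed.Count);              // 3
```

Pausing the producer as well (with a second gate) is an easy extension if the bag should not keep growing while paused.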

Multithreading: which would be the best to use? (ThreadPool or threads)

Hopefully this is a better question than my previous. I have a .exe which I will be passing different parameters (file paths) to which it will then take in and parse. So I will have a loop going, looping through the file paths in a list and passing them to this .exe file.
For this to be more efficient, I want to spread the execution across multiple cores which I think you do through threading.
My question is, should I use the threadpool, or multiple threads to run this .exe asynchronously?
Also, depending on which one of those you guys think is best, could you point me to a tutorial that covers what I want to do? Thank you!
EDIT:
I need to limit the number of executions of the .exe to ONE execution PER CORE. This is the most efficient because if I am parsing 100,000 files I can't just fire up 100000 processes. So I am using threads to limit the number of executions at one time to one execution per core. If there is another way (other than threads) to find out if a processor isn't tied up in execution, or if the .exe has finished please explain.
But if there isn't another way, my FINAL question is how would I use a thread to call a parse method and then call back when that thread is no longer in use?
SECOND UPDATE (VERY IMPORTANT):
I went through what everyone told me, and found out a key element that I left out that I thought didn't matter. So I am using a GUI and I don't want it to be locked up. THAT is why I wanted to use threads. My main question now is, how do I send back information from a thread so I know when the execution is over?

As I said in my answer to your previous question, I think you don't understand the difference between processes and threads. Processes are incredibly "heavy" (*); each process can contain many threads. If you are spawning new processes from a parent process, that parent process doesn't need to create new threads; each process will have its own collection of threads.
Only create threads in the parent process if all the work is being done in the same process.
Think of a thread as a worker, and a process as a building containing one or more workers.
One strategy is "build a single building and populate it with ten workers who each do some amount of work". You get the expense of building one process and ten threads.
If your strategy is "build a building. Then have the one worker in that building order the construction of a thousand more buildings, each of which contains a worker that does their bidding", then you get the expense of building 1001 buildings and hiring 1001 workers.
The strategy you do not want to pursue is "build a building. Hire 1000 workers in that building. Then instruct each worker to build a building, which then has one worker to go do the real work." There is no point in making a thread whose sole job is creating a process that then creates a thread! You have 1001 buildings and 2001 workers, half of whom are immediately idle but still have to be paid.
Looking at your specific problem: the key question is "where is the bottleneck?" Spawning off new processes or new threads only helps when the performance problem is that the perf is gated on the processor. If the performance of your parser is gated not on how fast you can parse the file but rather on how fast you can get it off disk, then parallelizing it is going to make things far, far worse. You'll have a huge amount of system resources devoted to all hammering on the same disk controller at the same time, and the disk controller will get slower as more load piles up on it.
UPDATE:
I need to limit the number of executions of the .exe to ONE execution PER CORE. This is the most efficient because if I am parsing 100,000 files I can't just fire up 100000 processes. So I am using threads to limit the number of executions at one time to one execution per core. If there is another way (other than threads) to find out if a processor isn't tied up in execution, or if the .exe has finished please explain
This seems like an awfully complicated way to go about it. Suppose you have n processors. Your proposed strategy, as I understand it, is to fire up n threads, then have each thread fire up one process, and you know that since the operating system will probably schedule one thread per CPU that somehow the processor will magically also schedule the new thread in each new process on a different CPU?
That seems like a tortuous chain of reasoning that depends on implementation details of the operating system. This is craziness. If you want to set the processor affinity of a particular process, just set the processor affinity on the process! Don't be doing this crazy thing with threads and hope that it works out.
I say that if you want to have no more than n instances of an executable running, one per processor, don't mess around with threads at all. Rather, just have one thread sit in a loop, constantly monitoring what processes are running. If there are fewer than n copies of the executable running, spawn another and set its processor affinity to be the CPU you like best. If there are n or more copies of the executable running, go to sleep for a second (or a minute, or whatever makes sense), and when you wake up, check again. Keep doing that until you're done. That seems like a much easier approach.
(*) Threads are also heavy, but they are lighter than processes.

Off the top of my head, I would push your file paths into a thread-safe queue and then fire up a number of threads (say one per core). Each thread would repeatedly pop one item from the queue and process it accordingly. The work is done when the queue is empty.
Implementation suggestions (to answer some of the questions in comments):
Queue:
In C# you could have a look at the Queue Class and the Queue.Synchronized Method for the implementation of the queue:
"Public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.
To guarantee the thread safety of the Queue, all operations must be done through the wrapper returned by the Synchronized method.
Enumerating through a collection is intrinsically not a thread-safe procedure. Even when a collection is synchronized, other threads can still modify the collection, which causes the enumerator to throw an exception. To guarantee thread safety during enumeration, you can either lock the collection during the entire enumeration or catch the exceptions resulting from changes made by other threads."
Threading:
For the threading part, I suppose any of the examples in the MSDN threading tutorial would do (the tutorial is a bit old, but should still be valid). You should not need to worry about synchronizing the threads, as they can work independently of each other. The queue above is the only common resource they need to access (hence the importance of the queue's thread safety).
Start the external process (.exe):
The following code is borrowed (and tweaked) from How to wait for a shelled application to finish by using Visual C#. You need to edit for your own needs, but as a starter:
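//How to Wait for a Shelled Process to Finish
using System.Diagnostics;

//Create a new process info structure.
ProcessStartInfo pInfo = new ProcessStartInfo();
//Set the file name member of the process info structure.
//(Verbatim string, so the backslash is not treated as an escape sequence.)
pInfo.FileName = @"mypath\myfile.exe";
//Pass the file to parse (filePath is a placeholder for one entry of your list).
pInfo.Arguments = "\"" + filePath + "\"";
//Start the process.
Process p = Process.Start(pInfo);
//Wait for the process to end.
p.WaitForExit();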
Pseudo code:
Main thread:
    Create thread-safe queue
    Populate the queue with all the file paths
    Create child threads and wait for them to finish
Child threads:
    While queue is not empty    << this section is critical: no more than one
        pop file from queue     << thread may check and pop at a time
        start external exe
        wait for it....
        end external exe
    end while
    Child thread exits
Main thread waits for all child threads to finish
Program finishes.
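The pseudocode above translates fairly directly to C#. This sketch assumes .NET 6+ and uses ConcurrentQueue<T> instead of Queue.Synchronized, because TryDequeue performs the "check and pop" step atomically. RunParser and the exe path are hypothetical; the demo passes a stand-in so it runs without any external program.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading;

// Runs `work` once per path on `threadCount` dedicated threads sharing one queue.
static void ProcessAll(string[] paths, int threadCount, Action<string> work)
{
    var queue = new ConcurrentQueue<string>(paths); // populated before any thread starts
    var threads = Enumerable.Range(0, threadCount)
        .Select(_ => new Thread(() =>
        {
            // TryDequeue checks and pops in one atomic step; false means the queue is empty.
            while (queue.TryDequeue(out var path))
                work(path);
        }))
        .ToList();
    threads.ForEach(t => t.Start());
    threads.ForEach(t => t.Join()); // the main thread waits for every child thread
}

// Hypothetical parser call: runs the external exe on one file and waits for it to exit.
static void RunParser(string path)
{
    var info = new ProcessStartInfo(@"mypath\myfile.exe", "\"" + path + "\"") { UseShellExecute = false };
    using var p = Process.Start(info);
    p.WaitForExit();
}

// Real use: ProcessAll(filePaths, Environment.ProcessorCount, RunParser);
// Demo with a stand-in for RunParser so it runs without the exe:
var seen = new ConcurrentBag<string>();
string[] files = Enumerable.Range(1, 20).Select(i => $"file{i}.txt").ToArray();
ProcessAll(files, Environment.ProcessorCount, path => seen.Add(path));
Console.WriteLine(seen.Count); // 20
```

With one thread per core and each thread waiting on its own process, at most one instance of the exe runs per core at any time.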

See this question for how to find out the number of cores.
Then use Parallel.ForEach with ParallelOptions with MaxDegreeOfParallelism set to the number of cores.
Parallel.ForEach(args, new ParallelOptions() { MaxDegreeOfParallelism = Environment.ProcessorCount }, (element) => Console.WriteLine(element));

If you're targeting .NET Framework 4, Parallel.For and Parallel.ForEach are extremely helpful. If those don't meet your requirements, I've found Task.Factory to be useful and straightforward to use as well.
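For the GUI update in the question, a sketch that combines Parallel.ForEach with Task.Run, assuming .NET 6+. The returned task completes when every file is done, so an async event handler can simply await it, and in WinForms/WPF the code after the await runs back on the UI thread. ParseAllAsync and the commented handler lines are hypothetical.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Runs the whole batch on pool threads, at most one file per core at a time.
// The returned task completes when every file has been handled.
static Task<int> ParseAllAsync(string[] paths, Action<string> parseOne) =>
    Task.Run(() =>
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
        int done = 0;
        Parallel.ForEach(paths, options, path =>
        {
            parseOne(path);                  // e.g. start the .exe and wait for it
            Interlocked.Increment(ref done);
        });
        return done;
    });

// In a WinForms/WPF async event handler this would look like (hypothetical names):
//   int count = await ParseAllAsync(paths, RunParser);
//   statusLabel.Text = $"Parsed {count} files";   // back on the UI thread after await
int count = await ParseAllAsync(Enumerable.Range(0, 50).Select(i => $"file{i}.txt").ToArray(), _ => { });
Console.WriteLine(count); // 50
```

Awaiting the task replaces any need for the worker threads to "call back": completion, results and exceptions all flow through the awaited task.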

To answer your revised question, you want processes. You just need to create the correct number of processes running the exe. Don't worry about forcing them onto specific cores. Windows will do that automatically.
How to do this:
You want to determine the number of cores on the machine. You may simply know it, and hardcode it, or you might want to use something like System.Environment.ProcessorCount.
Create a List<Process> object.
Then you want to start that many processes using System.Diagnostics.Process.Start. The return value will be a process object, which you will want to add to the List.
Now repeat the following until you are finished:
Call Thread.Sleep to wait for a while. Perhaps a minute or so.
Loop through each Process in the list, but be sure to use a for loop rather than a foreach loop, since you will be replacing entries while looping. For each process, call Refresh(), then check its HasExited property; if it is true, create a new process using Process.Start and replace the exited process in the list with the newly created one.
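The polling loop described above might look like the following sketch, assuming .NET 6+. It removes finished entries and then tops the list back up, which is equivalent to replacing them in place; exePath and the quoting of the file argument are assumptions, and the demo call uses an empty list so it does not launch anything.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Keeps at most `maxRunning` copies of `exePath` alive, one per input file,
// checking every `poll` interval. Returns how many processes were started.
static int RunWithProcessLimit(IReadOnlyList<string> files, string exePath, int maxRunning, TimeSpan poll)
{
    var running = new List<Process>();
    int next = 0;
    while (next < files.Count || running.Count > 0)
    {
        // Walk backwards with a for loop so removing entries is safe.
        for (int i = running.Count - 1; i >= 0; i--)
        {
            running[i].Refresh();
            if (running[i].HasExited)
            {
                running[i].Dispose();
                running.RemoveAt(i);
            }
        }
        // Start new processes until the limit is reached again.
        while (running.Count < maxRunning && next < files.Count)
        {
            var info = new ProcessStartInfo(exePath, "\"" + files[next++] + "\"") { UseShellExecute = false };
            running.Add(Process.Start(info));
        }
        if (running.Count > 0) Thread.Sleep(poll);
    }
    return next;
}

// Real use (hypothetical path):
// RunWithProcessLimit(paths, @"mypath\myfile.exe", Environment.ProcessorCount, TimeSpan.FromSeconds(1));
Console.WriteLine(RunWithProcessLimit(Array.Empty<string>(), "unused.exe", 4, TimeSpan.FromMilliseconds(10))); // 0
```

A single thread runs this loop, so no extra threads are needed in the launching program; the processes themselves do the parallel work.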

If you're launching a .exe, then you have no choice. You will be running this asynchronously in a separate process. For the program which does the launching, I would recommend that you use a single thread and keep a list of the processes you launched.

Each exe launched will occur in its own process. You don't need to use a threadpool or multiple threads; the OS manages the processes (and since they're processes and not threads, they're very independent; completely separate memory space, etc.).

Resource usage of ThreadPool RegisterWaitForSingleObject

I am writing a server application which processes requests from multiple clients. For the processing of requests I am using the thread pool.
Some of these requests modify a database record, and I want to restrict the access to that specific record to one threadpool thread at a time. For this I am using named semaphores (other processes are also accessing these records).
For each new request that wants to modify a record, the thread should wait in line for its turn.
And this is where the question comes in:
As I don't want the threadpool to fill up with threads waiting for access to a record, I found the RegisterWaitForSingleObject method in the threadpool.
But when I read the documentation (MSDN) under the section Remarks:
New wait threads are created automatically when required. ...
Does this mean that the threadpool will fill up with wait-threads? And how does this affect the performance of the threadpool?
Any other suggestions to boost performance is more than welcome!
Thanks!

Your solution is a viable option. In the absence of more specific details I do not think I can offer other tangible options. However, let me try to illustrate why I think your current solution is, at the very least, based on sound theory.
Let's say you have 64 requests that came in simultaneously. It is reasonable to assume that the thread pool could dispatch each one of those requests to a thread immediately. So you might have 64 threads that immediately begin processing. Now let's assume that the mutex has already been acquired by another thread and is held for a really long time. That means those 64 threads will be blocked for a long time waiting for the thread that currently owns the mutex to release it, so they are wasted doing nothing.
On the other hand, if you choose to use RegisterWaitForSingleObject as opposed to a blocking call to wait for the mutex to be released, then you can immediately release those 64 waiting threads (work items) and allow them to be put back into the pool. If I were to implement my own version of RegisterWaitForSingleObject, I would use the WaitHandle.WaitAny method, which allows me to specify up to 64 handles (I did not randomly choose 64 for the number of requests, after all) in a single blocking method call. I am not saying it would be easy, but I could replace my 64 waiting threads with only a single thread from the pool. I do not know how Microsoft implemented the RegisterWaitForSingleObject method, but I am guessing they did it in a manner that is at least as efficient as my strategy. To put it another way, you should be able to reduce the number of pending work items in the thread pool by at least a factor of 64 by using RegisterWaitForSingleObject.
So you see, your solution is based on sound theory. I am not saying that your solution is optimal, but I do believe your concern is unwarranted in regards to the specific question asked.
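A sketch of the RegisterWaitForSingleObject approach, assuming .NET 6+. It uses an unnamed Semaphore as a stand-in for the named one (named semaphores are Windows-only in .NET Core); the state string and the update inside the callback are hypothetical. When a wait on a semaphore is satisfied, the count has been taken, so the callback must call Release when it is done.

```csharp
using System;
using System.Threading;

// Unnamed stand-in for the per-record named semaphore; starts "taken" (count 0).
var recordLock = new Semaphore(initialCount: 0, maximumCount: 1);
var finished = new ManualResetEvent(false);
string handled = "";

// Rather than blocking a pool thread in recordLock.WaitOne(), register a callback.
// The pool's wait thread watches the handle and queues the callback once the wait
// is satisfied (the semaphore count has then been taken for us) or the timeout elapses.
RegisteredWaitHandle registration = ThreadPool.RegisterWaitForSingleObject(
    recordLock,
    (state, timedOut) =>
    {
        if (timedOut) { finished.Set(); return; } // gave up waiting
        try { handled = (string)state; }            // hypothetical database update
        finally
        {
            recordLock.Release();                  // let the next request in
            finished.Set();
        }
    },
    "record 42",              // hypothetical state object
    TimeSpan.FromSeconds(5),  // timeout
    executeOnlyOnce: true);

recordLock.Release();         // whoever held the record releases it
finished.WaitOne();
registration.Unregister(null);
Console.WriteLine(handled);   // record 42
```

While the record is held, no pool worker thread is tied up; only the pool's shared wait thread is watching the handle.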

IMHO you should let the database do its own synchronization. All you need to do is to ensure that you're sync'ed within your process.
Interlocked class might be a premature optimization that is too complex to implement. I would recommend using higher-level sync objects, such as ReaderWriterLockSlim. Or better yet, a Monitor.

An approach to this problem that I've used before is to have the first thread that gets one of these work items be responsible for any other ones that arrive while it's processing the work item(s). This is done by queueing the work items, then dropping into a critical section to process the queue. Only the 'first' thread will drop into the critical section. If a thread can't get the critical section, it leaves and lets the thread already operating in the critical section handle the queued object.
It's really not very complicated - the only thing that might not be obvious is that when leaving the critical section, the processing thread has to do it in a way that doesn't potentially leave a late-arriving workitem on the queue. Basically, the 'processing' critical section lock has to be released while holding the queue lock. If not for this one requirement, a synchronized queue would be sufficient, and the code would really be simple!
In C# (WorkItem and doDatabaseModification are placeholders for your own request type and update method):
using System.Collections;
using System.Collections.Generic;
using System.Threading;

// (members of your server class)
// `workitem` is an object that contains the database modification request
//
// `queue` is a Queue<T> that can hold these workitem requests
// (Queue<T> only exposes SyncRoot through ICollection, hence the cast)
//
// `processing_lock` is an object used to provide a lock
// to indicate a thread is processing the queue
static readonly Queue<WorkItem> queue = new Queue<WorkItem>();
static readonly object queue_lock = ((ICollection)queue).SyncRoot;
static readonly object processing_lock = new object();

// Any number of threads can call this function, but only one
// will end up processing all the workitems.
//
// The other threads will simply drop the workitem in the queue
// and leave.
static void threadpoolHandleDatabaseUpdateRequest(WorkItem workitem)
{
    // put the workitem on a queue
    lock (queue_lock)
    {
        queue.Enqueue(workitem);
    }
    bool doProcessing = false; // must be false before calling TryEnter
    Monitor.TryEnter(processing_lock, ref doProcessing);
    if (!doProcessing) {
        // another thread has the processing lock, it'll
        // handle the workitem
        return;
    }
    for (;;) {
        Monitor.Enter(queue_lock);
        if (queue.Count == 0) {
            // done processing the queue
            // release locks in an order that ensures
            // a workitem won't get stranded on the queue
            Monitor.Exit(processing_lock);
            Monitor.Exit(queue_lock);
            break;
        }
        workitem = queue.Dequeue();
        Monitor.Exit(queue_lock);
        // this will get the database mutex, do the update and release
        // the database mutex
        doDatabaseModification(workitem);
    }
}

ThreadPool creates one wait thread per roughly 64 registered waitable objects, not one per wait.
Good comments are here: Thread.sleep vs Monitor.Wait vs RegisteredWaitHandle?
