Need a suggestion for the best approach to multi-threading in C# 3.0 (no Parallel or Task)
The situation is, I have a queue with 500 items. At any given time I can run only 10 threads (max). Below is my code.
while (queue.Count > 0)
{
    Thread[] threads = new Thread[no_of_threads];
    for (int j = 0; j < no_of_threads; j++)
    {
        // StartProcessing dequeues one item each time for a single thread
        threads[j] = new Thread(StartProcessing);
        threads[j].Start();
    }
    foreach (Thread objThread in threads)
    {
        objThread.Join();
    }
}
The problem with this approach is that if, for instance, no_of_threads = 10 and 9 of those threads are done with their processing while 1 thread is still working, I cannot exit the loop and hand work to the free threads until all 10 threads are done.
I need all 10 threads to keep working as long as the queue count is > 0.
This is easily done with a Semaphore.
The idea is to create a semaphore with a maximum count of N, where N is the number of threads you allow. The loop waits on the semaphore and queues tasks as it acquires the semaphore.
Semaphore ThreadsAvailable = new Semaphore(10, 10);
while (Queue.Count > 0)
{
    ThreadsAvailable.WaitOne();
    // Must dequeue item here, otherwise you could run off the end of the queue
    ThreadPool.QueueUserWorkItem(DoStuff, Queue.Dequeue());
}

// Wait for remaining threads to finish
int threadCount = 10;
while (threadCount != 0)
{
    ThreadsAvailable.WaitOne();
    --threadCount;
}
void DoStuff(object item)
{
    ItemType theItem = (ItemType)item;
    // process the item
    StartProcessing(theItem);
    // And then release the semaphore so another thread can run
    ThreadsAvailable.Release();
}
The item is dequeued in the main loop because that avoids a race condition that otherwise is rather messy to handle. If you let the thread dequeue the item, then the thread has to do this:
lock (queue)
{
    if (queue.Count > 0)
        item = queue.Dequeue();
    else
        return; // There wasn't an item to dequeue
}
Otherwise, the following sequence of events is likely to occur when there is only one item left in the queue.
main loop checks Queue.Count, which returns 1
main loop calls QueueUserWorkItem
main loop checks Queue.Count again, which returns 1 because the thread hasn't started yet
new thread starts and dequeues an item
main loop tries to dequeue an item and throws an exception because queue.Count == 0
If you're willing to handle things that way, then you're okay. The key is making sure that the thread calls Release on the semaphore before the thread exits. You can do that with explicitly managed threads, or with the ThreadPool approach that I posted. I just used ThreadPool because I find it easier than explicitly managing threads.
So all you need to handle this is a queue that is designed to be accessed from multiple threads. Were you using .NET 4.0, I'd say use BlockingCollection. Not only will it work perfectly, but it's very efficient. You can rather trivially make your own class that is just a Queue with lock calls around all of the methods. It will work about as well, but it won't be as efficient. (It will likely be efficient enough for your purposes though, and re-writing BlockingCollection "properly" would be quite hard.)
Once you have that queue each worker can just grab an item from that queue, process it, then ask the queue for another. When there are no more you don't need to worry about ending that thread; there's no more work it could do.
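For illustration, a minimal lock-wrapped queue might look like this (a sketch; it synchronizes access but, unlike BlockingCollection, does not block consumers when it is empty):
using System.Collections.Generic;

public class SynchronizedQueue<T>
{
    private readonly Queue<T> queue = new Queue<T>();
    private readonly object sync = new object();

    public void Enqueue(T item)
    {
        lock (sync) { queue.Enqueue(item); }
    }

    // Returns false instead of throwing when the queue is empty,
    // so workers can use it to decide when to stop.
    public bool TryDequeue(out T item)
    {
        lock (sync)
        {
            if (queue.Count > 0)
            {
                item = queue.Dequeue();
                return true;
            }
            item = default(T);
            return false;
        }
    }
}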
You should use the ThreadPool, which manages and optimizes threads for you:
Once a thread in the pool completes its task, it is returned to a queue of waiting threads, where it can be reused. This reuse enables applications to avoid the cost of creating a new thread for each task.
Thread pools typically have a maximum number of threads. If all the threads are busy, additional tasks are put in queue until they can be serviced as threads become available.
It's better not to interfere with the ThreadPool, since it's smart enough to manage and allocate threads. But if you really need to, you can constrain the maximum number of threads by using the SetMaxThreads method.
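For example (a sketch; the numbers are illustrative):
// Cap the pool at 10 worker threads and 10 I/O completion-port threads.
// SetMaxThreads returns false if the request cannot be honored
// (e.g. values below the number of processors).
bool applied = ThreadPool.SetMaxThreads(10, 10);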
Instead of controlling the threads from the outside, let each thread consume data itself.
Pseudocode:
create 10 threads

thread code:
    while elements in queue
        get element from queue
        process element
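In C# that pseudocode might come out as follows (a sketch; the queue, the WorkItem type and the Process method are assumptions):
object sync = new object();
Thread[] workers = new Thread[10];
for (int i = 0; i < workers.Length; i++)
{
    workers[i] = new Thread(delegate()
    {
        while (true)
        {
            WorkItem item;
            lock (sync)
            {
                if (queue.Count == 0)
                    return;          // queue drained: the thread ends itself
                item = queue.Dequeue();
            }
            Process(item);           // do the real work outside the lock
        }
    });
    workers[i].Start();
}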
This is a simple producer-consumer scenario. You need a thread-safe queue like this one: Creating a blocking Queue<T> in .NET? - the 10 threads can read and process jobs one by one in a loop until the queue is empty.
Depending on how you fill the queue (prior to starting processing or while processing), you can end those threads as soon as the queue becomes empty, or when you signal them to stop by means of a stop flag. In the latter case you probably need to wake the threads (e.g. with dummy jobs).
Thread1 does Enqueue()
Threads 2 and 3 do Dequeue()
Threads 2 and 3 share the same mutex. When I used a different mutex per thread, the dequeue sometimes returned the same value twice.
When I use the same mutex in Threads 1, 2 and 3 it works fine. What is the difference between using the same mutex in Threads 1, 2 and 3 versus thread1_mutex for Thread1 and thread2_mutex for Threads 2 and 3?
How do I prevent the dequeue (Threads 2 and 3) from returning the same value twice?
If the dequeue returns the same value twice, it prints twice in my WPF TextBox. I want later dequeues to get null instead.
public class NewData
{
    public int seq;
    public int data;

    public NewData()
    {
    }

    public NewData(int seq, int data)
    {
        this.seq = seq;
        this.data = data;
    }
}
private void Thread1()
{
    while (true)
    {
        for (int i = 1; i <= threadRunningTime / threadSleep; i++)
        {
            NewData newData = new NewData(i, random.Next(100));
            thread1_mutex.WaitOne();
            queue.Enqueue(newData);
            thread1_mutex.ReleaseMutex();
            Thread.Sleep(threadSleep);
        }
    }
}
private void Thread2()
{
    while (true)
    {
        NewData newData = new NewData();
        thread2_mutex.WaitOne();
        if (queue.Count != 0)
        {
            newData = queue.Dequeue();
        }
        else
        {
            newData = null;
        }
        thread2_mutex.ReleaseMutex();
        Thread.Sleep(threadSleep);
    }
}
private void Thread3()
{
    while (true)
    {
        NewData newData = new NewData();
        thread2_mutex.WaitOne();
        if (queue.Count != 0)
        {
            newData = queue.Dequeue();
        }
        else
        {
            newData = null;
        }
        thread2_mutex.ReleaseMutex();
        Thread.Sleep(threadSleep);
    }
}
When I use the same mutex in Threads 1, 2 and 3 it works fine. What is the difference between using the same mutex in Threads 1, 2 and 3 versus thread1_mutex for Thread1 and thread2_mutex for Threads 2 and 3?
The difference is that using different mutexes would allow one thread to enqueue an item at the same time another thread dequeues an item. Since the Queue class is not thread safe, this is not allowed, and just about anything may happen if you do it. You must use a single mutex to prevent concurrent access, with the exception of concurrent read-only access. But both enqueue and dequeue need to write things, so that exception is not relevant in this case.
How do I prevent the dequeue (Threads 2 and 3) from returning the same value twice? If the dequeue returns the same value twice, it prints twice in my WPF TextBox. I want later dequeues to get null instead.
I would assume this is due to the issue above. If only a single thread at a time has exclusive access to the queue, you should not be getting duplicates. Note that any updates to the UI must be done from the UI thread. So if you are reading values from multiple threads, you will need a thread-safe way to hand these values over to the UI thread for display. In some sense, console programs may be easier to use for demonstrations, since Console.WriteLine is thread safe.
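For example, a worker could hand a dequeued value to the WPF UI thread like this (a sketch; outputTextBox is illustrative):
// Inside Thread2/Thread3, after a dequeue:
// Dispatcher.Invoke runs the delegate on the UI thread, so the TextBox
// is only ever touched from the thread that owns it.
if (newData != null)
{
    Application.Current.Dispatcher.Invoke(new Action(() =>
    {
        outputTextBox.Text += newData.data + Environment.NewLine;
    }));
}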
I would also recommend using the lock statement instead of a Mutex. The former is both easier to use and should perform better. The only real use case I know of for Mutex is to provide synchronization across multiple processes, and that is a fairly rare thing to do. Of course, even better would be to use a ConcurrentQueue, but I assume that goes against the spirit of the assignment. Note that "mutex" may refer either to the abstract concept, i.e. providing exclusive access to a resource, or to the specific Mutex class in .NET, so there may be some confusion about the terms used.
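For comparison, Thread2 rewritten with lock might look like this (a sketch; all three threads must lock on the same object):
private readonly object syncRoot = new object();

private void Thread2()
{
    while (true)
    {
        NewData newData;
        lock (syncRoot) // one shared lock object replaces both mutexes
        {
            newData = queue.Count != 0 ? queue.Dequeue() : null;
        }
        // use newData here (null means the queue was empty)
        Thread.Sleep(threadSleep);
    }
}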
When Thread3 wins the race and calls Mutex.WaitOne before Thread2, Thread2 must wait until Thread3 has released the mutex by calling Mutex.ReleaseMutex.
Once Thread3 has released the mutex, Thread2 can continue execution.
This is called a mutually exclusive lock (mutex for short), a synchronization mechanism.
If you use a dedicated mutex for each thread, the threads can't lock each other out, and therefore both threads can access the shared resource at the same time (concurrently).
You need at least two participants for a mutual relationship.
In other words: if you want to synchronize access to a shared resource, e.g. a Queue, in order to prevent undefined behavior like the dequeuing of the same element by different threads, all the accessing threads must use the same mutex instance so they can lock each other out (mutually). That's the essence of synchronization.
Each mutex instance represents a new waiting queue for the threads.
If multiple threads share the same resource, they must also share the same waiting queue.
This is true for every synchronization mechanism.
I have a method that works on a queue. After consuming the first object in the queue, it goes to sleep for a predefined period (say 10 secs). Is there a way to wake that thread up if the queue is modified by any other thread on the 3rd or 4th second?
You should be using a collection specifically designed for such a purpose. One example is BlockingCollection, which allows you to take an item from the collection and, if there are no items to take, the method will block until there is an item to give to you. It is also a collection that is specifically designed to be manipulated from multiple threads, easing your burden on synchronization.
Note that BlockingCollection can be initialized so that it's backed with different types of collections. By default it will use a ConcurrentQueue, but there are other collections in the System.Collections.Concurrent namespace that you can use if you don't want queue semantics (it seems you do though). You can also implement your own collection implementing IProducerConsumerCollection<T> if you really need something unique.
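Applied to the question, the consumer no longer needs a fixed sleep at all (a sketch; the item type is illustrative):
var queue = new BlockingCollection<int>();

// Consumer: blocks until a producer adds an item, waking immediately
// when the collection is modified rather than after a fixed sleep.
int item = queue.Take();

// Or, to wait at most 10 seconds before giving up:
int next;
bool gotOne = queue.TryTake(out next, TimeSpan.FromSeconds(10));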
Instead of Thread.Sleep:
You can use Monitor.Wait with a timeout, and you can use Monitor.Pulse from any thread to wake it up early if you need to.
Really good example/explanation here
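A minimal sketch of that pattern (names are illustrative):
private readonly object gate = new object();
private readonly Queue<int> queue = new Queue<int>();

private void Consumer()
{
    lock (gate)
    {
        // Sleeps up to 10 seconds, but returns early if another
        // thread pulses the same lock object.
        Monitor.Wait(gate, TimeSpan.FromSeconds(10));
        // re-check the queue here
    }
}

private void Producer(int item)
{
    lock (gate)
    {
        queue.Enqueue(item);
        Monitor.Pulse(gate); // wake the waiting consumer immediately
    }
}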
In any case I'd recommend not using Thread.Sleep(), because it blocks the thread completely.
It's much better to use AutoResetEvent or ManualResetEvent to synchronize two or more threads:
https://msdn.microsoft.com/en-us/library/system.threading.autoresetevent(v=vs.110).aspx
Servy has the correct answer for this using BlockingCollection.
Just to add further: it creates a new thread-pool thread when "work" items become available on the queue and processes them asynchronously on that thread.
You can use one in a producer/consumer queue:
E.g.:
/// <summary>
/// Producer/consumer queue. Used when a task needs executing, it's enqueued to ensure order,
/// allowing the caller to get on with other things. The number of consumers can be defined,
/// each running on a thread pool task thread.
/// Adapted from: http://www.albahari.com/threading/part5.aspx#_BlockingCollectionT
/// </summary>
public class ProducerConsumerQueue : IDisposable
{
    private BlockingCollection<Action> _taskQ = new BlockingCollection<Action>();

    public ProducerConsumerQueue(int workerCount)
    {
        // Create and start a separate Task for each consumer:
        for (int i = 0; i < workerCount; i++)
        {
            Task.Factory.StartNew(Consume);
        }
    }

    public void Dispose()
    {
        _taskQ.CompleteAdding();
    }

    public void EnqueueTask(Action action)
    {
        _taskQ.Add(action);
    }

    private void Consume()
    {
        // This sequence that we're enumerating will block when no elements
        // are available and will end when CompleteAdding is called.
        // Note: This removes AND returns items from the collection.
        foreach (Action action in _taskQ.GetConsumingEnumerable())
        {
            // Perform task.
            action();
        }
    }
}
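Usage might look like this (a sketch):
// Two consumers; Dispose calls CompleteAdding, which lets the consumers'
// enumeration end once the queue is drained.
using (var pcQueue = new ProducerConsumerQueue(2))
{
    pcQueue.EnqueueTask(() => Console.WriteLine("Task 1"));
    pcQueue.EnqueueTask(() => Console.WriteLine("Task 2"));
}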
Thank you all for the options you suggested. I finally settled on AutoResetEvent for this requirement. After consuming the first object in the queue, instead of putting the main thread to sleep, I spawned a new thread from the main thread and called Sleep there. The main thread just waits. Once the new thread wakes up, it signals the main thread using Set, and the main thread resumes. That is one part.
The second part: if any other thread modifies the queue, that thread also calls Set on the same EventWaitHandle, again making the main thread resume.
This might not be an optimal solution but simpler than other approaches.
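In code, the described approach looks roughly like this (a sketch of the prose above; names are illustrative):
// One shared handle: the sleeper thread and any queue producer can Set it.
AutoResetEvent wakeMainThread = new AutoResetEvent(false);

// Main thread: spawn a sleeper thread instead of sleeping itself, then wait.
new Thread(() =>
{
    Thread.Sleep(10000);   // the original 10-second delay
    wakeMainThread.Set();  // signal the main thread when the time is up
}).Start();
wakeMainThread.WaitOne();  // resumes on timeout OR on queue modification

// Any thread that modifies the queue wakes the main thread early:
// queue.Enqueue(item);
// wakeMainThread.Set();
(Note that WaitOne also accepts a timeout directly, which would achieve the same effect without the extra sleeper thread.)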
I would put the thread into a while loop and reduce the sleep timer to something like 200 milliseconds.
In every iteration I would check whether the queue was modified.
This way the thread spends most of its time sleeping, and effectively wakes up whenever the queue is modified.
When you want to stop the thread, you just set the while condition to false.
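A sketch of that loop (the flag and the modification check are illustrative):
private volatile bool keepRunning = true;

private void Worker()
{
    while (keepRunning)
    {
        if (QueueWasModified())  // hypothetical check for new items
        {
            // react to the change
        }
        Thread.Sleep(200);       // short naps instead of one long sleep
    }
}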
private static void Main(string[] args)
{
    for (int i = 0; i < 1000; i++)
    {
        Task.Factory.StartNew(() =>
        {
            Thread.Sleep(1000);
            Console.WriteLine("hej");
            Thread.Sleep(10000);
        });
    }
    Console.ReadLine();
}
Why won't this code print "hej" 1000 times after one second? Why does Thread.Sleep(10000) have an impact on the code's behavior?
Task.Factory.StartNew effectively delegates the work to the ThreadPool.
The thread pool will create new threads immediately to service requests as long as the thread count is less than or equal to the processor count. Once it reaches the processor count, the thread pool stops creating new threads immediately. That makes sense, because creating more threads than the processor count introduces thread scheduling overhead and gains nothing.
Instead, it throttles the creation of threads. It waits 500 ms to see whether any work is still pending with no threads free to process it. If pending work remains, it introduces a new thread (only one). This process keeps going as long as you have enough work to do.
When the work queue's traffic clears, the thread pool destroys the threads, and the above process starts over.
Also, there is a maximum limit to the number of threads the thread pool can run simultaneously. If you hit that, the thread pool stops creating more threads and waits for previous work items to complete, so that it can reuse the existing threads.
That's not the whole story; it's convoluted! These are just a few of the decisions the ThreadPool takes.
I hope it is now clear why you see what you see.
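You can inspect the pool's starting point for that heuristic yourself (a small probe; the output varies by machine):
int minWorker, minIo;
ThreadPool.GetMinThreads(out minWorker, out minIo);
// Up to this many worker threads are created on demand without delay
// (roughly the processor count); beyond it, the ~500 ms throttle applies.
Console.WriteLine("Min worker threads: " + minWorker);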
There are a multitude of factors that would alter the result.
Some being (but not limited to):
The inherent time for the iteration of the loop
The size of the thread pool
Thread management overhead
The way your code behaves is the intended behaviour. You wait 1000 milliseconds to print "hej", and after printing you call Thread.Sleep for another 10000 milliseconds. If you want to print "hej" 1000 times after one second, remove the Thread.Sleep(10000).
I have been trying to figure out how to solve a requirement I have, but for the life of me I just can't come up with a solution.
I have a database of items which stores them in a kind of queue.
(The database has already been implemented and other processes will be adding items to this queue.)
The items require a lot of work/time to "process", so I need to be able to:
Constantly de-queue items from the database.
For each item, run a new thread to process it and then return true/false depending on whether it was successfully processed. (This will be used to decide whether to re-add it to the database queue.)
But only do this while the current number of active threads (one per item being processed) is less than a maximum-number-of-threads parameter.
Once the maximum number of threads has been reached I need to stop de-queuing items from the database until the current number of threads is less than the maximum number of threads.
At which point it needs to continue de-queuing items.
It feels like this should be something I can come up with but it is just not coming to me.
To clarify: I only need to implement the threading. The database has already been implemented.
One really easy way to do this is with a Semaphore. You have one thread that dequeues items and creates threads to process them. For example:
const int MaxThreads = 4;
Semaphore sem = new Semaphore(MaxThreads, MaxThreads);

while (Queue.HasItems())
{
    sem.WaitOne();
    var item = Queue.Dequeue();
    ThreadPool.QueueUserWorkItem(ProcessItem, item); // see below
}

// When the queue is empty, you have to wait for all processing
// threads to complete.
// If you can acquire the semaphore MaxThreads times, all workers are done.
int count = 0;
while (count < MaxThreads)
{
    sem.WaitOne();
    ++count;
}

// the code to process an item
void ProcessItem(object item)
{
    // cast the item to whatever type you need,
    // and process it.
    // when done processing, release the semaphore
    sem.Release();
}
The above technique works quite well. It's simple to code, easy to understand, and very effective.
One change is that you might want to use the Task API rather than ThreadPool.QueueUserWorkItem. Task gives you more control over the asynchronous processing, including cancellation. I used QueueUserWorkItem in my example because I'm more familiar with it. I would use Task in a production program.
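With Task, the body of the dispatch loop might look like this instead (a sketch; releasing in finally keeps the semaphore correct even if processing throws):
sem.WaitOne();
var item = Queue.Dequeue();
Task.Factory.StartNew(() =>
{
    try
    {
        // process the item
    }
    finally
    {
        sem.Release();
    }
});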
Although this does use N+1 threads (where N is the number of items you want processed concurrently), that extra thread isn't often doing anything. The only time it's running is when it's assigning work to worker threads. Otherwise, it's doing a non-busy wait on the semaphore.
Do you just not know where to start?
Consider a thread pool with a max number of threads. http://msdn.microsoft.com/en-us/library/y5htx827.aspx
Consider spinning up your max number of threads immediately and monitoring the DB. http://msdn.microsoft.com/en-us/library/system.threading.threadpool.queueuserworkitem.aspx is convenient.
Remember that you can't guarantee your process will be ended safely...crashes happen. Consider logging of processing state.
Remember that your select and remove-from-queue operations should be atomic.
Ok, so the architecture of the solution is going to depend on one thing: does the processing time per queue item vary according to the item's data?
If not then you can have something that merely round-robins between the processing threads. This will be fairly simple to implement.
If the processing time does vary, then you're going to need something with more of a 'next available' feel to it, so that whichever of your threads happens to be free first gets the job of processing the data item.
Having worked that out you're then going to have the usual run around with how to synchronise between a queue reader and the processing threads. The difference between 'next-available' and 'round-robin' is how you do that synchronisation.
I'm not overly familiar with C#, but I've heard tell of a beast called a background worker. That is likely to be an acceptable means of bringing this about.
For round robin, just start up a background worker per queue item, storing the workers' references in an array. Limit yourself to, say, 16 in-progress background workers. The idea is that having started 16 you would then wait for the first to complete before starting the 17th, and so on. I believe that background workers actually run as jobs on the thread pool, so that will automatically limit the number of threads actually running at any one time to something appropriate for the underlying hardware. To wait for a background worker see this. Having waited for a background worker to complete, you'd then handle its result and start another up.
For the next-available approach it's not so different. Instead of waiting for the first to complete, you would use WaitAny() to wait for any of the workers to complete. You handle the return from whichever one completed, then start another one up and go back to WaitAny().
The general philosophy of both approaches is to keep a number of threads on the boil all the time. A feature of the next-available approach is that the order in which you emit the results is not necessarily the same as the order of the input items. If that matters, then the round-robin approach with more background workers than CPU cores will be reasonably efficient (the thread pool will just hold commissioned but not-yet-running workers anyway). However, the latency will vary with the processing time.
BTW 16 is an arbitrary number chosen on the basis of how many cores you think will be on the PC running the software. More cores, bigger number.
Of course, in the seemingly restless and ever changing world of .NET there may now be a better way of doing this.
Good luck!
I have an unlimited number of tasks in a db queue somewhere. What is the best way to have a program working on n tasks simultaneously on n different threads, starting new tasks as old ones get done? When one task finishes, another task should asynchronously begin. The currently-running count should always be n.
My initial thought was to use a thread pool, but that seems unnecessary considering that the tasks to be worked on will be retrieved within the individual threads. In other words, each thread will on its own go get its next task rather than having a main thread get tasks and then distribute them.
I see multiple options for doing this, and I don't know which one I should use for optimal performance.
1) Thread Pool - In light of there not necessarily being any waiting threads, I'm not sure this is necessary.
2) Semaphore - Same as 1. What's the benefit of a semaphore if there aren't tasks waiting to be allocated by the main thread?
3) Same Threads Forever - Kick the program off with n threads. When a thread is done working, it gets the next task itself. The main thread just monitors to make sure the n threads are still alive.
4) Event Handling - Same as 3, except that when a thread finishes a task, it fires off an ImFinished event before dying. An ImFinished event handler kicks off a new thread. This seems just like 3 but with more overhead (since new threads are constantly being created)
5) Something else?
BlockingCollection makes this whole thing pretty trivial:
var queue = new BlockingCollection<Action>();

int numWorkers = 5;
for (int i = 0; i < numWorkers; i++)
{
    Thread t = new Thread(() =>
    {
        foreach (var action in queue.GetConsumingEnumerable())
        {
            action();
        }
    });
    t.Start();
}
You can then have the main thread add items to the blocking collection after starting the workers (or before, if you want). You can even spawn multiple producer threads to add items to the queue.
Note that the more conventional approach would be to use Tasks instead of using the Thread class directly. The primary reason I didn't suggest it first is that you specifically requested an exact number of threads to be running (rather than a maximum), and you just don't have as much control over how Task objects are run (which is good; they can be optimized on your behalf). If that control isn't as important as you have stated, the following may end up being preferable:
var queue = new BlockingCollection<Action>();

int numWorkers = 5;
for (int i = 0; i < numWorkers; i++)
{
    Task.Factory.StartNew(() =>
    {
        foreach (var action in queue.GetConsumingEnumerable())
        {
            action();
        }
    }, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);
}
I like model #3, and have used it before; it reduces the number of threads starting and stopping, and makes the main thread a true "supervisor", reducing the work it has to do.
As Servy has indicated, the System.Collections.Concurrent namespace has a few constructs that are extremely valuable here. ConcurrentQueue is a thread-safe FIFO collection implementation designed to be used in just such a model; one or more "producer" threads add elements to the "input" side of the queue, while one or more "consumers" take elements out of the other end. If there is nothing in the queue, the call to get the item simply returns false; you can react to that by exiting out of the task method (the supervisor can then decide whether to start another task, probably by monitoring the input to the queue and ramping up when more items come in).
BlockingCollection adds the behavior of causing threads to wait when they attempt to get a value from the queue, if the queue doesn't have anything. It can also be configured to have a maximum capacity, above which it will block the "producer" threads adding any more elements until there is available capacity. BlockingCollection uses a ConcurrentQueue by default, but you can set it up to be a Stack, Dictionary or Bag if you wish. Using this model, you can have the tasks run indefinitely; when there's nothing to do they'll simply block until there is something for at least one of them to work on, so all the supervisor has to check for is tasks erroring out (a critical element of any robust threaded workflow pattern).
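A worker's inner loop over a ConcurrentQueue might read (a sketch; WorkItem and Process are assumptions):
var queue = new ConcurrentQueue<WorkItem>();

// Drain the queue until TryDequeue reports it is empty.
WorkItem item;
while (queue.TryDequeue(out item))
{
    Process(item);
}
// TryDequeue returned false: nothing left, so the task method can exit
// and the supervisor can decide whether to start another worker later.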
This is easily achieved with the TPL Dataflow library.
First, let's assume you have a BufferBlock<T>, this is your queue:
var queue = new BufferBlock<T>();
Then, you need the action to perform on the block, this is represented by the ActionBlock<T> class:
var action = new ActionBlock<T>(t => { /* Process t here */ },
    new ExecutionDataflowBlockOptions {
        // Number of concurrent tasks.
        MaxDegreeOfParallelism = ...,
    });
Note the constructor above: it takes an instance of ExecutionDataflowBlockOptions and sets the MaxDegreeOfParallelism property to however many concurrent items you want to be processed at the same time.
Underneath the surface, the Task Parallel Library is being used to handle allocating threads for tasks, etc. TPL Dataflow is meant to be a higher level abstraction which allows you to tweak just how much parallelism/throttling/etc that you want.
For example, if you didn't want the ActionBlock<TInput> to buffer any items (preferring them to live in the BufferBlock<T>), you can also set the BoundedCapacity property, which will limit the number of items that the ActionBlock<TInput> will hold onto at once (which includes the number of items being processed, as well as reserved items):
var action = new ActionBlock<T>(t => { /* Process t here */ },
    new ExecutionDataflowBlockOptions {
        // Number of concurrent tasks.
        MaxDegreeOfParallelism = ...,
        // Set to MaxDegreeOfParallelism to not buffer.
        BoundedCapacity = ...,
    });
Also, if you want a new, fresh Task<TResult> instance to process every item, then you can set the MaxMessagesPerTask property to one, indicating that each and every Task<TResult> will process one item:
var action = new ActionBlock<T>(t => { /* Process t here */ },
    new ExecutionDataflowBlockOptions {
        // Number of concurrent tasks.
        MaxDegreeOfParallelism = ...,
        // Set to MaxDegreeOfParallelism to not buffer.
        BoundedCapacity = ...,
        // Process one item per task.
        MaxMessagesPerTask = 1,
    });
Note that depending on how many other tasks your application is running, this might or might not be optimal for you, and you might also want to think of the cost of spinning up a new task for every item that comes through the ActionBlock<TInput>.
From there, it's a simple matter of linking the BufferBlock<T> to the ActionBlock<TInput> with a call to the LinkTo method:
IDisposable connection = queue.LinkTo(action, new DataflowLinkOptions {
    PropagateCompletion = true
});
You set the PropagateCompletion property to true here so that, when the BufferBlock<T> is completed (if/when there are no more items to process), the completion is passed on to the ActionBlock<TInput>, which you can subsequently wait on.
Note that you can call the Dispose method on the IDisposable implementation returned from the call to LinkTo if you want the link between the blocks to be removed.
Finally, you post items to the buffer using the Post method:
queue.Post(new T());
And when you're done (if you are ever done), you call the Complete method:
queue.Complete();
Then, on the action block, you can wait until it's done by waiting on the Task instance exposed by the Completion property:
action.Completion.Wait();
Hopefully, the elegance of this is clear:
You don't have to manage the creation of new Task instances/threads/etc. to manage the work; the blocks do it for you based on the settings you provide (and this is on a per-block basis).
Cleaner separation of concerns. The buffer is separated from the action, as are all the other blocks. You build the blocks and then link them together.
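Putting the pieces together, a minimal end-to-end version might read (a sketch; the item type and counts are illustrative):
var queue = new BufferBlock<int>();
var action = new ActionBlock<int>(
    n => Console.WriteLine("Processed {0}", n),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

queue.LinkTo(action, new DataflowLinkOptions { PropagateCompletion = true });

for (int i = 0; i < 100; i++)
    queue.Post(i);

queue.Complete();          // no more items will arrive
action.Completion.Wait();  // block until every item has been processed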
I'm a VB guy, but you can easily translate:
Private Async Sub foo()
    Dim n As Integer = 16
    Dim l As New List(Of Task)
    Dim jobs As New Queue(Of Integer)(Enumerable.Range(1, 100))
    For i = 1 To n
        Dim j = jobs.Dequeue
        l.Add(Task.Run(Sub()
                           Threading.Thread.Sleep(500)
                           Console.WriteLine(j)
                       End Sub))
    Next
    While l.Count > 0
        Dim t = Await Task.WhenAny(l)
        If jobs.Count > 0 Then
            Dim j = jobs.Dequeue
            l(l.IndexOf(t)) = Task.Run(Sub()
                                           Threading.Thread.Sleep(500)
                                           Console.WriteLine(j)
                                       End Sub)
        Else
            l.Remove(t)
        End If
    End While
End Sub
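A rough C# translation of the above (same structure; the timings and counts come from the VB sketch):
private static async Task FooAsync()
{
    int n = 16;
    var running = new List<Task>();
    var jobs = new Queue<int>(Enumerable.Range(1, 100));

    for (int i = 1; i <= n; i++)
    {
        int j = jobs.Dequeue();
        running.Add(Task.Run(() =>
        {
            Thread.Sleep(500);
            Console.WriteLine(j);
        }));
    }

    while (running.Count > 0)
    {
        Task finished = await Task.WhenAny(running);
        if (jobs.Count > 0)
        {
            int j = jobs.Dequeue();
            running[running.IndexOf(finished)] = Task.Run(() =>
            {
                Thread.Sleep(500);
                Console.WriteLine(j);
            });
        }
        else
        {
            running.Remove(finished);
        }
    }
}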
There's an article from Stephen Toub on why you shouldn't use Task.WhenAny this way with a large list of tasks, but with "some" tasks you usually don't run into a problem.
The idea is quite simple: you have a list to which you add as many (running) tasks as you want to run in parallel. Then you (a)wait for the first one to finish. If there are still jobs in the queue, you assign the next job to a new task and (a)wait again. If there are no jobs left in the queue, you simply remove the finished task. If both your task list and the queue are empty, you are done.
The Stephen Toub article: http://blogs.msdn.com/b/pfxteam/archive/2012/08/02/processing-tasks-as-they-complete.aspx