I need to implement a sort of task buffer. Basic requirements are:
Process tasks in a single background thread
Receive tasks from multiple threads
Process ALL received tasks, i.e. make sure the buffer is drained of buffered tasks after a stop signal is received
Order of tasks received per thread must be maintained
I was thinking of implementing it using a Queue, as below. I would appreciate feedback on the implementation. Are there any brighter ideas for implementing such a thing?
public class TestBuffer
{
    private readonly object queueLock = new object();
    private Queue<Task> queue = new Queue<Task>();
    private bool running = false;

    public TestBuffer()
    {
    }

    public void start()
    {
        Thread t = new Thread(new ThreadStart(run));
        t.Start();
    }

    private void run()
    {
        running = true;
        bool run = true;
        while (run)
        {
            Task task = null;
            // Lock queue before doing anything
            lock (queueLock)
            {
                // If the queue is currently empty and it is still running
                // we need to wait until we're told something changed
                if (queue.Count == 0 && running)
                {
                    Monitor.Wait(queueLock);
                }
                // Check there is something in the queue
                // Note - there might not be anything in the queue if we were
                // waiting for something to change and the queue was stopped
                if (queue.Count > 0)
                {
                    task = queue.Dequeue();
                }
            }
            // If something was dequeued, handle it
            if (task != null)
            {
                handle(task);
            }
            // Lock the queue again and check whether we need to run again
            // Note - Make sure we drain the queue even if we are told to stop before it is empty
            lock (queueLock)
            {
                run = queue.Count > 0 || running;
            }
        }
    }

    public void enqueue(Task toEnqueue)
    {
        lock (queueLock)
        {
            queue.Enqueue(toEnqueue);
            Monitor.PulseAll(queueLock);
        }
    }

    public void stop()
    {
        lock (queueLock)
        {
            running = false;
            Monitor.PulseAll(queueLock);
        }
    }

    public void handle(Task dequeued)
    {
        dequeued.execute();
    }
}
You can actually handle this with the out-of-the-box BlockingCollection.
It is designed to have 1 or more producers, and 1 or more consumers. In your case, you would have multiple producers and one consumer.
When you receive a stop signal, have that signal handler
Signal producer threads to stop
Call CompleteAdding on the BlockingCollection instance
The consumer thread will continue to run until all queued items are removed and processed, then it will encounter the condition that the BlockingCollection is complete. When the thread encounters that condition, it just exits.
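A minimal sketch of that shape, using Action as a stand-in for the question's Task type (which would otherwise collide with System.Threading.Tasks.Task):

using System;
using System.Collections.Concurrent;
using System.Threading;

public class TaskBuffer
{
    // BlockingCollection defaults to a ConcurrentQueue backing store, so FIFO order is kept.
    private readonly BlockingCollection<Action> queue = new BlockingCollection<Action>();
    private readonly Thread consumer;

    public TaskBuffer()
    {
        consumer = new Thread(Consume);
        consumer.Start();
    }

    // Safe to call from any number of producer threads.
    public void Enqueue(Action task)
    {
        queue.Add(task);
    }

    // The stop signal: no more adds; the consumer drains what is left and exits.
    public void Stop()
    {
        queue.CompleteAdding();
        consumer.Join();
    }

    private void Consume()
    {
        // Blocks while the queue is empty; the loop ends only after CompleteAdding
        // has been called AND every buffered item has been handled.
        foreach (Action task in queue.GetConsumingEnumerable())
        {
            task();
        }
    }
}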
You should also think about ConcurrentQueue, which is in fact FIFO. If it is not suitable, try some of its relatives among the Thread-Safe Collections. Using these lets you avoid some risks.
I suggest you take a look at TPL DataFlow. BufferBlock is what you're looking for, but it offers so much more.
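For instance, a single-consumer drain loop over a BufferBlock might look roughly like this sketch (assuming the System.Threading.Tasks.Dataflow NuGet package):

using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // NuGet: System.Threading.Tasks.Dataflow

var buffer = new BufferBlock<int>();

// The single consumer: OutputAvailableAsync returns false only after
// Complete() has been called AND the buffer has been drained.
var consumer = Task.Run(async () =>
{
    while (await buffer.OutputAvailableAsync())
    {
        int item = await buffer.ReceiveAsync();
        Console.WriteLine(item); // handle the task here
    }
});

// Producers can Post from any number of threads; FIFO order is kept.
for (int i = 0; i < 10; i++) buffer.Post(i);

buffer.Complete(); // the stop signal
await consumer;    // returns once everything queued has been processed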
Take a look at my lightweight implementation of a thread-safe FIFO queue; it's a non-blocking synchronisation tool that uses the thread pool, which beats creating your own threads in most cases, as well as using blocking synchronisation tools such as locks and mutexes. https://github.com/Gentlee/SerialQueue
Usage:
var queue = new SerialQueue();
var result = await queue.Enqueue(() => /* code to synchronize */);
You could use Rx on .NET 3.5 for this. It may never have made it out of RC, but I believe it is stable* and in use by many production systems. If you don't need Subject, you might find backported primitives (like the concurrent collections) usable on .NET 3.5 that didn't ship with the .NET Framework until 4.0.
Alternative to Rx (Reactive Extensions) for .net 3.5
* - Nit picker's corner: Except for maybe advanced time windowing, which is out of scope, but buffers (by count and time), ordering, and schedulers are all stable.
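As a rough sketch of the shape (written against the modern System.Reactive package; the old 3.5-era Rx had the same concepts): an EventLoopScheduler gives you the single background thread, and OnCompleted acts as the drain-then-stop signal.

using System;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Reactive.Subjects;

var tasks = new Subject<Action>();
var worker = new EventLoopScheduler(); // one dedicated background thread, FIFO order

var subscription = tasks
    .Synchronize()      // serializes OnNext calls from multiple producer threads
    .ObserveOn(worker)
    .Subscribe(
        task => task(),                            // handle each task on the worker thread
        () => Console.WriteLine("queue drained")); // runs after all buffered tasks

tasks.OnNext(() => Console.WriteLine("work 1")); // producers enqueue from any thread
tasks.OnNext(() => Console.WriteLine("work 2"));
tasks.OnCompleted();                             // stop signal; buffered items still run first

Console.ReadLine(); // keep the process alive while the event loop drains
subscription.Dispose();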
I have read that Toub's thread pool is a good solution for longer-running tasks, so I implemented it in the following code. I'm not even sure if my implementation is a good one, because I seem to have sporadic memory bloat: the process runs at around 50 MB most of the time, then spikes to almost a GB and stays there.
The thread pool implementation is as follows (should I even be doing this?):
private void Run()
{
    while (!_stop)
    {
        // Create new threads if we have room in the pool
        while (ManagedThreadPool.ActiveThreads < _runningMax)
        {
            ManagedThreadPool.QueueUserWorkItem(new WaitCallback(FindWork));
        }
        // Pause for a second so we don't run the CPU to death
        Thread.Sleep(1000);
    }
}
The method FindWork looks like this:
private void FindWork(object stateInfo)
{
    bool result = false;
    bool process = false;
    bool queueResult = false;
    Work_Work work = null;
    try
    {
        using (Queue workQueue = new Queue(_workQueue))
        {
            // Look for work on the work queue
            workQueue.Open(Queue.Mode.Consume);
            work = workQueue.ConsumeWithBlocking<Work_Work>();
            // Do some work with the message from the queue ...
            return;
        }
    }
    finally
    {
        // (rest of the method snipped in the original post)
    }
}
The ConsumeWithBlocking method blocks if there is nothing in the queue. We then call return to exit the callback once we have successfully retrieved and processed a message.
Typically we run 10 threads, most of them sitting in the blocking state (WaitSleepJoin). The whole point of this is to have 10 threads available at all times.
Am I going about this all wrong?
I have a Thread (STAThread) in a Windows Service which performs a large amount of work. When the Windows service is restarted I want to stop this thread gracefully.
I know of a couple of ways
A volatile boolean
ManualResetEvent
CancellationToken
As far as I have found out, Thread.Abort is a no-go...
What is the best practice ?
The work is performed in another class than the one where the thread is started, so it is necessary either to introduce a CancellationToken parameter in a constructor or, for example, to have a volatile variable. But I just can't figure out which is smartest.
Update
Just to clarify a little, I have wrapped up a very simple example of what I'm talking about. As said earlier, this is being done in a Windows service. Right now I'm thinking a volatile boolean that is checked in the loop, or a CancellationToken....
I cannot wait for the loop to finish; as stated below it can take several minutes, which would make the system administrators of the server believe that something is wrong with the service when they need to restart it.... I can just drop all the work within the loop without problems, but I cannot do this with a Thread.Abort; it is "evil" and, furthermore, a COM interface is called, so a small cleanup is needed.
class Scheduler
{
    private Thread apartmentThread;
    private Worker worker;
    // Assumed declarations (not shown in the original snippet):
    private CancellationToken token;
    private int pollInterval = 10; // seconds
    private const int MillisecondsToSeconds = 1000;

    void Scheduling()
    {
        worker = new Worker();
        apartmentThread = new Thread(Run);
        apartmentThread.SetApartmentState(ApartmentState.STA);
        apartmentThread.Start();
    }

    private void Run()
    {
        while (!token.IsCancellationRequested)
        {
            Thread.Sleep(pollInterval * MillisecondsToSeconds);
            if (!token.IsCancellationRequested)
            {
                worker.DoWork();
            }
        }
    }
}

class Worker
{
    // This will take several minutes....
    public void DoWork()
    {
        for (int i = 0; i < 50000; i++)
        {
            // Do some work including communication with a COM interface
            // Communication with COM interface doesn't take long
        }
    }
}
UPDATE
Just examined performance: using a CancellationToken whose IsCancellationRequested state is checked in the loop is much faster than using WaitOne on a ManualResetEventSlim. Some quick figures: an if on the CancellationToken iterating 100,000,000 times in a for loop costs approx. 500 ms, whereas the WaitOne costs approx. 3 seconds. So in this scenario it is faster to use the CancellationToken.
You haven't posted enough of your implementation, but I would highly recommend a CancellationToken if it is available to you. It's simple enough to use and to understand from a maintainability standpoint. You can set up cooperative cancellation as well, if you decide to have more than one worker thread.
If you find yourself in a situation where this thread may block for long periods of time, it's best to setup your architecture so that this doesn't occur. You shouldn't be starting threads that won't play nice when you tell them to stop. If they don't stop when you ask them, the only real way is to tear down the process and let the OS kill them.
Eric Lippert posted a fantastic answer to a somewhat-related question here.
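To make that concrete, here is a minimal sketch of cooperative cancellation applied to the question's Worker shape, with the token injected through the constructor as the asker suggested (the loop body is a placeholder):

using System;
using System.Threading;

class Worker
{
    private readonly CancellationToken token;

    public Worker(CancellationToken token) // constructor injection
    {
        this.token = token;
    }

    public void DoWork()
    {
        for (int i = 0; i < 50000; i++)
        {
            if (token.IsCancellationRequested)
            {
                // release COM objects etc., then bail out of the long loop
                return;
            }
            // ... one unit of COM work ...
        }
    }
}

// In the service:
//   var cts = new CancellationTokenSource();
//   var worker = new Worker(cts.Token);   // start the STA thread as before
// On service stop:
//   cts.Cancel();
//   apartmentThread.Join();               // wait for the graceful exit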
I tend to use a bool flag, a lock object and a Terminate() method, such as:
object locker = new object();
bool do_term = false;

Thread thread = new Thread(new ThreadStart(ThreadProc));
thread.Start();

void ThreadProc()
{
    while (true)
    {
        lock (locker)
        {
            if (do_term) break;
        }
        // ... do work ...
    }
}

void Terminate()
{
    lock (locker)
    {
        do_term = true;
    }
}
Aside from Terminate(), all the other fields and methods are private to the "worker" class.
Use a WaitHandle, most preferably a ManualResetEvent. Your best bet is to let whatever is in your loop finish. This is the safest way to accomplish your goal.
ManualResetEvent _stopSignal = new ManualResetEvent(false); // Your "stopper"
ManualResetEvent _exitedSignal = new ManualResetEvent(false);

void DoProcessing()
{
    try
    {
        while (!_stopSignal.WaitOne(0))
        {
            DoSomething();
        }
    }
    finally
    {
        _exitedSignal.Set();
    }
}

void DoSomething()
{
    // Some work goes here
}

public void Terminate()
{
    _stopSignal.Set();
    _exitedSignal.WaitOne();
}
Then to use it:
Thread thread = new Thread(() => { thing.DoProcessing(); });
thread.Start();
//Some time later...
thing.Terminate();
If you have a particularly long-running process in your "DoSomething" implementation, you may want to call that asynchronously and provide it with state information. That can get pretty complicated, though; it is better to just wait until your process is finished and then exit, if you are able.
There are two situations in which you may find your thread:
Processing.
Blocking.
In the case where your thread is processing something, you must wait for your thread to finish processing in order for it to safely exit. If it's part of a work loop, then you can use a boolean flag to terminate the loop.
In the case where your thread is blocking, then you need to wake your thread and get it processing again. A thread may be blocking on a ManualResetEvent, a database call, a socket call or whatever else you could block on. In order to wake it up, you must call the Thread.Interrupt() method which will raise a ThreadInterruptedException.
It may look something like this:
private object sync = new object();
private bool running = false;

private void Run()
{
    running = true;
    while (true)
    {
        try
        {
            lock (sync)
            {
                if (!running)
                {
                    break;
                }
            }
            BlockingFunction();
        }
        catch (ThreadInterruptedException)
        {
            break;
        }
    }
}

public void Stop()
{
    lock (sync)
    {
        running = false;
    }
}
And here is how you can use it:
MyRunner r = new MyRunner();
Thread t = new Thread(() =>
{
    r.Run();
});
t.IsBackground = true;
t.Start();

// To stop the thread
r.Stop();
// Interrupt the thread if it's in a blocking state
t.Interrupt();
// Wait for the thread to exit
t.Join();
I have scenarios where I need a main thread to wait until every one of a set of possibly more than 64 threads has completed its work, and for that I wrote the following helper utility (to avoid the 64-handle limit on WaitHandle.WaitAll())
public static void WaitAll(WaitHandle[] handles)
{
    if (handles == null)
        throw new ArgumentNullException("handles",
            "WaitHandle[] handles was null");
    foreach (WaitHandle wh in handles) wh.WaitOne();
}
With this utility method, however, each wait handle is only examined after every preceding one in the array has been signalled, so it is in effect synchronous and will not work if the wait handles are AutoResetEvents (which clear as soon as a waiting thread has been released).
To fix this issue I am considering changing this code to the following, but would like others to check and see if it looks like it will work, or if anyone sees any issues with it, or can suggest a better way ...
Thanks in advance:
public static void WaitAllParallel(WaitHandle[] handles)
{
    if (handles == null)
        throw new ArgumentNullException("handles",
            "WaitHandle[] handles was null");
    int actThreadCount = handles.Length;
    object locker = new object();
    foreach (WaitHandle wh in handles)
    {
        WaitHandle qwH = wh;
        ThreadPool.QueueUserWorkItem(
            delegate
            {
                try { qwH.WaitOne(); }
                finally { lock (locker) --actThreadCount; }
            });
    }
    while (actThreadCount > 0) Thread.Sleep(80);
}
If you know how many threads you have, you can use an interlocked decrement. This is how I usually do it:
AutoResetEvent eventDone = new AutoResetEvent(false);
int totalCount = 128;

// kick off the workers
for (int i = 0; i < 128; i++)
{
    ThreadPool.QueueUserWorkItem(ThreadWorker);
}

void ThreadWorker(object state)
{
    try
    {
        // ... work and more work
    }
    finally
    {
        int runningCount = Interlocked.Decrement(ref totalCount);
        if (0 == runningCount)
        {
            // This is the last thread, notify the waiters
            eventDone.Set();
        }
    }
}
Actually, most times I don't even signal, but instead invoke a callback that continues the processing from where the waiter would have continued. Fewer blocked threads, more scalability.
I know this is different and may not apply to your case (e.g. it surely won't work if some of those handles are not threads but I/O or events), but it may be worth thinking about.
I'm not sure what exactly you're trying to do, but would a CountdownEvent (.NET 4.0) conceptually solve your problem?
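For reference, a minimal sketch of how CountdownEvent would replace the handle array; there is one event regardless of how many workers, so the 64-handle limit never applies (the work body is a placeholder):

using System.Threading;

int workerCount = 128;
using (var countdown = new CountdownEvent(workerCount))
{
    for (int i = 0; i < workerCount; i++)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            try { /* ... work ... */ }
            finally { countdown.Signal(); } // one signal per completed worker
        });
    }
    countdown.Wait(); // blocks until all 128 signals have arrived
}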
I'm not a C# or .NET programmer, but you could use a semaphore that is posted when one of your worker threads exits. The monitoring thread would simply wait on the semaphore n times where n is the number of worker threads. Semaphores are traditionally used to count resources in use but they can be used to count jobs completed by waiting on the same semaphore for n times.
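In C#, that "wait n times" idea might look like this minimal sketch, using SemaphoreSlim with an initial count of 0 (the work body is a placeholder):

using System.Threading;

int n = 128;
var done = new SemaphoreSlim(0); // starts at 0: nothing completed yet

for (int i = 0; i < n; i++)
{
    ThreadPool.QueueUserWorkItem(_ =>
    {
        try { /* ... the worker's job ... */ }
        finally { done.Release(); } // post once per finished worker
    });
}

// The monitoring thread waits n times: n successful waits == n workers done.
for (int i = 0; i < n; i++)
    done.Wait();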
When working with lots of simultaneous threads, I prefer to add each thread's ManagedThreadId into a Dictionary when I start the thread, and then have each thread invoke a callback routine that removes the dying thread's id from the Dictionary. The Dictionary's Count property tells you how many threads are active. Use the value side of the key/value pair to hold info that your UI thread can use to report status. Wrap the Dictionary with a lock to keep things safe.
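A compressed sketch of that idea, here using ConcurrentDictionary so the wrapping lock isn't needed; ThreadTracker, StartTracked, and the status string are placeholder names for whatever your UI wants:

using System;
using System.Collections.Concurrent;
using System.Threading;

static class ThreadTracker
{
    // key: ManagedThreadId, value: status info for the UI
    static readonly ConcurrentDictionary<int, string> Active =
        new ConcurrentDictionary<int, string>();

    public static void StartTracked(Action work, string status)
    {
        var t = new Thread(() =>
        {
            Active[Thread.CurrentThread.ManagedThreadId] = status;
            try { work(); }
            finally { Active.TryRemove(Thread.CurrentThread.ManagedThreadId, out _); }
        });
        t.Start();
    }

    public static int ActiveCount => Active.Count; // how many threads are still alive
}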
Another variation on the question's inner work item, combining a wait timeout with Interlocked bookkeeping:
ThreadPool.QueueUserWorkItem(o =>
{
    try
    {
        using (var h = (o as WaitHandle))
        {
            if (!h.WaitOne(100000))
            {
                // Alert main thread of the timeout
            }
        }
    }
    finally
    {
        Interlocked.Decrement(ref actThreadCount);
    }
}, wh);
I'm trying to use WebClient to download a bunch of files asynchronously. From my understanding, this is possible, but you need to have one WebClient object for each download. So I figured I'd just throw a bunch of them in a queue at the start of my program, then pop them off one at a time and tell them to download a file. When the file is done downloading, they can get pushed back onto the queue.
Pushing stuff onto my queue shouldn't be too bad, I just have to do something like:
lock (queue)
{
    queue.Enqueue(webClient);
}
Right? But what about popping them off? I want my main thread to sleep when the queue is empty (wait until another web client is ready so it can start the next download). I suppose I could use a Semaphore alongside the queue to keep track of how many elements are in the queue, and that would put my thread to sleep when necessary, but it doesn't seem like a very good solution. What happens if I forget to decrement/increment my Semaphore every time I push/pop something on/off my queue and they get out of sync? That would be bad. Isn't there some nice way to have queue.Dequeue() automatically sleep until there is an item to dequeue then proceed?
I'd also welcome solutions that don't involve a queue at all. I just figured a queue would be the easiest way to keep track of which WebClients are ready for use.
Here's an example using a Semaphore. IMO it is a lot cleaner than using a Monitor:
public class BlockingQueue<T>
{
    Queue<T> _queue = new Queue<T>();
    Semaphore _sem = new Semaphore(0, Int32.MaxValue);

    public void Enqueue(T item)
    {
        lock (_queue)
        {
            _queue.Enqueue(item);
        }
        _sem.Release();
    }

    public T Dequeue()
    {
        _sem.WaitOne();
        lock (_queue)
        {
            return _queue.Dequeue();
        }
    }
}
What you want is a producer/consumer queue.
I have a simple example of this in my threading tutorial - scroll about half way down that page. It was written pre-generics, but it should be easy enough to update. There are various features you may need to add, such as the ability to "stop" the queue: this is often performed by using a sort of "null work item" token; you inject as many "stop" items in the queue as you have dequeuing threads, and each of them stops dequeuing when it hits one.
Searching for "producer consumer queue" may well provide you with better code samples - this was really just to demonstrate waiting/pulsing.
IIRC, there are types in .NET 4.0 (as part of Parallel Extensions) which will do the same thing but much better :) I think you want a BlockingCollection wrapping a ConcurrentQueue.
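A minimal sketch of that .NET 4.0 approach, applied to the question's pool of WebClients:

using System.Collections.Concurrent;
using System.Net;

// BlockingCollection over an explicit ConcurrentQueue (also its default backing store)
var pool = new BlockingCollection<WebClient>(new ConcurrentQueue<WebClient>());

// Fill the pool up front
for (int i = 0; i < 4; i++) pool.Add(new WebClient());

WebClient client = pool.Take(); // blocks while the pool is empty
// ... start a download; call pool.Add(client) when it completes ...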
I use a BlockingQueue to deal with exactly this type of situation. You can call .Dequeue when the queue is empty, and the calling thread will simply wait until there is something to Dequeue.
using System;
using System.Collections;
using System.Collections.Generic;
using System.Threading;

public class BlockingQueue<T> : IEnumerable<T>
{
    private int _count = 0;
    private Queue<T> _queue = new Queue<T>();

    public T Dequeue()
    {
        lock (_queue)
        {
            while (_count <= 0)
                Monitor.Wait(_queue);
            _count--;
            return _queue.Dequeue();
        }
    }

    public void Enqueue(T data)
    {
        if (data == null)
            throw new ArgumentNullException("data");
        lock (_queue)
        {
            _queue.Enqueue(data);
            _count++;
            Monitor.Pulse(_queue);
        }
    }

    IEnumerator<T> IEnumerable<T>.GetEnumerator()
    {
        while (true)
            yield return Dequeue();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return ((IEnumerable<T>)this).GetEnumerator();
    }
}
Just use this in place of a normal Queue and it should do what you need.
What is the most recommended .NET custom thread pool that can have separate instances, i.e. more than one thread pool per application?
I need an unlimited queue size (building a crawler), and need to run a separate threadpool in parallel for each site I am crawling.
Edit:
I need to mine these sites for information as fast as possible; using a separate thread pool for each site would give me the ability to control the number of threads working on each site at any given time (no more than 2-3).
I believe Smart Thread Pool can do this. Its SmartThreadPool class can be instantiated, so you should be able to create and manage the separate site-specific instances you require.
Ami Bar wrote an excellent Smart Thread Pool that can be instantiated.
Take a look here.
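From memory of that library's API (the Amib.Threading namespace; treat the exact signatures as approximate, and CrawlPage as your own method), per-site instances would look roughly like:

using Amib.Threading; // NuGet package: SmartThreadPool.dll

// One independently-sized pool per crawled site
var siteA = new SmartThreadPool(new STPStartInfo { MaxWorkerThreads = 3 });
var siteB = new SmartThreadPool(new STPStartInfo { MaxWorkerThreads = 3 });

siteA.QueueWorkItem(() => CrawlPage("http://site-a.example/")); // CrawlPage is your own method
siteB.QueueWorkItem(() => CrawlPage("http://site-b.example/"));

siteA.WaitForIdle(); // block until this pool's queue is drained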
Ask Jon Skeet: http://www.yoda.arachsys.com/csharp/miscutil/
Parallel Extensions for .NET (TPL) should actually work much better if you want a large number of parallel running tasks.
A BlockingCollection can be used as the queue for the threads.
Here is an implementation of it.
Updated at 2018-04-23:
public class WorkerPool<T> : IDisposable
{
    BlockingCollection<T> queue = new BlockingCollection<T>();
    List<Task> taskList;
    private CancellationTokenSource cancellationToken;
    int maxWorkers;
    private bool wasShutDown;
    int waitingUnits;

    public WorkerPool(CancellationTokenSource cancellationToken, int maxWorkers)
    {
        this.cancellationToken = cancellationToken;
        this.maxWorkers = maxWorkers;
        this.taskList = new List<Task>();
    }

    public void enqueue(T value)
    {
        queue.Add(value);
        // Interlocked because enqueue may be called from several producer threads
        Interlocked.Increment(ref waitingUnits);
    }

    // Call to signal that there are no more items
    public void CompleteAdding()
    {
        queue.CompleteAdding();
    }

    // Create the workers and set them running
    public void startWorkers(Action<T> worker)
    {
        for (int i = 0; i < maxWorkers; i++)
        {
            taskList.Add(new Task(() =>
            {
                string myname = "worker " + Guid.NewGuid().ToString();
                try
                {
                    while (!cancellationToken.IsCancellationRequested)
                    {
                        var value = queue.Take();
                        Interlocked.Decrement(ref waitingUnits);
                        worker(value);
                    }
                }
                // Take() throws when the collection is closed with the
                // CompleteAdding method; there is no pretty way around this.
                catch (InvalidOperationException)
                {
                    // do nothing
                }
            }));
        }
        foreach (var task in taskList)
        {
            task.Start();
        }
    }

    // Wait for all workers to finish their jobs
    public void await()
    {
        while (waitingUnits > 0 || !queue.IsAddingCompleted)
            Thread.Sleep(100);
        shutdown();
    }

    private void shutdown()
    {
        wasShutDown = true;
        Task.WaitAll(taskList.ToArray());
    }

    // In case something bad happens, dismiss all pending work
    public void Dispose()
    {
        if (!wasShutDown)
        {
            queue.CompleteAdding();
            shutdown();
        }
    }
}
Then use like this:
WorkerPool<int> workerPool = new WorkerPool<int>(new CancellationTokenSource(), 5);
workerPool.startWorkers(value =>
{
    log.Debug(value);
});

// Enqueue all the work
for (int i = 0; i < 100; i++)
{
    workerPool.enqueue(i);
}

// Signal no more work
workerPool.CompleteAdding();

// Wait for all pending work to finish
workerPool.await();
You can have as many pools as you like simply by creating new WorkerPool objects.
The free NuGet library CodeFluentRuntimeClient has a CustomThreadPool class that you can reuse. It's very configurable; you can change pool thread priority, count, COM apartment state, even name (for debugging), and also culture.
Another approach is to use a Dataflow pipeline. I added this later answer because I find Dataflow a much better fit for this kind of problem, the problem of having several thread pools. It provides a more flexible and structured approach and can easily scale vertically.
You can break your code into one or more blocks, link them with Dataflow, and let the Dataflow engine allocate threads according to CPU and memory availability.
I suggest breaking it into three blocks: one to prepare the query for the site page, one to access the site page, and a last one to analyse the data.
This way the slow block (get) may have more threads allocated to compensate.
Here is how the Dataflow setup would look:

var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };

prepareBlock.LinkTo(getBlock, linkOptions);
getBlock.LinkTo(analiseBlock, linkOptions);

Data will flow from prepareBlock to getBlock and then to analiseBlock.
The interfaces between blocks can be any class; they just have to be the same on both sides of a link. See the full example on Dataflow Pipeline.
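For illustration, the three blocks might be defined like the sketch below. The string payloads, the URL-building logic, and the analysis step are all placeholders; MaxDegreeOfParallelism on the get stage is the "more threads for the slow block" idea from above.

using System;
using System.Net.Http;
using System.Threading.Tasks.Dataflow; // NuGet: System.Threading.Tasks.Dataflow

var http = new HttpClient();

// Stage 1: prepare the query for the site page
var prepareBlock = new TransformBlock<string, string>(site => site + "/search?q=data");

// Stage 2: access the site page; the slow I/O stage gets extra parallelism
var getBlock = new TransformBlock<string, string>(
    url => http.GetStringAsync(url),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 });

// Stage 3: analyse the data
var analiseBlock = new ActionBlock<string>(html => Console.WriteLine(html.Length));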
Using the Dataflow pipeline would then look something like this:

while (...)
{
    ...
    prepareBlock.Post(...); // send data into the pipeline
}

prepareBlock.Complete(); // when done

analiseBlock.Completion.Wait(cancellationTokenSource.Token); // wait for all queues to empty, or cancel