.NET ThreadPool QueueUserWorkItem Synchronization - c#

I am using ThreadPool.QueueUserWorkItem to play some sound files without hanging the GUI while doing so.
It is working but has an undesirable side effect.
While the QueueUserWorkItem CallBack Proc is being executed there is nothing to stop it from starting a new thread. This causes the samples in the threads to overlap.
How can I make it so that it waits for the already running thread to finish running and only then run the next request?
EDIT:
private object sync = new Object();
lock (sync) {
    // ... play the sound here
}
This works: the sounds play in order.
But some samples are getting played more than once when I keep sending sound requests before the one being played ends. Will investigate.
EDIT: Is the above a result of the lock convoy @Aaronaught mentioned?

This is a classic thread synchronization issue, where you have multiple clients that all want to use the same resource and need to control how they access it. In this particular case, the sound system is willing to play more than one sound at the same time (and this is often desirable), but since you don't want that behavior, you can use standard locking to gate access to the sound system:
public static class SequentialSoundPlayer
{
private static Object _soundLock = new object();
public static void PlaySound(Sound sound)
{
ThreadPool.QueueUserWorkItem(AsyncPlaySound, sound);
}
private static void AsyncPlaySound(Object state)
{
lock (_soundLock)
{
Sound sound = (Sound) state;
//Execute your sound playing here...
}
}
}
where Sound is whatever object you're using to represent a sound to be played. This mechanism is 'first come, first served' when multiple sounds vie for play time.
As mentioned in another response, be careful of excessive 'pile-up' of sounds, as you'll start to tie up the ThreadPool.

You could use a single thread with a queue to play all the sounds.
When you want to play a sound, insert a request into the queue and signal to the playing thread that there is a new sound file to be played. The sound playing thread sees the new request and plays it. Once the sound completes, it checks to see if there are any more sounds in the queue and if so plays the next, otherwise it waits for the next request.
One possible problem with this method is that if you have too many sounds to play, you can get an ever-growing backlog, so that sounds may come several seconds or possibly even minutes late. To avoid this you might want to put a limit on the queue size and drop some sounds if you have too many.

A very simple producer/consumer queue would be ideal here - since you only have 1 producer and 1 consumer you can do it with minimal locking.
Don't use a critical section (lock statement) around the actual Play method/operation, as some people are suggesting; you can very easily end up with a lock convoy. You do need to lock, but only for very short periods of time, not while a sound is actually playing, which is an eternity in computer time.
Something like this:
public class SoundPlayer : IDisposable
{
    private readonly int maxSize;
    private readonly Queue<Sound> sounds = new Queue<Sound>();
    private readonly object sync = new object();
    private readonly Thread playThread;
    private bool isTerminated; // only read or written while holding sync

    public SoundPlayer(int maxSize)
    {
        if (maxSize < 1)
            throw new ArgumentOutOfRangeException("maxSize", maxSize,
                "Value must be >= 1.");
        this.maxSize = maxSize;
        this.playThread = new Thread(new ThreadStart(ThreadPlay));
        this.playThread.Start();
    }

    public void Dispose()
    {
        lock (sync)
        {
            isTerminated = true;
            Monitor.PulseAll(sync);
        }
        playThread.Join();
    }

    public void Play(Sound sound)
    {
        lock (sync)
        {
            if (sounds.Count >= maxSize)
            {
                return; // Or throw exception, or block
            }
            sounds.Enqueue(sound);
            Monitor.PulseAll(sync);
        }
    }

    private void PlayInternal(Sound sound)
    {
        // Actually play the sound here
    }

    private void ThreadPlay()
    {
        while (true)
        {
            Sound sound;
            lock (sync)
            {
                while (!isTerminated && (sounds.Count == 0))
                    Monitor.Wait(sync);
                if (isTerminated)
                {
                    return;
                }
                sound = sounds.Dequeue();
            }
            // Play outside the lock so callers of Play() never wait for a whole sound
            PlayInternal(sound);
        }
    }
}
This will allow you to throttle the number of sounds being played by setting maxSize to some reasonable limit, like 5, after which point it will simply discard new requests. The reason I use a Thread instead of ThreadPool is simply to maintain a reference to the managed thread and be able to provide proper cleanup.
This only uses one thread, and one lock, so you'll never have a lock convoy, and will never have sounds playing at the same time.
If you're having any trouble understanding this, or need more detail, have a look at Threading in C# and head over to the "Producer/Consumer Queue" section.

The simplest code you could write would be as follows:
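private object playSoundSync = new object();
public void PlaySound(Sound someSound)
{
    ThreadPool.QueueUserWorkItem(new WaitCallback(delegate
    {
        lock (this.playSoundSync)
        {
            // Actually play someSound here. Do not call PlaySound(someSound)
            // at this point: that would only queue another work item.
        }
    }));
}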
Although very simple, it could potentially cause problems:
If you play a lot of (longer) sounds simultaneously, there will be a lot of waiting on the lock and a lot of thread pool threads get used up.
The order in which you enqueued the sounds is not necessarily the order in which they will be played back.
In practice these problems should only be relevant if you play a lot of sounds frequently or if the sounds are very long.

Another option, if you can make the (major) simplifying assumption that any attempts to play a second sound while the first is still playing will just be ignored, is to use a single event:
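private AutoResetEvent playEvent = new AutoResetEvent(true);
public void Play(Sound sound)
{
    ThreadPool.QueueUserWorkItem(s =>
    {
        // WaitOne(0) returns immediately: true if no sound is playing, false otherwise
        if (playEvent.WaitOne(0))
        {
            try
            {
                // Play the sound here
            }
            finally
            {
                // Always re-signal, even if playback throws
                playEvent.Set();
            }
        }
    });
}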
This one's dead easy, with the obvious disadvantage that it will simply discard "extra" sounds instead of queuing them. But in this case, it may be exactly what you want, and we get to use the thread pool because this function will return instantly if a sound is already playing. It's basically "lock-free."

As per your edit, create your thread like this:
MySounds sounds = new MySounds(...);
// The state object is passed to Start(), not to the Thread constructor
Thread th = new Thread(this.threadMethod);
th.Start(sounds);
And this will be your thread entry point.
private void threadMethod (object obj)
{
MySounds sounds = obj as MySounds;
if (sounds == null) { /* do something */ }
/* play your sounds */
}

The use of ThreadPool is not the error. The error is queueing every sound as a work item. Naturally the thread pool will start more threads. This is what it is supposed to do.
Build your own queue. I have one (AsyncActionQueue). It queues items, and when it has an item it starts a ThreadPool work item: not one per item, ONE (unless one is already queued and not finished). The callback basically dequeues items and processes them.
This allows me to have X queues share Y threads (i.e. not waste threads) and still get very nice async operations. I use that for a complex UI trading application: X windows (6, 8) communicating with a central service cluster (i.e. a number of services), and they all use async queues to move items back and forth (well, mostly forth towards the UI).
One thing you NEED to be aware of, and that has been said already, is that if you overload your queue, it will fall behind. What to do then depends on you. I have a ping/pong message that gets queued regularly to a service (from the window), and if it is not returned in time, the window goes grey marking "I am stale" until it catches up.
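The AsyncActionQueue itself is not shown in the answer, so the following is only a minimal sketch of the idea as described: items go into a queue, and at most one thread pool work item drains it at any time. The class name SingleWorkerQueue and its members are hypothetical, and the sketch assumes the queued actions do not throw.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch, not the answer's actual AsyncActionQueue.
public sealed class SingleWorkerQueue
{
    private readonly Queue<Action> items = new Queue<Action>();
    private readonly object sync = new object();
    private bool draining; // true while a drain work item is queued or running

    public void Enqueue(Action item)
    {
        lock (sync)
        {
            items.Enqueue(item);
            if (draining) return; // the active work item will pick this up
            draining = true;
        }
        // Only reached when no drain is active: start exactly one work item
        ThreadPool.QueueUserWorkItem(_ => Drain());
    }

    private void Drain()
    {
        while (true)
        {
            Action next;
            lock (sync)
            {
                if (items.Count == 0)
                {
                    draining = false; // the next Enqueue starts a new work item
                    return;
                }
                next = items.Dequeue();
            }
            // Run outside the lock, one item at a time, in FIFO order.
            // Assumption: actions do not throw; real code should catch here.
            next();
        }
    }
}
```

For the sound case, each Enqueue call would wrap one playback, for example queue.Enqueue(() => PlayInternal(sound)), so sounds play one after another without holding a lock while they play.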

Microsoft's new TPL Dataflow Library could be a good solution for this sort of thing. Check out the video here - the first code example demonstrated fits your requirements pretty much exactly.
http://channel9.msdn.com/posts/TPL-Dataflow-Tour

Related

Potential race conditions with ConcurrentBag and multithreaded application

I've been wrestling for the past few months with how to improve a process where I'm using a DispatcherTimer to periodically check resources to see if they need to be updated/processed. After updating the resource("Product"), move the Product to the next step in the process, etc. The resource may or may not be available immediately.
The reason I have been struggling is two-fold. One reason is that I want to implement this process asynchronously, since it is just synchronous at the moment. The second reason is that I have identified the area where my implementation is stuck and it seems like not an uncommon design pattern but I have no idea how to describe it succinctly, so I can't figure out how to get a useful answer from google.
A rather important note is that I am accessing these Products via direct USB connection, so I am using LibUsbDotNet to interface with the devices. I have made the USB connections asynchronous so I can connect to multiple Products at the same time and process an arbitrary number at once.
public class Product
{
public bool IsSoftwareUpdated = false;
public bool IsProductInformationCorrect = false;
public bool IsEOLProcessingCompleted = false;
public Product() { }
~Product() { }
}
public class ProcessProduct
{
List<Product> bagOfProducts = new List<Product>(new Product[10]);
ConcurrentBag<Product> UnprocessedUnits = new ConcurrentBag<Product>();
ConcurrentBag<Product> CurrentlyUpdating = new ConcurrentBag<Product>();
ConcurrentBag<Product> CurrentlyVerifyingInfo = new ConcurrentBag<Product>();
ConcurrentBag<Product> FinishedProcessing = new ConcurrentBag<Product>();
DispatcherTimer _timer = new DispatcherTimer();
public ProcessProduct()
{
_timer.Tick += Timer_Tick; //Every 1 second, call Timer_Tick
_timer.Interval = new TimeSpan(0,0,1); //1 Second timer
bagOfProducts.ForEach(o => UnprocessedUnits.Add(o)); //Fill the UnprocessedUnits with all products
StartProcessing();
}
private void StartProcessing()
{
_timer.Start();
}
private void Timer_Tick(object sender, EventArgs e)
{
ProductOrganizationHandler();
foreach(Product prod in CurrentlyUpdating.ToList())
{
UpdateProcessHandler(prod); //Async function that uses await
}
foreach(Product prod in CurrentlyVerifyingInfo.ToList())
{
VerifyingInfoHandler(prod); //Async function that uses Await
}
if(FinishedProcessing.Count == bagOfProducts.Count)
{
_timer.Stop(); //If all items have finished processing, then stop the process
}
}
private void ProductOrganizationHandler()
{
//Take(read REMOVE) Product from each ConcurrentBag 1 by 1 and moves that item to the bag that it needs to go
//depending on which process step is finished
//(or puts it back in the same bag if that step was not finished).
//E.G, all items are moved from UnprocessUnits to CurrentlyUpdating or CurrentlyVerifying etc.
//If a product is finished updating, it is moved from CurrentlyUpdating to CurrentlyVerifying or FinishedProcessing
}
private async void UpdateProcessHandler(Product prod)
{
await Task.Delay(1000).ConfigureAwait(false);
//Does some actual work validating USB communication and then running through the USB update
}
private async void VerifyingInfoHandler(Product prod)
{
await Task.Delay(1000).ConfigureAwait(false);
//Does actual work here and communicates with the product via USB
}
}
Full Compile-ready code example available via my code on Pastebin.
So, my question really is this: Are there any meaningful race conditions in this code? Specifically, with the ProductOrganizationHandler() code and the looping through the ConcurrentBags in Timer_Tick() (since a new call to Timer_Tick() happens every second). I'm sure this code works the majority of the time, but I am afraid of a hard-to-track bug later on that happens because of a rare race condition when, say, ProductOrganizationHandler() takes > 1 sec to run for some dumb reason.
As a secondary note: Is this even the best design pattern for this type of process? C# is my first OOP language and all self-taught on the job (nearly all of my job is Embedded C) so I don't have any formal experience with OOP design patterns.
My main goal is to asynchronously Update/Verify/Communicate with each device as it becomes available via USB. Once all products in the list are finished (or a timeout), then the process finishes. This project is in .NET 5.
EDIT: For anyone that comes along later with the same question, here's what I did.
I did not understand that DispatcherTimer adds Ticks to the Dispatcher queue. This means a tick will only run when no other instance of Tick is running; in other words, Timer_Tick will run to completion before the next Timer_Tick instance runs.
So, most(all?) of the Threading/concurrency concerns I had were unfounded and I can treat the Timer_Tick as a single-threaded non-concurrent function (which it is).
Also, to keep Ticks from piling up, I ran _timer.Stop() at the beginning of Timer_Tick and restarted the timer at the end of Timer_Tick.
First of all, you are using DispatcherTimer, which raises ticks on the UI thread. So as far as I can see there is no multithreading going on in the example. There are other timers, like System.Timers.Timer, that raise events on a background thread, if that is the intent. But if you just want to check and update status every so often, and are not running any code that blocks, just using the UI thread is fine and will simplify things a lot.
Even if we assume ProductOrganizationHandler did run on a worker thread, it would still be generally safe to remove items from one concurrent collection and putting them in another. But it would not guarantee that items are processed in any particular order, nor that any specific item is processed by a given tick of the timer. But since the timer will tick periodically all the items should eventually be processed. Keep in mind that most timers need to be disposed, so you need to handle that somehow, including if the processing is stopped prematurely.
Keep in mind that async does not mean concurrent, so I would not use it unless your USB library provides async methods. Even then I would avoid async void since this promotes exceptions to the captured synchronization context, potentially crashing the application, so it should mostly be used in the outermost layer, like button event handlers, or timers, and then you should probably handle exceptions somehow.
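To make the async void point concrete, here is a small sketch of the alternative: if a handler returns Task, the caller can await it and catch the exception itself. UpdateAsync and RunAsync are hypothetical stand-ins for the question's handlers, not code from the post.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncTaskDemo
{
    // Hypothetical stand-in for a USB update step that may fail
    public static async Task UpdateAsync(bool fail)
    {
        await Task.Delay(10).ConfigureAwait(false);
        if (fail) throw new InvalidOperationException("device not responding");
    }

    // Because UpdateAsync returns Task, the exception surfaces at the await
    // and can be handled here. With async void it could not be caught this way.
    public static async Task<string> RunAsync()
    {
        try
        {
            await UpdateAsync(fail: true);
            return "ok";
        }
        catch (InvalidOperationException ex)
        {
            return "caught: " + ex.Message;
        }
    }
}
```

In the timer scenario, the Tick handler itself would still be async void (event handlers have to be), but it can wrap its awaits in a try/catch like RunAsync does.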
As for the best way to do it, I would take a look at the TPL Dataflow library.

What is the reason for "while(true) { Thread.Sleep }"?

I sometimes encounter code in the following form:
while (true) {
//do something
Thread.Sleep(1000);
}
I was wondering if this is considered good or bad practice and if there are any alternatives.
Usually I "find" such code in the main-function of services.
I recently saw code in the "Run" function in a windows azure worker role which had the following form:
ClassXYZ xyz = new ClassXYZ(); //ClassXYZ creates separate Threads which execute code
while (true) {
Thread.Sleep(1000);
}
I assume there are better ways to prevent a service (or azure worker role) from exiting.
Does anyone have a suggestion for me?
Well when you do that with Thread.Sleep(1000), your processor wastes a tiny amount of time to wake up and do nothing.
You could do something similar with CancellationTokenSource.
When you call WaitOne(), it will wait until it receives a signal.
CancellationTokenSource cancelSource = new CancellationTokenSource();
public override void Run()
{
//do stuff
cancelSource.Token.WaitHandle.WaitOne();
}
public override void OnStop()
{
cancelSource.Cancel();
}
This will keep the Run() method from exiting without wasting your CPU time on busy waiting.
An alternative approach is to use an AutoResetEvent, instantiated non-signaled so that WaitOne() actually blocks until another thread sets it.
public class Program
{
public static readonly AutoResetEvent ResetEvent = new AutoResetEvent(false);
public static void Main(string[] args)
{
Task.Factory.StartNew
(
() =>
{
// Imagine sleep is a long task which ends in 10 seconds
Thread.Sleep(10000);
// We release the whole AutoResetEvent
ResetEvent.Set();
}
);
// Once other thread sets the AutoResetEvent, the program ends
ResetEvent.WaitOne();
}
}
Is the so-called while(true) a bad practice?
Well, in fact, a literal true as the while loop condition may be considered bad practice, since it makes the loop unbreakable from outside: I would always use a variable condition that can evaluate to true or false.
When would I use a while loop, and when something like the AutoResetEvent approach?
When to use a while loop...
...when you need to execute code while waiting for the program to end.
When to use the AutoResetEvent approach...
...when you just need to hold the main thread to prevent the program from ending, and the main thread only has to wait until some other thread requests a program exit.
If you see code like this...
while (true)
{
//do something
Thread.Sleep(1000);
}
It's most likely using Sleep() as a means of waiting for some event to occur: something like user input/interaction, a change in the file system (such as a file being created or modified in a folder), a network or device event, etc. That would suggest using more appropriate tools:
If the code is waiting for a change in the file system, use a FileSystemWatcher.
If the code is waiting for a thread or process to complete, or a network event to occur, use the appropriate synchronization primitive and WaitOne(), WaitAny() or WaitAll() as appropriate. If you use an overload with a timeout in a loop, it gives you cancelability as well.
But without knowing the actual context, it's rather hard to say categorically that it's either good, bad or indifferent. If you've got a daemon running that has to poll on a regular basis (say an NTP client), a loop like that would make perfect sense (though the daemon would need some logic to monitor for shutdown events occurring). And even with something like that, you could replace it with a scheduled task: a different, but not necessarily better, design.
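As an illustration of the "overload with a timeout in a loop" idea, here is a minimal sketch of a polling loop that waits on a stop signal instead of calling Thread.Sleep. PollingLoop and its parameters are hypothetical names; ManualResetEventSlim.Wait(TimeSpan) is the framework call doing the work.

```csharp
using System;
using System.Threading;

public static class PollingLoop
{
    // Runs work once per interval until stopSignal is set.
    // Returns the number of times work was executed.
    public static int Run(ManualResetEventSlim stopSignal, TimeSpan interval, Action work)
    {
        int iterations = 0;
        // Wait returns true as soon as the signal is set,
        // and false when the interval elapses without a signal.
        while (!stopSignal.Wait(interval))
        {
            work();
            iterations++;
        }
        return iterations;
    }
}
```

Unlike Sleep(1000), a shutdown request (stopSignal.Set() from another thread) ends the wait immediately rather than after the remainder of the interval.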
If you use while(true) you have no programmatic means of ending the loop from outside the loop.
I'd prefer, at least, a while(mySingletonValue) which would allow us to switch the loop as needed.
An additional approach would be to separate the functional behavior from the looping behavior. Your loop may still be infinite, but it calls a function defined elsewhere. The looping behavior is then completely isolated from what is being executed by the loop:
while(GetMySingletonValue())
{
someFunction();
}
In this way your singleton controls the looping behavior entirely.
There are better ways to keep the Azure service running and exit when needed.
Refer:
http://magnusmartensson.com/howto-wait-in-a-workerrole-using-system-timers-timer-and-system-threading-eventwaithandle-over-system-threading-thread-sleep
http://blogs.lessthandot.com/index.php/DesktopDev/MSTech/azure-worker-role-exiting-safely/
It really depends on that //do something on how it determines when to break out of the loop.
In general terms, a more appropriate way to do it is to use some synchronization primitive (like ManualResetEvent) to wait on, and have the code that processes and triggers the break of the loop (on the other thread) signal that primitive. This way you don't have a thread wasting resources by being scheduled every second to do nothing, and it is a much cleaner way to do it.
I personally don't like Thread.Sleep code, because it blocks the thread that calls it. If it is a Windows application, you can write something like this instead, which gives you more flexibility and can be awaited:
bool switchControl = true;
while (switchControl) {
    //do something
    await Wait(1);
}
// Returns a Task (not void) so that callers can await it
async Task Wait(int seconds)
{
    // Task.Delay yields the thread instead of spinning on Application.DoEvents()
    await Task.Delay(TimeSpan.FromSeconds(seconds));
}

Two threads one core

I'm playing around with a simple console app that creates one thread and I do some inter thread communication between the main and the worker thread.
I'm posting objects from the main thread to a concurrent queue and the worker thread is dequeueing that and does some processing.
What strikes me as odd is that when I profile this app, only one core is used, even though I have two cores.
One core is 100% free and the other core has done all the work, and I see that both threads have been running on that core.
Why is this?
Is it because I use a wait handle that sets when I post a message and releases when the processing is done?
This is my sample code, now using 2 worker threads.
It still behaves the same: main, worker1 and worker2 are all running on the same core.
Ideas?
[EDIT]
It sort of works now; at least, I get twice the performance compared to yesterday.
The trick was to slow down the consumer just enough to avoid signaling using the AutoResetEvent.
public class SingleThreadDispatcher
{
public long Count;
private readonly ConcurrentQueue<Action> _queue = new ConcurrentQueue<Action>();
private volatile bool _hasMoreTasks;
private volatile bool _running = true;
private int _status;
private readonly AutoResetEvent _signal = new AutoResetEvent(false);
public SingleThreadDispatcher()
{
var thread = new Thread(Run)
{
IsBackground = true,
Name = "worker" + Guid.NewGuid(),
};
thread.Start();
}
private void Run()
{
while (_running)
{
_signal.WaitOne();
do
{
_hasMoreTasks = false;
Action task;
while (_queue.TryDequeue(out task) && _running)
{
Count ++;
task();
}
//wait a short while to let _hasMoreTasks to maybe be set to true
//this avoids the roundtrip to the AutoResetEvent
//that is, if there is intense pressure on the pool, we let some new
//tasks have the chance to arrive and be processed w/o signaling
if(!_hasMoreTasks)
Thread.Sleep(5);
Interlocked.Exchange(ref _status, 0);
} while (_hasMoreTasks);
}
}
public void Schedule(Action task)
{
_hasMoreTasks = true;
_queue.Enqueue(task);
SetSignal();
}
private void SetSignal()
{
if (Interlocked.Exchange(ref _status, 1) == 0)
{
_signal.Set();
}
}
}
Is it because I use a wait handle that sets when I post a message and releases when the processing is done?
Without seeing your code it is hard to say for sure, but from your description it appears that the two threads that you wrote act as co-routines: when the main thread is running, the worker thread has nothing to do, and vice versa. It looks like .NET scheduler is smart enough to not load the second core when this happens.
You can change this behavior in several ways - for example
by doing some work on the main thread before waiting on the handle, or
by adding more worker threads that would compete for the tasks that your main thread posts, and could both get a task to work on.
OK, I've figured out what the problem is.
The producer and the consumer are pretty much equally fast in this case.
This results in the consumer finishing all its work quickly and then looping back to wait for the AutoResetEvent.
The next time the producer sends a task, it has to touch the AutoResetEvent and set it.
The solution was to add a very small delay in the consumer, making it slightly slower than the producer.
As a result, when the producer sends a task, it notices that the consumer is already active and just has to post to the worker queue without touching the AutoResetEvent.
The original behavior resulted in a sort of ping-pong effect, that can be seen on the screenshot.
dasblinkenlight (probably) has the right answer.
Apart from that, it would also be the correct behaviour when one of your threads is I/O bound (that is, it's not stuck on the CPU) - in that case, you've got nothing to gain from using multiple cores, and .NET is smart enough to just change contexts on one core.
This is often the case for UI threads - it has very little work to do, so there usually isn't much of a reason for it to occupy a whole core for itself. And yes, if your concurrent queue is not used properly, it could simply mean that the main thread waits for the worker thread - again, in that case, there's no need to switch cores, since the original thread is waiting anyway.
You should use BlockingCollection rather than ConcurrentQueue. By default, BlockingCollection uses a ConcurrentQueue under the hood, but it has a much easier to use interface. In particular, it does non-busy waits. In addition, BlockingCollection supports cancellation, so your consumer becomes very simple. Here's an example:
public class SingleThreadDispatcher
{
public long Count;
private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
private readonly CancellationTokenSource _cancellation = new CancellationTokenSource();
public SingleThreadDispatcher()
{
var thread = new Thread(Run)
{
IsBackground = true,
Name = "worker" + Guid.NewGuid(),
};
thread.Start();
}
private void Run()
{
foreach (var task in _queue.GetConsumingEnumerable(_cancellation.Token))
{
Count++;
task();
}
}
public void Schedule(Action task)
{
_queue.Add(task);
}
}
The loop with GetConsumingEnumerable will do a non-busy wait on the queue. There's no need to do it with a separate event. It will wait for an item to be added to the queue, or it will exit if you set the cancellation token.
To stop it normally, you just call _queue.CompleteAdding(). That tells the consumer that no more items will be added to the queue. The consumer will empty the queue and then exit.
If you want to quit early, then just call _cancellation.Cancel(). That will cause GetConsumingEnumerable to exit.
In general, you shouldn't ever have to use ConcurrentQueue directly. BlockingCollection is easier to use and provides equivalent performance.
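For the shutdown path described above, a minimal self-contained sketch may help: the producer adds items, calls CompleteAdding(), and the consumer drains the queue and exits on its own. BlockingCollectionDemo is a hypothetical name used only for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class BlockingCollectionDemo
{
    // Queues 100 small tasks, lets one consumer thread run them, and
    // returns how many were processed once the consumer has exited.
    public static int Run()
    {
        var queue = new BlockingCollection<Action>();
        int processed = 0;

        var consumer = new Thread(() =>
        {
            // Blocks (without spinning) while the queue is empty and
            // ends once CompleteAdding was called and the queue is drained.
            foreach (var task in queue.GetConsumingEnumerable())
                task();
        });
        consumer.Start();

        for (int i = 0; i < 100; i++)
            queue.Add(() => processed++); // only the consumer thread runs this

        queue.CompleteAdding(); // no more items: the consumer drains and exits
        consumer.Join();
        return processed;
    }
}
```

No event, flag, or Sleep is needed here; the collection provides both the blocking wait and the "no more work" signal.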

Can I optimise this concurrency better?

I've recently begun my first multi-threading code, and I'd appreciate some comments.
It delivers video samples from a buffer that is filled in the background by a stream parser (outside the scope of this question). If the buffer is empty, it needs to wait until the buffer level becomes acceptable and then continue.
Code is for Silverlight 4, some error-checking removed:
// External class requests samples - can happen multiple times concurrently
protected override void GetSampleAsync()
{
Interlocked.Add(ref getVideoSampleRequestsOutstanding, 1);
}
// Runs on a background thread
void DoVideoPumping()
{
do
{
if (getVideoSampleRequestsOutstanding > 0)
{
PumpNextVideoSample();
// Decrement the counter
Interlocked.Add(ref getVideoSampleRequestsOutstanding, -1);
}
else Thread.Sleep(0);
} while (!this.StopAllBackgroundThreads);
}
void PumpNextVideoSample()
{
// If the video sample buffer is empty, tell stream parser to give us more samples
bool MyVidBufferIsEmpty = false; bool parserAtEndOfStream = false;
ParseMoreSamplesIfMyVideoBufferIsLow(ref MyVidBufferIsEmpty, ref parserAtEndOfStream);
if (parserAtEndOfStream) // No more data, start running down buffers
this.RunningDownVideoBuffer = true;
else if (MyVidBufferIsEmpty)
{
// Buffer is empty, wait for samples
WaitingOnEmptyVideoBuffer = true;
WaitOnEmptyVideoBuffer.WaitOne();
}
// Buffer is OK
nextSample = DeQueueVideoSample(); // thread-safe, returns NULL if a problem
// Send the sample to the external renderer
ReportGetSampleCompleted(nextSample);
}
The code seems to work well. However, I'm told that using Thread.Sleep(...) is 'evil': when no samples are being requested, my code loops unnecessarily, eating up CPU time.
Can my code be further optimised? Since my class is designed for an environment where samples WILL be requested, does the potential 'pointless loop' scenario outweigh the simplicity of its current design?
Comments much appreciated.
This looks like the classic producer/consumer pattern. The normal way to solve this is with what is known as a blocking queue.
Version 4.0 of .net introduced a set of efficient, well-designed, concurrent collection classes for this very type of problem. I think BlockingCollection<T> will serve your present needs.
If you don't have access to .net 4.0 then there are many websites containing implementations of blocking queues. Personally my standard reference is Joe Duffy's book, Concurrent Programming on Windows. A good start would be Marc Gravell's blocking queue presented here in Stack Overflow.
The first advantage of using a blocking queue is that you stop using busy wait loops, hacky calls to Sleep() etc. Using a blocking queue to avoid this sort of code is always a good idea.
However, I perceive a more important benefit to using a blocking queue. At the moment your code to produce work items, consume them, and handle the queue is all intermingled. If you use a blocking queue correctly then you will end up with much better factored code which keeps separate various components of the algorithm: queue, producer and consumer.
You have one main problem: Thread.Sleep()
It has a granularity of ~20ms, which is rather crude for video. In addition, Sleep(0) can starve lower-priority threads.
The better approach is waiting on a WaitHandle, preferably built into a queue.
Blocking queue is a good and simple example of a blocking queue.
The key point is that the threads need to be coordinated with signals, not by checking the value of a counter or the state of a data structure. Any checking takes resources (CPU), and thus you need signals (Monitor.Wait and Monitor.Pulse).
You could use an AutoResetEvent rather than a manual Thread.Sleep. It's fairly simple to do so:
AutoResetEvent e = new AutoResetEvent(false);
void RequestSample()
{
Interlocked.Increment(ref requestsOutstanding);
e.Set(); //also set this when StopAllBackgroundThreads=true!
}
void Pump()
{
while (!this.StopAllBackgroundThreads) {
e.WaitOne();
int leftOver = Interlocked.Decrement(ref requestsOutstanding);
while(leftOver >= 0) {
PumpNextVideoSample();
leftOver = Interlocked.Decrement(ref requestsOutstanding);
}
Interlocked.Increment(ref requestsOutstanding);
}
}
Note that it's probably even more attractive to implement a semaphore. Basically; synchronization overhead is liable to be almost nil anyhow in your scenario, and a simpler programming model is worth it. With a semaphore, you'd have something like this:
MySemaphore sem;
void RequestSample()
{
sem.Release();
}
void Pump()
{
while (true) {
sem.Acquire();
if(this.StopAllBackgroundThreads) break;
PumpNextVideoSample();
}
}
...I'd say the simplicity is worth it!
e.g. a simple implementation of a semaphore:
public sealed class SimpleSemaphore
{
readonly object sync = new object();
public int val;
public void WaitOne()
{
lock(sync) {
while(true) {
if(val > 0) {
val--;
return;
}
Monitor.Wait(sync);
}
}
}
public void Release()
{
lock(sync) {
if(val==int.MaxValue)
throw new Exception("Too many releases without waits.");
val++;
Monitor.Pulse(sync);
}
}
}
On one trivial benchmark this trivial implementation needs ~1.7 seconds where Semaphore needs 7.5 and SemaphoreSlim needs 1.1; surprisingly reasonable, in other words.
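Where the framework's own SemaphoreSlim is available, the pump can be sketched with it directly instead of a hand-rolled semaphore. This is only an illustration under that assumption (the question targets Silverlight 4, which may not offer it); SamplePump and its members are hypothetical, and the counter stands in for PumpNextVideoSample().

```csharp
using System;
using System.Threading;

// Hypothetical sketch: one Release per requested sample, one Wait per pumped sample.
public sealed class SamplePump
{
    private readonly SemaphoreSlim requests = new SemaphoreSlim(0);
    private volatile bool stop;
    private int pumped;

    public int Pumped { get { return Volatile.Read(ref pumped); } }

    // Called by the external class, possibly from several threads at once
    public void RequestSample()
    {
        requests.Release();
    }

    public void Stop()
    {
        stop = true;
        requests.Release(); // wake the pump so it can observe the stop flag
    }

    // Runs on the background thread
    public void Pump()
    {
        while (true)
        {
            requests.Wait(); // blocks without spinning until a request arrives
            if (stop) break;
            Interlocked.Increment(ref pumped); // stand-in for PumpNextVideoSample()
        }
    }
}
```

Note that Stop() may end the loop before all outstanding requests are served, which matches the answer's loop (it checks the stop flag before pumping).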

Multiple Threads

I post a lot here regarding multithreading, and the great Stack Overflow community has helped me a lot in understanding multithreading.
All the examples I have seen online only deal with one thread.
My application is a scraper for an insurance company (family company ... all free of charge). Anyway, the user is able to select how many threads they want to run. So let's say, for example, the user wants the application to scrape 5 sites at one time, and then later in the day he chooses 20 threads because his computer isn't doing anything else, so it has the resources to spare.
Basically the application builds a list of, say, 1000 sites to scrape. A thread goes off and does that, updates the UI and builds the list.
When that's finished, another thread is called to start the scraping. Depending on the number of threads the user has chosen, it will create x number of threads.
What's the best way to create these threads? Should I create 1000 threads in a list and loop through them? If the user has set 5 threads to run, it will loop through 5 at a time.
I understand threading, but it's the application logic which is catching me out.
Any ideas or resources on the web that can help me out?
You could consider using a thread pool for that:
using System;
using System.Threading;
public class Example
{
    public static void Main()
    {
        ThreadPool.SetMaxThreads(100, 10);
        // Queue the task.
        ThreadPool.QueueUserWorkItem(new WaitCallback(ThreadProc));
        Console.WriteLine("Main thread does some work, then sleeps.");
        Thread.Sleep(1000);
        Console.WriteLine("Main thread exits.");
    }
    // This thread procedure performs the task.
    static void ThreadProc(Object stateInfo)
    {
        Console.WriteLine("Hello from the thread pool.");
    }
}
This scraper, does it use a lot of CPU when it's running?
If it does a lot of communication with these 1000 remote sites, downloading their pages, that may be taking more time than the actual analysis of the pages.
And how many CPU cores does your user have? If they have 2 (which is common these days) then beyond two simultaneous threads performing analysis, they aren't going to see any speed up.
So you probably need to "parallelize" the downloading of the pages. I doubt you need to do the same for the analysis of the pages.
Take a look into asynchronous IO, instead of explicit multi-threading. It lets you launch a bunch of downloads in parallel and then get called back when each one completes.
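As a rough illustration of the asynchronous approach, here is a minimal sketch that starts all downloads without dedicating a thread to each one, while capping how many are in flight at once. The names ParallelDownloader, DownloadAllAsync, fetch and maxConcurrent are made up for this example; the fetch delegate stands in for whatever download call you actually use (for instance HttpClient.GetStringAsync), and it assumes a .NET version with async/await.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelDownloader
{
    // 'fetch' is a hypothetical stand-in for the real download call,
    // e.g. url => httpClient.GetStringAsync(url).
    public static async Task<IDictionary<string, string>> DownloadAllAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> fetch,
        int maxConcurrent)
    {
        var results = new ConcurrentDictionary<string, string>();
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            // ToList() forces the lazy Select to run, so every task is created now.
            var tasks = urls.Select(async url =>
            {
                await gate.WaitAsync();          // wait for a free download slot
                try
                {
                    results[url] = await fetch(url);
                }
                finally
                {
                    gate.Release();              // free the slot for the next URL
                }
            }).ToList();

            // Completes only after every download has finished (or failed).
            await Task.WhenAll(tasks);
        }
        return results;
    }
}
```

The user's "number of threads" setting maps naturally onto maxConcurrent here: it limits simultaneous downloads rather than the number of threads created.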
If you really just want the application, use something someone else already spent time developing and perfecting:
http://arachnode.net/
arachnode.net is a complete and comprehensive .NET web crawler for downloading, indexing and storing Internet content including e-mail addresses, files, hyperlinks, images, and Web pages.
Whether interested or involved in screen scraping, data mining, text mining, research or any other application where a high-performance crawling application is key to the success of your endeavors, arachnode.net provides the solution you need for success.
If you also want to write one yourself because it's a fun thing to write (I wrote one not long ago, and yes, it is a lot of fun), then you can refer to this PDF provided by arachnode.net, which explains in detail the theory behind a good web crawler:
http://arachnode.net/media/Default.aspx?Sort=Downloads&PageIndex=1
Download the pdf entitled: "Crawling the Web" (second link from top). Scroll to Section 2.6 entitled: "2.6 Multi-threaded Crawlers". That's what I used to build my crawler, and I must say, I think it works quite well.
I think this example is basically what you need.
public class WebScraper
{
    private readonly int totalThreads;
    private readonly List<System.Threading.Thread> threads;
    private readonly List<Exception> exceptions;
    private readonly object locker = new object();
    private volatile bool stop;
    public WebScraper(int totalThreads)
    {
        this.totalThreads = totalThreads;
        threads = new List<System.Threading.Thread>(totalThreads);
        exceptions = new List<Exception>();
        for (int i = 0; i < totalThreads; i++)
        {
            var thread = new System.Threading.Thread(Execute);
            thread.IsBackground = true;
            threads.Add(thread);
        }
    }
    public void Start()
    {
        foreach (var thread in threads)
        {
            thread.Start();
        }
    }
    public void Stop()
    {
        stop = true;
        foreach (var thread in threads)
        {
            if (thread.IsAlive)
            {
                thread.Join();
            }
        }
    }
    private void Execute()
    {
        try
        {
            while (!stop)
            {
                // Scrape away!
            }
        }
        catch (Exception ex)
        {
            lock (locker)
            {
                // You could have a thread checking this collection and
                // reporting it as you see fit.
                exceptions.Add(ex);
            }
        }
    }
}
The basic logic is:
Put the URLs to scrape into a single queue object that every thread can access, then create your threads. Let each thread run a loop:
1. lock the queue
2. check if there are items in the queue; if not, unlock the queue and end the thread
3. dequeue the first item in the queue
4. unlock the queue
5. process the item
6. invoke an event that updates the UI (remember to lock the UI controller)
7. return to step 1
Just let the threads do the "get stuff from the queue" part (pulling the jobs) instead of giving them the URLs (pushing the jobs); that way you just say
YourThreadManager.StartThreads(numberOfThreadsTheUserWants);
and everything else happens automagically. See the other replies to find out how to create and manage the threads.
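The pull loop described above could look roughly like the following. This is only a sketch: QueueWorkers, Run and processItem are names invented for the example, the UI update step is left out, and processItem is a placeholder for the actual scraping of one URL.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class QueueWorkers
{
    private readonly Queue<string> queue;
    private readonly object queueLock = new object();
    private readonly Action<string> processItem; // placeholder for the real scraping work

    public QueueWorkers(IEnumerable<string> urls, Action<string> processItem)
    {
        queue = new Queue<string>(urls);
        this.processItem = processItem;
    }

    // Starts the requested number of threads and blocks until the queue is drained.
    public void Run(int threadCount)
    {
        var threads = new List<Thread>();
        for (int i = 0; i < threadCount; i++)
        {
            var thread = new Thread(WorkLoop) { IsBackground = true };
            threads.Add(thread);
            thread.Start();
        }
        foreach (var thread in threads)
        {
            thread.Join();
        }
    }

    private void WorkLoop()
    {
        while (true)
        {
            string url;
            lock (queueLock)
            {
                if (queue.Count == 0)
                {
                    return; // nothing left to do: end this thread
                }
                url = queue.Dequeue();
            }
            // The lock is released before the slow part, so other threads
            // can keep pulling items while this one works.
            processItem(url);
        }
    }
}
```

With this shape, changing the number of threads is just a different argument to Run; the threads share out the 1000 URLs among themselves.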
I solved a similar problem by creating a worker class that uses a callback to signal the main app that a worker is done. Then I create a queue of 1000 threads and then call a method that launches threads until the running thread limit is reached, keeping track of the active threads with a dictionary keyed by the thread's ManagedThreadId. As each thread completes, the callback removes its thread from the dictionary and calls the thread launcher.
If a connection is dropped or times out, the callback reinserts the thread back into the queue. Lock around the queue and the dictionary. I create threads vs using the thread pool because the overhead of creating a thread is insignificant compared to the connection time, and it allows me to have a lot more threads in flight. The callback also provides a convenient place with which to update the user interface, even allowing you to change the thread limit while it's running. I've had over 50 open connections at one time. Remember to increase the maxconnection setting in your app.config (default is two).
I would use a queue and a condition variable and mutex, and start just the requested number of threads, for example, 5 or 20 (and not start 1,000).
Each thread blocks on the condition variable. When woken up, it dequeues the first item, unlocks the queue, works with the item, locks the queue and checks for more items. If the queue is empty, sleep on the condition variable. If not, unlock, work, repeat.
While the mutex is locked, it can also check if the user has requested the count of threads to be reduced. Just check if count > max_count, and if so, the thread terminates itself.
Any time you have more sites to queue, just lock the mutex and add them to the queue, then broadcast on the condition variable. Any threads that are not already working will wake up and take new work.
Any time the user increases the requested thread count, just start them up and they will lock the queue, check for work, and either sleep on the condition variable or get going.
Each thread will be continually pulling more work from the queue, or sleeping. You don't need more than 5 or 20.
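In .NET, a plain object used with Monitor can play the role of both the mutex and the condition variable (Monitor.Wait sleeps, Monitor.PulseAll broadcasts). A minimal sketch of the scheme above under that assumption follows; BlockingWorkQueue, Enqueue, SetThreadCount and the work delegate are hypothetical names chosen for the example, and exception handling inside the work delegate is omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class BlockingWorkQueue
{
    private readonly Queue<string> queue = new Queue<string>();
    private readonly object sync = new object(); // mutex and condition variable in one
    private readonly Action<string> work;        // placeholder for scraping one site
    private int running;                         // worker threads currently alive
    private int maxCount;                        // thread count the user asked for

    public BlockingWorkQueue(Action<string> work)
    {
        this.work = work;
    }

    // Adds more sites and wakes every sleeping worker.
    public void Enqueue(IEnumerable<string> items)
    {
        lock (sync)
        {
            foreach (var item in items)
            {
                queue.Enqueue(item);
            }
            Monitor.PulseAll(sync);
        }
    }

    // Raises or lowers the number of workers while the queue is in use.
    public void SetThreadCount(int count)
    {
        lock (sync)
        {
            maxCount = count;
            while (running < maxCount)
            {
                running++;
                new Thread(WorkLoop) { IsBackground = true }.Start();
            }
            // Wake sleepers so that surplus threads notice and terminate.
            Monitor.PulseAll(sync);
        }
    }

    private void WorkLoop()
    {
        while (true)
        {
            string item;
            lock (sync)
            {
                while (true)
                {
                    if (running > maxCount)
                    {
                        running--;
                        return;             // user reduced the count: this thread exits
                    }
                    if (queue.Count > 0)
                    {
                        break;              // there is work to take
                    }
                    Monitor.Wait(sync);     // sleep until Enqueue or SetThreadCount pulses
                }
                item = queue.Dequeue();
            }
            work(item);                     // process outside the lock
        }
    }
}
```

Calling SetThreadCount(0) lets all workers wind down once they finish their current item.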
Consider using the event-based asynchronous pattern (the AsyncOperation and AsyncOperationManager classes).
You might want to take a look at the ProcessQueue article on CodeProject.
Essentially, you'll want to create (and start) the number of threads that are appropriate, in your case that number comes from the user. Each of these threads should process a site, then find the next site needed to process. Even if you don't use the object itself (though it sounds like it would suit your purposes pretty well, though I'm obviously biased!) it should give you some good insight into how this sort of thing would be done.
