Limiting the number of threadpool threads - c#

I am using ThreadPool in my application. I have first set the limit of the thread pool by using the following:
ThreadPool.SetMaxThreads(m_iThreadPoolLimit,m_iThreadPoolLimit);
m_Events = new ManualResetEvent(false);
and then I have queued up the jobs using the following
WaitCallback objWcb = new WaitCallback(abc);
ThreadPool.QueueUserWorkItem(objWcb, m_objThreadData);
Here abc is the name of the function that I am calling.
After this I am doing the following so that all my threads come to 1 point and the main thread takes over and continues further
m_Events.WaitOne();
My thread limit is 3. The problem that I am facing is, inspite of the thread pool limit set to 3, my application is processing more than 3 files at the same time, whereas it was supposed to process only 3 files at a time. Please help me solve this issue.

What kind of computer are you using?
From MSDN
You cannot set the number of worker
threads or the number of I/O
completion threads to a number smaller
than the number of processors in the
computer.
If you have 4 cores, then the smallest you can have is 4.
Also note:
If the common language runtime is
hosted, for example by Internet
Information Services (IIS) or SQL
Server, the host can limit or prevent
changes to the thread pool size.
If this is a web site hosted by IIS then you cannot change the thread pool size either.

A better solution involves the use of a Semaphore which can throttle the concurrent access to a resource1. In your case the resource would simply be a block of code that processes work items.
var finished = new CountdownEvent(1); // Used to wait for the completion of all work items.
var throttle = new Semaphore(3, 3); // Used to throttle the processing of work items.
foreach (WorkItem item in workitems)
{
finished.AddCount();
WorkItem capture = item; // Needed to safely capture the loop variable.
ThreadPool.QueueUserWorkItem(
(state) =>
{
throttle.WaitOne();
try
{
ProcessWorkItem(capture);
}
finally
{
throttle.Release();
finished.Signal();
}
}, null);
}
finished.Signal();
finished.Wait();
In the code above WorkItem is a hypothetical class that encapsulates the specific parameters needed to process your tasks.
The Task Parallel Library makes this pattern a lot easier. Just use the Parallel.ForEach method and specify a ParallelOptions.MaxDegreesOfParallelism that throttles the concurrency.
var options = new ParallelOptions();
options.MaxDegreeOfParallelism = 3;
Parallel.ForEach(workitems, options,
(item) =>
{
ProcessWorkItem(item);
});
1I should point out that I do not like blocking ThreadPool threads using a Semaphore or any blocking device. It basically wastes the threads. You might want to rethink your design entirely.

You should use Semaphore object to limit concurent threads.

You say the files are open: are they actually being actively processed, or just left open?
If you're leaving them open: Been there, done that! Relying on connections and resources (it was a DB connection in my case) to close at end of scope should work, but it can take for the dispose / garbage collection to kick in.

Related

How can I make sure a dataflow block only creates threads on a on-demand basis?

I've written a small pipeline using the TPL Dataflow API which receives data from multiple threads and performs handling on them.
Setup 1
When I configure it to use MaxDegreeOfParallelism = Environment.ProcessorCount (comes to 8 in my case) for each block, I notice it fills up buffers in multiple threads and processing the second block doesn't start until +- 1700 elements have been received across all threads. You can see this in action here.
Setup 2
When I set MaxDegreeOfParallelism = 1 then I notice all elements are received on a single thread and processing the sending already starts after +- 40 elements are received. Data here.
Setup 3
When I set MaxDegreeOfParallelism = 1 and I introduce a delay of 1000ms before sending each input, I notice elements get sent as soon as they are received and every received element is put on a separate thread. Data here.
So far the setup. My questions are the following:
When I compare setups 1 & 2 I notice that processing elements starts much faster when done in serial compared to parallel (even after accounting for the fact that parallel has 8x as many threads). What causes this difference?
Since this will be run in an ASP.NET environment, I don't want to spawn unnecessary threads since they all come from a single threadpool. As shown in setup 3 it will still spread itself over multiple threads even when there is only a handful of data. This is also surprising because from setup 1 I would assume that data is spread sequentially over threads (notice how the first 50 elements all go to thread 16). Can I make sure it only creates new threads on a on-demand basis?
There is another concept called the BufferBlock<T>. If the TransformBlock<T> already queues input, what would be the practical difference of swapping the first step in my pipeline (ReceiveElement) for a BufferBlock?
class Program
{
static void Main(string[] args)
{
var dataflowProcessor = new DataflowProcessor<string>();
var amountOfTasks = 5;
var tasks = new Task[amountOfTasks];
for (var i = 0; i < amountOfTasks; i++)
{
tasks[i] = SpawnThread(dataflowProcessor, $"Task {i + 1}");
}
foreach (var task in tasks)
{
task.Start();
}
Task.WaitAll(tasks);
Console.WriteLine("Finished feeding threads"); // Needs to use async main
Console.Read();
}
private static Task SpawnThread(DataflowProcessor<string> dataflowProcessor, string taskName)
{
return new Task(async () =>
{
await FeedData(dataflowProcessor, taskName);
});
}
private static async Task FeedData(DataflowProcessor<string> dataflowProcessor, string threadName)
{
foreach (var i in Enumerable.Range(0, short.MaxValue))
{
await Task.Delay(1000); // Only used for the delayedSerialProcessing test
dataflowProcessor.Process($"Thread name: {threadName}\t Thread ID:{Thread.CurrentThread.ManagedThreadId}\t Value:{i}");
}
}
}
public class DataflowProcessor<T>
{
private static readonly ExecutionDataflowBlockOptions ExecutionOptions = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount
};
private static readonly TransformBlock<T, T> ReceiveElement = new TransformBlock<T, T>(element =>
{
Console.WriteLine($"Processing received element in thread {Thread.CurrentThread.ManagedThreadId}");
return element;
}, ExecutionOptions);
private static readonly ActionBlock<T> SendElement = new ActionBlock<T>(element =>
{
Console.WriteLine($"Processing sent element in thread {Thread.CurrentThread.ManagedThreadId}");
Console.WriteLine(element);
}, ExecutionOptions);
static DataflowProcessor()
{
ReceiveElement.LinkTo(SendElement);
ReceiveElement.Completion.ContinueWith(x =>
{
if (x.IsFaulted)
{
((IDataflowBlock) ReceiveElement).Fault(x.Exception);
}
else
{
ReceiveElement.Complete();
}
});
}
public void Process(T newElement)
{
ReceiveElement.Post(newElement);
}
}
Before you deploy your solution to the ASP.NET environment, I suggest you to change your architecture: IIS can suspend threads in ASP.NET for it's own use after the request handled so your task could be unfinished. Better approach is to create a separate windows service daemon, which handles your dataflow.
Now back to the TPL Dataflow.
I love the TPL Dataflow library but it's documentation is a real mess.
The only useful document I've found is Introduction to TPL Dataflow.
There are some clues in it which can be helpful, especially the ones about Configuration Settings (I suggest you to investigate the implementing your own TaskScheduler with using your own TheadPool implementation, and MaxMessagesPerTask option) if you need:
The built-in dataflow blocks are configurable, with a wealth of control provided over how and where blocks perform their work. Here are some key knobs available to the developer, all of which are exposed through the DataflowBlockOptions class and its derived types (ExecutionDataflowBlockOptions and GroupingDataflowBlockOptions), instances of which may be provided to blocks at construction time.
TaskScheduler customization, as #i3arnon mentioned:
By default, dataflow blocks schedule work to TaskScheduler.Default, which targets the internal workings of the .NET ThreadPool.
MaxDegreeOfParallelism
It defaults to 1, meaning only one thing may happen in a block at a time. If set to a value higher than 1, that number of messages may be processed concurrently by the block. If set to DataflowBlockOptions.Unbounded (-1), any number of messages may be processed concurrently, with the maximum automatically managed by the underlying scheduler targeted by the dataflow block. Note that MaxDegreeOfParallelism is a maximum, not a requirement.
MaxMessagesPerTask
TPL Dataflow is focused on both efficiency and control. Where there are necessary trade-offs between the two, the system strives to provide a quality default but also enable the developer to customize behavior according to a particular situation. One such example is the trade-off between performance and fairness. By default, dataflow blocks try to minimize the number of task objects that are necessary to process all of their data. This provides for very efficient execution; as long as a block has data available to be processed, that block’s tasks will remain to process the available data, only retiring when no more data is available (until data is available again, at which point more tasks will be spun up). However, this can lead to problems of fairness. If the system is currently saturated processing data from a given set of blocks, and then data arrives at other blocks, those latter blocks will either need to wait for the first blocks to finish processing before they’re able to begin, or alternatively risk oversubscribing the system. This may or may not be the correct behavior for a given situation. To address this, the MaxMessagesPerTask option exists.
It defaults to DataflowBlockOptions.Unbounded (-1), meaning that there is no maximum. However, if set to a positive number, that number will represent the maximum number of messages a given block may use a single task to process. Once that limit is reached, the block must retire the task and replace it with a replica to continue processing. These replicas are treated fairly with regards to all other tasks scheduled to the scheduler, allowing blocks to achieve a modicum of fairness between them. In the extreme, if MaxMessagesPerTask is set to 1, a single task will be used per message, achieving ultimate fairness at the potential expense of more tasks than may otherwise have been necessary.
MaxNumberOfGroups
The grouping blocks are capable of tracking how many groups they’ve produced, and automatically complete themselves (declining further offered messages) after that number of groups has been generated. By default, the number of groups is DataflowBlockOptions.Unbounded (-1), but it may be explicitly set to a value greater than one.
CancellationToken
This token is monitored during the dataflow block’s lifetime. If a cancellation request arrives prior to the block’s completion, the block will cease operation as politely and quickly as possible.
Greedy
By default, target blocks are greedy and want all data offered to them.
BoundedCapacity
This is the limit on the number of items the block may be storing and have in flight at any one time.

Multi-threading potentially long running operations

I am writing a Windows Service. I have a 'backlog' (in SQL) of records that have to be processed by the service. The backlog might also be empty. The record processing is a potentially very long running operation (3+ minutes).
I have a class and method in it which would go to the SQL, choose a record and process it, if there are any records to process. Then the method will exist and that's it. Note: I can't know in advance which records will be processed - the class method decides this as part of its logic.
I want to achieve parallel processing. I want to have X number of workers (where X is the optimal for the host PC) at any time. While the backlog is empty, those workers finish their jobs and exit pretty quickly (~50-100ms, maybe). I want any 'freed' worker to start over again (i.e. re-run).
I have done some reading and I deduct that ThreadPool is not a good option for long-running operations. The .net 4.0+ parallel library is not a good option either, as I don't want to wait all workers to finish and I don't want to predefine/declare in advance the tasks.
In layman terms I want to have X workers who query the data source for items and when some of them find such - operate on it, the rest would continue to look for newly pushed items into the backlog.
What would be the best approach? I think I will have to manage the threads entirely by myself? i.e. first step - determine the optimum number of threads (perhaps by checking the Environment.ProcessorCount) and then start the X threads. Monitor for IsAlive on each thread and restart it? This seems awfully unprofessional.
Any suggestions?
You can start one task per core,As tasks finish start new ones.You can use numOfThreads depending on ProcessorCount or specific number
int numOfThreads = System.Environment.ProcessorCount;
// int numOfThreads = X;
for(int i =0; i< numOfThreads; i++)
task.Add(Task.Factory.StartNew(()=> {});
while(task.count>0) //wait for task to finish
{
int index = Task.WaitAny(tasks.ToArray());
tasks.RemoveAt(index);
if(incomplete work)
task.Add(Task.Factory.StartNew()=> {....});
}
or
var options = new ParallelOptions();
options.MaxDegreeOfParllelism = System.Environment.ProcessorCount;
Parallel.For(0,N,options, (i) => {/*long running computattion*/};
or
You can Implement Producer-Coustomer pattern with BlockingCollection
This topic is excellently taught by Dr.Joe Hummel on his Pluralsight course "Async and parallel programming: Application design "
Consider using ActionBlock<T> from TPL.DataFlow library. It can be configured to process concurrently multiple messages using all available CPU cores.
ActionBlock<QueueItem> _processor;
Task _completionTask;
bool _done;
async Task ReadQueueAsync(int pollingInterval)
{
while (!_done)
{
// Get a list of items to process from SQL database
var list = ...;
// Schedule the work
foreach(var item in list)
{
_processor.Post(item);
}
// Give SQL server time to re-fill the queue
await Task.Delay(pollingInterval);
}
// Signal the processor that we are done
_processor.Complete();
}
void ProcessItem(QueueItem item)
{
// Do your work here
}
void Setup()
{
// Configure action block to process items concurrently
// using all available CPU cores
_processor= new ActionBlock<QueueItem>(new Action<QueueItem>(ProcessItem),
new ExecutionDataFlowBlock{MaxDegreeOfParallelism = DataFlowBlockOptions.Unbounded});
_done = false;
var queueReaderTask = ReadQueueAsync(QUEUE_POLL_INTERVAL);
_completionTask = Task.WhenAll(queueReaderTask, _processor.Completion);
}
void Complete()
{
_done = true;
_completionTask.Wait();
}
Per MaxDegreeOfParallelism's documentation: "Generally, you do not need to modify this setting. However, you may choose to set it explicitly in advanced usage scenarios such as these:
When you know that a particular algorithm you're using won't scale
beyond a certain number of cores. You can set the property to avoid
wasting cycles on additional cores.
When you're running multiple algorithms concurrently and want to
manually define how much of the system each algorithm can utilize.
You can set a MaxDegreeOfParallelism value for each.
When the thread pool's heuristics is unable to determine the right
number of threads to use and could end up injecting too many
threads. For example, in long-running loop body iterations, the
thread pool might not be able to tell the difference between
reasonable progress or livelock or deadlock, and might not be able to reclaim threads that were added to improve performance. In this
case, you can set the property to ensure that you don't use more
than a reasonable number of threads."
If you do not have an advanced usage scenario like the 3 cases above, you may want to hand your list of items or tasks to be run to the Task Parallel Library and let the framework handle the processor count.
List<InfoObject> infoList = GetInfo();
ConcurrentQueue<ResultObject> output = new ConcurrentQueue<ResultObject>();
await Task.Run(() =>
{
Parallel.Foreach<InfoObject>(infoList, (item) =>
{
ResultObject result = ProcessInfo(item);
output.Add(result);
});
});
foreach(var resultObj in output)
{
ReportOnResultObject(resultObj);
}
OR
List<InfoObject> infoList = GetInfo();
List<Task<ResultObject>> tasks = new List<Task<ResultObject>>();
foreach (var item in infoList)
{
tasks.Add(Task.Run(() => ProcessInfo(item)));
}
var results = await Task.WhenAll(tasks);
foreach(var resultObj in results)
{
ReportOnResultObject(resultObj);
}
H/T to IAmTimCorey tutorials:
https://www.youtube.com/watch?v=2moh18sh5p4
https://www.youtube.com/watch?v=ZTKGRJy5P2M

C# multithreading debugging

i am trying to build a multi threaded server that is supposed to spawn new threads for every incoming connection, BUT for all my effort, it spawns new threads only when it feels like it, Can anybody help me debug this code? am i missing something obvious?
while (true)
{
if (txtAddress.Text.Trim() == "Any" || txtAddress.Text.Trim() == "any")
ipEndP = new IPEndPoint(IPAddress.Any, 778);
else
ipEndP = new IPEndPoint(IPAddress.Parse(txtAddress.Text.Trim()), 778);
tcpL = new TcpListener(ipEndP);
tcpL.Server.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
tcpL.Start();
tempSock = tcpL.AcceptSocket();
//t = new Thread(ConnectionHandler); //When new client is connected, new thread //is created to handle the connection
//t.Priority = ThreadPriority.Highest;
//t.Start(tempSock);
ThreadPool.QueueUserWorkItem(ConnectionHandler, tempSock);
}
Check out the MSDN docs for QueueUserWorkItem
Queues a method for execution. The method executes when a thread pool thread becomes available.
Placing a thread in the user work item queue does not guarantee that it will begin executing right away. That's actually a good thing. If you got so many connections that you needed hundreds or thousands of threads, that could easily bring your server to it's knees (and certainly would be very wasteful due to excessive context switches).
Your commented out code should kick off a new thread for every connection. Is that not working? If so, what exactly is not working? Note that creating a new thread for every connection is much more expensive than using the thread pool.
UPDATE
Based on your remark that the commented out code is also failing to create too many threads, I would add...
You are creating WAY too many threads if that happens.
Often I see people asking why they can't create more than around 2000 threads in a process. The reason is not that there is any particular limit inherent in Windows. Rather, the programmer failed to take into account the amount of address space each thread uses.
A thread consists of some memory in kernel mode (kernel stacks and object management), some memory in user mode (the thread environment block, thread-local storage, that sort of thing), plus its stack. (Or stacks if you're on an Itanium system.)
Usually, the limiting factor is the stack size.
http://blogs.msdn.com/b/oldnewthing/archive/2005/07/29/444912.aspx

Replace infinite thread loop (message pump) with Tasks

In my application I have to listen on multiple different queues and deserialize/dispatch incoming messages received on queues.
Actually, what I am doing to achieve this is that each QueueConnector object creates a new thread on construction, which executes a infinite loop with a blocking call to queue.Receive() to receive next message in queue as exposed by the code below :
// Instantiate message pump thread
msmqPumpThread = new Thread(() => while (true)
{
// Blocking call (infinite timeout)
// Wait for a new message to come in queue and get it
var message = queue.Receive();
// Deserialize/Dispatch message
DeserializeAndDispatchMessage(message);
}).Start();
I'd like to know if this "message pump" can be replaced using Task(s) instead of going through an infinite loop on a new Thread.
I made a task already for the Message receiving part (see below) but I don't really see how to use it for a message pump (Can I recall the same task on completion over and over again, with continuations, replacing infinite loop in separate thread as in the code above ?)
Task<Message> GetMessageFromQueueAsync()
{
var tcs = new TaskCompletionSource<Message>();
ReceiveCompletedEventHandler receiveCompletedHandler = null;
receiveCompletedHandler = (s, e) =>
{
queue.ReceiveCompleted -= receiveCompletedHandler;
tcs.SetResult(e.Message);
};
queue.BeginReceive();
return tcs.Task;
}
Will I gain anything by using Tasks instead of an infinite loop in a separate thread (with a blocking call => blocking thread) in this context ? And if yes, how to do it properly ?
Please note that this application don't have a lot of QueueConnector objects, and won't have (maybe 10 connectors MAX), meaning ten Threads max through the first solution, so memory footprint / performance starting threads is not an issue here. I was rather thinking about scheduling performance / CPU usage. Will there be any difference ?
You will generally have more overhead and less throughput with async code when the count of threads is low. Nonblocking code is most useful when the number of threads is very high causing a) lots of wasted memory due to stacks and b) context switches. It has noticable overhead though because of more allocation, more indirection and more user-kernel-transitions.
For low thread counts (< 100) you probably shouldn't worry. Try to focus on writing maintainable, bug-resistant and simple code. Use threads.

Multiple Threads

I post a lot here regarding multithreading, and the great stackoverflow community have helped me alot in understand multithreading.
All the examples I have seen online only deal with one thread.
My application is a scraper for an insurance company (family company ... all free of charge). Anyway, the user is able to select how many threads they want to run. So lets say for example the user wants the application to scrape 5 sites at one time, and then later in the day he choses 20 threads because his computer isn't doing anything else so it has the resources to spare.
Basically the application builds a list of say 1000 sites to scrape. A thread goes off and does that and updates the UI and builds the list.
When thats finished another thread is called to start the scraping. Depending on the number of threads the user has set to use it will create x number of threads.
Whats the best way to create these threads? Should I create 1000 threads in a list. And loop through them? If the user has set 5 threads to run, it will loop through 5 at a time.
I understand threading, but it's the application logic which is catching me out.
Any ideas or resources on the web that can help me out?
You could consider using a thread pool for that:
using System;
using System.Threading;
public class Example
{
public static void Main()
{
ThreadPool.SetMaxThreads(100, 10);
// Queue the task.
ThreadPool.QueueUserWorkItem(new WaitCallback(ThreadProc));
Console.WriteLine("Main thread does some work, then sleeps.");
Thread.Sleep(1000);
Console.WriteLine("Main thread exits.");
}
// This thread procedure performs the task.
static void ThreadProc(Object stateInfo)
{
Console.WriteLine("Hello from the thread pool.");
}
}
This scraper, does it use a lot of CPU when its running?
If it does a lot of communication with these 1000 remote sites, downloading their pages, that may be taking more time than the actual analysis of the pages.
And how many CPU cores does your user have? If they have 2 (which is common these days) then beyond two simultaneous threads performing analysis, they aren't going to see any speed up.
So you probably need to "parallelize" the downloading of the pages. I doubt you need to do the same for the analysis of the pages.
Take a look into asynchronous IO, instead of explicit multi-threading. It lets you launch a bunch of downloads in parallel and then get called back when each one completes.
If you really just want the application, use something someone else already spent time developing and perfecting:
http://arachnode.net/
arachnode.net is a complete and comprehensive .NET web crawler for
downloading, indexing and storing
Internet content including e-mail
addresses, files, hyperlinks, images,
and Web pages.
Whether interested or involved in
screen scraping, data mining, text
mining, research or any other
application where a high-performance
crawling application is key to the
success of your endeavors,
arachnode.net provides the solution
you need for success.
If you also want to write one yourself because it's a fun thing to write (I wrote one not long ago, and yes, it is alot of fun ) then you can refer to this pdf provided by arachnode.net which really explains in detail the theory behind a good web crawler:
http://arachnode.net/media/Default.aspx?Sort=Downloads&PageIndex=1
Download the pdf entitled: "Crawling the Web" (second link from top). Scroll to Section 2.6 entitled: "2.6 Multi-threaded Crawlers". That's what I used to build my crawler, and I must say, I think it works quite well.
I think this example is basically what you need.
public class WebScraper
{
private readonly int totalThreads;
private readonly List<System.Threading.Thread> threads;
private readonly List<Exception> exceptions;
private readonly object locker = new object();
private volatile bool stop;
public WebScraper(int totalThreads)
{
this.totalThreads = totalThreads;
threads = new List<System.Threading.Thread>(totalThreads);
exceptions = new List<Exception>();
for (int i = 0; i < totalThreads; i++)
{
var thread = new System.Threading.Thread(Execute);
thread.IsBackground = true;
threads.Add(thread);
}
}
public void Start()
{
foreach (var thread in threads)
{
thread.Start();
}
}
public void Stop()
{
stop = true;
foreach (var thread in threads)
{
if (thread.IsAlive)
{
thread.Join();
}
}
}
private void Execute()
{
try
{
while (!stop)
{
// Scrap away!
}
}
catch (Exception ex)
{
lock (locker)
{
// You could have a thread checking this collection and
// reporting it as you see fit.
exceptions.Add(ex);
}
}
}
}
The basic logic is:
You have a single queue in which you put the URLs to scrape then you create your threads and use a queue object to which every thread has access. Let the threads start a loop:
lock the queue
check if there are items in the queue, if not, unlock queue and end thread
dequeue first item in the queue
unlock queue
process item
invoke an event that updates the UI (Remember to lock the UI Controller)
return to step 1
Just let the Threads do the "get stuff from the queue" part (pulling the jobs) instead of giving them the urls (pushing the jobs), that way you just say
YourThreadManager.StartThreads(numberOfThreadsTheUserWants);
and everything else happens automagically. See the other replies to find out how to create and manage the threads .
I solved a similar problem by creating a worker class that uses a callback to signal the main app that a worker is done. Then I create a queue of 1000 threads and then call a method that launches threads until the running thread limit is reached, keeping track of the active threads with a dictionary keyed by the thread's ManagedThreadId. As each thread completes, the callback removes its thread from the dictionary and calls the thread launcher.
If a connection is dropped or times out, the callback reinserts the thread back into the queue. Lock around the queue and the dictionary. I create threads vs using the thread pool because the overhead of creating a thread is insignificant compared to the connection time, and it allows me to have a lot more threads in flight. The callback also provides a convenient place with which to update the user interface, even allowing you to change the thread limit while it's running. I've had over 50 open connections at one time. Remember to increase your MacConnections property in your app.config (default is two).
I would use a queue and a condition variable and mutex, and start just the requested number of threads, for example, 5 or 20 (and not start 1,000).
Each thread blocks on the condition variable. When woken up, it dequeues the first item, unlocks the queue, works with the item, locks the queue and checks for more items. If the queue is empty, sleep on the condition variable. If not, unlock, work, repeat.
While the mutex is locked, it can also check if the user has requested the count of threads to be reduced. Just check if count > max_count, and if so, the thread terminates itself.
Any time you have more sites to queue, just lock the mutex and add them to the queue, then broadcast on the condition variable. Any threads that are not already working will wake up and take new work.
Any time the user increases the requested thread count, just start them up and they will lock the queue, check for work, and either sleep on the condition variable or get going.
Each thread will be continually pulling more work from the queue, or sleeping. You don't need more than 5 or 20.
Consider using the event-based asynchronous pattern (AsyncOperation and AsyncOperationManager Classes)
You might want to take a look at the ProcessQueue article on CodeProject.
Essentially, you'll want to create (and start) the number of threads that are appropriate, in your case that number comes from the user. Each of these threads should process a site, then find the next site needed to process. Even if you don't use the object itself (though it sounds like it would suit your purposes pretty well, though I'm obviously biased!) it should give you some good insight into how this sort of thing would be done.

Categories