N threads asynchronously getting/executing tasks - C#

I have an unlimited number of tasks in a db queue somewhere. What is the best way to have a program working on n tasks simultaneously on n different threads, starting new tasks as old ones get done? When one task finishes, another task should asynchronously begin. The currently-running count should always be n.
My initial thought was to use a thread pool, but that seems unnecessary considering that the tasks to be worked on will be retrieved within the individual threads. In other words, each thread will on its own go get its next task rather than having a main thread get tasks and then distribute them.
I see multiple options for doing this, and I don't know which one I should use for optimal performance.
1) Thread Pool - In light of there not necessarily being any waiting threads, I'm not sure this is necessary.
2) Semaphore - Same as 1. What's the benefit of a semaphore if there aren't tasks waiting to be allocated by the main thread?
3) Same Threads Forever - Kick the program off with n threads. When a thread is done working, it gets the next task itself. The main thread just monitors to make sure the n threads are still alive.
4) Event Handling - Same as 3, except that when a thread finishes a task, it fires off an ImFinished event before dying. An ImFinished event handler kicks off a new thread. This seems just like 3 but with more overhead (since new threads are constantly being created).
5) Something else?

BlockingCollection makes this whole thing pretty trivial:
var queue = new BlockingCollection<Action>();
int numWorkers = 5;
for (int i = 0; i < numWorkers; i++)
{
    Thread t = new Thread(() =>
    {
        foreach (var action in queue.GetConsumingEnumerable())
        {
            action();
        }
    });
    t.Start();
}
You can then have the main thread add items to the blocking collection after starting the workers (or before, if you want). You can even spawn multiple producer threads to add items to the queue.
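For instance, a minimal producer side might look like this (Add and CompleteAdding are the real BlockingCollection<T> API; the work items themselves are just placeholders):
// Producer: enqueue work, then signal that no more items are coming.
for (int i = 0; i < 100; i++)
{
    int item = i; // copy the loop variable for the closure
    queue.Add(() => Console.WriteLine("Processing item " + item));
}
queue.CompleteAdding(); // lets GetConsumingEnumerable end, so the workers exit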
Note that the more conventional approach would be to use Tasks instead of using Thread classes directly. The primary reason I didn't suggest it first is that you specifically requested an exact number of threads to be running (rather than a maximum), and you just don't have as much control over how Task objects are run (which is good; they can be optimized on your behalf). If that control isn't as important as you have stated, the following may end up being preferable:
var queue = new BlockingCollection<Action>();
int numWorkers = 5;
for (int i = 0; i < numWorkers; i++)
{
    Task.Factory.StartNew(() =>
    {
        foreach (var action in queue.GetConsumingEnumerable())
        {
            action();
        }
    }, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);
}

I like model #3, and have used it before; it reduces the number of threads starting and stopping, and makes the main thread a true "supervisor", reducing the work it has to do.
As Servy has indicated, the System.Collections.Concurrent namespace has a few constructs that are extremely valuable here. ConcurrentQueue is a thread-safe FIFO collection designed for just such a model: one or more "producer" threads add elements to the "input" side of the queue, while one or more "consumers" take elements out of the other end. If there is nothing in the queue, TryDequeue simply returns false; you can react to that by exiting out of the task method (the supervisor can then decide whether to start another task, probably by monitoring the input to the queue and ramping up when more items come in).
BlockingCollection adds the behavior of making consumer threads wait when they attempt to get a value from an empty queue. It can also be configured with a maximum capacity, above which it will block the "producer" threads from adding any more elements until there is available capacity. BlockingCollection uses a ConcurrentQueue by default, but you can back it with a ConcurrentStack or ConcurrentBag if you wish. Using this model, you can have the tasks run indefinitely; when there's nothing to do they'll simply block until there is something for at least one of them to work on, so all the supervisor has to check for is tasks erroring out (a critical element of any robust threaded workflow pattern).
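As a rough sketch of the non-blocking ConcurrentQueue variant described above (TryDequeue is the real API; the exit-on-empty behavior is what the supervisor would react to):
var queue = new ConcurrentQueue<Action>();

// Worker body: drain the queue, then exit; the supervisor can start
// a new worker when it sees more items arrive.
ThreadStart workerLoop = () =>
{
    Action action;
    while (queue.TryDequeue(out action))
    {
        action();
    }
};
new Thread(workerLoop).Start();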

This is easily achieved with the TPL Dataflow library.
First, let's assume you have a BufferBlock<T>; this is your queue:
var queue = new BufferBlock<T>();
Then, you need the action to perform on the block; this is represented by the ActionBlock<T> class:
var action = new ActionBlock<T>(t => { /* Process t here */ },
    new ExecutionDataflowBlockOptions
    {
        // Number of concurrent tasks.
        MaxDegreeOfParallelism = ...,
    });
Note the constructor above: it takes an instance of ExecutionDataflowBlockOptions and sets the MaxDegreeOfParallelism property to however many concurrent items you want processed at the same time.
Underneath the surface, the Task Parallel Library is being used to handle allocating threads for tasks, etc. TPL Dataflow is meant to be a higher level abstraction which allows you to tweak just how much parallelism/throttling/etc that you want.
For example, if you didn't want the ActionBlock<TInput> to buffer any items (preferring them to live in the BufferBlock<T>), you can also set the BoundedCapacity property, which will limit the number of items that the ActionBlock<TInput> will hold onto at once (which includes the number of items being processed, as well as reserved items):
var action = new ActionBlock<T>(t => { /* Process t here */ },
    new ExecutionDataflowBlockOptions
    {
        // Number of concurrent tasks.
        MaxDegreeOfParallelism = ...,
        // Set to MaxDegreeOfParallelism to not buffer.
        BoundedCapacity = ...,
    });
Also, if you want a new, fresh Task<TResult> instance to process every item, then you can set the MaxMessagesPerTask property to one, indicating that each and every Task<TResult> will process one item:
var action = new ActionBlock<T>(t => { /* Process t here */ },
    new ExecutionDataflowBlockOptions
    {
        // Number of concurrent tasks.
        MaxDegreeOfParallelism = ...,
        // Set to MaxDegreeOfParallelism to not buffer.
        BoundedCapacity = ...,
        // Process one item per task.
        MaxMessagesPerTask = 1,
    });
Note that depending on how many other tasks your application is running, this might or might not be optimal for you, and you might also want to think of the cost of spinning up a new task for every item that comes through the ActionBlock<TInput>.
From there, it's a simple matter of linking the BufferBlock<T> to the ActionBlock<TInput> with a call to the LinkTo method:
IDisposable connection = queue.LinkTo(action, new DataflowLinkOptions
{
    PropagateCompletion = true
});
You set the PropagateCompletion property to true here so that, when there are no more items to process, completion of the BufferBlock<T> is propagated to the ActionBlock<TInput>, which you might subsequently wait on.
Note that you can call the Dispose method on the IDisposable implementation returned from the call to LinkTo if you want the link between the blocks to be removed.
Finally, you post items to the buffer using the Post method:
queue.Post(new T());
And when you're done (if you are ever done), you call the Complete method:
queue.Complete();
Then, on the action block, you can wait until it's done by waiting on the Task instance exposed by the Completion property:
action.Completion.Wait();
Hopefully, the elegance of this is clear:
You don't have to manage the creation of new Task instances/threads/etc to manage the work, the blocks do it for you based on the settings you provide (and this is on a per-block basis).
Cleaner separation of concerns. The buffer is separated from the action, as are all the other blocks. You build the blocks and then link them together.

I'm a VB guy, but you can easily translate:
Private Async Sub foo()
    Dim n As Integer = 16
    Dim l As New List(Of Task)
    Dim jobs As New Queue(Of Integer)(Enumerable.Range(1, 100))
    For i = 1 To n
        Dim j = jobs.Dequeue()
        l.Add(Task.Run(Sub()
                           Threading.Thread.Sleep(500)
                           Console.WriteLine(j)
                       End Sub))
    Next
    While l.Count > 0
        Dim t = Await Task.WhenAny(l)
        If jobs.Count > 0 Then
            Dim j = jobs.Dequeue()
            l(l.IndexOf(t)) = Task.Run(Sub()
                                           Threading.Thread.Sleep(500)
                                           Console.WriteLine(j)
                                       End Sub)
        Else
            l.Remove(t)
        End If
    End While
End Sub
There's an article from Stephen Toub on why you shouldn't use Task.WhenAny this way with a large list of tasks, but with "some" tasks you usually don't run into a problem.
The idea is quite simple: you have a list to which you add as many (running) tasks as you would like to run in parallel. Then you (a)wait for the first one to finish. If there are still jobs in the queue, you assign the next job to a new task and then (a)wait again. If there are no jobs left in the queue, you simply remove the finished task. If both your task list and the queue are empty, you are done.
The Stephen Toub article: http://blogs.msdn.com/b/pfxteam/archive/2012/08/02/processing-tasks-as-they-complete.aspx

Related

What determines the number of threads for TaskFactory-spawned jobs?

I have the following code:
var factory = new TaskFactory();
for (int i = 0; i < 100; i++)
{
    var i1 = i;
    factory.StartNew(() => foo(i1));
}

static void foo(int i)
{
    Thread.Sleep(1000);
    Console.WriteLine($"foo{i} - on thread {Thread.CurrentThread.ManagedThreadId}");
}
I can see it only does 4 threads at a time (based on observation). My questions:
What determines the number of threads used at a time?
How can I retrieve this number?
How can I change this number?
P.S. My box has 4 cores.
P.P.S. I needed to have a specific number of tasks (and no more) that are concurrently processed by the TPL and ended up with the following code:
private static int count = 0; // keep track of how many concurrent tasks are running

private static void SemaphoreImplementation()
{
    var s = new Semaphore(20, 20); // allow 20 tasks at a time
    for (int i = 0; i < 1000; i++)
    {
        var i1 = i;
        Task.Factory.StartNew(() =>
        {
            try
            {
                s.WaitOne();
                Interlocked.Increment(ref count);
                foo(i1);
            }
            finally
            {
                s.Release();
                Interlocked.Decrement(ref count);
            }
        }, TaskCreationOptions.LongRunning);
    }
}

static void foo(int i)
{
    Thread.Sleep(100);
    Console.WriteLine($"foo{i:00} - on thread " +
        $"{Thread.CurrentThread.ManagedThreadId:00}. Executing concurrently: {count}");
}
When you are using a Task in .NET, you are telling the TPL to schedule a piece of work (via TaskScheduler) to be executed on the ThreadPool. Note that the work will be scheduled at its earliest opportunity and however the scheduler sees fit. This means that the TaskScheduler will decide how many threads will be used to run n number of tasks and which task is executed on which thread.
The TPL is very well tuned and continues to adjust its algorithm as it executes your tasks. So, in most cases, it tries to minimize contention. What this means is if you are running 100 tasks and only have 4 cores (which you can get using Environment.ProcessorCount), it would not make sense to execute more than 4 threads at any given time, as otherwise it would need to do more context switching. Now there are times where you want to explicitly override this behaviour. Let's say in the case where you need to wait for some sort of IO to finish, which is a whole different story.
In summary, trust the TPL. But if you are adamant to spawn a thread per task (not always a good idea!), you can use:
Task.Factory.StartNew(
    () => /* your piece of work */,
    TaskCreationOptions.LongRunning);
This tells the default TaskScheduler to explicitly spawn a new thread for that piece of work.
You can also use your own Scheduler and pass it in to the TaskFactory. You can find a whole bunch of Schedulers HERE.
Note that another alternative would be to use PLINQ, which by default analyses your query and decides whether parallelizing it would yield any benefit. In the case of blocking IO, where you are certain that starting multiple threads will result in better execution, you can force parallelism by using WithExecutionMode(ParallelExecutionMode.ForceParallelism). You can then use WithDegreeOfParallelism to give hints on how many threads to use, but remember there is no guarantee you would get that many threads; as MSDN says:
Sets the degree of parallelism to use in a query. Degree of
parallelism is the maximum number of concurrently executing tasks that
will be used to process the query.
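A minimal sketch of that combination (the numbers and the Sleep are placeholders standing in for real blocking IO work):
var results = Enumerable.Range(0, 100)
    .AsParallel()
    .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
    .WithDegreeOfParallelism(8) // a hint, not a guarantee
    .Select(i =>
    {
        Thread.Sleep(100); // stand-in for a blocking IO call
        return i * 2;
    })
    .ToList();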
Finally, I highly recommend having a read of THIS great series of articles on Threading and TPL.
If you increase the number of tasks to, for example, 1,000,000, you will see a lot more threads spawned over time. The TPL tends to inject one every 500 ms.
The TPL thread pool does not understand IO-bound workloads (and a Sleep behaves like IO: the thread is blocked without using the CPU). It's not a good idea to rely on the TPL for picking the right degree of parallelism in these cases. The TPL is completely clueless here; it injects more threads based on vague guesses about throughput, and also to avoid deadlocks.
Here, the TPL policy clearly is not useful because the more threads you add the more throughput you get. Each thread can process one item per second in this contrived case. The TPL has no idea about that. It makes no sense to limit the thread count to the number of cores.
What determines the number of threads used at a time?
Barely documented TPL heuristics. They frequently go wrong. In particular they will spawn an unlimited number of threads over time in this case. Use task manager to see for yourself. Let this run for an hour and you'll have 1000s of threads.
How can I retrieve this number? How can I change this number?
You can retrieve some of these numbers but that's not the right way to go. If you need a guaranteed DOP you can use AsParallel().WithDegreeOfParallelism(...) or a custom task scheduler. You also can manually start LongRunning tasks. Do not mess with process global settings.
I would suggest using SemaphoreSlim because it doesn't use a Windows kernel object (so it can be used in Linux C# microservices), and it has a SemaphoreSlim.CurrentCount property that tells how many slots remain, so you don't need the Interlocked.Increment or Interlocked.Decrement counter. Note that the i1 copy is still needed, though: the for loop variable i is captured by the closure by reference, so without the copy the tasks could all observe later values of i.
private static void SemaphoreImplementation()
{
    var maxTasksCount = 20; // allow 20 tasks at a time
    var semaphoreSlim = new SemaphoreSlim(maxTasksCount, maxTasksCount);
    var taskFactory = new TaskFactory();
    for (int i = 0; i < 1000; i++)
    {
        var i1 = i; // copy the loop variable; the closure captures it by reference
        taskFactory.StartNew(async () =>
        {
            try
            {
                await semaphoreSlim.WaitAsync();
                // CurrentCount tells how many slots are still free,
                // so the number currently in use is the difference.
                var count = maxTasksCount - semaphoreSlim.CurrentCount;
                await foo(i1, count);
            }
            finally
            {
                semaphoreSlim.Release();
            }
        }, TaskCreationOptions.LongRunning);
    }
}

static async Task foo(int i, int count)
{
    await Task.Delay(100);
    Console.WriteLine($"foo{i:00} - on thread " +
        $"{Thread.CurrentThread.ManagedThreadId:00}. Executing concurrently: {count}");
}

Is it the correct implementation?

I have a Windows Service that needs to pick jobs from a database and process them.
Here, each job is a scanning process that would take approx 10 mins to complete.
I am very new to Task Parallel Library. I have implemented in the following way as sample logic:
Queue queue = new Queue();
for (int i = 0; i < 10000; i++)
{
    queue.Enqueue(i);
}
for (int i = 0; i < 100; i++)
{
    Task.Factory.StartNew((Object data) =>
    {
        var Objdata = (Queue)data;
        Console.WriteLine(Objdata.Dequeue());
        Console.WriteLine(
            "The current thread is " + Thread.CurrentThread.ManagedThreadId);
    }, queue, TaskCreationOptions.LongRunning);
}
Console.ReadLine();
But this is creating a lot of threads. Since the loop repeats 100 times, it creates 100 threads.
Is it the right approach to create that many parallel threads?
Is there any way to limit the number of threads to 10 (the concurrency level)?
An important factor to remember when allocating new Threads is that the OS has to allocate a number of logical entities in order for that thread to run:
Thread kernel object - an object describing the thread, including the thread's context, CPU registers, etc.
Thread environment block - for exception handling and thread-local storage
User-mode stack - 1 MB of stack
Kernel-mode stack - for passing arguments from user mode to kernel mode
Other than that, the number of concurrent Threads that may run depends on the number of cores your machine is packing, and creating more threads than you have cores will start causing context switching, which in the long run may slow your work down.
So after the long intro, to the good stuff. What we actually want to do is limit the number of threads running and reuse them as much as possible.
For this kind of job, I would go with TPL Dataflow, which is based on the producer-consumer pattern. Just a small example of what can be done:
// A BufferBlock is an equivalent of a ConcurrentQueue to buffer your objects
var bufferBlock = new BufferBlock<object>();

// An ActionBlock to process each object and do something with it
var actionBlock = new ActionBlock<object>(obj =>
{
    // Do stuff with the objects from the bufferblock
});

bufferBlock.LinkTo(actionBlock);
bufferBlock.Completion.ContinueWith(t => actionBlock.Complete());
You may pass each block an ExecutionDataflowBlockOptions, which can limit the BoundedCapacity (the number of objects inside the BufferBlock) and set MaxDegreeOfParallelism, which tells the block the maximum concurrency you want.
There is a good example here to get you started.
Glad you asked, because you're right: this is not the best approach.
The concept of Task should not be confused with a Thread. A Thread can be compared to a chef in a kitchen, while a Task is a dish ordered by a customer. You have a bunch of chefs, and they process the dish orders in some ordering (usually FIFO). A chef finishes a dish then moves on to the next. The concept of Thread Pool is the same. You create a bunch of Tasks to be completed, but you do not need to assign a new thread to each task.
OK, so the actual bits to do it. There are a few. The first one is ThreadPool.QueueUserWorkItem (http://msdn.microsoft.com/en-us/library/system.threading.threadpool.queueuserworkitem(v=vs.110).aspx); a sketch of it follows the example below. Using the Parallel library, Parallel.For can also be used; it will automatically spawn threads based on the number of actual CPU cores available in the system.
Parallel.For(0, 100, i =>
{
    // This method will be called 100 times, with i ranging from 0 to 99
    WaitForGrassToGrow();
    Console.WriteLine(string.Format("The {0}-th task has completed!", i));
});
Note that there is no guarantee that the method called by Parallel.For is called in sequence (0,1,2,3,4,5...). The actual sequence depends on the execution.
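And for completeness, a minimal ThreadPool.QueueUserWorkItem sketch (the body is just a placeholder for real work):
for (int i = 0; i < 100; i++)
{
    int item = i; // copy the loop variable for the closure
    ThreadPool.QueueUserWorkItem(state =>
    {
        // Stand-in for real work
        Console.WriteLine("Task {0} ran on pool thread {1}",
            item, Thread.CurrentThread.ManagedThreadId);
    });
}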

Using thread-local resources with parallel for loops

Imagine a long list of data to be processed. Processing is CPU bound and can be done in parallel.
To process a data item requires a large object (~50MB) to hold intermediate processing results. This object may be re-used during the processing of a subsequent task.
I want to do something like this:
Processor[] processors = GetProcessors(Environment.ProcessorCount);

Parallel.For(
    0,
    itemCount,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    item =>
    {
        int threadIndex = /* TODO */;
        processors[threadIndex].Process(item);
    });
The goal is to only ever have Environment.ProcessorCount instances of my large object, and reuse them as efficiently as possible.
How can I do this?
You just need to use the overload of Parallel.For that takes in the two functions to set up and tear down your thread local object.
Parallel.For(
    0,
    itemCount,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    () => new Processor(),
    (item, loopState, processor) =>
    {
        processor.Process(item);
        // Return the processor to be used for another invocation
        return processor;
    },
    processor =>
    {
        // Do any teardown work you need to do, like disposing the object if it is disposable
        processor.Dispose();
    });
Because the Parallel class functions do not immediately jump to using ParallelOptions.MaxDegreeOfParallelism threads (they start at one thread and then ramp up to the maximum you defined), it will only create one instance of Processor if only one thread gets created, and it will have at most ParallelOptions.MaxDegreeOfParallelism objects created at the same time.
I do not know the implementation details of the default scheduler, but it may or may not stop threads and then create new ones, causing a new Processor object to be created. However, even if that happens (which it may not; I don't know), you will still only have a maximum of ParallelOptions.MaxDegreeOfParallelism objects existing at the same time.
Here's an approach that works. I devised this answer while writing the question, but thought the question was interesting enough to post anyway. If someone has a better solution, I'd like to learn it.
Use a concurrent collection (such as ConcurrentQueue<Processor>) to allocate instances of the Processor between threads.
Processor[] processors = GetProcessors(Environment.ProcessorCount);
var queue = new ConcurrentQueue<Processor>(processors);

Parallel.For(
    0,
    itemCount,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    item =>
    {
        // Obtain a processor
        Processor processor;
        queue.TryDequeue(out processor);
        processor.Process(item);
        // Store the processor again for another invocation
        queue.Enqueue(processor);
    });
An actual implementation should assert that TryDequeue returns true, and also enqueue the processor again in case of an exception, as sketched below.
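A sketch of those two hardening steps, as a drop-in replacement for the loop body above (Debug.Assert comes from System.Diagnostics; everything else matches the snippet above):
item =>
{
    Processor processor;
    bool ok = queue.TryDequeue(out processor);
    Debug.Assert(ok, "a processor should always be available here");
    try
    {
        processor.Process(item);
    }
    finally
    {
        // Always return the processor, even if Process throws
        queue.Enqueue(processor);
    }
}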
So long as the processing time is much larger than the time spent in queue contention, the overhead should be minimal.

How do I maintain a fixed collection of threads?

I need a suggestion for the best approach to multi-threading in C# 3.0 (no Parallel or Task).
The situation is, I have a Queue with 500 items. At a particular time I can run only 10 threads (max). Below is my code.
while (queue.Count > 0)
{
    Thread[] threads = new Thread[no_of_threads];
    for (int j = 0; j < no_of_threads; j++)
    {
        // StartProcessing dequeues one item each time, for a single thread
        threads[j] = new Thread(StartProcessing);
        threads[j].Start();
    }
    foreach (Thread objThread in threads)
    {
        objThread.Join();
    }
}
The problem with this approach is that if, for instance, no_of_threads = 10 and 9 threads are done with processing while 1 thread is still working, I cannot come out of the loop and delegate work to the free threads until all 10 threads are done.
I need all 10 threads to keep working as long as the queue count > 0.
This is easily done with a Semaphore.
The idea is to create a semaphore with a maximum count of N, where N is the number of threads you allow. The loop waits on the semaphore and queues tasks as it acquires the semaphore.
Semaphore ThreadsAvailable = new Semaphore(10, 10);

while (Queue.Count > 0)
{
    ThreadsAvailable.WaitOne();
    // Must dequeue the item here, otherwise you could run off the end of the queue
    ThreadPool.QueueUserWorkItem(DoStuff, Queue.Dequeue());
}

// Wait for remaining threads to finish
int threadCount = 10;
while (threadCount != 0)
{
    ThreadsAvailable.WaitOne();
    --threadCount;
}

void DoStuff(object item)
{
    ItemType theItem = (ItemType)item;
    // Process the item
    StartProcessing(theItem);
    // And then release the semaphore so another thread can run
    ThreadsAvailable.Release();
}
The item is dequeued in the main loop because that avoids a race condition that otherwise is rather messy to handle. If you let the thread dequeue the item, then the thread has to do this:
lock (queue)
{
    if (queue.Count > 0)
        item = queue.Dequeue();
    else
        return; // There wasn't an item to dequeue
}
Otherwise, the following sequence of events is likely to occur when there is only one item left in the queue.
main loop checks Queue.Count, which returns 1
main loop calls QueueUserWorkItem
main loop checks Queue.Count again, which returns 1 because the thread hasn't started yet
new thread starts and dequeues an item
main loop tries to dequeue an item and throws an exception because queue.Count == 0
If you're willing to handle things that way, then you're okay. The key is making sure that the thread calls Release on the semaphore before the thread exits. You can do that with explicitly managed threads, or with the ThreadPool approach that I posted. I just used ThreadPool because I find it easier than explicitly managing threads.
So all you need to handle this is a queue that is designed to be accessed from multiple threads. Were you using .NET 4.0, I'd say use BlockingCollection. Not only will it work perfectly, but it's very efficient. Alternatively, you can rather trivially make your own class that is just a Queue with lock calls around all of the methods. It will work about as well, but it won't be as efficient. (It will likely be efficient enough for your purposes, though, and re-writing BlockingCollection "properly" would be quite hard.)
Once you have that queue, each worker can just grab an item from it, process it, then ask the queue for another. When there are no more, you don't need to worry about ending that thread; there's no more work it could do. A minimal version of such a queue is sketched below.
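A minimal sketch of that locked wrapper (the class name and the TryDequeue shape are illustrative, not a real library type):
// Hand-rolled stand-in for BlockingCollection on pre-4.0 frameworks
public class SimpleSafeQueue<T>
{
    private readonly Queue<T> _queue = new Queue<T>();
    private readonly object _sync = new object();

    public void Enqueue(T item)
    {
        lock (_sync) { _queue.Enqueue(item); }
    }

    // Returns false when the queue is empty, so workers know to stop
    public bool TryDequeue(out T item)
    {
        lock (_sync)
        {
            if (_queue.Count > 0)
            {
                item = _queue.Dequeue();
                return true;
            }
            item = default(T);
            return false;
        }
    }
}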
You should use the ThreadPool, which manages and optimizes threads for you:
Once a thread in the pool completes its task, it is returned to a queue of waiting threads, where it can be reused. This reuse enables applications to avoid the cost of creating a new thread for each task.
Thread pools typically have a maximum number of threads. If all the threads are busy, additional tasks are put in queue until they can be serviced as threads become available.
It's better not to interfere with the ThreadPool, since it's smart enough to manage and allocate threads. But if you really need to, you can set a constraint on the maximum number of threads by using the SetMaxThreads method, as sketched below.
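A minimal example of that call (SetMaxThreads is the real ThreadPool API; it returns false if the request cannot be honored):
// Cap the pool at 10 worker threads and 10 IO completion threads
bool accepted = ThreadPool.SetMaxThreads(10, 10);
Console.WriteLine("Change accepted: " + accepted);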
Instead of controlling the threads from the outside, let each thread consume data itself.
Pseudocode:
create 10 threads
thread code:
while elements in queue
get element from queue
process element
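In C# 3.0 terms (no Parallel or Task, per the question), that pseudocode might look like this; ProcessElement is a placeholder for the real per-element work:
object sync = new object();
Queue<int> queue = new Queue<int>(Enumerable.Range(0, 500)); // the 500 jobs

for (int i = 0; i < 10; i++)
{
    new Thread(() =>
    {
        while (true)
        {
            int element;
            lock (sync) // hand-rolled thread safety, since Queue<T> isn't thread-safe
            {
                if (queue.Count == 0)
                    return; // queue drained: let this thread end
                element = queue.Dequeue();
            }
            ProcessElement(element); // placeholder for the real work
        }
    }).Start();
}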
This is a simple producer-consumer scenario. You need a thread-safe queue like this one: Creating a blocking Queue<T> in .NET? - 10 threads can read and process job by job in a loop until the queue is empty.
Depending on how you fill the queue (prior to starting processing or while processing), you can end those threads as soon as the queue becomes empty, or signal them to stop by means of a stop flag. In the latter case you probably need to wake the threads (e.g. with dummy jobs).

ThreadPool frustrations - Thread creation exceeding SetMaxThreads

I've got an I/O intensive operation.
I only want a MAX of 5 threads ever running at one time.
I've got 8000 tasks to queue and complete.
Each task takes approximately 15-20 seconds to execute.
I've looked around at ThreadPool, but
ThreadPool.SetMaxThreads(5, 0);
List<task> tasks = GetTasks();
int toProcess = tasks.Count;
ManualResetEvent resetEvent = new ManualResetEvent(false);
for (int i = 0; i < tasks.Count; i++)
{
    ReportGenerator worker = new ReportGenerator(tasks[i].Code, id);
    ThreadPool.QueueUserWorkItem(x =>
    {
        worker.Go();
        if (Interlocked.Decrement(ref toProcess) == 0)
            resetEvent.Set();
    });
}
resetEvent.WaitOne();
I cannot figure out why... my code is executing more than 5 threads at one time. I've tried SetMaxThreads and SetMinThreads, but it keeps executing more than 5 threads.
What is happening? What am I missing? Should I be doing this in another way?
Thanks
There is a limitation in SetMaxThreads in that you can never set it lower than the number of processors on the system. If you have 8 processors, setting it to 5 is the same as not calling the function at all.
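You can see this for yourself, since SetMaxThreads reports whether the change was applied (a sketch, assuming the .NET Framework ThreadPool semantics described in the MSDN remarks quoted later on this page):
// On an 8-core box this prints False and leaves the pool limits unchanged
bool ok = ThreadPool.SetMaxThreads(5, 5);
Console.WriteLine("Accepted: {0}, cores: {1}", ok, Environment.ProcessorCount);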
Task Parallel Library can help you:
List<task> tasks = GetTasks();
Parallel.ForEach(tasks, new ParallelOptions { MaxDegreeOfParallelism = 5 },
    task =>
    {
        ReportGenerator worker = new ReportGenerator(task.Code, id);
        worker.Go();
    });
What does MaxDegreeOfParallelism do?
I think there's a different and better way to approach this. (Pardon me if I accidentally Java-ize some of the syntax)
The main thread here has a list of things to do in "Tasks" -- instead of creating threads for each task, which is really not efficient when you have so many items, create the desired number of threads and then have them request tasks from the list as needed.
The first thing to do is add a variable to the class this code comes from, for use as a pointer into the list. We'll also add one for the maximum desired thread count.
// New variables in your class definition
private int taskStackPointer;
private const int MAX_THREADS = 5;
Create a method that returns the next task in the list and increments the stack pointer. Then create a new interface for this:
// Make sure that only one thread has access at a time
[MethodImpl(MethodImplOptions.Synchronized)]
public task getNextTask()
{
    if (taskStackPointer < tasks.Count)
        return tasks[taskStackPointer++];
    else
        return null;
}
Alternately, you could return tasks[taskStackPointer++].code, if there's a value you can designate as meaning "end of list". Probably easier to do it this way, however.
The interface:
public interface TaskDispatcher
{
    [MethodImpl(MethodImplOptions.Synchronized)]
    task getNextTask();
}
Within the ReportGenerator class, change the constructor to accept the dispatcher object:
public ReportGenerator(TaskDispatcher td, int idCode)
{
    ...
}
You'll also need to alter the ReportGenerator class so that its processing has an outer loop that starts off by calling td.getNextTask() to request a new task, and that exits the loop when it gets back null, as sketched below.
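A hypothetical sketch of that loop (assuming td is the dispatcher stored by the constructor, Go() starts the thread, and Process(t) is the per-task work; none of these bodies come from the original post):
public void Go()
{
    new Thread(() =>
    {
        task t;
        // Keep pulling tasks until the dispatcher signals "end of list" with null
        while ((t = td.getNextTask()) != null)
        {
            Process(t); // hypothetical per-task processing
        }
    }).Start();
}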
Finally, alter the thread creation code to something like this: (this is just to give you an idea)
taskStackPointer = 0;
for (int i = 0; i < MAX_THREADS; i++)
{
    ReportGenerator worker = new ReportGenerator(this, id);
    worker.Go();
}
That way you create the desired number of threads and keep them all working at max capacity.
(I'm not sure I got the usage of "[MethodImpl(MethodImplOptions.Synchronized)]" exactly right... I am more used to Java than C#)
Your tasks list will have 8k items in it because you told the code to put them there:
List<task> tasks = GetTasks();
That said, this number has nothing to do with how many threads are being used; the debugger is simply going to show how many items you added to the list.
There are various ways to determine how many threads are in use. Perhaps one of the simplest is to break into the application with the debugger and take a look at the threads window. Not only will you get a count, but you'll see what each thread is doing (or not) which leads me to...
There is significant discussion to be had about what your tasks are doing and how you arrived at a number to 'throttle' the thread pool. In most use cases, the thread pool is going to do the right thing.
Now to answer your specific question...
To explicitly control the number of concurrent tasks, consider a trivial implementation that changes your task collection from a List to a BlockingCollection (which will internally use a ConcurrentQueue) and uses the following code to 'consume' the work:
var parallelOptions = new ParallelOptions
{
    MaxDegreeOfParallelism = 5
};

Parallel.ForEach(collection.GetConsumingEnumerable(), parallelOptions, x =>
{
    // Do work here...
});
Change MaxDegreeOfParallelism to whatever concurrent value you have determined is appropriate for the work you are doing.
The following might be of interest to you:
Parallel.ForEach Method
BlockingCollection
Chris
It works for me. This way you can't use a number of worker threads smaller than minworkerThreads. The problem is that if you need a maximum of five worker threads and minworkerThreads is six, it doesn't work.
int minworkerThreads, minportThreads;
ThreadPool.GetMinThreads(out minworkerThreads, out minportThreads);
ThreadPool.SetMaxThreads(minworkerThreads, minportThreads);
MSDN
Remarks
You cannot set the maximum number of worker threads or I/O completion threads to a number smaller than the number of processors on the computer. To determine how many processors are present, retrieve the value of the Environment.ProcessorCount property. In addition, you cannot set the maximum number of worker threads or I/O completion threads to a number smaller than the corresponding minimum number of worker threads or I/O completion threads. To determine the minimum thread pool size, call the GetMinThreads method.
If the common language runtime is hosted, for example by Internet Information Services (IIS) or SQL Server, the host can limit or prevent changes to the thread pool size.
Use caution when changing the maximum number of threads in the thread pool. While your code might benefit, the changes might have an adverse effect on code libraries you use.
Setting the thread pool size too large can cause performance problems. If too many threads are executing at the same time, the task switching overhead becomes a significant factor.
