C# Task multi-queue throttling

I need an environment that maintains different task queues, and for each queue a well-defined number of concurrent threads that are allowed to execute. Something like this:
Queue 1 -> 3 threads;
Queue 2 -> 6 threads;
A kind of task system. I have managed to implement this myself using plain old C# code (i.e. System.Threading.Thread, lock, and Queue), and it has worked more than fine for over a year. However, I keep reading articles about the wonders of TaskFactory and TaskScheduler, and how these things are possible with built-in classes in .NET, but I have failed to find an example that proves it. I would like to test it and compare it with what I have right now to see whether it works better, and if it does, replace my implementation.
Moreover, I can live without limiting/setting the number of parallel threads for each queue, as long as I get the guarantee that an item targeted at queue #2 is executed immediately even if queue #1 is executing at full load.
So, my question is: is there something like this in .NET 4 or later? Can someone point me to a sample? I have been looking for an entire week and failed to find anything relevant.

This is actually pretty trivial using the TPL and the new collections in System.Collections.Concurrent.
For your needs the BlockingCollection<T> is what I would recommend. By default it uses a ConcurrentQueue<T> as the underlying store which is perfect for what you want.
var queue = new BlockingCollection<Message>();
Setting up some code to work on those messages, and controlling how many can execute in parallel, is as simple as this:
// Set the maximum number of parallel tasks
var options = new ParallelOptions
{
    MaxDegreeOfParallelism = 10
};

Parallel.ForEach(queue.GetConsumingEnumerable(), options, msg =>
{
    // Do some stuff with this message
});
So what is going on here? Well...
The call to GetConsumingEnumerable() will actually block until there is something in the queue to consume. This is great because no extra code is necessary for signaling that new work is ready to be done. Rather, as the queue fills up, a new Task with your (anonymous) delegate will be kicked off with an item.
The ParallelOptions object allows you to control how Parallel.ForEach operates. In this case, you are telling it you never want more than 10 Tasks executing at any one time. It is important to note that Tasks != Threads. The details are murky, but needless to say there is a lot of optimization going on under the hood. It's all pluggable, mind you, but that is not for the faint of heart.
There are obviously a lot of details I haven't covered here, but hopefully you can see how simple and expressive using the Task Parallel Library can be.
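For the original two-queue scenario (queue #1 limited to 3 workers, queue #2 to 6), a minimal sketch of the wiring might look like this, with a Message type and all names illustrative; each queue gets its own consuming loop on a dedicated task, so items for queue #2 run immediately even when queue #1 is at full load:

var queue1 = new BlockingCollection<Message>();
var queue2 = new BlockingCollection<Message>();

// One long-running consumer loop per queue; each loop throttles only its own queue.
Action<BlockingCollection<Message>, int> consume = (queue, maxParallel) =>
    Parallel.ForEach(queue.GetConsumingEnumerable(),
                     new ParallelOptions { MaxDegreeOfParallelism = maxParallel },
                     msg => { /* process msg */ });

Task.Factory.StartNew(() => consume(queue1, 3), TaskCreationOptions.LongRunning); // Queue 1 -> 3
Task.Factory.StartNew(() => consume(queue2, 6), TaskCreationOptions.LongRunning); // Queue 2 -> 6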

Related

C# Task behaviour

Hi, I have a short question regarding tasks. As far as I understand, a Task can start multiple threads within itself.
Let's say I have two hardware sensors that give me data over two different data ports.
I want to implement them as producers in my C# project and then do something with the data. Would it make sense to start the data collection in two different tasks? Or should I implement them in the same task, since C# will automatically put them on different threads?
var producerWorker = Task.Factory.StartNew(() => SensorB(number));
var producerWorker2 = Task.Factory.StartNew(() => SensorA(number));
or
var producerWorker = Task.Factory.StartNew(() => Sensor_A_AND_B(number));
My second problem is: when I have two different producers in two different tasks, how do I add their data to the same BlockingCollection queue if they have different data types but need to be at the same position in the queue?
For example if I have queueA for SensorA, queueB for SensorB, and queueC.
Both queues can be filled at different speeds. So let's say queueA has 50 elements, but SensorB is a lot faster and already has 100 elements stored in queueB. However, I need to retrieve the data in a way that lets me place queueA[33].data and queueB[33].data into queueC[33].data. Of course I would not start with element 33, but always with the first element that was stored in queueA and queueB....
I hope you get what I mean.
Tasks are executed in whatever way the runtime thinks is best. Generally, there's a thread pool and both tasks run on available threads. If you really need to poll two sensors in parallel, I would recommend using two real threads to poll, and using Reactive Extensions to process the readings in sync.
Judging by your question, you should do some reading on how tasks and async work in C#; the topic is too large to answer on Stack Overflow. I would recommend picking up a book, because the MS documentation is rubbish when it comes to providing a solid block of knowledge.
Briefly, a task cannot start multiple threads inside itself. Conceptually, a task is a smaller unit than a thread. A single thread can process multiple tasks: let's say you have 20 tasks; the C# runtime will have a thread pool of, for example, 4 threads, and they will each take a task, process it, then move on to the next task, and so on.
Perhaps what you are referring to is asynchronous operations. That's a very different beast from a thread. Basically, you are asking some part of the computer to go off, do an independent piece of work (for example, send data over the network) and notify your program when it's done, without blocking the thread in the meantime.
Avoid using Task.Factory, because it has many ways of letting you shoot yourself in the foot. Take a look at Stephen Cleary's blog. Task.Run(...) is a better choice most of the time.
My best guess is that when you say:
Or should i implement them in the same task since c# will automatically put them on different threads?
You are referring to async operations.
For simplicity's sake you could create two separate tasks and, as soon as they receive data, pop it into a queue.
Your question suggests that you need to synchronize the incoming data. If that's so, a blocking queue is probably the wrong choice; use a ConcurrentQueue instead. A different task could read queueA[x] and queueB[x] and buffer the incoming data, then pop them onto queueC once both A and B have supplied their Nth result.
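A sketch of that buffering task, assuming SensorA and SensorB push into two ConcurrentQueues and that items dequeued at the same position belong together (all names are illustrative):

var queueA = new ConcurrentQueue<DataA>();                  // filled by the SensorA task
var queueB = new ConcurrentQueue<DataB>();                  // filled by the SensorB task
var queueC = new BlockingCollection<Tuple<DataA, DataB>>(); // paired output

Task.Run(() =>
{
    while (true)
    {
        DataA a;
        DataB b;
        if (queueA.TryDequeue(out a))
        {
            // Both queues are FIFO, so the next item of B matches the item of A we just took.
            while (!queueB.TryDequeue(out b))
                Thread.Sleep(1);   // crude wait; Rx or an event would be nicer
            queueC.Add(Tuple.Create(a, b));
        }
        else
        {
            Thread.Sleep(1);       // nothing from A yet
        }
    }
});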

Elegant way to do a threaded .NET application with multiple worker threads and multiple source and sink threads?

I've got an application where several threads provide data that needs to go through some heavy math. The math part needs a lot of initialization; afterwards it's pretty fast. As such, I can't just spawn a thread every time I need to do the calculation, nor should every source thread have its own solver (there can be a LOT of such threads; beyond a certain point the memory requirements are obscene, and the overhead gets in the way of processing power).
I would like to use the following model: the data-gathering and data-using threads would call into a single object through one thread-safe interface function, like
public OutData DoMath(InData data) {...}
that would take care of the rest. This would involve finding a free worker thread (or waiting and blocking until one is available), passing the data in a thread-safe manner to one of the free worker threads, waiting (blocking) for it to do its job, and gathering the result and returning it.
The worker thread(s) would then go into some sleep/blocked state until a new input item appears on its interface (or a command to clean up and die).
I know how to do this by means of various convoluted locks, queues and waits in a very horrible nasty way. I'm guessing there's a better, more elegant way.
My questions are:
Is this a good architecture for this?
Are there commonly used elegant means of doing this?
The target framework is .NET 4.5 or higher.
Thank you,
David
The math part needs a lot of initialization; afterwards it's pretty fast. As such, I can't just spawn a thread every time I need to do the calculation, nor should every source thread have its own solver (there can be a LOT of such threads; beyond a certain point the memory requirements are obscene, and the overhead gets in the way of processing power).
Sounds like a pool of lazy-initialized items. You can use a basic BlockingCollection for this, but I recommend overriding the default queue-like behavior with a stack-like behavior to avoid initializing contexts you may not ever need.
I'll call the expensive-to-initialize type MathContext:
private static readonly BlockingCollection<Lazy<MathContext>> Pool;

static MathService() // static constructor of the containing class (name illustrative)
{
    Pool = new BlockingCollection<Lazy<MathContext>>(new ConcurrentStack<Lazy<MathContext>>());
    for (int i = 0; i != 100; ++i) // or whatever you want your upper limit to be
        Pool.Add(new Lazy<MathContext>());
}
This would involve finding a free worker thread (or waiting and blocking until one is available)
Actually, there's no point in using a worker thread here. Since your interface is synchronous, the calling thread can just do the work itself.
OutData DoMath(InData data)
{
    // First, take a context from the pool.
    var lazyContext = Pool.Take();
    try
    {
        // Initialize the context if necessary.
        var context = lazyContext.Value;
        return ... // Do the actual work.
    }
    finally
    {
        // Ensure the context is returned to the pool.
        Pool.Add(lazyContext);
    }
}
I also think you should check out the TPL Dataflow library. It would require a bit of code restructuring, but it sounds like it may be a good fit for your problem domain.
Investigate the Task Parallel Library. It has a set of methods for creating and managing threads. Classes such as ReaderWriterLock and ManualResetEvent, and their derivatives, may help in synchronizing threads.
Don't use locks. This problem sounds like a nice fit for a proper, nearly lock-free approach.
I think what you need to look into is the BlockingCollection. This class is a powerful collection for multiple consumers and producers. If you think about using it with Parallel.ForEach, you may want to look into writing your own Partitioner to get some more performance out of it. Parallel contains a couple of very nice methods if you only need a couple of threads for a relatively short time, which sounds like something you need to do. There are also overloads that provide initialization and finalization methods for each spawned thread, along with passing thread-local variables from one stage of the function to the next. That may really help you.
The general tips apply here too, of course. Try to split your application into as many small parts as possible; that usually clears things up nicely, and the way to do things becomes clearer.
All in all, from what you have told about the problem at hand, I do not think you need a lot of blocking synchronization. The BlockingCollection only blocks the consumer threads until new data is ready to be consumed, and the producer if you limit the size...
I can't think of anything beyond that off the top of my head. This is a very general question, and without some specific issues it is hard to help further. I still hope this helps.
You've pretty much described a thread pool, and fortunately there are quite a few simple APIs you can use for that. The simplest is probably
await Task.Run(() => DoMath(inData));
or just call Task.Run(() => DoMath(inData)).GetAwaiter().GetResult() if you don't mind blocking the requesting thread.
Instead of starting a whole new thread, it will simply borrow a thread from the .NET thread pool for the computation and then return the result. Since you're doing almost pure CPU work, the thread pool will have only as many threads as you really need (that is, about the same as, or double, the number of CPU cores you have).
Using the await-based version is a bit trickier - you need to ensure your whole call chain returns Tasks - but it has a major advantage: it avoids the need to keep the calling thread alive while you wait for the results. Even better, if you make sure the original thread is also a thread-pool thread, you don't even need the Task.Run - the threads will be balanced automatically. Since you're only doing synchronous work anyway, this turns your whole problem into simply avoiding any manual new Thread and using Task.Run(...) instead.
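As a minimal illustration of that await-based shape, assuming the pool-backed DoMath from above (UseResultAsync is a hypothetical caller):

// Borrow a thread-pool thread for the CPU-bound work and let callers await it.
public Task<OutData> DoMathAsync(InData data)
{
    return Task.Run(() => DoMath(data));
}

public async Task UseResultAsync(InData data)
{
    OutData result = await DoMathAsync(data); // no thread is blocked while the math runs
    // ... use result ...
}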
First, create a pool of N such "math service objects" that are heavy. Then, guard usage of that pool with a new SemaphoreSlim(N, N). Accessing those objects is then as easy as:
SemaphoreSlim sem = ...;
//...
await sem.WaitAsync();
var obj = TakeFromPool();
DoWork(obj);
Return(obj);
sem.Release();
You can vary this pattern in many ways. The core of it is the pool plus a semaphore that can be used to wait if the pool is empty at the time.
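Putting the pieces together, one hedged sketch of the whole pattern, with a try/finally so the object is always returned and an illustrative Compute method standing in for the real math:

private const int N = 4; // pool size; pick to match your memory budget
private static readonly SemaphoreSlim Sem = new SemaphoreSlim(N, N);
private static readonly ConcurrentBag<MathContext> Pool = new ConcurrentBag<MathContext>();
// (fill Pool with N MathContext instances at startup)

public static async Task<OutData> DoMathAsync(InData data)
{
    await Sem.WaitAsync();          // wait until one of the N contexts is free
    MathContext context;
    Pool.TryTake(out context);      // always succeeds: the semaphore admits at most N holders
    try
    {
        return context.Compute(data);
    }
    finally
    {
        Pool.Add(context);          // return the context before releasing the slot
        Sem.Release();
    }
}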

Best way to limit the number of active Tasks running via the Parallel Task Library

Consider a queue holding a lot of jobs that need processing. The queue's limitation is that it can only pop one job at a time, and there is no way of knowing how many jobs there are. The jobs take 10 seconds to complete and involve a lot of waiting for responses from web services, so the work is not CPU-bound.
If I use something like this
while (true)
{
    var job = Queue.PopJob();
    if (job == null)
        break;
    Task.Factory.StartNew(job.Execute);
}
Then it will furiously pop jobs from the queue much faster than it can complete them, run out of memory and fall on its ass. >.<
I can't use (I don't think) ParallelOptions.MaxDegreeOfParallelism because I can't use Parallel.Invoke or Parallel.ForEach
3 alternatives I've found
Replace Task.Factory.StartNew with
Task task = new Task(job.Execute, TaskCreationOptions.LongRunning);
task.Start();
Which seems to somewhat solve the problem but I am not clear exactly what this is doing and if this is the best method.
Create a custom task scheduler that limits the degree of concurrency
Use something like BlockingCollection to add jobs to collection when started and remove when finished to limit number that can be running.
With #1 I've got to trust that the right decision is automatically made; with #2/#3 I've got to work out the max number of tasks that can be running myself.
Have I understood this correctly - which is the better way, or is there another way?
EDIT - This is what I've come up with from the answers below: a producer-consumer pattern.
As well as overall throughput, the aim was not to dequeue jobs faster than they could be processed, and not to have multiple threads polling the queue (not shown here, but that's a non-blocking op and would lead to huge transaction costs if polled at high frequency from multiple places).
// BlockingCollection<T>(1) will block if we try to add more than 1 job to the
// queue (no point in being greedy!), or if it is empty on Take.
BlockingCollection<Job> jobs = new BlockingCollection<Job>(1);

// Set up a number of consumer threads.
// Determine MAX_CONSUMER_THREADS empirically; if you have a 4-core CPU and 50%
// of the time in a job is blocked waiting on IO, it will likely be 8.
for (int numConsumers = 0; numConsumers < MAX_CONSUMER_THREADS; numConsumers++)
{
    Thread consumer = new Thread(() =>
    {
        while (!jobs.IsCompleted)
        {
            var job = jobs.Take();
            job.Execute();
        }
    });
    consumer.Start();
}

// Producer takes items off the queue and puts them in the blocking collection
// ready for processing.
while (true)
{
    var job = Queue.PopJob();
    if (job != null)
        jobs.Add(job);
    else
    {
        jobs.CompleteAdding();
        // May need to wait for running jobs to finish
        break;
    }
}
I just gave an answer which is very applicable to this question.
Basically, the TPL Task class is made to schedule CPU-bound work. It is not made for blocking work.
You are working with a resource that is not CPU: waiting for service replies. This means the TPL will mismanage your resource because it assumes CPU-boundedness to a certain degree.
Manage the resources yourself: Start a fixed number of threads or LongRunning tasks (which is basically the same). Decide on the number of threads empirically.
You can't put unreliable systems into production. For that reason, I recommend #1, but throttled. Don't create as many threads as there are work items. Create as many threads as are needed to saturate the remote service. Write yourself a helper function that spawns N threads and uses them to process M work items, as sketched below. You get totally predictable and reliable results that way.
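Such a helper might look like this (a sketch; Job and the job source come from the question, the rest is illustrative):

// Process work items with exactly 'threadCount' dedicated threads; each thread
// pulls from the shared source until it is drained.
static void ProcessWithFixedThreads(int threadCount, Func<Job> popJob)
{
    var threads = new List<Thread>();
    for (int i = 0; i < threadCount; i++)
    {
        var t = new Thread(() =>
        {
            Job job;
            while ((job = popJob()) != null) // popJob must be thread-safe
                job.Execute();
        });
        t.Start();
        threads.Add(t);
    }
    threads.ForEach(t => t.Join());          // wait until every job has finished
}

// e.g. ProcessWithFixedThreads(8, () => Queue.PopJob());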
Potential flow splits and continuations caused by await, later on in your code or in a 3rd party library, won't play nicely with long running tasks (or threads), so don't bother using long running tasks. In the async/await world, they're useless. More details here.
You can call ThreadPool.SetMaxThreads, but before you make this call, make sure you set the minimum number of threads with ThreadPool.SetMinThreads, using values below or equal to the max ones. And by the way, the MSDN documentation is wrong: you CAN go below the number of cores on your machine with those method calls, at least in .NET 4.5 and 4.6, where I used this technique to reduce the processing power of a memory-limited 32-bit service.
If, however, you don't wish to restrict the whole app but just the processing part of it, a custom task scheduler will do the job. A long time ago, MS released samples with several custom task schedulers, including a LimitedConcurrencyLevelTaskScheduler. Spawn the main processing task manually with Task.Factory.StartNew, providing the custom task scheduler, and every other task spawned by it will use it, including async/await and even Task.Yield, which is used for achieving asynchrony early on in an async method.
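Wiring that up looks roughly like this, assuming you have copied the LimitedConcurrencyLevelTaskScheduler class from those samples into your project (MainProcessing is an illustrative entry point):

// Allow at most 4 tasks scheduled through this scheduler to run concurrently.
var scheduler = new LimitedConcurrencyLevelTaskScheduler(4);

Task.Factory.StartNew(
    () => MainProcessing(),
    CancellationToken.None,
    TaskCreationOptions.None,
    scheduler);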
But for your particular case, neither solution will stop your queue of jobs from being exhausted before they have completed. That might not be desirable, depending on the implementation and purpose of that queue of yours. They are more like "fire a bunch of tasks and let the scheduler find the time to execute them" solutions. So perhaps something a bit more appropriate here would be a stricter method of control over the execution of the jobs, via semaphores. The code would look like this:
semaphore = new SemaphoreSlim(max_concurrent_jobs);

while (...)
{
    var job = Queue.PopJob();
    semaphore.Wait();
    ProcessJobAsync(job);
}

async Task ProcessJobAsync(Job job)
{
    await Task.Yield();
    // ... Process the job here ...
    semaphore.Release();
}
There's more than one way to skin a cat. Use what you believe is appropriate.
Microsoft has a very cool library called TPL Dataflow which does exactly what you want (and much more). Details here.
You should use the ActionBlock class and set the MaxDegreeOfParallelism of the ExecutionDataflowBlockOptions object. ActionBlock plays nicely with async/await, so even when your external calls are awaited, no new jobs will begin processing.
ExecutionDataflowBlockOptions actionBlockOptions = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 10
};

this.sendToAzureActionBlock = new ActionBlock<List<Item>>(
    async items => await ProcessItems(items),
    actionBlockOptions);
...
this.sendToAzureActionBlock.Post(itemsToProcess);
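One more detail worth knowing: when the producer is finished, the block can be drained cleanly before shutdown:

// Signal that no more items will be posted, then wait for in-flight work to drain.
this.sendToAzureActionBlock.Complete();
await this.sendToAzureActionBlock.Completion;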
The problem here doesn't seem to be too many running Tasks, it's too many scheduled Tasks. Your code will try to schedule as many Tasks as it can, no matter how fast they are executed. And if you have too many jobs, this means you will get OOM.
Because of this, none of your proposed solutions will actually solve your problem. If it seems that simply specifying LongRunning solves your problem, then that's most likely because creating a new Thread (which is what LongRunning does) takes some time, which effectively throttles getting new jobs. So, this solution only works by accident, and will most likely lead to other problems later on.
Regarding the solution, I mostly agree with usr: the simplest solution that works reasonably well is to create a fixed number of LongRunning tasks and have one loop that calls Queue.PopJob() (protected by a lock if that method is not thread-safe) and Execute()s the job.
UPDATE: After some more thinking, I realized the following attempt will most likely behave terribly. Use it only if you're really sure it will work well for you.
But the TPL does try to figure out the best degree of parallelism, even for IO-bound Tasks, so you might try to use that to your advantage. LongRunning tasks won't work here, because from the point of view of the TPL it seems like no work is done, and it will start new Tasks over and over. What you can do instead is start a new Task at the end of each Task. This way, the TPL will know what's going on, and its algorithm may work well. Also, to let the TPL decide the degree of parallelism, at the start of a Task that is first in its line, start another line of Tasks.
This algorithm may work well. But it's also possible that the TPL will make a bad decision regarding the degree of parallelism, I haven't actually tried anything like this.
In code, it would look like this:
void ProcessJobs(bool isFirst)
{
    var job = Queue.PopJob(); // assumes PopJob() is thread-safe
    if (job == null)
        return;

    if (isFirst)
        Task.Factory.StartNew(() => ProcessJobs(true));

    job.Execute();

    Task.Factory.StartNew(() => ProcessJobs(false));
}
And start it with
Task.Factory.StartNew(() => ProcessJobs(true));
TaskCreationOptions.LongRunning is useful for blocking tasks, and using it here is legitimate. What it does is suggest to the scheduler to dedicate a thread to the task. The scheduler itself tries to keep the number of threads at the same level as the number of CPU cores to avoid excessive context switching.
It is well described in Threading in C# by Joseph Albahari
I use a message queue/mailbox mechanism to achieve this. It's akin to the actor model. I have a class that has a MailBox; I call this class my "worker." It can receive messages, which are queued and, essentially, define tasks that I want the worker to run. The worker uses Task.Wait() to wait for its Task to finish before dequeueing the next message and starting the next task.
By limiting the number of workers I have, I am able to limit the number of concurrent threads/tasks that are being run.
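A stripped-down sketch of one such worker, assuming messages are plain Action delegates (the real IActor/WorkerNode code linked below is more elaborate):

class Worker
{
    private readonly BlockingCollection<Action> mailBox = new BlockingCollection<Action>();

    public Worker()
    {
        // One dedicated consumer loop: messages are processed strictly one at a time.
        var loop = new Thread(() =>
        {
            foreach (var message in mailBox.GetConsumingEnumerable())
            {
                var task = Task.Run(message);
                task.Wait(); // finish this task before dequeueing the next message
            }
        });
        loop.IsBackground = true;
        loop.Start();
    }

    public void Post(Action message)
    {
        mailBox.Add(message);
    }
}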
This is outlined, with source code, in my blog post on a distributed compute engine. If you look at the code for IActor and the WorkerNode, I hope it makes sense.
https://long2know.com/2016/08/creating-a-distributed-computing-engine-with-the-actor-model-and-net-core/

Task.Factory.StartNew or Parallel.ForEach for many long-running tasks? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Parallel.ForEach vs Task.Factory.StartNew
I need to run about 1,000 tasks in a ThreadPool on a nightly basis (the number may grow in the future). Each task is performing a long running operation (reading data from a web service) and is not CPU intensive. Async I/O is not an option for this particular use case.
Given an IList<string> of parameters, I need to DoSomething(string x). I am trying to pick between the following two options:
IList<Task> tasks = new List<Task>();

foreach (var p in parameters)
{
    tasks.Add(Task.Factory.StartNew(() => DoSomething(p), TaskCreationOptions.LongRunning));
}

Task.WaitAll(tasks.ToArray());
OR
Parallel.ForEach(parameters, new ParallelOptions {MaxDegreeOfParallelism = Environment.ProcessorCount*32}, DoSomething);
Which option is better and why?
Note :
The answer should include a comparison between the usage of TaskCreationOptions.LongRunning and MaxDegreeOfParallelism = Environment.ProcessorCount * SomeConstant.
Perhaps you aren't aware of this, but the members in the Parallel class are simply (complicated) wrappers around Task objects. In case you're wondering, the Parallel class creates the Task objects with TaskCreationOptions.None. However, the MaxDegreeOfParallelism would affect those task objects no matter what creation options were passed to the task object's constructor.
TaskCreationOptions.LongRunning gives a "hint" to the underlying TaskScheduler that it might perform better with oversubscription of the threads. Oversubscription is good for threads with high latency, for example I/O, because it will assign more than one thread (yes, thread, not task) to a single core so that it will always have something to do, instead of waiting around for an operation to complete while the thread is in a waiting state. On the TaskScheduler that uses the ThreadPool, it will run LongRunning tasks on their own dedicated thread (the only case where you have a thread per task); otherwise it will run normally, with scheduling and work stealing (really, what you want here anyway).
MaxDegreeOfParallelism controls the number of concurrent operations run. It's similar to specifying the max number of partitions that the data will be split into and processed from. If TaskCreationOptions.LongRunning could be specified, all this would do is limit the number of tasks running at a single time, similar to a TaskScheduler whose maximum concurrency level is set to that value, similar to this example.
You might want the Parallel.ForEach. However, setting MaxDegreeOfParallelism to such a high number won't actually guarantee that there will be that many threads running at once, since the tasks will still be controlled by the ThreadPoolTaskScheduler. That scheduler will keep the number of threads running at once to the smallest amount possible, which I suppose is the biggest difference between the two methods. You could write (and specify) your own TaskScheduler that mimics the max-degree-of-parallelism behavior and have the best of both worlds, but I doubt that's something you're interested in doing.
My guess is that, depending on latency and the number of actual requests you need to make, using tasks will perform better in many(?) cases, though they will wind up using more memory, while Parallel will be more consistent in resource usage. Of course, async I/O will perform monstrously better than either of these two options, but I understand you can't do that because you're using legacy libraries. So, unfortunately, you'll be stuck with mediocre performance no matter which one you choose.
A real solution would be to figure out a way to make async I/O happen; since I don't know the situation, I don't think I can be more helpful than that. Your program (read, thread) will continue execution, and the kernel will wait for the I/O operation to complete (this is also known as using I/O completion ports). Because the thread is not in a waiting state, the runtime can do more work on less threads, which usually ends up in an optimal relationship between the number of cores and number of threads. Adding more threads, as much as I wish it would, does not equate to better performance (actually, it can often hurt performance, because of things like context switching).
However, this entire answer is useless in determining a final answer for your question, though I hope it gives you some needed direction. You won't know what performs better until you profile it. If you don't try them both (I should clarify that I mean the Task without the LongRunning option, letting the scheduler handle thread switching) and profile them to determine what is best for your particular use case, you're selling yourself short.
Both options are entirely inappropriate for your scenario.
TaskCreationOptions.LongRunning is certainly a better choice for tasks that are not CPU-bound, as the TPL (Parallel classes/extensions) is almost exclusively meant for maximizing the throughput of a CPU-bound operation by running it on multiple cores (not threads).
However, 1000 tasks is an unacceptable number for this. Whether or not they're all running at once isn't exactly the issue; even 100 threads waiting on synchronous I/O is an untenable situation. As one of the comments suggests, your application will be using an enormous amount of memory and end up spending almost all of its time in context-switching. The TPL is not designed for this scale.
If your operations are I/O bound - and if you are using web services, they are - then async I/O is not only the correct solution, it's the only solution. If you have to re-architect some of your code (such as, for example, adding asynchronous methods to major interfaces where there were none originally), do it, because I/O completion ports are the only mechanism in Windows or .NET that can properly support this particular type of concurrency.
I've never heard of a situation where async I/O was somehow "not an option". I cannot even conceive of any valid use case for this constraint. If you are unable to use async I/O then this would indicate a serious design problem that must be fixed, ASAP.
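To make the contrast concrete, the async shape of such a nightly run might be (a sketch, assuming DoSomethingAsync is the asynchronous version of the web-service call):

// Cap concurrent requests, not threads; no thread is blocked while a call is in flight.
var throttle = new SemaphoreSlim(50);
var tasks = parameters.Select(async p =>
{
    await throttle.WaitAsync();
    try { await DoSomethingAsync(p); }
    finally { throttle.Release(); }
});
await Task.WhenAll(tasks);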
While this is not a direct comparison, I think it may help you. I do something similar to what you describe (in my case I know there is a load-balanced server cluster on the other end serving REST calls). I get good results using Parallel.ForEach to spin up an optimal number of worker threads, provided that I also use the following code to tell my operating system it can connect to more than the usual number of endpoints.
var servicePointManager = System.Net.ServicePointManager.FindServicePoint(Uri);
servicePointManager.ConnectionLimit = 250;
Note you have to call that once for each unique URL you connect to.

Design Pattern Alternative to Coroutines

Currently, I have a large number of C# computations (method calls) residing in a queue that will be run sequentially. Each computation will use some high-latency service (network, disk...).
I was going to use Mono coroutines to allow the next computation in the computation queue to continue while a previous computation is waiting for the high latency service to return. However, I prefer to not depend on Mono coroutines.
Is there a design pattern that's implementable in pure C# that will enable me to process additional computations while waiting for high latency services to return?
Thanks
Update:
I need to execute a huge number (>10000) of tasks, and each task will use some high-latency service. On Windows, you can't create that many threads.
Update:
Basically, I need a design pattern that emulates the advantages (as follows) of tasklets in Stackless Python (http://www.stackless.com/)
Huge # of tasks
If a task blocks, the next task in the queue executes
No wasted CPU cycles
Minimal overhead switching between tasks
You can simulate cooperative microthreading using IEnumerable. Unfortunately this won't work with blocking APIs, so you need to find APIs that you can poll, or which have callbacks that you can use for signalling.
Consider a method
IEnumerable Thread()
{
    // do some stuff
    Foo();
    // co-operatively yield
    yield return null;
    // do some more stuff
    Bar();
    // sleep 2 seconds
    yield return TimeSpan.FromSeconds(2);
}
The C# compiler will unwrap this into a state machine - but the appearance is that of a co-operative microthread.
The pattern is quite straightforward. You implement a "scheduler" that keeps a list of all the active IEnumerators. As it cycles through the list, it "runs" each one using MoveNext(). If MoveNext() returns false, the thread has ended, and the scheduler removes it from the list. If it returns true, the scheduler accesses the Current property to determine the current state of the thread. If it's a TimeSpan, the thread wishes to sleep, and the scheduler moves it onto some queue that can be flushed back into the main list when the sleep timespans have ended.
You can use other return objects to implement other signalling mechanisms. For example, define some kind of WaitHandle. If the thread yields one of these, it can be moved to a waiting queue until the handle is signalled. Or you could support WaitAll by yielding an array of wait handles. You could even implement priorities.
I did a simple implementation of this scheduler in about 150LOC but I haven't got round to blogging the code yet. It was for our PhyreSharp PhyreEngine wrapper (which won't be public), where it seems to work pretty well for controlling a couple of hundred characters in one of our demos. We borrowed the concept from the Unity3D engine -- they have some online docs that explain it from a user point of view.
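In the meantime, the core loop of such a scheduler looks something like this (a sketch, not the PhyreSharp code; sleeping microthreads are parked with their wake-up times):

class MicroThreadScheduler
{
    private readonly List<IEnumerator> active = new List<IEnumerator>();
    private readonly List<KeyValuePair<DateTime, IEnumerator>> sleeping =
        new List<KeyValuePair<DateTime, IEnumerator>>();

    public void Start(IEnumerable thread)
    {
        active.Add(thread.GetEnumerator());
    }

    public void Tick()
    {
        // Wake any sleepers whose time has come.
        var now = DateTime.UtcNow;
        foreach (var s in sleeping.Where(s => s.Key <= now).ToList())
        {
            sleeping.Remove(s);
            active.Add(s.Value);
        }

        // Run every active microthread up to its next yield point.
        foreach (var thread in active.ToList())
        {
            if (!thread.MoveNext())
            {
                active.Remove(thread);          // the microthread has ended
            }
            else if (thread.Current is TimeSpan)
            {
                active.Remove(thread);          // the microthread wants to sleep
                sleeping.Add(new KeyValuePair<DateTime, IEnumerator>(
                    now + (TimeSpan)thread.Current, thread));
            }
            // a null Current just means "yield"; leave it in the active list
        }
    }
}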
.NET 4.0 comes with extensive support for Task parallelism:
How to: Use Parallel.Invoke to Execute Simple Parallel Tasks
How to: Return a Value from a Task
How to: Chain Multiple Tasks with Continuations
I'd recommend using the Thread Pool to execute multiple tasks from your queue at once in manageable batches using a list of active tasks that feeds off of the task queue.
In this scenario your main worker thread would initially pop N tasks from the queue into the active tasks list to be dispatched to the thread pool (most likely using QueueUserWorkItem), where N represents a manageable amount that won't overload the thread pool, bog your app down with thread scheduling and synchronization costs, or suck up available memory due to the combined I/O memory overhead of each task.
Whenever a task signals completion to the worker thread, you can remove it from the active tasks list and add the next one from your task queue to be executed.
This will allow you to have a rolling set of N tasks from your queue. You can manipulate N to affect the performance characteristics and find what is best in your particular circumstances.
Since you are ultimately bottlenecked by hardware operations (disk I/O and network I/O, CPU) I imagine smaller is better. Two thread pool tasks working on disk I/O most likely won't execute faster than one.
You could also implement flexibility in the size and contents of the active task list by restricting it to a set number of particular type of task. For example if you are running on a machine with 4 cores, you might find that the highest performing configuration is four CPU-bound tasks running concurrently along with one disk-bound task and a network task.
If you already have one task classified as a disk IO task, you may choose to wait until it is complete before adding another disk IO task, and you may choose to schedule a CPU-bound or network-bound task in the meanwhile.
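A rough sketch of that rolling set of N tasks, using a semaphore as the completion signal (taskQueue, WorkItem, and N are illustrative; Dequeue is assumed to return null when drained):

var slots = new SemaphoreSlim(N, N);
WorkItem item;
while ((item = taskQueue.Dequeue()) != null)
{
    slots.Wait();                              // block until one of the N slots is free
    ThreadPool.QueueUserWorkItem(state =>
    {
        try { ((WorkItem)state).Run(); }
        finally { slots.Release(); }           // completion signal frees the slot
    }, item);
}
for (int i = 0; i < N; i++)
    slots.Wait();                              // drain: wait for all N slots to empty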
Hope this makes sense!
PS: Do you have any dependencies on the order of tasks?
You should definitely check out the Concurrency and Coordination Runtime. One of their samples describes exactly what you're talking about: you call out to long-latency services, and the CCR efficiently allows some other task to run while you wait. It can handle a huge number of tasks because it doesn't need to spawn a thread for each one, though it will use all your cores if you ask it to.
Isn't this a conventional use of multi-threaded processing? Have a look at patterns such as Reactor here.
Writing it to use async IO might be sufficient, but this can lead to nasty, hard-to-debug code without strong structure in the design.
You should take a look at this:
http://www.replicator.org/node/80
This should do exactly what you want. It is a hack, though.
Some more information about the "Reactive" pattern (as mentioned by another poster) with respect to an implementation in .NET; aka "Linq to Events"
http://themechanicalbride.blogspot.com/2009/07/introducing-rx-linq-to-events.html
-Oisin
In fact, if you use one thread per task, you will lose the game. Think about why Node.js can support a huge number of connections: it uses a small number of threads with async IO. The async and await functions can help with this.
foreach (var task in tasks)
{
    await SendAsync(task.value);
    await ReadAsync();
}
SendAsync() and ReadAsync() are placeholder functions standing in for async IO calls.
Task parallelism is also a good choice, but I am not sure which one is faster; you can test both of them in your case.
Yes, of course you can. You just need to build a dispatcher mechanism that calls back on a lambda that you provide and goes into a queue. All the code I write in Unity uses this approach, and I never use coroutines. I wrap methods that use coroutines, such as the WWW stuff, just to get rid of them. In theory, coroutines can be faster because there is less overhead. Practically, they introduce new syntax to a language to do a fairly trivial task, and furthermore you can't follow the stack trace properly on an error in a coroutine, because all you'll see is ->Next. You'll then have to implement the ability to run the tasks in the queue on another thread. However, there are Parallel functions in the latest .NET, and you'd essentially be writing similar functionality. It wouldn't be many lines of code, really.
If anyone is interested, I can send the code; I don't have it on me.
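As a guess at the shape of it, the core of such a dispatcher in Unity is roughly:

// Runs queued callbacks on the main thread, one batch per frame.
public class Dispatcher : MonoBehaviour
{
    private readonly ConcurrentQueue<Action> queue = new ConcurrentQueue<Action>();

    public void Enqueue(Action callback)
    {
        queue.Enqueue(callback);
    }

    private void Update() // called by Unity once per frame
    {
        Action callback;
        while (queue.TryDequeue(out callback))
            callback();
    }
}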
