What's the operational difference between Parallel and Task in C#?


I work as the sole application developer within a database-focussed team. Recently, I've been trying to improve the efficiency of a process which my predecessor had prototyped. The best way to do this was to thread it. So this was my approach:
public void DoSomething()
{
    // One loop body per row; Parallel decides how rows are spread over threads.
    Parallel.ForEach(rowCollection, fr =>
    {
        fr.Result = MyCleaningOperation();
    });
}
It runs, but it causes errors. The errors arise in a third-party tool that the code calls. This tool is supposed to be thread-safe, but it looks strongly as though the errors occur when two threads try to perform the same operation at the same time.
So I went back to the prototype. Previously I'd only looked at this to see how to talk to the third-party tool. But when I examined the called code, I discovered my predecessor had threaded it using Task and Action, operators with which I'm not familiar.
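Action<object> MyCleaningOperation = (object obj) =>
{
    // invoke the third-party tool.
};
public void Main()
{
    // One task per row (an array of length 1 would overflow for more than one row).
    Task[] taskCollection = new Task[rowCollection.Length];
    for (int i = 0; i < rowCollection.Length; i++)
    {
        // i is passed to the Action<object> as its state argument.
        taskCollection[i] = new Task(MyCleaningOperation, i);
    }
    foreach (var task in taskCollection)
    {
        task.Start();
    }
    try
    {
        Task.WaitAll(taskCollection);
    }
    catch (Exception)
    {
        // "throw;" rethrows without resetting the stack trace ("throw ex;" would reset it).
        throw;
    }
}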
Now, that's not great code but it is a prototype. Allegedly his prototype did not error and ran at a greater speed than mine. I cannot verify this because his prototype was dependent on a dev database that no longer exists.
I don't particularly want to go on a wild goose chase of trying out different kinds of threading in my app to see whether some throw errors or not; the errors are intermittent, so it would be a long, drawn-out process. More so because having read about Task I cannot see any reason why it would work more effectively than Parallel. And because I'm using a void function, I cannot easily add an await to mimic the prototype's operation.
So: is there an operational difference between the two? Or any other reason why one might cause a tool to trip up with multiple threads using the same resource and the other not?

Action<T> is a void-returning delegate which takes a T. It represents an operation which consumes a T, produces nothing, and is started when invoked.
Task<T> is what it says on the tin: it represents a job that is possibly not yet complete, and when it is complete, it provides a T to its completion.
So, let's make sure you've got it so far: what is the completion of a Task<T>?
Don't read on until you've sussed it out.
.
.
.
.
.
The completion of a Task<T> is an Action<T>. A task produces a T in the future; an action performs an action on that T when it is available.
All right, so then what is a Task, no T? A task that does not produce a value when it completes. What's the completion of a Task? Plainly an Action.
How can we describe the task performed by a Task then? It does something but produces no result. So that's an Action. Suppose the task requires that it consumes an object to do its work; then that's an Action<object>.
Make sure you understand the relationships here. They are a bit tricky but they all make sense. The names are carefully chosen.
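A small sketch of those relationships, using only the standard System.Threading.Tasks types (the names TaskActionDemo and Clean are made up for illustration). One detail of the real API: ContinueWith hands the continuation the completed Task<T> rather than the bare T, so in code the continuation is an Action<Task<T>> that reads Result.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskActionDemo
{
    // An Action<object>: consumes an object, produces nothing.
    static readonly Action<object> Clean = state =>
        Console.WriteLine($"cleaning row {state}");

    public static int Run()
    {
        // A Task (no T) whose work is an Action<object>; 7 is passed as its state.
        Task cleaning = new Task(Clean, 7);
        cleaning.Start();
        cleaning.Wait();

        // A Task<int> promises an int at some point in the future.
        Task<int> answer = Task.Run(() => 6 * 7);

        // Its completion is an action performed on the finished task's int.
        int seen = 0;
        Task continuation = answer.ContinueWith(done => { seen = done.Result; });
        continuation.Wait();
        return seen;
    }
}
```

Note that nothing in this sketch says how many threads do the work; the types only describe what is done and what is produced.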
So what then is a thread? A thread is a worker that can do tasks. Do not confuse tasks with threads.
having read about Task I cannot see any reason why it would work more effectively than Parallel.
You see what I mean I hope. This sentence makes no sense. Tasks are just that: tasks. Deliver this book to this address. Add these numbers. Mow this lawn. Tasks are not workers, and they are certainly not the concept of "hire a bunch of workers to do tasks". Parallelism is a strategy for assigning workers to tasks.
Moreover, do not fall into the trap of believing that tasks are inherently parallel. There is no requirement that tasks be performed simultaneously by multiple workers; much of the work we've done in C# in the past few years has been to ensure that tasks may be performed efficiently by a single worker. If you need to make breakfast, mow the lawn and pick up the mail, you don't need to hire a staff to do those things, but you can still pick up the mail while the toast is toasting.
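To make the breakfast point concrete, here is a sketch in which two waits overlap without hiring extra workers. ToastAsync and GetMailAsync are hypothetical stand-ins that use Task.Delay to simulate waiting; no thread is blocked while the delays run, and the total time is roughly one delay rather than two.

```csharp
using System.Diagnostics;
using System.Threading.Tasks;

public static class BreakfastDemo
{
    // Stand-ins for waiting on something external (a toaster, a mailbox).
    // Task.Delay is timer-based and does not occupy a thread while time passes.
    static async Task<string> ToastAsync()
    {
        await Task.Delay(300);
        return "toast";
    }

    static async Task<string> GetMailAsync()
    {
        await Task.Delay(300);
        return "mail";
    }

    public static async Task<(string Toast, string Mail, long ElapsedMs)> MakeBreakfastAsync()
    {
        var clock = Stopwatch.StartNew();
        // Start both waits first, then await them together so they overlap.
        Task<string> toast = ToastAsync();
        Task<string> mail = GetMailAsync();
        await Task.WhenAll(toast, mail);
        return (toast.Result, mail.Result, clock.ElapsedMilliseconds);
    }
}
```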
You should examine carefully your claim that the best way to increase performance is to parallelize. Remember, parallelization is simply hiring as many workers as there are CPU cores to run them, and then handing out tasks to each. This is only an improvement if (1) the tasks can actually be run in parallel, independently, (2) the tasks are gated on CPU, not I/O, and (3) you can write programs that are correct in the face of multiple threads of execution in the same program.
If your tasks really are "embarrassingly parallel" and can run completely independently of each other then you might consider process parallelism rather than thread parallelism. It's safer and easier.

The errors arise in a third-party tool that the code calls. This tool is supposed to be thread-safe, but it looks strongly as though the errors occur when two threads try to perform the same operation at the same time.
If that's correct, then parallel tasks won't prevent errors any more than Parallel will.
But when I examined the called code, I discovered my predecessor had threaded it using Task and Action, operators with which I'm not familiar.
That code looks OK, though it does use the task constructor combined with Start, which would be more elegantly expressed with Task.Run.
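For illustration, this is roughly what the prototype looks like with Task.Run. MyCleaningOperation here is a hypothetical stand-in that only counts calls, and it takes the row index as an int rather than an object; treat it as a sketch, not a drop-in replacement for the original.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class PrototypeWithTaskRun
{
    // Hypothetical stand-in for the third-party call; it only counts invocations.
    static int processed;
    static void MyCleaningOperation(int row) => Interlocked.Increment(ref processed);

    public static int Run(int rowCount)
    {
        processed = 0;
        // Task.Run creates and schedules each task in one step,
        // replacing "new Task(...)" followed by "Start()".
        Task[] tasks = Enumerable.Range(0, rowCount)
            .Select(i => Task.Run(() => MyCleaningOperation(i)))
            .ToArray();

        // Any exceptions from the tasks surface here as one AggregateException,
        // so there is no need to catch and rethrow them.
        Task.WaitAll(tasks);
        return processed;
    }
}
```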
The prototype is using a dynamic task-based parallelism approach, which is overkill for this situation. Your code is using parallel loops, which is more appropriate for data parallelism (see Selecting the Right Pattern and Figure 1).
Allegedly his prototype did not error and ran at a greater speed than mine.
If the error is due to a multithreading bug in the third-party tool, then the prototype was just as susceptible to those errors. Perhaps it was using an earlier version of the tool, or the data in the dev database did not expose the bug, or it just got lucky.
Regarding performance, I would expect Parallel to have superior performance to plain task parallelism in general, because Parallel can "batch" operations among tasks, reducing the overhead. Though that extra logic does come with a cost, too, so for small data sizes it could be less performant.
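To illustrate the batching idea, the sketch below makes the chunking explicit with Partitioner.Create, which hands each loop body a range of indices instead of a single item. The work (summing squares) is a made-up stand-in; when you pass an array or list straight to Parallel.ForEach, the partitioning it applies is chosen internally and is not what is shown here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class BatchedParallel
{
    public static long SumOfSquares(int[] rows)
    {
        if (rows.Length == 0)
            return 0; // Partitioner.Create rejects an empty range

        long total = 0;
        // Split the index range into chunks; each delegate call handles a whole chunk.
        var ranges = Partitioner.Create(0, rows.Length);
        Parallel.ForEach(ranges, range =>
        {
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++)
                local += (long)rows[i] * rows[i];
            // One synchronized update per chunk instead of one per item.
            Interlocked.Add(ref total, local);
        });
        return total;
    }
}
```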
IMO the bigger question is the correctness, and if it fails with Parallel, then it could just as easily fail with parallel tasks.

On the surface, the difference between a task and a thread is this:
A thread is one of the ways you can involve the operating system and the processor in having the computer do more than one thing at a time: it is something that is scheduled on a processor and allowed to execute, potentially (and these days, most often) at the same time as other things, simply because today's processors can do more than one thing at once.
A task, in the context of Task or Task<T>, on the other hand, is the representation of something that may complete at some point in the future, and that then represents the result of that completion.
That's basically it.
Sure, you can wrap a thread in a task, but if your question is just "what is the difference between a thread and a task" then the above is it.
You can easily represent, in a task, things that have nothing to do with threads or even parallel execution of code, and it would still be a task. Asynchronous I/O uses tasks heavily these days, and most of those (at least the good implementations) don't use (extra) threads at all.
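As an example of a task that involves no thread at all, TaskCompletionSource<T> produces a Task<T> that something else marks as complete. In the sketch below, the subscribe parameter is a hypothetical stand-in for any callback-based API (an I/O completion, a UI event, a timer).

```csharp
using System;
using System.Threading.Tasks;

public static class ThreadlessTask
{
    // Wraps a callback-style notification in a Task. No code runs "inside" the task:
    // it simply becomes complete when the callback fires.
    public static Task<string> WhenSignalled(Action<Action<string>> subscribe)
    {
        var tcs = new TaskCompletionSource<string>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        subscribe(value => tcs.TrySetResult(value));
        return tcs.Task;
    }
}
```

Until the callback is invoked, the task is simply pending; no thread is assigned to it.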

Related

How to synchronize TPL Tasks, by using Monitor / Mutex / Semaphore? Or should one use something else entirely?

I'm trying to move some of my old projects from ThreadPool and standalone Thread to TPL Task, because it supports some very handy features, like continuations with Task.ContinueWith (and from C# 5, async/await), better cancellation, exception capturing, and so on. I'd love to use them in my project. However, I already see potential problems, mostly with synchronization.
I've written some code which shows a Producer / Consumer problem, using a classic stand-alone Thread:
using System;
using System.Collections.Generic;
using System.Threading;

class ThreadSynchronizationTest
{
private int CurrentNumber { get; set; }
private object Synchro { get; set; }
private Queue<int> WaitingNumbers { get; set; }
public void TestSynchronization()
{
Synchro = new object();
WaitingNumbers = new Queue<int>();
var producerThread = new Thread(RunProducer);
var consumerThread = new Thread(RunConsumer);
producerThread.Start();
consumerThread.Start();
producerThread.Join();
consumerThread.Join();
}
private int ProduceNumber()
{
CurrentNumber++;
// Long running method. Sleeping as an example
Thread.Sleep(100);
return CurrentNumber;
}
private void ConsumeNumber(int number)
{
Console.WriteLine(number);
// Long running method. Sleeping as an example
Thread.Sleep(100);
}
private void RunProducer()
{
while (true)
{
int producedNumber = ProduceNumber();
lock (Synchro)
{
WaitingNumbers.Enqueue(producedNumber);
// Notify consumer about a new number
Monitor.Pulse(Synchro);
}
}
}
private void RunConsumer()
{
while (true)
{
int numberToConsume;
lock (Synchro)
{
// Ensure our wait condition is met
while (WaitingNumbers.Count == 0)
{
// Wait for pulse
Monitor.Wait(Synchro);
}
numberToConsume = WaitingNumbers.Dequeue();
}
ConsumeNumber(numberToConsume);
}
}
}
In this example, ProduceNumber generates a sequence of increasing integers, while ConsumeNumber writes them to the Console. If producing runs faster, numbers will be queued for consumption later. If consumption runs faster, the consumer will wait until a number is available. All synchronization is done using Monitor and lock (internally also Monitor).
When trying to 'TPL-ify' similar code, I already see a few issues I'm not sure how to go about. If I replace new Thread().Start() with Task.Run():
TPL Task is an abstraction, which does not even guarantee that the code will run on a separate thread. In my example, if the producer control method runs synchronously, the infinite loop will cause the consumer to never even start. According to MSDN, providing a TaskCreationOptions.LongRunning parameter when running the task should hint the TaskScheduler to run the method appropriately, however I didn't find any way to ensure that it does. Supposedly TPL is smart enough to run tasks the way the programmer intended, but that just seems like a bit of magic to me. And I don't like magic in programming.
If I understand how this works correctly, a TPL Task is not guaranteed to resume on the same thread it started on. If it resumes on a different thread, in this case it would try to release a lock it doesn't own while the other thread holds the lock forever, resulting in a deadlock. I remember Eric Lippert writing a while ago that this is the reason why await is not allowed in a lock block. Going back to my example, I'm not even sure how to go about solving this issue.
These are the few issues that crossed my mind, although there may be (probably are) more. How should I go about solving them?
Also, this made me think, is using the classical approach of synchronizing via Monitor, Mutex or Semaphore even the right way to do TPL code? Perhaps I'm missing something that I should be using instead?

Your question pushes the limits of broadness for Stack Overflow. Moving from plain Thread implementations to something based on Task and other TPL features involves a wide variety of considerations. Taken individually, each concern has almost certainly been addressed in a prior Stack Overflow Q&A, and taken in aggregate there are too many considerations to address competently and comprehensively in a single Stack Overflow Q&A.
So, with that said, let's look just at the specific issues you've asked about here.
TPL Task is an abstraction, which does not even guarantee that the code will run on a separate thread. In my example, if the producer control method runs synchronously, the infinite loop will cause the consumer to never even start. According to MSDN, providing a TaskCreationOptions.LongRunning parameter when running the task should hint the TaskScheduler to run the method appropriately, however I didn't find any way to ensure that it does. Supposedly TPL is smart enough to run tasks the way the programmer intended, but that just seems like a bit of magic to me. And I don't like magic in programming.
It is true that the Task object itself does not guarantee asynchronous behavior. For example, an async method which returns a Task object could contain no asynchronous operations at all, and could run for an extended period of time before returning an already-completed Task object.
On the other hand, Task.Run() is guaranteed to operate asynchronously. It is documented as such:
Queues the specified work to run on the ThreadPool and returns a task or Task<TResult> handle for that work
While the Task object itself abstracts the idea of a "future" or "promise" (to use synonymous terms found in programming), the specific implementation is very much tied to the thread pool. When used correctly, you can be assured of asynchronous operation.
If I understand how this works correctly, a TPL Task is not guaranteed to resume on the same thread it started on. If it resumes on a different thread, in this case it would try to release a lock it doesn't own while the other thread holds the lock forever, resulting in a deadlock. I remember Eric Lippert writing a while ago that this is the reason why await is not allowed in a lock block. Going back to my example, I'm not even sure how to go about solving this issue.
Only some synchronization objects are thread-specific. For example, Monitor is. But Semaphore is not. Whether this is useful to you or not depends on what you are trying to implement. For example, you can implement the producer/consumer pattern with a long running thread that uses BlockingCollection<T>, without needing to call any explicit synchronization objects at all. If you did want to use TPL techniques, you could use SemaphoreSlim and its WaitAsync() method.
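A sketch of the question's producer/consumer example rewritten around BlockingCollection<T>, assuming a finite number of items so that it terminates (the original loops forever). Each side runs as a LongRunning task, and the user code contains no explicit Monitor or lock.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BlockingProducerConsumer
{
    public static List<int> Run(int count)
    {
        var consumed = new List<int>();
        // Bounded to 10 items: Add blocks while the buffer is full,
        // and the consumer blocks while it is empty.
        using var numbers = new BlockingCollection<int>(boundedCapacity: 10);

        Task producer = Task.Factory.StartNew(() =>
        {
            for (int i = 1; i <= count; i++)
                numbers.Add(i);
            numbers.CompleteAdding(); // lets the consumer's loop finish
        }, TaskCreationOptions.LongRunning);

        Task consumer = Task.Factory.StartNew(() =>
        {
            // Blocks until an item arrives; ends after CompleteAdding once the buffer drains.
            foreach (int n in numbers.GetConsumingEnumerable())
                consumed.Add(n);
        }, TaskCreationOptions.LongRunning);

        Task.WaitAll(producer, consumer);
        return consumed;
    }
}
```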
Of course, you could also use the Dataflow API. For some scenarios this would be preferable. For very simple producer/consumer, it would probably be overkill. :)
Also, this made me think, is using the classical approach of synchronizing via Monitor, Mutex or Semaphore even the right way to do TPL code? Perhaps I'm missing something that I should be using instead?
IMHO, this is the crux of the matter. Moving from Thread-based programming to the TPL is not simply a matter of a straightforward mapping from one construct to another. In some cases, doing so would be inefficient, and in other cases it simply won't work.
Indeed, I would say a key feature of TPL and especially of async/await is that synchronization of threads is much less necessary. The general idea is to perform operations asynchronously, with minimal interaction between threads. Data flows between threads only at well-defined points (i.e. retrieved from the completed Task objects), reducing or even eliminating the need for explicit synchronization.
It's impossible to suggest specific techniques, as how best to implement something will depend on what exactly the goal is. But the short version is to understand that when using TPL, very often it is simply unnecessary to use synchronization primitives such as what you're used to using with the lower-level API. You should strive to develop enough experience with the TPL idioms that you can recognize which ones apply to which programming problems, so that you apply them directly rather than trying to mentally map your old knowledge.
In a way, this is (I think) analogous to learning a new human language. At first, one spends a lot of time mentally translating literally, possibly remapping to adjust to grammar, idioms, etc. But ideally at some point, one internalizes the language and is able to express oneself in that language directly. Personally, I've never gotten to that point when it comes to human languages, but I understand the concept in theory :). And I can tell you firsthand, it works quite well in the context of programming languages.
By the way, if you are interested in seeing how TPL ideas taken to extremes work out, you might like to read through Joe Duffy's recent blog articles on the topic. Indeed, the most recent version of .NET and associated languages have borrowed heavily from concepts developed in the Midori project he's describing.

Tasks in .NET are a hybrid. TPL brought tasks in .NET 4.0, but async-await only came with .NET 4.5.
There's a difference between the original tasks and the truly asynchronous tasks that came with async-await. The first is simply an abstraction of a "unit of work" that runs on some thread, but asynchronous tasks don't need a thread, or run anywhere at all.
The regular tasks (or Delegate Tasks) are queued on some TaskScheduler (usually by Task.Run that uses the ThreadPool) and are executed by the same thread throughout the task's lifetime. There's no problem at all in using a traditional lock here.
The asynchronous tasks (or Promise Tasks) usually don't have code to execute; they just represent an asynchronous operation that will complete in the future. Take Task.Delay(10000), for example. The task is created and completes after 10 seconds, but nothing is running in the meantime. Here you can still use the traditional lock when appropriate (but not with an await inside the critical section), but you can also lock asynchronously with SemaphoreSlim.WaitAsync (or other async synchronization constructs).
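A common form of that asynchronous lock is SemaphoreSlim(1, 1). The sketch below (names are illustrative) deliberately awaits inside the critical section, which a lock statement does not allow, and still produces a consistent count because the semaphore is not owned by a particular thread.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncLockDemo
{
    // SemaphoreSlim(1, 1) as an async-compatible mutual-exclusion lock.
    static readonly SemaphoreSlim gate = new SemaphoreSlim(1, 1);
    static int counter;

    static async Task IncrementAsync()
    {
        await gate.WaitAsync();
        try
        {
            int read = counter;
            await Task.Delay(1);      // an await inside the critical section
            counter = read + 1;       // may run on a different thread; still safe
        }
        finally
        {
            gate.Release();           // any thread may release it
        }
    }

    public static async Task<int> RunAsync(int callers)
    {
        counter = 0;
        await Task.WhenAll(Enumerable.Range(0, callers).Select(_ => IncrementAsync()));
        return counter;
    }
}
```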
Is using the classical approach of synchronizing via Monitor, Mutex or Semaphore even the right way to do TPL code?
It may be; that depends on what the code actually does and whether it uses TPL (i.e. Tasks) or async-await. However, there are many other tools you can now use, like async synchronization constructs (AsyncLock) and async data structures (TPL Dataflow).

Manage many repetitive, CPU intensive tasks, running parallelly?

I need to constantly perform 20 repetitive, CPU-intensive calculations as fast as possible. So there are 20 tasks, each containing a loop that runs:
while(!token.IsCancellationRequested)
to repeat them as fast as possible. All calculations are performed at the same time. Unfortunately, this makes the program unresponsive, so I added:
await Task.Delay(15);
At this point the program doesn't hang, but adding a delay is not the correct approach, and it unnecessarily slows down the calculations. It is a WPF program without MVVM. What approach would you suggest to keep all 20 tasks working at the same time? Each of them will be repeated as soon as it finishes. I would like to keep CPU (all cores) utilisation at or near maximum to ensure the best efficiency.
EDIT:
There are 20 controls in which the user adjusts some parameters. Calculations are done in:
private async Task Calculate()
{
Task task001 = null;
task001 = Task.Run(async () =>
{
while (!CTSFor_task001.IsCancellationRequested)
{
await Task.Delay(15);
await CPUIntensiveMethod();
}
}, CTSFor_task001.Token);
}
Each control is independent. The calculations are 100% CPU-bound, with no I/O activity (all values come from variables). During the calculations, the values of some UI items are changed:
this.Dispatcher.BeginInvoke(new Action(() =>
{
this.lbl_001.Content = "someString";
}));

Let me just write the whole thing as an answer. You're confusing two related, but ultimately separate concepts (thankfully - that's why you can benefit from the distinction). Note that those are my definitions of the concepts - you'll hear tons of different names for the same things and vice versa.
Asynchronicity is about breaking the imposed synchronicity of operations (i.e. op 1 waits for op 2, which waits for op 3, which waits for op 4...). For me, this is the more general concept, but nowadays it's more commonly used to mean what I'd call "inherent asynchronicity", i.e. the algorithm itself is asynchronous, and we're only using synchronous programming because we have to (and thanks to await and async, we don't have to anymore, yay!).
The key thought here is waiting. I can't do anything on the CPU, because I'm waiting for the result of an I/O operation. This kind of asynchronous programming is based on the thought that asynchronous operations are almost CPU free - they are I/O bound, not CPU-bound.
Parallelism is a special kind of the general asynchronicity, in which the operations don't primarily wait for one another. In other words, I'm not waiting, I'm working. If I have four CPU cores, I can ideally use four computing threads for this kind of processing - in an ideal world, my algorithm will scale linearly with the number of available cores.
With asynchronicity (waiting), using more threads will improve the apparent speed regardless of the number of the available logical cores. This is because 99% of the time, the code doesn't actually do any work, it's simply waiting.
With parallelism (working), using more threads is directly tied to the number of available work cores.
The lines blur a lot. That's because of things you may not even know are happening; for example, the CPU (and the computer as a whole) is incredibly asynchronous on its own. The apparent synchronicity it shows is only there to allow you to write code synchronously; all the optimizations and asynchronicity are limited by the fact that on output, everything is synchronous again. If the CPU had to wait for data from memory every time you do i++, it wouldn't matter whether your CPU was operating at 3 GHz or 100 MHz. Your awesome 3 GHz CPU would sit there idle 99% of the time.
With that said, your calculation tasks are CPU-bound. They should be executed using parallelism, because they are doing work. On the other hand, the UI is I/O bound, and it should be using asynchronous code.
In reality, all your async Calculate method does is mask the fact that it's not actually inherently asynchronous. Instead, you want to run it asynchronously to the I/O.
In other words, it's not the Calculate method that's asynchronous. It's the UI that wants this to run asynchronously to itself. Remove all that Task.Run clutter from there, it doesn't belong.
What to do next? That depends on your use case. Basically, there are two scenarios:
You want the tasks to always run, always in the background, from start to end. In that case, simply create a thread for each of them, and don't use Task at all. You might also want to explore some options like a producer-consumer queue etc., to optimize the actual run-time of the different possible calculation tasks. The actual implementation is quite tightly bound to what you're actually processing.
Or, you want to start the task on a UI action, and then work with the resulting values back in the UI method that started it when the results are ready. In that case, await finally comes into play:
private async void btn_Click(object sender, EventArgs e)
{
var result = await Task.Run(Calculate);
// Do some (little) work with the result once we get it
tbxResult.Text = result;
}
The async keyword actually has no place in your Calculate method at all.
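For the first scenario, a minimal sketch of one dedicated background thread per calculation, stopped with a CancellationToken. The calculations array is an assumption standing in for the 20 CPU-bound methods; UI updates would still go through Dispatcher.BeginInvoke as in the question.

```csharp
using System;
using System.Threading;

public static class DedicatedCalculators
{
    // Starts one dedicated background thread per calculation. Each thread loops
    // until cancellation is requested; no Task or Task.Delay is involved.
    public static Thread[] Start(Action[] calculations, CancellationToken token)
    {
        var threads = new Thread[calculations.Length];
        for (int i = 0; i < calculations.Length; i++)
        {
            Action calculate = calculations[i]; // copy for the closure
            threads[i] = new Thread(() =>
            {
                while (!token.IsCancellationRequested)
                    calculate();
            })
            {
                IsBackground = true // do not keep the process alive on exit
            };
            threads[i].Start();
        }
        return threads;
    }
}
```

Since the work is purely CPU-bound, the operating system shares the cores between these threads and the UI thread; the UI stays responsive as long as the UI thread itself is not running the calculations.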
Hope this is more clear now, feel free to ask more questions.

So what you actually seek is a clarification of a good practice to maximize performance while keeping the UI responsive. As Luaan clarified, the async and await sections in your proposal will not benefit your problem, and Task.Run is not suited for your work; using threads is a better approach.
Define an array of Threads to run one on each logical processor. Distribute your task data between them and control your 20 repetitive calculations via BufferBlock provided in TPL DataFlow library.
To keep UI responsive, I suggest two approaches:
Your calculations demand many frequent UI updates: Put their required update information in a queue and apply the updates in a Timer event.
Your calculations demand scarce UI updates: Update UI with an invocation method like Control.BeginInvoke
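The first approach might look like the sketch below: calculation threads post updates into a ConcurrentQueue, and a UI timer (for example a WPF DispatcherTimer's Tick handler, not shown) periodically drains it on the UI thread. UiUpdateQueue and Drain are hypothetical names, and collapsing several updates for the same control into the latest one is an added design choice, not something the answer specifies.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public sealed class UiUpdateQueue
{
    private readonly ConcurrentQueue<(string Control, string Text)> pending =
        new ConcurrentQueue<(string Control, string Text)>();

    // Safe to call from any calculation thread.
    public void Post(string control, string text) => pending.Enqueue((control, text));

    // Call on the UI thread (e.g. from a timer tick). Keeps only the latest text
    // per control, since intermediate values would be overwritten unseen anyway.
    public int Drain(Action<string, string> apply)
    {
        var latest = new Dictionary<string, string>();
        while (pending.TryDequeue(out var update))
            latest[update.Control] = update.Text;
        foreach (var pair in latest)
            apply(pair.Key, pair.Value);
        return latest.Count;
    }
}
```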

As @Luaan says, I would strongly recommend reading up on async/await, the key point being that it doesn't introduce any parallelism.
I think what you're trying to do is something like the simple example below, where you kick off CPUIntensiveMethod on the thread pool and await its completion. await returns control from the Calculate method (allowing the UI thread to continue working) until the task completes, at which point it continues with the while loop.
private async Task Calculate()
{
while (!CTSFor_task001.IsCancellationRequested)
{
await Task.Run(CPUIntensiveMethod);
}
}

Best way to limit the number of active Tasks running via the Parallel Task Library

Consider a queue holding a lot of jobs that need processing. The queue's limitation is that it can only return one job at a time, and there is no way of knowing how many jobs there are. The jobs take 10 s to complete and involve a lot of waiting for responses from web services, so they are not CPU-bound.
If I use something like this
while (true)
{
var job = Queue.PopJob();
if (job == null)
break;
Task.Factory.StartNew(job.Execute);
}
Then it will furiously pop jobs from the queue much faster than it can complete them, run out of memory and fall on its ass. >.<
I can't use (I don't think) ParallelOptions.MaxDegreeOfParallelism, because I can't use Parallel.Invoke or Parallel.ForEach.
3 alternatives I've found:
1. Replace Task.Factory.StartNew with
Task task = new Task(job.Execute, TaskCreationOptions.LongRunning);
task.Start();
This seems to somewhat solve the problem, but I am not clear exactly what it is doing and whether it is the best method.
2. Create a custom task scheduler that limits the degree of concurrency.
3. Use something like BlockingCollection to add jobs to a collection when they start and remove them when they finish, to limit the number that can be running.
With #1 I've got to trust that the right decision is made automatically; with #2/#3 I've got to work out the maximum number of tasks that can be running myself.
Have I understood this correctly - which is the better way, or is there another way?
EDIT - This is what I've come up with from the answers below, producer-consumer pattern.
As well as overall throughput, the aim was not to dequeue jobs faster than they could be processed, and not to have multiple threads polling the queue (not shown here, but that's a non-blocking operation and will lead to huge transaction costs if polled at high frequency from multiple places).
// BlockingCollection<Job>(1) blocks on Add if a job is already waiting
// (no point in being greedy!), and blocks consumers while it is empty.
var jobs = new BlockingCollection<Job>(1);
// Set up a number of consumer threads.
// Determine MAX_CONSUMER_THREADS empirically: on a 4-core CPU with jobs
// blocked waiting on I/O 50% of the time, it is likely to be 8.
var consumers = new List<Thread>();
for (int numConsumers = 0; numConsumers < MAX_CONSUMER_THREADS; numConsumers++)
{
    Thread consumer = new Thread(() =>
    {
        // GetConsumingEnumerable waits for the next job and ends cleanly once
        // CompleteAdding has been called and the collection is empty
        // (checking IsCompleted and then calling Take can throw in that race).
        foreach (var job in jobs.GetConsumingEnumerable())
        {
            job.Execute();
        }
    });
    consumer.Start();
    consumers.Add(consumer);
}
// Producer: take items off the queue and put them in the blocking collection ready for processing
while (true)
{
    var job = Queue.PopJob();
    if (job != null)
        jobs.Add(job);
    else
    {
        jobs.CompleteAdding();
        // Wait for the running jobs to finish
        foreach (var consumer in consumers)
            consumer.Join();
        break;
    }
}

I just gave an answer which is very applicable to this question.
Basically, the TPL Task class is made to schedule CPU-bound work. It is not made for blocking work.
You are working with a resource that is not CPU: waiting for service replies. This means the TPL will mismanage your resource, because it assumes CPU-boundedness to a certain degree.
Manage the resources yourself: Start a fixed number of threads or LongRunning tasks (which is basically the same). Decide on the number of threads empirically.
You can't put unreliable systems into production. For that reason, I recommend #1, but throttled. Don't create as many threads as there are work items. Create as many threads as are needed to saturate the remote service. Write yourself a helper function which spawns N threads and uses them to process M work items. You get totally predictable and reliable results that way.
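A minimal version of such a helper, assuming the work items are already in a list (FixedThreadPool and ProcessAll are illustrative names; the question's queue of unknown length would need a PopJob-based loop instead). Each thread claims the next index with Interlocked.Increment, so exactly threadCount threads work through all M items. Exceptions thrown by process are not handled and would terminate the process.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class FixedThreadPool
{
    // Processes every item using exactly 'threadCount' dedicated threads.
    public static void ProcessAll<T>(IReadOnlyList<T> items, int threadCount, Action<T> process)
    {
        int next = -1; // index of the last claimed item
        var threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(() =>
            {
                int i;
                // Atomically claim the next unprocessed index until none remain.
                while ((i = Interlocked.Increment(ref next)) < items.Count)
                    process(items[i]);
            });
            threads[t].Start();
        }
        foreach (var thread in threads)
            thread.Join(); // returns once all M items are done
    }
}
```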

Potential flow splits and continuations caused by await, later on in your code or in a 3rd party library, won't play nicely with long running tasks (or threads), so don't bother using long running tasks. In the async/await world, they're useless. More details here.
You can call ThreadPool.SetMaxThreads but before you make this call, make sure you set the minimum number of threads with ThreadPool.SetMinThreads, using values below or equal to the max ones. And by the way, the MSDN documentation is wrong. You CAN go below the number of cores on your machine with those method calls, at least in .NET 4.5 and 4.6 where I used this technique to reduce the processing power of a memory limited 32 bit service.
If, however, you don't wish to restrict the whole app but just the processing part of it, a custom task scheduler will do the job. A long time ago, MS released samples with several custom task schedulers, including a LimitedConcurrencyLevelTaskScheduler. Spawn the main processing task manually with Task.Factory.StartNew, providing the custom task scheduler, and every other task spawned by it will use it, including async/await and even Task.Yield, used for achieving asynchrony early on in an async method.
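The sample LimitedConcurrencyLevelTaskScheduler is not part of the framework, so the sketch below swaps in ConcurrentExclusiveSchedulerPair, a built-in type whose ConcurrentScheduler caps how many tasks run at once. The job body and the peak measurement are made up for the demonstration. Note that Task.Run always uses TaskScheduler.Default and ignores the current scheduler, so jobs must be started with Task.Factory.StartNew (or run as continuations) to stay under the cap.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LimitedScheduler
{
    // Runs 'jobCount' blocking jobs, at most 'maxConcurrency' at a time,
    // and returns the highest concurrency actually observed.
    public static int Run(int jobCount, int maxConcurrency)
    {
        TaskScheduler limited = new ConcurrentExclusiveSchedulerPair(
            TaskScheduler.Default, maxConcurrency).ConcurrentScheduler;

        int running = 0, peak = 0;
        Task[] jobs = Enumerable.Range(0, jobCount).Select(_ =>
            Task.Factory.StartNew(() =>
            {
                int now = Interlocked.Increment(ref running);
                // Raise 'peak' to 'now' if it is higher (compare-and-swap loop).
                int seen;
                while (now > (seen = Volatile.Read(ref peak)) &&
                       Interlocked.CompareExchange(ref peak, now, seen) != seen) { }
                Thread.Sleep(20); // stand-in for a blocking job
                Interlocked.Decrement(ref running);
            }, CancellationToken.None, TaskCreationOptions.None, limited)).ToArray();

        Task.WaitAll(jobs);
        return peak;
    }
}
```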
But for your particular case, both solutions won't stop exhausting your queue of jobs before completing them. That might not be desirable, depending on the implementation and purpose of that queue of yours. They are more like "fire a bunch of tasks and let the scheduler find the time to execute them" type of solutions. So perhaps something a bit more appropriate here could be a stricter method of control over the execution of the jobs via semaphores. The code would look like this:
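var semaphore = new SemaphoreSlim(max_concurrent_jobs);
while (true)
{
    var job = Queue.PopJob();
    if (job == null)
        break;
    semaphore.Wait();        // blocks the loop until a slot is free
    _ = ProcessJobAsync(job); // fire and forget; the semaphore bounds concurrency
}
async Task ProcessJobAsync(Job job)
{
    try
    {
        await Task.Yield();  // return to the loop before doing the work
        job.Execute();       // process the job here
    }
    finally
    {
        semaphore.Release(); // release the slot even if the job throws
    }
}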
There's more than one way to skin a cat. Use what you believe is appropriate.

Microsoft has a very cool library called DataFlow which does exactly what you want (and much more). Details here.
You should use the ActionBlock class and set the MaxDegreeOfParallelism of the ExecutionDataflowBlockOptions object. ActionBlock plays nicely with async/await, so even when your external calls are awaited, no new jobs will begin processing.
ExecutionDataflowBlockOptions actionBlockOptions = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10
};
this.sendToAzureActionBlock = new ActionBlock<List<Item>>(async items => await ProcessItems(items),
actionBlockOptions);
...
this.sendToAzureActionBlock.Post(itemsToProcess);

The problem here doesn't seem to be too many running Tasks, it's too many scheduled Tasks. Your code will try to schedule as many Tasks as it can, no matter how fast they are executed. And if you have too many jobs, this means you will get OOM.
Because of this, none of your proposed solutions will actually solve your problem. If it seems that simply specifying LongRunning solves your problem, then that's most likely because creating a new Thread (which is what LongRunning does) takes some time, which effectively throttles getting new jobs. So, this solution only works by accident, and will most likely lead to other problems later on.
Regarding the solution, I mostly agree with usr: the simplest solution that works reasonably well is to create a fixed number of LongRunning tasks and have one loop that calls Queue.PopJob() (protected by a lock if that method is not thread-safe) and Execute()s the job.
UPDATE: After some more thinking, I realized the following attempt will most likely behave terribly. Use it only if you're really sure it will work well for you.
But the TPL tries to figure out the best degree of parallelism, even for IO-bound Tasks. So, you might try to use that to your advantage. Long Tasks won't work here, because from the point of view of TPL, it seems like no work is done and it will start new Tasks over and over. What you can do instead is to start a new Task at the end of each Task. This way, TPL will know what's going on and its algorithm may work well. Also, to let the TPL decide the degree of parallelism, at the start of a Task that is first in its line, start another line of Tasks.
This algorithm may work well. But it's also possible that the TPL will make a bad decision regarding the degree of parallelism, I haven't actually tried anything like this.
In code, it would look like this:
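void ProcessJobs(bool isFirst)
{
    var job = Queue.PopJob(); // assumes PopJob() is thread-safe
    if (job == null)
        return;
    // The first Task in a line starts another line, letting the TPL grow the parallelism.
    if (isFirst)
        Task.Factory.StartNew(() => ProcessJobs(true));
    job.Execute();
    // Continue this line with the next job.
    Task.Factory.StartNew(() => ProcessJobs(false));
}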
And start it with
Task.Factory.StartNew(() => ProcessJobs(true));

TaskCreationOptions.LongRunning is useful for blocking tasks, and using it here is legitimate. It suggests to the scheduler that it dedicate a thread to the task. The scheduler itself tries to keep the number of threads at about the number of CPU cores to avoid excessive context switching.
It is well described in Threading in C# by Joseph Albahari.
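To observe the dedicated-thread behaviour, the sketch below checks Thread.CurrentThread.IsThreadPoolThread inside a task. It assumes the default ThreadPool-based scheduler; running LongRunning tasks on a separate thread is how that scheduler currently behaves, but since LongRunning is only a hint, treat this as an observation rather than a contract. LongRunningDemo is a hypothetical name.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: reports whether a task started with the given options
// ran on a thread-pool thread under the default scheduler.
public static class LongRunningDemo
{
    public static bool RunsOnPoolThread(TaskCreationOptions options)
    {
        Task<bool> task = Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread,
            CancellationToken.None,
            options,
            TaskScheduler.Default);

        // Block until the task completes and return what it observed.
        return task.Result;
    }
}
```

Typically RunsOnPoolThread(TaskCreationOptions.LongRunning) returns false and RunsOnPoolThread(TaskCreationOptions.None) returns true.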

I use a message queue/mailbox mechanism to achieve this. It's akin to the actor model. I have a class that has a MailBox; I call this class my "worker." It can receive messages. Those messages are queued, and they essentially define tasks that I want the worker to run. The worker calls Task.Wait() to wait for its current Task to finish before dequeueing the next message and starting the next task.
By limiting the number of workers I have, I am able to limit the number of concurrent threads/tasks that are being run.
This is outlined, with source code, in my blog post on a distributed compute engine. If you look at the code for IActor and the WorkerNode, I hope it makes sense.
https://long2know.com/2016/08/creating-a-distributed-computing-engine-with-the-actor-model-and-net-core/
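For illustration, a bare-bones worker with a mailbox might look like the sketch below. It is not the code from the blog post: MailboxWorker, Post, and the use of BlockingCollection<Func<Task>> as the mailbox are assumptions. Each worker runs one message at a time and calls Wait() on its Task before taking the next, so the number of workers you create caps the number of concurrent tasks. For brevity, a failing message ends that worker's loop.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical "worker" with a mailbox: posted messages are Func<Task> work items
// that the worker runs strictly one after another.
public sealed class MailboxWorker : IDisposable
{
    private readonly BlockingCollection<Func<Task>> _mailbox =
        new BlockingCollection<Func<Task>>();
    private readonly Task _loop;

    public MailboxWorker()
    {
        // Dedicated loop: take the next message, start its Task, and Wait() for it
        // before dequeueing the following message.
        _loop = Task.Factory.StartNew(() =>
        {
            foreach (Func<Task> message in _mailbox.GetConsumingEnumerable())
            {
                message().Wait(); // an exception here ends the loop (kept simple)
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Queue a message; it runs once all earlier messages have finished.
    public void Post(Func<Task> message) => _mailbox.Add(message);

    // Stop accepting messages, let the mailbox drain, then release resources.
    public void Dispose()
    {
        _mailbox.CompleteAdding();
        _loop.Wait();
        _mailbox.Dispose();
    }
}
```

With two MailboxWorker instances, at most two posted tasks run at the same time.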

Task.Factory.StartNew or Parallel.ForEach for many long-running tasks? [duplicate]

Possible Duplicate:
Parallel.ForEach vs Task.Factory.StartNew
I need to run about 1,000 tasks in a ThreadPool on a nightly basis (the number may grow in the future). Each task is performing a long running operation (reading data from a web service) and is not CPU intensive. Async I/O is not an option for this particular use case.
Given an IList<string> of parameters, I need to DoSomething(string x). I am trying to pick between the following two options:
IList<Task> tasks = new List<Task>();
foreach (var p in parameters)
{
tasks.Add(Task.Factory.StartNew(() => DoSomething(p), TaskCreationOptions.LongRunning));
}
Task.WaitAll(tasks.ToArray());
OR
Parallel.ForEach(parameters, new ParallelOptions {MaxDegreeOfParallelism = Environment.ProcessorCount*32}, DoSomething);
Which option is better and why?
Note :
The answer should include a comparison between the usage of TaskCreationOptions.LongRunning and MaxDegreeOfParallelism = Environment.ProcessorCount * SomeConstant.

Perhaps you aren't aware of this, but the members in the Parallel class are simply (complicated) wrappers around Task objects. In case you're wondering, the Parallel class creates the Task objects with TaskCreationOptions.None. However, the MaxDegreeOfParallelism would affect those task objects no matter what creation options were passed to the task object's constructor.
TaskCreationOptions.LongRunning gives a "hint" to the underlying TaskScheduler that it might perform better with oversubscription of the threads. Oversubscription is good for threads with high latency, for example I/O, because it assigns more than one thread (yes, thread, not task) to a single core so that the core always has something to do, instead of waiting around for an operation to complete while the thread is in a waiting state. The TaskScheduler that uses the ThreadPool runs LongRunning tasks on their own dedicated thread (the only case where you have a thread per task); otherwise it runs tasks normally, with scheduling and work stealing (really, what you want here anyway).
MaxDegreeOfParallelism controls the number of concurrent operations run. It's similar to specifying the maximum number of partitions that the data will be split into and processed from. Even if TaskCreationOptions.LongRunning could be specified, all MaxDegreeOfParallelism would do is limit the number of tasks running at a single time, similar to a TaskScheduler whose maximum concurrency level is set to that value, similar to this example.
You might want the Parallel.ForEach. However, setting MaxDegreeOfParallelism to such a high number won't guarantee that that many threads will be running at once, since the tasks are still controlled by the ThreadPoolTaskScheduler. That scheduler will keep the number of threads running at once as small as possible, which I suppose is the biggest difference between the two methods. You could write (and specify) your own TaskScheduler that mimics the max-degree-of-parallelism behavior and have the best of both worlds, but I doubt that's something you're interested in doing.
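The effect of MaxDegreeOfParallelism can be checked empirically. This sketch counts how many loop bodies are running at once and records the peak; Thread.Sleep stands in for a blocking web-service call, and DegreeOfParallelismDemo and PeakConcurrency are hypothetical names. The peak should never exceed the configured value, but it can be lower, which fits the point above that the setting is a cap rather than a guarantee.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs itemCount simulated blocking calls through Parallel.For
// and returns the highest number of loop bodies observed running at the same time.
public static class DegreeOfParallelismDemo
{
    public static int PeakConcurrency(int itemCount, int maxDegree)
    {
        int current = 0;
        int peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };

        Parallel.For(0, itemCount, options, i =>
        {
            int now = Interlocked.Increment(ref current);

            // Lock-free "max" update: retry until peak >= now or our CompareExchange wins.
            int seen = Volatile.Read(ref peak);
            while (now > seen)
            {
                int original = Interlocked.CompareExchange(ref peak, now, seen);
                if (original == seen) break;
                seen = original;
            }

            Thread.Sleep(10); // stand-in for a blocking web-service call
            Interlocked.Decrement(ref current);
        });

        return peak;
    }
}
```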
My guess is that, depending on latency and the number of actual requests you need to make, using tasks will perform better in many(?) cases, though they will wind up using more memory, while Parallel will be more consistent in resource usage. Of course, async I/O will perform monstrously better than either of these options, but I understand you can't do that because you're using legacy libraries. So, unfortunately, you'll be stuck with mediocre performance no matter which one you choose.
A real solution would be to figure out a way to make async I/O happen; since I don't know the situation, I don't think I can be more helpful than that. With async I/O, your program (read: thread) continues execution while the kernel waits for the I/O operation to complete (this is also known as using I/O completion ports). Because the thread is not in a waiting state, the runtime can do more work on fewer threads, which usually ends up in an optimal relationship between the number of cores and the number of threads. Much as I wish it did, adding more threads does not equate to better performance (it can actually hurt performance, because of things like context switching).
However, this entire answer is useless in determining a final answer for your question, though I hope it will give you some needed direction. You won't know what performs better until you profile it. If you don't try them both (I should clarify that I mean the Task without the LongRunning option, letting the scheduler handle thread switching) and profile them to determine what is best for your particular use case, you're selling yourself short.

Both options are entirely inappropriate for your scenario.
TaskCreationOptions.LongRunning is certainly a better choice for tasks that are not CPU-bound, as the TPL (Parallel classes/extensions) are almost exclusively meant for maximizing the throughput of a CPU-bound operation by running it on multiple cores (not threads).
However, 1000 tasks is an unacceptable number for this. Whether or not they're all running at once isn't exactly the issue; even 100 threads waiting on synchronous I/O is an untenable situation. As one of the comments suggests, your application will be using an enormous amount of memory and end up spending almost all of its time in context-switching. The TPL is not designed for this scale.
If your operations are I/O bound - and if you are using web services, they are - then async I/O is not only the correct solution, it's the only solution. If you have to re-architect some of your code (such as, for example, adding asynchronous methods to major interfaces where there were none originally), do it, because I/O completion ports are the only mechanism in Windows or .NET that can properly support this particular type of concurrency.
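As a sketch of what the async I/O route could look like, assuming the service call can be exposed as a Task-returning method: CallServiceAsync below is a hypothetical placeholder, with Task.Delay simulating the network wait, and AsyncThrottleDemo is likewise a made-up name. A SemaphoreSlim caps how many calls are in flight, and no thread is blocked while a call is pending, so a thousand operations don't need a thousand threads.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper showing throttled async I/O: many calls in flight, few threads used.
public static class AsyncThrottleDemo
{
    // Placeholder for a real async service call; Task.Delay simulates network latency.
    private static async Task<string> CallServiceAsync(string parameter)
    {
        await Task.Delay(20); // no thread is blocked while this "I/O" is pending
        return parameter.ToUpperInvariant();
    }

    public static async Task<string[]> ProcessAllAsync(IEnumerable<string> parameters, int maxInFlight)
    {
        using (var gate = new SemaphoreSlim(maxInFlight))
        {
            // ToList() starts every task up front; the semaphore limits how many
            // are past WaitAsync (i.e. actually calling the service) at once.
            List<Task<string>> tasks = parameters.Select(async p =>
            {
                await gate.WaitAsync();
                try
                {
                    return await CallServiceAsync(p);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            // WhenAll returns the results in the same order as the input.
            return await Task.WhenAll(tasks);
        }
    }
}
```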
I've never heard of a situation where async I/O was somehow "not an option". I cannot even conceive of any valid use case for this constraint. If you are unable to use async I/O then this would indicate a serious design problem that must be fixed, ASAP.

While this is not a direct comparison, I think it may help you. I do something similar to what you describe (in my case I know there is a load-balanced server cluster on the other end serving REST calls). I get good results using Parallel.ForEach to spin up an optimal number of worker threads, provided that I also use the following code to tell .NET it can open more than the default number of concurrent connections to an endpoint.
// uri is the System.Uri of the service endpoint you call
var servicePoint = System.Net.ServicePointManager.FindServicePoint(uri);
servicePoint.ConnectionLimit = 250;
Note you have to call that once for each unique URL you connect to.

Design Pattern Alternative to Coroutines

Currently, I have a large number of C# computations (method calls) residing in a queue that will be run sequentially. Each computation will use some high-latency service (network, disk...).
I was going to use Mono coroutines to allow the next computation in the computation queue to continue while a previous computation is waiting for the high latency service to return. However, I prefer to not depend on Mono coroutines.
Is there a design pattern that's implementable in pure C# that will enable me to process additional computations while waiting for high latency services to return?
Thanks
Update:
I need to execute a huge number (>10000) of tasks, and each task will be using some high-latency service. On Windows, you can't create that many threads.
Update:
Basically, I need a design pattern that emulates the advantages (as follows) of tasklets in Stackless Python (http://www.stackless.com/)
Huge # of tasks
If a task blocks the next task in the queue executes
No wasted CPU cycles
Minimal overhead switching between tasks

You can simulate cooperative microthreading using IEnumerable. Unfortunately this won't work with blocking APIs, so you need to find APIs that you can poll, or which have callbacks that you can use for signalling.
Consider a method
IEnumerable Thread()
{
    // do some stuff
    Foo();
    // co-operatively yield
    yield return null;
    // do some more stuff
    Bar();
    // sleep 2 seconds (new TimeSpan(2000) would be only 2000 ticks, i.e. 0.2 ms)
    yield return TimeSpan.FromSeconds(2);
}
The C# compiler will unwrap this into a state machine - but the appearance is that of a co-operative microthread.
The pattern is quite straightforward. You implement a "scheduler" that keeps a list of all the active IEnumerators. As it cycles through the list, it "runs" each one using MoveNext(). If MoveNext() returns false, the thread has ended, and the scheduler removes it from the list. If it returns true, the scheduler reads the Current property to determine the current state of the thread. If it's a TimeSpan, the thread wishes to sleep, and the scheduler moves it onto a queue that is flushed back into the main list when its sleep time has elapsed.
You can use other return objects to implement other signalling mechanisms. For example, define some kind of WaitHandle. If the thread yields one of these, it can be moved to a waiting queue until the handle is signalled. Or you could support WaitAll by yielding an array of wait handles. You could even implement priorities.
I did a simple implementation of this scheduler in about 150LOC but I haven't got round to blogging the code yet. It was for our PhyreSharp PhyreEngine wrapper (which won't be public), where it seems to work pretty well for controlling a couple of hundred characters in one of our demos. We borrowed the concept from the Unity3D engine -- they have some online docs that explain it from a user point of view.
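A minimal sketch of such a scheduler, under the assumptions that microthreads are written as IEnumerable iterators (like the Thread() method above), that yielding a TimeSpan means "sleep", and that any other yielded value (such as null) just passes control on. This is not the PhyreSharp code; MicroThreadScheduler and its members are hypothetical, WaitHandle support and priorities are left out, and RunToCompletion busy-waits while only sleeping microthreads remain.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical round-robin scheduler for IEnumerable-based microthreads.
// Yielding a TimeSpan means "sleep"; any other value (e.g. null) just yields control.
public sealed class MicroThreadScheduler
{
    private readonly List<IEnumerator> _active = new List<IEnumerator>();
    private readonly List<(IEnumerator Thread, DateTime WakeAt)> _sleeping =
        new List<(IEnumerator Thread, DateTime WakeAt)>();

    public void Start(IEnumerable microThread) => _active.Add(microThread.GetEnumerator());

    public bool HasWork => _active.Count > 0 || _sleeping.Count > 0;

    // One pass: advance every active microthread once, then wake any expired sleepers.
    public void Tick()
    {
        foreach (IEnumerator thread in _active.ToArray())
        {
            if (!thread.MoveNext())
            {
                _active.Remove(thread); // the microthread has ended
            }
            else if (thread.Current is TimeSpan delay)
            {
                _active.Remove(thread); // park it until the delay has elapsed
                _sleeping.Add((thread, DateTime.UtcNow + delay));
            }
        }

        DateTime now = DateTime.UtcNow;
        for (int i = _sleeping.Count - 1; i >= 0; i--)
        {
            if (_sleeping[i].WakeAt <= now)
            {
                _active.Add(_sleeping[i].Thread);
                _sleeping.RemoveAt(i);
            }
        }
    }

    // Simplification: spins (busy-waits) while only sleeping microthreads remain.
    public void RunToCompletion()
    {
        while (HasWork)
        {
            Tick();
        }
    }
}
```

Two microthreads that each log a step and then yield null interleave, so their steps come out as a0, b0, a1, b1.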

.NET 4.0 comes with extensive support for Task parallelism:
How to: Use Parallel.Invoke to Execute Simple Parallel Tasks
How to: Return a Value from a Task
How to: Chain Multiple Tasks with Continuations
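A small sketch touching all three of those features: Parallel.Invoke, a Task<int> that returns a value, and a ContinueWith continuation. The arithmetic is placeholder work and TaskBasicsDemo is a hypothetical name; it sticks to Task.Factory.StartNew, which is available in .NET 4.0.

```csharp
using System.Threading.Tasks;

// Hypothetical demo of Parallel.Invoke, a value-returning Task<int>, and a continuation.
// The arithmetic is placeholder work.
public static class TaskBasicsDemo
{
    public static int Run()
    {
        int left = 0, right = 0;

        // Parallel.Invoke runs both actions, possibly in parallel, and returns when both finish.
        Parallel.Invoke(
            () => left = Sum(1, 50),
            () => right = Sum(51, 100));

        // A Task<TResult> exposes its value through Result once it completes.
        Task<int> total = Task.Factory.StartNew(() => left + right);

        // ContinueWith schedules follow-up work that runs after the antecedent completes.
        Task<int> doubled = total.ContinueWith(antecedent => antecedent.Result * 2);

        return doubled.Result;
    }

    private static int Sum(int from, int to)
    {
        int sum = 0;
        for (int i = from; i <= to; i++) sum += i;
        return sum;
    }
}
```

TaskBasicsDemo.Run() returns 10100: the two halves sum to 1275 + 3775 = 5050, which the continuation doubles.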

I'd recommend using the Thread Pool to execute multiple tasks from your queue at once in manageable batches using a list of active tasks that feeds off of the task queue.
In this scenario your main worker thread would initially pop N tasks from the queue into the active tasks list to be dispatched to the thread pool (most likely using QueueUserWorkItem), where N represents a manageable amount that won't overload the thread pool, bog your app down with thread scheduling and synchronization costs, or suck up available memory due to the combined I/O memory overhead of each task.
Whenever a task signals completion to the worker thread, you can remove it from the active tasks list and add the next one from your task queue to be executed.
This will allow you to have a rolling set of N tasks from your queue. You can manipulate N to affect the performance characteristics and find what is best in your particular circumstances.
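The rolling window can be sketched as follows. This version swaps QueueUserWorkItem for Task.Run and Task.WaitAny (available from .NET 4.5), and assumes the task queue is only touched by the dispatching thread; RollingWindow and RunRolling are hypothetical names. At most n items run at a time, and each completion immediately pulls in the next queued item.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical rolling-window dispatcher: at most n items from the queue run at once,
// and each completion immediately starts the next queued item.
public static class RollingWindow
{
    public static void RunRolling(Queue<Action> taskQueue, int n)
    {
        var active = new List<Task>();

        // Prime the window with up to n items.
        while (active.Count < n && taskQueue.Count > 0)
        {
            active.Add(Task.Run(taskQueue.Dequeue()));
        }

        while (active.Count > 0)
        {
            // Wait for whichever active item finishes first.
            int finished = Task.WaitAny(active.ToArray());
            active[finished].Wait(); // rethrows (as AggregateException) if that item failed
            active.RemoveAt(finished);

            // Refill the slot from the queue, if anything is left.
            if (taskQueue.Count > 0)
            {
                active.Add(Task.Run(taskQueue.Dequeue()));
            }
        }
    }
}
```

Tuning then comes down to changing n and measuring, as described above.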
Since you are ultimately bottlenecked by hardware operations (disk I/O, network I/O, CPU), I imagine smaller is better. Two thread pool tasks working on disk I/O most likely won't execute faster than one.
You could also make the size and contents of the active task list flexible by restricting it to a set number of each particular type of task. For example, if you are running on a machine with 4 cores, you might find that the highest-performing configuration is four CPU-bound tasks running concurrently along with one disk-bound task and one network task.
If you already have one task classified as a disk IO task, you may choose to wait until it is complete before adding another disk IO task, and you may choose to schedule a CPU-bound or network-bound task in the meanwhile.
Hope this makes sense!
PS: Do you have any dependencies on the order of tasks?

You should definitely check out the Concurrency and Coordination Runtime (CCR). One of its samples describes exactly what you're talking about: you call out to long-latency services, and the CCR efficiently allows some other task to run while you wait. It can handle a huge number of tasks because it doesn't need to spawn a thread for each one, though it will use all your cores if you ask it to.

Isn't this a conventional use of multi-threaded processing?
Have a look at patterns such as Reactor here

Writing it to use Async IO might be sufficient.
This can lead to nasty, hard-to-debug code without strong structure in the design.

You should take a look at this:
http://www.replicator.org/node/80
This should do exactly what you want. It is a hack, though.

Some more information about the "Reactive" pattern (as mentioned by another poster) with respect to an implementation in .NET; aka "Linq to Events"
http://themechanicalbride.blogspot.com/2009/07/introducing-rx-linq-to-events.html
-Oisin

In fact, if you use one thread per task, you will lose the game. Think about why Node.js can support a huge number of connections: it uses a small number of threads with async I/O. The async and await keywords can help with this.
foreach (var task in tasks)
{
    // No thread is blocked while each awaited call is pending.
    await SendAsync(task.value);
    await ReadAsync();
}
SendAsync() and ReadAsync() are placeholders for async I/O calls.
Task parallelism is also a good choice, but I am not sure which one is faster. You can test both of them in your case.

Yes, of course you can. You just need to build a dispatcher mechanism that puts the lambdas you provide into a queue and calls them back. All the code I write in Unity uses this approach, and I never use coroutines; I wrap methods that use coroutines, such as the WWW class, just to get rid of them. In theory, coroutines can be faster because there is less overhead. In practice, they introduce new syntax to a language to do a fairly trivial task, and furthermore you can't follow the stack trace properly on an error in a coroutine because all you'll see is ->Next. You'll then have to implement the ability to run the tasks in the queue on another thread. However, there are parallel functions in the latest .NET, and you'd essentially be writing similar functionality. It wouldn't be many lines of code, really.
If anyone is interested I would send the code, don't have it on me.
