I love PLINQ. In all of my test cases on various parallelization patterns, it performs well and consistently. But I recently ran into a question that has me a little bothered: what is the functional difference between these two examples? What, if anything, distinguishes the PLINQ example from the following anti-pattern?
PLINQ:
public int PLINQSum()
{
return Enumerable.Range(0, N)
.AsParallel()
.Select((x) => x + 1)
.Sum();
}
Sync over async:
public int AsyncSum()
{
var tasks = Enumerable.Range(0, N)
.Select((x) => Task.Run(() => x + 1));
return Task.WhenAll(tasks).Result.Sum();
}
The AsyncSum method is not an example of Sync over Async. It is an example of using the Task.Run method with the intention of parallelizing a calculation. You might think that Task = async, but it's not. The Task class was introduced with the .NET Framework 4.0 in 2010, as part of the Task Parallel Library, two years before the advent of the async/await technology with the .NET Framework 4.5 in 2012.
What is Sync over Async: We use this term to describe a situation where an asynchronous API is invoked and then waited synchronously, blocking a thread until the asynchronous operation completes. It is implied that the asynchronous API has a truly asynchronous implementation, meaning that it uses no thread while the operation is in flight. Most, but not all, of the asynchronous APIs that are built into the .NET platform have truly asynchronous implementations.
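A minimal sketch of what the term does describe, using Task.Delay as a stand-in for a truly asynchronous API (the method name and delay are illustrative only):

```csharp
using System;
using System.Threading.Tasks;

class SyncOverAsyncDemo
{
    // A truly asynchronous stand-in: no thread is consumed during the delay.
    public static async Task<int> GetValueAsync()
    {
        await Task.Delay(100);
        return 42;
    }

    static void Main()
    {
        // Sync over Async: the calling thread is blocked until the
        // asynchronous operation completes. The thread does nothing useful
        // for the whole 100 ms, and in environments with a
        // SynchronizationContext this pattern can even deadlock.
        int value = GetValueAsync().Result;
        Console.WriteLine(value); // 42
    }
}
```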
The two examples in your question are technically different, but not because one of them is Sync over Async. Neither of them is. Both are parallelizing a synchronous operation (the mathematical addition x + 1), which cannot be performed without utilizing the CPU. And when we use the CPU, we use a thread.
Characterizing the AsyncSum method as an anti-pattern might be fair, but not because it is Sync over Async. You might want to call it an anti-pattern because:
It allocates and schedules a Task for each number in the sequence, incurring a gigantic overhead compared to the tiny computational work that has to be performed.
It saturates the ThreadPool for the whole duration of the parallel operation.
It forces the ThreadPool to create additional threads, resulting in oversubscription (more threads than CPUs). This results in the operating system having more work to do (switching between threads).
It has bad behavior in case of exceptions. Instead of stopping the operation as soon as possible after an error has occurred, it will invoke the lambda for every element in the sequence regardless. As a result you'll have to wait longer before you observe the error, and you might ultimately observe a huge number of errors.
It doesn't utilize the current thread. The current thread is blocked doing nothing, while all the work is done by ThreadPool threads. In comparison, PLINQ utilizes the current thread as one of its worker threads. This is something that you could also do manually, by creating some of the tasks with the Task constructor (instead of Task.Run) and then using the RunSynchronously method to run them on the current thread, while the rest of the tasks are scheduled on the ThreadPool.
var task1 = new Task<int>(() => 1 + 1); // Cold task
var task2 = Task.Run(() => 2 + 2); // Hot task scheduled on the ThreadPool
task1.RunSynchronously(); // Run the cold task on the current thread
int sum = Task.WhenAll(task1, task2).Result.Sum(); // Wait for both tasks
The name AsyncSum itself is inappropriate, since there is nothing asynchronous happening inside this method. A better name could be WhenAll_TaskRun_Sum.
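As a contrast to the per-element Task overhead mentioned in the first bullet, here is a sketch of a lower-overhead way to parallelize the same tiny computation: Parallel.For with thread-local partial sums creates roughly one task per worker rather than one per element.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ChunkySum
{
    // Parallel.For with thread-local state: each worker accumulates a
    // private partial sum, and partials are merged once per worker with
    // Interlocked.Add, so there is no per-element Task allocation.
    public static int Sum(int n)
    {
        int total = 0;
        Parallel.For(0, n,
            () => 0,                                   // per-worker initial partial sum
            (x, state, partial) => partial + x + 1,    // accumulate locally
            partial => Interlocked.Add(ref total, partial)); // merge once per worker
        return total;
    }

    static void Main() => Console.WriteLine(Sum(1000)); // 500500
}
```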
Related
Given an external API method signature like the following:
Task<int> ISomething.GetValueAsync(int x)
I often see code such as the following:
public async Task Go(ISomething i)
{
int a = await i.GetValueAsync(1);
int b = await i.GetValueAsync(2);
int c = await i.GetValueAsync(3);
int d = await i.GetValueAsync(4);
Console.WriteLine($"{a} - {b} - {c} - {d}");
}
In code reviews it is sometimes suggested this is inefficient and should be rewritten:
public async Task Go(ISomething i)
{
Task<int> ta = i.GetValueAsync(1);
Task<int> tb = i.GetValueAsync(2);
Task<int> tc = i.GetValueAsync(3);
Task<int> td = i.GetValueAsync(4);
await Task.WhenAll(ta,tb,tc,td);
int a = ta.Result, b= tb.Result, c=tc.Result, d = td.Result;
Console.WriteLine($"{a} - {b} - {c} - {d}");
}
I can see the logic behind allowing parallelisation but in reality is this worthwhile? It presumably adds some overhead to the scheduler and if the methods themselves are very lightweight, it seems likely that thread parallelisation would be more costly than the time saved. Further, on a busy server running many applications, it seems unlikely there would be lots of cores sitting idle.
I can't tell if this is always a good pattern to follow, or if it's an optimisation to make on a case-by-case basis? Do Microsoft (or anyone else) give good best practice advice? Should we always write Task-based code in this way as a matter of course?
if the methods themselves are very lightweight, it seems likely that thread parallelisation would be more costly than the time saved.
This is definitely an issue with parallel code (the parallel form of concurrency), but this is asynchronous concurrency. Presumably, GetValueAsync is a true asynchronous method, which generally implies I/O operations. And I/O operations tend to dwarf many local considerations. (Side note: the WhenAll approach actually causes fewer thread switches and less scheduling overhead, but it does increase memory overhead slightly).
So, this is a win in the general case. However, that does assume that the statements above are correct (i.e., GetValueAsync performs I/O).
As a contrary point, however, I have seen this sometimes used as a band-aid for an inadequate data access layer. If you're hitting a SQL database with four queries, then the best solution is usually to combine that data access into a single query rather than do four calls to the same SQL database concurrently.
First, what does await actually do?
public async Task M() {
await Task.Yield();
}
If the awaitable object has already completed, then execution continues immediately. Otherwise a callback delegate is added to the awaitable object. This callback will be invoked immediately when the task result is made available.
So what about Task.WhenAll, how does that work? The current implementation adds a callback to every incomplete task. That callback will decrement a counter atomically. When the counter reaches zero, the result will be made available.
No new I/O is scheduled, no continuations added to the thread pool. Just a small counter decrement added to the end of each task's processing. Your continuation will resume on whichever thread executed the last task.
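The counter mechanism described above can be sketched with a TaskCompletionSource. This is a simplified model, not the actual Task.WhenAll source: it ignores results, exceptions, and cancellation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class NaiveWhenAll
{
    // Simplified model of Task.WhenAll: attach one small continuation to
    // each task that atomically decrements a counter; whichever task
    // finishes last completes the returned task.
    public static Task WhenAll(params Task[] tasks)
    {
        var tcs = new TaskCompletionSource<bool>();
        int remaining = tasks.Length;
        if (remaining == 0)
            tcs.SetResult(true); // nothing to wait for

        foreach (var task in tasks)
        {
            task.ContinueWith(_ =>
            {
                if (Interlocked.Decrement(ref remaining) == 0)
                    tcs.TrySetResult(true); // the last task completes the set
            }, TaskScheduler.Default);
        }
        return tcs.Task;
    }

    static void Main()
    {
        var all = WhenAll(Task.Delay(10), Task.Delay(20));
        all.Wait();
        Console.WriteLine(all.IsCompleted); // True
    }
}
```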
Unless you are actually measuring a performance problem, I wouldn't worry about the overhead of Task.WhenAll.
I have a similar question to Running async methods in parallel in that I wish to run a number of functions from a list of functions in parallel.
I have noted in a number of comments online it is mentioned that if you have another await in your methods, Task.WhenAll() will not help as Async methods are not parallel.
I then went ahead and created a task for each function call with the below (the number of parallel functions will be small, typically 1 to 5):
public interface IChannel
{
Task SendAsync(IMessage message);
}
public class SendingChannelCollection
{
protected List<IChannel> _channels = new List<IChannel>();
/* snip methods to add channels to list etc */
public async Task SendAsync(IMessage message)
{
var tasks = SendAll(message);
await Task.WhenAll(tasks.AsParallel().Select(async task => await task));
}
private IEnumerable<Task> SendAll(IMessage message)
{
foreach (var channel in _channels)
yield return channel.SendAsync(message);
}
}
I would like to double-check I am not doing anything horrendous with code smells or bugs as I get to grips with what I have patched together from what I have found online. Many thanks in advance.
Let's compare the behaviour of your line:
await Task.WhenAll(tasks.AsParallel().Select(async task => await task));
in contrast with:
await Task.WhenAll(tasks);
What are you delegating to PLINQ in the first case? Only the await operation, which does basically nothing - it invokes the async/await machinery to wait for one task. So you're setting up a PLINQ query that does all the heavy work of partitioning and merging the results of an operation that amounts to "do nothing until this task completes". I doubt that is what you want.
If you have another await in your methods, Task.WhenAll() will not help as Async methods are not parallel.
I couldn't find that in any of the answers to the linked question, except for one comment under the question itself. I'd say that it's probably a misconception, stemming from the fact that async/await doesn't magically turn your code into concurrent code. But, assuming you're in an environment without a custom SynchronizationContext (so not an ASP.NET or WPF app), continuations of async functions will be scheduled on the thread pool and can run in parallel. I'll refer you to this answer to shed some light on that. That basically means that if your SendAsync looks something like this:
async Task SendAsync(IMessage message)
{
// Synchronous initialization code.
await something;
// Continuation code.
}
Then:
The first part before await runs synchronously. If this part is heavyweight, you should introduce parallelism in SendAll so that the initialization code is run in parallel.
await works as usual, waiting for work to complete without using up any threads.
The continuation code will be scheduled on the thread pool, so if a few awaits finish up at the same time their continuations might be run in parallel if there's enough threads in the thread pool.
All of the above is assuming that await something actually awaits asynchronously. If there's a chance that await something completes synchronously, then the continuation code will also run synchronously.
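That caveat is easy to observe directly: awaiting an already-completed task never leaves the current thread. A minimal sketch:

```csharp
using System;
using System.Threading.Tasks;

class SyncCompletion
{
    // Awaiting an already-completed task does not suspend the method:
    // execution continues synchronously on the same thread.
    public static async Task<bool> StaysOnSameThread()
    {
        int before = Environment.CurrentManagedThreadId;
        await Task.CompletedTask;   // completes synchronously
        return before == Environment.CurrentManagedThreadId;
    }

    static void Main()
        => Console.WriteLine(StaysOnSameThread().Result); // True
}
```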
Now there is a catch. In the question you linked one of the answers states:
Task.WhenAll() has a tendency to become unperformant with large scale/amount of tasks firing simultaneously - without moderation/throttling.
Now I don't know if that's true, since I wasn't able to find any other source claiming that. I guess it's possible, and in that case it might actually be beneficial to invoke PLINQ to deal with partitioning and throttling for you. However, you said you typically handle 1-5 functions, so you shouldn't worry about this.
So to summarize, parallelism is hard and the correct approach depends on what exactly your SendAsync method looks like. If it has heavyweight initialization code and that's what you want to parallelise, you should run all the calls to SendAsync in parallel. Otherwise, async/await will implicitly be using the thread pool anyway, so your call to PLINQ is redundant.
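Concretely, the two cases in this summary might look like the sketch below. The IChannel/IMessage shapes are taken from the question; CountingChannel is a hypothetical stand-in for a real channel, used only for demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public interface IMessage { }

public interface IChannel
{
    Task SendAsync(IMessage message);
}

public class SendingChannelCollection
{
    private readonly List<IChannel> _channels = new List<IChannel>();

    public void Add(IChannel channel) => _channels.Add(channel);

    // Case 1: SendAsync is truly asynchronous - just collect the tasks
    // and await them all. No PLINQ, no extra threads.
    public Task SendAsync(IMessage message)
        => Task.WhenAll(_channels.Select(c => c.SendAsync(message)));

    // Case 2: SendAsync has heavyweight synchronous initialization -
    // push each whole call onto the thread pool so those parts overlap.
    public Task SendParallelAsync(IMessage message)
        => Task.WhenAll(_channels.Select(c => Task.Run(() => c.SendAsync(message))));
}

// Hypothetical channel that just counts sends; Task.Yield stands in
// for real asynchronous I/O.
public class CountingChannel : IChannel
{
    public int Sent;

    public async Task SendAsync(IMessage message)
    {
        await Task.Yield();
        Interlocked.Increment(ref Sent);
    }
}

class Demo
{
    static void Main()
    {
        var collection = new SendingChannelCollection();
        collection.Add(new CountingChannel());
        collection.Add(new CountingChannel());
        collection.SendAsync(null).Wait(); // no real message needed here
    }
}
```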
If we fill a list of Tasks that need to do both CPU-bound and I/O bound work, by simply passing their method declaration to that list (Not by creating a new task and manually scheduling it by using Task.Start), how exactly are these tasks handled?
I know that they are not done in parallel, but concurrently.
Does that mean that a single thread will move along them, and that single thread might not be the same thread in the thread pool, or the same thread that initially started waiting for them all to complete/added them to the list?
EDIT: My question is about how exactly these items are handled in the list concurrently - is the calling thread moving through them, or something else is going on?
Code for those that need code:
public async Task SomeFancyMethod(int i)
{
doCPUBoundWork(i);
await doIOBoundWork(i);
}
//Main thread
List<Task> someFancyTaskList = new List<Task>();
for (int i = 0; i< 10; i++)
someFancyTaskList.Add(SomeFancyMethod(i));
// Do various other things here --
// how are the items handled in the meantime?
await Task.WhenAll(someFancyTaskList);
Thank you.
Asynchronous methods always start running synchronously. The magic happens at the first await. When the await keyword sees an incomplete Task, it returns its own incomplete Task. If it sees a complete Task, execution continues synchronously.
So at this line:
someFancyTaskList.Add(SomeFancyMethod(i));
You're calling SomeFancyMethod(i), which will:
Run doCPUBoundWork(i) synchronously.
Run doIOBoundWork(i).
If doIOBoundWork(i) returns an incomplete Task, then the await in SomeFancyMethod will return its own incomplete Task.
Only then will the returned Task be added to your list and your loop will continue. So the CPU-bound work is happening sequentially (one after the other).
There is some more reading about this here: Control flow in async programs (C#)
As each I/O operation completes, the continuations of those tasks are scheduled. How those are done depends on the type of application - particularly, if there is a context that it needs to return to (desktop and ASP.NET do unless you specify ConfigureAwait(false), ASP.NET Core doesn't). So they might run sequentially on the same thread, or in parallel on ThreadPool threads.
If you want to immediately move the CPU-bound work to another thread to run that in parallel, you can use Task.Run:
someFancyTaskList.Add(Task.Run(() => SomeFancyMethod(i)));
If this is in a desktop application, then this would be wise, since you want to keep CPU-heavy work off of the UI thread. However, then you've lost your context in SomeFancyMethod, which may or may not matter to you. In a desktop app, you can always marshall calls back to the UI thread fairly easily.
I assume you don't mean passing their method declaration, but just invoking the method, like so:
var tasks = new Task[] { MethodAsync("foo"),
MethodAsync("bar") };
And we'll compare that to using Task.Run:
var tasks = new Task[] { Task.Run(() => MethodAsync("foo")),
Task.Run(() => MethodAsync("bar")) };
First, let's get the quick answer out of the way. The first variant will have lower or equal parallelism to the second variant. Parts of MethodAsync will run on the caller thread in the first case, but not in the second case. How much this actually affects the parallelism depends entirely on the implementation of MethodAsync.
To get a bit deeper, we need to understand how async methods work. We have a method like:
async Task MethodAsync(string argument)
{
DoSomePreparationWork();
await WaitForIO();
await DoSomeOtherWork();
}
What happens when you call such a method? There is no magic. The method is a method like any other, just rewritten as a state machine (similar to how yield return works). It will run as any other method until it encounters the first await. At that point, it may or may not return a Task object. You may or may not await that Task object in the caller code. Ideally, your code should not depend on the difference. Just like yield return, await on a (non-completed!) task returns control to the caller of the method. Essentially, the contract is:
If you have CPU work to do, use my thread.
If whatever you do would mean the thread isn't going to use the CPU, return a promise of the result (a Task object) to the caller.
It allows you to maximize the amount of useful CPU work each thread is doing. If the asynchronous operation doesn't need the CPU, it will let the caller do something else. It doesn't inherently allow for parallelism, but it gives you the tools to do any kind of asynchronous operation, including parallel operations. One of the operations you can do is Task.Run, which is just another asynchronous method that returns a task, but which returns to the caller immediately.
So, the difference between:
MethodAsync("foo");
MethodAsync("bar");
and
Task.Run(() => MethodAsync("foo"));
Task.Run(() => MethodAsync("bar"));
is that the former will return (and continue to execute the next MethodAsync) after it reaches the first await on a non-completed task, while the latter will always return immediately.
You should usually decide based on your actual requirements:
Do you need to use the CPU efficiently and minimize context switching etc., or do you expect the async method to have negligible CPU work to do? Invoke the method directly.
Do you want to encourage parallelism or do you expect the async method to do interesting amounts of CPU work? Use Task.Run.
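A small sketch of that difference, using a hypothetical MethodAsync whose synchronous part just writes to a log:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class StartBehavior
{
    public static readonly ConcurrentQueue<string> Log = new ConcurrentQueue<string>();

    public static async Task MethodAsync(string argument)
    {
        Log.Enqueue("sync part: " + argument); // runs on the calling thread
        await Task.Yield();                    // first suspension point
        Log.Enqueue("continuation: " + argument);
    }

    static void Main()
    {
        // Direct invocation: by the time the call returns, the part before
        // the first await has already run, on this very thread.
        Task direct = MethodAsync("foo");
        Console.WriteLine(Log.Contains("sync part: foo")); // True

        // Task.Run: the call returns immediately; even the synchronous
        // part is queued to the thread pool.
        Task wrapped = Task.Run(() => MethodAsync("bar"));

        Task.WaitAll(direct, wrapped);
    }
}
```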
Here is your code rewritten without async/await, with old-school continuations instead. Hopefully it will make it easier to understand what's going on.
public Task CompoundMethodAsync(int i)
{
doCPUBoundWork(i);
return doIOBoundWorkAsync(i).ContinueWith(_ =>
{
doMoreCPUBoundWork(i);
});
}
// Main thread
var tasks = new List<Task>();
for (int i = 0; i < 10; i++)
{
Task task = CompoundMethodAsync(i);
tasks.Add(task);
}
// The doCPUBoundWork has already run synchronously 10 times at this point
// Do various things while the compound tasks are progressing concurrently
Task.WhenAll(tasks).ContinueWith(_ =>
{
// The doIOBoundWorkAsync/doMoreCPUBoundWork have completed 10 times at this point
// Do various things after all compound tasks have been completed
});
// No code should exist here. Move everything inside the continuation above.
When I need some parallel processing I usually do it like this:
static void Main(string[] args)
{
var tasks = new List<Task>();
var toProcess = new List<string>{"dog", "cat", "whale", "etc"};
toProcess.ForEach(s => tasks.Add(CanRunAsync(s)));
Task.WaitAll(tasks.ToArray());
}
private static async Task CanRunAsync(string item)
{
// simulate some work
await Task.Delay(10000);
}
I had cases when this did not process the items in parallel and had to use Task.Run to force it to run on different threads.
What am I missing?
Task means "a thing that needs doing, which may have already completed, may be executing on a parallel thread, or may be depending on out-of-process data (sockets, etc), or might just be ... connected to a switch somewhere that says 'done'" - it has very little to do with threading, other than: if you schedule a continuation (aka await), then somehow that will need to get back onto a thread to fire, but how that happens and what that means is up to whatever code created and owns the task.
Note: parallelism can be expressed in terms of multiple tasks (if you so choose), but multiple tasks doesn't imply parallelism.
In your case: it all depends on what CanRunAsync does or is in your real code - and we don't know that (the Task.Delay simulation shown is truly asynchronous, so the real method presumably isn't).
I had cases when this did not process the items in parallel and had to use Task.Run to force it to run on different threads.
Most likely these cases were associated with methods that have an asynchronous contract, but their implementation is synchronous. Like this method for example:
static async Task NotAsync(string item)
{
Thread.Sleep(10000); // Simulate a CPU-bound calculation, or a blocking I/O operation
await Task.CompletedTask;
}
Any thread that invokes this method will be blocked for 10 seconds, and then it will be handed back an already completed task. Although the contract of the NotAsync method is asynchronous (it has an awaitable return type), its actual implementation is synchronous because it does all the work during the invocation. So when you try to create multiple tasks by invoking this method:
toProcess.ForEach(s => tasks.Add(NotAsync(s)));
...the current thread will be blocked for 10 seconds times the number of tasks. By the time the tasks have all been created, they are also all completed, so waiting for them causes zero additional waiting:
Task.WaitAll(tasks.ToArray()); // Waits for 0 seconds
By wrapping the NotAsync in a Task.Run you ensure that the current thread will not be blocked, because the NotAsync will be invoked on the ThreadPool.
toProcess.ForEach(s => tasks.Add(Task.Run(() => NotAsync(s))));
Task.Run immediately returns a Task, with guaranteed zero blocking of the caller.
It should be noted that writing asynchronous methods with synchronous implementations violates Microsoft's guidelines:
An asynchronous method that is based on TAP can do a small amount of work synchronously, such as validating arguments and initiating the asynchronous operation, before it returns the resulting task. Synchronous work should be kept to the minimum so the asynchronous method can return quickly.
But sometimes even Microsoft violates this guideline. That's because violating this one is better than violating the guideline about not exposing asynchronous wrappers for synchronous methods. In other words, exposing APIs that call Task.Run internally in order to give the impression of being asynchronous is an even greater sin than blocking the current thread.
I inherited a large web application that uses MVC5 and C#. Some of our controllers make several slow database calls and I want to make them asynchronous in an effort to allow the worker threads to service other requests while waiting for the database calls to complete. I want to do this with the least amount of refactoring. Say I have the following controller
public string JsonData()
{
var a = this.servicelayer.getA();
var b = this.servicelayer.getB();
return SerializeObject(new {a, b});
}
I have made the two expensive calls a, b asynchronous by leaving the service layer unchanged and rewriting the controller as
public async Task<string> JsonData()
{
var task1 = Task<something>.Run(() => this.servicelayer.getA());
var task2 = Task<somethingelse>.Run(() => this.servicelayer.getB());
await Task.WhenAll(task1, task2);
var a = await task1;
var b = await task2;
return SerializeObject(new {a, b});
}
The above code runs without any issues, but I can't tell using Visual Studio if the worker threads are now available to service other requests, or if using Task.Run() in an ASP.NET controller doesn't do what I think it does. Can anyone comment on the correctness of my code and whether it can be improved in any way? Also, I read that using async in a controller has additional overhead and should be used only for long-running code. What is the minimum criteria I can use to decide if the controller needs async? I understand that every use case is different, but I'm wondering if there is a baseline that I can use as a starting point. 2 database calls? Anything over 2 seconds to return?
The guideline is that you should use async whenever you have I/O. I.e., a database. The overhead is miniscule compared to any kind of I/O.
That said, blocking a thread pool thread via Task.Run is what I call "fake asynchrony". It's exactly what you don't want to do on ASP.NET.
Instead, start at your "lowest-level" code and make that truly asynchronous. E.g., EF6 supports asynchronous database queries. Then let the async code grow naturally from there towards your controller.
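As a sketch of what "growing async naturally from the bottom" could look like, with Task.Delay standing in for truly asynchronous database calls (real code would await EF6's async query APIs such as ToListAsync instead; the class and method names mirror the question):

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical service layer where the lowest level is truly asynchronous.
public class ServiceLayer
{
    public async Task<string> GetAAsync()
    {
        await Task.Delay(50);   // stands in for an awaited database query
        return "a-data";
    }

    public async Task<string> GetBAsync()
    {
        await Task.Delay(50);
        return "b-data";
    }
}

public class JsonDataController
{
    private readonly ServiceLayer servicelayer = new ServiceLayer();

    // No Task.Run: the worker thread is freed while both queries are in
    // flight, and the two queries overlap instead of running back to back.
    public async Task<string> JsonData()
    {
        Task<string> taskA = servicelayer.GetAAsync();
        Task<string> taskB = servicelayer.GetBAsync();
        await Task.WhenAll(taskA, taskB);
        return $"{{\"a\":\"{taskA.Result}\",\"b\":\"{taskB.Result}\"}}";
    }
}

class Demo
{
    static void Main()
        => Console.WriteLine(new JsonDataController().JsonData().Result);
}
```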
The only improvement the new code has is it runs both A and B concurrently and not one at a time. There's actually no real asynchrony in this code.
When you use Task.Run you are offloading work to be done on another thread, so basically you start 2 threads and release the current thread while awaiting both tasks (each of them running completely synchronously)
That means that the operation will finish faster (because of the parallelism) but will be using twice the threads and so will be less scalable.
What you do want to do is make sure all your operations are truly asynchronous. That will mean having a servicelayer.getAAsync() and servicelayer.getBAsync() so you could truly release the threads while IO is being processed:
public async Task<string> JsonData()
{
    var taskA = servicelayer.getAAsync();
    var taskB = servicelayer.getBAsync();
    return SerializeObject(new { a = await taskA, b = await taskB });
}
If you can't make sure your actual IO operations are truly async, it would be better to keep the old code.
More on why to avoid Task.Run: Task.Run Etiquette Examples: Don't Use Task.Run in the Implementation