Number of threads required for IO bound async work - c#

TL;DR:
Is it possible to kick off a series of IO bound tasks using only one thread using async await?
In less brief:
Trying to learn async await. In this video ("Async Best Practices for C# and Visual Basic"), the speaker gives an example of using async await to kick off some IO bound work. He explicitly says (at 21m 40s) whilst explaining why parallel for loops are not optimal as they use up loads of threads:
We don't need more threads for this. We don't need two threads...
Can we really kick off multiple requests asynchronously without using more than one thread? How? Unfortunately, the speaker didn't provide all the code, so here's my stab at it:
// Pretty much exactly the same as video
private async Task<List<string>> LoadHousesAsync()
{
// Running on the UI thread
Debug.Print("Thread: " + Thread.CurrentThread.ManagedThreadId);
var tasks = new List<Task<string>>();
for (int i = 0; i < 5; i++)
{
Task<string> t = LoadHouseAsync(i);
tasks.Add(t);
}
string[] loadedHouses = await Task.WhenAll(tasks);
return loadedHouses.ToList();
}
// My guess of the LoadHouseAsync method
private Task<string> LoadHouseAsync(int i)
{
// Running on the UI thread
Debug.Print("Thread: " + Thread.CurrentThread.ManagedThreadId);
return Task.Run(() => LoadHouse(i));
}
// My guess of the LoadHouse method
private string LoadHouse(int i)
{
// **** This is on a different thread :( ****
Debug.Print("Thread: " + Thread.CurrentThread.ManagedThreadId);
Thread.Sleep(5000); // simulate I/O bound work
return "House" + i;
}
Here's the output.
Thread: 10
Thread: 10
Thread: 3
Thread: 10
Thread: 10
Thread: 11
Thread: 10
Thread: 12
Thread: 10
Thread: 13
Thread: 14

You can do it with async I/O. What you made is a very nice example of doing it wrong (unfortunately, it's also quite common).
Task.Run runs a method on a thread pool thread, Thread.Sleep blocks the thread. So your example simulates doing synchronous (blocking) I/O on multiple threads.
To correctly perform async I/O, you need to use async methods all the way down. Never use Task.Run for I/O. You can simulate an asynchronous I/O method using Task.Delay:
private async Task<string> LoadHouseAsync(int i)
{
Debug.Print("Thread: " + Thread.CurrentThread.ManagedThreadId);
await Task.Delay(5000); // simulate async I/O bound work
return "House" + i;
}
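To make the point above concrete, here is a minimal sketch of the whole example without Task.Run. It records the thread on which each LoadHouseAsync call starts. The startThreadIds list and the tuple return type are additions for illustration only, and Task.Delay stands in for a real async I/O call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class HouseLoader
{
    // Hypothetical stand-in for real async I/O; Task.Delay simulates the wait.
    private static async Task<string> LoadHouseAsync(int i, List<int> startThreadIds)
    {
        // An async method runs synchronously on the caller's thread up to its first await.
        startThreadIds.Add(Thread.CurrentThread.ManagedThreadId);
        await Task.Delay(500); // no thread is blocked while this is pending
        return "House" + i;
    }

    public static async Task<(int CallerThreadId, List<int> StartThreadIds, string[] Houses)> LoadHousesAsync()
    {
        int callerThreadId = Thread.CurrentThread.ManagedThreadId;
        var startThreadIds = new List<int>();
        var tasks = new List<Task<string>>();
        for (int i = 0; i < 5; i++)
        {
            // All five operations are kicked off from the calling thread.
            tasks.Add(LoadHouseAsync(i, startThreadIds));
        }
        string[] houses = await Task.WhenAll(tasks);
        return (callerThreadId, startThreadIds, houses);
    }

    public static async Task Main()
    {
        var result = await LoadHousesAsync();
        Console.WriteLine("Caller thread: " + result.CallerThreadId);
        Console.WriteLine("Started on:    " + string.Join(", ", result.StartThreadIds));
        Console.WriteLine(string.Join(", ", result.Houses));
    }
}
```

All five calls start on the caller's thread. In a UI application the code after each await also resumes on the UI thread; in a console application, which has no SynchronizationContext, those continuations may run on thread pool threads instead, but no thread sits blocked during the delay.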

Can we really kick off multiple requests asynchronously without using more than one thread?
Yes. Here is a real-world example (pseudo-code for ASP.NET MVC and Entity Framework, where EF makes the I/O calls to SQL Server, for example).
public async Task<ActionResult> Index() // "Index" is a placeholder; the original omitted the action name
{
    var model = new Company();
    using (var db1 = new DbContext())
    using (var db2 = new DbContext())
    {
        // One query per context: a single EF context does not support concurrent operations
        var task1 = db1.Employees.ToListAsync();
        var task2 = db2.Managers.ToListAsync();
        await Task.WhenAll(task1, task2);
        model.employees = task1.Result;
        model.managers = task2.Result;
    }
    return View(model);
}

Related

What makes mixing sync and async Tasks so terribly slow in the following example?

I ran into big problems when (unintentionally) mixing async with sync tasks. The following examples are compressed versions of the original problem.
Platform is Windows 10, Microsoft.NET.Sdk.Web, 2 cores, 4 logical processors @ 2.4 GHz.
This code represents the original problem: it executes one sync and one async task, each 20 times:
var sema = new SemaphoreSlim(1);
var tasks = new List<Task>();
for (int i = 0; i < 20; i++)
{
var t2 = Task.Run(async () =>
{
var sw = new Stopwatch();
sw.Start();
await sema.WaitAsync().ConfigureAwait(false);
try
{
await Task.Delay(1).ConfigureAwait(false);
}
finally
{
sema.Release();
}
sw.Stop();
Console.WriteLine($"sync {sw.Elapsed}");
});
var t1 = Task.Run(() =>
{
var sw = new Stopwatch();
sw.Start();
sema.Wait();
try
{
}
finally
{
sema.Release();
}
sw.Stop();
Console.WriteLine($"async {sw.Elapsed}");
});
tasks.Add(t1);
tasks.Add(t2);
}
await Task.WhenAll(tasks).ConfigureAwait(false);
It takes about 16s to complete. Notably the first sync is after 800ms.
sync 00:00:00.8306484
sync 00:00:16.8401071
sync 00:00:16.8559379
sync 00:00:16.8697014
async 00:00:16.8697706
async 00:00:16.8699273
async 00:00:16.8710140
async 00:00:16.8710523
sync 00:00:16.0248058
async 00:00:16.0246810
sync 00:00:15.0783237
async 00:00:15.0782280
sync 00:00:14.5762648
async 00:00:14.5760971
sync 00:00:13.5689368
async 00:00:13.5591823
sync 00:00:12.6271075
async 00:00:12.6270483
sync 00:00:11.6318846
async 00:00:11.6317560
sync 00:00:10.6406636
async 00:00:10.6404542
sync 00:00:09.1580280
async 00:00:09.1574764
sync 00:00:08.1862783
async 00:00:08.1860869
sync 00:00:07.2034033
async 00:00:07.2032430
sync 00:00:06.2139071
async 00:00:06.2136905
sync 00:00:05.2354887
async 00:00:05.2353404
sync 00:00:04.2503136
async 00:00:04.2501821
sync 00:00:03.2656311
async 00:00:03.2655521
sync 00:00:02.2806897
async 00:00:02.2805796
sync 00:00:01.2974060
async 00:00:01.2972398
In contrast, the following code runs two async tasks each 20 times:
var sema = new SemaphoreSlim(1);
var tasks = new List<Task>();
for (int i = 0; i < 20; i++)
{
var t2 = Task.Run(async () =>
{
var sw = new Stopwatch();
sw.Start();
await sema.WaitAsync().ConfigureAwait(false);
try
{
await Task.Delay(1).ConfigureAwait(false);
}
finally
{
sema.Release();
}
sw.Stop();
Console.WriteLine($"sync {sw.Elapsed}");
});
var t1 = Task.Run(async () =>
{
var sw = new Stopwatch();
sw.Start();
await sema.WaitAsync().ConfigureAwait(false);
try
{
}
finally
{
sema.Release();
}
sw.Stop();
Console.WriteLine($"async {sw.Elapsed}");
});
tasks.Add(t1);
tasks.Add(t2);
}
await Task.WhenAll(tasks).ConfigureAwait(false);
It takes only 300ms to complete:
async 00:00:00.0180861
sync 00:00:00.0329542
async 00:00:00.0181292
sync 00:00:00.0177771
async 00:00:00.0432851
sync 00:00:00.0476872
sync 00:00:00.0635321
sync 00:00:00.0774490
async 00:00:00.0775557
async 00:00:00.0775724
async 00:00:00.0775398
sync 00:00:00.0942652
sync 00:00:00.1082544
async 00:00:00.1080930
sync 00:00:00.1240859
async 00:00:00.1246952
sync 00:00:00.1397922
async 00:00:00.1414005
sync 00:00:00.1547058
async 00:00:00.1546395
sync 00:00:00.1705435
async 00:00:00.1705003
sync 00:00:00.1863422
async 00:00:00.1865136
sync 00:00:00.2052246
async 00:00:00.2054538
sync 00:00:00.2172049
async 00:00:00.2171460
sync 00:00:00.2330110
async 00:00:00.2329556
sync 00:00:00.2489691
async 00:00:00.2492344
sync 00:00:00.2647401
async 00:00:00.2645481
async 00:00:00.2645736
sync 00:00:00.2785660
async 00:00:00.2785652
sync 00:00:00.2944074
sync 00:00:00.3116578
async 00:00:00.3184570
I know that mixing async and sync should be avoided. But what is the reason for the immense delay in the first variant? I cannot imagine what goes on within the 16 seconds, which is almost "infinite time" for the CPU.
Moreover, why does even the first message in the first case only appear after 800ms? That delay alone is already very unexpected.
The thread pool is designed for operations that run quickly. Your first program schedules a bunch of work to the thread pool that runs for an extremely long time, because you've scheduled a ton of operations that can only ever run sequentially anyway, so they're all waiting on each other. You're also scheduling more work than there are workers, so you end up in the situation where every single worker is just sitting there waiting on other work further down the thread pool's queue.
In this situation you've basically created the most common async deadlock: you're blocking the scheduler from running the continuations needed to let work finish. It doesn't technically deadlock only because the thread pool notices that no work is being done and adds more workers over time, each of which just sits there and does nothing, waiting for things further down the queue to finally be scheduled. Eventually you have enough thread pool threads that work can actually proceed. But work won't proceed until the thread pool has created about as many threads as you have work scheduled for it, and as you can see, that takes some time.
When you do the whole thing asynchronously you don't have that common sync over async problem of blocking the scheduler from doing more work, as the work you're having the thread pool do is only ever the actual work that needs to be done, instead of having the workers sit there and block while waiting for other things to finish.
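The difference between blocking pool workers and awaiting can be sketched in isolation. This is a simplified illustration, not a reproduction of the semaphore scenario above: the class and method names are hypothetical, and the work item count (four times the pool's minimum worker count) is an arbitrary choice made so that there is more work than initial workers.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class PoolBlockingDemo
{
    // Queues 'count' work items that each BLOCK a pool thread for 'ms' milliseconds.
    public static async Task<TimeSpan> RunBlockingAsync(int count, int ms)
    {
        var sw = Stopwatch.StartNew();
        var tasks = Enumerable.Range(0, count)
            .Select(_ => Task.Run(() => Thread.Sleep(ms)))
            .ToArray();
        await Task.WhenAll(tasks);
        return sw.Elapsed;
    }

    // Queues 'count' work items that each AWAIT for 'ms' milliseconds,
    // handing their worker back to the pool while they wait.
    public static async Task<TimeSpan> RunAwaitingAsync(int count, int ms)
    {
        var sw = Stopwatch.StartNew();
        var tasks = Enumerable.Range(0, count)
            .Select(_ => Task.Run(async () => await Task.Delay(ms)))
            .ToArray();
        await Task.WhenAll(tasks);
        return sw.Elapsed;
    }

    public static async Task Main()
    {
        ThreadPool.GetMinThreads(out int minWorkers, out _);
        int count = minWorkers * 4; // more work items than the pool's initial workers
        Console.WriteLine($"Min worker threads: {minWorkers}, work items: {count}");
        Console.WriteLine($"Blocking: {await RunBlockingAsync(count, 200)}");
        Console.WriteLine($"Awaiting: {await RunAwaitingAsync(count, 200)}");
    }
}
```

The blocking variant can take noticeably longer than 200ms because queued items have to wait until a worker is free or the pool adds one, whereas the awaiting variant should finish in roughly the length of a single delay. Exact timings depend on the machine and runtime.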

Is parallel asynchronous execution where a thread sleeps using multiple threads?

This is the code that I wrote to better understand asynchronous methods. I knew that an asynchronous method is not the same as multithreading, but it does not seem so in this particular scenario:
class Program
{
static void Main(string[] args)
{
Thread.CurrentThread.CurrentCulture = new System.Globalization.CultureInfo("en-US");
//the line above just makes sure that the console output uses . to represent doubles instead of ,
ExecuteAsync();
Console.ReadLine();
}
private static async Task ParallelAsyncMethod() //this is the method where async parallel execution is taking place
{
List<Task<string>> tasks = new List<Task<string>>();
for (int i = 0; i < 5; i++)
{
tasks.Add(Task.Run(() => DownloadWebsite()));
}
var strings = await Task.WhenAll(tasks);
foreach (var str in strings)
{
Console.WriteLine(str);
}
}
private static string DownloadWebsite() //Imitating a website download
{
Thread.Sleep(1500); //making the thread sleep for 1500 milliseconds before returning
return "Download finished";
}
private static async void ExecuteAsync()
{
var watch = Stopwatch.StartNew();
await ParallelAsyncMethod();
watch.Stop();
Console.WriteLine($"It took the machine {watch.ElapsedMilliseconds} milliseconds" +
$" or {Convert.ToDouble(watch.ElapsedMilliseconds) / 1000} seconds to complete this task");
Console.ReadLine();
}
}
//OUTPUT:
/*
Download finished
Download finished
Download finished
Download finished
Download finished
It took the machine 1537 milliseconds or 1.537 seconds to complete this task
*/
As you can see, the DownloadWebsite method waits for 1.5 seconds and then returns "Download finished". The method called ParallelAsyncMethod adds five of these calls to the "tasks" list and then starts the parallel asynchronous execution. I also tracked the amount of time that it takes for the ExecuteAsync method to run. The result is always somewhere around 1540 milliseconds. Here is my question: if the DownloadWebsite method required a thread to sleep 5 times for 1500 milliseconds, does it mean that the parallel execution of these methods required 5 different threads? If not, then how come it only took the program 1540 milliseconds to run and not ~7500 ms?
I knew that an asynchronous method is not the same as multi-threading
That is correct. An asynchronous method releases the current thread whilst I/O occurs, and schedules a continuation after its completion.
Async and threads are completely unrelated concepts.
but it does not seem so in this particular scenario
That is because you explicitly run DownloadWebsite on the ThreadPool using Task.Run, which imitates asynchronous code by returning a Task after instructing the provided delegate to run.
Because you are not waiting for each Task to complete before starting the next, multiple threads can be used simultaneously.
Currently each thread is being blocked, as you have used Thread.Sleep in the implementation of DownloadWebsite, meaning you are actually running 5 synchronous methods on the ThreadPool.
In production code your DownloadWebsite method should be written asynchronously, maybe using HttpClient.GetAsync:
private static async Task<string> DownloadWebsiteAsync()
{
//...
await httpClient.GetAsync(//...
//...
}
In that case, GetAsync returns a Task, and releases the current thread whilst waiting for the HTTP response.
You can still run multiple async methods concurrently, but as the thread is released each time, this may well use less than 5 separate threads and may even use a single thread.
Ensure that you don't use Task.Run with an asynchronous method; this simply adds unnecessary overhead:
var tasks = new List<Task<string>>();
for (int i = 0; i < 5; i++)
{
tasks.Add(DownloadWebsiteAsync()); // No need for Task.Run
}
var strings = await Task.WhenAll(tasks);
As an aside, if you want to imitate an async operation, use Task.Delay instead of Thread.Sleep as the former is non-blocking:
private static async Task<string> DownloadWebsite() //Imitating a website download
{
await Task.Delay(1500); // Release the thread for ~1500ms before continuing
return "Download finished";
}
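Putting the pieces of this answer together, a minimal timing sketch looks like the following. The class name and the tuple return type are illustrative additions, and Task.Delay again stands in for the real download.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ConcurrentDelayDemo
{
    // Imitates a website download without blocking a thread.
    private static async Task<string> DownloadWebsiteAsync()
    {
        await Task.Delay(1500);
        return "Download finished";
    }

    public static async Task<(string[] Results, long ElapsedMs)> RunAsync()
    {
        var watch = Stopwatch.StartNew();
        var tasks = new List<Task<string>>();
        for (int i = 0; i < 5; i++)
        {
            tasks.Add(DownloadWebsiteAsync()); // started immediately, no Task.Run
        }
        string[] results = await Task.WhenAll(tasks);
        watch.Stop();
        return (results, watch.ElapsedMilliseconds);
    }

    public static async Task Main()
    {
        var run = await RunAsync();
        foreach (var line in run.Results)
        {
            Console.WriteLine(line);
        }
        Console.WriteLine($"Elapsed: {run.ElapsedMs} ms");
    }
}
```

The five delays overlap, so the total should be close to 1500 ms rather than 7500 ms, and no thread is held for the duration of any of them.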

Async/Await single thread/some threads

I need a little rule about correct usage of await. Run this code in .net core c# 7.2:
static class Program
{
static async Task<string> GetTaskAsync(int timeout)
{
Console.WriteLine("Task Thread: " + Thread.CurrentThread.ManagedThreadId);
await Task.Delay(timeout);
return timeout.ToString();
}
static async Task Main()
{
Console.WriteLine("Main Thread: " + Thread.CurrentThread.ManagedThreadId);
Console.WriteLine("Should be greater than 5000");
await Watch(NotParallel);
Console.WriteLine("Should be less than 5000");
await Watch(Parallel);
}
public static async Task Parallel()
{
var res1 = GetTaskAsync(2000);
var res2 = GetTaskAsync(3000);
Console.WriteLine("result: " + await res1 + await res2);
}
public static async Task NotParallel()
{
var res1 = await GetTaskAsync(2000);
var res2 = await GetTaskAsync(3000);
Console.WriteLine("result: " + res1 + res2);
}
private static async Task Watch(Func<Task> func) {
var sw = new Stopwatch();
sw.Start();
await func?.Invoke();
sw.Stop();
Console.WriteLine("Elapsed: " + sw.ElapsedMilliseconds);
Console.WriteLine("---------------");
}
}
As you can all see, the behavior of the two methods is different. It's easy to get this wrong in practice, so I need a rule of thumb.
Update: please run the code, and please explain why Parallel() runs faster than NotParallel().
While calling GetTaskAsync without await, you actually get a Task with the method to execute (that is, GetTaskAsync) wrapped in. But when calling await GetTaskAsync, execution is suspended until the method is done executing, and then you get the result.
Let me be more clear:
var task = GetTaskAsync(2000);
Here, task is of type Task<string>.
var result = await GetTaskAsync(2000);
Here result is of type string.
So to address your first question: when to await your Tasks really depends on your execution flow.
Now, as to why Parallel() is faster, I suggest you read this article (everything is of interest, but for your specific example, you may jump to Tasks return "hot").
Now let's break it down:
The await keyword serves to halt the code until the task is completed,
but doesn't actually start it.
In your example, NotParallel() will take longer because your Tasks execute sequentially, one after the other. As the article explains:
This is due to the tasks being awaited inline.
In Parallel() however...
the tasks now run in parallel. This is due to the fact that all [tasks]
are started before all [tasks] are subsequently awaited, again, because
they return hot.
About 'hot' tasks
I suggest you read the following: Task-based Asynchronous Pattern (TAP)
The Task Status section is of interest here to understand the concepts of cold and hot tasks:
Tasks that are created by the public Task constructors are referred to as cold tasks, because they begin their life cycle in the non-scheduled Created state and are scheduled only when Start is called on these instances.
All other tasks begin their life cycle in a hot state, which means that the asynchronous operations they represent have already been initiated
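The distinction quoted above can be observed through Task.Status. The following is a small sketch with hypothetical names (HotColdDemo, GetValueAsync) that compares a task built with the public constructor against one returned by an async method.

```csharp
using System;
using System.Threading.Tasks;

public static class HotColdDemo
{
    private static async Task<string> GetValueAsync()
    {
        await Task.Delay(200);
        return "done";
    }

    public static async Task<(TaskStatus Cold, TaskStatus Hot)> InspectAsync()
    {
        // Cold: created by a public constructor, not scheduled until Start is called.
        var cold = new Task<string>(() => "done");
        TaskStatus coldStatus = cold.Status;

        // Hot: the operation has already been initiated when the method returns.
        Task<string> hot = GetValueAsync();
        TaskStatus hotStatus = hot.Status;

        cold.Start();
        await Task.WhenAll(cold, hot);
        return (coldStatus, hotStatus);
    }

    public static async Task Main()
    {
        var statuses = await InspectAsync();
        Console.WriteLine("Cold task before Start: " + statuses.Cold);
        Console.WriteLine("Hot task right after the call: " + statuses.Hot);
    }
}
```

The constructor-made task reports Created until Start is called, while the task returned by the async method is never in that state. This is why calling GetTaskAsync twice before awaiting either result lets both delays overlap.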
I invite you to read extensively about async/await and Tasks. Here are a few resources in addition to the ones I provided above:
Asynchronous Programming in C# 5.0 part two: Whence await?
Async/Await - Best Practices in Asynchronous Programming
Async and Await

Tasks being called synchronously despite being declared async

Consider the following code:
public async static Task<bool> Sleeper(int sleepTime)
{
Console.WriteLine("Sleeping for " + sleepTime + " seconds");
System.Threading.Thread.Sleep(1000 * sleepTime);
return true;
}
static void Main(string[] args)
{
Random rnd = new Random();
List<Task<bool>> tasks = new List<Task<bool>>();
Console.WriteLine("Kicking off tasks");
for (int i = 0; i < 3; i++)
{
tasks.Add(Sleeper(rnd.Next(10, 15)));
}
Console.WriteLine("All tasks launched");
Task.WhenAll(tasks);
int nComplete = 0;
foreach (var task in tasks)
{
if (task.Result)
nComplete++;
}
Console.WriteLine(nComplete + " Successful tasks");
}
Each task should sleep for a random amount of time (between 10-15 seconds). However my output looks like the following
Kicking off tasks
Sleeping for 12 seconds
Sleeping for 14 seconds
Sleeping for 12 seconds
All tasks launched
3 Successful tasks
Each "task" clearly waited for the previous task to be completed before starting (I also saw this when debugging and stepping through the code), why is this?
EDIT A lot of people have mentioned using Task.Delay which does work as expected. But what if I'm not doing anything like sleeping, just a lot of work. Consider a large do nothing loop
int s = 1;
for (int i = 0; i < 100000000000000; i++)
s *= i;
This still executes synchronously
async does not mean "runs on another thread". Stephen Toub's blog goes into a lot more detail, but under the hood the current SynchronizationContext and the operations performed determines if and when a task runs on a separate thread. In your case, Thread.Sleep doesn't do anything explicitly to run on a different thread, so it doesn't.
If you used await Task.Delay(1000 * sleepTime) instead of Thread.Sleep, I think you'll find that things work as you expect, because Task.Delay is plugged into the async/await infrastructure, while Thread.Sleep isn't.
This is because you are using Thread.Sleep, which sleeps the thread that invokes Sleeper. async methods start on the thread that calls them, hence you are sleeping your main application thread.
In asynchronous code, you should be using Task.Delay like so:
public async static Task<bool> Sleeper(int sleepTime)
{
Console.WriteLine("Sleeping for " + sleepTime + " seconds");
await Task.Delay(1000 * sleepTime).ConfigureAwait(false);
return true;
}
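The difference is visible on the returned task itself. Below is a minimal sketch, using hypothetical method names, that checks whether each task is already complete at the moment the call returns. The CS1998 warning is suppressed on purpose for the blocking variant.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SyncAsyncDemo
{
#pragma warning disable CS1998 // deliberately no await, to show the synchronous behaviour
    private static async Task<bool> BlockingSleeper(int ms)
    {
        Thread.Sleep(ms); // blocks the calling thread; the method only returns afterwards
        return true;
    }
#pragma warning restore CS1998

    private static async Task<bool> AsyncSleeper(int ms)
    {
        await Task.Delay(ms); // returns to the caller here while the delay is pending
        return true;
    }

    public static async Task<(bool BlockingDoneOnReturn, bool AsyncDoneOnReturn)> CompareAsync()
    {
        Task<bool> blocking = BlockingSleeper(500);
        bool blockingDone = blocking.IsCompleted; // the work already ran on this thread

        Task<bool> pending = AsyncSleeper(500);
        bool pendingDone = pending.IsCompleted; // still waiting on the delay

        await Task.WhenAll(blocking, pending);
        return (blockingDone, pendingDone);
    }

    public static async Task Main()
    {
        var result = await CompareAsync();
        Console.WriteLine("Thread.Sleep version complete on return: " + result.BlockingDoneOnReturn);
        Console.WriteLine("Task.Delay version complete on return:   " + result.AsyncDoneOnReturn);
    }
}
```

The Thread.Sleep version hands back a finished task because all of its work ran before the call returned, which is exactly why the loop in the question launched its "tasks" one after another.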
async does not mean "run on another thread". I have an async intro that goes into detail about what async does mean.
In fact, Sleeper will generate a compiler warning informing you of the fact that it will run synchronously. It's a good idea to turn on "Warnings as Errors" for all new projects.
If you have CPU-bound work to do, you may run it on a thread pool thread and asynchronously wait for it to complete by using await Task.Run, as such:
public static bool Sleeper(int sleepTime);
...
for (int i = 0; i < 3; i++)
{
var sleepTime = rnd.Next(10, 15);
tasks.Add(Task.Run(() => Sleeper(sleepTime)));
}

Checking if a thread returned to thread pool

How can I check if a thread returned to the thread pool, using VS C# 2015 debugger?
What's problematic in my case is the fact that it cannot be detected by debugging line by line.
async Task foo()
{
int y = 0;
await Task.Delay(5);
// (1) thread 2000 returns to thread pool here...
while (y<5) y++;
}
async Task testAsync()
{
Task task = foo();
// (2) ... and here thread 2000 is back from the thread pool, to run the code below. I want
// to confirm that it was in the thread pool in the meantime, using debugger.
int i = 0;
while (i < 100)
{
Console.WriteLine("Async 1 before: " + i++);
}
await task;
}
In the first line of testAsync running on thread 2000, foo is called. Once it encounters await Task.Delay(5), thread 2000 returns to thread pool (allegedly, I'm trying to confirm this), and the method waits for Task.Delay(5) to complete. In the meantime, the control returns to the caller and the first loop of testAsync is executed on thread 2000 as well.
So between two consecutive lines of code, the thread returned to thread pool and came back from there. How can I confirm this with debugger? Possibly with Threads debugger window?
To clarify a bit more what I'm asking: foo is running on thread 2000. There are two possible scenarios:
When it hits await Task.Delay(5), thread 2000 returns to the thread pool for a very short time, and the control returns to the caller, at line (2), which will execute on thread 2000 taken from the thread pool. If this is true, you can't detect it easily, because Thread 2000 was in the thread pool during time between two consecutive lines of code.
When it hits await Task.Delay(5), thread 2000 doesn't return to thread pool, but immediately executes code in testAsync starting from line (2)
I'd like to verify which one is really happening.
There is a major mistake in your assumption:
When it hits await Task.Delay(5), thread 2000 returns to the thread pool
Since you don't await foo() yet, when thread 2000 hits Task.Delay(5) it just creates a new Task and returns to testAsync() (to int i = 0;). It moves on to the while block, and only then you await task. At this point, if task is not completed yet, and assuming the rest of the code is awaited, thread 2000 will return to the thread pool. Otherwise, if task is already completed, it will synchronously continue from foo() (at while (y<5) y++;).
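The last case, awaiting a task that has already completed, is easy to check in isolation. The sketch below uses hypothetical names and Task.CompletedTask as the already-finished task.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AwaitCompletedDemo
{
    // Awaits an already-completed task and reports whether the thread changed.
    public static async Task<bool> AwaitCompletedAsync()
    {
        int before = Thread.CurrentThread.ManagedThreadId;
        await Task.CompletedTask; // already complete: the method is not suspended
        int after = Thread.CurrentThread.ManagedThreadId;
        return before == after;
    }

    public static void Main()
    {
        Task<bool> t = AwaitCompletedAsync();
        // The whole method ran synchronously, so the task is finished when the call returns.
        Console.WriteLine("Completed synchronously: " + t.IsCompleted);
        Console.WriteLine("Same thread: " + t.Result);
    }
}
```

Both lines should report True: no continuation is scheduled and no thread is released, because there was nothing to wait for.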
EDIT:
what if the main method called testAsync?
When a synchronous method calls an async method and waits on it, it must block the thread if the async method returns an uncompleted Task:
void Main()
{
var task = foo();
task.Wait(); //Will block the thread if foo() is not completed.
}
Note that in the above case the thread is not returning to the thread pool - it is completely suspended by the OS.
Maybe you can give an example of how to call testAsync so that thread 2000 returns to the thread pool?
Assuming thread 2k is the main thread, it cannot return to the thread pool. But you can use Task.Run(()=> foo()) to run foo() on the thread pool, and since the calling thread is the main thread, another thread pool thread will pick up that Task. So the following code:
static void Main(string[] args)
{
Console.WriteLine("main started on thread {0}", Thread.CurrentThread.ManagedThreadId);
var testAsyncTask = Task.Run(() => testAsync());
testAsyncTask.Wait();
}
static async Task testAsync()
{
Console.WriteLine("testAsync started on thread {0}", Thread.CurrentThread.ManagedThreadId);
await Task.Delay(1000);
Console.WriteLine("testAsync continued on thread {0}", Thread.CurrentThread.ManagedThreadId);
}
Produced (on my PC) the following output:
main started on thread 1
testAsync started on thread 3
testAsync continued on thread 4
Press any key to continue . . .
Threads 3 and 4 came from and returned to the thread pool.
You can print out the Thread.CurrentThread.ManagedThreadId to the console. Note that the thread-pool is free to re-use that same thread to run continuations on it, so there's no guarantee that it'll be different:
void Main()
{
TestAsync().Wait();
}
public async Task FooAsync()
{
int y = 0;
await Task.Delay(5);
Console.WriteLine($"After awaiting in FooAsync: {Thread.CurrentThread.ManagedThreadId}");
while (y < 5) y++;
}
public async Task TestAsync()
{
Console.WriteLine($"Before awaiting in TestAsync: {Thread.CurrentThread.ManagedThreadId}");
Task task = FooAsync();
int i = 0;
while (i < 100)
{
var x = i++;
}
await task;
Console.WriteLine($"After awaiting in TestAsync: {Thread.CurrentThread.ManagedThreadId}");
}
Another thing you can check is ThreadPool.GetAvailableThreads to determine if another worker has been handed out for use:
async Task FooAsync()
{
int y = 0;
await Task.Delay(5);
Console.WriteLine("Thread-Pool threads after first await:");
int avaliableWorkers;
int avaliableIo;
ThreadPool.GetAvailableThreads(out avaliableWorkers, out avaliableIo);
Console.WriteLine($"Available Workers: {avaliableWorkers}, Available IO: {avaliableIo}");
while (y < 1000000000) y++;
}
async Task TestAsync()
{
int avaliableWorkers;
int avaliableIo;
ThreadPool.GetAvailableThreads(out avaliableWorkers, out avaliableIo);
Console.WriteLine("Thread-Pool threads before first await:");
Console.WriteLine($"Available Workers: {avaliableWorkers}, Available IO: {avaliableIo}");
Console.WriteLine("-------------------------------------------------------------");
Task task = FooAsync();
int i = 0;
while (i < 100)
{
var x = i++;
}
await task;
}
On my machine, this yields:
Thread-Pool threads before first await:
Available Workers: 1023, Available IO: 1000
----------------------------------------------
Thread-Pool threads after first await:
Available Workers: 1022, Available IO: 1000
I'd like to verify which one is really happening.
There is no way to "verify" that with debugger, because the debugger is made to simulate the logical (synchronous) flow - see Walkthrough: Using the Debugger with Async Methods.
In order to understand what is happening (FYI it's your case (2)), you need to learn how await works starting from Asynchronous Programming with Async and Await - What Happens in an Async Method section, Control Flow in Async Programs and many other sources.
Look at this snippet:
static void Main(string[] args)
{
Task.Run(() =>
{
// Initial thread pool thread
var t = testAsync();
t.Wait();
});
Console.ReadLine();
}
If we make the lambda async and use await t; instead of t.Wait();, that is the point where the initial thread would be returned to the thread pool. As I mentioned above, you cannot verify that with the debugger. But look at the above code and think logically: we are blocking the initial thread, so if it weren't free, your testAsync and foo methods would not be able to resume. But they do, and this can easily be verified by putting a breakpoint after the await lines.
