I am using Reactive Extensions (Rx) to buffer some data. I'm having an issue though in that I then need to do something asynchronous with this data, yet I don't want the buffer to pass the next group through until the asynchronous operation is complete.
I've tried to structure the code two ways (contrived example):
public async Task processFiles<File>(IEnumerable<File> files)
{
await files.ToObservable()
.Buffer(10)
.SelectMany(fi => fi.Select(f => upload(f))) // Now have an IObservable<Task>
.Select(t => t.ToObservable())
.Merge()
.LastAsync();
}
public Task upload(File item)
{
return Task.Run(() => { /* Stuff */ });
}
or
public async Task processFiles<File>(IEnumerable<File> files)
{
var buffered = files.ToObservable()
.Buffer(10);
buffered.Subscribe(async files => await Task.WhenAll(files.Select(f => upload(f))));
await buffered.LastAsync();
}
public Task upload(File item)
{
return Task.Run(() => { /* Stuff */ });
}
Unfortunately, neither of these methods has worked: the buffer pushes the next group before the async operations complete. The intent is to have each buffered group executed asynchronously, and only when that operation is complete, continue with the next buffered group.
Any help is greatly appreciated.
To make sure I understand you correctly, it sounds like you want to ensure you carry on buffering items while only presenting each buffer when the previous buffer has been processed.
You also need to make the processing of each buffer asynchronous.
It's probably valuable to consider some theoretical points, because I have to confess that I'm a bit confused about the approach. IObservable is often said to be the dual of IEnumerable because it mirrors the latter with the key difference being that data is pushed to the consumer rather than the consumer pulling it as it chooses.
You are trying to use the buffered stream like an IEnumerable instead of an IObservable - you essentially want to pull the buffers rather than have them pushed to you - so I do have to wonder whether you have picked the right tool for the job. Are you trying to hold up the buffering operation itself while a buffer is processed? As a consumer having the data pushed at you, that isn't really a correct approach.
You could consider applying a ToEnumerable() call to the buffer operation, so that you can deal with the buffers when you are ready. That won't prevent ongoing buffering from occurring while you deal with the current buffer, though.
There's little you can do to prevent this - doing the buffer processing synchronously inside a Select() operation applied to the buffer would carry a guarantee that no subsequent OnNext() call would occur until the Select() projection completed. The guarantee comes for free as the Rx library operators enforce the grammar of Rx. But it's only guaranteeing non-overlapping invocations of OnNext() - there's nothing to say a given operator couldn't (and indeed shouldn't) carry on getting the next OnNext() ready to go. That's the nature of a push based API.
It's very unclear why you think you need the projection to be asynchronous if you also want to block the buffers. Have a think about this - I suspect using a synchronous Select() in your observer might solve the issue, but it's not entirely clear from your question.
Similar to a synchronous Select() is a synchronous OnNext() handler, which is easier to work with if your processing of items has no results - but it's not the same, because (depending on the implementation of the observable) you are only blocking delivery of OnNext() calls to that subscriber rather than to all subscribers. However, with just a single subscriber it's equivalent, so you could do something like:
void Main()
{
var source = Observable.Range(1, 4);
source.Buffer(2)
.Subscribe(i =>
{
Console.WriteLine("Start Processing Buffer");
Task.WhenAll(from n in i select DoUpload(n)).Wait();
Console.WriteLine("Finished Processing Buffer");
});
}
private Task DoUpload(int i)
{
return Task.Factory.StartNew(
() => {
Thread.Sleep(1000);
Console.WriteLine("Process File " + i);
});
}
Which outputs (with no guarantee on the order of "Process File x" within a buffer):
Start Processing Buffer
Process File 2
Process File 1
Finished Processing Buffer
Start Processing Buffer
Process File 3
Process File 4
Finished Processing Buffer
If you prefer to use a Select() and your projection returns no results, you can do it like this:
source.Buffer(2)
.Select(i =>
{
Console.WriteLine("Start Processing Buffer");
Task.WhenAll(from n in i select DoUpload(n)).Wait();
Console.WriteLine("Finished Processing Buffer");
return Unit.Default;
}).Subscribe();
NB: Sample code written in LINQPad and including Nuget package Rx-Main. This code is for illustrative purposes - don't Thread.Sleep() in production code!
First, I think your requirement to execute the items from each group in parallel, but each group in series, is quite unusual. A more common requirement would be to execute the items in parallel, but at most n of them at the same time. This way there are no fixed groups, so if a single item takes too long, other items don't have to wait for it.
To do what you're asking for, I think TPL Dataflow is more suitable than Rx (though some Rx code will still be useful). TPL Dataflow is centered about “blocks” that execute stuff, by default in series, which is exactly what you need.
Your code could look like this:
public static class Extensions
{
public static Task ExecuteInGroupsAsync<T>(
this IEnumerable<T> source, Func<T, Task> func, int groupSize)
{
var block = new ActionBlock<IEnumerable<T>>(
g => Task.WhenAll(g.Select(func)));
source.ToObservable()
.Buffer(groupSize)
.Subscribe(block.AsObserver());
return block.Completion;
}
}
public Task ProcessFiles(IEnumerable<File> files)
{
return files.ExecuteInGroupsAsync(Upload, 10);
}
This leaves most of the heavy lifting on the ActionBlock (and some on Rx). Dataflow blocks can act as Rx observers (and observables), so we can take advantage of that to keep using Buffer().
We want to handle the whole group at once, so we use Task.WhenAll() to create a Task that completes when the whole group completes. Dataflow blocks understand Task-returning functions, so the next group won't start executing until the Task returned for the previous group completes.
The final result is the Completion Task, which will complete after the source observable completes and all processing is done.
TPL Dataflow also has BatchBlock, which works like Buffer() and we could directly Post() each item from the collection (without using ToObservable() and AsObserver()), but I think using Rx for this part of the code makes it simpler.
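For completeness, a sketch of that pure-Dataflow variant (my own illustration, not the answer's code; it assumes the System.Threading.Tasks.Dataflow package is referenced and the method name is just for this example) could look like this:
public static class DataflowExtensions
{
    // BatchBlock groups items (like Buffer), and the ActionBlock processes one
    // group at a time because its default MaxDegreeOfParallelism is 1.
    public static async Task ExecuteInGroupsWithDataflowAsync<T>(
        this IEnumerable<T> source, Func<T, Task> func, int groupSize)
    {
        var batchBlock = new BatchBlock<T>(groupSize);
        var actionBlock = new ActionBlock<T[]>(g => Task.WhenAll(g.Select(func)));
        batchBlock.LinkTo(actionBlock, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var item in source)
            batchBlock.Post(item);
        batchBlock.Complete(); // flushes any final partial batch

        await actionBlock.Completion;
    }
}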
EDIT: Actually you don't need TPL Dataflow here at all. Using ToEnumerable() as James World suggested will be enough:
public static async Task ExecuteInGroupsAsync<T>(
this IEnumerable<T> source, Func<T, Task> func, int groupSize)
{
var groups = source.ToObservable().Buffer(groupSize).ToEnumerable();
foreach (var g in groups)
{
await Task.WhenAll(g.Select(func));
}
}
Or even simpler without Rx using Batch() from morelinq:
public static async Task ExecuteInGroupsAsync<T>(
this IEnumerable<T> source, Func<T, Task> func, int groupSize)
{
var groups = source.Batch(groupSize);
foreach (var group in groups)
{
await Task.WhenAll(group.Select(func));
}
}
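As a further aside (my addition, not part of the original answer): on .NET 6 or later you could drop the morelinq dependency too, since Enumerable.Chunk provides the same grouping:
public static async Task ExecuteInGroupsAsync<T>(
    this IEnumerable<T> source, Func<T, Task> func, int groupSize)
{
    // Chunk splits the source into arrays of at most groupSize items.
    foreach (var group in source.Chunk(groupSize))
    {
        await Task.WhenAll(group.Select(func));
    }
}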
I have two implementations of concurrent execution of a Task method within a Foreach loop, both I add to a list of Tasks and execute concurrently with Task.WhenAll(Tasks).
In one implementation, I use Tasks.Add(Task.Run(() => DoSomething(item))), while in the second implementation, I omit the Task.Run() lambda function by just doing Tasks.Add(MyMethodAsync(item)).
I then execute the methods concurrently with (await Task.WhenAll(Tasks)).ToList();, as normal.
The result is that the second implementation has minimal performance improvement over a synchronous implementation, while the first implementation has very noticeable improvement (profiled at about ~11.5s compared to ~6s).
My question is, why does the second implementation not work as well? Shouldn't both Task.Run(lambda) and the direct method call both return a list of Tasks?
// The asynchronous method
public async Task<int> MyMethodAsync(Item item)
{
Thread.Sleep(1000); // Emulate CPU bound work
await Task.Delay(1000); // Emulate IO bound work
return 0; // placeholder int
}
// Implementation 1: with Task.Run lambda function
List<Task<int>> Tasks = new List<Task<int>>();
foreach (Item item in items)
{
Tasks.Add(Task.Run(() => MyMethodAsync(item)));
}
List<int> itemResults = (await Task.WhenAll(Tasks)).ToList();
// Implementation 2: without Task.Run lambda function
List<Task<int>> Tasks = new List<Task<int>>();
foreach (Item item in items)
{
Tasks.Add(MyMethodAsync(item));
}
List<int> itemResults = (await Task.WhenAll(Tasks)).ToList();
public async Task<int> MyMethodAsync(Item item)
{
Thread.Sleep(1000); // This is just a placeholder method
return 0; // placeholder int
}
This method, as you have written it, is solely CPU bound. It keeps the CPU busy and as such requires the “full attention” of a thread in order for it to be processed. Being marked as async and returning a Task does not mean that anything is happening asynchronously here. So this is mostly equivalent to the following fully synchronous method:
public Task<int> MyMethod(Item item)
{
Thread.Sleep(1000);
return Task.FromResult(0); // this is synchronous!
}
So when you execute this method, it blocks the CPU for that one second and then returns something which happens to be a completed task.
Now, looking at how you are running your test, this also explains what you are seeing. Let’s take the second implementation first:
foreach (Item item in items)
{
Tasks.Add(MyMethodAsync(item));
}
As we realized, MyMethodAsync is fully synchronous. So each call here blocks the loop from continuing to the next item until the method returns. That means that for 10 items, the loop takes approximately 10 seconds to complete, and the list will then only contain already completed tasks (which will be awaited instantly).
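A quick way to observe this (a sketch of my own, assuming 10 items and the blocking MyMethodAsync above):
var sw = Stopwatch.StartNew();
var tasks = new List<Task<int>>();
foreach (Item item in items)
{
    tasks.Add(MyMethodAsync(item)); // each call blocks ~1 second before returning its (already completed) task
}
Console.WriteLine($"Loop finished after {sw.Elapsed}"); // roughly 10 seconds for 10 items
await Task.WhenAll(tasks); // completes almost immediately, the tasks are already done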
Compare this to the first implementation:
foreach (Item item in items)
{
Tasks.Add(Task.Run(() => MyMethodAsync(item)));
}
Task.Run is an easy way to offload something onto a thread from the thread pool. This means that you basically create a job for some other thread to execute in the background, and Task.Run gives you a task that represents the completion of that job. Since you only call MyMethodAsync inside the lambda that you pass to Task.Run, the loop itself is not blocked. So in the end, this creates 10 individual jobs that are executed on other threads, which then happen to call the synchronous MyMethodAsync.
Instead of blocking the execution of your main thread which is executing the loop, you are now blocking some thread pool threads. Those threads run concurrently so this will be faster than making each synchronous call in sequence on the main thread. That is why using Task.Run here yields faster results.
That all being said, when you use Task.Run for this, you are still blocking some threads. Since thread pool threads are generally limited, you usually want to avoid this. Of course there are exceptions, and using Task.Run to offload CPU-bound work to a background thread is generally okay, but you should still be careful here since it's not uncommon to run out of available threads.
Instead, you should try to do actual asynchronous work here. When something is truly asynchronous, that means that the thread executing the method is released to do something else until the asynchronous work is complete. That means that you are reusing threads more effectively and generally don’t block for things that don’t need to block.
If you replace your method with an asynchronous implementation, you can see this in effect here:
public async Task<int> MyMethodAsync(Item item)
{
await Task.Delay(1000);
return 0;
}
Task.Delay is truly asynchronous, so this method will not block the thread that called it. This should mean that neither of your implementations will block; both will just start the work and then wait for it to complete.
I've tested this with 10000 items and saw basically no real difference. I measured differences of up to 20 milliseconds (in either direction), but I am willing to ignore this due to not having a good benchmark environment for this particular test.
If anything, using Task.Run here with an asynchronous method should be slightly less efficient, because it still schedules a thread-pool thread to start the asynchronous method (that thread is then released almost immediately). So there is a slight overhead, which can still be ignored in most applications.
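As a side note (my own sketch, not code from the question): if the real work mixes CPU-bound and IO-bound parts, as in the first snippet of the question, one option is to offload only the CPU-bound part and await the IO-bound part directly:
public async Task<int> MyMethodAsync(Item item)
{
    // Offload the CPU-bound part to the thread pool so the caller is not blocked.
    await Task.Run(() => Thread.Sleep(1000)); // stand-in for CPU-bound work
    // The IO-bound part is truly asynchronous and needs no extra thread.
    await Task.Delay(1000);
    return 0; // placeholder int
}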
I have to consume the output of multiple asynchronous tasks right after they complete.
Would there be a reasonable perf difference in any of these approaches?
Simple Await
public async Task<List<Baz>> MyFunctionAsync(List<Foo> FooList) {
    var results = new List<Baz>();
    List<Task<List<Baz>>> tasks = new List<Task<List<Baz>>>();
    foreach (Foo foo in FooList) {
        tasks.Add(FetchBazListFromFoo(foo));
    }
    foreach (Task<List<Baz>> task in tasks) {
        results.AddRange(await task);
    }
    return results;
}
WhenAll
public async Task<List<Baz>> MyFunctionAsync(List<Foo> FooList) {
    var results = new List<Baz>();
    List<Task<List<Baz>>> tasks = new List<Task<List<Baz>>>();
    foreach (Foo foo in FooList) {
        tasks.Add(FetchBazListFromFoo(foo));
    }
    foreach (List<Baz> bazList in await Task.WhenAll(tasks)) {
        results.AddRange(bazList);
    }
    return results;
}
WaitAll
public async Task<List<Baz>> MyFunctionAsync(List<Foo> FooList) {
    var results = new List<Baz>();
    List<Task<List<Baz>>> tasks = new List<Task<List<Baz>>>();
    foreach (Foo foo in FooList) {
        tasks.Add(FetchBazListFromFoo(foo));
    }
    foreach (List<Baz> bazList in await Task.WaitAll(tasks)) { // does not compile; see the answer below
        results.AddRange(bazList);
    }
    return results;
}
WhenAny
public async Task<List<Baz>> MyFunctionAsync(List<Foo> FooList) {
    var results = new List<Baz>();
    List<Task<List<Baz>>> tasks = new List<Task<List<Baz>>>();
    foreach (Foo foo in FooList) {
        tasks.Add(FetchBazListFromFoo(foo));
    }
    while (tasks.Count > 0) {
        Task<List<Baz>> finished = await Task.WhenAny(tasks);
        results.AddRange(await finished);
        tasks.Remove(finished);
    }
    return results;
}
FooList has about 100 entries.
FetchBazListFromFoo makes about 30 REST API calls and does some synchronous work with the result of each call.
Additionally, Is there an internal overhead diff in WhenAll v WhenAny?
WhenAll returns control after all tasks are completed, while WhenAny returns control as soon as a single task is completed. The latter seems to require more internal management.
The third approach (WaitAll) is invalid because Task.WaitAll is a void-returning method, so it cannot be awaited. This code will just produce a compile-time error.
The other three approaches are very similar, with some subtle differences.
Simple Await: starts all tasks and then awaits them one-by-one. It will collect all results in the correct order. In case of an exception it will return before all tasks are completed, and it will report only the exception of the first failed task (first in order, not chronologically).
Not recommended unless this behavior is exactly what you want (most probably it isn't).
WhenAll: starts all tasks and then awaits all of them to complete. It will collect all results in the correct order. In case of an exception it will return after all tasks have been completed, and it will report only the exception of the first failed task (first in order, not chronologically).
Not recommended unless this behavior is exactly what you want (most probably it isn't either).
WhenAny: starts all tasks and then awaits all of them to complete. It will collect all results in order of completion, so the original order will not be preserved. In case of an exception it will return immediately, and it will report the exception of the first failed task (this time first chronologically, not in order). The while loop introduces an overhead that is absent from the other two approaches; it becomes quite significant when the number of tasks is larger than 10,000 or so, and it grows roughly quadratically as the number of tasks increases.
Not recommended unless this behavior is exactly what you want (I bet by now you should not be a fan of this either).
All of these approaches: will bombard the remote server with a huge number of concurrent requests, making it hard for that machine to respond quickly, and in the worst case triggering a defensive anti-DOS-attack mechanism.
A better solution to this problem is to use the specialized API Parallel.ForEachAsync, available from .NET 6 and later. This method parallelizes multiple asynchronous operations, enforces a maximum
degree of parallelism which by default is Environment.ProcessorCount, and also supports cancellation and fast completion in case of exceptions. You can find a usage example here. This method does not return the results of the asynchronous operations. You can collect the results as a side effect of the asynchronous operations, as shown here.
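For illustration, a minimal sketch of that approach might look like the following (my own code, assuming .NET 6+, the question's Foo/Baz/FetchBazListFromFoo types, and System.Collections.Concurrent for the result collection; the degree of parallelism of 4 is an arbitrary choice):
public async Task<List<Baz>> MyFunctionAsync(List<Foo> fooList)
{
    var results = new ConcurrentBag<Baz>();
    var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
    await Parallel.ForEachAsync(fooList, options, async (foo, cancellationToken) =>
    {
        // Collect the results as a side effect of the asynchronous operation.
        List<Baz> bazList = await FetchBazListFromFoo(foo);
        foreach (Baz baz in bazList)
            results.Add(baz);
    });
    return results.ToList();
}
Note that, unlike the WhenAll approach, this does not preserve the original order of the results.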
Another, more advanced, solution is the TPL Dataflow library. A usage example of this library can be found here.
The simple await will perform each item one after another, essentially synchronously - this would be the slowest.
WhenAll will wait for all of the tasks to be done - the runtime will be roughly that of the longest single task.
Do not use WaitAll - it is synchronous, just use WhenAll
WhenAny allows you to handle each task as it completes. This will be faster than WhenAll in some cases, depending on how much processing you have to do after each task.
IMO, unless you need to start post-processing immediately as each task completes, WhenAll is the simplest/cleanest approach and works fine in most scenarios.
I have a function like such:
static void AddResultsToDb(IEnumerable<int> numbers)
{
foreach (int number in numbers)
{
int result = ComputeResult(number); // This takes a long time, but is thread safe.
AddResultToDb(number, result); // This is quick but not thread safe.
}
}
I could solve this problem by using, for example, Parallel.ForEach to compute the results, and then use a regular foreach to add the results to the database.
However, for educational purposes, I would like a solution that revolves around await/async. But no matter how much I read about it, I cannot wrap my mind around it. If await/async is not applicable in this context, I would like to understand why.
As others have suggested, this isn't a case of using async/await as that is for asynchrony. What you're doing is concurrency. Microsoft has a framework specifically for that and it solves this problem nicely.
So for learning purposes, you should use Microsoft's Reactive Framework (aka Rx) - NuGet System.Reactive and add using System.Reactive.Linq; - then you can do this:
static void AddResultsToDb(IEnumerable<int> numbers)
{
numbers
.ToObservable()
.SelectMany(n => Observable.Start(() => new { n, r = ComputeResult(n) }))
.Do(x => AddResultToDb(x.n, x.r))
.Wait();
}
The SelectMany/Observable.Start combination allows as many ComputeResult calls to occur as possible concurrently. The nice thing about Rx is that it then serializes the results so that only one call at a time goes to AddResultToDb.
To control the degrees of parallelism you can change the SelectMany to a Select/Merge like this:
static void AddResultsToDb(IEnumerable<int> numbers)
{
numbers
.ToObservable()
.Select(n => Observable.Start(() => new { n, r = ComputeResult(n) }))
.Merge(maxConcurrent: 2)
.Do(x => AddResultToDb(x.n, x.r))
.Wait();
}
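As an aside (a sketch of my own, not part of the original answer): if you would rather not block the calling thread with Wait(), the same pipeline can be awaited by converting it to a Task, using ToTask() from the System.Reactive.Threading.Tasks namespace:
static async Task AddResultsToDbAsync(IEnumerable<int> numbers)
{
    await numbers
        .ToObservable()
        .Select(n => Observable.Start(() => new { n, r = ComputeResult(n) }))
        .Merge(maxConcurrent: 2)
        .Do(x => AddResultToDb(x.n, x.r))     // still serialized: one DB call at a time
        .LastOrDefaultAsync()                 // complete when the last result has been handled
        .ToTask();
}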
The async and await pattern is not really suitable for your first method. It's well suited for IO-bound workloads to achieve scalability, or for frameworks that have UIs, for responsiveness. It's less suited for raw CPU workloads.
However you could still get benefits from parallel processing because your first method is expensive and thread safe.
In the following example I used Parallel LINQ (PLINQ) for a fluent expression of the results without worrying about a pre-sized array / concurrent collection / locking, though you could use other TPL functionality, like Parallel.For/ForEach
// Potentially break up the workloads in parallel
// return the number and result in a ValueTuple
var results = numbers.AsParallel()
.Select(x => (number: x, result: ComputeResult(x)))
.ToList();
// iterate through the number and results and execute them serially
foreach (var (number, result) in results)
AddResultToDb(number, result);
Note: the assumption here is that order is not important.
Supplemental
Your method AddResultToDb looks like it's just inserting results into a database, which is IO bound and is worthy of async; furthermore, it could probably take all results at once and insert them in bulk/batch, saving round trips.
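For example (a sketch only; AddResultsToDbAsync is a hypothetical bulk-insert method, not something from the question):
static async Task ComputeAndStoreResultsAsync(IEnumerable<int> numbers)
{
    // CPU-bound part in parallel, as above.
    var results = numbers.AsParallel()
        .Select(x => (number: x, result: ComputeResult(x)))
        .ToList();

    // IO-bound part: one bulk round trip instead of one insert per result.
    await AddResultsToDbAsync(results); // hypothetical bulk-insert method
}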
From the comments, credit @TheodorZoulias:
To preserve the order you could use the method AsOrdered, at
the cost of some performance penalty. A possible performance
improvement is to remove the ToList(), so that the results are added
to the DB concurrently with the computations.
To make the results available as fast as possible it's probably a good
idea to disable the partial buffering that happens by default, by
chaining the method
.WithMergeOptions(ParallelMergeOptions.NotBuffered) in the query
var results = numbers.AsParallel()
    .AsOrdered()
    .WithMergeOptions(ParallelMergeOptions.NotBuffered)
    .Select(x => (number: x, result: ComputeResult(x)));
Example
Additional resources
ParallelEnumerable.AsOrdered Method
Enables treatment of a data source as if it were ordered, overriding
the default of unordered. AsOrdered may only be invoked on non-generic
sequences
ParallelEnumerable.WithMergeOptions
Sets the merge options for this query, which specify how the query
will buffer output.
ParallelMergeOptions Enum
NotBuffered Use a merge without output buffers. As soon as result elements have been computed, make that element available to the
consumer of the query.
This isn't really a case for async/await, because it sounds like ComputeResult is computationally expensive, as opposed to just taking a long, indeterminate amount of time. async/await is better for tasks you are truly waiting on. Parallel.ForEach will actually thread your workload.
If anything, AddResultToDb is what you would want to async/await - you would be waiting on an external action to complete.
Good in-depth explanation: https://stackoverflow.com/a/35485780/127257
Using Parallel.For honestly seems like the simplest solution, since your computations are likely to be CPU-bound. Async/await is better for I/O bound operations since it does not require another thread to wait for an I/O operation to complete (see there is no thread).
That being said, you can still use async/await for tasks that you put on the thread pool. So here's how you could do it.
static void AddResultToDb(int number)
{
int result = ComputeResult(number);
AddResultToDb(number, result);
}
static async Task AddResultsToDb(IEnumerable<int> numbers)
{
var tasks = numbers.Select
(
number => Task.Run( () => AddResultToDb(number) )
)
.ToList();
await Task.WhenAll(tasks);
}
I am struggling to grasp the basic concept of c# async await.
Basically what I have is a list of objects which I need to process; the processing involves iterating through each object's properties and joining strings, then creating a new object (in this case called a TrelloCard) and eventually building up a list of TrelloCards.
The iteration takes quite a long time, so what I would like to do is process multiple objects asynchronously.
I've tried multiple approaches, but basically I want to do something like this. (In the example below I have removed the processing and just put System.Threading.Thread.Sleep(2000). I'm aware that this is NOT an async method, and I could use Task.Delay, but the point is that my processing does not have any async methods; I want to just run the entire method with multiple instances.)
private async Task<List<TrelloCard>> ProcessJobs(IQueryable<IGrouping<CardGrouping, Job>> jobs)
{
    List<TrelloCard> cards = new List<TrelloCard>();
    foreach (var job in jobs.ToList())
    {
        card = await ProcessCards(job, cards); // I would like to run multiple instances of the processing
        cards.Add(card); // Once each instance is finished it adds it to the list
    }
    return cards;
}

private async Task<TrelloCard> ProcessCards(Job job)
{
    System.Threading.Thread.Sleep(2000); // Just for example's sake
    return new TrelloCard();
}
I am struggling to grasp the basic concept of c# async await.
A simple definition would be: async-await is part of .NET's concurrency support, and it can be used to make multiple IO calls without wasting threads, which are meant for compute operations. Think of calls to a database, a web service, the network, or file IO, none of which needs to hold on to the current thread.
In your current case, where the use case is:
iterating through its properties and joining strings, and then creating a new object
eventually adding a list of trellocards
This seems to be a compute-bound operation; unless you are doing IO somewhere, you are just traversing an in-memory object, and for this case the better choice would be:
Parallel.ForEach, to parallelize the in-memory processing. You do need to be careful of race conditions, since a given memory location could be accessed by multiple threads and corrupted, especially during writes, so at least in the current code use a thread-safe collection like ConcurrentBag from the System.Collections.Concurrent namespace (or whichever type suits the use case) instead of List<TrelloCard>, or consider a thread-safe list implementation.
Also please note that if your methods are not async by nature, you can wrap them in Task.Run to make them awaitable; this does consume a thread-pool thread, but they can then be called using async-await.
Parallel.ForEach code for your use case (I am doing a direct replacement; there seems to be an issue in your code, since the ProcessCards function takes just a Job object but you are also passing the cards collection, which is a compilation error):
private List<TrelloCard> ProcessJobs(IQueryable<IGrouping<CardGrouping, Job>> jobs)
{
ConcurrentBag<TrelloCard> cards = new ConcurrentBag<TrelloCard>();
Parallel.ForEach(jobs.ToList(), (job) =>
{
var card = ProcessCards(job); // I would like to run multiple instances of the processing
cards.Add(card); // Once each instance is finished it adds it to the list
});
return cards.ToList();
}
private TrelloCard ProcessCards(Job job)
{
return new TrelloCard();
}
If you want them to run in parallel you could spawn a new Task for each operation and then await the completion of all using Task.WhenAll.
private async Task<List<TrelloCard>> ProcessJobs(IQueryable<IGrouping<CardGrouping, Job>> jobs)
{
List<Task<TrelloCard>> tasks = new List<Task<TrelloCard>>();
foreach (var job in jobs)
{
tasks.Add(ProcessCards(job));
}
var results = await Task.WhenAll(tasks);
return results.ToList();
}
private Task<TrelloCard> ProcessCards(Job job)
{
return Task.Run(() =>
{
System.Threading.Thread.Sleep(2000); //Just for examples sake
return new TrelloCard();
});
}
jobs.ToList() is just wasting memory. It's already IEnumerable so can be used in a foreach.
ProcessCards doesn't compile. You need something like this
private Task<TrelloCard> ProcessCards(Job job)
{
return Task.Run(() =>
{
System.Threading.Thread.Sleep(2000); //Just for examples sake
return new TrelloCard();
});
}
Now you want ProcessJobs to
create a ProcessCards task for each job
wait for all tasks to finish
return a sequence of TrelloCard
private async Task<List<TrelloCard>> ProcessJobs(IQueryable<IGrouping<CardGrouping, Job>> jobs)
{
    return (await Task.WhenAll(jobs.Select(ProcessCards))).ToList();
}
What's the difference between these two asynchronous methods? And in which situations would these two kinds of methods behave differently?
Thanks.
public async Task<int> MyMethod1Async()
{
return 1;
}
public async Task<int> MyMethod2Async()
{
return await new Task<int>(() => 1);
}
Taking a look at the two methods:
public async Task<int> MyMethod1Async()
{
return 1;
}
This will run synchronously because there are no "await" operators in it - it just returns 1, so it's no different than if you had just done the following:
public int MyMethod1()
{
return 1;
}
The following method is probably a better illustration of the difference between different "types" of async:
public async Task<string> MyMethod1Async()
{
using (HttpClient client = new HttpClient())
{
client.BaseAddress = new Uri("SomeBaseAddress");
// This will return control to the method's caller until this gets a result from the server
HttpResponseMessage message = await client.GetAsync("SomeURI");
// The same as above - returns control to the method's caller until this is done
string content = await message.Content.ReadAsStringAsync();
return content;
}
}
Code like this won't necessarily spawn extra threads (unless that's how Microsoft happened to have implemented those particular library calls). Either way, await/async does not require the creation of additional threads; it can run asynchronously on the same thread.
My standard illustration of this fact is as follows: suppose you go to a restaurant with 10 people. When the waiter comes by, the first person he asks for an order isn't ready; however, the other 9 people are. So the waiter asks the other 9 people for their orders and then comes back to the original guy, hoping he'll be ready to order by then. (It's definitely not the case that they'll bring in a second waiter just to wait for the original guy to be ready, and doing so probably wouldn't save much time anyway.) That's how async/await works in many cases. The exception is that some Task Parallel Library calls, like Task.Run(...), actually do execute on other threads - in our illustration, bringing in a second waiter - so make sure you check the documentation for which is which.
The next item you list won't work because you just create the task, you don't actually do anything with it:
public async Task<int> MyMethod2Async()
{
return await new Task<int>(() => 1);
}
I'm assuming that you actually intended to do something like the following:
public async Task<int> MyMethod2Async()
{
return await Task.Run<int>(() => 1);
}
This will run the lambda expression in the thread pool, return control to MyMethod2Async's caller until the lambda expression has a result, and then return the value from the lambda expression once it does have a result.
To summarize, the difference is whether you're running asynchronously on the same thread (equivalent to the first guy at your table telling the waiter to come back to him after everyone else has ordered) or if you're running the task on a separate thread.
At risk of oversimplifying things a lot, CPU-bound tasks should generally be run asynchronously on a background thread. However, IO-bound tasks (or other cases where the holdup is mostly just waiting for some kind of result from an external system) can often be run asynchronously on the same thread; there won't necessarily be much of a performance improvement from putting it on a background thread vs. doing it asynchronously on the same thread.
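To make that distinction concrete, here is a small sketch of my own; ExpensiveComputation is a hypothetical CPU-bound method, not something from the answer above:
public Task<int> CpuBoundAsync()
{
    // CPU-bound: offload to a thread-pool thread so the caller stays responsive.
    return Task.Run(() => ExpensiveComputation()); // ExpensiveComputation is hypothetical
}

public async Task<string> IoBoundAsync(HttpClient client)
{
    // IO-bound: no extra thread is used while the request is in flight.
    return await client.GetStringAsync("https://example.com/");
}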
The first method returns an already completed task with a Result of 1.
The second method returns a Task<int> that will never complete.
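To illustrate why (a small sketch of my own): a Task created with the constructor is a "cold" task; it never runs unless Start() is called on it, which the code in the question never does.
var cold = new Task<int>(() => 1);
Console.WriteLine(cold.Status); // Created - the delegate is not running
// await cold;                  // would wait forever, because the task is never started
cold.Start();                   // schedules the delegate on the default task scheduler
Console.WriteLine(await cold);  // prints 1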