I am struggling to grasp the basic concept of c# async await.
Basically, what I have is a List of objects which I need to process. The processing involves iterating through each object's properties and joining strings, then creating a new object (in this case called a TrelloCard) and eventually adding it to a list of TrelloCards.
The iteration takes quite a long time, so what I would like to do is process multiple objects asynchronously.
I've tried multiple approaches, but basically I want to do something like this. (In the example below I have removed the processing and just put System.Threading.Thread.Sleep(2000). I'm aware that this is NOT an async method, and that I could use Task.Delay, but the point is that my processing does not have any async methods; I just want to run the entire method with multiple instances.)
private async Task<List<TrelloCard>> ProcessJobs(IQueryable<IGrouping<CardGrouping, Job>> jobs)
{
List<TrelloCard> cards = new List<TrelloCard>();
foreach (var job in jobs.ToList())
{
card = await ProcessCards(job, cards); // I would like to run multiple instances of the processing
cards.add(card); //Once each instance is finshed it adds it to the list
}
private async Task<TrelloCard> ProcessCards(Job job)
{
System.Threading.Thread.Sleep(2000); //Just for examples sake
return new TrelloCard();
}
I am struggling to grasp the basic concept of c# async await.
A simple definition would be: async/await is the part of .NET concurrency that lets you make multiple I/O calls without tying up threads that are meant for compute operations while those calls are in flight. Typical examples are calls to a database, a web service, the network, or file I/O, none of which needs a thread of the current process while it waits.
In your current case, where the use case is:
iterating through its properties and joining strings, and then creating a new object
eventually adding a list of trellocards
This seems to be a compute-bound operation. Unless you are doing I/O, it looks to me like you are traversing an in-memory object, and for this case the better choice would be:
Parallel.ForEach, to parallelize the in-memory processing. You do need to be careful about race conditions, since the same memory could be accessed by multiple threads and corrupted, especially during writes. So, at least in the current code, use a thread-safe collection such as ConcurrentBag from the System.Collections.Concurrent namespace, or whichever one suits the use case, instead of List<TrelloCard>; alternatively, you may consider a thread-safe list implementation.
Also note that if your methods are not async by default, you can wrap them in Task.Run in order to await them. This still needs a thread pool thread, but it can be called using async/await.
Parallel.ForEach code for your use case (I am doing a direct replacement; there seems to be an issue in your code, since the ProcessCards function takes just a Job object but you are also passing the cards collection, which is a compilation error):
private List<TrelloCard> ProcessJobs(IQueryable<IGrouping<CardGrouping, Job>> jobs)
{
ConcurrentBag<TrelloCard> cards = new ConcurrentBag<TrelloCard>();
Parallel.ForEach(jobs.ToList(), (job) =>
{
var card = ProcessCards(job); // Multiple instances of the processing run in parallel
cards.Add(card); // Each instance adds its result to the thread-safe bag once it finishes
});
return cards.ToList();
}
private TrelloCard ProcessCards(Job job)
{
return new TrelloCard();
}
If you want them to run in parallel you could spawn a new Task for each operation and then await the completion of all using Task.WhenAll.
private async Task<List<TrelloCard>> ProcessJobs(IQueryable<IGrouping<CardGrouping, Job>> jobs)
{
List<Task<TrelloCard>> tasks = new List<Task<TrelloCard>>();
foreach (var job in jobs)
{
tasks.Add(ProcessCards(job));
}
var results = await Task.WhenAll(tasks);
return results.ToList();
}
private Task<TrelloCard> ProcessCards(Job job)
{
return Task.Run(() =>
{
System.Threading.Thread.Sleep(2000); //Just for examples sake
return new TrelloCard();
});
}
jobs.ToList() is just wasting memory. It's already IEnumerable so can be used in a foreach.
ProcessCards doesn't compile. You need something like this
private Task<TrelloCard> ProcessCards(Job job)
{
return Task.Run(() =>
{
System.Threading.Thread.Sleep(2000); //Just for examples sake
return new TrelloCard();
});
}
Now you want ProcessJobs to
create a ProcessCards task for each job
wait for all tasks to finish
return a sequence of TrelloCard
private async Task<List<TrelloCard>> ProcessJobs(IQueryable<IGrouping<CardGrouping, Job>> jobs)
{
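// Task.WhenAll yields a TrelloCard[], so convert it to match the declared return type
var cards = await Task.WhenAll(jobs.AsEnumerable().Select(ProcessCards));
return cards.ToList();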
}
If we fill a list of Tasks that need to do both CPU-bound and I/O bound work, by simply passing their method declaration to that list (Not by creating a new task and manually scheduling it by using Task.Start), how exactly are these tasks handled?
I know that they are not done in parallel, but concurrently.
Does that mean that a single thread will move along them, and that single thread might not be the same thread in the thread pool, or the same thread that initially started waiting for them all to complete/added them to the list?
EDIT: My question is about how exactly these items are handled in the list concurrently - is the calling thread moving through them, or something else is going on?
Code for those that need code:
public async Task SomeFancyMethod(int i)
{
doCPUBoundWork(i);
await doIOBoundWork(i);
}
//Main thread
List<Task> someFancyTaskList = new List<Task>();
for (int i = 0; i< 10; i++)
someFancyTaskList.Add(SomeFancyMethod(i));
// Do various other things here --
// how are the items handled in the meantime?
await Task.WhenAll(someFancyTaskList);
Thank you.
Asynchronous methods always start running synchronously. The magic happens at the first await. When the await keyword sees an incomplete Task, it returns its own incomplete Task. If it sees a complete Task, execution continues synchronously.
So at this line:
someFancyTaskList.Add(SomeFancyMethod(i));
You're calling SomeFancyMethod(i), which will:
Run doCPUBoundWork(i) synchronously.
Run doIOBoundWork(i).
If doIOBoundWork(i) returns an incomplete Task, then the await in SomeFancyMethod will return its own incomplete Task.
Only then will the returned Task be added to your list and your loop will continue. So the CPU-bound work is happening sequentially (one after the other).
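To make the ordering above concrete, here is a minimal sketch written as a console program with top-level statements (so it assumes C# 9 or later). The names Log, DoCpuBoundWork and gate are hypothetical stand-ins that are not part of the question; the "I/O" is simulated with a TaskCompletionSource that is only completed after the loop, so that the ordering is deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var log = new List<string>();   // shared log, only touched while holding its lock

void Log(string entry)
{
    lock (log) { log.Add(entry); }
}

// Hypothetical stand-in for I/O: stays incomplete until we release it below.
var gate = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

void DoCpuBoundWork(int i) => Log($"cpu {i}");

async Task SomeFancyMethod(int i)
{
    DoCpuBoundWork(i);          // runs synchronously, before the method returns
    await gate.Task;            // first incomplete await: control goes back to the caller
    Log($"io done {i}");        // continuation, runs only after the gate is released
}

var tasks = new List<Task>();
for (int i = 0; i < 3; i++)
{
    tasks.Add(SomeFancyMethod(i));
    Log($"added {i}");
}
Log("loop finished");

gate.SetResult(true);           // "complete the I/O" for all three calls
await Task.WhenAll(tasks);

List<string> snapshot;
lock (log) { snapshot = new List<string>(log); }
Console.WriteLine(string.Join(", ", snapshot));
```

The first seven entries are always "cpu 0, added 0, cpu 1, added 1, cpu 2, added 2, loop finished": each call does its CPU-bound part before the loop moves on. The three "io done" entries follow in whatever order the continuations happen to run.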
There is some more reading about this here: Control flow in async programs (C#)
As each I/O operation completes, the continuations of those tasks are scheduled. How those are done depends on the type of application - particularly, if there is a context that it needs to return to (desktop and ASP.NET do unless you specify ConfigureAwait(false), ASP.NET Core doesn't). So they might run sequentially on the same thread, or in parallel on ThreadPool threads.
If you want to immediately move the CPU-bound work to another thread to run that in parallel, you can use Task.Run:
int local = i; // copy first: the lambda would otherwise capture the shared for-loop variable
someFancyTaskList.Add(Task.Run(() => SomeFancyMethod(local)));
If this is in a desktop application, then this would be wise, since you want to keep CPU-heavy work off the UI thread. However, you have then lost your context in SomeFancyMethod, which may or may not matter to you. In a desktop app, you can always marshal calls back to the UI thread fairly easily.
I assume you don't mean passing their method declaration, but just invoking the method, like so:
var tasks = new Task[] { MethodAsync("foo"),
MethodAsync("bar") };
And we'll compare that to using Task.Run:
var tasks = new Task[] { Task.Run(() => MethodAsync("foo")),
Task.Run(() => MethodAsync("bar")) };
First, let's get the quick answer out of the way. The first variant will have lower or equal parallelism compared to the second variant. Parts of MethodAsync will run on the caller's thread in the first case, but not in the second. How much this actually affects the parallelism depends entirely on the implementation of MethodAsync.
To get a bit deeper, we need to understand how async methods work. We have a method like:
async Task MethodAsync(string argument)
{
DoSomePreparationWork();
await WaitForIO();
await DoSomeOtherWork();
}
What happens when you call such a method? There is no magic. The method is a method like any other, just rewritten as a state machine (similar to how yield return works). It will run as any other method until it encounters the first await. At that point, it may or may not return a Task object. You may or may not await that Task object in the caller code. Ideally, your code should not depend on the difference. Just like yield return, await on a (non-completed!) task returns control to the caller of the method. Essentially, the contract is:
If you have CPU work to do, use my thread.
If whatever you do would mean the thread isn't going to use the CPU, return a promise of the result (a Task object) to the caller.
It allows you to maximize the ratio of what CPU work each thread is doing. If the asynchronous operation doesn't need the CPU, it will let the caller do something else. It doesn't inherently allow for parallelism, but it gives you the tools to do any kind of asynchronous operation, including parallel operations. One of the operations you can do is Task.Run, which is just another asynchronous method that returns a task, but which returns to the caller immediately.
So, the difference between:
MethodAsync("foo");
MethodAsync("bar");
and
Task.Run(() => MethodAsync("foo"));
Task.Run(() => MethodAsync("bar"));
is that the former will return (and continue to execute the next MethodAsync) after it reaches the first await on a non-completed task, while the latter will always return immediately.
You should usually decide based on your actual requirements:
Do you need to use the CPU efficiently and minimize context switching etc., or do you expect the async method to have negligible CPU work to do? Invoke the method directly.
Do you want to encourage parallelism or do you expect the async method to do interesting amounts of CPU work? Use Task.Run.
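The difference between the two call styles can be observed by recording which thread runs the part of the method before its first await. This is only a sketch under assumptions: it is a console program with top-level statements (C# 9 or later), the reportThread callback is a hypothetical addition for illustration, and Task.Delay stands in for WaitForIO().

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

int callerThread = Environment.CurrentManagedThreadId;
int directPrepThread = -1;
int wrappedPrepThread = -1;

async Task MethodAsync(Action<int> reportThread)
{
    // Everything before the first incomplete await runs on whichever thread invoked the method.
    reportThread(Environment.CurrentManagedThreadId);
    await Task.Delay(50);       // stand-in for WaitForIO()
}

// Invoked directly: the preparation part runs on the caller's thread.
Task direct = MethodAsync(id => directPrepThread = id);

// Invoked through Task.Run: the whole method, including the preparation part,
// is queued to the thread pool and Task.Run returns immediately.
Task wrapped = Task.Run(() => MethodAsync(id => wrappedPrepThread = id));

await Task.WhenAll(direct, wrapped);

Console.WriteLine($"caller: {callerThread}, direct: {directPrepThread}, via Task.Run: {wrappedPrepThread}");
```

The "direct" thread id always equals the caller's. In a plain console application the "via Task.Run" id is a thread pool thread and so differs from the caller's; in other hosts the exact ids depend on how the caller itself is scheduled.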
Here is your code rewritten without async/await, with old-school continuations instead. Hopefully it will make it easier to understand what's going on.
public Task CompoundMethodAsync(int i)
{
doCPUBoundWork(i);
return doIOBoundWorkAsync(i).ContinueWith(_ =>
{
doMoreCPUBoundWork(i);
});
}
// Main thread
var tasks = new List<Task>();
for (int i = 0; i < 10; i++)
{
Task task = CompoundMethodAsync(i);
tasks.Add(task);
}
// The doCPUBoundWork has already run synchronously 10 times at this point
// Do various things while the compound tasks are progressing concurrently
Task.WhenAll(tasks).ContinueWith(_ =>
{
// The doIOBoundWorkAsync/doMoreCPUBoundWork have completed 10 times at this point
// Do various things after all compound tasks have been completed
});
// No code should exist here. Move everything inside the continuation above.
Before I start. I have looked at similar questions and I don't think they have an answer in my situation.
I am having problems with Task.Factory.StartNew and Task.WaitAll.
I am getting null exceptions on an object within a created class that is initialized in the task, even though the code that is throwing the null exception should be waiting until all tasks are complete.
If I run this code without the tasks it works fine.
Why is Task.WaitAll not waiting until all of the Tasks have been completed?
Queue<Task> tasks = new Queue<Task>();
//Go through all transactions in the file via the reader.
foreach (transaction t in xr.read_x12(_progressbar_all_processing)) {
tasks.Enqueue(Task.Factory.StartNew(() => {
//Create a new provider from the current transaction and then
//add it to the global provider list.
provider p = new provider(t);
t_info.provider_list.Add(p);
//Null out the segments of the current transaction
//We are done with them and now the garbage collector
//can clean them up for us.
t.segments = null;
}));
}
Task.WaitAll(tasks.ToArray());
foreach(provider p in t_info.providers){
//Every provider has a List<claims> claims_list
//Do something with p.claims_list
foreach(claim c in p.claims_list){ //<--null exception here
}
}
t_info.provider_list is a List<provider>. This class is not safe to have multiple threads write to it at once, so you must synchronize access to the list:
lock(t_info.provider_list)
{
t_info.provider_list.Add(p);
}
This will only allow a single thread to do the Add call at a time and will fix your issues with a broken collection.
A suggestion to make this easier to get right: use Task.WhenAll instead. Make each of your tasks return a value which is the result of its own unit of work.
WhenAll has the signature:
Task<TResult[]> WhenAll<TResult>(IEnumerable<Task<TResult>> tasks)
Task.WhenAll on MSDN
So you pass it a collection of tasks that each evaluate to a TResult and you get back a task that evaluates to an array containing all the results when they're done.
This way, you are absolved of any responsibility for using thread-safe collections to pass data between tasks. It's much harder to get wrong.
It's also compatible with async/await, which is all about consuming values returned via tasks.
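A minimal sketch of that pattern, assuming a console program with top-level statements (C# 9 or later). Square is a hypothetical unit of work standing in for whatever each task computes (for example, building a provider from a transaction).

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical unit of work: each task returns its own result
// instead of writing to a shared collection.
int Square(int n) => n * n;

Task<int>[] tasks = Enumerable.Range(1, 5)
    .Select(n => Task.Run(() => Square(n)))
    .ToArray();

// The result array is in the same order as the input tasks,
// regardless of the order in which they finish.
int[] results = await Task.WhenAll(tasks);

Console.WriteLine(string.Join(", ", results));   // prints "1, 4, 9, 16, 25"
```

No lock and no concurrent collection is needed here, because no task ever touches state owned by another task; the only shared structure is the array that WhenAll assembles once everything is done.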
What is the difference between these two asynchronous methods? If there is none, in which situations would these two kinds of methods behave differently?
Thanks.
public async Task<int> MyMethod1Async()
{
return 1;
}
public async Task<int> MyMethod2Async()
{
return await new Task<int>(() => 1);
}
Taking a look at the two methods:
public async Task<int> MyMethod1Async()
{
return 1;
}
This will run synchronously because there are no "await" operators in it - it just returns 1, so it's no different than if you had just done the following:
public int MyMethod1()
{
return 1;
}
The following method is probably a better illustration of the difference between different "types" of async:
public async Task<string> MyMethod1Async()
{
using (HttpClient client = new HttpClient())
{
client.BaseAddress = new Uri("SomeBaseAddress");
// This will return control to the method's caller until this gets a result from the server
HttpResponseMessage message = await client.GetAsync("SomeURI");
// The same as above - returns control to the method's caller until this is done
string content = await message.Content.ReadAsStringAsync();
return content;
}
}
Code like this won't necessarily spawn extra threads (unless that's how Microsoft happened to have implemented those particular library calls). Either way, await/async does not require the creation of additional threads; it can run asynchronously on the same thread.
My standard illustration of this fact is as follows: suppose you go to a restaurant with 10 people. When the waiter comes by, the first person he asks for an order isn't ready; however, the other 9 people are. Thus, the waiter asks the other 9 people for their orders and then comes back to the original guy, hoping he'll be ready to order by then. (It's definitely not the case that they'll get a second waiter to wait for the original guy to be ready to order, and doing so probably wouldn't save much time anyway.) That's how async/await works in many cases (the exception being that some of the Task Parallel Library calls, like Task.Run(...), actually do execute on other threads - in our illustration, bringing in a second waiter - so make sure you check the documentation for which is which).
The next item you list won't work because you just create the task, you don't actually do anything with it:
public async Task<int> MyMethod2Async()
{
return await new Task<int>(() => 1);
}
I'm assuming that you actually intended to do something like the following:
public async Task<int> MyMethod2Async()
{
return await Task.Run<int>(() => 1);
}
This will run the lambda expression in the thread pool, return control to MyMethod2Async's caller until the lambda expression has a result, and then return the value from the lambda expression once it does have a result.
To summarize, the difference is whether you're running asynchronously on the same thread (equivalent to the first guy at your table telling the waiter to come back to him after everyone else has ordered) or if you're running the task on a separate thread.
At risk of oversimplifying things a lot, CPU-bound tasks should generally be run asynchronously on a background thread. However, IO-bound tasks (or other cases where the holdup is mostly just waiting for some kind of result from an external system) can often be run asynchronously on the same thread; there won't necessarily be much of a performance improvement from putting it on a background thread vs. doing it asynchronously on the same thread.
The first method returns an already completed task with a Result of 1.
The second method returns a Task<int> that will never complete.
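The "never completes" behaviour can be checked without hanging the program by racing the unstarted task against a timeout. This is a small sketch, assuming a console program with top-level statements (C# 9 or later); the 200 ms timeout is an arbitrary value chosen for illustration.

```csharp
using System;
using System.Threading.Tasks;

// A task created with the constructor is "cold": nothing runs it until Start() is called.
var cold = new Task<int>(() => 1);
Console.WriteLine(cold.Status);                  // prints "Created"

// Awaiting it directly would never return, so race it against a timeout instead.
Task first = await Task.WhenAny(cold, Task.Delay(200));
bool coldFinished = first == cold;
Console.WriteLine(coldFinished);                 // prints "False"

// Task.Run creates the task and schedules it on the thread pool in one step.
int hot = await Task.Run(() => 1);
Console.WriteLine(hot);                          // prints "1"
```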
This might be the worst StackOverflow title I've ever written. What I'm actually trying to do is execute an asynchronous method that uses the async/await convention (and itself contains additional await calls) from within a synchronous method multiple times in parallel while maintaining the same thread throughout the execution of each branch of the parallel execution, including for all await continuations. To put it another way, I want to execute some async code synchronously, but I want to do it multiple times in parallel. Now you can see why the title was so bad. Perhaps this is best illustrated with some code...
Assume I have the following:
public class MyAsyncCode
{
async Task MethodA()
{
// Do some stuff...
await MethodB();
// Some other stuff
}
async Task MethodB()
{
// Do some stuff...
await MethodC();
// Some other stuff
}
async Task MethodC()
{
// Do some stuff...
}
}
The caller is synchronous (from a console application). Let me try illustrating what I'm trying to do with an attempt to use Task.WaitAll(...) and wrapper tasks:
public void MyCallingMethod()
{
List<Task> tasks = new List<Task>();
for(int c = 0 ; c < 4 ; c++)
{
MyAsyncCode asyncCode = new MyAsyncCode();
tasks.Add(Task.Run(() => asyncCode.MethodA()));
}
Task.WaitAll(tasks.ToArray());
}
The desired behavior is for MethodA, MethodB, and MethodC to all be run on the same thread, both before and after the continuation, and for this to happen 4 times in parallel on 4 different threads. To put it yet another way, I want to remove the asynchronous behavior of my await calls since I'm making the calls parallel from the caller.
Now, before I go any further, I do understand that there's a difference between asynchronous code and parallel/multi-threaded code and that the former doesn't imply or suggest the latter. I'm also aware the easiest way to achieve this behavior is to remove the async/await declarations. Unfortunately, I don't have the option to do this (it's in a library) and there are reasons why I need the continuations to all be on the same thread (having to do with poor design of said library). But even more than that, this has piqued my interest and now I want to know from an academic perspective.
I've attempted to run this using PLINQ and immediate task execution with .AsParallel().Select(x => x.MethodA().Result). I've also attempted to use the AsyncHelper class found here and there, which really just uses .Unwrap().GetAwaiter().GetResult(). I've also tried some other stuff and I can't seem to get the desired behavior. I either end up with all the calls on the same thread (which obviously isn't parallel) or end up with the continuations executing on different threads.
Is what I'm trying to do even possible, or are async/await and the TPL just too different (despite both being based on Tasks)?
The methods that you are calling do not use ConfigureAwait(false). This means that we can force the continuations to resume in a context we like. Options:
Install a single-threaded synchronization context. I believe Nito.Async has that.
Use a custom TaskScheduler. await looks at TaskScheduler.Current and resumes at that scheduler if it is non-default.
I'm not sure if there are any pros and cons for either option. Option 2 has easier scoping I think. Option 2 would look like:
Task.Factory.StartNew(
() => MethodA()
, new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler).Unwrap();
Call this once for each parallel invocation and use Task.WaitAll to join all those tasks. Probably you should dispose of that scheduler as well.
I'm (ab)using ConcurrentExclusiveSchedulerPair here to get a single-threaded scheduler.
If those methods are not particularly CPU-intensive you can just use the same scheduler/thread for all of them.
You can create 4 independent threads, each one executes MethodA with a limited-concurrency (actually, no concurrency at all) TaskScheduler. That will ensure that every Task, and continuation Tasks, that the thread creates, will be executed by that thread.
public void MyCallingMethod()
{
CancellationToken csl = new CancellationToken();
var threads = Enumerable.Range(0, 4).Select(p =>
{
var t = new Thread(_ =>
{
// StartNew returns a Task<Task> here; Unwrap() so that Wait() covers MethodA itself
Task.Factory.StartNew(() => MethodA(), csl, TaskCreationOptions.None,
new LimitedConcurrencyLevelTaskScheduler(1)).Unwrap().Wait();
});
t.Start();
return t;
}).ToArray();
//You can block the main thread and wait for the other threads here...
}
That won't guarantee a degree of parallelism of 4, of course.
You can see an implementation of such TaskScheduler in MSDN - https://msdn.microsoft.com/en-us/library/ee789351(v=vs.110).aspx
I will first provide the pseudocode and describe it below:
public void RunUntilEmpty(List<Job> jobs)
{
while (jobs.Any()) // the list "jobs" will be modified during the execution
{
List<Job> childJobs = new List<Job>();
Parallel.ForEach(jobs, job => // this will be done in parallel
{
List<Job> newJobs = job.Do(); // after a job is done, it may return new jobs to do
lock (childJobs)
childJobs.AddRange(newJobs); // I would like to add those jobs to the "pool"
});
jobs = childJobs;
}
}
As you can see, I am performing a unique type of foreach. The source set (jobs) can grow during the execution, and this behaviour cannot be determined in advance. When the method Do() is called on an object (here, job), it may return new jobs to perform and would thus extend the source (jobs).
I could call this method (RunUntilEmpty) recursively, but unfortunately the stack can be really huge and is likely to result in an overflow.
Could you please tell me how to achieve this? Is there a way of doing this kind of thing in C#?
If I understand correctly, you basically start out with some collection of Job objects, each representing some task which can itself create one or more new Job objects as a result of performing its task.
Your updated code example looks like it will basically accomplish this. But note that, as commenter CommuSoft points out, it won't make most efficient use of your CPU cores. Because you are only updating the list of jobs after each group of jobs has completed, there's no way for newly-generated jobs to run until all of the previously-generated jobs have completed.
A better implementation would use a single queue of jobs, continually retrieving new Job objects for execution as old ones complete.
I agree that TPL Dataflow may be a useful way to implement this. However, depending on your needs, you might find it simple enough to just queue the tasks directly to the thread pool and use CountdownEvent to track the progress of the work so that your RunUntilEmpty() method knows when to return.
Without a good, minimal, complete code example, it's impossible to provide an answer that includes a similarly complete code example. But hopefully the below snippet illustrates the basic idea well enough:
public void RunUntilEmpty(List<Job> jobs)
{
CountdownEvent countdown = new CountdownEvent(1);
QueueJobs(jobs, countdown);
countdown.Signal();
countdown.Wait();
}
private static void QueueJobs(List<Job> jobs, CountdownEvent countdown)
{
foreach (Job job in jobs)
{
countdown.AddCount(1);
Task.Run(() =>
{
// after a job is done, it may return new jobs to do
QueueJobs(job.Do(), countdown);
countdown.Signal();
});
}
}
The basic idea is to queue a new task for each Job object, incrementing the counter of the CountdownEvent for each task that is queued. The tasks themselves do three things:
Run the Do() method,
Queue any new tasks, using the QueueJobs() method so that the CountdownEvent object's counter is incremented accordingly, and
Signal the CountdownEvent, decrementing its counter for the current task
The main RunUntilEmpty() signals the CountdownEvent to account for the single count it contributed to the object's counter when it created it, and then waits for the counter to reach zero.
Note that the calls to QueueJobs() are not recursive. The QueueJobs() method is not called by itself, but rather by the anonymous method declared within it, which is itself also not called by QueueJobs(). So there is no stack-overflow issue here.
The key feature in the above is that tasks are continuously queued as they become known, i.e. as they are returned by the previously-executed Do() method calls. Thus, the available CPU cores are kept busy by the thread pool, at least to the extent that any completed Do() method has in fact returned any new Job object to run. This addresses the main problem with the version of the code you've included in your question.
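For anyone who wants to try the idea end to end, here is a self-contained sketch of the same CountdownEvent approach, assuming a console program with top-level statements (C# 9 or later). Since the question does not show the Job class, a job is represented here simply by an int "depth", and Do is a hypothetical implementation in which every job below depth 3 spawns two children. The try/finally is an addition so that a throwing job cannot leave the counter stuck.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

int executed = 0;   // total number of jobs run, updated atomically

// Hypothetical job: a job at depth d < 3 spawns two children at depth d + 1.
// Returns the depths of the child jobs.
List<int> Do(int depth)
{
    Interlocked.Increment(ref executed);
    return depth < 3 ? new List<int> { depth + 1, depth + 1 } : new List<int>();
}

void QueueJobs(List<int> jobs, CountdownEvent countdown)
{
    foreach (int job in jobs)
    {
        countdown.AddCount(1);                  // one count per queued job
        Task.Run(() =>
        {
            try
            {
                QueueJobs(Do(job), countdown);  // children are counted before this job signals
            }
            finally
            {
                countdown.Signal();             // always release this job's count
            }
        });
    }
}

using (var countdown = new CountdownEvent(1))   // initial count held by the caller
{
    QueueJobs(new List<int> { 0 }, countdown);
    countdown.Signal();                         // release the caller's count
    countdown.Wait();                           // returns once every job has signalled
}

Console.WriteLine(executed);                    // prints "15"
```

Starting from a single root job, the tree has 1 + 2 + 4 + 8 = 15 jobs, and the counter only reaches zero after the last of them has signalled, because each job registers its children before releasing its own count.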