I've been creating a service using C# in Azure Functions. I've read guides on the best usage of async/await, but I don't understand its value in the context of Azure Functions.
In my Azure Function, I have 3 calls being made to an external API. I tried to use async/await to kick off my API calls. The idea is that the first two tasks each return a list, which are concatenated and then compared against the third list.
After this comparison is complete, all the items are sent to a storage queue for processing by another function that uses a queue trigger.
My async implementation is below:
var firstListTask = GetResourceListAsync(1);
var secondListTask = GetResourceListAsync(2);
var thirdListTask = GetOtherResourceListAsync(1);
var completed = await Task.WhenAll(firstListTask, secondListTask);
var resultList = completed[0].Concat(completed[1]);
var compareList = await thirdListTask;
// LINQ here to filter out resultList based on compareList
Running the above, I get an execution time of roughly 38 seconds.
My understanding of the async implementation is that I kick off all 3 async calls to get my lists.
The first two tasks are awaited with 'await Task.WhenAll...' - at this point the thread exits the async method and 'does something else' until the API returns the payload
API payload is received, the method is then resumed and continues executing the next instruction (concatenating the two lists)
The third task is then awaited with 'await thirdListTask', which exits the async method and 'does something else' until the API returns the payload
API payload is received, the method is then resumed and continues executing the next instruction (filtering lists)
Now if I run the same code synchronously, I get an execution time of about 40 seconds:
var firstList = GetResourceList(1);
var secondList = GetResourceList(2);
var resultList = firstList.Concat(secondList);
var compareList = GetOtherResourceList(1);
var finalList = // LINQ query to filter out resultList based on compareList
I can see that the async version runs 2 seconds faster than the sync version; I'm assuming this is because thirdListTask is kicked off at the same time as firstListTask and secondListTask?
My problem with the async implementation is that I don't understand what 'does something else' entails in the context of Azure Functions. From my understanding there is nothing else to do other than the operations on the next line, but it can't progress there until the payload has returned from the API.
Moreover, is the following code sample doing the same thing as my first async implementation? I'm extremely confused seeing examples of Azure Functions that use await for each async call, just to await another call in the next line.
var firstList = await GetResourceListAsync(1);
var secondList = await GetResourceListAsync(2);
var resultList = firstList.Concat(secondList);
var compareList = await GetOtherResourceListAsync(1);
// LINQ here to filter out resultList based on compareList
I've tried reading MS best practice for Azure Functions and similar questions around async/await on stackoverflow, but I can't seem to wrap my head around the above. Can anyone help simplify this?
var firstListTask = GetResourceListAsync(1);
var secondListTask = GetResourceListAsync(2);
var thirdListTask = GetOtherResourceListAsync(1);
This starts all 3 tasks. All 3 API calls are now running.
var completed = await Task.WhenAll(firstListTask, secondListTask);
This asynchronously waits until both tasks finish. It frees up the thread to go "do something else". What is this something else? Whatever the framework needs it to be: the thread is a freed resource, so it can be used to run another request, another async operation's continuation, etc.
var compareList = await thirdListTask;
At this point, your third API call has most likely completed already, as it was started together with the other 2. If it has completed, the await will pull out the value, or throw an exception if the task faulted. If it is still ongoing, it will asynchronously wait for it to complete, freeing up the thread to "go do something else".
var firstList = await GetResourceListAsync(1);
var secondList = await GetResourceListAsync(2);
var resultList = firstList.Concat(secondList);
var compareList = await GetOtherResourceListAsync(1);
This is different from your first example. If, for example, all your API calls take 5 seconds to complete, the total running time will be 15 seconds, as you start each call and await its completion sequentially. In your first example, the total running time will be roughly 5 seconds, so 3 times quicker.
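For comparison, a minimal sketch of the concurrent version that awaits all three calls together, using the same method names as in the question:
var firstListTask = GetResourceListAsync(1);
var secondListTask = GetResourceListAsync(2);
var thirdListTask = GetOtherResourceListAsync(1);
// All three API calls are now in flight; await them together.
await Task.WhenAll(firstListTask, secondListTask, thirdListTask);
// The tasks have already completed, so these awaits return immediately.
var resultList = (await firstListTask).Concat(await secondListTask);
var compareList = await thirdListTask;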
Related
I have a .NET 6 BackgroundService which pushes data from on-premises to a 3rd party API.
The 3rd party API takes about 500 milliseconds to process the API call.
The problem is that I have about 1,000,000 rows of data to push to this API one at a time. At 1/2 second per row, it's going to take about 6 days to sync up.
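(Roughly: 1,000,000 rows × 0.5 s per row ≈ 500,000 s, which is about 5.8 days.)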
So, I would like to try to spawn multiple threads in order to hit the API simultaneously with 10 threads.
var startTime = DateTimeOffset.Now;
var batchSize = _config.GetValue<int>("BatchSize");
using (var scope = _serviceScopeFactory.CreateScope())
{
    var context = scope.ServiceProvider.GetRequiredService<PlankContext>();
    var dncEntries = await context.PlankQueueDnc.Where(x => x.ToProcessFlag == true).Take(batchSize).ToListAsync();

    foreach (var plankQueueDnc in dncEntries)
    {
        var response = await _plankConnector.InsertDncAsync(plankQueueDnc);
        context.PlankQueueDnc.Update(plankQueueDnc);
    }

    await context.SaveChangesAsync();
}
Here is the code. As you can see, it gets a batch of 100 records and then processes them one by one. Is there a way to modify this so this line is not awaited? I don't quite understand how it would work if it were not awaited. Would it create a thread for each execution in the loop?
var response = await _plankConnector.InsertDncAsync(plankQueueDnc);
I am clearly not as up to speed on threads as the esteemed Stephen Cleary.
So suggestions would be appreciated.
In .NET 6 you can use Parallel.ForEachAsync to execute operations concurrently, using either all available cores or a limited Degree-Of-Parallelism.
The following code loads all records, executes the posts concurrently, then updates the records:
using (var scope = _serviceScopeFactory.CreateScope())
{
    var context = scope.ServiceProvider.GetRequiredService<PlankContext>();
    var dncEntries = await context.PlankQueueDnc
        .Where(x => x.ToProcessFlag == true)
        .Take(batchSize)
        .ToListAsync();

    await Parallel.ForEachAsync(dncEntries, async (plankQueueDnc, ct) =>
    {
        var response = await _plankConnector.InsertDncAsync(plankQueueDnc);
        plankQueueDnc.Whatever = response.Something;
    });

    await context.SaveChangesAsync();
}
There's no reason to call Update, as a DbContext tracks the objects it loaded and knows which ones were modified. SaveChangesAsync will persist all changes in a single transaction.
DOP and Throttling
By default, Parallel.ForEachAsync will execute as many tasks concurrently as there are cores. This may be too little or too much for HTTP calls. On the one hand, the client machine isn't using its CPU at all while waiting for the remote service. On the other hand, the remote service itself may not like, or even allow, too many concurrent calls and may impose throttling.
The ParallelOptions class can be used to specify the degree of parallelism. If the API allows it, we could execute e.g. 20 concurrent calls:
var options = new ParallelOptions { MaxDegreeOfParallelism = 20 };
await Parallel.ForEachAsync(dncEntries, options, async (plankQueueDnc, ct) => { ... });
Many services impose a limit on how many requests can be made in a period of time. A (somewhat naive) way of respecting this is to add a small delay in the worker code:
var delay = 100;
await Parallel.ForEachAsync(dncEntries, options, async (plankQueueDnc, ct) =>
{
    ...
    await Task.Delay(delay);
});
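Putting both knobs together, a minimal sketch of the throttled loop (the Whatever/Something property names are placeholders carried over from above, and the 100 ms delay is just an illustrative value):
var options = new ParallelOptions { MaxDegreeOfParallelism = 20 };
var delay = TimeSpan.FromMilliseconds(100);

await Parallel.ForEachAsync(dncEntries, options, async (plankQueueDnc, ct) =>
{
    var response = await _plankConnector.InsertDncAsync(plankQueueDnc);
    plankQueueDnc.Whatever = response.Something; // placeholder property names
    await Task.Delay(delay, ct);                 // naive per-call rate limiting
});

await context.SaveChangesAsync();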
Code Block 1:
var service1 = new service1();
var service2 = new service2();
var result1 = await service1.GetData();
var result2 = await service2.GetData();
Code Block 2:
var service1 = new service1();
var service2 = new service2();
var task1 = await service1.GetData();
var task2 = await service2.GetData();
Task.WhenAll(task1, task2);
Today I got this question in my quiz. The options were to choose one of them, CB1 or CB2.
Your first example is fine as long as await service1.GetData() does not throw an exception. If it does, then the result of, or any exceptions thrown by, await service2.GetData() will be lost.
It will, however, serialise the operations, as service2.GetData() will not be invoked until service1.GetData() has completed.
Your second example will not compile, unless you meant to do this:
var service1 = new service1();
var service2 = new service2();
var task1 = service1.GetData();
var task2 = service2.GetData();
await Task.WhenAll(task1, task2);
Where the Task.WhenAll is awaited rather than service1.GetData() and service2.GetData().
Then you can safely access the results like this:
var result1 = task1.Result;
var result2 = task2.Result;
The difference here is that there is only one place that an exception can be thrown: Task.WhenAll, which will aggregate the exceptions from all provided tasks.
It will also allow service2.GetData() to be invoked whilst any asynchronous work done by service1.GetData() is executing.
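If you do need every individual failure rather than just the first one that await rethrows, a minimal sketch (reusing task1 and task2 from the corrected example above; Console.WriteLine stands in for whatever logging you use):
var whenAll = Task.WhenAll(task1, task2);
try
{
    await whenAll;
}
catch
{
    // await rethrows only the first exception; the WhenAll task itself
    // carries all of them in its AggregateException.
    if (whenAll.Exception is not null)
    {
        foreach (var ex in whenAll.Exception.InnerExceptions)
        {
            Console.WriteLine(ex.Message);
        }
    }
}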
There is a third option as well, assuming service1.GetData() and service2.GetData() have the same return type:
var service1 = new service1();
var service2 = new service2();
var results = await Task.WhenAll(service1.GetData(), service2.GetData());
That way, the result of each Task will be added to an array (here results).
You could then extract the individual values:
var result1 = results[0];
var result2 = results[1];
Normally I wouldn't answer a homework question, since you should have learned it in class. But I feel the need to answer this one, because it's a bad question and I fear you are being helped to misunderstand asynchronous programming.
Parallel != asynchronous.
"Parallel" means that two or more pieces of code are being executed at the same time. That means there is more than one thread. It's about how code runs.
"Asynchronous" means that while a block of code is waiting for some external operation, the thread is freed to do some other work, instead of locking the thread. It's about how code waits.
Let's assume that GetData() makes a network request to get the data. This is what happens in that second example:
service1.GetData() runs until the network request is sent and returns a Task.
service2.GetData() runs until the network request is sent and returns a Task.
So far, both network requests have been sent and we're waiting for responses. Everything has happened on the same thread, not in parallel. But we still need to run the continuation of each (everything after await in GetData()) after each response is received. How those continuations run depends on whether the application has a synchronization context.
If there is a synchronization context (ASP.NET, or UI app, for example) then nothing will run in parallel. The continuation of each call to GetData() will run one after the other on the same thread.
If there is no synchronization context (ASP.NET Core, a console app, or ConfigureAwait(false) used inside GetData(), for example), then each continuation will run on a ThreadPool thread as soon as the responses come back, which may happen in parallel.
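To make "continuation" concrete, here is a hypothetical GetData (the DataService class, Data record, and URL are made up for illustration): everything after each await is the continuation, and ConfigureAwait(false) lets it resume on a thread pool thread rather than on the synchronization context.
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

public class DataService
{
    private readonly HttpClient _httpClient = new HttpClient();

    public async Task<Data?> GetData()
    {
        var response = await _httpClient.GetAsync("https://example.com/data")
            .ConfigureAwait(false);
        // Continuation: runs once the response headers arrive.
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadFromJsonAsync<Data>()
            .ConfigureAwait(false);
    }
}

public record Data(string Value);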
If your teacher wants you to put B, then put the answer that will get you the marks. But it might actually be wrong, unless you have been given more detail about the type of application and if it has a synchronization context.
Also, there should be an await before Task.WhenAll().
Microsoft has an excellent series of articles about Asynchronous programming with async and await that are worth the read. You will find the other articles in that series in the table of contents on the left of that first article.
We have a database with around 400k elements we need to compute. Below is shown a sample of an orchestrator function.
[FunctionName("Crawl")]
public static async Task<List<string>> RunOrchestrator(
[OrchestrationTrigger] DurableOrchestrationContext context)
{
if (!context.IsReplaying)
{
}
WriteLine("In orchistration");
var outputs = new List<string>();
var tasks = new Task<string>[3];
var retryOptions = new RetryOptions(
firstRetryInterval: TimeSpan.FromSeconds(60),
maxNumberOfAttempts: 3);
// Replace "hello" with the name of your Durable Activity Function.
tasks[0] = context.CallActivityWithRetryAsync<string>("Crawl_Hello",retryOptions, "Tokyo");
tasks[1] = context.CallActivityWithRetryAsync<string>("Crawl_Hello", retryOptions, "Seattle");
tasks[2] = context.CallActivityWithRetryAsync<string>("Crawl_Hello",retryOptions, "London");
await Task.WhenAll(tasks);
return outputs;
}
Every time an activity is called, the orchestrator function is replayed. But I don't want to fetch 400k items from the database each time an activity is called. Would I just add all the activity code inside the if statement, or what is the right approach here? I can't see that working with the Task.WhenAll call.
It looks like you've figured out the approach for this, as you mentioned in your other query, but I'm elaborating on it here for the benefit of others.
Ideally, you should have an activity function that fetches all the data you need first, batch it up, and then call another activity function that processes each batch.
Since you have a large number of elements to process, it's best to split the work into separate sub-orchestrators, because the fan-in operation is performed on a single instance.
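For example, a rough sketch of that fan-out (the "GetWorkBatches" and "ProcessBatch" function names are hypothetical, and the batch shape is just illustrative):
[FunctionName("Crawl_Coordinator")]
public static async Task RunCoordinator(
    [OrchestrationTrigger] DurableOrchestrationContext context)
{
    // Hypothetical activity: queries the database once and returns the
    // element ids split into batches.
    var batches = await context.CallActivityAsync<List<string[]>>("GetWorkBatches", null);

    // Fan out one sub-orchestrator per batch so that no single instance
    // has to fan in all 400k results.
    var subOrchestrations = new List<Task>();
    foreach (var batch in batches)
    {
        subOrchestrations.Add(context.CallSubOrchestratorAsync("ProcessBatch", batch));
    }

    await Task.WhenAll(subOrchestrations);
}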
For further reading, there are some documented performance targets that could help when deploying durable functions.
I've been investigating how to incorporate asynchronous methods into my MVC controllers, specifically to leverage the potential of parallel execution.
I found this article particularly helpful. However there's one concept I'd appreciate clarification on. The article I linked above uses the following code to execute a series of I/O-bound methods in parallel:
var widgetTask = widgetService.GetWidgetsAsync();
var prodTask = prodService.GetProductsAsync();
var gizmoTask = gizmoService.GetGizmosAsync();
await Task.WhenAll(widgetTask, prodTask, gizmoTask);
var pwgVM = new ProdGizWidgetVM(
widgetTask.Result,
prodTask.Result,
gizmoTask.Result
);
What I'd like to know is how this differs from the following code:
var widgetTask = widgetService.GetWidgetsAsync();
var prodTask = prodService.GetProductsAsync();
var gizmoTask = gizmoService.GetGizmosAsync();
var pwgVM = new ProdGizWidgetVM(
await widgetTask,
await prodTask,
await gizmoTask
);
According to my understanding these two code blocks are equivalent. Is that correct? If not it'd be great if someone could explain the difference and suggest which is preferable in my case.
1 will execute all 3 Tasks in parallel, but not necessarily complete them in order.
Task.WhenAll completes when all tasks inside have completed. widgetTask will be executed to its first await; then, while waiting asynchronously for the I/O-bound work, prodTask will be executed to its first await, and while still waiting for the first 2 tasks to complete, gizmoTask will be executed to its first await. After all 3 have finished their I/O-bound work in parallel, they will run to completion. After this, Task.WhenAll will also report completion.
2 will execute all 3 Tasks in parallel, but await them in the given order.
widgetTask will start, then prodTask, and then gizmoTask. Then you start awaiting widgetTask. When it has completed, it will return and the non-I/O-bound code inside that task will run. After it has finished, prodTask will be awaited. As its I/O-bound work happened asynchronously, it might already be completed, so this will be quite fast. Once the I/O-bound work has completed, the code after the await inside prodTask will run. The same goes for gizmoTask.
So normally you want version 1 rather than version 2, as long as you do not care about the order in which the code after each await is executed.
I have created the following in order to execute multiple async tasks with a timeout. I was looking for something that will allow extracting results from the tasks - taking only those that beat the timeout, regardless if the rest of tasks failed to do so (simplified):
TimeSpan timeout = TimeSpan.FromSeconds(5.0);
Task<Task>[] tasksOfTasks =
{
Task.WhenAny(SomeTaskAsync("a"), Task.Delay(timeout)),
Task.WhenAny(SomeTaskAsync("b"), Task.Delay(timeout)),
Task.WhenAny(SomeTaskAsync("c"), Task.Delay(timeout))
};
Task[] completedTasks = await Task.WhenAll(tasksOfTasks);
List<MyResult> results = completedTasks.OfType<Task<MyResult>>().Select(task => task.Result).ToList();
I have implemented this in a non-static class in the (Web API) server.
This worked well on the first call; however, additional calls caused completedTasks to strangely accumulate tasks from previous calls to the server (as shown by the debugger). On the second call there were 6 completed tasks, on the third call 9, and so on.
My questions:
Any idea why is that?
I assume it's because the previous tasks weren't cancelled; however, this code is in a new instance of a class!
Any idea how to avoid this accumulation?
PS: See my answer to this question.
I couldn't use my psychic debugging to understand why your code "caused completedTasks to strangely accumulate tasks from previous calls", but it probably does expose some of your misunderstandings.
Here's a working example based on your code (using string instead of MyResult):
Task<string> timeoutTask =
Task.Delay(TimeSpan.FromSeconds(5)).ContinueWith(_ => string.Empty);
Task<Task<string>>[] tasksOfTasks =
{
Task.WhenAny(SomeTaskAsync("a"), timeoutTask),
Task.WhenAny(SomeTaskAsync("b"), timeoutTask),
Task.WhenAny(SomeTaskAsync("c"), timeoutTask)
};
Task<string>[] completedTasks = await Task.WhenAll(tasksOfTasks);
List<string> results = completedTasks
    .Where(task => task != timeoutTask)
    .Select(task => task.Result)
    .ToList();
So, what's different:
I'm using the same timeout task for all WhenAny calls. There's no need to use more, and they could complete at slightly different times.
I make the timeout task return a value, so it's actually a Task<string> and not a Task.
That makes each WhenAny call also return a Task<string> (and tasksOfTasks be Task<Task<string>>[]) which would make it possible to actually return a result out of these tasks.
After awaiting we need to filter out WhenAny calls that returned our timeout task, because there would be no result there (only string.Empty) using completedTasks.Where(task => task != timeoutTask).
P.S.: I've also answered that question and I would (surprisingly) recommend you use my solution.
Note: Using the Task.Result property isn't advisable. You should await it instead (even when you know it's already completed).
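Following that note, the result extraction could be written with await instead of .Result, for example (reusing completedTasks and timeoutTask from the sample above):
var results = new List<string>();
foreach (var task in completedTasks)
{
    // Each element is whichever task won its WhenAny race; skip the entries
    // where the shared timeout task finished first.
    if (task != timeoutTask)
        results.Add(await task);
}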