Azure Durable Function - Do While in Orchestrator - c#

I have an orchestrator that calls an activity function to process customer IDs.
The activity returns the IDs that failed (if there are any), and I want to reprocess those IDs by executing the activity again until there are no error IDs (the output is null).
Is it good practice to have a do-while loop in an orchestrator?
How do I add a 5-minute delay before each time the activity is executed?
public static async Task<string> RunOrchestrator(
    [OrchestrationTrigger] IDurableOrchestrationContext context, ILogger log)
{
    log = context.CreateReplaySafeLogger(log);
    dynamic errorOutput = null;
    dynamic processCustomers = context.GetInput<dynamic>();
    do
    {
        log.LogInformation("Calling activity");
        errorOutput = await context.CallActivityAsync<dynamic>("GetCSPCustomerLicenses_Activity", processCustomers);
        //Get customers to process from error object
        processCustomers = errorOutput;
        //Wait 5 minutes - how do I achieve this?
    } while (errorOutput != null);
    return "Success";
}

You can use durable timers to delay execution. From Timers in Durable Functions (Azure Functions):
Durable Functions provides durable timers for use in orchestrator functions to implement delays or to set up timeouts on async actions. Durable timers should be used in orchestrator functions instead of Thread.Sleep and Task.Delay (C#), or setTimeout() and setInterval() (JavaScript), or time.sleep() (Python).
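As for the loop itself: a do-while loop is fine in an orchestrator as long as the orchestrator code stays deterministic. For loops that may run for a long time, the Durable Functions documentation recommends restarting the orchestration with ContinueAsNew instead, so that the execution history does not grow without bound.
Here is a code sample that adds a 5-minute durable timer to the loop: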
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.Logging;

public static async Task<string> RunOrchestrator(
    [OrchestrationTrigger] IDurableOrchestrationContext context, ILogger log)
{
    log = context.CreateReplaySafeLogger(log);
    dynamic errorOutput = null;
    dynamic processCustomers = context.GetInput<dynamic>();
    do
    {
        log.LogInformation("Calling activity");
        errorOutput = await context.CallActivityAsync<dynamic>("GetCSPCustomerLicenses_Activity", processCustomers);
        //Get customers to process from error object
        processCustomers = errorOutput;
        if (errorOutput != null)
        {
            // Wait 5 minutes before retrying. Base the deadline on context.CurrentUtcDateTime
            // (not DateTime.UtcNow) so replays stay deterministic.
            DateTime deadline = context.CurrentUtcDateTime.Add(TimeSpan.FromMinutes(5));
            await context.CreateTimer(deadline, CancellationToken.None);
        }
    } while (errorOutput != null);
    return "Success";
}
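The retry loop's control flow can be checked outside the Functions host. The sketch below is not Durable Functions code: ProcessUntilCleanAsync, FakeActivity and the maxPasses cap are hypothetical, and the injected pause delegate stands in for the durable timer (in a real orchestrator it would wrap context.CreateTimer, never Task.Delay). It shows the pause firing only between passes, not before the first call or after the last successful one.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: run `process` over the IDs, then re-run it on whatever
// failed, pausing only between passes. In an orchestrator the pause would be a
// durable timer (context.CreateTimer), never Task.Delay.
async Task<int> ProcessUntilCleanAsync(
    IReadOnlyList<string> ids,
    Func<IReadOnlyList<string>, Task<IReadOnlyList<string>>> process,
    Func<Task> pause,
    int maxPasses) // assumed safety cap, not part of the original question
{
    int passCount = 0;
    while (ids.Count > 0 && passCount < maxPasses)
    {
        if (passCount > 0)
            await pause();          // no pause before the first pass
        ids = await process(ids);   // returns the IDs that failed on this pass
        passCount++;
    }
    return passCount;
}

// Hypothetical activity: "c2" fails on its first attempt only.
var attempts = new Dictionary<string, int>();
Task<IReadOnlyList<string>> FakeActivity(IReadOnlyList<string> batch)
{
    var failed = new List<string>();
    foreach (var id in batch)
    {
        attempts.TryGetValue(id, out int previous);
        attempts[id] = previous + 1;
        if (id == "c2" && attempts[id] == 1)
            failed.Add(id);
    }
    return Task.FromResult<IReadOnlyList<string>>(failed);
}

int pauses = 0;
int totalPasses = await ProcessUntilCleanAsync(
    new[] { "c1", "c2", "c3" },
    FakeActivity,
    () => { pauses++; return Task.CompletedTask; },
    5);
Console.WriteLine($"passes={totalPasses}, pauses={pauses}");
```

With one transient failure the activity runs twice and the pause runs once, which is the behavior the conditional timer in the orchestrator aims for.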

Related

Proper way to implement my own async Worker Service(s) according to TAP pattern

I'm new to async, and I'm building a WPF app for scraping and API calls. The WPF UI is needed only for service monitoring and settings control, so all services run simultaneously in the background, and most of them do similar work.
For this one I need to implement a strategy like this:
1. Start the worker on the thread pool.
2. The worker sends a request and processes the response from the website.
   2.1 If the response is processed and new data appears, raise an event.
   2.2 If the request fails, handle the error.
   2.3 If the error rate over the last x requests is too high, stop the worker.
3. Regardless of whether the last request completed, failed, or is still running, send another request.
4. Send each request no earlier than the configured delay, but as close to it as possible.
private Task _workTask = Task.CompletedTask;
private List<ScrapeParameters> _scrapeParams = new();
public event EventHandler<ScrapedEventArgs>? NewDataScraped;

//Can I run Worker like this?
public void RunWorker()
{
    if (_workTask.IsCompleted)
        _workTask = WorkAsync(_token);
}

private async Task WorkAsync(CancellationToken cancelToken)
{
    List<Task> processTasks = new();
    while (true)
    {
        if (cancelToken.IsCancellationRequested) return;
        //Delay could be from 0.5 second to any value
        var delayTask = Task.Delay(WorkerDelay);
        var completedTasks = processTasks.Where(t => t.IsCompleted);
        var setToHandle = new HashSet<Task>(completedTasks);
        foreach (var task in setToHandle)
        {
            //Theoretical logic to handle errors and completion
            if (task.IsFaulted)
                HandleFaultedTask(task);
            else
                CountCompleted();
            processTasks.Remove(task);
        }
        //Theoretical logic to obtain the desired parameters.
        var currParameters = GetParameters();
        processTasks.Add(ProcessAsync(currParameters, cancelToken));
        await delayTask;
    }
}

//This method usually takes around 2-4 seconds
private async Task ProcessAsync(ScrapeParameters parameters, CancellationToken cancelToken)
{
    //Some work with http requests
    var response = await Client.GetAsync(parameters.ToUri());
    ...
    //Processing response
    ...
    if (newData != null)
        NewDataScraped?.Invoke(this, new(newData));
}
Does my implementation match the TAP pattern?
In particular, I'd like feedback on RunWorker() and setToHandle.
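This thread has no answer, so the following is only a sketch of the strategy described above under stated assumptions: a new request starts on every tick whether or not earlier ones have finished, and finished requests are inspected on later ticks. FakeProcessAsync, Reap and RunWorkerAsync are hypothetical names; the fake request is deliberately slower than the tick, and every third call fails.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int calls = 0, succeeded = 0, failed = 0;

// Hypothetical stand-in for ProcessAsync: slower than the tick, every third call fails.
async Task FakeProcessAsync(CancellationToken ct)
{
    int n = Interlocked.Increment(ref calls);
    await Task.Delay(30, ct);
    if (n % 3 == 0)
        throw new InvalidOperationException($"request {n} failed");
}

// Inspects finished requests; ToList() snapshots them so the list can be
// modified inside the loop (the same job setToHandle does in the question).
void Reap(List<Task> inFlight)
{
    foreach (var task in inFlight.Where(t => t.IsCompleted).ToList())
    {
        if (task.IsFaulted)
        {
            failed++;
            // Reading Exception also marks the fault as observed.
            Console.WriteLine($"fault: {task.Exception?.InnerException?.Message}");
        }
        else
        {
            succeeded++;
        }
        inFlight.Remove(task);
    }
}

// Starts one request per tick without awaiting it, so a slow request never
// pushes back the next one; the tick's delay is started before any work.
async Task RunWorkerAsync(TimeSpan interval, int ticks, CancellationToken ct)
{
    var inFlight = new List<Task>();
    for (int i = 0; i < ticks && !ct.IsCancellationRequested; i++)
    {
        Task tick = Task.Delay(interval, ct);
        Reap(inFlight);
        inFlight.Add(FakeProcessAsync(ct));
        await tick;
    }
    // Drain the requests that are still running before returning.
    try { await Task.WhenAll(inFlight); }
    catch (InvalidOperationException) { /* faults are counted by Reap below */ }
    Reap(inFlight);
}

await RunWorkerAsync(TimeSpan.FromMilliseconds(10), 9, CancellationToken.None);
Console.WriteLine($"succeeded={succeeded}, failed={failed}");
```

Starting the tick's Task.Delay before reaping and launching work, as delayTask does in the question, keeps the period from drifting by the time spent on bookkeeping.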

Continue async method execution after the main method returns: best practices?

I have a long-running process whose result the client doesn't need immediately. I tried the code below without success:
[HttpPost, Route("run")]
public async Task Run()
{
    _ = this.LongProcess().ConfigureAwait(false);
    return await Task.CompletedTask;
}
The service still waits until LongProcess finishes before returning to the client.
How can I make the Run method return to the client promptly?
You need a basic distributed architecture. I recommend:
A durable storage system, such as Azure Queues or AWS Simple Queue Service.
An independent processor, such as Azure Functions or AWS Lambdas.
Then, your API enqueues the message and returns:
[HttpPost, Route("run")]
public async Task Run()
{
    var message = new Message();
    await _queueService.EnqueueAsync(message);
    return;
}
and the independent processor dequeues messages and handles them:
async Task HandleMessage(Message message)
{
    await LongProcess().ConfigureAwait(false);
}
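To see the enqueue-and-return shape in one process, the sketch below swaps in an in-memory System.Threading.Channels channel for the queue. This is a stand-in only: unlike Azure Queues or SQS it is not durable, so queued work is lost if the process stops. RunEndpointAsync and ProcessQueueAsync are hypothetical names, and Task.Delay stands in for LongProcess().

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// In-memory stand-in for the durable queue. Unlike Azure Queues or SQS,
// anything still queued is lost if the process stops.
var queue = Channel.CreateUnbounded<string>();
int handled = 0;

// Plays the role of the independent processor: dequeue and run the long process.
async Task ProcessQueueAsync()
{
    await foreach (string message in queue.Reader.ReadAllAsync())
    {
        await Task.Delay(200); // hypothetical stand-in for LongProcess()
        handled++;
        Console.WriteLine($"handled {message}");
    }
}

// Plays the role of the Run endpoint: enqueue and return without waiting.
async Task<string> RunEndpointAsync(string id)
{
    await queue.Writer.WriteAsync(id);
    return "accepted";
}

Task processor = ProcessQueueAsync();
string status = await RunEndpointAsync("job-1");
int handledWhenReturned = handled; // the endpoint returned before the work ran
Console.WriteLine($"endpoint returned '{status}', handled so far: {handledWhenReturned}");

await RunEndpointAsync("job-2");
queue.Writer.Complete(); // no more messages; lets ReadAllAsync finish
await processor;         // wait for the queued work to drain
Console.WriteLine($"handled in total: {handled}");
```

The endpoint's latency is only the cost of the enqueue; the durable-queue version in the answer adds what this sketch lacks, namely surviving restarts and running the processor on separate infrastructure.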

.NET5 Background Service Concurrency

I have a background service that starts when the application starts up. It creates multiple tasks based on the configured number of workers. While running various trials and monitoring the open connections on the DB, I found that the number of open connections always equals the number of workers: if I set 32 workers, the connection check always shows 32 open connections. FYI, I am using Postgres as the DB server. To check the open connections while the application is running, I use the query below:
select * from pg_stat_activity where application_name = 'myapplication';
Below is the background service code.
public class MessagingService : BackgroundService {
    private int worker = 32;

    protected override async Task ExecuteAsync(CancellationToken cancellationToken) {
        var tasks = new List<Task>();
        for (int i = 0; i < worker; i++) {
            tasks.Add(DoJob(cancellationToken));
        }
        while (!cancellationToken.IsCancellationRequested) {
            try {
                var completed = await Task.WhenAny(tasks);
                tasks.Remove(completed);
            } catch (Exception) {
                await Task.Delay(1000, cancellationToken);
            }
            if (!cancellationToken.IsCancellationRequested) {
                tasks.Add(DoJob(cancellationToken));
            }
        }
    }

    private async Task DoJob(CancellationToken cancellationToken) {
        using (var scope = _services.CreateScope()) {
            var service = scope.ServiceProvider
                .GetRequiredService<MessageService>();
            try {
                //do select and update query on db if null return false otherwise send mail
                if (!await service.Run(cancellationToken)) {
                    await Task.Delay(1000, cancellationToken);
                }
            } catch (Exception) {
                await Task.Delay(1000, cancellationToken);
            }
        }
    }
}
The workflow is not right: it keeps creating tasks and leaves the connections open and idle. CPU and memory usage are also high while those tasks run. How can I make it so that only 1 worker runs when no records are found in the DB? If one or more records are found, the number of workers should increase up to the preset maximum, and then decrease again when there are fewer records than the maximum number of workers. If this question is too vague or opinion-based, please let me know and I will try my best to make it more specific.
Update: purpose of the service
The purpose of this service is email delivery. Another API is used to create scheduled jobs. Once a job is added to the DB, this service delivers the email at the scheduled time. For example, 5k scheduled jobs are added to the DB at '2021-12-31 00:00:00' with a scheduled time of '2021-12-31 08:00:00'. The service keeps looping from 00:00:00 until 08:00:00 with 32 workers running at the same time, and only then starts delivering the emails. How can I make it more efficient, so that only 1 worker runs when no job is scheduled, all the workers are used when it finds 5k scheduled jobs, and it goes back to 1 worker once those 5k jobs are completed?
My suggestion is to spare yourself the burden of manually creating and maintaining worker tasks by using an ActionBlock<T> from the TPL Dataflow library (the System.Threading.Tasks.Dataflow package). This component combines an input queue with a delegate: an Action<T>, or, as in the example below, an async Func<T, Task>. You specify the delegate in its constructor, and you feed it messages with its Post method. The component invokes the delegate for each message it receives, with the specified degree of parallelism. When there are no more messages to send, you notify it by invoking its Complete method, and then await its Completion so that you know all the work delegated to it has finished.
Below is a rough demonstration of how you could use this component:
protected override async Task ExecuteAsync(CancellationToken cancellationToken)
{
    var processor = new ActionBlock<Job>(async job =>
    {
        await ProcessJob(job);
        await MarkJobAsCompleted(job);
    }, new ExecutionDataflowBlockOptions()
    {
        MaxDegreeOfParallelism = 32
    });
    try
    {
        while (true)
        {
            Task delayTask = Task.Delay(TimeSpan.FromSeconds(60), cancellationToken);
            Job[] jobs = await FetchReadyToProcessJobs();
            foreach (var job in jobs)
            {
                await MarkJobAsPending(job);
                processor.Post(job);
            }
            await delayTask; // Will throw when the token is canceled
        }
    }
    finally
    {
        processor.Complete();
        await processor.Completion;
    }
}
The FetchReadyToProcessJobs method is supposed to connect to the database and fetch all the jobs whose time has come to be processed. In the above example this method is invoked every 60 seconds. The Task.Delay is created before invoking the method, and awaited after the returned jobs have been posted to the ActionBlock<T>. This way the interval between invocations stays stable and consistent, as long as fetching and posting a batch takes less than 60 seconds.
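The throttling half of the idea can also be seen with built-in types only. The sketch below swaps in a SemaphoreSlim to cap concurrency at a limit, similar in effect to MaxDegreeOfParallelism; it is a substitute technique, not how ActionBlock<T> works internally, and unlike ActionBlock it does not buffer messages (each job is a task waiting on the semaphore). HandleJobAsync is a hypothetical stand-in for ProcessJob plus MarkJobAsCompleted, and the limit of 4 is arbitrary.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int MaxDegreeOfParallelism = 4;
var throttle = new SemaphoreSlim(MaxDegreeOfParallelism);
var gate = new object();
int running = 0, peak = 0, processed = 0;

// Hypothetical job handler standing in for ProcessJob + MarkJobAsCompleted.
async Task HandleJobAsync(int job)
{
    await throttle.WaitAsync(); // at most 4 jobs get past this line at once
    try
    {
        lock (gate) { running++; peak = Math.Max(peak, running); }
        await Task.Delay(20);   // simulated I/O; the job ID is unused here
        lock (gate) { processed++; }
    }
    finally
    {
        lock (gate) { running--; }
        throttle.Release();
    }
}

// One fetched batch. An empty batch starts nothing, so no idle workers hold connections.
int[] jobs = Enumerable.Range(1, 20).ToArray();
await Task.WhenAll(jobs.Select(job => HandleJobAsync(job)));
Console.WriteLine($"processed={processed}, peak concurrency={peak}");
```

As in the ActionBlock version, work only exists while there are jobs to process, which addresses the question's concern about 32 workers idling against the database.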

What is best practice for calling async method from Quartz.NET Job Execute?

I'm converting synchronous Quartz.NET 2.0 jobs to the new async Quartz.NET 3.0 framework, and I was curious what the best practice is for calling another async method whose result you need.
I'm using a package called CliWrap to interact with command-line executables. I use its buffered option, which captures the stdout and stderr streams into a buffer that you can then inspect.
My question, then: is it better to have the Quartz job block on the result of the CliWrap call (Option 1 below), or to make the job async as well and assign a JobListener to grab the buffered result when the job completes (Option 2 below)? Thanks
Option 1
public Task Execute(IJobExecutionContext context) {
    MyJobDetails jobDetails = context.MergedJobDataMap["MyJobDetails"] as MyJobDetails;
    var result = Cli.Wrap(jobDetails.ExecPath)
        .WithArguments(jobDetails.Arguments)
        .ExecuteBufferedAsync();
    var r = result.GetAwaiter().GetResult();
    //do whatever with output
    string stdout = r.StandardOutput;
    return result;
}
Option 2
public async Task Execute(IJobExecutionContext context) {
    MyJobDetails jobDetails = context.MergedJobDataMap["MyJobDetails"] as MyJobDetails;
    var result = await Cli.Wrap(jobDetails.ExecPath)
        .WithArguments(jobDetails.Arguments)
        .ExecuteBufferedAsync();
    //set the result in the context
    context.Result = result;
}

public class SimpleListener : IJobListener {
    // Name, JobToBeExecuted and JobExecutionVetoed omitted for brevity
    public Task JobWasExecuted(IJobExecutionContext context, JobExecutionException jobException, CancellationToken cancellationToken = default(CancellationToken)) {
        var result = (BufferedCommandResult)context.Result;
        //do whatever with output
        string stdout = result.StandardOutput;
        return Task.CompletedTask;
    }
}
You should almost never use GetAwaiter().GetResult(). It blocks the thread until the task completes, which defeats the whole purpose of async and await.
You should go with Option 2.
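With Option 2 the listener is optional: because Execute is async, the awaited output is available on the very next line of the job. The sketch below uses hypothetical RunToolBufferedAsync and ExecuteJobAsync methods in place of the real CliWrap and Quartz types, so it runs without either package; only the await-then-use shape carries over.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for Cli.Wrap(...).ExecuteBufferedAsync(): completes later with captured stdout.
async Task<string> RunToolBufferedAsync(string arguments)
{
    await Task.Delay(50);
    return $"tool ran with {arguments}";
}

// Shape of an async job body: await the command, then use its output right here.
async Task<string> ExecuteJobAsync(string arguments)
{
    string stdout = await RunToolBufferedAsync(arguments); // the thread is released while waiting
    return stdout.ToUpperInvariant();                      // "do whatever with output"
}

string output = await ExecuteJobAsync("--version");
Console.WriteLine(output);
```

A JobListener still makes sense when the result must be handled outside the job itself, for example by shared logging or chaining logic.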

Calling AWS RDS CreateDBSnapshotAsync Asynchronously "Set It And Forget It"

In an AWS Lambda function, I would like to call a component to create an RDS DB snapshot. The client has an async method named CreateDBSnapshotAsync. But because this is AWS Lambda, I only have 5 minutes to complete the task, so if I await it, the Lambda function times out. And apparently, when it times out, the call is cancelled and the snapshot is not completed.
Is there some way I can make the call in a completely asynchronous way, so that once I invoke it, it will complete whether or not my Lambda function times out?
In other words, I don't care about the result; I just want to invoke the process and move on, a "set it and forget it" mentality.
My call (without the await, obviously) is below:
using (var rdsClient = new AmazonRDSClient())
{
    Task<CreateDBSnapshotResponse> response = rdsClient.CreateDBSnapshotAsync(new CreateDBSnapshotRequest($"MySnapShot", instanceId));
}
As requested, here's the full method:
public async Task<CloudFormationResponse> MigrateDatabase(CloudFormationRequest request, ILambdaContext context)
{
    LambdaLogger.Log($"{nameof(MigrateDatabase)} invoked: " + JsonConvert.SerializeObject(request));
    if (request.RequestType != "Delete")
    {
        try
        {
            var migrations = this.Context.Database.GetPendingMigrations().OrderBy(b => b).ToList();
            for (int i = 0; i < migrations.Count(); i++)
            {
                string thisMigration = migrations[i];
                this.ApplyMigrationInternal(thisMigration);
            }
            this.TakeSnapshotAsync(context, migrations.Last());
            return await CloudFormationResponse.CompleteCloudFormationResponse(null, request, context);
        }
        catch (Exception e)
        {
            LambdaLogger.Log(e.ToString());
            if (e.InnerException != null) LambdaLogger.Log(e.InnerException.ToString());
            return await CloudFormationResponse.CompleteCloudFormationResponse(e, request, context);
        }
    }
    return await CloudFormationResponse.CompleteCloudFormationResponse(null, request, context);
}
internal void TakeSnapshotAsync(ILambdaContext context, string migration)
{
    var instanceId = this.GetEnvironmentVariable(nameof(DBInstance));
    using (var rdsClient = new AmazonRDSClient())
    {
        Task<CreateDBSnapshotResponse> response = rdsClient.CreateDBSnapshotAsync(new CreateDBSnapshotRequest($"{instanceId}{migration.Replace('_','-')}", instanceId));
        while (context.RemainingTime > TimeSpan.FromSeconds(15))
        {
            Thread.Sleep(15000);
        }
    }
}
First, refactor that helper to use proper async syntax along with Task.WhenAny:
internal async Task TakeSnapshotAsync(ILambdaContext context, string migration) {
    var instanceId = this.GetEnvironmentVariable(nameof(DBInstance));
    //don't wrap in using block or it will be disposed before you are done with it.
    var rdsClient = new AmazonRDSClient();
    var request = new CreateDBSnapshotRequest($"{instanceId}{migration.Replace('_','-')}", instanceId);
    //don't await this long running task
    Task<CreateDBSnapshotResponse> response = rdsClient.CreateDBSnapshotAsync(request);
    Task delay = Task.Run(async () => {
        while (context.RemainingTime > TimeSpan.FromSeconds(15)) {
            await Task.Delay(15000); //Don't use Thread.Sleep here; use Task.Delay and await it.
        }
    });
    // The call returns as soon as the first operation completes,
    // even if the others are still running.
    Task completed = await Task.WhenAny(response, delay);
    // If the snapshot call finished first, await it so any RDS exception is rethrown here.
    if (completed == response) await response;
}
So if RemainingTime runs out, the method returns even if the snapshot task is still running, so that the request does not time out.
Now you can await the snapshot while there is still time available in the context:
public async Task<CloudFormationResponse> MigrateDatabase(CloudFormationRequest request, ILambdaContext context) {
    LambdaLogger.Log($"{nameof(MigrateDatabase)} invoked: " + JsonConvert.SerializeObject(request));
    if (request.RequestType != "Delete") {
        try {
            var migrations = this.Context.Database.GetPendingMigrations().OrderBy(b => b).ToList();
            for (int i = 0; i < migrations.Count(); i++) {
                string thisMigration = migrations[i];
                this.ApplyMigrationInternal(thisMigration);
            }
            await this.TakeSnapshotAsync(context, migrations.Last());
            return await CloudFormationResponse.CompleteCloudFormationResponse(null, request, context);
        } catch (Exception e) {
            LambdaLogger.Log(e.ToString());
            if (e.InnerException != null) LambdaLogger.Log(e.InnerException.ToString());
            return await CloudFormationResponse.CompleteCloudFormationResponse(e, request, context);
        }
    }
    return await CloudFormationResponse.CompleteCloudFormationResponse(null, request, context);
}
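Note that Task.WhenAny itself never throws. Exceptions thrown by the RDS client surface only when the snapshot task is awaited after it has finished (for example, if (completed == response) await response;), which lets the catch block in MigrateDatabase log them and helps with troubleshooting.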
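The race-against-a-deadline mechanics can be tried without AWS. In the sketch below, WaitUpToAsync is a hypothetical helper and Task.Delay/Task.FromException stand in for the RDS call; it shows that losing the race does not cancel the operation, and that a faulted operation only throws once it is awaited. Keep in mind that Lambda may freeze the execution environment after the handler returns, so an un-awaited request is not guaranteed to finish; the sketch only illustrates the task behavior.

```csharp
using System;
using System.Threading.Tasks;

// Waits for `operation` for at most `budget`. If the budget runs out first,
// the operation is NOT cancelled; the caller simply stops waiting for it.
async Task<bool> WaitUpToAsync(Task operation, TimeSpan budget)
{
    Task first = await Task.WhenAny(operation, Task.Delay(budget));
    if (first != operation)
        return false;
    await operation; // WhenAny never throws; awaiting the finished task rethrows its exception
    return true;
}

// Hypothetical stand-in for a slow CreateDBSnapshotAsync call.
Task slow = Task.Delay(TimeSpan.FromSeconds(2));
bool finishedInTime = await WaitUpToAsync(slow, TimeSpan.FromMilliseconds(100));
Console.WriteLine($"finished in time: {finishedInTime}, still running: {!slow.IsCompleted}");

// Hypothetical stand-in for a call the service rejects.
Task rejected = Task.FromException(new InvalidOperationException("snapshot request rejected"));
string error = "";
try
{
    await WaitUpToAsync(rejected, TimeSpan.FromSeconds(1));
}
catch (InvalidOperationException ex)
{
    error = ex.Message;
}
Console.WriteLine($"caught: {error}");
```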
Some relevant information from the documentation:
Using Async in C# Functions with AWS Lambda
If you know your Lambda function will require a long-running process, such as uploading large files to Amazon S3 or reading a large stream of records from DynamoDB, you can take advantage of the async/await pattern. When you use this signature, Lambda executes the function synchronously and waits for the function to return a response or for execution to time out.
From the documentation on timeouts:
Function Settings
...
Timeout – The amount of time that Lambda allows a function to run before stopping it. The default is 3 seconds. The maximum allowed value is 900 seconds.
If you are getting an HTTP timeout, shorten the delay but still leave the long-running task un-awaited. Task.WhenAny still gives the long-running task an opportunity to finish first, even if that is not expected.
internal async Task TakeSnapshotAsync(ILambdaContext context, string migration) {
    var instanceId = this.GetEnvironmentVariable(nameof(DBInstance));
    //don't wrap in using block or it will be disposed before you are done with it.
    var rdsClient = new AmazonRDSClient();
    var request = new CreateDBSnapshotRequest($"{instanceId}{migration.Replace('_','-')}", instanceId);
    //don't await this long running task
    Task<CreateDBSnapshotResponse> response = rdsClient.CreateDBSnapshotAsync(request);
    Task delay = Task.Delay(TimeSpan.FromSeconds(2.5));
    // The call returns as soon as the first operation completes,
    // even if the others are still running.
    Task completed = await Task.WhenAny(response, delay);
    // If the snapshot call finished first, await it so any RDS exception is rethrown here.
    if (completed == response) await response;
}
