Azure Functions: How to manage Durable Functions with Blob Triggers?

Imagine that I have a storage account with a blob container that occasionally receives file uploads.
I want to process each file that lands in blob storage: open it, extract information, and store it. Definitely an expensive operation that could fit a Durable Functions scenario.
Here's the trigger:
[FunctionName("PayrollFileTrigger")]
public static async Task Start(
[BlobTrigger("files/{name}", Connection = "AzureWebJobsStorage")]Stream myBlob, string name,
[DurableClient] IDurableOrchestrationClient starter,
ILogger log)
{
string instanceId = await starter.StartNewAsync("PayrollFile_StartFunction", "payroll_file", name);
}
...which calls the orchestration:
[FunctionName("PayrollFile_StartFunction")]
public async static Task<IActionResult> Run(
[OrchestrationTrigger] IDurableOrchestrationContext context, string blobName,
ExecutionContext executionContext, ILogger log)
{
//Downloads the blob
string filePath =
await context.CallActivityWithRetryAsync<string>("DownloadPayrollBlob", options, blobName);
if (filePath == null) return ErrorResult(ERROR_MSG_1, log);
//Extract data
var payroll =
await context.CallActivityWithRetryAsync<Payroll>("ExtractBlobData", options, filePath);
... and so on (just a sample here) ...
}
But there is a problem. While testing, this error occurs, which I think means I can't start another orchestration with the same ID:
An Orchestration instance with the status Pending already exists.
1 - So if I push many files in a short period of time to the container the trigger is "listening" to, will the orchestration get busy with one of them and ignore the other events?
2 - When does the orchestration get rid of the pending status? Does that happen automatically?
3 - Should I create a new orchestration instance for each file to be processed? I know you can omit the instanceId parameter so it gets generated randomly and never conflicts with one already started. But is that safe to do? How do I manage the instances and ensure they finish at some point?

string instanceId = await starter.StartNewAsync("PayrollFile_StartFunction", "payroll_file", name);
The second argument is the instanceId, which is required to be unique.
Instead, try:
string instanceId = await starter.StartNewAsync("PayrollFile_StartFunction", input: name);

Depending on your requirements, you might want only one durable instance per file. Microsoft states that you should:
Use a random identifier for the instance ID. Random instance IDs help ensure an equal load distribution when you're scaling orchestrator functions across multiple VMs. The proper time to use non-random instance IDs is when the ID must come from an external source, or when you're implementing the singleton orchestrator pattern.
https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-instance-management?tabs=csharp#start-instances
In your specific case I'd say you can go without supplying the instanceId yourself and perhaps log the generated instanceId or write it in a storage solution alongside information about the file that started the orchestration.
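A minimal sketch of the trigger with a runtime-generated instanceId (the log line and the idea of persisting the ID are illustrative, not from the original post):

[FunctionName("PayrollFileTrigger")]
public static async Task Start(
    [BlobTrigger("files/{name}", Connection = "AzureWebJobsStorage")] Stream myBlob, string name,
    [DurableClient] IDurableOrchestrationClient starter,
    ILogger log)
{
    // No instanceId argument: the runtime generates a random, collision-free ID,
    // so every uploaded blob gets its own orchestration instance.
    string instanceId = await starter.StartNewAsync("PayrollFile_StartFunction", input: name);

    // Keep a trace of which instance handles which file; the ID can later be used
    // with starter.GetStatusAsync(instanceId) to check progress.
    log.LogInformation("Started orchestration {InstanceId} for blob {BlobName}.", instanceId, name);
}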

Related

Azure Functions - get parameter value in middleware

I am running a v4 Azure Function in an isolated process. It is triggered by a message coming from a Service Bus queue. I have created a simple middleware and would like to get my hands on the incoming data (a simple string). How can I do that from the middleware itself? It doesn't seem FunctionContext is of use in this case.
public class SimpleMiddleware : IFunctionsWorkerMiddleware
{
    public async Task Invoke(FunctionContext context, FunctionExecutionDelegate next)
    {
        await next(context);
    }
}
The Service Bus message data and metadata are extracted and placed into the BindingData dictionary. Try looking at context.BindingContext.BindingData and find the key that exposes your message's data.
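For example, a minimal sketch that dumps the dictionary so you can spot the right key (the exact key name depends on the trigger):

// Requires Microsoft.Azure.Functions.Worker, Microsoft.Azure.Functions.Worker.Middleware,
// and Microsoft.Extensions.Logging; the middleware is resolved from DI.
public class SimpleMiddleware : IFunctionsWorkerMiddleware
{
    private readonly ILogger<SimpleMiddleware> _logger;

    public SimpleMiddleware(ILogger<SimpleMiddleware> logger) => _logger = logger;

    public async Task Invoke(FunctionContext context, FunctionExecutionDelegate next)
    {
        // BindingData exposes the trigger's data and metadata as key/value pairs.
        foreach (var pair in context.BindingContext.BindingData)
        {
            _logger.LogInformation("Binding data: {Key} = {Value}", pair.Key, pair.Value);
        }
        await next(context);
    }
}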

Run heavy task that uses dbContext on the API background without await

My C# .NET 6 API project has a reporting requirement: convert any query or class to a CSV file.
The way I've broken the task down is as follows:
In a [POST] endpoint named export, create the CSV file from a query and upload it to blob storage, without making the user wait for the task to finish.
Once the controller gets the request, start the task and return 200 immediately.
Later on, the front end will make a GET request asking for the document; if the document is done, return the document URL.
This is the endpoint code that I have so far:
[HttpPost("export")]
public virtual async Task<IActionResult> Export([FromQuery] UrlRequestBase? urlRequestBase,
[FromBody] BodyRequestBase? bodyRequestBase)
{
object? response;
int status = 200;
try
{
await urlRequestBase.Parse(this);
await bodyRequestBase.Parse();
//Run the export creation in another thread
Task.Run(() => _repositoryBase.CreateExport(urlRequestBase, bodyRequestBase));
return StatusCode(status);
}
catch (ExceptionBase ex)
{
return StatusCode(ex.CodeResult, ex.CreateResponseFromException());
}
}
The problem is that when I try to run a query inside the repository, the dbContext is disposed because of its lifetime in the DI container, so I get the following error:
Cannot access a disposed context instance. A common cause of this error is disposing a
context instance that was resolved from dependency injection and then later trying to use
the same context instance elsewhere in your application. This may occur if you are calling
'Dispose' on the context instance, or wrapping it in a using statement. If you are using
dependency injection, you should let the dependency injection container take care of
disposing context instances.
It only works when I add the await operator, but the intent is not to wait this time.
How can I run this type of heavy task without the await operator and still use the dbContext?
Is there a better way to do it?
You can use the HostingEnvironment.QueueBackgroundWorkItem functionality of ASP.NET.
There are lots of docs on that, but here's the one that looks good to me:
https://www.c-sharpcorner.com/article/run-background-task-using-hostingenvironment-queuebackgroundworkitem-net-framew/
And steps that might work:
Design your worker class with public properties for the query and for a task ID.
Generate a unique task ID somehow, e.g., a GUID or a counter.
'New up' your worker BEFORE the call to QueueBackgroundWorkItem so you can set the task ID and query.
Queue the item for work.
Insert the unique task ID into a table (or equivalent) with a blank URL and/or status or progress fields.
Return the task ID to the client and end the web request, so the client can save it and later ask for the correct document.
Worker logic:
The worker runs the query and stores the results in blob storage.
The worker updates the row for its task ID with status/progress and fills in the URL.
Now client behavior:
The client calls a different web API method with the task ID.
The server reads the data table for that ID and reports 'not found', 'in progress', 'errored', 'done - here's the URL', etc.
While I was typing, 'tia' made a comment about a similar service. I don't think it's exactly the same thing, and if it isn't, you could use the same design as these bullets, as it appears to offer similar functionality.
Study the docs and see which might be better for you; ASP.NET has some cool toys, as you can see!
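One caveat: HostingEnvironment.QueueBackgroundWorkItem lives in System.Web and is .NET Framework-only, while the question targets .NET 6. A rough sketch of the same design on ASP.NET Core, using a Channel-backed BackgroundService (all type names here are illustrative assumptions, not from the original post):

// using System.Threading.Channels; using Microsoft.Extensions.DependencyInjection;
// using Microsoft.Extensions.Hosting;
// Register with: services.AddSingleton<ExportQueue>(); services.AddHostedService<ExportWorker>();
public class ExportQueue
{
    private readonly Channel<Func<IServiceProvider, CancellationToken, Task>> _channel =
        Channel.CreateUnbounded<Func<IServiceProvider, CancellationToken, Task>>();

    public void Enqueue(Func<IServiceProvider, CancellationToken, Task> workItem) =>
        _channel.Writer.TryWrite(workItem);

    public IAsyncEnumerable<Func<IServiceProvider, CancellationToken, Task>> ReadAllAsync(CancellationToken ct) =>
        _channel.Reader.ReadAllAsync(ct);
}

public class ExportWorker : BackgroundService
{
    private readonly ExportQueue _queue;
    private readonly IServiceProvider _services;

    public ExportWorker(ExportQueue queue, IServiceProvider services) =>
        (_queue, _services) = (queue, services);

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await foreach (var workItem in _queue.ReadAllAsync(stoppingToken))
        {
            // A fresh DI scope per job, so the work item gets its own DbContext
            // instead of the request-scoped one that is disposed after the response.
            using var scope = _services.CreateScope();
            await workItem(scope.ServiceProvider, stoppingToken);
        }
    }
}

The controller would then enqueue a delegate that resolves the repository from the scoped provider, instead of calling Task.Run on the request-scoped instance.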
The FastAI response was very helpful, but unfortunately I had almost no time to implement it.
DavidG recommended that I use Hangfire, and now my issue is finally solved. Hangfire stores the call and executes it later on its own worker, resolving dependencies in a fresh DI scope, so the disposed request-scoped dbContext is no longer involved:
[HttpPost("export")]
public virtual async Task<IActionResult> Export([FromQuery] UrlRequestBase? urlRequestBase,
[FromBody] BodyRequestBase? bodyRequestBase)
{
object? response;
int status = 200;
try
{
await urlRequestBase.Parse(this);
await bodyRequestBase.Parse();
//HangFire solution
_backgroundJobClient.Enqueue(() => _repositoryBase.CreateExport(urlRequestBase, bodyRequestBase));
return StatusCode(status);
}
catch (ExceptionBase ex)
{
return StatusCode(ex.CodeResult, ex.CreateResponseFromException());
}
return StatusCode(status, await ResponseBase.Response(response));
}
Thank you, for taking the time to help me!

Access Storage Account Connection String Within Function Body

I have a queue trigger (for example):
[FunctionName("Handle-Invoice")]
[StorageAccount("StorageAccountConnectionString")]
public async Task InvoiceQueueTrigger(
[QueueTrigger("invoice-queue")] CloudQueueMessage cloudMessage,
ExecutionContext context,
ILogger log)
{
// Async code which might need access to the storage account for the queue
}
Where StorageAccountConnectionString is defined in the function's config.
I'm wondering if there's a way to implicitly access the value of the connection string defined in the StorageAccount attribute?
I know I can access the value directly from either the environment or configuration but it'd be good to access the value either by a binding or some other method.
Thanks.
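For reference, the "directly from the environment" route mentioned above looks like this sketch (the setting name matches the StorageAccount attribute; the queue client construction is illustrative):

// Reads the same app setting the [StorageAccount] attribute resolves.
string connectionString = Environment.GetEnvironmentVariable("StorageAccountConnectionString");

// It can then be used to build a queue client manually with the classic storage SDK:
var account = CloudStorageAccount.Parse(connectionString);
var queue = account.CreateCloudQueueClient().GetQueueReference("invoice-queue");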

Can I run a Azure function once on a specific date triggered by my asp.net mvc site?

I know I can create a scheduled Azure Function that runs on a schedule. But what I want is to run an Azure Function once, at a specific date/time I pass it, along with some data parameters.
Ex. I'm scheduling classes on my site and I want to email all students when a class is over. So when a class is created for Monday, October 9th at 4:00 PM, I want to send a message that triggers my Azure Function the same day, but one hour later at 5:00 PM. And I want to pass it some info like the class ID. I also want to be able to remove this queued trigger if the class is canceled.
Is this possible with Azure and my ASP.NET MVC site?
Another potential way to achieve this would be to use Durable Functions. I adapted this solution from Pattern #5 (human interaction), since you have potential human interaction in the form of a class cancellation.
Below is an untested rough framework of what your orchestrator function would look like. Your email logic would go in a function called SendEmail that uses an ActivityTrigger, and you could add new classes and cancel them by using these APIs.
public static async Task Run(DurableOrchestrationContext ctx)
{
    var classInfo = ctx.GetInput<ClassInfo>();
    var targetDateTime = DateTime.Parse(classInfo.ClassEndDateString);
    var maxTimeSpan = new TimeSpan(96, 0, 0);
    using (var timeoutCts = new CancellationTokenSource())
    {
        while (true)
        {
            TimeSpan timeLeft = targetDateTime.Subtract(ctx.CurrentUtcDateTime);
            if (timeLeft <= TimeSpan.Zero)
            {
                break;
            }
            DateTime checkTime;
            if (timeLeft > maxTimeSpan)
            {
                checkTime = ctx.CurrentUtcDateTime.Add(maxTimeSpan);
            }
            else
            {
                checkTime = ctx.CurrentUtcDateTime.Add(timeLeft);
            }
            Task durableTimeout = ctx.CreateTimer(checkTime, timeoutCts.Token);
            Task<bool> cancellationEvent = ctx.WaitForExternalEvent<bool>("Cancellation");
            if (cancellationEvent == await Task.WhenAny(cancellationEvent, durableTimeout))
            {
                timeoutCts.Cancel();
                return;
            }
        }
        await ctx.CallActivityAsync("SendEmail", classInfo.ClassData);
    }
}

public class ClassInfo
{
    public string ClassEndDateString { get; set; }
    public string ClassData { get; set; }
}
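To trigger the cancellation branch from outside, the caller raises the "Cancellation" event through the instance-management API, along the lines of this sketch (the HTTP wrapper and query parameter are illustrative; instanceId is assumed to have been saved when the class was scheduled):

[FunctionName("CancelClassEmail")]
public static async Task Cancel(
    [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestMessage req,
    [OrchestrationClient] DurableOrchestrationClient client)
{
    // Look up the orchestration instance that was stored for this class.
    string instanceId = req.RequestUri.ParseQueryString()["instanceId"];

    // Raises the external event the orchestrator awaits via WaitForExternalEvent.
    await client.RaiseEventAsync(instanceId, "Cancellation", true);
}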
My classes might be scheduled upto 30 days ahead of the current date.
Per my understanding, you could also leverage scheduled messages from Azure Service Bus and set the ScheduledEnqueueTimeUtc property on your message. For more details about triggering on the Service Bus queue message, you could follow Azure Functions Service Bus bindings. For sending the message, you could install WindowsAzure.ServiceBus in your MVC application and use QueueClient.ScheduleMessageAsync to send a scheduled message and QueueClient.CancelScheduledMessageAsync to cancel it. Moreover, you could follow the code snippet in this issue.
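A minimal sketch of that send/cancel flow with the WindowsAzure.ServiceBus package (queue name and payload are illustrative):

// using Microsoft.ServiceBus.Messaging;
var client = QueueClient.CreateFromConnectionString(connectionString, "class-emails");

// Schedule delivery for one hour after the class ends.
var message = new BrokeredMessage(classId.ToString());
long sequenceNumber = await client.ScheduleMessageAsync(message, classEndTimeUtc.AddHours(1));

// Persist sequenceNumber alongside the class; if the class is canceled later:
await client.CancelScheduledMessageAsync(sequenceNumber);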
You should be using the queue trigger for this. You can have your MVC app add CloudQueueMessage objects to an Azure Storage queue. These messages can contain the data parameters (i.e. student list, professor name, etc.) that are unique to the class, and you can delay their visibility in the queue with the visibilitytimeout parameter when you perform the "Put Message".
You can calculate the visibilitytimeout by subtracting the current time from the desired send time. Note that the timeout cannot be longer than 7 days, so if these events are queued far in advance, you may be forced to reinsert the message into the queue repeatedly until you get within the 7-day range and can successfully process the message.
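A sketch of the put-message call with a visibility delay (classic storage SDK; queue name and payload are illustrative):

// using Microsoft.Azure.Storage; using Microsoft.Azure.Storage.Queue;
var account = CloudStorageAccount.Parse(connectionString);
var queue = account.CreateCloudQueueClient().GetQueueReference("class-emails");

// Hide the message until the desired send time (the delay must stay under 7 days).
TimeSpan delay = desiredSendTimeUtc - DateTime.UtcNow;
await queue.AddMessageAsync(new CloudQueueMessage(classId.ToString()),
    timeToLive: null, initialVisibilityDelay: delay, options: null, operationContext: null);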

Approach for tying all NLog logs back to the original request within WebAPI?

I am working to build an API using WebAPI, and have been using NLog for logging throughout the stack. My API solution has two main projects including:
The website layer itself that implements the controllers and webapi stuff
A service layer that implements "async" commands and handlers in a CQRS-like fashion
What I'm trying to achieve is to automatically generate a unique ID that I can attach to log statements so that any logs written while servicing a single request, no matter what layer they came from, can be linked back to that original request. I'd also like this to work without passing the unique ID around, or having the log statements themselves be concerned with including it in their calls.
With that goal in mind I started looking into writing a custom delegating handler to intercept each request (following this post for guidance) and add a unique ID as a property within NLog. I ended up with the following:
/// <summary>
/// This class is a WebAPI message handler that helps establish the data and operations needed
/// to associate log statements through the entire stack back to the originating request.
///
/// Help from here: http://weblogs.asp.net/fredriknormen/log-message-request-and-response-in-asp-net-webapi
/// </summary>
public class InitializeLoggingMessageHandler : DelegatingHandler
{
    private ILogger _logger;

    // The logger is injected with Autofac
    public InitializeLoggingMessageHandler(ILogger logger)
    {
        _logger = logger;
    }

    protected async override System.Threading.Tasks.Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, System.Threading.CancellationToken cancellationToken)
    {
        // Get a unique ID for this request
        var uniqueId = Guid.NewGuid().ToString();

        // Now that we have a unique ID for this request, we add it to NLog's MDC as a property
        // we can use in the log layouts. We do NOT use the global diagnostic context because
        // WebAPI is multi-threaded, and we want the ID to be scoped to just the thread servicing
        // this request.
        NLog.MappedDiagnosticsContext.Set("UniqueId", uniqueId);

        // Capture some details about the request for logging
        var requestInfo = string.Format("{0} {1}", request.Method, request.RequestUri);
        var requestMessage = await request.Content.ReadAsByteArrayAsync();
        _logger.Info("Request: {0} - {1}", requestInfo, Encoding.UTF8.GetString(requestMessage));

        var response = await base.SendAsync(request, cancellationToken);
        return response;
    }
}
With this code I can then use the unique ID in log layouts like so:
<target xsi:type="Debugger" name="DebugLogger"
layout="${longdate} ${logger} ${mdc:item=UniqueId} ${message}" />
The problem with this approach is that I'm using NLog's MappedDiagnosticsContext to try to save the unique ID as a property that can be used within layouts (so my code doing the logging doesn't need to know about it). This is a thread-local mechanism for storing values, so it breaks down with async code, since the thread that starts a request may not be the one that executes all of it.
So what happens is the first log messages have the unique ID included, but the ones later on could be missing it since they're on a different thread and can't access the value. I also can't use the GlobalDiagnosticsContext within NLog because it's truly global, so multiple requests in WebAPI would easily overwrite the unique ID, and the data would be useless.
So with the goal of associating all log messages back to the request that originated within WebAPI, is there another mechanism that I should be considering?
Take a look at LogicalCallContext. As of .NET 4.5, it supports async scenarios.
Mr. Jeffrey Richter:
The .NET Framework has a little-known facility that allows you to associate data with a "logical" thread-of-execution. This facility is called logical call context, and it allows data to flow to other threads, AppDomains, and even to threads in other processes.
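In NLog terms, the logical-call-context-backed counterpart of MappedDiagnosticsContext is MappedDiagnosticsLogicalContext (NLog 4.1+), so the handler only needs a one-line swap; a sketch:

// In InitializeLoggingMessageHandler.SendAsync, replace the thread-local call
//     NLog.MappedDiagnosticsContext.Set("UniqueId", uniqueId);
// with the async-safe logical version, which flows across await boundaries:
NLog.MappedDiagnosticsLogicalContext.Set("UniqueId", uniqueId);

The layout then renders it with ${mdlc} instead of ${mdc}:

<target xsi:type="Debugger" name="DebugLogger"
        layout="${longdate} ${logger} ${mdlc:item=UniqueId} ${message}" />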
NLog.Extensions.Logging ver. 1.0 is able to capture context properties created with ILogger.BeginScope. These can be extracted using NLog's ${mdlc}.
The Microsoft engine will by default inject properties like RequestId, RequestPath, etc.
See also: https://github.com/NLog/NLog.Extensions.Logging/wiki/NLog-properties-with-Microsoft-Extension-Logging
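For example (standard Microsoft.Extensions.Logging scope usage; the property name is illustrative):

using (_logger.BeginScope(new Dictionary<string, object> { ["UniqueId"] = Guid.NewGuid().ToString() }))
{
    // Anything logged inside this scope, including after awaits, carries UniqueId,
    // which NLog can render with ${mdlc:item=UniqueId}.
    _logger.LogInformation("Handling request");
}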
If you're using Application Insights, it automatically sets System.Diagnostics.Activity.Current to an object that has all the Application Insights info you could want and more, including RootId and Id, which let you correlate with other events.
See this answer for more details and how to log it easily with NLog.
