I have implemented a long-running process as a WebJob using the WebJobs SDK.
The long-running process is awaited because I want the result.
public async Task ProcessMessage([ServiceBusTrigger("queuename")] MyMessage message)
{
await Run(message.SomeProperty); // takes several minutes
// I want to do something with the result here later..
}
What I can't figure out is why the message is sometimes abandoned, which of course triggers the handler again. I've tried debugging locally, setting breakpoints before ProcessMessage finishes, and I can see that it appears to finish successfully.
The Service Bus part of the WebJobs SDK takes care of message lock renewal, so that shouldn't be a problem as far as I've understood.
What am I missing and how do I troubleshoot?
The WebJobs SDK relies on the automatic lock renewals done by MessageReceiver.OnMessageAsync. These renewals are governed by the OnMessageOptions.AutoRenewTimeout setting, which can be configured like so in the v1.1.0 release of the WebJobs SDK:
JobHostConfiguration config = new JobHostConfiguration();
ServiceBusConfiguration sbConfig = new ServiceBusConfiguration();
sbConfig.OnMessageOptions = new OnMessageOptions
{
MaxConcurrentCalls = 16,
AutoRenewTimeout = TimeSpan.FromMinutes(10)
};
config.UseServiceBus(sbConfig);
You can also customize these values via a custom MessageProcessor. See the release notes here for more details on these new features.
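For example, here is a rough sketch of that customization route, based on the v1.1.0-era API (exact type names and signatures may differ slightly between SDK versions, and the queue name and timeout are illustrative):
using System;
using Microsoft.Azure.WebJobs.ServiceBus;
using Microsoft.ServiceBus.Messaging;

public class CustomMessagingProvider : MessagingProvider
{
    private readonly ServiceBusConfiguration _config;

    public CustomMessagingProvider(ServiceBusConfiguration config) : base(config)
    {
        _config = config;
    }

    public override MessageProcessor CreateMessageProcessor(string entityPath)
    {
        // Give the long-running queue a generous auto-renew window so the lock
        // keeps being renewed for the whole duration of the job function.
        if (entityPath == "queuename")
        {
            return new MessageProcessor(new OnMessageOptions
            {
                MaxConcurrentCalls = 1,
                AutoRenewTimeout = TimeSpan.FromMinutes(30)
            });
        }

        return base.CreateMessageProcessor(entityPath);
    }
}

// Wiring it up:
// sbConfig.MessagingProvider = new CustomMessagingProvider(sbConfig);
// config.UseServiceBus(sbConfig);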
Related
I'm getting a MessageLockLostException when completing a message on Azure Service Bus after a long operation of 30 minutes to over an hour. I want this process to scale and be resilient to failures, so I hold on to the message lock and renew it well within the default lock duration of 1 minute. However, when I try to complete the message at the end, I get a MessageLockLostException even though I can see that all the lock renewals occurred at the correct time. I want to scale this up in the future, but there is currently only one instance of the application, and I can confirm that the message still exists on the Service Bus subscription after it errors, so the problem is definitely around the lock.
Here are the steps I take.
Obtain a message and configure a lock
messages = await Receiver.ReceiveAsync(1, TimeSpan.FromSeconds(10)).ConfigureAwait(false);
var message = messages[0];
var messageBody = GetTypedMessageContent(message);
Messages.TryAdd(messageBody, message);
LockTimers.TryAdd(
messageBody,
new Timer(
async _ =>
{
if (Messages.TryGetValue(messageBody, out var msg))
{
await Receiver.RenewLockAsync(msg.SystemProperties.LockToken).ConfigureAwait(false);
}
},
null,
TimeSpan.FromSeconds(Config.ReceiverInfo.LockRenewalTimeThreshold),
TimeSpan.FromSeconds(Config.ReceiverInfo.LockRenewalTimeThreshold)));
Perform the long running process
Complete the message
internal async Task Complete(T message)
{
if (Messages.TryGetValue(message, out var msg))
{
await Receiver.RenewLockAsync(msg.SystemProperties.LockToken);
await Receiver.CompleteAsync(msg.SystemProperties.LockToken).ConfigureAwait(false);
}
}
The code above is a stripped-down version of what's there; I removed some try/catch error handling and logging we have, but I can confirm that when debugging the issue I can see the timer execute on time. It's just the CompleteAsync call that fails.
Additional info:
The Service Bus topic has partitioning enabled
I have tried renewing the lock at 80% of the lock duration (48 seconds), 30% (18 seconds), and 10% (6 seconds)
I've searched around for an answer and the closest thing I found was this article, but it's from 2016.
I couldn't get it to fail in a standalone console application, so I don't know if it's something I'm doing in my application, but I can confirm that the lock renewal occurs for the duration of the processing and returns the correct DateTime for the updated lock. I'd expect that if the lock were truly lost, the CompleteAsync would fail.
I'm using the Microsoft.Azure.ServiceBus nuget package Version="4.1.3"
My Application is Dotnet Core 3.1 and uses a Service Bus Wrapper Package which is written in Dotnet Standard 2.1
The message completes if you don't hold onto it for a long time, and occasionally completes even when you do.
Any help or advice on how I could complete my Service Bus message successfully after an hour would be great.
The issue here wasn't with my code; it was with partitioning on the Service Bus topic. If you search around, there are some issues on the Microsoft GitHub about completing messages on partitioned entities. That aside, the fix I used was the subscription forwarding feature: forward the message to a new topic with partitioning disabled, then read the message from that new topic. With that change, the exact same code could keep the message locked for a long time and still complete it successfully.
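For anyone wanting to script the same workaround, here is a rough sketch using the management API in the Microsoft.Azure.ServiceBus package (entity names are placeholders, and the exact setup is an assumption rather than taken from the post):
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus.Management;

public static class ForwardingSetup
{
    public static async Task ForwardToUnpartitionedTopicAsync(string connectionString)
    {
        var mgmt = new ManagementClient(connectionString);

        // Create a second topic with partitioning disabled.
        if (!await mgmt.TopicExistsAsync("mytopic-unpartitioned"))
        {
            await mgmt.CreateTopicAsync(new TopicDescription("mytopic-unpartitioned")
            {
                EnablePartitioning = false
            });
        }

        // Auto-forward the existing subscription to the new topic; the long-running
        // processing then reads from a subscription on the new topic instead.
        var subscription = await mgmt.GetSubscriptionAsync("mytopic", "mysubscription");
        subscription.ForwardTo = "mytopic-unpartitioned";
        await mgmt.UpdateSubscriptionAsync(subscription);
    }
}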
Azure Functions have a time limit of 10 minutes. Suppose I have a long-running task such as downloading a file that takes 1 hr to download.
[FunctionName("PerformDownload")]
[return: Queue("download-path-queue")]
public static async Task<string> RunAsync([QueueTrigger("download-url-queue")] string url, TraceWriter log)
{
string downloadPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString());
log.Info($"Downloading file at url {url} to {downloadPath} ...");
using (var client = new WebClient())
{
    // DownloadFileTaskAsync is awaitable; DownloadFileAsync returns void
    await client.DownloadFileTaskAsync(new Uri(url), downloadPath);
}
log.Info("Finished!");
return downloadPath; // bound to the output queue via [return: Queue(...)]
}
Is there any hacky way to make something like this start and then resume in another function before the time limit expires? Or is there a better way altogether to integrate some long task like this into a workflow that uses Azure Functions?
(On a slightly related note, is plain Azure Web Jobs obsolete? I can't find it under Resources.)
Adding for others who might come across this post: Workflows composed of several Azure Functions can be created in code using the Durable Functions extension, which can be used to create orchestration functions that schedule async tasks, shut down, and are reawakened when said async work is complete.
They're not a direct solution for long-running tasks that require an open TCP port, such as downloading a file (for that, a function running on an App Service plan has no execution time limit), but they can be used to integrate such tasks into a larger workflow.
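For illustration, a rough sketch of that pattern using the Durable Functions 2.x API (function names and the activity body are placeholders, not taken from the original question):
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.Logging;

public static class DownloadWorkflow
{
    [FunctionName("DownloadOrchestrator")]
    public static async Task<string> RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        string url = context.GetInput<string>();

        // The orchestrator checkpoints and unloads here, and is reawakened
        // when the activity completes.
        string downloadPath = await context.CallActivityAsync<string>("DownloadFile", url);

        return downloadPath;
    }

    [FunctionName("DownloadFile")]
    public static string DownloadFile([ActivityTrigger] string url, ILogger log)
    {
        // The download itself still runs inside this activity and is still
        // subject to the hosting plan's execution time limit.
        log.LogInformation($"Downloading {url} ...");
        return "/tmp/placeholder-path";
    }
}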
Is there any hacky way to make something like this start and then resume in another function before the time limit expires?
If you are on a Consumption Plan you have no control over how long your Function App runs, and so it would not be reliable to use background threads that continue running after your Function entry point completes.
On an App Service plan you're running on VMs you pay for, so you can configure your Function App to run continuously. Also AFAIK you don't have to have a Function timeout on an App Service Plan, so your main Function entry point can run for as long as you want.
Or is there a better way altogether to integrate some long task like this into a workflow that uses Azure Functions?
Yes. Use Azure Data Factory to copy data into Blob Storage, and then process it. The Data Factory pipeline can call Functions both before and after the copy activity.
One additional option, depending on the details of your workload, is to take advantage of Azure Container Instances. You can have your Azure Function spin up a container, process your workload (download your file, do some processing, etc.), and then shut the container down for you. Spin-up time is typically a few seconds and you only pay for what you use (no need for a dedicated App Service plan or VM instance). More details on ACI here.
Ten minutes (based on the timeout setting in the host.json file) after the last function of your function app has been triggered, the VM running your function app will stop.
To prevent this from happening, you can have an empty TimerTrigger function that runs every 5 minutes. It won't cost anything and will keep your app up and running.
I think the issue is related to the cold-start state. You can find more details about it here:
https://markheath.net/post/avoiding-azure-functions-cold-starts
What you can do is create a timer-triggered Azure Function that "pings" your long-running function to keep it "warm":
namespace NewProject
{
    public static class PingTimer
    {
        [FunctionName("PingTimer")]
        public static async Task Run([TimerTrigger("0 */4 * * * *")]TimerInfo myTimer, TraceWriter log)
        {
            // This CRON expression executes every 4 minutes
            log.Info($"PingTimer function executed at: {DateTime.Now}");
            var client = new HttpClient();
            string url = @"<Azure function URL>";
            var result = await client.GetAsync(new Uri(url));
            log.Info($"PingTimer function execution completed at: {DateTime.Now}");
        }
    }
}
Is there a way to mark a WebJob (triggered, not continuous) as failed, without throwing an exception? I need to check that certain conditions are true to mark the job as successful.
According to the Azure WebJobs SDK, here is the code from the TriggeredFunctionExecutor class:
public async Task<FunctionResult> TryExecuteAsync(TriggeredFunctionData input, CancellationToken cancellationToken)
{
IFunctionInstance instance = _instanceFactory.Create((TTriggerValue)input.TriggerValue, input.ParentId);
IDelayedException exception = await _executor.TryExecuteAsync(instance, cancellationToken);
FunctionResult result = exception != null ?
new FunctionResult(exception.Exception)
: new FunctionResult(true);
return result;
}
So the WebJob's status depends on whether your WebJob/Function executed without any exceptions or not. We can't set the final status of a running WebJob programmatically.
I need to check that certain conditions are true to mark the job as successful.
Throwing an exception is the only way I found. Alternatively, you could store the WebJob execution result in an additional place (for example, Azure Table Storage). You can get the current invocation id from the ExecutionContext class. In your WebJob, save the current invocation id and the status you want to Azure Table Storage; you can then query the status later from Table Storage based on the invocation id.
public static void ProcessQueueMessage([QueueTrigger("myqueue")] string message, ExecutionContext context, TextWriter log)
{
log.WriteLine(message);
SaveStatusToTableStorage(context.InvocationId, "Fail/Success");
}
To use ExecutionContext as a parameter, you need to install the Azure WebJobs SDK Extensions package from NuGet and call the UseCore method before you run your WebJob.
var config = new JobHostConfiguration();
config.UseCore();
var host = new JobHost(config);
host.RunAndBlock();
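SaveStatusToTableStorage isn't shown in the snippet above; a minimal sketch of such a helper, assuming the classic WindowsAzure.Storage table client and an illustrative table name, could look like this:
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

public class JobStatusEntity : TableEntity
{
    public JobStatusEntity() { }

    public JobStatusEntity(Guid invocationId, string status)
    {
        PartitionKey = "webjob-status";
        RowKey = invocationId.ToString();
        Status = status;
    }

    public string Status { get; set; }
}

public static class JobStatusStore
{
    public static void SaveStatusToTableStorage(Guid invocationId, string status)
    {
        var account = CloudStorageAccount.Parse("<storage connection string>");
        var table = account.CreateCloudTableClient().GetTableReference("WebJobStatus");
        table.CreateIfNotExists();

        // One row per invocation; query it back later by invocation id (RowKey).
        table.Execute(TableOperation.InsertOrReplace(new JobStatusEntity(invocationId, status)));
    }
}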
Throwing an unhandled exception will result in a Failed execution.
But I have noticed that it can also result in your message being handled badly: i.e. your message will be dequeued but not moved to your poison queue as per your configuration (but maybe it was due to my SDK version).
@Jean NETR-VALERE: the newer versions of the WebJobs packages do act as you say, and if an exception is thrown the job will fail and will continue to be run over and over and over until you finally clear your queue. This is absolutely horrible behavior and I have no clue why they changed this.
Yes, they did change it to make it work this way, which is why I use an older version of the WebJobs package just for this reason. About 3 months ago I upgraded to the newer version and shortly after could not understand why the above behavior was happening. Once I reverted to the older version, it started working correctly again: after failing 5 times the message is moved to the poison queue and is never run again. My point is that if you want the correct (IMO) behavior, see if you can go back to using version 1.1.0 and you will be happy. Hope that helps.
To mark a triggered web job as failed you just need to set process exit code to non-zero.
System.Environment.ExitCode = 1;
When you throw an unhandled exception it also sets a non-zero exit code; that is how Azure determines failure.
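As a small sketch, a triggered (console) WebJob could report failure without throwing like this; the condition check is a placeholder for your own logic:
public static class Program
{
    public static int Main(string[] args)
    {
        bool conditionsMet = RunJob();

        // Zero means success; any non-zero exit code marks the triggered
        // WebJob run as Failed (equivalent to setting Environment.ExitCode).
        return conditionsMet ? 0 : 1;
    }

    private static bool RunJob()
    {
        // ... do the actual work and verify the conditions ...
        return true;
    }
}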
I want to trigger an Azure WebJob 24 hours after I have added a record to a database using .NET. Obviously there will be multiple tasks for the WebJob to handle, all at their designated times. Is there a way (in the Azure libraries for .NET) in which I can schedule these tasks?
I am free to use Message Queues , but I want to try and avoid the unnecessary polling of the WebJob for new messages.
If you want to trigger the execution of a WebJob 24 hours after a record insertion in a SQL database I would definitely use Azure Queues for this. So after you insert the record, just add a message to the queue.
In order to do this you can easily leverage the initialVisibilityDelay parameter that can be passed to the CloudQueue.AddMessage() method. This will make the message invisible for 24 hours in your case, after which it will become visible and be processed by your WebJob. You don't have to schedule anything; just have a continuous WebJob running that listens on the queue.
Here's some sample code:
public void AddMessage(T message, TimeSpan visibilityDelay)
{
var serializedMessage = JsonConvert.SerializeObject(message);
var queue = GetQueueReference(message);
queue.AddMessage(new CloudQueueMessage(serializedMessage), null, visibilityDelay);
}
private static CloudQueue GetQueueReference(T message)
{
var storageAccount = CloudStorageAccount.Parse("Insert connection string");
var queueClient = storageAccount.CreateCloudQueueClient();
var queueReference = queueClient.GetQueueReference("Insert Queue Name");
queueReference.CreateIfNotExists();
return queueReference;
}
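For completeness, here is a sketch of the consuming side; the queue name and message type are illustrative, not from the question. After the database insert you would call AddMessage with TimeSpan.FromHours(24), and a continuous WebJob picks the message up once it becomes visible:
using System.IO;
using Microsoft.Azure.WebJobs;

public class RecordCreatedMessage
{
    public int RecordId { get; set; }
}

public static class Functions
{
    // The SDK deserializes the JSON message body back into the POCO and invokes
    // this roughly 24 hours after the message was added.
    public static void ProcessRecordCreated(
        [QueueTrigger("record-created-queue")] RecordCreatedMessage message, TextWriter log)
    {
        log.WriteLine($"Processing record {message.RecordId}");
    }
}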
Hope this helps
Since the event of adding a record to the database is the trigger here, you can use the Azure Management Libraries to create an Azure Scheduler job that executes 24 hours after the DB record is inserted. Azure Scheduler jobs can do only three things: make HTTP requests, make HTTPS requests, or put a message in a queue. Since you do not want to poll queues, here are two options:
Deploy the existing WebJob as a Web API where each task is reachable by a unique URL, so that the scheduler job can make the right HTTP/HTTPS request
Create a new Web API that accepts requests (like a man in the middle) and programmatically runs the existing WebJob on demand, again using the Azure Management Libraries
Please let me know if any of these strategies help.
Invoking a WebJob from your Website is not a good idea; instead, you can add the WebJob code inside your Website and simply call that code. You can still easily use the WebJobs SDK from inside your Website.
https://github.com/Azure/azure-webjobs-sdk-samples
The reason we wouldn't recommend invoking the WebJob from your Website is that the invocation contains a secret you would rather not store on your Website (deployment credentials).
Recommendation:
To separate the WebJob and Website code, the best thing to do is to communicate using a queue: the WebJob listens on the queue and the Website pushes requests to the queue.
I used RabbitMQ without MassTransit and sent 10,000 messages per second, one million messages in 100 seconds.
But after using MassTransit with RabbitMQ, performance is very low on my machine.
The hard disk is very active (99% usage) when publishing/consuming messages, while CPU usage for this process is almost 0%.
When I run the publisher/subscriber console application with this code:
var bus = ServiceBusFactory.New(x =>
{
x.UseRabbitMq();
x.ReceiveFrom("rabbitmq://localhost/Example_Hello");
});
var message = new MyMessage() { Text = "hello", When = DateTime.Now };
for (int i = 0; i < 100; i++)
{
bus.Publish<MyMessage>(message, x => { });
}
It publishes 100 messages in 6 seconds, and I don't know why it is so slow.
My machine's configuration and software versions are:
Windows 8.1 64bit
Intel Core i3 3.30GHz
Memory 8GB
Visual Studio 2013 C#.Net 4.5.1
Erlang 6.3
RabbitMQ 3.4.4
MassTransit 2.9.9
RabbitMQ.Client 3.4.0
This is because under the covers, MassTransit is waiting for RabbitMQ to Ack the message, ensuring that it was successfully accepted by the broker before returning to the caller. Without this wait, if the broker fails to receive the write, the message could be lost.
With MassTransit v3 (currently in pre-release), the Publish method (and all Send methods) are now async, returning a Task that can be awaited (or not, if you don't care about the outcome).
I could add a PublishAsync method for .NET 4.0 developers to version 2.9.9, in fact, that's what I may end up doing as a temporary workaround for applications still using the current stable version. It may also be useful to add a NoWait option to the SendContext, allowing the application to opt-out of the Ack wait behavior.
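For reference, a rough sketch of what the awaited Publish looks like with the v3 API; the host settings and the batching loop are illustrative, not from the question:
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using MassTransit;

public static class Publisher
{
    public static async Task PublishManyAsync()
    {
        var busControl = Bus.Factory.CreateUsingRabbitMq(cfg =>
        {
            cfg.Host(new Uri("rabbitmq://localhost/"), h =>
            {
                h.Username("guest");
                h.Password("guest");
            });
        });

        busControl.Start();
        try
        {
            // Each Publish returns a Task that completes once the broker has
            // accepted the message; awaiting them together overlaps the waits
            // instead of paying one round trip per message.
            var tasks = new List<Task>();
            for (int i = 0; i < 100; i++)
            {
                tasks.Add(busControl.Publish(new MyMessage { Text = "hello", When = DateTime.Now }));
            }
            await Task.WhenAll(tasks);
        }
        finally
        {
            busControl.Stop();
        }
    }
}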
I just favor durability over performance in my use case, honestly.
I have found a bug in MT2 <= 2.10.1 that prevents it from correctly handling the WaitForAck flag. I posted a patch proposal and hope Chris releases 2.10.2 as soon as possible.
The detailed information on the issue is described here:
https://groups.google.com/forum/#!topic/masstransit-discuss/XiqSDnJzd8U
In short, the issue is caused by a bug in the SendContext copy constructor: even though the original context has the wait-for-ack flag set to false, the context that is used in the call always has the flag set to true.