Handling Service Bus Message.Complete() exceptions - C#

Consider the scenario: an Azure Service Bus namespace with message deduplication enabled, a single topic with a single subscription, and an application that receives from that subscription.
How can I ensure that the application receives messages from the subscription once and only once?
Here is the code I'm using in my application to receive messages:
public abstract class ServiceBusListener<T> : IServiceBusListener
{
    private SubscriptionClient subscriptionClient;

    // ..... snip

    private void ReceiveMessages()
    {
        // Wait up to 5 seconds for a message to arrive.
        BrokeredMessage message = this.subscriptionClient.Receive(TimeSpan.FromSeconds(5));
        if (message != null)
        {
            T payload = message.GetBody<T>();
            try
            {
                DoWork(payload);
                message.Complete();
            }
            catch (Exception exception)
            {
                // DoWork or message.Complete failed; the message stays
                // locked until the lock expires, then becomes visible again.
            }
        }
    }
}
The problem I foresee is that if message.Complete() fails for whatever reason, the message that has just been processed will remain on the subscription in Azure. When ReceiveMessages() is called again it will pick up that same message, and the application will do the same work again.
Whilst the best solution would be to have idempotent domain logic (DoWork(payload)), this would be very difficult to write in this instance.
The only method I can see to ensure once-and-only-once delivery to the application is to build another queue to act as an intermediary between Azure Service Bus and the application. I believe this is called a 'durable client-side queue'.
However, I can see that this would be a potential issue for a lot of applications that use Azure Service Bus, so is a durable client-side queue the only solution?

The default behavior when you dequeue a message is called "Peek-Lock". It locks the message so that no one else can get it while you're processing it, and removes it when you complete it. If you fail to complete the message, it is unlocked and can be picked up again. This is probably what you are experiencing. You can change the behavior to "Receive and Delete" mode, which deletes the message from the queue as soon as you receive it for processing; a sketch follows the links below.
https://msdn.microsoft.com/en-us/library/azure/hh780770.aspx
https://azure.microsoft.com/en-us/documentation/articles/service-bus-dotnet-how-to-use-topics-subscriptions/#how-to-receive-messages-from-a-subscription
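For illustration, a minimal sketch of switching to Receive and Delete, assuming the classic Microsoft.ServiceBus.Messaging SDK used in the question (the connection string, topic and subscription names are placeholders):

using System;
using Microsoft.ServiceBus.Messaging;

// Create the client in ReceiveAndDelete mode: the message is removed
// from the subscription as soon as it is handed to the receiver.
SubscriptionClient client = SubscriptionClient.CreateFromConnectionString(
    connectionString,    // placeholder connection string
    "my-topic",          // placeholder topic name
    "my-subscription",   // placeholder subscription name
    ReceiveMode.ReceiveAndDelete);

BrokeredMessage message = client.Receive(TimeSpan.FromSeconds(5));
// The message is already gone from the subscription at this point:
// if DoWork throws now, the message is lost rather than redelivered.

The trade-off is the mirror image of the original problem: at-most-once delivery instead of at-least-once.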

I have similar challenges in a large-scale Azure platform I am responsible for. I use a combination of the concepts embodied by the Compensating Transaction pattern (https://msdn.microsoft.com/en-us/library/dn589804.aspx) and the Event Sourcing pattern (https://msdn.microsoft.com/en-us/library/dn589792.aspx). Exactly how you incorporate these concepts will vary, but ultimately you may need to plan on your own "rollback" logic, or on detecting that a previous run completed 100% successfully except for the removal of the message.
If there is something you can check up front that tells you the message was processed but simply not removed, you can complete it and move on. How expensive that check is may make this a bad idea. You can even create an artificial final step, such as adding a row to a database, that runs only when DoWork reaches the end; you then check for that row before processing any other messages.
IMO, the best approach is to make sure that each of the steps in your DoWork() checks whether its work has already been performed (where possible). For example, if a step creates a DB table, guard it with "IF NOT EXISTS (SELECT TABLE_NAME FROM INFORMATION_SCHEMA...". In that scenario, even in the unlikely event the message is redelivered, it is safe to process it again.
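A hedged sketch of what such an idempotent step might look like (the table name, columns and connection string are hypothetical):

using System.Data.SqlClient;

// The IF NOT EXISTS guard makes re-running this step harmless if the
// same message is ever delivered twice.
const string createIfMissing = @"
    IF NOT EXISTS (SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
                   WHERE TABLE_NAME = 'PaymentLedger')
        CREATE TABLE PaymentLedger (Id INT PRIMARY KEY, Amount MONEY);";

using (var connection = new SqlConnection(connectionString)) // assumed connection string
using (var command = new SqlCommand(createIfMissing, connection))
{
    connection.Open();
    command.ExecuteNonQuery(); // safe to execute on every (re)delivery
}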
Other approaches I use are to store the IDs of the previous X messages (say 10,000), for example the sequential bigint SequenceNumber on each message, and then check that an incoming ID is not among them (NOT IN) before proceeding to process a message. It is not as expensive as you might think, and it is very safe: if the ID is found, simply Complete() the message and move on. In other situations, I update the message with a "starting" status (inline in certain queue types, persisted elsewhere in others), then proceed. If you read a message and this status is already set to "started", you know something either failed or did not clear appropriately.
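A minimal in-process sketch of the "last X message IDs" check, assuming the classic SDK's BrokeredMessage (a real implementation would persist the set in a database shared by all receivers, as described above):

using System.Collections.Generic;
using Microsoft.ServiceBus.Messaging;

private static readonly HashSet<long> seenIds = new HashSet<long>();
private static readonly Queue<long> evictionOrder = new Queue<long>();
private const int MaxTracked = 10000;

private static bool AlreadyProcessed(BrokeredMessage message)
{
    long id = message.SequenceNumber; // sequential bigint assigned by Service Bus
    if (seenIds.Contains(id))
        return true; // duplicate delivery: Complete() and skip DoWork

    seenIds.Add(id);
    evictionOrder.Enqueue(id);
    if (evictionOrder.Count > MaxTracked)
        seenIds.Remove(evictionOrder.Dequeue()); // forget the oldest entry
    return false;
}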
Sorry this is not a clear cut answer, but there are a lot of considerations.
Kindest regards...

You can continue to use a single subscription if your message handling includes logic to detect whether the message has already been successfully processed, or which stage it had reached.
For example, I use Service Bus messages to insert payments from an external payment system into a CRM system. The message handling logic first checks whether the payment already exists in CRM (using unique IDs associated with the payment) before inserting. This was required because, very occasionally, a payment would be successfully added to CRM but not reported back as such (timeout or connectivity). Using Receive and Delete when picking up a message would mean that payments could be lost; not checking whether the payment already existed could result in duplicate payments.
If this is not possible, another solution I have applied is updating Table Storage to record the progress of handling a message. When a message is picked up, the table is checked to see whether any stages have already been completed, allowing the handler to continue from the stage it had reached previously; a sketch follows.
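A hedged sketch of such a progress table, assuming the classic WindowsAzure.Storage SDK (the table, entity and property names are hypothetical):

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

public class MessageProgress : TableEntity
{
    public MessageProgress() { }
    public MessageProgress(string messageId)
    {
        PartitionKey = "progress";
        RowKey = messageId;
    }
    public int LastCompletedStage { get; set; }
}

// On pick-up, check how far a previous attempt got:
var table = CloudStorageAccount.Parse(storageConnectionString) // assumed connection string
    .CreateCloudTableClient()
    .GetTableReference("MessageProgress");
table.CreateIfNotExists();

var result = table.Execute(
    TableOperation.Retrieve<MessageProgress>("progress", message.MessageId));
int resumeFrom = (result.Result as MessageProgress)?.LastCompletedStage ?? 0;
// Run only the stages after resumeFrom, updating the entity as each completes.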
The most likely cause of the scenario you outline is that the time taken by DoWork exceeds the lock on the message. The message lock duration can be adjusted to a value that safely exceeds the expected DoWork period.
It is also possible to call RenewLock on a message within the handler, if you are able to track processing time against the message lock expiry.
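For example, a minimal sketch with the classic SDK (the 30-second threshold is an arbitrary illustration):

// Renew the lock partway through a long DoWork if processing is at
// risk of outliving the current lock.
if (message.LockedUntilUtc - DateTime.UtcNow < TimeSpan.FromSeconds(30))
{
    message.RenewLock(); // extends the lock by the configured lock duration
}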
Maybe I misunderstand the design principle of a second queue, but it seems as if it would be just as vulnerable to the original scenario you outlined.
It's hard to give a definitive answer without knowing what your DoWork() involves, but I would consider one, or a combination, of the above as a better solution.

Related

C# - Azure Durable Function - Restart Orchestration

I'm working with Durable Functions. I have already understood how Durable Functions work: there is an orchestration that controls the flow (the order in which activities run), and that orchestration takes care of the sequence of the activities.
But currently I have a question that I'm not finding the correct answer to, and maybe you can help me with it:
Imagine that I have one orchestration with 5 activities.
One of the activities makes a call to an API that gets a document as an array of bytes.
If one of the activities fails, the orchestration can throw an exception and I can detect that through the code.
I also have some retry options that retry the activities with an interval of 2 minutes.
But... what if those retries don't succeed?
From what I've read, I can use the "ContinueAsNew" method to restart the orchestration, but I think there is a problem.
If I use this method to restart the orchestration 1 hour later, will it resume the activity where it was?
I mean, if the first activity is done and I restart the orchestration due to a failure of one of the activities, will it resume at the 2nd activity as it was before?
Thank you for your time, guys.
If you restart the orchestration, it doesn't have any state of the previous one.
So the first activity will run again.
If you don't want that to happen, you'll need to retry the second one until it succeeds.
I would not recommend making that infinite though, an orchestration should always finish at some point.
I'd just increase the retry count to a sufficiently high number so I can be confident that the processing will succeed in at least 99% of cases.
(How likely is your activity to fail?)
Then if it still fails, you could send a message to a queue and have it trigger some alert. You could then start that one from the beginning.
If something fails so many times that the retry amount is breached, there could be something wrong with the data itself and typically a manual intervention may be needed at that point.
Another option could be to send the alert from within the orchestration if the retries fail, and then wait for an external event to come from an admin who approves or denies it to retry.
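For illustration, a hedged sketch of a generous per-activity retry policy (Durable Functions 2.x API; the function and activity names are hypothetical):

using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

[FunctionName("MyOrchestrator")]
public static async Task RunOrchestrator(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
{
    var retry = new RetryOptions(
        firstRetryInterval: TimeSpan.FromMinutes(2),
        maxNumberOfAttempts: 10)
    {
        BackoffCoefficient = 2.0 // optional: widen the interval each attempt
    };

    await context.CallActivityAsync("Activity1", null);
    // Completed activities are checkpointed: if Activity2 fails and is
    // retried, Activity1 is NOT re-run within this orchestration instance.
    await context.CallActivityWithRetryAsync("Activity2", retry, null);
}

Note this is different from ContinueAsNew, which starts a fresh history and therefore re-runs everything, as explained above.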

Is there a pattern or an easy way to cancel a specific run in an Azure WebJob Function

In my multi-tenant application I have a background process that runs in a webjob and can take several minutes. The time varies according to each customer's data.
But sometimes, when I'm testing something, I start the process and (looking at the logs) soon I realize something is wrong and I want to cancel that specific run.
I cannot just kill all the messages in the queue or stop the WebJob, because I'd be killing the processes that are running for the other customers.
And I want to do it programmatically so I can put a Cancel button in my web application.
I was not able to find the best architectural approach (or a pattern) for this kind of execution cancellation.
I read about passing a CancellationTokenSource, but I couldn't think of how I would call the Cancel() method on the specific run that I want to cancel. Should I store all currently running tokens in a static class? And then send another message to the webjob telling that I want to cancel it?
(I think that might be the answer, but I'm afraid I'm overthinking. That's why I'm asking it here.)
My Function is as simple as:
public static void EngineProcessQueue([QueueTrigger("job-for-process")] string message, TextWriter log)
{
    // Inside this method there is a huge codebase,
    // and I'm afraid that I'll have to put "if (token.IsCancelled)" in lots of places...
    // (but that's another question)
    ProcessQueueMessage(message, log);
}
QueueTrigger is essentially a function trigger, and the cancellation you want is not supported out of the box.
Once execution has entered the function body, the business logic may include asynchronous operations; even if we deleted or stopped the QueueTrigger at that point, business data would already have been affected and could not be rolled back automatically.
The following is my personal suggestion, because I think the cancel operation is better handled in the business logic:
Use a Redis cache and create an object named mypools to store your business commands.
When the WebJob runs, we can enumerate all the queues (also visible in Azure Storage Explorer) and record each run in mypools with a special command.
The format of the command could be ClientName-TriggerName-Status-Extend, e.g. Acompany-jobforprocess-run-null; while that command has not finished executing, we can change it to Acompany-jobforprocess-cancel-null to request cancellation.
We can set the Azure WebJob queue name at runtime and then handle the business logic dynamically in the program, performing a data rollback for any work that has already executed.
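Alternatively, a hedged sketch of the approach floated in the question itself: a shared registry of CancellationTokenSources keyed by run ID. All names are hypothetical, and in a scaled-out WebJob the cancel flag would need to live in a shared store (such as the Redis cache suggested above) rather than in a static dictionary:

using System.Collections.Concurrent;
using System.Threading;

public static class RunRegistry
{
    private static readonly ConcurrentDictionary<string, CancellationTokenSource> Runs =
        new ConcurrentDictionary<string, CancellationTokenSource>();

    // Called at the start of EngineProcessQueue with an ID taken from the message.
    public static CancellationToken Register(string runId)
        => Runs.GetOrAdd(runId, _ => new CancellationTokenSource()).Token;

    // Called when the web application's Cancel button fires (e.g. via another message).
    public static void Cancel(string runId)
    {
        if (Runs.TryRemove(runId, out var cts))
            cts.Cancel();
    }
}

// Inside the long-running work, check the token at safe points:
// token.ThrowIfCancellationRequested();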

Is there a way I can delay the retry for a service bus message in an Azure function?

I have a function which pulls messages off a subscription, and forwards them to an HTTP endpoint. If the endpoint is unavailable, an exception is thrown. When this happens, I would like to delay the next attempt of that specific message for a certain amount of time, e.g. 15 minutes. So far, I have found the following solutions:
Catch the exception, sleep, then throw. This is a terrible solution, as I will be charged for CPU usage while it is sleeping, and it will affect the throughput of the function.
Catch the exception, clone the message, set the ScheduledEnqueueTimeUtc property and add it back to the queue. This is a nicer way, but it resets the delivery count, so an actual problem will never be dead-lettered, and the message is resent to all subscriptions when only one subscriber failed to process it. (A sketch of this option appears after this question.)
Catch the exception, and place the message on a storage queue instead. This means maintaining a storage queue to match each subscription, and having two functions instead of one.
What I would ideally like to happen is to catch the exception, and exit the function without releasing the lock on the message. That way, as soon as the lock expires the message will be retried again. However, it seems that after completing successfully, the function calls Complete() on the message, and after an exception is thrown, the function calls Abandon() on the message. Is it possible to bypass this, or to achieve the delay some other way?
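For reference, a hedged sketch of option 2 above with the classic SDK (the retry limit, property name and client variables are hypothetical); it works around the delivery-count reset by carrying its own attempt counter, but still fans out to every subscription:

using System;
using Microsoft.ServiceBus.Messaging;

static void ScheduleRetry(BrokeredMessage message, TopicClient topicClient)
{
    var retry = message.Clone(); // Clone() resets DeliveryCount...
    int attempts = retry.Properties.ContainsKey("RetryCount")
        ? (int)retry.Properties["RetryCount"] + 1
        : 1;

    if (attempts > 10)
    {
        // ...so carry our own counter to keep dead-lettering possible.
        message.DeadLetter("TooManyRetries", "Endpoint unavailable");
        return;
    }

    retry.Properties["RetryCount"] = attempts;
    retry.ScheduledEnqueueTimeUtc = DateTime.UtcNow.AddMinutes(15);
    topicClient.Send(retry);  // note: goes to ALL subscriptions again
    message.Complete();       // remove the original; the clone reappears later
}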
This is now supported natively through retry policies, which were added to Azure Functions around November 2020 (preview). You can configure the retry policy as fixed-delay or exponential-backoff.
[FunctionName("MyFunction")]
[FixedDelayRetry(10, "00:15:00")] // retries with a 15-minute delay
public static void Run(
    [ServiceBusTrigger("MyTopic", "MySubscription", Connection = "ServiceBusConnection")] string myQueueItem)
{
    // Forward message to HTTP endpoint, throwing exception if endpoint unavailable
}
I will address your situation by offering that the flow you are proposing is better handled by a Logic App than by a pure function.
It's quite easy to implement the wait/retry/dequeue-next pattern in a Logic App, since this type of flow control is exactly what Logic Apps were designed for.
Although still in preview (not recommended for production code), you could use Durable Functions. If you want to maintain the ability to manipulate objects in code, this is likely your best bet!
(+1 to LogicApp solution too!)

How to correctly invoke big logic in another Thread/Task?

Here is my problem: I have a WCF project, which doesn't really matter much, because I believe it's more about C#/.NET. In my WCF service, when a client requests one of the methods, I validate the input, and if validation succeeds I start some business logic calculations. I want to start this logic in another thread/task so that I can return a response immediately after the input validation. It's something like this:
XXXX MyMethod(MyArgument arg)
{
    var validation = _validator.Validate(arg);
    if (validation.Succeed)
    {
        // Fire and forget: the response does not wait for the business logic.
        Task.Run(() => businessLogic());
    }
    return new MyResponseModel();
}
I need to make it like this because my businessLogic can involve long-running calculations and database saves at the end, but the client requesting the service has to know immediately whether the model is correct.
In the businessLogic calculations/saves that run on the background thread, I have to catch exceptions if something fails and save them in the database. (It's pretty big logic, so many exceptions can be thrown; for example, after the calculations I persist the object in the database, so a save error can be thrown if the database is offline.)
How do I correctly implement this, and what should I use for such requirements? Is invoking all the logic from Task.Run like this a good practice?
You can do it like this.
Be aware, though, that worker processes can exit at any time. In that case outstanding work will simply be lost. Maybe you should queue the work to a message queue instead.
Also, if the task "crashes" you will not be notified in any way. Implement your own error logging.
Also, there is no limit to the number of tasks that you can spawn like this. If processing is too slow more and more work will queue up. This might not at all be a problem if you know that the server will not be overloaded.
It was suggested that Task.Run will use threads and therefore not scale. This is not necessarily so. Usually, the bottleneck of any processing is not the number of threads but the backend resources being used (database, disk, services, ...). Even using hundreds of threads is not in any way likely to be a bottleneck. Async IO is not a way around backend resource constraints.
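A hedged sketch of the "catch, log, don't crash" shape for the background work (_errorRepository is a hypothetical persistence helper):

using System;
using System.Diagnostics;
using System.Threading.Tasks;

Task.Run(() =>
{
    try
    {
        businessLogic();
    }
    catch (Exception ex)
    {
        // The caller received its response long ago; this is the only
        // place the failure can be observed, so persist it.
        try
        {
            _errorRepository.SaveError(ex.ToString()); // hypothetical helper
        }
        catch
        {
            // Last resort if the database itself is unreachable.
            Trace.TraceError(ex.ToString());
        }
    }
});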

Azure function running multiple times for the same service bus queue message

I have an Azure function (based on the new C# functions instead of the old .csx functions) that is triggered whenever a message comes into an Azure Service Bus Queue. Once the function is triggered, it starts processing the service bus message. It decodes the message, reads a bunch of databases, updates a bunch of others, etc... This can take upwards of 30 minutes at times.
Since this is not a time-sensitive process, 30 minutes or even 60 minutes is not an issue. The problem is that in the meantime, the Azure Function seems to kick in again, pick up the same message, and reprocess it. This is an issue and causes problems in our business logic.
So, the question is, can we force the Azure function to run in a singleton mode? Or if that's not possible, how do we change the polling interval?
The issue is related to a Service Bus setting...
What is happening is that the message is added to the queue and then handed to the function, and a lock is placed on that message so that no other consumer can see or process it while you hold the lock.
If within that lock period you do not tell Service Bus that you've processed the message, or ask to extend the lock, the lock is removed from the message and it becomes visible to other consumers, which then process it; that is what you are seeing.
Fortunately, Azure Functions can automatically renew the lock for you. In the host.json file there is an autoRenewTimeout setting that specifies for how long you want Azure Functions to keep on renewing the lock for.
https://github.com/Azure/azure-webjobs-sdk-script/wiki/host.json
"serviceBus": {
// the maximum duration within which the message lock will be renewed automatically.
"autoRenewTimeout": "00:05:00"
},
AutoRenewTimeout is not as great as suggested; it has a downside that you need to be aware of. It is not a guaranteed operation: being a client-side initiated operation, it can and sometimes will fail, leaving you in the same state as you are in today.
What you could do to address this is review your design. If you have a long-running process, receive the message and hand off the processing to something that can run longer than MaxLockDuration. The fact that your function takes so long indicates you have a long-running process; messaging is not designed for that.
One potential solution would be to receive a message and register the intent to process it in a storage table. Have another function pick the work up from that table (there is no native table trigger, so in practice this means a timer- or queue-triggered function) and run the processing that could take X minutes, marked as a singleton. By doing so, you'll receive messages in parallel, write a "request for long-running processing" into the storage table, and complete the Service Bus message, therefore not triggering its re-processing. Within the long-running processing you can decide how to handle failure cases. A sketch follows below.
Hope that helps.
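A hedged sketch of that hand-off (the entity, table, topic and connection names are all hypothetical):

using System;
using Microsoft.Azure.WebJobs;
using Microsoft.WindowsAzure.Storage.Table;

public class ProcessingIntent : TableEntity
{
    public string Payload { get; set; }
}

public static class HandOff
{
    [FunctionName("RegisterIntent")]
    public static void Run(
        [ServiceBusTrigger("mytopic", "mysub", Connection = "SbConn")] string message,
        [Table("ProcessingIntents")] ICollector<ProcessingIntent> intents)
    {
        // Completes within seconds, so the message lock never expires.
        intents.Add(new ProcessingIntent
        {
            PartitionKey = "pending",
            RowKey = Guid.NewGuid().ToString(),
            Payload = message
        });
    }
    // A separate (e.g. timer-triggered, singleton) function would scan the
    // "pending" partition and perform the 30-60 minute processing.
}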
So your question is how to make the Azure Functions Service Bus trigger process one message at a time, and process each message only once.
I have been able to achieve that behaviour with the following host.json configuration:
{
    "serviceBus": {
        "maxConcurrentCalls": 1,
        "autoRenewTimeout": "23:59:59",
        "autoComplete": true
    }
}
Note that I set the autoRenewTimeout to nearly 24 hours, which is long enough for a really long-running process. Otherwise you can change it to a duration that fits your need.
Many will argue the suitability of using Azure Function for a long running operation. But that is not the question that needs an answer here.
I also experienced the same issue; what I did was remove the default rule and add a custom rule on the subscriber:
(OriginalTopic='name-of-topic' AND OriginalSubscription='name-of-subscription-sub') OR (NOT Exists([OriginalTopic]))
