Durable Function fan-out with time limit - remains in "running" status - C#

I am attempting to implement a timeout for my Durable function.
In my function, I am currently doing a fan-out of activity functions, each of which calls a separate API to collect current pricing data (price comparison site). All of this works well and I am happy with the results; however, I need to implement a timeout in case one or more APIs do not respond within a reasonable time (~15 seconds).
I am using the following pattern:
var parallelActivities = new List<Task<T>>
{
    context.CallActivityAsync<T>( "CallApi1", input ),
    context.CallActivityAsync<T>( "CallApi2", input ),
    context.CallActivityAsync<T>( "CallApi3", input ),
    context.CallActivityAsync<T>( "CallApi4", input ),
    context.CallActivityAsync<T>( "CallApi5", input ),
    context.CallActivityAsync<T>( "CallApi16", input )
};
var timeout = TimeSpan.FromSeconds(15);
var deadline = context.CurrentUtcDateTime.Add(timeout);
using ( var cts = new CancellationTokenSource() )
{
    var timeoutTask = context.CreateTimer(deadline, cts.Token);
    var taskRaceWinner = await Task.WhenAny(Task.WhenAll( parallelActivities ), timeoutTask);
    if ( taskRaceWinner != timeoutTask )
    {
        cts.Cancel();
    }
    foreach ( var completedParallelActivity in parallelActivities.Where( task => task.Status == TaskStatus.RanToCompletion ) )
    {
        //Process results here
    }
    //More logic here
}
Everything seems to work correctly. If any activity doesn't return within the time limit, the timeout task wins, and the data is processed and returned correctly.
The Durable Functions documentation on orchestrator timers states: "This mechanism does not actually terminate in-progress activity function execution. Rather, it simply allows the orchestrator function to ignore the result and move on. For more information, see the Timers documentation."
Unfortunately my function remains in the "running" status until it ultimately hits the durable function timeout and recycles.
Am I doing something wrong? I realize that, generally, the durable function will be marked as running until all activities have completed; however, the documentation above indicates that I should be able to "ignore" the activities that are running too long.
I could implement a timeout in each individual API call; however, that doesn't seem like good design and I have been resisting it. So, please help me, Stack Overflow!

According to this, the Durable Task Framework will not change an orchestration's status to "completed" until all outstanding tasks are completed or canceled, even though their output is ignored. Also, according to this and this, we can't cancel an activity/sub-orchestration from the parent at the moment. So, currently the only way I can think of is to pass a Timeout parameter (of type TimeSpan) from the parent as part of the input object to the activity (e.g. context.CallActivityAsync<T>( "CallApi1", input )) and let the child activity function handle its own exit, respecting that timeout; a rough sketch of that idea follows. I tested this myself and it works fine. Please feel free to reach out to me for any follow-up.
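For illustration only, a minimal sketch of that approach, assuming hypothetical names (ApiCallInput, PriceResult, httpClient, BuildUri, ParseAsync) rather than the asker's actual code: the orchestrator carries the timeout in the activity input, and the activity enforces it with a CancellationTokenSource so the activity always returns within its budget.
//Hypothetical input type carrying the per-activity timeout (not the asker's actual model)
public class ApiCallInput
{
    public string Query { get; set; }
    public TimeSpan Timeout { get; set; }
}

//Orchestrator side: pass the timeout as part of the input object
var input = new ApiCallInput { Query = query, Timeout = TimeSpan.FromSeconds(15) };
var parallelActivities = new List<Task<PriceResult>>
{
    context.CallActivityAsync<PriceResult>( "CallApi1", input ),
    context.CallActivityAsync<PriceResult>( "CallApi2", input )
};

//Activity side: respect the timeout so the activity itself completes and the orchestration can finish
[FunctionName("CallApi1")]
public static async Task<PriceResult> CallApi1([ActivityTrigger] ApiCallInput input)
{
    using (var cts = new CancellationTokenSource(input.Timeout))
    {
        try
        {
            var response = await httpClient.GetAsync(BuildUri(input.Query), cts.Token); //hypothetical helper
            return await ParseAsync(response); //hypothetical helper
        }
        catch (OperationCanceledException)
        {
            return null; //timed out; the orchestrator treats null as "no result from this API"
        }
    }
}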

Related

C# make a specific line time out after x seconds

I have a line in C# which does not work very reliably, does not time out at all, and runs forever.
To be more precise, I am trying to check the connection to a proxy with WebClient.DownloadString.
I want it to time out after 5 seconds without making the full method asynchronous,
so the code should be like this:
bool success = false
do_this_for_maximum_5_seconds_or_until_we_reach_the_end
{
    WebClient.DownloadString("testurl");
    success = true;
}
It will try to download testurl and, after it has downloaded it, set success to true. If DownloadString takes more than 5 seconds, the call is canceled, we never reach the line where we set success to true, so it remains false and I know that it failed.
The thread will remain blocked while we try to DownloadString, so the action does not take place in parallel. The ONLY difference from a normal line would be that we set a timeout of 5 seconds.
Please do not suggest alternatives such as using HttpClient, because I need similar code in other places as well, so I simply want code which will run in a synchronous application (I have not learned anything about asynchronous programming, therefore I would like to avoid it completely).
My approach was the one suggested by Andrew Arnott in this thread:
Asynchronously wait for Task<T> to complete with timeout
However, my issue is that I am not exactly sure what type of variable "SomeOperationAsync()" is in his example (it seems to be a task, but how can I put actions into the task?), and the bigger issue is that VS wants to switch the complete method to asynchronous, but I want to run everything synchronously, just with a timeout for a specific line of code.
In case the question has been answered somewhere kindly provide a link
Thank you for any help!!
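For context, in the linked example "SomeOperationAsync()" is just any method that returns a Task. A minimal sketch of that idea applied synchronously to the line above (the URL and WebClient usage are illustrative, not the asker's actual code):
bool success = false;
//Wrap the blocking call in a Task so we can wait on it with a timeout
var downloadTask = Task.Run(() =>
{
    using (var wc = new WebClient())
    {
        return wc.DownloadString("testurl");
    }
});
//Block the current thread for at most 5 seconds
if (downloadTask.Wait(TimeSpan.FromSeconds(5)))
{
    success = true; //completed in time
}
//Note: on timeout the download keeps running in the background; its result is simply ignored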
You should use Microsoft's Reactive Framework (aka Rx) - NuGet System.Reactive and add using System.Reactive.Linq; - then you can do this:
var downloadString =
    Observable
        .Using(() => new WebClient(), wc => Observable.Start(() => wc.DownloadString("testurl")))
        .Select(x => new { success = true, result = x });

var timeout =
    Observable
        .Timer(TimeSpan.FromSeconds(5.0))
        .Select(x => new { success = false, result = (string)null });

var operation = Observable.Amb(downloadString, timeout);

var output = await operation;

if (output.success)
{
    Console.WriteLine(output.result);
}
The first observable downloads your string. The second sets up a timeout. The third uses the Amb operator to take the result from whichever of the two input observables completes first.
Then we can await the third observable to get its value, and then it's a simple task to check what result you got.
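If the surrounding method must stay synchronous (as the asker wants), the same pipeline can also be consumed by blocking on it with Rx's Wait operator; a small sketch, not part of the original answer:
//Blocking variant: Wait() blocks the calling thread until the sequence produces its value
var output = Observable.Amb(downloadString, timeout).Wait();
if (output.success)
{
    Console.WriteLine(output.result);
}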

Dataflowblock stops updating UI but still runs the action?

This issue is really hard to debug: it doesn't always happen (and doesn't happen quickly enough that I can just debug the code easily), and it looks like no one out there has had a similar issue (I've googled for hours without finding anything related).
In short, my dataflow network works fine up to some point, until I find that the terminal block (which updates the UI) seems to stop working (no new data shows on the UI), whereas all the upstream dataflow blocks are still working fine. So it's as if there is some disconnection between the other blocks and the UI block.
Here is my detailed dataflow network, let's check out first before I'm going to explain more about the issue:
//the network graph first
[raw data block]
-> [switching block] -> [data counting block]
-> [processing block] -> [ok result block] -> [completion monitoring]
-> [not ok result block] -> [completion monitoring]
//in the UI code behind where I can consume the network and plug-in some other blocks for updating
//like this:
[ok result block] -> [ok result counting block]
[not ok result block] -> [other ui updating]
The block [ok result block] is a BroadcastBlock which pushes results to the [ok result counting block]. The issue I've partly described here is that this [ok result counting block] seems to be disconnected from [ok result block].
var options = new DataflowBlockOptions { EnsureOrdered = false };
var execOptions = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 80 };

//[raw data block]
var rawDataBlock = new BufferBlock<Input>(options);

//[switching block]
var switchingBlock = new TransformManyBlock<Input, Input>(e => new[] { e, null });

//[data counting block]
var dataCountingBlock = new BroadcastBlock<Input>(null);

//[processing block]
var processingBlock = new TransformBlock<Input, int>(async e =>
{
    //call another api to compute the result
    var result = await …;
    //rollback the input for later processing (some kind of retry)
    if (result < 0)
    {
        //per my logging, there is only one call dropping
        //in this case
        Task.Run(rollback);
    }
    //local function to rollback
    async Task rollback()
    {
        await rawDataBlock.SendAsync(e).ConfigureAwait(false);
    }
    return result;
}, execOptions);

//[ok result block]
var okResultBlock = new BroadcastBlock<int>(null, options);

//[not ok result block]
var notOkResultBlock = new BroadcastBlock<int>(null, options);

//[completion monitoring]
var completionMonitoringBlock = new ActionBlock<int>(e =>
{
    if (rawDataBlock.Completion.IsCompleted && processingBlock.InputCount == 0)
    {
        processingBlock.Complete();
    }
}, execOptions);

//connect the blocks to build the network
rawDataBlock.LinkTo(switchingBlock);
switchingBlock.LinkTo(processingBlock, e => e != null);
switchingBlock.LinkTo(dataCountingBlock, e => e == null);
processingBlock.LinkTo(okResultBlock, e => e >= 9);
processingBlock.LinkTo(notOkResultBlock, e => e < 9);
okResultBlock.LinkTo(completionMonitoringBlock);
notOkResultBlock.LinkTo(completionMonitoringBlock);
In the UI code behind, I plug in some other UI blocks to update the info. Here I'm using WPF but I think it does not matter here:
var uiBlockOptions = new ExecutionDataflowBlockOptions
{
    TaskScheduler = TaskScheduler.FromCurrentSynchronizationContext()
};

dataCountingBlock.LinkTo(new ActionBlock<Input>(e =>
{
    //these are properties in the VM class, which is bound to the UI (xaml view)
    RawInputCount++;
}, uiBlockOptions));

okResultBlock.LinkTo(new ActionBlock<int>(e =>
{
    //these are properties in the VM class, which is bound to the UI (xaml view)
    ProcessedCount++;
    OkResultCount++;
}, uiBlockOptions));

notOkResultBlock.LinkTo(new ActionBlock<int>(e =>
{
    //these are properties in the VM class, which is bound to the UI (xaml view)
    ProcessedCount++;
    PendingCount = processingBlock.InputCount;
}, uiBlockOptions));
I do have code monitoring the completion status of the blocks: rawDataBlock, processingBlock, okResultBlock, notOkResultBlock.
I also have other logging code inside the processingBlock to help diagnosing.
So, as I said, after a fairly long time (about 1 hour with about 600K items processed; actually this number says nothing about the issue, it could be random), the network still seems to run fine except that some counts (ok result, not ok result) are not updated, as if okResultBlock and notOkResultBlock were disconnected from the processingBlock OR disconnected from the UI blocks (which update the UI). I have ensured that the processingBlock is still working (no exception logged and the results are still written to file), the dataCountingBlock is still working well (the new count is updated on the UI), and none of processingBlock, okResultBlock, notOkResultBlock is completed (their completions are chained with .ContinueWith to a task which logs the status, and nothing was logged).
So it's really stuck there. I don't have any clue why it could stop working like that; this kind of thing only happens with a black-box library like TPL Dataflow. I know it may also be hard for you to diagnose and think through the possibilities. I'm just asking here for suggestions to solve this, any shared experience with similar issues, and possibly some guesses about what could cause this kind of issue in TPL Dataflow.
UPDATE:
I've successfully reproduced the bug one more time, having prepared some code beforehand to write down info to help debugging. The issue now comes down to this: the processingBlock somehow does not actually push/post/send any message to the linked blocks (including the okResultBlock and notOkResultBlock), AND even a new block (prepended with DataflowLinkOptions having Append set to false) linked to it could not receive any message (the result). As I said, the processingBlock does seem to still work fine (its delegate does run the code inside and produces result logging normally). So this is still a very strange issue.
In short, the problem now becomes: why can the processingBlock not send/post its messages to the other linked blocks? Is there any possible cause for that? How can I know if the blocks are linked successfully (after the call to .LinkTo)?
It's actually my fault: the processingBlock is indeed blocked, but it's blocked correctly and by design.
The processingBlock is blocked by 2 factors:
EnsureOrdered is true (the default), so the output is always queued in processed order.
There is at least one output result which cannot be pushed out (to some other block).
So if one output result cannot be pushed out, it becomes a blocking item, because all output results are queued in processed order. All output results processed after it are simply blocked (queued up) behind that first output result which cannot be pushed out.
In my case the special output result that cannot be pushed out is a null result. That null result can only be produced by some error (exception handling). I have 2 blocks, okResultBlock and notOkResultBlock, linked to the processingBlock, but both those blocks are filtered to let only non-null results go through. Sorry that my question does not reflect the exact code I have regarding the output type: in the question it is just a simple int, but actually it's a (nullable) class, and the actual linking code looks like this:
processingBlock.LinkTo(okResultBlock, e => e != null && e.Point >= 9);
processingBlock.LinkTo(notOkResultBlock, e => e != null && e.Point < 9);
So the null output result is blocked, and it consequently blocks all results processed after it (because of the option EnsureOrdered being true by default).
To fix this, I simply set EnsureOrdered to false (although this is not required to avoid the blocking, it's good in my case) and add one more block to consume the null output results (this is the most important part for avoiding the blocking):
processingBlock.LinkTo(DataflowBlock.NullTarget<Output>(), e => e == null);
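Putting the fix together, a minimal sketch (Output and Point stand in for the asker's actual nullable result type; the option values are illustrative):
//Allow results to be offered out of order so one stuck item can no longer hold up the output queue
//(EnsureOrdered defaults to true on ExecutionDataflowBlockOptions)
var execOptions = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 80,
    EnsureOrdered = false
};

//Route non-null results as before, and give null results somewhere to go so they never block the queue
processingBlock.LinkTo(okResultBlock, e => e != null && e.Point >= 9);
processingBlock.LinkTo(notOkResultBlock, e => e != null && e.Point < 9);
processingBlock.LinkTo(DataflowBlock.NullTarget<Output>(), e => e == null);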

Hangfire ContinueWithJob is stuck in awaiting state, though parent job has succeeded

I have a few jobs executed one after the other via ContinueJobWith<MyHandler>(parentJobId, x => x.DoWork()).
However, the second job is not getting processed and always sits in Awaiting state:
The job itself is like this:
Why can this happen, and where should I check for a resolution?
We are using Autofac as DI container, but we have our own JobActivator implementation because we have to deal with multitenancy.
We are using SQL Server 2019 for storage.
Hangfire version is 1.7.10
This is MVC 5 application
I've not seen any errors/exceptions in any logs or during debugging
After going through this I've added this to our Autofac registration
builder.RegisterType<BackgroundJobStateChanger>()
    .As<IBackgroundJobStateChanger>()
    .InstancePerLifetimeScope();
This made no difference.
This is how the jobs are executed:
var parentJobId = _backgroundJobClient.Schedule<Handler>(h => h.ConvertCertToTraining(certId, command.SetUpOneToOneRelationship), TimeSpan.FromSeconds(1));
var filesCopyJObId = _backgroundJobClient.ContinueJobWith<Handler>(parentJobId, h => h.CopyAttachedFiles());
_backgroundJobClient.ContinueJobWith<Handler>(filesCopyJObId, h => h.NotifyUser(command.CertificationToBeConvertedIds, _principal.GetEmail()));
All the parameters are either int, bool or string. If I enqueue the awaiting jobs by hand, they are executed without issues.
I've added Hangfire logging, but could not see any issues there: server starts, stops, jobs change status, but could not see any obvious errors there.
What other things should I consider, or where/how should I debug this?
From the looks of it, the first job with ID 216348 completed successfully, but your second job with ID 216349 is waiting on the parent ID of 216347. According to Hangfire documentation and experience, the parent ID should be that of the job you are waiting to finish before executing the second job.
According to the Hangfire documentation on ContinueJobWith, "Continuations are executed when its parent job has been finished". From your screenshots, it is not clear what's going on with job ID 216347. Once that job, 216347, completes, the job with ID 216349 should kick off. If you are expecting 216349 to start after 216348 finishes, check your code and make sure the correct parent ID is passed to the second job.
Update
Based on this thread, add the ContinuationsSupportAttribute to GlobalJobFilters.Filters where you configure the Hangfire service. This should make your Hangfire instance aware of continuation jobs.
GlobalJobFilters.Filters.Add(new ContinuationsSupportAttribute());
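For reference, a minimal sketch of where that registration typically sits, assuming a standard OWIN startup (the connection string name is illustrative):
//In the OWIN Startup.Configuration method, before starting the Hangfire server
GlobalConfiguration.Configuration.UseSqlServerStorage("HangfireConnection");
GlobalJobFilters.Filters.Add(new ContinuationsSupportAttribute());
app.UseHangfireServer();
app.UseHangfireDashboard();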
During the investigation, it turned out that we were replacing JobFilterProviderCollection with our own collection:
var filterProviderCollection = new JobFilterProviderCollection
{
    new MyFilterProvider(...)
};
var backgroundJobClient = new BackgroundJobClient(JobStorage.Current, filterProviderCollection);
MyFilterProvider looked like this:
public IEnumerable<JobFilter> GetFilters(Job job)
{
    return new JobFilter[]
    {
        new JobFilter(new HangfireTenantFilter(_tenantDetail, _principal), JobFilterScope.Global, null),
        new JobFilter(new HangfireFunctionalityFilter(_functionalityFilter), JobFilterScope.Global, null),
    };
}
It turned out that the code handling continuations only took filters from this filter collection, so ContinuationsSupportAttribute was not executed at the right time. Re-adding the default Hangfire filters from GlobalJobFilters.Filters fixed the situation:
public IEnumerable<JobFilter> GetFilters(Job job)
{
    var customFilters = new List<JobFilter>()
    {
        new JobFilter(new HangfireTenantFilter(_tenantDetail, _principal), JobFilterScope.Global, null),
        new JobFilter(new HangfireFunctionalityFilter(_functionalityFilter), JobFilterScope.Global, null),
    };
    customFilters.AddRange(GlobalJobFilters.Filters);
    return customFilters;
}

Create a trigger to run a long-running function C#

I have a function that is supposed to run every night at 12 AM and do some job;
it usually takes 2 hours.
I want to create a trigger that calls it.
So I created an Azure Function app with a time trigger that makes an HTTP request to my controller, which calls my function.
The controller action below was created just for testing:
[HttpGet]
public async Task<bool> updateFromRegAdmin()
{
    try
    {
        RegEditApi_Service.retrieveRegAdminApiCredentials();
        return true;
    }
    catch (Exception e)
    {
        Logger.writeToLog(Logger.LOG_SEVERITY_TYPE.Error, "", "updateFromRegAdmin ", e.Message);
        return false;
    }
}
So, as I said, the function "retrieveRegAdminApiCredentials" runs for 2 hours,
and the problem is that the request times out after a few minutes...
So how can I create a request that just triggers the inner function and lets it run in the background?
By the way, I can't create the trigger on the server without an HTTP request, because my company has scaled servers on Azure (it would run my trigger multiple times and create DB duplicates).
My previous solution to that was:
public class JobScheduler
{
    public static void Start()
    {
        IScheduler scheduler = StdSchedulerFactory.GetDefaultScheduler();
        scheduler.Start();
        IJobDetail job = JobBuilder.Create<GetExchangeRates>().Build();
        ITrigger trigger = TriggerBuilder.Create()
            .WithDailyTimeIntervalSchedule(s =>
                s.WithIntervalInHours(24)
                 .OnEveryDay()
                 .StartingDailyAt(TimeOfDay.HourAndMinuteOfDay(00, 00)))
            .Build();
        scheduler.ScheduleJob(job, trigger);
    }
}

public class GetExchangeRates : IJob
{
    public void Execute(IJobExecutionContext context)
    {
        Random random = new Random();
        int randomNumber = random.Next(100000, 900000);
        Thread.Sleep(randomNumber);
        RegEditApi_Service.retrieveRegAdminApiCredentials();
    }
}
If I understand you correctly, what you have is an Azure Function timer trigger that sends an HTTP request to your server to run "RegEditApi_Service.retrieveRegAdminApiCredentials()".
The problem is that your function times out. To solve this, the HTTP endpoint behind "retrieveRegAdminApiCredentials()" should return immediately on accepting the request and run the actual work in the background (see the sketch below).
If you need some return value from the server, have the server put a message on some queue (like an Azure Storage queue) and have another Azure Function that listens to this queue and accepts the message.
If the result of the long operation is relatively small, you can just put the result in the message. Otherwise you will need to perform some follow-up operation, but that operation should be much quicker, because you have already performed the long-running operation and kept the answer; now you just retrieve it, and possibly do some cleanup.
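A minimal sketch of the "accept and return immediately" part, assuming an ASP.NET (System.Web) host; QueueBackgroundWorkItem is one option here, with the caveat that queued work can be lost if the app pool recycles:
[HttpGet]
public bool updateFromRegAdmin()
{
    //Queue the long-running job on a background thread and return immediately,
    //so the Azure Function's HTTP call no longer times out
    HostingEnvironment.QueueBackgroundWorkItem(ct =>
    {
        try
        {
            RegEditApi_Service.retrieveRegAdminApiCredentials();
        }
        catch (Exception e)
        {
            Logger.writeToLog(Logger.LOG_SEVERITY_TYPE.Error, "", "updateFromRegAdmin ", e.Message);
        }
    });
    return true; //means "accepted", not "finished"
}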
You can also look into Azure Durable Functions; they are intended for this use case, but are still in preview, and I'm not sure how much benefit they will give you:
https://learn.microsoft.com/en-us/azure/azure-functions/durable-functions-overview#pattern-3-async-http-apis
It looks like you need a dedicated component able to schedule and execute a queue of tasks. There are nice frameworks for that, but if you dislike those for whatever reason, then make sure you initiate or reuse an idle thread and run the long execution there. Your API can then immediately return something like 200 OK, meaning that the process has started successfully.
Key idea: separate your threads explicitly. That's actually quite challenging.
Azure Functions on the Consumption plan time out after 5 minutes by default, and this can be raised to at most 10 minutes via the functionTimeout setting in host.json.
If your function is on a Consumption plan, you can't go beyond that limit. You can if you host your function on an App Service plan.

Pay with Amazon behaving async

I have integrated Pay with Amazon into my web app, but I have determined that capturing funds only works when I step through the code in the debugger, and does not happen if I don't have a breakpoint. To me, this indicates that a pause is necessary. I am using recurring payments. The relevant section of code is below:
...
//make checkout object
AmazonAutomaticSimpleCheckout asc = new AmazonAutomaticSimpleCheckout(billingAgreeementId);

//capture
CaptureResponse cr = asc.Capture(authId, amount, 1);

//check if capture was successful
if (cr.CaptureResult.CaptureDetails.CaptureStatus.State == PaymentStatus.COMPLETED)
{
    ...
    //give the user the things they paid for in the database
    ...
    return "success";
}
...
So, if I have a breakpoint at the capture line under //capture, then the function returns success. If I do not have the breakpoint, I get a runtime exception System.NullReferenceException: Object reference not set to an instance of an object. on the following if statement.
To me, this implies that I should be able to await the capture method.
Also note that the Capture(...) method calls the CaptureAction(...) method, just as the C# sample does.
//Invoke the Capture method
public CaptureResponse Capture(string authId, string captureAmount, int indicator)
{
    return CaptureAction(propertiesCollection, service, authId, captureAmount, billingAgreementId, indicator, null, null);
}
How can I await the capture call? Am I forgetting to pass a parameter to indicate that it should execute the operation immediately?
It seems, after some experimentation, that a function which essentially achieves the wait I was performing manually with a breakpoint is CheckAuthorizationStatus(), which is also in the C# sample provided with the documentation.
So the fixed code simply calls CheckAuthorizationStatus() before calling Capture(). CheckAuthorizationStatus() apparently loops until the state of the authorization changes. This seems somewhat kludgey to me, but it appears to be how the Pay with Amazon APIs are meant to be used, as best I can tell. Corrected code below:
//make checkout object
AmazonAutomaticSimpleCheckout asc = new AmazonAutomaticSimpleCheckout(billingAgreeementId);

//capture
CaptureResponse cr;
GetAuthorizationDetailsResponse gadr = asc.CheckAuthorizationStatus(authId);
cr = asc.Capture(authId, amount, 1);
//gadr = asc.CheckAuthorizationStatus(authId);

//check if capture was successful
if (cr.CaptureResult.CaptureDetails.CaptureStatus.State == PaymentStatus.COMPLETED)
{
    ...
    return "success";
}
When using asynchronous mode you will typically rely on a couple of ways of handling it. AuthorizeOnBillingAgreement will return an Amazon authorization Id (e.g. P01-1234567-1234567-A000001). Once you have the authorization Id you can:
1. Poll GetAuthorizationDetails - this returns the authorization details, which contain the "State" of the authorization. When the state is "Open" you can make the Capture API call, passing in the authorization Id.
2. Wait for the Instant Payment Notification (IPN). If you have an IPN handler, you can watch for it and make the Capture API call as described in step 1. The IPN is usually sent within 60 seconds and it carries the final processing status (Open or Declined).
You shouldn't add an arbitrary pause. You should always check the state of the authorization before making the capture. Even if the payment status is completed, you still need to check the state.
Disclaimer:
I don't implement recurring payments, only a straightforward payment - though just reading the documentation it seems similar or at least there is a synchronous option.
Because it meets my requirements, I opt for the synchronous process. In essence treating it like a "payment gateway" - give me the result "now" and I'll deal with whatever result.
Additionally, AUTH and CAPTURE in one step - again, this is based on one's operational requirement/s.
The 2 related items are:
CaptureNow=true
TransactionTimeout=0
A value of zero always returns a synchronous Open or Declined
You'll get (synchronously):
AuthorizeResult.AuthorizationDetails, which will have
    AmazonAuthorizationId, AuthorizationAmount, etc.
AuthorizeResult.AuthorizationDetails.IdList
    null on failure
    otherwise it contains the capture id (if the capture was successful)
AuthorizeResult.AuthorizationDetails.IdList.member - I've only seen this contain 1 item (the CaptureId)
You can then use the CaptureId to call GetCaptureDetails and do what you need to do after parsing the GetCaptureDetailsResponse.
Again, above is based on Payments API flow (not recurring Payments/Billing Agreement) so I hope it at least helps/gives you an avenue/idea for testing the synchronous option.
