Hangfire ContinueJobWith is stuck in Awaiting state, though parent job has succeeded - C#

I have a few jobs executed one after the other via ContinueJobWith<MyHandler>(parentJobId, x => x.DoWork()).
However, the second job is not getting processed and always sits in the Awaiting state.
Why can this happen, and where should I look for a resolution?
We are using Autofac as the DI container, but we have our own JobActivator implementation because we have to deal with multitenancy.
We are using SQL Server 2019 for storage.
Hangfire version is 1.7.10
This is an MVC 5 application.
I've not seen any errors/exceptions in any logs or during debugging
After going through this, I added the following to our Autofac registration:
builder.RegisterType<BackgroundJobStateChanger>()
    .As<IBackgroundJobStateChanger>()
    .InstancePerLifetimeScope();
This made no difference.
This is how the jobs are executed:
var parentJobId = _backgroundJobClient.Schedule<Handler>(h => h.ConvertCertToTraining(certId, command.SetUpOneToOneRelationship), TimeSpan.FromSeconds(1));
var filesCopyJobId = _backgroundJobClient.ContinueJobWith<Handler>(parentJobId, h => h.CopyAttachedFiles());
_backgroundJobClient.ContinueJobWith<Handler>(filesCopyJobId, h => h.NotifyUser(command.CertificationToBeConvertedIds, _principal.GetEmail()));
All the parameters are either int, bool or string. If I enqueue the awaiting jobs by hand, they are executed without issues.
I've added Hangfire logging, but could not see any issues there: the server starts and stops, and jobs change status, with no obvious errors.
What other things should I consider, and where/how should I debug this?

From the looks of it, the first job with ID 216348 completed successfully, but your second job with ID 216349 is waiting on the parent ID 216347. According to the Hangfire documentation and my experience, the parent ID should be that of the job you are waiting to finish before the second job executes.
According to the Hangfire documentation on ContinueJobWith, "Continuations are executed when its parent job has been finished". From your screenshots, it is not clear what's going on with job ID 216347. Once job 216347 completes, the job with ID 216349 should kick off. If you are expecting 216349 to start after 216348 finishes, check your code and make sure the correct parent ID is passed to the second job.
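For reference, here is a minimal sketch of a chain where each continuation waits on the job created immediately before it (Step1/Step2/Step3 are hypothetical methods):
var firstId = BackgroundJob.Enqueue(() => Step1());
var secondId = BackgroundJob.ContinueJobWith(firstId, () => Step2()); // waits on firstId
BackgroundJob.ContinueJobWith(secondId, () => Step3()); // waits on secondId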
Update
Based on this thread, add the ContinuationsSupportAttribute to GlobalJobFilters.Filters where you configure the Hangfire service. This should make your Hangfire instance aware of continuation jobs.
GlobalJobFilters.Filters.Add(new ContinuationsSupportAttribute());

During the investigation, it turned out that we were replacing JobFilterProviderCollection with our own collection:
var filterProviderCollection = new JobFilterProviderCollection
{
    new MyFilterProvider(...)
};
var backgroundJobClient = new BackgroundJobClient(JobStorage.Current, filterProviderCollection);
MyFilterProvider looked like this:
public IEnumerable<JobFilter> GetFilters(Job job)
{
    return new JobFilter[]
    {
        new JobFilter(new HangfireTenantFilter(_tenantDetail, _principal), JobFilterScope.Global, null),
        new JobFilter(new HangfireFunctionalityFilter(_functionalityFilter), JobFilterScope.Global, null),
    };
}
It turned out that the code handling continuations only took filters from this filter collection, so ContinuationsSupportAttribute was never executed at the right time. Re-adding the default Hangfire filters from GlobalJobFilters.Filters fixed the situation:
public IEnumerable<JobFilter> GetFilters(Job job)
{
    var customFilters = new List<JobFilter>()
    {
        new JobFilter(new HangfireTenantFilter(_tenantDetail, _principal), JobFilterScope.Global, null),
        new JobFilter(new HangfireFunctionalityFilter(_functionalityFilter), JobFilterScope.Global, null),
    };
    customFilters.AddRange(GlobalJobFilters.Filters);
    return customFilters;
}
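An alternative sketch, assuming your Hangfire version lets JobFilterProviderCollection take multiple providers (GlobalJobFilters.Filters itself implements IJobFilterProvider), is to register the global collection as its own provider rather than copying its filters:
var filterProviderCollection = new JobFilterProviderCollection(
    GlobalJobFilters.Filters, // keeps the built-in filters, including ContinuationsSupportAttribute
    new MyFilterProvider(...));
var backgroundJobClient = new BackgroundJobClient(JobStorage.Current, filterProviderCollection);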

Related

How to simulate long running process in .NET API Controller Correctly

I am trying to create a unit test to simulate my API being called by many people at the same time.
I've got this code in my unit test:
var tasks = new List<Task>();
for (int i = 0; i < 10; i++)
{
    var id = i; // must assign to new variable inside for loop
    var t = Task.Run(async () =>
    {
        response = await Client.GetAsync("/api/test2?id=" + id);
        Assert.AreEqual(HttpStatusCode.OK, response.StatusCode);
    });
    tasks.Add(t);
}
await Task.WhenAll(tasks);
Then in my controller I am putting in a Thread.Sleep.
But when I do this, the total time for all tests to complete is 10 x the sleep time.
I expected all the calls to be made and to have ended up at the Thread.Sleep call at more or less the same time.
But it seems the API calls are actually made one after the other.
The reason I am testing parallel API calls is that I want to reproduce a deadlock issue in my data repository when using SQLite, which has only happened when more than one user uses my website at the same time.
I have never been able to simulate this, so I thought I'd create a unit test, but the code I have now doesn't seem to execute the calls in parallel.
My plan with the Thread.Sleep calls was to put a couple in the Controller method to make sure all requests end up between certain code blocks at the same time.
Do I need to set a max number of parallel requests on the web server, or am I doing something obviously wrong?
Thanks in advance.
Update 1:
I forgot to mention I get the same results with await Task.Delay(1000); and many similar alternatives.
Not sure if it's clear but this is all running within a unit test using NUnit.
And the "Web Server" and Client is created like this:
var builder = new WebHostBuilder().UseStartup<TStartup>();
Server = new TestServer(builder);
Client = Server.CreateClient();
You can use Task.Delay(timeInMilliseconds) instead. Thread.Sleep does not release the thread, so it cannot process other tasks while waiting for the result.
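For illustration, a minimal sketch of what the controller action could look like with Task.Delay (the route and names are assumptions based on the /api/test2 call in the test):
[HttpGet("/api/test2")]
public async Task<IActionResult> Test2(int id)
{
    // await Task.Delay returns the thread to the pool while waiting,
    // so other requests can be processed concurrently
    await Task.Delay(1000);
    return Ok(id);
}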
The HttpClient class in .NET has a limit of two concurrent requests to the same server by default, which I believe might be causing the issue in this case. Usually, this limit can be overridden by creating a new HttpClientHandler and using it as an argument in the constructor:
new HttpClient(new HttpClientHandler
{
    MaxConnectionsPerServer = 100
})
But because the clients are created using the TestServer method, that gets a little more complicated. You could try changing the ServicePointManager.DefaultConnectionLimit property like below, but I'm not sure if that will work with the TestServer:
System.Net.ServicePointManager.DefaultConnectionLimit = 100;
That being said, I believe using Unit Tests for doing load testing is not a good approach and recommend looking into tools specific for load testing.
Reference for the ServicePointManager class
This blog post also has more in-depth information about the subject
I found the problem with my test.
It was not the TestServer or the client code; it was the database code.
In my controller I was starting an NHibernate transaction, and that was blocking the requests because it put a lock on the table being updated.
This is correct behavior, so I had to change my code a bit to not start a transaction automatically, but rather leave that up to the calling code to manage.
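As an illustration of that change, a minimal sketch (session, repository, and entity are hypothetical names): the calling code now opens and commits the NHibernate transaction instead of the repository starting one implicitly.
using (var tx = session.BeginTransaction())
{
    // the repository no longer begins its own transaction internally
    repository.Update(entity);
    tx.Commit(); // the table lock is released when the transaction completes
}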

Hangfire persist local scope

I would like to use Hangfire to create a long-running fire-and-forget task. If the web server dies and the background job is retried, I would like it to pick up where it left off.
In the example below, let's say foo.RetryCount reaches 3 -> the server restarts -> Hangfire reruns the job. In this case I would only like to run the task 7 more times (based on MaxAttempts), instead of restarting from zero.
I thought Hangfire persisted the arguments passed to the method in their current state, but as far as I can tell they are reset.
var foo = new Foo { RetryCount = 0, MaxAttempts = 10 };
BackgroundJob.Enqueue(() => RequestAndRetryOnFailure(foo));

void RequestAndRetryOnFailure(Foo foo)
{
    // make request to server; if it fails, wait for a
    // while and try again later, unless foo.MaxAttempts is reached
    foo.RetryCount++;
}
I use Hangfire extensively for a lot of different actions and have a constant need to reschedule a job that started but couldn't execute due to certain constraints.
The persistence you are referring to happens in the serialized version of the job that's enqueued, but it is no longer kept once the job executes.
What I would recommend is to schedule the job to execute again after a certain amount of time if the server is not available. This will also help restart the job if the job is scheduled and Hangfire reboots.
var foo = new Foo { RetryCount = 0, MaxAttempts = 10 };
BackgroundJob.Enqueue(() => RequestAndRetryOnFailure(foo));

void RequestAndRetryOnFailure(Foo foo)
{
    // make request to server; if it fails, wait for a
    // while and try again later, unless foo.MaxAttempts is reached
    if (request to server failed) // pseudocode for the failed-request check
    {
        foo.RetryCount++;
        if (foo.RetryCount < foo.MaxAttempts)
            BackgroundJob.Schedule(() => RequestAndRetryOnFailure(foo), TimeSpan.FromSeconds(30));
        else
            return; // do nothing
    }
}

Create a trigger to run a long time executed function C#

I have a function that is supposed to run every night at 12 AM and do some job; usually it takes 2 hours.
I want to create a trigger that calls it.
So I created an Azure Function app with a time trigger that makes an HTTP request to my controller, which calls my function.
The controller function below was created just for testing.
[HttpGet]
public async Task<bool> updateFromRegAdmin()
{
    try
    {
        RegEditApi_Service.retrieveRegAdminApiCredentials();
        return true;
    }
    catch (Exception e)
    {
        Logger.writeToLog(Logger.LOG_SEVERITY_TYPE.Error, "", "updateFromRegAdmin ", e.Message);
        return false;
    }
}
So, as I said, the function retrieveRegAdminApiCredentials runs for 2 hours, and the problem is that the request times out after a few minutes.
How can I create a request that just triggers the inner function and lets it run in the background?
By the way, I can't create a trigger on the server without an HTTP request, because my company has scaled servers on Azure (it would run my trigger multiple times and create DB duplicates).
My previous solution to that was this:
public class JobScheduler
{
    public static void Start()
    {
        IScheduler scheduler = StdSchedulerFactory.GetDefaultScheduler();
        scheduler.Start();

        IJobDetail job = JobBuilder.Create<GetExchangeRates>().Build();
        ITrigger trigger = TriggerBuilder.Create()
            .WithDailyTimeIntervalSchedule(s =>
                s.WithIntervalInHours(24)
                 .OnEveryDay()
                 .StartingDailyAt(TimeOfDay.HourAndMinuteOfDay(00, 00)))
            .Build();

        scheduler.ScheduleJob(job, trigger);
    }
}

public class GetExchangeRates : IJob
{
    public void Execute(IJobExecutionContext context)
    {
        // random startup delay (100-900 seconds) so scaled-out instances
        // don't all fire at the same moment and create DB duplicates
        Random random = new Random();
        int randomNumber = random.Next(100000, 900000);
        Thread.Sleep(randomNumber);

        RegEditApi_Service.retrieveRegAdminApiCredentials();
    }
}
If I understand you correctly, what you have is an Azure Functions timer trigger that sends an HTTP request to your server to run RegEditApi_Service.retrieveRegAdminApiCredentials().
The problem is that your function times out. To solve this, the HTTP endpoint behind retrieveRegAdminApiCredentials() should return immediately after accepting the request.
If you need some return value from the server, you should have the server put a message on some queue (like an Azure Storage queue) and have another Azure Function that listens to this queue and accepts the message.
If the result of the long operation is relatively small, you can just have the result in the message. Otherwise, you would need to perform some operation, but this operation should be much quicker, because you have already performed the long running operation, and kept the answer, so now you will just retrieve it, and possibly do some cleanup.
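For example, here is a minimal sketch of posting the result to an Azure Storage queue once the long operation finishes (this uses the classic WindowsAzure.Storage package; the queue name and message content are assumptions):
var account = CloudStorageAccount.Parse(connectionString);
var queueClient = account.CreateCloudQueueClient();
var queue = queueClient.GetQueueReference("reg-admin-results");
queue.CreateIfNotExists();
// small results can travel in the message itself
queue.AddMessage(new CloudQueueMessage("retrieveRegAdminApiCredentials: done"));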
You can also look into Azure Durable Functions; they are intended for this use case, but are still in preview, and I'm not sure how much benefit they will give you:
https://learn.microsoft.com/en-us/azure/azure-functions/durable-functions-overview#pattern-3-async-http-apis
It looks like you need a dedicated component that can schedule and execute a queue of tasks. There are nice frameworks for that, but if you dislike those for whatever reason, then make sure you initiate or reuse an idle thread and run the long execution there. Your API will then return something like 200 OK, meaning the process has started successfully.
Key idea: keep your threads explicitly separated. That's actually quite challenging.
Azure Functions by default run for a maximum of 15 minutes (maybe 5, too lazy to check the documentation right now :-) ).
If your function is on a Consumption plan, you can't increase this time. You can if you host your function on an App Service plan.

How can I check if a job exists in Hangfire before I delete it?

Hangfire hangs if you try to delete a job that doesn't exist, i.e. if jobId isn't in the Hangfire.Job table:
BackgroundJob.Delete(jobId);
Is there any way of checking if a job exists before trying to delete it?
Try using the monitoring API (JobStorage.Current.GetMonitoringApi()); it provides ways to get job details or lists of jobs.
Complete code example:
var monitoringApi = JobStorage.Current.GetMonitoringApi();
var deletedJobs = monitoringApi.DeletedJobs(0, 10);
If you want to get queued items:
// If no queue name was defined somewhere, probably this will be "default".
// If no items have been queued, then the queue won't exist, and it will error here.
var queue = monitoringApi.Queues().First().Name;
var enqueuedJobs = monitoringApi.EnqueuedJobs(queue, 0, 10);
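To check for one specific job before deleting it, here is a sketch using JobDetails (with SQL Server storage this returns null for an unknown id; verify the behavior for your storage provider):
var monitoringApi = JobStorage.Current.GetMonitoringApi();
var details = monitoringApi.JobDetails(jobId);
if (details != null)
{
    BackgroundJob.Delete(jobId);
}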
There is no need to do this anymore, as the bug causing Hangfire to hang has been fixed.

MVC 5 Shared Long Running Task

I have a long-running action/method that is called when a user clicks a button in an internal MVC 5 application. The button is shared by all users, meaning a second person can come in and click it seconds after it has been clicked. The long-running task updates a shared task window for all clients via SignalR.
Is there a recommended way to check if the task is still busy and simply notify the user that it's still working? Is there another recommended approach? (I can't use an external Windows service for the work.)
Currently what I am doing seems like a bad idea, or I could be wrong and it's feasible. See below for a sample of what I am doing.
public static Task WorkerTask { get; set; }

public JsonResult SendData()
{
    if (WorkerTask == null)
    {
        WorkerTask = Task.Factory.StartNew(async () =>
        {
            // Do the 2-15 minute long running job
        });
        WorkerTask = null;
    }
    else
    {
        TempData["Message"] = "Data is already being exported. Please see task window for the status.";
    }
    return Json(Url.Action("Export", "Home"), JsonRequestBehavior.AllowGet);
}
I don't think what you're doing will work at all. I see three issues:
1. You are storing the WorkerTask on the controller (I think). A new controller is created for every request. Therefore, a new WorkerTask will always be created.
2. If #1 weren't true, you would still need to wrap the instantiation of WorkerTask in a lock, because multiple clients could reach the WorkerTask == null check at the same time.
3. You shouldn't have long-running tasks in your web application. The app pool could restart at any time, killing your WorkerTask.
If you want to skip the best practices advice of "don't do long running work in your web app", you could use the HostingEnvironment.QueueBackgroundWorkItem introduced in .NET 4.5.2 to kick off the long running task. You could store a variable in the HttpApplication.Cache to indicate whether the long running process has been kicked off.
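Here is a minimal sketch of that approach, using a static flag with Interlocked instead of HttpApplication.Cache (which also sidesteps the race described in #2); the names are illustrative:
private static int _exportRunning; // 0 = idle, 1 = running

public JsonResult SendData()
{
    // atomically flip the flag so two concurrent requests can't both start the job
    if (Interlocked.CompareExchange(ref _exportRunning, 1, 0) == 0)
    {
        HostingEnvironment.QueueBackgroundWorkItem(cancellationToken =>
        {
            try
            {
                // Do the 2-15 minute long running job, observing cancellationToken
            }
            finally
            {
                Interlocked.Exchange(ref _exportRunning, 0);
            }
        });
    }
    else
    {
        TempData["Message"] = "Data is already being exported. Please see task window for the status.";
    }
    return Json(Url.Action("Export", "Home"), JsonRequestBehavior.AllowGet);
}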
This solution has more than a few issues (it won't work in a web farm, the app pool could die, etc.). A more robust solution would be to use something like Quartz.net or Hangfire.
