I have implemented a recurring job that needs to run every minute. Now and then the job has a hiccup, because it involves an API call that can take a bit longer to respond. As a result, the job is enqueued a second time even though the previous run has not finished.
My Question:
How do I prevent a Hangfire job from running if another instance of the same job is already running?
Thank you!
Related
Is it possible to send a heartbeat to Hangfire (Redis storage) to tell the system that the process is still alive? At the moment I set the InvisibilityTimeout to TimeSpan.MaxValue to prevent Hangfire from restarting the job. But if the process fails or the server restarts, the job will never be removed from the list of running jobs. So my idea was to remove the large timeout and send a kind of heartbeat instead. Is this possible?
I found https://discuss.hangfire.io/t/hangfire-long-job-stop-and-restart-several-time/4282/2 which deals with how to keep a long-running job alive in Hangfire.
The User zLanger says that jobs are considered dead and restarted once you ...
[...] are hitting hangfire’s invisibilityTimeout. You have two options.
increase the timeout to more than the job will ever take to run
have the job send a heartbeat to let hangfire’s know it’s still alive.
That's not new to you. But interestingly, the follow-up question there is:
How do you implement heartbeat on job?
This remains unanswered there, a hint that your problem is really not trivial.
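To make the heartbeat idea concrete, here is a minimal sketch of a lease with a short timeout that a background thread keeps renewing. It is written in Python purely as an illustration and does not use any Hangfire API. `LeaseStore` is a hypothetical in-memory stand-in for a shared store such as Redis, and `run_with_heartbeat`, `ttl` and `interval` are names I made up for this example.

```python
import threading
import time


class LeaseStore:
    """In-memory stand-in for a shared store such as Redis (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        self._leases = {}  # job_id -> (owner, expires_at)

    def acquire(self, job_id, owner, ttl):
        """Take or renew the lease if it is free, expired, or already ours."""
        with self._lock:
            now = self._clock()
            holder = self._leases.get(job_id)
            if holder is not None and holder[0] != owner and holder[1] > now:
                return False  # another owner holds a lease that has not expired
            self._leases[job_id] = (owner, now + ttl)
            return True

    def release(self, job_id, owner):
        """Drop the lease, but only if we are the current owner."""
        with self._lock:
            if self._leases.get(job_id, (None, 0))[0] == owner:
                del self._leases[job_id]


def run_with_heartbeat(store, job_id, owner, work, ttl=3.0, interval=1.0):
    """Run work() while a background thread keeps renewing the lease."""
    if not store.acquire(job_id, owner, ttl):
        return False  # another instance is still alive: skip this run
    stop = threading.Event()

    def beat():
        # Event.wait returns False on timeout, so this renews every `interval`
        # seconds until stop is set.
        while not stop.wait(interval):
            store.acquire(job_id, owner, ttl)

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    try:
        work()
    finally:
        stop.set()
        thread.join()
        store.release(job_id, owner)
    return True
```

The point of the design is that the timeout can stay short: a healthy job keeps extending its lease, while a crashed process or restarted server simply stops renewing, so the lease expires on its own and the job no longer looks like it is running forever.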
I have never handled long-running jobs in Hangfire, but I know the problem from other queuing systems like the former Sun Grid Engine, which is how I got interested in your question.
Back in the day, I had exactly your problem with Sun Grid Engine, and the department's computer guru told me that, according to some mathematical queuing theory, one should avoid long-running jobs at any cost (I will try to contact him and find the reference to the book he quoted). His idea is maybe worth sharing with you:
If you have some job which takes longer than the tolerated maximal running time of the queuing system, do not submit the job itself, but rather multiple calls of a wrapper script which is able to (1) start, (2) freeze-stop, (3) unfreeze-continue the actual task.
This stop-continue can indeed be a suspend and resume at the operating-system level (CTRL+Z and fg, respectively, in Linux); see e.g. unix.stackexchange.com on that issue.
In practice, I had the binary myMonteCarloExperiment.x and the wrapper script myMCjobStarter.sh. The maximum compute time I had was a day. I would fill the queue with hundreds of calls of the wrapper script, with the boundary condition that only one of them should be running at a time. The script would check whether a process myMonteCarloExperiment.x had already been started anywhere on the compute cluster; if not, it would start an instance. If there was a suspended process, the wrapper script would resume it, let it run for 23 hours and 55 minutes, and then suspend it again. In any other case, the wrapper script would report an error.
This approach does not implement a job heartbeat, but it does indeed run a lengthy job. It also keeps the queue administrator happy, because no job ever exceeds the time limit and the job logs of Hangfire do not have to be cleaned up.
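The suspend and resume part of such a wrapper can be sketched with the POSIX signals SIGSTOP and SIGCONT. This is a simplified Python illustration, not my original shell script: it assumes a Unix-like system and keeps the process handle within a single wrapper process, whereas the real script had to locate the task on the cluster on every call. The helper name `run_slice` is made up for this example.

```python
import os
import signal
import subprocess


def suspend(pid):
    """Freeze the process; it keeps its state but gets no CPU time."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    """Let a suspended process continue where it stopped."""
    os.kill(pid, signal.SIGCONT)


def run_slice(proc, budget_seconds):
    """Resume proc, let it run for at most budget_seconds, then suspend it.

    Returns True if the task finished within this slice.
    """
    resume(proc.pid)  # harmless if the process was not suspended
    try:
        proc.wait(timeout=budget_seconds)
        return True
    except subprocess.TimeoutExpired:
        suspend(proc.pid)
        return False
```

Each queued wrapper call would then amount to one `run_slice` with a budget slightly below the queue's maximum run time, repeated until it returns True.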
Further references
How to prevent a Hangfire recurring job from restarting after 30 minutes of continuous execution seems to be a good read
I am using Quartz.net.
I have configured the job with the DisallowConcurrentExecution attribute, because I want only a single instance of that job to execute at a time.
I have configured a trigger that fires every 10 seconds, but in some situations my job takes more than a minute to complete. Once this happens, the last execution time and the next execution time are no longer shown correctly; they still refer to the old times.
I am new to Quartz, but I understand that, because of the attribute, the thread pool may keep the job queued and start a new instance once the running one completes. Why, then, are the execution times not maintained properly?
Please help.
Double-posted here: https://github.com/quartznet/quartznet/issues/173
This works as designed. Quartz considers your trigger misfired as it didn't run when it was supposed to (job's concurrent execution protection prohibited it). You need to tweak your misfire handling configuration.
http://www.quartz-scheduler.net/documentation/quartz-2.x/tutorial/more-about-triggers.html
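To illustrate what a misfire policy decides, here is a small Python sketch that computes the next fire time of a trigger that was blocked past its scheduled time. It models the idea only. The policy names `fire_now` and `skip_to_next` are labels invented for this example, not Quartz identifiers, and the actual instructions available per trigger type are described in the documentation linked above.

```python
import math


def next_fire_time(scheduled, interval, now, policy):
    """Return when a trigger that was due at `scheduled` should fire next.

    `now` is the moment the blocked job finally becomes free to run.
    """
    if now <= scheduled:
        return scheduled  # not late, nothing to handle
    if policy == "fire_now":
        return now  # run the late firing once, immediately
    if policy == "skip_to_next":
        # Discard the late firings and move to the first slot on the
        # original grid that is not in the past.
        slots = math.ceil((now - scheduled) / interval)
        return scheduled + slots * interval
    raise ValueError(f"unknown policy: {policy}")
```

With a 10-second interval, a firing due at second 10, and a job that stays busy until second 75, `fire_now` gives 75 while `skip_to_next` gives 80.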
I created a job that implements IStatefulJob and according to the quartz docs
"if a job is stateful, and a trigger attempts to 'fire' the job while it is already
executing, the trigger will block (wait) until the previous execution completes"
Is there any way to remove the block and kill the newly fired instance of the job?
The job I am running can have wildly different run times based on the amount of data behind it and I am concerned that if we have a number of jobs waiting to run that it could have a negative effect...
Thanks
Unfortunately no. As a job implementor you are responsible for making sure that the job keeps track of whether it has reached its time limit of 'good behavior'. Normally there's no need, as jobs take a somewhat expected time to complete.
The same goes for when you want to interrupt all jobs in the scheduler: you need to implement IInterruptableJob and set a flag that your main job loop watches.
You can always rethink the design. It shouldn't be a problem to queue the same job, as it has the same duty to do. With misfire instructions you can configure misfired (queued too long) instances to be discarded and wait for the next fire time.
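The pattern of a job that watches an interrupt flag and its own time limit can be sketched as follows. This is a language-neutral illustration in Python rather than Quartz.NET code. The class `InterruptibleJob` and its methods are hypothetical names; they only mirror the idea of an interrupt method setting a flag that the main loop checks.

```python
import threading
import time


class InterruptibleJob:
    """A job whose main loop checks an interrupt flag and its own deadline."""

    def __init__(self, time_limit):
        self.time_limit = time_limit  # seconds the job allows itself
        self._interrupted = threading.Event()

    def interrupt(self):
        """Called from outside (e.g. by a scheduler) to request a stop."""
        self._interrupted.set()

    def execute(self, items, process):
        """Process items until done, interrupted, or out of time.

        Returns (number of items processed, reason for stopping).
        """
        deadline = time.monotonic() + self.time_limit
        done = 0
        for item in items:
            # Check both conditions between units of work, never mid-item.
            if self._interrupted.is_set():
                return done, "interrupted"
            if time.monotonic() >= deadline:
                return done, "time limit reached"
            process(item)
            done += 1
        return done, "completed"
```

The job can only stop between units of work, so the granularity of `items` determines how quickly it reacts to an interrupt or to running out of time.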
I have the following problem with Quartz:
A job is scheduled to run every 10 minutes. Sometimes (rarely) the job might take longer than 10 minutes. In such cases, Quartz will put the same job on the queue to run after the current run (of the same job) has finished. Normally that is no problem; the job will run two times in a row and all is well and functioning. However, in some cases, the second run will also take more than 10 minutes. I would expect that Quartz will just put it in the queue one more time. Instead, this job never gets queued and is not run again. Everything else is normal besides this job, which never runs again until the system is restarted.
Is this the expected behavior? Is there any way that I can modify it to better suit my needs?
There was an issue with more jobs running and the QUARTZ thread pool being maxed out.
I have a task that needs to run every 30 seconds. I can do one of two things:
Write a command line app that runs the task once, waits 30 seconds, runs it again and then exits. I can schedule this task with Scheduled Tasks in Windows to run every minute
Write a Service that runs a task repeatedly while waiting 30 seconds in between each run.
Number 1 is more trivial, in my opinion, and I would opt to do it this way by default. Am I wimping out? Is there a reason why I should make this a Service and not a scheduled task? What are the pros and cons of both and which would you pick in the end?
I read a nice blog post about this question recently. It goes into a lot of good reasons why you should not write a service to run a recurring job. Additionally, this question has been asked before:
https://stackoverflow.com/questions/390307/windows-service-vs-scheduled-task
Windows Service or Scheduled Task, which one do we prefer?
One advantage of using the scheduled task is that, if there is some potential risk involved with running the service, such as a memory leak or a hanging network connection, then the Windows service can potentially hang around for a long time, adversely affecting other users. The scheduled task, on the other hand, is written to be short-running, so even if it does leak, the effect is minimised.
On the other hand, someone in one of the above questions commented that the scheduler has a limit of accuracy of somewhere in the range of 1 minute, so you may see that the scheduler is unable to run your task every 30 seconds with accuracy.
Obviously there are a number of tradeoffs to consider, but hopefully this will help you make a good decision.
If you're trying to run every 30 seconds, I'd go for option 2. This is pretty much a continually running job, in that case. The overhead of starting and stopping the process is probably higher than the process itself, especially if you use an appropriate timer.
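As a rough idea of what the timer loop inside such a service could look like, here is a minimal Python sketch that runs a task on a fixed 30-second grid without letting runs overlap. It is an illustration only, not Windows service code. The function name `run_every` and the `iterations`, `clock` and `sleep` parameters are made up for this example (a real service would loop until it is asked to stop).

```python
import time


def run_every(interval, task, iterations, clock=time.monotonic, sleep=time.sleep):
    """Call task() every `interval` seconds, one run at a time.

    The sleep is computed from the planned start time, so the time the
    task itself takes does not make the schedule drift.
    """
    next_run = clock()
    for _ in range(iterations):
        task()
        next_run += interval
        delay = next_run - clock()
        if delay > 0:
            sleep(delay)
        else:
            # The task overran its slot: start the next run right away and
            # rebase the schedule on the current time.
            next_run = clock()
```

For example, `run_every(30.0, do_work, iterations=10)` would call a hypothetical `do_work` ten times, roughly 30 seconds apart. Because the loop is single-threaded, a slow run delays the next one instead of running alongside it.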
If you make a job that is running once a day (or a few times a day), then I'd go for option 1 - using a scheduled task.
The task scheduler in Windows seems a bit flaky in my opinion. I think you would get a more reliable result running as a service.
Also, a service could keep resources in memory, such as reading input from a file, and only have to do this at start-up of the service, not every 30 seconds.
30 seconds is a pretty short interval (relatively speaking) between processing cycles. Like the others I have my concerns about the task scheduler and I am afraid such a short interval will only compound the issues you might encounter if you took that approach. If this were my project I would almost certainly go with the service.