Quartz removes a job if it queues itself while running - c#

I have the following problem with Quartz:
A job is scheduled to run every 10 minutes. Sometimes (rarely) the job takes longer than 10 minutes. In such cases, Quartz queues the same job to run again once the current execution has finished. Normally that is no problem; the job runs twice in a row and all is well. However, in some cases the second run also takes more than 10 minutes. I would expect Quartz to simply queue it one more time. Instead, the job never gets queued and is not run again. Everything else behaves normally except this job, which never runs again until the system is restarted.
Is this the expected behavior? Is there any way that I can modify it to better suit my needs?

The cause was that more jobs were running at the same time and the Quartz thread pool was maxed out.

Related

hangfire recurring job: prevent parallel execution of the same job

I have implemented a recurring job which needs to run every minute. Now and then the job has a hiccup, as an API call is involved which can take a bit longer to respond. As a result, the job is enqueued a second time even though the previous run has not finished.
My Question:
How do I prevent a Hangfire job from running if another instance of the same job is already running?
Thank you!
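One way to picture "skip this run if the previous one is still going" is a small in-process guard. The sketch below is only that: `NonOverlappingJob` and `TryRun` are hypothetical names, it uses nothing from Hangfire, and it only protects runs inside a single process. Hangfire itself ships a `DisableConcurrentExecution` attribute that takes a timeout in seconds, which is probably the more usual route when several servers share one storage.

```csharp
using System;
using System.Threading;

// Hypothetical helper: lets at most one run of a job proceed at a time
// within a single process. Overlapping triggers are skipped, not queued.
public sealed class NonOverlappingJob
{
    private readonly Action _work;
    private int _running; // 0 = idle, 1 = a run is in progress

    public NonOverlappingJob(Action work)
    {
        _work = work ?? throw new ArgumentNullException(nameof(work));
    }

    // Returns true if the work ran, false if it was skipped because
    // a previous run had not finished yet.
    public bool TryRun()
    {
        // Atomically flip 0 -> 1; any other value means a run is in progress.
        if (Interlocked.CompareExchange(ref _running, 1, 0) != 0)
        {
            return false;
        }

        try
        {
            _work();
            return true;
        }
        finally
        {
            // Release the guard even if the work threw.
            Volatile.Write(ref _running, 0);
        }
    }
}
```

The recurring method would then call `TryRun()` and simply return when it gets `false`, so a slow API call delays the next run instead of stacking a second one on top of it.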

Send heartbeat in long running hangfire process

Is it possible to send a heartbeat to hangfire (Redis Storage) to tell the system that the process is still alive? At the moment I set the InvisibilityTimeout to TimeSpan.MaxValue to prevent hangfire from restarting the job. But, if the process fails or the server restarts, the job will never be removed from the list of running jobs. So my idea was, to remove the large time out and send a kind of heartbeat instead. Is this possible?
I found https://discuss.hangfire.io/t/hangfire-long-job-stop-and-restart-several-time/4282/2 which deals with how to keep a long-running job alive in Hangfire.
The User zLanger says that jobs are considered dead and restarted once you ...
[...] are hitting hangfire’s invisibilityTimeout. You have two options.
increase the timeout to more than the job will ever take to run
have the job send a heartbeat to let hangfire’s know it’s still alive.
That's not new to you. But interestingly, the follow-up question there is:
How do you implement heartbeat on job?
This remains unanswered there, a hint that your problem is really not trivial.
I have never handled long-running jobs in Hangfire, but I know the problem from other queuing systems like the former Sun Grid Engine, which is how I got interested in your question.
Back in the day, I had exactly your problem with Sun Grid Engine, and the department's computer guru told me that one should avoid long-running jobs at any cost, according to some mathematical queuing theory (I will try to contact him and find the reference to the book he quoted). His idea is maybe worth sharing with you:
If you have some job which takes longer than the tolerated maximal running time of the queuing system, do not submit the job itself, but rather multiple calls of a wrapper script which is able to (1) start, (2) freeze-stop, (3) unfreeze-continue the actual task.
This stop-continue can indeed be a suspend (CTRL+Z respectively fg in Linux) on operating-system level, see e.g. unix.stackexchange.com on that issue.
In practice, I had the binary myMonteCarloExperiment.x and the wrapper script myMCjobStarter.sh. The maximum compute time I had was a day. I would fill the queue with hundreds of calls of the wrapper script, with the boundary condition that only one of them should be running at a time. The script would check whether a process myMonteCarloExperiment.x had already been started anywhere on the compute cluster; if not, it would start an instance. If there was a suspended process, the wrapper script would resume it, let it run for 23 hours and 55 minutes, and then suspend it again. In any other case, the wrapper script would report an error.
This approach does not implement a job heartbeat, but it does run a lengthy job. It also keeps the queue administrator happy, because the Hangfire job logs do not have to be cleaned up.
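The start / suspend / continue idea can also be pictured inside a single program. The sketch below is not the wrapper script described above and does not suspend an operating-system process. It swaps in an in-process checkpoint: each queued call resumes from the last finished step and stops once its time budget is spent. `ResumableTask`, `RunSlice` and `NextStep` are hypothetical names, and a real job would have to persist the checkpoint somewhere durable.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the wrapper script: every call resumes the long
// task from its last checkpoint and stops when the time budget is used up.
public sealed class ResumableTask
{
    private readonly int _totalSteps;
    private readonly Action<int> _step;

    // Checkpoint: index of the next step to run. Kept in memory here as an
    // assumption; a real job would store it outside the process.
    public int NextStep { get; private set; }

    public bool IsFinished => NextStep >= _totalSteps;

    public ResumableTask(int totalSteps, Action<int> step)
    {
        if (totalSteps < 0) throw new ArgumentOutOfRangeException(nameof(totalSteps));
        _totalSteps = totalSteps;
        _step = step ?? throw new ArgumentNullException(nameof(step));
    }

    // Runs steps until the task is done or the budget has elapsed.
    // Returns true once the whole task has finished.
    public bool RunSlice(TimeSpan budget)
    {
        var clock = Stopwatch.StartNew();
        while (!IsFinished && clock.Elapsed < budget)
        {
            _step(NextStep);
            NextStep++; // advance the checkpoint only after the step succeeded
        }
        return IsFinished;
    }
}
```

Each queued job would call `RunSlice` with a budget a little below the queue's limit (the 23 hours and 55 minutes above play that role), and calls made after the task has finished return immediately.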
Further references
How to prevent a Hangfire recurring job from restarting after 30 minutes of continuous execution seems to be a good read

Azure Cloud-Service OnStop

I am using an Azure Cloud Worker Role for processing incoming tasks from queues. Processing of each task can take up to several hours, and each worker role can handle up to N tasks simultaneously. Basically, it's working.
Now, the documentation says that from time to time the worker role can be shut down (for a software update, OS upgrade, ...). Basically, that's fine. But this planned shutdown should not forcibly stop tasks that the worker role is already running.
Expected:
When calling the OnStop() method by the environment:
the worker role will stop getting new tasks for processing.
Wait for running tasks completion.
Continue with the planned shutdown.
Actual:
The OnStop() method can block for up to 5 minutes only. I cannot guarantee that I'll finish processing the task in 5 minutes, so this is a problem... My task is killed in the middle of processing, which leaves my software in an unstable state.
How can I avoid this 5-minute limit? Any tip is welcome.
How can I avoid this 5-minute limit?
Unfortunately, you can't. This is a hard limit imposed from Azure side. You will need to work around that.
There are two possible solutions I can think of and both of them would require you to rethink about your current architecture:
Break your one big task into many smaller tasks and create some kind of work flow.
Make your task idempotent, so that even if it gets terminated partway through (because of a worker role shutdown or an error in the task itself) and is then picked up by another instance, it starts again in such a way that the output of the task is not corrupted.
No, you cannot bypass this limit. In general you should not rely on any of your instances running continuously for any long period of time. Instances may be suddenly stopped or they may suddenly disappear (because of an underlying server failure). Your software should be designed such that when an instance is restarted (possibly redeployed), or some other instance finds capacity to take a previously released work item, that work item is reprocessed without any adverse effects.
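To make "reprocessed without any adverse effects" a bit more concrete, here is a minimal sketch that records which work items have completed and skips items that are delivered again. `IdempotentProcessor` and `Process` are hypothetical names, nothing here is an Azure API, and the in-memory set is an assumption standing in for durable storage that survives a role restart.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical processor: remembers which work items have completed so that
// an item redelivered after a shutdown or crash is not applied twice.
public sealed class IdempotentProcessor
{
    // Assumption: in a real worker role this set would live in durable
    // storage; an in-memory set only keeps the sketch runnable.
    private readonly HashSet<string> _completed = new HashSet<string>();
    private readonly Action<string> _handler;

    public IdempotentProcessor(Action<string> handler)
    {
        _handler = handler ?? throw new ArgumentNullException(nameof(handler));
    }

    // Returns true if the handler ran, false if the item was already done.
    public bool Process(string workItemId)
    {
        if (_completed.Contains(workItemId))
        {
            return false; // redelivered item: nothing left to do
        }

        _handler(workItemId);       // may throw; the item then stays unrecorded
        _completed.Add(workItemId); // record completion only after success
        return true;
    }
}
```

An instance that is killed between the handler finishing and the completion being recorded will run the handler again on redelivery, so the handler's own writes still have to tolerate a repeat (for example by overwriting output rather than appending to it).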

Console Application Interval vs Thread.Sleep

I have a console application which should periodically poll a remote database;
if there is a new value, it should do some stuff.
Normally I would create a Windows Task Scheduler job to run this console app every 2 minutes.
Another option, I think, is to have code like this in the console app:
while (true)
{
    ConnectDatabaseAndProcess();
    System.Threading.Thread.Sleep(120000);
}
So I assume the console app will always be open, and will wait 2 minutes after each run before continuing.
Performance-wise, will it make any difference?
Unless your computer is so overloaded that the time to create a process every two minutes is a huge drain on resources, there's no benefit to having your program sitting in a loop waiting for two minutes, just so that it can poll the database.
The benefit of using scheduled tasks is that you can change the scheduled task frequency (make it once every five minutes, or once an hour, or whatever) without having to modify the program. Sure, you could use an application configuration file, but why? Why duplicate functionality that already exists in the operating system and is more flexible?
Also, with a scheduled task, you know that the program will start the polling operation again the next time the computer is rebooted. If you depend on the program to provide that delay, you have to either remember to start it every time, or put it in the startup task list.
Also, when the program is sitting there idle, it's occupying memory that could be used by other processes.
All told, using scheduled tasks is a much more flexible and robust solution. Any marginal performance gain (and we're talking, at most, one second) from having the program always running is far outweighed by the disadvantages.
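If the always-running loop from the question is kept anyway, it helps if the wait can be interrupted so the program can shut down cleanly. The following is a minimal sketch under that assumption: `Poller` and `RunAsync` are hypothetical names, and the `poll` delegate stands in for `ConnectDatabaseAndProcess()`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Poller
{
    // Calls poll(), waits for the interval, and repeats until the token is
    // cancelled. Returns the number of polls that were carried out.
    public static async Task<int> RunAsync(Action poll, TimeSpan interval, CancellationToken token)
    {
        if (poll == null) throw new ArgumentNullException(nameof(poll));

        int runs = 0;
        while (!token.IsCancellationRequested)
        {
            poll();
            runs++;

            try
            {
                // Unlike Thread.Sleep, this wait ends as soon as the token fires.
                await Task.Delay(interval, token);
            }
            catch (OperationCanceledException)
            {
                break; // cancelled while waiting: leave the loop promptly
            }
        }
        return runs;
    }
}
```

With a two-minute interval this behaves like the loop in the question, except that cancelling the token (for example from a Ctrl+C handler) ends the wait immediately instead of after up to two minutes.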

Which is better to use for a recurring job: Service or Scheduled Task?

I have a task that needs to run every 30 seconds. I can do one of two things:
Write a command line app that runs the task once, waits 30 seconds, runs it again and then exits. I can schedule this task with Scheduled Tasks in Windows to run every minute
Write a Service that runs a task repeatedly while waiting 30 seconds in between each run.
Number 1 is more trivial, in my opinion and I would opt to do it this way by default. Am I wimping out? Is there a reason why I should make this a Service and not a scheduled task? What are the pros and cons of both and which would you pick in the end?
I read a nice blog post about this question recently. It goes into a lot of good reasons why you should not write a service to run a recurring job. Additionally, this question has been asked before:
https://stackoverflow.com/questions/390307/windows-service-vs-scheduled-task
Windows Service or Scheduled Task, which one do we prefer?
One advantage of using the scheduled task is that if there is some potential risk involved with running the service, such as a memory leak or a hanging network connection, then the Windows service can potentially hang around for a long time, adversely affecting other users. The scheduled task, by contrast, is written to be short-running, so even if it does leak, the effect is minimised.
On the other hand, someone in one of the above questions commented that the scheduler has a limit of accuracy of somewhere in the range of 1 minute, so you may see that the scheduler is unable to run your task every 30 seconds with accuracy.
Obviously there are a number of tradeoffs to consider, but hopefully this will help you make a good decision.
If you're trying to run every 30 seconds, I'd go for option 2. This is pretty much a continually running job, in that case. The overhead of starting and stopping the process is probably higher than the process itself, especially if you use an appropriate timer.
If you make a job that is running once a day (or a few times a day), then I'd go for option 1 - using a scheduled task.
The task scheduler in Windows seems a bit flaky, in my opinion. I think you would get a more reliable result running as a service.
Also, a service could keep resources in memory, such as reading input from a file, and only have to do this at start-up of the service, not every 30 seconds.
30 seconds is a pretty short interval (relatively speaking) between processing cycles. Like the others I have my concerns about the task scheduler and I am afraid such a short interval will only compound the issues you might encounter if you took that approach. If this were my project I would almost certainly go with the service.
