I run multiple instances of my application, and I have configured Hangfire as part of my Startup.cs configuration.
I want to generate a monthly report, and I'd like to ensure it gets enqueued only once. DisableConcurrentExecution doesn't help, as it only prevents the job from executing at the same time.
I read about Mutex as well:
When we create multiple background jobs based on this method, they will be executed one after another on a best-effort basis with the limitations described below. If there’s a background job protected by a mutex currently executing, other executions will be throttled (rescheduled by default a minute later), allowing a worker to process other jobs without waiting.
According to my understanding, Mutex will prevent concurrent execution, but it'll still run my report X times (where X is the number of instances), one after another.
How can I ensure a cron job is enqueued only once?
How can I add the job without having to call it through an endpoint (e.g. POST <server>/api/enqueue_jobs)?
I don't have snippets to provide because I'm stuck on the configuration itself. I hope this question won't be closed, because I have put effort into trying to solve it on my own.
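For reference, a minimal sketch of registering the recurring job directly from startup code rather than through an endpoint; it assumes Hangfire storage and the Hangfire server are already configured in Startup.cs, and the MonthlyReportJob class and the "monthly-report" id are hypothetical names:

    using Hangfire;

    // Called once from the existing Startup configuration, after Hangfire
    // storage and the Hangfire server have been set up.
    public static class RecurringJobs
    {
        public static void Register()
        {
            // AddOrUpdate is keyed by the recurring-job id, so every instance
            // can safely call this on startup: there will still be only one
            // "monthly-report" recurring job in storage, and only one worker
            // picks it up when it becomes due.
            RecurringJob.AddOrUpdate<MonthlyReportJob>(
                "monthly-report",
                job => job.Generate(),
                Cron.Monthly());
        }
    }

    public class MonthlyReportJob
    {
        public void Generate()
        {
            // build and send the report here
        }
    }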
Related
Is it possible to send a heartbeat to hangfire (Redis Storage) to tell the system that the process is still alive? At the moment I set the InvisibilityTimeout to TimeSpan.MaxValue to prevent hangfire from restarting the job. But, if the process fails or the server restarts, the job will never be removed from the list of running jobs. So my idea was, to remove the large time out and send a kind of heartbeat instead. Is this possible?
I found https://discuss.hangfire.io/t/hangfire-long-job-stop-and-restart-several-time/4282/2 which deals with how to keep a long-running job alive in Hangfire.
The User zLanger says that jobs are considered dead and restarted once you ...
[...] are hitting hangfire’s invisibilityTimeout. You have two options.
increase the timeout to more than the job will ever take to run
have the job send a heartbeat to let Hangfire know it's still alive.
That's not new to you. But interestingly, the follow-up question there is:
How do you implement heartbeat on job?
This remains unanswered there, a hint that your problem is really not trivial.
I have never handled long-running jobs in Hangfire, but I know the problem from other queuing systems such as the former SunGrid Engine, which is how I got interested in your question.
Back in the day, I had exactly your problem with SunGrid, and the department's computing guru told me that long-running jobs should be avoided at all costs, citing some mathematical queuing theory (I will try to contact him and find the reference to the book he quoted). His idea may be worth sharing with you:
If you have some job which takes longer than the tolerated maximal running time of the queuing system, do not submit the job itself, but rather multiple calls of a wrapper script which is able to (1) start, (2) freeze-stop, (3) unfreeze-continue the actual task.
This stop-continue can indeed be a suspend on operating-system level (CTRL+Z and fg in Linux, respectively); see e.g. unix.stackexchange.com on that issue.
In practice, I had the binary myMonteCarloExperiment.x and the wrapper script myMCjobStarter.sh. The maximum compute time I had was a day. I would fill the queue with hundreds of calls of the wrapper script, with the boundary condition that only one of them should be running at a time. The script would check whether a myMonteCarloExperiment.x process was already running anywhere on the compute cluster; if not, it would start an instance. If there was a suspended process, the wrapper script would resume it, let it run for 23 hours and 55 minutes, and then suspend it again. In any other case, the wrapper script would report an error.
This approach does not implement a job heartbeat, but it does run a lengthy job. It also keeps the queue administrator happy by avoiding the need to clean up Hangfire's job logs.
Further references
How to prevent a Hangfire recurring job from restarting after 30 minutes of continuous execution seems to be a good read
I have a list of APIs from different clients saved in my database table, and each API has a different time interval at which it should be called. What should my approach be to calling these APIs? New data may be added to the list of APIs. Should I go for dynamic timers?
I have an application (GUI) which clients use to add new records.
These records represent an API url and the time (Schedule) at which that API should be called.
Your challenge is to write code that is able to call all the client-specified APIs at the specified schedule/time.
To me, API calling and handling the responses (storing into the DB, etc.) should be one component, and scheduling when to call which API should be another component (something like a cron job). This way, when the time is right, the appropriate API call is triggered. This also gives you the flexibility to do multiple tries/retries in a day, etc.
Update after your comment:
You have an application (GUI) which clients use to add new records.
These records represent an API url and the time (Schedule) at which that API should be called.
Your challenge is to write code that is able to call all the client-specified APIs at the specified schedule/time.
If I have understood the problem correctly, my original suggestion stands.
Component 1 - Scheduler
Use Quartz.NET (or build your own using a Timer, etc.) and create a service (say, WCF) or process that reads records from the database and identifies all the schedules and API URLs that need to be called. When the scheduled time arrives, Quartz.NET will trigger your handler method, where you make a call to Component 2 and pass on the API URL.
Component 2 - API Engine
When it receives a call from Component 1 - it will make the API call and fetch the response. Store/process it as required.
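To make the split concrete, here is a rough sketch using Quartz.NET 3.x; ApiCallJob, ApiEngine, the example URL and the cron expression are assumptions for illustration, not anything taken from the question:

    using System.Net.Http;
    using System.Threading.Tasks;
    using Quartz;
    using Quartz.Impl;

    // Component 1: a Quartz job that fires at the schedule stored with each
    // record and hands the URL over to Component 2.
    public class ApiCallJob : IJob
    {
        public Task Execute(IJobExecutionContext context)
        {
            string url = context.MergedJobDataMap.GetString("ApiUrl");
            return ApiEngine.CallAsync(url);
        }
    }

    // Component 2: makes the API call and stores/processes the response.
    public static class ApiEngine
    {
        public static async Task CallAsync(string url)
        {
            using (var client = new HttpClient())
            {
                var response = await client.GetStringAsync(url);
                // store/process the response as required
            }
        }
    }

    public static class SchedulerBootstrap
    {
        public static async Task StartAsync()
        {
            var scheduler = await new StdSchedulerFactory().GetScheduler();
            await scheduler.Start();

            // For each record read from the database, register a job plus a
            // cron trigger. "0 0/15 * * * ?" means "every 15 minutes" here.
            var job = JobBuilder.Create<ApiCallJob>()
                .UsingJobData("ApiUrl", "https://example.com/api/data")
                .Build();
            var trigger = TriggerBuilder.Create()
                .WithCronSchedule("0 0/15 * * * ?")
                .Build();
            await scheduler.ScheduleJob(job, trigger);
        }
    }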
There are various schedulers that can be used to do this automatically. For example, you could use Quartz.NET and its AdoJobStore. I haven't used that myself, but it sounds appropriate:
With the use of the included AdoJobStore, all Jobs and Triggers configured as "non-volatile" are stored in a relational database via ADO.NET.
Alternatively, your database may well have timers built into it. However, if this is primarily an academic exercise (as suggested by "your challenge") you may not be able to use these.
I would keep a table of scheduled tasks, with columns specifying:
When the task should next be run
What the task should do
How to work out the next iteration of that task afterwards
If the task has been started, when it was started
If the task completed, when it completed
You can then write code in an infinite loop to just scan that table, e.g. once per minute. It should look for all tasks with a "next time" earlier than now that haven't completed (a rough sketch follows the list below):
If the task hasn't been started, update the row to show that it has been started (now), and start executing the task
If the task was started recently, ignore it
If the task was started "a long time ago" (i.e. longer than it would take to run successfully), either mark it as "broken" somehow, or restart
When a task completes successfully, update the row to indicate that it's finished, and add another row for the next time it should be started.
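As a rough sketch of that loop (the in-memory list stands in for the database table, and all names, durations and the sample task are purely illustrative):

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading;

    class ScheduledTask
    {
        public string Name;
        public DateTime NextRun;                          // when the task should next be run
        public DateTime? StartedAt;                       // when it was started, if it has been
        public DateTime? CompletedAt;                     // when it completed, if it did
        public Action Work;                               // what the task should do
        public Func<DateTime, DateTime> NextOccurrence;   // how to work out the next iteration
    }

    class TaskRunner
    {
        static readonly TimeSpan MaxExpectedDuration = TimeSpan.FromMinutes(30);

        static void Main()
        {
            var table = new List<ScheduledTask>
            {
                new ScheduledTask
                {
                    Name = "sample",
                    NextRun = DateTime.UtcNow,
                    Work = () => Console.WriteLine("running sample task"),
                    NextOccurrence = last => last.AddMinutes(5)
                }
            };

            while (true)
            {
                var now = DateTime.UtcNow;
                foreach (var task in table.Where(t => t.NextRun <= now && t.CompletedAt == null).ToList())
                {
                    if (task.StartedAt == null)
                    {
                        task.StartedAt = now;                 // mark as started before running
                        task.Work();
                        task.CompletedAt = DateTime.UtcNow;   // mark as finished
                        table.Add(new ScheduledTask           // add a row for the next run
                        {
                            Name = task.Name,
                            NextRun = task.NextOccurrence(task.NextRun),
                            Work = task.Work,
                            NextOccurrence = task.NextOccurrence
                        });
                    }
                    else if (now - task.StartedAt > MaxExpectedDuration)
                    {
                        Console.WriteLine($"Task {task.Name} appears broken");   // or restart it
                    }
                    // started recently: ignore it and let it finish
                }
                Thread.Sleep(TimeSpan.FromMinutes(1));        // scan roughly once per minute
            }
        }
    }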
You'll need to work out exactly what your error strategy is:
How long should the gap be between a task starting and you deciding it's failed?
Do you always want to restart the task, or should some failures be permanent?
Do you need to record how often a task failed, and give up after a certain number of tries?
What do you do if you explicitly notice that the task has failed while you're executing it? (Rather than just by the fact that it was started a long time ago.)
For extra reliability, you'd need to think of other aspects too:
Do you need multiple task runners?
How can you spot when a task runner has failed, and restart that?
How do you deal with multiple task runners trying to start the same task at the same time?
You may not need to actually implement everything here, but it's worth considering them.
I will use "Process" to refer to the work that is going to happen in parallel, and "enqueue" to refer to whatever process is going to be used to initiate that process (whether that be Task.Run, ThreadPool.QUWI, new Thread() ... whatever).
We have a performance sensitive program that spawn multiple parallel processes to gather data.
We're having issues with the spawning, that the processes are not beginning immediately.
Specifically, if we prepare a process, start a timer, enqueue the process, and check the timer as the very first action in the process ... then we see that the time delay occasionally stretches into 100s or even 1000s of milliseconds.
Given that the process itself is supposed to only run for 3-10 seconds, having a 2sec delay between enqueuing and activation of the process is a major issue.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Our implementation initially used ThreadPool.QueueUserWorkItem, and we then moved to Task.Run.
Our initial investigation led us to the thread-creation strategy used by the ThreadPool and to ThreadPool.SetMinThreads(), so we're pursuing that angle to see if it will completely resolve the issue.
But is there another change/approach that we should be looking at, if our goal is to have the process start immediately after enqueuing?
Taken from here (I strongly suggest you have a read up):
It seems as though what you want can be achieved by overriding the default task scheduler... scary...
You can't assume that all parallel tasks will immediately run. Depending on the current work load and system configuration, tasks might be scheduled to run one after another, or they might run at the same time. For more information about how tasks are scheduled, see the section, "The Default Task Scheduler," later in this chapter.
Creating Tasks with Custom Scheduling
You can customize the details of how tasks in .NET are scheduled and run by overriding the default task scheduler that's used by the task factory methods. For example, you can provide a custom task scheduler as an argument to one of the overloaded versions of the TaskFactory.StartNew method.
There are some cases where you might want to override the default scheduler. The most common case occurs when you want your task to run in a particular thread context... Other cases occur when the load-balancing heuristics of the default task scheduler don't work well for your application. For more information, see the section, "Thread Injection," later in this chapter.
Unless you specify otherwise, any new tasks will use the current task scheduler...
You can implement your own task scheduler class. For more information, see the section, "Writing a Custom Task Scheduler," later in this chapter.
Thread Injection
The .NET thread pool automatically manages the number of worker threads in the pool...
Have a read of this SO post "replacing the task scheduler in c sharp with a custom built one"
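As a small illustration of the two most common knobs for this problem, here is a sketch assuming the standard TPL; the numbers are illustrative and should be measured, not copied:

    using System.Threading;
    using System.Threading.Tasks;

    class StartupTuning
    {
        static void Main()
        {
            // Option 1: raise the thread pool's minimum worker count so a burst
            // of queued work does not wait on the pool's gradual thread-injection
            // heuristic.
            ThreadPool.GetMinThreads(out _, out int minIo);
            ThreadPool.SetMinThreads(64, minIo);   // 64 is only an example value

            // Option 2: mark long-running work so the scheduler gives it a
            // dedicated thread instead of competing for a pool thread.
            var work = Task.Factory.StartNew(
                () => Thread.Sleep(5000),          // stand-in for a 3-10 second gathering process
                CancellationToken.None,
                TaskCreationOptions.LongRunning,
                TaskScheduler.Default);

            work.Wait();
        }
    }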
I'm using an Azure Cloud Worker Role for processing incoming tasks from queues. Processing a single task can take up to several hours, and each worker role can handle up to N tasks simultaneously. Basically, it's working.
Now, you can read in the documentation that from time to time the worker role can be shut down (for a software update, OS upgrade, ...). Basically, that's fine. But this planned shutdown must not forcibly stop tasks the worker role is already running.
Expected:
When the environment calls the OnStop() method:
The worker role stops taking new tasks for processing.
It waits for running tasks to complete.
It then continues with the planned shutdown.
Actual:
The OnStop() method can block for no more than 5 minutes. I cannot guarantee that I'll finish processing the task within 5 minutes, so this is a problem... My task gets killed in the middle of processing, and this creates an unstable situation for my software.
How can I avoid this 5-minute limit? Any tip will be welcome.
How can I avoid this 5-minute limit?
Unfortunately, you can't. This is a hard limit imposed from Azure side. You will need to work around that.
There are two possible solutions I can think of, and both of them would require you to rethink your current architecture:
Break your one big task into many smaller tasks and create some kind of workflow.
Make your task idempotent, so that even if it gets terminated partway through (because of a worker role shutdown or an error in the task itself) and later gets picked up by another instance, it starts again in such a way that the output of the task is not corrupted (a rough sketch follows below).
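For the second option, a minimal sketch of what "idempotent" can look like in practice; the in-memory set stands in for durable storage (a table or blob), and all names are hypothetical:

    using System;
    using System.Collections.Generic;

    class IdempotentWorker
    {
        // In a real worker role this would be persisted outside the instance.
        static readonly HashSet<string> Checkpoints = new HashSet<string>();

        static void Main()
        {
            ProcessWorkItem("order-42", new[] { "download", "transform", "upload" });
            // If the instance is recycled here, re-running the same work item is safe:
            ProcessWorkItem("order-42", new[] { "download", "transform", "upload" });
        }

        static void ProcessWorkItem(string workItemId, IEnumerable<string> steps)
        {
            foreach (var step in steps)
            {
                var key = workItemId + "/" + step;
                if (Checkpoints.Contains(key))   // already done by a previous run
                    continue;

                Console.WriteLine($"executing {key}");
                Checkpoints.Add(key);            // record only after the step's output is persisted
            }
        }
    }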
No, you cannot bypass this limit. In general, you should not rely on any of your instances running continuously for any long period of time. Instances may be suddenly stopped or may suddenly disappear (because of an underlying server failure). Your software should be designed such that when an instance is restarted (possibly redeployed), or some other instance finds capacity to take a previously released work item, that work item is reprocessed without any adverse effects.
I have an ASP.NET application where I have added a web service that contains a "fire and forget" method. When this method is executed it starts a loop (0-99999), and in every iteration it reads an XML file and saves it to the database.
The problem is that this action takes a couple of hours, and it usually ends with a ThreadAbortException.
I have seen that you can increase the executionTimeout, and this is how:
<httpRuntime executionTimeout="604800"/>
<compilation debug="true">
But this does not help?
I have also tried to add a Thread.Sleep within the loop. If I set it to 500 it gets halfway, and if I set it to less than 100 it only gets through a couple of thousand iterations before the thread aborted exception.
How can I solve this?
Don't run the loop inside the web service. Instead, put it in a console app, a WinForms app, or possibly even a Windows service. Use the web service to start up the other program.
A web service is basically a web page, and ASP.NET web pages are not meant to host long-running processes.
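For illustration, a minimal sketch of that hand-off, assuming the long-running import lives in a separate console app; the service name and the executable path are hypothetical:

    using System.Diagnostics;
    using System.Web.Services;

    public class ImportService : WebService
    {
        [WebMethod]
        public void StartImport()
        {
            // Launch the separate console app that owns the long-running loop
            // and return to the caller immediately; an IIS recycle no longer
            // kills the work, because it runs in its own process.
            Process.Start(@"C:\Tools\ImportRunner.exe");
        }
    }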
This article does not directly answer your question, but contains a snippet of info relevant to my answer:
http://msdn.microsoft.com/en-us/magazine/dd296718.aspx
However, when the duration of the operation grows longer than the typical ASP.NET session duration (20 minutes) or requires multiple actors (as in my hiring example), ASP.NET does not offer sufficient support. You may recall that the ASP.NET worker processes automatically shut down on idle and periodically recycle themselves. This will cause big problems for long-running operations, as state held within those processes will be lost.
and the article is a good read, at any rate. It may offer ideas for you.
Not sure if this is 'the answer', but when you receive the web service call you could consider dispatching the action onto another thread. That could then run until completion. You would want to consider how you ensure that someone doesn't kick off two of these processes simultaneously though.
I have an ASP.NET application where I have added a web service that contains a "fire and forget" method. When this method is executed it starts a loop (0-99999), and in every iteration it reads an XML file and saves it to the database.
Let's not go into the fact that I find this approach quite... hm... bad for many reasons (like a reset in the middle of the run). I would queue the request, then return, and have a queue listener do the processing with transactional integrity.
Anyhow, what you CAN do is:
Queue a work item for a pool thread to pick things up (see the sketch below).
Return immediately.
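A minimal sketch of those two steps in an ASMX-style web service (names are illustrative; as noted below, this still runs inside the web process, so a separate queue listener remains the more robust option):

    using System.Threading;
    using System.Web.Services;

    public class FireAndForgetService : WebService
    {
        [WebMethod]
        public void StartImport()
        {
            // Queue the work for a thread-pool thread and return immediately.
            ThreadPool.QueueUserWorkItem(_ => RunImport());
        }

        static void RunImport()
        {
            for (int i = 0; i < 100000; i++)
            {
                // read the XML file and save it to the database
            }
        }
    }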
Besides that, web services and the like are not a good place for hours-long processes. Kick off a workflow and handle it separately.