I would like to build a job scheduler.
So this job scheduler allows the user to configure:
Job start time. (datetime value)
Job frequency: minutes, hours, days, months, years (any integer value)
I have 2 options on how to build this scheduler:
Use the C# Timer class. But this also means that I have to create a new Timer object for every scheduled job. I am planning to expose an API endpoint for the user to call and POST the starttime and frequency info. So when the endpoint is called, I will need to create a new Timer object.
Will this even scale? How do I manage the Timer objects? I need to allow users to create, update and delete their jobs.
Use the Azure Scheduler. However, I have a very large user database. I had a look at the pricing, the maximum total jobs that can run on 1 instance is 5 million jobs only. Plus, it is difficult for me to manage the running instances if I have more than 1 instance running. How do I decide to load balance multiple instances of schedulers?
You could use Azure Worker roles, one worker role per job type, and different schedules for each job, users can create their own separate schedule for each job, picking from a list of predefined "job types".
For this you can use RabbitMQ. Basically, every scheduler is just the producer-consumer concept on top of a standard message broker. I recommend using a message broker for this task, because it can guarantee that a task is executed once it has been scheduled, even if your system was shut down in the meantime. It is self-balancing and stable, and pretty much everything you need is already implemented.
More here: https://www.rabbitmq.com/tutorials/tutorial-one-dotnet.html
For your task you simply write the producer side, which, while running, enqueues tasks into RabbitMQ on some schedule without caring about how they are executed behind the broker. This way you can add, delete, edit and read schedules in your producer. Also, a little advice: instead of frequency terms, use cron expressions. They are widely supported and an easy-to-understand way to specify human-readable schedules.
More here: https://crontab.guru/
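If you stay with the start time plus frequency model from the question rather than cron expressions, the producer needs a way to work out when a job is next due. Here is a minimal sketch using only the .NET base class library. The names `FrequencyUnit`, `JobSchedule` and `NextRun` are made up for illustration, and "next" is assumed to mean the first occurrence strictly after `now`.

```csharp
using System;

// Hypothetical frequency units matching the list in the question.
public enum FrequencyUnit { Minutes, Hours, Days, Months, Years }

public static class JobSchedule
{
    // Returns the first occurrence strictly after 'now'
    // (or the start time itself if the job has not started yet).
    public static DateTime NextRun(DateTime start, int every, FrequencyUnit unit, DateTime now)
    {
        if (every <= 0) throw new ArgumentOutOfRangeException(nameof(every));
        if (now < start) return start;

        switch (unit)
        {
            case FrequencyUnit.Minutes: return NextFixed(start, TimeSpan.FromMinutes(every), now);
            case FrequencyUnit.Hours:   return NextFixed(start, TimeSpan.FromHours(every), now);
            case FrequencyUnit.Days:    return NextFixed(start, TimeSpan.FromDays(every), now);
            case FrequencyUnit.Months:  return NextCalendar(start, every, now);
            case FrequencyUnit.Years:   return NextCalendar(start, every * 12, now);
            default: throw new ArgumentOutOfRangeException(nameof(unit));
        }
    }

    // Fixed-length intervals: jump straight to the next multiple of the interval.
    private static DateTime NextFixed(DateTime start, TimeSpan interval, DateTime now)
    {
        long elapsedIntervals = (now - start).Ticks / interval.Ticks;
        return start.AddTicks((elapsedIntervals + 1) * interval.Ticks);
    }

    // Calendar intervals: always add to the original start so that
    // month-end clamping (Jan 31 -> Feb 29 -> Mar 31) does not drift.
    private static DateTime NextCalendar(DateTime start, int months, DateTime now)
    {
        for (int n = 1; ; n++)
        {
            DateTime candidate = start.AddMonths(n * months);
            if (candidate > now) return candidate;
        }
    }
}
```

The producer would store `start`, `every` and `unit` per job and enqueue a message whenever `NextRun` has passed. A cron library would replace this function entirely.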
There are existing libraries that you can use, e.g. Quartz or Hangfire. The first one is a rather simple-to-use library that I have used successfully; the latter also has a UI, in addition to running tasks, that you can serve.
I am sure there are plenty of other libraries if those are not good enough.
Derek here from Azure Scheduler. We have enterprise customers using Scheduler in scenarios similar to the one you described. Depending on what your user/job profile looks like, it should be easy to come up with a good way to partition the jobs into different job collections.
How many jobs do you expect to have? It sounds like you need many more than the 5 million we support in a single job collection with the P20 plan. I'd like to better understand your scenario and help you decide whether Azure Scheduler is the right solution.
You can reach me at Derek.Li (at) microsoft dot com.
Related
I am working on an application where I have multiple servers on different machines doing long operations for me. There is a windows service running on those machines written with hangfire/topshelf. Only one operation can run at a time per machine. Additionally I want to do some status check and cleaning jobs periodically on each server, so I can't just queue them as jobs.
Is there a way to do that in hangfire? Also, is there a way to send a follow-up job to the same server as an earlier job?
ADD-ON: I know one possibility would be to add another hangfire layer: Make each of the services a hangfire client with own DB and serve themselves, and then schedule recurring jobs for them, but that seems awfully complicated - especially when scaling out and adding servers.
If your task is to run some scheduled task on each server, I think the best option is to implement it yourself. Hangfire doesn't support event handling, only command handling. I think you have reached the limits of what Hangfire can do and need to switch to a more powerful and general tool.
For events and their handling you can use other systems, for example RabbitMQ. You just specify an event generator and subscribe all your machines to this event.
I know this is a bit late, but the way we handle this sort of thing is just to write a simple console application and schedule it with Windows Task Scheduler.
You've probably resolved this by now, but
1 - One job per server: limiting the worker count, as you have it, is probably the best option, since you can have multiple queues per server and the filters won't help you there.
2 - Should the cleanup run after each processing job?
If yes, you can create the cleanup job from within your processing job execution (OK, maybe not a perfect design, but it works just fine) and assign it to a queue on the same server. Just add some logic in filters to ensure a processing job is followed by a cleanup job and you're sorted.
Alternatively, you can use continuation jobs (as described on the site https://www.hangfire.io/). I haven't used these, but it sounds like they might do the trick.
If you just want to run the cleanup code periodically, then just schedule the job as recurring on each of the servers.
I've been building a web service to synchronize data between SalesForce and Zendesk at my company. In the process of doing so, I've built several optimizations to drastically reduce execution time, such as caching some of the larger datasets that are retrieved from each service.
However, this comes at a price. When caching the data, it can take upwards of 3-5 minutes to download everything through SalesForce's and Zendesk's APIs.
To combat this, I was thinking of having a background worker that automatically caches all the required data every day at midnight. However, I'm not sure what the best method of doing this would be.
Would it suffice to build a class that merely has a worker thread that checks every several minutes to see if it is after midnight, and activate it on launch from Global.asax? Or is there some sort of scheduler already in existence?
EDIT
There seems to be some division between using something like:
FluentScheduler or Quartz.net to house everything within my applications.
Versus using something like Windows Task Scheduler and writing a secondary application that calls a function of my application. It seems that using a third-party library would be simpler, but is there any inherent benefit to using the Windows Task Scheduler?
I think you want to add your data caching logic to a project of type "console application". You'll be able to deploy this to your server and run it as a scheduled task using windows "Task Scheduler". If you've not worked with this project type or scheduled tasks before there are stack overflow questions which should help here, here, and here. You can add command line parameters if you need and you should have a look at adding a mutex so that only one instance of your code will ever run at once.
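For the mutex part, something along these lines could work in the console application. This is only a sketch: `SingleInstance` and `TryRun` are names I made up, and the mutex name is whatever you choose for your application.

```csharp
using System;
using System.Threading;

public static class SingleInstance
{
    // Runs 'work' only if no other holder of the named mutex exists.
    // Returns false (without running anything) when another instance
    // already owns the mutex.
    public static bool TryRun(string mutexName, Action work)
    {
        bool createdNew;
        using (var mutex = new Mutex(true, mutexName, out createdNew))
        {
            if (!createdNew)
            {
                // Someone else created the mutex first; we do not own it.
                return false;
            }

            try
            {
                work();
                return true;
            }
            finally
            {
                // Release before the handle is disposed.
                mutex.ReleaseMutex();
            }
        }
    }
}
```

In `Main` you would call something like `SingleInstance.TryRun("MyCompany.CacheRefresh", RefreshCache)` and exit quietly when it returns false, so that an overlapping scheduled run does nothing.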
Add an endpoint that knows how to do the caching, and use the Windows Task Scheduler to call that new caching endpoint.
I have a Work Tracker WPF application which is deployed on Windows Server 2008, and this Tracker application communicates with a (Tracker) Windows service via a WCF service.
The user can create, edit, add, delete or cancel any work entry from the Work Tracker GUI application. Internally it sends a request to the Windows service. The Windows service receives the work request and processes it using multiple threads. Each work request entry actually creates n work files (based on work priority) in an output folder location.
So each work request takes some time to complete the work addition process.
Now my question is: if I cancel the work entry that is currently being created, I want to stop the Windows service's current work at RUNTIME. The current thread which is creating output files for the work should be STOPPED. All the threads should be killed, and all the thread resources should be released once the user has requested CANCEL.
My workaround:
I use the Windows service OnCustomCommand method to send custom values to the Windows service at runtime. What I get at the moment is that the service first finishes processing the current work or current thread (i.e. creating output files for the work item received), and only then handles the custom command for cancelling the request.
Is there any way to stop the work item request as soon as we get the custom command?
Any work around is much appreciated.
Summary
You are essentially talking about running a task host for long running tasks, and being able to cancel those tasks. Your specific question seems to want to know the best way to implement this in .NET. Your architecture is good, although you are brave to roll your own rather than using existing frameworks, and you haven't mentioned scaling your architecture later.
My preference is for using the TPL Task object. It supports cancellation, and is easy to poll for progress, etc. You can only use this in .NET 4 onwards.
It is hard to provide code without basically designing a whole job hosting engine for you and knowing your .NET version. I have described the steps in detail below, with references to example code.
Your approach of using the Windows Service OnCustomCommand is fine, you could also use a messaging service (see below) if you have that option for client-service comms. This would be more appropriate for a scenario where you have many clients talking to a central job service, and the job service is not on the same machine as the client.
Running and cancelling tasks on threads
Before we look at your exact context, it would be good to review MSDN - Asynchronous Programming Patterns. There are three main .NET patterns to run and cancel jobs on threads, and I list them in order of preference for use:
TAP: Task-based Asynchronous Pattern
Based on Task, which has been available only since .NET 4
The preferred way to run and control any thread-based activity from .NET 4 onwards
Much simpler to implement than EAP
EAP: Event-based Asynchronous Pattern
Your only option if you don't have .NET 4 or later.
Hard to implement, but once you have understood it you can roll it out and it is very reliable to use
APM: Asynchronous Programming Model
No longer relevant unless you maintain legacy code or use old APIs.
Even with .NET 1.1 you can implement a version of EAP, so I will not cover this as you say you are implementing your own solution
The architecture
Imagine this like a REST based service.
The client submits a job, and gets returned an identifier for the job
A job engine then picks up the job when it is ready, and starts running it
If the client doesn't want the job any more, then they delete the job, using its identifier
This way the client is completely isolated from the workings of the job engine, and the job engine can be improved over time.
The job engine
The approach is as follows:
For a submitted task, generate a universal identifier (UID) so that you can:
Identify a running task
Poll for results
Cancel the task if required
Return that UID to the client
Queue the job using that identifier
When you have resources:
Run the job by creating a Task
Store the Task in a dictionary against the UID as a key
When the client wants results, they send the request with the UID and you return progress by checking against the Task that you retrieve from the dictionary. If the task is complete they can then send a request for the completed data, or in your case just go and read the completed files.
When they want to cancel they send the request with the UID, and you cancel the Task by finding it in the dictionary and telling it to cancel.
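To make the dictionary idea concrete, here is a minimal sketch of such an engine. `JobEngine`, `Submit`, `GetStatus` and `Cancel` are hypothetical names, the queueing step ("when you have resources") is left out, and `Task.Run` needs .NET 4.5 (on .NET 4.0 you would use `Task.Factory.StartNew` instead).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class JobEngine
{
    // A running job: the Task plus the token source that can cancel it.
    private sealed class Entry
    {
        public Task Task;
        public CancellationTokenSource Cts;
    }

    private readonly ConcurrentDictionary<Guid, Entry> _jobs =
        new ConcurrentDictionary<Guid, Entry>();

    // Starts the work immediately and returns the UID to hand back to the client.
    public Guid Submit(Action<CancellationToken> work)
    {
        var id = Guid.NewGuid();
        var cts = new CancellationTokenSource();
        var task = Task.Run(() => work(cts.Token), cts.Token);
        _jobs[id] = new Entry { Task = task, Cts = cts };
        return id;
    }

    // Polling: null means the UID is unknown.
    public TaskStatus? GetStatus(Guid id)
    {
        Entry entry;
        return _jobs.TryGetValue(id, out entry) ? entry.Task.Status : (TaskStatus?)null;
    }

    // Requests cancellation; the job itself must observe the token.
    public bool Cancel(Guid id)
    {
        Entry entry;
        if (!_jobs.TryGetValue(id, out entry)) return false;
        entry.Cts.Cancel();
        return true;
    }
}
```

A job that throws `OperationCanceledException` for its own token ends up in `TaskStatus.Canceled`, which is what the client sees when polling. Removing finished entries from the dictionary is not shown here.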
Cancelling inside a job
Inside your code you will need to regularly check your cancellation token to see if you should stop running code (see How do I abort/cancel TPL Tasks? if you are using the TAP pattern, or Albahari if you are using EAP). At that point you will exit your job processing, and your code, if designed well, should dispose of IDisposables where required, remove big strings from memory etc.
The basic premise of cancellation is that you check your cancellation token:
After a block of work that takes a long time (e.g. a call to an external API)
Inside a loop (for, foreach, do or while) that you control, you check on each iteration
Within a long block of sequential code, that might take "some time", you insert points to check on a regular basis
You need to define how quickly you need to react to a cancellation - for a windows service it should be within milliseconds, preferably, to make sure that windows doesn't have problems restarting or stopping the service.
Some people do this whole process with threads, and by terminating the thread - this is ugly and not recommended any more.
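For the "inside a loop" case, the check can be as small as one line per iteration. The sketch below assumes the work item creates a known number of files. `WorkFileJob` and the `createFile` callback are hypothetical stand-ins for your own file-writing code.

```csharp
using System;
using System.Threading;

public static class WorkFileJob
{
    // Creates 'fileCount' files via the supplied callback, checking the
    // cancellation token before each one. Throws OperationCanceledException
    // as soon as cancellation has been requested.
    public static int CreateFiles(int fileCount, Action<int> createFile, CancellationToken token)
    {
        int created = 0;
        for (int i = 0; i < fileCount; i++)
        {
            // Check point: one check per iteration of a loop we control.
            token.ThrowIfCancellationRequested();

            createFile(i);
            created++;
        }
        return created;
    }
}
```

How quickly this reacts depends on how long a single `createFile` call takes. If one file takes seconds to write, you would also pass the token into that call and check it there.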
Reliability
You need to ask: what happens if your server restarts, the windows service crashes, or any other exception happens causing you to lose incomplete jobs? In this case you may want a queue architecture that is reliable in order to be able to restart jobs, or rebuild the queue of jobs you haven't started yet.
If you don't want to scale, this is simple - use a local database that the windows service stores job information in.
On submission of a job, record its details in the database
When you start a job, record that against the job record in the database
When the client collects the job, mark it for delayed garbage collection in the database, and then delete it after a set amount of time (1 hour, 1 day ...)
If your service restarts and there are "in progress jobs" then requeue them and then start your job engine again.
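The bookkeeping above could look roughly like this. It is a sketch only: an in-memory dictionary stands in for the local database table, and `JobStore`, `JobState` and the method names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

public enum JobState { Submitted, InProgress, Collected }

public sealed class JobRecord
{
    public Guid Id;
    public JobState State;
    public DateTime? CollectedUtc;
}

// In-memory stand-in for the local database table of jobs.
public sealed class JobStore
{
    private readonly Dictionary<Guid, JobRecord> _rows = new Dictionary<Guid, JobRecord>();

    public int Count { get { return _rows.Count; } }

    // On submission of a job, record its details.
    public Guid RecordSubmission()
    {
        var row = new JobRecord { Id = Guid.NewGuid(), State = JobState.Submitted };
        _rows[row.Id] = row;
        return row.Id;
    }

    // When a job starts, record that against the job record.
    public void MarkStarted(Guid id) { _rows[id].State = JobState.InProgress; }

    // When the client collects the job, mark it for delayed deletion.
    public void MarkCollected(Guid id, DateTime nowUtc)
    {
        _rows[id].State = JobState.Collected;
        _rows[id].CollectedUtc = nowUtc;
    }

    // On service restart: anything still "in progress" was interrupted,
    // so put it back in the submitted state and return it for requeueing.
    public List<Guid> RequeueInterrupted()
    {
        var requeued = new List<Guid>();
        foreach (var row in _rows.Values)
        {
            if (row.State == JobState.InProgress)
            {
                row.State = JobState.Submitted;
                requeued.Add(row.Id);
            }
        }
        return requeued;
    }

    // Deletes collected jobs older than the retention period.
    public int PurgeCollected(DateTime nowUtc, TimeSpan retention)
    {
        var expired = new List<Guid>();
        foreach (var row in _rows.Values)
        {
            if (row.State == JobState.Collected && nowUtc - row.CollectedUtc.Value >= retention)
                expired.Add(row.Id);
        }
        foreach (var id in expired) _rows.Remove(id);
        return expired.Count;
    }
}
```

With a real database the same four operations become an insert, two updates and a delete, and `RequeueInterrupted` is the query you run once when the service starts.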
If you do want to scale, or your clients are on many computers, and you have a job engine "farm" of 1 or more servers, then look at using a message queue instead of directly communicating using OnCustomCommand.
Message queues have multiple benefits. They allow you to reliably submit jobs to a central queue that many workers can then pick up and process, and they decouple your clients and servers so you can scale out your job-running services. This can work locally or globally, but always reliably. You can even combine it with running your windows service on cloud workers, which you can scale dynamically.
Examples of technologies are MSMQ (if you want to maintain your own, or must stay inside your own firewall), or Windows Azure Service Bus (WASB) - which is cheap, and already done for you. In either case you will want to use Patterns and Best Practices for Enterprise Integration. In the case of WASB then there are many (MSDN), many (MSDN samples for BrokeredMessaging etc.), many (new Task-based API) developer resources, and NuGet packages for you to use
I'm new to the whole concept of ASP.NET, so please be patient with me. My requirement is to create a scheduling component that will do some tasks that are stored in a database. What's the most common way of implementing this? My idea is to store intervals as TimeSpan and somehow poll my database in very short intervals.
I would recommend implementing the task executor as a Windows Service.
Here are some approaches:
The service may periodically poll the database and perform the tasks.
Using MSMQ. It allows you to use system-wide queues: one process (the ASP.NET web application) can produce queue items, and another one (the service) can consume them.
How to do asynchronous programming using ASP.NET, MSMQ and Windows Service, for long running processes.
For more advanced scheduling scenarios, Quartz.net is a very useful tool - http://quartznet.sourceforge.net/ - ported from Java (I think).
As others are pointing out, ASP.Net's lifecycle isn't well suited to regular scheduled tasks. However, I've seen several web apps and web app frameworks (e.g. DotNetNuke and YetAnotherForum, I think) which perform scheduled tasks by occasionally borrowing threadpool threads after web hits have been serviced. This is very useful in shared hosting models where you're normally restricted in what you can install on the server.
If you're looking at short intervals, you usually use a Timer class - there are several in the Framework e.g. System.Timers.Timer and System.Threading.Timer. You set the Interval as a number of milliseconds (using the TimeSpan static methods such as FromSeconds or FromMinutes for ease of reading)
Look at System.Timers.Timer vs System.Threading.Timer for a comparison of the two timers
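As a rough illustration of the timer approach, here is a small wrapper around `System.Threading.Timer` that polls at a fixed interval. `Poller` is a made-up name and the `poll` callback stands in for your database query. The guard against overlapping runs is my own addition, since timer callbacks can fire again while a slow poll is still running.

```csharp
using System;
using System.Threading;

public sealed class Poller : IDisposable
{
    private readonly Timer _timer;
    private int _running; // 0 = idle, 1 = a poll is in progress

    public Poller(TimeSpan interval, Action poll)
    {
        _timer = new Timer(_ =>
        {
            // Skip this tick if the previous poll has not finished yet.
            if (Interlocked.Exchange(ref _running, 1) == 1) return;
            try
            {
                poll();
            }
            finally
            {
                Interlocked.Exchange(ref _running, 0);
            }
        }, null, interval, interval);
    }

    // Stops the timer; no new callbacks are scheduled after this.
    public void Dispose()
    {
        _timer.Dispose();
    }
}
```

Usage would be something like `new Poller(TimeSpan.FromSeconds(30), CheckDatabaseForDueTasks)`, kept alive for as long as the service runs.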
I have a simple Azure Worker role running that performs a task every few seconds. Below is the code that accomplishes this.
public override void Run()
{
    try
    {
        while (true)
        {
            DoSomething();
            System.Threading.Thread.Sleep(3000);
        }
    }
    catch (Exception ex)
    {
        Log.Add(ex, true);
    }
}
What I'd like to do now is add a second task DoSomethingElse() that fires once and only once per day. I've thought of a couple of ways to accomplish this:
Add a counter that calls the new task every nth loop
Add conditional logic to the new task that compares the current time to a prescribed time of day
Use some TBD scheduler library (such as Quartz.NET)
The first two solutions strike me as very brittle without additional code to deal with situations where the service is stopped and restarted. The third solution strikes me as potentially overkill.
My question is, what is the best practice for scheduling tasks at different intervals within an Azure Worker Role? I have a slight preference for sticking with straight .NET and not using a third-party library (though I'm not ruling it out).
Note, #3 above comes from this older question Recommend a C# Task Scheduling Library
The first two options are the simplest but they are brittle - especially in the cloud where roles can be recycled/load balanced etc... If the persistence is in memory or even disk based in the cloud, then it will be brittle.
Outside of other third party options, you could look at persisting the schedule and execution data into external storage (table services, sql azure, etc...). On a periodic timer, the worker role can query for the jobs that are due to be performed, record starting and then run the job. That also allows you to potentially scale out the number of worker roles since its persistence is external.
This can get complicated in a hurry but if you keep it simple with frequency and recording run times, it can be fairly straightforward.
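For the once-per-day case, the "keep it simple" version could be sketched like this. `ILastRunStore`, `InMemoryLastRunStore` and `DailyJob` are hypothetical names. A real store would sit on table storage or SQL Azure, and with more than one role instance you would also need a lease or a conditional update so that two instances do not both run the job.

```csharp
using System;
using System.Collections.Generic;

// Abstraction over the external storage that remembers when a job last ran.
public interface ILastRunStore
{
    DateTime? GetLastRunUtc(string jobName);
    void SetLastRunUtc(string jobName, DateTime whenUtc);
}

// In-memory stand-in, for demonstration only (it would not survive a recycle).
public sealed class InMemoryLastRunStore : ILastRunStore
{
    private readonly Dictionary<string, DateTime> _runs = new Dictionary<string, DateTime>();

    public DateTime? GetLastRunUtc(string jobName)
    {
        DateTime value;
        return _runs.TryGetValue(jobName, out value) ? value : (DateTime?)null;
    }

    public void SetLastRunUtc(string jobName, DateTime whenUtc)
    {
        _runs[jobName] = whenUtc;
    }
}

public static class DailyJob
{
    // Runs 'job' at most once per UTC day, at or after 'timeOfDayUtc'.
    // Call this from the existing loop on every iteration.
    public static bool RunIfDue(string jobName, TimeSpan timeOfDayUtc, DateTime nowUtc,
                                ILastRunStore store, Action job)
    {
        DateTime dueToday = nowUtc.Date + timeOfDayUtc;
        if (nowUtc < dueToday) return false;          // not yet time today

        DateTime? lastRun = store.GetLastRunUtc(jobName);
        if (lastRun.HasValue && lastRun.Value >= dueToday) return false; // already ran today

        store.SetLastRunUtc(jobName, nowUtc);         // record starting, then run
        job();
        return true;
    }
}
```

Because the last run time lives outside the role, a restart after today's run does not trigger a second run, and a restart before it simply runs the job late.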
Steve Marx wrote a couple of nice blog entries on how to build a task scheduler on Windows Azure using blob leases. I think you will find them very useful.