I would like to use Hangfire to create long running fire and forget task. If the web server dies and the background job is retried, I would like it to pick up where it left off.
In the example below, let's say that foo.RetryCount reaches 3 -> server restarts -> Hangfire reruns the job. In this case I would only like to run the task 7 more times (based on MaxAttemps), instead of restarting from zero.
I thought Hangfire persisted the arguments passed to the method in their current state, but as far as I can tell they are reset.
var foo = new Foo { RetryCount = 0, MaxAttemps = 10 };
BackgroundJob.Enqueue(() => RequestAndRetryOnFailure(foo));
void RequestAndRetryOnFailure(Foo foo)
{
// make request to server, if fail, wait for a
// while and try again later if not foo.MaxAttemps is reached
foo.RetryCount++;
}
I use hangfire extensively for a lot of different actions and have a constant need to reschedule a job that started but couldn't execute due to certain constraints.
The persistency you are referring to happens in the serialized version of the job that's enqeued but no longer kept once it does execute.
What I would recommend is, schedule the job to execute after certain amount if the server is not available. This will also help restart the job if the job is scheduled and hangfire reboots.
var foo = new Foo { RetryCount = 0, MaxAttemps = 10 };
BackgroundJob.Enqueue(() => RequestAndRetryOnFailure(foo));
void RequestAndRetryOnFailure(Foo foo)
{
// make request to server, if fail, wait for a
// while and try again later if not foo.MaxAttemps is reached
if (request to server failed)
{
foo.RetryCount ++;
If (foo.RetryCount < foo.MaxAttempts)
BackgroundJob.Schedule(RequestAndRetryOnFailure(foo), Timespan.FromSeconds(30));
else
return; // do nothing
}
}
We are using the following method in a Stateful Service on Service-Fabric. The service has partitions. Sometimes we get a FabricNotReadableException from this peace of code.
public async Task HandleEvent(EventHandlerMessage message)
{
var queue = await StateManager.GetOrAddAsync<IReliableQueue<EventHandlerMessage>>(EventHandlerServiceConstants.EventHandlerQueueName);
using(ITransaction tx = StateManager.CreateTransaction())
{
await queue.EnqueueAsync(tx, message);
await tx.CommitAsync();
}
}
Does that mean that the partition is down and is being moved? Of that we hit a secondary partition? Because there is also a FabricNotPrimaryException that is being raised in some cases.
I have seen the MSDN link (https://msdn.microsoft.com/en-us/library/azure/system.fabric.fabricnotreadableexception.aspx). But what does
Represents an exception that is thrown when a partition cannot accept reads.
mean? What happened that a partition cannot accept a read?
Under the covers Service Fabric has several states that can impact whether a given replica can safely serve reads and writes. They are:
Granted (you can think of this as normal operation)
Not Primary
No Write Quorum (again mainly impacting writes)
Reconfiguration Pending
FabricNotPrimaryException which you mention can be thrown whenever a write is attempted on a replica which is not currently the Primary, and maps to the NotPrimary state.
FabricNotReadableException maps to the other states (you don't really need to worry or differentiate between them), and can happen in a variety of cases. One example is if the replica you are trying to perform the read on is a "Standby" replica (a replica which was down and which has been recovered, but there are already enough active replicas in the replica set). Another example is if the replica is a Primary but is being closed (say due to an upgrade or because it reported fault), or if it is currently undergoing a reconfiguration (say for example that another replica is being added). All of these conditions will result in the replica not being able to satisfy writes for a small amount of time due to certain safety checks and atomic changes that Service Fabric needs to handle under the hood.
You can consider FabricNotReadableException retriable. If you see it, just try the call again and eventually it will resolve into either NotPrimary or Granted. If you get FabricNotPrimary exception, generally this should be thrown back to the client (or the client in some way notified) that it needs to re-resolve in order to find the current Primary (the default communication stacks that Service Fabric ships take care of watching for non-retriable exceptions and re-resolving on your behalf).
There are two current known issues with FabricNotReadableException.
FabricNotReadableException should have two variants. The first should be explicitly retriable (FabricTransientNotReadableException) and the second should be FabricNotReadableException. The first version (Transient) is the most common and is probably what you are running into, certainly what you would run into in the majority of cases. The second (non-transient) would be returned in the case where you end up talking to a Standby replica. Talking to a standby won't happen with the out of the box transports and retry logic, but if you have your own it is possible to run into it.
The other issue is that today the FabricNotReadableException should be deriving from FabricTransientException, making it easier to determine what the correct behavior is.
Posted as an answer (to asnider's comment - Mar 16 at 17:42) because it was too long for comments! :)
I am also stuck in this catch 22. My svc starts and immediately receives messages. I want to encapsulate the service startup in OpenAsync and set up some ReliableDictionary values, then start receiving message. However, at this point the Fabric is not Readable and I need to split this "startup" between OpenAsync and RunAsync :(
RunAsync in my service and OpenAsync in my client also seem to have different Cancellation tokens, so I need to work around how to deal with this too. It just all feels a bit messy. I have a number of ideas on how to tidy this up in my code but has anyone come up with an elegant solution?
It would be nice if ICommunicationClient had a RunAsync interface that was called when the Fabric becomes ready/readable and cancelled when the Fabric shuts down the replica - this would seriously simplify my life. :)
I was running into the same problem. My listener was starting up before the main thread of the service. I queued the list of listeners needing to be started, and then activated them all early on in the main thread. As a result, all messages coming in were able to be handled and placed into the appropriate reliable storage. My simple solution (this is a service bus listener):
public Task<string> OpenAsync (CancellationToken cancellationToken)
{
string uri;
Start ();
uri = "<your endpoint here>";
return Task.FromResult (uri);
}
public static object lockOperations = new object ();
public static bool operationsStarted = false;
public static List<ClientAuthorizationBusCommunicationListener> pendingStarts = new List<ClientAuthorizationBusCommunicationListener> ();
public static void StartOperations ()
{
lock (lockOperations)
{
if (!operationsStarted)
{
foreach (ClientAuthorizationBusCommunicationListener listener in pendingStarts)
{
listener.DoStart ();
}
operationsStarted = true;
}
}
}
private static void QueueStart (ClientAuthorizationBusCommunicationListener listener)
{
lock (lockOperations)
{
if (operationsStarted)
{
listener.DoStart ();
}
else
{
pendingStarts.Add (listener);
}
}
}
private void Start ()
{
QueueStart (this);
}
private void DoStart ()
{
ServiceBus.WatchStatusChanges (HandleStatusMessage,
this.clientId,
out this.subscription);
}
========================
In the main thread, you call the function to start listener operations:
protected override async Task RunAsync (CancellationToken cancellationToken)
{
ClientAuthorizationBusCommunicationListener.StartOperations ();
...
This problem likely manifested itself here as the bus in question already had messages and started firing the second the listener was created. Trying to access anything in state manager was throwing the exception you were asking about.
I have a long running action/method that is called when a user clicks a button on a internal MVC5 application. The button is shared by all users, meaning a second person can come in and click it seconds after it has been clicked. The long running task is updating a shared task window to all clients via SignalR.
Is there a recommended way to check if the task is still busy and simply notifying the user it's still working? Is there another recommended approach? (can't use external windows service for the work)
Currently what I am doing seems like a bad idea or I could be wrong and it's feasible. See below for a sample of what I am doing.
public static Task WorkerTask { get; set; }
public JsonResult SendData()
{
if (WorkerTask == null)
{
WorkerTask = Task.Factory.StartNew(async () =>
{
// Do the 2-15 minute long running job
});
WorkerTask = null;
}
else
{
TempData["Message"] = "Data is already being exported. Please see task window for the status.";
}
return Json(Url.Action("Export", "Home"), JsonRequestBehavior.AllowGet);
}
I don't think what you're doing will work at all. I see three issues:
You are storing the WorkerTask on the controller (I think). A new controller is created for every request. Therefore, a new WorkerTask will always be created.
If #1 weren't true, you would still need to wrap the instantiation of WorkerTask in a lock because multiple clients could reach the WorkerTask == null check at the same time.
You shouldn't have long running tasks in your web application. The app pool could restart at any time killing your WorkerTask.
If you want to skip the best practices advice of "don't do long running work in your web app", you could use the HostingEnvironment.QueueBackgroundWorkItem introduced in .NET 4.5.2 to kick off the long running task. You could store a variable in the HttpApplication.Cache to indicate whether the long running process has been kicked off.
This solution has more than a few issues (it won't work in a web farm, the app pool could die, etc.). A more robust solution would be to use something like Quartz.net or Hangfire.
I've got a C# console app running on Windows Server 2003 whose purpose is to read a table called Notifications and a field called "NotifyDateTime" and send an email when that time is reached. I have it scheduled via Task Scheduler to run hourly, check to see if the NotifyDateTime falls within that hour, and then send the notifications.
It seems like because I have the notification date/times in the database that there should be a better way than re-running this thing every hour.
Is there a lightweight process/console app I could leave running on the server that reads in the day's notifications from the table and issues them exactly when they're due?
I thought service, but that seems overkill.
My suggestion is to write simple application, which uses Quartz.NET.
Create 2 jobs:
First, fires once a day, reads all awaiting notification times from database planned for that day, creates some triggers based on them.
Second, registered for such triggers (prepared by the first job), sends your notifications.
What's more,
I strongly advice you to create windows service for such purpose, just not to have lonely console application constantly running. It can be accidentally terminated by someone who have access to the server under the same account. What's more, if the server will be restarted, you have to remember to turn such application on again, manually, while the service can be configured to start automatically.
If you're using web application you can always have this logic hosted e.g. within IIS Application Pool process, although it is bad idea whatsoever. It's because such process is by default periodically restarted, so you should change its default configuration to be sure it is still working in the middle of the night, when application is not used. Unless your scheduled tasks will be terminated.
UPDATE (code samples):
Manager class, internal logic for scheduling and unscheduling jobs. For safety reasons implemented as a singleton:
internal class ScheduleManager
{
private static readonly ScheduleManager _instance = new ScheduleManager();
private readonly IScheduler _scheduler;
private ScheduleManager()
{
var properties = new NameValueCollection();
properties["quartz.scheduler.instanceName"] = "notifier";
properties["quartz.threadPool.type"] = "Quartz.Simpl.SimpleThreadPool, Quartz";
properties["quartz.threadPool.threadCount"] = "5";
properties["quartz.threadPool.threadPriority"] = "Normal";
var sf = new StdSchedulerFactory(properties);
_scheduler = sf.GetScheduler();
_scheduler.Start();
}
public static ScheduleManager Instance
{
get { return _instance; }
}
public void Schedule(IJobDetail job, ITrigger trigger)
{
_scheduler.ScheduleJob(job, trigger);
}
public void Unschedule(TriggerKey key)
{
_scheduler.UnscheduleJob(key);
}
}
First job, for gathering required information from the database and scheduling notifications (second job):
internal class Setup : IJob
{
public void Execute(IJobExecutionContext context)
{
try
{
foreach (var kvp in DbMock.ScheduleMap)
{
var email = kvp.Value;
var notify = new JobDetailImpl(email, "emailgroup", typeof(Notify))
{
JobDataMap = new JobDataMap {{"email", email}}
};
var time = new DateTimeOffset(DateTime.Parse(kvp.Key).ToUniversalTime());
var trigger = new SimpleTriggerImpl(email, "emailtriggergroup", time);
ScheduleManager.Instance.Schedule(notify, trigger);
}
Console.WriteLine("{0}: all jobs scheduled for today", DateTime.Now);
}
catch (Exception e) { /* log error */ }
}
}
Second job, for sending emails:
internal class Notify: IJob
{
public void Execute(IJobExecutionContext context)
{
try
{
var email = context.MergedJobDataMap.GetString("email");
SendEmail(email);
ScheduleManager.Instance.Unschedule(new TriggerKey(email));
}
catch (Exception e) { /* log error */ }
}
private void SendEmail(string email)
{
Console.WriteLine("{0}: sending email to {1}...", DateTime.Now, email);
}
}
Database mock, just for purposes of this particular example:
internal class DbMock
{
public static IDictionary<string, string> ScheduleMap =
new Dictionary<string, string>
{
{"00:01", "foo#gmail.com"},
{"00:02", "bar#yahoo.com"}
};
}
Main entry of the application:
public class Program
{
public static void Main()
{
FireStarter.Execute();
}
}
public class FireStarter
{
public static void Execute()
{
var setup = new JobDetailImpl("setup", "setupgroup", typeof(Setup));
var midnight = new CronTriggerImpl("setuptrigger", "setuptriggergroup",
"setup", "setupgroup",
DateTime.UtcNow, null, "0 0 0 * * ?");
ScheduleManager.Instance.Schedule(setup, midnight);
}
}
Output:
If you're going to use service, just put this main logic to the OnStart method (I advice to start the actual logic in a separate thread not to wait for the service to start, and the same avoid possible timeouts - not in this particular example obviously, but in general):
protected override void OnStart(string[] args)
{
try
{
var thread = new Thread(x => WatchThread(new ThreadStart(FireStarter.Execute)));
thread.Start();
}
catch (Exception e) { /* log error */ }
}
If so, encapsulate the logic in some wrapper e.g. WatchThread which will catch any errors from the thread:
private void WatchThread(object pointer)
{
try
{
((Delegate) pointer).DynamicInvoke();
}
catch (Exception e) { /* log error and stop service */ }
}
You trying to implement polling approach, where a job is monitoring a record in DB for any changes.
In this case we are trying to hit DB for periodic time, so if the one hour delay reduced to 1 min later stage, then this solution turns to performance bottle neck.
Approach 1
For this scenario please use Queue based approach to avoid any issues, you can also scale up number of instances if you are sending so many emails.
I understand there is a program updates NotifyDateTime in a table, the same program can push a message to Queue informing that there is a notification to handle.
There is a windows service looking after this queue for any incoming messages, when there is a message it performs the required operation (ie sending email).
Approach 2
http://msdn.microsoft.com/en-us/library/vstudio/zxsa8hkf(v=vs.100).aspx
you can also invoke C# code from SQL Server Stored procedure if you are using MS SQL Server. but in this case you are making use of your SQL server process to send mail, which is not a good practice.
However you can invoke a web service, OR WCF service which can send emails.
But Approach 1 is error free, Scalable , Track-able, Asynchronous , and doesn't trouble your data base OR APP, you have different process to send email.
Queues
Use MSMQ which is part of windows server
You can also try https://www.rabbitmq.com/dotnet.html
Pre-scheduled tasks (at undefined times) are generally a pain to handle, as opposed to scheduled tasks where Quartz.NET seems well suited.
Furthermore, another distinction is to be made between fire-and-forget for tasks that shouldn't be interrupted/change (ex. retries, notifications) and tasks that need to be actively managed (ex. campaign or communications).
For the fire-and-forget type tasks a message queue is well suited. If the destination is unreliable, you will have to opt for retry levels (ex. try send (max twice), retry after 5 minutes, try send (max twice), retry after 15 minutes) that at least require specifying message specific TTL's with a send and retry queue. Here's an explanation with a link to code to setup a retry level queue
The managed pre-scheduled tasks will require that you use a database queue approach (Click here for a CodeProject article on designing a database queue for scheduled tasks)
. This will allow you to update, remove or reschedule notifications given you keep track of ownership identifiers (ex. specifiy a user id and you can delete all pending notifications when the user should no longer receive notifications such as being deceased/unsubscribed)
Scheduled e-mail tasks (including any communication tasks) require finer grained control (expiration, retry and time-out mechanisms). The best approach to take here is to build a state machine that is able to process the e-mail task through its steps (expiration, pre-validation, pre-mailing steps such as templating, inlining css, making links absolute, adding tracking objects for open tracking, shortening links for click tracking, post-validation and sending and retrying).
Hopefully you are aware that the .NET SmtpClient isn't fully compliant with the MIME specifications and that you should be using a SAAS e-mail provider such as Amazon SES, Mandrill, Mailgun, Customer.io or Sendgrid. I'd suggest you look at Mandrill or Mailgun. Also if you have some time, take a look at MimeKit which you can use to construct MIME messages for the providers allow sending raw e-mail and doesn't necessarily support things like attachments/custom headers/DKIM signing.
I hope this sets you on the right path.
Edit
You will have to use a service to poll at specific intervals (ex. 15 seconds or 1 minute). The database load can be somewhat negated by checkout out a certain amount of due tasks at a time and keeping an internal pool of messages due for sending (with a time-out mechanism in place). When there's no messages returned, just 'sleep' the polling for a while. I'd would advise against building such a system out against a single table in a database - instead design an independant e-mail scheduling system that you can integrate with.
I would turn it into a service instead.
You can use System.Threading.Timer event handler for each of the scheduled times.
Scheduled tasks can be scheduled to run just once at a specific time (as opposed to hourly, daily, etc.), so one option would be to create the scheduled task when the specific field in your database changes.
You don't mention which database you use, but some databases support the notion of a trigger, e.g. in SQL: http://technet.microsoft.com/en-us/library/ms189799.aspx
If you know when the emails need to be sent ahead of time then I suggest that you use a wait on an event handle with the appropriate timeout. At midnight look at the table then wait on an event handle with the timeout set to expire when the next email needs to be sent. After sending the email wait again with the timeout set based on the next mail that should be sent.
Also, based on your description, this should probably be implemented as a service but it is not required.
I have been dealing with the same problem about three years ago. I have changed the process several times before it was good enough, I tell you why:
First implementation was using special deamon from webhosting which called the IIS website. The website checked the caller IP and then check the database and send emails. This was working until one day, when I got a lot of very dirty emails from the users that I have totally spammed their mailboxes. The drawback of keeping email in database and sending from SMTP email is that there is NOTHING which ensure DB to SMTP transaction. You are never sure if the email has been successfully sent or not. Sending email can be successfull, can failed or it can be false positive or it can be false negative (SMTP client tells you, that the email was not sent, but it was). There was some problem with SMTP server and the server returned false(email not send), but the email was sent. The daemon was resending the email every hour the whole day before the dirty emails appears.
Second implementation: To prevent spamming, I have changed the algorithm, that the email is considered to be sent even if it failed (my email notification was not too important). My first advice is: "Don't launch the deamon too often, because this false negative smtp error makes users upset."
After several month there were some changes on the server and the daemon was not working well. I got the idea from the stackoverflow: bind the .NET timer to the web application domain. It wasn't good idea, because it seems, that IIS can restart application from time to time because of memory leaks and the timer never fires if the restarts are more often then timer ticks.
The last implementation. Windows scheduler every hour fires python batch which read local website. This fire ASP.NET code. The advantage is that time windows scheduler call the the local batch and website reliably. IIS doesn't hang, it has restart ability. The timer site is part of my website, it is still one projects. (you can use console app instead). Simple is better. It just works!
Your first choice is the correct option in my opinion. Task Scheduler is the MS recommended way to perform periodic jobs. Moreover it's flexible, can reports failures to ops, is optimized and amortized amongst all tasks in the system, ...
Creating any console-kind app that runs all the time is fragile. It can be shutdown by anyone, needs an open seesion, doesn't restart automatically, ...
The other option is creating some kind of service. It's guaranteed to be running all the time, so that would at least work. But what was your motivation?
"It seems like because I have the notification date/times in the database that there should be a better way than re-running this thing every hour."
Oh yeah optimization... So you want to add a new permanently running service to your computer so that you avoid one potentially unrequired SQL query every hour? The cure looks worse than the disease to me.
And I didn't mention all the drawbacks of the service. On one hand, your task uses no resource when it doesn't run. It's very simple, lightweight and the query efficient (provided you have the right index).
On the other hand, if your service crashes it's probably gone for good. It needs a way to be notified of new e-mails that may need to be sent earlier than what's currently scheduled. It permanently uses computer resources, such as memory. Worse, it may contain memory leaks.
I think that the cost/benefit ratio is very low for any solution other than the trivial periodic task.
Needed:
A Windows Service That Executes Jobs from a Job Queue in a DB
Wanted:
Example Code, Guidance, or Best Practices for this type of Application
Background:
A user will click on an ashx link that will insert a row into the DB.
I need my windows service to periodically poll for rows in this table, and it should execute a unit of work for each row.
Emphasis:
This isn't completely new terrain for me.
EDIT: You can assume that I know how to create a Windows Service and basic data access.
But I need to write this service from scratch.
And I'd just like to know upfront what I need to consider.
EDIT: I'm most worried about jobs that fail, contention for jobs, and keeping the service running.
Given that you are dealing with a database queue, you have a fair cut of the job already done for you due to the transactional nature of databases. Typical queue driven application has a loop that does:
while(1) {
Start transction;
Dequeue item from queue;
process item;
save new state of item;
commit;
}
If processing crashes midway, the transaction rolls back and the item is processed on the next service start up.
But writing queues in a database is actually a lot trickier than you believe. If you deploy a naive approach, you'll find out that your enqueue and dequeue are blocking each other and the ashx page becomes unresponsive. Next you'll discover the dequeue vs. dequeue are deadlocking and your loop is constantly hitting error 1205. I strongly urge you to read this article Using Tables as Queues.
Your next challenge is going to be getting the pooling rate 'just right'. Too aggressive and your database will be burning hot from the pooling requests. Too lax and your queue will grow at rush hours and will drain too slowly. You should consider using an entirely different approach: use a SQL Server built-in QUEUE object and rely on the magic of the WAITFOR(RECEIVE) semantics. This allows for completely poll free self load tuning service behavior. Actually, there is more: you don't need a service to start with. See Asynchronous Procedures Execution for an explanation on what I'm talking about: launching processing asynchronously in SQL Server from a web service call, in a completely reliable manner. And finally, if the logic must be in C# process then you can leverage the External Activator, which allows the processing to be hosted in standalone processes as opposed to T-SQL procedures.
First you'll need to consider
How often to poll for
Does your service just stop and start or does it support pause and continue.
Concurrency. Services can increase the likelihood of a encountering a problem
Implementation
Use a System.Timers.Timer not a Threading.Timer
Maker sure you set the Timer.AutoReset to false. This will stop the reentrant problem.
Make sure to include execution time
Here's the basic framework of all those ideas. It includes a way to debug this which is a pain
public partial class Service : ServiceBase{
System.Timers.Timer timer;
public Service()
{
timer = new System.Timers.Timer();
//When autoreset is True there are reentrancy problme
timer.AutoReset = false;
timer.Elapsed += new System.Timers.ElapsedEventHandler(DoStuff);
}
private void DoStuff(object sender, System.Timers.ElapsedEventArgs e)
{
Collection stuff = GetData();
LastChecked = DateTime.Now;
foreach (Object item in stuff)
{
try
{
item.Dosomthing()
}
catch (System.Exception ex)
{
this.EventLog.Source = "SomeService";
this.EventLog.WriteEntry(ex.ToString());
this.Stop();
}
TimeSpan ts = DateTime.Now.Subtract(LastChecked);
TimeSpan MaxWaitTime = TimeSpan.FromMinutes(5);
if (MaxWaitTime.Subtract(ts).CompareTo(TimeSpan.Zero) > -1)
timer.Interval = MaxWaitTime.Subtract(ts).TotalMilliseconds;
else
timer.Interval = 1;
timer.Start();
}
protected override void OnPause()
{
base.OnPause();
this.timer.Stop();
}
protected override void OnContinue()
{
base.OnContinue();
this.timer.Interval = 1;
this.timer.Start();
}
protected override void OnStop()
{
base.OnStop();
this.timer.Stop();
}
protected override void OnStart(string[] args)
{
foreach (string arg in args)
{
if (arg == "DEBUG_SERVICE")
DebugMode();
}
#if DEBUG
DebugMode();
#endif
timer.Interval = 1;
timer.Start();
}
private static void DebugMode()
{
Debugger.Break();
}
}
EDIT Fixed loop in Start()
EDIT Turns out Milliseconds is not the same as TotalMilliseconds
You may want to have a look at Quartz.Net to manage scheduling the jobs. Not sure if it will fit your particular situation, but it's worth a look.
Some things I can think of, based on your edit:
Re: job failure:
Determine whether a job can be retried and do one of the following:
Move the row to an "error" table for logging / reporting later OR
Leave the row in the queue so that it will be reprocessed by the job service
You could add a column like WaitUntil or something similar to delay retrying the job after a failure
Re: contention:
Add a timestamp column such as "JobStarted" or "Locked" to track when the job was started. This will prevent other threads (assuming your service is multithreaded) from trying to execute the job simultaneously.
You'll need to have some cleanup process that goes through and clears stale jobs for re-processing (in the event the job service fails and your lock is never released).
Re: keeping the service running
You can tell windows to restart a service if it fails.
You can detect previous failure upon startup by keeping some kind of file open while the service is running and deleting it upon successful shutdown. If your service starts up and that file already exists, you know the service previously failed and can alert an operator or perform the necessary cleanup operations.
I'm really just poking around in the dark here. I'd strongly suggest prototyping the service and returning with any specific questions about the way it functions.