I have multiple Windows services which run 24/7 on a server. For logging events etc. I already use log4net, but I want to be able to see whether all my services are still running. So I've stumbled upon this question and learned about the ServiceController class. Now I've had the idea to make another service in which I create a ServiceController object per service and use the WaitForStatus method to be notified when any of the services is stopped. I'd be able to check the statuses externally through a WCF service hosted in that monitoring service.
But I've also seen the answer to this question which states a ServiceController should be closed and disposed. Would it be bad to let my ServiceController wait 24/7 until any of my services stopped? Or should I use Quartz or a simple Timer to run a check every x amount of time?
Thanks in advance
You shouldn't. There is no mechanism in Windows to let a service status change generate an event, so ServiceController.WaitForStatus() must poll. It is hard-coded to query the service status 4 times per second: a Thread.Sleep(250) fixes the poll interval. Use a decompiler to see this for yourself.
So you basically have many threads in your program doing nothing but sleeping for hours. That's pretty ugly; a thread is an expensive OS object. These threads don't burn any core, but the OS thread scheduler is still involved, constantly re-activating the threads when their sleep period expires.
If you need this kind of responsiveness to status changes then it is okay-ish, but keep in mind that it cannot be more responsive than 250 msec. Increasing the interval by using a Timer sounds attractive, but do consider the basic problem with polling: if you poll, say, once a minute and an admin stops and restarts the service within, say, 30 seconds between two polls, then you'll never see the status change. Oops.
Consider using only one thread that queries many ServiceControllers through their Status property (call Refresh() before each read, because the property value is cached). That is your own polling code, minus the cost of the threads.
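The single-thread idea can be sketched roughly as follows. This is a language-neutral illustration in Python, not .NET code: poll_services, get_status and on_change are hypothetical names, and get_status stands in for whatever reads the current state of one service (in .NET, a ServiceController whose Status is read after Refresh()).

```python
import time
from typing import Callable, Dict, Iterable, Optional


def poll_services(
    names: Iterable[str],
    get_status: Callable[[str], str],            # hypothetical status reader
    on_change: Callable[[str, str, str], None],  # called as (name, old, new)
    interval: float = 0.25,
    rounds: Optional[int] = None,                # None means poll forever
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll every service from one thread and report status transitions."""
    service_names = list(names)
    last: Dict[str, str] = {}
    done = 0
    while rounds is None or done < rounds:
        for name in service_names:
            status = get_status(name)
            previous = last.get(name)
            # Only report a change once a previous status is known.
            if previous is not None and status != previous:
                on_change(name, previous, status)
            last[name] = status
        done += 1
        sleep(interval)
    return last
```

One sleeping thread replaces one sleeping thread per service. The caveat from above still applies: a stop and restart that both happen between two polls stays invisible, whatever the interval.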
Related
Is it possible to send a heartbeat to Hangfire (Redis storage) to tell the system that the process is still alive? At the moment I set the InvisibilityTimeout to TimeSpan.MaxValue to prevent Hangfire from restarting the job. But if the process fails or the server restarts, the job will never be removed from the list of running jobs. So my idea was to remove the large timeout and send a kind of heartbeat instead. Is this possible?
I found https://discuss.hangfire.io/t/hangfire-long-job-stop-and-restart-several-time/4282/2 which deals with how to keep a long-running job alive in Hangfire.
The user zLanger says that jobs are considered dead and restarted once you ...
[...] are hitting hangfire’s invisibilityTimeout. You have two options.
increase the timeout to more than the job will ever take to run
have the job send a heartbeat to let hangfire’s know it’s still alive.
That's not new to you. But interestingly, the follow-up question there is:
How do you implement heartbeat on job?
This remains unanswered there, a hint that your problem is really not trivial.
I have never handled long-running jobs in Hangfire, but I know the problem from other queuing systems like the former SunGrid Engine which is how I got interested in your question.
Back in the day, I had exactly your problem with SunGrid, and the department's computer guru told me that one should avoid long-running jobs at any cost, according to some mathematical queuing theory (I will try to contact him and find the reference to the book he quoted). His idea is maybe worth sharing with you:
If you have some job which takes longer than the tolerated maximal running time of the queuing system, do not submit the job itself, but rather multiple calls of a wrapper script which is able to (1) start, (2) freeze-stop, (3) unfreeze-continue the actual task.
This stop-continue can indeed be a suspend at the operating-system level (CTRL+Z and fg, respectively, in Linux); see e.g. unix.stackexchange.com on that issue.
In practice, I had the binary myMonteCarloExperiment.x and the wrapper script myMCjobStarter.sh. The maximum compute time I had was a day. I would fill the queue with hundreds of calls of the wrapper script, with the boundary condition that only one of them should be running at a time. The script would check whether there was already a process myMonteCarloExperiment.x started anywhere on the compute cluster; if not, it would start an instance. In case there was a suspended process, the wrapper script would resume it, let it run for 23 hours and 55 minutes, and then suspend it again. In any other case, the wrapper script would report an error.
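The resume-run-suspend step of such a wrapper could look roughly like this. It is a minimal sketch in Python for POSIX systems only, not the original myMCjobStarter.sh: run_time_slice is a hypothetical name, and finding the process on the cluster is left out.

```python
import os
import signal
import time


def run_time_slice(pid: int, seconds: float) -> None:
    """Resume a suspended process, let it run for `seconds`, then suspend it again."""
    # SIGCONT resumes a stopped process; it is harmless if the process is already running.
    os.kill(pid, signal.SIGCONT)
    time.sleep(seconds)
    # CTRL+Z sends SIGTSTP, which a program may catch; SIGSTOP cannot be caught or ignored.
    os.kill(pid, signal.SIGSTOP)
```

With the numbers from the description above, each queued call would use a slice of 23 hours and 55 minutes, that is, seconds = 23 * 3600 + 55 * 60.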
This approach does not implement a job heartbeat, but it does run a lengthy job. It also keeps the queue administrator happy by avoiding the need to clean up Hangfire's job logs.
Further references
How to prevent a Hangfire recurring job from restarting after 30 minutes of continuous execution seems to be a good read
I have a Windows service that is calling a stored proc over and over (in an infinite loop).
The code looks like this:
while (1)
{
    callStoredProc();
    doSomethingWithResults();
}
However, there might be cases where the loop gets stuck with no response, but the service is still technically running.
I imagine there are tools to monitor the health of a service, to let operations teams know to restart it.
But for my scenario this won't help since the service will still be technically running, but it's stuck and can't continue.
What's the best way to ensure this process restarts if this scenario happens?
Would the solution be to use a task scheduler that checks for the heartbeat of this process and restarts the service if there's no heartbeat for a period of time? Or to have another separate thread that monitors the progress of the first process?
Windows services have various recovery options, which take care of question 1. For question 2, the best bet would be to use a timeout approach whereby, if the service takes more than X amount of time to complete, it restarts or stops what it's doing (I don't know the nature of your service, so I can't provide implementation detail).
The heartbeat idea would work as well, however, that just becomes another thing to manage/maintain & install.
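A heartbeat plus timeout inside the same process might be sketched as below. This is a language-neutral illustration in Python, and Heartbeat, watchdog and on_stuck are hypothetical names. The working loop calls beat() after each iteration; a second thread runs watchdog() and reacts when no beat has arrived for too long, for example by ending the process so that the service recovery options can restart it.

```python
import threading
import time
from typing import Callable


class Heartbeat:
    """Records when the worker loop last made progress."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lock = threading.Lock()
        self._last_beat = clock()

    def beat(self) -> None:
        # Called by the worker after each successful loop iteration.
        with self._lock:
            self._last_beat = self._clock()

    def is_stale(self, timeout: float) -> bool:
        with self._lock:
            return self._clock() - self._last_beat > timeout


def watchdog(
    heartbeat: Heartbeat,
    timeout: float,
    on_stuck: Callable[[], None],
    stop: threading.Event,
    check_every: float = 1.0,
) -> None:
    """Call on_stuck() once if no beat arrives within `timeout` seconds."""
    # Event.wait returns False on timeout, so the loop body runs every check_every seconds.
    while not stop.wait(check_every):
        if heartbeat.is_stale(timeout):
            on_stuck()
            return
```

The timeout has to be longer than the slowest legitimate stored procedure call, otherwise a healthy loop is reported as stuck.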
I have determined that I have some intense operations that shouldn't occur in the context of a web request in my EWL web application. I see that EWL supports running a Windows Service which will be perfect for running my intense operations in the background without tying up web request threads and forcing users to wait.
I am to the point where I need to implement void WindowsServiceBase.Tick() but I don't see how often Tick() is called. How often is Tick() called by default and is this configurable?
I also see from the source that the Windows Service sends a "health check" email before calling Tick(). What's the thinking behind this? What if I don't want my email spammed with these emails?
The code in ServiceBaseAdapter is cryptic, but looking very carefully you can see that Tick will be called ten seconds after Init completes and, from then on, ten seconds after the last Tick call completes.
The health check email will go out shortly after midnight every day. It's designed to let you know that the service is still alive.
I have a timer running in my web application. Each time the application starts up, the timer is created. The issue is that the app pool ends after an idle period which also ends the timer. The next request causes the app pool to start back up and a new timer is created.
Is there any way to keep the timer from resetting?
This is a question I see a lot. The short answer is no; the long answer is that even if you periodically poll the web site, the app pool will eventually be recycled anyway.
If you need to do background work like this and embed it in ASP.NET, you have to create a robust work queue that doesn't break when there are interruptions or crashes, because they are going to happen. And that's just good design anyway for long-running processes. This might seem like a lot of work, but a simple design can take you very far.
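A simple design of that kind could look like the following sketch. It is written in Python with SQLite purely for illustration; the jobs table and the function names are assumptions, not part of ASP.NET. Jobs live in durable storage and are marked done only after the handler has finished, so an app pool recycle or crash leaves unfinished jobs pending for the next start.

```python
import sqlite3
from typing import Callable


def open_queue(path: str) -> sqlite3.Connection:
    """Open (or create) the durable job table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def enqueue(conn: sqlite3.Connection, payload: str) -> None:
    conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    conn.commit()


def process_pending(conn: sqlite3.Connection, handler: Callable[[str], None]) -> int:
    """Run handler on each unfinished job; mark it done only after it succeeds."""
    rows = conn.execute(
        "SELECT id, payload FROM jobs WHERE done = 0 ORDER BY id"
    ).fetchall()
    completed = 0
    for job_id, payload in rows:
        handler(payload)  # if this raises or the process dies, the job stays pending
        conn.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
        conn.commit()
        completed += 1
    return completed
```

Because a job can be interrupted after the handler ran but before it was marked done, it may run twice; the handler therefore has to tolerate repeats.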
The recommended approach is to pull that code into a separate Win32 service, because the nature of such workloads doesn't sit well with REST-based architectures.
If all you need is a periodic check, then it might be fine to just have an external script polling the web site, but it's a crude way of handling timers.
Folks,
I want to develop a long-running Windows service (it should be working without problems for months), and I wonder what the better option is here:
1. Use a while(true) loop in the OnStop method
2. Use a timer to tick every n seconds and trigger my code
3. Any other options?
Thanks
Essam
I wouldn't do #1.
I'd either do #2, or I'd spin off a separate thread during OnStart that does the actual work.
Anything but #1
The services manager (or the user, if he's the one activating the controls) expects OnStart() and OnStop() to return in a timely fashion.
The way it's usually done is to start your own thread that keeps things running and, of course, listens for an event that might tell it to stop.
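The pattern of a worker thread plus a stop event can be sketched like this. It is a language-neutral Python illustration, not the .NET ServiceBase API: the Worker class and its on_start/on_stop methods are hypothetical stand-ins for what a service would do in OnStart() and OnStop().

```python
import threading
from typing import Callable


class Worker:
    """Runs do_work repeatedly on its own thread until asked to stop."""

    def __init__(self, do_work: Callable[[], None], interval: float = 5.0) -> None:
        self._do_work = do_work
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def on_start(self) -> None:
        # Returns immediately; the real work happens on the worker thread.
        self._thread.start()

    def on_stop(self, timeout: float = 10.0) -> None:
        # Signal the loop to end, then wait a bounded time for it to finish.
        self._stop.set()
        self._thread.join(timeout)

    def _run(self) -> None:
        while not self._stop.is_set():
            self._do_work()
            # Sleeps up to `interval` seconds, but wakes at once when stop is set.
            self._stop.wait(self._interval)
```

Waiting on the event instead of sleeping is the design choice that keeps the stop request timely: the loop does not have to finish a full interval before it notices.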
Might be worth considering a scheduled task with a short interval. Saves writing a lot of plumbing code and dealing with the peculiarities of Windows Services timers.
Don't mess with the service controller code. If the service wants to stop, you will only make matters worse by using #1. And BTW, the service can always crash, in which case your while(true) won't help you at all.
If you really want to have a "running windows service (it should be working without problems for months)", you'd better make sure your own code is properly and thoroughly tested using unit and integration tests before you run it as a service.
I would NOT recommend #1.
What I’ve done in the past for the exact same scenario is create a scheduled task that runs every N seconds and kicks off a small script that simply does these two things: (1) it checks an “IsAlreadyRunning” flag (which is read from the database); (2) if the flag is true, the script immediately stops and exits; if the flag is false, the script kicks off a separate process (exe) in a new thread (which uses a service to perform a task that can be either very short or sometimes really long, depending on the number of records to process). This process of course sets and resets the IsAlreadyRunning flag to ensure that threads do not kick off actions that overlap. I have a service that's been running for years now with this approach and I've never had any problems with it. My main process uses a web service and a bunch of other things to perform some heavy backup operations.
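The flag check described above could be sketched as follows, in Python with SQLite for illustration only; the flags table, the row name 'job' and the function names are assumptions. The sketch tests and sets the flag in one UPDATE statement, so two runs started at the same moment cannot both see the flag as clear.

```python
import sqlite3
from typing import Callable


def init_flag(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flags (name TEXT PRIMARY KEY, running INTEGER NOT NULL)"
    )
    conn.execute("INSERT OR IGNORE INTO flags (name, running) VALUES ('job', 0)")
    conn.commit()


def try_acquire(conn: sqlite3.Connection) -> bool:
    """Set the flag only if it is currently clear; True means this run owns it."""
    cur = conn.execute(
        "UPDATE flags SET running = 1 WHERE name = 'job' AND running = 0"
    )
    conn.commit()
    return cur.rowcount == 1


def release(conn: sqlite3.Connection) -> None:
    conn.execute("UPDATE flags SET running = 0 WHERE name = 'job'")
    conn.commit()


def run_if_idle(conn: sqlite3.Connection, task: Callable[[], None]) -> bool:
    if not try_acquire(conn):
        return False  # another run is in progress: exit immediately
    try:
        task()
    finally:
        release(conn)  # reset the flag even if the task raises
    return True
```

If the process is killed outright, the flag stays set and later runs will keep exiting, so a real setup would also need some way to clear a stale flag.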
The System.Threading.Timer class would seem appropriate for this sort of usage.
Is it doing a
1 clean up task, or
2 waking up and looking to see if it needs to run a task?
If it is something like #2, then using MSMQ would be more appropriate. With MSMQ, the task would get done almost immediately.