WinService sometimes fails to start after boot - c#

I have a few Windows services (all written in C#) that all show the same strange behaviour.
I have them set to delayed auto start so that they get started after boot (delayed because they are not critical).
They all host WCF services as parts of client-server applications and were installed using WiX, if that matters.
I noticed that sometimes they just don't start.
If you look at the Services window soon enough after the OS is ready, they have the status "Starting". If you then refresh the view, they are no longer starting, but not "Started" either.
You can then start them manually without any problem whatsoever.
This produces no error messages and no log entries. And to make it even better, this only occurs if the machine has been shut down and turned on again. A reboot works perfectly fine every time (I tried it about 20 times on two different machines).
If you set the failure actions to restart the service after a failure, it seems it will eventually start the service successfully, but surely this cannot be the ideal solution.
The OSes are Windows 7 and Windows Server 2008 R2.
What am I missing here? Why do they fail to be started automatically (the first time at least)? And why does it make a difference whether the computer boots after a reboot or after a shutdown?
EDIT:
I was wrong about the failure actions. They did not fix the problem.
EDIT 2:
I have added exception handling around everything to log possible exceptions. But so far no exceptions have been logged.

Might it be that the WCF services take a long time to start? AFAIK, a Windows service has to come up within a certain time (best practice is 30 seconds; I don't know the technical limit), or it times out. That could explain why your service shows the status "Starting" but never actually starts.

Please see my answer from the duplicate. A Windows service typically shouldn't have access to the desktop for security reasons, but it certainly should have a good amount of logging in it. You probably have a race condition. The only thing you could do about this in WiX would be to express a dependency on another service, so that the service control manager waits a while before starting your service. But it really would be better if your code were more robust. For example, OnStart could start a background worker thread and then return success immediately. The background thread could then keep attempting to host the WCF endpoint, with everything doing a fair amount of logging along the way.
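A minimal sketch of that pattern in C#, under stated assumptions: `BackgroundHostStarter` is a hypothetical helper, not part of WCF or .NET. The service's `OnStart` would call `Start(...)` and return immediately, passing something like `() => serviceHost.Open()` for a `System.ServiceModel.ServiceHost`; the retry delay, attempt limit and logging callback are placeholders to tune.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: OnStart() calls Start(...) and returns at once, so the
// service control manager sees the service as started quickly, while the WCF
// host is opened (and retried) on a background task.
public sealed class BackgroundHostStarter
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();

    // Already-completed placeholder until Start() is called.
    public Task Worker { get; private set; } = Task.FromResult(0);

    // openHost: e.g. () => serviceHost.Open() for a System.ServiceModel.ServiceHost (assumption).
    public void Start(Action openHost, Action<string> log, TimeSpan retryDelay, int maxAttempts)
    {
        CancellationToken token = _cts.Token;
        Worker = Task.Run(async () =>
        {
            for (int attempt = 1; attempt <= maxAttempts; attempt++)
            {
                try
                {
                    log("Opening host, attempt " + attempt);
                    openHost();
                    log("Host opened");
                    return;
                }
                catch (Exception ex)
                {
                    // Log the full exception so boot-time failures are not silent.
                    log("Attempt " + attempt + " failed: " + ex);
                }

                if (attempt == maxAttempts) break;
                try
                {
                    await Task.Delay(retryDelay, token);
                }
                catch (OperationCanceledException)
                {
                    log("Start-up cancelled");
                    return;
                }
            }
            log("Giving up on opening the host");
        });
    }

    // Call from OnStop(): cancels pending retries and waits briefly for the worker.
    public void Stop(TimeSpan timeout)
    {
        _cts.Cancel();
        Worker.Wait(timeout);
    }
}
```

In `OnStop`, calling `Stop(TimeSpan.FromSeconds(10))` (the value is arbitrary) cancels pending retries. Because the host is opened off the thread that reports start-up to the SCM, a slow WCF start should no longer cause the start timeout, and every failed attempt ends up in the log.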

Related

ASP.NET .NET 4.5 Application crashes in IIS periodically and I can't figure out the cause

I have a .NET 4.5 ASP.NET Web API application, deployed in IIS using one worker process on an 8 GB VM with 4 CPUs.
I made changes to it recently (upgraded ServiceStack.Interfaces, ServiceStack.Common, ServiceStack.Redis and a bunch of dependencies) and started noticing that the IIS app pool this app is deployed on recycles about once an hour (give or take a few minutes).
There is nothing in my application logs that shows any kind of issue. I collect metrics using Telegraf, and I do NOT see the memory metrics increase at all; in all the metrics I look at, everything looks absolutely normal, and then the app pool recycles.
I looked at the Event Viewer, filtered the logs by the WAS source, and saw events with ID 5011, which, as I understand it, basically means that the IIS worker process crashed.
So then I ran DebugDiag on my local box with the app deployed there (I can reproduce the issue locally). It ran for a while and finally got the same event in the Event Viewer. I looked at the crash analysis logs from DebugDiag, and all I see is a bunch of logged exceptions, but nothing concrete right before the crash.
At this point I'm not entirely sure what else I can do to figure out what's causing the crash, so I'm hoping for more suggestions on how to get more transparency.
What I think is happening is that there is some incompatibility between one of my dependencies and some of the upgraded packages, which causes an exception to be thrown that is not handled by anything and crashes the IIS worker.
Otherwise my application is working perfectly fine: all API endpoints function with no issues, memory is NOT increasing, and CPU is fine. So as far as I can tell, there are no issues up to the crash.
I'm wondering if anyone knows any tricks to find what's causing the crash and/or to handle it, preventing this exception from escaping and crashing the worker.
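One generic trick, sketched below under assumptions: subscribe to `AppDomain.CurrentDomain.UnhandledException` and `TaskScheduler.UnobservedTaskException` early (for example in `Application_Start`) and log whatever reaches them. `CrashLogger` and its `Describe` format are hypothetical. Note that the first event only lets you log: an exception that escapes a thread still terminates the worker process afterwards.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: call CrashLogger.Install(...) once at start-up
// (e.g. Application_Start in Global.asax; that placement is an assumption).
public static class CrashLogger
{
    public static void Install(Action<string> log)
    {
        // Raised for exceptions that escape any thread, including background
        // threads such as a pub/sub listener. Logging here does NOT keep the
        // process alive; the CLR still terminates it afterwards.
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
            log(Describe(e.ExceptionObject, e.IsTerminating));

        // Raised when a faulted Task's exception was never observed.
        TaskScheduler.UnobservedTaskException += (sender, e) =>
        {
            log(Describe(e.Exception, false));
            // Mark as observed so it is not escalated (only relevant when
            // ThrowUnobservedTaskExceptions is enabled on .NET 4.5+).
            e.SetObserved();
        };
    }

    // Formats the exception, including stack trace and inner exceptions.
    public static string Describe(object exceptionObject, bool isTerminating)
    {
        string kind = isTerminating ? "FATAL (process terminating)" : "Unobserved";
        return kind + ": " + exceptionObject;
    }
}
```

The fatal entry written just before the crash is usually the missing piece: it names the exception type and the thread's stack, which the Event Viewer's WAS 5011 entry does not.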
I was able to narrow it down, with some confidence, to somewhere within the ServiceStack.Redis RedisPubSubServer. What the actual issue is, I don't know, as that would take a lot more time to dig into and I've wasted too much time already.
However, building on some existing code I had (from before ServiceStack supported Sentinel), I created a new implementation of the Redis client wrapper, which I call LazySentinelServiceStackClientWrapper. Instead of using the built-in sentinel manager, it relies on a custom sentinel provider I created, LazySentinelApiSentinelProvider. This provider interrogates the available sentinel hosts in random order for the master and slave nodes; I then construct a pool from the retrieved read/write and read-only hosts, and this pool is used to run the Redis operations. The pool is refreshed whenever an error occurs (e.g. after a failover). This is in contrast to the built-in sentinel manager that comes with ServiceStack.Redis, which instantiates a Redis pub/sub server, listens for messages from the sentinels whenever the configuration changes (such as on a failover), and updates the managed Redis connection pool.
I installed my version of this Redis client wrapper into my application, and it has seen no app pool recycle events since (other than the scheduled ones).
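The random-order interrogation step could look roughly like the sketch below. It is not the author's `LazySentinelApiSentinelProvider`; `RedisTopology`, `SentinelInterrogator` and the `query` delegate are hypothetical names, and the delegate stands in for whatever actually sends `SENTINEL get-master-addr-by-name` / `SENTINEL slaves` to one sentinel. Building and refreshing the connection pool is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result of asking one sentinel for the current topology.
public sealed class RedisTopology
{
    public string Master { get; }
    public IReadOnlyList<string> Replicas { get; }

    public RedisTopology(string master, IReadOnlyList<string> replicas)
    {
        Master = master;
        Replicas = replicas;
    }
}

public static class SentinelInterrogator
{
    // Tries the sentinel hosts in random order and returns the first answer.
    // 'query' is an assumption: it represents the code that talks to one sentinel.
    public static RedisTopology Resolve(IList<string> sentinels, Func<string, RedisTopology> query, Random rng)
    {
        // Fisher-Yates shuffle so load spreads across sentinels.
        var order = new List<string>(sentinels);
        for (int i = order.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            string tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }

        var errors = new List<Exception>();
        foreach (string host in order)
        {
            try
            {
                return query(host);
            }
            catch (Exception ex)
            {
                errors.Add(ex); // this sentinel is unreachable; try the next one
            }
        }
        throw new AggregateException("No sentinel answered", errors);
    }
}
```

On a connection error, the wrapper would call `Resolve` again and rebuild its pool from the returned master and replicas, instead of relying on a long-lived pub/sub subscription.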
Above is the log of app pool recycle events before I disabled the ServiceStack.Redis sentinel manager.
And here's the log of app pool recycle events after installing my new lazy sentinel manager
The first spike is me recycling the app manually and second one is the scheduled 1am recycle. So clearly the issue is solved.
I do not know the actual reason why the sentinel manager, via the Redis pub/sub server, causes IIS rapid-fail protection to fire and recycle the app pool. Maybe someone with much more Redis and/or IIS experience can explain it. Also, I did not test this in .NET Core; I only tested a .NET 4.5.1 application deployed in IIS, but on many different machines, including my local development machine and beefy production machines.
Finally, one last note: that first image, which shows all the recycle events, is from my CI machine, which barely takes any traffic (maybe 1 request every few minutes). So the issue is not a memory leak or some kind of resource exhaustion. Whatever the issue is, it happens regardless of traffic, CPU load or memory load; it just happens periodically.
Needless to say I will not be using the builtin sentinel manager at least for now.

Process Management Library

I am working on an OLAP application, WCF + Silverlight clients (up to 100 concurrent users). Unfortunately from time to time, a specific service call goes crazy (although it is perfectly valid, just too complex) and occasionally (once a month) brings the whole server down (by consuming all CPU).
A solution would involve killing user request or even the whole user session which is not a big deal for us from the business perspective - recovering/restarting the whole application is.
The idea of isolating user sessions into separate processes is very tempting: CPU/memory throttling and clean resource disposal (not like Thread.Abort) - if modern browsers can do this just for web pages, maybe it's time to do this on servers. We just want to evaluate this concept and see pros and cons in our particular scenario.
Hence the questions:
Is there already an existing library/framework which will be useful for managing processes (like pre-spawning/reusing processes, throttling, kill after timeout)?
Are there any "best practices" or guidelines how to create such architecture?
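For the "kill after timeout" part, a minimal per-request sketch using `System.Diagnostics.Process` might look like this. `IsolatedRunner` and the worker executable are hypothetical; it does not cover pre-spawning, process reuse or CPU throttling, which a real design would still need.

```csharp
using System;
using System.Diagnostics;

public static class IsolatedRunner
{
    // Runs one request in a separate worker process. Returns its exit code,
    // or null if it exceeded the timeout and was killed.
    public static int? RunWithTimeout(string fileName, string arguments, TimeSpan timeout)
    {
        var psi = new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process worker = Process.Start(psi))
        {
            if (worker.WaitForExit((int)timeout.TotalMilliseconds))
                return worker.ExitCode;

            try
            {
                // Runaway request: kill the whole process. Unlike Thread.Abort,
                // the OS reclaims all of its memory and handles.
                worker.Kill();
            }
            catch (InvalidOperationException)
            {
                // The process exited between the timeout check and Kill().
            }
            worker.WaitForExit();
            return null;
        }
    }
}
```

The server-side request handler would hand the heavy query to such a worker and treat `null` as "cancelled", so only that request fails instead of the whole server.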
I was having the same problem with my WCF services; they too serve more than 100 clients.
I discovered the problem using the IIS HTTP error logs (C:\Windows\System32\LogFiles\HTTPERR).
It turned out to be the application pool recycle timeout setting in IIS.
The application pool was being restarted every 48 hours, which was causing issues with already subscribed clients.
So I would suggest:
1. Analyze the HTTP error logs and IIS logs, which will give more information about the status of all the application pools, including whether any of them was shut down or recycled.
2. If the application pool crashes, set up WinDbg, attach it to the process and set the correct source file path. It will tell you where any exceptions are occurring.

Windows service as insurance that my application server is running

I created a server application that always needs to be online and running.
If my application is shut down, I want to restart it.
Is there a way to make a Windows service my "online insurance"?
If not is there another way?
Thanks.
Yes, windows services can be set to restart if they fail.
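For reference, a sketch of configuring that recovery behaviour from C# by shelling out to `sc.exe failure`. `ServiceRecovery` is a hypothetical helper, and the values (reset the failure count after one day, restart after 60 seconds on each of the first three failures) are arbitrary. These actions fire when the service process fails, not when an administrator stops it cleanly.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper; could be called from an installer's Committed event
// or a setup script (that hook-up is an assumption).
public static class ServiceRecovery
{
    // Builds: failure "<name>" reset= <seconds> actions= restart/<ms>/restart/<ms>/restart/<ms>
    // sc.exe requires the space after each "=".
    public static string BuildFailureArguments(string serviceName, int resetSeconds, int restartDelayMs)
    {
        string restart = "restart/" + restartDelayMs;
        return "failure \"" + serviceName + "\" reset= " + resetSeconds
             + " actions= " + restart + "/" + restart + "/" + restart;
    }

    // Windows only: runs sc.exe and returns its exit code (0 on success).
    public static int ConfigureRestartOnFailure(string serviceName)
    {
        var psi = new ProcessStartInfo("sc.exe", BuildFailureArguments(serviceName, 86400, 60000))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process sc = Process.Start(psi))
        {
            sc.WaitForExit();
            return sc.ExitCode;
        }
    }
}
```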
In case that my application is been shutdown i want to restart the application.
This makes NO sense, and you should not do it.
If the admin shuts you down (and that is the only way a service is shut down), then he has a reason; maybe he has to apply emergency patches or something else.
If the app CRASHES, that is another thing (and Windows services can be set up to restart automatically when they crash), but if you get a shutdown request, do NOT restart as a service. Never. I can name you a dozen cases where it would be stupid, mostly around system maintenance that you would interfere with. You basically risk being terminated because the computer is switched off.
If your application is not a Windows service, you need to learn the basics of how Windows works, because it is ridiculously crappy to have a normal application act as a server process. For it to run at all, a user must be interactively logged in. Next time I see a server with a logged-in user because some stupid programmer could not do his job, I promise I (censored).

"A timeout was reached while waiting for the service to connect" error after rebooting

I have a custom-written Windows service that I run on a number of Hyper-V VMs. The VMs get rebooted a couple times an hour as part of some automated tests being run. The service is set to automatic start and almost all of the time, it starts up fine.
However, maybe 5% of the time, with no pattern that I can discern, the service fails to start. When it fails, I get an error in Event Viewer saying
A timeout was reached (30000 milliseconds) while waiting for the My Service Name service to connect.
When this occurs, I can start the service manually, or restart again, and the service will start fine.
The thing I can't figure out is that the 30 second timeout doesn't appear to be occurring in my code. The very first line of my service class's OnStart() method logs "Starting..." to its log4net log. When the service fails to start, I don't even get anything logged at all, which indicates to me that either log4net can't log for whatever reason, or the timeout is occurring before my OnStart() gets called.
The service runs on a variety of OSes, from XP all the way up to Win7 and 2008R2, and I know that setting the service to delayed start may solve this for Vista and later, but that seems like a hack.
I haven't been able to remote debug this because of the fact that it happens so intermittently and during system startup, and I'm at a loss as to further ways to try to figure out what's going on. Any ideas?
My guess - and that's all it is - is that the disk is thrashing hard during startup, to the point where the .NET Framework itself isn't starting in the 30 seconds that Windows allocates for services to start.
A kludgy workaround may be to set the service to start manually, then write a very small stub service in unmanaged code (e.g. C++, Delphi) to start the service.
Another approach may be to start the service remotely from another machine. The sc command should do the job nicely.
I was seeing this error in the Event Viewer when trying to install a service with PowerShell.
The problem was that the values for "Service Name" and "Service Display Name" in my PowerShell script differed from those I had specified in the Program.cs file of my console application.
For what it's worth, I discovered that I received this message (almost immediately upon service startup) because I did not have version 4.5 of the .NET framework installed on the target machine. I rolled back the version I was using to version 4.0 (which was already installed on the target machine) and the service worked as expected.
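If you suspect a missing .NET Framework 4.5, one documented way to check is the `Release` DWORD under `HKLM\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full`, where a value of at least 378389 indicates 4.5 or later. The sketch below wraps that check; `NetFxCheck` is a hypothetical name, and the registry read only works on Windows.

```csharp
using Microsoft.Win32;

public static class NetFxCheck
{
    // Minimum "Release" value written by the .NET Framework 4.5 installer.
    public const int Net45Release = 378389;

    public static bool IsNet45OrLater(int? release)
    {
        return release.HasValue && release.Value >= Net45Release;
    }

    // Windows only. Returns null if the key or value is absent
    // (e.g. only .NET Framework 4.0 is installed).
    public static int? ReadRelease()
    {
        using (RegistryKey key = Registry.LocalMachine.OpenSubKey(@"SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full"))
        {
            return key?.GetValue("Release") as int?;
        }
    }
}
```

Running `NetFxCheck.IsNet45OrLater(NetFxCheck.ReadRelease())` on the target machine before installing would have flagged this case.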
I think I may also have found another contributing factor to this kind of "does not start on reboot" error.
It appears that if the Windows Event Log is set to "Overwrite events older than 7 days" with a maximum size of 512 KB, but a lot of activity has occurred within that window, then the Event Log is effectively full, because it can't overwrite any of the events generated inside that timeframe. If you set the event log to a much larger size OR to "Overwrite events as needed", you won't experience this issue.
My issue with the same error was that the .Net installation on the server was not working correctly.
To figure this out:
I made a small console app with the same logic as the service executable and wrapped the whole thing in a try-catch, dumping everything out to the console.
Not sure why the information didn't bubble up, but we saw valuable messages about the Framework errors that we would never have seen otherwise.
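A minimal version of that console harness, assuming the service's start-up logic can be called as a plain method. `ConsoleHarness.Run` is a hypothetical helper; you would pass it the same calls your `OnStart` makes.

```csharp
using System;

public static class ConsoleHarness
{
    // Runs the service's start-up logic outside the service control manager and
    // dumps any exception to the console. Keeping the real work behind a delegate
    // means type-load and assembly-load failures in that code, which surface when
    // it is JIT-compiled, are thrown inside the try block rather than before it.
    public static int Run(Action startup)
    {
        try
        {
            startup();
            Console.WriteLine("Start-up completed without exceptions.");
            return 0;
        }
        catch (Exception ex)
        {
            // ToString() includes the type, message, stack trace and inner exceptions.
            Console.Error.WriteLine(ex.ToString());
            return 1;
        }
    }
}
```

In a console project, `static int Main() => ConsoleHarness.Run(() => new MyServiceLogic().Start());` (with `MyServiceLogic` standing in for your own class) prints the full exception chain instead of letting it vanish inside the service control manager.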
We are having the same problem on Windows Server 2016.
A fix that seems to be working is changing the account the service runs under from Local Service to a local Administrator account (not sure what the cause is).

Windows service error

Hi, I made some changes to the Windows service code (some class files related to it); specifically, I added code to fetch a version value from the registry. After stopping the service, I copied the exe from the application side, which was 72 KB, to the installer path, where the 74 KB exe from the fresh InstallShield installation was. So the old 74 KB file was replaced with the 72 KB one. But now I am getting this error:
Services
Could not start the Monitor service on Local Computer.
Error 1053: The service did not respond to the start or control request in a timely fashion.
I googled this error, and some forums say to install a fresh copy of the framework. I did that, but I am still getting the error, and moreover my code is correct.
Can anyone suggest a solution?
The first thing is to check whether the user running the service has permission to access the registry. You can also often find more detail in Event Viewer.
Otherwise, it means that your service takes too long to initialize; please take a look at my old question: What is the timeout for starting a Windows service?
Services have (if my memory serves me) 30 seconds to reply to a control request (that's start, stop, etc.).
You should check the code in the OnStart method implemented in your service to ensure that nothing is taking a long time. If you do have some long-running task that must occur as your service starts you should start this work on its own thread.
Now I have found the mistake. I was copying the DLL built in Debug mode into the installer path; what I actually needed was to copy the DLL built in Release mode.
