I am working on an OLAP application, WCF + Silverlight clients (up to 100 concurrent users). Unfortunately from time to time, a specific service call goes crazy (although it is perfectly valid, just too complex) and occasionally (once a month) brings the whole server down (by consuming all CPU).
A solution would involve killing the user's request or even the whole user session, which is not a big deal for us from the business perspective - recovering/restarting the whole application is.
The idea of isolating user sessions into separate processes is very tempting: CPU/memory throttling and clean resource disposal (not like Thread.Abort) - if modern browsers can do this just for web pages, maybe it's time to do this on servers. We just want to evaluate this concept and see pros and cons in our particular scenario.
Hence the questions:
Is there already an existing library/framework which will be useful for managing processes (like pre-spawning/reusing processes, throttling, kill after timeout)?
Are there any "best practices" or guidelines how to create such architecture?
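For a rough idea of what per-request process isolation might look like (not a recommendation of any specific framework), here is a minimal sketch. SessionWorker.exe is a hypothetical host executable that reads one request from stdin and writes its result to stdout; hard CPU/memory caps would need Win32 Job Objects via P/Invoke, so the sketch only lowers the worker's priority and kills it on timeout:

```csharp
using System;
using System.Diagnostics;

// Minimal sketch: run one query per worker process and kill it on timeout.
// "SessionWorker.exe" is a hypothetical host that executes a single request.
static class WorkerLauncher
{
    public static string RunIsolated(string requestPayload, TimeSpan timeout)
    {
        var psi = new ProcessStartInfo("SessionWorker.exe")
        {
            UseShellExecute = false,
            RedirectStandardInput = true,
            RedirectStandardOutput = true
        };

        using (var worker = Process.Start(psi))
        {
            worker.PriorityClass = ProcessPriorityClass.BelowNormal; // crude CPU throttling
            worker.StandardInput.WriteLine(requestPayload);
            worker.StandardInput.Close();

            if (!worker.WaitForExit((int)timeout.TotalMilliseconds))
            {
                worker.Kill(); // the OS reclaims all of the worker's resources
                throw new TimeoutException("Worker exceeded its time budget.");
            }
            return worker.StandardOutput.ReadToEnd();
        }
    }
}
```

Pre-spawning and reusing workers would go on top of this, but killing a runaway process is already a much cleaner teardown than Thread.Abort inside the host.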
I was having the same problem with my WCF services; they too serve more than 100 clients.
The problem, which I discovered using the IIS HTTP error logs (C:\Windows\System32\LogFiles\HTTPERR),
turned out to be the Application Pool Recycle timeout setting in IIS.
The application pool was getting restarted every 48 hours, which was causing issues with already-subscribed clients.
So I would suggest:
1. Analyze the HTTP error logs and IIS logs, which will give you more information about the status of all the application pools, e.g. whether any get shut down or recycled.
2. If the application pool crashes, set up WinDbg, attach to the process and set the correct source file path. It will tell you where any exceptions are occurring.
I have a .net 4.5 ASP.NET WebAPI application. Deployed in IIS using 1 worker on an 8gig VM with 4 CPUs.
I made changes to it recently (upgraded ServiceStack.Interfaces, ServiceStack.Common, ServiceStack.Redis and a bunch of dependencies) and started noticing that the IIS app pool this app is deployed on recycles about once an hour (give or take a few minutes).
There is nothing in my application logs that show any kind of issues. I collect metrics using telegraf and I do NOT see memory metrics increase at all, as far as all the metrics I look at everything looks absolutely normal and then the app pool recycles.
I looked at the event viewer and filtered the logs by the WAS source and see an event with ID 5011, which basically means the IIS worker process crashed, as I understand it.
So then I used DebugDiag and ran it on my local box with the app deployed there (I can reproduce the issue locally). It ran for a while and I finally got the same event in the event viewer. I looked at the crash analysis logs from DebugDiag, and all I see is a bunch of exceptions logged, but nothing concrete right before the crash.
At this point I'm not entirely sure what else I can do to figure out what's causing the crash, so I'm hoping there are more suggestions on what I can do to get more transparency.
What I think is happening is, there is some incompatibility with one of my dependencies and some of the upgraded packages which cause an exception to be thrown which is not handled by anything and crashes the IIS worker.
My application is working perfectly fine: all API endpoints function with no issues, memory is NOT increasing, CPU is fine. So as far as I can tell, there are no issues up to the crash.
Wondering if anyone knows any tricks to find what's causing the crash and/or handle it, i.e. prevent this exception from escaping and crashing the worker.
I was able to narrow down with some confidence that the issue lies somewhere within the ServiceStack.Redis RedisPubSubServer. What the actual issue is, I don't know, as that would take a lot more time to dig in, and I've wasted too much time already.
However, piggybacking on some existing code I had (from before ServiceStack supported sentinel), I created a new implementation of the redis client wrapper, which I call LazySentinelServiceStackClientWrapper. Instead of using the built-in sentinel manager, it relies on a custom sentinel provider I created, LazySentinelApiSentinelProvider. This implementation interrogates the available sentinel hosts in random order for master and slave nodes; I then construct a pool from the retrieved read/write and read-only hosts, and this pool is used to run the redis operations. The pool is refreshed whenever an error occurs (i.e. after a failover). This is opposed to the built-in sentinel manager that comes with ServiceStack.Redis, which instantiates a redis pub/sub server, listens for messages from sentinel whenever configuration changes such as failovers occur, and updates the managed redis connection pool.
I installed my version of this redis client wrapper into my application and have seen no app pool recycle events since (other than the scheduled ones).
Above is the log of app pool recycle events before I disabled the ServiceStack.Redis sentinel manager.
And here's the log of app pool recycle events after installing my new lazy sentinel manager
The first spike is me recycling the app manually and second one is the scheduled 1am recycle. So clearly the issue is solved.
Why exactly the sentinel manager's redis pub/sub server causes IIS rapid-fail protection to fire and recycle the app pool, I do not know; maybe someone with much more redis and/or IIS experience can attest to that. Also, I did not test this on .NET Core; I only tested a .NET 4.5.1 application deployed in IIS, but on many different machines, including my local development machine and beefy production machines.
Finally one last note, that first image which shows all the recycle events, that's on my CI machine which is barely taking any traffic, maybe 1 request every few minutes. So this means the issue is not some memory leak or some resource exhaustion. Whatever the issue is, it happens regardless of traffic, CPU load, memory load, it just happens periodically.
Needless to say I will not be using the builtin sentinel manager at least for now.
I have a few Windows services (all written in C#) that all show the same strange behaviour.
I have them set to delayed auto start so that they get started after the boot (delayed because well they are not critical).
They all host WCF services as parts of Client-Server applications and were installed using WiX if that matters.
I noticed that sometimes they just don't start.
If you look into the Services window fast enough after the OS is ready they have status "Starting". If you then refresh the view they are no longer starting but not "Started" either.
You can then start them manually without any problem whatsoever.
This produces no error messages and no log entries. And to make it even better, this only occurs if the machine has been shut down and turned on again. Rebooting works perfectly fine every time (tried it about 20 times on two different machines).
If you set the failure actions to restart the service after failure, it seems it will eventually start the service successfully, but surely this cannot be the ideal solution.
OSs are Windows 7 and WinServer 2008 R2
What am I missing here? Why do they fail to be started automatically(the first time at least)? And why does it make a difference if the computer boots following a reboot or a shutdown?
EDIT:
I was wrong about the failure actions. They did not fix the problem.
EDIT 2:
I have added exception handling around everything to log possible exceptions. But so far no exceptions have been logged.
Might it be that the WCF services take a long time to start? AFAIK, a Windows service has to come up within a certain time (best practice is 30 seconds; the technical limit I don't know) to not time out. That could explain why your service is in status "Starting" but does not start.
Please see my answer from the duplicate. A Windows service typically shouldn't have access to the desktop for security reasons, but it certainly should have a good amount of logging in it. You probably have a race condition. The only thing you could do about this in WiX would be to express a dependency on another service, to get the service control manager to wait a while before starting yours. But it really would be better if your code were more robust. An example would be: the OnStart event fires up a background worker thread and then returns success. The background thread could then keep attempting to host the WCF endpoint, doing a fair amount of logging in the process.
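As a sketch of that pattern (the service contract, endpoint address and retry interval are placeholders, not something from the original services), OnStart returns to the SCM immediately and a background thread keeps retrying until the WCF endpoint is up:

```csharp
using System;
using System.ServiceModel;
using System.ServiceProcess;
using System.Threading;

[ServiceContract]
public interface IMyService
{
    [OperationContract]
    string Ping();
}

public class MyService : IMyService
{
    public string Ping() { return "pong"; }
}

public class RobustWcfWindowsService : ServiceBase
{
    private volatile ServiceHost _host;

    protected override void OnStart(string[] args)
    {
        // Return to the SCM quickly; do the real work on a background thread.
        new Thread(HostWithRetry) { IsBackground = true }.Start();
    }

    private void HostWithRetry()
    {
        while (_host == null)
        {
            try
            {
                var host = new ServiceHost(typeof(MyService),
                    new Uri("net.tcp://localhost:8700/MyService"));
                host.Open(); // WCF 4+ adds a default endpoint for the base address
                _host = host;
                // log "endpoint hosted" here
            }
            catch (Exception)
            {
                // log the exception, then retry -- e.g. the network stack may
                // not be fully up yet right after a cold boot
                Thread.Sleep(TimeSpan.FromSeconds(10));
            }
        }
    }

    protected override void OnStop()
    {
        if (_host != null) _host.Close();
    }
}
```

This way a slow or temporarily failing ServiceHost.Open never hits the SCM's start timeout, and every failed attempt leaves a log entry to diagnose the cold-boot case.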
The Problem
I have an ASP.NET web application connecting to an SQL server database.
Occasionally, the application hangs when some specific users try to log in. I can replicate this when I try to log in as that user on the live site from my machine, but the problem doesn't appear in an identical site (the same code, the same machine, the same database and user account - only the worker process and application pool are different).
The problem also disappears (at least temporarily) on an application pool recycle. When it happens again it tends to be the same users.
Normally I'd assume this was a problem in the application-specific code or database, and I'm still looking for a way that that's possible, but the fact it doesn't happen in an identical site running against the same database makes me doubt it.
More information
From some logging, the hang seems to happen after the Page_Load of the requested page, but before the Page_Load of the user controls on that page.
The hanging requests can be seen in the current requests of the site's worker process in IIS, the state shows as ExecuteRequestHandler and the module is the IsapiModule.
The SQL server's activity monitor shows no long running queries during the time that the application is hanging, the profiler shows no frequent identical queries coming from the application during the hang.
Most users can continue to use the application without problems.
The application stores user details with the in process ASP.NET session state.
On some occasions the problem has disappeared on its own after 10-20 minutes, other times it persists until an application pool recycle.
Occasionally the logging shows that after ~5 minutes of hanging after the requested page's Page_Load, the Page_Load of the first child user control is called, after which there is another hang.
Summary
I'm basically wondering if there is any chance that this might be due to a problem with ASP.NET session locking (are there any known bugs, or dangers of deadlock on session?)
Could it be a problem with the request's thread being assigned a low priority for processor time?
If anybody has any ideas about what this could be, I'd be interested; if not, might debugging a memory dump of the worker process help me? I don't have any experience with that, and I don't know how safe it is to take a memory dump of a live, running worker process.
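One thing worth knowing about in-process ASP.NET session state in this context: requests carrying the same session ID are serialized, because each request takes an exclusive lock on the session for its whole duration. So if one request for a user hangs, every later request from that same user queues behind it while other users are unaffected, which matches the symptoms described. As a sketch (a diagnostic aid, not a fix for the underlying hang; the page and class names are placeholders), pages that only read session data can opt out of the writer lock via the page directive:

```aspx
<%@ Page Language="C#" EnableSessionState="ReadOnly"
    CodeBehind="Report.aspx.cs" Inherits="MyApp.Report" %>
```

If the hang disappears on read-only pages, that points strongly at session locking rather than at the page code itself.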
I have an MVC3 razor page that essentially does this
[HttpPost]
public ActionResult Index(Mymodel model)
{
    // run background process
    return View(model);
}

public void backgroundprocesscompleted()
{
    // write to database
}
The run background process part runs an analysis via a library, then once it is done calls backgroundprocesscompleted. This whole thing works for small files (46 rows), but when I run it on bigger files (11k rows) on IIS it appears to time out. I have changed numerous settings but it still times out. Essentially what I would like to do is force the web application to never time out. Any good ideas on how to fix this? Thanks
There are several reasons why a longer-running background process in ASP.Net might fail, including
An unhandled exception in a thread not associated with a request will take down the process.
The AppDomain your site runs in can go down for a number of reasons and take down your background task with it.
When you modify web.config, ASP.NET will recycle the AppDomain
IIS will itself recycle the entire w3wp.exe process every 29 hours (by default)
In a shared hosting environment, many web servers are configured to tear down the application pools after some period of inactivity
http://haacked.com/archive/2011/10/16/the-dangers-of-implementing-recurring-background-tasks-in-asp-net.aspx
The article linked above has some suggestions for mitigating those risks. However, I would recommend doing long-running processing entirely outside of ASP.Net. One straightforward approach is to write the work into a transactional MSMQ Queue, and have a Windows Service read work items off of that queue and process them.
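That last suggestion could be sketched roughly as follows (the queue path and string payload are assumptions, and System.Messaging.dll must be referenced): the web app commits the work item to a transactional queue and returns immediately, and a separate Windows Service dequeues and processes it outside of IIS.

```csharp
using System;
using System.Messaging; // reference System.Messaging.dll

// Sketch: the web app enqueues work transactionally; a Windows Service
// dequeues and processes it outside of IIS. The queue path is an assumption.
static class WorkQueue
{
    const string Path = @".\private$\analysisWork";

    public static void Enqueue(string workItem)
    {
        if (!MessageQueue.Exists(Path))
            MessageQueue.Create(Path, transactional: true);

        using (var queue = new MessageQueue(Path))
        using (var tx = new MessageQueueTransaction())
        {
            tx.Begin();
            queue.Send(workItem, tx);
            tx.Commit(); // once committed, the work survives app pool recycles
        }
    }

    public static string Dequeue()
    {
        using (var queue = new MessageQueue(Path))
        using (var tx = new MessageQueueTransaction())
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            tx.Begin();
            var msg = queue.Receive(TimeSpan.FromSeconds(30), tx);
            tx.Commit(); // if the service crashes before Commit, the message returns to the queue
            return (string)msg.Body;
        }
    }
}
```

The transactional receive is the key point: a crash mid-processing puts the message back on the queue instead of losing the work.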
We have developed some long running C# console applications which will be run by Windows Scheduled tasks.
These applications might be run on many different server machines on intranet/extranet.
We cannot ensure that they run on one machine because each application might need access to some resources, which are available only on a certain machine.
Still, all these applications are using a common WCF service to access the database.
We need to ensure that there is only one instance of each of our applications running at any moment.
As the apps might be on different extranet computers, we cannot use per-machine mutexes or MSMQ.
I have thought about the following solution - a WCF mutex service with a timeout. When one app runs, it checks whether it is already launched (maybe on another machine) and then (in a dedicated thread) periodically pings the WCF mutex service to update the timestamp (if a ping fails, the app exits immediately). If the timestamp expires, it means the application has crashed, so it can be run again.
I would like to know, if this "WCF mutex" is optimal solution for my problem. Maybe there are already some third party libraries which have implemented such functionality?
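For illustration, the "WCF mutex" described above could be sketched as a lease service like the one below; all names are placeholders and the TTL is an arbitrary assumption, and a real implementation would need durable state and a highly available host. The server hands out a lease that the holder must renew before it expires:

```csharp
using System;
using System.Collections.Generic;
using System.ServiceModel;

[ServiceContract]
public interface ILeaseService
{
    // Returns true if the caller now holds the lock for appName.
    [OperationContract] bool TryAcquire(string appName, Guid owner);
    [OperationContract] bool Renew(string appName, Guid owner);   // the periodic "ping"
    [OperationContract] void Release(string appName, Guid owner);
}

// Single instance + single concurrency makes acquire/renew atomic on the server,
// so two clients can never both see an expired lease and both take it.
[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single,
                 ConcurrencyMode = ConcurrencyMode.Single)]
public class LeaseService : ILeaseService
{
    class Lease { public Guid Owner; public DateTime ExpiresUtc; }
    static readonly TimeSpan Ttl = TimeSpan.FromSeconds(30); // arbitrary
    readonly Dictionary<string, Lease> _leases = new Dictionary<string, Lease>();

    public bool TryAcquire(string appName, Guid owner)
    {
        Lease l;
        if (_leases.TryGetValue(appName, out l)
            && l.ExpiresUtc > DateTime.UtcNow && l.Owner != owner)
            return false; // someone else holds an unexpired lease
        _leases[appName] = new Lease { Owner = owner, ExpiresUtc = DateTime.UtcNow + Ttl };
        return true;
    }

    public bool Renew(string appName, Guid owner)
    {
        Lease l;
        if (!_leases.TryGetValue(appName, out l) || l.Owner != owner) return false;
        l.ExpiresUtc = DateTime.UtcNow + Ttl;
        return true;
    }

    public void Release(string appName, Guid owner)
    {
        Lease l;
        if (_leases.TryGetValue(appName, out l) && l.Owner == owner)
            _leases.Remove(appName);
    }
}
```

Each app would call TryAcquire at startup, exit if it returns false, and otherwise Renew well inside the TTL from a dedicated thread.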
Your mutex solution has a race condition.
If an app on a different server checks the timestamp in the window after the timestamp expired, but before the current holder has updated the timestamp, you will have two instances running.
I'd probably go the opposite route. I'd have a central monitoring service. This service would continually monitor the health of the system. If it detects a service went down, it would restart it on either that machine or a different one.
You may want to bite the bullet and go with a full Enterprise Service Bus. Check the Wikipedia article for ESBs. It lists over a dozen commercial and open source systems.
How about a file lock on a network location?
If you can create/open the file with exclusive read/write access, then it is the only app running. If this running app subsequently crashes, the lock is released automatically by the OS.
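Where all instances can reach a shared filesystem, the idea could be sketched like this (the lock-file path is an assumption):

```csharp
using System;
using System.IO;

// Sketch: hold an exclusive handle on a file at a shared path.
// While this FileStream stays open, no second instance can open the file;
// if the process crashes, the OS releases the lock automatically.
class SingleInstanceFileLock : IDisposable
{
    private readonly FileStream _lock;

    private SingleInstanceFileLock(string path)
    {
        _lock = new FileStream(path, FileMode.OpenOrCreate,
                               FileAccess.ReadWrite, FileShare.None);
    }

    public static bool TryAcquire(string path, out SingleInstanceFileLock fileLock)
    {
        try
        {
            fileLock = new SingleInstanceFileLock(path);
            return true;
        }
        catch (IOException) // another instance already holds the file
        {
            fileLock = null;
            return false;
        }
    }

    public void Dispose() { _lock.Dispose(); }
}
```

Usage would be a TryAcquire(@"\\server\share\myapp.lock", out var l) call at startup, exiting if it fails; this only works where SMB locking is reliable, which is the limitation noted below.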
Tim
Oops, just re-read the question and saw "extranet", ignore me!