I am running an ASP.NET 4.5 Web API application on IIS 8.5. Under Worker Processes, I am seeing that certain requests are queued up by IIS and are never served. This could be a memory leak issue; I am still investigating. During the investigation I read about thread pool settings.
The article talks about Recommended Threading Settings for Reducing Contention and says to make these changes in machine.config. My question is: can I make these changes in web.config?
Are there any other recommendations for fixing the queuing issue?
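Machine.config specifies the configuration for all .NET applications on this Windows instance. Web.config specifies the configuration for one specific application.
If this issue only happens in a single IIS web application, you could specify the configuration in Web.config. Otherwise, it is suggested to set it in Machine.config. Note that the processModel element (maxWorkerThreads, maxIoThreads) is only honored in Machine.config, while the httpRuntime and connectionManagement settings can also go in Web.config.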
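As a rough illustration of the arithmetic behind those threading settings, here is a minimal Python sketch. The multipliers are the ones I remember from Microsoft's contention guidance (per-CPU values of 100 for the two max settings, and 88, 76 and 12 times the CPU count for the others); treat them as an assumption and check them against the article before applying anything. The function name and the 4-CPU server are hypothetical.

```python
def recommended_threading_settings(cpu_count):
    """Return threading values scaled by CPU count.

    Assumption: multipliers recalled from Microsoft's guidance on
    reducing contention; verify against the article itself.
    """
    return {
        "maxWorkerThreads": 100,  # already a per-CPU value, not multiplied
        "maxIoThreads": 100,      # already a per-CPU value, not multiplied
        "minFreeThreads": 88 * cpu_count,
        "minLocalRequestFreeThreads": 76 * cpu_count,
        "maxconnection": 12 * cpu_count,
    }


# Hypothetical 4-CPU server
settings = recommended_threading_settings(4)
for name, value in settings.items():
    print(f"{name} = {value}")
```

These are starting points for tuning, not a fix for requests that hang inside application code.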
In my experience, if requests get stuck in the queue, your problem is most likely an application pool hang or a performance issue.
First of all, please check whether any error message was logged in the IIS, application or system event log.
To troubleshoot the performance issue, please try to capture a dump file with ProcDump or the Debug Diagnostic tool.
We need to check the managed stack traces in the dump file so that we know which methods or requests are slow or hung.
With the WinDbg MEX extension, we can find out:
How many requests are being processed
The status of each thread
Whether any thread is frozen or deadlocked
If there is a deadlock, which thread is blocked and the address of the lock involved
To know which configuration or solution should be applied on your server, finding the root cause or its characteristics is also necessary.
If you don't know how to analyze a dump file, the Debug Diagnostic analysis tool or the WinDbg !analyze -v command would help.
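To make the deadlock check above concrete, here is a minimal Python sketch of the idea: once you know which lock each blocked thread is waiting on and which thread owns each lock, a deadlock shows up as a cycle. The input dictionaries are hypothetical, hand-transcribed from whatever your debugger prints; this is not the output format of any particular tool.

```python
def find_deadlock(waits_for, owner_of):
    """Return a list of thread ids forming a wait cycle, or None.

    waits_for: thread id -> lock that thread is blocked on
    owner_of:  lock -> thread id currently holding it
    """
    for start in waits_for:
        path = []
        thread = start
        # Follow "blocked on a lock owned by ..." until the chain ends or repeats
        while thread in waits_for and thread not in path:
            path.append(thread)
            thread = owner_of.get(waits_for[thread])
        if thread in path:
            return path[path.index(thread):]
    return None


# Hypothetical data: thread 12 waits on lock A (held by 17),
# thread 17 waits on lock B (held by 12), thread 23 is just queued behind A.
waits_for = {12: "A", 17: "B", 23: "A"}
owner_of = {"A": 17, "B": 12}
cycle = find_deadlock(waits_for, owner_of)
print(cycle)
```

Threads that are merely queued behind a deadlocked lock (like 23 here) are victims, not part of the cycle, which is why only the cycle itself is returned.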
It's something in your application code that is hanging or just running too long.
The request monitor in IIS Manager shows all currently running requests, not just requests that have not been served yet. I confirmed this by running a project in Visual Studio that I have set up to debug in IIS. I set a breakpoint in Visual Studio and initiated a request in my browser. When it hit that breakpoint, the request monitor showed that request, and the time kept climbing as long as I did not continue execution of the code.
While sitting at the breakpoint, the "State" I saw in IIS Manager was "ExecuteRequestHandler", so if that's the state you see, then the request is surely being served by your application, and it's your application being slow.
If it only happens with one specific API call, you can look through your code for possible deadlocks or long-running queries. Jokies' answer might help you pinpoint where and why more easily (maybe). If the call makes SQL queries, you could also look at pending queries on your SQL Server to see what it's doing.
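One way to find out whether a specific API call is the slow one is to scan the IIS W3C log for requests with a large time-taken value. The following is a minimal Python sketch, assuming time-taken logging is enabled for the site; the function name and the sample log lines are made up for illustration. Keep in mind that a request only appears in the log once it completes, so requests that never finish will not show up here.

```python
def slow_requests(lines, threshold_ms):
    """Return (cs-uri-stem, time-taken) pairs at or above threshold_ms, slowest first."""
    fields = []
    slow = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns of the rows that follow
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        taken = int(row.get("time-taken", 0))  # milliseconds
        if taken >= threshold_ms:
            slow.append((row.get("cs-uri-stem"), taken))
    return sorted(slow, key=lambda item: item[1], reverse=True)


# Made-up sample in W3C extended format
sample = """#Software: Microsoft Internet Information Services 8.5
#Fields: date time cs-method cs-uri-stem sc-status time-taken
2017-05-01 10:00:00 GET /api/orders 200 120
2017-05-01 10:00:05 GET /api/reports 200 45210
2017-05-01 10:00:09 POST /api/orders 500 30015
""".splitlines()

result = slow_requests(sample, threshold_ms=10000)
print(result)
```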
(Side note: I didn't even know that request monitor existed before this, so this was interesting to look into!)
Update: You can enable Failed Request Tracing in IIS to hunt down exactly where it's stalling. Create a timed rule (log a trace when requests take longer than x seconds). The only downside is that enabling it changes your web.config, so it'll recycle your app pool, which may or may not restrict when you can do this.
There is more information in this article about tracing long-running requests, including that you can modify your code to use Trace.Write to write information from your code into the traces that IIS picks up.
Related
I have a .net 4.5 ASP.NET WebAPI application. Deployed in IIS using 1 worker on an 8gig VM with 4 CPUs.
I made changes to it recently (upgraded ServiceStack.Interfaces, ServiceStack.Common, ServiceStack.Redis and a bunch of dependencies) and started noticing that the IIS app pool this app is deployed on recycles about once an hour (give or take a few minutes).
There is nothing in my application logs that shows any kind of issue. I collect metrics using telegraf and I do NOT see memory metrics increase at all. As far as the metrics I look at are concerned, everything looks absolutely normal, and then the app pool recycles.
I looked at the event viewer, filtered the logs by the WAS source, and saw events with ID 5011, which, as I understand it, basically means the IIS worker crashed.
So then I used DebugDiag and ran it on my local box with the app deployed there (I can reproduce the issue locally). It ran for a while and I finally got the same event in the event viewer. I looked at the crash analysis logs from DebugDiag and all I see is a bunch of exceptions logged, but nothing concrete right before the crash.
At this point I'm not entirely sure what else I can do to figure out what's causing the crash, so I'm hoping there are more suggestions on how to get more transparency.
What I think is happening is that there is some incompatibility between one of my dependencies and some of the upgraded packages, which causes an exception to be thrown that is not handled by anything and crashes the IIS worker.
My application is working perfectly fine: all API endpoints function with no issues, memory is NOT increasing, and CPU is fine. So as far as I can tell there are no issues up to the crash.
I'm wondering if anyone knows any tricks to find what's causing the crash and/or handle it, to prevent this exception from escaping and crashing the worker.
I was able to narrow down with some confidence that the issue lies somewhere within the ServiceStack.Redis RedisPubSubServer. What is the actual issue, I don't know as that would take a lot more time to dig and I've wasted too much time already.
However, piggybacking on some existing code I had (from before ServiceStack supported sentinel), I created a new implementation of the Redis client wrapper, which I call LazySentinelServiceStackClientWrapper. Instead of using the built-in sentinel manager, it relies on a custom sentinel provider I created, LazySentinelApiSentinelProvider. This implementation interrogates the available sentinel hosts in random order for master and slave nodes; I then construct a pool from the retrieved read/write and read-only hosts, and this pool is used to run the Redis operations. The pool is refreshed whenever an error occurs (after a failover). This is in contrast to the built-in sentinel manager that comes with ServiceStack.Redis, which instantiates a Redis pub/sub server, listens for messages from sentinel whenever configuration changes such as failovers occur, and updates the managed Redis connection pool.
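The lazy approach described above can be sketched roughly as follows. This is not the actual wrapper or ServiceStack code, just a minimal Python illustration of the idea: ask sentinels in random order, cache the answer, and rediscover only when an operation fails. All names here (discover_topology, LazyPool, query_sentinel) are hypothetical, and the sentinel query is a stand-in callable rather than a real Redis call.

```python
import random


def discover_topology(sentinel_hosts, query_sentinel):
    """Ask sentinels in random order; return (master, replicas) from the first that answers."""
    hosts = list(sentinel_hosts)
    random.shuffle(hosts)
    for host in hosts:
        try:
            return query_sentinel(host)
        except OSError:
            continue  # this sentinel is unreachable, try the next one
    raise ConnectionError("no sentinel could be reached")


class LazyPool:
    """Caches the discovered hosts and refreshes them only after an error."""

    def __init__(self, sentinel_hosts, query_sentinel):
        self._sentinels = sentinel_hosts
        self._query = query_sentinel
        self._topology = None

    def run(self, operation):
        if self._topology is None:
            self._topology = discover_topology(self._sentinels, self._query)
        try:
            return operation(*self._topology)
        except OSError:
            # Probably a failover: rediscover the hosts and retry once
            self._topology = discover_topology(self._sentinels, self._query)
            return operation(*self._topology)


# Stand-ins for a real sentinel query and a real Redis operation
state = {"master": "redis-a"}

def fake_query(host):
    if host == "sentinel-1":
        raise OSError("sentinel down")
    return state["master"], ["redis-c"]

def write(master, replicas):
    if master != state["master"]:
        raise ConnectionError("stale master")
    return f"wrote to {master}"

pool = LazyPool(["sentinel-1", "sentinel-2"], fake_query)
first = pool.run(write)
state["master"] = "redis-b"  # simulate a failover
second = pool.run(write)
print(first, "/", second)
```

The trade-off is that the first operation after a failover pays for the rediscovery, in exchange for not keeping a long-lived pub/sub listener alive.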
I installed my version of this Redis client wrapper into my application and have seen no app pool recycle events since (other than the scheduled ones).
Above is the log of app pool recycle events before I disabled the ServiceStack.Redis sentinel manager.
And here's the log of app pool recycle events after installing my new lazy sentinel manager
The first spike is me recycling the app manually and second one is the scheduled 1am recycle. So clearly the issue is solved.
What is the actual reason why the sentinel manager via redis pub sub server is causing IIS rapid fail protection to fire and recycle the app pool I do not know. Maybe someone with much more redis experience and/or IIS experience can attest to that. Also I did not test this in .net core and only tested for a .net 4.5.1 application deployed in IIS but on many different machines including local development machine and beefy production machines.
Finally one last note, that first image which shows all the recycle events, that's on my CI machine which is barely taking any traffic, maybe 1 request every few minutes. So this means the issue is not some memory leak or some resource exhaustion. Whatever the issue is, it happens regardless of traffic, CPU load, memory load, it just happens periodically.
Needless to say I will not be using the builtin sentinel manager at least for now.
I've been trying to diagnose an issue pertaining to thousands of hung/stuck EndRequest requests in IIS. This is becoming a large problem for us as we're hitting the concurrent connection cap after about a week or two and have to recycle the whole application pool to clear the request list.
Because this is a live application, I have limited troubleshooting options, so anything that would halt or bring down the application pool I am not allowed to do.
IIS Information
Concurrent connection cap is set to its maximum of 65535.
Configuration debug in the web config is set to false and we have a timeout set at 110 seconds.
Windows Server 2012 R2 Version 6.2 (Build 9200)
IIS Version 8.5.9600.16384
The long running requests have 0 data transfer, checked with Wireshark.
I'm pretty much at a loss as to why these aren't timing out. I've set all the appropriate settings, at least the ones I could find from MSDN and other sources. We have a very, very hard time replicating this in our development environment, so it's been blind testing for the most part. I've found articles and such on hangs in other states, but I cannot find anything on why a request in the EndRequest state will not time out.
Advanced Settings Page:
https://postimg.org/image/gxec32kmt/
Application Pool Requests Page:
https://postimg.org/image/qupcw57o5/
Web Config:
https://postimg.org/image/5xt4rh1xh/
Update 1
I did a bit of digging into our fallback that is supposed to close connections after an hour of no usage. We seem to currently have 10,153 sessions still active with a last active time of 3 days ago. I've stepped through this function quite a bit and it seems to be working as intended. It goes through the list of sessions and anyone over an hour of inactivity has their WebSocketHandler.Close() method called. However it seems some sessions are refusing to close after the method is being called. We have logging in place to tell us if any exceptions are being thrown during the run but it seems as though it's running as expected.
This was my mistake. I was running against an old session data pull. A current pull of the session data shows no sessions running longer than their specified time. This means that WebSocketHandler.Close() was called on them and they were removed from our in-memory list.
Update 2
NETSTAT using netstat -s on pastebin: https://pastebin.com/embed_js/qBbZ4gJ1
Update 3
Correction to update 1. Can a connection close be called and fail? If so, then we're accidentally orphaning the reference to the connection in our server. I would still expect the IIS timeout to kick in, however; there must be some catch to how it collects requests.
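If a close can fail, one way to avoid orphaning the reference is to remove a session from the in-memory list only after its close has succeeded, and to keep and report the ones that failed. Below is a minimal Python sketch of that idea; it is not our actual cleanup code, and the session class, field names and function name are all hypothetical. It only addresses the orphaning question, not why IIS does not time the requests out.

```python
import time


def sweep_idle_sessions(sessions, max_idle_seconds, now=None):
    """Close idle sessions; drop a session only once its close succeeded.

    Returns a list of (session_id, exception) for closes that failed,
    so they stay in `sessions` and can be retried or investigated.
    """
    now = time.time() if now is None else now
    failed = []
    for session_id, session in list(sessions.items()):
        if now - session.last_active < max_idle_seconds:
            continue  # still active, leave it alone
        try:
            session.close()
        except Exception as exc:
            failed.append((session_id, exc))
            continue  # keep the reference so the session is not orphaned
        del sessions[session_id]
    return failed


class FakeSession:
    """Hypothetical stand-in for a WebSocket session."""

    def __init__(self, last_active, close_fails=False):
        self.last_active = last_active
        self.close_fails = close_fails

    def close(self):
        if self.close_fails:
            raise OSError("close failed")


sessions = {
    "a": FakeSession(last_active=0),
    "b": FakeSession(last_active=0, close_fails=True),
    "c": FakeSession(last_active=9000),
}
failed = sweep_idle_sessions(sessions, max_idle_seconds=3600, now=10000)
print(sorted(sessions), [session_id for session_id, _ in failed])
```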
I have a shared hosting account with 128MB of RAM and my site is in its own app pool.
The site is small and gets low traffic, but I keep getting the following error:
A worker process serving application pool 'xxx v4.0 (Classic)' has requested a recycle because it reached its private bytes memory limit.
This is happening frequently, which restarts the app pool. If the app pool restarts too often, eventually it will stop. Then I'll get a 503 error when I go to the site.
The site is written in C#, with data access via EF and ADO.NET. All my database connections are in using statements and I am confident they are being opened and closed correctly.
I have spoken to the host and I can upgrade the RAM to 256MB, which does appear to make the site run nicely. But I am a bit concerned that just upgrading the RAM is only masking the problem temporarily.
Debug is set to false in the web config, and before I copy the files to the server I build for release.
When I run the solution in Visual Studio, my IIS Worker Process hovers around 100 MB.
I think my questions are:
Is there any way I can replicate my hosting environment on my local machine?
Is it normal for a fairly small website to exceed 128MB of RAM?
I am at a bit of a loss of what to try. Any help or guidance would be greatly appreciated.
Other potentially important info:
.NET Framework is 4.5
Web Forms
AjaxControlToolkit is used (only the scripts I need are loaded)
I've looked at many blog posts and similar questions but I can't seem to make any progress.
Thanks
Jim
That message is about hitting the configured limit within IIS itself; it does not necessarily have anything to do with the amount of RAM on the host (although the settings you set within IIS should take your aggregate RAM into account, so there is an indirect link).
Open IIS
Left Click on "Application Pools"
Find your dedicated pool and right click on it, selecting "Recycling..."
Check the "Private memory usage (in KB):" value
That is what you are exceeding
[These instructions are based on IIS 7.5 but are similar for other versions]
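Since the limit is expressed in KB while most tools report memory in MB or bytes, a quick conversion helps when comparing numbers. A minimal Python sketch, assuming (this is a guess, the host would have to confirm it) that the limit matches the 128 MB of the hosting plan; the function name is hypothetical.

```python
def exceeds_private_memory_limit(private_bytes, limit_kb):
    """Compare a process's private bytes with the IIS limit, which is in KB.

    A limit of 0 means no limit is configured.
    """
    return limit_kb > 0 and private_bytes > limit_kb * 1024


limit_kb = 128 * 1024             # 128 MB expressed in KB
local_usage = 100 * 1024 * 1024   # roughly what the worker uses locally

print(limit_kb)
print(exceeds_private_memory_limit(local_usage, limit_kb))
```

With about 100 MB in use there is only around 28 MB of headroom before a 128 MB limit triggers a recycle.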
I am working on an OLAP application, WCF + Silverlight clients (up to 100 concurrent users). Unfortunately from time to time, a specific service call goes crazy (although it is perfectly valid, just too complex) and occasionally (once a month) brings the whole server down (by consuming all CPU).
A solution would involve killing the user request or even the whole user session, which is not a big deal for us from the business perspective; recovering or restarting the whole application is.
The idea of isolating user sessions into separate processes is very tempting: CPU/memory throttling and clean resource disposal (not like Thread.Abort) - if modern browsers can do this just for web pages, maybe it's time to do this on servers. We just want to evaluate this concept and see pros and cons in our particular scenario.
Hence the questions:
Is there already an existing library/framework which will be useful for managing processes (like pre-spawning/reusing processes, throttling, kill after timeout)?
Are there any "best practices" or guidelines on how to create such an architecture?
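To illustrate the "kill after timeout" part of the idea, here is a minimal sketch in Python rather than .NET, purely to show the concept: the work runs in a child process, and if it exceeds its time budget the child is killed without affecting the parent. The function name is hypothetical, and a real design would add pre-spawning, reuse and CPU/memory throttling on top.

```python
import subprocess
import sys


def run_isolated(python_code, timeout_seconds):
    """Run code in a child process; return its output, or None if it was killed."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", python_code],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired
        return None
    return result.stdout.strip()


quick = run_isolated("print(sum(range(10)))", timeout_seconds=10)
runaway = run_isolated("while True: pass", timeout_seconds=1)
print(quick, runaway)
```

Unlike aborting a thread, killing the process lets the operating system reclaim everything the runaway call was holding.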
I was having the same problem with my WCF services; they too serve more than 100 clients.
I discovered the problem using the IIS logs (C:\Windows\System32\LogFiles\HTTPERR).
The cause turned out to be the Application Pool recycle timeout setting in IIS.
The application pool was getting restarted every 48 hours, which was causing issues with already subscribed clients.
So I would suggest:
1. Analyze the HTTP error logs and IIS logs, which will give more information about the status of all application pools and whether any get shut down or recycled.
2. If the application pool crashes, set up WinDbg, attach it to the process and set the correct source file path. It will tell you the location of any exceptions that occur.
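For step 1, counting the reason phrases per application pool in the HTTPERR log gives a quick overview of what HTTP.sys has been rejecting. A minimal Python sketch, relying on the #Fields header that the log itself contains; the function name and the sample lines are made up for illustration.

```python
from collections import Counter


def count_reasons(lines):
    """Count (s-queuename, s-reason) pairs in HTTPERR log lines."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # Column names for the rows that follow
            fields = line.split()[1:]
        elif line and not line.startswith("#") and fields:
            row = dict(zip(fields, line.split()))
            counts[(row.get("s-queuename"), row.get("s-reason"))] += 1
    return counts


# Made-up sample lines
sample = """#Software: Microsoft HTTP API 2.0
#Fields: date time c-ip c-port s-ip s-port cs-version cs-method cs-uri sc-status s-siteid s-reason s-queuename
2016-03-01 02:00:01 10.0.0.5 50123 10.0.0.9 80 HTTP/1.1 GET /svc 503 1 AppOffline OlapPool
2016-03-01 02:00:02 10.0.0.6 50124 10.0.0.9 80 HTTP/1.1 GET /svc 503 1 AppOffline OlapPool
2016-03-01 02:05:00 10.0.0.5 50125 10.0.0.9 80 - - - - - Timer_ConnectionIdle -
""".splitlines()

counts = count_reasons(sample)
for (queue, reason), total in counts.most_common():
    print(queue, reason, total)
```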
(Sorry if this is a really long question, it said to be specific)
The company I work for has a number of sites, which have been running for some time with no problems. The applications are a mix of ASP.NET 2.0, 3.5, and 4.0, all using ADO.NET to connect to a SQL Server Standard instance (on the same webserver), and all hosted with IIS7.
The problem began when we moved to an upgraded webserver. We made every effort to set up the server, db instance and IIS with the exact same settings (except for the different machine name, and the fact that we had upgraded from SQLExpress to Standard), and as far as we could tell, we did. Both servers are running Windows Server 2008 R2 (all current updates applied), and received a default install.
The problem is very apparent when starting up one of these applications. When you reach the login page of our application, the page itself loads extremely fast. This is true even when you load the page from a new machine that could not possibly have the page cached, with IIS caching disabled. The problem becomes visible when you enter your login information and click the login button. Because of the (not great) design of our databases, the login process must access a number of databases, theoretically up to 150 separate DBs, but in practice usually 2. The problem occurs even when only 2 databases (the minimum) are opened. Not a great design, but we have to live with it for now.
When trying to initially open a connection to the database, the entire process stops for about 20 seconds every time, regardless of whether you are connecting to 2 DBs or 40. I have run a .NET profiler (JetBrains dotTrace) against the process, and the only information I could take from it was that one or all of the calls to SqlConnection.Open() were accounting for 90% of the time. This only happens on first use of the application, but the problem is compounded by the fact that IIS seems to disregard the recycling settings we have set for it and recycles the application after a few minutes of idle time, causing the problem to occur again.
I also tried to use the SQL Server profiler to see which database operations were the cause of the slowdown, but because of all the other DB activity (and the fact that I had to do this on our production server, because the problem doesn't occur in our test environments) I couldn't pin down the exact operation that was causing the stoppage. I will try coming in late at night and shutting down the production sites to run the SQL profiler, but I might not be able to do this right away.
In the course of researching the problem, I have tried a couple of solutions:
Thinking it might be a name resolution problem, I tried modifying the hosts file on the webserver as well as giving the connection strings an IP address instead of the server name to resolve, with no difference. I have heard of the LLMNR protocol causing problems like this, but I think trying to connect by IP or resolving with the hosts file should have eliminated that possibility, though I admit I never tried actually turning off LLMNR.
I have increased the idle timeouts, recycling intervals etc in IIS, but this doesn't even seem to be respected, much less solving the problem. This leads me to believe there is a setting overriding the IIS application settings on the machine.
multiple other code fixes, none of which made any difference. Is a SqlServer setting causing the problem?
other stuff that i forgot by now.
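To check the name resolution theory in isolation, the lookup itself can be timed from the web server. A minimal Python sketch follows; the function name is hypothetical, and "localhost" is only a placeholder for the server name used in the connection strings (1433 is the default SQL Server port).

```python
import socket
import time


def time_resolution(host, port=1433):
    """Return (seconds, addresses) for resolving host through the OS resolver."""
    start = time.perf_counter()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    elapsed = time.perf_counter() - start
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses


# Placeholder host; substitute the name from the connection string
elapsed, addresses = time_resolution("localhost")
print(f"resolved to {addresses} in {elapsed:.3f}s")
```

If the lookup returns in milliseconds, name resolution is unlikely to explain a 20-second stall.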
Any ideas, experience or whatevers would be greatly appreciated in helping me solve this problem!
I would advise using a non-TCP connection if you are still running the SQL instance on the local machine. SQL Server supports several protocols; TCP, named pipes, and shared memory are the most common.
Named Pipes
Data Source=np:computer\instance
Shared Memory
Data Source=lpc:computer\instance
Personally I prefer the Shared Memory. Remember you need to enable these protocols, and to avoid configuration mistakes I suggest you disable all you are not using.
see http://msdn.microsoft.com/en-us/library/ms187892.aspx
IIS Reset
In IIS7 there are two ways to configure the idle timeout. Both begin by clicking on the "Application Pools" section and right-clicking the appropriate application pool. If you click the "Recycling..." option there is one setting. The other is in "Advanced Settings...": under the "Process Model" section you will find "Idle Time-out (minutes)", which, when set to zero, disables the idle timeout. This latter option is the one that works for us.
If I were you I'd solve this problem first as restarting the appdomain and/or worker process is always painful even if you don't have a 20 second lag.
Some ideas:
From the web server, can you ping the db server and get a "normal" response, or are you seeing a similar delay?
If you're seeing a delay, run a tracert to see if you can nail down where the slowness is occurring.
Try using a tool like QueryExpress (http://www.albahari.com/queryexpress.aspx), which doesn't require an install to run. You can download this EXE and run it from your web server. See if you can connect to your db using this and run queries in a normal fashion.
Try something like Sysinternals' TcpView (http://technet.microsoft.com/en-us/sysinternals/bb897437) to take a look at your open connections and see what activity is happening on your server and how much data is being sent to and received from your db server.
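Alongside ping and tracert, it can help to time a plain TCP connect to the database port, since that is closer to what the first connection open has to do. A minimal Python sketch; the function name is hypothetical, and the demo connects to a throwaway local listener only so that the sketch runs anywhere. On the real server you would pass the db host and its port (1433 by default).

```python
import socket
import time


def time_tcp_connect(host, port, attempts=3, timeout=30.0):
    """Return the time in seconds taken by each of several TCP connects."""
    timings = []
    for _ in range(attempts):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connected; close immediately
        timings.append(time.perf_counter() - start)
    return timings


# Demo against a local listener standing in for the db server
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()
demo_port = listener.getsockname()[1]
timings = time_tcp_connect("127.0.0.1", demo_port)
listener.close()
print([f"{seconds:.4f}s" for seconds in timings])
```

If only the first connect is slow and later ones are fast, that points at something cached after first use rather than at steady network latency.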
Just some initial thoughts on where I'd start to look based upon your problem description. I hope this helps. Good luck with things!
With IIS not respecting recycling settings: did restarting IIS/rebooting change the behavior?