Huge amount of CLOSE_WAIT after .NET Core web app recycled on IIS behind Windows NLB - c#

I have been trying to find the root cause of this issue for weeks without success; I hope someone has new ideas about where to look.
We have an in-process .NET Core application running on Windows Server 2012 R2 with IIS 8.5 (.NET Core hosting 2.2.6).
This application is duplicated on 2 servers sharing the same IP through Windows NLB.
Both machines are physical, with recent HW and a 1GB link.
The traffic managed by the application is IMHO not that high at all (300 requests per minute!).
However, at random, when IIS recycles the application pool, the application is not able to send the ACK in the TCP close sequence to acknowledge the CLOSE_WAIT. We end up with an exponentially growing number of CLOSE_WAIT connections (see the traffic picture comparing CLOSE_WAIT on the 2 servers at the time of the issue).
The server at one point is not able to serve anything anymore.
This ends either when the default Windows keepalive timeout elapses or when we manually recycle the app pool again.
Looking at the TCPView tool at that time, the CLOSE_WAIT connections are linked to the System process, so we cannot tell whether it is the new worker process or the closing one that triggered this issue.
What I don't understand is why the app suddenly can't ACK the new CLOSE_WAIT connections.
Is it because of the "VIP" (NLB) shared by the servers? To us, the configuration and HB traffic seem OK.
Is something wrong with IIS and the .NET Core in-process model? I didn't see anyone complaining about that.
Is an HttpClient definition in the code done wrongly? The devs had a look and didn't find anything wrong there either.
Is the application not able to send the ACK, or not able to detect the FIN_ACK2? It is quite hard to see when the CLOSE_WAIT count increases so quickly, even with a Wireshark dump on a big server.
Our mitigation was to reduce the Windows keepalive time and to schedule the recycling at a very low traffic time, but I would like to know if anyone has any other idea where to dig, or has had the same issue...
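One way to see how fast CLOSE_WAIT grows, and under which PID, is to tally saved `netstat -ano` output by state and owning process. The following is only a minimal sketch: the sample rows and addresses are made up, and `count_tcp_states` is a hypothetical helper name, not part of any tool mentioned above.

```python
from collections import Counter

# Made-up sample in the column layout printed by `netstat -ano` on Windows.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    10.0.0.5:80            10.0.0.20:51234        CLOSE_WAIT      4
  TCP    10.0.0.5:80            10.0.0.21:51300        CLOSE_WAIT      4
  TCP    10.0.0.5:80            10.0.0.22:51311        ESTABLISHED     4
  TCP    10.0.0.5:49710         10.0.0.9:1433          ESTABLISHED     5120
  UDP    0.0.0.0:123            *:*                                    900
"""


def count_tcp_states(netstat_text):
    """Return a Counter keyed by (state, pid) for every TCP row."""
    counts = Counter()
    for line in netstat_text.splitlines():
        parts = line.split()
        # TCP rows have exactly five columns: proto, local, remote, state, pid.
        # The header row and UDP rows (no state column) are skipped.
        if len(parts) == 5 and parts[0] == "TCP":
            _, _, _, state, pid = parts
            counts[(state, int(pid))] += 1
    return counts


counts = count_tcp_states(SAMPLE)
for (state, pid), n in sorted(counts.items()):
    print(f"{state:<12} pid={pid:<6} {n}")
```

Running the same tally on snapshots taken every few seconds around a recycle would show whether the growth starts before or after the new worker process appears.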

Related

How to diagnose multiple unexpected client connections/disconnections on Blazor server-side

I have a Blazor server-side app deployed on about 100 sites, which generally works pretty well.
But on a very few sites, end users complain about micro disconnections/reconnections in the app (they see the components-reconnect-modal for a few seconds).
I logged some information in OnConnectionUpAsync and OnConnectionDownAsync of the CircuitHandler to see what happens, and I do see many log entries showing a client disconnected for one, two, or three seconds. But I don't have the reason.
There are on average between 50 and 100 end users connected to the app at the same time on this site.
I really don't know where to start investigating this case. This client has a well-dimensioned server with a recent Windows Server, and the configuration is OK.
I don't have any specific configuration or parameters in the app.
I suspect the problem is in my client's network (so not specific to my app, e.g. the network dropping for 3 seconds), but the client assures me his network works very well and he doesn't have any problems with his other apps.
Do you have any idea what I can do to get more information about this problem?
Do you know any tools that can test the network and show potential problems?
Thanks in advance.
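The connection up/down entries already being logged can be paired to measure each outage and to see whether several circuits drop at the same moment. This is a rough sketch that assumes a hypothetical record shape (timestamp, circuit id, "down"/"up"); the real log format is not shown in the question, and the sample events are invented.

```python
from datetime import datetime

# Hypothetical log records: (ISO timestamp, circuit id, "down" or "up").
EVENTS = [
    ("2024-01-01T10:00:00", "c1", "down"),
    ("2024-01-01T10:00:01", "c2", "down"),
    ("2024-01-01T10:00:02", "c1", "up"),
    ("2024-01-01T10:00:04", "c2", "up"),
    ("2024-01-01T10:30:00", "c3", "down"),
]


def disconnect_gaps(events):
    """Pair each "down" with the next "up" of the same circuit.

    Returns a list of (circuit_id, down_time, seconds_disconnected).
    Circuits that never came back are left out.
    """
    pending = {}
    gaps = []
    # ISO timestamps in one format sort chronologically as plain strings.
    for stamp, circuit, kind in sorted(events):
        when = datetime.fromisoformat(stamp)
        if kind == "down":
            pending[circuit] = when
        elif kind == "up" and circuit in pending:
            down = pending.pop(circuit)
            gaps.append((circuit, down, (when - down).total_seconds()))
    return gaps


gaps = disconnect_gaps(EVENTS)
for circuit, down, seconds in gaps:
    print(circuit, down.isoformat(), seconds)
```

If many circuits go down within the same second, a shared cause (network path, proxy, server) looks more likely than individual client problems; isolated gaps point the other way.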

ASP.NET .NET 4.5 Application crashes in IIS periodically and I can't figure out the cause

I have a .net 4.5 ASP.NET WebAPI application. Deployed in IIS using 1 worker on an 8gig VM with 4 CPUs.
I made changes to it recently (upgraded ServiceStack.Interfaces, ServiceStack.Common, ServiceStack.Redis and a bunch of dependencies) and started noticing that the IIS app pool this app is deployed on recycles about once an hour (give or take a few minutes).
There is nothing in my application logs that shows any kind of issue. I collect metrics using telegraf and I do NOT see memory metrics increase at all; as far as all the metrics I look at go, everything looks absolutely normal, and then the app pool recycles.
I looked at the event viewer, filtered the logs by WAS source, and saw an event with ID 5011, which basically means the IIS worker crashed, as I understand it.
So then I ran DebugDiag on my local box with the app deployed there (I can reproduce the issue locally). It ran for a while and I finally got the same event in the event viewer. I looked at the crash analysis logs from DebugDiag, and all I see is a bunch of exceptions logged but nothing concrete right before the crash.
At this point I'm not entirely sure what else I can do to figure out what's causing the crash, so I'm hoping there are more suggestions on what I can do to get more transparency.
What I think is happening is that there is some incompatibility between one of my dependencies and some of the upgraded packages, which causes an exception to be thrown that is not handled by anything and crashes the IIS worker.
My application is working perfectly fine: all API endpoints function with no issues, memory is NOT increasing, CPU is fine. So as far as I can tell there are no issues up to the crash.
I'm wondering if anyone knows any tricks to find what's causing the crash and/or handle it, to prevent this exception from escaping and crashing the worker.
I was able to narrow down with some confidence that the issue lies somewhere within the ServiceStack.Redis RedisPubSubServer. What is the actual issue, I don't know as that would take a lot more time to dig and I've wasted too much time already.
However, piggybacking on some existing code I had (from before ServiceStack supported sentinel), I created a new implementation of the redis client wrapper, which I call LazySentinelServiceStackClientWrapper. Instead of using the built-in sentinel manager, it relies on a custom sentinel provider which I created, LazySentinelApiSentinelProvider. This implementation interrogates the available sentinel hosts in random order for master and slave nodes; I then construct a pool using the retrieved read/write and readonly hosts, and this pool is used to run the redis operations. The pool is refreshed whenever an error occurs (after a failover). This is as opposed to the built-in sentinel manager that comes with ServiceStack.Redis, which instantiates a Redis pubsub server, listens for messages from sentinel whenever configuration changes such as failovers occur, and updates the managed redis connection pool.
I installed my version of this redis client wrapper into my application and have seen no app pool recycle events since (other than the scheduled ones).
Above is the log of app pool recycle events before I disabled the ServiceStack.Redis sentinel manager.
And here's the log of app pool recycle events after installing my new lazy sentinel manager
The first spike is me recycling the app manually and second one is the scheduled 1am recycle. So clearly the issue is solved.
What is the actual reason why the sentinel manager via redis pub sub server is causing IIS rapid fail protection to fire and recycle the app pool I do not know. Maybe someone with much more redis experience and/or IIS experience can attest to that. Also I did not test this in .net core and only tested for a .net 4.5.1 application deployed in IIS but on many different machines including local development machine and beefy production machines.
Finally one last note, that first image which shows all the recycle events, that's on my CI machine which is barely taking any traffic, maybe 1 request every few minutes. So this means the issue is not some memory leak or some resource exhaustion. Whatever the issue is, it happens regardless of traffic, CPU load, memory load, it just happens periodically.
Needless to say I will not be using the builtin sentinel manager at least for now.
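The lazy lookup described above (ask the sentinels in random order, cache the result, refresh only after an error) can be illustrated roughly as follows. This is not the actual LazySentinelServiceStackClientWrapper code, which is not shown here; it is a language-neutral sketch in Python where `LazySentinelResolver`, `query_sentinel`, and the fake hosts are all hypothetical names, and the sentinel query is injected rather than tied to any real client library.

```python
import random


class LazySentinelResolver:
    """Resolve master/replica hosts lazily; re-resolve only after an error."""

    def __init__(self, sentinel_hosts, query_sentinel, rng=None):
        # query_sentinel(host) -> (master, [replicas]); it should raise
        # OSError when that sentinel is unreachable. Supplied by the caller.
        self._sentinels = list(sentinel_hosts)
        self._query = query_sentinel
        self._rng = rng or random.Random()
        self._topology = None  # cached (master, replicas)

    def _resolve(self):
        hosts = self._sentinels[:]
        self._rng.shuffle(hosts)  # interrogate sentinels in random order
        for host in hosts:
            try:
                return self._query(host)
            except OSError:
                continue  # this sentinel did not answer, try the next one
        raise ConnectionError("no sentinel answered")

    def run(self, operation):
        """Run operation(master, replicas); refresh the topology once on failure."""
        if self._topology is None:
            self._topology = self._resolve()
        try:
            return operation(*self._topology)
        except OSError:
            # Possibly a failover: drop the cache, ask the sentinels again, retry.
            self._topology = self._resolve()
            return operation(*self._topology)


# Demo with fake sentinels: "s1" is down, the others report the current master.
state = {"master": "redis-a:6379"}


def fake_query(host):
    if host == "s1":
        raise OSError("sentinel unreachable")
    return state["master"], ["redis-c:6379"]


def write(master, replicas):
    if master != state["master"]:
        raise OSError("connection refused")  # stale master after a failover
    return f"wrote to {master}"


resolver = LazySentinelResolver(["s1", "s2", "s3"], fake_query)
first = resolver.run(write)
state["master"] = "redis-b:6379"  # simulate a failover
second = resolver.run(write)
print(first, second, sep="\n")
```

The point of the design is that nothing runs in the background: there is no long-lived pub/sub listener, so topology is only re-read when an operation actually fails.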

IIS takes up to 2 minutes to hand off a request to .NET app

We have a .NET application hosted on IIS; it has 4 GET/POST methods.
What I found is that under higher load (100 parallel calls), IIS takes about 2 minutes between receiving the GET/POST request and the .NET application starting to handle it.
This happened on a local test server, and it still happens after the app was deployed to Azure with more powerful CPU/RAM.
I am currently looking into any possible limits, like the number of connections, but making no progress.
Can anyone offer any hints?
Thank you
Additional info based on questions asked:
Overall, until the number of parallel clients reaches 50, it works as expected. Once it grows over 50 (this is not a hard number, it moves between about 50-60 clients), we start to see this issue with IIS not handing the request over to .NET fast enough.
After a second IIS server was deployed and the .NET application was installed, it is now capable of serving 100 users, ergo 50 users / IIS.

A worker process serving application pool 'xxx v4.0 (Classic)' has requested a recycle because it reached its private bytes memory limit

I have a shared hosting account with 128MB of RAM and my site is in its own app pool.
The site is small and gets low traffic, but I keep getting the following error:
A worker process serving application pool 'xxx v4.0 (Classic)' has requested a recycle because it reached its private bytes memory limit.
This is happening frequently, which restarts the app pool. If the app pool restarts too often, eventually it will stop. Then I'll get a 503 error when I go to the site.
The site is written using c#, with data access from ef and ado.net. All my database connections are in using statements and I am confident they are being opened and closed correctly.
I have spoken to the host and I can upgrade the RAM to 256MB which does appear to make the site run nicely. But I am a bit concerned that just upgrading the RAM is only masking the problem temporarily.
Debug is set to false in the web config, and before I copy the files to the server I build for release.
When I run the solution in visual studio my IIS Worker Process hovers around 100 MB.
I think my questions are:
Is there any way I can replicate my hosting environment on my local machine?
Is it normal for a fairly small website to exceed 128MB of RAM?
I am at a bit of a loss of what to try. Any help or guidance would be greatly appreciated.
Other potentially important info:
.NET Framework is 4.5
Web Forms
AjaxControlToolkit is used (only the scripts I need are loaded)
I've looked at many blog posts and similar questions but I can't seem to make any progress.
Thanks
Jim
That message is about hitting the configured limit within IIS itself; it does not necessarily have anything to do with the amount of RAM on the host (although the settings you set within IIS should take your aggregate RAM into account, so there is an indirect link).
Open IIS
Left Click on "Application Pools"
Find your dedicated pool and right click on it, selecting "Recycling..."
Check the "Private memory usage (in KB):" value
That is what you are exceeding
[These instructions are based on IIS 7.5 but are similar for other versions]
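Because that field is expressed in KB, it can help to convert before comparing it with the figures from the question. The small calculation below assumes, purely for illustration, that the host set the limit to the full 128 MB; the real value is whatever the dialog shows. The helper names are made up.

```python
def mb_to_kb(megabytes):
    """IIS shows the recycling limit in KB; 1 MB = 1024 KB."""
    return megabytes * 1024


def headroom_percent(private_kb, limit_kb):
    """Share of the limit still unused, as a percentage."""
    return 100.0 * (limit_kb - private_kb) / limit_kb


limit_kb = mb_to_kb(128)     # assumed limit, not confirmed by the host
observed_kb = mb_to_kb(100)  # the ~100 MB seen locally in the question
print(limit_kb, headroom_percent(observed_kb, limit_kb))
```

Under that assumption a worker idling around 100 MB has little room left before a recycle is requested, which would fit with the site behaving better at 256 MB.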

Process Management Library

I am working on an OLAP application, WCF + Silverlight clients (up to 100 concurrent users). Unfortunately from time to time, a specific service call goes crazy (although it is perfectly valid, just too complex) and occasionally (once a month) brings the whole server down (by consuming all CPU).
A solution would involve killing user request or even the whole user session which is not a big deal for us from the business perspective - recovering/restarting the whole application is.
The idea of isolating user sessions into separate processes is very tempting: CPU/memory throttling and clean resource disposal (not like Thread.Abort) - if modern browsers can do this just for web pages, maybe it's time to do this on servers. We just want to evaluate this concept and see pros and cons in our particular scenario.
Hence the questions:
Is there already an existing library/framework which will be useful for managing processes (like pre-spawning/reusing processes, throttling, kill after timeout)?
Are there any "best practices" or guidelines how to create such architecture?
I was having the same problem with my WCF services; they too serve more than 100 clients.
I discovered the problem using the IIS logs (C:\Windows\System32\LogFiles\HTTPERR).
I found my problem was the Application Pool recycle timeout setting in IIS.
The application pool was getting restarted every 48 hours, which was causing issues with already subscribed clients.
So I would suggest:
1. Analyze the HTTP error logs and IIS logs, which will give more information about the status of all the application pools and whether any get shut down or recycled.
2. If the application pool crashes, then set up WinDbg, attach it to the process, and set the correct source file path. It will tell you the location if any exceptions are occurring.
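For step 1, a quick tally of the reason and queue columns in an HTTPERR log shows which application pools were offline or dropping connections. This is a minimal sketch: the sample rows are invented, `count_reasons` is a hypothetical helper, and the column names are taken from the log's own `#Fields:` header rather than hard-coded, since the field list can differ between Windows versions.

```python
from collections import Counter

# Invented sample in the W3C-style layout used by HTTPERR logs.
SAMPLE_LOG = """\
#Software: Microsoft HTTP API 2.0
#Version: 1.0
#Fields: date time c-ip c-port s-ip s-port cs-version cs-method cs-uri sc-status s-siteid s-reason s-queuename
2024-01-01 10:00:00 10.0.0.20 51234 10.0.0.5 80 HTTP/1.1 GET /svc 503 1 AppOffline MyPool
2024-01-01 10:00:01 10.0.0.21 51300 10.0.0.5 80 HTTP/1.1 GET /svc 503 1 AppOffline MyPool
2024-01-01 10:05:00 10.0.0.22 51311 10.0.0.5 80 - - - - - Timer_ConnectionIdle -
"""


def count_reasons(log_text):
    """Count rows per (s-reason, s-queuename) using the #Fields header."""
    fields = []
    counts = Counter()
    for line in log_text.splitlines():
        if line.startswith("#Fields:"):
            fields = line.split()[1:]  # column names for the rows that follow
        elif line and not line.startswith("#") and fields:
            row = dict(zip(fields, line.split()))
            key = (row.get("s-reason", "-"), row.get("s-queuename", "-"))
            counts[key] += 1
    return counts


reasons = count_reasons(SAMPLE_LOG)
for (reason, queue), n in reasons.most_common():
    print(f"{reason:<22} {queue:<10} {n}")
```

Grouping the same rows by hour instead of by reason would make a fixed recycle interval, like the 48 hours mentioned above, stand out.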
