ASP.NET 2.0-4.0 Web Applications experiencing extremely slow initial start-up

(Sorry if this is a really long question, it said to be specific)
The company I work for has a number of sites, which have been running for some time with no problems. The applications are a mix of ASP.NET 2.0, 3.5, and 4.0, all using ADO.NET to connect to a SQL Server Standard instance (on the same webserver), and all hosted with IIS7.
The problem began when we moved to an upgraded webserver. We made every effort to set up the server, db instance and IIS with the exact same settings (except for the different machine name, and the fact that we had upgraded from SQLExpress to Standard), and as far as we could tell, we did. Both servers are running Windows Server 2008 R2 (all current updates applied), and received a default install.
The problem is very apparent when starting up one of these applications. When you reach the login page of our application, the page itself loads extremely fast. This is true even when you load the page from a new machine that could not possibly have the page cached, with IIS caching disabled. The problem only becomes visible when you enter your login information and click the login button. Because of the (not great) design of our databases, the login process must access a number of databases, theoretically up to 150 separate DBs, but in practice usually 2. The problem occurs even when only 2 databases (the minimum) are opened. We have to live with that design for now.
When trying to initially open a connection to the database, the entire process stops for about 20 seconds every time, regardless of whether you are connecting to 2 DBs or 40. I have run a .NET profiler (JetBrains dotTrace) against the process, and the only information I could take from it was that one or all of the calls to SqlConnection.Open() were accounting for 90% of the time. This only happens on first use of the application, but the problem is compounded by the fact that IIS seems to disregard the recycling settings we have set for it and recycles the application after a few minutes of idle time, causing the problem to occur again.
I also tried to use the SQL Server profiler to see which database operations were the cause of the slowdown, but because of all the other DB activity (and the fact that I had to do this on our production server, because the problem doesn't occur in our test environments) I couldn't pin down the exact operation that was causing the stoppage. I will try coming in late at night and shutting down the production sites to run the SQL profiler, but I might not be able to do this right away.
In the course of researching the problem, I have tried a couple of solutions:
Thinking it might be a name resolution problem, I tried modifying the hosts file on the webserver as well as giving the connection strings an IP address instead of the server name to resolve, with no difference. I have heard of the LLMNR protocol causing problems like this, but I think connecting by IP or resolving with the hosts file should have eliminated that possibility, though I admit I never tried actually turning off LLMNR.
I have increased the idle timeouts, recycling intervals etc. in IIS, but these don't even seem to be respected, much less solve the problem. This leads me to believe there is a setting on the machine overriding the IIS application settings.
Multiple other code fixes, none of which made any difference. Is a SQL Server setting causing the problem?
Other stuff that I have forgotten by now.
Any ideas, experience or whatevers would be greatly appreciated in helping me solve this problem!

I would advise using a non-TCP connection if you are still running the SQL instance on the local machine. SQL Server supports several protocols; TCP, named pipes, and shared memory are the most common.
Named Pipes
Data Source=np:computer\instance
Shared Memory
Data Source=lpc:computer\instance
Personally I prefer the Shared Memory. Remember you need to enable these protocols, and to avoid configuration mistakes I suggest you disable all you are not using.
see http://msdn.microsoft.com/en-us/library/ms187892.aspx
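As a rough sketch of how the shared-memory form might look in web.config: the entry name AppDb and the database name MyDb are placeholders of mine, not from the question, and computer\instance stands for the local machine and instance as above.

```xml
<connectionStrings>
  <!-- The "lpc:" prefix asks the client to use Shared Memory, which only works
       when SQL Server runs on the same machine as the web application.
       AppDb and MyDb are placeholder names. -->
  <add name="AppDb"
       connectionString="Data Source=lpc:computer\instance;Initial Catalog=MyDb;Integrated Security=True"
       providerName="System.Data.SqlClient" />
</connectionStrings>
```

Swapping lpc: for np: selects named pipes, and tcp: forces TCP, so the three can be compared by changing only the prefix. Whichever protocol you pick also has to be enabled for the instance in SQL Server Configuration Manager.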
IIS Reset
In IIS7 there are two ways to configure the idle timeout. Both begin by clicking on the "Application Pools" section and right-clicking the appropriate app pool. If you click the "Recycling..." option there is one setting. The other is in "Advanced Settings...": under the "Process Model" section you will find "Idle Time-out (minutes)" which, when set to zero, disables the process timeout. This latter option is the one that works for us.
If I were you I'd solve this problem first as restarting the appdomain and/or worker process is always painful even if you don't have a 20 second lag.
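For reference, the same settings end up in applicationHost.config (under system.applicationHost), so you can check there whether your changes were actually saved. A minimal sketch, assuming a pool named MyAppPool (a placeholder) and that you also want to turn off the time-based recycle:

```xml
<applicationPools>
  <!-- "MyAppPool" is a placeholder; use the pool your site actually runs in -->
  <add name="MyAppPool">
    <!-- An idle timeout of zero disables the idle shutdown of the worker process -->
    <processModel idleTimeout="00:00:00" />
    <recycling>
      <!-- A time of zero disables the regular time-interval recycle -->
      <periodicRestart time="00:00:00" />
    </recycling>
  </add>
</applicationPools>
```

If the pool entry in that file still shows the old values, the change was made on a different pool than the one the application is assigned to.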

Some ideas:
from the web server, can you ping the db server and get a "normal" response, or are you seeing a similar delay?
if you're seeing a delay, run a tracert to see if you can nail down where the slowness is occurring
try using a tool like QueryExpress (http://www.albahari.com/queryexpress.aspx) which doesn't require an install to run. You can download this EXE and run it from your web server. See if you can connect to your db using this and run queries in a normal fashion.
Try something like SysInternals' TcpView (http://technet.microsoft.com/en-us/sysinternals/bb897437) to take a look at your open connections and see what activity is happening on your server and how much data is being sent to and received from your db server.
Just some initial thoughts on where I'd start to look based upon your problem description. I hope this helps. Good luck with things!

With IIS not respecting recycling settings: did restarting IIS/rebooting change the behavior?

Related

How to diagnose multiple unexpected client connections/disconnections on Blazor server side

I have a Blazor server-side app deployed on about 100 sites, which generally works pretty well.
But on a very few sites, end users complain about micro disconnections/reconnections in the app (they see the components-reconnect-modal for a few seconds).
I logged some information in the OnConnectionUpAsync and OnConnectionDownAsync methods of the CircuitHandler to see what happens, and the logs do show clients disconnecting many times for one, two or three seconds. But I don't have the reason.
There are on average between 50 and 100 end users connected to the app at the same time on this site.
I really don't know where to start investigating this case. This client has a well-dimensioned server with a recent Windows Server, and the configuration is OK.
I don't have any specific configuration or parameters on the app.
I would suspect the problem is in my client's network (so not specific to my app, for example the network dropping for 3 seconds), but the client assures me his network works very well and he doesn't have any problems with his other apps.
Do you have any idea what I can do to get more information on this problem?
Do you know of any tools that can test the network and show potential problems?
Thanks in advance

ASP.NET .NET 4.5 Application crashes in IIS periodically and I can't figure out the cause

I have a .net 4.5 ASP.NET WebAPI application. Deployed in IIS using 1 worker on an 8gig VM with 4 CPUs.
I made changes to it recently (upgraded ServiceStack.Interfaces, ServiceStack.Common, ServiceStack.Redis and a bunch of dependencies) and started noticing that the IIS app pool this app is deployed on recycles about once an hour (give or take a few minutes).
There is nothing in my application logs that show any kind of issues. I collect metrics using telegraf and I do NOT see memory metrics increase at all, as far as all the metrics I look at everything looks absolutely normal and then the app pool recycles.
I looked at the event viewer, filtered the logs by the WAS source, and saw an event with ID 5011, which as I understand it basically means the IIS worker process crashed.
So then I used DebugDiag and ran it on my local box with the app deployed there (I can reproduce the issue locally). It ran for a while and finally I got the same event in the event viewer. I looked at the crash analysis logs from DebugDiag, and all I see is a bunch of exceptions logged but nothing concrete right before the crash.
At this point I'm not entirely sure what else I can do to figure out what's causing the crash, so I'm hoping there are more suggestions on how to get more transparency.
What I think is happening is that there is some incompatibility between one of my dependencies and some of the upgraded packages, which causes an exception to be thrown that is not handled by anything and crashes the IIS worker.
My application is working perfectly fine: all API endpoints function with no issues, memory is NOT increasing, and CPU is fine. So as far as I can tell there are no issues up to the crash.
I'm wondering if anyone knows any tricks to find what's causing the crash and/or handle it, to prevent this exception from escaping and crashing the worker.
I was able to narrow down with some confidence that the issue lies somewhere within the ServiceStack.Redis RedisPubSubServer. What is the actual issue, I don't know as that would take a lot more time to dig and I've wasted too much time already.
However, piggybacking on some existing code I had (from before ServiceStack supported sentinel), I created a new implementation of the redis client wrapper, which I call LazySentinelServiceStackClientWrapper. Instead of using the built-in sentinel manager, it relies on a custom sentinel provider I created, LazySentinelApiSentinelProvider. This implementation interrogates the available sentinel hosts in random order for master and slave nodes; I then construct a pool from the retrieved read/write and read-only hosts, and this pool is used to run the redis operations. The pool is refreshed whenever an error occurs (after a failover). This is in contrast to the built-in sentinel manager that comes with ServiceStack.Redis, which instantiates a Redis pub/sub server, listens for messages from sentinel whenever configuration changes such as failovers occur, and updates the managed redis connection pool.
I installed my version of this redis client wrapper into my application and have seen no app pool recycle events since (other than the scheduled ones).
Above is the log of app pool recycle events before I disabled the ServiceStack.Redis sentinel manager.
And here's the log of app pool recycle events after installing my new lazy sentinel manager
The first spike is me recycling the app manually and second one is the scheduled 1am recycle. So clearly the issue is solved.
What is the actual reason why the sentinel manager via redis pub sub server is causing IIS rapid fail protection to fire and recycle the app pool I do not know. Maybe someone with much more redis experience and/or IIS experience can attest to that. Also I did not test this in .net core and only tested for a .net 4.5.1 application deployed in IIS but on many different machines including local development machine and beefy production machines.
Finally one last note, that first image which shows all the recycle events, that's on my CI machine which is barely taking any traffic, maybe 1 request every few minutes. So this means the issue is not some memory leak or some resource exhaustion. Whatever the issue is, it happens regardless of traffic, CPU load, memory load, it just happens periodically.
Needless to say I will not be using the builtin sentinel manager at least for now.

Application switched to live URL causes excessive DB usage

Very very strange issue here... Apologies in advance for the wall of text.
We have a suite of applications running on an EC2 instance, all connecting to an RDS instance.
We are hosting the staging and production applications on the same EC2 server.
With one of the applications, as soon as the staging app is moved to prod, over 250 or so connections to the DB are opened, causing the RDS instance to max out CPU usage and make the entire suite slow down. The staging application itself does not have this issue.
The issue can be replicated by both deploying the app via our Octopus setup, and also physically copy pasting the BIN/Views folder from staging to live.
The connections are instant, boosting the CPU usage to 99% in less than a minute.
Things to note...
Running the query from "How to see active SQL Server connections?" will show the bulk connections, none of which have a LoginName.
Resource Monitor on the FE server lists the connections, all coming from IIS, seemingly cycling through outbound ports and attempting to connect to the DB server on its port. FE server address and DB server address blacked out respectively. Only a snippet of all of the connections.
The app needs users to log in to perform 99.9% of tasks. There is a public "Forgot your password" method that was updated to accept either a username or password. No change to the form structure or form action URL, just an extra check in the back.
Other changes were around how data was to be displayed and payment restrictions under certain conditions. Both of which require a login.
Things I've tried...
New app pools
Just giving it a few days to forget this ever happened
Not using Octopus to publish
Checking all areas that were updated between versions to see if a connection was not closed properly.
Really at a loss as to what is happening. This is the first time that I've seen something like this. Especially strange that staging is fine, but the same app on another URL/Connection string fails so badly.
The only thing I can think of would potentially be some kind of scraper that is polling the public form, but that makes no sense, as why isn't it happening with the current app...
Is there something in AWS that can monitor the calls that are being made? I vaguely remember something in NewRelic being able to do so.
Any suggestions and/or similar experiences are welcomed.
Edits.
Nothing outstanding in logs for the day of the issue (yesterday)
No incoming traffic to match all of the outbound requests
No initialisation is performed by the application on startup
Update...
We use ADO for most of our queries. A query was updated to get data from different tables. The method name and parameters were not changed, just the body of the query. If I use sys.dm_exec_sql_text to see what is getting sent to the DB, I can see that it IS the updated query that is being sent in each of the hundreds of connections. They are all showing as suspended though... Nothing has changed in regards to how that query is sent to the server, just the body of the query itself...
So, one of the other queries that was published in the update broke it. We reverted only that query and deployed a new version, and it is fine.
Strangely enough, it's one that is being run in one form or another over the entire suite. But just died under any sort of load that wasn't staging, which is why I assumed it would be the last place to look.

IE10 and Windows 8 and ASP.NET MVC (IIS 7)

I'm having a very interesting, but frustrating issue. I have an MVC 4 site running with standard ASP.NET Authentication.
In and only in the combination of IE 10 on Windows 8, when I traverse my site and navigate from an https URL to an http URL (both on the same site), it generates a different ASP.NET_SessionId value. In every other browser/OS combo I have tried, this does not appear to be an issue.
I have searched high and low and while I certainly have found people experiencing various authentication issues (usually regarding IIS7 not recognizing IE10 as a browser), I have not found anyone else claiming to have experienced this exact issue. More concerning, I published an 'out of the box' MVC template project and it has the same issue. I can't possibly be the only one who has run across this problem (so I hope).
Anyone else run into this? Or maybe even just have some suggestions?
Thanks
UPDATE
Okay, so there is one more important aspect. I am running this on a load balanced environment. If I push the apps to a single server and test, I have no issues.
You mention that you have had this problem in a load balanced environment, right? I assume you are using the default "In Proc" method of storing session data. If that is the case, then I think I know what could be happening. (For the sake of argument, I will assume 2 servers, but it doesn't really matter if you have more.)
You are being sent to ServerA and a session is created. Because this is In Process, ServerB has no idea about it. Eventually (and how that happens is a matter of how your load balancer is set up. Sticky session? Cookies? Round robin?) you will be sent to ServerB. Because that server has no idea you already have a session, a new one is created and you are given a new session ID.
So why is it happening under your exact repro steps? Well, my hunch is that given enough time and load, you would see it just navigating from /page1 to /page2. Again, this depends on how your load balancer is set up, but it could be that changing protocol triggers something and you are sent to another server in the pool.
How can you fix it?
To start, make sure all of the servers have the same machine key in their machine.config. If you don't have access to that, I think it will work in the web.config, but I haven't tried it.
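A minimal sketch of what the shared machine key element might look like, assuming you put it in web.config. The key values here are placeholders, not real keys; generate your own and copy the same pair to every server.

```xml
<system.web>
  <!-- Placeholder values: generate your own keys and use the identical pair
       on every server behind the load balancer. -->
  <machineKey validationKey="[same hex key on every server]"
              decryptionKey="[same hex key on every server]"
              validation="SHA1"
              decryption="AES" />
</system.web>
```

With explicit keys in place, forms authentication tickets and view state issued by one server can be validated by the others, which is not the case with the auto-generated per-machine defaults.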
Now, set up another way to store session state. In SQL Server perhaps, or MySQL or Postgres or wherever. If you have SQL Server that will be the easiest since the provider is built in, but if you have another datastore you will either need to build or find a library that will do it. I worked on a project where we used Postgres to store session state.
We used npgsql as the driver to connect to the server, and built our own PgsqlSessionProvider:SessionStateStoreProviderBase. Hooking it up is actually really easy:
<sessionState mode="Custom" customProvider="PgsqlSessionProvider">
  <providers>
    <add name="PgsqlSessionProvider" type="My.namespace.PgsqlSessionStateStore" connectionStringName="connectionStringName" writeExceptionsToEventLog="true" />
  </providers>
</sessionState>
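If you go with SQL Server instead, no custom provider is needed. A rough sketch of the built-in mode, where MySqlServer is a placeholder server name and the timeout is just the default 20 minutes written out:

```xml
<system.web>
  <!-- MySqlServer is a placeholder. The session-state database must already
       exist on that server before this will work. -->
  <sessionState mode="SQLServer"
                sqlConnectionString="Data Source=MySqlServer;Integrated Security=SSPI"
                timeout="20" />
</system.web>
```

The session-state database is created with the aspnet_regsql.exe tool that ships with the .NET Framework, and anything you put in session has to be serializable once it is stored out of process.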

SQL Server 2000 intermittent connection exceptions on production server - specific environment problem?

We've been having intermittent problems causing users to be forcibly logged out of our application.
Our set-up is ASP.Net/C# web application on Windows Server 2003 Standard Edition with SQL Server 2000 on the back end. We've recently performed a major product upgrade on our client's VMWare server (we have a guest instance dedicated to us) and whereas we had none of these issues with the previous release the added complexity that the new upgrade brings to the product has caused a lot of issues. We are also running SQL Server 2000 (build 8.00.2039, or SP4) and the IIS/ASP.NET (.Net v2.0.50727) application on the same box and connecting to each other via a TCP/IP connection.
Primarily, the exceptions being thrown are:
System.IndexOutOfRangeException: Cannot find table 0.
System.ArgumentException: Column 'password' does not belong to table Table.
[This exception occurs in the log in script, even though there is clearly a password column available]
System.InvalidOperationException: There is already an open DataReader associated with this Command which must be closed first.
[This one is occurring very regularly]
System.InvalidOperationException: This SqlTransaction has completed; it is no longer usable.
System.ApplicationException: ExecuteReader requires an open and available Connection. The connection's current state is connecting.
System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
And just today, for the first time:
System.Web.UI.ViewStateException: Invalid viewstate.
We have load tested the app using the same number of concurrent users as the production server and cannot reproduce these errors. They are very intermittent and occur even when there are only 8/9/10 user connections. My gut is telling me it's ASP.NET - SQL Server 2000 connection issues..
We've pretty much ruled out code-level Data Access Layer errors at this stage (we've a development team of 15 experienced developers working on this) so we think it's a specific production server environment issue.
The Invalid Viewstate error is pretty common in a high traffic web site. Though, if you recently moved to multiple web servers, make sure you're sharing the same machine key so Viewstate is signed with the same key on all servers. http://www.codinghorror.com/blog/archives/000132.html
Based on the other errors I'd guess that you are using shared connections across multiple threads. Are your connections stored in static variables, Application state, Session state, or other object that's used across multiple requests? Maybe there's a hashtable somewhere containing connections, commands, or transactions. None of the ADO.Net objects are thread safe. So, make sure you only use them in a single threaded fashion.
Another possibility is you're passing around the ADO.NET objects and not consistently disposing of them and managing their scope. Maybe they're cached in the request context or some such?
I know you don't want to hear this, but people smarter than I have said it (check out McConnell's Code Complete if you don't believe me):
It's probably your code, and your gut is probably correct:
"My gut is telling me it's ASP.NET - SQL Server 2000 connection issues.."
The Errors being thrown are quite specific, and contextually, they look like they're just trying to connect and having a hard time -- which if it only happens in the client's environment, could be indicative of a setting not set correctly for the VM to access TCP connections on the host machine (under a different instance).
Are you sure that none of your code has changed since before the move, and that your previous environment had logging like this enabled? It may have been happening (to a lesser degree) before, but your environment didn't catch it because you didn't have logging enabled.
If that's not the issue, and I'm reading your post correctly: You're running a server on a guest instance provided by the client on their pipe and bandwidth? If that's the case, then quite possibly (around the same time as that upgrade) some routing configuration was changed, or firewall changes were made, or whatever box the instance is on had some change made now that it handles your stuff differently.
If you can't reproduce it in your environment, and you are 100% certain that it isn't your code; then logically it can only be their environment that is the issue.
Lads, just as an update, it turned out that the problem was VMWare related under heavy usage - what a fun week! We're changing the code around to suit the VMWare environment and we've seen some improvement already.
Thanks for the suggestions, I appreciate it.
