Application switched to live URL causes excessive DB usage - c#

Very very strange issue here... Apologies in advance for the wall of text.
We have a suite of applications running on an EC2 instance, all connecting to an RDS instance.
We are hosting the staging and production applications on the same EC2 server.
With one of the applications, as soon as the staging app is moved to prod, around 250 connections to the DB are opened, causing the RDS instance to max out CPU usage and slowing the entire suite down. The staging application itself does not have this issue.
The issue can be replicated both by deploying the app via our Octopus setup and by physically copy-pasting the BIN/Views folders from staging to live.
The connections are instant, boosting the CPU usage to 99% in less than a minute.
Things to note...
Running the query from "How to see active SQL Server connections?" will show the bulk connections, none of which have a LoginName (see the sketch after this list).
Resource Monitor on the FE server will list the connections, all coming from IIS, seemingly scanning all outbound ports while attempting to connect to the DB server on its port. (Screenshot omitted: FE server address and DB server address blacked out respectively; only a snippet of all of the connections shown.)
The app needs users to log in to perform 99.9% of tasks. There is a public "Forgot your password" method that was updated to accept either a username or password. No change to the form structure or form action URL, just an extra check in the back end.
Other changes were around how data was to be displayed and payment restrictions under certain conditions. Both of which require a login.
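For reference, that check is essentially a query against the connection DMVs. A minimal sketch of auditing who holds the connections (the connection string is a placeholder, and this requires VIEW SERVER STATE permission):

using System;
using System.Data.SqlClient;

class ConnectionAudit
{
    static void Main()
    {
        // Group current connections by client address, program and login
        // to confirm where the flood is coming from.
        const string sql = @"
            SELECT c.client_net_address, s.program_name, s.login_name, COUNT(*) AS conn_count
            FROM sys.dm_exec_connections c
            JOIN sys.dm_exec_sessions s ON c.session_id = s.session_id
            GROUP BY c.client_net_address, s.program_name, s.login_name
            ORDER BY conn_count DESC;";

        using (var conn = new SqlConnection("Server=DB_SERVER;Database=master;Integrated Security=true"))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    Console.WriteLine("{0} | {1} | {2} | {3}",
                        reader["client_net_address"], reader["program_name"],
                        reader["login_name"], reader["conn_count"]);
        }
    }
}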
Things I've tried...
New app pools
Just giving it a few days to forget this ever happened
Not using Octopus to publish
Checking all areas that were updated between versions to see if a connection was not closed properly.
Really at a loss as to what is happening. This is the first time that I've seen something like this. Especially strange that staging is fine, but the same app on another URL/Connection string fails so badly.
The only thing I can think of would potentially be some kind of scraper polling the public form, but that makes no sense, as why isn't it happening with the current app...
Is there something in AWS that can monitor the calls that are being made? I vaguely remember something in NewRelic being able to do so.
Any suggestions and/or similar experiences are welcomed.
Edits.
Nothing outstanding in logs for the day of the issue (yesterday)
No incoming traffic to match all of the outbound requests
No initialisation is performed by the application on startup
Update...
We use ADO for most of our queries. A query was updated to get data from different tables; the method name and parameters were not changed, just the body of the query. If I use sys.dm_exec_sql_text to see what is getting sent to the DB, I can see that it IS the updated query being sent in each of the hundreds of connections. They are all showing as suspended, though... Nothing has changed in regards to how that query is sent to the server, just the body of the query itself...
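For anyone wanting to reproduce the check, a minimal sketch of pulling the text of suspended requests via sys.dm_exec_sql_text (the connection string is a placeholder):

using System;
using System.Data.SqlClient;

class SuspendedQueryCheck
{
    static void Main()
    {
        // List suspended requests together with the SQL text each is running.
        const string sql = @"
            SELECT r.session_id, r.wait_type, t.text
            FROM sys.dm_exec_requests r
            CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
            WHERE r.status = 'suspended';";

        using (var conn = new SqlConnection("Server=DB_SERVER;Integrated Security=true"))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    Console.WriteLine("{0} ({1}): {2}",
                        reader["session_id"], reader["wait_type"], reader["text"]);
        }
    }
}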

So, one of the other queries that was published in the update broke it. We reverted only that query and deployed a new version, and it is fine.
Strangely enough, it's one that is being run in one form or another across the entire suite, but it just died under any sort of load that wasn't staging, which is why I assumed it would be the last place to look.

Related

How to diagnose unexpected client connections/disconnections on Blazor server side

I have a Blazor server-side app deployed on around 100 sites, and it generally works pretty well.
But on a very few sites, end users complain about micro disconnections/reconnections in the app (they see the components-reconnect-modal for a few seconds).
I logged some information in the OnConnectionUpAsync and OnConnectionDownAsync methods of the CircuitHandler to see what happens, and the logs do show the client disconnecting many times for one, two, or three seconds. But I don't have the reason. (A sketch of the handler is below.)
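For context, a minimal logging CircuitHandler along those lines might look like this (a sketch; the class name and log format are illustrative):

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Components.Server.Circuits;
using Microsoft.Extensions.Logging;

// Logs every circuit up/down transition with a timestamp, so outage
// windows can be correlated with network or server events.
public class LoggingCircuitHandler : CircuitHandler
{
    private readonly ILogger<LoggingCircuitHandler> _logger;

    public LoggingCircuitHandler(ILogger<LoggingCircuitHandler> logger)
    {
        _logger = logger;
    }

    public override Task OnConnectionUpAsync(Circuit circuit, CancellationToken cancellationToken)
    {
        _logger.LogInformation("Circuit {CircuitId} up at {Time:O}", circuit.Id, DateTimeOffset.UtcNow);
        return Task.CompletedTask;
    }

    public override Task OnConnectionDownAsync(Circuit circuit, CancellationToken cancellationToken)
    {
        _logger.LogWarning("Circuit {CircuitId} down at {Time:O}", circuit.Id, DateTimeOffset.UtcNow);
        return Task.CompletedTask;
    }
}

// Registered in Startup.ConfigureServices:
// services.AddScoped<CircuitHandler, LoggingCircuitHandler>();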
There are on average between 50 and 100 end users connected to the app at the same time on such a site.
I really don't know where to start investigating this case. The client's server is generously sized, runs a recent Windows Server, and its configuration looks OK.
I don't have any specific configuration or parameters on the app.
I suspect the problem is in the client's network (so not specific to my app, e.g. the network dropping for 3 seconds), but the client assures me their network works very well and that they have no problems with their other apps.
Do you have any idea what I can do to get more information on this problem?
Do you know of any tools that can test the network and reveal potential problems?
Thanks in advance.

SQL Dependency causing errors in SQL Server

We have a C# web application that is using SQL dependency to expire cached query data. Although everything is working okay we are seeing a lot of errors being generated particularly in our production environment.
The first error message is this:
Service Broker needs to access the master key in the database
'SubscriberManager'. Error code:32. The master key has to exist and
the service master key encryption is required.
It shows up in both the server event log and the SQL Server log files. I believe the actual content of the message is something of a red herring, as I have created a database master key.
I have also tried (a quick diagnostic sketch follows this list):
Recreating both the service master key and the database master key.
Making sure the database owner is sa.
Making sure the user account has permissions to create services, queues, and procedures, and to subscribe to query notifications.
Making sure the broker is enabled.
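As a sanity check, two of those items can be verified programmatically; a minimal sketch, assuming the 'SubscriberManager' database from the error message and a placeholder server name:

using System;
using System.Data.SqlClient;

class BrokerCheck
{
    static void Main()
    {
        using (var conn = new SqlConnection("Server=DB_SERVER;Database=SubscriberManager;Integrated Security=true"))
        {
            conn.Open();

            // Is Service Broker enabled on this database?
            using (var cmd = new SqlCommand(
                "SELECT is_broker_enabled FROM sys.databases WHERE name = DB_NAME()", conn))
                Console.WriteLine("Broker enabled: {0}", cmd.ExecuteScalar());

            // Does a database master key exist?
            using (var cmd = new SqlCommand(
                "SELECT COUNT(*) FROM sys.symmetric_keys WHERE name = '##MS_DatabaseMasterKey##'", conn))
                Console.WriteLine("Master key present: {0}", (int)cmd.ExecuteScalar() > 0);
        }
    }
}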
I have seen other people with similar errors whilst researching the issue, but the error code almost always seems to be 25 or 26, not 32. I have been unable to find anything that tells me what these error codes mean, so I'm not sure of their significance.
At the same time this happens, we also sometimes get an error from .NET saying the following:
System.InvalidOperationException: When using SqlDependency without
providing an options value, SqlDependency.Start() must be called for
each server that is being executed against
The applications call SqlDependency.Start in the Application_Start event, though (see the sketch below).
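For comparison, a minimal Global.asax wiring along those lines (the "MainDb" connection string name is illustrative):

using System.Configuration;
using System.Data.SqlClient;

public class Global : System.Web.HttpApplication
{
    protected void Application_Start()
    {
        // One Start call per distinct server the app executes
        // notification queries against.
        SqlDependency.Start(
            ConfigurationManager.ConnectionStrings["MainDb"].ConnectionString);
    }

    protected void Application_End()
    {
        SqlDependency.Stop(
            ConfigurationManager.ConnectionStrings["MainDb"].ConnectionString);
    }
}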
These errors are strangely intermittent, though; they do not happen every time a particular page is hit, or even every time a notification is created or triggered. I have tried attaching SQL Profiler and monitoring the broker events, and have seen the notifications created and triggered without any problems.
Finally I am seeing a lot of errors like this:
The query notification dialog on conversation handle
'{2FA2445B-1667-E311-943C-02C798B618C6}.' closed due to the following
error: '-8490 Cannot find the remote service
'SqlQueryNotificationService-7303d251-1eb2-4f3a-9e08-d5d17c28b6cf'
because it does not exist.'.
I understand that a certain number of these are normal due to the way that SqlDependency.Stop doesn't clean everything up in the database, but we are seeing thousands of these on one of our production servers.
What is frustrating is that we have been using SQL notifications for several years now without issue, so something must have changed in either our application or the server setups to cause this, but at this point I have no idea what.
The applications are .net 4.0 MVC and WCF running on Windows 2012 servers calling SQL 2012 servers also running on Windows 2012.
According to Microsoft, if the SQL Server is short on resources, particularly memory, it can cause these errors to be generated.

ASP.NET 2.0-4.0 Web Applications experiencing extremely slow initial start-up.

(Sorry if this is a really long question, it said to be specific)
The company I work for has a number of sites, which have been running for some time with no problems. The applications are a mix of ASP.NET 2.0, 3.5, and 4.0, all using ADO.NET to connect to a SQL Server Standard instance (on the same webserver), all being hosted with IIS7.
The problem began when we moved to an upgraded webserver. We made every effort to set up the server, db instance and IIS with the exact same settings (except for the different machine name, and the fact that we had upgraded from SQLExpress to Standard), and as far as we could tell, we did. Both servers are running Windows Server 2008 R2 (all current updates applied), and received a default install.
The problem is very apparent when starting up one of these applications. When you reach the login page of our application, the page itself loads extremely fast. This is true even when you load the page from a new machine that could not possibly have the page cached, with IIS caching disabled. The problem actually becomes visible when you enter your login information and click the login button. Because of the (not great) design of our databases, the login process must access a number of databases, theoretically up to 150 separate DBs, but in practice usually 2. The problem occurs even when only 2 databases (the minimum) are opened. Not a great design, but we have to live with it for now.
When trying to initially open a connection to the database, the entire process stops for about 20 seconds every time, regardless of whether you are connecting to 2 DBs or 40. I have run a .NET profiler (JetBrains dotTrace) against the process, and the only information I could take from it was that one or all of the calls to SqlConnection.Open() was accounting for 90% of the time. This only happens on first use of the application, but the problem is compounded by the fact that IIS seems to disregard the recycling settings we have set for it, and recycles the application after a few minutes of idle, causing the problem to occur again.
I also tried to use the SQL Server profiler to see which database operations were the cause of the slowdown, but because of all the other DB activity (and the fact that I had to do this on our production server, because the problem doesn't occur in our test environments) I couldn't pin down the exact operation that was causing the stoppage. I will try coming in late at night and shutting down the production sites to run the SQL profiler, but I might not be able to do this right away.
In the course of researching the problem, I have tried a couple of solutions:
Thinking it might be a name resolution problem, I tried modifying both the hosts file on the webserver as well as giving the connection strings an IP address instead of the server name to resolve, with no difference. I have heard of the LLMNR protocol causing problems like this, but I think trying to connect by IP or resolving with the hosts file should have eliminated that possibility, though I admit I never tried actually turning off LLMNR.
I have increased the idle timeouts, recycling intervals, etc. in IIS, but these don't even seem to be respected, much less solve the problem. This leads me to believe there is a setting overriding the IIS application settings on the machine.
Multiple other code fixes, none of which made any difference. Is a SQL Server setting causing the problem?
Other stuff that I've forgotten by now.
Any ideas, experience or whatevers would be greatly appreciated in helping me solve this problem!
I would advise using a non-TCP connection if you are still running the SQL instance on the local machine. SQL Server supports several protocols; TCP, named pipes, and shared memory are the most common.
Named Pipes
Data Source=np:computer\instance
Shared Memory
Data Source=lpc:computer\instance
Personally I prefer shared memory. Remember you need to enable these protocols, and to avoid configuration mistakes I suggest you disable any you are not using.
see http://msdn.microsoft.com/en-us/library/ms187892.aspx
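For illustration, using the shared memory prefix from C# might look like this (server, instance, and database names are placeholders):

using System.Data.SqlClient;

class ProtocolDemo
{
    static void Main()
    {
        // Shared memory (lpc:) only works when the app and the SQL
        // instance are on the same machine, which is the case here.
        using (var conn = new SqlConnection(
            @"Data Source=lpc:MYSERVER\SQL2008;Initial Catalog=MyDb;Integrated Security=true"))
        {
            conn.Open();
            // ... the connection now bypasses the TCP stack entirely
        }
    }
}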
IIS Reset
In IIS7 there are two ways to configure the idle timeout. Both begin by clicking on the "Application Pools" section and right-clicking the appropriate app pool. If you click the "Recycling..." option, there is one setting. The other is in "Advanced Settings...": under the "Process Model" section you will find "Idle Time-out (minutes)", which, set to zero, disables the process timeout. This latter option is the one that works for us.
If I were you, I'd solve this problem first, as restarting the app domain and/or worker process is always painful, even if you don't have a 20-second lag.
Some ideas:
From the web server, can you ping the DB server and get a "normal" response, or are you seeing a similar delay?
If you're seeing a delay, run a tracert to see if you can nail down where the slowness is occurring.
Try using a tool like QueryExpress (http://www.albahari.com/queryexpress.aspx), which doesn't require an install to run. You can download this EXE and run it from your web server. See if you can connect to your DB using this and run queries in a normal fashion.
Try something like SysInternals' TcpView (http://technet.microsoft.com/en-us/sysinternals/bb897437) to take a look at your open connections and see what activity is happening on your server and how much data is being sent to and received from your db server.
Just some initial thoughts on where I'd start to look based upon your problem description. I hope this helps. Good luck with things!
As for IIS not respecting the recycling settings: did restarting IIS or rebooting change the behavior?

C#, Sql Server 2008: Stream large result set to end user only works on some databases

I have a long-running query that returns a large data set. This query is called from a web service, and the results are converted to a CSV file for the end user. Previous versions would take 10+ minutes to run and would only return results to the end user once the query completed.
I rewrote the query to where it runs in a minute or so in most cases, and rewrote the way it is accessed so the results would be streamed to the client as they came into the asp.net web service from the database server. I tested this using a local instance of SQL Server as well as a remote instance without issue.
Now, on the cusp of production deployment, it seems our production SQL Server machine does not send any results back to the web service until the query has completed execution. Additionally, I found that another machine, identical to the remote server that works (they are clones), is also not streaming results.
The version of SQL Server 2008 is identical on all machines. The production machine has a slightly different version of Windows Server installed (6.0 vs 6.1). The production server has 4 cores and several times the RAM as the other servers. The other servers are single core with 1GB ram.
Is there any setting that would be causing this? Or is there any setting I can set that will prevent SQL Server from buffering the results?
Although I know this won't really affect the overall runtime at all, it will change the end-user perception greatly.
tl;dr;
I need the results of a query to stream to the end user as the query runs. It works with some database machines, but not on others. All machines are running the same version of SQL Server.
The gist of what I am doing in C#:
// Write the CSV header, then stream each row to the client,
// flushing the response buffer periodically.
using (var reader = cmd.ExecuteReader())
{
    Response.Write(getHeader());
    while (reader.Read())
    {
        Response.Write(getCSVForRow(reader));
        if (shouldFlush()) Response.Flush();
    }
}
Clarification based on response below
There are 4 database servers, Local, Prod, QA1, QA2. They are all running SQL Server 2008. They all have identical databases loaded on them (more or less, 1 day lag on non-prod).
The web service is hosted on my machine (though I have tested remotely hosted as well).
The only change between tests is the connection string in the web.config.
QA2 is working (streaming), and it is a clone of QA1 (VMs). The only difference between QA1 and QA2 is an added database on QA2 not related to this query at all.
QA1 is not working.
All tests include the maximum sized dataset in the result (we limit to 5k rows at this time). The browser displays a download dialog once the first flush happens. This is the desired result. We want them to know their download is processing, even if the download speed is low and at times drops to zero (such is the way with databases).
My flushing code is simple at this time. Every k rows we flush, with k currently set to 20.
The most perplexing part of this is the fact that QA1 and QA2 behave differently. I did notice our production server is set to compatibility mode 2005 (90), whereas both the QA and local databases are set to 2008 (100). I doubt this matters. When I exec the sprocs through SSMS I have similar behavior across all machines; I see results stream in immediately.
Is there any connection string setting that could disable the streaming?
Everything I know says that what you're doing should work; both the DataReader and Response.Write()/.Flush() act in a "streaming" fashion and will result in the client getting the data one row at a time as soon as there are rows to get. Response does include a buffer, but you're pushing the buffer to the client after every read/write iteration which minimizes its use.
I'd check that the web service is configured to respond correctly to Flush() commands from the response. Make sure the production environment is not a Win2008 Server Core installation; Windows Server 2008 does not support Response.Flush() in certain Server Core roles. I'd also check that the conditions evaluated in ShouldFlush() will return true when you expect them to in the production environment (You may be checking the app config for a value, or looking at IIS settings; I dunno).
In your test, I'd try a much larger set of sample data; it may be that the production environment is exposing problems that are also present in the test environments, but with a smaller set of test data and a high-speed Ethernet backbone the problem isn't noticeable compared to returning hundreds of thousands of rows over DSL. You can verify that it is working in a streaming fashion by inserting a Thread.Sleep(250) call after each Flush(); this'll slow down execution of the service, and let you watch the response get fed to your client at 4 rows per second.
Lastly, make sure that the client you're using in the production environment is set up to display CSV files in a fashion that allows for streaming. This basically means that a web browser acting as the client should not be configured to pass the file off to a third-party app. A web browser can easily display a text stream passed over HTTP; that's what it does, really. However, if it sees the stream as a CSV file, and it's configured to hand CSV files over to Excel to open, the browser will cache the whole file before invoking the third-party app.
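Pulling those points together, a sketch of the handler with buffering disabled, reusing the question's helper names (getHeader and getCSVForRow are the asker's methods; everything else is illustrative):

// Inside the same web service / page handler as the snippet above.
Response.Clear();
Response.BufferOutput = false;    // push bytes to the client as they are written
Response.ContentType = "text/csv";
Response.AddHeader("Content-Disposition", "attachment; filename=export.csv");

using (var reader = cmd.ExecuteReader())
{
    Response.Write(getHeader());
    int rows = 0;
    while (reader.Read())
    {
        Response.Write(getCSVForRow(reader));
        if (++rows % 20 == 0) Response.Flush();   // k = 20, as in the question
    }
}
Response.Flush();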
Put a new task in a task table that builds this huge CSV file.
Run the procedure to process this task.
Wait for the result to appear in your task table with SqlDependency.
Return the result to the client. (A rough sketch of the wait step follows.)
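A rough sketch of the wait step using SqlDependency (table and column names are illustrative, and SqlDependency.Start must already have been called for this connection string):

using System;
using System.Data.SqlClient;
using System.Threading;

class CsvTaskWaiter
{
    // Blocks until the task row changes (or a timeout elapses). Real code
    // should re-query and re-subscribe in a loop, since a notification
    // fires once for any change, not just for completion.
    static void WaitForTask(string connectionString, int taskId)
    {
        var changed = new ManualResetEvent(false);
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT Status FROM dbo.CsvTasks WHERE TaskId = @id", conn))
        {
            cmd.Parameters.AddWithValue("@id", taskId);
            var dependency = new SqlDependency(cmd);
            dependency.OnChange += (s, e) => changed.Set();

            conn.Open();
            using (var reader = cmd.ExecuteReader())
                if (reader.Read() && (string)reader["Status"] == "Complete")
                    return;    // already finished before we subscribed

            changed.WaitOne(TimeSpan.FromMinutes(10));
        }
    }
}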

SQL Server 2000 intermittent connection exceptions on production server - specific environment problem?

We've been having intermittent problems causing users to be forcibly logged out of our application.
Our set-up is an ASP.NET/C# web application on Windows Server 2003 Standard Edition with SQL Server 2000 on the back end. We've recently performed a major product upgrade on our client's VMware server (we have a guest instance dedicated to us), and whereas we had none of these issues with the previous release, the added complexity that the new upgrade brings to the product has caused a lot of issues. We are also running SQL Server 2000 (build 8.00.2039, i.e. SP4) and the IIS/ASP.NET (.NET v2.0.50727) application on the same box, connecting to each other via a TCP/IP connection.
Primarily, the exceptions being thrown are:
System.IndexOutOfRangeException: Cannot find table 0.
System.ArgumentException: Column 'password' does not belong to table Table.
[This exception occurs in the log-in script, even though there is clearly a password column available]
System.InvalidOperationException: There is already an open DataReader associated with this Command which must be closed first.
[This one is occurring very regularly]
System.InvalidOperationException: This SqlTransaction has completed; it is no longer usable.
System.ApplicationException: ExecuteReader requires an open and available Connection. The connection's current state is connecting.
System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
And just today, for the first time:
System.Web.UI.ViewStateException: Invalid viewstate.
We have load tested the app using the same number of concurrent users as the production server and cannot reproduce these errors. They are very intermittent and occur even when there are only 8/9/10 user connections. My gut is telling me it's ASP.NET - SQL Server 2000 connection issues...
We've pretty much ruled out code-level Data Access Layer errors at this stage (we've a development team of 15 experienced developers working on this), so we think it's a specific production server environment issue.
The Invalid Viewstate error is pretty common in a high traffic web site. Though, if you recently moved to multiple web servers, make sure you're sharing the same machine key so Viewstate is signed with the same key on all servers. http://www.codinghorror.com/blog/archives/000132.html
Based on the other errors, I'd guess that you are using shared connections across multiple threads. Are your connections stored in static variables, Application state, Session state, or another object that's used across multiple requests? Maybe there's a hashtable somewhere containing connections, commands, or transactions. None of the ADO.NET objects are thread-safe, so make sure you only use them in a single-threaded fashion.
Another possibility is that you're passing around the ADO.NET objects and not consistently disposing of them and managing their scope. Maybe they're cached in the request context or some such? (A sketch of the per-request pattern follows.)
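To make that concrete, a sketch of the safe pattern versus the shared-state anti-pattern (table and column names are illustrative):

using System.Data.SqlClient;

public class UserRepository
{
    // DON'T: a single SqlConnection in a static field (or Application/Session
    // state) shared across requests -- ADO.NET objects are not thread safe.
    // private static readonly SqlConnection Shared = new SqlConnection("...");

    private readonly string _connectionString;

    public UserRepository(string connectionString)
    {
        _connectionString = connectionString;
    }

    // DO: open a connection per operation and dispose it immediately;
    // ADO.NET connection pooling makes this cheap.
    public string GetPasswordHash(string userName)
    {
        using (var conn = new SqlConnection(_connectionString))
        using (var cmd = new SqlCommand(
            "SELECT password FROM Users WHERE username = @u", conn))
        {
            cmd.Parameters.AddWithValue("@u", userName);
            conn.Open();
            return (string)cmd.ExecuteScalar();
        }
    }
}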
I know you don't want to hear this, but people smarter than I have said it (check out McConnell's Code Complete if you don't believe me):
It's probably your code, and your gut is probably correct:
My gut is telling me its ASP.NET - SQL
Server 2000 connection issues..
The errors being thrown are quite specific, and contextually they look like the app is just trying to connect and having a hard time -- which, if it only happens in the client's environment, could be indicative of a setting not set correctly for the VM to access TCP connections on the host machine (under a different instance).
Are you sure that none of your code has changed since before the move, and that your previous environment had logging like this enabled? It may have been happening (to a lesser degree) before, but your environment didn't catch it because you didn't have logging enabled.
If that's not the issue, and I'm reading your post correctly: You're running a server on a guest instance provided by the client on their pipe and bandwidth? If that's the case, then quite possibly (around the same time as that upgrade) some routing configuration was changed, or firewall changes were made, or whatever box the instance is on had some change made now that it handles your stuff differently.
If you can't reproduce it in your environment, and you are 100% certain that it isn't your code; then logically it can only be their environment that is the issue.
Lads, just as an update: it turned out that the problem was VMware-related under heavy usage - what a fun week! We're changing the code around to suit the VMware environment and we've seen some improvement already.
Thanks for the suggestions, I appreciate it.
