We have a C# web application that uses SqlDependency to expire cached query data. Although everything is working, we are seeing a lot of errors, particularly in our production environment.
The first error message is this:
Service Broker needs to access the master key in the database 'SubscriberManager'. Error code:32. The master key has to exist and the service master key encryption is required.
It shows up in both the server event log and the SQL Server log files. I believe the actual content of the message is something of a red herring, as I have created a database master key.
I have also tried:
Recreating both the service master key and the database master key.
Making sure the database owner is sa.
Making sure the user account has permissions to create services, queues and procedures, and to subscribe to query notifications.
Making sure the broker is enabled.
While researching the issue I have seen other people with similar errors, but their error code almost always seems to be 25 or 26, not 32. I have been unable to find anything that tells me what these error codes mean, so I'm not sure of the significance.
At the same time, we also sometimes get an error from .NET saying the following:
System.InvalidOperationException: When using SqlDependency without providing an options value, SqlDependency.Start() must be called for each server that is being executed against
The applications do call SqlDependency.Start in the Application_Start event, though.
These errors are strangely intermittent, though: they do not happen every time a particular page is hit, or even every time a notification is created or triggered. I have tried attaching SQL Profiler and monitoring the broker events, and saw the notifications being created and triggered without any problems.
Finally, I am seeing a lot of errors like this:
The query notification dialog on conversation handle '{2FA2445B-1667-E311-943C-02C798B618C6}.' closed due to the following error: '-8490 Cannot find the remote service 'SqlQueryNotificationService-7303d251-1eb2-4f3a-9e08-d5d17c28b6cf' because it does not exist.'.
I understand that a certain number of these are normal, because SqlDependency.Stop doesn't clean everything up in the database, but we are seeing thousands of them on one of our production servers.
What is frustrating is that we have been using SQL notifications for several years without issue, so something must have changed in either our application or the server setup to cause this, but at this point I have no idea what.
The applications are .NET 4.0 MVC and WCF, running on Windows Server 2012 and calling SQL Server 2012 instances that also run on Windows Server 2012.
According to Microsoft, if SQL Server is short on resources, particularly memory, these errors can be generated.
Related
I have a SQL Server with two databases, a production database and a development database. The .NET 2.0 website hitting the production database with manual SqlConnection code is working fine. The other database is hit by a newer ASP.NET MVC app using Entity Framework 6.2, and that app is getting timeouts. The first request times out after 30 seconds, but the page comes back almost instantaneously on subsequent refreshes. Both websites are on the same box as the database, so they connect using just "localhost". They use SQL Server logins, not Windows authentication.
I copied the .edmx and .tt files into a .NET console app, and that app has no problem hitting the database with the exact same LINQ query and pulling the same data that fails in the web app.
I then created a new website and copied just that same code into an .aspx page. It fails the first time with a timeout and then works on subsequent attempts (and a week ago, the main dev site was doing the same thing).
I detached the dev database from the SQL Server 2008 R2 server, attached it to a newly installed SQL Server Express instance on a different port, and got the same results.
The web server is Windows Server 2008 Standard 32-bit. I copied both websites and the console application to a new box (I thought it was 2016, but it turns out to be 2008 Standard 64-bit) and got the same results.
The dev site was working up until a couple of months ago. The client was using local user accounts for everything, but had a domain and wanted to test Windows authentication for an old VB app that hits the same database, so I had started migrating test accounts to the domain. When the client later tried, for an unrelated reason, to change his password, we discovered that he was already using a domain account, but that his laptop could not connect to the domain. We found several other computers that could not connect, even though the machines I had joined to the domain during my testing were working fine. An outside network "friend" was brought in to figure out what was going on, and at that point I lost track of what was actually done. I know that various network and domain configurations were tried without fixing the domain issues, but I don't know which ones. However, the production site never stopped working.
I have no idea what is going on. Does anyone else?
Oh, and in case it was a provider issue, I've also tried a manual connection using OleDbConnection from the web app, and it also fails with the timeout.
Update:
I spun up a new Windows Server 2016 Datacenter box, installed IIS and .NET on it, and copied the website to that box. It has no problems hitting the database and pulling the data from the other server.
I know patches were applied to the original box while the domain and network were being manipulated, but I don't know how far behind it was. I suspect that some patch changed a default or inherited .NET configuration option. I did do a "repair" on the .NET installation, and that didn't make a difference. However, with the production site working fine, I'm not currently willing to uninstall .NET or anything else; I'm afraid I would risk pushing this same error onto the production site, and the client would be screwed.
It seems that for some reason, the timeout period elapsed while attempting to consume the pre-login handshake acknowledgement.
Try increasing the Connect Timeout property in your connection string to 60 or more. The default is 15 seconds.
Example: Data Source=(LocalDB)\v11.0;Integrated Security=True;Connect Timeout=60
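As a minimal sketch of that change, here is a hypothetical Python helper (not part of any library) that rewrites a connection string so it carries a given Connect Timeout. It assumes a simple key=value;key=value string with no quoted values containing ';' or '=', which a real connection-string parser would have to handle, and it treats "Connection Timeout" and "Timeout" as aliases of the same setting, as SqlClient does.

```python
# Keys SqlClient accepts for the connect timeout (all aliases of one setting).
TIMEOUT_KEYS = {"connect timeout", "connection timeout", "timeout"}


def with_connect_timeout(conn_str: str, seconds: int = 60) -> str:
    """Return conn_str with any existing timeout replaced by Connect Timeout=seconds.

    Assumption: plain key=value pairs separated by ';', no quoted values.
    """
    kept = []
    for part in conn_str.split(";"):
        if not part.strip():
            continue  # skip empty segments, e.g. from a trailing ';'
        key, _, value = part.partition("=")
        if key.strip().lower() in TIMEOUT_KEYS:
            continue  # drop the old timeout; a fresh one is appended below
        kept.append(f"{key.strip()}={value.strip()}")
    kept.append(f"Connect Timeout={seconds}")
    return ";".join(kept)


# Usage with the example string from the answer:
# with_connect_timeout(r"Data Source=(LocalDB)\v11.0;Integrated Security=True;Connect Timeout=30")
# -> r"Data Source=(LocalDB)\v11.0;Integrated Security=True;Connect Timeout=60"
```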
Very very strange issue here... Apologies in advance for the wall of text.
We have a suite of applications running on an EC2 instance, all connecting to an RDS instance.
We are hosting the staging and production applications on the same EC2 server.
With one of the applications, as soon as the staging app is moved to prod, 250 or more connections to the DB are opened, maxing out CPU usage on the RDS instance and slowing down the entire suite. The staging application itself does not have this issue.
The issue can be replicated both by deploying the app via our Octopus setup and by manually copying the BIN/Views folders from staging to live.
The connections open instantly, pushing CPU usage to 99% in less than a minute.
Things to note...
Running the query from "How to see active SQL Server connections?" shows the bulk connections, none of which has a LoginName.
Resource Monitor on the FE server lists the connections, all coming from IIS, seemingly cycling through the outbound ports while attempting to connect to the DB server on its port.
[Screenshot: a snippet of the connections, with the FE server address and DB server address blacked out]
The app needs users to log in to perform 99.9% of tasks. There is a public "Forgot your password" method that was updated to accept either a username or password. There was no change to the form structure or form action URL, just an extra check on the back end.
Other changes concerned how data is displayed and payment restrictions under certain conditions, both of which require a login.
Things I've tried...
New app pools
Just giving it a few days to forget this ever happened
Not using Octopus to publish
Checking all areas that were updated between versions to see if a connection was not closed properly.
I'm really at a loss as to what is happening. This is the first time I've seen something like this. It's especially strange that staging is fine, but the same app on another URL/connection string fails so badly.
The only thing I can think of is some kind of scraper polling the public form, but that makes no sense: why isn't it happening with the current app?
Is there something in AWS that can monitor the calls being made? I vaguely remember something in New Relic being able to do so.
Any suggestions and/or similar experiences are welcomed.
Edits.
Nothing unusual in the logs for the day of the issue (yesterday)
No incoming traffic to match all of the outbound requests
No initialisation is performed by the application on startup
Update...
We use ADO.NET for most of our queries. A query was updated to get data from different tables; the method name and parameters were not changed, just the body of the query. If I use sys.dm_exec_sql_text to see what is being sent to the DB, I can see that it IS the updated query being sent on each of the hundreds of connections. They are all showing as suspended, though. Nothing has changed in how that query is sent to the server, just the body of the query itself.
So, one of the other queries that was published in the update broke it. We reverted only that query and deployed a new version, and it is fine.
Strangely enough, it's one that runs, in one form or another, across the entire suite (which is why I assumed it would be the last place to look), but it just died under any load other than staging's.
(Sorry if this is a really long question, it said to be specific)
The company I work for has a number of sites, which have been running for some time with no problems. The applications are a mix of ASP.NET 2.0, 3.5, and 4.0, all using ADO.NET to connect to a SQL Server Standard instance (on the same web server), and all hosted on IIS7.
The problem began when we moved to an upgraded web server. We made every effort to set up the server, DB instance and IIS with exactly the same settings (except for the different machine name, and the fact that we had upgraded from SQL Express to Standard), and as far as we could tell, we did. Both servers are running Windows Server 2008 R2 (all current updates applied) and received a default install.
The problem is very apparent when starting up one of these applications. When you reach the login page of our application, the page itself loads extremely fast, even when you load it from a new machine that could not possibly have it cached, with IIS caching disabled. The problem becomes visible when you enter your login information and click the login button. Because of the (not great) design of our databases, the login process must access a number of databases: theoretically up to 150 separate DBs, but in practice usually 2. The problem occurs even when only 2 databases (the minimum) are opened. Not a great design, but we have to live with it for now.
When initially opening a connection to the database, the entire process stops for about 20 seconds every time, regardless of whether you are connecting to 2 DBs or 40. I have run a .NET profiler (JetBrains dotTrace) against the process, and the only information I could take from it was that one or all of the calls to SqlConnection.Open() accounted for 90% of the time. This only happens on first use of the application, but the problem is compounded by the fact that IIS seems to disregard the recycling settings we have set and recycles the application after a few minutes of idle time, causing the problem to occur again.
I also tried to use SQL Server Profiler to see which database operations were causing the slowdown, but because of all the other DB activity (and the fact that I had to do this on our production server, because the problem doesn't occur in our test environments), I couldn't pin down the exact operation causing the stoppage. I will try coming in late at night and shutting down the production sites to run the profiler, but I might not be able to do this right away.
In the course of researching the problem, I have tried a couple of solutions:
Thinking it might be a name resolution problem, I tried modifying the hosts file on the web server, and also giving the connection strings an IP address instead of a server name to resolve, with no difference. I have heard of the LLMNR protocol causing problems like this, but I think connecting by IP or resolving with the hosts file should have eliminated that possibility, though I admit I never tried actually turning off LLMNR.
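To put numbers on the name-resolution theory, one rough check is to time a lookup of the server name against a literal IP from the web server. A minimal Python sketch follows; it only times the OS resolver through getaddrinfo(), which approximates but does not reproduce what SqlClient does, and port 1433 is simply SQL Server's default.

```python
import socket
import time


def resolve_seconds(host: str, port: int = 1433) -> float:
    """Time one getaddrinfo() lookup of host, as the OS resolver sees it."""
    start = time.perf_counter()
    socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return time.perf_counter() - start


def compare(name: str, ip: str, runs: int = 3) -> dict:
    """Return the slowest of `runs` lookups for the server name and for its IP."""
    return {
        name: max(resolve_seconds(name) for _ in range(runs)),
        ip: max(resolve_seconds(ip) for _ in range(runs)),
    }
```

For example, compare("dbserver", "<server IP>") with your own (here hypothetical) server name and address: if the name consistently takes seconds while the IP takes microseconds, resolution is worth chasing further; if both are fast, the 20-second stall is more likely elsewhere.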
I have increased the idle timeouts, recycling intervals, etc. in IIS, but these don't even seem to be respected, much less solve the problem. This leads me to believe there is a setting on the machine overriding the IIS application settings.
Multiple other code fixes, none of which made any difference. Is a SQL Server setting causing the problem?
Other things that I've forgotten by now.
Any ideas, experience or whatevers would be greatly appreciated in helping me solve this problem!
I would advise using a non-TCP connection if you are still running the SQL instance on the local machine. SQL Server supports several protocols; TCP, named pipes, and shared memory are the most common.
Named Pipes
Data Source=np:computer\instance
Shared Memory
Data Source=lpc:computer\instance
Personally, I prefer shared memory. Remember that you need to enable these protocols, and to avoid configuration mistakes I suggest you disable any you are not using.
See http://msdn.microsoft.com/en-us/library/ms187892.aspx
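A minimal sketch of how those prefixes slot into the Data Source value, in Python. data_source() is a hypothetical helper, and its local-name check is an assumption that only covers common spellings of the local machine ('.', '(local)', 'localhost', the host name). Shared memory only works when the client and SQL Server run on the same machine, which is why the sketch refuses lpc: for anything else.

```python
import socket

# Protocol prefixes for the SqlClient "Data Source" value, as listed above.
PREFIXES = {"tcp": "tcp:", "np": "np:", "lpc": "lpc:"}


def data_source(server: str, protocol: str) -> str:
    """Build a Data Source entry, e.g. data_source(r"localhost\\SQLEXPRESS", "lpc")."""
    if protocol not in PREFIXES:
        raise ValueError(f"unknown protocol {protocol!r}")
    host = server.split("\\", 1)[0]  # machine part of machine\instance
    # Assumption: these are the only spellings of "this machine" we care about.
    local_names = {".", "(local)", "localhost", socket.gethostname().lower()}
    # Shared memory only works when client and server are on the same machine.
    if protocol == "lpc" and host.lower() not in local_names:
        raise ValueError("shared memory (lpc:) requires a local server")
    return f"Data Source={PREFIXES[protocol]}{server}"
```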
IIS Reset
In IIS7 there are two ways to configure the idle timeout. Both begin by clicking the "Application Pools" section and right-clicking the appropriate application pool. If you click the "Recycling..." option, there is one setting. The other is in "Advanced Settings...": under the "Process Model" section you will find "Idle Time-out (minutes)", which disables the process timeout when set to zero. This latter option is the one that works for us.
If I were you, I'd solve this problem first, as restarting the AppDomain and/or worker process is always painful, even without a 20-second lag.
Some ideas:
From the web server, can you ping the DB server and get a "normal" response, or are you seeing a similar delay?
If you're seeing a delay, run a tracert to see if you can nail down where the slowness is occurring.
Try using a tool like QueryExpress (http://www.albahari.com/queryexpress.aspx), which doesn't require an install to run. You can download the EXE and run it from your web server. See if you can connect to your DB with it and run queries in a normal fashion.
Try something like Sysinternals' TCPView (http://technet.microsoft.com/en-us/sysinternals/bb897437) to look at your open connections, see what activity is happening on your server, and see how much data is being sent to and received from your DB server.
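Ping uses ICMP, which is sometimes blocked or treated differently from database traffic, so a complementary check is to time a plain TCP connect to the SQL Server port from the web server. A minimal Python sketch; the host name and port 1433 (SQL Server's default) are placeholders, and it measures name resolution plus the TCP handshake only, not the SQL Server login that follows.

```python
import socket
import time


def tcp_connect_seconds(host: str, port: int = 1433, timeout: float = 30.0) -> float:
    """Time one TCP connect to host:port, closing the socket straight away."""
    start = time.perf_counter()
    # create_connection resolves the name, then connects; it raises OSError
    # (socket.timeout is a subclass) if no connection is made within `timeout`.
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.perf_counter() - start
```

Calling tcp_connect_seconds("dbserver") a few times in a row (with "dbserver" replaced by the name from your connection string) should return milliseconds on a healthy local or LAN connection; a consistent multi-second figure would point at the network or name resolution rather than at SQL Server itself.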
Just some initial thoughts on where I'd start to look based upon your problem description. I hope this helps. Good luck with things!
Regarding IIS not respecting the recycling settings: did restarting IIS or rebooting change the behavior?
We've recently developed an internal web application for our intranet on top of Microsoft's MVC Framework (v2). This seems to work wonderfully, but after some as yet unknown events, we're seeing a situation where database queries appear to yield no results, whilst no exceptions are caught, which has stumped me somewhat.
Republishing the application doesn't change the behaviour, but restarting the application pool will restore functionality completely.
For completeness, we're using a dedicated application pool for MVC applications, and connect to SQL Server using SQL Authentication, which carries on working from other hosts whilst the web server is having a sulk. There don't appear to be any exceptions thrown (we're not catching any, and the built-in unhandled exception magic isn't catching anything). We're not using LINQ to SQL in this instance, but rather the ADO.NET approach of SqlConnection/SqlCommand/stored procedures, which I'd normally expect to throw an exception if it failed to connect or a stored procedure failed. I've profiled the application and it doesn't appear to leak any resources either.
I think I've covered all the angles, but where else should I look to get into the forensics of finding the cause of the issue?
EDIT: I probably should mention that we're using NTLM authentication, and tricks like editing the web.config (to force the application to reload) have no effect; we have to recycle the whole application pool to fix it.
Have you tried using SQL profiler to monitor the database traffic during the anomaly?
We've been having intermittent problems causing users to be forcibly logged out of our application.
Our setup is an ASP.NET/C# web application on Windows Server 2003 Standard Edition with SQL Server 2000 on the back end. We recently performed a major product upgrade on our client's VMware server (we have a guest instance dedicated to us), and whereas we had none of these issues with the previous release, the added complexity of the new upgrade has caused a lot of issues. SQL Server 2000 (build 8.00.2039, SP4) and the IIS/ASP.NET (.NET v2.0.50727) application run on the same box and connect to each other via TCP/IP.
Primarily, the exceptions being thrown are:
System.IndexOutOfRangeException: Cannot find table 0.
System.ArgumentException: Column 'password' does not belong to table Table.
[This exception occurs in the log in script, even though there is clearly a password column available]
System.InvalidOperationException: There is already an open DataReader associated with this Command which must be closed first.
[This one is occurring very regularly]
System.InvalidOperationException: This SqlTransaction has completed; it is no longer usable.
System.ApplicationException: ExecuteReader requires an open and available Connection. The connection's current state is connecting.
System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
And just today, for the first time:
System.Web.UI.ViewStateException: Invalid viewstate.
We have load tested the app with the same number of concurrent users as on the production server and cannot reproduce these errors. They are very intermittent and occur even when there are only 8 to 10 user connections. My gut is telling me it's ASP.NET - SQL Server 2000 connection issues.
We've pretty much ruled out code-level Data Access Layer errors at this stage (we have a development team of 15 experienced developers working on this), so we think it's an issue specific to the production server environment.
The Invalid Viewstate error is pretty common on a high-traffic website. Still, if you recently moved to multiple web servers, make sure they share the same machine key so Viewstate is signed with the same key on all servers. http://www.codinghorror.com/blog/archives/000132.html
Based on the other errors, I'd guess that you are sharing connections across multiple threads. Are your connections stored in static variables, Application state, Session state, or any other object that's used across multiple requests? Maybe there's a hashtable somewhere containing connections, commands, or transactions. None of the ADO.NET objects are thread-safe, so make sure you only use each one from a single thread.
Another possibility is you're passing around the ADO.NET objects and not consistently disposing of them and managing their scope. Maybe they're cached in the request context or some such?
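To illustrate the one-connection-per-request pattern, here is a minimal Python sketch that uses the standard library's sqlite3 as a stand-in for ADO.NET; the table, data and helper names are hypothetical. Each simulated request opens its own connection, uses it, and closes it in a with block, rather than reaching for a shared connection stored in a static or application-level variable. sqlite3 happens to refuse cross-thread use of a connection by default, which the last function demonstrates; SqlClient does not check this up front, so a shared SqlConnection is more likely to surface as intermittent errors like the ones listed in the question.

```python
import os
import sqlite3
import tempfile
import threading
from contextlib import closing
from typing import List, Optional


def make_demo_db() -> str:
    """Create a throwaway SQLite file with a users table (stand-in for the real DB)."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    with closing(sqlite3.connect(path)) as conn:
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
        conn.commit()
    return path


def handle_request(db_path: str, user_id: int) -> Optional[str]:
    """One request = one private connection, opened late and always closed."""
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None


def run_concurrently(db_path: str, user_ids: List[int]) -> List[Optional[str]]:
    """Simulate concurrent requests, each on its own thread and connection."""
    results: List[Optional[str]] = [None] * len(user_ids)

    def worker(index: int, user_id: int) -> None:
        results[index] = handle_request(db_path, user_id)

    threads = [threading.Thread(target=worker, args=(i, u)) for i, u in enumerate(user_ids)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def shared_connection_is_rejected(db_path: str) -> bool:
    """Anti-pattern: one connection shared across threads (like a static field)."""
    shared = sqlite3.connect(db_path)
    errors: List[Exception] = []

    def worker() -> None:
        try:
            shared.execute("SELECT 1")
        except sqlite3.ProgrammingError as exc:  # sqlite3's same-thread check
            errors.append(exc)

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    shared.close()
    return bool(errors)
```

For example, run_concurrently(make_demo_db(), [1, 2, 3]) returns ['alice', 'bob', None]: every thread gets a correct answer because no connection, command or reader is ever touched by two threads.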
I know you don't want to hear this, but people smarter than I have said it (check out McConnell's Code Complete if you don't believe me):
It's probably your code, and your gut is probably correct:
My gut is telling me it's ASP.NET - SQL Server 2000 connection issues.
The errors being thrown are quite specific, and in context they look like they're just trying to connect and having a hard time. If that only happens in the client's environment, it could indicate a setting that isn't configured correctly for the VM to access TCP connections on the host machine (under a different instance).
Are you sure that none of your code has changed since before the move, and that your previous environment had logging like this enabled? It may have been happening (to a lesser degree) before, but your environment didn't catch it because you didn't have logging enabled.
If that's not the issue, and I'm reading your post correctly: You're running a server on a guest instance provided by the client on their pipe and bandwidth? If that's the case, then quite possibly (around the same time as that upgrade) some routing configuration was changed, or firewall changes were made, or whatever box the instance is on had some change made now that it handles your stuff differently.
If you can't reproduce it in your environment, and you are 100% certain that it isn't your code; then logically it can only be their environment that is the issue.
Lads, just as an update: it turned out that the problem was VMware-related under heavy usage (what a fun week!). We're changing the code around to suit the VMware environment, and we've seen some improvement already.
Thanks for the suggestions, I appreciate it.