SQL Server disconnection causes connection pool issues - c#

I have a Windows service which receives messages via RabbitMQ; this triggers an event handler which does some work and then attempts to persist the result to the database. It's threaded using:
ThreadPool.QueueUserWorkItem(ProcessMessageOnThread, messageReceived);
where ProcessMessageOnThread is a method which does the work on messageReceived, a representation of the message dequeued from RabbitMQ.
Under normal circumstances the Windows service operates as expected, that is: dequeue, process and persist.
I want to ensure that all of my messages are processed and given a fair chance to be processed, so if I can't open a connection to SQL Server I simply requeue the message for it to be processed again (hopefully by then SQL Server will be back; otherwise this continues, and I'm fine with that).
The problem comes when the process has been running as expected for a period of time, the SQL Server connection pool has filled up, and then SQL Server is disconnected. This is when things get a bit unstable.
One of two things can happen:
An exception is thrown on connection.Open() - however I'm catching this and so not worried about it
An exception is thrown on cmd.ExecuteNonQuery() - which is where I'm executing a stored procedure
It is the second option that I need to figure out how to handle. Previously I assumed that any exception here meant that there was a problem with the data I was passing into the stored procedure, and that I should therefore just move it out of the queue and have something else analyse it.
However, now I think I need a new approach to handle the cases where the exception is to do with the connection not actually being established.
I've had a look at the SqlException class and noticed a property called Class, described as "Gets the severity level of the error returned from SQL Server". The documentation says:
Messages with a severity level of 10 or less are informational and indicate problems caused by mistakes in information that a user has entered. Severity levels from 11 through 16 are generated by the user, and can be corrected by the user. Severity levels from 17 through 25 indicate software or hardware errors. When a level 17, 18, or 19 error occurs, you can continue working, although you might not be able to execute a particular statement.
Does this mean that to fix my exception handling I can just check if (ex.Class > 16) and then requeue the message, because the problem is with the connection, and otherwise throw it away, as it is most likely to do with malformed data being sent to the stored procedure?
So the question is: how should I do exception handling, and how can I detect, when calling cmd.ExecuteNonQuery(), whether the exception thrown is because of a broken connection?
Update:
I've experienced problems previously with connections not being returned to the pool (this was due to threading issues) and have fixed those problems, so I'm confident the issue isn't to do with connections not going back into the pool. Also, the logic around what the connections are being used for is simple, and I'm ensuring they are closed consistently... so I'm more interested in answers to do with the disconnection of the SQL Server and then capturing the behaviour of cmd.ExecuteNonQuery().
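To make this concrete, the handling I have in mind looks roughly like this (just a sketch; the procedure name, connectionString, RequeueMessage and DiscardMessage are placeholders for my own plumbing):
try
{
    using (var connection = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.PersistResult", connection))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        // parameters built from messageReceived go here
        connection.Open();
        cmd.ExecuteNonQuery();
    }
}
catch (SqlException ex)
{
    if (ex.Class > 16)
    {
        // Severity 17+ indicates a resource/software/hardware problem,
        // so assume the connection is at fault and requeue the message.
        RequeueMessage(messageReceived);
    }
    else
    {
        // Severity 16 or below: most likely bad data; hand it off for analysis.
        DiscardMessage(messageReceived);
    }
}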

Connections in the connection pool can get into a weird state for various reasons, all of which have to do with poor application design:
Closing the connection before its associated data reader is closed
Changing a setting (like the transaction isolation level) that the pool does not reset
Starting an asynchronous query (e.g. BeginExecuteReader) and then returning the connection to the pool before the asynchronous handler fires
You should investigate your application and make sure connections are properly returned to the pool. One thing that can help debugging is reducing the size of the connection pool in a development setting. You change the size of the pool in the connection string:
...;Integrated Security=SSPI;Max Pool Size=2;Pooling=True;
This makes pooling issues much easier to reproduce.
If you can't find the cause, but still need to deploy a fix, you could use one of ClearPool or ClearAllPools. A good place to do that is when you detect one of the suspicious exceptions after Open() or ExecuteNonQuery(). Both are static methods on the SqlConnection class:
SqlConnection.ClearPool(yourConnection);
Or for an even rougher approach:
SqlConnection.ClearAllPools()
Note that this is basically Pokémon Exception Handling. If it works, you'll have no idea why. :)
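A sketch of where that could go, combining the severity check from the question with the pool clearing (treat the threshold as a heuristic, not a guarantee):
try
{
    connection.Open();
    cmd.ExecuteNonQuery();
}
catch (SqlException ex)
{
    if (ex.Class > 16)
    {
        // Suspicious failure: evict all pooled connections that share
        // this connection's connection string before retrying/requeueing.
        SqlConnection.ClearPool(connection);
    }
    throw;
}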

Related

SqlException 'timeout expired' when executing nonquery, but data are inserted

I'm hitting weird behavior in a load-testing scenario: the backend (SQL Server 2012) is getting overloaded and some of the commands time out (this is still expected, as the backend server is half-intentionally slow hardware); but our platform regularly (with increasing delay) retries the timed-out operation, and after a few retries it suddenly starts receiving a 'cannot insert duplicate key' SqlException.
I verified that only a single row with a specific unique key is generated and attempted to be inserted (the first insert and all possible retries always happen on the same thread).
I also altered the SP so that it uses an explicit transaction:
BEGIN TRY
    BEGIN TRANSACTION;
    -- Insert into table A
    -- Insert into table B
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Roll back only if a transaction is still open, then rethrow.
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;
END CATCH
Yet the issue is still occurring.
Are there any ideas why this could be happening?
How can I find out where the timeout is coming from (backend vs. client side)?
Is there a way to make sure that the operation either successfully finishes or fails (basically a transaction, but probably from client-side code)?
EDIT01:
I believe one way of solving this is to leverage ADO.NET's integration with SQL Server distributed transactions, e.g.:
using (TransactionScope scope = new TransactionScope())
{
    // Perform the SQL commands here.
    // If the statements above throw (e.g. due to a timeout), Complete() is never
    // called, so the transaction is not committed and will be rolled back.
    scope.Complete();
}
HOWEVER: I agree that it only adds complexity and actually might still be subject to the same problem (the Two Generals Problem, as outlined by usr).
The best approach is therefore likely to code the client and server side to account for this possibility, again as noted by usr in his answer.
This is expected behavior. When the communication between the client and the server is interrupted the client does not know the result of the operation. It might never have been sent, or it was sent but not received, or it was received but failed, or it was received but the success response did not come through.
This is the Two Generals Problem. It is unsolvable (when defining it strictly).
You must work around it. Either check for existence before insert or handle the duplicate key exception.
Or, simply increase the timeout. It does not do you any good to abort an otherwise working command that would have succeeded eventually. Aborting and restarting it does not make it go faster (except by coincidence). Timeouts are mostly useful for network errors or run-away queries (bugs).
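A sketch of the duplicate-key workaround, assuming the violation surfaces as SQL Server error numbers 2627 (unique constraint) or 2601 (unique index); InsertRow stands in for the actual command:
try
{
    InsertRow(); // may be a retry of an insert that actually committed
}
catch (SqlException ex)
{
    // On a retry, a duplicate key error just means the earlier attempt
    // succeeded before the connection was interrupted; treat it as success.
    bool duplicateKey = ex.Number == 2627 || ex.Number == 2601;
    if (!duplicateKey)
        throw;
}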
Per the documentation for the SqlException class:
The exception that is thrown when SQL Server returns a warning or
error. This class cannot be inherited.
My experience is that if you got a SqlException then it came from SQL Server.
Timeout is a SQL setting; you can set it in SSMS.
OK, I now believe usr (+1) that this comes from the client side:
SqlCommand.CommandTimeout
Gets or sets the wait time before terminating the attempt to execute a
command and generating an error.
That is just not how I would want it to be implemented.
You could have a runaway query, lose the connection, and SQL Server would just keep on running it.
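For reference, that client-side timeout is configured per command; a minimal sketch (the procedure name and value are arbitrary):
using (var cmd = new SqlCommand("dbo.MyProc", connection))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.CommandTimeout = 120; // seconds; the default is 30, and 0 means wait indefinitely
    cmd.ExecuteNonQuery();
}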

Cancel background worker which is calling external process

I have created a Telnet server for a project I need to do, which is working fine. However, when a client connects to the server it needs to connect to a database; again, this works fine when the connection information is correct and/or calls to the database do not take too long.
If the database call takes a long time (usually due to incorrect credentials or a badly optimised stored procedure) the server will crash with a Windows error message (i.e. not debuggable), which I understand is the underlying TCP system kicking in, and that is fine. To work around this I am putting all the database calls into BackgroundWorkers so the server (and clients) continue to work; however, I need to kill off this work if it is obviously taking too long.
I know about using BackgroundWorker.CancellationPending, but as this is a single method call to the database (via an external DLL), it will never get checked. The same issue applies to a self-made approach I have seen elsewhere. The other option I have seen is Thread.Abort(), but I also know that is unpredictable and unsafe, so it's probably best not to use it.
Does anyone have any suggestions how to accomplish this?
The problem here is that an external DLL is controlling the waiting. Normally, you could cancel ADO.NET connections or socket connections but this doesn't work here.
Two reliable approaches:
Move the connection into a child process that you can kill. Kill is safe (in contrast to Thread.Abort!) because all state of that process is gone at the same time.
Structure the application so that in case of cancellation the result of the connection attempt is just being ignored and that the app continues running something else. You just let the hanging connection attempt "dangle" in the background and throw away its result when it happens to return later.
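A minimal sketch of the second approach, assuming the blocking DLL call can be wrapped in a Task (QueryDatabase, UseResult, LogError and credentials are placeholders):
// Run the blocking call on the thread pool and wait a bounded time for it.
Task<object> dbCall = Task.Run(() => QueryDatabase(credentials));

try
{
    if (dbCall.Wait(TimeSpan.FromSeconds(10)))
    {
        UseResult(dbCall.Result);
    }
    else
    {
        // Timed out: let the call dangle in the background, but observe any
        // exception it eventually throws so it cannot take down the process.
        dbCall.ContinueWith(t => { var ignored = t.Exception; },
            TaskContinuationOptions.OnlyOnFaulted);
        // Continue serving the client without the database result.
    }
}
catch (AggregateException ex)
{
    LogError(ex.InnerException); // the call failed quickly (bad credentials etc.)
}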

SSAS Cube processing failed. returning trace information takes forever

I'm using a small C# program to process SSAS databases using the C# API. The problem is that one of the cubes hangs during processing, and Database.Process() is a blocking call, meaning that the process hangs until killed through Process Explorer.
If I do the processing manually, through SQL Server Management Studio, I get the following error: "Process failed. Trace information is still being transferred. If you do not want to wait for all of the information to arrive press Stop." The behavior is otherwise the same: after 4 hours, still no trace information. If I press Stop it tells me the connection was lost. But if the reason is that the connection was lost, how come Database.Process() doesn't throw an exception?
I have two questions:
How do I start debugging this?
Is there some way to programmatically (or otherwise) set a SSAS database to fail-fast without sending a seemingly infinite amount of trace information?
I am using SQL Server 2008R2 if it makes any difference.
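For context, the processing call in question looks roughly like this in AMO (a sketch; the server address and database name are placeholders):
using Microsoft.AnalysisServices;

// Connect to the SSAS instance and process one database synchronously.
Server server = new Server();
server.Connect("Data Source=localhost");
Database database = server.Databases.FindByName("MyCube");

// Blocks until processing completes - or, in the problem case, forever.
database.Process(ProcessType.ProcessFull);

server.Disconnect();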
One way of debugging Analysis Services issues is using SQL Server Profiler. You can find it in the Start menu under "Microsoft SQL Server 2008/Performance Tools/SQL Server Profiler". Launch the tool, click "New Trace", select an Analysis Services connection, leave the event selection etc. at the default settings for now, and then click "Run". You will see several events traced off the server to which you connected.
If you then start the processing from your program, you should be able to see it appearing as several events, and hopefully the error will appear in the trace somewhere. Hopefully you have a server without too much concurrent activity; otherwise, you will have to limit the tracing to specific events before starting it.

Handling Internet Connection Hiccups and Database Connections

I realize that there is no way to atomically guarantee:
if(hasInternet)
doDatabaseCall();
However, what is the correct way of handling connection problems when dealing with DbConnection and DbCommand objects in the .NET world? I'm specifically interested in the MySqlConnection and MySqlCommand objects but assume (hope) its pattern of exceptions is the same as SQL Server's classes.
I'd assume that if the internet goes down before calling, conn.Open(), a MySqlException gets raised. I'd also assume the same happens if you call ExecuteReader or ExecuteNonQuery and the internet has gone down.
I'm not sure, because the docs for the MySql objects don't say. The SQL Server objects just say that they might raise a SqlException, which means:
An exception occurred while executing the command against a locked row. This exception is not generated when you are using Microsoft .NET Framework version 1.0.
That doesn't seem to cover connection issues... What I'd like to do is handle the exception, wait for some amount of time, and start over again. My application's sole purpose is to execute these database calls, and it's running on a dedicated system, so retrying forever really is the best option, I believe. That said, I would love to differentiate between connection exceptions and other kinds of database exceptions; is that possible?
I've done some testing and it appears to work as I assume, but does it work in all edge cases? Such as: the command was successfully sent to the database server, but the connection goes down before or while the results are being returned? If it doesn't work in all edge cases then I'm going to have to execute a command, query for the desired state change, execute the next command, and so on. It's important that each command goes through.
I am connecting to a port on localhost that is forwarded via SSH to a remote server, if that makes a difference.
As for the SQL Server data provider:
The SqlException exception has several properties that give you detailed information about why your operation failed.
For your use case the Class property might be a good choice. It's a byte indicating the severity of the exception.
See: http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlexception.class.aspx
If that is not specific enough, you can examine the individual errors in the Errors collection.
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlexception.errors.aspx
Based on that information you can decide whether to retry.
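A sketch of a retry loop built on that information (the severity threshold and delay are assumptions to tune):
while (true)
{
    try
    {
        using (var connection = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(commandText, connection))
        {
            connection.Open();
            cmd.ExecuteNonQuery();
        }
        break; // success
    }
    catch (SqlException ex)
    {
        // Severity 17+ points at resource/connection trouble worth retrying;
        // anything lower is treated as a data problem and rethrown.
        if (ex.Class < 17)
            throw;

        Thread.Sleep(TimeSpan.FromSeconds(30)); // back off, then start over
    }
}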

Why so many sp_resetconnections for C# connection pooling?

We have a web service coded in C# that makes many calls to a MS SQL Server 2005 database. The code uses using blocks combined with C#'s connection pooling.
During a SQL trace, we saw many, many calls to sp_resetconnection. Most of these are short (< 0.5 sec); however, sometimes we get calls lasting as much as 9 seconds.
From what I've read sp_resetconnection is related to connection pooling and basically resets the state of an open connection. My questions:
Why does an open connection need its state reset?
Why so many of these calls?
What could cause a call to sp_resetconnection to take a non-trivial amount of time?
This is quite the mystery to me, and I appreciate any and all help!
The reset simply resets things so that you don't have to reconnect to reset them. It wipes the connection clean of things like SET or USE operations so each query has a clean slate.
The connection is still being reused. Here's an extensive list:
sp_reset_connection resets the following aspects of a connection:
It resets all error states and numbers (like @@error)
It stops all EC's (execution contexts) that are child threads of a parent EC executing a parallel query
It will wait for any outstanding I/O operations to complete
It will free any held buffers on the server by the connection
It will unlock any buffer resources that are used by the connection
It will release all memory owned by the connection
It will clear any work or temporary tables that are created by the connection
It will kill all global cursors owned by the connection
It will close any open SQL-XML handles that are open
It will delete any open SQL-XML related work tables
It will close all system tables
It will close all user tables
It will drop all temporary objects
It will abort open transactions
It will defect from a distributed transaction when enlisted
It will decrement the reference count for users in the current database, which releases the shared database lock
It will free acquired locks
It will release any handles that may have been acquired
It will reset all SET options to the default values
It will reset the @@rowcount value
It will reset the @@identity value
It will reset any session-level trace options set using dbcc traceon()
sp_reset_connection will NOT reset:
Security context, which is why connection pooling matches connections based on the exact connection string
If you entered an application role using sp_setapprole, since application roles can not be reverted
The transaction isolation level(!)
Here's an explanation of What does sp_reset_connection do? which says, in part "Data access API's layers like ODBC, OLE-DB and SqlClient call the (internal) stored procedure sp_reset_connection when re-using a connection from a connection pool. It does this to reset the state of the connection before it gets re-used." Then it gives some specifics of what that system sproc does. It's a good thing.
sp_resetconnection will get called every time you request a new connection from a pool.
It has to do this since the pool cannot guarantee that the user (you, the programmer, probably :)
has left the connection in a proper state; e.g. returning an old connection with uncommitted transactions would be... bad.
The number of calls should be related to the number of times you fetch a new connection.
As for some calls taking a non-trivial amount of time, I'm not sure. It could be that the server is just very busy processing other stuff at that time, or it could be network delays.
Basically the calls are there to clear out state information. If you have ANY open DataReaders it will take a LOT longer to occur. This is because your DataReaders are only holding a single row but could pull more rows; each of them has to be cleared as well before the reset can proceed. So make sure you have everything in using() statements and are not leaving things open in some of your statements.
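As a sketch of the pattern that keeps readers from being left open (the query is made up):
using (var connection = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT Id, Name FROM dbo.Customers", connection))
{
    connection.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // consume every row here
        }
    } // the reader is disposed before the connection returns to the pool
}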
How many total connections do you have running when this happens?
If you have a max of 5 and you hit all 5, then calling a reset will block, and it will appear to take a long time. It really doesn't; it is just blocked waiting on a pooled connection to become available.
Also if you are running on SQL Express you can get blocked due to threading requirements very easily (could also happen in full SQL Server, but much less likely).
What happens if you turn off connection pooling?
