How can I prevent 'System.Transactions.TransactionException' error when using NServiceBus - c#

My program makes use of NServiceBus as the service bus.
Now, when I run some part of my program, it fires a command to initiate a process. This process involves data look-up from a database (with a lot of data) by 3 separate handlers (classes) in the program. So, in some sense, they are happening in parallel: all 3 classes receive the same command and then start work.
Searching through similar posts on Stack Overflow, I've come across a number of suggestions, including increasing the timeout in both the app config and machine.config. I've done this to no avail.
This post made me realise it could be an issue with NServiceBus and MSDTC.
I've also attached visual studio debugger to the program process and witnessed the exception taking place at every point where I'm querying a repository class - which queries the database.
System.Transactions.TransactionException occurred
HResult=-2146233087
Message=The operation is not valid for the state of the transaction.
Source=System.Transactions
StackTrace:
at System.Transactions.TransactionState.EnlistVolatile(InternalTransaction tx, IEnlistmentNotification enlistmentNotification, EnlistmentOptions enlistmentOptions, Transaction atomicTransaction)
InnerException:
I'm tempted to just put a try/catch everywhere. But that's me getting desperate, and it would mean ignoring a lot of data.
Please, any ideas?
All responses will be appreciated.

Related

EF and Async - Weird live scenario

I'm implementing async all over my cloud-based project.
I'm now trying to figure out why my TransactionScope keeps crashing randomly. The messages are "Cannot perform this operation because there are pending operations", "The operation is not valid for the state of the transaction", and other similar ones.
I say that it crashes randomly, because if you retry the operation, it works eventually...
At first I implemented the TransactionScopeAsyncFlowOption.Enabled overload... the fail ratio decreased.
Then I made the whole operation use the same DbContext (the previous developers created a new one for each CRUD operation: selecting a user? New context for you! Now you want the sales of that user? Let me get that using a new context! Create a new sale? Let's do that on this new context here... and so on). The fail ratio declined even further.
Then I decided to await as soon as possible (previously I was firing some queries at the start of the operation and only awaiting right before using the result). That reduced the fail ratio significantly.
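For reference, the async-flow fix mentioned above looks roughly like this (a sketch only; `context` and `userId` are hypothetical names, and TransactionScopeAsyncFlowOption requires .NET 4.5.1 or later):

```csharp
// Without TransactionScopeAsyncFlowOption.Enabled, the ambient transaction
// does not flow across await points, which can produce errors like
// "The operation is not valid for the state of the transaction".
using (var scope = new TransactionScope(
    TransactionScopeOption.Required,
    TransactionScopeAsyncFlowOption.Enabled))
{
    // Await as early as possible so all work completes inside the scope.
    var user = await context.Users.FirstAsync(u => u.Id == userId);
    user.LastSeen = DateTime.UtcNow;
    await context.SaveChangesAsync();

    scope.Complete();
}
```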
Now, I got a message in my logs that signaled an FK mismatch... That's really weird, because this is a very solid app and getting FK logic wrong is a very basic mistake. Looking at the log, I see something like "Error for client CLIENT_A. Complete message: (bla bla bla) The conflict occurred in database db_CLIENT_B"!!!
In my multi-tenant app, each tenant has its own database, so CLIENT_A should have problems only with db_CLIENT_A. We are very meticulous about this.
That is a really serious problem. It means that either the Unity container is giving out the wrong instance of DbContext (it's configured for a single instance per request), or there's a serious problem with the async/await and parallel, distinct operations... I think it could be a mix, considering that neither DbContext nor Resolve is thread-safe, although Resolve is getting called only once (the resolve for DbContext happens very early in the pipeline).
So my question is: what can I do to figure this out?
PS: In the last 7 days, I have 5 logs of this. It might have happened more times (the switching), but if the other database has a compatible FK... well, I will hear about that in a couple of days when the managers start emitting financial reports...
This turned out to be caused by Unity. It happens when I call 'Resolve' within an async scope.

SSAS Cube processing failed. returning trace information takes forever

I'm using a small c# program to process SSAS databases using the C# API. The problem is that one of the cubes hangs under processing, and Database.Process() is a blocking call, meaning that the process hangs until killed through Process Explorer.
If I do the processing manually through SQL Server Management Studio, I get the following error: "Process failed. Trace information is still being transferred. If you do not want to wait for all of the information to arrive press Stop.", but the behavior is otherwise the same. After 4 hours, there is still no trace information. If I press Stop, it tells me the connection was lost. But if the reason is that the connection was lost, how come Database.Process() doesn't throw an exception?
I have two questions:
How do I start debugging this?
Is there some way to programmatically (or otherwise) set a SSAS database to fail-fast without sending a seemingly infinite amount of trace information?
I am using SQL Server 2008R2 if it makes any difference.
One way of debugging Analysis Services issues is using SQL Server Profiler. You can find it in the Start menu under "Microsoft SQL Server 2008/Performance Tools/SQL Server Profiler". Launch the tool, click "New Trace", select an Analysis Services connection, leave the event selection etc. at their default settings for now, and then click "Run". You will see several events traced from the server you connected to.
If you then start the processing from your program, you should see it appear as several events, and hopefully the error will show up in the trace somewhere. Hopefully your server doesn't have too many concurrent actions; otherwise, you will have to limit the tracing to specific events before starting it.

SQL Server disconnection causes connection pool issues

I have a Windows service which receives messages via RabbitMQ; this triggers an event handler which does some work and then attempts to persist the result to the database. It's threaded using:
ThreadPool.QueueUserWorkItem(ProcessMessageOnThread, messageReceived);
where ProcessMessageOnThread is a method which does the work on the messageReceived which is a representation of the message dequeued from RabbitMQ.
Under normal circumstances the Windows service operates as expected: dequeue, process, and persist.
I want to ensure that all of my messages are processed and given a fair chance to be processed, so if I can't open a connection to SQL Server I simply requeue the message to be processed again (hopefully by then SQL Server will be back; otherwise this repeats, and I'm fine with that).
The problem comes when the process has been running as expected for a period of time, the SQL Server connection pool has filled up, and then SQL Server is disconnected. This is when things get a bit unstable.
One of two things can happen:
An exception is thrown on connection.Open() - however I'm catching this and so not worried about it
An exception is thrown on cmd.ExecuteNonQuery() - which is where I'm executing a stored procedure
It is the second option that I need to figure out how to handle. Previously I assumed that any exception here meant there was a problem with the data I was passing into the stored procedure, and that the message should therefore just be moved out of the queue for something else to analyse.
However, now I think I need a new approach to handle the cases where the exception is to do with the connection not actually being established.
I've had a look at the SqlException class and noticed a property called Class, described as "Gets the severity level of the error returned from SQL Server". The documentation says:
Messages with a severity level of 10 or less are informational and indicate problems caused by mistakes in information that a user has entered. Severity levels from 11 through 16 are generated by the user, and can be corrected by the user. Severity levels from 17 through 25 indicate software or hardware errors. When a level 17, 18, or 19 error occurs, you can continue working, although you might not be able to execute a particular statement.
Does this mean that to fix my exception handling I can just check if (ex.Class > 16), then requeue the message because the problem is with the connection, and otherwise throw it away as most likely caused by malformed data being sent to the stored procedure?
So the question is: how should I do the exception handling, and how can I detect, when calling cmd.ExecuteNonQuery(), whether the exception thrown is because of a disconnected connection?
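One way to act on that severity check (a sketch only; RequeueMessage and MoveToErrorQueue are hypothetical helpers, and the severity-17 threshold is taken from the documentation quoted above):

```csharp
try
{
    cmd.ExecuteNonQuery();
}
catch (SqlException ex)
{
    if (ex.Class >= 17)
    {
        // Severity 17+ indicates a software/hardware problem
        // (e.g. the server is unreachable): requeue and retry later.
        RequeueMessage(messageReceived);
    }
    else
    {
        // Severity 16 or below: most likely a problem with the data
        // passed to the stored procedure, so set it aside for analysis.
        MoveToErrorQueue(messageReceived, ex);
    }
}
```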
Update:
I've experienced problems previously with connections not being returned to the pool (this was due to threading issues) and have fixed them, so I'm confident the issue isn't connections failing to go back into the pool. Also, the logic around what the connections are used for is very simple, and I'm ensuring they are closed consistently... so I'm more interested in answers about the disconnection of the SQL Server and capturing the behaviour of cmd.ExecuteNonQuery().
Connections in the connection pool can get into a weird state for various reasons, all of which have to do with poor application design:
Closing the connection before its associated data reader
Changing a setting (like the transaction isolation level) that the pool does not reset
Starting an asynchronous query (BeginExecuteReader) and then returning the connection to the pool before the asynchronous handler fires
You should investigate your application and make sure connections are properly returned to the pool. One thing that can help with debugging is reducing the size of the connection pool in a development setting. You change the size of the pool in the connection string:
...;Integrated Security=SSPI;Max Pool Size=2;Pooling=True;
This makes pooling issues much easier to reproduce.
If you can't find the cause, but still need to deploy a fix, you could use one of ClearPool or ClearAllPools. A good place to do that is when you detect one of the suspicious exceptions after Open() or ExecuteNonQuery(). Both are static methods on the SqlConnection class:
SqlConnection.ClearPool(yourConnection);
Or for an even rougher approach:
SqlConnection.ClearAllPools()
Note that this is basically Pokémon Exception Handling. If it works, you'll have no idea why. :)
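A sketch of how the pool-clearing workaround might be wired in (which exceptions count as "suspicious" is an assumption you'd tune for your environment):

```csharp
try
{
    connection.Open();
    cmd.ExecuteNonQuery();
}
catch (InvalidOperationException)
{
    // A pooled connection in a bad state often surfaces here.
    // Evict all pooled connections for this connection string,
    // then rethrow so the caller can retry on a fresh connection.
    SqlConnection.ClearPool(connection);
    throw;
}
```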

Handling Internet Connection Hiccups and Database Connections

I realize that there is no way to atomically guarantee:
if(hasInternet)
doDatabaseCall();
However, what is the correct way of handling connection problems when dealing with DbConnection and DbCommand objects in the .NET world? I'm specifically interested in the MySqlConnection and MySqlCommand objects, but I assume (hope) their pattern of exceptions is the same as SQL Server's classes.
I'd assume that if the internet goes down before calling, conn.Open(), a MySqlException gets raised. I'd also assume the same happens if you call ExecuteReader or ExecuteNonQuery and the internet has gone down.
I'm not sure because the docs for the MySql objects don't say. The SqlServer objects just say that it might raise a SqlException which means:
An exception occurred while executing the command against a locked row. This exception is not generated when you are using Microsoft .NET Framework version 1.0.
That doesn't seem to cover connection issues... What I'd like to do is handle the exception, wait for some amount of time, and start over again. My application's sole purpose is to execute these database calls, and it's running on a dedicated system, so retrying forever really is the best option, I believe. That said, I would love to differentiate between connection exceptions and other kinds of database exceptions; is that possible?
I've done some testing and it appears to work as I assume, but does it work in all edge cases? Such as: the command was successfully sent to the database server, but the connection goes down before or while the results are being returned? If it doesn't work in all edge cases, then I'm going to have to execute a command, query for the desired state change, execute the next command, and so on. It's important that each command goes through.
I am connecting to a port on localhost that is forwarded via SSH to a remote server if that makes a difference.
As for the SqlDataProvider:
The SqlException exception has several properties that give you detailed information about why your operation failed.
For your use case, the "Class" property might be a good choice. It's a byte indicating the severity of the exception.
See: http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlexception.class.aspx
If that is not specific enough, you can examine the individual errors in the Errors collection.
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlexception.errors.aspx
Based on that information you can decide whether to retry.
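Putting that together, a retry loop for the original question might look like this (a sketch only; ExecuteCommand is a hypothetical method that opens the connection and runs the command, and the 30-second delay is arbitrary):

```csharp
while (true)
{
    try
    {
        ExecuteCommand();   // hypothetical: open connection, run command
        break;              // success: stop retrying
    }
    catch (SqlException ex)
    {
        if (ex.Class <= 16)
            throw;          // likely a data problem: retrying won't help

        // Severity 17+: connection/server trouble. Wait, then retry
        // indefinitely, which the question says is acceptable here.
        Thread.Sleep(TimeSpan.FromSeconds(30));
    }
}
```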

Handling rude application aborts in .NET

I know I'm opening myself to a royal flaming by even asking this, but I thought I would see if StackOverflow has any solutions to a problem that I'm having...
I have a C# application that is failing at a client site in a way that I am unable to reproduce locally. Unfortunately, it is very difficult (impossible) for me to get any information that helps at all in isolating the source of the problem.
I have in place a rather extensive error monitoring framework which is watching for unhandled exceptions in all the usual places:
Backstop exception handler in threads I control
Application.ThreadException for WinForms exceptions
AppDomain.CurrentDomain.UnhandledException
These handlers log detailed information in a place where I have access to it.
This has been very useful in the past for identifying issues in production code, but it has not been giving me any information at all about the current series of issues.
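For reference, wiring up the three backstops listed above looks roughly like this in a WinForms app (a sketch only; MainForm and Log are hypothetical):

```csharp
[STAThread]
static void Main()
{
    // WinForms UI-thread exceptions.
    Application.ThreadException +=
        (s, e) => Log("UI thread", e.Exception);

    // Anything that escapes other threads; the process still dies afterwards.
    AppDomain.CurrentDomain.UnhandledException +=
        (s, e) => Log("AppDomain", e.ExceptionObject as Exception);

    Application.SetUnhandledExceptionMode(UnhandledExceptionMode.CatchException);
    Application.Run(new MainForm());
}

static void Log(string source, Exception ex)
{
    // Write to wherever the monitoring framework stores its logs.
}
```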
My best guess is that the core issue is one of the "rude" exception types (thread abort, out of memory, stack overflow, access violation, etc.) that escalate to a rude shutdown, ripping down the process before I have a chance to see what is going on.
Is there anything that I can be doing to snapshot information as my process is crashing that would be useful? Ideally, I would be able to write out my custom log format, but I would be happy if I could have a reliable way of ensuring that a crash dump is written somewhere.
I was hoping that I could implement a class deriving from CriticalFinalizerObject and have it spit out a last-chance error log when it is finalized, but that doesn't seem to be triggered in the StackOverflow scenario I tested.
I am unable to use Windows Error Reporting and friends due to the lack of a code signing certificate.
I'm not trying to "recover" from arbitrary exceptions, I'm just trying to make a note of what went wrong on the way down.
Any ideas?
You could try creating a minidump file. This is a C++ API, but it should be possible to write a small C++ program that starts your application, keeps a handle to the process, waits on the process handle, and then uses the handle to create a minidump when the application dies.
If you have done what you claim:
Try-Catch on the Application.Run
Unhandled Domain Exceptions
Unhandled Thread Exceptions
Try Catch handlers in all threads
Then you would have caught the exception, except perhaps if it is being thrown by a third-party or COM component.
You certainly haven't given enough information.
What events does the client say leads up to the exception?
What COM or third party components do you use? (Do you properly instance and reference these components? Do you pass valid arguments to COM function calls?)
Do you make use of any unmanaged or unsafe code?
Are you positive that you have all throw-capable calls covered with try-catch?
I'm just saying that no one can offer you any helpful advice unless you post a heck of a lot more information, and even then we can probably only speculate as to the source of your problem.
Have a set of fresh eyes look at your code.
Some errors cannot be caught by logging.
See this similar question for more details:
StackOverflowException in .NET
Here's a link explaining asynchronous exceptions (and why you can't recover from them):
http://www.bluebytesoftware.com/blog/PermaLink.aspx?guid=c1898a31-a0aa-40af-871c-7847d98f1641