How do you deal with transport-level errors in SqlConnection? - c#

Every now and then in a high volume .NET application, you might see this exception when you try to execute a query:
System.Data.SqlClient.SqlException: A transport-level error has
occurred when sending the request to the server.
According to my research, this is something that "just happens" and not much can be done to prevent it. It does not happen as a result of a bad query, and generally cannot be duplicated. It just crops up maybe once every few days in a busy OLTP system when the TCP connection to the database goes bad for some reason.
I am forced to detect this error by parsing the exception message and then retrying the entire operation from scratch, including opening a new connection. None of that is pretty.
Anybody have any alternate solutions?

I posted an answer on another question on another topic that might have some use here. That answer involved SMB connections, not SQL. However it was identical in that it involved a low-level transport error.
What we found was that in a heavy load situation, it was fairly easy for the remote server to time out connections at the TCP layer simply because the server was busy. Part of the reason was the defaults for how many times TCP will retransmit data on Windows weren't appropriate for our situation.
Take a look at the registry settings for tuning TCP/IP on Windows. In particular, you want to look at TcpMaxDataRetransmissions and maybe TcpMaxConnectRetransmissions. These default to 5 and 2 respectively; try raising them a little on the client system and then reproduce the load situation.
Don't go crazy! TCP doubles the timeout with each successive retransmission, so the timeout behavior for bad connections can go exponential on you if you increase these too much. As I recall upping TcpMaxDataRetransmissions to 6 or 7 solved our problem in the vast majority of cases.
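To get a feel for how quickly the doubling adds up, here is a small C# sketch. It assumes a simplified model in which the first timeout is a fixed value and each retransmission waits twice as long as the previous one. The real initial timeout depends on the measured round-trip time, so any concrete starting value is only an illustrative assumption. TcpBackoff and TotalWait are hypothetical names, not part of any Windows or .NET API.

```csharp
using System;

public static class TcpBackoff
{
    // Total time spent waiting before the connection is given up, under the
    // simplified model: one wait after the original send, then one doubled
    // wait after each retransmission.
    public static TimeSpan TotalWait(TimeSpan initialTimeout, int maxRetransmissions)
    {
        if (maxRetransmissions < 0)
            throw new ArgumentOutOfRangeException(nameof(maxRetransmissions));

        TimeSpan total = TimeSpan.Zero;
        TimeSpan current = initialTimeout;
        for (int i = 0; i <= maxRetransmissions; i++)
        {
            total += current;
            current += current; // the timeout doubles for the next attempt
        }
        return total;
    }
}
```

With an assumed 3-second initial timeout, TotalWait(TimeSpan.FromSeconds(3), 5) comes to 189 seconds, while a limit of 7 comes to 765 seconds. Each extra retransmission roughly doubles the worst-case wait, which is why a bump of one or two is usually as far as you want to go.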

This blog post by Michael Aspengren explains the error message "A transport-level error has occurred when sending the request to the server."

To answer your original question:
A more elegant way to detect this particular error, without parsing the error message, is to inspect the Number property of the SqlException.
(This actually returns the error number from the first SqlError in the Errors collection, but in your case the transport error should be the only one in the collection.)
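As a rough sketch of what checking the number could look like: the helper below classifies an error number as a transport-level failure. The specific numbers are an assumption on my part. They are Win32/Winsock codes that are commonly seen surfacing through SqlException.Number for connection failures, so log the Number you actually get and adjust the set. TransportErrors and IsTransportError are hypothetical names.

```csharp
using System.Collections.Generic;

public static class TransportErrors
{
    // Assumed set of connection-level error numbers; verify against your own logs.
    //    64  the specified network name is no longer available
    //   121  the semaphore timeout period has expired
    // 10053  connection aborted by software on the local machine
    // 10054  existing connection forcibly closed by the remote host
    // 10060  connection attempt timed out
    private static readonly HashSet<int> Numbers =
        new HashSet<int> { 64, 121, 10053, 10054, 10060 };

    // Pass SqlException.Number here instead of parsing the message text.
    public static bool IsTransportError(int sqlErrorNumber)
    {
        return Numbers.Contains(sqlErrorNumber);
    }
}
```

In the calling code this would be used as catch (SqlException ex) when (TransportErrors.IsTransportError(ex.Number)), which needs C# 6 for the exception filter; on older compilers, check inside the catch block and rethrow with throw; if the number does not match.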

I had the same problem, albeit with service requests to a SQL DB.
This is what I had in my service error log:
System.Data.SqlClient.SqlException: A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
I have a C# test suite that tests a service. The service and DB were both on external servers so I thought that might be the issue. So I deployed the service and DB locally to no avail. The issue continued. The test suite isn't even a hard pressing performance test at all, so I had no idea what was happening. The same test was failing each time, but when I disabled that test, another one would fail continuously.
I tried other methods suggested on the Internet that didn't work either:
Increase the registry values of TcpMaxDataRetransmissions and TcpMaxConnectRetransmissions.
Disable the "Shared Memory" option within SQL Server Configuration Manager under "Client Protocols" and sort TCP/IP to 1st in the list.
This might occur when you are testing scalability with a large number of client connection attempts. To resolve this issue, use the regedit.exe utility to add a new DWORD value named SynAttackProtect to the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\ with value data of 00000000.
My last resort was the age-old saying "Try and try again". So I have nested try-catch statements to ensure that if the TCP/IP connection is lost in the lower communications protocol, the code doesn't just give up there but tries again. This is now working for me; however, it's not a very elegant solution.

use Enterprise Services with transactional components

I have seen this happen in my own environment a number of times. The client application in this case is installed on many machines. Some of those machines are laptops: people were leaving the application open, disconnecting the laptop, then plugging it back in and attempting to use the application. That causes the error you have mentioned.
My first step would be to look at the network and make sure the servers aren't on DHCP and renewing IP addresses, which would cause this error. If that isn't the case, then you have to start trawling through your event logs looking for other network-related errors.
Unfortunately it is as stated above a network error. The main thing you can do is just monitor the connections using a tool like netmon and work back from there.
Good Luck.

You should also check hardware connectivity to the database.
Perhaps this thread will be helpful:
http://channel9.msdn.com/forums/TechOff/234271-Conenction-forcibly-closed-SQL-2005/

I'm using a reliability layer around my DB commands (abstracted away in the repository interface). Basically, that's just code that intercepts any expected exception (DbException and also InvalidOperationException, which happens to get thrown on connectivity issues), logs it, captures statistics and retries the whole operation.
With that reliability layer present, the service has been able to survive stress-testing gracefully (constant dead-locks, network failures etc). Production is far less hostile than that.
PS: There is more on that here (along with a simple way to define reliability with the interception DSL)
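For anyone who wants the general shape of such a layer without the interception DSL, here is a minimal sketch. It is not the poster's implementation: Reliability, Execute, and the parameter names are hypothetical, and the caller decides which exceptions count as transient (for example DbException and InvalidOperationException, as described above).

```csharp
using System;
using System.Threading;

public static class Reliability
{
    // Runs the operation, retrying when isTransient says the failure is worth
    // another attempt. The last failure, or any non-transient one, propagates.
    public static T Execute<T>(
        Func<T> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 3,
        TimeSpan? delay = null,
        Action<Exception, int> onRetry = null)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (isTransient == null) throw new ArgumentNullException(nameof(isTransient));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        TimeSpan pause = delay ?? TimeSpan.FromMilliseconds(200);
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Hook for logging and statistics before the next attempt.
                if (onRetry != null) onRetry(ex, attempt);
                Thread.Sleep(pause);
            }
        }
    }
}
```

The delegate passed as operation should create and open its own connection, so that every attempt starts from a fresh one rather than reusing the connection that just failed.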

I had the same problem. I asked my network geek friends, and all said what people have replied here: it's the connection between the computer and the database server. In my case it was my Internet Service Provider, or their router, that was the problem. After a router update, the problem went away. But do you have any other drop-outs of internet connection from your computer or server? I had...

I experienced the transport error this morning in SSMS while connected to SQL 2008 R2 Express.
I was trying to import a CSV with \r\n. I coded my row terminator for 0x0d0x0a. When I changed it to 0x0a, the error stopped. I can change it back and forth and watch it happen/not happen.
BULK INSERT #t1 FROM 'C:\123\Import123.csv' WITH
( FIRSTROW = 1, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0d0x0a' )
I suspect I am not writing my row terminator correctly: SQL seems to parse one character at a time, while I'm trying to pass two characters.
Anyhow, this error is 4 years old now, but it may provide a bit of information for the next user.

I just wanted to post a fix here that worked for our company on new software we've installed. We were getting the following error since day 1 on the client log file: Server was unable to process request. ---> A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.) ---> The semaphore timeout period has expired.
What completely fixed the problem was to set up a link aggregate (LAG) on our switch. Our Dell FX1 server has redundant fiber lines coming out of the back of it. We did not realize that the switch they're plugged into needed to have a LAG configured on those two ports. See details here: https://docs.meraki.com/display/MS/Switch+Ports#SwitchPorts-LinkAggregation

Related

.NET/C# to MySql running on linux - exception on first command, but subsequent commands do work

Have a really crazy situation. I can't post specifics, so I'm just looking for general guidance. We have already opened a ticket with Oracle/MySql support. I'm just looking to see if anyone else has run into this situation or anything similar. Here is our scenario:
Windows 2012 R2 Server with .NET 4.7.1 running.
Simple Windows Forms .NET application.
We are trying to run a simple query against a Linux MySql Server. MySql is Enterprise Version 5.7.x.
On the first attempted connection, the Windows Forms app locks the UI, waits about 15 seconds, and then reports back that there is an error running the command. The error is shown below.
System.ApplicationException: An exception occurred on the following sql command:select * from tablename where compl_date >= '2019-12-17 04:44:34 PM' ---> MySql.Data.MySqlClient.MySqlException: Authentication to host 'ip address' for user 'userid' using method 'mysql_native_password' failed with message: Reading from the stream has failed. ---> MySql.Data.MySqlClient.MySqlException: Reading from the stream has failed. ---> System.IO.EndOfStreamException: Attempted to read past the end of the stream.
When this error pops up, if I click on the "Continue" button, subsequent calls to the database work as intended (at about a 95% rate).
On the server, the mysqld error logs are shown below for the first call. Subsequent calls do work.
2019-12-16T22:06:29.554171Z 3496 [Warning] IP address 'client ip address' could not be resolved: Name or service not known
2019-12-16T22:06:50.188443Z 3496 [Note] Aborted connection 3496 to db: 'drupaldb' user: 'userid' host: 'ip address' (Got an error reading communication packets)
2019-12-17T02:53:17.832725Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 11355ms. The settings might not be optimal. (flushed=0 and evicted=0, during the time.)
2019-12-17T03:25:18.200855Z 3527 [Note] Got an error reading communication packets
2019-12-17T03:25:37.167395Z 3528 [Note] Got packets out of order
2019-12-17T03:25:37.382512Z 3529 [Note] Got packets out of order
2019-12-17T03:25:47.688836Z 3530 [Note] Bad handshake
2019-12-17T14:26:33.619967Z 4803 [Note] Got timeout reading communication packets
2019-12-17T19:34:34.741441Z 4851 [Note] Got timeout reading communication packets
2019-12-17T19:47:47.595426Z 4853 [Note] Got timeout reading communication packets
2019-12-17T19:48:45.586357Z 4854 [Note] Got timeout reading communication packets
If you have some general ideas, let me know.
FYI, we have some other linux/mysql instances, and this runs just fine.
At this point, we think we have solved the problem, at least for the short term. Both server and client are sitting on a private network. We think that the database server is trying to send a certificate to the windows client. The windows client is also on this private network. We think the Windows Client is not accepting the ssl certificate and that this is causing the failure on the first connection attempt. By adding the option "SslMode=None", this seems to resolve the issue.
Blog post we found that helped us: https://blog.csdn.net/fancyf/article/details/78295964

Reading from stream failed - mysql_native_password error

I have been facing the following error intermittently.
Authentication to host '127.0.0.1' for user 'root' using method 'mysql_native_password' failed with message: Reading from the stream has failed.
It shows up at any time and I am at my wits' end. I also posted a bug on MySQL Bugs, and the suggested solutions are not proving effective in any way. I hope you guys can help me out.
Here is the link to MySQL Bug for details: Never seems to go away!
Some more detail: I have a client-server system but this bug occurs on the server system(where MySQL database is installed) when a local running app on the server system tries to run a query.
I had already opened a question here, but it has since gone dead. Just a caveat: I thought that skip-name-resolve had solved the issue, but it seems to have only lowered the frequency. I hope someone can help me out this time around.
EDIT: The MySQL guys say that in a client-server setup the server may close a connection if it is unused for a long time. However, this is not what I am facing, as I create a new connection every time I want to execute a query. I made this point clear in the last comment on the MySQL bug.
Guys I tried this: "SslMode=None" in the connection string, but if you need SSL then read this:
http://www.voidcn.com/article/p-phfoefri-bpr.html
here is a sample connection string that works:
connectionString="Server=192.168.10.5;Database=mydata;Uid=root;Pwd=****;SslMode=None"
Hope this helps
I've been getting this error, quite frequently with Amazon's MySQL RDS instances. And most multi-AZ instances.
It would be interesting to compare notes to see if others mostly get this issue with RDS also?
Amazon is known to rely heavily on "fast" DNS changes to switch over stuff with things like ELBs. I wonder if the same thing is happening with RDS? Or some other internal AWS switching is messing up the idle connections in the pool.
This would explain why the Oracle devs can't reproduce it and don't see it as much of an issue.
Anyway I've had to just deal with it and add retry logic when opening a connection.
This issue is caused by Ssl.
Solution 1: SSL is not required. Since it is caused by SSL, we can turn off SSL by appending "SslMode=None" to the connection string.
Solution 2: SSL is required, and server identity is important and needs to be verified. The server needs an internet connection to do the certificate verification. Please note that the crypto API doesn't update the CTL for every process; the CTL is maintained at the operating system level. Once you connect the server to the internet and make an SSL database connection to the server, the CTL will be updated automatically. Then you may disconnect the internet connection. Note again that the CTL has an expiration date, after which Windows needs to update it again. This will probably occur after several months.
Solution 3: SSL is required but the server identity is not important. Typically SSL is only used to encrypt the network transport in this case. We can turn off CTL update:
Press Win+R to open the "Run" dialog
Type "gpedit.msc" (without quotes) and press Enter
In the "Local Group Policy Editor", expand "Computer Configuration", expand "Administrative Templates", expand "System", expand "Internet Communication Management", and then click "Internet Communication settings".
In the details panel, double-click "Turn off Automatic Root Certificates Update", click Enabled, then click OK. This change takes effect immediately without a restart.
http://www.voidcn.com/article/p-phfoefri-bpr.html
Unfortunately, this error occurs if the application and MySQL are on the same computer; if you move the application to a different computer, it is fine.
I tried many ways, but for now there is no other solution. The bug has been reported many times by others: https://bugs.mysql.com/bug.php?id=76597
I had the exact same problem while upgrading a Windows Forms application. The solution I found was to change the server, because that one was in trouble. The server that showed the situation you describe had WordPress installed with MySQL 5.6.34; on the other I did a clean install of MySQL 5.6.26.
I don't know if it has to do with the environment variables used. I believe it has nothing to do with Connection Timeout, since that property applies only when opening the connection. This error occurred in a shared environment as well as in a local installation with MariaDB. Another problem I found was that one of the SELECT commands that retrieved the data was malformed and did not respect the whitespace:
SELECT COLUNA1, COLUNA2 FROM TABLE;
I made the change to SELECT COLUMN1, COLUMN2 FROM TABLE;
I am still testing on this solution I presented, and as of the time of posting there were no more errors.
I was getting the error
Authentication to host 'localhost' for user 'root' using method 'mysql_native_password' failed with message: Reading from the stream
I solved it when I put SslMode=None in my connection string.
However, I noticed that my message is different from yours.
Check my connection
connection.ConnectionString = "server=myadressserver;userid=myuser;password=mypassword;database=test;SslMode=None";

How to Efficiently Test Whether an SQL Server Instance is Running in C#

I have an SQL Server (2008 R2) based (C# WinForms) application that predominantly runs on a local machine using a local installation of SQL Server 2008 R2. One problem I have is that if the user does not have a server instance running and tries to execute some commands or perform some operations, the queries are sent off to SQL Server and it takes an age to throw an SqlException telling me the requested instance is not started.
I have read the following question and associated answers, but these solutions are far from ideal. WMI seem very much over-kill and I do not want to have to include extra .dlls in my installation package for the software if it can be avoided.
I have also come across the SqlDataSourceEnumerator Class documented here
// Retrieve the enumerator instance and then the data.
SqlDataSourceEnumerator instance = SqlDataSourceEnumerator.Instance;
System.Data.DataTable table = instance.GetDataSources();
which dumps the available connections into a DataTable. However, there seem to be inherent problems with returning all the available connections:
"All of the available servers may or may not be listed. The list can vary depending on
factors such as timeouts and network traffic. This can cause the list to be different
on two consecutive calls." - MSDN.
There has to be a set way of dealing with this problem. Say I have the following SqlConnection string:
Data Source=localhost;Initial Catalog=MyDB;Integrated Security=True;Connection Timeout = 0
what can I use as an efficient (this is crucial) check as to whether the selected instance ('localhost' [the default instance] or 'SomeInstanceName') is running?
Thanks for your time.
I don't think you need to worry about timeouts or network issues when the server and client are the same machine. Just attempting to connect is about as efficient as you're going to get; the crucial part is how long you let the connection attempt run before you give up (the connection timeout). You can shorten that window, obviously, but if you make it too short, the check stops being reliable.
You can change the connection timeout to be a shorter period, but essentially, the only way it knows that a server isn't there, is from a timeout.
Any technique you use will likely have the exact same timeout issue.
If you know the instance name, such as "MSSQL$InstanceName" you can use the System.ServiceProcess.ServiceController class to get a list of all services on the machine and then loop through looking to see if any ServiceName == MSSQL$InstanceName.
I have found this to be very fast plus you can check to see if it is running and start it if it is not running.
You could try opening a simple TCP connection to the standard SQL port and see if it sticks. Set the connection timeout to some reasonably low value based on your environment.
See http://support.microsoft.com/kb/287932 for SQL Server port numbers.
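A minimal sketch of that TCP check, assuming the instance listens on a known TCP port (1433 is the usual default-instance port; named instances often use a dynamic port, so look yours up first). PortProbe and IsListening are hypothetical names. A successful connect only tells you that something is accepting connections on that port, not that SQL Server is ready to run queries.

```csharp
using System;
using System.Net.Sockets;

public static class PortProbe
{
    // Returns true if a TCP connection to host:port is established within the timeout.
    public static bool IsListening(string host, int port, TimeSpan timeout)
    {
        using (var client = new TcpClient())
        {
            try
            {
                var connect = client.ConnectAsync(host, port);
                // Wait returns false on timeout and throws if the connect failed outright.
                return connect.Wait(timeout) && client.Connected;
            }
            catch (AggregateException)
            {
                return false; // connection refused, host unreachable, and so on
            }
            catch (SocketException)
            {
                return false;
            }
        }
    }
}
```

For a local instance, something like PortProbe.IsListening("localhost", 1433, TimeSpan.FromMilliseconds(500)) returns quickly either way, because a closed local port is normally refused immediately instead of timing out.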

What should the client do while the TIBCO EMS server attempts failover?

The TIBCO EMS user's guide (pg 292) says:
The backup server will work indefinitely to either A) become the primary server or B) reconnect to the primary server.
It also says clients may receive fail-over notification when the switch is successful (see also TIBCO EMS .NET reference pg 220).
I have some questions spinning off of these facts...
1. What kind of errors occur on the client side while the servers are attempting fail-over/reconnect?
2. What is the appropriate response from the client?
   - Get new Connection objects from the ConnectionFactory until one works?
   - Wait for fail-over notification? (Are current Connection instances fixed at this time, or do I need to get a new instance?)
I hope the scenario is clear, any related information or advice would be appreciated too.
I can at least answer #1 above.
If you have enabled Tibems.SetExceptionOnFTSwitch(true); and have set up an exception handler to capture the messages the server sends to the client, you will see the following:
For single-server, non-fault tolerant connection failures:
"Connection has been terminated".
For fault-tolerant connection failures:
"Connection has performed fault-tolerant switch to "
If you attempt to publish while the connection is down, a TIBCO.EMS.IllegalStateException is thrown with the "Producer is closed" message.
For #2 above, I think the answer is to allow the EMS library to handle as much as possible. Once we got the EMS reconnect functionality to work, it gracefully tried to reconnect until the server became available again, and once it reconnected, it was like there was never a problem. The only gotcha is probably if you try to publish a message before the EMS connection is back. This is where the exception handler comes in. Once notified that you are in failover mode, you can adjust exception handling on the publisher side to suppress the error until the connection is back. The thing I don't know is how you tell when you've exhausted all reconnect attempts.
Anyway, it seems like our two worlds are closely related when it comes to EMS. I hope our findings (based on your comments on my questions) help you.
We use TEMS (Tibco EMS, a Tibco product for WCF), so it becomes a custom binding. We tried to break it by doing things like bouncing the server to force switch-overs, and it works really well. Make sure you are using version 1.2, not 1.1, because with 1.1 you cannot do anything other than client acknowledgement.

Can't figure out what network conditions have changed to cause network code to become non-functional

This is one of those situations where basically me leaving the building for 12 hours caused my code to stop working.
The code was being used in production/running for a few days to collect some data etc. for analysis. Now, the network portion of my code simply cannot function.
The code itself, where it is running, and the network it is on have NOT changed at all.
I've checked all the firewall rules on the server, and the clients. Nothing has changed there.
Now there's just an endless stream of my catch statements throwing miscellaneous network errors.
"An existing connection was forcibly closed by the host" - it wasn't
"Only one usage of each address is normally permitted" - there are no attempts made of this nature
The code I am using for the client/server is basically the code that was provided by Microsoft:
client: https://msdn.microsoft.com/en-us/library/bew39x2a(v=vs.110).aspx
server: https://msdn.microsoft.com/en-us/library/fx6588te.aspx
