Transport errors - c#

I'm working on a crawler and my attempt at fixing an issue with this exception:
System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host at System.Net.Sockets.NetworkStream.BeginRead(Byte[] buffer, Int32 offset, Int32 size, AsyncCallback callback, Object state)
was to implement a retry pattern. After using Wireshark and looking at the network logs, I concluded that these errors are most likely transient.
However, these exceptions are really bugging me now and I would like to get to the bottom of why I am getting them. Can anyone suggest a good strategy to adopt, tools I can use, or reasons you can think of why the connection is being forcibly closed?
Thanks

I see two questions here:
Why is the Exception being thrown?
Why is the connection being forcibly closed?
Why is the Exception being thrown?
This is a problem with the transport implementation you have chosen to consume. Apparently, Microsoft decided to communicate the error by wrapping it in an exception and throwing it up the stack. The corresponding source code can be found here: http://referencesource.microsoft.com/#System/net/System/Net/Sockets/NetworkStream.cs,766
In the source code, you can also see that the InnerException is set and contains a localization-independent representation of the errorCode.
The bottom line is that this exception being thrown does not mean anything exceptional happened; it can be thrown simply because the connection was dropped.
Which brings us to the next question:
Why is the connection being forcibly closed?
Just as the exception's message hints, the reason could well be the remote host. Therefore, looking at the remote host's implementation could be required to get to the bottom of this.
I suspect, though, that judging by the exception alone, you cannot rule out a cause somewhere in between the hosts (sharks have shown an appetite for fiber cables).
I suggest the following experiment:
Set up the two hosts residing on different machines and let them connect through a cable.
While the connection is established, unplug the cable.
This cannot disprove the possibility, but it can at least prove it.
However, "working on a crawler" suggests that you might encounter a variety of different hosts and it is to be expected that some of them turn taciturn sometimes for whatever reason you would care to imagine.
EDIT:
I remember catching this exception when using TCP over IP when the remote host sent a packet with the RST Flag set. The value of the RST Flag is displayed in Wireshark.
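The point above about the InnerException can be sketched in code. This is a minimal sketch, not the framework's own handling: the class and method names (TransportErrors, GetSocketError, IsConnectionReset) are made up for illustration, while SocketException.SocketErrorCode and SocketError.ConnectionReset are the real .NET members.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

// Hypothetical helper; the names are illustrative only.
public static class TransportErrors
{
    // Returns the socket error wrapped inside an IOException, or null
    // when the IOException was not caused by a SocketException.
    public static SocketError? GetSocketError(IOException ex)
    {
        return (ex.InnerException as SocketException)?.SocketErrorCode;
    }

    // True when the connection was reset, which is what a TCP packet
    // with the RST flag set typically surfaces as (WSAECONNRESET, 10054).
    public static bool IsConnectionReset(IOException ex)
    {
        return GetSocketError(ex) == SocketError.ConnectionReset;
    }
}
```

In a read loop, a catch (IOException ex) block could call IsConnectionReset(ex) to decide between retrying and rethrowing, instead of comparing the localized message text.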

My 50 cents: This is normal behaviour when using a NetworkStream to read data from a socket. It is not a user error; the thrown exception just interrupts the data processing in the reading thread. Wrap the read in a try/catch handler accordingly.
You could try to use the DebuggerNonUserCode attribute (https://msdn.microsoft.com/de-de/library/system.diagnostics.debuggernonusercodeattribute%28v=vs.110%29.aspx) to suppress debugger alerts when an exception is triggered. Be aware that this may also "hide" other exceptions...

Related

NetworkStream gets System.IO.IOException: Unable to write data to the transport connection

I am using a NetworkStream to keep an open TCP/IP connection that messages can be sent across. I receive a message, process it, and then return an ACK. I am working with a site where occasionally I receive the message, but when I go to send the ACK, I get an IOException. Sometimes this lasts for only one or two messages (I can receive the next message), and other times it continues until the service is stopped and restarted.
Below is the code for my NetworkStream without any of the processing:
using (NetworkStream stream = client.GetStream())
{
    stream.ReadTimeout = ReadTimeout;
    ...
    if (stream.CanRead && stream.DataAvailable)
        bytesRead = stream.Read(receivedBytes, 0, BufferSize);
    ...
    stream.Write(ack, 0, ack.Length);
}
Note that the code loops between reading new messages and sending the ACK.
When I call stream.Write, I will sometimes get the following exception:
System.IO.IOException: Unable to write data to the transport connection: An established connection was aborted by the software in your host machine. ---> System.Net.Sockets.SocketException: An established connection was aborted by the software in your host machine
at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
--- End of inner exception stack trace ---
at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
at Framework.Communication.Model.Server.AcceptMessage(IAsyncResult ar)
I have looked this message up and found several sources for it for several different communication methods. It sounds like this is a very generic exception that does not really tell what is happening.
Does anyone know what would cause this, or where I could start looking to narrow down the solution?
You are right to think this is a generic problem, so you will have to be more defensive.
Some things to consider:
When an application closes a socket the right way, the other side sees a read that returns 0 bytes.
In some cases you may get a SocketException indicating something went wrong.
In a third situation, the remote party is no longer connected (for instance by unplugging the network cable) without any communication between the two parties.
If this happens, you'll have to write data to the socket in order to detect that you can no longer reach the remote party. This is why keep-alive mechanisms were invented - they check every so often whether they can still communicate with the other side.
Note: None of this explains the intermittent nature of part of the question (if I'm reading it correctly).
So let's see what the documentation says:
NetworkStream.Write Method (Byte[], Int32, Int32)
IOException
There was a failure while writing to the network.
An error occurred when accessing the socket. See the Remarks section for more information.
Remarks
If you receive a SocketException, use the
SocketException.ErrorCode property to obtain the specific error
code, and refer to the Windows Sockets version 2 API error code
documentation in MSDN for a detailed description of the error.
So in my mind, as mentioned before you need to be a bit more defensive.
Check for 0 bytes.
Check for errors.
If you get an error, make sure you are checking the error codes.
In either of these cases, it's probably good practice (and makes sense) to log the failure, assume the connection has been closed (normally or abnormally), and restart the connection.
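The checks listed above (0 bytes, errors, error codes) can be sketched as one small read helper. This is only a sketch under assumptions: ReadOutcome, DefensiveReader and TryRead are hypothetical names, and what to do for each outcome (reconnect, log, give up) is left to the caller.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

// Hypothetical result type for the sketch.
public enum ReadOutcome { Data, ClosedByPeer, Failed }

public static class DefensiveReader
{
    // Performs one read and classifies what happened.
    // 'error' is set only when the failure wraps a SocketException.
    public static ReadOutcome TryRead(Stream stream, byte[] buffer,
                                      out int bytesRead, out SocketError? error)
    {
        bytesRead = 0;
        error = null;
        try
        {
            bytesRead = stream.Read(buffer, 0, buffer.Length);
            // A read of 0 bytes means the other side closed the connection gracefully.
            return bytesRead == 0 ? ReadOutcome.ClosedByPeer : ReadOutcome.Data;
        }
        catch (IOException ex)
        {
            // NetworkStream wraps socket failures in an IOException; keep the code for logging.
            error = (ex.InnerException as SocketException)?.SocketErrorCode;
            return ReadOutcome.Failed;
        }
    }
}
```

For both ClosedByPeer and Failed the caller would typically log the outcome (including the error code, if any), dispose the client, and reconnect.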
Additional resources
Detect closed network connection
Detection of Half-Open (Dropped) Connections, by Stephen Cleary

If no data is received over stream tcpclient close the connection

I would like to know what exactly happens on a TcpClient's NetworkStream when a timeout occurs.
While debugging the code I found that, after a request is sent, if no data is received within the configured timeout period, it throws the exception below and unfortunately closes the connection (TcpClient.Connected becomes false):
Unable to read data from the transport connection: A connection
attempt failed because the connected party did not properly respond
after a period of time, or established connection failed because
connected host has failed to respond.
Throwing the exception is okay, but I would like to know how I can prevent it from closing the connection.
It would be great if someone can provide more insights on this.
Have you checked this one: "Reconnect TCPClient after interruption"? I think that if the TTL of your TCP connection is long enough, then should an exception occur (I believe a SocketException would be thrown), you can catch it and initiate your retry logic. There are several implementations of this, and the details depend on the use case, but normally there is a configured number of attempts before "giving up" on connecting. That way your manager retries the connection X times and carries on if a connection succeeds; otherwise it propagates the exception up the chain.
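The retry logic described above could look roughly like this. It is a minimal sketch, not a specific library's implementation: Retry.Execute, its parameters, and the fixed delay between attempts are assumptions made for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical helper; the name and signature are illustrative only.
public static class Retry
{
    // Runs 'operation' up to 'maxAttempts' times. A failure that 'isTransient'
    // accepts is retried after 'delay'; any other failure, or the failure of the
    // last attempt, propagates to the caller unchanged.
    public static T Execute<T>(Func<T> operation, int maxAttempts,
                               TimeSpan delay, Func<Exception, bool> isTransient)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                Thread.Sleep(delay); // wait before the next attempt
            }
        }
    }
}
```

A reconnect could then be written as Retry.Execute(() => new TcpClient(host, port), 3, TimeSpan.FromSeconds(1), ex => ex is SocketException), where host, port and the values 3 and 1 second are placeholders that would come from configuration.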

WinCE Socket exception on asynchronous HTTP request

I am writing a WinCE app in C# that makes an HTTP POST to an Apache server residing on my network. Because of some network issues (I am guessing), I get the following exception in the managed code:
System.Net.Sockets.SocketException occurred
Message="A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond"
ErrorCode=10060
NativeErrorCode=10060
StackTrace:
at System.Net.Sockets.Socket.ConnectNoCheck(EndPoint remoteEP)
at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
at System.Net.Connection.doConnect(IPEndPoint ep)
at System.Net.Connection.connect(Object ignored)
at System.Threading.ThreadPool.WorkItem.doWork(Object o)
at System.Threading.Timer.ring()
This exception isn't always thrown, but when it is thrown, my app fails to connect to the server AT ALL. Repeated connection attempts don't help in reconnecting either. The only thing that helps is closing and re-deploying the app.
I can't catch the exception because it's inside of managed code. Is there any way to circumvent this, close all socket connections to my server, and re-initialize them? Is there something I am doing wrong?
The exception message looks a bit misleading ("connection attempt failed because the connected party") but I think it means your hardware is communicating with the server, but the server is not accepting the connection on the TCP level.
A problem I could think of is "hanging" connections, causing the server to reach the maximum number of concurrent connections and to stop accepting new ones.
Although it's just a guess, you might want to check the Apache log, if you can, to see whether the server reports anything, and perhaps try restarting Apache as soon as the problem occurs again. If that helps, you still need to find the cause, of course.

TcpClient.BeginRead/TcpClient.EndRead doesn't throw exception when internet disconnected

I'm using TcpListener to accept & read from TcpClient.
The problem is that when reading from a TcpClient, TcpClient.BeginRead / TcpClient.EndRead doesn't throw exception when the internet is disconnected. It throws exception only if client's process is ended or connection is closed by server or client.
The system generally has no chance to know that the connection is broken. The only reliable way to know this is to attempt to send something. When you do this, the packet is sent, then lost or bounced, and your system knows that the connection is no longer available and reports the problem back to you by error code or exception (depending on environment). Reading is usually not enough, because reading only checks the state of the input buffer and doesn't send a packet to the remote side.
As far as I know, low-level sockets don't notify you in such cases. You should provide your own timeout implementation or ping the server periodically.
If you want to know about when the network status changes you can subscribe to the System.Net.NetworkInformation.NetworkChange.NetworkAvailabilityChanged event. This is not specific to the internet, just the local network.
EDIT
Sorry, I misunderstood. The concept of "connected" really doesn't exist the more you think about it. This post does a great job of going into more details about that. There is a Connected property on the TcpClient but MSDN says (emphasis mine):
Because the Connected property only
reflects the state of the connection
as of the most recent operation, you
should attempt to send or receive a
message to determine the current
state. After the message send fails,
this property no longer returns true.
Note that this behavior is by design.
You cannot reliably test the state of
the connection because, in the time
between the test and a send/receive,
the connection could have been lost.
Your code should assume the socket is
connected, and gracefully handle
failed transmissions.
Basically, the only way to check for a client connection is to try to send data. If it goes through, you're connected. If it fails, you're not.
I don't think you'd want BeginRead and EndRead throwing exceptions, as these should be used in multi-threaded scenarios.
You probably need to implement some other mechanism to respond to the dropping of a connection.
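Two such mechanisms can be sketched briefly: asking the OS to send TCP keep-alive probes, and sending an application-level heartbeat. This is a sketch under assumptions. ConnectionProbe and its methods are hypothetical names, the heartbeat bytes must be something your own protocol tolerates, and OS keep-alive probes usually start only after a long idle period (commonly two hours by default) unless the system is tuned.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

// Hypothetical helper; the names are illustrative only.
public static class ConnectionProbe
{
    // Ask the OS to probe an idle connection so that a dead peer
    // eventually surfaces as an error on the socket.
    public static void EnableKeepAlive(Socket socket)
    {
        socket.SetSocketOption(SocketOptionLevel.Socket, SocketOptionName.KeepAlive, true);
    }

    // Application-level probe: try to send a small heartbeat message.
    // Returns false when the write fails, which means the connection is gone.
    // A true result only means the data was handed to the local TCP stack.
    public static bool TrySend(Stream stream, byte[] heartbeat)
    {
        try
        {
            stream.Write(heartbeat, 0, heartbeat.Length);
            stream.Flush();
            return true;
        }
        catch (IOException) { return false; }
        catch (ObjectDisposedException) { return false; }
    }
}
```

Because a dropped link may only be noticed after TCP gives up retransmitting, a heartbeat is usually paired with a receive timeout on the expected reply.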

How do you deal with transport-level errors in SqlConnection?

Every now and then in a high volume .NET application, you might see this exception when you try to execute a query:
System.Data.SqlClient.SqlException: A transport-level error has
occurred when sending the request to the server.
According to my research, this is something that "just happens" and not much can be done to prevent it. It does not happen as a result of a bad query, and generally cannot be duplicated. It just crops up maybe once every few days in a busy OLTP system when the TCP connection to the database goes bad for some reason.
I am forced to detect this error by parsing the exception message, and then retrying the entire operation from scratch, to include using a new connection. None of that is pretty.
Anybody have any alternate solutions?
I posted an answer on another question on another topic that might have some use here. That answer involved SMB connections, not SQL. However it was identical in that it involved a low-level transport error.
What we found was that in a heavy load situation, it was fairly easy for the remote server to time out connections at the TCP layer simply because the server was busy. Part of the reason was the defaults for how many times TCP will retransmit data on Windows weren't appropriate for our situation.
Take a look at the registry settings for tuning TCP/IP on Windows. In particular, you want to look at TcpMaxDataRetransmissions and maybe TcpMaxConnectRetransmissions. These default to 5 and 2 respectively; try upping them a little bit on the client system and duplicate the load situation.
Don't go crazy! TCP doubles the timeout with each successive retransmission, so the timeout behavior for bad connections can go exponential on you if you increase these too much. As I recall upping TcpMaxDataRetransmissions to 6 or 7 solved our problem in the vast majority of cases.
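To see how quickly the doubling adds up, here is a small worked calculation. It is a simplified model, not how the Windows TCP stack actually computes its timers: the 3-second initial timeout is an assumption chosen for illustration (real stacks derive it from the measured round-trip time), and RetransmissionMath.TotalWait is a hypothetical helper.

```csharp
using System;

// Hypothetical helper for the arithmetic only.
public static class RetransmissionMath
{
    // Sums the waits when each wait is double the previous one:
    // initial + 2*initial + 4*initial + ... over 'retransmissions' terms.
    public static TimeSpan TotalWait(TimeSpan initialTimeout, int retransmissions)
    {
        double total = 0;
        double current = initialTimeout.TotalSeconds;
        for (int i = 0; i < retransmissions; i++)
        {
            total += current;
            current *= 2; // the timeout doubles after every retransmission
        }
        return TimeSpan.FromSeconds(total);
    }
}
```

Under this model, with an assumed 3-second initial timeout, 5 retransmissions add up to 93 seconds and 7 add up to 381 seconds, so two extra retransmissions roughly quadruple the time before a bad connection is declared dead.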
This blog post by Michael Aspengren explains the error message "A transport-level error has occurred when sending the request to the server."
To answer your original question:
A more elegant way to detect this particular error, without parsing the error message, is to inspect the Number property of the SqlException.
(This actually returns the error number from the first SqlError in the Errors collection, but in your case the transport error should be the only one in the collection.)
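A sketch of that check follows. The list of numbers is an assumption, not an official list: it uses the Windows error codes that match the messages quoted on this page (10053, 10054, 10060, and 121 for the semaphore timeout), on the understanding that SqlException.Number carries the underlying Win32 code for provider-level failures. Log ex.Number from your own failures to confirm which values you actually see. SqlTransportErrors is a hypothetical helper and takes a plain int so the sketch does not depend on a particular SqlClient package.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the set of numbers is an assumption to verify against your own logs.
public static class SqlTransportErrors
{
    private static readonly HashSet<int> Numbers = new HashSet<int>
    {
        10053, // an established connection was aborted by the software in your host machine
        10054, // an existing connection was forcibly closed by the remote host
        10060, // the connected party did not properly respond after a period of time
        121,   // the semaphore timeout period has expired
    };

    // Pass SqlException.Number here instead of parsing the message text.
    public static bool IsTransportError(int sqlExceptionNumber)
    {
        return Numbers.Contains(sqlExceptionNumber);
    }
}
```

With this in place, the handler can be written as catch (SqlException ex) when (SqlTransportErrors.IsTransportError(ex.Number)), followed by opening a new connection and retrying the operation.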
I had the same problem albeit it was with service requests to a SQL DB.
This is what I had in my service error log:
System.Data.SqlClient.SqlException: A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)
I have a C# test suite that tests a service. The service and DB were both on external servers so I thought that might be the issue. So I deployed the service and DB locally to no avail. The issue continued. The test suite isn't even a hard pressing performance test at all, so I had no idea what was happening. The same test was failing each time, but when I disabled that test, another one would fail continuously.
I tried other methods suggested on the Internet that didn't work either:
Increase the registry values of TcpMaxDataRetransmissions and TcpMaxConnectRetransmissions.
Disable the "Shared Memory" option within SQL Server Configuration Manager under "Client Protocols" and sort TCP/IP to 1st in the list.
This might occur when you are testing scalability with a large number of client connection attempts. To resolve this issue, use the regedit.exe utility to add a new DWORD value named SynAttackProtect to the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\ with value data of 00000000.
My last resort was to follow the age-old saying "Try and try again". So I have nested try-catch statements to ensure that if the TCP/IP connection is lost in the lower communications protocol, the code doesn't just give up there but tries again. This is now working for me; however, it's not a very elegant solution.
use Enterprise Services with transactional components
I have seen this happen in my own environment a number of times. The client application in this case is installed on many machines. Some of those machines happen to be laptops: people were leaving the application open, disconnecting the laptop, then plugging it back in and attempting to use the application. This then causes the error you have mentioned.
My first step would be to look at the network and ensure that the servers aren't on DHCP and renewing IP addresses, which would cause this error. If that isn't the case, then you have to start trawling through your event logs looking for other network-related errors.
Unfortunately it is as stated above a network error. The main thing you can do is just monitor the connections using a tool like netmon and work back from there.
Good Luck.
You should also check hardware connectivity to the database.
Perhaps this thread will be helpful:
http://channel9.msdn.com/forums/TechOff/234271-Conenction-forcibly-closed-SQL-2005/
I'm using a reliability layer around my DB commands (abstracted away in the repository interface). Basically, that's just code that intercepts any expected exception (DbException and also InvalidOperationException, which happens to get thrown on connectivity issues), logs it, captures statistics, and retries everything again.
With that reliability layer present, the service has been able to survive stress-testing gracefully (constant dead-locks, network failures etc). Production is far less hostile than that.
PS: There is more on that here (along with a simple way to define reliability with the interception DSL)
I had the same problem. I asked my network geek friends, and all said what people have replied here: It's the connection between the computer and the database server. In my case it was my Internet Service Provider, or their router, that was the problem. After a router update, the problem went away. But do you have any other drop-outs of the internet connection from your computer or server? I had...
I experienced the transport error this morning in SSMS while connected to SQL 2008 R2 Express.
I was trying to import a CSV with \r\n. I coded my row terminator for 0x0d0x0a. When I changed it to 0x0a, the error stopped. I can change it back and forth and watch it happen/not happen.
BULK INSERT #t1 FROM 'C:\123\Import123.csv' WITH
( FIRSTROW = 1, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0d0x0a' )
I suspect I am not writing my row terminator correctly, because SQL parses one character at a time while I'm trying to pass two characters.
Anyhow, this error is 4 years old now, but it may provide a bit of information for the next user.
I just wanted to post a fix here that worked for our company on new software we've installed. We were getting the following error since day 1 on the client log file: Server was unable to process request. ---> A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.) ---> The semaphore timeout period has expired.
What completely fixed the problem was to set up a link aggregate (LAG) on our switch. Our Dell FX1 server has redundant fiber lines coming out of the back of it. We did not realize that the switch they're plugged into needed to have a LAG configured on those two ports. See details here: https://docs.meraki.com/display/MS/Switch+Ports#SwitchPorts-LinkAggregation
