I'm experiencing really weird behavior with the Socket.Connect method in C#. I am attempting a TCP Socket.Connect to a valid IP but closed port and the method is continuing as if I have successfully connected. When I packet sniffed what was going on I saw that the app was receiving RST packets from the remote machine. Yet from the tracing that is in place it is clear that the connect method is not throwing an exception. Any ideas what might be causing this?
The code that is running is basically this:
IPEndPoint iep =
    new IPEndPoint(System.Net.IPAddress.Parse(m_ipAddress), m_port);
Socket tcpSocket = new Socket(AddressFamily.InterNetwork,
    SocketType.Stream, ProtocolType.Tcp);
tcpSocket.Connect(iep);
To add to the mystery: when running this code in a standalone console application, the result is as expected and the Connect method throws an exception. However, when running it in our Windows Service deployment, the Connect method does not throw an exception.
Edit in response to Mystere Man's answer
How would the exception be swallowed? I have a Trace.WriteLine right above the .Connect call and a Trace.WriteLine right under it (not shown in the code sample for readability). I know that both traces are running. I also have a try/catch around the whole thing, which also does a Trace.WriteLine, and I don't see that in the log files anywhere. I have also enabled the internal socket tracing as you suggested. I don't see any exceptions. I see what appear to be successful connections.
I am trying to identify differences between the Windows service app and the diagnostic console app I made. I am running out of ideas, though.
End edit
Thanks
Are you sure the exception isn't being caught and swallowed in the service, but not in the console app?
My first step would be to isolate the differences between the two implementations. You mention tracing, but you don't say whether this is Network tracing (part of the BCL) or your own tracing. If you're not using network tracing, then enable that.
See AppDomain.CurrentDomain.UnhandledException.
I have never observed this again. It seems to me that something was corrupt somewhere. Either the OS on which the app was installed or the .NET framework.
Related
I have implemented an HTTP listener in C# as a Windows service. The service is set to start automatically when the machine is restarted. When I manually start the service after installing it, the HTTP listener works fine and responds to the requests it receives. But when the service is started on a system restart, I get the following error:
System.Net.HttpListenerException (0x80004005): The format of the specified network name is not valid
I get this error on listener.Start().
The code of http listener is like this:
HttpListener listener = new HttpListener();
listener.Prefixes.Add("http://myip:port/");
listener.Start();
I got a suggestion from this already asked question. If I follow what's given in the answer, it still doesn't work.
Furthermore, I tried running:
netsh http show iplisten
in PowerShell, and the list is empty. Even when the HTTP listener works (the first time I install the service and run it), the output of this command is an empty list. So I don't think this is the issue.
Any suggestions will be really helpful.
Answering my own question. It seems there are some other services that need to be running before an HTTP listener can be started. These have not yet started by the time Windows starts my service. I found two solutions for this. One is to use delayed start:
sc.exe config myservicename start= delayed-auto
The other is to wrap the HTTP listener start in a try/catch and, if it fails, try again after a few seconds. In my case time is of the essence, so I'm using the second approach because it starts the listener about 2 minutes sooner than the first approach.
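The retry approach can be sketched as below. This is a minimal, language-neutral illustration in Python rather than the service's C# code: `start_with_retry`, its parameters, and the use of `OSError` as a stand-in for `HttpListenerException` are all assumptions made for the example.

```python
import time


def start_with_retry(start, attempts=10, delay_seconds=5.0, sleep=time.sleep):
    """Call start() until it succeeds or the attempts run out.

    `start` is a hypothetical callable wrapping the listener start;
    OSError stands in for the platform's "listener failed" exception.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return start()
        except OSError as error:
            last_error = error
            # Wait before the next try, but not after the final failure.
            if attempt < attempts:
                sleep(delay_seconds)
    raise last_error
```

Capping the number of attempts keeps a permanently misconfigured prefix from retrying forever; once the cap is hit, the last error is re-raised so the service can log it and fail visibly.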
I use a C# console application to put messages on and read messages from MQ.
When the application starts, it connects to MQ once, and the connection should then be kept open permanently.
The program runs every 30 seconds, checks whether new messages are in the queue or in a database (to put them on the queue), and checks that the isConnected variable is true.
But what happens if an exception (2009, connection broken) occurs in the Put/Get? Will isConnected automatically be set to false?
Is the connection automatically disconnected, or do I have to call Disconnect() in the error handling?
Thanks!
To answer your exact question: for a basic .NET application (non-XMS) using MQQueue for put/get, if you get certain bad return codes from the underlying API call that indicate a connection issue, MQ will attempt an MQBACK and an MQDISC for you. This invalidates the connection handle (IsConnected would return false), and an exception is thrown. However, if an exception occurs outside those return codes, no attempt is made to do anything with the connection.
Basically, you should not write an application that relies on this behaviour when the simplest answer is to always disconnect if you get an exception that relates to the quality of the connection or queue manager. For example, a "no message available" type of exception doesn't mean you need to disconnect, but a "connection broken" obviously does. There is no harm in calling disconnect on an already disconnected connection.
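A minimal sketch of that rule follows, written in Python with stand-in types rather than the real MQ .NET classes. `MQError`, `get_message`, and the `queue_manager.disconnect()` call are hypothetical names for this example. The numeric values are the usual MQ reason codes (2009 connection broken, 2033 no message available, 2059 queue manager not available, 2162 queue manager stopping), and the set of "connection-quality" codes shown is illustrative, not exhaustive.

```python
MQRC_CONNECTION_BROKEN = 2009
MQRC_NO_MSG_AVAILABLE = 2033
MQRC_Q_MGR_NOT_AVAILABLE = 2059
MQRC_Q_MGR_STOPPING = 2162

# Illustrative subset of reason codes treated as connection problems.
CONNECTION_REASONS = {
    MQRC_CONNECTION_BROKEN,
    MQRC_Q_MGR_NOT_AVAILABLE,
    MQRC_Q_MGR_STOPPING,
}


class MQError(Exception):
    """Hypothetical stand-in for the client library's MQ exception."""

    def __init__(self, reason):
        super().__init__("MQ reason code %d" % reason)
        self.reason = reason


def get_message(queue_manager, get):
    """Run get(); disconnect explicitly on connection-quality errors."""
    try:
        return get()
    except MQError as error:
        if error.reason == MQRC_NO_MSG_AVAILABLE:
            return None  # an empty queue is not a connection problem
        if error.reason in CONNECTION_REASONS:
            # Harmless if the library has already disconnected for us.
            queue_manager.disconnect()
        raise
```

The caller then reconnects on the next 30-second cycle instead of trusting the connected flag to have been updated for it.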
I have a simple pub-sub setup on a mid-sized network, using ZMQ 2.1. Although some subscribers are using C# bindings, others are using Python bindings, and the issue I'm having is the same for either.
If I pull the network cable from a machine running a subscriber, I get an un-catchable error that immediately terminates that subscriber.
Here's a very simple example of a subscriber in Python (not actual production code, but enough to reproduce the problem):
import zmq

def main(server_address, port):
    context = zmq.Context()
    sub_socket = context.socket(zmq.SUB)
    sub_socket.connect("tcp://" + server_address + ":" + str(port))
    sub_socket.setsockopt(zmq.SUBSCRIBE, "KITH1S2")
    while True:
        msg = sub_socket.recv()
        print msg

if __name__ == "__main__":
    main("company-intranet", 4000)
In C# the program simply terminates silently. In Python I at least get this:
Assertion failed: rc == 0 (....\src\zmq_connector.cpp:48)
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
I've tried non-blocking versions, and poller versions, but in either case this instant termination problem persists. Is there something obvious I should be doing but I'm not? (That is, obvious to someone else :) ).
EDIT:
Found the following: https://zeromq.jira.com/browse/LIBZMQ-207
Seems as though it is/was a known issue.
That link further links to Github, where a change log for 2.1.10 has this note:
Fixed issue 207, assertion failure in zmq_connecter.cpp:48, when an
invalid zmq_connect() string was used, or the hostname could not be
resolved. The zmq_connect() call now returns -1 in both those cases.
Although connect() does indeed throw an Invalid Argument exception in Python (not C# apparently?), recv() still fails. If the subscriber machine suddenly loses the network, that subscriber will simply stop functioning.
So - I'm going to try using IP addresses instead of named addresses to see if this will bypass the issue. Not ideal, but better than insta-crash.
Original question: Is there something obvious I should be doing but I'm not?
No.
The workaround for now is to use IP addressing. This does not cause program failure upon network disconnect for ZMQ 2.1.x.
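One way to apply the workaround without hard-coding addresses is to resolve the hostname once at startup and hand ZMQ a numeric endpoint. The sketch below assumes that is acceptable for the deployment (the address will not follow later DNS changes). `tcp_endpoint` and its `resolve` parameter are names made up for this example; `socket.gethostbyname` is the standard-library lookup.

```python
import socket


def tcp_endpoint(host, port, resolve=socket.gethostbyname):
    """Build a ZMQ-style TCP endpoint that uses a numeric IPv4 address.

    The lookup happens here, in Python, so a failed resolution surfaces
    as a catchable socket.gaierror instead of inside the ZMQ library.
    """
    return "tcp://%s:%d" % (resolve(host), port)
```

In the subscriber above, this would replace the string concatenation: `sub_socket.connect(tcp_endpoint(server_address, port))`.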
I have to write a TCP client that is able to reconnect to the server. The server can be unavailable due to poor network connection quality or maintenance. I'm looking for robust solutions in this area.
My current solution is the following:
keep connection state in ConnectionState enum {Offline, Online, Connecting}
create client with TcpClient class.
create two timers called ConnectionCheckTimer, and ReconnectTimer
connect to server
start reader thread and connection check timer
reading is performed with tcpClient.GetStream() and then reading from this stream
when Exception is caught in readerLoop client state is changed to offline and ReconnectTimer is launched
ConnectionCheckTimer periodically compares lastMessageTimestamp with the current time; if the interval is greater than maxValue, it launches ReconnectTimer
Currently I'm not satisfied with this solution because it still generates exceptions, for instance ObjectDisposedException on TcpClient.NetworkStream. I'm looking for a clean and reusable TCP reconnecting client implementation that is able to cope with all the socket problems that can occur while connecting, disconnecting, and reading data.
If you have connection issues, you will always have exceptions. I think you have a sound outline; you just need to handle the exceptions. You could start with your own Socket class implementation and write the TCP/IP server. Starter code is at MS:
http://msdn.microsoft.com/en-us/library/fx6588te(VS.71).aspx
The C# code is half way down the VB page.
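The outline from the question, with the exceptions handled in one place, can be sketched roughly as follows. This is a simplified Python illustration, not the linked MSDN sample: `ReconnectingClient`, its parameters, and `max_sessions` are names invented for the example, and a single loop replaces the two timers. The loop is the only code that owns and closes the socket, which is one way to avoid using a connection that another timer has already disposed.

```python
import enum
import time


class ConnectionState(enum.Enum):
    OFFLINE = "Offline"
    ONLINE = "Online"
    CONNECTING = "Connecting"


class ReconnectingClient:
    def __init__(self, connect, on_data, retry_delay=1.0, sleep=time.sleep):
        self._connect = connect        # callable returning a connected socket
        self._on_data = on_data        # callback invoked with each chunk read
        self._retry_delay = retry_delay
        self._sleep = sleep
        self.state = ConnectionState.OFFLINE

    def run(self, max_sessions=None):
        """Connect, read until the connection fails, then reconnect."""
        sessions = 0
        while max_sessions is None or sessions < max_sessions:
            sessions += 1
            self.state = ConnectionState.CONNECTING
            try:
                conn = self._connect()
            except OSError:
                # Server unreachable: wait, then try again.
                self.state = ConnectionState.OFFLINE
                self._sleep(self._retry_delay)
                continue
            self.state = ConnectionState.ONLINE
            try:
                while True:
                    data = conn.recv(4096)
                    if not data:       # peer closed the connection cleanly
                        break
                    self._on_data(data)
            except OSError:
                pass                   # broken connection or read timeout
            finally:
                conn.close()           # closed here and nowhere else
                self.state = ConnectionState.OFFLINE
            self._sleep(self._retry_delay)
```

A real connection factory could be `lambda: socket.create_connection((host, port), timeout=30)` with `host` and `port` supplied by the caller. In Python 3 a read timeout raises an `OSError` subclass, so with that factory the timeout would play the role of the ConnectionCheckTimer: a silent link ends the session and triggers a reconnect.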
The class you should use is "SocketAsyncEventArgs".
I've used it in this project:
http://ts3querylib.codeplex.com/
Have a look at the AsyncTcpDispatcher class.
The TIBCO EMS user's guide (pg 292) says:
The backup server will work indefinitely to either A) become the primary server or B) reconnect to the primary server.
It also says clients may receive fail-over notification when the switch is successful (see also TIBCO EMS .NET reference pg 220).
I have some questions spinning off of these facts...
What kind of errors occur on the client side while the servers are attempting fail-over/reconnect?
What is the appropriate response from the client?
Get new Connection objects from the ConnectionFactory until one works?
Wait for fail-over notification? (are current Connection instances fixed at this time? or do I need to get a new instance?)
I hope the scenario is clear, any related information or advice would be appreciated too.
I can at least answer #1 above.
If you have enabled Tibems.SetExceptionOnFTSwitch(true); and have set up an exception handler to capture the messages the server sends to the client, you will see the following:
For single-server, non-fault tolerant connection failures:
"Connection has been terminated".
For fault-tolerant connection failures:
"Connection has performed fault-tolerant switch to "
If you attempt to publish while the connection is down, a TIBCO.EMS.IllegalStateException is thrown with the "Producer is closed" message.
For #2 above, I think the answer is to let the EMS library handle as much as possible. Once we got the EMS reconnect functionality to work, it gracefully kept trying to reconnect until the server became available again, and once it reconnected, it was as if there had never been a problem. The only gotcha is probably if you try to publish a message before the EMS connection is back. This is where the exception handler comes in: once notified that you are in failover mode, you can adjust exception handling on the publisher side to suppress the error until the connection is back. The thing I don't know is how to tell when all reconnect attempts have been exhausted.
Anyway, it seems our two worlds are closely related when it comes to EMS. I hope our findings (based on your comments on my questions) help you.
We use TEMS (Tibco EMS, a Tibco product for WCF), so it becomes a custom binding. We tried to break it by doing things like bouncing the server to force switchovers, and it works really well. Make sure you are using version 1.2, not 1.1, because with 1.1 you cannot do anything other than client acknowledgement.