I have been trying to set up a gRPC API capable of streaming events to a client. Basically, after a client has subscribed, the server will use gRPC's "Server Streaming" feature to send any new event to the client.
I expect there to be periods of inactivity, where the connection should remain active. However, with my current setup it seems Nginx is cutting the connection after 60 seconds of inactivity with the following exception at the client:
Grpc.Core.RpcException: Status(StatusCode="Internal", Detail="Error starting gRPC call. HttpRequestException: An error occurred while sending the request. IOException: The request was aborted. IOException: The response ended prematurely, with at least 9 additional bytes expected.", DebugException="System.Net.Http.HttpRequestException: An error occurred while sending the request.
---> System.IO.IOException: The request was aborted.
---> System.IO.IOException: The response ended prematurely, with at least 9 additional bytes expected.
The question is: why? And how can I prevent it?
My setup
The API is built in ASP.NET Core 3 (will probably upgrade to .NET 5 soon) and is running in a Docker container on a Digital Ocean server.
Nginx is also running in a Docker container on the server and works as a reverse proxy for the API (among other things).
The client is a simple C# client written in .NET Core and is run locally.
What have I tried?
I have tried connecting to the Docker container directly on the server using grpc_cli (bypassing Nginx), and the connection remains active for long periods of inactivity without any issues. So I can't see what else it could be, except Nginx. Also, most of Nginx's default timeout values seem to be 60 seconds.
I have tried these Nginx settings and various combinations of them, yet haven't found the right one (or the right combination) yet:
location /commands.CommandService/ {
    grpc_pass grpc://commandApi;
    grpc_socket_keepalive on;
    grpc_read_timeout 3000s; # These are recommended everywhere, but I haven't had any success
    grpc_send_timeout 3000s; #
    grpc_next_upstream_timeout 0;
    proxy_request_buffering off;
    proxy_buffering off;
    proxy_connect_timeout 3000s;
    proxy_send_timeout 3000s;
    proxy_read_timeout 3000s;
    proxy_socket_keepalive on;
    keepalive_timeout 90s;
    send_timeout 90s;
    client_body_timeout 3000s;
}
The most common suggestion for people with similar issues is to use grpc_read_timeout and grpc_send_timeout, but they don't work for me. I guess it makes sense since I'm not actively sending/receiving anything.
My client code looks like this:
var httpClientHandler = new HttpClientHandler();
var channel = GrpcChannel.ForAddress("https://myapi.com", new GrpcChannelOptions()
{
    HttpClient = new HttpClient(httpClientHandler) { Timeout = Timeout.InfiniteTimeSpan },
});
var commandService = channel.CreateGrpcService<ICommandService>();
var request = new CommandSubscriptionRequest()
{
    HandlerId = _handlerId
};
var cts = new CancellationTokenSource();
var context = new CallContext(callOptions: new CallOptions(deadline: null, cancellationToken: cts.Token));
await foreach (var command in commandService.SubscribeCommandsAsync(request, context))
{
    Console.WriteLine("Processing command: " + command.Id);
}
return channel;
To be clear, the call to the API works and I can receive commands from the server. As long as I keep sending commands from the API, everything works beautifully. But as soon as I stop for 60 seconds (I have timed it), the connection breaks.
A possible workaround would be to just keep sending a kind of heartbeat to keep the connection open, but I would prefer not to.
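If a heartbeat does turn out to be necessary, it doesn't have to live in the application protocol. On .NET 5+, the gRPC channel can send HTTP/2 PING frames on an otherwise idle connection via SocketsHttpHandler. A minimal sketch (the intervals are illustrative and not tested against this particular Nginx setup):

var handler = new SocketsHttpHandler
{
    // Send an HTTP/2 PING after 30s of inactivity so intermediaries
    // (like Nginx) don't see the connection as idle
    KeepAlivePingDelay = TimeSpan.FromSeconds(30),
    KeepAlivePingTimeout = TimeSpan.FromSeconds(10),
    KeepAlivePingPolicy = HttpKeepAlivePingPolicy.Always
};
var channel = GrpcChannel.ForAddress("https://myapi.com", new GrpcChannelOptions
{
    HttpHandler = handler
});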
Does anyone know how I can fix it? Am I missing something obvious?
UPDATE: Turns out it wasn't Nginx. After I updated the API and the client to .NET 5 the problem disappeared. I can't say in what version this was fixed, but at least it's gone in .NET 5.
Not sure this is an Nginx issue; it looks like a client connection problem.
Your results look very similar to an issue I had that should have been fixed in a .NET Core 3.0 patch. Try updating to a newer version of .NET and see if that fixes the problem.
Alternatively, it could be a problem with the max number of connections. Try setting MaxConcurrentConnections for the Kestrel server (in appsettings.json):
{
  "Kestrel": {
    "Limits": {
      "MaxConcurrentConnections": 100,
      "MaxConcurrentUpgradedConnections": 100
    }
  }
}
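If you prefer to set the limits in code rather than in appsettings.json, the same Kestrel options can be configured in the host builder. A sketch for ASP.NET Core 3.x (Startup is the usual startup class):

public static IHostBuilder CreateHostBuilder(string[] args) =>
    Host.CreateDefaultBuilder(args)
        .ConfigureWebHostDefaults(webBuilder =>
        {
            webBuilder.ConfigureKestrel(options =>
            {
                // Same limits as in the appsettings.json example above
                options.Limits.MaxConcurrentConnections = 100;
                options.Limits.MaxConcurrentUpgradedConnections = 100;
            });
            webBuilder.UseStartup<Startup>();
        });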
I am currently working with a server application we have designed to communicate with a Xamarin mobile app. We are using an old messaging library that makes a connection with a TcpClient and keeps the connection open (with a heartbeat message every 3 seconds). We added SSL to the library by wrapping the TcpClient stream with an SslStream. We have run the server application on Windows and it works well, but our ultimate target is Mono on a BeagleBone Black.
However, when we close the stream and the client on the mobile-app side and then attempt to initiate a new connection, SslStream.AuthenticateAsServer(...) will not complete on the server. If I completely close the mobile app, the server throws an exception; at that point, I can re-open the app and reconnect without any issue.
So it seems something low-level is not being closed on either the app or the server side. What is odd is that I run the exact same code on both, and when the server is running on Windows I don't have an issue.
Here is my code that closes/disposes the stream:
public async Task Disconnect()
{
    if (!UseAsync)
    {
        semaphore.Wait();
    }
    try
    {
        if (UseSSL)
        {
            SslStream?.Close();
        }
        Client?.GetStream()?.Close();
        Client?.Close();
    }
    catch (Exception ex) // Assuming we had an exception from trying to close the SslStream
    {
        logger.Error(ex, "Did not close/dispose correctly: {0}", ex.ToString());
    }
    finally
    {
        SslStream = null;
        Client = null;
        if (!UseAsync)
        {
            semaphore.Release();
        }
    }
}
Edit: It shouldn't be significant, since the issue seems to lie somewhere at the server and the client and server SSL code is almost identical, but in case someone asks, here is the client disconnect code:
public async Task Disconnect()
{
    try
    {
        if (UseSSL)
        {
            _sslStream?.Close();
        }
        Client?.GetStream()?.Close();
        Client?.Close();
    }
    catch // Assuming we had an exception from trying to close the SslStream
    {
        // Ignore exceptions since we've already closed them
    }
    finally
    {
        _sslStream = null;
        Client = null;
    }
}
Edit 2
It should also be noted that I've found at least one bug report that looks like the same issue I'm dealing with. It doesn't appear from that bug report that it was ever resolved, but I found other reports reflecting a similar issue in the Mono framework where it was resolved. Additionally, I have added code to send some "dummy" data from the client after the connect, and it seems to have no effect.
Edit 3: I ultimately receive this exception on the client side:
System.IO.IOException: The authentication or decryption has failed. ---> System.IO.IOException: The authentication or decryption has failed. ---> Mono.Security.Protocol.Tls.TlsException: The authentication or decryption has failed.
at Mono.Security.Protocol.Tls.RecordProtocol.EndReceiveRecord (System.IAsyncResult asyncResult) [0x0003a] in /Users/builder/data/lanes/3511/501e63ce/source/mono/mcs/class/Mono.Security/Mono.Security.Protocol.Tls/RecordProtocol.cs:430
at Mono.Security.Protocol.Tls.SslClientStream.SafeEndReceiveRecord (System.IAsyncResult ar, System.Boolean ignoreEmpty) [0x00000] in /Users/builder/data/lanes/3511/501e63ce/source/mono/mcs/class/Mono.Security/Mono.Security.Protocol.Tls/SslClientStream.cs:256
at Mono.Security.Protocol.Tls.SslClientStream.NegotiateAsyncWorker (System.IAsyncResult result) [0x00360] in /Users/builder/data/lanes/3511/501e63ce/source/mono/mcs/class/Mono.Security/Mono.Security.Protocol.Tls/SslClientStream.cs:533
Well... I almost feel silly answering this, but I feel others could end up in my situation out of equal ignorance, so hopefully this helps someone out with a similar architecture.
What happened is that I created a self-signed certificate and had it in my project folder, so the same certificate was being used on all of my "server" instances. Due to some internal SSL session caching, after connecting to one device, the client would cache that session's information and re-use it when connecting the next time. Since the self-signed certificate used the same host name for all devices, once the client tried to reconnect to device B after being connected to device A, it would attempt to re-use device A's cached information (as far as I can tell).
The reason this didn't initially make sense is that I could connect and disconnect to multiple different Windows servers over and over again; only when connecting to Mono servers did I see this issue. As a result, it still seems to be a bug in either Windows' or Mono's SslStream implementation (since, in a perfect world, they are identical), but unfortunately I don't have the time to dig through the de-compiled source code to find it. And frankly, it probably doesn't matter, because what I was doing broke the whole notion of an SSL connection anyway.
Ultimately, I created a function to programmatically generate a unique certificate for each device using Mono.Security and then provide a mechanism to provide the client with the unique hostname (even though the client is connecting directly to an IP address).
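I won't reproduce the Mono.Security code here, but to illustrate the idea: on more recent .NET (Framework 4.7.2+ / Core 2.0+), a per-device certificate could be generated with the built-in CertificateRequest API. A sketch, where deviceHostName is a hypothetical unique name assigned to each device:

using System;
using System.Security.Cryptography;
using System.Security.Cryptography.X509Certificates;

static X509Certificate2 CreateDeviceCertificate(string deviceHostName)
{
    using (var rsa = RSA.Create(2048))
    {
        // Each device gets its own subject name, so cached SSL session
        // information cannot be confused between devices
        var request = new CertificateRequest(
            $"CN={deviceHostName}",
            rsa, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
        return request.CreateSelfSigned(
            DateTimeOffset.UtcNow.AddDays(-1),
            DateTimeOffset.UtcNow.AddYears(5));
    }
}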
I have a server app and sometimes, when the client tries to connect, I get the following error:
NOTE: the "couldn't get stream from client or login failed" text is added by me in the catch statement.
The line at which it stops (sThread: line 96) is:
tcpClient = (TcpClient)client;
clientStream = tcpClient.GetStream();
sr = new StreamReader(clientStream);
sw = new StreamWriter(clientStream);
// line 96:
a = sr.ReadLine();
What may be causing this problem? Note that it doesn't happen all the time.
I received this error when calling a web-service. The issue was also related to transport level security. I could call the web-service through a website project, but when reusing the same code in a test project I would get a WebException that contained this message. Adding the following line before making the call resolved the issue:
System.Net.ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls12;
Edit
System.Net.ServicePointManager.SecurityProtocol - This property selects the version of the Secure Sockets Layer (SSL) or Transport Layer Security (TLS) protocol to use for new connections that use the Secure Hypertext Transfer Protocol (HTTPS) scheme only; existing connections are not changed.
I believe the SecurityProtocol configuration is important during the TLS handshake when selecting the protocol version.
TLS handshake - This protocol is used to exchange all the information required by both sides for the exchange of the actual application data by TLS.
ClientHello - A client sends a ClientHello message specifying the highest TLS protocol version it supports ...
ServerHello - The server responds with a ServerHello message, containing the chosen protocol version ... The chosen protocol version should be the highest that both the client and server support. For example, if the client supports TLS version 1.1 and the server supports version 1.2, version 1.1 should be selected; version 1.2 should not be selected.
This error usually means that the target machine is running, but the service that you're trying to connect to is not available. (Either it stopped, crashed, or is busy with another request.)
In English:
The connection to the machine (remote host/server/PC that the service runs at) was made but since the service was not available on that machine, the machine didn't know what to do with the request.
If the connection to the machine was not available, you'd see a different error. I forget what it is, but it's along the lines of "Service Unreachable" or "Unavailable".
Edit - added
It IS possible that this is being caused by a firewall blocking the port, but given that you say it's intermittent ("sometimes when the client tries to connect"), that's very unlikely. I didn't include that originally because I had ruled it out mentally before replying.
My specific case scenario was that the Azure app service had the minimum TLS version changed to 1.2
I don't know if that's the default from now on, but changing it back to 1.0 made it work.
You can access the setting inside "SSL Settings".
Following "Hans Vonn"'s reply, adding the line below before making the call resolved the issue:
System.Net.ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls12;
After adding the security protocol it worked fine, but I had to add the line before every API call, which is not healthy. Instead, I upgraded the .NET Framework version to at least 4.6, and it works as expected without requiring the line before every API call.
Not sure which of the fixes in these blog posts helped, but one of them sorted this issue for me ...
http://briancaos.wordpress.com/2012/07/06/unable-to-read-data-from-the-transport-connection-the-connection-was-closed/
The trick that helped me was to quit using WebRequest and use HttpWebRequest instead. HttpWebRequest allows me to play with three important settings:
and
http://briancaos.wordpress.com/2012/06/15/an-existing-connection-was-forcibly-closed-by-the-remote-host/
STEP 1: Disable KeepAlive
STEP 2: Set ProtocolVersion to Version10
STEP 3: Limiting the number of service points
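Put together, the three steps look roughly like this (a sketch; the URL and connection limit are placeholders):

// STEP 3: limit the number of connections per service point (process-wide)
System.Net.ServicePointManager.DefaultConnectionLimit = 24;

var request = (HttpWebRequest)WebRequest.Create("https://example.com/service"); // hypothetical URL
request.KeepAlive = false;                        // STEP 1: disable KeepAlive
request.ProtocolVersion = HttpVersion.Version10;  // STEP 2: set ProtocolVersion to Version10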
For those who may find this later, after .NET version 4.6, I was running into this problem as well.
Make sure that you check your web.config file for the following lines:
<compilation debug="true" targetFramework="4.5">
...
<httpRuntime targetFramework="4.5" />
If you are running 4.6.x or a higher version of .NET on the server, make sure you adjust these targetFramework values to match the version of the framework on your server. If your versions read less than 4.6.x, I would recommend upgrading .NET and using the newer version, unless your code depends on an older version (in which case you should consider updating it).
I changed the targetFrameworks to 4.7.2 and the problem disappeared:
<compilation debug="true" targetFramework="4.7.2">
...
<httpRuntime targetFramework="4.7.2" />
The newer frameworks sort this issue out by using the best protocol available and blocking insecure or obsolete ones. If the remote service you are trying to connect to or call is giving this error, it could be that they don't support the old protocols anymore.
Calls to HTTPS services from one of our servers were also throwing the "Unable to read data from the transport connection: An existing connection was forcibly closed" exception, while HTTP services worked fine. I used Wireshark to see that it was a TLS handshake failure; it ended up being that the cipher suites on the server needed to be updated.
This solved my problem. I added this line before the request is made:
System.Net.ServicePointManager.Expect100Continue = false;
It seemed there was a proxy in front of the server that did not support the 100-continue behavior.
This won't help for intermittent issues, but may be useful for other people with a similar problem.
I had cloned a VM and started it up on a different network with a new IP address but not changed the bindings in IIS. Fiddler was showing me "Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host" and IE was telling me "Turn on TLS 1.0, TLS 1.1, and TLS 1.2 in Advanced settings". Changing the binding to the new IP address solved it for me.
For some reason, the connection to the server was lost. It could be that the server explicitly closed the connection, or a bug on the server caused it to be closed unexpectedly. Or something between the client and the server (a switch or router) dropped the connection.
It might be server code that caused the problem, and it might not be. If you have access to the server code, you can put some debugging in there to tell you when client connections are closed. That might give you some indication of when and why connections are being dropped.
On the client, you have to write your code to take into account the possibility of the server failing at any time. That's just the way it is: network connections are inherently unreliable.
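For example, the read loop can treat a dropped connection as an expected event and reconnect instead of crashing. A sketch, where connectAsync and processAsync stand in for your own connect and read logic:

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

static async Task RunClientAsync(Func<Task<IDisposable>> connectAsync,
                                 Func<IDisposable, Task> processAsync,
                                 CancellationToken token)
{
    while (!token.IsCancellationRequested)
    {
        try
        {
            using (var connection = await connectAsync())
            {
                await processAsync(connection);
            }
        }
        catch (IOException)
        {
            // The server (or something in between) dropped the connection;
            // back off briefly and reconnect instead of failing outright
            await Task.Delay(TimeSpan.FromSeconds(5), token);
        }
    }
}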
I was sending the HttpWebRequest from a console app, and the UserAgent was null (by default), so setting the UserAgent worked, along with setting the SecurityProtocol. Note that the SecurityProtocol should be set before creating the HttpWebRequest:
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls;
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("yourpostURL");
req.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36";
The webrequest user agent is null by default. Just google "block empty user agent" and you'll find a strong desire of many web server admins to do just that.
Sending my request with
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0";
fixed the issue.
I got this problem in the past. I'm using PostgreSQL, and when I ran my program, sometimes it connected and sometimes it threw an error like that.
When I experimented with my code, I put my connection code on the very first lines of the form's constructor. Here is an example:
BEFORE:
public Form1()
{
    //HERE LIES SOME CODES FOR RESIZING MY CONTROLS DURING RUNTIME
    //CODE
    //CODE AGAIN
    //ANOTHER CODE
    //CODE YET AGAIN
    //STILL CODE!

    //Connect to database to generate auto number
    NpgsqlConnection iConnect = new NpgsqlConnection("Server=localhost;Port=5432;User ID=postgres;Password=pass;Database=DB");
    iConnect.Open();
    NpgsqlCommand iQuery = new NpgsqlCommand("Select * from table1", iConnect);
    NpgsqlDataReader iRead = iQuery.ExecuteReader();
    NpgsqlDataAdapter iAdapter = new NpgsqlDataAdapter(iQuery);
    DataSet iDataSet = new DataSet();
    iAdapter.Fill(iDataSet, "ID");
    MessageBox.Show(iDataSet.Tables["ID"].Rows.Count.ToString());
}
NOW:
public Form1()
{
    //Connect to database to generate auto number
    NpgsqlConnection iConnect = new NpgsqlConnection("Server=localhost;Port=5432;User ID=postgres;Password=pass;Database=DB");
    iConnect.Open();
    NpgsqlCommand iQuery = new NpgsqlCommand("Select * from table1", iConnect);
    NpgsqlDataReader iRead = iQuery.ExecuteReader();
    NpgsqlDataAdapter iAdapter = new NpgsqlDataAdapter(iQuery);
    DataSet iDataSet = new DataSet();
    iAdapter.Fill(iDataSet, "ID");
    MessageBox.Show(iDataSet.Tables["ID"].Rows.Count.ToString());

    //HERE LIES SOME CODES FOR RESIZING MY CONTROLS DURING RUNTIME
    //CODE
    //CODE AGAIN
    //ANOTHER CODE
    //CODE YET AGAIN
    //STILL CODE!
}
I think the program must establish the connection first before doing anything else; I don't know, correct me if I'm wrong. But according to my research, it's not a code problem - it was actually from the machine itself.
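As a side note to this answer: whichever ordering you use, the connection objects should be disposed when you're done with them. A minimal sketch of the same code with using blocks (same hypothetical connection string as above):

using (var conn = new NpgsqlConnection("Server=localhost;Port=5432;User ID=postgres;Password=pass;Database=DB"))
using (var query = new NpgsqlCommand("Select * from table1", conn))
using (var adapter = new NpgsqlDataAdapter(query))
{
    conn.Open();
    var dataSet = new DataSet();
    adapter.Fill(dataSet, "ID");
    MessageBox.Show(dataSet.Tables["ID"].Rows.Count.ToString());
} // the connection is closed and disposed here, even if Fill throws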
This issue sometimes occurs because of a proxy server implemented in front of the web server. You can bypass the proxy server by putting this line before calling the send service:
System.Net.ServicePointManager.Expect100Continue = false;
We had a very similar issue whereby a client's website was trying to connect to our Web API service and getting that same message. This started happening completely out of the blue when there had been no code changes or Windows updates on the server where IIS was running.
In our case, it turned out that the calling website was using a version of .NET that only supported TLS 1.0, and for some reason the server where our IIS was running appeared to have stopped accepting TLS 1.0 calls. To diagnose that, we had to explicitly enable the TLS versions via the registry on the IIS server and then restart it. These are the reg keys:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.0\Client]
"DisabledByDefault"=dword:00000000
"Enabled"=dword:00000001

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.0\Server]
"DisabledByDefault"=dword:00000000
"Enabled"=dword:00000001

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.1\Client]
"DisabledByDefault"=dword:00000000
"Enabled"=dword:00000001

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.1\Server]
"DisabledByDefault"=dword:00000000
"Enabled"=dword:00000001

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.2\Client]
"DisabledByDefault"=dword:00000000
"Enabled"=dword:00000001

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.2\Server]
"DisabledByDefault"=dword:00000000
"Enabled"=dword:00000001
If that doesn't do it, you could also experiment with adding the entry for SSL 2.0:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\SSL 2.0\Client]
"DisabledByDefault"=dword:00000000
"Enabled"=dword:00000001
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\SSL 2.0\Server]
"DisabledByDefault"=dword:00000000
"Enabled"=dword:00000001
My answer to another question here has the PowerShell script that we used to add the entries.
NOTE: Enabling old security protocols is not a good idea; the right answer in our case was to get the client website to update its code to use TLS 1.2. But the registry entries above can help diagnose the issue in the first place.
The reason this was happening to me was I had a recursive dependency in my DI provider. In my case I had:
services.AddScoped(provider => new CfDbContext(builder.Options));
services.AddScoped(provider => provider.GetService<CfDbContext>());
The fix was to just remove the second scoped service registration:
services.AddScoped(provider => new CfDbContext(builder.Options));
Had a similar problem and was getting the following errors depending on what app I used and if we bypassed the firewall / load balancer or not:
HTTPS handshake to [blah] (for #136) failed. System.IO.IOException Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host
and
ReadResponse() failed: The server did not return a complete response for this request. Server returned 0 bytes.
The problem turned out to be that the SSL server certificate got missed and wasn't installed on a couple of servers.
For me, it was an issue where the IIS binding had the IP address of the web server.
I changed it to use all unassigned IPs and my application started to work.
I experienced the error with Python CLR running an MDX query against Microsoft Analysis Services using ADOMD.
I solved it with the help of Hans Vonn's answer; here is the Python version:
clr.AddReference("System.Net")
from System.Net import ServicePointManager, SecurityProtocolType
ServicePointManager.SecurityProtocol = SecurityProtocolType.Ssl3 | SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls
I received this error simply because I was attempting to make an http connection to an https-only server. Changing the request protocol in the URI from http to https thus resolved it.
This is how I solved the issue:
int i = 0;
while (stream.DataAvailable == true)
{
    bytes[i] = ((byte)stream.ReadByte());
    i++;
}
data = System.Text.Encoding.ASCII.GetString(bytes, 0, i);
Console.WriteLine("Received: {0}", data);
I had a third-party application (Fiddler) running to try to see the requests being sent. Closing this application fixed it for me.
If you have a https certificate on the domain, make sure you have the https binding to the domain name in IIS.
In IIS -> Select your domain -> Click on Bindings
Site Bindings Window opens up. Add a binding for https.
Try checking whether you can establish the handshake in the first place. I had this issue before when uploading a file, and I only figured out that the issue was a nonexistent route when I removed the upload and checked whether it could log in given the parameters.
Another option would be to check the error code generated, using a try-catch block and first catching a WebException.
In my case, the error code was "SendFailure" because of a certificate issue on the HTTPS URL; once I hit HTTP, that got resolved.
https://learn.microsoft.com/en-us/dotnet/api/system.net.webexceptionstatus?redirectedfrom=MSDN&view=netframework-4.8
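A sketch of that approach (the URL is a placeholder):

try
{
    var request = (HttpWebRequest)WebRequest.Create("https://example.com/api"); // hypothetical URL
    using (var response = (HttpWebResponse)request.GetResponse())
    {
        Console.WriteLine(response.StatusCode);
    }
}
catch (WebException ex)
{
    // e.g. WebExceptionStatus.SendFailure for the certificate issue described above
    Console.WriteLine(ex.Status);
}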
This problem occurs when the service is unavailable within the proxy server. We can bypass the proxy server by applying this line before starting the service:
System.Net.ServicePointManager.Expect100Continue = false;
In my case, I resolved this problem by setting the correct API URL in my application.
It was a connection error between the application and the API.
I've installed memcached on Windows as a service, listening on the default port 11211. I know this works, because I can telnet to the server and carry out get / set commands without any problems.
I've then downloaded the Enyim Memcached client (Enyim.Caching.dll, version 2.7) and written a simple test program:
var mcc = new MemcachedClientConfiguration();
mcc.AddServer("127.0.0.1:11211");
mcc.SocketPool.ReceiveTimeout = new TimeSpan(0, 0, 10);
mcc.SocketPool.ConnectionTimeout = new TimeSpan(0, 0, 10);
mcc.SocketPool.DeadTimeout = new TimeSpan(0, 0, 20);
using (MemcachedClient client = new MemcachedClient(mcc))
{
    client.Store(StoreMode.Set, "enyimtest", "test value");
    Console.WriteLine(client.Get<string>("enyimtest"));
}
I know this connects to my server correctly, as calling the stats command in telnet shows an increase in the number of connections. However, it doesn't call get or set, as the cmd_get and cmd_set stats counters remain constant. The call to client.Get returns null.
The program does not error in any way. Does anyone know what could prevent the Enyim client from working in this situation?
EDIT:
Looks like this is caused by a timeout. After configuring log4net to capture the client's logging output, I found it contained the following (in addition to other stack trace items):
2010-12-17 14:26:37,579 [1] ERROR Enyim.Caching.Memcached.MemcachedNode [(null)] - System.IO.IOException: Failed to read from the socket '172.23.0.100:11211'. Error: TimedOut
2010-12-17 14:26:37,626 [1] WARN Enyim.Caching.Memcached.MemcachedNode.InternalPoolImpl [(null)] - Marking node 172.23.0.100:11211 as dead
I still don't understand why it was timing out, though.
After an hour or so of playing around, I've found the answer. I used Wireshark to look at the network traffic to and from the server. I noticed that when using the Enyim client, the messages looked nothing like those when using telnet. In particular, I couldn't read the protocol commands going across the wire when using the Enyim client.
Therefore, I concluded that the Enyim client was using a different protocol.
A second protocol was added to the memcached server in version 1.4, which is the binary protocol. Prior to that, only the text protocol was supported. The latest Windows binary I can find for memcached is the one from Jellycan, and it is only version 1.2.6.
The Enyim client is configured to use the Binary protocol by default, which was just ignored by my server as it couldn't be understood.
I added the following line to my test program, and things started working immediately:
mcc.Protocol = MemcachedProtocol.Text;
I ran into the same issue above. I too struggled to find a newer version of memcached for Windows, but did find one eventually.
I've put links to the latest binaries along with other useful resources here.
I can't reconnect to the MQQueueManager after a while, as an exception (reason code 2059 - MQRC_Q_MGR_NOT_AVAILABLE) is thrown when I'm constructing a new MQQueueManager object. My client app is written in .NET/C# and I'm running it on Win2003.
However, I can connect to the QM after I have restarted my client app. This would indicate that some state is incorrect in the QM libraries. How can I reset that state in code so that I can reconnect to the QM? Is there a way to reset/disconnect all active TCP connections to the QM from client app code?
My connection code:
Hashtable properties = new Hashtable();
properties.Add( MQC.HOST_NAME_PROPERTY, Host );
properties.Add( MQC.PORT_PROPERTY, Port );
properties.Add( MQC.USER_ID_PROPERTY, UserId );
properties.Add( MQC.PASSWORD_PROPERTY, Password );
properties.Add( MQC.CHANNEL_PROPERTY, ChannelName );
properties.Add( MQC.TRANSPORT_PROPERTY, TransportType );
// Following line throws an exception randomly
MQQueueManager queueManager = new MQQueueManager( qmName, properties );
Stack trace:
Source: amqmdnet
CompletionCode: 2
ReasonCode: 2059
Reason: 2059
Stack Trace:
at IBM.WMQ.MQBase.throwNewMQException()
at IBM.WMQ.MQQueueManager.Connect(String queueManagerName)
at IBM.WMQ.MQQueueManager..ctor(String qmName, Hashtable properties)
at WebSphereMQOutboundAdapter.WebSphereMQOutbound.ConnectToWebSphereMQ()
Connections are per-thread so if you are attempting to create a new connection while the previous QMgr object is still instantiated, you would get this. If you close the previous connection and destroy the object before creating a new object you should be OK. Since queues and other WMQ objects depend on a connection handle these will also need to be destroyed and then reinstantiated after the new connection is made.
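A sketch of that clean-up-before-reconnect pattern (queue and queueManager are hypothetical fields holding the previous objects):

try
{
    queue?.Close();               // destroy dependent objects first
    queueManager?.Disconnect();   // then release the old connection handle
}
catch (MQException)
{
    // ignore errors from an already-broken connection
}
queue = null;
queueManager = null;

// Only now is it safe to construct a new connection
queueManager = new MQQueueManager(qmName, properties);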
There are of course a few other explanations for this behavior but these are much less likely. For example, it is possible that a channel exit or (in WMQ v7) configuration could be limiting the number of simultaneous connections from a given IP address. When a connection is severed rather than closed, the channel agent holding the connection on the QMgr side has to time out before the QMgr sees the connection as closed. If connection limiting is in place, these "ghost" connections reduce the available pool. But as I said, this is far less common than programs not cleaning up old objects prior to a reconnect attempt.
There is also the possibility that this is a bug. To reduce that possibility, and for a variety of other reasons such as WMQ v6 going end of life next year, I'd recommend use of WMQ v7.0.1.2 for this project, at both the client and server side. In general, you can use v7.0.1.2 client with a v6.0.x server as long as you stick to v6 functionality. Among other things, .Net code is better integrated in v7 and the Cat-3 SupportPacs are now included in the base install media rather than a separate download.
After some months of fighting with this issue and IBM support, the best solution I found is to change the connect/disconnect code around the IBM MQ driver.
Instead of calling manager.Disconnect() and manager.Close() for each GET/PUT, connect once and then reconnect only if you hit an exception (like losing the connection).
What I've figured out is that a bug exists in the IBM MQ driver that caches some information for each connect/disconnect. When that buffer is full, the application stops reconnecting.
The driver version (client DLLs) I have this issue with is 7.0.1.6.