WCF - retry doesn't work? - c#

I have the following config for my binding:
<binding name="wshttp" openTimeout="00:01:00" sendTimeout="00:02:00" receiveTimeout="00:03:00" closeTimeout="00:04:00">
..snap
<reliableSession inactivityTimeout="00:05:00" maxRetryCount="8" ordered="true"/>
..snap
</binding>
My expectation here is that when the client proxy fails to send within 2 minutes, the request should be retried. However:
16:37:49,242 INFO Start process
16:39:49,588 FATAL The request operation did not complete within the allotted timeout of 00:02:00
So the application throws an error within 2 minutes, and doesn't retry the request. What should I do to get it to start retrying?

The WCF implementation of WS-ReliableMessaging does not work that way. If a proxy operation times out, no (further) retries will be performed. The retry logic of the protocol applies to messages that have been passed through to the underlying transport but have not been acknowledged at the RM layer, bounded ultimately by the MaxRetryCount and the InactivityTimeout.
Once you receive a CommunicationException or TimeoutException from your proxy channel, you can consider the session to be terminated. At this point you'll need to reconnect and start over (or, if you save some state and know where you "left off", you might be able to recover -- but that logic would be your responsibility to implement).
Basically, you should pass a timeout value which represents the longest duration you're willing to wait for the communication operation to complete. If that fails, then you must Abort() and start over.
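If you want retries at the application level, a rough sketch might look like this (the proxy type MyServiceClient and its Send operation are illustrative names, not from your config):
// application-level retry sketch; MyServiceClient and Send are illustrative names
void SendWithRetry(string message, int maxAttempts = 3)
{
    for (int attempt = 1; attempt <= maxAttempts; attempt++)
    {
        var client = new MyServiceClient();
        try
        {
            client.Send(message);     // bounded by sendTimeout (00:02:00 above)
            client.Close();
            return;                   // success: stop retrying
        }
        catch (TimeoutException)
        {
            client.Abort();           // the session is gone: abort and retry with a fresh proxy
        }
        catch (CommunicationException)
        {
            client.Abort();           // non-timeout fault: abort and rethrow
            throw;
        }
    }
    throw new TimeoutException("All retry attempts timed out.");
}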

Related

How long is a WCF connection held open?

I'm running a small WCF client application that connects to an IIS server every few minutes to download data. There are about 500 of these clients for 2 or 3 servers, and my basic code is something like this:
Client connection = null;
try
{
    connection = new Client();
    List<TPointer> objects = connection.GetList();
    // Some work on List<T>
    foreach (TPointer pointer in objects)
    {
        T data = GetDataFromStream(pointer, connection);
        // Some additional processing on T
    }
    connection.SendMoreData();
    // More work
}
catch (...)
{
    // Exception handling for various exceptions
}
finally
{
    // Handle Close() or Abort()
    if (connection != null)
        connection.Close();
}
When I simulate running all the clients at once for large amounts of TPointers, I start encountering the following error:
System.TimeoutException: The request channel timed out while waiting for a reply after 00:01:00.
That seems like one of those errors that can occur for any number of reasons. For all I know the server could just be swamped, or I could be requesting too large/too many objects and it's taking too long to download (a whole minute though?). Increasing the timeout is an option, but I'd like to understand the actual problem instead of fixing the symptom.
Given I have no control over the server, how can I streamline my client?
I'm not actually sure what the "request channel" mentioned in the timeout refers to. Does the timeout start ticking from when I create new Client() until I call Client.Close()? Or does each specific request I'm sending to the server (e.g. GetList or GetData) get another minute? Is it worth my while to close Client() in between each call to the server? (I'm hoping not... that would be ugly)
Would it be helpful to chunk up the amount of data I'm receiving? The GetList() call can be quite large (running into the thousands). I could try obtaining a few objects at a time and jobbing off the post-processing for later...
Edit:
Since a few people mentioned streaming:
The Client binding uses TransferMode.StreamedResponse.
GetDataFromStream() uses a Stream derived from TPointer, and SendMoreData()'s payload size is more or less negligible.
Only GetList() actually returns a non-stream object, but I'm unclear as to whether or not that affects the method of transfer.
Or does each specific request I'm sending to the server (e.g. GetList or GetData) get another minute?
The timeout applies to each operation individually and resets between calls. If your timeout is one minute, the clock starts the moment you invoke the method.
What I'd do is implement a retry policy, use an async version of the client's method, and use a CancellationToken or call Abort() on your client when it's taking too long. Alternatively, you can raise the OperationTimeout on the client's InnerChannel:
client.InnerChannel.OperationTimeout = TimeSpan.FromMinutes(10);
You can use that during your operation and in your retry policy you can abort entirely and reset your timeout after your retries have failed or succeeded.
Alternatively, you can try to stream your results and see if you can operate individually on them, but I don't know if keeping that connection open will trip the timeout. You'll have to hold off on operating on your collection until you have everything.
Also, set TransferMode = TransferMode.StreamedResponse in your binding.
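A rough sketch of that retry policy, assuming Client derives from ClientBase (so it exposes InnerChannel) and that a task-based GetListAsync() exists (the question only shows a synchronous GetList()):
// retry sketch; GetListAsync is an assumed task-based version of GetList
async Task<List<TPointer>> GetListWithRetryAsync(int maxAttempts = 3)
{
    for (int attempt = 1; ; attempt++)
    {
        var client = new Client();
        client.InnerChannel.OperationTimeout = TimeSpan.FromMinutes(10);
        try
        {
            List<TPointer> result = await client.GetListAsync();
            client.Close();
            return result;
        }
        catch (TimeoutException) when (attempt < maxAttempts)
        {
            client.Abort();   // a timed-out channel is unusable: abort and retry
        }
        catch
        {
            client.Abort();   // any other failure (or the last attempt): abort and rethrow
            throw;
        }
    }
}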
I believe the timeout you are hitting is the time to first response. In your scenario the first response is the whole response, since you are returning the full list: more data, more time. You might want to consider streaming the data instead of returning a full list.
I suggest modifying both your web.config (WCF side) and app.config (client side), adding a binding section like this (here a timeout of 25 minutes instead of the default 1 minute):
<bindings>
  <wsHttpBinding>
    <binding name="WSHttpBinding_IYourService"
             openTimeout="00:25:00"
             closeTimeout="00:25:00"
             sendTimeout="00:25:00"
             receiveTimeout="00:25:00">
    </binding>
  </wsHttpBinding>
</bindings>
Given I have no control over the server, how can I streamline my client?
Basically you cannot do this when you only have control over the client. It seems like the operations return no Stream (unless the pointers are types which derive from Stream).
If you want to know more about how to generally achieve streaming just read up on this MSDN article.
Everything you can do on the client is scratching at the surface of the problem. Like The Anathema proposed in his answer, you can create retry logic and/or set the timeout to a higher value. But to eradicate the root of the problem you'd need to investigate the service itself so that it can handle a higher number of requests, or have instances of the service running on multiple servers with a load balancer in front.
I ended up going with a combination of the answers here, so I'll just post an answer. I chunked GetList() to a certain size to avoid keeping the connection open so long (it also had a positive effect on the code in general, since I was keeping less in memory temporarily.) I already have a retry policy in place, but will also plan on messing with the timeout, as The Anathema and a couple others suggested.

Reducing time out period on WCF Channel Dispose calls?

Calling Dispose() on a WCF channel will sometimes block for one minute until a TimeoutException is raised. This seems to be generally if the server has torn down the channel from its end already.
Since we're trying to dispose of the channel and given this usually happens when the channel has already been torn down from the other end, is it possible to reduce the time out period used for the Dispose() calls?
Calling Dispose() on a WCF channel will sometimes block for one minute
until a TimeoutException is raised. This seems to be generally if the
server has torn down the channel from its end already.
Not always. Depending on your binding and your channel management, you cannot close/dispose a channel until the service has finished processing the operation.
There is a useful article here that explains why one-way calls are not always one-way, i.e. why closing a channel can block. It can help you choose another binding configuration.
Since we're trying to dispose of the channel and given this usually
happens when the channel has already been torn down from the other
end, is it possible to reduce the time out period used for the
Dispose() calls?
This can be managed by the client's timeout settings in the client config file. There are four settings (Open, Send, Receive & Close). This depends on your binding but it is generally like this (one minute here):
<binding openTimeout="00:01:00"
         closeTimeout="00:01:00"
         sendTimeout="00:01:00"
         receiveTimeout="00:01:00">
</binding>
Here is the deal: with these settings, the WCF client side throws a TimeoutException after this duration when the request takes 1 min 30 s to process on the service.
Calling Dispose or Close is pretty much the same thing and will try to close the channel. You have to be very aware of the Dispose/Close issue: closing a channel can throw exceptions, leaving the channel open. Read about how to avoid this here.
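For reference, a minimal sketch of that Close/Abort pattern (CloseSafely is an illustrative helper name, not part of WCF):
// generic Close/Abort pattern for any WCF channel or proxy
static void CloseSafely(ICommunicationObject channel)
{
    try
    {
        channel.Close();      // graceful close, bounded by closeTimeout
    }
    catch (CommunicationException)
    {
        channel.Abort();      // channel already faulted: tear it down immediately
    }
    catch (TimeoutException)
    {
        channel.Abort();      // Close did not complete in time: abort instead
    }
}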
I'm also very curious why calling Dispose takes 60 seconds in your context. That suggests something is wrong in your WCF implementation.

ServiceModel error logged on server when closing client application: existing connection was forcibly closed by the remote host

I have a self-hosted WCF service and several client processes ... everything works well: clients start, make several service calls, and exit. However, on the server my error logs (where I forward error-level trace messages from System.ServiceModel) have an entry every time a client application closes (this does not coincide with a service method call).
I'm using a custom tcp binding on .NET 4.5 ...
<bindings>
  <customBinding>
    <binding name="tcp">
      <security authenticationMode="SecureConversation" />
      <binaryMessageEncoding compressionFormat="GZip" />
      <tcpTransport />
    </binding>
  </customBinding>
</bindings>
The client derives from ClientBase, and I do call Close() on the client without issue. Several instances of the ClientBase are created and Closed during operation with no errors.
My guess is that the client is keeping a socket open for re-use (a sensible optimization). Then at application exit, that socket is getting zapped.
Does this represent an error that I should fix? If it's not really an "error", can I nonetheless avoid the situation somehow so as not to put junk-to-ignore in my error logs?
The client binding configuration is exactly the same as the server (naturally). Here is my calling code... note I use the ServiceHelper class from this question.
using (var helper = new ServiceHelper<ARAutomatchServiceClient, ServiceContracts.IARAutomatchService>())
{
    return await helper.Proxy.GetBatchesAsync(startDate, DateTime.Today.AddDays(5));
}
Specifically, the "Error" level trace events on the server that I am seeing contains these messages (stack traces and other elements cleaned for brevity):
System.ServiceModel Error: 131075 :
System.ServiceModel.CommunicationException: The socket connection was
aborted. This could be caused by an error processing your message or a
receive timeout being exceeded by the remote host, or an underlying
network resource issue.
System.Net.Sockets.SocketException: An existing connection was
forcibly closed by the remote host NativeErrorCode: 2746
All of the unwanted error messages that I have seen in the ServiceModel trace logs come from connections in the connection pool timing out on the server, or from the client dropping a connection when the client process exits.
If a pooled connection times out on the server, there are some trace messages written on the server immediately on timing out and then on the client when starting the next operation. These are "Error" level trace messages.
If the client process exits before closing the connection, you get a different Error level trace message on the server immediately when the client process exits.
The fact that these are Error level trace messages is particularly annoying because I typically log these even in production environments ... but it appears they should mostly just be ignored, since they are the result of a routine pooled connection timing out.
One description of a pooled connection closing issue has been addressed by Microsoft here.
http://support.microsoft.com/kb/2607014
The article above advises that ServiceModel handles the exception and that it is safe to ignore when you see it in the trace logs. That particular situation is recorded as an "Information" level event, which again does not bother me as much as the "Error" level events that I'm actually logging. I tried to "filter" these messages from the logs, but it was rather difficult.
Naturally you can avoid the situation altogether by explicitly closing the pooled connections (on the client) before they timeout (on the server). In order for a client to close a connection in the connection pool (for a WCF binding with tcp transport) the only thing I know that works is to explicitly Close the ChannelFactory instance. In fact if you are not caching these instances (and not using ClientBase which usually caches them for you) then you will have no problems! If you DO want to cache your ChannelFactory instances, then you should at least explicitly close them before the application exits, which is not advice that I've seen ANYWHERE. Closing these before the client application exits will take care of one of the major sources of dropped sockets that get logged as ServiceModel "Error" traces on the server.
Here is a little code for closing a channel factory:
try
{
    if (channelFactory != null)
    {
        if (channelFactory.State != CommunicationState.Faulted)
        {
            channelFactory.Close();
        }
        else
        {
            channelFactory.Abort();
        }
    }
}
catch (CommunicationException)
{
    channelFactory.Abort();
}
catch (TimeoutException)
{
    channelFactory.Abort();
}
catch (Exception)
{
    channelFactory.Abort();
    throw;
}
finally
{
    channelFactory = null;
}
Just where you call that code is a bit tricky. Internally I schedule it in AppDomain.ProcessExit to "make sure" it gets called, but I also suggest that consumers of my service base classes remember to call the "close cached factories" code explicitly sometime earlier than AppDomain.ProcessExit, since ProcessExit handlers are limited to ~3 seconds to complete. Of course processes can close abruptly and never call this, but that's OK so long as it doesn't happen often enough to flood your server logs.
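For illustration, the registration can be as small as this (CloseCachedChannelFactories is a hypothetical helper wrapping the code above):
AppDomain.CurrentDomain.ProcessExit += (sender, e) => CloseCachedChannelFactories();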
As far as the pooled connections timing out ... you can just raise the TCP transport "ConnectionPool" timeout value on the server to something very high (several days) and probably be OK in some situations. This would at least make it unlikely or infrequent that a connection would time out on the server. Note that having a shorter timeout value on the client doesn't appear to affect the situation in any way, so that setting might as well be left at the default. (Reasoning: the client's connection will be observed as timed out the next time the client needs a connection, but by this time the server will either have timed out already and logged the error, or, if not, the client will close it and create a new connection, restarting the server timeout period. However, simply using the connection would also restart the server timeout period.)
So again, you must have a high enough connection pool timeout on the server, regardless of the client settings, to cover the period of inactivity on your client. You can further reduce the likelihood of a pooled connection timing out by reducing the size of the pool on the client (maxOutboundConnectionsPerEndpoint) so that the client doesn't open more connections than are really needed, leaving them to go unused and then eventually time out on the server.
Configuring the connection pool for a binding has to be done in code for the built-in bindings (like netTcpBinding). For custom bindings you can do it in configuration like this (here I set a server to time out in 2 days, and to pool only 100 connections):
<customBinding>
  <binding name="tcp">
    <security authenticationMode="SecureConversation"/>
    <binaryMessageEncoding compressionFormat="GZip"/>
    <tcpTransport>
      <connectionPoolSettings idleTimeout="2.00:00:00"
                              maxOutboundConnectionsPerEndpoint="100" />
    </tcpTransport>
  </binding>
</customBinding>
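For a built-in binding such as netTcpBinding, a rough sketch of the equivalent in code (converting the binding to a CustomBinding to reach the TCP transport element) might look like this:
// sketch: configure the connection pool of a NetTcpBinding in code
var netTcp = new NetTcpBinding();
var elements = netTcp.CreateBindingElements();
var transport = elements.Find<TcpTransportBindingElement>();
transport.ConnectionPoolSettings.IdleTimeout = TimeSpan.FromDays(2);
transport.ConnectionPoolSettings.MaxOutboundConnectionsPerEndpoint = 100;
var binding = new CustomBinding(elements);   // use this CustomBinding for the endpoint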
These two approaches together (raising the server-side timeout and closing ChannelFactory instances when clients exit) might solve your problem, or at least reduce the number of "safe to ignore" messages significantly. Be sure the server timeout for the connection pool is AT LEAST as long as the client's, so that the connection will time out on the client first rather than on the server (this appears to be handled more gracefully in ServiceModel, with fewer trace messages, and is exactly the situation referred to in the knowledge base article linked above).
On the server, you'll ideally want enough maxOutboundConnectionsPerEndpoint to serve (number of clients) x (their number of pooled connections). Otherwise you might end up with pool overruns on the server, which emit Warning level trace events. That's not too bad. If there are no connections available in the server's pool when a new client tries to connect, this generates a bunch of events on the client and server. In all of these cases (even if the pool on the server is constantly overrun) WCF will recover and function, just not optimally. That is my experience at least ... it's possible that if the "LeaseTime" for a new connection times out waiting for a server connection pool spot to open up (the default is 5 minutes), then it will just fail altogether? Not sure...
A final suggestion might be to periodically close your ChannelFactory objects and recycle the cached copy. This may have only a limited impact on performance, assuming the client doesn't try to use the service exactly while the ChannelFactory instance is recycling. For instance you might schedule recycles of cached ChannelFactory instances for 5 minutes after it is created (not after it was last used, since it might have multiple pooled connections, one of which has not been used for a while). Then set your connection pool timeout on the server to be 10 minutes or so. But be sure the server timeout is a good bit over the ChannelFactory recycle period, because when you go to recycle the ChannelFactory you might have to wait until a pending operation is completed (meanwhile some unused pooled connection possibly just timed out on the server).
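A rough sketch of such a recycle, assuming you cache a single factory; CreateFactory and CloseChannelFactory are hypothetical placeholders for your own factory construction and the Close/Abort code above:
// periodic ChannelFactory recycle sketch; CreateFactory/CloseChannelFactory are placeholders
private ChannelFactory<ServiceContracts.IARAutomatchService> cachedFactory;
private System.Threading.Timer recycleTimer;

private void StartFactoryRecycling()
{
    cachedFactory = CreateFactory();
    recycleTimer = new System.Threading.Timer(_ =>
    {
        var oldFactory = System.Threading.Interlocked.Exchange(ref cachedFactory, CreateFactory());
        CloseChannelFactory(oldFactory);   // the Close/Abort logic shown earlier
    }, null, TimeSpan.FromMinutes(5), TimeSpan.FromMinutes(5));
}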
All of these things are micro-optimizations that may not be worth doing ... but if you log Error level ServiceModel trace events in production, you'll probably want to do SOMETHING (even if it is disabling connection pooling or ChannelFactory caching) or your logs will likely be swamped with "safe to ignore" errors.

Errors from wcf service method after few hours of work

I have a problem with WCF services.
A WCF service method is invoked by an application. This app calls the service method very often (dozens of times per minute). The service method is called properly (with Close() at the end, or Abort() after an exception). The strangest thing for me is that after a few hours my app starts getting errors from the services:
An error occurred while receiving the HTTP response to http://domain.xx/MyService.svc. This could be due to the service endpoint binding not using the HTTP protocol. This could also be due to an HTTP request context being aborted by the server (possibly due to the service shutting down). See server logs for more details. The underlying connection was closed: An unexpected error occurred on a receive. Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
or this one:
The request channel timed out while waiting for a reply after 00:15:00. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. The time allotted to this operation may have been a portion of a longer timeout. The HTTP request to 'http://domain.xx/MyService.svc' has exceeded the allotted timeout of 00:15:00. The time allotted to this operation may have been a portion of a longer timeout. The operation has timed out
What can cause such errors? Why do the services work properly for the first few hours?
I would check your application log. From my experience, those errors tend to be more server related than code related. IIS may be having problems.
I know you mentioned it, but it looks like you are not closing your channels right. Also, make sure you do NOT use the same client for many server calls. Just create one, use it for a single call, and dispose of it.
Here's a good read about closing WCF channels.

WCF Connections exceeding max connections when using Asynchronous pattern

I have a simple WCF service that I'm communicating with asynchronously.
The thing I don't like is that when calling EndServiceMethod(IAsyncResult),
if I forget to call the Close() method, the service will actually leave the connection open, and then all remaining connections will fail with timeout exceptions once WCF reaches its maximum concurrent connection count.
I've tried applying the [ServiceBehavior(InstanceContextMode=InstanceContextMode.PerCall)]
attribute to the service contract, which doesn't appear to have any effect on the state of the connection from the service.
Perhaps I've implemented it incorrectly?
Any ideas or suggestions?
I'm trying to locate a behavior pattern for the WCF that allows the clients to make a request, and then the server to respond to the request and then assume that the connection is finished and can be terminated.
This is actually a tricky problem.
On the one hand, if you do not close the connection it will remain open until it times out (1 min); under load you will hit the max connections (default 10).
On the other hand, you are calling the service asynchronously, so if you close the connection before the callback is received, the callback will be lost.
There are a few things that you could try:
increase the max connections
close the connection in the callback handler (see the sketch below)
reduce the length of the timeout
Specifies the throttling mechanism of a Windows Communication Foundation (WCF) service.
http://msdn.microsoft.com/en-us/library/ms731379%28v=VS.90%29.aspx
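For the "close in the callback" option, a minimal sketch (MyServiceClient and BeginServiceMethod/EndServiceMethod stand in for your generated proxy and its Begin/End pair):
// close the client inside the async callback once the reply has arrived
var client = new MyServiceClient();
client.BeginServiceMethod(ar =>
{
    try
    {
        var result = client.EndServiceMethod(ar);
        client.Close();       // reply received, so the connection can be released
    }
    catch (CommunicationException)
    {
        client.Abort();       // faulted channel: Close would throw
    }
    catch (TimeoutException)
    {
        client.Abort();
    }
}, null);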
I don't know if this helps:
You can set the binding so that
Security is set to none
Reliable sessions are disabled
<wsHttpBinding>
  <binding name="MyWsHttpBinding">
    <reliableSession enabled="false"/>
    <security mode="None" />
  </binding>
</wsHttpBinding>
I've discovered that by doing this I can open an unlimited number of channels and "forget" to close them.
Then you have to ask if that's an acceptable configuration for your circumstances.
I wouldn't use the Asynchronous pattern with WCF. Instead I would just use synchronous calls with normal using blocks to ensure the connection is closed. Then I would wrap the whole mess in a normal Task (.NET 4.0) or ThreadPool work item.
Closing any type of connection when you don't need it anymore is just basic developer responsibility. There is nothing to complain about. Close the connection and you will not have this problem. Trying to solve missing Close calls in any other way is nonsense.
