How long is a WCF connection held open? - c#

I'm running a small WCF client application that connects to an IIS server every few minutes to download data. There are about 500 of these clients for 2 or 3 servers, and my basic code is something like this:
Client connection = null;
try
{
    connection = new Client();
    List<TPointer> objects = connection.GetList();
    // Some work on List<T>
    foreach (TPointer pointer in objects)
    {
        T data = GetDataFromStream(pointer, connection);
        // Some additional processing on T
    }
    connection.SendMoreData();
    // More work
}
catch (...)
{
    // Exception handling for various exceptions
}
finally
{
    // Handle Close() or Abort()
    if (connection != null)
        connection.Close();
}
When I simulate running all the clients at once for large amounts of TPointers, I start encountering the following error:
System.TimeoutException: The request channel timed out while waiting for a reply after 00:01:00.
That seems like one of those errors that can occur for any number of reasons. For all I know the server could just be swamped, or I could be requesting too large/too many objects and it's taking too long to download (a whole minute though?). Increasing the timeout is an option, but I'd like to understand the actual problem instead of fixing the symptom.
Given I have no control over the server, how can I streamline my client?
I'm not actually sure what the "request channel" mentioned in the timeout refers to. Does the timeout start ticking from when I create new Client() until I call Client.Close()? Or does each specific request I'm sending to the server (e.g. GetList or GetData) get another minute? Is it worth my while to close Client() in between each call to the server? (I'm hoping not... that would be ugly)
Would it be helpful to chunk up the amount of data I'm receiving? The GetList() call can be quite large (running into the thousands). I could try obtaining a few objects at a time and jobbing off the post-processing for later...
Edit:
Since a few people mentioned streaming:
The Client binding uses TransferMode.StreamedResponse.
GetDataFromStream() uses a Stream derived from TPointer, and SendMoreData()'s payload size is more or less negligible.
Only GetList() actually returns a non-stream object, but I'm unclear as to whether or not that affects the method of transfer.

Or does each specific request I'm sending to the server (e.g. GetList or GetData) get another minute?
The timeout property applies to each and every operation that you're doing. It's reset. If your timeout is one minute, then it starts the moment you invoke that method.
What I'd do is implement a retry policy, use the async version of the client's method, and either use a CancellationToken or call Abort() on your client when a call is taking too long. Alternatively, you can raise the operation timeout on the client's InnerChannel:
client.InnerChannel.OperationTimeout = TimeSpan.FromMinutes(10);
You can raise it for the duration of the operation, and in your retry policy abort entirely and reset the timeout once your retries have either succeeded or failed.
Alternatively, you can try to stream your results and see if you can operate individually on them, but I don't know if keeping that connection open will trip the timeout. You'll have to hold off on operating on your collection until you have everything.
Also, set TransferMode = TransferMode.StreamedResponse in your binding.
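A minimal sketch of the retry idea, not tied to the real proxy: `Retry.WithBackoff`, `maxAttempts` and `initialDelay` are hypothetical names, and the `operation` delegate stands in for a call such as `() => client.GetList()`. With a real WCF proxy the channel may be faulted after a timeout, so the delegate would probably need to create a fresh client on each attempt.

```csharp
using System;
using System.Threading;

static class Retry
{
    // Runs `operation`, retrying only when it throws TimeoutException.
    // `maxAttempts` and `initialDelay` are illustrative parameters (assumptions).
    public static T WithBackoff<T>(Func<T> operation, int maxAttempts, TimeSpan initialDelay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (TimeoutException) when (attempt < maxAttempts)
            {
                // Wait before the next attempt so a swamped server gets some room,
                // doubling the wait each time.
                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

On the last attempt the filter no longer matches, so the TimeoutException reaches the caller and can be handled by the existing catch block.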

I believe the timeout you are hitting is the time to first response. In your scenario the first response is the whole response, since you are returning the full list: the more data, the longer it takes. You might want to consider streaming the data instead of returning a full list.

I suggest modifying both your web.config file (WCF side) and your app.config (client side), adding a bindings section like this (i.e. a timeout of 25 minutes instead of the default of 1 minute):
<bindings>
  <wsHttpBinding>
    <binding name="WSHttpBinding_IYourService"
             openTimeout="00:25:00"
             closeTimeout="00:25:00"
             sendTimeout="00:25:00"
             receiveTimeout="00:25:00">
    </binding>
  </wsHttpBinding>
</bindings>

Given I have no control over the server, how can I streamline my client?
Basically you can not do this when you only have control over the client. It seems like the operations return no Stream (unless the pointers are types which derive from Stream).
If you want to know more about how to generally achieve streaming just read up on this MSDN article.
Everything you can do on the client only scratches the surface of the problem. As @The Anathema proposed in his answer, you can add retry logic and/or set the timeout to a higher value. But to eradicate the root of the problem you'd need to investigate the source of the service itself so that it can handle a higher number of requests, or have instances of the service running on multiple servers with a load balancer in front.

I ended up going with a combination of the answers here, so I'll just post an answer. I chunked GetList() to a certain size to avoid keeping the connection open so long (it also had a positive effect on the code in general, since I was keeping less in memory temporarily.) I already have a retry policy in place, but will also plan on messing with the timeout, as The Anathema and a couple others suggested.
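How GetList() was actually chunked is not shown here. As a rough sketch of the client-side half of that idea, the hypothetical helper below (`Chunker.InBatches` is an invented name) groups an already-retrieved sequence of pointers into fixed-size batches, so that each batch can be downloaded and post-processed before the next one starts.

```csharp
using System;
using System.Collections.Generic;

static class Chunker
{
    // Splits `source` into consecutive batches of at most `size` items.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int size)
    {
        if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));
        var batch = new List<T>(size);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size);
            }
        }
        // Emit the final, possibly shorter, batch.
        if (batch.Count > 0) yield return batch;
    }
}
```

A loop such as `foreach (var batch in Chunker.InBatches(objects, 50))` would then hold at most 50 items' worth of intermediate data at a time; the batch size of 50 is only an example.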

Related

How to make PostAsync respond if API is down

I am calling an API using these commands:
byte[] messageBytes = System.Text.Encoding.UTF8.GetBytes(message);
var content = new ByteArrayContent(messageBytes);
content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("application/json");
HttpResponseMessage response = client.PostAsync(ApiUrl, content).Result;
However the code stops executing at the PostAsync line. I put a breakpoint on the next line but it is never reached. It does not throw an error immediately, but a few minutes later it throws an error like:
System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
I presume this means the API is down. What can I do to make PostAsync spit back an error immediately even if the API is down so that I can handle the error and inform the user?
Thank you.
Broadly speaking, what you're asking is "How can I check if an API is available?", and the answer to this depends how low level you want to get, and what you want to do for each level of unavailability:
Is there internet connectivity? Is it worth probing this locally first (as it's relatively quick to check)?
Is the server address correct? If it's wrong it doesn't matter how long you wait. Can the user configure this?
Is the address correct but the server is unable or unwilling to respond? What then?
If you're willing to lump them all into a single "can't contact server in a reasonable amount of time" bucket, there are a few approaches:
Decrease timeouts (beware)
In the case you gave, it sounds like your request is simply timing out: the address or port is wrong, the server is under immense load and can't respond in a timely fashion, you're attempting to contact a non-SSL endpoint using SSL or vice-versa, etc. In any of these cases, you can't know if the request has timed out, until it actually times out. One thing you can do is reduce the HttpClient request timeout. Beware: going too low will cause slow connections to time out on users, which is a worse problem than the one you have.
Pre-check
You could, either before each call, periodically, or at some point early in the client initialisation, do a quick probe of the API to see if it's responsive. This can be spun off into either an async or background task while the UI is being built, etc. This gives you more time to wait for a response, and as an added bonus if the API is responding slowly you can notify your users of this so they know not to expect immediate responses to their clicks. This will improve user experience. If the pre-check fails, you could show an error and advise the user to either check connectivity, check server address (if it's configurable), retry, etc.
Use a CancellationToken
You could pass a CancellationToken into PostAsync with a suitable timeout set, which also allows you to let the user cancel the request if they want to. Read up on CancellationToken for more information.
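A minimal sketch of the CancellationToken approach, assuming you are happy to treat "no answer within the timeout" and "connection failed" the same way. `ApiCaller.PostWithTimeoutAsync` is a hypothetical helper, not part of HttpClient.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

static class ApiCaller
{
    // Returns the response, or null if the server could not be reached
    // or did not answer within `timeout`.
    public static async Task<HttpResponseMessage> PostWithTimeoutAsync(
        HttpClient client, string url, HttpContent content, TimeSpan timeout)
    {
        // The token source cancels itself once `timeout` has elapsed.
        using (var cts = new CancellationTokenSource(timeout))
        {
            try
            {
                return await client.PostAsync(url, content, cts.Token);
            }
            catch (OperationCanceledException)
            {
                // The token fired before the server answered.
                return null;
            }
            catch (HttpRequestException)
            {
                // Connection-level failure (DNS, refused connection, and so on).
                return null;
            }
        }
    }
}
```

The caller can check for null and show the user an error straight away. To let the user cancel as well, one option is to link a second token with `CancellationTokenSource.CreateLinkedTokenSource`.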
EDIT: as Alex pointed out, this line is not usually how you deal with async tasks:
HttpResponseMessage response = client.PostAsync(ApiUrl, content).Result;
Change this instead to:
HttpResponseMessage response = await client.PostAsync(ApiUrl, content);
Of course the calling method will then also need to be marked as async, and so on ("It's asyncs, all the way up"), but this is a good thing - it means that your code is not blocking a thread while it waits for a response from the server.
Have a read here for some good material.
Hope that helps

How can I determine if a Network Stream timed out?

I have a TCP client and I set the network stream timeout as follows.
stream.ReadTimeout = 60000;
It works. But I would like to know how to test if the stream timed out. The class doesn't provide this method.
A little more detail to the question.
I am sending data to a TCPListener, about 33KB every 30 minutes. Typically, the transmission lasts about 10s and the client issues a manual "DISCONNECT" command to cause the Listener to start again. The client is an embedded system using a 3G module and sometimes the network connectivity causes the link to break. Right now, I am simply setting a read timeout of 60s. If we do not get data during that time, we simply restart the listener and wait for the next connection.
I am logging the performance of the system and would like to know how many timeouts typically occur in, say, one week. It would be good if the listener could simply check whether the read operation timed out, but I do not see a way of doing it easily in C#.
Will appreciate any help.
I do not really understand the problem with logging. I would look at how the Read operation ends: when ReadTimeout elapses, Read throws an IOException, whereas a return value of 0 means the remote side closed the connection. Before reinitializing the listener, I would add logging that records the timeout. Please tell me if I misunderstood the concept of your program.
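A small sketch of how a timed-out read could be told apart from other outcomes. NetworkStream.Read signals an elapsed ReadTimeout by throwing an IOException; the sketch additionally assumes the inner exception is a SocketException with SocketError.TimedOut, which is worth verifying on your platform. `ReadTimeoutDemo.TryRead` is a hypothetical helper name.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

static class ReadTimeoutDemo
{
    // Returns true if the read timed out; false if data arrived or the peer
    // closed the connection (in which case bytesRead is 0).
    public static bool TryRead(NetworkStream stream, byte[] buffer, out int bytesRead)
    {
        try
        {
            bytesRead = stream.Read(buffer, 0, buffer.Length);
            return false;
        }
        catch (IOException ex) when (
            ex.InnerException is SocketException se &&
            se.SocketErrorCode == SocketError.TimedOut)
        {
            // ReadTimeout elapsed before any data arrived.
            bytesRead = 0;
            return true;
        }
    }
}
```

The listener could increment a counter or write a log entry whenever TryRead returns true, then restart as it does today. Other IOExceptions (for example a reset connection) are deliberately left to propagate.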

Unable to resolve DNS (sometimes?)

Given an application that in parallel requests 100 urls at a time for 10000 urls, I'll receive the following error for 50-5000 of them:
The remote name cannot be resolved 'www.url.com'
I understand that the error means the DNS Server was unable to resolve the url. However, for each run, the number of urls that cannot be resolved changes (ranging from 50 to 5000).
Am I making too many requests too fast? And can I even do that? - Running the same test on a much more powerful server, shows that only 10 urls could not be resolved - which sounds much more realistic.
The code that does the parallel requesting:
var semp = new SemaphoreSlim(100);
var uris = File.ReadAllLines(@"C:\urls.txt").Select(x => new Uri(x));
foreach (var uri in uris)
{
    Task.Run(async () =>
    {
        await semp.WaitAsync();
        var result = await Web.TryGetPage(uri); // Using HttpWebRequest
        semp.Release();
    });
}
I'll bet that you didn't know that the DNS lookup of HttpWebRequest (which is the cornerstone of all .net http apis) happens synchronously, even when making async requests (annoying, right?). This means that firing off many requests at once causes severe ThreadPool strain and large amount of latency. This can lead to unexpected timeouts. If you really want to step things up, don't use the .net dns implementation. You can use a third party library to resolve hosts and create your webrequest with an ip instead of a hostname, then manually set the host header before firing off the request. You can achieve much higher throughput this way.
It does sound like you're swamping your local DNS server (in the jargon, your local recursive DNS resolver).
When your program issues a DNS resolution request, it sends a port 53 datagram to the local resolver. That resolver responds either by replying from its cache or recursively resending the request to some other resolver that's been identified as possibly having the record you're looking for.
So, your multithreaded program is causing a lot of datagrams to fly around. Internet Protocol hosts and routers handle congestion and overload by dropping datagram packets. It's like handling a traffic jam on a bridge by bulldozing cars off the bridge. In an overload situation, some packets just disappear.
So, it's up to endpoint software using datagram protocols to try again if their packets get lost. That's the purpose of TCP, and that's how it can provide the illusion of an error-free stream of data even though it can only communicate with datagrams.
So, your program will need to try again when you get a resolution failure on some of your DNS requests. You're a datagram endpoint, so you own the responsibility of retrying. I suspect the .net library is giving you back a failure when some of your requests time out because your datagrams got dropped.
Now, here's the important thing. It is also the responsibility of a datagram endpoint program, like yours, to implement congestion control. TCP does this automatically using its sliding window system, with an algorithm called slow-start / exponential backoff. If TCP didn't do this all internet routers would be congested all the time. This algorithm was dreamed up by Van Jacobson, and you should go read about it.
In the meantime you should implement a simple form of it in your bulk DNS lookup program. Here's how you might do that.
Start with a batch size of, say, 5 lookups.
Every time you get the whole batch back successfully, increase your batch size by one for your next batch. This is slow-start. As long as you're not getting congestion, you increase the network load.
Every time you get a failure to resolve a name, reduce the size of the next batch by half. So, for example, if your batch size was 30 and you got a failure, your next batch size will be 15. This is exponential backoff. You respond to congestion by dramatically reducing the load you're putting on the network.
Implement a maximum batch size of something like 100 just to avoid being too much of a pig and looking like a crude denial-of-service attack to the DNS system.
I had a similar project a while ago and this strategy worked well for me.
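The batch-size rules above can be sketched as a single pure function. The starting size of 5 and the cap of 100 come from the steps in the answer; the floor of 1 and the names `BatchSizer` and `Next` are assumptions added for the sketch.

```csharp
using System;

static class BatchSizer
{
    public const int Initial = 5;  // starting batch size from the steps above
    public const int Max = 100;    // cap from the steps above
    public const int Min = 1;      // assumed floor so the size never reaches zero

    // Returns the size of the next batch, given the current size and whether
    // every lookup in the current batch resolved successfully.
    public static int Next(int current, bool allSucceeded)
    {
        if (allSucceeded)
            return Math.Min(current + 1, Max);  // grow by one after a clean batch
        return Math.Max(current / 2, Min);      // halve after any failure
    }
}
```

A driver loop would start at `BatchSizer.Initial`, resolve that many names, call `Next` with the outcome, and put any failed names back on the queue to be retried in a later batch.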

Reducing time out period on WCF Channel Dispose calls?

Calling Dispose() on a WCF channel will sometimes block for one minute until a TimeoutException is raised. This seems to be generally if the server has torn down the channel from its end already.
Since we're trying to dispose of the channel and given this usually happens when the channel has already been torn down from the other end, is it possible to reduce the time out period used for the Dispose() calls?
Calling Dispose() on a WCF channel will sometimes block for one minute
until a TimeoutException is raised. This seems to be generally if the
server has torn down the channel from its end already.
Not always. Depending on your binding & your channel management, you can not close/dispose a channel until the service has finished processing the operation.
There is a useful article here that explains why one-way calls are not always one-way, i.e. why closing a channel can block. This can help you choose another binding configuration.
Since we're trying to dispose of the channel and given this usually
happens when the channel has already been torn down from the other
end, is it possible to reduce the time out period used for the
Dispose() calls?
This can be managed by the client's timeout settings in the client config file. There are four settings (Open, Send, Receive & Close). This depends on your binding but it is generally like this (one minute here):
<binding openTimeout="00:01:00"
         closeTimeout="00:01:00"
         sendTimeout="00:01:00"
         receiveTimeout="00:01:00">
</binding>
In other words, with these settings the WCF client throws a TimeoutException after one minute if, say, the service takes 1 min 30 s to process the request.
Calling Dispose or Close is pretty much the same: both try to close the channel. You have to be very aware of the Dispose/Close issue: closing a channel can throw exceptions, causing the channel to remain open. Read about how to avoid this here.
I'm also very curious why calling Dispose takes 60 sec in your context. This suggests something is not valid in your WCF implementation.

ServiceModel error logged on server when closing client application: existing connection was forcibly closed by the remote host

I have a self-hosted WCF service, and several client processes ... everything works well, clients start, make several service calls, and exit. However on the server my error logs (where I forward error level trace messages from System.ServiceModel) have an entry every time a client application closes (this does not coincide with a service method call).
I'm using a custom tcp binding on .NET 4.5 ...
<bindings>
  <customBinding>
    <binding name="tcp">
      <security authenticationMode="SecureConversation" />
      <binaryMessageEncoding compressionFormat="GZip" />
      <tcpTransport />
    </binding>
  </customBinding>
The client derives from ClientBase, and I do call Close() on the client without issue. Several instances of the ClientBase are created and Closed during operation with no errors.
My guess is that the client is keeping a socket open for re-use (a sensible optimization). Then at application exit, that socket is getting zapped.
Does this represent an error that I should fix? If it's not really an "error", can I nonetheless avoid the situation somehow so as to not put junk-to-ignore in my error logs?
The client binding configuration is exactly the same as the server (naturally). Here is my calling code... note I use the ServiceHelper class from this question.
using (var helper = new ServiceHelper<ARAutomatchServiceClient, ServiceContracts.IARAutomatchService>())
{
    return await helper.Proxy.GetBatchesAsync(startDate, DateTime.Today.AddDays(5));
}
Specifically, the "Error" level trace events on the server that I am seeing contains these messages (stack traces and other elements cleaned for brevity):
System.ServiceModel Error: 131075 :
System.ServiceModel.CommunicationException: The socket connection was
aborted. This could be caused by an error processing your message or a
receive timeout being exceeded by the remote host, or an underlying
network resource issue.
System.Net.Sockets.SocketException: An existing connection was
forcibly closed by the remote host NativeErrorCode: 2746
All of the unwanted error messages that I have seen in the ServiceModel trace logs come from one of two sources: connections in the connection pool timing out on the server, or the client dropping a connection when the client process exits.
If a pooled connection times out on the server, there are some trace messages written on the server immediately on timing out and then on the client when starting the next operation. These are "Error" level trace messages.
If the client process exits before closing the connection, you get a different Error level trace message on the server immediately when the client process exits.
The fact that these are Error level trace messages is particularly annoying because I typically log these even in production environments ... but it appears these should mostly just be ignored, since they're the result of a routine connection pool connection timing out.
One description of a pooled connection closing issue has been addressed by Microsoft here.
http://support.microsoft.com/kb/2607014
The article above advises that ServiceModel handles the exception and that it is safe to ignore when you see it in the trace logs. That particular situation is recorded as an "Information" level event, which again does not bother me as much as the "Error" level events that I'm actually logging. I tried to "filter" these messages from the logs, but it was rather difficult.
Naturally you can avoid the situation altogether by explicitly closing the pooled connections (on the client) before they timeout (on the server). In order for a client to close a connection in the connection pool (for a WCF binding with tcp transport) the only thing I know that works is to explicitly Close the ChannelFactory instance. In fact if you are not caching these instances (and not using ClientBase which usually caches them for you) then you will have no problems! If you DO want to cache your ChannelFactory instances, then you should at least explicitly close them before the application exits, which is not advice that I've seen ANYWHERE. Closing these before the client application exits will take care of one of the major sources of dropped sockets that get logged as ServiceModel "Error" traces on the server.
Here is a little code for closing a channel factory:
try
{
    if (channelFactory != null)
    {
        if (channelFactory.State != CommunicationState.Faulted)
        {
            channelFactory.Close();
        }
        else
        {
            channelFactory.Abort();
        }
    }
}
catch (CommunicationException)
{
    channelFactory.Abort();
}
catch (TimeoutException)
{
    channelFactory.Abort();
}
catch (Exception)
{
    channelFactory.Abort();
    throw;
}
finally
{
    channelFactory = null;
}
Just where you call that code is a bit tricky. Internally I schedule it in AppDomain.ProcessExit to "make sure" it gets called, but I also suggest that consumers of my service base classes remember to call the "close cached factories" code explicitly sometime earlier than AppDomain.ProcessExit, since ProcessExit handlers are limited to ~3 seconds to complete. Of course processes can close abruptly and never call this, but that's OK so long as it doesn't happen enough to flood your server logs.
As far as the pooled connections timing out ... you can just raise the TCP Transport "ConnectionPool" timeout value on the server to something very high (several days) and probably be OK in some situations. This would at least make it unlikely or infrequent that a connection would time out on the server. Note that having a shorter timeout value on the client doesn't appear to affect the situation in any way, so that setting might as well be left as the default. (Reasoning: the client's connection will be observed as timed out the next time the client needs a connection, but by this time the server will have either timed out already and logged the error, or if not, then the client will close and create a new connection and restart the server timeout period. However, simply using the connection would also restart the server timeout period.)
So again, you must have a high enough connection pool timeout on the server, regardless of the client settings, to cover the period of inactivity on your client. You can further reduce the likelihood of a pooled connection timing out by reducing the size of the pool on the client (maxOutboundConnectionsPerEndpoint) so that the client doesn't open more connections than are really needed, leaving them to go unused and then eventually time-out on the server.
Configuring the connection pool for a binding has to be done in code for the built-in bindings (like netTcpBinding). For custom bindings you can do it in configuration like this (here I set a server to timeout in 2 days, and only pool 100 connections):
<customBinding>
  <binding name="tcp">
    <security authenticationMode="SecureConversation"/>
    <binaryMessageEncoding compressionFormat="GZip"/>
    <tcpTransport>
      <connectionPoolSettings idleTimeout="2.00:00:00"
                              maxOutboundConnectionsPerEndpoint="100" />
    </tcpTransport>
  </binding>
</customBinding>
These two approaches together (raising the server-side timeout and closing ChannelFactory instances when clients exit) might solve your problem, or at least reduce the number of "Safe to ignore" messages significantly. Be sure the server timeout for the connection pool is AT LEAST what the client is to make sure that the connection will timeout on the client first in the case that it ever does timeout on the server (this appears to be handled more gracefully in ServiceModel, with fewer trace messages, and is exactly the situation referred to in the knowledge base article linked above).
On the server, you'll ideally want enough maxOutboundConnectionsPerEndpoint to serve (number of clients) x (their number of pooled connections). Otherwise you might end up with pool overruns on the server, which emit Warning level trace events. That's not too bad. If there are no connections available in the server's pool when a new client tries to connect, this generates a bunch of events on the client and server. In all of these cases (even if the pool on the server is constantly overrun) WCF will recover and function, just not optimally. That is in my experience at least ... it's possible that if the "LeaseTimeout" for a new connection expires while waiting for a server connection pool spot to open up (default is 5 minutes) then it will just fail altogether? Not sure...
A final suggestion might be to periodically close your ChannelFactory objects and recycle the cached copy. This may have only a limited impact on performance, assuming the client doesn't try to use the service exactly while the ChannelFactory instance is recycling. For instance you might schedule recycles of cached ChannelFactory instances for 5 minutes after it is created (not after it was last used, since it might have multiple pooled connections, one of which has not been used for a while). Then set your connection pool timeout on the server to be 10 minutes or so. But be sure the server timeout is a good bit over the ChannelFactory recycle period, because when you go to recycle the ChannelFactory you might have to wait until a pending operation is completed (meanwhile some unused pooled connection possibly just timed out on the server).
All of these things are micro-optimizations that may not be worth doing ... but if you log Error level ServiceModel trace events in production, you'll probably want to do SOMETHING (even if it is disabling connection pooling or ChannelFactory caching) or your logs will likely be swamped with "safe to ignore" errors.
