WCF timeout error with code running fine - c#

I have a project that uses a WCF service to do some database queries, builds an "Environment" object (which consists of different database class objects) and returns it inside a "Workspace" object to the client. It's been running fine.
I added another "Database" type to the service with all the correct contract and method updates. Now when I call the method the client times out after 1 minute. In debugging it takes about 3-5 seconds to hit the end of the service method. Then nothing happens for the rest of the minute, until the client side reports a timeout. There are no errors/exceptions thrown.
Please see below:
Calling from client:
490 m_ScanWorkspace = m_Connection.ScanProxy.CreateEnvironments
End of service method:
477 return tWorkspace;
478 }
It takes 3-5 seconds to get to line 478 in the service. F10 shows it's complete.
Nothing happens until 1 minute later, when line 490 in the client shows a timeout error. While debugging I can see a valid object in tWorkspace.

Firstly, set up WCF tracing using the Diagnostics namespace. Just use the first example on that tutorial and WCF will dump out a log of all activity, which you can open up in the log viewer. It will tell you exactly where the call is failing, which will help you pinpoint the problem.
WCF is great, but the error messages it gives are cryptic and often close to useless. A timeout after 1 minute doesn't necessarily mean what a timeout would normally mean - i.e. couldn't find the server. It could be other issues.
More than likely a threshold is being exceeded, which causes the response object to be incomplete. This could be array length, string content length, message size, and so on. You will find some of these detailed here: https://stackoverflow.com/a/480191/146077
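If tracing does point at a quota, the limits can be raised on the binding. The following is a minimal sketch, assuming a BasicHttpBinding built in code (the question does not say which binding is in use). The class name WcfQuotas and the 10 MB figure are illustrative only, and the service side would need matching limits.

```csharp
using System.ServiceModel;

public static class WcfQuotas
{
    // Illustrative limit only; pick a value that fits the largest expected response.
    public const int MaxBytes = 10 * 1024 * 1024;

    // Builds a binding whose message and reader quotas are larger than the defaults.
    // BasicHttpBinding is an assumption of this sketch.
    public static BasicHttpBinding CreateBinding()
    {
        var binding = new BasicHttpBinding
        {
            MaxReceivedMessageSize = MaxBytes, // default is 65536 bytes
            MaxBufferSize = MaxBytes           // must match MaxReceivedMessageSize for buffered transfers
        };
        binding.ReaderQuotas.MaxArrayLength = MaxBytes;
        binding.ReaderQuotas.MaxStringContentLength = MaxBytes;
        return binding;
    }
}
```

Raising quotas only helps if the trace log actually reports one being exceeded, so it is worth reading the log before changing anything.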
Good luck!

Related

Excessive timeout on simple byte array reply from SOAP / WSDL Web Service call ... as if it started replying but never finishes?

My C# client (running on .NET Framework 4.5.1 or later) calls a WSDL-defined SOAP web service call that returns a byte[] (with length typically about 100000). We make hundreds of calls to this web service just fine -- they normally take just a few seconds to return. But very intermittently, the call sits there for exactly 5 minutes and then throws an InvalidOperationException indicating that "There is an error in XML document (1, 678)", with an InnerException that is a WebException "The operation has timed out." We've wrapped a try-catch around this call, look for those particular Exceptions, and then ask the user if they'd like us to retry it, and usually it works just fine on the next try.
Looking at the logging on the server, the logs for the good calls and the intermittent bad calls look exactly the same. In particular, in both cases we get the log statement at the very end of the web service, right before the "return byteArray;"... and it is doing that in the typical 3-15 seconds from the start of the call. So, it seems the web service returns the byte array successfully, but the client that called the web service just never receives it.
However, the client does NOT get the typical SoapException or WebException... for example, if we pause the web service in the debugger right before that return, then after 60 seconds the client will get a WebException "The operation has timed out." But we don't get that in this case... instead we are stuck there for a full 5 minutes before we finally get the InvalidOperationException mentioned above. So, it is as if it started receiving the reply, so it doesn't consider it timed out the normal way, but it never gets the rest of the reply, and the parsing/deserializing of the XML containing the reply eventually times out.
Question #1: Any suggestions on what's happening here? Or what we might be doing wrong in our web service that would result in a byte[] reply getting stuck mid-return intermittently? I'd obviously love to fix the root problem.
Question #2: What controls the length of that 5 minute timeout?? Our exception handling for this would be okay except for the ridiculous 5 minute timeout. After about 10 seconds, the user knows it is stuck because it normally returns in 10 seconds or less. But they have to sit there and wait for 5 minutes before they can do anything. We have set every timeout setting we could find to just 60 seconds, but none seem to control this. We have set:
In the server Web.config: <httpRuntime executionTimeout="60">
In the server Global.asax.cs: HttpContext.Current.Server.ScriptTimeout = 60;
In both server and client: ServicePointManager.MaxServicePointIdleTime = 60000;
In the client, right after we new up the WSDL-defined class derived from SoapHttpClientProtocol with all the web service calls, we call: service.Timeout = 60000;
We previously had those at their defaults or set to 100 / 100000 ... we lowered them all to 60 / 60000 to see if the 5 minute wait would come down at all (just in case one or more of them were being added into that 5 minutes). But no, no matter what we changed any of those timeouts to, the timeout in this case remains exactly 5 minutes, every time it gets stuck.
Does anybody know where the length of the timeout is set for when it generates an InvalidOperationException on the XML document containing the returned byte array due to an InnerException WebException with the timeout?? (please!)

Real time data storage with Azure Tables using ASP.NET

I am using a LabVIEW application to simulate a test running, which posts a JSON string to my ASP.NET application. Within the ASP.NET application I format the data with the proper partition and row keys, then send it to Azure Table Storage.
The problem that I am having is that after what seems like a random amount of time (i.e. 5 minutes, 2 hours, 5 hours), the data fails to be saved into Azure. I am trying to catch any exceptions within the ASP.NET application and send the error message back to the LabVIEW app, and the LabVIEW app is also catching any exceptions it may encounter, so I can troubleshoot where the issue is occurring.
The only error that I am able to catch is a Timeout Error 56 in the LabVIEW program. My question is, does anyone have an idea of where I should be looking for the root cause of this? I do not know where to begin.
EDIT:
I am using a table storage writer that I found here to do batch operations with retries.
The constructor for exponential retry policy is below:
public ExponentialRetry(TimeSpan deltaBackoff, int maxAttempts)
When you (or, to be exact, the library you use) instantiate this as RetryPolicy = new ExponentialRetry(TimeSpan.FromMilliseconds(2), 100), you are setting the max attempts to 100, which means you may end up waiting up to around 2^100 milliseconds (there is some more math behind this, but I am simplifying) for each of your individual batch requests to fail on the client side before the SDK gives up retrying.
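To get a feel for how fast such a policy grows, here is a rough sketch of a generic exponential backoff. The formula delta * (2^attempt - 1) and the cap parameter are assumptions chosen for illustration, not the SDK's exact implementation, and BackoffSketch is a made-up name.

```csharp
using System;

public static class BackoffSketch
{
    // Delay before retry number `attempt` (1-based): delta * (2^attempt - 1), limited to `cap`.
    // The cap is an assumption of this sketch, not a value read from the storage library.
    public static TimeSpan Delay(TimeSpan delta, int attempt, TimeSpan cap)
    {
        // double arithmetic avoids integer overflow for large attempt counts such as 100
        double ms = delta.TotalMilliseconds * (Math.Pow(2, attempt) - 1);
        return TimeSpan.FromMilliseconds(Math.Min(ms, cap.TotalMilliseconds));
    }

    // Total time spent waiting if every one of `maxAttempts` retries fails.
    public static TimeSpan WorstCaseWait(TimeSpan delta, int maxAttempts, TimeSpan cap)
    {
        var total = TimeSpan.Zero;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            total += Delay(delta, attempt, cap);
        }
        return total;
    }
}
```

With a 2 ms delta and a hypothetical two-minute cap, the per-retry delay reaches the cap at the 16th attempt, and WorstCaseWait over 100 attempts comes to just under three hours for a single failing request. Without any cap the figure would be far larger, which is the point the answer is making.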
The other issue with that code is that it executes batch requests sequentially and synchronously, which has multiple bad effects. First, all subsequent batch requests are blocked by the current batch request. Second, your cores are blocked waiting on I/O operations. Third, it has no exception handling, so if one of the batch operations throws an exception, the method bails out and does not process any of the remaining batch requests.
My recommendation: do not use that library; batch operations are fairly straightforward. If you do not explicitly define a retry policy, the default is the exponential retry policy anyway, with sensible default parameters (it does 3 retries), so you do not even need to define your own retry object. For best scalability and throughput, run your batch operations async (and concurrently).
As to why things fail: when you write your own API, catch the StorageException and check the HTTP status code on the exception itself. You could be getting throttled by Azure, as one possibility, but it is hard to say without further debugging or without you providing the HTTP status code for the failed batch operations.
You need to check whether an exception is transient or not. As Peter said in his comment, the Azure Storage client already implements a retry policy. You can also wrap your code in another retry layer (e.g. using Polly), or change the default policy associated with the Azure Storage client.
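One way to express the "transient or not" decision is a small helper over the HTTP status code. This is only a sketch: the rule below (408 and most 5xx codes are worth retrying) is an assumed heuristic, and TransientCheck is a made-up name. In the classic Microsoft.WindowsAzure.Storage client the status code should be available on a caught StorageException as RequestInformation.HttpStatusCode.

```csharp
public static class TransientCheck
{
    // Heuristic (an assumption of this sketch): request timeouts (408) and most
    // server-side errors (5xx) are worth retrying; 501 and 505 will not succeed on retry;
    // other 4xx codes usually mean the request itself is wrong.
    public static bool IsTransient(int httpStatusCode)
    {
        if (httpStatusCode == 408) return true;
        if (httpStatusCode == 501 || httpStatusCode == 505) return false;
        return httpStatusCode >= 500 && httpStatusCode <= 599;
    }
}
```

Logging the status code alongside this decision also gives you the piece of information the previous answer asks for.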

Weird WCF behaviour regarding Timeouts and Exceptions

I have created a WCF service hosted inside a normal Windows service. This service is deployed to customers and set up on their servers. Therefore (afaik) I need to establish the WCF proxy dynamically and cannot rely on some prebuilt proxy created by VS or the Silverlight tools. The clients in this case are mobile apps built with Xamarin.Forms.
The Code to create the "Channel":
public void Init(int timeout = 15)
{
    ea = new EndpointAddress(string.Format("http://{0}:{1}/{2}", _settingsService.ConnectionIP, _settingsService.ConnectionPort, _settingsService.ConnectionEndpoint));
    bhttpb = new BasicHttpBinding(BasicHttpSecurityMode.None);
    bhttpb.SendTimeout = TimeSpan.FromSeconds(timeout);
    cfIMMC = new ChannelFactory<IMaintMobileContract>(bhttpb, ea);
    cfIMMC.Opened += cfIMMC_Opened;
    cfIMMC.Faulted += cfIMMC_Faulted;
    cfIMMC.Closed += cfIMMC_Closed;
    immc = cfIMMC.CreateChannel(ea);
    immc.Ping(); // This function is defined by me in the Contract. It only returns true, if the server can be reached.
}
So far everything works fine if the service is running, but the app has to run "offline" and then it gets weird.
When the connection is established there is no EndpointException or anything, and when a function is called it just sits there waiting until the timeout hits.
It would be really nice to get some information whether the WCF service is actually there or not. I have function calls that can take up to multiple minutes and it would be fatal for the app to wait that long when the WCF server is not there at all. How can I achieve that?
Update:
Right now it got even weirder. Now, approx. 30 seconds after the Ping() fails, I get System.Net.Sockets.SocketException: Connection timed out and System.Net.WebException: Error: ConnectFailure (Connection timed out) out of nowhere.
Update 2:
Here is a pic of the call stack:
If you need fast feedback on whether the service is alive or not, then set up an additional endpoint (with a separate contract containing only the Ping method) and set small timeouts for it.
The important part is to set the send/receive timeouts to a small value as well. This ensures that the Ping method returns or throws fast if the service is not available.
As far as I remember, WCF does not open the channel (i.e. does not connect to the server) until you call one of the methods. That's why you don't get exceptions before Ping is called.
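A probe along those lines might look like the sketch below. IPingContract and ServiceProbe are hypothetical names; the service would have to expose a matching endpoint, and the URL and timeout are whatever fits your deployment.

```csharp
using System;
using System.ServiceModel;

// Hypothetical contract with only a Ping operation; the service must expose a matching endpoint.
[ServiceContract]
public interface IPingContract
{
    [OperationContract]
    bool Ping();
}

public static class ServiceProbe
{
    // Returns the result of Ping() if it answers within `timeout`,
    // or false if the service cannot be reached in that time.
    public static bool IsAlive(string url, TimeSpan timeout)
    {
        var binding = new BasicHttpBinding(BasicHttpSecurityMode.None)
        {
            OpenTimeout = timeout,
            SendTimeout = timeout,
            ReceiveTimeout = timeout
        };
        var factory = new ChannelFactory<IPingContract>(binding, new EndpointAddress(url));
        try
        {
            IPingContract channel = factory.CreateChannel();
            bool ok = channel.Ping(); // the connection is only attempted here
            factory.Close();
            return ok;
        }
        catch (TimeoutException)
        {
            factory.Abort();
            return false;
        }
        catch (CommunicationException)
        {
            // Includes EndpointNotFoundException when nothing is listening at the address
            factory.Abort();
            return false;
        }
    }
}
```

Calling ServiceProbe.IsAlive with a timeout of a few seconds before a long-running operation keeps the multi-minute timeout on the main channel from being the first sign that the server is gone.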
About the exception after 30 seconds: where do you see it? I mean, is it Visual Studio that breaks there, or is your application failing with an unhandled exception? I'm asking because I see this in the Xamarin/Mono code:
initConn = new WaitCallback (state => {
    try {
        InitConnection (state);
    } catch {}
});
This means that even though the exception is thrown after 30 seconds, it will be swallowed. What really happens is that when the request is sent (i.e. when you call Ping()), the runtime tries to open the connection in the background (your call stack confirms that), and 30 seconds is the default Windows timeout for a connection. WCF will fail earlier if it has a lower timeout set (as in your case), but the connection attempt will last for 30 seconds and will complete with an exception.
So, my opinion is that you should not care about this exception unless it somehow stops your application.

SignalR Groups.Add times out and fails

I'm trying to add a member to a Group using SignalR 2.2. Every single time, I hit a 30 second timeout and get a "System.Threading.Tasks.TaskCanceledException: A task was canceled." error.
From a GroupSubscriptionController that I've written, I'm calling:
var hubContext = GlobalHost.ConnectionManager.GetHubContext<ProjectHub>();
await hubContext.Groups.Add(connectionId, groupName);
I've found this issue where people are periodically encountering this, but it happens to me every single time. I'm running the backend (ASP.NET 4.5) on one VS2015 launched localhost port, and the frontend (AngularJS SPA) on another VS 2015 launched localhost port.
I had gotten SignalR working to the point where messages were being broadcast to every connected client. It seemed so easy. Now, adding in the Groups part (so that people only get select messages from the server) has me pulling my hair out...
That task cancellation error could be being thrown because the connectionId can't be found in the SignalR registry of connected clients.
How are you getting this connectionId? You have multiple servers/ports going - is it possible that you're getting your wires crossed?
I know there is an accepted answer to this, but I came across this once for a different reason.
First off, do you know what Groups.Add does?
I had expected Groups.Add's task to complete almost immediately every time, but not so. Groups.Add returns a task that only completes when the client (i.e. JavaScript) acknowledges that it has been added to a group. This is useful for reconnecting, so it can resubscribe to all its old groups. Note that this acknowledgement is not visible to the developer's code; it is handled for you behind the scenes.
The problem is that the client may not respond because they have disconnected (i.e. they've navigated to another page). This will mean that the await call will have to wait until the connection has disconnected (default timeout 30 seconds) before giving up by throwing a TaskCanceledException.
See http://www.asp.net/signalr/overview/guide-to-the-api/working-with-groups for more detail on groups
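If waiting the full 30 seconds is not acceptable, the await can be bounded with a shorter timeout of your own. The helper below is a generic sketch using only the Task APIs; TaskTimeout and CompletesWithin are made-up names and nothing in it is specific to SignalR.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskTimeout
{
    // Returns true if `task` ran to completion within `timeout`; false if the time ran out
    // or the task was canceled. Other exceptions from the task propagate to the caller.
    public static async Task<bool> CompletesWithin(Task task, TimeSpan timeout)
    {
        Task finished = await Task.WhenAny(task, Task.Delay(timeout));
        if (finished != task)
        {
            return false; // gave up waiting; the original task is left to finish on its own
        }
        try
        {
            await task; // observe the outcome of the completed task
            return true;
        }
        catch (OperationCanceledException)
        {
            // TaskCanceledException derives from OperationCanceledException
            return false;
        }
    }
}
```

In the controller this could be used as bool joined = await TaskTimeout.CompletesWithin(hubContext.Groups.Add(connectionId, groupName), TimeSpan.FromSeconds(5)); and a false result treated as "client probably gone" rather than as an error.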

Application throwing time out exception on sending bulk data

I have a Console Application which consumes a BizTalk Web Service. The problem is that when I send the BizTalk Service object data in bulk, my console application throws the exception:
Application has either timed out or is Timing out.
My application actually needs to wait for the BizTalk service to finish processing its job. Increasing the obj.Timeout value was of no help. Is there anything else other than using the Thread.Sleep method (which I want to avoid)?
Below is the relevant code snippet from my application:
pumpSyncService.Timeout = 750000;
outputRecords = pumpSyncService.PumpSynchronization(pumpRecords);
The pump records contain an array of objects. When the count is around 30, I get a correct response, but when the count increases to around 150 I get the exception.
Try sending smaller chunks in a loop. Instead of sending 150 all at once, send 30 records 5 times. The timeout might be happening because it takes too long to send 150 records.
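Splitting the records into fixed-size groups can be sketched with a small helper like the one below. Batching and Split are made-up names, and the element type of pumpRecords is not shown in the question, so the helper is generic.

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Splits `items` into consecutive lists of at most `size` elements each.
    public static List<List<T>> Split<T>(IEnumerable<T> items, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        var chunks = new List<List<T>>();
        var current = new List<T>(size);
        foreach (T item in items)
        {
            current.Add(item);
            if (current.Count == size)
            {
                chunks.Add(current);
                current = new List<T>(size);
            }
        }
        if (current.Count > 0) chunks.Add(current); // last, possibly shorter, chunk
        return chunks;
    }
}
```

For 150 records and a size of 30 this yields five lists. Each one would then be converted to whatever array type PumpSynchronization expects and sent in turn, with the returned records collected as the calls come back.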
You should be able to send all 30 at once, if the service allows you to. I am assuming you have verified that the event kicking this off is not firing 5 times. Try it asynchronously and process your results when they come back.
