I am using a netNamedPipeBinding to perform inter-process WCF communication from a windows app to a windows service.
My app now runs well in every other respect (after fighting off my fair share of WCF exceptions, as anybody who has worked with WCF would know), but this error is proving to be quite resilient.
To paint a picture of my scenario: the windows service can be queued to do some work at any time through a button press in the windows app. The app talks to the service over the netNamedPipeBinding, which supports callbacks (two-way communication) if you are not familiar with it, and initiates a request to perform the work (in this case a file upload). The service then fires callbacks (events) back to the windows app every few seconds, covering everything from file progress to transfer speed, so there is some fairly tight client-server integration. This is how the windows app receives progress on what is running in the windows service.
Now, all is great and the WCF gods are relatively happy with me, apart from one nasty exception that I receive every time I shut down the app prematurely (which is a perfectly valid scenario). While a transfer is in progress and callbacks are firing pretty heavily, I receive this error:
System.ServiceModel.ProtocolException:
The channel received an unexpected input message with Action
'http://tempuri.org/ITransferServiceContract/TransferSpeedChangedCallback'
while closing. You should only close your channel when you are not expecting
any more input messages.
I understand the error, but unfortunately I cannot guarantee that no more input messages will arrive before I close my channel: the user may shut down the app at any time, and the work will continue in the background in the windows service (much like a virus scanner operates). The user should be able to start and close the management app as often as they like with no interference.
I receive the error immediately after my Unsubscribe() call, which is the second-to-last call before the app terminates and, I believe, the preferred way to disconnect a WCF client. Before the connection is closed, all Unsubscribe() does is remove the client id from a collection held by the WCF service inside the windows service (the service instance is SHARED by the windows service and the windows app, because the windows service can also perform scheduled work by itself). After the client id is removed I perform what I hope should be a clean disconnection.
Besides the exception, the result is that my app hangs: the UI locks up completely, with progress bars and everything else frozen midway. All signs point to a race condition or WCF deadlock [sigh], but I am pretty thread-savvy now and I think this is a relatively isolated situation. Reading the exception as-is, I don't think it is a threading issue per se; it points more to an early disconnection, which then throws all my threads into mayhem and perhaps causes the lock-up.
My Unsubscribe() approach on the client looks like this:
public void Unsubscribe()
{
try
{
// Close existing connections
if (channel != null &&
channel.State == CommunicationState.Opened)
{
proxy.Unsubscribe();
}
}
catch (Exception)
{
// This is where we receive the 'System.ServiceModel.ProtocolException'.
}
finally
{
Dispose();
}
}
And my Dispose() method, which should perform the clean disconnect:
public void Dispose()
{
// Dispose object
if (channel != null)
{
try
{
// Close existing connections
Close();
// Attempt dispose object
((IDisposable)channel).Dispose();
}
catch (CommunicationException)
{
channel.Abort();
}
catch (TimeoutException)
{
channel.Abort();
}
catch (Exception)
{
channel.Abort();
throw;
}
}
}
And the WCF service Unsubscribe() counterpart and class attributes (for reference) on the windows service (nothing tricky here, and my exception occurs client side):
[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single,
                 ConcurrencyMode = ConcurrencyMode.Multiple)]
public class TransferService : LoggableBase, ITransferServiceContract
{
    public void Unsubscribe()
    {
        // Check and remove under the same lock, so another thread cannot
        // change the collection between the two steps.
        lock (syncObj)
        {
            if (clients.ContainsKey(clientName))
            {
                clients.Remove(clientName);
            }
        }
#if DEBUG
        Console.WriteLine(" + {0} disconnected.", clientName);
#endif
    }
    ...
}
Interface of:
[ServiceContract(
CallbackContract = typeof(ITransferServiceCallbackContract),
SessionMode = SessionMode.Required)]
public interface ITransferServiceContract
{
[OperationContract(IsInitiating = true)]
bool Subscribe();
[OperationContract(IsOneWay = true)]
void Unsubscribe();
...
}
The callback implementation doesn't do anything very exciting; it just raises events via delegates. I include it to show you my attributes. I already alleviated one set of deadlocks by setting UseSynchronizationContext = false:
[CallbackBehavior(UseSynchronizationContext = false,
ConcurrencyMode = ConcurrencyMode.Multiple)]
public class TransferServiceCallback : ITransferServiceCallbackContract
{ ... }
Really hope somebody can help me! Thanks a lot =:)
OH my gosh, I found the issue.
That exception had nothing to do with the underlying app hang; it was just a precautionary exception which you can safely catch.
You would not believe it: I spent about 6 hours on and off on this bug, and it turned out to be channel.Close() locking up while waiting for pending WCF requests to complete (which they never would until the transfer had finished, which defeats the purpose!).
I just went brute-force, breakpointing line after line. The catch was that if I stepped too slowly it would never hang, because somehow the channel would become available to close (even before the transfer had finished), so I had to breakpoint, hit F5 and then quickly step to catch the hang, and that is the line it ended on. I now simply apply a timeout to the Close() operation, catch the resulting TimeoutException, and hard-abort the channel if it cannot shut down in a timely fashion!
See the fix code:
private void Close()
{
    if (channel != null &&
        channel.State == CommunicationState.Opened)
    {
        // If the channel cannot close cleanly within 3 seconds, it is
        // blocked because it is heavily in use (callbacks or the like).
        // Close(TimeSpan) then throws a TimeoutException.
        channel.Close(TimeSpan.FromSeconds(3));
    }
}
public void Dispose()
{
// Dispose object
if (channel != null)
{
try
{
// Close existing connections
// *****************************
// This is the close operation where we perform
//the channel close and timeout check and catch the exception.
Close();
// Attempt dispose object
((IDisposable)channel).Dispose();
}
catch (CommunicationException)
{
channel.Abort();
}
catch (TimeoutException)
{
channel.Abort();
}
catch (Exception)
{
channel.Abort();
throw;
}
}
}
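The same close-with-timeout-then-abort logic can also be pulled into one reusable helper. The following is only a minimal sketch, assuming the channel is reachable as an ICommunicationObject; the names ChannelCleanup and CloseOrAbort are made up for illustration and are not part of WCF:

```csharp
using System;
using System.ServiceModel;

public static class ChannelCleanup
{
    // Hypothetical helper (not a WCF API): try a graceful close within
    // the timeout and fall back to Abort() if that does not work.
    public static void CloseOrAbort(this ICommunicationObject channel, TimeSpan timeout)
    {
        if (channel == null)
        {
            return;
        }

        try
        {
            if (channel.State == CommunicationState.Opened)
            {
                // Throws TimeoutException if pending calls block the close.
                channel.Close(timeout);
            }
            else
            {
                // Faulted or not fully opened: a graceful close is not possible.
                channel.Abort();
            }
        }
        catch (CommunicationException)
        {
            // Includes the ProtocolException raised when a callback
            // message arrives while the channel is closing.
            channel.Abort();
        }
        catch (TimeoutException)
        {
            channel.Abort();
        }
    }
}
```

With something like this in place, Dispose() would reduce to a single call such as ((ICommunicationObject)channel).CloseOrAbort(TimeSpan.FromSeconds(3));.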
I am so happy to have this bug finally over and done with! My app now shuts down cleanly within the 3 second timeout regardless of the current WCF service state. I hope this helps anyone else who finds themselves suffering a similar issue.
Graham
I have a WPF application in which I want to return a list of data whenever the user requests it. To get the data I need to call a WCF service. If the service is down for any reason, I want to repair the broken connection, or wait for the service to come back, and then return the data. Let me show you what I am doing:
public List<MyData> GetMyData()
{
    try
    {
        // GetOrCreateChannel creates the WCF service channel
        var data = GetOrCreateChannel().GetMyData();
        return data;
    }
    catch (Exception ex)
    {
        _log.error(ex);
        FixedBrokenService();
        return GetMyData(); // Call this method again.
    }
}
In the above method, if the service is not running, execution goes to the catch block and the method calls itself again for as long as the service is down. Once the service is back, it returns the data. I want to know whether this approach is fine or not. What if the service is down for 2 to 3 hours? The method will call itself recursively and the stack will keep growing. Is there any other approach?
"What if the service is down for 2 to 3 hours? It will recursively call the method and the stack size in memory will keep increasing. Is there any other approach?"
I think you're asking because you already sense there might be some other way to improve what you've got so far; my guess is you're looking for some standard.
If so, I'd recommend Google's Exponential backoff guideline, here applied to Google Maps calls.
The idea is to introduce a delay between subsequent calls to the web service, increasing it in case of repeated failures.
A simple change would be:
public List<MyData> GetMyData()
{
List<MyData> data = null;
int delayMilliseconds = 100;
bool waitingForResults = true;
while (waitingForResults)
{
try
{
data = GetOrCreateChannel().GetMyData();
waitingForResults = false; // if this executes, you've got your data and can exit
}
catch (Exception ex)
{
_log.error(ex);
FixedBrokenService();
Thread.Sleep(delayMilliseconds); // wait before retrying
delayMilliseconds = delayMilliseconds * 2; // increase your delay
}
}
return data;
}
This way you won't have to deal with recursion either; don't forget to add using System.Threading; at the top.
Since you mentioned WPF, we might want to take Jeroen's suggestion and wait in another thread: this means that your WPF GUI won't be frozen while you try reconnecting, but it will be enabled and perhaps show a spinner, a wait message or something like that (e.g. "Reconnecting in x seconds").
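This requires making the method asynchronous: change its signature to public async Task<List<MyData>> GetMyData(), change Thread.Sleep(delayMilliseconds); to await Wait(delayMilliseconds);, and add this method below GetMyData (it needs using System.Threading.Tasks; at the top):
private static Task Wait(int delayMilliseconds)
{
    // Task.Delay completes after the delay without blocking the calling thread.
    // Calling Thread.Sleep here would still block the caller, and returning
    // a Task that was never started would never complete when awaited.
    return Task.Delay(delayMilliseconds);
}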
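The retry-with-increasing-delay idea can also be written once and reused. This is only a sketch under a few assumptions: Retry, WithBackoffAsync, maxAttempts, maxDelay and onFailure are names invented here, and the attempt limit and delay cap are additions that the loop above does not have:

```csharp
using System;
using System.Threading.Tasks;

public static class Retry
{
    // Hypothetical helper: runs `operation` until it succeeds or
    // maxAttempts is reached, doubling the delay after each failure
    // and never waiting longer than maxDelay between attempts.
    public static async Task<T> WithBackoffAsync<T>(
        Func<T> operation,
        int maxAttempts,
        TimeSpan initialDelay,
        TimeSpan maxDelay,
        Action<Exception> onFailure = null)
    {
        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                // Run the blocking call on a thread-pool thread so that
                // a UI thread awaiting this method stays responsive.
                return await Task.Run(operation);
            }
            catch (Exception ex) when (attempt < maxAttempts)
            {
                // Optional hook, e.g. for logging or repairing the channel.
                if (onFailure != null)
                {
                    onFailure(ex);
                }

                // Wait without blocking, then double the delay up to the cap.
                await Task.Delay(delay);
                delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
            }
        }
    }
}
```

A call might look like await Retry.WithBackoffAsync(() => GetOrCreateChannel().GetMyData(), 10, TimeSpan.FromMilliseconds(100), TimeSpan.FromSeconds(30), ex => FixedBrokenService());. Once the last attempt fails, the exception propagates to the caller instead of looping forever.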
Try using a WCF client based on ClientBase (there are tons of examples). You can subscribe to the InnerChannel.Faulted event; when it is raised, the service has failed somehow.
Instead of immediately retrying the connection in the catch block, you can run a separate thread that retries connecting the client once the service has gone down.
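A rough sketch of the Faulted-event idea, assuming the client can be recreated from a factory delegate. SelfHealingClient is a made-up name, and the background reconnect thread mentioned above is left out; this version simply drops the faulted client so that the next access creates a fresh one:

```csharp
using System;
using System.ServiceModel;

// Hypothetical wrapper: keeps one live client and discards it when it faults.
public sealed class SelfHealingClient<TClient>
    where TClient : class, ICommunicationObject
{
    private readonly Func<TClient> _factory;   // e.g. () => new MyServiceClient()
    private readonly object _sync = new object();
    private TClient _client;

    public SelfHealingClient(Func<TClient> factory)
    {
        if (factory == null)
        {
            throw new ArgumentNullException("factory");
        }
        _factory = factory;
    }

    public TClient Client
    {
        get
        {
            lock (_sync)
            {
                if (_client == null)
                {
                    TClient client = _factory();
                    EventHandler handler = null;
                    handler = (sender, e) =>
                    {
                        // A faulted client cannot be closed, only aborted.
                        client.Faulted -= handler;
                        client.Abort();
                        lock (_sync)
                        {
                            if (ReferenceEquals(_client, client))
                            {
                                _client = null;   // next access creates a new client
                            }
                        }
                    };
                    client.Faulted += handler;
                    _client = client;
                }
                return _client;
            }
        }
    }
}
```

Callers would go through wrapper.Client for every call rather than holding on to the client instance, otherwise they could keep using a client that has already been aborted.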
I have a Windows Service. On startup it checks whether any work has been assigned to it - and if none has, it just quits. The problem is that the recovery mechanism is set to Restart the Service.
This is what I want if the service legitimately crashes, but not if I quit the service programmatically on my own.
So far everything I've tried below has resulted in Windows automatically restarting the service:
Thread th;
protected override void OnStart(string[] args)
{
th = new Thread(new ThreadStart(StartMyService));
th.Start();
}
private void StartMyService()
{
if (WorkAvailable()) {
InitWork();
} else {
this.ExitCode = 0; // no errors
Stop();
}
}
protected override void OnStop()
{
// dispose of things
try
{
if (th != null && th.ThreadState == ThreadState.Running)
th.Abort();
}
catch { }
Environment.Exit(this.ExitCode);
}
I've tried different ExitCode values, but Windows always restarts the Service. I've also tried Environment.FailFast with same results. What am I missing?
Ignoring the issue of whether or not this is good design, the reason the OS is using the failure recovery action is because the service is failing.
When you call Stop the runtime marks the service as being in the process of stopping. It then calls the OnStop method. When the OnStop method returns it finishes cleanup and then exits the service. If this is the only service in the executable then the executable also exits. There is never a need to exit the process yourself.
When you call Environment.Exit (or Environment.FailFast) you cause the service executable to suddenly exit while the ServiceControlManager still has it listed as running, so the OS quite rightly considers the service to have failed.
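In code, that means OnStop only cleans up and returns. Below is a minimal sketch of the question's service reworked along those lines; WorkerService is a made-up name, WorkAvailable and InitWork are placeholders for the question's methods, and the timing of Stop() relative to OnStart is not addressed here:

```csharp
using System.ServiceProcess;
using System.Threading;

public class WorkerService : ServiceBase
{
    private Thread _worker;

    protected override void OnStart(string[] args)
    {
        _worker = new Thread(Run) { IsBackground = true };
        _worker.Start();
    }

    private void Run()
    {
        if (WorkAvailable())
        {
            InitWork();
        }
        else
        {
            ExitCode = 0;   // report a clean exit to the Service Control Manager
            Stop();         // marks the service as stopping, then calls OnStop
        }
    }

    protected override void OnStop()
    {
        // Release resources here and simply return.
        // No Environment.Exit and no Thread.Abort: the runtime reports the
        // stopped state, and the process exits once the last service stops.
    }

    // Placeholders standing in for the question's methods.
    private bool WorkAvailable() { return false; }
    private void InitWork() { }
}
```

Because the service reports a normal stop, the configured recovery action should no longer be triggered when the service decides to quit on its own.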
I consume a WCF service asynchronously. If I can't connect to the service or an exception occurs, the client goes into the Faulted state and the error is written to the Error property of the AsyncCompletedEventArgs.
What do I have to do with the service client? I cannot close it because it would throw a CommunicationObjectFaultedException. What else do I have to do after logging the error?
Here's my code:
MyServiceClient serviceClient = new MyServiceClient();
//Close the connection with the Service or log an error
serviceClient.JustAMethod += (object sender, AsyncCompletedEventArgs args) =>
{
if (args.Error != null)
{
//Log error
ErrorHandler.Log(args.Error);
}
else
{
serviceClient.Close();
}
};
//Call the service
serviceClient.JustAMethodAsync();
You can abort it, and create a new one. Here's a fragment from a class I wrote that deals with that issue. Everything that it touches here is legal to touch when the client is in the faulted state.
if (_client.InnerChannel.State == CommunicationState.Faulted)
{
_client.Abort();
_client = new TServiceClient();
}
TServiceClient is any subclass of System.ServiceModel.ClientBase<TIClientInterface>.
I wrote that because I've had constant access issues calling webservices from the server end of an MVC4 web app, with the browser client accessing the page via RDS.
However, as of now, the above code isn't in use. For reasons I don't understand, it had a lot more access-denied exceptions than the simplest approach of invariably creating a new client for every call, and disposing it after. I never bother checking faulted state because I never use them for more than one call anyway.
using (var cli = new Blah.Blah.FooWCFClient())
{
_stuff = cli.GetStuff();
}
...in a try/catch, of course. If you see any issues with the client-caching/Abort approach, I'd suggest you try creating a new client for every call. Maybe it costs a few cycles, but it's silly to call a web service and then start worrying about runtime efficiency. That horse has left the barn.
I don't know how this would interact with the asynchronous business, other than a vague intuition about keeping things simple and not sharing anything across threads.
Welcome to my nightmare. I haven't yet identified the cause of our access issues, but I doubt things can possibly be that bad for you. So I hope at least one of those two options will work out.
UPDATE
Here's some .tt-generated service wrapper code from our XAML application. Every web service call method gets wrapped like this, and it's been bulletproof for years. I would recommend doing essentially this:
public static POCO.Thing GetThing(int thingID)
{
var proxy = ServiceFactory.CreateNewFooWCFClientInstance();
try
{
var returnValue = proxy.GetThing(thingID);
proxy.Close();
return returnValue;
}
catch(Exception ex)
{
// ***********************************
// Error logging boilerplate redacted
// ***********************************
proxy.Abort();
throw;
}
}
I have a feeling that it's just as well if you don't reuse WCF client objects at all.
There is not much you can do with it. Create a new one and let the garbage collector collect the other one.
I've got a project called DotRas on CodePlex that exposes a component called RasConnectionWatcher, which uses the RasConnectionNotification Win32 API to receive notifications when connections on a machine change. One of my users recently brought to my attention that if the machine comes out of sleep mode and attempts to redial the connection, the connection goes into a loop indicating the connection is already being dialed even though it isn't. This loop does not end until the application is restarted, even if the dial is done through a synchronous call in which all values on the structs are unique to that specific call and nothing is retained once the call completes.
I've done as much as I can to fix the problem, but I fear the problem is something I've done with the RasConnectionNotification API and using ThreadPool.RegisterWaitForSingleObject which might be blocking something else in Windows.
The below method is used to register 1 of the 4 change types the API supports, and the handle to associate with it to monitor. During runtime, the below method would be called 4 times during initialization to register all 4 change types.
private void Register(NativeMethods.RASCN changeType, RasHandle handle)
{
AutoResetEvent waitObject = new AutoResetEvent(false);
int ret = SafeNativeMethods.Instance.RegisterConnectionNotification(handle, waitObject.SafeWaitHandle, changeType);
if (ret == NativeMethods.SUCCESS)
{
RasConnectionWatcherStateObject stateObject = new RasConnectionWatcherStateObject(changeType);
stateObject.WaitObject = waitObject;
stateObject.WaitHandle = ThreadPool.RegisterWaitForSingleObject(waitObject, new WaitOrTimerCallback(this.ConnectionStateChanged), stateObject, Timeout.Infinite, false);
this._stateObjects.Add(stateObject);
}
}
The event passed into the API gets signaled when Windows detects a change in the connections on the machine. The callback used just takes the change type registered from the state object and then processes it to determine exactly what changed.
private void ConnectionStateChanged(object obj, bool timedOut)
{
lock (this.lockObject)
{
if (this.EnableRaisingEvents)
{
try
{
// Retrieve the active connections to compare against the last state that was checked.
ReadOnlyCollection<RasConnection> connections = RasConnection.GetActiveConnections();
RasConnection connection = null;
switch (((RasConnectionWatcherStateObject)obj).ChangeType)
{
case NativeMethods.RASCN.Disconnection:
connection = FindEntry(this._lastState, connections);
if (connection != null)
{
this.OnDisconnected(new RasConnectionEventArgs(connection));
}
if (this.Handle != null)
{
// The handle that was being monitored has been disconnected.
this.Handle = null;
}
this._lastState = connections;
break;
}
}
catch (Exception ex)
{
this.OnError(new System.IO.ErrorEventArgs(ex));
}
}
}
}
}
Everything works perfectly, other than when the machine comes out of sleep. Now the strange thing is when this happens, if a MessageBox is displayed (even for 1 ms and closed by using SendMessage) it will work. I can only imagine something I've done is blocking something else in Windows so that it can't continue processing while the event is being processed by the component.
I've stripped down a lot of the code here, the full source can be found at:
http://dotras.codeplex.com/SourceControl/changeset/view/68525#1344960
I've come for help from people much smarter than myself, I'm outside of my comfort zone trying to fix this problem, any assistance would be greatly appreciated!
Thanks! - Jeff
After a lot of effort, I tracked down the problem. Thankfully it wasn't a blocking issue in Windows.
For those curious, basically once the machine came out of sleep the developer was attempting to immediately dial a connection (via the Disconnected event). Since the network interfaces hadn't finished initializing, an error was returned and the connection handle was not being closed. Any attempts to close the connection would throw an error indicating the connection was already closed, even though it wasn't. Since the handle was left open, any subsequent attempts to dial the connection would cause an actual error.
I just had to make an adjustment in the HangUp code to hide the error thrown when a connection is closed that has already been closed.
Using vs2008, vb.net, C#, fw 3.5
I am consuming my service in my client
Service is hosted in IIS
Client(winforms MDI) is generated using svcutil using /l, /r, /ct, & /n switches
Service and client both use a MyEntities.dll
I am using nettcp with TransportWithMessageCredential
I cache the proxy in the main form
if Membership.ValidateUser(UsernameTextBox.Text, PasswordTextBox.Text)
_proxy = new MyServiceClient
_proxy.ClientCredentials.UserName.UserName = "username"
_proxy.ClientCredentials.UserName.Password = "password"
I then pass the _proxy around to any child forms/plugins that need to use it
ex
List(of Orders) = _proxy.ChannelFactory.CreateChannel.GetOrders(customer)
Everything is working great but my questions are this:
What happens to the channels after the call? Are they magically disposed?
How could I monitor this, with a profiler?
Is there a way I can have error handling in one place, or do I need to place a try/catch around every call, as in the question "What is the best workaround for the WCF client `using` block issue?"
try
{
...
client.Close();
}
catch (CommunicationException e)
{
...
client.Abort();
}
catch (TimeoutException e)
{
...
client.Abort();
}
catch (Exception e)
{
...
client.Abort();
throw;
}
Could I subscribe to the _proxy.InnerChannel.Faulted and do that clean up there?
Regards
_Eric
I usually do one of two things, depending on the use case:
In a client scenario where I know only one instance of the channel is used at a time, I lazy-create a channel and re-use the created instance. If it is faulted, closed, or disposed, the channel is re-created.
In scenarios where multiple channels can be requested at the same time, I think it is best to do the exception handling dance. To avoid code bloat, you can centralize it in a method that accepts a delegate for the actual work to be done, so that it forms a write-once exoskeleton around your payload code.
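A sketch of such a write-once wrapper, assuming channels are created from a ChannelFactory<TChannel> as in the question. ServiceCall and Use are made-up names, and unlike the question's try/catch this version rethrows every exception after aborting:

```csharp
using System;
using System.ServiceModel;

public static class ServiceCall
{
    // Hypothetical helper: creates a channel, runs `work` on it,
    // then closes the channel, or aborts it if anything went wrong.
    public static TResult Use<TChannel, TResult>(
        ChannelFactory<TChannel> factory,
        Func<TChannel, TResult> work)
    {
        if (factory == null) throw new ArgumentNullException("factory");
        if (work == null) throw new ArgumentNullException("work");

        TChannel channel = factory.CreateChannel();
        // Channels created by a ChannelFactory also implement ICommunicationObject.
        var comm = (ICommunicationObject)channel;
        try
        {
            TResult result = work(channel);
            comm.Close();
            return result;
        }
        catch (Exception)
        {
            // Covers CommunicationException and TimeoutException as well:
            // abort so the channel's resources are released, then rethrow.
            comm.Abort();
            throw;
        }
    }
}
```

In C#, a call would then look like var orders = ServiceCall.Use(_proxy.ChannelFactory, ch => ch.GetOrders(customer));, so each channel is closed or aborted in exactly one place.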
Additional test results/notes
It seems I have partially answered my own question. I ran this in a loop 500 times:
List(of Orders) = _proxy.ChannelFactory.CreateChannel.GetOrders(customer)
This is very evil: at the start of the 11th iteration I got a timeout error, which matches the maximum number of users of my service (10). Does this mean that someone can implement any WCF client and open as many channels as the WCF server will allow?
I did find that this gave me the expected results and completed all 500 iterations
Dim channel = _proxy.ChannelFactory.CreateChannel
e.result = Channel.GetOrders(customer)
Dim Ich = DirectCast(channel, ServiceModel.IClientChannel)
Ich.Close()
Ich.Dispose()
My question is now
Can I cast to the channel, close and dispose it inside the _proxy.InnerChannel.Faulted event, or should I wrap every call in a try and catch the timeout/communication/fault exceptions, leaving the proxy alone but disposing of the channel? If the latter is the case, is there a way to encapsulate this?
Regards
_Eric