We have a very tricky interop problem wherein the thread used to initialize a 3rd-party system has to be the same thread used to terminate it. Failure to do this results in a deadlock. We are performing interop from a WCF service hosted in IIS. Currently this cleanup is done in disposal and normally works very well. Unfortunately, under heavy load IIS will do a rude unload and we never get to call dispose. We can move the shutdown logic into a critical finalizer but that doesn't help since we no longer have access to the initializing thread! At this point our only recourse seems to be notifying the CLR that the AppDomain is now likely in a corrupted state. However, I'm not sure how to do that (or if it's even possible). It may be that this is the utility of contracts at a class level but I admit I don't really understand those fully.
EDIT: Alternatively, this could be viewed as a thread affinity problem in the finalizer. If anyone has a clever solution to that, I'm all ears :)
If possible, split the code that depends on that native library into a standalone Windows service application. If it cannot work well with WCF/IIS, you should avoid the conflict instead of fighting against it.
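In a standalone service you control thread lifetimes, so the "initialize and terminate on the same thread" requirement can be met by giving the native system one dedicated owner thread. The following is only a minimal sketch of that idea: AffinitizedHost, Post, and the two delegates are hypothetical names standing in for the real 3rd-party init and shutdown calls, and it does nothing to help with a rude unload in IIS.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: one dedicated thread runs initialize, all queued
// work items, and terminate, so init and shutdown share a thread.
public sealed class AffinitizedHost : IDisposable
{
    private readonly BlockingCollection<Action> _work = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public AffinitizedHost(Action initialize, Action terminate)
    {
        _thread = new Thread(() =>
        {
            initialize(); // runs on the owner thread
            try
            {
                // Blocks until CompleteAdding() is called and the queue is drained.
                // Work items are assumed not to throw.
                foreach (Action item in _work.GetConsumingEnumerable())
                {
                    item();
                }
            }
            finally
            {
                terminate(); // same thread that ran initialize()
            }
        });
        _thread.Start();
    }

    // Queues a call that must run on the owner thread.
    public void Post(Action item)
    {
        _work.Add(item);
    }

    // Signals shutdown and waits for terminate() to finish on the owner thread.
    public void Dispose()
    {
        _work.CompleteAdding();
        _thread.Join();
        _work.Dispose();
    }
}

public static class Program
{
    public static void Main()
    {
        int initThread = 0, termThread = 0;
        using (var host = new AffinitizedHost(
            () => initThread = Thread.CurrentThread.ManagedThreadId,
            () => termThread = Thread.CurrentThread.ManagedThreadId))
        {
            host.Post(() => Console.WriteLine("working"));
        }
        Console.WriteLine(initThread == termThread);
    }
}
```

In a Windows service, Dispose would typically be called from the stop handler, so the shutdown call lands on the same thread that performed initialization.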
I have been using async/await for a while, but delved deeper recently, and read a lot of best-practice tips saying to always use ConfigureAwait(false) by default to prevent deadlocks and improve performance.
I just want to make sure I am not missing something when I presume this only applies when there is an actual current SynchronizationContext or TaskScheduler in play, correct?
If I have a Windows service app that is responding to messages/commands/etc. asynchronously, it always just uses the default scheduler, so the continuation will probably run on the same thread-pool thread that the awaitable completed on. Thus no deadlock can occur and no performance difference can be had from using ConfigureAwait(false), correct?
It's not like I can't put it there, but I hate noisy code so much...
In general, this is true. When working in a Console or Service scenario, there is no SynchronizationContext installed (by default) so the continueOnCapturedContext option in ConfigureAwait will have no effect, which means you can safely remove it without changing the runtime behavior.
However, there can be exceptions, so I would often suggest writing your code including ConfigureAwait(false) when appropriate anyways.
The main advantages of including this even in a console or service application are:
The code becomes reusable in other applications later. If you choose to reuse this code, you won't have to track down bugs that arise from not including this.
If you happen to install (or use a library that installs) a SynchronizationContext while running, the behavior of your methods won't change.
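As a rough illustration of the points above, here is a small sketch of a reusable helper that opts out of context capture on every await. FileHelpers and CountLinesAsync are hypothetical names chosen for the example; the console host only shows that no SynchronizationContext is installed by default.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class FileHelpers
{
    // Library-style method: it never needs to resume on the caller's context,
    // so each await passes continueOnCapturedContext: false.
    public static async Task<int> CountLinesAsync(string path)
    {
        using (var reader = new StreamReader(path))
        {
            int count = 0;
            while (await reader.ReadLineAsync().ConfigureAwait(false) != null)
            {
                count++;
            }
            return count;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Prints whether a context is installed; a plain console app or
        // Windows service has none by default.
        Console.WriteLine(SynchronizationContext.Current == null);

        string path = Path.GetTempFileName();
        File.WriteAllText(path, "a\nb\nc\n");

        // Blocking here is harmless without a context. Under a UI context the
        // same blocking call could deadlock if the helper captured the context.
        int lines = FileHelpers.CountLinesAsync(path).GetAwaiter().GetResult();
        Console.WriteLine(lines);

        File.Delete(path);
    }
}
```

Because the helper never relies on the captured context, it behaves the same whether it is later called from a service, a console app, or a UI application.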
So I've googled that it freezes because of using unsafe code, and that ThreadAbortException is thrown only when control flow returns to managed code. In my case I have a native library, called in a thread. Sometimes I can't abort it, because the library is native, and the Abort method doesn't just do nothing but freezes the calling thread.
So, I'd like to solve it.
For example, using a different process should help, but it's very complicated.
So, a less heavy solution is to use AppDomains. But I would still have to create an exe and call it. I tried to generate it in memory like this:
var appDomain = AppDomain.CreateDomain("newDomain");
var assemblyBuilder = appDomain.DefineDynamicAssembly(new AssemblyName("myAsm"), AssemblyBuilderAccess.RunAndCollect);
var module = assemblyBuilder.DefineDynamicModule("myDynamicModule");
var type = module.DefineType("myStaticBulder", TypeAttributes.Public);
var methBuilder = type.DefineMethod("exec", MethodAttributes.Static | MethodAttributes.Public);
var ilGenerator = methBuilder.GetILGenerator();
but I found only the Reflection.Emit way, which is very, very complicated.
Does a simpler solution exist?
This cannot work by design. The CLR has very strict rules about what kind of code can safely be aborted. This matters beyond the unwise use of Thread.Abort(): there are plenty of cases where the CLR must abort code, AppDomain unloads being foremost.
The iron-clad rule is that the CLR must be convinced that it is safe to abort the code. It is only convinced of that if the thread is busy executing managed code or is waiting on a managed synchronization object. Your case does not qualify; there is no way for the CLR to have any idea what that native code is doing. Aborting a thread in such a state almost always causes problems. It is the same danger as Thread.Abort(), but multiplied by a thousand. A subsequent deadlock on an internal operating system lock is very likely and utterly undebuggable.
An AppDomain therefore is not a solution either: it cannot be unloaded until the thread has stopped running, and it won't.
Only thing you can do is isolate that code in a separate process. Write a little helper EXE project that exposes its api through a standard .NET IPC mechanism like a socket, named pipe, memory mapped file, remoting or WCF. When the code hangs, you can safely Process.Kill() it. No damage can be done, the entire process state is thrown away. Recovering tends to be quite tricky however, you still do have to get the process restarted and get it back into the original state. Especially the state restoration is usually very difficult to do reliably.
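The kill-on-hang part of this advice can be sketched as follows. This is only an illustration under assumptions: IsolatedRunner is a hypothetical name, the helper EXE path and arguments are placeholders, and the IPC channel and state restoration described above are left out.

```csharp
using System;
using System.Diagnostics;

public static class IsolatedRunner
{
    // Starts the helper EXE, waits up to timeoutMs, and kills it if it hangs.
    // Returns the exit code, or null when the process had to be killed.
    public static int? Run(string exePath, string arguments, int timeoutMs)
    {
        var startInfo = new ProcessStartInfo(exePath, arguments)
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            if (process.WaitForExit(timeoutMs))
            {
                return process.ExitCode;
            }

            try
            {
                // The native call is stuck: throw the whole process away.
                process.Kill();
            }
            catch (InvalidOperationException)
            {
                // The process exited on its own between the timeout and Kill().
            }

            process.WaitForExit(); // wait until the process is really gone
            return null;
        }
    }
}
```

A caller might use it as `int? code = IsolatedRunner.Run("NativeHost.exe", "--job 42", 5000);`, where NativeHost.exe and its arguments are made-up names, and treat a null result as the signal to restart the helper and rebuild its state.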
Recently I worked with an external dll library where I have no influence on it.
Under some special circumstances, a method of this third party dll is blocking and never returning.
I tried to work around this issue by executing this method in a new AppDomain. After a custom timeout, I wanted to Unload the AppDomain and kill all this crap ;)
Unfortunately, it does not work - as someone would expect.
After some time it throws CannotUnloadAppDomainException since the blocking method does not allow aborting the thread gracefully.
I depend on using this library and it does not seem that there will be an update soon.
So can I work around this issue, even if it's not best practice?
Any bad hack appreciated :)
An AppDomain cannot typically solve that problem, it's only good to throw away the state of your program. The real issue is that your thread is stuck. In cases like these, calling Thread.Abort() is unlikely to work, it will just get stuck as well. A thread can only be aborted if it is in an "alertable wait state", blocking on a CLR synchronization object, or executing managed code; that is, in a state that the CLR knows how to safely clean up. Most 3rd party code falls over like this when executing unmanaged code, and there is no way to ever clean that up safely. A decisive hint that this is the case is AppDomain.Unload failing to get the job done: it can only unload the AppDomain when it can abort the threads that are executing code in the domain.
The only good alternative is to run that code in a separate process, which you can kill with Process.Kill(). Windows does the cleanup. You'd use a .NET interop mechanism to talk to that code, like named pipes, sockets, remoting or WCF. Add to that the considerable hassle of having to write the code that detects the timeout, kills the process, starts it back up and recovers internal state, since you now restart with an uninitialized instance of that 3rd party code.
Do not forget about the real fix. Create a small repro project that reproduces the problem. When it hangs, create a minidump of the process. Send both to the 3rd party support group.
After reading this (scroll down to the end, to Blocking Issues), I think your only solution is to run the method in a different process. This might involve quite a bit of refactoring and/or a 'host' project (e.g. a console application) that loads the method in question and makes it easy to call (e.g. by reading args from the command line) when launching the new process using the Process class.
You can always use a BackgroundWorker; there is no need to create a new AppDomain. This will ensure that you have complete control over the execution of the thread.
However, there is no way to ensure that you can gracefully abort the thread. As the DLL is unmanaged, chances are that it may cause memory leaks. However, spawning a new thread will ensure that your application does not crash when the DLL does not respond.
I am doing a project where I load several assemblies at runtime. For each of those assemblies I use reflection to find some specific classes, instantiate them and call their methods. All this is working fine, but for some of the calls the process encounters a stack overflow which terminates my entire program. I don't have any control over the source code of the assemblies I am loading, so I can't change the code I'm executing.
What I have tried to solve the problem:
I assigned a thread to do the invocation of the methods and tried to abort the thread after a time interval (I know that this is bad practice, but I can't change the code to terminate in a friendly way). This, however, doesn't work; I think the thread is too busy "stackoverflowing" to handle the Abort call.
I've tried reducing the actual memory the thread has access to. This is not even a solution, because you can't catch the StackOverflowException, so my program terminates anyway (just quicker).
Questions:
Can a thread be too busy to be aborted? Is there some way to abort a thread that is behaving like this?
How can we call code (that we don't have any control over) in a good way?
Thanks in advance!
The recommended procedure in case of "opaque code" is to actually fork a new process and start it. That way you gain two benefits:
If it fails by itself, it's isolated and won't take your main application down as well.
You can safely kill it and it won't cause as much trouble as an aborted thread.
I always get a DisconnectedContext (a managed debugging assistant) when I run my application using Visual Studio. According to Google and the docs, this can happen when COM objects on an STA are called from another thread.
However, when I look through all the threads when the popup appears, I don't find anything like this. (And I don't find anything weird at all.)
Any ideas on how I can find out how the DisconnectedContext is raised?
Found this while looking for the same answer, thought I'd add a comment...
This error is virtually unavoidable in any multi-threaded app using CLR objects through in-process interop (on transient threads). The problem is that the CLR has non-deterministic cleanup of objects (which may be RCWs, with thread affinity on the underlying COM objects). There's no way you can tell the runtime to clean up objects created on a thread (at least without creating another non-deterministic cleanup handle on the thread); it's a design limitation of the interop mechanism. Given that, there's no way to ever safely exit a thread which has created any CLR objects without potentially getting this error.
Best advice: don't use CLR/interop if you can help it. Next best advice: use COM+ to process-isolate your interop, so the CLR can live in a process which never terminates threads (use a persistent thread pool or equivalent). Next best advice: join me in continuing to tell Microsoft about this design-level problem with their interop, and hope they fix it.
This is a pretty serious warning, don't ignore it. The scenario is that you created a COM object on a thread and that thread exited. But you keep using that object. COM takes care of objects that announced themselves to be not thread-safe (aka apartment threaded), it automatically marshals any calls on that object to the thread that created it. That can't work when that thread is no longer around.
Ignoring the warning can produce occasional and very hard to troubleshoot threading race errors. Stuff that goes subtly wrong only once a week. Review your code, pay attention to how the object that it complains about got created.