Recently I worked with an external DLL that I have no influence over.
Under some special circumstances, a method of this third-party DLL blocks and never returns.
I tried to work around this issue by executing this method in a new AppDomain. After a custom timeout, I wanted to Unload the AppDomain and kill all this crap ;)
Unfortunately, it does not work as one would expect.
After some time it throws CannotUnloadAppDomainException since the blocking method does not allow aborting the thread gracefully.
I depend on using this library and it does not seem that there will be an update soon.
So can I work around this issue, even if it's not best practice?
Any bad hack appreciated :)
An AppDomain typically cannot solve that problem; it is only good for throwing away the state of your program. The real issue is that your thread is stuck. In cases like these, calling Thread.Abort() is unlikely to work; it will just get stuck as well. A thread can only be aborted when it is in a state that the CLR knows how to clean up safely: executing managed code, or in an alertable wait state, blocking on a CLR synchronization object. Most 3rd party code falls over like this while executing unmanaged code, and there is no way to ever clean that up safely. A decisive hint that this is the case is AppDomain.Unload failing to get the job done: it can only unload the AppDomain when it can abort the threads that are executing code in the domain.
The only good alternative is to run that code in a separate process, which you can kill with Process.Kill(). Windows does the cleanup. You'd use a .NET interprocess communication mechanism to talk to that code, like named pipes, sockets, remoting or WCF. Add to that the considerable hassle of writing the code that detects the timeout, kills the process, starts it back up and recovers internal state, since you now restart with an uninitialized instance of that 3rd party code.
Do not forget about the real fix. Create a small repro project that reproduces the problem. When it hangs, create a minidump of the process. Send both to the 3rd party support group.
After reading this (scroll down to the end, to Blocking Issues), I think your only solution is to run the method in a different process. This might involve quite a bit of refactoring and/or a 'host' project (e.g. a console application) that loads the method in question and makes it easy to call (e.g. by reading args from the command line) when launching the new process with the Process class.
You can always use a BackgroundWorker; there is no need to create a new AppDomain. This gives you control over the thread that executes the call.
However, there is no way to ensure that you can abort the thread gracefully. As the DLL is unmanaged, chances are that it will leak memory. Still, spawning a new thread ensures that your application does not freeze when the DLL does not respond.
So I've googled that it freezes because of unmanaged code: a ThreadAbortException is thrown only when control flow returns to managed code. In my case I have a native library, called in a thread. Sometimes I can't abort it, because the library is native, and the Abort method not only does nothing but also freezes the calling thread.
So, I'd like to solve it.
For example, using a different process should help, but it's very complicated.
So, a less heavy solution is to use AppDomains. But I would still have to create an exe and call it. I tried to generate it in memory like this:
var appDomain = AppDomain.CreateDomain("newDomain");
var assemblyBuilder = appDomain.DefineDynamicAssembly(new AssemblyName("myAsm"), AssemblyBuilderAccess.RunAndCollect);
var module = assemblyBuilder.DefineDynamicModule("myDynamicModule");
var type = module.DefineType("myStaticBulder", TypeAttributes.Public);
var methBuilder = type.DefineMethod("exec", MethodAttributes.Static | MethodAttributes.Public);
var ilGenerator = methBuilder.GetILGenerator();
but the only way I found is Reflection.Emit, which is very, very complicated.
Is there a simpler solution?
This cannot work, by design. The CLR has very strict rules about what kind of code can safely be aborted. That matters beyond the unwise use of Thread.Abort(): there are plenty of cases where the CLR must abort code, AppDomain unloads being foremost.
The iron-clad rule is that the CLR must be convinced that it is safe to abort the code. It is only convinced of that if the thread is busy executing managed code or is waiting on a managed synchronization object. Your case does not qualify; there is no way for the CLR to have any idea what that native code is doing. Aborting a thread in such a state almost always causes problems. It is the same idea as the danger of Thread.Abort(), but multiplied by a thousand. A subsequent deadlock on an internal operating system lock is very likely, and utterly undebuggable.
An AppDomain therefore is not a solution either: it cannot be unloaded until the thread stops running, and it won't.
The only thing you can do is isolate that code in a separate process. Write a little helper EXE project that exposes its API through a standard .NET IPC mechanism like a socket, named pipe, memory mapped file, remoting or WCF. When the code hangs, you can safely Process.Kill() it. No damage can be done; the entire process state is thrown away. Recovering tends to be quite tricky, however: you still have to get the process restarted and back into its original state. The state restoration in particular is usually very difficult to do reliably.
I am doing a project where I load several assemblies at runtime. For each of those assemblies I use reflection to find some specific classes, instantiate them and call their methods. All this is working fine, but for some of the calls the process encounters a stack overflow, which terminates my entire program. I don't have any control over the source code of the assemblies I am loading, so I can't change the code I'm executing.
What I have tried to solve the problem:
I assign a thread to do the invocation of the methods and tried to abort the thread after a time interval (I know that this is bad practice, but I can't change the code to terminate in a friendly way). This however doesn't work; I think the thread is too busy "stackoverflowing" to handle the Abort call.
I've tried reducing the actual memory the thread has access to. This is not even a solution, because you can't catch the StackOverflowException, so my program terminates anyway (just quicker).
Questions:
Can a thread be too busy to be aborted? Is there some way to abort a thread that behaves like this?
How can we call code (that we don't have any control over) in a good way?
Thanks in advance!
The recommended procedure in case of "opaque code" is to actually fork a new process and start it. That way you gain two benefits:
If it fails by itself, it's isolated and won't take your main application down as well.
You can safely kill it and it won't cause as much trouble as an aborted thread.
We have a very tricky interop problem wherein the thread used to initialize a 3rd-party system has to be the same thread used to terminate it. Failure to do this results in a deadlock. We are performing interop from a WCF service hosted in IIS. Currently this cleanup is done in disposal and normally works very well. Unfortunately, under heavy load IIS will do a rude unload and we never get to call dispose. We can move the shutdown logic into a critical finalizer but that doesn't help since we no longer have access to the initializing thread! At this point our only recourse seems to be notifying the CLR that the AppDomain is now likely in a corrupted state. However, I'm not sure how to do that (or if it's even possible). It may be that this is the utility of contracts at a class level but I admit I don't really understand those fully.
EDIT: Alternatively, this could be viewed as a thread affinity problem in the finalizer. If anyone has a clever solution to that, I'm all ears :)
Try to split the code that depends on that native dependency out into a standalone Windows service application if possible. If it cannot work well with WCF/IIS, you should avoid the conflicts instead of fighting against them.
I always get a DisconnectedContext (a managed debugging assistant) when I run my application from Visual Studio. According to Google and the docs, this can happen when COM objects in an STA are called from another thread.
However, when I look through all the threads when the popup appears, I don't find anything like this. (And I don't find anything weird at all.)
Any ideas on how I can find out what raises the DisconnectedContext?
Found this while looking for the same answer, thought I'd add a comment...
This error is virtually unavoidable in any multi-threaded app using CLR objects through in-process interop (on transient threads). The problem is that the CLR has non-deterministic cleanup of objects (which may be RCWs, with thread affinity on the underlying COM objects). There's no way you can tell the runtime to clean up objects created on a thread (at least without creating another non-deterministic cleanup handle on the thread); it's a design limitation of the interop mechanism. Given that, there's no way to ever safely exit a thread which has created any CLR objects without potentially getting this error.
Best advice: don't use CLR/interop if you can help it. Next best advice: use COM+ to process-isolate your interop, so the CLR can live in a process which never terminates threads (use persistent thread pool or equivalent). Next best advice: join me in continuing to tell Microsoft about this design-level problem with their interop, and hope they fix it.
This is a pretty serious warning, don't ignore it. The scenario is that you created a COM object on a thread and that thread exited. But you keep using that object. COM takes care of objects that announced themselves to be not thread-safe (aka apartment threaded), it automatically marshals any calls on that object to the thread that created it. That can't work when that thread is no longer around.
Ignoring the warning can produce occasional and very hard to troubleshoot threading race errors. Stuff that goes subtly wrong only once a week. Review your code, pay attention to how the object that it complains about got created.
I have a thread that goes out and attempts to make a connection. In the thread, I make a call to a third party library. Sometimes, this call hangs, and never returns. On the UI thread, I want to be able to cancel the connection attempt by aborting the thread, which should abort the hung call to the third party library.
I've called Thread.Abort, but have now read that Thread.Abort only works when control returns to managed code. I have observed that this is true, because the thread never aborts, and I've been sitting on Thread.Join for ten minutes now. What should I do with this hung thread? Should I just null the reference and move on? I'd like to be as clean as possible.
Random thought: I wonder if you could write a second assembly as a small console exe that does this communication... launch it with Process.Start and capture results either via the file system or by intercepting stdout. Then if it hangs you can kill the process.
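For what it's worth, here is a minimal sketch of that idea. It assumes the third-party call has been moved into some helper exe that writes its result to stdout; the names HelperProcess and RunWithTimeout are made up for illustration and are not part of any library.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class HelperProcess
{
    // Starts 'exePath' with 'arguments' and returns what it wrote to stdout,
    // or null if it did not exit within 'timeout' and had to be killed.
    public static string RunWithTimeout(string exePath, string arguments, TimeSpan timeout)
    {
        var startInfo = new ProcessStartInfo(exePath, arguments)
        {
            UseShellExecute = false,        // required for stream redirection
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            // Read stdout asynchronously so a full pipe buffer cannot block the child.
            Task<string> output = process.StandardOutput.ReadToEndAsync();

            if (!process.WaitForExit((int)timeout.TotalMilliseconds))
            {
                try { process.Kill(); }   // the OS reclaims everything the process owned
                catch (InvalidOperationException) { /* it exited just before the kill */ }
                process.WaitForExit();    // wait until the process is really gone
                return null;
            }

            return output.Result;
        }
    }
}
```

A call would look something like `HelperProcess.RunWithTimeout("ConnectHelper.exe", "some-host", TimeSpan.FromSeconds(30))`, where the exe name and argument are placeholders. A null result means the helper hung and was killed, so the caller can decide whether to retry.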
A bit harsh, maybe - and obviously it has overheads of spawning a process - but it should at least be possible to kill it.
This function in your third-party library doesn't have a timeout or cancel function? If so, that's pretty poor design. There's not going to be any pretty solution here, methinks...
Unfortunately, there's no way you're going to get around it, short of using the Win32 API to kill the thread manually, which is certainly not going to be clean. However, if this third-party library is not giving you any other options, it may be the thing to do. The TerminateThread function is what you'll want to use, but observe the warning! TerminateThread takes a native thread handle, and the Thread class exposes neither the handle nor the native thread ID directly, so you need further Win32 API calls. The approach here is to store the result of GetCurrentThreadId in a volatile field at the start of the managed thread method, and later pass that ID to OpenThread to obtain the handle you give to TerminateThread.
Not sure if this will do it or be acceptable, but it's worth a shot.
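// hThread is a native thread handle (e.g. obtained from OpenThread), not a thread ID.
[DllImport("kernel32.dll", SetLastError = true)]
private static extern bool TerminateThread(IntPtr hThread, uint dwExitCode);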
From the documentation
TerminateThread is a dangerous function that should only be used in the most extreme cases. You should call TerminateThread only if you know exactly what the target thread is doing, and you control all of the code that the target thread could possibly be running at the time of the termination. For example, TerminateThread can result in the following problems:
If the target thread owns a critical section, the critical section will not be released.
If the target thread is allocating memory from the heap, the heap lock will not be released.
If the target thread is executing certain kernel32 calls when it is terminated, the kernel32 state for the thread's process could be inconsistent.
If the target thread is manipulating the global state of a shared DLL, the state of the DLL could be destroyed, affecting other users of the DLL.
Managed threads can't directly stop native threads. So if the call is blocked in native code, the best you can do is have the managed thread check for a stop request and terminate once the call returns. If it never returns, maybe there's a version of the call with a timeout?
If not, killing the thread (through win32) is not usually a good idea...
Not a good solution to ever wait on a thread (in any language) indefinitely, especially if you are making external calls. Always use a join with a timeout, or a spin lock that monitors the state of a shared atomic variable until it changes, or you reach a timeout. I'm not a C# guy, but these are all sound concurrency practices.
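In C#, the join-with-timeout part could look roughly like the sketch below. TimedCall and TryRun are hypothetical names chosen for this example. Note that this only stops the caller from waiting forever; on a timeout the stuck worker thread is not stopped, it is just left behind as a background thread.

```csharp
using System;
using System.Threading;

public static class TimedCall
{
    // Runs 'work' on a separate thread and waits at most 'timeout' for it.
    // Returns true and sets 'result' if the call finished in time, false otherwise.
    public static bool TryRun<T>(Func<T> work, TimeSpan timeout, out T result)
    {
        T value = default(T);
        Exception error = null;

        var worker = new Thread(() =>
        {
            try { value = work(); }
            catch (Exception ex) { error = ex; }
        });
        worker.IsBackground = true; // a stuck worker must not keep the process alive
        worker.Start();

        // Join(TimeSpan) returns false if the thread is still running after the timeout.
        if (!worker.Join(timeout))
        {
            result = default(T);
            return false;
        }

        if (error != null)
            throw new InvalidOperationException("The call failed.", error);

        result = value;
        return true;
    }
}
```

For example, `TimedCall.TryRun(() => library.Connect(), TimeSpan.FromSeconds(30), out connection)` (with `library.Connect` standing in for the third-party call) returns false after 30 seconds instead of blocking the UI thread indefinitely.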