Debugging ideas for a blocked Windows Message loop - c#

I have a longstanding C# .NET 3.5 application 'freeze' which I am at a loss with. There are two C# executables. One has a full UI, the other runs as a tray app. They both communicate via WCF to a third service app, also running in the tray.
At random, the UI thread of the main WinForms app will deadlock. Mysteriously, if I quit the tray app, the UI of this app unlocks.
Whenever I attach the debugger to either app I learn nothing useful. In the frozen app, the UI thread is blocked in the Application.Run method. All other threads are either sleeping or blocked on Invokes onto the UI thread.
Also mysteriously, another running application such as Photoshop will behave strangely while this deadlock is in place. Quitting the tray app sorts this out too.
All I can deduce is that something is going wrong with the main Windows-level message pump, but I don't really understand how I can debug further into this. I've installed the framework source code and can see the deadlocked app is stuck in a while loop in:
Application.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop
but don't really understand enough to do anything with this information.
Does anyone have any advice at all on where to look further? I've been chasing this random deadlock bug for months.
Thanks,
Nick

I think this could be a red herring: that method is in the Visual Studio SDK, so it may really be your debugging session that freezes.
I have had to debug a few freezes, both work related and not, and they are very nasty and require meticulous instrumentation and code review. So be patient!
Here are a few pieces of advice from me:
1) You will see a few red herrings on the way, so be careful not to get bogged down in them and confuse manifestations of the problem with the cause itself.
2) What is the timing of this freeze? How long does it last? A TCP connection timeout usually takes 23 seconds, while a database connection times out in 30 and a command in 120 seconds (this can differ with different settings), so the duration is a big clue. If it does not resolve by itself and you have to close one application to get rid of it, it is almost certainly a thread or database deadlock.
3) Use Sysinternals' Process Explorer and Process Monitor to see what the processes are doing and at what point they freeze. The last activity could give you a hint, though not always.
4) I know it will take some time, but start writing tracing in your code so that you find the exact location of the issue. From then on, it usually takes a few hours to days to find the problem.
5) If you have more info, post another question and let me know.
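For points 2 and 4, here is a minimal sketch of what the tracing could look like. `FreezeTrace` and its members are hypothetical names, not from the original application, and it assumes the `TRACE` symbol is defined (otherwise calls to `Trace.WriteLine` are compiled out). Each line records a timestamp and the managed thread id, and `Timed` brackets a call so that an "enter" without a matching "leave" points at the call that never returned.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.Threading;

// Hypothetical helper: one trace line per event, tagged with time and thread id.
public static class FreezeTrace
{
    // Builds the trace line; kept separate so the format can be checked in isolation.
    public static string Format(DateTime utcNow, int threadId, string message)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "{0:HH:mm:ss.fff} [T{1}] {2}", utcNow, threadId, message);
    }

    public static void Write(string message)
    {
        Trace.WriteLine(Format(DateTime.UtcNow,
            Thread.CurrentThread.ManagedThreadId, message));
    }

    // Runs the action, logs entry and exit, and returns how long it took.
    // The exit line is written even if the action throws.
    public static TimeSpan Timed(string name, Action action)
    {
        Write("enter " + name);
        Stopwatch watch = Stopwatch.StartNew();
        try
        {
            action();
        }
        finally
        {
            watch.Stop();
            Write("leave " + name + " after " + watch.ElapsedMilliseconds + " ms");
        }
        return watch.Elapsed;
    }
}
```

Wrapping the WCF calls and the cross-thread Invokes in `FreezeTrace.Timed(...)` would be a reasonable first place to start, since the measured durations can then be compared against the timeouts mentioned in point 2.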

Related

Application freezes for no reason, no deadlock and impossible to attach to Visual Studio for debugging

I have a C# .NET 4.6 console application that is supposed to run continuously (over days/months). However, after a non deterministic duration, all running threads will freeze for no apparent reason (CPU usage is at 0%, memory is not particularly high), and trying to attach the application to an instance of Visual Studio 2015 for debugging will fail (pressing "pause" will cause Visual Studio to stop responding!).
I inspected the parallel stack traces (captured via a dump in the process explorer) and could not find any sign of a deadlock (which would otherwise be the obvious culprit).
Here are for example 2 parallel stacks that are frozen (not even in my code but in the DirectoryInfo.cs core library, and ServiceStack OrmLite library), even though there are absolutely no reasons for them to be stuck like this.
I have previously noticed this behavior of freezing on other parts of code so it really seems these libraries are "victims" of the freeze and not responsible for it. Even if there were a deadlock which I could not see, it should not prevent these threads from completing as they are not waiting for anything.
Finally, killing the process and restarting it will always allow the previously frozen operations to run successfully.
Do you have any clue on what could be causing this kind of weird behavior/have any advice on tools to be used to get more insight?
It seems both threads are hanging while reading information (ExecuteReader reads data from a file, and the enumerator is trying to read directory information from the file system). Is the file system accessible at that point in time? Are you able to access the directory that is being read?

Why would I bother properly terminating all threads in a multi-thread process?

The company I work for uses Visual Studio to develop its website and all of its features, and there is also a separate site that's been developed for testing the site. This 'testing' site can run individual test cases against the website, and must be run for each possible case.
Everything is written in VB.NET, and each time the program is run a single thread is created to run the test. However, at the 'end' of the test the thread still seems to linger. The stop button in Visual Studio must be clicked manually in order to terminate the application. Also, a process icon lingers in the task bar long after the application has closed.
It appears to me that the program is not correctly terminating all threads run during the tests, but I'm not sure if this is an issue worth bringing up in the office, so I ask the following question...
What is the purpose of properly closing an application and all threads running on it, and what are the consequences, if any, of not doing so?
Well it's probably a small problem now, but it's not a good practice, IMHO. Imagine what would happen if the same code was now being executed by a continuous integration server, for instance, TeamCity (or Jenkins, or...), and the unit tests are being run continuously and automatically, by said build server.
What would happen to the build status when those threads fail to close down cleanly? We often face this problem due to bad design decisions in threading, or due to simple (and possibly, idiotic) mistakes in our unit testing code. The net effect though, is a hung build process.
I've seen CI servers hang for almost half a day before someone (mercifully) killed the build process. Essentially, this indicates a problem in our code that may or may not become a huge issue. If this was server-side code, there is potential for this code to lead to a pretty bad situation. My advice would be to dig out your introspection toolkits (memory profiling, perf profiling, etc) and see what exactly is going on, and resolve it.
We had a similar problem with an application that is called to index SPA pages on our application server. It was throwing an exception in some cases and threads were not closing. The biggest downside is that it consumes the server's memory, which is bad.
Another downside is that, because it runs as a web application, it consumes available ports and stops working when it runs out of them.
The code should be modified to shut the thread down gracefully after it finishes or when an exception occurs, and of course to report any exception.
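A rough sketch of that last point, in C# rather than the VB.NET of the question, and assuming .NET 4 or later for `CancellationToken`. `TestWorker` is a hypothetical name. The thread is marked as a background thread so it cannot keep the process alive, it is asked to stop cooperatively, and any exception is captured so it can be reported rather than lost.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper around a single worker thread.
public sealed class TestWorker
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private readonly Thread _thread;
    private volatile Exception _error;

    public TestWorker(Action<CancellationToken> body)
    {
        _thread = new Thread(() =>
        {
            try { body(_cts.Token); }
            catch (OperationCanceledException) { /* normal shutdown path */ }
            catch (Exception ex) { _error = ex; } // keep the failure so the caller can report it
        });
        // A background thread does not keep the process alive once the main thread exits.
        _thread.IsBackground = true;
    }

    // The exception that ended the worker, or null if it ended normally.
    public Exception Error { get { return _error; } }

    public void Start() { _thread.Start(); }

    // Requests cancellation and waits up to 'timeout'.
    // Returns false if the thread is still running afterwards.
    public bool Stop(TimeSpan timeout)
    {
        _cts.Cancel();
        return _thread.Join(timeout);
    }
}
```

This only works if the body checks the token regularly (for example with `token.ThrowIfCancellationRequested()`); a thread blocked in a call that ignores the token will still linger, and `Stop` returning false is the signal to go and look at what it is blocked on.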

Program hanging point identification

There is a C# program that hangs, though rarely. It runs on remote machines, so attaching a debugger is not an option. Running an external profiler is more realistic, but also comes with huge difficulties. How can you determine where the program hangs without a profiler or debugger?
The "detailed logging to the file system" option is poorly suited: the program is about 20 thousand lines of code and does not hang often.
I have tried Process Explorer, but it behaves very strangely (or I have not understood it). If you manage to catch the moment when a thread enters an infinite loop, you can see its stack at that moment. But the thread disappears quite quickly (either only in PE, or it really is killed by the environment).
Creating another application to act as a monitor is acceptable. If you can tell me how to create a dump of the main process, or how to obtain information about its threads, that would be great. Ready-made tools would be even better.
When an application crashes, it should normally be logged in the Windows Application Event Log. It's not extremely detailed, but it should give pretty solid clues without any external tools.
To get there, you can either search for "Event Log" in the Start Menu or find it in the Control Panel. It is located in the Administrative Tools section.
Once you're in the Event Viewer, open the Windows Logs item on the left, then select Application. You should be able to find your application in the list using the Source column.
At the bottom you'll find the error detail, the timestamp and a few more pieces of information that can help you debug your application.
Picture taken from Cyberlink.com
By 'hang', do you mean that the program stops working until it is restarted, or that it pauses for an unusual amount of time? If the latter, it could be in a heavy garbage collection. If it's the former and you suspect some sort of infinite loop, then in Task Manager (or Process Explorer) you should see it pretty much eating up one of the processor cores. For example, if you have four cores and a program is hung in a tight loop, you will see roughly 25% CPU usage in the performance panel (assuming an otherwise lightly loaded machine).
MS supports managed debugging; see "Debugging Managed Code Using the Windows Debugger". You can use the SOS extension to break into the running program and look at its state. You might want to have the program's PDB handy if you take this approach.
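On the "application-monitor" idea from the question: as a starting point, `System.Diagnostics.Process.Threads` exposes the native threads of another process, including their state and accumulated CPU time. The sketch below uses a hypothetical `ThreadSnapshot` helper, which is not a ready-made tool. It does not give managed call stacks (for those a dump plus SOS, as above, is still needed), but two snapshots taken a few seconds apart can show whether one thread is burning CPU or everything is waiting.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper for a monitor application.
public static class ThreadSnapshot
{
    // Returns one descriptive line per native thread of the given process.
    public static List<string> Describe(Process process)
    {
        var lines = new List<string>();
        foreach (ProcessThread thread in process.Threads)
        {
            string detail;
            try
            {
                System.Diagnostics.ThreadState state = thread.ThreadState;
                detail = state.ToString();
                // WaitReason is only valid while the thread is in the Wait state.
                if (state == System.Diagnostics.ThreadState.Wait)
                    detail += "/" + thread.WaitReason;
                detail += " cpu=" + thread.TotalProcessorTime;
            }
            catch (Exception ex)
            {
                // Threads can exit between enumeration and query, and some
                // properties need extra privileges or are platform specific.
                detail = "unavailable (" + ex.GetType().Name + ")";
            }
            lines.Add("thread " + thread.Id + ": " + detail);
        }
        return lines;
    }
}
```

A monitor could call `ThreadSnapshot.Describe(Process.GetProcessById(pid))` on a timer and log the result. For a full memory dump without writing any code, Sysinternals ProcDump (`procdump -ma <pid>`) is one existing tool worth trying.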

Application becomes sluggish after minimizing

We have a fairly large winforms application that contains all the following:
Socket Connections
Async Calls to get data from web services thru backgroundworkers.
Timers and other events.
The application works fine as long as someone is interacting with it. However if it is minimized for 30 odd minutes or more (say lunch break for example) and then restored, it feels very sluggish and slow and never recovers its original responsiveness and needs to be restarted.
What could be the connection between minimizing a Winform app for a long time and this unusual unresponsiveness? Perhaps a GC issue but can't find anything. Looking for pointers on what/where to look to solve this. Thanks.
If this issue is easy to reproduce, I suggest attaching a profiler when you restore your application after 30 minutes and seeing what is going on: http://msdn.microsoft.com/en-us/library/ms182384.aspx
Also you can just attach debugger to see what is going on after you restore your app.

How can I make sure a C# console program ALWAYS exits?

I've written a small C# console app that is used by many users on a shared storage server. Its runtime should always be < 3 seconds or so, and it is run automatically in the background to assist another GUI app the user is really trying to use. Because of this, I want to make sure the program ALWAYS exits completely, no matter whether it throws an error or not.
In the Application_Startup, I have the basic structure of:
try
{
    // Calls real code here
}
catch
{
    // Log any errors (the logging itself has a try with an empty catch around it
    // so that there's no way it can cause problems)
}
finally
{
    Application.Shutdown();
}
I figured that with this structure, it was impossible for my app to become a zombie process. However, when trying to push new versions of this app, I repeatedly find that I cannot delete and replace the executable because the "file is in use", meaning that it's hanging on someone's computer out there, even though it should only run for a few seconds and always shutdown.
So, how is it that my app is seemingly becoming a hanging process on peoples' computers with the code structure I have? What am I missing?
Edit: Added "Application." to resolve ShutDown() for clarity.
There are two options here:
1) Your console application doesn't really finish in 3 seconds, but rather takes a lot longer. You need to debug it and see what takes it that long.
2) Your console application takes 3 seconds to exit, but it is run every minute by the GUI and you have more than 40 users, so the probability of finding the executable unused is slim.
If it's the first one, and you don't want to debug it, you can always start a second thread, wait for 3 seconds and then kill the entire process.
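A minimal sketch of that second-thread idea. `ExitWatchdog` is a hypothetical name, and using `Process.Kill` as the default action is my assumption about what "kill the entire process" should mean here. The watchdog is armed at startup and disarmed when the real work finishes; if the timeout elapses first, the process is terminated.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical watchdog. Intended usage at program start:
//   ManualResetEvent done = ExitWatchdog.Arm(TimeSpan.FromSeconds(3), ExitWatchdog.KillProcess);
//   try { /* real work */ } finally { done.Set(); }
public static class ExitWatchdog
{
    // Starts a background thread that calls onTimeout unless the returned
    // event is set before 'timeout' elapses.
    public static ManualResetEvent Arm(TimeSpan timeout, Action onTimeout)
    {
        var finished = new ManualResetEvent(false);
        var thread = new Thread(() =>
        {
            // WaitOne returns false when the timeout elapses without a signal.
            if (!finished.WaitOne(timeout))
                onTimeout();
        });
        thread.IsBackground = true; // the watchdog itself must not keep the process alive
        thread.Start();
        return finished;
    }

    // Terminates the current process immediately; pending finally blocks do not run.
    public static void KillProcess()
    {
        Process.GetCurrentProcess().Kill();
    }
}
```

Taking the timeout action as a parameter keeps the sketch easy to try out with a harmless callback before wiring in `KillProcess`. It is also worth logging from the callback before killing, since a watchdog that fires silently hides exactly the hang you are trying to find.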
Maybe the code inside the try block is still executing for at least one of the clients and is not really limited to 3 seconds or so. To prevent that, you would need a multithreaded application: one thread for processing and one in the background that kills the working thread after a timeout. Before doing that, you should ask yourself whether such infrastructure is really needed.
Another thing that comes to mind is that one of the users simply had the application running at that moment; the probability depends on the number of users you have.
Maybe designing your support app as an always-running multithreaded service would be a much better idea than starting a new instance of the application for each client request.
