Service to continue on StackOverflowException - C#

We use a third-party library to manipulate PDFs. Our application runs as a Windows service and handles thousands of files every month. Once in a while someone uploads a malformed PDF, which makes the library run amok and eventually throw a StackOverflowException.
The library vendor has not fixed the error in the last two years, and we can't have our production service crash whenever someone feels like uploading a bad file.
Automatically restarting the service does not seem like an option, as the application would then retry the malformed file. Since we process many files in parallel, we cannot tell which file is the malformed one when the service starts up.
Since a StackOverflowException can't be caught by default, I would like to know whether I can tweak the CLR of the service to catch the exception anyway.

You could re-architect the application to create new child processes, each with its own instance of the library, to perform the work.
Most importantly, with this approach the failing instance doesn't crash the entire application or take the other child processes down with it. You also have the advantage that the manager process can keep track of which files are in progress (and in which process), so it knows which files not to retry after a failure.
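As a rough sketch of that idea (the names are illustrative, not from the original post): the manager starts one worker process per file via a hypothetical PdfWorker.exe console wrapper around the PDF library, and treats a non-zero exit code or a timeout as a failed file that should not be retried.

using System;
using System.Diagnostics;

static bool ProcessInChildProcess(string pdfPath, TimeSpan timeout)
{
    var startInfo = new ProcessStartInfo
    {
        FileName = "PdfWorker.exe",           // hypothetical console wrapper around the PDF library
        Arguments = "\"" + pdfPath + "\"",
        UseShellExecute = false,
        CreateNoWindow = true
    };

    using (var worker = Process.Start(startInfo))
    {
        // If the library overflows the stack, only this child process dies.
        if (!worker.WaitForExit((int)timeout.TotalMilliseconds))
        {
            worker.Kill();                    // hung worker: kill it and report failure
            return false;
        }
        return worker.ExitCode == 0;          // the worker signals success via its exit code
    }
}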

Related

COM Add-in: Resolve the error DisconnectedContext in WinWord.exe

I built an add-on to Microsoft Word. When the user clicks a button, it runs a number of processes that export a list of Microsoft Word documents to Filtered HTML. This works fine.
Where the code falls down is in processing large numbers of files. After the file conversions are done and I call the next function, the app crashes and I get this information from Visual Studio:
Managed Debugging Assistant 'DisconnectedContext' has detected a problem in 'C:\Program Files\Microsoft Office\root\Office16\WINWORD.EXE'.
Additional information: Transition into COM context 0x56255b88 for
this RuntimeCallableWrapper failed with the following error: System
call failed. (Exception from HRESULT: 0x80010100
(RPC_E_SYS_CALL_FAILED)). This is typically because the COM context
0x56255b88 where this RuntimeCallableWrapper was created has been
disconnected or it is busy doing something else. Releasing the
interfaces from the current COM context (COM context 0x56255cb0). This
may cause corruption or data loss. To avoid this problem, please
ensure that all COM contexts/apartments/threads stay alive and are
available for context transition, until the application is completely
done with the RuntimeCallableWrappers that represents COM components
that live inside them.
After some testing, I realized that if I simply remove all the code after the file conversions, there are no problems. To resolve this, I place the remainder of my code in yet another button.
The problem is I don't want to give the user two buttons. After reading various other threads, it sounds like my code has a memory or threading issue. The answers I am reading do not help me truly understand what to do next.
I feel like this is what I want to do:
1- Run conversion.
2- Close thread/cleanup memory issue from conversion.
3- Continue running code.
Unfortunately, I really don't know how to do #2 or if it is even possible. Your help is very much appreciated.
or it is busy doing something else
The managed debugging assistant diagnostic you got is pretty gobbledygooky but that's the part of the message that accurately describes the real problem. You have a firehose problem, the 3rd most common issue associated with threading. The mishap is hard to diagnose because this goes wrong inside the Word plumbing and not your code.
Trying not to commit the same gobbledygook sin myself, what goes wrong is that the interop calls you make into the Office program are queued, waiting for their turn to get executed. The underlying "system call" that the error code hints at is PostMessage(). Wherever there is a queue, there is a risk that the queue gets too large. That happens when the producer (your program) is adding items to the queue far faster than the consumer (the Office program) removes them. The firehose problem. Unless the producer slows down, the queue will grow without bounds, and something is going to fail if it is allowed to grow endlessly; at a minimum the process runs out of memory.
It is not allowed to get close to that problem. The underlying queue that PostMessage() uses is protected by the OS. Windows fails the call when the queue already contains 10,000 messages. That's a fatal error that RPC does not know how to recover from, or rather should not try to recover from. Something is amiss and it isn't pretty. It returns an error code to your program to tell you about it. That's RPC_E_SYS_CALL_FAILED. Nothing much better happens in your program, the CLR doesn't know how to recover from it either, nor does your code. So the show is over, the interop call you made got lost and was not executed by Word.
Finding a completely reliable workaround for this awkward problem is not that straightforward. Beware that this can happen on any interop call, so catching the exception and trying again is drastically impractical. But do keep in mind that the Q+D fix is very simple: the plain problem is that your program is running too fast, and slowing it down with a Thread.Sleep() or Task.Delay() call is quite crude but will always fix the issue. Well, assuming you delay enough.
I think, but don't know for a fact because nobody ever posts repro code, that this issue is also associated with using a console mode app or a worker thread in your program. If it is a console mode app then try applying the [STAThread] attribute to your Main() method. If it is a worker thread then call Thread.SetApartmentState() before starting the thread, but beware that it is very important to also create the Application interface on that worker thread. It is not otherwise a workaround for an add-in.
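For the worker-thread case, a minimal sketch of that setup (assuming the Microsoft.Office.Interop.Word assembly is referenced) would look something like this:

using System.Threading;
using Word = Microsoft.Office.Interop.Word;

var worker = new Thread(() =>
{
    // Create the Application object on the same STA thread that will use it.
    var word = new Word.Application();
    try
    {
        // ... open documents and export them to Filtered HTML here ...
    }
    finally
    {
        word.Quit();
    }
});
worker.SetApartmentState(ApartmentState.STA);  // must be called before Start()
worker.Start();
worker.Join();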
If neither of those workarounds is effective, or they are too impractical, then consider that you can automagically slow your program down, and ensure the queue is emptied, by occasionally reading something back from the Office program. Something silly; any property getter call will do. Necessarily, you can't get the property value until the Office program catches up. That can still fail: there is also a 60-second time-out on the interop call. But that's something you can fix; you can call CoRegisterMessageFilter() in your program to install a callback that runs when the timeout trips. Very gobbledygooky as well, but the cut-and-paste code is readily available.
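A sketch of that read-back trick, assuming word is the interop Application object, documentPaths is the batch being converted, and ExportToFilteredHtml is your own conversion helper (all three names illustrative):

int processed = 0;
foreach (var path in documentPaths)
{
    ExportToFilteredHtml(word, path);   // your conversion code

    // Every few files, read something trivial back from Word. The call cannot
    // return until Word has caught up, which keeps the message queue drained.
    if (++processed % 10 == 0)
    {
        var count = word.Documents.Count;
    }
}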

Detect 100% CPU load by a referenced library

I have an ASP.NET (C#) website that uses a third-party DLL to process the data that users POST via a web form. The call is pretty straightforward:
string result = ThirdPartyLib.ProcessData(myString);
Once in a blue moon this library hangs and (according to my hosting provider's logs) consumes 100% of the CPU. The website is hosted on shared hosting, so I have no access to the IIS or event logs. When this happens, my website is automatically stopped by the hosting provider's performance monitor, and I have to manually switch it back on.
Now, I know that the right thing to do is investigate the problem and fix (or replace) the DLL. But as it's third-party software, I am unable to fix it, and their support is not helpful at all. Moreover, I can't reproduce the problem. Replacing the library is a pain too.
Is there a way in C# to detect when this DLL starts consuming 100% CPU and kill the process automatically from my ASP.NET code?
You cannot "detect" if the current process is hanging because as the caller of a method (third party or not) you're simply not in control until it returns.
What you can do is move the call to the third party library into a separate executable and have it output its result via the standard output (you can simply use Console.WriteLine(string) for this).
Once you've done that, you can start a separate Process that runs this executable, read the result via StandardOutput, and use WaitForExit(int) to wait a certain amount of time (maybe a few seconds) for the process to finish. The return value of WaitForExit() tells you whether the process actually exited. In case it didn't, you can Kill() it and move on without the IIS worker process hanging as a whole.
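A minimal sketch of that approach, assuming a hypothetical DataProcessor.exe that calls ThirdPartyLib.ProcessData(args[0]) and writes the result with Console.WriteLine():

using System;
using System.Diagnostics;

static string RunIsolated(string input, int timeoutMs)
{
    var psi = new ProcessStartInfo("DataProcessor.exe", "\"" + input + "\"")
    {
        UseShellExecute = false,
        RedirectStandardOutput = true,
        CreateNoWindow = true
    };

    using (var proc = Process.Start(psi))
    {
        if (!proc.WaitForExit(timeoutMs))
        {
            proc.Kill();   // the DLL hung: kill the child instead of losing the whole worker process
            throw new TimeoutException("Third-party processing timed out.");
        }
        // A small result string fits in the pipe buffer, so reading after exit is fine here.
        return proc.StandardOutput.ReadToEnd().TrimEnd();
    }
}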

Need to communicate with child process to process a file

I have several 3rd party DLLs which are super flaky. They can sometimes hang and never return or sometimes throw weird exceptions which can bring down the whole process.
I want to move these DLLs and load them in a separate child process. That way, instead of having to do a nasty Thread.Abort, I can just bring down the process cleanly and later re-spawn it when required.
So my parent application receives a list of files that need to be processed by certain third-party DLLs. I essentially need to pass these to the new child process, let it process the file, and then communicate back to the parent that it was successful. I must also be able to bring down the process if sh*t hits the fan and re-spawn it. These files come as a constant stream, so spawning a process every time I get a file is not possible; I'd want it to hang around and just accept requests.
Right now I'm spawning the child process from the parent and then attempting to use memory-mapped files to share the files/work. Would it be easier to just pass the location of the file and somehow get a response when it's processed?
What would be a good strategy here...
I would....
Create a WCF service, using PerCall instancing, that hosts the DLLs and does the file processing - this spawns a new instance for each call, and if any one goes down it should not affect the others (see the sketch after this list). You could even host it as part of your main app, but maybe better as a separate Windows service; and as it's probably going to be on the same machine, use the named pipes transport.
Fire each request at it from your main app.
If you don't get a successful response (as long as it's not a WCF exception, i.e. endpoint not found), just retry the request up to x number of times.
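A rough sketch of that service (the contract and type names are illustrative, not a definitive implementation):

using System.ServiceModel;

[ServiceContract]
public interface IFileProcessor
{
    [OperationContract]
    bool ProcessFile(string path);
}

// PerCall: every request gets a fresh service instance.
[ServiceBehavior(InstanceContextMode = InstanceContextMode.PerCall)]
public class FileProcessor : IFileProcessor
{
    public bool ProcessFile(string path)
    {
        return FlakyThirdPartyLib.Process(path);   // hypothetical wrapper around the flaky DLLs
    }
}

// Hosting over named pipes, e.g. inside a separate Windows service:
// var host = new ServiceHost(typeof(FileProcessor), new Uri("net.pipe://localhost/fileprocessor"));
// host.AddServiceEndpoint(typeof(IFileProcessor), new NetNamedPipeBinding(), "");
// host.Open();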

How to debug why an application dies, exception-less, when another application is closed?

I'm fixing bugs on an application that is a kind of data consumer/worker, getting data from a third-party application using the supplied API and libraries. It's a C++-based API, and the .NET application uses a bit of C++ to access the libraries. Also, the application is multi-threaded, it's windowed (WinForms), and it uses several third-party libraries (NHibernate, MySQL and others). It might be relevant to add that our consumer thread is the only place in the code that accesses the C++ library.
The problem? When the producer application is closing (which takes a bit of time, more than a minute), the consumer application dies within seconds, without error or exception - even though they're opened independently. No information in the Event Log, no Dr. Watson action, no exceptions in Visual Studio (debugging just stops).
I've tried:
Stepping through the code to see the moment where it closes, but it always happened in different places, whether it was calling the producer's library code or not.
Debugging the application with Visual Studio configured to break when any exception is thrown - but it dies without a thing.
Creating crash dumps (using ADPlus.vbs) and using WinDbg on them (I'm new to such low-level debugging, though), but !analyze resulted in different stack traces each time - leaving me traceless.
What would be a good direction for finding out why the consumer application dies? Is there a way to get around the problem (like showing a prompt to the user: "The producer application is closing, so the consumer application will do the same!")?
[EDIT]
The consumer application is multi-threaded; it has one consumer thread in addition to the UI thread. Also, the third-party app we're using as the producer uses COM to send information to any consumer app (aka add-on).
My coworker and I decided to comment out some code to find the part that might be causing the problem. And we probably found it: the application dies if and only if we've registered our consumer with the producer. After reading the third-party app's documentation, it turned out that consumer apps have to actively query for the message that the producer is closing; otherwise they are forcefully terminated by the producer app.
So: I'm 95% sure the problem is that the third-party application we're querying for data sends a COM message to forcefully terminate our application (I'll post more info / change this to a wiki once we've tested that it's the only reason).
The general scenario described here is a source of very common confusion and misunderstanding when one tries to work out "how come my application vanished into thin air without leaving any trace?".
The immediate assumption would be: my application 'died' or 'crashed' or 'encountered some unexpected exception that is not even visible to the debugger and thus did not create any dump file'. Happened to me a few times...
The real answer in most cases is that the application did not really crash or die and did not receive any exception, but was simply shut down gracefully, just from a flow that I did not expect.
The easiest way to debug such cases is to put a breakpoint on kernel32!ExitProcess and follow the stack to see how you got there.
Hope this helps
It turns out that it's the host application that kills my application. The proper way to debug the problem was to spy on Windows messages and see that my application was getting a process-terminate message.

Best Practices of fault toleration and reliability for scheduled tasks or services

I have been working on many applications which run as Windows services or scheduled tasks.
Now, I want to make sure that these applications will be fault tolerant and reliable. For example, I have a service that runs every hour. If the service crashes while it is running, I'd like the application to run again for the same period (there are several things involved with this, including transactions for data processing) to avoid data loss. Moreover, I'd like the program to report the error with details. My goal is to avoid data loss and not fall behind in running the program.
I have built a class library that a user can import into a project. The library is supposed to keep information about the running instance of the program, i.e. the program reads and writes information about the running interval, running status, etc. This data is stored in a database.
I was curious whether there are best practices for making scheduled tasks / Windows services fault tolerant and reliable.
Edit: I am talking about independent tasks or services which run on different servers, and my goal is to make sure that the service will keep running, report any failures, and recover from them.
I'm interested in what other people have to say, but I'll give you a few points that I've stumbled across:
Add a handler for the UnhandledException event. This way you can clean up resources, write to a log file, email an administrator, or do anything else you need instead of having the process simply crash.
AppDomain.CurrentDomain.UnhandledException += new UnhandledExceptionEventHandler(AppUnhandledExceptionEventHandler);
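A minimal sketch of what the handler itself might look like (the log path is illustrative):

static void AppUnhandledExceptionEventHandler(object sender, UnhandledExceptionEventArgs e)
{
    // Log (or email) the details before the process goes down; it cannot be kept alive from here.
    System.IO.File.AppendAllText(@"C:\Logs\service-crash.log",
        DateTime.Now.ToString("O") + " " + e.ExceptionObject + Environment.NewLine);
}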
Override any ServiceBase event handlers you need in the main part of your application. OnStart and OnStop are pretty crucial, but there are many others you can use. http://msdn.microsoft.com/en-us/library/system.serviceprocess.servicebase%28v=VS.71%29.aspx
Beware of timers. Windows Forms timers won't work right in a service. Use System.Threading.Timer or System.Timers.Timer instead. See: Best Timer for using in a Windows service
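For example, a System.Timers.Timer driving hourly work in a service might be wired up like this (the interval and DoScheduledWork are illustrative):

private System.Timers.Timer _timer;

protected override void OnStart(string[] args)
{
    _timer = new System.Timers.Timer(TimeSpan.FromHours(1).TotalMilliseconds);
    _timer.Elapsed += (s, e) => DoScheduledWork();   // your hourly job
    _timer.AutoReset = true;
    _timer.Start();
}

protected override void OnStop()
{
    _timer.Stop();
    _timer.Dispose();
}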
If you are updating shared state on a thread, make sure you use a lock statement or Monitor in key sections so that everything stays thread-safe.
Be careful not to use anything user-specific, as a service runs without a specific user context. I noticed some of my SQL connection strings no longer worked with Windows authentication, etc. I have also heard of people having trouble with mapped drives.
Never make a service with a UI. In fact, for Vista and 7 it is nearly impossible to do anyway. A service shouldn't require user interaction; the most you can do is send a message with a Win32 function. MSDN says making interactive services is bad practice. http://msdn.microsoft.com/en-us/library/ms683502%28VS.85%29.aspx
For debugging purposes, it is way cool to make a service run as a console application until you get it doing what you want it to. Awesome tutorial: http://mycomponent.blogspot.com/2009/04/create-debug-install-windows-service-in.html
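The usual pattern looks something like this (MyWorkerService and StartAsConsole are illustrative names, not the tutorial's exact code):

static void Main(string[] args)
{
    var service = new MyWorkerService();   // your ServiceBase subclass

    if (Environment.UserInteractive)
    {
        // Started from a console or the debugger: run the work loop directly.
        service.StartAsConsole(args);       // hypothetical method that invokes the OnStart logic
        Console.WriteLine("Running. Press Enter to stop.");
        Console.ReadLine();
        service.Stop();
    }
    else
    {
        System.ServiceProcess.ServiceBase.Run(service);
    }
}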
Anyway, hope that helps a little, but those are just a couple of things I found by poking around on my own.
Something obvious - don't run all your tasks at the same time. Try to schedule them so that only one task is using an expensive resource at any time (if possible). For example, if you need to send out newsletters and some specific notifications, schedule them at different times. If two tasks need to clean up something in the database, let one run after the other.
Also schedule tasks to run outside of normal business hours - at night obviously.
