How to debug random Outlook crashes (C# add-in) - c#

We have been getting complaints from many of our customers that Outlook crashes unexpectedly (complete process restart) when using our plugin.
So far it has been impossible to reproduce. We can only analyze the logs after the fact, and the only thing we know is that the problem stops if we turn one specific add-in off (it's a local add-in that helps with sending e-mails from a configurable template). This add-in runs on .NET and is written in C#.
We have spent weeks gathering and analyzing logs. The crash always reports Event ID 1000 in the Event Log, with the faulting module given as kernelbase.dll, olmapi32.dll, wwlib.dll, ntdll.dll or some other DLL. The crash happens on several Outlook builds, old or new, Monthly Channel or Semi-Annual Channel; it doesn't matter.
We were finally able to simulate one crash from our code after a code analysis run in Visual Studio warned us about some potential null reference exceptions. Testing with that, we could trigger one Outlook crash pointing to kernelbase.dll. We have now fixed this in a new patch and are still awaiting results from customers, but in the meantime, are there any more options to debug such a random crash? Hope anyone can help us here.

That is a widespread problem with Office COM add-ins. It can be caused by other add-ins, not only yours, and even locating the source of the issue is very complicated in such cases. You can generate a crash dump and analyze it to identify the source, but that may not help much, because the damage done by an add-in is not necessarily visible there: the exception that leads to the crash can be thrown by Outlook itself. For example, a badly written COM add-in may release a COM object and then finish its work; at some later point the host application finds that a COM object it needs has been released, cannot continue execution, and crashes suddenly.
To identify the source of the issue, first add a logging mechanism to the add-in and see where and when the issue takes place. Then try simplifying the add-in's source code by commenting it out line by line and checking whether that helps. It also makes sense to try a newly created, empty add-in; if the crash still occurs, you can be sure the issue comes from another add-in and not from your own code. There are many other helpful steps, but they depend on the specific scenario you are dealing with.
You can enable Outlook logging as well. Read more about that in the article "How to enable global and advanced logging for Microsoft Outlook".
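As a rough illustration of the add-in-side logging suggested above, here is a minimal sketch. The class name CrashLogger, the log file name and its location in the temp folder are all assumptions made for this example, not anything from the original add-in. It only uses the standard AppDomain.CurrentDomain.UnhandledException and TaskScheduler.UnobservedTaskException events, so it can catch managed exceptions only; a crash raised in native code inside Outlook may never reach these handlers, which is why a crash dump is still worth collecting.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper (not part of the original add-in): installs
// last-chance handlers that append exception details to a log file.
public static class CrashLogger
{
    private static readonly object Gate = new object();

    // Assumed log location; point this at wherever the add-in keeps its logs.
    public static string LogPath { get; set; } =
        Path.Combine(Path.GetTempPath(), "addin-crash.log");

    public static void Install()
    {
        // Raised for unhandled managed exceptions in this AppDomain.
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
            Write("AppDomain.UnhandledException", e.ExceptionObject as Exception);

        // Raised when a faulted Task is collected without its exception being observed.
        TaskScheduler.UnobservedTaskException += (sender, e) =>
            Write("TaskScheduler.UnobservedTaskException", e.Exception);
    }

    // Appends one timestamped entry; also usable from ordinary catch blocks.
    public static void Write(string source, Exception ex)
    {
        string details = ex != null ? ex.ToString() : "(no exception object)";
        string entry = string.Format("{0:o} [{1}] {2}{3}",
            DateTime.UtcNow, source, details, Environment.NewLine);

        lock (Gate)
        {
            File.AppendAllText(LogPath, entry);
        }
    }
}
```

One way to use it would be to call CrashLogger.Install() once when the add-in starts, and to wrap the body of each event handler in a try/catch that calls CrashLogger.Write with the handler's name, so the log shows which callback was running last before a crash.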

Try to collect a crash dump using ProcDump.exe and then open it in WinDbg.
Download ProcDump from
https://docs.microsoft.com/en-us/sysinternals/downloads/procdump
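and run the following from the command line (-e writes a dump on an unhandled exception, -ma makes it a full memory dump, -o overwrites an existing dump file, -w waits for the process to launch):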
procdump.exe -e -ma -o -w outlook.exe

Related

Working application launched from C# fails

I have a C# application acting as a scheduler. It runs various applications successfully. One of these applications (VB6) fails halfway through the job. If I execute this VB6 application directly with the exact same parameters, it completes successfully. The scheduler runs other VB6 applications successfully. Does anybody know what could cause this? What in the environment changes when you launch an application (VB6 exe) from within another application (C#)? Maybe there is an expert who can point me to something to help solve this?
I am adding more logging to the VB6 application and currently the error points to a routine executing SQL commands, but I have other applications executing the same code with no problem. At this stage I am stumped.
The following might be different:
user account / user rights
working directory
environment variables
I suggest inspecting the VB6 application with Process Explorer and comparing against a working version.
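To make the differences listed above concrete, here is a minimal sketch of how the scheduler side could pin them down instead of letting the child inherit them. The names ChildLauncher, BuildStartInfo and DescribeCurrentEnvironment, and the LAUNCHED_BY variable, are hypothetical and exist only for this example; the ProcessStartInfo properties used are the standard ones from System.Diagnostics.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper: builds start settings that fix the working directory
// and environment explicitly rather than inheriting them from the scheduler.
public static class ChildLauncher
{
    public static ProcessStartInfo BuildStartInfo(string exePath, string arguments)
    {
        var info = new ProcessStartInfo(exePath, arguments)
        {
            // Must be false for the environment changes below to take effect.
            UseShellExecute = false,
            // Otherwise the child inherits the scheduler's current directory.
            WorkingDirectory = Path.GetDirectoryName(Path.GetFullPath(exePath))
        };

        // Illustrative override of one variable; the name is an assumption.
        info.EnvironmentVariables["LAUNCHED_BY"] = "scheduler";
        return info;
    }

    // Summarizes the values most likely to differ between a direct run
    // and a run started by the scheduler, for writing to a log.
    public static string DescribeCurrentEnvironment()
    {
        return string.Format("user={0}; cwd={1}; PATH={2}",
            Environment.UserName,
            Environment.CurrentDirectory,
            Environment.GetEnvironmentVariable("PATH"));
    }
}
```

Passing the result of BuildStartInfo to Process.Start, and logging DescribeCurrentEnvironment() from the scheduler just before the launch, gives something to compare against the values seen when the executable is started by hand.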
OK, I found the problem. I started by rolling back the VB6 code two versions and proved that it worked. I then added small pieces of the new code, checking every time whether it still worked. I did not add back all the code (some of it was just cosmetic) and it is now working with the new functionality. It has taken a LOT of hours and it would take a lot more to determine what caused the original error, so I decided to take the win, because I cannot afford more hours.

troubleshooting error code 1000 application crash

Problem solved alert. Read the final update first.
.................................................
I have a VB6 application that calls a C# library via COM.
The C# library targets .NET Framework 4.5.2.
If I build the COM library on a particular machine running VS2017 15.5.6, I don't have a problem.
If I check out the same code and build it on a different machine (I tried 2 of them) with VS2017 15.5.2, I get an application crash on a particular record.
The error occurs on the line of code
if (edge.Extra == null) // given edge is not null and Extra is a property
In the Windows Event log there is
Faulting application name: jtJobTalk.exe, version: 1.0.0.0, time stamp: 0x5a9f5b1c
Faulting module name: ntdll.dll, version: 6.3.9600.18895, time stamp: 0x5a4b127e
Exception code: 0xc00000fd
Fault offset: 0x0006d46c
Faulting process ID: 0xb74
Faulting application start time: 0x01d3b5c77355520d
Faulting application path: C:\jobtalk\jtJobTalk.exe
Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
Report ID: d1374fd2-21ba-11e8-8272-d050999dc03c
I tried sfc scan and no problems were reported.
On yet another computer (running Windows 7) I get the error
A new guard page for the stack cannot be created
How can I further troubleshoot this issue?
I am afraid to update the VS version on the good machine in case this causes me to be unable to release.
[Update]
After putting in some calls to MessageBox.Show, I have established that the error was caused by an object referencing itself from within its own constructor.
It has taken me a day at least to find this out. I am looking for any pearls of wisdom that could have helped me diagnose the issue in an easier way.
Exception code: 0xc00000fd
That is STATUS_STACK_OVERFLOW. A very common mishap, and hard to debug, so much so that they named a popular programmer web site after it. You are getting the raw version of this mishap with little assistance from the debugger. Always an issue in interop code, you can't rely on the managed debugger engine to help you diagnose it. Google is your best bet, all of the top hits for this phrase help you get on the right path to start fixing it.
If I build the COM library on a particular machine...
That is being unlucky; turning over the wrong stones to look for a cause can bog you down for a while. A stack overflow exception (SOE) is never caused by build problems or a buggy VS version; it is always a coding mistake. The most basic reason it would occur on one machine but not another is that you don't test the program with the exact same data. Or you made a quick coding change that looked very innocent, but wasn't.
if (edge.Extra == null)
That is one of the very common causes for SOE, a buggy property getter. Something like this:
public class Example {
    private Foo edge;
    public Foo Edge {
        get { return Edge; }   // Oops, meant edge
    }
}
You can certainly look at this for a while and never see it. It would be nice if the compiler had a diagnostic for it, but the C# compiler does not have the necessary plumbing to ferret this out. The other very common cause is a field initializer:
public class Example {
    private Example foo = new Example();
    // etc...
}
Which can easily get more convoluted from there, for example when you create an instance of another class and that class creates an Example object in its constructor. And the C# language supports writing recursive code; it is one of the standard programming techniques. If the recursion depth of that code grows any faster than O(log(n)), then you can always easily crash it with too much data.
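For recursive code that you control, one possible safeguard (a sketch, not something from the question) is RuntimeHelpers.EnsureSufficientExecutionStack from System.Runtime.CompilerServices. It throws an InsufficientExecutionStackException when stack space runs low, and unlike a real stack overflow that exception can be caught and carries a managed stack trace. The Node and SafeRecursion types below are hypothetical stand-ins for whatever structure is being walked.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical node type standing in for the structure being walked.
public sealed class Node
{
    public Node Next;
}

public static class SafeRecursion
{
    // Recursive walk whose depth is proportional to the list length.
    public static int CountNodes(Node node)
    {
        if (node == null)
        {
            return 0;
        }

        // Throws InsufficientExecutionStackException when stack space runs low,
        // instead of letting the recursion run into 0xc00000fd.
        RuntimeHelpers.EnsureSufficientExecutionStack();
        return 1 + CountNodes(node.Next);
    }
}
```

The check does not help with accidental recursion you did not know about, such as the property getter above, but placed in deliberately recursive methods it turns "too much data" into an ordinary exception that a caller or a logger can report.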
...any pearls of wisdom
Yes, there is one. If you don't have the managed debugging engine helping out with exceptions then diagnosing errors gets pretty hard to do. The VB6 runtime can provide you with the exception message, but not the Holy Stack Trace. Info that is lost in the transition between the two very different runtime environments.
But you can have that cake and eat it too, the trick is to get the managed debugger to start the VB6 IDE. Right-click your C# class library project > Properties > Debug tab. Select the "Start external program" radio button and type "C:\Program Files (x86)\Microsoft Visual Studio\VB6\VB6.exe". You can optionally set the "Command line arguments" to the full path of your .vbp project, that way the IDE automatically loads your VB6 project. Use Debug > Windows > Exception Settings and tick the "Common Language Runtime Exceptions" checkbox so it displays the tick mark. This makes the debugger stop on any C# exception, before it is passed to your VB6 code.
Press F5 and the VB6 IDE starts running. Press F5 again to start your VB6 code. Any mishap in the C# code now causes the managed debugger to step in. Usually the display automatically switches to the VS IDE, but sometimes you have to click the blinking taskbar button. You get to look at the code that threw the exception and use Debug > Windows > Call Stack to find out how it got there.
I'm not 100% sure that this also works to diagnose SOE, the VB6 runtime might step in too soon to allow the CLR to see the exception. I don't have VB6 installed anymore to check. Please try it and let me know.

Any way to deal with weird errors while my .NET program is running?

I made a program which works fine on my PC without any errors. It also works fine on some office PCs, but it crashes without any describable error on a customer's PC and some others.
Crashes are completely random; sometimes it crashes and sometimes it doesn't.
Crashes are not related to any actions; sometimes it crashes while they are just looking at the program and waiting.
Customers send me these beautiful screenshots and want me to solve this.
They show the common error reporting dialog, but no information about the exception.
My program uses Unity Web Player running in WebBrowser control. It's always run in background on the hidden tab which becomes visible when needed.
Any ideas how to handle such errors?
I think you should first ensure that your environment and your customers' environments are identical.
Maybe there are DLLs or other programs installed on your machine (the Unity Web Player you mentioned, for example), or something in your registry, that differ.
Otherwise there is no reason for the error to occur on one PC and not on another.
Make sure all DLLs are deployed correctly.
Check your registry.
Ensure that all related programs are installed correctly.

Debugging EXCHANGE transport agent in VS2010 c#

I was given the source to a transport agent that parses incoming email that meets certain criteria. I need to make some modifications, but I need to track variables and my debugging attempts have been unsuccessful.
I build the DLL, install it in Exchange, set a breakpoint and then attach to the relevant process, but nothing appears to be happening. I am not experienced in this method of debugging and I'm pretty sure that I'm missing a step, but all the documentation I'm able to find basically lists the process as those few steps. Any assistance?
Figured it out
For anyone in the same situation, what you need to do is this:
Compile your project in Debug mode.
Deploy it to exchange however you do that.
Since it is a DLL running inside Exchange, Exchange will be the host process, so you'll have to attach a debugger to the Exchange process.
You can do that by going to the Debug menu in VS and selecting "Attach to Process", then selecting the process that will be running the DLL.
When VS attaches to the process just set breakpoints in your code and you should be good to go.
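If the code you want to step through runs before you have had a chance to attach, one common trick is to make a debug build wait for the debugger. The sketch below is an assumption-laden example rather than part of the steps above: DebugGate and WaitForDebugger are hypothetical names, and only the standard System.Diagnostics.Debugger.IsAttached property is used.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: blocks until a debugger is attached or the
// timeout expires, so breakpoints in early code are not missed.
public static class DebugGate
{
    public static bool WaitForDebugger(TimeSpan timeout)
    {
        Stopwatch elapsed = Stopwatch.StartNew();

        // Poll rather than block forever, so a forgotten call cannot hang the host.
        while (!Debugger.IsAttached && elapsed.Elapsed < timeout)
        {
            Thread.Sleep(250);
        }

        return Debugger.IsAttached;
    }
}
```

A call such as DebugGate.WaitForDebugger(TimeSpan.FromSeconds(30)) at the start of the code under test gives you a window to attach. Since it stalls the host process while waiting, it is probably only suitable for a test server and should be left out of release builds.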

Next steps debugging crash in customer environment

Part of our product is an IE plugin (BHO), which is running happily in lots of different environments across multiple OS versions/IE versions.
However, in a trial setup for one customer, running XP SP3 machines via Citrix XenDesktop, IE 7 is crashing when the two conditions below are met:
Our plugin is loaded
The Shockwave flash object add-on is loaded (latest version - Flash11e.ocx)
Some extra info:
The crash happens when we then try and show a dialog to the user, or shortly after this. However the crash doesn't happen in our code, which is all written in C#, it happens in various places, often ole32.dll.
Our dialogs are HTML pages rendered in a webbrowser control, shown in a Form via form.ShowDialog(ownerWindow) in the BHO.
Either plugin seems to work fine independently. Disabling flash, or skipping any sites that use flash prevent the crash.
The customer is reasonably accommodating, and I was able to run IE with the MS Debugging Tools in order to capture a few dumps at the time of the crash. I'm now having some trouble interpreting the dumps. Thinking it was heap corruption I ran the debugging tools with full pageheap enabled, but that did not trigger a breakpoint.
The analysis from the Debugging tools is as follows:
In
iexplore_PID_5064_Date_12_20_2011__Time_11_19_26AM_161_Second_Chance_Exception_C0000005.dmp
the assembly instruction at ole32!HandleIncomingCall+e2 in
C:\WINDOWS\system32\ole32.dll from Microsoft Corporation has caused an
access violation exception (0xC0000005) when trying to read from
memory location 0x03ce4ff8 on thread
The stack trace at the point of crash is:
Thread 7 - System ID 1140
Entry point ieframe!CTabWindow::_TabWindowThreadProc
Create time 20/12/2011 19:18:08
Time spent in user mode 0 Days 0:0:19.828
Time spent in kernel mode 0 Days 0:0:10.468
Full Call Stack
Function Arg 1 Arg 2 Arg 3 Arg 4 Source
ole32!HandleIncomingCall+e2 0f9aafbc 00000034 00000001 07e8ab6c
ole32!STAInvoke+24 17444f80 00000001 0781efc0 077e8f10
ole32!AppInvoke+7e 17444f28 077e8f10 0781efc0 07e8ab6c
ole32!ComInvokeWithLockAndIPID+2c2 17444f28 077ec420 00000000 17444f28
ole32!ComInvoke+60 17444f28 00000400 0774ee30 07bcfe48
ole32!ThreadDispatch+23 17444f28 07bcfeb0 7752b096 00000000
ole32!ThreadWndProc+fe 005d0594 078b6ee0 0000babe 17444f2c
user32!InternalCallWinProc+28 7752b096 005d0594 00000400 0000babe
user32!UserCallWinProcCheckWow+150 00000000 7752b096 005d0594 00000400
user32!DispatchMessageWorker+306 7bcff64 00000000 07bcffb4 3e25e69b
user32!DispatchMessageW+f 07bcff64 0013e490 0013e5b8 07868ff0
ieframe!CTabWindow::_TabWindowThreadProc+189 07e03e30 0013e490 0013e5b8 07868ff0
kernel32!BaseThreadStart+37 3e25e464 07868ff0 00000000 00000000
I'm going to see what else I can get from this dump file, but I'm hoping someone here will have a great idea. I'd like to test a lot more stuff at the customer site, but we only have so many chances with them, so I need to use any time I get there very wisely.
For me a couple of next steps seem to be:
If the problem is flash messing up something in the way of us showing dialogs, I'd like to test a completely stripped down BHO that just shows dialogs, to show that the problem does not lie with our code.
There are a lot of other plugins installed on the machine, it would be nice to start with a stripped down image and build up from there, to see when the problem starts triggering.
Sometimes the crash happens in pseudoserverinproc.dll, which is part of HDX MediaStream, which runs flash content locally rather than on the server.
== update
I've had quite a bit of success with WinDbg analysing the dumps that I have. I think it makes quite a bit of sense to try and use gflags/windbg on the desktop that is having the troubles and debug it live.
That would be my recommended next step to anyone in a similar position at the moment. I will know more about how good this advice is in a week's time, when I've had a chance to apply it.
There is a debugger version of the flash player which can output diagnostic information that might help you. I realise that the issue isn't necessarily flash, but it might offer some insight into possible issues.
I must admit I haven't installed it for some time, but I believe these links might help you:
Instructions on how to configure the debugger version to output logs:
http://kb2.adobe.com/cps/403/kb403009.html
The download link of the debugger version:
http://www.adobe.com/support/flashplayer/downloads.html
One thing you could do outside the customer site is run your code through a static analyzer, for example PC-lint, to see if there are any obvious bugs in your own code that get triggered in special situations.
We solved the problem in the end (well worked around it). If anyone is interested, this is how we did it.
By analysing the stack dumps with WinDbg (which is a great tool), we found that the problem was isolated to showing WinForms in iexplore.exe after flash had loaded in XenDesktop deployments. Knowing this, we were able to work around the problem.
The key was getting good crash dumps, working out a minimal reproduction scenario and having a good customer that let us test our theory!
