I have a WinForms application that uses COM Interop to connect to Microsoft Office applications. I have read a great deal of material on how to properly dispose of COM objects, and here is typical code from my application, using techniques from Microsoft's own article (linked in the code comments below):
Excel.Application excel = new Excel.Application();
Excel.Workbook book = excel.Workbooks.Add();
Excel.Range range = null;
foreach (Excel.Worksheet sheet in book.Sheets)
{
range = sheet.Range["A2:Z2"];
// Process [range] here.
range.Merge(); // MergeCells is a property; Merge() is the method that merges the cells
System.Runtime.InteropServices.Marshal.ReleaseComObject(range);
range = null;
}
// Release explicitly declared objects in hierarchical order.
System.Runtime.InteropServices.Marshal.ReleaseComObject(book);
System.Runtime.InteropServices.Marshal.ReleaseComObject(excel);
book = null;
excel = null;
// As taken from:
// http://msdn.microsoft.com/en-us/library/aa679807(v=office.11).aspx.
System.GC.Collect();
System.GC.WaitForPendingFinalizers();
System.GC.Collect();
System.GC.WaitForPendingFinalizers();
All exception handling has been stripped to make the code clearer for this question.
What happens to the [sheet] object in the [foreach] loop? Presumably, it will not get cleaned up, nor can we tamper with it while it is being enumerated. One alternative would be to use an indexing loop but that makes for ugly code and some constructs in the Office Object Libraries do not even support indexing.
Also, the [foreach] loop references the collection [book.Sheets]. Does that leave orphaned RCW counts as well?
So two questions here:
What is the best approach to clean up when enumerating is necessary?
What happens to the intermediate objects like [Sheets] in [book.Sheets] since they are not explicitly declared or cleaned up?
UPDATE:
I was surprised by Hans Passant's suggestion and felt it necessary to provide some context.
This is a client/server application where the client connects to many different Office apps, including Access, Excel, Outlook, PowerPoint and Word, among others. It has over 1,500 classes (and growing) that test whether end users have performed certain tasks, as well as simulate those tasks in training mode. It is used to train and test students for Office proficiency in academic environments. With multiple developers and loads of classes, it has been difficult to enforce COM-friendly coding practices. I eventually resorted to creating automated tests, using a combination of reflection and source-code parsing, to ensure the integrity of these classes at a pre-code-review stage.
I will give Hans' suggestion a try and report back.
Enumerating
Your sheet loop variable is, indeed, not being released. When writing interop code for Excel you have to constantly watch your RCWs. In preference to foreach enumerations I tend to use for loops, because having to explicitly declare the variable makes me realise whenever I've grabbed a reference. If you must enumerate, then at the end of each iteration (before you leave the loop body) do this:
if (Marshal.IsComObject(sheet)) {
Marshal.ReleaseComObject(sheet);
}
And, be careful of continue and break statements that leave the loop before you have released your reference.
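If you enumerate a lot, the "release at the end of every iteration" rule can be centralised in a helper so that break, continue and exceptions cannot skip it. The sketch below is hypothetical (ComLoop and ForEach are illustrative names, not the helper from the references listed further down): it releases each item in a finally block and also releases the enumerator if it happens to be a COM object. Because Marshal.ReleaseComObject is only called when Marshal.IsComObject returns true, it is harmless on plain managed collections.

```csharp
using System;
using System.Collections;
using System.Runtime.InteropServices;

// Hypothetical helper (illustrative names): enumerate a COM collection and
// release every item, however the loop body exits.
public static class ComLoop
{
    // Releases o only if it is a runtime callable wrapper; plain managed objects are ignored.
    public static void Release(object o)
    {
        if (o != null && Marshal.IsComObject(o))
            Marshal.ReleaseComObject(o);
    }

    public static void ForEach<T>(IEnumerable collection, Action<T> body)
    {
        IEnumerator e = collection.GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                object item = e.Current;
                try
                {
                    body((T)item);
                }
                finally
                {
                    Release(item); // runs on normal completion and when body throws
                }
            }
        }
        finally
        {
            // Mirror what foreach does with the enumerator, and release it
            // as well in case it is itself a COM object.
            if (e is IDisposable disposable)
                disposable.Dispose();
            Release(e);
        }
    }
}
```

With the Excel interop assembly referenced, a call would look like ComLoop.ForEach<Excel.Worksheet>(sheets, sheet => { ... });, where sheets is a variable already holding book.Sheets (so the intermediate can be released too). A return inside the lambda behaves like continue: it ends the current iteration only.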
Intermediates
It depends on whether or not the intermediate is actually a COM object (book.Sheets is), but if it is, then you need to first get a reference to it in a local variable, then enumerate that reference, and then make sure you release it. Otherwise you are essentially "double dotting" (see below):
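using System;
using System.Runtime.InteropServices;
using xl = Microsoft.Office.Interop.Excel;
// ...
public void DoStuff () {
    // ...
    // Hold the intermediate collection in its own variable so it can be released.
    xl.Sheets sheets = book.Sheets;
    bool sheetsReleased = false;
    try {
        foreach (xl.Worksheet sheet in sheets) {
            try {
                // ... work with sheet ...
            }
            finally {
                Marshal.ReleaseComObject(sheet); // release each enumerated sheet
            }
        }
        Marshal.ReleaseComObject(sheets);
        sheetsReleased = true;
    }
    catch (Exception) {
        // Release the collection on the error path too, then let the exception continue.
        if (!sheetsReleased) Marshal.ReleaseComObject(sheets);
        throw;
    }
}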
The above code is the general pattern (it gets lengthy if you type it in full, so I have only focussed on the important parts).
What about errors?
Be fastidious in your use of try ... catch ... finally, and use it very carefully. finally does not always get called in cases such as stack overflow, out of memory or security exceptions, so if you want to make sure you clean up, and don't leave phantom Excel instances open when your code crashes, then also release references conditionally in the catch block before rethrowing the exception.
Therefore, inside every foreach or for loop, you also need to use try ... catch ... finally to make sure that the enumeration variable is released.
Double dotting
Also, do not "double dot" (use only a single period per expression). Doing this in a foreach is a common mistake that is easy to make. I still catch myself doing it if I've been off writing non-COM C# for a while, as chaining periods together is more and more common thanks to LINQ-style expressions.
Examples of double dotting:
item.property.propertyIWant
item.SubCollection[0] (you are calling SubCollection and then calling an indexer property on that subcollection)
foreach x in y.SubCollection (essentially you are calling SubCollection.GetEnumerator, so you are "double dotting" again)
Phantom Excel
The big test, of course, is to see if Excel remains open in the task manager once your program exits. If it does then you probably left a COM reference open.
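The Task Manager check can also be automated as part of a test run. A minimal sketch, assuming Excel shows up under the process name EXCEL (the image name EXCEL.EXE without its extension); PhantomExcelCheck and CountExcelProcesses are hypothetical names:

```csharp
using System.Diagnostics;

public static class PhantomExcelCheck
{
    // Returns how many processes named "EXCEL" are currently running.
    public static int CountExcelProcesses()
    {
        Process[] processes = Process.GetProcessesByName("EXCEL");
        int count = processes.Length;
        foreach (Process p in processes)
            p.Dispose(); // release the Process handles we just obtained
        return count;
    }
}
```

Record the count before the automation code runs and compare it once your code has released everything; a higher count afterwards suggests a leaked reference. Instances the user opened themselves are counted too, so this is best run on a dedicated test machine.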
References
You say you have researched this heavily, but in case it helps, here are a few of the references that I found helpful:
The mapping between interface pointers and runtime callable wrappers (RCWs)
VSTO and COM Interop
ReleaseComObject (cbrumme)
Robust solutions
One of the above references mentions a helper its author uses for foreach loops. Personally, if I'm doing more than a simple "script" project, then I'll first spend time developing a library specifically wrapping the COM objects for my scenario. I have a common set of classes now that I reuse, and I've found that the time invested in setting that up before doing anything else is more than recovered in not having to hunt down unclosed references later on. Automated testing is also essential here and reaps rewards for any COM interop, not just Excel.
Each COM object, such as Sheet, will be wrapped in a class that implements IDisposable. It will expose properties such as Sheets which in turn has an indexer. Ownership is tracked all the way through, and at the end if you simply dispose of the master object, such as the WorkbookWrapper, then everything else gets disposed of internally. Adding a sheet, for instance, is tracked so a new sheet will also be disposed of.
While this is not a bulletproof approach you can at least rely on it for 95 % of the use cases, and the other 5 % you are totally aware of and take care of in the code. Most importantly, it is tested and reusable once you have done it the first time.
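The wrapper library itself isn't shown, but its core idea (track ownership, release everything when the owner is disposed) can be sketched in a few lines. This is a hypothetical, much-simplified stand-in, not the classes described above: ComScope is an illustrative name. Every COM object you obtain is passed through Track, and Dispose releases them in reverse order, so children are released before their parents.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical helper: tracks COM objects as they are obtained and
// releases them in reverse order when the scope is disposed.
public sealed class ComScope : IDisposable
{
    private readonly Stack<object> _tracked = new Stack<object>();
    private bool _disposed;

    public int Count => _tracked.Count;

    // Registers obj for release and returns it, so it can be assigned in one line.
    public T Track<T>(T obj) where T : class
    {
        if (_disposed) throw new ObjectDisposedException(nameof(ComScope));
        if (obj != null) _tracked.Push(obj);
        return obj;
    }

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        // Children are tracked after their parents, so popping releases children first.
        while (_tracked.Count > 0)
        {
            object o = _tracked.Pop();
            if (Marshal.IsComObject(o))
                Marshal.ReleaseComObject(o);
        }
    }
}
```

Typical use would be using (var scope = new ComScope()) { var sheets = scope.Track(book.Sheets); ... }. Unlike a full wrapper library, nothing stops you from forgetting to call Track, so it still relies on discipline or automated checks.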
Background: I am trying to offer automation of Outlook without referencing the Interop DLL. This is to make it easier for clients to install, without having to install or reference DLLs in specific folders for the tool.
I am NOT writing the overarching software/tool, so I cannot affect STA/MTA attributes. It is third-party software that lets you drop in code snippets to form a basic class, methods, etc.
If we use Interop dll reference and Early binding, then the following lines work fine:
if(item.SenderEmailType == "EX") {
Microsoft.Office.Interop.Outlook.AddressEntry entry = (Microsoft.Office.Interop.Outlook.AddressEntry)item.Sender;
Microsoft.Office.Interop.Outlook.ExchangeUser user = (Microsoft.Office.Interop.Outlook.ExchangeUser)entry.GetExchangeUser();
row["SenderEmailAddress"] = user.PrimarySmtpAddress;
} else {
row["SenderEmailAddress"] = item.SenderEmailAddress;
}
If we remove the dll Reference, and use dynamic keyword, this piece of code never works, where 'item' is in fact one of a Microsoft.Office.Interop.Outlook.MailItem retrieved via the Items.Restrict method:
foreach(dynamic item in folderItems) {
if(item.SenderEmailType == "EX") {
dynamic entry = item.Sender;
// the object is always null, as it seems that runtime binding is not happening properly?
dynamic user = entry.GetExchangeUser();
// perhaps the COM Object has to attempt to call Outlook, and the info is never returned on the same thread??
row["SenderEmailAddress"] = user.PrimarySmtpAddress;
} else {
// this is OK, as it seems that SenderEmailAddress property data is delivered with the COM Object first time around
row["SenderEmailAddress"] = item.SenderEmailAddress;
}
}
When the above snippet is wrapped by a simple try-catch block, it works every time. Note that there is NO other difference in code/objects/instance or even email details. The addition of try-catch makes it work - every time. Remove try-catch, and it fails every time.
So the question is: why is this the case, and can I confidently sign off on the code with the try-catch block in place?
Is it something to do with the thread returning from the COM call being wrapped in try-catch, which makes the call come back with results?
I am not familiar enough with the inner workings of COM, so tried looking up some old threads, and there is some hint about threading, COM and runtime binding. But none seem to mention this simple issue of try-catch 'fixing' everything.
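One thing worth ruling out: a try-catch around the loop also swallows the exception thrown when entry or user is null (with dynamic, that surfaces as a RuntimeBinderException, "Cannot perform runtime binding on a null reference"), so items whose lookup fails are silently skipped, which can look like the code "working". A sketch that checks each late-bound result explicitly instead is below. It assumes item is an Outlook MailItem, that Sender may be null for some items, and that AddressEntry.GetExchangeUser returns null when the entry is not an Exchange user; SenderAddress and GetSmtpAddress are hypothetical names.

```csharp
using System;

public static class SenderAddress
{
    // Late-bound lookup of a sender's SMTP address that checks each
    // intermediate result for null instead of assuming it exists.
    public static string GetSmtpAddress(dynamic item)
    {
        string type = item.SenderEmailType;
        if (type == "EX")
        {
            dynamic entry = item.Sender;
            if (entry != null)
            {
                dynamic user = entry.GetExchangeUser();
                if (user != null)
                    return user.PrimarySmtpAddress;
            }
        }
        // SMTP senders, or Exchange senders whose lookup returned null.
        return item.SenderEmailAddress;
    }
}
```

In the loop, row["SenderEmailAddress"] = SenderAddress.GetSmtpAddress(item); then replaces the if/else. If the address still comes back wrong without the try-catch, at least the failure is no longer hidden.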
I'm writing a lot of Office add-ins in C#, and I love all the wonderful ways you can extend the functionality of especially Excel. But one thing that keeps bugging me is the overhead of doing pretty much anything to pretty much any Office object.
I'm aware that there are high-level tricks to doing many things faster, like reading and writing object[,] arrays to larger cell ranges instead of accessing individual cells, and so on. But regardless, a complicated add-in will always end up accessing lots of different objects, or many properties of a few objects, or the same properties over and over again.
And when profiling my add-ins I always find I spend at least 90% of my CPU time accessing basic properties of Office objects. For instance, here is a bit of code I use to check if a window has been scrolled, so I can update some overlay graphics accordingly:
Excel.Window window = Globals.ThisAddIn.Application.ActiveWindow;
if (window.ScrollColumn != previousScrollColumn)
{
needsRedraw = true;
previousScrollColumn = window.ScrollColumn;
}
if (window.ScrollRow != previousScrollRow)
{
needsRedraw = true;
previousScrollRow = window.ScrollRow;
}
if (window.Zoom != previousZoom)
{
needsRedraw = true;
previousZoom = window.Zoom;
}
The first line, getting the active window, and each of the if statements, each accessing a property of that window, all light up when profiling. They're really slow.
Now I know these are COM objects in managed wrappers, and there's some sort of managed->unmanaged interface stuff going on, probably inter-process communication and whatnot, so I'm not surprised that there's some overhead, but I'm still amazed at how much it adds up.
So are there any tricks for speeding stuff like this up?
For instance, in the above case I'm accessing three properties of the same object. I can't help but think there must be some way to read them all in one go, like maybe via a native companion add-in or something...?
Any ideas?
If you can get the Open XML, you can load it and traverse it using the Open XML SDK or other related libraries. Word has this (Range.WordOpenXML) but I don't know if Excel does. Even then, it might be that not all properties are exposed, for example the scroll location is probably not there.
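Separately from Open XML, the snippet in the question can make fewer calls: whenever a value has changed, it reads window.ScrollColumn (or ScrollRow, or Zoom) a second time to store it, and each read is a separate call into Excel. A minimal sketch that reads each property exactly once into a snapshot and compares snapshots is below. ScrollState is a hypothetical name; window is typed as dynamic only so the sketch compiles without the Excel interop reference (in the add-in you would take an Excel.Window, which also avoids the dynamic binder's own overhead), and Zoom is converted to double on the assumption that it comes back as a number.

```csharp
using System;

// Snapshot of the three window properties that drive the overlay redraw.
public readonly struct ScrollState : IEquatable<ScrollState>
{
    public int ScrollColumn { get; }
    public int ScrollRow { get; }
    public double Zoom { get; }

    public ScrollState(int scrollColumn, int scrollRow, double zoom)
    {
        ScrollColumn = scrollColumn;
        ScrollRow = scrollRow;
        Zoom = zoom;
    }

    // Reads each property exactly once: one call into Excel per property.
    public static ScrollState Read(dynamic window)
    {
        int col = (int)window.ScrollColumn;
        int row = (int)window.ScrollRow;
        double zoom = Convert.ToDouble(window.Zoom);
        return new ScrollState(col, row, zoom);
    }

    public bool Equals(ScrollState other) =>
        ScrollColumn == other.ScrollColumn && ScrollRow == other.ScrollRow && Zoom.Equals(other.Zoom);

    public override bool Equals(object obj) => obj is ScrollState s && Equals(s);

    public override int GetHashCode() => (ScrollColumn, ScrollRow, Zoom).GetHashCode();
}
```

In the add-in the check becomes ScrollState current = ScrollState.Read(window); if (!current.Equals(previousState)) { needsRedraw = true; previousState = current; }. That is four calls into Excel per check (ActiveWindow plus three properties) instead of up to seven; it does not make any individual call cheaper.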
I'm debugging some legacy code. It loads a user defined COM object, allows the user to call functions in it, and then releases it. However, we have found that every time we load and unload the COM object, we leak memory. As a test, we changed the code to load it and hang on to it, and keep re-using it until program exit and the leak went away.
Here are the relevant code snippets:
This C++ code is called to load the COM object; pszProgId is a string identifying the target DLL.
COleDispatchDriver *pDispatchDriver = NULL;
pDispatchDriver = new COleDispatchDriver();
if (!pDispatchDriver->CreateDispatch(pszProgId, &oleException))
{
throw &oleException;
}
pDispatchDriver->m_bAutoRelease = TRUE;
*ppvObject = (void *) pDispatchDriver;
void ** ppvObject is a pointer we pass around to generically hold different objects. It is part of a much larger structure.
And here is the code we call to release the COM object after we are done using it:
COleDispatchDriver* pDispatchDriver = (COleDispatchDriver*) (*((LONG_PTR*)(ppvObject)));
pDispatchDriver->ReleaseDispatch();
delete pDispatchDriver;
This leaks about 1 MB on every call. The target COM object is written in C#. Does anyone have any idea what we're doing wrong, or a better way to do what we're trying to do?
We are building this in Visual Studio 2015, in case that is relevant.
Re xMRi:
As already noted, we tried changing that flag to TRUE, to no effect. As a sanity check, I tried it again after reading your post, and again it did nothing to fix the memory leak. For clarity, I've updated my code to show it set to TRUE, which is almost certainly the right value, but the code still exhibits the same memory leak described above.
ReleaseDispatch does nothing if you set m_bAutoRelease to FALSE. So in fact you don't free the instance of this COM object.
See the implementation:
void COleDispatchDriver::ReleaseDispatch()
{
if (m_lpDispatch != NULL)
{
if (m_bAutoRelease)
m_lpDispatch->Release();
m_lpDispatch = NULL;
}
}
So you created the problem yourself by setting m_bAutoRelease to FALSE. Check why you are doing this.
You can get the LPDISPATCH pointer directly and call Release() on it, but that is exactly what ReleaseDispatch already does when m_bAutoRelease is TRUE.
Sometimes, when I save to XML, I end up with a completely empty XML file.
I can't reproduce the issue on demand yet. It is just occasional. Are there steps that one can take to assist the user in this regard?
At the moment I do this:
public bool SavePublisherData()
{
bool bSaved = false;
try
{
XmlSerializer x = new XmlSerializer(_PublisherData.GetType());
using (StreamWriter writer = new StreamWriter(_strPathXML))
{
_PublisherData.BuildPublisherListFromDictionary();
x.Serialize(writer, _PublisherData);
bSaved = true;
}
}
catch
{
}
return bSaved;
}
The reason I have not put anything in the catch block is because this code is part of a C# DLL and I am calling it from an MFC project. I have read that you can't (or shouldn't) pass exceptions through from one environment to another. Thus, when an exception happens in my DLL I don't really know how I can sensibly feed that information to the user so they can see it. That is a side issue.
But this is how I save it. So, what steps can one take to try and prevent complete data loss?
Thank you.
Update
I have looked at the KB article that the link in the comments refers to and it states:
Use the following XmlSerializer class constructors. These class constructors cache the assemblies.
This is also re-stated in the article itself indicated in the comments:
What is the solution?
The default constructors XmlSerializer(type) and XmlSerializer(type, defaultNameSpace) caches the dynamic assembly so if you use those constructors only one copy of the dynamic assembly needs to be created.
Seems pretty smart… why not do this in all constructors? Hmm… interesting idea, wonder why they didn’t think of that one:) Ok, the other constructors are used for special cases, and the assumption would be that you wouldn’t create a ton of the same XmlSerializers using those special cases, which would mean that we would cache a lot of items we later didn’t need and use up a lot of extra space. Sometimes you have to do what is good for the majority of the people.
So what do you do if you need to use one of the other constructors? My suggestion would be to cache the XmlSerializer if you need to use it often. Then it would only be created once.
My code uses one of these default constructors as you can see:
XmlSerializer(_PublisherData.GetType());
So I don't think I need to worry about this XmlSerializerFactory in this instance.
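One plausible cause worth checking first: new StreamWriter(_strPathXML) truncates the existing file the moment it is opened, so if BuildPublisherListFromDictionary or Serialize then throws, the empty catch hides the error and leaves an empty or partial file behind. A common mitigation is to serialize to a temporary file and only swap it in once writing has succeeded. The sketch below does that; SafeXmlSaver, the .tmp and .bak suffixes, and the choice of File.Replace are assumptions for illustration, not part of the original code. It lets exceptions propagate, so the method exported to the MFC side can still catch them, record the message, and return false.

```csharp
using System.IO;
using System.Xml.Serialization;

public static class SafeXmlSaver
{
    // Serializes data to a temporary file next to path, then swaps it in.
    // If serialization throws, the existing file at path is left untouched.
    public static void Save<T>(T data, string path)
    {
        string fullPath = Path.GetFullPath(path);
        string tempPath = fullPath + ".tmp";
        var serializer = new XmlSerializer(typeof(T)); // one of the caching constructors

        using (var stream = new FileStream(tempPath, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            serializer.Serialize(stream, data);
            stream.Flush(true); // ask the OS to write the data through to disk before swapping
        }

        if (File.Exists(fullPath))
            File.Replace(tempPath, fullPath, fullPath + ".bak"); // keeps the previous version as a backup
        else
            File.Move(tempPath, fullPath);
    }
}
```

In SavePublisherData this would replace the StreamWriter block: call BuildPublisherListFromDictionary first, then SafeXmlSaver.Save(_PublisherData, _strPathXML). T must be the concrete type being saved, since the original serializer is built from _PublisherData.GetType().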
I would like to be able to:
compare Word Interop COM proxies on a "reference equality" basis; and
map from a specific object (say a paragraph) to the collection it comes from, OR at least
determine whether two paragraphs are from the same section, and which of them comes first
Why do I want to do this? I am trying to build a Word Add-In that acts similarly to a spell-checker in the sense that it runs in the background (by background I mean by regularly stealing time from the main Word thread using SendMessage) and scans the document for certain text "tokens". I want to be able to keep a collection of the tokens around and update them as the document changes. A specific example of this is if the user edits a given paragraph, I want to rescan the paragraph and update my data structure which points to that paragraph. If there is no way to map between the paragraph the user edited in (i.e. the paragraph where the start of the selection range is) and a paragraph that I have "stored" in a data structure, I can't do this.
Example Code for item #1, above
If I write the following VBA code:
Dim Para1 As Paragraph
Dim Para2a As Paragraph
Dim Para2b As Paragraph
Set Para1 = ActiveDocument.Paragraphs(1)
Set Para2a = Para1.Next
Set Para2b = Para1.Next.Next.Previous
If Para2a Is Para2b Then
Debug.Print ("Para2a Is Para2b")
Else
Debug.Print ("Para2a Is Not Para2b")
End If
Then I am getting the output:
"Para2a Is Not Para2b"
Which is perhaps physically true (different COM proxies) but not logically true. I need to be able to compare those paragraphs and determine if they are logically the same underlying paragraph.
(I am planning to write the add-in in C#, but the above VBA code demonstrates the kind of problem I need to overcome before doing too much coding).
For items 2 and 3 above, hopefully they will be self-explanatory. Say I have a paragraph (interop proxy) reference. I want to figure out "where" it is in the document. Does it belong to Section 1? Is it in a footer? Without this ability, all I can reasonably do to obtain an idea of where things come from is rescan the entire document every time it changes, which is of course absurdly inefficient and won't be timely enough for the app user.
Any thoughts greatly appreciated! I'm happy to post additional information as needed.
Navigating the particulars of reference equality in the context of COM Interop is always an interesting exercise.
I'm not privy to the implementation details of the Paragraph.Next() and Paragraph.Previous() methods; however, the behavior they exhibit is very similar to how COM-based collections act in general with regard to Runtime Callable Wrapper creation.
Typically, if possible, the framework avoids creating new RCW instances in response to additional references being made to COM objects that already have an RCW initialized and assigned. If an RCW already exists for a particular pointer to IUnknown, an internal reference count maintained by that RCW is incremented, and then the RCW is returned. This allows the framework to avoid incrementing the actual COM object's reference count (AddRef).
COM-based collections, which are COM objects that have managed representations implementing IEnumerable, seem to generate a new RCW each time an item is accessed, even if that item has already been accessed during the session.
For example:
Word.Document document = Application.ActiveDocument;
Paragraphs paragraphs = document.Paragraphs;
Paragraph first = paragraphs[1];
Paragraph second = paragraphs[1];
bool thisIsFalse = (first == second);
If you want to do any sort of "reference equality" checking, you need to escape from the COM based collection, specifically in your case: the Paragraphs object. You can do this simply by grabbing its kids and storing them in your own, purely managed and predictable collection, like so:
List<Paragraph> niceParagraphs = paragraphs.Cast<Paragraph>().ToList();
Although using LINQ with COM Interop may look a bit scary (if it doesn't to you...it really should!) I'm fairly certain the above code is safe and will not leave any dangling references out there, or anything else nasty. I have not tested the above code exhaustively, however.
Don't forget to properly release those resources when you are done with them, at least if your requirements require that level of prudence.
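One caveat: a ToList snapshot only gives stable identity for the RCWs you stored yourself. A paragraph reached later through the selection or through paragraphs[i] will, per the behaviour described above, typically be a new RCW, so == against the stored list can still fail. If that is the case you need, an alternative (a different technique from the one above) is to compare paragraphs by position using Range.StoryType, Range.Start and Range.End. The sketch below is hypothetical (ParagraphPosition is an illustrative name) and uses dynamic only so it compiles without the Word interop reference. It assumes positions are re-read after edits, since Start and End shift as text is inserted or deleted, and StoryType may not distinguish every pair of ranges, so treat it as a starting point.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: identifies Word paragraphs by position rather than by RCW identity.
public static class ParagraphPosition
{
    private static void Release(object o)
    {
        if (o != null && Marshal.IsComObject(o))
            Marshal.ReleaseComObject(o);
    }

    // True if both paragraphs cover the same span of the same story.
    public static bool SameParagraph(dynamic a, dynamic b)
    {
        dynamic ra = a.Range;
        dynamic rb = b.Range;
        try
        {
            return (int)ra.StoryType == (int)rb.StoryType
                && (int)ra.Start == (int)rb.Start
                && (int)ra.End == (int)rb.End;
        }
        finally
        {
            Release((object)ra); // each a.Range access creates a Range RCW
            Release((object)rb);
        }
    }

    // Negative if a starts before b, positive if after, 0 if they start at the same position.
    // Positions are only comparable within one story, so different stories are rejected.
    public static int Compare(dynamic a, dynamic b)
    {
        dynamic ra = a.Range;
        dynamic rb = b.Range;
        try
        {
            if ((int)ra.StoryType != (int)rb.StoryType)
                throw new ArgumentException("The paragraphs are in different stories.");
            return ((int)ra.Start).CompareTo((int)rb.Start);
        }
        finally
        {
            Release((object)ra);
            Release((object)rb);
        }
    }
}
```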