Very High Memory Usage in .NET 4.0 - c#

I have a C# Windows Service that I recently moved from .NET 3.5 to .NET 4.0. No other code changes were made.
When running on 3.5, memory utilization for a given workload was roughly 1.5 GB and throughput was 20 X per second. (The X doesn't matter in the context of this question.)
The exact same service running on 4.0 uses between 3 GB and 5 GB+ of memory and achieves less than 4 X per second. In fact, the service will typically end up stalling out as memory usage continues to climb until my system is sitting at 99% utilization and page-file swapping goes nuts.
I'm not sure if this has to do with garbage collection or what, but I'm having trouble figuring it out. My Windows service uses the "Server" GC via the config file switch seen below:
<runtime>
<gcServer enabled="true"/>
</runtime>
Changing this option to false didn't seem to make a difference. Furthermore, from the reading I've done on the new GC in 4.0, the big changes only affect the workstation GC mode, not the server GC mode. So perhaps GC has nothing to do with the issue.
Ideas?

Well this was an interesting one.
The root cause turns out to be a change in the behavior of SQL Server Reporting Services' LocalReport class (v2010) when running on top of .NET 4.0.
Basically, Microsoft altered the behavior of RDLC processing so that each report is now processed in a separate application domain. This was done specifically to address a memory leak caused by the inability to unload assemblies from app domains: when the LocalReport class processes an RDLC file, it actually creates an assembly on the fly and loads it into the app domain.
In my case, due to the large volume of reports I was processing, this resulted in very large numbers of System.Runtime.Remoting.ServerIdentity objects being created. This was my tip-off to the cause, as I was confused as to why processing an RDLC required remoting.
Of course, to call a method on a class in another app domain, remoting is exactly what you use. In .NET 3.5 this wasn't necessary because, by default, the RDLC assembly was loaded into the same app domain. In .NET 4.0, however, a new app domain is created by default.
The fix was fairly easy. First, I needed to enable the legacy security policy using the following config:
<runtime>
<NetFx40_LegacySecurityPolicy enabled="true"/>
</runtime>
Next, I needed to force the RDLCs to be processed in the same app domain as my service by calling the following:
myLocalReport.ExecuteReportInCurrentAppDomain(AppDomain.CurrentDomain.Evidence);
This resolved the issue.

I ran into this exact issue, and it is true that app domains are created and not cleaned up. However, I wouldn't recommend reverting to the legacy security policy; the sandbox app domains can be cleaned up by calling ReleaseSandboxAppDomain().
LocalReport report = new LocalReport();
...
report.ReleaseSandboxAppDomain();
Some other things I also do to clean up:
Unsubscribe from any SubreportProcessing events,
Clear Data Sources,
Dispose the report.
Our Windows service processes several reports a second and there are no leaks.
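For reference, a minimal cleanup sketch along those lines (the report path, data source, and handler names here are illustrative, not from the original post):
LocalReport report = new LocalReport();
report.ReportPath = "MyReport.rdlc";                                // hypothetical report
report.DataSources.Add(new ReportDataSource("MainDataSet", data));  // hypothetical data source
report.SubreportProcessing += OnSubreportProcessing;                // hypothetical handler

byte[] pdf = report.Render("PDF");

// Tear-down: unsubscribe, clear data sources, release the sandbox app domain, dispose.
report.SubreportProcessing -= OnSubreportProcessing;
report.DataSources.Clear();
report.ReleaseSandboxAppDomain();
report.Dispose();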

I'm pretty late to this, but I have a real solution and can explain why!
It turns out that LocalReport here is using .NET Remoting to dynamically create a sub app domain and run the report, in order to avoid a leak somewhere internally. We then noticed that the report would eventually release all the memory after 10 to 20 minutes. For people generating a lot of PDFs, this isn't going to work. However, the key here is that they are using .NET Remoting. One of the key parts of Remoting is something called "leasing". Leasing means that it will keep that marshalled object around for a while, since Remoting is usually expensive to set up and it's probably going to be used more than once. LocalReport RDLC processing is abusing this.
By default, the lease time is... 10 minutes! Also, if something makes various calls into it, another 2 minutes is added to the wait time! Thus, it can randomly be anywhere between 10 and 20 minutes depending on how the calls line up. Luckily, you can change how long this timeout lasts. Unluckily, you can only set this once per app domain... Thus, if you need Remoting for something other than PDF generation, you will probably need to run that in another service so you can change the defaults. To do this, all you need to do is run these four lines of code at startup:
// LifetimeServices lives in the System.Runtime.Remoting.Lifetime namespace
LifetimeServices.LeaseTime = TimeSpan.FromSeconds(5);
LifetimeServices.LeaseManagerPollTime = TimeSpan.FromSeconds(5);
LifetimeServices.RenewOnCallTime = TimeSpan.FromSeconds(1);
LifetimeServices.SponsorshipTimeout = TimeSpan.FromSeconds(5);
You'll see the memory use start to rise, and then within a few seconds you should see it start coming back down. It took me days with a memory profiler to really track this down and realize what was happening.
You can't wrap ReportViewer in a using statement (Dispose crashes), but you should be able to if you use LocalReport directly. After it is disposed, you can call GC.Collect() if you want to be doubly sure you are doing everything you can to free up that memory.
Hope this helps!
Edit
Apparently, you should call GC.Collect(0) after generating a PDF report, or else the memory use can still climb for some reason.
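A rough sketch of that disposal pattern, assuming your version of LocalReport does implement IDisposable as described above:
using (var report = new LocalReport())
{
    // ... set ReportPath / data sources and Render the PDF here ...
}
GC.Collect(0); // per the edit above: collect gen 0 after each PDF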

You might want to:
profile the heap
use WinDbg + SOS.dll to establish which resource is being leaked and from where the reference is held
Perhaps some API has changed semantics, or there might even be a bug in the 4.0 version of the framework.

Just for completeness, if anyone is looking for the equivalent ASP.NET web.config setting, it is:
<system.web>
<trust legacyCasModel="true" level="Full"/>
</system.web>
ExecuteReportInCurrentAppDomain works the same.
Thanks to this Social MSDN reference.

It seems as though Microsoft tried putting the report into its own separate memory space to work around all of the memory leaks, rather than fixing them. In doing so, they introduced some hard crashes and ended up with more memory leaks anyway. They seem to cache the report definition, but never use it and never clean it up, and every new report creates a new report definition, taking up more and more memory.
I played around with doing the same thing: using a separate app domain and marshalling the report over to it. I think that is a terrible solution and it makes a mess very quickly.
What I did instead is similar: split the reporting part of your program out into its own separate reports program. This turns out to be a good way to organize your code anyway.
The tricky part is passing information to the separate program. Use the Process class to start a new instance of the reports program and pass any parameters it needs on the command line. The first parameter should be an enum or similar value indicating the report that should be printed. My code for this in the main program looks something like:
const string sReportsProgram = "SomethingReports.exe";

public static void RunReport1(DateTime pDate, int pSomeID, int pSomeOtherID) {
    RunWithArgs(ReportType.Report1, pDate, pSomeID, pSomeOtherID);
}

public static void RunReport2(int pSomeID) {
    RunWithArgs(ReportType.Report2, pSomeID);
}

// TODO: currently no support for quoted args
static void RunWithArgs(params object[] pArgs) {
    // .Join here is my own extension method which calls string.Join
    RunWithArgs(pArgs.Select(arg => arg.ToString()).Join(" "));
}

static void RunWithArgs(string pArgs) {
    Console.WriteLine("Running Report Program: {0} {1}", sReportsProgram, pArgs);
    var process = new Process();
    process.StartInfo.FileName = sReportsProgram;
    process.StartInfo.Arguments = pArgs;
    process.Start();
}
And the reports program looks something like:
[STAThread]
static void Main(string[] pArgs) {
    Application.EnableVisualStyles();
    Application.SetCompatibleTextRenderingDefault(false);
    var reportType = (ReportType)Enum.Parse(typeof(ReportType), pArgs[0]);
    using (var reportForm = GetReportForm(reportType, pArgs))
        Application.Run(reportForm);
}

static Form GetReportForm(ReportType pReportType, string[] pArgs) {
    switch (pReportType) {
        case ReportType.Report1: return GetReport1Form(pArgs);
        case ReportType.Report2: return GetReport2Form(pArgs);
        default: throw new ArgumentOutOfRangeException("pReportType", pReportType, null);
    }
}
Your GetReportForm methods should pull the report definition, make use of relevant arguments to obtain the dataset, pass the data and any other arguments to the report, and then place the report in a report viewer on a form and return a reference to the form. Note that it is possible to extract much of this process so that you can basically say 'give me a form for this report from this assembly using this data and these arguments'.
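For illustration only, one of those methods might look something like the sketch below; the data-access call, dataset name, and embedded resource name are hypothetical placeholders, not part of the original code:
static Form GetReport1Form(string[] pArgs) {
    // pArgs[0] is the ReportType; the rest are the values passed by RunReport1
    var date = DateTime.Parse(pArgs[1]);
    var someID = int.Parse(pArgs[2]);
    var someOtherID = int.Parse(pArgs[3]);

    var data = ReportData.GetReport1Data(date, someID, someOtherID); // hypothetical query

    var viewer = new ReportViewer { Dock = DockStyle.Fill };
    viewer.LocalReport.ReportEmbeddedResource = "SomethingReports.Report1.rdlc";          // hypothetical
    viewer.LocalReport.DataSources.Add(new ReportDataSource("Report1DataSet", data));     // hypothetical

    var form = new Form { Text = "Report 1" };
    form.Controls.Add(viewer);
    return form;
}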
Also note that both programs must be able to see your data types that are relevant to this project, so hopefully you have extracted your data classes into their own library, which both of these programs can share a reference to. It would not work to have all of the data classes in the main program, because you would have a circular dependency between the main program and the report program.
Don't overdo it with the arguments, either. Do any database querying you need in the reports program; don't pass a huge list of objects (which probably wouldn't work anyway). You should just be passing simple things like database ID fields, date ranges, etc. If you have particularly complex parameters, you might need to push that part of the UI to the reports program too and not pass them as arguments on the command line.
You can also add a reference to the reports program in your main program, and the resulting .exe and any related .dll files will be copied to the same output folder. You can then run it without specifying a path and just use the executable filename by itself (i.e. "SomethingReports.exe"). You can also remove the reporting DLLs from the main program.
One issue with this is that you will get a manifest error if you've never actually published the reports program. Just dummy-publish it once to generate a manifest, and then it will work.
Once you have this working, it's very nice to see your regular program's memory stay constant when printing a report. The reports program appears, taking up more memory than your main program, and then disappears, cleaning it up completely with your main program taking up no more memory than it already had.
Another issue might be that each report instance will now take up more memory than before, since they are now entire separate programs. If the user prints a lot of reports and never closes them, it will use up a lot of memory very fast. But I think this is still much better since that memory can easily be reclaimed simply by closing the reports.
This also makes your reports independent of your main program. They can stay open even after closing the main program, and you can generate them from the command line manually, or from other sources as well.


How do I integrate one application's UI into another?

I apologize for the length of the question, but I believe it is difficult to understand the “why” without the background.
Background: I have two applications running in a Windows Embedded Standard 7 environment. They should be the only two applications running on the machine. One, called "Controller", is written in C++; the other, "DBconnector", is written in C#. This is not new code; it has been in active use and development for almost 20 years.
The purpose of the software is to run a manufacturing machine for producing parts. These machines are big and dangerous if the program crashes. Long ago, I discovered that if the network went down for some reason, all the threads in the application would stall - not just the network thread. This was disastrous since, in extremely rare circumstances, leaving the controller in a state with the wrong relays on could cause the machine to literally explode. Note: several things have since been added to the software and hardware to prevent this. While this danger doesn't really exist anymore, stability is still extremely important. I never want the operator to be stuck in a state where they can't hit the reset button. My solution at the time was to move the networking tasks into a separate application. The OS was Windows XP-based at the time. I have no idea if the problem still exists in Windows 10, since I really don't want to rewrite hundreds of thousands of lines of code to try and merge the two programs now.
The development of the two programs diverged such that the one that controlled the machine, Controller, was designed for extreme stability, and the other, DBconnector, was where dangerous things like networking and most file I/O happened. Communication between the two programs is facilitated using a memory-mapped file that they both can access. I have no problem sharing window handles, process IDs, or any other data that might be needed between the two programs.
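(For readers unfamiliar with that mechanism, a bare-bones C# sketch of such a named shared-memory channel is shown below; the map name, offsets, and values are purely illustrative, not the actual protocol used here.)
using System.IO.MemoryMappedFiles;

// Illustrative only: open (or create) a named shared-memory region that the
// C++ process maps under the same name, and exchange a couple of integers.
using (var mmf = MemoryMappedFile.CreateOrOpen("ControllerDbConnectorShared", 4096))
using (var accessor = mmf.CreateViewAccessor())
{
    accessor.Write(0, 1);             // e.g. a "show quality sheet" request flag
    int ack = accessor.ReadInt32(4);  // e.g. an acknowledgement written by the other side
}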
Here is my question: how can I make the Controller application display the GUI of DBconnector? For example, I have started to add functionality to Controller that requires DBconnector to display the quality control sheets that are held on a web site on company servers. I want an operator to be able to pull up the quality control sheet directly on the machine. The operator currently only interacts with the Controller application. I don't want Controller to be able to access the network. Also, C# has some tools that make displaying a web page easy. It seems to me that the place to do this is DBconnector. The problem is that DBconnector runs in the background and cannot currently be seen or accessed by a user. So, the question is how to solve this.
The first option I have tried is to tell DBconnector to come forward and put Controller in the background. Then, when the user is done, Controller comes back to the front. I have made this work using some hacks, but it is inconsistent. The trick I used was to minimize and then maximize DBconnector, which seems to bring it to the front most of the time, and to try to hold focus on one or the other. There still might be a way to do it this way, but it needs to be something that is consistent.
The second option is to run the DBconnector application inside one of Controller's windows. I have no idea how to do this. I thought about using ATL or COM, but I think these run as threads within Controller's process rather than as a separate application.
The third option I've considered is to create a window inside Controller that intercepts all user input messages and passes them directly to DBconnector via a Windows message handle, and to take a screenshot of DBconnector whenever it is invalidated and pass it back through the memory-mapped file. Currently, this is what I am swaying towards.
Are there any suggestions on how to do the first and last options better, how to do the second option at all, or another solution that I have missed? Keep in mind that our current hardware is running Windows Embedded Standard 7. The project is currently in Visual Studio 2015. The C++ window technology is MFC, implemented using libraries originally from around 2003, I think. DBconnector is written in C# on .NET Framework 4.

Using filehelpers ExcelStorage - Excel File not opening

I am using FileHelpers' ExcelStorage somewhat like this:
ExcelStorage provider = new ExcelStorage(typeof(Img));
provider.StartRow = 2;
provider.StartColumn = 1;
provider.FileName = "Customers.xls";
provider.HeaderRows = 6;
provider.InsertRecords(imgs.ToArray()); // imgs was a list before
When I am done inserting records, I would like to open the Excel file I created (with my software still running). But it seems that Excel is somehow locked, i.e. there is an Excel instance running in the process manager. When I kill all Excel instances, I can open the file. Do I have to dispose of the ExcelStorage in some way?
I've used FileHelpers, but not ExcelStorage. The link here suggests that you should probably be using FileHelpers.ExcelNPOIStorage instead.
Looking at the source code for ExcelStorage, there is no public dispose method. There is a private CloseAndCleanup method which is called at the end of InsertRecords. Therefore I don't think there's anything you are doing wrong.
The usage of ExcelNPOIStorage looks very much the same. There is a call to GC.Collect() within the private cleanup method here, so I'd guess there was a known issue with the cleanup in the prior version of the component.
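If you do switch, the NPOI-based usage would presumably look something like the sketch below; I'm assuming the property names mirror ExcelStorage, per the docs linked above, and haven't verified them:
var provider = new ExcelNPOIStorage(typeof(Img));
provider.StartRow = 2;                // assumed to carry over from ExcelStorage
provider.StartColumn = 1;             // assumed to carry over from ExcelStorage
provider.FileName = "Customers.xlsx"; // the NPOI variant targets .xlsx
provider.InsertRecords(imgs.ToArray());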
Your best bet is to grab a copy of HANDLE.EXE, which you can use from an elevated command prompt to see what has a handle to the file in question. This may be your code, antivirus, or Excel (if open). Excel keeps a full lock on a file while it is open, preventing ordinary Notepad access etc.
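For example, running handle.exe Customers.xls from that elevated prompt should list whichever process is holding a handle whose name matches the file.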
If the process owning the handle to the file is your own code, then see if the handle exists once you have exited back to the development environment. If that clears the handle, then you are not releasing the lock properly and that can be slightly trickier as it will depend on exactly what you have coded.
The CloseAndCleanup function mentioned by @timbo is only called from a few places: the Sheets property and the ExtractRecords / InsertRecords functions. The only other thing to wonder is whether you are seeing any exceptions when it attempts to perform the CloseAndCleanup, or whether the reference count on the Excel application hasn't been properly released by the COM system.
If you can replicate this with a small sample app, I will be more than willing to give it a quick test and see what happens.
Note 1: if you are running your code from within Visual Studio, it may be a process called <APPNAME>.VSHOST.EXE, which is Visual Studio's development process, or, if you've turned off Visual Studio hosting, just your <APP>.EXE. If running within IIS for a web page or web service, you will more than likely have a w3wp process.
Note 2: if you run handle without being elevated, it may or may not find the handle to the file in question. Therefore, it is always recommended to run it elevated to ensure the results are accurate.
Note 3: the difference between ExcelStorage and ExcelNPOIStorage is that the former deals with .xls and the latter with .xlsx, if I remember rightly.

How to debug slow Office application interop constructor?

I have an application which works with Excel. Recently I encountered a problem with very slow creation of the Excel object.
I've recreated the issue with this simple code:
Microsoft.Office.Interop.Excel.Application xlApp;
xlApp = new Microsoft.Office.Interop.Excel.Application();
The second line causes the delay.
In order to measure the time needed to allocate the new object, the above code was extended with a time-tracking solution, and the results are conclusive. In the NORMAL situation, the above code executes in 0.5 s, while in the FAULTY-BEHAVIOR case it can take up to 5 minutes.
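The time tracking itself is nothing fancy - something along these lines:
var sw = System.Diagnostics.Stopwatch.StartNew();
xlApp = new Microsoft.Office.Interop.Excel.Application();
sw.Stop();
Console.WriteLine("Excel.Application constructor took {0}", sw.Elapsed);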
There are no memory leaks and the Excel objects are being properly freed. My solution has been running 24/7 for a whole year without any issues. I'm not sure if it's important, but the application is running in 20 separate user sessions (server machine). So there are 20 copies of this application running at the same time, which may result in 20 copies of Excel running at the same time.
The issue was first noticed 2 months ago and was solved by an upgrade of Office (2010 -> 2013). This time I have more time to investigate, but sadly the results aren't promising.
Facts:
only one machine is currently affected by this issue (24 CPU cores, 24 GB of RAM)
CPU isn't stressed at all when the "delay" happens
I've tried using the Process Monitor application to verify what happens when we call the new Excel.Application() constructor (to see if there is any excessive disk/memory/CPU usage) - no signs of resource limitations. No sign of log files related to COM objects, etc.
The only issue here is this few-minute delay. All other Excel interop commands work as usual.
Main Question:
Is there a way to debug this Microsoft.Office.Interop.Excel.Application() constructor to see which part is an issue here?
External content
One guy with a similar issue. His solution didn't help with my problem at all.
EDIT - additional test
The PowerPoint constructor is not affected by the delay:
ppApp = new Microsoft.Office.Interop.PowerPoint.Application();
I've found the solution on my own. I'll post it here, as someone else may encounter a similar problem and it could save them hours/days of investigation.
What did I do to find the solution?
I analyzed the test application (basically only one line, where the new Excel application is created) with Process Monitor and it didn't show anything important. Then I repeated the analysis with a newly started Excel process. It highlighted numerous reads of the Windows registry key:
HKEY_USERS\S-1-5-21-2929665075-1795331740-364918325-1024\Software\Microsoft\Office\15.0\Excel\Resiliency\DocumentRecovery
Under the above location I discovered tens of thousands of keys. They were all created by Excel's "auto-recovery" functionality. Because of their number, loading them when starting a new Excel object was taking about 40 seconds. This time was additionally multiplied by the 10-20 other sessions loading simultaneously (did I mention my application runs in 20 user sessions?).
Solution:
Removal of the "Resiliency" registry tree does the trick.
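If you want to script that cleanup, a minimal sketch (assuming Office 2013, i.e. the 15.0 hive, run under each affected user's profile) could be:
using Microsoft.Win32;

// Deletes the per-user Resiliency\DocumentRecovery tree that Excel rebuilds as needed.
const string keyPath = @"Software\Microsoft\Office\15.0\Excel\Resiliency\DocumentRecovery";
Registry.CurrentUser.DeleteSubKeyTree(keyPath, throwOnMissingSubKey: false);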
Why were all these "auto-recovery" entries there in the first place? I guess I don't handle closing Excel very well, so it "thinks" I'm having regular crashes and "tries" to help.
Now what's left is preventing it from happening all over again. I'll have a closer look at my ExcelClose() function.
Thanks for your attention - Adrian
I don't think the problem is with this constructor. Try to create the object dynamically:
var obj = Activator.CreateInstance(Type.GetTypeFromProgID("Excel.Application"));
Then cast it to Microsoft.Office.Interop.Excel.Application:
var xlApp = (Microsoft.Office.Interop.Excel.Application)obj;
MessageBox.Show(xlApp.Name);
I'd expect the slow-down to move to the Activator.CreateInstance call.
Anyway, you can try to work around it by placing the following into your app.config file (more details):
<runtime>
<generatePublisherEvidence enabled="false"/>
</runtime>
I'd also suggest making sure you're running the latest VSTO Runtime and the latest Office PIAs.

Random error: Attempted to read or write protected memory

We have a C# .NET application using WCF services. The application is deployed on our production server as a Windows service. One part of the module is responsible for creating shape files (*.shp, *.dbf) for a smaller area the workers will be working in today and sending them down to a PDA.
To write the shape files, we use a third party dll, NetTopologySuite
GisSharpBlog.NetTopologySuite.IO.ShapefileWriter
which is also in C#. (I am not sure whether any DLL it references uses unmanaged code.)
The system might work fine for a while, say a week. Then suddenly we get an exception saying
Attempted to read or write protected memory.
This is often an indication that other memory is corrupt.
from the Write method, where we write the geometry collection to shape files.
sfw.Write(FileName, new GeometryCollection(gc.ToArray()));
(GeometryCollection is also from a third party dll, GeoAPI.dll)
This error brings down the whole service and leaves it non-functional. We then just restart the service and run the same data again; it works fine for another week until it crashes again. It happens only in production and at random times. We have not been able to find the cause of the issue.
Many forums suggest that it might be because of memory leaks in some unmanaged code, but we couldn't find which one.
We are also ready to rewrite the part that creates the new shape files.
Please help me to resolve this issue.
Let me know if more details are required. Thanks in advance.
In my experience, that message is the result of a memory leak. This is what I'd do if I were in your situation, especially since you are working with a third-party DLL.
1) Monitor your WCF server and see what is going on with DLLHost.exe and the ASP.NET services in Task Manager. I have a feeling that your third-party DLL has a memory leak that causes these two services to bloat and reach the limit of your server's memory. This would explain why it works for a while and then suddenly stops working.
2) Identify a good schedule for recycling your server's memory and application pool. Since the issue is rampant, you might want to do this every midnight or when no one is actively using the system.
3) Write good error-logging code so you know exactly what is happening when it bogs down. I would put the following information in the error logs: the parameters you are passing, the user who encountered the problem, etc. This way you will know exactly what is happening.
4) Check the Event Viewer; maybe there is some information in there that can pinpoint the problem.
5) After doing 1 through 4, call your third-party DLL vendor and see what they can do to help you. You might need to provide the information that you collected in the steps above.
Good luck and I hope this will help.
I think you have some unmanaged code in the third-party libraries that is accessing an address protected by the system or used by other applications.
You have an access violation (a pointer to memory not belonging to your application's address space, including the null (0x0) address) in one of your third-party DLLs.
Or else it may be some unmanaged COM object you're using that causes this error.
The random nature of this error suggests to me that it may be a matter of threading. Specifically, the Write method of ShapefileWriter might have been called, gotten delayed on a thread, and then you call Close. The delayed Write method then tries to write to a closed (and protected) file, which could result in the error you see.
This is purely speculation since there's not much code to make a better guess, but I've experienced this issue using video writing libraries, so it might be the same in your case.
Check to make sure you don't have threads within threads. That is what happened when I encountered this error. See this link for more information: Attempted to read or write protected memory. This is often an indication that other memory is corrupt

How can I debug an internal error in the .NET Runtime?

I am trying to debug some work that processes large files. The code itself works, but there are sporadic errors reported from the .NET Runtime itself. For context, the processing here is a 1.5GB file (loaded into memory once only) being processed and released in a loop, deliberately to try to reproduce this otherwise unpredictable error.
My test fragment is basically:
try {
    byte[] data = File.ReadAllBytes(path);
    for (int i = 0; i < 500; i++)
    {
        ProcessTheData(data); // deserialize and validate
        // force collection, for tidiness
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
        GC.WaitForPendingFinalizers();
    }
} catch (Exception ex) {
    Console.WriteLine(ex.Message);
    // some more logging; StackTrace, recursive InnerException, etc
}
(with some timing and other stuff thrown in)
The loop will process fine for a non-deterministic number of iterations, fully successfully - no problems whatsoever; then the process will terminate abruptly. The exception handler is not hit. The test does involve a lot of memory use, but it saw-tooths very nicely during each iteration (there is not an obvious memory leak, and I have plenty of headroom - 14GB unused primary memory at the worst point in the saw-tooth). The process is 64-bit.
The Windows error log contains 3 new entries, which (via exit code 80131506) suggest an Execution Engine error - a nasty little critter. A related answer suggests a GC error, with a "fix" to disable concurrent GC; however, this "fix" does not prevent the issue.
Clarification: this low-level error does not hit the CurrentDomain.UnhandledException event.
Clarification: the GC.Collect is there only to monitor the saw-toothing memory, to check for memory leaks and to keep things predictable; removing it does not make the problem go away: it just makes it keep more memory between iterations, and makes the dmp files bigger ;p
By adding more console tracing, I have observed it faulting during each of:
during deserialization (lots of allocations, etc)
during GC (between a GC "approach" and a GC "complete", using the GC notification API)
during validation (just foreach over some of the data) - curiously just after a GC "complete" during the validation
So lots of different scenarios.
I can obtain crash-dump (dmp) files; how can I investigate this further, to see what the system is doing when it fails so spectacularly?
If you have memory dumps, I'd suggest using WinDbg to look at them, assuming that you're not doing that already.
Try running the command !EEStack (mixed native and managed stack trace), and see if there's anything that might jump out in the stack trace. In my test program, this was my stack trace one of the times a FEEE (Fatal Execution Engine Error) happened (I was purposefully corrupting the heap):
0:000> !EEStack
---------------------------------------------
Thread 0
Current frame: ntdll!NtWaitForSingleObject+0xa
Child-SP RetAddr Caller, Callee
00000089879bd3d0 000007fc586610ea KERNELBASE!WaitForSingleObjectEx+0x92, calling ntdll!NtWaitForSingleObject
00000089879bd400 000007fc5869811c KERNELBASE!RaiseException+0x68, calling ntdll!RtlRaiseException
[...]
00000089879bec80 000007fc49109cf6 clr!WKS::gc_heap::gc1+0x96, calling clr!WKS::gc_heap::mark_phase
00000089879becd0 000007fc49109c21 clr!WKS::gc_heap::garbage_collect+0x222, calling clr!WKS::gc_heap::gc1
00000089879bed10 000007fc491092f1 clr!WKS::GCHeap::RestartEE+0xa2, calling clr!Thread::ResumeRuntime
00000089879bed60 000007fc4910998d clr!WKS::GCHeap::GarbageCollectGeneration+0xdd, calling clr!WKS::gc_heap::garbage_collect
00000089879bedb0 000007fc4910df9c clr!WKS::GCHeap::Alloc+0x31b, calling clr!WKS::GCHeap::GarbageCollectGeneration
00000089879bee00 000007fc48ff82e1 clr!JIT_NewArr1+0x481
Since this could be related to heap corruption from the garbage collector, I would try the !VerifyHeap command. At least you could make sure that the heap is intact (and your problem lies elsewhere) or discover that your issue might actually be with the GC or some P/Invoke routines corrupting it.
If you find that the heap is corrupt, I might try and discover how much of the heap is corrupted, which you might be able to do via !HeapStat. That might just show the entire heap corrupt from a certain point, though.
It's difficult to suggest any other methods to analyze this via WinDbg, since I have no real clue about what your code is doing or how it's structured.
I suppose if you find it to be an issue with the heap and thus meaning it could be GC weirdness, I would look at the CLR GC events in Event Tracing for Windows.
If the minidumps you're getting aren't cutting it and you're using Windows 7/2008R2 or later, you can use Global Flags (gflags.exe) to attach a debugger when the process terminates without an exception, if you're not getting a WER notification.
In the Silent Process Exit tab, enter the name of the executable, not the full path to it (i.e. TestProgram.exe). Use the following settings:
Check Enable Silent Process Exit Monitoring
Check Launch Monitor Process
For the Monitor Process, use {path to debugging tools}\cdb.exe -server tcp:port=5005 -g -G -p %e.
And apply the settings.
When your test program crashes, cdb will attach and wait for you to connect to it. Start WinDbg, press Ctrl+R, and use the connection string: tcp:port=5005,server=localhost.
You might be able to skip using remote debugging and instead use {path to debugging tools}\windbg.exe %e. However, the reason I suggested remote instead, was because WerFault.exe, which I believe is what reads the registry and launches the monitor process, will start the debugger in Session 0.
You can make session 0 interactive and connect to the window station, but I can't remember how that's done. It's also inconvenient, because you'd have to switch back and forth between sessions if you need to access any of your existing windows you've had open.
Tools -> Debugging -> General -> Enable .NET Framework debugging
+
Tools -> IntelliTrace -> IntelliTrace events and call information
+
Tools -> IntelliTrace -> Store IntelliTrace recordings in this directory, and choose a directory
This should allow you to step INTO .NET code and trace every single function call. I tried it on a small sample project and it works.
After each debug session it is supposed to create a recording of the session in the chosen directory - even if the CLR dies, if I'm not mistaken.
This should allow you to get to the exact call before the CLR collapsed.
Try writing a generic exception handler and see if there is an unhandled exception killing your app.
AppDomain currentDomain = AppDomain.CurrentDomain;
currentDomain.UnhandledException += new UnhandledExceptionEventHandler(MyExceptionHandler);

static void MyExceptionHandler(object sender, UnhandledExceptionEventArgs e) {
    Console.WriteLine(e.ExceptionObject.ToString());
    Console.WriteLine("Press Enter to continue");
    Console.ReadLine();
    Environment.Exit(1);
}
I usually investigate memory-related problems with Valgrind and gdb.
If you run your things on Windows, there are plenty of good alternatives, such as verysleepy for callgrind, as suggested here:
Is there a good Valgrind substitute for Windows?
If you really want to debug internal errors of the .NET runtime, you have the problem that there is no source available for either the class libraries or the VM.
Since you can't debug what you don't have, I suggest that (apart from decompiling the .NET framework libraries in question with ILSpy and adding them to your project, which still doesn't cover the VM) you could use the Mono runtime.
There you have the source of both the class libraries and the VM.
Maybe your program works fine with Mono; then your problem would be solved, at least as long as it's only a one-time processing task.
If not, there is an extensive FAQ on debugging, including GDB support
http://www.mono-project.com/Debugging
Miguel also has this post regarding valgrind support:
http://tirania.org/blog/archive/2007/Jun-29.html
In addition to that, if you let it run on Linux, you can also use strace to see what's going on in the syscalls. If you don't have extensive WinForms usage or WinAPI calls, .NET programs usually work fine on Linux (for problems regarding file-system case sensitivity, you can loop-mount a case-insensitive file system and/or use MONO_IOMAP).
If you're a Windows-centric person, this post says the closest thing Windows has is WinDbg's Logger.exe, but the ltrace information is not as extensive.
The Mono source code is available here:
http://download.mono-project.com/sources/
You are probably interested in the sources of the latest mono version
http://download.mono-project.com/sources/mono/mono-3.0.3.tar.bz2
If you need Framework 4.5, you'll need Mono 3; you can find precompiled packages here
https://www.meebey.net/posts/mono_3.0_preview_debian_ubuntu_packages/
If you want to make changes to the source code, this is how to compile it:
http://ubuntuforums.org/showthread.php?t=1591370
There are .NET exceptions which cannot be caught. Check out: http://msdn.microsoft.com/en-us/magazine/dd419661.aspx.
