Is there a way to put the GC on hold completely for a section of code?
The only thing I've found in other similar questions is GC.TryStartNoGCRegion, but it is limited to the amount of memory you specify, which itself is limited to the size of an ephemeral segment.
Is there a way to bypass that completely and tell .NET "allocate whatever you need, don't do GC, period", or to increase the size of segments? From what I found, a segment is at most 1GB on a many-core server, and that is way less than what I need to allocate while keeping the GC from running (I have up to terabytes of free RAM, and there are thousands of GC spikes during that section; I'd be more than happy to trade those for 10 or even 100 times the RAM usage).
Edit:
Now that there's a bounty I think it's easier if I specify the use case. I'm loading and parsing a very large XML file (1GB for now, 12GB soon) into objects in memory using LINQ to XML. I'm not looking for an alternative to that. I'm creating millions of small objects from millions of XElements and the GC is trying to collect non-stop while I'd be very happy keeping all that RAM used up. I have 100s of GBs of RAM and as soon as it hits 4GB used, the GC starts collecting non-stop which is very memory friendly but performance unfriendly. I don't care about memory but I do care about performance. I want to take the opposite trade-off.
While I can't post the actual code, here is some sample code that is very close to the final code and may help those who asked for more information:
var items = XElement.Load("myfile.xml")
    .Element("a")
    .Elements("b") // There are about 2 to 5 million instances of "b"
    .Select(pt => new
    {
        aa = pt.Element("aa"),
        ab = pt.Element("ab"),
        ac = pt.Element("ac"),
        ad = pt.Element("ad"),
        ae = pt.Element("ae")
    })
    .Select(pt => new
    {
        aa = new
        {
            aaa = double.Parse(pt.aa.Attribute("aaa").Value),
            aab = double.Parse(pt.aa.Attribute("aab").Value),
            aac = double.Parse(pt.aa.Attribute("aac").Value),
            aad = double.Parse(pt.aa.Attribute("aad").Value),
            aae = double.Parse(pt.aa.Attribute("aae").Value)
        },
        ab = new
        {
            // reading from pt.ab here; the original snippet read these
            // attributes from pt.aa, which looks like a copy-paste slip
            aba = double.Parse(pt.ab.Attribute("aba").Value),
            abb = double.Parse(pt.ab.Attribute("abb").Value),
            abc = double.Parse(pt.ab.Attribute("abc").Value),
            abd = double.Parse(pt.ab.Attribute("abd").Value),
            abe = double.Parse(pt.ab.Attribute("abe").Value)
        },
        ac = new
        {
            aca = double.Parse(pt.ac.Attribute("aca").Value),
            acb = double.Parse(pt.ac.Attribute("acb").Value),
            acc = double.Parse(pt.ac.Attribute("acc").Value),
            acd = double.Parse(pt.ac.Attribute("acd").Value),
            ace = double.Parse(pt.ac.Attribute("ace").Value),
            acf = double.Parse(pt.ac.Attribute("acf").Value),
            acg = double.Parse(pt.ac.Attribute("acg").Value),
            ach = double.Parse(pt.ac.Attribute("ach").Value)
        },
        ad1 = int.Parse(pt.ad.Attribute("ad1").Value),
        ad2 = int.Parse(pt.ad.Attribute("ad2").Value),
        ae = new double[]
        {
            double.Parse(pt.ae.Attribute("ae1").Value),
            double.Parse(pt.ae.Attribute("ae2").Value),
            double.Parse(pt.ae.Attribute("ae3").Value),
            double.Parse(pt.ae.Attribute("ae4").Value),
            double.Parse(pt.ae.Attribute("ae5").Value),
            double.Parse(pt.ae.Attribute("ae6").Value),
            double.Parse(pt.ae.Attribute("ae7").Value),
            double.Parse(pt.ae.Attribute("ae8").Value),
            double.Parse(pt.ae.Attribute("ae9").Value),
            double.Parse(pt.ae.Attribute("ae10").Value),
            double.Parse(pt.ae.Attribute("ae11").Value),
            double.Parse(pt.ae.Attribute("ae12").Value),
            double.Parse(pt.ae.Attribute("ae13").Value),
            double.Parse(pt.ae.Attribute("ae14").Value),
            double.Parse(pt.ae.Attribute("ae15").Value),
            double.Parse(pt.ae.Attribute("ae16").Value),
            double.Parse(pt.ae.Attribute("ae17").Value),
            double.Parse(pt.ae.Attribute("ae18").Value),
            double.Parse(pt.ae.Attribute("ae19").Value)
        }
    })
    .ToArray();
Currently the best I could find was switching to server GC (which changed nothing by itself), which has a larger segment size and let me use a much larger number for the no-GC section:
GC.TryStartNoGCRegion(10000000000); // On Workstation GC this crashed with a much lower number, on server GC this works
This goes against my expectations (it is 10GB, yet from what I could find in the docs online the segment size in my current setup should be 1 to 4GB, so I expected an invalid argument).
With this setup I have what I wanted: the GC is on hold, I have 22GB allocated instead of 7, none of the temporary objects are collected, and the GC runs once (a single time!) over the whole batch process instead of many, many times per second (before the change, the GC view in Visual Studio looked like a straight line made of all the individual dots of GC triggering).
This isn't great as it won't scale (adding a 0 leads to a crash), but it's better than anything else I've found so far.
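For reference, the overall pattern I'm running looks like this (a sketch; LoadAndParse stands in for the LINQ to XML query above, and the guard matters because the runtime exits the no-GC region on its own if the budget is exceeded):
if (GC.TryStartNoGCRegion(10000000000)) // ~10GB budget; only works for me on server GC
{
    try
    {
        var items = LoadAndParse("myfile.xml"); // hypothetical: the query shown above
    }
    finally
    {
        // If the budget was exceeded, the runtime has already left the region,
        // and calling EndNoGCRegion then would throw.
        if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
            GC.EndNoGCRegion();
    }
}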
Unless anyone finds out how to increase the segment size so that I can push this further, or has a better alternative to completely halt the GC (not just a certain generation but all of it), I will accept my own answer in a few days.
I think the best solution in your case would be this piece of code I used in one of my projects some time ago:
var currentLatencySettings = GCSettings.LatencyMode;
GCSettings.LatencyMode = GCLatencyMode.LowLatency;
//your operations
GCSettings.LatencyMode = currentLatencySettings;
You are suppressing as much as you can (to my knowledge) and you can still call GC.Collect() manually.
Look at the MSDN article here
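A slightly safer variant (the same idea, just restoring the previous mode even if your operations throw):
var previous = GCSettings.LatencyMode;
try
{
    GCSettings.LatencyMode = GCLatencyMode.LowLatency;
    // your operations
}
finally
{
    GCSettings.LatencyMode = previous;
}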
Also, I would strongly suggest paging the parsed collection using the LINQ Skip() and Take() methods, and finally joining the output arrays, as sketched below.
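A minimal sketch of that idea (ToArrayPaged is an illustrative helper, not a real API; note that each Skip re-walks the sequence from the start, so the source should be cheap to enumerate):
using System.Collections.Generic;
using System.Linq;

// Materialize a large query page by page, then join the pages.
// e.g. var items = ToArrayPaged(query, 100000);
static T[] ToArrayPaged<T>(IEnumerable<T> source, int pageSize)
{
    var pages = new List<T[]>();
    while (true)
    {
        var page = source.Skip(pages.Count * pageSize).Take(pageSize).ToArray();
        if (page.Length == 0)
            return pages.SelectMany(p => p).ToArray(); // join the output arrays
        pages.Add(page);
    }
}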
I am not sure whether it's possible in your case, but have you tried processing your XML file in parallel? If you can break the XML file down into smaller parts, you can spawn multiple processes from within your code, each handling a separate file, and then combine all the results. This would certainly increase your performance, and since each process gets its own separate memory allocation, more total memory can be in use at a particular time while all the parts are being processed. Something like the sketch below.
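A rough sketch of that idea (Worker.exe and the pre-split part files are hypothetical; each worker would parse its part and write its results somewhere the parent can combine):
using System.Diagnostics;
using System.Linq;

var parts = new[] { "part1.xml", "part2.xml", "part3.xml", "part4.xml" }; // pre-split input
var workers = parts
    .Select(part => Process.Start("Worker.exe", part)) // one process per part
    .ToList();
foreach (var worker in workers)
    worker.WaitForExit();
// combine each worker's output here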
Related
Given the task of improving the performance of a piece of code, I have come across the following phenomenon. I have a large collection of reference types in a generic Queue; I'm removing and processing the elements one by one, then adding them to another generic collection.
It seems the larger the elements are the more time it takes to add the element to the collection.
Trying to narrow down the problem to the relevant part of the code, I've written a test (omitting the processing of elements, just doing the insert):
class Small
{
    public Small()
    {
        this.s001 = "001";
        this.s002 = "002";
    }

    string s001;
    string s002;
}

class Large
{
    public Large()
    {
        this.s001 = "001";
        this.s002 = "002";
        ...
        this.s050 = "050";
    }

    string s001;
    string s002;
    ...
    string s050;
}
static void Main(string[] args)
{
    const int N = 1000000;
    var storage = new List<object>(N);
    for (int i = 0; i < N; ++i)
    {
        //storage.Add(new Small());
        storage.Add(new Large());
    }

    List<object> outCollection = new List<object>();
    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = N - 1; i > 0; --i)
    {
        outCollection.Add(storage[i]);
    }
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
}
On the test machine, using the Small class, it takes about 25-30 ms to run, while it takes 40-45 ms with Large.
I know that the outCollection has to grow from time to time to be able to store all the items, so there is some dynamic memory allocation. But giving it an initial collection size makes the difference even more obvious: 11-12 ms with Small and 35-38 ms with Large objects.
I am somewhat surprised, as these are reference types, so I was expecting the collections to work only with references to the Small/Large instances. I have read Eric Lippert's relevant article and know that references should not be treated as pointers. At the same time, AFAIK they are currently implemented as pointers, and their size and the collection's performance should be independent of element size.
I've decided to put up a question here hoping that someone could explain or help me to understand what's happening here. Aside the performance improvement, I'm really curious what is happening behind the scenes.
Update:
Profiling data using the diagnostic tools didn't help me much, although I have to admit I'm not an expert using the profiler. I'll collect more data later today to find where the bottleneck is.
The pressure on the GC is quite high, of course, especially with the Large instances. But once the instances are created and stored in the storage collection and the program enters the loop, no collection was triggered any more, and memory usage hasn't increased significantly (outCollection was already pre-allocated).
Most of the CPU time is of course spent on memory allocation (JIT_New), around 62%; the only other significant entry in the profiler output is System.Collections.Generic.List`1[System.__Canon].Add, with about 7% of inclusive samples.
With 1 million items the preallocated outCollection size is 8 million bytes (the same as the size of storage); one can suspect 64 bit addresses being stored in the collections.
Probably I'm not using the tools properly or don't have the experience to interpret the results correctly, but the profiler didn't help me to get closer to the cause.
If the loop is not triggering collections and only copies pointers between 2 pre-allocated collections, how could the item size cause any difference? The cache hit/miss ratio is supposed to be more or less the same in both cases, as the loop is iterating over a list of "addresses" in both cases.
Thanks for all the help so far, I will collect more data, and put an update here if anything found.
I suspect that at least one action in the above (maybe some type checks) requires a dereference. Then the fact that many Smalls probably sit close together on the heap, and thus share cache lines, could account for some of the difference (certainly many more of them can share a single cache line than Larges can).
Added to which, you are accessing them in the reverse order of allocation, which maximises such a benefit.
We have created a monitoring application for our enterprise app that monitors our application's performance counters. We monitor a couple of system counters (memory, CPU) and 10 or so of our own custom performance counters. We have 7 or 8 exes that we monitor, so we check 80 counters every couple of seconds.
Everything works great except that when we loop over the counters the CPU takes a hit, 15% or so on my pretty good machine, but on other machines we have seen it much higher. We want our monitoring app to run discreetly in the background looking for issues, not eating up a significant amount of CPU.
This can easily be reproduced with this simple C# class. It loads all processes and gets Private Bytes for each. My machine has 150 processes. CallNextValue takes about 1.4 seconds and 16% CPU.
class test
{
    List<PerformanceCounter> m_counters = new List<PerformanceCounter>();

    public void Load()
    {
        var processes = System.Diagnostics.Process.GetProcesses();
        foreach (var p in processes)
        {
            var Counter = new PerformanceCounter();
            Counter.CategoryName = "Process";
            Counter.CounterName = "Private Bytes";
            Counter.InstanceName = p.ProcessName;
            m_counters.Add(Counter);
        }
    }

    private void CallNextValue()
    {
        foreach (var c in m_counters)
        {
            var x = c.NextValue();
        }
    }
}
Doing the same thing in Perfmon.exe in Windows, adding the counter Process - Private Bytes with all processes selected, I see virtually NO CPU taken up, and it's graphing all processes too.
So how is Perfmon getting the values? Is there a better/different way to get these performance counters in C#?
I've tried using RawValue instead of NextValue and I don't see any difference.
I've played around with the PDH calls in C++ (PdhOpenQuery, PdhCollectQueryData, ...). My first tests don't suggest these are any easier on the CPU, but I haven't created a good sample yet.
I'm not very familiar with the .NET performance counter API, but I have a guess about the issue.
The Windows kernel doesn't actually have an API to get detailed information about just one process. Instead, it has an API that can be called to "get all the information about all the processes". It's a fairly expensive API call. Every time you do c.NextValue() for one of your counters, the system makes that API call, throws away 99% of the data, and returns the data about the single process you asked about.
PerfMon.exe uses the same PDH APIs, but it uses a wildcard query -- it creates a single query that gets data for all of the processes at once, so it essentially only calls c.NextValue() once every second instead of calling it N times (where N is the number of processes). It gets a huge chunk of data back (data for all of the processes), but it's relatively cheap to scan through that data.
I'm not sure that the .NET performance counter API supports wildcard queries. The PDH API does, and it would be much cheaper to perform one wildcard query than to perform a whole bunch of single-instance queries.
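That said, PerformanceCounterCategory.ReadCategory() in System.Diagnostics fetches every counter for every instance of a category in one call, which may be the closest .NET analogue to a wildcard query (a sketch, untested in your scenario):
using System;
using System.Diagnostics;

// One call retrieves a snapshot for all instances in the category,
// similar to a PDH wildcard query.
var category = new PerformanceCounterCategory("Process");
InstanceDataCollectionCollection snapshot = category.ReadCategory();
InstanceDataCollection privateBytes = snapshot["Private Bytes"];
foreach (InstanceData instance in privateBytes.Values)
    Console.WriteLine("{0}: {1}", instance.InstanceName, instance.RawValue);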
Sorry for the long response, but I've only just found your question. Anyway, if anyone needs additional help, I have a solution:
I've done a little research on my custom process and I've understood that when we have a code snippet like
PerformanceCounter ourPC = new PerformanceCounter("Process", "% Processor time", "processname", true);
ourPC.NextValue();
Then our performance counter's NextValue() will show the value (number of logical cores × the process's CPU load as shown in Task Manager), which is kind of logical, I suppose.
So your problem may be that you see only a slight CPU load in Task Manager, while the performance counter computes its value by the formula above, multiplying by the core count.
I see one (somewhat hacky) possible solution for your problem, so your code could be rewritten like this:
private void CallNextValue()
{
    foreach (var c in m_counters)
    {
        var x = c.NextValue() / Environment.ProcessorCount;
    }
}
Anyway, I do not recommend using Environment.ProcessorCount, although I've used it here: I just didn't want to add too much code to my short snippet.
You can see a good way to find out how many logical cores (yes, if you have a Core i7, for example, you'll have to count logical cores, not physical) you have in a system if you follow this link:
How to find the Number of CPU Cores via .NET/C#?
Good luck!
We have a Web Service using WebApi 2, .NET 4.5 on Server 2012. We were seeing occasional latency increases of 10-30ms for no good reason. We were able to track the problematic piece of code down to the LOH and GC.
There is some text which we convert to its UTF8 byte representation (actually, the serialization library we use does that). As long as the text is shorter than 85000 bytes, latency is stable and short: ~0.2 ms on average and at 99%. As soon as the 85000 boundary is crossed, average latency increases to ~1ms while the 99% jumps to 16-20ms. Profiler shows that most of the time is spent in GC. To be certain, if I put GC.Collect between iterations, the measured latency goes back to 0.2ms.
I have two questions:
1. Where does the latency come from? As far as I understand, the LOH isn't compacted. The SOH is compacted, but doesn't show the latency.
2. Is there a practical way to work around this? Note that I can't control the size of the data and make it smaller.
public void PerfTestMeasureGetBytes()
{
    var text = File.ReadAllText(@"C:\Temp\ContactsModelsInferences.txt");
    var smallText = text.Substring(0, 85000 + 100);
    int count = 1000;
    List<double> latencies = new List<double>(count);
    for (int i = 0; i < count; i++)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        var bytes = Encoding.UTF8.GetBytes(smallText);
        sw.Stop();
        latencies.Add(sw.Elapsed.TotalMilliseconds);
        //GC.Collect(2, GCCollectionMode.Default, true);
    }
    latencies.Sort();
    Console.WriteLine("Average: {0}", latencies.Average());
    Console.WriteLine("99%: {0}", latencies[(int)(latencies.Count * 0.99)]);
}
The performance problems usually come from two areas: allocation and fragmentation.
Allocation
The runtime guarantees clean (zeroed) memory, so it spends cycles cleaning it. When you allocate a large object, that's a lot of memory to clean, and it starts to add milliseconds to a single allocation (when, let's be honest, simple allocation in .NET is actually very fast, so we usually never care about this).
Fragmentation
Fragmentation occurs when LOH objects are allocated and then reclaimed. Until recently, the GC could not reorganise the memory to remove these old object "gaps", and thus could only fit the next object into a gap if it was the same size or smaller. Recently, the GC has gained the ability to compact the LOH, which removes this issue, but it costs time during compaction.
My guess in your case is you are suffering from both issues and triggering GC runs, but it depends on how often your code is attempting to allocate items in the LOH. If you are doing lots of allocations, try the object pooling route. If you cannot control a pool effectively (lumpy object lifetimes or disparate usage patterns), try chunking the data you are working against to avoid it completely.
Your Options
I've encountered two approaches to the LOH:
Avoid it.
Use it, but realise you are using it and manage it explicitly.
Avoid it
This involves chunking your large object (usually an array of some sort) into, well, chunks that each fall under the LOH threshold. We do this when serialising large object streams. It works well, but a real implementation would be specific to your environment, so here is only a rough sketch of the idea.
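Assuming a simple append-only byte buffer (names and sizes are illustrative, not a real API):
using System;
using System.Collections.Generic;

// Keep each array under the 85,000-byte LOH threshold so the data
// stays in the small object heap.
class ChunkedBuffer
{
    private const int ChunkSize = 80 * 1024; // safely below the 85,000-byte threshold
    private readonly List<byte[]> _chunks = new List<byte[]>();
    private long _length;

    public void Write(byte[] source, int offset, int count)
    {
        while (count > 0)
        {
            int posInChunk = (int)(_length % ChunkSize);
            if (posInChunk == 0)
                _chunks.Add(new byte[ChunkSize]); // a new small-heap chunk

            int toCopy = Math.Min(count, ChunkSize - posInChunk);
            Buffer.BlockCopy(source, offset, _chunks[_chunks.Count - 1], posInChunk, toCopy);
            _length += toCopy;
            offset += toCopy;
            count -= toCopy;
        }
    }
}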
Use it
A simple way to tackle both allocation and fragmentation is long-lived objects. Explicitly make an empty array (or arrays) of a large size to accommodate your large object, and don't get rid of it (or them). Leave it around and re-use it like an object pool. You pay for this allocation once, and can do so either on first use or during application idle time; afterwards you pay nothing for re-allocation (because you aren't re-allocating) and lessen fragmentation issues because you aren't constantly asking to allocate stuff and you aren't reclaiming items (which causes the gaps in the first place).
That said, a halfway house may be in order. Reserve a section of memory up-front for an object pool. Done early, these allocations should be contiguous in memory, so you won't get any gaps, and leave the tail end of the available memory for uncontrolled items. Do beware, though, that this obviously has an impact on the working set of your application - an object pool takes space regardless of whether it is used or not.
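A minimal sketch of such a pool (the class and its sizing are illustrative, not a real API):
using System.Collections.Concurrent;

// Rent/return long-lived large arrays instead of allocating fresh ones each time.
class BufferPool
{
    private readonly ConcurrentBag<byte[]> _pool = new ConcurrentBag<byte[]>();
    private readonly int _bufferSize;

    public BufferPool(int bufferSize, int initialCount)
    {
        _bufferSize = bufferSize;
        for (int i = 0; i < initialCount; i++)
            _pool.Add(new byte[bufferSize]); // pay the allocation cost up front
    }

    public byte[] Rent()
    {
        byte[] buffer;
        return _pool.TryTake(out buffer) ? buffer : new byte[_bufferSize];
    }

    public void Return(byte[] buffer)
    {
        if (buffer != null && buffer.Length == _bufferSize)
            _pool.Add(buffer);
    }
}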
Resources
The LOH is covered a lot out on the web, but pay attention to the date of the resource. In the latest .NET versions the LOH has received some love and has improved. That said, if you are on an older version, I think the resources on the net are fairly accurate, as the LOH never really received any serious updates in the long time between its inception and .NET 4.5 (ish).
For example, there is this article from 2008 http://msdn.microsoft.com/en-us/magazine/cc534993.aspx
And a summary of improvements in .NET 4.5: http://blogs.msdn.com/b/dotnet/archive/2011/10/04/large-object-heap-improvements-in-net-4-5.aspx
In addition to the advice below, make sure that you're using the server garbage collector. That doesn't affect how the LOH is used, but my experience is that it does significantly reduce the amount of time spent in GC.
The best work around I found for avoiding large object heap problems is to create a persistent buffer and re-use it. So rather than allocating a new byte array with every call to Encoding.GetBytes, pass the byte array to the method.
In this case, use the GetBytes overload that takes a byte array. Allocate an array that's large enough to hold the bytes for your longest expected string, and keep it around. For example:
// allocate buffer at class scope
private byte[] _theBuffer = new byte[1024*1024];

public void PerfTestMeasureGetBytes()
{
    // ...
    for (...)
    {
        var sw = Stopwatch.StartNew();
        var numberOfBytes = Encoding.UTF8.GetBytes(smallText, 0, smallText.Length, _theBuffer, 0);
        sw.Stop();
        // ...
    }
}
The only problem here is that you have to make sure your buffer is large enough to hold the largest string. What I've done in the past is to allocate the buffer to the largest size I expect, but then check to make sure it's large enough whenever I go to use it. If it's not large enough, then re-allocate it. How you do that depends on how rigorous you want to be. When working with primarily Western European text, I'd just double the string length. For example:
string textToConvert = ...
if (_theBuffer.Length < 2*textToConvert.Length)
{
    // reallocate the buffer
    _theBuffer = new byte[2*textToConvert.Length];
}
Another way to do it is to just try GetBytes, reallocate on failure, and then retry. For example:
bool good = false;
while (!good)
{
    try
    {
        numberOfBytes = Encoding.UTF8.GetBytes(theString, ....);
        good = true;
    }
    catch (ArgumentException)
    {
        // buffer isn't big enough. Find out how much I really need
        var bytesNeeded = Encoding.UTF8.GetByteCount(theString);
        // and reallocate the buffer
        _theBuffer = new byte[bytesNeeded];
    }
}
If you make the buffer's initial size large enough to accommodate the largest string you expect, then you probably won't get that exception very often. Which means that the number of times you have to reallocate the buffer will be very small. You could, of course, add some padding to the bytesNeeded so that you allocate more, in case you have some other outliers.
I have a giant data set in a c# windows service that uses about 12GB of ram.
Dictionary<DateTime,List<List<Item>>>
There is a constant stream of new data being added, about 1GB per hour. Old data is occasionally removed. This is a high speed buffer for web pages.
I have a parameter in the config file called "MaxSizeMB". I would like to allow the user to enter, say "11000", and my app will delete some old data every time the app exceeds 11GB of ram usage.
This has proved to be frustratingly difficult.
You would think that you could just call GC.GetTotalMemory(false). That would give you the memory usage of .NET managed objects (let's pretend it says 10.8GB). Then you just add a constant 200MB as a safety net for all the other stuff allocated in the app.
This doesn't work. In fact, the more data that is loaded, the bigger the difference between GC.GetTotalMemory and Task Manager. I even tried to work out a constant multiplier value instead of a constant additive value, but I cannot get consistent results. The best I have done so far is to count the total number of items in the data structure, multiply by 96, and pretend that number is the RAM usage. This is also confusing because the Item object is a 32-byte struct. This pretend RAM usage is also too unstable. Sometimes the app will delete old data at 11GB, but sometimes it will delete data at 8GB of actual usage, because my pretend number falsely computes 11GB.
So I can either use this conservative fake RAM calculation and often not use all the RAM I am allowed to use (losing 2GB or so), or I can use GC.GetTotalMemory and the customer will freak out that the app occasionally goes over the RAM setting.
Is there any way I can use as much RAM as possible without going over a limit, as it appears in Task Manager? I don't care whether the math is a multiplier, a constant additive value, a power, whatever. I want to stuff data into a data structure and delete data when I hit the max setting.
Note: I already use some memory-shrinking techniques, such as making Item a struct, setting list.Capacity = list.Count, and calling GC.Collect(GC.MaxGeneration). Those seem like a separate issue, though.
Use System.Diagnostics.PerformanceCounter to monitor your current process's memory usage and the machine's available memory; based on those, your application can decide whether to delete something or not.
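For example (a sketch: "Process"/"Private Bytes" and "Memory"/"Available MBytes" are the standard counter names; maxSizeMB stands in for your config value):
using System.Diagnostics;

var processName = Process.GetCurrentProcess().ProcessName;
var privateBytes = new PerformanceCounter("Process", "Private Bytes", processName, true);
var availableMB = new PerformanceCounter("Memory", "Available MBytes", true);

long maxSizeMB = 11000; // hypothetical value from the config file
bool overBudget = privateBytes.NextValue() > maxSizeMB * 1024f * 1024f
               || availableMB.NextValue() < 500f; // keep some headroom for the OS
if (overBudget)
{
    // delete some old data
}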
There are several problems here: garbage collection, getting a good measure of memory, and what the maximum actually is.
You assume there is a hard maximum. But an object needs contiguous memory, so that is really a soft maximum.
As for an accurate size measure, you could record the size of each list and keep a running total. Then, when you purge, read the size and subtract it from the running total.
Why fight .NET memory limitations and physical memory limitations? I would just go with a database on an SSD. If the data is read-only and you have known classes, you could use something like RavenDB.
OK, so I am not getting very far with managing a .NET memory limitation that you are never going to tame. Still: reconsider your design.
If your key is a DateTime and you only need, say, 24 hours of data, use one dictionary per hour; each hour's data is then just one object. At the end of the window, drop the oldest dictionary and let the GC collect the whole thing.
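A sketch of that idea (HourlyBuffer is illustrative; TItem stands in for the poster's Item struct, and the budget check is whatever measure the app settles on):
using System;
using System.Collections.Generic;
using System.Linq;

// Bucket items by hour so a whole hour can be dropped as one unit.
class HourlyBuffer<TItem>
{
    private readonly SortedDictionary<DateTime, List<TItem>> _hours =
        new SortedDictionary<DateTime, List<TItem>>();

    public void Add(DateTime timestamp, TItem item)
    {
        var hour = new DateTime(timestamp.Year, timestamp.Month, timestamp.Day, timestamp.Hour, 0, 0);
        List<TItem> bucket;
        if (!_hours.TryGetValue(hour, out bucket))
            _hours[hour] = bucket = new List<TItem>();
        bucket.Add(item);
    }

    // Drop whole hours, oldest first, until the caller says we're under budget.
    public void TrimWhile(Func<bool> overBudget)
    {
        while (overBudget() && _hours.Count > 0)
            _hours.Remove(_hours.Keys.First());
    }
}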
The answer is super simple.
var n0 = System.Diagnostics.Process.GetCurrentProcess().PrivateMemorySize64;
var n1 = System.Diagnostics.Process.GetCurrentProcess().WorkingSet64;
var n2 = System.Diagnostics.Process.GetCurrentProcess().VirtualMemorySize64;
float f0 = ((float)n0)/(1000*1000);
float f1 = ((float)n1)/(1000*1000);
float f2 = ((float)n2)/(1000*1000);
Console.WriteLine("private = " + f0 + " MB");
Console.WriteLine("working = " + f1 + " MB");
Console.WriteLine("virtual = " + f2 + " MB");
results:
private = 931.9096 MB
working = 722.0756 MB
virtual = 1767.146 MB
All this moaning and fussing about Task Manager and .NET object sizes, and the answer is built into .NET in one line of code per value.
I gave the answer to Sarvesh because he got me started down the right path with PerformanceCounter, but GetCurrentProcess() turned out to be a nice shortcut to simply inspect your own process.
While talking to a colleague about a particular group of apps using up nearly 1.5GB of memory on startup, he pointed me to a very good link on .NET production debugging.
The part that has me puzzled is ...
For example, if you allocate 1 MB of memory to a single block, the large object heap expands to 1 MB in size. When you free this object, the large object heap does not decommit the virtual memory, so the heap stays at 1 MB in size. If you allocate another 500-KB block later, the new block is allocated within the 1 MB block of memory belonging to the large object heap. During the process lifetime, the large object heap always grows to hold all the large block allocations currently referenced, but never shrinks when objects are released, even if a garbage collection occurs.
Now let's say we have a fictional app that creates a flurry of large objects (> 85KB), so the large object heap grows, let's say, to 200MB. Now let's say we have 10 such app instances running, so 2000MB is allocated. Is this memory really never given back to the OS until the processes shut down (which is what I understood)?
Are there any gaps in my understanding? How do we get back the unused memory in the various LOHs, so that we don't create the perfect storm of OutOfMemoryExceptions?
Update: Following Marc's response, I wanted to clarify that the LOH objects are not referenced - the large objects are use-n-throw - yet the heap doesn't shrink even though it is relatively empty after the initial surge.
Update #2: Just including a code snippet (exaggerated, but it gets the point across, I think). I see an OutOfMemoryException around the time the virtual memory hits the 1.5GB mark on my machine (1.7GB on another). From Eric L.'s blog post, "process memory can be visualized as a massive file on disk" - this result is thus unexpected. The machines in this instance had GBs of free space on the HDD. Does the PageFile.sys OS file (or related settings) impose any restrictions?
static float _megaBytes;
static readonly int BYTES_IN_MB = 1024*1024;

static void BigBite()
{
    try
    {
        var list = new List<byte[]>();
        int i = 1;
        for (int x = 0; x < 1500; x++)
        {
            var memory = new byte[BYTES_IN_MB + i];
            _megaBytes += memory.Length / BYTES_IN_MB;
            list.Add(memory);
            Console.WriteLine("Allocation #{0} : {1}MB now", i++, _megaBytes);
        }
    }
    catch (Exception e)
    {
        Console.WriteLine("Boom! {0}", e); // I put a breakpoint here to check the console
        throw;
    }
}

static void Main(string[] args)
{
    BigBite();
    Console.WriteLine("Check VM now!");
    Console.ReadLine();

    _megaBytes = 0;
    ThreadPool.QueueUserWorkItem(delegate { BigBite(); });
    ThreadPool.QueueUserWorkItem(delegate { BigBite(); });
    Console.ReadLine(); // will blow before it reaches here
}
There is a clarification I would like to make first. Assuming you are running the app as a 32-bit process, the VA space available to your process is only 2GB (3GB if you enable the large-address-aware switch), so even a HUGE page file doesn't help a 32-bit process; it only matters if you run 64-bit, where you have a huge address space.
Objects with size > 85000 bytes are allocated on the LOH. Note that it is 85000 bytes, not 85K, and it is also an implementation detail that could change.
Now, back to your question.
The GC will decommit the LOH segments that are not used in 2 situations:
1. When the memory pressure on the machine is high (~95-98%).
2. When it fails to satisfy a new allocation request; it will then decommit the unused pages in the LOH.
So you will get the memory back in one of these cases.
The fact that you are hitting an OOM before reaching the 2GB limit could mean you have VA fragmentation. VA fragmentation occurs when you don't have a contiguous VA address range to satisfy a new allocation; for example, you ask for an 8KB segment and you don't have 2 consecutive free pages in your VA (assuming a page size of 4K).
You can use the !vamap debugger extension in Debugging Tools for Windows to validate this.
Hope this helps
Thanks
If the LOH wants to keep memory, that is up to the LOH - however, don't forget that OutOfMemoryException is per-process, since really the hard disk is the limiting factor for virtual memory. Eric Lippert blogged about this recently. Of course, that doesn't prevent it getting poor performance from all the paging...
Well, if you really have this kind of allocation pattern, you could move your large objects into another appdomain - when you decide to free all of the large objects, unload the appdomain and the heap for that appdomain will be released.
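A sketch of the mechanics (LargeObjectWorker is a hypothetical MarshalByRefObject that performs the LOH-heavy work):
using System;

var domain = AppDomain.CreateDomain("LohScratch");
try
{
    var worker = (LargeObjectWorker)domain.CreateInstanceAndUnwrap(
        typeof(LargeObjectWorker).Assembly.FullName,
        typeof(LargeObjectWorker).FullName);
    worker.DoWork(); // all large allocations land in the other domain's LOH
}
finally
{
    AppDomain.Unload(domain); // releases that domain's heaps, LOH included
}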