Why does .NET reserve so much memory for my application?


When I run my application under a profiler, I see that it uses about 80MB of memory (total committed bytes, performance counter). But when I look at the size of the allocated memory, it is over 400MB!
So my question is, why is .NET reserving so much memory for my application? Is this normal?

You should read Memory Mystery. I had similar questions a while ago and stopped asking myself after reading it.
I read other sources too, but I can't find them now; try searching for the keywords "unreasonable allocation of memory windows OS". In a nutshell, the OS gives your app more than it requires, depending on the physically available memory resources.
For example, if you run your app on two machines with different amounts of RAM, you should expect the two machines to show different memory allocations.

As you no doubt know, there is a massive difference between actual memory used and allocated. An application's allocated memory doesn't mean that it's actually being used anywhere; all it really means is that the OS has 'marked' a zone of virtual memory (which is exactly that - virtual) ready for use by the application.
The memory isn't necessarily being used or starving other processes - it just could if the app starts to fill it.
This allocated number, also, will likely scale based on the overall memory ecosystem of the machine. If there's plenty of room when an app starts up, then it'll likely grab a larger allocation than if there's less.
That principle is the same as the one which says it's good practice to create a List<T>, say, with a reasonable initial capacity, so that a decent number of items can be added before resizing needs to take place. The OS takes the same approach with memory usage.

"Reserved" memory is by no means the same as "allocated" RAM. Read the posts Steve and Krishna linked to.
The part your client needs to look at is Private Bytes. But even that isn't exactly a hard number, as parts of your app may be swapped out to the page file.
In short, unless your Private Bytes figure is pretty well out of control OR you have leaks (i.e. undisposed unmanaged resources), you (and your client) should ignore this and let the OS manage what is allocated, what's in physical RAM and what's swapped out to disk.

It's fairly common for software to issue one large memory request to the underlying operating system, then internally manage its own use of the allocated memory block. So common, in fact, that the memory manager in Windows (and in other operating systems) explicitly supports the concept, called reserved or "uncommitted" memory: address space that the process has requested but hasn't made use of yet. That memory really doesn't exist, as far as bits taking up space on your DRAM chips go, until the process actually makes use of it. The preallocation of memory effectively costs nothing.
Applications do this for many reasons, primarily performance. An application with knowledge of its own memory usage patterns can optimize its allocator for that pattern. Address locality matters too: successive memory requests from the OS won't always be 'next' to each other in memory, which can affect the performance of the CPU cache and could even preclude some optimizations.
.NET in particular reserves space for the managed heap ahead of time, for both of the reasons listed above. In most cases, allocating memory on the managed heap merely involves incrementing a top-of-heap pointer, which is incredibly fast. That is not possible with the standard memory allocator, which has a more general design so that it performs acceptably in a fragmented heap, whereas the CLR's GC uses memory compaction to sharply limit fragmentation of the managed heap. Nor is it possible if the managed heap itself is fragmented across the process address space because it was built up from multiple allocations at different points in time.

Related

C# Excessive Garbage Collection - Large Strings, G2 pressure?

I'm writing a high-ish volume web service in C# running in 64-bit IIS on Win 2k8 (.NET 4.5) that works with XML payloads and does a variety of operations on small and large objects (where the large objects are mainly strings, some over 85k (so going onto the LOH)). Requests are stateless, and memory usage remains steady over time. Lots of memory is being allocated and released per request, no memory appears to be being leaked.
Operating at a maximum of 25 transactions per second, with an average call lasting 5s, it's spending 40-60% of its time in GC according to two profiling tools, and perfmon shows a steady 20 G0 and G1 collections over 5 seconds, and 15 G2 collections over 5 seconds, meaning lots of (we think) premature promotion into G2 for data that we'd expect to stay in G0. Everything I read indicates this is very excessive. We expect that the system should be able to perform at a higher throughput than 25 tps and assume the GC activity is preventing this.
The machines serving the requests have lots of memory (16GB), and the application consumes at most 1GB after an hour under load. I understand that a bigger heap won't necessarily make things better, but there is spare memory.
I appreciate this is light on specifics (I will try to recreate the conditions with a trivial application if time permits), but can anyone explain why we see so much G2 GC activity? Should I be focusing on the LOH? People keep telling me that the CLR's GC "adapts" to your load, but it's not changing its behavior in this case and, unlike other runtimes, there seems to be little I can do to tune it (I have tried workstation GC, but there is very little observable difference).

Microsoft decided to design the String class so that all strings are stored in memory as a monolithic sequence of characters. While this works well for some usage patterns, it works dreadfully for others.
One thing I've found very helpful is to avoid creating instances of String whenever possible. If a method will often be used to operate on part of a supplied string, and will in turn ask other methods to operate on parts of it, the methods should accept arguments specifying the range of the String upon which they should operate. This avoids the need for callers of the first method to use Substring to construct a new String for the method to act upon, and avoids the need for the method itself to call Substring to feed portions of the string to the methods it calls. In some cases where I have used this technique, the creation of thousands of String instances, some quite large, could be replaced with zero.
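To make the range-passing idea above concrete, here is a minimal sketch. The class and method names (RangeText, CountChar) are made up for illustration and are not from the original answer; the point is only that the callee receives a start index and a length instead of a freshly allocated substring.

```csharp
using System;

public static class RangeText
{
    // Counts occurrences of 'target' inside text[start .. start + length)
    // without allocating a new string for that range.
    public static int CountChar(string text, int start, int length, char target)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (start < 0 || length < 0 || start > text.Length - length)
            throw new ArgumentOutOfRangeException();

        int count = 0;
        int end = start + length;
        for (int i = start; i < end; i++)
        {
            if (text[i] == target) count++;
        }
        return count;
    }
}
```

A caller that would otherwise write something like payload.Substring(0, 5) and pass the result on can call RangeText.CountChar(payload, 0, 5, ',') instead, so no intermediate String is created for the range.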

CLR's GC "adapts" to your load
It can't know how much memory you are willing to tolerate as overhead. Here, you probably want to give the app something like 5GB of heap so that collections are much rarer. The GC has no built-in tuning knobs for that (subjective note: that's a pity).
You can force bigger heap sizes by using one of the low latency modes for short durations. That should cause the GC to try hard to avoid G2 collections. Monitor the RAM usage and disable low latency mode when consumption reaches 5GB.
This is a risky strategy, but it's the best I think you can do.
Personally, I would not do it: at most you can gain 2x throughput. Your CPU is maxed out, right? Workstation GC does not scale to multiple cores and leaves CPUs unused.
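For reference, the temporary low-latency approach described in this answer could be sketched roughly as follows. This assumes .NET 4.5 or later, where GCLatencyMode.SustainedLowLatency exists; the wrapper name LowLatencyScope is hypothetical, and the runtime may ignore the requested mode under some GC configurations.

```csharp
using System;
using System.Runtime;

public static class LowLatencyScope
{
    // Runs 'work' while asking the GC to avoid blocking gen 2 collections,
    // then restores whatever latency mode was active before.
    public static T Run<T>(Func<T> work)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));

        GCLatencyMode previous = GCSettings.LatencyMode;
        try
        {
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
            return work();
        }
        finally
        {
            // Always switch back, even if 'work' throws.
            GCSettings.LatencyMode = previous;
        }
    }
}
```

The answer's suggestion to watch RAM usage still applies: a caller would need to check something like GC.GetTotalMemory(false) and leave the scope once the heap approaches its chosen limit.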

Weird out of memory exceptions [duplicate]

If your application has to do a lot of allocation and de-allocation of large objects (>85000 bytes), it will eventually cause memory fragmentation and your application will throw an out of memory exception.
Is there any solution to this problem or is it a limitation of CLR memory management?

Unfortunately, all the info I've ever seen only suggests managing risk factors yourself: reuse large objects, allocate them at the beginning, make sure they're of sizes that are multiples of each other, use alternative data structures (lists, trees) instead of arrays. That just gave me another idea: creating a non-fragmenting List that, instead of using one large array, splits its storage into smaller ones. Arrays / Lists seem to be the most frequent culprits IME.
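The "non-fragmenting List" idea above could look something like the following sketch. ChunkedList and its default chunk size are assumptions made for illustration; the chunk size has to be chosen so that chunkSize multiplied by the element size stays well under the 85,000-byte large object threshold.

```csharp
using System;
using System.Collections.Generic;

// Append-only list that stores its items in fixed-size chunks, so no single
// backing array has to grow large enough to land on the large object heap.
public sealed class ChunkedList<T>
{
    private readonly int _chunkSize;
    private readonly List<T[]> _chunks = new List<T[]>();
    private int _count;

    public ChunkedList(int chunkSize = 4096)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        _chunkSize = chunkSize;
    }

    public int Count { get { return _count; } }

    public void Add(T item)
    {
        int offset = _count % _chunkSize;
        // Every time the previous chunk is full (or the list is empty),
        // allocate one more small chunk instead of resizing a big array.
        if (offset == 0) _chunks.Add(new T[_chunkSize]);
        _chunks[_count / _chunkSize][offset] = item;
        _count++;
    }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= _count) throw new ArgumentOutOfRangeException(nameof(index));
            return _chunks[index / _chunkSize][index % _chunkSize];
        }
        set
        {
            if (index < 0 || index >= _count) throw new ArgumentOutOfRangeException(nameof(index));
            _chunks[index / _chunkSize][index % _chunkSize] = value;
        }
    }
}
```

Indexing costs one extra division and lookup compared with a plain array, which is the price paid for never allocating one large contiguous block.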
Here's an MSDN magazine article about it:
http://msdn.microsoft.com/en-us/magazine/cc534993.aspx, but there isn't that much useful in it.

The thing about large objects in the CLR's Garbage Collector is that they are managed in a different heap.
The garbage collector uses a mechanism called "compacting", which is basically defragmentation: live objects in the regular heap are moved next to each other and the references to them are re-linked.
The thing is, since "compacting" large objects (copying and re-linking them) is an expensive procedure, the GC provides a different heap for them, which is not compacted by default.
Note also that memory allocation is contiguous. Meaning if you allocate Object #1 and then Object #2, Object #2 will always be placed after Object #1.
This is probably what's causing you to get OutOfMemoryExceptions.
I would suggest having a look at design patterns like Flyweight, Lazy Initialization and Object Pool.
You could also force GC collection, if you're suspecting that some of those large objects are already dead and have not been collected due to flaws in your flow of control, causing them to reach higher generations just before being ready for collection.

A program always bombs on OOM because it is asking for a chunk of memory that's too large, never because it completely exhausted all virtual memory address space. You could argue that's a problem with the LOH getting fragmented; it is just as easy to argue that the program is using too much virtual memory.
Once a program goes beyond allocating half the addressable virtual memory (a gigabyte), it is really time to either make its code smarter so it doesn't gobble so much memory, or make a 64-bit operating system a prerequisite. The latter is always cheaper. It doesn't come out of your pocket either.

Is there any solution to this problem or is it a limitation of CLR memory management?
There is no solution besides reconsidering your design. And it is not a problem of the CLR. Note that the problem is the same for unmanaged applications. It arises because the application uses too much memory at the same time, in segments that are laid out unfavourably in memory. If some external culprit has to be pointed at nevertheless, I would rather point at the OS memory manager, which (of course) does not compact its VM address space.
The CLR manages free regions of the LOH in a free list. In most cases this is the best that can be done against fragmentation. But for really large objects, the number of objects per LOH segment decreases, and we eventually end up with only one object per segment. Where those objects are positioned in the VM space is completely up to the memory manager of the OS. This means the fragmentation mostly happens at the OS level, not in the CLR. This is an often overlooked aspect of heap fragmentation, and .NET is not to blame for it. (But it is also true that fragmentation can occur on the managed side, as nicely demonstrated in that article.)
Common solutions have been named already: reuse your large objects. So far I have not been confronted with any situation where this could not be done by proper design. However, it can be tricky sometimes and may therefore be expensive.

We were processing images in multiple threads. With images being large enough, this also caused OutOfMemory exceptions due to memory fragmentation. We tried to solve the problem by using unsafe memory and pre-allocating a heap for every thread. Unfortunately, this didn't help completely since we relied on several libraries: we were able to solve the problem in our code, but not in the 3rd party code.
Eventually we replaced threads with processes and let the operating system do the hard work. Operating systems built a solution for memory fragmentation long ago, so it's unwise to ignore it.

I have seen in a different answer that the LOH can shrink in size:
Large Arrays, and LOH Fragmentation. What is the accepted convention?
"
...
Now, having said that, the LOH can shrink in size if the area at its end is completely free of live objects, so the only problem is if you leave objects in there for a long time (e.g. the duration of the application).
...
"
Other than that, you can let a 32-bit program use extended memory: up to 3GB on a 32-bit system and up to 4GB on a 64-bit system.
Just add the flag /LARGEADDRESSAWARE in your linker or this post build event:
call "$(DevEnvDir)..\tools\vsvars32.bat"
editbin /LARGEADDRESSAWARE "$(TargetPath)"
In the end, if you plan to run the program for a long time with lots of large objects, you will have to optimize its memory usage, and you might even have to reuse allocated objects to keep the garbage collector out of the picture, which is similar in concept to working with real-time systems.

CLR / High memory consumption after switching from 32-bit process to 64-bit process

I have a backend application (windows service) built on top of .NET Framework 4.5 (C#). The application runs on Windows Server 2008 R2 server, with 64GB of memory.
Due to dependencies I had, I used to compile and run this application as a 32-bit process (compile it as x86) and use /LARGEADDRESSAWARE flag to let the application use more than 2GB memory in the user space. Using this configuration, the average memory consumption (according to the "memory (private working set)" column in the task manager) was about 300-400MB.
The reason I needed the LARGEADDRESSAWARE flag, and the reason I changed to 64-bit, is that although 300-400MB is the average, once in a while this app does things that involve loading a lot of data into memory (and it's much easier to develop and manage this kind of thing when you're not very limited memory-wise).
Recently (after removing those x86 native dependencies), I changed the application compilation to "Any CPU", so now, on the production server, it runs as a 64-bit process. Starting when I did this change, the average memory consumption (according to the task manager) got to new levels: 3-4 GB, when there is no other change that may explain this change in behavior.
Here are some additional facts about the current state:
According to the "#Bytes in all heaps" counter, the total amount of memory is about 600MB.
When debugging the process with WinDbg+SOS, !dumpheap -stat showed that there are about 250-300MB free, but all the other objects together were much less than the total amount of memory the process used.
According to the GC performance counters, there are Gen0 collections on a regular basis. In fact, the "% Time in GC" counter indicates that on average 10-20% of the time is spent in GC (which makes sense given the nature of the application: a lot of allocations of information and data structures that are in use for a short time).
I'm using Server GC in this app.
There is no memory problem on the server. It uses about 50-60% of the available memory (64GB).
My questions:
Why is there such a great difference between the memory allocated to the process (according to the task manager) and the actual size of the CLR heap (there is no unmanaged code in the process that can explain this)?
Why does the 64-bit process take more memory than the same process running as a 32-bit process? Even considering that pointers take twice the size, there's a big difference.
Can I do something to lower the memory consumption, or to get a better understanding of the issue?
Thanks!

There are a few things to consider:
1) You mentioned you're using Server GC mode. In server GC mode, the CLR creates one heap for every CPU core on the machine, which is more efficient for multi-threaded processing in server processes, e.g. ASP.NET processes. Each heap has two segments: one for small objects, one for large objects. Each segment starts with a large block of reserved memory, up to 4 GB (the exact size depends on the number of logical CPUs). Basically, server GC mode tries to use more memory on the system in exchange for overall system performance.
2) Pointers are bigger on 64-bit, of course.
3) A foreground Gen2 GC becomes super expensive in server GC mode because the heap is much larger. So the CLR tries super hard to reduce the number of foreground Gen2 GCs, sometimes using background Gen2 GCs instead.
4) Depending on usage, fragmentation can become a real issue. I've seen heaps with 98% fragmentation (98% of the heap is free blocks).
To really solve your problem, you need to get an ETW trace + a memory dump, and then use tools like PerfView for detailed analysis.

A 64-bit process will naturally use 64-bit pointers, effectively doubling the memory usage of every reference. Certain platform-dependent types such as IntPtr will also take up double the space.
The first and best thing you can do is to run a memory profiler to see where exactly the extra memory footprint is coming from. Anything else is speculative!

C# memory usage

How can I get the actual memory used in my C# application?
Task Manager shows different metrics.
Process Explorer shows increased usage of private bytes.
Performance counters (perfmon.msc) show different metrics.
When I used .NET memory profiler, it showed that most of the memory is garbage collected and there are only a few live bytes.
I do not know which to believe.

Memory usage is somewhat more complicated than displaying a single number or two. I suggest you take a look at Mark Russinovich's excellent post on the different kinds of counters in Windows.
.NET only complicates matters further. A .NET process is just another Windows process, so obviously it will have all the regular metrics, but in addition to that the CLR acts as a memory manager for the managed application. So depending on the point of view these numbers will vary.
The CLR effectively allocates and frees virtual memory in big chunks on behalf of the .NET application and then hands out bits of memory to the application as needed. So while your application may use very little memory at a given point in time this memory may or may not have been released to the OS.
On top of that the CLR itself uses memory to load IL, compile IL to native code, store all the type information and so forth. All of this adds to the memory footprint of the process.
If you want to know how much memory your managed application uses for data, the Bytes in all heaps counter is useful. Private bytes may be used as a somewhat rough estimate for the application's memory usage on the process level.
You may also want to check out these related questions:
Reducing memory usage of .NET applications?
How to detect where a Memory Leak is?

If you are using VS 2010 you can use Visual Studio 2010 Profiler.
This tool can create very informative reports for you.

If you want to know approximately how many bytes are allocated on the GC heap (ignoring memory used by the runtime, the JIT compiler, etc.), you can call GC.GetTotalMemory. We've used this when tracking down memory leaks.
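As a rough illustration of how GC.GetTotalMemory compares with the process-level numbers discussed in this thread, the sketch below reads it alongside a few System.Diagnostics.Process properties. The MemoryReport class is made up for this example; the properties and methods it calls are standard .NET APIs.

```csharp
using System;
using System.Diagnostics;

public static class MemoryReport
{
    // Returns a one-line summary of managed heap size versus process-level metrics.
    public static string Describe()
    {
        using (Process process = Process.GetCurrentProcess())
        {
            // Approximate bytes currently allocated on the GC heap;
            // 'false' means: do not force a collection first.
            long managedBytes = GC.GetTotalMemory(false);

            return string.Format(
                "managed heap ~ {0:N0} B, private bytes = {1:N0} B, " +
                "working set = {2:N0} B, virtual size = {3:N0} B",
                managedBytes,
                process.PrivateMemorySize64,
                process.WorkingSet64,
                process.VirtualMemorySize64);
        }
    }
}
```

Logging MemoryReport.Describe() periodically makes it easy to see the gap between what the managed heap holds and what the process has committed or reserved, which is usually the source of the confusion between the tools.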

Download VADump (If you do not have it yet)
Usage: VADUMP.EXE -sop [PID]

Well, what is "actual memory used in my C# application" ?
Thanks to Virtual memory and (several) Memory management layers in Windows and the CLR, this is a rather complicated question.
From the sources you mention the CLR profiler will give you the most detailed breakdown, I would call that the most accurate.
But there is no 'single number' answer; the question of whether Application A uses more or less memory than B can be impossible to answer.
So what do you actually want to know? Do you have a concrete performance problem to solve?

Understanding Memory Performance Counters

[Update - Sep 30, 2010]
Since I studied this and related topics a lot, I'll write down the tips I gathered from my own experience and from the suggestions provided in the answers here:
1) Use a memory profiler (try CLR Profiler, to start with), find the routines which consume the most memory and fine-tune them: reuse big arrays, and try to keep references to objects to a minimum.
2) If possible, allocate small objects (less than 85k for .NET 2.0) and use memory pools if you can, to avoid high CPU usage by the garbage collector.
3) If you increase references to objects, you're responsible for de-referencing them the same number of times. You'll have peace of mind and the code will probably work better.
4) If nothing works and you are still clueless, use the elimination method (comment out/skip code) to find out what is consuming the most memory.
Using memory performance counters inside your code might also help you.
Hope these help!
[Original question]
Hi!
I'm working in C#, and my issue is out of memory exception.
I read an excellent article on LOH here ->
http://www.simple-talk.com/dotnet/.net-framework/the-dangers-of-the-large-object-heap/
Awesome read!
And,
http://dotnetdebug.net/2005/06/30/perfmon-your-debugging-buddy/
My issue:
I am facing out of memory issue in an enterprise level desktop application. I tried to read and understand stuff about memory profiling and performance counter (tried WinDBG also! - little bit) but am still clueless about basic stuff.
I tried CLR profiler to analyze the memory usage. It was helpful in:
Showing me who allocated huge chunks of memory
What data type used maximum memory
But, both, CLR Profiler and Performance Counters (since they share same data), failed to explain:
The numbers that are collected after each run of the app - how to understand if there is any improvement?!?!
How do I compare the performance data after each run - is lower/higher number of a particular counter good or bad?
What I need:
I am looking for the tips on:
How to free (yes, right) managed data type objects (like arrays, big strings) - but not by making GC.Collect calls, if possible. I have to handle arrays of bytes of length like 500KB (unavoidable size :-( ) every now and then.
If fragmentation occurs, how to compact memory - as it seems that .NET GC is not really effectively doing that and causing OOM.
Also, what exactly is the 85KB limit for the LOH? Is this the size of the object or the overall size of the array? This is not very clear to me.
What memory counters can tell if code changes are actually reducing the chances of OOM?
Tips I already know
Set managed objects to null - mark them garbage - so that garbage collector can collect them. This is strange - after setting a string[] object to null, the # bytes in all Heaps shot up!
Avoid creating objects/arrays > 85KB - this is not in my control. So, there could be lots of LOH.
Memory leak indicators:
# bytes in all Heaps increasing
Gen 2 Heap Size increasing
# GC handles increasing
# of Pinned Objects increasing
# total committed Bytes increasing
# total reserved Bytes increasing
Large Object Heap increasing
My situation:
I have got a 4 GB, 32-bit machine with Win 2K3 server SP2 on it.
I understand that an application can use <= 2 GB of physical RAM
Increasing the Virtual Memory (pagefile) size has no effect in this scenario.
As it's an OOM issue, I am focusing on memory-related counters only.
Please advise! I really need some help as I'm stuck because of the lack of good documentation!

Nayan, here are the answers to your questions, and some additional advice.
You cannot free them, you can only make them easier for the GC to collect. It seems you already know the way: the key is reducing the number of references to the object.
Fragmentation is one more thing which you cannot control. But there are several factors which can influence this:
LOH external fragmentation is less dangerous than Gen2 external fragmentation, 'cause LOH is not compacted. The free slots of LOH can be reused instead.
If the 500KB byte arrays you are referring to are used as I/O buffers (e.g. passed to some socket-based API or unmanaged code), there is a high chance that they will get pinned. A pinned object cannot be moved by the GC during compaction, and pinned objects are one of the most frequent causes of heap fragmentation.
85K is a limit for an object size. But remember, System.Array instance is an object too, so all your 500K byte[] are in LOH.
All counters that are in your post can give a hint about changes in memory consumption, but in your case I would select BIAH (Bytes in all heaps) and LOH size as primary indicators. BIAH shows the total size of all managed heaps (Gen1 + Gen2 + LOH, to be precise, no Gen0 - but who cares about Gen0, right? :) ), and LOH is the heap where all large byte[] are placed.
Advice:
Something that already has been proposed: pre-allocate and pool your buffers.
A different approach, which can be effective if you can use any collection instead of a contiguous array of bytes (this is not the case if the buffers are used in I/O): implement a custom collection which is internally composed of many smaller arrays. This is similar to std::deque from the C++ STL. Since each individual array will be smaller than 85K, the whole collection won't end up in the LOH. The advantage you can get with this approach is the following: the LOH is only collected when a full GC happens. If the byte[] in your application are not long-lived, and (if they were smaller in size) would live in Gen0 or Gen1 before being collected, this would make memory management much easier for the GC, since a Gen2 collection is much more heavyweight.
A word of advice on the testing and monitoring approach: in my experience, GC behavior, memory footprint and other memory-related things need to be monitored for quite a long time to get valid and stable data. So each time you change something in the code, run a long enough test while monitoring the memory performance counters to see the impact of the change.
I would also recommend taking a look at the % Time in GC counter, as it can be a good indicator of the effectiveness of memory management. The larger this value is, the more time your application spends on GC routines instead of processing requests from users or doing other 'useful' operations. I cannot say which absolute values of this counter indicate an issue, but I can share my experience for your reference: for the application I am working on, we usually treat a % Time in GC higher than 20% as an issue.
Also, it would be useful if you shared some values of memory-related perf counters of your application: Private bytes and Working set of the process, BIAH, Total committed bytes, LOH size, Gen0, Gen1, Gen2 size, # of Gen0, Gen1, Gen2 collections, % Time in GC. This would help better understand your issue.

You could try pooling and managing the large objects yourself. For example, if you often need <500k arrays and the number of arrays alive at once is well understood, you could avoid ever deallocating them. That way, if you only need, say, 10 of them at a time, you would incur a fixed 5 MB memory overhead instead of troublesome long-term fragmentation.
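A minimal sketch of such a pool is shown below, assuming all buffers share one fixed size. BufferPool is a hypothetical name for this example; on newer .NET versions the built-in System.Buffers.ArrayPool<T> covers a similar need.

```csharp
using System;
using System.Collections.Concurrent;

// Keeps a fixed-size set of large byte arrays alive and hands them out on
// demand, so the large object heap is not churned by repeated allocations.
public sealed class BufferPool
{
    private readonly ConcurrentBag<byte[]> _free = new ConcurrentBag<byte[]>();
    private readonly int _bufferSize;

    public BufferPool(int bufferSize, int preallocate)
    {
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));
        if (preallocate < 0) throw new ArgumentOutOfRangeException(nameof(preallocate));

        _bufferSize = bufferSize;
        for (int i = 0; i < preallocate; i++)
        {
            _free.Add(new byte[bufferSize]);
        }
    }

    // Returns a pooled buffer, or allocates a new one if the pool is empty.
    // The contents are NOT cleared between uses.
    public byte[] Rent()
    {
        byte[] buffer;
        return _free.TryTake(out buffer) ? buffer : new byte[_bufferSize];
    }

    // Gives a buffer back to the pool so a later Rent() can reuse it.
    public void Return(byte[] buffer)
    {
        if (buffer == null || buffer.Length != _bufferSize)
            throw new ArgumentException("Buffer does not belong to this pool.", nameof(buffer));
        _free.Add(buffer);
    }
}
```

With the numbers from the paragraph above, the pool would be created once as new BufferPool(500 * 1024, 10), and each consumer would wrap its work in a Rent() / Return() pair, ideally with the Return in a finally block.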
As for your three questions:
It's just not possible. Only the garbage collector decides when to finalize managed objects and release their memory. That's part of what makes them managed objects.
This is possible if you manage your own heap in unsafe code and bypass the large object heap entirely. You will end up doing a lot of work and suffering a lot of inconvenience if you go down this road. I doubt that it's worth it for you.
It's the size of the object, not the number of elements in the array.
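One way to observe the threshold from code is to ask the GC which generation a freshly allocated array belongs to. The sketch below relies on the fact that large object heap allocations are reported as generation 2 by GC.GetGeneration; LohProbe is a made-up helper name, and the sizes used are deliberately far from the 85,000-byte boundary to avoid depending on exact object header overhead.

```csharp
using System;

public static class LohProbe
{
    // Allocates a byte[] of the given length and returns the generation
    // the GC reports for it immediately after allocation.
    public static int GenerationOfNewByteArray(int length)
    {
        byte[] buffer = new byte[length];
        return GC.GetGeneration(buffer);
    }
}
```

Calling LohProbe.GenerationOfNewByteArray with a small length such as 1,000 normally reports generation 0, while a length such as 100,000 reports GC.MaxGeneration, because the whole array object is over the limit and is placed on the LOH straight away.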
Remember, fragmentation only happens when objects are freed, not when they're allocated. If fragmentation is indeed your problem, reusing the large objects will help. Focus on creating less garbage (especially large garbage) over the lifetime of the app instead of trying to deal with the nuts and bolts of the gc implementation directly.

Another indicator is watching Private Bytes vs. Bytes in all Heaps. If 'Private Bytes' increases faster than 'Bytes in all Heaps', you have an unmanaged memory leak. If 'Bytes in all Heaps' increases faster than 'Private Bytes', it is a managed leak.
To correct something that @Alexey Nedilko said:
"LOH external fragmentation is less dangerous than Gen2 external fragmentation, 'cause LOH is not compacted. The free slots of LOH can be reused instead."
is absolutely incorrect. Gen2 is compacted, which means there is never free space after a collection. The LOH is NOT compacted (as he correctly mentions) and yes, free slots are reused. BUT if there is no contiguous free space big enough to fit the requested allocation, then the segment size is increased, and it can continue to grow and grow. So you can end up with gaps in the LOH that are never filled. This is a common cause of OOMs and I've seen this in many memory dumps I've analyzed.
Though there is now a way in the GC API (as of .NET 4.5.1) to programmatically compact the LOH, I strongly recommend avoiding it if app performance is a concern. It is extremely expensive to perform this operation at runtime and it can hurt your app performance significantly. The default implementation of the GC was designed to be performant, which is why this step was omitted in the first place. IMO, if you find that you have to call this because of LOH fragmentation, you are doing something wrong in your app, and it can be improved with pooling techniques, splitting arrays, and other memory allocation tricks instead. If this app is an offline app or some batch process where performance isn't a big deal, maybe it's not so bad, but I'd use it sparingly at best.
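For completeness, the .NET 4.5.1 mechanism referred to above is the GCSettings.LargeObjectHeapCompactionMode setting. A minimal sketch of requesting a one-off compaction is shown below; the wrapper class LohCompaction is hypothetical, and the performance warning in the answer applies in full.

```csharp
using System;
using System.Runtime;

public static class LohCompaction
{
    // Requests that the next blocking full collection also compacts the
    // large object heap, then triggers that collection immediately.
    public static void CompactOnce()
    {
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;

        // Blocking gen 2 collection; the LOH is compacted as part of it and
        // the setting then reverts to Default on its own.
        GC.Collect();
    }
}
```

Because the mode resets itself after one compaction, calling this is a deliberate, one-time pause rather than a permanent change in GC behavior, which fits the "sparingly at best" advice.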
A good visual example of how this can happen is here - The Dangers of the Large Object Heap and here Large Object Heap Uncovered - by Maoni (GC Team Lead on the CLR)
