Who is responsible for C# memory allocation?

What part of the .NET Framework is responsible for allocating memory? Is it the GC?

It is the CLR, but in close cooperation with the GC; and since the GC is itself part of the CLR, it's not such a clear division.
Allocation takes place at the start of the free section of the heap; it is a very simple and fast operation. Allocation on the Large Object Heap (LOH) is slightly more complicated.

See http://www.codeproject.com/Articles/38069/Memory-Management-in-NET:
Allocation of Memory
"Generally .NET is hosted using Host process, during debugging .NET
creates a process using VSHost.exe which gives the programmer the
basic debugging facilities of the IDE and also direct managed memory
management of the CLR. After deploying your application, the CLR
creates the process in the name of its executable and allocates memory
directly through Managed Heaps.
When CLR is loaded, generally two managed heaps are allocated; one is
for small objects and other for Large Objects. We generally call it as
SOH (Small Object Heap) and LOH (Large Object Heap). Now when any
process requests for memory, it transfers the request to CLR, it then
assigns memory from these Managed Heaps based on their size.
Generally, SOH is assigned for the memory request when size of the
memory is less than 83 KBs( 85,000 bytes). If it is greater than this,
it allocates memory from LOH. On more and more requests of memory .NET
commits memory in smaller chunks."
Reading further in the article, it is the CLR, with the help of Windows (32-bit or 64-bit), that "allocates" the memory.
The "de-allocation" is managed by the GC.
"The relationships between the Object and the process associated with
that object are maintained through a Graph. When garbage collection is
triggered it deems every object in the graph as garbage and traverses
recursively to all the associated paths of the graph associated with
the object looking for reachable objects. Every time the Garbage
collector reaches an object, it marks the object as reachable. Now
after finishing this task, garbage collector knows which objects are
reachable and which aren’t. The unreachable objects are treated as
Garbage to the garbage collector."

Despite the name, many kinds of modern "garbage collectors" don't actually collect garbage as their primary operation. Instead, they often identify everything in an area of memory that isn't garbage and move it somewhere that is known not to contain anything of value. Unless the area contained an object that was "pinned" and couldn't be moved (in which case things are more complicated) the system will then know that the area of memory from which things were moved contains nothing of value.
In many such collectors, once the last reference to an object has disappeared, no bytes of memory that had been associated with that object will ever again be examined before they get blindly overwritten with new data. If the GC expects the old region of memory to be used next for holding new objects, it will likely zero out all the bytes in one go, rather than doing so piecemeal to satisfy allocations; but if the GC expects the region to be used as a destination for objects copied from elsewhere, it may not bother. While objects are guaranteed to remain in memory as long as any reference exists, once the last reference to an object has ceased to exist there may be no way of knowing whether every byte of memory that had been allocated to that object has actually been overwritten.
While .NET does sometimes have to take affirmative action when certain objects (e.g. those whose type overrides Finalize) are found to have been abandoned, in general I think it's best to think of the "GC" as being not a subsystem that "collects" garbage, but rather as a garbage-collected memory pool's manager, that needs to at all times be kept informed of everything that isn't garbage. While the manager's duties include the performance of GC cycles, they go far beyond that, and I don't think it's useful to separate GC cycles from the other duties.
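To make that "affirmative action" for finalizable objects concrete, here is a minimal sketch (the class name is made up, and it assumes a release build): once the last reference to a finalizable object is gone, nothing happens to it until a collection actually runs and its finalizer is queued.

using System;

class Tracked
{
    ~Tracked()
    {
        Console.WriteLine("Finalizer ran: the GC noticed this object was abandoned.");
    }
}

class Program
{
    static void Main()
    {
        new Tracked();                  // no reference is kept anywhere
        GC.Collect();                   // force a collection just for the demonstration
        GC.WaitForPendingFinalizers();  // give the finalizer thread a chance to run
    }
}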


What mechanisms of the C# language are used in order to pass an instance of an object to the GC.AddMemoryPressure method?
I came across the following code sample in the book CLR via C#:
private sealed class BigNativeResource {
    private readonly Int32 m_size;

    public BigNativeResource(Int32 size) {
        m_size = size;
        // Make the GC think the object is physically bigger
        if (m_size > 0) GC.AddMemoryPressure(m_size);
        Console.WriteLine("BigNativeResource create.");
    }

    ~BigNativeResource() {
        // Make the GC think the object released more memory
        if (m_size > 0) GC.RemoveMemoryPressure(m_size);
        Console.WriteLine("BigNativeResource destroy.");
    }
}
I cannot understand how we associate an instance of an object with the pressure it adds. I do not see an object reference being passed to GC.AddMemoryPressure. Do we associate the added memory pressure (AMP) with an object at all?
Also, I do not see any reason for calling GC.RemoveMemoryPressure(m_size); literally, it should be of no use. Let me explain. There are two possibilities: either there is an association with the object instance, or there is no such association.
In the former case, the GC must know m_size in order to prioritize and decide when to undertake a collection, so it should remove the memory pressure by itself (otherwise, what would it mean for the GC to collect an object while taking the AMP into account?).
In the latter case, it is not clear what the use of adding and removing the AMP is at all. The GC can only work with the roots, which are by definition instances of classes; i.e., the GC can only collect objects. So if there is no association between objects and the AMP, I see no way the AMP could affect the GC (hence I assume there is an association).
I cannot understand how we associate an instance of an object with the pressure it adds.
The instance of the object associates the pressure it adds with a reference to itself by calling AddMemoryPressure. The object already has identity with itself! The code which adds and removes the pressure knows what this is.
I do not see an object reference being passed to GC.AddMemoryPressure.
Correct. There is not necessarily an association between added pressure and any object, and regardless of whether there is or not, the GC does not need to know that information to act appropriately.
Do we associate the added memory pressure (AMP) with an object at all?
The GC does not. If your code does, that's the responsibility of your code.
Also, I do not see any reason for calling GC.RemoveMemoryPressure(m_size)
That's so that the GC knows that the additional pressure has gone away.
I see no way the AMP could affect the GC
It affects the GC by adding pressure!
I think there is a fundamental misunderstanding of what's going on here.
Adding memory pressure is just telling the GC that there are facts about memory allocation that you know, and that the GC does not know, but are relevant to the action of the GC. There is no requirement that the added memory pressure be associated with any instance of any object or tied to the lifetime of any object.
The code you've posted is a common pattern: an object has additional memory associated with each instance, and it adds a corresponding amount of pressure upon allocation of the additional memory and removes it upon deallocation of the additional memory. But there is no requirement that additional pressure be associated with a specific object or objects. If you added a bunch of unmanaged memory allocations in your static void Main() method, you might decide to add memory pressure corresponding to it, but there is no object associated with that additional pressure.
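A small sketch of that last point (the 64 MB figure and RunApplication are made up for illustration): pressure added and removed in Main, with no particular managed object associated with it.

using System;
using System.Runtime.InteropServices;

static class Program
{
    static void Main()
    {
        // 64 MB of unmanaged memory the GC cannot see on its own.
        IntPtr block = Marshal.AllocHGlobal(64 * 1024 * 1024);
        GC.AddMemoryPressure(64 * 1024 * 1024);    // the GC now weighs this when scheduling collections

        RunApplication();                          // hypothetical: the rest of the program

        Marshal.FreeHGlobal(block);
        GC.RemoveMemoryPressure(64 * 1024 * 1024); // and now it no longer does
    }

    static void RunApplication() { }
}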
These methods exist to let the GC know about memory usage outside of the managed heap. There is no object to pass to these methods because the memory is not directly related to any particular managed object. It is the responsibility of the author of the code to notify the GC correctly about changes in memory usage.
GC.AddMemoryPressure(Int64)
… the runtime takes into account only the managed memory, and thus underestimates the urgency of scheduling garbage collection.
An extreme example: you have a 32-bit app, and the GC thinks it can easily allocate almost 2 GB of managed (C#) objects. As part of the code you use native interop to allocate 1 GB. Without the AddMemoryPressure call, the GC still thinks it is free to wait until you allocate/deallocate a lot of managed objects... but around the time you have allocated 1 GB of managed objects, the GC runs into a strange state: it should have a whole extra GB to play with, but there is nothing left, so it has to scramble to collect memory at that point. If AddMemoryPressure had been used properly, the GC would have had the chance to adjust and collect more aggressively earlier, in the background or at points that allowed a shorter/smaller impact.
AddMemoryPressure is used to declare (the emphasis is deliberate) that you have a sensibly sized chunk of unmanaged data allocated somewhere. This method is a courtesy that the runtime extends to you.
The purpose of the method is to declare, under your own responsibility, that somewhere you have unmanaged data that is logically bound to some managed object instance. The garbage collector has a simple counter and tracks your request by adding the amount you specify to that counter.
The documentation is clear about that: when the unmanaged memory goes away, you must tell the garbage collector that it has gone away.
You use this method to inform the garbage collector that the unmanaged memory is there but could be freed if the associated object is disposed. The garbage collector is then able to schedule its collection work better.
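For completeness, here is a hedged sketch of the usual pattern these answers describe: tying the added pressure to the lifetime of an unmanaged allocation through IDisposable. The class name is illustrative, not from the book sample.

using System;
using System.Runtime.InteropServices;

public sealed class NativeBuffer : IDisposable
{
    private IntPtr _buffer;
    private readonly long _size;

    public NativeBuffer(long size)
    {
        _size = size;
        _buffer = Marshal.AllocHGlobal((IntPtr)size); // unmanaged allocation the GC cannot see
        GC.AddMemoryPressure(size);                   // tell the GC this object is "heavier" than it looks
    }

    public void Dispose()
    {
        if (_buffer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_buffer);
            _buffer = IntPtr.Zero;
            GC.RemoveMemoryPressure(_size);           // the extra pressure leaves with the buffer
        }
        GC.SuppressFinalize(this);
    }

    ~NativeBuffer() { Dispose(); }
}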

What are approaches to optimize the mark phase of a non-generational GC?

I am running on Unity's Boehm–Demers–Weiser garbage collector, which is a non-generational GC.
I have a large tree of managed objects in memory (~100k objects, ~200MiB allocation).
These objects are essentially a cache and never go out of scope, so they never actually get swept by the GC.
However, because Boehm is non-generational, this stale cache never gets moved up to higher generations. This causes the mark phase to take a large amount of processing time, as it has to traverse the whole cache on every collection, causing noticeable lag spikes.
This is "by-design", as the Unity documentation puts it:
Crucially, Unity’s garbage collection – which uses the Boehm GC algorithm – is non-generational and non-compacting. “Non-generational” means that the GC must sweep through the entire heap when performing a collection pass, and its performance therefore degrades as the heap expands.
I am well aware of approaches to reduce recurring garbage allocation, however I cannot find any information on how to optimize a large, stale, baseline allocation in a non-generational GC.
More specifically:
Is there any way to mark a root pointer (e.g. static field) as ignored from GC entirely?
Are there some data structure patterns that are faster to traverse in the mark phase?
Conversely, are there known data structure patterns that hinder the mark phase speed?
These questions are just some of my hypotheses to solve this, but I'm open to all suggestions.
One could approximate generational behavior by separating program startup with initialization of static data structures from steady state operation. All pointers into the startup memory region could be ignored while pointers from it should not exist since nothing allocated after the switchpoint (which would be under GC control) has been allocated yet.
One could even GC the startup region once before switching to a new region. Essentially you would end up with a limited form of region-based, non-moving collector where references between regions only happen in one direction.
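A rough sketch of that idea in Unity terms (treat it as an assumption-laden outline: BuildStaticCache is hypothetical, and GarbageCollector.GCMode in UnityEngine.Scripting is only available in recent Unity versions): build the long-lived cache in a dedicated startup phase, run one full collection while the heap is mostly that cache, and only then enter steady-state operation, optionally taking over collection scheduling so the expensive mark phase runs only at points you choose.

using UnityEngine;
using UnityEngine.Scripting;

public class CacheBootstrap : MonoBehaviour
{
    void Start()
    {
        BuildStaticCache();   // hypothetical: allocate the long-lived ~200 MiB tree here

        // One full collection while little besides the cache is on the heap.
        System.GC.Collect();

        // Optional: stop automatic collections and trigger them manually at safe points,
        // trading heap growth for control over when the mark phase runs.
        GarbageCollector.GCMode = GarbageCollector.Mode.Disabled;
    }

    void BuildStaticCache()
    {
        // ... populate the static cache ...
    }
}

Note that with the collector disabled the heap will simply grow until you re-enable it or collect manually, so this only pays off if steady-state allocation is low.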

Counting total objects queued for garbage collection

I wanted to add a small debug UI to my OpenGL game, which will be updated frequently with various debugging options/output displays. One thing I wanted was a constant counter that shows active objects in each generation of the garbage collector. I don't want names or anything, just a total count; something that I can eyeball when I do certain things within the game.
My problem, however, is that I can't seem to find a way to count the total objects currently alive in the various generations.
I even considered keeping a global static field, which would be incremented within every constructor and decremented within class finalizers. This would require hand-coding said functionality into every class though, and would not solve the problem of a "per-generation total".
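(For illustration, the hand-coded counter mentioned above would look roughly like this; it counts live instances per class, still gives no per-generation breakdown, and the decrement only happens when the finalizer eventually runs.)

using System.Threading;

class Tracked
{
    static int s_live;                                  // constructed but not yet finalized instances
    public static int LiveCount { get { return s_live; } }

    public Tracked()  { Interlocked.Increment(ref s_live); }
    ~Tracked()        { Interlocked.Decrement(ref s_live); }
}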
Do you know how I could go about doing this?
(Question title:) "Counting total objects queued for garbage collection"
(From the question's body:) "My problem, however, is that I can't seem to find a way to count the total objects currently alive in the various generations."
Remark: Your question's title and body ask for opposite things. In the title, you're asking for the number of objects that can no longer be reached via any GC root, while in the body, you're asking for "live" objects, i.e. those that can still be reached via any GC root.
Let me start by saying that there might not be any way to do this, basically because objects in .NET are not reference-counted, so they cannot be immediately marked as "no longer needed" when the last reference to them disappears or goes out of scope. I believe .NET's mark-and-compact garbage collector only discovers which objects are alive and which can be reclaimed during an actual garbage collection (during the "mark" phase). You however seem to want this information in advance, i.e. before a GC occurs.
That being said, here are perhaps your best options:
Perhaps your best bet in .NET's managed Framework Class Library is performance counters. But it doesn't look like there are any suitable counters available: there are performance counters giving you the number of allocated bytes in the various GC generations, but AFAIK no counters for the number of live/dead objects.
You might also want to take a look at the CLR's (i.e. the runtime's) unmanaged, COM-based Debugging API. Given that you have retrieved an ICorDebugProcess5 interface, these methods might be of interest:
ICorDebugProcess5::EnumerateGCReferences method:
"Gets an enumerator for all objects that are to be garbage-collected in a process."
See also this answer to a similar question on SO.
Note that this is about objects that are to be garbage-collected, not about live objects.
ICorDebugProcess5::GetGCHeapInformation method:
"Provides general information about the garbage collection heap, including whether it is currently enumerable."
If it turns out that the managed heap is enumerable, you could use…
ICorDebugProcess5::EnumerateHeap method:
"Gets an enumerator for the objects on the managed heap."
The objects returned by this enumerator are of this type:
COR_HEAPOBJECT structure:
"Provides information about an object on the managed heap."
You might not be actually interested in these details, but just in the number of objects returned by the enumerator.
(I haven't used this API myself, perhaps there exists a better and more efficient way.)
In Sept 2015, Microsoft published a managed library called clrmd aka Microsoft.Diagnostics.Runtime on GitHub. It is based on the same foundation as the unmanaged debugging API mentioned above. The project includes documentation about enumerating objects in the GC heap.
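A hedged sketch of counting heap objects with clrmd (exact method names and signatures vary between versions of the Microsoft.Diagnostics.Runtime package, so treat these as assumptions to check against the version you install):

using System;
using System.Linq;
using Microsoft.Diagnostics.Runtime;

class HeapCounter
{
    static void Main(string[] args)
    {
        int pid = int.Parse(args[0]);

        // Attach to the target process and create a runtime for the first CLR found in it.
        using (DataTarget target = DataTarget.AttachToProcess(pid, suspend: true))
        {
            ClrRuntime runtime = target.ClrVersions[0].CreateRuntime();

            // Count every object currently on the managed heap.
            int total = runtime.Heap.EnumerateObjects().Count();
            Console.WriteLine("Objects on the managed heap: " + total);
        }
    }
}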
Btw. there is an extremely informative book out there by Ben Watson, "Writing High-Performance .NET Code", which includes solid tips on how to make .NET memory allocation and GC more efficient.
The garbage collector doesn't have to collect objects.
"... that fact will be discovered when the garbage collector
runs the collector for whatever generation the object was in. (If it
runs at all, which it might not. There is no guarantee that the GC
runs.)"
— Eric Lippert
If the application performs normally and memory consumption is not increasing, the GC can let it work without interruption. That means the numbers will differ from run to run.
If I were you, I wouldn't spend time on getting per-generation information, but would just look at the size of used memory.
The simple but not very accurate way is to get it from the GC.
// Determine the best available approximation of the number
// of bytes currently allocated in managed memory.
Console.WriteLine("Total Memory: {0}", GC.GetTotalMemory(false));
If you see that used memory increases and decreases often, you can use existing profilers to figure out where you are allocating too much, or even where the memory leak is.

.NET Free memory usage (how to prevent overallocation / release memory to the OS)

I'm currently working on a website that makes large use of cached data to avoid roundtrips.
At startup we get a "large" graph (hundreds of thousands of objects of different kinds).
Those objects are retrieved over WCF and deserialized (we use protocol buffers for serialization).
I'm using Red Gate's memory profiler to debug memory issues (the memory usage didn't seem to fit with how much memory we should need "after" we're done initializing), and I end up with this report:
Now what we can gather from this report is that:
1) Most of the memory .NET allocated is free (it may have been rightfully allocated during deserialization, but now that it's free, I'd like it to be returned to the OS).
2) Memory is fragmented (which is bad, as every time I refresh the cache I need to redo the memory-hungry deserialization process, and this in turn creates large objects that may throw an OutOfMemoryException due to fragmentation).
3) I have no clue why the space is fragmented, because when I look at the large object heap there are only 30 instances: 15 object[] are directly attached to the GC and totally unrelated to me, 1 is a char array also attached directly to the GC heap, and the remaining 15 are mine but are not the cause of this, as I get the same report if I comment them out in code.
So my question is, what can I do to go further with this? I'm not really sure what to look for in debugging / tools, as it seems my memory is fragmented, but not by me, and huge amounts of free space are allocated by .NET, which I can't release.
Also, please make sure you understand the question well before answering; I'm not looking for a way to free memory within .NET (GC.Collect), but to free memory that is already free in .NET to the system, as well as to defragment said memory.
Note that a slow solution is fine; if it's possible to manually defragment the large object heap I'd be all for it, as I can call it at the end of RefreshCache and it's OK if it takes 1 or 2 seconds to run.
Thanks for your help!
A few notes I forgot:
1) The project is a .NET 2.0 website; I get the same results running it in a .NET 4 pool, and likewise if I convert it to .NET 4, recompile, and run it in a .NET 4 pool.
2) These are the results of a release build, so a debug build cannot be the issue.
3) And this is probably quite important: I do not get these issues at all in the WebDev server, only in IIS. In WebDev I get memory consumption rather close to my actual consumption (well, more, but not 5-10x more!).
Objects allocated on the large object heap (objects >= 85,000 bytes, normally arrays) are not compacted by the garbage collector. Microsoft decided that the cost of moving those objects around would be too high.
The recommendation is to reuse large objects if possible to avoid
fragmentation on the managed heap and the VM space.
http://msdn.microsoft.com/en-us/magazine/cc534993.aspx
I'm assuming that your large objects are temporary byte arrays created by your deserialization library. If the library allows you to supply your own byte arrays, you could preallocate them at the start of the program and then reuse them.
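A minimal sketch of that reuse idea (the class and method names are illustrative, and it assumes single-threaded refreshes): keep one large buffer alive for the lifetime of the application and hand it to the deserializer on each cache refresh, instead of allocating a fresh >= 85,000-byte array every time.

static class LargeBufferPool
{
    // Allocated once; it lives on the LOH for the lifetime of the process but never churns,
    // so repeated cache refreshes no longer fragment the LOH with new arrays.
    private static readonly byte[] s_buffer = new byte[4 * 1024 * 1024];

    public static byte[] Rent()
    {
        return s_buffer;
    }
}

On newer runtimes, System.Buffers.ArrayPool<byte>.Shared offers a more general, thread-safe version of the same idea.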
I know this isn't the answer you'd like to hear, but you can't forcefully release the memory back to the OS. However, for what reason do you want to do so? .NET will free its heap back to the OS once you're running low on physical memory. But if there's an ample amount of free physical memory, .NET will keep its heap to make future allocation of objects faster. If you really wanted to force .NET to release its heap back to the OS, I suppose you could write a C program which just mallocs until it runs out of memory. This should cause the OS to signal .NET to free its unused portion of the heap.
It's better that unused memory be reserved for .NET so that your application will have better allocation performance (since the runtime knows what memory is free and what isn't, allocation can just use the free memory without having to syscall into the OS to get more memory).
The garbage collector is in charge of defragmenting the heap. Every so often (usually during collection runs), it will move objects around the heap if it determines this needs to be done. (This is why C++/CLI has the pin_ptr construct for "pinning" objects).
Fragmentation usually isn't a big issue with memory, though, since memory provides fast random access.
As for your OutOfMemoryException, I don't have a good answer for. Ordinarily I'd suspect that your old object graph isn't being collected (some object somewhere is holding a reference onto it, a "memory leak"). But since you're using a profiler, I don't know then.
As of .NET 4.5.1 you can set a one-time flag to compact LOH before issuing a call to GC collect, i.e.
System.Runtime.GCSettings.LargeObjectHeapCompactionMode = System.Runtime.GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect(); // This will cause the LOH to be compacted (once).
Some testing and some C++ later, I've found the reason why I get so much free memory: it's because IIS instantiates the CLR with VM hoarding enabled (providing a DLL to instantiate it without VM hoarding takes up as much initial memory, but does release most of it as time goes by, which is the behavior I expect).
So this does fix my reported memory issue; however, I still get about 100 MB of free memory no matter what, and I still think this is due to fragmentation and fragments only being released all at once, because the profiler still reports memory fragmentation. So I'm not marking my own answer as the answer, in the hope someone can shed some light on this or direct me to tools that can either fix it or help me debug the root cause.
It's intriguing that it works differently on the WebDev server compared to IIS...
Is it possible that IIS is using the server garbage-collector, and the WebDev server the workstation garbage collector? The method of garbage collection can affect fragmentation. It'll probably be set in your aspnet.config file. See: http://support.microsoft.com/kb/911716
If you haven't found your answer, I think the following clues can help you:
Back to basics: we sometimes forget that objects can be explicitly set free; call the Dispose method of the objects explicitly (since you didn't mention it, I suppose you do an "object = null" assignment instead).
Use the inherited method; you don't need to implement one yourself, unless your class doesn't have it, which I doubt.
MSDN Help states the following about this method:
... There is no performance benefit in implementing the Dispose
method on types that use only managed resources (such as arrays)
because they are automatically reclaimed by the garbage collector. Use
the Dispose method primarily on managed objects that use native
resources and on COM objects that are exposed to the .NET
Framework. ...
Because it says that "they are automatically reclaimed by the garbage collector", we can infer that the "releasing" happens when the method is called (again, I'm only trying to give you clues).
Besides, I found this interesting article (I suppose... I didn't read it... completely): Garbage Collection: Automatic Memory Management in the Microsoft .NET Framework (http://msdn.microsoft.com/en-us/magazine/bb985010.aspx), which states the following in the "Forcing an Object to Clean Up" section:
..., it is also recommended that you add an additional method to
the type that allows a user of the type to explicitly clean up the
object when they want. By convention, this method should be called
Close or Dispose ....
Maybe the answer lies in this article if you read it carefully, or just keep investigating in this direction.

C# memory leak?

I have a C# application that loops through a DataTable and pushes the rows into several destinations, such as Sage and a SQL table.
While it used to work fine, I'm now inexplicably getting Out of Memory exceptions after an hour or so of running. I've noticed in Task Manager that the memory usage rises by about 1 MB every second and keeps on going!
I was under the impression garbage collection would take care of everything, but to be sure I dispose of any objects after using them. I know it's hard to diagnose without code, but there's a lot of it and I'm looking more for general advice.
but to be sure I dispose of any objects after using them
Dispose() is not directly related to memory management or leaks.
You'll have to look for unused objects that are still 'reachable'. Use a memory-profiler to find out.
You can start with the free CLR-Profiler.
There are a couple of potential problems that spring to mind:
There is a large pool of objects that are left ineligible for garbage collection (i.e. they are still "reachable"). For example, if you add an object to a list in every loop iteration, the list will grow without bound, and each element in the list will remain ineligible for garbage collection for as long as the list itself is reachable. I'm not claiming this is what is happening; it's just an example of how memory might be allocated and then never collected (a short sketch follows this list).
For some reason the garbage collector isn't doing a collection.
The high memory use is actually due to an unmanaged component that you are using in your application (e.g. via P/Invoke or COM interop).
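Here is that first scenario as a short sketch (the types and names are made up for illustration, not taken from the question's code):

using System.Collections.Generic;
using System.Data;

class Exporter
{
    // A collection that only ever grows keeps every element reachable,
    // so the GC can never reclaim them even though the loop is "done" with them.
    static readonly List<DataRow> s_processed = new List<DataRow>();

    public void ProcessRow(DataRow row)
    {
        // ... push the row to Sage / the SQL table ...
        s_processed.Add(row);   // never removed or cleared: memory grows on every iteration
    }
}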
Without seeing any code it's tricky to give specific advice on how to fix your problem; however, reading through Investigating Memory Issues should give you some pointers on how to diagnose the memory problem yourself. In particular, my first step would probably be to examine performance counters to see if the garbage collector is actually running, and to check the various heap sizes.
Note that Dispose and the IDisposable interface are unrelated to memory use: it's important to dispose of objects like database connections once you are done with them, as it frees up any associated resources (e.g. handles), but disposing of objects that implement IDisposable is very unlikely to have an impact on memory use.
Garbage collection can only get rid of objects that are no longer referenced from anything else. In addition, it can only get rid of managed objects; it has no control over memory allocated by native code you may be interfacing with. These, therefore, are the two root causes of memory leaks in C# code.
The first thing to look at is perfmon. Get the counters for the private bytes and the .net heap size for the process. If the heap size remains flat (or rises and drops) but private bytes keeps increasing you've got some native code allocating memory and not releasing it.
If the heap size just keeps growing then the leak is in your managed code and you'll need a profiler like ANTS, DotTrace or even WinDbg (with SOS extension) to inspect the heap and see what objects are lying about.
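If you'd rather read those counters from code than from perfmon, a rough sketch looks like this (".NET CLR Memory" and "# Bytes in all Heaps" are the standard .NET Framework counters; the instance name is assumed to be the plain process name, which can differ when several copies of the process are running):

using System;
using System.Diagnostics;

class MemoryCounters
{
    static void Main()
    {
        Process proc = Process.GetCurrentProcess();

        using (var heapCounter = new PerformanceCounter(
                   ".NET CLR Memory", "# Bytes in all Heaps", proc.ProcessName))
        {
            Console.WriteLine("Private bytes:     {0:N0}", proc.PrivateMemorySize64);
            Console.WriteLine("Managed heap size: {0:N0}", heapCounter.NextValue());
        }
    }
}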
The most popular "memory leak" on the .NET platform is a forgotten collection that keeps having items added to it in some loop and is never cleared.
When you new something up for temporary use, always use the following pattern; it ensures Dispose is called:
using (SomeClass a = new SomeClass())
{
    // ... work with a ...
}
SomeClass is a class that implements the IDisposable interface.
The GC won't save you if some unsafe code is involved (P/Invoke, COM, etc.) and a reference still exists somewhere.
If you find memory leaking, WinDbg will let you see what is in the heap.
This article may give you some help.
http://www.codeproject.com/KB/dotnet/Memory_Leak_Detection.aspx
