MSDN's C# docs use the terms stack and heap (e.g. when talking about variable allocation).
It is stated that these terms mean different memory spaces with well-defined purposes (e.g. local variables are always allocated on the stack, whereas member variables of reference types are always allocated on the heap).
I understand what those terms mean for a given CPU architecture and operating system. Let's assume the x86-64 architecture, where the stack is a per-thread contiguous memory block used to store call frames (local variables, return address, etc.), and the heap is a one-per-process memory region for general-purpose use.
What I do not understand yet is how those high-level and low-level definitions relate together.
I couldn't find an objective definition of the terms stack and heap in the MSDN docs, but I assume they mean something very similar to what those terms mean in the x86-64 architecture.
For the purpose of this question, let's assume we are working on a custom device whose CPU and OS don't implement the concept of a separate stack and heap; both (CPU/OS) deal directly with virtual memory. Will the stack and the heap (as cited in the MSDN docs) even exist in a .NET application running on this particular device? If so, are they enforced by the CLR? Are they created on top of what the OS returns as allocated memory?
Operating systems usually handle memory as follows:
The OS allocates the memory for global and static variables when it
loads the program into memory and doesn't deallocate it until the
program terminates. ... The other two regions, the stack and the heap,
are more dynamic - the memory is allocated and deallocated by the
program as needed. (Reference needed)
The "program" - in this case the CLR - allocates/requests memory on the stack or heap from the OS when needed.
How does CLR manage this?
Managed Heap
The CLR implements a Managed Heap. This is in fact an abstraction over the native heap provided by the OS:
The Managed Heap is used to divide the heap into further segments (the Large Object Heap and the Small Object Heap, for example).
When you allocate memory on the Managed Heap you won't get back a
real pointer as a return value. What you get instead is a handle, which is an indirection to the "real" pointer. This is necessary because the garbage collector can
compact the heap and move the objects around (on the native heap), thereby changing their addresses.
If you want to allocate memory on the unmanaged heap you need to use
Marshal.AllocHGlobal or Marshal.AllocCoTaskMem.
If you hand native code a direct pointer to a managed object (e.g. a blittable object, or an address taken with the unsafe keyword), you need to
pin the object so it doesn't get moved around by the GC, because in both cases you are referring to the object via the real pointer and not the handle...
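A minimal sketch of both mechanisms, using only the documented Marshal and GCHandle APIs (the sizes and values here are arbitrary, just for illustration):

```csharp
using System;
using System.Runtime.InteropServices;

class UnmanagedMemoryDemo
{
    static void Main()
    {
        // Allocate 16 bytes on the unmanaged (native) heap.
        // The GC does not track or move this memory.
        IntPtr buffer = Marshal.AllocHGlobal(16);
        try
        {
            Marshal.WriteInt32(buffer, 42);          // write to native memory
            Console.WriteLine(Marshal.ReadInt32(buffer)); // 42
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);             // must be freed manually
        }

        // Pinning: fix a managed array in place so its address stays
        // stable while native code (or raw pointers) uses it.
        byte[] managed = { 1, 2, 3, 4 };
        GCHandle handle = GCHandle.Alloc(managed, GCHandleType.Pinned);
        try
        {
            IntPtr stableAddress = handle.AddrOfPinnedObject();
            Console.WriteLine(Marshal.ReadByte(stableAddress, 2)); // 3
        }
        finally
        {
            handle.Free(); // unpin so the GC may move the array again
        }
    }
}
```

Note that pinning should be as short-lived as possible; long-lived pins fragment the managed heap because the compacting GC has to work around them.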
Managed Stack
This is used for stack walking and for tracking live references, so the GC knows which objects are still reachable, when it can promote an object to the
next generation, and so on...
So you see, there's a level of abstraction over the native concepts provided by the OS so that they can be managed by the CLR, but in the end the data resides in the native stack/heap.
But what if the device/OS doesn't provide a heap/stack (as posited in your question)?
CLR on a device/os without stack/heap
If the CLR were made available on an OS without such memory segmentation, it could be built separately for that OS (so that it accepts the same IL but manages memory in whatever way is efficient without the stack/heap).
In this case the documentation would change for such systems.
-OR-
It is also possible that the CLR would create its own data structures within memory and manage its own stack and heap, just to virtually conform to the specification.
Remember, the CLR is always built separately for its target platform (the same CLR binary is not used on both Linux and Windows; it is compiled per platform). Heck, there's even a CLR that runs in the browser on WebAssembly (.NET Blazor).
But since no such implementation exists, nobody can really tell (what-if questions are always tricky... what if I die tomorrow, what would I do today?). There is virtually no OS around that does not split its processes into stack/heap.
Btw you stated following:
whereas member variables are always allocated in the heap
That's not completely true: if the object itself resides on the stack (e.g. a struct in a local variable), then its members are allocated on the stack too.
For a field of reference type, the reference itself is stored on the stack or heap (wherever the containing object resides) and points to an object on the heap. Local variables, e.g. variables in methods, are always on the stack (the value itself for value types, a pointer/handle for reference types).
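The distinction above can be sketched with a small example (the Point/Shape types are made up for illustration):

```csharp
using System;

struct Point          // value type: lives wherever its container lives
{
    public int X, Y;  // these ints are stored inline with the Point
}

class Shape           // reference type: instances always live on the heap
{
    public Point Origin;  // the Point (and its ints) sit inline *inside*
                          // the heap object, not in a separate allocation
    public int[] Data;    // only the reference is stored here; the array
                          // itself is a separate heap object
}

class Demo
{
    static void Main()
    {
        Point local;            // a local struct: stored in the current
        local.X = 1;            // stack frame (or a CPU register)
        local.Y = 2;

        Shape s = new Shape();  // 's' is a local reference (stack/register);
        s.Origin = local;       // the Shape object itself is on the heap;
        s.Data = new int[4];    // assigning 'local' copies its values inline

        local.X = 99;           // does not affect the copy inside 's'
        Console.WriteLine(s.Origin.X); // 1
    }
}
```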
Related
What part of the .NET Framework takes responsibility for allocating memory? Is it the GC?
It is the CLR but in close cooperation with the GC. And the GC is a part of the CLR so it's not such a clear division.
Allocation takes place at the start of the free section of the Heap, it is a very simple and fast operation. Allocation on the Large Object Heap (LOH) is slightly more complicated.
Do Visit http://www.codeproject.com/Articles/38069/Memory-Management-in-NET
Allocation of Memory
"Generally .NET is hosted using Host process, during debugging .NET
creates a process using VSHost.exe which gives the programmer the
basic debugging facilities of the IDE and also direct managed memory
management of the CLR. After deploying your application, the CLR
creates the process in the name of its executable and allocates memory
directly through Managed Heaps.
When CLR is loaded, generally two managed heaps are allocated; one is
for small objects and other for Large Objects. We generally call it as
SOH (Small Object Heap) and LOH (Large Object Heap). Now when any
process requests for memory, it transfers the request to CLR, it then
assigns memory from these Managed Heaps based on their size.
Generally, SOH is assigned for the memory request when size of the
memory is less than 83 KB (85,000 bytes). If it is greater than this,
it allocates memory from LOH. On more and more requests of memory .NET
commits memory in smaller chunks."
Reading further into these paragraphs: it is the CLR, with the help of Windows (32-bit or 64-bit), that "allocates" the memory.
The deallocation is managed by the GC.
"The relationships between the Object and the process associated with
that object are maintained through a Graph. When garbage collection is
triggered it deems every object in the graph as garbage and traverses
recursively to all the associated paths of the graph associated with
the object looking for reachable objects. Every time the Garbage
collector reaches an object, it marks the object as reachable. Now
after finishing this task, garbage collector knows which objects are
reachable and which aren’t. The unreachable objects are treated as
Garbage to the garbage collector."
Despite the name, many kinds of modern "garbage collectors" don't actually collect garbage as their primary operation. Instead, they often identify everything in an area of memory that isn't garbage and move it somewhere that is known not to contain anything of value. Unless the area contained an object that was "pinned" and couldn't be moved (in which case things are more complicated) the system will then know that the area of memory from which things were moved contains nothing of value.
In many such collectors, once the last reference to an object has disappeared, no bytes of memory that had been associated with that object will ever again be examined prior to the time that they get blindly overwritten with new data. If the GC expects that the next use of the old region of memory will be used to hold new objects, it will likely zero out all the bytes in one go, rather than doing so piecemeal to satisfy allocations, but if the GC expects that it will be used as a destination for objects copied from elsewhere it may not bother. While objects are guaranteed to remain in memory as long as any reference exists, once the last reference to an object has ceased to exist there may be no way of knowing whether every byte of memory that had been allocated to that object has actually been overwritten.
While .NET does sometimes have to take affirmative action when certain objects (e.g. those whose type overrides Finalize) are found to have been abandoned, in general I think it's best to think of the "GC" as being not a subsystem that "collects" garbage, but rather as a garbage-collected memory pool's manager, that needs to at all times be kept informed of everything that isn't garbage. While the manager's duties include the performance of GC cycles, they go far beyond that, and I don't think it's useful to separate GC cycles from the other duties.
What happens in the background (memory-wise) when I declare a variable and then create an object for that variable? Is the reference variable stored anywhere, and in which format, and how does this variable point to the memory on the heap? Please clarify the doubts below in the comments.
For example
ClassA instance; // Where this variable store and how much memory occupies
instance=new ClassA(); //How instance variable points to memory
EDIT
What will the effect on my program's memory be if my program contains many unused variables?
The instance variable is just a pointer at runtime, it points to the object allocated in the GC heap. The variable can live anywhere, stack, CPU register, inside another object that's on the heap or in the loader heap if it is static.
The big deal about the garbage collector is that it is capable of finding this pointer during a garbage collection. It can thus see that the object is still referenced, and can adjust the pointer value when it compacts the heap. That's fairly straightforward when the reference is static or inside another object. It is harder when it is on the stack or in a register; the jitter provides sufficient info to let the GC find it.
A reference variable is stored inline. If it's a local variable, it's allocated on the stack, if it's a member of a class it's allocated as part of the object on the heap.
An instance of a class is always allocated on the heap.
A reference is just a pointer, but what's special is that the garbage collector is aware of the reference. So, a reference uses the amount of space that a pointer uses. In a 32 bit process it uses 4 bytes, in a 64 bit process it uses 8 bytes.
The storage location of a local variable for the reference itself is platform dependent (jitters can choose where they want to store it.) Typically it will be in memory on the call stack for the method defining the local or in a CPU register. The size is also platform dependent, but is generally 4 bytes for 32-bit architecture and 8 bytes for 64-bit architecture.
The reference may or may not 'point' to the heap. It's better to think of it as an opaque reference identifier which can be used for accessing the object. The underlying pointer can change at runtime.
Regarding unused variables: the optimizing compiler will often eliminate unused local variables entirely, so they have no impact at all on runtime performance. Also, the kind of overhead you're talking about for storing a reference is minuscule on modern platforms.
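The platform-dependent sizes mentioned above can be observed directly; IntPtr.Size reports the native pointer (and hence reference) size of the current process:

```csharp
using System;

class ReferenceSizeDemo
{
    static void Main()
    {
        // A reference occupies the size of a native pointer:
        // 4 bytes in a 32-bit process, 8 bytes in a 64-bit process.
        Console.WriteLine(IntPtr.Size);

        object a = new object();
        object b = a; // copies only the reference, not the object
        Console.WriteLine(ReferenceEquals(a, b)); // True
    }
}
```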
If you need answers about this then I would recommend you get your hands on "CLR via C#", it is a book about how the CLR functions and it includes lots of information about this.
To answer your question, there are many things that you need to think about to answer this question.
For instance, you need to store the instructions for each method in the class. When the class first loads, this will effectively be a pointer to the .NET IL instructions. When the method is first needed by the application, it will be JIT-compiled to actual instructions for the processor, and these will be stored in memory.
Then you have static storage for class fields that will be stored only once per class.
Each class in .NET that is instantiated requires storage for various reasons, including but not limited to inheritance, garbage collection, and layout. Then you have storage for the various references that you may keep to the object, which themselves take storage.
If memory is truly critical to what you are doing, then C# may not be the best choice for your application. Otherwise, just enjoy the benefits in productivity you will gain from using .NET and accept that this ease of use comes with a price of memory usage and less performance (in some cases) from a C/C++ app.
From the MSDN documentation of Marshal.AllocHGlobal:
AllocHGlobal is one of two memory allocation methods in the Marshal class. This method exposes the Win32 LocalAlloc function from Kernel32.dll.
Considering there's a GlobalAlloc API which allocates memory on the global heap, rather than the local heap, isn't this method's name rather misleading?
Was there a reason for naming it AllocHGlobal, rather than AllocHLocal?
Update: Simon points out in the comments that there's no such thing as a global heap in Windows any more, and the GlobalAlloc and LocalAlloc APIs remained for legacy purposes only. These days, the GlobalAlloc API is nothing more than a wrapper for LocalAlloc.
This explains why the API doesn't call GlobalAlloc at all, but it doesn't explain why the API was named AllocHGlobal when it doesn't (can't) use a global heap, nor does it even call GlobalAlloc. The naming cannot possibly be for legacy reasons, because it wasn't introduced until .NET 2.0, way after 16-bit support was dropped. So, the question remains: why is Marshal.AllocHGlobal so misleadingly named?
Suppose you're doing data transfer between apps using drag and drop or over the clipboard. To populate the STGMEDIUM structure you need an HGLOBAL. So you call AllocHGlobal. Hence the name.
The main use for this function is to interop with APIs that want an HGLOBAL. It would be confusing if it was called anything else because when you wanted an HGLOBAL you'd have to find some documentation to tell you that AllocAnythingElse produced a value you could use as an HGLOBAL.
This goes back to the olden days of Windows version 3. Back then there was a notion of a "default heap", the GlobalAlloc() api function allocated from it. Memory allocated from that heap could be shared between all processes.
That changed in the 32-bit version of Windows, processes can no longer share memory through a heap. Which made the terms "global heap" and "local heap" meaningless. There is still a notion of a default heap, the "process heap". GlobalAlloc() now allocates from that heap. But it can't be shared across process boundaries. The actual implementation of GlobalAlloc, and of Marshal.AllocHGlobal, uses the LocalAlloc() api function. Another Windows 3 holdover, somewhat more suitably named for what happens these days. It in turn uses HeapAlloc() with GetProcessHeap() on 32-bit Windows.
Agreeing on the heap to use is a significant interop concern. This very often goes wrong in poorly written C code that you pinvoke. Any such code that returns a pointer to allocated memory that needs to be released by the caller often fails due to memory leaks or access violations. Such C code allocates from its own heap with the malloc() function. Which is a private heap created by the C runtime library. You have no hope of releasing such memory, you don't know what heap was used and have no way to obtain the handle to the CRT heap.
This can only come to a good end when the C code uses a well-known heap. Like the process heap. Or CoTaskMemAlloc(), used by COM code. The other one in the Marshal class. Note that the pinvoke marshaller always releases memory when necessary with CoTaskMemFree(). That's a kaboom on Vista and up if that memory wasn't allocated with CoTaskMemAlloc(), a silent leak on XP.
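The "agree on the heap" rule above translates into a simple discipline in managed code: always pair each Marshal allocation with its matching free. A minimal sketch (sizes are arbitrary):

```csharp
using System;
using System.Runtime.InteropServices;

class AllocatorPairingDemo
{
    static void Main()
    {
        // LocalAlloc-style allocation (the process heap on Windows).
        IntPtr h = Marshal.AllocHGlobal(32);
        Marshal.FreeHGlobal(h);        // NOT Marshal.FreeCoTaskMem

        // COM task allocator, used by COM and the pinvoke marshaller.
        IntPtr c = Marshal.AllocCoTaskMem(32);
        Marshal.FreeCoTaskMem(c);      // NOT Marshal.FreeHGlobal

        Console.WriteLine("freed with matching allocators");
    }
}
```

Mixing the pairs is exactly the cross-heap mistake described above: it may appear to work on one Windows version and crash or leak on another.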
I think you should read https://msdn.microsoft.com/en-us/library/ms810603.aspx
A short part of it:
Global and Local Memory Functions At first glance it appears that the
local and global memory management functions exist in Windows purely
for backward compatibility with Windows version 3.1. This may be true,
but the functions are managed as efficiently as the new heap functions
discussed below. In fact, porting an application from 16-bit Windows
does not necessarily include migrating from global and local memory
functions to heap memory functions. The global and local functions
offer the same basic capabilities (and then some) and are just as fast
to work with. If anything, they are probably more convenient to work
with because you do not have to keep track of a heap handle.
Nonetheless, the implementation of these functions is not the same as
it was for 16-bit Windows. 16-bit Windows had a global heap, and each
application had a local heap. Those two heap managers implemented the
global and local functions. Allocating memory via GlobalAlloc meant
retrieving a chunk of memory from the global heap, while LocalAlloc
allocated memory from the local heap. Windows now has a single heap
for both types of functions—the default heap described above.
Now you're probably wondering if there is any difference between the
local and global functions themselves. Well, the answer is no, they
are now the same. In fact, they are interchangeable. Memory allocated
via a call to LocalAlloc can be reallocated with GlobalReAlloc and
then locked by LocalLock. The following table lists the global and
local functions now available.
It seems redundant to have two sets of functions that perform exactly
the same, but that's where the backward compatibility comes in.
Whether you used the global or local functions in a 16-bit Windows
application before doesn't matter now—they are equally efficient.
etc...
Also, memory was really expensive back in the day. Have a look at the PostScript Language Reference Manual as well, which might give you a good insight into the usage of local/global memory.
I have just started with .NET Framework with C# as my language. I somewhat understand the concept of GC in Java, and had a revisit to the same concept in .NET today.
In C#, value types are put onto the stack (same as in Java, where local variables are put onto the stack). But in C#, even structs are value types, so structs are placed onto the stack too. In a worst-case scenario, where there are many method calls, the stack is populated heavily with many methods, and each method has many local value types and many structs that themselves have many value-type members, will the garbage collector ever affect the stack? From what I researched (and partly what I was taught), I understand that it won't, primarily because manipulating stack content would involve a lot of overhead, and besides, the GC only consults the stack to look up references, nothing more than that.
Just to add another question on the same topic: forcing a call to the GC (like System.gc() in Java; I'm not sure about the C# equivalent) doesn't ensure that the GC routine is called then and there. So where should I place such a call: where I expect that I need the GC to run, or anywhere at random, since there is no guarantee that my call would immediately trigger the GC? Or should I just leave the whole thing to the runtime environment and not bother about it?
Note: I added the Java tag because I'm trying to link concepts from there. I understand that the internal functioning of GC in the two separate Runtime Environments will definitely be different, but I guess the underlying concept would be the same.
No, garbage collection does not affect objects on the Java stack.
The GC only affects objects in the JVM's heap. The Java GC process is multi-tiered, can be very complex, and is worth reading up on. Check out a site like http://javarevisited.blogspot.com/2011/04/garbage-collection-in-java.html to get a good grasp of how it operates.
As for forcing the system's GC, that is a bad idea. The JVM has a better idea than you of when a GC needs to run. If you are attempting to allocate a big object, the JVM will ensure the space is there for you without you needing to tell it to run the GC.
EDIT
My bad, you are more concerned about C# than Java. The same principles of memory management apply: the stack is unaffected, don't explicitly run a GC, etc. C# is designed to operate in a similar manner to Java. http://msdn.microsoft.com/en-us/library/ms973837.aspx
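For completeness, the C# equivalent of Java's System.gc() is GC.Collect(). Unlike System.gc(), which is only a hint, GC.Collect() by default performs a blocking collection, but calling it is almost always unnecessary and can hurt performance by disrupting the GC's own scheduling:

```csharp
using System;

class ForcedCollectionDemo
{
    static void Main()
    {
        // The classic "full cleanup" pattern; use sparingly, if ever.
        GC.Collect();                  // collect all generations (blocking)
        GC.WaitForPendingFinalizers(); // let the finalizer thread catch up
        GC.Collect();                  // reclaim objects freed by finalizers

        // At least one gen-0 collection has now happened.
        Console.WriteLine(GC.CollectionCount(0) > 0); // True
    }
}
```

In practice the advice in the answer stands: leave collection to the runtime unless you have a measured, specific reason (e.g. after releasing a very large object graph at a known idle point).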
Stacks don't need the assistance of a garbage collector, because as you move out of a stack frame (the scope of the current execution within the stack), the entire frame, including its contents, is freed, and later overwritten as new stack frames are created.
void Foo(int a, int b)
{
    int i;
    DoStuff();
}
creates a stack frame (rough visualization)
---- Frame Start ----
(value for parameter a)
(value for parameter b)
(other items needed for tracking execution)
(extra stack frame space
    (value for stack-allocated i)
)
---- End of Frame ----
When entering a function, stack-allocated variables are allocated along with the frame; when exiting, the entire frame is discarded, deallocating the memory for frame-allocated variables.
Keep in mind that Java typically allocates object references and local primitives on the stack, not whole objects. Only a few recent optimizations (escape analysis) permit on-stack allocation of objects not reachable outside the frame, and these carry such conditions that they are not something you can count on.
That said, references in the stack frame typically point to the heap, which is garbage collected normally.
If you read this for .NET, it only works on the managed heap:
http://msdn.microsoft.com/en-us/library/ee787088.aspx
MSDN seems to be a treasure trove of information, here is the parent topic on the GC in the CLR:
http://msdn.microsoft.com/en-us/library/0xy59wtx
.NET garbage collection is explained in depth in Garbage Collection on MSDN. The Garbage Collector tracks memory only in the Managed Heap.
No. AFAIK the GC doesn't affect the stack; it affects only heap memory. A stack frame is created upon a method call and removed on method exit.
EDIT
This MSDN article explains how GC works in .NET framework.
I am a little confused; maybe this is a very silly question.
Where does the memory for an unmanaged component get allocated?
In my .NET code, if I instantiate an unmanaged component, where is this component loaded and where is its memory allocated?
How does the CLR marshal calls between the managed and unmanaged heaps?
EDIT
Thanks for your reply, but what I am asking is: suppose I do a DllImport of User32.dll, which is clearly an unmanaged DLL, and I call some function in User32.dll. Now my question is: how does the CLR marshal my call to this unmanaged DLL?
It starts out pretty easy. The pinvoke marshaller first calls LoadLibrary and passes the DLL name you specified, the DllImportAttribute.Value property. In your case, user32.dll is already loaded because it gets loaded by the .NET bootstrapper, its reference count just gets incremented. But normally the Windows loader gets the DLL mapped into the address space of the process so the exported functions can be called.
Next is GetProcAddress to get the address of the function to call, the DllImportAttribute.EntryPoint property. The marshaller makes a couple of tries unless you used ExactSpelling. A function name like "foo" is tested several possible ways: foo, fooW, or fooA. This is a nasty implementation detail of Win32, related to the difference between Unicode and ANSI characters. The CharSet property matters here.
Now I need to wave hands a bit because it gets tricky. The marshaller constructs a stack frame, setting up the arguments that need to be passed to the exported function. This requires low level code, carefully excluded from prying eyes. Take it at face value that it performs the kind of translations that the Marshal class supports to convert between managed and unmanaged types. The DllImportAttribute.CallingConvention property matters here because that determines what argument value needs to be place where so that the called function can read it properly.
Next it sets up an SEH exception handler so that hardware exceptions raised by the called code can be caught and translated into a managed exception. The one that generates the more common one, AccessViolationException. And others.
Next, it pushes a special cookie on the stack to indicate that unmanaged code is about to start using the stack. This prevents the garbage collector from blundering into unmanaged stack frames and interpreting the pointers it finds there as managed object references. You can see this cookie back in the debugger's call stack: [Managed to Native Transition].
Next, just an indirect call to the function address as found with GetProcAddress(). That gets the unmanaged code running.
After the call, cleanup might need to be done to release memory that was allocated to pass the unmanaged arguments. The return value might need to be translated back to a managed value. And that's it, assuming nothing nasty happened, execution continues on the next managed code statement.
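The machinery described above is all driven by a declaration like the following. GetSystemMetrics is a real user32.dll export that takes no strings, so no ExactSpelling/CharSet probing is involved; the OperatingSystem.IsWindows() guard (available in .NET 5+) keeps the sketch from invoking the DLL on other platforms:

```csharp
using System;
using System.Runtime.InteropServices;

class PInvokeDemo
{
    // The pinvoke marshaller will LoadLibrary("user32.dll") (already
    // loaded in most processes), GetProcAddress("GetSystemMetrics"),
    // build the stack frame, and translate the int argument/return value.
    [DllImport("user32.dll")]
    internal static extern int GetSystemMetrics(int nIndex);

    const int SM_CXSCREEN = 0; // width of the primary screen, in pixels

    static void Main()
    {
        if (OperatingSystem.IsWindows())
            Console.WriteLine(GetSystemMetrics(SM_CXSCREEN));
        else
            Console.WriteLine("user32.dll is only available on Windows");
    }
}
```

The first call through such a stub is when the LoadLibrary/GetProcAddress work happens; subsequent calls go almost directly to the resolved address.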
Unmanaged memory allocations come from the process heap. You are responsible for allocating/deallocating the memory, since it will not get garbage collected because the GC does not know about these objects.
Just as an academic piece of info expanding on what has been posted here:
There are about 8 different heaps that the CLR uses:
Loader Heap: contains CLR structures and the type system
High Frequency Heap: statics, MethodTables, FieldDescs, interface map
Low Frequency Heap: EEClass, ClassLoader and lookup tables
Stub Heap: stubs for CAS, COM wrappers, P/Invoke
Large Object Heap: memory allocations that require more than 85k bytes
GC Heap: user allocated heap memory private to the app
JIT Code Heap: memory allocated by mscoree (the Execution Engine) and the JIT compiler for managed code
Process/Base Heap: interop/unmanaged allocations, native memory, etc
HTH
Part of your question is answered by Michael. I answer the other part.
When the CLR is loaded into an unmanaged process, it is called CLR hosting. This usually involves calling an entry point in the mscoree DLL, after which the default AppDomain is loaded. In such a case, the CLR asks the process for a block of memory and, when it is given, that becomes its memory space and will contain a stack and heap.