What memory space is occupied by a reference and by an object - C#

What happens in the background (memory-wise) when I declare a variable and then create an object for that variable? Is the reference variable stored anywhere, in which format, and how does this variable point to the memory on the heap? My specific doubts are in the comments below.
For example:
```csharp
ClassA instance;         // Where is this variable stored, and how much memory does it occupy?
instance = new ClassA(); // How does the instance variable point to the memory?
```
EDIT
What effect does it have on my program's memory if my program contains many unused variables?

The instance variable is just a pointer at runtime; it points to the object allocated in the GC heap. The variable itself can live anywhere: on the stack, in a CPU register, inside another object that's on the heap, or in the loader heap if it is static.
The big deal about the garbage collector is that it is capable of finding this pointer during a garbage collection. It can thus see that the object is still referenced, and it can adjust the pointer value when it compacts the heap. That's fairly straightforward when the reference is static or inside another object. It's harder when the reference is on the stack or in a register; there, the jitter provides sufficient information to let the GC find it.

A reference variable is stored inline. If it's a local variable, it's allocated on the stack; if it's a member of a class, it's allocated as part of the object on the heap.
An instance of a class is always allocated on the heap.
A reference is just a pointer, but what's special is that the garbage collector is aware of the reference. So a reference uses the same amount of space as a pointer: in a 32-bit process it uses 4 bytes, in a 64-bit process 8 bytes.
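To make the size of the reference itself concrete, here is a minimal sketch, assuming a recent .NET (e.g. .NET 7 or later, where System.Runtime.CompilerServices.Unsafe ships with the framework). ClassA and its Payload field are hypothetical stand-ins for the class in the question: however large the object is, a variable of that type is one pointer wide.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for ClassA from the question. Each instance is large
// (the array alone is about 8 KB), but a variable of type ClassA is not.
public class ClassA
{
    public long[] Payload = new long[1000];
}

public static class ReferenceSizeDemo
{
    // Unsafe.SizeOf<T> returns the size of a value of type T. For a class type
    // that value is a reference, so this is the size of one pointer.
    public static int SizeOfClassAVariable() => Unsafe.SizeOf<ClassA>();

    public static void Print()
    {
        // Both numbers are 8 in a 64-bit process and 4 in a 32-bit process.
        Console.WriteLine($"ClassA variable: {SizeOfClassAVariable()} bytes, pointer: {IntPtr.Size} bytes");
    }
}
```

Calling ReferenceSizeDemo.Print() in a 64-bit process reports 8 bytes for both values; the 8 KB payload lives in a separate heap object that the reference merely points to.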

Where the reference itself is stored for a local variable is platform dependent (the jitter can choose where to store it). Typically it will be in memory on the call stack of the method defining the local, or in a CPU register. The size is also platform dependent, but is generally 4 bytes on a 32-bit architecture and 8 bytes on a 64-bit architecture.
The reference may or may not 'point' to the heap. It's better to think of it as an opaque reference identifier that can be used to access the object. The underlying pointer can change at runtime.
Regarding unused variables, the optimizing compiler will often eliminate unused local variables entirely, so they have no impact at all on runtime performance. Also, the overhead you're talking about for storing a reference is minuscule on modern platforms.

If you need answers about this, I recommend getting hold of "CLR via C#"; it is a book about how the CLR functions, and it includes lots of information on this topic.
To answer your question, there are many things you need to think about.
For instance, you need to store the instructions for each method in the class. When the class first loads, this will effectively be a pointer to the .NET IL instructions. When the method is first needed by the application, it will be JIT-compiled to actual instructions for the processor, and these will be stored in memory.
Then you have static storage for class fields, which is allocated only once per class.
Each instance of a .NET class requires additional storage for various purposes, including but not limited to type information (for inheritance), garbage collection, and layout. Then there is the storage for the various references you may keep to the object, which themselves take space.
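The per-object overhead described above can be observed with a rough sketch like the following. It assumes .NET Core 3.0 or later (for GC.GetAllocatedBytesForCurrentThread), and Empty is a hypothetical class with no fields, so everything measured is runtime bookkeeping. On 64-bit CoreCLR this typically comes out at about 24 bytes per object (three pointer-sized words), but treat the exact number as an implementation detail.

```csharp
using System;

// Hypothetical class with no fields: whatever each instance costs on the heap
// is pure runtime overhead (object header, method-table pointer, minimum size).
public sealed class Empty { }

public static class ObjectOverheadDemo
{
    // Average number of GC-heap bytes per Empty instance.
    // Assumes .NET Core 3.0 or later for GC.GetAllocatedBytesForCurrentThread.
    public static double BytesPerEmptyObject(int count = 100_000)
    {
        var keep = new Empty[count]; // allocated before measuring, so it is not counted

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < count; i++)
            keep[i] = new Empty();   // kept reachable so the allocations cannot be elided
        long after = GC.GetAllocatedBytesForCurrentThread();

        GC.KeepAlive(keep);
        return (after - before) / (double)count;
    }
}
```

Averaging over many objects keeps the measurement stable even if the runtime's allocation accounting is slightly coarse.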
If memory is truly critical to what you are doing, then C# may not be the best choice for your application. Otherwise, just enjoy the productivity benefits you gain from using .NET, and accept that this ease of use comes at a price: more memory usage and (in some cases) lower performance than a C/C++ app.

Related

What are the managed stack and the managed heap?

MSDN's C# docs use the terms stack and heap (e.g., when talking about variable allocation).
It is stated that these terms mean different memory spaces with well-defined purposes (e.g., local variables are always allocated on the stack, whereas member variables of reference types are always allocated on the heap).
I understand what those terms mean in terms of certain CPU architecture and certain operating systems. Let's assume the x86-64 architecture, where the stack will be a per-thread contiguous memory block used to store call frames (local variables, return address, etc), and the heap being a one-per-process memory block for general purpose use.
What I do not understand yet is how those high-level and low-level definitions relate together.
I couldn't find an objective definition of the terms stack and heap in the MSDN docs, but I assume they mean something very similar to what those terms mean on the x86-64 architecture.
For the purpose of this question, let's assume we are working on a custom device on which the CPU and OS don't implement the concept of a separate stack and heap; they both (CPU/OS) deal directly with virtual memory. Would the stack and the heap (as cited in the MSDN docs) even exist in a .NET application running on this particular device? If so, are they enforced by the CLR? Are they created on top of what the OS returns as allocated memory?
Operating systems usually handle memory as follows:
The OS allocates the memory for global and static variables when it loads the program into memory and doesn't deallocate it until the program terminates. ... The other two regions, the stack and the heap, are more dynamic - the memory is allocated and deallocated by the program as needed. (Reference needed)
The "program" - in this case the CLR - allocates/requests memory on the stack or heap from the OS when needed.
How does CLR manage this?
Managed Heap
The CLR implements a managed heap. This is in fact an abstraction over the native heap provided by the OS:
The managed heap is further divided into segments (for example, the Small Object Heap and the Large Object Heap).
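The split between the small-object heap and the Large Object Heap can be observed from C#. This is a sketch assuming the default 85,000-byte LOH threshold of current CLR implementations; because the LOH is collected only together with generation 2, GC.GetGeneration reports large objects as GC.MaxGeneration right after allocation. LargeObjectHeapDemo is a hypothetical name.

```csharp
using System;

public static class LargeObjectHeapDemo
{
    // Allocates one small and one large array and reports the generation of each.
    // 85,000 bytes is the default LOH threshold; it can be changed via runtime configuration.
    public static (int smallGen, int largeGen) Generations()
    {
        var small = new byte[1_000];   // small object heap, starts in generation 0
        var large = new byte[100_000]; // at or above 85,000 bytes: Large Object Heap
        // The LOH is collected together with generation 2, so its objects report GC.MaxGeneration.
        return (GC.GetGeneration(small), GC.GetGeneration(large));
    }
}
```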
When you allocate memory on the managed heap, you don't get back a raw pointer that you can hold on to independently of the runtime. What you get instead is an object reference that the GC tracks. (In the CLR this reference holds the object's current address, but because the GC can compact your heap and move objects around, it updates every such reference whenever it changes an object's address.)
If you want to allocate memory on the unmanaged heap, you need to use Marshal.AllocHGlobal or Marshal.AllocCoTaskMem.
If you hand blittable data to native code or use the unsafe keyword, you need to pin the object so it doesn't get moved around by the GC, because in both cases you are referring to the object through a raw pointer rather than through a reference the GC knows about and can update.
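A minimal sketch of pinning, assuming all you need is to hand a managed array's address to native code for the duration of one call. PinningDemo and ReadThroughPinnedAddress are hypothetical names, and Marshal.ReadInt32 stands in for whatever the native side would do with the address. The address is only meaningful between GCHandle.Alloc(..., GCHandleType.Pinned) and Free.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinningDemo
{
    // Pins a managed array so its address can be handed to native code, then reads
    // an element back through that raw address the way unmanaged code would.
    public static int ReadThroughPinnedAddress(int[] data, int index)
    {
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned); // the GC may not move `data` now
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();           // address of data[0]
            return Marshal.ReadInt32(address, index * sizeof(int)); // valid only while pinned
        }
        finally
        {
            handle.Free(); // unpin as soon as native code no longer needs the address
        }
    }
}
```

Keeping the pin as short as possible matters: pinned objects cannot be moved during compaction, so long-lived pins fragment the heap.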
Managed Stack
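This is used for stack walking: the GC walks each thread's stack to find the object references held in its frames (the roots), which tells it which objects are still alive, which survivors it can promote to the next generation, and which references to update after moving an object. (Note that the CLR uses a tracing garbage collector, not reference counting.)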
So you see, there's a level of abstraction over the native concepts provided by the OS so that they can be managed by the CLR, but the data ultimately resides in the native stack/heap.
But what if the device/OS doesn't provide a heap/stack (as posited in your question)?
CLR on a device/OS without a stack/heap
If the CLR were made available on such an OS without this memory segmentation, the CLR could be built separately for that OS (so that it accepts the same IL and manages memory efficiently without the stack/heap).
In this case the documentation would change for such systems.
-OR-
It could also be possible that the CLR creates its own data structures within that memory and manages its own stack and heap, just to conform virtually to the specifications.
Remember that the CLR is always built separately for its target platform (the same runtime binary isn't used everywhere; it is compiled once for Linux and once for Windows). Heck, there's even a .NET runtime that runs in the browser: Blazor WebAssembly runs on a Mono-based runtime compiled to WebAssembly.
But since there's no such implementation, nobody can really tell (what-if questions are always tricky... What if I die tomorrow, what would I do today?). There are virtually no operating systems around that don't split their processes into stack and heap.
By the way, you stated the following:
whereas member variables are always allocated in the heap
That's not completely true: if the object itself resides on the stack (e.g., a struct local variable), its members are also allocated on the stack.
A reference-type member always stores a pointer wherever its containing object resides (stack or heap), and that pointer refers to an object on the heap. Local variables, i.e. variables declared in methods, normally live on the stack or in registers (the value itself for value types, a pointer for reference types). The exception is locals captured by a lambda or used across an await or yield, which the compiler hoists into a heap-allocated object.
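The inline-storage point can be sketched with a hypothetical struct. This assumes .NET 7 or later for System.Runtime.CompilerServices.Unsafe; on a 64-bit runtime NamedPoint typically measures 16 bytes (two ints plus one pointer-sized reference). A NamedPoint local needs no GC-heap allocation of its own; only the string it refers to lives on the heap.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical struct: X and Y are stored inline wherever the struct lives,
// and Name is one pointer-sized reference to a string on the heap.
public struct NamedPoint
{
    public int X;
    public int Y;
    public string Name;
}

public static class InlineLayoutDemo
{
    // Size of the struct's inline storage: two ints plus one reference
    // (16 bytes in a 64-bit process; exact layout is up to the runtime).
    public static int StructSize() => Unsafe.SizeOf<NamedPoint>();

    public static int SumLocal()
    {
        // A struct local: p itself is not a heap object. The string it references
        // is on the heap (a literal here, so it is interned).
        var p = new NamedPoint { X = 3, Y = 4, Name = "p" };
        return p.X + p.Y;
    }
}
```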

Are static class members pinned?

I have a C# class with a static ImageList object. This image list will be shared with various ListView headers (via SendMessage... HDM_SETIMAGELIST) on several forms in my application.
Though I understand that static objects are not eligible for garbage collection, it is not clear to me if they are also ineligible for relocation (compaction) by the garbage collector. Do I also need to pin this object since it is shared with unmanaged code, say, using GCHandle.Alloc?
Environment is VS 2008, Compact Framework 3.5.
The instance itself is not static; the reference is. If you null the reference, the instance becomes eligible for GC. Internally, all static instances are referenced through a pinned handle to an array of statics, i.e. the instance is implicitly pinned by the runtime.
If you look at the GC root of an instance declared as a static member, you'll see something like this:
```
HandleTable:
    008113ec (pinned handle)
    -> 032434c8 System.Object[]
    -> 022427b0 System.Collections.Generic.List`1[[System.String, mscorlib]]
```
If you null the static reference the corresponding entry in the pinned array is nulled as well.
Now, these are obviously implementation details so they could potentially change.
Yes. You need to pin the object.
While it's true that the reference is static (that is, you can access this location from anywhere in your code), it's still a reference that the GC manages. That is, the object is in principle subject to garbage collection (and/or compaction), although collection will of course never happen while the static reference holds it.
I don't think it's necessarily wrong to think that the static modifier implies the object will eventually have a fixed location in memory, but the bigger issue is that there's no API that lets you get at the memory address without pinning the object, whether or not it is actually being moved by the GC.
Moreover, each static member is unique per AppDomain (not per process). The same static member could exist at different memory locations in the same process, and it can be garbage collected when the AppDomain unloads. This is quite an edge case, I'll admit, but there's no real advantage to not pinning objects, even if it could be done without pinning.
Do I also need to pin this object since it is shared with unmanaged code, say, using GCHandle.Alloc?
Yes. If the pointer is not pinned, the GC is free to move that memory, so you may end up with dangling C++ pointers that point to invalid memory or, worse, to memory that isn't theirs at all.
Also, the word "shared" should be clarified. If you allocate the data and pass it to unmanaged code that copies it somewhere, you may avoid keeping it pinned permanently. It depends on what happens once you pass control to the unmanaged environment.
EDIT
Even considering the interesting answer from @Brian, I would still opt for pinning the pointer, to make the notion of a fixed pointer explicit in the code, avoid any possible confusion in future maintenance, and keep things clear.
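For the explicit-pinning approach recommended above, a sketch might look like this. SharedNativeBuffer is hypothetical, uses a byte array rather than an ImageList, and is written against current .NET APIs (not verified on the Compact Framework 3.5 from the question). The pin is taken once and held until Release() is called, so the address handed to native code stays valid across collections.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical holder for a buffer shared with native code for a long period.
public static class SharedNativeBuffer
{
    private static readonly byte[] Buffer = new byte[256];
    private static GCHandle _pin; // default value: not allocated

    // Pins once; the returned address stays valid until Release() is called.
    public static IntPtr Acquire()
    {
        if (!_pin.IsAllocated)
            _pin = GCHandle.Alloc(Buffer, GCHandleType.Pinned);
        return _pin.AddrOfPinnedObject();
    }

    // Unpins; after this the GC may move the buffer again.
    public static void Release()
    {
        if (_pin.IsAllocated)
            _pin.Free();
    }
}
```

Because the pin is explicit, a future maintainer can see exactly how long native code may rely on the address, which is the clarity argument made above.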

Do references get updated when Garbage Collectors move data in heap?

I read that garbage collectors (GCs) move data in the heap for performance reasons. I don't quite understand why, since it is random-access memory; maybe for better sequential access? Either way, I wonder whether references on the stack get updated when such a move occurs in the heap. Or maybe the address stays the same and only other parts of the data get moved by the GC; I am not sure.
I think this question pertains to implementation details, since not all garbage collectors may perform such an optimization, or they may do it without updating references (if that is a common practice among garbage collector implementations). Still, I would like an overall answer specific to CLR (Common Language Runtime) garbage collectors.
I was also reading Eric Lippert's "References are not addresses" article here, and the following paragraph confused me a little:
If you think of a reference is actually being an opaque GC handle then it becomes clear that to find the address associated with the handle you have to somehow "fix" the object. You have to tell the GC "until further notice, the object with this handle must not be moved in memory, because someone might have an interior pointer to it". (There are various ways to do that which are beyond the scope of this screed.)
It sounds like, for reference types, we don't want data to be moved. Then what else do we store in the heap that can be moved around for performance optimization? Maybe type information? By the way, in case you wonder what that article is about: Eric Lippert compares references to pointers a little and tries to explain why it may be wrong to say that references are just addresses, even though that is how C# implements them.
And also, if any of my assumptions above is wrong, please correct me.
Yes, references get updated during a garbage collection. Necessarily so: objects are moved when the heap is compacted. Compacting serves two major purposes:
1. It makes programs more efficient by using the processor's data caches more effectively. That is a very, very big deal on modern processors: RAM is exceedingly slow compared to the execution engine, by a fat two orders of magnitude. The processor can be stalled for hundreds of instructions when it has to wait for RAM to supply a variable's value.
2. It solves the fragmentation problem that heaps suffer from. Fragmentation occurs when a small object that is surrounded by live objects is released, leaving a hole that cannot be used for anything but an object of equal or smaller size. That is bad for both memory usage and processor efficiency. Note that the LOH, the Large Object Heap in .NET, is not compacted by default and therefore suffers from this fragmentation problem; there are many questions about that on SO.
In spite of Eric's didactic point, an object reference really is just an address: a pointer, exactly the same kind you'd use in a C or C++ program. Very efficient, necessarily so. All the GC has to do after moving an object is update the address stored in each pointer to the moved object. The CLR also permits allocating handles to objects, which are extra references. They are exposed as the GCHandle type in .NET, but are only necessary if the GC needs help determining whether an object should stay alive or must not be moved, which is only relevant if you interop with unmanaged code.
What is not so simple is finding those pointers again. The CLR is heavily invested in ensuring that this can be done reliably and efficiently. Such pointers can be stored in many different places. The easiest to find are object references stored in a field of an object, in a static variable, or in a GCHandle. The hard ones are pointers stored on the processor stack or in a CPU register, which happens for method arguments and local variables, for example.
One guarantee the CLR needs to provide to make that happen is that the GC can always reliably walk the stack of a thread, so that it can find local variables stored in a stack frame. It then needs to know where to look within such a stack frame; that's the job of the JIT compiler. When it compiles a method, it doesn't just generate the machine code for the method, it also builds a table that describes where those pointers are stored. You'll find more details about that in this post.
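A small sketch of the behaviour described above, assuming .NET Framework 4.6+ or .NET Core for the GC.Collect overload with a compacting flag. Node and CompactionDemo are hypothetical names. The object survives a forced, compacting collection; the GC is free to relocate it, and the code never sees a stale address because the GC updates every reference it finds (here, a local in a stack frame or register). GC.GetGeneration usually reports a higher generation afterwards, showing the object survived and was promoted.

```csharp
using System;

// Hypothetical object whose survival we observe.
public sealed class Node
{
    public int Value;
}

public static class CompactionDemo
{
    public static (int genBefore, int genAfter, int value) SurviveCollections()
    {
        var node = new Node { Value = 42 };
        int genBefore = GC.GetGeneration(node);

        // Create some garbage around it, then force a full, blocking, compacting collection.
        for (int i = 0; i < 10_000; i++) _ = new byte[64];
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, blocking: true, compacting: true);

        int genAfter = GC.GetGeneration(node);
        // Reading through the local works whether or not the object was moved,
        // because the GC updated the reference if it relocated the object.
        return (genBefore, genAfter, node.Value);
    }
}
```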
Looking at C++/CLI in Action, there's a section about interior pointers vs. pinning pointers:
C++/CLI provides two kinds of pointers that work around this problem. The first kind is called an interior pointer, which is updated by the runtime to reflect the new location of the object that's pointed to every time the object is relocated. The physical address pointed to by the interior pointer never remains the same, but it always points to the same object. The other kind is called a pinning pointer, which prevents the GC from relocating the object; in other words, it pins the object to a specific physical location in the CLR heap. With some restrictions, conversions are possible between interior, pinning, and native pointers.
From that, you can conclude that reference types do move in the heap and their addresses do change. After the mark-and-sweep phase, the objects get compacted inside the heap, thus actually moving to new addresses. The CLR is responsible for keeping track of the actual storage location and updating those interior pointers using an internal table, making sure that, when accessed, they still point to the valid location of the object.
There's an example taken from here:
```cpp
// C++/CLI: compile with /clr
#include <stdio.h>

ref struct CData
{
    int age;
};

int main()
{
    for (int i = 0; i < 100000; i++)   // ((1))
        gcnew CData();

    CData^ d = gcnew CData();
    d->age = 100;

    interior_ptr<int> pint = &d->age;  // ((2))
    printf("%p %d\r\n", pint, *pint);

    for (int i = 0; i < 100000; i++)   // ((3))
        gcnew CData();

    printf("%p %d\r\n", pint, *pint);  // ((4))
    return 0;
}
```
This is explained as follows:
In the sample code, you create 100,000 orphan CData objects ((1)) so that you can fill up a good portion of the CLR heap. You then create a CData object that's stored in a variable and ((2)) an interior pointer to the int member age of this CData object. You then print out the pointer address as well as the int value that is pointed to. Now, ((3)) you create another 100,000 orphan CData objects; somewhere along the line, a garbage-collection cycle occurs (the orphan objects created earlier ((1)) get collected because they aren't referenced anywhere). Note that you don't use a GC::Collect call because that's not guaranteed to force a garbage-collection cycle. As you've already seen in the discussion of the garbage-collection algorithm in the previous chapter, the GC frees up space by removing the orphan objects so that it can do further allocations. At the end of the code (by which time a garbage collection has occurred), you again ((4)) print out the pointer address and the value of age. This is the output I got on my machine (note that the addresses will vary from machine to machine, so your output values won't be the same):
```
012CB4C8 100
012A13D0 100
```

Why is it called Marshal.AllocHGlobal if it allocates on the local heap?

From the MSDN documentation of Marshal.AllocHGlobal:
AllocHGlobal is one of two memory allocation methods in the Marshal class. This method exposes the Win32 LocalAlloc function from Kernel32.dll.
Considering there's a GlobalAlloc API which allocates memory on the global heap, rather than the local heap, isn't this method's name rather misleading?
Was there a reason for naming it AllocHGlobal, rather than AllocHLocal?
Update: Simon points out in the comments that there's no such thing as a global heap in Windows any more, and the GlobalAlloc and LocalAlloc APIs remained for legacy purposes only. These days, the GlobalAlloc API is nothing more than a wrapper for LocalAlloc.
This explains why the API doesn't call GlobalAlloc at all, but it doesn't explain why the API was named AllocHGlobal when it doesn't (can't) use a global heap, nor does it even call GlobalAlloc. The naming cannot possibly be for legacy reasons, because it wasn't introduced until .NET 2.0, way after 16-bit support was dropped. So, the question remains: why is Marshal.AllocHGlobal so misleadingly named?
Suppose you're doing data transfer between apps using drag and drop or over the clipboard. To populate the STGMEDIUM structure you need an HGLOBAL. So you call AllocHGlobal. Hence the name.
The main use for this function is to interop with APIs that want an HGLOBAL. It would be confusing if it was called anything else because when you wanted an HGLOBAL you'd have to find some documentation to tell you that AllocAnythingElse produced a value you could use as an HGLOBAL.
This goes back to the olden days of Windows version 3. Back then there was a notion of a "default heap", the GlobalAlloc() api function allocated from it. Memory allocated from that heap could be shared between all processes.
That changed in the 32-bit version of Windows: processes can no longer share memory through a heap, which made the terms "global heap" and "local heap" meaningless. There is still a notion of a default heap, the "process heap", and GlobalAlloc() now allocates from that heap, but it can't be shared across process boundaries. The actual implementation of GlobalAlloc, and of Marshal.AllocHGlobal, uses the LocalAlloc() API function, another Windows 3 holdover, somewhat more suitably named for what happens these days. It in turn uses HeapAlloc() with GetProcessHeap() on 32-bit Windows.
Agreeing on the heap to use is a significant interop concern. This very often goes wrong in poorly written C code that you pinvoke. Any such code that returns a pointer to allocated memory that must be released by the caller often fails with memory leaks or access violations. Such C code allocates from its own heap with the malloc() function, which is a private heap created by the C runtime library. You have no hope of releasing such memory: you don't know which heap was used and have no way to obtain the handle to the CRT heap.
This can only come to a good end when the C code uses a well-known heap, like the process heap, or CoTaskMemAlloc(), used by COM code, which is the other allocator exposed by the Marshal class (AllocCoTaskMem). Note that the pinvoke marshaller always releases memory, when necessary, with CoTaskMemFree(). That's a kaboom on Vista and up if that memory wasn't allocated with CoTaskMemAlloc(), and a silent leak on XP.
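To illustrate the pairing rule, here is a minimal sketch in which each Marshal allocator is released with its own counterpart. NativeAllocatorDemo is a hypothetical name. The Windows mappings in the comments follow the answers above (AllocHGlobal via LocalAlloc, AllocCoTaskMem via CoTaskMemAlloc); on other platforms the runtime uses different native allocators, so the rule to remember is simply never to mix the free functions.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeAllocatorDemo
{
    // AllocHGlobal must be paired with FreeHGlobal.
    public static int RoundTripHGlobal(int value)
    {
        IntPtr p = Marshal.AllocHGlobal(sizeof(int)); // LocalAlloc-backed on Windows
        try
        {
            Marshal.WriteInt32(p, value);
            return Marshal.ReadInt32(p);
        }
        finally
        {
            Marshal.FreeHGlobal(p);                   // never FreeCoTaskMem here
        }
    }

    // AllocCoTaskMem must be paired with FreeCoTaskMem.
    public static int RoundTripCoTaskMem(int value)
    {
        IntPtr p = Marshal.AllocCoTaskMem(sizeof(int)); // CoTaskMemAlloc on Windows
        try
        {
            Marshal.WriteInt32(p, value);
            return Marshal.ReadInt32(p);
        }
        finally
        {
            Marshal.FreeCoTaskMem(p);                   // matches what the pinvoke marshaller frees with
        }
    }
}
```

If native code must hand memory back to .NET for release, having it allocate with CoTaskMemAlloc lets the managed side (or the marshaller) free it safely.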
I think you should read https://msdn.microsoft.com/en-us/library/ms810603.aspx
An excerpt:
Global and Local Memory Functions
At first glance it appears that the local and global memory management functions exist in Windows purely for backward compatibility with Windows version 3.1. This may be true, but the functions are managed as efficiently as the new heap functions discussed below. In fact, porting an application from 16-bit Windows does not necessarily include migrating from global and local memory functions to heap memory functions. The global and local functions offer the same basic capabilities (and then some) and are just as fast to work with. If anything, they are probably more convenient to work with because you do not have to keep track of a heap handle.
Nonetheless, the implementation of these functions is not the same as it was for 16-bit Windows. 16-bit Windows had a global heap, and each application had a local heap. Those two heap managers implemented the global and local functions. Allocating memory via GlobalAlloc meant retrieving a chunk of memory from the global heap, while LocalAlloc allocated memory from the local heap. Windows now has a single heap for both types of functions: the default heap described above.
Now you're probably wondering if there is any difference between the local and global functions themselves. Well, the answer is no, they are now the same. In fact, they are interchangeable. Memory allocated via a call to LocalAlloc can be reallocated with GlobalReAlloc and then locked by LocalLock. The following table lists the global and local functions now available.
It seems redundant to have two sets of functions that perform exactly the same, but that's where the backward compatibility comes in. Whether you used the global or local functions in a 16-bit Windows application before doesn't matter now; they are equally efficient.
etc...
Also, memory was really expensive back in the day. Have a look at the PostScript Language Reference Manual too, which might give you good insight into the use of local/global memory.

(C#) Can struct be viewed in CLRProfiler?

Since CLRProfiler uses words like HEAP statistics and OBJECTS finalized, it made me think it will at most only show boxed structs. So what if structs are the source of my problem? How can I find out with CLRProfiler?
According to the documentation
"CLRProfiler is a tool that is focused on analyzing what is going on in the garbage collector heap"
so naturally you'll see various statistics concerning the heap.
Structs are value types, so when they are allocated on their own, they are allocated on the stack. The stack is cleaned up during stack unwind and thus is not subject to garbage collection by the GC. If value types are boxed or more commonly if they are part of a reference type, their values will be stored on the heap.
My guess is that if a struct is the source of your problem, it is because your application stores a great number of them. This is typically done using arrays (which underlie a number of .NET collections). An array is a reference type, so it is stored on the heap. If the array holds structs, the values also go on the heap as part of the array instance.
In other words, if you want to inspect standalone structs during runtime, you have to locate them on the stacks of the running managed threads. To be honest I am not too familiar with CLRProfiler so I don't know if it supports that. You can, however, inspect this with debuggers such as WinDbg. If, on the other hand, the struct in question is stored in a collection, you have to locate the instance on the heap.
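To see which struct usages actually reach the GC heap (and would therefore show up in a heap-focused tool like CLRProfiler), a rough sketch can count the bytes the current thread allocates. It assumes .NET Core 3.0 or later for GC.GetAllocatedBytesForCurrentThread; Sample and StructAllocationDemo are hypothetical. Typically the struct-local loop allocates nothing, boxing allocates one small object per value, and the array is a single object whose size is roughly n times the struct size.

```csharp
using System;

// Hypothetical 8-byte struct standing in for the struct in the question.
public struct Sample
{
    public int A;
    public int B;
}

public static class StructAllocationDemo
{
    // GC-heap bytes allocated by the current thread for three usage patterns.
    public static (long locals, long boxed, long array) Compare(int n = 1_000)
    {
        var sink = new object[n]; // allocated before measuring, so it is not counted
        long sum = 0;

        long t0 = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < n; i++)
        {
            var s = new Sample { A = i, B = i }; // struct local: stack or registers, no heap object
            sum += s.A + s.B;
        }

        long t1 = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < n; i++)
            sink[i] = new Sample { A = i, B = i }; // boxing: one heap object per element

        long t2 = GC.GetAllocatedBytesForCurrentThread();
        var array = new Sample[n];                 // one heap object holding n structs inline
        long t3 = GC.GetAllocatedBytesForCurrentThread();

        GC.KeepAlive(sink);
        GC.KeepAlive(array);
        return (t1 - t0, t2 - t1, t3 - t2);
    }
}
```

The boxed copies and the array are what a heap profiler can see; struct locals never appear there, which is why stack-only structs need a debugger rather than a heap profiler.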
