How would the memory look like for this object? - c#

I am wondering how the memory layout for this class (its object) would look like:
class MyClass
{
string myString;
int myInt;
public MyClass(string str, int i)
{
myString = str;
myInt = i;
}
}
MyClass obj = new MyClass("hello", 42);
Could anyone visualize that?
Update:
Based on the answer from Olivier Rogier and the comments from ckuri and Jon Skeet I tried to come up with a high level chart, heavily influenced by the devblog article mentioned by ckuri.
So to my understanding:
obj (8 bytes reference) points to the object including metadata (actually not to its beginning, but let's ignore that for simplicity).
At this place the myInt is stored and the myString reference value (which is the reference to the real string value)
I don't want to got into the last details, but what I am still curious about:
If obj.myString shall be accessed, are there two "lookups" necessary, e.g. first looking up obj, then following it and looking up myString or is there something like a global address table where the address for obj.myString is directly stored?
Where is the reference value of obj stored? Is it part of the program object block like myString is part of the obj object block? (assuming obj is created inside an instance program)

At this place the myInt is stored and the myString reference value (which is the reference to the real string value)
Let's make sure you're not going down bad paths here.
First off, it's unclear to me why you re-ordered the integer and the string in the diagram compared to the source code. It is implementation-defined how the string and the integer are packed, and in what order, and whether there are any padding bytes. If you care about these details, ask a more clear question.
Second, it is unclear what you mean by "the real string value". Strings are of reference type. The real value of the string is the reference. The values of the contents of the string are in the referenced location.
if obj.myString shall be accessed, are there two "lookups" necessary, e.g. first looking up obj, then following it and looking up myString
I assume that by "lookup" you mean dereference.
So for example, if we have:
var obj = whatever;
char c = obj.myString[1];
then yes, we have two dereferences. The . dereferences obj to get myString, which is a reference. The [1] dereferences myString to get the char.
Where is the reference value of obj stored?
obj is a variable. A variable is a storage location. That storage location can be in a number of places:
If obj is short lived, or even better, ephemeral, then it can be enregistered or put on the short term pool. (More commonly known as the stack, but it is a better habit in my opinion to think of the short term pool in terms of its semantics, namely, storage that lives not longer than activation. The stack is an implementation detail.)
If obj is not known to be short lived then it goes on the long-term pool, also known as the managed heap.

Each instance of a class or struct has a "personal memory space" for data, but methods are shared once for all objects.
First, you need 4 bytes on x32 or 8 bytes on x64 to store the reference to the memory address of the object (reference is a hidden pointer to forget to manage it).
Next, the object has two data members here:
One integer that takes 4 bytes.
One string that here takes 5 chars : 5x2 bytes = 10 bytes.
So for data the object takes 18 bytes on x32 or 22 bytes on x64 system.
Since string object contains an integer for the length, the size is a little more than that : 22 on x32 and 26 on x64.
Since string is a reference, we need to add again 4 or 8 bytes => 26 or 34 bytes.
Since string has some other static and instance fields in the class declaration like the first char, it takes a little more than this.
Is string actually an array of chars or does it just have an indexer?
In addition to that, the memory in the code segment has the instructions of the code of methods. This code is common for all instances.
In addition to that, there are class tables and virtual tables to describe types, methods signatures and polymorphism rules.
If the object is instantiated in a method it uses the heap memory.
If the object is instantiated in the declaration as a class member, I don't know how works .NET but it may be allocated in the data segment of the processus.
And memory is like a train where wagoons are the bytes.
Here is a pseudo-diagram of the memory.
It is not the very true reality but it may help to understand:
Does accessing a variable in C# class reads the entire class from memory?
C# Heap(ing) Vs Stack(ing) In .NET
A byte is the elementary unit of the memory that stores one value at a time between 0 and 255 (unsigned) or -128 and +127 (signed).
Learn the basics about C# data types' variables
Shifting Behavior for Signed Integers
A Tutorial on Data Representation
Seeing this sketch today (2021.01.28) I realize it may be mlisleading, and it is why I wrote "It is not the very true reality but it may help to understand", because in reality the code of the implementation of methods are loaded from the binary files EXE and DLLs when the process is starded and is stored is the CODE SEGMENT, as all data, static (literals) and dynamic (instances) are in the DATA SEGMENT (if things had not changed since x32 and protected mode). Methods' non-virtual tables as well as methods' virtual tables are not stored in the data segment for each instance of objects. I don't remember details but these tables is for code. Also Data of each instance of an object is a projection from its definition as well as its ancestors, in one place, one full instance.
Memory segmentation
x86 memory segmentation

Related

Why is the Pinnable<T> class in C# 7.2 defined the way it is?

I'm aware that Pinnable<T> is an internal class used by the methods in the new Unsafe class, and it's not meant to be used anywhere else other than in that class. This question is not about something practical, but it's just to understand why it's been designed like this and to learn a bit more about the language and its various "tricks" like this one.
As a recap, the Pinnable<T> class is defined here, and it looks like this:
[StructLayout(LayoutKind.Sequential)]
internal sealed class Pinnable<T>
{
public T Data;
}
And it's mainly used in the Span<T>.DangerousCreate method, here:
public static Span<T> DangerousCreate(object obj, ref T objectData, int length)
{
Pinnable<T> pinnable = Unsafe.As<Pinnable<T>>(obj);
IntPtr byteOffset = Unsafe.ByteOffset<T>(ref pinnable.Data, ref objectData);
return new Span<T>(pinnable, byteOffset, length);
}
The reason for Pinnable<T> being that it's used to keep track of the original object, in case the Span<T> instance was created by one (instead of a native pointer).
Given that reference type doesn't matter when pinning a reference (fixing both a ref T and Unsafe.As<T, byte>(ref T) works the same), is there a specific reason why the Pinnable<T> class was made generic? The original design in DotNetCross here in fact had a Pinnable class with just a single byte field, and it worked just the same. Is there any reason why using a generic class in this case would be an advantage, other than avoiding to cast the reference time when writing/reading/returning it?
Is there any other way, other than this unsafe-cast done with Unsafe.As, to get a reference to an object (I mean a reference to the object contents, otherwise it'd be the same as any variable of a class type)? I mean, any way to get a reference (which should basically have the same address of the actual object variable in the first place, right?) to an object without having to pass through some custom defined secondary class.
First of all, the Struct in [StructLayout(LayoutKind.Sequential)] doesn't mean that it is only valid for structs, it means the layout of the actual structure of the fields in memory, be it in a class or in a value type. This controls the actual runtime layout of the data, not just how the type would marshal to unmanaged code. The Sequential is important because without it, the runtime is pretty much free to store the memory however it sees fit, which means that Data may have some padding before it.
From what I understand about the implementation, the reason for Pinnable is to allow creating an instance of Span to a memory that may be moved by the GC, without having to pin the object first. If you don't use actual pointers and just references, nothing at all will need to be pinned.
I have noticed that it was introduced in a commit with a description saying it made Span more "portable" (a bold word for something that does a lot of unsafe things). I can't think of any other reason than something related to alignment for why it is generic. I suppose representing a T in terms of an offset from another T is better than as an offset from a byte. It may happen that the type of the first field may play a role in its actual address, even if the type was marked with LayoutKind.Sequential.
A reference to an object is different from an interior reference to an object (a reference to its data). It is implementation defined, but in .NET Framework, an instance of any class (or a boxed value type) starts with a header consisting of a sync block (for lock) and a pointer to the method table, a.k.a. the type of the object. On 32-bit, the header is 8 bytes, but the actual pointer points to the pointer to the method table (for performance reasons, getting the type happens more often than locking an object).
One but not portable way of getting the pointer to the start of the data is therefore casting the object reference to a pointer and adding 4 bytes to it. There the first field should start.
Another way I can think of is utilising GCHandle.AddrOfPinnedObject. It is commonly used for accessing array or string data, but it works for other objects:
[StructLayout(LayoutKind.Sequential)]
class Obj
{
public int A;
}
var obj = new Obj();
var gc = GCHandle.Alloc(obj, GCHandleType.Pinned);
IntPtr interior = gc.AddrOfPinnedObject();
Marshal.WriteInt32(interior, 0, 16);
Console.WriteLine(obj.A);
I think this actually is quite portable, but still needs to pin the object (there is InternalAddrOfPinnedObject defined in GCHandle, but even if that doesn't check whether the handle is actually pinned, the returned value may not be valid if it was used on a non-pinned object).
Still, the technique Span uses seems like the most portable way of doing that, since a lot of the underlying work is done in pure CIL (like reference arithmetics).

What is the memory overhead of a .NET custom struct type?

There is a fixed overhead associated with a .NET object as more fully outlined in this SO question: What is the memory overhead of a .NET Object being 12 or 24 bytes depending on whether you are in a 32 bit or 64 bit process.
That said, basic value types like int, double, boolean, etc incur no overhead because they are Value Types.
Where does that leave custom struct types that you put together in your application? On the one hand, they are value types like int, double, boolean above [so should not incur overhead] but on the other hand they derive indirectly from System.Object and so should (technically) incur overhead.
The size of a struct is determined by the sum of the sizes of its fields plus the padding between the fields that get them aligned properly, plus padding at the end of the struct that ensures that they are still aligned properly when the struct is stored in an array.
So, for one, a struct is not entirely unlikely to contain a field of a reference type. Like a string. In which case the struct is going to be larger since references are pointers under the hood, taking 8 bytes instead of 4.
The padding is the much sneakier detail. In 32-bit mode, variables cannot count on an alignment better than 4. An issue with double and long, 8 byte types that can easily get mis-aligned. Notably affecting perf of a 32-bit program, if a double is misaligned across an L1 cache boundary line then a read or write can be 3x as slow. Also the core reason that these types are not atomic in the C# memory model. Not an issue in 64-bit mode, the CLR then must and does provide an alignment guarantee of 8.
Nevertheless, the CLR does attempt to give such struct members proper alignment in 32-bit mode, even though the struct itself is not guaranteed to be aligned. Otherwise a side-effect of structs having an implicit [StructLayout(LayoutKind.Sequential, Pack=8)] attribute. An oddity in the CLR source code, the C++ statement that does this has no comment. I suspect it was a quicky fix for less than stellar unmanaged interop perf, keeping structs blittable is pretty important to speed.
You'll however not always get this, the CLR gives up if the struct contains a member that is itself a struct that does not have sequential layout. Notably this happens for DateTime and DateTimeOffset, the programmers who wrote them applied the [StructLayout(LayoutKind.Auto)] attribute on them for very mysterious reasons. In the case of DateTimeOffset likely to be a copy/paste bug. Layout of your struct will now be unpredictable, it becomes LayoutKind.Auto as well and the CLR re-arranges fields to minimize the struct size. This can cause extra padding in x64 mode.
These are obscure implementation details that you should never fret about.
Where does that leave custom struct types that you put together in
your application?
They are no different than primitive types. They carry no additional overhead other than the fields they have. The fact that they derive from object doesn't mean they incur the overhead that reference types carry, which is the method table pointer and sync root block.
You can test this out using Marshal.SizeOf:
void Main()
{
var f = new Foo();
Console.WriteLine(Marshal.SizeOf(f));
}
public struct Foo
{
public int X { get; set; }
public int Y { get; set; }
}
This will print 8 in a 32bit mode, which is exactly two integer values (4 bytes each).
Note Marshal.SizeOf will output the size of the unmanaged object. The CLR is still free to re-order fields or pack them together.
You may have some overhead with fields' alignments:
https://msdn.microsoft.com/en-us/library/system.runtime.interopservices.structlayoutattribute(v=vs.100).aspx
E.g.
public struct Test {
public Byte A;
public long B;
}
will be of size 16 bytes on 64-bit process and 12 bytes on 32-bit process correspondingly (when we could expect 9 bytes only);
Zero.
Unless there's gaps due to alignment, in which case it's the cost of those gaps.
But otherwise the structure is just the same as if the fields were separate variables laid out on the stack.
Box it though, and you're dealing with it via object which has the same overheads as any other reference type.

Obtain non-explicit field offset

I have the following class:
[StructLayout(LayoutKind.Sequential)]
class Class
{
public int Field1;
public byte Field2;
public short? Field3;
public bool Field4;
}
How can I get the byte offset of Field4 starting from the start of the class data (or object header)?
To illustrate:
Class cls = new Class();
fixed(int* ptr1 = &cls.Field1) //first field
fixed(bool* ptr2 = &cls.Field4) //requested field
{
Console.WriteLine((byte*)ptr2-(byte*)ptr1);
}
The resulting offset is, in this case, 5, because the runtime actually moves Field3 to the end of the type (and pads it), probably because it its type is generic. I know there is Marshal.OffsetOf, but it returns unmanaged offset, not managed.
How can I retrieve this offset from a FieldInfo instance? Is there any .NET method used for that, or do I have to write my own, taking all the exceptions into account (type size, padding, explicit offsets, etc.)?
Offset of a field within a class or struct in .NET 4.7.2:
public static int GetFieldOffset(this FieldInfo fi) =>
GetFieldOffset(fi.FieldHandle);
public static int GetFieldOffset(RuntimeFieldHandle h) =>
Marshal.ReadInt32(h.Value + (4 + IntPtr.Size)) & 0xFFFFFF;
These return the byte offset of a field within a class or struct, relative to the layout of some respective managed instance at runtime. This works for all StructLayout modes, and for both value- and reference-types (including generics, reference-containing, or otherwise non-blittable). The offset value is zero-based relative to the beginning of the user-defined content or 'data body' of the struct or class only, and doesn't include any header, prefix, or other pad bytes.
Discussion
Since struct types have no header, the returned integer offset value can used directly via pointer arithmetic, and System.Runtime.CompilerServices.Unsafe if necessary (not shown here). Reference-type objects, on the other hand, have a header which has to be skipped-over in order to reference the desired field. This object header is usually a single IntPtr, which means IntPtr.Size needs to be added to the the offset value. It is also necessary to dereference the GC ("garbage collection") handle to obtain the object's address in the first place.
With these considerations, we can synthesize a tracking reference to the interior of a GC object at runtime by combining the field offset (obtained via the method shown above) with an instance of the class (e.g. an Object handle).
The following method, which is only meaningful for class (and not struct) types, demonstrates the technique. For simplicity, it uses ref-return and the System.Runtime.CompilerServices.Unsafe libary. Error checking, such as asserting fi.DeclaringType.IsSubclassOf(obj.GetType()) for example, is also elided for simplicity.
/// <summary>
/// Returns a managed reference ("interior pointer") to the value or instance of type 'U'
/// stored in the field indicated by 'fi' within managed object instance 'obj'
/// </summary>
public static unsafe ref U RefFieldValue<U>(Object obj, FieldInfo fi)
{
var pobj = Unsafe.As<Object, IntPtr>(ref obj);
pobj += IntPtr.Size + GetFieldOffset(fi.FieldHandle);
return ref Unsafe.AsRef<U>(pobj.ToPointer());
}
This method returns a managed "tracking" pointer into the interior of the garbage-collected object instance obj.[see comment] It can be used to arbitrarily read or write the field, so this one function replaces the traditional pair of separate getter/setter functions. Although the returned pointer cannot be stored in the GC heap and thus has a lifetime limited to the scope of the current stack frame (i.e., and below), it is very cheap to obtain at any time by simply calling the function again.
Note that this generic method is only parameterized with <U>, the type of the fetched pointed-at value, and not for the type ("<T>", perhaps) of the containing class (the same applies for the IL version below). It's because the bare-bones simplicity of this technique doesn't require it. We already know that the containing instance has to be a reference (class) type, so at runtime it will present via a reference handle to a GC object with object header, and those facts alone are sufficient here; nothing further needs to be known about putative type "T".
It's a matter of opinion whether adding vacuous <T, … >, which would allow us to indicate the where T: class constraint, would improve the look or feel of the example above. It certainly wouldn't hurt anything; I believe the JIT is smart enough to not generate additional generic method instantiations for generic arguments that have no effect. But since doing so seems chatty (other than for stating the constraint), I opted for the minimalism of strict necessity here.
In my own use, rather than passing a FieldInfo or its respective FieldHandle every time, what I actually retain are the various integer offset values for the fields of interest as returned from GetFieldOffset, since these are also invariant at runtime, once obtained. This eliminates the extra step (of calling GetFieldOffset) each time the pointer is fetched. In fact, since I am able to include IL code in my projects, here is the exact code that I use for the function above. As with the C# just shown, it trivially synthesizes a managed pointer from a containing GC-object obj, plus a (retained) integer offset offs within it.
// Returns a managed 'ByRef' pointer to the (struct or reference-type) instance of type U
// stored in the field at byte offset 'offs' within reference type instance 'obj'
.method public static !!U& RefFieldValue<U>(object obj, int32 offs) aggressiveinlining
{
ldarg obj
ldarg offs
sizeof object
add
add
ret
}
So even if you are not able to directly incorporate this IL, showing it here, I think, nicely illustrates the extremely low runtime overhead and alluring simplicity, in general, of this technique.
Example usage
class MyClass { public byte b_bar; public String s0, s1; public int iFoo; }
The first demonstration gets the integer offset of reference-typed field s1 within an instance of MyClass, and then uses it to get and set the field value.
var fi = typeof(MyClass).GetField("s1");
// note that we can get a field offset without actually
// having any instance of 'MyClass'
var offs = GetFieldOffset(fi);
// i.e., later...
var mc = new MyClass();
RefFieldValue<String>(mc, offs) = "moo-maa"; // field "setter"
// note: method call used as l-value, on the left-hand side of '=' assignment!
RefFieldValue<String>(mc, offs) += "!!"; // in-situ access
Console.WriteLine(mc.s1); // --> moo-maa!! (in the original)
// can be used as a non-ref "getter" for by-value access
var _ = RefFieldValue<String>(mc, offs) + "%%"; // 'mc.s1' not affected
If this seems a bit cluttered, you can dramatically clean it up by retaining the managed pointer as ref local variable. As you know, this type of pointer is automatically adjusted--with interior offset preserved--whenever the GC moves the containing object. This means that it will remain valid even as you continue accessing the field unawares. In exchange for allowing this capability, the CLR requires that the ref local variable itself not be allowed to escape its stack frame, which in this case is enforced by the C# compiler.
// demonstrate using 'RuntimeFieldHandle', and accessing a value-type
// field (int) this time
var h = typeof(MyClass).GetField(nameof(mc.iFoo)).FieldHandle;
// later... (still using 'mc' instance created above)
// acquire managed pointer to 'mc.iFoo'
ref int i = ref RefFieldValue<int>(mc, h);
i = 21; // directly affects 'mc.iFoo'
Console.WriteLine(mc.iFoo == 21); // --> true
i <<= 1; // operates directly on 'mc.iFoo'
Console.WriteLine(mc.iFoo == 42); // --> true
// any/all 'ref' uses of 'i' just affect 'mc.iFoo' directly:
Interlocked.CompareExchange(ref i, 34, 42); // 'mc.iFoo' (and 'i' also): 42 -> 34
Summary
The usage examples focused on using the technique with a class object, but as noted, the GetFieldOffset method shown here works perfectly fine with struct as well. Just be sure not to use the RefFieldValue method with value-types, since that code includes adjusting for an expected object header. For that simpler case, just use System.Runtime.CompilerServicesUnsafe.AddByteOffset for your address arithmetic instead.
Needless to say, this technique might seem a bit radical to some. I'll just note that it has worked flawlessly for me for many years, specifically on .NET Framework 4.7.2, and including 32- and 64-bit mode, debug vs. release, plus whichever various JIT optimization settings I've tried.
With some tricks around TypedReference.MakeTypedReference, it is possible to obtain the reference to the field, and to the start of the object's data, then just subtract. The method can be found in SharpUtils.

How to get size of class / struct object in C# [duplicate]

I have a class, and I want to inspect its fields and report eventually how many bytes each field takes. I assume all fields are of type Int32, byte, etc.
How can I find out easily how many bytes does the field take?
I need something like:
Int32 a;
// int a_size = a.GetSizeInBytes;
// a_size should be 4
You can't, basically. It will depend on padding, which may well be based on the CLR version you're using and the processor etc. It's easier to work out the total size of an object, assuming it has no references to other objects: create a big array, use GC.GetTotalMemory for a base point, fill the array with references to new instances of your type, and then call GetTotalMemory again. Take one value away from the other, and divide by the number of instances. You should probably create a single instance beforehand to make sure that no new JITted code contributes to the number. Yes, it's as hacky as it sounds - but I've used it to good effect before now.
Just yesterday I was thinking it would be a good idea to write a little helper class for this. Let me know if you'd be interested.
EDIT: There are two other suggestions, and I'd like to address them both.
Firstly, the sizeof operator: this only shows how much space the type takes up in the abstract, with no padding applied round it. (It includes padding within a structure, but not padding applied to a variable of that type within another type.)
Next, Marshal.SizeOf: this only shows the unmanaged size after marshalling, not the actual size in memory. As the documentation explicitly states:
The size returned is the actually the
size of the unmanaged type. The
unmanaged and managed sizes of an
object can differ. For character
types, the size is affected by the
CharSet value applied to that class.
And again, padding can make a difference.
Just to clarify what I mean about padding being relevant, consider these two classes:
class FourBytes { byte a, b, c, d; }
class FiveBytes { byte a, b, c, d, e; }
On my x86 box, an instance of FourBytes takes 12 bytes (including overhead). An instance of FiveBytes takes 16 bytes. The only difference is the "e" variable - so does that take 4 bytes? Well, sort of... and sort of not. Fairly obviously, you could remove any single variable from FiveBytes to get the size back down to 12 bytes, but that doesn't mean that each of the variables takes up 4 bytes (think about removing all of them!). The cost of a single variable just isn't a concept which makes a lot of sense here.
Depending on the needs of the questionee, Marshal.SizeOf might or might not give you what you want. (Edited after Jon Skeet posted his answer).
using System;
using System.Runtime.InteropServices;
public class MyClass
{
public static void Main()
{
Int32 a = 10;
Console.WriteLine(Marshal.SizeOf(a));
Console.ReadLine();
}
}
Note that, as jkersch says, sizeof can be used, but unfortunately only with value types. If you need the size of a class, Marshal.SizeOf is the way to go.
Jon Skeet has laid out why neither sizeof nor Marshal.SizeOf is perfect. I guess the questionee needs to decide wether either is acceptable to his problem.
From Jon Skeets recipe in his answer I tried to make the helper class he was refering to. Suggestions for improvements are welcome.
public class MeasureSize<T>
{
private readonly Func<T> _generator;
private const int NumberOfInstances = 10000;
private readonly T[] _memArray;
public MeasureSize(Func<T> generator)
{
_generator = generator;
_memArray = new T[NumberOfInstances];
}
public long GetByteSize()
{
//Make one to make sure it is jitted
_generator();
long oldSize = GC.GetTotalMemory(false);
for(int i=0; i < NumberOfInstances; i++)
{
_memArray[i] = _generator();
}
long newSize = GC.GetTotalMemory(false);
return (newSize - oldSize) / NumberOfInstances;
}
}
Usage:
Should be created with a Func that generates new Instances of T. Make sure the same instance is not returned everytime. E.g. This would be fine:
public long SizeOfSomeObject()
{
var measure = new MeasureSize<SomeObject>(() => new SomeObject());
return measure.GetByteSize();
}
It can be done indirectly, without considering the alignment.
The number of bytes that reference type instance is equal service fields size + type fields size.
Service fields(in 32x takes 4 bytes each, 64x 8 bytes):
Sysblockindex
Pointer to methods table
+Optional(only for arrays) array size
So, for class without any fileds, his instance takes 8 bytes on 32x machine. If it is class with one field, reference on the same class instance, so, this class takes(64x):
Sysblockindex + pMthdTable + reference on class = 8 + 8 + 8 = 24 bytes
If it is value type, it does not have any instance fields, therefore in takes only his fileds size. For example if we have struct with one int field, then on 32x machine it takes only 4 bytes memory.
I had to boil this down all the way to IL level, but I finally got this functionality into C# with a very tiny library.
You can get it (BSD licensed) at bitbucket
Example code:
using Earlz.BareMetal;
...
Console.WriteLine(BareMetal.SizeOf<int>()); //returns 4 everywhere I've tested
Console.WriteLine(BareMetal.SizeOf<string>()); //returns 8 on 64-bit platforms and 4 on 32-bit
Console.WriteLine(BareMetal.SizeOf<Foo>()); //returns 16 in some places, 24 in others. Varies by platform and framework version
...
struct Foo
{
int a, b;
byte c;
object foo;
}
Basically, what I did was write a quick class-method wrapper around the sizeof IL instruction. This instruction will get the raw amount of memory a reference to an object will use. For instance, if you had an array of T, then the sizeof instruction would tell you how many bytes apart each array element is.
This is extremely different from C#'s sizeof operator. For one, C# only allows pure value types because it's not really possible to get the size of anything else in a static manner. In contrast, the sizeof instruction works at a runtime level. So, however much memory a reference to a type would use during this particular instance would be returned.
You can see some more info and a bit more in-depth sample code at my blog
if you have the type, use the sizeof operator. it will return the type`s size in byte.
e.g.
Console.WriteLine(sizeof(int));
will output:
4
You can use method overloading as a trick to determine the field size:
public static int FieldSize(int Field) { return sizeof(int); }
public static int FieldSize(bool Field) { return sizeof(bool); }
public static int FieldSize(SomeStructType Field) { return sizeof(SomeStructType); }
Simplest way is: int size = *((int*)type.TypeHandle.Value + 1)
I know this is implementation detail but GC relies on it and it needs to be as close to start of the methodtable for efficiency plus taking into consideration how GC code complex is nobody will dare to change it in future. In fact it works for every minor/major versions of .net framework+.net core. (Currently unable to test for 1.0)
If you want more reliable way, emit a struct in a dynamic assembly with [StructLayout(LayoutKind.Auto)] with exact same fields in same order, take its size with sizeof IL instruction. You may want to emit a static method within struct which simply returns this value. Then add 2*IntPtr.Size for object header. This should give you exact value.
But if your class derives from another class, you need to find each size of base class seperatly and add them + 2*Inptr.Size again for header. You can do this by getting fields with BindingFlags.DeclaredOnly flag.
System.Runtime.CompilerServices.Unsafe
Use System.Runtime.CompilerServices.Unsafe.SizeOf<T>() where T: unmanaged
(when not running in .NET Core you need to install that NuGet package)
Documentation states:
Returns the size of an object of the given type parameter.
It seems to use the sizeof IL-instruction just as Earlz solution does as well. (source)
The unmanaged constraint is new in C# 7.3

How do you explain C++ pointers to a C#/Java developer? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
I am a C#/Java developer trying to learn C++. As I try to learn the concept of pointers, I am struck with the thought that I must have dealt with this concept before. How can pointers be explained using only concepts that are familiar to a .NET or Java developer? Have I really never dealt with this, is it just hidden to me, or do I use it all the time without calling it that?
Java objects in C++
A Java object is the equivalent of a C++ shared pointer.
A C++ pointer is like a Java object without the garbage collection built in.
C++ objects.
C++ has three ways of allocating objects:
Static Storage Duration objects.
These are created at startup (before main) and die after main exits.
There are some technical caveats to that but that is the basics.
Automatic Storage Duration objects.
These are created when declared and destroyed when they go out of scope.
I believe these are like C# structs
Dynamic Storage Duration objects
These are created via new and the closest to a C#/Java object (AKA pointers)
Technically pointers need to be destroyed manually via delete. But this is considered bad practice and under normal situations they are put inside Automatic Storage Duration Objects (usually called smart pointers) that control their lifespan. When the smart pointer goes out of scope it is destroyed and its destructor can call delete on the pointer. Smart pointers can be though of as fine grain garbage collectors.
The closest to Java is the shared_ptr, this is a smart pointer that keeps a count of the number of users of the pointer and deletes it when nobody is using it.
You are "using pointers" all the time in C#, it's just hidden from you.
The best way I reckon to approach the problem is to think about the way a computer works. Forget all of the fancy stuff of .NET: you have the memory, which just holds byte values, and the processor, which just does things to these byte values.
The value of a given variable is stored in memory, so is associated with a memory address. Rather than having to use the memory address all the time, the compiler lets you read from it and write to it using a name.
Furthermore, you can choose to interpret a value as a memory address at which you wish to find another value. This is a pointer.
For example, lets say our memory contains the following values:
Address [0] [1] [2] [3] [4] [5] [6] [7]
Data 5 3 1 8 2 7 9 4
Let's define a variable, x, which the compiler has chosen to put at address 2. It can be seen that the value of x is 1.
Let's now define a pointer, p which the compiler has chosen to put at address 7. The value of p is 4. The value pointed to by p is the value at address 4, which is the value 2. Getting at the value is called dereferencing.
An important concept to note is that there is no such thing as a type as far as memory is concerned: there are just byte values. You can choose to interpret these byte values however you like. For example, dereferencing a char pointer will just get 1 byte representing an ASCII code, but dereferencing an int pointer may get 4 bytes making up a 32 bit value.
Looking at another example, you can create a string in C with the following code:
char *str = "hello, world!";
What that does is says the following:
Put aside some bytes in our stack frame for a variable, which we'll call str.
This variable will hold a memory address, which we wish to interpret as a character.
Copy the address of the first character of the string into the variable.
(The string "hello, world!" will be stored in the executable file and hence will be loaded into memory when the program loads)
If you were to look at the value of str you'd get an integer value which represents an address of the first character of the string. However, if we dereference the pointer (that is, look at what it's pointing to) we'll get the letter 'h'.
If you increment the pointer, str++;, it will now point to the next character. Note that pointer arithmetic is scaled. That means that when you do arithmetic on a pointer, the effect is multiplied by the size of the type it thinks it's pointing at. So assuming int is 4 bytes wide on your system, the following code will actually add 4 to the pointer:
int *ptr = get_me_an_int_ptr();
ptr++;
If you end up going past the end of the string, there's no telling what you'll be pointing at; but your program will still dutifully attempt to interpret it as a character, even if the value was actually supposed to represent an integer for example. You may well be trying to access memory which is not allocated to your program however, and your program will be killed by the operating system.
A final useful tip: arrays and pointer arithmetic are the same thing, it's just syntactic sugar. If you have a variable, char *array, then
array[5]
is completely equivalent to
*(array + 5)
A pointer is the address of an object.
Well, technically a pointer value is the address of an object. A pointer object is an object (variable, call it what you prefer) capable of storing a pointer value, just as an int object is an object capable of storing an integer value.
["Object" in C++ includes instances of class types, and also of built-in types (and arrays, etc). An int variable is an object in C++, if you don't like that then tough luck, because you have to live with it ;-)]
Pointers also have static type, telling the programmer and the compiler what type of object it's the address of.
What's an address? It's one of those 0x-things with numbers and letters it it that you might sometimes have seen in a debugger. For most architectures we can consider memory (RAM, to over-simplify) as a big sequence of bytes. An object is stored in a region of memory. The address of an object is the index of the first byte occupied by that object. So if you have the address, the hardware can get at whatever's stored in the object.
The consequences of using pointers are in some ways the same as the consequences of using references in Java and C# - you're referring to an object indirectly. So you can copy a pointer value around between function calls without having to copy the whole object. You can change an object via one pointer, and other bits of code with pointers to the same object will see the changes. Sharing immutable objects can save memory compared with lots of different objects all having their own copy of the same data that they all need.
C++ also has something it calls "references", which share these properties to do with indirection but are not the same as references in Java. Nor are they the same as pointers in C++ (that's another question).
"I am struck with the thought that I must have dealt with this concept before"
Not necessarily. Languages may be functionally equivalent, in the sense that they all compute the same functions as a Turing machine can compute, but that doesn't mean that every worthwhile concept in programming is explicitly present in every language.
If you wanted to simulate the C memory model in Java or C#, though, I suppose you'd create a very large array of bytes. Pointers would be indexes in the array. Loading an int from a pointer would involve taking 4 bytes starting at that index, and multiplying them by successive powers of 256 to get the total (as happens when you deserialize an int from a bytestream in Java). If that sounds like a ridiculous thing to do, then it's because you haven't dealt with the concept before, but nevertheless it's what your hardware has been doing all along in response to your Java and C# code[*]. If you didn't notice it, then it's because those languages did a good job of creating other abstractions for you to use instead.
Literally the closest the Java language comes to the "address of an object" is that the default hashCode in java.lang.Object is, according to the docs, "typically implemented by converting the internal address of the object into an integer". But in Java, you can't use an object's hashcode to access the object. You certainly can't add or subtract a small number to a hashcode in order to access memory within or in the vicinity of the original object. You can't make mistakes in which you think that your pointer refers to the object you intend it to, but actually it refers to some completely unrelated memory location whose value you're about to scribble all over. In C++ you can do all those things.
[*] well, not multiplying and adding 4 bytes to get an int, not even shifting and ORing, but "loading" an int from 4 bytes of memory.
References in C# act the same way as pointers in C++, without all the messy syntax.
Consider the following C# code:
public class A
{
public int x;
}
public void AnotherFunc(A a)
{
a.x = 2;
}
public void SomeFunc()
{
A a = new A();
a.x = 1;
AnotherFunc(a);
// a.x is now 2
}
Since classes are references types, we know that we are passing an existing instance of A to AnotherFunc (unlike value types, which are copied).
In C++, we use pointers to make this explicit:
class A
{
public:
int x;
};
void AnotherFunc(A* a) // notice we are pointing to an existing instance of A
{
a->x = 2;
}
void SomeFunc()
{
A a;
a.x = 1;
AnotherFunc(&a);
// a.x is now 2
}
"How can pointers be explained using only concepts that are familiar to a .NET or Java developer? "
I'd suggest that there are really two distinct things that need to be learnt.
The first is how to use pointers, and heap allocated memory, to solve specific problems. With an appropriate style, using shared_ptr<> for example, this can be done in a manner analogous to that of Java. A shared_ptr<> has a lot in common with a Java object handle.
Secondly, however, I would suggest that pointers in general are a fundamentally lower level concept that Java, and to a lesser extent C#, deliberately hides. To program in C++ without moving to that level will guarantee a host of problems. You need to think in terms of the underlying memory layout and think of pointers as literally pointers to specific pieces of storage.
To attempt to understand this lower level in terms of higher concepts would be an odd path to take.
Get two sheets of large format graph paper, some scissors and a friend to help you.
Each square on the sheets of paper represents one byte.
One sheet is the stack.
The other sheet is the heap. Give the heap to your friend - he is the memory manager.
You are going to pretend to be a C program and you'll need some memory. When running your program, cut out chunks from the stack and the heap to represent memory allocation.
Ready?
void main() {
int a; /* Take four bytes from the stack. */
int *b = malloc(sizeof(int)); /* Take four bytes from the heap. */
a = 1; /* Write on your first little bit of graph paper, WRITE IT! */
*b = 2; /* Get writing (on the other bit of paper) */
b = malloc(sizeof(int)); /* Take another four bytes from the heap.
Throw the first 'b' away. Do NOT give it
back to your friend */
free(b); /* Give the four bytes back to your friend */
*b = 3; /* Your friend must now kill you and bury the body */
} /* Give back the four bytes that were 'a' */
Try with some more complex programs.
Explain the difference between the stack and the heap and where objects go.
Value types such as structs (both C++ and C#) go on the stack. Reference types (class instances) get put on the heap. A pointer (or reference) points to the memory location on the heap for that specific instance.
Reference type is the key word. Using a pointer in C++ is like using ref keyword in C#.
Managed apps make working with this stuff easy so .NET devs are spared the hassle and confusion. Glad I don't do C anymore.
The key for me was to understand the way memory works. Variables are stored in memory. The places in which you can put variables in memory are numbered. A pointer is a variable that holds this number.
Any C# programmer that understands the semantic differences between classes and structs should be able to understand pointers. I.e., explaining in terms of value vs. reference semantics (in .NET terms) should get the point across; I wouldn't complicate things by trying to explain in terms of ref (or out).
In C#, all references to classes are roughly the equivalent to pointers in the C++ world. For value types (structs, ints, etc..) this is not the case.
C#:
void func1(string parameter)
void func2(int parameter)
C++:
void func1(string* parameter)
void func2(int parameter)
Passing a parameter using the ref keyword in C# is equivalent to passing a parameter by reference in C++.
C#:
void func1(ref string parameter)
void func2(ref int parameter)
C++:
void func1((string*)& parameter)
void func2(int& parameter)
If the parameter is a class, it would be like passing a pointer by reference.

Categories