Can a DynamicMethod access variables outside of it? - c#

Can a Dynamic Method work like a normal method or code, in that it can access variables where variables can normally be accessed, call methods, and initialize variables (of course in the scope of the method)?
The only examples I've seen are ones where it is passed some parameters, returns some value, and does nothing to change any variables outside of it.
I'm talking about the System.Reflection.Emit.DynamicMethod class. I'm having trouble understanding it since one needs to use MSIL which I don't know much of yet.

Yes. A DynamicMethod can be attached to a class, in which case it can access class-private static fields (and possibly class-private instance fields if the DynamicMethod is an instance method, but I don't recall whether that's a supported scenario). It can also access methods, properties, and types that are internal to the assembly in which the DynamicMethod is created.
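For illustration, here is a minimal sketch of that first point; the Secrets type and its counter field are made up for the example:
using System;
using System.Reflection;
using System.Reflection.Emit;

class Secrets
{
    private static int counter = 41;
}

class Program
{
    static void Main()
    {
        // Associating the DynamicMethod with Secrets (and skipping visibility
        // checks) lets the emitted IL read the class-private static field.
        var dm = new DynamicMethod("ReadCounter", typeof(int), Type.EmptyTypes,
                                   typeof(Secrets), /* skipVisibility */ true);
        FieldInfo field = typeof(Secrets).GetField("counter",
                              BindingFlags.NonPublic | BindingFlags.Static);
        ILGenerator il = dm.GetILGenerator();
        il.Emit(OpCodes.Ldsfld, field);   // push Secrets.counter
        il.Emit(OpCodes.Ldc_I4_1);        // push 1
        il.Emit(OpCodes.Add);             // counter + 1
        il.Emit(OpCodes.Ret);             // return it

        var read = (Func<int>)dm.CreateDelegate(typeof(Func<int>));
        Console.WriteLine(read());        // prints 42
    }
}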
What's the scenario you're using dynamic methods for?
I have some blog articles about dynamic programming, including a couple of entries about using the DynamicMethod class, on my blog: http://robpaveza.net/tag/dynamic-programming . Specifically, this article talks about how to calculate a file revision proof, and you can see the resulting implementation here (evidently, I never wrote part 2, but the implementation in BN# that I linked was the result of the analysis).
Let me walk you through the Compile method:
Type parameterType = typeof(uint).MakeByRefType()
The final method is going to take out uint parameters (four of them); this line obtains the runtime type that represents a uint passed by reference. The method declaration would look like this if I were to write it in normal C#:
public static void CheckRevision(out uint a, out uint b, out uint c, out uint s);
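For context, here's a hedged sketch of how a DynamicMethod with that signature might be declared (the names are illustrative, not the blog's exact code):
using System;
using System.Reflection.Emit;

Type parameterType = typeof(uint).MakeByRefType();
var checkRevision = new DynamicMethod(
    "CheckRevision",
    typeof(void),
    new[] { parameterType, parameterType, parameterType, parameterType });
ILGenerator generator = checkRevision.GetILGenerator();
// ... IL for each formula gets emitted against 'generator' here, followed by OpCodes.Ret ...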
Lines 38-40: foreach (string formula in formulas) CompileStandardFormula(generator, formula);
As I mention in my blog post about it, the math that I do is always provided in the form of:
A=A-S B=B-C C=C+A A=A+B
Where A, B, and C are state variables and S is an input (the next uint value from the file).
The CompileStandardFormula function emits the IL that computes the logic for one individual operation of the four shown. Recall that the CLR is a stack-based virtual machine: math operations pop their operands off the evaluation stack and push the result back on. So, for A=A-S, for example, the following IL is what would be emitted:
ldarg.0 // push &A, which is a reference to the location that actually contains the value of A
ldarg.0 // push &A
ldind.u4 // dereference the top-most value on the stack, which puts the actual value of &A ready for operation
ldarg.3 // push &S
ldind.u4 // dereference &S
sub // subtracts [stack-1] from [stack-2], which effectively is A-S
stind.i4 // remember the first ldarg.0? That address gets consumed now, and the subtraction result is stored there (there is no stind.u4; stind.i4 writes the 4-byte value)
So, at this point, it should be pretty easy to figure out: my DynamicMethod compiles the math operation required to update all state variables for a single pass in the file. After all of the IL is emitted, because we know the state of the stack has nothing on it (more than when the method entered, anyway), we can just throw out a quick 'ret' instruction and we're done.
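To make that concrete, here is a hedged sketch of how the A = A - S sequence above might be emitted through an ILGenerator (the helper name is mine; per the CheckRevision signature, arg 0 is &A and arg 3 is &S):
using System.Reflection.Emit;

static class RevisionEmit
{
    // Emits the "A = A - S" sequence shown above.
    public static void EmitASubS(ILGenerator generator)
    {
        generator.Emit(OpCodes.Ldarg_0);   // push &A (the address we will store into)
        generator.Emit(OpCodes.Ldarg_0);   // push &A again
        generator.Emit(OpCodes.Ldind_U4);  // dereference: the value of A
        generator.Emit(OpCodes.Ldarg_3);   // push &S
        generator.Emit(OpCodes.Ldind_U4);  // dereference: the value of S
        generator.Emit(OpCodes.Sub);       // A - S
        generator.Emit(OpCodes.Stind_I4);  // store the result through the first &A
    }
}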
Anyway, hope this is helpful.

Related

Why are stack-based value type fields guaranteed to be zero?

Please note that this question is NOT the same as "Why do local variables require initialization, but fields do not?" or "Why can't I define a default constructor for a struct in .NET?".
Let's say we have the following code:
struct MyStruct {
    public int num;
}

static void Main(string[] args) {
    MyStruct m = new MyStruct();
    Console.WriteLine(m.num); // displays 0
}
We can see that when we use new MyStruct(), the default parameterless constructor is invoked, which initializes its fields to their default values (0 in this case).
But if we do:
static void Main(string[] args) {
    MyStruct m;
    Console.WriteLine(m.num); // compile error
}
This code doesn't compile because we have to assign a value to the struct's fields before we can use them. This means that when we just declare a struct with MyStruct m;, the struct's default constructor won't be called; the stack pointer just gets decremented by the size of m (allocating space for m). When the stack grows that way, the space could contain any value (e.g. values left over from previous stack operations).
But if I put a breakpoint:
static void Main(string[] args) {
    MyStruct m;
    <------------ breakpoint here
    ...
}
and run in debug mode, then when I hover over m I can clearly see that m.num is 0; it is always zero no matter how many times I try.
How come it is always zero? Does the CLR initialize the newly allocated stack space to 0? If the CLR does initialize it to 0, then that means MyStruct m; is equivalent to MyStruct m = new MyStruct();. So why doesn't the Microsoft team make MyStruct m; the same as MyStruct m = new MyStruct();?
This also happens with integers and other built-ins; consider local variable int x. The C# spec mandates that it must be assigned prior to use, but the IL initializes it to 0.
Verifiable IL (which is what C# produces unless you use an "unsafe" feature like pointers or the SkipLocalsInit feature Matthew Watson mentioned) guarantees the locals are zeroed out at method startup.
I'd guess this is because if the locals weren't zeroed, they could contain arbitrary data (whatever happened to be on the call stack prior to adding the current method's frame), which would hinder the JITter's safety guarantees.
Otherwise, the JITter would also have to do some kind of definite assignment analysis itself, which would be an unnecessary cost if high-level languages are already going to guarantee it.
C# has an additional rule that says local variables must be definitely assigned prior to use. This extends to fields of value-type variables.
(I'll add that this feature has been helpful to me in the past: if I have a method that returns a struct, and I start the method off with a local variable that I return at the end of the method, then I can use this compiler check to guarantee all code paths through the method fully populate the struct.)
So you're right in that at runtime MyStruct m; and MyStruct m = new MyStruct(); will both lead to m being zeroed-out. The difference is that C# enforces an additional compile-time requirement.
As to why there is this difference, that's a matter of language design. IL is intended to be quickly understood and compiled by the JITter, so fewer, simpler rules make that job easier. But C# is intended to help developers write programs, and checking that a local is definitely assigned before access is apparently worth the compile-time cost, because it ensures developers don't forget to intentionally give value-type variables a value before using them.
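As an aside, a minimal sketch of what opting out of that zero-initialization looks like (assumes .NET 5 or later with AllowUnsafeBlocks enabled; the bytes observed are whatever happened to be on the stack):
using System;
using System.Runtime.CompilerServices;

class Program
{
    // [SkipLocalsInit] removes the 'localsinit' flag from the emitted method,
    // so stackalloc'd memory is no longer guaranteed to be zeroed.
    [SkipLocalsInit]
    static unsafe void Demo()
    {
        byte* buffer = stackalloc byte[64];
        Console.WriteLine(buffer[0]);   // may print non-zero garbage
    }

    static void Main() => Demo();
}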

Why is the Pinnable<T> class in C# 7.2 defined the way it is?

I'm aware that Pinnable<T> is an internal class used by the methods in the new Unsafe class, and it's not meant to be used anywhere else other than in that class. This question is not about something practical, but it's just to understand why it's been designed like this and to learn a bit more about the language and its various "tricks" like this one.
As a recap, the Pinnable<T> class is defined here, and it looks like this:
[StructLayout(LayoutKind.Sequential)]
internal sealed class Pinnable<T>
{
public T Data;
}
And it's mainly used in the Span<T>.DangerousCreate method, here:
public static Span<T> DangerousCreate(object obj, ref T objectData, int length)
{
    Pinnable<T> pinnable = Unsafe.As<Pinnable<T>>(obj);
    IntPtr byteOffset = Unsafe.ByteOffset<T>(ref pinnable.Data, ref objectData);
    return new Span<T>(pinnable, byteOffset, length);
}
The reason for Pinnable<T> is that it's used to keep track of the original object, in case the Span<T> instance was created from one (instead of from a native pointer).
Given that the reference type doesn't matter when pinning a reference (fixing either a ref T or Unsafe.As<T, byte>(ref T) works the same), is there a specific reason why the Pinnable<T> class was made generic? The original design in DotNetCross here in fact had a Pinnable class with just a single byte field, and it worked just the same. Is there any reason why using a generic class in this case would be an advantage, other than avoiding having to cast the reference every time it is written/read/returned?
Is there any other way, besides this unsafe cast done with Unsafe.As, to get a reference to an object (I mean a reference to the object's contents; otherwise it would be the same as any variable of a class type)? That is, any way to get such a reference (which should basically have the same address as the actual object variable in the first place, right?) without having to go through some custom-defined secondary class.
First of all, the Struct in [StructLayout(LayoutKind.Sequential)] doesn't mean the attribute is only valid for structs; it refers to the layout of the actual structure of the fields in memory, be it in a class or in a value type. This controls the actual runtime layout of the data, not just how the type would marshal to unmanaged code. Sequential is important because without it, the runtime is pretty much free to lay out the fields however it sees fit, which means that Data might have some padding before it.
From what I understand about the implementation, the reason for Pinnable is to allow creating an instance of Span over memory that may be moved by the GC, without having to pin the object first. If you don't use actual pointers, just references, nothing needs to be pinned at all.
I have noticed that it was introduced in a commit with a description saying it made Span more "portable" (a bold word for something that does a lot of unsafe things). I can't think of any other reason than something related to alignment for why it is generic. I suppose representing a T in terms of an offset from another T is better than as an offset from a byte. It may happen that the type of the first field may play a role in its actual address, even if the type was marked with LayoutKind.Sequential.
A reference to an object is different from an interior reference to an object (a reference to its data). It is implementation defined, but in .NET Framework, an instance of any class (or a boxed value type) starts with a header consisting of a sync block (for lock) and a pointer to the method table, a.k.a. the type of the object. On 32-bit, the header is 8 bytes, but the actual pointer points to the pointer to the method table (for performance reasons, getting the type happens more often than locking an object).
One way (though not a portable one) of getting a pointer to the start of the data is therefore to cast the object reference to a pointer and add 4 bytes (the pointer size on 32-bit) to it. That is where the first field should start.
Another way I can think of is utilising GCHandle.AddrOfPinnedObject. It is commonly used for accessing array or string data, but it works for other objects:
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
class Obj
{
    public int A;
}
class Demo
{
    static void Main()
    {
        var obj = new Obj();
        var gc = GCHandle.Alloc(obj, GCHandleType.Pinned); // pin so the data cannot move
        IntPtr interior = gc.AddrOfPinnedObject();         // address of the object's data
        Marshal.WriteInt32(interior, 0, 16);               // writes 16 into the first field
        Console.WriteLine(obj.A);                          // prints 16
        gc.Free();
    }
}
I think this actually is quite portable, but still needs to pin the object (there is InternalAddrOfPinnedObject defined in GCHandle, but even if that doesn't check whether the handle is actually pinned, the returned value may not be valid if it was used on a non-pinned object).
Still, the technique Span uses seems like the most portable way of doing that, since a lot of the underlying work is done in pure CIL (like reference arithmetic).
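To tie that together, here is a hedged C# sketch of the same trick outside the Span source; RawData<T> and FirstField are made-up stand-ins for Pinnable<T> and the cast in DangerousCreate, and it assumes the System.Runtime.CompilerServices.Unsafe API is available:
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Mirror of the Pinnable<T> idea: a class whose single field lines up
// with the start of the target object's data.
[StructLayout(LayoutKind.Sequential)]
internal sealed class RawData<T>
{
    public T Data;
}

internal static class InteriorRef
{
    // Reinterprets the object reference and returns a managed reference
    // to what is assumed to be a T at the start of the object's data.
    public static ref T FirstField<T>(object obj) =>
        ref Unsafe.As<RawData<T>>(obj).Data;
}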

Obtain non-explicit field offset

I have the following class:
[StructLayout(LayoutKind.Sequential)]
class Class
{
    public int Field1;
    public byte Field2;
    public short? Field3;
    public bool Field4;
}
How can I get the byte offset of Field4 starting from the start of the class data (or object header)?
To illustrate:
Class cls = new Class();
fixed (int* ptr1 = &cls.Field1)   // first field
fixed (bool* ptr2 = &cls.Field4)  // requested field
{
    Console.WriteLine((byte*)ptr2 - (byte*)ptr1);
}
The resulting offset is, in this case, 5, because the runtime actually moves Field3 to the end of the type (and pads it), probably because its type is generic (Nullable<short>). I know there is Marshal.OffsetOf, but it returns the unmanaged offset, not the managed one.
How can I retrieve this offset from a FieldInfo instance? Is there any .NET method used for that, or do I have to write my own, taking all the exceptions into account (type size, padding, explicit offsets, etc.)?
Offset of a field within a class or struct in .NET 4.7.2:
// Reads the field's managed offset out of the CLR's internal field descriptor
// that the handle points to (layout as observed on .NET Framework 4.7.2).
public static int GetFieldOffset(this FieldInfo fi) =>
    GetFieldOffset(fi.FieldHandle);

public static int GetFieldOffset(RuntimeFieldHandle h) =>
    Marshal.ReadInt32(h.Value + (4 + IntPtr.Size)) & 0xFFFFFF;
These return the byte offset of a field within a class or struct, relative to the layout of some respective managed instance at runtime. This works for all StructLayout modes, and for both value- and reference-types (including generics, reference-containing, or otherwise non-blittable). The offset value is zero-based relative to the beginning of the user-defined content or 'data body' of the struct or class only, and doesn't include any header, prefix, or other pad bytes.
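For example, against the Class type from the question (a hedged usage sketch; it assumes the GetFieldOffset extension above lives in a static class that is in scope, and the exact offset depends on the layout the runtime chooses):
using System.Reflection;

class OffsetDemo
{
    static void Main()
    {
        FieldInfo fi = typeof(Class).GetField(nameof(Class.Field4));
        int offset = fi.GetFieldOffset();   // 5 on the asker's runtime; other layouts may differ
        System.Console.WriteLine(offset);
    }
}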
Discussion
Since struct types have no header, the returned integer offset value can be used directly via pointer arithmetic, with System.Runtime.CompilerServices.Unsafe if necessary (not shown here). Reference-type objects, on the other hand, have a header which has to be skipped over in order to reference the desired field. This object header is usually a single IntPtr, which means IntPtr.Size needs to be added to the offset value. It is also necessary to dereference the GC ("garbage collection") handle to obtain the object's address in the first place.
With these considerations, we can synthesize a tracking reference to the interior of a GC object at runtime by combining the field offset (obtained via the method shown above) with an instance of the class (e.g. an Object handle).
The following method, which is only meaningful for class (and not struct) types, demonstrates the technique. For simplicity, it uses ref-return and the System.Runtime.CompilerServices.Unsafe library. Error checking, such as asserting that fi.DeclaringType.IsAssignableFrom(obj.GetType()), is also elided for simplicity.
/// <summary>
/// Returns a managed reference ("interior pointer") to the value or instance of type 'U'
/// stored in the field indicated by 'fi' within managed object instance 'obj'
/// </summary>
public static unsafe ref U RefFieldValue<U>(Object obj, FieldInfo fi)
{
    var pobj = Unsafe.As<Object, IntPtr>(ref obj);
    pobj += IntPtr.Size + GetFieldOffset(fi.FieldHandle);
    return ref Unsafe.AsRef<U>(pobj.ToPointer());
}
This method returns a managed "tracking" pointer into the interior of the garbage-collected object instance obj. It can be used to arbitrarily read or write the field, so this one function replaces the traditional pair of separate getter/setter functions. Although the returned pointer cannot be stored in the GC heap and thus has a lifetime limited to the scope of the current stack frame (and those below it), it is very cheap to obtain at any time by simply calling the function again.
Note that this generic method is only parameterized with <U>, the type of the fetched pointed-at value, and not for the type ("<T>", perhaps) of the containing class (the same applies for the IL version below). It's because the bare-bones simplicity of this technique doesn't require it. We already know that the containing instance has to be a reference (class) type, so at runtime it will present via a reference handle to a GC object with object header, and those facts alone are sufficient here; nothing further needs to be known about putative type "T".
It's a matter of opinion whether adding vacuous <T, … >, which would allow us to indicate the where T: class constraint, would improve the look or feel of the example above. It certainly wouldn't hurt anything; I believe the JIT is smart enough to not generate additional generic method instantiations for generic arguments that have no effect. But since doing so seems chatty (other than for stating the constraint), I opted for the minimalism of strict necessity here.
In my own use, rather than passing a FieldInfo or its respective FieldHandle every time, what I actually retain are the various integer offset values for the fields of interest as returned from GetFieldOffset, since these are also invariant at runtime, once obtained. This eliminates the extra step (of calling GetFieldOffset) each time the pointer is fetched. In fact, since I am able to include IL code in my projects, here is the exact code that I use for the function above. As with the C# just shown, it trivially synthesizes a managed pointer from a containing GC-object obj, plus a (retained) integer offset offs within it.
// Returns a managed 'ByRef' pointer to the (struct or reference-type) instance of type U
// stored in the field at byte offset 'offs' within reference type instance 'obj'
.method public static !!U& RefFieldValue<U>(object obj, int32 offs) aggressiveinlining
{
    ldarg obj
    ldarg offs
    sizeof object
    add
    add
    ret
}
So even if you are not able to directly incorporate this IL, showing it here, I think, nicely illustrates the extremely low runtime overhead and alluring simplicity, in general, of this technique.
Example usage
class MyClass { public byte b_bar; public String s0, s1; public int iFoo; }
The first demonstration gets the integer offset of reference-typed field s1 within an instance of MyClass, and then uses it to get and set the field value.
var fi = typeof(MyClass).GetField("s1");
// note that we can get a field offset without actually
// having any instance of 'MyClass'
var offs = GetFieldOffset(fi);
// i.e., later...
var mc = new MyClass();
RefFieldValue<String>(mc, offs) = "moo-maa"; // field "setter"
// note: method call used as l-value, on the left-hand side of '=' assignment!
RefFieldValue<String>(mc, offs) += "!!"; // in-situ access
Console.WriteLine(mc.s1); // --> moo-maa!! (the change shows up in the original object)
// can be used as a non-ref "getter" for by-value access
var _ = RefFieldValue<String>(mc, offs) + "%%"; // 'mc.s1' not affected
If this seems a bit cluttered, you can dramatically clean it up by retaining the managed pointer as ref local variable. As you know, this type of pointer is automatically adjusted--with interior offset preserved--whenever the GC moves the containing object. This means that it will remain valid even as you continue accessing the field unawares. In exchange for allowing this capability, the CLR requires that the ref local variable itself not be allowed to escape its stack frame, which in this case is enforced by the C# compiler.
// demonstrate using 'RuntimeFieldHandle', and accessing a value-type
// field (int) this time
var h = typeof(MyClass).GetField(nameof(mc.iFoo)).FieldHandle;
// later... (still using 'mc' instance created above)
// acquire managed pointer to 'mc.iFoo'
ref int i = ref RefFieldValue<int>(mc, GetFieldOffset(h));
i = 21; // directly affects 'mc.iFoo'
Console.WriteLine(mc.iFoo == 21); // --> true
i <<= 1; // operates directly on 'mc.iFoo'
Console.WriteLine(mc.iFoo == 42); // --> true
// any/all 'ref' uses of 'i' just affect 'mc.iFoo' directly:
Interlocked.CompareExchange(ref i, 34, 42); // 'mc.iFoo' (and 'i' also): 42 -> 34
Summary
The usage examples focused on using the technique with a class object, but as noted, the GetFieldOffset method shown here works perfectly fine with struct as well. Just be sure not to use the RefFieldValue method with value types, since that code includes adjusting for an expected object header. For that simpler case, just use System.Runtime.CompilerServices.Unsafe.AddByteOffset for your address arithmetic instead.
Needless to say, this technique might seem a bit radical to some. I'll just note that it has worked flawlessly for me for many years, specifically on .NET Framework 4.7.2, and including 32- and 64-bit mode, debug vs. release, plus whichever various JIT optimization settings I've tried.
With some tricks around TypedReference.MakeTypedReference, it is possible to obtain the reference to the field, and to the start of the object's data, then just subtract. The method can be found in SharpUtils.

How do you explain C++ pointers to a C#/Java developer? [closed]

I am a C#/Java developer trying to learn C++. As I try to learn the concept of pointers, I am struck with the thought that I must have dealt with this concept before. How can pointers be explained using only concepts that are familiar to a .NET or Java developer? Have I really never dealt with this, is it just hidden to me, or do I use it all the time without calling it that?
Java objects in C++
A Java object is the equivalent of a C++ shared pointer.
A C++ pointer is like a Java object without the garbage collection built in.
C++ objects.
C++ has three ways of allocating objects:
Static Storage Duration objects.
These are created at startup (before main) and die after main exits.
There are some technical caveats to that but that is the basics.
Automatic Storage Duration objects.
These are created when declared and destroyed when they go out of scope.
I believe these are like C# structs
Dynamic Storage Duration objects
These are created via new and are the closest to a C#/Java object (these are the ones you handle via pointers)
Technically pointers need to be destroyed manually via delete. But this is considered bad practice and under normal situations they are put inside Automatic Storage Duration objects (usually called smart pointers) that control their lifespan. When the smart pointer goes out of scope it is destroyed and its destructor can call delete on the pointer. Smart pointers can be thought of as fine-grained garbage collectors.
The closest to Java is the shared_ptr, this is a smart pointer that keeps a count of the number of users of the pointer and deletes it when nobody is using it.
You are "using pointers" all the time in C#, it's just hidden from you.
The best way I reckon to approach the problem is to think about the way a computer works. Forget all of the fancy stuff of .NET: you have the memory, which just holds byte values, and the processor, which just does things to these byte values.
The value of a given variable is stored in memory, so is associated with a memory address. Rather than having to use the memory address all the time, the compiler lets you read from it and write to it using a name.
Furthermore, you can choose to interpret a value as a memory address at which you wish to find another value. This is a pointer.
For example, let's say our memory contains the following values:
Address  [0] [1] [2] [3] [4] [5] [6] [7]
Data       5   3   1   8   2   7   9   4
Let's define a variable, x, which the compiler has chosen to put at address 2. It can be seen that the value of x is 1.
Let's now define a pointer, p which the compiler has chosen to put at address 7. The value of p is 4. The value pointed to by p is the value at address 4, which is the value 2. Getting at the value is called dereferencing.
An important concept to note is that there is no such thing as a type as far as memory is concerned: there are just byte values. You can choose to interpret these byte values however you like. For example, dereferencing a char pointer will just get 1 byte representing an ASCII code, but dereferencing an int pointer may get 4 bytes making up a 32 bit value.
Looking at another example, you can create a string in C with the following code:
char *str = "hello, world!";
That line does the following:
Put aside some bytes in our stack frame for a variable, which we'll call str.
This variable will hold a memory address, which we wish to interpret as a character.
Copy the address of the first character of the string into the variable.
(The string "hello, world!" will be stored in the executable file and hence will be loaded into memory when the program loads)
If you were to look at the value of str you'd get an integer value which represents an address of the first character of the string. However, if we dereference the pointer (that is, look at what it's pointing to) we'll get the letter 'h'.
If you increment the pointer, str++;, it will now point to the next character. Note that pointer arithmetic is scaled. That means that when you do arithmetic on a pointer, the effect is multiplied by the size of the type it thinks it's pointing at. So assuming int is 4 bytes wide on your system, the following code will actually add 4 to the pointer:
int *ptr = get_me_an_int_ptr();
ptr++;
If you end up going past the end of the string, there's no telling what you'll be pointing at; but your program will still dutifully attempt to interpret it as a character, even if the value was actually supposed to represent an integer for example. You may well be trying to access memory which is not allocated to your program however, and your program will be killed by the operating system.
A final useful tip: array indexing and pointer arithmetic are the same thing; indexing is just syntactic sugar. If you have a variable, char *array, then
array[5]
is completely equivalent to
*(array + 5)
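For a C#-flavoured illustration of the same equivalence, a sketch using unsafe code (requires compiling with /unsafe):
using System;

class Program
{
    static unsafe void Main()
    {
        int[] array = { 10, 20, 30, 40, 50, 60 };
        fixed (int* p = array)            // pin the array and point at its first element
        {
            Console.WriteLine(p[5]);      // indexing...
            Console.WriteLine(*(p + 5));  // ...is the same as scaled pointer arithmetic: both print 60
        }
    }
}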
A pointer is the address of an object.
Well, technically a pointer value is the address of an object. A pointer object is an object (variable, call it what you prefer) capable of storing a pointer value, just as an int object is an object capable of storing an integer value.
["Object" in C++ includes instances of class types, and also of built-in types (and arrays, etc). An int variable is an object in C++, if you don't like that then tough luck, because you have to live with it ;-)]
Pointers also have static type, telling the programmer and the compiler what type of object it's the address of.
What's an address? It's one of those 0x-things with numbers and letters in it that you might sometimes have seen in a debugger. For most architectures we can consider memory (RAM, to over-simplify) as a big sequence of bytes. An object is stored in a region of memory. The address of an object is the index of the first byte occupied by that object. So if you have the address, the hardware can get at whatever's stored in the object.
The consequences of using pointers are in some ways the same as the consequences of using references in Java and C# - you're referring to an object indirectly. So you can copy a pointer value around between function calls without having to copy the whole object. You can change an object via one pointer, and other bits of code with pointers to the same object will see the changes. Sharing immutable objects can save memory compared with lots of different objects all having their own copy of the same data that they all need.
C++ also has something it calls "references", which share these properties to do with indirection but are not the same as references in Java. Nor are they the same as pointers in C++ (that's another question).
"I am struck with the thought that I must have dealt with this concept before"
Not necessarily. Languages may be functionally equivalent, in the sense that they all compute the same functions as a Turing machine can compute, but that doesn't mean that every worthwhile concept in programming is explicitly present in every language.
If you wanted to simulate the C memory model in Java or C#, though, I suppose you'd create a very large array of bytes. Pointers would be indexes in the array. Loading an int from a pointer would involve taking 4 bytes starting at that index, and multiplying them by successive powers of 256 to get the total (as happens when you deserialize an int from a bytestream in Java). If that sounds like a ridiculous thing to do, then it's because you haven't dealt with the concept before, but nevertheless it's what your hardware has been doing all along in response to your Java and C# code[*]. If you didn't notice it, then it's because those languages did a good job of creating other abstractions for you to use instead.
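A quick sketch of that simulation in C# (the class name and the little-endian byte order are my own choices):
using System;

class SimulatedMemory
{
    // The whole "address space" is just a byte array; a pointer is an index into it.
    private readonly byte[] memory = new byte[1024];

    public int LoadInt32(int pointer) =>
        memory[pointer]
        | memory[pointer + 1] << 8
        | memory[pointer + 2] << 16
        | memory[pointer + 3] << 24;

    public void StoreInt32(int pointer, int value)
    {
        memory[pointer]     = (byte)value;
        memory[pointer + 1] = (byte)(value >> 8);
        memory[pointer + 2] = (byte)(value >> 16);
        memory[pointer + 3] = (byte)(value >> 24);
    }
}

class Program
{
    static void Main()
    {
        var mem = new SimulatedMemory();
        mem.StoreInt32(8, 123456);          // "write through pointer 8"
        Console.WriteLine(mem.LoadInt32(8)); // 123456
    }
}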
Literally the closest the Java language comes to the "address of an object" is that the default hashCode in java.lang.Object is, according to the docs, "typically implemented by converting the internal address of the object into an integer". But in Java, you can't use an object's hashcode to access the object. You certainly can't add or subtract a small number to a hashcode in order to access memory within or in the vicinity of the original object. You can't make mistakes in which you think that your pointer refers to the object you intend it to, but actually it refers to some completely unrelated memory location whose value you're about to scribble all over. In C++ you can do all those things.
[*] well, not multiplying and adding 4 bytes to get an int, not even shifting and ORing, but "loading" an int from 4 bytes of memory.
References in C# act the same way as pointers in C++, without all the messy syntax.
Consider the following C# code:
public class A
{
    public int x;
}

public void AnotherFunc(A a)
{
    a.x = 2;
}

public void SomeFunc()
{
    A a = new A();
    a.x = 1;
    AnotherFunc(a);
    // a.x is now 2
}
Since classes are references types, we know that we are passing an existing instance of A to AnotherFunc (unlike value types, which are copied).
In C++, we use pointers to make this explicit:
class A
{
public:
    int x;
};

void AnotherFunc(A* a) // notice we are pointing to an existing instance of A
{
    a->x = 2;
}

void SomeFunc()
{
    A a;
    a.x = 1;
    AnotherFunc(&a);
    // a.x is now 2
}
"How can pointers be explained using only concepts that are familiar to a .NET or Java developer? "
I'd suggest that there are really two distinct things that need to be learnt.
The first is how to use pointers, and heap allocated memory, to solve specific problems. With an appropriate style, using shared_ptr<> for example, this can be done in a manner analogous to that of Java. A shared_ptr<> has a lot in common with a Java object handle.
Secondly, however, I would suggest that pointers in general are a fundamentally lower level concept that Java, and to a lesser extent C#, deliberately hides. To program in C++ without moving to that level will guarantee a host of problems. You need to think in terms of the underlying memory layout and think of pointers as literally pointers to specific pieces of storage.
To attempt to understand this lower level in terms of higher concepts would be an odd path to take.
Get two sheets of large format graph paper, some scissors and a friend to help you.
Each square on the sheets of paper represents one byte.
One sheet is the stack.
The other sheet is the heap. Give the heap to your friend - he is the memory manager.
You are going to pretend to be a C program and you'll need some memory. When running your program, cut out chunks from the stack and the heap to represent memory allocation.
Ready?
#include <stdlib.h>

int main(void) {
    int a;                        /* Take four bytes from the stack. */
    int *b = malloc(sizeof(int)); /* Take four bytes from the heap. */
    a = 1;                        /* Write on your first little bit of graph paper, WRITE IT! */
    *b = 2;                       /* Get writing (on the other bit of paper) */
    b = malloc(sizeof(int));      /* Take another four bytes from the heap.
                                     Throw the first 'b' away. Do NOT give it
                                     back to your friend */
    free(b);                      /* Give the four bytes back to your friend */
    *b = 3;                       /* Your friend must now kill you and bury the body */
}                                 /* Give back the four bytes that were 'a' */
Try with some more complex programs.
Explain the difference between the stack and the heap and where objects go.
Value types such as structs (in both C++ and C#) go on the stack when they are local variables. Reference types (class instances) get put on the heap. A pointer (or reference) points to the memory location on the heap for that specific instance.
Reference type is the key phrase. Using a pointer in C++ is like using the ref keyword in C#.
Managed apps make working with this stuff easy so .NET devs are spared the hassle and confusion. Glad I don't do C anymore.
The key for me was to understand the way memory works. Variables are stored in memory. The places in which you can put variables in memory are numbered. A pointer is a variable that holds this number.
Any C# programmer that understands the semantic differences between classes and structs should be able to understand pointers. I.e., explaining in terms of value vs. reference semantics (in .NET terms) should get the point across; I wouldn't complicate things by trying to explain in terms of ref (or out).
In C#, all references to classes are roughly equivalent to pointers in the C++ world. For value types (structs, ints, etc.) this is not the case.
C#:
void func1(string parameter)
void func2(int parameter)
C++:
void func1(string* parameter)
void func2(int parameter)
Passing a parameter using the ref keyword in C# is equivalent to passing a parameter by reference in C++.
C#:
void func1(ref string parameter)
void func2(ref int parameter)
C++:
void func1(string*& parameter)
void func2(int& parameter)
If the parameter is a class, it would be like passing a pointer by reference.

Calling a non-void function without using its return value. What actually happens?

So, I found a similar question here, but the answers are more about style and whether or not you are able to do it.
My question is, what actually happens when you call a non-void function that returns an object, but you never assign or use said returned object? So, less about whether or not you can, because I absolutely know you can and understand the other question linked above... what does the compiler/runtime environment do?
This is not a language specific question, but if you answer, please specify what language you are referring to, since behaviors will differ.
I believe that for both C# and Java, the result ends up on the stack, and the compiler then forces a pop instruction to ignore it. Eric Lippert's blog post on "The void is invariant" has some more information on this.
For example, consider the following C# code:
using System;

public class Test
{
    static int Foo() { return 10; }

    public static void Main()
    {
        Foo();
        Foo();
    }
}
The IL generated (by the MS C# 4 compiler) for the Main method is:
.method public hidebysig static void Main() cil managed
{
    .entrypoint
    .maxstack 8
    L_0000: call int32 Test::Foo()
    L_0005: pop
    L_0006: call int32 Test::Foo()
    L_000b: pop
    L_000c: ret
}
Note the calls to pop - which disappear if you make Foo a void method.
what does the compiler do?
The compiler generates a pop instruction that discards the result off the virtual stack.
what does the runtime environment do?
It typically JIT-compiles the code into machine code that passes the return value back in a register, rather than a stack location. (Typically EAX on x86 architectures.)
The jitter knows that the value will go unused, so it probably generates code that clears the register. Or perhaps it just leaves it hanging around in the register for a while.
Which runtime environment do you care about? There are lots of them, and they all have different jitters.
It depends a bit on the calling convention being used. For small/simple types, the return will typically happen in a register. In this case, the function will write the value into the register, but nothing else will pay attention, and the next time that register is needed (which will typically happen pretty quickly) it'll be overwritten with something else.
For larger types, the compiler will normally allocate a structure to hold the return value. Exactly where/how it does that allocation will vary with the compiler though -- in some cases it'll be a static structure, and the contents will be ignored and overwritten the next time you call a function returning the same type. In other cases it'll be on the stack, and even though you haven't used it, it still needs to be allocated before the function is called, and freed afterwards.
For C++, typically, the compiler will optimize out the returning of the variable, turning it into a void function, and if this enables further optimizations, the compiler may optimize out the entire function call or parts of it that only pertain to the return value. For Java and C#, I have little idea.
In .NET, if the object being returned is a reference type, and the application has no other references to that object, then you'll still have an object floating around in memory until the garbage collector decides to collect it.
This is potentially bad if the object being returned happens to be holding on to resources. If it implements the IDisposable interface, then your code ought to be calling the Dispose method on it, but in this case the Dispose method would never be called.
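As a concrete C# illustration (a hedged sketch; it assumes a file named data.txt exists):
using System.IO;

class Program
{
    static void Main()
    {
        // Return value ignored: the FileStream is not disposed here, so its
        // file handle lingers until a garbage collection finalizes it.
        File.OpenRead("data.txt");

        // Return value observed: disposed deterministically when the block ends.
        using (FileStream fs = File.OpenRead("data.txt"))
        {
            // ... read from fs ...
        }
    }
}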
