How does C#/.NET implement pinning of ref/in/out parameters?


In "unsafe" C# code, it is possible to get a pointer to a ref, in, or out parameter by using the fixed statement:
class A
{
    unsafe void Test(ref int i)
    {
        fixed (int* ptr = &i)
        {
            // Do something with ptr.
        }
    }
}
The fixed statement "pins" the memory for i in place for the duration of the block so that the GC won't move the memory for i someplace else, which would invalidate ptr.
So my question, which I ask out of curiosity and a desire to better understand the performance implications of pinning ref/in/out parameters, is: How does C# and/or the .NET runtime know what object, if any, actually needs to be pinned? Because if i is a reference to a member field of an object, then doesn't it need to pin that whole object? And if i is a reference to a local variable in the calling function, then isn't there nothing that needs to be pinned at all? Does it somehow walk up the call stack until it finds the actual variable or field referred to by i? (Which sounds potentially expensive.)

Related

converting .net reference to c++ reference

I have a C++ function called innergetnum that takes a float& num parameter.
In managed C++, I wrote a function with the following code:
void getnum(float% num)
{
innercppclass.innergetnum(num);
}
It doesn't work because the compiler fails to convert num to float&.
The only solution I found is to create an additional temp variable float tmp, pass it to innergetnum, and then assign it to num.
Unfortunately I have many ref variables I want to pass, the code looks ugly, and the temp variable feels like a hack.
Is there a better way to solve it?

error C2664: 'innercppclass::getnum' : cannot convert parameter 1 from 'float' to 'float &'
An object from the gc heap (a dereferenced gc pointer) cannot be converted to a native reference
You forgot to document the error you're dealing with, this is what you saw. This is entirely by design and fundamental to the way managed code works.
The float% argument of the managed function can be an interior pointer to a managed object, like the field of a ref class. The float& reference will be a raw unmanaged pointer at runtime, pointing to the float value. Which allows the callee to update the value. Both are just plain pointers at runtime, the only difference is that the garbage collector can see the interior pointer but not the unmanaged pointer. The jitter tells it where to look for the managed pointer, no such help for the native C++ function since it wasn't jitted.
So assigning the unmanaged pointer with the interior pointer value would be possible. However, something very nasty happens when the garbage collector runs while the native code is running. Note that a GC can occur when other threads in the program allocate memory. One important thing the GC does is compact the heap, it moves managed objects as part of the collection. A very desirable trait, it gets rid of the holes in the heap and makes managed code fast by improving locality of reference. Trouble is, the native code is holding a pointer to where that float used to be before it got moved. And if it writes through the pointer, updating the float value, it will corrupt the GC heap.
There isn't any way that the GC can stop the native code from doing this, it doesn't know where the pointer value is located in memory so it cannot update it. No such trouble with the interior pointer but an unsolvable problem for the native reference.
So the compiler complains, it can't possibly generate code that won't crash your program sooner or later (usually later) with completely undiagnosable heap damage. You already found the workaround, you need to generate the pointer from a storage location that is not the GC heap. The stack is fine, local variables never move:
void Managed::getnum(float% num) {
    float temp;
    innercppclass::getnum(temp);
    num = temp;
}
This is otherwise the kind of code you'd also write if you turned void getnum(float%) into float getnum(). Or, more typically in managed code, a property getter:
property float num {
    float get() {
        float temp;
        innercppclass::getnum(temp);
        return temp;
    }
}
Nothing much else you can do about it, it is a very fundamental restriction.
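To make the failure mode above concrete, here is a toy model in standard C++ (not C++/CLI). It is only a sketch: ToyHeap, Handle, compact() and the other names are made up for illustration and have nothing to do with the real CLR. A handle plays the role of a reference the collector can see and update, while a raw slot index plays the role of the native float& that it cannot see.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, not the CLR: a bump-allocated "heap" of float slots.
using Handle = std::size_t;

struct Slot {
    float value;
    bool live;
    Handle owner;  // which handle currently refers to this slot
};

class ToyHeap {
public:
    explicit ToyHeap(std::size_t capacity) : slots_(capacity), top_(0) {}

    // Allocate the next free slot and return a handle the heap can track.
    Handle allocate(float value) {
        assert(top_ < slots_.size());
        Handle h = location_.size();
        location_.push_back(top_);
        slots_[top_] = Slot{value, true, h};
        ++top_;
        return h;
    }

    void release(Handle h) { slots_[location_[h]].live = false; }

    // The current "address" of the object: what a native pointer would capture.
    std::size_t raw_slot(Handle h) const { return location_[h]; }

    // Access through a handle always follows the object to its new location.
    float& through_handle(Handle h) { return slots_[location_[h]].value; }

    // Access through a captured address ignores any moves that happened since.
    float& through_raw(std::size_t slot) {
        assert(slot < slots_.size());
        return slots_[slot].value;
    }

    // Slide live slots down over dead ones and fix up the handle table.
    void compact() {
        std::size_t next = 0;
        for (std::size_t i = 0; i < top_; ++i) {
            if (!slots_[i].live) continue;
            slots_[next] = slots_[i];
            location_[slots_[next].owner] = next;
            ++next;
        }
        top_ = next;
    }

private:
    std::vector<Slot> slots_;
    std::vector<std::size_t> location_;  // handle -> current slot index
    std::size_t top_;
};
```

If you capture raw_slot() for an object, free an earlier object, compact, and then allocate again, a write through the captured slot lands in the newly allocated object rather than the one you meant. That is the heap corruption described above, in miniature.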

Storing a "managed" context parameter in an unmanaged DLL

I don't know if this is a bad idea or not. I'm using an unmanaged DLL (written by me) in C#.
There are some callback functions that can be set up in the DLL, but these can only be mapped to static class members on the C# side.
Since I want to make a callback operate on a particular class instance I'm wondering if it would be safe to store a class instance pointer inside the DLL's state information.
From the DLL's perspective this will simply be a 32-bit context integer, but from the C# side this will be an actual class "pointer" or "reference", with the callback signature defined something like so:
public delegate void StatusChangeHandler(ContextClass context, int someCallbackValue);
It does compile and it does appear to work, I just don't know if this is guaranteed. Is this an acceptable practice?

One problem I see here is that .NET has a garbage collector, which can move your object around, so a saved pointer may be invalidated. For simple types such as arrays, you can prevent this by pinning the object like this:
byte[] b = new byte[1000];
// Pin b and get a pointer to its first element.
fixed (byte* ptr = b)
{
    // Use the fixed pointer to b. b will not be moved until execution leaves the fixed block.
}
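The fixed statement cannot pin an arbitrary class instance, though, and .NET will not pin one for you automatically.
For a context object you would have to hand the native side a GCHandle instead, something like this:
var ctx = new Context();
GCHandle handle = GCHandle.Alloc(ctx);
try
{
    // The IntPtr identifies the handle, not the object's address, so it stays valid if ctx moves.
    StatusChange(GCHandle.ToIntPtr(handle));
    // Do other stuff, and don't free the handle until you can clear the value in the native library.
}
finally
{
    handle.Free();
}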
But really, I think a much simpler and more reliable way is to create a static dictionary for your context objects and give your native DLL only a key for that dictionary, which could be a number, string, or GUID. That is, anything that is a value, not a pointer.
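A minimal sketch of that key-instead-of-pointer idea, written in C++ only to keep the examples on this page in one testable language (on the C# side a static Dictionary<int, ContextClass> would play the same role). ContextRegistry, Context, and on_status_change are hypothetical names, not anything from the question's DLL.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical context object that the callback should operate on.
struct Context {
    int last_status = 0;
};

// Maps plain integer keys to live contexts; the native library only ever stores the key.
class ContextRegistry {
public:
    // Register a context and return the key to hand to the native side.
    std::int32_t add(Context* ctx) {
        assert(ctx != nullptr);
        const std::int32_t key = next_key_++;
        contexts_[key] = ctx;
        return key;
    }

    // Look a key up again; returns nullptr for unknown or already removed keys.
    Context* find(std::int32_t key) const {
        const auto it = contexts_.find(key);
        return it == contexts_.end() ? nullptr : it->second;
    }

    void remove(std::int32_t key) { contexts_.erase(key); }

private:
    std::unordered_map<std::int32_t, Context*> contexts_;
    std::int32_t next_key_ = 1;
};

// Shape of a static callback: it receives the key back and resolves it itself.
void on_status_change(const ContextRegistry& registry, std::int32_t key, int value) {
    if (Context* ctx = registry.find(key)) {
        ctx->last_status = value;
    }
}
```

Because a stale key simply fails the lookup, a callback that arrives after the context was removed becomes a no-op instead of a write through a dangling pointer.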

Is it safe to keep C++ pointers in C#?

I'm currently working on some C#/C++ code which makes use of P/Invoke. On the C++ side there is a std::vector full of pointers, each identified by index from the C# code. For example, a function declaration would look like this:
void SetName(char* name, int idx)
But now I'm thinking, since I'm working with pointers, couldn't I send the pointer address itself to C#, and then in code do something like this:
void SetName(char* name, int ptr)
{
    ((TypeName*)ptr)->name = name;
}
Obviously that's a quick version of what I'm getting at (and probably won't compile).
Would the pointer address be guaranteed to stay constant in C++ such that I can safely store its address in C# or would this be too unstable or dangerous for some reason?

In C#, you don't need to use a pointer here, you can just use a plain C# string.
[DllImport(...)]
static extern void SetName(string name, int id);
This works because the default behavior of strings in p/invoke is to use MarshalAs(UnmanagedType.LPStr), which converts to a C-style char*. You can mark each argument in the C# declaration explicitly if it requires some other way of being marshalled, e.g. [MarshalAs(UnmanagedType.LPWStr)] for an argument that uses a 2-byte-per-character string.
The only reason to use pointers is if you need to retain access to the data pointed to after you've called the function. Even then, you can use out parameters most of the time.
You can p/invoke basically anything without requiring pointers at all (and thus without requiring unsafe code, which requires privileged execution in some environments).

Yes, no problem. Native memory allocations never move, so storing the pointer in an IntPtr on the C# side is fine. You need some kind of pinvoked function that returns this pointer, then
[DllImport("something.dll", CharSet = CharSet.Ansi)]
static extern void SetName(IntPtr vector, string name, int index);
Which intentionally lies about this C++ function:
void SetName(std::vector<std::string>* vect, const char* name, int index) {
    std::string copy = name;
    (*vect)[index] = copy;
}
Note the copy made in the C++ code: you have to copy the string. The passed name argument points to a buffer allocated by the pinvoke marshaller and is only valid for the duration of the function body. Your original code cannot work. If you intend to return pointers to vector<> elements then be very careful. A vector re-allocates its internal array when you add elements. Such a returned pointer will then become invalid and you'll corrupt the heap when you use it later. The exact same thing happens with a C# List<> but without the risk of dangling pointers.
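The two points above (copy the marshalled string, and don't keep pointers into a growing vector) can be sketched roughly like this. set_name and append_moves_storage are illustrative names, not the asker's actual exports, and the sketch passes the vector by reference rather than through an IntPtr.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Store a copy of the caller's characters. The incoming buffer is assumed to
// be valid only for the duration of the call, as with a marshalled string.
void set_name(std::vector<std::string>& names, std::size_t index, const char* name) {
    assert(index < names.size());
    assert(name != nullptr);
    names[index] = name;  // std::string copies the characters into its own storage
}

// Append one element and report whether that forced the vector to move its
// element array, which invalidates every pointer previously taken into it.
bool append_moves_storage(std::vector<std::string>& names) {
    const bool will_reallocate = names.size() == names.capacity();
    names.push_back(std::string());
    return will_reallocate;
}
```

An index stays meaningful across such a reallocation, which is one reason to identify elements by index (or to store pointers to separately allocated objects) rather than by the address of a vector element.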

I think it's stable as long as you control the C++ code and are fully aware of what it does, and the other developers working on the same code know about that danger too.
So in my opinion it's not a very safe architecture, and I would avoid it as much as I can.
Regards.

The C# GC moves things, but the C++ heap does not move anything: a pointer to an allocated object is guaranteed to remain valid until you delete it. The best architecture for this situation is just to send the pointer to C# as an IntPtr and then take it back in C++.
It's certainly a vastly, incredibly better idea than the incredibly BAD, HORRIFIC integer cast you've got going there.
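For what the pointer-as-handle approach might look like on the C++ side, here is a rough sketch. Entry and the entry_* functions are hypothetical exports invented for illustration. The static_assert documents the reason an int cast is risky: a pointer only survives a round trip through an integer type that is at least as wide as the pointer, which int is not on typical 64-bit platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical native object whose address is handed to C# as an IntPtr.
struct Entry {
    std::string name;
};

// intptr_t (like IntPtr in C#) is wide enough to hold a pointer; int may not be.
static_assert(sizeof(std::intptr_t) >= sizeof(void*), "intptr_t must hold a pointer");

extern "C" {

// C# would declare these with IntPtr for every Entry* parameter and return value.
Entry* entry_create() { return new Entry(); }

void entry_set_name(Entry* entry, const char* name) {
    assert(entry != nullptr && name != nullptr);
    entry->name = name;  // copy, because the marshalled buffer dies after the call
}

const char* entry_get_name(const Entry* entry) {
    assert(entry != nullptr);
    return entry->name.c_str();
}

// The address stays valid until this is called; C# must not use it afterwards.
void entry_destroy(Entry* entry) { delete entry; }

}  // extern "C"
```

The C# side treats the value as opaque: it never dereferences it, it only passes it back to these functions.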

How do you explain C++ pointers to a C#/Java developer? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am a C#/Java developer trying to learn C++. As I try to learn the concept of pointers, I am struck with the thought that I must have dealt with this concept before. How can pointers be explained using only concepts that are familiar to a .NET or Java developer? Have I really never dealt with this, is it just hidden to me, or do I use it all the time without calling it that?

Java objects in C++
A Java object is the equivalent of a C++ shared pointer.
A C++ pointer is like a Java object without the garbage collection built in.
C++ objects.
C++ has three ways of allocating objects:
Static Storage Duration objects.
These are created at startup (before main) and die after main exits.
There are some technical caveats to that but that is the basics.
Automatic Storage Duration objects.
These are created when declared and destroyed when they go out of scope.
I believe these are like C# structs
Dynamic Storage Duration objects
These are created via new and the closest to a C#/Java object (AKA pointers)
Technically, pointers need to be destroyed manually via delete. But this is considered bad practice, and under normal circumstances they are put inside Automatic Storage Duration objects (usually called smart pointers) that control their lifespan. When the smart pointer goes out of scope it is destroyed, and its destructor can call delete on the pointer. Smart pointers can be thought of as fine-grained garbage collectors.
The closest to Java is the shared_ptr, this is a smart pointer that keeps a count of the number of users of the pointer and deletes it when nobody is using it.
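A small sketch of that reference counting, using only the standard library. Widget and the two helper functions are made-up names for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical object managed by shared_ptr.
struct Widget {
    int value = 0;
};

// While a second shared_ptr to the same object exists, the count is one higher.
long owners_while_shared(const std::shared_ptr<Widget>& first) {
    assert(first != nullptr);
    std::shared_ptr<Widget> second = first;  // a second owner, like a second Java reference
    return first.use_count();
}

// A weak_ptr observes the object without owning it, so it can tell us whether
// the object was destroyed once the last owner went out of scope.
bool is_destroyed_after_last_owner() {
    std::weak_ptr<Widget> observer;
    {
        std::shared_ptr<Widget> owner = std::make_shared<Widget>();
        observer = owner;
        assert(!observer.expired());
    }  // last owner destroyed here, so the Widget is deleted immediately
    return observer.expired();
}
```

Unlike a garbage collector, the destruction happens deterministically at the closing brace where the last owner disappears.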

You are "using pointers" all the time in C#, it's just hidden from you.
The best way I reckon to approach the problem is to think about the way a computer works. Forget all of the fancy stuff of .NET: you have the memory, which just holds byte values, and the processor, which just does things to these byte values.
The value of a given variable is stored in memory, so is associated with a memory address. Rather than having to use the memory address all the time, the compiler lets you read from it and write to it using a name.
Furthermore, you can choose to interpret a value as a memory address at which you wish to find another value. This is a pointer.
For example, let's say our memory contains the following values:
Address  [0] [1] [2] [3] [4] [5] [6] [7]
Data      5   3   1   8   2   7   9   4
Let's define a variable, x, which the compiler has chosen to put at address 2. It can be seen that the value of x is 1.
Let's now define a pointer, p which the compiler has chosen to put at address 7. The value of p is 4. The value pointed to by p is the value at address 4, which is the value 2. Getting at the value is called dereferencing.
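The same eight-cell memory can be modelled as an array, with addresses as indexes. This is a toy sketch; Memory, load, and dereference are illustrative names, and each cell holds a whole int rather than a byte to keep the arithmetic out of the way.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Eight memory cells, indexed by address, holding the values from the table.
using Memory = std::array<int, 8>;

Memory example_memory() {
    return Memory{{5, 3, 1, 8, 2, 7, 9, 4}};
}

// Reading a variable: fetch the value stored at its address.
int load(const Memory& memory, std::size_t address) {
    assert(address < memory.size());
    return memory[address];
}

// Dereferencing a pointer: read the pointer's own value, then treat that
// value as an address and read again.
int dereference(const Memory& memory, std::size_t pointer_address) {
    const int target = load(memory, pointer_address);
    assert(target >= 0);
    return load(memory, static_cast<std::size_t>(target));
}
```

With x at address 2 and p at address 7, load(memory, 2) is the value of x, load(memory, 7) is the value of p, and dereference(memory, 7) is the value p points to.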
An important concept to note is that there is no such thing as a type as far as memory is concerned: there are just byte values. You can choose to interpret these byte values however you like. For example, dereferencing a char pointer will just get 1 byte representing an ASCII code, but dereferencing an int pointer may get 4 bytes making up a 32 bit value.
Looking at another example, you can create a string in C with the following code:
char *str = "hello, world!";
What that does is says the following:
Put aside some bytes in our stack frame for a variable, which we'll call str.
This variable will hold a memory address, which we wish to interpret as the address of a character.
Copy the address of the first character of the string into the variable.
(The string "hello, world!" will be stored in the executable file and hence will be loaded into memory when the program loads)
If you were to look at the value of str you'd get an integer value which represents an address of the first character of the string. However, if we dereference the pointer (that is, look at what it's pointing to) we'll get the letter 'h'.
If you increment the pointer, str++;, it will now point to the next character. Note that pointer arithmetic is scaled. That means that when you do arithmetic on a pointer, the effect is multiplied by the size of the type it thinks it's pointing at. So assuming int is 4 bytes wide on your system, the following code will actually add 4 to the pointer:
int *ptr = get_me_an_int_ptr();
ptr++;
If you end up going past the end of the string, there's no telling what you'll be pointing at; but your program will still dutifully attempt to interpret it as a character, even if the value was actually supposed to represent an integer for example. You may well be trying to access memory which is not allocated to your program however, and your program will be killed by the operating system.
A final useful tip: array indexing and pointer arithmetic are the same thing; the indexing syntax is just syntactic sugar. If you have a variable, char *array, then
array[5]
is completely equivalent to
*(array + 5)
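Both claims, scaled arithmetic and the indexing equivalence, can be checked directly. The function names below are made up for this sketch; the pointer passed to the first one is assumed to point at a real int so that ptr + 1 is a valid one-past-the-end pointer at worst.

```cpp
#include <cassert>
#include <cstddef>

// How many bytes does "ptr + 1" actually move an int pointer?
std::ptrdiff_t bytes_advanced_by_increment(const int* ptr) {
    assert(ptr != nullptr);
    const int* next = ptr + 1;
    // Viewing both addresses as char pointers measures the distance in bytes.
    return reinterpret_cast<const char*>(next) - reinterpret_cast<const char*>(ptr);
}

// array[i] and *(array + i) name the same element at the same address.
bool index_matches_pointer_arithmetic(const char* array, std::size_t i) {
    assert(array != nullptr);
    return &array[i] == array + i && array[i] == *(array + i);
}
```

For an int array the first function returns sizeof(int), whatever that is on your system, and for the string "hello, world!" index 5 and *(str + 5) both give the comma.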

A pointer is the address of an object.
Well, technically a pointer value is the address of an object. A pointer object is an object (variable, call it what you prefer) capable of storing a pointer value, just as an int object is an object capable of storing an integer value.
["Object" in C++ includes instances of class types, and also of built-in types (and arrays, etc). An int variable is an object in C++, if you don't like that then tough luck, because you have to live with it ;-)]
Pointers also have static type, telling the programmer and the compiler what type of object it's the address of.
What's an address? It's one of those 0x-things with numbers and letters in it that you might sometimes have seen in a debugger. For most architectures we can consider memory (RAM, to over-simplify) as a big sequence of bytes. An object is stored in a region of memory. The address of an object is the index of the first byte occupied by that object. So if you have the address, the hardware can get at whatever's stored in the object.
The consequences of using pointers are in some ways the same as the consequences of using references in Java and C# - you're referring to an object indirectly. So you can copy a pointer value around between function calls without having to copy the whole object. You can change an object via one pointer, and other bits of code with pointers to the same object will see the changes. Sharing immutable objects can save memory compared with lots of different objects all having their own copy of the same data that they all need.
C++ also has something it calls "references", which share these properties to do with indirection but are not the same as references in Java. Nor are they the same as pointers in C++ (that's another question).
"I am struck with the thought that I must have dealt with this concept before"
Not necessarily. Languages may be functionally equivalent, in the sense that they all compute the same functions as a Turing machine can compute, but that doesn't mean that every worthwhile concept in programming is explicitly present in every language.
If you wanted to simulate the C memory model in Java or C#, though, I suppose you'd create a very large array of bytes. Pointers would be indexes in the array. Loading an int from a pointer would involve taking 4 bytes starting at that index, and multiplying them by successive powers of 256 to get the total (as happens when you deserialize an int from a bytestream in Java). If that sounds like a ridiculous thing to do, then it's because you haven't dealt with the concept before, but nevertheless it's what your hardware has been doing all along in response to your Java and C# code[*]. If you didn't notice it, then it's because those languages did a good job of creating other abstractions for you to use instead.
Literally the closest the Java language comes to the "address of an object" is that the default hashCode in java.lang.Object is, according to the docs, "typically implemented by converting the internal address of the object into an integer". But in Java, you can't use an object's hashcode to access the object. You certainly can't add or subtract a small number to a hashcode in order to access memory within or in the vicinity of the original object. You can't make mistakes in which you think that your pointer refers to the object you intend it to, but actually it refers to some completely unrelated memory location whose value you're about to scribble all over. In C++ you can do all those things.
[*] well, not multiplying and adding 4 bytes to get an int, not even shifting and ORing, but "loading" an int from 4 bytes of memory.
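The byte-array simulation described above might look like this. It is a sketch under one assumption the answer doesn't state: the least significant byte comes first (little-endian), so the byte at the pointer is multiplied by 1, the next by 256, and so on. load_u32 is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// "Memory" is one big array of bytes and a "pointer" is an index into it.
// Loads a 32-bit value from the four bytes starting at that index, weighting
// them by successive powers of 256 (little-endian order is an assumption).
std::uint32_t load_u32(const std::vector<std::uint8_t>& memory, std::size_t pointer) {
    assert(pointer + 4 <= memory.size());
    std::uint32_t result = 0;
    std::uint32_t scale = 1;
    for (std::size_t i = 0; i < 4; ++i) {
        result += memory[pointer + i] * scale;
        scale *= 256;  // wraps harmlessly after the last byte (unsigned arithmetic)
    }
    return result;
}
```

Note that nothing stops you from loading at an index that is not the start of a value you stored: the bytes are simply reinterpreted, which is exactly the kind of mistake real pointers allow.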

References in C# act the same way as pointers in C++, without all the messy syntax.
Consider the following C# code:
public class A
{
    public int x;
}
public void AnotherFunc(A a)
{
    a.x = 2;
}
public void SomeFunc()
{
    A a = new A();
    a.x = 1;
    AnotherFunc(a);
    // a.x is now 2
}
Since classes are reference types, we know that we are passing an existing instance of A to AnotherFunc (unlike value types, which are copied).
In C++, we use pointers to make this explicit:
class A
{
public:
    int x;
};
void AnotherFunc(A* a) // notice we are pointing to an existing instance of A
{
    a->x = 2;
}
void SomeFunc()
{
    A a;
    a.x = 1;
    AnotherFunc(&a);
    // a.x is now 2
}

"How can pointers be explained using only concepts that are familiar to a .NET or Java developer? "
I'd suggest that there are really two distinct things that need to be learnt.
The first is how to use pointers, and heap allocated memory, to solve specific problems. With an appropriate style, using shared_ptr<> for example, this can be done in a manner analogous to that of Java. A shared_ptr<> has a lot in common with a Java object handle.
Secondly, however, I would suggest that pointers in general are a fundamentally lower level concept that Java, and to a lesser extent C#, deliberately hides. To program in C++ without moving to that level will guarantee a host of problems. You need to think in terms of the underlying memory layout and think of pointers as literally pointers to specific pieces of storage.
To attempt to understand this lower level in terms of higher concepts would be an odd path to take.

Get two sheets of large format graph paper, some scissors and a friend to help you.
Each square on the sheets of paper represents one byte.
One sheet is the stack.
The other sheet is the heap. Give the heap to your friend - he is the memory manager.
You are going to pretend to be a C program and you'll need some memory. When running your program, cut out chunks from the stack and the heap to represent memory allocation.
Ready?
#include <stdlib.h>

int main(void) {
    int a;                        /* Take four bytes from the stack. */
    int *b = malloc(sizeof(int)); /* Take four bytes from the heap. */
    a = 1;                        /* Write on your first little bit of graph paper, WRITE IT! */
    *b = 2;                       /* Get writing (on the other bit of paper) */
    b = malloc(sizeof(int));      /* Take another four bytes from the heap.
                                     Throw the first 'b' away. Do NOT give it
                                     back to your friend */
    free(b);                      /* Give the four bytes back to your friend */
    *b = 3;                       /* Your friend must now kill you and bury the body */
}                                 /* Give back the four bytes that were 'a' */
Try with some more complex programs.

Explain the difference between the stack and the heap and where objects go.
Local variables of value types, such as C# structs, typically go on the stack (a struct that is a field of a class lives inside that object on the heap). Reference types (class instances) get put on the heap. A pointer (or reference) points to the memory location on the heap for that specific instance.
Reference type is the key word. Using a pointer in C++ is like using ref keyword in C#.
Managed apps make working with this stuff easy so .NET devs are spared the hassle and confusion. Glad I don't do C anymore.

The key for me was to understand the way memory works. Variables are stored in memory. The places in which you can put variables in memory are numbered. A pointer is a variable that holds this number.

Any C# programmer that understands the semantic differences between classes and structs should be able to understand pointers. I.e., explaining in terms of value vs. reference semantics (in .NET terms) should get the point across; I wouldn't complicate things by trying to explain in terms of ref (or out).

In C#, all references to classes are roughly equivalent to pointers in the C++ world. For value types (structs, ints, etc.) this is not the case.
C#:
void func1(string parameter)
void func2(int parameter)
C++:
void func1(string* parameter);
void func2(int parameter);
Passing a parameter using the ref keyword in C# is equivalent to passing a parameter by reference in C++.
C#:
void func1(ref string parameter)
void func2(ref int parameter)
C++:
void func1(string*& parameter);
void func2(int& parameter);
If the parameter is a class, it would be like passing a pointer by reference.
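A short sketch of what "passing a pointer by reference" buys you, using hypothetical function names. Passing the pointer by value lets the callee change the object but not which object the caller points to; passing it by reference, like ref on a class-typed parameter in C#, lets the callee reseat the caller's pointer.

```cpp
#include <cassert>
#include <string>

// Pointer by value: only the callee's local copy of the pointer is reseated.
void reseat_copy(std::string* parameter, std::string* replacement) {
    parameter = replacement;
    (void)parameter;  // the caller never sees this assignment
}

// Pointer by reference: the caller's own pointer variable is reseated.
void reseat_caller(std::string*& parameter, std::string* replacement) {
    assert(replacement != nullptr);
    parameter = replacement;
}

// Plain value type by reference, the counterpart of "ref int" in C#.
void set_to_zero(int& parameter) {
    parameter = 0;
}
```

After reseat_copy the caller's pointer still refers to the original string; after reseat_caller it refers to the replacement.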

Calling a non-void function without using its return value. What actually happens?

So, I found a similar question here, but the answers are more about style and whether or not you are able to do it.
My question is, what actually happens when you call a non-void function that returns an object, but you never assign or use said returned object? So, less about whether or not you can, because I absolutely know you can and understand the other question linked above... what does the compiler/runtime environment do?
This is not a language specific question, but if you answer, please specify what language you are referring to, since behaviors will differ.

I believe that for both C# and Java, the result ends up on the stack, and the compiler then forces a pop instruction to ignore it. Eric Lippert's blog post on "The void is invariant" has some more information on this.
For example, consider the following C# code:
using System;
public class Test
{
    static int Foo() { return 10; }
    public static void Main()
    {
        Foo();
        Foo();
    }
}
The IL generated (by the MS C# 4 compiler) for the Main method is:
.method public hidebysig static void Main() cil managed
{
    .entrypoint
    .maxstack 8
    L_0000: call int32 Test::Foo()
    L_0005: pop
    L_0006: call int32 Test::Foo()
    L_000b: pop
    L_000c: ret
}
Note the calls to pop - which disappear if you make Foo a void method.

what does the compiler do?
The compiler generates a pop instruction that discards the result off the virtual stack.
what does the runtime environment do?
It typically jits the code into code that passes the return value back in a register, rather than a stack location. (Typically EAX on x86 architectures.)
The jitter knows that the value will go unused, so it probably generates code that clears the register. Or perhaps it just leaves it hanging around in the register for a while.
Which runtime environment do you care about? There are lots of them, and they all have different jitters.

It depends a bit on the calling convention being used. For small/simple types, the return will typically happen in a register. In this case, the function will write the value into the register, but nothing else will pay attention, and the next time that register is needed (which will typically happen pretty quickly) it'll be overwritten with something else.
For larger types, the compiler will normally allocate a structure to hold the return value. Exactly where/how it does that allocation will vary with the compiler though. In some cases it'll be a static structure, and the contents will be ignored and overwritten the next time you call a function returning the same type. In other cases it'll be on the stack, and even though you haven't used it, it still needs to be allocated before the function is called, and freed afterwards.

For C++, typically, the compiler will optimize out the returning of the variable, turning it into a void function, and if this enables further optimizations, the compiler may optimize out the entire function call or parts of it that only pertain to the return value. For Java and C#, I have little idea.
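One thing C++ does guarantee regardless of how aggressively the call is optimized: a discarded return value of class type is a temporary, and its destructor runs at the end of the statement that ignored it. A minimal sketch, with Resource and make_resource as made-up names:

```cpp
#include <cassert>

// Hypothetical resource that bumps a counter when it is released.
class Resource {
public:
    explicit Resource(int* released) : released_(released) {
        assert(released != nullptr);
    }
    // Moving transfers responsibility, so each resource is released exactly
    // once even if the compiler does not elide the move on return.
    Resource(Resource&& other) : released_(other.released_) {
        other.released_ = nullptr;
    }
    Resource(const Resource&) = delete;
    Resource& operator=(const Resource&) = delete;
    ~Resource() {
        if (released_ != nullptr) ++*released_;
    }

private:
    int* released_;
};

Resource make_resource(int* released) {
    return Resource(released);
}

// Calls make_resource and ignores the result, then reports how many
// resources have been released by the time the next statement runs.
int released_after_discard() {
    int released = 0;
    make_resource(&released);  // return value discarded
    return released;
}
```

So in C++ an ignored return value that holds a resource is cleaned up immediately, with no separate Dispose-style call to forget.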

In .NET, if the object being returned is a reference type, and the application has no other references to that object, then you'll still have an object floating around in memory until the garbage collector decides to collect it.
This is potentially bad if the object being returned happens to be holding on to resources. If it implements the IDisposable interface, then your code ought to be calling the Dispose method on it, but in this case the Dispose method would never be called.
