Is it safe to keep C++ pointers in C#? - c#

I'm currently working on some C#/C++ code which makes use of P/Invoke. On the C++ side there is a std::vector full of pointers, each identified by index from the C# code; for example, a function declaration would look like this:
void SetName(char* name, int idx)
But now I'm thinking, since I'm working with pointers, couldn't I send the pointer address itself to C#? Then in code I could do something like this:
void SetName(char*name, int ptr)
{
((TypeName*)ptr)->name = name;
}
Obviously that's a quick version of what I'm getting at (and probably won't compile).
Would the pointer address be guaranteed to stay constant in C++ such that I can safely store its address in C# or would this be too unstable or dangerous for some reason?

In C#, you don't need to use a pointer here, you can just use a plain C# string.
[DllImport(...)]
static extern void SetName(string name, int id);
This works because the default behavior for strings in p/invoke is MarshalAs(UnmanagedType.LPStr), which converts to a C-style char*. You can mark each argument in the C# declaration explicitly if it needs to be marshalled some other way, e.g. [MarshalAs(UnmanagedType.LPWStr)] for an argument that takes a string of 2-byte characters.
The only reason to use pointers is if you need to retain access to the data pointed to after you've called the function. Even then, you can use out parameters most of the time.
You can p/invoke basically anything without requiring pointers at all (and thus without requiring unsafe code, which requires privileged execution in some environments).

Yes, no problem. Native memory allocations never move so storing the pointer in an IntPtr on the C# side is fine. You need some kind of pinvoked function that returns this pointer, then
[DllImport("something.dll", CharSet = CharSet.Ansi)]
static extern void SetName(IntPtr vector, string name, int index);
Which intentionally lies about this C++ function:
void SetName(std::vector<std::string>* vect, const char* name, int index) {
std::string copy = name;
(*vect)[index] = copy;
}
Note the copy into a std::string in the C++ code: you have to copy the string. The passed name argument points to a buffer allocated by the pinvoke marshaller and is only valid for the duration of the function body. Your original code, which stores the char* directly, cannot work. If you intend to return pointers to vector<> elements then be very careful. A vector re-allocates its internal array when you add elements. Such a returned pointer will then become invalid and you'll corrupt the heap when you use it later. The exact same thing happens with a C# List<> but without the risk of dangling pointers.
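The reallocation hazard can be seen in a few lines of C++. This is only a sketch under assumptions: Item and both function names are hypothetical and not from the question. It compares addresses as integers so the stale pointer is never dereferenced, and it contrasts a vector of objects with a vector of owning pointers, which is closer to the "vector full of pointers" the question describes.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical element type, standing in for whatever the vector holds.
struct Item {
    std::string name;
};

// Returns true if the address of element 0 changed after the vector grew.
bool element_moved_after_growth() {
    std::vector<Item> items(1);
    const auto before = reinterpret_cast<std::uintptr_t>(&items[0]);
    // Growing past the current capacity forces a new internal array.
    items.resize(items.capacity() + 1);
    const auto after = reinterpret_cast<std::uintptr_t>(&items[0]);
    return before != after;
}

// Returns true if a separately allocated object keeps its address while the
// vector that owns it grows. Only the owning pointers are relocated.
bool heap_object_stays_put() {
    std::vector<std::unique_ptr<Item>> items;
    items.push_back(std::make_unique<Item>());
    Item* handle = items[0].get();  // this is what could be handed to C#
    assert(handle != nullptr);
    for (int i = 0; i < 1000; ++i) {
        items.push_back(std::make_unique<Item>());
    }
    handle->name = "still valid";
    return items[0].get() == handle && items[0]->name == "still valid";
}
```

So a pointer to the object itself is safe to hold on the C# side until the object is deleted, while a pointer into the vector's own storage is not.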

I think it's stable as long as you control the C++ code and are fully aware of what it does, and the other developers working on the same code know about the danger too.
So in my opinion, it's not a very secure architecture, and I would avoid it as much as I can.
Regards.

The C# GC moves things, but the C++ heap does not move anything: a pointer to an allocated object is guaranteed to remain valid until you delete it. The best architecture for this situation is just to send the pointer to C# as an IntPtr and then take it back in C++.
It's certainly a vastly, incredibly better idea than the incredibly BAD, HORRIFIC integer cast you've got going there.
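For what the native side of such a handle-based API might look like, here is a rough sketch. Widget and the four exported functions are hypothetical names, and export decorations such as __declspec(dllexport) are left out. On the C# side each Widget* parameter would be declared as IntPtr. The last function shows why a pointer-sized integer survives the round trip where a plain int may not on a 64-bit build.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical native type that C# only ever sees as an opaque handle.
struct Widget {
    std::string name;
};

extern "C" {
// Allocates on the native heap; the address stays fixed until DestroyWidget.
Widget* CreateWidget() { return new Widget(); }

// Copies the characters, because the marshaller's buffer dies after the call.
void SetName(Widget* w, const char* name) { w->name = name; }

// The returned pointer is valid until the next SetName or DestroyWidget.
const char* GetName(const Widget* w) { return w->name.c_str(); }

void DestroyWidget(Widget* w) { delete w; }
}

// A pointer-sized integer (what IntPtr is) holds the address losslessly.
bool handle_round_trips(Widget* w) {
    assert(w != nullptr);
    const std::intptr_t as_integer = reinterpret_cast<std::intptr_t>(w);
    return reinterpret_cast<Widget*>(as_integer) == w;
}
```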

Related

How to P/Invoke a function with an unknown struct in C#?

I know this sounds really strange, but I don't know how to even ask this properly. I've been trying to P/Invoke into NVidia's NVML library with limited success: I've managed to call a few of the APIs exported by that library.
Now I am trying to call nvmlDeviceGetHandleByIndex_v2 but I've been stuck for a long while on this one. It takes in a nvmlDevice_t pointer, but I've found nothing on what nvmlDevice_t actually is beyond this header definition:
typedef struct nvmlDevice_st* nvmlDevice_t;
The problem is that the header file does not make any other reference to nvmlDevice_st so I don't know how much heap space to allocate for it, if any. I've found this official C++ example that calls that same function like this:
nvmlDevice_t device;
CheckNVMLErrors(nvmlDeviceGetHandleByIndex(device_index, &device));
My main problem is that I'm not familiar enough with C/C++ to understand the implicit mechanics/memory allocation done by the device declaration, and the nvml.h header does not define what nvmlDevice_st actually is.
I tried calling it with a ref int parameter (with an initial 0 value) and apparently it does work, but I want to understand why, if possible. For reference, the value of that ref int parameter after the call was 1460391512, in case something can be gleaned from that.
If you look at the source, that is just an internal pointer used by the SDK. The value it points to has no meaning to you. You use it to identify a device you are working with.
Think Handle or HWND in Windows. You call something like FindWindow(), it returns what seems to be a random value back to you. You don't care what that value holds, you just use that value to identify that window when you call GetWindowText() or any other windowing methods.
So, you are on the right track with using ref int, but what you want is a pointer. So you should use out IntPtr to get the value.
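To see why no heap space is needed for the struct, here is a sketch of how a library can implement this kind of opaque handle. Everything in it (device_st's contents, GetHandleByIndex, GetIndex, the table of four devices) is made up for illustration and is not NVML's actual implementation.

```cpp
#include <cassert>

// What a public header exposes: an incomplete type and a pointer to it.
struct device_st;
typedef struct device_st* device_t;

// What only the library sees. Callers never learn this layout.
struct device_st {
    unsigned index;
};

namespace {
const unsigned kDeviceCount = 4;
device_st g_devices[kDeviceCount] = {{0}, {1}, {2}, {3}};
}

// Writes a pointer into the caller's variable. Returns 0 on success.
int GetHandleByIndex(unsigned index, device_t* device) {
    if (device == nullptr || index >= kDeviceCount) {
        return 1;
    }
    *device = &g_devices[index];
    return 0;
}

// Later calls take the handle back and dereference it inside the library.
unsigned GetIndex(device_t device) {
    assert(device != nullptr);
    return device->index;
}
```

Declaring device_t device; in the caller only reserves room for one pointer, and the function fills that pointer in. That is why ref int happened to work in a 32-bit process and why out IntPtr is the right width everywhere.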

C# marshal unmanaged pointer return type

I have an unmanaged library which has a function like this:
type* foo();
foo basically allocates an instance of the unmanaged type on the managed heap through Marshal.AllocHGlobal.
I have a managed version of type. It's not blittable but I have MarshalAs attributes set on members so I can use Marshal.PtrToStructure to get a managed version of it. But having to wrap calls to foo with extra bookkeeping to call Marshal.PtrToStructure is a bit annoying.
I'd like to be able to do something like this on the C# side:
[DllImport("mylib", CallingConvention = CallingConvention.Cdecl)]
[return: MarshalAs(UnmanagedType.LPStruct)]
type* foo();
and have C#'s marshaller handle the conversion behind the scenes, like it does for function arguments. I thought I should be able to do this because type is allocated on the managed heap. But maybe I can't? Is there any way to have C#'s inbuilt marshaller handle the unmanaged-to-managed transition on the return type for me without having to manually call Marshal.PtrToStructure?
A custom marshaler works fine if, on the .NET side, type is declared as a class, not as a struct.
This is clearly stated in UnmanagedType enumeration:
Specifies the custom marshaler class when used with the
MarshalAsAttribute.MarshalType or MarshalAsAttribute.MarshalTypeRef
field. The MarshalAsAttribute.MarshalCookie field can be used to pass
additional information to the custom marshaler. You can use this
member on any reference type.
Here is some sample code that should work fine
[DllImport("mylib", CallingConvention = CallingConvention.Cdecl)]
[return : MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef= typeof(typeMarshaler))]
private static extern type Foo();
private class typeMarshaler : ICustomMarshaler
{
public static readonly typeMarshaler Instance = new typeMarshaler();
public static ICustomMarshaler GetInstance(string cookie) => Instance;
public int GetNativeDataSize() => -1;
public object MarshalNativeToManaged(IntPtr nativeData) => Marshal.PtrToStructure<type>(nativeData);
// in this sample I suppose the native side uses GlobalAlloc (or LocalAlloc)
// but you can use any allocation library provided you use the same on both sides
public void CleanUpNativeData(IntPtr nativeData) => Marshal.FreeHGlobal(nativeData);
public IntPtr MarshalManagedToNative(object managedObj) => throw new NotImplementedException();
public void CleanUpManagedData(object managedObj) => throw new NotImplementedException();
}
[StructLayout(LayoutKind.Sequential)]
class type
{
/* declare fields */
};
Of course, changing unmanaged struct declarations into classes can have deep implications (that may not always raise compile-time errors), especially if you have a lot of existing code.
Another solution is to use Roslyn to parse your code, extract all Foo-like methods and generate one additional .NET method for each. I would do this.
type* foo()
This is a very awkward function signature, hard to use correctly in a C or C++ program, and that never gets better when you pinvoke. Memory management is the biggest problem; you want to work with the programmer that wrote this code to make it better.
Your preferred signature should resemble int foo(type* arg, size_t size). In other words, the caller supplies the memory and the native function fills it in. The size argument is required to avoid memory corruption, necessary when the version of type changes and gets larger. Often included as a field of type. The int return value is useful to return an error code so you can fail gracefully. Beyond making it safe, it is also much more efficient since no memory allocation is required at all. You can simply pass a local variable.
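A sketch of what that preferred signature could look like on the native side. The fields of type, the error codes and read_id are assumptions invented for the example; only the shape int foo(type* arg, size_t size) comes from the text above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout; the size field lets later versions grow safely.
struct type {
    std::size_t size;
    int id;
    double value;
};

// Hypothetical error codes.
enum { FOO_OK = 0, FOO_BAD_ARG = 1, FOO_TOO_SMALL = 2 };

// The caller owns the memory; the function only fills it in.
extern "C" int foo(type* arg, std::size_t size) {
    if (arg == nullptr) {
        return FOO_BAD_ARG;
    }
    if (size < sizeof(type)) {
        return FOO_TOO_SMALL;  // caller was built against a smaller version
    }
    arg->size = sizeof(type);
    arg->id = 42;
    arg->value = 1.5;
    return FOO_OK;
}

// Caller side: a plain local variable is enough, nothing to free afterwards.
int read_id() {
    type local{};
    const int rc = foo(&local, sizeof(local));
    assert(rc == FOO_OK);
    return local.id;
}
```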
... allocates an instance of the unmanaged type on the managed heap through Marshal.AllocHGlobal
No, this is where memory management assumptions get very dangerous. Never the managed heap, native code has no decent way to call into the CLR. And you cannot assume that it used the equivalent of Marshal.AllocHGlobal(). The native code typically uses malloc() to allocate the storage, which heap is used to allocate from is an implementation detail of the CRT it links. Only that CRT's free() function is guaranteed to release it reliably. You cannot call free() yourself. Skip to the bottom to see why AllocHGlobal() appeared to be correct.
There are function signatures that force the pinvoke marshaller to release the memory; it does so by calling Marshal.FreeCoTaskMem(). Note that this is not equivalent to Marshal.FreeHGlobal(), it uses a different heap. It assumes that the native code was written to support interop well and used CoTaskMemAlloc(), which allocates from the heap that is dedicated to COM interop.
It's not blittable but I have MarshalAs attributes set...
That is the gritty detail that explains why you have to make it awkward. The pinvoke marshaller does not want to solve this problem since it has to marshal a copy and there is too much risk automatically releasing the storage for the object and its members. Using [MarshalAs] is unnecessary and does not make the code better, simply change the return type to IntPtr. Ready to pass to Marshal.PtrToStructure() and whatever memory release function you need.
I have to talk about the reason that Marshal.AllocHGlobal() appeared to be correct. It did not use to be, but that has changed in recent Windows and VS versions. There was a big design change in Win8 and VS2012. The OS no longer creates separate heaps that Marshal.AllocHGlobal and Marshal.AllocCoTaskMem allocate from. It is now a single heap, the default process heap (GetProcessHeap() returns it). And there was a corresponding change in the CRT included with VS2012, it now also uses GetProcessHeap() instead of creating its own heap with HeapCreate().
Very big change and not publicized widely. Microsoft has not released any motivation for this that I know of, I assume that the basic reason was WinRT (aka UWP), lots of memory management nastiness to get C++, C# and Javascript code to work together seamlessly. This is quite convenient to everybody that has to write interop code, you can now assume that Marshal.FreeHGlobal() gets the job done. Or Marshal.FreeCoTaskMem() like the pinvoke marshaller uses. Or free() like the native code would use, no difference anymore.
But also a significant risk, you can no longer assume that the code is bug-free when it works well on your dev machine and must re-test on Win7. You get an AccessViolationException if you guessed wrong about the release function. It is worse if you also have to support XP or Win2003, no crash at all but you'll silently leak memory. Very hard to deal with that when it happens since you can't get ahead without changing the native code. Best to get it right early.

Is it possible to create a 6400 byte integer?

I have a function which I can't alter because of protection and abstraction, and it is declared like this:
GetDeviceLongInfo(int, int, ref int);
In which the "ref int" argument to be passed is said to give back 6400 bytes of information.
My question is, how can I get this information in a variable if the only choice I have is to give the function an Int32? Can I allocate more memory for that Int32? Is this even possible to achieve in some way?
EDIT:
I can tell you that the function uses the ref int to dump values in it; the int size (size of the information) is not fixed and depends on the option chosen in the second parameter. I can't even look at the function to see how it uses that ref.
You can allocate an int[] and pass that to the function. This is a hack but I don't see why it should not be safe.
var array = new int[6400 / sizeof(int)];
GetDeviceLongInfo(..., ref array[0]);
The array is pinned by the CLR for the duration of the call.
Note, that ref is a so called managed pointer to the CLR. It is marshaled by passing it as a pointer and pinning the object it points to. An int[] would be passed in almost the same way (a pointer to the first element is passed).
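From the native side's point of view, the trick looks roughly like the sketch below. FillDeviceInfo is a hypothetical stand-in for the real function, whose behavior is unknown; the assumption is only that it treats its int* parameter as the start of a 6400-byte buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t kInfoBytes = 6400;

// Hypothetical native function: the parameter looks like "one int by
// reference", but the code writes a whole block starting at that address.
extern "C" void FillDeviceInfo(int /*device*/, int /*option*/, int* out) {
    assert(out != nullptr);
    const std::size_t count = kInfoBytes / sizeof(int);
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = static_cast<int>(i);
    }
}

// Caller side: the address of element 0 is the address of the whole buffer,
// which is what passing ref array[0] hands over from C#.
int last_value_written() {
    std::vector<int> buffer(kInfoBytes / sizeof(int));
    FillDeviceInfo(0, 0, &buffer[0]);
    return buffer.back();
}
```

Passing the address of a single int instead would let the function write past it, which is why the buffer has to be at least as large as the biggest block the function can produce.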
Can I allocate more memory for that Int32? No
Is this even possible to achieve in some way? Changing the signature or using the int as a reference to the data are both options
You're attempting to marshal an array (which is a native pointer to data) to an integer. C# will have no problem with that, but processing it is another story. Also note that depending on your architecture you will have different pointer sizes, which means using a 32-bit int isn't the way to go.
See also: http://msdn.microsoft.com/en-us/library/z6cfh6e6(v=vs.110).aspx
I cannot remember the details off the top of my head, but basically you want to use MarshalAs to tell .NET that it's a pointer to an array. IIRC it was something like this (1600 = 6400/4):
void GetDeviceLongInfo(int, int, [MarshalAs(UnmanagedType.LPArray, SizeConst=1600)] int[] ar );
update
I noticed the questions on how this works, so here it is... How this signature will work: signature in C is probably (long, long, long*) which means the third argument should be a pointer to int. The underlying buffer will be filled with the GetDeviceLongInfo by means of a strncpy or something similar. Things that can go wrong is passing a buffer that's too small (that's checked running it in Debug mode in VS), using the wrong processor architecture, incorrectly passing the integer instead of a pointer (you can try casting the address of your AllocHGlobal to int and see if that works -- that does mean you will have to run on x86 though) and basically a whole lot of other things :-)
Apparently you cannot change anything about the signature. What you're basically attempting to do then is allocate a buffer, cast it to an int* and then process it. Since usr's approach isn't working, I'd try Marshal.AllocHGlobal to create the buffer, and then pass it to the function (if needed, use unsafe code).

Returning a string whose length is unknown a priori from C++ called from C#

I have searched far and wide for an answer to my question, and all the solutions are not acceptable, not applicable, and/or confusing.
I am needing to return a string from a function implemented in C++ back to the calling code in C#.
The returned string needs to be returned as a parameter rather than a return value since I need to pass/return multiple strings for some functions. The length of the string varies, so I can't just allocate a buffer, etc.
Any help would be appreciated.
NOTE: The solution posted and mentioned by Justin and/or others is NOT a solution for my use case. As I stated in the question, I do not know the size of the string prior to making the call to the C++ code. I can't pre-allocate a StringBuilder and pass it to the C++ code.
One way is to declare the parameter as ref IntPtr. So:
static extern void DoSomething(ref IntPtr returnedString);
So you call it and get a string with:
IntPtr pstr = IntPtr.Zero;
DoSomething(ref pstr);
string theString = Marshal.PtrToStringAnsi(pstr);
However, it's important to remember that the returned pointer was allocated by your C++ code. If you want it to be deallocated, you'll need to call the C++ code to do it.
You might also want to look at Marshal.PtrToStringAuto, and other similar functions.
Note also that this copies the data from the pointer to the string. If you want to refer to the string in place, you'll have to play with IntPtr and the Marshal class, or delve into the wonderful world of unsafe code and pointers.
Adding to Jim Michel's answer, I would create a helper function, like
String FromCppString(IntPtr a_Pointer)
{
String result = Marshal.PtrToStringAnsi(a_Pointer);
FreeCppString(a_Pointer);
return result;
}
where FreeCppString is another function exported from C++, freeing the string properly. The original c++ function will just allocate as many strings as necessary and put them into parameters. The C# function will use FromCppString() to extract them.
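A sketch of the C++ half of that pairing, assuming the string is allocated with malloc and released by the same module's free. DoSomething and FreeCppString are the names used above; their bodies here are invented, and the C++ FromCppString only mirrors what the C# helper does.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Allocates exactly as many bytes as the result needs and hands the pointer
// back through the out parameter (ref IntPtr on the C# side).
extern "C" void DoSomething(char** returnedString) {
    const std::string text = "length is only known at run time";
    char* copy = static_cast<char*>(std::malloc(text.size() + 1));
    if (copy != nullptr) {
        std::memcpy(copy, text.c_str(), text.size() + 1);
    }
    *returnedString = copy;
}

// Must live in the same module as DoSomething so the same heap is used.
extern "C" void FreeCppString(char* s) {
    std::free(s);
}

// Same pattern as the C# helper: copy first, then release the native buffer.
std::string FromCppString(char* pointer) {
    assert(pointer != nullptr);
    std::string result(pointer);
    FreeCppString(pointer);
    return result;
}
```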
Use the portable (multi-compiler multi-language) string provided by the platform for the express purpose of passing strings between components implemented in different languages -- BSTR.
In C++, you use SysAllocString or SysAllocStringLen. P/invoke already knows how to deal with these (convert to .NET string and call SysFreeString) as long as you use the right signature.
extern static void DoSomething([MarshalAs(UnmanagedType.BStr)] out String returnedString);
And then simply call it:
string theString;
DoSomething(out theString);
That's it, no special conversions or cleanup necessary, since p/invoke took care of it.
For more information, read this MSDN page on string handling in p/invoke
NOTE: I guess none of the examples in the link are exactly this case, so here's the C++ prototype
void DoSomething(__out BSTR *s);

Are ref and out in C# the same as pointers in C++?

I just made a Swap routine in C# like this:
static void Swap(ref int x, ref int y)
{
int temp = x;
x = y;
y = temp;
}
It does the same thing that this C++ code does:
void swap(int *d1, int *d2)
{
int temp=*d1;
*d1=*d2;
*d2=temp;
}
So are the ref and out keywords like pointers for C# without using unsafe code?
They're more limited. You can say ++ on a pointer, but not on a ref or out.
EDIT Some confusion in the comments, so to be absolutely clear: the point here is to compare with the capabilities of pointers. You can't perform the same operation as ptr++ on a ref/out, i.e. make it address an adjacent location in memory. It's true (but irrelevant here) that you can perform the equivalent of (*ptr)++, but that would be to compare it with the capabilities of values, not pointers.
It's a safe bet that they are internally just pointers, because the stack doesn't get moved and C# is carefully organised so that ref and out always refer to an active region of the stack.
EDIT To be absolutely clear again (if it wasn't already clear from the example below), the point here is not that ref/out can only point to the stack. It's that when it points to the stack, it is guaranteed by the language rules not to become a dangling pointer. This guarantee is necessary (and relevant/interesting here) because the stack just discards information in accordance with method call exits, with no checks to ensure that any referrers still exist.
Conversely when ref/out refers to objects in the GC heap it's no surprise that those objects are able to be kept alive as long as necessary: the GC heap is designed precisely for the purpose of retaining objects for any length of time required by their referrers, and provides pinning (see example below) to support situations where the object must not be moved by GC compacting.
If you ever play with interop in unsafe code, you will find that ref is very closely related to pointers. For example, if a COM interface is declared like this:
HRESULT Write(BYTE *pBuffer, UINT size);
The interop assembly will turn it into this:
void Write(ref byte pBuffer, uint size);
And you can do this to call it (I believe the COM interop stuff takes care of pinning the array):
byte[] b = new byte[1000];
obj.Write(ref b[0], (uint)b.Length);
In other words, ref to the first byte gets you access to all of it; it's apparently a pointer to the first byte.
Reference parameters in C# can be used to replace one use of pointers, yes. But not all.
Another common use for pointers is as a means for iterating over an array. Out/ref parameters can not do that, so no, they are not "the same as pointers".
ref and out are only used with function arguments to signify that the argument is to be passed by reference instead of value. In this sense, yes, they are somewhat like pointers in C++ (more like references actually). Read more about it in this article.
The nice thing about using out is that you're guaranteed that the item will be assigned a value -- you will get a compile error if not.
Actually, I'd compare them to C++ references rather than pointers. Pointers, in C++ and C, are a more general concept, and references will do what you want.
All of these are undoubtedly pointers under the covers, of course.
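The comparison with C++ references can be made concrete. In this sketch (the function names are made up), the reference version reads almost exactly like the C# Swap with ref, while the second function does something only a pointer can do: step through adjacent memory.

```cpp
#include <cassert>

// Closest C++ analogue of the C# Swap(ref int x, ref int y).
void swap_by_reference(int& x, int& y) {
    int temp = x;
    x = y;
    y = temp;
}

// A pointer can be advanced to the next element; a reference (like a C# ref
// parameter) always names the same single variable.
int sum(const int* first, int count) {
    assert(count >= 0);
    int total = 0;
    for (const int* p = first; p != first + count; ++p) {
        total += *p;
    }
    return total;
}
```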
While comparisons are in the eye of the beholder...I say no. 'ref' changes the calling convention but not the type of the parameters. In your C++ example, d1 and d2 are of type int*. In C# they are still Int32's, they just happen to be passed by reference instead of by value.
By the way, your C++ code doesn't really swap its inputs in the traditional sense. Generalizing it like so:
template<typename T>
void swap(T *d1, T *d2)
{
T temp = *d1;
*d1 = *d2;
*d2 = temp;
}
...won't work unless all types T have copy constructors, and even then will be much less efficient than swapping pointers.
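For what it's worth, C++11 move semantics offer one way around the copy requirement. This is a different technique from the template above, not a correction of it, and swap_by_move and first_after_swap are hypothetical names. The sketch swaps two move-only values without copying either.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Moves instead of copies, so T only needs to be movable.
template <typename T>
void swap_by_move(T* d1, T* d2) {
    assert(d1 != nullptr && d2 != nullptr);
    T temp = std::move(*d1);
    *d1 = std::move(*d2);
    *d2 = std::move(temp);
}

// Usage with a type that has no copy constructor at all.
int first_after_swap() {
    std::unique_ptr<int> a(new int(1));
    std::unique_ptr<int> b(new int(2));
    swap_by_move(&a, &b);
    return *a;
}
```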
The short answer is Yes (similar functionality, but not exactly the same mechanism).
As a side note, if you use FxCop to analyse your code, using out and ref will result in a "Microsoft.Design" error of "CA1045:DoNotPassTypesByReference."
