C++/C#: Marshal a struct to get a fixed pointer

I have an older app which accepts dll plugins. Each plugin has a function that creates a new instance of a "widget". This function identifies the new "widget" instance by returning a unique int. In the c++ plugin template I have it's done by declaring a new widget struct and returning its pointer cast as an int as a unique identifier (not my idea, but I have to work with it):
struct widget {
...
};
int widget_instance (HWND hwnd) {
widget* myWidget = new widget;
...
return (int) myWidget;
}
I'm working on a c# plugin for this app and I need to replicate this mechanism in c#. How do I declare a struct inside a function, so that each time the function is called it returns a fixed, non-changing pointer to the newly declared struct? My code so far:
public struct Widget {
...
}
public static int widget_instance(uint hwnd) {
Widget w = new Widget();
...
IntPtr myWidget = Marshal.AllocCoTaskMem(Marshal.SizeOf(w));
Marshal.StructureToPtr(w, myWidget, false);
return (int) myWidget;
}
Does this seem right? Do I need to "pin" the struct for this to work as c++ does?
EDIT: To elaborate: the unique ID is not just a simple "cookie" int, it's in fact used later on in the c++ template to access the struct instance by casting the ID as a pointer:
widget* w = (widget*) WidgetID;
w->firstElement = 123;
I assume the c# equivalent would be:
widget w = (widget)Marshal.PtrToStructure((IntPtr)widgetID,typeof(widget));
w.firstElement = 123;

You seem to imply that the returned integer identifier is just a cookie, and isn't treated by the app as a pointer. That would mean that the app never performs operations on the widget, but always asks the plugin to perform the operations on the app's behalf. If that's the case then there's no reason why you can't use your own scheme for assigning cookie identifiers. So...
You could add a "cookie" member to your Widget class, and a static "next cookie" member for assigning cookies. This would be a simple and semantically sound implementation. Obviously it would require thread-safety measures. It would also avoid the C++ technique of casting the "this" pointer to an int, which you say you don't like although I think it's perfectly fine.
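A minimal sketch of that cookie scheme, assuming Widget is (or can be made) a class so the instance can be looked up and mutated later; the WidgetRegistry type and Resolve method are made-up names:
using System.Collections.Concurrent;
using System.Threading;

public class Widget
{
    public int firstElement;
}

public static class WidgetRegistry
{
    private static readonly ConcurrentDictionary<int, Widget> Widgets = new ConcurrentDictionary<int, Widget>();
    private static int _nextCookie;

    public static int widget_instance(uint hwnd)
    {
        var w = new Widget();
        // Interlocked makes cookie assignment thread-safe.
        int cookie = Interlocked.Increment(ref _nextCookie);
        Widgets[cookie] = w;
        return cookie;
    }

    // Later calls from the app hand back the cookie, and the plugin resolves it to the instance:
    public static Widget Resolve(int cookie) => Widgets[cookie];
}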
The C++ pointer-to-int cast technique might serve a second purpose in addition to providing a unique identifier. When the app gives the plugin a cookie, the plugin can find the widget directly and at zero runtime cost by casting the cookie back to a pointer. Although that's hardly safe. But I don't think you can do that in C# anyway.
Your proposed C# code means that for every widget the app asks for, you create and initialize a managed Widget and then also copy it into a second, unmanaged one. If you really want to mimic the C++ technique so that your cookies are guaranteed to be unique process-wide, as opposed to just unique within the single plugin, then you could just allocate a one-byte block and not initialize it. Or even a zero-sized block, but I'm not quite certain that that would yield unique addresses.

Yes, it is correct, since AllocCoTaskMem allocates the memory from the unmanaged COM allocator.
The memory is released or relocated only with the FreeCoTaskMem or ReAllocCoTaskMem functions, respectively.
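If you keep the AllocCoTaskMem approach from the question, note that Marshal.PtrToStructure hands back a copy of the struct, so any changes have to be written back into the same block, and the block has to be freed when the widget goes away. A sketch using the question's field name:
// given the identifier the app got back from widget_instance:
int id = widget_instance(hwnd);

// Read the instance back from the unmanaged block (this is a copy).
var w = (Widget)Marshal.PtrToStructure((IntPtr)id, typeof(Widget));
w.firstElement = 123;

// Push the modified copy back into the same unmanaged block.
Marshal.StructureToPtr(w, (IntPtr)id, false);

// When the app finally destroys the widget, release the block.
Marshal.FreeCoTaskMem((IntPtr)id);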

Related

Marshal out parameter that is a reference type

I'm trying to get a C# class (not struct!) back from a C++ function (using an out parameter). This is the C# side:
[StructLayout(LayoutKind.Sequential)]
public class OutClass
{
public int X;
}
// This interface function will be mapped to the C++ function
void MarshalOutClass(out OutClass outClass);
The C++ part looks like this
struct OutClass
{
int x = 0;
};
extern "C" MARSHAL_TESTS_API void MarshalOutClass(OutClass** out);
The function mapping works via a custom mechanism and is not part of this question. All marshaling attributes are treated normally.
In C#, we usually declare out arguments inline, like:
MarshalOutClass(out var outClass);
DoSomething(outClass);
Since C# reference types are marshaled by pointer, I figured I would have to add another pointer for ref or out parameters. So I use OutClass** on the C++ side.
I assume that C# translates the
MarshalOutClass(out var outClass);
part to roughly
OutClass outClass = default(OutClass);
MarshalOutClass(ref outClass);
which is a null reference, or on the C++ side: a pointer to nullptr. This wouldn't be a problem for C# value types (aka struct) because their default is a default-constructed instance.
This means, I'd have to manually create an instance of my object on the C++ side and marshal it back to C#.
void MarshalOutClass(OutClass** out)
{
auto ptr = static_cast<OutClass*>(CoTaskMemAlloc(sizeof(OutClass)));
*ptr = OutClass{};
*out = ptr;
}
The code seems to work, but I'm not sure if I'm leaking memory with this or if the marshaler is going to take care of it properly. I don't do any cleanup on the C# side.
This brings me to the following questions:
is my assumption on how ref and out are translated to C++ correct?
is CoTaskMemAlloc the correct function here?
do I have to perform any additional memory management related tasks (on either side)?
is the overall approach correct here? What should I do differently?
I know that static_cast<OutClass*>(CoTaskMemAlloc(sizeof(OutClass))) is a bit sketchy here but lets assume OutClass is always a trivial type.
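One way to take the ownership question out of the marshaler's hands entirely is to receive the raw pointer yourself and do the conversion and cleanup explicitly. A hedged sketch; the DllImport binding and DLL name are only placeholders, since the real mapping is custom:
[DllImport("MarshalTests", CallingConvention = CallingConvention.Cdecl)]
private static extern void MarshalOutClass(out IntPtr outPtr);

public static OutClass GetOutClass()
{
    MarshalOutClass(out IntPtr ptr);
    try
    {
        // Copy the native struct into a managed OutClass instance.
        return Marshal.PtrToStructure<OutClass>(ptr);
    }
    finally
    {
        // The C++ side used CoTaskMemAlloc, so free with the matching API.
        Marshal.FreeCoTaskMem(ptr);
    }
}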

C# marshal unmanaged pointer return type

I have an unmanaged library which has a function like this:
type* foo();
foo basically allocates an instance of the unmanaged type on the managed heap through Marshal.AllocHGlobal.
I have a managed version of type. It's not blittable but I have MarshalAs attributes set on members so I can use Marshal.PtrToStructure to get a managed version of it. But having to wrap calls to foo with extra bookkeeping to call Marshal.PtrToStructure is a bit annoying.
I'd like to be able to do something like this on the C# side:
[DllImport("mylib", CallingConvention = CallingConvention.Cdecl)]
[return: MarshalAs(UnmanagedType.LPStruct)]
type* foo();
and have C#'s marshaller handle the conversion behind the scenes, like it does for function arguments. I thought I should be able to do this because type is allocated on the managed heap. But maybe I can't? Is there any way to have C#'s inbuilt marshaller handle the unmanaged-to-managed transition on the return type for me without having to manually call Marshal.PtrToStructure?
A custom marshaler works fine if, on the .NET side, type is declared as a class, not as a struct.
This is clearly stated in UnmanagedType enumeration:
Specifies the custom marshaler class when used with the
MarshalAsAttribute.MarshalType or MarshalAsAttribute.MarshalTypeRef
field. The MarshalAsAttribute.MarshalCookie field can be used to pass
additional information to the custom marshaler. You can use this
member on any reference type.
Here is some sample code that should work fine
[DllImport("mylib", CallingConvention = CallingConvention.Cdecl)]
[return: MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(typeMarshaler))]
private static extern type Foo();
private class typeMarshaler : ICustomMarshaler
{
public static readonly typeMarshaler Instance = new typeMarshaler();
public static ICustomMarshaler GetInstance(string cookie) => Instance;
public int GetNativeDataSize() => -1;
public object MarshalNativeToManaged(IntPtr nativeData) => Marshal.PtrToStructure<type>(nativeData);
// in this sample I suppose the native side uses GlobalAlloc (or LocalAlloc)
// but you can use any allocation library provided you use the same on both sides
public void CleanUpNativeData(IntPtr nativeData) => Marshal.FreeHGlobal(nativeData);
public IntPtr MarshalManagedToNative(object managedObj) => throw new NotImplementedException();
public void CleanUpManagedData(object managedObj) => throw new NotImplementedException();
}
[StructLayout(LayoutKind.Sequential)]
class type
{
/* declare fields */
};
Of course, changing unmanaged struct declarations into classes can have deep implications (that may not always raise compile-time errors), especially if you have a lot of existing code.
Another solution is to use Roslyn to parse your code, extract all Foo-like methods and generate one additional .NET method for each. I would do this.
type* foo()
This is a very awkward function signature, hard to use correctly in a C or C++ program, and that never gets better when you pinvoke. Memory management is the biggest problem; you want to work with the programmer who wrote this code to make it better.
Your preferred signature should resemble int foo(type* arg, size_t size). In other words, the caller supplies the memory and the native function fills it in. The size argument is required to avoid memory corruption, necessary when the version of type changes and gets larger. Often included as a field of type. The int return value is useful to return an error code so you can fail gracefully. Beyond making it safe, it is also much more efficient since no memory allocation is required at all. You can simply pass a local variable.
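On the C# side, that caller-supplies-the-memory shape might look roughly like the sketch below; the field and function names are placeholders, and type is assumed here to be expressible as a blittable struct:
[StructLayout(LayoutKind.Sequential)]
struct type
{
    public int size;        // size of the struct, so the native side can detect version mismatches
    /* other fields */
}

[DllImport("mylib", CallingConvention = CallingConvention.Cdecl)]
static extern int foo(ref type arg, IntPtr size);

static type CallFoo()
{
    // The caller supplies the memory: just a local variable, no allocation at all.
    var value = new type { size = Marshal.SizeOf<type>() };
    int error = foo(ref value, (IntPtr)Marshal.SizeOf<type>());
    if (error != 0)
        throw new InvalidOperationException("foo failed with error " + error);
    return value;
}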
... allocates an instance of the unmanaged type on the managed heap through Marshal.AllocHGlobal
No, this is where memory management assumptions get very dangerous. Never the managed heap, native code has no decent way to call into the CLR. And you cannot assume that it used the equivalent of Marshal.AllocHGlobal(). The native code typically uses malloc() to allocate the storage, which heap is used to allocate from is an implementation detail of the CRT it links. Only that CRT's free() function is guaranteed to release it reliably. You cannot call free() yourself. Skip to the bottom to see why AllocHGlobal() appeared to be correct.
There are function signatures that force the pinvoke marshaller to release the memory; it does so by calling Marshal.FreeCoTaskMem(). Note that this does not pair with Marshal.AllocHGlobal(), it uses a different heap. It assumes that the native code was written to support interop well and used CoTaskMemAlloc(), which uses the heap that is dedicated to COM interop.
It's not blittable but I have MarshalAs attributes set...
That is the gritty detail that explains why you have to make it awkward. The pinvoke marshaller does not want to solve this problem since it has to marshal a copy and there is too much risk automatically releasing the storage for the object and its members. Using [MarshalAs] is unnecessary and does not make the code better, simply change the return type to IntPtr. Ready to pass to Marshal.PtrToStructure() and whatever memory release function you need.
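A sketch of that IntPtr-based shape; which release call is correct depends on the allocator the native code really uses, as discussed below:
[DllImport("mylib", CallingConvention = CallingConvention.Cdecl, EntryPoint = "foo")]
static extern IntPtr foo_native();

static type Foo()
{
    IntPtr p = foo_native();
    try
    {
        // Manual conversion; the [MarshalAs] attributes on type's members are honored here.
        return Marshal.PtrToStructure<type>(p);
    }
    finally
    {
        // Only correct if the native allocation really pairs with this release function.
        Marshal.FreeHGlobal(p);
    }
}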
I have to talk about the reason that Marshal.AllocHGlobal() appeared to be correct. It did not used to be, but has changed in recent Windows and VS versions. There was a big design change in Win8 and VS2012. The OS no longer creates separate heaps that Marshal.AllocHGlobal and Marshal.AllocCoTaskMem allocate from. It is now a single heap, the default process heap (GetProcessHeap() returns it). And there was a corresponding change in the CRT included with VS2012, it now also uses GetProcessHeap() instead of creating its own heap with HeapCreate().
Very big change and not publicized widely. Microsoft has not released any motivation for this that I know of, I assume that the basic reason was WinRT (aka UWP), lots of memory management nastiness to get C++, C# and Javascript code to work together seamlessly. This is quite convenient to everybody that has to write interop code, you can now assume that Marshal.FreeHGlobal() gets the job done. Or Marshal.FreeCoTaskMem() like the pinvoke marshaller uses. Or free() like the native code would use, no difference anymore.
But also a significant risk, you can no longer assume that the code is bug-free when it works well on your dev machine and must re-test on Win7. You get an AccessViolationException if you guessed wrong about the release function. It is worse if you also have to support XP or Win2003, no crash at all but you'll silently leak memory. Very hard to deal with that when it happens since you can't get ahead without changing the native code. Best to get it right early.

Why is the Pinnable<T> class in C# 7.2 defined the way it is?

I'm aware that Pinnable<T> is an internal class used by the methods in the new Unsafe class, and it's not meant to be used anywhere else other than in that class. This question is not about something practical, but it's just to understand why it's been designed like this and to learn a bit more about the language and its various "tricks" like this one.
As a recap, the Pinnable<T> class is defined here, and it looks like this:
[StructLayout(LayoutKind.Sequential)]
internal sealed class Pinnable<T>
{
public T Data;
}
And it's mainly used in the Span<T>.DangerousCreate method, here:
public static Span<T> DangerousCreate(object obj, ref T objectData, int length)
{
Pinnable<T> pinnable = Unsafe.As<Pinnable<T>>(obj);
IntPtr byteOffset = Unsafe.ByteOffset<T>(ref pinnable.Data, ref objectData);
return new Span<T>(pinnable, byteOffset, length);
}
The reason for Pinnable<T> is that it's used to keep track of the original object, in case the Span<T> instance was created from one (instead of from a native pointer).
Given that the reference type doesn't matter when pinning a reference (fixing both a ref T and Unsafe.As<T, byte>(ref T) works the same), is there a specific reason why the Pinnable<T> class was made generic? The original design in DotNetCross here in fact had a Pinnable class with just a single byte field, and it worked just the same. Is there any reason why using a generic class in this case would be an advantage, other than avoiding a cast of the reference type when writing/reading/returning it?
Is there any other way, other than this unsafe cast done with Unsafe.As, to get a reference to an object's contents (otherwise it'd be the same as any variable of a class type)? That is, any way to get a reference (which should basically have the same address as the actual object variable in the first place, right?) to an object without having to go through some custom-defined secondary class.
First of all, the Struct in [StructLayout(LayoutKind.Sequential)] doesn't mean that it is only valid for structs, it means the layout of the actual structure of the fields in memory, be it in a class or in a value type. This controls the actual runtime layout of the data, not just how the type would marshal to unmanaged code. The Sequential is important because without it, the runtime is pretty much free to store the memory however it sees fit, which means that Data may have some padding before it.
From what I understand about the implementation, the reason for Pinnable is to allow creating a Span over memory that may be moved by the GC, without having to pin the object first. If you only use references and no actual pointers, nothing needs to be pinned at all.
I have noticed that it was introduced in a commit with a description saying it made Span more "portable" (a bold word for something that does a lot of unsafe things). I can't think of any reason other than something related to alignment for why it is generic. I suppose representing a T in terms of an offset from another T is better than as an offset from a byte. It may happen that the type of the first field plays a role in its actual address, even if the type was marked with LayoutKind.Sequential.
A reference to an object is different from an interior reference to an object (a reference to its data). It is implementation defined, but in .NET Framework, an instance of any class (or a boxed value type) starts with a header consisting of a sync block (for lock) and a pointer to the method table, a.k.a. the type of the object. On 32-bit, the header is 8 bytes, but the actual pointer points to the pointer to the method table (for performance reasons, getting the type happens more often than locking an object).
One way (though not a portable one) of getting a pointer to the start of the data is therefore to cast the object reference to a pointer and add 4 bytes to it; that is where the first field should start.
Another way I can think of is utilising GCHandle.AddrOfPinnedObject. It is commonly used for accessing array or string data, but it works for other objects:
[StructLayout(LayoutKind.Sequential)]
class Obj
{
public int A;
}
var obj = new Obj();
var gc = GCHandle.Alloc(obj, GCHandleType.Pinned);
IntPtr interior = gc.AddrOfPinnedObject();
Marshal.WriteInt32(interior, 0, 16);
Console.WriteLine(obj.A);
I think this actually is quite portable, but still needs to pin the object (there is InternalAddrOfPinnedObject defined in GCHandle, but even if that doesn't check whether the handle is actually pinned, the returned value may not be valid if it was used on a non-pinned object).
Still, the technique Span uses seems like the most portable way of doing that, since a lot of the underlying work is done in pure CIL (like reference arithmetic).
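For completeness, here is a small sketch of the same reinterpret trick the question is about, done directly with Unsafe.As. This assumes the System.Runtime.CompilerServices.Unsafe package and reuses the Obj class from the snippet above; Pin<T> is a made-up stand-in for Pinnable<T>, and the layout assumption is exactly the one Pinnable<T> relies on:
using System.Runtime.CompilerServices;

[StructLayout(LayoutKind.Sequential)]
sealed class Pin<T>
{
    public T Data;
}

var obj = new Obj();
// Reinterpret the object reference; Pin<int>.Data now aliases obj's first field.
ref int first = ref Unsafe.As<Pin<int>>(obj).Data;
first = 42;
Console.WriteLine(obj.A); // prints 42, with no pinning and no raw pointers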

C# equivalent of a const pointer/pointer to const in C++

I am learning the basics of C++, coming from the .NET world (C#).
One topic I found interesting was the const keyword and its usage with pointers (const pointer/pointer to const).
I'd like to know if there's any C# language equivalent of the const pointer/pointer to const that C++ has?
(I know C# doesn't have pointers, I am considering references to be the pointer-like types in C#).
Also, out of interest, if there's no such equivalent, what were the decisions behind not including such a feature?
There is no direct equivalent to passing references as 'const' in C#, but there are alternative ways to accomplish its purpose. The most common way to do this is to make your reference class either completely immutable (once constructed, its state should never change) or to pass it as an immutable public interface. The latter is the closest to the intention of the 'const' parameter contract (I'm giving you a reference to something so you can use it, but I'm asking you not to change it.) A poorly-behaved client could 'cast away' the public interface to a mutable form, of course, but it still makes the intention clear. You could 'cast away' const in C++ as well, though this was rarely a good idea.
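A small sketch of the read-only-interface idea; the type names here are just for illustration:
public interface IReadOnlyAccount
{
    decimal Balance { get; }
}

public class Account : IReadOnlyAccount
{
    public decimal Balance { get; set; }
}

// Roughly the intent of 'void Consume(const Account&)' in C++:
public static void Consume(IReadOnlyAccount account)
{
    // account.Balance can be read but not assigned through this interface.
    Console.WriteLine(account.Balance);
}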
One other thing in C++ is that you would often prefer to pass as const when you knew that the lifetime of the reference you were passing was limited in scope. C++ often follows the pattern where objects are created and destroyed on the stack within method scope, so any references to those objects should not be persisted outside that scope (since using them after they fall out of scope could cause really nasty stack corruption crashes.) A const reference should not be mutated, so it's a strong hint that storing it somewhere to reference later would be a bad idea. A method with const parameters is promising that it's safe to pass these scoped references. Since C# never allows storing references to objects on the stack (outside of parameters), this is less of a concern.
The concept of constant objects (i.e. readonly) in C# (or Java for that matter) corresponds approximately to object *const in C++, i.e. a constant pointer to a non-constant object.
There are several reasons for it - for one, specifying const correctly and making it useful in the language is quite hard. Taking C++ as an example, you have to define lots of methods twice with only small changes to the signature, there's const_cast, the fact that const is only applied shallowly, etc.
So C# went for the easy solution to make the language simpler - D went the other way with transitive const correctness, etc. as I understand it (never written a single line in D, so take that with a grain of salt).
The usual solution in C#/Java is to have immutable classes, possibly using a builder pattern, or a simple wrapper that prohibits changes (e.g. all the unmodifiable collections that wrap another collection and throw exceptions for the mutating methods like add).
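For the collection case just mentioned, the framework already ships such wrappers; a quick sketch:
using System.Collections.Generic;
using System.Collections.ObjectModel;

var numbers = new List<int> { 1, 2, 3 };

// A read-only view over the same underlying list.
ReadOnlyCollection<int> view = numbers.AsReadOnly();

// view[0] = 5;                    // does not compile: the indexer has no setter
// ((IList<int>)view).Add(4);      // compiles, but throws NotSupportedException at run time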
8,000,000 years later, but C# 7.2 uses the "in" keyword which is sort of like const. It tells the compiler to pass a struct or primitive variable by reference (like ref or out) but the method WILL NOT modify the variable.
public void DoSomething(in int variable){
//Whatever
}
is functionally equivalent to C++'s
void Foo::DoSomething(const int& variable){
//Whatever
}
or arguably even
void Foo::DoSomething(int const * const variable){
//Whatever
}
The main reason for doing this in C#, according to MSDN, is to tell the compiler that it can pass the variable by reference since it won't be modified. This allows for potentially better performance when passing large structs.
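A short usage sketch of that point; BigValue is a hypothetical large struct:
public readonly struct BigValue
{
    public readonly double A, B, C, D;   // stand-in for a struct with many more fields
    public BigValue(double a, double b, double c, double d) { A = a; B = b; C = c; D = d; }
}

public static double Sum(in BigValue v)
{
    // v arrives by reference, but assigning to v (or its fields) here is a compile-time error.
    return v.A + v.B + v.C + v.D;
}

// The call site looks like ordinary pass-by-value:
// var big = new BigValue(1, 2, 3, 4);
// double s = Sum(big);             // or Sum(in big) to be explicit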
Regarding constant pointers, see this answer: Difference between const. pointer and reference?
In other words, a reference in C++ is very similar to a const pointer for most applications. However, a reference in C# is closer to a pointer in C++ regarding how they can be used.
Reference-type objects passed as arguments in C# can be reassigned inside the method (say, from object instance A to object instance B), but because the reference itself is passed by value, that reassignment only affects the method's local copy; the caller's variable still refers to A, and B becomes eligible for collection once the method returns. In this sense, you can freely pass references around in C# and know that the caller's variable cannot be made to point to a different object (unless the ref/out keywords are used).
C# example -
class Foo
{
//Stateful field
public int x;
//Constructor
public Foo()
{
x = 6;
}
}
public class Program
{
public static void Main()
{
var foo = new Foo();
foo.x = 8;
VarTestField(foo);
Console.WriteLine(foo.x);
RefTestField(ref foo);
Console.WriteLine(foo.x);
}
//Object passed by reference, pointer passed by value
static void VarTestField(Foo whatever){
whatever = new Foo();
}
//Object passed by reference, pointer passed by reference
static void RefTestField(ref Foo whatever){
whatever = new Foo();
}
}
Output:
8
6
So no, you cannot declare a constant pointer in C#. Still, using proven design patterns, algorithms, and OOP fundamentals wisely, along with the built in syntax of the language, you can achieve the desired behavior.

Storing a "managed" context parameter in an unmanaged DLL

I don't know if this is a bad idea or not. I'm using an unmanaged DLL (written by me) in C#.
There are some callback functions that can be set up in the DLL, but these can only be mapped to static class members on the C# side.
Since I want to make a callback operate on a particular class instance I'm wondering if it would be safe to store a class instance pointer inside the DLL's state information.
From the DLL's perspective this will simply be a 32-bit context integer, but from the C# side this will be an actual class "pointer" or "reference", with the callback signature defined something like so:
public delegate void StatusChangeHandler(ContextClass context, int someCallbackValue);
It does compile and it does appear to work, I just don't know if this is guaranteed. Is this an acceptable practice?
One problem that I see here is that .NET has a garbage collector, which can move your object around, so your saved pointer may be invalidated. In order to prevent this for simple types, you should pin the object like this:
byte[] b = new byte[1000];
// pin b, and get pointer to the first element.
fixed (byte* ptr = b)
{
// use your fixed pointer to b; b will not be moved until code leaves the fixed region
}
Though for complex types .NET may be smart enough to pin objects automatically, I would not rely on that.
So you would have to write something like this (fixed cannot take the address of an arbitrary class instance, so a GCHandle is used here to get a stable value to hand to the native side):
var ctx = new Context();
// A GCHandle keeps ctx alive, and GCHandle.ToIntPtr yields a stable token for the native code;
// the object does not even need to be pinned, because the DLL only stores the value and passes it back.
GCHandle handle = GCHandle.Alloc(ctx);
StatusChange(GCHandle.ToIntPtr(handle));
// do other stuff, and don't free the handle until the native library has cleared the pointer:
// handle.Free();
But really, I think a much simpler and more reliable way would be to create a static dictionary for your context objects, and give your native DLL only a key into that dictionary, which could be a number, string or GUID - i.e. anything that is a value, not a pointer.
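A sketch of that dictionary approach, with the callback carrying the integer key instead of the class reference; the registry type and member names are illustrative:
using System.Collections.Concurrent;
using System.Threading;

public static class ContextRegistry
{
    private static readonly ConcurrentDictionary<int, ContextClass> Contexts = new ConcurrentDictionary<int, ContextClass>();
    private static int _nextKey;

    // Hand this value to the native DLL instead of a pointer.
    public static int Register(ContextClass context)
    {
        int key = Interlocked.Increment(ref _nextKey);
        Contexts[key] = context;
        return key;
    }

    // The static callback looks the instance up again.
    public static void OnStatusChange(int contextKey, int someCallbackValue)
    {
        if (Contexts.TryGetValue(contextKey, out ContextClass context))
        {
            // dispatch to the instance here
        }
    }

    // When the native side no longer needs the context:
    // Contexts.TryRemove(contextKey, out _);
}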
