I know there are plenty of similar questions, but I want to understand very specific aspects that are never mentioned anywhere, in both cases: managed (the .NET runtime in this case) and native/unmanaged (C, C++, etc.).
Taken from here:
https://adamsitnik.com/Value-Types-vs-Reference-Types
The first issue, which isn't just glossed over but never even mentioned, is: how is your code/runtime supposed to know what type of struct you are dealing with? If a struct is just all its data packed together, then where is the metadata about what type of struct it is? Okay, maybe in the case of the managed .NET runtime and its binaries it is easier, since it is part of the IL, but what about native binary code? That gets stripped completely; if you open it for text reading there are no function or struct names in your binary. How would the runtime know what struct it has received and how to treat/parse it if there is no struct metadata with it? The pointer just points at memory, but the struct's structure and members are not stored there. At least a class can be identified by the extra data it has (object header and method table).
Things get even more confusing when you receive struct data from unmanaged/native space. You NEED to have that data embedded into the struct, otherwise how would you know what you are receiving? And I can't even begin to understand how this would work for something like classes, because they are a thousand times more complex. How do you even return a class from unmanaged space?
If a struct is just all its data packed together, then where is the metadata about what type of struct it is?
There isn't any. With raw value data - whether that means an individual value, or an array/vector/whatever - the consuming code is the thing that knows how to interpret it. What it is, is just: bytes.
How would the runtime know what struct it has received and how to treat/parse it if there is no struct metadata with it?
Because it is defined in the API, either in a signature, or in human words such as "the pointer refers to the start of len elements of type Foo".
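To make that concrete, here is a minimal sketch (the library name, the Foo struct, and the export are hypothetical, not from the original answer): the only place the layout is recorded is the C# declaration that both sides agree on; the bytes themselves carry no type information.

using System;
using System.Runtime.InteropServices;

// Hypothetical struct: the layout is an agreement between both sides; the
// native header would declare the same fields in the same order.
[StructLayout(LayoutKind.Sequential)]
struct Foo
{
    public int Id;
    public double Value;
}

static class NativeMethods
{
    // Hypothetical export: "the pointer refers to the start of len elements
    // of type Foo". The signature is the only "metadata" there is.
    [DllImport("example.dll")]
    public static extern void ProcessFoos(IntPtr foos, int len);
}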
You NEED to have that data embedded into the struct
No you don't; you just need to agree in advance what you are sending/receiving. This is how the vast majority of interactions between different codebases works, and has always worked. Having object metadata is the exception, not the norm.
How do you even return a class from unmanaged space?
You wouldn't, if by "classes" in this context you mean managed objects.
Related
The basic building block type of my application decomposes into a type (class or structure) which contains some standard value types (int, bool, etc.) and some array types of standard value types, where there will be a small (but unknown) number of elements in the collection.
Given that I have many instances of the above building block, I would like to limit the memory usage of my basic type by using an array/collection as a Value Type instead of the standard Reference Type. Part of the problem is that my standard usage will be to have the arrays containing zero, one or two elements in them and the overhead of the array reference type in this scenario is prohibitive.
I have empirically observed and research has confirmed that the array wrapper itself introduces unwanted (by me, in this situation) overhead in each instance.
How do I make a collection a Value Type / Struct in .NET?
Side Note: it is interesting that Apple's Swift language has arrays as value types by default.
Pre-Emptive Comment
I am fully aware that the above is a non-standard way of using the .NET framework and is very bad practice etc...so it's not necessary to comment to that effect. I really just want to know how to achieve what I am asking.
The fixed keyword referenced in the docs seems to be what you're looking for. It has the same constraints on types as structs do, but it does require unsafe.
internal unsafe struct MyBuffer
{
    public fixed char fixedBuffer[128];
}
If you wanted to also have a fixed array of your struct it would be more complicated. fixed only supports the base value types, so you'd have to drop into manual memory allocation.
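A rough sketch of what dropping into manual memory allocation could look like, assuming a blittable element type (the type and names here are made up for illustration):

using System;
using System.Runtime.InteropServices;

// Hypothetical element type; any blittable struct works the same way.
[StructLayout(LayoutKind.Sequential)]
struct Element
{
    public int A;
    public int B;
}

static class ManualAllocation
{
    static unsafe void Main()
    {
        const int count = 16;

        // Allocate unmanaged memory for 16 elements; the GC knows nothing
        // about this block, so it must be freed explicitly.
        Element* buffer = (Element*)Marshal.AllocHGlobal(sizeof(Element) * count);
        try
        {
            buffer[0].A = 42;   // plain pointer arithmetic, no bounds checks
            Console.WriteLine(buffer[0].A);
        }
        finally
        {
            Marshal.FreeHGlobal((IntPtr)buffer);
        }
    }
}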
A mix of ideas from a DirectBuffer and a BufferPool could work.
If you use a buffer pool then fixing buffers in memory is not a big issue because buffers become effectively long-lived and do not affect GC compaction as much as if you were fixing every new byte[] without a pool.
The DirectBuffer uses the flyweight pattern and adds very little overhead. You could read/write any blittable struct directly using pointers. Besides SBE, Flatbuffers and Cap'n Proto also use such an approach, as far as I understand. In the linked implementation you should change the delegate so that it returns a discarded byte[] to the pool.
A big advantage of such a solution is zero-copy if you need to interop with native code or send data over the network. Additionally, you could allocate a single buffer and work with offsets/lengths in an ArraySegment-like fashion.
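For example, a minimal sketch of reading/writing a blittable struct at an offset inside a pooled byte[] (the Header type and the offset are invented for illustration):

using System;
using System.Runtime.InteropServices;

// Hypothetical blittable record stored inside a larger buffer.
[StructLayout(LayoutKind.Sequential)]
struct Header
{
    public int Length;
    public long Timestamp;
}

static class DirectRead
{
    static unsafe void Main()
    {
        byte[] buffer = new byte[1024];          // in practice, rented from a pool

        fixed (byte* p = buffer)                 // pin while reading/writing
        {
            // Reinterpret the bytes at offset 16 as a Header: no copying,
            // no per-message allocation.
            Header* header = (Header*)(p + 16);
            header->Length = 128;
            Console.WriteLine(header->Length);
        }
    }
}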
Update:
I have re-read the question and realized that it was specifically about collections as value types. However, the main rationale seems to be memory pressure, so this answer could be an alternative solution for memory, even though DirectBuffer is a class.
I have a string which holds the entire definition of a native C++ struct (that means the ENTIRE struct - name, fields, enums, but without methods - in the same syntax you would use to simply write the struct yourself).
What I need to do is take the string and convert it to a string representing the Managed C++ type of the native type.
The parsing and handling of the string is done with C#.
I am looking for a way and a library perhaps which makes it easy to do what I need.
I was thinking of somehow create a template and edit it with the data from the given native struct.
If you have an answer, please divide it into two cases: the case in which I DO NOT have access to the project/DLL where the native struct itself is defined (which means it needs to be pure parsing of the string), and the case in which I do have access, where I might be able to somehow use reflection.
In the end I used a T4 template. I made a template of a managed C++ struct, parsed the native struct line by line to get the info needed about members and such, and then inserted it into the T4 template.
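As an illustration only (none of this is from the original answer), the line-by-line pass could be as simple as a regex that pulls the type and name out of each field declaration before feeding them into the template:

using System.Text.RegularExpressions;

// Hypothetical helper for the line-by-line pass: matches simple field
// declarations such as "int count;" or "unsigned short flags[4];" and
// captures the type and field name for insertion into the T4 template.
static class FieldLineParser
{
    static readonly Regex FieldPattern =
        new Regex(@"^\s*(?<type>[\w\s:<>\*]+?)\s+(?<name>\w+)\s*(\[\s*\d+\s*\])?\s*;");

    public static (string Type, string Name)? TryParse(string line)
    {
        Match m = FieldPattern.Match(line);
        if (!m.Success)
            return null;
        return (m.Groups["type"].Value.Trim(), m.Groups["name"].Value);
    }
}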
We have to interop with native code a lot, and in this case it is much faster to use unsafe structs that don't require marshaling. However, we cannot do this when the structs contain fixed size buffers of nonprimitive types.
Why is it a requirement from the C# compiler that fixed size buffers are only of the primitive types? Why can a fixed size buffer not be made of a struct such as:
[StructLayout(LayoutKind.Sequential)]
struct SomeType
{
    int Number1;
    int Number2;
}
Fixed size buffers in C# are implemented with a CLI feature called "opaque classes". Section I.12.1.6.3 of Ecma-335 describes them:
Some languages provide multi-byte data structures whose contents are manipulated directly by
address arithmetic and indirection operations. To support this feature, the CLI allows value types
to be created with a specified size but no information about their data members. Instances of
these “opaque classes” are handled in precisely the same way as instances of any other class, but
the ldfld, stfld, ldflda, ldsfld, and stsfld instructions shall not be used to access their contents.
The "no information about their data members" and "ldfld/stfld shall not be used" are the rub. The 2nd rule puts the kibosh on structures, you need ldfld and stfld to access their members. The C# compiler cannot provide an alternative, the layout of a struct is a runtime implementation detail. Decimal and Nullable<> are out because they are structs as well. IntPtr is out because its size depends on the bitness of the process, making it difficult for the C# compiler to generate the address for the ldind/stind opcode used to access the buffer. Reference types references are out because the GC needs to be able to find them back and can't by the 1st rule. Enum types have a variable size that depend on their base type; sounds like a solvable problem, not entirely sure why they skipped it.
Which just leaves the ones mentioned by the C# language specification: sbyte, byte, short, ushort, int, uint, long, ulong, char, float, double or bool. Just the simple types with a well defined size.
What is a fixed buffer?
From MSDN:
In C#, you can use the fixed statement to create a buffer with a fixed size array in a data structure. This is useful when you are working with existing code, such as code written in other languages, pre-existing DLLs or COM projects. The fixed array can take any attributes or modifiers that are allowed for regular struct members. The only restriction is that the array type must be bool, byte, char, short, int, long, sbyte, ushort, uint, ulong, float, or double.
I'm just going to quote Mr. Hans Passant regarding why a fixed buffer MUST be unsafe. You might see Why is a fixed size buffers (arrays) must be unsafe? for more information.
Because a "fixed buffer" is not a real array. It is a custom value type, about the only way
to generate one in the C# language that I know. There is no way for
the CLR to verify that indexing of the array is done in a safe way.
The code is not verifiable either. The most graphic demonstration of
this:
using System;

class Program {
    static unsafe void Main(string[] args) {
        var buf = new Buffer72();
        // bs has only 7 elements; index 8 reads past the end of the buffer
        // into whatever happens to be on the stack.
        Console.WriteLine(buf.bs[8]);
        Console.ReadLine();
    }
}

public struct Buffer72 {
    public unsafe fixed byte bs[7];
}
You can arbitrarily access the stack frame in this example. The standard buffer overflow injection technique would be available to malicious code to patch the function return address and force your code to jump to an arbitrary location.
Yes, that's quite unsafe.
Why can't a fixed buffer contain non-primitive data types?
Simon White raised a valid point:
I'm gonna go with "added complexities to the compiler". The compiler would have to check that no .NET specific functionality was applied to the struct that applied to enumerable items. For example, generics, interface implementation, even deeper properties of non-primitive arrays, etc. No doubt the runtime would also have some interop issues with that sort of thing too.
And Ibasa:
"But that is already done by the compiler." Only partly. The compiler can do the checks to see if a type is managed but that doesn't take care of generating code to read/write structs to fixed buffers. It can be done (there's nothing stopping it at CIL level) it just isn't implemented in C#.
Lastly, Mehrdad:
I think it's literally because they don't want you to use fixed-size buffers (because they want you to use managed code). Making it too easy to interop with native code makes you less likely to use .NET for everything, and they want to promote managed code as much as possible.
The answer appears to be a resounding "it's just not implemented".
Why's it not implemented?
My guess is that the cost and implementation time just isn't worth it to them. The developers would rather promote managed code over unmanaged code. It could possibly be done in a future version of C#, but the current CLR lacks a lot of the complexity needed.
An alternative explanation could be the security issue. Given that fixed buffers are immensely vulnerable to all sorts of problems and security risks if used poorly in your code, I can see why their use would be discouraged in favor of managed code in C#. Why put a lot of work into something you'd like to discourage the use of?
I understand your point of view... on the other hand, I suppose that it could be some kind of forward compatibility reserved by Microsoft. Your code is compiled to MSIL, and it is the business of the specific .NET Framework and OS to lay it out in memory.
I can imagine that a new CPU may come from Intel which will require variables to be aligned to every 8 bytes to gain optimal performance. In that case there would be a need, in some future .NET Framework 6 and some future Windows 9, to lay out these structs in a different way. In this case, your example code would be pressure on Microsoft not to change the memory layout in the future and not to speed up the .NET Framework for modern hardware.
It is only speculation...
Did you try to set FieldOffset? See C++ union in C#.
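For reference, a minimal sketch of what FieldOffset looks like when used union-style (the type and field names are purely illustrative):

using System.Runtime.InteropServices;

// Rough C# equivalent of a C/C++ union: both fields start at offset 0,
// so they overlap in memory.
[StructLayout(LayoutKind.Explicit)]
struct IntOrFloat
{
    [FieldOffset(0)] public int AsInt;
    [FieldOffset(0)] public float AsFloat;
}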
I've got a "native" DLL with some nice functions that can return (and accept) pointers to data which is formatted according to particular C structs.
In my C# program I don't care about the struct internals, I just want to get and pass them from/to the native functions. I've already managed to pinvoke the functions inside the DLL.
For the pointers, I've thought of using void* (as a "pointer-to-unknown"), since I really don't care about the internal fields of the pointed-to structs; I just need to store the pointers and pass them to the DLL's functions.
But using void* for many different kinds of data makes my code unreadable! Is there any way in C# to typedef void* some_nicer_type_t ? Or to do something like that?
You could consider using IntPtr instead.
From MSDN:
A platform-specific type that is used to represent a pointer or a handle.
This may ultimately aid you in writing non-unsafe code, too.
EDIT:
To address your desired needs as reiterated in a comment on this question, one thing I might suggest (though not proposing this to be ideal) is to define a struct or class which is essentially a wrapper around a pointer:
public struct TypedPointer
{
    public IntPtr UnderlyingPointer;
}
As you yourself bring up, this may lead to wrapping even more code in order to have it all conform to the usage and aesthetics that you want.
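As a sketch of how that wrapping might read in practice (the handle names and native exports below are invented for illustration):

using System;
using System.Runtime.InteropServices;

// Hypothetical handle wrappers: one struct per native "handle" kind, so the
// managed signatures say which pointer they expect without exposing void*.
public readonly struct SessionHandle
{
    public readonly IntPtr Value;
    public SessionHandle(IntPtr value) { Value = value; }
}

public readonly struct FrameHandle
{
    public readonly IntPtr Value;
    public FrameHandle(IntPtr value) { Value = value; }
}

static class NativeApi
{
    // Hypothetical exports -- the native side just sees opaque pointers.
    [DllImport("native.dll")] private static extern IntPtr open_session();
    [DllImport("native.dll")] private static extern IntPtr next_frame(IntPtr session);

    public static SessionHandle OpenSession() => new SessionHandle(open_session());
    public static FrameHandle NextFrame(SessionHandle s) => new FrameHandle(next_frame(s.Value));
}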
IntPtr is the way to go.
The only area where you have to be careful is memory management. Unsafe or not, C# needs a reference to the underlying object, otherwise it will sooner or later be garbage-collected - potentially while one of the external DLLs is still using it. Thus you need to ensure that the reference behind the IntPtr is kept alive as long as needed on the C# side as well.
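One common way to keep managed memory alive (and fixed in place) for as long as the native side needs it is a pinned GCHandle; a minimal sketch, with the buffer and its lifetime being purely illustrative:

using System;
using System.Runtime.InteropServices;

static class PinningSketch
{
    static void Main()
    {
        byte[] data = new byte[256];

        // Pin the array so the GC can neither collect nor move it while the
        // native code holds the pointer; free the handle once the native
        // side is guaranteed to be done with it.
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            IntPtr ptr = handle.AddrOfPinnedObject();
            // ... pass ptr to the external DLL here ...
        }
        finally
        {
            handle.Free();
        }
    }
}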
What is marshalling and why do we need it?
I find it hard to believe that I cannot send an int over the wire from C# to C and have to marshall it. Why can't C# just send the 32 bits over with a starting and terminating signal, telling C code that it has received an int?
If there are any good tutorials or sites about why we need marshalling and how to use it, that would be great.
Because different languages and environments have different calling conventions, different layout conventions, different sizes of primitives (cf. char in C# and char in C), different object creation/destruction conventions, and different design guidelines. You need a way to get the stuff out of managed land and into somewhere where unmanaged land can see and understand it, and vice versa. That's what marshalling is for.
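To illustrate the char example with a sketch (the library and export are hypothetical):

using System.Runtime.InteropServices;

static class StringMarshalling
{
    // Hypothetical export taking a "const char *". C's char is one byte,
    // C#'s char is two bytes (UTF-16), so the string has to be converted
    // (marshalled) to single-byte characters before the native side can
    // read it -- which CharSet.Ansi asks the interop layer to do.
    [DllImport("legacy.dll", CharSet = CharSet.Ansi)]
    public static extern int print_message(string message);
}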
.NET code (C#, VB) is called "managed" because it is "managed" by the CLR (Common Language Runtime).
If you write code in C, C++, or assembler, it is all called "unmanaged", since no CLR is involved. You are responsible for all memory allocation/de-allocation.
Marshaling is the process of moving data between managed and unmanaged code; it is one of the most important services offered by the CLR.
Marshalling an int is ideally just what you said: copying the memory from the CLR's managed stack into someplace where the C code can see it. Marshalling strings, objects, arrays, and other types is the difficult part.
But the P/Invoke interop layer takes care of almost all of these things for you.
As Vinko says in the comments, you can pass primitive types without any special marshalling. These are called "blittable" types and include types like byte, short, int, long, etc. and their unsigned counterparts.
This page contains the list of blittable and non-blittable types.
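For instance, a hypothetical signature where nothing needs converting because every parameter is blittable:

using System.Runtime.InteropServices;

static class BlittableExample
{
    // Hypothetical native function "int add(int a, int b)": int is 32 bits
    // on both sides, so the values are passed as-is with no conversion step.
    [DllImport("mathlib.dll")]
    public static extern int add(int a, int b);
}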
Marshalling is a "medium" for want of a better word or a gateway, to communicate with the unmanaged world's data types and vice versa, by using the pinvoke, and ensures the data is returned back in a safe manner.
Marshalling is passing signature of a function to a different process which is on a different machine, and it is usually implemented by conversion of structured data to a dedicated format, which can be transferred to other processor systems (serialization / deserialization).