Why does this code work without the unsafe keyword? - c#

In an answer to his own controversial question, Mash has illustrated that you don't need the "unsafe" keyword to read and write directly to the bytes of any .NET object instance. You can declare the following types:
[StructLayout(LayoutKind.Explicit)]
struct MemoryAccess
{
[FieldOffset(0)]
public object Object;
[FieldOffset(0)]
public TopBytes Bytes;
}
class TopBytes
{
public byte b0;
public byte b1;
public byte b2;
public byte b3;
public byte b4;
public byte b5;
public byte b6;
public byte b7;
public byte b8;
public byte b9;
public byte b10;
public byte b11;
public byte b12;
public byte b13;
public byte b14;
public byte b15;
}
And then you can do things like change an "immutable" string. The following code prints "bar" on my machine:
string foo = "foo";
MemoryAccess mem = new MemoryAccess();
mem.Object = foo;
mem.Bytes.b8 = (byte)'b';
mem.Bytes.b10 = (byte)'a';
mem.Bytes.b12 = (byte)'r';
Console.WriteLine(foo);
You can also trigger an AccessViolationException by corrupting object references with the same technique.
Question: I thought that (in pure managed C# code) the unsafe keyword was necessary to do things like this. Why is it not necessary here? Does this mean that pure managed "safe" code is not really safe at all?

OK, that is nasty... the dangers of using a union. That may work, but isn't a very good idea - I guess I'd compare it to reflection (where you can do most things). I'd be interested to see if this works in a constrained access environment - if so, it may represent a bigger problem...
I've just tested it without the "Full Trust" flag, and the runtime rejects it:
Could not load type 'MemoryAccess' from assembly 'ConsoleApplication4, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' because objects overlapped at offset 0 and the assembly must be verifiable.
And to have this flag, you already need high trust - so you can already do more nasty things. Strings are a slightly different case, because they aren't normal .NET objects - but there are other examples of ways to mutate them - the "union" approach is an interesting one, though. For another hacky way (with enough trust):
string orig = "abc ", copy = orig;
typeof(string).GetMethod("AppendInPlace",
BindingFlags.NonPublic | BindingFlags.Instance,
null, new Type[] { typeof(string), typeof(int) }, null)
.Invoke(orig, new object[] { "def", 3 });
Console.WriteLine(copy); // note we didn't touch "copy", so we have
// mutated the same reference
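The loader message quoted above objects to an object reference overlapping other data at offset 0. For contrast, here is a minimal sketch of an explicit-layout union where both overlapping fields are plain value types, so no reference is involved. The type FloatIntUnion is hypothetical. This sketch says nothing about how the verifier treats such a type; it only shows which kind of overlap the message is not about.

```csharp
using System.Runtime.InteropServices;

// Hypothetical union type: both fields are value types, so no object
// reference shares offset 0 with raw data (unlike MemoryAccess above).
[StructLayout(LayoutKind.Explicit)]
public struct FloatIntUnion
{
    [FieldOffset(0)] public float Float;
    [FieldOffset(0)] public int Int;

    // Writes a float and reads the same 4 bytes back as an int,
    // i.e. returns the IEEE 754 bit pattern of 'value'.
    public static int BitsOf(float value)
    {
        var u = new FloatIntUnion { Float = value };
        return u.Int;
    }
}
```

For example, FloatIntUnion.BitsOf(1.0f) returns 0x3F800000, which is the IEEE 754 encoding of 1.0.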

Whoops, I've muddled unsafe with fixed. Here's a corrected version:
The sample code does not need the unsafe keyword because it contains no pointers (see the quote below for why pointers are regarded as unsafe). You are quite correct: "safe" might better be termed "run-time friendly". For more on this topic I refer you to Don Box and Chris Sells' Essential .NET.
To quote MSDN,
In the common language runtime (CLR), unsafe code is referred to as unverifiable code. Unsafe code in C# is not necessarily dangerous; it is just code whose safety cannot be verified by the CLR. The CLR will therefore only execute unsafe code if it is in a fully trusted assembly. If you use unsafe code, it is your responsibility to ensure that your code does not introduce security risks or pointer errors.
The difference between fixed and unsafe is that fixed stops the CLR from moving things around in memory, so that things outside the run-time can safely access them. Unsafe is about exactly the opposite problem: the CLR can guarantee correct resolution for a .NET reference, but it cannot do so for a pointer. You may recall various Microsofties going on about how a reference is not a pointer; this is why they make such a fuss about a subtle distinction.

You are still opting out of the 'managed' bit. There is the underlying assumption that if you can do that then you know what you're doing.

Related

What's the size and alignment of C# fixed bool array in struct?

When doing P/Invoke, it is important to make the data layout match.
We can control the layout of a struct by using attributes.
For example:
struct MyStruct
{
public bool f;
}
gives a size of 4. We can also tell the compiler to make it a 1-byte bool to match the C++ bool type:
struct MyStruct
{
[MarshalAs(UnmanagedType.I1)]
public bool f;
}
gives a size of 1.
These make sense. But when I tested a fixed bool array, I was confused.
unsafe struct MyStruct
{
public fixed bool fs[1];
}
gives a size of 4 bytes, and
unsafe struct MyStruct
{
public fixed bool fs[4];
}
still gives a size of 4 bytes. but
unsafe struct MyStruct
{
public fixed bool fs[5];
}
gives a size of 8.
It looks like in a fixed bool array, each bool element is still 1 byte, but the alignment is 4 bytes. This doesn't match a C++ bool array, which has 1-byte size and alignment.
Can someone explain this?
Update: I finally found out the reason. If a struct contains a bool, that struct will NEVER be blittable! So don't expect a struct with a bool inside to have the same layout as in C.
Regards,
Xiang.
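The sizes of 4 and 1 quoted in the question are the marshaled (unmanaged) sizes that Marshal.SizeOf reports. Here is a minimal sketch that measures both declarations. The names DefaultBool, OneByteBool and BoolSizes are hypothetical.

```csharp
using System.Runtime.InteropServices;

// Default marshaling treats bool as the 4-byte Win32 BOOL.
public struct DefaultBool
{
    public bool f;
}

// MarshalAs(I1) asks the marshaller for a 1-byte bool instead.
public struct OneByteBool
{
    [MarshalAs(UnmanagedType.I1)]
    public bool f;
}

public static class BoolSizes
{
    // Both return unmanaged sizes; the managed size can differ.
    public static int Default() => Marshal.SizeOf(typeof(DefaultBool));
    public static int OneByte() => Marshal.SizeOf(typeof(OneByteBool));
}
```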
A bool is rather special. It goes back to Dennis Ritchie's decision not to give the C language a bool type. That caused plenty of mayhem: language and operating system designers added it themselves and made incompatible choices.
It was added to the Winapi as the BOOL typedef, which is the default marshaling if you don't force another type. It is typedef-ed as int to keep it compatible with C, so it takes 4 bytes, as you found out. Like any int, it also aligns to 4, as you found out.
It was added to C++. Without a size specification, most C++ compiler implementations chose a single byte for storage. Most notably the Microsoft C++ compiler did, the most likely implementation you'll interop with.
It was added to COM Automation as VARIANT_BOOL. COM Automation was originally targeted as the new extension model for Visual Basic, to get rid of the VBX restrictions. It became wildly popular, and just about any language runtime on Windows now supports it. VB back then was heavily affected by 16-bit operating system sensibilities, so a VARIANT_BOOL takes 2 bytes.
All three native runtime environments are likely targets for interop in a C# program. Clearly the CLR designers had a very difficult choice to make, having to pick between 1, 2 and 4 bytes. There is no way to win. The CLR does have a shot at guessing for COM interop, but it cannot know whether you are trying to interop with a C-based API or a C++ program. So they made the only logical choice: none of them.
A struct or class type that contains a bool is never blittable. That holds even when you apply [MarshalAs(UnmanagedType.U1)], the one that would make it compatible with the CLR type. I'm not so sure that was a good decision, but it is the one they made, so we'll have to deal with it.
Getting a blittable struct is highly desirable, it avoids copying. It allows native code to directly access the managed heap and stack. Pretty dangerous and many a broken pinvoke declaration has corrupted the GC heap without the usual benefit of the unsafe keyword alert. But impossible to beat for speed.
You get a blittable struct by not using bool. Use byte instead. You can still get the bool back by wrapping the struct member with a property. Don't use an auto-implemented property, because you must care about the position of the byte. Thus:
struct MyStruct
{
private byte _f;
public bool f {
get { return _f != 0; }
set { _f = (byte)(value ? 1 : 0); } // the int result needs an explicit cast to byte
}
}
Native code is oblivious to the property. Don't fret about runtime overhead for the getter and setter: the jitter optimizer makes them disappear, and each one turns into a single CPU instruction.
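To see what native code actually receives from such a struct, here is a minimal sketch that copies it to unmanaged memory with Marshal.StructureToPtr and reads the raw byte back. ByteBackedFlag is a hypothetical name for the MyStruct shown above. The round trip through AllocHGlobal stands in for a real native call.

```csharp
using System;
using System.Runtime.InteropServices;

// Same idea as MyStruct above: store a byte, expose a bool.
[StructLayout(LayoutKind.Sequential)]
public struct ByteBackedFlag
{
    private byte _f;

    public bool F
    {
        get { return _f != 0; }
        set { _f = (byte)(value ? 1 : 0); }
    }

    // Copies the struct to unmanaged memory and returns the raw byte
    // that native code would see at offset 0.
    public static byte RawByteFor(bool value)
    {
        var s = new ByteBackedFlag { F = value };
        IntPtr p = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(ByteBackedFlag)));
        try
        {
            Marshal.StructureToPtr(s, p, false);
            return Marshal.ReadByte(p);
        }
        finally
        {
            Marshal.FreeHGlobal(p);
        }
    }
}
```

Marshal.SizeOf(typeof(ByteBackedFlag)) is 1, and RawByteFor(true) returns 1.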
Should work:
[StructLayout(LayoutKind.Sequential)]
unsafe struct MyStruct
{
public fixed bool fs[5];
}

Why cannot marshal struct with auto layout

I encountered an odd behaviour when marshalling a struct with auto layout kind.
For example, take this simple code:
[StructLayout(LayoutKind.Auto)]
public struct StructAutoLayout
{
byte B1;
long Long1;
byte B2;
long Long2;
byte B3;
}
public static void Main()
{
Console.WriteLine("Sizeof struct is {0}", Marshal.SizeOf<StructAutoLayout>());
}
it throws an exception:
Unhandled Exception: System.ArgumentException: Type 'StructAutoLayout' cannot be marshaled as an unmanaged structure; no meaningful size or offset can be computed.
So does this mean the compiler doesn't know the struct size at compile time? I was sure that this attribute reorders the struct fields and then compiles it, but it doesn't.
It doesn't make any sense. Marshalling is used for interop - and when doing interop, the two sides have to agree exactly on the structure of the struct.
When you use auto layout, you defer the decision about the structure layout to the compiler. Even different versions of the same compiler can result in different layouts - that's a problem. For example, one compiler might use this:
public struct StructAutoLayout
{
byte B1;
long Long1;
byte B2;
long Long2;
byte B3;
}
while another might do something like this:
public struct StructAutoLayout
{
byte B1;
byte B2;
byte B3;
byte _padding;
long Long1;
long Long2;
}
When dealing with native/unmanaged code, there's pretty much no meta-data involved - just pointers and values. The other side has no way of knowing how the structure is actually laid out, it expects a fixed layout you both agreed upon in advance.
.NET has a tendency to make you spoiled - almost everything just works. This is not the case when interoperating with something like C++. If you just guess your way around, you'll most likely end up with a solution that usually works but once in a while crashes your whole application. When doing anything with unmanaged / native code, make sure you understand perfectly what you're doing - unmanaged interop is just fragile that way.
Now, the Marshal class is designed specifically for unmanaged interop. If you read the documentation for Marshal.SizeOf, it specifically says
Returns the size of an unmanaged type in bytes.
And of course,
You can use this method when you do not have a structure. The layout must be sequential or explicit.
The size returned is the size of the unmanaged type. The unmanaged and managed sizes of an object can differ. For character types, the size is affected by the CharSet value applied to that class.
If the type can't possibly be marshalled, what should Marshal.SizeOf return? That doesn't even make sense :)
Asking for the size of a type or an instance doesn't make any sense in a managed environment. "Real size in memory" is an implementation detail as far as you are concerned - it's not a part of the contract, and it's not something to rely on. If the runtime / compiler wanted, it could make every byte 77 bytes long, and it wouldn't break any contract whatsoever as long as it only stores values from 0 to 255 exactly.
If you used a struct with an explicit (or sequential) layout instead, you would have a definite contract for how the unmanaged type is laid out, and Marshal.SizeOf would work. However, even then, it will only return the size of the unmanaged type, not of the managed one - that can still differ. And again, both can be different on different systems (for example, IntPtr will be four bytes on a 32-bit system and eight bytes on a 64-bit system when running as a 64-bit application).
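Here is a minimal sketch of that contrast, assuming a typical 64-bit runtime. With sequential layout, the same fields as StructAutoLayout get a definite unmanaged size and field offsets. Pack = 1 removes the alignment padding. The auto-layout version is rejected. All type and method names are hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

// Same fields as StructAutoLayout, but with a layout the marshaller can describe.
[StructLayout(LayoutKind.Sequential)]
public struct StructSequential
{
    public byte B1;
    public long Long1;
    public byte B2;
    public long Long2;
    public byte B3;
}

// Pack = 1 removes the padding that aligns each long to 8 bytes.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct StructPacked
{
    public byte B1;
    public long Long1;
    public byte B2;
    public long Long2;
    public byte B3;
}

[StructLayout(LayoutKind.Auto)]
public struct StructAuto
{
    public byte B1;
    public long Long1;
}

public static class LayoutDemo
{
    public static int SequentialSize() => Marshal.SizeOf(typeof(StructSequential));
    public static int PackedSize() => Marshal.SizeOf(typeof(StructPacked));
    public static long OffsetOfLong1() =>
        Marshal.OffsetOf(typeof(StructSequential), "Long1").ToInt64();

    // True if Marshal.SizeOf refuses the auto-layout type.
    public static bool AutoIsRejected()
    {
        try
        {
            Marshal.SizeOf(typeof(StructAuto));
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

On such a runtime, SequentialSize() should give 40, because each long is aligned to 8. PackedSize() gives 19, and OffsetOfLong1() gives 8. As noted above, other platforms may differ.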
Another important point is that there are multiple levels of "compilation" in a .NET application. The first level, using a C# compiler, is only the tip of the iceberg, and it's not the part that reorders fields in auto-layout structs. It simply marks the struct as auto-layout, and it's done. The actual layout is worked out by the CLI when you run the application (the specification is not clear on whether the JIT compiler handles that, but I would assume so). But that has nothing to do with Marshal.SizeOf or even sizeof - both of those are still handled at runtime. Forget everything you know from C++ - C# (and even C++/CLI) is an entirely different beast.
If you need to profile managed memory, use a memory profiler (like CLRProfiler). But do understand that you're still profiling memory in a very specific environment - different systems or .NET versions can give you different results. And in fact, there's nothing saying two instances of the same structure must be the same size.

Non-blittable error on a blittable type

I have this struct and this code:
[StructLayout(LayoutKind.Sequential, Pack = 8)]
private class xvid_image_t
{
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
public int[] stride;
// [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
// public IntPtr[] plane;
}
public int decore()
{
xvid_image_t myStruct = new xvid_image_t();
myStruct.stride = new int[4]; // can be commented out - same result
GCHandle.Alloc(myStruct, GCHandleType.Pinned);
// ...
}
When I try to run it I get an ArgumentException saying:
Object contains non-primitive or non-blittable data
After reading this MSDN page saying
The following complex types are also blittable types:
One-dimensional arrays of blittable types, such as an array of integers. However, a type that contains a variable array of blittable types is not itself blittable.
Formatted value types that contain only blittable types (and classes if they are marshaled as formatted types). For more information about formatted value types, see Default Marshaling for Value Types.
I don't understand what I am doing wrong.
I don't just want to use Marshal, but to understand this too.
So what I actually want is to know:
Why?
How can I resolve this?
Will the solution you provide also work with the commented line in the struct?
I am using .Net 4.5 but a solution for .Net 2.0 is also needed.
Object contains non-primitive or non-blittable data
That's the exception message you get. You are focusing on the "non-blittable" part of the message, but that's not the problem. It is the "non-primitive" part that's the issue. An array is a non-primitive data type.
The CLR is trying to keep you out of trouble here. You could pin the object but then you still have a problem, the array won't be pinned. An object isn't truly pinned when it has fields that need to be pinned as well.
And you have a bigger problem with the UnmanagedType.ByValArray, that requires a structural conversion. In other words, the layout that you need is completely different from the layout of the managed class object. Only the pinvoke marshaller can make this conversion.
You can get what you want without using the pinvoke marshaller by using fixed size buffers, using the fixed keyword. This requires using the unsafe keyword. Make it look like this:
[StructLayout(LayoutKind.Sequential)]
unsafe private struct xvid_image_t {
public fixed int stride[4];
}
Note that you have to change the declaration to a struct type. It is now a value type, so you no longer need to use GCHandle to pin the value when you make it a local variable. Do make sure that whatever unmanaged code takes the structure value, usually by reference, does not store a pointer to the struct. That's going to blow up badly and utterly undiagnosably. The unsafe keyword is appropriate here. If it does store the pointer, then you really do have to bite the bullet and use Marshal.AllocHGlobal() and Marshal.StructureToPtr() to ensure the pointer stays valid while the unmanaged code is using it.
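The Marshal.AllocHGlobal / Marshal.StructureToPtr route mentioned above can be sketched as follows. The sketch reuses the question's class under a hypothetical name (XvidImage) and reads a value back with Marshal.ReadInt32 in place of a real native call. It frees the memory immediately. Real code would free it only after the native side has finished with the pointer.

```csharp
using System;
using System.Runtime.InteropServices;

// Class version from the question; the marshaller flattens the ByValArray.
[StructLayout(LayoutKind.Sequential, Pack = 8)]
public class XvidImage
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
    public int[] stride = new int[4];
}

public static class XvidMarshal
{
    // Copies img into freshly allocated unmanaged memory, reads back
    // stride[index] from the flat native layout, then frees the memory.
    public static int RoundTripStride(XvidImage img, int index)
    {
        int size = Marshal.SizeOf(typeof(XvidImage)); // 4 ints = 16 bytes
        IntPtr p = Marshal.AllocHGlobal(size);
        try
        {
            // false: the block is fresh, so there is no old structure to destroy.
            Marshal.StructureToPtr(img, p, false);
            // Native code would receive p here; the GC never moves this memory.
            return Marshal.ReadInt32(p, index * sizeof(int));
        }
        finally
        {
            Marshal.FreeHGlobal(p);
        }
    }
}
```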
An annoying limitation of .NET is that the only array-like things it recognizes are a stand-alone System.Array object and a System.String, both of which are reference types. Code written in C# can use a fixed array (as noted by Hans Passant), but such a type is not recognized by .NET itself, and code which uses fixed arrays is not verifiable. Additionally, fixed arrays are limited to holding primitives and cannot be accessed by other languages such as VB.NET.
Two alternatives to using a fixed array are to:
1. replace the fixed array with some combination of fields which together total the proper size (using N variables in most cases, but perhaps replacing e.g. a char[4] with a UInt32, or a char[8] with a UInt64). If the array is not too large, one might define (either via cut/paste or Reflection) a set of static methods which take a struct by ref and read/write the proper element, and then create an array of delegates to call such methods.
2. replace the entire structure with an array, and then pass the first element of that array as a ref parameter. This may be even more "dangerous" than using a fixed array within a structure, but it is the only way I know of in VB.NET to get "pass-by-ref" semantics with a structure that contains something that really needs to be accessed as an array.
I can understand that value-type arrays might have been considered "confusing" (especially if they were auto-boxed). Still, there are places where they would have been the semantically correct approach for array storage. That applies both to allowing pass-by-ref semantics for COM interop and to methods that are supposed to return a small number of values. For example, in System.Drawing.Drawing2D there is a method which returns the current graphics transform as a float[6]. Other than by experimentation, there is no clear way of knowing whether changes to that array after it is returned would affect, might affect, or are guaranteed not to affect anything else. If the method returned a value-type array, it would be clear that changes to the returned array cannot affect anything else. Nonetheless, whether or not value-type arrays would have been a useful part of the Framework, the fact remains that, for good or bad reasons, no such thing exists.
I took the answer below from another post:
SItuLongEmailMsg msg = new SItuLongEmailMsg();
// set members
msg.text = new byte[2048];
// assign to msg.text
int msgSize = Marshal.SizeOf(msg);
IntPtr ptr = Marshal.AllocHGlobal(msgSize);
// false: freshly allocated memory holds no old structure to destroy
Marshal.StructureToPtr(msg, ptr, false);
byte[] dataOut = new byte[msgSize];
Marshal.Copy(ptr, dataOut, 0, msgSize);
Marshal.FreeHGlobal(ptr); // release the unmanaged copy

.NET C# unsafe/fixed doesn't pin passthrough array element?

I have some concurrent code which has an intermittent failure and I've reduced the problem down to two cases which seem identical, but where one fails and the other doesn't.
I've now spent way too much time trying to create a minimal, complete example that fails, but without success, so I'm just posting the lines that fail in case anyone can see an obvious problem.
Object m_lock = new Object(); // 'lock' is a C# keyword, so it cannot name the field
struct MyValueType { readonly public int i1, i2; };
class Node { public MyValueType x; public int y; public Node z; };
volatile Node[] m_rg = new Node[300];
unsafe void Foo()
{
Node[] temp;
while (true)
{
temp = m_rg;
/* ... */
Monitor.Enter(m_lock);
if (temp == m_rg)
break;
Monitor.Exit(m_lock);
}
#if OK // this works:
Node cur = temp[33];
fixed (MyValueType* pe = &cur.x)
*(long*)pe = *(long*)&e;
#else // this reliably causes random corruption:
fixed (MyValueType* pe = &temp[33].x)
*(long*)pe = *(long*)&e;
#endif
Monitor.Exit(m_lock);
}
I have studied the IL code and it looks like what's happening is that the Node object at array position 33 is moving (in very rare cases) despite the fact that we are holding a pointer to a value type within it.
It's as if the CLR doesn't notice that we are passing through a heap (movable) object--the array element--in order to access the value type. The 'OK' version has never failed under extended testing on an 8-way machine, but the alternate path fails quickly every time.
Is this never supposed to work, and 'OK' version is too streamlined to fail under stress?
Do I need to pin the object myself using GCHandle (I notice in the IL that the fixed statement alone is not doing so)?
If manual pinning is required here, why is the compiler allowing access through a heap object (without pinning) in this way?
Note: this question is not about the elegance of reinterpreting the blittable value type in a nasty way, so please, no criticism of this aspect of the code unless it is directly relevant to the problem at hand. Thanks.
[edit: jitted asm]
Thanks to Hans' reply, I understand better why the jitter is placing things on the stack in what otherwise seem like vacuous asm operations. See [rsp + 50h] for example, and how it gets nulled out after the 'fixed' region. The remaining unresolved question is whether [cur+18h] (lines 207-20C) on the stack is somehow sufficient to protect the access to the value type in a way that is not adequate for [temp+33*IntPtr.Size+18h] (line 24A).
[edit]
summary of conclusions, minimal example
Comparing the two code fragments below, I now believe that #1 is not ok, whereas #2 is acceptable.
(1.) The following fails (on x64 jit at least); GC can still move the MyClass instance if you try to fix it in-situ, via an array reference. There's no place on the stack for the reference of the particular object instance (the array element that needs to be fixed) to be published, for the GC to notice.
struct MyValueType { public int foo; };
class MyClass { public MyValueType mvt; };
MyClass[] rgo = new MyClass[2000];
fixed (MyValueType* pvt = &rgo[1234].mvt)
*(int*)pvt = 1234;
(2.) But you can access a structure inside a (movable) object using fixed (without pinning) if you provide an explicit reference on the stack which can be advertised to the GC:
struct MyValueType { public int foo; };
class MyClass { public MyValueType mvt; };
MyClass[] rgo = new MyClass[2000];
MyClass mc = rgo[1234]; // <-- only difference -- add this line
fixed (MyValueType* pvt = &mc.mvt) // <-- and adjust accordingly here
*(int*)pvt = 1234;
This is where I'll leave it unless someone can provide corrections or more information...
Modifying objects of managed type through fixed pointers can result in undefined behavior (C# Language Specification, chapter 18.6).
Well, you are doing just that. In spite of the verbiage in the spec and the MSDN library, the fixed keyword does not in fact make the object unmoveable, it doesn't get pinned. You probably found out from looking at the IL. It uses a clever trick by generating a pointer + offset and letting the garbage collector adjust the pointer. I don't have a great explanation why this fails in one case but not the other. I don't see a fundamental difference in the generated machine code. But then I probably didn't reproduce your exact machine code either, the snippet isn't great.
As near as I can tell it should fail in both cases because of the structure member access. That causes the pointer + offset to collapse to a single pointer with a LEA instruction, preventing the garbage collector from recognizing the reference. Structures have always been trouble for the jitter. Thread timing could explain the difference, perhaps.
You could post to connect.microsoft.com for a second opinion. It is however going to be difficult to navigate around the spec violation. If my theory is correct then a read could fail too, much harder to prove though.
Fix it by actually pinning the array with GCHandle.
Puzzling over this, and I'm guessing here, it looks like the compiler is taking &temp (a fixed pointer to the temp array) and then indexing that with [33]. So you're pinning the temp array rather than the node. Try...
fixed (MyValueType* pe = &(temp[33]).x)
*(long*)pe = *(long*)&e;

C# Unsafe/Fixed Code

Can someone give an example of a good time to actually use "unsafe" and "fixed" in C# code? I've played with it before, but never actually found a good use for it.
Consider this code...
fixed (byte* pSrc = src, pDst = dst) {
//Code that copies the bytes in a loop
}
compared to simply using...
Array.Copy(source, target, source.Length);
The second is the code found in the .NET Framework, the first a part of the code copied from the Microsoft website, http://msdn.microsoft.com/en-us/library/28k1s2k6(VS.80).aspx.
The built-in Array.Copy() is dramatically faster than the unsafe code. That might just be because the second is better written and the first is only an example. But in what kinds of situations would you really need unsafe/fixed code? Or is this poor web developer messing with something above his head?
It's useful for interop with unmanaged code. Any pointers passed to unmanaged functions need to be fixed (aka. pinned) to prevent the garbage collector from relocating the underlying memory.
If you are using P/Invoke, then the default marshaller will pin objects for you. Sometimes it's necessary to perform custom marshalling, and sometimes it's necessary to pin an object for longer than the duration of a single P/Invoke call.
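Pinning for longer than a single P/Invoke call can be done with GCHandle. Here is a minimal sketch, with Marshal.ReadByte standing in for the native code that would receive the address. PinDemo and ReadThroughPinnedAddress are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinDemo
{
    // Pins buffer so its address stays valid until handle.Free(). A native
    // library that keeps the pointer across calls needs exactly this.
    public static byte ReadThroughPinnedAddress(byte[] buffer, int index)
    {
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            // Hypothetical: native code would receive 'address' here.
            return Marshal.ReadByte(address, index);
        }
        finally
        {
            handle.Free(); // after this the GC may move buffer again
        }
    }
}
```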
I've used unsafe-blocks to manipulate Bitmap-data. Raw pointer-access is significantly faster than SetPixel/GetPixel.
unsafe
{
BitmapData bmData = bm.LockBits(...);
byte* bits = (byte*)bmData.Scan0.ToPointer();
// Do stuff with bits
bm.UnlockBits(bmData);
}
"fixed" and "unsafe" are typically used when doing interop, or when extra performance is required. For example, String.CopyTo() uses unsafe and fixed in its implementation.
reinterpret_cast style behaviour
If you are bit manipulating then this can be incredibly useful
Many high-performance hash code implementations use UInt32 for the hash value, because it makes the shifts simpler. Since .NET requires GetHashCode to return an Int32, you want to convert the uint to an int quickly. The actual value doesn't matter, only that all the bits are preserved, so a reinterpret cast is what you want.
public static unsafe int UInt32ToInt32Bits(uint x)
{
return *((int*)(void*)&x);
}
Note that the naming is modelled on BitConverter.DoubleToInt64Bits.
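If only the bit pattern matters, an unchecked cast gives the same result as the pointer version without needing unsafe code. Here is a minimal sketch; BitCasts is a hypothetical holder class.

```csharp
public static class BitCasts
{
    // Same bits as the pointer cast: in an unchecked context a
    // uint -> int conversion just reinterprets the 32 bits.
    public static int UInt32ToInt32Bits(uint x)
    {
        return unchecked((int)x);
    }
}
```

For example, UInt32ToInt32Bits(0xFFFFFFFF) returns -1.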
Continuing in the hashing vein, converting a stack based struct into a byte* allows easy use of per byte hashing functions:
// from the Jenkins one at a time hash function
private static unsafe void Hash(byte* data, int len, ref uint hash)
{
for (int i = 0; i < len; i++)
{
hash += data[i];
hash += (hash << 10);
hash ^= (hash >> 6);
}
}
public unsafe static void HashCombine(ref uint sofar, long data)
{
byte* dataBytes = (byte*)(void*)&data;
Hash(dataBytes, sizeof(long), ref sofar); // calls the Hash method defined above
}
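The same per-byte hashing also works in safe code if you copy the long into a byte array with BitConverter.GetBytes, at the cost of one small allocation per call. BitConverter uses the machine's byte order, just as the pointer cast does. This is a sketch of the mixing step from the Hash method above, without a final avalanche step; OneAtATime is a hypothetical class name.

```csharp
using System;

public static class OneAtATime
{
    // Per-byte mixing step of the Jenkins one-at-a-time hash,
    // matching the Hash method above (no finalization).
    private static void Hash(byte[] data, ref uint hash)
    {
        unchecked
        {
            foreach (byte b in data)
            {
                hash += b;
                hash += (hash << 10);
                hash ^= (hash >> 6);
            }
        }
    }

    // Safe-code counterpart of HashCombine: bytes come from BitConverter
    // instead of a pointer to the local.
    public static void HashCombine(ref uint sofar, long data)
    {
        Hash(BitConverter.GetBytes(data), ref sofar);
    }
}
```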
unsafe also lets you use stackalloc. This can be very useful in high-performance situations where you need a small, variable-length, array-like temporary buffer.
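Since C# 7.2, stackalloc can also be assigned to a Span<T> without an unsafe context. This assumes a runtime that has Span<T>, such as .NET Core 2.1 or later. Here is a minimal sketch; StackScratch and SumOfSquares are hypothetical names.

```csharp
using System;

public static class StackScratch
{
    // The scratch buffer lives on this method's stack frame, so keep
    // 'length' small; no unsafe keyword is needed with Span<int>.
    public static int SumOfSquares(int length)
    {
        Span<int> scratch = stackalloc int[length];
        for (int i = 0; i < scratch.Length; i++)
            scratch[i] = i * i;

        int total = 0;
        foreach (int v in scratch)
            total += v;
        return total;
    }
}
```

For example, SumOfSquares(4) returns 0 + 1 + 4 + 9 = 14.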
All of these uses would be firmly in the 'only if your application really needs the performance' and thus are inappropriate in general use, but sometimes you really do need it.
fixed is necessary when you wish to interop with some useful unmanaged function (there are many) that takes C-style arrays or strings. So in interop scenarios it serves correctness as well as performance.
Unsafe is useful for (for example) getting pixel data out of an image quickly using LockBits. The performance improvement over doing this using the managed API is several orders of magnitude.
We had to use a fixed when an address gets passed to a legacy C DLL. Since the DLL maintained an internal pointer across function calls, all hell would break loose if the GC compacted the heap and moved stuff around.
I believe unsafe code is used if you want to access something outside of the .NET runtime, i.e. something that is not managed code (no garbage collection and so on). This includes raw calls to the Windows API and all that jazz.
This tells me the designers of the .NET framework did a good job of covering the problem space--of making sure the "managed code" environment can do everything a traditional (e.g. C++) approach can do with its unsafe code/pointers. In case it cannot, the unsafe/fixed features are there if you need them. I'm sure someone has an example where unsafe code is needed, but it seems rare in practice--which is rather the point, isn't it? :)
