C# Unsafe/Fixed Code

Can someone give an example of a good time to actually use "unsafe" and "fixed" in C# code? I've played with it before, but never actually found a good use for it.
Consider this code...
fixed (byte* pSrc = src, pDst = dst) {
//Code that copies the bytes in a loop
}
compared to simply using...
Array.Copy(source, target, source.Length);
The second is the code found in the .NET Framework; the first is part of the sample copied from the Microsoft website, http://msdn.microsoft.com/en-us/library/28k1s2k6(VS.80).aspx.
The built-in Array.Copy() is dramatically faster than the unsafe code. That might just be because the second is better written and the first is only an example, but what kinds of situations would you really even need unsafe/fixed code for? Or is this poor web developer messing with something above his head?

It's useful for interop with unmanaged code. Any pointers passed to unmanaged functions need to be fixed (aka. pinned) to prevent the garbage collector from relocating the underlying memory.
If you are using P/Invoke, then the default marshaller will pin objects for you. Sometimes it's necessary to perform custom marshalling, and sometimes it's necessary to pin an object for longer than the duration of a single P/Invoke call.
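If you do need to keep a buffer pinned for longer than one call, a minimal sketch (the native call named here is hypothetical) using GCHandle looks like this:
byte[] buffer = new byte[4096];

// Pin the array so the GC cannot relocate it while unmanaged code holds its address.
GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
try
{
    IntPtr address = handle.AddrOfPinnedObject();
    // Hand 'address' to the unmanaged side, which may keep the pointer across
    // several calls (e.g. a hypothetical NativeBeginStreaming(address, buffer.Length)).
}
finally
{
    handle.Free();   // unpin once the unmanaged side no longer needs the memory
}
(GCHandle lives in System.Runtime.InteropServices.)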

I've used unsafe blocks to manipulate Bitmap data. Raw pointer access is significantly faster than SetPixel/GetPixel.
unsafe
{
    BitmapData bmData = bm.LockBits(new Rectangle(0, 0, bm.Width, bm.Height),
                                    ImageLockMode.ReadWrite, bm.PixelFormat);
    byte* bits = (byte*)bmData.Scan0.ToPointer();
    // Do stuff with bits
    bm.UnlockBits(bmData);
}
"fixed" and "unsafe" is typically used when doing interop, or when extra performance is required. Ie. String.CopyTo() uses unsafe and fixed in its implementation.

reinterpret_cast style behaviour
If you are doing bit manipulation then this can be incredibly useful.
Many high-performance hash code implementations use UInt32 for the hash value (this makes the shifts simpler). Since .NET requires Int32 for GetHashCode(), you want to quickly convert the uint to an int. Since it does not matter what the actual value is, only that all the bits in the value are preserved, a reinterpret cast is what you want.
public static unsafe int UInt32ToInt32Bits(uint x)
{
return *((int*)(void*)&x);
}
Note that the naming is modelled on BitConverter.DoubleToInt64Bits.
Continuing in the hashing vein, converting a stack-based struct into a byte* allows easy use of per-byte hashing functions:
// from the Jenkins one at a time hash function
private static unsafe void Hash(byte* data, int len, ref uint hash)
{
for (int i = 0; i < len; i++)
{
hash += data[i];
hash += (hash << 10);
hash ^= (hash >> 6);
}
}
public unsafe static void HashCombine(ref uint sofar, long data)
{
byte* dataBytes = (byte*)(void*)&data;
Hash(dataBytes, sizeof(long), ref sofar);
}
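For example, a small sketch (illustrative only, built on the helpers above) of combining those pieces into a hash code for a few long values:
public static unsafe int HashOf(long x, long y, long z)
{
    uint hash = 0;
    HashCombine(ref hash, x);
    HashCombine(ref hash, y);
    HashCombine(ref hash, z);
    // Jenkins one-at-a-time finalization steps
    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;
    return UInt32ToInt32Bits(hash);   // the reinterpret-cast helper from above
}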
unsafe also (from 2.0 onwards) lets you use stackalloc. This can be very useful in high-performance situations where some small, variable-length, array-like temporary space is needed.
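A minimal sketch of stackalloc used as scratch space (the method name is illustrative); the buffer lives on the stack, so there is no heap allocation and no GC pressure:
public static unsafe int SumOfDigits(uint value)
{
    byte* digits = stackalloc byte[16];   // plenty for any 32-bit value
    int count = 0;
    do
    {
        digits[count++] = (byte)(value % 10);
        value /= 10;
    } while (value > 0);

    int sum = 0;
    for (int i = 0; i < count; i++)
        sum += digits[i];
    return sum;
}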
All of these uses fall firmly into the 'only if your application really needs the performance' category, and thus are inappropriate in general use, but sometimes you really do need it.
fixed is necessary when you wish to interop with some useful unmanaged function (there are many) that takes C-style arrays or strings. As such it is used not only for performance reasons but for correctness in interop scenarios.
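As a hedged sketch (the DLL name and export are hypothetical), passing a managed array to such a function looks like this:
// Hypothetical native export: void process_buffer(unsigned char* data, int length);
[DllImport("native.dll")]
static extern unsafe void process_buffer(byte* data, int length);

static unsafe void Send(byte[] buffer)
{
    fixed (byte* p = buffer)   // pin the array so the GC cannot move it during the call
    {
        process_buffer(p, buffer.Length);
    }
}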

Unsafe is useful for (for example) getting pixel data out of an image quickly using LockBits. The performance improvement over doing this using the managed API is several orders of magnitude.

We had to use a fixed when an address gets passed to a legacy C DLL. Since the DLL maintained an internal pointer across function calls, all hell would break loose if the GC compacted the heap and moved stuff around.

I believe unsafe code is used if you want to access something outside of the .NET runtime, i.e. something that is not managed code (no garbage collection and so on). This includes raw calls to the Windows API and all that jazz.

This tells me the designers of the .NET framework did a good job of covering the problem space--of making sure the "managed code" environment can do everything a traditional (e.g. C++) approach can do with its unsafe code/pointers. In case it cannot, the unsafe/fixed features are there if you need them. I'm sure someone has an example where unsafe code is needed, but it seems rare in practice--which is rather the point, isn't it? :)

Related

Why can't a struct with auto layout be marshalled?

I encountered odd behaviour when marshalling a struct with the auto layout kind.
For example, let's take some simple code:
[StructLayout(LayoutKind.Auto)]
public struct StructAutoLayout
{
byte B1;
long Long1;
byte B2;
long Long2;
byte B3;
}
public static void Main()
{
Console.WriteLine("Sizeof struct is {0}", Marshal.SizeOf<StructAutoLayout>());
}
it throws an exception:
Unhandled Exception: System.ArgumentException: Type
'StructAutoLayout' cannot be marshaled as an unmanaged
structure; no meaningful size or offset can be computed.
So does this mean the compiler doesn't know the struct size at compile time? I was sure that this attribute just reorders the struct fields at compile time, but apparently it doesn't.
It doesn't make any sense. Marshalling is used for interop - and when doing interop, the two sides have to agree exactly on the structure of the struct.
When you use auto layout, you defer the decision about the structure layout to the compiler. Even different versions of the same compiler can result in different layouts - that's a problem. For example, one compiler might use this:
public struct StructAutoLayout
{
byte B1;
long Long1;
byte B2;
long Long2;
byte B3;
}
while another might do something like this:
public struct StructAutoLayout
{
byte B1;
byte B2;
byte B3;
byte _padding;
long Long1;
long Long2;
}
When dealing with native/unmanaged code, there's pretty much no metadata involved - just pointers and values. The other side has no way of knowing how the structure is actually laid out; it expects a fixed layout you both agreed upon in advance.
.NET has a tendency to make you spoiled - almost everything just works. This is not the case when interoperating with something like C++ - if you just guess your way around, you'll most likely end up with a solution that usually works, but once in a while crashes your whole application. When doing anything with unmanaged / native code, make sure you understand perfectly what you're doing - unmanaged interop is just fragile that way.
Now, the Marshal class is designed specifically for unmanaged interop. If you read the documentation for Marshal.SizeOf, it specifically says
Returns the size of an unmanaged type in bytes.
And of course,
You can use this method when you do not have a structure. The layout must be sequential or explicit.
The size returned is the size of the unmanaged type. The unmanaged and managed sizes of an object can differ. For character types, the size is affected by the CharSet value applied to that class.
If the type can't possibly be marshalled, what should Marshal.SizeOf return? That doesn't even make sense :)
Asking for the size of a type or an instance doesn't make any sense in a managed environment. "Real size in memory" is an implementation detail as far as you are concerned - it's not a part of the contract, and it's not something to rely on. If the runtime / compiler wanted, it could make every byte 77 bytes long, and it wouldn't break any contract whatsoever as long as it only stores values from 0 to 255 exactly.
If you used a struct with an explicit (or sequential) layout instead, you would have a definite contract for how the unmanaged type is laid out, and Marshal.SizeOf would work. However, even then, it will only return the size of the unmanaged type, not of the managed one - that can still differ. And again, both can be different on different systems (for example, IntPtr will be four bytes on a 32-bit system and eight bytes on a 64-bit system when running as a 64-bit application).
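To illustrate, here is a sketch of the same struct with a sequential layout, for which Marshal.SizeOf can compute an unmanaged size:
[StructLayout(LayoutKind.Sequential)]
public struct StructSequentialLayout
{
    byte B1;
    long Long1;
    byte B2;
    long Long2;
    byte B3;
}
With default packing, Marshal.SizeOf<StructSequentialLayout>() now succeeds and typically returns 40, because each byte field is padded out to the 8-byte alignment of the longs.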
Another important point is that there are multiple levels of "compilation" in a .NET application. The first level, using the C# compiler, is only the tip of the iceberg - and it's not the part that handles reordering fields in auto-layout structs. It simply marks the struct as "auto layout", and it's done. The actual layout is determined when you run the application, by the CLR (the specification is not clear on whether the JIT compiler handles that, but I would assume so). But that has nothing to do with Marshal.SizeOf or even sizeof - both of those are still handled at runtime. Forget everything you know from C++ - C# (and even C++/CLI) is an entirely different beast.
If you need to profile managed memory, use a memory profiler (like CLRProfiler). But do understand that you're still profiling memory in a very specific environment - different systems or .NET versions can give you different results. And in fact, there's nothing saying two instances of the same structure must be the same size.

Is it safe to keep C++ pointers in C#?

I'm currently working on some C#/C++ code which makes use of P/Invoke. On the C++ side there is a std::vector full of pointers, each identified by index from the C# code. For example, a function declaration would look like this:
void SetName(char* name, int idx)
But now I'm thinking: since I'm working with pointers, couldn't I send the pointer address itself to C#, and then in code do something like this:
void SetName(char*name, int ptr)
{
((TypeName*)ptr)->name = name;
}
Obviously that's a quick version of what I'm getting at (and probably won't compile).
Would the pointer address be guaranteed to stay constant in C++ such that I can safely store its address in C# or would this be too unstable or dangerous for some reason?
In C#, you don't need to use a pointer here, you can just use a plain C# string.
[DllImport(...)]
extern void SetName(string name, int id);
This works because the default behavior of strings in p/invoke is to use MarshalAs(UnmanagedType.LPStr), which converts to a C-style char*. You can mark each argument in the C# declaration explicitly if it requires some other way of being marshalled, e.g. [MarshalAs(UnmanagedType.LPWStr)] for an arg that uses a 2-byte-per-character string.
The only reason to use pointers is if you need to retain access to the data pointed to after you've called the function. Even then, you can use out parameters most of the time.
You can p/invoke basically anything without requiring pointers at all (and thus without requiring unsafe code, which requires privileged execution in some environments).
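For instance, a small sketch of p/invoking a real Win32 function through an out parameter - the marshaller takes the address and handles pinning for the duration of the call:
[DllImport("kernel32.dll")]
static extern bool QueryPerformanceCounter(out long lpPerformanceCount);

long ticks;
QueryPerformanceCounter(out ticks);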
Yes, no problem. Native memory allocations never move so storing the pointer in an IntPtr on the C# side is fine. You need some kind of pinvoked function that returns this pointer, then
[DllImport("something.dll", CharSet = CharSet.Ansi)]
void SetName(IntPtr vector, string name, int index);
Which intentionally lies about this C++ function:
void SetName(std::vector<std::string>* vect, const char* name, int index) {
std::string copy = name;
(*vect)[index] = copy;
}
Note the copy of the string in the C++ code; you have to copy it. The passed name argument points to a buffer allocated by the pinvoke marshaller and is only valid for the duration of the function body. Your original code cannot work. If you intend to return pointers to vector<> elements then be very careful. A vector re-allocates its internal array when you add elements. Such a returned pointer will then become invalid and you'll corrupt the heap when you use it later. The exact same thing happens with a C# List<> but without the risk of dangling pointers.
I think it's stable only as long as you control the C++ code and are perfectly aware of what it does, and the other developers who work on the same code know about that danger too.
So in my opinion it's not a very safe architecture, and I would avoid it as much as I can.
The C# GC moves things, but the C++ heap does not move anything- a pointer to an allocated object is guaranteed to remain valid until you delete it. The best architecture for this situation is just to send the pointer to C# as an IntPtr and then take it back in C++.
It's certainly a vastly, incredibly better idea than the incredibly BAD, HORRIFIC integer cast you've got going there.
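Putting that together, a minimal sketch (all of the native exports here are hypothetical) of treating the native pointer as an opaque handle on the C# side:
[DllImport("legacy.dll")]
static extern IntPtr CreateNamesVector();                  // the C++ side allocates the vector

[DllImport("legacy.dll", CharSet = CharSet.Ansi)]
static extern void SetName(IntPtr vector, string name, int index);

[DllImport("legacy.dll")]
static extern void DestroyNamesVector(IntPtr vector);      // the C++ side frees it again

IntPtr vector = CreateNamesVector();   // native allocations never move, so the handle stays valid
SetName(vector, "example", 0);
DestroyNamesVector(vector);            // never dereference the IntPtr on the C# side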

True Unsafe Code Performance

I understand unsafe code is more appropriate for accessing things like the Windows API and doing unsafe type casts than for writing more performant code, but I would like to ask whether you have ever noticed any significant performance improvement in real-world applications by using it, compared to safe C# code.
Some Performance Measurements
The performance benefits are not as great as you might think.
I did some performance measurements of normal managed array access versus unsafe pointers in C#.
Results are from a build run outside of Visual Studio 2010, .NET 4, using an Any CPU | Release build, on the following PC specification: x64-based PC, one quad-core processor, Intel64 Family 6 Model 23 Stepping 10 GenuineIntel, ~2833 MHz.
Linear array access
00:00:07.1053664 for Normal
00:00:07.1197401 for Unsafe *(p + i)
Linear array access - with pointer increment
00:00:07.1174493 for Normal
00:00:10.0015947 for Unsafe (*p++)
Random array access
00:00:42.5559436 for Normal
00:00:40.5632554 for Unsafe
Random array access using Parallel.For(), with 4 processors
00:00:10.6896303 for Normal
00:00:10.1858376 for Unsafe
Note that the unsafe *(p++) idiom actually ran slower. My guess is that this broke a compiler optimization that was combining the loop variable and the (compiler-generated) pointer access in the safe version.
Source code is available on GitHub.
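For reference, the loops being compared are essentially of this shape (an illustrative sketch, not the benchmark source itself):
static long SumSafe(int[] data)
{
    long sum = 0;
    for (int i = 0; i < data.Length; i++)
        sum += data[i];            // the JIT can often eliminate the bounds check here
    return sum;
}

static unsafe long SumUnsafe(int[] data)
{
    long sum = 0;
    fixed (int* p = data)
    {
        for (int i = 0; i < data.Length; i++)
            sum += *(p + i);       // raw pointer arithmetic, no bounds check
    }
    return sum;
}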
As was stated in other posts, you can use unsafe code in very specialised contexts to get a significant performance improvement. One of those scenarios is iterating over arrays of value types. Using unsafe pointer arithmetic is much faster than using the usual pattern of for-loop/indexer:
struct Foo
{
    public int a;
    public int b;
    public int c;
}

Foo[] fooArray = new Foo[100000];

fixed (Foo* pArray = fooArray)   // pArray points to the first element in the array...
{
    Foo* foo = pArray;           // the fixed pointer is read-only, so work on a copy
    var remaining = fooArray.Length;
    while (remaining-- > 0)
    {
        foo->c = foo->a + foo->b;
        foo++;                   // foo now points to the next element in the array...
    }
}
The main benefit here is that we've cut out array index checking entirely.
While very performant, this kind of code is hard to handle, can be quite dangerous (unsafe), and breaks some fundamental guidelines (mutable struct). But there are certainly scenarios where this is appropriate...
A good example is image manipulation. Modifying the pixels by using a pointer to their bytes (which requires unsafe code) is quite a bit faster.
Example: http://www.gutgames.com/post/Using-Unsafe-Code-for-Faster-Image-Manipulation.aspx
That being said, for most scenarios, the difference wouldn't be as noticeable. So before you use unsafe code, profile your application to see where the performance bottlenecks are and test whether unsafe code is really the solution to make it faster.
Well, I would suggest reading this blog post: MSDN blogs: Array Bounds Check Elimination in the CLR
It clarifies how bounds checks are done in C#. Moreover, Thomas Bratt's tests seem useless to me (looking at the code), since the JIT removes the bounds checks in his 'safe' loops anyway.
I am using unsafe code for video manipulation.
In such code you want it to run as fast as possible, without internal checks on values etc. Without the unsafe attribute my code would not be able to keep up with the video stream at 30 fps or 60 fps (depending on the camera used).
Because of that speed, it's widely used by people who write graphics code.
To all who are looking at these answers I would like to point out that even though the answers are excellent, a lot has changed since they were posted.
Please note that .NET has changed quite a bit, and you now also have access to new types like Vector<T>, Span<T> and ReadOnlySpan<T>, as well as hardware-specific libraries and classes like those found in System.Runtime.Intrinsics in .NET Core 3.0.
Have a look at this blog post to see how hardware-optimized loops can be used, and this blog post on how to fall back to safe methods if optimal hardware is not available.
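As a hedged sketch (illustrative code, not taken from the linked posts) of those safe alternatives: Span<T> gives pointer-like slicing without unsafe code, and System.Numerics.Vector<T> gives SIMD-width arithmetic over arrays.
static void SpanAndVectorExample(int[] a, int[] b)   // assumes equal lengths, a multiple of Vector<int>.Count
{
    Span<byte> scratch = stackalloc byte[256];   // stackalloc without 'unsafe' (C# 7.2+)
    scratch.Slice(0, 64).Fill(0xFF);             // a view over the same memory, no copy

    for (int i = 0; i < a.Length; i += Vector<int>.Count)
    {
        var sum = new Vector<int>(a, i) + new Vector<int>(b, i);
        sum.CopyTo(a, i);                        // vectorized add, no per-element scalar loop
    }
}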

Marshal.PtrToStringUni() vs new String()?

Suppose I have a pointer of type char* to a Unicode string, and I know the length:
char* _unmanagedStr;
int _unmanagedStrLength;
and I have two ways to convert it to a .NET string:
Marshal.PtrToStringUni((IntPtr)_unmanagedStr, _unmanagedStrLength);
and
new string(_unmanagedStr, 0, _unmanagedStrLength);
In my tests, both calls give me exactly the same result, but new string() is about 1.8x faster than Marshal.PtrToStringUni().
Why is that performance difference?
Is there any another functional difference between the both?
Judging from available source code (Rotor), the System.String(Char*) constructor uses a heavily optimized code path through CtorCharPtr(), it allocates the string with FastAllocateString(). Marshal.PtrToStringUni() follows an entirely different code path, it is written in C++ and looks to be copying the string twice, without the benefit of a "fast allocator".
Clearly, not the same programmer worked on this. Almost certainly not even the same team since the code fits a different programming model. The closest manager in common was probably four levels up.
Not sure how that would be helpful, use the fast one. Mishaps would generate a similar kind of exception on Windows.
The second is not CLS compliant, requires unsafe code, and might have undetermined behavior, which is probably why it's faster. There's also a need to pin the pointer to the unmanaged address, or the garbage collector might relocate it, which leads to more cluttered code. Unless you've determined that this is a bottleneck for your application, you'll probably want to use the PtrToStringUni function.

Are ref and out in C# the same as pointers in C++?

I just made a Swap routine in C# like this:
static void Swap(ref int x, ref int y)
{
int temp = x;
x = y;
y = temp;
}
It does the same thing that this C++ code does:
void swap(int *d1, int *d2)
{
int temp=*d1;
*d1=*d2;
*d2=temp;
}
So are the ref and out keywords like pointers for C# without using unsafe code?
They're more limited. You can say ++ on a pointer, but not on a ref or out.
EDIT Some confusion in the comments, so to be absolutely clear: the point here is to compare with the capabilities of pointers. You can't perform the same operation as ptr++ on a ref/out, i.e. make it address an adjacent location in memory. It's true (but irrelevant here) that you can perform the equivalent of (*ptr)++, but that would be to compare it with the capabilities of values, not pointers.
It's a safe bet that they are internally just pointers, because the stack doesn't get moved and C# is carefully organised so that ref and out always refer to an active region of the stack.
EDIT To be absolutely clear again (if it wasn't already clear from the example below), the point here is not that ref/out can only point to the stack. It's that when it points to the stack, it is guaranteed by the language rules not to become a dangling pointer. This guarantee is necessary (and relevant/interesting here) because the stack just discards information in accordance with method call exits, with no checks to ensure that any referrers still exist.
Conversely when ref/out refers to objects in the GC heap it's no surprise that those objects are able to be kept alive as long as necessary: the GC heap is designed precisely for the purpose of retaining objects for any length of time required by their referrers, and provides pinning (see example below) to support situations where the object must not be moved by GC compacting.
If you ever play with interop in unsafe code, you will find that ref is very closely related to pointers. For example, if a COM interface is declared like this:
HRESULT Write(BYTE *pBuffer, UINT size);
The interop assembly will turn it into this:
void Write(ref byte pBuffer, uint size);
And you can do this to call it (I believe the COM interop stuff takes care of pinning the array):
byte[] b = new byte[1000];
obj.Write(ref b[0], b.Length);
In other words, ref to the first byte gets you access to all of it; it's apparently a pointer to the first byte.
Reference parameters in C# can be used to replace one use of pointers, yes. But not all.
Another common use for pointers is as a means for iterating over an array. Out/ref parameters can not do that, so no, they are not "the same as pointers".
ref and out are only used with function arguments to signify that the argument is to be passed by reference instead of value. In this sense, yes, they are somewhat like pointers in C++ (more like references actually). Read more about it in this article.
The nice thing about using out is that you're guaranteed that the item will be assigned a value -- you will get a compile error if not.
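A small sketch of that guarantee (names are illustrative):
static bool TryHalve(int value, out int half)
{
    if (value % 2 == 0)
    {
        half = value / 2;
        return true;
    }
    half = 0;   // removing this assignment produces compile error CS0177
    return false;
}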
Actually, I'd compare them to C++ references rather than pointers. Pointers, in C++ and C, are a more general concept, and references will do what you want.
All of these are undoubtedly pointers under the covers, of course.
While comparisons are in the eye of the beholder...I say no. 'ref' changes the calling convention but not the type of the parameters. In your C++ example, d1 and d2 are of type int*. In C# they are still Int32's, they just happen to be passed by reference instead of by value.
By the way, your C++ code doesn't really swap its inputs in the traditional sense. Generalizing it like so:
template<typename T>
void swap(T *d1, T *d2)
{
T temp = *d1;
*d1 = *d2;
*d2 = temp;
}
...won't work unless all types T have copy constructors, and even then will be much more inefficient than swapping pointers.
The short answer is Yes (similar functionality, but not exactly the same mechanism).
As a side note, if you use FxCop to analyse your code, using out and ref will result in a "Microsoft.Design" error of "CA1045:DoNotPassTypesByReference."
