I need to create an array that is aligned to a 64-byte boundary. I need to do this because I'm calling a DLL that uses AVX, which requires the data to be aligned. Essentially I need to do this in C#:
void* ptr = _aligned_malloc(64 * 1024, 64);
int8_t* memory_ptr = (int8_t*)ptr;
I'm pretty sure I can't create an array on such a boundary naturally in C#. So one option is to create a byte array that is x+64 bytes long, and then 'create' an array that overlays it, but starting at an offset at the required boundary.
The problem is how do I accomplish this without a memory leak? (Memory leaking is the reason I'd rather not have the DLL create the array and pass a reference back to C#. Unless there is a good way to do so?)
Using the helpful answers below, this is what I ended up with; hopefully it helps others:
public class Example : IDisposable
{
    private ulong memory_ptr;

    public unsafe Example()
    {
        memory_ptr = (ulong)NativeMemory.AlignedAlloc(0x10000, 64);
    }

    public unsafe Span<byte> Memory => new Span<byte>((void*)memory_ptr, 0x10000);

    public unsafe void Dispose()
    {
        // Memory from AlignedAlloc must be released with AlignedFree.
        NativeMemory.AlignedFree((void*)memory_ptr);
    }
}
As mentioned, .NET 6 has NativeMemory.AlignedAlloc. You need to make sure to call AlignedFree, otherwise you could get a leak.
void* a = default;
try
{
    a = NativeMemory.AlignedAlloc(size * sizeof(long), 64);
    var span = new Span<long>(a, size);
    // fill span
    // call DLL with span
}
finally
{
    NativeMemory.AlignedFree(a);
}
A pinned GCHandle is another option for older versions of .NET. You then need to calculate the aligned starting offset with the following code, where alignment would be 64 in your case:
var ptr = (long)handle.AddrOfPinnedObject();
var offset = (int) ((ptr + alignment - 1) / alignment * alignment - ptr) / sizeof(long);
Again, you need to make sure to call handle.Free() in a finally block.
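Putting those pieces together, a minimal sketch of this approach might look like the following (count, the buffer size and the DLL call are placeholders, not part of the original answer):

using System;
using System.Runtime.InteropServices;

long[] buffer = new long[count + 64 / sizeof(long)];            // over-allocate by one alignment block
GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);  // pin so the GC cannot move it
try
{
    long ptr = (long)handle.AddrOfPinnedObject();
    const int alignment = 64;
    // Byte distance to the next 64-byte boundary, converted into an element offset.
    int offset = (int)(((ptr + alignment - 1) / alignment * alignment - ptr) / sizeof(long));
    Span<long> aligned = buffer.AsSpan(offset, count);
    // fill `aligned` and pass the aligned address to the DLL:
    // handle.AddrOfPinnedObject() + offset * sizeof(long)
}
finally
{
    handle.Free();
}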
To avoid the memory leak, first you need to pin the array. Pinning prevents the object pointed to from moving on the garbage-collected heap.
There's an example of something similar to what you're doing here.
However, that example doesn't go far enough, as it only pins the array without controlling the initial memory allocation. To also prevent the memory leak, use GCHandle.Alloc with GCHandleType.Pinned and release the handle with handle.Free() when you are done.
I am receiving a buffer and I want to create a new buffer from it (concatenating bytes as prefix, infix and postfix) and later send it to a socket.
E.g.:
Initial buffer: "aaaa"
Final buffer: "$4\r\naaaa\r\n" (Redis RESP Protocol - Bulk Strings)
How can I transform the span to memory? (I do not know whether I should use stackalloc, given that I do not know how big the input buffer is. I figured it would be faster.)
private static readonly byte[] RESP_BULK_ID = BitConverter.GetBytes('$');
private static readonly byte[] RESP_FOOTER = Encoding.UTF8.GetBytes("\r\n");

static Memory<byte> GetNodeSpan(in ReadOnlyMemory<byte> payload) {
    ReadOnlySpan<byte> payloadHeader = BitConverter.GetBytes(payload.Length);
    Span<byte> result = stackalloc byte[
        RESP_BULK_ID.Length +
        payloadHeader.Length +
        RESP_FOOTER.Length +
        payload.Length +
        RESP_FOOTER.Length
    ];
    Span<byte> cursor = result;
    RESP_BULK_ID.CopyTo(cursor);
    cursor = cursor.Slice(RESP_BULK_ID.Length);
    payloadHeader.CopyTo(cursor);
    cursor = cursor.Slice(payloadHeader.Length);
    RESP_FOOTER.CopyTo(cursor);
    cursor = cursor.Slice(RESP_FOOTER.Length);
    payload.Span.CopyTo(cursor);
    cursor = cursor.Slice(payload.Span.Length);
    RESP_FOOTER.CopyTo(cursor);
    return new Memory<byte>(result.AsBytes()) // ?can not convert from span to memory, and can't return span because it can be referenced outside of scope
}
P.S.: Should I use old-school for loops instead of CopyTo?
Memory<T> is designed to have some managed object (for example an array) as a target. Converting a Memory<T> to a Span<T> simply creates a span over (part of) that target object. But the opposite conversion is not possible: because a Span<T> can point to memory that does not belong to any managed object (unmanaged memory, the stack, etc.), it is not possible to directly convert a Span<T> to a Memory<T>. (There is actually a way to do this, but it involves implementing your own MemoryManager<T> similar to NativeMemoryManager, it is unsafe and dangerous, and I'm pretty sure it is not what you want.)
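For completeness, a rough sketch of that MemoryManager<T> route (the class name here is hypothetical, and it assumes whoever allocated the unmanaged block also frees it):

using System;
using System.Buffers;

public sealed unsafe class UnmanagedMemoryManager<T> : MemoryManager<T> where T : unmanaged
{
    private readonly T* _pointer;
    private readonly int _length;

    public UnmanagedMemoryManager(T* pointer, int length)
    {
        _pointer = pointer;
        _length = length;
    }

    // Expose the unmanaged block as a Span<T>.
    public override Span<T> GetSpan() => new Span<T>(_pointer, _length);

    // Unmanaged memory never moves, so "pinning" just hands back the address.
    public override MemoryHandle Pin(int elementIndex = 0) => new MemoryHandle(_pointer + elementIndex);

    public override void Unpin() { }

    // The caller owns the allocation; nothing to free here.
    protected override void Dispose(bool disposing) { }
}

// Usage: Memory<byte> memory = new UnmanagedMemoryManager<byte>(ptr, length).Memory;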
Using stackalloc here is a bad idea for two reasons:
Since you don't know the size of the payload in advance, you could easily get a StackOverflowException if the payload is too big.
(As the comment in your source code already suggests) it is a terrible idea to try to return something allocated on the current method's stack, as that would likely result in either corrupted data or an application crash.
The only way to return a result on the stack would be for the caller of GetNodeSpan to stackalloc the memory in advance, convert it to a Span<T> and pass it in as an additional argument. The problem is that (1) the caller of GetNodeSpan would have to know how much to allocate and (2) it still would not help you convert a Span<T> to a Memory<T>.
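If you did want that caller-allocates pattern anyway, a hedged sketch might look like this; the method name, the RESP formatting details and the buffer size are illustrative, not taken from the question:

using System;
using System.Text;

static class Resp
{
    private static readonly byte[] Footer = { (byte)'\r', (byte)'\n' };

    // Writes "$<length>\r\n<payload>\r\n" into a caller-supplied buffer and
    // returns the number of bytes written.
    public static int WriteBulkString(ReadOnlySpan<byte> payload, Span<byte> destination)
    {
        int written = 0;
        destination[written++] = (byte)'$';
        // Write the payload length as ASCII decimal digits.
        written += Encoding.ASCII.GetBytes(payload.Length.ToString().AsSpan(), destination.Slice(written));
        Footer.CopyTo(destination.Slice(written)); written += Footer.Length;
        payload.CopyTo(destination.Slice(written)); written += payload.Length;
        Footer.CopyTo(destination.Slice(written)); written += Footer.Length;
        return written;
    }
}

// The caller owns the stack allocation and has to size it up front:
// Span<byte> buffer = stackalloc byte[16 + payload.Length];
// int written = Resp.WriteBulkString(payload.Span, buffer);
// buffer[..written] is only valid in the caller's frame and still cannot become a Memory<byte>.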
So to store the result, you will need an object allocated on the heap. The simplest solution is just to allocate a new array instead of using stackalloc. Such an array can then be used to construct a Span<T> (used for copying) as well as a Memory<T> (used as the method result):
static Memory<byte> GetNodeSpan(in ReadOnlyMemory<byte> payload)
{
    ReadOnlySpan<byte> payloadHeader = BitConverter.GetBytes(payload.Length);
    byte[] result = new byte[RESP_BULK_ID.Length +
                             payloadHeader.Length +
                             RESP_FOOTER.Length +
                             payload.Length +
                             RESP_FOOTER.Length];
    Span<byte> cursor = result;
    // ...
    return new Memory<byte>(result);
}
The obvious drawback is that you have to allocate a new array for each method call. To avoid this, you can use memory pooling, where allocated arrays are reused:
static IMemoryOwner<byte> GetNodeSpan(in ReadOnlyMemory<byte> payload)
{
    ReadOnlySpan<byte> payloadHeader = BitConverter.GetBytes(payload.Length);
    var result = MemoryPool<byte>.Shared.Rent(
        RESP_BULK_ID.Length +
        payloadHeader.Length +
        RESP_FOOTER.Length +
        payload.Length +
        RESP_FOOTER.Length);
    Span<byte> cursor = result.Memory.Span;
    // ...
    return result;
}
Please note that this solution returns IMemoryOwner<byte> (instead of Memory<T>). The caller can access the Memory<T> through the IMemoryOwner<T>.Memory property and must call IMemoryOwner<byte>.Dispose() to return the array to the pool when the memory is no longer needed. The second thing to notice is that MemoryPool<byte>.Shared.Rent() can return an array that is longer than the required minimum. So your method will probably also need to return the actual length of the result (for example as an out parameter), because IMemoryOwner<byte>.Memory.Length can report more than was actually copied into the result.
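A hedged sketch of what the caller side could then look like, assuming GetNodeSpan is changed to report the written length through an out parameter (the out parameter and the socket usage here are illustrative):

using System;
using System.Buffers;
using System.Net.Sockets;
using System.Threading.Tasks;

static async Task SendNodeAsync(Socket socket, ReadOnlyMemory<byte> payload)
{
    using (IMemoryOwner<byte> owner = GetNodeSpan(payload, out int written))
    {
        // The rented buffer may be longer than requested, so slice to the real length.
        await socket.SendAsync(owner.Memory.Slice(0, written), SocketFlags.None);
    }   // Dispose() returns the rented array to the pool.
}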
P.S.: I would expect a for loop to be marginally faster only when copying very short arrays (if at all), where you can save a few CPU cycles by avoiding a method call. But Span<T>.CopyTo() uses an optimized method that can copy several bytes at once and (I strongly believe) uses special CPU instructions for copying blocks of memory, and therefore should be much faster.
I'm using a library which has a function SendBuffer(int size, IntPtr pointer) with IntPtr as a parameter.
var list = new List<float>{3, 2, 1};
IntPtr ptr = list.getPointerToInternalArray();
SendBuffer(ptr, list.Count);
How do I get an IntPtr to the array stored in a List<T> (and/or a T[])?
If this is a P/Invoke call into unmanaged code, you should retrieve the pinned address of the buffer (to prevent the GC from relocating it) and pass that to the method:
// use an array as a buffer
float[] buffer = new float[] { 3, 2, 1 };

// pin it to a fixed address:
GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
try
{
    // retrieve the address as a pointer and use it to call the native method
    SendBuffer(handle.AddrOfPinnedObject(), buffer.Length);
}
finally
{
    // free the handle so GC can collect the buffer again
    handle.Free();
}
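For reference, the matching P/Invoke declaration might look roughly like this; the library name and calling convention are guesses, and the parameter order follows the call above rather than the prototype quoted in the question:

using System;
using System.Runtime.InteropServices;

// Hypothetical declaration for the native function from the question.
[DllImport("somelibrary.dll", CallingConvention = CallingConvention.Cdecl)]
static extern void SendBuffer(IntPtr pointer, int size);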
The array is sent every frame and it's big
In that case it might be warranted to access the internal backing array that List<T> uses. This is a hack and brittle in the face of future .NET versions. That said, .NET has a very high compatibility bar, and they probably would not change a field name in such a core type. Also, for performance reasons it is pretty much guaranteed that List<T> will always use a single backing array for its items. So although this is a high-risk technique, it might be warranted here.
Or, better yet, write your own list type that you control and that you can get the array from. (Since you seem to be concerned with performance, I wonder why you are using List<float> anyway, because accessing items is slower compared to a normal array.)
Get the array, then use fixed (float* ptr = array) SendBuffer(ptr, length) to pin it and pass it without copying memory.
There is no need to use the awkward and slow GCHandle type here. Pinning with fixed uses an IL feature that makes this very fast; it should be near zero cost.
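On .NET 5 and later there is also a supported way to reach the backing array of a List<T> without reflection or copying: CollectionsMarshal.AsSpan. A sketch (SendBuffer is the function from the question; the span must not be used after the list is resized):

using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

var list = new List<float> { 3, 2, 1 };

// CollectionsMarshal.AsSpan exposes the list's backing array directly.
// The span is invalidated if the list grows, so don't add items while it is in use.
Span<float> items = CollectionsMarshal.AsSpan(list);

unsafe
{
    fixed (float* ptr = items)          // pins via the span's GetPinnableReference
    {
        SendBuffer((IntPtr)ptr, items.Length);
    }
}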
There is no guarantee that the internal representation of a List<T> is going to be a single array... in fact it's pretty likely that it's not. So you need to create a local array copy using ToArray in order for this to work.
Once you have it, there are a couple of options.
First you can use the fixed keyword to pin the array and get a pointer to it:
T[] buffer = theList.ToArray();

unsafe
{
    fixed (T* p = buffer)
    {
        IntPtr ptr = (IntPtr)p;
        SomeFunction(ptr);
    }
}
Alternatively you can tell the garbage collector to fix the data in memory until you're done with the operation, like this:
GCHandle pinned = GCHandle.Alloc(buffer, GCHandleType.Pinned);
IntPtr ptr = pinned.AddrOfPinnedObject();
SomeFunction(ptr);
pinned.Free();
(Or see taffer's answer with more error handling).
In both cases you need to finish with the value before returning, so you can't use either method to get an IntPtr to the array as a return value. Doing it this way minimizes the opportunity for that pointer to be used for evil.
Please have a look at the following C# code:
double* ptr;
fixed (double* vrt_ptr = &vertices[0])
{
    fixed (int* tris_ptr = &tris[0])
    {
        ptr = compute(vrt_ptr, 5, (double*)tris_ptr, 5);
        // compute() is a native C++ function
    }
}
Debug.Log("Vertices Received: " + *ptr);
/* and so on */
I am getting garbage values from *ptr. I suspect that the array assigned to ptr by compute doesn't persist outside the fixed block. Is that so? Or is it due to some other problem?
This is not valid code: the garbage collector can only update the values of the vrt_ptr and tris_ptr variables, but the unmanaged code uses a copy of these pointers, and the value of that copy cannot be updated by the GC. So if a garbage collection occurs while the unmanaged code is running (possible, for example, when other threads in the program trigger a collection), the unmanaged code will read garbage data through the pointer copy. Very hard to diagnose; it doesn't happen very often.
You must pin the vertices and tris arrays. In your case that is already ably done by the P/Invoke marshaller, simply by passing the arrays directly without using fixed. Fix:
double* ptr = compute(vertices, 5, tris, 5);
Adjust the P/Invoke declaration accordingly, replacing double* with double[].
You'll now also have to deal with the likely reason you wrote this code in the first place: there is no scenario in which casting an int[] to a double[] is ever valid, which is the likely reason you got a garbage result early, before that GC disaster could strike. If you can't update the declaration of tris for some reason, then you must create a double[] before the call.
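A sketch of what the adjusted declaration could look like; the DLL name, calling convention and parameter names are assumptions:

using System;
using System.Runtime.InteropServices;

// Hypothetical adjusted declaration: the P/Invoke marshaller pins the arrays
// for the duration of the call, so no fixed blocks are needed on the C# side.
[DllImport("NativePlugin", CallingConvention = CallingConvention.Cdecl)]
static extern unsafe double* compute(double[] vertices, int vertexCount, double[] tris, int trisCount);

// If tris is declared as int[], convert it before the call:
// double[] trisAsDoubles = Array.ConvertAll(tris, t => (double)t);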
I call a piece of unmanaged C++ code from my C# application to calculate the fast Fourier transform of a discrete-time signal.
I make the call something like this:
IntPtr ptr = ComputeFFTW(packetSig, packetSig.Length, (int)samplFrequency, (int)fftPoints);
unsafe
{
    double* dPtr = (double*)ptr;
    for (int l = 0; l < fftData.Length; l++)
    {
        fftData[l] = dPtr[l];
    }
}
Though this snippet of code works fine and gives me the desired results, I can see that a sort of performance hit (memory leak) is incurred while the calculation is in progress. The CLR fails to reclaim the local (double) variables, and my application gobbles up a considerable amount of RAM.
Can any of you suggest where I might be doing it wrong?
For my part, I ran my application under the ANTS Memory Profiler and I can see in the snapshot that the double arrays claim more than 150 MB of memory. Is this normal behaviour?
Class Name    Live Size (bytes)    Live Instances
Double[]      150,994,980          3
Any help is appreciated in this regard
Srivatsa
Since the C++ function allocates memory, you will have to manually free that chunk in your C# application (free the pointer). A better way to invoke unmanaged code is to allocate all the variables and memory chunks (temporary buffers too) in your C# application and pass them to your C++ code as parameters. That way you won't have any memory issues with your unmanaged code.
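A sketch of that caller-allocates pattern; the native signature shown here is an assumption (your actual C++ side would need a matching export), as are the DLL name and parameter names:

using System.Runtime.InteropServices;

// Hypothetical variant of the native function that writes into a buffer the
// caller provides instead of allocating its own.
[DllImport("fftwrapper.dll", CallingConvention = CallingConvention.Cdecl)]
static extern void ComputeFFTW(double[] signal, int length, int samplFrequency, int fftPoints,
                               [Out] double[] output, int outputLength);

// The managed output array is allocated once, reused every call, and collected
// by the GC like any other array; nothing needs to be freed manually.
double[] fftData = new double[(int)fftPoints];
ComputeFFTW(packetSig, packetSig.Length, (int)samplFrequency, (int)fftPoints, fftData, fftData.Length);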
You can use the Marshal.Copy(IntPtr, Double[], Int32, Int32) method to copy an array of double values from the unmanaged ptr to the managed fftData array:
IntPtr ptr = ComputeFFTW(packetSig, packetSig.Length, (int)samplFrequency,(int)fftPoints);
Marshal.Copy(ptr, fftData, 0, fftData.Length);
If ComputeFFTW returns a pointer to dynamically allocated memory, you need to release it after use. Do this in unmanaged code: add a function like Release and pass ptr to it.
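Putting the two pieces together, the call site could look like this; ReleaseFFTW is a hypothetical export you would add on the C++ side to free the buffer that ComputeFFTW allocated, and the DLL name is a guess:

using System;
using System.Runtime.InteropServices;

// Hypothetical cleanup export added to the native library.
[DllImport("fftwrapper.dll", CallingConvention = CallingConvention.Cdecl)]
static extern void ReleaseFFTW(IntPtr ptr);

IntPtr ptr = ComputeFFTW(packetSig, packetSig.Length, (int)samplFrequency, (int)fftPoints);
try
{
    Marshal.Copy(ptr, fftData, 0, fftData.Length);   // copy the results into the managed array
}
finally
{
    ReleaseFFTW(ptr);                                // free the native allocation exactly once
}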
Can someone explain to me why the C# code below doesn't crash? Why does Visual Studio even allow it to compile? My understanding is that I am getting a fixed pointer, but it is only fixed within the 'fixed' statement. When the pointer is returned from the 'Foo' function, the array 'ar' may be collected. I then force the GC to actually do this, but subsequent writes to the memory (which is now deallocated) don't cause any error.
class Program
{
    static unsafe byte* Foo()
    {
        byte[] ar = new byte[100];
        fixed (byte* ptr = ar)
        {
            return ptr;
        }
    }

    static unsafe void Main(string[] args)
    {
        byte* ptr = Foo();
        GC.Collect();
        for (int t = 0; ; ++t) ptr[t % 100] = 0;
    }
}
Eric is right, but the answer you probably want to hear is that "sometimes it's useful to retain the address outside of the fixed statement".
Maybe the memory behind that pointer is already fixed by another fixed statement somewhere else, and it makes sense to return it? The compiler is not trying to second-guess you and emit noisy warnings.
That said, I'd hope that Code Analysis or other advanced tools would step in here, where the compiler is letting you cut off your own foot.
Just because the memory is freed doesn't mean that writing to it will cause an error of any kind. When the garbage collector reclaims memory, it just marks it as free in its internal memory map; it doesn't give it back to the OS right away, so it's still valid memory for your process to use.
Of course, using a pointer outside of the fixed block for it is a very bad idea -- don't do it.