Fixing an array of arrays in C# (unsafe code)

I'm trying to come up with a solution as to how I can pass an array of arrays from C# into a native function. I already have a delegate to the function (Marshal.GetDelegateForFunctionPointer), but now I'm trying to pass a multidimensional array (or rather; an array of arrays) into it.
This code example works when the input has 2 sub-arrays, but I need to be able to handle any number of sub-arrays. What's the easiest way you can think of to do that? I'd prefer not to copy the data between arrays as this will be happening in a real-time loop (I'm communicating with an audio effect)
public void process(float[][] input)
{
unsafe
{
// If I know how many sub-arrays I have I can just fix them like this... but I need to handle n-many arrays
fixed (float* inp0 = input[0], inp1 = input[1] )
{
// Create the pointer array and put the pointers to input[0] and input[1] into it
float*[] inputArray = new float*[2];
inputArray[0] = inp0;
inputArray[1] = inp1;
fixed(float** inputPtr = inputArray)
{
// C function signature is someFunction(float** input, int numberOfChannels, int length)
functionDelegate(inputPtr, 2, input[0].Length);
}
}
}
}

You can pin an object in place without using fixed by instead obtaining a pinned GCHandle to the object in question. Of course, it should go without saying that by doing so you take responsibility for ensuring that the pointer does not survive past the point where the object is unpinned. We call it "unsafe" code for a reason; you get to be responsible for safe memory management, not the runtime.
http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.gchandle.aspx
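As a rough sketch of how the pinned-GCHandle approach could handle any number of sub-arrays: each sub-array is pinned with GCHandle.Alloc, its address goes into a small pointer table, and every handle is freed in a finally block. The class name, the delegate type, and the DoubleSamples stand-in (a managed substitute for the native effect) are assumptions made for illustration; the sketch also assumes all sub-arrays have the same length.

```csharp
using System;
using System.Runtime.InteropServices;

public static unsafe class JaggedPinning
{
    // Same parameter shape as the C function in the question.
    public delegate void ProcessDelegate(float** input, int numberOfChannels, int length);

    // Hypothetical managed stand-in for the native effect: doubles every sample in place.
    private static void DoubleSamples(float** input, int numberOfChannels, int length)
    {
        for (int c = 0; c < numberOfChannels; c++)
            for (int i = 0; i < length; i++)
                input[c][i] *= 2f;
    }

    public static void Process(float[][] input, ProcessDelegate functionDelegate)
    {
        int channels = input.Length;
        if (channels == 0) return;

        var handles = new GCHandle[channels];            // could be cached between calls
        float** pointers = stackalloc float*[channels];  // pointer table lives on the stack
        try
        {
            for (int c = 0; c < channels; c++)
            {
                // Pin each sub-array so the GC cannot move it during the call.
                handles[c] = GCHandle.Alloc(input[c], GCHandleType.Pinned);
                pointers[c] = (float*)handles[c].AddrOfPinnedObject().ToPointer();
            }
            functionDelegate(pointers, channels, input[0].Length);
        }
        finally
        {
            // Unpin everything, even if the call throws.
            foreach (GCHandle handle in handles)
                if (handle.IsAllocated) handle.Free();
        }
    }

    // Convenience entry point that uses the stand-in instead of a real native delegate.
    public static void ProcessWithStandIn(float[][] input) => Process(input, DoubleSamples);
}
```

In a real-time loop the GCHandle array would probably be allocated once and reused, so that nothing is allocated on the managed heap per call.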

It makes no sense to try to pin the array of references to the managed arrays.
The reference values in it probably don't point to the address of the first element, and even if they did, that would be an implementation detail that could change from release to release.
Copying an array of pointers to a lot of data should not be that slow, especially compared with the multimedia processing you are calling into.
If the cost is significant, allocate your data outside the managed heap; then there is no pinning or copying, but more bookkeeping.

The easiest way I know is to use a one-dimensional array. It reduces complexity and memory fragmentation and will also perform better. I actually do this in my project. You can index manually, so that array[i][j] corresponds to oneDimArray[i * n + j], and pass n as a parameter to the function. You then need only one fixed statement, just as in your example:
public void process(float[] oneDimInput, int numberOfColumns)
{
unsafe
{
fixed (float* inputPtr = &oneDimInput[0])
{
// C function signature is someFunction(
// float* input,
// int number of columns in oneDimInput
// int numberOfChannels,
// int length)
functionDelegate(inputPtr, numberOfColumns, 2, oneDimInput.Length);
}
}
}
I should also note that two-dimensional arrays are rarely used in high-performance computation libraries such as Intel MKL, Intel IPP and many others. Even the BLAS and LAPACK interfaces take only one-dimensional arrays and emulate two dimensions using the approach I've mentioned (for performance reasons).
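The flattened layout can be sketched as follows, assuming a channel-major buffer in which element (channel, sample) lives at index channel * samplesPerChannel + sample. The class name and the ApplyGain stand-in for the native routine are hypothetical.

```csharp
using System;

public static unsafe class FlatBuffer
{
    // Maps a (channel, sample) pair to its position in the flat array.
    public static int IndexOf(int channel, int sample, int samplesPerChannel)
        => channel * samplesPerChannel + sample;

    // Hypothetical managed stand-in for a native routine that takes one flat pointer.
    private static void ApplyGain(float* data, int channels, int samplesPerChannel, float gain)
    {
        for (int i = 0; i < channels * samplesPerChannel; i++)
            data[i] *= gain;
    }

    public static void Process(float[] flat, int samplesPerChannel, float gain)
    {
        if (flat.Length == 0 || samplesPerChannel <= 0) return;
        int channels = flat.Length / samplesPerChannel;

        // A single fixed statement pins the whole buffer, however many channels it holds.
        fixed (float* p = flat)
        {
            ApplyGain(p, channels, samplesPerChannel, gain);
        }
    }
}
```

With two channels of three samples each, the second sample of the second channel is flat[FlatBuffer.IndexOf(1, 1, 3)], i.e. flat[4].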

Related

Best way to return array of certain length

I'm new to C# from C programming background, sorry for probably a basic question. I'm trying to find the best way to return an array of values from a method.
In C, we can do either of the following:
void myFunction(double inputA[], int lenA, double* outputA): the function gets the input array "inputA" of "lenA" elements and fills "outputA", which points to memory allocated before the call to "myFunction".
double* myFunction(double inputA[], int lenA): alternatively, the output array is allocated inside "myFunction" based on a desired length, populated, and returned with "return outputA".
What's the best way to do this in C#?
Pointers are almost never used directly in C#, so the idiomatic signature would use arrays:
double[] myFunction (double[] inputA, int lenA)
but note that in C# you can also get the length of an array very easily (int len = inputA.Length), so lenA may not be needed unless you want to support processing a subset of the array.
As you progress, you'll find that other data structures like List and interfaces like IEnumerable may be preferable depending on your use case (i.e. you may decide that the function should just return an iterable collection without specifying the actual type of that collection).
As a side note (and from someone who went from C++ to C# myself), the best thing you can do is NOT think of C# as an "extension" or "alternative" to C++. While the syntax and types are very similar, C# as a managed language does many fundamental things very differently, so you may be limiting yourself if you just try to "port" C++ code to C#. Try to think of it as a brand new language and framework with some syntax overlap.
C# allows for both of these methods as well; however, arrays are denoted with [] even in return types instead of the * pointer.
double[] myFunction(double[] inputA, int lenA)
{
double[] output = new double[lengthOfArray];
...
return output;
}
Alternatively, C# allows for an "out" parameter that can be used in a similar manner.
void myFunction(double[] inputA, int lenA, out double[] output)
{
...
output = ...
// then, once output has been assigned, individual elements can be set:
output[0] = 1; //replace 0 with index #
}
Both are acceptable practices, with the first being the more common in C# for readability and organization.
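To make the options concrete, here is a small sketch of three variants: returning a new array, using an out parameter, and filling a caller-allocated buffer (the closest match to the first C form). The class and method names and the "double every element" computation are placeholders chosen for illustration.

```csharp
using System;

public static class ArrayReturn
{
    // Idiomatic form: allocate inside the method and return the array.
    public static double[] Doubled(double[] inputA)
    {
        var output = new double[inputA.Length];
        for (int i = 0; i < inputA.Length; i++)
            output[i] = 2 * inputA[i];
        return output;
    }

    // "out" form: the parameter must be assigned before it is read or the method returns.
    public static void DoubledOut(double[] inputA, out double[] output)
    {
        output = new double[inputA.Length];
        for (int i = 0; i < inputA.Length; i++)
            output[i] = 2 * inputA[i];
    }

    // Caller-allocated form: arrays are reference types, so writes are visible to the caller.
    public static void DoubleInto(double[] inputA, double[] output)
    {
        if (output.Length < inputA.Length)
            throw new ArgumentException("output is too small", nameof(output));
        for (int i = 0; i < inputA.Length; i++)
            output[i] = 2 * inputA[i];
    }
}
```

The caller-allocated form avoids a new allocation on every call, which can matter when the method runs in a tight loop.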

Understanding Unsafe code and its uses

I am currently reading the ECMA-334 as suggested by a friend that does programming for a living. I am on the section dealing with Unsafe code. Although, I am a bit confused by what they are talking about.
The garbage collector underlying C# might work by moving objects
around in memory, but this motion is invisible to most C# developers.
For developers who are generally content with automatic memory
management but sometimes need fine-grained control or that extra bit
of performance, C# provides the ability to write “unsafe” code. Such
code can deal directly with pointer types and object addresses;
however, C# requires the programmer to fix objects to temporarily
prevent the garbage collector from moving them. This “unsafe” code
feature is in fact a “safe” feature from the perspective of both
developers and users. Unsafe code shall be clearly marked in the code
with the modifier unsafe, so developers can't possibly use unsafe
language features accidentally, and the compiler and the execution
engine work together to ensure that unsafe
code cannot masquerade as safe code. These restrictions limit the use
of unsafe code to situations in which the code is trusted.
The example
using System;
class Test
{
static void WriteLocations(byte[] arr)
{
unsafe
{
fixed (byte* pArray = arr)
{
byte* pElem = pArray;
for (int i = 0; i < arr.Length; i++)
{
byte value = *pElem;
Console.WriteLine("arr[{0}] at 0x{1:X} is {2}",
i, (uint)pElem, value);
pElem++;
}
}
}
}
static void Main()
{
byte[] arr = new byte[] { 1, 2, 3, 4, 5 };
WriteLocations(arr);
Console.ReadLine();
}
}
shows an unsafe block in a method named WriteLocations that fixes an
array instance and uses pointer manipulation to iterate over the
elements. The index, value, and location of each array element are
written to the console. One possible example of output is:
arr[0] at 0x8E0360 is 1
arr[1] at 0x8E0361 is 2
arr[2] at 0x8E0362 is 3
arr[3] at 0x8E0363 is 4
arr[4] at 0x8E0364 is 5
but, of course, the exact memory locations can be different in
different executions of the application.
Why is knowing the exact memory locations of, for example, this array beneficial to us as developers? And could someone explain this idea in a simplified context?
The fixed language feature is not so much "beneficial" as "absolutely necessary".
Ordinarily a C# user will imagine reference types as being equivalent to single-indirection pointers (e.g. for class Foo, this: Foo foo = new Foo(); is equivalent to this C++: Foo* foo = new Foo();).
In reality, references in C# are not quite like raw pointers, because the GC tracks them. The GC not only cleans up unused objects but also moves objects around in memory to avoid fragmentation, and whenever it moves an object it updates every managed reference to it, so your code never notices. (In the CLR a reference is a direct pointer that gets rewritten, not a handle into an object table.) Raw pointers receive no such update.
All this is well-and-good if you're exclusively using object references in C#. As soon as you use pointers then you've got problems because the GC could run at any point in time, even during tight-loop execution, and when the GC runs your program's execution is frozen (which is why the CLR and Java are not suitable for Hard Real Time applications - a GC pause can last a few hundred milliseconds in some cases).
...because of this inherent behaviour (where an object is moved during code execution) you need to prevent that object being moved, hence the fixed keyword, which instructs the GC not to move that object.
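An example (illustrative only; the C# compiler does not actually accept these address-of expressions as written):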
unsafe void Foo() {
Byte[] safeArray = new Byte[ 50 ];
safeArray[0] = 255;
Byte* p = &safeArray[0];
Console.WriteLine( "Array address: {0}", &safeArray );
Console.WriteLine( "Pointer target: {0}", p );
// These will both print "0x12340000".
while( executeTightLoop() ) {
Console.WriteLine( *p );
// valid pointer dereferencing, will output "255".
}
// Pretend at this point that GC ran right here during execution. The safeArray object has been moved elsewhere in memory.
Console.WriteLine( "Array address: {0}", &safeArray );
Console.WriteLine( "Pointer target: {0}", p );
// These two printed values will differ, demonstrating that p is invalid now.
Console.WriteLine( *p );
// the above code now prints garbage (if the memory has been reused by another allocation) or causes the program to crash (if it's in a memory page that has been released, an Access Violation)
}
So instead by applying fixed to the safeArray object, the pointer p will always be a valid pointer and not cause a crash or handle garbage data.
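A minimal sketch of the corrected pattern, assuming nothing beyond the fixed statement itself: the pointer exists only inside the fixed block, so it can never outlive the pin. The class and method names are made up for the example.

```csharp
using System;

public static unsafe class FixedDemo
{
    // Sums the bytes of an array through a raw pointer.
    public static int Sum(byte[] data)
    {
        int total = 0;
        fixed (byte* p = data)   // the GC will not move data while this block runs
        {
            for (int i = 0; i < data.Length; i++)
                total += p[i];   // safe to dereference: the array is pinned
        }
        // p is out of scope here, and the array may be moved again.
        return total;
    }
}
```

Calling FixedDemo.Sum(new byte[] { 1, 2, 3, 4, 5 }) adds up the five elements and returns 15.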
Side-note: An alternative to fixed is to use stackalloc, but that limits the object lifetime to the scope of your function.
One of the primary reasons I use fixed is for interfacing with native code. Suppose you have a native function with the following signature:
double cblas_ddot(int n, double* x, int incx, double* y, int incy);
You could write an interop wrapper like this:
public static extern double cblas_ddot(int n, [In] double[] x, int incx,
[In] double[] y, int incy);
And write C# code to call it like this:
double[] x = ...
double[] y = ...
cblas_ddot(n, x, 1, y, 1);
But now suppose I wanted to operate on some data in the middle of my array say starting at x[2] and y[2]. There is no way to make the call without copying the array.
double[] x = ...
double[] y = ...
cblas_ddot(n, x[2], 1, y[2], 1);
^^^^
this wouldn't compile
In this case fixed comes to the rescue. We can change the signature of the interop and use fixed from the caller.
public unsafe static extern double cblas_ddot(int n, [In] double* x, int incx,
[In] double* y, int incy);
double[] x = ...
double[] y = ...
fixed (double* pX = x, pY = y)
{
cblas_ddot(n, pX + 2, 1, pY + 2, 1);
}
I've also used fixed in rare cases where I need fast loops over arrays and needed to ensure the .NET array bounds checking was not happening.
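The sub-range call can be tried without a BLAS library by substituting a managed function with the same parameter shape. In this sketch Ddot is a hypothetical stand-in for cblas_ddot, not the real binding, and DotFrom is an invented wrapper name.

```csharp
using System;

public static unsafe class SubRangeDot
{
    // Managed stand-in with the same parameter shape as cblas_ddot (an assumption for testing).
    private static double Ddot(int n, double* x, int incx, double* y, int incy)
    {
        double sum = 0;
        for (int i = 0; i < n; i++)
            sum += x[i * incx] * y[i * incy];
        return sum;
    }

    // Dot product of x[start .. start+n) and y[start .. start+n) without copying either array.
    public static double DotFrom(double[] x, double[] y, int start, int n)
    {
        if (start < 0 || n < 0 || start > x.Length - n || start > y.Length - n)
            throw new ArgumentOutOfRangeException(nameof(start));

        fixed (double* pX = x, pY = y)
        {
            // Pointer arithmetic selects the sub-range; both arrays stay pinned for the call.
            return Ddot(n, pX + start, 1, pY + start, 1);
        }
    }
}
```

The explicit range check matters because pointer access bypasses the array bounds checking that indexing would normally provide.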
In general, the exact memory locations within an "unsafe" block are not so relevant.
As explained in Dai's answer, when you are using memory managed by the garbage collector, you need to make sure that the data you are manipulating does not get moved (using "fixed"). You generally use this when
You are running a performance critical operation many times in a loop, and manipulating raw byte structures is sufficiently faster.
You are doing interop and have some non-standard data marshaling needs.
In some cases, you are working with memory that is not managed by the garbage collector. Some examples of such scenarios are:
When doing interop with unmanaged code, it can be used to prevent repeatedly marshaling data back and forth, and instead do some work in larger granularity chunks, using the "raw bytes", or structs mapped to these raw bytes.
When doing low level IO with large buffers that you need to share with the OS (e.g. for scatter/gather IO).
When creating specific structures in a memory mapped file. An example for instance could be a B+Tree with memory page sized nodes, that is stored in a disk based file that you want to page into memory.

How to convert ref byte into byte[]?

Has anyone had experience converting a ref byte into a byte[]?
If the function takes an argument like
void foo(ref byte buffer);
then it is possible to call foo using
void call_func()
{
byte[] arr = new byte[10];
foo(ref arr[0]);
}
The question is how one can convert the buffer argument back into a byte[] array inside foo.
You don't.
In order to avoid pinning the entire array, the runtime might just make a copy of the single element you selected (and then copy back after the call). In that case your function will get the address of a temporary copy, which is unrelated to the address of the other array elements. (Well, there could be some aliasing considerations, this optimization is much more likely for pinvoke and/or remote calls, where aliasing analysis is more feasible)
If you need an array, pass the array.
If you don't care that it might not work right, you can use unsafe code to get to the other elements.
fixed (byte* p = &buffer) {
p[4] = 0;
}

Pass a pointer to an array section as a parameter in C#

I'm just learning neural networks and I would like to have the neuron's constructor receive a pointer to a section in an array that would be the chromosome. Something like this:
public int* ChromosomeSection;
public Neuron(int* chromosomeSection)
{
ChromosomeSection = chromosomeSection;
}
So then I would create my neurons with something like this:
int[] Chromosome = new int[neuronsCount * neuronDataSize];
for (int n = 0; n < Chromosome.Length; n += neuronDataSize)
{
AddNeuron(new Neuron(Chromosome + n));
}
Is it possible to do this in C#? I know C# supports some unsafe code. But I don't know how to tell the compiler that the line public Neuron(int* chromosomeSection) is unsafe.
Also, will I be able to do every operation that I would do in C++ or C? Is there any gotcha I should be aware of before starting to do it this way? Never worked with unsafe code in C# before.
Eric Lippert has a nice two-part series: References and Pointers, Part One, and an implementation of "managed pointers" (References and Pointers, Part Two).
Here's a handy type I whipped up when I was translating some complex pointer-manipulation code from C to C#. It lets you make a safe "managed pointer" to the interior of an array. You get all the operations you can do on an unmanaged pointer: you can dereference it as an offset into an array, do addition and subtraction, compare two pointers for equality or inequality, and represent a null pointer. But unlike the corresponding unsafe code, this code doesn't mess up the garbage collector and will assert if you do something foolish, like try to compare two pointers that are interior to different arrays. (*) Enjoy!
Hope it helps.
Sounds like you could use ArraySegment<int> for what you are trying to do.
ArraySegment is a wrapper around
an array that delimits a range of
elements in that array. Multiple
ArraySegment instances can refer to
the same original array and can
overlap.
The Array property returns the entire
original array, not a copy of the
array; therefore, changes made to the
array returned by the Array property
are made to the original array.
Yes, this is perfectly possible in C#, although a pointer alone is not sufficient information to use it in this way. You'd also need an Int32 length parameter, so you know how many times it's safe to increment that pointer without an overrun. This should be familiar if you're from a C++ background.
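A sketch of the ArraySegment<int> suggestion applied to the neuron example, with no unsafe code at all. The indexer, the Count property and the NeuronFactory helper are illustrative additions, not part of the question.

```csharp
using System;
using System.Collections.Generic;

public sealed class Neuron
{
    private readonly ArraySegment<int> _section;

    public Neuron(ArraySegment<int> chromosomeSection)
    {
        _section = chromosomeSection;
    }

    public int Count => _section.Count;

    // Reads and writes go straight to the shared chromosome array.
    public int this[int i]
    {
        get
        {
            if (i < 0 || i >= _section.Count) throw new IndexOutOfRangeException();
            return _section.Array[_section.Offset + i];
        }
        set
        {
            if (i < 0 || i >= _section.Count) throw new IndexOutOfRangeException();
            _section.Array[_section.Offset + i] = value;
        }
    }
}

public static class NeuronFactory
{
    // Splits the chromosome into consecutive sections of neuronDataSize elements.
    public static List<Neuron> Build(int[] chromosome, int neuronDataSize)
    {
        var neurons = new List<Neuron>();
        for (int n = 0; n + neuronDataSize <= chromosome.Length; n += neuronDataSize)
            neurons.Add(new Neuron(new ArraySegment<int>(chromosome, n, neuronDataSize)));
        return neurons;
    }
}
```

Because every segment refers to the same underlying array, a write through one neuron's indexer is immediately visible in the chromosome itself.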

How to get the pointer to the middle of an array in c#

First, basic info on our environment: We're using c# .net 4.0, on Win7-x64, targeting 32-bit.
We have a preallocated -large- array. In a function, we would like to return a pointer to an arbitrary point in this array, so that the calling function can know where to write. Ex:
class SomeClass {
void function_that_uses_the_array() {
Byte [] whereToWrite = getEmptyPtrLocation(1200);
Array.Copy(sourceArray, whereToWrite, ...);
}
}
class DataProvider {
int MAX_SIZE = 1024*1024*64;
Byte [] dataArray = new Byte[MAX_SIZE];
int emptyPtr=0;
Byte[] getEmptyPtrLocation(int requestedBytes) {
int ref = emptyPtr;
emptyPtr += requestedBytes;
return dataArray[ref];
}
}
Essentially, we want to preallocate a big chunk of memory, and reserve arbitrary length portions of this memory block and let some other class/function to use that portion of memory.
In the above example, getEmptyPtrLocation function is incorrect; it is declared as returning Byte[], but attempting to return a single byte value.
Thanks
As others have said, you can't do this in C# - and generally you shouldn't do anything like it. Work with the system - take advantage of the garbage collector etc.
Having said that, if you want to do something similar and you can trust your clients not to overrun their allocated slot, you could make your method return ArraySegment<byte> instead of byte[]. That would represent "part of an array". Obviously most of the methods in .NET don't use ArraySegment<T> - but you could potentially write extension methods using it as the target for some of the more common operations that you want to use.
This is C#, not C++; you are truly working against the garbage collector here. Memory allocation is generally very fast in C#, and it doesn't suffer from memory fragmentation, which is one of the other main reasons to pre-allocate in C++.
Having said that, if you still want to do it, I would just use the array index and length, so you can use Array.Copy to write at that position.
Edit:
As others have pointed out the ArraySegment<T> fits your needs better: it limits the consumer to just his slice of the array and it doesn't require to allocate a separate array before updating the content in the original array (as represented by the ArraySegment).
This is quite an old post, but I have an easy solution. I see that this question has lots of answers like "you shouldn't". If there is a method to do something, why not use it?
But that aside, you can use the following
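int[] array = new int[1337];
int position = 69;
// The array must already be pinned (e.g. with GCHandle.Alloc(array, GCHandleType.Pinned)) for this address to stay valid
void* pointer = (void*)Marshal.UnsafeAddrOfPinnedArrayElement(array, position);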
Without going into unmanaged code, you can't return a pointer to the middle of the array in C#, all you can do is return the array index where you want to enter the data.
If you must do this, your options include returning an ArraySegment (as Jon Skeet pointed out), or using Streams. You can return a MemoryStream that is a subset of an array using this constructor. You will need to handle all of your memory reads and writes as stream operations however, not as offset operations.
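The ArraySegment option could look roughly like the sketch below: the provider hands out consecutive slices of one preallocated array, and callers write through the segment's Array and Offset. The Reserve method name and the error handling are assumptions for illustration.

```csharp
using System;

public sealed class DataProvider
{
    private readonly byte[] _data;
    private int _next; // index of the first unreserved byte

    public DataProvider(int capacity)
    {
        _data = new byte[capacity];
    }

    // Reserves requestedBytes of the preallocated array and returns a view of that slot.
    public ArraySegment<byte> Reserve(int requestedBytes)
    {
        if (requestedBytes < 0 || requestedBytes > _data.Length - _next)
            throw new InvalidOperationException("Not enough space left in the buffer.");

        var slot = new ArraySegment<byte>(_data, _next, requestedBytes);
        _next += requestedBytes;
        return slot;
    }
}
```

A caller would then copy into its slot with Array.Copy(source, 0, slot.Array, slot.Offset, source.Length), which writes directly into the shared buffer without an intermediate array.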
