Fast iterating over 3D array with unsafe code?

Fast iterating over 3D array with unsafe code? - c#

I have a piece of code where I iterate through a huge 3D array with two for-loops. Now I have performance problems, it is just too slow. What can I do?
I read somewhere that unmanaged code could solve the problem. Did I understand right: Unmanaged Code runs outside the .net engine?
Well I put a unsafe-block arround my array iterations, but it didn't help. I think thats because I still use the managed array. How can I copy my array into a unsafe array or get an unsafe pointer to this array? I tried fixed (see code below) but I get compiler errors.
byte[, ,] data = original.Data;
unsafe
{
fixed (byte*** dataPtr = (byte***)data) // data is of type byte[,,]
{
for (int i = original.Rows - 1; i >= 0; i--)
{
for (int j = original.Cols - 1; j >= 0; j--)
{
if (dataPtr[i,j,0] < 100)
{
dataPtr[i, j, 0] += 100;
dataPtr[i, j, 1] += 40;
dataPtr[i, j, 2] += 243;
}
else
{
dataPtr[i,j,0] = 0;
}
}
}
}
}
How can I use this fixed keyword in an 3D array and would it help to make my code faster?

If you want speed, you don't use a multidimensional array. You need to use a byte[][][] jagged array, which is heavily optimized in the CLR. This single change will likely speed up your loops sufficiently that you don't need to do anything else.

Related

Reshape c# array in place

There are may operations on arrays that do not depend on the rank of an array. Iterators are also not always a suitable solution. Given the array
double[,] myarray = new double[10,5];
it would be desirable to realize the following workflow:
Reshape an array of Rank>1 to a linear array with rank=1 with the same number of elements. This should happen in place to be runtime efficient. Copying is not allowed.
Pass reshaped array to a method defined for Rank=1 arrays only. e.g. Array.copy()
Reshape result array to original rank and dimensions.
There is a similar question on this topic: How to reshape array in c#. The solutions there use memory copy operation with BlockCopy().
My question are:
Can this kind of reshaping be realized without memory copy? Or even in a temporary way like creating a new view on the data?

There wording to this is a little tough, yet surely pointers unsafe and fixed would work. No memory copy, direct access, add pepper and salt to taste
The CLR just wont let you cast an array like you want, any other method you can think of will require allocating a new array and copy (which mind you can be lightening fast). The only other possibly way to so this is to use fixed, which will give you contiguous 1 dimensional array.
unsafe public static void SomeMethod(int* p, int size)
{
for (var i = 0; i < 4; i++)
{
//Perform any linear operation
*(p + i) *= 10;
}
}
...
var someArray = new int[2,2];
someArray[0, 0] = 1;
someArray[0,1] = 2;
someArray[1, 0] = 3;
someArray[1, 1] = 4;
//Reshape an array to a linear array
fixed (int* p = someArray)
{
SomeMethod(p, 4);
}
//Reshape result array to original rank and dimensions.
for (int i = 0; i < 2; i++)
{
for (int j = 0; j < 2; j++)
{
Console.WriteLine(someArray[i, j]);
}
}
Output
10
20
30
40

Sorting array using BubbleSort fails

Following algorithm works pretty fine in C#
public int[] Sortieren(int[] array, int decide)
{
bool sorted;
int temp;
for (int i = 0; i < array.Length; i++)
{
do
{
sorted= true;
for (int j = 0; j < array.Length - 1; j++)
{
if (decide == 1)
{
if (array[j] < array[j + 1])
{
temp = array[j];
array[j] = array[j + 1];
array[j + 1] = temp;
sorted= false;
}
}else if (decide == 0)
{
if (array[j] > array[j + 1])
{
temp = array[j];
array[j] = array[j + 1];
array[j + 1] = temp;
sorted= false;
}
}
else
{
Console.WriteLine("Incorrect sorting parameter!");
break;
}
}
} while (!sorted);
}
return array;
}
Same thing in C fails. I only get the first two numbers of the array being sorted. The rest of the numbers are same. So, this code also seems to change the array instead of only sorting it. Any ideas, where are the bugs?
#include <stdio.h>
#include<stdbool.h>
#define MAX 10
void main(void)
{
int random_numbers[MAX],temp,Array_length;
bool sorted;
srand(time(NULL));
for(int i=0;i<=MAX;i++){
random_numbers[i]=rand()%1000;
}
Array_length=sizeof(random_numbers) / sizeof(int);
printf("List of (unsorted) numbers:\n");
for(int i=0;i<MAX;i++){
if(i==MAX-1)
printf("%i",random_numbers[i]);
else
printf("%i,",random_numbers[i]);
}
//Searching algorithm
for(int i=0;i<Array_length;i++){
do{
sorted=true;
for(int j=0;j<Array_length-1;j++){
if(random_numbers[j]>random_numbers[j+1]){
temp=random_numbers[j];
random_numbers[j]==random_numbers[j+1];
random_numbers[j+1]=temp;
sorted=false;
}
}
}while(!sorted);
}
printf("\n");
for(int i=0;i<Array_length;i++){
if(i==Array_length-1)
printf("%i",random_numbers[i]);
else
printf("%i,",random_numbers[i]);
}
}

You have an error in your swap algorithm:
if (zufallszahlen[j] > zufallszahlen[j+1]) {
temp = zufallszahlen[j];
zufallszahlen[j] == zufallszahlen[j+1]; // here
zufallszahlen[j+1] = temp;
sortiert = false;
}
In the line after you assign to temp, your double equal sign results in a check for equality rather than an assignment. This is still legal code (== is an operator and and expressions that use them evaluate to something), and the expression will evaluate to either 1 or 0 depending on the truth value of the statement. Note that this is legal even though you're not using the expression, where normally a boolean value would presumably be used for control flow.
Note that this is true for other operators as well. For example, the = operator assigns the value on the right to the variable on the left, so hypothetically a mistake like if (x = 0) will mean this branch will never be called, since the x = 0 will evaluate to false every time, when you may have meant to branch when x == 0.
Also, why are you using a boolean value to check if the array is sorted? Bubble sort is a simple algorithm, so it should be trivial to implement, and by the definition of an algorithm, it's guaranteed to both finish and be correct. If you were trying to optimize for performance purposes, for example choosing between merge sort and insertion sort based on whether the data was already sorted then I would understand, but you're checking whether the data is sorted as you're sorting it, which doesn't really make sense, since the algorithm will tell you when it's sorted because it will finish. Adding the boolean checking only adds overhead and nets you nothing.
Also, note how in your C# implementation, you repeated the sort process. This is a good sign your design is wrong. You take in an integer as well as the actual int[] array in your C# code, and you use that integer to branch. Then, from what I can gather, you sort using either < or >, depending on the value passed in. I'm pretty confused by this, since either would work. You gain nothing from adding this functionality, so I'm confused as to why you added it in.
Also, why do you repeat the printf statements? Even doing if/else if I might understand. But you're doing if/else. This is logically equivalent to P V ~P and will always evaluate to true, so you might as well get rid of the if and the else and just have one printf statement.
Below is implementation of your Bubble Sort program, and I want to point out a few things. First, it's generally frowned upon to declare main as void (What should main() return in C and C++?).
I quickly want to also point out that even though we are declaring the maximum length of the array as a macro, all of the array functions I defined explicitly take a size_t size argument for referrential transparency.
Last but not least, I would recommend not declaring all your variables at the start of your program/functions. This is a more contested topic among developers, especially because it used to be required, since compilers needed to know exactly what variables needed to be allocated. As compilers got better and better, they could accept variable declarations within code (and could even optimize some variables away altogether), so some developers recommend declaring your variables when you need them, so that their declaration makes sense (i.e... you know you need them), and also to reduce code noise.
That being said, some developers do prefer declaring all their variables at the beginning of the program/function. You'll especially see this:
int i, j, k;
or some variation of that, because the developer pre-declared all of their loop counters. Again, I think it's just code noise, and when you work with C++ some of the language syntax itself is code noise in my opinion, but just be aware of this.
So for example, rather than declaring everything like this:
int zufallszahlen[MAX], temp, Array_length;
You would declare the variables like this:
int zufallszahlen[MAX];
int Array_length = sizeof (zufallszahlen) / sizeof (int);
The temp variable is then put off for as long as possible so that it's obvious when and were it's useful. In my implementation, you'll notice I declared it in the swap function.
For pedagogical purposes I would also like to add that you don't have to use a swap variable when sorting integers because you can do the following:
a = a + b;
b = a - b;
a = a - b;
I will say, however, that I believe the temporary swap variable makes the swap much more instantly familiar, so I would say leave it in, but this is my own personal preference.
I do recommend using size_t for Array_length, however, because that's the type that the sizeof operator returns. It makes sense, too, because the size of an array will not be negative.
Here are the include statements and functions. Remember that I do not include <stdbool.h> because the bool checking you were doing was doing nothing for the algorithm.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define MAX 10
void PrintArray(int arr[], size_t n) {
for (int i = 0; i < n; ++i) {
printf("%d ", arr[i]);
}
printf("\n");
}
void PopulateArray(int arr[], size_t n) {
for (int i = 0; i < n; ++i) {
arr[i] = rand() % 1000 + 1;
}
}
void BubbleSortArray(int arr[], size_t n) {
for (int i = 0; i < n; ++i) {
for (int j = 0; j < n - 1; ++j) {
if (arr[j] > arr[j+1]) {
int temp = arr[j+1];
arr[j+1] = arr[j];
arr[j] = temp;
}
}
}
}
To implement the bubble sort algorithm, the only thing you have to do now is initialize the random number generator like you did, create your array and populate it, and finally sort the array.
int main()
{
srand(time(NULL));
int arr[MAX];
size_t array_length = sizeof (arr) / sizeof (int);
PopulateArray(arr, array_length);
PrintArray(arr, array_length);
BubbleSortArray(arr, array_length);
PrintArray(arr, array_length);
}
I hope this helps, let me know if you have any questions.

I need to do my operations faster

I have a piece of code that reads points from an stl, then I have to do a transformation, aplying a transformation matrix, of this stl and write the results on other stl. I do all this stuff, but it's too slow, about 5 or more minutes.
I put the code of the matrix multiplication, it recieves the two matrix and it makes the multiplication:
public double[,] MultiplyMatrix(double[,] A, double[,] B)
{
int rA = A.GetLength(0);
int cA = A.GetLength(1);
int rB = B.GetLength(0);
int cB = B.GetLength(1);
double temp = 0;
double[,] kHasil = new double[rA, cB];
if (cA != rB)
{
MessageBox.Show("matrix can't be multiplied !!");
}
else
{
for (int i = 0; i < rA; i++)
{
for (int j = 0; j < cB; j++)
{
temp = 0;
for (int k = 0; k < cA; k++)
{
temp += A[i, k] * B[k, j];
}
kHasil[i, j] = temp;
}
}
return kHasil;
}
return kHasil;
}
My problem is that all the code is too slow, it has to read from a stl, multiply all the points and write in other stl the results, it spends 5-10 minutes to do that. I see that all comercial programs, like cloudcompare, do all this operations in a few seconds.
Can anyone tell me how I can do it faster? Is there any library to do that faster than my code?
Thank you! :)

I fond this on internet:
double[] iRowA = A[i];
double[] iRowC = C[i];
for (int k = 0; k < N; k++) {
double[] kRowB = B[k];
double ikA = iRowA[k];
for (int j = 0; j < N; j++) {
iRowC[j] += ikA * kRowB[j];
}
}
then use Plinq
var source = Enumerable.Range(0, N);
var pquery = from num in source.AsParallel()
select num;
pquery.ForAll((e) => Popt(A, B, C, e));
Where Popt is our method name taking 3 jagged arrays (C = A * B) and the row to calculate (e). How fast is this:
1.Name Milliseconds2.Popt 187
Source is: Daniweb
That's over 12 times faster than our original code! With the magic of PLINQ we are creating 500 threads in this example and don't have to manage a single one of them, everything is handled for you.

You have couple of options:
Rewrite your code with jagged arrays (like double[][] A) it should give ~2x increase in speed.
Write unmanaged C/C++ DLL with matrix multiplication code.
Use third-party math library that have native BLAS implementation under the hood. I suggest Math.NET Numerics, it can be switched to use Intel MKL which is smoking fast.
Probably, the third option is the best.

Just for the records: CloudCompare is not a commercial product. It's a free open-source project. And there are no 'huge team of developers' (only a handful of them actually, doing this on their free time).
Here is our biggest secret: we use pure C++ code ;). And we rarely use multi-threading but for very lengthy processes (you have to take the thread management and processing time overhead into account).
Here are a few 'best practice' rules for the parts of the code that are called loads of times:
avoid any dynamic memory allocation
make as less (far) function calls as possible
always process the most probable case first in a 'if-then-else' branching
avoid very small loops (inline them if N = 2 or 3)

Is there a more efficient way to convert double to float?

I have a need to convert a multi-dimensional double array to a jagged float array. The sizes will var from [2][5] up to around [6][1024].
I was curious how just looping and casting the double to the float would perform and it's not TOO bad, about 225µs for a [2][5] array - here's the code:
const int count = 5;
const int numCh = 2;
double[,] dbl = new double[numCh, count];
float[][] flt = new float[numCh][];
for (int i = 0; i < numCh; i++)
{
flt[i] = new float[count];
for (int j = 0; j < count; j++)
{
flt[i][j] = (float)dbl[i, j];
}
}
However if there are more efficient techniques I'd like to use them. I should mention that I ONLY timed the two nested loops, not the allocations before it.
After experimenting a little more I think 99% of the time is burned on the loops, even without the assignment!

This will run faster, for small data it's not worth doing Parallel.For(0, count, (j) => it actually runs considerably slower for very small data, which is why that I have commented that section out.
double* dp0;
float* fp0;
fixed (double* dp1 = dbl)
{
dp0 = dp1;
float[] newFlt = new float[count];
fixed (float* fp1 = newFlt)
{
fp0 = fp1;
for (int i = 0; i < numCh; i++)
{
//Parallel.For(0, count, (j) =>
for (int j = 0; j < count; j++)
{
fp0[j] = (float)dp0[i * count + j];
}
//});
flt[i] = newFlt.Clone() as float[];
}
}
}
This runs faster because double accessing double arrays [,] is really taxing in .NET due to the array bounds checking. the newFlt.Clone() just means we're not fixing and unfixing new pointers all the time (as there is a slight overhead in doing so)
You will need to run it with the unsafe code tag and compile with /UNSAFE
But really you should be running with data closer to 5000 x 5000 not 5 x 2, if something takes less than 1000 ms you need to either add in more loops or increase the data because at that level a minor spike in cpu activity can add a lot of noise to your profiling.

In your example - I think you dont measure the double/float comparison so much (which should be a processor internal instruction) as the array accesses (which have a lot of redirects plus obviousl.... aray delimiter checks (for the array index of bounds exception).
I would suggest timining a test without arrays.

I don't really think that you can optimize your code much more, one option would be to make your code parallel but for your input data size ([2][5] up to around [6][1024]) I don't thing that you would profit so much if you would even have any profit. In fact, I wouldn't even bother optimizing that piece of code at all...
Anyway, to optimize that, the only thing that I would do (if that fits in what you want to do) would be to just used fixed-width arrays instead of the jagged ones, even if you would waste memory with that.

If you could use also Lists in your case you could use the LINQ approach:
List<List<double>> t = new List<List<double>>();
//adding test data
t.Add(new List<double>() { 12343, 345, 3, 23, 2, 1 });
t.Add(new List<double>() { 43, 123, 3, 54, 233, 1 });
//creating target
List<List<float>> q;
//conversion
q = t.ConvertAll<List<float>>(
(List<double> inList) =>
{
return inList.ConvertAll<float>((double inValue) => { return (float)inValue; });
}
);
if its faster you have to measure yourself. (doubtful)
but you could parallelize it which could fasten it up (PLINQ)

Cut rows of a matrix without using loops

I have a matrix and i want to create a new matrix which will be the old matrix, but without the first row and first column. is there a way to do this without using loops?

i want to create a new matrix
From this it sounds to me like you want a new T[,] object.
which will be the old matrix, but without the first row and first column
I interpret this to mean you want the new T[,] object to contain the same values as the original, excepting the first row/column.
is there a way to do this without using loops?
If I've interpreted your question correctly, then no, not really. You will need to copy elements from one array to another; this requires enumeration. But that doesn't mean you can't abstract the implementation of this method into a reusable method (in fact, this is what you should do).
public static T[,] SubMatrix(this T[,] matrix, int xstart, int ystart)
{
int width = matrix.GetLength(0);
int height = matrix.GetLength(1);
if (xstart < 0 || xstart >= width)
{
throw new ArgumentOutOfRangeException("xstart");
}
else if (ystart < 0 || ystart >= height)
{
throw new ArgumentOutOfRangeException("ystart");
}
T[,] submatrix = new T[width - xstart, height - ystart];
for (int i = xstart; i < width; ++i)
{
for (int j = ystart; j < height; ++j)
{
submatrix[i - xstart, j - ystart] = matrix[i, j];
}
}
return submatrix;
}
The above code isn't pretty, but once it's in place you'll be able to use it quite neatly:
T[,] withoutFirstRowAndColumn = originalMatrix.SubMatrix(1, 1);
Now, if I misinterpreted your question, and you are not dead-set on creating a new T[,] object, you can improve the efficiency of this approach by not allocating a new T[,] at all; you could take Abel's idea (along with its caveats) and use unsafe code to essentially simulate a T[,] with indices pointing to the elements of the original matrix. Come to think of it, you could even achieve this without resorting to unsafe code; you'd simply need to define an interface for the functionality you'd want to expose (a this[int, int] property comes to mind) and then implement that functionality (your return type wouldn't be a T[,] in this case, but what I'm getting at is that it could be something like it).

Simply put: no. But if you do not use jagged arrays but instead use multi-dim arrays, and if you take some time to study the memory layout of arrays in .NET, you could do it with unsafe pointers and erasing a part of the memory and moving the starting pointer of the multi-dim array. But it'd be still dependent on how you design your arrays and your matrixes whether this works or not.
However, I'd highly advice against it. There's a big chance you screw up the type and confuse the garbage collector if you do so.
Alternatively, if you like to do this exercise, use C++/CLI for this task. In C++, you have more control and it's easier to manipulate memory and move pointers directly. You also have more control over the destructor and finalizers, which may come in handy here. But, that said, then you still need marshaling. If you'd do all this for performance, I'd advice to go back to the simple loops, it'll perform faster in most cases.

Maybe you should have a look at using a maths library with good support for Matrix operations? Here's a thread which mentions a few:
Matrix Library for .NET

Using some methods from the Buffer class, you can do a row-wise copy if the matrix element type is a primitive type. This should be faster than an element-wise copy. Here is a generic extension method which demonstrates the use of Buffer:
static PrimitiveType[,] SubMatrix<PrimitiveType>(
this PrimitiveType[,] matrix, int fromRow, int fromCol) where PrimitiveType: struct
{
var (srcRowCount, srcColCount) = ( matrix.GetLength(0), matrix.GetLength(1) );
if (fromRow < 0 || fromRow > srcRowCount)
{
throw new IndexOutOfRangeException(nameof(fromRow));
}
if (fromCol < 0 || fromCol > srcColCount)
{
throw new IndexOutOfRangeException(nameof(fromCol));
}
var (dstRowCount, dstColCount) = ( srcRowCount - fromRow, srcColCount - fromCol );
var subMatrix = new PrimitiveType[dstRowCount, dstColCount];
var elementSize = Buffer.ByteLength(matrix) / matrix.Length;
for (var row = 0; row < dstRowCount; ++row)
{
var srcOffset = (srcColCount * (row + fromRow) + fromCol) * elementSize;
var dstOffset = dstColCount * row * elementSize;
Buffer.BlockCopy(matrix, srcOffset, subMatrix, dstOffset, dstColCount * elementSize);
}
return subMatrix;
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Fast iterating over 3D array with unsafe code? - c#

If you want speed, you don't use a multidimensional array. You need to use a byte[][][] jagged array, which is heavily optimized in the CLR. This single change will likely speed up your loops sufficiently that you don't need to do anything else.

Related

Reshape c# array in place

Sorting array using BubbleSort fails

I need to do my operations faster

Is there a more efficient way to convert double to float?

Cut rows of a matrix without using loops

Categories

Resources