How to multiply 2 matrices using Parallel.ForEach? - c#

There is a function that multiplies two matrices in the usual way:
public IMatrix Multiply(IMatrix m1, IMatrix m2)
{
    var resultMatrix = new Matrix(m1.RowCount, m2.ColCount);
    for (long i = 0; i < m1.RowCount; i++)
    {
        for (byte j = 0; j < m2.ColCount; j++)
        {
            long sum = 0;
            for (byte k = 0; k < m1.ColCount; k++)
            {
                sum += m1.GetElement(i, k) * m2.GetElement(k, j);
            }
            resultMatrix.SetElement(i, j, sum);
        }
    }
    return resultMatrix;
}
This function should be rewritten to use Parallel.ForEach threading. I tried it this way:
public IMatrix Multiply(IMatrix m1, IMatrix m2)
{
    // todo: feel free to add your code here
    var resultMatrix = new Matrix(m1.RowCount, m2.ColCount);
    Parallel.ForEach(m1.RowCount, row =>
    {
        for (byte j = 0; j < m2.ColCount; j++)
        {
            long sum = 0;
            for (byte k = 0; k < m1.ColCount; k++)
            {
                sum += m1.GetElement(row, k) * m2.GetElement(k, j);
            }
            resultMatrix.SetElement(row, j, sum);
        }
    });
    return resultMatrix;
}
But there is an error with the type argument in the loop. How can I fix it?

Just use Parallel.For instead of Parallel.ForEach; that lets you keep almost exactly the same body as the non-parallel version:
Parallel.For(0, m1.RowCount, i =>
{
    ...
});
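Filled in, a minimal sketch of the whole method could look like this (assuming the IMatrix members used in the question and that the column counts fit in an int):
public IMatrix Multiply(IMatrix m1, IMatrix m2)
{
    var resultMatrix = new Matrix(m1.RowCount, m2.ColCount);
    // Each row of the result is written by exactly one iteration, so no locking
    // is needed as long as SetElement only touches that element.
    Parallel.For(0, m1.RowCount, i =>
    {
        for (int j = 0; j < m2.ColCount; j++)
        {
            long sum = 0;
            for (int k = 0; k < m1.ColCount; k++)
            {
                sum += m1.GetElement(i, k) * m2.GetElement(k, j);
            }
            resultMatrix.SetElement(i, j, sum);
        }
    });
    return resultMatrix;
}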
Note that only fairly large matrices will benefit from parallelization, so if you are working with 4x4 matrices for graphics, this is not the approach to take.
One problem with multiplying matrices is that in the innermost loop you need to access one value from each row of one of the matrices. That access pattern is hard on the processor cache and causes lots of cache misses. So a fairly easy optimization is to copy an entire column into a temporary array and do all computations that need that column before reading the next one. All memory accesses then become nice and linear and easy to cache. This does a bit more work overall, but the better cache utilization easily makes it a win. There are even more cache-efficient methods, but the complexity also tends to increase.
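A rough sketch of that column-copy idea, sequential for clarity and assuming the same IMatrix members as above:
public IMatrix MultiplyColumnBuffered(IMatrix m1, IMatrix m2)
{
    var result = new Matrix(m1.RowCount, m2.ColCount);
    var column = new long[m1.ColCount];          // buffer reused for every column of m2

    for (int j = 0; j < m2.ColCount; j++)
    {
        // one linear pass to pull the current column of m2 into contiguous memory
        for (int k = 0; k < m1.ColCount; k++)
            column[k] = m2.GetElement(k, j);

        for (long i = 0; i < m1.RowCount; i++)
        {
            long sum = 0;
            for (int k = 0; k < m1.ColCount; k++)
                sum += m1.GetElement(i, k) * column[k];
            result.SetElement(i, j, sum);
        }
    }
    return result;
}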
Another optimization would be to use SIMD, but this might require platform-specific code for best performance and will likely involve more work. You might also be able to find libraries that are already optimized.
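A hedged sketch of the SIMD idea using System.Numerics.Vector<T>, assuming the row of m1 and the column of m2 have already been copied into flat double arrays (for long-valued matrices Vector<long> works the same way, but hardware acceleration for 64-bit integer multiplies is less widely available):
using System.Numerics;

static double DotProduct(double[] row, double[] column)
{
    double sum = 0;
    int width = Vector<double>.Count;            // number of lanes the hardware offers
    int i = 0;
    for (; i <= row.Length - width; i += width)
    {
        var a = new Vector<double>(row, i);      // load 'width' elements at once
        var b = new Vector<double>(column, i);
        sum += Vector.Dot(a, b);                 // horizontal multiply-add
    }
    for (; i < row.Length; i++)                  // scalar tail for the leftovers
        sum += row[i] * column[i];
    return sum;
}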
But perhaps most importantly: profile your code. It is quite easy for simple things to consume a lot of time. For example, you are accessing elements through an interface, so every access may be a virtual method call that cannot be inlined, which is a severe performance penalty compared to a direct array access.
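A minimal profiling sketch using System.Diagnostics.Stopwatch (the TimeBest helper and the multiplier/a/b names are just placeholders, not from the question):
using System;
using System.Diagnostics;

// Times an action several times and reports the best run, which reduces noise
// from JIT compilation and other processes.
static TimeSpan TimeBest(Action action, int runs = 5)
{
    var best = TimeSpan.MaxValue;
    for (int r = 0; r < runs; r++)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        if (sw.Elapsed < best) best = sw.Elapsed;
    }
    return best;
}

// Usage, assuming 'multiplier', 'a' and 'b' exist in the calling code:
// Console.WriteLine($"Multiply: {TimeBest(() => multiplier.Multiply(a, b))}");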

Parallel.ForEach expects a collection (an IEnumerable<T>) as its first argument, and m1.RowCount is a number.
Parallel.For() is probably what you wanted.

Related

Quick way of checking if a 2D array contains an element c#

I'm currently coding battleships as a part of a college project. The game works perfectly fine but I'd like to implement a way to check if a ship has been completely sunk. This is the method I'm currently using:
public static bool CheckShipSunk(string[,] board, string ship)
{
    for (int i = 0; i < board.GetLength(0); i++)
    {
        for (int j = 0; j < board.GetLength(1); j++)
        {
            if (board[i, j] == ship) { return false; }
        }
    }
    return true;
}
The problem with this is that there are 5 ships, and this is very inefficient when checking hundreds of elements 5 times over, not to mention the sub-par quality of college computers. Is there an easier way of checking if a 2D array contains an element?
Use an arithmetic approach to loop through with just one loop:
public static bool CheckShipSunk(string[,] board, string ship)
{
    int rows = board.GetLength(0);
    int cols = board.GetLength(1);
    for (int i = 0; i < rows * cols; i++)
    {
        int row = i / cols;
        int col = i % cols;
        if (board[row, col] == ship)
            return false;
    }
    return true;
}
But I am with Nysand on just caching and storing that information in cells. The above code might work, but it is not recommended, as it is still not as efficient.
this is very inefficient when checking hundreds of elements 5 times over
Have you done any profiling? Computers are fast, even your old college computers. Checking hundreds of elements should take microseconds. From Donald Knuth's famous quote:
There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
So if you feel your program is slow, I would recommend starting with profiling. If you are at university, this is a very valuable skill to learn.
There are also better algorithms/data structures that could be employed. I would, for example, expect each ship to know which locations it occupies, and various other information, like whether it is sunk at all. Selecting appropriate data structures is also a very important skill to learn, but a difficult one. Also, try not to get stuck in analysis paralysis: a terribly inefficient, ugly, working solution is still better than the most beautiful code that does not work.
However, a very easy thing to fix is moving .GetLength out of the loop. This is a fairly slow call, and doing it only once should make your loop several times faster for almost no effort. You might also consider replacing the strings with some other identifier, like an int.
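As a sketch of the "each ship knows its own locations" idea mentioned above (the Ship class and its members are hypothetical, not from the question):
using System.Collections.Generic;

// Tracks the ship's own remaining cells, so checking for "sunk" becomes an O(1)
// property read instead of a scan over the whole board.
public class Ship
{
    private readonly HashSet<(int Row, int Col)> remainingCells;

    public string Name { get; }
    public bool IsSunk => remainingCells.Count == 0;

    public Ship(string name, IEnumerable<(int Row, int Col)> cells)
    {
        Name = name;
        remainingCells = new HashSet<(int Row, int Col)>(cells);
    }

    // Returns true if the shot hit this ship; the cell is removed so a repeated
    // shot at the same square does not count twice.
    public bool RegisterHit(int row, int col) => remainingCells.Remove((row, col));
}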
Willy-nilly, you have to scan either the entire array or everything up to the first ship cell.
You can simplify the code by querying the array with the help of Linq, but that will not increase performance: the check still has O(length * width) time complexity.
using System.Linq;
...
// No explicit loops, just a query (.NET will loop for you)
public static bool CheckShipSunk(string[,] board, string ship) => board is null
    ? throw new ArgumentNullException(nameof(board))
    : !board.Cast<string>().Any(item => item == ship);
If you are looking for performance (say, you have a really huge array, many ships to test, etc.), I suggest changing the data structure: use a Dictionary<string, (int row, int col)> instead of the string[,] array:
Dictionary<string, (int row, int col)> sunkShips =
    new Dictionary<string, (int row, int col)>(StringComparer.OrdinalIgnoreCase) {
        { "Yamato", (15, 46) },
        { "Bismark", (11, 98) },
        { "Prince Of Wales", (23, 55) },
    };
and then you can get it as easily as:
public static bool CheckShipSunk(IDictionary<string, (int row, int col)> sunkShips,
                                 string ship) =>
    sunkShips?.Keys?.Contains(ship) ?? false;
Note that the time complexity is O(1), which means it doesn't depend on the board's length and width.

Optimizing this enormous for loop

So I have this for loop:
for (int i = 0; i < meshes.Count; i++)
{
    for (int j = 0; j < meshes.Count; j++)
    {
        for (int m = 0; m < meshes[i].vertices.Length; m++)
        {
            for (int n = 0; n < meshes[i].vertices.Length; n++)
            {
                if ((meshes[i].vertices[m].x == meshes[j].vertices[n].x) && (meshes[i].vertices[m].z == meshes[j].vertices[n].z))
                {
                    if (meshes[i].vertices[m] != meshes[j].vertices[n])
                    {
                        meshes[i].vertices[m].y = meshes[j].vertices[n].y;
                    }
                }
            }
        }
    }
}
Which goes through a few million vectors and compares them to all other vectors, and then modifies some of their y values. I think it works, but after hitting play it takes an unbelievably long time to load (I've currently been waiting for 15 minutes, and it's still going). Is there a way to make it more efficient? Thanks for the help!
As I read this, what you're basically doing is: for all vertices with the same x and z, you set their y values to the same value.
A more optimized way would be to use the Linq method GroupBy, which internally uses hash mapping to avoid the quadratic time complexity of your current approach:
var vGroups = meshes.SelectMany(mesh => mesh.vertices)
                    .GroupBy(vertex => new { vertex.x, vertex.z });
foreach (var vGroup in vGroups)
{
    vGroup.Aggregate((prev, curr) =>
    {
        // If prev is null (i.e. first iteration of the "loop")
        // don't change the 'y' value
        curr.y = prev?.y ?? curr.y;
        return curr;
    });
}
// All vertices should now be updated in the 'meshes'
Note that the final y value of the vertices depends on the order of the meshes and vertices in your original list. The first vertex in each vGroup is the deciding vertex. I believe it'll be the opposite of your approach, where it's the last vertex that's the deciding one, but it doesn't sound like that's important for you.
Furthermore, be aware that in this (and your) approach you are possibly merging two vertices in the same mesh if two vertices have the same x and z values. I don't know if that's intended but I wanted to point it out.
An additional performance optimization would be to parallelize this. Just start out with a call to AsParallel:
var vGroups = meshes.AsParallel()
                    .SelectMany(mesh => mesh.vertices)
                    .GroupBy(vertex => new { vertex.x, vertex.z });
// ...
Be aware that parallelization does not always speed things up if the computation you are trying to parallelize is not that computationally expensive. The overhead from parallelizing it may outweigh the benefits. I'm not sure whether the GroupBy operation is heavy enough for it to be beneficial, but you'll have to test that out for yourself. Try without it first.
For a simplified example, see this fiddle.
You want to make Y equal for all vertices with the same X and Z. Let's do just that:
var yForXZDict = new Dictionary<(int, int), int>();
foreach (var mesh in meshes)
{
    foreach (var vertex in mesh.vertices)
    {
        var xz = (vertex.x, vertex.z);
        if (yForXZDict.TryGetValue(xz, out var y))
        {
            vertex.y = y;
        }
        else
        {
            yForXZDict[xz] = vertex.y;
        }
    }
}
You should replace int with the exact type you use for the coordinates.
You are comparing twice unnecessarily.
Here is a short example of what I mean:
Let's say we have meshes A, B, C.
You are comparing
A, A
A, B
A, C
B, A
B, B
B, C
C, A
C, B
C, C
while this checks e.g. the combination A and B two times.
One first easy improvement would be to use e.g.
for (int i = 0; i < meshes.Count; i++)
{
    // only check the current and following meshes
    for (int j = i; j < meshes.Count; j++)
    {
        ...
Do you even want to compare a mesh with itself? If not, you can actually even use j = i + 1 so you only compare the current mesh to the next and following meshes.
Then for the vertices it depends. If you actually also want to check the mesh against itself, you at least want int n = m + 1 in the case that i == j.
It makes no sense to check a vertex with itself since the condition will always be true.
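Putting both refinements together, the inner vertex loops (inside the i/j loops sketched above) could look roughly like this:
// When comparing a mesh with itself (i == j), start the inner index one past m
// so a vertex is never compared with itself and no pair is visited twice.
for (int m = 0; m < meshes[i].vertices.Length; m++)
{
    int nStart = (i == j) ? m + 1 : 0;
    for (int n = nStart; n < meshes[j].vertices.Length; n++)
    {
        // ... same x/z comparison and y assignment as in the original loop ...
    }
}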
The next point is to minimize accesses.
You are accessing e.g.
meshes[i].vertices
five times!
Rather, get and store it once, e.g.:
// To minimize GC it sometimes makes sense to reuse variables outside of a loop
Mesh meshA;
Mesh meshB;
Vector3[] vertsA;
Vector3[] vertsB;
Vector3 vA;
Vector3 vB;
for (int i = 0; i < meshes.Count; i++)
{
    meshA = meshes[i];
    vertsA = meshA.vertices;
    for (int j = i; j < meshes.Count; j++)
    {
        meshB = meshes[j];
        vertsB = meshB.vertices;
        for (int m = 0; m < vertsA.Length; m++)
        {
            vA = vertsA[m];
            ...
Also note that a line like
meshes[i].vertices[m].y = meshes[j].vertices[n].y;
doesn't actually do what you might expect!
The vertices are Vector3, which is a struct, and in Unity Mesh.vertices is a property that returns a copy of the vertex array, so assigning to
meshes[i].vertices[m].y
only changes the value inside that temporary copy and doesn't in any way change the mesh itself.
You would rather work with vA as mentioned before and at the end assign it back via
vertsA[m] = vA;
and then at the end of the loop assign the entire array back once via
meshA.vertices = vertsA;
And finally: I would put this into a Thread, or use Unity's JobSystem and the Burst compiler, and meanwhile e.g. display a progress bar or some user feedback instead of freezing the entire application.
Yet another point is floating point precision.
You are directly comparing two float values using ==. Due to floating point precision this might fail even where it shouldn't, e.g.
10f * 0.1f == 1f
is not necessarily true. It might be 0.99999999 or 1.0000000001.
Therefore Unity uses only a precision of 0.00001 for Vector3 == Vector3.
You should either do the same and use
if (Mathf.Abs(vA.x - vB.x) <= 0.00001f)
or use
if (Mathf.Approximately(vA.x, vB.x))
which is roughly equivalent to
if (Mathf.Abs(vA.x - vB.x) <= Mathf.Epsilon)
where Epsilon is the smallest value by which two floats can differ.
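Applied to the original x/z check, a small helper (assuming UnityEngine's Vector3 and Mathf; the SameColumn name is made up) might look like:
// Two vertices count as the same x/z column if both coordinates are
// approximately equal, avoiding exact float comparison.
static bool SameColumn(Vector3 a, Vector3 b)
{
    return Mathf.Approximately(a.x, b.x) && Mathf.Approximately(a.z, b.z);
}

// usage inside the loop:
// if (SameColumn(vA, vB)) { ... }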

I need to do my operations faster

I have a piece of code that reads points from an STL file, then applies a transformation matrix to them, and writes the result to another STL file. I do all this, but it's too slow: it takes about 5 or more minutes.
Here is the code for the matrix multiplication; it receives the two matrices and multiplies them:
public double[,] MultiplyMatrix(double[,] A, double[,] B)
{
    int rA = A.GetLength(0);
    int cA = A.GetLength(1);
    int rB = B.GetLength(0);
    int cB = B.GetLength(1);
    double temp = 0;
    double[,] kHasil = new double[rA, cB];
    if (cA != rB)
    {
        MessageBox.Show("matrix can't be multiplied !!");
    }
    else
    {
        for (int i = 0; i < rA; i++)
        {
            for (int j = 0; j < cB; j++)
            {
                temp = 0;
                for (int k = 0; k < cA; k++)
                {
                    temp += A[i, k] * B[k, j];
                }
                kHasil[i, j] = temp;
            }
        }
        return kHasil;
    }
    return kHasil;
}
My problem is that all of this is too slow: it has to read from an STL file, multiply all the points, and write the results to another STL file, and that takes 5-10 minutes. I see that commercial programs like CloudCompare do all these operations in a few seconds.
Can anyone tell me how I can do it faster? Is there any library to do that faster than my code?
Thank you! :)
I found this on the internet:
double[] iRowA = A[i];
double[] iRowC = C[i];
for (int k = 0; k < N; k++) {
    double[] kRowB = B[k];
    double ikA = iRowA[k];
    for (int j = 0; j < N; j++) {
        iRowC[j] += ikA * kRowB[j];
    }
}
Then use PLINQ:
var source = Enumerable.Range(0, N);
var pquery = from num in source.AsParallel()
             select num;
pquery.ForAll((e) => Popt(A, B, C, e));
Where Popt is our method name taking 3 jagged arrays (C = A * B) and the row to calculate (e). How fast is this:
Name    Milliseconds
Popt    187
Source is: Daniweb
That's over 12 times faster than our original code! With the magic of PLINQ we are creating 500 threads in this example and don't have to manage a single one of them, everything is handled for you.
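Popt itself is not shown in that excerpt; a plausible sketch, assuming square N x N jagged arrays and a result array C that starts out zeroed, is:
// Multiplies row 'i' of A into row 'i' of C (C = A * B), using the
// cache-friendly loop order from the snippet above.
static void Popt(double[][] A, double[][] B, double[][] C, int i)
{
    int N = A.Length;
    double[] iRowA = A[i];
    double[] iRowC = C[i];
    for (int k = 0; k < N; k++)
    {
        double[] kRowB = B[k];
        double ikA = iRowA[k];
        for (int j = 0; j < N; j++)
        {
            iRowC[j] += ikA * kRowB[j];
        }
    }
}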
You have a couple of options:
Rewrite your code to use jagged arrays (like double[][] A); it should give a ~2x increase in speed.
Write unmanaged C/C++ DLL with matrix multiplication code.
Use a third-party math library that has a native BLAS implementation under the hood. I suggest Math.NET Numerics; it can be switched to use Intel MKL, which is smoking fast.
Probably, the third option is the best.
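As a rough sketch of the third option (Math.NET Numerics is a real library; check its current documentation, but the Matrix<double>.Build API shown here is the usual entry point):
using MathNet.Numerics.LinearAlgebra;

// Wraps the existing double[,] data in Math.NET matrices and lets the library
// (optionally backed by a native provider such as Intel MKL) do the multiplication.
public double[,] MultiplyMatrix(double[,] A, double[,] B)
{
    var a = Matrix<double>.Build.DenseOfArray(A);
    var b = Matrix<double>.Build.DenseOfArray(B);
    return (a * b).ToArray();
}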
Just for the record: CloudCompare is not a commercial product. It's a free open-source project. And there is no 'huge team of developers' (only a handful of them actually, doing this in their free time).
Here is our biggest secret: we use pure C++ code ;). And we rarely use multi-threading, except for very lengthy processes (you have to take the thread management and processing time overhead into account).
Here are a few 'best practice' rules for the parts of the code that are called loads of times:
avoid any dynamic memory allocation
make as few (far) function calls as possible
always process the most probable case first in a 'if-then-else' branching
avoid very small loops (inline them if N = 2 or 3)

Is there a more efficient way to convert double to float?

I have a need to convert a multi-dimensional double array to a jagged float array. The sizes will vary from [2][5] up to around [6][1024].
I was curious how just looping and casting the double to the float would perform and it's not TOO bad, about 225µs for a [2][5] array - here's the code:
const int count = 5;
const int numCh = 2;
double[,] dbl = new double[numCh, count];
float[][] flt = new float[numCh][];
for (int i = 0; i < numCh; i++)
{
    flt[i] = new float[count];
    for (int j = 0; j < count; j++)
    {
        flt[i][j] = (float)dbl[i, j];
    }
}
However if there are more efficient techniques I'd like to use them. I should mention that I ONLY timed the two nested loops, not the allocations before it.
After experimenting a little more I think 99% of the time is burned on the loops, even without the assignment!
This will run faster, though for small data it's not worth doing Parallel.For(0, count, (j) => ...); it actually runs considerably slower for very small data, which is why I have commented that section out.
double* dp0;
float* fp0;
fixed (double* dp1 = dbl)
{
    dp0 = dp1;
    float[] newFlt = new float[count];
    fixed (float* fp1 = newFlt)
    {
        fp0 = fp1;
        for (int i = 0; i < numCh; i++)
        {
            //Parallel.For(0, count, (j) =>
            for (int j = 0; j < count; j++)
            {
                fp0[j] = (float)dp0[i * count + j];
            }
            //});
            flt[i] = newFlt.Clone() as float[];
        }
    }
}
This runs faster because accessing double arrays via [,] is really taxing in .NET due to the array bounds checking. The newFlt.Clone() just means we're not fixing and unfixing new pointers all the time (as there is a slight overhead in doing so).
You will need to mark the code as unsafe and compile with /unsafe.
But really you should be running with data closer to 5000 x 5000, not 5 x 2. If something takes less than 1000 ms you need to either add more loops or increase the data, because at that level a minor spike in CPU activity can add a lot of noise to your profiling.
In your example, I think you aren't really measuring the double-to-float cast (which should be a single processor instruction) so much as the array accesses (which involve a lot of indirection plus, obviously, array bounds checks for the IndexOutOfRangeException).
I would suggest timing a test without arrays.
I don't really think you can optimize your code much more. One option would be to make your code parallel, but for your input data sizes ([2][5] up to around [6][1024]) I don't think you would gain much, if anything. In fact, I wouldn't even bother optimizing that piece of code at all...
Anyway, to optimize it, the only thing I would do (if it fits what you want to do) would be to just use fixed-width arrays instead of the jagged ones, even if you waste memory with that.
If you can also use Lists in your case, you could use the LINQ approach:
List<List<double>> t = new List<List<double>>();
//adding test data
t.Add(new List<double>() { 12343, 345, 3, 23, 2, 1 });
t.Add(new List<double>() { 43, 123, 3, 54, 233, 1 });
//creating target
List<List<float>> q;
//conversion
q = t.ConvertAll<List<float>>(
    (List<double> inList) =>
    {
        return inList.ConvertAll<float>((double inValue) => { return (float)inValue; });
    }
);
Whether it's faster you have to measure yourself (doubtful).
But you could parallelize it, which could speed it up (PLINQ).

How do I read/write C# BigIntegers to/from a file?

In one of my classes, I have a routine that reads and writes an array of type Decimal (using BinaryReader / BinaryWriter's ReadDecimal() and Write() methods), to wit:
BinaryReader inputReader = new BinaryReader(File.OpenRead(BaseFilePath));
for (int x = 0; x < 6; x++) {
    for (int y = 0; y < m_Codes[x].GetLength(0); y++) {
        for (int z = 0; z < m_Codes[x].GetLength(1); z++) {
            m_Codes[x][y, z] = inputReader.ReadDecimal();
        }
    }
}
and
for (int x = 0; x < 6; x++) {
    for (int y = 0; y < m_Codes[x].GetLength(0); y++) {
        for (int z = 0; z < m_Codes[x].GetLength(1); z++) {
            outputWriter.Write(m_Codes[x][y, z]);
        }
    }
}
.. as you can see, only the first dimension is known at design time; the others vary at runtime.
In a perfect world, I would replace ReadDecimal() with ReadBigInteger() and something similar for the writing methods, but that does not seem to be supported by the stream classes; I'm guessing this is because a BigInteger can be of any length.
About the best thing I can think of is to "hand code" the BigInteger, by converting it to a byte[] array, then writing the length of that array, then writing each byte in the array itself (and doing the reverse to read it in).
Two questions:
1) Is there a better way?
2) I'm primarily motivated by a desire to increase performance; does BigInteger even perform that much better than Decimal, if at all?
There's one fairly simple approach: call BigInteger.ToByteArray to serialize, and use the BigInteger(byte[]) constructor when deserializing. Admittedly that ends up copying the data, but I'd still expect it to be reasonably fast.
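A minimal sketch of that round trip (a length prefix followed by the raw bytes, which is what the question describes):
using System.IO;
using System.Numerics;

// Writes the BigInteger as a length-prefixed byte array.
static void WriteBigInteger(BinaryWriter writer, BigInteger value)
{
    byte[] bytes = value.ToByteArray();
    writer.Write(bytes.Length);
    writer.Write(bytes);
}

// Reads it back: first the length, then exactly that many bytes.
static BigInteger ReadBigInteger(BinaryReader reader)
{
    int length = reader.ReadInt32();
    return new BigInteger(reader.ReadBytes(length));
}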
What's of more concern to you: serialization performance or arithmetic performance?
As for any speed differences between BigInteger and decimal - you should test it for the operations you actually want to perform, being aware that they will behave differently (e.g. dividing 3 by 2 will obviously give a different answer for each type).
You could convert to a string (BigInteger.ToString()) and then write that string (strings are directly supported by BinaryReader and BinaryWriter, so this avoids needing to do any encoding/decoding yourself).
Then convert it back with BigInteger.Parse.
To address the performance: I think you'll need to measure it for the cases you are interested in.
For relatively small values (say abs(value) < 2^128) I would expect BigInteger's performance to be within a couple of orders of magnitude of long's performance (i.e. no more than ~500 times slower). But as BigInteger instances get larger, operations will take longer (more bits have to be manipulated). decimal, on the other hand, should have reasonably consistent performance at all scales, but it could be very much slower than long for numbers in the intersection of their ranges (decimal is a much more complex representation: scale factors and retaining actual significant digits through calculations; I have no intuition for the effect of this complexity).
And remember: BigInteger is exact – it never rounds; decimal is approximate – data can fall off the end and be thrown away. It seems unlikely that any one business problem would be a good fit for both.
