I have a 2D array of a custom Vector class, roughly 250 x 250 in dimensions. The Vector class just stores x and y float components. My project requires a smoothing pass over the array: a new array is created where each element is the local average of the vectors within a given distance of the corresponding element in the original array. My problem is that my current solution does not compute fast enough, and I was wondering if there is a better way of computing this.
Pseudocode for my current solution is below. I am implementing this in C#; any help would be much appreciated. My actual solution uses 1D arrays for a speed-up, but I didn't include that here.
function smoothVectorArray(Vector[,] myVectorArray, int averagingDistance) {
    newVectorArray = new Vector[250, 250];
    for (x = 0; x < 250; x++)
    {
        for (y = 0; y < 250; y++)
        {
            vectorCount = 0;
            vectorXTotal = 0;
            vectorYTotal = 0;
            for (i = -averagingDistance; i < averagingDistance + 1; i++)
            {
                for (j = -averagingDistance; j < averagingDistance + 1; j++)
                {
                    tempX = x + i;
                    tempY = y + j;
                    if (inArrayBounds(tempX, tempY)) {
                        vectorCount++;
                        vectorXTotal += myVectorArray[tempX, tempY].x;
                        vectorYTotal += myVectorArray[tempX, tempY].y;
                    }
                }
            }
            newVectorArray[x, y] = new Vector(vectorXTotal / vectorCount, vectorYTotal / vectorCount);
        }
    }
    return newVectorArray;
}
What your inner loops compute is the sum over a rectangular area:
for (i = -averagingDistance; i < averagingDistance + 1; i++)
    for (j = -averagingDistance; j < averagingDistance + 1; j++)
You can pre-calculate all such sums efficiently in O(N^2). Let's introduce an array S[N, N] (where N = 250 in your case).
To keep it simple I will assume there is only one coordinate; you can easily adapt it to the (x, y) pair by building two such arrays.
S[i, j] will be the sum over the sub-rectangle (0, 0)-(i, j).
We can build this array efficiently:
S[0, 0] = myVectorArray[0, 0]; // rectangle (0, 0)-(0, 0) has only one cell, (0, 0)
for (int i = 1; i < N; ++i){
    S[0, i] = S[0, i - 1] + myVectorArray[0, i]; // rectangle (0, 0)-(0, i) is built from the previous rectangle (0, 0)-(0, i-1) plus the new cell (0, i)
    S[i, 0] = S[i - 1, 0] + myVectorArray[i, 0]; // same for (0, 0)-(i, 0)
}
for (int i = 1; i < N; ++i){
    var currentRowSum = myVectorArray[i, 0];
    for (int j = 1; j < N; ++j){
        currentRowSum += myVectorArray[i, j]; // keep track of the sum in the current row
        S[i, j] = S[i - 1, j] + currentRowSum; // rectangle (0, 0)-(i, j) is built as rectangle (0, 0)-(i-1, j), which is the current rectangle without the current row and is already calculated, plus the current row sum
    }
}
Once we have this partial-sums array calculated we can get any sub-rectangle sum in O(1). Say we want the sum over the rectangle (a, b)-(c, d).
We start with the big rectangle (0, 0)-(c, d), subtract (0, 0)-(a-1, d) and (0, 0)-(c, b-1), and add back the rectangle (0, 0)-(a-1, b-1), since it was subtracted twice:
sum((a, b)-(c, d)) = S[c, d] - S[a-1, d] - S[c, b-1] + S[a-1, b-1]
This way you can get rid of your inner loops.
https://en.wikipedia.org/wiki/Summed_area_table
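As a rough illustration, here's how the whole smoothing pass might look with two summed-area tables, one per component. This is only a minimal sketch written against the question's 250x250 array; clamping the window to the array bounds replaces the original inArrayBounds check:

Vector[,] SmoothWithSummedAreaTable(Vector[,] src, int dist)
{
    const int N = 250;
    // Partial sums for the x and y components; Sx[i, j] is the sum of src[0..i, 0..j].x.
    var Sx = new float[N, N];
    var Sy = new float[N, N];
    for (int i = 0; i < N; i++)
    {
        float rowX = 0, rowY = 0;
        for (int j = 0; j < N; j++)
        {
            rowX += src[i, j].x;
            rowY += src[i, j].y;
            Sx[i, j] = rowX + (i > 0 ? Sx[i - 1, j] : 0);
            Sy[i, j] = rowY + (i > 0 ? Sy[i - 1, j] : 0);
        }
    }

    var result = new Vector[N, N];
    for (int x = 0; x < N; x++)
    {
        for (int y = 0; y < N; y++)
        {
            // Clamp the averaging window to the array bounds.
            int a = Math.Max(x - dist, 0), c = Math.Min(x + dist, N - 1);
            int b = Math.Max(y - dist, 0), d = Math.Min(y + dist, N - 1);
            int count = (c - a + 1) * (d - b + 1);

            // Inclusion-exclusion over the partial sums: O(1) per cell.
            float sumX = Sx[c, d]
                       - (a > 0 ? Sx[a - 1, d] : 0)
                       - (b > 0 ? Sx[c, b - 1] : 0)
                       + (a > 0 && b > 0 ? Sx[a - 1, b - 1] : 0);
            float sumY = Sy[c, d]
                       - (a > 0 ? Sy[a - 1, d] : 0)
                       - (b > 0 ? Sy[c, b - 1] : 0)
                       + (a > 0 && b > 0 ? Sy[a - 1, b - 1] : 0);

            result[x, y] = new Vector(sumX / count, sumY / count);
        }
    }
    return result;
}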
You will definitely want to take advantage of the CPU cache; it sounds like you already have that in mind with your 1D-array solution. Try to arrange the algorithm to work on chunks of contiguous memory at a time, rather than hopping around the array. To that end, you should either make Vector a struct rather than a class, or use two arrays of floats, one for the x values and one for the y values. With a class, your array stores references to objects scattered around the heap, so even if you iterate over the array in order you still miss the cache constantly as you hop to the location of each Vector object. Every cache miss is roughly 200 CPU cycles wasted. This is the main thing to sort out first.
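For example, either of these layouts keeps the component data contiguous (a sketch of the suggestion above, not the asker's actual Vector type):

// Option 1: a value type, so a Vector[,] stores the floats inline.
public struct Vector
{
    public float x;
    public float y;
    public Vector(float x, float y) { this.x = x; this.y = y; }
}

// Option 2: structure-of-arrays, two flat float arrays indexed as y * width + x.
float[] xs = new float[250 * 250];
float[] ys = new float[250 * 250];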
After that, some micro-optimizations you can consider are:
using an inlining hint on the inArrayBounds method: [MethodImpl(MethodImplOptions.AggressiveInlining)] (a sketch follows below)
using unsafe code and iterating with pointer arithmetic to avoid array bounds-checking overhead
These last two ideas may or may not have any significant impact; you should measure.
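For instance, the attribute would sit on the bounds check roughly like this (a hypothetical inArrayBounds body, since the question doesn't show it):

using System.Runtime.CompilerServices;

[MethodImpl(MethodImplOptions.AggressiveInlining)]
static bool inArrayBounds(int x, int y)
{
    return x >= 0 && x < 250 && y >= 0 && y < 250;
}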
I understand nested FOR loops. I understand what they do, and how they do it. But my problem is that they seem horribly unreadable to me.
Take this example:
for (int i = 0, y = 0; y <= ySize; y++) {
    for (int x = 0; x <= xSize; x++, i++) {
        vertices[i] = new Vector3(x, y);
    }
}
Now, this loop is pretty straightforward. It's just an x/y "2 dimensional" loop. But as I add more and more "dimensions" to this nested loop, is there a way to keep the code from becoming a horrible mess of nests within nests, with piles of counter variables (i, x, y, z, etc.) to keep track of?
Also, does additional nesting affect performance in a linear way, or do additional FORs make things more and more inefficient as you nest more of them?
I think that the issue you have here is less the nested for loops, and more an unusual use of variables within the loops.
Newlines before the opening braces can help with readability too (although this is subjective).
How about this instead:
int i = 0;
for (int y = 0; y <= ySize; y++)
{
    for (int x = 0; x <= xSize; x++)
    {
        vertices[i++] = new Vector3(x, y);
    }
}
This approach should remain relatively readable for additional dimensions too (in this example I've moved the incrementing of i out to its own line, as suggested by usr).
int i = 0;
for (int y = 0; y <= ySize; y++)
{
    for (int x = 0; x <= xSize; x++)
    {
        for (int a = 0; a <= aSize; a++)
        {
            for (int b = 0; b <= bSize; b++)
            {
                vertices[i] = new Vector3(x, y, a, b);
                i++;
            }
        }
    }
}
Regarding performance, I would suggest focusing on making sure that the code is readable and understandable by a human first, and then measuring the run-time performance, possibly with a tool such as RedGate ANTS.
The usual solution is to refactor into methods which contain one or two for loops, and keep refactoring until each method is clear and not too large.
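A sketch of what that refactoring might look like for the example above (the FillRow helper is my own invention, not from the question):

void FillVertices(Vector3[] vertices, int xSize, int ySize)
{
    int i = 0;
    for (int y = 0; y <= ySize; y++)
    {
        i = FillRow(vertices, i, y, xSize);
    }
}

// Fills one row of vertices and returns the next free index.
int FillRow(Vector3[] vertices, int i, int y, int xSize)
{
    for (int x = 0; x <= xSize; x++)
    {
        vertices[i++] = new Vector3(x, y);
    }
    return i;
}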
Another way to stop the indenting, and to separate the generation of coordinates from the logic applied to them, is to use LINQ.
int i = 0;
var coordinates = from y in Enumerable.Range(0, ySize + 1)
                  from x in Enumerable.Range(0, xSize + 1)
                  select new { x, y, i = i++ };

foreach (var coordinate in coordinates) {
    vertices[coordinate.i] = new Vector3(coordinate.x, coordinate.y);
}
This is only needed if the vertices array is already declared. If you can just create a new array, then you can simply do this:
var vertices = (from y in Enumerable.Range(0, ySize + 1)
                from x in Enumerable.Range(0, xSize + 1)
                select new Vector3(x, y)
               ).ToArray();
var vertices =
    (from y in Enumerable.Range(0, ySize)
     from x in Enumerable.Range(0, xSize)
     select new Vector3(x, y)).ToList();
Loops are overused. Most loops can be expressed as queries. That makes them easier to write and maintain, and it turns them into expressions, as opposed to statements, which are easier to move around.
Performance is much worse, like 3-10x here. Whether that matters in your specific case depends on how much time is spent here and what your performance goals are.
a) Usually you will find that you're not going to need very deep nesting, so it won't be a problem.
b) You can make the nested loops into a separate method. (i.e. if you're nesting d in c in b in a - you can make a method that accepts a and b as parameters and does c and d. You can even let VS do this for you by selecting the c loop and clicking Edit->Refactor->Extract Method.)
As for performance: obviously more nesting means more iterations, but if you have them, you need them. Merging a nested loop into the outer loop (and calculating where you are inside the "actual" code) will IMHO usually not help in any noticeable way.
An N-deep loop nest is likely to be more readable than any alternative expressible in C#. Instead, consider the use of a higher-level language that has vector arithmetic as a primitive: for instance, in NumPy the direct equivalent of your code is
import numpy as np

xSize = 10
ySize = 20
vertices = np.meshgrid(
    np.linspace(0, xSize, xSize + 1),   # integer steps 0..xSize, matching the <= loop bound
    np.linspace(0, ySize, ySize + 1))
and the N-dimensional generalization is
sizes = [10, 20, 30, 40, 10]  # as many array entries as you like
vertices = np.meshgrid(*(
    np.linspace(0, kSize, kSize + 1)
    for kSize in sizes))
Just messing around here, but how about padding out spaces so that the loop conditions line up? Here is @Richard Everett's code reformatted a bit:
int i = 0;
for (int y = 0; y <= ySize; y++)             {
    for (int x = 0; x <= xSize; x++)         {
        for (int a = 0; a <= aSize; a++)     {
            for (int b = 0; b <= bSize; b++) {
                vertices[i++] = new Vector3(x, y, a, b);
            }
        }
    }
}
I feel like I'm missing something terribly obvious, but I cannot seem to find the array pair with the lowest value.
I have an int[,] worldMapXY where a 2D map is stored, say worldMapXY[0,0] through worldMapXY[120,120]. All values of the map's array are 1 (wall/invalid) or 0 (path/valid).
I'm writing a method that will find coordinates in one of the eight compass directions to create a spawn point. So I also have int[,] validSpotArr, which covers the subset of the map's bounds closest to the direction where I'm setting the spawn. The values for wall/invalid locations are set to 9999; the values for path/valid locations are set to (x + y). This is all specific to the bottom left corner, nearest to [0,0], hence "BL" or "Bottom Left".
case "BL":
for (int x = (int)border + 1; x < worldX + (int)border / 4; x++)
{
for (int y = (int)border + 1; y < worldY + (int)border / 4; y++)
{
if (worldMapXY[x,y] = 0)
{
validSpotArr[x,y] = x + y;
}
else
{
validSpotArr[x,y] = 9999;
}
}
}
What I can't quite wrap my head around is how to determine the coordinates/index of validSpotArr with the lowest value in such a way that I could pass those as separate x and y coordinates to another function (to set the spawn point). I suspect there's a lambda operator that may help, but I literally don't understand lambdas. Clearly that needs to be my next point of study.
E.g. - if validSpotArr[23, 45] = 68, and 68 is the lowest value, how do I set x=23 and y=45?
Edit: I tried messing around with something like this, but it isn't right:
Array.IndexOf(validSpotArr, validSpotArr.Min());
While not precisely an answer to your question, in this particular situation I'd probably find the minimum from within the loops themselves, i.e.
int minValidSpot = int.MaxValue, minX = -1, minY = -1;
for (int x = (int)border + 1; x < worldX + (int)border / 4; x++)
{
    for (int y = (int)border + 1; y < worldY + (int)border / 4; y++)
    {
        if (worldMapXY[x, y] == 0)
        {
            validSpotArr[x, y] = x + y;
        }
        else
        {
            validSpotArr[x, y] = 9999;
        }
        if (minValidSpot > validSpotArr[x, y])
        {
            minValidSpot = validSpotArr[x, y];
            minX = x;
            minY = y;
        }
    }
}
Other than that, if you're looking for some kind of more universal solution, I'd probably just flatten the array; the maths for the index conversion (nD <=> 1D) are pretty simple.
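For the 2D case the conversion looks roughly like this (a sketch; the width of 121 assumes the map really runs from [0,0] to [120,120] as in the question):

const int width = 121;        // worldMapXY[0,0] .. worldMapXY[120,120]

// 2D -> 1D
int index = y * width + x;

// 1D -> 2D
int x2 = index % width;
int y2 = index / width;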
I have read the question Performance of 2-dimensional array vs 1-dimensional array.
But its conclusion is that they could be the same (depending on your own mapping function; C does this automatically)?
I have a matrix which has 1,000 columns and 440,000,000 rows where each element is a double, in C#.
If I am doing some computations in memory, which one is better performance-wise? (Note that I have the memory needed to hold such a monstrous quantity of information.)
If what you're asking is which is better, a 2D array of size 1000x44000 or a 1D array of size 44000000, well what's the difference as far as memory goes? You still have the same number of elements! In the case of performance and understandability, the 2D is probably better. Imagine having to manually find each column or row in a 1D array, when you know exactly where they are in a 2D array.
It depends on how many operations you are performing. In the example below, I'm setting the values of the array 2,500 times. The size of the array is 1000 * 1000 * 3. The 1D array took 40 seconds and the 3D array took 1:39 minutes.
var startTime = DateTime.Now;
Test1D(new byte[1000 * 1000 * 3]);
Console.WriteLine("Total Time taken 1d = " + (DateTime.Now - startTime));

startTime = DateTime.Now;
Test3D(new byte[1000, 1000, 3], 1000, 1000);
Console.WriteLine("Total Time taken 3D = " + (DateTime.Now - startTime));

public static void Test1D(byte[] array)
{
    for (int c = 0; c < 2500; c++)
    {
        for (int i = 0; i < array.Length; i++)
        {
            array[i] = 10;
        }
    }
}

public static void Test3D(byte[,,] array, int w, int h)
{
    for (int c = 0; c < 2500; c++)
    {
        for (int i = 0; i < h; i++)
        {
            for (int j = 0; j < w; j++)
            {
                array[i, j, 0] = 10;
                array[i, j, 1] = 10;
                array[i, j, 2] = 10;
            }
        }
    }
}
The difference between double[1000,44000] and double[44000000] will not be significant.
You're probably better off with the [,] version (letting the compiler(s) figure out the addressing). But the pattern of your calculations is likely to have more impact (locality and cache use).
Also consider the array-of-array variant, double[1000][]. It is a known 'feature' of the Jitter that it cannot eliminate range-checking in the [,] arrays.
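A jagged array has to be allocated row by row, roughly like this (a sketch; the 1000 x 44000 shape just follows the sizes discussed above):

// Allocate the outer array, then one inner array per row.
double[][] matrix = new double[1000][];
for (int i = 0; i < matrix.Length; i++)
{
    matrix[i] = new double[44000];
}

// Element access uses two indexers; looping over matrix[i].Length
// lets the JIT drop the inner bounds check, which it cannot do for [,] arrays.
matrix[5][17] = 3.14;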
I am trying to figure out why "Choice A" performs better than "Choice B". My test shows something like 228 vs 830 or thereabouts; it's roughly a 4x difference. Looking at the IL, the untrained eye doesn't pick out the subtlety between the two calls.
Thank you,
Stephen
const int SIZE = 10000;

void Main()
{
    var sw = Stopwatch.StartNew();
    int[,] A = new int[SIZE, SIZE];
    int total, x, y;

    // Choice A
    total = 0;
    for (x = 0; x < SIZE; x++)
    {
        for (y = 0; y < SIZE; y++)
        {
            total += A[x, y];
        }
    }
    Console.WriteLine(sw.ElapsedMilliseconds);

    sw.Reset();
    sw.Start();

    // Choice B
    total = 0;
    for (y = 0; y < SIZE; y++)
    {
        for (x = 0; x < SIZE; x++)
        {
            total += A[x, y];
        }
    }
    Console.WriteLine(sw.ElapsedMilliseconds);
}

// Define other methods and classes here
OK, I broke this out so that they would run independently of each other and mitigate any caching and/or diagnostics... and B is ALWAYS coming in behind A.
namespace ConsoleApplication1
{
    class ProgramA
    {
        const int SIZE = 10000;

        static void Main(string[] args)
        {
            var sw = Stopwatch.StartNew();
            int[,] A = new int[SIZE, SIZE];
            int total, x, y;

            // Choice A
            total = 0;
            for (x = 0; x < SIZE; x++)
            {
                for (y = 0; y < SIZE; y++)
                {
                    total += A[x, y];
                }
            }
            Console.WriteLine(sw.ElapsedMilliseconds);
            Console.ReadLine();
        }
    }

    class ProgramB
    {
        const int SIZE = 10000;

        static void Main(string[] args)
        {
            var sw = Stopwatch.StartNew();
            int[,] A = new int[SIZE, SIZE];
            int total, x, y;

            // Choice B
            total = 0;
            for (y = 0; y < SIZE; y++)
            {
                for (x = 0; x < SIZE; x++)
                {
                    total += A[x, y];
                }
            }
            Console.WriteLine(sw.ElapsedMilliseconds);
            Console.ReadLine();
        }
    }
}
At a guess, cache effects would be the big one here.
A two-dimensional array is laid out in memory like so:
(0, 0) (0, 1) (0, 2) (0, 3) (1, 0) (1, 1) (1, 2) ...
In option A, you're accessing successive elements in memory, which means that when the CPU fetches a cache line, it gets several successive elements. Option B, by contrast, jumps around through memory. Thus option B requires significantly more memory accesses once the array becomes larger than the cache size.
Ahh I think I remember.
If you think of a 2D array as a table in memory, the first value is the row and the second value is the column.
[0, 0] [0, 1] [0, 2] [0, 3]...
[1, 0] [1, 1] [1, 2] [1, 3]...
When you iterate over it, the first loop is the row and the second loop is the column. It's quicker to iterate row by row, assigning each column within that row.
In the second scenario the values are accessed in this order:
[0, 0] [1, 0] [2, 0] [3, 0]...
[0, 1] [1, 1] [2, 1] [3, 1]...
So this is slower because of the way you're looping: for each column you visit every row, touching only one element in each row before jumping to the next row in memory.
Does that make sense?
Edit: This was one of the things I was looking for:
http://en.wikipedia.org/wiki/Row-major_order
In row-major storage, a multidimensional array in linear memory is accessed such that rows are stored one after the other.
So when iterating over one row at a time, it's not jumping around memory looking for the next row for each value; it has the row, assigns all of its columns, then jumps to the next row in memory.
To expand upon the caching answers:
The values in question are 4 bytes each, and IIRC current memory architecture reads 16-byte lines from memory assuming a properly populated motherboard. (I don't know about DDR3; its three-chip nature suggests the reads are even bigger.) Thus when you read a line of memory you get 4 values.
When you do it the first way, you use all of these values before going back to memory for the next line. Done the second way, you use only one of them, and it then gets flushed from the on-chip cache long before it's called for again.
I need to implement this scenario in C#:
The matrix will be very large, maybe 10000x10000 or larger. I will use this as the distance matrix in a hierarchical clustering algorithm. In every iteration of the algorithm the matrix must be updated (joining 2 rows into 1 and 2 columns into 1). If I use a simple double[,] or double[][] matrix, these operations will be very "expensive".
Please, can anyone suggest C# implementation of this scenario?
Do you have an algorithm at the moment? And what do you mean by expensive, memory-expensive or time-expensive? If memory-expensive: there is not much you can do in C#, but you could consider executing the calculation inside a database using temporary objects. If time-expensive: you can use parallelism to join columns and rows (see the sketch after this paragraph).
But besides that, I think a simple double[,] array is the fastest and most memory-sparing option you can get in C#, because accessing the array values is an O(1) operation and arrays have the least memory and management overhead (compared to lists and dictionaries).
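As a rough sketch of the parallelism idea (my own example, assuming a plain double[,] distance matrix and that "joining" two rows means summing them):

using System.Threading.Tasks;

// Adds row 'source' into row 'target' in parallel across columns,
// so the caller can then treat 'source' as removed.
static void MergeRows(double[,] matrix, int target, int source)
{
    int cols = matrix.GetLength(1);
    Parallel.For(0, cols, j =>
    {
        matrix[target, j] += matrix[source, j];
    });
}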
As mentioned above, a basic double[,] is going to be the most effective way of handling this in C#.
Remember that C# sits on top of managed memory, and as such you have less fine-grained control over low-level (in terms of memory) operations than in something like plain C. Creating your own objects in C# to add functionality will only use more memory in this scenario, and will likely slow the algorithm down as well.
If you have yet to pick an algorithm, CURE seems to be a good bet. The choice of algorithm may affect your data structure choice, but that's not likely.
You will find that the algorithm determines the theoretical limits of 'cost' at any rate. For example, you will read that for CURE you are bound by an O(n² log n) running time and O(n) memory use.
I hope this helps. If you can provide more detail, we might be able to assist further!
N.
It's not possible to 'merge' two rows or two columns in place; you'd have to copy the whole matrix into a new, smaller one, which is indeed unacceptably expensive.
You should probably just add the values in one row to the previous row and then ignore the old values, acting like they were removed.
The array-of-arrays form, double[][], is actually faster than double[,], but it takes more memory.
The whole array-merging thing might not be needed if you change the algorithm a bit, but this might help you:
public static void MergeMatrix()
{
    int size = 100;

    // Initialize the matrix
    double[,] matrix = new double[size, size];
    for (int i = 0; i < size; i++)
        for (int j = 0; j < size; j++)
            matrix[i, j] = ((double)i) + (j / 100.0);

    int rowMergeCount = 0, colMergeCount = 0;

    // Merge last row.
    for (int i = 0; i < size; i++)
        matrix[size - rowMergeCount - 2, i] += matrix[size - rowMergeCount - 1, i];
    rowMergeCount++;

    // Merge last column.
    for (int i = 0; i < size; i++)
        matrix[i, size - colMergeCount - 2] += matrix[i, size - colMergeCount - 1];
    colMergeCount++;

    // Read the newly merged values.
    int newWidth = size - rowMergeCount, newHeight = size - colMergeCount;
    double[,] smaller = new double[newWidth, newHeight];
    for (int i = 0; i < newWidth; i++)
        for (int j = 0; j < newHeight; j++)
            smaller[i, j] = matrix[i, j];

    List<int> rowsMerged = new List<int>(), colsMerged = new List<int>();

    // Merging row at random position.
    rowsMerged.Add(15);
    int target = rowsMerged[rowMergeCount - 1];
    int source = rowsMerged[rowMergeCount - 1] + 1;

    // Still using the original matrix since its values are still useful.
    for (int i = 0; i < size; i++)
        matrix[target, i] += matrix[source, i];
    rowMergeCount++;

    // Merging col at random position.
    colsMerged.Add(37);
    target = colsMerged[colMergeCount - 1];
    source = colsMerged[colMergeCount - 1] + 1;
    for (int i = 0; i < size; i++)
        matrix[i, target] += matrix[i, source];
    colMergeCount++;

    newWidth = size - rowMergeCount;
    newHeight = size - colMergeCount;
    smaller = new double[newWidth, newHeight];
    for (int i = 0, j = 0; i < newWidth && j < size; i++, j++)
    {
        for (int k = 0, m = 0; k < newHeight && m < size; k++, m++)
        {
            smaller[i, k] = matrix[j, m];
            Console.Write(matrix[j, m].ToString("00.00") + " ");
            // Merging columns is more expensive because we have to check for it more often while reading.
            if (colsMerged.Contains(m)) m++;
        }
        if (rowsMerged.Contains(j)) j++;
        Console.WriteLine();
    }
    Console.Read();
}
In the code below I use two 1D helper lists to calculate the index into a big flat array containing the data. Deleting rows/columns is really cheap since I only need to remove that index from the helper lists. But of course the memory in the big array remains, i.e. depending on your usage you effectively have a memory leak.
using System.Collections.Generic;
using System.Linq;

public class Matrix
{
    double[] data;
    List<int> cols;
    List<int> rows;

    private int GetIndex(int x, int y)
    {
        return rows[y] + cols[x];
    }

    public double this[int x, int y]
    {
        get { return data[GetIndex(x, y)]; }
        set { data[GetIndex(x, y)] = value; }
    }

    public void DeleteColumn(int x)
    {
        cols.RemoveAt(x);
    }

    public void DeleteRow(int y)
    {
        rows.RemoveAt(y);
    }

    public Matrix(int width, int height)
    {
        cols = new List<int>(Enumerable.Range(0, width));
        rows = new List<int>(Enumerable.Range(0, height).Select(i => i * width));
        data = new double[width * height];
    }
}
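Usage might look like this (my own example, not part of the original answer):

var m = new Matrix(5, 5);
m[2, 3] = 42.0;

// Removing a column only drops its entry from the index list;
// the columns to its right shift down by one logical index.
m.DeleteColumn(1);

// What used to be column 2 is now addressed as column 1.
Console.WriteLine(m[1, 3]);   // 42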
Hm, to me this looks like a simple binary tree. The left node represents the next value in a row and the right node represents the column.
So it should be easy to iterate rows and columns and combine them.
Thank you for the answers.
At the moment I'm using this solution:
public class NodeMatrix
{
    public NodeMatrix Right { get; set; }
    public NodeMatrix Left { get; set; }
    public NodeMatrix Up { get; set; }
    public NodeMatrix Down { get; set; }

    public int I { get; set; }
    public int J { get; set; }
    public double Data { get; set; }

    public NodeMatrix(int I, int J, double Data)
    {
        this.I = I;
        this.J = J;
        this.Data = Data;
    }
}

List<NodeMatrix> list = new List<NodeMatrix>(10000);
Then I'm building the connections between the nodes. After that the matrix is ready.
This will use more memory, but operations like adding rows and columns and joining rows and columns will, I think, be far faster.
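For example, joining a row into the one above it could be done by walking the Right links and re-wiring the Up/Down links, roughly like this (my own sketch of how the linked structure might be used; it assumes the links form a complete rectangular grid):

// Adds each cell of the given row into the cell directly above it,
// then unlinks the row so traversals skip it.
static void MergeRowIntoAbove(NodeMatrix rowStart)
{
    for (NodeMatrix node = rowStart; node != null; node = node.Right)
    {
        NodeMatrix above = node.Up;
        above.Data += node.Data;        // join the two values

        above.Down = node.Down;         // unlink this node vertically
        if (node.Down != null)
            node.Down.Up = above;
    }
}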