I'm looking for resources that can help me determine which approach to use in creating a 2d data structure with C#.
Do you mean a multidimensional array? It's simple:
<type>[,] <name> = new <type>[<first dimension>, <second dimension>];
Here is MSDN reference:
Multidimensional Arrays (C#)
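For example, with concrete (illustrative) types and sizes:
int[,] grid = new int[3, 4];   // 3 rows, 4 columns
grid[0, 0] = 42;               // row 0, column 0
int rows = grid.GetLength(0);  // 3
int cols = grid.GetLength(1);  // 4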
@Traumapony: I'd actually say the real performance gain comes from using one giant flat array, but that may just be my C++ image-processing roots showing.
It depends on what you need the 2D structure to do. If it's storing something where every row of the second dimension is the same size, then you want something like a large 1D array, because the seek times are faster and the data management is easier. Like:
for (int y = 0; y < ysize; y++)
{
    for (int x = 0; x < xsize; x++)
    {
        theArray[y * xsize + x] = // some stuff!
    }
}
And then you can do operations that ignore neighboring pixels in a single pass:
int totalsize = xsize * ysize;
for (int x = 0; x < totalsize; x++)
{
    theArray[x] = // some stuff!
}
Except that in C# you probably want to call a C++ library for this kind of processing; C++ tends to be faster here, especially if you use the Intel compiler.
If the rows of your second dimension have different sizes, then nothing I said applies and you should look at some of the other solutions. You really need to know your functional requirements in order to be able to answer the question.
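To make the flat-array idea concrete, here is a minimal sketch of hiding the y*xsize + x arithmetic behind an indexer (the Grid2D name and members are illustrative, not from any library):

// Minimal sketch: a row-major flat array behind a 2D indexer.
public sealed class Grid2D<T>
{
    private readonly T[] _data;
    public int Width { get; }
    public int Height { get; }

    public Grid2D(int width, int height)
    {
        Width = width;
        Height = height;
        _data = new T[width * height];
    }

    // Row-major mapping: (x, y) -> y * Width + x.
    public T this[int x, int y]
    {
        get { return _data[y * Width + x]; }
        set { _data[y * Width + x] = value; }
    }

    // Per-element operations can ignore the 2D structure entirely
    // and run as a single pass over the flat array.
    public void Fill(T value)
    {
        for (int i = 0; i < _data.Length; i++)
            _data[i] = value;
    }
}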
Depending on the type of the data, you could look at using a plain jagged array:
int[][] intGrid;
If you need to get tricky, you could always go the generics approach:
Dictionary<KeyValuePair<int,int>,string>;
That allows you to put complex types in the value part of the dictionary, although it makes indexing into the elements more difficult.
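A short sketch of what that looks like in use (the grid name is illustrative):

using System.Collections.Generic;

// Sparse 2D storage keyed by an (x, y) pair.
var grid = new Dictionary<KeyValuePair<int, int>, string>();

// Every read and write has to construct the composite key,
// which is what makes indexing more awkward than grid[x, y].
grid[new KeyValuePair<int, int>(3, 5)] = "hello";
string s = grid[new KeyValuePair<int, int>(3, 5)];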
If you're looking to store spatial 2d point data, System.Drawing has a lot of support for points in 2d space.
For performance, it's best not to use multi-dimensional arrays ([,]); instead, use jagged arrays. e.g.:
<type>[][] <name> = new <type>[<first dimension>][];
for (int i = 0; i < <first dimension>; i++)
{
<name>[i] = new <type>[<second dimension>];
}
To access:
<type> item = <name>[<first index>][<second index>];
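Filled in with concrete (illustrative) types and sizes:

int[][] grid = new int[3][];   // 3 rows
for (int i = 0; i < 3; i++)
{
    grid[i] = new int[4];      // each row gets 4 columns
}
int item = grid[1][2];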
Data Structures in C#
Seriously, I'm not trying to be critical of the question, but I got tons of useful results right at the top of my search when I Googled for:
data structures c#
If you have specific questions about specific data structures, we might have more specific answers...
I think the title is quite clear, so I'll just write some personal opinions here.
Consider a matrix of numbers; the equivalent representations in C# code are double[,] and double[][] respectively. When using a multi-dimensional array (2D in this specific situation), one doesn't have to check for null double[] references or verify that all rows are the same size, which allows a better focus on the core problem. It also describes the matrix more accurately, from my point of view, since in most cases a matrix should be treated as a single entity rather than a list of arrays.
But using a multi-dimensional array may result in more lines of code. If one wants to apply a math operation to it, say transposition, one has to use nested loops like
var row = mat.GetLength(0);
var col = mat.GetLength(1);
var newmat = new double[col, row];
for (var i = 0; i < row; i++)
{
for (var j = 0; j < col; j++)
{
newmat[j, i] = mat[i, j];
}
}
With a jagged array, one can simply write
var newmat = Enumerable.Range(0, mat[0].Length)
    .Select(i => mat.Select(r => r[i]).ToArray())
    .ToArray();
I'm not sure which one is better. Usually I only write my own subroutine when .NET provides no solution, so I prefer the latter. But multi-dimensional arrays do have advantages that I really like. Could anyone teach me how to choose between them?
It's not the number of lines of code that is the problem, but the efficiency of the code itself.
If you had a sparse matrix (a matrix of almost all zeros), you would want to use a jagged structure, because iterating through the full two-dimensional matrix searching for non-zero elements would waste time.
However, if you had a matrix and wanted to find its determinant, it would be simpler to use the method of cofactors on it. If you're not familiar with the method, it involves breaking the matrix up into smaller matrices, eventually down to the 2x2 case, where you can simply compute a*d - b*c. This is much more natural with a rectangular matrix than with jagged ones.
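For illustration, a minimal sketch of that cofactor expansion over a rectangular double[,] (the Determinant name is mine; note this is O(n!) and only sensible for small matrices):

// Determinant by cofactor (Laplace) expansion along the first row.
static double Determinant(double[,] m)
{
    int n = m.GetLength(0);  // assumes a square matrix
    if (n == 1) return m[0, 0];
    if (n == 2) return m[0, 0] * m[1, 1] - m[0, 1] * m[1, 0]; // a*d - b*c

    double det = 0;
    for (int col = 0; col < n; col++)
    {
        // Build the minor: drop row 0 and column col.
        var minor = new double[n - 1, n - 1];
        for (int i = 1; i < n; i++)
        {
            for (int j = 0, mj = 0; j < n; j++)
            {
                if (j == col) continue;
                minor[i - 1, mj++] = m[i, j];
            }
        }
        double sign = (col % 2 == 0) ? 1.0 : -1.0; // alternating signs: +, -, +, ...
        det += sign * m[0, col] * Determinant(minor);
    }
    return det;
}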
I have a [3, 15000] matrix. I need to compute the covariance matrix of the original matrix and then find its eigenvalues.
This is a part of my code:
double[,] covarianceMatrix = new double[numberOfObjects, numberOfObjects];
for (int n = 0; n < numberOfObjects; n++)
{
    for (int m = 0; m < numberOfObjects; m++)
    {
        double sum = 0;
        for (int k = 0; k < TimeAndRepeats[i,1]; k++)
        {
            sum += originalMatrix[k,n] * originalMatrix[k,m];
        }
        covarianceMatrix[n,m] = sum / TimeAndRepeats[i,1];
    }
}
alglib.smatrixevd(covarianceMatrix,numberOfObjects,1,true,out eigenValues, out eigenVectors);
numberOfObjects here is about 15000.
When I do my computations for a smaller number of objects everything is OK, but for the full data set I get an exception.
Is it possible to solve this problem?
I am using macOS, x64
My environment is MonoDevelop
double[,] covarianceMatrix = new double[numberOfObjects,numberOfObjects];
You said that your matrix is [3, 15000] and that numberOfObjects is 15000. With this line of code, you're creating a [15000, 15000] matrix of doubles.
15000 * 15000 = 225,000,000 doubles at 8 bytes each: 1,800,000,000 bytes, or 1.8 GB.
That's probably why you are running out of memory.
Edit:
According to this question and this question, the size of a single object in C# cannot be larger than 2 GB. The 1.8 GB does not count any additional overhead required to reference the items in the array, so the total might actually exceed 2 GB when everything is accounted for (I can't say without debugging info; someone with more C# experience may have to set me straight on this). You might consider this workaround if you're trying to work with really large arrays, since statically allocated arrays can get messy.
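If the process is 64-bit and runs on .NET Framework 4.5 or later, one documented way past the 2 GB per-object limit is the gcAllowVeryLargeObjects setting in app.config (whether Mono, which the asker is using, honors this is a separate question):

<!-- app.config: allows arrays larger than 2 GB on 64-bit .NET 4.5+. -->
<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>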
When you create covarianceMatrix, you are creating an object of 15000 * 15000 = 225,000,000 doubles,
so you need 1,800,000,000 bytes of memory. That is why you get the OutOfMemoryException.
The exception name tells you exactly what the problem is. You could use floats instead of doubles to halve the amount of memory needed. Another option would be to create a covariance-matrix class that stores its data in a disk file, though you'd need to implement proper mechanisms to operate on it, and performance would be limited as well.
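For example, the float suggestion is a one-line change, halving the figure computed above:

// float is 4 bytes vs. double's 8, so 15000 x 15000 floats is ~900 MB,
// which fits under the 2 GB per-object limit.
float[,] covarianceMatrix = new float[numberOfObjects, numberOfObjects];

Note, though, that alglib.smatrixevd as called above takes a double[,], so you would need to convert back (or use a different routine) for the eigenvalue step.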
I have a very large two dimensional array and I need to compute vector operations on this array. NTerms and NDocs are both very large integers.
var myMat = new double[NTerms, NDocs];
I need to extract vector columns from this matrix. Currently, I'm using for loops.
int col = 100;
for (int i = 0; i < NTerms; i++)
{
    myVec[i] = myMat[i, col];
}
This operation is very slow. In Matlab I can extract the vector without any iteration, like so:
myVec = myMat(:, col);
Is there any way to do this in C#?
There are no constructs in C# that let you work with arrays the way Matlab does. With the code you already have, you can speed up vector creation using the Task Parallel Library, introduced in .NET Framework 4.0.
Parallel.For(0, NTerms, i => myVec[i] = myMat[i, col]);
If your CPU has more than one core then you will get some improvement in performance otherwise there will be no effect.
For more examples of how the Task Parallel Library can be used with matrices and arrays, you can refer to the MSDN article Matrix Decomposition.
But I doubt that C# is a good choice when it comes to some serious math calculations.
Some possible problems:
It could be the way that elements of multi-dimensional arrays are accessed in C#. See this earlier article.
Another problem may be that you are accessing non-contiguous memory - so not much help from cache, and maybe you're even having to fetch from virtual memory (disk) if the array is very large.
What happens to your speed when you access a whole row at a time, instead of a column? If that's significantly faster, you can be 90% sure it's a contiguous-memory issue...
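As a quick way to test that, here is a sketch comparing contiguous row access against strided column access for a rectangular double[,] (the method names are illustrative):

// Rows of a [,] array are contiguous in row-major memory; columns are not.
static double[] GetRow(double[,] mat, int row)
{
    int cols = mat.GetLength(1);
    var result = new double[cols];
    // Buffer.BlockCopy works on primitive-typed arrays; offsets are in BYTES.
    Buffer.BlockCopy(mat, row * cols * sizeof(double), result, 0, cols * sizeof(double));
    return result;
}

static double[] GetColumn(double[,] mat, int col)
{
    int rows = mat.GetLength(0);
    var result = new double[rows];
    for (int i = 0; i < rows; i++)
        result[i] = mat[i, col]; // strided: each step jumps a whole row
    return result;
}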
I have a 2D string array in C# and I need to shift that array to the left along one dimension.
How can I do that efficiently?
I don't want to use nested for loops; I want an algorithm that runs in O(n), not O(n²).
for (int i = 50; i < 300; i++)
{
    for (int j = 0; j < 300; j++)
    {
        numbers[i - 50, j] = numbers[i, j];
    }
}
If you want to shift large amounts of data around quickly, use Array.Copy rather than a loop that copies individual elements.
If you swap to a byte array and use Array.Copy or Buffer.BlockCopy you will probably improve the performance a bit more (but if you have to convert to/from character arrays you may lose everything you've gained).
(Edit, now that you've posted example code: if you use references to the array rows, you may be able to shift the references rather than moving the data itself. And you can still shift the references using Array.Copy.)
But if you change your approach so you don't need to shift the data, you'll gain considerably better performance - not doing the work at all if you can avoid it is always faster! Chances are you can wrap the data in an accessor layer that keeps track of how much the data has been shifted and modifies your indexes to return the data you are after. (This will slightly slow down access to the data, but saves you shifting the data, so may result in a net win - depending on how much you access relative to how much you shift)
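For the question's example, the shift is one contiguous block in row-major order, so it collapses to a single Array.Copy call (a sketch assuming the 300x300 dimensions from the question):

// Rows 50..299 move up to rows 0..249 in one call. Array.Copy treats the
// rectangular array as one long row-major sequence, and it handles the
// overlapping source/destination ranges correctly.
const int rows = 300, cols = 300;
string[,] numbers = new string[rows, cols];

Array.Copy(numbers, 50 * cols,   // source: start of row 50
           numbers, 0,           // destination: start of row 0
           (rows - 50) * cols);  // elements to move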
The most efficient way would be to not shift the data at all, but instead change how you access the array. For example, keep an offset that tells you where the logical first element of the shifted dimension is, as sketched below.
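A minimal sketch of that accessor idea (ShiftedGrid is an illustrative name):

// A view that applies a row offset on every access,
// so "shifting" becomes a constant-time counter update.
public sealed class ShiftedGrid
{
    private readonly string[,] _data;
    private int _rowOffset;

    public ShiftedGrid(string[,] data)
    {
        _data = data;
    }

    // What used to be an O(n^2) data move is now O(1).
    public void ShiftRows(int n)
    {
        _rowOffset += n;
    }

    public string this[int i, int j]
    {
        get { return _data[i + _rowOffset, j]; }
    }
}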
I have two multi-dimensional arrays declared like this:
bool?[,] biggie = new bool?[500, 500];
bool?[,] small = new bool?[100, 100];
I want to copy part of biggie into small. Let's say I want indices 100 to 199 horizontally and 100 to 199 vertically.
I have written a simple for statement that goes like this:
for (int x = 0; x < 100; x++)
{
    for (int y = 0; y < 100; y++)
    {
        small[x, y] = biggie[x + 100, y + 100];
    }
}
I do this a lot in my code, and it has proven to be a major performance bottleneck.
Array.Copy only copies single-dimensional spans; with multi-dimensional arrays it treats the whole matrix as a single flat array, with each row laid out after the previous one, which won't let me cut a square out of the middle of my array.
Is there a more efficient way to do this?
P.S.: I do consider refactoring my code so as not to do this at all, and instead work directly with the bigger array. Copying matrices just can't be painless; the point is that I have already stumbled upon this before, looked for an answer, and got none.
In my experience, there are two ways to do this efficiently:
Use unsafe code and work directly with pointers.
Convert the 2D array to a 1D array and do the necessary arithmetic when you need to access it as a 2D array.
The first approach is ugly, and it relies on potentially invalid assumptions, since 2D arrays are not guaranteed to be laid out contiguously in memory. The advantage of the first approach is that you don't have to change code that already uses 2D arrays. The second approach is as efficient as the first, doesn't make invalid assumptions, but does require updating your code.
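A sketch of the second approach applied to the question above (the names mirror the question; the 1D arrays are indexed as row * width + column). Each row of the sub-square is then a contiguous run, so Array.Copy moves it in one call per row instead of 100 individual assignments:

const int bigSize = 500, smallSize = 100, offset = 100;

// 1D backing arrays standing in for bool?[500,500] and bool?[100,100].
bool?[] biggie = new bool?[bigSize * bigSize];
bool?[] small = new bool?[smallSize * smallSize];

// Copy the 100x100 block starting at (100, 100): one Array.Copy per row.
// (Buffer.BlockCopy is not an option here, because bool? is not a primitive type.)
for (int x = 0; x < smallSize; x++)
{
    Array.Copy(biggie,
               (x + offset) * bigSize + offset,  // start of the block's row x
               small,
               x * smallSize,                    // start of row x in small
               smallSize);                       // one contiguous row
}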