I have a few questions about how .NET stores multi-dimensional arrays in memory. I am interested in true multi-dimensional arrays, not jagged arrays.
How does the Common Language Runtime (CLR) store multi-dimensional arrays? Is it row-major, column-major, an Iliffe vector, or some other format?
Is this format required by the Common Language Infrastructure (CLI) or any other specification?
Is it likely to vary over time or between platforms?
This is in the specification, ECMA-335 Partition I (my bold):
8.9.1 Array types
Array elements shall be laid out within the array object in row-major order (i.e., the elements associated with the rightmost array dimension shall be laid out contiguously from lowest to highest index). The actual storage allocated for each array element can include platform-specific padding. (The size of this storage, in bytes, is returned by the sizeof instruction when it is applied to the type of that array's elements.)
Section 14.2 also has more explanation.
These two sections specifically refer to arrays as opposed to vectors, the latter of which is the more familiar zero-based one-dimensional array used in most places.
Arrays, on the other hand, can be multi-dimensional and can have arbitrary lower bounds; they can also be one-dimensional.
So essentially, it's just one big array under the hood, and uses simple math to calculate offsets.
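For illustration, here's a minimal C# sketch of that offset arithmetic (the variables are mine, not part of any API):
int[,] grid = new int[4, 5];      // 4 rows, 5 columns, stored as one contiguous block
int cols = grid.GetLength(1);
// Element (i, j) sits at flat offset i * cols + j from the start of the data,
// which is exactly the row-major rule the spec describes.
int i = 2, j = 3;
int flatOffset = i * cols + j;    // 2 * 5 + 3 = 13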
"Is it likely to vary over time or between platforms?" I'll get my crystal ball out... The spec can always change, and implementations may decide to derogate from it.
Related
The Array.BinarySearch(Array array, object value) method's documentation mentions that
This method does not support searching arrays that contain negative indexes.
If you look at its source, there's a comment inside the GetMedian(int low, int hi) method that states:
// Note both may be negative, if we are dealing with arrays w/ negative lower bounds.
Contract.Requires(low <= hi);
Correct me if I'm wrong, but C# by itself has no notion of negative-index arrays, and there is no such thing as an array's lower bound (like an Array.LowerBound property or something). That would lead me to the conclusion that while C# does not, the CLR itself does allow negatively indexed arrays. Am I correct? Is there a .NET language that has such arrays? Why would you even have such arrays in the first place?
Bonus imaginary points if there's a reflection voodoo ritual one can perform to get a negatively indexed array in C# and break the fabric of reality.
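As it happens, no reflection voodoo is needed; the CLR exposes this directly, and Array.GetLowerBound does exist. A sketch using the Array.CreateInstance overload that takes lower bounds:
// A one-dimensional int array indexed from -5 to 4. Its runtime type is
// "System.Int32[*]", which is distinct from int[] and cannot be cast to it,
// so it has to be used through the Array API.
Array weird = Array.CreateInstance(typeof(int), new[] { 10 }, new[] { -5 });
weird.SetValue(42, -3);
Console.WriteLine(weird.GetValue(-3));      // 42
Console.WriteLine(weird.GetLowerBound(0));  // -5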
When creating a long[] in C#, due to the 2 GiB size limit for any single object in the CLR, I expect it to be able to hold a maximum of 2 GiB / 8 bytes = 268,435,456 elements. However, the maximum number of elements that the array can actually hold before throwing an exception is 268,435,448. Also, a long[][] can hold multiple long[]s with the above number of elements, thus being substantially larger than 2 GiB. My questions are:
Where did those 64 bytes go that cannot be allocated? What are they used for by the CLR?
Why can a two-dimensional array be larger than 2 GiB?
Where did those 64 bytes go that cannot be allocated? What are they used for by the CLR?
Some of them go to the object header (the sync block index and the method table pointer, two pointer-sized fields) and some to the array length data. Also, a few pointers may be used by the managed heap itself, because an object that big requires a separate chunk of heap.
Why can a two-dimensional array be larger than 2 GiB?
Because it is not a single CLR object. Every inner array is a separate object limited to 2 GiB, and the outer array only holds references to the inner arrays.
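A sketch of that structure (actually running this needs roughly 8 GiB of free memory, so take it as illustrative only):
// The outer array stores only references (8 bytes each on 64-bit);
// each inner array is a separate object, each just under the 2 GiB cap.
long[][] jagged = new long[4][];
for (int i = 0; i < jagged.Length; i++)
    jagged[i] = new long[268435448];  // ~2 GiB per inner array
// Total payload is ~8 GiB, yet no single CLR object exceeds the limit.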
If I declare a List of char arrays, are they allocated in contiguous memory, or does .NET create a linked list instead?
If it's not contiguous, is there a way I can declare a contiguous list of char arrays? The size of the char arrays is known ahead of time and is fixed (they are all the same size).
Yes, but not in the way that you want. List<T> guarantees that its elements are stored contiguously.
Arrays are a reference type, so the references are stored contiguously, as List<T> guarantees. However, the arrays themselves are allocated separately, and where they are stored has nothing to do with the list. It is only concerned with its elements, the references.
If you require that then you should simply use one large array and maintain boundary data.
EDIT: Per your comment:
The inner arrays are always 9 chars.
So, in this case, cache locality may be an issue because the sub-arrays are so small. You'll be jumping around a lot in memory getting from one array to the next, and I'll take you at your word about the performance sensitivity of this code.
Just use a multi-dimensional array if you can. This of course assumes you know the size, or that you can impose a maximum size on it.
Is it possible to trade some memory to reduce complexity/time and just set a max size for N? Using a multi-dimensional array (or one large flat array) is the only way you can guarantee contiguous allocation.
EDIT 2:
Trying to keep the answer in sync with the comments. You say that the max size of the first dimension is 9! (362,880) and, as before, the size of the second dimension is 9.
Allocate it all up front. You're trading some memory for time: 9! * 9 elements * 2 bytes per char / 1024 / 1024 ≈ 6.2 MB.
As you say, the List may grow to that size anyway, so worst case you waste a few MB of memory. I don't think it's going to be an issue unless you plan on running this code in a toaster oven. Just allocate the buffer as one array up front and you're good.
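A minimal sketch of that up-front buffer (Rows and Cols are names I made up; the 9! and 9 come from the comments above):
const int Rows = 362880;                // 9!
const int Cols = 9;                     // fixed length of each inner "array"
char[] buffer = new char[Rows * Cols];  // one contiguous ~6.2 MB allocation
// Logical element (i, j) of the 2-D structure:
int i = 5, j = 3;
char c = buffer[i * Cols + j];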
List functions as a dynamic array, not a linked list, but this is beside the point. No memory will be allocated for the char[]s until they themselves are instantiated. The List is merely responsible for holding references to char[]s, of which it will contain none when first created.
If it's not contiguous, is there a way I can declare a contiguous list of char arrays? The size of the char arrays is known ahead of time and is fixed (they are all the same size).
No, but you could instantiate a 2-dimensional array of chars, if you also know how many char arrays there would have been:
char[,] array = new char[x, y];
I've been trying to decide the best way to store information in VBOs for OpenGL. I've been reading about how matrices are stored as single-dimensional arrays because of the performance penalties of using multi-dimensional arrays.
I was wondering, does this extend to storing information in VBOs? Does it make more sense to store vertex information within VBOs in single- or multi-dimensional arrays?
Before I get a bunch of answers saying it depends on what I'm storing: I'm talking specifically about things that would traditionally be considered for multi-dimensional arrays (maps, grids, etc.).
What type of performance hits would I be looking at by using multi-dimensional arrays, if any?
The question is invalid, as it makes no sense. Buffer objects are blocks of memory stored and managed by OpenGL. It's not a C# array or a C# multi-dimensional array. It's a block of memory that you transfer data into.
The vertex data you put into a buffer object must conform with what glVertexAttribPointer allows. How you achieve that in C# is up to you and whatever API you use.
are stored as single-dimensional arrays because of the performance penalties of using multi-dimensional arrays.
I think your confusion stems from differences in terminology regarding multidimensional arrays and how they can be represented.
In C an array is accessed through pointer dereferencing:
int arr[10];
arr[5] <--> *(arr+5)
In fact, the indexing operator o[i] is fully equivalent to *(o+i), and you can even write i[o], because it yields the same expression. So the index operator does pointer arithmetic and a dereference. What does this mean for C multi-dimensional arrays?
int **marr;                  /* 10 row pointers, each pointing to a separately allocated row of 10 ints */
int *row = marr[5];          /* <--> *(marr + 5) */
row[5] <--> *(row + 5) <--> marr[5][5]
And you can expand this for as many dimensions as you like. So, in C, multi-dimensional arrays built this way (pointer-to-pointer, an Iliffe vector) go through multiple pointer dereferences, which also means that the data need not be contiguous in memory and that each row may in fact have a different size. And that's why they are inefficient. C does it this way to allow for dynamic allocation of the row pointers and the rows.
However, the other way to store multi-dimensional arrays is a flat storage model: you just concatenate all the elements into one single string of values, and by knowing the dimensions you know where to cut this string to rearrange it.
int fmarr[10*20*30];
int at1_2_3 = fmarr[1*20*30 + 2*30 + 3];
As you can see, all sizes are static and the data is contiguous. There are no extra pointer dereferences, which makes working with the data much simpler and, more importantly, allows for vectorization of the operations. This makes things much more performant.
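The same flat-storage idea applies when you build vertex data in C# before uploading it to a buffer object (a sketch; the grid size and the 3-float layout are assumptions for illustration):
// A width x height grid of 3-component positions, flattened row-major into
// one contiguous float[] that can be handed to whatever VBO upload API you use.
int width = 64, height = 64;
float[] vertices = new float[width * height * 3];
for (int y = 0; y < height; y++)
    for (int x = 0; x < width; x++)
    {
        int i = (y * width + x) * 3;   // flat offset of vertex (x, y)
        vertices[i + 0] = x;           // position X
        vertices[i + 1] = 0f;          // position Y (height data would go here)
        vertices[i + 2] = y;           // position Z
    }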
I'm trying to optimize some code where I have a large number of arrays containing structs of different sizes, all based on the same interface. In certain cases the structs are larger and hold more data, other times they are small structs, and other times I would prefer to keep null as a value to save memory.
My first question is: is it a good idea to do something like this? I previously had an array of my full data struct, but when testing a mixed approach it looked like I could save a lot of memory. Are there any other downsides?
I've been trying out different things, and it seems to work quite well when making an array of a common interface, but I'm not sure I'm checking the size of the array correctly.
I've simplified the example quite a bit. Here I'm adding different structs to an array, but I'm unable to determine its size using the traditional Marshal.SizeOf method. Would it be correct to simply iterate through the collection and sum the size of each value in it?
IComparable[] myCollection = new IComparable[1000];
myCollection[0] = null;
myCollection[1] = (int)1;              // the int is boxed on assignment
myCollection[2] = "hello world";
myCollection[3] = long.MaxValue;       // boxed as well
System.Runtime.InteropServices.Marshal.SizeOf(myCollection);
The last line will throw this exception:
Type 'System.IComparable[]' cannot be marshaled as an unmanaged structure; no meaningful size or offset can be computed.
Excuse the long post:
Is this an optimal and usable solution?
How can I determine the size of my array?
I may be wrong, but it looks to me like your IComparable[] array is a managed array. If so, then you can use this code to get the length:
int arrayLength = myCollection.Length;
If you are doing platform interop between C# and C++, then the answer to your title question, "Can I find the length of an unmanaged array", is no, it's not possible. Function signatures with arrays in C/C++ tend to follow this pattern:
void doSomeWorkOnArrayUnmanaged(int * myUnmanagedArray, int length)
{
// Do work ...
}
In .NET the array itself is a type which carries some basic information, such as its size and its runtime type. Therefore we can use this:
void DoSomeWorkOnManagedArray(int [] myManagedArray)
{
int length = myManagedArray.Length;
// Do work ...
}
Whenever using platform invoke to interop between C# and C++ you will need to pass the length of the array to the receiving function, as well as pin the array (but that's a different topic).
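For completeness, a hypothetical P/Invoke declaration matching the C function above (the library name "NativeLib" is made up; for a blittable int[] the default marshaler pins the array for the duration of the call):
using System.Runtime.InteropServices;

static class Native
{
    // Hypothetical import; point this at your actual native binary.
    [DllImport("NativeLib")]
    public static extern void doSomeWorkOnArrayUnmanaged(int[] myUnmanagedArray, int length);
}

// The length must be passed explicitly; the native side cannot recover it.
int[] data = { 1, 2, 3 };
Native.doSomeWorkOnArrayUnmanaged(data, data.Length);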
Does this answer your question? If not, please clarify.
Optimality always depends on your requirements. If you really need to store many elements of different classes/structs, your solution is completely viable.
However, I guess your expectations of the data structure might be misleading you: array elements are by definition all of the same size. This is even true in your case: your array doesn't store the elements themselves but references (pointers) to them, and the elements are allocated somewhere on the managed heap. So your data structure actually goes like this: it is an array of 1000 references, each pointing to some data. The size of each particular element may of course vary.
This leads to the next question: the size of your array. What are you intending to do with the size? Do you need to know how many bytes to allocate when you serialize your data to some persistent storage? That depends on the serialization format. Or do you just need a rough estimate of how much memory your structure is consuming? In the latter case you need to consider the array itself plus the size of each particular element. The array in your example consumes approximately 1000 times the size of a reference (4 bytes on a 32-bit machine, 8 bytes on a 64-bit machine). To compute the sizes of the elements, you can indeed iterate over the array and sum up the size of each one. Be aware that this is only an estimate: the runtime adds memory-management overhead that is hard to determine exactly.
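A rough sketch of such an estimate (the per-type numbers are assumptions and ignore object headers and padding entirely):
using System;
using System.Runtime.InteropServices;

static long EstimateSizeInBytes(IComparable[] items)
{
    long total = (long)items.Length * IntPtr.Size;   // the reference slots themselves
    foreach (IComparable item in items)
    {
        if (item == null) continue;                  // null costs nothing beyond its slot
        if (item is string s)
            total += (long)s.Length * sizeof(char);  // character payload only
        else
            total += Marshal.SizeOf(item);           // boxed primitives: 4 for int, 8 for long
    }
    return total;
}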