Negative index arrays in .NET - c#

The Array.BinarySearch(Array array, object value) method's documentation mentions that
This method does not support searching arrays that contain negative indexes.
If you look at its source, there's a comment inside the GetMedian(int low, int hi) method that states:
// Note both may be negative, if we are dealing with arrays w/ negative lower bounds.
Contract.Requires(low <= hi);
Correct me if I'm wrong, but C# by itself has no notion of negative-index arrays and there is no such thing as an array's lower bound (like an Array.LowerBound property or something). That would lead me to the conclusion that while C# does not, CLR itself does allow negatively indexed arrays. Am I correct? Is there a .NET language that has such arrays? Why would you even have such arrays in the first place?
Bonus imaginary points if there's a reflection voodoo ritual one can perform to get a negatively indexed array in C# and break the fabric of reality.

Related

How are multi-dimensional arrays stored (.Net)?

I have a few questions about how .Net stores multi-dimensional arrays in memory. I am interested in true multi-dimensional arrays not jagged arrays.
How does the Common Language Runtime (CLR) store multi-dimensional arrays? Is it in row-major, column-major, Iliffe vector or some other format?
Is this format required by the Common Language Infrastructure (CLI) or any other specification?
Is it likely to vary over time or between platforms?
This is in the specification ECMA-335 Partition I (my bold)
8.9.1 Array types
Array elements shall be laid out within the array object in row-major order (i.e., the elements associated with the rightmost array dimension shall be laid out contiguously from lowest to highest index). The actual storage allocated for each array element can include platform-specific padding. (The size of this storage, in bytes, is returned by the sizeof instruction when it is applied to the type of that array‘s elements.)
Section 14.2 also has more explanation.
These two sections specifically refer to arrays as opposed to vectors, the latter of which is the more familiar zero-based one-dimensional array used in most places.
Arrays, on the other hand, can be multi-dimensional and have arbitrary lower bounds; they can also be one-dimensional.
So essentially, it's just one big array under the hood, and uses simple math to calculate offsets.
"Is it likely to vary over time or between platforms?" I'll get my crystal ball out... The spec can always change, and implementations may decide to deviate from it.
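As a sketch of that "one big array plus offset math" claim (an illustrative example, not taken from the answer above), Buffer.BlockCopy can expose the contiguous row-major layout of a true 2D array:

```csharp
using System;

class RowMajorDemo
{
    static void Main()
    {
        // A true 2D array (System.Int32[,]) is stored as one contiguous block.
        int[,] m = new int[2, 3] { { 10, 20, 30 }, { 40, 50, 60 } };

        // Copy the raw element block into a flat array to inspect the layout.
        int[] flat = new int[6];
        Buffer.BlockCopy(m, 0, flat, 0, 6 * sizeof(int));

        // Row-major: element [i, j] lives at flat offset i * width + j.
        int width = m.GetLength(1);
        for (int i = 0; i < m.GetLength(0); i++)
            for (int j = 0; j < width; j++)
                Console.WriteLine(m[i, j] == flat[i * width + j]); // True for every element
    }
}
```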

Does a primitive array expect an integer as index

Should primitive array content be accessed with an int index for best performance?
Here's an example
int[] arr = new int[] { 1, 2, 3, 4, 5 };
The array is only 5 elements long, so the index doesn't have to be an int; it could be a short or a byte, which would save 3 otherwise wasted bytes compared to an int. Of course, that only works if I know the array won't grow past 255 elements.
byte index = 1;
int value = arr[index];
But does this work as good as it sounds?
I'm worried about how this is executed at a lower level: does the index get cast to int, or are there other operations that would actually slow down the whole process instead of optimizing it?
In C and C++, arr[index] is formally equivalent to *(arr + index). Your concerns about casting should be answerable in terms of the simpler question about what the machine will do when it needs to add an integer offset to a pointer.
I think it's safe to say that on most modern machines, when you add a "byte" to a pointer, it's going to use the same instruction as it would if you added a 32-bit integer to a pointer. And indeed it's still going to represent that byte using the machine word size, padded with some unused space. So this isn't going to make using the array faster.
Your optimization might make a difference if you need to store millions of these indices in a table; then using byte instead of int would use a quarter of the memory and take less time to move around. If the array you are indexing is huge and the index needs to be larger than the machine word size, that's a different consideration. But I think it's safe to say that in most normal situations this optimization doesn't really make sense, and size_t is probably the most appropriate generic type for array indices, all things being equal (since it corresponds exactly to the machine word size on the majority of architectures).
does index gets casted to int or other operations which would actually slow down the whole process instead of this optimizing it
No, but
that would save useless 3 byte memory allocation
You don't gain anything by saving 3 bytes.
Only if you are storing a huge array of such indices might the space you save make it a worthwhile investment.
Otherwise stick with a plain int, it's the processor's native word size and thus the fastest.
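To illustrate in C# terms (a hypothetical demo, not from the answer above): an array access with a byte or short index compiles fine, because the compiler implicitly widens the index to int, so no cast appears in source and no speedup is gained:

```csharp
using System;

class ByteIndexDemo
{
    static void Main()
    {
        int[] arr = { 1, 2, 3, 4, 5 };

        byte b = 4;   // implicitly widened to int for the element load
        short s = 3;  // same for short
        int i = 2;

        // All three compile to the same element access; the index ends up
        // as a native-sized integer on the evaluation stack either way.
        Console.WriteLine(arr[b]); // 5
        Console.WriteLine(arr[s]); // 4
        Console.WriteLine(arr[i]); // 3
    }
}
```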

IEnumerable, .ElementAt and unsigned/signed indices in C#?

This is just a language curiosity rather than problem.
The ElementAt() extension method on IEnumerable<T> accepts an integer to get the Nth element of an enumerable collection.
For example
var list = new List<char>() { 'a', 'b', 'c' };
var alias = list.AsEnumerable();
int N = 0;
alias.ElementAt(N); //gets 'a'
All good, however, why doesn't ElementAt() accept unsigned integers (uint) ?
e.g.
uint N = 0;
alias.ElementAt(N); //doesn't compile
I can understand why ElementAt could accept integers to allow negative indices (e.g. Python allows negative indices, where list[-1] refers to the last element), so it makes sense to accept negative indices for languages that do use them, even if C# doesn't.
But I can't quite see the reasoning for disallowing unsigned integers, if anything an unsigned integer is better as it guarantees that the index will not be negative (so only the upper bound of the range needs to be checked).
The best thing I could think of is that perhaps the CLR team decided to standardize on signed integers to allow other languages (e.g. Python) that do have negative indices to use the same code, and to ensure the ranges would be consistent across languages.
Does anyone have a better/authoritative explanation for why .ElementAt() doesn't allow unsigned ints ?
The real reason is that .NET arrays can be non-zero based, even if C# language does not support declaring such arrays. You can still create them using Array.CreateInstance Method (Type, Int32[], Int32[]).
Note the special name of the type of the created object (System.Int32[*]) with an asterisk in it.
List is implemented using array internally, and it would not be practical to use different type for indexing.
Also, Count property usually participates in array index calculation, in which a partial result could be negative. Mixing types in an expression would be cumbersome and error prone.
Having a type which can't represent negative index would not help in error detection.
Using automatic clipping with unchecked operations would not fix logical array index computation errors in an application anyway.
The following example shows negative-based array manipulation in C#:
var negativeBasedArray = Array.CreateInstance(typeof(Int32),
    new []{2},   // array of array sizes for each dimension
    new []{-1}); // array of lower bounds for each dimension
Console.WriteLine(negativeBasedArray.GetType()); // System.Int32[*]
negativeBasedArray.SetValue(123, -1);
negativeBasedArray.SetValue(456, 0);
foreach (var i in negativeBasedArray)
{
    Console.WriteLine(i);
}
// 123
// 456
Console.WriteLine(negativeBasedArray.GetLowerBound(0)); // -1
Console.WriteLine(negativeBasedArray.GetUpperBound(0)); // 0

C# jagged array type declaration in reverse

I just had one of these "What the..." moments. Is the following intended and is there some obscure reasoning behind the "non-natural" declaration of arrays in C#?
int[,][] i; // declares a 2D array where each element is an int[]!
// you have to use it like this:
i = new int[2,3][];
i[1,2] = new int[0];
I would have expected it the other way around: int[,][] declares a 1-dimensional array where each element is a two-dimensional array.
Funny though, the type's Name is reversed:
Console.WriteLine(typeof(int[,][]).Name); // prints "Int32[][,]"
Can someone explain this? Is this intentional? (Using .NET 4.5 under Windows.)
You can find a lengthy discussion in Eric Lippert's blog Arrays of arrays.
What C# actually does
It’s a mess. No matter which option we choose, something ends up not
matching our intuition. Here’s what we actually chose in C#.
First off: option two [It’s a two-dimensional array, each element is a one-dimensional array of ints] is correct. We force you to live with the
weirdness entailed by Consequence One; you do not actually make an
element type into an array of that type by appending an array
specifier. You make it into an array type by prepending the specifier
to the list of existing array specifiers. Crazy but true.
The word 'prepending' partly explains your output of the reversed type-name. A CLR type name is not necessarily the same as the C# declaration.
But the more relevant quote is at the bottom:
That all said, multidimensional ragged arrays are almost certainly a bad code smell.
I had the same "what the..." moment the first time I saw new int[5][]; instead of new int[][5];.
EL's (very nice) blog post is dancing around one thing: there's a perfect way to do it for people with a ComSci degree, but no good way to do it for others. If you just follow the grammar, you declare right-to-left, new right-to-left and index left-to-right:
// 1D array of 2D arrays, no special rules needed:
int[,][] N; N=new[,][5]; N[0]=new int[4,4];
But C#'s target audience isn't people with 4-year CS degrees (who have all seen Reverse Polish and love right-to-left.) The trick, IMHO, in understanding C# jagged arrays is that they decided to make special rules for them, when they didn't technically need to.
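To see the prepending rule and the reversed CLR names side by side (an illustrative demo, not from the answers above), compare the two declaration orders directly:

```csharp
using System;

class JaggedDeclDemo
{
    static void Main()
    {
        // int[,][]: a 2D array whose elements are 1D int arrays.
        int[,][] a = new int[2, 3][];
        a[1, 2] = new int[] { 7 };
        Console.WriteLine(a.GetType().Name); // Int32[][,]  (CLR name reads right-to-left)

        // int[][,]: a 1D array whose elements are 2D int arrays.
        int[][,] b = new int[2][,];
        b[0] = new int[4, 4];
        Console.WriteLine(b.GetType().Name); // Int32[,][]
    }
}
```

In both cases the C# declaration and the CLR type name list the array specifiers in opposite orders, which is exactly the mismatch the blog post describes.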

How can I determine the sizeof for an unmanaged array in C#?

I'm trying to optimize some code where I have a large number of arrays containing structs of different sizes, but based on the same interface. In certain cases the structs are larger and hold more data, other times they are small structs, and other times I would prefer to keep null as a value to save memory.
My first question is: is it a good idea to do something like this? I've previously had an array of my full data struct, but when testing a mixed approach I found I could save a lot of memory. Are there any other downsides?
I've been trying out different things, and it seems to work quite well when making an array of a common interface, but I'm not sure I'm checking the size of the array correctly.
I've simplified the example quite a bit, but here I'm adding different structs to an array. I'm unable to determine the size using the traditional Marshal.SizeOf method. Would it be correct to simply iterate through the collection and sum the size of each value in the collection?
IComparable[] myCollection = new IComparable[1000];
myCollection[0] = null;
myCollection[1] = (int)1;
myCollection[2] = "helloo world";
myCollection[3] = long.MaxValue;
System.Runtime.InteropServices.Marshal.SizeOf(myCollection);
The last line will throw this exception:
Type 'System.IComparable[]' cannot be marshaled as an unmanaged structure; no meaningful size or offset can be computed.
Excuse the long post:
Is this an optimal and usable solution?
How can I determine the size of my array?
I may be wrong but it looks to me like your IComparable[] array is a managed array? If so then you can use this code to get the length
int arrayLength = myCollection.Length;
If you are doing platform interop between C# and C++ then the answer to your question headline "Can I find the length of an unmanaged array" is no, it's not possible. Function signatures with arrays in C++/C tend to follow this pattern:
void doSomeWorkOnArrayUnmanaged(int * myUnmanagedArray, int length)
{
// Do work ...
}
In .NET the array itself is a type which has some basic information, such as its size, its runtime type etc... Therefore we can use this
void DoSomeWorkOnManagedArray(int [] myManagedArray)
{
int length = myManagedArray.Length;
// Do work ...
}
Whenever using platform invoke to interop between C# and C++ you will need to pass the length of the array to the receiving function, as well as pin the array (but that's a different topic).
Does this answer your question? If not, then please can you clarify
Optimality always depends on your requirements. If you really need to store many elements of different classes/structs, your solution is completely viable.
However, I guess your expectations of the data structure might be misleading: array elements are by definition all of the same size. This is even true in your case: your array doesn't store the elements themselves but references (pointers) to them. The elements are allocated somewhere on the VM heap. So your data structure actually goes like this: it is an array of 1000 pointers, each pointer pointing to some data. The size of each particular element may of course vary.
This leads to the next question: The size of your array. What are you intending to do with the size? Do you need to know how many bytes to allocate when you serialize your data to some persistent storage? This depends on the serialization format... Or do you need just a rough estimate on how much memory your structure is consuming? In the latter case you need to consider the array itself and the size of each particular element. The array which you gave in your example consumes approximately 1000 times the size of a reference (should be 4 bytes on a 32 bit machine and 8 bytes on a 64 bit machine). To compute the sizes of each element, you can indeed iterate over the array and sum up the size of the particular elements. Please be aware that this is only an estimate: The virtual machine adds some memory management overhead which is hard to determine exactly...
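A sketch of that estimate, under stated assumptions (EstimateBytes is a hypothetical helper name; it ignores object headers and boxing overhead, treats strings as character payload only, and uses Marshal.SizeOf for the boxed value types, so it is a rough lower bound rather than an exact figure):

```csharp
using System;
using System.Runtime.InteropServices;

class SizeEstimateDemo
{
    // Rough per-element estimate: one reference per slot in the array itself,
    // plus the unmanaged size of each non-null element's payload.
    static long EstimateBytes(IComparable[] items)
    {
        long total = (long)items.Length * IntPtr.Size; // the array of references
        foreach (var item in items)
        {
            if (item == null) continue;
            if (item is string s)
                total += s.Length * sizeof(char);        // character payload only
            else
                total += Marshal.SizeOf(item.GetType()); // works for blittable value types
        }
        return total;
    }

    static void Main()
    {
        IComparable[] myCollection = new IComparable[4];
        myCollection[0] = null;
        myCollection[1] = 1;
        myCollection[2] = "hello world";
        myCollection[3] = long.MaxValue;
        Console.WriteLine(EstimateBytes(myCollection));
    }
}
```

Note that Marshal.SizeOf still throws for element types with no unmanaged representation, which is why the string case is special-cased here; any other non-blittable type would need similar handling.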
