Why is "long" being allowed as array length in C#? - c#

I wanted to try to allocate a 4-billion-byte array, and this is my C# code:
long size = 4 * 1000;
size *= 1000;
size *= 1000;
byte[] array = new byte[size];
This code fails with System.OverflowException on the line containing new. Okay, it turns out Length returns int, so the array length is also limited to what an int can store.
Then why is there no compile-time error and long is allowed to be used as the number of array elements at allocation?

Because the specification says so in section 7.6.10.4:
Each expression in the expression list must be of type int, uint, long, or ulong, or implicitly convertible to one or more of these types.
This is most likely intended to allow the creation of arrays larger than 2 GiB, even though they are not supported yet (but will be, without a language change, once the CLR adds support). Mono does support this, however, and .NET 4.5 apparently will allow larger arrays too.
Regarding the array length being an int, by the way: there is also LongLength, which returns a long. This was there in .NET 1.1 and was probably a future-proofing change.
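For example (a minimal sketch; the array size here is arbitrary), both properties exist on every array:
byte[] data = new byte[1000];
int len = data.Length;          // 1000, as a 32-bit int
long longLen = data.LongLength; // 1000, as a 64-bit long
Console.WriteLine(len + " / " + longLen);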

Why is long allowed as an array length?
Answer: long in .NET means Int64, and according to the specification an array dimension length may be of type Int64.
Second question: why is an OverflowException thrown?
Because no single object can be allocated more than 2 GB of memory.

It is a limitation of the CLR: no single object can exceed 2 GB, including arrays:
Large array C# OutOfMemoryException
This is regardless of 32-bit or 64-bit OSes. That said, it doesn't stop you from using more than that amount in total, just not in one object.
It is a runtime error because, if you keep the long (or other initializing value) within range, it will work.
You can use any integral type as the array length: sbyte, char, short, int, and long all compile; the unsigned variants work too.
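For instance (a small sketch), each of these lengths is implicitly convertible to int, uint, long, or ulong, so they all compile:
sbyte sb = 10;
char c = 'd';      // 'd' is code point 100
short s = 1000;
int i = 10000;
long l = 100000;
var a1 = new byte[sb];
var a2 = new byte[c];
var a3 = new byte[s];
var a4 = new byte[i];
var a5 = new byte[l]; // fine at run time as long as the value fits in an int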

There is a solution in .NET 4.5-4.6 that allows arrays larger than 2 GB:
<runtime>
  <gcAllowVeryLargeObjects enabled="true" />
</runtime>
See documentation.
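As a sketch, the element goes inside the usual <configuration>/<runtime> structure of the App.config (the 300-million-element long[] below is just an arbitrary example that needs more than 2 GB):
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>

// Roughly 2.4 GB of data; this only succeeds in a 64-bit process
// with gcAllowVeryLargeObjects enabled.
long[] big = new long[300 * 1000 * 1000];
Note that even with this setting the number of elements per dimension is still capped (at 2,146,435,071 for most element types), so it raises the byte limit, not the element-count limit.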

long is an integral type, so it can be used to define the array; the exception comes from building too large an array, not specifically from using a long.
For example, this works just fine:
long size = 20;
byte[] array = new byte[size];

Related

Does a primitive array expect an integer as an index?

Should primitive array content be accessed by int for best performance?
Here's an example:
int[] arr = new int[]{1, 2, 3, 4, 5};
The array is only 5 elements long, so the index doesn't have to be an int; a short or byte would do, and using a byte instead of an int would save 3 bytes of otherwise useless memory. Of course, that only works if I know the array won't grow beyond 255 elements.
byte index = 1;
int value = arr[index];
But does this work as well as it sounds?
I'm worried about how this is executed at a lower level: does the index get cast to int, or are there other operations involved that would actually slow the whole thing down instead of optimizing it?
In C and C++, arr[index] is formally equivalent to *(arr + index). Your concerns about casting should be answerable in terms of the simpler question of what the machine will do when it needs to add an integer offset to a pointer.
I think it's safe to say that on most modern machines, when you add a "byte" to a pointer, it's going to use the same instruction as it would if you added a 32-bit integer to a pointer. And indeed it's still going to represent that byte using the machine word size, padded with some unused space. So this isn't going to make using the array faster.
Your optimization might make a difference if you need to store millions of these indices in a table; then using byte instead of int would use a quarter of the memory and take less time to move that memory around. If the array you are indexing is huge and the index needs to be larger than the machine word size, that's a different consideration. But I think it's safe to say that in most normal situations this optimization doesn't really make sense, and size_t is probably the most appropriate generic type for array indices, all things being equal (since it corresponds exactly to the machine word size on the majority of architectures).
does the index get cast to int, or are there other operations involved that would actually slow the whole thing down instead of optimizing it?
No, but
that would save 3 bytes of otherwise useless memory
You don't gain anything by saving 3 bytes.
Only if you are storing a huge array of those indices would the amount of space you save make it a worthwhile investment.
Otherwise, stick with a plain int; it's the processor's native word size and thus the fastest.
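To illustrate in C# terms (a small sketch), a byte index compiles fine because it is simply widened to the int that the indexing operation expects, so there is no extra work to save by storing the index as a byte:
int[] arr = new int[] { 1, 2, 3, 4, 5 };
byte byteIndex = 1;
int intIndex = 1;

// Both compile to the same element load; the byte is just widened
// on the evaluation stack, which costs nothing extra.
int a = arr[byteIndex];
int b = arr[intIndex];
Console.WriteLine(a == b); // True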

Is the 32-bit .NET max byte array size < 2GB?

I have been looking at some SO questions related to the max size of an array of bytes (here and here) and have been playing with some arrays and getting some results I don't quite understand. My code is as follows:
byte[] myByteArr;
byte[] myByteArr2 = new byte[671084476];
for (int i = 1; i < 2; i++)
{
    myByteArr = new byte[671084476];
}
This compiles, and upon execution it throws a System.OutOfMemoryException at the initialization of myByteArr. If I change the 2 in the for loop to a 1, or if I comment out one of the initializations (either myByteArr2 or myByteArr), it runs fine.
Also, byte[] myByteArr = new byte[Int32.MaxValue - 56]; throws the same exception.
Why does this happen when compiled for 32-bit? Aren't they within the 2GB limit?
The limits of a 32-bit program are not per-object; it's a process limit. You cannot have more than 2 GB of address space in use in total.
Not only that, but in practice, it's often difficult to get anywhere near 2GB due to address space fragmentation. .NET's managed (ie. movable) memory helps somewhat, but doesn't eliminate this problem.
Even if you are using a 64-bit process, you may have a similar problem because in C# arrays are indexed by an int, which is defined as a 32-bit signed integer, and thus can't address past the 2GB boundary in an array of bytes. If you read the answer to the second link carefully, you'll also see that there is a 2GB per object limit. Your array of bytes presumably has some overhead, so it can't get to the full 2GB just for the raw data.
See @Habib's link in the comments for details.
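If you want a feel for the practical limit on a given machine, a rough sketch like the following (the 64 MB back-off step and the method name are arbitrary choices for illustration) keeps shrinking the request until an allocation succeeds:
static long ProbeLargestByteArray()
{
    long size = int.MaxValue;              // start near the theoretical cap
    while (size > 0)
    {
        try
        {
            var probe = new byte[size];
            GC.KeepAlive(probe);
            return size;                   // this much contiguous space was available
        }
        catch (OutOfMemoryException)
        {
            size -= 64 * 1024 * 1024;      // back off by 64 MB and retry
        }
    }
    return 0;
}
In a 32-bit process this typically reports well under 2 GB because of address space fragmentation.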

BitConverter vs Casting Differences

Given that:
int onlyLastByteContainsValue = 35;
Which of the following is faster and why?
byte valueInByte = BitConverter.GetBytes(onlyLastByteContainsValue)[3];
Or
byte valueInByte = (byte)onlyLastByteContainsValue;
Follow-up question: Are there other differences between the two above?
Naturally, the cast will be faster; from my profiling, by up to 5x with optimizations off (and even more so with optimizations on).
Of course there are different things going on:
Your BitConverter statement allocates an array with a size of sizeof(int), fills it with all the bytes of the int value, and then indexes the array to retrieve only one byte. It goes without saying that this is wasteful of resources.
The cast simply takes the least significant byte of the value; in a checked context it would first verify that the value is within the range of byte and throw an OverflowException if it is not.
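A rough way to check the difference yourself (a sketch only; the iteration count and output format are arbitrary) is to time both forms with Stopwatch:
const int iterations = 10 * 1000 * 1000;
int value = 35;
byte result = 0;

// BitConverter allocates a 4-byte array every iteration;
// index 0 holds the least significant byte on a little-endian machine.
var sw = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < iterations; i++)
    result = BitConverter.GetBytes(value)[0];
sw.Stop();
Console.WriteLine("BitConverter: " + sw.ElapsedMilliseconds + " ms (value " + result + ")");

// The cast allocates nothing; it just keeps the low byte.
sw.Restart();
for (int i = 0; i < iterations; i++)
    result = (byte)value;
sw.Stop();
Console.WriteLine("Cast:         " + sw.ElapsedMilliseconds + " ms (value " + result + ")");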

C# Object Size Overhead

I am working on optimizing a memory-hungry application. In relation to that, I have a question regarding the size overhead of C# reference types.
A C# object consumes as many bytes as its fields, plus some additional administrative overhead. I presume that the administrative overhead can differ between .NET versions and implementations.
Do you know what the size (or maximum size, if the overhead is variable) of the administrative overhead is for C# objects (C# 4.0 on Windows 7 and 8)?
Does the administrative overhead differ between the 32-bit and 64-bit .NET runtimes?
Typically there is an 8-byte overhead per object allocated by the GC on 32-bit runtimes: 4 bytes for the syncblk and 4 bytes for the type handle. On 64-bit runtimes each of those is 8 bytes, for 16 bytes of overhead. For details, see the "ObjectInstance" section of Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects in MSDN Magazine.
Note that the size of the reference itself also differs between the 32-bit and 64-bit .NET runtimes.
Also, there may be padding within types so that fields fall on address boundaries, though this depends a lot on the type in question. This can cause "empty space" between objects as well, but it is up to the runtime (mostly, though you can affect it with StructLayoutAttribute) to determine when and how data is aligned.
There is an article online with the title "The Truth About .NET Objects And Sharing Them Between AppDomains" which shows some rotor source code and some results of experimenting with objects and sharing them between app domains via a plain pointer.
http://geekswithblogs.net/akraus1/archive/2012/07/25/150301.aspx
12 bytes for all 32-bit versions of the CLR
24 bytes for all 64-bit versions of the CLR
You can test this quite easily by adding millions of objects (N) to an array. Since the pointer size is known, you can calculate the object size by dividing the measured difference by N:
var initial = GC.GetTotalMemory(true);   // baseline after a full collection
const int N = 10 * 1000 * 1000;
var arr = new object[N];                 // keeps the objects reachable
for (int i = 0; i < N; i++)
{
    arr[i] = new object();
}
// Subtract the array of references itself, then average per object.
var objSize = (GC.GetTotalMemory(false) - initial - N * (long)IntPtr.Size) / N;
This gives an approximate value for your .NET platform.
A minimum object size is actually defined so that the GC can make assumptions about it:
\sscli20\clr\src\vm\object.h
//
// The generational GC requires that every object be at least 12 bytes
// in size.
#define MIN_OBJECT_SIZE (2*sizeof(BYTE*) + sizeof(ObjHeader))
For 32-bit, for example, this means that the minimum object size is 12 bytes, which leaves a 4-byte hole. The hole is empty for an empty object, but if you add e.g. an int field to your empty class, it is filled and the object size stays at 12 bytes.
There are two types of overhead for an object:
Internal data used to handle the object.
Padding between data members.
The internal data is two pointers, so in a 32-bit application that is 8 bytes, and in a 64-bit application that is 16 bytes.
Data members are padded so that they start on a suitable address boundary. If you, for example, have a byte and an int in the class, the byte is probably padded with three unused bytes so that the int starts on the next machine-word boundary.
The layout of the classes is determined by the JIT compiler depending on the architecture of the system (and might vary between framework versions), so it's not known to the C# compiler.
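As an illustration of the padding point, here is a sketch using structs and Marshal.SizeOf (which reports the unmanaged layout; the managed layout the JIT picks for classes may differ):
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct Padded
{
    public byte B;   // followed by 3 bytes of padding
    public int I;    // starts on the next 4-byte boundary
}

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Packed
{
    public byte B;   // no padding inserted
    public int I;
}

// Marshal.SizeOf(typeof(Padded)) == 8, Marshal.SizeOf(typeof(Packed)) == 5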

What is the Maximum Size that an Array can hold?

In C# 2008, what is the Maximum Size that an Array can hold?
System.Int32.MaxValue
Assuming you mean System.Array, i.e. any normally defined array (int[], etc.): this is the maximum number of values the array can hold. The size of each value is limited only by the amount of memory or virtual memory available to hold them.
This limit is enforced because System.Array uses an Int32 as its indexer, hence only valid values for an Int32 can be used. On top of this, only positive values (i.e. >= 0) may be used. This means the absolute upper bound on the size of an array is the absolute upper bound on values for an Int32, which is available as Int32.MaxValue and is 2^31 - 1, or roughly 2 billion.
On a completely different note, if you're worrying about this, it's likely you're using a lot of data, either correctly or incorrectly. In that case, I'd look into using a List<T> instead of an array, so that you only use as much memory as needed. In fact, I'd recommend using List<T> or another of the generic collection types all the time. This means that only as much memory as you are actually using will be allocated, and you can still use it like you would a normal array.
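For example (a trivial sketch), a List<int> grows on demand and gives you the same indexer syntax:
List<int> numbers = new List<int>();
for (int i = 0; i < 10; i++)
    numbers.Add(i * i);   // capacity grows automatically as needed
int fifth = numbers[4];   // indexed access, just like an array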
The other collection of note is Dictionary<int, T>, which you can use like a normal array too, but which will only be populated sparsely. For instance, in the following code, only one element is created, instead of the 1000 that an array would need:
Dictionary<int, string> foo = new Dictionary<int, string>();
foo[1000] = "Hello world!";
Console.WriteLine(foo[1000]);
Using a Dictionary also lets you control the type of the indexer and allows you to use negative values. For the absolute maximum-sized sparse array you could use a Dictionary<ulong, T>, which provides more potential elements than you could possibly think about.
Per MSDN it is
By default, the maximum size of an Array is 2 gigabytes (GB).
In a 64-bit environment, you can avoid the size restriction by setting the enabled attribute of the gcAllowVeryLargeObjects configuration element to true in the run-time environment.
However, the array will still be limited to a total of 4 billion elements.
Reference: http://msdn.microsoft.com/en-us/library/System.Array(v=vs.110).aspx
Note: here I am focusing on the actual length of the array, assuming that we have enough hardware RAM.
This answer is about .NET 4.5
According to MSDN, the index into an array of bytes cannot be greater than 2147483591. For .NET prior to 4.5 this was also the memory limit for an array. In .NET 4.5 this maximum is the same for byte arrays, but for other element types the limit is 2146435071 elements.
This is the code for illustration:
static void Main(string[] args)
{
    // -----------------------------------------------
    // Pre .NET 4.5 or gcAllowVeryLargeObjects unset
    const int twoGig = 2147483591;    // magic number from .NET
    var type = typeof(int);           // type to use
    var size = Marshal.SizeOf(type);  // type size
    var num = twoGig / size;          // max element count
    var arr20 = Array.CreateInstance(type, num);
    var arr21 = new byte[num];

    // -----------------------------------------------
    // .NET 4.5 with x64 and gcAllowVeryLargeObjects set
    var arr451 = new byte[2147483591];
    var arr452 = Array.CreateInstance(typeof(int), 2146435071);
    var arr453 = new byte[2146435071]; // another magic number
    return;
}
Here is an answer to your question that goes into detail:
http://www.velocityreviews.com/forums/t372598-maximum-size-of-byte-array.html
You may want to mention which version of .NET you are using and how much memory you have.
You will still be stuck with a 2 GB limit for your application, though, so it depends on what is in your array.
I think it is tied to your RAM (or rather virtual memory) space, with the absolute maximum constrained by your OS version (e.g. 32-bit or 64-bit).
I think that if you don't consider virtual memory, it is Int32.MaxValue.
