If I declare a List of char arrays, are they allocated in contiguous memory, or does .NET create a linked list instead?
If it's not contiguous, is there a way I can declare a contiguous list of char arrays? The size of the char arrays is known ahead of time and is fixed (they are all the same size).
Yes, but not in the way that you want. List<T> guarantees that its elements are stored contiguously.
Arrays are a reference type, so the references are stored contiguously, as List<T> guarantees. However, the arrays themselves are allocated separately, and where they live has nothing to do with the list. The list is only concerned with its elements, the references.
If you require that then you should simply use one large array and maintain boundary data.
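As a rough sketch of that idea (the element size and count here are illustrative placeholders, not values from your question):

const int ElementSize = 9;      // fixed, known-ahead-of-time size of each logical char array
const int ElementCount = 1000;  // however many logical arrays you need
char[] buffer = new char[ElementCount * ElementSize];  // one contiguous block

// The i-th logical array occupies buffer[i * ElementSize] through buffer[(i + 1) * ElementSize - 1].
char c = buffer[5 * ElementSize + 2];  // third char of the sixth logical array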
EDIT: Per your comment:
The inner arrays are always 9 chars.
So, in this case, cache locality may be an issue because the sub-arrays are so small: you'll be jumping around a lot in memory getting from one array to the next. I'll take your word on the performance sensitivity of this code.
Just use a multi-dimensional array if you can. This of course assumes you know the size or that you can impose a maximum size on it.
Is it possible to trade some memory to reduce complexity/time and just set a max size for N? Using a rectangular multi-dimensional array (as opposed to a jagged array, which gives no such guarantee) is the only way you can guarantee contiguous allocation.
EDIT 2:
Trying to keep the answer in sync with the comments. You say that the max size of the first dimension is 9! and, as before, the size of the second dimension is 9.
Allocate it all up front. You're trading some memory for time. 9! * 9 * 2 bytes / 1024 / 1024 == ~6.23 MB.
As you say, the List may grow to that size anyway, so worst case you waste a few MB of memory. I don't think it's going to be an issue unless you plan on running this code in a toaster oven. Just allocate the buffer as one array up front and you're good.
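A minimal sketch of that up-front allocation, using the 9! × 9 dimensions from the comments:

char[,] buffer = new char[362880, 9];  // 9! rows of 9 chars each: one contiguous ~6.23 MB block
buffer[0, 0] = 'a';                    // rows are laid out back-to-back in row-major order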
List functions as a dynamic array, not a linked list, but this is beside the point. No memory will be allocated for the char[]s until they themselves are instantiated. The List is merely responsible for holding references to char[]s, of which it will contain none when first created.
If it's not contiguous, is there a way I can declare a contiguous list of char arrays? The size of the char arrays is known ahead of time and is fixed (they are all the same size).
No, but you could instantiate a 2-dimensional array of chars, if you also know how many char arrays there would have been:
char[,] array = new char[x, y];
Related
Let's just say I'm running a physics simulation that uses integers as vertices on a model. In this simulation I load arrays of integers into a list, as the number of vertices may vary; like so:
List<int[]> x = new List<int[]>();
x.Add(new int[1]);
I know it's a bit overboard, considering using 2GB worth of integers, but the model could range anywhere from a single object to an entire open field. So, considering this process is repeated enough to take up 2GB, would each element/array get its own 2GB limit as its own object, or does the entire list still count as a single object?
The list is an object, the backing array inside the list (T[], so: int[][]) is an object, and each int[] array is (separately) an object. As long as no individual array is too large, you're OK. At no point are the arrays in a List<some array> treated as contiguous, so it doesn't matter if their combined length exceeds the 2 GiB limit.
Note that you can enable very-large-object support in your configuration (<gcAllowVeryLargeObjects>) to squeeze out a slightly larger array limit - for most arrays (not bytes/single-byte elements) it changes the maximum element count to 2,146,435,071 - which is ~8 GiB in your case (int[]). That doesn't necessarily mean it is a good idea to do so :)
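For reference, on the .NET Framework that switch lives in the application configuration file and only takes effect in 64-bit processes; a minimal app.config would look like:

<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>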
When creating a long[] in C#, due to the 2 GiB size limitation for any single object in the CLR, I expect it to be able to hold a maximum of 2 GiB / 8 bytes = 268,435,456 elements. However, the maximum number of elements that the array can actually hold before throwing an exception is 268,435,448. Also, a long[][] can hold multiple long[]s with the above number of elements, thus being substantially larger than 2 GiB. My questions are:
Where did those 64 bytes go that cannot be allocated? What are they used for by the CLR?
Why can a two-dimensional array (long[][]) be larger than 2 GiB?
Where did those 64 bytes go that cannot be allocated? What are they used for by the CLR?
Part of them go for the object header (sync block and vtable pointer, two pointers) and part for the array dimensions. Also, possibly a few pointers are used by the managed heap itself, because an object that big will require a separate chunk of heap.
Why can a two-dimensional array (long[][]) be larger than 2 GiB?
Because it is not a single CLR object. Every inner array is a separate object limited to 2GB, and the outer array only holds references to the inner arrays.
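As a rough illustration of that point (the element count is simply the figure from the question; actually running this needs a 64-bit process with plenty of RAM):

// Each inner long[] is its own object, just under the 2 GiB per-object limit.
// The outer long[][] only stores references, so the total far exceeds 2 GiB.
long[][] jagged = new long[4][];
for (int i = 0; i < jagged.Length; i++)
    jagged[i] = new long[268435448];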
I'm looking for the most efficient way to store and manage a large byte array in memory. I will have the need to both insert and delete bytes from any position within the array.
At first, I was thinking that a regular array was best.
byte[] buffer = new byte[ArraySize];
This would allow me to access any byte within the array. I can also resize the array. However, there doesn't appear to be any built-in support for shifting or moving items within the array.
One option is to have a loop to move items one by one but that sounds horribly inefficient in C#. Another option is to create a new array and copy bytes over to the correct position, but that requires copying all data in the array.
Is there no better option?
Actually, I just found the Buffer Class, which appears ideal for what I need.
It looks like the BlockCopy method will block copy a bunch of items and supports copying within the same array, and even correctly handles overlapping items.
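For example, shifting a region of the array up by one position to make room for an inserted byte might look roughly like this (the offsets and lengths are made up for illustration):

byte[] buffer = new byte[1024];
int insertPos = 100;    // where the new byte goes
int usedLength = 500;   // how much of the buffer is actually in use

// Shift everything from insertPos onward up by one byte; BlockCopy handles the overlap.
Buffer.BlockCopy(buffer, insertPos, buffer, insertPos + 1, usedLength - insertPos);
buffer[insertPos] = 0x42;  // write the new byte into the gap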
I think the best option in this case is a hybrid between a regular array and a list. This would only be necessary with megabyte sized arrays though.
So you could do something like this:
List<byte[]> buffer;
And have each element of the list be just a chunk of the data (say, 64 KB or something similarly small and manageable).
It'd require quite a bit of custom code, but would definitely be the fastest option when having to shift data around in a large array.
Also, if you're doing a lot more shifting of bytes than anything else, LinkedList<T> may work better (but it's famously bad for everything but a specific set of cases)
To clarify why this is better than a single array, consider inserting 1 byte at the beginning of an array. You must allocate another array (temporarily doubling memory consumption), copy every byte into the new array after inserting the new byte, and then free the old array (possibly fragmenting the heap, depending on its size).
Consider now this method with lists.
If you have to insert a lot of bytes, you'll probably want to insert a whole new chunk at the beginning of the buffer list. Inserting into the list only shifts the chunk references, so the cost is O(n / CHUNK_SIZE) rather than O(n).
Or, if you just need to insert a single byte, you can get the first chunk of the list and shift its contents as you would in a normal array. Then the cost is O(CHUNK_SIZE), which isn't horrible, especially when n is very large by comparison (megabytes of data).
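A bare-bones sketch of that chunked approach (the 64 KB chunk size is arbitrary, and a real implementation would also track how full each chunk is):

const int ChunkSize = 64 * 1024;            // 64 KB per chunk
List<byte[]> buffer = new List<byte[]>();
buffer.Add(new byte[ChunkSize]);            // initial chunk

// Inserting a whole chunk at the front only shifts chunk references: O(n / ChunkSize).
buffer.Insert(0, new byte[ChunkSize]);

// Inserting a single byte only touches one chunk: O(ChunkSize).
Buffer.BlockCopy(buffer[0], 0, buffer[0], 1, ChunkSize - 1);
buffer[0][0] = 0x42;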
In my program I have a bunch of growing arrays where new elements are added one by one to the end of the array. I identified Lists to be a speed bottleneck in a critical part of my program due to their slow access time in comparison with an array; switching to an array increased performance tremendously, to an acceptable level. So to grow the array I'm using Array.Resize. This works well as my implementation restricts the array size to approximately 20 elements, so the O(N) performance of Array.Resize is bounded.
But it would be better if there was a way to just increase an array by one element at the end without having to use Array.Resize; which I believe does a copy of the old array to the newly sized array.
So my question is: is there a more efficient method for adding one element to the end of an array without using List or Array.Resize?
A List has constant time access just like an array. For 'growing arrays' you really should be using List.
When you know that you may be adding elements to an array-backed structure, you don't want to add one new slot at a time. Usually it is best to grow an array by doubling its size when it fills up.
As has been previously mentioned, List<T> is what you are looking for. If you know the initial size of the list, you can supply an initial capacity to the constructor, which will increase your performance for your initial allocations:
List<int> values = new List<int>(5);
values.Add(1);
values.Add(2);
values.Add(3);
values.Add(4);
values.Add(5);
Lists allocate 4 elements to begin with (unless you specify a capacity when you construct one) and then double the capacity whenever they run out of space.
Why don't you try a similar thing with an array? I.e. create it with 4 elements, then when you insert the fifth element, first grow the array (by doubling its size, for example).
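A rough sketch of that approach (the variable names and the placeholder value are mine, purely for illustration):

int[] items = new int[4];  // start with a small capacity
int count = 0;             // number of slots actually in use
int newValue = 42;         // the element being added (placeholder)

// Add one element: only resize (by doubling) when the array is actually full.
if (count == items.Length)
    Array.Resize(ref items, items.Length * 2);
items[count++] = newValue;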
There is no way to resize an array, so the only way to get a larger array is to use Array.Resize to create a new array.
Why not just create the arrays to have 20 elements from start (or whatever capacity you need at most), and use a variable to keep track of how many elements are used in the array? That way you never have to resize any arrays.
Growing an array, AFAIK, means that a new array is allocated and the existing content is copied to the new instance. I doubt that this would be faster than using a List...?
It's much faster to resize an array in chunks (of, say, 10) and store the capacity in a separate variable, then only resize the array when that capacity is reached. This is how a List works, but if you prefer to use arrays then you should look into resizing them in larger chunks, especially if you have a large number of Array.Resize calls.
I think that any approach that insists on using a raw array will never be well optimized for this, because an array is a fixed-size structure, so I think it's better to use dynamic structures like List or others.
Is it worthwhile to initialize the collection size of a List<T> if it's reasonably known?
Edit: Furthering this question, after reading the first answers it really boils down to: what is the default capacity, and how is the growth operation performed? Does it double the capacity, etc.?
Yes, it gets to be important when your List<T> gets large. The exact numbers depend on the element type and the machine architecture, let's pick a List of reference types on a 32-bit machine. Each element will then take 4 bytes inside an internal array. The list will start out with a Capacity of 0 and an empty array. The first Add() call grows the Capacity to 4, reallocating the internal array to 16 bytes. Four Add() calls later, the array is full and needs to be reallocated again. It doubles the size, Capacity grows to 8, array size to 32 bytes. The previous array is garbage.
This repeats as necessary, several copies of the internal array will become garbage.
Something special happens when the array has grown to 65,536 bytes (16,384 elements). The next Add() doubles the size again to 131,072 bytes. That's a memory allocation that exceeds the threshold for "large objects" (85,000 bytes). The allocation is now no longer made on the generation 0 heap, it is taken from the Large Object Heap.
Objects on the LOH are treated specially. They are only garbage collected during a generation 2 collection. And the heap doesn't get compacted, it takes too much time to move such large chunks.
This repeats as necessary, several LOH objects will become garbage. They can take up memory for quite a while, generation 2 collections do not happen very often. Another problem is that these large blocks tend to fragment the virtual memory address space.
This doesn't repeat endlessly, sooner or later the List class needs to re-allocate the array and it has grown so large that there isn't a hole left in the virtual memory address space to fit the array. Your program will bomb with an OutOfMemoryException. Usually well before all available virtual memory has been consumed.
Long story short, by setting the Capacity early, before you start filling the List, you can reserve that large internal array up front. You won't get all those awkward released blocks in the Large Object Heap and avoid fragmentation. In effect, you'll be able to store many more objects in the list and your program runs leaner since there's so little garbage. Do this only if you have a good idea how large the list will be, using a large Capacity that you'll never fill is wasteful.
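As a concrete illustration (the element count is just an example figure):

// Pre-sizing reserves the backing array once, instead of doubling through
// many intermediate arrays, several of which would land on the Large Object Heap.
List<string> presized = new List<string>(1000000);

// Versus the default: capacity grows 0 -> 4 -> 8 -> ... -> 1048576,
// leaving each previous backing array behind as garbage.
List<string> grown = new List<string>();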
It is, as per the documentation:
If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the List(T).
Well, it will stop the values in the list (which will be references if the element type is a reference type) from having to be copied occasionally as the list grows.
If it's going to be a particularly large list and you've got a pretty good idea of the size, it won't hurt. However, if estimating the size involves extra calculations or any significant amount of code, I wouldn't worry about it unless you find it becomes a problem - it could distract from the main focus of the code, and the resizing is unlikely to cause performance issues unless it's a really big list or you're doing it a lot.