Behind the Scenes with Arrays in C# - c#

C# handles arrays very differently from C or C++. For one thing, it treats them as objects, although they still have a fixed size. So I have some questions about arrays in C#:
If I create an array new int[10] will the Length property be guaranteed to be 10 or does C# try to resize the array if I get close to filling it up?
If I need to resize the array, do I have to create a new one with a larger size and then copy over each element in the array, or can I simply add more space without copying the elements over?
EDIT: it looks like you can use Array.Copy to copy the array over.
The List collection in C# seems like a dynamically sized array (maybe a mix between arrays and linked lists). If I use a List instead of an array, can I control its size? (i.e. force it to be length 10, then set its size to 20 if I need more space)

If I create an array new int[10] will the Length property be guaranteed to be 10 or does C# try to resize the array if I get close to filling it up?
It'll be 10. Always 10.
If I need to resize the array, do I have to create a new one with a larger size and then copy over each element in the array, or can I simply add more space without copying the elements over?
You need to allocate a new one and copy. There are some helper methods that make this easier, like Array.Resize, but make no mistake: it creates a new array and copies everything into it, just as Array.Copy would. If you need a resizable collection, use List<T>.
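A minimal sketch of that point (the variable names are just for illustration): Array.Resize hands back a brand-new array, so another reference to the old array still sees the old length. The manual equivalent is allocating a bigger array and calling Array.Copy.

```csharp
using System;

// Array.Resize never grows an array in place: it allocates a new array
// and copies the old elements into it.
int[] original = new int[10];
original[0] = 42;
int[] alias = original;              // second reference to the same array object

Array.Resize(ref original, 20);      // 'original' now points at a NEW 20-element array

Console.WriteLine(original.Length);                  // 20
Console.WriteLine(alias.Length);                     // 10: the old array is untouched
Console.WriteLine(ReferenceEquals(original, alias)); // False
Console.WriteLine(original[0]);                      // 42: copied over

// The same thing done by hand with Array.Copy:
int[] bigger = new int[40];
Array.Copy(original, bigger, original.Length);
Console.WriteLine(bigger[0]);                        // 42
```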
The List collection in C# seems like a dynamically sized array (maybe a mix between arrays and linked lists). If I use a List instead of an array, can I control its size? (i.e. force it to be length 10, then set its size to 20 if I need more space).
The size is controlled automatically: the list grows when needed. There is a constructor overload that accepts the initial capacity. Internally, List<T> also uses an array that is replaced when needed: when it is full, the list allocates a new array with double the current capacity and copies everything into it. All that magic happens behind the scenes. If you want to resize the internal array of a List<T> manually, set its Capacity property (it cannot be set lower than Count).
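To see this from the outside, here is a small sketch that looks at Count and Capacity. The exact growth steps are a runtime implementation detail, so the code only relies on documented behavior: setting Capacity reallocates the internal array, and setting it below Count throws ArgumentOutOfRangeException.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int>();
for (int i = 0; i < 5; i++)
    list.Add(i);

// Count is how many items are stored; Capacity is the size of the internal
// array. Capacity >= Count always holds; the growth steps are not guaranteed.
Console.WriteLine($"Count={list.Count}, Capacity={list.Capacity}");

// Resize the internal array by hand: this allocates a new array of 20
// slots and copies the 5 existing items into it.
list.Capacity = 20;
Console.WriteLine(list.Capacity); // 20

// Shrinking below Count is not allowed.
bool threw = false;
try
{
    list.Capacity = 2;
}
catch (ArgumentOutOfRangeException)
{
    threw = true;
}
Console.WriteLine(threw); // True
```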

1) Yes, the Length will be 10.
2) Yes, you have to create another one and copy the members. The Array class provides methods for that (Array.Copy, Array.Resize), but you can't avoid the copying.
3) A list is indeed more convenient. You control its length (Count) by adding or removing members.
As a consequence, arrays are used less frequently in C#. We prefer List<T> (which uses an array under the covers), although it does incur some overhead.
Use arrays for low-level or fixed-size problems. Also, there are no good (fast) multidimensional List classes.

Yes, arrays defined like that have a fixed size.
Yes, if you wish to extend it you'll have to create a new array.
Yes, you can specify an initial capacity. The list still starts empty (Count is 0): the capacity only reserves room, and the list grows automatically when you add more items than that.

No, the array is never resized automatically; its Length stays 10.
If you don't know the size of the array, and if the collection has to grow, use a List<T> instead. A List uses an array internally, but the List takes care of resizing the array if necessary.
The List will grow automatically as needed. You can set its initial Capacity, though.

Arrays are of a fixed size. If you want to change the size of an array, allocate a new one, and copy the items into it using Array.Copy (or Array.Resize).
It is recommended to use List<T> instead, if you are regularly doing this. That handles all the resizing for you. You can specify an initial capacity when you create the list, but if you add more than the capacity the backing array is automatically resized to fit the extra items.

quick n dirty ;)
If I create an array new int[10] will the Length property be guaranteed to be 10 or does C# try to resize the array if I get close to filling it up?
NO, it is never resized.
If I need to resize the array, do I have to create a new one with a larger size and then copy over each element in the array, or can I simply add more space without copying the elements over?
YES, you have to create a new one and copy.
The List collection in C# seems like a dynamically sized array (maybe a mix between arrays and linked lists). If I use a List instead of an array, can I control its size? (i.e. force it to be length 10, then set its size to 20 if I need more space)
If you construct your list without a capacity it will grow. You can set the initial capacity (not a maximum: the list still grows past it) using the constructor like...
List<string> list = new List<string>(10);

C# creates an array of length 10 and doesn't resize it.
You can't resize an array in place, but you can use a List.
Yes, you can set the list's Capacity.
Capacity is the number of elements that the List(Of T) can store before resizing is required, while Count is the number of elements that are actually in the List(Of T).
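A quick sketch of that difference, using only the standard List<T> and LINQ APIs: a list created with a capacity of 10 is still empty, so indexing it fails until items are actually added.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int>(10);     // room for 10 items, but no items yet
Console.WriteLine(list.Count);    // 0
Console.WriteLine(list.Capacity); // 10

bool threw = false;
try
{
    _ = list[0];                  // the index must be < Count, not < Capacity
}
catch (ArgumentOutOfRangeException)
{
    threw = true;
}
Console.WriteLine(threw);         // True

// If you really want 10 slots holding the default value, add them explicitly.
var filled = new List<int>(Enumerable.Repeat(0, 10));
Console.WriteLine(filled.Count);  // 10
```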

Related

Preallocating List c#

I am working in C# and creating a list (newList) whose length I can get from another list (otherList). In C#, is List implemented in a way that makes preallocating the list with otherList.Count better for performance, or should I just use newList.Add(obj) and not worry about the length?
The following constructor for List<T> is implemented for the purpose of improving performance in scenarios like yours:
http://msdn.microsoft.com/en-us/library/dw8e0z9z.aspx
public List(int capacity)
Just pass the capacity in the constructor.
newList = new List<string>(otherList.Count);
If you know the exact length of the new list, creating it with that capacity indeed performs - a bit - better.
The reason is that List<T> uses an array internally. If this array gets too small, a new, larger array is created and the items from the old array are copied over to it.
Taken from the Remarks section on MSDN
The capacity of a List<T> is the number of elements that the List<T> can hold. As elements are added to a List<T>, the capacity is automatically increased as required by reallocating the internal array.
If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the List<T>.
The capacity can be decreased by calling the TrimExcess method or by setting the Capacity property explicitly. Decreasing the capacity reallocates memory and copies all the elements in the List<T>.
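One way to observe the resizing operations the documentation mentions is to watch Capacity change while items are added. The sketch below does that with a hypothetical helper, CountReallocations; it counts reallocations rather than timing them, so it says nothing about how long each one takes.

```csharp
using System;
using System.Collections.Generic;

// Counts how often the list replaced its internal array while 'count'
// items were added. A change in Capacity is the visible sign of a
// reallocation (allocate a bigger array + copy the old items).
int CountReallocations(List<object> list, int count)
{
    int reallocations = 0;
    int capacity = list.Capacity;
    for (int i = 0; i < count; i++)
    {
        list.Add(new object());
        if (list.Capacity != capacity)
        {
            reallocations++;
            capacity = list.Capacity;
        }
    }
    return reallocations;
}

const int n = 1_000_000;
int withoutHint = CountReallocations(new List<object>(), n);
int withHint = CountReallocations(new List<object>(n), n);

Console.WriteLine($"no initial capacity: {withoutHint} reallocations");
Console.WriteLine($"initial capacity {n}: {withHint} reallocations"); // 0
```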
So, this would suggest that there would be a performance increase if you have an estimate of the size of the list you are going to populate. Of course, the other side of this is allocating a capacity that is too big and therefore using up memory unnecessarily.
To be honest, I would not worry about this sort of micro optimization unless I really need to.
So I benchmarked the scenario of preallocation, and just wanted to share some numbers. The code simply times this:
var repetitions = 100000000;
var list = new List<object>();
for (var i = 0; i < repetitions; i++)
    list.Add(new object());
The comparison is between preallocating exactly the number of repetitions, preallocating twice that number, and not preallocating at all. These are the times:
List not preallocated: 6561, 6394, 6556, 6283, 6466
List preallocated: 5951, 6037, 5885, 6044, 5996
List double preallocated: 6710, 6665, 6729, 6760, 6624
So is preallocation worth it? Yes, if the number of objects in the list is known, or if a lower bound on that number is known (how many items the list will have at least).
If not, you could risk actually wasting more time and memory, since the preallocation will use time and memory to make space for the requested capacity.
Note that in terms of memory, too, exact preallocation is the most efficient, since not preallocating will make your list grow past the tightest needed capacity.

Is C# List<char[]> allocated in contiguous memory?

If I declare a List of char arrays, are they allocated in contiguous memory, or does .NET create a linked list instead?
If it's not contiguous, is there a way I can declare a contiguous list of char arrays? The size of the char arrays is known ahead of time and is fixed (they are all the same size).
Yes, but not in the way that you want. List<T> guarantees that its elements are stored contiguously.
Arrays are a reference type, so the references are stored contiguously, as List<T> guarantees. However, the arrays themselves are allocated separately, and where they are stored has nothing to do with the list. The list is only concerned with its elements, the references.
If you require contiguous storage, you should simply use one large array and keep track of the boundaries between the sub-arrays yourself.
EDIT: Per your comment:
The inner arrays are always 9 chars.
So, in this case, cache locality may be an issue because the sub-arrays are so small. You'll be jumping around a lot in memory getting from one array to the next, and I'll just take your word for it about the performance sensitivity of this code.
Just use a multidimensional array if you can. This of course assumes you know the size or that you can impose a maximum size on it.
Is it possible to trade some memory to reduce complexity/time and just set a maximum size for N? Using a multidimensional array (char[,], not a jagged char[][], which is an array of separately allocated arrays) is how you guarantee contiguous allocation.
EDIT 2:
Trying to keep the answer in sync with the comments. You say that the max size of the first dimension is 9! and, as before, the size of the second dimension is 9.
Allocate it all up front. You're trading some memory for time. 9! * 9 * 2 / 1024 / 1024 == ~6.22MB.
As you say, the List may grow to that size anyway, so worst case you waste a few MB of memory. I don't think it's going to be an issue unless you plan on running this code in a toaster oven. Just allocate the buffer as one array up front and you're good.
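A sketch of the "one array up front" approach for this case, assuming 9! records of exactly 9 chars each; the helper name Record is made up for illustration. Each record is a slice of the big array, so consecutive records sit next to each other in memory, and the size matches the arithmetic above.

```csharp
using System;

// 9! records of 9 chars each, stored back to back in ONE array.
const int RecordLength = 9;
int recordCount = 1;
for (int k = 2; k <= 9; k++)
    recordCount *= k;                          // 9! = 362880

char[] buffer = new char[recordCount * RecordLength];
long bytes = (long)buffer.Length * sizeof(char);
Console.WriteLine(bytes / 1024.0 / 1024.0);    // roughly 6.2 (MB)

// Record i occupies buffer[i * 9] .. buffer[i * 9 + 8]; a Span gives a
// view of it without copying.
Span<char> Record(int i) => buffer.AsSpan(i * RecordLength, RecordLength);

"123456789".AsSpan().CopyTo(Record(0));
"987654321".AsSpan().CopyTo(Record(1));

Console.WriteLine(new string(Record(1)));      // 987654321
```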
List functions as a dynamic array, not a linked list, but this is beside the point. No memory will be allocated for the char[]s until they themselves are instantiated. The List is merely responsible for holding references to char[]s, of which it will contain none when first created.
If it's not contiguous, is there a way I can declare a contiguous list of char arrays? The size of the char arrays is known ahead of time and is fixed (they are all the same size).
No, but you could instantiate a 2-dimensional array of chars, if you also know how many char arrays there would have been:
char[,] array = new char[x, y];
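A small sketch of the 2-dimensional alternative, with x and y picked arbitrarily: a char[,] is a single object whose elements are laid out row by row, unlike a jagged char[][] where every row is a separate array.

```csharp
using System;

int x = 4, y = 9;                      // illustrative sizes: 4 rows of 9 chars
char[,] array = new char[x, y];        // one object holding x * y chars

for (int row = 0; row < x; row++)
    for (int col = 0; col < y; col++)
        array[row, col] = (char)('a' + row);

Console.WriteLine(array.Length);       // 36: total number of elements
Console.WriteLine(array.GetLength(0)); // 4 rows
Console.WriteLine(array.GetLength(1)); // 9 columns
Console.WriteLine(array[2, 5]);        // c
```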

Removing duplicate items from a list without using temp memory

I want to write a function that takes a collection of integers and removes the duplicates from it. I cannot apply any sorting algorithm, and I cannot duplicate the collection. I need to conserve memory and provide an efficient solution that can process millions of items without significantly draining the battery.
If you are very short on memory, the best solution would be not to include the redundant integers in the list in the first place.
To do this you might use an array of 65536 booleans (indices 0..65535, for 16-bit values), which you might pack 8 to a byte to make it smaller, recording which values have already been used.
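Another solution is to keep the list sorted by inserting each item in the right place, but not inserting it if it is already there. Finding the right place by binary search takes O(log k) comparisons, where k is the number of unique items so far, so the searching costs about O(n log n) for your whole list. Note, though, that inserting into an array-backed list also shifts the later elements, which can add up to O(n^2) in the worst case.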
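A sketch of that insert-if-absent idea using List<T>.BinarySearch, which returns the bitwise complement of the insertion point when the value is missing. AddUnique is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

// Keeps 'sorted' ordered and free of duplicates as items arrive.
// BinarySearch finds the position in O(log k); Insert then shifts the
// elements after it, which is O(k) for an array-backed List<T>.
void AddUnique(List<int> sorted, int value)
{
    int index = sorted.BinarySearch(value);
    if (index >= 0)
        return;                    // already present: skip it
    sorted.Insert(~index, value);  // ~index is where the value belongs
}

var unique = new List<int>();
foreach (int v in new[] { 5, 3, 5, 9, 1, 3, 9 })
    AddUnique(unique, v);

Console.WriteLine(string.Join(", ", unique)); // 1, 3, 5, 9
```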
If you do not have control over the source, you could still use an array of booleans (bigger if you need to): initialize it by setting everything to false, then set isUsed[itemList[i]] = true for each item. Then you can dispose of the list to get the memory back and build a new list out of the array. The output will therefore be ordered.
If your integers are 32 bits, the array would be about 512 MB even when packed (2^32 bits), so maybe too big, but depending on the distribution of the integers (is there a wide range of possible values?), you might be able to lower that size.
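A sketch of the bitmap approach for the 16-bit case, assuming every value lies in 0..65535 (anything outside that range would throw). It uses System.Collections.BitArray, which packs the flags 8 per byte, and refills the same list instead of allocating a new one, so the output comes back sorted as described. RemoveDuplicates is a hypothetical name.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Assumes every value is in [0, 65535]; the bitmap costs 8 KB.
void RemoveDuplicates(List<int> items)
{
    var isUsed = new BitArray(65536);       // all flags start as false
    foreach (int value in items)
        isUsed[value] = true;               // mark every value that occurs

    items.Clear();                          // Count = 0; the internal array is kept
    for (int value = 0; value < isUsed.Length; value++)
        if (isUsed[value])
            items.Add(value);               // one entry per distinct value, ascending
}

var data = new List<int> { 7, 3, 7, 65535, 0, 3, 0 };
RemoveDuplicates(data);
Console.WriteLine(string.Join(", ", data)); // 0, 3, 7, 65535
```

Because there are never more distinct values than original items, refilling the cleared list does not need to grow its internal array.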
Notice that if you are very short on memory you might use an object pool to reuse objects.
(you might even re-use objects that you just removed from the list.)

Best Way to Represent Large Byte Array

I'm looking for the most efficient way to store and manage a large byte array in memory. I will have the need to both insert and delete bytes from any position within the array.
At first, I was thinking that a regular array was best.
byte[] buffer = new byte[ArraySize];
This would allow me to access any byte within the array. I can also resize the array. However, there doesn't appear to be any built-in support for shifting or moving items within the array.
One option is to have a loop to move items one by one but that sounds horribly inefficient in C#. Another option is to create a new array and copy bytes over to the correct position, but that requires copying all data in the array.
Is there no better option?
Actually, I just found the Buffer class, which appears ideal for what I need.
Its BlockCopy method copies a block of bytes, supports copying within the same array, and even handles overlapping source and destination ranges correctly.
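A minimal sketch of an in-place insert with Buffer.BlockCopy, assuming the array has spare room at the end and a separate count of used bytes (InsertBytes is a hypothetical helper, not a framework method). The first BlockCopy shifts the tail to the right; its source and destination overlap, which BlockCopy handles as noted above.

```csharp
using System;

// Inserts 'data' at 'position' inside 'buffer', whose first 'used' bytes
// are meaningful. Returns the new used length.
int InsertBytes(byte[] buffer, int used, int position, byte[] data)
{
    if (used + data.Length > buffer.Length)
        throw new InvalidOperationException("buffer too small; resize first");

    // Shift the tail [position, used) right by data.Length bytes.
    Buffer.BlockCopy(buffer, position, buffer, position + data.Length, used - position);
    // Copy the new bytes into the gap.
    Buffer.BlockCopy(data, 0, buffer, position, data.Length);
    return used + data.Length;
}

byte[] buf = new byte[16];
int length = 0;
length = InsertBytes(buf, length, 0, new byte[] { 1, 2, 3, 4 });
length = InsertBytes(buf, length, 2, new byte[] { 9, 9 });

Console.WriteLine(string.Join(" ", buf.AsSpan(0, length).ToArray())); // 1 2 9 9 3 4
```

This still moves every byte after the insertion point, but without allocating a second array.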
I think the best option in this case is a hybrid between a regular array and a list. This would only be necessary with megabyte-sized arrays, though.
So you could do something like this:
List<byte[]> buffer;
And have each element of the list be just a chunk of the data (say 64K, or something small and manageable).
It'd require quite a bit of custom code, but would definitely be the fastest option when having to shift data around in a large array.
Also, if you're doing a lot more shifting of bytes than anything else, LinkedList<T> may work better (but it's famously bad for everything but a specific set of cases)
To clarify why this is better than a plain array, consider inserting 1 byte at the beginning of an array. You must allocate another array (doubling memory consumption), copy every byte into the new array after inserting the new byte, and then free the old array (possibly fragmenting the large object heap, depending on size).
Consider now this method with lists.
If you have to insert a lot of bytes, you'll probably want to insert a new chunk at the beginning of the buffer list. Inserting into a List<T> is O(m) in its number of elements m, and here m is the number of chunks, so this operation costs O(n/CHUNK_SIZE) for n bytes.
Or, if you just need to insert a single byte, you can get the first chunk of the list and copy that array as normal. Then the cost is O(CHUNK_SIZE), which isn't horrible, especially when n is very large in comparison (megabytes of data).
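A rough sketch of the chunked idea, assuming a List<byte[]> of small chunks whose lengths may drift apart as bytes are inserted; InsertByte and the tiny chunk sizes are purely illustrative. Inserting a block at the front only adds a chunk reference, and inserting a single byte rebuilds just the one chunk that contains the position. The sketch assumes the list is non-empty and the offset lies within the data.

```csharp
using System;
using System.Collections.Generic;

var chunks = new List<byte[]>
{
    new byte[] { 1, 2, 3 },
    new byte[] { 4, 5, 6 },
};

// Insert one byte at a global offset: walk to the chunk that holds the
// offset, then rebuild only that chunk one byte longer (O(chunk size)).
void InsertByte(List<byte[]> list, int offset, byte value)
{
    int chunkIndex = 0;
    while (chunkIndex < list.Count - 1 && offset > list[chunkIndex].Length)
    {
        offset -= list[chunkIndex].Length;
        chunkIndex++;
    }
    byte[] old = list[chunkIndex];
    var grown = new byte[old.Length + 1];
    Array.Copy(old, 0, grown, 0, offset);
    grown[offset] = value;
    Array.Copy(old, offset, grown, offset + 1, old.Length - offset);
    list[chunkIndex] = grown;
}

// Inserting a whole block at the front just adds a chunk: List.Insert
// shifts the chunk references, not the bytes themselves.
chunks.Insert(0, new byte[] { 7, 8 });
InsertByte(chunks, 4, 99);               // lands inside the chunk { 1, 2, 3 }

var flat = new List<byte>();
foreach (byte[] chunk in chunks)
    flat.AddRange(chunk);
Console.WriteLine(string.Join(" ", flat)); // 7 8 1 2 99 3 4 5 6
```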

C# Increasing an array by one element at the end

In my program I have a bunch of growing arrays where new elements are appended one by one to the end of the array. I identified Lists as a speed bottleneck in a critical part of my program due to their slow access time in comparison with an array; switching to an array increased performance tremendously, to an acceptable level. So to grow the array I'm using Array.Resize. This works well as my implementation restricts the array size to approximately 20 elements, so the O(N) cost of Array.Resize is bounded.
But it would be better if there was a way to just increase an array by one element at the end without having to use Array.Resize; which I believe does a copy of the old array to the newly sized array.
So my question is: is there a more efficient method for adding one element to the end of an array without using List or Array.Resize?
A List has constant time access just like an array. For 'growing arrays' you really should be using List.
When you know that you may be adding elements to an array-backed structure, you don't want to grow it by one element at a time. Usually it is best to grow an array by doubling its size when it fills up.
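A sketch of doubling applied to a plain array, with a separate count of used slots (Append and the starting size of 4 are illustrative choices, not anything from the framework). Only about log2(n) of the appends pay for Array.Resize; the rest are a plain store.

```csharp
using System;

int[] items = new int[4];
int count = 0;
int resizes = 0;

// Appends a value, doubling the backing array when it is full.
void Append(int value)
{
    if (count == items.Length)
    {
        Array.Resize(ref items, items.Length * 2);
        resizes++;
    }
    items[count++] = value;
}

for (int i = 0; i < 1000; i++)
    Append(i);

Console.WriteLine($"{count} items, capacity {items.Length}, {resizes} resizes");
// 1000 items, capacity 1024, 8 resizes
```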
As has been previously mentioned, List<T> is what you are looking for. If you know the initial size of the list, you can supply an initial capacity to the constructor, which will increase your performance for your initial allocations:
List<int> values = new List<int>(5);
values.Add(1);
values.Add(2);
values.Add(3);
values.Add(4);
values.Add(5);
A List<T> allocates room for 4 elements on the first Add (unless you specify a capacity when you construct it) and then doubles its capacity each time it fills up.
Why don't you try a similar thing with an array? I.e. create it with 4 elements, then when you insert the fifth element, first grow the array to 8, and so on.
There is no way to resize an array in place, so the only way to get a larger array is to create a new one, which is what Array.Resize does.
Why not just create the arrays to have 20 elements from start (or whatever capacity you need at most), and use a variable to keep track of how many elements are used in the array? That way you never have to resize any arrays.
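A minimal sketch of that fixed-size approach, assuming the roughly 20-element bound from the question; TryAdd is a hypothetical helper. The array is never resized, and a Span is used to look at only the slots that are in use.

```csharp
using System;

const int MaxItems = 20;
int[] values = new int[MaxItems];
int used = 0;

// Stores a value in the next free slot; returns false when the array is full.
bool TryAdd(int value)
{
    if (used == values.Length)
        return false;
    values[used++] = value;
    return true;
}

for (int i = 0; i < 5; i++)
    TryAdd(i * i);

// Only the first 'used' slots are meaningful.
ReadOnlySpan<int> live = values.AsSpan(0, used);
Console.WriteLine(string.Join(", ", live.ToArray())); // 0, 1, 4, 9, 16
```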
Growing an array, AFAIK, means that a new array is allocated and the existing content is copied to the new instance. I doubt that this would be faster than using a List...?
It's much faster to resize an array in chunks (like 10 elements at a time), keep the current size in a separate variable (e.g. capacity), and only resize the array when that capacity is reached. This is roughly how a List works (it doubles its capacity rather than adding a fixed chunk), but if you prefer to use arrays then you should look into resizing them in larger chunks, especially if you have a large number of Array.Resize calls.
I think any approach built on arrays will never be truly optimized for this, because an array is a static structure, so I think it's better to use dynamic structures like List or others.
