I have been looking at the .NET libraries using ILSpy and came across the List<T> class definition in the System.Collections.Generic namespace. I see that the class uses methods like this one:
// System.Collections.Generic.List<T>
/// <summary>Removes all elements from the <see cref="T:System.Collections.Generic.List`1" />.</summary>
public void Clear()
{
    if (this._size > 0)
    {
        Array.Clear(this._items, 0, this._size);
        this._size = 0;
    }
    this._version++;
}
So, the Clear() method of the List<T> class actually uses Array.Clear method. I have seen many other List<T> methods that use Array stuff in the body.
Does this mean that List<T> is actually an undercover Array, or does List just use some of Array's methods internally?
I know lists are type safe and don't require boxing/unboxing but this has confused me a bit.
The list class is not itself an array. In other words, it does not derive from an array. Instead it encapsulates an array that is used by the implementation to hold the list's member elements.
Since List<T> offers random access to its elements, and those elements are indexed 0..Count-1, using an array to store the elements is the obvious implementation.
This tends to surprise C++ programmers who know std::list, which is a linked list. .NET covers that as well with the LinkedList<T> class, and it has the same performance characteristics: O(1) inserts and deletes.
You should however in general avoid it. Linked lists do not perform well on modern processors, which depend heavily on the CPU caches to get reasonable performance out of memory that is many times slower than the execution core. A simple array is by far the data structure that takes most advantage of the cache: accessing an element gives very high odds that subsequent elements are present in the cache as well. That is not the case for a linked list; its elements tend to be scattered throughout the address space, making cache misses likely. Those misses can be very expensive, as much as 200 cycles with the CPU doing nothing but waiting on the memory subsystem to supply the data.
But do keep the performance characteristics in mind: adding or removing an element that is not at the end of the List costs O(n), just like with an array. And a large List can generate a lot of garbage as the array needs to be expanded; setting the Capacity property up front can help a lot to avoid that. Otherwise the exact same concerns apply as for std::vector<>.
Yes, List<T> uses an array internally to store the items, although in most cases the array is actually larger than the number of elements in the collection -- it has some extra "padding" at the end so that you can add new items without it having to reallocate memory every time. It keeps track of the actual size of the collection with a separate field (you can see this._size in your generated code). When you add more elements than the current array has room for, it will automatically allocate a new larger array -- twice as big, I think -- and copy over all the existing elements.
If you're concerned about a List<T> using more memory than necessary, you can set the size of the array explicitly with the constructor overload that accepts a capacity parameter, if you know the size in advance, or call the TrimExcess() method to make sure the array is (close to) the actual size of the collection.
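For instance, a minimal sketch of both techniques (the element count here is just an assumption for illustration):

// Pre-size the backing array when the final count is known up front.
List<double> values = new List<double>(1000000);
for (int i = 0; i < 1000000; i++)
{
    values.Add(i * 0.5);
}

// After removing most items, release the unused capacity.
values.RemoveRange(0, 900000);
Console.WriteLine(values.Capacity); // still 1000000
values.TrimExcess();
Console.WriteLine(values.Capacity); // close to 100000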
Random access memory is an array, so in that sense all data structures from linked-lists to heaps and beyond, that rely on random-access to memory for their performance behaviour, are built on the array that is system memory. It is more a question of how many-levels of abstraction are in between.
Of course in a modern virtual memory machine, the random-access system memory is itself an abstraction built on a complicated virtual-memory model of multi-tier pipelined caches, non-cached RAM, and disk.
I have millions of instances of the class Data, and I am looking for optimization advice.
Is there a way to optimize it in any way? For example, saving memory by serializing it somehow, although that would hurt retrieval speed, which is important too. Maybe turning the class into a struct, although the class seems pretty large for a struct.
Queries over these objects can touch hundreds of millions of them at a time. They sit in a list and are queried by DateTime. The results are aggregated in different ways, and many calculations can be applied.
[Serializable]
[DataContract]
public abstract class BaseData {}

[Serializable]
public class Data : BaseData {
    public byte member1;
    public int member2;
    public long member3;
    public double member4;
    public DateTime member5;
}
Unfortunately, while you did specify that you want to "optimize", you did not specify what the exact problem is you mean to tackle. So I cannot really give you more than general advice.
Serialization will not help you. Your Data objects are already stored as bytes in memory. Nor will turning it into a struct help. The difference between a struct and a class lies in their addressing and referencing behaviour, not in their memory footprint.
The only way I can think of to reduce the memory footprint of a collection with "hundreds-millions" of these objects would be to serialize and compress the entire thing. But that is not feasible. You would always have to decompress the entire thing before accessing it, which would shoot your performance to hell AND actually almost double the memory consumption on access (compressed and decompressed data both lying in memory at that point).
The best general advice I can give you is not to try to optimize this scenario yourself, but to use specialized software for it. By specialized software, I mean an (in-memory) database. One example of a database you can use in-memory, with lightweight ADO.NET providers readily available for .NET, is SQLite.
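As a rough illustration (a sketch assuming the Microsoft.Data.Sqlite package; the table schema here is invented for the example):

using Microsoft.Data.Sqlite;

using (var connection = new SqliteConnection("Data Source=:memory:"))
{
    connection.Open();

    var create = connection.CreateCommand();
    create.CommandText = "CREATE TABLE data (member5 TEXT, member4 REAL)";
    create.ExecuteNonQuery();

    var insert = connection.CreateCommand();
    insert.CommandText = "INSERT INTO data VALUES ($ts, $value)";
    insert.Parameters.AddWithValue("$ts", "2013-01-01T00:00:00");
    insert.Parameters.AddWithValue("$value", 42.0);
    insert.ExecuteNonQuery();

    // Aggregations run inside the engine instead of in hand-written loops.
    var query = connection.CreateCommand();
    query.CommandText = "SELECT AVG(member4) FROM data";
    Console.WriteLine(query.ExecuteScalar());
}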
I assume, as you seem to imply, that you have a class with many members, have a large number of instances, and need to keep them all in memory at the same time to perform calculations.
I ran a few tests to see if you could actually get different sizes for the classes you described.
I used this simple method for finding the in-memory size of an object:
private static void MeasureMemory()
{
    int size = 10000000;
    object[] array = new object[size];

    // Force a full collection before and after, so the difference
    // reflects only the objects allocated in the loop.
    long before = GC.GetTotalMemory(true);
    for (int i = 0; i < size; i++)
    {
        array[i] = new Data();
    }
    long after = GC.GetTotalMemory(true);

    double diff = after - before;
    Console.WriteLine("Total bytes: " + diff);
    Console.WriteLine("Bytes per object: " + diff / size);
}
It may be primitive, but I find that it works fine for situations like this.
As expected, almost nothing you can do to that class (turning it into a struct, removing the inheritance, or removing the attributes) influences the memory used by a single instance. As far as memory usage goes, they are all equivalent. However, do try to fiddle with your actual classes and run them through the given code.
The only way you could actually reduce the memory footprint of an instance would be to use smaller types for keeping your data (int instead of long, for example). If you have a large number of booleans, you could group them into a byte or an integer and have simple property wrappers to work with them (a boolean takes a byte of memory). These may be insignificant things in most situations, but for a hundred million objects, removing a boolean could make a difference of a hundred MB of memory. Also, be aware that the platform target you choose for your application can have an impact on the memory footprint of an object (x64 builds take up more memory than x86 ones).
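A minimal sketch of the boolean-packing idea (the class and flag names are made up for illustration):

public class PackedFlags
{
    private byte _bits; // eight booleans in one byte instead of eight bytes

    public bool IsActive
    {
        get { return (_bits & 0x01) != 0; }
        set { _bits = value ? (byte)(_bits | 0x01) : (byte)(_bits & ~0x01); }
    }

    public bool IsDirty
    {
        get { return (_bits & 0x02) != 0; }
        set { _bits = value ? (byte)(_bits | 0x02) : (byte)(_bits & ~0x02); }
    }
}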
Serializing the data is very unlikely to help. An in-memory database has its upsides, especially if you are doing complex queries. However, it is unlikely to actually reduce the memory usage of your data. Unfortunately, there just aren't many ways to reduce the footprint of basic data types. At some point, you will just have to move to a file-based database.
However, here are some ideas. Please be aware that they are hacky and highly conditional, reduce computational performance, and will make the code harder to maintain.
It is often the case in large data structures that objects in different states have only some of their properties filled, with the others set to null or a default value. If you can identify such groups of properties, perhaps you could move them to a sub-class and keep one reference that can be null, instead of having several properties take up space. Then you only instantiate the sub-class once it is needed. You could write property wrappers that hide this from the rest of the code. Keep in mind that the worst-case scenario here would have you keeping all the properties in memory, plus several object headers and pointers.
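A sketch of that idea (this hypothetical reworking of the Data class, and the Extras name, are made up for illustration):

public class SparseData
{
    public long member3; // always-used members stay inline

    // Rarely-used members live in a separate object that is only
    // allocated when one of them is actually set.
    private Extras _extras;

    public double Member4
    {
        get { return _extras == null ? 0.0 : _extras.member4; }
        set
        {
            if (_extras == null) _extras = new Extras();
            _extras.member4 = value;
        }
    }

    private class Extras
    {
        public double member4;
        public DateTime member5;
    }
}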
You could perhaps turn members that are likely to take a default value into binary representations, and then pack them into a byte array. You would know which byte positions represent which data member, and could write properties that could read them. Position the properties that are most likely to have a default value at the end of the byte array (a few longs that are often 0 for example). Then, when creating the object, adjust the byte array size to exclude the properties that have the default value, starting from the end of the list, until you hit the first member that has a non-default value. When the outside code requests a property, you can check if the byte array is large enough to hold that property, and if not, return the default value. This way, you could potentially save some space. Best case, you will have a null pointer to a byte array instead of several data members. Worst case, you will have full byte arrays taking as much space as the original data, plus some overhead for the array. The usefulness depends on the actual data, and assumes that you do relatively few writes, as the re-computation of the array will be expensive.
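A fragment of that scheme might look like this (the layout, with member3 occupying bytes 0..7, is invented for illustration):

public class PackedData
{
    // Trimmed from the end: a short (or null) array means the trailing
    // members hold their default values.
    private byte[] _packed;

    public long Member3
    {
        get
        {
            // If the array was trimmed before this member's slot,
            // the member held the default value.
            if (_packed == null || _packed.Length < 8) return 0L;
            return BitConverter.ToInt64(_packed, 0);
        }
    }
}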
Hope any of this helps :)
When using ToList(), is there a performance impact that needs to be considered?
I was writing a query to retrieve files from a directory; this is the query:
string[] imageArray = Directory.GetFiles(directory);
However, since I like to work with List<> instead, I decided to put in...
List<string> imageList = Directory.GetFiles(directory).ToList();
So, is there some sort of performance impact that should be considered when deciding to do a conversion like this - or only to be considered when dealing with a large number of files? Is this a negligible conversion?
IEnumerable<T>.ToList()
Yes, IEnumerable<T>.ToList() does have a performance impact; it is an O(n) operation, though it will likely only require attention in performance-critical code.
The ToList() operation will use the List(IEnumerable<T> collection) constructor. This constructor must make a copy of the array (more generally, of the IEnumerable<T>); otherwise future modifications of the original array would also show up in the List<T>, which generally wouldn't be desirable.
I would like to reiterate that this will only make a difference with a huge list; copying chunks of memory is quite a fast operation.
Handy tip: As vs To
You'll notice in LINQ there are several methods that start with As (such as AsEnumerable()) and To (such as ToList()). The methods that start with To require a conversion like the above (i.e. they may impact performance), while the methods that start with As do not; they just require some cast or simple operation.
Additional details on List<T>
Here is a little more detail on how List<T> works in case you're interested :)
A List<T> uses a construct called a dynamic array, which is resized on demand; each resize copies the contents of the old array into a new, larger one. So it starts off small and grows as required.
This is the difference between the Capacity and Count properties on List<T>. Capacity refers to the size of the array behind the scenes; Count is the number of items in the List<T>, which is always <= Capacity. So when adding an item would push Count past Capacity, the capacity of the List<T> is doubled and the array is copied.
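A quick way to watch this happen (the exact growth factors are an implementation detail, so treat the printed capacities as indicative):

var list = new List<int>();
int lastCapacity = -1;
for (int i = 0; i < 40; i++)
{
    list.Add(i);
    if (list.Capacity != lastCapacity)
    {
        Console.WriteLine("Count=" + list.Count + " Capacity=" + list.Capacity);
        lastCapacity = list.Capacity;
    }
}
// Typically prints capacities 4, 8, 16, 32, 64 as the backing array doubles.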
Is there a performance impact when calling ToList()?
Yes, of course. Theoretically even i++ has a performance impact; it slows the program by maybe a few ticks.
What does .ToList do?
When you invoke .ToList, the code calls Enumerable.ToList(), which is an extension method that returns new List<TSource>(source). In the corresponding constructor, under the worst circumstances, it goes through the item container and adds the items one by one into a new container. So its effect on performance is small; it is very unlikely to be the performance bottleneck of your application.
What's wrong with the code in the question
Directory.GetFiles goes through the folder and returns all the files' names into memory immediately; this carries the risk that the string[] costs a lot of memory, slowing everything down.
What should be done then
It depends. If you (and your business logic) can guarantee that the number of files in the folder is always small, the code is acceptable. But it is still suggested to use the lazy version: Directory.EnumerateFiles, introduced in .NET 4. This is much more like a query, which will not be executed immediately; you can add further queries on top of it, like:
Directory.EnumerateFiles(myPath).Any(s => s.Contains("myfile"))
which will stop searching the path as soon as a file whose name contains "myfile" is found. This obviously performs better than .GetFiles.
Is there a performance impact when calling ToList()?
Yes there is. Using the extension method Enumerable.ToList() will construct a new List<T> object from the IEnumerable<T> source collection which of course has a performance impact.
However, understanding List<T> may help you determine if the performance impact is significant.
List<T> uses an array (T[]) to store the elements of the list. Arrays cannot be extended once they are allocated, so List<T> will use an over-sized array to store the elements of the list. When the List<T> grows beyond the size of the underlying array, a new array has to be allocated and the contents of the old array have to be copied over before the list can grow.
When a new List<T> is constructed from an IEnumerable<T> there are two cases:
The source collection implements ICollection<T>: then ICollection<T>.Count is used to get the exact size of the source collection, and a matching backing array is allocated before all elements of the source collection are copied into it using ICollection<T>.CopyTo(). This operation is quite efficient and probably maps to some CPU instruction for copying blocks of memory. However, in terms of performance, memory is required for the new array and CPU cycles are required for copying all the elements.
Otherwise the size of the source collection is unknown, and the enumerator of IEnumerable<T> is used to add each source element one at a time to the new List<T>. Initially the backing array is empty; when the first element is added an array of size 4 is created, and whenever the array is too small the size is doubled, so the backing array grows like 4, 8, 16, 32, etc. Every time the backing array grows it has to be reallocated, and all elements stored so far have to be copied. This operation is much more costly compared to the first case, where an array of the correct size can be created right away.
Also, if your source collection contains, say, 33 elements, the list will end up using an array of 64 elements, wasting some memory.
In your case the source collection is an array which implements ICollection<T> so the performance impact is not something you should be concerned about unless your source array is very large. Calling ToList() will simply copy the source array and wrap it in a List<T> object. Even the performance of the second case is not something to worry about for small collections.
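To see the two constructor paths side by side (a small sketch; it needs using System.Linq, and the exact capacities are implementation details):

int[] source = Enumerable.Range(0, 33).ToArray();

// ICollection<T> path: exact-size backing array, one bulk copy.
var fromCollection = new List<int>(source);
Console.WriteLine(fromCollection.Capacity); // 33

// Plain IEnumerable<T> path: the backing array grows 4, 8, 16, 32, 64.
var fromEnumerable = new List<int>(source.Select(x => x));
Console.WriteLine(fromEnumerable.Capacity); // 64, some memory is wasted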
It will be as (in)efficient as doing:
var list = new List<T>(items);
If you disassemble the source code of the constructor that takes an IEnumerable<T>, you will see it will do a few things:
Check whether collection implements ICollection<T>; if it does, ICollection<T>.Count is read to size the internal array up front. For an array, a list, etc. this is O(1).
If collection implements ICollection<T>, it saves the items into the internal array using the ICollection<T>.CopyTo method. This is O(n), n being the length of the collection.
If collection does not implement ICollection<T>, it iterates through the items of the collection and adds them one by one to the internal array.
So, yes, it will consume more memory, since it has to create a new list, and in the worst case, it will be O(n), since it will iterate through the collection to make a copy of each element.
"is there a performance impact that needs to be considered?"
The issue in your precise scenario is that, first and foremost, your real performance concern would be the hard-drive speed and the efficiency of the drive's cache.
From that perspective, the impact is surely negligible to the point that NO, it need not be considered.
BUT ONLY if you really need the features of the List<> structure, to possibly either make you more productive, or your algorithm more friendly, or some other advantage. Otherwise, you're just deliberately adding an insignificant performance hit, for no reason at all. In which case, naturally, you shouldn't do it! :)
ToList() creates a new List and puts the elements into it, which means there is an associated cost to calling ToList(). In the case of a small collection the cost won't be very noticeable, but for a huge collection it can cause a performance hit.
Generally you should not use ToList() unless the work you are doing cannot be done without converting the collection to a List. For example, if you just want to iterate through the collection, you don't need to call ToList().
If you are performing queries against a data source, for example a database using LINQ to SQL, the cost of calling ToList is much higher: instead of deferred execution, i.e. loading items only when needed (which can be beneficial in many scenarios), ToList loads the items from the database into memory immediately.
Considering the performance of retrieving a file list, ToList() is negligible here, but not necessarily in other scenarios. It really depends on where you are using it.
When calling it on an array, list, or other collection, you create a copy of the collection as a List<T>. The performance here depends on the size of the list, so do it only when really necessary.
In your example, you call it on an array. It iterates over the array and adds the items one by one to a newly created list, so the performance impact depends on the number of files.
When calling it on an IEnumerable<T>, you materialize the IEnumerable<T> (usually a query).
ToList will create a new list and copy the elements from the original source into it, so the only cost is copying the elements, and that depends on the source size.
Let's look at another example: if you are working against a database, run the ToList() method and check SQL Profiler for this code:
var IsExist = (from inc in entities.be_Settings
               where inc.SettingName == "Number"
               select inc).ToList().Count > 0;
The auto-generated query will look like this:
SELECT [Extent1].[SettingName] AS [SettingName], [Extent1].[SettingValue] AS [SettingValue] FROM [dbo].[be_Settings] AS [Extent1] WHERE N'Number' = [Extent1].[SettingName]
The SELECT query runs when the ToList method is called; the results are stored in memory, and the existence check is done by looking at the number of elements in the List. If, for example, there are 1000 records in your table matching the criteria, all 1000 records are first fetched from the database and converted into objects, then put into a List, only for you to check the element count. This is a very inefficient way to do it.
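A leaner sketch of the same check (same hypothetical entity model as above): Any() lets the provider translate the check into an EXISTS query instead of materializing the rows.

bool isExist = entities.be_Settings.Any(inc => inc.SettingName == "Number");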
It's not exactly about List performance, but if you have a very large collection you can use a HashSet instead of a List.
So, I need an array of items, and I was wondering which one would be the fastest/best to use (in C#). I'll be doing the following things:
Adding elements at the end
Removing elements at the start
Looking at the first and last element (every frame)
Clearing it occasionally
Converting it to a normal array (not a list; I'm using iTween and it asks for a normal array). I'll do this almost every frame.
So, what would be the best to use considering these things? Especially the last one, since I'm doing that every frame. Should I just use an array, or is there something else that converts very fast to array and also has easy adding/removing of elements at the start & end?
Requirements 1) and 2) point to a Queue<T>; it is the only standard collection optimized for these two operations.
3) You'll need a little trickery to get at the last element; the first is Peek().
4) is simple (.Clear())
5) The standard .ToArray() method will do this.
You will not escape copying all elements (O(n)) for item 5)
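A rough sketch of how those pieces fit together (tracking the last element manually is the "little trickery"; this assumes elements are only ever added at the end):

var queue = new Queue<int>();
int last = 0; // manually cached copy of the most recently added element

// 1) Adding elements at the end:
queue.Enqueue(1);
queue.Enqueue(2);
last = 2;

// 3) Looking at the first and last element:
int first = queue.Peek(); // 1, the head of the queue
int newest = last;        // 2, via the cached copy

// 2) Removing elements at the start:
int removed = queue.Dequeue(); // 1

// 5) Converting to a normal array, O(n):
int[] snapshot = queue.ToArray();

// 4) Clearing:
queue.Clear();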
You could take a look at LinkedList<T>.
It has O(1) support for inspecting, adding and removing items at the beginning or end. It requires O(n) to copy to an array, but that seems unavoidable. The copy could be avoided if the API you were using accepted an ICollection<T> or IEnumerable<T>, but if that can't be changed then you may be stuck with using ToArray.
If your list changes less than once per frame then you could cache the array and only call ToArray again if the list has changed since the previous frame. Here's an implementation of a few of the methods, to give you an idea of how this potential optimization can work:
// Wrapper class (the name is arbitrary) to make the snippet self-contained.
public class CachedArrayList<T>
{
    private LinkedList<T> list = new LinkedList<T>();
    private bool isDirty = true; // set whenever the list changes
    private T[] array;           // cached result of the last ToArray() call

    public void Enqueue(T t)
    {
        list.AddLast(t);
        isDirty = true;
    }

    public T[] ToArray()
    {
        if (isDirty)
        {
            array = list.ToArray(); // only copy when something has changed
            isDirty = false;
        }
        return array;
    }
}
I'm assuming you are using classes (and not structs)? (If you are using structs (value type) then that changes things a bit.)
The System.Collections.Generic.List class lets you do all that, and quickly. The only part that could be done better with a LinkedList is removing from the start, but a single block memory copy isn't much pain, and it will create arrays without any hassle.
I wouldn't recommend using a Linked List, especially if you are only removing from the start or end. Each addition (with the standard LinkedList collection) requires a memory allocation (it has to build an object to reference what you actually want to add).
Lists also have lots of convenient functions, which you need to be careful when using if performance is an issue. Lists are essentially arrays which get bigger as you add stuff (every time you overfill them, they get much bigger, which saves excessive memory operations). Clearing them requires no effort, and leaves the memory allocated to be used another day.
In my personal experience, .NET isn't suited to generic linked lists; you need to write your code specifically to work with them throughout. Lists:
Are easy to use
Do everything you want
Won't leave your memory looking like Swiss cheese (well, as well as you can manage when allocating a new array every frame; if these arrays are going to be big, give the garbage collector a chance to get rid of the old ones by re-using array references and nulling any you don't need before making a new one).
The right choice will depend heavily on the specifics of the application, but List is always a safe bet if you ask me, and you won't have to write any structure specific code to get it working.
If you do feel like using Lists, you'll want to look into these methods and properties:
ToArray() // Makes those arrays you want
Clear() // Clears the array
Add(item) // Adds an item to the end
RemoveAt(index) // index 0 for the first item, .Count - 1 for the last
Count // Retrieves the number of items in the list; it's not a free lookup, so try to avoid needless requests
Sorry if this whole post is overkill.
How about a circular array? If you keep the indices of the first and last elements, you get O(1) for everything except the conversion to a plain array, which stays O(n).
EDIT: You could take a C++ vector approach for capacity: double the size when it gets full.
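A minimal sketch of that idea (the class name is made up; no empty/bounds checks, and the initial capacity must be positive):

public class CircularBuffer<T>
{
    private T[] _items;
    private int _head;  // index of the first element
    private int _count;

    public CircularBuffer(int capacity)
    {
        _items = new T[capacity];
    }

    public T First { get { return _items[_head]; } }
    public T Last { get { return _items[(_head + _count - 1) % _items.Length]; } }

    public void AddLast(T item)
    {
        if (_count == _items.Length) Grow(); // the vector trick: double when full
        _items[(_head + _count) % _items.Length] = item;
        _count++;
    }

    public T RemoveFirst()
    {
        T item = _items[_head];
        _items[_head] = default(T); // drop the reference for the GC
        _head = (_head + 1) % _items.Length;
        _count--;
        return item;
    }

    public void Clear()
    {
        Array.Clear(_items, 0, _items.Length);
        _head = 0;
        _count = 0;
    }

    public T[] ToArray() // still O(n); there is no way around the copy
    {
        T[] result = new T[_count];
        for (int i = 0; i < _count; i++)
            result[i] = _items[(_head + i) % _items.Length];
        return result;
    }

    private void Grow()
    {
        T[] bigger = new T[_items.Length * 2];
        for (int i = 0; i < _count; i++)
            bigger[i] = _items[(_head + i) % _items.Length];
        _items = bigger;
        _head = 0;
    }
}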
A regular List will do the work, and it is faster than a LinkedList for insertion:
Adding elements at the end -> myList.Add(item)
Removing elements at the start -> myList.RemoveAt(0)
Looking at the first and last element (every frame) -> myList[0] or myList[myList.Count - 1]
Clearing it occasionally -> myList.Clear()
Converting it to a normal array (almost every frame, for iTween) -> myList.ToArray()
Which is faster and why:
IEnumerable<T> clxnOfTs = GetSeriouslyHugeCollection();
var list = new List<T>(clxnOfTs.Count()); // LINQ Count(); note it may enumerate the source once
foreach (T t in clxnOfTs) list.Add(t);
or
IEnumerable<T> clxnOfTs = GetSeriouslyHugeCollection();
var linkedList = new LinkedList<T>();
foreach (T t in clxnOfTs) linkedList.AddLast(t); // LinkedList<T> exposes AddLast rather than a public Add
Assume this will run on a newish multi-core server with loads of memory.
So really, the question is whether pre-allocating the array that underpins the List all at once and then filling it is faster than simply allocating each LinkedListNode as each T is added to the LinkedList.
My intuition says that allocating a very large chunk of contiguous memory all at once is more expensive than allocating many little chunks anywhere on the heap because it is unlikely that the chunk of contiguous memory will already exist.
Thanks!
Jeff
So, as with any performance-related question, if you really care about the answer you should create a realistic test harness, write the code both ways, and profile it.
But to address your question in a more general sense, I would offer some advice about the scenarios where different kinds of list structures make sense.
List<T> makes sense (from a performance standpoint) when you generally add/remove items at the end of the list and only rarely add or remove items in the middle. It also works best when you have some expectation of the list's capacity ahead of time. Since List<T> internally allocates memory contiguously, it behaves better from a cache-locality standpoint. Since List<T> uses an array as its backing structure, it is also very efficient for random (indexed) access.
LinkedList<T> works better for problems where you often need to insert or remove items from the middle or front of the list. Since it doesn't have to re-allocate or shift the contents of the list to do this, it will perform much better. Since LinkedList<T> uses a linked node structure, it does not provide efficient random (indexed) access to the data. As a result, it will perform poorly if you attempt to use LINQ operators like ElementAt(). Linked lists generally perform worse from a cache-locality standpoint, since they are usually implemented to allocate nodes on demand. Some implementations use pre-cached and recycled nodes allocated in pools to minimize this problem; however, I don't believe the .NET implementation does so.
Allocating many small bits of memory is much more expensive than allocating one large chunk; acquiring the global heap lock isn't that cheap. A List<> runs rings around a LinkedList<>; CPU cache locality is king.
So, I "exhaustively" tested allocating and filling up various containers with various values, with and without specifying capacity, and the answers given here are basically correct. (Kudos to them for venturing hypotheses!)
Constructing a List with initial capacity is actually relatively fast; around a millisecond for as many as 100 million objects containing doubles. Constructing a List without initial capacity or constructing a LinkedList were basically instantaneous, as would be expected.
However, filling the containers revealed very significant performance differences:
Most importantly, filling the LinkedList was dog slow. It didn't finish adding 100 million objects (simply a double wrapped in an object) in under a minute.
Filling a List constructed WITHOUT an initial capacity with 100 million objects took my test machine an average of 3732 ms. Fast.
Filling a List constructed WITH a specified initial capacity with 100 million objects took my test machine an average of 2295 ms. Very fast.
I concur with those who say the speed is due to the efficiency of manipulating contiguous memory in a cache line.
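For reference, a minimal sketch of the kind of harness described above (the count is scaled down from the 100 million used in the tests so it runs on a modest machine; timings will vary):

var sw = System.Diagnostics.Stopwatch.StartNew();

var list = new List<object>(10000000); // WITH initial capacity
for (int i = 0; i < 10000000; i++)
    list.Add(1.0d); // each double is boxed into an object

sw.Stop();
Console.WriteLine("List fill: " + sw.ElapsedMilliseconds + " ms");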
You have to measure for your particular case. There is no way around it. I would expect one large allocation to be faster if it is small enough :).
The CLR memory allocator is designed to handle a huge number of small allocations well, but allocating one large block is likely faster (the cost of an allocation mostly does not depend on its size in the CLR).
Now, if your size goes above ~80 KB, the object will be allocated on the Large Object Heap (LOH), which is slower than allocating from the general heap and has its own consequences (for example, collection of such chunks will not happen in Gen 0).
A group of related data, like a list of parts, can be handled either using arrays (an array of Parts) or using a collection. I understand that when arrays are used, insertion, deletion, and some other operations have a performance impact compared with collections. Does this mean that arrays are not used internally by the collections? If so, what data structure is used for collections like List, Collection, etc.?
How are the collections handled internally?
List<T> uses an internal array. Removing/inserting items near the beginning of the list will be more expensive than doing the same near the end of the list, since the entire contents of the internal array need to be shifted in one direction. Also, once you try to add an item when the internal list is full, a new, bigger array will be constructed, the contents copied, and the old array discarded.
The Collection<T> class, when used with the parameterless constructor, uses a List<T> internally. So performance-wise they will be identical, with the exception of overhead caused by wrapping. (Essentially one more level of indirection, which is going to be negligible in most scenarios.)
LinkedList<T> is, as its name implies, a linked list. This will sacrifice iteration speed for insertion/removal speed. Since iterating means traversing pointers-to-pointers-to-pointers ad infinitum, this is going to take more work overall. Aside from the pointer traversal, two nodes may not be allocated anywhere near each other, reducing the effectiveness of CPU RAM caches.
However, the amount of time required to insert or remove a node is constant, since it requires the same number of operations no matter the state of the list. (This does not take into account any work that must be done to actually locate the item to remove, or to traverse the list to find the insertion point!)
If your primary concern with your collection is testing if something is in the collection, you might consider a HashSet<T> instead. Addition of items to the set will be relatively fast, somewhere between insertion into a list and a linked list. Removal of items will again be relatively fast. But the real gain is in lookup time -- testing if a HashSet<T> contains an item does not require iterating the entire list. On average it will perform faster than any list or linked list structure.
However, a HashSet<T> cannot contain equivalent items. If part of your requirements is that two items that are considered equal (by an Object.Equals(Object) overload, or by implementing IEquatable<T>) coexist independently in the collection, then you simply cannot use a HashSet<T>. Also, HashSet<T> does not guarantee insertion order, so you also can't use a HashSet<T> if maintaining some sort of ordering is important.
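For example, a quick sketch of the lookup difference:

var fileSet = new HashSet<string> { "a.png", "b.png", "c.png" };
bool inSet = fileSet.Contains("b.png");   // hash lookup, O(1) on average

var fileList = new List<string> { "a.png", "b.png", "c.png" };
bool inList = fileList.Contains("b.png"); // linear scan, O(n)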
There are two basic ways to implement a simple collection:
contiguous array
linked list
Contiguous arrays have performance disadvantages for the operations you mentioned because the memory space of the collection is either preallocated or allocated based on the contents of the collection. Thus deletion or insertion requires moving many array elements to keep the entire collection contiguous and in the proper order.
Linked lists remove these issues because the items in the collection do not need to be stored in memory contiguously. Instead each element contains a reference to one or more of the other elements. Thus, when an insertion is made, the item in question is created anywhere in memory and only the references on one or two of the elements already in the collection need to be modified.
For example:
LinkedList<object> c = new LinkedList<object>(); // a linked list
object[] a = new object[] { }; // a contiguous array
This is simplified of course. The internals of LinkedList<> are doubtless more complex than a simple singly or doubly linked list, but that is the basic structure.
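To make the reference rewiring concrete, a small sketch:

var linked = new LinkedList<int>(new[] { 1, 2, 4 });
LinkedListNode<int> node = linked.Find(2); // O(n) to locate the node
linked.AddAfter(node, 3);                  // O(1): only the neighbouring links change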
I think that some collection classes might use arrays internally as well as linked lists or something similar. The benefit of using collections from the System.Collections namespace instead of arrays, is that you do not need to spend any extra time writing code to perform update operations.
Arrays will always be more lightweight, and if you know some very good search algorithms you might even be able to use them more efficiently, but most of the time you can avoid reinventing the wheel by using classes from System.Collections. These classes are meant to help the programmer avoid writing code that has already been written and tuned hundreds of times, so it is unlikely that you'll get a significant performance boost by manipulating arrays yourself.
When you need a static collection that doesn't require much adding, removing or editing, then perhaps it is a good time to use an array, since they don't require the extra memory that collections do.