Best .NET Array/List - c#

So, I need an array of items. And I was wondering which one would be the fastest/best to use (in c#), I'll be doing to following things:
Adding elements at the end
Removing elements at the start
Looking at the first and last element (every frame)
Clearing it occasionally
Converting it to a normal array (not a list. I'm using iTween and it asks a normal array.) I'll do this almost every frame.
So, what would be the best to use considering these things? Especially the last one, since I'm doing that every frame. Should I just use an array, or is there something else that converts very fast to array and also has easy adding/removing of elements at the start & end?

Requirements 1) and 2) point to a Queue<T>, it is the only standard collection optimized for these 2 operations.
3) You'll need a little trickery for getting at the Last element, First is Peek().
4) is simple (.Clear())
5) The standard .ToArray() method will do this.
You will not escape copying all elements (O(n)) for item 5)

You could take a look at LinkedList<T>.
It has O(1) support for inspecting, adding and removing items at the beginning or end. It requires O(n) to copy to an array, but that seems unavoidable. The copy could be avoided if the API you were using accepted an ICollection<T> or IEnumerable<T>, but if that can't be changed then you may be stuck with using ToArray.
If your list changes less than once per frame then you could cache the array and only call ToArray again if the list has changed since the previous frame. Here's an implementation of a few of the methods, to give you an idea of how this potential optimization can work:
private LinkedList<T> list = new LinkedList<T>();
private bool isDirty = true;
private T[] array;
public void Enqueue(T t)
{
list.AddLast(t);
isDirty = true;
}
public T[] ToArray()
{
if (isDirty)
{
array = list.ToArray();
isDirty = false;
}
return array;
}

I'm assuming you are using classes (and not structs)? (If you are using structs (value type) then that changes things a bit.)
The System.Collections.Generic.List class lets you do all that, and quickly. The only part that could be done better with a LinkedList is removing from the start, but a single block memory copy isn't much pain, and it will create arrays without any hassle.
I wouldn't recommend using a Linked List, especially if you are only removing from the start or end. Each addition (with the standard LinkedList collection) requires a memory allocation (it has to build an object to reference what you actually want to add).
Lists also have lots of convenient functions, which you need to be careful when using if performance is an issue. Lists are essentially arrays which get bigger as you add stuff (every time you overfill them, they get much bigger, which saves excessive memory operations). Clearing them requires no effort, and leaves the memory allocated to be used another day.
In personal experience, .NET isn't suited to generic Linked Lists, you need to be writing your code specifically to work with them throughout. Lists:
Are easy to use
Do everything you want
Won't leave your memory looking like swiss cheese (well, as best you can do when you are allocating a new array every frame - I recommend you give the garbage collector the chance to get rid of any old arrays before making a new one if these Arrays are going to be big by re-using any array references and nulling any you don't need).
The right choice will depend heavily on the specifics of the application, but List is always a safe bet if you ask me, and you won't have to write any structure specific code to get it working.
If you do feel like using Lists, you'll want to look into these methods and properties:
ToArray() // Makes those arrays you want
Clear() // Clears the array
Add(item) // Adds an item to the end
RemoveAt(index) // index 0 for the first item, .Count - 1 for the last
Count // Retrieves the number of items in the list - it's not a free lookup, so try an avoid needless requests
Sorry if this whole post is overkill.

How about a circular array? If you keep the index of the last element and the first, you can have O(1) for every criteria you gave.
EDIT: You could take a C++ vector approach for capacity: double the size when it gets full.

Regular List will do the work and it is faster than LinkedList for insert.
Adding elements at the end -> myList.Insert(myList.Count - 1)
Removing elements at the start -> myList.RemoveAt(0)
Looking at the first and last element (every frame) -> myList[0] or
myList[myList.Count - 1]
Clearing it occasionally -> myList.Clear()
Converting it to a normal array (not a list. I'm using iTween and it
asks a normal array.) I'll do this almost every frame. ->
myList.ToArray()

Related

Most efficient way, Tags or List<GameObject>?

In my game I can use a list of game objects or tags to iterate but i prefer knows what is the most efficient way.
Save more memory using tags or unity requires many resources to do a search by tag?
public List<City> _Citys = new List<City>();
or
foreach(GameObject go in GameObject.FindGameObjectsWithTag("City"))
You're better of using a List of City objects and doing a standard for loop to iterate over the 'City' objects. The List just simply holds references to the 'City' objects, so impact on memory should be minimal - you could use an array of GameObjects[] instead of a List (which is what FindGameObjectsWithTag returns).
It's better for performance to use a populated List/Array rather than searching by Tags and of course you're explicitly pointing to an object rather than using 'magic' strings -- if you change the tag name later on then the FindGameObjectsWithTag method will silently break, as it will no longer find any objects.
Also, avoid using a foreach loop in Unity as this unfortunately creates a lot of garbage (the garbage collector in Unity isn't great so it's best to create as little garbage as possbile), instead just use a standard for loop:
Replace the “foreach” loops with simple “for” loops. For some reason, every iteration of every “foreach” loop generated 24 Bytes of garbage memory. A simple loop iterating 10 times left 240 Bytes of memory ready to be collected which was just unacceptable
EDIT: As mentioned in pid's answer - measure. You can use the built-in Unity profiler to inspect memory usage: http://docs.unity3d.com/Manual/ProfilerMemory.html
Per Microsoft's C# API rules, verbs such as Find* or Count* denote active code while terms such as Length stand for actual values that require no code execution.
Now, if the Unity3D folks respected those guidelines is a matter of debate, but from the name of the method I can already tell that it has a cost and should not be taken too lightly.
On the other side, your question is about performance, not correctness. Both ways are correct per se, but one is supposed to have better performance.
So, the main rule of refactoring for performance is: MEASURE.
It depends on memory allocation and garbage collection, it is impossible to tell which really is faster without measuring.
So the best advice I could give you is pretty general. Whenever you feel the need to enhance performance of code you have to actually measure what you are about to improve, before and after.
Your code examples are 2 distinctly different things. One is instantiating a list, and one is enumerating over an IEnumerable returned from a function call.
I assume you mean the difference between iterating over your declared list vs iterating over the return value from GameObject.FindObjectsWithTag() in which case;
Storing a List as a member variable in your class, populating it once and then iterating over it several times is more efficient than iterating over GameObject.FindObjectsWithTag several times.
This is because you keep your List and your references to the objects in your list at all times without having to repopulate it.
GameObject.FindObjectsWithTag will search your entire object hierarchy and compile a list of all the objects that it finds that matches your search criteria. This is done every time you call the function, so there is additional overhead even if the amount of objects it finds is the same as it still searches your hierarchy.
To be honest, you could just cache your results with a List object using GameObject.FindObjectWithTag providing the amount of objects returned will not change. (As in to say you are not instantiating or destroying any of those objects)

Is List<T> really an undercover Array in C#?

I have been looking at .NET libraries using ILSpy and have come across List<T> class definition in System.Collections.Generic namespace. I see that the class uses methods like this one:
// System.Collections.Generic.List<T>
/// <summary>Removes all elements from the <see cref="T:System.Collections.Generic.List`1" />.</summary>
public void Clear()
{
if (this._size > 0)
{
Array.Clear(this._items, 0, this._size);
this._size = 0;
}
this._version++;
}
So, the Clear() method of the List<T> class actually uses Array.Clear method. I have seen many other List<T> methods that use Array stuff in the body.
Does this mean that List<T> is actually an undercover Array or List only uses some part of Array methods?
I know lists are type safe and don't require boxing/unboxing but this has confused me a bit.
The list class is not itself an array. In other words, it does not derive from an array. Instead it encapsulates an array that is used by the implementation to hold the list's member elements.
Since List<T> offers random access to its elements, and those elements are indexed 0..Count-1, using an array to store the elements is the obvious implementation.
This tends to surprise C++ programmers that know std::list. A linked list, covered in .NET as well with the LinkedList class. And has the same perf characteristics, O(1) for inserts and deletes.
You should however in general avoid it. Linked lists do not perform well on modern processors. Which greatly depend on the cpu caches to get reasonable performance with memory that's many times slower than the execution core. A simple array is by far the data structure that takes most advantage of the cache. Accessing an element gives very high odds that subsequent elements are present in the cache as well. That is not the case for a linked list, elements tend to be scattered throughout the address space, make a cache miss likely. They can be very expensive, as much as 200 cycles with the cpu doing nothing but waiting on the memory sub-system to supply the data.
But do keep the perf characteristics in mind, adding or removing an element that is not at the end of the List costs O(n), just like an array. And a large List can generate a lot of garbage as the array needs to be expanded, setting the Capacity property up front can help a lot to avoid that. More about that in this answer. And otherwise the exact same concerns for std::vector<>.
Yes, List<T> uses an array internally to store the items, although in most cases the array is actually larger than the number of elements in the collection -- it has some extra "padding" at the end so that you can add new items without it having to reallocate memory every time. It keeps track of the actual size of the collection with a separate field (you can see this._size in your generated code). When you add more elements than the current array has room for, it will automatically allocate a new larger array -- twice as big, I think -- and copy over all the existing elements.
If you're concerned about a List<T> using more memory than necessary, you can set the size of the array explicitly with the constructor override that accepts a capacity parameter, if you know the size in advance, or call the TrimExcess() method to make sure the array is (close to) to actual size of the collection.
Random access memory is an array, so in that sense all data structures from linked-lists to heaps and beyond, that rely on random-access to memory for their performance behaviour, are built on the array that is system memory. It is more a question of how many-levels of abstraction are in between.
Of course in a modern virtual memory machine, the random-access system memory is itself an abstraction built on a complicated virtual-memory model of multi-tier pipelined caches, non-cached RAM, and disk.

Is there a performance impact when calling ToList()?

When using ToList(), is there a performance impact that needs to be considered?
I was writing a query to retrieve files from a directory, which is the query:
string[] imageArray = Directory.GetFiles(directory);
However, since I like to work with List<> instead, I decided to put in...
List<string> imageList = Directory.GetFiles(directory).ToList();
So, is there some sort of performance impact that should be considered when deciding to do a conversion like this - or only to be considered when dealing with a large number of files? Is this a negligible conversion?
IEnumerable<T>.ToList()
Yes, IEnumerable<T>.ToList() does have a performance impact, it is an O(n) operation though it will likely only require attention in performance critical operations.
The ToList() operation will use the List(IEnumerable<T> collection) constructor. This constructor must make a copy of the array (more generally IEnumerable<T>), otherwise future modifications of the original array will change on the source T[] also which wouldn't be desirable generally.
I would like to reiterate this will only make a difference with a huge list, copying chunks of memory is quite a fast operation to perform.
Handy tip, As vs To
You'll notice in LINQ there are several methods that start with As (such as AsEnumerable()) and To (such as ToList()). The methods that start with To require a conversion like above (ie. may impact performance), and the methods that start with As do not and will just require some cast or simple operation.
Additional details on List<T>
Here is a little more detail on how List<T> works in case you're interested :)
A List<T> also uses a construct called a dynamic array which needs to be resized on demand, this resize event copies the contents of an old array to the new array. So it starts off small and increases in size if required.
This is the difference between the Capacity and Count properties on List<T>. Capacity refers to the size of the array behind the scenes, Count is the number of items in the List<T> which is always <= Capacity. So when an item is added to the list, increasing it past Capacity, the size of the List<T> is doubled and the array is copied.
Is there a performance impact when calling toList()?
Yes of course. Theoretically even i++ has a performance impact, it slows the program for maybe a few ticks.
What does .ToList do?
When you invoke .ToList, the code calls Enumerable.ToList() which is an extension method that return new List<TSource>(source). In the corresponding constructor, under the worst circumstance, it goes through the item container and add them one by one into a new container. So its behavior affects little on performance. It's impossible to be a performance bottle neck of your application.
What's wrong with the code in the question
Directory.GetFiles goes through the folder and returns all files' names immediately into memory, it has a potential risk that the string[] costs a lot of memory, slowing down everything.
What should be done then
It depends. If you(as well as your business logic) gurantees that the file amount in the folder is always small, the code is acceptable. But it's still suggested to use a lazy version: Directory.EnumerateFiles in C#4. This is much more like a query, which will not be executed immediately, you can add more query on it like:
Directory.EnumerateFiles(myPath).Any(s => s.Contains("myfile"))
which will stop searching the path as soon as a file whose name contains "myfile" is found. This is obviously has a better performance then .GetFiles.
Is there a performance impact when calling toList()?
Yes there is. Using the extension method Enumerable.ToList() will construct a new List<T> object from the IEnumerable<T> source collection which of course has a performance impact.
However, understanding List<T> may help you determine if the performance impact is significant.
List<T> uses an array (T[]) to store the elements of the list. Arrays cannot be extended once they are allocated so List<T> will use an over-sized array to store the elements of the list. When the List<T> grows beyond the size the underlying array a new array has to be allocated and the contents of the old array has to be copied to the new larger array before the list can grow.
When a new List<T> is constructed from an IEnumerable<T> there are two cases:
The source collection implements ICollection<T>: Then ICollection<T>.Count is used to get the exact size of the source collection and a matching backing array is allocated before all elements of the source collection is copied to the backing array using ICollection<T>.CopyTo(). This operation is quite efficient and probably will map to some CPU instruction for copying blocks of memory. However, in terms of performance memory is required for the new array and CPU cycles are required for copying all the elements.
Otherwise the size of the source collection is unknown and the enumerator of IEnumerable<T> is used to add each source element one at a time to the new List<T>. Initially the backing array is empty and an array of size 4 is created. Then when this array is too small the size is doubled so the backing array grows like this 4, 8, 16, 32 etc. Every time the backing array grows it has to be reallocated and all elements stored so far have to be copied. This operation is much more costly compared to the first case where an array of the correct size can be created right away.
Also, if your source collection contains say 33 elements the list will end up using an array of 64 elements wasting some memory.
In your case the source collection is an array which implements ICollection<T> so the performance impact is not something you should be concerned about unless your source array is very large. Calling ToList() will simply copy the source array and wrap it in a List<T> object. Even the performance of the second case is not something to worry about for small collections.
It will be as (in)efficient as doing:
var list = new List<T>(items);
If you disassemble the source code of the constructor that takes an IEnumerable<T>, you will see it will do a few things:
Call collection.Count, so if collection is an IEnumerable<T>, it will force the execution. If collection is an array, list, etc. it should be O(1).
If collection implements ICollection<T>, it will save the items in an internal array using the ICollection<T>.CopyTo method. It should be O(n), being n the length of the collection.
If collection does not implement ICollection<T>, it will iterate through the items of the collection, and will add them to an internal list.
So, yes, it will consume more memory, since it has to create a new list, and in the worst case, it will be O(n), since it will iterate through the collection to make a copy of each element.
"is there a performance impact that needs to be considered?"
The issue with your precise scenario is that first and foremost your real concern about performance would be from the hard-drive speed and efficiency of the drive's cache.
From that perspective, the impact is surely negligible to the point that NO it need not be considered.
BUT ONLY if you really need the features of the List<> structure to possibly either make you more productive, or your algorithm more friendly, or some other advantage. Otherwise, you're just purposely adding an insignificant performance hit, for no reason at all. In which case, naturally, you shouldn’t do it! :)
ToList() creates a new List and put the elements in it which means that there is an associated cost with doing ToList(). In case of small collection it won't be very noticeable cost but having a huge collection can cause a performance hit in case of using ToList.
Generally you should not use ToList() unless work you are doing cannot be done without converting collection to List. For example if you just want to iterate through the collection you don't need to perform ToList
If you are performing queries against a data source for example a Database using LINQ to SQL then the cost of doing ToList is much more because when you use ToList with LINQ to SQL instead of doing Delayed Execution i.e. load items when needed (which can be beneficial in many scenarios) it instantly loads items from Database into memory
Considering the performance of retrieving file list, ToList() is negligible. But not really for other scenarios. That really depends on where you are using it.
When calling on an array, list, or other collection, you create a copy of the collection as a List<T>. The performance here depends on the size of the list. You should do it when really necessary.
In your example, you call it on an array. It iterates over the array and adds the items one by one to a newly created list. So the performance impact depends on the number of files.
When calling on an IEnumerable<T>, you materialize the IEnumerable<T> (usually a query).
ToList Will create a new list and copy elements from original source to the newly created list so only thing is to copy the elements from the original source and depends on the source size
Let's look for another example;
If you working on databases when you run ToList() method and check the SQL Profiler for this code;
var IsExist = (from inc in entities.be_Settings
where inc.SettingName == "Number"
select inc).ToList().Count > 0;
Auto created query will like this:
SELECT [Extent1].[SettingName] AS [SettingName], [Extent1].[SettingValue] AS [SettingValue] FROM [dbo].[be_Settings] AS [Extent1] WHERE N'Number' = [Extent1].[SettingName]
The select query is run with the ToList method, and the results of the query are stored in memory, and it is checked whether there is a record by looking at the number of elements of the List. For example, if there are 1000 records in your table with the relevant criteria, these 1000 records are first brought from the database and converted into objects, and then they are thrown into a List and you only check the number of elements of this List. So this is very inefficient way to choose.
its not exactly about list perfomance but if u have high dimensional array
u can use HashSet instead of List.

.NET queue ElementAt performance

I'm having a hard time with parts of my code:
private void UpdateOutputBuffer()
{
T[] OutputField = new T[DisplayedLength];
int temp = 0;
int Count = HistoryQueue.Count;
int Sample = 0;
//Then fill the useful part with samples from the queue
for (temp = DisplayStart; temp != DisplayStart + DisplayedLength && temp < Count; temp++)
{
OutputField[Sample++] = HistoryQueue.ElementAt(Count - temp - 1);
}
DisplayedHistory = OutputField;
}
It takes most of the time in the program. The number of elements in HistoryQueue is 200k+. Could this be because the queue in .NET is implemented internally as a linked list?
What would be a better way of going about this? Basically, the class should act like a FIFO that starts dropping elements at ~500k samples and I could pick DisplayedLength elements and put them into OutputField. I was thinking of writing my own Queue that would use a circular buffer.
The code worked fine for count lower values. DisplayedLength is 500.
Thank you,
David
Queue does not have an ElementAt method. I'm guessing you are getting this via Linq, and that it is simply doing a forced iteration over n elements until it gets to the desired index. This is obviously going to slow down as the collection gets bigger. If ElementAt represents a common access pattern, then pick a data structure that can be accessed via index e.g. an Array.
Yes, the linked-list-ness is almost certainly the problem. There's a reason why Queue<T> doesn't implement IList<T> :) (Having said that, I believe Stack<T> is implemented using an array, and that still doesn't implement IList<T>. It could provide efficient random access, but it doesn't.)
I can't easily tell which portion of the queue you're trying to display, but I strongly suspect that you could simplify the method and make it more efficient using something like:
T[] outputField = HistoryQueue.Skip(...) /* adjust to suit requirements... */
.Take(DisplayedLength)
.Reverse()
.ToArray();
That's still going to have to skip over a huge number of items individually, but at least it will only have to do it once.
Have you thought of using a LinkedList<T> directly? That would make it a lot easier to read items from the end of the list really easily.
Building your own bounded queue using a circular buffer wouldn't be hard, of course, and may well be the better solution in the long run.
Absolutely the wrong data structure to use here. ElementAt is O(n), which makes your loop O(n2). You should use something else instead of a Queue.
Personally I don't think a queue is what you're looking for, but your access pattern is even worse. Use iterators if you want sequential access:
foreach(var h in HistoryQueue.Skip(DisplayStart).Take(DisplayedLength).Reverse())
// work with h
If you need to be able to pop/push at either end and have indexed access you really need an implementation of Deque (multiple array form). While there is no implementation in the BCL, there are plenty of third party ones (to get started, if needed you could implement your own later).

.NET Collection Classes

Group of related data like a list of parts etc., can be handled either using Arrays(Array of Parts) or using Collection. I understand that When Arrays are used, Insertion, Deletion and some other operations have performance impact when it is compared with Collections. Does this mean that Arrays are not used internally by the collections?, If so what is the data structure used for collections like List, Collection etc?
How the collections are handled internally?
List<T> uses an internal array. Removing/inserting items near the beginning of the list will be more expensive than doing the same near the end of the list, since the entire contents of the internal array need to be shifted in one direction. Also, once you try to add an item when the internal list is full, a new, bigger array will be constructed, the contents copied, and the old array discarded.
The Collection<T> class, when used with the parameterless constructor, uses a List<T> internally. So performance-wise they will be identical, with the exception of overhead caused by wrapping. (Essentially one more level of indirection, which is going to be negligible in most scenarios.)
LinkedList<T> is, as its name implies, a linked list. This will sacrifice iteration speed for insertion/removal speed. Since iterating means traversing pointers-to-pointers-to-pointers ad infinitum, this is going to take more work overall. Aside from the pointer traversal, two nodes may not be allocated anywhere near each other, reducing the effectiveness of CPU RAM caches.
However, the amount of time required to insert or remove a node is constant, since it requires the same number of operations no matter the state of the list. (This does not take into account any work that must be done to actually locate the item to remove, or to traverse the list to find the insertion point!)
If your primary concern with your collection is testing if something is in the collection, you might consider a HashSet<T> instead. Addition of items to the set will be relatively fast, somewhere between insertion into a list and a linked list. Removal of items will again be relatively fast. But the real gain is in lookup time -- testing if a HashSet<T> contains an item does not require iterating the entire list. On average it will perform faster than any list or linked list structure.
However, a HashSet<T> cannot contain equivalent items. If part of your requirements is that two items that are considered equal (by an Object.Equals(Object) overload, or by implementing IEquatable<T>) coexist independently in the collection, then you simply cannot use a HashSet<T>. Also, HashSet<T> does not guarantee insertion order, so you also can't use a HashSet<T> if maintaining some sort of ordering is important.
There are two basic ways to implement a simple collection:
contiguous array
linked list
Contiguous arrays have performance disadvantages for the operations you mentioned because the memory space of the collection is either preallocated or allocated based on the contents of the collection. Thus deletion or insertion requires moving many array elements to keep the entire collection contiguous and in the proper order.
Linked lists remove these issues because the items in the collection do not need to be stored in memory contiguously. Instead each element contains a reference to one or more of the other elements. Thus, when an insertion is made, the item in question is created anywhere in memory and only the references on one or two of the elements already in the collection need to be modified.
For example:
LinkedList<object> c = new LinkedList<object>(); // a linked list
object[] a = new object[] { }; // a contiguous array
This is simplified of course. The internals of LinkedList<> are doubtless more complex than a simple singly or doubly linked list, but that is the basic structure.
I think that some collection classes might use arrays internally as well as linked lists or something similar. The benefit of using collections from the System.Collections namespace instead of arrays, is that you do not need to spend any extra time writing code to perform update operations.
Arrays will always be more lightweight, and if you know some very good search algorithms, then you might even be able to use them more efficiently, but most of the the time you can avoid reinventing the wheel by using classes from System.Collections. These classes are meant to help the programmer avoid writing code that has already been written and tuned hundreds of times, so it is unlikely that you'll get a significant performance boost by manipulating arrays yourself.
When you need a static collection that doesn't require much adding, removing or editing, then perhaps it is a good time to use an array, since they don't require the extra memory that collections do.

Categories