How to remove from List<T> efficiently (C#)?

If I understood correctly (and please correct me if I'm wrong), List<T> is implemented with an array in .NET, which means that every deletion of an item causes all the elements after it to be copied down by one position (which in turn means O(n)).
I'm developing a game in which many bullets are in the air at any given moment, let's say 100. Each frame I move them by a few pixels and check for collisions with objects in the game, and I need to remove from the list every bullet that collided.
So I collect the collided bullets in another temporary list and then do the following:
foreach (Bullet bullet in bulletsForDeletion)
    mBullets.Remove(bullet);
Because the loop is O(n) and the remove is O(n), I spend O(n^2) time to remove.
Is there a better way to remove it, or more suitable collection to use?

Sets and linked lists both have constant time removal. Can you use either of those data structures?
There's no way to avoid the O(N) cost of removal from List<T>. You'll need to use a different data structure if this is a problem for you. It may make the code which calculates bulletsForDeletion feel nicer too.
ISet<T> has nice methods for calculating differences and intersections between sets of objects.
You lose ordering by using sets, but given you are talking about bullets, I'm guessing that is not an issue. You can still enumerate the set in linear time, i.e. constant time per element.
In your case, you might write:
mBullets.ExceptWith(bulletsForDeletion);
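A minimal sketch of the set-based approach, assuming mBullets can be declared as a HashSet<Bullet>. The Bullet class, its Collided flag and the BulletSetDemo helper are hypothetical placeholders, not the asker's real types:

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the game's bullet type.
public sealed class Bullet
{
    public float X, Y;
    public bool Collided;   // assumed to be set by the collision check
}

public static class BulletSetDemo
{
    // Removes every collided bullet; each individual removal is O(1) on average.
    public static void RemoveCollided(HashSet<Bullet> bullets)
    {
        var bulletsForDeletion = new List<Bullet>();
        foreach (Bullet b in bullets)
        {
            // The set must not be modified while it is being enumerated,
            // so the collided bullets are collected first.
            if (b.Collided) bulletsForDeletion.Add(b);
        }
        bullets.ExceptWith(bulletsForDeletion);
    }
}
```

If the temporary list is not needed for anything else, HashSet<T>.RemoveWhere(b => b.Collided) does the same job in one call.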

Create a new list:
var newList = oldList.Except(deleteItems).ToList();
Try to use functional idioms wherever possible. Don't modify existing data structures, create new ones.
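This algorithm is O(N) thanks to hashing. Be aware that Except has set semantics, so duplicate items in oldList are dropped from the result as well.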

Can't you just switch from List (the equivalent of Java's ArrayList) to LinkedList? LinkedList takes O(1) to delete an element when you already hold its LinkedListNode<T>, but O(n) to find an element by value or by position.
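As a rough sketch of where the O(1) removal comes from: walking the list node by node means each node is already in hand when it has to be removed, so no search is needed. The LinkedListRemoval class and its RemoveWhere method are hypothetical names used only for illustration:

```csharp
using System;
using System.Collections.Generic;

public static class LinkedListRemoval
{
    // Removes every element matching the predicate in a single pass.
    // Each Remove(node) call is O(1) because the node is already known.
    public static void RemoveWhere<T>(LinkedList<T> list, Predicate<T> shouldRemove)
    {
        LinkedListNode<T> node = list.First;
        while (node != null)
        {
            LinkedListNode<T> next = node.Next;   // capture before the node is unlinked
            if (shouldRemove(node.Value))
            {
                list.Remove(node);
            }
            node = next;
        }
    }
}
```

By contrast, list.Remove(value) has to search from the head first, which is O(n) per call.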

How a list is implemented internally is not something you should be thinking about. You should be interacting with the list in that abstraction level.
If you do have performance problems and you pinpoint them to the list, then it is time to look at how it is implemented.
As for removing items from a list - you can use mBullets.RemoveAll(predicate), where predicate is an expression that identifies items that have collided.
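A small sketch of RemoveAll, using a list of integers so that it stands alone. List<T>.RemoveAll makes a single pass over the list, so removing any number of items costs O(n) in total rather than O(n) per item. The RemoveAllDemo class is a hypothetical name:

```csharp
using System.Collections.Generic;

public static class RemoveAllDemo
{
    // Removes all negative numbers in one pass and returns how many were removed.
    public static int RemoveNegatives(List<int> values)
    {
        return values.RemoveAll(v => v < 0);
    }
}
```

For the bullets this would look something like mBullets.RemoveAll(b => b.Collided), assuming the collision check records its result on the bullet (the Collided property is an assumption, it is not in the question).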

RemoveAt(int index) is faster than Remove(T item) because the latter calls the former internally. Decompiling List<T> shows the code inside each method.
Also, Remove(T) calls IndexOf, which contains a second loop that compares each item in turn to find the index.
public bool Remove(T item)
{
    int index = this.IndexOf(item);
    if (index < 0)
        return false;
    this.RemoveAt(index);
    return true;
}
public void RemoveAt(int index)
{
    if ((uint) index >= (uint) this._size)
        ThrowHelper.ThrowArgumentOutOfRangeException();
    --this._size;
    if (index < this._size)
        Array.Copy((Array) this._items, index + 1, (Array) this._items, index, this._size - index);
    this._items[this._size] = default (T);
    ++this._version;
}
I would do a loop like this:
for (int i = MyList.Count - 1; i >= 0; i--)
{
    // add some code here
    if (needtodelete)
        MyList.RemoveAt(i);
}

In the spirit of functional idiom suggested by "usr", you might consider not removing from your list at all. Instead, have your update routine take a list and return a list. The list returned contains only the "still-alive" objects. You can then swap lists at the end of the game loop (or immediately, if appropriate). I've done this myself.
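One way to sketch the "take a list, return a list" idea without allocating a new list every frame is to keep two lists and swap them. The DoubleBufferedList class and the stillAlive callback are hypothetical names for illustration only:

```csharp
using System;
using System.Collections.Generic;

public sealed class DoubleBufferedList<T>
{
    private List<T> current = new List<T>();
    private List<T> next = new List<T>();

    public IReadOnlyList<T> Items { get { return current; } }

    public void Add(T item) { current.Add(item); }

    // Runs one update step. 'stillAlive' updates an item and reports
    // whether it survives into the next frame.
    public void Update(Func<T, bool> stillAlive)
    {
        next.Clear();   // keeps the capacity, so no reallocation in steady state
        foreach (T item in current)
        {
            if (stillAlive(item)) next.Add(item);
        }
        // Swap the two buffers; the old list is reused on the next update.
        List<T> tmp = current;
        current = next;
        next = tmp;
    }
}
```

The whole update is a single O(n) pass, and the order of the surviving items is preserved.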

Ideally, it shouldn't be important to think about how C# implements its List methods, and somewhat unfortunate that it reassigns all the array elements after index i when doing somelist.RemoveAt(i). Sometimes optimizations around standard library implementations are necessary, though.
If you find your execution is expending unacceptable effort shuffling list elements around when removing them from a list with RemoveAt, one approach that might work for you is to swap it with the end, and then RemoveAt off the end of the list:
if (i < mylist.Count - 1) {
    mylist[i] = mylist[mylist.Count - 1];
}
mylist.RemoveAt(mylist.Count - 1);
This is an O(1) operation, but note that it does not preserve the order of the remaining elements.
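A sketch of how the swap-with-the-end trick can be combined with a loop that removes several items. Iterating backwards matters here: the element swapped into slot i comes from a higher index, which has already been checked. SwapRemove and its method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class SwapRemove
{
    // Removes the element at 'index' in O(1) by overwriting it with the last element.
    // The relative order of the remaining elements is not preserved.
    public static void RemoveAtSwapBack<T>(List<T> list, int index)
    {
        int last = list.Count - 1;
        list[index] = list[last];
        list.RemoveAt(last);   // removing the final element shifts nothing
    }

    // Removes all matching elements in O(n) total.
    public static void RemoveWhere<T>(List<T> list, Predicate<T> shouldRemove)
    {
        for (int i = list.Count - 1; i >= 0; i--)
        {
            if (shouldRemove(list[i])) RemoveAtSwapBack(list, i);
        }
    }
}
```

This only suits collections where order does not matter, which is probably the case for bullets.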

Related

Best .NET Array/List

So, I need an array of items. And I was wondering which one would be the fastest/best to use (in C#). I'll be doing the following things:
Adding elements at the end
Removing elements at the start
Looking at the first and last element (every frame)
Clearing it occasionally
Converting it to a normal array (not a list. I'm using iTween and it asks a normal array.) I'll do this almost every frame.
So, what would be the best to use considering these things? Especially the last one, since I'm doing that every frame. Should I just use an array, or is there something else that converts very fast to array and also has easy adding/removing of elements at the start & end?
Requirements 1) and 2) point to a Queue<T>, it is the only standard collection optimized for these 2 operations.
3) You'll need a little trickery for getting at the Last element, First is Peek().
4) is simple (.Clear())
5) The standard .ToArray() method will do this.
You will not escape copying all elements (O(n)) for item 5)
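The "little trickery" for the last element could be as simple as remembering the most recently enqueued item next to the queue. This is only a sketch under that assumption, and TrackedQueue is a hypothetical wrapper name:

```csharp
using System;
using System.Collections.Generic;

public sealed class TrackedQueue<T>
{
    private readonly Queue<T> queue = new Queue<T>();
    private T last;   // most recently enqueued item; only meaningful while Count > 0

    public int Count { get { return queue.Count; } }

    public void Enqueue(T item)
    {
        queue.Enqueue(item);
        last = item;
    }

    public T Dequeue() { return queue.Dequeue(); }

    // Oldest element, O(1).
    public T First { get { return queue.Peek(); } }

    // Newest element, O(1).
    public T Last
    {
        get
        {
            if (queue.Count == 0) throw new InvalidOperationException("Queue is empty.");
            return last;
        }
    }

    public void Clear()
    {
        queue.Clear();
        last = default(T);
    }

    // Still an O(n) copy, as noted above.
    public T[] ToArray() { return queue.ToArray(); }
}
```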
You could take a look at LinkedList<T>.
It has O(1) support for inspecting, adding and removing items at the beginning or end. It requires O(n) to copy to an array, but that seems unavoidable. The copy could be avoided if the API you were using accepted an ICollection<T> or IEnumerable<T>, but if that can't be changed then you may be stuck with using ToArray.
If your list changes less than once per frame then you could cache the array and only call ToArray again if the list has changed since the previous frame. Here's an implementation of a few of the methods, to give you an idea of how this potential optimization can work:
private LinkedList<T> list = new LinkedList<T>();
private bool isDirty = true;
private T[] array;
public void Enqueue(T t)
{
    list.AddLast(t);
    isDirty = true;
}
public T[] ToArray()
{
    if (isDirty)
    {
        array = list.ToArray();
        isDirty = false;
    }
    return array;
}
I'm assuming you are using classes (and not structs)? (If you are using structs (value type) then that changes things a bit.)
The System.Collections.Generic.List class lets you do all that, and quickly. The only part that could be done better with a LinkedList is removing from the start, but a single block memory copy isn't much pain, and it will create arrays without any hassle.
I wouldn't recommend using a Linked List, especially if you are only removing from the start or end. Each addition (with the standard LinkedList collection) requires a memory allocation (it has to build an object to reference what you actually want to add).
Lists also have lots of convenient functions, which you need to be careful when using if performance is an issue. Lists are essentially arrays which get bigger as you add stuff (every time you overfill them, they get much bigger, which saves excessive memory operations). Clearing them requires no effort, and leaves the memory allocated to be used another day.
In personal experience, .NET isn't suited to generic Linked Lists, you need to be writing your code specifically to work with them throughout. Lists:
Are easy to use
Do everything you want
Won't leave your memory looking like swiss cheese (well, as best you can do when you are allocating a new array every frame - I recommend you give the garbage collector the chance to get rid of any old arrays before making a new one if these Arrays are going to be big by re-using any array references and nulling any you don't need).
The right choice will depend heavily on the specifics of the application, but List is always a safe bet if you ask me, and you won't have to write any structure specific code to get it working.
If you do feel like using Lists, you'll want to look into these methods and properties:
ToArray() // Makes those arrays you want
Clear() // Clears the array
Add(item) // Adds an item to the end
RemoveAt(index) // index 0 for the first item, .Count - 1 for the last
Count // Retrieves the number of items in the list - for List<T> this is a cheap O(1) property read
Sorry if this whole post is overkill.
How about a circular array? If you keep the index of the first and the last element, you get O(1) for every operation you listed except the conversion to an array, which is O(n) whatever you use.
EDIT: You could take a C++ vector approach for capacity: double the size when it gets full.
A regular List will do the job, and it is faster than LinkedList for appending.
Adding elements at the end -> myList.Add(item)
Removing elements at the start -> myList.RemoveAt(0) (note that this shifts all remaining elements, so it is O(n))
Looking at the first and last element (every frame) -> myList[0] or myList[myList.Count - 1]
Clearing it occasionally -> myList.Clear()
Converting it to a normal array (not a list. I'm using iTween and it asks a normal array.) I'll do this almost every frame. -> myList.ToArray()

.NET queue ElementAt performance

I'm having a hard time with parts of my code:
private void UpdateOutputBuffer()
{
    T[] OutputField = new T[DisplayedLength];
    int temp = 0;
    int Count = HistoryQueue.Count;
    int Sample = 0;
    //Then fill the useful part with samples from the queue
    for (temp = DisplayStart; temp != DisplayStart + DisplayedLength && temp < Count; temp++)
    {
        OutputField[Sample++] = HistoryQueue.ElementAt(Count - temp - 1);
    }
    DisplayedHistory = OutputField;
}
It takes most of the time in the program. The number of elements in HistoryQueue is 200k+. Could this be because the queue in .NET is implemented internally as a linked list?
What would be a better way of going about this? Basically, the class should act like a FIFO that starts dropping elements at ~500k samples and I could pick DisplayedLength elements and put them into OutputField. I was thinking of writing my own Queue that would use a circular buffer.
The code worked fine for lower Count values. DisplayedLength is 500.
Thank you,
David
Queue does not have an ElementAt method. I'm guessing you are getting this via Linq, and that it is simply doing a forced iteration over n elements until it gets to the desired index. This is obviously going to slow down as the collection gets bigger. If ElementAt represents a common access pattern, then pick a data structure that can be accessed via index e.g. an Array.
The sequential access is almost certainly the problem, although not because of a linked list: Queue<T> is backed by a circular array. It doesn't implement IList<T>, though, so ElementAt cannot index into it and has to walk the queue from the front on every call. (Stack<T> is implemented using an array as well, and that still doesn't implement IList<T>. It could provide efficient random access, but it doesn't.)
I can't easily tell which portion of the queue you're trying to display, but I strongly suspect that you could simplify the method and make it more efficient using something like:
T[] outputField = HistoryQueue.Skip(...) /* adjust to suit requirements... */
                              .Take(DisplayedLength)
                              .Reverse()
                              .ToArray();
That's still going to have to skip over a huge number of items individually, but at least it will only have to do it once.
Have you thought of using a LinkedList<T> directly? That would make it a lot easier to read items from the end of the list.
Building your own bounded queue using a circular buffer wouldn't be hard, of course, and may well be the better solution in the long run.
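A minimal sketch of such a bounded queue on a circular buffer, with an O(1) indexer so that the display loop no longer needs ElementAt. The HistoryBuffer name and its members are assumptions for illustration, not an existing BCL type:

```csharp
using System;

public sealed class HistoryBuffer<T>
{
    private readonly T[] items;
    private int head;    // index of the oldest element
    private int count;   // number of valid elements

    public HistoryBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException("capacity");
        items = new T[capacity];
    }

    public int Count { get { return count; } }

    // Appends a sample; once the buffer is full the oldest sample is overwritten.
    public void Add(T item)
    {
        int tail = (head + count) % items.Length;
        items[tail] = item;
        if (count == items.Length)
            head = (head + 1) % items.Length;   // the oldest element was just overwritten
        else
            count++;
    }

    // Index 0 is the oldest element, Count - 1 the newest. O(1).
    public T this[int index]
    {
        get
        {
            if ((uint)index >= (uint)count) throw new ArgumentOutOfRangeException("index");
            return items[(head + index) % items.Length];
        }
    }
}
```

With a buffer like this, the loop in the question could read history[Count - temp - 1] directly, making the whole update O(DisplayedLength) instead of walking 200k+ elements per sample.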
Absolutely the wrong data structure to use here. ElementAt on a Queue is O(n), which makes your loop O(n^2) in the worst case. You should use something else instead of a Queue.
Personally I don't think a queue is what you're looking for, but your access pattern is even worse. Use iterators if you want sequential access:
foreach (var h in HistoryQueue.Skip(DisplayStart).Take(DisplayedLength).Reverse())
{
    // work with h
}
If you need to be able to pop/push at either end and have indexed access you really need an implementation of Deque (multiple array form). While there is no implementation in the BCL, there are plenty of third party ones (to get started, if needed you could implement your own later).

When to use each type of loop?

I'm learning the basics of programming here (C#) but I think this question is generic in its nature.
What are some simple practical situations that lend themselves closer to a particular type of loop?
The while and for loops seem pretty similar and there are several SO questions addressing the differences between the two. How about foreach? From my basic understanding, it seems I ought to be able to do everything a foreach loop does within a for loop.
Whichever works best for code readability. In other words, use the one that fits the situation best.
while: When you have a condition that needs to be checked at the start of each loop. e.g. while(!file.EndOfFile) { }
for: When you have an index or counter you are incrementing on each loop. for (int i = 0; i<array.Length; i++) { }. Essentially, the thing you are looping over is an indexable collection, array, list, etc.
foreach: When you are looping over a collection of objects or other Enumerable. In this event you may not know (or care) the size of the collection, or the collection is not index based (e.g. a set of objects). Generally I find foreach loops to be the most readable when I'm not interested in the index of something or any other exit conditions.
Those are my general rules of thumb anyway.
1. foreach and for
A foreach loop works with IEnumerator, whereas a for loop works with an index (in object myObject = myListOfObjects[i], i is the index).
There is a big difference between the two:
an index can access directly any object based on its position within a list.
an enumerator can only access the first element of a list, and then move to the next element (as described in the previous link from the msdn). It cannot access an element directly, just knowing the index of the element within a list.
So an enumerator may seem less powerful, but:
you don't always know the position of elements in a group, because all groups are not ordered/indexed.
you don't always know the number of elements in a list (think about a linked list).
even when it's ordered, the indexed access of a list may be based internally on an enumerator, which means that each time you're accessing an element by its position you may be actually enumerating all elements of the list up until the element you want.
indexes are not always numeric. Think about Dictionary.
So actually the big strength of the foreach loop and the underlying use of IEnumerator is that it applies to any type which implements IEnumerable (implementing IEnumerable just means that you provide a method that returns an enumerator). Lists, Arrays, Dictionaries, and all other group types all implement IEnumerable. And you can be sure that the enumerator they have is as good as it gets: you won't find a faster way to go through a list.
So, the for loop can generally be considered as a specialized foreach loop:
public void GoThrough(List<object> myList)
{
    for (int i = 0; i < myList.Count; i++)
    {
        MessageBox.Show(myList[i].ToString());
    }
}
is perfectly equivalent to:
public void GoThrough(List<object> myList)
{
    foreach (object item in myList)
    {
        MessageBox.Show(item.ToString());
    }
}
I said generally because there is an obvious case when the for loop is necessary: when you need the index (i.e. the position in the list) of the object, for some reason (like displaying it). You will though eventually realize that this happens only in specific cases when you do good .NET programming, and that foreach should be your default candidate for loops over a group of elements.
Now to keep comparing: the foreach loop is indeed just syntactic sugar for a specific while loop:
public void GoThrough(IEnumerable myEnumerable)
{
    foreach (object obj in myEnumerable)
    {
        MessageBox.Show(obj.ToString());
    }
}
is essentially equivalent to the following (foreach additionally disposes the enumerator when it implements IDisposable):
public void GoThrough(IEnumerable myEnumerable)
{
    IEnumerator myEnumerator = myEnumerable.GetEnumerator();
    while (myEnumerator.MoveNext())
    {
        MessageBox.Show(myEnumerator.Current.ToString());
    }
}
The first writing is a lot simpler though.
2. while and do..while
The while (condition) {action} loop and the do {action} while (condition) loop differ only in that the first one tests the condition before applying the action, whereas the second one applies the action and then tests the condition. The do {..} while (..) loop is used quite marginally compared to the others, since it runs the action at least once even if the condition is initially not met (which can lead to trouble, since the action is generally dependent on the condition).
The while loop is more general than the for and foreach ones, which apply specifically to lists of objects. The while loop just has a condition to go on, which can be based on anything. For example:
string name = string.Empty;
while (name == string.Empty)
{
    Console.WriteLine("Enter your name");
    name = Console.ReadLine();
}
asks the user to input his name then press Enter, until he actually inputs something. Nothing to do with lists, as you can see.
3. Conclusion
When you are going through a list, you should use foreach unless you need the numeric index, in which case you should use for.
When it doesn't have anything to do with list, and it's just a procedural construction, you should use while(..) {..}.
Now to conclude with something less restrictive: your first goal with .NET should be to make your code readable/maintainable and make it run fast, in that order of priority. Anything that achieves that is good for you. Personally though, I think the foreach loop has the advantage that potentially, it's the most readable and the fastest.
Edit: there is another case where the for loop is useful: when you need indexing to go through a list in a special way or if you need to modify the list while in the loop. For example, in this case because we want to remove every null element from myList:
for (int i = myList.Count - 1; i >= 0; i--)
{
    if (myList[i] == null) myList.RemoveAt(i);
}
You need the for loop here because myList cannot be modified from within a foreach loop, and we need to go through it backwards because if you remove the element at the position i, the position of all elements with an index >i will change.
But the need for these special constructions has been reduced since lambda expressions and LINQ arrived. The last example can be written with List<T>.RemoveAll and a lambda, for example:
myList.RemoveAll(obj => obj == null);
LINQ is a second step though, learn the loops first.
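The rule mentioned above, that a list cannot be modified from within a foreach loop, can be checked with a small experiment. This is only a sketch: ForeachModificationDemo and its methods are hypothetical names, and a list of integers is used to keep it self-contained:

```csharp
using System;
using System.Collections.Generic;

public static class ForeachModificationDemo
{
    // Returns true if removing inside foreach raised InvalidOperationException.
    public static bool RemoveInsideForeachThrows()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int x in list)
            {
                if (x == 2) list.Remove(x);   // changes the list under the enumerator
            }
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;   // the enumerator detected the change when it moved on
        }
    }

    // The backwards for loop has no enumerator to invalidate, and removing
    // index i only shifts elements that were already visited.
    public static void RemoveEvensBackwards(List<int> list)
    {
        for (int i = list.Count - 1; i >= 0; i--)
        {
            if (list[i] % 2 == 0) list.RemoveAt(i);
        }
    }
}
```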
When you know how many iterations there will be, use for.
When you don't know, use while; when you don't know and need to execute the code at least once, use do.
When you iterate through a collection and don't need the index, use foreach.
(Also, you cannot use collection[i] on everything that you can use foreach on.)
As others have said, 'it depends'.
I find I use simple 'for' loops very rarely nowadays. If you start to use LINQ you'll find you either don't need loops at all or, when you do, it's the 'foreach' loop that's called for.
Ultimately I agree with Colin Mackay - code for readability!
The do while loop has been forgotten, I think :)
Taken from here.
The C# while statement executes a statement or a block of statements until a specified expression evaluates to false . In some situation you may want to execute the loop at least one time and then check the condition. In this case you can use do..while loop.
The difference between do..while and while is that do..while evaluates its expression at the bottom of the loop instead of the top. Therefore, the statements within the do block are always executed at least once. The following example shows how the do..while loop behaves: button1_Click shows the message once even though the condition is false from the start, while button2_Click never shows it.
using System;
using System.Windows.Forms;
namespace WindowsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
        private void button1_Click(object sender, EventArgs e)
        {
            int count = 5;
            do {
                MessageBox.Show(" Loop Executed ");
                count++;
            } while (count <= 4);
        }
        private void button2_Click(object sender, EventArgs e)
        {
            int count = 5;
            while (count <= 4) {
                MessageBox.Show(" Loop Executed ");
                count++;
            }
        }
    }
}
If you have a collection and you know upfront you're going to systematically pass through all values, use foreach, as it is usually easier to work with the "current" instance.
If some condition can make you stop iterating, you can use for or while. They are pretty similar, the big difference being that for takes control of when the current index or value is updated (in the for declaration), whereas in a while you decide when and where in the while block to update the values that are then checked in the while predicate.
If you are writing a parser class, lets say an XMLParser that will read XML nodes from a given source, you can use while loop as you don't know how many tags are there.
Also you can use while when you iterate if the variable is true or not.
You can use for loop if you want to have a bit more control over your iterations

Is there a performance difference between these two algorithms for shuffling an IEnumerable?

These two questions give similar algorithms for shuffling an IEnumerable:
C#: Is using Random and OrderBy a good shuffle algorithm?
Can you enumerate a collection in C# out of order?
Here are the two methods side-by-side:
public static IEnumerable<T> Shuffle1<T> (this IEnumerable<T> source)
{
    Random random = new Random ();
    T [] copy = source.ToArray ();
    for (int i = copy.Length - 1; i >= 0; i--) {
        int index = random.Next (i + 1);
        yield return copy [index];
        copy [index] = copy [i];
    }
}
public static IEnumerable<T> Shuffle2<T> (this IEnumerable<T> source)
{
    Random random = new Random ();
    List<T> copy = source.ToList ();
    while (copy.Count > 0) {
        int index = random.Next (copy.Count);
        yield return copy [index];
        copy.RemoveAt (index);
    }
}
They are basically identical, except one uses a List, and one uses an array. Conceptually, the second one seems more clear to me. But is there a substantial performance benefit to be gained from using an array? Even if the Big-O time is the same, if it is several times faster, it could make a noticeable difference.
The second version will probably be a bit slower because of RemoveAt. Lists are really arrays that grow when you add elements to them, and as such, insertion and removal in the middle is slow (in fact, MSDN states that RemoveAt has an O(n) complexity).
Anyway, the best would be to simply use a profiler to compare both methods.
The first doesn't compile, although it's apparent that you're trying to reify the enumerable, and then implement Fisher-Yates; that's probably the correct approach, and it shouldn't be unclear to anyone who has ever shuffled an array before. The second using RemoveAt is bad for the reasons stated by other commenters.
EDIT: Your top implementation looks like it's correct now, and it's a good way to do it.
The first algorithm is O(n) as it has a loop which performs an O(1) swap on each iteration. The second algorithm is O(n^2) as it performs an O(n) RemoveAt operation on each iteration. In addition indexers on lists are slower than indexes on arrays because the former is a method call whereas the latter is an IL instruction.
So out of the two the first one is likely to be faster. That said, if you're after performance, why bother yielding the results? It's already converting to an array so just shuffle that in place and return the array directly (or wrapped in a ReadOnlyCollection<T> if you're worried about people changing it) which is probably faster still.
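The "shuffle in place and return the array" suggestion might look roughly like this. It is a sketch of a standard Fisher-Yates shuffle; the ShuffleDemo class name and the choice to pass the Random in as a parameter are assumptions, not part of the original methods:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShuffleDemo
{
    // Copies the source into an array and shuffles that array in place. O(n).
    public static T[] ShuffleToArray<T>(IEnumerable<T> source, Random random)
    {
        T[] copy = source.ToArray();
        for (int i = copy.Length - 1; i > 0; i--)
        {
            int j = random.Next(i + 1);   // 0 <= j <= i
            T tmp = copy[i];
            copy[i] = copy[j];
            copy[j] = tmp;
        }
        return copy;
    }
}
```

Passing the generator in lets the caller decide how it is created and shared, for example one instance per thread.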
On a side note, the behaviour of Random when a single instance is used by multiple threads is undefined. Both methods create their own instance, but if you change them to share one generator across threads, they should use a thread-safe random number generator.

Enumerator problem, Any way to avoid two loops?

I have a third party api, which has a class that returns an enumerator for different items in the class.
I need to remove an item in that enumerator, so I cannot use "for each". Only option I can think of is to get the count by iterating over the enum and then run a normal for loop to remove the items.
Anyone know of a way to avoid the two loops?
Thanks
[update] sorry for the confusion but Andrey below in comments is right.
Here is some pseudo code out of my head that won't work and for which I am looking a solution which won't involve two loops but I guess it's not possible:
for each (myProperty in MyProperty)
{
if (checking some criteria here)
MyProperty.Remove(myProperty)
}
MyProperty is the third party class that implements the enumerator and the remove method.
Common pattern is to do something like this:
List<Item> forDeletion = new List<Item>();
foreach (Item i in somelist)
    if (condition for deletion) forDeletion.Add(i);
foreach (Item i in forDeletion)
    somelist.Remove(i); // or however you delete items
Loop through it once and create a second array which contains the items which should not be deleted.
If you know it's a collection with an indexer, you can go with a reverse for loop:
for (int i = items.Count - 1; i >= 0; i--)
{
    // test your removal condition for items[i] here
    items.RemoveAt(i);
}
Otherwise, you'll have to do two loops.
You can create something like this:
public IEnumerable<item> GetMyList()
{
    foreach (var x in thirdParty)
    {
        if (x == ignore)
            continue;
        yield return x;
    }
}
I need to remove an item in that enumerator
As long as this is a single item that's not a problem. The rule is that you cannot continue to iterate after modifying the collection. Thus:
foreach (var item in collection) {
    if (item.Equals(toRemove)) {
        collection.Remove(toRemove);
        break; // <== stop iterating!!
    }
}
It is not possible to remove an item from an Enumerator. What you can do is to copy or filter(or both) the content of the whole enumeration sequence.
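You can achieve this by using LINQ and doing something like this (Where keeps the items for which the predicate is true, so the removal test is negated):
YourEnumerationReturningFunction().Where(item => !yourRemovalCriteria(item));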
Can you elaborate on the API and the API calls you are using?
If you receive an IEnumerator<T> or IEnumerable<T> you cannot remove any item from the sequence behind the enumerator because there is no method to do so. And you should of course not rely on down casting an received object because the implementation may change. (Actually a well designed API should not expose mutable objects holding internal state at all.)
If you receive IList<T> or something similar you can just use a normal for loop from back to front and remove the items as needed because there is no iterator which state could be corrupted. (Here the rule about exposing mutable state should apply again - modifying the returned collection should not change any state.)
Enumerable.Count() will decide at run time what it needs to do: if the sequence implements ICollection<T> it simply reads its Count property, otherwise it enumerates the sequence to count the items.
I like SJoerd's suggestion but I worry about how many items we may be talking about.
Why not something like ..
// you don't want 2 and 3
IEnumerable<int> fromAPI = Enumerable.Range(0, 10);
IEnumerable<int> result = fromAPI.Except(new[] { 2, 3 });
A clean, readable way to do this is as follows (I'm guessing at the third-party container's API here since you haven't specified it.)
foreach (var delItem in ThirdPartyContainer.Items
             .Where(item => ShouldIDeleteThis(item))
             // or: .Where(ShouldIDeleteThis)
             .ToArray())
{
    ThirdPartyContainer.Remove(delItem);
}
The call to .ToArray() ensures that all items to be deleted have been greedily cached before the foreach iteration begins.
Behind the scenes this involves an array and an extra iteration over that, but that's generally very cheap, and the advantage of this method over the other answers to this question is that it works on plain enumerables and does not involve tricky mutable state issues that are hard to read and easy to get wrong.
By contrast, iterating in reverse, while not rocket science, is much more prone to off-by-one errors and harder to read; and it also relies on internals of the collection such as not changing order in between deletions (e.g. better not be a binary heap, say). Manually adding items that should be deleted to a temporary list is just unnecessary code - that's what .ToArray() will do just fine :-).
An enumerator often has a private field pointing to the real collection.
You can get at it via reflection and modify it, although that relies on implementation details that may change.
Have fun.
