.NET queue ElementAt performance - c#

I'm having a hard time with parts of my code:
private void UpdateOutputBuffer()
{
    T[] OutputField = new T[DisplayedLength];
    int temp = 0;
    int Count = HistoryQueue.Count;
    int Sample = 0;
    // Then fill the useful part with samples from the queue
    for (temp = DisplayStart; temp != DisplayStart + DisplayedLength && temp < Count; temp++)
    {
        OutputField[Sample++] = HistoryQueue.ElementAt(Count - temp - 1);
    }
    DisplayedHistory = OutputField;
}
It takes most of the time in the program. The number of elements in HistoryQueue is 200k+. Could this be because the queue in .NET is implemented internally as a linked list?
What would be a better way of going about this? Basically, the class should act like a FIFO that starts dropping elements at ~500k samples and I could pick DisplayedLength elements and put them into OutputField. I was thinking of writing my own Queue that would use a circular buffer.
The code worked fine for lower values of Count. DisplayedLength is 500.
Thank you,
David

Queue does not have an ElementAt method. I'm guessing you are getting this via Linq, and that it is simply doing a forced iteration over n elements until it gets to the desired index. This is obviously going to slow down as the collection gets bigger. If ElementAt represents a common access pattern, then pick a data structure that can be accessed via index e.g. an Array.

Yes, the linked-list-ness is almost certainly the problem. There's a reason why Queue<T> doesn't implement IList<T> :) (Having said that, I believe Stack<T> is implemented using an array, and that still doesn't implement IList<T>. It could provide efficient random access, but it doesn't.)
I can't easily tell which portion of the queue you're trying to display, but I strongly suspect that you could simplify the method and make it more efficient using something like:
T[] outputField = HistoryQueue.Skip(...) /* adjust to suit requirements... */
.Take(DisplayedLength)
.Reverse()
.ToArray();
That's still going to have to skip over a huge number of items individually, but at least it will only have to do it once.
Have you thought of using a LinkedList<T> directly? That would make it really easy to read items from the end of the list.
Building your own bounded queue using a circular buffer wouldn't be hard, of course, and may well be the better solution in the long run.
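For illustration, here is a minimal sketch of what such a bounded circular-buffer FIFO could look like. The class and member names (BoundedQueue, Enqueue, the indexer) are my own assumptions for the example, not code from the question:
// Fixed-capacity FIFO backed by a circular buffer; once full, the oldest sample is dropped.
public class BoundedQueue<T>
{
    private readonly T[] buffer;
    private int head;    // index of the oldest element
    private int count;

    public BoundedQueue(int capacity)
    {
        buffer = new T[capacity];
    }

    public int Count { get { return count; } }

    public void Enqueue(T item)
    {
        int tail = (head + count) % buffer.Length;
        buffer[tail] = item;
        if (count < buffer.Length)
            count++;                                // still filling up
        else
            head = (head + 1) % buffer.Length;      // full: overwrite and drop the oldest
    }

    // O(1) random access; index 0 is the oldest element.
    public T this[int index]
    {
        get { return buffer[(head + index) % buffer.Length]; }
    }
}
With O(1) indexing like this, the display loop can copy DisplayedLength samples directly instead of paying ElementAt's O(n) walk for each one.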

Absolutely the wrong data structure to use here. ElementAt is O(n), which makes your loop O(n²). You should use something else instead of a Queue.

Personally I don't think a queue is what you're looking for, and your access pattern makes it even worse. Use iterators if you want sequential access:
foreach (var h in HistoryQueue.Skip(DisplayStart).Take(DisplayedLength).Reverse())
{
    // work with h
}

If you need to be able to push/pop at either end and have indexed access, you really need a Deque implementation (the multiple-array form). There is no implementation in the BCL, but there are plenty of third-party ones to get started with; if needed, you could implement your own later.

Related

Fastest Approach to Finding Number of Unique Members in a List

I've been trying to find a good way of looking for the number of unique values from a list. There was a very good question here which I tried to peruse to create a solution that looks like this:
gridStats[0] = gridList.SelectMany(x => x.Position.Easting).Distinct().ToList().Count();
gridStats[1] = gridList.SelectMany(x => x.Position.Northing).Distinct().ToList().Count();
However, that seems to produce an error about implicitly declared type arguments, which didn't make sense to me. Further research seemed to suggest that Distinct, good as it is, would not actually provide what I am looking for in any case without some additional code.
Therefore, I gave up on that approach and tried to go for a loop method, and I have arrived at this:
List<double> eastings = new List<double>();
List<double> northings = new List<double>();
for (int i = 0; i < gridList.Count; i++)
{
    if (!eastings.Contains(gridList[i].Position.Easting))
    {
        eastings.Add(gridList[i].Position.Easting);
    }
    if (!northings.Contains(gridList[i].Position.Northing))
    {
        northings.Add(gridList[i].Position.Northing);
    }
}
gridStats[0] = eastings.Count;
gridStats[1] = northings.Count;
Note here that 'gridList' can have hundreds of millions of entries.
Quite predictably, this loop is not particularly fast in use. Therefore, I was hoping it would be possible to either get assistance in making that loop more efficient or assistance in sorting out the Linq approach.
What do you suggest as the best approach when the only concern is the speed at which this task is performed?
You were so close.
Distinct is indeed the best choice for this scenario - it's similar to a HashSet<T>-based implementation, but internally it uses a special lightweight hash set. In practice I don't think there will be a noticeable difference in performance, but Distinct is more readable and at the same time a bit faster.
What you've missed, though, is that the question in the link is about a list of objects each having a list property, so it needed SelectMany, while in your case each object holds a single property, so a simple Select will do the job, like this:
gridStats[0] = gridList.Select(x => x.Position.Easting).Distinct().Count();
gridStats[1] = gridList.Select(x => x.Position.Northing).Distinct().Count();
Also note that the ToList call was not needed in order to use the Count extension method. Every operation has a cost, so don't include unnecessary methods - they won't make your code more readable, but they will certainly make it slower and consume more space.
You can speed this up by using HashSet instead of List for eastings and northings:
HashSet<double> eastings = new HashSet<double>();
HashSet<double> northings = new HashSet<double>();
The reason this is faster is that a HashSet uses hashing to give O(1) lookups, whereas Contains on a List is O(n) (it has to scan the whole list to see whether the item exists).
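For completeness, here is a sketch of the loop rewritten around HashSet<double>, assuming gridList and gridStats are declared as in the question:
HashSet<double> eastings = new HashSet<double>();
HashSet<double> northings = new HashSet<double>();
for (int i = 0; i < gridList.Count; i++)
{
    // Add is a no-op (and returns false) if the value is already present,
    // so no Contains check is needed
    eastings.Add(gridList[i].Position.Easting);
    northings.Add(gridList[i].Position.Northing);
}
gridStats[0] = eastings.Count;
gridStats[1] = northings.Count;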

Best .NET Array/List

So, I need an array of items. And I was wondering which one would be the fastest/best to use (in c#), I'll be doing to following things:
1) Adding elements at the end
2) Removing elements at the start
3) Looking at the first and last element (every frame)
4) Clearing it occasionally
5) Converting it to a normal array (not a list; I'm using iTween and it asks for a normal array). I'll do this almost every frame.
So, what would be the best to use considering these things? Especially the last one, since I'm doing that every frame. Should I just use an array, or is there something else that converts very fast to array and also has easy adding/removing of elements at the start & end?
Requirements 1) and 2) point to a Queue<T>; it is the only standard collection optimized for these two operations.
3) You'll need a little trickery for getting at the last element; the first is Peek().
4) is simple (.Clear()).
5) The standard .ToArray() method will do this.
You will not escape copying all elements (O(n)) for item 5).
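One way to cover the "last element" half of requirement 3) is to remember the most recently enqueued item in a separate field. This is only a sketch of that idea (the wrapper class and its names are assumptions, not a built-in feature of Queue<T>):
// Queue<T> plus a field that remembers the newest element.
public class TrackingQueue<T>
{
    private readonly Queue<T> queue = new Queue<T>();
    private T last;

    public void Enqueue(T item)
    {
        queue.Enqueue(item);
        last = item;                                  // O(1) access to the newest element
    }

    public T Dequeue() { return queue.Dequeue(); }    // remove from the start
    public T First { get { return queue.Peek(); } }   // oldest element, without removing it
    public T Last { get { return last; } }            // newest element
    public void Clear() { queue.Clear(); }
    public T[] ToArray() { return queue.ToArray(); }  // still an O(n) copy
}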
You could take a look at LinkedList<T>.
It has O(1) support for inspecting, adding and removing items at the beginning or end. It requires O(n) to copy to an array, but that seems unavoidable. The copy could be avoided if the API you were using accepted an ICollection<T> or IEnumerable<T>, but if that can't be changed then you may be stuck with using ToArray.
If your list changes less than once per frame then you could cache the array and only call ToArray again if the list has changed since the previous frame. Here's an implementation of a few of the methods, to give you an idea of how this potential optimization can work:
private LinkedList<T> list = new LinkedList<T>();
private bool isDirty = true;
private T[] array;

public void Enqueue(T t)
{
    list.AddLast(t);
    isDirty = true;
}

public T[] ToArray()
{
    if (isDirty)
    {
        array = list.ToArray();
        isDirty = false;
    }
    return array;
}
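The same dirty flag would need to be set by every other operation that mutates the list; for example, a sketch of corresponding Dequeue and Clear methods (continuing the hypothetical class above):
public T Dequeue()
{
    T value = list.First.Value;
    list.RemoveFirst();
    isDirty = true;     // the cached array is stale now
    return value;
}

public void Clear()
{
    list.Clear();
    isDirty = true;
}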
I'm assuming you are using classes (and not structs)? (If you are using structs (value type) then that changes things a bit.)
The System.Collections.Generic.List class lets you do all that, and quickly. The only part that could be done better with a LinkedList is removing from the start, but a single block memory copy isn't much pain, and it will create arrays without any hassle.
I wouldn't recommend using a Linked List, especially if you are only removing from the start or end. Each addition (with the standard LinkedList collection) requires a memory allocation (it has to build an object to reference what you actually want to add).
Lists also have lots of convenient functions, which you need to be careful with when performance is an issue. Lists are essentially arrays that get bigger as you add stuff (every time you overfill them, they grow considerably, which avoids excessive memory operations). Clearing them requires no effort, and leaves the memory allocated to be used another day.
In my experience, .NET isn't well suited to generic linked lists; you need to write your code specifically to work with them throughout. Lists:
Are easy to use
Do everything you want
Won't leave your memory looking like Swiss cheese (well, as far as that's possible when you are allocating a new array every frame - if these arrays are going to be big, give the garbage collector a chance to get rid of the old ones by re-using array references and nulling any you don't need).
The right choice will depend heavily on the specifics of the application, but List is always a safe bet if you ask me, and you won't have to write any structure specific code to get it working.
If you do feel like using Lists, you'll want to look into these methods and properties:
ToArray() // Makes those arrays you want
Clear() // Clears the array
Add(item) // Adds an item to the end
RemoveAt(index) // index 0 for the first item, .Count - 1 for the last
Count // Retrieves the number of items in the list - it's not a free lookup, so try to avoid needless requests
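Putting those together, a minimal sketch of the per-frame pattern (using int for brevity; in practice the element type would be whatever iTween expects):
List<int> items = new List<int>();

items.Add(42);                      // add at the end
int first = items[0];               // look at the first element
int last = items[items.Count - 1];  // look at the last element
int[] forITween = items.ToArray();  // fresh array each frame
items.RemoveAt(0);                  // remove from the start (shifts the rest left)
items.Clear();                      // occasional reset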
Sorry if this whole post is overkill.
How about a circular array? If you keep the indices of the first and last elements, you can get O(1) for every criterion you gave.
EDIT: You could take a C++ vector approach for capacity: double the size when it gets full.
Regular List will do the work and it is faster than LinkedList for insert.
Adding elements at the end -> myList.Add(item)
Removing elements at the start -> myList.RemoveAt(0)
Looking at the first and last element (every frame) -> myList[0] or myList[myList.Count - 1]
Clearing it occasionally -> myList.Clear()
Converting it to a normal array (for iTween, almost every frame) -> myList.ToArray()

What is the fastest way of changing Dictionary<K,V>?

This is an algorithmic question.
I have got Dictionary<object,Queue<object>>. Each queue contains one or more elements in it. I want to remove all queues with only one element from the dictionary. What is the fastest way to do it?
Pseudo-code: foreach(item in dict) if(item.Length==1) dict.Remove(item);
It is easy to do it in a loop (not foreach, of course), but I'd like to know which approach is the fastest one here.
Why I want it: I use that dictionary to find duplicate elements in a large set of objects. The Key in dictionary is kind of a hash of the object, the Value is a queue of all objects found with the same hash. Since I want only duplicates, I need to remove all items with just a single object in associated queue.
Update:
It may be important to know that in the regular case there are just a few duplicates in a large set of objects. Let's assume 1% or less. So it could possibly be faster to leave the Dictionary as is and create a new one from scratch with just the selected elements from the first one... and then delete the first Dictionary completely. I think it depends on the complexity of the Dictionary class's methods used in the particular algorithms.
I really want to see this problem on a theoretical level because as a teacher I want to discuss it with students. I didn't provide any concrete solution myself because I think it is really easy to do it. The question is which approach is the best, the fastest.
// Materialize the keys first so the dictionary isn't modified while it is being enumerated
var itemsWithOneEntry = dict.Where(x => x.Value.Count == 1)
                            .Select(x => x.Key)
                            .ToList();

foreach (var item in itemsWithOneEntry)
{
    dict.Remove(item);
}
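As an aside, the "create a new one from scratch" idea from the question's update can be expressed directly with LINQ; a sketch, assuming System.Linq is available and dict is declared as in the question:
// Keep only the entries whose queue holds more than one element (the actual duplicates)
// and let the old dictionary be garbage collected.
dict = dict.Where(kv => kv.Value.Count > 1)
           .ToDictionary(kv => kv.Key, kv => kv.Value);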
Instead of trying to optimize the traversal of the collection, how about optimizing the content of the collection so that it only ever contains the duplicates? That would require changing your collection algorithm to something like this:
var duplicates = new Dictionary<object, Queue<object>>();
var possibleDuplicates = new Dictionary<object, object>();
foreach (var item in original)
{
    if (possibleDuplicates.ContainsKey(item))
    {
        // Second occurrence: promote the pair into the duplicates dictionary
        var queue = new Queue<object>();
        queue.Enqueue(possibleDuplicates[item]);
        queue.Enqueue(item);
        duplicates.Add(item, queue);
        possibleDuplicates.Remove(item);
    }
    else if (duplicates.ContainsKey(item))
    {
        duplicates[item].Enqueue(item);   // third or later occurrence
    }
    else
    {
        possibleDuplicates.Add(item, item);   // first occurrence
    }
}
Note that you should probably measure the impact of this on the performance in a realistic scenario before you bother to make your code any more complex than it really needs to be. Most imagined performance problems are not in fact the real cause of slow code.
But supposing you do find that you could get a speed advantage by avoiding a linear search for queues of length 1, you could solve this problem with a technique called indexing.
As well as your dictionary containing all the queues, you maintain an index container (probably another dictionary) that only contains the queues of length 1, so when you need them they are already available separately.
To do this, you need to enhance all the operations that modify the length of the queue, so that they have the side-effect of updating the index container.
One way to do it is to define a class ObservableQueue. This would be a thin wrapper around Queue except it also has a ContentsChanged event that fires when the number of items in the queue changes. Use ObservableQueue everywhere instead of the plain Queue.
Then when you create a new queue, subscribe a handler to its ContentsChanged event that checks whether the queue has exactly one item; based on that you can either insert the queue into the index container or remove it.
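A sketch of what such a wrapper might look like; the class and event names follow the description above but are otherwise my own assumptions:
// Thin wrapper around Queue<T> that raises an event whenever the item count changes.
public class ObservableQueue<T>
{
    private readonly Queue<T> inner = new Queue<T>();

    // Raised after every operation that changes Count; the argument is the new count.
    public event Action<int> ContentsChanged;

    public int Count { get { return inner.Count; } }

    public void Enqueue(T item)
    {
        inner.Enqueue(item);
        OnContentsChanged();
    }

    public T Dequeue()
    {
        T item = inner.Dequeue();
        OnContentsChanged();
        return item;
    }

    private void OnContentsChanged()
    {
        var handler = ContentsChanged;
        if (handler != null) handler(inner.Count);
    }
}
The subscribed handler can then add the queue to the single-element index when the count becomes 1 and remove it when the count moves away from 1.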

Selecting a specific object in a queue (i.e. Peek + 1)

If Peek returns the next object in a queue, is there a method I can use to get a specific object? For example, I want to find the third object in the queue and change one of its values.
Right now I'm just doing a foreach through the queue, which might be the best solution, but I didn't know if there was something special you could use with Peek, i.e. Queue.Peek(2).
If you want to access elements directly (with an O(1) operation), use an array instead of a queue because a queue has a different function (FIFO).
A random access operation on a queue will be O(n) because it needs to iterate over every element in the collection...which in turn makes it sequential access, rather than direct random access.
Then again, since you're using C#, you can use queue.ElementAt(n) from System.Linq (since Queue implements IEnumerable) but that won't be O(1) i.e. it will still iterate over the elements.
Although this is still O(n), it's certainly easier to read if you use the LINQ extension methods ElementAt() or ElementAtOrDefault(); these are extensions of IEnumerable<T>, which Queue<T> implements.
using System.Linq;
Queue<T> queue = new Queue<T>();
T result;
result = queue.ElementAt(2);
result = queue.ElementAtOrDefault(2);
Edit
If you do go with the other suggestions of converting your Queue to an array just for this operation, then you need to decide whether the likely size of your queue and the distance of the index from the start of the queue justify the O(n) operation of calling .ToArray().ElementAt(m), not to mention the space requirements of creating a secondary copy of the data.
foreach through a queue. Kind of a paradox.
However, if you can foreach, it is an IEnumerable, so the usual linq extensions apply:
queue.Skip(1).FirstOrDefault()
or
queue.ElementAt(1)
You could do something like this as a one off:
object thirdObjectInQueue = queue.ToArray()[2];
I wouldn't recommend using it a lot, however, as it copies the whole queue to an array, thereby iterating over the whole queue anyway.
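One more thing worth noting for the original goal (changing a value on the third object): if the queued items are reference types, ElementAt and the ToArray trick both hand back the same instance that is stored in the queue, so mutating it changes the queued object in place. A sketch with a hypothetical Sample class:
// Hypothetical item type, for illustration only
class Sample { public int Value; }

Queue<Sample> queue = new Queue<Sample>();
queue.Enqueue(new Sample { Value = 1 });
queue.Enqueue(new Sample { Value = 2 });
queue.Enqueue(new Sample { Value = 3 });

// Still an O(n) walk to reach index 2, but the returned reference is the queued object,
// so this change is visible when the item is eventually dequeued.
queue.ElementAt(2).Value = 42;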

Using foreach (...) syntax while also incrementing an index variable inside the loop

When looking at C# code, I often see patterns like this:
DataType[] items = GetSomeItems();
OtherDataType[] itemProps = new OtherDataType[items.Length];

int i = 0;
foreach (DataType item in items)
{
    // Do some stuff with item, then finally
    itemProps[i] = item.Prop;
    i++;
}
The foreach loop iterates over the objects in items while also keeping a counter (i) for indexing into itemProps. I personally don't like this extra i hanging around, and instead would probably do something like:
DataType[] items = GetSomeItems();
OtherDataType[] itemProps = new OtherDataType[items.Length];

for (int i = 0; i < items.Length; i++)
{
    // Do some stuff with items[i], then finally
    itemProps[i] = items[i].Prop;
}
Is there perhaps some benefit to the first approach that I'm not aware of? Is this just a result of everybody trying to use that fancy foreach (...) syntax? I'm interested in your opinions on this.
If you are using C# 3.0, this would be better:
OtherDataType[] itemProps = items.Select(i=>i.Prop).ToArray();
With i being declared outside the loop, it would still be available after the loop completes. If you wanted to count the number of items and the collection didn't provide a .Count or .UBound property, then this could be useful.
Like you, I would normally use the second method; it looks much cleaner to me.
In this case, I don't think so. Sometimes, though, the collection doesn't implement this[int index] but it does implement GetEnumerator(). In the latter case, you don't have much choice.
Some data structures are not well suited for random access but can be iterated over very fast (trees, linked lists, etc.). So if you need to iterate over one of these but also need a count for some reason, you're doomed to go the ugly way...
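For example, LinkedList<T> has no indexer, so an external counter is the only way to pair each element with a position; a small illustrative sketch:
LinkedList<string> names = new LinkedList<string>();
names.AddLast("a");
names.AddLast("b");
names.AddLast("c");

int i = 0;
foreach (string name in names)   // names[i] does not exist on LinkedList<T>
{
    Console.WriteLine("{0}: {1}", i, name);
    i++;
}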
Semantically they may be equivalent, but in fact using foreach over an enumerator gives the compiler more scope to optimise.
I don't remember all the arguments off the top of my head, but they are well covered in Effective C#, which is recommended reading.
foreach (DataType item in items)
This foreach loop makes it crystal clear that you're iterating over all the DataType items of, well yes, items. Maybe it makes the code a little longer, but it's not "bad" code. With the other for-loop, you need to look inside the brackets to get an idea of what the loop is used for.
The problem with this example lies in the fact that you're iterating over two different arrays at the same time, which we don't do that often... so we are stuck between two strategies: either we "hack" the fancy foreach a bit, as you call it, or we go back to the old-not-so-loved for (int i = 0; i ...). (There are other ways than those two, of course.)
So, I think it's the Vim vs Emacs thing coming back in your question, with for vs foreach :) People who like for() will say this foreach is useless, might cause performance issues, and is just long. People who prefer foreach will say something like: we don't care about two extra lines if we can read the code and maintain it easily.
Finally, the i is outside the scope for the first example and inside for the second... reasons for that?! If you use the i outside of your foreach, I would have named it differently. In my opinion, I prefer the foreach way because you see immediately what is happening. You also don't have to think about whether it's < or <=; you know immediately that you are iterating over the whole list. However, sadly, people will forget about the i++ at the end :D So, I say Vim!
Let's not forget that some collections do not implement a direct-access operator[] and that you have to iterate using the IEnumerable interface, which is most easily accessed with foreach().
