I am writing a method that does some manipulation on a List. During each iteration, the algorithm will either delete, add or move an existing item (so the list is changing, one item at a time). The move operation will be performed by an extension method implemented with the following logic:
T item = base[oldIndex];
base.RemoveItem(oldIndex);
base.InsertItem(newIndex, item);
Also, during each iteration, one item in the list will be queried for its current index. List.IndexOf(item) is an expensive operation (O(n)), and I wonder if there is a way to improve its performance. I thought about keeping a dictionary of the indices, but then on every insertion and removal of an item I would need to update the indices of the O(n) items that come after it, so that does not help.
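For illustration, here is roughly what that index map would look like (a hypothetical sketch using a List<string>); after any structural change, every entry at or after the affected position is stale:

var list = new List<string> { "a", "b", "c", "d" };
var indexOf = new Dictionary<string, int>();
for (int i = 0; i < list.Count; i++)
    indexOf[list[i]] = i;   // cache each item's current index

list.RemoveAt(1);           // "c" and "d" shift left by one...
// ...so O(n) entries of the map must now be updated or rebuilt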
Note: I am aware that List.InsertItem and List.RemoveItem take O(n) anyway, but I am specifically asking about how to improve the performance of the IndexOf operation in such a scenario.
Thanks
Please let me know if this is already answered somewhere; I can't find it.
In memory, I have a collection of objects in firstList and related objects in secondList. I want to find all items from firstList where the Id matches a secondList item's RelatedId. So it's fairly straightforward:
var items = firstList.Where(item => secondList.Any(secondItem => item.Id == secondItem.RelatedId));
But when both the first and second lists get even moderately large, this call is extremely expensive and takes many seconds to complete. Is there a way to break this up, or redesign it into multiple queries to make it more efficient?
The reason this code is so inefficient is that for every element of the first list, it has to iterate over (in the worst case) the entirety of the second list to look for an item with matching id.
A more efficient way to do this using LINQ would be using the Join method as follows:
var items = firstList.Join(secondList, item => item.Id, secondItem => secondItem.RelatedId, (item, _) => item);
If the second collection may contain duplicate IDs, you will additionally have to run Distinct() (possibly with some changes depending on the equality semantics for the members of the first list) on the result to maintain the semantics of your original code.
This code resulted in a roughly 100x speedup for me with a test using two lists of 10000 elements each.
If you're running this operation often and one of the collections does not change, you could consider caching the Ids or RelatedIds in a HashSet instead.
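For example, a minimal sketch of that caching idea, assuming the Id/RelatedId properties from the question are ints:

// build the set once while secondList is stable...
var relatedIds = new HashSet<int>(secondList.Select(s => s.RelatedId));

// ...then each membership test is O(1) instead of a scan of secondList
var items = firstList.Where(item => relatedIds.Contains(item.Id)).ToList();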
What is the best performance-optimized alternative to choose among Dictionary<TKey,TValue>, HashSet<T> and List<T> with respect to:
Add values (without duplicates)
Lookups
Delete values.
I have to avoid adding duplicate values to the collection. I know HashSet<T> is good here, since it skips the add if a duplicate is detected; a dictionary, on the other hand, throws an exception if a duplicate key is found. A List would need an additional exists check on the items before adding the value. But adding values to a HashSet<T> without duplicates seems to take around 1 min for 10K records. Is there a way to optimize this?
Ok... In terms of theory, all the data structures you mention (HashSet, Dictionary and List) have asymptotically O(1) time complexity for adding items. Hashing data structures are O(1) for delete as well. For lists, it depends a lot on where you perform the delete operation: if you remove at a random position i, you get O(N) complexity due to the fact that all items from i+1 to the end of the list must be shifted left by one position. If you always remove the last element, it is O(1).
But most importantly, data structures based on hashing have a big bonus: O(1) lookup complexity. That is only in theory, though. In practice, if you define a very bad hash code for your types, you can fall back to O(N) complexity. A simple example would be overriding the GetHashCode function and returning a constant int. I suspect that your bad performance comes from a bad GetHashCode design.
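For example, a deliberately bad, hypothetical hash code like this pushes every item into the same bucket, so lookups degrade toward O(N):

class Record
{
    public int Id { get; set; }

    // constant hash: legal, but every Record lands in one bucket,
    // so HashSet/Dictionary operations degrade from O(1) toward O(N)
    public override int GetHashCode() => 42;
    public override bool Equals(object obj) => obj is Record r && r.Id == Id;
}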
Another thing to remember: Dictionary and HashSet are data structures meant for different scenarios. You can view a Dictionary as a kind of array whose index can be any type, and a HashSet as a special list that does not allow duplicates.
This article covers the performance stats for Dictionary, List and HashSet with respect to Add, LookUp and Remove:
http://theburningmonk.com/2011/03/hashset-vs-list-vs-dictionary/
When it comes to performance and storing unique values, I would prefer a HashSet or Dictionary, depending on my requirement.
A HashSet is used when you don't have a key/value pair to enter and still don't want duplicates in your collection. So, a HashSet is a collection for storing unique values without a key/value pair.
Whereas when I do have a pair of key and value, I prefer a Dictionary to store unique values.
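A small illustration of the difference (hypothetical values):

// HashSet<T>: unique values, no key
var names = new HashSet<string>();
names.Add("alice");
bool added = names.Add("alice");   // false: the duplicate is silently skipped

// Dictionary<TKey,TValue>: unique keys mapped to values
var ages = new Dictionary<string, int>();
ages.Add("alice", 30);
// ages.Add("alice", 31);          // would throw ArgumentException (duplicate key)
ages["alice"] = 31;                // the indexer overwrites instead of throwing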
I'm using LINQ to Objects and have a question.
The actual workload is processed in a background job.
The work items should not be handed off to further tasks; it is more important that the method processing the current work items finishes quickly and returns a result. Therefore the complete workload is split into chunks of work items.
At any one time I only process the first n objects of the list, so I take n objects and then remove them from the complete list of work items.
Roughly:
int amount = 100;
List<WorkItem> actualWorkload = new List<WorkItem>();
lock (lockWorkload) // completeWorkload is filled in another thread
{
    if (completeWorkload.Count > amount)
    {
        actualWorkload.AddRange(completeWorkload.Take(amount));
    }
    else
    {
        actualWorkload.AddRange(completeWorkload);
    }
    completeWorkload.RemoveAll(x => actualWorkload.Contains(x));
}

// do something with the work items
Process(actualWorkload);
My question is: can 'Take' and 'Remove' somehow be combined, so that there is only one step that takes the items and directly removes them from the list? I'm searching for something like the 'Take' of a BlockingCollection, which removes an item as it is returned.
Even if there were an operation that allowed that, behind the scenes it would still need to remove and add the items.
But maybe you don't need two lists? Maybe you could have only one list with all items, and each item has a status that you set to a different value or something. But of course, I don't know your exact case so this may not be feasible.
"Take and remove" sounds to me a lot like "Dequeue", so perhaps you could just use a Queue or a ConcurrentQueue?
However, the enumerator does NOT remove items from the queue, so you would have to call Dequeue repeatedly yourself.
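For illustration, a sketch of the chunked take with a ConcurrentQueue, reusing WorkItem, Process and amount from the question:

var completeWorkload = new ConcurrentQueue<WorkItem>();
// producer thread: completeWorkload.Enqueue(workItem);

// consumer: TryDequeue takes and removes in one thread-safe step, no lock needed
var actualWorkload = new List<WorkItem>(amount);
while (actualWorkload.Count < amount && completeWorkload.TryDequeue(out WorkItem item))
{
    actualWorkload.Add(item);
}
Process(actualWorkload);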
I am trying to find out more details about the sorted list so that I can analyze whether it will perform like I need it to in several specific situations.
Does anyone know the specific sorting algorithm that it uses to sort its elements?
SortedList<TKey,TValue> uses an array internally for its keys and keeps it sorted by inserting each item in the correct position on Add. When you call Add with a new item, it does an Array.BinarySearch followed by an Array.Insert.
This is why it has these characteristics:
This method is an O(n) operation for unsorted data, where n is Count. It is an O(log n) operation if the new element is added at the end of the list. If insertion causes a resize, the operation is O(n).
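A small illustration of those costs (hypothetical values):

var prices = new SortedList<int, string>();
prices.Add(1, "one");    // appended at the end: O(log n) binary search, no shift
prices.Add(3, "three");  // still appending at the end
prices.Add(2, "two");    // lands in the middle, so the entry for 3 shifts right: O(n)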
I have a QueueList table:
{Id, Queue_Instance_ID, Parent_Queue_Instance_ID, Child_Queue_Instance_ID}
What would be the most efficient way to stuff this into a LinkedList<QueueList>? Do you think I can go lower than O(n^2)?
O(n) is better than O(n^2), right? ;)
I assume that the parent and child id are actually the previous and next id in the list.
First put the items in a dictionary so that you easily can look them up on the QueueInstanceId. In this loop you also locate the first item. Then you just add the items to the linked list and use the dictionary to get the next item. Example in C#:
Dictionary<int, QueueList> lookup = new Dictionary<int, QueueList>();
QueueList first = null;
foreach (QueueList item in source) {
    lookup.Add(item.QueueInstanceId, item);
    if (item.ParentQueueInstanceId == -1) {
        first = item;
    }
}

LinkedList<QueueList> list = new LinkedList<QueueList>();
do {
    list.AddLast(first);
} while (lookup.TryGetValue(first.ChildQueueInstanceId, out first));
Adding items to a dictionary and getting them by key are O(1) operations, and each loop is the length of the source list, so it's all an O(n) operation.
Thinking about this a little, you can do it in O(n) time, although it will take some extra machinery that adds some overhead (still O(n)).
You could use some sort of universal hashing scheme so that you can build a good hashtable. Essentially, hash by the instance id.
- Take each queue item and throw it into a doubly linked node of some sort.
- Insert each node into the hashtable.
- When you insert, check whether the parent and/or child are already in the hashtable and link accordingly.
- When you are done, start at the head (which you should be able to identify in the first pass), walk the list and add each item to your LinkedList, or just use the resulting doubly linked list.
This takes one pass that is O(n), and the hashtable takes O(n) to initialize and inserts are O(1).
The problem left for you is choosing a good hashtable implementation that will give you those results.
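For what it's worth, a rough sketch of that linking pass, using a hypothetical Node helper and a Dictionary standing in for the hashtable (assumes int instance ids, as in the other answer):

class Node
{
    public QueueList Item;
    public Node Prev, Next;
}

var table = new Dictionary<int, Node>();
foreach (QueueList q in source)
{
    var node = new Node { Item = q };
    table.Add(q.QueueInstanceId, node);

    // link to the parent and/or child if they have already been inserted
    if (table.TryGetValue(q.ParentQueueInstanceId, out Node parent))
    {
        parent.Next = node;
        node.Prev = parent;
    }
    if (table.TryGetValue(q.ChildQueueInstanceId, out Node child))
    {
        node.Next = child;
        child.Prev = node;
    }
}
// the node whose Prev is null is the head; walk Next to fill the LinkedList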
Also, you have to consider the memory overhead, as I do not know how many results you expect; hopefully few enough to keep in memory.
I don't know of or think there is an SQL query based solution, but my SQL knowledge is very limited.
Hope this helps!