IndexOf too slow on list. Faster solution? - c#

I have a generic list whose order must be preserved so that I can retrieve the index of an object in the list. The problem is that IndexOf is far too slow. If I comment out the IndexOf call, the code runs as fast as can be. Is there a better way, such as an order-preserving hash list for C#?
Thanks,
Nate
Edit -
The order in which the items are added/inserted is the order they need to stay in. No sorting is necessary. This list may also be updated often (add, remove, insert). Basically, I need to translate an object to its index because the objects are represented in a grid control, and I perform operations on the grid control based on index.

If it's not sorted, but the order needs to be preserved, then you could have a separate Dictionary<YourClass, int> which would contain the index for each element.
If you want a sorted list, then check previous posts - you can use SortedList<TKey, TValue> in .NET 3.5, or sort it and use BinarySearch in older .NET versions.
[Edit] You can find similar examples on the web, e.g.: OrderedList. This one internally uses an ArrayList and a HashTable, but you can easily make it generic.
[Edit2] Ooops.. the example I gave you doesn't implement IndexOf the way I described at the beginning... But you get the point - one list should be ordered, the other one used for quick lookup.
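A minimal sketch of the "list plus lookup dictionary" idea described above. The class name IndexedList and its members are made up for illustration, and it assumes the items are unique, non-null and have stable Equals/GetHashCode. Note that IndexOf becomes a hash lookup, but Insert and RemoveAt still have to renumber every later item, so they stay O(n).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: preserves insertion order in a List<T> and keeps a
// Dictionary<T, int> in sync so that IndexOf is a hash lookup.
public class IndexedList<T>
{
    private readonly List<T> items = new List<T>();
    private readonly Dictionary<T, int> indexOf = new Dictionary<T, int>();

    public int Count { get { return items.Count; } }
    public T this[int index] { get { return items[index]; } }

    public void Add(T item)
    {
        indexOf.Add(item, items.Count); // throws ArgumentException on duplicates
        items.Add(item);
    }

    public void Insert(int index, T item)
    {
        if (indexOf.ContainsKey(item)) throw new ArgumentException("Duplicate item.");
        items.Insert(index, item);
        Reindex(index); // records the new item and renumbers everything after it
    }

    public void RemoveAt(int index)
    {
        indexOf.Remove(items[index]);
        items.RemoveAt(index);
        Reindex(index);
    }

    // Returns -1 when the item is not in the list, like List<T>.IndexOf.
    public int IndexOf(T item)
    {
        int index;
        return indexOf.TryGetValue(item, out index) ? index : -1;
    }

    private void Reindex(int start)
    {
        for (int i = start; i < items.Count; i++) indexOf[items[i]] = i;
    }
}
```

Whether this pays off depends on how often the grid is queried compared with how often items are inserted or removed in the middle.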

Sort it using List<T>.Sort, then use the List<T>.BinarySearch method: "Searches the entire sorted List(T) for an element [...] This method is an O(log n) operation, where n is the number of elements in the range."
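A small sketch of the sort-then-search approach, using ints for brevity. The helper names are invented for this example. It sorts a copy, because sorting the original list would destroy the insertion order that the question says must be preserved, so the index returned here is a position in the sorted copy, not in the original list.

```csharp
using System;
using System.Collections.Generic;

public static class BinarySearchDemo
{
    // Sort once, up front. The caller's list is left untouched.
    public static List<int> SortedCopy(IEnumerable<int> source)
    {
        List<int> sorted = new List<int>(source);
        sorted.Sort();
        return sorted;
    }

    // O(log n) lookup in the sorted copy; returns -1 when the value is absent.
    public static int Find(List<int> sorted, int value)
    {
        int index = sorted.BinarySearch(value);
        return index >= 0 ? index : -1;
    }
}
```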

See the bottom of this article here.
It appears that writing your own method to retrieve the index is much quicker than using the IndexOf method, due to the fact that it calls into a virtual method depending on the type.
Something like this may therefore improve your performance. I wrote a small unit test to verify that this improves the performance of the search, and it did, by about 15x in a list with 10,000 items.
static int GetIndex(IList<Item> list, Item value)
{
    for (int index = 0; index < list.Count; index++)
    {
        // Note: == compares references for classes unless Item overloads the operator.
        if (list[index] == value)
        {
            return index;
        }
    }
    return -1;
}

Perhaps you are looking for SortedList<TKey, TValue>?

If the order of the objects in the list has to be preserved, then the only way I can think of to get the fastest possible access is to tell each object its index position when it's added to the list. That way you can query the object to get its index in the list. The downside, and it's a big downside in my view, is that the inserted objects now have a dependency on the list.

I suggest using the SortedList<TKey, TValue> or SortedDictionary<TKey, TValue> class if you need the items sorted. The differences are the following:
SortedList<TKey, TValue> uses less memory than SortedDictionary<TKey, TValue>.
SortedDictionary<TKey, TValue> has faster insertion and removal operations for unsorted data: O(log n) as opposed to O(n) for SortedList<TKey, TValue>.
If the list is populated all at once from sorted data, SortedList<TKey, TValue> is faster than SortedDictionary<TKey, TValue>.
If you just want to preserve the ordering, you can just use a Dictionary<TKey, TValue> and store the item as key and the index as value. The drawback is that reordering the items, insertions, or deletion are quite expensive to do.

Well there is no reason you should ever have to order a hash list...that's kind of the point. However, a hash list should do the trick quite readily.

If you are using the List class, then you could use the Sort method to sort it after it is initially populated, then use the BinarySearch method to find the appropriate element.

I'm not sure about specifics in C#, but you might be able to sort it (QuickSort?) and then use a binary search on it (binary search performance is O(log2(N)), versus a sequential search such as IndexOf, which is O(n)). (IMPORTANT: For a binary search, your structure must be sorted.)
When you insert items to your data structure, you could try a modified binary search to find the insertion point as well, or if you are adding a large group, you would add them and then sort them.
The only issue is that insertion will be slower.
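In .NET, the "modified binary search to find the insertion point" is already available: when List<T>.BinarySearch does not find the value, it returns the bitwise complement of the index where the value would belong. A rough sketch, with a made-up helper name:

```csharp
using System;
using System.Collections.Generic;

public static class SortedInsert
{
    // Inserts value into an already sorted list and keeps it sorted.
    // Returns the index at which the value was placed.
    public static int InsertSorted<T>(List<T> sorted, T value) where T : IComparable<T>
    {
        int index = sorted.BinarySearch(value);
        if (index < 0) index = ~index; // complement = index of the first larger element
        sorted.Insert(index, value);   // still O(n): later elements are shifted
        return index;
    }
}
```

Finding the slot is O(log n), but the insert itself still shifts the tail of the list, which is the "insertion will be slower" cost mentioned above.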

Related

c# list methods: ElementAt(index) vs Find(content)

I am currently building a data structure which relies a lot on efficiency.
Can anyone provide me with resources on how the Find(item => item.X == myObject.Property) method actually works?
Does it iterate linearly throughout all elements until it finds the element?
And what if I know the index of myObject and I use ElementAt(index)?
Which will be the most efficient of these two please?
From the MSDN documentation on List<T>.Find
This method performs a linear search; therefore, this method is an O(n) operation, where n is Count.
I imagine that ElementAt is optimized for IList and will do a direct index. But since you're apparently using this object from the List concrete type anyway, why not just do a direct index? Like this:
var result = list[index];
If you already know the index, there is no point to searching. Just go straight to it.
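To make the linear search visible, here is a small sketch that counts how often Find calls its predicate. The class and method names are invented for the example; only List<T>.Find itself comes from the framework.

```csharp
using System;
using System.Collections.Generic;

public static class FindCost
{
    // Returns how many times List<T>.Find invoked the predicate before it
    // located target (or reached the end of the list).
    public static int PredicateCalls(List<int> list, int target)
    {
        int calls = 0;
        list.Find(delegate(int x) { calls++; return x == target; });
        return calls;
    }
}
```

With a list holding 0..999 in order, looking for 500 costs 501 predicate calls, while list[500] is a single array access.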

Is a HashSet<T> the same as List<T> but with uniqueness?

I need to have an ability to have unique items in a collection.
I was going to use a Dictionary so I could use the ContainsKey method, but I thought it would be a waste as I wouldn't use the Value property of the Key/Value pair.
I came across HashSet<T>, which looks very promising. The only difference I can find from the List<T> docs is that HashSet<T> is unordered. I think that is fine; I assume it means it's not ordered using an IEqualityComparer. As long as the items sit at index positions matching the order in which they were added, I think it will be OK, as I have to do duplicate checking (hence the hash set) and then check that all entries are sequential.
Is there anything else I have missed comparing the two types?
No, importantly HashSet<T> doesn't have any concept of ordering or indexing - a list conceptually has slots 0....n-1, whereas a set is "just a set".
I think that is fine, I assume it means its not ordered using a IEqualityComparer.
IEqualityComparer isn't used for ordering anyway - it only talks about equality and hash codes. HashSet<T> isn't ordered by either an element comparison (as, say, SortedSet<T> is) or insertion order.
As long as the order in which items are added are in the same index position I think it will be ok.
There is no index position, and when you iterate over a HashSet<T> there's no guarantee you'll get them back in the order in which you added them. If you're even thinking about ordering, HashSet<T> isn't what you're after.
Then again, all of this is also true of Dictionary<TKey, TValue> - you shouldn't make any assumptions about ordering there, either.
This is a 'picture' of what a List<T> looks like:
List: |a|b|r|t|i|p|c|y|z|...
Index: |0|1|2|3|4|5|6|7|8|...
The List<T> represents, well, a list of items. You can refer to an item by its position in the list.
This is a 'picture' of what a HashSet<T> looks like:
Set: |a|b|c| | | | | |i| | | | | | |p| |r| |t| | | | |y|z|
Bucket: |a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z|
The HashSet<T> represents a set of unique items. Every item has its own 'bucket'. You can refer to an item by its bucket. The bucket that an item belongs in is calculated directly from an item.
One of the advantages of using a HashSet over a List is constant-time searches. In a List, an item could be anywhere in the List, so to find it, you need to look at every item in the List. In a HashSet, there is only one possible location for any given item. Therefore, to search for an item, all you need to do is look in its bucket. If it's there, it's there, if it's not, it's not.
The illustrations may not be 100% accurate (for simplicity's sake). Especially the HashSet example.
No. A HashSet doesn’t allow access via index because the items aren’t ordered. This does not mean, as you suspect, that they aren’t ordered according to some IEqualityComparer. It means that they are not stored inside the hash set in the order of adding them.
So if you need an order preserving or random access container, HashSet is not for you.
It sounds like this is what you're after:
class UniqueList<T> : Collection<T>
{
    protected override void InsertItem(int index, T item)
    {
        if (!base.Contains(item))
        {
            base.InsertItem(index, item);
        }
        else
        {
            // whatever
        }
    }
}
Calling UniqueList.Add will add an item to the end of the list, and will not add duplicate values.
Well, HashSet<T> is conceptually a list of unique values, but unlike List<T> it does not implement the IList interface; it implements ICollection<T>.
It also has a set of special methods, such as:
IntersectWith, IsSubsetOf, IsSupersetOf, UnionWith, which List<T> doesn't have.
These methods are naturally handy for operations on multiple HashSets.
You got it slightly wrong. Neither Dictionary nor HashSet preserves the order of the items, which means you can't rely on the item index. Theoretically you can use LINQ's ElementAt() to access an item by index, but again, neither collection guarantees that order is preserved.
.NET provides an OrderedDictionary class, but it is not generic, so you would not have type safety at compile time. Still, it allows accessing items by index.
Here is a custom implementation of the generic one: OrderedDictionary(of T): A generic implementation of IOrderedDictionary. The key point: it persists two collections -- List and Dictionary at the same time; List provides access by index and Dictionary provides fast access by a key.
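The same two-collection idea also covers the "unique items, but in insertion order" requirement from the question. Below is a minimal sketch that pairs a List<T> with a HashSet<T>; the class name OrderedSet is hypothetical, and removal is left out to keep it short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: the List<T> keeps insertion order and index access,
// the HashSet<T> answers "is this already present?" without scanning.
public class OrderedSet<T>
{
    private readonly List<T> items = new List<T>();
    private readonly HashSet<T> seen = new HashSet<T>();

    public int Count { get { return items.Count; } }
    public T this[int index] { get { return items[index]; } }

    // Returns false (and changes nothing) when the item is already present.
    public bool Add(T item)
    {
        if (!seen.Add(item)) return false;
        items.Add(item);
        return true;
    }

    public bool Contains(T item) { return seen.Contains(item); }
}
```

Compared with the Collection<T>-based UniqueList above, the duplicate check no longer walks the whole list, at the cost of storing every item twice.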

In an object of type List<T>, does accessing an object by index run through each item in the list, or does it use a much faster approach?

For example:
List<MyClass> myList = new List<MyClass>();
...
// add lots of members...
...
MyClass myClass = myList[25];
Will asking for index 25 take much longer than asking for index 1, or does it use some quick algorithm to jump straight to the 25th item?
Thanks!
Internally, List<T> is implemented as an array (which grows when you add new items), so accessing the n-th element is an O(1) operation. (Therefore there will be no difference in speed between getting myList[1] and myList[25].)
Excerpt from the List<T>.Item property documentation:
Retrieving the value of this property is an O(1) operation; setting the property is also an O(1) operation.
I can only imagine how slow .NET applications would be if List<T> had to walk through all the items before reaching the n-th one...
From the Item property of List<T>
Retrieving the value of this property is an O(1) operation; setting the property is also an O(1) operation.
No, it's very fast. In fact it's not an algorithm at all*; the backing store for List<T> is just a T[] array; so all it has to do is jump to a known location in memory.
In abstract terms, think of it like this: since the elements of an array reside in a contiguous block of memory, you can imagine the array as a number line. Does it take you any longer to find "10" on a number line than "1"? No -- you know exactly how the numbers are laid out, so all you have to do is look straight at 10. You don't have to scroll your eyes through 1, 2, 3, etc., in other words.
Granted, that's a highly non-technical analogy; but it's pretty consistent with how accessing an element of an array works.
*A calculation is required, yes: the address of the first element in the array plus the product of the element size and the index. But to call this an "algorithm" would be a stretch; and anyway, it is a constant-time operation regardless.
No. Removal and insertion, on the other hand, depend on where you remove or insert an element, since it is a dynamic array.
http://en.wikipedia.org/wiki/Dynamic_array
List<T> uses T[] internally so indexing is supported directly by the underlying data structure.
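To illustrate the point, here is a deliberately simplified growable array. It is not the real List<T> source, just a sketch of the same idea: the indexer is a bounds check plus one array access, whatever the index.

```csharp
using System;

// Simplified illustration of a dynamic array; TinyList is a made-up name.
public class TinyList<T>
{
    private T[] buffer = new T[4];
    private int count;

    public int Count { get { return count; } }

    public void Add(T item)
    {
        if (count == buffer.Length)
            Array.Resize(ref buffer, buffer.Length * 2); // occasional O(n) copy
        buffer[count++] = item;
    }

    // Same cost for index 1 and index 25: no walking through earlier items.
    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= count)
                throw new ArgumentOutOfRangeException("index");
            return buffer[index];
        }
    }
}
```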

How to get a particular item from a Dictionary object with index?

How do I retrieve an item from a dictionary object using an index? E.g. I have a dictionary object of 10 items and I have to get the 5th key/value pair from the dictionary.
Dictionaries are unordered. If you mean "the 5th item added to the dictionary" - they don't provide this functionality.
One thing to be careful of is that in many cases Dictionary<TKey, TValue> appears to be ordered - if you just add a bunch of entries and then iterate, then under the current implementation I believe you will at least usually get back the pairs in the same order. However, it's not guaranteed, it's not meant to happen particularly - it's just a quirk of the implementation. If you delete entries and then add more, then the whole thing goes pear-shaped.
Fundamentally, if you want ordering as well as key lookups, you need to store a list as well as a dictionary.
If you are using .NET 3.5 or greater:
var keyValuePair = d.ElementAt(4);
However, this is using an enumerator behind the scenes and the ordering of enumerated items from a dictionary is not guaranteed:
The IDictionary interface allows the contained keys and values to be enumerated, but it does not imply any particular sort order (From IDictionary reference on MSDN).
This means that the element you get back might not correspond to the order you inserted it in and thus is probably not what you expect.
There is an OrderedDictionary class in System.Collections.Specialized that enforces the ordering and allows you to access by index through the Item indexer. However, this is from the pre-generics days so it only accepts object key-values and thus isn't quite as friendly to work with as the generic collections.
I just found this article on CodeProject that implements a generic OrderedDictionary. I have never used this but it might be useful for you.
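For reference, a short sketch of the non-generic OrderedDictionary in use. The Build helper and the values it stores are made up for the example. One thing to watch: an int argument selects the positional indexer, so this class is awkward when the keys themselves are ints.

```csharp
using System;
using System.Collections.Specialized;

public static class OrderedDictionaryDemo
{
    // Builds a dictionary whose i-th key (in insertion order) maps to i * 10.
    public static OrderedDictionary Build(params string[] keys)
    {
        OrderedDictionary d = new OrderedDictionary();
        for (int i = 0; i < keys.Length; i++)
        {
            d.Add(keys[i], i * 10);
        }
        return d;
    }
}
```

With d = Build("a", "b", "c", "d", "e"), d[4] is the value of the fifth entry added and d["b"] is a normal key lookup; both return object, so a cast is needed.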
Ignoring the fundamental abuse of a dictionary that this question presents:
int counter = 0;
foreach (var pair in yourDictionary)
{
    if (++counter == 5)
    {
        // pair contains your fifth item
    }
}
If you're using a generic dictionary like this:
Dictionary<int,string> myDict = new Dictionary<int,string>();
You could pull the 5th value from the dictionary by converting the output to a list:
string SomeString = myDict.Values.ToList()[4];
But typically you'd use a dictionary when you're more concerned about retrieving a value based on a pre-determined key rather than its position in the list.
The 5th according to which ordering? The Dictionary class does not guarantee any specific ordering. If you want it in some specific ordering, retrieve the pairs from the collection (for example as John suggests) and sort them, then get the KeyValuePair at the index you need. If you need it ordered by insertion order, try using the System.Collections.Specialized.OrderedDictionary instead, then you can access the KeyValuePair directly by index.
Use System.Linq
string item=dicOBj.Keys.ElementAt(index);
you can get both key and value in the same way specifying index
The .NET Framework has several dictionary classes, including Dictionary, Hashtable, ListDictionary, OrderedDictionary, SortedDictionary, SortedList and the generic SortedList. In all of these classes items can be retrieved by key, but items can be retrieved by index only in OrderedDictionary, SortedList and the generic SortedList. If you need to retrieve items from your dictionary by key or by index, you must use one of these classes: OrderedDictionary, SortedList or the generic SortedList.
For how to use these classes, see the documentation: OrderedDictionary Class, SortedList Class.

Fastest way to find out whether two ICollection<T> collections contain the same objects

What is the fastest way to find out whether two ICollection<T> collections contain precisely the same entries? Brute force is clear, I was wondering if there is a more elegant method.
We are using C# 2.0, so no extension methods if possible, please!
Edit: the answer would be interesting both for ordered and unordered collections, and would hopefully be different for each.
use C5
http://www.itu.dk/research/c5/
ContainsAll
"Check if all items in a supplied collection is in this bag (counting multiplicities). items: The items to look for. Returns: True if all items are found."
[Tested]
public virtual bool ContainsAll<U>(SCG.IEnumerable<U> items) where U : T
{
    HashBag<T> res = new HashBag<T>(itemequalityComparer);
    foreach (T item in items)
        if (res.ContainsCount(item) < ContainsCount(item))
            res.Add(item);
        else
            return false;
    return true;
}
First compare the .Count of the collections; if they have the same count, then do a brute-force compare on all elements. The worst case is O(n). This applies when the order of the elements needs to be the same.
The second case, where the order is not the same, needs a dictionary to store the count of each element found in the collections. Here's a possible algorithm:
Compare the collections' Count: return false if they are different.
Iterate the first collection:
  If the item doesn't exist in the dictionary, add an entry with Key = item, Value = 1 (the count).
  If the item exists, increment its count in the dictionary.
Iterate the second collection:
  If the item is not in the dictionary, return false.
  If the item is in the dictionary, decrement its count.
  If the count == 0, remove the item.
return Dictionary.Count == 0;
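The steps above could be sketched like this. The class and method names are invented, and it assumes non-null items with consistent Equals/GetHashCode. It avoids LINQ and extension methods, so it should be usable from C# 2.0.

```csharp
using System;
using System.Collections.Generic;

public static class CollectionComparer
{
    // True when both collections hold the same items with the same
    // multiplicities, regardless of order.
    public static bool UnorderedEquals<T>(ICollection<T> first, ICollection<T> second)
    {
        if (first.Count != second.Count) return false;

        // Count how often each item occurs in the first collection.
        Dictionary<T, int> counts = new Dictionary<T, int>();
        foreach (T item in first)
        {
            int n;
            counts.TryGetValue(item, out n); // n stays 0 when the key is absent
            counts[item] = n + 1;
        }

        // Cancel each occurrence against the second collection.
        foreach (T item in second)
        {
            int n;
            if (!counts.TryGetValue(item, out n)) return false;
            if (n == 1) counts.Remove(item);
            else counts[item] = n - 1;
        }

        return counts.Count == 0;
    }
}
```

Both passes are single loops over hash lookups, so the whole comparison is roughly O(n) rather than the O(n*n) of comparing every pair.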
For ordered collections, you can use the SequenceEqual() extension method defined by System.Linq.Enumerable:
if (firstCollection.SequenceEqual(secondCollection))
You mean the same entries or the same entries in the same order?
Anyway, assuming you want to compare whether they contain the same entries in the same order, "brute force" is really your only option in C# 2.0. I know what you mean by non-elegant, but if the atomic comparison itself is O(1), the whole process should be O(N), which is not that bad.
If the entries need to be in the same order (besides being the same), then I suggest - as an optimization - that you iterate both collections at the same time and compare the current entry in each collection. Otherwise, the brute force is the way to go.
Oh, and another suggestion - you could override Equals for the collection class and implement the equality stuff in there (depends on your project, though).
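A sketch of the "iterate both collections at the same time" idea for the ordered case, again without LINQ so that it fits C# 2.0. The names are hypothetical, and elements are compared with the default equality comparer.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceComparer
{
    // True when both collections contain equal items in the same order.
    public static bool OrderedEquals<T>(ICollection<T> first, ICollection<T> second)
    {
        if (first.Count != second.Count) return false; // cheap early exit

        EqualityComparer<T> comparer = EqualityComparer<T>.Default;
        using (IEnumerator<T> a = first.GetEnumerator())
        using (IEnumerator<T> b = second.GetEnumerator())
        {
            // Walk both collections in lockstep and stop at the first mismatch.
            while (a.MoveNext() && b.MoveNext())
            {
                if (!comparer.Equals(a.Current, b.Current)) return false;
            }
        }
        return true;
    }
}
```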
Again, using the C5 library, having two sets, you could use:
C5.ICollection<T> set1 = new C5.HashSet<T>();
C5.ICollection<T> set2 = new C5.HashSet<T>();
if (set1.UnsequencedEquals (set2)) {
// Do something
}
The C5 library includes a heuristic that actually tests the unsequenced hash codes of the two sets first (see C5.ICollection<T>.GetUnsequencedHashCode()) so that if the hash codes of the two sets are unequal, it doesn't need to iterate over every item to test for equality.
Also something of note to you is that C5.ICollection<T> inherits from System.Collections.Generic.ICollection<T>, so you can use C5 implementations while still using the .NET interfaces (though you have access to less functionality through .NET's stingy interfaces).
Brute force takes O(n) - comparing all elements (assuming they are sorted), which I would think is the best you could do - unless there is some property of the data that makes it easier.
I guess for the unsorted case, it's O(n*n).
In which case, I would think a solution based around a merge sort would probably help.
For example, could you re-model it so that there was only one collection? Or 3 collections, one for those in collection A only, one for B only and for in both - so if the A only and B only are empty - then they are the same... I am probably going off on totally the wrong tangent here...
