How does C# SortedList internally work? - c#

I was wondering, how does the SortedList work.
I know that a regular List is based on a dynamic array, but what is SortedList based on?
And what sorting algorithm does it use?
Thanks

From the SortedList documentation: "SortedList is implemented as an array of key/value pairs, sorted by the key."
http://msdn.microsoft.com/en-us/library/e7a8xew6%28v=vs.110%29.aspx
If you use the default constructor (no parameters): "Initializes a new instance of the SortedList class that is empty, has the default initial capacity, and is sorted according to the IComparable interface implemented by each key added to the SortedList object."
http://msdn.microsoft.com/en-us/library/cxb97few%28v=vs.110%29.aspx
Or you can pass a custom comparer:
"Initializes a new instance of the SortedList class that is empty, has the default initial capacity, and is sorted according to the specified IComparer interface."
http://msdn.microsoft.com/en-us/library/e7a8xew6%28v=vs.110%29.aspx
Other constructor options:
http://msdn.microsoft.com/en-us/library/System.Collections.SortedList.SortedList%28v=vs.110%29.aspx
How to use IComparer interface: http://msdn.microsoft.com/en-us/library/system.collections.icomparer%28v=vs.110%29.aspx

SortedList class source code can be found here.
According to this source, SortedList keeps data in two plain arrays of keys and values:
private TKey[] keys;
private TValue[] values;
Sort order is maintained on the array of keys. When a new item (key/value pair) is added, SortedList first finds proper index in the sorted array of keys (using Array.BinarySearch), then moves partial contents of both key and value arrays (using Array.Copy) starting from this index upward to create a gap where both key and value are inserted. Likewise, when an item is deleted by its key, SortedList searches for the item's index in the array of keys, then moves partial contents of both arrays from this index downward to close the gap.
So, one thing to keep in mind is that when adding to or deleting from a large SortedList, a lot of data may be moved around. On the positive side, retrieving items by index is always fast regardless of the list size.
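The insert and delete steps described above can be sketched roughly as follows. This is a simplified illustration, not the actual SortedList source: the class name TinySortedList, the fixed int/string types, the initial capacity of 4, and the doubling growth policy are all assumptions made for the example. Only Array.BinarySearch, Array.Copy and Array.Resize are real framework calls.

```csharp
using System;

// Hypothetical, simplified stand-in for the two parallel arrays described above.
public class TinySortedList
{
    private int[] keys = new int[4];
    private string[] values = new string[4];
    private int count;

    public int Count { get { return count; } }

    public void Add(int key, string value)
    {
        // Binary search over the used part of the key array only.
        int index = Array.BinarySearch(keys, 0, count, key);
        if (index >= 0)
        {
            throw new ArgumentException("Duplicate key: " + key);
        }
        // A negative result is the bitwise complement of the insertion point.
        index = ~index;

        if (count == keys.Length)
        {
            // Grow both arrays together (doubling is an assumption of this sketch).
            Array.Resize(ref keys, count * 2);
            Array.Resize(ref values, count * 2);
        }

        // Move the tail of both arrays up by one slot to open a gap.
        Array.Copy(keys, index, keys, index + 1, count - index);
        Array.Copy(values, index, values, index + 1, count - index);

        keys[index] = key;
        values[index] = value;
        count++;
    }

    public bool Remove(int key)
    {
        int index = Array.BinarySearch(keys, 0, count, key);
        if (index < 0)
        {
            return false;
        }
        count--;
        // Move the tail of both arrays down by one slot to close the gap.
        Array.Copy(keys, index + 1, keys, index, count - index);
        Array.Copy(values, index + 1, values, index, count - index);
        values[count] = null; // drop the stale reference left past the end
        return true;
    }

    // Index access is a plain array read, so it does not depend on list size.
    public int KeyAt(int index) { return keys[index]; }
    public string ValueAt(int index) { return values[index]; }
}
```

In this sketch, adding keys 5, 1, 3 in that order leaves the key array as 1, 3, 5. Each Add shifts every element after the insertion point, which is the data movement the answer warns about for large lists.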

A SortedList is an object that maintains two arrays to store the elements of the list.
One array stores the keys and the other array stores their associated values.
The elements in the SortedList are sorted either according to a specific IComparer implementation specified when the SortedList is created or according to the IComparable implementation provided by the keys themselves. A SortedList cannot contain duplicate keys.
Whenever an element is added or removed, the indexes are adjusted accordingly; as a result, add and remove operations on a SortedList are slower.

If you are interested in the internals of how it works, get yourself any half decent decompiler, such as .Net Reflector.
A quick look shows that internally, SortedList maintains its sorted state by keeping an internal array sorted at all times. When an item is added using the Add method, it uses a binary search on the keys to identify the correct index to insert the new item.

Related

What are the differences between a list, sorted list, and an array list? (c#)

From what I've read, a list, sorted list, and an array list have many things in common, but at the same time have a few differences.
I would like to know: What are the differences between them that a beginner should know? Why choose one over the other? And what are some good habits to form when using them in code?
Thank you for your time.
From MSDN:
A SortedList element can be accessed by its key, like an element in
any IDictionary implementation, or by its index, like an element in
any IList implementation.
A SortedList object internally maintains two arrays to store the
elements of the list; that is, one array for the keys and another
array for the associated values. Each element is a key/value pair that
can be accessed as a DictionaryEntry object. A key cannot be null, but
a value can be.
Also for choosing best collection you can see this.
With List<T> and SortedList<TKey, TValue> you can specify the element types, and they are generally easier to use because of that. ArrayList is legacy and holds objects, so you must cast them to the contained type yourself.
SortedList<TKey, TValue>, as the name implies, keeps its key/value pairs sorted by key. Use it when you need sorted order. Use List<T> when sorted ordering is unnecessary, when a general collection of T is sufficient, or when you provide your own sorting mechanism. SortedList<TKey, TValue> will be slower at adding items than List<T>, so only use it when necessary.

Is a HashSet<T> the same as List<T> but with uniqueness?

I need to have an ability to have unique items in a collection.
I was going to use a Dictionary so I could use the ContainsKey method, but I thought it would be a waste as I wouldn't use the Value property of the Key/Value pair.
I came across the HashSet<T> which looks very promising. The only thing I can find that I can't find in the List<T> docs is that HashSet<T> is unordered. I think that is fine, I assume it means its not ordered using a IEqualityComparer. As long as the order in which items are added are in the same index position I think it will be ok as I have to do duplicate checking hence the hashset and then check all entries are sequential.
Is there anything else I have missed comparing the two types?
No, importantly HashSet<T> doesn't have any concept of ordering or indexing - a list conceptually has slots 0....n-1, whereas a set is "just a set".
I think that is fine, I assume it means its not ordered using a IEqualityComparer.
IEqualityComparer isn't used for ordering anyway - it only talks about equality and hash codes. HashSet<T> isn't ordered by either an element comparison (as, say, SortedSet<T> is) or insertion order.
As long as the order in which items are added are in the same index position I think it will be ok.
There is no index position, and when you iterate over a HashSet<T> there's no guarantee you'll get them back in the order in which you added them. If you're even thinking about ordering, HashSet<T> isn't what you're after.
Then again, all of this is also true of Dictionary<TKey, TValue> - you shouldn't make any assumptions about ordering there, either.
This is a 'picture' of what a List<T> looks like:
List: |a|b|r|t|i|p|c|y|z|...
Index: |0|1|2|3|4|5|6|7|8|...
The List<T> represents, well, a list of items. You can refer to an item by its position in the list.
This is a 'picture' of what a HashSet<T> looks like:
Set: |a|b|c| | | | | |i| | | | | | |p| |r| |t| | | | |y|z|
Bucket: |a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z|
The HashSet<T> represents a set of unique items. Every item has its own 'bucket'. You can refer to an item by its bucket. The bucket that an item belongs in is calculated directly from an item.
One of the advantages of using a HashSet over a List is constant-time searches. In a List, an item could be anywhere in the List, so to find it, you need to look at every item in the List. In a HashSet, there is only one possible location for any given item. Therefore, to search for an item, all you need to do is look in its bucket. If it's there, it's there, if it's not, it's not.
The illustrations may not be 100% accurate (for simplicity's sake). Especially the HashSet example.
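As a small illustration of the duplicate checking the question asks about, here is a sketch that relies on HashSet<T>.Add returning false when the item is already present. The class and method names (DuplicateFinder, FindDuplicates) are made up for this example. The input order is preserved by the List<T>, not by the set.

```csharp
using System.Collections.Generic;

// Hypothetical helper: reports items that occur more than once.
public static class DuplicateFinder
{
    public static List<string> FindDuplicates(IEnumerable<string> items)
    {
        var seen = new HashSet<string>();
        var duplicates = new List<string>();
        foreach (string item in items)
        {
            // Add returns false and leaves the set unchanged if the item is already in it.
            if (!seen.Add(item))
            {
                duplicates.Add(item);
            }
        }
        return duplicates;
    }
}
```

Each Add is a hash lookup rather than a scan, which is the constant-time behaviour described above. The equivalent check with List<T>.Contains would look at every stored item.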
No. A HashSet doesn’t allow access via index because the items aren’t ordered. This does not mean, as you suspect, that they aren’t ordered according to some IEqualityComparer. It means that they are not stored inside the hash set in the order of adding them.
So if you need an order preserving or random access container, HashSet is not for you.
It sounds like this is what you're after:
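using System.Collections.ObjectModel;

class UniqueList<T> : Collection<T>
{
    // Collection<T>.Add and Collection<T>.Insert both route through InsertItem.
    // Note: assigning through the indexer goes through SetItem, which is not covered here.
    protected override void InsertItem(int index, T item)
    {
        if (!base.Contains(item))
        {
            base.InsertItem(index, item);
        }
        else
        {
            // whatever
        }
    }
}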
Calling UniqueList.Add will add an item to the end of the list, and will not add duplicate values.
Well, HashSet<T> is conceptually a list of unique values, but unlike List<T> it doesn't implement the IList<T> interface; it implements ICollection<T>.
It also has a set of special methods, such as
IntersectWith, IsSubsetOf, IsSupersetOf, and UnionWith, which List<T> doesn't have.
These methods are, naturally, handy for operations on multiple HashSets.
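A brief sketch of those set operations follows. Note that IntersectWith and UnionWith modify the set they are called on, so the hypothetical helpers below (SetOperations, Intersection, Union are names invented for this example) copy the first input before combining.

```csharp
using System.Collections.Generic;

// Hypothetical helpers that return a new set instead of mutating an input.
public static class SetOperations
{
    public static HashSet<int> Intersection(IEnumerable<int> first, IEnumerable<int> second)
    {
        var result = new HashSet<int>(first); // copy so the caller's data is untouched
        result.IntersectWith(second);         // keep only items also present in second
        return result;
    }

    public static HashSet<int> Union(IEnumerable<int> first, IEnumerable<int> second)
    {
        var result = new HashSet<int>(first);
        result.UnionWith(second);             // add every item from second, ignoring duplicates
        return result;
    }
}
```

IsSubsetOf and IsSupersetOf only return a bool and leave both collections unchanged, so they can be called directly on an existing set.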
You got it slightly wrong. Neither Dictionary nor HashSet preserves the order of the items, which means you can't rely on the item index. Theoretically you can use LINQ ElementAt() to access an item by index, but again, neither collection guarantees that order is preserved.
.NET provides an OrderedDictionary class, but it is not generic, so you would not have type safety at compile time. Anyway, it allows accessing items by index.
Here is a custom implementation of the generic one: OrderedDictionary(of T): A generic implementation of IOrderedDictionary. The key point: it persists two collections -- List and Dictionary at the same time; List provides access by index and Dictionary provides fast access by a key.

How to get a particular item from a Dictionary object with index?

How to retrieve an item from a dictionary object using an index? E.g. I have a dictionary object of 10 items and I have to get the 5th key/value pair from the dictionary?
Dictionaries are unordered. If you mean "the 5th item added to the dictionary" - they don't provide this functionality.
One thing to be careful of is that in many cases Dictionary<TKey, TValue> appears to be ordered - if you just add a bunch of entries and then iterate, then under the current implementation I believe you will at least usually get back the pairs in the same order. However, it's not guaranteed, it's not meant to happen particularly - it's just a quirk of the implementation. If you delete entries and then add more, then the whole thing goes pear-shaped.
Fundamentally, if you want ordering as well as key lookups, you need to store a list as well as a dictionary.
If you are using .NET 3.5 or greater:
var keyValuePair = d.ElementAt(4);
However, this is using an enumerator behind the scenes and the ordering of enumerated items from a dictionary is not guaranteed:
The IDictionary interface allows the contained keys and values to be enumerated, but it does not imply any particular sort order (From IDictionary reference on MSDN).
This means that the element you get back might not correspond to the order you inserted it in and thus is probably not what you expect.
There is an OrderedDictionary class in System.Collections.Specialized that enforces the ordering and allows you to access by index through the Item indexer. However, this is from the pre-generics days so it only accepts object key-values and thus isn't quite as friendly to work with as the generic collections.
I just found this article on CodeProject that implements a generic OrderedDictionary. I have never used this but it might be useful for you.
Ignoring the fundamental abuse of a dictionary that this question presents:
int counter = 0;
foreach (var pair in yourDictionary)
{
    if (++counter == 5)
    {
        // pair contains your fifth item
    }
}
If you're using a generic dictionary like this:
Dictionary<int,string> myDict = new Dictionary<int,string>();
You could pull the 5th value from the dictionary by converting the output to a list:
string SomeString = myDict.Values.ToList()[4];
But typically you'd use a dictionary when you're more concerned about retrieving a value based on a pre-determined key rather than its position in the list.
The 5th according to which ordering? The Dictionary class does not guarantee any specific ordering. If you want it in some specific ordering, retrieve the pairs from the collection (for example as John suggests) and sort them, then get the KeyValuePair at the index you need. If you need it ordered by insertion order, try using the System.Collections.Specialized.OrderedDictionary instead, then you can access the KeyValuePair directly by index.
Use System.Linq
string item=dicOBj.Keys.ElementAt(index);
you can get both key and value in the same way specifying index
.NET Framework has several dictionary classes, including Dictionary, Hashtable, ListDictionary, OrderedDictionary, SortedDictionary, SortedList, and the generic SortedList. In all these classes items can be retrieved by key, but items can be retrieved by index only in OrderedDictionary, SortedList, and the generic SortedList. If you need to retrieve items from your dictionary by key or by index, you must use one of these classes: OrderedDictionary, SortedList, or the generic SortedList.
How to use these classes you can find: OrderedDictionary Class , SortedList Class
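A short sketch of index access on two of those classes, under the assumption of string keys and int values. The wrapper names (IndexAccess, EntryAt, ValueAt) are invented for this example. The Keys and Values lists of the generic SortedList and the int indexer of OrderedDictionary are the real members being shown.

```csharp
using System.Collections.Generic;
using System.Collections.Specialized;

// Hypothetical wrappers around the index-based members.
public static class IndexAccess
{
    // Generic SortedList: the index follows the sort order of the keys.
    public static KeyValuePair<string, int> EntryAt(SortedList<string, int> list, int index)
    {
        return new KeyValuePair<string, int>(list.Keys[index], list.Values[index]);
    }

    // OrderedDictionary: the index follows insertion order, and the value
    // comes back as object because the class is not generic.
    public static object ValueAt(OrderedDictionary dictionary, int index)
    {
        return dictionary[index];
    }
}
```

The two classes answer different questions: with the generic SortedList, index 0 is the smallest key, while with OrderedDictionary, index 0 is the first entry that was added.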

Unexpected issue Copying Dictionaries

My idea was to copy a dictionary while resetting all the values of the previous one, so i have this instruction:
var dic2 = new Dictionary<string, int>(dic.ToDictionary(kvp => kvp.Key, kvp => 0));
However, I had an unexpected problem doing this, since the new copied dictionary doesn't have the same order of keys as the previous one.
Is there any way to reset the values but maintain the same order of keys, without resorting to some type of sorting?
The answer is not to rely on the order of keys in a Dictionary<,> in the first place. It's emphatically not guaranteed.
This is documented on MSDN, but not nearly as clearly as I'd have wanted:
For purposes of enumeration, each item
in the dictionary is treated as a
KeyValuePair structure
representing a value and its key. The
order in which the items are returned
is undefined.
.NET doesn't currently have an insertion-order-preserving dictionary implementation as far as I'm aware :(
The order of the keys in a Dictionary<K,V> isn't maintained. You might want to use a SortedDictionary<K,V> instead (note that this class sorts the entries based on the key, but doesn't allow an arbitrary order, unless you create a specific key comparer)
.Net dictionaries are unordered.
Your question has no answer.
You should consider using a List<KeyValuePair<string, int>> instead.
Dictionary doesn't define a sequence of keys. It is not an array or a list. You should not rely on the order of keys in a dictionary. Dictionary was made for by-key access, not for sequential access.
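If the order matters, one option suggested above is a List<KeyValuePair<string, int>>. A minimal sketch of resetting the values while keeping the sequence, assuming the data is already held in such a list (the names OrderedReset and ResetValues are invented for this example):

```csharp
using System.Collections.Generic;

// Hypothetical helper: copies the pairs in their existing order with every value set to 0.
public static class OrderedReset
{
    public static List<KeyValuePair<string, int>> ResetValues(
        IEnumerable<KeyValuePair<string, int>> pairs)
    {
        var result = new List<KeyValuePair<string, int>>();
        foreach (KeyValuePair<string, int> pair in pairs)
        {
            // KeyValuePair is immutable, so a new pair is created for each key.
            result.Add(new KeyValuePair<string, int>(pair.Key, 0));
        }
        return result;
    }
}
```

The trade-off is that a list has no fast lookup by key. If both are needed, a list for order and a dictionary for lookup can be kept side by side.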

IndexOf too slow on list. Faster solution?

I have a generic list whose order must be preserved so I can retrieve the index of an object in the list. The problem is IndexOf is way too slow. If I comment the IndexOf out, the code runs fast as can be. Is there a better way, such as a preserved ordered hash list for C#?
Thanks,
Nate
Edit -
The order in which the items are added/inserted is the order it needs to be. No sorting on them is necessary. Also this list has the potential to be updated often, add, remove, insert. Basically I need to translate the object to an index due to them being represented in a grid control so I can perform operations on the grid control based on index.
If it's not sorted, but the order needs to be preserved, then you could have a separate Dictionary<YourClass, int> which would contain the index for each element.
If you want a sorted list, then check previous posts - you can use SortedList<TKey, TValue> in .Net 3.5, or sort it and use BinarySearch in older .Net versions.
[Edit] You can find similar examples on the web, e.g.: OrderedList. This one internally uses an ArrayList and a HashTable, but you can easily make it generic.
[Edit2] Ooops.. the example I gave you doesn't implement IndexOf the way I described at the beginning... But you get the point - one list should be ordered, the other one used for quick lookup.
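The list-plus-dictionary idea from the start of this answer could look roughly like the sketch below. IndexedList<T> is a hypothetical class written for this example, not a framework type. It assumes items are unique, non-null, and have stable Equals/GetHashCode behaviour.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: a List<T> keeps the order, a Dictionary<T, int> caches each item's index.
public class IndexedList<T>
{
    private readonly List<T> items = new List<T>();
    private readonly Dictionary<T, int> positions = new Dictionary<T, int>();

    public int Count { get { return items.Count; } }
    public T this[int index] { get { return items[index]; } }

    public void Add(T item)
    {
        positions.Add(item, items.Count); // throws if the item is already present
        items.Add(item);
    }

    public void Insert(int index, T item)
    {
        if (positions.ContainsKey(item))
        {
            throw new ArgumentException("Item is already in the list.");
        }
        items.Insert(index, item);
        Reindex(index); // every item from index onward has moved
    }

    public bool Remove(T item)
    {
        int index;
        if (!positions.TryGetValue(item, out index))
        {
            return false;
        }
        items.RemoveAt(index);
        positions.Remove(item);
        Reindex(index);
        return true;
    }

    // Hash lookup instead of a linear scan.
    public int IndexOf(T item)
    {
        int index;
        return positions.TryGetValue(item, out index) ? index : -1;
    }

    private void Reindex(int start)
    {
        for (int i = start; i < items.Count; i++)
        {
            positions[items[i]] = i;
        }
    }
}
```

IndexOf and appending become cheap, but Insert and Remove still cost O(n) because the cached indexes after the change have to be rewritten. Whether that is a net win depends on how often lookups happen compared with inserts and removals in the middle of the list.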
Sort it using List<T>.Sort, then use the List<T>.BinarySearch method: "Searches the entire sorted List(T) for an element [...] This method is an O(log n) operation, where n is the number of elements in the range."
See the bottom of this article here.
It appears that writing your own method to retrieve the index is much quicker than using the IndexOf method, due to the fact that it calls into a virtual method depending on the type.
Something like this may therefore improve your performance. I wrote a small unit test to verify that this improves the performance of the search, and it did, by about 15x in a list with 10,000 items.
static int GetIndex(IList<Item> list, Item value)
{
    for (int index = 0; index < list.Count; index++)
    {
        if (list[index] == value)
        {
            return index;
        }
    }
    return -1;
}
Perhaps you are looking for SortedList<TKey, TValue>?
If the order of the objects in the list has to be preserved then the only way I can think of where you're going to get the fastest possible access is to tell the object what its index position is when its added etc to the list. That way you can query the object to get its index in the list. The downside, and its a big downside in my view, is that the inserted objects now have a dependency on the list.
I suggest to use the SortedList<TKey, TValue> or SortedDictionary<TKey, TValue> class if you need the items sorted. The differences are the following.
SortedList<TKey, TValue> uses less memory than SortedDictionary<TKey, TValue>.
SortedDictionary<TKey, TValue> has faster insertion and removal operations for unsorted data: O(log n) as opposed to O(n) for SortedList<TKey, TValue>.
If the list is populated all at once from sorted data, SortedList<TKey, TValue> is faster than SortedDictionary<TKey, TValue>.
If you just want to preserve the ordering, you can just use a Dictionary<TKey, TValue> and store the item as key and the index as value. The drawback is that reordering the items, insertions, or deletion are quite expensive to do.
Well there is no reason you should ever have to order a hash list...that's kind of the point. However, a hash list should do the trick quite readily.
If you are using the List class then you could use the Sort method to sort it after it is initially populated, then use the BinarySearch method to find the appropriate element.
I'm not sure about specifics in C#, but you might be able to sort it (QuickSort?) and then use a binary search on it (BinarySearch performance is O(log2(N)), versus Sequential, such as indexOf, which is O(n)). (IMPORTANT: For a Binary Search, your structure must be sorted)
When you insert items to your data structure, you could try a modified binary search to find the insertion point as well, or if you are adding a large group, you would add them and then sort them.
The only issue is that insertion will be slower.
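In C#, the "modified binary search to find the insertion point" is already available: List<T>.BinarySearch returns the bitwise complement of the insertion index when the value is not found. A minimal sketch, assuming a List<int> that is already sorted (SortedInsert and InsertSorted are names invented for this example):

```csharp
using System.Collections.Generic;

// Hypothetical helper: inserts into an already sorted list and keeps it sorted.
public static class SortedInsert
{
    public static int InsertSorted(List<int> sorted, int value)
    {
        int index = sorted.BinarySearch(value); // O(log n); the list must be sorted
        if (index < 0)
        {
            index = ~index; // complement of a negative result is the insertion point
        }
        sorted.Insert(index, value); // O(n): later elements are shifted up
        return index;
    }
}
```

This only applies when sorted order is acceptable. It does not preserve insertion order, which is what the question ultimately needs.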
