Can anyone explain what the complexity of Dictionary.ContainsValue is?
I know that Dictionary.ContainsKey complexity is O(1).
Short answer: O(n)
Long answer:
This method performs a linear search; therefore, the average execution time is proportional to Count. That is, this method is an O(n) operation, where n is Count.
Official documentation
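In other words, ContainsValue has to walk the stored values one by one. A minimal sketch of what that linear search amounts to (illustrative only, not the actual BCL source):

```csharp
using System;
using System.Collections.Generic;

class ContainsValueDemo
{
    static void Main()
    {
        var ages = new Dictionary<string, int>
        {
            ["alice"] = 30,
            ["bob"] = 25
        };

        // ContainsValue behaves roughly like this linear scan over every
        // stored value, so its cost grows with Count.
        bool found = false;
        foreach (int value in ages.Values)
        {
            if (EqualityComparer<int>.Default.Equals(value, 25))
            {
                found = true;
                break;
            }
        }

        Console.WriteLine(found);                   // True
        Console.WriteLine(ages.ContainsValue(25));  // True, same O(n) cost
        Console.WriteLine(ages.ContainsKey("bob")); // True, approaches O(1)
    }
}
```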
O(n).
This method performs a linear search; therefore, the average execution time is proportional to Count. That is, this method is an O(n) operation, where n is Count.
In particular, in the Dictionary structure the keys are stored using a hashing mechanism. When you call ContainsKey, it computes the hash code of the argument and checks whether a matching entry exists in the hash table.
The values of the dictionary are not hashed, so when you call ContainsValue the algorithm has to iterate over the stored values until it finds the first match (if one exists).
Note that if you rely on this function, you might want another structure that offers better complexity for value lookups; for example, you could store all values in a HashSet<T>.
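A minimal sketch of that idea, keeping a parallel HashSet<string> of values alongside the dictionary (the names Add, byId and knownValues are made up for this example):

```csharp
using System;
using System.Collections.Generic;

class ValueLookupDemo
{
    static void Main()
    {
        var byId = new Dictionary<int, string>();
        var knownValues = new HashSet<string>();   // parallel set of values

        void Add(int id, string name)
        {
            byId[id] = name;
            knownValues.Add(name);                 // keep the set in sync on every write
        }

        Add(1, "alice");
        Add(2, "bob");

        Console.WriteLine(byId.ContainsValue("bob"));   // O(n) linear search
        Console.WriteLine(knownValues.Contains("bob")); // O(1) on average
    }
}
```

The trade-off is that every write to the dictionary must also update the set, and removals or duplicate values need extra bookkeeping.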
Related
I understand why a HashTable Add is O(1) (however please correct me if I'm wrong): The item being added is always allocated to the first available spot in the backing array.
I understand why a Lookup is O(n) (again, please correct me if I'm wrong): You need to walk through the backing array to find the value/key requested, and the running time of this operation will be directly proportional to the size of the collection.
However, why, then, is a Delete constant? It seems to me the same principles involved in an Add/Lookup are required.
EDIT
The MSDN article refers to a scenario where the item requested to be deleted isn't found. It mentions this as being an O(1) operation.
The worst cases for Insert and Delete are supposed to be O(n), see http://en.wikipedia.org/wiki/Hash_table.
When we Insert, we have to check if the value is in the table or not, hence O(n) in the worst case.
Just imagine a pathological case when all hash values are the same.
Maybe MSDN refers to average complexity.
O(1) is the best case, and probably the average case if you appropriately size the table. Worst case deletion for a HashTable is O(n).
Consider for example the documentation for the .NET Framework 4.5 Dictionary<TKey, TValue> class:
In the remarks for the .ContainsKey method, they state that
This method approaches an O(1) operation.
And in the remarks for the .Count property, they state that
Retrieving the value of this property is an O(1) operation.
Note that I am not necessarily asking for the details of C#, .NET, Dictionary or what Big O notation is in general. I just found this distinction of "approaches" intriguing.
Is there any difference? If so, how significant can it potentially be? Should I pay attention to it?
If the hash function used by the objects in the hash table is "good", it means collisions will be very rare. Odds are good that there will be only one item in a given hash bucket, maybe two, and almost never more. If you could say, for certain, that there will never be more than c items (where c is a constant) in a bucket, then the operation would be O(c) (which is O(1)). But that assurance can't be made. It's possible that you just happen to have n different items that, unluckily enough, all collide and end up in the same bucket, and in that case ContainsKey is O(n). It's also possible that the hash function isn't "good" and produces hash collisions frequently, which can make the actual contains check worse than just O(1).
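To make the pathological case concrete, here is a sketch with a hypothetical key type whose GetHashCode always returns the same value, so every key collides:

```csharp
using System;
using System.Collections.Generic;

// A deliberately "bad" key type for illustration: every instance reports
// the same hash code, so all keys land in the same bucket.
class BadKey
{
    public int Id { get; }
    public BadKey(int id) { Id = id; }

    public override int GetHashCode() => 42; // constant hash: pathological case
    public override bool Equals(object obj) => obj is BadKey other && other.Id == Id;
}

class CollisionDemo
{
    static void Main()
    {
        var dict = new Dictionary<BadKey, string>();
        for (int i = 0; i < 10_000; i++)
            dict[new BadKey(i)] = i.ToString();

        // All 10,000 entries share one bucket, so this lookup has to call
        // Equals on (up to) every key: effectively O(n) instead of O(1).
        Console.WriteLine(dict.ContainsKey(new BadKey(9_999)));
    }
}
```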
It's because Dictionary is an implementation of a hash table. That means a key lookup is done by using a hashing function that tells you which bucket, among the many buckets in the data structure, contains the value you're looking up. Usually, for a good hashing function and a large enough set of buckets, each bucket contains only a single element, in which case the complexity is indeed O(1). Unfortunately this is not always true: the hashing function may produce collisions, in which case a bucket may contain more than one entry, and the algorithm has to iterate through the bucket until it finds the entry you're looking for, so it's no longer O(1) in these (hopefully) rare cases.
O(1) is constant running time; approaching O(1) is close to constant but not quite, though for most purposes the difference is negligible. You should not pay attention to it.
Something either is or isn't O(1). I think what they're trying to say is that running time is approximately O(1) per operation for a large number of operations.
Let's assume I have data of size N (i.e. N elements) and the dictionary was created with capacity N. What is the complexity of:
space -- of entire dictionary
time -- adding entry to dictionary
MS revealed only that entry retrieval is close to O(1). But what about the rest?
What is the space complexity of the entire dictionary?
Dictionary uses an associative-array data structure, which has O(N) space complexity.
MSDN says: "the Dictionary class is implemented as a hash table", and a hash table in turn uses associative arrays.
What is the time complexity of adding an entry to the dictionary?
A single Add takes amortized O(1) time. In most cases it is O(1); it becomes O(N) when the underlying storage needs to grow. Because that only happens infrequently, we use the word "amortized".
The time complexity of adding a new entry is documented under Dictionary<TKey, TValue>.Add():
If Count is less than the capacity, this method approaches an O(1) operation. If the capacity must be increased to accommodate the new element, this method becomes an O(n) operation, where n is Count.
It is not formally documented, but widely stated (and visible via disassembly) that the underlying storage is an array of name-value pairs. Thus space complexity is O(n).
As it is not part of the specification this could, in theory, change; but in practice it is highly unlikely to, because that would change the performance of various operations (e.g. enumeration) in ways callers could observe.
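A small sketch of the practical consequence of the Add documentation quoted above: if you know the final size up front, passing it as the capacity avoids the occasional O(n) resize during the Add calls (the counts and variable names here are just for illustration):

```csharp
using System;
using System.Collections.Generic;

class CapacityDemo
{
    static void Main()
    {
        const int n = 1_000_000;

        // Default capacity: the backing array is resized (and every entry
        // rehashed) several times as the dictionary grows; those resizes are
        // the occasional O(n) Adds that get amortized away.
        var grown = new Dictionary<int, int>();
        for (int i = 0; i < n; i++) grown[i] = i;

        // Pre-sized to n: no resizes happen while filling it, so every Add
        // stays an O(1) operation.
        var presized = new Dictionary<int, int>(n);
        for (int i = 0; i < n; i++) presized[i] = i;

        Console.WriteLine(grown.Count == presized.Count); // True
    }
}
```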
In C#.NET, I like using HashSets because of their supposed O(1) time complexity for lookups. If I have a large set of data that is going to be queried, I often prefer using a HashSet to a List, since it has this time complexity.
What confuses me is the constructor for the HashSet, which takes IEqualityComparer as an argument:
http://msdn.microsoft.com/en-us/library/bb359100.aspx
In the link above, the remarks note that the "constructor is an O(1) operation," but if this is the case, I am curious if lookup is still O(1).
In particular, it seems to me that, if I were to write a Comparer to pass in to the constructor of a HashSet, whenever I perform a lookup, the Comparer code would have to be executed on every key to check to see if there was a match. This would not be O(1), but O(n).
Does the implementation internally construct a lookup table as elements are added to the collection?
In general, how might I ascertain information about complexity of .NET data structures?
A HashSet works by hashing the objects you insert (via IEqualityComparer.GetHashCode) and tossing them into buckets according to the hash. The buckets themselves are stored in an array, hence the O(1) part.
For example (this is not necessarily exactly how the C# implementation works, it just gives a flavor) it takes the first character of the hash and throws everything with a hash starting with 1 into bucket 1. Hash of 2, bucket 2, and so on. Inside that bucket is another array of buckets that divvy up by the second character in the hash. So on for every character in the hash....
Now, when you look something up, it hashes it and jumps through the appropriate buckets. It has to do several array lookups (one for each character in the hash), but the cost does not grow as a function of N, the number of objects you've added, hence the O(1) rating.
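For flavor, here is a stripped-down, single-level sketch of the same idea (a made-up TinyHashSet<T>, not the real HashSet<T> internals): the hash code is reduced to a bucket index, and only that one bucket is searched, independent of how many items the set holds.

```csharp
using System;
using System.Collections.Generic;

// A rough sketch of hash-based bucketing, for illustration only.
class TinyHashSet<T>
{
    private readonly List<T>[] buckets;
    private readonly IEqualityComparer<T> comparer;

    public TinyHashSet(int bucketCount, IEqualityComparer<T> comparer = null)
    {
        buckets = new List<T>[bucketCount];
        this.comparer = comparer ?? EqualityComparer<T>.Default;
    }

    // Reduce the hash code to a bucket index.
    private int BucketOf(T item) =>
        (comparer.GetHashCode(item) & 0x7FFFFFFF) % buckets.Length;

    public bool Contains(T item)
    {
        List<T> bucket = buckets[BucketOf(item)];
        if (bucket == null) return false;
        foreach (T candidate in bucket)              // only this one bucket is scanned
            if (comparer.Equals(candidate, item)) return true;
        return false;
    }

    public void Add(T item)
    {
        if (Contains(item)) return;                  // no duplicates, like a set
        int b = BucketOf(item);
        if (buckets[b] == null) buckets[b] = new List<T>();
        buckets[b].Add(item);
    }
}

class TinyHashSetDemo
{
    static void Main()
    {
        var set = new TinyHashSet<string>(bucketCount: 64);
        set.Add("alice");
        set.Add("bob");
        Console.WriteLine(set.Contains("bob"));   // True
        Console.WriteLine(set.Contains("carol")); // False
    }
}
```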
To your other question, here is a blog post with the complexity of a number of collections' operations: http://c-sharp-snippets.blogspot.com/2010/03/runtime-complexity-of-net-generic.html
if I were to write a Comparer to pass in to the constructor of a HashSet, whenever I perform a lookup, the Comparer code would have to be executed on every key to check to see if there was a match. This would not be O(1), but O(n).
Let's call the value you are searching for the "query" value.
Can you explain why you believe the comparer has to be executed on every key to see if it matches the query?
This belief is false. (Unless of course the hash code supplied by the comparer is the same for every key!) The search algorithm executes the equality comparer on every key whose hash code matches the query's hash code, modulo the number of buckets in the hash table. That's how hash tables get O(1) lookup time.
Does the implementation internally construct a lookup table as elements are added to the collection?
Yes.
In general, how might I ascertain information about complexity of .NET data structures?
Read the documentation.
Actually the lookup time of a HashSet<T> isn't always O(1).
As others have already mentioned, a HashSet uses IEqualityComparer<T>.GetHashCode().
Now consider a struct or object which always returns the same hash code x.
If you add n items to your HashSet there will be n items with the same hash in it (as long as the objects aren't equal).
So if you were to check whether an element with the hash code x exists in your HashSet, it would run equality checks against all objects with the hash code x to test whether the HashSet contains the element.
It depends on the quality of the hash function (GetHashCode()) that your IEqualityComparer implementation provides. An ideal hash function should produce a well-distributed, random-looking set of hash codes. These hash codes are used as an index that maps a key to a value, so searching for a value by key becomes much more efficient, especially when the key is a complex object/structure.
the Comparer code would have to be executed on every key to check to see if there was a match. This would not be O(1), but O(n).
That is not how a hash table works; it describes a straightforward brute-force search. With a hash table you take a more intelligent approach that searches by index (the hash code).
Lookup is still O(1) if you pass an IEqualityComparer. The hash set still uses the same logic as if you don't pass an IEqualityComparer; it just uses the IEqualityComparer's implementations of GetHashCode and Equals instead of the instance methods of System.Object (or the overrides provided by the object in question).
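As a sketch of that point, here is a HashSet<string> that uses StringComparer.OrdinalIgnoreCase as its IEqualityComparer<string>; the set hashes and compares with the supplied comparer instead of string's own methods, and lookups remain O(1) on average:

```csharp
using System;
using System.Collections.Generic;

class ComparerDemo
{
    static void Main()
    {
        // The comparer supplies both GetHashCode and Equals; lookups still
        // go through the hash buckets, so they stay O(1) on average.
        var names = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "Alice",
            "Bob"
        };

        Console.WriteLine(names.Contains("ALICE")); // True, case-insensitive match
        Console.WriteLine(names.Contains("carol")); // False
    }
}
```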
Is there a difference in speed between Dictionary.ContainsKey/Value and a foreach loop that checks for a certain key/value?
ContainsKey is faster:
This method approaches an O(1) operation.
ContainsValue is like a foreach loop.
This method performs a linear search; therefore, the average execution time is proportional to Count. That is, this method is an O(n) operation, where n is Count.
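A rough way to see the difference yourself. The exact numbers will vary by machine and are affected by JIT warm-up, so treat this only as an illustration of the asymptotic gap:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

class ContainsTimingDemo
{
    static void Main()
    {
        var dict = new Dictionary<int, int>();
        for (int i = 0; i < 1_000_000; i++) dict[i] = i;

        var sw = Stopwatch.StartNew();
        bool hasKey = dict.ContainsKey(999_999);     // approaches O(1)
        sw.Stop();
        Console.WriteLine($"ContainsKey:   {hasKey} in {sw.ElapsedTicks} ticks");

        sw.Restart();
        bool hasValue = dict.ContainsValue(999_999); // O(n) linear search
        sw.Stop();
        Console.WriteLine($"ContainsValue: {hasValue} in {sw.ElapsedTicks} ticks");
    }
}
```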
Yes.
ContainsKey is nearly O(1). As for ContainsValue, I can't tell for sure, but I think there won't be much difference compared to a loop.