When do we use HashSet<> [duplicate] - c#

I have a small sample.
//Class
public class GetEntity
{
public string name1 { get; set; }
public string name2 { get; set; }
public GetEntity() { }
}
and:
public void GetHash()
{
HashSet<GetEntity> objHash = new HashSet<GetEntity>();
GetEntity obj = new GetEntity();
obj.name1 = "Ram";
obj.name2 = "Shyam";
objHash.Add(obj);
foreach (GetEntity objEntity in objHash)
{
Label2.Text = objEntity.name1.ToString() + objEntity.name2.ToString();
}
}
The code works fine. The same task can be done with a Dictionary or a List. But I want to know when to use HashSet<>, Dictionary<> or List<>. Is it only a matter of performance, or is there something else I don't understand? Thanks.

I want to know when we use HashSet<>, Dictionary<> or List<>
They all have different purposes and are used in different scenarios.
HashSet
Is used when you want to have a collection with unique elements. HashSet stores list of unique elements and won't allow duplicates in it.
Dictionary
Is used when you want to have a value against a unique key. Each element in Dictionary has two parts a (unique) key and a value. You can store a unique key in it (just like Hashset) in addition you can store a value against that unique key.
List
Is just a simple collection of elements. You can have duplicates in it.
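The three behaviours described above can be seen side by side in a short sketch. The class name CollectionChoiceDemo and the sample strings are made up for illustration; only the standard System.Collections.Generic types are used.

```csharp
using System;
using System.Collections.Generic;

public static class CollectionChoiceDemo
{
    // Performs the same "add twice" operation on each collection type
    // and returns the resulting element counts.
    public static (int setCount, int listCount, int dictCount) Run()
    {
        var set = new HashSet<string>();
        set.Add("Ram");
        set.Add("Ram");            // returns false; the duplicate is ignored

        var list = new List<string>();
        list.Add("Ram");
        list.Add("Ram");           // a list keeps duplicates

        var dict = new Dictionary<string, string>();
        dict["Ram"] = "Shyam";
        dict["Ram"] = "Mohan";     // same key: the value is replaced

        return (set.Count, list.Count, dict.Count);
    }
}
```

Calling CollectionChoiceDemo.Run() should give one element in the set, two in the list and one in the dictionary.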

Set does not contain duplicated values.

I am not a C# guy myself, but the following should be the difference.
Please correct me if I am wrong.
HashSet will only take unique values. Values cannot be accessed by index; you test membership with Contains, which works in constant time on average.
Dictionary will take key-value pairs. Values can be accessed randomly by key, and keys cannot be duplicated. This is also a very fast data structure; lookups work in constant time on average.
List will take n values even if they are not unique. Values can be accessed by index in constant time, but finding a value by its content requires a sequential scan, so a search is O(n) in the worst case. Appending is amortized constant time, while inserting in the middle is O(n).

They are all called collections, which are usually in the namespace System.Collections.Generic.
When to use a certain data structure essentially requires understanding what operations they support. For HashSet, it's basically a Set in mathematics, and supports efficient Add, Remove, and quick judgement on whether an element Exists in the set. Given it's a set, the elements must be unique in Hashsets.
For Dictionary, it's basically a mapping structure, i.e. a set of Key-Value Pairs. Dictionary provides efficient Query on key-value pairs with a given key and also Add/Remove of the key-value pairs.
Lists are an ordered collection of elements. Unlike Hashsets, judging the existence of an element in a list is inefficient. And unlike Dictionaries, the internal data structure is not key-value pairs but simple objects. Another difference is you can use index (like list[3]) to efficiently access elements in Lists. (although it's not true for LinkedList)

Related

using HashSet as underlying storage to replicate a dictionary

I was asked a question today to re-implement the dictionary. My solution is to use a HashSet as the storage, and a class to represent the KeyValue pair. In this class, I override the GetHashCode and Equals methods in order to add the KeyValue pair instance to the HashSet.
I then read the source code for the C# Dictionary and found that it uses an array for storage and loops through the array to find the matching key-value pairs.
Is my approach correct? What is advantage of current Dictionary implementation in C#? Thanks in advance.
public class MyDictionary<K,V>
{
private class KV
{
public K Key {get;set;}
public V Value {get;set;}
public override int GetHashCode()
{
return Key.GetHashCode();
}
public override bool Equals(object o)
{
var obj = ((KV)o).Key;
return Key.Equals(obj);
}
}
private readonly HashSet<KV> _store = new HashSet<KV>();
public void Add(K key, V value)
{
_store.Add(new KV{Key = key, Value = value});
}
public V this[K key]
{
get
{
KV _kv;
if (_store.TryGetValue(new KV{Key = key}, out _kv))
{
return _kv.Value;
}
else
{
return default(V);
}
}
set
{
this.Add(key, value);
}
}
}
How do you think HashSet is implemented? The code that you're seeing in Dictionary is going to look very similar to the code that's internally in HashSet. Both are backed by an array that stores the keyed items, organized so that items sharing a hash bucket can be found together; it's just that one stores a key and a value, and one just stores the key on its own.
If you're just asking why the developer for Dictionary re-implemented some similar code to what's in a HashSet rather than actually using the actual HashSet internally, we can only guess. They naturally could have, if they wanted to, in the sense that they can create functionally identical results from the perspective of an outside observer.
The reason to use Dictionary is because it is well written, well tested, is already done, and it works.
Your code has a problem when replacing the value associated with a key that's already been added. The following code:
dict["hi"]=10;
dict["hi"]=4;
Console.WriteLine(dict["hi"]);
will output 10 with your class. Dictionary will output (correctly) 4.
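One possible way to get the replace-on-assign behaviour while still keeping a HashSet as storage is to remove any equal entry before adding the new one. This is only a sketch: the class name SetBackedDictionary is made up, it assumes non-null keys, and it relies on HashSet<T>.TryGetValue, which the question's code already uses.

```csharp
using System.Collections.Generic;

public class SetBackedDictionary<K, V>
{
    // Wrapper whose equality and hash depend on the key only.
    private sealed class KV
    {
        public K Key;
        public V Value;
        public override int GetHashCode() => Key.GetHashCode();
        public override bool Equals(object o) => o is KV other && Key.Equals(other.Key);
    }

    private readonly HashSet<KV> _store = new HashSet<KV>();

    public V this[K key]
    {
        get => _store.TryGetValue(new KV { Key = key }, out KV found)
            ? found.Value
            : default(V);
        set
        {
            var entry = new KV { Key = key, Value = value };
            _store.Remove(entry);   // drop any existing entry with an equal key
            _store.Add(entry);      // then store the new value
        }
    }
}
```

With this change, assigning dict["hi"] = 10 and then dict["hi"] = 4 leaves 4 stored under "hi".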
As far as the use of arrays, both HashSet and Dictionary use them in their implementations.
HashSet
private int[] m_buckets;
private HashSet<T>.Slot[] m_slots;
Dictionary
private int[] buckets;
private Dictionary<TKey, TValue>.Entry[] entries;
HashSet and Dictionary do not loop through their arrays to find the key/value. They use a modulus of the hashcode value to directly index into the bucket array. The value in the bucket array points into the slots or entries array. Then, they loop over the list of keys that had identical hashcodes or colliding hashcodes (two different hashcodes that result in the same value after the modulus is applied). These little collision lists are in the slots or entries arrays, and are typically very small, usually with just a single element.
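The lookup described above can be sketched in a much simplified form. TinyHashMap and its member names are hypothetical, and the sketch has a fixed positive capacity with no resizing or removal, so it is an illustration of the bucket-and-chain idea rather than the actual framework code.

```csharp
using System;
using System.Collections.Generic;

public class TinyHashMap<TKey, TValue>
{
    private struct Entry
    {
        public int HashCode;   // cached non-negative hash of the key
        public int Next;       // index of the next entry in the same bucket, -1 if none
        public TKey Key;
        public TValue Value;
    }

    private readonly int[] _buckets;   // bucket -> index of first entry, -1 if empty
    private readonly Entry[] _entries;
    private int _count;

    public TinyHashMap(int capacity)   // capacity must be greater than zero
    {
        _buckets = new int[capacity];
        _entries = new Entry[capacity];
        for (int i = 0; i < capacity; i++) _buckets[i] = -1;
    }

    // Index straight into the bucket array, then walk the short collision chain.
    private int FindEntry(TKey key, out int hash)
    {
        var comparer = EqualityComparer<TKey>.Default;
        hash = comparer.GetHashCode(key) & 0x7FFFFFFF;
        for (int i = _buckets[hash % _buckets.Length]; i >= 0; i = _entries[i].Next)
        {
            if (_entries[i].HashCode == hash && comparer.Equals(_entries[i].Key, key))
                return i;
        }
        return -1;
    }

    public void Set(TKey key, TValue value)
    {
        int i = FindEntry(key, out int hash);
        if (i >= 0) { _entries[i].Value = value; return; }   // existing key: replace
        if (_count == _entries.Length)
            throw new InvalidOperationException("Full (this sketch does not resize).");
        int bucket = hash % _buckets.Length;
        _entries[_count] = new Entry { HashCode = hash, Next = _buckets[bucket], Key = key, Value = value };
        _buckets[bucket] = _count++;                          // new entry becomes chain head
    }

    public bool TryGet(TKey key, out TValue value)
    {
        int i = FindEntry(key, out _);
        value = i >= 0 ? _entries[i].Value : default(TValue);
        return i >= 0;
    }
}
```

The loop in FindEntry only visits entries that landed in the same bucket, which is why a lookup normally touches one or two entries instead of the whole array.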
Why isn't Dictionary just implemented onto HashSet? Because the two classes do two different things. HashSet is geared towards storing a set of unique keys. Dictionary is geared towards storing values associated with unique keys. You tried to use a HashSet to store a value by embedding it in the key (which is an object). But I pointed out why that fails to work. It's because HashSet doesn't entertain the concept of a value. It cares only for the key. So it's not suited to being used as a dictionary. Now, you could use Dictionary to implement a HashSet, but that would be wasteful, as there is code and memory in Dictionary dedicated to handling the values. There are two classes, that are each made to fulfill a specific purpose. They are similar, but not the same
What is advantage of ... us[ing] the array for storage, and loop[ing] through the array to find the matching keyvalues[?]
I can answer this from a Java perspective. I think it's very similar in C#.
The Big O time complexity of a get from a hashset is O(1), while an array is O(n). Naively, one might think the hashset would perform better. But it's not that simple. Computing a hash code is relatively expensive, and each class provides its own hashing algorithm, so the run time and quality of hash distribution can vary widely. (It is inefficient but perfectly legal for a class to return the same hash for every object. Hash based collections storing such objects will degenerate to array performance.)
The upshot of all this is that despite the theoretical performance difference, it turns out that for small collections, which are the vast majority of collections in a typical program, iterating over an array is faster than computing a hash. Google introduced an array based map as an alternative to hashmap in their Android API, and they suggest that the array based version performs better for collections up to around 10 to 100 elements. The uncertain range is because, as I mentioned, the cost of hashing varies.
Bottom line... if performance matters, forget Big O and trust your benchmarks.
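The problem with your implementation is that a HashSet stores only a single entry per key, where "key" means whatever your Equals and GetHashCode overrides treat as equal. So if the caller adds a second entry whose key equals one that is already stored, only the first is kept and the second is ignored. (Two different keys that merely share a hash value are still stored separately, because HashSet falls back to Equals to tell them apart.)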
A dictionary is typically implemented as a list of entries that match the hash value, that way you can have multiple entries with the same hash value. This does make it more complicated because when adding/removing/looking up you need to handle the list.

Use List or Dictionary?

I have a program where I need to store a list of some variables.
Each variable has a name and a value, and I want to make a function that gets the name of a variable and returns its value:
object getValue(string name);
To do that i have two choices:
1: Store the variables in a Dictionary, and then the function getValue just fetches the variable whose key is the name I am looking for:
object getValue (string name)
{
return variablesDictionary[name].Value;
}
2: Store the variables in a list and then access the wanted variable through linq:
object getValue (string name)
{
return variablesList.First(v => v.Name == name).Value;
}
Both are very simple but the second one (linq) seems more compelling because it uses linq and also because in the first method the same name is stored in two different places which is redundant.
What is the best method with respect to best practices and performance?
Thanks
Using a dictionary is way faster than using a list, at least in any case when it matters.
The performance of a dictionary lookup is O(1), while a list search is O(n). That means that it takes the same time to find the item in the dictionary with few items as with many items, but finding them in the list takes longer the more items that you have.
For very small sets of variables the list may be slightly faster, but then they are both so fast that it doesn't matter. With many items the dictionary clearly outperforms the list.
The dictionary uses a bit more memory, but not so much. Remember that it will only store the reference to the name, it's not another copy of the string.
You should definitely use a Dictionary in this case. If you use a List and want to do a lookup, in the worst case the program has to loop over the entire list to find the right object. For a Dictionary, this is always a constant time, irrespective of its size.
By the way, 'uses LINQ' is not a good reason to prefer one method over the other.
What you're trying to do is exactly what dictionaries were designed for.
Technically, any dictionary that uses an object property as a key to that object will be "redundant" as you describe, but because string is a reference type, it's not like you'll be using up a huge amount of memory to store the "redundant" key.
At the cost of a few extra bytes, you get a huge performance increase. The thing that makes dictionaries so cool is that they're hash tables, so they'll always look up an element from a key quickly, no matter how big they are. But if you use a list and try to iterate over it with LINQ, you might have 10,000 items in the list and the one you're looking for is at the end, and it will take approximately 10,000 times longer than looking it up with a Dictionary. (For a more formal look at the math involved, try Googling "Big O notation" and "time complexity". It's a very useful bit of theory to know about when developing software!)
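The two approaches from the question can be put next to each other as follows. The Variable class and the VariableLookup helper are assumptions made for this sketch, since the question does not show the actual types.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of the "variable" from the question.
public class Variable
{
    public string Name { get; set; }
    public object Value { get; set; }
}

public static class VariableLookup
{
    // O(n): scans the list until a matching name is found.
    public static object FromList(List<Variable> variables, string name)
        => variables.First(v => v.Name == name).Value;

    // O(1) on average: hashes the name and jumps to the entry.
    public static object FromDictionary(Dictionary<string, Variable> variables, string name)
        => variables[name].Value;

    // Builds the dictionary once from an existing list, keyed by name.
    public static Dictionary<string, Variable> Index(IEnumerable<Variable> variables)
        => variables.ToDictionary(v => v.Name);
}
```

Note that First throws InvalidOperationException and the dictionary indexer throws KeyNotFoundException when the name is missing, so callers that expect unknown names may prefer FirstOrDefault or TryGetValue.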

How does C# SortedList internally work?

I was wondering, how does the SortedList work.
I know that a regular List is based on a dynamic array, but what is SortedList based on?
And what sorting algorithm it uses?
Thanks
From the sortedlist documentation: "SortedList is implemented as an array of key/value pairs, sorted by the key."
http://msdn.microsoft.com/en-us/library/e7a8xew6%28v=vs.110%29.aspx
If you use the default constructor (no parameters): "Initializes a new instance of the SortedList class that is empty, has the default initial capacity, and is sorted according to the IComparable interface implemented by each key added to the SortedList object."
http://msdn.microsoft.com/en-us/library/cxb97few%28v=vs.110%29.aspx
Or you can pass a custom comparer:
Initializes a new instance of the SortedList class that is empty, has the default initial capacity, and is sorted according to the specified IComparer interface.
http://msdn.microsoft.com/en-us/library/e7a8xew6%28v=vs.110%29.aspx
Other constructor options:
http://msdn.microsoft.com/en-us/library/System.Collections.SortedList.SortedList%28v=vs.110%29.aspx
How to use IComparer interface: http://msdn.microsoft.com/en-us/library/system.collections.icomparer%28v=vs.110%29.aspx
SortedList class source code can be found here.
According to this source, SortedList keeps data in two plain arrays of keys and values:
private TKey[] keys;
private TValue[] values;
Sort order is maintained on the array of keys. When a new item (key/value pair) is added, SortedList first finds proper index in the sorted array of keys (using Array.BinarySearch), then moves partial contents of both key and value arrays (using Array.Copy) starting from this index upward to create a gap where both key and value are inserted. Likewise, when an item is deleted by its key, SortedList searches for the item's index in the array of keys, then moves partial contents of both arrays from this index downward to close the gap.
So, one thing to keep in mind is that when adding to or deleting from a large SortedList, a lot of data may be moved around. On the positive side, retrieving items by index is always fast regardless of the list size.
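The insertion path described above (binary search for the position, then shifting both arrays) could look roughly like this. TinySortedList is a made-up name, and the sketch leaves out removal, custom comparers and most of the real class's API.

```csharp
using System;
using System.Collections.Generic;

public class TinySortedList<TKey, TValue>
{
    private TKey[] _keys = new TKey[4];
    private TValue[] _values = new TValue[4];
    private int _size;

    public int Count => _size;

    public void Add(TKey key, TValue value)
    {
        // A negative result is the bitwise complement of the insertion index.
        int i = Array.BinarySearch(_keys, 0, _size, key, Comparer<TKey>.Default);
        if (i >= 0) throw new ArgumentException("An entry with the same key already exists.");
        int index = ~i;

        if (_size == _keys.Length)   // grow both arrays together
        {
            Array.Resize(ref _keys, _size * 2);
            Array.Resize(ref _values, _size * 2);
        }

        // Shift the tail up by one slot to open a gap at 'index'.
        Array.Copy(_keys, index, _keys, index + 1, _size - index);
        Array.Copy(_values, index, _values, index + 1, _size - index);
        _keys[index] = key;
        _values[index] = value;
        _size++;
    }

    // Retrieval by index is a plain array access.
    public TKey KeyAt(int index)
    {
        if (index < 0 || index >= _size) throw new ArgumentOutOfRangeException(nameof(index));
        return _keys[index];
    }

    public TValue ValueAt(int index)
    {
        if (index < 0 || index >= _size) throw new ArgumentOutOfRangeException(nameof(index));
        return _values[index];
    }
}
```

Adding a key that sorts before everything else moves every existing element, which is the cost the answer warns about for large lists.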
A SortedList is an object that maintains two arrays to store the elements of the list.
One array stores the keys and the other array stores their associated values.
The elements in the SortedList are sorted either according to a specific IComparer implementation specified when the SortedList is created or according to the IComparable implementation provided by the keys themselves. A SortedList cannot contain duplicate keys.
Whenever an element is added or removed, the indexes are adjusted accordingly; as a result, add and remove operations on a SortedList are slower.
If you are interested in the internals of how it works, get yourself any half decent decompiler, such as .Net Reflector.
A quick look shows that internally, SortedList maintains its sorted state by keeping an internal array sorted at all times. When an item is added using the Add method, it uses a binary search on the keys to identify the correct index at which to insert the new item.

What is the correct way of using a Dictionary in C#?

I have (lots of) objects Foo with an unique ID and want to store these in a Dictionary. The dictionary key in C# can be any primitive type or object. I could use the integer foo1.ID as key but also the object foo1.
Which is the correct way of implementing that and is there a difference in performance using either the ID (an integer) or the object as key?
NB. The values in the dictionary are other (type of) of objects.
How do you intend to search the dictionary? If you intend to search for items within the dictionary based purely on ID, then use that as the key. OTOH, if you're going to have an instance of a Foo, then make that the key.
Re: your edit - now we know that the Foo is either "the key" or "the object that provides the key value by accessing a property", then it seems simple to say, use a Dictionary<Foo,OtherClass> - assuming you've set up equality comparisons on Foo objects appropriately - why force every instance of lookup to know to extract a specific property from the Foo objects?
It depends on your use case. Assuming you want to look up objects given their key value you of course want the id to be the key. That you are asking this question makes me think maybe you don't want a dictionary at all - if you just need to keep a collection of items use a List<T> instead - dictionaries are for mapping a key (e.g. an id) to a value (e.g. a custom object).
Whatever you use as a key has to be comparable for equality and hashable. For primitive types, equality is defined generally as you would expect. For objects, you would be testing reference equality unless you define another way to compare them, which you can do by overriding Equals and GetHashCode or by passing an IEqualityComparer<TKey> to the Dictionary constructor.
In your case, however, keying to the int is likely to be simplest. You would gain no real benefit from using the object as its own key. You can create the dictionary simply by taking your collection of Foo objects and doing something like:
IDictionary<int, Foo> fooDictionary = fooCollection.ToDictionary(f => f.ID);
Searching the dictionary will be more efficient than simply searching the collection for the given ID each time in most cases.
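Both keying options from this thread can be sketched as follows. The Foo class with its Label property, FooByIdComparer and FooIndex are illustrative names, not part of the original question.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Foo
{
    public int ID { get; set; }
    public string Label { get; set; }
}

// Treats two Foo instances as the same key when their IDs match.
public sealed class FooByIdComparer : IEqualityComparer<Foo>
{
    public bool Equals(Foo a, Foo b)
        => ReferenceEquals(a, b) || (a != null && b != null && a.ID == b.ID);

    public int GetHashCode(Foo f) => f.ID.GetHashCode();
}

public static class FooIndex
{
    // Option 1: key by the integer ID.
    public static Dictionary<int, Foo> ById(IEnumerable<Foo> foos)
        => foos.ToDictionary(f => f.ID);

    // Option 2: key by the object itself, with explicit equality rules.
    public static Dictionary<Foo, string> ByObject(IEnumerable<Foo> foos)
        => foos.ToDictionary(f => f, f => f.Label, new FooByIdComparer());
}
```

Without the comparer (or Equals and GetHashCode overrides on Foo), the second dictionary would only find an entry when given the very same instance that was used as the key.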
Dictionaries are key-value pairs. Each key must be unique, and the Dictionary enforces this at run time (not the compiler) by calling GetHashCode and Equals on the key. Using the whole object as the key instead of an integer is probably overkill, since key comparison then depends on how the object implements those methods. So I would go for the integer key if that helps you identify your record uniquely.
Use the ID - if you already have the object, there's no point in looking it up, too.

How to get a particular item from a Dictionary object with index?

How do I retrieve an item from a dictionary object using an index? E.g. I have a dictionary object of 10 items and I have to get the 5th key-value pair from the dictionary.
Dictionaries are unordered. If you mean "the 5th item added to the dictionary" - they don't provide this functionality.
One thing to be careful of is that in many cases Dictionary<TKey, TValue> appears to be ordered - if you just add a bunch of entries and then iterate, then under the current implementation I believe you will at least usually get back the pairs in the same order. However, it's not guaranteed, it's not meant to happen particularly - it's just a quirk of the implementation. If you delete entries and then add more, then the whole thing goes pear-shaped.
Fundamentally, if you want ordering as well as key lookups, you need to store a list as well as a dictionary.
If you are using .NET 3.5 or greater:
var keyValuePair = d.ElementAt(4);
However, this is using an enumerator behind the scenes and the ordering of enumerated items from a dictionary is not guaranteed:
The IDictionary interface allows the contained keys and values to be enumerated, but it does not imply any particular sort order (From IDictionary reference on MSDN).
This means that the element you get back might not correspond to the order you inserted it in and thus is probably not what you expect.
There is an OrderedDictionary class in System.Collections.Specialized that enforces the ordering and allows you to access by index through the Item indexer. However, this is from the pre-generics days so it only accepts object key-values and thus isn't quite as friendly to work with as the generic collections.
I just found this article on CodeProject that implements a generic OrderedDictionary. I have never used this but it might be useful for you.
Ignoring the fundamental abuse of a dictionary that this question presents:
int counter = 0;
foreach (var pair in yourDictionary)
{
if (++counter == 5)
{
// pair contains your fifth item
}
}
If you're using a generic dictionary like this:
Dictionary<int,string> myDict = new Dictionary<int,string>();
You could pull the 5th value from the dictionary by converting the output to a list:
string SomeString = myDict.Values.ToList()[4];
But typically you'd use a dictionary when you're more concerned about retrieving a value based on a pre-determined key rather than its position in the list.
The 5th according to which ordering? The Dictionary class does not guarantee any specific ordering. If you want it in some specific ordering, retrieve the pairs from the collection (for example as John suggests) and sort them, then get the KeyValuePair at the index you need. If you need it ordered by insertion order, try using the System.Collections.Specialized.OrderedDictionary instead, then you can access the KeyValuePair directly by index.
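A small sketch of index access with System.Collections.Specialized.OrderedDictionary, assuming insertion order is the ordering you want. The OrderedLookup helper and the sample keys are made up; string keys are used on purpose so that an int argument unambiguously selects the positional indexer.

```csharp
using System.Collections.Specialized;

public static class OrderedLookup
{
    // Builds a dictionary whose entries keep their insertion order.
    public static OrderedDictionary Build()
    {
        var d = new OrderedDictionary();
        d.Add("first", 1);
        d.Add("second", 2);
        d.Add("third", 3);
        d.Add("fourth", 4);
        d.Add("fifth", 5);
        d.Add("sixth", 6);
        return d;
    }

    // The int indexer is positional (zero-based).
    public static object ValueAt(OrderedDictionary d, int index) => d[index];

    // The object indexer looks the entry up by key.
    public static object ValueFor(OrderedDictionary d, string key) => d[key];
}
```

Because the class is not generic, values come back as object and need a cast, for example (int)OrderedLookup.ValueAt(d, 4) for the fifth entry.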
Use System.Linq
string item=dicOBj.Keys.ElementAt(index);
you can get both key and value in the same way specifying index
The .NET Framework has several dictionary classes, including Dictionary, Hashtable, ListDictionary, OrderedDictionary, SortedDictionary, SortedList and the generic SortedList. In all of these classes items can be retrieved by key, but items can be retrieved by index only in OrderedDictionary, SortedList and the generic SortedList. If you need to retrieve items from your dictionary by key or by index, you must use one of these three classes.
How to use these classes you can find: OrderedDictionary Class , SortedList Class
