Is there any alternative to Dictionary in .NET? - c#

I have a requirement to use a dictionary in my project, but dictionaries are only accessible by key, not by index, and I want to access the items by index as well. So I fiddled around on the web and found OrderedDictionary, which can be accessed both by index and by key, but it has some performance issues, and since I am reading and writing the dictionary every minute of the day, OrderedDictionary doesn't seem like a good choice.
So, lastly, my question is: is there any alternative that gives me the functionality of Dictionary, lets me access items by index, and doesn't cause a performance problem?

SortedList<TKey, TValue> has a Values property that is an IList<TValue>. Is that enough? It's fast only for small "sets" of elements. The difference from SortedDictionary is explained here: http://msdn.microsoft.com/en-us/library/5z658b67(v=vs.80).aspx
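A minimal sketch of the combined key/index access SortedList gives you (names and values are illustrative):

using System.Collections.Generic;

var prices = new SortedList<string, decimal>
{
    ["banana"] = 0.55m,
    ["apple"] = 1.20m
};

decimal byKey = prices["apple"];        // dictionary-style access by key
string firstKey = prices.Keys[0];       // "apple", since Keys/Values are sorted by key
decimal firstValue = prices.Values[0];  // 1.20m, list-style access by index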
Can I ask why you want to access it "by index"? You can still enumerate it with foreach, you know.

In response to your comment that you are only expecting a hundred updates a minute: that is very little work, practically nothing. You can still use an OrderedDictionary; performance will not be an issue for you.
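For completeness, this is roughly what OrderedDictionary usage looks like; note it is the non-generic System.Collections.Specialized type, so values come back as object (names and values are illustrative):

using System.Collections.Specialized;

var od = new OrderedDictionary();
od.Add("first", 1);
od.Add("second", 2);

object byKey = od["second"];  // 2, a string argument is treated as a key
object byIndex = od[0];       // 1, an int argument is treated as an index
od[0] = 10;                   // update by index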

Try SortedDictionary
http://msdn.microsoft.com/en-us/library/f7fta44c.aspx

Related

Concurrent collection with constant read time

ConcurrentDictionary offers constant read time, but I do not need key-value pairs, only keys. I searched for the read time of ConcurrentBag and haven't found how it is implemented.
Is there a constant-read-time concurrent collection besides ConcurrentDictionary?
ConcurrentBag is probably not what you are looking for:
Represents a thread-safe, unordered collection of objects.
Which means that it allows duplicates (whereas the dictionary doesn't)
Bags are useful for storing objects when ordering doesn't matter, and unlike sets, bags support duplicates.
As for performance, it certainly isn't as performant as a list, so at least O(n) (C# - Performance comparison of ConcurrentBag vs List).
For a ConcurrentSet check your luck with the custom implementation here: How to implement ConcurrentHashSet in .Net
You can also check the list of Concurrent collections to see if something else suits your needs.
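A minimal sketch of the ConcurrentDictionary-as-set idea mentioned above, which ignores the values entirely (names are illustrative):

using System.Collections.Concurrent;

// The keys act as a thread-safe "set"; the byte values are placeholders and never read.
var set = new ConcurrentDictionary<string, byte>();

bool added = set.TryAdd("alpha", 0);           // false if "alpha" is already present
bool exists = set.ContainsKey("alpha");        // average O(1) lookup
bool removed = set.TryRemove("alpha", out _);  // thread-safe removal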

List.AddRange Performance of Adding Array vs List

When using List.AddRange(), is there any difference in performance between adding a List and adding an Array?
MyList.AddRange(MyArrayof1000ComplexElements);
VS
MyList.AddRange(MyListof1000ComplexElements);
or is there no difference?
Since an array and a list both implement ICollection<T>, it uses the same code.
It resolves to a call to Array.Copy(...)
http://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs#e569d850a66a1771#references
There is no difference between List<T> and T[] - AddRange uses the same handling for anything implementing ICollection<T>, which both of those do.
Both Array and List implement the ICollection<T> interface. Therefore, the implementation of List.AddRange that is used will be identical and will offer the same performance.
In the future, you can either test something like this yourself with a simple program using the Stopwatch class for timing, or download a tool like JetBrains' dotPeek to inspect the framework code yourself.
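If you do want to measure it, a rough Stopwatch sketch along these lines is enough to get an indication; a single cold run is not a rigorous benchmark, and the sizes and names here are illustrative:

using System;
using System.Collections.Generic;
using System.Diagnostics;

var sourceArray = new int[1_000_000];
var sourceList = new List<int>(sourceArray);

var sw = Stopwatch.StartNew();
var fromArray = new List<int>();
fromArray.AddRange(sourceArray);   // array source
sw.Stop();
Console.WriteLine($"From array: {sw.Elapsed}");

sw.Restart();
var fromList = new List<int>();
fromList.AddRange(sourceList);     // List source
sw.Stop();
Console.WriteLine($"From list:  {sw.Elapsed}");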
This is a more interesting question than some of the comments might suggest.
As it happens, for this specific list/array implementation the answer is: no difference. Both rely on the same collection interface.
But it doesn't have to be that way. If a list is implemented as a doubly-linked list (which it is in many other cases) then appending one list to another is O(1) while appending an array to a list is O(n).
And I would not start by benchmarking to resolve this question. Benchmarking is time-consuming to do well and can easily produce results susceptible to misinterpretation. In this case a careful study of the implementation and the underlying source code (easily available through a .NET disassembler) will answer the question faster. Then benchmark to confirm, if it matters enough.
Please note that the specific O(1) optimisation that applies here is only available if MyListof1000ComplexElements is also a List. If it is some kind of enumerator or linked list, then the performance will be O(n).
In response to those who have criticised this answer: it was written to highlight that the other answers are based on a specific interpretation of the question, and they do not point out how narrowly they have interpreted it or how narrowly their answers apply. Another reader could easily miss the fact that those answers only apply to this specific circumstance. My aim is simply to point out that in many closely related situations this is an O(n) operation rather than O(1).

How to store 50,000 English words so that it takes as little memory as possible

I have to store ~50,000 English words in memory and I'd like to know what would be the best data structure in terms of memory footprint (and loading speed). Would it be a Trie? How would I serialize it into a file? Is there anything better than that?
Essentially, once the ~50,000 words are loaded into memory, I simply need to check if the word exists or not.
Well, according to the guidelines you provided, a simple List would be better.
Fetching time would obviously be slower than with a Trie or Dictionary, but
"in terms of memory footprint (and loading speed)"
It will require very little memory overhead, and will load faster (as no indexes / prefix data structures are built).
See this blog post for some memory comparison details (In JavaScript, but still applies).
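As a rough sketch of what that looks like for a membership check (the file name is illustrative, assuming one word per line), sort the list once and use binary search:

using System;
using System.Collections.Generic;
using System.IO;

var words = new List<string>(File.ReadAllLines("words.txt"));
words.Sort(StringComparer.OrdinalIgnoreCase);   // sort once at load time

bool Contains(string word) =>
    words.BinarySearch(word, StringComparer.OrdinalIgnoreCase) >= 0;

Console.WriteLine(Contains("example"));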
According to this answer, the Dictionary class is what you need. As per the MSDN documentation, you should use the TryGetValue method to access your data:
Use the TryGetValue method if your code frequently attempts to access keys that are not in the dictionary. Using this method is more efficient than catching the KeyNotFoundException thrown by the Item property.
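A minimal sketch of the Dictionary approach for a pure existence check (the file name is illustrative; the bool values are never used):

using System;
using System.Collections.Generic;
using System.IO;

var lookup = new Dictionary<string, bool>(StringComparer.OrdinalIgnoreCase);
foreach (var word in File.ReadLines("words.txt"))
    lookup[word] = true;

bool exists = lookup.TryGetValue("example", out _);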
Yes, a trie sounds ok for this. For serialising you'd have two options:
Use the original word list and rebuild the trie. It should be fast enough, I guess, but you may want to profile it.
Just use normal .NET serialisation for the type and dump it to a file. This prevents programs in other languages from reading it, though.
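For reference, a minimal in-memory trie sketch, assuming lowercase ASCII words only (serialisation not shown):

// Assumes words contain only 'a'..'z'.
class TrieNode
{
    private readonly TrieNode[] _children = new TrieNode[26];
    private bool _isWord;

    public void Add(string word)
    {
        var node = this;
        foreach (var c in word)
            node = node._children[c - 'a'] ??= new TrieNode();
        node._isWord = true;
    }

    public bool Contains(string word)
    {
        var node = this;
        foreach (var c in word)
        {
            node = node._children[c - 'a'];
            if (node == null) return false;
        }
        return node._isWord;
    }
}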
A Dictionary object is another option.
Read these:
Most efficient in-memory data structure for read-only dictionary access
Why is Dictionary preferred over hashtable?
For Help on implementation read this:
http://msdn.microsoft.com/en-us/library/xfhwa508.aspx
For serializing the dictionary object or the hash table read this reference:
http://blogs.msdn.com/b/adam/archive/2010/09/10/how-to-serialize-a-dictionary-or-hashtable-in-c.aspx

Have Arrays in .NET lost their significance?

For every situation that warrants the use of an array ... there is an awesome collection with benefits. Is there any specific use case for Arrays any more in .NET?
Sending/receiving data with a specific length comes to mind, i.e. serial ports, web requests, FTP requests. Basically stuff that works at a lower level in the system. Also, most collections use an array for storage (notable exception: LinkedList<T>). Collections are just another abstraction layer.
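A small sketch of that kind of lower-level usage: a fixed-length byte[] buffer handed to a Stream (the buffer size and stream name are illustrative):

using System.IO;

static int ReadChunk(Stream stream, byte[] buffer)
{
    // Read fills at most buffer.Length bytes and reports how many actually arrived.
    return stream.Read(buffer, 0, buffer.Length);
}

// Usage sketch: var buffer = new byte[4096]; int read = ReadChunk(someStream, buffer);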
Arrays are useful because they are always contiguous in memory and are fast to work with. For example, I can take a byte[] and marshal it directly into a structure without any problems, but a List<T> would have to be converted to an array first, as far as I know.
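A hedged sketch of that byte[]-to-structure marshalling (the Header layout is hypothetical; the buffer must be at least Marshal.SizeOf<Header>() bytes):

using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Header          // hypothetical wire format
{
    public ushort Id;
    public int Length;
}

static Header ReadHeader(byte[] buffer)
{
    // Pin the array so the GC cannot move it, then reinterpret the bytes.
    var handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
    try
    {
        return Marshal.PtrToStructure<Header>(handle.AddrOfPinnedObject());
    }
    finally
    {
        handle.Free();
    }
}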
No, they still have their uses and should always be considered.
Remember that arrays are very basic representations of a fixed-length sequence, so they are very fast and most languages understand them, depending on the element type.
You need to define an array's size when it is created and cannot change it later. Lists and other collections can grow as needed, which adds overhead in terms of memory allocation.
Lists and other types are useful because they can do a lot, but sometimes you don't need all that extra overhead, so an array is all you need.
It's like driving a 4x4 because you think one day you might need to go off-roading even though there is a 99.9% chance you will be on normal roads. An array would be the basic car and a List, for example, would be the 4x4... it does everything the car can do (and under the hood it might use most of the same parts), but at the expense of gas, cost, not fitting in certain parking stalls, etc.
Arrays = performance and compatibility
Lists (or other representations) = ease of use at a cost of performance and compatibility
Yes, there certainly is still a use for arrays. Some methods still need arrays.
For example:
string[] items = "a,b;c:d".Split(new char[]{',',';',':'});
It's still the simplest way to keep a bunch of items, and the number one choice until you need some specific feature, like for example dynamic growth.
Have arrays lost (some) significance?
Yes. For many tasks requiring a 'table' of items there are now more flexible and useful solutions like List<> and IEnumerable<>.
Have arrays lost their importance?
No. They are the fastest form of storage and they are used 'under the hood' in most of the collection classes, System.String etc.
So, arrays have become more low-level, and an application programmer will be using them directly less often.
Quite apart from everything else, many of those great collection classes you refer to are implemented using arrays. You might not be using them explicitly, but you're using loads of them and your program is better for it. That means that arrays must be in the language (or that the collections are implemented directly using lots of native code, which would be suckier).
Yes? Anytime I have a type which internally maintains a fixed-size collection of items, I use an array as it's the fastest to iterate and requires the least memory. No sense using a List<T>, Queue<T>, etc. if you don't need those features.
Two cases where I work daily with arrays:
Image analysis. Images are almost always byte[] or int[] arrays.
External hardware communication. Most devices require perfectly structured arrays to send/receive messages.
It's a fair question, but the answer is definitely that they're still useful. Speed is one reason, simplicity for fixed sizes is another. But I think the most important one is flexibility. It gives you a nice base to design your own collection, backed by a simple array, if you ever needed it.
No, arrays will not lose their importance.
1- When you know the number of items in advance, you can go for an array, which gives you very fast access.
2- In graph theory, when you store information about the links between vertices, you can implement this using arrays, which are faster than a LinkedList implementation.
3- Some methods, like string.Split, return an array.
You can use this wonderful static placeholder of items in a variety of computer problems.
I use arrays to manipulate images, for example with the WriteableBitmap class.
One thing I must tell you: arrays are the building blocks of any programming language. If you want to declare storage holding more than one element, an array is the basic option.
Take List, for instance.
If you look at the definition of List, it actually holds
T[] items
Just use Reflector and find the definition of List; you will be surprised to find that List is actually backed by an array. In .NET, most collections other than LinkedList are basically array implementations. Arrays are used because of their fast storage and retrieval.
I agree that arrays have limitations when it comes to updates and removals; if your main emphasis is on that kind of modification rather than raw speed, you might go for a linked list.
What do you think the backing field behind many of those fancy collections is?
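As a toy sketch of that backing field in action (purely illustrative, not the real List<T> source): grow-by-doubling over a plain array.

using System;

class GrowableList<T>
{
    private T[] _items = new T[4];
    private int _count;

    public int Count => _count;

    public void Add(T item)
    {
        if (_count == _items.Length)
            Array.Resize(ref _items, _items.Length * 2); // double the backing array
        _items[_count++] = item;
    }

    public T this[int index] => _items[index];
}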

Should I be concerned about .NET dictionary speed?

I will be creating a project that will use dictionary lookups and inserts quite a bit. Is this something to be concerned about?
Also, if I do benchmarking and such and it is really bad, then what is the best way of replacing dictionary with something else? Would using an array with "hashed" keys even be faster? That wouldn't help on insert time though will it?
Also, I don't think I'm micro-optimizing because this really will be a significant part of code on a production server, so if this takes an extra 100ms to complete, then we will be looking for new ways to handle this.
You are micro-optimizing. Do you even have working code yet? Remember, "If it doesn't work, it doesn't matter how fast it doesn't work." (Mich Ravera) http://www.codingninja.co.uk/best-programmers-quotes/.
You have no idea where the bottlenecks will be, and already you're focused on Dictionary. What if the problem is somewhere else?
How do you know how the Dictionary class is implemented? Maybe it already uses an array with hashed keys!
P.S. It's really ".NET Dictionaries", not "C# Dictionaries", because C# is just one of several programming languages that use the framework.
Hello, I will be creating a project that will use dictionary lookups and inserts quite a bit. Is this something to be concerned about?
Yes. It is always wise to consider performance factors up front.
The form that your concern should take is as follows: your concern should be encouraging you to write realistic, user-focused performance specifications. It should be encouraging you to start writing performance tests early, and running them often, so that you can see how every single change to the product affects performance. That way you will be informed immediately when a code change causes a user-affecting change in performance. And it should be encouraging you to run profiles often, so that you are reasoning about performance based on empirical measurements, rather than random guesses and hunches.
Also, if I do benchmarking and such and it is really bad, then what is the best way of replacing dictionary with something else?
The best way to do this is to build a reasonable abstraction layer. If you have a class (or interface) which represents the "insert" and "lookup" abstract data type, then you can replace its internals without changing any of the callers.
Note that adding a layer of abstraction itself has a performance cost. If your profiling shows that the abstraction layer is too expensive, if the extra couple nanoseconds per call is too much, then you might have to get rid of the abstraction layer. Again, this decision will be driven by real-world performance data.
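A hedged sketch of such an abstraction layer (the interface and class names here are made up for illustration):

using System.Collections.Generic;

// Callers depend on this interface, not on Dictionary directly.
public interface IKeyValueStore<TKey, TValue>
{
    void Insert(TKey key, TValue value);
    bool TryLookup(TKey key, out TValue value);
}

// Default implementation backed by Dictionary; it can be swapped out later,
// without touching any caller, if profiling shows the need.
public class DictionaryStore<TKey, TValue> : IKeyValueStore<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _map = new Dictionary<TKey, TValue>();

    public void Insert(TKey key, TValue value) => _map[key] = value;
    public bool TryLookup(TKey key, out TValue value) => _map.TryGetValue(key, out value);
}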
Would using an array with "hashed" keys even be faster? That wouldn't help on insert time though will it?
Neither you nor anyone reading this can possibly know which one is faster until you write it both ways and then benchmark it both ways under real-world conditions. Doing it under "lab" conditions will skew your results; you'll need to understand how things work when the GC is under realistic memory pressure, and so on. You might as well ask us which horse will run faster in next year's Kentucky Derby. If we knew the answer just by looking at the racing form, we'd all be rich already. You can't possibly expect anyone to know which of two entirely hypothetical, unwritten pieces of code will be faster under unspecified conditions!
The Dictionary<TKey, TValue> class is actually implemented as a hash table which makes lookups very fast (close to O(1)). See the API documentation for more information. I doubt you could make a better implementation yourself.
Wait and see if the performance of your application is below expectations.
If it is, then use a profiler to determine whether the Dictionary lookup is the source of the problem.
If it is, then do some tests with representative data to see if another choice of collection would be quicker.
In short - no, in general you shouldn't worry about the performance of implementation details until after you have a problem.
I would do a benchmark of the Dictionary, HashTable (HashSet in .NET), and perhaps a home grown class, and see which works out best under your typical usage conditions.
Normally I would say it's fine (insert Stack Overflow's favorite premature optimization quote here), but if this is a core piece of the application: benchmark, benchmark, benchmark.
The only concern that I can think of is that the speed of the dictionary relies on the key class having a reasonably fast GetHashCode method. Lookups and inserts are really fast, so you shouldn't have any problem there.
Regarding using an array, that's essentially what the Dictionary class does already. Internally it uses arrays: a bucket array plus an entry array that holds the keys and values.
If you would have any performance problems with a Dictionary, it would be quite easy to make a wrapper for any kind of storage, that has the same methods and behaviour as a Dictionary so that you can replace it seamlessly.
I'm not sure that anyone has really answered this part yet:
Also, if I do benchmarking and such
and it is really bad, then what is the
best way of replacing dictionary with
something else?
For this, wherever possible, declare your variables as IDictionary<TKey, TValue>. That's the main interface that Dictionary derives from. (I'm assuming that if you care that much about performance, then you aren't considering non-generic collections.) Then, in the future, you can change the underlying implementation class without having to change any of the code that uses that dictionary. For example:
IDictionary<string, int> myDict = new Dictionary<string, int>();
If your application is multithreaded then the key part of performance is going to be synchronizing this Dictionary correctly.
If it is single-threaded, then the bottleneck will almost certainly be elsewhere, such as reading these objects from wherever you are reading them.
I use a Dictionary for a UDP relay server. Each time a packet arrives it performs Dictionary.ContainsKey and Dictionary[Key], and it works great (with a massive number of clients). I had concerns when I was building it, but it turned out that was the last thing I should have worried about.
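For illustration, here is the two-step pattern described above next to the single-lookup TryGetValue variant recommended earlier on this page (the key and value types are hypothetical):

using System.Collections.Generic;

var clients = new Dictionary<string, int>();   // hypothetical: endpoint -> session id
clients["10.0.0.1:5000"] = 42;

// Two hash lookups: ContainsKey, then the indexer.
if (clients.ContainsKey("10.0.0.1:5000"))
{
    int session = clients["10.0.0.1:5000"];
}

// One hash lookup with TryGetValue.
if (clients.TryGetValue("10.0.0.1:5000", out int session2))
{
    // use session2
}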
Have a look at C# HybridDictionary Usage
HybridDictionary Class
This class is recommended for cases where the number of elements in a dictionary is unknown. It takes advantage of the improved performance of a ListDictionary with small collections, and offers the flexibility of switching to a Hashtable which handles larger collections better than ListDictionary.
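A quick usage sketch; note that HybridDictionary is non-generic, so values come back as object and need a cast (names and values are illustrative):

using System.Collections.Specialized;

var settings = new HybridDictionary();
settings["retries"] = 3;
settings["timeout"] = 30;

int retries = (int)settings["retries"];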
You may consider using the C5 library. I've found it to be very fast and thoughtfully designed. Others on Stack Overflow have found the same. With C5 you have the option of using the general type interfaces (with a capital I) or the underlying data structures directly. Naturally the interfaces allow you to swap out different implementations, but I have found in performance testing that the interfaces will cost you.
You may want to look at the KeyedCollection class in System.ObjectModel. From the MSDN description, "provides the abstract base class for a collection whose keys are embedded in the values."
