Fastest way to get any element from a Dictionary - c#

I'm implementing A* in C# (not for pathfinding) and I need a Dictionary to hold the open nodes, because I need fast insertion and fast lookup. I want to get the first open node from the Dictionary (it can be any random node). Using Dictionary.First() is very slow. If I use an enumerator, MoveNext() still accounts for 15% of my program's total CPU time. What is the fastest way to get any random element from a Dictionary?

I suggest you use a specialized data structure for this purpose, as the regular Dictionary was not made for this.
In Java, I would probably recommend LinkedHashMap, for which there are custom C# equivalents (sadly not built-in).
It is, however, rather easy to implement this yourself in a reasonable fashion. You could, for instance, use a regular dictionary whose values are tuples that hold the actual data as well as a pointer to the next element. Or you could keep a secondary stack or list that simply stores all keys in order of addition. Just some ideas. I never implemented or profiled this myself, but I'm sure you'll find a good way.
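For illustration, a minimal sketch of the second idea (all names are made up, and removal bookkeeping is left out; a real open set would also have to keep the key list in sync, e.g. by swap-removing via a stored index):
using System.Collections.Generic;

// Hypothetical sketch: a Dictionary for O(1) lookups plus a List of keys so that
// "give me any element" is also O(1). Removal handling is intentionally omitted.
public class IndexedDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> map = new Dictionary<TKey, TValue>();
    private readonly List<TKey> keys = new List<TKey>();

    public void Add(TKey key, TValue value)
    {
        map.Add(key, value);
        keys.Add(key);
    }

    public bool TryGetValue(TKey key, out TValue value) => map.TryGetValue(key, out value);

    // Returns the most recently added element in O(1).
    public KeyValuePair<TKey, TValue> GetAny()
    {
        TKey key = keys[keys.Count - 1];
        return new KeyValuePair<TKey, TValue>(key, map[key]);
    }
}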
Oh, and if you didn't already, you might also want to check the hash code distribution, to make sure there is no problem there.

Finding the first element (or the element at a given index) in a dictionary is actually O(n), because it has to iterate over the buckets until a non-empty one is found, so MoveNext will actually be the fastest way.
If this were a problem, I would consider using something like a stack, where pop is an O(1) operation.

Try
Enumerable.ToList(dictionary.Values)[new Random().Next(dictionary.Count)].
It should have pretty good performance, but watch out for memory usage if your dictionary is huge. Obviously take care not to create the Random object every time, and you may be able to cache the return value of Enumerable.ToList if the dictionary's contents don't change too frequently.
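A minimal sketch of that caching idea (the RandomValuePicker name is made up; the caller must call Invalidate whenever the dictionary changes):
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: cache dictionary.Values as a list and reuse one Random instance.
// Invalidate the cache whenever the dictionary changes.
class RandomValuePicker<TKey, TValue>
{
    private readonly Random rng = new Random();
    private List<TValue> cache;

    public void Invalidate() => cache = null;

    public TValue Pick(Dictionary<TKey, TValue> source)
    {
        if (cache == null)
            cache = source.Values.ToList();   // O(n), but only when the cache is stale
        return cache[rng.Next(cache.Count)];
    }
}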

Related

Maintaining data locality in a Dictionary<TKey,TValue>

I'm making a game and I decided that for reasons, I'd give each game object an int entity ID that I could easily search them by instead of having to linearly search a list or worse, many lists. The idea was inspired by the ECS pattern and I figured if I made sure to re-use ints when they were destroyed, it would help keep all the data close together in memory and reduce cache misses by a bit. (I know that depends more on access order, just thinking in the abstract here). The problem is I'm now doubting myself and I've read so much that I can't keep the ideas straight in my head.
The question is essentially if I keep endlessly adding higher numbered keys to a Dictionary<int, SomeClass>, will the speed/memory usage be worse than if I try to re-use lower numbers?
Note: I feel like the answer is going to be "write your own class" but I was trying to avoid that and I don't think I'd do a good job if I don't understand this concept.
No, it makes no difference at all. From MSDN:
The Dictionary generic class provides a mapping from a set of keys to a set of values. Each addition to the dictionary consists of a value and its associated key. Retrieving a value by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table.
So, the speed will always be O(1) because it internally uses a hash table; the value of the key doesn't affect it at all.
The only problem you can face is if you reach int.MaxValue, but that depends on your scenario.
Okay here's my best effort at answering this myself, apologies if I get anything wrong.
Short answer: No. If you add higher-numbered keys they just get placed somewhere in the internal array until it fills up. The solution to the example problem is to just replace the dictionary with a GameObject array and use the int as an index, and if necessary write a class to handle expanding it.
Longer answer: I think my confusion came from reading somewhere that a dictionary was just a pair of parallel arrays or something like that. I guess that's true but since it's indexed by hash codes, it's not intended for contiguous index values. So it's doing a bunch of redundant work to handle cases that I'm never going to use it for.
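For illustration, a minimal sketch of that array-backed replacement (the EntityStore name, the growth policy, and the free-ID handling are assumptions):
using System;
using System.Collections.Generic;

// Sketch: entity IDs index directly into an array instead of hashing into a Dictionary.
// Freed IDs are recycled so the array stays dense.
public class EntityStore<T> where T : class
{
    private T[] items = new T[64];
    private readonly Stack<int> freeIds = new Stack<int>();
    private int nextId;

    public int Add(T item)
    {
        int id = freeIds.Count > 0 ? freeIds.Pop() : nextId++;
        if (id >= items.Length)
            Array.Resize(ref items, items.Length * 2);   // grow by doubling
        items[id] = item;
        return id;
    }

    public T Get(int id) => items[id];

    public void Remove(int id)
    {
        items[id] = null;
        freeIds.Push(id);   // re-use the slot later
    }
}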

Mass Update a property on multiple records inside a dictionary (VB.NET / C#)

I have a Dictionary (of Long, Class), where Class has multiple properties (assume we have a property called Updated as Boolean).
I want to update this (Updated) property to (True) at once for let's say all Odd key records (or based on any specific rule). What is the best way to do so?
My thought is to use LINQ to fetch those records and then For Each over them, but is there a better way, like doing a mass update wherever a condition holds (like what we do in a database)?
An example of my approach is below. Appreciate it if there is a better way to do such an update...
Thanks
Dim ReturnedObjs = From Obj In Dictionary Where Obj.Key Mod 2 = 1
For Each item As KeyValuePair(Of Long, Class) In ReturnedObjs
item.Value.Updated = True
Next
First, this sounds like an obvious case for the speed rant:
https://ericlippert.com/2012/12/17/performance-rant/
Second:
The best way is to keep this in the database. You are not going to beat the speed of a DB query with indexes designed for quick matching by transferring the data over the network twice (once to get it, once to write it back) and doubling the search load (once to find all the odd ones, once to update the ones you just changed). My standing advice is to always keep as much work as possible on the DB side. Your client code will never be able to beat it.
Third:
If you do need to use client side processing:
Now a lot of my answer depends on implementation details, how the JIT and general compiler optimizations work, etc.
Foreach works on enumerators, not collections. But if you feed a collection to foreach, an enumerator is implicitly created. Now enumerators have two properties:
If the collection changes, the Enumerator becomes invalid. Most people learn about them because they ran into this issue.
It is an extra function call and an extra set of checks for every access to the collection, so it will be a slowdown. How much is hard to say, as the optimizations and the JIT are pretty good.
So you probably want to use a for loop instead.
If you could turn the Dictionary into a collection where the primary key is used as the index, it might be a bit faster. But that has the danger of running into a lot of "dry spells" regarding data, so it depends a lot on your source data.
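For reference, here is a rough C# equivalent of the VB example that does the whole update in a single client-side pass (myDictionary and MyClass are placeholder names standing in for the question's dictionary and value class); whether this or an index-based loop ends up faster would need profiling, as noted above.
// Sketch: update the Updated flag in one pass. Mutating a property of a stored
// object does not modify the dictionary itself, so the enumeration stays valid.
foreach (KeyValuePair<long, MyClass> pair in myDictionary)
{
    if (pair.Key % 2 == 1)          // the "odd key" rule from the question
        pair.Value.Updated = true;
}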

Poor use of dictionary?

I've read on here that iterating through a dictionary is generally considered an abuse of the data structure, and that you should use something else.
However, I'm having trouble coming up with a better way to accomplish what I'm trying to do.
When a tag is scanned I use its ID as the key and the value is a list of zones it was seen in. About every second I check to see if a tag in my dictionary has been seen in two or more zones and if it has, queue it up for some calculations.
for (int i = 0; i < TagReads.Count; i++)
{
var tag = TagReads.ElementAt(i).Value;
if (tag.ZoneReads.Count > 1)
{
Report.Tags.Enqueue(tag);
Boolean success = false;
do
{
success = TagReads.TryRemove(tag.Epc, out TagInfo outTag);
} while (!success);
}
}
I feel like a dictionary is the correct choice here because there can be many tags to look up but something about this code nags me as being poor.
As far as efficiency goes. The speed is fine for now in our small scale test environment but I don't have a good way to find out how it will work on a massive scale until it is put to use, hence my concern.
I believe that there's an alternative approach which doesn't involve iterating a big dictionary.
First of all, you need to create a HashSet<T> of tags in which you'll store those tags that have been detected in two or more zones. We'll call it tagsDetectedInMoreThanTwoZones.
And you may refactor your code flow as follows:
A. Whenever you detect a tag in one zone...
Add the tag and the zone to the main dictionary.
Create an exclusive lock against tagsDetectedInMoreThanTwoZones to avoid undesired behaviors in B.
Check if the key has more than one zone. If this is true, add it to tagsDetectedInMoreThanTwoZones.
Release the lock against tagsDetectedInMoreThanTwoZones.
B. Whenever you need to process a tag which has been detected in more than one zone...
Create an exclusive lock against tagsDetectedInMoreThanTwoZones to avoid more than one thread trying to process them at once.
Iterate tagsDetectedInMoreThanTwoZones.
Use each tag in tagsDetectedInMoreThanTwoZones to get the zones in your current dictionary.
Clear tagsDetectedInMoreThanTwoZones.
Release the exclusive lock against tagsDetectedInMoreThanTwoZones.
Now you'll only iterate those tags that you already know have been detected in more than one zone!
In the long run, you can even make per-region partitions so you never get a tagsDetectedInMoreThanTwoZones set with too many items to iterate, and each set could be consumed by a dedicated thread!
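A condensed sketch of that flow (the TagTracker class, the locking scheme, and the TagInfo constructor/AddZone method are assumptions; TagReads, ZoneReads, Epc, and Report come from the question):
using System.Collections.Concurrent;
using System.Collections.Generic;

// Sketch of the two-phase flow: readers record "hot" tags in a small HashSet,
// and the periodic consumer iterates only that set instead of the whole dictionary.
class TagTracker
{
    private readonly ConcurrentDictionary<string, TagInfo> tagReads = new ConcurrentDictionary<string, TagInfo>();
    private readonly HashSet<string> tagsDetectedInMoreThanTwoZones = new HashSet<string>();
    private readonly object setLock = new object();

    // A. Called whenever a tag is seen in a zone.
    public void OnTagRead(string epc, int zone)
    {
        TagInfo tag = tagReads.GetOrAdd(epc, e => new TagInfo(e));   // assumed constructor
        tag.AddZone(zone);                                           // assumed method

        if (tag.ZoneReads.Count > 1)
        {
            lock (setLock)
                tagsDetectedInMoreThanTwoZones.Add(epc);
        }
    }

    // B. Called roughly once per second.
    public void ProcessHotTags()
    {
        List<string> hot;
        lock (setLock)
        {
            hot = new List<string>(tagsDetectedInMoreThanTwoZones);
            tagsDetectedInMoreThanTwoZones.Clear();
        }

        foreach (string epc in hot)
        {
            if (tagReads.TryRemove(epc, out TagInfo tag))
                Report.Tags.Enqueue(tag);   // Report as in the question
        }
    }
}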
If you are going to do a lot of lookups in your code and only sometimes iterate through the whole thing, then I think the dictionary use is OK. I would like to point out, though, that your use of ElementAt is more alarming. ElementAt performs very poorly when used on objects that do not implement IList<T>, and the dictionary does not. For IEnumerable<T> implementations that do not implement IList, the nth element is found through iteration, so your for loop will iterate the dictionary once for each element. You are better off with a standard foreach.
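For example (TagReads, TagInfo, Epc, ZoneReads, and Report are the question's own names; the key is assumed to be the Epc string):
// Sketch: a single foreach pass instead of ElementAt inside a for loop.
// ConcurrentDictionary allows enumeration while items are being removed.
foreach (KeyValuePair<string, TagInfo> pair in TagReads)
{
    TagInfo tag = pair.Value;
    if (tag.ZoneReads.Count > 1 && TagReads.TryRemove(tag.Epc, out TagInfo removed))
    {
        Report.Tags.Enqueue(removed);
    }
}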
I feel like this is a good use for a dictionary, giving you good access speed when you want to check if an ID is already in the collection.

LINQ ToDictionary initial capacity

I regularly use the LINQ extension method ToDictionary, but am wondering about the performance. There is no parameter to define the capacity for the dictionary and with a list of 100k items or more, this could become an issue:
IList<int> list = new List<int> { 1, 2, ... , 1000000 };
IDictionary<int, string> dictionary = list.ToDictionary(x => x, x => x.ToString("D7"));
Does the implementation actually take list.Count and pass it to the constructor for the dictionary?
Or is the resizing of the dictionary fast enough, so I don't really have to worry about it?
Does the implementation actually take list.Count and pass it to the constructor for the dictionary?
No. According to ILSpy, the implementation is basically this:
Dictionary<TKey, TElement> dictionary = new Dictionary<TKey, TElement>(comparer);
foreach (TSource current in source)
{
dictionary.Add(keySelector(current), elementSelector(current));
}
return dictionary;
If you profile your code and determine that the ToDictionary operation is your bottleneck, it's trivial to make your own function based on the above code.
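As a rough illustration, a capacity-aware variant might look like the sketch below (the name ToDictionaryWithCapacity and the fallback behaviour are assumptions, not part of the framework):
using System;
using System.Collections.Generic;

public static class DictionaryExtensions
{
    // Sketch: like Enumerable.ToDictionary, but pre-sizes the dictionary when the
    // source is a collection whose Count is known without enumerating it.
    public static Dictionary<TKey, TElement> ToDictionaryWithCapacity<TSource, TKey, TElement>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        Func<TSource, TElement> elementSelector)
    {
        int capacity = source is ICollection<TSource> collection ? collection.Count : 0;
        var dictionary = new Dictionary<TKey, TElement>(capacity);
        foreach (TSource item in source)
            dictionary.Add(keySelector(item), elementSelector(item));
        return dictionary;
    }
}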
Does the implementation actually take list.Count and pass it to the constructor for the dictionary?
This is an implementation detail and it shouldn't matter to you.
Or is the resizing of the dictionary fast enough, so I don't really have to worry about it?
Well, I don't know. Only you know whether or not this is actually a bottleneck in your application, and whether or not the performance is acceptable. If you want to know if it's fast enough, write the code and time it. As Eric Lippert is wont to say, if you want to know how fast two horses are, do you pit them in a race against each other, or do you ask random strangers on the Internet which one is faster?
That said, I'm having a really hard time imagining this being a bottleneck in any realistic application. If adding items to a dictionary is a bottleneck in your application, you're doing something wrong.
I don't think it'll be a bottleneck, TBH. And in case you have real complaints and issues, you should look into it at that time to see if you can improve it; maybe you can do paging instead of converting everything at once.
I don't know about resizing the dictionary, but checking the implementation with dotPeek.exe suggests that the implementation does not take the list length.
What the code basically does is:
create a new dictionary
iterate over sequence and add items
If you find this a bottleneck, it would be trivial to create your own extension method ToDictionaryWithCapacity that works on something that can have its length actually computed without iterating the whole thing.
Just scanned the Dictionary implementation. Basically, when it starts to fill up, the internal array is resized to roughly double its size, rounded up to a nearby prime. So that should not happen too frequently.
Does the implementation actually take list.Count and pass it to the constructor for the dictionary?
It doesn't. That's because calling Count() would enumerate the source, and then adding the items to the dictionary would enumerate it a second time. It's not a good idea to enumerate the source twice; for example, this would fail on DataReaders.
Or is the resizing of the dictionary fast enough, so I don't really have to worry about it?
The Dictionary.Resize method is used to expand the dictionary. It allocates new internal arrays and copies the existing entries into them (using Array.Copy). The dictionary size is increased in prime-number steps.
This is not the fastest way, but fast enough if you do not know the size.

Help with C#.NET generic collections performance and optimization

I am trying to optimize a piece of .NET 2.0 C# code that looks like this:
Dictionary<myType, string> myDictionary = new Dictionary<myType, string>();
// some other stuff
// inside a loop check if key is there and if not add element
if(!myDictionary.ContainsKey(currentKey))
{
myDictionary.Add(currentKey, "");
}
It looks like whoever wrote this piece of code used the Dictionary even though it isn't needed (only the key is being used, to store a set of unique values), because it is faster to search than a List of myType objects.
This seems obviously wrong, as only the key of the dictionary is actually used, but I am trying to understand the best way to fix it.
Questions:
1) I seem to understand I would get a good performance boost even just using .NET 3.5 HashSet. Is this correct?
2) What would be the best way to optimize the code above in .NET 2.0 and why?
EDIT:
This is existing code I am trying to optimize; it's looping through tens of thousands of items, and for each one of them it calls ContainsKey. There's gotta be a better way of doing it (even in .NET 2.0)! :)
I think you need to break this down into 2 questions
Is Dictionary<myType,string> the best available type for this scenario
No. Based on your breakdown, HashSet<myType> is clearly the better choice because its usage pattern more accurately fits the scenario.
Will switching to HashSet<myType> give me a performance boost?
This is really subjective and only a profiler can give you the answer to this question. Likely you'll see a very minor memory size improvement per element in the collection. But in terms of raw computing power I doubt you'll see a huge difference. Only a profiler can tell you if there is one.
Before you ever make a performance related change to your code remember the golden rule.
Don't make any performance related changes until a profiler has told you precisely what is wrong with your code.
Changes which violate this rule are just guesses. A profiler is the only way to measure the success of a performance fix.
1) No. A dictionary hashes the key, so your lookup should be O(1). A HashSet should need less memory, though. But honestly, the difference isn't big enough that you will really see a performance boost.
2) Give us some more detail as to what you are trying to accomplish. The code you posted is pretty simple. Have you measured yet? Are you seeing that this method is slow? Don't forget "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." -- Donald Knuth
Depending on the size of your keys, you may actually see performance degrade.
One way in 2.0 would be to try to insert it and catch the exception (of course, this depends on how many duplicate keys you plan on having):
foreach(string key in keysToAdd)
{
try
{
dictionary.Add(key, "myvalue");
}
catch(ArgumentException)
{
// do something about extra key
}
}
The obvious mistake (if we discuss performance) I can see is the double work done when calling ContainsKey and then adding the key-value pair. When the pair is added using the Add method, the key is again internally checked for presence. The whole if block can safely be replaced by this:
...
myDictionary[currentKey] = "";
...
If the key already exists there, the value will just be replaced and no exception will be thrown. Moreover, if the value is not used at all, I would personally use null to fill it. I can see no reason for using any string constant there.
The possible performance degradation mentioned by scottm is not for doing simple lookups; it is for calculating the intersection between 2 sets. HashSet does have slightly faster lookups than Dictionary. The performance difference really is going to be very small, though, as everyone says: the lookup takes most of the time and creating the KeyValuePair takes very little.
For 2.0, you could make the "Value" object one of these:
public struct Empty {}
It may do slightly better than the "".
Or you could try making a reference to System.Core.dll in your 2.0 project, so you can use the HashSet.
Also, make sure that GetHashCode and Equals are as efficient as possible for MyType. I've been bitten by using a dictionary on something with a really slow GetHashCode (I believe we tried to use a delegate as a key or something like that.)
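For illustration, if myType is identified by a single int field (an assumption about your type), Equals and GetHashCode can be kept this cheap:
using System;

// Sketch: allocation-free Equals/GetHashCode so Dictionary/HashSet lookups stay fast.
// Assumes a single int Id uniquely identifies the object.
public sealed class MyType : IEquatable<MyType>
{
    public int Id { get; }

    public MyType(int id) { Id = id; }

    public bool Equals(MyType other) => other != null && other.Id == Id;

    public override bool Equals(object obj) => Equals(obj as MyType);

    public override int GetHashCode() => Id;   // no boxing, no string building
}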
