LINQ ToDictionary initial capacity - c#

I regularly use the LINQ extension method ToDictionary, but am wondering about the performance. There is no parameter to define the capacity for the dictionary and with a list of 100k items or more, this could become an issue:
IList<int> list = new List<int> { 1, 2, ... , 1000000 };
IDictionary<int, string> dictionary = list.ToDictionary(x => x, x => x.ToString("D7"));
Does the implementation actually take list.Count and pass it to the constructor for the dictionary?
Or is the resizing of the dictionary fast enough that I don't really have to worry about it?

Does the implementation actually take list.Count and pass it to the constructor for the dictionary?
No. According to ILSpy, the implementation is basically this:
Dictionary<TKey, TElement> dictionary = new Dictionary<TKey, TElement>(comparer);
foreach (TSource current in source)
{
    dictionary.Add(keySelector(current), elementSelector(current));
}
return dictionary;
If you profile your code and determine that the ToDictionary operation is your bottleneck, it's trivial to make your own function based on the above code.
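For instance, a minimal sketch of such a function might look like the following (the name ToDictionaryWithCapacity is made up here, and the ICollection<T> check is one reasonable way to get a cheap count):

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryExtensions
{
    // Hypothetical helper: like ToDictionary, but pre-sizes the dictionary
    // when the source is a collection whose count is known up front.
    public static Dictionary<TKey, TElement> ToDictionaryWithCapacity<TSource, TKey, TElement>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        Func<TSource, TElement> elementSelector)
    {
        // ICollection<T>.Count is O(1); fall back to 0 (default sizing) otherwise.
        int capacity = source is ICollection<TSource> collection ? collection.Count : 0;

        var dictionary = new Dictionary<TKey, TElement>(capacity);
        foreach (TSource item in source)
        {
            dictionary.Add(keySelector(item), elementSelector(item));
        }
        return dictionary;
    }
}
```

With a List<T> source this avoids all intermediate resizes, since List<T> implements ICollection<T> and its Count is known without enumerating.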

Does the implementation actually take list.Count and pass it to the constructor for the dictionary?
This is an implementation detail and it shouldn't matter to you.
Or is the resizing of the dictionary fast enough that I don't really have to worry about it?
Well, I don't know. Only you know whether or not this is actually a bottleneck in your application, and whether or not the performance is acceptable. If you want to know if it's fast enough, write the code and time it. As Eric Lippert is wont to say, if you want to know how fast two horses are, do you pit them in a race against each other, or do you ask random strangers on the Internet which one is faster?
That said, I'm having a really hard time imagining this being a bottleneck in any realistic application. If adding items to a dictionary is a bottleneck in your application, you're doing something wrong.

I don't think it'll be a bottleneck TBH. And in case you have real complaints and issues, you should look into it at that time to see if you can improve it; maybe you can do paging instead of converting everything at once.

I don't know about resizing the dictionary, but checking the implementation with dotPeek.exe suggests that it does not use the list length.
What the code basically does is:
create a new dictionary
iterate over sequence and add items
If you find this a bottleneck, it would be trivial to create your own extension method ToDictionaryWithCapacity that works on something that can have its length actually computed without iterating the whole thing.
Just scanned the Dictionary implementation. Basically, when it starts to fill up, the internal array is resized by roughly doubling it to a nearby prime. So that should not happen too frequently.

Does the implementation actually take list.Count and pass it to the constructor for the dictionary?
It doesn't. That's because calling Count() would enumerate the source, and then adding the items to the dictionary would enumerate the source a second time. It's not a good idea to enumerate the source twice; for example, this would fail on DataReaders.
Or is the resizing of the dictionary fast enough that I don't really have to worry about it?
The Dictionary.Resize method is used to expand the dictionary. It allocates a new dictionary and copies the existing items into the new dictionary (using Array.Copy). The dictionary size is increased in prime number steps.
This is not the fastest way, but fast enough if you do not know the size.

Related

ConcurrentDictionary adding and reading data [duplicate]

I have read in answers to many questions on here that a Dictionary is unordered. But what exactly does that mean?
var test = new Dictionary<int, string>();
test.Add(0, "zero");
test.Add(1, "one");
test.Add(2, "two");
test.Add(3, "three");
Assert(test.ElementAt(2).Value == "two");
The above code seems to work as expected. So in what manner is a dictionary considered unordered? Under what circumstances could the above code fail?
Well, for one thing it's not clear whether you expect this to be insertion-order or key-order. For example, what would you expect the result to be if you wrote:
var test = new Dictionary<int, string>();
test.Add(3, "three");
test.Add(2, "two");
test.Add(1, "one");
test.Add(0, "zero");
Console.WriteLine(test.ElementAt(0).Value);
Would you expect "three" or "zero"?
As it happens, I think the current implementation preserves insertion ordering so long as you never delete anything - but you must not rely on this. It's an implementation detail, and that could change in the future.
Deletions also affect this. For example, what would you expect the result of this program to be?
using System;
using System.Collections.Generic;

class Test
{
    static void Main()
    {
        var test = new Dictionary<int, string>();
        test.Add(3, "three");
        test.Add(2, "two");
        test.Add(1, "one");
        test.Add(0, "zero");
        test.Remove(2);
        test.Add(5, "five");
        foreach (var pair in test)
        {
            Console.WriteLine(pair.Key);
        }
    }
}
It's actually (on my box) 3, 5, 1, 0. The new entry for 5 has used the vacated entry previously used by 2. That's not going to be guaranteed either though.
Rehashing (when the dictionary's underlying storage needs to be expanded) could affect things... all kinds of things do.
Just don't treat it as an ordered collection. It's not designed for that. Even if it happens to work now, you're relying on undocumented behaviour which goes against the purpose of the class.
A Dictionary<TKey, TValue> represents a hash table, and in a hash table there is no notion of order.
The documentation explains it pretty well:
For purposes of enumeration, each item in the dictionary is treated as a KeyValuePair structure representing a value and its key. The order in which the items are returned is undefined.
There's a lot of good ideas here, but scattered, so I'm going to try to create an answer that lays it out better, even though the problem has been answered.
First, a Dictionary has no guaranteed order, so you use it only to quickly look up a key and find a corresponding value, or you enumerate through all the key-value pairs without caring what the order is.
If you want order, you use an OrderedDictionary but the tradeoff is that lookup is slower, so if you don't need order, don't ask for it.
Dictionaries (and HashMap in Java) use hashing. That is O(1) time regardless of the size of your table. Ordered dictionaries typically use some sort of balanced tree which is O(log2(n)) so as your data grows, access gets slower. To compare, for 1 million elements, that's on the order of 2^20, so you'd have to do on the order of 20 lookups for a tree, but 1 for a hash map. That's a LOT faster.
Hashing is deterministic. Non-determinism means when you hash(5) the first time, and you hash(5) the next time, you get a different place. That would be completely useless.
What people meant to say is that if you add things to a dictionary, the order is complicated, and subject to change any time you add (or potentially remove) an element. For example, imagine the hash table has 500k slots and you have 400k values in it. When you add one more, you reach the critical threshold, because the table needs about 20% empty space to be efficient, so it allocates a bigger table (say, 1 million entries) and re-hashes all the values. Now they are all in different locations than they were before.
If you build the same Dictionary twice (read my statement carefully, THE SAME), you will get the same order. But as Jon correctly says, don't count on it. Too many things can make it not the same, even the initially allocated size.
This brings up an excellent point. It is really, really expensive to have to resize a hashmap. That means you have to allocate a bigger table and re-insert every key-value pair. So it is well worth allocating 10x the memory it needs rather than letting even a single grow happen. Know the size of your hashmap and preallocate enough if at all possible; it's a huge performance win. And if you have a bad implementation that doesn't resize, it can be a disaster if you pick too small a size.
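To illustrate the preallocation point in C# (the sizes here are arbitrary, and the exact resize behavior is an implementation detail):

```csharp
using System.Collections.Generic;

// Grown incrementally: the backing store is resized and
// rehashed several times on the way up.
var grown = new Dictionary<int, string>();
for (int i = 0; i < 1_000_000; i++)
{
    grown[i] = i.ToString();
}

// Preallocated: one allocation up front, no rehashing while filling.
var presized = new Dictionary<int, string>(1_000_000);
for (int i = 0; i < 1_000_000; i++)
{
    presized[i] = i.ToString();
}
```

Both end up with the same contents; the second simply skips the intermediate grow-and-rehash steps.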
Now what Jon argued with me about in my comment in his answer was that if you add objects to a Dictionary in two different runs, you will get two different orderings. True, but that's not the dictionary's fault.
When you say:
new Foo();
you are creating a new object at a new location in memory.
If you use the value Foo as the key in a dictionary, with no other information, the only thing it can do is use the object's identity (the default reference-based GetHashCode) as the key.
That means that
var f1 = new Foo(1);
var f2 = new Foo(1);
f1 and f2 are not the same object, even if they have the same values.
So if you were to put them into Dictionaries:
var test = new Dictionary<Foo, string>();
test.Add(f1, "zero");
don't expect it to be the same as:
var test = new Dictionary<Foo, string>();
test.Add(f2, "zero");
even if both f1 and f2 have the same values. That has nothing to do with the deterministic behavior of the Dictionary.
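If you do want f1 and f2 to act as the same key, you can override Equals and GetHashCode to give Foo value semantics. A minimal sketch (the shape of Foo, with a single Id field, is assumed here):

```csharp
public class Foo
{
    public int Id { get; }
    public Foo(int id) { Id = id; }

    // Value equality: two Foos with the same Id hash to the same bucket
    // and compare equal, so they behave as the same dictionary key.
    public override bool Equals(object obj) => obj is Foo other && other.Id == Id;
    public override int GetHashCode() => Id.GetHashCode();
}
```

With this in place, adding with f1 and looking up with f2 (both constructed from the same value) finds the same entry, because the dictionary no longer falls back to reference identity.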
Hashing is an awesome topic in computer science, my favorite to teach in data structures.
Check out Cormen and Leiserson for a high end book on red-black trees vs. hashing
This guy named Bob has a great site about hashing, and optimal hashes: http://burtleburtle.net/bob
The order is non-deterministic.
From here
For purposes of enumeration, each item in the dictionary is treated as a KeyValuePair structure representing a value and its key. The order in which the items are returned is undefined.
Maybe for your needs an OrderedDictionary is what's required.
I don't know C# or any of .NET, but the general concept of a Dictionary is that it's a collection of key-value pairs.
You don't access sequentially to a dictionary as you would when, for example, iterating a list or array.
You access by having a key, then finding whether there's a value for that key on the dictionary and what is it.
In your example you posted a dictionary with numerical keys which happen to be sequential, without gaps and in ascending order of insertion.
But no matter in which order you insert a value for key '2', you will always get the same value when querying for key '2'.
I don't know if C# permits key types other than numbers (I guess it does), but in that case it's the same: there's no explicit order on the keys.
The analogy with a real-life dictionary could be confusing, as the keys, which are the words, are alphabetically ordered so we can find them faster; but if they weren't, the dictionary would work anyway, because the definition of the word "Aardvark" would have the same meaning even if it came after "Zebra". Think of a novel, on the other hand: changing the order of the pages wouldn't make any sense, as they are an ordered collection in essence.
The class Dictionary<TKey,TValue> is implemented using an array-backed index-linked list. If no items are ever removed, the backing store will hold items in order. When an item is removed, however, the space will be marked for reuse before the array is expanded. As a consequence, if e.g. ten items are added to a new dictionary, the fourth item is deleted, a new item is added, and the dictionary is enumerated, the new item will likely appear fourth rather than tenth, but there is no guarantee that different versions of Dictionary will handle things the same way.
IMHO, it would have been helpful for Microsoft to document that a dictionary from which no items are ever deleted will enumerate items in the original order, but that once any items are deleted, any future changes to the dictionary may arbitrary permute the items therein. Upholding such a guarantee as long as no items are deleted would be relatively cheap for most reasonable dictionary implementations; continuing to uphold the guarantee after items are deleted would be much more expensive.
Alternatively, it might have been helpful to have an AddOnlyDictionary which would be thread-safe for a single writer simultaneous with any number of readers, and guarantee to retain items in sequence (note that if items are only added--never deleted or otherwise modified--one may take a "snapshot" merely by noting how many items it presently contains). Making a general-purpose dictionary thread-safe is expensive, but adding the above level of thread-safety would be cheap. Note that efficient multi-writer multi-reader usage would not require use of a reader-writer lock, but could simply be handled by having writers lock and having readers not bother to.
Microsoft didn't implement an AddOnlyDictionary as described above, of course, but it's interesting to note that the thread-safe ConditionalWeakTable has add-only semantics, probably because--as noted--it's much easier to add concurrency to add-only collections than to collections which allow deletion.
Dictionary<string, Obj>, not SortedDictionary<string, Obj>, defaults to sequencing by insertion order. Strangely enough, you need to specifically declare a SortedDictionary to have a dictionary that is sorted by key string order:
public SortedDictionary<string, Row> forecastMTX = new SortedDictionary<string, Row>();

Does foreach loop work more slowly when used with a not stored list or array?

I am wondering whether a foreach loop works more slowly when the list or array being iterated is not stored in a variable first.
I mean like this:
foreach (int number in list.OrderBy(x => x.Value))
{
    // DoSomething();
}
Does the loop in this code recalculate the sorting on every iteration or not?
The loop using a stored value:
List<Tour> list = tours.OrderBy(x => x.Value) as List<Tour>;
foreach (int number in list)
{
    // DoSomething();
}
And if it does, which code shows the better performance, storing the value or not?
This is often counter-intuitive, but generally speaking, the option that is best for performance is to wait as long as possible to materialize results into a concrete structure like a list or array. Please keep in mind that this is a generalization, and so there are plenty of cases where it doesn't hold. Nevertheless, the better first instinct is to avoid creating the list for as long as possible.
To demonstrate with your sample, we have these two options:
var list = tours.OrderBy(x => x.Value).ToList();
foreach (int number in list)
{
    // DoSomething();
}
vs this option:
foreach (int number in list.OrderBy(x => x.Value))
{
    // DoSomething();
}
To understand what is going on here, you need to look at the .OrderBy() extension method. Reading the linked documentation, you'll see it returns a IOrderedEnumerable<TSource> object. With an IOrderedEnumerable, all of the sorting needed for the foreach loop is already finished when you first start iterating over the object (and that, I believe, is the crux of your question: No, it does not re-sort on each iteration). Also note that both samples use the same OrderBy() call. Therefore, both samples have the same problem to solve for ordering the results, and they accomplish it the same way, meaning they take exactly the same amount of time to reach that point in the code.
The difference in the code samples, then, is entirely in using the foreach loop directly vs first calling .ToList(), because in both cases we start from an IOrderedEnumerable. Let's look closely at those differences.
When you call .ToList(), what do you think happens? This method is not magic. There is still code here which must execute in order to produce the list. This code still effectively uses its own foreach loop that you can't see. Additionally, where once you only needed to worry about enough RAM to handle one object at a time, you are now forcing your program to allocate a new block of RAM large enough to hold references for the entire collection. Moving beyond references, you may also potentially need to create new memory allocations for the full objects, if you were reading from a stream or database reader before that really only needed one object in RAM at a time. This is an especially big deal on systems where memory is the primary constraint, which is often the case with web servers, where you may be serving and maintaining session RAM for many, many sessions, but each session only occasionally uses any CPU time to request a new page.
Now I am making one assumption here: that you are working with something that is not already a list. What I mean by this is that the previous paragraphs talked about needing to convert an IOrderedEnumerable into a List, but not about converting a List into some form of IEnumerable. I need to admit that there is some small overhead in creating and operating the state machine that .NET uses to implement those objects. However, I think this is a good assumption. It turns out to be true far more often than we realize. Even in the samples for this question, we're paying this cost regardless, by the simple virtue of calling the OrderBy() function.
In summary, there can be some additional overhead in using a raw IEnumerable vs converting to a List, but there probably isn't. Additionally, you are almost certainly saving yourself some RAM by avoiding the conversions to List whenever possible... potentially a lot of RAM.
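One caveat worth adding: the deferred IOrderedEnumerable re-runs the sort each time it is enumerated, which is exactly when materializing with ToList() pays off. A small sketch (tours is assumed to be an existing IEnumerable of some element type with a Value property):

```csharp
var ordered = tours.OrderBy(x => x.Value); // deferred: nothing is sorted yet

foreach (var t in ordered) { /* sort happens here, on first MoveNext */ }
foreach (var t in ordered) { /* ...and runs again here */ }

var list = ordered.ToList();               // sort once, materialize
foreach (var t in list)    { /* no re-sort */ }
foreach (var t in list)    { /* no re-sort */ }
```

So if the sequence is only walked once, iterating the deferred result directly is fine; if it is walked repeatedly, materializing it once is the better trade.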
Yes and no.
Yes, the foreach statement will seem to work more slowly.
No, your program has the same total amount of work to do, so you will not be able to measure a difference from the outside.
What you need to focus on is not using a lazy operation (in this case OrderBy) multiple times without a .ToList or .ToArray. In this case you are only using it once (in the foreach), but it is an easy thing to miss.
Edit: Just to be clear. The as statement in the question will not work as intended but my answer assumes no .ToList() after OrderBy .
This line won't run:
List<Tour> list = tours.OrderBy(x => x.Value) as List<Tour>; // Returns null.
Instead, you want to store the results this way:
List<Tour> list = tours.OrderBy(x => x.Value).ToList();
And yes, the second option (storing the results) will enumerate much faster as it will skip the sorting operation.

Fastest way to get any element from a Dictionary

I'm implementing A* in C# (not for pathfinding) and I need Dictionary to hold open nodes, because I need fast insertion and fast lookup. I want to get the first open node from the Dictionary (it can be any random node). Using Dictionary.First() is very slow. If I use an iterator, MoveNext() is still using 15% of the whole CPU time of my program. What is the fastest way to get any random element from a Dictionary?
I suggest you use a specialized data structure for this purpose, as the regular Dictionary was not made for this.
In Java, I would probably recommend LinkedHashMap, for which there are custom C# equivalents (sadly not built-in).
It is, however, rather easy to implement this yourself in a reasonable fashion. You could, for instance, use a regular dictionary with tuples that point to the next element as well as the actual data. Or you could keep a secondary stack that simply stores all keys in order of addition. Just some ideas. I never implemented or profiled this myself, but I'm sure you'll find a good way.
Oh, and if you didn't already, you might also want to check the hash code distribution, to make sure there is no problem there.
Finding the first (or an index) element in a dictionary is actually O(n) because it has to iterate over every bucket until a non-empty one is found, so MoveNext will actually be the fastest way.
If this were a problem, I would consider using something like a stack, where pop is an O(1) operation.
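A sketch of that idea, combining the dictionary (for O(1) keyed lookup) with a stack of keys (for O(1) "give me any open node"). The Node type and int keys are placeholders for whatever the A* implementation actually uses:

```csharp
using System.Collections.Generic;

// Placeholder node type for illustration.
public class Node { }

public class OpenSet
{
    private readonly Dictionary<int, Node> openNodes = new Dictionary<int, Node>();
    private readonly Stack<int> openKeys = new Stack<int>();

    public void Add(int key, Node node)
    {
        openNodes[key] = node;
        openKeys.Push(key);
    }

    public bool Contains(int key) => openNodes.ContainsKey(key);

    // O(1) amortized: pop keys until one is still present in the dictionary.
    public Node PopAny()
    {
        while (openKeys.Count > 0)
        {
            int key = openKeys.Pop();
            // Remove(key, out value) needs .NET Core 2.0+;
            // on older frameworks use TryGetValue followed by Remove.
            if (openNodes.Remove(key, out Node node))
                return node;
        }
        return null;
    }
}
```

Stale keys (nodes removed through the dictionary) are simply skipped when popped, which keeps both operations cheap without a full linked structure.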
Try
Enumerable.ToList(dictionary.Values)[new Random().Next(dictionary.Count)]
Should have pretty good performance, but watch out for memory usage if your dictionary is huge. Obviously take care not to create the Random object every time, and you might be able to cache the return value of Enumerable.ToList if its members don't change too frequently.

Is there a performance impact when calling ToList()?

When using ToList(), is there a performance impact that needs to be considered?
I was writing a query to retrieve files from a directory, which is the query:
string[] imageArray = Directory.GetFiles(directory);
However, since I like to work with List<> instead, I decided to put in...
List<string> imageList = Directory.GetFiles(directory).ToList();
So, is there some sort of performance impact that should be considered when deciding to do a conversion like this - or only to be considered when dealing with a large number of files? Is this a negligible conversion?
IEnumerable<T>.ToList()
Yes, IEnumerable<T>.ToList() does have a performance impact; it is an O(n) operation, though it will likely only require attention in performance-critical operations.
The ToList() operation will use the List(IEnumerable<T> collection) constructor. This constructor must make a copy of the array (more generally, of the IEnumerable<T>); otherwise future modifications of the original array would also show up in the resulting List<T>, which generally wouldn't be desirable.
I would like to reiterate that this will only make a difference with a huge list; copying chunks of memory is quite a fast operation to perform.
Handy tip: As vs To
You'll notice in LINQ there are several methods that start with As (such as AsEnumerable()) and To (such as ToList()). The methods that start with To require a conversion like the above (i.e. they may impact performance), and the methods that start with As do not; they just require some cast or simple operation.
Additional details on List<T>
Here is a little more detail on how List<T> works in case you're interested :)
A List<T> also uses a construct called a dynamic array, which needs to be resized on demand; this resize event copies the contents of the old array to a new array. So it starts off small and increases in size as required.
This is the difference between the Capacity and Count properties on List<T>. Capacity refers to the size of the array behind the scenes; Count is the number of items in the List<T>, which is always <= Capacity. So when adding an item pushes Count past Capacity, the capacity of the List<T> is doubled and the array is copied.
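You can watch the Capacity growing yourself with a small loop (the exact growth sequence is an implementation detail, so treat the doubling pattern as typical rather than guaranteed):

```csharp
using System;
using System.Collections.Generic;

var list = new List<int>();
int lastCapacity = -1;
for (int i = 0; i < 100; i++)
{
    list.Add(i);
    if (list.Capacity != lastCapacity)
    {
        lastCapacity = list.Capacity;
        // Prints each time the backing array is reallocated;
        // typically capacities of 4, 8, 16, 32, 64, 128 appear.
        Console.WriteLine($"Count={list.Count}, Capacity={list.Capacity}");
    }
}
```

Passing an expected size to the constructor, e.g. new List<int>(100), skips all of these intermediate reallocations.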
Is there a performance impact when calling toList()?
Yes, of course. Theoretically, even i++ has a performance impact; it slows the program by maybe a few ticks.
What does .ToList do?
When you invoke .ToList, the code calls Enumerable.ToList(), which is an extension method that returns new List<TSource>(source). In the corresponding constructor, in the worst case, it goes through the item container and adds the items one by one into a new container. So its behavior has little effect on performance; it's unlikely to be the performance bottleneck of your application.
What's wrong with the code in the question
Directory.GetFiles goes through the folder and returns all the files' names immediately into memory; it carries the potential risk that the string[] costs a lot of memory, slowing everything down.
What should be done then
It depends. If you (as well as your business logic) guarantee that the number of files in the folder is always small, the code is acceptable. But it's still suggested to use the lazy version, Directory.EnumerateFiles, introduced in .NET 4. This is much more like a query, which will not be executed immediately; you can add more queries on top of it, like:
Directory.EnumerateFiles(myPath).Any(s => s.Contains("myfile"))
which will stop searching the path as soon as a file whose name contains "myfile" is found. This obviously has better performance than .GetFiles.
Is there a performance impact when calling toList()?
Yes there is. Using the extension method Enumerable.ToList() will construct a new List<T> object from the IEnumerable<T> source collection which of course has a performance impact.
However, understanding List<T> may help you determine if the performance impact is significant.
List<T> uses an array (T[]) to store the elements of the list. Arrays cannot be extended once they are allocated, so List<T> will use an over-sized array to store the elements of the list. When the List<T> grows beyond the size of the underlying array, a new array has to be allocated and the contents of the old array have to be copied to the new, larger array before the list can grow.
When a new List<T> is constructed from an IEnumerable<T> there are two cases:
The source collection implements ICollection<T>: Then ICollection<T>.Count is used to get the exact size of the source collection, and a matching backing array is allocated before all elements of the source collection are copied to the backing array using ICollection<T>.CopyTo(). This operation is quite efficient and will probably map to a CPU instruction for copying blocks of memory. However, in terms of performance, memory is required for the new array and CPU cycles are required for copying all the elements.
Otherwise the size of the source collection is unknown and the enumerator of IEnumerable<T> is used to add each source element one at a time to the new List<T>. Initially the backing array is empty and an array of size 4 is created. Then when this array is too small the size is doubled so the backing array grows like this 4, 8, 16, 32 etc. Every time the backing array grows it has to be reallocated and all elements stored so far have to be copied. This operation is much more costly compared to the first case where an array of the correct size can be created right away.
Also, if your source collection contains say 33 elements the list will end up using an array of 64 elements wasting some memory.
In your case the source collection is an array which implements ICollection<T> so the performance impact is not something you should be concerned about unless your source array is very large. Calling ToList() will simply copy the source array and wrap it in a List<T> object. Even the performance of the second case is not something to worry about for small collections.
It will be as (in)efficient as doing:
var list = new List<T>(items);
If you disassemble the source code of the constructor that takes an IEnumerable<T>, you will see it does a few things:
Check whether collection implements ICollection<T>. If it does, its Count property is read, which is O(1) for arrays, lists, etc.
If collection implements ICollection<T>, it saves the items into an internal array using the ICollection<T>.CopyTo method. This is O(n), with n being the length of the collection.
If collection does not implement ICollection<T>, it iterates through the items of the collection and adds them to an internal list, growing it as needed.
So, yes, it will consume more memory, since it has to create a new list, and in the worst case, it will be O(n), since it will iterate through the collection to make a copy of each element.
"is there a performance impact that needs to be considered?"
The issue with your precise scenario is that first and foremost your real concern about performance would be from the hard-drive speed and efficiency of the drive's cache.
From that perspective, the impact is surely negligible to the point that NO it need not be considered.
BUT ONLY if you really need the features of the List<> structure to possibly either make you more productive, or your algorithm more friendly, or some other advantage. Otherwise, you're just purposely adding an insignificant performance hit, for no reason at all. In which case, naturally, you shouldn’t do it! :)
ToList() creates a new List and puts the elements in it, which means there is an associated cost to calling ToList(). In the case of a small collection the cost won't be very noticeable, but a huge collection can cause a performance hit when using ToList.
Generally you should not use ToList() unless the work you are doing cannot be done without converting the collection to a List. For example, if you just want to iterate through the collection, you don't need to call ToList.
If you are performing queries against a data source, for example a database using LINQ to SQL, then the cost of calling ToList is much greater, because instead of deferred execution, i.e. loading items when needed (which can be beneficial in many scenarios), ToList instantly loads the items from the database into memory.
Considering the performance of retrieving a file list, ToList() is negligible. But not necessarily in other scenarios. It really depends on where you are using it.
When calling on an array, list, or other collection, you create a copy of the collection as a List<T>. The performance here depends on the size of the list. You should do it when really necessary.
In your example, you call it on an array. It iterates over the array and adds the items one by one to a newly created list. So the performance impact depends on the number of files.
When calling on an IEnumerable<T>, you materialize the IEnumerable<T> (usually a query).
ToList() will create a new list and copy the elements from the original source into it, so the only work is copying the elements, and the cost depends on the source size.
Let's look at another example:
If you are working with databases, run the ToList() method and check SQL Profiler for this code:
var IsExist = (from inc in entities.be_Settings
               where inc.SettingName == "Number"
               select inc).ToList().Count > 0;
The auto-generated query will look like this:
SELECT [Extent1].[SettingName] AS [SettingName], [Extent1].[SettingValue] AS [SettingValue] FROM [dbo].[be_Settings] AS [Extent1] WHERE N'Number' = [Extent1].[SettingName]
The SELECT query is run by the ToList method, the results of the query are stored in memory, and whether a record exists is checked by looking at the number of elements in the List. For example, if there are 1000 records in your table matching the criteria, all 1000 records are first brought from the database and converted into objects, then thrown into a List, and you only check the number of elements of this List. So this is a very inefficient approach.
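A more efficient way to express that existence check is to let the provider translate it, for example with Any(), which query providers such as LINQ to Entities turn into an EXISTS-style query instead of pulling every matching row into memory (entities.be_Settings is the context set from the example above):

```csharp
// Translated to SQL roughly as SELECT ... WHERE EXISTS(...),
// so no rows are materialized into a List first.
bool isExist = entities.be_Settings
                       .Any(inc => inc.SettingName == "Number");
```

The database answers true/false directly, so the cost no longer scales with the number of matching rows.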
It's not exactly about List performance, but if you have a high-dimensional array you can use a HashSet instead of a List.

What is the fastest way of changing Dictionary<K,V>?

This is an algorithmic question.
I have got Dictionary<object,Queue<object>>. Each queue contains one or more elements in it. I want to remove all queues with only one element from the dictionary. What is the fastest way to do it?
Pseudo-code: foreach(item in dict) if(item.Value.Count == 1) dict.Remove(item.Key);
It is easy to do it in a loop (not foreach, of course, since you can't remove items from a dictionary while enumerating it), but I'd like to know which approach is the fastest one here.
Why I want it: I use that dictionary to find duplicate elements in a large set of objects. The Key in dictionary is kind of a hash of the object, the Value is a queue of all objects found with the same hash. Since I want only duplicates, I need to remove all items with just a single object in associated queue.
Update:
It may be important to know that in the regular case there are just a few duplicates in a large set of objects; let's assume 1% or less. So it could possibly be faster to leave the Dictionary as is and create a new one from scratch with just the selected elements from the first one, and then delete the first Dictionary completely. I think it depends on the computational complexity of the Dictionary class's methods used in the particular algorithms.
I really want to see this problem on a theoretical level because as a teacher I want to discuss it with students. I didn't provide any concrete solution myself because I think it is really easy to do it. The question is which approach is the best, the fastest.
var itemsWithOneEntry = dict.Where(x => x.Value.Count == 1)
                            .Select(x => x.Key)
                            .ToList();
foreach (var item in itemsWithOneEntry) {
    dict.Remove(item);
}
Instead of trying to optimize the traversal of the collection, how about optimizing the content of the collection so that it only includes the duplicates? This would require changing your collection algorithm to something like this:
var duplicates = new Dictionary<object, Queue<object>>();
var possibleDuplicates = new Dictionary<object, object>();
foreach (var item in original) {
    if (possibleDuplicates.ContainsKey(item)) {
        duplicates.Add(item, new Queue<object>(new[] { possibleDuplicates[item], item }));
        possibleDuplicates.Remove(item);
    } else if (duplicates.ContainsKey(item)) {
        duplicates[item].Enqueue(item);
    } else {
        possibleDuplicates.Add(item, item);
    }
}
Note that you should probably measure the impact of this on the performance in a realistic scenario before you bother to make your code any more complex than it really needs to be. Most imagined performance problems are not in fact the real cause of slow code.
But supposing you do find that you could get a speed advantage by avoiding a linear search for queues of length 1, you could solve this problem with a technique called indexing.
As well as your dictionary containing all the queues, you maintain an index container (probably another dictionary) that only contains the queues of length 1, so when you need them they are already available separately.
To do this, you need to enhance all the operations that modify the length of the queue, so that they have the side-effect of updating the index container.
One way to do it is to define a class ObservableQueue. This would be a thin wrapper around Queue except it also has a ContentsChanged event that fires when the number of items in the queue changes. Use ObservableQueue everywhere instead of the plain Queue.
Then when you create a new queue, enlist on its ContentsChanged event a handler that checks to see if the queue only has one item. Based on this you can either insert or remove it from the index container.
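A minimal sketch of such an ObservableQueue (the name, the event shape, and the wrapped operations are my own choices, not a framework type):

```csharp
using System;
using System.Collections.Generic;

public class ObservableQueue<T>
{
    private readonly Queue<T> inner = new Queue<T>();

    // Fired after every Enqueue/Dequeue; the argument is the new count.
    public event Action<int> ContentsChanged;

    public int Count => inner.Count;

    public void Enqueue(T item)
    {
        inner.Enqueue(item);
        ContentsChanged?.Invoke(inner.Count);
    }

    public T Dequeue()
    {
        T item = inner.Dequeue();
        ContentsChanged?.Invoke(inner.Count);
        return item;
    }
}
```

A handler subscribed at creation time can then add the queue to the length-1 index container when the reported count becomes 1, and remove it when the count changes to anything else.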