I would like to get the same behavior as List.Insert(index, content).
With a List, this inserts the new element at the specified index and pushes the remaining elements forward.
But I am dealing with concurrency, so I can't use List anymore; I need to use a concurrent collection instead. Any idea how we can achieve this?
Note:
I am trying to achieve custom sorting of custom objects stored in the concurrent collection (i.e., if at index = 2 the LastName is alphabetically less than the incoming LastName, it must allow the incoming one to be placed at index = 2, while pushing/sliding the old value to the next indexes, thus retaining all existing contents along with the new one).
The ConcurrentBag<T> does not provide the functionality that you are looking for. It's not a list, it's a bag. You can't control the order of its contents, and you can't even remove a specific item from this collection. All that you can do is to Add or Take an item.
The rich functionality that you are looking for is not offered by any concurrent collection. Your best bet is probably to use a normal List<T> protected with a lock. Just make sure that you never touch the List<T> outside of a protected region. Whether you need to Add, or Insert, or Remove, or enumerate, or read an item, or read the Count, or anything else, you must always do it inside a lock region that is locked with the same object.
As a side note, it is quite likely that what you are trying to do is fundamentally wrong. There is a reason that the functionality you are asking for is not available: it's practically impossible to use it in a meaningful way without introducing race conditions. For example, two threads could independently figure out that they must insert a new item at index 5, based on the existing values in the list, and then both try to insert it at this index concurrently. Both will succeed, but one of the two items will end up at index 6 after being pushed by the other item, and the two items might not be in the correct order with respect to each other.
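A minimal sketch of the lock-protected List<T> approach, assuming the goal is to keep items ordered by last name. The Person and LockedSortedList types are hypothetical names made up for this illustration. The point it tries to show is that the search for the index and the insert happen inside the same lock region, which avoids the race described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical element type; only LastName matters for the ordering.
public sealed class Person
{
    public Person(string lastName) { LastName = lastName; }
    public string LastName { get; }
}

// A List<T> that is only ever touched while holding _gate.
public sealed class LockedSortedList
{
    private static readonly IComparer<Person> ByLastName =
        Comparer<Person>.Create((a, b) => string.CompareOrdinal(a.LastName, b.LastName));

    private readonly object _gate = new object();
    private readonly List<Person> _items = new List<Person>();

    // Finds the position and inserts in one critical section, so two threads
    // cannot both pick an index based on the same stale view of the list.
    public void InsertSorted(Person person)
    {
        lock (_gate)
        {
            int index = _items.BinarySearch(person, ByLastName);
            if (index < 0) index = ~index; // complement of a negative result is the insertion point
            _items.Insert(index, person);
        }
    }

    public int Count
    {
        get { lock (_gate) { return _items.Count; } }
    }

    // Returns a copy so callers can enumerate without holding the lock.
    public List<Person> Snapshot()
    {
        lock (_gate) { return new List<Person>(_items); }
    }
}
```

Reading by index from outside is deliberately left out here: an index obtained before another thread inserts may point at a different item afterwards, so a snapshot is usually the safer thing to hand out.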
Related
What would be the easiest way to track how long an element has been part of a list? For instance, I would like to pop an element from a list after it has been added for 2 minutes.
Would I have to create two lists, one holding the actual element and the other the time that element was added to the list? Then checking the "time" list in order to know when it has reached two minutes?
I have a feeling there's a much simpler and more efficient method to do this, but I cannot think of it at the moment...
If you want to have the minimum amount of code to write, you can have a look at the MemoryCache class, which implements an expiration policy.
Using the CacheItemPolicy you can even have a callback method executed when the item is removed after expiration.
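As a rough sketch of what that could look like, assuming the System.Runtime.Caching assembly (or NuGet package) is referenced and that each item can be given a string key. The ExpiringItems class and its method name are made up for this illustration.

```csharp
using System;
using System.Runtime.Caching;

public static class ExpiringItems
{
    // Adds an item that the cache evicts roughly two minutes later.
    public static void AddForTwoMinutes(string key, object item)
    {
        var policy = new CacheItemPolicy
        {
            AbsoluteExpiration = DateTimeOffset.UtcNow.AddMinutes(2),
            // Runs after the entry has been removed from the cache.
            RemovedCallback = args =>
                Console.WriteLine($"{args.CacheItem.Key} removed: {args.RemovedReason}")
        };
        MemoryCache.Default.Add(key, item, policy);
    }
}
```

Note that the cache is keyed rather than ordered, and the callback is not guaranteed to fire at the exact moment of expiration, so this fits best when approximate timing is acceptable.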
Rather than storing the elements in the lists directly, you could use a wrapper class which included the element and its storage time, then store instances of the wrapper class instead.
You would probably want to use a queue rather than a list; you will be removing items from the front a lot, which is far more efficient with a queue than with a list.
How often you check the queue is something you'd need to decide on. You could possibly use a separate thread to check every so often, in which case you'd probably want to use a ConcurrentQueue<T>.
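The wrapper-plus-queue idea could be sketched roughly as follows. Timestamped<T> and ExpiringQueue<T> are hypothetical names, and passing the current time in as a parameter is an assumption made here so the behavior is easy to test. This version is not thread-safe; with a separate checking thread you would need a lock or a ConcurrentQueue<T> as mentioned above.

```csharp
using System;
using System.Collections.Generic;

// Wrapper pairing an element with the time it was stored.
public sealed class Timestamped<T>
{
    public Timestamped(T item, DateTime addedUtc) { Item = item; AddedUtc = addedUtc; }
    public T Item { get; }
    public DateTime AddedUtc { get; }
}

public sealed class ExpiringQueue<T>
{
    private readonly Queue<Timestamped<T>> _queue = new Queue<Timestamped<T>>();
    private readonly TimeSpan _lifetime;

    public ExpiringQueue(TimeSpan lifetime) { _lifetime = lifetime; }

    public int Count => _queue.Count;

    public void Enqueue(T item, DateTime nowUtc)
    {
        _queue.Enqueue(new Timestamped<T>(item, nowUtc));
    }

    // Items are enqueued in time order, so the oldest is always at the front
    // and the loop can stop at the first item that has not expired yet.
    public List<T> DequeueExpired(DateTime nowUtc)
    {
        var expired = new List<T>();
        while (_queue.Count > 0 && nowUtc - _queue.Peek().AddedUtc >= _lifetime)
        {
            expired.Add(_queue.Dequeue().Item);
        }
        return expired;
    }
}
```

In real use you would pass DateTime.UtcNow to both methods and call DequeueExpired from whatever periodic check you settle on.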
This is an algorithmic question.
I have got Dictionary<object,Queue<object>>. Each queue contains one or more elements in it. I want to remove all queues with only one element from the dictionary. What is the fastest way to do it?
Pseudo-code: foreach(item in dict) if(item.Length==1) dict.Remove(item);
It is easy to do it in a loop (not foreach, of course), but I'd like to know which approach is the fastest one here.
Why I want it: I use that dictionary to find duplicate elements in a large set of objects. The Key in dictionary is kind of a hash of the object, the Value is a queue of all objects found with the same hash. Since I want only duplicates, I need to remove all items with just a single object in associated queue.
Update:
It may be important to know that in a regular case there are just a few duplicates in a large set of objects. Let's assume 1% or less. So possibly it could be faster to leave the Dictionary as is and create a new one from scratch with just the selected elements from the first one... and then delete the first Dictionary completely. I think it depends on the computational complexity of the Dictionary class's methods used in the particular algorithms.
I really want to see this problem on a theoretical level because as a teacher I want to discuss it with students. I didn't provide any concrete solution myself because I think it is really easy to do it. The question is which approach is the best, the fastest.
var itemsWithOneEntry = dict.Where(x => x.Value.Count == 1)
                            .Select(x => x.Key)
                            .ToList();
foreach (var item in itemsWithOneEntry) {
    dict.Remove(item);
}
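For comparison, the alternative raised in the question (building a new dictionary that contains only the duplicates and then discarding the original) might look like the sketch below. DuplicateFilter and KeepDuplicates are hypothetical names. Both approaches make one pass over the source, so which one is faster when duplicates are rare is something to measure rather than assume.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DuplicateFilter
{
    // Builds a new dictionary that keeps only the queues holding more than one item.
    public static Dictionary<object, Queue<object>> KeepDuplicates(
        Dictionary<object, Queue<object>> dict)
    {
        return dict.Where(pair => pair.Value.Count > 1)
                   .ToDictionary(pair => pair.Key, pair => pair.Value);
    }
}
```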
Instead of trying to optimize the traversal of the collection, how about optimizing the content of the collection so that it only includes the duplicates? This would require changing your collection algorithm to something like this:
var duplicates = new Dictionary<object, Queue<object>>();
var possibleDuplicates = new Dictionary<object, object>();
foreach (var item in original) {
    if (possibleDuplicates.ContainsKey(item)) {
        // Second occurrence: move the first one and this one into duplicates.
        duplicates.Add(item, new Queue<object>(new object[] { possibleDuplicates[item], item }));
        possibleDuplicates.Remove(item);
    } else if (duplicates.ContainsKey(item)) {
        duplicates[item].Enqueue(item);
    } else {
        possibleDuplicates.Add(item, item);
    }
}
Note that you should probably measure the impact of this on the performance in a realistic scenario before you bother to make your code any more complex than it really needs to be. Most imagined performance problems are not in fact the real cause of slow code.
But supposing you do find that you could get a speed advantage by avoiding a linear search for queues of length 1, you could solve this problem with a technique called indexing.
As well as your dictionary containing all the queues, you maintain an index container (probably another dictionary) that only contains the queues of length 1, so when you need them they are already available separately.
To do this, you need to enhance all the operations that modify the length of the queue, so that they have the side-effect of updating the index container.
One way to do it is to define a class ObservableQueue. This would be a thin wrapper around Queue except it also has a ContentsChanged event that fires when the number of items in the queue changes. Use ObservableQueue everywhere instead of the plain Queue.
Then when you create a new queue, enlist on its ContentsChanged event a handler that checks to see if the queue only has one item. Based on this you can either insert or remove it from the index container.
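A minimal sketch of that idea follows. Only Enqueue and Dequeue are wrapped, and SingleItemIndex is a hypothetical name for the index container; here it is a HashSet of queues rather than another dictionary, which is a simplification chosen for this example.

```csharp
using System;
using System.Collections.Generic;

// Thin wrapper around Queue<T> that reports every change in item count.
public sealed class ObservableQueue<T>
{
    private readonly Queue<T> _inner = new Queue<T>();

    public event EventHandler ContentsChanged;

    public int Count => _inner.Count;

    public void Enqueue(T item)
    {
        _inner.Enqueue(item);
        ContentsChanged?.Invoke(this, EventArgs.Empty);
    }

    public T Dequeue()
    {
        T item = _inner.Dequeue();
        ContentsChanged?.Invoke(this, EventArgs.Empty);
        return item;
    }
}

// Hypothetical index container: the queues that currently hold exactly one item.
public sealed class SingleItemIndex<T>
{
    private readonly HashSet<ObservableQueue<T>> _singles = new HashSet<ObservableQueue<T>>();

    public IEnumerable<ObservableQueue<T>> Singles => _singles;

    public bool Contains(ObservableQueue<T> queue) => _singles.Contains(queue);

    // Creates a queue whose handler keeps the index up to date.
    public ObservableQueue<T> CreateQueue()
    {
        var queue = new ObservableQueue<T>();
        queue.ContentsChanged += (sender, e) =>
        {
            if (queue.Count == 1) _singles.Add(queue);
            else _singles.Remove(queue);
        };
        return queue;
    }
}
```

The cost moves from the lookup to every Enqueue and Dequeue, which is the usual trade-off with an index.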
I am looking for a way to observe a collection for changes without using indices. I know when items are added to and removed from the source collection, but the collection does not use indices (it's a custom kind of hashset with notification on add/remove). Items are added to the list in a nondeterministic order, and using indices wouldn't make much sense, so I'm trying to avoid them completely. But I'm still going to bind this list to a view, so there will be some ordering of the items eventually. My goal is to have all the ordering in a collection view.
The question is whether there is a way to create a collection view on an index-less source collection and still get the UI to respond to items being removed and added efficiently, without having to rebuild the list every time. I'm not sure if I'm making any sense here. My goal is to get rid of indices but still benefit from CollectionChanged events and collection view ordering. Possible?
UPDATE
I've tried to implement a custom ICollectionView such as SetCollectionView(HashSet set) but it won't work for some reason. Not yet anyway.
Another option could perhaps be to implement a custom ReadOnlyObservableCollection wrapper with some custom ordering in GetEnumerator. I haven't tested it yet. I would have to sort the list according to the chosen ordering before extracting the index for the NotifyCollectionChanged event, but that should work.
You can use the ObservableHashSet class.
See the question here
How can I make an Observable Hashset in C#?
or go directly to the code here:
http://geoffcox.bellacode.com/2011/12/09/observablehashset/
You need to have an index somewhere, because all of the UI binding plumbing is index-based. You can layer an indexed list over your existing hashset, but it's going to be slow (I can't provide a formal proof, but my gut tells me it would be quite awful, something like O(n)). If you want a quick base collection that you can layer re-ordered UI lists on top of, you might want to look into a balanced sorted tree, rather than a hashset.
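To illustrate the "indexed list layered over the set" option, here is a rough sketch. SortedView<T> is a hypothetical adapter whose Add and Remove you would call from the source set's own add/remove notifications; it assumes an item's sort key does not change while the item is in the view. ObservableCollection<T> then raises the index-based CollectionChanged events that binding expects.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public sealed class SortedView<T>
{
    private readonly IComparer<T> _comparer;

    public SortedView(IComparer<T> comparer) { _comparer = comparer; }

    // Bind the UI to this collection; it stays sorted by the comparer.
    public ObservableCollection<T> Items { get; } = new ObservableCollection<T>();

    public void Add(T item)
    {
        Items.Insert(LowerBound(item), item);
    }

    public bool Remove(T item)
    {
        // Scan the run of items that compare equal to find the exact instance.
        for (int i = LowerBound(item);
             i < Items.Count && _comparer.Compare(Items[i], item) == 0;
             i++)
        {
            if (EqualityComparer<T>.Default.Equals(Items[i], item))
            {
                Items.RemoveAt(i);
                return true;
            }
        }
        return false;
    }

    // Binary search for the first position whose item is not less than 'item'.
    private int LowerBound(T item)
    {
        int lo = 0, hi = Items.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (_comparer.Compare(Items[mid], item) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }
}
```

Finding the index is logarithmic, but the insert or removal in the backing list still shifts elements, so each change remains O(n) in the worst case, in line with the estimate above.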
I know what sets are and the common operations on sets like union, intersection, difference, and subset. However, I don't understand in which situations set-based operations are desirable. Are there any real-world examples? What are the advantages of using a set vs. using a list or a hash? If I have two lists, I can find the union and intersection of those lists too. So why use sets?
Edit
I specifically want to know real-world situations where I should use a set instead of a list.
A Set guarantees that there is no duplicate object in it. A List doesn't, so you can have multiple entries of "equal" objects in a list. There are millions of things for which you can use a set and it will make your life much easier, for example a set of countries, a set of usernames, etc. If you use a list to store this data, you will need to check whether your list already contains the same element before adding a new one, unless the list is allowed to have duplicates.
In other words, a set may be considered a list without any duplicates. However, the interfaces of Set and List aren't really the same in Java. For example, you aren't able to get the element at a certain position in a set. This is because position is not important in a set (but it is for a list). Therefore, selecting which data collection to use depends entirely on the purpose.
I, myself, have found that Set is very useful in many cases and reduces the amount of checking for duplicates. One of my use cases is to use a set to find how many chemical elements are in a molecule. The molecule contains a list of atom objects and each atom is associated with an element symbol, so in order to find the types of element, I loop over all the atoms and add each element to an element set. All the duplicates are removed without hassle.
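The molecule use case can be sketched in a few lines. This example uses C# and HashSet<T> rather than Java, and it represents atoms simply as element symbol strings, which is an assumption made to keep it short; MoleculeExample is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class MoleculeExample
{
    // Returns the distinct element symbols found in a list of atoms.
    public static HashSet<string> DistinctElements(IEnumerable<string> atomSymbols)
    {
        // HashSet<T> ignores any symbol it already contains,
        // so no explicit duplicate check is needed.
        return new HashSet<string>(atomSymbols);
    }
}
```

For methane, passing the five atom symbols "C", "H", "H", "H", "H" gives a set with two entries, "C" and "H". The equivalent code with a list would need a Contains check before every Add.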
Among other things, sets typically guarantee access times of O(log N) or better. They also enforce only one entry with a given value (depending on the implementation, adding a duplicate is either rejected or silently ignored; Java's Set.add, for example, returns false).
Hashes typically offer O(1) access, but do not guarantee uniqueness of the stored values.
I have a dictionary with around 1 million items. I am constantly looping through the dictionary:
public void DoAllJobs()
{
    foreach (KeyValuePair<uint, BusinessObject> p in _dictionnary)
    {
        if (p.Value.MustDoJob)
            p.Value.DoJob();
    }
}
The execution is a bit long, around 600 ms, and I would like to decrease it. Here are the constraints:
1. MustDoJob values mostly stay the same between two calls to DoAllJobs()
2. 60-70% of the MustDoJob values == false
3. From time to time, MustDoJob changes for 200,000 pairs.
4. Some p.Value.DoJob() calls cannot be computed at the same time (COM object call)
5. Here, I do not need the key part of the _dictionnary object, but I really do need it somewhere else
I wanted to do the following:
1. Parallelize, but I am not sure it is going to be effective due to 4.
2. Sort the dictionary because of 1. and 2. (and stop when I find the first MustDoJob == false), but I am wondering what 3. would result in
I did not implement either of the previous ideas since it could be a lot of work, and I would like to investigate other options first. So... any ideas?
What I would suggest is that your business object raise an event when MustDoJob becomes true, to indicate that it needs to do a job. You can subscribe to that event, store references to those objects in a simple list, and then process the contents of that list when the DoAllJobs() method is called.
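A rough sketch of that suggestion is shown below. The shape of BusinessObject, the MustDoJobChanged event, the JobsDone counter, and the JobTracker class are all assumptions made for illustration. It uses a HashSet instead of a simple list so that removing an object whose flag goes back to false is cheap, and it is not thread-safe as written.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of BusinessObject; only the members used here are shown.
public class BusinessObject
{
    private bool _mustDoJob;

    public event EventHandler MustDoJobChanged;

    public int JobsDone { get; private set; }

    public bool MustDoJob
    {
        get { return _mustDoJob; }
        set
        {
            if (_mustDoJob == value) return;
            _mustDoJob = value;
            MustDoJobChanged?.Invoke(this, EventArgs.Empty);
        }
    }

    public void DoJob() { JobsDone++; }
}

public class JobTracker
{
    // Objects whose MustDoJob flag is currently true.
    private readonly HashSet<BusinessObject> _pending = new HashSet<BusinessObject>();

    public void Track(BusinessObject obj)
    {
        obj.MustDoJobChanged += (sender, e) =>
        {
            if (obj.MustDoJob) _pending.Add(obj);
            else _pending.Remove(obj);
        };
        if (obj.MustDoJob) _pending.Add(obj);
    }

    public void DoAllJobs()
    {
        // Iterate over a copy in case DoJob changes a flag and therefore the set.
        foreach (var obj in new List<BusinessObject>(_pending))
        {
            obj.DoJob();
        }
    }
}
```

With this in place DoAllJobs only visits the objects that currently need work, at the price of a little bookkeeping every time a flag changes.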
My first suggestion would be to use just the values from the dictionary:
foreach (BusinessObject value in _dictionnary.Values)
{
    if (value.MustDoJob)
    {
        value.DoJob();
    }
}
With LINQ this could be even easier:
foreach (BusinessObject value in _dictionnary.Values.Where(v => v.MustDoJob))
{
    value.DoJob();
}
That makes it clearer. However, it's not clear what else is actually causing you a problem. How quickly do you need to be able to iterate over the dictionary? I expect it's already pretty nippy... is anything actually wrong with this brute force approach? What's the impact of it taking 600ms to iterate over the collection? Is that 600ms when nothing needs to do any work?
One thing to note: you can't change the contents of the dictionary while you're iterating over it - whether in this thread or another. That means not adding, removing or replacing key/value pairs. It's okay for the contents of a BusinessObject to change, but the dictionary relationship between the key and the object can't change. If you want to minimise the time during which you can't modify the dictionary, you can take a copy of the list of references to objects which need work doing, and then iterate over that:
foreach (BusinessObject value in _dictionnary.Values
                                             .Where(v => v.MustDoJob)
                                             .ToList())
{
    value.DoJob();
}
Try using a profiler first. Constraint 4 makes me curious: 600 ms may not be that much if the COM object uses most of the time, and then it is either parallelize or live with it.
I would first make sure, with a profiler run, that you aren't targeting the totally wrong issue here.
Having established that the loop really is the problem (see TomTom's answer), I would maintain a list of the items on which MustDoJob is true -- e.g., when MustDoJob is set, add it to the list, and when you process and clear the flag, remove it from the list. (This might be done directly by the code manipulating the flag, or by raising an event when the flag changes; depends on what you need.) Then you loop through the list (which is only going to be 30-40% of the length, since 60-70% of the flags are false), not the dictionary. The list might contain the object itself or just its key in the dictionary, although it will be more efficient if it holds the object itself as you avoid the dictionary lookup. It does depend on how frequently you're queuing 200k of them, and how time-critical the queuing vs. the execution is.
But again: Step 1 is make sure you're solving the right problem.
The use of a dictionary to me implies that the intention is to find items by a key, rather than visit every item. On the other hand, 600ms for looping through a million items is respectable.
Perhaps alter your logic so that you can simply pick the relevant items satisfying the condition directly out of the dictionary.
Use a List of KeyValuePairs instead. This means you can iterate over it super-quickly by doing
List<KeyValuePair<string,object>> list = ...;
int totalItems = list.Count;
for (int x = 0; x < totalItems; x++)
{
    // whatever you plan to do with them, you have access to both KEY and VALUE.
}
I know this post is old, but I was looking for a way to iterate over a dictionary without the increased overhead of the Enumerator being created (GC and all), or generally a faster way to iterate over it.