Say I have a rolling collection of values where I specify the size of the collection and any time a new value is added, any old values beyond this specified size are dropped off. Obviously (and I've tested this) the best type of collection to use for this behavior is a Queue:
myQueue.Enqueue(newValue)
If myQueue.Count > specifiedSize Then myQueue.Dequeue()
However, what if I want to calculate the difference between the first and last items in the Queue? Obviously I can't access the items by index. But to switch from a Queue to something implementing IList seems like overkill, as does writing a new Queue-like class. Right now I've got:
Dim firstValue As Integer = myQueue.Peek()
Dim lastValue As Integer = myQueue.ToArray()(myQueue.Count - 1)
Dim diff As Integer = lastValue - firstValue
That call to ToArray() bothers me, but a superior alternative isn't coming to me. Any suggestions?
One thing you could do is keep a temporary variable that stores the value that was just enqueued; since that is always the last value, you can read it from the variable directly.
Seems to me if you need quick access to both the first and last items, then you're using the wrong data structure. Switch to a LinkedList instead, which conveniently has First and Last properties.
Be sure you only add and remove items from the linked list using AddLast and RemoveFirst to maintain the queue property. To prevent yourself from inadvertently violating that property, consider creating a wrapper class around the linked list and exposing only the operations you need from your queue.
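A minimal sketch of such a wrapper, assuming a fixed maximum size (the class and member names here are illustrative, not from the original question):

```csharp
// Hypothetical wrapper: exposes only queue-style operations over a
// LinkedList<T>, so the FIFO property can't be violated accidentally,
// while First and Last stay O(1).
public class RollingQueue<T>
{
    private readonly LinkedList<T> _list = new LinkedList<T>();
    private readonly int _maxSize;

    public RollingQueue(int maxSize) { _maxSize = maxSize; }

    public void Enqueue(T item)
    {
        _list.AddLast(item);
        if (_list.Count > _maxSize)
            _list.RemoveFirst(); // drop the oldest value
    }

    public T First { get { return _list.First.Value; } }
    public T Last { get { return _list.Last.Value; } }
    public int Count { get { return _list.Count; } }
}
```

With something like this in place, the original difference calculation becomes `myQueue.Last - myQueue.First`, with no `ToArray()` call.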
public class LastQ<T> : Queue<T>
{
    public T Last { get; private set; }

    public new void Enqueue(T item)
    {
        Last = item;
        base.Enqueue(item);
    }
}
Edit:
Obviously this basic class should be more robust to do things like protect the Last property on an empty queue. But this should be enough for the basic idea.
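A usage sketch of the class above (the variable names are illustrative):

```csharp
var q = new LastQ<int>();
q.Enqueue(3);
q.Enqueue(7);
// Peek() still sees the front of the queue; Last was captured on Enqueue.
int diff = q.Last - q.Peek(); // 7 - 3 = 4
```

One caveat worth knowing: because `Enqueue` here is declared with `new` rather than `override` (Queue&lt;T&gt;.Enqueue is not virtual), calling it through a `Queue<T>`-typed reference bypasses the shadowing method, and `Last` will not be updated.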
You could use a deque (double-ended queue).
I don't think there is one built into System.Collections(.Generic) but here's some info on the data structure. If you implemented something like this you could just use PeekLeft() and PeekRight() to get the first and last values.
Of course, it will be up to you whether or not implementing your own deque is preferable to dealing with the unsexiness of ToArray(). :)
http://www.codeproject.com/KB/recipes/deque.aspx
Your best bet would be to keep track of the last value added to the Queue, then use the myQueue.Peek() function to see the "first" (meaning next) item in the list without removing it.
Related
I encountered a problem with SortedList where two methods give two different results.
// ItemData is one of the values in this SortedList
var itemPos = Items.IndexOfValue(ItemData);
Item item;
Items.TryGetValue(itemPos, out item);
The result is not that obvious. I will use numbers rather than abstract letters to better illustrate what is happening.
itemPos is set to 5. OK! Next we try to get the item back from that index, but no: it returns null. Of course this does not happen immediately; this code is called before it does.
public void MoveItem(int indexFrom, int indexWhere)
{
    Item itemToSwap;
    Items.TryGetValue(indexFrom, out itemToSwap);
    Items.Remove(indexFrom);
    Items.Add(indexWhere, itemToSwap);
}
To move items in a SortedList we have to remove the item and add it again. OK! Debugging says the operation went wonderfully: my item now has index 5, having been moved from index 4, and index 5 was empty before the MoveItem call.
Or was it empty? Before that operation index 5 was filled with something, and I simply called Items.Remove(5);
But now what I described above happens.
Worth noting that this only happens when I move an item upwards in index, e.g. from 4 to 5. When I move from 5 to 4, everything works correctly.
Do you have any idea what is going on over here? I'm using .NET 3.5
Full code
using System;
using System.Collections.Generic;

class Program
{
    static SortedList<int, ItemData> Items = new SortedList<int, ItemData>();

    static void Main(string[] args)
    {
        var Foo = new ItemData();
        Items.Add(0, Foo);
        Items.Add(1, new ItemData());
        Items.Remove(1);
        MoveItem(0, 1);
        var itemPos = Items.IndexOfValue(Foo);
        Console.WriteLine(itemPos);
        // Console should print 1, I think
        ItemData item;
        Items.TryGetValue(itemPos, out item);
    }

    public static void MoveItem(int indexFrom, int indexWhere)
    {
        ItemData itemToSwap;
        Items.TryGetValue(indexFrom, out itemToSwap);
        Items.Remove(indexFrom);
        Items.Add(indexWhere, itemToSwap);
    }

    class ItemData
    {
    }
}
EDIT: This is a bit confusing, but: the indexer (look at its name :P) takes a KEY as its argument, not an INDEX. This is what confused me, and I mixed everything up. So, as Christoph said, it's better either to use your own list that behaves the way you want, or to read the documentation carefully.
From what I can tell, you are understanding the concept of a SortedList incorrectly. SortedLists are used when you have objects associated with keys, and the keys are sortable, and their order is relevant for performance, a certain algorithm or so. For example, think of a marathon, you could store Runner objects based on their finishing time in a SortedList. In any case, keep in mind the key is a sortable value, and each key is associated with an arbitrary value object.
Now, some problems that I observe in your code:
In the first code box, line 2, you find the index of a value object. That defeats the purpose of using a SortedList, because that operation is slow (a linear search over the values), whereas looking up a value by key is fast (a binary search over the sorted keys).
In the first code box, line 4, you call TryGetValue. Look up the definition, the first parameter is a key, not an index within the SortedList. So this example is wrong from a semantic point of view.
Regarding moving items around in a SortedList (code box 2), that will always require removing the value object using the original key and then adding the value object with a different key (typically larger or smaller). But then again, I don't see why you would want to move items around in a SortedList anyway. The whole point is that you can simply add value objects that are associated with a sortable key, and the SortedList sorts all those objects automatically for you.
I have a feeling that you might want to consider a regular List object, or even just an array if size is fixed or limited. Then you get all your index semantics and can swap items around if that is what your algorithm really wants to do.
Edit: I just saw the complete code. My general recommendations above stand. The issue in your full example is that you are confusing keys with indexes. After MoveItem(0,1), the Foo object is registered with key 1, but since there is just one entry in the SortedList, it is at index 0, which you get with IndexOfValue (the slow operation). Then when you do a TryGetValue, you really look up an entry with key 0, which doesn't exist. You incorrectly assumed TryGetValue would take an index as parameter.
You're confused between the key of the entry and the index of the entry. Your MoveItem method just changes the key associated with a value (by removing the old entry and creating a new entry). After these lines:
Items.Add(0, Foo);
Items.Add(1, new ItemData());
Items.Remove(1);
... there will only be a single entry in the collection, and MoveItem will remove/add so that doesn't change the count. Therefore IndexOfValue can only return 0 (or -1 if it's not found).
To get 1, you'd want to find the key associated with the value, not the index. For example:
int index = Items.IndexOfValue(Foo);
int key = Items.Keys[index];
Console.WriteLine("Key = {0}", key); // Prints 1
Note that TryGetValue takes the key, not the index - so this line:
Items.TryGetValue(itemPos, out item);
... would be a very odd one.
All of this is easier to see if you use a different key type, e.g. a string. Then you can't get confused between keys and indexes, because the types are different and the compiler won't let you use one where you meant the other.
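For instance, a sketch with string keys makes the distinction hard to miss:

```csharp
var ages = new SortedList<string, int>();
ages.Add("zoe", 30);
ages.Add("adam", 25);

// Keys are kept sorted, so "adam" sits at index 0 even though it was added second.
int index = ages.IndexOfKey("zoe");   // 1 (position in sorted order)
int byKey = ages["zoe"];              // 30 (lookup by key)
int byIndex = ages.Values[index];     // 30 (lookup by index)
```

Here an int index and a string key can never be accidentally swapped, because the compiler rejects the mix-up.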
This is an algorithmic question.
I have got a Dictionary&lt;object, Queue&lt;object&gt;&gt;. Each queue contains one or more elements. I want to remove from the dictionary all queues with only one element. What is the fastest way to do it?
Pseudo-code: foreach(item in dict) if(item.Length==1) dict.Remove(item);
It is easy to do it in a loop (not foreach, of course), but I'd like to know which approach is the fastest one here.
Why I want it: I use that dictionary to find duplicate elements in a large set of objects. The Key in dictionary is kind of a hash of the object, the Value is a queue of all objects found with the same hash. Since I want only duplicates, I need to remove all items with just a single object in associated queue.
Update:
It may be important to know that in a typical case there are just a few duplicates in a large set of objects. Let's assume 1% or less. So it could possibly be faster to leave the Dictionary as is and create a new one from scratch with just the selected elements from the first one, and then delete the first Dictionary completely. I think it depends on the complexity of the Dictionary class's methods used in the particular algorithms.
I really want to see this problem on a theoretical level because as a teacher I want to discuss it with students. I didn't provide any concrete solution myself because I think it is really easy to do it. The question is which approach is the best, the fastest.
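The rebuild-from-scratch idea from the update could be sketched like this (assumes LINQ is available; `dict` stands in for the Dictionary&lt;object, Queue&lt;object&gt;&gt; variable, which the question doesn't name):

```csharp
// Build a fresh dictionary containing only the genuine duplicates,
// then drop the reference to the old one so it can be collected.
var duplicatesOnly = dict
    .Where(pair => pair.Value.Count > 1)
    .ToDictionary(pair => pair.Key, pair => pair.Value);
dict = duplicatesOnly;
```

With ~1% duplicates this copies only a small fraction of the entries, at the cost of re-hashing those few keys.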
var itemsWithOneEntry = dict.Where(x => x.Value.Count == 1)
                            .Select(x => x.Key)
                            .ToList();

foreach (var item in itemsWithOneEntry) {
    dict.Remove(item);
}
Instead of trying to optimize the traversal of the collection, how about optimizing the content of the collection so that it only ever includes the duplicates? That would require changing your collection-building algorithm to something like this:
var duplicates = new Dictionary<object, Queue<object>>();
var possibleDuplicates = new Dictionary<object, object>();
foreach (var item in original) {
    if (possibleDuplicates.ContainsKey(item)) {
        // second sighting: promote to the duplicates dictionary
        duplicates.Add(item, new Queue<object>(new[] { possibleDuplicates[item], item }));
        possibleDuplicates.Remove(item);
    } else if (duplicates.ContainsKey(item)) {
        duplicates[item].Enqueue(item);
    } else {
        possibleDuplicates.Add(item, item);
    }
}
Note that you should probably measure the impact of this on the performance in a realistic scenario before you bother to make your code any more complex than it really needs to be. Most imagined performance problems are not in fact the real cause of slow code.
But supposing you do find that you could get a speed advantage by avoiding a linear search for queues of length 1, you could solve this problem with a technique called indexing.
As well as your dictionary containing all the queues, you maintain an index container (probably another dictionary) that only contains the queues of length 1, so when you need them they are already available separately.
To do this, you need to enhance all the operations that modify the length of the queue, so that they have the side-effect of updating the index container.
One way to do it is to define a class ObservableQueue. This would be a thin wrapper around Queue except it also has a ContentsChanged event that fires when the number of items in the queue changes. Use ObservableQueue everywhere instead of the plain Queue.
Then when you create a new queue, enlist on its ContentsChanged event a handler that checks to see if the queue only has one item. Based on this you can either insert or remove it from the index container.
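A minimal sketch of that idea (the class and event names are illustrative):

```csharp
// Thin wrapper around Queue<T> that reports when its item count changes,
// so an external index of single-item queues can be kept up to date.
public class ObservableQueue<T>
{
    private readonly Queue<T> _inner = new Queue<T>();

    public event Action<ObservableQueue<T>> ContentsChanged;

    public int Count { get { return _inner.Count; } }

    public void Enqueue(T item)
    {
        _inner.Enqueue(item);
        RaiseContentsChanged();
    }

    public T Dequeue()
    {
        T item = _inner.Dequeue();
        RaiseContentsChanged();
        return item;
    }

    private void RaiseContentsChanged()
    {
        var handler = ContentsChanged;
        if (handler != null) handler(this);
    }
}
```

The handler would then add the queue to the index container when `Count == 1` and remove it from the index otherwise.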
I have a third party api, which has a class that returns an enumerator for different items in the class.
I need to remove an item from that enumerator, so I cannot use "for each". The only option I can think of is to get the count by iterating over the enumeration and then run a normal for loop to remove the items.
Anyone know of a way to avoid the two loops?
Thanks
[update] Sorry for the confusion, but Andrey below in the comments is right.
Here is some pseudo-code out of my head that won't work, and for which I am looking for a solution that won't involve two loops, but I guess it's not possible:
for each (myProperty in MyProperty)
{
if (checking some criteria here)
MyProperty.Remove(myProperty)
}
MyProperty is the third party class that implements the enumerator and the remove method.
A common pattern is to do something like this:
List<Item> forDeletion = new List<Item>();
foreach (Item i in somelist)
    if (condition for deletion) forDeletion.Add(i);
foreach (Item i in forDeletion)
    somelist.Remove(i); // or however you delete items
Loop through it once and create a second array which contains the items which should not be deleted.
If you know it's a collection, you can go with a reversed for loop:
for (int i = items.Count - 1; i >= 0; i--)
{
    items.RemoveAt(i);
}
Otherwise, you'll have to do two loops.
You can create something like this:
public IEnumerable<Item> GetMyList()
{
    foreach (var x in thirdParty)
    {
        if (x == ignore)
            continue;
        yield return x;
    }
}
I need to remove an item in that enumerator
As long as this is a single item that's not a problem. The rule is that you cannot continue to iterate after modifying the collection. Thus:
foreach (var item in collection) {
    if (item.Equals(toRemove)) {
        collection.Remove(toRemove);
        break; // <== stop iterating!!
    }
}
It is not possible to remove an item from an enumerator. What you can do is copy, or filter (or both), the content of the whole enumeration sequence.
You can achieve this by using LINQ and doing something like this:
YourEnumerationReturningFunction().Where(item => yourRemovalCriteria);
Can you elaborate on the API and the API calls you are using?
If you receive an IEnumerator&lt;T&gt; or IEnumerable&lt;T&gt;, you cannot remove any item from the sequence behind the enumerator, because there is no method to do so. And you should of course not rely on down-casting a received object, because the implementation may change. (Actually, a well-designed API should not expose mutable objects holding internal state at all.)
If you receive IList<T> or something similar you can just use a normal for loop from back to front and remove the items as needed because there is no iterator which state could be corrupted. (Here the rule about exposing mutable state should apply again - modifying the returned collection should not change any state.)
Enumerable.Count() will decide at run-time what it needs to do: if the sequence turns out to be a collection, it reads its Count property; otherwise it enumerates the whole sequence and counts.
I like SJoerd's suggestion but I worry about how many items we may be talking about.
Why not something like ..
// you don't want 2 and 3
IEnumerable<int> fromAPI = Enumerable.Range(0, 10);
IEnumerable<int> result = fromAPI.Except(new[] { 2, 3 });
A clean, readable way to do this is as follows (I'm guessing at the third-party container's API here since you haven't specified it.)
foreach (var delItem in ThirdPartyContainer.Items
                            .Where(item => ShouldIDeleteThis(item))
                            // or: .Where(ShouldIDeleteThis)
                            .ToArray()) {
    ThirdPartyContainer.Remove(delItem);
}
The call to .ToArray() ensures that all items to be deleted have been greedily cached before the foreach iteration begins.
Behind the scenes this involves an array and an extra iteration over that, but that's generally very cheap, and the advantage of this method over the other answers to this question is that it works on plain enumerables and does not involve tricky mutable state issues that are hard to read and easy to get wrong.
By contrast, iterating in reverse, while not rocket science, is much more prone to off-by-one errors and harder to read; and it also relies on internals of the collection such as not changing order in between deletions (e.g. better not be a binary heap, say). Manually adding items that should be deleted to a temporary list is just unnecessary code - that's what .ToArray() will do just fine :-).
An enumerator always has a private field pointing to the real collection.
You can get at it via reflection and modify it.
Have fun.
I have a dictionary with around 1 million items. I am constantly looping through the dictionary:
public void DoAllJobs()
{
    foreach (KeyValuePair<uint, BusinessObject> p in _dictionnary)
    {
        if (p.Value.MustDoJob)
            p.Value.DoJob();
    }
}
The execution is a bit long, around 600 ms, and I would like to decrease it. Here are the constraints:
1. MustDoJob values mostly stay the same between two calls to DoAllJobs()
2. 60-70% of the MustDoJob values == false
3. From time to time, MustDoJob changes for 200,000 pairs.
4. Some p.Value.DoJob() calls cannot be computed at the same time (COM object call)
5. Here I do not need the key part of the _dictionnary object, but I really do need it somewhere else
I wanted to do the following:
Parallelize, but I am not sure it is going to be effective due to 4.
Sort the dictionary, given 1. and 2. (and stop when I find the first MustDoJob == false), but I am wondering what 3. would do to that.
I did not implement either of the previous ideas, since it could be a lot of work, and I would like to investigate other options first. So... any ideas?
What I would suggest is that your business object could raise an event to indicate that it needs to do a job when MustDoJob becomes true. You can subscribe to that event, store references to those objects in a simple list, and then process the contents of that list when the DoAllJobs() method is called.
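A rough sketch of that event-based approach (the member names are illustrative; the question doesn't show the real BusinessObject):

```csharp
public class BusinessObject
{
    private bool _mustDoJob;

    // Raised when MustDoJob flips from false to true.
    public event Action<BusinessObject> JobRequired;

    public bool MustDoJob
    {
        get { return _mustDoJob; }
        set
        {
            if (value && !_mustDoJob && JobRequired != null)
                JobRequired(this);
            _mustDoJob = value;
        }
    }

    public void DoJob() { /* actual work here */ }
}

// Owner side: subscribe once per object, then DoAllJobs()
// walks only the pending list instead of the whole dictionary.
// var pending = new List<BusinessObject>();
// obj.JobRequired += o => pending.Add(o);
```

The key point is that the cost moves from the scan (every call) to the flag change (only when state actually flips).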
My first suggestion would be to use just the values from the dictionary:
foreach (BusinessObject value in _dictionnary.Values)
{
    if (value.MustDoJob)
    {
        value.DoJob();
    }
}
With LINQ this could be even easier:
foreach (BusinessObject value in _dictionnary.Values.Where(v => v.MustDoJob))
{
    value.DoJob();
}
That makes it clearer. However, it's not clear what else is actually causing you a problem. How quickly do you need to be able to iterate over the dictionary? I expect it's already pretty nippy... is anything actually wrong with this brute force approach? What's the impact of it taking 600ms to iterate over the collection? Is that 600ms when nothing needs to do any work?
One thing to note: you can't change the contents of the dictionary while you're iterating over it - whether in this thread or another. That means not adding, removing or replacing key/value pairs. It's okay for the contents of a BusinessObject to change, but the dictionary relationship between the key and the object can't change. If you want to minimise the time during which you can't modify the dictionary, you can take a copy of the list of references to objects which need work doing, and then iterate over that:
foreach (BusinessObject value in _dictionnary.Values
                                             .Where(v => v.MustDoJob)
                                             .ToList())
{
    value.DoJob();
}
Try using a profiler first. Constraint 4 makes me curious - 600 ms may not be that much if the COM object uses most of the time, and then it is either parallelize or live with it.
I would make sure first - with a profiler run - that you don't target the totally wrong issue here.
Having established that the loop really is the problem (see TomTom's answer), I would maintain a list of the items on which MustDoJob is true -- e.g., when MustDoJob is set, add the item to the list, and when you process and clear the flag, remove it from the list. (This might be done directly by the code manipulating the flag, or by raising an event when the flag changes; it depends on what you need.) Then you loop through the list (which will only hold the 30-40% of items whose flag is set), not the dictionary. The list might contain the object itself or just its key in the dictionary, although it will be more efficient if it holds the object itself, as you avoid the dictionary lookup. It does depend on how frequently you're queuing 200k of them, and how time-critical the queuing vs. the execution is.
But again: Step 1 is make sure you're solving the right problem.
The use of a dictionary to me implies that the intention is to find items by a key, rather than visit every item. On the other hand, 600ms for looping through a million items is respectable.
Perhaps alter your logic so that you can simply pick the relevant items satisfying the condition directly out of the dictionary.
Use a List of KeyValuePairs instead. This means you can iterate over it super-quickly by doing:
List<KeyValuePair<string, object>> list = ...;
int totalItems = list.Count;
for (int x = 0; x < totalItems; x++)
{
    // whatever you plan to do with them; you have access to both KEY and VALUE.
}
I know this post is old, but I was looking for a way to iterate over a dictionary without the increased overhead of the Enumerator being created (GC and all), or generally a faster way to iterate over it.
I have an app that has a ConcurrentQueue of items that have an ID property and a ConcurrentQueue of tasks for each item, the queue items look like:
class QueueItem {
    public int ID { get; set; }
    public ConcurrentQueue<WorkItem> workItemQueue { get; set; }
}
and the queue itself looks like:
ConcurrentQueue<QueueItem> itemQueue;
I have one thread doing a foreach over the itemQueue, deQueueing an item from each queue and doing work on it:
foreach (var queueItem in itemQueue) {
    WorkItem workItem;
    if (queueItem.workItemQueue.TryDequeue(out workItem))
        doWork(workItem);
    // else: no more workItems for this queueItem
}
I'm using ConcurrentQueues because I have a separate thread potentially adding queueItems to the itemQueue, and adding workItems to each workItemQueue.
My problem comes when I have no more workItems in a queueItem - I'd like to remove that queueItem from the itemQueue - something like...
if (queueItem.workItemQueue.TryDequeue(out workItem))
    doWork(workItem);
else
    itemQueue.TryRemove(queueItem);
...but I can't find a way to do that easily. The way I've come up with is to dequeue each QueueItem and then enqueue it again if there are still WorkItems in its workItemQueue:
for (int i = 0; i < itemQueue.Count; i++) {
    QueueItem queueItem;
    itemQueue.TryDequeue(out queueItem);
    WorkItem workItem;
    if (queueItem.workItemQueue.TryDequeue(out workItem)) {
        itemQueue.Enqueue(queueItem);
        doWork(workItem);
    }
    else
        break;
}
Is there a better way to accomplish what I want using the PFX ConcurrentQueue, or is this a reasonable way to do this, should I use a custom concurrent queue/list implementation or am I missing something?
In general, there is no efficient way to remove specific items from queues. They generally have O(1) enqueues and dequeues, but O(n) removes, which is what your implementation does.
One alternative structure is something called a LinkedHashMap. Have a look at the Java implementation if you are interested.
It is essentially a Hash table and a linked list, which allows O(1) queue, dequeue and remove.
This isn't implemented in .Net yet, but there are a few implementations floating around the web.
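For illustration, here is a rough sketch of such a structure in C# - a Dictionary for O(1) lookup by key plus a LinkedList for FIFO ordering. Note it is not thread-safe, unlike ConcurrentQueue, so it would need external locking in this scenario; all names here are made up:

```csharp
// Hash map + linked list: O(1) enqueue, dequeue, and removal by key.
public class LinkedHashQueue<TKey, TValue>
{
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    public int Count { get { return _order.Count; } }

    public void Enqueue(TKey key, TValue value)
    {
        var node = _order.AddLast(new KeyValuePair<TKey, TValue>(key, value));
        _map.Add(key, node);
    }

    public TValue Dequeue()
    {
        var node = _order.First;       // oldest entry
        _order.RemoveFirst();
        _map.Remove(node.Value.Key);
        return node.Value.Value;
    }

    public bool Remove(TKey key)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (!_map.TryGetValue(key, out node))
            return false;
        _order.Remove(node);           // O(1) when given the node itself
        _map.Remove(key);
        return true;
    }
}
```

The trick is that the dictionary stores the linked-list node, not just the value, so removal never has to search the list.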
Now, the question is, why is itemQueue a queue? From your code samples, you never enqueue or dequeue anything from it (except to navigate around the Remove problem). I have a suspicion that your problem could be simplified if a more suitable data structure is used. Could you give examples on what other pieces of code access itemQueue?
This may not work for everyone, but the following is the solution I came up with for removing an item from a concurrent queue; since this is the first Google result, I thought I would leave my solution behind.
What I did was temporarily replace the working queue with an empty one, convert the original to a list and remove the item(s), then create a new queue from the modified list and put it back.
In code (sorry, this is VB.NET rather than C#):
Dim found As Boolean = False
' Steal the queue for a second; wrap the rest in a Try-Finally block to make sure we give it back
Dim theCommandQueue = Interlocked.Exchange(_commandQueue, New ConcurrentQueue(Of Command))
Try
    Dim cmdList = theCommandQueue.ToList()
    ' Iterate over a snapshot so we can safely remove from the list
    For Each item In cmdList.ToArray()
        If item Is whateverYouAreLookingFor Then
            cmdList.Remove(item)
            found = True
        End If
    Next
    ' If we found the item(s) we were looking for, create a new queue from the modified list.
    If found Then
        theCommandQueue = New ConcurrentQueue(Of Command)(cmdList)
    End If
Finally
    ' Always put the queue back where we found it
    Interlocked.Exchange(_commandQueue, theCommandQueue)
End Try
Aside: This is my first answer, so feel free to put up some editing advice and/or edit my answer.
Queues are meant for when you want to handle items in a FIFO style, stacks for LIFO. There are also ConcurrentDictionary and ConcurrentBag. Make sure that a queue is actually what you want. I don't think I would ever do a foreach on a ConcurrentQueue.
What you likely want is a single queue for your work items (have them use a common interface and make a queue of the interface; the interface should expose the inherited type, to which it can be recast later if needed). If the work items belong to a parent, then a property can hold a key to the parent (consider a GUID for the key), and the parent can be kept in a ConcurrentDictionary and referenced/removed as needed.
If you must do it the way you have it, consider adding a flag. You can then mark the item in the itemQueue as 'closed' or whatever, so that when it is dequeued, it will be ignored.
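The flag idea could be sketched like this (the `Closed` field is hypothetical, added to the QueueItem class from the question):

```csharp
class QueueItem
{
    public int ID { get; set; }
    public ConcurrentQueue<WorkItem> workItemQueue { get; set; }
    public volatile bool Closed; // hypothetical "logically removed" marker
}

// Consumer loop: skip closed items instead of physically removing them.
foreach (var queueItem in itemQueue)
{
    if (queueItem.Closed)
        continue;
    WorkItem workItem;
    if (queueItem.workItemQueue.TryDequeue(out workItem))
        doWork(workItem);
    else
        queueItem.Closed = true; // no work left; ignore from now on
}
```

The itemQueue still grows with dead entries, so an occasional rebuild (e.g. when the closed count passes a threshold) would keep memory bounded.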