I want to safely iterate (i.e. not get a "collection was changed during iteration" error) through an array that can be changed by another thread.
What's the best way I can do it?
What do you mean by "safely"? Is it acceptable to see a mixture of old and new values? If so, you can just iterate using foreach. (This is for two reasons: firstly, arrays don't have an internal "version number" like List<T> does, so the iterator can't detect that the array has changed. Secondly, using foreach on a type known to be an array in C# causes the compiler to use the length and indexer anyway, rather than using an iterator.)
If you want to get a snapshot, you basically need to take a copy of the array and iterate over that.
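For example, a minimal sketch of the copy-and-iterate approach (sharedArray is an assumed name for your shared array; without any extra locking, the copy itself may still capture a mixture of old and new values):
string[] snapshot = (string[])sharedArray.Clone();   // shallow copy of the current contents
foreach (var item in snapshot)
{
    // Work with the copy; writes made by the other thread after the copy aren't visible here.
}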
EDIT: I was assuming you wanted concurrency. (I'm not sure why I assumed that.) I agree with the other answers that locking is an alternative - but you need to lock around the whole iteration, e.g.
lock (someLock)
{
    foreach (var x in array)
    {
        // Stuff
    }
}
and likewise in the writing thread:
lock (someLock)
{
    array[0] = "foo";
}
(You could lock on the array itself, but I generally prefer to have private locks.)
Make a copy? I guess it depends on whether you want your iteration to be a 'snapshot in time' or if you want to see the changes 'live'. The latter can get pretty dicey.
Target .NET 4.0 and use one of the many thread-safe collections in System.Collections.Concurrent.
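For instance, here's a minimal sketch (not tied to any particular scenario) using ConcurrentQueue<T> from System.Collections.Concurrent, whose enumerator works over a moment-in-time snapshot and so won't throw if another thread modifies the queue:
using System.Collections.Concurrent;

var queue = new ConcurrentQueue<string>();
queue.Enqueue("foo");   // this could happen on another thread

foreach (var item in queue)
{
    // Safe to iterate even while other threads enqueue/dequeue.
}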
You probably want to use synchronized access. You can lock the array for the time you need to iterate it and the time you add/remove items.
You could use the lock statement (http://msdn.microsoft.com/en-us/library/c5kehkcz(VS.80).aspx).
For ultra safeness, you could lock all access to the array while you are iterating through it. It's the only way to ensure that the array you are iterating through is fully up to date.
Related
I've got a list full of structs, which I want to iterate through and alter concurrently.
The code is conceptually as follows:
Parallel.For(0, pointsList.Count(), i => pointsList[i] = DoThing(pointsList[i]));
I'm neither adding to nor removing from the list, only accessing and mutating its items.
I imagine this is fine, but thought I should check: is this OK as is, or do I need to use a lock somewhere for fear that I mess up the list object?
It is not guaranteed to be safe. Changing an element in the list increments the list's private _version field, which is how various threads can tell if the list has been modified. If any other thread attempts to enumerate the list or uses a method like List<T>.ForEach() then there could potentially be an issue. It sounds like it would be extremely rare, which isn't necessarily a good thing-- in fact, it makes for the most maddening type of defect.
If you want to be safe, create a new list instead of changing the existing one. This is a common "functional" approach to threading issues that avoids locks.
var newList = pointsList.AsParallel().Select(item => DoThing(item));
When you're done, you can always replace the old list if you want.
pointsList = newList.ToList();
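One caveat worth noting: AsParallel() does not preserve the source ordering by default, so if the positions of the points matter, add AsOrdered():
var newList = pointsList.AsParallel().AsOrdered().Select(item => DoThing(item));
pointsList = newList.ToList();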
I've read on here that iterating through a dictionary is generally considered abusing the data structure and that something else should be used.
However, I'm having trouble coming up with a better way to accomplish what I'm trying to do.
When a tag is scanned I use its ID as the key and the value is a list of zones it was seen in. About every second I check to see if a tag in my dictionary has been seen in two or more zones and if it has, queue it up for some calculations.
for (int i = 0; i < TagReads.Count; i++)
{
    var tag = TagReads.ElementAt(i).Value;
    if (tag.ZoneReads.Count > 1)
    {
        Report.Tags.Enqueue(tag);
        Boolean success = false;
        do
        {
            success = TagReads.TryRemove(tag.Epc, out TagInfo outTag);
        } while (!success);
    }
}
I feel like a dictionary is the correct choice here because there can be many tags to look up, but something about this code nags me as being poor.
As far as efficiency goes, the speed is fine for now in our small-scale test environment, but I don't have a good way to find out how it will work at massive scale until it is put to use, hence my concern.
I believe that there's an alternative approach which doesn't involve iterating a big dictionary.
First of all, you need to create a HashSet<T> of tags in which you'll store those tags that have been detected in two or more zones. We'll call it tagsDetectedInMoreThanTwoZones.
And you may refactor your code flow as follows:
A. Whenever you detect a tag in one zone...
Add the tag and the zone to the main dictionary.
Acquire an exclusive lock on tagsDetectedInMoreThanTwoZones to avoid undesired behavior in B.
Check if the key has more than one zone. If this is true, add it to tagsDetectedInMoreThanTwoZones.
Release the lock on tagsDetectedInMoreThanTwoZones.
B. Whenever you need to process a tag which has been detected in more than one zone...
Acquire an exclusive lock on tagsDetectedInMoreThanTwoZones to prevent more than one thread from trying to process them.
Iterate tagsDetectedInMoreThanTwoZones.
Use each tag in tagsDetectedInMoreThanTwoZones to get the zones in your current dictionary.
Clear tagsDetectedInMoreThanTwoZones.
Release the exclusive lock on tagsDetectedInMoreThanTwoZones.
Now you'll only iterate the tags that you already know have been detected in more than one zone!
In the long run, you can even make per-region partitions so you never get a tagsDetectedInMoreThanTwoZones set with too many items to iterate, and each set could be consumed by a dedicated thread!
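A rough sketch of that flow (the TagTracker class, the TagInfo constructor and the GetOrAdd/ZoneReads usage are assumptions for illustration, not your actual code; Report.Tags is taken from your snippet):
using System.Collections.Concurrent;
using System.Collections.Generic;

public class TagTracker
{
    private readonly ConcurrentDictionary<string, TagInfo> tagReads =
        new ConcurrentDictionary<string, TagInfo>();
    private readonly HashSet<TagInfo> tagsDetectedInMoreThanTwoZones = new HashSet<TagInfo>();
    private readonly object setLock = new object();

    // A. Called whenever a tag is detected in a zone.
    public void OnTagRead(string epc, int zone)
    {
        TagInfo tag = tagReads.GetOrAdd(epc, e => new TagInfo(e));
        tag.ZoneReads.Add(zone);

        if (tag.ZoneReads.Count > 1)
        {
            lock (setLock)
            {
                tagsDetectedInMoreThanTwoZones.Add(tag);
            }
        }
    }

    // B. Called periodically to process tags seen in more than one zone.
    public void ProcessMultiZoneTags()
    {
        lock (setLock)
        {
            foreach (TagInfo tag in tagsDetectedInMoreThanTwoZones)
            {
                Report.Tags.Enqueue(tag);
                tagReads.TryRemove(tag.Epc, out TagInfo removed);
            }
            tagsDetectedInMoreThanTwoZones.Clear();
        }
    }
}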
If you are going to do a lot of lookups in your code and only sometimes iterate through the whole thing, then I think the dictionary use is OK. I would like to point out, though, that your use of ElementAt is more alarming. ElementAt performs very poorly when used on objects that do not implement IList<T>, and the dictionary does not. For an IEnumerable<T> that does not implement IList<T>, the only way to find the nth element is through iteration, so your for loop will iterate the dictionary once for each element. You are better off with a standard foreach.
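For example (a sketch assuming TagReads is a ConcurrentDictionary<string, TagInfo>, which the TryRemove call suggests), the same logic with a foreach avoids the repeated O(n) ElementAt scans:
foreach (KeyValuePair<string, TagInfo> pair in TagReads)
{
    TagInfo tag = pair.Value;
    if (tag.ZoneReads.Count > 1)
    {
        Report.Tags.Enqueue(tag);
        // ConcurrentDictionary allows removal while enumerating.
        TagReads.TryRemove(tag.Epc, out TagInfo removed);
    }
}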
I feel like this is a good use for a dictionary, giving you good access speed when you want to check if an ID is already in the collection.
Is there a function that would allow you to add an element to a List<T> and get the index as int back? List<T>.Add() is a void and does not return a value.
List.Count - 1 is a solution when you are not working with threads. My list is a static member and can be accessed by multiple threads at a time, so using the Count - 1 value is totally thread-unsafe and could easily lead to wrong results.
The index will be used for specific treatment to each element.
Thank you!
List's methods are not thread safe alone; if you just call List.Add repeatedly from several threads you could very well run into problems. You need to use some sort of synchronization technique if you're going to use a List in the first place, so you might as well include a call to List.Count inside of that critical section.
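A minimal sketch of that idea (the SharedList class and its members are assumptions for illustration):
public static class SharedList
{
    private static readonly List<string> items = new List<string>();
    private static readonly object itemsLock = new object();

    public static int AddAndGetIndex(string item)
    {
        lock (itemsLock)
        {
            // Add and read Count inside the same critical section, so the
            // returned index is the position of the element we just added.
            items.Add(item);
            return items.Count - 1;
        }
    }
}
Bear in mind the returned index is only reliable for as long as no other thread inserts or removes earlier elements.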
Your other option would be to use a collection that is designed to be used by multiple threads, such as those in System.Collections.Concurrent.
If you want a thread-safe index you need locking, since another thread could have deleted an item or otherwise changed the list even after you have obtained the index.
That said, you could try list.IndexOf(value), which returns the index of value (assuming all elements are unique).
I believe that the list has an IndexOf() method, so you can call it after adding the element to the list like this:
list.Add(element);
int index = list.IndexOf(element);
If Peek returns the next object in a queue, is there a method I can use to get a specific object? For example, I want to find the third object in the queue and change one of its values.
Right now I'm just doing a foreach through the queue, which might be the best solution, but I didn't know if there was something special you can use with Peek, e.g. Queue.Peek(2).
If you want to access elements directly (with an O(1) operation), use an array instead of a queue because a queue has a different function (FIFO).
A random access operation on a queue will be O(n) because it needs to iterate over every element in the collection...which in turn makes it sequential access, rather than direct random access.
Then again, since you're using C#, you can use queue.ElementAt(n) from System.Linq (since Queue implements IEnumerable) but that won't be O(1) i.e. it will still iterate over the elements.
Although this is still O(n), it's certainly easier to read if you use the LINQ extension methods ElementAt() or ElementAtOrDefault(); these are extensions of IEnumerable<T>, which Queue<T> implements.
using System.Linq;
Queue<T> queue = new Queue<T>();
T result;
result = queue.ElementAt(2);
result = queue.ElementAtOrDefault(2);
Edit
If you do go with the other suggestions of converting your Queue to an array just for this operation, you need to decide whether the likely size of your queue and the distance of the index you'll be looking for from the start of your queue justify the O(n) operation of calling .ToArray().ElementAt(m), not to mention the space requirements of creating a secondary storage location for it.
foreach through a queue. Kind of a paradox.
However, if you can foreach, it is an IEnumerable, so the usual linq extensions apply:
queue.Skip(1).FirstOrDefault()
or
queue.ElementAt(1)
You could do something like this as a one off:
object thirdObjectInQueue = queue.ToArray()[2];
I wouldn't recommend using it a lot, however, as it copies the whole queue to an array, thereby iterating over the whole queue anyway.
I have a dictionary with around 1 million items. I am constantly looping through the dictionary:
public void DoAllJobs()
{
    foreach (KeyValuePair<uint, BusinessObject> p in _dictionnary)
    {
        if (p.Value.MustDoJob)
            p.Value.DoJob();
    }
}
The execution is a bit long, around 600 ms, and I would like to decrease it. Here are the constraints:
1. MustDoJob values mostly stay the same between two calls to DoAllJobs()
2. 60-70% of the MustDoJob values == false
3. From time to time, MustDoJob changes for 200,000 pairs.
4. Some p.Value.DoJob() calls cannot run at the same time (COM object call)
5. Here, I do not need the key part of the _dictionnary object, but I really do need it somewhere else
I wanted to do the following:
Parallelize, but I am not sure it is going to be effective due to 4.
Sort the dictionary based on 1. and 2. (and stop when I find the first MustDoJob == false), but I am wondering what 3. would result in.
I have not implemented any of these ideas since it could be a lot of work, and I would like to investigate other options first. So... any ideas?
What I would suggest is that your business object could raise an event when MustDoJob becomes true, to indicate that it needs to do a job. You can subscribe to that event, store references to those objects in a simple list, and then process the contents of that list when the DoAllJobs() method is called.
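Something along these lines (a sketch; the JobNeeded event, the JobScheduler class and the wiring are assumptions about what your BusinessObject could look like):
public class BusinessObject
{
    public event Action<BusinessObject> JobNeeded;

    private bool mustDoJob;
    public bool MustDoJob
    {
        get { return mustDoJob; }
        set
        {
            mustDoJob = value;
            if (value && JobNeeded != null)
                JobNeeded(this);   // tell subscribers this object now needs work
        }
    }

    public void DoJob() { /* ... */ }
}

public class JobScheduler
{
    // Collect the objects that raised the event, then process only those in
    // DoAllJobs() instead of scanning the whole dictionary.
    private readonly List<BusinessObject> _pending = new List<BusinessObject>();

    public void Register(BusinessObject obj)
    {
        obj.JobNeeded += o => _pending.Add(o);
    }

    public void DoAllJobs()
    {
        foreach (BusinessObject o in _pending)
        {
            o.DoJob();
        }
        _pending.Clear();
    }
}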
My first suggestion would be to use just the values from the dictionary:
foreach (BusinessObject value in _dictionnary.Values)
{
    if (value.MustDoJob)
    {
        value.DoJob();
    }
}
With LINQ this could be even easier:
foreach (BusinessObject value in _dictionnary.Values.Where(v => v.MustDoJob))
{
    value.DoJob();
}
That makes it clearer. However, it's not clear what else is actually causing you a problem. How quickly do you need to be able to iterate over the dictionary? I expect it's already pretty nippy... is anything actually wrong with this brute force approach? What's the impact of it taking 600ms to iterate over the collection? Is that 600ms when nothing needs to do any work?
One thing to note: you can't change the contents of the dictionary while you're iterating over it - whether in this thread or another. That means not adding, removing or replacing key/value pairs. It's okay for the contents of a BusinessObject to change, but the dictionary relationship between the key and the object can't change. If you want to minimise the time during which you can't modify the dictionary, you can take a copy of the list of references to objects which need work doing, and then iterate over that:
foreach (BusinessObject value in _dictionnary.Values
                                             .Where(v => v.MustDoJob)
                                             .ToList())
{
    value.DoJob();
}
Try using a profiler first. Point 4 makes me curious - 600 ms may not be that much if the COM objects use most of the time, and then it is either parallelize or live with it.
I would make sure first - with a profiler run - that you aren't targeting the totally wrong issue here.
Having established that the loop really is the problem (see TomTom's answer), I would maintain a list of the items on which MustDoJob is true -- e.g., when MustDoJob is set, add the item to the list, and when you process it and clear the flag, remove it from the list. (This might be done directly by the code manipulating the flag, or by raising an event when the flag changes; it depends on what you need.) Then you loop through the list -- which will only be 30-40% of the dictionary's length, given that 60-70% of the MustDoJob values are false -- rather than the dictionary. The list might contain the object itself or just its key in the dictionary, although it will be more efficient if it holds the object itself, as you avoid the dictionary lookup. It does depend on how frequently you're queuing 200k of them, and how time-critical the queuing vs. the execution is.
But again: Step 1 is make sure you're solving the right problem.
The use of a dictionary to me implies that the intention is to find items by a key, rather than visit every item. On the other hand, 600ms for looping through a million items is respectable.
Perhaps alter your logic so that you can simply pick the relevant items satisfying the condition directly out of the dictionary.
Use a List of KeyValuePairs instead. This means you can iterate over it super-quickly by doing
List<KeyValuePair<string, object>> list = ...;
int totalItems = list.Count;
for (int x = 0; x < totalItems; x++)
{
    // Whatever you plan to do with them, you have access to both KEY and VALUE.
}
I know this post is old, but I was looking for a way to iterate over a dictionary without the increased overhead of the Enumerator being created (GC and all), or generally a faster way to iterate over it.