In a parallel section of my code, I save the results from each thread to a ConcurrentBag. However, when this is complete, I need to iterate through each of these results and run them through my evaluation algorithm. Will a normal foreach actually iterate through all members, or do I need special code? I've also thought about using something like a queue instead of a bag, but I don't know which would be best. The bag will typically contain only 20 or so items at the end of the parallel code.
i.e., will a normal foreach actually access and run for ALL members of the ConcurrentBag?
ConcurrentBag<Move> futures = new ConcurrentBag<Move>();
foreach (Move move in futures)
{
    // stuff
}
You don't need special code; C#'s foreach calls GetEnumerator, which gives you a snapshot:
The items enumerated represent a moment-in-time snapshot of the
contents of the bag. It does not reflect any update to the collection
after GetEnumerator was called.
You can use a normal foreach on a ConcurrentBag, but it won't give you any performance benefit; it is similar to using foreach on a List.
For better performance, use Parallel.ForEach on the ConcurrentBag:
Parallel.ForEach(futures, move => DoOperation(move));
Use ConcurrentBag only if you need to perform multithreaded operations.
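To put the two halves together, here is a minimal sketch: a parallel loop fills the bag, and a plain sequential foreach then visits every member (the Evaluate method is a hypothetical stand-in for your evaluation algorithm).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Program
{
    // Hypothetical stand-in for the real evaluation algorithm.
    static int Evaluate(int move) => move * 2;

    static void Main()
    {
        var futures = new ConcurrentBag<int>();

        // Worker threads add results concurrently; ConcurrentBag handles the synchronization.
        Parallel.For(0, 20, i => futures.Add(i));

        // A plain foreach visits every member: GetEnumerator takes a
        // moment-in-time snapshot of the bag's contents.
        int visited = 0;
        foreach (var move in futures)
        {
            Evaluate(move);
            visited++;
        }

        Console.WriteLine(visited); // prints 20: every member was enumerated
    }
}
```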
I've got a list full of structs, which I want to iterate through and alter concurrently.
The code is conceptually as follows:
Parallel.For(0, pointsList.Count, i => pointsList[i] = DoThing(pointsList[i]));
I'm neither adding to nor removing from the list, only accessing and mutating its items.
I imagine this is fine, but thought I should check: is this OK as is, or do I need a lock somewhere for fear of corrupting the list?
It is not guaranteed to be safe. Changing an element in the list increments the list's private _version field, which is how the list's enumerators detect that it has been modified. If any other thread attempts to enumerate the list, or uses a method like List<T>.ForEach(), there could potentially be an issue. It sounds like it would be extremely rare, which isn't necessarily a good thing; in fact, it makes for the most maddening type of defect.
If you want to be safe, create a new list instead of changing the existing one. This is a common "functional" approach to threading issues that avoids locks.
var newList = pointsList.AsParallel().Select(item => DoThing(item));
When you're done, you can always replace the old list if you want.
pointsList = newList.ToList();
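A runnable sketch of that pattern follows (DoThing and the Point struct are hypothetical stand-ins). Note the AsOrdered() call, which is needed if the results must keep the same order as the original list; without it, PLINQ may yield items in any order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    struct Point { public int X; }

    // Hypothetical transformation standing in for DoThing.
    static Point DoThing(Point p) => new Point { X = p.X + 1 };

    static void Main()
    {
        var pointsList = Enumerable.Range(0, 100)
                                   .Select(i => new Point { X = i })
                                   .ToList();

        // AsOrdered preserves the original sequence order in the results.
        List<Point> newList = pointsList.AsParallel()
                                        .AsOrdered()
                                        .Select(item => DoThing(item))
                                        .ToList();

        Console.WriteLine(newList[10].X); // prints 11

        // Swap in the new list when done.
        pointsList = newList;
    }
}
```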
I am wondering whether a foreach loop runs slowly when the sequence it iterates over is not stored in a list or array first.
I mean like that:
foreach (var tour in list.OrderBy(x => x.Value))
{
// DoSomething();
}
Does the loop in this code recalculate the sort on every iteration, or not?
The loop using stored value:
List<Tour> list = tours.OrderBy(x => x.Value) as List<Tour>;
foreach (int number in list)
{
// DoSomething();
}
And if it does, which code shows the better performance, storing the value or not?
This is often counter-intuitive, but generally speaking, the option that is best for performance is to wait as long as possible to materialize results into a concrete structure like a list or array. Keep in mind that this is a generalization, and there are plenty of cases where it doesn't hold. Nevertheless, a good first instinct is to avoid creating the list for as long as possible.
To demonstrate with your sample, we have these two options:
var list = tours.OrderBy(x => x.Value).ToList();
foreach (var tour in list)
{
// DoSomething();
}
vs this option:
foreach (var tour in tours.OrderBy(x => x.Value))
{
// DoSomething();
}
To understand what is going on here, you need to look at the OrderBy() extension method. Reading the linked documentation, you'll see it returns an IOrderedEnumerable<TSource>. With an IOrderedEnumerable, all of the sorting needed for the foreach loop is finished by the time you start iterating over the object (and that, I believe, is the crux of your question: no, it does not re-sort on each iteration). Also note that both samples use the same OrderBy() call. Therefore, both samples have the same ordering work to do, and they accomplish it the same way, meaning they take the same amount of time to reach that point in the code.
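This deferred, sort-once behavior is easy to observe by putting a counter in the key selector:

```csharp
using System;
using System.Linq;

class Program
{
    static void Main()
    {
        int keySelectorCalls = 0;
        var numbers = new[] { 3, 1, 2 };

        // OrderBy is deferred: nothing is sorted at this point.
        var ordered = numbers.OrderBy(x => { keySelectorCalls++; return x; });
        Console.WriteLine(keySelectorCalls); // prints 0: no sorting has happened yet

        // The sort runs once, when iteration starts, not once per element.
        foreach (var n in ordered) { }
        Console.WriteLine(keySelectorCalls); // prints 3: key selector ran once per item
    }
}
```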
The difference in the code samples, then, is entirely in using the foreach loop directly vs first calling .ToList(), because in both cases we start from an IOrderedEnumerable. Let's look closely at those differences.
When you call .ToList(), what do you think happens? The method is not magic: there is still code that must execute to produce the list, and that code effectively runs its own foreach loop that you can't see. Additionally, where before you only needed enough RAM for one object at a time, you are now forcing your program to allocate a block of RAM large enough to hold references for the entire collection. Beyond the references, you may also need new memory allocations for the full objects, if you were reading from a stream or a database reader that really only needs one object in RAM at a time. This is an especially big deal on systems where memory is the primary constraint, which is often the case with web servers, where you may be maintaining session RAM for many sessions while each session only occasionally uses any CPU time to request a new page.
Now, I am making one assumption here: that you are working with something that is not already a list. The previous paragraphs talked about converting an IOrderedEnumerable into a List, not about converting a List into some form of IEnumerable. There is some small overhead in creating and operating the state machine that .NET uses to implement these lazy sequences, but I think the assumption is a good one; it turns out to be true far more often than we realize. Even in the samples for this question, we're paying that cost regardless, by the simple virtue of calling the OrderBy() function.
In summary, there can be some small additional overhead in using a raw IEnumerable versus converting to a List, but it is usually negligible. Additionally, you are almost certainly saving yourself some RAM by avoiding the conversion to a List whenever possible... potentially a lot of RAM.
Yes and no.
Yes, the foreach statement itself will appear to run slower, because the sorting happens when iteration begins.
No, your program has the same total amount of work to do, so you will not be able to measure a difference from the outside.
What you need to watch out for is using a lazy operation (in this case OrderBy) multiple times without a .ToList() or .ToArray() in between. Here you are only using it once (in the foreach), but it is an easy thing to miss.
Edit: Just to be clear. The as statement in the question will not work as intended but my answer assumes no .ToList() after OrderBy .
This line won't run:
List<Tour> list = tours.OrderBy(x => x.Value) as List<Tour>; // Returns null.
Instead, you want to store the results this way:
List<Tour> list = tours.OrderBy(x => x.Value).ToList();
And yes, the second option (storing the results) will enumerate faster, since the sorting has already been done by the time you iterate.
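The difference between the two lines is easy to demonstrate with a small sketch (the Tour class here is a minimal stand-in):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    class Tour { public int Value; }

    static void Main()
    {
        var tours = new List<Tour> { new Tour { Value = 2 }, new Tour { Value = 1 } };

        // OrderBy returns an IOrderedEnumerable<Tour>, not a List<Tour>,
        // so the "as" cast silently yields null instead of failing loudly.
        List<Tour> bad = tours.OrderBy(x => x.Value) as List<Tour>;
        Console.WriteLine(bad == null); // prints True

        // ToList materializes the sorted sequence into a real List<Tour>.
        List<Tour> good = tours.OrderBy(x => x.Value).ToList();
        Console.WriteLine(good[0].Value); // prints 1
    }
}
```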
If Peek returns the next object in a queue, is there a method I can use to get a specific object? For example, I want to find the third object in the queue and change one of its values.
Right now I'm just doing a foreach through the queue, which might be the best solution, but I didn't know if there was something special you can use with Peek, e.g. Queue.Peek(2).
If you want to access elements directly (with an O(1) operation), use an array instead of a queue because a queue has a different function (FIFO).
A random access operation on a queue will be O(n) because it needs to iterate over every element in the collection...which in turn makes it sequential access, rather than direct random access.
Then again, since you're using C#, you can use queue.ElementAt(n) from System.Linq (since Queue implements IEnumerable) but that won't be O(1) i.e. it will still iterate over the elements.
Although this is still O(n), it's certainly easier to read if you use the LINQ extension methods ElementAt() or ElementAtOrDefault(); these are extensions of IEnumerable<T>, which Queue<T> implements.
using System.Linq;
Queue<T> queue = new Queue<T>();
T result;
result = queue.ElementAt(2);
result = queue.ElementAtOrDefault(2);
Edit
If you do go with the other suggestions of converting your Queue to an array just for this operation, you need to decide whether the likely size of your queue, and the distance of the index from the start of the queue, justify the O(n) operation of calling .ToArray().ElementAt(m), not to mention the space requirements of creating a secondary copy of it.
foreach through a queue. Kind of a paradox.
However, if you can foreach, it is an IEnumerable, so the usual linq extensions apply:
queue.Skip(1).FirstOrDefault()
or
queue.ElementAt(1)
You could do something like this as a one off:
object thirdObjectInQueue = queue.ToArray()[2];
I wouldn't recommend using it a lot, however, as it copies the whole queue to an array, thereby iterating over the whole queue anyway.
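The approaches from the answers above, side by side, as a small runnable sketch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var queue = new Queue<string>();
        queue.Enqueue("first");
        queue.Enqueue("second");
        queue.Enqueue("third");

        // Peek only ever sees the head of the queue.
        Console.WriteLine(queue.Peek()); // prints first

        // ElementAt walks the queue to the requested index (O(n)).
        Console.WriteLine(queue.ElementAt(2)); // prints third

        // ToArray copies the whole queue first, then indexes the copy.
        Console.WriteLine(queue.ToArray()[2]); // prints third
    }
}
```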
This is something I was exploring, to see if I could take what was
List<MdiChild> openMdiChildren = new List<MdiChild>();
foreach (MdiChild child in MdiManager.Pages)
{
    openMdiChildren.Add(child);
}
foreach (MdiChild child in openMdiChildren)
{
    child.Close();
}
and shorten it to not require 2 foreach loops.
Note I've changed what the objects are called to simplify this for this example (these come from 3rd party controls). But for information and understanding
MdiManager.Pages inherits from CollectionBase, which in turn implements IEnumerable,
and MdiChild.Close() removes the open child from the MdiManager.Pages collection, thus altering the collection and causing the enumeration to throw an exception because the collection was modified during enumeration, e.g.:
foreach (MdiChild child in MdiManager.Pages)
{
    child.Close(); // throws: collection modified during enumeration
}
I was able to shorten the working double foreach to
((IEnumerable)MdiManager.Pages).Cast<MdiChild>().ToList()
    .ForEach(new Action<MdiChild>(c => c.Close()));
Why does this not have the same issues dealing with modifying the collection during enumeration? My best guess is that when Enumerating over the List created by the ToList call that it is actually executing the actions on the matching item in the MdiManager.Pages collection and not the generated List.
Edit
I want to make it clear that my question is not how I can simplify this further; I just wanted to understand why there weren't issues with modifying the collection when I wrote it the way I did.
Your call to ToList() is what saves you here, as it's essentially duplicating what you're doing above. ToList() actually creates a List<T> (a List<MdiChild> in this case) that contains all of the elements in MdiManager.Pages, then your subsequent call to ForEach operates on that list, not on MdiManager.Pages.
In the end, it's a matter of style preference. I'm not personally a fan of the ForEach function (I prefer query-composition functions like Where() and ToList() for their simplicity, and for the fact that they aren't engineered to have side effects on the original source, whereas ForEach is).
You could also do:
foreach (var child in MdiManager.Pages.Cast<MdiChild>().ToList())
{
    child.Close();
}
Fundamentally, all three approaches do exactly the same thing: they cache the contents of MdiManager.Pages into a List<MdiChild>, then iterate over that cached list and call Close() on each element.
When you call the ToList() method you're actually enumerating the MdiManager.Pages and creating a List<MdiChild> right there (so that's your foreach loop #1). Then when the ForEach() method executes it will enumerate the List<MdiChild> created previously and execute your action on each item (so that's foreach loop #2).
So essentially it's another way of accomplishing the same thing, just using LINQ.
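The two-pass behavior is easy to verify with a plain List<string> standing in for MdiManager.Pages: the ToList() snapshot lets you remove every item from the source while iterating, with no "collection was modified" exception.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // Stand-in for MdiManager.Pages: a plain collection we will empty.
        var pages = new List<string> { "a", "b", "c" };

        // ToList() snapshots the collection first (enumeration #1), so
        // removing items from the source while the ForEach iterates the
        // snapshot (enumeration #2) throws no exception.
        pages.ToList().ForEach(p => pages.Remove(p));

        Console.WriteLine(pages.Count); // prints 0
    }
}
```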
You could also write it as:
foreach (var page in MdiManager.Pages.Cast<MdiChild>().ToList())
    page.Close();
In any case, when you call the ToList() extension method on an IEnumerable, you are creating a brand new list. Deleting items from the source collection (in this case, MdiManager.Pages) will not affect the list output by ToList().
This same technique can be used to delete elements from a source collection without worrying about affecting the source enumerable.
You're mostly right.
ToList() creates a copy of the enumeration, and therefore you are enumerating the copy.
You could also do this, which is equivalent, and shows what you are doing:
var copy = new List<MdiChild>(MdiManager.Pages.Cast<MdiChild>());
foreach(var child in copy)
{
child.Close();
}
Since you are enumerating the elements of the copy, you don't have to worry about modifying the Pages collection: each object reference that existed in the Pages collection now also exists in copy, and changes to Pages don't affect it.
All the remaining methods on the call, ForEach() and the casts, are superfluous and can be eliminated.
At first glance, the culprit is ToList(), which is a method returning a copy of the items as a List, thus circumventing the problem.
I want to safely iterate (without getting a "collection was changed during iteration" error) through an array that can be changed by another thread.
What's the best way I can do it?
What do you mean by "safely"? Do you mind seeing a mixture of old and new values? If so, you can just iterate using foreach. (This is for two reasons: firstly, arrays don't have an internal "version number" like List<T> does, so the iterator can't detect that the array has changed. Secondly, using foreach on a type known to be an array in C# causes the compiler to use the length and indexer anyway, rather than using an iterator.)
If you want to get a snapshot, you basically need to take a copy of the array and iterate over that.
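A snapshot copy is a one-liner with Array.Clone (a shallow copy of the element references):

```csharp
using System;

class Program
{
    static void Main()
    {
        string[] array = { "a", "b", "c" };

        // Clone copies the array's references in one step; iterating the
        // snapshot is unaffected by later writes to the original array.
        string[] snapshot = (string[])array.Clone();
        array[0] = "changed";

        Console.WriteLine(snapshot[0]); // prints a
        Console.WriteLine(array[0]);    // prints changed
    }
}
```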
EDIT: I was assuming you wanted concurrency. (I'm not sure why I assumed that.) I agree with the other answers that locking is an alternative - but you need to lock around the whole iteration, e.g.
lock (someLock)
{
foreach (var x in array)
{
// Stuff
}
}
and likewise in the writing thread:
lock (someLock)
{
array[0] = "foo";
}
(You could lock on the array itself, but I generally prefer to have private locks.)
Make a copy? I guess it depends on whether you want your iteration to be a 'snapshot in time' or if you want to see the changes 'live'. The latter can get pretty dicey.
Target 4.0 and use one of the many thread safe collections.
You probably want to use synchronized access. You can lock the array for the time you need to iterate it and for the time you add/remove items.
You could use the lock statement (http://msdn.microsoft.com/en-us/library/c5kehkcz(VS.80).aspx).
For ultra-safeness, you could lock all access to the array while you are iterating through it. It's the only way to ensure that the array you are iterating through is fully up to date.