Loop over a list while adding to it - C#

I've built my system in C# (WinForms) and I've run into a problem. From my view - my graphical interface - I start a pretty heavy algorithm that adds a result to a list in the view on each iteration. The algorithm runs in a presenter (MVP pattern) on a BackgroundWorker, so the view doesn't freeze. Since the algorithm is so heavy, I want to process its results as they come in.
View:
...
public List<string> Results { get; }
...
_presenter.RunAlgorithmAsync();
//Start processing results
...
BackgroundWorker in presenter:
...
_view.Results.Add(result);
...
To sum it up: how can I start processing the list while the BackgroundWorker adds to it? Of course, the BackgroundWorker can produce results faster than they are processed, and vice versa - the processing may have to wait for results to arrive, so the list needs to be able to build up a backlog of results.
I realize this question may be vague, but if you ask me questions, I'm sure I can define the problem better.

Use a queue and have the two threads treat it as a producer and consumer.

Make the BackgroundWorker call a method in the view which adds the item to the list and processes it.
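One possible shape for that suggestion is sketched below; the wiring and the ProcessResult method are assumptions, not the asker's actual code. BackgroundWorker.ReportProgress marshals the call to the UI thread, so the view can both store and process each result safely:
// In the view, when wiring up the worker (names are hypothetical):
_worker.WorkerReportsProgress = true;
_worker.ProgressChanged += (s, e) =>
{
    var result = (string)e.UserState;
    Results.Add(result);   // safe: ProgressChanged runs on the UI thread
    ProcessResult(result); // hypothetical per-item processing
};

// In the presenter's DoWork loop, instead of _view.Results.Add(result):
_worker.ReportProgress(0, result);
Note that ProcessResult then runs on the UI thread, so this only works if the per-item processing is cheap.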

Use a threadsafe queue to drive your producer/consumer pattern, such as the .NET 4 ConcurrentQueue: http://www.codethinked.com/post/2010/02/04/NET-40-and-System_Collections_Concurrent_ConcurrentQueue.aspx
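A minimal sketch of that producer/consumer pattern, assuming the results are strings; BlockingCollection<string> is backed by a ConcurrentQueue<T> by default, and it also gives you the blocking behavior the asker wants when the processing outruns the algorithm:
using System.Collections.Concurrent;
using System.Threading.Tasks;

var results = new BlockingCollection<string>();

// Producer - the BackgroundWorker's DoWork loop:
//     results.Add(result);
// ...and when the algorithm finishes:
//     results.CompleteAdding();

// Consumer - started from the view alongside RunAlgorithmAsync():
Task.Factory.StartNew(() =>
{
    // GetConsumingEnumerable blocks while the queue is empty and
    // ends once CompleteAdding has been called and the queue drains.
    foreach (var result in results.GetConsumingEnumerable())
        ProcessResult(result); // hypothetical processing method
});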

Could you use an ObservableCollection and handle its CollectionChanged event to process each item as it's added to the collection?
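If so, it could look roughly like the sketch below (an assumption about the asker's setup, not their code). Note that the adds must happen on the UI thread, e.g. via ReportProgress as above, because CollectionChanged fires on whichever thread calls Add:
using System.Collections.ObjectModel;
using System.Collections.Specialized;

var results = new ObservableCollection<string>();
results.CollectionChanged += (s, e) =>
{
    if (e.Action == NotifyCollectionChangedAction.Add)
        foreach (string item in e.NewItems)
            ProcessResult(item); // hypothetical processing method
};

// Adding an item (on the UI thread) now triggers processing:
results.Add(result);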

Related

Best way to delay execution

Let's say I have a method that I run in a separate thread via Task.Factory.StartNew().
This method reports progress (via IProgress) so frequently that it freezes my GUI.
I know that simply reducing the number of reports would be a solution, like reporting only 1 out of 10, but in my case I really want to get all the reports and display them in my GUI.
My first idea was to queue all reports and treat them one by one, pausing a little bit between each of them.
Firstly: Is it a good option?
Secondly: How to implement that? Using a timer or using some kind of Task.Delay()?
UPDATE:
I'll try to explain better. The progress sent to the GUI consists of geocoordinates that I display on a map. Displaying each report one after another provides a kind of animation on the map. That's why I don't want to skip any of them.
In fact, I don't mind if the method that I execute in another thread finishes way before the animation. All I want is to be sure that each point is displayed for at least a certain amount of time (let's say 200 ms).
Sounds like the whole point of having the process run in a separate thread is wasted if this is the result. As such, my first recommendation would be to reduce the number of updates if possible.
If that is out of the question, perhaps you could revise the data you are sending as part of each update. How large and how complex is the object or data structure used for reporting? Can performance be improved by reducing its complexity?
Finally, you might try another approach: what if you create a third thread that just handles the reporting and delivers it to your GUI in larger chunks? Let your worker thread report its status to this reporter thread, and let the reporter thread report back to your main GUI thread only occasionally (e.g. every 1 in 10, as you suggest yourself above, but then reporting 10 chunks of data at once). That way you won't call on your GUI that often, yet you'll still keep all the status data from the processing and make it available in the GUI.
I don't know how viable this will be for your particular situation, but it might be worth an experiment or two?
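A rough sketch of that reporter-thread idea, assuming the worker can push its updates into a queue; the GeoPoint type, the batch size of 10, and DisplayOnMap are hypothetical stand-ins:
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

var reports = new BlockingCollection<GeoPoint>(); // the worker calls reports.Add(point)

// Progress<T> captures the UI SynchronizationContext, so construct it on the GUI thread.
var uiProgress = new Progress<List<GeoPoint>>(batch => DisplayOnMap(batch));

// Reporter thread: collect ten reports at a time before touching the GUI.
Task.Run(() =>
{
    var batch = new List<GeoPoint>(10);
    foreach (var point in reports.GetConsumingEnumerable())
    {
        batch.Add(point);
        if (batch.Count == 10)
        {
            ((IProgress<List<GeoPoint>>)uiProgress).Report(new List<GeoPoint>(batch));
            batch.Clear();
        }
    }
    if (batch.Count > 0) // flush whatever is left at the end
        ((IProgress<List<GeoPoint>>)uiProgress).Report(batch);
});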
I have many concerns regarding your solution, but I can't say for sure which of them is actually a problem without code samples.
First of all, Stephen Cleary in his StartNew is Dangerous article points out the real problem with using this method with its default parameters:
Easy enough for the simple case, but let’s consider a more realistic example:
private void Form1_Load(object sender, EventArgs e)
{
    Compute(3);
}

private void Compute(int counter)
{
    // If we're done computing, just return.
    if (counter == 0)
        return;

    var ui = TaskScheduler.FromCurrentSynchronizationContext();
    Task.Factory.StartNew(() => A(counter))
        .ContinueWith(t =>
        {
            Text = t.Result.ToString(); // Update UI with results.

            // Continue working.
            Compute(counter - 1);
        }, ui);
}

private int A(int value)
{
    return value; // CPU-intensive work.
}
...
Now, the question returns: what thread does A run on? Go ahead and walk through it; you should have enough knowledge at this point to figure out the answer.
Ready? The method A runs on a thread pool thread the first time, and then it runs on the UI thread the last two times.
I strongly recommend you read the whole article to better understand the usage of StartNew, but I want to point out its closing advice:
Unfortunately, the only overloads for StartNew that take a TaskScheduler also require you to specify the CancellationToken and TaskCreationOptions. This means that in order to use Task.Factory.StartNew to reliably, predictably queue work to the thread pool, you have to use an overload like this:
Task.Factory.StartNew(A, CancellationToken.None,
    TaskCreationOptions.DenyChildAttach, TaskScheduler.Default);
And really, that’s kind of ridiculous. Just use Task.Run(() => A());.
So maybe your code can easily be improved simply by switching the method you use to create new tasks. But there are some other suggestions regarding your question:
Use a BlockingCollection for storing the reports, and write a simple consumer that drains this queue to the UI, so you'll only ever present a limited number of reports at a time, but in the end all of them will be handled (see the sketch after this list).
Use the ConcurrentExclusiveSchedulerPair class for your logic: generate the reports on its ConcurrentScheduler property and display them on its ExclusiveScheduler property.
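For the first suggestion, a minimal sketch of a paced consumer; the 200 ms figure comes from the question, while GeoPoint and DisplayOnMap are assumptions:
using System.Collections.Concurrent;
using System.Threading.Tasks;

var reports = new BlockingCollection<GeoPoint>();

// Producer: the worker replaces progress.Report(point) with
// reports.Add(point), and calls reports.CompleteAdding() when done.

// Construct on the GUI thread so the callback runs there.
var display = new Progress<GeoPoint>(p => DisplayOnMap(p));

// Consumer: drains the queue, showing each point for at least 200 ms.
Task.Run(async () =>
{
    foreach (var point in reports.GetConsumingEnumerable())
    {
        ((IProgress<GeoPoint>)display).Report(point);
        await Task.Delay(200); // the pacing the asker wanted
    }
});
This way the worker can finish long before the animation does, and no report is ever dropped.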

Thread-safe generic List: blocking enumeration

First, I would like to state that I have looked at probably a hundred Google and Stack Overflow questions related to mine. I cannot find anything that answers my specific question.
I converted a DataTable into a List. I have multiple threads that enumerate the List with foreach. However, once every 5 minutes a master thread needs to refresh that List with the latest data. When this occurs, I need to block the other threads from reading the List until the master thread has fully updated it.
All the articles and questions I have found lock access around a single add. I know I can write a lock for the update, but I need that lock to also be synchronized with all the other threads that enumerate. I do not want to update the List while other threads are in the middle of their own enumeration.
How can I write a lock that is honored by the foreach statements and also by my update function?
Thanks
EDIT: I want to block "Consumers / Observers" when the producer thread is producing. I do not want Consumers / Observers blocking each other.
Put all code that populates and enumerates the list in a lock block:
lock (theList)
{
    Update(theList);
}

lock (theList)
{
    foreach (var thingy in theList)
    {
        DoStuff(thingy);
    }
}
At first I thought you could try a BlockingCollection, which implements the producer/consumer pattern.
What you actually need is a way to set up a rendezvous between your reading threads and your master thread.
I think this can be done with monitors (especially the Wait and Pulse methods). Try it and tell us what you find.
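A rough, untested sketch of that rendezvous using Monitor.Wait and Monitor.PulseAll, matching the asker's edit (readers run concurrently; the writer drains them and blocks new ones). The _list field and method shapes are assumptions, and ReaderWriterLockSlim packages the same idea if you'd rather not hand-roll it:
// requires System.Threading and a shared List<string> _list field
private readonly object _gate = new object();
private int _activeReaders;
private bool _updating;

public void Enumerate(Action<List<string>> read) // readers don't block each other
{
    lock (_gate)
    {
        while (_updating) Monitor.Wait(_gate); // block while the master updates
        _activeReaders++;
    }
    try { read(_list); }
    finally
    {
        lock (_gate)
        {
            if (--_activeReaders == 0) Monitor.PulseAll(_gate); // wake the writer
        }
    }
}

public void Refresh(Action<List<string>> update) // the master thread, every 5 minutes
{
    lock (_gate)
    {
        _updating = true; // stop new readers from entering
        while (_activeReaders > 0) Monitor.Wait(_gate); // drain active readers
        try { update(_list); }
        finally
        {
            _updating = false;
            Monitor.PulseAll(_gate); // release the waiting readers
        }
    }
}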

Task vs Barrier

So my problem is as follows: I have a list of items to process and I'd like to process the items in parallel then commit the processed items.
The Barrier class in C# will allow me to do this - I can run threads in parallel to process the list of items, and when SignalAndWait is called and all participants hit the barrier, I can commit the processed items.
The Task class will also allow me to do this - on the Task.WaitAll call I can wait for all tasks to complete and then commit the processed items. If I understand correctly, each task will run on its own thread, not a bunch of tasks in parallel on the same thread.
Is my understanding correct on both usages for the problem?
Is there any advantage of one over the other?
Is there any way a hybrid solution is better (barrier and tasks)?
Is my understanding correct on both usages for the problem?
I think you have a misunderstanding of the Barrier class. The docs say:
A barrier is a user-defined synchronization primitive that enables multiple threads (known as participants) to work concurrently on an algorithm in phases.
A barrier is a synchronization primitive. Comparing it to a unit of work which may be computed in parallel such as a Task isn't correct.
A barrier can signal all threads to wait until all others have completed some work and check upon that work. By itself, it has no parallel computation capabilities and no threading model behind it.
Is there any advantage of one over the other?
As you can see from question 1, this comparison is irrelevant.
Is there any way a hybrid solution is better (barrier and tasks)?
In your case, I'm not sure it's needed at all. If you simply want to do CPU-bound computation in parallel on a collection of items, you have Parallel.ForEach for exactly that purpose. It will partition an enumerable, invoke the work in parallel, and block until the entire collection has been processed.
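A minimal sketch of that approach; processItem echoes the question's scenario, while ConcurrentBag, ProcessedItem, and Commit are assumptions for collecting and committing the results:
using System.Collections.Concurrent;
using System.Threading.Tasks;

var processed = new ConcurrentBag<ProcessedItem>(); // thread-safe result bucket

// Blocks until every item has been processed.
Parallel.ForEach(items, item => processed.Add(processItem(item)));

// Safe to commit here: Parallel.ForEach has already joined all the work.
Commit(processed);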
I'm not directly answering your question because I think that working with barriers and tasks is just making your code more complex than it needs to be.
I'd suggest using Microsoft's Reactive Framework for this - NuGet "Rx-Main" - as it just makes the whole problem super simple.
Here's the code:
var query =
    from item in items.ToObservable()
    from processed in Observable.Start(() => processItem(item))
    select new { item, processed };

query
    .ToArray()
    .Subscribe(processedItems =>
    {
        /* commit the processed items */
    });
The query turns the list of items into an observable and then processes each item using Observable.Start(...), which fires off new threads as needed. The .ToArray() operator turns the sequence of individual results into a single array of results, and the .Subscribe(...) method then lets you process the results.
The code is much simpler than using tasks or barriers.

Reactive Extension subscribe is blocking my WPF application

I rewrote some old async code of mine that makes SOAP calls. The fetch() method would go out, get the result from the SOAP interface, and then add it to a DataTable bound to my WPF view. The new code uses Reactive Extensions to get a list of strings and creates an IObservable from the list. I thought it would return the results asynchronously, but the entire UI locks up until the entire result set is ready. I'm new to Reactive Extensions, so I'm hoping I'm just missing something simple.
The Code:
(from click event)
private void fetchSoapRows()
{
    var strings = (txtInput.Text.Split('*')).ToObservable();
    strings.Subscribe(s => SoapQueryEngine.Fetch(s));
}
Also, does anyone know how I could write a test to make certain this method doesn't block the application in the future?
There are two parts to an observable query: the Query itself and the Subscription.
Your Query is an IEnumerable<string> producing values as fast as the computer can do it.
Your Subscription is
SoapQueryEngine.Fetch(s);
This runs Fetch for each string produced by the Query on the subscribing thread, which tends to be the thread where you set up your Subscription (although it isn't necessarily so).
The issue has to do with the intention and design of Rx. It's intended that the Query is the long-running process and the Subscription is a short method that deals with the results. If you want to run a long-running function as an Rx Observable, your best option is to use Observable.ToAsync.
You should also take a look at this question to see a similar problem which shows more of what's going on in the background.
There is nothing inherently concurrent about Rx. If you want your calls to Fetch to run concurrently, you will need to change SoapQueryEngine so that it is async, or call it on another thread and then bring the results back to the UI thread.
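Putting those two points together, one possible shape for the click handler using Observable.ToAsync, assuming Fetch is a synchronous call whose results are published elsewhere:
private void fetchSoapRows()
{
    var strings = txtInput.Text.Split('*').ToObservable();
    strings
        // ToAsync wraps the blocking Fetch call so each invocation
        // runs on a ThreadPool thread instead of the UI thread.
        .SelectMany(s => Observable.ToAsync(() => SoapQueryEngine.Fetch(s))())
        .Subscribe(
            _ => { /* a Fetch completed */ },
            ex => { /* handle SOAP failures */ });
}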
Try it this way. Instead of subscribing to the text-changed event directly, create an observable from the event and observe it on the thread pool:
Observable.FromEventPattern(<subscribe event>, <unsubscribe event>)
    .ObserveOn(ThreadPoolScheduler.Instance)
    .SelectMany(s => s.Split('*'))
    .Subscribe(s => SoapQueryEngine.Fetch(s));

Comparison of Join and WaitAll

For waiting on multiple threads, can anyone compare the pros and cons of using WaitHandle.WaitAll and Thread.Join?
WaitHandle.WaitAll has a 64-handle limit, so that is obviously a huge limitation. On the other hand, it is a convenient way to wait for many signals in a single call. Thread.Join does not require creating any additional WaitHandle instances, and since it can be called individually on each thread, the 64-handle limit does not apply.
Personally, I have never used WaitHandle.WaitAll. I prefer a more scalable pattern when I want to wait on multiple signals. You can create a counting mechanism that counts up or down, and once a specific value is reached you signal a single shared event. The CountdownEvent class conveniently packages all of this into a single class.
var finished = new CountdownEvent(1);
for (int i = 0; i < NUM_WORK_ITEMS; i++)
{
    finished.AddCount();
    SpawnAsynchronousOperation(
        () =>
        {
            try
            {
                // Place logic to run in parallel here.
            }
            finally
            {
                finished.Signal();
            }
        });
}
finished.Signal();
finished.Wait();
Update:
The reason why you want to signal the event from the main thread is subtle. Basically, you want to treat the main thread as if it were just another work item. After all, it, along with the real work items, is running concurrently as well.
Consider for a moment what might happen if we did not treat the main thread as a work item. It goes through one iteration of the for loop and adds a count to our event (via AddCount), indicating that we have one pending work item, right? Let's say SpawnAsynchronousOperation completes and gets the work item queued on another thread. Now imagine the main thread gets preempted before swinging around to the next iteration of the loop. The thread executing the work item gets its fair share of the CPU, starts humming along, and actually completes the work item. The Signal call in the work item runs and decrements the pending count to zero, which changes the state of the CountdownEvent to signalled. Meanwhile the main thread wakes up, goes through the remaining iterations of the loop, and hits the Wait call; but since the event got prematurely signalled, it passes right through even though there are still pending work items.
Again, avoiding this subtle race condition is easy when you treat the main thread as a work item. That is why the CountdownEvent is initialized with a count of one and the Signal method is called before the Wait.
I like #Brian's answer as a comparison of the two mechanisms.
If you are on .NET 4, it would be worthwhile exploring the Task Parallel Library to achieve task parallelism via System.Threading.Tasks, which lets you manage tasks across multiple threads at a higher level of abstraction. The signalling you asked about in this question to manage thread interactions is hidden or much simplified, and you can concentrate on properly defining what each Task consists of and how to coordinate them (a sketch follows the quote below).
This may seem off-topic, but as Microsoft themselves say in the MSDN docs:
in the .NET Framework 4, tasks are the preferred API for writing multi-threaded, asynchronous, and parallel code.
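As a sketch of what that buys you, here is the CountdownEvent example from the answer above re-expressed with tasks; Task.WaitAll replaces the manual counting and signalling (NUM_WORK_ITEMS is carried over from that example):
var tasks = new Task[NUM_WORK_ITEMS];
for (int i = 0; i < NUM_WORK_ITEMS; i++)
{
    tasks[i] = Task.Factory.StartNew(() =>
    {
        // Place logic to run in parallel here.
    });
}

// No AddCount/Signal bookkeeping; the tasks themselves are the signals.
Task.WaitAll(tasks);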
The WaitAll mechanism involves kernel-mode objects. I don't think the same is true of the Join mechanism. I would prefer Join, given the opportunity.
Technically, though, the two are not equivalent. IIRC, Join can only operate on one thread at a time, while WaitAll can wait for multiple kernel objects to be signalled in a single call.
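To make the contrast concrete, a small sketch (the threads and events are assumed to have been created and started elsewhere):
// Thread.Join: one call per thread, no handle limit, no extra kernel objects.
foreach (Thread t in threads)
    t.Join();

// WaitHandle.WaitAll: a single call, but kernel objects and a 64-handle cap.
WaitHandle.WaitAll(resetEvents); // resetEvents is a ManualResetEvent[]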
