Parallel For and background worker - c#

I have this simple for:
for (int i = 0; i < nro_archivos; ++i) // Load the img object
{
string nombrearchivo = archivosdicom[i].FullName;
img.Add(new ImagenDicom(nombrearchivo));
Progress_Bar_Loading_Images.PerformStep();
}
followed by this:
decimal[] sliceseparation_imagen = new decimal[img.Count - 1];
for (int i = 0; i < img.Count; i++)
{
if (i < img.Count - 1)
{
sliceseparation_imagen[i] = Math.Abs(img[i + 1].Z - img[i].Z);
}
}
sliceseparation_promedio = sliceseparation_imagen.Average();
Now, my challenge is:
I implemented Parallel.For but can't use the progress bar with it, so I was thinking of using a BackgroundWorker. The problem is that the operation right after the for loop depends on the img object being loaded, which happens in the for loop, so I can't continue until that is done.
My understanding of BackgroundWorker is that it executes in the background while the main program continues its execution, so this approach will cause errors when the main program reaches the code after the for loop and tries to access an img object that has not been created yet.
Is it worth using a BackgroundWorker in this case to speed up the loading of the img object? If so, how do I wait until the BackgroundWorker has done its job before continuing with the main program? I need to report progress on the for loop to the user, so a parallel for without some way of reporting back to the user won't work.
Thanks,
Matias.

If I understood the problem correctly, you have one block of work that loads the images. It can run in parallel, but the catch is that you need to report progress.
After loading, you have a second block of work, which you can either start right away or run once all the images are loaded.
Instead of Parallel.For, you can go with Tasks. Access the UI thread through a UI dispatcher so you don't have to worry about cross-thread access issues.
var tasks = new List<Task>
{
Task.Run(() => {
// Block 1
// Use a proper dispatcher here to access the UI thread so you can report your progress
}),
};
Task.WaitAll(tasks.ToArray());
Now your loading is done, and you can proceed with your second block of work.
But as far as I can see, you only need the average out of it, and you don't need a particular order to compute an average.
var tasks = new List<Task>
{
Task.Run(() => { /* for loop logic #1 */})
.ContinueWith((x)=> {
// Get your task result and execute second block
})
};
Task.WaitAll(tasks.ToArray());
Now the second block runs as a continuation of the first, and all you have to do is compute the average once it is done.
You can go with two separate blocks too, but since these tasks are intertwined, you might as well chain them into one.

Using Tasks may help
var tasks = new List<Task>
{
Task.Run(() => { /* for loop logic #1 */
/* when interacting with the UI, use either Dispatcher
for WPF or Control.Invoke in WinForms */
}),
Task.Run(() => { /* for loop logic #2 */})
};
Task.WaitAll(tasks.ToArray());
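
To make the progress-reporting idea above concrete, here is a minimal sketch, not the answerers' own code. It assumes the loading step can be expressed as a function from a file path to a loaded item; the names SliceLoader, LoadAllAsync and AverageSeparation are hypothetical. Results are written into an array by index, because the slice separation is computed between neighbouring images and List<T>.Add is not safe to call from several threads at once.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SliceLoader
{
    // Loads one item per path in parallel, keeps the results in input order,
    // and reports the number of completed items through 'progress'.
    public static Task<T[]> LoadAllAsync<T>(
        string[] paths, Func<string, T> load, IProgress<int> progress)
    {
        return Task.Run(() =>
        {
            var results = new T[paths.Length];
            int done = 0;
            Parallel.For(0, paths.Length, i =>
            {
                results[i] = load(paths[i]); // slot i is written by this iteration only
                progress?.Report(Interlocked.Increment(ref done));
            });
            return results;
        });
    }

    // Mean absolute difference between consecutive Z positions.
    public static decimal AverageSeparation(decimal[] z)
    {
        if (z.Length < 2) return 0m;
        return Enumerable.Range(0, z.Length - 1)
                         .Select(i => Math.Abs(z[i + 1] - z[i]))
                         .Average();
    }
}
```

In a WinForms handler marked async, this could be used roughly as: create new Progress<int>(n => Progress_Bar_Loading_Images.Value = n) on the UI thread, then await SliceLoader.LoadAllAsync(paths, p => new ImagenDicom(p), progress). A Progress<T> created on the UI thread invokes its callback there, and the code after the await runs only once every image is loaded, so the average can be computed safely at that point.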

Related

Parallel request to scrape multiple pages of a website

I want to scrape a website that has plenty of pages with interesting data, but as the source is very large I want to multithread the downloads and limit the load.
I use a Parallel.ForEach to start each chunk of 10 tasks, and I wait in the main for loop until the number of active threads drops below a threshold. For that I use a counter of active threads, which I increment when starting a new thread with a WebClient and decrement when the WebClient's DownloadStringCompleted event is triggered.
Originally the question was how to use DownloadStringTaskAsync instead of DownloadString and wait until each of the threads started in the Parallel.ForEach has completed. This has been solved with a workaround:
a counter (activeThreads) and a Thread.Sleep in the main for loop.
Is using await DownloadStringTaskAsync instead of DownloadString supposed to improve the speed at all by freeing a thread while waiting for the DownloadString data to arrive?
And to get back to the original question, is there a way to do this more elegantly using the TPL, without the workaround involving a counter?
private static volatile int activeThreads = 0;
public static void RecordData()
{
var groupSize = 10; // number of URLs per chunk
var source = db.ListOfUrls; // Thousands urls
var iterations = source.Length / groupSize;
for (int i = 0; i < iterations; i++)
{
var subList = source.Skip(groupSize* i).Take(groupSize);
Parallel.ForEach(subList, (item) => RecordUri(item));
//I want to wait here until process further data to avoid overload
while (activeThreads > 30) Thread.Sleep(100);
}
}
private static async Task RecordUri(Uri uri)
{
using (WebClient wc = new WebClient())
{
Interlocked.Increment(ref activeThreads);
wc.DownloadStringCompleted += (sender, e) => Interlocked.Decrement(ref activeThreads);
var jsonData = await wc.DownloadStringTaskAsync(uri);
var root = JsonConvert.DeserializeObject<RootObject>(jsonData);
RecordData(root);
}
}
If you want an elegant solution you should use Microsoft's Reactive Framework. It's dead simple:
var source = db.ListOfUrls; // Thousands urls
var query =
from uri in source.ToObservable()
from jsonData in Observable.Using(
() => new WebClient(),
wc => Observable.FromAsync(() => wc.DownloadStringTaskAsync(uri)))
select new { uri, json = JsonConvert.DeserializeObject<RootObject>(jsonData) };
IDisposable subscription =
query.Subscribe(x =>
{
/* Do something with x.uri && x.json */
});
That's the entire code. It's nicely multi-threaded and it's kept under control.
Just NuGet "System.Reactive" to get the bits.
Parallel.ForEach
Partitions the source enumerable and runs the function for each item on thread-pool tasks. By default the scheduler decides how many run at once (you can cap it with ParallelOptions.MaxDegreeOfParallelism), so it takes care that there are not too many tasks, and it returns only after all items have been processed.
Task.WhenAll
Only awaits the given tasks; it does not execute them. It is up to you to start them properly and not too many at once.
But there is a fault in your code. RecordUri returns a task that has to be awaited; otherwise the ForEach will just keep starting more of them, because it never knows when the current one has completed. It is also problematic that you create a task within a task, where the outer task does nothing other than wait for the inner one.
You might also want to take a look at this overload of Parallel.ForEach
https://msdn.microsoft.com/en-us/library/dd782934(v=vs.110).aspx
Edit
Is using await DownloadStringTaskAsync instead of DownloadString supposed to improve at all the speed by freeing a thread while waiting for the DownloadString data to arrive ?
No. When a task is waiting on an external resource it sits in a suspended state (the Windows API does not use some old, dirty polling loop), so there is not much difference.
What differs is the overhead the compiler generates for your async code. DownloadStringTaskAsync creates a task that contains the long operation. If you await it, you attach yourself to that task (much like ContinueWith), so you just create one task to await another. This is the overhead I was talking about above.
My approach would be: use the synchronous method inside your Parallel.ForEach. The threading will be handled by the TPL and you are free to go on.
Remember "KISS"
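
For the part of the question that asks for a way to limit the load without a hand-rolled counter, one commonly used alternative is a SemaphoreSlim combined with Task.WhenAll. The sketch below is an illustration under assumptions, not code from any of the answers: ThrottledRunner and RunAsync are hypothetical names, and the download itself is passed in as a delegate so the throttling logic stays independent of WebClient.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledRunner
{
    // Runs 'work' for every item, allowing at most 'maxConcurrent' operations in flight.
    public static async Task<TResult[]> RunAsync<TItem, TResult>(
        IEnumerable<TItem> items, int maxConcurrent, Func<TItem, Task<TResult>> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();        // wait for a free slot without blocking a thread
                try { return await work(item); }
                finally { gate.Release(); }    // free the slot even if 'work' throws
            }).ToList();                       // ToList starts each task exactly once

            return await Task.WhenAll(tasks);  // results come back in input order
        }
    }
}
```

With the question's data this might be called as await ThrottledRunner.RunAsync(source, 10, async uri => { using (var wc = new WebClient()) return await wc.DownloadStringTaskAsync(uri); }), which keeps at most 10 downloads active and needs neither the activeThreads counter nor Thread.Sleep.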

.NET TPL Start a task after one ends indefinitely

I have an application which processes images extracted from a database. I need to process several images in parallel, and because of that I'm using the .NET TPL with Tasks.
My application is a WinForms app in C#. It just has an option to choose how many processes will start and a Start button. The way I'm doing it right now is this:
private Action getBusinessAction(int numProcess) {
return () =>
{
try {
while (true) {
(new BusinessClass()).doSomeProcess(numProcess);
tokenCancelacion.ThrowIfCancellationRequested();
}
}
catch (OperationCanceledException ex) {
Console.WriteLine("Cancelled process");
}
};
}
...
for (int cnt = 0; cnt < NUM_OF_MAX_PROCESS; cnt++) {
Task.Factory.StartNew(getBusinessAction(cnt + 1), tokenCancelacion);
}
In this case, if I choose 8 as the number of processes, it starts 8 tasks which keep running until the application is closed. But I think a better approach would be to start a number of tasks which each call the doSomeProcess method and then finish. When a task finishes, I would like to start the same task again, or start a new instance of a task that does the same thing, so that there is always that number of processes running in parallel. Is there a way in TPL to achieve this?
This sounds like a good fit for TPL Dataflow's ActionBlock. You create a single block at the start. Set how many items it should process concurrently (i.e. 8) and post the items into it:
var block = new ActionBlock<Image>(
image => ProcessImage(image),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8});
foreach (var image in GetImages())
{
block.Post(image);
}
This takes care of creating up to 8 tasks when needed; when there is less work to do, the number of tasks goes down.
When you are done you should signal the block for completion and wait for it to complete:
block.Complete();
await block.Completion;

How to use Threads for Processing Many Tasks

I have a C# requirement for individually processing a 'great many' (perhaps > 100,000) records. Running this process sequentially is proving to be very slow with each record taking a good second or so to complete (with a timeout error set at 5 seconds).
I would like to try running these tasks asynchronously by using a set number of worker 'threads' (I use the term 'thread' here cautiously as I am not sure if I should be looking at a thread, or a task or something else).
I have looked at the ThreadPool, but I can't imagine it could queue the volume of requests required. My ideal pseudo code would look something like this...
public void ProcessRecords() {
SetMaxNumberOfThreads(20);
MyRecord rec;
while ((rec = GetNextRecord()) != null) {
var task = WaitForNextAvailableThreadFromPool(ProcessRecord(rec));
task.Start()
}
}
I will also need a mechanism that the processing method can report back to the parent/calling class.
Can anyone point me in the right direction with perhaps some example code?
A possible simple solution would be to use a TPL Dataflow block which is a higher abstraction over the TPL with configurations for degree of parallelism and so forth. You simply create the block (ActionBlock in this case), Post everything to it, wait asynchronously for completion and TPL Dataflow handles all the rest for you:
var block = new ActionBlock<MyRecord>(
rec => ProcessRecord(rec),
new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = 20});
MyRecord rec;
while ((rec = GetNextRecord()) != null)
{
block.Post(rec);
}
block.Complete();
await block.Completion;
Another benefit is that the block starts working as soon as the first record arrives and not only when all the records have been received.
If you need to report back on each record you can use a TransformBlock to do the actual processing and link an ActionBlock to it that does the updates:
var transform = new TransformBlock<MyRecord, Report>(rec =>
{
ProcessRecord(rec);
return GenerateReport(rec);
}, new ExecutionDataflowBlockOptions{MaxDegreeOfParallelism = 20});
var reporter = new ActionBlock<Report>(report =>
{
RaiseEvent(report); // Or any other mechanism...
});
transform.LinkTo(reporter, new DataflowLinkOptions { PropagateCompletion = true });
MyRecord rec;
while ((rec = GetNextRecord()) != null)
{
transform.Post(rec);
}
transform.Complete();
await reporter.Completion; // the reporter finishes only after the transform has drained (PropagateCompletion = true)
Have you thought about using parallel processing with Actions?
That is, create a method to process a single record, add one action per record to a list, and then run a Parallel.ForEach over the list.
Dim list As New List(Of Action)
list.Add(New Action(Sub() MyMethod(myParameter)))
Parallel.ForEach(list, Sub(t) t.Invoke())
This is in vb.net, but I think you get the gist.
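
The same gist in C#, as a rough sketch rather than a definitive implementation: Parallel.ForEach with ParallelOptions.MaxDegreeOfParallelism gives the "set number of workers" the question asks for, and a callback covers reporting back to the caller. RecordProcessor, ProcessAll and onDone are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class RecordProcessor
{
    // Processes every record with at most 'maxWorkers' running at once and
    // calls 'onDone' after each record so the caller can track progress.
    public static void ProcessAll<T>(
        IEnumerable<T> records, int maxWorkers, Action<T> process, Action<T> onDone)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };
        Parallel.ForEach(records, options, rec =>
        {
            process(rec);
            onDone?.Invoke(rec); // runs on a worker thread, so it must be thread-safe
        });
    }
}
```

Because Parallel.ForEach pulls items from the enumerable as workers become free, the records do not all have to be in memory first; a lazily evaluated sequence built around GetNextRecord() would work. The call blocks until every record is processed, so from a UI application it would normally be wrapped in Task.Run.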

Loop through list and create multple threads

I want to loop through a list of URLs and check each URL if the website is down or not using multiple threads.
My approach:
while (_lURLs.Count > 0)
{
while (_iRunningThreads < _iNumThreads)
{
Thread t = new Thread(new ParameterizedThreadStart(CheckWebsite));
string strUrl = GetNextURL();
if (!string.IsNullOrEmpty(strUrl))
{
t.Start(strUrl);
_iRunningThreads++;
}
else
{
break;
}
}
}
private string GetNextURL()
{
lock (_lURLs)
{
if (_lURLs.Count > 0)
{
string strRetVal = _lURLs[0];
_lURLs.RemoveAt(0);
return strRetVal;
}
else
{
return string.Empty;
}
}
}
When a thread is finished the _iRunningThreads property gets decremented.
My problem is: The outer while loop blocks everything "while (_lURLs.Count > 0)".
Adding an Application.DoEvents() call in the outer while loop helps, but I want to use the code in a C# library where Application.DoEvents() is not available.
Thank you for your help.
Instead of managing the threads yourself, you can use the TPL.
Also, if you're using .NET Framework 4.5, you can even add async/await and the WhenAll method to prevent blocking...
Here is a small example:
private async Task CheckUrl()
{
List<Task> tasks = new List<Task>();
string url = GetNextURL();
while (!String.IsNullOrEmpty(url))
{
string current = url; // copy: the lambda must not capture the variable the loop reassigns
tasks.Add(Task.Run(() => CheckWebsite(current)));
url = GetNextURL();
}
await Task.WhenAll(tasks);
// All tasks have finished...
}
I think using the .NET ThreadPool would be a good idea in this case, if the tasks take quite a short time to complete.
Check out: http://msdn.microsoft.com/en-us/library/4yd16hza.aspx
This allows you to simplify your code a bit as the ThreadPool automatically manages the count of the worker threads. You just have to call ThreadPool.QueueUserWorkItem for each URL you have and increment a running task counter. Queuing items into the ThreadPool won't block the UI thread.
Have the ThreadPool tasks decrement the counter (as you do now), and when the counter gets to zero (all tasks have run), call a callback function so that your main code knows when all the URLs have been processed. You can update the UI or do whatever else you want from that callback.
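
The counter-and-callback approach described in this answer could look roughly like the following. This is a sketch under assumptions: PoolChecker and CheckAll are hypothetical names, and the per-URL check is passed in as a delegate standing in for the question's CheckWebsite method.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PoolChecker
{
    // Queues one thread-pool work item per URL and invokes 'allDone' once,
    // from the worker that finishes last. The method itself returns immediately.
    public static void CheckAll(
        IList<string> urls, Action<string> check, Action allDone)
    {
        int remaining = urls.Count;
        if (remaining == 0) { allDone(); return; }

        foreach (string url in urls)
        {
            ThreadPool.QueueUserWorkItem(state =>
            {
                try { check((string)state); }
                finally
                {
                    // The worker that brings the counter to zero signals completion.
                    if (Interlocked.Decrement(ref remaining) == 0) allDone();
                }
            }, url);
        }
    }
}
```

The callback runs on a thread-pool thread, so UI updates from it still have to be marshalled to the UI thread. An exception that escapes 'check' is not caught here and would bring the process down, so a real check method should handle its own errors.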

Task workflow sequence is wrong

With the code below, the final UI updates made in the final ContinueWith never take place. I think it is because of the Wait() I have at the end.
The reason I am doing that is that without the Wait, the method would return the IDataProvider before it has finished being constructed in the background.
Can someone help me get this right?
Cheers,
Berryl
private IDataProvider _buildSQLiteProvider()
{
IDataProvider resultingDataProvider = null;
ISession session = null;
var watch = Stopwatch.StartNew();
var uiContext = TaskScheduler.FromCurrentSynchronizationContext();
// get the data
var buildProvider = Task.Factory
.StartNew(
() =>
{
// code to build it
});
// show some progress if we haven't finished
buildProvider.ContinueWith(
taskResult =>
{
// show we are making progress;
},
CancellationToken.None, TaskContinuationOptions.None, uiContext);
// we have data: reflect completed status in ui
buildProvider.ContinueWith(
dataProvider =>
{
// show we are finished;
},
CancellationToken.None, TaskContinuationOptions.OnlyOnRanToCompletion, uiContext);
try {
buildProvider.Wait();
}
catch (AggregateException ae)
{
foreach (var e in ae.InnerExceptions)
Console.WriteLine(e.Message);
}
Console.WriteLine("Exception handled. Let's move on.");
CurrentSessionContext.Bind(session);
return resultingDataProvider;
}
====
just to be clear
I am not having trouble talking to the ui thread. The first continue with updates the ui just fine. The trouble I am having is the timing of the last ui update and the return of the data provider.
I commented out some of the code to reduce the noise level in this post and focus on the task sequencing.
====
ok, working code
private void _showSQLiteProjecPicker()
{
var watch = Stopwatch.StartNew();
var uiScheduler = TaskScheduler.FromCurrentSynchronizationContext();
ISession session = null;
// get the data
var buildProvider = Task.Factory.StartNew(
() =>
{
var setProgress = Task.Factory.StartNew(
() =>
{
IsBusy = true;
Status = string.Format("Fetching data...");
},
CancellationToken.None, TaskCreationOptions.None, uiScheduler);
var provider = new SQLiteDataProvider();
session = SQLiteDataProvider.Session;
return provider;
});
buildProvider.ContinueWith(
buildTask =>
{
if(buildTask.Exception != null) {
Console.WriteLine(buildTask.Exception);
}
else {
Check.RequireNotNull(buildTask.Result);
Check.RequireNotNull(session);
_updateUiTaskIsComplete(watch);
CurrentSessionContext.Bind(session);
var provider = buildTask.Result;
var dao = provider.GetActivitySubjectDao();
var vm = new ProjectPickerViewModel(dao);
_showPicker(vm);
}
},
CancellationToken.None, TaskContinuationOptions.OnlyOnRanToCompletion, uiScheduler);
}
UPDATE BELOW
This code doesn't look like it warrants TPL to me. Looks like maybe a good use for a BackgroundWorker instead!
Either way, the updates are probably not taking place because you can't update the UI from a separate thread -- you need to run the update on the UI thread. You should use the Dispatcher for this (http://stackoverflow.com/questions/303116/system-windows-threading-dispatcher-and-winforms contains info for both WPF and WinForms)
Update:
So I obviously missed some of the code so here's a revised answer. First of all, Nicholas is correct -- .ContinueWith returns a new task (http://msdn.microsoft.com/en-us/library/dd270696.aspx). So instead of
var result = Task.Factory.StartNew(...);
result.ContinueWith(...);
you probably want to create the task first, attach the ContinueWith() calls to it (keeping the task each one returns if you need to wait on it or chain from it), and only then call .Start() on the original task. Something like:
var task = new Task(...);
var continuation = task.ContinueWith(...);
task.Start();
However, there is a flaw in the design to begin with (as I see it)! You're trying to run this code async, which is why you're using threads and TPL. However, you're calling buildProvider.Wait(); on the UI thread, which blocks the UI thread until this task completes! Aside from the issue of repainting the UI in the ContinueWith() while the UI thread is blocked, there's no benefit to multithreading here since you're blocking the UI thread (a major no-no). What you probably want to do is stick the Bind()-ing inside a ContinueWith or something so that you don't have to call Wait() and block the UI thread.
My $0.02 is that if you expect the query to take a long time what you really want is 2 threads (or tasks in TPL)-- one to perform the query and one to update the UI at intervals with status. If you don't expect it to take so long I think you just want a single thread (Task) to query and then update the UI when it's done. I would probably do this via BackgroundWorker. TPL was built for managing lots of tasks and continuations and such but seems overkill for this kind of thing -- I think you could do it using a BackgroundWorker in a lot less code. But you mention you want to use TPL which is fine, but you're going to have to rework this a bit so that it actually runs in the background!
PS - you probably meant to put the Console.WriteLine("Exception handled. Let's move on."); inside the catch
I'm a little hazy, but last time I used the TPL I found it confusing. ContinueWith() returns a new Task instance. So you need to assign the result of the first ContinueWith() to a new variable, say var continuedTask = buildProvider.ContinueWith(...), and then change the last one to call continuedTask.ContinueWith() instead of buildProvider.ContinueWith(). Then Wait() on the last Task.
Hope that helps!
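
On .NET 4.5 or later, the "don't block the UI thread with Wait()" advice above can also be followed with async/await instead of explicit continuations. The following is only a sketch under that assumption; ProviderLoader, BuildAsync and setStatus are hypothetical names, and the build step stands in for constructing the SQLiteDataProvider.

```csharp
using System;
using System.Threading.Tasks;

public static class ProviderLoader
{
    // Builds a value on a background thread and hands it back without
    // blocking the caller; 'setStatus' is called before and after the work.
    public static async Task<T> BuildAsync<T>(Func<T> build, Action<string> setStatus)
    {
        setStatus("Fetching data...");
        try
        {
            T result = await Task.Run(build); // the calling thread is free while this runs
            setStatus("Done");
            return result;
        }
        catch (Exception ex)
        {
            // await rethrows the original exception rather than an AggregateException
            setStatus("Failed: " + ex.Message);
            throw;
        }
    }
}
```

When the method is awaited from the UI thread, the code after the await resumes on the UI thread by default, so the final status update and any follow-up work (such as binding the session and showing the picker) can go there without a scheduler argument and without Wait().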
