I am new to threaded programming. I have to run a few tasks in parallel and in the background (so that the main UI thread remains responsive to user actions), and wait for each one of them to complete before proceeding with further execution.
Something like:
foreach(MyTask t in myTasks)
{
t.DoSomethingInBackground(); // There could be any number of tasks; to save
                             // processing time I want to run each of them
                             // in parallel
}
// Wait till all tasks complete doing something parallel in background
Console.Write("All tasks Completed. Now we can do further processing");
I understand that there could be several ways to achieve this, but I am looking for the best solution to implement in .NET 4.0 (C#).
To me it would seem like you want Parallel.ForEach
Parallel.ForEach(myTasks, t => t.DoSomethingInBackground());
Console.Write("All tasks Completed. Now we can do further processing");
You can also perform multiple tasks within a single loop
List<string> results = new List<string>(myTasks.Count);
Parallel.ForEach(myTasks, t =>
{
string result = t.DoSomethingInBackground();
lock (results)
{ // lock the list to avoid race conditions
results.Add(result);
}
});
In order for the main UI thread to remain responsive, you will want to use a BackgroundWorker and subscribe to its DoWork and RunWorkerCompleted events and then call
worker.RunWorkerAsync();
worker.RunWorkerAsync(argument); // argument is an object
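A minimal sketch of that wiring (LoadData and resultLabel are placeholders for your own long-running work and UI control):
// Requires: using System.ComponentModel; (and a WinForms form context)
var worker = new BackgroundWorker();
worker.DoWork += (s, e) =>
{
    // Runs on a thread-pool thread; keep UI access out of here.
    e.Result = LoadData((string)e.Argument); // LoadData is a placeholder for your work
};
worker.RunWorkerCompleted += (s, e) =>
{
    // Runs back on the UI thread, so it is safe to touch controls here.
    if (e.Error != null)
        MessageBox.Show(e.Error.Message);
    else
        resultLabel.Text = (string)e.Result;
};
worker.RunWorkerAsync("some argument");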
You can use the Task library to accomplish this:
string[] urls = ...;
var tasks = urls.Select(url => Task.Factory.StartNew(() => DoSomething(url)));
To avoid blocking the UI thread, you can use ContinueWhenAll in .NET 4.0:
Task.Factory.ContinueWhenAll(tasks.ToArray(), _ =>
    Console.Write("All tasks Completed. Now we can do further processing"));
If you are on a newer version of .NET (4.5 or later), you can use Task.WhenAll instead.
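For example, on .NET 4.5 or later (a sketch assuming this runs inside an async method, reusing the urls and DoSomething from above):
var tasks = urls.Select(url => Task.Run(() => DoSomething(url))).ToArray();
await Task.WhenAll(tasks);
Console.Write("All tasks Completed. Now we can do further processing");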
If you use .NET 4.0 or later, refer to the Parallel class and the Task class. Joseph Albahari wrote a very clear book about this: http://www.albahari.com/threading/part5.aspx#_Creating_and_Starting_Tasks
Related
I am working on some legacy code which repeatedly calls a long running task in a new thread:
var jobList = spGetSomeJobIds.ToList();
jobList.ForEach((jobId) =>
{
var myTask = Task.Factory.StartNew(() => CallExpensiveStoredProc(jobId),
TaskCreationOptions.LongRunning);
myTask.Wait();
});
As the calling thread immediately calls Wait and blocks until the task completes, I can't see any point in the Task.Factory.StartNew code. Am I missing something? Is there something about TaskCreationOptions.LongRunning which might add value?
As MSDN says:
Waits for the Task to complete execution.
in addition, there is the following statement:
Wait blocks the calling thread until the task completes.
So myTask.Wait(); looks redundant, as the CallExpensiveStoredProc method returns nothing.
As a good practice, it would be better to use the async and await operators when you deal with asynchronous operations such as database calls.
UPDATE:
What we have is:
We run with LongRunning, so a new thread is created. This can be seen in the source files.
Then we call myTask.Wait();. This method just waits until myTask finishes its work. So all jobList iterations execute sequentially, not in parallel. Now we need to decide how our jobs should be executed: sequentially (case A) or in parallel (case B).
Case A: Sequential execution of our jobs
If your jobs should be executed sequentially, then a few questions arise:
Why do we use multithreading if our code executes sequentially? Our code should be clean and simple, so we can avoid multithreading in this case.
When we create a new thread, we add extra overhead. The thread pool tries to determine the optimal number of threads and creates at least one thread per core, which means that when all thread pool threads are busy, a task might wait (in extreme cases indefinitely) until it actually starts executing.
To sum up, there is no gain in this case in creating a new thread, especially one created with the LongRunning option.
Case B: Parallel execution of our jobs
If our goal is to run all jobs in parallel, then myTask.Wait(); should be eliminated, because it forces the code to execute sequentially.
Code to test:
var jobs = new List<int>(){1, 2, 3 };
jobs.ForEach(j =>
{
var myTask = Task.Factory.StartNew(() =>
{
Console.WriteLine($"This is a current number of executing task: { j }");
Thread.Sleep(5000); // Imitation of long-running operation
Console.WriteLine($"Executed: { j }");
}, TaskCreationOptions.LongRunning);
myTask.Wait();
});
Console.WriteLine($"All jobs are executed");
To conclude, in case B there is also no gain in creating a new thread, especially one created with the LongRunning option, because creating a thread is expensive both in the time it takes and in memory consumption.
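If case B is what's needed, a minimal sketch (keeping Task.Factory.StartNew but dropping both LongRunning and the per-iteration Wait):
var jobs = new List<int>() { 1, 2, 3 };
var tasks = jobs.Select(j => Task.Factory.StartNew(() =>
{
    Console.WriteLine($"Executing: { j }");
    Thread.Sleep(5000); // Imitation of a long-running operation
    Console.WriteLine($"Executed: { j }");
})).ToArray();

Task.WaitAll(tasks); // Block once at the end, instead of once per iteration
Console.WriteLine("All jobs are executed");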
I (think that I) understand the differences between threads and tasks.
Threads allow us to do multiple things in parallel (they are CPU-bound).
Asynchronous tasks release the processor time while some I/O work is done (they are I/O-bound).
Now, let's say I want to do multiple asynchronous tasks in parallel. For example, I want to download several pages of a paged response at the same time. Or, I want to write new data into two different databases. What is the correct way to handle the threads? Should they be async and awaited? Or can the async operation be just inside the thread? What is the best practice for error handling?
I have tried creating my own utility method to start a new async thread, but I have a feeling that it can go horribly wrong.
public static Thread RunInThreadAsync<T>(T actionParam, Func<T, Task> asyncAction)
{
var thread = new Thread(async () => await asyncAction(actionParam));
thread.Start();
return thread;
}
Is this ok? Or should the method be public static async Task<Thread>? If yes, what should be awaited? There is no thread.StartAsync(). Or should I use Task.Run instead?
Note: Using await Task.WhenAll or similar approaches without an explicit new thread is not an option for me. The "worker" thread is run in background (to avoid blocking the main thread) and is later processed by other services in the system.
I (think that I) understand the differences between threads and tasks.
There's one important concept missing here: concurrency. Concurrency is doing more than one thing at a time. This is different than "parallel", which is a term most developers use to mean "doing more than one thing at a time using threads". So, parallelism is one form of concurrency, and asynchrony is another form of concurrency.
Now, let's say I want to do multiple asynchronous tasks in parallel.
And here's the problem: mixing two forms of concurrency. What you really want to do is multiple asynchronous tasks concurrently. And the way to do this is via Task.WhenAll.
Using await Task.WhenAll or similar approaches without an explicit new thread is not an option for me. The "worker" thread is run in background (to avoid blocking the main thread) and is later processed by other services in the system.
This argument doesn't make any sense. Asynchronous code won't block the main thread because it's asynchronous. There's no explicit thread necessary.
If, for some unknown reason, you really do need a background thread, then just wrap your code in Task.Run. Thread should only ever be used for COM interop; any other use of Thread is legacy code as soon as it is written.
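A minimal sketch of that advice (urls, items, DownloadPageAsync, and CrunchNumbers are all hypothetical placeholders):
// Purely asynchronous (I/O-bound) work: no explicit thread needed.
var pages = await Task.WhenAll(urls.Select(url => DownloadPageAsync(url)));

// Genuinely CPU-bound work: wrap just that part in Task.Run.
var results = await Task.WhenAll(items.Select(item => Task.Run(() => CrunchNumbers(item))));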
System.Threading.Thread has been in .NET since version 1.1. It allows you to control multiple worker threads within your application. This only uses 1 core of your CPU.
The Task Parallel Library (TPL) introduced the ability to leverage multiple cores on your machine with async Tasks or System.Threading.Tasks.Task<T>.
My approach for your "multiple downloader" scenario would be to create a new CancellationTokenSource, which allows me to cancel my tasks. Then I would start creating my Task<T> instances and start them. You can use Task.WaitAll() to sit and wait.
You should be aware that you can chain your tasks together in a sequence by using the ContinueWith<T>() method.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApp2
{
class Program
{
static bool DownloadFile (string path)
{
// Do something here. long running task.
// check for cancellation -> Task.Factory.CancellationToken.IsCancellationRequested
return true;
}
static void Main(string[] args)
{
var paths = new[] { "Some paths", "to the files you want", "to download" };
List<Task<bool>> results = new List<Task<bool>>();
var cts = new System.Threading.CancellationTokenSource();
foreach(var path in paths)
{
var task = new Task<bool>(_path => DownloadFile((string)_path), path, cts.Token);
task.Start();
results.Add(task);
}
// use cts.Cancel(); to cancel all associated tasks.
// Task.WhenAll() to do something when they are all done.
// Task.WaitAll( results.ToArray() ); // to sit and wait.
Console.WriteLine("Press <Enter> to quit.");
var final = Console.ReadLine();
}
}
}
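As mentioned above, tasks can also be chained; a minimal sketch using ContinueWith with the DownloadFile method from the example:
var download = Task<bool>.Factory.StartNew(() => DownloadFile("some path"));
var followUp = download.ContinueWith(t =>
{
    // Runs only after the download task has finished.
    Console.WriteLine("Download succeeded: " + t.Result);
});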
Say I have 10 threads busily doing something and they sometimes call a method
public void HandleWidgets(Widget w) { HeavyLifting(w); }
However, I don't want my 10 threads to wait on HeavyLifting(w); instead, I want to dispatch the HeavyLifting(w) work to an 11th thread, the HeavyLifter thread, and continue asynchronously. The HeavyLifter thread dispatched to should always be the same thread, and I don't want to create multiple threads (hence, I can't do something quite like this: C# Asynchronous call without EndInvoke?).
HeavyLifting(w) is "fire and forget" in that the threads that call HandleWidgets() don't need a callback or anything like that.
What's a healthy practice for this?
I'm surprised none of the other answers here mention TPL DataFlow. You connect a bunch of blocks together and post data through the chain. You can control the concurrency of each block explicitly, so you could do the following:
// Requires: using System.Threading.Tasks.Dataflow;
var transformBlk =
    new TransformBlock<int, int>(async i => {
        await Task.Delay(i);
        return i * 10;
    }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });
var processorBlk =
    new ActionBlock<int>(i => {
        Console.WriteLine(i);
    }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 });
transformBlk.LinkTo(processorBlk);
var data = Enumerable.Range(1, 20).Select(x => x * 1000);
foreach(var x in data)
{
transformBlk.Post(x);
}
Basically you have threads that are producers of work and one thread that is a consumer of it.
Create a thread and have it Take from a BlockingCollection in a loop. This is your consumer thread, which will call HeavyLifting. It will simply wait until an item is available and then process it:
A call to Take may block until an item is available to be removed.
The other threads can simply add items to the collection.
Note that BlockingCollection doesn't guarantee ordering of items added/removed by itself:
The order in which an item is removed depends on the type of collection used to create the BlockingCollection instance. When you create a BlockingCollection object, you can specify the type of collection to use
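A minimal sketch of that producer/consumer setup (the Widget type and HeavyLifting method come from the question; the class and field names are illustrative):
using System.Collections.Concurrent;
using System.Threading;

class WidgetDispatcher
{
    private readonly BlockingCollection<Widget> _queue = new BlockingCollection<Widget>();

    public WidgetDispatcher()
    {
        // Consumer: the single HeavyLifter thread processes items as they arrive.
        var heavyLifter = new Thread(() =>
        {
            foreach (var widget in _queue.GetConsumingEnumerable())
                HeavyLifting(widget);
        });
        heavyLifter.IsBackground = true;
        heavyLifter.Start();
    }

    // Producers: the 10 worker threads just enqueue and continue immediately.
    public void HandleWidgets(Widget w)
    {
        _queue.Add(w);
    }

    private void HeavyLifting(Widget w)
    {
        // Expensive work goes here.
    }
}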
You can create a limited-concurrency TaskScheduler to be used with a task factory, as provided in this example from MSDN, limited to a single thread:
var lcts = new LimitedConcurrencyLevelTaskScheduler(1);
TaskFactory factory = new TaskFactory(lcts);
Then implement your function as:
public void HandleWidgets(Widget w)
{
factory.StartNew(() => HeavyLifting(w));
}
Create a queue which is shared among all threads and one semaphore, which is also shared among all threads. Worker threads (that should not wait for HeavyLifter) post requests like this:
lock (queue)
{
queue.Enqueue(request);
semaphore.Release();
}
HeavyLifter is a background thread (it doesn't stop the process from exiting) and runs the following code in an infinite loop:
while (true)
{
semaphore.WaitOne();
Request item = null;
lock (queue)
{
item = queue.Dequeue();
}
this.ProcessRequest(item);
}
--- EDIT ---
I just noticed you need "fire and forget", in which case a blocking collection alone would be enough. The solution below is really for more complex scenarios where you need to return a result, or propagate an exception, or compose tasks in some fashion (e.g. via async/await) etc...
Use TaskCompletionSource to expose the work done in the "synchronous" thread as a Task-based API to the "client" threads.
For each invocation of HandleWidgets (CT = "client thread", ST = "synchronous thread"):
CT: Create a separate TaskCompletionSource.
CT: Dispatch HeavyLifting to the ST (probably through a BlockingCollection; also pass the TaskCompletionSource to it, so it can do the last step below).
CT: Return TaskCompletionSource's Task to the caller without waiting for the work on ST to finish.
CT: Continue normally. If/when it is no longer possible to continue without waiting on HeavyLifting to finish (in the ST), wait on the task above.
ST: When HeavyLifting finishes, call SetResult (or SetException or SetCanceled, as appropriate), which unblocks any CTs that might currently wait on the task.
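A condensed sketch of those steps (the Result type, workQueue field, and method names are illustrative; HeavyLifting is assumed here to return a value, which is the scenario this approach targets):
// Client thread (CT): wrap the request and hand it to the synchronous thread.
public Task<Result> HandleWidgetsAsync(Widget w)
{
    var tcs = new TaskCompletionSource<Result>();
    workQueue.Add(Tuple.Create(w, tcs)); // workQueue is a BlockingCollection<Tuple<Widget, TaskCompletionSource<Result>>>
    return tcs.Task; // The caller can wait on (or continue from) this task later
}

// Synchronous thread (ST): runs on the dedicated HeavyLifter thread.
private void ProcessQueue()
{
    foreach (var item in workQueue.GetConsumingEnumerable())
    {
        try
        {
            item.Item2.SetResult(HeavyLifting(item.Item1));
        }
        catch (Exception ex)
        {
            item.Item2.SetException(ex);
        }
    }
}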
I have a Windows Service that processes tasks created by users. This Service runs on a server with 4 cores. The tasks mostly involve heavy database work (generating a report for example). The server also has a few other services running so I don't want to spin up too many threads (let's say a maximum of 4).
If I use a BlockingCollection<MyCustomTask>, is it a better idea to create 4 Thread objects and use these to consume from the BlockingCollection<MyCustomTask> or should I use Parallel.Foreach to accomplish this?
I'm looking at the ParallelExtensionsExtras which contains a StaTaskScheduler which uses the former, like so (slightly modified the code for clarity):
var threads = Enumerable.Range(0, numberOfThreads).Select(i =>
{
var thread = new Thread(() =>
{
// Continually get the next task and try to execute it.
// This will continue until the scheduler is disposed and no more tasks remain.
foreach (var t in _tasks.GetConsumingEnumerable())
{
TryExecuteTask(t);
}
});
thread.IsBackground = true;
thread.SetApartmentState(ApartmentState.STA);
return thread;
}).ToList();
// Start all of the threads
threads.ForEach(t => t.Start());
However, there's also a BlockingCollectionPartitioner in the same ParallelExtensionsExtras which would enable the use of Parallel.Foreach on a BlockingCollection<Task>, like so:
var blockingCollection = new BlockingCollection<MyCustomTask>();
Parallel.ForEach(blockingCollection.GetConsumingEnumerable(), task =>
{
task.DoSomething();
});
It's my understanding that the latter leverages the ThreadPool. Would using Parallel.ForEach have any benefits in this case?
This answer is relevant if the Task class in your code has nothing to do with System.Threading.Tasks.Task.
As a simple rule, use Parallel.ForEach for tasks that will end eventually, like executing some work in parallel with some other work.
Use threads when they run a routine for the whole life of the application.
So, it looks like in your case you should use the Thread approach.
I may be going about this all wrong but I'm stuck. I have a GUI application that spawns a separate thread that downloads a bunch of data from a server. When this download thread is finished I want it to send a signal to the main thread so that it knows it can now display the downloaded data.
I've tried calling Invoke (from my main form) to call a delegate to do the display work, but this blocks my downloader thread until it's finished. I kind of want to just do a BeginInvoke without an EndInvoke, but I know it's not proper to do so.
There are a few options.
My personal favorite is to use the TPL. On your UI thread, you can make a TaskFactory, like so:
// Given:
// TaskFactory uiFactory;
uiFactory = new TaskFactory(TaskScheduler.FromCurrentSynchronizationContext());
Then, in your background task, you can just create a Task to update your UI:
var task = uiFactory.StartNew( () => UpdateUserInterface(data));
This will marshal to the UI thread correctly, similar to a BeginInvoke call. If you need to block, you can call task.Wait() (or task.Result if the Update method returns a value).
There are several options:
For WinForms use the Control.BeginInvoke method (a minimal sketch appears at the end of this answer).
For WPF use the Dispatcher.BeginInvoke method.
"The TPL has other schedulers in addition to the default one and also allows you to create custom schedulers. One of the schedulers that TPL provides is based on the current synchronization context, and it can be used to ensure that my task executes on the UI thread." (Source article):
var ui = TaskScheduler.FromCurrentSynchronizationContext();
Task.Factory.ContinueWhenAll(tasks.ToArray(),
result =>
{
var time = watch.ElapsedMilliseconds;
label1.Content += time.ToString();
}, CancellationToken.None, TaskContinuationOptions.None, ui);
In the download scenario, a .ContinueWith() continuation would be appropriate.
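For the WinForms Control.BeginInvoke option above, a minimal sketch (DisplayData and downloadedData are placeholders; this code runs on the downloader thread inside a form method, so this refers to the form):
// Queues the call on the UI thread and returns immediately,
// so the downloader thread is not blocked while the UI updates.
this.BeginInvoke(new Action(() => DisplayData(downloadedData)));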