I have a Windows Service that processes tasks created by users. This Service runs on a server with 4 cores. The tasks mostly involve heavy database work (generating a report for example). The server also has a few other services running so I don't want to spin up too many threads (let's say a maximum of 4).
If I use a BlockingCollection<MyCustomTask>, is it a better idea to create 4 Thread objects and use these to consume from the BlockingCollection<MyCustomTask>, or should I use Parallel.ForEach to accomplish this?
I'm looking at ParallelExtensionsExtras, which contains a StaTaskScheduler that uses the former approach, like so (I've slightly modified the code for clarity):
var threads = Enumerable.Range(0, numberOfThreads).Select(i =>
{
var thread = new Thread(() =>
{
// Continually get the next task and try to execute it.
// This will continue until the scheduler is disposed and no more tasks remain.
foreach (var t in _tasks.GetConsumingEnumerable())
{
TryExecuteTask(t);
}
});
thread.IsBackground = true;
thread.SetApartmentState(ApartmentState.STA);
return thread;
}).ToList();
// Start all of the threads
threads.ForEach(t => t.Start());
However, there's also a BlockingCollectionPartitioner in the same ParallelExtensionsExtras which would enable the use of Parallel.ForEach on a BlockingCollection<Task>, like so:
var blockingCollection = new BlockingCollection<MyCustomTask>();
Parallel.ForEach(blockingCollection.GetConsumingEnumerable(), task =>
{
task.DoSomething();
});
It's my understanding that the latter leverages the ThreadPool. Would using Parallel.ForEach have any benefits in this case?
This answer is relevant only if the Task class in your code has nothing to do with System.Threading.Tasks.Task.
As a simple rule, use Parallel.ForEach for tasks that will end eventually, such as executing some work in parallel with other work.
Use Threads when they run a routine for the whole life of the application.
So it looks like, in your case, you should use the Threads approach.
Related
I have a lot of Tasks that I run simultaneously. But sometimes there are too many of them to run at the same time, so I want to run them in batches of 100 Tasks. I'm not sure how to modify my code to do that.
Here is my current code:
protected void ValidateFile(List<MyFile> validFiles, MyFile file)
{
// do something
validFiles.Add(file);
}
internal Task ValidateFilesAsync(List<MyFile> validFiles, SplashScreenManager splashScreen, MyFile file)
{
return Task.Run(() => ValidateFile(validFiles, file)).ContinueWith(
t => splashScreen?.SendCommand(SplashScreen.SplashScreenCommand.IncreaseGeneralActionValue,
1));
}
var validFiles = new List<MyFile>();
var tasks = new List<Task>();
foreach (var file in filesToValidate)
{
tasks.Add(ValidateFilesAsync(validFiles, splashScreenManager, file));
}
Task.WaitAll(tasks.ToArray());
I'm not very good with Tasks, so the code may not be optimal, but it does work.
I found that I can use Parallel.ForEach with the MaxDegreeOfParallelism parameter, but for that I would have to turn ValidateFile into an Action and remove ValidateFilesAsync, and then I would lose the ContinueWith functionality that I use to increase the progress bar in the GUI.
How can I restrict the number of simultaneously running Tasks and, if possible, keep the ContinueWith-like functionality?
I would suggest using Parallel.ForEach. I'm not sure what you mean by "turn ValidateFile into an Action"; just change your foreach body to a lambda. There should be plenty of examples easily available (see the sketch below).
There are several ways to update the UI:
Use SynchronizationContext
Start a task using a task scheduler that runs the task on the UI thread
Update the progress variable on the background thread, and use a timer to poll the progress variable from the UI thread.
Keep in mind that IO, like reading files, may not improve much, if at all, when done in parallel. Spinning disks are inherently serial and have fairly large seek times. SSDs are inherently parallel, but I would still not expect any huge performance gains from reading in parallel, especially not going up to 100 concurrent reads.
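Putting the Parallel.ForEach suggestion and the progress update together, here is a minimal sketch that reuses the types from your question (ValidateFile, MyFile, filesToValidate and splashScreenManager); the degree of parallelism and the inline progress call are illustrative, not a drop-in implementation:
var validFiles = new List<MyFile>();
Parallel.ForEach(
    filesToValidate,
    new ParallelOptions { MaxDegreeOfParallelism = 100 },
    file =>
    {
        // The former foreach body, now a lambda executed by Parallel.ForEach.
        ValidateFile(validFiles, file); // ValidateFile must synchronize access to the shared list.

        // The ContinueWith-like step, done inline after each file. If SendCommand must run on the
        // UI thread, capture SynchronizationContext.Current before the loop and Post through it.
        splashScreenManager?.SendCommand(
            SplashScreen.SplashScreenCommand.IncreaseGeneralActionValue, 1);
    });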
I (think that I) understand the differences between threads and tasks.
Threads allow us to do multiple things in parallel (they are for CPU-bound work).
Asynchronous tasks free up the thread while some I/O work is done (they are for I/O-bound work).
Now, let's say I want to do multiple asynchronous tasks in parallel. For example, I want to download several pages of a paged response at the same time. Or, I want to write new data into two different databases. What is the correct way to handle the threads? Should they be async and awaited? Or can the async operation be just inside the thread? What is the best practice for error handling?
I have tried creating my own utility method to start a new async thread, but I have a feeling that it can go horribly wrong.
public static Thread RunInThreadAsync<T>(T actionParam, Func<T, Task> asyncAction)
{
var thread = new Thread(async () => await asyncAction(actionParam));
thread.Start();
return thread;
}
Is this ok? Or should the method be public static async Task<Thread>? If yes, what should be awaited? There is no thread.StartAsync(). Or should I use Task.Run instead?
Note: Using await Task.WhenAll or similar approaches without an explicit new thread is not an option for me. The "worker" thread is run in background (to avoid blocking the main thread) and is later processed by other services in the system.
I (think that I) understand the differences between threads and tasks.
There's one important concept missing here: concurrency. Concurrency is doing more than one thing at a time. This is different from "parallel", which is a term most developers use to mean "doing more than one thing at a time using threads". So parallelism is one form of concurrency, and asynchrony is another form of concurrency.
Now, let's say I want to do multiple asynchronous tasks in parallel.
And here's the problem: mixing two forms of concurrency. What you really want to do is multiple asynchronous tasks concurrently. And the way to do this is via Task.WhenAll.
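For the paged-download example, a minimal sketch (the URL pattern, page count and helper name are made up for illustration):
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

static async Task<string[]> DownloadPagesAsync(HttpClient client)
{
    // Start all five downloads at once; they run concurrently without any dedicated threads.
    var downloads = Enumerable.Range(1, 5)
        .Select(page => client.GetStringAsync("https://example.com/items?page=" + page));

    // Asynchronously wait until every download has finished.
    return await Task.WhenAll(downloads);
}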
Using await Task.WhenAll or similar approaches without an explicit new thread is not an option for me. The "worker" thread is run in background (to avoid blocking the main thread) and is later processed by other services in the system.
This argument doesn't make any sense. Asynchronous code won't block the main thread because it's asynchronous. There's no explicit thread necessary.
If, for some unknown reason, you really do need a background thread, then just wrap your code in Task.Run. Thread should only ever be used for COM interop; any other use of Thread is legacy code as soon as it is written.
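If a background thread really is required, a sketch of the Task.Run wrapper (reusing the hypothetical DownloadPagesAsync from above):
// Pushes the whole operation off the calling thread; the caller can await
// backgroundWork later or hand it to another component in the system.
var backgroundWork = Task.Run(() => DownloadPagesAsync(client));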
System.Threading.Thread has been in .NET since version 1.0. It allows you to control multiple worker threads within your application, but each individual thread runs on only one core at a time.
The Task Parallel Library (TPL) makes it much easier to leverage multiple cores on your machine with tasks (System.Threading.Tasks.Task and Task<T>).
My approach for your "multiple downloader" scenario would be to create a new CancellationTokenSource, which allows me to cancel my tasks. Then I would create my Task<bool> instances and start them. You can use Task.WaitAll() to sit and wait for them to finish.
You should be aware that you can chain your tasks together in a sequence by using the ContinueWith<T>() method.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
namespace ConsoleApp2
{
    class Program
    {
        static bool DownloadFile(string path, CancellationToken token)
        {
            // Do something here. Long-running download work.
            // Check for cancellation -> token.IsCancellationRequested
            return true;
        }
        static void Main(string[] args)
        {
            var paths = new[] { "Somepaths", "to the files youwant", "to download" };
            List<Task<bool>> results = new List<Task<bool>>();
            var cts = new CancellationTokenSource();
            foreach (var path in paths)
            {
                // The path is passed as the task's state object; the token comes from the closure.
                var task = new Task<bool>(_path => DownloadFile((string)_path, cts.Token), path, cts.Token);
                task.Start();
                results.Add(task);
            }
            // use cts.Cancel(); to cancel all associated tasks.
            // Task.WhenAll() to do something when they are all done.
            // Task.WaitAll(results.ToArray()); // to sit and wait.
            Console.WriteLine("Press <Enter> to quit.");
            var final = Console.ReadLine();
        }
    }
}
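As a small illustration of the ContinueWith chaining mentioned above, a follow-up step could be attached to each task inside the foreach loop (the logging is just an example):
// Runs only after that particular download has completed successfully.
task.ContinueWith(
    t => Console.WriteLine("Downloaded {0}: {1}", path, t.Result),
    TaskContinuationOptions.OnlyOnRanToCompletion);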
I am using the Task Parallel Library like this in a .aspx page:
Parallel.Invoke(
new Action[]
{
() => { users= service.DoAbc(); },
() => { products= service.DoDef(); }
});
Previously I was firing off a thread per call, and it was more responsive than it is now when I use Parallel.Invoke.
Should I assume the TPL will do what's best, or is there a way for me to tweak it so it actually does the calls in parallel?
I guess it comes down to the type of hardware my website is running on, which I believe is a VM.
Each of my calls makes an HTTP request to fetch results from an API.
Parallel.Invoke will run your methods in parallel unless this is more expensive than running them sequentially, or there are no available threads in the thread pool. This is an optimization, not an issue. Under normal circumstances you shouldn't try to second-guess the framework; just let it do its job.
You should consider overriding this behavior if you want to invoke some long-running IO-bound methods. Parallel.Invoke uses the default TaskScheduler, which uses roughly as many threads as there are cores (I'm not sure of the exact number) to avoid overloading the CPU. Overloading is not a concern if your actions just wait for some IO or network call to complete.
You can specify the maximum number of threads using the Parallel.Invoke(ParallelOptions, Action[]) overload. You can also use the ParallelOptions class to pass a cancellation token or specify a custom TaskScheduler, e.g. one that allows you to use more threads than the default scheduler.
You can rewrite your code like this:
Parallel.Invoke(
new ParallelOptions{MaxDegreeOfParallelism=30},
new Action[]
{
() => { users= service.DoAbc(); },
() => { products= service.DoDef(); }
});
Still, you should not try to modify the default options unless you find an actual performance problem. You may end up oversubscribing your CPU and causing delays or thrashing.
You could fire off a couple tasks to handle the calls.
// Change to Task.Factory.StartNew depending on .NET version
var userTask = Task.Run(() => service.DoAbc());
var productsTask = Task.Run(() => service.DoDef());
Task.WaitAll(userTask, productsTask);
users = userTask.Result;
products = productsTask.Result;
I have a "worker" process that is running constantly on a dedicated server, sending emails, processing data extracts etc.
I want to have all of these processes running asynchronously, but I only want one instance of each process running at any one time. If a process is already running, I don't want to queue up running it again.
[example, simplified]
while (true)
{
// SLEEP HERE
Task task1 = Task.Factory.StartNew(() => DataScheduleWorker.Run());
Task task2 = Task.Factory.StartNew(() => EmailQueueWorker.Run());
}
Basically, I want this entire process to run endlessly, with the tasks running in parallel to each other, but only one instance of each task running at any point in time.
How can I achieve this in C# 5? What's the cleanest/best way to implement this?
EDIT
Would something as simple as this suffice, or would this be deemed bad?:
Task dataScheduleTask = null;
while (true)
{
Thread.Sleep(600);
// Data schedule worker
if (dataScheduleTask != null && dataScheduleTask.IsCompleted) dataScheduleTask = null;
if (dataScheduleTask == null)
{
dataScheduleTask = Task.Factory.StartNew(() => DataScheduleWorker.Run());
}
}
This sounds like a perfect job for either an actors framework, or possibly TPL Dataflow. Fundamentally you've got one actor (or block) for each job, waiting for messages and processing them independently of the other actors. In either case, your goal should be to write as little of the thread handling and message passing code as possible - ideally none. This problem has already been largely solved; you just need to evaluate the available options and use the best one for your task. I would probably start with Dataflow, personally.
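A minimal Dataflow sketch under those assumptions, reusing DataScheduleWorker and EmailQueueWorker from the question (it needs the System.Threading.Tasks.Dataflow package; for execution blocks the item currently being processed counts against BoundedCapacity, so a busy block simply declines new triggers):
using System;
using System.Threading;
using System.Threading.Tasks.Dataflow;

class WorkerHost
{
    static void Main()
    {
        // One block per job; BoundedCapacity = 1 means Post returns false while a
        // previous run is still executing, so a job never queues up behind itself.
        var dataScheduleBlock = new ActionBlock<DateTime>(
            _ => DataScheduleWorker.Run(),
            new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });
        var emailQueueBlock = new ActionBlock<DateTime>(
            _ => EmailQueueWorker.Run(),
            new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

        while (true)
        {
            Thread.Sleep(600);
            dataScheduleBlock.Post(DateTime.UtcNow); // simply returns false if that worker is still busy
            emailQueueBlock.Post(DateTime.UtcNow);
        }
    }
}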
I am new to threaded programming. I have to run a few tasks in PARALLEL and in the background (so that the main UI thread remains responsive to user actions) and wait for each one of them to complete before proceeding with further execution.
Something like:
foreach(MyTask t in myTasks)
{
t.DoSomethingInBackground(); // There could be n tasks; to save processing
                             // time I wish to run each of them in parallel
}
// Wait till all tasks complete doing something parallel in background
Console.Write("All tasks Completed. Now we can do further processing");
I understand that there could be several ways to achieve this, but I am looking for the best solution to implement in .NET 4.0 (C#).
To me it would seem like you want Parallel.ForEach
Parallel.ForEach(myTasks, t => t.DoSomethingInBackground());
Console.Write("All tasks Completed. Now we can do further processing");
You can also perform multiple tasks within a single loop
List<string> results = new List<string>(myTasks.Count);
Parallel.ForEach(myTasks, t =>
{
string result = t.DoSomethingInBackground();
lock (results)
{ // lock the list to avoid race conditions
results.Add(result);
}
});
In order for the main UI thread to remain responsive, you will want to use a BackgroundWorker and subscribe to its DoWork and RunWorkerCompleted events and then call
worker.RunWorkerAsync();
worker.RunWorkerAsync(argument); // argument is an object
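A minimal sketch of that wiring, reusing MyTask/myTasks from the question and the Parallel.ForEach shown above (BackgroundWorker lives in System.ComponentModel; the handlers here are illustrative):
var worker = new BackgroundWorker();
worker.DoWork += (sender, e) =>
{
    // Runs on a thread-pool thread, so the UI stays responsive.
    var tasks = (List<MyTask>)e.Argument;
    Parallel.ForEach(tasks, t => t.DoSomethingInBackground());
};
worker.RunWorkerCompleted += (sender, e) =>
{
    // Raised back on the UI thread once DoWork has finished.
    Console.Write("All tasks Completed. Now we can do further processing");
};
worker.RunWorkerAsync(myTasks); // myTasks is passed to DoWork as e.Argument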
You can use the Task library to accomplish this:
string[] urls = ...;
var tasks = urls.Select(url => Task.Factory.StartNew(() => DoSomething(url)));
To avoid blocking the UI thread, you can use ContinueWhenAll in .NET 4.0:
Task.Factory.ContinueWhenAll(tasks.ToArray(), _ =>
    Console.Write("All tasks Completed. Now we can do further processing"));
If you are on .NET 4.5 or later, you can use Task.WhenAll instead.
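For example, from within an async method (a sketch):
// await replaces the ContinueWhenAll callback and does not block the UI thread.
await Task.WhenAll(tasks);
Console.Write("All tasks Completed. Now we can do further processing");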
If you use .NET 4.0 or later, refer to the Parallel class and the Task class. Joseph Albahari wrote a very clear book about this: http://www.albahari.com/threading/part5.aspx#_Creating_and_Starting_Tasks