Replacing threads with tasks

New to threading and tasks here :)
So, I wrote a simple threading program that creates a few threads, runs them asynchronously, then waits for them to finish.
I then changed it to use Tasks. The code does exactly the same thing; the only change is a couple of statements.
So, two questions really:
In the code below, what is the difference?
I'm struggling to figure out async/await. How would I integrate it into the code below? Or, given that every example I find is one async method awaiting another, is this a bad example of using Task to do background work?
Thanks.
using System;
using System.Threading;
using System.Threading.Tasks;
namespace ConsoleApp1
{
class Program
{
static void Main(string[] args)
{
ThreadSample();
TaskSample();
}
private static void ThreadSample()
{
Random r = new Random();
MyThreadTest[] myThreads = new MyThreadTest[4];
Thread[] threads = new Thread[4];
for (int i = 0; i < 4; i++)
{
myThreads[i] = new MyThreadTest($"T{i}", r.Next(1, 500));
threads[i] = new Thread(new ThreadStart(myThreads[i].ThreadSample));
threads[i].Start();
}
for (int i = 0; i < 4; i++)
{
threads[i].Join();
}
System.Console.WriteLine("Finished");
System.Console.ReadKey();
}
private static void TaskSample()
{
Random r = new Random();
MyThreadTest[] myTasks = new MyThreadTest[4];
Task[] tasks = new Task[4];
for (int i = 0; i < 4; i++)
{
myTasks[i] = new MyThreadTest($"T{i}", r.Next(1, 500));
tasks[i] = new Task(new Action(myTasks[i].ThreadSample));
tasks[i].Start();
}
for (int i = 0; i < 4; i++)
{
tasks[i].Wait();
}
System.Console.WriteLine("Finished");
System.Console.ReadKey();
}
}
class MyThreadTest
{
private string name;
private int interval;
public MyThreadTest(string name, int interval)
{
this.name = name;
this.interval = interval;
Console.WriteLine($"Thread created: {name},{interval}");
}
public void ThreadSample()
{
for (int i = 0; i < 5; i++)
{
Thread.Sleep(interval);
Console.WriteLine($"{name} At {i} on thread {Thread.CurrentThread.ManagedThreadId}");
}
}
public void TaskSample()
{
for (int i = 0; i < 5; i++)
{
Thread.Sleep(interval);
Console.WriteLine($"{name} At {i} on thread {Thread.CurrentThread.ManagedThreadId}");
}
}
}
}

The Task Parallel Library (TPL) is an abstraction, and you shouldn't try to compare Tasks directly with threads. The Task object represents the abstract concept of an asynchronous task - a piece of code that should execute asynchronously and which will either complete, fault (throw an exception) or be canceled. The abstraction means you can write and use such tasks without worrying too much about exactly how they're executed asynchronously. There are lots of useful things like ContinueWith() you can use to compose, sequence and otherwise manage tasks.
Threads are a much lower level concrete system facility that can be used to run code asynchronously, but without all the niceties you get from the Task Parallel Library (TPL). If you want to sequence tasks or anything like that, you have to code it yourself.
In your Task example, you're not directly creating any threads. Instead, the Actions you've written are executed by the system thread pool. This can be changed: the TPL abstraction layer provides the TaskScheduler class, which you can extend. If you have some special way of running code asynchronously, you can write a TaskScheduler to use TPL with it.
async/await is compiler sugar. The compiler splits an async method at each await into chunks, and a compiler-generated state machine runs those chunks in sequence as continuations of the awaited tasks. One caution: by default, await captures the current SynchronizationContext and resumes on that context. So if you're doing this in WPF or Windows Forms, your continuation code after an await isn't running on a thread pool thread; it's running on the UI thread. You can disable this by calling ConfigureAwait(false) on the awaited task. Really, async/await is primarily intended for asynchronous programming in UI environments where synchronization to a main thread is important.
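To make the async/await part concrete, here is a minimal sketch of how the question's TaskSample could look with async/await. The names AsyncSample, WorkAsync and RunAllAsync are made up for illustration, and Task.Delay is swapped in for Thread.Sleep so that no thread is blocked while waiting; treat it as one possible shape, not the definitive rewrite.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncSample
{
    // Hypothetical async counterpart of MyThreadTest.ThreadSample.
    // Returns the number of iterations it completed.
    public static async Task<int> WorkAsync(string name, int interval)
    {
        int steps = 0;
        for (int i = 0; i < 5; i++)
        {
            // Asynchronous counterpart of Thread.Sleep: no thread is blocked here.
            await Task.Delay(interval);
            Console.WriteLine($"{name} at {i} on thread {Environment.CurrentManagedThreadId}");
            steps++;
        }
        return steps;
    }

    // Starts four workers, then asynchronously waits for all of them.
    public static async Task<int[]> RunAllAsync()
    {
        var random = new Random();
        Task<int>[] tasks = Enumerable.Range(0, 4)
            .Select(i => WorkAsync($"T{i}", random.Next(1, 500)))
            .ToArray();
        return await Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        int[] steps = await RunAllAsync();
        Console.WriteLine($"Finished: {steps.Length} workers");
    }
}
```

Task.WhenAll plays the role of the Join/Wait loop in the question; the difference is that the caller awaits it instead of blocking on it.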

In the below code, what is the difference?
The difference is big. A Task is a unit of work that runs on a thread from the thread pool; the pool decides how many threads to keep alive based on the amount of queued work. If another Task is queued while idle but still alive threads are in the pool, the pool reuses one of them instead of spinning up a new thread (which is very costly). Multiple tasks can end up using the same thread (not simultaneously, obviously).
Task-based parallelism in a nutshell: Tasks are jobs, and the ThreadPool provisions the resources to complete those jobs. The consequence is smarter, more elastic thread and resource utilization, especially in general-purpose programs that target a variety of execution environments and resource availability, for example VMs in the cloud.
I'm struggling to figure out async/await.
await implies a dependency of one task on another. If you don't have that in your case, other than waiting for all of them to complete, what you are doing is pretty much enough.
If you need such a dependency, you can also achieve it with the TPL via, for example, ContinueWith.

Related

Dynamic limitation of concurrent tasks

Is there a type for this example, such as TaskScheduler or some other type with which I could dynamically change the number of tasks being executed in parallel? I just need a yes or no answer, if yes, which one. It is necessary only for the example below.
using System.Threading;
using System.Threading.Tasks;
namespace ConsoleApp1
{
internal class Program
{
static void Main(string[] args)
{
Task[] tasks = new Task[100000];
for (int i = 0; i < tasks.Length; i++)
{
tasks[i] = Task.Run(() => Thread.Sleep(10000));
}
Task.WaitAll(tasks);
}
}
}

Tasks run on the thread pool, so to some extent you can leverage ThreadPool.SetMaxThreads (note that you cannot set the maximum number of worker threads or I/O completion threads to a number smaller than the number of processors on the computer):
int counter = 0;
ThreadPool.SetMaxThreads(32, 32);
Task[] tasks = new Task[1000];
for (int i = 0; i < tasks.Length; i++)
{
await Task.Yield();
if (i == 500)
{
ThreadPool.SetMaxThreads(16, 16);
}
tasks[i] = Task.Run(() =>
{
Interlocked.Increment(ref counter);
Console.WriteLine($"Concurrently running {Volatile.Read(ref counter)}"); // you will see change in maximum number of concurrent task running
Thread.Sleep(1000);
Interlocked.Decrement(ref counter);
Console.WriteLine($"Concurrently end {Volatile.Read(ref counter)}");
});
}
Task.WaitAll(tasks);
Though in general it is not recommended to change thread pool size:
Use caution when changing the maximum number of threads in the thread pool. While your code might benefit, the changes might have an adverse effect on code libraries you use.
Setting the thread pool size too large can cause performance problems. If too many threads are executing at the same time, the task switching overhead becomes a significant factor.
Another option is to adapt LimitedConcurrencyLevelTaskScheduler from documentation and use TaskFactory to instantiate new tasks.
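As a sketch of the scheduler-plus-TaskFactory idea, the example below uses the built-in ConcurrentExclusiveSchedulerPair instead of the LimitedConcurrencyLevelTaskScheduler sample from the documentation. Note the assumption: its limit is fixed at construction, so it does not change dynamically. The SchedulerSample class and the peak-tracking code are illustrative only.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SchedulerSample
{
    // Runs taskCount blocking work items through a scheduler that allows at most
    // maxConcurrency of them at once; returns the highest concurrency observed.
    public static int Run(int taskCount, int maxConcurrency)
    {
        var pair = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, maxConcurrency);
        var factory = new TaskFactory(pair.ConcurrentScheduler);

        object sync = new object();
        int running = 0, peak = 0;

        Task[] tasks = Enumerable.Range(0, taskCount)
            .Select(_ => factory.StartNew(() =>
            {
                lock (sync) { running++; peak = Math.Max(peak, running); }
                Thread.Sleep(20); // simulated work
                lock (sync) { running--; }
            }))
            .ToArray();

        Task.WaitAll(tasks);
        return peak;
    }

    public static void Main()
    {
        Console.WriteLine($"Peak concurrency: {Run(100, 4)}");
    }
}
```

Unlike ThreadPool.SetMaxThreads, this limits only the tasks created through that factory and leaves the shared thread pool untouched for libraries.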

Limiting the number of parallel task with SemaphoreSlim - why does it work?

In the MS docs you can read about SemaphoreSlim:
"Represents a lightweight alternative to Semaphore that limits the number of threads that can access a resource or pool of resources concurrently."
https://learn.microsoft.com/en-us/dotnet/api/system.threading.semaphoreslim?view=net-5.0
In my understanding a Task is different from Thread. Task is higher level than Thread. Different tasks can run on the same thread. Or a task can be continued on another thread than it was started on.
(Compare: "server-side applications in .NET using asynchrony will use very few threads without limiting themselves to that. If everything really can be served by a single thread, it may well be - if you never have more than one thing to do in terms of physical processing, then that's fine."
from in C# how to run method async in the same thread)
IMO if you put this information together, the conclusion is that you can’t limit the number of Tasks running in parallel with the use of a semaphore slim, but…
there are other texts that give this kind of advice (How to limit the amount of concurrent async I/O operations?, see “You can definitely do this…”)
When I execute the code below on my machine, it seems it IS possible. With different values for _MaxDegreeOfParallelism and different ranges of numbers, _RunningTasksCount never exceeds the limit given by _MaxDegreeOfParallelism.
Can somebody provide some information to clarify?
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Hello World!");
IRunner runner = new RunnerSemaphore();
runner.Run();
Console.WriteLine("Hit any key to close...");
Console.ReadLine();
}
}
public class RunnerSemaphore : IRunner
{
private readonly SemaphoreSlim _ConcurrencySemaphore;
private List<int> _Numbers;
private int _MaxDegreeOfParallelism = 3;
private object _RunningTasksLock = new object();
private int _RunningTasksCount = 0;
public RunnerSemaphore()
{
_ConcurrencySemaphore = new SemaphoreSlim(_MaxDegreeOfParallelism);
_Numbers = Enumerable.Range(1, 100).ToList();
}
public void Run()
{
RunAsync().Wait();
}
private async Task RunAsync()
{
List<Task> allTasks = new List<Task>();
foreach (int number in _Numbers)
{
var task = Task.Run
(async () =>
{
await _ConcurrencySemaphore.WaitAsync();
bool isFast = number != 1;
int delay = isFast ? 200 : 10000;
Console.WriteLine($"Start Work {number}\tManagedThreadId {Thread.CurrentThread.ManagedThreadId}\tRunning {IncreaseTaskCount()} tasks");
await Task.Delay(delay).ConfigureAwait(false);
Console.WriteLine($"End Work {number}\tManagedThreadId {Thread.CurrentThread.ManagedThreadId}\tRunning {DecreaseTaskCount()} tasks");
})
.ContinueWith((t) =>
{
_ConcurrencySemaphore.Release();
});
allTasks.Add(task);
}
await Task.WhenAll(allTasks.ToArray());
}
private int IncreaseTaskCount()
{
int taskCount;
lock (_RunningTasksLock)
{
taskCount = ++ _RunningTasksCount;
}
return taskCount;
}
private int DecreaseTaskCount()
{
int taskCount;
lock (_RunningTasksLock)
{
taskCount = -- _RunningTasksCount;
}
return taskCount;
}
}

Represents a lightweight alternative to Semaphore that limits the number of threads that can access a resource or pool of resources concurrently.
Well, that was a perfectly fine description when SemaphoreSlim was first introduced - it was just a lightweight Semaphore. Since that time, it has gotten new methods (i.e., WaitAsync) that enable it to act like an asynchronous synchronization primitive.
In my understanding a Task is different from Thread. Task is higher level than Thread. Different tasks can run on the same thread. Or a task can be continued on another thread than it was started on.
This is true for what I call "Delegate Tasks". There's also a completely different kind of Task that I call "Promise Tasks". Promise tasks are similar to promises (or "futures") in other languages (e.g., JavaScript), and they just represent the completion of some event. Promise tasks do not "run" anywhere; they just complete based on some future event (usually via a callback).
async methods always return promise tasks. The code in an asynchronous method is not actually run as part of the task; the task itself only represents the completion of the async method. I recommend my async intro for more information about async and how the code portions are scheduled.
if you put this information together, the conclusion is that you can’t limit the number of Tasks running in parallel with the use of a semaphore slim
This is personal preference, but I try to be very careful about terminology, precisely to avoid problems like this question. Delegate tasks may run in parallel, e.g., Parallel. Promise tasks do not "run", and they don't run in "parallel", but you can have multiple concurrent promise tasks that are all in progress. And SemaphoreSlim's WaitAsync is a perfect match for limiting that kind of concurrency.
You may wish to read about Stephen Toub's AsyncSemaphore (and other articles in that series). It's not the same implementation as SemaphoreSlim, but behaves essentially the same as far as promise tasks are concerned.
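A minimal sketch of limiting concurrent promise tasks with WaitAsync might look like the following. The Throttle class, the simulated Task.Delay work and the peak counter are assumptions made for illustration; the try/finally around Release is one common way to make sure the slot is returned even if the work throws.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Runs `count` simulated asynchronous operations, at most `limit` at a time,
    // and returns the highest number that were in progress simultaneously.
    public static async Task<int> RunAsync(int count, int limit)
    {
        using var gate = new SemaphoreSlim(limit);
        object sync = new object();
        int running = 0, peak = 0;

        async Task WorkAsync(int number)
        {
            await gate.WaitAsync();            // asynchronously wait for a free slot
            try
            {
                lock (sync) { running++; peak = Math.Max(peak, running); }
                await Task.Delay(50);          // simulated I/O; no thread is held here
            }
            finally
            {
                lock (sync) { running--; }
                gate.Release();                // always give the slot back
            }
        }

        await Task.WhenAll(Enumerable.Range(1, count).Select(n => WorkAsync(n)));
        return peak;
    }

    public static async Task Main()
    {
        Console.WriteLine($"Peak concurrency: {await RunAsync(20, 3)}");
    }
}
```

No Task.Run is involved: the semaphore limits how many operations are in progress, not how many threads exist.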

Use Parallel.ForEach on method returning task - avoid Task.WaitAll

I've got a method which takes IWorkItem, starts work on it and returns related task. The method has to look like this because of external library used.
public Task WorkOn(IWorkItem workItem)
{
//...start asynchronous operation, return task
}
I want to do this work on multiple work items. I don't know how many of them will be there - maybe 1, maybe 10 000.
The WorkOn method has internal pooling and may block if too many parallel executions are in flight (like SemaphoreSlim.Wait):
public Task WorkOn(IWorkItem workItem)
{
_semaphoreSlim.Wait();
//...blocks above while the pool is full, then starts the asynchronous operation and returns its task
}
My current solution is:
public void Do(params IWorkItem[] workItems)
{
var tasks = new Task[workItems.Length];
for (var i = 0; i < workItems.Length; i++)
{
tasks[i] = WorkOn(workItems[i]);
}
Task.WaitAll(tasks);
}
Question: can I somehow use Parallel.ForEach in this case, to avoid creating 10,000 tasks up front and then waiting on them because of WorkOn's throttling?

That is actually not that easy. You can use Parallel.ForEach to throttle the number of tasks that are spawned, but I am unsure how that will perform or behave in your situation.
As a general rule of thumb I usually try to avoid mixing Task and Parallel.
Surely you can do something like this:
public void Do(params IWorkItem[] workItems)
{
Parallel.ForEach(workItems, (workItem) => WorkOn(workItem).Wait());
}
Under "normal" conditions this should limit your concurrency nicely.
You could also go full async-await and add some limiting to your concurrency with some tricks. But you have to do the concurrency limiting yourself in that case.
const int ConcurrencyLimit = 8;
public async Task Do(params IWorkItem[] workItems)
{
var cursor = 0;
var currentlyProcessing = new List<Task>(ConcurrencyLimit);
while (cursor < workItems.Length)
{
while (currentlyProcessing.Count < ConcurrencyLimit && cursor < workItems.Length)
{
currentlyProcessing.Add(WorkOn(workItems[cursor]));
cursor++;
}
Task finished = await Task.WhenAny(currentlyProcessing);
currentlyProcessing.Remove(finished);
}
await Task.WhenAll(currentlyProcessing);
}
As I said... a lot more complicated. But it limits the concurrency to whatever value you choose, and it properly uses the async-await pattern. If you need a blocking call instead, you can wrap this function in another function and do a blocking .Wait() on the task it returns.
The key to this implementation is the Task.WhenAny function. It returns a task that completes as soon as any task in the supplied list has finished; awaiting it yields that finished task.
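On .NET 6 or later, Parallel.ForEachAsync covers much the same ground as the hand-written loop. The sketch below assumes that runtime; WorkOn here is a hypothetical stand-in that takes an int and just delays, not the external library's method, and the completed counter exists only so the result can be checked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ForEachAsyncSample
{
    // Hypothetical stand-in for WorkOn(IWorkItem): completes after a short delay.
    public static async Task WorkOn(int workItem)
    {
        await Task.Delay(20);
    }

    // Processes all items with at most concurrencyLimit operations in flight
    // and returns how many items were completed.
    public static async Task<int> Do(IEnumerable<int> workItems, int concurrencyLimit)
    {
        int completed = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = concurrencyLimit };

        await Parallel.ForEachAsync(workItems, options, async (item, cancellationToken) =>
        {
            await WorkOn(item);
            Interlocked.Increment(ref completed);
        });

        return completed;
    }

    public static async Task Main()
    {
        int done = await Do(Enumerable.Range(0, 100), 8);
        Console.WriteLine($"Completed {done} work items");
    }
}
```

Items are pulled from the source lazily, so 10,000 work items do not become 10,000 tasks up front.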

Multithreaded code executes by threadnumber-times slower using System.Threading and Visual Studio C# Express Hosting Process

I have a very simple program counting the characters in a string. An integer threadnum sets the number of threads and divides the data by threadnum accordingly into chunks for each thread to process.
Each thread increments the values contained in a shared dictionary, building a character histogram.
private Dictionary<UInt32, int> dict = new Dictionary<UInt32, int>();
In order to wait for all threads to finish and continue with the main process, I invoke Thread.Join
Initially I had a local dictionary for each thread which get merged afterwards, but a shared dictionary worked fine, without locking.
No references are locked in the method BuildDictionary, though locking the dictionary did not significantly impact thread-execution time.
Each thread is timed, and the resulting dictionary compared.
The dictionary content is the same regardless of a single or multiple threads - as it should be.
Each thread takes a fraction determined by threadnum to complete - as it should be.
Problem:
The total time is roughly a multiple of threadnum; that is to say, the execution time increases with the number of threads?
(Unfortunately I cannot run a C# Profiler at the moment. Additionally I would prefer C# 3 code compatibility. )
Others are likely struggling as well. It may be that the VS 2010 express edition vshost process stacks and schedules threads to be run sequentially?
Another MT-performance issue was recently posted here as "Visual Studio C# 2010 Express Debug running Faster than Release".
Code:
public int threadnum = 8;
Thread[] threads = new Thread[threadnum];
Stopwatch stpwtch = new Stopwatch();
stpwtch.Start();
for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
threads[threadidx] = new Thread(BuildDictionary);
threads[threadidx].Start(threadidx);
threads[threadidx].Join(); //Blocks the calling thread, till thread completion
}
WriteLine("Total - time: {0} msec", stpwtch.ElapsedMilliseconds);
Can you help please?
Update:
It appears that the strange behavior of an almost linear slowdown with increasing thread-number is an artifact due to the numerous hooks of the IDE's Debugger.
Running the process outside the developer environment, I actually do get a 30% speed increase on a 2 logical/physical core machine. During debugging I am already at the high end of CPU utilization, and hence I suspect it is wise to have some leeway during development through additional idle cores.
As I did initially, I let each thread compute on its own local data chunk; the results are written back to a shared list under a lock and aggregated after all threads have finished.
Conclusion:
Be heedful of the environment the process is running in.

We can put the dictionary synchronization issues Tony the Lion mentions in his answer aside for the moment, because in your current implementation you are in fact not running anything in parallel!
Let's take a look at what you are currently doing in your loop:
Start a thread.
Wait for the thread to complete.
Start the next thread.
In other words, you should not be calling Join inside the loop.
Instead, you should start all threads as you are doing, but use a signaling construct such as an AutoResetEvent to determine when all threads have completed.
See example program:
class Program
{
static EventWaitHandle _waitHandle = new AutoResetEvent(false);
static void Main(string[] args)
{
int numThreads = 5;
for (int i = 0; i < numThreads; i++)
{
new Thread(DoWork).Start(i);
}
for (int i = 0; i < numThreads; i++)
{
_waitHandle.WaitOne();
}
Console.WriteLine("All threads finished");
}
static void DoWork(object id)
{
Thread.Sleep(1000);
Console.WriteLine(String.Format("Thread {0} completed", (int)id));
_waitHandle.Set();
}
}
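Alternatively, you could just as well call Join in the second loop if you have references to the threads available. That is also the more robust option: an AutoResetEvent does not count signals, so if two threads call Set in quick succession, before the main thread has been released from WaitOne, the second signal is lost and the second loop can wait forever.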
After you have done this you can and should worry about the dictionary synchronization problems.

From MSDN: a Dictionary can support multiple readers concurrently, as long as the collection is not modified.
You say:
but a shared dictionary worked fine, without locking.
Each thread increments the values contained in a shared dictionary
Your program is by definition broken. If you alter the data in the dictionary without proper locking, you will end up with bugs. Nothing more needs to be said.

I wouldn't use a shared static Dictionary. If each thread works on a local copy, you can amalgamate the results once all threads have signalled completion.
WaitHandle.WaitAll avoids any deadlocking on an AutoResetEvent.
using System;
using System.Collections.Generic;
using System.Threading;
class Program
{
    static void Main()
    {
        char[] text = "Some String".ToCharArray();
        int numThreads = 5;
        // I leave the implementation of the next line (and the Partition type) to the OP.
        Partition[] partitions = PartitionWork(text, numThreads);
        ManualResetEvent[] completions = new ManualResetEvent[numThreads];
        IDictionary<char, int>[] results = new IDictionary<char, int>[numThreads];
        for (int i = 0; i < numThreads; i++)
        {
            results[i] = new Dictionary<char, int>();
            completions[i] = new ManualResetEvent(false);
            // Copy the loop variable so that each lambda captures its own index.
            int index = i;
            new Thread(() => BuildDictionary(
                text,
                partitions[index].Start,
                partitions[index].End,
                results[index],
                completions[index])).Start();
        }
        // Blocks until every thread has set its own event. No timeout is passed:
        // WaitAll rejects timeouts longer than Int32.MaxValue milliseconds (about 24 days).
        WaitHandle.WaitAll(completions);
        Console.WriteLine("All threads finished");
        // Merge the results of threads 1..n-1 into the result of thread 0.
        IDictionary<char, int> result = results[0];
        for (int i = 1; i < numThreads; i++)
        {
            foreach (KeyValuePair<char, int> item in results[i])
            {
                if (result.ContainsKey(item.Key))
                {
                    result[item.Key] += item.Value;
                }
                else
                {
                    result.Add(item.Key, item.Value);
                }
            }
        }
    }
    // Counts the characters in text[start..finish] (inclusive) into this thread's own dictionary.
    static void BuildDictionary(
        char[] text,
        int start,
        int finish,
        IDictionary<char, int> result,
        EventWaitHandle completed)
    {
        for (int i = start; i <= finish; i++)
        {
            if (result.ContainsKey(text[i]))
            {
                result[text[i]]++;
            }
            else
            {
                result.Add(text[i], 1);
            }
        }
        // Signal the main thread that this partition is done.
        completed.Set();
    }
}
With this implementation the only variable that is ever shared is the char[] of the text, and that is always read-only.
You do have the burden of merging the dictionaries at the end, but that is a small price for avoiding any concurrency issues. In a later version of the framework I would have used TPL and ConcurrentDictionary and possibly Partitioner<TSource>.
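For comparison, here is roughly what the TPL version could look like on .NET 4 or later. This sketch keeps the local-dictionary-then-merge idea and uses Parallel.ForEach with a range Partitioner rather than a ConcurrentDictionary; the Histogram class and Count method are names invented for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class Histogram
{
    // Counts character occurrences in parallel. Each worker fills its own
    // dictionary; the per-worker results are merged under a lock at the end.
    public static Dictionary<char, int> Count(string text)
    {
        var total = new Dictionary<char, int>();
        if (text.Length == 0) return total; // Partitioner.Create rejects empty ranges

        object mergeLock = new object();
        Parallel.ForEach(
            Partitioner.Create(0, text.Length),      // index ranges [Item1, Item2)
            () => new Dictionary<char, int>(),       // one local dictionary per worker
            (range, loopState, local) =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    local.TryGetValue(text[i], out int count);
                    local[text[i]] = count + 1;
                }
                return local;
            },
            local =>
            {
                lock (mergeLock)
                {
                    foreach (KeyValuePair<char, int> pair in local)
                    {
                        total.TryGetValue(pair.Key, out int count);
                        total[pair.Key] = count + pair.Value;
                    }
                }
            });
        return total;
    }

    public static void Main()
    {
        foreach (KeyValuePair<char, int> pair in Count("Some String"))
            Console.WriteLine($"'{pair.Key}': {pair.Value}");
    }
}
```

The lock is taken once per worker rather than once per character, so contention stays low.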

I totally agree with TonyTheLion and others: once you fix the actual problem of joining in the wrong place, there will still be a problem with the (missing) locks when updating the shared dictionary. I wanted to drop you a quick workaround: just wrap your integer value in an object:
instead of:
Dictionary<uint, int> dict = new Dictionary<uint, int>();
use:
class Entry { public int value; }
Dictionary<uint, Entry> dict = new Dictionary<uint, Entry>();
and now increment Entry.value instead. That way, the Dictionary itself does not see any modification when a count changes, so the dictionary does not need to be locked for the increment.
Note: this only works if each thread is guaranteed to use only its own Entry. I've just noticed this is not true, as you said 'histogram of characters'. You will have to lock on each Entry during the increment, or some increments may be lost. Still, locking at the Entry level will be significantly faster than locking the whole dictionary.

Rotem saw it.
Your main thread should Join the other threads only after having started all of them.
Otherwise it waits for the first thread to finish before it starts, and then waits for, the second one.
for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
threads[threadidx] = new Thread(BuildDictionary);
threads[threadidx].Start(threadidx);
}
for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
threads[threadidx].Join(); //Blocks the calling thread, till thread completion
}

As Rotem points out, by joining inside the loop you are waiting for each thread to complete before continuing.
The hint for why this is can be found in the Thread.Join documentation on MSDN:
Blocks the calling thread until a thread terminates
So your loop will not continue until that one thread has completed its work. To start all the threads and then wait for them to complete, join them in a second loop:
public int threadnum = 8;
Thread[] threads = new Thread[threadnum];
Stopwatch stpwtch = new Stopwatch();
stpwtch.Start();
// Start all the threads doing their work
for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
threads[threadidx] = new Thread(BuildDictionary);
threads[threadidx].Start(threadidx);
}
// Join to all the threads to wait for them to complete
for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
threads[threadidx].Join();
}
System.Diagnostics.Debug.WriteLine("Total - time: {0} msec", stpwtch.ElapsedMilliseconds);
You will really need to post your BuildDictionary function. It is very likely that the operation will be no faster with multiple threads and the threading overhead will actually increase execution time.

Separate threadPool for each task

I've got an application which has two main tasks: encoding and processing video.
These tasks are independent.
I would like to run each task with a configurable number of threads.
For a single task I usually use ThreadPool and SetMaxThreads. But now I've got two tasks and would like a separate thread pool, with its own configurable number of threads, for each task.
However, ThreadPool is a static class. So how can I implement my strategy (an easily configurable number of threads for each task)?
Thanks

You will probably want your own thread pool. If you are using .NET 4.0 then it is actually fairly easy to roll your own if you use the BlockingCollection class.
public class CustomThreadPool
{
private BlockingCollection<Action> m_WorkItems = new BlockingCollection<Action>();
public CustomThreadPool(int numberOfThreads)
{
for (int i = 0; i < numberOfThreads; i++)
{
var thread = new Thread(
() =>
{
while (true)
{
Action action = m_WorkItems.Take();
action();
}
});
thread.IsBackground = true;
thread.Start();
}
}
public void QueueUserWorkItem(Action action)
{
m_WorkItems.Add(action);
}
}
That is really all there is to it. You would create a CustomThreadPool for each actual pool you want to control. I posted the minimum amount of code to get a crude thread pool going. Naturally, you might want to tweak and expand this implementation to suit your specific need.
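As one example of expanding it, the sketch below adds an orderly shutdown and shows two independent pools side by side. DisposableThreadPool and PoolDemo are hypothetical names, the queued lambdas stand in for real encoding and processing work, and error handling is deliberately left out (an exception thrown by a work item would go unhandled on its worker thread).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class DisposableThreadPool : IDisposable
{
    private readonly BlockingCollection<Action> _workItems = new BlockingCollection<Action>();
    private readonly Thread[] _threads;

    public DisposableThreadPool(int numberOfThreads)
    {
        _threads = new Thread[numberOfThreads];
        for (int i = 0; i < numberOfThreads; i++)
        {
            _threads[i] = new Thread(() =>
            {
                // Ends once CompleteAdding has been called and the queue is empty.
                foreach (Action action in _workItems.GetConsumingEnumerable())
                {
                    action();
                }
            });
            _threads[i].IsBackground = true;
            _threads[i].Start();
        }
    }

    public void QueueUserWorkItem(Action action)
    {
        _workItems.Add(action);
    }

    // Stops accepting work, lets the workers drain the queue, then waits for them.
    public void Dispose()
    {
        _workItems.CompleteAdding();
        foreach (Thread thread in _threads)
        {
            thread.Join();
        }
        _workItems.Dispose();
    }
}

public static class PoolDemo
{
    // Queues ten items on each of two separately sized pools; returns how many ran.
    public static int Run()
    {
        int done = 0;
        using (var encodingPool = new DisposableThreadPool(2))
        using (var processingPool = new DisposableThreadPool(4))
        {
            for (int i = 0; i < 10; i++)
            {
                encodingPool.QueueUserWorkItem(() => Interlocked.Increment(ref done));
                processingPool.QueueUserWorkItem(() => Interlocked.Increment(ref done));
            }
        } // Dispose drains both queues before returning
        return done;
    }

    public static void Main()
    {
        Console.WriteLine($"Work items executed: {Run()}");
    }
}
```

Each pool has its own queue and its own threads, so changing the thread count of one has no effect on the other.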
