Handling asynchronous threads - c#

I (think that I) understand the differences between threads and tasks.
Threads allow us to do multiple things in parallel (they are CPU-bound).
Asynchronous tasks release the processor time while some I/O work is done (they are I/O-bound).
Now, let's say I want to do multiple asynchronous tasks in parallel. For example, I want to download several pages of a paged response at the same time. Or, I want to write new data into two different databases. What is the correct way to handle the threads? Should they be async and awaited? Or can the async operation be just inside the thread? What is the best practice for error handling?
I have tried creating my own utility method to start a new async thread, but I have a feeling that it can go horribly wrong.
public static Task<Thread> RunInThreadAsync<T>(T actionParam, Func<T, Task> asyncAction)
{
var thread = new Thread(async () => await asyncAction(actionParam));
thread.Start();
return thread;
}
Is this ok? Or should the method be public static async Task<Thread>? If yes, what should be awaited? There is no thread.StartAsync(). Or should I use Task.Run instead?
Note: Using await Task.WhenAll or similar approaches without an explicit new thread is not an option for me. The "worker" thread is run in background (to avoid blocking the main thread) and is later processed by other services in the system.

I (think that I) understand the differences between threads and tasks.
There's one important concept missing here: concurrency. Concurrency is doing more than one thing at a time. This is different than "parallel", which is a term most developers use to mean "doing more than one thing at a time using threads". So, parallelism is one form of concurrency, and asynchrony is another form of concurrency.
Now, let's say I want to do multiple asynchronous tasks in parallel.
And here's the problem: mixing two forms of concurrency. What you really want to do is multiple asynchronous tasks concurrently. And the way to do this is via Task.WhenAll.
Using await Task.WhenAll or similar approaches without an explicit new thread is not an option for me. The "worker" thread is run in background (to avoid blocking the main thread) and is later processed by other services in the system.
This argument doesn't make any sense. Asynchronous code won't block the main thread because it's asynchronous. There's no explicit thread necessary.
If, for some unknown reason, you really do need a background thread, then just wrap your code in Task.Run. Thread should only ever be used for COM interop; any other use of Thread is legacy code as soon as it is written.
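As a rough illustration of the Task.WhenAll approach for the "several pages at once" case, here is a minimal sketch. The names PagedDownloader, FetchPageAsync and FetchAllAsync are made up for this example, and the fetch only simulates latency with Task.Delay; a real version would call your HTTP or database client instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PagedDownloader
{
    // Hypothetical stand-in for a real HTTP call; it only simulates I/O latency.
    public static async Task<string> FetchPageAsync(int page)
    {
        await Task.Delay(50);
        return $"content of page {page}";
    }

    // Starts every fetch before awaiting any of them, so the waits overlap.
    public static async Task<string[]> FetchAllAsync(IEnumerable<int> pages)
    {
        List<Task<string>> tasks = pages.Select(p => FetchPageAsync(p)).ToList();
        try
        {
            // Results come back in the same order as the tasks were created.
            return await Task.WhenAll(tasks);
        }
        catch
        {
            // await rethrows only the first exception; inspect each task to log them all.
            foreach (Task<string> failed in tasks.Where(task => task.IsFaulted))
                Console.Error.WriteLine(failed.Exception);
            throw;
        }
    }
}
```

No thread is blocked while the pages are in flight, and the caller decides what to do with failures in one place.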

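System.Threading.Thread has been in .NET since the first version. It lets you create and control worker threads in your application directly; the operating system schedules those threads across whatever cores are available.
The Task Parallel Library (TPL), added in .NET Framework 4, provides a higher-level abstraction over the thread pool: you describe units of work as Task or System.Threading.Tasks.Task<T> and let the scheduler spread them across the cores on your machine.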
My approach for your "multiple downloader" scenario would be to create a new CancellationTokenSource, which allows me to cancel my Tasks. Then I would create my Task<T> instances and start them. You can use Task.WaitAll() to sit and wait.
You should be aware that you can chain your tasks together in a sequence by using the ContinueWith<T>() method.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
namespace ConsoleApp2
{
    class Program
    {
        static bool DownloadFile(string path, CancellationToken token)
        {
            // Do something here: a long-running task.
            // Check token.IsCancellationRequested periodically to stop early.
            return true;
        }
        static void Main(string[] args)
        {
            var paths = new[] { "Somepaths", "to the files youwant", "to download" };
            List<Task<bool>> results = new List<Task<bool>>();
            var cts = new CancellationTokenSource();
            foreach (var path in paths)
            {
                // The token is passed both to the task and to the work it runs.
                var task = new Task<bool>(_path => DownloadFile((string)_path, cts.Token), path, cts.Token);
                task.Start();
                results.Add(task);
            }
            // use cts.Cancel(); to cancel all associated tasks.
            // Task.WhenAll() to do something when they are all done.
            // Task.WaitAll( results.ToArray() ); // to sit and wait.
            Console.WriteLine("Press <Enter> to quit.");
            var final = Console.ReadLine();
        }
    }
}

Related

Start async method in new thread - No overload

I have these methods in a class
public async Task GetCompanies(int requestDuration, long startTimepoint)
{
_requestDuration = requestDuration;
_startTimepoint = startTimepoint;
Thread thread = new Thread(new ThreadStart(Test));
// This line doesnt compile - No overload for GetCompaniesApi matches delegate ThreadStart
Thread thread2 = new Thread(new ThreadStart(GetCompaniesApi));
}
public void Test()
{
}
public async Task GetCompaniesApi (int requestDuration, long? startTimepoint)
{
// code removed as not relevant
}
So my question is how can I run a method that is async in a different thread, I don't really know what "No overload for GetCompaniesApi matches delegate ThreadStart" means, or what I need to change.
EDIT
If I just explain fully what i'm trying to do that might be better than the more specific question I asked at the start.
Basically I want to call a HTTP GET request which is streaming, as in it never ends, so I want to force the HTTP GET request to end after X seconds and whatever we have got from the response body at that point will be it.
So in order to try and do this I thought I'd run that HTTP GET request in a separate thread, then sleep the main thread, then somehow get the other thread to stop. I don't see how it's possible to use cancellation tokens, as the thread is stuck on the line "await _streamToReadFrom.CopyToAsync(streamToWriteTo);" all the time.
public async Task GetCompanies(int requestDuration, long startTimepoint)
{
Task task = Task.Run(() => { GetCompaniesApi(requestDuration, startTimepoint); });
Thread.Sleep(requestDuration * 1000);
// Is it now possible to cancel task?
}
public async Task GetCompaniesApi (int requestDuration, long? startTimepoint)
{
string url = $"https://stream.companieshouse.gov.uk/companies?timepoint={startTimepoint}";
using (HttpResponseMessage response = await _httpClient.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
using (_streamToReadFrom = await response.Content.ReadAsStreamAsync())
{
string fileToWriteTo = Path.GetTempFileName();
using (Stream streamToWriteTo = System.IO.File.Open(fileToWriteTo, FileMode.Create))
{
await _streamToReadFrom.CopyToAsync(streamToWriteTo);
}
}
}
ThreadStart is a delegate representing a parameterless method with a void return value (so it is not async aware, i.e. the thread will not wait for task completion), while your method requires 2 parameters to be passed. So from a purely technical standpoint you can do something like new ThreadStart(() => GetCompaniesApi(1,2)), or just new Thread(() => GetCompaniesApi(1, 2)) (the compiler will create the delegate for you). But you should not.
In modern .NET there is rarely a need to create threads directly; just use Task.Run to schedule your method on the thread pool (do not forget to provide the parameters):
await Task.Run(() => Test());
For async method - just invoke it:
var result = await GetCompaniesApi(someInt, someNullableLong);
If for some reason it is not actually async then better fix the method itself, but if needed you can wrap it into Task.Run too.
I think the question reveals a potential misunderstanding of multi threading and async.
The main purpose of async is to hide the latency of IO operations, like network or disk access. That helps reducing resource usage, and keeping the UI responsive.
Background threads are mostly for hiding the latency of computations, for example if you are doing some slow image processing task. In that case you typically use Task.Run to execute a synchronous method, and await the result. Just make sure the method is thread safe.
While there are cases that mix these types of work and where you need to combine both methods, it is not that common. From your code I would guess your work is mostly IO related, so you should probably not use any background threads. Note that some libraries lie about being asynchronous, i.e. they have methods that return a task but still block. In that case, treating the method as synchronous and using Task.Run to execute it might be warranted.
Also keep in mind that multi threaded programming is difficult. It introduces a great number of new types of faults, and most faults will not be caught by the compiler, and many are spurious and difficult to reproduce. So you really need to be aware of the dangers and know how to correctly use locks and other forms of synchronization.
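For the streaming scenario in the question's edit, a cancellation token that fires after a timeout is one option instead of a separate thread plus Thread.Sleep. The sketch below assumes the source stream honors the token while a read is pending, which depends on the stream implementation and runtime. TimedCopy, CopyForAsync and EndlessStream are names invented for this example; EndlessStream only stands in for a response stream that never ends.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class TimedCopy
{
    // Copies source to destination until the source ends or the timeout elapses.
    // Returns true if the copy ran to completion, false if it was cut off.
    public static async Task<bool> CopyForAsync(Stream source, Stream destination, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource(timeout); // cancels itself after the timeout
        try
        {
            await source.CopyToAsync(destination, 81920, cts.Token);
            return true;
        }
        catch (OperationCanceledException)
        {
            // Whatever was copied before the timeout is already in destination.
            return false;
        }
    }
}

// Hypothetical endless source, used only to demonstrate the timeout.
public sealed class EndlessStream : Stream
{
    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count)
    {
        Thread.Sleep(10);
        buffer[offset] = 42;
        return 1;
    }
    // Produces one byte every 10 ms and observes the cancellation token.
    public override async Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken)
    {
        await Task.Delay(10, cancellationToken);
        buffer[offset] = 42;
        return 1;
    }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

In the question's code, the same idea would mean passing the token to CopyToAsync and writing into the file stream; the calling method then simply awaits the copy rather than sleeping.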

Do race conditions exist when using just async/await?

.NET 5.0
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Threading.Tasks;
using System;
using System.Collections.Generic;
namespace AsyncTest
{
[TestClass]
public class AsyncTest
{
public async Task AppendNewIntVal(List<int> intVals)
{
await Task.Delay(new Random().Next(15, 45));
intVals.Add(new Random().Next());
}
public async Task AppendNewIntVal(int count, List<int> intVals)
{
var appendNewIntValTasks = new List<Task>();
for (var a = 0; a < count; a++)
{
appendNewIntValTasks.Add(AppendNewIntVal(intVals));
}
await Task.WhenAll(appendNewIntValTasks);
}
[TestMethod]
public async Task TestAsyncIntList()
{
var appendCount = 30;
var intVals = new List<int>();
await AppendNewIntVal(appendCount, intVals);
Assert.AreEqual(appendCount, intVals.Count);
}
}
}
The above code compiles and runs, but the test fails with output similar to:
Assert.AreEqual failed. Expected:<30>. Actual:<17>.
In the above example the "Actual" value is 17, but it varies between executions.
I know I am missing some understanding around how asynchronous programming works in .NET as I'm not getting the expected output.
From my understanding, the AppendNewIntVal method kicks off N number of tasks, then waits for them all to complete. If they have all completed, I'd expect they would have each appended a single value to the list but that's not the case. It looks like there's a race condition but I didn't think that was possible because the code is not multithreaded. What am I missing?
Yes, if you don't await each awaitable immediately, i.e. here:
appendNewIntValTasks.Add(AppendNewIntVal(intVals));
This line is in async terms equivalent to (in thread-based code) Thread.Start, and we now have no safety around the inner async code:
intVals.Add(new Random().Next());
which can now fail in the same concurrency ways when two flows call Add at the same time. You should also probably avoid new Random(), as that isn't necessarily random (it is time based on many framework versions, and can end up with two flows getting the same seed).
So: the code as shown is indeed dangerous.
The obviously safe version is:
public async Task AppendNewIntVal(int count, List<int> intVals)
{
for (var a = 0; a < count; a++)
{
await AppendNewIntVal(intVals);
}
}
It is possible to defer the await, but you're explicitly opting into concurrency when you do that, and your code needs to handle it suitably defensively.
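One defensive way to keep the concurrency is to stop sharing the list at all: let each operation return its value and collect the results from Task.WhenAll. This is only a sketch; SafeAppend, ProduceValueAsync and ProduceManyAsync are hypothetical names, and the returned value is a placeholder for whatever the real operation computes.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SafeAppend
{
    // Each operation returns its value instead of mutating a shared list.
    private static async Task<int> ProduceValueAsync(int seed)
    {
        await Task.Delay(15 + seed % 30); // simulated I/O latency
        return seed * 2;                  // placeholder for the real result
    }

    public static async Task<List<int>> ProduceManyAsync(int count)
    {
        var tasks = new List<Task<int>>();
        for (var i = 0; i < count; i++)
            tasks.Add(ProduceValueAsync(i));

        // Only this method touches the list, and only after every task has finished.
        int[] values = await Task.WhenAll(tasks);
        return new List<int>(values);
    }
}
```

Because no two continuations write to the same collection, the count is always what you asked for, with or without a SynchronizationContext.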
Yes, race conditions do exist.
Async methods are basically tasks that can potentially run in parallel, depending on a task scheduler they are submitted to. The default one is ThreadPoolTaskScheduler, which is a wrapper around ThreadPool. Thus, if you submit your tasks to a scheduler (thread pool) that can execute multiple tasks in parallel, you are likely going to run into race conditions.
You could make your code a bit safer:
lock (intVals) intVals.Add(new Random().Next());
But then this opens up another can of worms :)
If you are interested in more details about async programming, see this link. Also this article is quite useful and explains best practices in asynchronous programming.
Happy (asynchronous) coding!
Yes, race conditions are indeed possible when using async/await in a way that introduces concurrency. To introduce concurrency you must:
1. Launch multiple asynchronous operations concurrently, i.e. launch the next operation without awaiting the completion of the previous operation, and,
2. Have no ambient synchronization¹ mechanism in place, namely a SynchronizationContext, that would synchronize the execution of the continuations of the asynchronous operations.
In your case both conditions are met, so the continuations are running on multiple threads concurrently. And since the List<T> class is not thread-safe, you get undefined behavior.
To see what effect a SynchronizationContext has in a situation like this, you can install the Nito.AsyncEx.Context package and do this:
[TestMethod]
public void TestAsyncIntList()
{
AsyncContext.Run(async () =>
{
var appendCount = 30;
var intVals = new List<int>();
await AppendNewIntVal(appendCount, intVals);
Assert.AreEqual(appendCount, intVals.Count);
});
}
FYI many types of applications install automatically a SynchronizationContext when launched (WPF and Windows Forms to name a few). Console applications do not though, hence the need to be extra cautious when writing an async-enabled console application.
¹ It's worth noting that the similar-looking terms synchronized/unsynchronized and synchronous/asynchronous are mostly unrelated. This can be a source of confusion for someone who is not familiar with these terms. The first term is about preventing multiple threads from accessing a shared resource concurrently. The second term is about doing something without blocking a thread.

Task based vs. thread based Watchdog - but async needed

We're using watchdogs to determine whether a connected system is still alive or not.
In the previous code we used TCP directly and ran the watchdog in a separate thread. Now a new service is used that provides its data using gRPC.
For that we tried using the async interface with tasks, but a task-based watchdog will fail.
I wrote a small DEMO that abstracts the code and illustrates the problem. You can switch between task based watchdog and thread based watchdog by commenting out line 18 with //.
The demo contains this code that causes the problem:
async Task gRPCSendAsync(CancellationToken cancellationToken = default) => await Task.Yield();
async Task gRPCReceiveAsync(CancellationToken cancellationToken = default) => await Task.Yield();
var start = DateTime.UtcNow;
await gRPCSendAsync(cancellationToken).ConfigureAwait(false);
await gRPCReceiveAsync(cancellationToken).ConfigureAwait(false);
var end = DateTime.UtcNow;
if ((end - start).TotalMilliseconds >= 100)
// signal failing
If this code is used in Task.Run, it will signal failure if the application has a lot of CPU work to do in other tasks.
If a dedicated thread is used, the watchdog works as expected and no problem is raised.
I do understand the problem: All code after await may be (if not finished already or does not contain a "real" await) queued to the thread pool. But the thread pool has other things to do so that it took too long to finish the method.
Yes the simple answer is: USE THREAD.
But using a thread limits us to synchronous methods only. There is no way to call an async method from within a thread. I created another sample that shows that all code after the first await will be queued to the thread pool, so CallAsync().Wait() will not work. (By the way, that issue is covered in much more detail here.)
We're having a lot of async code that may be used within such time critical operations.
So the question is: Is there any way to perform those operations using tasks with async/await?
Maybe I'm completely wrong and creating a task-based watchdog should be done very differently.
Thoughts
I was thinking about System.Threading.Timer but the problem of async sending and async receiving will cause that problem anyways.
Here is how you could use Stephen Cleary's AsyncContext class from the Nito.AsyncEx.Context package, in order to constrain an asynchronous workflow to a dedicated thread:
await Task.Factory.StartNew(() =>
{
AsyncContext.Run(async () =>
{
await DoTheWatchdogAsync(watchdogCts.Token);
});
}, TaskCreationOptions.LongRunning);
The call to AsyncContext.Run will block until the supplied asynchronous operation is completed. All asynchronous continuations created by the DoTheWatchdogAsync will be processed internally by the AsyncContext on the current thread. In the above example the current thread is not a ThreadPool thread, because of the flag TaskCreationOptions.LongRunning used in the construction of the wrapper Task. You could confirm this by querying the property Thread.CurrentThread.IsThreadPoolThread.
If you prefer you could use a traditional Thread constructor instead of the somewhat unconventional Task.Factory.StartNew+LongRunning.
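To make the mechanism less of a black box, here is a minimal sketch of a single-threaded context that pumps continuations on the calling thread. It is an illustration only and not the Nito.AsyncEx implementation; PumpContext and its Run method are names invented for this example. Note that any await inside the workflow that uses ConfigureAwait(false) will still leave the dedicated thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Minimal single-threaded context; an illustration, not the Nito.AsyncEx implementation.
public sealed class PumpContext : SynchronizationContext
{
    private readonly BlockingCollection<KeyValuePair<SendOrPostCallback, object>> _queue =
        new BlockingCollection<KeyValuePair<SendOrPostCallback, object>>();

    // Continuations captured by await arrive here and are queued for the pump thread.
    public override void Post(SendOrPostCallback d, object state) =>
        _queue.Add(new KeyValuePair<SendOrPostCallback, object>(d, state));

    // Blocks the calling thread and runs the workflow, including its continuations, on it.
    public static void Run(Func<Task> workflow)
    {
        SynchronizationContext previous = Current;
        var context = new PumpContext();
        SetSynchronizationContext(context);
        try
        {
            Task task = workflow();
            // Stop the pump once the workflow has finished.
            task.ContinueWith(_ => context._queue.CompleteAdding(), TaskScheduler.Default);
            foreach (var item in context._queue.GetConsumingEnumerable())
                item.Key(item.Value);
            task.GetAwaiter().GetResult(); // rethrow any exception from the workflow
        }
        finally
        {
            SetSynchronizationContext(previous);
        }
    }
}
```

With the traditional Thread constructor, the watchdog would be started with something like new Thread(() => PumpContext.Run(() => DoTheWatchdogAsync(token))), and every continuation that captures the context would then run on that one thread.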

Waiting on a continuous UI background polling task

I am somewhat new to parallel programming in C# (when I started my project I worked through the MSDN examples for TPL) and would appreciate some input on the following example code.
It is one of several background worker tasks. This specific task pushes status messages to a log.
var uiCts = new CancellationTokenSource();
var globalMsgQueue = new ConcurrentQueue<string>();
var backgroundUiTask = new Task(
() =>
{
while (!uiCts.IsCancellationRequested)
{
while (globalMsgQueue.Count > 0)
ConsumeMsgQueue();
Thread.Sleep(backgroundUiTimeOut);
}
},
uiCts.Token);
// Somewhere else entirely
backgroundUiTask.Start();
Task.WaitAll(backgroundUiTask);
I'm asking for professional input after reading several topics like Alternatives to using Thread.Sleep for waiting, Is it always bad to use Thread.Sleep()?, When to use Task.Delay, when to use Thread.Sleep?, Continuous polling using Tasks
Which prompts me to use Task.Delay instead of Thread.Sleep as a first step and introduce TaskCreationOptions.LongRunning.
But I wonder what other caveats I might be missing? Is polling the MsgQueue.Count a code smell? Would a better version rely on an event instead?
First of all, there's no reason to use Task.Start or use the Task constructor. Tasks aren't threads, they don't run themselves. They are a promise that something will complete in the future and may or may not produce any results. Some of them will run on a threadpool thread. Use Task.Run to create and run the task in a single step when you need to.
I assume the actual problem is how to create a buffered background worker. .NET already offers classes that can do this.
ActionBlock<T>
The ActionBlock class already implements this and a lot more - it allows you to specify how big the input buffer is, how many tasks will process incoming messages concurrently, supports cancellation and asynchronous completion.
A logging block could be as simple as this :
_block = new ActionBlock<string>(msg => File.AppendAllText("myLog.txt", msg));
The ActionBlock class itself takes care of buffering the inputs, feeding new messages to the worker function when it arrives, potentially blocking senders if the buffer gets full etc. There's no need for polling.
Other code can use Post or SendAsync to send messages to the block :
_block.Post("some message");
When we are done, we can tell the block to Complete() and await for it to process any remaining messages :
_block.Complete();
await _block.Completion;
Channels
A newer, lower-level option is to use Channels. You can think of channels as a kind of asynchronous queue, although they can be used to implement complex processing pipelines. If ActionBlock was written today, it would use Channels internally.
With channels, you need to provide the "worker" task yourself. There's no need for polling though, as the ChannelReader class allows you to read messages asynchronously or even use await foreach.
The writer method could look like this :
public ChannelWriter<string> LogIt(string path, CancellationToken token = default)
{
    var channel = Channel.CreateUnbounded<string>();
    var writer = channel.Writer;
    _ = Task.Run(async () =>
    {
        await foreach (var msg in channel.Reader.ReadAllAsync(token))
        {
            File.AppendAllText(path, msg);
        }
    }, token).ContinueWith(t => writer.TryComplete(t.Exception));
    return writer;
}
....
_logWriter=LogIt(somePath);
Other code can send messages by using WriteAsync or TryWrite, eg :
_logWriter.TryWrite(someMessage);
When we're done, we can call Complete() or TryComplete() on the writer :
_logWriter.TryComplete();
The line
.ContinueWith(t => writer.TryComplete(t.Exception));
is needed to ensure the channel is closed even if an exception occurs or the cancellation token is signaled.
This may seem too cumbersome at first. Channels allow us to easily run initialization code or carry state from one message to the next. We could open a stream before the loop starts and use it instead of reopening the file each time we call File.AppendAllText, eg :
public ChannelWriter<string> LogIt(string path, CancellationToken token = default)
{
    var channel = Channel.CreateUnbounded<string>();
    var writer = channel.Writer;
    _ = Task.Run(async () =>
    {
        //***** Can't do this with an ActionBlock ****
        // Named "file" so it does not clash with the channel writer above.
        using (var file = File.AppendText(path))
        {
            await foreach (var msg in channel.Reader.ReadAllAsync(token))
            {
                file.WriteLine(msg);
                //Or
                //await file.WriteLineAsync(msg);
            }
        }
    }, token).ContinueWith(t => writer.TryComplete(t.Exception));
    return writer;
}
Definitely Task.Delay is better than Thread.Sleep, because you will not be blocking the thread on the pool, and during the wait the thread on the pool will be available to handle other tasks. Then, you don't need to make your task long-running. Long-running tasks are run in a dedicated thread, and then Task.Delay is meaningless.
Instead, I will recommend a different approach. Just use System.Threading.Timer and make your life simple. Timers are kernel objects that will run their callback on the thread pool, and you will not have to worry about delay or sleep.
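If you want to keep the polling loop rather than switch to a timer, the Task.Delay variant could look roughly like this. It is a sketch under assumptions: QueuePoller and PollAsync are made-up names, and the consume delegate stands in for the question's ConsumeMsgQueue.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class QueuePoller
{
    // Drains the queue, then waits asynchronously; no thread is blocked between polls.
    public static async Task PollAsync(
        ConcurrentQueue<string> queue,
        Action<string> consume,
        TimeSpan interval,
        CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            while (queue.TryDequeue(out var message))
                consume(message);

            try
            {
                await Task.Delay(interval, token);
            }
            catch (OperationCanceledException)
            {
                break; // cancellation was requested while waiting
            }
        }
    }
}
```

TryDequeue replaces the Count check, so the loop never acts on a stale count, and cancelling the token ends the wait immediately instead of after the next sleep.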
The TPL Dataflow library is the preferred tool for this kind of job. It allows building efficient producer-consumer pairs quite easily, and more complex pipelines as well, while offering a complete set of configuration options. In your case using a single ActionBlock should be enough.
A simpler solution you might consider is to use a BlockingCollection. It has the advantage of not requiring the installation of any package (because it is built-in), and it's also much easier to learn. You don't have to learn more than the methods Add, CompleteAdding, and GetConsumingEnumerable. It also supports cancellation. The drawback is that it's a blocking collection, so it blocks the consumer thread while waiting for new messages to arrive, and the producer thread while waiting for available space in the internal buffer (only if you specify a boundedCapacity in the constructor).
var uiCts = new CancellationTokenSource();
var globalMsgQueue = new BlockingCollection<string>();
var backgroundUiTask = new Task(() =>
{
foreach (var item in globalMsgQueue.GetConsumingEnumerable(uiCts.Token))
{
ConsumeMsgQueueItem(item);
}
}, uiCts.Token);
The BlockingCollection uses a ConcurrentQueue internally as a buffer.

Using Task.Yield to overcome ThreadPool starvation while implementing producer/consumer pattern

Answering the question: Task.Yield - real usages?
I proposed to use Task.Yield allowing a pool thread to be reused by other tasks. In such pattern:
CancellationTokenSource cts;
void Start()
{
cts = new CancellationTokenSource();
// run async operation
var task = Task.Run(() => SomeWork(cts.Token), cts.Token);
// wait for completion
// after the completion handle the result/ cancellation/ errors
}
async Task<int> SomeWork(CancellationToken cancellationToken)
{
int result = 0;
bool loopAgain = true;
while (loopAgain)
{
// do something ... means a substantial work or a micro batch here - not processing a single byte
loopAgain = /* check for loop end && */ !cancellationToken.IsCancellationRequested;
if (loopAgain) {
// reschedule the task to the threadpool and free this thread for other waiting tasks
await Task.Yield();
}
}
cancellationToken.ThrowIfCancellationRequested();
return result;
}
void Cancel()
{
// request cancelation
cts.Cancel();
}
But one user wrote
I don't think using Task.Yield to overcome ThreadPool starvation while
implementing producer/consumer pattern is a good idea. I suggest you
ask a separate question if you want to go into details as to why.
Does anybody know why it is not a good idea?
There are some good points left in the comments to your question. Being the user you quoted, I'd just like to sum it up: use the right tool for the job.
Using ThreadPool doesn't feel like the right tool for executing multiple continuous CPU-bound tasks, even if you try to organize some cooperative execution by turning them into state machines which yield CPU time to each other with await Task.Yield(). Thread switching is rather expensive; by doing await Task.Yield() on a tight loop you add a significant overhead. Besides, you should never take over the whole ThreadPool, as the .NET framework (and the underlying OS process) may need it for other things. On a related note, TPL even has the TaskCreationOptions.LongRunning option that requests to not run the task on a ThreadPool thread (rather, it creates a normal thread with new Thread() behind the scene).
That said, using a custom TaskScheduler with limited parallelism on some dedicated, out-of-pool threads with thread affinity for individual long-running tasks might be a different thing. At least, await continuations would be posted on the same thread, which should help reducing the switching overhead. This reminds me of a different problem I was trying to solve a while ago with ThreadAffinityTaskScheduler.
Still, depending on a particular scenario, it's usually better to use an existing well-established and tested tool. To name a few: Parallel Class, TPL Dataflow, System.Threading.Channels, Reactive Extensions.
There is also a whole range of existing industrial-strength solutions to deal with Publish-Subscribe pattern (RabbitMQ, PubNub, Redis, Azure Service Bus, Firebase Cloud Messaging (FCM), Amazon Simple Queue Service (SQS) etc).
After a bit of debate on the issue with other users, who are worried about context switching and its influence on performance, I see what they are worried about.
But I meant: do something ... inside the loop to be a substantial task - usually in the form of a message handler which reads a message from the queue and processes it. The message handlers are usually user defined and the message bus executes them using some sort of dispatcher. The user can implement a handler which executes synchronously (nobody knows what the user will do), and without Task.Yield that will block the thread to process those synchronous tasks in a loop.
To back this up, I added tests to GitHub: https://github.com/BBGONE/TestThreadAffinity
They compare the ThreadAffinityTaskScheduler, .NET ThreadScheduler with BlockingCollection and .NET ThreadScheduler with Threading.Channels.
The tests show that for ultra-short jobs the performance degradation is around 15%. To use Task.Yield without even that small degradation, avoid extremely short tasks; if a task is too short, combine several short tasks into a bigger batch.
[The price of context switch] = [context switch duration] / ([job duration]+[context switch duration]).
In that case the influence of task switching on performance is negligible, while it improves task cooperation and the responsiveness of the system.
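To get a feel for the formula above, it can be written as a one-line helper. The durations used in the note after the code are made-up illustrative numbers, not measurements from the linked tests, and SwitchCost.Overhead is a hypothetical name.

```csharp
using System;

public static class SwitchCost
{
    // Fraction of total time spent on the context switch, per the formula above.
    public static double Overhead(double jobMicroseconds, double switchMicroseconds) =>
        switchMicroseconds / (jobMicroseconds + switchMicroseconds);
}
```

With an assumed 5 microsecond switch, a 5 microsecond job gives Overhead(5, 5) = 0.5, i.e. half the time is lost, while a 1000 microsecond batch gives Overhead(1000, 5), which is just under 0.005.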
For long-running tasks it is better to use a custom scheduler that executes tasks on its own dedicated thread pool (like the WorkStealingTaskScheduler).
For mixed jobs, which can contain short-running CPU-bound, asynchronous, and long-running parts, it is better to split the task into subtasks.
private async Task HandleLongRunMessage(TestMessage message, CancellationToken token = default(CancellationToken))
{
// SHORT SYNCHRONOUS TASK - execute as is on the default thread (from thread pool)
CPU_TASK(message, 50);
// IO BOUND ASYNCH TASK - used as is
await Task.Delay(50);
// BUT WRAP the LONG SYNCHRONOUS TASK inside the Task
// which is scheduled on the custom thread pool
// (to save threadpool threads)
await Task.Factory.StartNew(() => {
CPU_TASK(message, 100000);
}, token, TaskCreationOptions.DenyChildAttach, _workStealingTaskScheduler);
}
