How to handle a deadlock in third-party code - c#

We have a third-party method Foo which sometimes runs in a deadlock for unknown reasons.
We are running a single-threaded TCP server and call this method every 30 seconds to check that the external system is available.
To mitigate the deadlock in the third-party code, we put the ping call in a Task.Run so that the server itself does not deadlock.
Like this:
async Task<bool> WrappedFoo()
{
    var timeout = 10000;
    var task = Task.Run(() => ThirdPartyCode.Foo());
    var delay = Task.Delay(timeout);
    if (delay == await Task.WhenAny(delay, task))
    {
        return false;
    }
    else
    {
        return await task;
    }
}
But this (in our opinion) has the potential to starve the application of free threads: if one call to ThirdPartyCode.Foo deadlocks, that thread never recovers, and if this happens often enough we might run out of resources.
Is there a general approach how one should handle deadlocking third-party code?
A CancellationToken won't work because the third-party-api does not provide any cancellation options.
Update:
The method at hand is from SAPNCO.dll, provided by SAP to establish and test RFC connections to a SAP system, so the method is not a simple network ping. I renamed the method in the question to avoid further misunderstandings.

Is there a general approach how one should handle deadlocking third-party code?
Yes, but it's not easy or simple.
The problem with misbehaving code is that it can not only leak resources (e.g., threads), but it can also indefinitely hold onto important resources (e.g., some internal "handle" or "lock").
The only way to forcefully reclaim threads and other resources is to end the process. The OS is used to cleaning up misbehaving processes and is very good at it. So, the solution here is to start a child process to do the API call. Your main application can communicate with its child process by redirected stdin/stdout, and if the child process ever times out, the main application can terminate it and restart it.
This is, unfortunately, the only reliable way to cancel uncancelable code.
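A minimal sketch of that approach, assuming a hypothetical helper executable (here called PingWorker.exe) that simply calls ThirdPartyCode.Foo() and prints OK on success; this is not the original code, just one way the parent process could supervise it:
async Task<bool> WrappedFooViaChildProcess(int timeoutMs = 10000)
{
    // PingWorker.exe is an assumed helper that calls ThirdPartyCode.Foo()
    // and writes "OK" to stdout when the call succeeds.
    using (var process = new System.Diagnostics.Process())
    {
        process.StartInfo = new System.Diagnostics.ProcessStartInfo
        {
            FileName = "PingWorker.exe",
            RedirectStandardOutput = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };
        process.Start();

        // WaitForExit(milliseconds) blocks, so run it on a pool thread.
        var exited = await Task.Run(() => process.WaitForExit(timeoutMs));
        if (!exited)
        {
            process.Kill(); // forcefully reclaim the deadlocked call
            return false;
        }
        return process.StandardOutput.ReadToEnd().Trim() == "OK";
    }
}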

Cancelling a task is a cooperative operation: you pass a CancellationToken to the desired method, and externally you call CancellationTokenSource.Cancel:
public void Caller()
{
    try
    {
        CancellationTokenSource cts = new CancellationTokenSource();
        Task longRunning = Task.Run(() => CancellableThirdParty(cts.Token), cts.Token);
        Thread.Sleep(3000); // or some condition/signal
        cts.Cancel();
        longRunning.GetAwaiter().GetResult(); // observe the cancellation
    }
    catch (OperationCanceledException ex)
    {
        // handle the cancellation somehow
    }
}

public void CancellableThirdParty(CancellationToken token)
{
    while (true)
    {
        // token.ThrowIfCancellationRequested(); -- if you don't handle the cancellation here
        if (token.IsCancellationRequested)
        {
            // code to handle the cancellation signal
            // throw new OperationCanceledException("[Reason]");
        }
    }
}
As you can see in the code above, in order to cancel an ongoing task, the method running inside it must be structured around the CancellationToken.IsCancellationRequested flag or simply the CancellationToken.ThrowIfCancellationRequested method, so that the caller only has to issue CancellationTokenSource.Cancel.
Unfortunately, if the third-party code is not designed around a CancellationToken (it does not accept a CancellationToken parameter), then there is not much you can do.

Your code isn't cancelling the blocked operation. Use a CancellationTokenSource and pass a cancellation token to Task.Run instead:
var cts = new CancellationTokenSource(timeout);
try
{
    await Task.Run(() => ThirdPartyCode.Ping(), cts.Token);
    return true;
}
catch (TaskCanceledException)
{
    return false;
}
It's quite possible that the blocking is caused by networking or DNS issues, not an actual deadlock.
That still wastes a thread waiting for a network operation to complete. You could use .NET's own Ping.SendPingAsync to ping asynchronously and specify a timeout:
var ping = new Ping();
var reply = await ping.SendPingAsync(ip, timeout);
return reply.Status == IPStatus.Success;
The PingReply class contains far more detailed information than a simple success/failure flag. The Status property alone differentiates between routing problems, unreachable destinations, timeouts, etc.
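For instance, a rough sketch of acting on the status (the IPStatus members used here are standard; what you log and how you report failure is up to you):
var reply = await new Ping().SendPingAsync(ip, timeout);
switch (reply.Status)
{
    case IPStatus.Success:
        return true;
    case IPStatus.TimedOut:
    case IPStatus.DestinationHostUnreachable:
    case IPStatus.DestinationNetUnreachable:
        // log the specific reason before reporting failure
        return false;
    default:
        return false;
}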

Related

Running multiple Task<> in an enterprise application in a safe way

I'm designing the software architecture for a product that can instantiate a series of "agents" doing some useful things.
Let's say each agent implements an interface having a function:
Task AsyncRun(CancellationToken token)
Since these agents do a lot of I/O, it makes sense for this to be an async function. Moreover, AsyncRun is supposed to never complete unless an exception or explicit cancellation occurs.
Now the question is: the main program has to run this on multiple agents, and I would like to know the correct way to run those multiple tasks and signal each individual completion (due to cancellation/errors):
For example, I'm thinking of something like having an infinite loop like this:
// .... all tasks created are in the array tasks ...
while (true)
{
    await Task.WhenAny(tasks);
    // .... check each single task to understand which one(s) exited
    // re-run the task if requested replacing in the array tasks
}
but I'm not sure if it is the correct (or even the best) way.
Moreover, I would like to know if this is the correct pattern, especially because an implementer could get RunAsync wrong and make a blocking call, in which case the entire application would hang.
// re-run the task if requested replacing in the array tasks
This is the first thing I'd consider changing. It's far better to not let an application handle its own "restarting". If an operation failed, then there's no guarantee that an application can recover. This is true for any kind of operation in any language/runtime.
A better solution is to let another application restart this one. Allow the exception to propagate (logging it if possible), and allow it to terminate the application. Then have your "manager" process (literally a separate executable process) restart as necessary. This is the way all modern high-availability systems work, from the Win32 services manager, to ASP.NET, to the Kubernetes container manager, to the Azure Functions runtime.
Note that if you do want to take this route, it may make sense to split up the tasks to different processes, so they can be restarted independently. That way a restart in one won't cause a restart in others.
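As a rough illustration of such a manager process (not the author's code; Worker.exe is a hypothetical worker executable), it can be as small as a restart loop:
// Minimal watchdog sketch: restart the worker whenever it exits, for any reason.
while (true)
{
    using (var worker = System.Diagnostics.Process.Start("Worker.exe"))
    {
        worker.WaitForExit();
        Console.WriteLine($"Worker exited with code {worker.ExitCode}, restarting...");
    }
    Thread.Sleep(TimeSpan.FromSeconds(5)); // simple back-off before restarting
}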
However, if you want to keep all your tasks in the same process, then the solution you have is fine. If you have a known number of tasks at the beginning of the process, and that number won't change (unless they fail), then you can simplify the code a bit by factoring out the restarting and using Task.WhenAll instead of Task.WhenAny:
async Task RunAsync(Func<CancellationToken, Task> work, CancellationToken token)
{
    while (true)
    {
        try { await work(token); }
        catch
        {
            // log...
        }
        if (we-should-not-restart)
            break;
    }
}
List<Func<CancellationToken, Task>> workToDo = ...;
var tasks = workToDo.Select(work => RunAsync(work, token));
await Task.WhenAll(tasks);
// Only gets here if they all complete/fail and were not restarted.
an implementer could get RunAsync wrong and make a blocking call, in which case the entire application would hang.
The best way to prevent this is to wrap the call in Task.Run, so this:
await work(token);
becomes this:
await Task.Run(() => work(token));
In order to know whether the task completes successfully, is cancelled, or is faulted, you could use a continuation. The continuation will be invoked as soon as the task finishes, whether that's because of failure, cancellation or completion:
using (var tokenSource = new CancellationTokenSource())
{
    IEnumerable<IAgent> agents; // TODO: initialize
    var tasks = new List<Task>();
    foreach (var agent in agents)
    {
        var task = agent.RunAsync(tokenSource.Token)
            .ContinueWith(t =>
            {
                if (t.IsCanceled)
                {
                    // Do something if cancelled.
                }
                else if (t.IsFaulted)
                {
                    // Do something if faulted (with t.Exception)
                }
                else
                {
                    // Do something if the task has completed.
                }
            });
        tasks.Add(task);
    }
    await Task.WhenAll(tasks);
}
In the end you will wait for the continued tasks. Also see this answer.
If you are afraid that the IAgent implementations will make blocking calls and you want to prevent the application from hanging, you can wrap the call to the async method in Task.Run. This way the call to the agent is executed on the thread pool and therefore does not block the caller:
var task = Task.Run(async () =>
    await agent.RunAsync(tokenSource.Token)
        .ContinueWith(t =>
        {
            // Same as above
        }));
You may want to use Task.Factory.StartNew instead, for example to mark the task as long-running.
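A possible sketch of that variant (not from the original answer; note that LongRunning only dedicates a thread up to the first await inside RunAsync):
var task = Task.Factory.StartNew(
        () => agent.RunAsync(tokenSource.Token),
        tokenSource.Token,
        TaskCreationOptions.LongRunning,
        TaskScheduler.Default)
    .Unwrap() // unwrap the inner Task returned by RunAsync
    .ContinueWith(t =>
    {
        // Same completion/cancellation/fault handling as above
    });
tasks.Add(task);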

Throttling with SemaphoreSlim -- "Task.Run()" vs "new Func<Task>()"

This might not be specific to SemaphoreSlim exclusively, but basically my question is about whether there is a difference between the below two methods of throttling a collection of long running tasks, and if so, what that difference is (and when if ever to use either).
In the example below, let's say that each tracked task involves loading data from a Url (totally made up example, but is a common one that I've found for SemaphoreSlim examples).
The main difference comes down to how the individual tasks are added to the list of tracked tasks. In the first example, we call Task.Run() with a lambda, whereas in the second, we new up a Func<Task<Result>> with a lambda and then immediately invoke that Func and add the result to the tracked task list.
Examples:
Using Task.Run():
SemaphoreSlim ss = new SemaphoreSlim(_concurrentTasks);
List<string> urls = ImportUrlsFromSource();
List<Task<Result>> trackedTasks = new List<Task<Result>>();
foreach (var item in urls)
{
    await ss.WaitAsync().ConfigureAwait(false);
    trackedTasks.Add(Task.Run(async () =>
    {
        try
        {
            return await ProcessUrl(item);
        }
        catch (Exception e)
        {
            _log.Error($"logging some stuff");
            throw;
        }
        finally
        {
            ss.Release();
        }
    }));
}
var results = await Task.WhenAll(trackedTasks);
Using a new Func:
SemaphoreSlim ss = new SemaphoreSlim(_concurrentTasks);
List<string> urls = ImportUrlsFromSource();
List<Task<Result>> trackedTasks = new List<Task<Result>>();
foreach (var item in urls)
{
    trackedTasks.Add(new Func<Task<Result>>(async () =>
    {
        await ss.WaitAsync().ConfigureAwait(false);
        try
        {
            return await ProcessUrl(item);
        }
        catch (Exception e)
        {
            _log.Error($"logging some stuff");
            throw;
        }
        finally
        {
            ss.Release();
        }
    })());
}
var results = await Task.WhenAll(trackedTasks);
There are two differences:
Task.Run does error handling
First of all, when you call the lambda yourself, it runs immediately; with Task.Run, it is Task.Run that calls it. This is relevant because Task.Run does a bit of work behind the scenes. The main work it does is handling a faulted task...
If you call a lambda, and the lambda throws, it would throw before you add the Task to the list...
However, in your case, because your lambda is async, the compiler would create the Task for it (you are not making it by hand), and it will correctly handle the exception and make it available via the returned Task. Therefore this point is moot.
Task.Run prevents task attachment
Task.Run sets DenyChildAttach. This means that the tasks created inside the Task.Run run independently from (are not synchronized with) the returned Task.
For example, this code:
List<Task<int>> trackedTasks = new List<Task<int>>();
var numbers = new int[] { 0, 1, 2, 3, 4 };
foreach (var item in numbers)
{
    trackedTasks.Add(Task.Run(async () =>
    {
        var x = 0;
        (new Func<Task<int>>(async () => { x = item; return x; }))().Wait();
        Console.WriteLine(x);
        return x;
    }));
}
var results = await Task.WhenAll(trackedTasks);
var results = await Task.WhenAll(trackedTasks);
Will output the numbers from 0 to 4, in unknown order. However the following code:
List<Task<int>> trackedTasks = new List<Task<int>>();
var numbers = new int[] { 0, 1, 2, 3, 4 };
foreach (var item in numbers)
{
    trackedTasks.Add(new Func<Task<int>>(async () =>
    {
        var x = 0;
        (new Func<Task<int>>(async () => { x = item; return x; }))().Wait();
        Console.WriteLine(x);
        return x;
    })());
}
var results = await Task.WhenAll(trackedTasks);
Will output the numbers from 0 to 4, in order, every time. This is odd, right? What happens is that the inner task is attached to the outer one, and executed right away on the same thread. But if you use Task.Run, the inner task is not attached and is scheduled independently.
This remains true even if you use await, as long as the task you await does not go to an external system...
What happens with an external system? Well, for example, if your task is reading from a URL - as in your example - the system would create a TaskCompletionSource, get the Task from it, set a response handler that writes the result to the TaskCompletionSource, make the request, and return the Task. This Task is not scheduled; running it on the same thread as the parent task makes no sense. And thus, it can break the order.
Since you are using await to wait on an external system, this point is moot too.
Conclusion
I must conclude that these are equivalent.
If you want to be safe, and make sure it works as expected even if - in a future version - some of the above points stop being moot, then keep Task.Run. On the other hand, if you really want to optimize, use the lambda and avoid the (very small) Task.Run overhead. However, that probably won't be a bottleneck.
Addendum
When I talk about a task that goes to an external system, I refer to something that runs outside of .NET. There is a bit of code that will run in .NET to interface with the external system, but the bulk of the code will not run in .NET, and thus will not be on a managed thread at all.
The consumer of the API specifies nothing for this to happen. The task would be a promise task, but that is not exposed; for the consumer there is nothing special about it.
In fact, a task that goes to an external system may barely run on the CPU at all. Furthermore, it might just be waiting on something exterior to the computer (it could be the network or user input).
The pattern is as follows:
The library creates a TaskCompletionSource.
The library sets a means to receive a notification. It can be a callback, event, message loop, hook, listening on a socket, a pipeline, waiting on a global mutex... whatever is necessary.
The library sets code to react to the notification that will call SetResult or SetException on the TaskCompletionSource, as appropriate for the notification received.
The library does the actual call to the external system.
The library returns TaskCompletionSource.Task.
Note: take extra care that optimization does not reorder things where it should not, and take care to handle errors during the setup phase. Also, if a CancellationToken is involved, it has to be taken into account (and SetCanceled called on the TaskCompletionSource when appropriate). Also, there could be teardown necessary in the reaction to the notification (or on cancellation). Ah, do not forget to validate your parameters.
Then the external system goes and does whatever it does. When it finishes, or something goes wrong, it gives the library the notification, and your Task is suddenly completed, faulted... (or, if cancellation happened, your Task is now cancelled) and .NET will schedule the continuations of the task as needed.
Note: async/await uses continuations behind the scenes, that is how execution resumes.
Incidentally, if you wanted to implement SempahoreSlim yourself, you would have to do something very similar to what I describe above. You can see it in my backport of SemaphoreSlim.
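As a hedged sketch of the pattern above (BeginWork is a made-up callback-based API with the assumed signature BeginWork(Action<string> onDone, Action<Exception> onError), not part of any real library):
public Task<string> WorkAsync(CancellationToken token = default(CancellationToken))
{
    var tcs = new TaskCompletionSource<string>();

    // Honor cancellation, if a token is supplied.
    token.Register(() => tcs.TrySetCanceled());

    // Set code to react to the notification, then make the actual call.
    BeginWork(
        result => tcs.TrySetResult(result),
        error => tcs.TrySetException(error));

    // Return the task; no thread is blocked while the external system works.
    return tcs.Task;
}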
Let us see a couple of examples of promise tasks...
Task.Delay: when we are waiting with Task.Delay, the CPU is not spinning. This is not running on a thread. In this case the notification mechanism is an OS timer. When the OS sees that the timer has elapsed, it calls into the CLR, and the CLR marks the task as completed. What thread was waiting? None.
FileStream.ReadAsync: when we are reading from storage with FileStream.ReadAsync, the actual work is done by the device. The CLR has to declare a custom event, then pass the event, the file handle and the buffer to the OS... the OS calls the device driver, and the device driver interfaces with the device. As the storage device retrieves the information, it writes to memory (directly into the specified buffer) via DMA. When it is done, it raises an interrupt, which is handled by the driver, which notifies the OS, which calls the custom event, which marks the task as completed. What thread read the data from storage? None.
A similar pattern will be used to download from a web page, except, this time the device goes to the network. How to make an HTTP request and how the system waits for a response is beyond the scope of this answer.
It is also possible that the external system is another program, in which case it would run on a thread. But it won't be a managed thread on your process.
Your takeaway is that these tasks do not run on any of your threads. And their timing might depend on external factors. Thus, it makes no sense to think of them as running on the same thread, or that we can predict their timing (well, except of course, in the case of the timer).
Neither is very good, because both create the tasks immediately. The Func version has a little less overhead, since it saves the Task.Run trip over the thread pool just to immediately end the thread-pool work and suspend on the semaphore. You don't need an async Func; you could simplify this by using an async method (possibly a local function).
But you should not do this at all. Instead, use a helper method that implements a parallel async foreach.
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
    return Task.WhenAll(
        from partition in Partitioner.Create(source).GetPartitions(dop)
        select Task.Run(async delegate
        {
            using (partition)
                while (partition.MoveNext())
                    await body(partition.Current);
        }));
}
Then you just go urls.ForEachAsync(myDop, async input => await ProcessAsync(input));
Here, the tasks are created on demand. You can even make the input stream lazy.

Cancelling async uploading task

I've got an Uploader class with one method - Upload:
public static int Upload(string endpoint, object objectToBeUploaded)
{
    Source.Token.ThrowIfCancellationRequested();
    var repos = new UploadRepository(endpoint);
    return repos.Upload(objectToBeUploaded);
}
The Source is a static CancellationTokenSource available in the project.
I also have a list of endpoints I need to upload a certain object for.
The code in the Form (it's a very small project using WinForms) looks like this:
private async Task UploadObjectAsync(
    string endpoint,
    object objectToBeUploaded)
{
    try
    {
        int elementId = await Task.Factory.StartNew(
            () => Uploader.Upload(endpoint, objectToBeUploaded));
        //do something with the returned value..
    }
    catch (OperationCanceledException ex)
    {
        //handle the exception..
    }
}
And then I set the btnUpload.Click handler like this so I can later use it:
this.btnUpload.Click += async (s, e) =>
{
    foreach (var endpoint in endpoints)
    {
        await UploadObjectAsync(endpoint, someObject);
    }
};
The problem is that whenever I start uploading to all the endpoints (how they are obtained is irrelevant) and I decide to cancel the uploading process using Source.Cancel(), the first UploadObjectAsync will always go through, since the Source.Token.ThrowIfCancellationRequested(); check in the Upload method has already been passed. The rest of the tasks will be cancelled normally and handled gracefully.
How am I to restructure this code in order to make sure that the first UploadObjectAsync Task will also be cancelled?
It is worth mentioning that I also don't have access to the source code of the uploading process itself (service reference) - the repos.Upload(objectToBeUploaded) in my Upload method.
You need to make your UploadRepository.Upload take a CancellationToken.
Especially when that's the one doing the I/O operation. That's when async/await really pays off.
That will also help you get rid of the Task.Factory.StartNew, since the Upload method will already return a Task. There will be no need to spin off a task.
In your current setup, given enough time for the tasks to start (and go through your ThrowIfCancellationRequested) you won't be able to cancel any upload. Even if it takes 30 seconds.
Also, you might be interested in: Task.Run
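A rough sketch of what that could look like, assuming the service reference can be regenerated with task-based (async) operations so that a hypothetical UploadAsync accepting a CancellationToken exists:
public static Task<int> UploadAsync(string endpoint, object objectToBeUploaded, CancellationToken token)
{
    var repos = new UploadRepository(endpoint);
    // hypothetical async service operation that accepts a CancellationToken
    return repos.UploadAsync(objectToBeUploaded, token);
}

// Caller: no Task.Factory.StartNew needed, and cancellation reaches the I/O call.
int elementId = await Uploader.UploadAsync(endpoint, someObject, Source.Token);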
There isn't anything practical you can do. The Upload method doesn't take a token. The first task has already passed the cancellation check by the time you hit the cancel button. You can prove to yourself that the cancel is a timing issue by adding a 10-second sleep ahead of the ThrowIfCancellationRequested call. All tasks would then cancel.
The problem is that you can't stop the process that happens inside the Upload function unless it checks the state of the CancellationToken and terminates itself.
So what you could do is interrupt the thread that is executing the upload by doing something like this:
int elementId = await Task.Factory.StartNew(() =>
{
    try
    {
        using (Source.Token.Register(Thread.CurrentThread.Interrupt))
        {
            return Uploader.Upload(endpoint, objectToBeUploaded);
        }
    }
    catch (ThreadInterruptedException ex)
    {
        throw new OperationCanceledException("Upload was cancelled", ex);
    }
}, Source.Token);
Using the Source.Token.Register(delegate) function, you cause the token to call that delegate when the token is cancelled. This way the thread that is currently executing the upload is interrupted and should throw an exception right away.
This method only works if the thread enters the WaitSleepJoin state from time to time, because the exception is only raised while the thread is in that state. Have a look at the documentation of the Thread.Interrupt function.
The alternative is to use Thread.Abort and the ThreadAbortException. This will kill your thread in any case, but it may corrupt the internal state of your service, because locks held by the thread won't be released properly. So be very careful using this method.

Synchronous I/O within an async/await-based Windows Service

Let's say I have a Windows Service which is doing some bit of work, then sleeping for a short amount of time, over and over forever (until the service is shut down). So in the service's OnStart, I could start up a thread whose entry point is something like:
private void WorkerThreadFunc()
{
    while (!shuttingDown)
    {
        DoSomething();
        Thread.Sleep(10);
    }
}
And in the service's OnStop, I somehow set that shuttingDown flag and then join the thread. Actually there might be several such threads, and other threads too, all started in OnStart and shut down/joined in OnStop.
If I want to instead do this sort of thing in an async/await based Windows Service, it seems like I could have OnStart create cancelable tasks but not await (or wait) on them, and have OnStop cancel those tasks and then Task.WhenAll().Wait() on them. If I understand correctly, the equivalent of the "WorkerThreadFunc" shown above might be something like:
private async Task WorkAsync(CancellationToken cancel)
{
    while (true)
    {
        cancel.ThrowIfCancellationRequested();
        DoSomething();
        await Task.Delay(10, cancel).ConfigureAwait(false);
    }
}
Question #1: Uh... right? I am new to async/await and still trying to get my head around it.
Assuming that's right, now let's say that DoSomething() call is (or includes) a synchronous write I/O to some piece of hardware. If I'm understanding correctly:
Question #2: That is bad? I shouldn't be doing synchronous I/O within a Task in an async/await-based program? Because it ties up a thread from the thread pool while the I/O is happening, and threads from the thread pool are a highly limited resource? Please note that I might have dozens of such Workers going simultaneously to different pieces of hardware.
I am not sure I'm understanding that correctly - I am getting the idea that it's bad from articles like Stephen Cleary's "Task.Run Etiquette Examples: Don't Use Task.Run for the Wrong Thing", but that's specifically about it being bad to do blocking work within Task.Run. I'm not sure if it's also bad if I'm just doing it directly, as in the "private async Task Work()" example above?
Assuming that's bad too, then if I understand correctly I should instead utilize the nonblocking version of DoSomething (creating a nonblocking version of it if it doesn't already exist), and then:
private async Task WorkAsync(CancellationToken cancel)
{
    while (true)
    {
        cancel.ThrowIfCancellationRequested();
        await DoSomethingAsync(cancel).ConfigureAwait(false);
        await Task.Delay(10, cancel).ConfigureAwait(false);
    }
}
Question #3: But... what if DoSomething is from a third party library, which I must use and cannot alter, and that library doesn't expose a nonblocking version of DoSomething? It's just a black box set in stone that at some point does a blocking write to a piece of hardware.
Maybe I wrap it and use TaskCompletionSource? Something like:
private async Task WorkAsync(CancellationToken cancel)
{
    while (true)
    {
        cancel.ThrowIfCancellationRequested();
        await WrappedDoSomething().ConfigureAwait(false);
        await Task.Delay(10, cancel).ConfigureAwait(false);
    }
}

private Task WrappedDoSomething()
{
    var tcs = new TaskCompletionSource<object>();
    DoSomething();
    tcs.SetResult(null);
    return tcs.Task;
}
But that seems like it's just pushing the issue down a bit further rather than resolving it. WorkAsync() will still block when it calls WrappedDoSomething(), and only get to the "await" for that after WrappedDoSomething() has already completed the blocking work. Right?
Given that (if I understand correctly) in the general case async/await should be allowed to "spread" all the way up and down in a program, would this mean that if I need to use such a library, I essentially should not make the program async/await-based? I should go back to the Thread/WorkerThreadFunc/Thread.Sleep world?
What if an async/await-based program already exists, doing other things, but now additional functionality that uses such a library needs to be added to it? Does that mean that the async/await-based program should be rewritten as a Thread/etc.-based program?
Actually there might be several such threads, and other threads too, all started in OnStart and shut down/joined in OnStop.
On a side note, it's usually simpler to have a single "master" thread that will start/join all the others. Then OnStart/OnStop just deals with the master thread.
If I want to instead do this sort of thing in an async/await based Windows Service, it seems like I could have OnStart create cancelable tasks but not await (or wait) on them, and have OnStop cancel those tasks and then Task.WhenAll().Wait() on them.
That's a perfectly acceptable approach.
If I understand correctly, the equivalent of the "WorkerThreadFunc" shown above might be something like:
Probably want to pass the CancellationToken down; cancellation can be used by synchronous code, too:
private async Task WorkAsync(CancellationToken cancel)
{
    while (true)
    {
        DoSomething(cancel);
        await Task.Delay(10, cancel).ConfigureAwait(false);
    }
}
Question #1: Uh... right? I am new to async/await and still trying to get my head around it.
It's not wrong, but it only saves you one thread on a Win32 service, which doesn't do much for you.
Question #2: That is bad? I shouldn't be doing synchronous I/O within a Task in an async/await-based program? Because it ties up a thread from the thread pool while the I/O is happening, and threads from the thread pool are a highly limited resource? Please note that I might have dozens of such Workers going simultaneously to different pieces of hardware.
Dozens of threads are not a lot. Generally, asynchronous I/O is better because it doesn't use any threads at all, but in this case you're on the desktop, so threads are not a highly limited resource. async is most beneficial on UI apps (where the UI thread is special and needs to be freed), and ASP.NET apps that need to scale (where the thread pool limits scalability).
Bottom line: calling a blocking method from an asynchronous method is not bad but it's not the best, either. If there is an asynchronous method, call that instead. But if there isn't, then just keep the blocking call and document it in the XML comments for that method (because an asynchronous method blocking is rather surprising behavior).
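For example, a short hedged sketch of what such a note could look like in the XML comments (the method name and wording are just placeholders):
/// <summary>Performs one unit of work against the device.</summary>
/// <remarks>
/// Note: this method performs a synchronous (blocking) hardware write even though
/// it is called from asynchronous code; no asynchronous API is available for it.
/// </remarks>
private void DoSomething(CancellationToken cancel)
{
    // blocking write to the hardware goes here
}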
I am getting the idea that it's bad from articles like Stephen Cleary's "Task.Run Etiquette Examples: Don't Use Task.Run for the Wrong Thing", but that's specifically about it being bad to do blocking work within Task.Run.
Yes, that is specifically about using Task.Run to wrap synchronous methods and pretend they're asynchronous. It's a common mistake; all it does is trade one thread pool thread for another.
Assuming that's bad too, then if I understand correctly I should instead utilize the nonblocking version of DoSomething (creating a nonblocking version of it if it doesn't already exist)
Asynchronous is better (in terms of resource utilization - that is, fewer threads used), so if you want/need to reduce the number of threads, you should use async.
Question #3: But... what if DoSomething is from a third party library, which I must use and cannot alter, and that library doesn't expose a nonblocking version of DoSomething? It's just a black box set in stone that at some point does a blocking write to a piece of hardware.
Then just call it directly.
Maybe I wrap it and use TaskCompletionSource?
No, that doesn't do anything useful. That just calls it synchronously and then returns an already-completed task.
But that seems like it's just pushing the issue down a bit further rather than resolving it. WorkAsync() will still block when it calls WrappedDoSomething(), and only get to the "await" for that after WrappedDoSomething() has already completed the blocking work. Right?
Yup.
Given that (if I understand correctly) in the general case async/await should be allowed to "spread" all the way up and down in a program, would this mean that if I need to use such a library, I essentially should not make the program async/await-based? I should go back to the Thread/WorkerThreadFunc/Thread.Sleep world?
Assuming you already have a blocking Win32 service, it's probably fine to just keep it as it is. If you are writing a new one, personally I would make it async to reduce threads and allow asynchronous APIs, but you don't have to do it either way. I prefer Tasks over Threads in general, since it's much easier to get results from Tasks (including exceptions).
The "async all the way" rule only goes one way. That is, once you call an async method, then its caller should be async, and its caller should be async, etc. It does not mean that every method called by an async method must be async.
So, one good reason to have an async Win32 service would be if there's an async-only API you need to consume. That would cause your DoSomething method to become async DoSomethingAsync.
What if an async/await-based program already exists, doing other things, but now additional functionality that uses such a library needs to be added to it? Does that mean that the async/await-based program should be rewritten as a Thread/etc.-based program?
No. You can always just block from an async method. With proper documentation so when you are reusing/maintaining this code a year from now, you don't swear at your past self. :)
If you still spawn your threads, well, yes, it's bad. Because it will not give you any benefit as the thread is still allocated and consuming resources for the specific purpose of running your worker function. Running a few threads to be able to do work in parallel within a service has a minimal impact on your application.
If DoSomething() is synchronous, you could switch to the Timer class instead. It allows multiple timers to use a smaller amount of threads.
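A minimal sketch of that idea, assuming DoSomething() is synchronous and short enough not to overlap with the next tick:
private System.Threading.Timer _timer;

protected override void OnStart(string[] args)
{
    // The callback runs on a thread-pool thread every 10 ms; no dedicated thread is held.
    _timer = new System.Threading.Timer(_ => DoSomething(), null,
        TimeSpan.Zero, TimeSpan.FromMilliseconds(10));
}

protected override void OnStop()
{
    _timer?.Dispose();
}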
If it's important that the jobs can complete, you can modify your worker classes like this:
SemaphoreSlim _shutdownEvent = new SemaphoreSlim(0, 1);

public Task Stop()
{
    return _shutdownEvent.WaitAsync();
}

private void WorkerThreadFunc()
{
    while (!shuttingDown)
    {
        DoSomething();
        Thread.Sleep(10);
    }
    _shutdownEvent.Release();
}
...which means that during shutdown you can do this:
var tasks = myServices.Select(x => x.Stop());
Task.WaitAll(tasks.ToArray());
A thread can only do one thing at a time. While it is working on your DoSomething it can't do anything else.
In an interview Eric Lippert described async-await in a restaurant metaphor. He suggests to use async-await only for functionality where your thread can do other things instead of waiting for a process to complete, like respond to operator input.
Alas, your thread is not waiting, it is doing hard work in DoSomething. And as long as DoSomething is not awaiting, your thread will not return from DoSomething to do the next thing.
So if your thread has something meaningful to do while procedure DoSomething is executing, it's wise to let another thread do the DoSomething, while your original thread is doing the meaningful stuff. Task.Run( () => DoSomething()) could do this for you. As long as the thread that called Task.Run doesn't await for this task, it is free to do other things.
You also want to cancel your process. DoSomething can't be cancelled. So even if cancellation is requested you'll have to wait until DoSomething is completed.
Below is your DoSomething in a form with a Start button and a Cancel button. While your thread is DoingSomething, one of the meaningful things your GUI thread may want to do is respond to pressing the cancel button:
void CancellableDoSomething(CancellationToken token)
{
    while (!token.IsCancellationRequested)
    {
        DoSomething();
    }
}

async Task DoSomethingAsync(CancellationToken token)
{
    var task = Task.Run(() => CancellableDoSomething(token), token);
    // if you have something meaningful to do, do it now, otherwise:
    await task;
}

CancellationTokenSource cancellationTokenSource = null;

private async void OnButtonStartSomething_Clicked(object sender, ...)
{
    if (cancellationTokenSource != null)
        // already doing something
        return;
    // else: not doing something: start doing something
    cancellationTokenSource = new CancellationTokenSource();
    var task = DoSomethingAsync(cancellationTokenSource.Token);
    // if you have something meaningful to do, do it now, otherwise:
    await task;
    cancellationTokenSource.Dispose();
    cancellationTokenSource = null;
}

private void OnButtonCancelSomething(object sender, ...)
{
    if (cancellationTokenSource == null)
        // not doing something, nothing to cancel
        return;
    // else: cancel doing something
    cancellationTokenSource.Cancel();
}

Preventing task from running on certain thread

I have been struggling a bit with some async await stuff. I am using RabbitMQ for sending/receiving messages between some programs.
As a bit of background, the RabbitMQ client uses 3 or so threads that I can see: a connection thread and two heartbeat threads. Whenever a message is received via TCP, the connection thread handles it and calls a callback which I have supplied via an interface. The documentation says that it is best to avoid doing lots of work during this call since it's done on the same thread as the connection and things need to continue on. They supply a QueueingBasicConsumer which has a blocking 'Dequeue' method which is used to wait for a message to be received.
I wanted my consumers to be able to actually release their thread context during this waiting time so somebody else could do some work, so I decided to use async/await tasks. I wrote an AwaitableBasicConsumer class which uses TaskCompletionSources in the following fashion:
I have an awaitable Dequeue method:
public Task<RabbitMQ.Client.Events.BasicDeliverEventArgs> DequeueAsync(CancellationToken cancellationToken)
{
    //we are enqueueing a TCS. This is a "read"
    rwLock.EnterReadLock();
    try
    {
        TaskCompletionSource<RabbitMQ.Client.Events.BasicDeliverEventArgs> tcs = new TaskCompletionSource<RabbitMQ.Client.Events.BasicDeliverEventArgs>();
        //if we are cancelled before we finish, this will cause the tcs to become cancelled
        cancellationToken.Register(() =>
        {
            tcs.TrySetCanceled();
        });
        //if there is something in the undelivered queue, the task will be immediately completed
        //otherwise, we queue the task into deliveryTCS
        if (!TryDeliverUndelivered(tcs))
            deliveryTCS.Enqueue(tcs);
        return tcs.Task;
    }
    finally
    {
        rwLock.ExitReadLock();
    }
}
The callback which the RabbitMQ client calls fulfills the tasks. This is called from the context of the AMQP connection thread:
public void HandleBasicDeliver(string consumerTag, ulong deliveryTag, bool redelivered, string exchange, string routingKey, RabbitMQ.Client.IBasicProperties properties, byte[] body)
{
    //we want nothing added while we remove. We also block until everybody is done.
    rwLock.EnterWriteLock();
    try
    {
        RabbitMQ.Client.Events.BasicDeliverEventArgs e = new RabbitMQ.Client.Events.BasicDeliverEventArgs(consumerTag, deliveryTag, redelivered, exchange, routingKey, properties, body);
        bool sent = false;
        TaskCompletionSource<RabbitMQ.Client.Events.BasicDeliverEventArgs> tcs;
        while (deliveryTCS.TryDequeue(out tcs))
        {
            //once we manage to actually set somebody's result, we are done with handling this
            if (tcs.TrySetResult(e))
            {
                sent = true;
                break;
            }
        }
        //if nothing was sent, we queue up what we got so that somebody can get it later.
        /**
         * Without the rwlock, this logic would cause concurrency problems in the case where, after the while block completes without sending, somebody enqueues themselves. They would get the
         * next message and the person who enqueues after them would get the message received now. Locking prevents that from happening since nobody can add to the queue while we are
         * doing our thing here.
         */
        if (!sent)
        {
            undelivered.Enqueue(e);
        }
    }
    finally
    {
        rwLock.ExitWriteLock();
    }
}
rwLock is a ReaderWriterLockSlim. The two queues (deliveryTCS and undelivered) are ConcurrentQueues.
The problem:
Every once in a while, the method that awaits the dequeue method throws an exception. This would not normally be an issue since that method is also async and so it enters the "Exception" completion state that tasks enter. The problem comes in the situation where the task that calls DequeueAsync is resumed after the await on the AMQP Connection thread that the RabbitMQ client creates. Normally I have seen tasks resume onto the main thread or one of the worker threads floating around. However, when it resumes onto the AMQP thread and an exception is thrown, everything stalls. The task does not enter its "Exception state" and the AMQP Connection thread is left saying that it is executing the method that had the exception occur.
My main confusion here is why this doesn't work:
var task = c.RunAsync(); //<-- This method awaits the DequeueAsync and throws an exception afterwards
ConsumerTaskState state = new ConsumerTaskState()
{
    Connection = connection,
    CancellationToken = cancellationToken
};
//if there is a problem, we execute our faulted method
//PROBLEM: If the task fails when it's resumed onto the AMQP thread, this method is never called
task.ContinueWith(this.OnFaulted, state, TaskContinuationOptions.OnlyOnFaulted);
Here is the RunAsync method, set up for the test:
public async Task RunAsync()
{
    using (var channel = this.Connection.CreateModel())
    {
        ...
        AwaitableBasicConsumer consumer = new AwaitableBasicConsumer(channel);
        var result = consumer.DequeueAsync(this.CancellationToken);
        //wait until we find something to eat
        await result;
        throw new NotImplementedException(); //<-- the test exception. Normally this causes OnFaulted to be called, but sometimes it stalls
        ...
    } //<-- This is where the debugger says the thread is sitting when I find it in the stalled state
}
Reading what I have written, I see that I may not have explained my problem very well. If clarification is needed, just ask.
My solutions that I have come up with are as follows:
Remove all Async/Await code and just use straight up threads and block. Performance will be decreased, but at least it won't stall sometimes
Somehow exempt the AMQP threads from being used for resuming tasks. I assume that they were sleeping or something and then the default TaskScheduler decided to use them. If I could find a way to tell the task scheduler that those threads are off limits, that would be great.
Does anyone have an explanation for why this is happening or any suggestions to solving this? Right now I am removing the async code just so that the program is reliable, but I really want to understand what is going on here.
I first recommend that you read my async intro, which explains in precise terms how await will capture a context and use that to resume execution. In short, it will capture the current SynchronizationContext (or the current TaskScheduler if SynchronizationContext.Current is null).
The other important detail is that async continuations are scheduled with TaskContinuationOptions.ExecuteSynchronously (as #svick pointed out in a comment). I have a blog post about this but AFAIK it is not officially documented anywhere. This detail does make writing an async producer/consumer queue difficult.
The reason await isn't "switching back to the original context" is (probably) because the RabbitMQ threads don't have a SynchronizationContext or TaskScheduler - thus, the continuation is executed directly when you call TrySetResult because those threads look just like regular thread pool threads.
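If you are on .NET Framework 4.6 or later (an assumption about your target framework), one mitigation is to create the TaskCompletionSource with TaskCreationOptions.RunContinuationsAsynchronously, which forces awaiters to resume on the thread pool instead of inline on the thread calling TrySetResult:
var tcs = new TaskCompletionSource<RabbitMQ.Client.Events.BasicDeliverEventArgs>(
    TaskCreationOptions.RunContinuationsAsynchronously);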
BTW, reading through your code, I suspect your use of a reader/writer lock and concurrent queues are incorrect. I can't be sure without seeing the whole code, but that's my impression.
I strongly recommend you use an existing async queue and build a consumer around that (in other words, let someone else do the hard part :). The BufferBlock<T> type in TPL Dataflow can act as an async queue; that would be my first recommendation if you have Dataflow available on your platform. Otherwise, I have an AsyncProducerConsumerQueue type in my AsyncEx library, or you could write your own (as I describe on my blog).
Here's an example using BufferBlock<T>:
private readonly BufferBlock<RabbitMQ.Client.Events.BasicDeliverEventArgs> _queue =
    new BufferBlock<RabbitMQ.Client.Events.BasicDeliverEventArgs>();

public void HandleBasicDeliver(string consumerTag, ulong deliveryTag, bool redelivered, string exchange, string routingKey, RabbitMQ.Client.IBasicProperties properties, byte[] body)
{
    RabbitMQ.Client.Events.BasicDeliverEventArgs e = new RabbitMQ.Client.Events.BasicDeliverEventArgs(consumerTag, deliveryTag, redelivered, exchange, routingKey, properties, body);
    _queue.Post(e);
}

public Task<RabbitMQ.Client.Events.BasicDeliverEventArgs> DequeueAsync(CancellationToken cancellationToken)
{
    return _queue.ReceiveAsync(cancellationToken);
}
In this example, I'm keeping your DequeueAsync API. However, once you start using TPL Dataflow, consider using it elsewhere as well. When you need a queue like this, it's common to find other parts of your code that would also benefit from a dataflow approach. E.g., instead of having a bunch of methods calling DequeueAsync, you could link your BufferBlock to an ActionBlock.
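For example, a small sketch of that linking (types from the System.Threading.Tasks.Dataflow namespace; the handler body is yours to fill in):
var processor = new ActionBlock<RabbitMQ.Client.Events.BasicDeliverEventArgs>(e =>
{
    // handle the delivered message here
});
_queue.LinkTo(processor, new DataflowLinkOptions { PropagateCompletion = true });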
