We ran into a bug where we had to validate a list of objects with an async method. The writer of the code wanted to just stuff it into a LINQ expression like so:
var invalidObjects = list
.Where(x => _service.IsValidAsync(x).Result)
.ToList();
The validating method looked something like this:
public async Task<bool> IsValidAsync(object @object) {
var validObjects = await _cache.GetAsync<List<object>>("ValidObjectsCacheKey");
return validObjects.Contains(@object);
}
This little solution caused the whole application to hang on the await _cache.GetAsync line.
The cache is a distributed cache (Redis).
After changing the LINQ query to a simple foreach and properly awaiting _service.IsValidAsync, the code ran deadlock-free and basically in an instant.
I understand on a basic level how async-await works, but I can't wrap my head around why this happened, especially because the list only had one object.
Any suggestion is welcome!
EDIT: The application is running on .NET Core 2.2, but the library in which the problem happened targets .NET Standard 2.0.
System.Threading.SynchronizationContext.Current returns null at the time of the deadlock
EDIT2: It turns out that changing the cache provider (but still accessing it via an async method) also resolves the issue, so the bug might actually be in the Redis cache client:
https://github.com/aspnet/Caching/blob/master/src/Microsoft.Extensions.Caching.StackExchangeRedis/RedisCache.cs
Apart from the already stated issue of mixing async-await and blocking calls like .Result or .Wait()
Reference Async/Await - Best Practices in Asynchronous Programming
To summarize this second guideline, you should avoid mixing async and blocking code. Mixed async and blocking code can cause deadlocks, more-complex error handling and unexpected blocking of context threads. The exception to this guideline is the Main method for console applications, or—if you’re an advanced user—managing a partially asynchronous codebase.
Sometimes the simplest approach, as you have already discovered, is to traverse the list and properly await the asynchronous function.
For example:
var invalidObjects = //...
foreach(var x in list){
if(!(await _service.IsValidAsync(x)))
invalidObjects.Add(x);
}
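To make the loop above concrete, here is a minimal, self-contained sketch. The names AsyncFilter, IsValidAsync and GetInvalidAsync are hypothetical stand-ins (they are not the asker's service), and the "validation" is a fake rule chosen only so the example can run without a cache.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AsyncFilter
{
    // Hypothetical stand-in for _service.IsValidAsync: even numbers count as valid.
    public static async Task<bool> IsValidAsync(int value)
    {
        await Task.Delay(10); // simulates the asynchronous cache lookup
        return value % 2 == 0;
    }

    // Collects the items that fail validation, awaiting one check at a time
    // instead of blocking on .Result inside a LINQ predicate.
    public static async Task<List<int>> GetInvalidAsync(IEnumerable<int> items)
    {
        var invalid = new List<int>();
        foreach (var item in items)
        {
            if (!await IsValidAsync(item))
            {
                invalid.Add(item);
            }
        }
        return invalid;
    }
}
```

A caller that is itself async would write `var invalid = await AsyncFilter.GetInvalidAsync(new[] { 1, 2, 3, 4 });`. No thread is blocked while a check is in flight, which is the point of the change.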
I have the following code:
var things = await GetDataFromApi(cancellationToken);
var builder = new StringBuilder(JsonSerializer.Serialize(things));
await things
.GroupBy(x => x.Category)
.ToAsyncEnumerable()
.SelectManyAwaitWithCancellation(async (category, ct) =>
{
var thingsWithColors = await _colorsApiClient.GetColorsFor(category.Select(thing => thing.Name).ToList(), ct);
return category
.Select(thing => ChooseBestColor(thingsWithColors))
.ToAsyncEnumerable();
})
.ForEachAsync(thingAndColor =>
{
Console.WriteLine(Thread.CurrentThread.ManagedThreadId); // prints different IDs
builder.Replace(thingAndColor.Thing, $"{thingAndColor.Color} {thingAndColor.Thing}");
}, cancellationToken);
It uses System.Linq.Async and I find it difficult to understand.
In "classic"/synchronous LINQ, the whole thing would get executed only when I call ToList() or ToArray() on it. In the example above, there is no such call, but the lambdas get executed anyway. How does it work?
The other concern I have is about multi-threading. I have heard many times that async != multithreading. Then how is it possible that Console.WriteLine(Thread.CurrentThread.ManagedThreadId); prints various IDs? Some of the IDs get printed multiple times, but overall there are about 5 thread IDs in the output. None of my code creates any threads explicitly. It's all async-await.
The StringBuilder does not support multi-threading, and I'd like to understand if the implementation above is valid.
Please ignore the algorithm of my code, it does not really matter, it's just an example. What matters is the usage of System.Linq.Async.
ForEachAsync would have a similar effect as ToList/ToArray since it forces evaluation of the entire list.
By default, the code after an await resumes on the context that was captured at the await. If the code runs on the UI thread, it continues on the UI thread. If it runs on a thread-pool thread, it continues on a thread-pool thread, but not necessarily the same one, which is why you see several thread IDs.
However, none of your code should run in parallel. That does not by itself make it thread safe: some memory barriers are probably needed to ensure that writes made on one thread are visible to the next, but I would assume the framework code issues those barriers itself.
The System.Linq.Async library, as well as the whole dotnet/reactive repository, is currently a semi-abandoned project. The issues on GitHub are piling up, and nobody has answered them officially for almost a year. There is no documentation published, apart from the XML documentation in the source code on top of each method. You can't really use this library without studying the source code, which is generally easy to do because the code is short, readable, and honestly doesn't do too much. The functionality offered by this library is similar to the functionality found in System.Linq, with the main difference being that the input is IAsyncEnumerable<T> instead of IEnumerable<T>, and the delegates can return values wrapped in ValueTask<T>s.
With the exception of a few operators like Merge (and only one of its overloads), System.Linq.Async doesn't introduce concurrency. The asynchronous operations are invoked one at a time, and each is awaited before the next operation is invoked. The SelectManyAwaitWithCancellation operator is not one of the exceptions. The selector is invoked sequentially for each element, and the resulting IAsyncEnumerable<TResult> is enumerated sequentially, with its values yielded one after the other. So it's unlikely to create thread-safety issues.
The ForEachAsync operator is just a substitute for a standard await foreach loop, and was included in the library at a time when C# language support for await foreach was nonexistent (before C# 8). I would recommend against using this operator, because its resemblance to the new Parallel.ForEachAsync API could create confusion. Here is what is written inside the source code of the ForEachAsync operator:
// REVIEW: Once we have C# 8.0 language support, we may want to do away with these
// methods. An open question is how to provide support for cancellation,
// which could be offered through WithCancellation on the source. If we still
// want to keep these methods, they may be a candidate for
// System.Interactive.Async if we consider them to be non-standard
// (i.e. IEnumerable<T> doesn't have a ForEach extension method either).
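As a rough illustration of the await foreach alternative, here is a minimal sketch that does not depend on System.Linq.Async at all. It assumes C# 8 and a runtime that provides IAsyncEnumerable<T> (for example .NET Core 3.0 or later). GetThingsAsync and BuildAsync are hypothetical names, and the data is made up.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

public static class AwaitForEachSketch
{
    // Hypothetical async source: yields each name after a simulated async call.
    public static async IAsyncEnumerable<string> GetThingsAsync()
    {
        foreach (var name in new[] { "apple", "sky" })
        {
            await Task.Delay(10);
            yield return name;
        }
    }

    // The loop body runs for one element at a time. The thread may change
    // between iterations, but two iterations never overlap, so the
    // StringBuilder is never used by two threads at once.
    public static async Task<string> BuildAsync()
    {
        var builder = new StringBuilder();
        await foreach (var thing in GetThingsAsync())
        {
            builder.Append(thing).Append(';');
        }
        return builder.ToString();
    }
}
```

Calling `await AwaitForEachSketch.BuildAsync()` produces "apple;sky;". The loop reads like a normal foreach, which avoids the naming confusion with Parallel.ForEachAsync.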
I've encountered a weird behavior regarding async extension methods in separate assemblies.
We have the following:
1. One assembly that handles sending of EventGridEvent. Its target is .NET Standard 2.0, and it references Microsoft.Azure.EventGrid.
2. One assembly that uses assembly no. 1. Its target is .NET Framework 4.7.
For some reason, making synchronous calls from assembly no. 2 into assembly no. 1 results in weird behaviour. Consider the two functions we have in assembly no. 1:
public async Task PublishAsync(...)
{
await _eventGridClient.PublishEventsAsync(_eventGridTopicHostName, ...);
}
public void Publish(...)
{
_eventGridClient.PublishEventsAsync(_eventGridTopicHostName, ...).Wait();
}
If we call the first method from assembly no. 2 with PublishAsync().Wait(), it will never return. Publish() will, however. But, if Publish() calls PublishAsync().Wait(), that method will also hang.
Worth mentioning is that EventGridClient contains LongRunningOperationRetryTimeout with default set to 30, which is ignored. It never returns.
Anyone have any idea what causes this behavior? A workaround is to copy code, but we would like to avoid that.
Thanks in advance.
You should never block on async code by calling Wait() or .Result on the returned Task. Stephen Cleary explains why on his blog.
When the task returned by _eventGridClient.PublishEventsAsync is awaited, the current SynchronizationContext is captured. When the task completes, the rest of the method is posted back to that context, but the context never becomes available because you are blocking it with your call to .Wait(). This leads to a deadlock.
You may get out of trouble by not capturing the context, which you do by calling ConfigureAwait(false):
public async Task PublishAsync(...)
{
await _eventGridClient.PublishEventsAsync(_eventGridTopicHostName, ...)
.ConfigureAwait(false);
}
But the best solution is still not to block at all. Async code should be "async all the way" as explained in the linked blog post.
The problem was that the calling method was running on a UI thread. It was solved by wrapping the call like so: Task.Run(() => ...).Wait()
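The Task.Run workaround can be sketched as follows. This is only an illustration under stated assumptions: PublishAsync here is a hypothetical stand-in that does no real publishing and merely reports whether a SynchronizationContext was present when it started, and PublishBlocking is a made-up name for the wrapper.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class SyncOverAsyncSketch
{
    // Hypothetical stand-in for PublishAsync. It records whether there was a
    // SynchronizationContext to capture at the point of the await.
    public static async Task<bool> PublishAsync()
    {
        var hadContext = SynchronizationContext.Current != null;
        await Task.Delay(10); // simulates the network call
        return hadContext;
    }

    // Starts the async method on a thread-pool thread, where there is no
    // SynchronizationContext, then blocks the caller until it finishes.
    // GetAwaiter().GetResult() is used instead of Wait() so that an exception
    // is rethrown directly rather than wrapped in an AggregateException.
    public static bool PublishBlocking()
    {
        return Task.Run(() => PublishAsync()).GetAwaiter().GetResult();
    }
}
```

Because the awaits inside Task.Run have no context to return to, their continuations do not need the blocked UI thread. The calling thread is still blocked for the duration of the call, so this remains a workaround rather than a replacement for going async all the way.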
I have to address a temporary situation that requires me to do a non-ideal thing: I have to call an async method from inside a sync one.
Let me just say here that I know all about the problems I'm getting myself into and I understand reasons why this is not advised.
That said, I'm dealing with a large codebase, which is completely sync from top to bottom and there is no way I can rewrite everything to use async await in a reasonable amount of time. But I do need to rewrite a number of small parts of this codebase to use the new async API that I've been slowly developing over the last year or so, because it has a lot of new features that the old codebase would benefit from as well, but can't get them for legacy reasons. And since all that code isn't going away any time soon, I'm facing a problem.
TL;DR: A large sync codebase cannot be easily rewritten to support async but now requires calls into another large codebase, which is completely async.
I'm currently doing the simplest thing that works in the sync codebase: wrapping each async call into a Task.Run and waiting for the Result in a sync way.
The problem with this approach is that it becomes slow whenever the sync codebase does this in a tight loop. I understand the reasons and I sort of know what I can do: I'd have to make sure that all async calls are started on the same thread instead of borrowing a new one each time from the thread pool (which is what Task.Run does). This borrowing and returning incurs a lot of switching, which can slow things down considerably if done a lot.
What are my options, short of writing my own scheduler that would prefer to reuse a single dedicated thread?
UPDATE: To better illustrate what I'm dealing with, I offer an example of one of the simplest transformations I need to do (there are more complex ones as well).
It's basically a simple LINQ query that uses a custom LINQ provider under the hood. There's no EF or anything similar underneath.
[Old code]
var result = (from c in syncCtx.Query("Components")
where c.Property("Id") == id
select c).SingleOrDefault();
[New code]
var result = Task.Run(async () =>
{
Dictionary<string, object> data;
using (AuthorizationManager.Instance.AuthorizeAsInternal())
{
var uow = UnitOfWork.Current;
var source = await uow.Query("Components")
.Where("Id = @id", new { id })
.PrepareAsync();
var single = await source.SingleOrDefaultAsync();
data = single.ToDictionary();
}
return data;
}).Result;
As mentioned, this is one of the less complicated examples and it already contains 2 async calls.
UPDATE 2: I tried removing the Task.Run and invoking .Result directly on the result of a wrapper async method, as suggested by @Evk and @Manu. Unfortunately, while testing this in my staging environment, I quickly ran into a deadlock. I'm still trying to understand what exactly transpired, but it's obvious that Task.Run cannot simply be removed in my case. There are additional complications to be resolved first...
I don't think you are on the right track. Wrapping every async call in a Task.Run seems horrible to me: it always starts an additional task that you don't need. But I understand that introducing async/await in a large codebase can be problematic.
I see a possible solution: Extract all async calls into separate, async methods. This way, your project will have a pretty nice transition from sync to async, since you can change methods one by one without affecting other parts of the code.
Something like this:
private Dictionary<string, object> GetSomeData(string id)
{
var syncCtx = GetContext();
var result = (from c in syncCtx.Query("Components")
where c.Property("Id") == id
select c).SingleOrDefault();
DoSomethingSyncWithResult(result);
return result;
}
would become something like this:
private Dictionary<string, object> GetSomeData(string id)
{
var result = FetchComponentAsync(id).Result;
DoSomethingSyncWithResult(result);
return result;
}
private async Task<Dictionary<string, object>> FetchComponentAsync(string id)
{
using (AuthorizationManager.Instance.AuthorizeAsInternal())
{
var uow = UnitOfWork.Current;
var source = await uow.Query("Components")
.Where("Id = @id", new { id })
.PrepareAsync();
var single = await source.SingleOrDefaultAsync();
return single.ToDictionary();
}
}
Since you are in an ASP.NET environment, mixing sync with async is a very bad idea. I'm surprised that your Task.Run solution works for you. The more you incorporate the new async codebase into the old sync codebase, the more you will run into problems, and there is no easy fix for that except rewriting everything in an async way.
I strongly suggest that you not mix your async parts into the sync codebase. Instead, work from "bottom to top": change everything from sync to async wherever you need to await an async call. It may seem like a lot of work, but the benefits are much greater than if you search for "hacks" now and don't fix the underlying problems.
Recently I've developed doubts about the way I'm implementing the async-await pattern in my Web API projects. I've read that async-await should be "all the way" and that's what I've done. But it's all starting to seem redundant and I'm not sure that I'm doing this correctly. I've got a controller that calls a repository, which calls a data access (Entity Framework 6) class: "async all the way". I've read a lot of conflicting stuff on this and would like to get it cleared up.
EDIT: The referenced possible duplicate is a good post, but not specific enough for my needs. I included code to illustrate the problem. It seems really difficult to get a decisive answer on this. It would be nice if we could put async-await in one place and let .NET handle the rest, but we can't. So, am I overdoing it, or is it not that simple?
Here's what I've got:
Controller:
public async Task<IHttpActionResult> GetMessages()
{
var result = await _messageRepository.GetMessagesAsync().ConfigureAwait(false);
return Ok(result);
}
Repository:
public async Task<List<string>> GetMessagesAsync()
{
return await _referralMessageData.GetMessagesAsync().ConfigureAwait(false);
}
Data:
public async Task<List<string>> GetMessagesAsync()
{
return await _context.Messages.Select(i => i.Message).ToListAsync().ConfigureAwait(false);
}
It would be nice if we could put async-await in one place and let .NET handle the rest, but we can't. So, am I overdoing it, or is it not that simple?
It would be nice if it was simpler.
The sample repository and data code don't have much real logic in them (and none after the await), so they can be simplified to return the tasks directly, as other commenters have noted.
On a side note, the sample repository suffers from a common repository problem: doing nothing. If the rest of your real-world repository is similar, you might have one level of abstraction too many in your system. Note that Entity Framework is already a generic unit-of-work repository.
But regarding async and await in the general case, the code often has work to do after the await:
public async Task<IHttpActionResult> GetMessages()
{
var result = await _messageRepository.GetMessagesAsync();
return Ok(result);
}
Remember that async and await are just fancy syntax for hooking up callbacks. There isn't an easier way to express this method's logic asynchronously. There have been some experiments around, e.g., inferring await, but they have all been discarded at this point (I have a blog post describing why the async/await keywords have all the "cruft" that they do).
And this cruft is necessary for each method. Each method using async/await is establishing its own callback. If the callback isn't necessary, then the method can just return the task directly, avoiding async/await. Other asynchronous systems (e.g., promises in JavaScript) have the same restriction: they have to be asynchronous all the way.
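To illustrate the "callbacks" point, here is a minimal sketch comparing three shapes of the same idea. All names (CallbackSketch, GetNumberAsync and so on) are hypothetical. The hand-written ContinueWith version is only roughly equivalent to the await version: among other things it does not resume on a captured context and it surfaces exceptions differently.

```csharp
using System.Threading.Tasks;

public static class CallbackSketch
{
    // Hypothetical asynchronous source of a value.
    public static async Task<int> GetNumberAsync()
    {
        await Task.Delay(10);
        return 21;
    }

    // async/await version: the code after the await is the callback.
    public static async Task<int> DoubleWithAwaitAsync()
    {
        var n = await GetNumberAsync();
        return n * 2;
    }

    // Hand-written version: the callback is attached explicitly.
    // t.Result does not block here because the antecedent task has completed.
    public static Task<int> DoubleWithContinuation()
    {
        return GetNumberAsync().ContinueWith(t => t.Result * 2);
    }

    // No work after the call, so no callback is needed:
    // the task is returned directly without async/await.
    public static Task<int> PassThrough()
    {
        return GetNumberAsync();
    }
}
```

Awaiting the first two methods yields 42, and awaiting PassThrough yields 21. The first two each establish their own callback, while the third adds none, which matches the distinction drawn above.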
It's possible - conceptually - to define a system in which any blocking operation would yield the thread automatically. My foremost argument against a system like this is that it would have implicit reentrancy. Particularly when considering third-party library changes, an auto-yielding system would be unmaintainable IMO. It's far better to have the asynchrony of an API explicit in its signature (i.e., if it returns Task, then it's asynchronous).
Now, @usr makes a good point that maybe you don't need asynchrony at all. That's almost certainly true if, e.g., your Entity Framework code is querying a single instance of SQL Server. This is because the primary benefit of async on ASP.NET is scalability, and if you don't need scalability (of the ASP.NET portion), then you don't need asynchrony. See the "not a silver bullet" section in my MSDN article on async ASP.NET.
However, I think there's also an argument to be made for "natural APIs". If an operation is naturally asynchronous (e.g., I/O-based), then its most natural API is an asynchronous API. Conversely, naturally synchronous operations (e.g., CPU-based) are most naturally represented as synchronous APIs. The natural API argument is strongest for libraries - if your repository / data access layer was its own dll intended to be reused in other (possibly desktop or mobile) applications, then it should definitely be an asynchronous API. But if (as is more likely the case) it is specific to this ASP.NET application which does not need to scale, then there's no specific need to make the API either asynchronous or synchronous.
But there's a good two-pronged counter-argument regarding developer experience. Many developers don't know their way around async at all; would a code maintainer be likely to mess it up? The other prong of that argument is that the libraries and tooling around async are still coming up to speed. Most notable is the lack of a causality stack when there are exceptions to trace down (on a side note, I wrote a library that helps with this). Furthermore, parts of ASP.NET are not async-compatible - most notably, MVC filters and child actions (they are fixing both of those with ASP.NET vNext). And ASP.NET has different behavior regarding timeouts and thread aborts for asynchronous handlers - adding yet a little more to the async learning curve.
Of course, the counter-counter argument would be that the proper response to behind-the-times developers is to train them, not restrict the technologies available.
In short:
The proper way to do async is "all the way". This is especially true on ASP.NET, and it's not likely to change anytime soon.
Whether async is appropriate, or helpful, is up to you and your application's scenario.
public async Task<List<string>> GetMessagesAsync()
{
return await _referralMessageData.GetMessagesAsync().ConfigureAwait(false);
}
public async Task<List<string>> GetMessagesAsync()
{
return await _context.Messages.Select(i => i.Message).ToListAsync().ConfigureAwait(false);
}
If the only calls you do to asynchronous methods are tail-calls, then you don't really need to await:
public Task<List<string>> GetMessagesAsync()
{
return _referralMessageData.GetMessagesAsync();
}
public Task<List<string>> GetMessagesAsync()
{
return _context.Messages.Select(i => i.Message).ToListAsync();
}
About the only thing you lose is some stack-trace information, but that's rarely all that useful. If you remove the await, then instead of generating a state machine that handles the waiting, you just pass the task produced by the called method back up to the calling method, and the calling method can await that.
The methods can also sometimes be inlined now, or perhaps have tail-call optimisation done on them.
I'd even go so far as to turn non-task-based paths into task-based if it was relatively simple to do so:
public async Task<List<string>> GetMessagesAsync()
{
if(messageCache != null)
return messageCache;
return await _referralMessageData.GetMessagesAsync().ConfigureAwait(false);
}
Becomes:
public Task<List<string>> GetMessagesAsync()
{
if(messageCache != null)
return Task.FromResult(messageCache);
return _referralMessageData.GetMessagesAsync();
}
However, if at any point you need the results of a task to do further work, then awaiting is the way to go.
As best as I can, I opt for async all the way down. However, I am still stuck using ASP.NET Membership which isn't built for async. As a result my calls to methods like string[] GetRolesForUser() can't use async.
In order to build roles properly I depend on data from various sources so I am using multiple tasks to fetch the data in parallel:
public override string[] GetRolesForUser(string username) {
...
Task.WaitAll(taskAccounts, taskContracts, taskOtherContracts, taskMoreContracts, taskSomeProduct);
...
}
All of these tasks are simply fetching data from a SQL Server database using the Entity Framework. However, the introduction of that last task (taskSomeProduct) is causing a deadlock while none of the other methods have been.
Here is the method that causes a deadlock:
public async Task<int> SomeProduct(IEnumerable<string> ids) {
var q = from c in this.context.Contracts
join p in this.context.Products
on c.ProductId equals p.Id
where ids.Contains(c.Id)
select p.Code;
//Adding .ConfigureAwait(false) fixes the problem here
var codes = await q.ToListAsync();
var slotCount = codes.Sum(p => char.GetNumericValue(p, p.Length - 1));
return Convert.ToInt32(slotCount);
}
However, this method (which looks very similar to all the other methods) isn't causing deadlocks:
public async Task<List<CustomAccount>> SomeAccounts(IEnumerable<string> ids) {
return await this.context.Accounts
.Where(o => ids.Contains(o.Id))
.ToListAsync()
.ToCustomAccountListAsync();
}
I'm not quite sure what it is about that one method that is causing the deadlock. Ultimately they are both doing the same task of querying the database. Adding ConfigureAwait(false) to the one method does fix the problem, but I'm not quite sure what differentiates it from the other methods, which execute fine.
Edit
Here is some additional code which I originally omitted for brevity:
public static Task<List<CustomAccount>> ToCustomAccountListAsync(this Task<List<Account>> sqlObjectsTask) {
var sqlObjects = sqlObjectsTask.Result;
var customObjects = sqlObjects.Select(o => PopulateCustomAccount(o)).ToList();
return Task.FromResult<List<CustomAccount>>(customObjects);
}
The PopulateCustomAccount method simply returns a CustomAccount object from the database Account object.
In ToCustomAccountListAsync you call Task.Result. That's a classic deadlock. Use await.
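A minimal sketch of what "use await" could look like for that extension method follows. Account and CustomAccount are hypothetical, stripped-down stand-ins for the real types, and the inline mapping replaces PopulateCustomAccount, whose body was not shown.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical minimal stand-ins for the entity and the custom type.
public class Account { public string Id { get; set; } }
public class CustomAccount { public string Id { get; set; } }

public static class AccountTaskExtensions
{
    // Awaits the query task instead of blocking on .Result, so no thread is
    // held while the database call is in flight.
    public static async Task<List<CustomAccount>> ToCustomAccountListAsync(
        this Task<List<Account>> sqlObjectsTask)
    {
        // The mapping below needs no request context, so the context is not captured.
        var sqlObjects = await sqlObjectsTask.ConfigureAwait(false);
        return sqlObjects.Select(o => new CustomAccount { Id = o.Id }).ToList();
    }
}
```

The call site in SomeAccounts stays the same, because the method still takes a Task<List<Account>> and returns a Task<List<CustomAccount>>. The difference is that the method now returns an incomplete task right away instead of blocking inside the call chain.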
This is not an answer, but I have a lot to say and it wouldn't fit in comments.
Some facts: the EF context is not thread safe and doesn't support parallel execution:
While thread safety would make async more useful it is an orthogonal feature. It is unclear that we could ever implement support for it in the most general case, given that EF interacts with a graph composed of user code to maintain state and there aren't easy ways to ensure that this code is also thread safe.
For the moment, EF will detect if the developer attempts to execute two async operations at one time and throw.
Some prediction:
You say that:
The parallel execution of the other four tasks has been in production for months without deadlocking.
They can't be executing in parallel. One possibility is that the thread pool cannot assign more than one thread to your operations, in that case they would be executed sequentially. Or it could be the way you are initializing your tasks, I'm not sure. Assuming they are executed sequentially (otherwise you would have recognized the exception I'm talking about), there is another problem:
Task.WaitAll hanging with multiple awaitable tasks in ASP.NET
So maybe it isn't about that specific task SomeProduct but it always happens on the last task? Well, if they executed in parallel, there wouldn't be a "last task" but as I've already pointed out, they must be running sequentially considering they had been in production for quite a long time.