Async LINQ - not lazy? Multithreaded? (C#)

I have the following code:
var things = await GetDataFromApi(cancellationToken);
var builder = new StringBuilder(JsonSerializer.Serialize(things));
await things
    .GroupBy(x => x.Category)
    .ToAsyncEnumerable()
    .SelectManyAwaitWithCancellation(async (category, ct) =>
    {
        var thingsWithColors = await _colorsApiClient.GetColorsFor(category.Select(thing => thing.Name).ToList(), ct);
        return category
            .Select(thing => ChooseBestColor(thingsWithColors))
            .ToAsyncEnumerable();
    })
    .ForEachAsync(thingAndColor =>
    {
        Console.WriteLine(Thread.CurrentThread.ManagedThreadId); // prints different IDs
        builder.Replace(thingAndColor.Thing, $"{thingAndColor.Color} {thingAndColor.Thing}");
    }, cancellationToken);
It uses System.Linq.Async and I find it difficult to understand.
In "classic"/synchronous LINQ, the whole thing would get executed only when I call ToList() or ToArray() on it. In the example above, there is no such call, but the lambdas get executed anyway. How does it work?
The other concern I have is about multithreading. I have heard many times that async != multithreading. How is it possible, then, that Console.WriteLine(Thread.CurrentThread.ManagedThreadId); prints various IDs? Some IDs get printed multiple times, but overall there are about five thread IDs in the output. None of my code creates threads explicitly; it's all async/await.
StringBuilder is not thread-safe, and I'd like to understand whether the implementation above is valid.
Please ignore the algorithm in my code; it doesn't really matter, it's just an example. What matters is the usage of System.Linq.Async.

ForEachAsync has a similar effect to ToList/ToArray, since it forces evaluation of the entire sequence.
By default, anything after an await continues on the same execution context, meaning if the code runs on the UI thread, it will continue running on the UI thread. If it runs on a background thread, it will continue to run on a background thread, but not necessarily the same one.
However, none of your code should run in parallel. That does not automatically make it thread safe; there probably need to be some memory barriers to ensure data is flushed correctly, but I would assume those barriers are issued by the framework code itself.
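To illustrate, here is a minimal standalone sketch (not the question's code) showing that continuations after an await in a console app can resume on different thread-pool threads, while still running one at a time rather than in parallel:
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        for (int i = 0; i < 5; i++)
        {
            Console.WriteLine($"Before await: thread {Thread.CurrentThread.ManagedThreadId}");
            // No SynchronizationContext in a console app, so the continuation
            // may run on any thread-pool thread.
            await Task.Delay(100);
            Console.WriteLine($"After await:  thread {Thread.CurrentThread.ManagedThreadId}");
        }
    }
}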

System.Linq.Async, as well as the whole dotnet/reactive repository, is currently a semi-abandoned project. The issues on GitHub are piling up, and nobody has answered them officially for almost a year. There is no published documentation apart from the XML documentation in the source code on top of each method. You can't really use this library without studying the source code, which is generally easy to do because the code is short, readable, and honestly doesn't do too much. The functionality offered by this library is similar to the functionality found in System.Linq, with the main difference being that the input is IAsyncEnumerable<T> instead of IEnumerable<T>, and the delegates can return values wrapped in ValueTask<T>s.
With the exception of a few operators like Merge (and only one of its overloads), System.Linq.Async doesn't introduce concurrency. The asynchronous operations are invoked one at a time, and each is awaited before the next one is invoked. The SelectManyAwaitWithCancellation operator is not one of the exceptions. The selector is invoked sequentially for each element, the resulting IAsyncEnumerable<TResult> is enumerated sequentially, and its values are yielded one after the other. So it's unlikely to create thread-safety issues.
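To make that concrete, here is a rough conceptual sketch (not the library's actual source) of how a SelectManyAwait-style operator produces its results, with everything happening strictly one step at a time:
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

static class SelectManySketch
{
    public static async IAsyncEnumerable<TResult> SelectManyAwaitSketch<TSource, TResult>(
        this IAsyncEnumerable<TSource> source,
        Func<TSource, CancellationToken, ValueTask<IAsyncEnumerable<TResult>>> selector,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        await foreach (var item in source.WithCancellation(ct))
        {
            // The selector is awaited to completion before the next source item is touched.
            var inner = await selector(item, ct);
            await foreach (var result in inner.WithCancellation(ct))
            {
                yield return result; // values come out one after the other
            }
        }
    }
}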
The ForEachAsync operator is just a substitute for a standard await foreach loop, and was included in the library at a time when C# language support for await foreach was non-existent (before C# 8). I would recommend against using this operator, because its resemblance to the new Parallel.ForEachAsync API could create confusion. Here is what is written inside the source code of the ForEachAsync operator:
// REVIEW: Once we have C# 8.0 language support, we may want to do away with these
// methods. An open question is how to provide support for cancellation,
// which could be offered through WithCancellation on the source. If we still
// want to keep these methods, they may be a candidate for
// System.Interactive.Async if we consider them to be non-standard
// (i.e. IEnumerable<T> doesn't have a ForEach extension method either).
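For reference, the question's pipeline can be consumed with a plain await foreach instead of ForEachAsync. A sketch reusing the question's own types and fields:
var query = things
    .GroupBy(x => x.Category)
    .ToAsyncEnumerable()
    .SelectManyAwaitWithCancellation(async (category, ct) =>
    {
        var thingsWithColors = await _colorsApiClient.GetColorsFor(
            category.Select(thing => thing.Name).ToList(), ct);
        return category
            .Select(thing => ChooseBestColor(thingsWithColors))
            .ToAsyncEnumerable();
    });

await foreach (var thingAndColor in query.WithCancellation(cancellationToken))
{
    builder.Replace(thingAndColor.Thing, $"{thingAndColor.Color} {thingAndColor.Thing}");
}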

Related

How to refactor 'lock' to 'async/await'?

We have an old library written in C# targeting .NET Framework 2.0. We now want to use it in a modern .NET Core project and intend to use async/await. However, the old library has a lot of lock blocks.
We plan to add new async methods to implement the same logic.
For example,
the old code
void GetOrder()
{
    // ...
    lock (_lock)
    {
        // ...
    }
}
expected result
async Task AsyncGetOrder()
{
    // ...
    await DoSomethingWithLock();
}
Please give me some advice on how to translate lock into async/await.
You could use SemaphoreSlim, but if there's a lot of it, the AsyncLock library will probably make the conversion much easier (and cleaner).
Just go with the AsyncLock library and relax.
The first thing you need to take into account is that async methods can call non-async methods, but it's not trivial for non-async methods to call async methods and wait for them to finish.
This means that every method that has a lock inside it will probably need to be called only from async methods. You can block and wait for async methods, but then there's no point in refactoring, and you have to be very careful to avoid deadlocks.
You also need to be aware that in some project types there's an importance to the identity of the executing thread. For example in WPF there are some things that only the UI thread is allowed to do, and if you launch a Task that runs such code from a thread in the thread-pool, you're likely to experience exceptions.
Having said that, if you want to refactor a method that waits on a lock into an async method that asynchronously waits for a lock/semaphore, you should use SemaphoreSlim and await the WaitAsync call. This way the async method will yield control when the SemaphoreSlim is blocking execution, and resume execution at some point later.
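A minimal sketch of that pattern (the field and method names here are illustrative, not from the question):
private readonly SemaphoreSlim _mutex = new SemaphoreSlim(1, 1);

async Task GetOrderAsync()
{
    await _mutex.WaitAsync();      // asynchronously wait for the "lock"
    try
    {
        // ... the code that used to sit inside lock(_lock) ...
        await DoSomethingAsync();  // hypothetical async work
    }
    finally
    {
        _mutex.Release();          // always release, even if an exception is thrown
    }
}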
There are a few general approaches:
#1 Refactor away from the need for locks at all
Look around for how to write lock-free C#. Sometimes it involves judicious use of the Interlocked class. Other times it involves a shift in mindset toward immutable state (ask a functional programmer). There are many cases where doing this has boosted parallel performance significantly.
And of course lockless threadsafe code can be executed either synchronously or asynchronously.
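As a small illustration of this approach, a lock-protected counter can often be replaced with Interlocked (a sketch under the assumption that the shared state really is just an int):
using System.Threading;

class Counter
{
    private int _count;

    // Instead of: lock (_lock) { _count++; }
    public void Increment() => Interlocked.Increment(ref _count);

    // A compare-and-swap loop for slightly more involved updates:
    public bool TryAdd(int amount, int limit)
    {
        while (true)
        {
            int current = Volatile.Read(ref _count);
            if (current + amount > limit)
                return false;                    // give up without changing anything
            if (Interlocked.CompareExchange(ref _count, current + amount, current) == current)
                return true;                     // nobody changed _count in the meantime
        }
    }
}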
#2 Refactor away from the need for a reentrant lock, then go async with something that doesn't allow reentrance
This is basically what people are recommending when they recommend SemaphoreSlim, the AsyncEx NuGet package, or similar.
Stephen Cleary has written about the wisdom behind going this route. He also gives some examples of how to do it:
https://blog.stephencleary.com/2013/04/recursive-re-entrant-locks.html
#3 Find an async drop-in replacement for Monitor.Lock/lock
Basically what you need is something that gives all three of these at the same time:
Asynchronicity
Reentrance
Mutual exclusion
Monitor.Lock/lock give you the second and third things. So you want something that gives you the first one also without sacrificing the other two.
This is a little trickier than it might seem. At first glance there are several NuGet packages which appear to do this. But the only correct one that I know of is this NuGet package (which I wrote):
https://www.nuget.org/packages/ReentrantAsyncLock/
Here it is in action:
var asyncLock = new ReentrantAsyncLock();
var raceCondition = 0;
// You can acquire the lock asynchronously
await using (await asyncLock.LockAsync(CancellationToken.None))
{
    await Task.WhenAll(
        Task.Run(async () =>
        {
            // The lock is reentrant
            await using (await asyncLock.LockAsync(CancellationToken.None))
            {
                // The lock provides mutual exclusion
                raceCondition++;
            }
        }),
        Task.Run(async () =>
        {
            await using (await asyncLock.LockAsync(CancellationToken.None))
            {
                raceCondition++;
            }
        })
    );
}
Assert.Equal(2, raceCondition);
This is certainly not the first attempt at doing this. But like I said, it's the only correct attempt that I've seen so far. Some other implementations will deadlock trying to re-enter the lock in one of the Task.Run calls. Others will not actually provide mutual exclusion, and the raceCondition variable will sometimes equal 1 instead of 2:
Stephen Cleary's POC does not provide mutual exclusion: https://dotnetfiddle.net/vLKyCX
NeoSmart.AsyncLock does not provide reentrance with mutual exclusion: https://dotnetfiddle.net/CkK674
Flettu sometimes does not provide reentrance, sometimes throws a semaphore count exception, or otherwise does not provide mutual exclusion: https://dotnetfiddle.net/o0c7j7
CellWars.Threading.AsyncLock does not provide mutual exclusion: https://dotnetfiddle.net/Tz38lN

Various ways of asynchronously returning a collection (with C# 7 features)

I have a simple synchronous method, looking like this:
public IEnumerable<Foo> MyMethod(Source src)
{
    // returns a List of Oof objects from a web service
    var oofs = src.LoadOofsAsync().Result;
    foreach (var oof in oofs)
    {
        // transforms an Oof object to a Foo object
        yield return Transform(oof);
    }
}
Since the method is part of a web application, it is good to use all resources as effectively as possible. Therefore, I would like to change the method into an asynchronous one. The easiest option is to do something like this:
public async Task<IEnumerable<Foo>> MyMethodAsync(Source src)
{
    var oofs = await src.LoadOofsAsync();
    return oofs.Select(oof => Transform(oof));
}
I am not an expert on either async/await or IEnumerable. However, from what I understand, this approach "kills" the benefit of IEnumerable, because the Task only completes once the whole collection is loaded, thus losing the "laziness" of the IEnumerable.
On other StackOverflow posts I have read several suggestions for using Rx.NET (or System.Reactive). Quickly browsing through the documentation I have read that IObservable<T> is their asynchronous alternative to IEnumerable<T>. However, using the naive approach and trying to type the following just did not work:
public async IObservable<Foo> MyMethodReactive(Source src)
{
    var oofs = await src.LoadOofsAsync();
    foreach (var oof in oofs)
    {
        yield return Transform(oof);
    }
}
I got a compilation error saying that IObservable<T> implements neither GetEnumerator() nor GetAwaiter(), so it cannot be used with both yield and async. I have not read the Rx.NET documentation more deeply, so I am probably just using the library incorrectly. But I did not want to spend time learning a new framework to modify a single method.
With the new possibilities in C# 7 it is now possible to implement custom types. Thus I, theoretically, could implement an IAsyncEnumerable, which would define both GetEnumerator() and GetAwaiter() methods. However, from my previous experience, I remember an unsuccessful attempt to create a custom implementation of GetEnumerator()... I ended up with a simple List, hidden in a container.
Thus we have 4 possible approaches to solve the task:
Keep the code synchronous, but with IEnumerable
Change it to asynchronous, but wrap IEnumerable in a Task<T>
Learn and use Rx.NET (System.Reactive)
Create a custom IAsyncEnumerable with C# 7 features
What are the benefits and drawbacks of each of these attempts? Which of them has the most significant impact on resource utilization?
In your situation, it sounds like the best option is Task<IEnumerable<T>>. Here's where each option excels:
Synchronous code (or parallel synchronous code) excels when there is no I/O but heavy CPU use. If you have I/O code waiting synchronously (like your first method implementation), a thread is blocked doing nothing while waiting for the web service to respond.
Task<IEnumerable<T>> is meant for when there is an I/O operation to fetch a collection. The thread that would otherwise sit waiting for the I/O operation can have something else scheduled on it while awaiting.
This sounds like your case.
Rx is best for push scenarios: Where there is data being 'pushed' to your code which you want to respond to. Common examples are applications that receive stock-market pricing data, or chat applications.
IAsyncEnumerable is meant for when you have a collection where each item will require or generate an async task. An example: Iterating over a collection of items and executing some sort of unique DB query for each one. If your Transform was in fact an I/O-bound async method, then this is probably more sensible.
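With C# 8 (which arrived after this question was written), async streams make that last option a language feature rather than a hand-rolled type. A sketch, assuming Transform had an async counterpart such as a hypothetical TransformAsync:
public async IAsyncEnumerable<Foo> MyMethodStreamed(Source src)
{
    var oofs = await src.LoadOofsAsync();
    foreach (var oof in oofs)
    {
        yield return await TransformAsync(oof); // each item is produced lazily
    }
}

// Consumed with:
// await foreach (var foo in MyMethodStreamed(src)) { ... }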

Is it a bad practice to combine use of Task and IObservable in my C# application?

I've recently gotten into Rx and I'm using it to help me pull data from several APIs in a data mining application.
I have an interface that I implement for each API, which encapsulates common calls to each API, e.g.
public interface IMyApi {
    IObservable<string> GetApiName(); // Cold feed for getting the API's name.
    IObservable<int> GetNumberFeed(); // Hot feed of numbers from the API
}
My question is around cold IObservables vs Tasks. In my mind, a cold observable is basically a task, they operate in much the same way. It strikes me as strange to be 'abstracting' a Task away as a cold observable, when you could argue that a Task is all you need. Also using a cold observable to wrap Tasks hides the nature of the activity, since the signature looks the same as a hot observable.
Another way I could represent the above interface is:
public interface IMyApi {
    Task<string> GetApiNameAsync(); // Async method for getting the API's name.
    IObservable<int> GetNumberFeed(); // Hot feed of numbers from the API
}
Is there some conventional wisdom on why I shouldn't mix and match between Tasks and IObservables?
Edit: To clarify - I've read the other discussions posted and understand the relationship between Rx and TPL, but my concerns are mainly about whether or not it's safe to combine the two in an application and whether it can lead to bad practice or threading and scheduling pitfalls?
My question is around cold IObservables vs Tasks. In my mind, a cold observable is basically a task, they operate in much the same way
It is important to note that this is not the case; they are very different. Here's the core difference:
// Nothing happens here at all! Just like calling Enumerable.Range(0, 100000000)
// doesn't actually create a huge array until I use foreach.
var myColdObservable = MakeANetworkRequestObservable();

// Two network requests made!
myColdObservable.Subscribe(x => { /*...*/ });
myColdObservable.Subscribe(x => { /*...*/ });

// Only ***one*** network request made, subscribers share the result
var myTaskObservable = MakeATask().ToObservable();
myTaskObservable.Subscribe(x => { /*...*/ });
myTaskObservable.Subscribe(x => { /*...*/ });
Why is this important? Several methods in Rx such as Retry depend on this behavior:
// Retries three times, then gives up
myColdObservable.Retry(3).Subscribe(x => { /*...*/ });

// Actually *never* retries, and is effectively the same as if the
// Retry were never there, since all three tries will get the same
// result!
myTaskObservable.Retry(3).Subscribe(x => { /*...*/ });
So in general, making your Observables cold will make your life easier.
How can I make a Task Cold?
Use the Defer operator:
var obs = Observable.Defer(() => CreateATask().ToObservable());
// CreateATask called *twice* here
obs.Subscribe(/*...*/);
obs.Subscribe(/*...*/);
There's no problem mixing the models, and in fact even the Rx team has included many adaptive operators in Rx. For example, ToTask, ToObservable, SelectMany, DeferAsync, StartAsync, ToAsync, etc. You can even await an IObservable<T> within an async method.
The primary difference that should affect your decision is cardinality:
IObservable<T> is [0,∞]
Task<T> is [0,1]
So if you need to represent only a single return value, then strongly consider using Task<T>.
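To illustrate the adapters mentioned above, here is a small sketch of moving between the two worlds (it assumes the System.Reactive package; awaiting an observable yields its final value):
using System;
using System.Reactive.Linq;
using System.Reactive.Threading.Tasks;
using System.Threading.Tasks;

static class Adapters
{
    public static async Task Run(IObservable<int> numbers, Task<string> nameTask)
    {
        // Task -> IObservable
        IObservable<string> nameObservable = nameTask.ToObservable();

        // IObservable -> Task (completes with the last element)
        Task<int> lastNumber = numbers.LastAsync().ToTask();

        // An IObservable<T> can also be awaited directly inside an async method
        int last = await numbers;

        Console.WriteLine($"{await nameObservable}: {last} / {await lastNumber}");
    }
}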
The difference between Task and IObservable is not hot vs. cold: Task-returning methods can pretty much be "cold" (return new Task on every call) or "hot" (always return the same Task), just like IObservables.
The difference between the two is that IObservable represents a sequence of results, while Task represents a single result.
So, in cases when you'll always have a single result (or an error), use Task, when you can have any number of results, use IObservable.

Is Async await keyword equivalent to a ContinueWith lambda?

Could someone please be kind enough to confirm if I have understood the Async await keyword correctly? (Using version 3 of the CTP)
Thus far I have worked out that inserting the await keyword prior to a method call essentially does two things: (a) it causes the method to return immediately, and (b) it creates a "continuation" that is invoked upon completion of the asynchronous operation. In either case, the continuation is the remainder of the method's code block.
So what I am wondering is, are these two bits of code technically equivalent, and if so, does this basically mean that the await keyword is identical to creating a ContinueWith lambda (i.e., it's basically a compiler shortcut for one)? If not, what are the differences?
bool Success =
    await new POP3Connector(
        "mail.server.com", txtUsername.Text, txtPassword.Text).Connect();
// At this point the method will return and following code will
// only be invoked when the operation is complete(?)
MessageBox.Show(Success ? "Logged In" : "Wrong password");
VS
(new POP3Connector(
    "mail.server.com", txtUsername.Text, txtPassword.Text).Connect())
    .ContinueWith((success) =>
        MessageBox.Show(success.Result ? "Logged In" : "Wrong password"));
The general idea is correct - the remainder of the method is made into a continuation of sorts.
The "fast path" blog post has details on how the async/await compiler transformation works.
Differences, off the top of my head:
The await keyword also makes use of a "scheduling context" concept. The scheduling context is SynchronizationContext.Current if it exists, falling back on TaskScheduler.Current. The continuation is then run on the scheduling context. So a closer approximation would be to pass TaskScheduler.FromCurrentSynchronizationContext into ContinueWith, falling back on TaskScheduler.Current if necessary.
The actual async/await implementation is based on pattern matching; it uses an "awaitable" pattern that allows other things besides tasks to be awaited. Some examples are the WinRT asynchronous APIs, some special methods such as Yield, Rx observables, and special socket awaitables that don't hit the GC as hard. Tasks are powerful, but they're not the only awaitables.
One more minor nitpicky difference comes to mind: if the awaitable is already completed, then the async method does not actually return at that point; it continues synchronously. So it's kind of like passing TaskContinuationOptions.ExecuteSynchronously, but without the stack-related problems.
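Putting those differences together, a closer hand-written approximation of the await version would look something like this (still a sketch, not the exact compiler output):
var connectTask = new POP3Connector(
    "mail.server.com", txtUsername.Text, txtPassword.Text).Connect();

// Capture the current SynchronizationContext (e.g. the UI thread) if there is one.
var scheduler = SynchronizationContext.Current != null
    ? TaskScheduler.FromCurrentSynchronizationContext()
    : TaskScheduler.Current;

// await also unwraps exceptions and skips the scheduler hop entirely
// when the task has already completed.
connectTask.ContinueWith(
    t => MessageBox.Show(t.Result ? "Logged In" : "Wrong password"),
    CancellationToken.None,
    TaskContinuationOptions.None,
    scheduler);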
It's "essentially" that, but the generated code does strictly more than just that. For lots more detail on the code generated, I'd highly recommend Jon Skeet's Eduasync series:
http://codeblog.jonskeet.uk/category/eduasync/
In particular, post #7 gets into what gets generated (as of CTP 2) and why, so probably a great fit for what you're looking for at the moment:
http://codeblog.jonskeet.uk/2011/05/20/eduasync-part-7-generated-code-from-a-simple-async-method/
EDIT: I think it's likely to be more detail than what you're looking for from the question, but if you're wondering what things look like when you have multiple awaits in the method, that's covered in post #9 :)
http://codeblog.jonskeet.uk/2011/05/30/eduasync-part-9-generated-code-for-multiple-awaits/

How does async work in C#?

Microsoft announced the Visual Studio Async CTP today (October 28, 2010) that introduces the async and await keywords into C#/VB for asynchronous method execution.
First I thought that the compiler translates the keywords into the creation of a thread but according to the white paper and Anders Hejlsberg's PDC presentation (at 31:00) the asynchronous operation happens completely on the main thread.
How can I have an operation executed in parallel on the same thread? How is it technically possible and to what is the feature actually translated in IL?
It works similarly to the yield return keyword in C# 2.0.
An asynchronous method is not actually an ordinary sequential method. It is compiled into a state machine (an object) with some state (local variables are turned into fields of the object). Each block of code between two uses of await is one "step" of the state machine.
This means that when the method starts, it just runs the first step and then the state machine returns and schedules some work to be done - when the work is done, it will run the next step of the state machine. For example this code:
async Task Demo() {
    var v1 = foo();
    var v2 = await bar();
    more(v1, v2);
}
Would be translated to something like:
class _Demo {
    int _v1, _v2;
    int _state = 0;
    Task<int> _await1;
    public void Step() {
        switch (this._state) {
            case 0:
                this._v1 = foo();
                this._await1 = bar();
                // When the async operation completes, it will call this method
                this._state = 1;
                this._await1.SetContinuation(Step);
                return;
            case 1:
                this._v2 = this._await1.Result; // Get the result of the operation
                more(this._v1, this._v2);
                return;
        }
    }
}
The important part is that it just uses the SetContinuation method to specify that when the operation completes, it should call the Step method again (and the method knows that it should run the second bit of the original code using the _state field). You can easily imagine that the SetContinuation would be something like btn.Click += Step, which would run completely on a single thread.
The asynchronous programming model in C# is very close to F# asynchronous workflows (in fact, it is essentially the same thing, aside from some technical details), and writing reactive single-threaded GUI applications using async is quite an interesting area - at least I think so - see for example this article (maybe I should write a C# version now :-)).
The translation is similar to iterators (and yield return) and in fact, it was possible to use iterators to implement asynchronous programming in C# earlier. I wrote an article about that a while ago - and I think it can still give you some insight on how the translation works.
How can I have an operation executed in parallel on the same thread?
You can't. Asynchrony is not "parallelism" or "concurrency". Asynchrony might be implemented with parallelism, or it might not be. It might be implemented by breaking up the work into small chunks, putting each chunk of work on a queue, and then executing each chunk of work whenever the thread happens to be not doing anything else.
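A toy illustration of "chunks of work on a queue, executed when the thread is free" - roughly what a single-threaded message loop or SynchronizationContext does. This is a sketch, not how any particular framework implements it:
using System;
using System.Collections.Generic;

class SingleThreadedPump
{
    private readonly Queue<Action> _work = new Queue<Action>();

    // Continuations queued by completed asynchronous operations are just more chunks.
    public void Post(Action chunk) => _work.Enqueue(chunk);

    public void Run()
    {
        // One thread pulls chunks off the queue, one at a time.
        while (_work.Count > 0)
        {
            var chunk = _work.Dequeue();
            chunk();
        }
    }
}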
I've got a whole series of articles on my blog about how all this stuff works; the one directly germane to this question will probably go up Thursday of next week. Watch this link for details.
As I understand it, what the async and await keywords do is that every time an async method employs the await keyword, the compiler will turn the remainder of the method into a continuation that is scheduled when the async operation is completed. That allows async methods to return to the caller immediately and resume work when the async part is done.
According to the available papers there are a lot of details to it, but unless I am mistaken, that is the gist of it.
As I see it, the purpose of async methods is not to run a lot of code in parallel, but to chop async methods up into a number of small chunks that can be called as needed. The key point is that the compiler handles all the complex wiring of callbacks using tasks/continuations. This not only reduces complexity, but allows async methods to be written more or less like traditional synchronous code.
