What are Publish and SelectMany doing in this query? - c#

I’ve got a new question about one (or two) Reactive methods.
In my scenario I needed an observable sequence capable of suppressing other emitted Tasks while the first Task wasn’t completed, and ended up with something like this:
Observable.Interval(TimeSpan.FromMilliseconds(200))
.Select(x => Observable.FromAsync(async () =>
{
await Task.Delay(1000);
// Simulating long running task
Console.WriteLine(x);
}))
.Publish(x => x.FirstAsync().SelectMany(c => c).Repeat())
.Subscribe();
I tried to Google but I really can’t explain a few things:
First of all, how does that work 😂?
What exactly in that Reactive sequence blocks the observable from reaching the subscription?
What exactly does Replay do there? Isn’t Replay supposed to replay the Task in this case? Or I don’t know.
Can anyone explain in detail every step of that Reactive query?
What does Publish do with that kind of selector? How does Replay play into that query? And why do I need to call SelectMany on FirstAsync if only one element will be emitted anyway?

The .SelectMany(c => c) is an idiomatic way to flatten/merge a nested sequence. You can replace it with .Merge(), and the behavior of the query will be identical.
The Publish operator, when used with a Func<IObservable<TSource>, IObservable<TResult>> parameter, subscribes to the query on which it is chained, and then remains subscribed until the sequence produced by the lambda completes. So in your case, by wrapping the inner sequence x.FirstAsync().SelectMany(c => c).Replay() in a Publish, you delay the unsubscription from the chained sequence (the Interval+Select+FromAsync) until the inner sequence completes. The inner sequence never completes, so the chained sequence keeps producing one cold IObservable<Unit> subsequence every 200 milliseconds, forever. You can observe this happening by inserting a Do operator before the Publish:
.Do(x => Console.WriteLine($"New subsequence: {x.GetType().Name}"))
The Replay operator is similar to the Publish, with the difference that the Replay has memory of past notifications, whilst the Publish has no memory whatsoever. I guess that your intention was to attach the Repeat instead of the Replay. The Replay without parameters produces a "connectable" observable that doesn't subscribe automatically to the chained sequence. You have to either Connect it manually or attach the RefCount operator to it. In your case you are doing neither, so the resulting sequence never emits anything and never completes. It's a nasty deadlock situation.
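To make the Publish + FirstAsync + SelectMany + Repeat combination more concrete, here is a minimal sketch that replaces the timer and the FromAsync tasks with manually driven subjects. It assumes the System.Reactive NuGet package; the names source, first, second and third are made up for illustration and are not part of the original query.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Reactive.Subjects;

// Outer stream of inner sequences. The subjects are hypothetical stand-ins
// for the cold FromAsync observables of the original query.
var source = new Subject<IObservable<string>>();
var first = new Subject<string>();
var second = new Subject<string>();
var third = new Subject<string>();
var results = new List<string>();

using var subscription = source
    .Publish(x => x.FirstAsync().SelectMany(c => c).Repeat())
    .Subscribe(value => results.Add(value));

source.OnNext(first);      // taken by FirstAsync, which then stops listening
first.OnNext("a");         // flows through SelectMany to the subscriber
source.OnNext(second);     // ignored: the first inner sequence is still running
second.OnNext("dropped");  // nobody is subscribed to this inner sequence
first.OnCompleted();       // Repeat re-subscribes to x.FirstAsync()
source.OnNext(third);      // taken, because nothing is in flight any more
third.OnNext("c");

Console.WriteLine(string.Join(", ", results));
```

Only the values of the first and third inner sequences should reach the subscriber, which is the "suppress while busy" behavior the question asks about.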

Related

Async LINQ - not lazy? Multithreaded?

I have the following code:
var things = await GetDataFromApi(cancellationToken);
var builder = new StringBuilder(JsonSerializer.Serialize(things));
await things
.GroupBy(x => x.Category)
.ToAsyncEnumerable()
.SelectManyAwaitWithCancellation(async (category, ct) =>
{
var thingsWithColors = await _colorsApiClient.GetColorsFor(category.Select(thing => thing.Name).ToList(), ct);
return category
.Select(thing => ChooseBestColor(thingsWithColors))
.ToAsyncEnumerable();
})
.ForEachAsync(thingAndColor =>
{
Console.WriteLine(Thread.CurrentThread.ManagedThreadId); // prints different IDs
builder.Replace(thingAndColor.Thing, $"{thingAndColor.Color} {thingAndColor.Thing}");
}, cancellationToken);
It uses System.Linq.Async and I find it difficult to understand.
In "classic"/synchronous LINQ, the whole thing would get executed only when I call ToList() or ToArray() on it. In the example above, there is no such call, but the lambdas get executed anyway. How does it work?
The other concern I have is about multi-threading. I heard many times that async != multithreading. Then, how is that possible that the Console.WriteLine(Thread.CurrentThread.ManagedThreadId); prints various IDs? Some of the IDs get printed multiple times, but overall there are about 5 thread IDs in the output. None of my code creates any threads explicitly. It's all async-await.
The StringBuilder does not support multi-threading, and I'd like to understand if the implementation above is valid.
Please ignore the algorithm of my code, it does not really matter, it's just an example. What matters is the usage of System.Linq.Async.
ForEachAsync would have a similar effect as ToList/ToArray since it forces evaluation of the entire list.
By default, anything after an await continues on the same execution context, meaning if the code runs on the UI thread, it will continue running on the UI thread. If it runs on a background thread, it will continue to run on a background thread, but not necessarily the same one.
However, none of your code should run in parallel. That does not necessarily mean it is thread safe, there probably need to be some memory barriers to ensure data is flushed correctly, but I would assume these barriers are issued by the framework code itself.
The System.Linq.Async, as well as the whole dotnet/reactive repository, is currently a semi-abandoned project. The issues on GitHub are piling up, and nobody has answered them officially for almost a year. There is no documentation published, apart from the XML documentation in the source code on top of each method. You can't really use this library without studying the source code, which is generally easy to do because the code is short, readable, and honestly doesn't do too much. The functionality offered by this library is similar to the functionality found in the System.Linq, with the main difference being that the input is IAsyncEnumerable<T> instead of IEnumerable<T>, and the delegates can return values wrapped in ValueTask<T>s.
With the exception of a few operators like the Merge (and only one of its overloads), the System.Linq.Async doesn't introduce concurrency. The asynchronous operations are invoked one at a time, and each one is awaited before the next is invoked. The SelectManyAwaitWithCancellation operator is not one of the exceptions. The selector is invoked sequentially for each element, and the resulting IAsyncEnumerable<TResult> is enumerated sequentially, with its values yielded one after the other. So it's unlikely to create thread-safety issues.
The ForEachAsync operator is just a substitute for a standard await foreach loop, and was included in the library at a time when C# language support for await foreach was nonexistent (before C# 8). I would recommend against using this operator, because its resemblance to the new Parallel.ForEachAsync API could create confusion. Here is what is written inside the source code of the ForEachAsync operator:
// REVIEW: Once we have C# 8.0 language support, we may want to do away with these
// methods. An open question is how to provide support for cancellation,
// which could be offered through WithCancellation on the source. If we still
// want to keep these methods, they may be a candidate for
// System.Interactive.Async if we consider them to be non-standard
// (i.e. IEnumerable<T> doesn't have a ForEach extension method either).
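As a rough sketch of the await foreach alternative, the following uses only the base class library. GetThingsAsync is a hypothetical stand-in for the pipeline in the question, not the real API calls; it yields its items one at a time, so the loop body never runs concurrently with itself even if the thread changes between iterations.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Hypothetical async source standing in for the query in the question.
static async IAsyncEnumerable<string> GetThingsAsync()
{
    foreach (var name in new[] { "apple", "sky" })
    {
        await Task.Delay(10); // simulates an asynchronous API call
        yield return name;
    }
}

var builder = new StringBuilder("apple sky");

// The next item is requested only after the loop body has finished,
// so the StringBuilder is touched by one iteration at a time.
await foreach (var thing in GetThingsAsync())
{
    builder.Replace(thing, $"colored {thing}");
}

Console.WriteLine(builder.ToString());
```

If cancellation is needed, WithCancellation(cancellationToken) can be applied to the source in the await foreach header.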

Executing Task based methods in Observable chain => IObservable<IObservable<Unit>>

I have a lot of code that is reactive but needs to call into Task based methods.
For example, in this snippet PracticeIdChanged is an IObservable. When PracticeIdChanged fires, I want the system to react by reloading some stuff and have code that looks like this:
PracticeIdChanged.Subscribe(async x => {
SelectedChargeInfo.Item = null;
await LoadAsync().ConfigureAwait(false);
});
Although it seems to work ok, I get warnings about executing async code in the Subscribe. Additionally, I consider this to be a code smell as I am mixing two separate threading models which I think may come to bite me later.
Refactoring like this works (even) with no combination methods like .Merge(), .Switch() or .Concat()
PracticeIdChanged
.Do(_ => SelectedChargeInfo.Item = null)
.Select(_ => LoadAsync().ToObservable())
.Subscribe();
When PracticeIdChanged fires, the LoadAsync method executes.
The Select results in an IObservable<IObservable<Unit>>, which looks odd. Is this OK, or does it require some combination function like .Merge() or .Switch()?
In many places, I use SelectMany to execute the Task based method but it requires returning Task which would require changing the signature of the Task based method in the example above which I do not want to do.
It depends on what kind of notifications you expect to get from the resulting sequence, and what kind of behavior you want in case of errors. In your example you .Subscribe() to the sequence without passing any handler whatsoever (onNext/onError/onCompleted), indicating that you are not interested to be notified for anything. You don't care about the completion of the asynchronous operations, all of them becoming essentially fired-and-forgotten. Also a failure of one asynchronous operation will have no consequence to the rest: the already started asynchronous operations will continue running (they won't get canceled), and starting new asynchronous operations will not be impeded. Finally a failure of the source sequence (PracticeIdChanged) will result in an unhandled exception, that will crash the process. If that's the behavior that you want, then your current setup is what you need.
For comparison, let's consider this setup:
await PracticeIdChanged
.Do(_ => SelectedChargeInfo.Item = null)
.Select(_ => Observable.FromAsync(ct => LoadAsync(ct)))
.Merge()
.DefaultIfEmpty();
This setup assumes that the LoadAsync method has a CancellationToken parameter. The resulting sequence is awaited. The await will complete when all LoadAsync operations have completed, or any one of them has failed, or if the source sequence has failed. In case of failure, all currently running asynchronous operations will receive a cancellation signal, so that they can bail out quickly. The await will not wait for their completion though. Only the first error that occurred will be propagated as an exception. This exception can be handled by wrapping the await in a try/catch block. There is no possibility for an uncatchable, process-crashing, unhandled exception.
The purpose of the DefaultIfEmpty at the end of the chain is to prevent an InvalidOperationException, in case the source sequence emits zero elements. It's a workaround for this strange "feature" of empty observable sequences, to throw when waited synchronously or asynchronously.
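The effect of DefaultIfEmpty on an awaited sequence can be checked in isolation. This is a small sketch, assuming the System.Reactive NuGet package, with Observable.Empty standing in for a source that completes without emitting anything.

```csharp
using System;
using System.Reactive;
using System.Reactive.Linq;

var emptyThrew = false;
try
{
    // Awaiting a sequence that completes without elements throws.
    await Observable.Empty<Unit>();
}
catch (InvalidOperationException)
{
    emptyThrew = true;
}

// With DefaultIfEmpty the await yields default(Unit) instead of throwing.
var fallback = await Observable.Empty<Unit>().DefaultIfEmpty();

Console.WriteLine($"Empty threw: {emptyThrew}");
```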

C# Rx Observable.Never<> behaves like Observable.Empty<>?

I'm new to Rx and have this code snippet for a try.
Observable.Never<string>().Subscribe(Console.Write);
Observable.Empty<string>().Subscribe(Console.Write);
I expected that Never<string>() will behave like Console.ReadKey which will not end, but as I run these 2 lines, they end immediately, so [Never] behaves like [Empty] to me.
What is the correct understanding of [Never] and is there a good sample usage for it?
Neither Observable.Never() nor Observable.Empty() will emit any values. However, the observable built with Observable.Never() will not complete and instead stays "open/active". Whether the observable completes (Empty()) or not (Never()) can make a difference at the place where you consume it, but this depends on your actual use case.
Having observables which don't emit any values might sound useless, but maybe you are at a location where you have to provide an observable (instead of using null). So you can write something like this:
public override IObservable<string> NameChanged => Observable.Never<string>();
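The difference shows up once a completion handler is passed to Subscribe. The following is a minimal sketch, assuming the System.Reactive NuGet package; the flag names are made up for illustration.

```csharp
using System;
using System.Reactive.Linq;

var emptyCompleted = false;
var neverCompleted = false;

// Subscribe returns immediately in both cases; only Empty signals completion.
using var emptySubscription = Observable.Empty<string>()
    .Subscribe(_ => { }, () => emptyCompleted = true);
using var neverSubscription = Observable.Never<string>()
    .Subscribe(_ => { }, () => neverCompleted = true);

Console.WriteLine($"Empty completed: {emptyCompleted}");
Console.WriteLine($"Never completed: {neverCompleted}");
```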
So I don't have a ton of experience with Rx, but I believe all Subscribe is doing is registering what to do when the observable emits. If your observable never emits (ie Empty or Never) then the method is never called. The application is not waiting for the subscription itself to end. If you wanted to wait forever you would use something like
Observable.Never<string>().Wait();
This ties back into the reason you should not use async operation in Subscribe. Take the following code
static void Main(string[] args)
{
Observable.Range(1, 5).Subscribe(async x => await DoTheThing(x));
Console.WriteLine("done");
}
static async Task DoTheThing(int x)
{
await Task.Delay(TimeSpan.FromSeconds(x));
Console.WriteLine(x);
}
When run, the application will immediately write "done" and exit after pushing the values into the observable, because the observable is unaware of whether the subscriber has finished its handling or not. Hopefully I made that clear, and if someone with more Rx knowledge wants to step in to help, that'd be good.
This link explains the difference between empty, never, and throw:
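One possible way to keep the asynchronous work inside the sequence, so that the caller can wait for it, is to project each value to a deferred task and await the flattened result. This is only a sketch, assuming the System.Reactive NuGet package; the delays are shortened and the log list is added purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Threading.Tasks;

var log = new List<string>();

async Task DoTheThing(int x)
{
    await Task.Delay(TimeSpan.FromMilliseconds(20 * x));
    log.Add(x.ToString());
}

// FromAsync defers each call until Concat subscribes to it, so the
// operations run one after another and the await covers all of them.
await Observable.Range(1, 3)
    .Select(x => Observable.FromAsync(() => DoTheThing(x)))
    .Concat()
    .DefaultIfEmpty();

log.Add("done");
Console.WriteLine(string.Join(", ", log));
```

Here "done" is recorded only after all three operations have finished, unlike the async lambda passed directly to Subscribe.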
http://reactivex.io/documentation/operators/empty-never-throw.html
And this is one usage of Never:
https://rxjs-dev.firebaseapp.com/api/index/const/NEVER

System.Reactive Throttling an async method

I have been putting off using reactive extensions for so long, and I thought this would be a good use. Quite simply, I have a method that can be called for various reasons on various code paths
private async Task GetProductAsync(string blah) {...}
I need to be able to throttle this method. That's to say, I want to stop the flow of calls until no more calls are made (for a specified period of time). Or more clearly, if 10 calls to this method happen within a certain time period, I want to limit (throttle) it to only 1 call, made a period of time after the last call.
I can see an example using a method with IEnumerable, this kind of makes sense
static IEnumerable<int> GenerateAlternatingFastAndSlowEvents()
{ ... }
...
var observable = GenerateAlternatingFastAndSlowEvents().ToObservable().Timestamp();
var throttled = observable.Throttle(TimeSpan.FromMilliseconds(750));
using (throttled.Subscribe(x => Console.WriteLine("{0}: {1}", x.Value, x.Timestamp)))
{
Console.WriteLine("Press any key to unsubscribe");
Console.ReadKey();
}
Console.WriteLine("Press any key to exit");
Console.ReadKey();
However, (and this has always been my major issue with Rx, forever), how do I create an Observable from a simple async method.
Update
I have managed to find an alternative approach using ReactiveProperty
Barcode = new ReactiveProperty<string>();
Barcode.Select(text => Observable.FromAsync(async () => await GetProductAsync(text)))
.Throttle(TimeSpan.FromMilliseconds(1000))
.Switch()
.ToReactiveProperty();
The premise is that I catch it at the text property Barcode. However, this has its own drawbacks, as ReactiveProperty takes care of notification, and I can't silently update the backing field as it's already managed.
To summarise, how can I convert an async method call to an Observable, so I can use the Throttle method?
Unrelated to your question, but probably helpful: Rx's Throttle operator is really a debounce operator. The closest thing to a throttling operator is Sample. Here's the difference (assuming you want to throttle or debounce to one item / 3 seconds):
items   : --1-23----4-56-7----8----9-
throttle: --1--3-----4--6--7--8-----9
debounce: --1-------4--6------8----9-
Sample/throttle will bunch items that arrive in the sensitive time and emit the last one on the next sampling tick. Debounce throws away items that arrive in the sensitive time, then re-starts the clock: The only way for an item to emit is if it was preceded by Time-Range of silence.
Rx.NET's Throttle operator does what debounce above depicts. Sample does what throttle above depicts.
If you want something different, describe how you want to throttle.
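The difference between the two operators can be reproduced deterministically with a virtual-time scheduler. This is a sketch, assuming the System.Reactive NuGet package; the item timings are chosen for illustration and do not match the marble diagram above.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Reactive.Subjects;

var scheduler = new HistoricalScheduler(); // virtual clock, advanced manually
var input = new Subject<int>();
var debounced = new List<int>();
var sampled = new List<int>();
var window = TimeSpan.FromSeconds(3);

using var throttleSub = input.Throttle(window, scheduler).Subscribe(x => debounced.Add(x));
using var sampleSub = input.Sample(window, scheduler).Subscribe(x => sampled.Add(x));

// Items arrive at t = 1s, 2s, 4s and 5s, then the input goes quiet.
scheduler.AdvanceBy(TimeSpan.FromSeconds(1)); input.OnNext(1);
scheduler.AdvanceBy(TimeSpan.FromSeconds(1)); input.OnNext(2);
scheduler.AdvanceBy(TimeSpan.FromSeconds(2)); input.OnNext(3);
scheduler.AdvanceBy(TimeSpan.FromSeconds(1)); input.OnNext(4);
scheduler.AdvanceBy(TimeSpan.FromSeconds(5));

Console.WriteLine("Throttle (debounce): " + string.Join(", ", debounced));
Console.WriteLine("Sample (throttle):   " + string.Join(", ", sampled));
```

Throttle stays silent until three seconds pass without a new item, so it only lets the last item through, while Sample emits the latest item on each three-second tick.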
There are two key ways of converting a Task to an Observable, with an important difference between them.
Observable.FromAsync(()=>GetProductAsync("test"));
and
GetProductAsync("test").ToObservable();
The first will not start the Task until you subscribe to it.
The second will create (and start) the task and the result will either immediately or sometime later appear in the observable, depending on how fast the Task is.
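A quick way to see the difference is to count how often the method actually starts. The sketch below assumes the System.Reactive NuGet package; this GetProductAsync is a hypothetical stand-in that returns a Task<int> and counts its invocations, not the method from the question.

```csharp
using System;
using System.Reactive.Linq;
using System.Reactive.Threading.Tasks;
using System.Threading.Tasks;

var started = 0;

// Hypothetical stand-in for GetProductAsync; counts how often it is invoked.
Task<int> GetProductAsync(string text)
{
    started++;
    return Task.FromResult(text.Length);
}

var lazy = Observable.FromAsync(() => GetProductAsync("test"));
var afterFromAsync = started;        // nothing has run yet

var eager = GetProductAsync("test").ToObservable();
var afterToObservable = started;     // the direct call started the task

await lazy;
await lazy;                          // every subscription invokes the method again
var afterTwoLazyAwaits = started;

await eager;
await eager;                         // the same task result is observed each time
var afterTwoEagerAwaits = started;

Console.WriteLine($"{afterFromAsync}, {afterToObservable}, {afterTwoLazyAwaits}, {afterTwoEagerAwaits}");
```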
Looking at your question in general though, it seems that you want to stop the flow of calls. You do not want to throttle the flow of results, which would result in unnecessary computation and loss.
If this is your aim, your GetProductAsync could be seen as an observer of call events, and the GetProductAsync should throttle those calls. One way of achieving that would be to declare a
public event Action<string> GetProduct;
and use
var callStream = Observable.FromEvent<string>(
handler => GetProduct += handler,
handler => GetProduct -= handler);
The problem then becomes how to return the result and what should happen when your 'caller's' call is throttled out and discarded.
One approach there could be to declare a type "GetProductCall" which would have the input string and output result as properties.
You could then have a setup like:
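var callStream = Observable.FromEvent<GetProductCall>(
handler => GetProduct += handler,
handler => GetProduct -= handler)
.Throttle(...)
// Assumes GetProduct is an Action<GetProductCall> and GetProductAsync returns a Task<T> result
.Select(call => Observable.FromAsync(async () =>
{
call.Result = await GetProductAsync(call.Input);
return call;
}))
.Concat();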
(code not tested, just illustrative)
Another approach might include the Merge(N) overload that limits the max number of concurrent observables.

Observable.Range being repeated?

New to Rx -- I have a sequence that appears to be functioning correctly except for the fact that it appears to repeat.
I think I'm missing something around calls to Select() or SelectMany() that triggers the range to re-evaluate.
Explanation of Code & What I'm trying to Do
For all numbers, loop through a method that retrieves data (paged from a database).
Eventually, this data will be empty (I only want to keep processing while it retrieves data).
For each of those records retrieved, I only want to process ones that should be processed
Of those that should be processed, I'd like to process up to x of them in parallel (according to a setting).
I want to wait until the entire sequence is completed to exit the method (hence the wait call at the end).
Problem With the Code Below
I run the code through with a data set that I know only has 1 item.
So, page 0 returns 1 item, and page 1 returns 0 items.
My expectation is that the process runs once for the one item.
However, I see that both page 0 and 1 are called twice and the process thus runs twice.
I think this has something to do with a call that is causing the range to re-evaluate beginning from 0, but I can't figure out what it is.
The Code
var query = Observable.Range(0, int.MaxValue)
.Select(pageNum =>
{
_etlLogger.Info("Calling GetResProfIDsToProcess with pageNum of {0}", pageNum);
return _recordsToProcessRetriever.GetResProfIDsToProcess(pageNum, _processorSettings.BatchSize);
})
.TakeWhile(resProfList => resProfList.Any())
.SelectMany(records => records.Where(x=> _determiner.ShouldProcess(x)))
.Select(resProf => Observable.Start(async () => await _schoolDataProcessor.ProcessSchoolsAsync(resProf)))
.Merge(maxConcurrent: _processorSettings.ParallelProperties)
.Do(async trackingRequests =>
{
await CreateRequests(trackingRequests.Result, createTrackingPayload);
var numberOfAttachments = SumOfRequestType(trackingRequests.Result, TrackingRecordRequestType.AttachSchool);
var numberOfDetachments = SumOfRequestType(trackingRequests.Result, TrackingRecordRequestType.DetachSchool);
var numberOfAssignmentTypeUpdates = SumOfRequestType(trackingRequests.Result,
TrackingRecordRequestType.UpdateAssignmentType);
_etlLogger.Info("Extractor generated {0} attachments, {1} detachments, and {2} assignment type changes.",
numberOfAttachments, numberOfDetachments, numberOfAssignmentTypeUpdates);
});
var subscription = query.Subscribe(
trackingRequests =>
{
//Nothing really needs to happen here. Technically we're just doing something when it's done.
},
() =>
{
_etlLogger.Info("Finished! Woohoo!");
});
await query.Wait();
This is because you subscribe to the sequence twice. Once at query.Subscribe(...) and again at query.Wait().
Observable.Range(0, int.MaxValue) is a cold observable. Every time you subscribe to it, it will be evaluated again. You could make the observable hot by publishing it with Publish(), then subscribe to it, and then Connect() and then Wait(). This does add a risk of getting an InvalidOperationException if you call Wait() after the last element has already been yielded. A better alternative is LastOrDefaultAsync().
That would get you something like this:
var connectable = query.Publish();
var subscription = connectable.Subscribe(...);
subscription = new CompositeDisposable(connectable.Connect(), subscription);
await connectable.LastOrDefaultAsync();
Or you can avoid await and return a task directly with ToTask() (do remove async from your method signature).
return connectable.LastOrDefaultAsync().ToTask();
Once converted to a task, you can synchronously wait for it with Wait() (do not confuse Task.Wait() with Observable.Wait()).
connectable.LastOrDefaultAsync().ToTask().Wait();
However, most likely you do not want to wait at all! Waiting in an async context makes little sense. What you should do is put the remainder of the code that needs to run after the sequence completes in the OnCompleted part of the subscription. If you have (clean-up) code that needs to run even when you unsubscribe (Dispose), consider Observable.Using or the Finally(...) method to ensure this code is run.
As already mentioned the cause of the Observable.Range being repeated is the fact that you're subscribing twice - once with .Subscribe(...) and once with .Wait().
In this kind of circumstance I would go with a very simple blocking call to get the values. Just do this:
var results = query.ToArray().Wait();
The .ToArray() turns a multi-valued IObservable<T> into a single-valued IObservable<T[]>. The .Wait() turns this into T[]. It's the easy way to ensure only one subscription, blocking, and getting all of the values out.
In your case you may not need all values, but I think this is a good habit to get into.
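The double evaluation is easy to reproduce with a counter. The following sketch assumes the System.Reactive NuGet package and uses Observable.Defer with a small Range as a hypothetical stand-in for the paged query.

```csharp
using System;
using System.Reactive.Linq;

var evaluations = 0;

// Defer runs its factory once per subscription, which makes the
// re-evaluation of a cold sequence visible.
var query = Observable.Defer(() =>
{
    evaluations++;
    return Observable.Range(0, 3);
});

using var subscription = query.Subscribe(_ => { });
query.Wait();                          // a second, independent subscription
var afterSubscribeAndWait = evaluations;

evaluations = 0;
var results = query.ToArray().Wait();  // a single subscription that collects everything
var afterToArray = evaluations;

Console.WriteLine($"Subscribe + Wait: {afterSubscribeAndWait}, ToArray().Wait(): {afterToArray}");
```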
