System.Reactive Throttling an async method - c#

I have been putting off using Reactive Extensions for so long, and I thought this would be a good use. Quite simply, I have a method that can be called for various reasons on various code paths:
private async Task GetProductAsync(string blah) {...}
I need to be able to throttle this method. That is to say, I want to hold back the flow of calls until no more calls have been made for a specified period of time. Put more clearly: if 10 calls to this method happen within a certain time period, I want to limit (throttle) them to a single call, made after a quiet period following the last call.
I can see an example using a method that returns IEnumerable, and this kind of makes sense:
static IEnumerable<int> GenerateAlternatingFastAndSlowEvents()
{ ... }
...
var observable = GenerateAlternatingFastAndSlowEvents().ToObservable().Timestamp();
var throttled = observable.Throttle(TimeSpan.FromMilliseconds(750));
using (throttled.Subscribe(x => Console.WriteLine("{0}: {1}", x.Value, x.Timestamp)))
{
Console.WriteLine("Press any key to unsubscribe");
Console.ReadKey();
}
Console.WriteLine("Press any key to exit");
Console.ReadKey();
However (and this has always been my major issue with Rx), how do I create an Observable from a simple async method?
Update
I have managed to find an alternative approach using ReactiveProperty:
Barcode = new ReactiveProperty<string>();
Barcode.Select(text => Observable.FromAsync(async () => await GetProductAsync(text)))
.Throttle(TimeSpan.FromMilliseconds(1000))
.Switch()
.ToReactiveProperty();
The premise is that I catch the change at the text property Barcode. However, this has its own drawbacks: ReactiveProperty takes care of notification, and I can't silently update the backing field because it is already managed.
To summarise, how can I convert an async method call to an Observable, so I can use the Throttle method?

Unrelated to your question, but probably helpful: Rx's Throttle operator is really a debounce operator. The closest thing to a throttling operator is Sample. Here's the difference (assuming you want to throttle or debounce to one item / 3 seconds):
items : --1-23----4-56-7----8----9-
throttle: --1--3-----4--6--7--8-----9
debounce: --1-------4--6------8----9-
Sample/throttle will bunch items that arrive in the sensitive time and emit the last one on the next sampling tick. Debounce throws away items that arrive in the sensitive time, then re-starts the clock: The only way for an item to emit is if it was preceded by Time-Range of silence.
RX.Net's Throttle operator does what debounce above depicts. Sample does what throttle above depicts.
If you want something different, describe how you want to throttle.

There are two key ways of converting a Task to an Observable, with an important difference between them.
Observable.FromAsync(()=>GetProductAsync("test"));
and
GetProductAsync("test").ToObservable();
The first will not start the Task until you subscribe to it.
The second will create (and start) the task and the result will either immediately or sometime later appear in the observable, depending on how fast the Task is.
Looking at your question in general though, it seems that you want to stop the flow of calls. You do not want to throttle the flow of results, which would result in unnecessary computation and loss.
If this is your aim, your GetProductAsync could be seen as an observer of call events, and the GetProductAsync should throttle those calls. One way of achieving that would be to declare a
public event Action<string> GetProduct;
and use
var callStream= Observable.FromEvent<string>(
handler => GetProduct+= handler ,
handler => GetProduct-= handler);
The problem then becomes how to return the result and what should happen when your 'caller's' call is throttled out and discarded.
One approach there could be to declare a type "GetProductCall" which would have the input string and output result as properties.
You could then have a setup like:
var callStream= Observable.FromEvent<GetProductCall>(
handler => GetProduct+= handler ,
handler => GetProduct-= handler)
.Throttle(...)
.SelectMany(async r => { r.Result = await GetProductAsync(r.Input); return r; }); // assumes GetProductAsync returns a Task<T> with a result
(code not tested, just illustrative)
Another approach might include the Merge(N) overload that limits the max number of concurrent observables.
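A minimal sketch of throttling the calls rather than the results, assuming the System.Reactive NuGet package is referenced. The names ProductLookup, Request and Fetched, and the stand-in body of GetProductAsync, are hypothetical and not from the question. A Subject<string> plays the role of the call event, Throttle (which debounces) lets only the last call of a burst through, and FromAsync defers the async method until that call is selected.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using System.Threading.Tasks;

public sealed class ProductLookup : IDisposable
{
    private readonly Subject<string> _calls = new Subject<string>();
    private readonly IDisposable _subscription;

    // Records which inputs actually reached GetProductAsync (demonstration only).
    public List<string> Fetched { get; } = new List<string>();

    public ProductLookup(TimeSpan quietPeriod)
    {
        _subscription = _calls
            .Throttle(quietPeriod)                                             // wait for a quiet period
            .Select(code => Observable.FromAsync(() => GetProductAsync(code))) // lazy: not started yet
            .Switch()                                                          // only observe the latest call
            .Subscribe();
    }

    // Every code path calls this instead of calling GetProductAsync directly.
    public void Request(string barcode) => _calls.OnNext(barcode);

    // Hypothetical stand-in for the real lookup.
    private async Task GetProductAsync(string barcode)
    {
        await Task.Delay(50);
        lock (Fetched) Fetched.Add(barcode);
    }

    public void Dispose()
    {
        _subscription.Dispose();
        _calls.Dispose();
    }
}

public static class Program
{
    public static async Task Main()
    {
        using var lookup = new ProductLookup(TimeSpan.FromMilliseconds(300));
        lookup.Request("111");
        lookup.Request("222");
        lookup.Request("333");
        await Task.Delay(1000);
        Console.WriteLine(string.Join(",", lookup.Fetched));
    }
}
```

With the three requests made back to back, only the last one should reach GetProductAsync. Note that Switch stops observing an earlier in-flight call when a newer one arrives; it does not cancel the underlying Task unless the FromAsync overload that takes a CancellationToken is used.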

Related

Using Task.WhenAny to await capacity on a SemaphoreSlim

I have an Async processing pipeline. I'm implementing a constraint such that I need to limit the number of submissions to the next stage. For my component, I have:
a single input source (items are tagged with a source id)
a single destination that I need to propagate the inputs to in a round-robin fashion
If capacity is available for multiple clients, I'll forward a message for each (i.e. if I wake because client 3's semaphore has finally become available, I may first send a message for client 2, then 3, etc)
The processing loop is thus waiting on one or more of the following conditions to continue processing:
more input has arrived (it might be for a client that is not at its limit)
capacity has been released for a client that we are holding data for
Ideally, I'd thus use Task.WhenAny with
a task representing the input c.Reader.WaitToReadAsync(ct).AsTask()
N tasks representing the clients for which we are holding data, but it's not yet valid for submission (the Wait for the SemaphoreSlim would fail)
SemaphoreSlim's AvailableWaitHandle would be ideal - I want to know when it's available but I don't want to reserve it yet as I have a chain of work to process - I just want to know if one of my trigger conditions has arisen
Is there a way to await the AvailableWaitHandle ?
My current approach is a hack derived from this answer to a similar question by @usr - posting for reference
My actual code is here - there's also some more detail about the whole problem in my self-answer below
I want to know when it's available but I don't want to reserve it yet as I have a chain of work to process
This is very strange and it seems like SemaphoreSlim may not be what you want to use. SemaphoreSlim is a kind of mutual exclusion object that can allow multiple takers. It is sometimes used for throttling. But I would not want to use it as a signal.
It seems like something more like an asynchronous manual-reset event would be what you really want. Or, if you wanted to maintain a locking/concurrent-collection kind of concept, an asynchronous monitor or condition variable.
That said, it is possible to use a SemaphoreSlim as a signal. I just strongly hesitate suggesting this as a solution, since it seems like this requirement is highlighting a mistake in the choice of synchronization primitive.
Is there a way to await the AvailableWaitHandle?
Yes. You can await anything by using TaskCompletionSource. For WaitHandles in particular, ThreadPool.RegisterWaitForSingleObject gives you an efficient wait.
So, what you want to do is create a TCS, register the handle with the thread pool, and complete the TCS in the callback for that handle. Keep in mind that you want to be sure that the TCS is eventually completed and that everything is disposed properly.
I have support for this in my AsyncEx library (WaitHandleAsyncFactory.FromWaitHandle); code is here.
My AsyncEx library also has support for asynchronous manual-reset events, monitors, and condition variables.
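The TaskCompletionSource approach described above can be sketched as follows. This is an illustration under stated assumptions, not the AsyncEx implementation: the names WaitHandleTasks and WaitOneAsync are hypothetical, and only BCL calls are used (ThreadPool.RegisterWaitForSingleObject, TaskCompletionSource, CancellationToken.Register).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WaitHandleTasks
{
    // Hypothetical helper: returns a Task that completes when the handle is signalled.
    public static Task WaitOneAsync(this WaitHandle handle, CancellationToken ct = default)
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

        // The thread pool waits on the handle; the callback completes the TCS.
        RegisteredWaitHandle registration = ThreadPool.RegisterWaitForSingleObject(
            handle,
            (state, timedOut) => ((TaskCompletionSource<bool>)state).TrySetResult(true),
            tcs,
            Timeout.Infinite,
            true); // executeOnlyOnce

        // Make sure the TCS is eventually completed even if the handle is never signalled.
        CancellationTokenRegistration cancellation = ct.Register(() => tcs.TrySetCanceled(ct));

        // Clean up both registrations once the task has finished either way.
        tcs.Task.ContinueWith(_ =>
        {
            registration.Unregister(null);
            cancellation.Dispose();
        }, TaskScheduler.Default);

        return tcs.Task;
    }
}

public static class Program
{
    public static async Task Main()
    {
        var semaphore = new SemaphoreSlim(0, 1);
        Task available = semaphore.AvailableWaitHandle.WaitOneAsync();

        semaphore.Release();  // capacity becomes available
        await available;      // observed, but not reserved

        Console.WriteLine(semaphore.CurrentCount);
    }
}
```

Observing the handle does not reserve capacity, so the count is unchanged after the await. It also gives no guarantee that a later Wait(0) will succeed, since another consumer may take the slot in between.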
Variation of @usr's answer which solved my problem
static class SemaphoreSlimExtensions
{
    // Completes once capacity is available, then immediately gives it back.
    public static Task AwaitButReleaseAsync(this SemaphoreSlim s) =>
        s.WaitAsync().ContinueWith(_t => s.Release(), TaskContinuationOptions.ExecuteSynchronously);
    // Non-blocking attempt to take the semaphore.
    public static bool TryTake(this SemaphoreSlim s) =>
        s.Wait(0);
}
In my use case, the await is just a trigger for synchronous logic that then walks the full set - the TryTake helper is in my case a natural way to handle the conditional acquisition of the semaphore and the processing that's contingent on that. My wait looks like this:
SemaphoreSlim[] throttled = Array.Empty<SemaphoreSlim>();
while (!ct.IsCancellationRequested)
{
    var throttledClients = from s in throttled select s.AwaitButReleaseAsync();
    var timeout = 3000;
    var otherConditions = new Task[] { input.Reader.WaitToReadAsync().AsTask(), Task.Delay(timeout, ct) };
    await Task.WhenAny(throttledClients.Concat(otherConditions));
    throttled = propagateStuff();
}
The actual code is here - I have other cases that follow the same general pattern. The bottom line is that I want to separate the concern of waiting for the availability of capacity on a SemaphoreSlim from actually reserving that capacity.

C# Rx Observable.Never<> behaves like Observable.Empty<>?

I'm new to Rx and have this code snippet for a try.
Observable.Never<string>().Subscribe(Console.Write);
Observable.Empty<string>().Subscribe(Console.Write);
I expected that Never<string>() will behave like Console.ReadKey which will not end, but as I run these 2 lines, they end immediately, so [Never] behaves like [Empty] to me.
What is the correct understanding of [Never] and is there a good sample usage for it?
Neither Observable.Never() nor Observable.Empty() will emit any values. However, the observable built with Observable.Never() will not complete and instead stays "open/active". Whether the observable completes (Empty()) or not (Never()) may matter at the place where you consume it, but this depends on your actual use case.
Having observables which don't emit any values might sound useless, but you may find yourself somewhere you have to provide an observable (instead of using null). So you can write something like this:
public override IObservable<string> NameChanged => Observable.Never<string>();
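A small sketch of the completion difference, assuming the System.Reactive NuGet package is referenced. The class and method names are hypothetical; the check relies on Empty signalling completion during the Subscribe call itself.

```csharp
using System;
using System.Reactive.Linq;

public static class NeverVsEmpty
{
    // Returns true if the sequence signalled OnCompleted during Subscribe.
    public static bool CompletesImmediately(IObservable<string> source)
    {
        bool completed = false;
        using (source.Subscribe(_ => { }, () => completed = true))
        {
            return completed;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CompletesImmediately(Observable.Empty<string>())); // True
        Console.WriteLine(CompletesImmediately(Observable.Never<string>())); // False
    }
}
```

In both cases Subscribe itself returns straight away, which is why the two lines in the question appear to behave identically.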
I don't have a ton of experience with Rx, but I believe all Subscribe does is register what to do when the observable emits. If your observable never emits (i.e. Empty or Never), then the method is never called. The application does not wait for the subscription itself to end. If you wanted to wait forever you would use something like
Observable.Never<string>().Wait();
This ties back into the reason you should not use an async operation in Subscribe. Take the following code:
static void Main(string[] args)
{
Observable.Range(1, 5).Subscribe(async x => await DoTheThing(x));
Console.WriteLine("done");
}
static async Task DoTheThing(int x)
{
await Task.Delay(TimeSpan.FromSeconds(x));
Console.WriteLine(x);
}
When run, the application immediately writes "done" and exits after pushing the values into the observable, because the observable has no way of knowing whether the subscriber has finished handling them. Hopefully that is clear; if someone with more Rx knowledge wants to step in and add to this, that would be good.
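One possible alternative to an async lambda in Subscribe, sketched under the assumption that the System.Reactive NuGet package is referenced: move the async work into the pipeline with FromAsync, run the items one after another with Concat, and await the resulting observable. The class name AwaitedPipeline and the shortened delays are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Threading.Tasks;

public static class AwaitedPipeline
{
    public static async Task<List<int>> RunAsync()
    {
        var handled = new List<int>();

        async Task DoTheThing(int x)
        {
            await Task.Delay(TimeSpan.FromMilliseconds(50 * x));
            handled.Add(x);
        }

        // Each item becomes a deferred async operation; Concat runs them in order,
        // and awaiting the sequence waits until the last one has finished.
        await Observable.Range(1, 3)
            .Select(x => Observable.FromAsync(() => DoTheThing(x)))
            .Concat()
            .LastOrDefaultAsync();

        return handled;
    }

    public static async Task Main()
    {
        var handled = await RunAsync();
        Console.WriteLine(string.Join(",", handled)); // 1,2,3
        Console.WriteLine("done");
    }
}
```

Here "done" is written only after every item has been handled, because the async work is part of the sequence being awaited rather than hidden inside the subscriber.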
This link explains the difference between Empty, Never, and Throw:
http://reactivex.io/documentation/operators/empty-never-throw.html
And this is one usage of Never:
https://rxjs-dev.firebaseapp.com/api/index/const/NEVER

Is it a bad practice to combine use of Task and IObservable in my C# application?

I've recently gotten into Rx and I'm using it to help me pull data from several APIs in a data mining application.
I have an interface that I implement for each API, which encapsulates common calls to each API, e.g.
public interface IMyApi {
IObservable<string> GetApiName(); //Cold feed for getting the API's name.
IObservable<int> GetNumberFeed(); //Hot feed of numbers from the API
}
My question is around cold IObservables vs Tasks. In my mind, a cold observable is basically a task, they operate in much the same way. It strikes me as strange to be 'abstracting' a Task away as a cold observable, when you could argue that a Task is all you need. Also using a cold observable to wrap Tasks hides the nature of the activity, since the signature looks the same as a hot observable.
Another way I could represent the above interface is:
public interface IMyApi {
Task<string> GetApiNameAsync(); //Async method for getting the API's name.
IObservable<int> GetNumberFeed(); //Hot feed of numbers from the API
}
Is there some conventional wisdom on why I shouldn't mix and match between Tasks and IObservables?
Edit: To clarify - I've read the other discussions posted and understand the relationship between Rx and TPL, but my concerns are mainly about whether or not it's safe to combine the two in an application and whether it can lead to bad practice or threading and scheduling pitfalls?
My question is around cold IObservables vs Tasks. In my mind, a cold observable is basically a task, they operate in much the same way
It is important to note that this is not the case, they are very different. Here's the core difference:
// Nothing happens here at all! Just like calling Enumerable.Range(0, 100000000)
// doesn't actually create a huge array, until I use foreach.
var myColdObservable = MakeANetworkRequestObservable();
// Two network requests made!
myColdObservable.Subscribe(x => /*...*/);
myColdObservable.Subscribe(x => /*...*/);
// Only ***one*** network request made, subscribers share the
// result
var myTaskObservable = MakeATask().ToObservable();
myTaskObservable.Subscribe(x => /*...*/);
myTaskObservable.Subscribe(x => /*...*/);
Why is this important? Several methods in Rx such as Retry depend on this behavior:
// Retries three times, then gives up
myColdObservable.Retry(3).Subscribe(x => /*...*/);
// Actually *never* retries, and is effectively the same as if the
// Retry were never there, since all three tries will get the same
// result!
myTaskObservable.Retry(3).Subscribe(x => /*...*/);
So in general, making your Observables cold will make your life easier.
How can I make a Task Cold?
Use the Defer operator:
var obs = Observable.Defer(() => CreateATask().ToObservable());
// CreateATask called *twice* here
obs.Subscribe(/*...*/);
obs.Subscribe(/*...*/);
There's no problem mixing the models, and in fact even the Rx team has included many adaptive operators in Rx. For example, ToTask, ToObservable, SelectMany, DeferAsync, StartAsync, ToAsync, etc. You can even await an IObservable<T> within an async method.
The primary difference that should affect your decision is cardinality:
IObservable<T> is [0,∞]
Task<T> is [0,1]
So if you need to represent only a single return value, then strongly consider using Task<T>.
The difference between Task and IObservable is not hot vs. cold: Task-returning methods can pretty much be "cold" (return new Task on every call) or "hot" (always return the same Task), just like IObservables.
The difference between the two is that IObservable represents a sequence of results, while Task represents a single result.
So, in cases when you'll always have a single result (or an error), use Task, when you can have any number of results, use IObservable.

Converting a IEnumerable<T> to IObservable<T>, with maximum parallelism

I have a sequence of async tasks to do (say, fetch N web pages). Now what I want is to expose them all as an IObservable<T>. My current solution uses the answer from this question:
async Task<ResultObj> GetPage(string page) {
Console.WriteLine("Before");
var result = await FetchFromInternet(page);
Console.WriteLine("After");
return result;
}
// pages is an IEnumerable<string>
IObservable<ResultObj> resultObservable =pages.Select(GetPage).
Select(t => Observable.FromAsync(() => t)).Merge();
// Now consume the list
foreach(ResultObj obj in resultObservable.ToEnumerable()) {
Console.WriteLine(obj.ToString());
}
The problem is that I do not know the number of pages to be fetched, and it could be large. I do not want to make hundreds of simultaneous requests. So I want a way to limit the maximum number of tasks that will be executed in parallel. Is there a way to limit the number of concurrent invocations of GetPage?
There is a Merge overload that takes a maxConcurrent parameter, but it does not seem to actually limit the concurrency of the function invocation. The console prints all the Before messages before the After messages.
Note: I need to convert back to IEnumerable<T>. I'm writing a data source for a system that gives me descriptors of data to fetch, and I need to give it back a list of the downloaded data.
EDIT
The following should work. This overload limits the number of concurrent subscriptions.
var resultObservable = pages
.Select(p => Observable.FromAsync(() => GetPage(p)))
.Merge(maxConcurrent);
Explanation
In order to understand why this change is needed, we need some background:
(1) FromAsync returns an observable that will invoke the passed Func every time it is subscribed to. This implies that if the observable is never subscribed to, it will never be invoked.
(2) Merge eagerly reads the source sequence, and only subscribes to n observables simultaneously.
With these two pieces we can know why the original version will execute everything in parallel: because of (2), GetPage will have already been invoked for all the source strings by the time Merge decides how many observables need to be subscribed.
And we can also see why the second version works: even though the sequence has been fully iterated over, (1) means that GetPage is not invoked until Merge decides it needs to subscribe to n observables. This leads to the desired result of only n tasks being executed simultaneously.
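To check the behaviour described above, here is a small sketch that records the peak number of GetPage invocations in flight. It assumes the System.Reactive NuGet package is referenced; the class name LimitedMerge, the counters, and the stand-in GetPage body (a delay instead of a real download) are hypothetical.

```csharp
using System;
using System.Linq;
using System.Reactive.Linq;
using System.Threading.Tasks;

public static class LimitedMerge
{
    private static readonly object Gate = new object();
    private static int _running, _peak;

    // Hypothetical stand-in for the real download; tracks how many calls overlap.
    private static async Task<string> GetPage(string page)
    {
        lock (Gate) { _running++; _peak = Math.Max(_peak, _running); }
        await Task.Delay(100);
        lock (Gate) { _running--; }
        return "content of " + page;
    }

    // Fetches pageCount pages and returns the peak concurrency that was observed.
    public static async Task<int> FetchAllAsync(int pageCount, int maxConcurrent)
    {
        lock (Gate) { _running = 0; _peak = 0; }
        var pages = Enumerable.Range(1, pageCount).Select(i => "page" + i);

        var results = await pages
            .Select(p => Observable.FromAsync(() => GetPage(p))) // GetPage not invoked yet
            .Merge(maxConcurrent)                                // subscribes to at most maxConcurrent
            .ToList();

        Console.WriteLine($"{results.Count} pages fetched");
        lock (Gate) { return _peak; }
    }

    public static async Task Main()
    {
        int peak = await FetchAllAsync(10, 3);
        Console.WriteLine($"peak concurrency: {peak}");
    }
}
```

The reported peak should not exceed maxConcurrent. Replacing FromAsync(() => GetPage(p)) with the original Select(GetPage) form would start every task up front, regardless of the Merge argument.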

.Net RX: tracking progress of parallel execution

I need to execute multiple long-running operations in parallel and would like to report progress in some way. From my initial research it seems that IObservable fits this model. The idea is that I call a method that returns an IObservable of int, where the int is the reported percent complete. Parallel execution starts immediately upon exiting the method, and the observable must be hot so that all subscribers see the same progress information at a given point in time; for example, a late subscriber may only learn that the whole execution is complete and there is no more progress to track.
The closest approach to this problem that I found is to use Observable.ForkJoin and Observable.Start, but I can't work out how to combine them into a single observable that I can return from a method.
Please share your ideas of how can it be achieved or maybe there is another approach to this problem using .Net RX.
To make a hot observable, I would probably start with a method that uses a BehaviorSubject as the return value and the way the operations report progress. If you just want the example, skip to the end. The rest of this answer explains the steps.
I will assume for the sake of this answer that your long-running operations do not have their own way to be called asynchronously. If they do, the next step may be a little different. The next thing to do is to send the work to another thread using an IScheduler. You may allow the caller to select where the work happens by making an overload that takes the scheduler as a parameter if desired (in which case the overload that does not will pick a default scheduler). There are quite a few overloads of IScheduler.Schedule, several of which are extension methods, so you should look through them to see which is most appropriate for your situation; I'm using the one that takes only an Action here. If you have multiple operations that can all run in parallel, you can call scheduler.Schedule multiple times.
The hardest part of this will probably be determining what the progress is at any given point. If you have multiple operations going on at once, you will probably need to keep track of how many have completed to know what the current progress is. With the information you provided, I can't be more specific than that.
Finally, if your operations are cancellable, you may want to take a CancellationToken as a parameter. You can use this to cancel the operation while it is in the scheduler's queue before it starts. If you write your operation code correctly, it can use the token for cancellation as well.
IObservable<int> DoStuff(/*args*/,
CancellationToken cancel,
IScheduler scheduler)
{
var progress = new BehaviorSubject<int>(0); // starts at 0% so late subscribers always get a value
//if you don't take it as a parameter, pick a scheduler
//IScheduler scheduler = Scheduler.ThreadPool;
var disp = scheduler.Schedule(() =>
{
//do stuff that needs to run on another thread
//report progress
progress.OnNext(25);
});
var disp2 = scheduler.Schedule(...);
//if the operation is cancelled before the scheduler has started it,
//you need to dispose the return from the Schedule calls
var allOps = new CompositeDisposable(disp, disp2);
cancel.Register(allOps.Dispose);
return progress;
}
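For the bookkeeping mentioned above (tracking how many operations have completed), a minimal sketch might look like the following. It assumes every operation carries equal weight, which the question does not state, and the CompletionCounter type is hypothetical. It uses only the BCL, so it can be dropped into the scheduled actions as progress.OnNext(counter.MarkOneCompleted()).

```csharp
using System;
using System.Threading;

public sealed class CompletionCounter
{
    private readonly int _total;
    private int _completed;

    public CompletionCounter(int total)
    {
        if (total <= 0) throw new ArgumentOutOfRangeException(nameof(total));
        _total = total;
    }

    // Call once when an operation finishes; returns the overall percent complete.
    // Interlocked keeps the count correct when operations finish on different threads.
    public int MarkOneCompleted()
    {
        int done = Interlocked.Increment(ref _completed);
        return (int)(100L * done / _total);
    }
}

public static class Program
{
    public static void Main()
    {
        var counter = new CompletionCounter(4);
        for (int i = 0; i < 4; i++)
        {
            Console.WriteLine(counter.MarkOneCompleted()); // 25, 50, 75, 100
        }
    }
}
```

The counter only makes the arithmetic thread-safe. When operations finish at nearly the same time, the order in which their percentages reach subscribers still depends on the order of the OnNext calls.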
Here is one approach
// setup a method to do some work,
// and report its own partial progress
Func<string, IObservable<int>> doPartialWork =
(arg) => Observable.Create<int>(obsvr => {
return Scheduler.TaskPool.Schedule(arg,(sched,state) => {
var progress = 0;
var cancel = new BooleanDisposable();
while(progress < 10 && !cancel.IsDisposed)
{
// do work with arg
Thread.Sleep(550);
obsvr.OnNext(1); //report progress
progress++;
}
obsvr.OnCompleted();
return cancel;
});
});
var myArgs = new[]{"Arg1", "Arg2", "Arg3"};
// run all the partial bits of work
// use SelectMany to get a flat stream of
// partial progress notifications
var xsOfPartialProgress =
myArgs.ToObservable(Scheduler.NewThread)
.SelectMany(arg => doPartialWork(arg))
.Replay().RefCount();
// use Scan to get a running aggregation of progress
var xsProgress = xsOfPartialProgress
.Scan(0d, (prog,nextPartial)
=> prog + (nextPartial/(myArgs.Length*10d)));
