Observable.Range being repeated? - c#

New to Rx -- I have a sequence that appears to be functioning correctly except for the fact that it appears to repeat.
I think I'm missing something around calls to Select() or SelectMany() that triggers the range to re-evaluate.
Explanation of Code & What I'm trying to Do
For all numbers, loop through a method that retrieves data (paged from a database).
Eventually, this data will be empty (I only want to keep processing while it retrieves data).
For each of those records retrieved, I only want to process ones that should be processed
Of those that should be processed, I'd like to process up to x of them in parallel (according to a setting).
I want to wait until the entire sequence is completed to exit the method (hence the wait call at the end).
Problem With the Code Below
I run the code through with a data set that I know only has 1 item.
So, page 0 returns 1 item, and page 1 returns 0 items.
My expectation is that the process runs once for the one item.
However, I see that both page 0 and 1 are called twice and the process thus runs twice.
I think this has something to do with a call that is causing the range to re-evaluate beginning from 0, but I can't figure out what it is.
The Code
var query = Observable.Range(0, int.MaxValue)
.Select(pageNum =>
{
_etlLogger.Info("Calling GetResProfIDsToProcess with pageNum of {0}", pageNum);
return _recordsToProcessRetriever.GetResProfIDsToProcess(pageNum, _processorSettings.BatchSize);
})
.TakeWhile(resProfList => resProfList.Any())
.SelectMany(records => records.Where(x=> _determiner.ShouldProcess(x)))
.Select(resProf => Observable.Start(async () => await _schoolDataProcessor.ProcessSchoolsAsync(resProf)))
.Merge(maxConcurrent: _processorSettings.ParallelProperties)
.Do(async trackingRequests =>
{
await CreateRequests(trackingRequests.Result, createTrackingPayload);
var numberOfAttachments = SumOfRequestType(trackingRequests.Result, TrackingRecordRequestType.AttachSchool);
var numberOfDetachments = SumOfRequestType(trackingRequests.Result, TrackingRecordRequestType.DetachSchool);
var numberOfAssignmentTypeUpdates = SumOfRequestType(trackingRequests.Result,
TrackingRecordRequestType.UpdateAssignmentType);
_etlLogger.Info("Extractor generated {0} attachments, {1} detachments, and {2} assignment type changes.",
numberOfAttachments, numberOfDetachments, numberOfAssignmentTypeUpdates);
});
var subscription = query.Subscribe(
trackingRequests =>
{
//Nothing really needs to happen here. Technically we're just doing something when it's done.
},
() =>
{
_etlLogger.Info("Finished! Woohoo!");
});
await query.Wait();

This is because you subscribe to the sequence twice. Once at query.Subscribe(...) and again at query.Wait().
Observable.Range(0, int.MaxValue) is a cold observable: every time you subscribe to it, it is evaluated again from the start. You could make the observable hot by publishing it with Publish(), then subscribing to it, then calling Connect(), and finally Wait(). This carries a risk of an InvalidOperationException if you call Wait() after the last element has already been yielded, because Wait() throws on a sequence that produces no elements. A better alternative is LastOrDefaultAsync().
That would get you something like this:
var connectable = query.Publish();
var subscription = connectable.Subscribe(...);
subscription = new CompositeDisposable(connectable.Connect(), subscription);
await connectable.LastOrDefaultAsync();
Or you can avoid await and return a task directly with ToTask() (do remove async from your method signature).
return connectable.LastOrDefaultAsync().ToTask();
Once converted to a task, you can synchronously wait for it with Wait() (do not confuse Task.Wait() with Observable.Wait()).
connectable.LastOrDefaultAsync().ToTask().Wait();
However, most likely you do not want to wait at all! Blocking in an async context makes little sense. What you should do is put the rest of the code that needs to run after the sequence completes in the OnCompleted part of the subscription. If you have (clean-up) code that needs to run even when you unsubscribe (Dispose), consider Observable.Using or the Finally(...) method to ensure this code is run.
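To see the cold-observable behaviour described above in isolation, here is a minimal sketch (the class and method names are made up for illustration). It counts how often the Select lambda runs when the same query is subscribed to once and then twice.

```csharp
using System;
using System.Reactive.Linq;

public static class ColdRangeDemo
{
    // Returns how many times the Select lambda ran across all subscriptions.
    public static int CountEvaluations(int subscriptions)
    {
        var evaluations = 0;
        var query = Observable.Range(0, 3)
            .Select(page => { evaluations++; return page; });

        for (var i = 0; i < subscriptions; i++)
        {
            // Each Wait() is a separate subscription, so the cold
            // Range is replayed from 0 every time.
            query.Wait();
        }
        return evaluations;
    }

    public static void Main()
    {
        Console.WriteLine(CountEvaluations(1)); // 3
        Console.WriteLine(CountEvaluations(2)); // 6
    }
}
```

The same doubling happens in the question: Subscribe(...) is one subscription and Wait() is a second one.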

As already mentioned the cause of the Observable.Range being repeated is the fact that you're subscribing twice - once with .Subscribe(...) and once with .Wait().
In this kind of circumstance I would go with a very simple blocking call to get the values. Just do this:
var results = query.ToArray().Wait();
The .ToArray() turns a multi-valued IObservable<T> into a single-valued IObservable<T[]>. The .Wait() turns this into T[]. It's the easy way to ensure only one subscription, blocking, and getting all of the values out.
In your case you may not need all values, but I think this is a good habit to get into.

Related

Observable timers disposing

I'm using the Reactive Extensions for .NET and I wonder about disposing of subscriptions. I know that in some cases it's good to end a sequence like this: .TakeUntil(Observable.Timer(TimeSpan.FromMinutes(x))).
First case
In this case, I have a timer that triggers after x seconds and then it completes and should be disposed.
public void ScheduleOrderCancellationIfNotFilled(string pair, long orderId, int waitSecondsBeforeCancel)
{
Observable.Timer(TimeSpan.FromSeconds(waitSecondsBeforeCancel))
.Do(e =>
{
var result = _client.Spot.Order.GetOrder(pair, orderId);
if (result.Success)
{
if (result.Data?.Status != OrderStatus.Filled)
{
_client.Spot.Order.CancelOrder(pair, orderId);
}
}
})
.Subscribe();
}
Second case
In this case, the timer fires after one second and then repeats every 29 minutes. It should live until its defining class is disposed. I believe this one should be disposed through an IDisposable implementation. How?
var keepAliveListenKey = Observable.Timer(TimeSpan.FromSeconds(1), TimeSpan.FromMinutes(29))
.Do(async e =>
{
await KeepAliveListenKeyAsync().ConfigureAwait(false);
})
.Subscribe();
Edit
I also want it to use a Subject<T>, which makes it easier to dispose and to reset the subscription.
For example: Reset and Dispose observable subscriber, Reactive Extensions (#Enigmativity)
public class UploadDicomSet : ImportBaseSet
{
IDisposable subscription;
Subject<IObservable<long>> subject = new Subject<IObservable<long>>();
public UploadDicomSet()
{
subscription = subject.Switch().Subscribe(s => CheckUploadSetList(s));
subject.OnNext(Observable.Interval(TimeSpan.FromMinutes(2)));
}
void CheckUploadSetList(long interval)
{
subject.OnNext(Observable.Never<long>());
// Do other things
}
public void AddDicomFile(SharedLib.DicomFile dicomFile)
{
subject.OnNext(Observable.Interval(TimeSpan.FromMinutes(2)));
// Reset the subscription to go off in 2 minutes from now
// Do other things
}
}
In the first case the subscription is disposed automatically once the timer fires and completes. This is a common way to get automatic subscription management, and it is a nice and elegant way to work with Rx.
In the second case you have over-engineered. Observable.Timer(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1)) is by itself sufficient to generate a sequence of ascending longs over time. Since this stream is endless by nature, you are right: explicit subscription management is required. So it is enough to have:
var sub = Observable.Timer(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1)).Subscribe();
...and sub.Dispose() it later.
P.S. Note that your code passes an async lambda to .Do. Most probably that is not what you want: Do does not await the returned task, so the operation is fire-and-forget and its exceptions never reach the sequence. You want SelectMany, which ensures that the async operation is properly awaited and its exceptions are handled.
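As a rough sketch of the "dispose it later" advice, the subscription can be owned by a small IDisposable class. The class name and constructor parameters below are hypothetical; the keep-alive call is passed in as a delegate so the sketch does not depend on the asker's client code.

```csharp
using System;
using System.Reactive.Linq;
using System.Threading.Tasks;

public sealed class ListenKeyKeepAlive : IDisposable
{
    private readonly IDisposable _subscription;

    public ListenKeyKeepAlive(Func<Task> keepAliveAsync, TimeSpan dueTime, TimeSpan period)
    {
        _subscription = Observable.Timer(dueTime, period)
            // FromAsync defers the call until SelectMany subscribes to it,
            // and routes a faulted task into OnError.
            .SelectMany(_ => Observable.FromAsync(keepAliveAsync))
            .Subscribe(
                _ => { },
                ex => Console.Error.WriteLine($"Keep-alive failed: {ex.Message}"));
    }

    // Disposing the subscription stops the endless timer.
    public void Dispose() => _subscription.Dispose();
}
```

Usage would look something like new ListenKeyKeepAlive(() => KeepAliveListenKeyAsync(), TimeSpan.FromSeconds(1), TimeSpan.FromMinutes(29)), with Dispose() called from the owning class's own Dispose. Note that an error terminates the sequence, so this sketch stops pinging after the first failure.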
Answering your questions in the comments section:
What about disposing using Subject instead?
Well, there is nothing special about it. This class implements both IObserver<> and IObservable<>, so it resembles classical .NET events (a list of callbacks to be called upon some event). It makes no difference with respect to your question and use case.
Could you give an example of the .Do with exception handling?
Sure. The idea is that you want to translate your async/await code, encapsulated in some Task<T>, into an IObservable<T> such that it preserves both cancellation and error signals. For that, the .SelectMany method must be used (the same idea as SelectMany from LINQ). So just change your .Do to .SelectMany.
Observable
.Timer(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1))
.SelectMany(_ => Observable.FromAsync(() => /* that's the point where your Task<> becomes Observable */ myTask))
I'm confused again. Do I need IObservable<IObservable> (Select) or IObservable (SelectMany)?
Most probably, you don't need Switch. Why? Because it was created mainly to avoid I/O race conditions: whenever a new event is emitted, the current one (which might still be in progress due to natural parallelism or an asynchronous workflow) is guaranteed to be cancelled (i.e. unsubscribed). Otherwise race conditions can (and will) damage your state.
SelectMany, on the contrary, cancels nothing: every inner observable runs to completion and its results are merged into the output as they arrive. Be aware that it does not serialize the work by itself. If a new event arrives while the previous call is still running, both run concurrently; if you need strictly one call at a time, use Select followed by Concat instead.
Reactive Observable Subscription Disposal (#Enigmativity)
The disposable returned by the Subscribe extension methods is returned solely to allow you to manually unsubscribe from the observable before the observable naturally ends.
If the observable completes - with either OnCompleted or OnError - then the subscription is already disposed for you.
One important thing to note: the garbage collector never calls .Dispose() on observable subscriptions, so you must dispose of your subscriptions if they have not (or may not have) naturally ended before your subscription goes out of scope.
First case
Looks like I don't need to manually .Dispose() the subscription in the first case scenario because it ends naturally.
Dispose is being triggered at the end.
var xs = Observable.Create<long>(o =>
{
var d = Observable.Timer(TimeSpan.FromSeconds(5))
.Do(e =>
{
Console.WriteLine("5 seconds elapsed.");
})
.Subscribe(o);
return Disposable.Create(() =>
{
Console.WriteLine("Disposed!");
d.Dispose();
});
});
var subscription = xs.Subscribe(x => Console.WriteLine(x));
Second case
But in the second case, where it doesn't end "naturally", I should dispose it.
Dispose is not triggered unless manually disposed.
var xs = Observable.Create<long>(o =>
{
var d = Observable.Timer(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1))
.Do(e =>
{
Console.WriteLine("Test.");
})
.Subscribe(o);
return Disposable.Create(() =>
{
Console.WriteLine("Disposed!");
d.Dispose();
});
});
var subscription = xs.Subscribe(x => Console.WriteLine(x));
Conclusion
He gave such nice examples; they are worth a look if you are asking yourself the same question.

System.Reactive Throttling an async method

I have been putting off using reactive extensions for so long, and I thought this would be a good use. Quite simply, I have a method that can be called for various reasons on various code paths
private async Task GetProductAsync(string blah) {...}
I need to be able to throttle this method. That is to say, I want to stop the flow of calls until no more calls are made (for a specified period of time). Or more clearly, if 10 calls to this method happen within a certain time period, I want to limit (throttle) them to only 1 call, made a period after the last call arrived.
I can see an example using a method with IEnumerable, this kind of makes sense
static IEnumerable<int> GenerateAlternatingFastAndSlowEvents()
{ ... }
...
var observable = GenerateAlternatingFastAndSlowEvents().ToObservable().Timestamp();
var throttled = observable.Throttle(TimeSpan.FromMilliseconds(750));
using (throttled.Subscribe(x => Console.WriteLine("{0}: {1}", x.Value, x.Timestamp)))
{
Console.WriteLine("Press any key to unsubscribe");
Console.ReadKey();
}
Console.WriteLine("Press any key to exit");
Console.ReadKey();
However, (and this has always been my major issue with Rx, forever), how do I create an Observable from a simple async method.
Update
I have managed to find an alternative approach using ReactiveProperty
Barcode = new ReactiveProperty<string>();
Barcode.Select(text => Observable.FromAsync(async () => await GetProductAsync(text)))
.Throttle(TimeSpan.FromMilliseconds(1000))
.Switch()
.ToReactiveProperty();
The premise is that I catch it at the text property Barcode. However, this has its own drawbacks: ReactiveProperty takes care of notification, and I can't silently update the backing field as it's already managed.
To summarise, how can I convert an async method call to an Observable, so I can use the Throttle method?
Unrelated to your question, but probably helpful: Rx's Throttle operator is really a debounce operator. The closest thing to a throttling operator is Sample. Here's the difference (assuming you want to throttle or debounce to one item / 3 seconds):
items : --1-23----4-56-7----8----9-
throttle: --1--3-----4--6--7--8-----9
debounce: --1-------4--6------8----9-
Sample/throttle will bunch items that arrive in the sensitive time and emit the last one on the next sampling tick. Debounce throws away items that arrive in the sensitive time, then re-starts the clock: The only way for an item to emit is if it was preceded by Time-Range of silence.
Rx.NET's Throttle operator does what debounce above depicts. Sample does what throttle above depicts.
If you want something different, describe how you want to throttle.
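The difference between the two operators can be reproduced with a small sketch. It assumes a 3-second window and uses HistoricalScheduler so that time is advanced by hand; the class name and the item timings are invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public static class ThrottleVsSample
{
    public static (List<int> Debounced, List<int> Sampled) Run()
    {
        var clock = new HistoricalScheduler();   // virtual time, advanced manually
        var window = TimeSpan.FromSeconds(3);
        var items = new Subject<int>();
        var debounced = new List<int>();
        var sampled = new List<int>();

        using (items.Throttle(window, clock).Subscribe(debounced.Add))
        using (items.Sample(window, clock).Subscribe(sampled.Add))
        {
            // Items arrive at t = 0.5s, 2.5s, 4.5s and 6.5s, then silence until t = 12s.
            clock.AdvanceBy(TimeSpan.FromSeconds(0.5));
            for (var i = 1; i <= 4; i++)
            {
                items.OnNext(i);
                clock.AdvanceBy(TimeSpan.FromSeconds(i < 4 ? 2 : 5.5));
            }
        }
        return (debounced, sampled);
    }

    public static void Main()
    {
        var (debounced, sampled) = Run();
        Console.WriteLine("Throttle (debounce): " + string.Join(",", debounced));
        Console.WriteLine("Sample (throttle):   " + string.Join(",", sampled));
    }
}
```

With these timings Throttle should let only the last item through, since every earlier item is followed by another within 3 seconds, while Sample should emit the latest item at each 3-second tick.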
There are two key ways of converting a Task to an Observable, with an important difference between them.
Observable.FromAsync(()=>GetProductAsync("test"));
and
GetProductAsync("test").ToObservable();
The first will not start the Task until you subscribe to it.
The second will create (and start) the task and the result will either immediately or sometime later appear in the observable, depending on how fast the Task is.
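The lazy/eager difference can be checked with a counter. In this sketch GetProductAsync is a hypothetical stand-in that returns a completed Task<string> and records how often it was started.

```csharp
using System;
using System.Reactive.Linq;
using System.Reactive.Threading.Tasks;
using System.Threading.Tasks;

public static class TaskToObservableDemo
{
    public static int Started;

    // Hypothetical stand-in for the real method; counts how often it is started.
    public static Task<string> GetProductAsync(string code)
    {
        Started++;
        return Task.FromResult("product:" + code);
    }

    public static void Main()
    {
        var lazy = Observable.FromAsync(() => GetProductAsync("test"));
        Console.WriteLine(Started); // 0: nothing runs until someone subscribes

        var eager = GetProductAsync("test").ToObservable();
        Console.WriteLine(Started); // 1: the call itself started the task

        lazy.Wait();
        lazy.Wait();
        Console.WriteLine(Started); // 3: one invocation per subscription

        eager.Wait();
        eager.Wait();
        Console.WriteLine(Started); // 3: all subscribers share the one task
    }
}
```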
Looking at your question in general though, it seems that you want to stop the flow of calls. You do not want to throttle the flow of results, which would result in unnecessary computation and loss.
If this is your aim, your GetProductAsync could be seen as an observer of call events, and the GetProductAsync should throttle those calls. One way of achieving that would be to declare a
public event Action<string> GetProduct;
and use
var callStream= Observable.FromEvent<string>(
handler => GetProduct+= handler ,
handler => GetProduct-= handler);
The problem then becomes how to return the result and what should happen when your 'caller's' call is throttled out and discarded.
One approach there could be to declare a type "GetProductCall" which would have the input string and output result as properties.
You could then have a setup like:
// Assumes GetProduct is an Action<GetProductCall> event and GetProductAsync returns the product.
var callStream = Observable.FromEvent<GetProductCall>(
        handler => GetProduct += handler,
        handler => GetProduct -= handler)
    .Throttle(...)
    .SelectMany(async call =>
    {
        call.Result = await GetProductAsync(call.Input);
        return call;
    });
(code not tested, just illustrative)
Another approach might include the Merge(N) overload that limits the max number of concurrent observables.

Verify methods are called in order

I have the following method:
public async Task DeleteAmendment(int amendmentHeaderId, int userId)
{
// Delete the corresponding version records.
await _amendmentVersionService.DeleteForAmendmentAsync(amendmentHeaderId);
// Delete the corresponding lifecycle records.
await _amendmentLifecycleService.DeleteForAmendmentAsync(amendmentHeaderId);
// Delete the amendment header record itself.
await _amendmentHeaderService.DeleteAsync(amendmentHeaderId, userId);
}
I am trying to verify that the methods are called in order.
I have tried setting callbacks (see below)
AmendmentVersionService.Setup(x => x.DeleteForAmendmentAsync(It.IsAny<int>()))
.Callback(() => ServiceCallbackList.Add("AmendmentVersionService"));
AmendmentLifecycleService.Setup(x => x.DeleteForAmendmentAsync(It.IsAny<int>()))
.Callback(() => ServiceCallbackList.Add("AmendmentLifecycleService"));
AmendmentHeaderService.Setup(x => x.DeleteAsync(It.IsAny<int>(), It.IsAny<int>()))
.Callback(() => ServiceCallbackList.Add("AmendmentHeaderService"));
But the list only contains the string "AmendmentVersionService"
Any ideas?
One way to achieve the same goal, with a different concept would be to have 3 tests, one per call. It's a bit dirty, but as a fallback solution, it could get you out of the woods
for the first call:
Setup call 2 to throw an exception of a custom type TestException.
assert only call 1 was performed
Expect the TestException to be thrown
for the second call:
Setup call 3 to throw an exception of a custom type TestException.
assert call 1 and 2 were performed
Expect the TestException to be thrown
for the third call:
Setup all calls to perform normally.
assert call 1, 2 and 3 were performed
You could use continuations (below), but really, if you need to guarantee that these things happen in order, then they should not be async operations. Typically you would want async operations to be able to run at the same time:
public async Task DeleteAmendment(int amendmentHeaderId, int userId)
{
    // Each ContinueWith returns a Task<Task>; Unwrap() makes the chain wait for the inner call.
    // Note: a plain ContinueWith also runs when the previous step has faulted.
    await Task.Run(() =>
            // Delete the corresponding version records.
            _amendmentVersionService.DeleteForAmendmentAsync(amendmentHeaderId))
        .ContinueWith(_ =>
            // Delete the corresponding lifecycle records.
            _amendmentLifecycleService.DeleteForAmendmentAsync(amendmentHeaderId))
        .Unwrap()
        .ContinueWith(_ =>
            // Delete the amendment header record itself.
            _amendmentHeaderService.DeleteAsync(amendmentHeaderId, userId))
        .Unwrap();
}
Your problem is that you will never be able to know if a method was performed as a result of the previous one finishing (awaited) or if you were lucky enough not to suffer from a race condition (call made without await, or no ContinueWith)
The only way you can actually test it for sure is by replacing the default TaskScheduler by an implementation of your own, which will not queue the subsequent task. If the subsequent task gets called, then your code is wrong. If not, that means that the task is really executed as a result of the previous one completing.
We have done this in Testeroids, a test framework a friend and I built.
By doing so, your custom TaskScheduler can perform the tasks sequentially, on a single thread (to really highlight the timeline problems you could have), and record which tasks were scheduled and in which order.
It will require a lot of effort on your part if you want to be that thorough, but at least you get the idea.
In order to replace the default TaskScheduler, you can take inspiration from the work we did on Testeroids.
https://github.com/Testeroids/Testeroids/blob/c5f3f02e8078db649f804d94c37cdab3df89fed4/solution/src/app/Testeroids/TplTestPlatformHelper.cs
Thanks to Stephen Brickner ...
I made all my calls synchronous, which made the callbacks in the Moq setups work like a dream.
Thanks for all your help much appreciated!

Converting a IEnumerable<T> to IObservable<T>, with maximum parallelism

I have a sequence of async tasks to do (say, fetch N web pages). Now what I want is to expose them all as an IObservable<T>. My current solution uses the answer from this question:
async Task<ResultObj> GetPage(string page) {
Console.WriteLine("Before");
var result = await FetchFromInternet(page);
Console.WriteLine("After");
return result;
}
// pages is an IEnumerable<string>
IObservable<ResultObj> resultObservable =pages.Select(GetPage).
Select(t => Observable.FromAsync(() => t)).Merge();
// Now consume the list
foreach(ResultObj obj in resultObservable.ToEnumerable()) {
Console.WriteLine(obj.ToString());
}
The problem is that I do not know the number of pages to be fetched, and it could be large. I do not want to make hundreds of simultaneous requests. So I want a way to limit the maximum number of tasks that will be executed in parallel. Is there a way to limit the number of concurrent invocations of GetPage?
There is a Merge overload that takes a maxConcurrent parameter, but it does not seem to actually limit the concurrency of the function invocation. The console prints all the Before messages before the After messages.
Note: I need to convert back to IEnumerable<T>. I'm writing a data source for a system that gives me descriptors of data to fetch, and I need to give it back a list of the downloaded data.
EDIT
The following should work. This overload limits the number of concurrent subscriptions.
var resultObservable = pages
.Select(p => Observable.FromAsync(() => GetPage(p)))
.Merge(maxConcurrent);
Explanation
In order to understand why this change is needed we need some background
1. FromAsync returns an observable that will invoke the passed Func every time it is subscribed to. This implies that if the observable is never subscribed to, it will never be invoked.
2. Merge eagerly reads the source sequence, and only subscribes to n observables simultaneously.
With these two pieces we can know why the original version will execute everything in parallel: because of (2), GetPage will have already been invoked for all the source strings by the time Merge decides how many observables need to be subscribed.
And we can also see why the second version works: even though the sequence has been fully iterated over, (1) means that GetPage is not invoked until Merge decides it needs to subscribe to n observables. This leads to the desired result of only n tasks being executed simultaneously.
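The effect can be observed with a small experiment. This is only a sketch: GetPage here is a hypothetical stand-in that waits 50 ms instead of fetching anything, and it records the highest number of overlapping calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Linq;
using System.Threading.Tasks;

public static class BoundedFetch
{
    private static readonly object Gate = new object();
    private static int _running;
    public static int Peak;

    // Hypothetical stand-in for GetPage: tracks how many calls overlap.
    private static async Task<string> GetPage(string page)
    {
        lock (Gate) { _running++; Peak = Math.Max(Peak, _running); }
        await Task.Delay(50);
        lock (Gate) { _running--; }
        return page.ToUpperInvariant();
    }

    public static IList<string> FetchAll(IEnumerable<string> pages, int maxConcurrent)
    {
        return pages
            // FromAsync defers GetPage until Merge subscribes to the inner observable.
            .Select(p => Observable.FromAsync(() => GetPage(p)))
            .Merge(maxConcurrent)
            .ToList()
            .Wait();
    }

    public static void Main()
    {
        var pages = Enumerable.Range(1, 10).Select(i => "page" + i);
        var results = FetchAll(pages, maxConcurrent: 3);
        Console.WriteLine($"{results.Count} pages, peak concurrency {Peak}");
    }
}
```

Replacing the Select with pages.Select(GetPage).Select(t => Observable.FromAsync(() => t)), as in the question, should push the peak up to the number of pages, because every task is started while the sequence is being enumerated.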

.Net RX: tracking progress of parallel execution

I need to execute multiple long-running operations in parallel and would like to report progress in some way. From my initial research it seems that IObservable fits this model. The idea is that I call a method that returns an IObservable<int>, where the int is the reported percent complete. Parallel execution starts immediately upon exiting the method. The observable must be hot so that all subscribers see the same progress information at a given point in time; for example, a late subscriber may only learn that the whole execution is complete and there is no more progress to track.
The closest approach to this problem that I found is to use Observable.ForkJoin and Observable.Start, but I can't work out how to combine them into a single observable that I can return from a method.
Please share your ideas of how can it be achieved or maybe there is another approach to this problem using .Net RX.
To make a hot observable, I would probably start with a method that uses a BehaviorSubject as the return value and the way the operations report progress. If you just want the example, skip to the end. The rest of this answer explains the steps.
I will assume for the sake of this answer that your long-running operations do not have their own way to be called asynchronously. If they do, the next step may be a little different. The next thing to do is to send the work to another thread using an IScheduler. You may allow the caller to select where the work happens by adding an overload that takes the scheduler as a parameter (in which case the overload without it picks a default scheduler). There are quite a few overloads of IScheduler.Schedule, several of which are extension methods, so you should look through them to see which is most appropriate for your situation; I'm using the one that takes only an Action here. If you have multiple operations that can all run in parallel, you can call scheduler.Schedule multiple times.
The hardest part of this will probably be determining what the progress is at any given point. If you have multiple operations going on at once, you will probably need to keep track of how many have completed to know what the current progress is. With the information you provided, I can't be more specific than that.
Finally, if your operations are cancellable, you may want to take a CancellationToken as a parameter. You can use this to cancel the operation while it is in the scheduler's queue before it starts. If you write your operation code correctly, it can use the token for cancellation as well.
IObservable<int> DoStuff(/*args*/,
    CancellationToken cancel,
    IScheduler scheduler)
{
    //seed with 0 so that every subscriber immediately gets a current value
    var progress = new BehaviorSubject<int>(0);
    //if you don't take it as a parameter, pick a scheduler
    //IScheduler scheduler = Scheduler.ThreadPool;
    var disp = scheduler.Schedule(() =>
    {
        //do stuff that needs to run on another thread
        //report progress
        progress.OnNext(25);
    });
    var disp2 = scheduler.Schedule(...);
    //if the operation is cancelled before the scheduler has started it,
    //you need to dispose the return from the Schedule calls
    var allOps = new CompositeDisposable(disp, disp2);
    cancel.Register(allOps.Dispose);
    return progress;
}
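To illustrate why a BehaviorSubject suits the "hot" requirement, here is a minimal, single-threaded sketch (the class name and log labels are invented). It shows what early, late, and post-completion subscribers receive.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Subjects;

public static class HotProgressDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        var progress = new BehaviorSubject<int>(0);

        progress.Subscribe(p => log.Add($"early:{p}"), () => log.Add("early:done"));
        progress.OnNext(25);
        progress.OnNext(50);

        // A late subscriber immediately receives the latest value (50), not the history.
        progress.Subscribe(p => log.Add($"late:{p}"), () => log.Add("late:done"));
        progress.OnNext(100);
        progress.OnCompleted();

        // Subscribing after completion yields only the completion signal.
        progress.Subscribe(p => log.Add($"after:{p}"), () => log.Add("after:done"));
        return log;
    }

    public static void Main() => Console.WriteLine(string.Join(", ", Run()));
}
```

This matches the behaviour asked for in the question: a subscriber that arrives after the work has finished learns only that it is complete.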
Here is one approach
// set up a method to do some work
// and report its own partial progress
Func<string, IObservable<int>> doPartialWork =
(arg) => Observable.Create<int>(obsvr => {
return Scheduler.TaskPool.Schedule(arg,(sched,state) => {
var progress = 0;
var cancel = new BooleanDisposable();
while(progress < 10 && !cancel.IsDisposed)
{
// do work with arg
Thread.Sleep(550);
obsvr.OnNext(1); //report progress
progress++;
}
obsvr.OnCompleted();
return cancel;
});
});
var myArgs = new[]{"Arg1", "Arg2", "Arg3"};
// run all the partial bits of work
// use SelectMany to get a flat stream of
// partial progress notifications
var xsOfPartialProgress =
myArgs.ToObservable(Scheduler.NewThread)
.SelectMany(arg => doPartialWork(arg))
.Replay().RefCount();
// use Scan to get a running aggregation of progress
var xsProgress = xsOfPartialProgress
.Scan(0d, (prog,nextPartial)
=> prog + (nextPartial/(myArgs.Length*10d)));
