I have an observable stream IObservable<Task<T>> (call it stream A). From this stream, I wish to produce an IObservable<T> (stream B). If these were my only requirements, I believe it would be sufficient to say
streamB = streamA.Select(async x => await x).Select(x => x.Result);
I expect the first Select to be truly asynchronous, in the sense that each Task<T> the stream produces is awaited as it arrives. The second Select, however, will then "block" until each of the Task<T>s completes, in order. If I'm thinking correctly here, there's no actual blocking going on, because the second Select won't be entered until the corresponding async operation in the first Select completes and resumes execution. The problem for me with this pattern is that I must wait for each task to complete (i.e. succeed, fail or be cancelled) in the order that it arrives.
Consider this scenario: Assume I have two Task<T>s, t1 and t2. t1 arrives before t2. However, when t2 arrives, t1 has yet to complete, i.e. in a marble diagram
stream A: -------------------t1-------------t2-------------------------------
async Select completion --------------------------t1-----------t2------------
stream B: -----------------------------------------------------t2.Result----
In other words, the arrival of t2 from stream A before the completion of t1 essentially means that t1 should be ignored and not produced by stream B.
I have been able to solve this problem with very imperative code and (too much) added complexity: each task is passed to a special helper class that keeps track of all arriving tasks, tags each with an incrementing long id, awaits it, and uses a callback to pass the id back to the helper class to tell it which Task has completed. If the id is "older" than the latest to arrive, the result is ignored.
I strongly feel I am over-complicating what on the surface appears as a simple problem. Is there no infrastructure or pattern to solve this kind of problem, either in System.Reactive or e.g. in TPL?
Here's the query you need:
IObservable<T> streamB = streamA.Select(t => Observable.FromAsync(() => t)).Switch();
You should almost always avoid .Select(async x => await x): the async lambda sends control back to the calling thread, which allows overlapping executions and breaks the Rx contract.
The normal way to turn an IObservable<Task<T>> into an IObservable<T> is to use .SelectMany(t => Observable.FromAsync(() => t)).
In your case, though, you want to throw away any currently computing values if a new task comes through. So this changes the query from .SelectMany(t => Observable.FromAsync(() => t)) to .Select(t => Observable.FromAsync(() => t)).Switch().
.Switch() turns an IObservable<IObservable<T>> into an IObservable<T> by only producing values from the latest inner observable produced by the outer observable. It effectively ignores all but the latest inner observable. Just what you need.
Here's a demonstration of this working:
void Main()
{
    IObservable<Task<long>> streamA = new []
    {
        ReturnDelayedAsync(1),
        ReturnDelayedAsync(42),
    }.ToObservable();

    IObservable<long> streamB = streamA.Select(t => Observable.FromAsync(() => t)).Switch();

    streamB.Subscribe(Console.WriteLine);
}

public async Task<long> ReturnDelayedAsync(long x)
{
    await Task.Delay(TimeSpan.FromSeconds(2.0));
    return x;
}
That produces a single value of 42 on the console.
Related
I'm using the Reactive .NET extensions and I wonder about their disposal. I know that in some cases it's good to dispose of a subscription like this: .TakeUntil(Observable.Timer(TimeSpan.FromMinutes(x))).
First case
In this case, I have a timer that triggers after x seconds and then it completes and should be disposed.
public void ScheduleOrderCancellationIfNotFilled(string pair, long orderId, int waitSecondsBeforeCancel)
{
    Observable.Timer(TimeSpan.FromSeconds(waitSecondsBeforeCancel))
        .Do(e =>
        {
            var result = _client.Spot.Order.GetOrder(pair, orderId);
            if (result.Success)
            {
                if (result.Data?.Status != OrderStatus.Filled)
                {
                    _client.Spot.Order.CancelOrder(pair, orderId);
                }
            }
        })
        .Subscribe();
}
Second case
In this case, the timer fires after the first second and then repeats every 29 minutes. It should live until its defining class is disposed. I believe this one should be disposed through an IDisposable implementation. How?
var keepAliveListenKey = Observable.Timer(TimeSpan.FromSeconds(1), TimeSpan.FromMinutes(29))
    .Do(async e =>
    {
        await KeepAliveListenKeyAsync().ConfigureAwait(false);
    })
    .Subscribe();
Edit
I would also like it to use a Subject<T>, which makes it easier to dispose of and to reset the subscription.
For ex. Reset and Dispose observable subscriber, Reactive Extensions (#Enigmativity)
public class UploadDicomSet : ImportBaseSet
{
    IDisposable subscription;
    Subject<IObservable<long>> subject = new Subject<IObservable<long>>();

    public UploadDicomSet()
    {
        subscription = subject.Switch().Subscribe(s => CheckUploadSetList(s));
        subject.OnNext(Observable.Interval(TimeSpan.FromMinutes(2)));
    }

    void CheckUploadSetList(long interval)
    {
        subject.OnNext(Observable.Never<long>());
        // Do other things
    }

    public void AddDicomFile(SharedLib.DicomFile dicomFile)
    {
        subject.OnNext(Observable.Interval(TimeSpan.FromMinutes(2)));
        // Reset the subscription to go off in 2 minutes from now
        // Do other things
    }
}
In the first case it is going to be disposed automatically. This is, in fact, a common way to achieve automatic subscription management, and it is definitely a nice and elegant way to deal with Rx.
In the second case you have over-engineered. Observable.Timer(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1)) is by itself sufficient to generate a sequence of ascending longs over time. Since this stream is endless by nature, you are right: explicit subscription management is required. So it is enough to have:
var sub = Observable.Timer(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1)).Subscribe()
...and sub.Dispose() it later.
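If you want the disposal tied to the lifetime of the class that owns the subscription (as asked in the second case), a minimal sketch could look like the following. The class name and the keep-alive body are placeholders, not part of the original code.
public sealed class ListenKeyKeepAlive : IDisposable
{
    private readonly IDisposable _subscription;

    public ListenKeyKeepAlive()
    {
        // SelectMany + FromAsync rather than Do(async ...) — see the P.S. below.
        _subscription = Observable
            .Timer(TimeSpan.FromSeconds(1), TimeSpan.FromMinutes(29))
            .SelectMany(_ => Observable.FromAsync(() => KeepAliveListenKeyAsync()))
            .Subscribe();
    }

    private Task KeepAliveListenKeyAsync() => Task.CompletedTask; // placeholder for the real call

    public void Dispose() => _subscription.Dispose();
}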
P.S. Note that in your code you .Do an async/await lambda. Most probably that is not what you want. You want SelectMany to ensure that the async operation is properly awaited and that its exceptions are handled.
Answering your questions in the comments section:
What about disposing using Subject instead?
Well, there is nothing so special about it. The class implements both IObserver<> and IObservable<>, so it resembles classical .NET events (a list of callbacks to be invoked upon some event). It does not differ in any meaningful way with respect to your question and use case.
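As a tiny illustration (purely a sketch), the same Subject<T> instance can be used from both sides:
var subject = new Subject<int>();

// IObservable<int> side: subscribe a callback, much like attaching an event handler.
using (subject.Subscribe(x => Console.WriteLine(x)))
{
    // IObserver<int> side: push values in, much like raising an event.
    subject.OnNext(1);
    subject.OnNext(2);
    subject.OnCompleted();
}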
May you give an example about the .Do with exception handling?
Sure. The idea is that you want to translate your async/await work, encapsulated in some Task<T>, into an IObservable<T> in a way that preserves both cancellation and error signals. For that, the .SelectMany method must be used (the same idea as SelectMany from LINQ). So just change your .Do to .SelectMany.
Observable
.Timer(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1))
.SelectMany(_ => Observable.FromAsync(() => /* that's the point where your Task<> becomes Observable */ myTask))
I'm confused again. Do I need IObservable<IObservable<T>> (Select) or IObservable<T> (SelectMany)?
Most probably, you don't need Switch. Why? Because it was created mainly to avoid I/O race conditions: whenever a new event is emitted, the current one (which might still be in progress due to natural parallelism or an asynchronous workflow) is guaranteed to be cancelled (i.e. unsubscribed). Otherwise race conditions can (and will) damage your state.
SelectMany, on the contrary, cancels nothing: every inner observable is allowed to run to completion, and all of their results are merged into the output stream (in the order they complete, which is not necessarily the order they arrived). Of course, such behavior can be altered by means of an appropriate IScheduler, but that is another story.
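Here is a small sketch of the difference; the timings and names are purely illustrative. Each tick starts a two-second unit of work; Switch keeps only the most recent one, while SelectMany keeps them all.
IObservable<long> ticks = Observable.Interval(TimeSpan.FromSeconds(1)).Take(3);

// Switch: each new tick unsubscribes from the work still in flight,
// so only the result of the last tick is emitted.
IObservable<string> latestOnly = ticks
    .Select(i => Observable.FromAsync(async () =>
    {
        await Task.Delay(TimeSpan.FromSeconds(2));
        return $"task {i}";
    }))
    .Switch();

// SelectMany: nothing is cancelled; every task's result is merged into the output.
IObservable<string> everything = ticks
    .SelectMany(i => Observable.FromAsync(async () =>
    {
        await Task.Delay(TimeSpan.FromSeconds(2));
        return $"task {i}";
    }));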
Reactive Observable Subscription Disposal (#Enigmativity)
The disposable returned by the Subscribe extension methods is returned solely to allow you to manually unsubscribe from the observable before the observable naturally ends.
If the observable completes - with either OnCompleted or OnError - then the subscription is already disposed for you.
One important thing to note: the garbage collector never calls .Dispose() on observable subscriptions, so you must dispose of your subscriptions if they have not (or may not have) naturally ended before your subscription goes out of scope.
First case
Looks like I don't need to manually .Dispose() the subscription in the first case scenario because it ends naturally.
Dispose is being triggered at the end.
var xs = Observable.Create<long>(o =>
{
    var d = Observable.Timer(TimeSpan.FromSeconds(5))
        .Do(e =>
        {
            Console.WriteLine("5 seconds elapsed.");
        })
        .Subscribe(o);

    return Disposable.Create(() =>
    {
        Console.WriteLine("Disposed!");
        d.Dispose();
    });
});

var subscription = xs.Subscribe(x => Console.WriteLine(x));
Second case
but in the second case, where it doesn't end "naturally", I should dispose it.
Dispose is not triggered unless manually disposed.
var xs = Observable.Create<long>(o =>
{
    var d = Observable.Timer(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1))
        .Do(e =>
        {
            Console.WriteLine("Test.");
        })
        .Subscribe(o);

    return Disposable.Create(() =>
    {
        Console.WriteLine("Disposed!");
        d.Dispose();
    });
});

var subscription = xs.Subscribe(x => Console.WriteLine(x));
Conclusion
He gave such nice examples that they are worth seeing if you are asking yourself the same question.
I have a helper class with a method that saves text messages to the local file system. This method returns a Task object and is asynchronous by definition.
I want to be able to observe when this method gets called, so I can continuously monitor the size and length of the buffer and make a decision based on that.
I am trying to implement this using the Reactive Extension for .NET. However, I can't come up with a design that allows me to continuously listen to messages being added to the buffer. Below is my current implementation:
public IObservable<Unit> Receive(InternalMessage message)
{
    var observable = FileBuffer.BufferMessage(message.MessageId.ToString(), message, DateTime.UtcNow).ToObservable(); // This returns a Task, which I convert into an Observable
    return observable;
}
Here is how I subscribe to the observable:
IObservable<Unit> receiverObservable = batchHandler.Receive(message);

receiverObservable.Subscribe(
    x => Console.WriteLine("On next"),
    ex => { }, // TODO
    () => { } // Completed
);
I want the subscriber to be called every time the method Receive is called. However, AFAIK, once this method is called, the observable completes and the sequence is terminated, so future calls to Receive won't be listened to.
Can someone recommend a way to use the Rx.Net libraries to implement this observable pattern that I am looking for, that is, how to keep the sequence open and feed it with results for async methods?
Receive, as you've coded it, returns an IObservable<Unit> representing the completion of a single task. You want to subscribe to something that returns an IObservable<IObservable<Unit>> representing a stream of task completions.
There are a number of ways to do this, the best of which probably depends on how your class is set up and how you're calling it.
Here's the laziest one:
You declare a class-level variable subject that represents a stream of your calls:
Subject<IObservable<Unit>> subject = new Subject<IObservable<Unit>>();

subject.Merge().Subscribe(
    x => Console.WriteLine("On next"),
    ex => { }, //TODO
    () => { } // Completed
);
Then when you have a new call, you just add it to the subject.
IObservable<Unit> receiverObservable = batchHandler.Receive(message);
subject.OnNext(receiverObservable);
The reason this is really lazy is that Rx is functional at its core, which tends to look down on mutable-state variables. Subjects are basically mutable state.
The better way to do it is to figure out when/why you're calling Receive, and structure that as an observable. Once that's done, you can work off of that:
IObservable<Unit> sourceReasonsToCallReceive; // Most likely sourced from an event

sourceReasonsToCallReceive
    .SelectMany(_ => batchHandler.Receive(message))
    .Subscribe(
        x => Console.WriteLine("On next"),
        ex => { }, //TODO
        () => { } // Completed
    );
Hope that helps.
New to Rx -- I have a sequence that appears to function correctly except for the fact that it repeats.
I think I'm missing something around calls to Select() or SelectMany() that triggers the range to re-evaluate.
Explanation of Code & What I'm trying to Do
For all numbers, loop through a method that retrieves data (paged from a database).
Eventually, this data will be empty (I only want to keep processing while it retrieves data).
For each of those records retrieved, I only want to process ones that should be processed
Of those that should be processed, I'd like to process up to x of them in parallel (according to a setting).
I want to wait until the entire sequence is completed to exit the method (hence the wait call at the end).
Problem With the Code Below
I run the code through with a data set that I know only has 1 item.
So, page 0 returns 1 item, and page 1 returns 0 items.
My expectation is that the process runs once for the one item.
However, I see that both page 0 and 1 are called twice and the process thus runs twice.
I think this has something to do with a call that is causing the range to re-evaluate beginning from 0, but I can't figure out what it is.
The Code
var query = Observable.Range(0, int.MaxValue)
    .Select(pageNum =>
    {
        _etlLogger.Info("Calling GetResProfIDsToProcess with pageNum of {0}", pageNum);
        return _recordsToProcessRetriever.GetResProfIDsToProcess(pageNum, _processorSettings.BatchSize);
    })
    .TakeWhile(resProfList => resProfList.Any())
    .SelectMany(records => records.Where(x => _determiner.ShouldProcess(x)))
    .Select(resProf => Observable.Start(async () => await _schoolDataProcessor.ProcessSchoolsAsync(resProf)))
    .Merge(maxConcurrent: _processorSettings.ParallelProperties)
    .Do(async trackingRequests =>
    {
        await CreateRequests(trackingRequests.Result, createTrackingPayload);
        var numberOfAttachments = SumOfRequestType(trackingRequests.Result, TrackingRecordRequestType.AttachSchool);
        var numberOfDetachments = SumOfRequestType(trackingRequests.Result, TrackingRecordRequestType.DetachSchool);
        var numberOfAssignmentTypeUpdates = SumOfRequestType(trackingRequests.Result, TrackingRecordRequestType.UpdateAssignmentType);

        _etlLogger.Info("Extractor generated {0} attachments, {1} detachments, and {2} assignment type changes.",
            numberOfAttachments, numberOfDetachments, numberOfAssignmentTypeUpdates);
    });
var subscription = query.Subscribe(
    trackingRequests =>
    {
        // Nothing really needs to happen here. Technically we're just doing something when it's done.
    },
    () =>
    {
        _etlLogger.Info("Finished! Woohoo!");
    });

await query.Wait();
This is because you subscribe to the sequence twice. Once at query.Subscribe(...) and again at query.Wait().
Observable.Range(0, int.MaxValue) is a cold observable. Every time you subscribe to it, it will be evaluated again. You could make the observable hot by publishing it with Publish(), then subscribing to it, then calling Connect(), and then Wait(). This does add a risk of an InvalidOperationException if you call Wait() after the last element has already been yielded. A better alternative is LastOrDefaultAsync().
That would get you something like this:
var connectable = query.Publish();
var subscription = connectable.Subscribe(...);
subscription = new CompositeDisposable(connectable.Connect(), subscription);
await connectable.LastOrDefaultAsync();
Or you can avoid await and return a task directly with ToTask() (do remove async from your method signature).
return connectable.LastOrDefaultAsync().ToTask();
Once converted to a task, you can synchronously wait for it with Wait() (do not confuse Task.Wait() with Observable.Wait()).
connectable.LastOrDefaultAsync().ToTask().Wait();
However, most likely you do not want to wait at all! Waiting in an async context makes little sense. What you should do is put the remainder of the code that needs to run after the sequence completes in the OnCompleted() part of the subscription. If you have (clean-up) code that needs to run even when you unsubscribe (Dispose), consider Observable.Using or the Finally(...) method to ensure this code is run.
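For example, in terms of the query above, a sketch of that structure (reusing _etlLogger from the question) could be:
var subscription = query
    .Finally(() => _etlLogger.Info("Terminated (completed, failed, or unsubscribed)."))
    .Subscribe(
        trackingRequests => { /* per-item work, if any */ },
        () =>
        {
            _etlLogger.Info("Finished! Woohoo!");
            // Put the code that used to follow the await here.
        });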
As already mentioned the cause of the Observable.Range being repeated is the fact that you're subscribing twice - once with .Subscribe(...) and once with .Wait().
In this kind of circumstance I would go with a very simple blocking call to get the values. Just do this:
var results = query.ToArray().Wait();
The .ToArray() turns a multi-valued IObservable<T> into a single-valued IObservable<T[]>. The .Wait() turns this into T[]. It's the easy way to ensure only one subscription, blocking, and getting all of the values out.
In your case you may not need all values, but I think this is a good habit to get into.
I am using Reactive Extensions (Rx) to buffer some data. I'm having an issue though in that I then need to do something asynchronous with this data, yet I don't want the buffer to pass the next group through until the asynchronous operation is complete.
I've tried to structure the code two ways (contrived example):
public async Task processFiles<File>(IEnumerable<File> files)
{
    await files.ToObservable()
        .Buffer(10)
        .SelectMany(fi => fi.Select(f => upload(f))) // Now have an IObservable<Task>
        .Select(t => t.ToObservable())
        .Merge()
        .LastAsync();
}

public Task upload(File item)
{
    return Task.Run(() => { /* Stuff */ });
}
or
public async Task processFiles<File>(IEnumerable<File> files)
{
    var buffered = files.ToObservable()
        .Buffer(10);

    buffered.Subscribe(async group => await Task.WhenAll(group.Select(f => upload(f))));

    await buffered.LastAsync();
}

public Task upload(File item)
{
    return Task.Run(() => { /* Stuff */ });
}
Unfortunately, neither of these methods has worked, as the buffer pushes the next group before the async operations complete. The intent is to have each buffered group executed asynchronously, and only when that operation is complete, continue with the next buffered group.
Any help is greatly appreciated.
To make sure I understand you correctly, it sounds like you want to ensure you carry on buffering items while only presenting each buffer when the previous buffer has been processed.
You also need to make the processing of each buffer asynchronous.
It's probably valuable to consider some theoretical points, because I have to confess that I'm a bit confused about the approach. IObservable is often said to be the dual of IEnumerable because it mirrors the latter with the key difference being that data is pushed to the consumer rather than the consumer pulling it as it chooses.
You are trying to use the buffered stream like an IEnumerable instead of an IObservable - you essentially want to pull the buffers rather than have them pushed to you - so I do have to wonder if you have picked the right tool for the job. Are you trying to hold up the buffering operation itself while a buffer is processed? As a consumer having the data pushed at you, this isn't really a correct approach.
You could consider applying a ToEnumerable() call to the buffer operation, so that you can deal with the buffers when ready. That won't prevent ongoing buffering from occurring while you deal with a current buffer, though.
There's little you can do to prevent this - doing the buffer processing synchronously inside a Select() operation applied to the buffer would carry a guarantee that no subsequent OnNext() call would occur until the Select() projection completed. The guarantee comes for free as the Rx library operators enforce the grammar of Rx. But it's only guaranteeing non-overlapping invocations of OnNext() - there's nothing to say a given operator couldn't (and indeed shouldn't) carry on getting the next OnNext() ready to go. That's the nature of a push based API.
It's very unclear why you think you need the projection to be asynchronous if you also want to block the buffers. Have a think about this - I suspect using a synchronous Select() in your observer might solve the issue, but it's not entirely clear from your question.
Similar to a synchronous Select() is a synchronous OnNext() handler, which is easier to use if your processing of items has no results - but it's not the same, because (depending on the implementation of the Observable) you are only blocking delivery of OnNext() calls to that Subscriber rather than to all Subscribers. However, with just a single Subscriber it's equivalent, so you could do something like:
void Main()
{
    var source = Observable.Range(1, 4);

    source.Buffer(2)
        .Subscribe(i =>
        {
            Console.WriteLine("Start Processing Buffer");
            Task.WhenAll(from n in i select DoUpload(n)).Wait();
            Console.WriteLine("Finished Processing Buffer");
        });
}

private Task DoUpload(int i)
{
    return Task.Factory.StartNew(
        () =>
        {
            Thread.Sleep(1000);
            Console.WriteLine("Process File " + i);
        });
}
Which outputs (with no guarantee on the order of "Process File x" within a buffer):
Start Processing Buffer
Process File 2
Process File 1
Finished Processing Buffer
Start Processing Buffer
Process File 3
Process File 4
Finished Processing Buffer
If you prefer to use a Select() and your projection returns no results, you can do it like this:
source.Buffer(2)
    .Select(i =>
    {
        Console.WriteLine("Start Processing Buffer");
        Task.WhenAll(from n in i select DoUpload(n)).Wait();
        Console.WriteLine("Finished Processing Buffer");
        return Unit.Default;
    }).Subscribe();
NB: Sample code written in LINQPad and including Nuget package Rx-Main. This code is for illustrative purposes - don't Thread.Sleep() in production code!
First, I think your requirement to execute the items from each group in parallel, but each group in series, is quite unusual. A more common requirement would be to execute the items in parallel, but at most n of them at the same time. This way, there are no fixed groups, so if a single item takes too long, other items don't have to wait for it.
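If that alternative is acceptable, it can be done with Rx alone. A sketch (the method name ExecuteThrottledAsync is made up for illustration):
public static Task ExecuteThrottledAsync<T>(
    this IEnumerable<T> source, Func<T, Task> func, int maxConcurrency)
{
    return source.ToObservable()
        .Select(item => Observable.FromAsync(() => func(item)))
        .Merge(maxConcurrency)          // at most maxConcurrency tasks in flight at once
        .LastOrDefaultAsync()
        .ToTask();
}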
To do what you're asking for, I think TPL Dataflow is more suitable than Rx (though some Rx code will still be useful). TPL Dataflow is centered about “blocks” that execute stuff, by default in series, which is exactly what you need.
Your code could look like this:
public static class Extensions
{
    public static Task ExecuteInGroupsAsync<T>(
        this IEnumerable<T> source, Func<T, Task> func, int groupSize)
    {
        var block = new ActionBlock<IEnumerable<T>>(
            g => Task.WhenAll(g.Select(func)));

        source.ToObservable()
            .Buffer(groupSize)
            .Subscribe(block.AsObserver());

        return block.Completion;
    }
}

public Task ProcessFiles(IEnumerable<File> files)
{
    return files.ExecuteInGroupsAsync(Upload, 10);
}
This leaves most of the heavy lifting on the ActionBlock (and some on Rx). Dataflow blocks can act as Rx observers (and observables), so we can take advantage of that to keep using Buffer().
We want to handle the whole group at once, so we use Task.WhenAll() to create a Task that completes when the whole group completes. Dataflow blocks understand Task-returning functions, so the next group won't start executing until the Task returned by the previous group completes.
The final result is the Completion Task, which will complete after the source observable completes and all processing is done.
TPL Dataflow also has BatchBlock, which works like Buffer() and we could directly Post() each item from the collection (without using ToObservable() and AsObserver()), but I think using Rx for this part of the code makes it simpler.
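That pure-Dataflow variant might look roughly like this - a sketch, mirroring the extension method above:
public static Task ExecuteInGroupsAsync<T>(
    this IEnumerable<T> source, Func<T, Task> func, int groupSize)
{
    var batchBlock = new BatchBlock<T>(groupSize);
    var actionBlock = new ActionBlock<T[]>(g => Task.WhenAll(g.Select(func)));
    batchBlock.LinkTo(actionBlock, new DataflowLinkOptions { PropagateCompletion = true });

    foreach (var item in source)
        batchBlock.Post(item);
    batchBlock.Complete();

    return actionBlock.Completion;
}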
EDIT: Actually you don't need TPL Dataflow here at all. Using ToEnumerable() as James World suggested will be enough:
public static async Task ExecuteInGroupsAsync<T>(
    this IEnumerable<T> source, Func<T, Task> func, int groupSize)
{
    var groups = source.ToObservable().Buffer(groupSize).ToEnumerable();

    foreach (var g in groups)
    {
        await Task.WhenAll(g.Select(func));
    }
}
Or even simpler without Rx using Batch() from morelinq:
public static async Task ExecuteInGroupsAsync<T>(
    this IEnumerable<T> source, Func<T, Task> func, int groupSize)
{
    var groups = source.Batch(groupSize);

    foreach (var group in groups)
    {
        await Task.WhenAll(group.Select(func));
    }
}
I have an interleaved stream which I split into separate sequential streams.
Producer
int streamCount = 3;

new MyIEnumerable<ElementType>()
    .ToObservable(Scheduler.ThreadPool)
    .Select((x, i) => new { Key = (i % streamCount), Value = x })
    .Subscribe(x => outputs[x.Key].OnNext(x.Value));
Here outputs[] are the Subjects, defined below, that process the streams. .ObserveOn() is used to process the streams concurrently (multi-threaded).
Consumers
var outputs = Enumerable.Repeat(0, streamCount).Select(_ => new Subject<char>()).ToArray();
outputs[0].ObserveOn(Scheduler.ThreadPool).Subscribe(x => Console.WriteLine("stream 0: {0}", x));
outputs[1].ObserveOn(Scheduler.ThreadPool).Subscribe(x => Console.WriteLine("stream 1: {0}", x));
outputs[2].ObserveOn(Scheduler.ThreadPool).Subscribe(x => Console.WriteLine("stream 2: {0}", x));
The problem with this code is that it will read the entire enumerable as fast as possible, even if the output streams cannot catch up. In my case the enumerable is a file stream, so this might use a lot of memory. Therefore, I would like the reading to block if the buffer(s) reach some threshold.
I have solved this by using a semaphore on the producer and consumers like shown below. However, I am not sure that this is considered a good solution (in terms of Rx contracts, programming style, etc).
var semaphore = new SemaphoreSlim(MAX_BUFFERED_ELEMENTS);

// add to producer (before Subscribe)
.Do(_ => semaphore.Wait())

// add to consumers (before Subscribe)
.Do(_ => semaphore.Release())
Might it be a good idea to pass a CancellationToken to the call to Wait() and make sure it is cancelled when the stream stops abnormally?
I think your solution is very reasonable. The biggest problem (having some background to the previous question) is that the 'insides' of your solution are currently exposed everywhere. Just make sure that when you code this properly you clean up the following:
Wrap everything into a class that exposes a single method: IDisposable Subscribe(<index>, Action), or alternatively IObservable<element> ToObservable(<index>). Either the returned subscription or the returned observable will have all the 'work' already done to them, namely the added Do actions and so forth. The fact that there's a dictionary or list under it all should be completely irrelevant to the user; otherwise any change to your code here will require changes all over the place.
A CancellationToken is a great idea; make sure to cancel it on either OnCompleted or OnError, which you can do using the overloads of Do.
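A sketch of that last point, building on the semaphore fragments from the question (the placement of the Do calls is illustrative):
var cts = new CancellationTokenSource();

// producer (before Subscribe): stop waiting if the pipeline is torn down
.Do(_ => semaphore.Wait(cts.Token),
    ex => cts.Cancel(),   // OnError
    () => cts.Cancel())   // OnCompleted

// consumers (before Subscribe)
.Do(_ => semaphore.Release())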