I'm interested in an ActionBlock implementation for Framework 4.0, since it seems that TPL Dataflow isn't supported there.
More particularly, I'm interested in the constructor that receives a Func<TInput, Task> delegate, with MaxDegreeOfParallelism = 1.
I thought about implementing it using reactive extensions, but I'm not sure how to do it. Thought about creating a Subject<TInput> and calling OnNext on Post, and using SelectMany and task ToObservable stuff, but I'm not sure what to do with the scheduler. Here is a draft of what I was thinking of.
public class ActionBlock<TInput>
{
private readonly TaskCompletionSource<object> mCompletion = new TaskCompletionSource<object>();
private readonly Subject<TInput> mQueue = new Subject<TInput>();
public ActionBlock(Func<TInput, Task> action)
{
var observable =
from item in mQueue
from _ in action(item).ToObservable()
select _;
observable.Subscribe(x => { },
OnComplete);
}
private void OnComplete()
{
mCompletion.SetResult(null);
}
public void Post(TInput input)
{
mQueue.OnNext(input);
}
public Task Completion
{
get
{
return mCompletion.Task;
}
}
public void Complete()
{
mQueue.OnCompleted();
}
}
I thought maybe using EventLoopScheduler but I'm not sure it fits here since this is async.
Any ideas?
mQueue
.Select(input => Observable.FromAsync(() => action(input)))
.Merge(maxDegreeOfParallelism)
.Subscribe(...);
If indeed maxDegreeOfParallelism is always 1, then just use Concat instead of Merge:
mQueue
.Select(input => Observable.FromAsync(() => action(input)))
.Concat()
.Subscribe(...);
This works because FromAsync just creates a cold observable that will not run the async action until it is subscribed. We then use the maxConcurrent parameter of Merge (or just Concat) to limit the number of concurrent subscriptions (and thus the number of async actions running).
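To illustrate the cold behaviour described above, here is a minimal sketch. The class name ColdFromAsyncDemo and the counter started are made up for this illustration, and it assumes the System.Reactive package on a runtime that has Task.CompletedTask (so not Framework 4.0 itself).

```csharp
using System;
using System.Reactive.Linq;
using System.Threading.Tasks;

public static class ColdFromAsyncDemo
{
    // Returns (runs before any subscription, runs after two subscriptions).
    public static async Task<Tuple<int, int>> RunAsync()
    {
        int started = 0;

        // Nothing runs here: FromAsync only wraps the delegate.
        var cold = Observable.FromAsync(() =>
        {
            started++;
            return Task.CompletedTask;
        });

        int beforeSubscribe = started;

        // Awaiting an observable subscribes to it, so each await
        // invokes the delegate once more.
        await cold;
        await cold;

        return Tuple.Create(beforeSubscribe, started);
    }
}
```

Because every subscription triggers a fresh invocation, limiting the number of concurrent subscriptions with Merge or Concat also limits the number of actions in flight.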
Edit:
And since your goal is to just have a Task that represents the completion of the stream, you can use ToTask instead of directly subscribing. ToTask will subscribe and return a Task with the final value. Because ToTask will throw if the observable does not produce a value, we'll use Count to guarantee it produces a value:
// task to mark completion
private readonly Task mCompletion;
// ...
this.mCompletion = mQueue
.Select(input => Observable.FromAsync(() => action(input)))
.Concat()
.Count()
.ToTask();
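Putting the pieces of this answer together with the draft from the question, the whole class might look roughly like the sketch below. It assumes an Rx version that provides Observable.FromAsync, and it assumes that Post is called from one thread at a time, since Subject<T> expects serialized OnNext calls; neither assumption comes from the original posts.

```csharp
using System;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using System.Reactive.Threading.Tasks;
using System.Threading.Tasks;

public class ActionBlock<TInput>
{
    private readonly Subject<TInput> mQueue = new Subject<TInput>();

    // Task that marks completion of the stream and of all queued actions.
    private readonly Task mCompletion;

    public ActionBlock(Func<TInput, Task> action)
    {
        mCompletion = mQueue
            // Wrap each item in a cold observable; nothing runs yet.
            .Select(input => Observable.FromAsync(() => action(input)))
            // Concat subscribes to one inner observable at a time,
            // so the actions run sequentially in posting order.
            .Concat()
            // Count guarantees a final value, so ToTask does not throw
            // when no items were posted.
            .Count()
            .ToTask();
    }

    public void Post(TInput input)
    {
        mQueue.OnNext(input);
    }

    public void Complete()
    {
        mQueue.OnCompleted();
    }

    public Task Completion
    {
        get { return mCompletion; }
    }
}
```

Note that items posted while an action is still running are queued inside Concat without any bound, and an exception thrown by the action faults the Completion task.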
I have tried to write a console observable as in the example below, but it doesn't work. There are some issues with subscriptions. How can I solve them?
static class Program
{
static async Task Main(string[] args)
{
// var observable = Observable.Interval(TimeSpan.FromMilliseconds(1000)).Publish().RefCount(); // works
// var observable = FromConsole().Publish().RefCount(); // doesn't work
var observable = FromConsole(); // doesn't work
observable.Subscribe(Console.WriteLine);
await Task.Delay(1500);
observable.Subscribe(Console.WriteLine);
await new TaskCompletionSource().Task;
}
static IObservable<string> FromConsole()
{
return Observable.Create<string>(async observer =>
{
while (true)
{
observer.OnNext(Console.ReadLine());
}
});
}
}
If I used Observable.Interval, it subscribes two times and I have two outputs for one input. If I used any version of FromConsole, I have one subscription and a blocked thread.
To start with, it is usually best to avoid using Observable.Create to create observables - it's certainly there for that purpose, but it can create observables that don't behave like you think they should because of their blocking nature. As you've discovered!
Instead, when possible, use the built-in operators to create observables. And that can be done in this case.
My version of FromConsole is this:
static IObservable<string> FromConsole() =>
Observable
.Defer(() =>
Observable
.Start(() => Console.ReadLine()))
.Repeat();
Observable.Start effectively is like Task.Run for observables. It calls Console.ReadLine() for us without blocking.
The Observable.Defer/Repeat pair repeatedly calls Observable.Start(() => Console.ReadLine()). Without the Defer it would just call Observable.Start and repeatedly return the one string forever.
That solves that.
Now, the second issue is that you want to see the value from the Console.ReadLine() output by both subscriptions to the FromConsole() observable.
Due to the way Console.ReadLine works, you are getting values from each subscription, but only one at a time. Try this code:
static async Task Main(string[] args)
{
var observable = FromConsole();
observable.Select(x => $"1:{x}").Subscribe(Console.WriteLine);
observable.Select(x => $"2:{x}").Subscribe(Console.WriteLine);
await new TaskCompletionSource<int>().Task;
}
static IObservable<string> FromConsole() =>
Observable
.Defer(() =>
Observable
.Start(() => Console.ReadLine()))
.Repeat();
When I run that I get this kind of output:
1:ddfd
2:dfff
1:dfsdfs
2:sdffdfd
1:sdfsdfsdf
The reason for this is that each subscription starts up a fresh subscription to FromConsole. So you have two pending calls to Console.ReadLine(); they effectively queue, and each one gets every other input. Hence the alternation between 1 and 2.
So, to solve this you simply need the .Publish().RefCount() operator pair.
Try this:
static async Task Main(string[] args)
{
var observable = FromConsole().Publish().RefCount();
observable.Select(x => $"1:{x}").Subscribe(Console.WriteLine);
observable.Select(x => $"2:{x}").Subscribe(Console.WriteLine);
await new TaskCompletionSource<int>().Task;
}
static IObservable<string> FromConsole() =>
Observable
.Defer(() =>
Observable
.Start(() => Console.ReadLine()))
.Repeat();
I now get:
1:Hello
2:Hello
1:World
2:World
In a nutshell, it's the combination of the non-blocking FromConsole observable and the use of .Publish().RefCount() that makes this work the way you expect.
The problem is that the Console.ReadLine is a blocking method, so the subscription to the FromConsole sequence blocks indefinitely, so the await Task.Delay(1500); line is never reached. You can solve this problem by reading from the console asynchronously, offloading the blocking call to a ThreadPool thread:
static IObservable<string> FromConsole()
{
return Observable.Create<string>(async observer =>
{
while (true)
{
observer.OnNext(await Task.Run(() => Console.ReadLine()));
}
});
}
You can take a look at this question about why there is no better solution than offloading.
As a side note, subscribing to a sequence without providing an onError handler is not a good idea, unless having the process crash with an unhandled exception is an acceptable behavior for your app. It is especially problematic with sequences produced with Observable.Create<T>(async, because it can lead to weird/buggy behavior like this one: Async Create hanging while publishing observable.
You need to return an observable without the Publish. You can then subscribe to it and take it from there. Here is an example. When I run it, I can read lines multiple times.
public class Program
{
static void Main(string[] args)
{
FromConsole().Subscribe(x =>
{
Console.WriteLine(x);
});
}
static IObservable<string> FromConsole()
{
return Observable.Create<string>(async observer =>
{
while (true)
{
observer.OnNext(Console.ReadLine());
}
});
}
}
I have a fairly simple producer-consumer pattern where (simplified) I have two producers who produce output that is to be consumed by one consumer.
For this I use System.Threading.Tasks.Dataflow.BufferBlock<T>
A BufferBlock object is created. One Consumer is listening to this BufferBlock, and processes any received input.
Two Producers send data to the BufferBlock simultaneously.
Simplified:
BufferBlock<int> bufferBlock = new BufferBlock<int>();
async Task Consume()
{
while (await bufferBlock.OutputAvailableAsync())
{
int dataToProcess = await bufferBlock.ReceiveAsync();
Process(dataToProcess);
}
}
async Task Produce1()
{
IEnumerable<int> numbersToProcess = ...;
foreach (int numberToProcess in numbersToProcess)
{
await bufferBlock.SendAsync(numberToProcess);
// ignore result for this example
}
}
async Task Produce2()
{
IEnumerable<int> numbersToProcess = ...;
foreach (int numberToProcess in numbersToProcess)
{
await bufferBlock.SendAsync(numberToProcess);
// ignore result for this example
}
}
I'd like to start the Consumer first and then start the Producers as separate tasks:
var taskConsumer = Consume(); // do not await yet
var taskProduce1 = Task.Run( () => Produce1());
var taskProduce2 = Task.Run( () => Produce2());
// await until both producers are finished:
await Task.WhenAll(new Task[] {taskProduce1, taskProduce2});
bufferBlock.Complete(); // signal that no more data is expected in bufferBlock
// await for the Consumer to finish:
await taskConsumer;
At first glance, this is exactly how the producer-consumer was meant: several producers produce data while a consumer is consuming the produced data.
Yet the BufferBlock documentation says this about thread safety:
Any instance members are not guaranteed to be thread safe.
And I thought that the P in TPL meant Parallel!
Should I worry? Is my code not thread safe?
Is there a different TPL Dataflow class that I should use?
Yes, the BufferBlock class is thread safe. I can't back this claim by pointing to an official document, because the "Thread Safety" section has been removed from the documentation. But I can see in the source that the class contains a lock object for synchronizing the incoming messages:
/// <summary>Gets the lock object used to synchronize incoming requests.</summary>
private object IncomingLock { get { return _source; } }
When the Post extension method is called (source code), the explicitly implemented ITargetBlock.OfferMessage method is invoked (source code). Below is an excerpt of this method:
DataflowMessageStatus ITargetBlock<T>.OfferMessage(DataflowMessageHeader messageHeader,
T messageValue, ISourceBlock<T> source, bool consumeToAccept)
{
//...
lock (IncomingLock)
{
//...
_source.AddMessage(messageValue);
//...
}
}
It would be strange indeed if this class, or any other XxxBlock class included in the TPL Dataflow library, was not thread-safe. It would severely hamper the ease of use of this great library.
I think an ActionBlock<T> would better suit what you're doing, since it has a built-in buffer that many producers can send data to. The default block options process the data on a single background task, but you can set a new value for parallelism and bounded capacity. With ActionBlock<T>, the main area of concern for thread safety is the delegate you pass to process each message. That function has to handle each message independently, i.e. without modifying shared state, just like with any Parallel... function.
public class ProducerConsumer
{
private ActionBlock<int> Consumer { get; }
public ProducerConsumer()
{
Consumer = new ActionBlock<int>(x => Process(x));
}
public async Task Start()
{
var producer1Tasks = Producer1();
var producer2Tasks = Producer2();
await Task.WhenAll(producer1Tasks.Concat(producer2Tasks));
Consumer.Complete();
await Consumer.Completion;
}
private void Process(int data)
{
// process
}
private IEnumerable<Task> Producer1() => Enumerable.Range(0, 100).Select(x => Consumer.SendAsync(x));
private IEnumerable<Task> Producer2() => Enumerable.Range(0, 100).Select(x => Consumer.SendAsync(x));
}
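As a hedged illustration of the options mentioned above, the following sketch sets both parallelism and bounded capacity through ExecutionDataflowBlockOptions. The class name, the values 4 and 16, and the counter are arbitrary choices for this example, not part of the original answer.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ActionBlockOptionsDemo
{
    public static async Task<int> RunAsync()
    {
        int processed = 0;

        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 4, // up to four messages processed at once
            BoundedCapacity = 16        // SendAsync waits while the block is full
        };

        // The delegate may now run concurrently, so it only touches
        // shared state through Interlocked.
        var consumer = new ActionBlock<int>(x =>
        {
            Interlocked.Increment(ref processed);
        }, options);

        async Task ProduceAsync(int start)
        {
            for (int i = start; i < start + 100; i++)
            {
                await consumer.SendAsync(i);
            }
        }

        // Two producers send to the same block from separate tasks.
        await Task.WhenAll(
            Task.Run(() => ProduceAsync(0)),
            Task.Run(() => ProduceAsync(100)));

        consumer.Complete();
        await consumer.Completion;
        return processed;
    }
}
```

With a bounded capacity, awaiting SendAsync is what slows the producers down when the consumer falls behind.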
I have code which streams data down from SQL and writes it to a different store. The code is approximately this:
using (var cmd = new SqlCommand("select * from MyTable", connection))
{
using (var reader = await cmd.ExecuteReaderAsync())
{
var list = new List<MyData>();
while (await reader.ReadAsync())
{
var row = GetRow(reader);
list.Add(row);
if (list.Count == BatchSize)
{
await WriteDataAsync(list);
list.Clear();
}
}
if (list.Count > 0)
{
await WriteDataAsync(list);
}
}
}
I would like to use Reactive extensions for this purpose instead. Ideally the code would look like this:
await StreamDataFromSql()
.Buffer(BatchSize)
.ForEachAsync(async batch => await WriteDataAsync(batch));
However, it seems that the extension method ForEachAsync only accepts synchronous actions. Would it be possible to write an extension which would accept an async action?
Would it be possible to write an extension which would accept an async action?
Not directly.
Rx subscriptions are necessarily synchronous because Rx is a push-based system. When a data item arrives, it travels through your query until it hits the final subscription - which in this case is to execute an Action.
The await-able methods provided by Rx are awaiting the sequence itself - i.e., ForEachAsync is asynchronous in terms of the sequence (you are asynchronously waiting for the sequence to complete), but the subscription within ForEachAsync (the action taken for each element) must still be synchronous.
In order to do a sync-to-async transition in your data pipeline, you'll need to have a buffer. An Rx subscription can (synchronously) add to the buffer as a producer while an asynchronous consumer is retrieving items and processing them. So, you'd need a producer/consumer queue that supports both synchronous and asynchronous operations.
The various block types in TPL Dataflow can satisfy this need. Something like this should suffice:
var obs = StreamDataFromSql().Buffer(BatchSize);
var buffer = new ActionBlock<IList<MyData>>(batch => WriteDataAsync(batch));
using (var subscription = obs.Subscribe(buffer.AsObserver()))
await buffer.Completion;
Note that there is no backpressure; as quickly as StreamDataFromSql can push data, it'll be buffered and stored in the incoming queue of the ActionBlock. Depending on the size and type of data, this can quickly use a lot of memory.
The correct thing to do is to use Reactive Extensions properly to get this done - so start from the point that you create the connection right up until you write your data.
Here's how:
IObservable<IList<MyData>> query =
Observable
.Using(() => new SqlConnection(""), connection =>
Observable
.Using(() => new SqlCommand("select * from MyTable", connection), cmd =>
Observable
.Using(() => cmd.ExecuteReader(), reader =>
Observable
// Defer so that GetRow runs after each successful Read, not once up front
.While(() => reader.Read(), Observable.Defer(() => Observable.Return(GetRow(reader)))))))
.Buffer(BatchSize);
IDisposable subscription =
query
.Subscribe(async list => await WriteDataAsync(list));
I couldn't test the code, but it should work. This code assumes that WriteDataAsync can take an IList<MyData> too. If it can't, just drop in a .ToList().
Here is a version of the ForEachAsync method that supports asynchronous actions. It projects the source observable to a nested IObservable<IObservable<Unit>> containing the asynchronous actions, and then flattens it back to an IObservable<Unit> using the Merge operator. The resulting observable is finally converted to a task.
By default the actions are invoked sequentially, but it is possible to invoke them concurrently by configuring the optional maximumConcurrency argument.
Canceling the optional cancellationToken argument results in the immediate completion (cancellation) of the returned Task, potentially before the cancellation of the currently running actions.
Any exception that may occur is propagated through the Task, and causes the cancellation of all currently running actions.
/// <summary>
/// Invokes an asynchronous action for each element in the observable sequence,
/// and returns a 'Task' that represents the completion of the sequence and
/// all the asynchronous actions.
/// </summary>
public static Task ForEachAsync<TSource>(
this IObservable<TSource> source,
Func<TSource, CancellationToken, Task> action,
CancellationToken cancellationToken = default,
int maximumConcurrency = 1)
{
// Arguments validation omitted
return source
.Select(item => Observable.FromAsync(ct => action(item, ct)))
.Merge(maximumConcurrency)
.DefaultIfEmpty()
.ToTask(cancellationToken);
}
Usage example:
await StreamDataFromSql()
.Buffer(BatchSize)
.ForEachAsync(async (batch, token) => await WriteDataAsync(batch, token));
Here is the source code for ForEachAsync and an article on the ToEnumerable and AsObservable method
We can make a wrapper around the ForEachAsync that will await a Task-returning function:
public static async Task ForEachAsync<T>( this IObservable<T> t, Func<T, Task> onNext )
{
foreach ( var x in t.ToEnumerable() )
await onNext( x );
}
Example usage:
await ForEachAsync( Observable.Range(0, 10), async x => await Task.FromResult( x ) );
I have the following code:
IObservable<Data> _source;
...
_source.Subscribe(StoreToDatabase);
private async Task StoreToDatabase(Data data) {
await dbstuff(data);
}
However, this does not compile. Is there any way to observe data asynchronously? I tried async void and it works, but I feel that it is not a sound solution.
I also checked Reactive Extensions Subscribe calling await, but it does not answer my question (I do not care about the SelectMany result.)
You don't have to care about the SelectMany result. The answer is still the same... though you need your task to have a return type (i.e. Task<T>, not Task).
Unit is essentially equivalent to void, so you can use that:
_source.SelectMany(StoreToDatabase).Subscribe();
private async Task<Unit> StoreToDatabase(Data data)
{
await dbstuff(data);
return Unit.Default;
}
This SelectMany overload accepts a Func<TSource, Task<TResult>>, meaning the resulting sequence will not complete until the task is completed.
Late answer, but I think that the following extension methods correctly encapsulate what Charles Mager proposed in his answer:
public static IDisposable SubscribeAsync<T>(this IObservable<T> source,
Func<Task> asyncAction, Action<Exception> handler = null)
{
Func<T,Task<Unit>> wrapped = async t =>
{
await asyncAction();
return Unit.Default;
};
if(handler == null)
return source.SelectMany(wrapped).Subscribe(_ => { });
else
return source.SelectMany(wrapped).Subscribe(_ => { }, handler);
}
public static IDisposable SubscribeAsync<T>(this IObservable<T> source,
Func<T,Task> asyncAction, Action<Exception> handler = null)
{
Func<T, Task<Unit>> wrapped = async t =>
{
await asyncAction(t);
return Unit.Default;
};
if(handler == null)
return source.SelectMany(wrapped).Subscribe(_ => { });
else
return source.SelectMany(wrapped).Subscribe(_ => { }, handler);
}
I've been using TPL DataFlow to control back pressure and have used it to solve this problem.
The key part is ITargetBlock<TInput>.AsObserver() - source.
// Set a block to handle each element
ITargetBlock<long> targetBlock = new ActionBlock<long>(async p =>
{
Console.WriteLine($"Received {p}");
await Task.Delay(1000);
Console.WriteLine($"Finished handling {p}");
},
new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });
// Generate an item each second for 10 seconds
var sequence = Observable.Interval(TimeSpan.FromSeconds(1)).Take(10);
// Subscribe with an observer created from the target block.
sequence.Subscribe(targetBlock.AsObserver());
// Await completion of the block
await targetBlock.Completion;
The important part here is that the ActionBlock's bounded capacity is set to 1. This prevents the block from receiving more than one item at a time and will block OnNext if an item is already being processed!
My big surprise here was that it can be safe to call Task.Wait and Task.Result inside your subscription. Obviously, if you have called ObserveOnDispatcher() or similar, you will probably hit deadlocks. Be careful!
So you want to run the store-data procedure, possibly some other procedure, and asynchronously await the completion or a partial result. How about the Observable.Create overload shown here:
IObservable<Int32> NotifyOfStoringProgress =
Observable.Create(
(Func<IObserver<Int32>, Task>)
(async (ObserverToFeed) =>
{
ObserverToFeed.OnNext(-1);
Task StoreToDataBase = Task.Run(()=> {;});
ObserverToFeed.OnNext(0);
;;
await StoreToDataBase;
ObserverToFeed.OnNext(1);
;;
}));
NotifyOfStoringProgress.Subscribe(onNext: Notification => {;});
Here is my Interval definition:
m_interval = Observable.Interval(TimeSpan.FromSeconds(5), m_schedulerProvider.EventLoop)
.ObserveOn(m_schedulerProvider.EventLoop)
.Select(l => Observable.FromAsync(DoWork))
.Concat()
.Subscribe();
In the code above, I feed the IScheduler in both Interval & ObserveOn from a SchedulerProvider so that I can unit test faster (TestScheduler.AdvanceBy). Also, DoWork is an async method.
In my particular case, I want the DoWork function to be called every 5 seconds. The issue here is that I want the 5 seconds to be the time between the end of DoWork and the start of the other. So if DoWork takes more than 5 seconds to execute, let's say 10 seconds, the first call would be at 5 seconds and the second call at 15 seconds.
Unfortunately, the following test proves it does not behave like that:
[Fact]
public void MultiPluginStatusHelperShouldWaitForNextQuery()
{
m_queryHelperMock
.Setup(x => x.CustomQueryAsync())
.Callback(() => Thread.Sleep(10000))
.Returns(Task.FromResult(new QueryCompletedEventData()))
.Verifiable()
;
var multiPluginStatusHelper = m_container.GetInstance<IMultiPluginStatusHelper>();
multiPluginStatusHelper.MillisecondsInterval = 5000;
m_testSchedulerProvider.EventLoopScheduler.AdvanceBy(TimeSpan.FromMilliseconds(5000).Ticks);
m_testSchedulerProvider.EventLoopScheduler.AdvanceBy(TimeSpan.FromMilliseconds(5000).Ticks);
m_queryHelperMock.Verify(x => x.CustomQueryAsync(), Times.Once);
}
DoWork calls CustomQueryAsync, and the test fails saying that it was called twice. It should only be called once because of the delay forced with .Callback(() => Thread.Sleep(10000)).
What am I doing wrong here?
My actual implementation comes from this example.
This problem comes up a lot, usually when polling some non-observable data source. When I come across it, I use a RepeatAfterDelay operator I wrote a while back:
public static IObservable<T> RepeatAfterDelay<T>(this IObservable<T> source, TimeSpan delay, IScheduler scheduler)
{
var repeatSignal = Observable
.Empty<T>()
.Delay(delay, scheduler);
// when source finishes, wait for the specified
// delay, then repeat.
return source.Concat(repeatSignal).Repeat();
}
And this is how I use it:
// do first set of work immediately, and then every 5 seconds do it again
m_interval = Observable
.FromAsync(DoWork)
.RepeatAfterDelay(TimeSpan.FromSeconds(5), scheduler)
.Subscribe();
// wait 5 seconds, then do first set of work, then again every 5 seconds
m_interval = Observable
.Timer(TimeSpan.FromSeconds(5), scheduler)
.SelectMany(_ => Observable
.FromAsync(DoWork)
.RepeatAfterDelay(TimeSpan.FromSeconds(5), scheduler))
.Subscribe();
Your problem is that your code is mixing lazy (Observable) and non-lazy (Task) constructs. While your first Task is executing the Interval will fire again and create a new task in the Select operator. If you want to avoid this behavior you need to wrap your Observable into a Defer block:
m_interval = Observable.Interval(TimeSpan.FromSeconds(5), m_schedulerProvider.EventLoop)
.ObserveOn(m_schedulerProvider.EventLoop)
//I think `Defer` implicitly wraps Tasks, if not wrap it in `FromAsync` Again
.Select(l => Observable.Defer(() => DoWork()))
.Concat()
.Subscribe();
The result of this is that each Observable will only execute the deferred Task when it is subscribed to, i.e. when the previous completes.
Notably, this does have a problem: if your producer is producing much faster than you can consume, items will begin to pile up and eat your memory. As an alternative I would propose using this GenerateAsync implementation:
public static IObservable<TOut> GenerateAsync<TResult, TOut>(
Func<Task<TResult>> initialState,
Func<TResult, bool> condition,
Func<TResult, Task<TResult>> iterate,
Func<TResult, TimeSpan> timeSelector,
Func<TResult, TOut> resultSelector,
IScheduler scheduler = null)
{
var s = scheduler ?? Scheduler.Default;
return Observable.Create<TOut>(async obs => {
//You have to do your initial time delay here.
var init = await initialState();
return s.Schedule(init, timeSelector(init), async (state, recurse) =>
{
//Check if we are done
if (!condition(state))
{
obs.OnCompleted();
return;
}
//Process the result
obs.OnNext(resultSelector(state));
//Initiate the next request
state = await iterate(state);
//Recursively schedule again
recurse(state, timeSelector(state));
});
});
}
GenerateAsync(DoWork /*Initial state*/,
_ => true /*Forever*/,
_ => DoWork() /*Do your async task*/,
_ => TimeSpan.FromSeconds(5) /*Delay between events*/,
_ => _ /*Any transformations*/,
scheduler)
.Subscribe();
The above removes the issue of producer/consumer races, by not scheduling the next event until after the first one is done.
While @Brandon's solution is nice and clean, I discovered that it blocks a thread to wait for the delay timer. A non-blocking alternative can look something like this:
public static IObservable<T> DelayRepeat<T>(this IObservable<T> source, TimeSpan delay) =>
source
.Concat(
Observable.Create<T>(async observer =>
{
await Task.Delay(delay);
observer.OnCompleted();
}))
.Repeat();