What are the functional benefits of recursive scheduling in System.Reactive? - c#

I'm currently reading http://www.introtorx.com/ and I'm getting really interested in stripping Subject<T> out of my reactive code. I'm starting to understand how to encapsulate sequence generation so that I can reason better about a given sequence. I read a few SO questions and ended up reading about scheduling. Of particular interest is recursive scheduling, using the Schedule(this IScheduler scheduler, Action<TState,Action<TState>>) overloads - like this one.
The book is starting to show its age in a few areas, and the biggest one I see is that it never compares its techniques to alternatives that can be achieved with Task and the async/await language features. I always end up feeling like I could write less code by ignoring the book's advice and using the asynchronous toys, but the back of my mind nags me about being lazy and not learning the pattern properly.
With that, here is my question. If I wanted to schedule a sequence at an interval, support cancellation and perform work on a background thread, I might do this:
static void Main(string[] args)
{
    var sequence = Observable.Create<object>(o =>
    {
        CancellationTokenSource cancellationTokenSource = new CancellationTokenSource();
        DoWerk(o, cancellationTokenSource);
        return cancellationTokenSource.Cancel;
    });
    sequence.Subscribe(p => Console.Write(p));
    Console.ReadLine();
}
private static async void DoWerk(IObserver<object> o, CancellationTokenSource cancellationTokenSource)
{
    string message = "hello world!";
    for (int i = 0; i < message.Length; i++)
    {
        await Task.Delay(250, cancellationTokenSource.Token);
        o.OnNext(message[i]);
        if (cancellationTokenSource.IsCancellationRequested)
        {
            break;
        }
    }
    o.OnCompleted();
}
Note the use of async void to create concurrency without explicitly borrowing a thread pool thread with Task.Run(). await Task.Delay() will, however, do just that but it will not lease the thread for long.
What are the limitations and pitfalls here? What are the reasons that you might prefer to use recursive scheduling?

I personally wouldn't use await Task.Delay(250, cancellationTokenSource.Token); as a way to slow down a loop. It's better than Thread.Sleep(250), but it's still a code smell to me.
My view is that you should use a built-in operator in preference to a roll-your-own solution like this.
The operator you need is one of the most powerful, but often overlooked: Observable.Generate. Here's how:
static void Main(string[] args)
{
    IObservable<char> sequence = Observable.Create<char>(o =>
    {
        string message = "hello world!";
        return
            Observable
                .Generate(
                    0,
                    n => n < message.Length,
                    n => n + 1,
                    n => message[n],
                    n => TimeSpan.FromMilliseconds(250.0))
                .Subscribe(o);
    });
    using (sequence.Subscribe(p => Console.Write(p)))
    {
        Console.ReadLine();
    }
}
This is self-cancelling (when you call .Dispose() on the subscription) and produces values every 250.0 milliseconds.
I've continued to use the Observable.Create operator to ensure that the message variable is encapsulated within the observable - otherwise it is possible for someone to change the value of message as the observable is working with it and thus break it.
As an alternative that might not be as memory-efficient, but is self-encapsulating, try this:
IObservable<char> sequence =
    Observable
        .Generate(
            "hello world!",
            n => !String.IsNullOrEmpty(n),
            n => n.Substring(1),
            n => n[0],
            n => TimeSpan.FromMilliseconds(250.0));
And, finally, there's nothing "recursive" about the scheduling in your question. What did you mean by that?
I finally figured out what you're looking at. I missed it in the question.
Here's an example using the recursive scheduling:
IObservable<char> sequence = Observable.Create<char>(o =>
{
    string message = "hello world!";
    return Scheduler.Default.Schedule<string>(message, TimeSpan.FromMilliseconds(250.0), (state, schedule) =>
    {
        if (!String.IsNullOrEmpty(state))
        {
            o.OnNext(state[0]);
            schedule(state.Substring(1), TimeSpan.FromMilliseconds(250.0));
        }
        else
        {
            o.OnCompleted();
        }
    });
});
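One functional benefit of recursive scheduling over an await Task.Delay loop is that every delay goes through an IScheduler, so the scheduler can be swapped. The sketch below is a hypothetical variation of the code above that takes the scheduler as a parameter (Hello, RunVirtual and Step are illustrative names) and drives it with Rx's HistoricalScheduler, so the twelve 250 ms steps run instantly in virtual time, which can be handy in unit tests.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Concurrency;
using System.Reactive.Linq;

public static class RecursiveDemo
{
    static readonly TimeSpan Step = TimeSpan.FromMilliseconds(250);

    // Same recursive schedule as the answer above, but with the scheduler injected.
    public static IObservable<char> Hello(string message, IScheduler scheduler) =>
        Observable.Create<char>(o =>
            scheduler.Schedule<string>(message, Step, (state, self) =>
            {
                if (!string.IsNullOrEmpty(state))
                {
                    o.OnNext(state[0]);
                    self(state.Substring(1), Step); // reschedule the rest of the string
                }
                else
                {
                    o.OnCompleted();
                }
            }));

    // Advances virtual time instead of waiting on the real clock.
    public static (string Text, bool Completed) RunVirtual(string message, TimeSpan advance)
    {
        var scheduler = new HistoricalScheduler();
        var received = new List<char>();
        bool completed = false;
        using (Hello(message, scheduler).Subscribe(received.Add, () => completed = true))
        {
            scheduler.AdvanceBy(advance); // runs every item that falls due within `advance`
        }
        return (new string(received.ToArray()), completed);
    }
}
```

With a Task.Delay loop the same test would have to wait roughly three real seconds.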


How to make an IObservable<string> from console input

I have tried to write a console observable as in the example below, but it doesn't work: there are some issues with the subscriptions. How can I solve them?
static class Program
{
    static async Task Main(string[] args)
    {
        // var observable = Observable.Interval(TimeSpan.FromMilliseconds(1000)).Publish().RefCount(); // works
        // var observable = FromConsole().Publish().RefCount(); // doesn't work
        var observable = FromConsole(); // doesn't work
        observable.Subscribe(Console.WriteLine);
        await Task.Delay(1500);
        observable.Subscribe(Console.WriteLine);
        await new TaskCompletionSource().Task;
    }
    static IObservable<string> FromConsole()
    {
        return Observable.Create<string>(async observer =>
        {
            while (true)
            {
                observer.OnNext(Console.ReadLine());
            }
        });
    }
}
If I use Observable.Interval, it subscribes twice and I get two outputs for each value. If I use any version of FromConsole, I get only one subscription and a blocked thread.
To start with, it is usually best to avoid using Observable.Create to create observables. It's certainly there for that purpose, but it can produce observables that don't behave the way you expect because of their blocking nature, as you've discovered!
Instead, when possible, use the built-in operators to create observables. And that can be done in this case.
My version of FromConsole is this:
static IObservable<string> FromConsole() =>
    Observable
        .Defer(() =>
            Observable
                .Start(() => Console.ReadLine()))
        .Repeat();
Observable.Start is effectively Task.Run for observables: it calls Console.ReadLine() for us on a background thread, so subscribing doesn't block.
The Observable.Defer/Repeat pair repeatedly calls Observable.Start(() => Console.ReadLine()). Without the Defer, Observable.Start would be called only once and Repeat would keep returning that one string forever.
That solves that.
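The difference the Defer makes can be checked by counting how often the work function runs. This is a small hypothetical harness (DeferDemo and CountInvocations are illustrative names) that uses an integer counter in place of Console.ReadLine:

```csharp
using System;
using System.Reactive.Linq;
using System.Threading;

public static class DeferDemo
{
    // Returns how many times the work function ran while `take` values were produced.
    public static int CountInvocations(bool useDefer, int take)
    {
        int calls = 0;
        Func<int> work = () => Interlocked.Increment(ref calls);

        IObservable<int> source = useDefer
            // A fresh Observable.Start, and therefore a fresh call to work, per repetition.
            ? Observable.Defer(() => Observable.Start(work)).Repeat()
            // Observable.Start runs work once right here; Repeat replays that single cached result.
            : Observable.Start(work).Repeat();

        source.Take(take).Wait(); // block until `take` values have arrived
        return Volatile.Read(ref calls);
    }
}
```

Without Defer the count stays at 1 no matter how many values are taken, which is exactly the "same string forever" behaviour.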
Now, the second issue is that you want to see the value from the Console.ReadLine() output by both subscriptions to the FromConsole() observable.
Due to the way Console.ReadLine works, you are getting values from each subscription, but only one at a time. Try this code:
static async Task Main(string[] args)
{
    var observable = FromConsole();
    observable.Select(x => $"1:{x}").Subscribe(Console.WriteLine);
    observable.Select(x => $"2:{x}").Subscribe(Console.WriteLine);
    await new TaskCompletionSource<int>().Task;
}
static IObservable<string> FromConsole() =>
    Observable
        .Defer(() =>
            Observable
                .Start(() => Console.ReadLine()))
        .Repeat();
When I run that I get this kind of output:
1:ddfd
2:dfff
1:dfsdfs
2:sdffdfd
1:sdfsdfsdf
The reason for this is that each subscription starts a fresh subscription to FromConsole, so there are two pending calls to Console.ReadLine(). They effectively queue, and each one gets only every other line of input, hence the alternation between 1 and 2.
So, to solve this you simply need the .Publish().RefCount() operator pair.
Try this:
static async Task Main(string[] args)
{
    var observable = FromConsole().Publish().RefCount();
    observable.Select(x => $"1:{x}").Subscribe(Console.WriteLine);
    observable.Select(x => $"2:{x}").Subscribe(Console.WriteLine);
    await new TaskCompletionSource<int>().Task;
}
static IObservable<string> FromConsole() =>
    Observable
        .Defer(() =>
            Observable
                .Start(() => Console.ReadLine()))
        .Repeat();
I now get:
1:Hello
2:Hello
1:World
2:World
In a nutshell, it's the combination of the non-blocking FromConsole observable and the use of .Publish().RefCount() that makes this work the way you expect.
The problem is that Console.ReadLine is a blocking method, so subscribing to the FromConsole sequence blocks indefinitely and the await Task.Delay(1500); line is never reached. You can solve this problem by reading from the console asynchronously, offloading the blocking call to a ThreadPool thread:
static IObservable<string> FromConsole()
{
    return Observable.Create<string>(async observer =>
    {
        while (true)
        {
            observer.OnNext(await Task.Run(() => Console.ReadLine()));
        }
    });
}
You can take a look at this question about why there is no better solution than offloading.
As a side note, subscribing to a sequence without providing an onError handler is not a good idea, unless having the process crash with an unhandled exception is acceptable behavior for your app. It is especially problematic with sequences produced by the async overloads of Observable.Create<T>, because it can lead to weird or buggy behavior like this one: Async Create hanging while publishing observable.
You need to return an observable without the Publish. You can then subscribe to it and do your thing further. Here is an example; when I run it, I can enter lines multiple times.
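Putting both pieces of advice together, here is a sketch (not taken from either answer) that generalises FromConsole to any TextReader, so it completes when input ends, and subscribes with an onError handler. FromReader and Collect are hypothetical names; pass Console.In to read from the console.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reactive.Linq;
using System.Threading;

public static class LineSource
{
    // Hypothetical generalisation of FromConsole. It completes when ReadLine returns
    // null (end of input) instead of emitting nulls forever.
    public static IObservable<string> FromReader(TextReader reader) =>
        Observable
            .Defer(() => Observable.Start(() => reader.ReadLine()))
            .Repeat()
            .TakeWhile(line => line != null);

    // Subscribes with onNext, onError and onCompleted, so a failure is captured
    // instead of surfacing as an unhandled exception.
    public static (List<string> Lines, Exception Error) Collect(IObservable<string> source)
    {
        var lines = new List<string>();
        Exception error = null;
        using var done = new ManualResetEventSlim();
        using (source.Subscribe(
            line => lines.Add(line),
            ex => { error = ex; done.Set(); },
            () => done.Set()))
        {
            done.Wait();
        }
        return (lines, error);
    }
}
```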
public class Program
{
    static void Main(string[] args)
    {
        FromConsole().Subscribe(x =>
        {
            Console.WriteLine(x);
        });
    }
    static IObservable<string> FromConsole()
    {
        return Observable.Create<string>(async observer =>
        {
            while (true)
            {
                observer.OnNext(Console.ReadLine());
            }
        });
    }
}

RX terminology: Async processing in RX operator when there are frequent observable notifications

The purpose is to do some async work on a scarce resource in an RX operator, Select for example. Issues arise when observable notifications come at a rate that is faster than the time it takes for the async operation to complete.
I have actually solved the problem. My question is: what is the correct terminology for this particular kind of issue? Does it have a name? Is it backpressure? The research I have done so far indicates that this is some kind of pressure problem, but not necessarily backpressure as I understand it. The most relevant resources I found are these:
https://github.com/ReactiveX/RxJava/wiki/Backpressure-(2.0)
http://reactivex.io/documentation/operators/backpressure.html
Now to the actual code. Suppose there is a scarce resource and its consumer. In this case an exception is thrown when the resource is already in use. Please note that this code should not be changed.
public class ScarceResource
{
    private static bool inUse = false;
    public async Task<int> AccessResource()
    {
        if (inUse) throw new Exception("Resource is already in use");
        var result = await Task.Run(() =>
        {
            inUse = true;
            Random random = new Random();
            Thread.Sleep(random.Next(1, 2) * 1000);
            inUse = false;
            return random.Next(1, 10);
        });
        return result;
    }
}
public class ResourceConsumer
{
    public IObservable<int> DoWork()
    {
        var resource = new ScarceResource();
        return resource.AccessResource().ToObservable();
    }
}
Now here is the problem with a naive implementation of consuming the resource. An error is thrown because notifications arrive faster than the consumer can process them.
private static void RunIntoIssue()
{
    var numbers = Enumerable.Range(1, 10);
    var observableSequence = numbers
        .ToObservable()
        .SelectMany(n =>
        {
            Console.WriteLine("In observable: {0}", n);
            var resourceConsumer = new ResourceConsumer();
            return resourceConsumer.DoWork();
        });
    observableSequence.Subscribe(n => Console.WriteLine("In observer: {0}", n));
}
With the following code the problem is solved. I slow down processing by using a BehaviorSubject named completed in conjunction with the Zip operator. Essentially, this code takes a sequential approach instead of a parallel one.
private static void RunWithZip()
{
    var completed = new BehaviorSubject<bool>(true);
    var numbers = Enumerable.Range(1, 10);
    var observableSequence = numbers
        .ToObservable()
        .Zip(completed, (n, c) =>
        {
            Console.WriteLine("In observable: {0}, completed: {1}", n, c);
            var resourceConsumer = new ResourceConsumer();
            return resourceConsumer.DoWork();
        })
        .Switch()
        .Select(n =>
        {
            completed.OnNext(true);
            return n;
        });
    observableSequence.Subscribe(n => Console.WriteLine("In observer: {0}", n));
    Console.Read();
}
Question
Is this backpressure, and if not does it have another terminology associated?
You're basically implementing a form of locking, or a mutex. Your code can cause backpressure; it isn't really handling it.
Imagine if your source wasn't a generator function, but rather a series of data pushes arriving at a constant rate of one every millisecond. It takes you 10 milliseconds to process each one, and your code forces serial processing. This causes backpressure: Zip will queue up the unprocessed data pushes indefinitely until you run out of memory.
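For comparison, a common way to get the same one-at-a-time behaviour without a BehaviorSubject is to project each item into a lazy observable and flatten with Concat (Merge(1) behaves similarly), which subscribes to the next inner sequence only after the previous one completes. A sketch, with a hypothetical AccessAsync stand-in that records how many calls overlap:

```csharp
using System;
using System.Reactive.Linq;
using System.Threading.Tasks;

public static class SerialProcessing
{
    static readonly object Gate = new object();
    static int _inFlight, _maxInFlight;

    // Hypothetical stand-in for ScarceResource.AccessResource: tracks overlap, doubles n.
    static async Task<int> AccessAsync(int n)
    {
        lock (Gate) { _inFlight++; _maxInFlight = Math.Max(_maxInFlight, _inFlight); }
        await Task.Delay(20);
        lock (Gate) { _inFlight--; }
        return n * 2;
    }

    public static (int[] Results, int MaxConcurrent) Run(int count)
    {
        lock (Gate) { _inFlight = 0; _maxInFlight = 0; }
        int[] results = Observable.Range(1, count)
            // FromAsync is lazy: AccessAsync starts only when Concat subscribes.
            .Select(n => Observable.FromAsync(() => AccessAsync(n)))
            .Concat() // one inner sequence at a time, in source order
            .ToArray()
            .Wait();
        lock (Gate) { return (results, _maxInFlight); }
    }
}
```

This serialises access but, like the Zip version, it is not backpressure: Concat buffers the pending inner observables, so a source that is permanently faster than the consumer still grows memory without bound.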

Is it in general dubious to call Task.Factory.StartNew(async () => {}) in Subscribe?

I have a situation where I need to use a custom scheduler to run tasks (these need to be tasks), and the scheduler does not set a synchronization context (so no ObserveOn, SubscribeOn, SynchronizationContextScheduler etc., I gather). The following is how I ended up doing it. Now I wonder whether this is the best way to make asynchronous calls and await their results. Is this all right, or is there a more robust or idiomatic way?
var orleansScheduler = TaskScheduler.Current;
var someObservable = ...;
someObservable.Subscribe(i =>
{
    Task.Factory.StartNew(async () =>
    {
        return await AsynchronousOperation(i);
    }, CancellationToken.None, TaskCreationOptions.None, orleansScheduler);
});
What if awaiting wouldn't be needed?
Edit: I found a concrete and simplified example of what I'm doing here. Basically I'm using Rx in Orleans, and the above code is a bare-bones illustration of what I'm up to. I'm also interested in this situation in general, though.
The final code
It turns out this was a bit tricky in the Orleans context. I don't see how I could use ObserveOn, which would be just the thing I'd like to use. The problem is that when I use it, the Subscribe callback never gets called. The code:
var orleansScheduler = TaskScheduler.Current;
var factory = new TaskFactory(orleansScheduler);
var rxScheduler = new TaskPoolScheduler(factory);
var someObservable = ...;
someObservable
    //.ObserveOn(rxScheduler) This doesn't look useful since...
    .SelectMany(i =>
    {
        //... we need to set the custom scheduler here explicitly anyway.
        //See Async SelectMany at http://log.paulbetts.org/rx-and-await-some-notes/.
        //Doing the "shorthand" form of .SelectMany(async... would call Task.Run, which
        //in turn always runs on the .NET ThreadPool and not on the Orleans scheduler, and hence
        //the following .Subscribe wouldn't be called.
        return Task.Factory.StartNew(async () =>
        {
            //In reality this is an asynchronous grain call. Doing the "shorthand way"
            //(and optionally using ObserveOn) would get the grain called, but not the
            //following .Subscribe.
            return await AsynchronousOperation(i);
        }, CancellationToken.None, TaskCreationOptions.None, orleansScheduler).Unwrap().ToObservable();
    })
    .Subscribe(i =>
    {
        Trace.WriteLine(i);
    });
Also, a link to a related thread at Codeplex Orleans forums.
I strongly recommend against StartNew for any modern code. It does have a use case, but it's very rare.
If you have to use a custom task scheduler, I recommend using ObserveOn with a TaskPoolScheduler constructed from a TaskFactory wrapper around your scheduler. That's a mouthful, so here's the general idea:
var factory = new TaskFactory(customScheduler);
var rxScheduler = new TaskPoolScheduler(factory);
someObservable.ObserveOn(rxScheduler)...
Then you could use SelectMany to start an asynchronous operation for each event in a source stream as they arrive.
An alternative, less ideal solution is to use async void for your subscription "events". This is acceptable, but you have to watch your error handling. As a general rule, don't allow exceptions to propagate out of an async void method.
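A sketch of what "watch your error handling" means in practice: the async lambda passed to Subscribe is converted to Action<int>, so it is async void and must catch everything itself. HypotheticalOperationAsync stands in for the real asynchronous call and fails on even inputs; the CountdownEvent exists only so the example can wait for the fire-and-forget handlers.

```csharp
using System;
using System.Reactive.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncVoidSubscriber
{
    // Hypothetical operation standing in for the real async call; fails for even i.
    static async Task HypotheticalOperationAsync(int i)
    {
        await Task.Delay(10);
        if (i % 2 == 0) throw new InvalidOperationException($"item {i} failed");
    }

    public static (int Succeeded, int Failed) Run(int count)
    {
        int succeeded = 0, failed = 0;
        using var remaining = new CountdownEvent(count);

        // The async lambda becomes Action<int>, i.e. an async void handler.
        using (Observable.Range(1, count).Subscribe(async i =>
        {
            try
            {
                await HypotheticalOperationAsync(i);
                Interlocked.Increment(ref succeeded);
            }
            catch (Exception)
            {
                // Handle or log here: an exception escaping an async void method
                // without a synchronization context typically crashes the process.
                Interlocked.Increment(ref failed);
            }
            finally
            {
                remaining.Signal(); // only so this demo can wait for the handlers
            }
        }))
        {
            remaining.Wait();
        }
        return (succeeded, failed);
    }
}
```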
There is a third alternative, where you hook an observable into a TPL Dataflow block. A block like ActionBlock can specify its task scheduler, and Dataflow naturally understands asynchronous handlers. Note that by default, Dataflow blocks will throttle the processing to a single element at a time.
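A sketch of the Dataflow option, assuming the System.Threading.Tasks.Dataflow library is referenced. DataflowBlock.AsObserver adapts the block so it can be passed straight to Subscribe; the scheduler parameter stands in for the custom (for example Orleans) scheduler, and the counters are only there to show the default one-at-a-time processing.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowBridge
{
    public static async Task<(int Processed, int MaxConcurrent)> RunAsync(
        IObservable<int> source, TaskScheduler scheduler)
    {
        var gate = new object();
        int processed = 0, inFlight = 0, maxInFlight = 0;

        var block = new ActionBlock<int>(async i =>
        {
            lock (gate) { inFlight++; maxInFlight = Math.Max(maxInFlight, inFlight); }
            await Task.Delay(5); // the block awaits async handlers before taking the next item
            lock (gate) { inFlight--; processed++; }
        },
        // Handlers run on this scheduler; MaxDegreeOfParallelism defaults to 1.
        new ExecutionDataflowBlockOptions { TaskScheduler = scheduler });

        // OnCompleted from the source completes the block, which ends block.Completion.
        using (source.Subscribe(block.AsObserver()))
        {
            await block.Completion;
        }
        lock (gate) { return (processed, maxInFlight); }
    }
}
```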
Generally speaking, instead of subscribing to execute, it's better/more idiomatic to project the task parameters into the task execution and subscribe just for the results. That way you can compose with further Rx downstream.
For example, given a task with a random duration like:
static async Task<int> DoubleAsync(int i, Random random)
{
    Console.WriteLine("Started");
    await Task.Delay(TimeSpan.FromSeconds(random.Next(10) + 1));
    return i * 2;
}
Then you might do:
void Main()
{
    var random = new Random();
    // stream of task parameters
    var source = Observable.Range(1, 5);
    // project the task parameters into the task execution, collect and flatten results
    source.SelectMany(i => DoubleAsync(i, random))
        // subscribe just for results, which turn up as they are done
        // gives you flexibility to continue the rx chain here
        .Subscribe(result => Console.WriteLine(result),
            () => Console.WriteLine("All done."));
}

How to cleanup hanging tasks on C# Task API?

I have a simple function like the following:
static Task<A> Peirce<A, B>(Func<Func<A, Task<B>>, Task<A>> a)
{
    var aa = new TaskCompletionSource<A>();
    var tt = new Task<A>(() =>
        a(b =>
        {
            aa.SetResult(b);
            return new TaskCompletionSource<B>().Task;
        }).Result
    );
    tt.Start();
    return Task.WhenAny(aa.Task, tt).Result;
}
The idea is simple: for any implementation of a, it must return a Task<A> to me. For this purpose, it may or may not use its parameter (of type Func<A, Task<B>>). If it does, our callback will be called; it sets the result of aa, and then aa.Task completes. Otherwise, the result of a does not depend on its parameter, so we simply return its value. In either situation, either aa.Task or the result of a will complete, so it should never block unless a does not use its parameter and blocks, or the task returned by a blocks.
The above code works, for example
static void Main(string[] args)
{
    Func<Func<int, Task<int>>, Task<int>> t = a =>
    {
        return Task.FromResult(a(20).Result + 10);
    };
    Console.WriteLine(Peirce(t).Result); // output 20
    t = a => Task.FromResult(10);
    Console.WriteLine(Peirce(t).Result); // output 10
}
The problem here is that the two tasks aa.Task and tt must be cleaned up once the result of WhenAny has been determined; otherwise I am afraid there will be a leak of hanging tasks. I do not know how to do this. Can anyone suggest something? Or is this actually not a problem, and C# will do it for me?
P.S. The name Peirce comes from the famous Peirce's law, ((A->B)->A)->A, in propositional logic.
UPDATE: the point is not to "dispose" of the tasks but rather to stop them from running. I have tested it: when I put the Main logic in a 1000-iteration loop, it runs slowly (about 1 iteration per second) and creates a lot of threads, so it is a problem to solve.
A Task is a managed object. Unless you are introducing unmanaged resources, you shouldn't worry about a Task leaking resources. Let the GC clean it up and let the finalizer take care of the WaitHandle.
EDIT:
If you want to cancel tasks, consider using cooperative cancellation in the form of a CancellationTokenSource. You can pass this token to any tasks via the overload, and inside of each task, you may have some code as follows:
while (someCondition)
{
    if (cancelToken.IsCancellationRequested)
        break;
}
That way your tasks can gracefully clean up without throwing an exception. However, you can propagate an OperationCanceledException by calling cancelToken.ThrowIfCancellationRequested(). So the idea in your case would be that whichever task finishes first issues the cancellation to the other tasks, so that they aren't hung up doing work.
Thanks to @Bryan Crosby's answer, I can now implement the function as follows:
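A minimal BCL-only sketch of that idea: each hypothetical worker checks the shared token cooperatively, the first to finish wins, and the caller then cancels the rest and observes their cancellation. The durations stand in for real work; FirstWins, WorkAsync and RunAsync are illustrative names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FirstWins
{
    // Hypothetical worker: "works" for `duration`, checking the token cooperatively.
    static async Task<int> WorkAsync(int id, TimeSpan duration, CancellationToken token)
    {
        DateTime deadline = DateTime.UtcNow + duration;
        while (DateTime.UtcNow < deadline)
        {
            token.ThrowIfCancellationRequested(); // cooperative check
            await Task.Delay(10, token);          // the delay also observes the token
        }
        return id;
    }

    // Starts one worker per duration; returns the id of the first to finish and
    // how many workers ended up canceled.
    public static async Task<(int Winner, int Canceled)> RunAsync(params TimeSpan[] durations)
    {
        using var cts = new CancellationTokenSource();
        var workers = new Task<int>[durations.Length];
        for (int i = 0; i < durations.Length; i++)
            workers[i] = WorkAsync(i, durations[i], cts.Token);

        Task<int> first = await Task.WhenAny(workers);
        cts.Cancel(); // whoever finished first tells the rest to stop

        int canceled = 0;
        foreach (Task<int> worker in workers)
        {
            try { await worker; }
            catch (OperationCanceledException) { canceled++; }
        }
        return (await first, canceled);
    }
}
```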
private class CanceledTaskCache<A>
{
    public static Task<A> Instance;
}
private static Task<A> GetCanceledTask<A>()
{
    if (CanceledTaskCache<A>.Instance == null)
    {
        var aa = new TaskCompletionSource<A>();
        aa.SetCanceled();
        CanceledTaskCache<A>.Instance = aa.Task;
    }
    return CanceledTaskCache<A>.Instance;
}
static Task<A> Peirce<A, B>(Func<Func<A, Task<B>>, Task<A>> a)
{
    var aa = new TaskCompletionSource<A>();
    Func<A, Task<B>> cb = b =>
    {
        aa.SetResult(b);
        return GetCanceledTask<B>();
    };
    return Task.WhenAny(aa.Task, a(cb)).Unwrap();
}
and it works pretty well:
static void Main(string[] args)
{
    for (int i = 0; i < 1000; ++i)
    {
        Func<Func<int, Task<String>>, Task<int>> t =
            async a => (await a(20)).Length + 10;
        Console.WriteLine(Peirce(t).Result); // output 20
        t = async a => 10;
        Console.WriteLine(Peirce(t).Result); // output 10
    }
}
Now it is fast and doesn't consume too many resources. It can be even faster (about 70 times on my machine) if you do not use the async/await keywords:
static void Main(string[] args)
{
    for (int i = 0; i < 10000; ++i)
    {
        Func<Func<int, Task<String>>, Task<int>> t =
            a => a(20).ContinueWith(ta =>
                ta.IsCanceled ? GetCanceledTask<int>() :
                Task.FromResult(ta.Result.Length + 10)).Unwrap();
        Console.WriteLine(Peirce(t).Result); // output 20
        t = a => Task.FromResult(10);
        Console.WriteLine(Peirce(t).Result); // output 10
    }
}
The issue here is that even if you can detect the return value of a(20), there is no way to cancel the async block other than throwing an OperationCanceledException, and that prevents WhenAny from being optimized.
UPDATE: optimised code and compared async/await and native Task API.
UPDATE: If I could write the following code, it would be ideal:
static Task<A> Peirce<A, B>(Func<Func<A, Task<B>>, Task<A>> a)
{
    var aa = new TaskCompletionSource<A>();
    return await? a(async b => {
        aa.SetResult(b);
        await break;
    }) : await aa.Task;
}
Here, await? a : b has the value of a's result if a succeeds, and the value of b if a is cancelled (like a ? b : c; a's result must have the same type as b).
await break will cancel the current async block.
As Stephen Toub of MS Parallel Programming Team says: "No. Don't bother disposing of your tasks."
tldr: In most cases, disposing of a task does nothing, and when the task actually has allocated unmanaged resources, its finalizer will release them when the task object is collected.

Lock a single access variable for parallel threads in C#

Hello, I have this code:
var queue = new BlockingCollection<int>();
queue.Add(0);
var producers = Enumerable.Range(1, 3)
    .Select(_ => Task.Factory.StartNew(() =>
    {
        Enumerable.Range(1, queue.Count)
            .ToList().ForEach(i =>
            {
                lock (queue)
                {
                    if (!queue.Contains(i))
                    {
                        Console.WriteLine("Thread" + Task.CurrentId.ToString());
                        queue.Add(i);
                    }
                }
                Thread.Sleep(100);
            });
    }))
    .ToArray();
Task.WaitAll(producers);
queue.CompleteAdding();
foreach (var item in queue.GetConsumingEnumerable())
{
    Console.WriteLine(item.ToString());
}
But each time a thread adds something with queue.Add(i), I need Enumerable.Range(1, queue.Count) to be increased, so that the code keeps executing until there are no more items to be added to the queue. I hope you understand the question.
In other words, I need this action to run indefinitely until I tell it to stop.
Any suggestions?
I'm sorry to say it, but I can't understand your motives for writing something like that without further explanation :(
Is the following code useful to you in any way? Because I don't think it is :P
int n = 2;
Task[] producers = Enumerable.Range(1, 3).Select(_ =>
    Task.Factory.StartNew(() =>
    {
        while (queue.Count < n)
        {
            lock (queue)
            {
                if (!queue.Contains(n))
                {
                    Console.WriteLine("Thread" + Task.CurrentId);
                    queue.Add(n);
                    Interlocked.Increment(ref n);
                }
            }
            Thread.Sleep(100);
        }
    }))
    .ToArray();
I mean, it will just go on and on. It's like a reeeeeeaaallllyyy strange way of just adding numbers to a List.
Please explain your objective and we might be able to help you.
I see. What you need is a BlockingCollection, which came with .NET 4.0.
It allows to implement the Producer-Consumer pattern.
Multiple threads or tasks can add items to the collection concurrently. Multiple consumers can remove items concurrently, and if the collection becomes empty, the consuming threads will block and wait until a producer adds an item. Over and over again ...
... until a special method will be called by producer to identify the end, saying consumer "Hey, stop waiting there - nothing will come anymore!".
I am not posting code samples, because there are some under the given link. You can find many more if you just google for the Producer-Consumer pattern and/or BlockingCollection.
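For completeness, a minimal sketch of the Producer-Consumer pattern described above (ProducerConsumer and Run are illustrative names). The bounded capacity is an optional extra that makes producers block when the consumer falls behind.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ProducerConsumer
{
    // Several producers add items concurrently; one consumer takes them until
    // the producers signal that nothing more will come.
    public static List<int> Run(int producerCount, int itemsPerProducer)
    {
        // boundedCapacity is optional: Add blocks while 10 items are waiting.
        using var queue = new BlockingCollection<int>(boundedCapacity: 10);
        var consumed = new List<int>();

        Task consumer = Task.Run(() =>
        {
            // Blocks while the queue is empty; the loop ends after CompleteAdding
            // once every remaining item has been taken.
            foreach (int item in queue.GetConsumingEnumerable())
                consumed.Add(item);
        });

        Task[] producers = Enumerable.Range(0, producerCount)
            .Select(p => Task.Run(() =>
            {
                for (int i = 0; i < itemsPerProducer; i++)
                    queue.Add(p * itemsPerProducer + i); // a unique value per item
            }))
            .ToArray();

        Task.WaitAll(producers);
        queue.CompleteAdding(); // "stop waiting there - nothing will come anymore"
        consumer.Wait();
        return consumed;
    }
}
```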
