Task.WhenAny acting like Task.WhenAll - c#

I wrote a little program to test using BufferBlock (System.Threading.Tasks.Dataflow) to implement a dual-priority consumer-producer queue.
The consumer should always use any items from the high-priority queue first.
In this initial test, I have the producer running at a much slower rate than the consumer, so the data should just come out in the same order it went in, regardless of priority.
However, I find that the result of Task.WhenAny() is not completing until there is something in both queues (or there's a completion), thus acting like Task.WhenAll().
I thought I understood async/await, and I've perused Cleary's "Concurrency in C# Cookbook." However, there is something going on that I'm not understanding.
Any ideas?
Code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // add nuget package, 4.8.0
using static System.Console;
namespace DualBufferBlockExample { // .Net Framework 4.6.1
class Program {
private static async Task Produce(BufferBlock<int> queueLo, BufferBlock<int> queueHi, IEnumerable<int> values) {
await Task.Delay(10);
foreach(var value in values) {
if(value == 3 || value == 7)
await queueHi.SendAsync(value);
else
await queueLo.SendAsync(value);
WriteLine($"Produced {value} qL.Cnt={queueLo.Count} qH.Cnt={queueHi.Count}");
await Task.Delay(1000); // production lag
}
queueLo.Complete();
queueHi.Complete();
}
private static async Task<IEnumerable<int>> Consume(BufferBlock<int> queueLo, BufferBlock<int> queueHi) {
var results = new List<int>();
while(true) {
int value = -1;
while(queueLo.Count > 0 || queueHi.Count > 0) { // take from hi-priority first
if(queueHi.TryReceive(out value) ||
queueLo.TryReceive(out value)) { // process value
results.Add(value);
WriteLine($" Consumed {value}");
await Task.Delay(100); // consumer processing time shorter than production
}
}
var hasNorm = queueHi.OutputAvailableAsync();
var hasLow = queueLo.OutputAvailableAsync();
var anyT = await Task.WhenAny(hasNorm, hasLow); // <<<<<<<<<< behaves like WhenAll
WriteLine($" WhenAny {anyT.Result} qL.Result={hasLow.Result} qH.Result={hasNorm.Result} qL.Count={queueLo.Count} qH.Count={queueHi.Count}");
if(!anyT.Result)
break; // both queues are empty & complete
}
return results;
}
static async Task TestDataFlow() {
var queueLo = new BufferBlock<int>();
var queueHi = new BufferBlock<int>();
// Start the producer and consumer.
var consumer = Consume(queueLo, queueHi);
WriteLine("Consumer Started");
var producer = Produce(queueLo, queueHi, Enumerable.Range(0, 10));
WriteLine("Producer Started");
// Wait for everything to complete.
await Task.WhenAll(producer, consumer, queueLo.Completion, queueHi.Completion);
// show consumer's output
var results = await consumer;
Write("Results:");
foreach(var x in results)
Write($" {x}");
WriteLine();
}
static void Main(string[] args) {
try {
TestDataFlow().Wait();
} catch(Exception ex) {
WriteLine($"TestDataFlow exception: {ex.ToString()}");
}
ReadLine();
}
}
}
Output:
Consumer Started
Producer Started
Produced 0 qL.Cnt=1 qH.Cnt=0
Produced 1 qL.Cnt=2 qH.Cnt=0
Produced 2 qL.Cnt=3 qH.Cnt=0
Produced 3 qL.Cnt=3 qH.Cnt=1
WhenAny True qL.Result=True qH.Result=True qL.Count=3 qH.Count=1
Consumed 3
Consumed 0
Consumed 1
Consumed 2
Produced 4 qL.Cnt=1 qH.Cnt=0
Produced 5 qL.Cnt=2 qH.Cnt=0
Produced 6 qL.Cnt=3 qH.Cnt=0
Produced 7 qL.Cnt=3 qH.Cnt=1
WhenAny True qL.Result=True qH.Result=True qL.Count=3 qH.Count=1
Consumed 7
Consumed 4
Consumed 5
Consumed 6
Produced 8 qL.Cnt=1 qH.Cnt=0
Produced 9 qL.Cnt=2 qH.Cnt=0
WhenAny True qL.Result=True qH.Result=False qL.Count=2 qH.Count=0
Consumed 8
Consumed 9
WhenAny False qL.Result=False qH.Result=False qL.Count=0 qH.Count=0
Results: 3 0 1 2 7 4 5 6 8 9

After calling WhenAny you're immediately blocking on both tasks via .Result, without knowing that they're both complete.
var anyT = await Task.WhenAny(hasNorm, hasLow);
//This line blocks on both the hasNorm and hasLow tasks preventing execution from continuing.
WriteLine($" WhenAny {anyT.Result} qL.Result={hasLow.Result} qH.Result={hasNorm.Result} qL.Count={queueLo.Count} qH.Count={queueHi.Count}");
Awaiting both tasks will also give you the same behavior. The best you can do is await the task returned from WhenAny and only read the result of the completed task.
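A minimal sketch of that change (not a drop-in fix; the exit test must then consider which block actually finished):
var anyT = await Task.WhenAny(hasNorm, hasLow);
bool available = await anyT; // anyT is already complete, so this await cannot block
WriteLine($" WhenAny {available} qL.Count={queueLo.Count} qH.Count={queueHi.Count}");
if (!available && queueLo.Completion.IsCompleted && queueHi.Completion.IsCompleted)
    break; // exit only once both blocks are done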
Additionally, a priority queue is not something that TPL Dataflow does well out of the box. It treats all messages equally, so you end up plugging in your own priority implementation. That said, you can make it work.

Related

Losing items somewhere in C# BlockingCollection with GetConsumingEnumerable()

I'm trying to do a parallel SqlBulkCopy to multiple targets over WAN, many of which may be having slow connections and/or connection cutoffs; their connection speed varies from 2 to 50 mbits download, and I am sending from a connection with 1000 mbit upload; a lot of the targets need multiple retries to correctly finish.
I'm currently using a Parallel.ForEach on the GetConsumingEnumerable() of a BlockingCollection (queue); however, I either stumbled upon some bug, or I am having problems fully understanding its purpose, or simply got something wrong.
The code never calls the CompleteAdding() method of the BlockingCollection;
it seems that somewhere in the parallel foreach loop some of the targets get lost.
Even if there are different approaches to this, and disregarding the kind of work it is doing in the loop, the BlockingCollection shouldn't behave the way it does in this example, should it?
In the foreach-loop, I do the work, and add the target to a results-collection in case it completed successfully, or re-add the target to the BlockingCollection in case of an error until the target reached the max retries threshold; at that point I add it to the results-collection.
In an additional Task, I loop until the count of the results-collection equals the initial count of the targets; then I do the CompleteAdding() on the blocking collection.
I already tried using a locking object for the operations on the results collection (using a List<int> instead) and the queue, with no luck, but that shouldn't be necessary anyway. I also tried adding the retries to a separate collection, and re-adding those to the BlockingCollection in a different Task instead of in the Parallel.ForEach.
Just for fun I also tried compiling with .NET from 4.5 to 4.8, and different C# language versions.
Here is a simplified example:
List<int> targets = new List<int>();
for (int i = 0; i < 200; i++)
{
targets.Add(0);
}
BlockingCollection<int> queue = new BlockingCollection<int>(new ConcurrentQueue<int>());
ConcurrentBag<int> results = new ConcurrentBag<int>();
targets.ForEach(f => queue.Add(f));
// Bulkcopy in die Filialen:
Task.Run(() =>
{
while (results.Count < targets.Count)
{
Thread.Sleep(2000);
Console.WriteLine($"Completed: {results.Count} / {targets.Count} | queue: {queue.Count}");
}
queue.CompleteAdding();
});
int MAX_RETRIES = 10;
ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 50 };
Parallel.ForEach(queue.GetConsumingEnumerable(), options, target =>
{
try
{
// simulate a problem with the bulkcopy:
throw new Exception();
results.Add(target);
}
catch (Exception)
{
if (target < MAX_RETRIES)
{
target++;
if (!queue.TryAdd(target))
Console.WriteLine($"{target.ToString("D3")}: Error, can't add to queue!");
}
else
{
results.Add(target);
Console.WriteLine($"Aborted after {target + 1} tries | {results.Count} / {targets.Count} items finished.");
}
}
});
I expected the count of the results-collection to be the exact count of the targets-list in the end, but it seems to never reach that number, which results in the BlockingCollection never being marked as completed, so the code never finishes.
I really don't understand why not all of the targets get added to the results-collection eventually! The added count always varies, and is mostly just shy of the expected final count.
EDIT: I removed the retry-part, and replaced the ConcurrentBag with a simple int-counter, and it still doesn't work most of the time:
List<int> targets = new List<int>();
for (int i = 0; i < 500; i++)
targets.Add(0);
BlockingCollection<int> queue = new BlockingCollection<int>(new ConcurrentQueue<int>());
//ConcurrentBag<int> results = new ConcurrentBag<int>();
int completed = 0;
targets.ForEach(f => queue.Add(f));
var thread = new Thread(() =>
{
while (completed < targets.Count)
{
Thread.Sleep(2000);
Console.WriteLine($"Completed: {completed} / {targets.Count} | queue: {queue.Count}");
}
queue.CompleteAdding();
});
thread.Start();
ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(queue.GetConsumingEnumerable(), options, target =>
{
Interlocked.Increment(ref completed);
});
Sorry, found the answer: the default partitioner used by BlockingCollection and Parallel.ForEach does chunking and buffering, which causes the foreach loop to wait forever for enough items to fill the next chunk. For me, it sat there for a whole night without processing the last few items!
So, instead of:
ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(queue.GetConsumingEnumerable(), options, target =>
{
Interlocked.Increment(ref completed);
});
you have to use:
var partitioner = Partitioner.Create(queue.GetConsumingEnumerable(), EnumerablePartitionerOptions.NoBuffering);
ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(partitioner, options, target =>
{
Interlocked.Increment(ref completed);
});
Parallel.ForEach is meant for data parallelism (i.e. processing 100K rows using all 8 cores), not concurrent operations. This is essentially a pub/sub and async problem, if not a pipeline problem. There's nothing for the CPU to do in this case; just start the async operations and wait for them to complete.
.NET has handled this since .NET 4.5 through the Dataflow classes and, more recently, the lower-level System.Threading.Channels namespace.
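For comparison, here is a minimal System.Threading.Channels sketch of the same produce/consume shape (illustrative only, not part of the original answer; requires C# 8 for await foreach):
using System.Threading.Channels;
...
var channel = Channel.CreateBounded<int>(8); // bounded buffer, like BoundedCapacity below
// producer
for (int i = 0; i < 200; i++)
    await channel.Writer.WriteAsync(i);
channel.Writer.Complete();
// consumer
await foreach (var item in channel.Reader.ReadAllAsync())
    Console.WriteLine(item);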
In its simplest form, you can create an ActionBlock<> that takes a buffer and target connection and publishes the data. Let's say you use this method to send the data to a server:
async Task MyBulkCopyMethod(string connectionString,DataTable data)
{
using(var bcp=new SqlBulkCopy(connectionString))
{
//Set up mappings etc.
//....
await bcp.WriteToServerAsync(data);
}
}
You can use this with an ActionBlock configured with a degree of parallelism. Dataflow blocks like ActionBlock have their own input (and, where appropriate, output) buffers, so there's no need to create a separate queue:
class DataMessage
{
public string Connection{get;set;}
public DataTable Data {get;set;}
}
...
var options=new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism = 50,
BoundedCapacity = 8
};
var block=new ActionBlock<DataMessage>(msg=>MyBulkCopyMethod(msg.Connection,msg.Data), options);
We can start posting messages to the block now. By setting the capacity to 8 we ensure the input buffer won't fill up with large messages if the block is too slow. MaxDegreeOfParallelism controls how many operations run concurrently. Let's say we want to send the same data to many servers:
var data=.....;
var servers=new[]{connString1, connString2,....};
var messages= from sv in servers
select new DataMessage{ Connection=sv, Data=data};
foreach(var msg in messages)
{
await block.SendAsync(msg);
}
//Tell the block we are done
block.Complete();
//Await for all messages to finish processing
await block.Completion;
Retries
One possibility for retries is to use a retry loop in the worker function. A better idea would be to use a different block and post failed messages there.
var block=new ActionBlock<DataMessage>(async msg=> {
    try {
        await MyBulkCopyMethod(msg.Connection,msg.Data);
    }
    catch(SqlException exc) when (some retry condition)
    {
        //Post without awaiting
        retryBlock.Post(msg);
    }
});
When the original block completes we want to tell the retry block to complete as well, no matter what:
block.Completion.ContinueWith(_=>retryBlock.Complete());
Now we can await the retryBlock to complete.
That block could have a smaller DOP and perhaps a delay between attempts:
var retryOptions=new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism = 5
};
var retryBlock=new ActionBlock<DataMessage>(async msg=>{
await Task.Delay(1000);
try {
await MyBulkCopyMethod(msg.Connection,msg.Data);
}
catch (Exception ....)
{
...
}
});
This pattern can be repeated to create multiple levels of retry, or different conditions. It can also be used to create different-priority workers, by giving a larger DOP to high-priority workers or a larger delay to low-priority workers.
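For example, a rough sketch of that priority idea (the DOP values and delay are arbitrary illustrations, not from the original answer):
var hiPriorityBlock = new ActionBlock<DataMessage>(
    msg => MyBulkCopyMethod(msg.Connection, msg.Data),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 40 });
var loPriorityBlock = new ActionBlock<DataMessage>(
    async msg =>
    {
        await Task.Delay(500); // deliberately slow down low-priority work
        await MyBulkCopyMethod(msg.Connection, msg.Data);
    },
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });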

Multiple Async Calls with Pause Between Calls

I have an IEnumerable<Task>, where each Task will call the same endpoint. However, the endpoint can only handle so many calls per second. How can I put, say, a half second delay between each call?
I have tried adding Task.Delay(), but of course awaiting them simply means that the app waits a half second before sending all the calls at once.
Here is a code snippet:
var resultTasks = orders
.Select(async task =>
{
var result = new VendorTaskResult();
try
{
result.Response = await result.CallVendorAsync();
}
catch(Exception ex)
{
result.Exception = ex;
}
return result;
} );
var results = Task.WhenAll(resultTasks);
I feel like I should do something like
Task.WhenAll(resultTasks.EmitOverTime(500));
... but how exactly do I do that?
What you describe in your question is, in other words, rate limiting. You'd like to apply a rate-limiting policy in your client, because the API you use enforces such a policy on the server to protect itself from abuse.
While you could implement rate limiting yourself, I'd recommend going with some well-established solution. Rate Limiter from David Desmaisons was the one that I picked at random, and I instantly liked it. It has solid documentation, superior coverage, and is easy to use. It is also available as a NuGet package.
Check out the simple snippet below that demonstrates running semi-overlapping tasks in sequence, deferring each task's start until half a second after the immediately preceding task started. Each task lasts at least 750 ms.
using ComposableAsync;
using RateLimiter;
using System;
using System.Threading.Tasks;
namespace RateLimiterTest
{
class Program
{
static void Main(string[] args)
{
Log("Starting tasks ...");
var constraint = TimeLimiter.GetFromMaxCountByInterval(1, TimeSpan.FromSeconds(0.5));
var tasks = new[]
{
DoWorkAsync("Task1", constraint),
DoWorkAsync("Task2", constraint),
DoWorkAsync("Task3", constraint),
DoWorkAsync("Task4", constraint)
};
Task.WaitAll(tasks);
Log("All tasks finished.");
Console.ReadLine();
}
static void Log(string message)
{
Console.WriteLine(DateTime.Now.ToString("HH:mm:ss.fff ") + message);
}
static async Task DoWorkAsync(string name, IDispatcher constraint)
{
await constraint;
Log(name + " started");
await Task.Delay(750);
Log(name + " finished");
}
}
}
Sample output:
10:03:27.121 Starting tasks ...
10:03:27.154 Task1 started
10:03:27.658 Task2 started
10:03:27.911 Task1 finished
10:03:28.160 Task3 started
10:03:28.410 Task2 finished
10:03:28.680 Task4 started
10:03:28.913 Task3 finished
10:03:29.443 Task4 finished
10:03:29.443 All tasks finished.
If you change the constraint to allow a maximum of two tasks per second (var constraint = TimeLimiter.GetFromMaxCountByInterval(2, TimeSpan.FromSeconds(1));), which is not the same as one per half second, then the output could be like:
10:06:03.237 Starting tasks ...
10:06:03.264 Task1 started
10:06:03.268 Task2 started
10:06:04.026 Task2 finished
10:06:04.031 Task1 finished
10:06:04.275 Task3 started
10:06:04.276 Task4 started
10:06:05.032 Task4 finished
10:06:05.032 Task3 finished
10:06:05.033 All tasks finished.
Note that the current version of Rate Limiter targets .NET Framework 4.7.2+ or .NET Standard 2.0+.
This is just a thought, but another approach could be to create a queue, and add another thread that polls the queue for calls that need to go out to your endpoint.
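A rough sketch of that idea (Order and CallVendorAsync are placeholders for the question's domain types, not real APIs):
var queue = new BlockingCollection<Order>();
var worker = new Thread(() =>
{
    // drain the queue one call at a time, half a second apart
    foreach (var order in queue.GetConsumingEnumerable())
    {
        _ = CallVendorAsync(order); // start the call; results are observed elsewhere
        Thread.Sleep(500);
    }
});
worker.Start();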
Have you considered just turning that into a foreach loop with a Task.Delay call? You seem to want to call them sequentially anyway, and it won't hurt if that is obvious from your code.
var results = new List<YourResultType>();
foreach(var order in orders){
    var result = new VendorTaskResult();
    try
    {
        result.Response = await result.CallVendorAsync();
        results.Add(result.Response);
    }
    catch(Exception ex)
    {
        result.Exception = ex;
    }
    await Task.Delay(500); // the half-second pause between calls
}
Instead of selecting from orders you could loop over them, start each call inside the loop with a pause between iterations, collect the tasks into a list, and then call Task.WhenAll.
Would look something like:
var resultTasks = new List<Task<VendorTaskResult>>(orders.Count);
foreach (var item in orders)
{
    resultTasks.Add(Task.Run(async () =>
    {
        var result = new VendorTaskResult();
        try
        {
            result.Response = await result.CallVendorAsync();
        }
        catch (Exception ex)
        {
            result.Exception = ex;
        }
        return result;
    }));
    await Task.Delay(x); // pause before starting the next call
}
var results = await Task.WhenAll(resultTasks);
If you want to control the number of requests executed simultaneously, you have to use a semaphore (SemaphoreSlim).
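For example, a minimal SemaphoreSlim sketch (the limit of 4 and CallVendorAsync are illustrative assumptions):
var throttle = new SemaphoreSlim(4); // at most 4 requests in flight at once
var throttledTasks = orders.Select(async order =>
{
    await throttle.WaitAsync();
    try
    {
        return await CallVendorAsync(order);
    }
    finally
    {
        throttle.Release();
    }
});
var results = await Task.WhenAll(throttledTasks);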
I have something very similar, and it works fine for me. Please note that I call ToArray() after the LINQ query is defined; that is what triggers the tasks:
using (HttpClient client = new HttpClient()) {
    IEnumerable<Task<string>> _downloads = _group
        .Select(async job => {
            await Task.Delay(300);
            return await client.GetStringAsync(<url with variable job>);
        });
    Task<string>[] _downloadTasks = _downloads.ToArray();
    _pages = await Task.WhenAll(_downloadTasks);
}
Now please note that this will create n tasks, all in parallel, and the Task.Delay does essentially nothing to space them out. If you want to call the pages sequentially (as it sounds from wanting a delay between the calls), then this code may be better:
using (HttpClient client = new HttpClient()) {
foreach (string job in _group) {
await Task.Delay(300);
_pages.Add(await client.GetStringAsync(<url with variable job>));
}
}
The download of the pages is still asynchronous (other work can happen while downloading), but the calls are made sequentially, ensuring that one finishes before the next one starts.
The code can easily be changed to download the pages asynchronously in chunks, e.g. 20 pages at a time with a pause between chunks, like in this sample:
IEnumerable<string[]> toParse = myData
.Select((v, i) => new { v.code, group = i / 20 })
.GroupBy(x => x.group)
.Select(g => g.Select(x => x.code).ToArray());
using (HttpClient client = new HttpClient()) {
foreach (string[] _group in toParse) {
string[] _pages = null;
IEnumerable<Task<string>> _downloads = _group
.Select(job => {
return client.GetStringAsync(<url with job>);
});
Task<string>[] _downloadTasks = _downloads.ToArray();
_pages = await Task.WhenAll(_downloadTasks);
await Task.Delay(5000);
}
}
All this does is group your pages in chunks of 20, iterate through the chunks, download all pages of the chunk asynchronously, wait 5 seconds, and move on to the next chunk.
I hope that is what you were waiting for :)
The proposed method EmitOverTime is doable, but only by blocking the current thread:
public static IEnumerable<Task<TResult>> EmitOverTime<TResult>(
this IEnumerable<Task<TResult>> tasks, int delay)
{
foreach (var item in tasks)
{
Thread.Sleep(delay); // Delay by blocking
yield return item;
}
}
Usage:
var results = await Task.WhenAll(resultTasks.EmitOverTime(500));
Probably better is to create a variant of Task.WhenAll that accepts a delay argument and delays asynchronously:
public static async Task<TResult[]> WhenAllWithDelay<TResult>(
IEnumerable<Task<TResult>> tasks, int delay)
{
var tasksList = new List<Task<TResult>>();
foreach (var task in tasks)
{
await Task.Delay(delay).ConfigureAwait(false);
tasksList.Add(task);
}
return await Task.WhenAll(tasksList).ConfigureAwait(false);
}
Usage:
var results = await WhenAllWithDelay(resultTasks, 500);
This design implies that the enumerable of tasks should be enumerated only once. It is easy to forget this during development and start enumerating it again, spawning a new set of tasks. For this reason I propose to make it an OnlyOnce enumerable, as shown in this question.
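A hedged sketch of such an enumerate-once wrapper (the name and details are assumptions, not the linked question's exact code):
public static IEnumerable<T> OnlyOnce<T>(this IEnumerable<T> source)
{
    int enumerated = 0;
    return Iterator();

    IEnumerable<T> Iterator()
    {
        // throw on a second enumeration instead of silently spawning new tasks
        if (Interlocked.Exchange(ref enumerated, 1) == 1)
            throw new InvalidOperationException("This sequence can be enumerated only once.");
        foreach (var item in source)
            yield return item;
    }
}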
Update: I should mention why the above methods work, and under what premise. The premise is that the supplied IEnumerable<Task<TResult>> is deferred, in other words non-materialized. At the method's start no tasks have been created yet. The tasks are created one after the other during the enumeration of the enumerable, and the trick is that the enumeration is slow and controlled. The delay inside the loop ensures that the tasks are not created all at once. They are created hot (in other words already started), so by the time the last task has been created, some of the first tasks may have already completed. The materialized list of half-running/half-completed tasks is then passed to Task.WhenAll, which waits for all of them to complete asynchronously.
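To illustrate the premise (urls and client are assumed placeholders):
// deferred: no tasks exist yet; each download starts during enumeration
IEnumerable<Task<string>> deferred = urls.Select(url => client.GetStringAsync(url));
// materializing the sequence starts all the tasks immediately
Task<string>[] allAtOnce = deferred.ToArray();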

ParallelEnumerable.WithDegreeOfParallelism() not restricting tasks?

I'm attempting to use AsParallel() with async-await to have an application process a series of tasks in parallel, but with a restricted degree of concurrency, because each task starts an external Process with significant memory usage (hence wanting to wait for the process to complete before proceeding to the next item in the series). Most literature I've seen on ParallelEnumerable.WithDegreeOfParallelism suggests that using it will set a maximum limit on concurrent tasks at any one time, but my own tests seem to suggest that it's skipping the limit altogether.
To provide a rough example (WithDegreeOfParallelism() set to 1 deliberately to demonstrate the issue):
public class Example
{
private async Task HeavyTask(int i)
{
await Task.Delay(10 * 1000);
}
public async Task Run()
{
int n = 0;
await Task.WhenAll(Enumerable.Range(0, 100)
.AsParallel()
.WithDegreeOfParallelism(1)
.Select(async i =>
{
Interlocked.Increment(ref n);
Console.WriteLine("[+] " + n);
await HeavyTask(i);
Interlocked.Decrement(ref n);
Console.WriteLine("[-] " + n);
}));
}
}
class Program
{
public static void Main(string[] args)
{
Task.Run(async () =>
{
await new Example().Run();
}).Wait();
}
}
From what I understand, the code above is meant to produce output along the lines of:
[+] 1
[-] 0
[+] 1
[-] 0
...
But instead returns:
[+] 1
[+] 2
[+] 3
[+] 4
...
Suggesting that it is starting all the tasks in the list and then waiting for them to return.
Is there anything particularly obvious (or non-obvious) that I'm doing wrong which is making it seem like WithDegreeOfParallelism() is being ignored?
Update
Sorry, after testing your code I understand what you are seeing now:
async i =>
An async lambda is compiled into a task-returning delegate that nothing awaits, basically an unobserved task which will run regardless. Printing Thread.CurrentThread.ManagedThreadId will show you clearly that it is consuming as many threads as it likes.
Also note: if your heavy task is IO-bound, then skip PLINQ and Parallel, and use async and await in a TPL Dataflow ActionBlock, as it will give you the best of both worlds.
E.g
public static async Task DoWorkLoads(List<int> list)
{
    var options = new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 2
    };
    var block = new ActionBlock<int>(MyMethodAsync, options);
    foreach (var item in list)
        block.Post(item);
    block.Complete();
    await block.Completion;
}
...
public async Task MyMethodAsync(int i)
{
await Task.Delay(10 * 1000);
}
Original
This is very subtle and a very common misunderstanding; however, I think the documentation seems wrong:
Sets the degree of parallelism to use in a query. Degree of
parallelism is the maximum number of concurrently executing tasks that
will be used to process the query.
Though if we dig into this a bit more we get a better understanding; there are also GitHub conversations on this.
ParallelOptions.MaxDegreeOfParallelism vs PLINQ’s WithDegreeOfParallelism
PLINQ is different. Some important Standard Query Operators in PLINQ
require communication between the threads involved in the processing
of the query, including some that rely on a Barrier to enable threads
to operate in lock-step. The PLINQ design requires that a specific
number of threads be actively involved for the query to make any
progress. Thus when you specify a DegreeOfParallelism for PLINQ,
you’re specifying the actual number of threads that will be involved,
rather than just a maximum.

Unwrapping IObservable<Task<T>> into IObservable<T> with order preservation

Is there a way to unwrap the IObservable<Task<T>> into IObservable<T> keeping the same order of events, like this?
Tasks: ----a-------b--c----------d------e---f---->
Values: -------A-----------B--C------D-----E---F-->
Let's say I have a desktop application that consumes a stream of messages, some of which require heavy post-processing:
IObservable<Message> streamOfMessages = ...;
IObservable<Task<Result>> streamOfTasks = streamOfMessages
.Select(async msg => await PostprocessAsync(msg));
IObservable<Result> streamOfResults = ???; // unwrap streamOfTasks
I imagine two ways of dealing with that.
First, I can subscribe to streamOfTasks using the asynchronous event handler:
streamOfTasks.Subscribe(async task =>
{
var result = await task;
Display(result);
});
Second, I can convert streamOfTasks using Observable.Create, like this:
var streamOfResults =
from task in streamOfTasks
from value in Observable.Create<T>(async (obs, cancel) =>
{
var v = await task;
obs.OnNext(v);
// TODO: don't know when to call obs.OnCompleted()
})
select value;
streamOfResults.Subscribe(result => Display(result));
Either way, the order of messages is not preserved: some later messages that
don't need any post-processing come out faster than earlier messages that
require post-processing. Both my solutions handle the incoming messages
in parallel, but I'd like them to be processed sequentially, one by one.
I can write a simple task queue to process just one task at a time,
but perhaps that's overkill. It seems to me that I'm missing something obvious.
UPD. I wrote a sample console program to demonstrate my approaches. None of the solutions so far preserve the original order of events. Here is the output of the program:
Timer: 0
Timer: 1
Async handler: 1
Observable.Create: 1
Observable.FromAsync: 1
Timer: 2
Async handler: 2
Observable.Create: 2
Observable.FromAsync: 2
Observable.Create: 0
Async handler: 0
Observable.FromAsync: 0
Here is the complete source code:
// "C:\Program Files (x86)\MSBuild\14.0\Bin\csc.exe" test.cs /r:System.Reactive.Core.dll /r:System.Reactive.Linq.dll /r:System.Reactive.Interfaces.dll
using System;
using System.Reactive;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Threading.Tasks;
class Program
{
static void Main()
{
Console.WriteLine("Press ENTER to exit.");
// the source stream
var timerEvents = Observable.Timer(TimeSpan.Zero, TimeSpan.FromSeconds(1));
timerEvents.Subscribe(x => Console.WriteLine($"Timer: {x}"));
// solution #1: using async event handler
timerEvents.Subscribe(async x =>
{
var result = await PostprocessAsync(x);
Console.WriteLine($"Async handler: {x}");
});
// solution #2: using Observable.Create
var processedEventsV2 =
from task in timerEvents.Select(async x => await PostprocessAsync(x))
from value in Observable.Create<long>(async (obs, cancel) =>
{
var v = await task;
obs.OnNext(v);
})
select value;
processedEventsV2.Subscribe(x => Console.WriteLine($"Observable.Create: {x}"));
// solution #3: using FromAsync, as answered by #Enigmativity
var processedEventsV3 =
from msg in timerEvents
from result in Observable.FromAsync(() => PostprocessAsync(msg))
select result;
processedEventsV3.Subscribe(x => Console.WriteLine($"Observable.FromAsync: {x}"));
Console.ReadLine();
}
static async Task<long> PostprocessAsync(long x)
{
// some messages require long post-processing
if (x % 3 == 0)
{
await Task.Delay(TimeSpan.FromSeconds(2.5));
}
// and some don't
return x;
}
}
Combining #Enigmativity's simple approach with #VMAtm's idea of attaching the counter and some code snippets from this SO question, I came up with this solution:
// usage
var processedStream = timerEvents.SelectAsync(async t => await PostprocessAsync(t));
processedStream.Subscribe(x => Console.WriteLine($"Processed: {x}"));
// my sample console program prints the events ordered properly:
Timer: 0
Timer: 1
Timer: 2
Processed: 0
Processed: 1
Processed: 2
Timer: 3
Timer: 4
Timer: 5
Processed: 3
Processed: 4
Processed: 5
....
Here is my SelectAsync extension method to transform IObservable<Task<TSource>> into IObservable<TResult> keeping the original order of events:
public static IObservable<TResult> SelectAsync<TSource, TResult>(
this IObservable<TSource> src,
Func<TSource, Task<TResult>> selectorAsync)
{
// using local variable for counter is easier than src.Scan(...)
var counter = 0;
var streamOfTasks =
from source in src
from result in Observable.FromAsync(async () => new
{
Index = Interlocked.Increment(ref counter) - 1,
Result = await selectorAsync(source)
})
select result;
// buffer the results coming out of order
return Observable.Create<TResult>(observer =>
{
var index = 0;
var buffer = new Dictionary<int, TResult>();
return streamOfTasks.Subscribe(item =>
{
buffer.Add(item.Index, item.Result);
TResult result;
while (buffer.TryGetValue(index, out result))
{
buffer.Remove(index);
observer.OnNext(result);
index++;
}
});
});
}
I'm not particularly satisfied with my solution as it looks too complex to me, but at least it doesn't require any external dependencies. I'm using a simple Dictionary here to buffer and reorder the task results, because the subscriber need not be thread-safe (the subscriptions are never called concurrently).
Any comments or suggestions are welcome. I'm still hoping to find the native RX way of doing this without custom buffering extension method.
The RX library contains three operators that can unwrap an observable sequence of tasks, the Concat, Merge and Switch. All three accept a single source argument of type IObservable<Task<T>>, and return an IObservable<T>. Here are their descriptions from the documentation:
Concat
Concatenates all task results, as long as the previous task terminated successfully.
Merge
Merges results from all source tasks into a single observable sequence.
Switch
Transforms an observable sequence of tasks into an observable sequence producing values only from the most recent observable sequence. Each time a new task is received, the previous task's result is ignored.
In other words the Concat returns the results in their original order, the Merge returns the results in order of completion, and the Switch filters out any results from tasks that didn't complete before the next task was emitted. So your problem can be solved by just using the built-in Concat operator. No custom operator is needed.
var streamOfResults = streamOfTasks
.Select(async task =>
{
var result1 = await task;
var result2 = await PostprocessAsync(result1);
return result2;
})
.Concat();
The tasks are already started before they are emitted by the streamOfTasks. In other words, they emerge in a "hot" state. So the fact that the Concat operator awaits them one after the other has no consequence for the concurrency of the operations; it only affects the order of their results. This would be a consideration if, instead of hot tasks, you had cold observables, like those created by the Observable.FromAsync and Observable.Create methods, in which case Concat would execute the operations sequentially.
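For example, a minimal sketch of the cold variant, where Concat makes the post-processing itself sequential (order preserved by construction, at the cost of concurrency):
var sequentialResults = streamOfMessages
    .Select(msg => Observable.FromAsync(() => PostprocessAsync(msg)))
    .Concat();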
Is the following simple approach an answer for you?
IObservable<Result> streamOfResults =
from msg in streamOfMessages
from result in Observable.FromAsync(() => PostprocessAsync(msg))
select result;
To maintain the order of events you can funnel your stream into a TransformBlock from TPL Dataflow. A TransformBlock executes your post-processing logic and maintains the order of its output by default.
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using NUnit.Framework;
namespace HandlingStreamInOrder {
[TestFixture]
public class ItemHandlerTests {
[Test]
public async Task Items_Are_Output_In_The_Same_Order_As_They_Are_Input() {
var itemHandler = new ItemHandler();
var timerEvents = Observable.Timer(TimeSpan.Zero, TimeSpan.FromMilliseconds(250));
timerEvents.Subscribe(async x => {
var data = (int)x;
Console.WriteLine($"Value Produced: {x}");
var dataAccepted = await itemHandler.SendAsync((int)data);
if (dataAccepted) {
InputItems.Add(data);
}
});
await Task.Delay(5000);
itemHandler.Complete();
await itemHandler.Completion;
CollectionAssert.AreEqual(InputItems, itemHandler.OutputValues);
}
private IList<int> InputItems {
get;
} = new List<int>();
}
public class ItemHandler {
public ItemHandler() {
var options = new ExecutionDataflowBlockOptions() {
BoundedCapacity = DataflowBlockOptions.Unbounded,
MaxDegreeOfParallelism = Environment.ProcessorCount,
EnsureOrdered = true
};
PostProcessBlock = new TransformBlock<int, int>((Func<int, Task<int>>)PostProcess, options);
var output = PostProcessBlock.AsObservable().Subscribe(x => {
Console.WriteLine($"Value Output: {x}");
OutputValues.Add(x);
});
}
public async Task<bool> SendAsync(int data) {
return await PostProcessBlock.SendAsync(data);
}
public void Complete() {
PostProcessBlock.Complete();
}
public Task Completion {
get { return PostProcessBlock.Completion; }
}
public IList<int> OutputValues {
get;
} = new List<int>();
private IPropagatorBlock<int, int> PostProcessBlock {
get;
}
private async Task<int> PostProcess(int data) {
if (data % 3 == 0) {
await Task.Delay(TimeSpan.FromSeconds(2));
}
return data;
}
}
}
Rx and TPL Dataflow can easily be combined here, and TPL Dataflow preserves the order of events by default, so your code could be something like this:
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
static async Task<long> PostprocessAsync(long x) { ... }
IObservable<Message> streamOfMessages = ...;
var streamOfTasks = new TransformBlock<long, long>(async msg =>
await PostprocessAsync(msg)
// set the concurrency level for messages to handle
, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });
// easily convert block into observable
IObservable<long> streamOfResults = streamOfTasks.AsObservable();
Edit: Rx extensions are meant to be a reactive pipeline of events for a UI. As such applications are in general single-threaded, messages end up being handled in order. But in general, events in C# aren't thread-safe, so you have to provide some additional logic to preserve the order.
If you don't like the idea of introducing another dependency, you can track the operation number with the Interlocked class, something like this:
// counter for operations started
int operationNumber = 0;
// counter for operations done
int doneNumber = 0;
...
// zero-based index of this operation
var currentOperationNumber = Interlocked.Increment(ref operationNumber) - 1;
...
// spin until all earlier operations have finished
while (Volatile.Read(ref doneNumber) != currentOperationNumber)
{
    Thread.SpinWait(1); // spin once here
}
// handle event
Interlocked.Increment(ref doneNumber);

Task sequencing and re-entrancy

I've got the following scenario, which I think might be quite common:
There is a task (a UI command handler) which can complete either synchronously or asynchronously.
Commands may arrive faster than they are getting processed.
If there is already a pending task for a command, the new command handler task should be queued and processed sequentially.
Each new task's result may depend on the result of the previous task.
Cancellation should be observed, but I'd like to leave it outside the scope of this question for simplicity. Also, thread-safety (concurrency) is not a requirement, but re-entrancy must be supported.
Here's a basic example of what I'm trying to achieve (as a console app, for simplicity):
using System;
using System.Threading.Tasks;
namespace ConsoleApp
{
class Program
{
static void Main(string[] args)
{
var asyncOp = new AsyncOp<int>();
Func<int, Task<int>> handleAsync = async (arg) =>
{
Console.WriteLine("this task arg: " + arg);
//await Task.Delay(arg); // make it async
return await Task.FromResult(arg); // sync
};
Console.WriteLine("Test #1...");
asyncOp.RunAsync(() => handleAsync(1000));
asyncOp.RunAsync(() => handleAsync(900));
asyncOp.RunAsync(() => handleAsync(800));
asyncOp.CurrentTask.Wait();
Console.WriteLine("\nPress any key to continue to test #2...");
Console.ReadLine();
asyncOp.RunAsync(() =>
{
asyncOp.RunAsync(() => handleAsync(200));
return handleAsync(100);
});
asyncOp.CurrentTask.Wait();
Console.WriteLine("\nPress any key to exit...");
Console.ReadLine();
}
// AsyncOp
class AsyncOp<T>
{
Task<T> _pending = Task.FromResult(default(T));
public Task<T> CurrentTask { get { return _pending; } }
public Task<T> RunAsync(Func<Task<T>> handler)
{
var pending = _pending;
Func<Task<T>> wrapper = async () =>
{
// await the prev task
var prevResult = await pending;
Console.WriteLine("\nprev task result: " + prevResult);
// start and await the handler
return await handler();
};
_pending = wrapper();
return _pending;
}
}
}
}
The output:
Test #1...
prev task result: 0
this task arg: 1000
prev task result: 1000
this task arg: 900
prev task result: 900
this task arg: 800
Press any key to continue to test #2...
prev task result: 800
prev task result: 800
this task arg: 200
this task arg: 100
Press any key to exit...
It works in accordance with the requirements, until re-entrancy is introduced in test #2:
asyncOp.RunAsync(() =>
{
asyncOp.RunAsync(() => handleAsync(200));
return handleAsync(100);
});
The desired output should be 100, 200 rather than 200, 100, because there's already a pending outer task for 100. That's obviously because the inner task executes synchronously, breaking the var pending = _pending; /* ... */ _pending = wrapper(); logic of the outer task.
How to make it work for test #2, too?
One solution would be to enforce asynchrony for every task with Task.Factory.StartNew(..., TaskScheduler.FromCurrentSynchronizationContext()). However, I don't want to impose asynchronous execution upon the command handlers, which might be synchronous internally. Also, I don't want to depend on the behavior of any particular synchronization context (i.e., relying on Task.Factory.StartNew returning before the created task has actually been started).
In the real-life project, I'm responsible for what AsyncOp is above, but have no control over the command handlers (i.e., whatever is inside handleAsync).
I almost forgot it's possible to construct a Task manually, without starting or scheduling it. Then, "Task.Factory.StartNew" vs "new Task(...).Start" put me back on track. I think this is one of those few cases when the Task<TResult> constructor may actually be useful, along with nested tasks (Task<Task<T>>) and Task.Unwrap():
// AsyncOp
class AsyncOp<T>
{
Task<T> _pending = Task.FromResult(default(T));
public Task<T> CurrentTask { get { return _pending; } }
public Task<T> RunAsync(Func<Task<T>> handler, bool useSynchronizationContext = false)
{
var pending = _pending;
Func<Task<T>> wrapper = async () =>
{
// await the prev task
var prevResult = await pending;
Console.WriteLine("\nprev task result: " + prevResult);
// start and await the handler
return await handler();
};
var task = new Task<Task<T>>(wrapper);
var inner = task.Unwrap();
_pending = inner;
task.RunSynchronously(useSynchronizationContext ?
TaskScheduler.FromCurrentSynchronizationContext() :
TaskScheduler.Current);
return inner;
}
}
The output:
Test #1...
prev task result: 0
this task arg: 1000
prev task result: 1000
this task arg: 900
prev task result: 900
this task arg: 800
Press any key to continue to test #2...
prev task result: 800
this task arg: 100
prev task result: 100
this task arg: 200
It's now also very easy to make AsyncOp thread-safe by adding a lock to protect _pending, if needed.
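A minimal sketch of that thread-safe variant (only the swap of _pending is guarded; the _gate field is an assumption consistent with the code above):
readonly object _gate = new object();
public Task<T> RunAsync(Func<Task<T>> handler, bool useSynchronizationContext = false)
{
    Task<Task<T>> task;
    Task<T> inner;
    lock (_gate)
    {
        var pending = _pending;
        Func<Task<T>> wrapper = async () =>
        {
            var prevResult = await pending;
            Console.WriteLine("\nprev task result: " + prevResult);
            return await handler();
        };
        task = new Task<Task<T>>(wrapper);
        inner = task.Unwrap();
        _pending = inner;
    }
    // run outside the lock so handlers can't deadlock on _gate
    task.RunSynchronously(useSynchronizationContext ?
        TaskScheduler.FromCurrentSynchronizationContext() :
        TaskScheduler.Current);
    return inner;
}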
Updated, this has been further improved with cancel/restart logic.
Here is a solution that is worse in every respect than the accepted answer, except that it is thread-safe (which is not a requirement of the question). Disadvantages:
All lambdas are executed asynchronously (there is no fast path).
The executeOnCurrentContext configuration affects all lambdas (it's not a per-lambda configuration).
This solution uses an ActionBlock from the TPL Dataflow library as its processing engine.
public class AsyncOp<T>
{
private readonly ActionBlock<Task<Task<T>>> _actionBlock;
public AsyncOp(bool executeOnCurrentContext = false)
{
var options = new ExecutionDataflowBlockOptions();
if (executeOnCurrentContext)
options.TaskScheduler = TaskScheduler.FromCurrentSynchronizationContext();
_actionBlock = new ActionBlock<Task<Task<T>>>(async taskTask =>
{
try
{
taskTask.RunSynchronously();
await await taskTask;
}
catch { } // Ignore exceptions
}, options);
}
public Task<T> RunAsync(Func<Task<T>> taskFactory)
{
var taskTask = new Task<Task<T>>(taskFactory);
if (!_actionBlock.Post(taskTask))
throw new InvalidOperationException("Not accepted"); // Should never happen
return taskTask.Unwrap();
}
}
Microsoft's Rx does provide an easy way to do this kind of thing. Here's a simple (perhaps overly simple) way of doing it:
var subject = new BehaviorSubject<int>(0);
IDisposable subscription =
subject
.Scan((x0, x1) =>
{
Console.WriteLine($"previous value {x0}");
return x1;
})
.Skip(1)
.Subscribe(x => Console.WriteLine($"current value {x}\r\n"));
subject.OnNext(1000);
subject.OnNext(900);
subject.OnNext(800);
Console.WriteLine("\r\nPress any key to continue to test #2...\r\n");
Console.ReadLine();
subject.OnNext(200);
subject.OnNext(100);
Console.WriteLine("\r\nPress any key to exit...");
Console.ReadLine();
The output I get is this:
previous value 0
current value 1000
previous value 1000
current value 900
previous value 900
current value 800
Press any key to continue to test #2...
previous value 800
current value 200
previous value 200
current value 100
Press any key to exit...
It's easy to cancel at any time by calling subscription.Dispose().
Error handling in Rx is generally a little more bespoke than normal. It's not just a matter of throwing a try/catch around things. You can also retry steps that error, with a Retry operator, in the case of things like IO errors.
In this circumstance, because I've used a BehaviorSubject (which repeats its last value whenever it is subscribed to) you can easily just resubscribe using a Catch operator.
var subject = new BehaviorSubject<int>(0);
var random = new Random();
IDisposable subscription =
subject
.Select(x =>
{
if (random.Next(10) == 0)
throw new Exception();
return x;
})
.Catch<int, Exception>(ex => subject.Select(x => -x))
.Scan((x0, x1) =>
{
Console.WriteLine($"previous value {x0}");
return x1;
})
.Skip(1)
.Subscribe(x => Console.WriteLine($"current value {x}\r\n"));
Now, with the .Catch<int, Exception>(ex => subject.Select(x => -x)), it inverts the values of the query should an exception be raised.
A typical output may be like this:
previous value 0
current value 1000
previous value 1000
current value 900
previous value 900
current value 800
Press any key to continue to test #2...
previous value 800
current value -200
previous value -200
current value -100
Press any key to exit...
Note the negative numbers in the second half. An exception was handled and the query was able to continue.
