Observable.Retry doesn't work as expected

Observable.Retry doesn't work as expected - c#

I have a sequence of numbers that are processed using an async method. I'm simulating a remote service call that may fail. In case of failure, I would like to retry until the call successes.
The problem is that with the code I'm trying, every time an exception is thrown in the async method, the sequence seems to hang forever.
You can test it with this simple code snippet (it's tested in LINQPad)
Random rnd = new Random();
void Main()
{
var numbers = Enumerable.Range(1, 10).ToObservable();
var processed = numbers.SelectMany(n => Process(n).ToObservable().Retry());
processed.Subscribe( f => Console.WriteLine(f));
}
public async Task<int> Process(int n)
{
if (rnd.Next(2) == 1)
{
throw new InvalidOperationException();
}
await Task.Delay(2000);
return n*10;
}
It should process every element, retrying the ones that have failed. Instead, it never ends and I don't know why.
How can I make it to do what I want?
EDIT: (thanks #CharlesNRice and #JonSkeet for the clues!):
This works!
Random rnd = new Random();
void Main()
{
var numbers = Enumerable.Range(1, 10).ToObservable();
var processed = numbers.SelectMany(n => RetryTask(() => MyTask(n)).ToObservable());
processed.Subscribe(f => Console.WriteLine(f));
}
private async Task<int> MyTask(int n)
{
if (rnd.Next(2) == 1)
{
throw new InvalidOperationException();
}
await System.Threading.Tasks.Task.Delay(2000);
return n * 10;
}
async Task<T> RetryTask<T>(Func<Task<T>> myTask, int? retryCount = null)
{
while (true)
{
try
{
return await myTask();
}
catch (Exception)
{
Debug.WriteLine("Retrying...");
if (retryCount.HasValue)
{
if (retryCount == 0)
{
throw;
}
retryCount--;
}
}
}
}

Rolling your own Retry is overkill in this case. You can achieve the same thing by simply wrapping your method call in a Defer block and it will be re-executed when the retry occurs.
var numbers = Enumerable.Range(1, 10).ToObservable();
var processed = numbers.SelectMany(n =>
//Defer call passed method every time it is subscribed to,
//Allowing the Retry to work correctly.
Observable.Defer(() =>
Process(n).ToObservable()).Retry()
);
processed.Subscribe( f => Console.WriteLine(f));

You are retrying back to the same Task that is in a faulted state. Retry will resubscribe back to the observable source. The source of your retry is the ToObservable(). It will not act like a task factory and make a new Task and since the task is faulted it continues to retry on the faulted task and will never be successful.
You can check out this answer how to make your own retry wrapper
https://stackoverflow.com/a/6090049/1798889

Related

RX terminolgy: Async processing in RX operator when there are frequent observable notifications

The purpose is to do some async work on a scarce resource in a RX operator, Select for example. Issues arise when observable notifications came at a rate that is faster than the time it takes for the async operation to complete.
Now I actually solved the problem. My question would be what is the correct terminology for this particular kind of issue? Does it have a name? Is it backpressure? Research I did until now indicate that this is some kind of a pressure problem, but not necessarily backpressure from my understanding. The most relevant resources I found are these:
https://github.com/ReactiveX/RxJava/wiki/Backpressure-(2.0)
http://reactivex.io/documentation/operators/backpressure.html
Now to the actual code. Suppose there is a scarce resource and it's consumer. In this case exception is thrown when resource is in use. Please note that this code should not be changed.
public class ScarceResource
{
private static bool inUse = false;
public async Task<int> AccessResource()
{
if (inUse) throw new Exception("Resource is alredy in use");
var result = await Task.Run(() =>
{
inUse = true;
Random random = new Random();
Thread.Sleep(random.Next(1, 2) * 1000);
inUse = false;
return random.Next(1, 10);
});
return result;
}
}
public class ResourceConsumer
{
public IObservable<int> DoWork()
{
var resource = new ScarceResource();
return resource.AccessResource().ToObservable();
}
}
Now here is the problem with a naive implementation to consume the resource. Error is thrown because notifications came at a faster rate than the consumer takes to run.
private static void RunIntoIssue()
{
var numbers = Enumerable.Range(1, 10);
var observableSequence = numbers
.ToObservable()
.SelectMany(n =>
{
Console.WriteLine("In observable: {0}", n);
var resourceConsumer = new ResourceConsumer();
return resourceConsumer.DoWork();
});
observableSequence.Subscribe(n => Console.WriteLine("In observer: {0}", n));
}
With the following code the problem is solved. I slow down processing by using a completed BehaviorSubject in conjunction with the Zip operator. Essentially what this code does is to take a sequential approach instead of a parallel one.
private static void RunWithZip()
{
var completed = new BehaviorSubject<bool>(true);
var numbers = Enumerable.Range(1, 10);
var observableSequence = numbers
.ToObservable()
.Zip(completed, (n, c) =>
{
Console.WriteLine("In observable: {0}, completed: {1}", n, c);
var resourceConsumer = new ResourceConsumer();
return resourceConsumer.DoWork();
})
.Switch()
.Select(n =>
{
completed.OnNext(true);
return n;
});
observableSequence.Subscribe(n => Console.WriteLine("In observer: {0}", n));
Console.Read();
}
Question
Is this backpressure, and if not does it have another terminology associated?

You're basically implementing a form of locking, or a mutex. Your code an cause backpressure, it's not really handling it.
Imagine if your source wasn't a generator function, but rather a series of data pushes. The data pushes arrive at a constant rate of every millisecond. It takes you 10 Millis to process each one, and your code forces serial processing. This causes backpressure: Zip will queue up the unprocessed datapushes infinitely until you run out of memory.

C# Running many async tasks the same time

I'm kinda new to async tasks.
I've a function that takes student ID and scrapes data from specific university website with the required ID.
private static HttpClient client = new HttpClient();
public static async Task<Student> ParseAsync(string departmentLink, int id, CancellationToken ct)
{
string website = string.Format(departmentLink, id);
try
{
string data;
var stream = await client.GetAsync(website, ct);
using (var reader = new StreamReader(await stream.Content.ReadAsStreamAsync(), Encoding.GetEncoding("windows-1256")))
data = reader.ReadToEnd();
//Parse data here and return Student.
} catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
And it works correctly. Sometimes though I need to run this function for a lot of students so I use the following
for(int i = ids.first; i <= ids.last; i++)
{
tasks[i - ids.first] = ParseStudentData.ParseAsync(entity.Link, i, cts.Token).ContinueWith(t =>
{
Dispatcher.Invoke(() =>
{
listview_students.Items.Add(t.Result);
//Students.Add(t.Result);
//lbl_count.Content = $"{listview_students.Items.Count}/{testerino.Length}";
});
});
}
I'm storing tasks in an array to wait for them later.
This also works finely as long as the students count is between (0, ~600?) it's kinda random.
And then for every other student that still hasn't been parsed throws A task was cancelled.
Keep in mind that, I never use the cancellation token at all.
I need to run this function on so many students it can reach ~9000 async task altogether. So what's happening?

You are basically creating a denial of service attack on the website when you are queuing up 9000 requests in such a short time frame. Not only is this causing you errors, but it could take down the website. It would be best to limit the number of concurrent requests to a more reasonable value (say 30). While there are probably several ways to do this, one that comes to mind is the following:
private async Task Test()
{
var tasks = new List<Task>();
for (int i = ids.first; i <= ids.last; i++)
{
tasks.Add(/* Do stuff */);
await WaitList(tasks, 30);
}
}
private async Task WaitList(IList<Task> tasks, int maxSize)
{
while (tasks.Count > maxSize)
{
var completed = await Task.WhenAny(tasks).ConfigureAwait(false);
tasks.Remove(completed);
}
}
Other approaches might leverage the producer/consumer pattern using .Net classes such as a BlockingCollection

This is what I ended up with based on #erdomke code:
public static async Task ForEachParallel<T>(
this IEnumerable<T> list,
Func<T, Task> action,
int dop)
{
var tasks = new List<Task>(dop);
foreach (var item in list)
{
tasks.Add(action(item));
while (tasks.Count >= dop)
{
var completed = await Task.WhenAny(tasks).ConfigureAwait(false);
tasks.Remove(completed);
}
}
// Wait for all remaining tasks.
await Task.WhenAll(tasks).ConfigureAwait(false);
}
// usage
await Enumerable
.Range(1, 500)
.ForEachParallel(i => ProcessItem(i), Environment.ProcessorCount);

Access results in Enumerable.Range without having to wait for all tasks to complete

I have a console app, where I need to access some url 200 times, wait for all of the requests to return and work on the 200 results.
I did it like that, in parallel:
var classNameTasks = Enumerable.Range(1, 200).Select(i => webApi.getSplittedClassName()).ToArray();
string[][] splittedClassNames = await Task.WhenAll(classNameTasks);
if (splittedClassNames[0] == null)
result = new TextResult("Error accessing the web");
getSplittedClassName returns a string[], if the internet is off it will return null.
Now, as you can see, after the completion of all the tasks, I do an if to check the content, if its null -> internet issues.
The problem here is that I need to wait for the whole 200 requests to return before I can check the content.
I am looking for a way to right away detect a scenario where there is no internet, and I return null, without having to wait for the 200 requests.

To do this you need
A CancellationTokenSource to signal that the job is done.
The WhenAll method from Tortuga.Anchor
static async Task Test()
{
TextResult result;
var cts = new CancellationTokenSource();
var classNameTasks = Enumerable.Range(1, 200).Select(i => getSplittedClassName(cts)).ToArray();
await classNameTasks.WhenAll(cts.Token);
if (cts.IsCancellationRequested)
result = new TextResult("Error accessing the web");
string[][] splittedClassNames = classNameTasks.Select(t => t.Result).ToArray();
}
private static async Task<string[]> getSplittedClassName(CancellationTokenSource cts)
{
try
{
//real code goes here
await Task.Delay(1000, cts.Token); //the token would be passed to the real web method
return new string[0];
}
catch
{
cts.Cancel(); //stop trying
return null;
}
}

How to limit the Maximum number of parallel tasks in c#

I have a collection of 1000 input message to process. I'm looping the input collection and starting the new task for each message to get processed.
//Assume this messages collection contains 1000 items
var messages = new List<string>();
foreach (var msg in messages)
{
Task.Factory.StartNew(() =>
{
Process(msg);
});
}
Can we guess how many maximum messages simultaneously get processed at the time (assuming normal Quad core processor), or can we limit the maximum number of messages to be processed at the time?
How to ensure this message get processed in the same sequence/order of the Collection?

You could use Parallel.Foreach and rely on MaxDegreeOfParallelism instead.
Parallel.ForEach(messages, new ParallelOptions {MaxDegreeOfParallelism = 10},
msg =>
{
// logic
Process(msg);
});

SemaphoreSlim is a very good solution in this case and I higly recommend OP to try this, but #Manoj's answer has flaw as mentioned in comments.semaphore should be waited before spawning the task like this.
Updated Answer: As #Vasyl pointed out Semaphore may be disposed before completion of tasks and will raise exception when Release() method is called so before exiting the using block must wait for the completion of all created Tasks.
int maxConcurrency=10;
var messages = new List<string>();
using(SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
List<Task> tasks = new List<Task>();
foreach(var msg in messages)
{
concurrencySemaphore.Wait();
var t = Task.Factory.StartNew(() =>
{
try
{
Process(msg);
}
finally
{
concurrencySemaphore.Release();
}
});
tasks.Add(t);
}
Task.WaitAll(tasks.ToArray());
}
Answer to Comments
for those who want to see how semaphore can be disposed without Task.WaitAll
Run below code in console app and this exception will be raised.
System.ObjectDisposedException: 'The semaphore has been disposed.'
static void Main(string[] args)
{
int maxConcurrency = 5;
List<string> messages = Enumerable.Range(1, 15).Select(e => e.ToString()).ToList();
using (SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
List<Task> tasks = new List<Task>();
foreach (var msg in messages)
{
concurrencySemaphore.Wait();
var t = Task.Factory.StartNew(() =>
{
try
{
Process(msg);
}
finally
{
concurrencySemaphore.Release();
}
});
tasks.Add(t);
}
// Task.WaitAll(tasks.ToArray());
}
Console.WriteLine("Exited using block");
Console.ReadKey();
}
private static void Process(string msg)
{
Thread.Sleep(2000);
Console.WriteLine(msg);
}

I think it would be better to use Parallel LINQ
Parallel.ForEach(messages ,
new ParallelOptions{MaxDegreeOfParallelism = 4},
x => Process(x);
);
where x is the MaxDegreeOfParallelism

With .NET 5.0 and Core 3.0 channels were introduced.
The main benefit of this producer/consumer concurrency pattern is that you can also limit the input data processing to reduce resource impact.
This is especially helpful when processing millions of data records.
Instead of reading the whole dataset at once into memory, you can now consecutively query only chunks of the data and wait for the workers to process it before querying more.
Code sample with a queue capacity of 50 messages and 5 consumer threads:
/// <exception cref="System.AggregateException">Thrown on Consumer Task exceptions.</exception>
public static async Task ProcessMessages(List<string> messages)
{
const int producerCapacity = 10, consumerTaskLimit = 3;
var channel = Channel.CreateBounded<string>(producerCapacity);
_ = Task.Run(async () =>
{
foreach (var msg in messages)
{
await channel.Writer.WriteAsync(msg);
// blocking when channel is full
// waiting for the consumer tasks to pop messages from the queue
}
channel.Writer.Complete();
// signaling the end of queue so that
// WaitToReadAsync will return false to stop the consumer tasks
});
var tokenSource = new CancellationTokenSource();
CancellationToken ct = tokenSource.Token;
var consumerTasks = Enumerable
.Range(1, consumerTaskLimit)
.Select(_ => Task.Run(async () =>
{
try
{
while (await channel.Reader.WaitToReadAsync(ct))
{
ct.ThrowIfCancellationRequested();
while (channel.Reader.TryRead(out var message))
{
await Task.Delay(500);
Console.WriteLine(message);
}
}
}
catch (OperationCanceledException) { }
catch
{
tokenSource.Cancel();
throw;
}
}))
.ToArray();
Task waitForConsumers = Task.WhenAll(consumerTasks);
try { await waitForConsumers; }
catch
{
foreach (var e in waitForConsumers.Exception.Flatten().InnerExceptions)
Console.WriteLine(e.ToString());
throw waitForConsumers.Exception.Flatten();
}
}
As pointed out by Theodor Zoulias:
On multiple consumer exceptions, the remaining tasks will continue to run and have to take the load of the killed tasks. To avoid this, I implemented a CancellationToken to stop all the remaining tasks and handle the exceptions combined in the AggregateException of waitForConsumers.Exception.
Side note:
The Task Parallel Library (TPL) might be good at automatically limiting the tasks based on your local resources. But when you are processing data remotely via RPC, it's necessary to manually limit your RPC calls to avoid filling the network/processing stack!

If your Process method is async you can't use Task.Factory.StartNew as it doesn't play well with an async delegate. Also there are some other nuances when using it (see this for example).
The proper way to do it in this case is to use Task.Run. Here's #ClearLogic answer modified for an async Process method.
static void Main(string[] args)
{
int maxConcurrency = 5;
List<string> messages = Enumerable.Range(1, 15).Select(e => e.ToString()).ToList();
using (SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
List<Task> tasks = new List<Task>();
foreach (var msg in messages)
{
concurrencySemaphore.Wait();
var t = Task.Run(async () =>
{
try
{
await Process(msg);
}
finally
{
concurrencySemaphore.Release();
}
});
tasks.Add(t);
}
Task.WaitAll(tasks.ToArray());
}
Console.WriteLine("Exited using block");
Console.ReadKey();
}
private static async Task Process(string msg)
{
await Task.Delay(2000);
Console.WriteLine(msg);
}

You can create your own TaskScheduler and override QueueTask there.
protected virtual void QueueTask(Task task)
Then you can do anything you like.
One example here:
Limited concurrency level task scheduler (with task priority) handling wrapped tasks

You can simply set the max concurrency degree like this way:
int maxConcurrency=10;
var messages = new List<1000>();
using(SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
foreach(var msg in messages)
{
Task.Factory.StartNew(() =>
{
concurrencySemaphore.Wait();
try
{
Process(msg);
}
finally
{
concurrencySemaphore.Release();
}
});
}
}

If you need in-order queuing (processing might finish in any order), there is no need for a semaphore. Old fashioned if statements work fine:
const int maxConcurrency = 5;
List<Task> tasks = new List<Task>();
foreach (var arg in args)
{
var t = Task.Run(() => { Process(arg); } );
tasks.Add(t);
if(tasks.Count >= maxConcurrency)
Task.WaitAny(tasks.ToArray());
}
Task.WaitAll(tasks.ToArray());

I ran into a similar problem where I wanted to produce 5000 results while calling apis, etc. So, I ran some speed tests.
Parallel.ForEach(products.Select(x => x.KeyValue).Distinct().Take(100), id =>
{
new ParallelOptions { MaxDegreeOfParallelism = 100 };
GetProductMetaData(productsMetaData, client, id).GetAwaiter().GetResult();
});
produced 100 results in 30 seconds.
Parallel.ForEach(products.Select(x => x.KeyValue).Distinct().Take(100), id =>
{
new ParallelOptions { MaxDegreeOfParallelism = 100 };
GetProductMetaData(productsMetaData, client, id);
});
Moving the GetAwaiter().GetResult() to the individual async api calls inside GetProductMetaData resulted in 14.09 seconds to produce 100 results.
foreach (var id in ids.Take(100))
{
GetProductMetaData(productsMetaData, client, id);
}
Complete non-async programming with the GetAwaiter().GetResult() in api calls resulted in 13.417 seconds.
var tasks = new List<Task>();
while (y < ids.Count())
{
foreach (var id in ids.Skip(y).Take(100))
{
tasks.Add(GetProductMetaData(productsMetaData, client, id));
}
y += 100;
Task.WhenAll(tasks).GetAwaiter().GetResult();
Console.WriteLine($"Finished {y}, {sw.Elapsed}");
}
Forming a task list and working through 100 at a time resulted in a speed of 7.36 seconds.
using (SemaphoreSlim cons = new SemaphoreSlim(10))
{
var tasks = new List<Task>();
foreach (var id in ids.Take(100))
{
cons.Wait();
var t = Task.Factory.StartNew(() =>
{
try
{
GetProductMetaData(productsMetaData, client, id);
}
finally
{
cons.Release();
}
});
tasks.Add(t);
}
Task.WaitAll(tasks.ToArray());
}
Using SemaphoreSlim resulted in 13.369 seconds, but also took a moment to boot to start using it.
var throttler = new SemaphoreSlim(initialCount: take);
foreach (var id in ids)
{
throttler.WaitAsync().GetAwaiter().GetResult();
tasks.Add(Task.Run(async () =>
{
try
{
skip += 1;
await GetProductMetaData(productsMetaData, client, id);
if (skip % 100 == 0)
{
Console.WriteLine($"started {skip}/{count}, {sw.Elapsed}");
}
}
finally
{
throttler.Release();
}
}));
}
Using Semaphore Slim with a throttler for my async task took 6.12 seconds.
The answer for me in this specific project was use a throttler with Semaphore Slim. Although the while foreach tasklist did sometimes beat the throttler, 4/6 times the throttler won for 1000 records.
I realize I'm not using the OPs code, but I think this is important and adds to this discussion because how is sometimes not the only question that should be asked, and the answer is sometimes "It depends on what you are trying to do."
Now to answer the specific questions:
How to limit the maximum number of parallel tasks in c#: I showed how to limit the number of tasks that are completed at a time.
Can we guess how many maximum messages simultaneously get processed at the time (assuming normal Quad core processor), or can we limit the maximum number of messages to be processed at the time? I cannot guess how many will be processed at a time unless I set an upper limit but I can set an upper limit. Obviously different computers function at different speeds due to CPU, RAM etc. and how many threads and cores the program itself has access to as well as other programs running in tandem on the same computer.
How to ensure this message get processed in the same sequence/order of the Collection? If you want to process everything in a specific order, it is synchronous programming. The point of being able to run things asynchronously is ensuring that they can do everything without an order. As you can see from my code, the time difference is minimal in 100 records unless you use async code. In the event that you need an order to what you are doing, use asynchronous programming up until that point, then await and do things synchronously from there. For example, task1a.start, task2a.start, then later task1a.await, task2a.await... then later task1b.start task1b.await and task2b.start task 2b.await.

public static void RunTasks(List<NamedTask> importTaskList)
{
List<NamedTask> runningTasks = new List<NamedTask>();
try
{
foreach (NamedTask currentTask in importTaskList)
{
currentTask.Start();
runningTasks.Add(currentTask);
if (runningTasks.Where(x => x.Status == TaskStatus.Running).Count() >= MaxCountImportThread)
{
Task.WaitAny(runningTasks.ToArray());
}
}
Task.WaitAll(runningTasks.ToArray());
}
catch (Exception ex)
{
Log.Fatal("ERROR!", ex);
}
}

you can use the BlockingCollection, If the consume collection limit has reached, the produce will stop producing until a consume process will finish. I find this pattern more easy to understand and implement than the SemaphoreSlim.
int TasksLimit = 10;
BlockingCollection<Task> tasks = new BlockingCollection<Task>(new ConcurrentBag<Task>(), TasksLimit);
void ProduceAndConsume()
{
var producer = Task.Factory.StartNew(RunProducer);
var consumer = Task.Factory.StartNew(RunConsumer);
try
{
Task.WaitAll(new[] { producer, consumer });
}
catch (AggregateException ae) { }
}
void RunConsumer()
{
foreach (var task in tasks.GetConsumingEnumerable())
{
task.Start();
}
}
void RunProducer()
{
for (int i = 0; i < 1000; i++)
{
tasks.Add(new Task(() => Thread.Sleep(1000), TaskCreationOptions.AttachedToParent));
}
}
Note that the RunProducer and RunConsumer has spawn two independent tasks.

Asynchronous Task.WhenAll with timeout

Is there a way in the new async dotnet 4.5 library to set a timeout on the Task.WhenAll method? I want to fetch several sources, and stop after say 5 seconds, and skip the sources that weren't finished.

You could combine the resulting Task with a Task.Delay() using Task.WhenAny():
await Task.WhenAny(Task.WhenAll(tasks), Task.Delay(timeout));
If you want to harvest completed tasks in case of a timeout:
var completedResults =
tasks
.Where(t => t.Status == TaskStatus.RanToCompletion)
.Select(t => t.Result)
.ToList();

I think a clearer, more robust option that also does exception handling right would be to use Task.WhenAny on each task together with a timeout task, go through all the completed tasks and filter out the timeout ones, and use await Task.WhenAll() instead of Task.Result to gather all the results.
Here's a complete working solution:
static async Task<TResult[]> WhenAll<TResult>(IEnumerable<Task<TResult>> tasks, TimeSpan timeout)
{
var timeoutTask = Task.Delay(timeout).ContinueWith(_ => default(TResult));
var completedTasks =
(await Task.WhenAll(tasks.Select(task => Task.WhenAny(task, timeoutTask)))).
Where(task => task != timeoutTask);
return await Task.WhenAll(completedTasks);
}

Check out the "Early Bailout" and "Task.Delay" sections from Microsoft's Consuming the Task-based Asynchronous Pattern.
Early bailout. An operation represented by t1 can be grouped in a
WhenAny with another task t2, and we can wait on the WhenAny task. t2
could represent a timeout, or cancellation, or some other signal that
will cause the WhenAny task to complete prior to t1 completing.

What you describe seems like a very common demand however I could not find anywhere an example of this. And I searched a lot... I finally created the following:
TimeSpan timeout = TimeSpan.FromSeconds(5.0);
Task<Task>[] tasksOfTasks =
{
Task.WhenAny(SomeTaskAsync("a"), Task.Delay(timeout)),
Task.WhenAny(SomeTaskAsync("b"), Task.Delay(timeout)),
Task.WhenAny(SomeTaskAsync("c"), Task.Delay(timeout))
};
Task[] completedTasks = await Task.WhenAll(tasksOfTasks);
List<MyResult> = completedTasks.OfType<Task<MyResult>>().Select(task => task.Result).ToList();
I assume here a method SomeTaskAsync that returns Task<MyResult>.
From the members of completedTasks, only tasks of type MyResult are our own tasks that managed to beat the clock. Task.Delay returns a different type.
This requires some compromise on typing, but still works beautifully and quite simple.
(The array can of course be built dynamically using a query + ToArray).
Note that this implementation does not require SomeTaskAsync to receive a cancellation token.

In addition to timeout, I also check the cancellation which is useful if you are building a web app.
public static async Task WhenAll(
IEnumerable<Task> tasks,
int millisecondsTimeOut,
CancellationToken cancellationToken)
{
using(Task timeoutTask = Task.Delay(millisecondsTimeOut))
using(Task cancellationMonitorTask = Task.Delay(-1, cancellationToken))
{
Task completedTask = await Task.WhenAny(
Task.WhenAll(tasks),
timeoutTask,
cancellationMonitorTask
);
if (completedTask == timeoutTask)
{
throw new TimeoutException();
}
if (completedTask == cancellationMonitorTask)
{
throw new OperationCanceledException();
}
await completedTask;
}
}

Check out a custom task combinator proposed in http://tutorials.csharp-online.net/Task_Combinators
async static Task<TResult> WithTimeout<TResult>
(this Task<TResult> task, TimeSpan timeout)
{
Task winner = await (Task.WhenAny
(task, Task.Delay (timeout)));
if (winner != task) throw new TimeoutException();
return await task; // Unwrap result/re-throw
}
I have not tried it yet.

void result version of #i3arnon 's answer, along with comments and changing first argument to use extension this.
I've also got a forwarding method specifying timeout as an int using TimeSpan.FromMilliseconds(millisecondsTimeout) to match other Task methods.
public static async Task WhenAll(this IEnumerable<Task> tasks, TimeSpan timeout)
{
// Create a timeout task.
var timeoutTask = Task.Delay(timeout);
// Get the completed tasks made up of...
var completedTasks =
(
// ...all tasks specified
await Task.WhenAll(tasks
// Now finish when its task has finished or the timeout task finishes
.Select(task => Task.WhenAny(task, timeoutTask)))
)
// ...but not the timeout task
.Where(task => task != timeoutTask);
// And wait for the internal WhenAll to complete.
await Task.WhenAll(completedTasks);
}

Seems like the Task.WaitAll overload with the timeout parameter is all you need - if it returns true, then you know they all completed - otherwise, you can filter on IsCompleted.
if (Task.WaitAll(tasks, myTimeout) == false)
{
tasks = tasks.Where(t => t.IsCompleted);
}
...

I came to the following piece of code that does what I needed:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Net.Http;
using System.Json;
using System.Threading;
namespace MyAsync
{
class Program
{
static void Main(string[] args)
{
var cts = new CancellationTokenSource();
Console.WriteLine("Start Main");
List<Task<List<MyObject>>> listoftasks = new List<Task<List<MyObject>>>();
listoftasks.Add(GetGoogle(cts));
listoftasks.Add(GetTwitter(cts));
listoftasks.Add(GetSleep(cts));
listoftasks.Add(GetxSleep(cts));
List<MyObject>[] arrayofanswers = Task.WhenAll(listoftasks).Result;
List<MyObject> answer = new List<MyObject>();
foreach (List<MyObject> answers in arrayofanswers)
{
answer.AddRange(answers);
}
foreach (MyObject o in answer)
{
Console.WriteLine("{0} - {1}", o.name, o.origin);
}
Console.WriteLine("Press <Enter>");
Console.ReadLine();
}
static async Task<List<MyObject>> GetGoogle(CancellationTokenSource cts)
{
try
{
Console.WriteLine("Start GetGoogle");
List<MyObject> l = new List<MyObject>();
var client = new HttpClient();
Task<HttpResponseMessage> awaitable = client.GetAsync("http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=broersa", cts.Token);
HttpResponseMessage res = await awaitable;
Console.WriteLine("After GetGoogle GetAsync");
dynamic data = JsonValue.Parse(res.Content.ReadAsStringAsync().Result);
Console.WriteLine("After GetGoogle ReadAsStringAsync");
foreach (var r in data.responseData.results)
{
l.Add(new MyObject() { name = r.titleNoFormatting, origin = "google" });
}
return l;
}
catch (TaskCanceledException)
{
return new List<MyObject>();
}
}
static async Task<List<MyObject>> GetTwitter(CancellationTokenSource cts)
{
try
{
Console.WriteLine("Start GetTwitter");
List<MyObject> l = new List<MyObject>();
var client = new HttpClient();
Task<HttpResponseMessage> awaitable = client.GetAsync("http://search.twitter.com/search.json?q=broersa&rpp=5&include_entities=true&result_type=mixed",cts.Token);
HttpResponseMessage res = await awaitable;
Console.WriteLine("After GetTwitter GetAsync");
dynamic data = JsonValue.Parse(res.Content.ReadAsStringAsync().Result);
Console.WriteLine("After GetTwitter ReadAsStringAsync");
foreach (var r in data.results)
{
l.Add(new MyObject() { name = r.text, origin = "twitter" });
}
return l;
}
catch (TaskCanceledException)
{
return new List<MyObject>();
}
}
static async Task<List<MyObject>> GetSleep(CancellationTokenSource cts)
{
try
{
Console.WriteLine("Start GetSleep");
List<MyObject> l = new List<MyObject>();
await Task.Delay(5000,cts.Token);
l.Add(new MyObject() { name = "Slept well", origin = "sleep" });
return l;
}
catch (TaskCanceledException)
{
return new List<MyObject>();
}
}
static async Task<List<MyObject>> GetxSleep(CancellationTokenSource cts)
{
Console.WriteLine("Start GetxSleep");
List<MyObject> l = new List<MyObject>();
await Task.Delay(2000);
cts.Cancel();
l.Add(new MyObject() { name = "Slept short", origin = "xsleep" });
return l;
}
}
}
My explanation is in my blogpost:
http://blog.bekijkhet.com/2012/03/c-async-examples-whenall-whenany.html

In addition to svick's answer, the following works for me when I have to wait for a couple of tasks to complete but have to process something else while I'm waiting:
Task[] TasksToWaitFor = //Your tasks
TimeSpan Timeout = TimeSpan.FromSeconds( 30 );
while( true )
{
await Task.WhenAny( Task.WhenAll( TasksToWaitFor ), Task.Delay( Timeout ) );
if( TasksToWaitFor.All( a => a.IsCompleted ) )
break;
//Do something else here
}

You can use the following code:
var timeoutTime = 10;
var tasksResult = await Task.WhenAll(
listOfTasks.Select(x => Task.WhenAny(
x, Task.Delay(TimeSpan.FromMinutes(timeoutTime)))
)
);
var succeededtasksResponses = tasksResult
.OfType<Task<MyResult>>()
.Select(task => task.Result);
if (succeededtasksResponses.Count() != listOfTasks.Count())
{
// Not all tasks were completed
// Throw error or do whatever you want
}
//You can use the succeededtasksResponses that contains the list of successful responses
How it works:
You need to put in the timeoutTime variable the limit of time for all tasks to be completed. So basically all tasks will wait in maximum the time that you set in timeoutTime. When all the tasks return the result, the timeout will not occur and the tasksResult will be set.
After that we are only getting the completed tasks. The tasks that were not completed will have no results.

I tried to improve on the excellent i3arnon's solution, in order to fix some minor issues, but I ended up with a completely different implementation. The two issues that I tried to solve are:
In case more than one of the tasks fail, propagate the errors of all failed tasks, and not just the error of the first failed task in the list.
Prevent memory leaks in case all tasks complete much faster than the timeout.
Leaking an active Task.Delay might result in a non-negligible amount of leaked memory, in case the WhenAll is called in a loop, and the timeout is large.
On top of that I added a cancellationToken argument, XML documentation that explains what this method is doing, and argument validation. Here it is:
/// <summary>
/// Returns a task that will complete when all of the tasks have completed,
/// or when the timeout has elapsed, or when the token is canceled, whatever
/// comes first. In case the tasks complete first, the task contains the
/// results/exceptions of all the tasks. In case the timeout elapsed first,
/// the task contains the results/exceptions of the completed tasks only.
/// In case the token is canceled first, the task is canceled. To determine
/// whether a timeout has occured, compare the number of the results with
/// the number of the tasks.
/// </summary>
public static Task<TResult[]> WhenAll<TResult>(
Task<TResult>[] tasks,
TimeSpan timeout, CancellationToken cancellationToken = default)
{
if (tasks == null) throw new ArgumentNullException(nameof(tasks));
tasks = tasks.ToArray(); // Defensive copy
if (tasks.Any(t => t == null)) throw new ArgumentException(
$"The {nameof(tasks)} argument included a null value.", nameof(tasks));
if (timeout < TimeSpan.Zero && timeout != Timeout.InfiniteTimeSpan)
throw new ArgumentOutOfRangeException(nameof(timeout));
if (cancellationToken.IsCancellationRequested)
return Task.FromCanceled<TResult[]>(cancellationToken);
var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
cts.CancelAfter(timeout);
var continuationOptions = TaskContinuationOptions.DenyChildAttach |
TaskContinuationOptions.ExecuteSynchronously;
var continuations = tasks.Select(task => task.ContinueWith(_ => { },
cts.Token, continuationOptions, TaskScheduler.Default));
return Task.WhenAll(continuations).ContinueWith(allContinuations =>
{
cts.Dispose();
if (allContinuations.IsCompletedSuccessfully)
return Task.WhenAll(tasks); // No timeout or cancellation occurred
Debug.Assert(allContinuations.IsCanceled);
if (cancellationToken.IsCancellationRequested)
return Task.FromCanceled<TResult[]>(cancellationToken);
// Now we know that timeout has occurred
return Task.WhenAll(tasks.Where(task => task.IsCompleted));
}, default, continuationOptions, TaskScheduler.Default).Unwrap();
}
This WhenAll implementation elides async and await, which is not advisable in general. In this case it is necessary, in order to propagate all the errors in a not nested AggregateException. The intention is to simulate the behavior of the built-in Task.WhenAll method as accurately as possible.
Usage example:
string[] results;
Task<string[]> whenAllTask = WhenAll(tasks, TimeSpan.FromSeconds(15));
try
{
results = await whenAllTask;
}
catch when (whenAllTask.IsFaulted) // It might also be canceled
{
// Log all errors
foreach (var innerEx in whenAllTask.Exception.InnerExceptions)
{
_logger.LogError(innerEx, innerEx.Message);
}
throw; // Propagate the error of the first failed task
}
if (results.Length < tasks.Length) throw new TimeoutException();
return results;
Note: the above API has a design flaw. In case at least one of the tasks has failed or has been canceled, there is no way to determine whether a timeout has occurred. The Exception.InnerExceptions property of the task returned by the WhenAll may contain the exceptions of all tasks, or part of the tasks, and there is no way to say which is which. Unfortunately I can't think of a solution to this problem.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Observable.Retry doesn't work as expected - c#

Related

RX terminolgy: Async processing in RX operator when there are frequent observable notifications

C# Running many async tasks the same time

Access results in Enumerable.Range without having to wait for all tasks to complete

How to limit the Maximum number of parallel tasks in c#

Asynchronous Task.WhenAll with timeout

Categories

Resources