How to await multiple IAsyncEnumerable - c#

We have code like this:
var intList = new List<int> { 1, 2, 3 };
var asyncEnumerables = intList.Select(Foo);

private async IAsyncEnumerable<int> Foo(int a)
{
    while (true)
    {
        await Task.Delay(5000);
        yield return a;
    }
}
I need to start an await foreach loop for every entry in asyncEnumerables. Each loop iteration should wait for the corresponding iteration of the others, and when every iteration is done I need to collect each iteration's data and process it with another method.
Can I somehow achieve that with the TPL? If not, could you give me some ideas?

What works for me is the Zip function in this repo (line 81). I'm using it like this:
var intList = new List<int> { 1, 2, 3 };
var asyncEnumerables = intList.Select(RunAsyncIterations);
var enumerableToIterate = async_enumerable_dotnet.AsyncEnumerable.Zip(s => s, asyncEnumerables.ToArray());

await foreach (int[] enumerablesConcatenation in enumerableToIterate)
{
    Console.WriteLine(enumerablesConcatenation.Sum()); // Sum returns 6
    await Task.Delay(2000);
}

static async IAsyncEnumerable<int> RunAsyncIterations(int i)
{
    while (true)
        yield return i;
}

Here is a generic Zip method you could use, implemented as an iterator. The cancellationToken parameter is decorated with the EnumeratorCancellation attribute, so that the resulting IAsyncEnumerable is WithCancellation-friendly.
using System.Runtime.CompilerServices;

public static async IAsyncEnumerable<TSource[]> Zip<TSource>(
    IEnumerable<IAsyncEnumerable<TSource>> sources,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    var enumerators = sources
        .Select(x => x.GetAsyncEnumerator(cancellationToken))
        .ToArray();
    try
    {
        while (true)
        {
            var array = new TSource[enumerators.Length];
            for (int i = 0; i < enumerators.Length; i++)
            {
                if (!await enumerators[i].MoveNextAsync()) yield break;
                array[i] = enumerators[i].Current;
            }
            yield return array;
        }
    }
    finally
    {
        foreach (var enumerator in enumerators)
        {
            await enumerator.DisposeAsync();
        }
    }
}
Usage example:
await foreach (int[] result in Zip(asyncEnumerables))
{
    Console.WriteLine($"Result: {String.Join(", ", result)}");
}
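And because the iterator honors EnumeratorCancellation, cancellation can be hooked up like this (a sketch, assuming an existing CancellationTokenSource named cts):
await foreach (int[] result in Zip(asyncEnumerables).WithCancellation(cts.Token))
{
    Console.WriteLine($"Result: {String.Join(", ", result)}");
}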

Related

How do I run a method both parallel and sequentially in C#?

I have a C# console app. In this app, I have a method that I will call DoWorkAsync. For the context of this question, this method looks like this:
private async Task<string> DoWorkAsync()
{
    System.Threading.Thread.Sleep(5000);
    var random = new Random();
    var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    var length = random.Next(10, 101);
    await Task.CompletedTask;
    return new string(Enumerable.Repeat(chars, length)
        .Select(s => s[random.Next(s.Length)]).ToArray());
}
I call DoWorkAsync from another method that determines a) how many times it will get run and b) whether each call will be run in parallel or sequentially. That method looks like this:
private async Task<Task<string>[]> DoWork(int iterations, bool runInParallel)
{
    var tasks = new List<Task<string>>();
    for (var i = 0; i < iterations; i++)
    {
        if (runInParallel)
        {
            var task = Task.Run(() => DoWorkAsync());
            tasks.Add(task);
        }
        else
        {
            await DoWorkAsync();
        }
    }
    return tasks.ToArray();
}
After all of the tasks are completed, I want to display the results. To do this, I have code that looks like this:
var random = new Random();
var tasks = await DoWork(random.Next(10, 101), runInParallel);
Task.WaitAll(tasks);
foreach (var task in tasks)
{
    Console.WriteLine(task.Result);
}
This code works as expected when runInParallel is true. However, when runInParallel is false (i.e. I want to run the Tasks sequentially), the Task array doesn't get populated, so the caller doesn't have any results to work with. I don't know how to fix it, though. I'm not sure how to add the method call as a Task that will run sequentially. I understand that the idea behind Tasks is to run in parallel; however, I have this need to toggle between parallel and sequential.
Thank you!
the Task array doesn't get populated.
So populate it:
else
{
    var task = DoWorkAsync();
    tasks.Add(task);
    await task;
}
P.S.
Also, your DoWorkAsync looks kinda wrong to me: why Thread.Sleep and not await Task.Delay? (That is a more correct way to simulate asynchronous execution, and you won't need await Task.CompletedTask this way.) And if you expect DoWorkAsync to be CPU-bound, just make it like:
private Task<string> DoWorkAsync()
{
    return Task.Run(() =>
    {
        // your cpu bound work
        return "string";
    });
}
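(If the goal is only to simulate asynchronous work rather than CPU-bound work, a minimal sketch of the Task.Delay variant mentioned above could be:)
private async Task<string> DoWorkAsync()
{
    await Task.Delay(5000); // asynchronous wait instead of blocking the thread
    var random = new Random();
    var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    var length = random.Next(10, 101);
    return new string(Enumerable.Repeat(chars, length)
        .Select(s => s[random.Next(s.Length)]).ToArray());
}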
After that you can do something like this (for both async/cpu bound work):
private async Task<string[]> DoWork(int iterations, bool runInParallel)
{
    if (runInParallel)
    {
        var tasks = Enumerable.Range(0, iterations)
            .Select(i => DoWorkAsync());
        return await Task.WhenAll(tasks);
    }
    else
    {
        var result = new string[iterations];
        for (var i = 0; i < iterations; i++)
        {
            result[i] = await DoWorkAsync();
        }
        return result;
    }
}
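The caller then no longer needs Task.WaitAll or the Task<Task<string>[]> shape; a sketch of the call site:
var random = new Random();
string[] results = await DoWork(random.Next(10, 101), runInParallel: true);
foreach (var result in results)
{
    Console.WriteLine(result);
}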
Why is DoWorkAsync an async method?
It isn't currently doing anything asynchronous.
It seems that you are trying to utilise multiple threads to improve the performance of expensive CPU-bound work, so you would be better off making use of Parallel.For, which is designed for this purpose:
private string DoWork()
{
    System.Threading.Thread.Sleep(5000);
    var random = new Random();
    var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    var length = random.Next(10, 101);
    return new string(Enumerable.Repeat(chars, length)
        .Select(s => s[random.Next(s.Length)]).ToArray());
}
private string[] DoWork(int iterations, bool runInParallel)
{
    var results = new string[iterations];
    if (runInParallel)
    {
        // Parallel.For's upper bound is exclusive, so pass iterations, not iterations - 1
        Parallel.For(0, iterations, i => results[i] = DoWork());
    }
    else
    {
        for (int i = 0; i < iterations; i++) results[i] = DoWork();
    }
    return results;
}
Then:
var random = new Random();
var serial = DoWork(random.Next(10, 101), false);
var parallel = DoWork(random.Next(10, 101), true);
I think you'd be better off doing the following:
Create a function that creates a (cold) list of tasks (or an array Task<string>[] for instance). No need to run them. Let's call this GetTasks()
var jobs = GetTasks();
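A minimal sketch of what GetTasks() might look like (hypothetical; it wraps a synchronous DoWork() in the Task constructor, so the tasks are created cold and only run once Start() is called):
private Task<string>[] GetTasks()
{
    // build a fixed number of cold tasks around the synchronous work
    return Enumerable.Range(0, 10)
        .Select(_ => new Task<string>(() => DoWork()))
        .ToArray();
}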
Then, if you want to run them "sequentially", just do
var results = new List<string>();
foreach (var job in jobs)
{
    job.Start(); // cold tasks must be started before they can complete
    var result = await job;
    results.Add(result);
}
return results;
If you want to run them in parallel:
foreach (var job in jobs)
{
    job.Start();
}
var results = await Task.WhenAll(jobs);
Another note:
All of this should itself be a Task<string[]>; the Task<Task<... smells like a problem.

What might ChannelReader<T>.ReadAny implementation look like?

BlockingCollection<T> has the handy static TakeFromAny method, allowing you to consume multiple collections: "I want the next item from any of these collections."
ChannelReader<T> doesn't have an equivalent, so if you did want to consume multiple channels into a single stream - say, to print received items to the Console one by one - how might this be done?
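(For reference, the BlockingCollection pattern being described looks roughly like this, with colA and colB assumed to be existing BlockingCollection<int> instances:)
var collections = new[] { colA, colB };
int index = BlockingCollection<int>.TakeFromAny(collections, out int item);
Console.WriteLine($"Took {item} from collection #{index}");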
The fast path is easy, but the slow path is quite tricky. The implementation below returns a Task<ValueTuple<T, int>> that contains the value taken from one of the readers, and the zero-based index of that reader in the input array.
public static Task<(T Item, int Index)> ReadFromAnyAsync<T>(
    params ChannelReader<T>[] channelReaders) =>
    ReadFromAnyAsync(channelReaders, CancellationToken.None);

public static async Task<(T Item, int Index)> ReadFromAnyAsync<T>(
    ChannelReader<T>[] channelReaders,
    CancellationToken cancellationToken)
{
    cancellationToken.ThrowIfCancellationRequested();
    // Fast path
    for (int i = 0; i < channelReaders.Length; i++)
    {
        if (channelReaders[i].TryRead(out var item)) return (item, i);
    }
    // Slow path
    var locker = new object();
    int resultIndex = -1;
    T resultItem = default;
    while (true)
    {
        using (var cts = CancellationTokenSource
            .CreateLinkedTokenSource(cancellationToken, default))
        {
            bool availableAny = false;
            Task[] tasks = channelReaders
                .Select(async (reader, index) =>
                {
                    try
                    {
                        bool available = await reader.WaitToReadAsync(cts.Token)
                            .ConfigureAwait(false);
                        if (!available) return;
                    }
                    catch // Cancellation, or channel completed with exception
                    {
                        return;
                    }
                    availableAny = true;
                    lock (locker) // Take from one reader only
                    {
                        if (resultIndex == -1 && reader.TryRead(out var item))
                        {
                            resultIndex = index;
                            resultItem = item;
                            cts.Cancel();
                        }
                    }
                })
                .ToArray();
            await Task.WhenAll(tasks).ConfigureAwait(false);
            if (resultIndex != -1) return (resultItem, resultIndex);
            cancellationToken.ThrowIfCancellationRequested();
            if (!availableAny) throw new ChannelClosedException(
                "All channels are marked as completed.");
        }
    }
}
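A usage sketch that merges two channels into a single console stream (the producers are assumed; they write to a.Writer / b.Writer and eventually call Complete()):
var a = Channel.CreateUnbounded<int>();
var b = Channel.CreateUnbounded<int>();
// ... producers write to a.Writer and b.Writer, then complete them ...
try
{
    while (true)
    {
        var (item, index) = await ReadFromAnyAsync(a.Reader, b.Reader);
        Console.WriteLine($"Received {item} from channel #{index}");
    }
}
catch (ChannelClosedException)
{
    // all channels have completed
}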

Restricting the enumerations of LINQ queries to One Only

I have a LINQ query that should NOT be enumerated more than once, and I want to avoid enumerating it twice by mistake. Is there any extension method I can use to ensure that I am protected from such a mistake? I am thinking about something like this:
var numbers = Enumerable.Range(1, 10).OnlyOnce();
Console.WriteLine(numbers.Count()); // shows 10
Console.WriteLine(numbers.Count()); // throws InvalidOperationException: The query cannot be enumerated more than once.
The reason I want this functionality is that I have an enumerable of tasks, which is intended to instantiate and run the tasks progressively while it is enumerated slowly under control. I have already made the mistake of running the tasks twice, because I forgot that it's a deferred enumerable and not an array.
var tasks = Enumerable.Range(1, 10).Select(n => Task.Run(() => Console.WriteLine(n)));
Task.WaitAll(tasks.ToArray()); // Let's wait for the tasks to finish...
Console.WriteLine(String.Join(", ", tasks.Select(t => t.Id))); // Let's see the completed task IDs...
// Oops! A new set of tasks started running!
I want to avoid enumerating it twice by mistake.
You can wrap the collection with a collection that throws if it's enumerated twice.
eg:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApp8
{
    public static class EnumExtension
    {
        class OnceEnumerable<T> : IEnumerable<T>
        {
            IEnumerable<T> col;
            bool hasBeenEnumerated = false;

            public OnceEnumerable(IEnumerable<T> col)
            {
                this.col = col;
            }

            public IEnumerator<T> GetEnumerator()
            {
                if (hasBeenEnumerated)
                {
                    throw new InvalidOperationException("This collection has already been enumerated.");
                }
                this.hasBeenEnumerated = true;
                return col.GetEnumerator();
            }

            IEnumerator IEnumerable.GetEnumerator()
            {
                return GetEnumerator();
            }
        }

        public static IEnumerable<T> OnlyOnce<T>(this IEnumerable<T> col)
        {
            return new OnceEnumerable<T>(col);
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var col = Enumerable.Range(1, 10).OnlyOnce();
            var colCount = col.Count(); // first enumeration
            foreach (var c in col)      // second enumeration
            {
                Console.WriteLine(c);
            }
        }
    }
}
Enumerables enumerate, end of story. You just need to call ToList, or ToArray
// this will enumerate and start the tasks
var tasks = Enumerable.Range(1, 10)
    .Select(n => Task.Run(() => Console.WriteLine(n)))
    .ToList();

// wait for them all to finish
Task.WaitAll(tasks.ToArray());

Console.WriteLine(String.Join(", ", tasks.Select(t => t.Id)));
Hrm if you want parallelism
Parallel.For(0, 100, index => Console.WriteLine(index) );
or if you are using async and await pattern
public static async Task DoWorkLoads(IEnumerable<Something> results)
{
    var options = new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 50
    };
    var block = new ActionBlock<Something>(MyMethodAsync, options);

    foreach (var result in results)
        block.Post(result);

    block.Complete();
    await block.Completion;
}
...
public async Task MyMethodAsync(Something result)
{
    await SomethingAsync(result);
}
Update: since you are after a way to control the max degree of concurrency, you could use this:
public static async Task<IEnumerable<Task>> ExecuteInParallel<T>(
    this IEnumerable<T> collection, Func<T, Task> callback, int degreeOfParallelism)
{
    var queue = new ConcurrentQueue<T>(collection);
    var tasks = Enumerable.Range(0, degreeOfParallelism)
        .Select(async _ =>
        {
            while (queue.TryDequeue(out var item))
                await callback(item);
        })
        .ToArray();
    await Task.WhenAll(tasks);
    return tasks;
}
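For example (a sketch; the work inside the callback is just a placeholder):
var items = Enumerable.Range(1, 10);
await items.ExecuteInParallel(async n =>
{
    await Task.Delay(100); // placeholder for real async work
    Console.WriteLine(n);
}, degreeOfParallelism: 3);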
Rx certainly is an option to control parallelism.
var query =
Observable
.Range(1, 10)
.Select(n => Observable.FromAsync(() => Task.Run(() => new { Id = n })));
var tasks = query.Merge(maxConcurrent: 3).ToArray().Wait();
Console.WriteLine(String.Join(", ", tasks.Select(t => t.Id)));

Transform IEnumerable<Task<T>> asynchronously by awaiting each task

Today I was wondering how to transform a list of Tasks by awaiting each of them.
Consider the following example:
private static void Main(string[] args)
{
    try
    {
        Run(args);
        Console.ReadLine();
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.ToString());
        Console.ReadLine();
    }
}

static async Task Run(string[] args)
{
    // Version 1: does compile, but ugly and List<T> overhead
    var tasks1 = GetTasks();
    List<string> gainStrings1 = new List<string>();
    foreach (Task<string> task in tasks1)
    {
        gainStrings1.Add(await task);
    }
    Console.WriteLine(string.Join("", gainStrings1));

    // Version 2: does not compile
    var tasks2 = GetTasks();
    IEnumerable<string> gainStrings2 = tasks2.Select(async t => await t);
    Console.WriteLine(string.Join("", gainStrings2));
}

static IEnumerable<Task<string>> GetTasks()
{
    string[] messages = new[] { "Hello", " ", "async", " ", "World" };
    for (int i = 0; i < messages.Length; i++)
    {
        TaskCompletionSource<string> tcs = new TaskCompletionSource<string>();
        tcs.SetResult(messages[i]);
        yield return tcs.Task;
    }
}
I'd like to transform my list of Tasks without the foreach; however, neither the anonymous function syntax nor the usual function syntax lets me do what my foreach does.
Do I have to rely on my foreach and the List<T>, or is there any way to get it to work with IEnumerable<T> and all its advantages?
What about this:
await Task.WhenAll(tasks1);
var gainStrings = tasks1.Select(t => t.Result).ToList();
Wait for all tasks to end and then extract results. This is ideal if you don't care in which order they are finished.
EDIT2:
Even better way:
var gainStrings = await Task.WhenAll(tasks1);
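Plugged back into the question's Run method, that version could look like this (a sketch):
static async Task Run(string[] args)
{
    var tasks = GetTasks();
    string[] gainStrings = await Task.WhenAll(tasks);
    Console.WriteLine(string.Join("", gainStrings));
}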

Parallel.Foreach + yield return?

I want to process something using a parallel loop, like this:
public void FillLogs(IEnumerable<IComputer> computers)
{
    Parallel.ForEach(computers, cpt =>
    {
        cpt.Logs = cpt.GetRawLogs().ToList();
    });
}
OK, it works fine. But how do I do it if I want the FillLogs method to return an IEnumerable?
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
    Parallel.ForEach(computers, cpt =>
    {
        cpt.Logs = cpt.GetRawLogs().ToList();
        yield return cpt; // KO, doesn't work
    });
}
EDIT
It seems this is not possible... but I use something like this:
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
    return computers.AsParallel().Select(cpt => cpt);
}
But where do I put the cpt.Logs = cpt.GetRawLogs().ToList(); instruction?
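(For what it's worth, one way to slot that assignment into the AsParallel version above is a sketch like this:)
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
    return computers.AsParallel().Select(cpt =>
    {
        cpt.Logs = cpt.GetRawLogs().ToList();
        return cpt;
    });
}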
Short version: no, that isn't possible via an iterator block; the longer version probably involves a synchronized queue/dequeue between the caller's iterator thread (doing the dequeue) and the parallel workers (doing the enqueue); but as a side note, logs are usually IO-bound, and parallelising things that are IO-bound often doesn't work very well.
If the caller is going to take some time to consume each item, then there may be some merit to an approach that only processes one log at a time, but can do so while the caller is consuming the previous log; i.e. it begins a Task for the next item before the yield, and waits for completion after the yield... but that is, again, pretty complex. As a simplified example:
static void Main()
{
    foreach (string s in Get())
    {
        Console.WriteLine(s);
    }
}

static IEnumerable<string> Get()
{
    var source = new[] { 1, 2, 3, 4, 5 };
    Task<string> outstandingItem = null;
    Func<object, string> transform = x => ProcessItem((int)x);
    foreach (var item in source)
    {
        var tmp = outstandingItem;
        // note: passed in as "state", not captured, so not a foreach/capture bug
        outstandingItem = new Task<string>(transform, item);
        outstandingItem.Start();
        if (tmp != null) yield return tmp.Result;
    }
    if (outstandingItem != null) yield return outstandingItem.Result;
}

static string ProcessItem(int i)
{
    return i.ToString();
}
I don't want to be offensive, but maybe there is a lack of understanding. Parallel.ForEach means that the TPL will run the body of the foreach on several threads, according to the available hardware. But that means that it is possible to do that work in parallel! yield return gives you the opportunity to pull values out of a list (or whatever) and hand them back one by one, as they are needed. It avoids the need to first find all items matching the condition and then iterate over them. That is indeed a performance advantage, but it can't be done in parallel.
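To illustrate the one-by-one nature of yield return, a trivial sketch:
static IEnumerable<int> EvenNumbers(IEnumerable<int> source)
{
    foreach (var n in source)
        if (n % 2 == 0)
            yield return n; // produced on demand, one at a time, not precomputed
}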
Although the question is old, I've managed to do something just for fun.
class Program
{
    static void Main(string[] args)
    {
        foreach (var message in GetMessages())
        {
            Console.WriteLine(message);
        }
    }

    // Parallel yield
    private static IEnumerable<string> GetMessages()
    {
        int total = 0;
        bool completed = false;
        var batches = Enumerable.Range(1, 100).Select(i => new Computer() { Id = i });
        var qu = new ConcurrentQueue<Computer>();

        Task.Run(() =>
        {
            try
            {
                Parallel.ForEach(batches,
                    () => 0,
                    (item, loop, subtotal) =>
                    {
                        Thread.Sleep(1000);
                        qu.Enqueue(item);
                        return subtotal + 1;
                    },
                    result => Interlocked.Add(ref total, result));
            }
            finally
            {
                completed = true;
            }
        });

        int current = 0;
        while (current < total || !completed)
        {
            SpinWait.SpinUntil(() => current < total || completed);
            if (current == total) yield break;
            current++;
            qu.TryDequeue(out Computer computer);
            yield return $"Completed {computer.Id}";
        }
    }
}

public class Computer
{
    public int Id { get; set; }
}
Compared to Koray's answer this one really uses all the CPU cores.
You can use the following extension method
public static class ParallelExtensions
{
    public static IEnumerable<T1> OrderedParallel<T, T1>(this IEnumerable<T> list, Func<T, T1> action)
    {
        var unorderedResult = new ConcurrentBag<(long, T1)>();
        Parallel.ForEach(list, (o, state, i) =>
        {
            unorderedResult.Add((i, action.Invoke(o)));
        });
        var ordered = unorderedResult.OrderBy(o => o.Item1);
        return ordered.Select(o => o.Item2);
    }
}
use it like:
public void FillLogs(IEnumerable<IComputer> computers)
{
    // GetRawLogs runs in parallel; each computer's Logs property is filled as a side effect
    computers.OrderedParallel(cpt => cpt.Logs = cpt.GetRawLogs().ToList()).ToList();
}
Hope this will save you some time.
How about
Queue<string> qu = new Queue<string>();
bool finished = false;
Task.Factory.StartNew(() =>
{
    Parallel.ForEach(get_list(), (item) =>
    {
        string itemToReturn = heavyWorkOnItem(item);
        lock (qu)
            qu.Enqueue(itemToReturn);
    });
    finished = true;
});
while (!finished)
{
    lock (qu)
        while (qu.Count > 0)
            yield return qu.Dequeue();
    // maybe a thread sleep here?
}
Edit:
I think this is better:
public static IEnumerable<TOutput> ParallelYieldReturn<TSource, TOutput>(
    this IEnumerable<TSource> source, Func<TSource, TOutput> func)
{
    ConcurrentQueue<TOutput> qu = new ConcurrentQueue<TOutput>();
    bool finished = false;
    AutoResetEvent re = new AutoResetEvent(false);
    Task.Factory.StartNew(() =>
    {
        Parallel.ForEach(source, (item) =>
        {
            qu.Enqueue(func(item));
            re.Set();
        });
        finished = true;
        re.Set();
    });
    while (!finished)
    {
        re.WaitOne();
        while (qu.Count > 0)
        {
            TOutput res;
            if (qu.TryDequeue(out res))
                yield return res;
        }
    }
}
Edit2: I agree with the short No answer. This code is useless; you cannot break the yield loop.
