BlockingCollection<T> has the handy static TakeFromAny method, which lets you consume multiple collections at once: "give me the next item from any of these collections".
ChannelReader<T> has no equivalent, so if you want to consume multiple channels as a single stream - say, to print received items to the console one by one - how might this be done?
The fast path is easy, but the slow path is quite tricky. The implementation below returns a Task<ValueTuple<T, int>> containing the value taken from one of the readers and the zero-based index of that reader in the input array.
public static Task<(T Item, int Index)> ReadFromAnyAsync<T>(
params ChannelReader<T>[] channelReaders) =>
ReadFromAnyAsync(channelReaders, CancellationToken.None);
public static async Task<(T Item, int Index)> ReadFromAnyAsync<T>(
ChannelReader<T>[] channelReaders,
CancellationToken cancellationToken)
{
cancellationToken.ThrowIfCancellationRequested();
// Fast path
for (int i = 0; i < channelReaders.Length; i++)
{
if (channelReaders[i].TryRead(out var item)) return (item, i);
}
// Slow path
var locker = new object();
int resultIndex = -1;
T resultItem = default;
while (true)
{
using (var cts = CancellationTokenSource
.CreateLinkedTokenSource(cancellationToken, default))
{
bool availableAny = false;
Task[] tasks = channelReaders
.Select(async (reader, index) =>
{
try
{
bool available = await reader.WaitToReadAsync(cts.Token)
.ConfigureAwait(false);
if (!available) return;
}
catch // Cancellation, or channel completed with exception
{
return;
}
availableAny = true;
lock (locker) // Take from one reader only
{
if (resultIndex == -1 && reader.TryRead(out var item))
{
resultIndex = index;
resultItem = item;
cts.Cancel();
}
}
})
.ToArray();
await Task.WhenAll(tasks).ConfigureAwait(false);
if (resultIndex != -1) return (resultItem, resultIndex);
cancellationToken.ThrowIfCancellationRequested();
if (!availableAny) throw new ChannelClosedException(
"All channels are marked as completed.");
}
}
}
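Here is a minimal usage sketch, printing whichever item arrives first from either of two hypothetical channels (the channel names and producers are assumptions, not part of the code above):
var channelA = Channel.CreateUnbounded<string>();
var channelB = Channel.CreateUnbounded<string>();
// Hypothetical producers
_ = Task.Run(async () => { await channelA.Writer.WriteAsync("from A"); channelA.Writer.Complete(); });
_ = Task.Run(async () => { await channelB.Writer.WriteAsync("from B"); channelB.Writer.Complete(); });
// Consume both channels as a single stream, one item at a time
var readers = new[] { channelA.Reader, channelB.Reader };
while (true)
{
    try
    {
        var (item, index) = await ReadFromAnyAsync(readers);
        Console.WriteLine($"'{item}' from reader #{index}");
    }
    catch (ChannelClosedException)
    {
        break; // all channels completed
    }
}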
We have code like this:
var intList = new List<int>{1,2,3};
var asyncEnumerables = intList.Select(Foo);
private async IAsyncEnumerable<int> Foo(int a)
{
while (true)
{
await Task.Delay(5000);
yield return a;
}
}
I need to run an await foreach over every entry in asyncEnumerables. The loop iterations should wait for each other, and once every iteration has produced a value I need to collect all of those values and process them with another method.
Can I somehow achieve that with the TPL? If not, could you give me some ideas?
What works for me is the Zip function in this repo (line 81).
I'm using it like this:
var intList = new List<int> { 1, 2, 3 };
var asyncEnumerables = intList.Select(RunAsyncIterations);
var enumerableToIterate = async_enumerable_dotnet.AsyncEnumerable.Zip(s => s, asyncEnumerables.ToArray());
await foreach (int[] enumerablesConcatenation in enumerableToIterate)
{
Console.WriteLine(enumerablesConcatenation.Sum()); //Sum returns 6
await Task.Delay(2000);
}
static async IAsyncEnumerable<int> RunAsyncIterations(int i)
{
while (true)
yield return i;
}
Here is a generic method Zip you could use, implemented as an iterator. The cancellationToken is decorated with the EnumeratorCancellation attribute, so that the resulting IAsyncEnumerable is WithCancellation friendly.
using System.Runtime.CompilerServices;
public static async IAsyncEnumerable<TSource[]> Zip<TSource>(
IEnumerable<IAsyncEnumerable<TSource>> sources,
[EnumeratorCancellation]CancellationToken cancellationToken = default)
{
var enumerators = sources
.Select(x => x.GetAsyncEnumerator(cancellationToken))
.ToArray();
try
{
while (true)
{
var array = new TSource[enumerators.Length];
for (int i = 0; i < enumerators.Length; i++)
{
if (!await enumerators[i].MoveNextAsync()) yield break;
array[i] = enumerators[i].Current;
}
yield return array;
}
}
finally
{
foreach (var enumerator in enumerators)
{
await enumerator.DisposeAsync();
}
}
}
Usage example:
await foreach (int[] result in Zip(asyncEnumerables))
{
Console.WriteLine($"Result: {String.Join(", ", result)}");
}
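And because the token is decorated with EnumeratorCancellation, cancellation can also be supplied from the consuming side. A small sketch (the 10-second timeout is an arbitrary assumption):
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
await foreach (int[] result in Zip(asyncEnumerables).WithCancellation(cts.Token))
{
    Console.WriteLine($"Result: {String.Join(", ", result)}");
}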
I am having an issue with my code below. I have a cold source that starts producing when you subscribe. I want the observable to run only once, so I am applying Replay to it. What I found is that when it hits the conditional branch that writes the header, it starts the observable on the call to FirstAsync, and then starts the observable again on a new thread at the ForEachAsync call. I end up with the observable running concurrently on two threads, and I am not sure why this is occurring.
public async Task WriteToFileAsync(string filename, IObservable<IFormattableTestResults> source, bool overwrite)
{
ThrowIfInvalidFileName(filename);
var path = Path.Combine(_path, filename);
bool fileExists = File.Exists(path);
using (var writer = new StreamWriter(path, !overwrite))
{
var replay = source.Replay().RefCount();
if (overwrite || !fileExists)
{
var first = await replay.FirstAsync();
var header = GetCsvHeader(first.GetResults());
await writer.WriteLineAsync(header);
}
await replay.ForEachAsync(result => writer.WriteLineAsync(FormatCsv(result.GetResults())));
}
}
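A minimal sketch of the effect being described (the cold source here is a stand-in, not the actual test code): each time the reference count of the Replay().RefCount() drops to zero, the next subscription starts the cold source again.
var cold = Observable.Create<long>(observer =>
{
    Console.WriteLine("Source subscribed"); // printed twice in this scenario
    return Observable.Interval(TimeSpan.FromSeconds(1)).Subscribe(observer);
});
var replay = cold.Replay().RefCount();
var first = await replay.FirstAsync(); // ref count 0 -> 1, then back to 0 once the first element arrives
await replay.Take(3).ForEachAsync(x => Console.WriteLine(x)); // ref count 0 -> 1 again: the cold source starts a second time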
Edit 10/22/2015: Adding more code
private Task RunIVBoardCurrentAdjust(IVBoardAdjustment test)
{
logger.Info("Loading IV board current adjustment test.");
var testCases = _testCaseLoader.GetIVBoardCurrentAdjustTests().ToArray();
var source = test.RunCurrentAdjustment(testCases);
return _fileResultsService.WriteToFileAsync("IVBoardCurrentAdjust.csv", source, false);
}
public IObservable<IVBoardCurrentAdjustTestResults> RunCurrentAdjustment(IEnumerable<IVBoardCurrentAdjustTestCase> testCases)
{
return
testCases
.Select(RunCurrentAdjustment)
.Concat();
}
public IObservable<IVBoardCurrentAdjustTestResults> RunCurrentAdjustment(IVBoardCurrentAdjustTestCase testCase)
{
logger.Debug("Preparing IV board current adjustment procedure.");
return Observable.Create<IVBoardCurrentAdjustTestResults>(
(observer, cancelToken) =>
{
var results =
RunAdjustment(testCase)
.Do(result => logger.Trace(""))
.Select(
(output, i) =>
new IVBoardCurrentAdjustTestResults(i, testCase, output)
{
Component = "IV Board",
Description = "Characterization (Secant Method)"
})
.Replay();
results.Subscribe(observer, cancelToken);
var task = StoreResultInBTD(results, testCase, 1/testCase.Load);
results.Connect();
return task;
});
}
private IObservable<IRootFindingResult> RunAdjustment<T>(IVBoardAdjustTestCase<T> testCase) where T : DacCharacterizationSecantInput
{
logger.Debug("Initializing IV board test.");
SetupTest(testCase);
return
new DacCharacterization()
.RunSecantMethod(
code => _yellowCake.IVBoard.DacRegister.Value = code,
() => _dmm.Read(),
GetTestInputs(testCase));
}
private async Task StoreResultInBTD(IObservable<IVBoardAdjustTestResults> results, IVBoardAdjustTestCase testCase, double targetScalingFactor = 1)
{
var points =
results
.Select(
result =>
new IVBoardCharacteristicCurveTestPoint(
(result.Output.Target - result.Output.Error) * targetScalingFactor,
(int)result.Output.Root));
var curve = await points.ToArray();
_yellowCake.BoardTest.WriteIVBoardAjust(curve, testCase.Mode, testCase.Range);
_yellowCake.BoardTest.SaveToFile();
}
private IEnumerable<DacCharacterizationSecantInput> GetTestInputs<T>(IVBoardAdjustTestCase<T> testCase) where T : DacCharacterizationSecantInput
{
foreach (var input in testCase.Inputs)
{
logger.Debug("Getting next test input.");
_dmm.Config.PowerLineCycles.Value = input.IntegrationTime;
yield return input.Input;
}
}
public IObservable<IRootFindingResult> RunSecantMethod(
Action<int> setDacOutput,
Func<double> readMeanOutput,
IEnumerable<DacCharacterizationSecantInput> inputs)
{
var search = new SecantSearch();
var param = SecantMethodParameter.Create(setDacOutput, readMeanOutput);
return
Observable
.Create<IRootFindingResult>(
(observer, cancelToken) =>
Task.Run(() =>
{
foreach (var input in inputs)
{
cancelToken.ThrowIfCancellationRequested();
var result =
search.FindRoot(
param.SearchFunction,
input.FirstGuess,
input.SecondGuess,
input.Target,
input.SearchOptions,
cancelToken,
() => param.AdaptedDacCode);
if (!result.Converged)
{
observer.OnError(new FailedToConvergeException(result));
}
observer.OnNext(result);
}
}, cancelToken));
}
I have found many ways of using the TaskFactory, but I could not find anything about starting several tasks, watching for one to finish, and then starting another in its place.
I always want to have 10 tasks working.
I want something like this:
int nTotalTasks=10;
int nCurrentTask=0;
Task<bool>[] tasks = new Task<bool>[nTotalTasks];
for (int i=0; i<1000; i++)
{
string param1="test";
string param2="test";
if (nCurrentTask<10) // if there are less than 10 tasks then start another one
tasks[nCurrentTask++] = Task.Factory.StartNew<bool>(() =>
{
MyClass cls = new MyClass();
bool bRet = cls.Method1(param1, param2, i); // takes up to 2 minutes to finish
return bRet;
});
// How can I stop the for loop until a new task is finished and start a new one?
}
Check out the Task.WaitAny method:
Waits for any of the provided Task objects to complete execution.
Example from the documentation:
var t1 = Task.Factory.StartNew(() => DoOperation1());
var t2 = Task.Factory.StartNew(() => DoOperation2());
Task.WaitAny(t1, t2);
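Applied to the loop in the question, a rough sketch (reusing the MyClass/Method1 types from the question) could look like this: keep at most 10 tasks in flight and, whenever one finishes, start the next work item in its place.
var running = new List<Task<bool>>();
for (int i = 0; i < 1000; i++)
{
    int current = i; // avoid capturing the loop variable
    running.Add(Task.Factory.StartNew(() => new MyClass().Method1("test", "test", current)));
    if (running.Count == 10)
    {
        int finished = Task.WaitAny(running.ToArray()); // index of the completed task
        running.RemoveAt(finished);
    }
}
Task.WaitAll(running.ToArray()); // wait for the remaining tasks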
I would use a combination of Microsoft's Reactive Framework (NuGet "Rx-Main") and TPL for this. It becomes very simple.
Here's the code:
int nTotalTasks=10;
string param1="test";
string param2="test";
IDisposable subscription =
Observable
.Range(0, 1000)
.Select(i => Observable.FromAsync(() => Task.Factory.StartNew<bool>(() =>
{
MyClass cls = new MyClass();
bool bRet = cls.Method1(param1, param2, i); // takes up to 2 minutes to finish
return bRet;
})))
.Merge(nTotalTasks)
.ToArray()
.Subscribe((bool[] results) =>
{
/* Do something with the results. */
});
The key part here is the .Merge(nTotalTasks) which limits the number of concurrent tasks.
If you need to stop the processing partway through, just call subscription.Dispose() and everything gets cleaned up for you.
If you want to process each result as they are produced you can change the code from the .Merge(...) like this:
.Merge(nTotalTasks)
.Subscribe((bool result) =>
{
/* Do something with each result. */
});
This isn't complete, but it should be all you need: wait for one of the running tasks to complete, then start the next one.
Task.WaitAny(task to wait on);
Task.Factory.StartNew()
Have you seen the BlockingCollection class? It allows you to have multiple threads running in parallel, and you can wait for the results of one task before executing another. See more information here.
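For the scenario in the question, that could look roughly like this (a sketch only, reusing the MyClass/Method1 types from the question): 10 consumer tasks drain a shared queue while the loop feeds it.
using (var queue = new BlockingCollection<int>())
{
    var workers = Enumerable.Range(0, 10).Select(_ => Task.Run(() =>
    {
        foreach (int i in queue.GetConsumingEnumerable())
        {
            MyClass cls = new MyClass();
            cls.Method1("test", "test", i); // takes up to 2 minutes to finish
        }
    })).ToArray();
    for (int i = 0; i < 1000; i++) queue.Add(i);
    queue.CompleteAdding(); // no more items; consumers exit when the queue is drained
    Task.WaitAll(workers);
}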
The answer depends on whether the tasks to be scheduled are CPU-bound or I/O-bound.
For CPU-intensive work I would use the Parallel.For() API, setting the number of threads/tasks through the MaxDegreeOfParallelism property of ParallelOptions.
For I/O-bound work the number of concurrently executing tasks can be significantly larger than the number of available CPUs, so the strategy is to rely on async methods as much as possible, which reduces the total number of threads waiting for completion.
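For the CPU-bound case, a minimal sketch (the degree of parallelism of 10 and the MyClass/Method1 types are taken from the question):
var options = new ParallelOptions { MaxDegreeOfParallelism = 10 };
Parallel.For(0, 1000, options, i =>
{
    MyClass cls = new MyClass();
    cls.Method1("test", "test", i); // CPU-bound work
});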
How can I stop the for loop until a new task is finished and start a new one?
The loop can be throttled by using await:
static void Main(string[] args)
{
var task = DoWorkAsync();
task.Wait();
// handle results
// task.Result;
Console.WriteLine("Done.");
}
async static Task<bool[]> DoWorkAsync()
{
const int NUMBER_OF_SLOTS = 10;
string param1="test";
string param2="test";
var results = new bool[NUMBER_OF_SLOTS];
AsyncWorkScheduler ws = new AsyncWorkScheduler(NUMBER_OF_SLOTS);
for (int i = 0; i < 1000; ++i)
{
await ws.ScheduleAsync((slotNumber) => DoWorkAsync(i, slotNumber, param1, param2, results));
}
ws.Complete();
await ws.Completion;
return results;
}
async static Task DoWorkAsync(int index, int slotNumber, string param1, string param2, bool[] results)
{
results[slotNumber] = await Task.Factory.StartNew<bool>(() =>
{
MyClass cls = new MyClass();
bool bRet = cls.Method1(param1, param2, index); // takes up to 2 minutes to finish
return bRet;
});
}
The helper class AsyncWorkScheduler uses TPL Dataflow components as well as Task.WhenAll():
class AsyncWorkScheduler
{
public AsyncWorkScheduler(int numberOfSlots)
{
m_slots = new Task[numberOfSlots];
m_availableSlots = new BufferBlock<int>();
m_errors = new List<Exception>();
m_tcs = new TaskCompletionSource<bool>();
m_completionPending = 0;
// Initial state: all slots are available
for(int i = 0; i < m_slots.Length; ++i)
{
m_slots[i] = Task.FromResult(false);
m_availableSlots.Post(i);
}
}
public async Task ScheduleAsync(Func<int, Task> action)
{
if (Volatile.Read(ref m_completionPending) != 0)
{
throw new InvalidOperationException("Unable to schedule new items.");
}
// Acquire a slot
int slotNumber = await m_availableSlots.ReceiveAsync().ConfigureAwait(false);
// Schedule a new task for a given slot
var task = action(slotNumber);
// Store a continuation on the task to handle completion events
m_slots[slotNumber] = task.ContinueWith(t => HandleCompletedTask(t, slotNumber), TaskContinuationOptions.ExecuteSynchronously);
}
public async void Complete()
{
if (Interlocked.CompareExchange(ref m_completionPending, 1, 0) != 0)
{
return;
}
// Signal the queue's completion
m_availableSlots.Complete();
await Task.WhenAll(m_slots).ConfigureAwait(false);
// Set completion
if (m_errors.Count != 0)
{
m_tcs.TrySetException(m_errors);
}
else
{
m_tcs.TrySetResult(true);
}
}
public Task Completion
{
get
{
return m_tcs.Task;
}
}
void SetFailed(Exception error)
{
lock(m_errors)
{
m_errors.Add(error);
}
}
void HandleCompletedTask(Task task, int slotNumber)
{
if (task.IsFaulted || task.IsCanceled)
{
SetFailed(task.Exception);
return;
}
if (Volatile.Read(ref m_completionPending) == 1)
{
return;
}
// Release a slot
m_availableSlots.Post(slotNumber);
}
int m_completionPending;
List<Exception> m_errors;
BufferBlock<int> m_availableSlots;
TaskCompletionSource<bool> m_tcs;
Task[] m_slots;
}
So what I am trying to do here is:
Make the engine loop and work on an item whenever the queue is not empty.
If the queue is empty, use the ManualResetEvent to put the thread to sleep.
When an item is added and the loop is not active, set the ManualResetEvent.
To make it faster, pick up at most 5 items from the list, perform the operation on them asynchronously, and wait for all of them to finish.
Problem:
The Clear methods on the two lists are called as soon as a new call to the AddToUpdateQueue method is made.
Since I am awaiting Task.WhenAll(tasks), the thread should wait for its completion before moving ahead, so the Clear calls on the lists should only happen after Task.WhenAll(tasks) returns.
What am I missing here, or what would be a better way to achieve this?
public async Task ThumbnailUpdaterEngine()
{
int count;
List<Task<bool>> tasks = new List<Task<bool>>();
List<Content> candidateContents = new List<Content>();
while (true)
{
for (int i = 0; i < 5; i++)
{
Content nextContent = GetNextFromInternalQueue();
if (nextContent == null)
break;
else
candidateContents.Add(nextContent);
}
foreach (var candidateContent in candidateContents)
{
foreach (var provider in interactionProviders)
{
if (provider.IsServiceSupported(candidateContent.ServiceType))
{
Task<bool> task = provider.UpdateThumbnail(candidateContent);
tasks.Add(task);
break;
}
}
}
var results = await Task.WhenAll(tasks);
tasks.Clear();
foreach (var candidateContent in candidateContents)
{
if (candidateContent.ThumbnailLink != null && !candidateContent.ThumbnailLink.Equals(candidateContent.FileIconLink, StringComparison.CurrentCultureIgnoreCase))
{
Task<bool> task = DownloadAndUpdateThumbnailCache(candidateContent);
tasks.Add(task);
}
}
await Task.WhenAll(tasks);
//Clean up for next time the loop comes in.
tasks.Clear();
candidateContents.Clear();
lock (syncObject)
{
count = internalQueue.Count;
if (count == 0)
{
isQueueControllerRunning = false;
monitorEvent.Reset();
}
}
await Task.Run(() => monitorEvent.WaitOne());
}
}
private Content GetNextFromInternalQueue()
{
lock (syncObject)
{
Content nextContent = null;
if (internalQueue.Count > 0)
{
nextContent = internalQueue[0];
internalQueue.Remove(nextContent);
}
return nextContent;
}
}
public void AddToUpdateQueue(Content content)
{
lock (syncObject)
{
internalQueue.Add(content);
if (!isQueueControllerRunning)
{
isQueueControllerRunning = true;
monitorEvent.Set();
}
}
}
You should simply use TPL Dataflow. It's an actor framework on top of the TPL with async support. Use an ActionBlock with an async action and a MaxDegreeOfParallelism of 5:
var block = new ActionBlock<Content>(
async content =>
{
var tasks = interactionProviders.
Where(provider => provider.IsServiceSupported(content.ServiceType)).
Select(provider => provider.UpdateThumbnail(content));
await Task.WhenAll(tasks);
if (content.ThumbnailLink != null && !content.ThumbnailLink.Equals(
content.FileIconLink,
StringComparison.CurrentCultureIgnoreCase))
{
await DownloadAndUpdateThumbnailCache(content);
}
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5});
foreach (var content in GetContent())
{
block.Post(content);
}
block.Complete();
await block.Completion;
I want to process something using a parallel loop like this:
public void FillLogs(IEnumerable<IComputer> computers)
{
Parallel.ForEach(computers, cpt=>
{
cpt.Logs = cpt.GetRawLogs().ToList();
});
}
OK, it works fine. But what do I do if I want the FillLogs method to return an IEnumerable?
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
Parallel.ForEach(computers, cpt=>
{
cpt.Logs = cpt.GetRawLogs().ToList();
yield return cpt // KO, don't work
});
}
EDIT
It seems this is not possible... but I use something like this:
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
return computers.AsParallel().Select(cpt => cpt);
}
But where do I put the cpt.Logs = cpt.GetRawLogs().ToList(); instruction?
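One possible placement (an assumption on my part, not something from the original post) is inside the Select projection, so each computer has its logs filled before it is yielded:
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
    return computers.AsParallel().Select(cpt =>
    {
        cpt.Logs = cpt.GetRawLogs().ToList();
        return cpt;
    });
}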
Short version - no, that isn't possible via an iterator block; the longer version probably involves synchronized queue/dequeue between the caller's iterator thread (doing the dequeue) and the parallel workers (doing the enqueue); but as a side note - logs are usually IO-bound, and parallelising things that are IO-bound often doesn't work very well.
If the caller is going to take some time to consume each item, there may be some merit to an approach that only processes one log at a time, but can do so while the caller is consuming the previous log; i.e. it begins a Task for the next item before the yield, and waits for completion after the yield... but that is, again, pretty complex. As a simplified example:
static void Main()
{
foreach(string s in Get())
{
Console.WriteLine(s);
}
}
static IEnumerable<string> Get() {
var source = new[] {1, 2, 3, 4, 5};
Task<string> outstandingItem = null;
Func<object, string> transform = x => ProcessItem((int) x);
foreach(var item in source)
{
var tmp = outstandingItem;
// note: passed in as "state", not captured, so not a foreach/capture bug
outstandingItem = new Task<string>(transform, item);
outstandingItem.Start();
if (tmp != null) yield return tmp.Result;
}
if (outstandingItem != null) yield return outstandingItem.Result;
}
static string ProcessItem(int i)
{
return i.ToString();
}
I don't want to be offensive, but maybe there is a lack of understanding here. Parallel.ForEach means that the TPL will run the foreach on several threads, according to the available hardware. But that means that it is possible to do that work in parallel! yield return gives you the opportunity to pull values out of a list (or whatever) and hand them back one by one as they are needed. It avoids the need to first find all items matching the condition and only then iterate over them. That is indeed a performance advantage, but it can't be done in parallel.
Although the question is old, I've managed to put something together just for fun.
class Program
{
static void Main(string[] args)
{
foreach (var message in GetMessages())
{
Console.WriteLine(message);
}
}
// Parallel yield
private static IEnumerable<string> GetMessages()
{
int total = 0;
bool completed = false;
var batches = Enumerable.Range(1, 100).Select(i => new Computer() { Id = i });
var qu = new ConcurrentQueue<Computer>();
Task.Run(() =>
{
try
{
Parallel.ForEach(batches,
() => 0,
(item, loop, subtotal) =>
{
Thread.Sleep(1000);
qu.Enqueue(item);
return subtotal + 1;
},
result => Interlocked.Add(ref total, result));
}
finally
{
completed = true;
}
});
int current = 0;
while (current < total || !completed)
{
SpinWait.SpinUntil(() => current < total || completed);
if (current == total) yield break;
current++;
qu.TryDequeue(out Computer computer);
yield return $"Completed {computer.Id}";
}
}
}
public class Computer
{
public int Id { get; set; }
}
Compared to Koray's answer this one really uses all the CPU cores.
You can use the following extension method:
public static class ParallelExtensions
{
public static IEnumerable<T1> OrderedParallel<T, T1>(this IEnumerable<T> list, Func<T, T1> action)
{
var unorderedResult = new ConcurrentBag<(long, T1)>();
Parallel.ForEach(list, (o, state, i) =>
{
unorderedResult.Add((i, action.Invoke(o)));
});
var ordered = unorderedResult.OrderBy(o => o.Item1);
return ordered.Select(o => o.Item2);
}
}
Use it like this:
public IEnumerable<IComputer> FillLogs(IEnumerable<IComputer> computers)
{
    return computers.OrderedParallel(cpt =>
    {
        cpt.Logs = cpt.GetRawLogs().ToList();
        return cpt;
    });
}
Hope this will save you some time.
How about
Queue<string> qu = new Queue<string>();
bool finished = false;
Task.Factory.StartNew(() =>
{
Parallel.ForEach(get_list(), (item) =>
{
string itemToReturn = heavyWorkOnItem(item);
lock (qu)
qu.Enqueue(itemToReturn );
});
finished = true;
});
while (!finished)
{
lock (qu)
while (qu.Count > 0)
yield return qu.Dequeue();
//maybe a thread sleep here?
}
Edit:
I think this is better:
public static IEnumerable<TOutput> ParallelYieldReturn<TSource, TOutput>(this IEnumerable<TSource> source, Func<TSource, TOutput> func)
{
ConcurrentQueue<TOutput> qu = new ConcurrentQueue<TOutput>();
bool finished = false;
AutoResetEvent re = new AutoResetEvent(false);
Task.Factory.StartNew(() =>
{
Parallel.ForEach(source, (item) =>
{
qu.Enqueue(func(item));
re.Set();
});
finished = true;
re.Set();
});
while (!finished)
{
re.WaitOne();
while (qu.Count > 0)
{
TOutput res;
if (qu.TryDequeue(out res))
yield return res;
}
}
}
Edit2: I agree with the short No answer. This code is useless; you cannot break the yield loop.