C# Parallel Foreach + Async - c#

I'm processing a list of items (200k - 300k), each item processing time is between 2 to 8 seconds. To gain time, I can process this list in parallel. As I'm in an async context, I use something like this :
public async Task<List<Keyword>> DoWord(List<string> keyword)
{
ConcurrentBag<Keyword> keywordResults = new ConcurrentBag<Keyword>();
if (keyword.Count > 0)
{
try
{
var tasks = keyword.Select(async kw =>
{
return await Work(kw).ConfigureAwait(false);
});
keywordResults = new ConcurrentBag<Keyword>(await Task.WhenAll(tasks).ConfigureAwait(false));
}
catch (AggregateException ae)
{
foreach (Exception innerEx in ae.InnerExceptions)
{
log.ErrorFormat("Core threads exception: {0}", innerEx);
}
}
}
return keywordResults.ToList();
}
The keyword list contains always 8 elements (comming from above) thus I process my list 8 by 8 but, in this case, I guess that if 7 keywords are processed in 3 secs and the 8th is processed in 10 secs, the total time for the 8 keywords will be 10 (correct me if i'm wrong).
How Can I approach from the Parallel.Foreach then? I mean : launch 8 keywords if 1 of them is done, launch 1 more. In this case I'll have 8 working processes permanently. Any idea ?

Another more easier way to do this is to use the AsyncEnumerator NuGet Package:
using System.Collections.Async;
public async Task<List<Keyword>> DoWord(List<string> keywords)
{
var keywordResults = new ConcurrentBag<Keyword>();
await keywords.ParallelForEachAsync(async keyword =>
{
try
{
var result = await Work(keyword);
keywordResults.Add(result);
}
catch (AggregateException ae)
{
foreach (Exception innerEx in ae.InnerExceptions)
{
log.ErrorFormat("Core threads exception: {0}", innerEx);
}
}
}, maxDegreeOfParallelism: 8);
return keywordResults.ToList();
}

Here's some sample code showing how you could approach this using TPL Dataflow.
Note that in order to compile this, you will need to add TPL Dataflow to your project via NuGet.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
namespace Demo
{
class Keyword // Dummy test class.
{
public string Name;
}
class Program
{
static void Main()
{
// Dummy test data.
var keywords = Enumerable.Range(1, 100).Select(n => n.ToString()).ToList();
var result = DoWork(keywords).Result;
Console.WriteLine("---------------------------------");
foreach (var item in result)
Console.WriteLine(item.Name);
}
public static async Task<List<Keyword>> DoWork(List<string> keywords)
{
var input = new TransformBlock<string, Keyword>
(
async s => await Work(s),
// This is where you specify the max number of threads to use.
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 }
);
var result = new List<Keyword>();
var output = new ActionBlock<Keyword>
(
item => result.Add(item), // Output only 1 item at a time, because 'result.Add()' is not threadsafe.
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 }
);
input.LinkTo(output, new DataflowLinkOptions { PropagateCompletion = true });
foreach (string s in keywords)
await input.SendAsync(s);
input.Complete();
await output.Completion;
return result;
}
public static async Task<Keyword> Work(string s) // Stubbed test method.
{
Console.WriteLine("Processing " + s);
int delay;
lock (rng) { delay = rng.Next(10, 1000); }
await Task.Delay(delay); // Simulate load.
Console.WriteLine("Completed " + s);
return await Task.Run( () => new Keyword { Name = s });
}
static Random rng = new Random();
}
}

Related

How to await multiple IAsyncEnumerable

We have code like this:
var intList = new List<int>{1,2,3};
var asyncEnumerables = intList.Select(Foo);
private async IAsyncEnumerable<int> Foo(int a)
{
while (true)
{
await Task.Delay(5000);
yield return a;
}
}
I need to start await foreach for every asyncEnumerable's entry. Every loop iteration should wait each other, and when every iteration is done i need to collect every iteration's data and process that by another method.
Can i somehow achieve that by TPL? Otherwise, couldn't you give me some ideas?
What works for me is the Zip function in this repo (81 line)
I'm using it like this
var intList = new List<int> { 1, 2, 3 };
var asyncEnumerables = intList.Select(RunAsyncIterations);
var enumerableToIterate = async_enumerable_dotnet.AsyncEnumerable.Zip(s => s, asyncEnumerables.ToArray());
await foreach (int[] enumerablesConcatenation in enumerableToIterate)
{
Console.WriteLine(enumerablesConcatenation.Sum()); //Sum returns 6
await Task.Delay(2000);
}
static async IAsyncEnumerable<int> RunAsyncIterations(int i)
{
while (true)
yield return i;
}
Here is a generic method Zip you could use, implemented as an iterator. The cancellationToken is decorated with the EnumeratorCancellation attribute, so that the resulting IAsyncEnumerable is WithCancellation friendly.
using System.Runtime.CompilerServices;
public static async IAsyncEnumerable<TSource[]> Zip<TSource>(
IEnumerable<IAsyncEnumerable<TSource>> sources,
[EnumeratorCancellation]CancellationToken cancellationToken = default)
{
var enumerators = sources
.Select(x => x.GetAsyncEnumerator(cancellationToken))
.ToArray();
try
{
while (true)
{
var array = new TSource[enumerators.Length];
for (int i = 0; i < enumerators.Length; i++)
{
if (!await enumerators[i].MoveNextAsync()) yield break;
array[i] = enumerators[i].Current;
}
yield return array;
}
}
finally
{
foreach (var enumerator in enumerators)
{
await enumerator.DisposeAsync();
}
}
}
Usage example:
await foreach (int[] result in Zip(asyncEnumerables))
{
Console.WriteLine($"Result: {String.Join(", ", result)}");
}

TaskFactory, Starting a new Task when one ends

I have found many methods of using the TaskFactory but I could not find anything about starting more tasks and watching when one ends and starting another one.
I always want to have 10 tasks working.
I want something like this
int nTotalTasks=10;
int nCurrentTask=0;
Task<bool>[] tasks=new Task<bool>[nThreadsNum];
for (int i=0; i<1000; i++)
{
string param1="test";
string param2="test";
if (nCurrentTask<10) // if there are less than 10 tasks then start another one
tasks[nCurrentThread++] = Task.Factory.StartNew<bool>(() =>
{
MyClass cls = new MyClass();
bool bRet = cls.Method1(param1, param2, i); // takes up to 2 minutes to finish
return bRet;
});
// How can I stop the for loop until a new task is finished and start a new one?
}
Check out the Task.WaitAny method:
Waits for any of the provided Task objects to complete execution.
Example from the documentation:
var t1 = Task.Factory.StartNew(() => DoOperation1());
var t2 = Task.Factory.StartNew(() => DoOperation2());
Task.WaitAny(t1, t2)
I would use a combination of Microsoft's Reactive Framework (NuGet "Rx-Main") and TPL for this. It becomes very simple.
Here's the code:
int nTotalTasks=10;
string param1="test";
string param2="test";
IDisposable subscription =
Observable
.Range(0, 1000)
.Select(i => Observable.FromAsync(() => Task.Factory.StartNew<bool>(() =>
{
MyClass cls = new MyClass();
bool bRet = cls.Method1(param1, param2, i); // takes up to 2 minutes to finish
return bRet;
})))
.Merge(nTotalTasks)
.ToArray()
.Subscribe((bool[] results) =>
{
/* Do something with the results. */
});
The key part here is the .Merge(nTotalTasks) which limits the number of concurrent tasks.
If you need to stop the processing part way thru just call subscription.Dispose() and everything gets cleaned up for you.
If you want to process each result as they are produced you can change the code from the .Merge(...) like this:
.Merge(nTotalTasks)
.Subscribe((bool result) =>
{
/* Do something with each result. */
});
This should be all you need, not complete, but all you need to do is wait on the first to complete and then run the second.
Task.WaitAny(task to wait on);
Task.Factory.StartNew()
Have you seen the BlockingCollection class? It allows you to have multiple threads running in parallel and you can wait from results from one task to execute another. See more information here.
The answer depends on whether the tasks to be scheduled are CPU or I/O bound.
For CPU-intensive work I would use Parallel.For() API setting the number of thread/tasks through MaxDegreeOfParallelism property of ParallelOptions
For I/O bound work the number of concurrently executing tasks can be significantly larger than the number of available CPUs, so the strategy is to rely on async methods as much as possible, which reduces the total number of threads waiting for completion.
How can I stop the for loop until a new task is finished and start a
new one?
The loop can be throttled by using await:
static void Main(string[] args)
{
var task = DoWorkAsync();
task.Wait();
// handle results
// task.Result;
Console.WriteLine("Done.");
}
async static Task<bool> DoWorkAsync()
{
const int NUMBER_OF_SLOTS = 10;
string param1="test";
string param2="test";
var results = new bool[NUMBER_OF_SLOTS];
AsyncWorkScheduler ws = new AsyncWorkScheduler(NUMBER_OF_SLOTS);
for (int i = 0; i < 1000; ++i)
{
await ws.ScheduleAsync((slotNumber) => DoWorkAsync(i, slotNumber, param1, param2, results));
}
ws.Complete();
await ws.Completion;
}
async static Task DoWorkAsync(int index, int slotNumber, string param1, string param2, bool[] results)
{
results[slotNumber] = results[slotNumber} && await Task.Factory.StartNew<bool>(() =>
{
MyClass cls = new MyClass();
bool bRet = cls.Method1(param1, param2, i); // takes up to 2 minutes to finish
return bRet;
}));
}
A helper class AsyncWorkScheduler uses TPL.DataFlow components as well as Task.WhenAll():
class AsyncWorkScheduler
{
public AsyncWorkScheduler(int numberOfSlots)
{
m_slots = new Task[numberOfSlots];
m_availableSlots = new BufferBlock<int>();
m_errors = new List<Exception>();
m_tcs = new TaskCompletionSource<bool>();
m_completionPending = 0;
// Initial state: all slots are available
for(int i = 0; i < m_slots.Length; ++i)
{
m_slots[i] = Task.FromResult(false);
m_availableSlots.Post(i);
}
}
public async Task ScheduleAsync(Func<int, Task> action)
{
if (Volatile.Read(ref m_completionPending) != 0)
{
throw new InvalidOperationException("Unable to schedule new items.");
}
// Acquire a slot
int slotNumber = await m_availableSlots.ReceiveAsync().ConfigureAwait(false);
// Schedule a new task for a given slot
var task = action(slotNumber);
// Store a continuation on the task to handle completion events
m_slots[slotNumber] = task.ContinueWith(t => HandleCompletedTask(t, slotNumber), TaskContinuationOptions.ExecuteSynchronously);
}
public async void Complete()
{
if (Interlocked.CompareExchange(ref m_completionPending, 1, 0) != 0)
{
return;
}
// Signal the queue's completion
m_availableSlots.Complete();
await Task.WhenAll(m_slots).ConfigureAwait(false);
// Set completion
if (m_errors.Count != 0)
{
m_tcs.TrySetException(m_errors);
}
else
{
m_tcs.TrySetResult(true);
}
}
public Task Completion
{
get
{
return m_tcs.Task;
}
}
void SetFailed(Exception error)
{
lock(m_errors)
{
m_errors.Add(error);
}
}
void HandleCompletedTask(Task task, int slotNumber)
{
if (task.IsFaulted || task.IsCanceled)
{
SetFailed(task.Exception);
return;
}
if (Volatile.Read(ref m_completionPending) == 1)
{
return;
}
// Release a slot
m_availableSlots.Post(slotNumber);
}
int m_completionPending;
List<Exception> m_errors;
BufferBlock<int> m_availableSlots;
TaskCompletionSource<bool> m_tcs;
Task[] m_slots;
}

cancel async task if running

I have the following method called on several occasions (e.g onkeyup of textbox) which asynchronously filters items in listbox.
private async void filterCats(string category,bool deselect)
{
List<Category> tempList = new List<Category>();
//Wait for categories
var tokenSource = new CancellationTokenSource();
var token = tokenSource.Token;
//HERE,CANCEL TASK IF ALREADY RUNNING
tempList= await _filterCats(category,token);
//Show results
CAT_lb_Cats.DataSource = tempList;
CAT_lb_Cats.DisplayMember = "strCategory";
CAT_lb_Cats.ValueMember = "idCategory";
}
and the following task
private async Task<List<Category>> _filterCats(string category,CancellationToken token)
{
List<Category> result = await Task.Run(() =>
{
return getCatsByStr(category);
},token);
return result;
}
and I would like to test whether the task is already runing and if so cancel it and start it with the new value. I know how to cancel task, but how can I check whether it is already running?
This is the code that I use to do this :
if (_tokenSource != null)
{
_tokenSource.Cancel();
}
_tokenSource = new CancellationTokenSource();
try
{
await loadPrestatieAsync(_bedrijfid, _projectid, _medewerkerid, _prestatieid, _startDate, _endDate, _tokenSource.Token);
}
catch (OperationCanceledException ex)
{
}
and for the procedure call it is like this (simplified of course) :
private async Task loadPrestatieAsync(int bedrijfId, int projectid, int medewerkerid, int prestatieid,
DateTime? startDate, DateTime? endDate, CancellationToken token)
{
await Task.Delay(100, token).ConfigureAwait(true);
try{
//do stuff
token.ThrowIfCancellationRequested();
}
catch (OperationCanceledException ex)
{
throw;
}
catch (Exception Ex)
{
throw;
}
}
I am doing a delay of 100 ms because the same action is triggered rather quickly and repeatedly, a small postpone of 100 ms makes it look like the GUI is more responsive actually.
It appears you are looking for a way to get an "autocomplete list" from text entered in a text box, where an ongoing async search is canceled when the text has changed since the search was started.
As was mentioned in the comments, Rx (Reactive Extensions), provides very nice patterns for this, allowing you to easily connect your UI elements to cancellable asynchronous tasks, building in retry logic, etc.
The less than 90 line program below, shows a "full UI" sample (unfortunately excluding any cats ;-). It includes some reporting on the search status.
I have created this using a number of static methods in the RxAutoComplete class, to show how to this is achieved in small documented steps, and how they can be combined, to achieve a more complex task.
namespace TryOuts
{
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Reactive.Linq;
using System.Threading;
// Simulated async search service, that can fail.
public class FakeWordSearchService
{
private static Random _rnd = new Random();
private static string[] _allWords = new[] {
"gideon", "gabby", "joan", "jessica", "bob", "bill", "sam", "johann"
};
public async Task<string[]> Search(string searchTerm, CancellationToken cancelToken)
{
await Task.Delay(_rnd.Next(600), cancelToken); // simulate async call.
if ((_rnd.Next() % 5) == 0) // every 5 times, we will cause a search failure
throw new Exception(string.Format("Search for '{0}' failed on purpose", searchTerm));
return _allWords.Where(w => w.StartsWith(searchTerm)).ToArray();
}
}
public static class RxAutoComplete
{
// Returns an observable that pushes the 'txt' TextBox text when it has changed.
static IObservable<string> TextChanged(TextBox txt)
{
return from evt in Observable.FromEventPattern<EventHandler, EventArgs>(
h => txt.TextChanged += h,
h => txt.TextChanged -= h)
select ((TextBox)evt.Sender).Text.Trim();
}
// Throttles the source.
static IObservable<string> ThrottleInput(IObservable<string> source, int minTextLength, TimeSpan throttle)
{
return source
.Where(t => t.Length >= minTextLength) // Wait until we have at least 'minTextLength' characters
.Throttle(throttle) // We don't start when the user is still typing
.DistinctUntilChanged(); // We only fire, if after throttling the text is different from before.
}
// Provides search results and performs asynchronous,
// cancellable search with automatic retries on errors
static IObservable<string[]> PerformSearch(IObservable<string> source, FakeWordSearchService searchService)
{
return from term in source // term from throttled input
from result in Observable.FromAsync(async token => await searchService.Search(term, token))
.Retry(3) // Perform up to 3 tries on failure
.TakeUntil(source) // Cancel pending request if new search request was made.
select result;
}
// Putting it all together.
public static void RunUI()
{
// Our simple search GUI.
var inputTextBox = new TextBox() { Width = 300 };
var searchResultLB = new ListBox { Top = inputTextBox.Height + 10, Width = inputTextBox.Width };
var searchStatus = new Label { Top = searchResultLB.Height + 30, Width = inputTextBox.Width };
var mainForm = new Form { Controls = { inputTextBox, searchResultLB, searchStatus }, Width = inputTextBox.Width + 20 };
// Our UI update handlers.
var syncContext = SynchronizationContext.Current;
Action<Action> onUITread = (x) => syncContext.Post(_ => x(), null);
Action<string> onSearchStarted = t => onUITread(() => searchStatus.Text = (string.Format("searching for '{0}'.", t)));
Action<string[]> onSearchResult = w => {
searchResultLB.Items.Clear();
searchResultLB.Items.AddRange(w);
searchStatus.Text += string.Format(" {0} maches found.", w.Length > 0 ? w.Length.ToString() : "No");
};
// Connecting input to search
var input = ThrottleInput(TextChanged(inputTextBox), 1, TimeSpan.FromSeconds(0.5)).Do(onSearchStarted);
var result = PerformSearch(input, new FakeWordSearchService());
// Running it
using (result.ObserveOn(syncContext).Subscribe(onSearchResult, ex => Console.WriteLine(ex)))
Application.Run(mainForm);
}
}
}

Why TPL Dataflow block.LinkTo does not give any output?

I am quite new to the topic TPL Dataflow. In the book Concurrency in C# I tested the following example. I can't figure out why there's no output which should be 2*2-2=2;
static void Main(string[] args)
{
//Task tt = test();
Task tt = test1();
Console.ReadLine();
}
static async Task test1()
{
try
{
var multiplyBlock = new TransformBlock<int, int>(item =>
{
if (item == 1)
throw new InvalidOperationException("Blech.");
return item * 2;
});
var subtractBlock = new TransformBlock<int, int>(item => item - 2);
multiplyBlock.LinkTo(subtractBlock,
new DataflowLinkOptions { PropagateCompletion = true });
multiplyBlock.Post(2);
await subtractBlock.Completion;
int temp = subtractBlock.Receive();
Console.WriteLine(temp);
}
catch (AggregateException e)
{
// The exception is caught here.
foreach (var v in e.InnerExceptions)
{
Console.WriteLine(v.Message);
}
}
}
Update1: I tried another example. Still I did not use Block.Complete() but I thought when the first block's completed, the result is passed into the second block automatically.
private static async Task test3()
{
TransformManyBlock<int, int> tmb = new TransformManyBlock<int, int>((i) => { return new int[] {i, i + 1}; });
ActionBlock<int> ab = new ActionBlock<int>((i) => Console.WriteLine(i));
tmb.LinkTo(ab);
for (int i = 0; i < 4; i++)
{
tmb.Post(i);
}
//tmb.Complete();
await ab.Completion;
Console.WriteLine("Finished post");
}
This part of the code:
await subtractBlock.Completion;
int temp = subtractBlock.Receive();
is first (asynchronously) waiting for the subtraction block to complete, and then attempting to retrieve an output from the block.
There are two problems: the source block is never completed, and the code is attempting to retrieve output from a completed block. Once a block has completed, it will not produce any more data.
(I assume you're referring to the example in recipe 4.2, which will post 1, causing the exception, which completes the block in a faulted state).
So, you can fix this test by completing the source block (and the completion will propagate along the link to the subtractBlock automatically), and by reading the output before (asynchronously) waiting for subtractBlock to complete:
multiplyBlock.Complete();
int temp = subtractBlock.Receive();
await subtractBlock.Completion;

Transform IEnumerable<Task<T>> asynchronously by awaiting each task

Today I was wondering how to transform a list of Tasks by awaiting each of it.
Consider the following example:
private static void Main(string[] args)
{
try
{
Run(args);
Console.ReadLine();
}
catch (Exception ex)
{
Console.WriteLine(ex.ToString());
Console.ReadLine();
}
}
static async Task Run(string[] args)
{
//Version 1: does compile, but ugly and List<T> overhead
var tasks1 = GetTasks();
List<string> gainStrings1 = new List<string>();
foreach (Task<string> task in tasks1)
{
gainStrings1.Add(await task);
}
Console.WriteLine(string.Join("", gainStrings1));
//Version 2: does not compile
var tasks2 = GetTasks();
IEnumerable<string> gainStrings2 = tasks2.Select(async t => await t);
Console.WriteLine(string.Join("", gainStrings2));
}
static IEnumerable<Task<string>> GetTasks()
{
string[] messages = new[] { "Hello", " ", "async", " ", "World" };
for (int i = 0; i < messages.Length; i++)
{
TaskCompletionSource<string> tcs = new TaskCompletionSource<string>();
tcs.SetResult(messages[i]);
yield return tcs.Task;
}
}
I'd like to transform my list of Tasks without the foreach, however either the anonymous function syntax nor the usual function syntax allows me to do what my foreach does.
Do I have to rely on my foreach and the List<T> or is there any way to get it to work with IEnumerable<T> and all its advantages?
What about this:
await Task.WhenAll(tasks1);
var gainStrings = tasks1.Select(t => t.Result).ToList();
Wait for all tasks to end and then extract results. This is ideal if you don't care in which order they are finished.
EDIT2:
Even better way:
var gainStrings = await Task.WhenAll(tasks1);

Categories