So I just started to try and understand async, Task, lambda and so on, and I am unable to get it to work like I want. With the code below I want for it to lock btnDoWebRequest, do a unknow number of WebRequests as a Task and once all the Task are done unlock btnDoWebRequest. However I only want a max of 3 or whatever number I set of Tasks running at one time, which I got partly from Have a set of Tasks with only X running at a time.
But after trying and modifying my code in multiple ways, it will always immediately jump back and reenabled btnDoWebRequest. Of course VS is warning me about needing awaits, currently at ".ContinueWith((task)" and at the async in "await Task.WhenAll(requestInfoList .Select(async i =>", but can't seem to work where or how to put in the needed awaits. Of course as I'm still learning there is a good chance I am going at this all wrong and the whole thing needs to be reworked. So any help would be greatly appreciated.
Thanks
private SemaphoreSlim maxThread = new SemaphoreSlim(3);
private void btnDoWebRequest_Click(object sender, EventArgs e)
{
btnDoWebRequest.Enabled = false;
Task.Factory.StartNew(async () => await DoWebRequest()).Wait();
btnDoWebRequest.Enabled = true;
}
private async Task DoWebRequest()
{
List<requestInfo> requestInfoList = new List<requestInfo>();
for (int i = 0; dataRequestInfo.RowCount - 1 > i; i++)
{
requestInfoList.Add((requestInfo)dataRequestInfo.Rows[i].Tag);
}
await Task.WhenAll(requestInfoList .Select(async i =>
{
maxThread.Wait();
Task.Factory.StartNew(() =>
{
var task = Global.webRequestWork(i);
}, TaskCreationOptions.LongRunning).ContinueWith((task) => maxThread.Release());
}));
}
First, don't use Task.Factory.StartNew by default. In fact, this should be avoided in async code. If you need to execute code on a background thread, then use Task.Run.
In your case, there's no need to use Task.Run (or Task.Factory.StartNew).
Start at the lowest level and work your way up. You already have an asynchronous web-requesting method, which I'll rename to WebRequestAsync to follow the Task-based Asynchronous Programming naming guidelines.
Next, throttle it by using the asynchronous APIs on SemaphoreSlim:
await maxThread.WaitAsync();
try
{
await Global.WebRequestWorkAsync(i);
}
finally
{
maxThread.Release();
}
Do that for each request info (note that no background thread is required):
private async Task DoWebRequestsAsync()
{
List<requestInfo> requestInfoList = new List<requestInfo>();
for (int i = 0; dataRequestInfo.RowCount - 1 > i; i++)
{
requestInfoList.Add((requestInfo)dataRequestInfo.Rows[i].Tag);
}
await Task.WhenAll(requestInfoList.Select(async i =>
{
await maxThread.WaitAsync();
try
{
await Global.WebRequestWorkAsync(i);
}
finally
{
maxThread.Release();
}
}));
}
Finally, call this from your UI (again, no background thread is required):
private async void btnDoWebRequest_Click(object sender, EventArgs e)
{
btnDoWebRequest.Enabled = false;
await DoWebRequestsAsync();
btnDoWebRequest.Enabled = true;
}
In summary, only use Task.Run when you need to; do not use Task.Factory.StartNew, and do not use Wait (use await instead). I have an async intro on my blog with more information.
There are a couple of things wrong with your code:
Using Wait() on a Task is like running things synchronously, hence you only notice the UI reacting when everything is done and the button reenabled. You need to await an async method in order for it to truely run async. More so, if a method is doing IO bound work like a web request, spinning up a new Thread Pool thread (using Task.Factory.StartNew) is redundant and is a waste of resources.
Your button click event handler needs to be marked with async so you can await inside your method.
I've cleaned up your code a bit for clarity, using the new SemaphoreSlim WaitAsync and replaced your for with a LINQ query. You may only take the first two points and apply them to your code.
private SemaphoreSlim maxThread = new SemaphoreSlim(3);
private async void btnDoWebRequest_Click(object sender, EventArgs e)
{
btnDoWebRequest.Enabled = false;
await DoWebRequest();
btnDoWebRequest.Enabled = true;
}
private async Task DoWebRequest()
{
List<requestInfo> requestInfoList = new List<requestInfo>();
var requestInfoList = dataRequestInfo.Rows.Select(x => x.Tag).Cast<requestInfo>();
var tasks = requestInfoList.Select(async I =>
{
await maxThread.WaitAsync();
try
{
await Global.webRequestWork(i);
}
finally
{
maxThread.Release();
}
});
await Task.WhenAll(tasks);
I have created an extension method for this.
It can be used like this:
var tt = new List<Func<Task>>()
{
() => Thread.Sleep(300), //Thread.Sleep can be replaced by your own functionality, like calling the website
() => Thread.Sleep(800),
() => Thread.Sleep(250),
() => Thread.Sleep(1000),
() => Thread.Sleep(100),
() => Thread.Sleep(200),
};
await tt.WhenAll(3); //this will let 3 threads run, if one ends, the next will start, untill all are finished.
The extention method:
public static class TaskExtension
{
public static async Task WhenAll(this List<Func<Task>> actions, int threadCount)
{
var _countdownEvent = new CountdownEvent(actions.Count);
var _throttler = new SemaphoreSlim(threadCount);
foreach (Func<Task> action in actions)
{
await _throttler.WaitAsync();
Task.Run(async () =>
{
try
{
await action();
}
finally
{
_throttler.Release();
_countdownEvent.Signal();
}
});
}
_countdownEvent.Wait();
}
}
We can easily achieve this using SemaphoreSlim. Extension method I've created:
/// <summary>
/// Concurrently Executes async actions for each item of <see cref="IEnumerable<typeparamref name="T"/>
/// </summary>
/// <typeparam name="T">Type of IEnumerable</typeparam>
/// <param name="enumerable">instance of <see cref="IEnumerable<typeparamref name="T"/>"/></param>
/// <param name="action">an async <see cref="Action" /> to execute</param>
/// <param name="maxActionsToRunInParallel">Optional, max numbers of the actions to run in parallel,
/// Must be grater than 0</param>
/// <returns>A Task representing an async operation</returns>
/// <exception cref="ArgumentOutOfRangeException">If the maxActionsToRunInParallel is less than 1</exception>
public static async Task ForEachAsyncConcurrent<T>(
this IEnumerable<T> enumerable,
Func<T, Task> action,
int? maxActionsToRunInParallel = null)
{
if (maxActionsToRunInParallel.HasValue)
{
using (var semaphoreSlim = new SemaphoreSlim(
maxActionsToRunInParallel.Value, maxActionsToRunInParallel.Value))
{
var tasksWithThrottler = new List<Task>();
foreach (var item in enumerable)
{
// Increment the number of currently running tasks and wait if they are more than limit.
await semaphoreSlim.WaitAsync();
tasksWithThrottler.Add(Task.Run(async () =>
{
await action(item);
// action is completed, so decrement the number of currently running tasks
semaphoreSlim.Release();
}));
}
// Wait for all tasks to complete.
await Task.WhenAll(tasksWithThrottler.ToArray());
}
}
else
{
await Task.WhenAll(enumerable.Select(item => action(item)));
}
}
Sample Usage:
await enumerable.ForEachAsyncConcurrent(
async item =>
{
await SomeAsyncMethod(item);
},
5);
Related
I would like to handle a collection in parallel, but I'm having trouble implementing it and I'm therefore hoping for some help.
The trouble arises if I want to call a method marked async in C#, within the lambda of the parallel loop. For example:
var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
}
var count = bag.Count;
The problem occurs with the count being 0, because all the threads created are effectively just background threads and the Parallel.ForEach call doesn't wait for completion. If I remove the async keyword, the method looks like this:
var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, item =>
{
// some pre stuff
var responseTask = await GetData(item);
responseTask.Wait();
var response = responseTask.Result;
bag.Add(response);
// some post stuff
}
var count = bag.Count;
It works, but it completely disables the await cleverness and I have to do some manual exception handling.. (Removed for brevity).
How can I implement a Parallel.ForEach loop, that uses the await keyword within the lambda? Is it possible?
The prototype of the Parallel.ForEach method takes an Action<T> as parameter, but I want it to wait for my asynchronous lambda.
If you just want simple parallelism, you can do this:
var bag = new ConcurrentBag<object>();
var tasks = myCollection.Select(async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
});
await Task.WhenAll(tasks);
var count = bag.Count;
If you need something more complex, check out Stephen Toub's ForEachAsync post.
You can use the ParallelForEachAsync extension method from AsyncEnumerator NuGet Package:
using Dasync.Collections;
var bag = new ConcurrentBag<object>();
await myCollection.ParallelForEachAsync(async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
}, maxDegreeOfParallelism: 10);
var count = bag.Count;
Disclaimer: I'm the author of the AsyncEnumerator library, which is open source and licensed under MIT, and I'm posting this message just to help the community.
One of the new .NET 6 APIs is Parallel.ForEachAsync, a way to schedule asynchronous work that allows you to control the degree of parallelism:
var urls = new []
{
"https://dotnet.microsoft.com",
"https://www.microsoft.com",
"https://stackoverflow.com"
};
var client = new HttpClient();
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
await Parallel.ForEachAsync(urls, options, async (url, token) =>
{
var targetPath = Path.Combine(Path.GetTempPath(), "http_cache", url);
var response = await client.GetAsync(url);
if (response.IsSuccessStatusCode)
{
using var target = File.OpenWrite(targetPath);
await response.Content.CopyToAsync(target);
}
});
Another example in Scott Hanselman's blog.
The source, for reference.
With SemaphoreSlim you can achieve parallelism control.
var bag = new ConcurrentBag<object>();
var maxParallel = 20;
var throttler = new SemaphoreSlim(initialCount: maxParallel);
var tasks = myCollection.Select(async item =>
{
await throttler.WaitAsync();
try
{
var response = await GetData(item);
bag.Add(response);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
var count = bag.Count;
Simplest possible extension method compiled from other answers and the article referenced by the accepted asnwer:
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> asyncAction, int maxDegreeOfParallelism)
{
var throttler = new SemaphoreSlim(initialCount: maxDegreeOfParallelism);
var tasks = source.Select(async item =>
{
await throttler.WaitAsync();
try
{
await asyncAction(item).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
}
UPDATE: here's a simple modification that also supports a cancellation token like requested in the comments (untested)
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, CancellationToken, Task> asyncAction, int maxDegreeOfParallelism, CancellationToken cancellationToken)
{
var throttler = new SemaphoreSlim(initialCount: maxDegreeOfParallelism);
var tasks = source.Select(async item =>
{
await throttler.WaitAsync(cancellationToken);
if (cancellationToken.IsCancellationRequested) return;
try
{
await asyncAction(item, cancellationToken).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
}
My lightweight implementation of ParallelForEach async.
Features:
Throttling (max degree of parallelism).
Exception handling (aggregation exception will be thrown at completion).
Memory efficient (no need to store the list of tasks).
public static class AsyncEx
{
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> asyncAction, int maxDegreeOfParallelism = 10)
{
var semaphoreSlim = new SemaphoreSlim(maxDegreeOfParallelism);
var tcs = new TaskCompletionSource<object>();
var exceptions = new ConcurrentBag<Exception>();
bool addingCompleted = false;
foreach (T item in source)
{
await semaphoreSlim.WaitAsync();
asyncAction(item).ContinueWith(t =>
{
semaphoreSlim.Release();
if (t.Exception != null)
{
exceptions.Add(t.Exception);
}
if (Volatile.Read(ref addingCompleted) && semaphoreSlim.CurrentCount == maxDegreeOfParallelism)
{
tcs.TrySetResult(null);
}
});
}
Volatile.Write(ref addingCompleted, true);
await tcs.Task;
if (exceptions.Count > 0)
{
throw new AggregateException(exceptions);
}
}
}
Usage example:
await Enumerable.Range(1, 10000).ParallelForEachAsync(async (i) =>
{
var data = await GetData(i);
}, maxDegreeOfParallelism: 100);
I've created an extension method for this which makes use of SemaphoreSlim and also allows to set maximum degree of parallelism
/// <summary>
/// Concurrently Executes async actions for each item of <see cref="IEnumerable<typeparamref name="T"/>
/// </summary>
/// <typeparam name="T">Type of IEnumerable</typeparam>
/// <param name="enumerable">instance of <see cref="IEnumerable<typeparamref name="T"/>"/></param>
/// <param name="action">an async <see cref="Action" /> to execute</param>
/// <param name="maxDegreeOfParallelism">Optional, An integer that represents the maximum degree of parallelism,
/// Must be grater than 0</param>
/// <returns>A Task representing an async operation</returns>
/// <exception cref="ArgumentOutOfRangeException">If the maxActionsToRunInParallel is less than 1</exception>
public static async Task ForEachAsyncConcurrent<T>(
this IEnumerable<T> enumerable,
Func<T, Task> action,
int? maxDegreeOfParallelism = null)
{
if (maxDegreeOfParallelism.HasValue)
{
using (var semaphoreSlim = new SemaphoreSlim(
maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value))
{
var tasksWithThrottler = new List<Task>();
foreach (var item in enumerable)
{
// Increment the number of currently running tasks and wait if they are more than limit.
await semaphoreSlim.WaitAsync();
tasksWithThrottler.Add(Task.Run(async () =>
{
await action(item).ContinueWith(res =>
{
// action is completed, so decrement the number of currently running tasks
semaphoreSlim.Release();
});
}));
}
// Wait for all tasks to complete.
await Task.WhenAll(tasksWithThrottler.ToArray());
}
}
else
{
await Task.WhenAll(enumerable.Select(item => action(item)));
}
}
Sample Usage:
await enumerable.ForEachAsyncConcurrent(
async item =>
{
await SomeAsyncMethod(item);
},
5);
In the accepted answer the ConcurrentBag is not required.
Here's an implementation without it:
var tasks = myCollection.Select(GetData).ToList();
await Task.WhenAll(tasks);
var results = tasks.Select(t => t.Result);
Any of the "// some pre stuff" and "// some post stuff" can go into the GetData implementation (or another method that calls GetData)
Aside from being shorter, there's no use of an "async void" lambda, which is an anti pattern.
The following is set to work with IAsyncEnumerable but can be modified to use IEnumerable by just changing the type and removing the "await" on the foreach. It's far more appropriate for large sets of data than creating countless parallel tasks and then awaiting them all.
public static async Task ForEachAsyncConcurrent<T>(this IAsyncEnumerable<T> enumerable, Func<T, Task> action, int maxDegreeOfParallelism, int? boundedCapacity = null)
{
ActionBlock<T> block = new ActionBlock<T>(
action,
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = maxDegreeOfParallelism,
BoundedCapacity = boundedCapacity ?? maxDegreeOfParallelism * 3
});
await foreach (T item in enumerable)
{
await block.SendAsync(item).ConfigureAwait(false);
}
block.Complete();
await block.Completion;
}
For a more simple solution (not sure if the most optimal), you can simply nest Parallel.ForEach inside a Task - as such
var options = new ParallelOptions { MaxDegreeOfParallelism = 5 }
Task.Run(() =>
{
Parallel.ForEach(myCollection, options, item =>
{
DoWork(item);
}
}
The ParallelOptions will do the throttlering for you, out of the box.
I am using it in a real world scenario to run a very long operations in the background. These operations are called via HTTP and it was designed not to block the HTTP call while the long operation is running.
Calling HTTP for long background operation.
Operation starts at the background.
User gets status ID which can be used to check the status using another HTTP call.
The background operation update its status.
That way, the CI/CD call does not timeout because of long HTTP operation, rather it loops the status every x seconds without blocking the process
I have few methods that report some data to Data base. We want to invoke all calls to Data service asynchronously. These calls to data service are all over and so we want to make sure that these DS calls are executed one after another in order at any given time. Initially, i was using async await on each of these methods and each of the calls were executed asynchronously but we found out if they are out of sequence then there are room for errors.
So, i thought we should queue all these asynchronous tasks and send them in a separate thread but i want to know what options we have? I came across 'SemaphoreSlim' . Will this be appropriate in my use case?
Or what other options will suit my use case? Please, guide me.
So, what i have in my code currently
public static SemaphoreSlim mutex = new SemaphoreSlim(1);
//first DS call
public async Task SendModuleDataToDSAsync(Module parameters)
{
var tasks1 = new List<Task>();
var tasks2 = new List<Task>();
//await mutex.WaitAsync(); **//is this correct way to use SemaphoreSlim ?**
foreach (var setting in Module.param)
{
Task job1 = SaveModule(setting);
tasks1.Add(job1);
Task job2= SaveModule(GetAdvancedData(setting));
tasks2.Add(job2);
}
await Task.WhenAll(tasks1);
await Task.WhenAll(tasks2);
//mutex.Release(); // **is this correct?**
}
private async Task SaveModule(Module setting)
{
await Task.Run(() =>
{
// Invokes Calls to DS
...
});
}
//somewhere down the main thread, invoking second call to DS
//Second DS Call
private async Task SendInstrumentSettingsToDS(<param1>, <param2>)
{
//await mutex.WaitAsync();// **is this correct?**
await Task.Run(() =>
{
//TrackInstrumentInfoToDS
//mutex.Release();// **is this correct?**
});
if(param2)
{
await Task.Run(() =>
{
//TrackParam2InstrumentInfoToDS
});
}
}
Initially, i was using async await on each of these methods and each of the calls were executed asynchronously but we found out if they are out of sequence then there are room for errors.
So, i thought we should queue all these asynchronous tasks and send them in a separate thread but i want to know what options we have? I came across 'SemaphoreSlim' .
SemaphoreSlim does restrict asynchronous code to running one at a time, and is a valid form of mutual exclusion. However, since "out of sequence" calls can cause errors, then SemaphoreSlim is not an appropriate solution since it does not guarantee FIFO.
In a more general sense, no synchronization primitive guarantees FIFO because that can cause problems due to side effects like lock convoys. On the other hand, it is natural for data structures to be strictly FIFO.
So, you'll need to use your own FIFO queue, rather than having an implicit execution queue. Channels is a nice, performant, async-compatible queue, but since you're on an older version of C#/.NET, BlockingCollection<T> would work:
public sealed class ExecutionQueue
{
private readonly BlockingCollection<Func<Task>> _queue = new BlockingCollection<Func<Task>>();
public ExecutionQueue() => Completion = Task.Run(() => ProcessQueueAsync());
public Task Completion { get; }
public void Complete() => _queue.CompleteAdding();
private async Task ProcessQueueAsync()
{
foreach (var value in _queue.GetConsumingEnumerable())
await value();
}
}
The only tricky part with this setup is how to queue work. From the perspective of the code queueing the work, they want to know when the lambda is executed, not when the lambda is queued. From the perspective of the queue method (which I'm calling Run), the method needs to complete its returned task only after the lambda is executed. So, you can write the queue method something like this:
public Task Run(Func<Task> lambda)
{
var tcs = new TaskCompletionSource<object>();
_queue.Add(async () =>
{
// Execute the lambda and propagate the results to the Task returned from Run
try
{
await lambda();
tcs.TrySetResult(null);
}
catch (OperationCanceledException ex)
{
tcs.TrySetCanceled(ex.CancellationToken);
}
catch (Exception ex)
{
tcs.TrySetException(ex);
}
});
return tcs.Task;
}
This queueing method isn't as perfect as it could be. If a task completes with more than one exception (this is normal for parallel code), only the first one is retained (this is normal for async code). There's also an edge case around OperationCanceledException handling. But this code is good enough for most cases.
Now you can use it like this:
public static ExecutionQueue _queue = new ExecutionQueue();
public async Task SendModuleDataToDSAsync(Module parameters)
{
var tasks1 = new List<Task>();
var tasks2 = new List<Task>();
foreach (var setting in Module.param)
{
Task job1 = _queue.Run(() => SaveModule(setting));
tasks1.Add(job1);
Task job2 = _queue.Run(() => SaveModule(GetAdvancedData(setting)));
tasks2.Add(job2);
}
await Task.WhenAll(tasks1);
await Task.WhenAll(tasks2);
}
Here's a compact solution that has the least amount of moving parts but still guarantees FIFO ordering (unlike some of the suggested SemaphoreSlim solutions). There are two overloads for Enqueue so you can enqueue tasks with and without return values.
using System;
using System.Threading;
using System.Threading.Tasks;
public class TaskQueue
{
private Task _previousTask = Task.CompletedTask;
public Task Enqueue(Func<Task> asyncAction)
{
return Enqueue(async () => {
await asyncAction().ConfigureAwait(false);
return true;
});
}
public async Task<T> Enqueue<T>(Func<Task<T>> asyncFunction)
{
var tcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
// get predecessor and wait until it's done. Also atomically swap in our own completion task.
await Interlocked.Exchange(ref _previousTask, tcs.Task).ConfigureAwait(false);
try
{
return await asyncFunction().ConfigureAwait(false);
}
finally
{
tcs.SetResult();
}
}
}
Please keep in mind that your first solution queueing all tasks to lists doesn't ensure that the tasks are executed one after another. They're all running in parallel because they're not awaited until the next tasks is startet.
So yes you've to use a SemapohoreSlim to use async locking and await. A simple implementation might be:
private readonly SemaphoreSlim _syncRoot = new SemaphoreSlim(1);
public async Task SendModuleDataToDSAsync(Module parameters)
{
await this._syncRoot.WaitAsync();
try
{
foreach (var setting in Module.param)
{
await SaveModule(setting);
await SaveModule(GetAdvancedData(setting));
}
}
finally
{
this._syncRoot.Release();
}
}
If you can use Nito.AsyncEx the code can be simplified to:
public async Task SendModuleDataToDSAsync(Module parameters)
{
using var lockHandle = await this._syncRoot.LockAsync();
foreach (var setting in Module.param)
{
await SaveModule(setting);
await SaveModule(GetAdvancedData(setting));
}
}
One option is to queue operations that will create tasks instead of queuing already running tasks as the code in the question does.
PseudoCode without locking:
Queue<Func<Task>> tasksQueue = new Queue<Func<Task>>();
async Task RunAllTasks()
{
while (tasksQueue.Count > 0)
{
var taskCreator = tasksQueue.Dequeu(); // get creator
var task = taskCreator(); // staring one task at a time here
await task; // wait till task completes
}
}
// note that declaring createSaveModuleTask does not
// start SaveModule task - it will only happen after this func is invoked
// inside RunAllTasks
Func<Task> createSaveModuleTask = () => SaveModule(setting);
tasksQueue.Add(createSaveModuleTask);
tasksQueue.Add(() => SaveModule(GetAdvancedData(setting)));
// no DB operations started at this point
// this will start tasks from the queue one by one.
await RunAllTasks();
Using ConcurrentQueue would be likely be right thing in actual code. You also would need to know total number of expected operations to stop when all are started and awaited one after another.
Building on your comment under Alexeis answer, your approch with the SemaphoreSlim is correct.
Assumeing that the methods SendInstrumentSettingsToDS and SendModuleDataToDSAsync are members of the same class. You simplay need a instance variable for a SemaphoreSlim and then at the start of each methode that needs synchornization call await lock.WaitAsync() and call lock.Release() in the finally block.
public async Task SendModuleDataToDSAsync(Module parameters)
{
await lock.WaitAsync();
try
{
...
}
finally
{
lock.Release();
}
}
private async Task SendInstrumentSettingsToDS(<param1>, <param2>)
{
await lock.WaitAsync();
try
{
...
}
finally
{
lock.Release();
}
}
and it is importend that the call to lock.Release() is in the finally-block, so that if an exception is thrown somewhere in the code of the try-block the semaphore is released.
I've written a class that asynchronously pings a subnet. It works, however, the number of hosts returned will sometimes change between runs. Some questions:
Am I doing something wrong in the code below?
What can I do to make it work better?
The ScanIPAddressesAsync() method is called like this:
NetworkDiscovery nd = new NetworkDiscovery("192.168.50.");
nd.RaiseIPScanCompleteEvent += HandleScanComplete;
nd.ScanIPAddressesAsync();
namespace BPSTestTool
{
public class IPScanCompleteEvent : EventArgs
{
public List<String> IPList { get; set; }
public IPScanCompleteEvent(List<String> _list)
{
IPList = _list;
}
}
public class NetworkDiscovery
{
private static object m_lockObj = new object();
private List<String> m_ipsFound = new List<string>();
private String m_ipBase = null;
public List<String> IPList
{
get { return m_ipsFound; }
}
public EventHandler<IPScanCompleteEvent> RaiseIPScanCompleteEvent;
public NetworkDiscovery(string ipBase)
{
this.m_ipBase = ipBase;
}
public async void ScanIPAddressesAsync()
{
var tasks = new List<Task>();
m_ipsFound.Clear();
await Task.Run(() => AsyncScan());
return;
}
private async void AsyncScan()
{
List<Task> tasks = new List<Task>();
for (int i = 2; i < 255; i++)
{
String ip = m_ipBase + i.ToString();
if (m_ipsFound.Contains(ip) == false)
{
for (int x = 0; x < 2; x++)
{
Ping p = new Ping();
var task = HandlePingReplyAsync(p, ip);
tasks.Add(task);
}
}
}
await Task.WhenAll(tasks).ContinueWith(t =>
{
OnRaiseIPScanCompleteEvent(new IPScanCompleteEvent(m_ipsFound));
});
}
protected virtual void OnRaiseIPScanCompleteEvent(IPScanCompleteEvent args)
{
RaiseIPScanCompleteEvent?.Invoke(this, args);
}
private async Task HandlePingReplyAsync(Ping ping, String ip)
{
PingReply reply = await ping.SendPingAsync(ip, 1500);
if ( reply != null && reply.Status == System.Net.NetworkInformation.IPStatus.Success)
{
lock (m_lockObj)
{
if (m_ipsFound.Contains(ip) == false)
{
m_ipsFound.Add(ip);
}
}
}
}
}
}
One problem I see is async void. The only reason async void is even allowed is only for event handlers. If it's not an event handler, it's a red flag.
Asynchronous methods always start running synchronously until the first await that acts on an incomplete Task. In your code, that is at await Task.WhenAll(tasks). At that point, AsyncScan returns - before all the tasks have completed. Usually, it would return a Task that will let you know when it's done, but since the method signature is void, it cannot.
So now look at this:
await Task.Run(() => AsyncScan());
When AsyncScan() returns, then the Task returned from Task.Run completes and your code moves on, before all of the pings have finished.
So when you report your results, the number of results will be random, depending on how many happened to finish before you displayed the results.
If you want make sure that all of the pings are done before continuing, then change AsyncScan() to return a Task:
private async Task AsyncScan()
And change the Task.Run to await it:
await Task.Run(async () => await AsyncScan());
However, you could also just get rid of the Task.Run and just have this:
await AsyncScan();
Task.Run runs the code in a separate thread. The only reason to do that is in a UI app where you want to move CPU-heavy computations off of the UI thread. When you're just doing network requests like this, that's not necessary.
On top of that, you're also using async void here:
public async void ScanIPAddressesAsync()
Which means that wherever you call ScanIPAddressesAsync() is unable to wait until everything is done. Change that to async Task and await it too.
This code needs a lot of refactoring and bugs like this in concurrency are hard to pinpoint. My bet is on await Task.Run(() => AsyncScan()); which is wrong because AsyncScan() is async and Task.Run(...) will return before it is complete.
My second guess is m_ipsFound which is called a shared state. This means there might be many threads simultaneously reading and writing on this. List<T> is not a data type for this.
Also as a side point having a return in the last line of a method is not adding to the readability and async void is a prohibited practice. Always use async Task even if you return nothing. You can read more on this very good answer.
I have two examples, straight from microsoft, where these examples seem to have nothing to do with cancellation token, because I can remove the token that is fed to the task, and the result is the same. So my question is: What is the cancellation token for, and why the poor examples? Am I missing something..? :)
using System;
using System.Threading;
using System.Threading.Tasks;
namespace Chapter1.Threads
{
public class Program
{
static void Main()
{
CancellationTokenSource cancellationTokenSource =
new CancellationTokenSource();
CancellationToken token = cancellationTokenSource.Token;
Task task = Task.Run(() =>
{
while (!token.IsCancellationRequested)
{
Console.Write(“*”);
Thread.Sleep(1000);
}
token.ThrowIfCancellationRequested();
}, token);
try
{
Console.WriteLine(“Press enter to stop the task”);
Console.ReadLine();
cancellationTokenSource.Cancel();
task.Wait();
}
catch (AggregateException e)
{
Console.WriteLine(e.InnerExceptions[0].Message);
}
Console.WriteLine(“Press enter to end the application”);
Console.ReadLine();
}
}
}
Code example2:
https://msdn.microsoft.com/en-us/library/system.threading.cancellationtoken(v=vs.110).aspx
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
public class Example
{
public static void Main()
{
// Define the cancellation token.
CancellationTokenSource source = new CancellationTokenSource();
CancellationToken token = source.Token;
Random rnd = new Random();
Object lockObj = new Object();
List<Task<int[]>> tasks = new List<Task<int[]>>();
TaskFactory factory = new TaskFactory(token);
for (int taskCtr = 0; taskCtr <= 10; taskCtr++) {
int iteration = taskCtr + 1;
tasks.Add(factory.StartNew( () => {
int value;
int[] values = new int[10];
for (int ctr = 1; ctr <= 10; ctr++) {
lock (lockObj) {
value = rnd.Next(0,101);
}
if (value == 0) {
source.Cancel();
Console.WriteLine("Cancelling at task {0}", iteration);
break;
}
values[ctr-1] = value;
}
return values;
}, token));
}
try {
Task<double> fTask = factory.ContinueWhenAll(tasks.ToArray(),
(results) => {
Console.WriteLine("Calculating overall mean...");
long sum = 0;
int n = 0;
foreach (var t in results) {
foreach (var r in t.Result) {
sum += r;
n++;
}
}
return sum/(double) n;
} , token);
Console.WriteLine("The mean is {0}.", fTask.Result);
}
catch (AggregateException ae) {
foreach (Exception e in ae.InnerExceptions) {
if (e is TaskCanceledException)
Console.WriteLine("Unable to compute mean: {0}",
((TaskCanceledException) e).Message);
else
Console.WriteLine("Exception: " + e.GetType().Name);
}
}
finally {
source.Dispose();
}
}
}
Since cancellation in .Net is cooperative passing a CancellationToken into Task.Run for example is not enough to make sure the task is cancelled.
Passing the token as a parameter only associates the token with the task. It can cancel the task only if it didn't have a chance to start running before the token was cancelled. For example:
var token = new CancellationToken(true); // creates a cancelled token
Task.Run(() => {}, token);
To cancel a task "mid-flight" you need the task itself to observe the token and throw when cancellation is signaled, similar to:
Task.Run(() =>
{
while (true)
{
token.ThrowIfCancellationRequested();
// do something
}
}, token);
Moreover, simply throwing an exception from inside the task only marks the task as Faulted. To mark it as Cancelled the TaskCanceledException.CancellationToken needs to match the token passed to Task.Run.
I was about to ask a similar question until I found this one. The answer from i3arnon makes sense but I'll add this answer as an addition to hopefully help someone along.
I'll start out by saying (in contrast to the comments on the accepted answer) that the examples from Microsoft on the MSDN are horrible. Unless you already know how Cancellation works, they won't help you much. This MSDN article shows you how to pass a CancellationToken to a Task but if you follow the examples, they never actually show you how to cancel your own currently executing Task. The CancellationToken just vanishes into Microsoft code:
await client.GetAsync("http://msdn.microsoft.com/en-us/library/dd470362.aspx", ct);
await response.Content.ReadAsByteArrayAsync();
Here are examples of how I use CancellationToken:
When I have a task that needs to continually repeat:
public class Foo
{
private CancellationTokenSource _cts;
public Foo()
{
this._cts = new CancellationTokenSource();
}
public void StartExecution()
{
Task.Factory.StartNew(this.OwnCodeCancelableTask, this._cts.Token);
Task.Factory.StartNew(this.OwnCodeCancelableTask_EveryNSeconds, this._cts.Token);
}
public void CancelExecution()
{
this._cts.Cancel();
}
/// <summary>
/// "Infinite" loop with no delays. Writing to a database while pulling from a buffer for example.
/// </summary>
/// <param name="taskState">The cancellation token from our _cts field, passed in the StartNew call</param>
private void OwnCodeCancelableTask(object taskState)
{
var token = (CancellationToken) taskState;
while ( !token.IsCancellationRequested )
{
Console.WriteLine("Do your task work in this loop");
}
}
/// <summary>
/// "Infinite" loop that runs every N seconds. Good for checking for a heartbeat or updates.
/// </summary>
/// <param name="taskState">The cancellation token from our _cts field, passed in the StartNew call</param>
private async void OwnCodeCancelableTask_EveryNSeconds(object taskState)
{
var token = (CancellationToken)taskState;
while (!token.IsCancellationRequested)
{
Console.WriteLine("Do the work that needs to happen every N seconds in this loop");
// Passing token here allows the Delay to be cancelled if your task gets cancelled.
await Task.Delay(1000 /*Or however long you want to wait.*/, token);
}
}
}
When I have a task that a user can initiate:
public class Foo
{
private CancellationTokenSource _cts;
private Task _taskWeCanCancel;
public Foo()
{
this._cts = new CancellationTokenSource();
//This is where it's confusing. Passing the token here will only ensure that the task doesn't
//run if it's canceled BEFORE it starts. This does not cancel the task during the operation of our code.
this._taskWeCanCancel = new Task(this.FireTheTask, this._cts.Token);
}
/// <summary>
/// I'm not a fan of returning tasks to calling code, so I keep this method void
/// </summary>
public void FireTheTask()
{
//Check task status here if it's required.
this._taskWeCanCancel.Start();
}
public void CancelTheTask()
{
this._cts.Cancel();
}
/// <summary>
/// Go and get something from the web, process a piece of data, execute a lengthy calculation etc...
/// </summary>
private async void OurTask()
{
Console.WriteLine("Do your work here and check periodically for task cancellation requests...");
if (this._cts.Token.IsCancellationRequested) return;
Console.WriteLine("Do another step to your work here then check the token again if necessary...");
if (this._cts.Token.IsCancellationRequested) return;
Console.WriteLine("Some work that we need to delegate to another task");
await Some.Microsoft.Object.DoStuffAsync();
}
}
Maybe I missed some key feature of Task, but passing the CancellationToken to a Task as anything other than state has never made much sense to me. I have yet to run into a situation where I've passed a CancellationToken to a Task and cancelled the Task before it's run, and even if I did, the first line in every Task I create is always
if (token.IsCancellationRequested) return;
I would like to handle a collection in parallel, but I'm having trouble implementing it and I'm therefore hoping for some help.
The trouble arises if I want to call a method marked async in C#, within the lambda of the parallel loop. For example:
var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
}
var count = bag.Count;
The problem occurs with the count being 0, because all the threads created are effectively just background threads and the Parallel.ForEach call doesn't wait for completion. If I remove the async keyword, the method looks like this:
var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, item =>
{
// some pre stuff
var responseTask = await GetData(item);
responseTask.Wait();
var response = responseTask.Result;
bag.Add(response);
// some post stuff
}
var count = bag.Count;
It works, but it completely disables the await cleverness and I have to do some manual exception handling.. (Removed for brevity).
How can I implement a Parallel.ForEach loop, that uses the await keyword within the lambda? Is it possible?
The prototype of the Parallel.ForEach method takes an Action<T> as parameter, but I want it to wait for my asynchronous lambda.
If you just want simple parallelism, you can do this:
var bag = new ConcurrentBag<object>();
var tasks = myCollection.Select(async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
});
await Task.WhenAll(tasks);
var count = bag.Count;
If you need something more complex, check out Stephen Toub's ForEachAsync post.
You can use the ParallelForEachAsync extension method from AsyncEnumerator NuGet Package:
using Dasync.Collections;
var bag = new ConcurrentBag<object>();
await myCollection.ParallelForEachAsync(async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
}, maxDegreeOfParallelism: 10);
var count = bag.Count;
Disclaimer: I'm the author of the AsyncEnumerator library, which is open source and licensed under MIT, and I'm posting this message just to help the community.
One of the new .NET 6 APIs is Parallel.ForEachAsync, a way to schedule asynchronous work that allows you to control the degree of parallelism:
var urls = new []
{
"https://dotnet.microsoft.com",
"https://www.microsoft.com",
"https://stackoverflow.com"
};
var client = new HttpClient();
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
await Parallel.ForEachAsync(urls, options, async (url, token) =>
{
var targetPath = Path.Combine(Path.GetTempPath(), "http_cache", url);
var response = await client.GetAsync(url);
if (response.IsSuccessStatusCode)
{
using var target = File.OpenWrite(targetPath);
await response.Content.CopyToAsync(target);
}
});
Another example in Scott Hanselman's blog.
The source, for reference.
With SemaphoreSlim you can achieve parallelism control.
var bag = new ConcurrentBag<object>();
var maxParallel = 20;
var throttler = new SemaphoreSlim(initialCount: maxParallel);
var tasks = myCollection.Select(async item =>
{
await throttler.WaitAsync();
try
{
var response = await GetData(item);
bag.Add(response);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
var count = bag.Count;
Simplest possible extension method compiled from other answers and the article referenced by the accepted asnwer:
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> asyncAction, int maxDegreeOfParallelism)
{
var throttler = new SemaphoreSlim(initialCount: maxDegreeOfParallelism);
var tasks = source.Select(async item =>
{
await throttler.WaitAsync();
try
{
await asyncAction(item).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
}
UPDATE: here's a simple modification that also supports a cancellation token like requested in the comments (untested)
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, CancellationToken, Task> asyncAction, int maxDegreeOfParallelism, CancellationToken cancellationToken)
{
var throttler = new SemaphoreSlim(initialCount: maxDegreeOfParallelism);
var tasks = source.Select(async item =>
{
await throttler.WaitAsync(cancellationToken);
if (cancellationToken.IsCancellationRequested) return;
try
{
await asyncAction(item, cancellationToken).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
}
My lightweight implementation of ParallelForEach async.
Features:
Throttling (max degree of parallelism).
Exception handling (aggregation exception will be thrown at completion).
Memory efficient (no need to store the list of tasks).
public static class AsyncEx
{
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> asyncAction, int maxDegreeOfParallelism = 10)
{
var semaphoreSlim = new SemaphoreSlim(maxDegreeOfParallelism);
var tcs = new TaskCompletionSource<object>();
var exceptions = new ConcurrentBag<Exception>();
bool addingCompleted = false;
foreach (T item in source)
{
await semaphoreSlim.WaitAsync();
asyncAction(item).ContinueWith(t =>
{
semaphoreSlim.Release();
if (t.Exception != null)
{
exceptions.Add(t.Exception);
}
if (Volatile.Read(ref addingCompleted) && semaphoreSlim.CurrentCount == maxDegreeOfParallelism)
{
tcs.TrySetResult(null);
}
});
}
Volatile.Write(ref addingCompleted, true);
await tcs.Task;
if (exceptions.Count > 0)
{
throw new AggregateException(exceptions);
}
}
}
Usage example:
await Enumerable.Range(1, 10000).ParallelForEachAsync(async (i) =>
{
var data = await GetData(i);
}, maxDegreeOfParallelism: 100);
I've created an extension method for this which makes use of SemaphoreSlim and also allows to set maximum degree of parallelism
/// <summary>
/// Concurrently Executes async actions for each item of <see cref="IEnumerable<typeparamref name="T"/>
/// </summary>
/// <typeparam name="T">Type of IEnumerable</typeparam>
/// <param name="enumerable">instance of <see cref="IEnumerable<typeparamref name="T"/>"/></param>
/// <param name="action">an async <see cref="Action" /> to execute</param>
/// <param name="maxDegreeOfParallelism">Optional, An integer that represents the maximum degree of parallelism,
/// Must be grater than 0</param>
/// <returns>A Task representing an async operation</returns>
/// <exception cref="ArgumentOutOfRangeException">If the maxActionsToRunInParallel is less than 1</exception>
public static async Task ForEachAsyncConcurrent<T>(
this IEnumerable<T> enumerable,
Func<T, Task> action,
int? maxDegreeOfParallelism = null)
{
if (maxDegreeOfParallelism.HasValue)
{
using (var semaphoreSlim = new SemaphoreSlim(
maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value))
{
var tasksWithThrottler = new List<Task>();
foreach (var item in enumerable)
{
// Increment the number of currently running tasks and wait if they are more than limit.
await semaphoreSlim.WaitAsync();
tasksWithThrottler.Add(Task.Run(async () =>
{
await action(item).ContinueWith(res =>
{
// action is completed, so decrement the number of currently running tasks
semaphoreSlim.Release();
});
}));
}
// Wait for all tasks to complete.
await Task.WhenAll(tasksWithThrottler.ToArray());
}
}
else
{
await Task.WhenAll(enumerable.Select(item => action(item)));
}
}
Sample Usage:
await enumerable.ForEachAsyncConcurrent(
async item =>
{
await SomeAsyncMethod(item);
},
5);
In the accepted answer the ConcurrentBag is not required.
Here's an implementation without it:
var tasks = myCollection.Select(GetData).ToList();
await Task.WhenAll(tasks);
var results = tasks.Select(t => t.Result);
Any of the "// some pre stuff" and "// some post stuff" can go into the GetData implementation (or another method that calls GetData)
Aside from being shorter, there's no use of an "async void" lambda, which is an anti pattern.
The following is set to work with IAsyncEnumerable but can be modified to use IEnumerable by just changing the type and removing the "await" on the foreach. It's far more appropriate for large sets of data than creating countless parallel tasks and then awaiting them all.
public static async Task ForEachAsyncConcurrent<T>(this IAsyncEnumerable<T> enumerable, Func<T, Task> action, int maxDegreeOfParallelism, int? boundedCapacity = null)
{
ActionBlock<T> block = new ActionBlock<T>(
action,
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = maxDegreeOfParallelism,
BoundedCapacity = boundedCapacity ?? maxDegreeOfParallelism * 3
});
await foreach (T item in enumerable)
{
await block.SendAsync(item).ConfigureAwait(false);
}
block.Complete();
await block.Completion;
}
For a more simple solution (not sure if the most optimal), you can simply nest Parallel.ForEach inside a Task - as such
var options = new ParallelOptions { MaxDegreeOfParallelism = 5 }
Task.Run(() =>
{
Parallel.ForEach(myCollection, options, item =>
{
DoWork(item);
}
}
The ParallelOptions will do the throttlering for you, out of the box.
I am using it in a real world scenario to run a very long operations in the background. These operations are called via HTTP and it was designed not to block the HTTP call while the long operation is running.
Calling HTTP for long background operation.
Operation starts at the background.
User gets status ID which can be used to check the status using another HTTP call.
The background operation update its status.
That way, the CI/CD call does not timeout because of long HTTP operation, rather it loops the status every x seconds without blocking the process