This question already has answers here:
How to limit the amount of concurrent async I/O operations?
(11 answers)
Closed 10 days ago.
What if you need to run multiple asynchronous I/O tasks in parallel but need to make sure that no more than X I/O processes are running at the same time; and pre and post I/O processing tasks shouldn't have such limitation.
Here is a scenario - let's say there are 1000 tasks; each of them accepts a text string as an input parameter; transforms that text (pre I/O processing) then writes that transformed text into a file. The goal is to make pre-processing logic utilize 100% of CPU/Cores and I/O portion of the tasks run with max 10 degree of parallelism (max 10 simultaneously opened for writing files at a time).
Can you provide a sample code how to do it with C# / .NET 4.5?
http://blogs.msdn.com/b/csharpfaq/archive/2012/01/23/using-async-for-file-access-alan-berman.aspx
I think using TPL Dataflow for this would be a good idea: you create pre- and post-process blocks with unbounded parallelism, a file-writing block with limited parallelism and link them together. Something like:
var unboundedParallelismOptions =
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
};
var preProcessBlock = new TransformBlock<string, string>(
s => PreProcess(s), unboundedParallelismOptions);
var writeToFileBlock = new TransformBlock<string, string>(
async s =>
{
await WriteToFile(s);
return s;
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });
var postProcessBlock = new ActionBlock<string>(
s => PostProcess(s), unboundedParallelismOptions);
var propagateCompletionOptions =
new DataflowLinkOptions { PropagateCompletion = true };
preProcessBlock.LinkTo(writeToFileBlock, propagateCompletionOptions);
writeToFileBlock.LinkTo(postProcessBlock, propagateCompletionOptions);
// use something like await preProcessBlock.SendAsync("text") here
preProcessBlock.Complete();
await postProcessBlock.Completion;
Where WriteToFile() could look like this:
private static async Task WriteToFile(string s)
{
using (var writer = new StreamWriter(GetFileName()))
await writer.WriteAsync(s);
}
It sounds like you'd want to consider a Djikstra Semaphore to control access to the starting of tasks.
However, this sounds like a typical queue/fixed number of consumers kind of problem, which may be a more appropriate way to structure it.
I would create an extension method in which one can set maximum degree of parallelism. SemaphoreSlim will be the savior here.
/// <summary>
/// Concurrently Executes async actions for each item of <see cref="IEnumerable<typeparamref name="T"/>
/// </summary>
/// <typeparam name="T">Type of IEnumerable</typeparam>
/// <param name="enumerable">instance of <see cref="IEnumerable<typeparamref name="T"/>"/></param>
/// <param name="action">an async <see cref="Action" /> to execute</param>
/// <param name="maxDegreeOfParallelism">Optional, An integer that represents the maximum degree of parallelism,
/// Must be grater than 0</param>
/// <returns>A Task representing an async operation</returns>
/// <exception cref="ArgumentOutOfRangeException">If the maxActionsToRunInParallel is less than 1</exception>
public static async Task ForEachAsyncConcurrent<T>(
this IEnumerable<T> enumerable,
Func<T, Task> action,
int? maxDegreeOfParallelism = null)
{
if (maxDegreeOfParallelism.HasValue)
{
using (var semaphoreSlim = new SemaphoreSlim(
maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value))
{
var tasksWithThrottler = new List<Task>();
foreach (var item in enumerable)
{
// Increment the number of currently running tasks and wait if they are more than limit.
await semaphoreSlim.WaitAsync();
tasksWithThrottler.Add(Task.Run(async () =>
{
await action(item).ContinueWith(res =>
{
// action is completed, so decrement the number of currently running tasks
semaphoreSlim.Release();
});
}));
}
// Wait for all tasks to complete.
await Task.WhenAll(tasksWithThrottler.ToArray());
}
}
else
{
await Task.WhenAll(enumerable.Select(item => action(item)));
}
}
Sample Usage:
await enumerable.ForEachAsyncConcurrent(
async item =>
{
await SomeAsyncMethod(item);
},
5);
Related
I am new at asynchronous programming and I use the following code to collect data from third-party API and every time I am getting different responses. Am I doing the wrong approach?
Parallel.ForEach(products, item =>{
GetProductsInfo(item);
});
public async Task<Product> GetProductsInfo(Product product)
{
var restClientProduct = new RestClient("URL");
var restRequestProduct = new RestRequest(Method.POST);
var proudctRequestJson = JsonConvert.SerializeObject(new ProudctRequest()
{
product_code = product.product_code,
});
restRequestProduct.AddHeader("cache-control", "no-cache");
restRequestProduct.AddHeader("Content-Length", proudctRequestJson.Count().ToString());
restRequestProduct.AddHeader("Content-Type", "application/json");
restRequestProduct.AddHeader("Accept", "application/json");
restRequestProduct.AddParameter("undefined", proudctRequestJson, ParameterType.RequestBody);
var responseProduct = GetResponseContentAsync(restClientProduct, restRequestProduct).Result;
if (responseProduct.StatusCode == HttpStatusCode.OK)
{
// set values form the responseProduct to the product
}
return product;
}
private Task<IRestResponse> GetResponseContentAsync(RestClient theClient, RestRequest theRequest)
{
var tcs = new TaskCompletionSource<IRestResponse>();
theClient.ExecuteAsync(theRequest, response =>
{
tcs.SetResult(response);
});
return tcs.Task;
}
The parts of your code you have shown us is not running asynchronously. You are calling .Result on GetResponseContentAsync(), which will block the thread until it finishes. That means that by the time Parallel.ForEach completes, all the HTTP requests will have completed.
If you are using await somewhere in that block of code you replaced with
// set values form the responseProduct to the product
then it's possible that the results are not being reported before Parallel.ForEach finishes. That is because Parallel.ForEach does not support asynchronous code, so it will not wait for them to finish.
Let's assume that GetProductsInfo is actually running asynchronously
Then the problem is: Parellel.ForEach is not waiting for my asynchronous operations to finish. There are a couple ways to handle this.
Implement your own ForEachAsync. This has been requested, and will probably be added (to .NET Core at least) eventually. But there is actually a sample implementation in the issue where this was requested:
/// <summary>
/// Executes a foreach asynchronously.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="source">The source.</param>
/// <param name="dop">The degrees of parallelism.</param>
/// <param name="body">The body.</param>
/// <returns></returns>
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in System.Collections.Concurrent.Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate
{
using (partition)
{
while (partition.MoveNext())
await body(partition.Current);
}
}));
}
That is written as an extention method, so you would use it like this:
await products.ForEachAsync(10, GetProductsInfo);
Where 10 is the number of request you would like to run at a time.
You can use something like:
Task.WaitAll(items.Select(i => GetProductsInfo(i));
This will run the requests asynchronously, but block the calling thread until they all finish. Alternatively, you can await them, so it doesn't block the calling thread:
await Task.WhenAll(items.Select(i => GetProductsInfo(i))
However, both these methods will fire off all of the requests at once. If you know you will only ever have a small number, then that's fine. But if you might have a very large number, you could flood the web service. Using Parallel.ForEach, or the implementation of ForEachAsync above will send them in blocks.
If you use any of these methods to await the responses, then you really should await GetResponseContentAsync instead of using .Result:
var responseProduct = await GetResponseContentAsync(restClientProduct, restRequestProduct);
Using async/await is especially important in ASP.NET, where there is a maximum number of threads it can use.
I was hoping for a clean solution for throttling a specific type of producer while the consumer is busy processing, without writing a custom block of my own. I had hoped the code below would have done exactly that, but once SendAsync blocks after the capacity limit hits, its task never completes, insinuating that the postponed message is never consumed.
_block = new TransformBlock<int, string>(async i =>
{
// Send the next request to own input queue
// before processing this request, or block
// while pipeline is full.
// Do not start processing if pipeline is full!
await _block.SendAsync(i + 1);
// Process this request and pass it on to the
// next block in the pipeline.
return i.ToString();
},
// TransformBlock block has input and output buffers. Limit both,
// otherwise requests that cannot be passed on to the next
// block in the pipeline will be cached in this block's output
// buffer, never throttling this block.
new ExecutionDataflowBlockOptions { BoundedCapacity = 5 });
// This block is linked to the output of the
// transform block.
var action = new ActionBlock<string>(async i =>
{
// Do some very long processing on the transformed element.
await Task.Delay(1000);
},
// Limit buffer size, and consequently throttle previous blocks
// in the pipeline.
new ExecutionDataflowBlockOptions { BoundedCapacity = 5 });
_block.LinkTo(action);
// Start running.
_block.Post(0);
I was wondering if there is any reason why the linked ActionBlock does not consume the postponed message.
I faced the same problem as you. I didn't dig deep into implementation of LinkTo but I think it propogate message only when source block received some. I mean, there may be a case when source block have some messages in its input, but it will not process them until next Post/SendAsync it received. And that's your case.
Here is my solution and it's working for me.
First declare "engine"
/// <summary>
/// Engine-class (like a car engine) that produced a lot count (or infinite) of actions.
/// </summary>
public class Engine
{
private BufferBlock<int> _bufferBlock;
/// <summary>
/// Creates source block that produced stub data.
/// </summary>
/// <param name="count">Count of actions. If count = 0 then it's infinite loop.</param>
/// <param name="boundedCapacity">Bounded capacity (throttling).</param>
/// <param name="cancellationToken">Cancellation token (used to stop infinite loop).</param>
/// <returns>Source block that constantly produced 0-value.</returns>
public ISourceBlock<int> CreateEngine(int count, int boundedCapacity, CancellationToken cancellationToken)
{
_bufferBlock = new BufferBlock<int>(new DataflowBlockOptions { BoundedCapacity = boundedCapacity });
Task.Run(async () =>
{
var counter = 0;
while (count == 0 || counter < count)
{
await _bufferBlock.SendAsync(0);
if (cancellationToken.IsCancellationRequested)
return;
counter++;
}
}, cancellationToken).ContinueWith((task) =>
{
_bufferBlock.Complete();
});
return _bufferBlock;
}
}
And then Producer that uses engine
/// <summary>
/// Producer that generates random byte blobs with specified size.
/// </summary>
public class Producer
{
private static Random random = new Random();
/// <summary>
/// Returns source block that produced byte arrays.
/// </summary>
/// <param name="blobSize">Size of byte arrays.</param>
/// <param name="count">Total count of blobs (if 0 then infinite).</param>
/// <param name="boundedCapacity">Bounded capacity (throttling).</param>
/// <param name="cancellationToken">Cancellation token (used to stop infinite loop).</param>
/// <returns>Source block.</returns>
public static ISourceBlock<byte[]> BlobsSourceBlock(int blobSize, int count, int boundedCapacity, CancellationToken cancellationToken)
{
// Creating engine with specified bounded capacity.
var engine = new Engine().CreateEngine(count, boundedCapacity, cancellationToken);
// Creating transform block that uses our driver as a source.
var block = new TransformBlock<int, byte[]>(
// Useful work.
i => CreateBlob(blobSize),
new ExecutionDataflowBlockOptions
{
// Here you can specify your own throttling.
BoundedCapacity = boundedCapacity,
MaxDegreeOfParallelism = Environment.ProcessorCount,
});
// Linking engine (and engine is already working at that time).
engine.LinkTo(block, new DataflowLinkOptions { PropagateCompletion = true });
return block;
}
/// <summary>
/// Simple random byte[] generator.
/// </summary>
/// <param name="size">Array size.</param>
/// <returns>byte[]</returns>
private static byte[] CreateBlob(int size)
{
var buffer = new byte[size];
random.NextBytes(buffer);
return buffer;
}
}
Now you can use producer with consumer (eg ActionBlock)
var blobsProducer = BlobsProducer.CreateAndStartBlobsSourceBlock(0, 1024 * 1024, 10, cancellationTokenSource.Token);
var md5Hash = MD5.Create();
var actionBlock = new ActionBlock<byte[]>(b =>
{
Console.WriteLine(GetMd5Hash(md5Hash, b));
},
new ExecutionDataflowBlockOptions() { BoundedCapacity = 10 });
blobsProducer.LinkTo(actionBlock);
Hope it will help you!
So I just started to try and understand async, Task, lambda and so on, and I am unable to get it to work like I want. With the code below I want for it to lock btnDoWebRequest, do a unknow number of WebRequests as a Task and once all the Task are done unlock btnDoWebRequest. However I only want a max of 3 or whatever number I set of Tasks running at one time, which I got partly from Have a set of Tasks with only X running at a time.
But after trying and modifying my code in multiple ways, it will always immediately jump back and reenabled btnDoWebRequest. Of course VS is warning me about needing awaits, currently at ".ContinueWith((task)" and at the async in "await Task.WhenAll(requestInfoList .Select(async i =>", but can't seem to work where or how to put in the needed awaits. Of course as I'm still learning there is a good chance I am going at this all wrong and the whole thing needs to be reworked. So any help would be greatly appreciated.
Thanks
private SemaphoreSlim maxThread = new SemaphoreSlim(3);
private void btnDoWebRequest_Click(object sender, EventArgs e)
{
btnDoWebRequest.Enabled = false;
Task.Factory.StartNew(async () => await DoWebRequest()).Wait();
btnDoWebRequest.Enabled = true;
}
private async Task DoWebRequest()
{
List<requestInfo> requestInfoList = new List<requestInfo>();
for (int i = 0; dataRequestInfo.RowCount - 1 > i; i++)
{
requestInfoList.Add((requestInfo)dataRequestInfo.Rows[i].Tag);
}
await Task.WhenAll(requestInfoList .Select(async i =>
{
maxThread.Wait();
Task.Factory.StartNew(() =>
{
var task = Global.webRequestWork(i);
}, TaskCreationOptions.LongRunning).ContinueWith((task) => maxThread.Release());
}));
}
First, don't use Task.Factory.StartNew by default. In fact, this should be avoided in async code. If you need to execute code on a background thread, then use Task.Run.
In your case, there's no need to use Task.Run (or Task.Factory.StartNew).
Start at the lowest level and work your way up. You already have an asynchronous web-requesting method, which I'll rename to WebRequestAsync to follow the Task-based Asynchronous Programming naming guidelines.
Next, throttle it by using the asynchronous APIs on SemaphoreSlim:
await maxThread.WaitAsync();
try
{
await Global.WebRequestWorkAsync(i);
}
finally
{
maxThread.Release();
}
Do that for each request info (note that no background thread is required):
private async Task DoWebRequestsAsync()
{
List<requestInfo> requestInfoList = new List<requestInfo>();
for (int i = 0; dataRequestInfo.RowCount - 1 > i; i++)
{
requestInfoList.Add((requestInfo)dataRequestInfo.Rows[i].Tag);
}
await Task.WhenAll(requestInfoList.Select(async i =>
{
await maxThread.WaitAsync();
try
{
await Global.WebRequestWorkAsync(i);
}
finally
{
maxThread.Release();
}
}));
}
Finally, call this from your UI (again, no background thread is required):
private async void btnDoWebRequest_Click(object sender, EventArgs e)
{
btnDoWebRequest.Enabled = false;
await DoWebRequestsAsync();
btnDoWebRequest.Enabled = true;
}
In summary, only use Task.Run when you need to; do not use Task.Factory.StartNew, and do not use Wait (use await instead). I have an async intro on my blog with more information.
There are a couple of things wrong with your code:
Using Wait() on a Task is like running things synchronously, hence you only notice the UI reacting when everything is done and the button reenabled. You need to await an async method in order for it to truely run async. More so, if a method is doing IO bound work like a web request, spinning up a new Thread Pool thread (using Task.Factory.StartNew) is redundant and is a waste of resources.
Your button click event handler needs to be marked with async so you can await inside your method.
I've cleaned up your code a bit for clarity, using the new SemaphoreSlim WaitAsync and replaced your for with a LINQ query. You may only take the first two points and apply them to your code.
private SemaphoreSlim maxThread = new SemaphoreSlim(3);
private async void btnDoWebRequest_Click(object sender, EventArgs e)
{
btnDoWebRequest.Enabled = false;
await DoWebRequest();
btnDoWebRequest.Enabled = true;
}
private async Task DoWebRequest()
{
List<requestInfo> requestInfoList = new List<requestInfo>();
var requestInfoList = dataRequestInfo.Rows.Select(x => x.Tag).Cast<requestInfo>();
var tasks = requestInfoList.Select(async I =>
{
await maxThread.WaitAsync();
try
{
await Global.webRequestWork(i);
}
finally
{
maxThread.Release();
}
});
await Task.WhenAll(tasks);
I have created an extension method for this.
It can be used like this:
var tt = new List<Func<Task>>()
{
() => Thread.Sleep(300), //Thread.Sleep can be replaced by your own functionality, like calling the website
() => Thread.Sleep(800),
() => Thread.Sleep(250),
() => Thread.Sleep(1000),
() => Thread.Sleep(100),
() => Thread.Sleep(200),
};
await tt.WhenAll(3); //this will let 3 threads run, if one ends, the next will start, untill all are finished.
The extention method:
public static class TaskExtension
{
public static async Task WhenAll(this List<Func<Task>> actions, int threadCount)
{
var _countdownEvent = new CountdownEvent(actions.Count);
var _throttler = new SemaphoreSlim(threadCount);
foreach (Func<Task> action in actions)
{
await _throttler.WaitAsync();
Task.Run(async () =>
{
try
{
await action();
}
finally
{
_throttler.Release();
_countdownEvent.Signal();
}
});
}
_countdownEvent.Wait();
}
}
We can easily achieve this using SemaphoreSlim. Extension method I've created:
/// <summary>
/// Concurrently Executes async actions for each item of <see cref="IEnumerable<typeparamref name="T"/>
/// </summary>
/// <typeparam name="T">Type of IEnumerable</typeparam>
/// <param name="enumerable">instance of <see cref="IEnumerable<typeparamref name="T"/>"/></param>
/// <param name="action">an async <see cref="Action" /> to execute</param>
/// <param name="maxActionsToRunInParallel">Optional, max numbers of the actions to run in parallel,
/// Must be grater than 0</param>
/// <returns>A Task representing an async operation</returns>
/// <exception cref="ArgumentOutOfRangeException">If the maxActionsToRunInParallel is less than 1</exception>
public static async Task ForEachAsyncConcurrent<T>(
this IEnumerable<T> enumerable,
Func<T, Task> action,
int? maxActionsToRunInParallel = null)
{
if (maxActionsToRunInParallel.HasValue)
{
using (var semaphoreSlim = new SemaphoreSlim(
maxActionsToRunInParallel.Value, maxActionsToRunInParallel.Value))
{
var tasksWithThrottler = new List<Task>();
foreach (var item in enumerable)
{
// Increment the number of currently running tasks and wait if they are more than limit.
await semaphoreSlim.WaitAsync();
tasksWithThrottler.Add(Task.Run(async () =>
{
await action(item);
// action is completed, so decrement the number of currently running tasks
semaphoreSlim.Release();
}));
}
// Wait for all tasks to complete.
await Task.WhenAll(tasksWithThrottler.ToArray());
}
}
else
{
await Task.WhenAll(enumerable.Select(item => action(item)));
}
}
Sample Usage:
await enumerable.ForEachAsyncConcurrent(
async item =>
{
await SomeAsyncMethod(item);
},
5);
// let's say there is a list of 1000+ URLs
string[] urls = { "http://google.com", "http://yahoo.com", ... };
// now let's send HTTP requests to each of these URLs in parallel
urls.AsParallel().ForAll(async (url) => {
var client = new HttpClient();
var html = await client.GetStringAsync(url);
});
Here is the problem, it starts 1000+ simultaneous web requests. Is there an easy way to limit the concurrent amount of these async http requests? So that no more than 20 web pages are downloaded at any given time. How to do it in the most efficient manner?
You can definitely do this in the latest versions of async for .NET, using .NET 4.5 Beta. The previous post from 'usr' points to a good article written by Stephen Toub, but the less announced news is that the async semaphore actually made it into the Beta release of .NET 4.5
If you look at our beloved SemaphoreSlim class (which you should be using since it's more performant than the original Semaphore), it now boasts the WaitAsync(...) series of overloads, with all of the expected arguments - timeout intervals, cancellation tokens, all of your usual scheduling friends :)
Stephen's also written a more recent blog post about the new .NET 4.5 goodies that came out with beta see What’s New for Parallelism in .NET 4.5 Beta.
Last, here's some sample code about how to use SemaphoreSlim for async method throttling:
public async Task MyOuterMethod()
{
// let's say there is a list of 1000+ URLs
var urls = { "http://google.com", "http://yahoo.com", ... };
// now let's send HTTP requests to each of these URLs in parallel
var allTasks = new List<Task>();
var throttler = new SemaphoreSlim(initialCount: 20);
foreach (var url in urls)
{
// do an async wait until we can schedule again
await throttler.WaitAsync();
// using Task.Run(...) to run the lambda in its own parallel
// flow on the threadpool
allTasks.Add(
Task.Run(async () =>
{
try
{
var client = new HttpClient();
var html = await client.GetStringAsync(url);
}
finally
{
throttler.Release();
}
}));
}
// won't get here until all urls have been put into tasks
await Task.WhenAll(allTasks);
// won't get here until all tasks have completed in some way
// (either success or exception)
}
Last, but probably a worthy mention is a solution that uses TPL-based scheduling. You can create delegate-bound tasks on the TPL that have not yet been started, and allow for a custom task scheduler to limit the concurrency. In fact, there's an MSDN sample for it here:
See also TaskScheduler .
If you have an IEnumerable (ie. strings of URL s) and you want to do an I/O bound operation with each of these (ie. make an async http request) concurrently AND optionally you also want to set the maximum number of concurrent I/O requests in real time, here is how you can do that. This way you do not use thread pool et al, the method uses semaphoreslim to control max concurrent I/O requests similar to a sliding window pattern one request completes, leaves the semaphore and the next one gets in.
usage:
await ForEachAsync(urlStrings, YourAsyncFunc, optionalMaxDegreeOfConcurrency);
public static Task ForEachAsync<TIn>(
IEnumerable<TIn> inputEnumerable,
Func<TIn, Task> asyncProcessor,
int? maxDegreeOfParallelism = null)
{
int maxAsyncThreadCount = maxDegreeOfParallelism ?? DefaultMaxDegreeOfParallelism;
SemaphoreSlim throttler = new SemaphoreSlim(maxAsyncThreadCount, maxAsyncThreadCount);
IEnumerable<Task> tasks = inputEnumerable.Select(async input =>
{
await throttler.WaitAsync().ConfigureAwait(false);
try
{
await asyncProcessor(input).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
return Task.WhenAll(tasks);
}
After the release of the .NET 6 (in November, 2021), the recommended way of limiting the amount of concurrent asynchronous I/O operations is the Parallel.ForEachAsync API, with the MaxDegreeOfParallelism configuration. Here is how it can be used in practice:
// let's say there is a list of 1000+ URLs
string[] urls = { "http://google.com", "http://yahoo.com", /*...*/ };
var client = new HttpClient();
var options = new ParallelOptions() { MaxDegreeOfParallelism = 20 };
// now let's send HTTP requests to each of these URLs in parallel
await Parallel.ForEachAsync(urls, options, async (url, cancellationToken) =>
{
var html = await client.GetStringAsync(url, cancellationToken);
});
In the above example the Parallel.ForEachAsync task is awaited asynchronously. You can also Wait it synchronously if you need to, which will block the current thread until the completion of all asynchronous operations. The synchronous Wait has the advantage that in case of errors, all exceptions will be propagated. On the contrary the await operator propagates by design only the first exception. In case this is a problem, you can find solutions here.
(Note: an idiomatic implementation of a ForEachAsync extension method that also propagates the results, can be found in the 4th revision of this answer)
There are a lot of pitfalls and direct use of a semaphore can be tricky in error cases, so I would suggest to use AsyncEnumerator NuGet Package instead of re-inventing the wheel:
// let's say there is a list of 1000+ URLs
string[] urls = { "http://google.com", "http://yahoo.com", ... };
// now let's send HTTP requests to each of these URLs in parallel
await urls.ParallelForEachAsync(async (url) => {
var client = new HttpClient();
var html = await client.GetStringAsync(url);
}, maxDegreeOfParalellism: 20);
Unfortunately, the .NET Framework is missing most important combinators for orchestrating parallel async tasks. There is no such thing built-in.
Look at the AsyncSemaphore class built by the most respectable Stephen Toub. What you want is called a semaphore, and you need an async version of it.
SemaphoreSlim can be very helpful here. Here's the extension method I've created.
/// <summary>
/// Concurrently Executes async actions for each item of <see cref="IEnumerable<typeparamref name="T"/>
/// </summary>
/// <typeparam name="T">Type of IEnumerable</typeparam>
/// <param name="enumerable">instance of <see cref="IEnumerable<typeparamref name="T"/>"/></param>
/// <param name="action">an async <see cref="Action" /> to execute</param>
/// <param name="maxActionsToRunInParallel">Optional, max numbers of the actions to run in parallel,
/// Must be grater than 0</param>
/// <returns>A Task representing an async operation</returns>
/// <exception cref="ArgumentOutOfRangeException">If the maxActionsToRunInParallel is less than 1</exception>
public static async Task ForEachAsyncConcurrent<T>(
this IEnumerable<T> enumerable,
Func<T, Task> action,
int? maxActionsToRunInParallel = null)
{
if (maxActionsToRunInParallel.HasValue)
{
using (var semaphoreSlim = new SemaphoreSlim(
maxActionsToRunInParallel.Value, maxActionsToRunInParallel.Value))
{
var tasksWithThrottler = new List<Task>();
foreach (var item in enumerable)
{
// Increment the number of currently running tasks and wait if they are more than limit.
await semaphoreSlim.WaitAsync();
tasksWithThrottler.Add(Task.Run(async () =>
{
await action(item).ContinueWith(res =>
{
// action is completed, so decrement the number of currently running tasks
semaphoreSlim.Release();
});
}));
}
// Wait for all of the provided tasks to complete.
await Task.WhenAll(tasksWithThrottler.ToArray());
}
}
else
{
await Task.WhenAll(enumerable.Select(item => action(item)));
}
}
Sample Usage:
await enumerable.ForEachAsyncConcurrent(
async item =>
{
await SomeAsyncMethod(item);
},
5);
Although 1000 tasks might be queued very quickly, the Parallel Tasks library can only handle concurrent tasks equal to the amount of CPU cores in the machine. That means that if you have a four-core machine, only 4 tasks will be executing at a given time (unless you lower the MaxDegreeOfParallelism).
this is not good practice as it changes a global variable. it is also not a general solution for async. but it is easy for all instances of HttpClient, if that's all you're after. you can simply try:
System.Net.ServicePointManager.DefaultConnectionLimit = 20;
Here is a handy Extension Method you can create to wrap a list of tasks such that they will be executed with a maximum degree of concurrency:
/// <summary>Allows to do any async operation in bulk while limiting the system to a number of concurrent items being processed.</summary>
private static IEnumerable<Task<T>> WithMaxConcurrency<T>(this IEnumerable<Task<T>> tasks, int maxParallelism)
{
SemaphoreSlim maxOperations = new SemaphoreSlim(maxParallelism);
// The original tasks get wrapped in a new task that must first await a semaphore before the original task is called.
return tasks.Select(task => maxOperations.WaitAsync().ContinueWith(_ =>
{
try { return task; }
finally { maxOperations.Release(); }
}).Unwrap());
}
Now instead of:
await Task.WhenAll(someTasks);
You can go
await Task.WhenAll(someTasks.WithMaxConcurrency(20));
Parallel computations should be used for speeding up CPU-bound operations. Here we are talking about I/O bound operations. Your implementation should be purely async, unless you're overwhelming the busy single core on your multi-core CPU.
EDIT I like the suggestion made by usr to use an "async semaphore" here.
Essentially you're going to want to create an Action or Task for each URL that you want to hit, put them in a List, and then process that list, limiting the number that can be processed in parallel.
My blog post shows how to do this both with Tasks and with Actions, and provides a sample project you can download and run to see both in action.
With Actions
If using Actions, you can use the built-in .Net Parallel.Invoke function. Here we limit it to running at most 20 threads in parallel.
var listOfActions = new List<Action>();
foreach (var url in urls)
{
var localUrl = url;
// Note that we create the Task here, but do not start it.
listOfTasks.Add(new Task(() => CallUrl(localUrl)));
}
var options = new ParallelOptions {MaxDegreeOfParallelism = 20};
Parallel.Invoke(options, listOfActions.ToArray());
With Tasks
With Tasks there is no built-in function. However, you can use the one that I provide on my blog.
/// <summary>
/// Starts the given tasks and waits for them to complete. This will run, at most, the specified number of tasks in parallel.
/// <para>NOTE: If one of the given tasks has already been started, an exception will be thrown.</para>
/// </summary>
/// <param name="tasksToRun">The tasks to run.</param>
/// <param name="maxTasksToRunInParallel">The maximum number of tasks to run in parallel.</param>
/// <param name="cancellationToken">The cancellation token.</param>
public static async Task StartAndWaitAllThrottledAsync(IEnumerable<Task> tasksToRun, int maxTasksToRunInParallel, CancellationToken cancellationToken = new CancellationToken())
{
await StartAndWaitAllThrottledAsync(tasksToRun, maxTasksToRunInParallel, -1, cancellationToken);
}
/// <summary>
/// Starts the given tasks and waits for them to complete. This will run the specified number of tasks in parallel.
/// <para>NOTE: If a timeout is reached before the Task completes, another Task may be started, potentially running more than the specified maximum allowed.</para>
/// <para>NOTE: If one of the given tasks has already been started, an exception will be thrown.</para>
/// </summary>
/// <param name="tasksToRun">The tasks to run.</param>
/// <param name="maxTasksToRunInParallel">The maximum number of tasks to run in parallel.</param>
/// <param name="timeoutInMilliseconds">The maximum milliseconds we should allow the max tasks to run in parallel before allowing another task to start. Specify -1 to wait indefinitely.</param>
/// <param name="cancellationToken">The cancellation token.</param>
public static async Task StartAndWaitAllThrottledAsync(IEnumerable<Task> tasksToRun, int maxTasksToRunInParallel, int timeoutInMilliseconds, CancellationToken cancellationToken = new CancellationToken())
{
// Convert to a list of tasks so that we don't enumerate over it multiple times needlessly.
var tasks = tasksToRun.ToList();
using (var throttler = new SemaphoreSlim(maxTasksToRunInParallel))
{
var postTaskTasks = new List<Task>();
// Have each task notify the throttler when it completes so that it decrements the number of tasks currently running.
tasks.ForEach(t => postTaskTasks.Add(t.ContinueWith(tsk => throttler.Release())));
// Start running each task.
foreach (var task in tasks)
{
// Increment the number of tasks currently running and wait if too many are running.
await throttler.WaitAsync(timeoutInMilliseconds, cancellationToken);
cancellationToken.ThrowIfCancellationRequested();
task.Start();
}
// Wait for all of the provided tasks to complete.
// We wait on the list of "post" tasks instead of the original tasks, otherwise there is a potential race condition where the throttler's using block is exited before some Tasks have had their "post" action completed, which references the throttler, resulting in an exception due to accessing a disposed object.
await Task.WhenAll(postTaskTasks.ToArray());
}
}
And then creating your list of Tasks and calling the function to have them run, with say a maximum of 20 simultaneous at a time, you could do this:
var listOfTasks = new List<Task>();
foreach (var url in urls)
{
var localUrl = url;
// Note that we create the Task here, but do not start it.
listOfTasks.Add(new Task(async () => await CallUrl(localUrl)));
}
await Tasks.StartAndWaitAllThrottledAsync(listOfTasks, 20);
I am writing a C# program to generate and upload a half million files via FTP. I want to process 4 files in parallel since the machine have 4 cores and the file generating takes much longer time. Is it possible to convert the following Powershell example to C#? Or is there any better framework such as Actor framework in C# (like F# MailboxProcessor)?
Powershell example
$maxConcurrentJobs = 3;
# Read the input and queue it up
$jobInput = get-content .\input.txt
$queue = [System.Collections.Queue]::Synchronized( (New-Object System.Collections.Queue) )
foreach($item in $jobInput)
{
$queue.Enqueue($item)
}
# Function that pops input off the queue and starts a job with it
function RunJobFromQueue
{
if( $queue.Count -gt 0)
{
$j = Start-Job -ScriptBlock {param($x); Get-WinEvent -LogName $x} -ArgumentList $queue.Dequeue()
Register-ObjectEvent -InputObject $j -EventName StateChanged -Action { RunJobFromQueue; Unregister-Event $eventsubscriber.SourceIdentifier; Remove-Job $eventsubscriber.SourceIdentifier } | Out-Null
}
}
# Start up to the max number of concurrent jobs
# Each job will take care of running the rest
for( $i = 0; $i -lt $maxConcurrentJobs; $i++ )
{
RunJobFromQueue
}
Update:
The connection to remote FTP server can be slow so I want to limit the FTP uploading processing.
Assuming you're building this with the TPL, you can set the ParallelOptions.MaxDegreesOfParallelism to whatever you want it to be.
Parallel.For for a code example.
Task Parallel Library is your friend here. See this link which describes what's available to you. Basically framework 4 comes with it which optimises these essentially background thread pooled threads to the number of processors on the running machine.
Perhaps something along the lines of:
ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = 4;
Then in your loop something like:
Parallel.Invoke(options,
() => new WebClient().Upload("http://www.linqpad.net", "lp.html"),
() => new WebClient().Upload("http://www.jaoo.dk", "jaoo.html"));
If you are using .Net 4.0 you can use the Parallel library
Supposing you're iterating throug the half million of files you can "parallel" the iteration using a Parallel Foreach for instance or you can have a look to PLinq
Here a comparison between the two
Essentially you're going to want to create an Action or Task for each file to upload, put them in a List, and then process that list, limiting the number that can be processed in parallel.
My blog post shows how to do this both with Tasks and with Actions, and provides a sample project you can download and run to see both in action.
With Actions
If using Actions, you can use the built-in .Net Parallel.Invoke function. Here we limit it to running at most 4 threads in parallel.
var listOfActions = new List<Action>();
foreach (var file in files)
{
var localFile = file;
// Note that we create the Task here, but do not start it.
listOfTasks.Add(new Task(() => UploadFile(localFile)));
}
var options = new ParallelOptions {MaxDegreeOfParallelism = 4};
Parallel.Invoke(options, listOfActions.ToArray());
This option doesn't support async though, and I'm assuming you're FileUpload function will be, so you might want to use the Task example below.
With Tasks
With Tasks there is no built-in function. However, you can use the one that I provide on my blog.
/// <summary>
/// Starts the given tasks and waits for them to complete. This will run, at most, the specified number of tasks in parallel.
/// <para>NOTE: If one of the given tasks has already been started, an exception will be thrown.</para>
/// </summary>
/// <param name="tasksToRun">The tasks to run.</param>
/// <param name="maxTasksToRunInParallel">The maximum number of tasks to run in parallel.</param>
/// <param name="cancellationToken">The cancellation token.</param>
public static async Task StartAndWaitAllThrottledAsync(IEnumerable<Task> tasksToRun, int maxTasksToRunInParallel, CancellationToken cancellationToken = new CancellationToken())
{
await StartAndWaitAllThrottledAsync(tasksToRun, maxTasksToRunInParallel, -1, cancellationToken);
}
/// <summary>
/// Starts the given tasks and waits for them to complete. This will run the specified number of tasks in parallel.
/// <para>NOTE: If a timeout is reached before the Task completes, another Task may be started, potentially running more than the specified maximum allowed.</para>
/// <para>NOTE: If one of the given tasks has already been started, an exception will be thrown.</para>
/// </summary>
/// <param name="tasksToRun">The tasks to run.</param>
/// <param name="maxTasksToRunInParallel">The maximum number of tasks to run in parallel.</param>
/// <param name="timeoutInMilliseconds">The maximum milliseconds we should allow the max tasks to run in parallel before allowing another task to start. Specify -1 to wait indefinitely.</param>
/// <param name="cancellationToken">The cancellation token.</param>
public static async Task StartAndWaitAllThrottledAsync(IEnumerable<Task> tasksToRun, int maxTasksToRunInParallel, int timeoutInMilliseconds, CancellationToken cancellationToken = new CancellationToken())
{
// Convert to a list of tasks so that we don't enumerate over it multiple times needlessly.
var tasks = tasksToRun.ToList();
using (var throttler = new SemaphoreSlim(maxTasksToRunInParallel))
{
var postTaskTasks = new List<Task>();
// Have each task notify the throttler when it completes so that it decrements the number of tasks currently running.
tasks.ForEach(t => postTaskTasks.Add(t.ContinueWith(tsk => throttler.Release())));
// Start running each task.
foreach (var task in tasks)
{
// Increment the number of tasks currently running and wait if too many are running.
await throttler.WaitAsync(timeoutInMilliseconds, cancellationToken);
cancellationToken.ThrowIfCancellationRequested();
task.Start();
}
// Wait for all of the provided tasks to complete.
// We wait on the list of "post" tasks instead of the original tasks, otherwise there is a potential race condition where the throttler's using block is exited before some Tasks have had their "post" action completed, which references the throttler, resulting in an exception due to accessing a disposed object.
await Task.WhenAll(postTaskTasks.ToArray());
}
}
And then creating your list of Tasks and calling the function to have them run, with say a maximum of 4 simultaneous at a time, you could do this:
var listOfTasks = new List<Task>();
foreach (var file in files)
{
var localFile = file;
// Note that we create the Task here, but do not start it.
listOfTasks.Add(new Task(async () => await UploadFile(localFile)));
}
await Tasks.StartAndWaitAllThrottledAsync(listOfTasks, 4);
Also, because this method supports async, it will not block the UI thread like using Parallel.Invoke or Parallel.ForEach would.
I have coded below technique where I use BlockingCollection as a thread count manager. It is quite simple to implement and handles the job.
It simply accepts Task objects and add an integer value to blocking list, increasing running thread count by 1. When thread finishes, it dequeues the object and releases the block on add operation for upcoming tasks.
public class BlockingTaskQueue
{
private BlockingCollection<int> threadManager { get; set; } = null;
public bool IsWorking
{
get
{
return threadManager.Count > 0 ? true : false;
}
}
public BlockingTaskQueue(int maxThread)
{
threadManager = new BlockingCollection<int>(maxThread);
}
public async Task AddTask(Task task)
{
Task.Run(() =>
{
Run(task);
});
}
private bool Run(Task task)
{
try
{
threadManager.Add(1);
task.Start();
task.Wait();
return true;
}
catch (Exception ex)
{
return false;
}
finally
{
threadManager.Take();
}
}
}