Limit the number of parallel threads in C# - c#

I am writing a C# program to generate and upload a half million files via FTP. I want to process 4 files in parallel since the machine have 4 cores and the file generating takes much longer time. Is it possible to convert the following Powershell example to C#? Or is there any better framework such as Actor framework in C# (like F# MailboxProcessor)?
Powershell example
$maxConcurrentJobs = 3;
# Read the input and queue it up
$jobInput = get-content .\input.txt
$queue = [System.Collections.Queue]::Synchronized( (New-Object System.Collections.Queue) )
foreach($item in $jobInput)
{
$queue.Enqueue($item)
}
# Function that pops input off the queue and starts a job with it
function RunJobFromQueue
{
if( $queue.Count -gt 0)
{
$j = Start-Job -ScriptBlock {param($x); Get-WinEvent -LogName $x} -ArgumentList $queue.Dequeue()
Register-ObjectEvent -InputObject $j -EventName StateChanged -Action { RunJobFromQueue; Unregister-Event $eventsubscriber.SourceIdentifier; Remove-Job $eventsubscriber.SourceIdentifier } | Out-Null
}
}
# Start up to the max number of concurrent jobs
# Each job will take care of running the rest
for( $i = 0; $i -lt $maxConcurrentJobs; $i++ )
{
RunJobFromQueue
}
Update:
The connection to remote FTP server can be slow so I want to limit the FTP uploading processing.

Assuming you're building this with the TPL, you can set the ParallelOptions.MaxDegreesOfParallelism to whatever you want it to be.
Parallel.For for a code example.

Task Parallel Library is your friend here. See this link which describes what's available to you. Basically framework 4 comes with it which optimises these essentially background thread pooled threads to the number of processors on the running machine.
Perhaps something along the lines of:
ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = 4;
Then in your loop something like:
Parallel.Invoke(options,
() => new WebClient().Upload("http://www.linqpad.net", "lp.html"),
() => new WebClient().Upload("http://www.jaoo.dk", "jaoo.html"));

If you are using .Net 4.0 you can use the Parallel library
Supposing you're iterating throug the half million of files you can "parallel" the iteration using a Parallel Foreach for instance or you can have a look to PLinq
Here a comparison between the two

Essentially you're going to want to create an Action or Task for each file to upload, put them in a List, and then process that list, limiting the number that can be processed in parallel.
My blog post shows how to do this both with Tasks and with Actions, and provides a sample project you can download and run to see both in action.
With Actions
If using Actions, you can use the built-in .Net Parallel.Invoke function. Here we limit it to running at most 4 threads in parallel.
var listOfActions = new List<Action>();
foreach (var file in files)
{
var localFile = file;
// Note that we create the Task here, but do not start it.
listOfTasks.Add(new Task(() => UploadFile(localFile)));
}
var options = new ParallelOptions {MaxDegreeOfParallelism = 4};
Parallel.Invoke(options, listOfActions.ToArray());
This option doesn't support async though, and I'm assuming you're FileUpload function will be, so you might want to use the Task example below.
With Tasks
With Tasks there is no built-in function. However, you can use the one that I provide on my blog.
/// <summary>
/// Starts the given tasks and waits for them to complete. This will run, at most, the specified number of tasks in parallel.
/// <para>NOTE: If one of the given tasks has already been started, an exception will be thrown.</para>
/// </summary>
/// <param name="tasksToRun">The tasks to run.</param>
/// <param name="maxTasksToRunInParallel">The maximum number of tasks to run in parallel.</param>
/// <param name="cancellationToken">The cancellation token.</param>
public static async Task StartAndWaitAllThrottledAsync(IEnumerable<Task> tasksToRun, int maxTasksToRunInParallel, CancellationToken cancellationToken = new CancellationToken())
{
await StartAndWaitAllThrottledAsync(tasksToRun, maxTasksToRunInParallel, -1, cancellationToken);
}
/// <summary>
/// Starts the given tasks and waits for them to complete. This will run the specified number of tasks in parallel.
/// <para>NOTE: If a timeout is reached before the Task completes, another Task may be started, potentially running more than the specified maximum allowed.</para>
/// <para>NOTE: If one of the given tasks has already been started, an exception will be thrown.</para>
/// </summary>
/// <param name="tasksToRun">The tasks to run.</param>
/// <param name="maxTasksToRunInParallel">The maximum number of tasks to run in parallel.</param>
/// <param name="timeoutInMilliseconds">The maximum milliseconds we should allow the max tasks to run in parallel before allowing another task to start. Specify -1 to wait indefinitely.</param>
/// <param name="cancellationToken">The cancellation token.</param>
public static async Task StartAndWaitAllThrottledAsync(IEnumerable<Task> tasksToRun, int maxTasksToRunInParallel, int timeoutInMilliseconds, CancellationToken cancellationToken = new CancellationToken())
{
// Convert to a list of tasks so that we don't enumerate over it multiple times needlessly.
var tasks = tasksToRun.ToList();
using (var throttler = new SemaphoreSlim(maxTasksToRunInParallel))
{
var postTaskTasks = new List<Task>();
// Have each task notify the throttler when it completes so that it decrements the number of tasks currently running.
tasks.ForEach(t => postTaskTasks.Add(t.ContinueWith(tsk => throttler.Release())));
// Start running each task.
foreach (var task in tasks)
{
// Increment the number of tasks currently running and wait if too many are running.
await throttler.WaitAsync(timeoutInMilliseconds, cancellationToken);
cancellationToken.ThrowIfCancellationRequested();
task.Start();
}
// Wait for all of the provided tasks to complete.
// We wait on the list of "post" tasks instead of the original tasks, otherwise there is a potential race condition where the throttler's using block is exited before some Tasks have had their "post" action completed, which references the throttler, resulting in an exception due to accessing a disposed object.
await Task.WhenAll(postTaskTasks.ToArray());
}
}
And then creating your list of Tasks and calling the function to have them run, with say a maximum of 4 simultaneous at a time, you could do this:
var listOfTasks = new List<Task>();
foreach (var file in files)
{
var localFile = file;
// Note that we create the Task here, but do not start it.
listOfTasks.Add(new Task(async () => await UploadFile(localFile)));
}
await Tasks.StartAndWaitAllThrottledAsync(listOfTasks, 4);
Also, because this method supports async, it will not block the UI thread like using Parallel.Invoke or Parallel.ForEach would.

I have coded below technique where I use BlockingCollection as a thread count manager. It is quite simple to implement and handles the job.
It simply accepts Task objects and add an integer value to blocking list, increasing running thread count by 1. When thread finishes, it dequeues the object and releases the block on add operation for upcoming tasks.
public class BlockingTaskQueue
{
private BlockingCollection<int> threadManager { get; set; } = null;
public bool IsWorking
{
get
{
return threadManager.Count > 0 ? true : false;
}
}
public BlockingTaskQueue(int maxThread)
{
threadManager = new BlockingCollection<int>(maxThread);
}
public async Task AddTask(Task task)
{
Task.Run(() =>
{
Run(task);
});
}
private bool Run(Task task)
{
try
{
threadManager.Add(1);
task.Start();
task.Wait();
return true;
}
catch (Exception ex)
{
return false;
}
finally
{
threadManager.Take();
}
}
}

Related

How to synchronize the recurrent execution of three tasks that depend on each other?

I would like to ask expert developers in C#. I have three recurrent tasks that my program needs to do. Task 2 depends on task 1 and task 3 depends on task 2, but task 1 doesn't need to wait for the other two tasks to finish in order to start again (the program is continuously running). Since each task takes some time, I would like to run each task in one thread or a C# Task. Once task 1 finishes task 2 starts and task 1 starts again ... etc.
I'm not sure what is the best way to implement this. I hope someone can guide me on this.
One way to achieve this is using something called the the Task Parallel Library. This provides a set of classes that allow you to arrange your tasks into "blocks". You create a method that does A, B and C sequentially, then TPL will take care of running multiple invocations of that method simultaneously. Here's a small example:
async Task Main()
{
var actionBlock = new ActionBlock<int>(DoTasksAsync, new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 2 // This is the number of simultaneous executions of DoTasksAsync that will be run
};
await actionBlock.SendAsync(1);
await actionBlock.SendAsync(2);
actionBlock.Complete();
await actionBlock.Completion;
}
async Task DoTasksAsync(int input)
{
await DoTaskAAsync();
await DoTaskBAsync();
await DoTaskCAsync();
}
I would probably use some kind of queue pattern.
I am not sure what the requirements for if task 1 is threadsafe or not, so I will keep it simple:
Task 1 is always executing. As soon as it finished, it posts a message on some queue and starts over.
Task 2 is listening to the queue. Whenever a message is available, it starts working on it.
Whenever task 2 finishes working, it calls task 3, so that it can do it's work.
As one of the comments mentioned, you should probably be able to use async/await successfully in your code. Especially between task 2 and 3. Note that task 1 can be run in parallel to task 2 and 3, since it is not dependent on any of the other task.
You could use the ParallelLoop method below. This method starts an asynchronous workflow, where the three tasks are invoked in parallel to each other, but sequentially to themselves. So you don't need to add synchronization inside each task, unless some task produces global side-effects that are visible from some other task.
The tasks are invoked on the ThreadPool, with the Task.Run method.
/// <summary>
/// Invokes three actions repeatedly in parallel on the ThreadPool, with the
/// action2 depending on the action1, and the action3 depending on the action2.
/// Each action is invoked sequentially to itself.
/// </summary>
public static async Task ParallelLoop<TResult1, TResult2>(
Func<TResult1> action1,
Func<TResult1, TResult2> action2,
Action<TResult2> action3,
CancellationToken cancellationToken = default)
{
// Arguments validation omitted
var task1 = Task.FromResult<TResult1>(default);
var task2 = Task.FromResult<TResult2>(default);
var task3 = Task.CompletedTask;
try
{
int counter = 0;
while (true)
{
counter++;
var result1 = await task1.ConfigureAwait(false);
cancellationToken.ThrowIfCancellationRequested();
task1 = Task.Run(action1); // Restart the task1
if (counter <= 1) continue; // In the first loop result1 is undefined
var result2 = await task2.ConfigureAwait(false);
cancellationToken.ThrowIfCancellationRequested();
task2 = Task.Run(() => action2(result1)); // Restart the task2
if (counter <= 2) continue; // In the second loop result2 is undefined
await task3.ConfigureAwait(false);
cancellationToken.ThrowIfCancellationRequested();
task3 = Task.Run(() => action3(result2)); // Restart the task3
}
}
finally
{
// Prevent fire-and-forget
Task allTasks = Task.WhenAll(task1, task2, task3);
try { await allTasks.ConfigureAwait(false); } catch { allTasks.Wait(); }
// Propagate all errors in an AggregateException
}
}
There is an obvious pattern in the implementation, that makes it trivial to add overloads having more than three actions. Each added action will require its own generic type parameter (TResult3, TResult4 etc).
Usage example:
var cts = new CancellationTokenSource();
Task loopTask = ParallelLoop(() =>
{
// First task
Thread.Sleep(1000); // Simulates synchronous work
return "OK"; // The result that is passed to the second task
}, result =>
{
// Second task
Thread.Sleep(1000); // Simulates synchronous work
return result + "!"; // The result that is passed to the third task
}, result =>
{
// Third task
Thread.Sleep(1000); // Simulates synchronous work
}, cts.Token);
In case any of the tasks fails, the whole loop will stop (with the loopTask.Exception containing the error). Since the tasks depend on each other, recovering from a single failed task is not possible¹. What you could do is to execute the whole loop through a Polly Retry policy, to make sure that the loop will be reincarnated in case of failure. If you are unfamiliar with the Polly library, you could use the simple and featureless RetryUntilCanceled method below:
public static async Task RetryUntilCanceled(Func<Task> action,
CancellationToken cancellationToken)
{
while (true)
{
cancellationToken.ThrowIfCancellationRequested();
try { await action().ConfigureAwait(false); }
catch { if (cancellationToken.IsCancellationRequested) throw; }
}
}
Usage:
Task loopTask = RetryUntilCanceled(() => ParallelLoop(() =>
{
//...
}, cts.Token), cts.Token);
Before exiting the process you are advised to Cancel() the CancellationTokenSource and Wait() (or await) the loopTask, in order for the loop to terminate gracefully. Otherwise some tasks may be aborted in the middle of their work.
¹ It is actually possible, and probably preferable, to execute each individual task through a Polly Retry policy. The parallel loop will be suspended until the failed task is retried successfully.

How can I read messages from a queue in parallel?

Situation
We have one message queue. We would like to process messages in parallel and limit the number of simultaneously processed messages.
Our trial code below does process messages in parallel, but it only starts a new batch of processes when the previous one is finished. We would like to restart Tasks as they finish.
In other words: The maximum number of Tasks should always be active as long as the message queue is not empty.
Trial code
static string queue = #".\Private$\concurrenttest";
private static void Process(CancellationToken token)
{
Task.Factory.StartNew(async () =>
{
while (true)
{
IEnumerable<Task> consumerTasks = ConsumerTasks();
await Task.WhenAll(consumerTasks);
await PeekAsync(new MessageQueue(queue));
}
});
}
private static IEnumerable<Task> ConsumerTasks()
{
for (int i = 0; i < 15; i++)
{
Command1 message;
try
{
MessageQueue msMq = new MessageQueue(queue);
msMq.Formatter = new XmlMessageFormatter(new Type[] { typeof(Command1) });
Message msg = msMq.Receive();
message = (Command1)msg.Body;
}
catch (MessageQueueException mqex)
{
if (mqex.MessageQueueErrorCode == MessageQueueErrorCode.IOTimeout)
yield break; // nothing in queue
else throw;
}
yield return Task.Run(() =>
{
Console.WriteLine("id: " + message.id + ", name: " + message.name);
Thread.Sleep(1000);
});
}
}
private static Task<Message> PeekAsync(MessageQueue msMq)
{
return Task.Factory.FromAsync<Message>(msMq.BeginPeek(), msMq.EndPeek);
}
EDIT
I spent a lot of time thinking about reliability of the pump - specifically if a message is received from the MessageQueue, cancellation becomes tricky - so I provided two ways to terminate the queue:
Signaling the CancellationToken stops the pipeline as quickly as possible and will likely result in dropped messages.
Calling MessagePump.Stop() terminates the pump but allows all messages which have already been taken from the queue to be fully processed before the MessagePump.Completion task transitions to RanToCompletion.
The solution uses TPL Dataflow (NuGet: Microsoft.Tpl.Dataflow).
Full implementation:
using System;
using System.Messaging;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
namespace StackOverflow.Q34437298
{
/// <summary>
/// Pumps the message queue and processes messages in parallel.
/// </summary>
public sealed class MessagePump
{
/// <summary>
/// Creates a <see cref="MessagePump"/> and immediately starts pumping.
/// </summary>
public static MessagePump Run(
MessageQueue messageQueue,
Func<Message, Task> processMessage,
int maxDegreeOfParallelism,
CancellationToken ct = default(CancellationToken))
{
if (messageQueue == null) throw new ArgumentNullException(nameof(messageQueue));
if (processMessage == null) throw new ArgumentNullException(nameof(processMessage));
if (maxDegreeOfParallelism <= 0) throw new ArgumentOutOfRangeException(nameof(maxDegreeOfParallelism));
ct.ThrowIfCancellationRequested();
return new MessagePump(messageQueue, processMessage, maxDegreeOfParallelism, ct);
}
private readonly TaskCompletionSource<bool> _stop = new TaskCompletionSource<bool>();
/// <summary>
/// <see cref="Task"/> which completes when this instance
/// stops due to a <see cref="Stop"/> or cancellation request.
/// </summary>
public Task Completion { get; }
/// <summary>
/// Maximum number of parallel message processors.
/// </summary>
public int MaxDegreeOfParallelism { get; }
/// <summary>
/// <see cref="MessageQueue"/> that is pumped by this instance.
/// </summary>
public MessageQueue MessageQueue { get; }
/// <summary>
/// Creates a new <see cref="MessagePump"/> instance.
/// </summary>
private MessagePump(MessageQueue messageQueue, Func<Message, Task> processMessage, int maxDegreeOfParallelism, CancellationToken ct)
{
MessageQueue = messageQueue;
MaxDegreeOfParallelism = maxDegreeOfParallelism;
// Kick off the loop.
Completion = RunAsync(processMessage, ct);
}
/// <summary>
/// Soft-terminates the pump so that no more messages will be pumped.
/// Any messages already removed from the message queue will be
/// processed before this instance fully completes.
/// </summary>
public void Stop()
{
// Multiple calls to Stop are fine.
_stop.TrySetResult(true);
}
/// <summary>
/// Pump implementation.
/// </summary>
private async Task RunAsync(Func<Message, Task> processMessage, CancellationToken ct = default(CancellationToken))
{
using (CancellationTokenSource producerCTS = ct.CanBeCanceled
? CancellationTokenSource.CreateLinkedTokenSource(ct)
: new CancellationTokenSource())
{
// This CancellationToken will either be signaled
// externally, or if our consumer errors.
ct = producerCTS.Token;
// Handover between producer and consumer.
DataflowBlockOptions bufferOptions = new DataflowBlockOptions {
// There is no point in dequeuing more messages than we can process,
// so we'll throttle the producer by limiting the buffer capacity.
BoundedCapacity = MaxDegreeOfParallelism,
CancellationToken = ct
};
BufferBlock<Message> buffer = new BufferBlock<Message>(bufferOptions);
Task producer = Task.Run(async () =>
{
try
{
while (_stop.Task.Status != TaskStatus.RanToCompletion)
{
// This line and next line are the *only* two cancellation
// points which will not cause dropped messages.
ct.ThrowIfCancellationRequested();
Task<Message> peekTask = WithCancellation(PeekAsync(MessageQueue), ct);
if (await Task.WhenAny(peekTask, _stop.Task).ConfigureAwait(false) == _stop.Task)
{
// Stop was signaled before PeekAsync returned. Wind down the producer gracefully
// by breaking out and propagating completion to the consumer blocks.
break;
}
await peekTask.ConfigureAwait(false); // Observe Peek exceptions.
ct.ThrowIfCancellationRequested();
// Zero timeout means that we will error if someone else snatches the
// peeked message from the queue before we get to it (due to a race).
// I deemed this better than getting stuck waiting for a message which
// may never arrive, or, worse yet, let this ReceiveAsync run onobserved
// due to a cancellation (if we choose to abandon it like we do PeekAsync).
// You will have to restart the pump if this throws.
// Omit timeout if this behaviour is undesired.
Message message = await ReceiveAsync(MessageQueue, timeout: TimeSpan.Zero).ConfigureAwait(false);
await buffer.SendAsync(message, ct).ConfigureAwait(false);
}
}
finally
{
buffer.Complete();
}
},
ct);
// Wire up the parallel consumers.
ExecutionDataflowBlockOptions executionOptions = new ExecutionDataflowBlockOptions {
CancellationToken = ct,
MaxDegreeOfParallelism = MaxDegreeOfParallelism,
SingleProducerConstrained = true, // We don't require thread safety guarantees.
BoundedCapacity = MaxDegreeOfParallelism,
};
ActionBlock<Message> consumer = new ActionBlock<Message>(async message =>
{
ct.ThrowIfCancellationRequested();
await processMessage(message).ConfigureAwait(false);
},
executionOptions);
buffer.LinkTo(consumer, new DataflowLinkOptions { PropagateCompletion = true });
if (await Task.WhenAny(producer, consumer.Completion).ConfigureAwait(false) == consumer.Completion)
{
// If we got here, consumer probably errored. Stop the producer
// before we throw so we don't go dequeuing more messages.
producerCTS.Cancel();
}
// Task.WhenAll checks faulted tasks before checking any
// canceled tasks, so if our consumer threw a legitimate
// execption, that's what will be rethrown, not the OCE.
await Task.WhenAll(producer, consumer.Completion).ConfigureAwait(false);
}
}
/// <summary>
/// APM -> TAP conversion for MessageQueue.Begin/EndPeek.
/// </summary>
private static Task<Message> PeekAsync(MessageQueue messageQueue)
{
return Task.Factory.FromAsync(messageQueue.BeginPeek(), messageQueue.EndPeek);
}
/// <summary>
/// APM -> TAP conversion for MessageQueue.Begin/EndReceive.
/// </summary>
private static Task<Message> ReceiveAsync(MessageQueue messageQueue, TimeSpan timeout)
{
return Task.Factory.FromAsync(messageQueue.BeginReceive(timeout), messageQueue.EndPeek);
}
/// <summary>
/// Allows abandoning tasks which do not natively
/// support cancellation. Use with caution.
/// </summary>
private static async Task<T> WithCancellation<T>(Task<T> task, CancellationToken ct)
{
ct.ThrowIfCancellationRequested();
TaskCompletionSource<bool> tcs = new TaskCompletionSource<bool>();
using (ct.Register(s => ((TaskCompletionSource<bool>)s).TrySetResult(true), tcs, false))
{
if (task != await Task.WhenAny(task, tcs.Task).ConfigureAwait(false))
{
// Cancellation task completed first.
// We are abandoning the original task.
throw new OperationCanceledException(ct);
}
}
// Task completed: synchronously return result or propagate exceptions.
return await task.ConfigureAwait(false);
}
}
}
Usage:
using (MessageQueue msMq = GetQueue())
{
MessagePump pump = MessagePump.Run(
msMq,
async message =>
{
await Task.Delay(50);
Console.WriteLine($"Finished processing message {message.Id}");
},
maxDegreeOfParallelism: 4
);
for (int i = 0; i < 100; i++)
{
msMq.Send(new Message());
Thread.Sleep(25);
}
pump.Stop();
await pump.Completion;
}
Untidy but functional unit tests:
https://gist.github.com/KirillShlenskiy/7f3e2c4b28b9f940c3da
ORIGINAL ANSWER
As mentioned in my comment, there are established producer/consumer patterns in .NET, one of which is pipeline. An excellent example of such can be found in "Patterns of Parallel Programming" by Microsoft's own Stephen Toub (full text here: https://www.microsoft.com/en-au/download/details.aspx?id=19222, page 55).
The idea is simple: producers continuously throw stuff in a queue, and consumers pull it out and process (in parallel to producers and possibly one another).
Here's an example of a message pipeline where the consumer uses synchronous, blocking methods to process the items as they arrive (I've parallelised the consumer to suit your scenario):
void MessageQueueWithBlockingCollection()
{
// If your processing is continuous and never stops throughout the lifetime of
// your application, you can ignore the fact that BlockingCollection is IDisposable.
using (BlockingCollection<Message> messages = new BlockingCollection<Message>())
{
Task producer = Task.Run(() =>
{
try
{
for (int i = 0; i < 10; i++)
{
// Hand over the message to the consumer.
messages.Add(new Message());
// Simulated arrival delay for the next message.
Thread.Sleep(10);
}
}
finally
{
// Notify consumer that there is no more data.
messages.CompleteAdding();
}
});
Task consumer = Task.Run(() =>
{
ParallelOptions options = new ParallelOptions {
MaxDegreeOfParallelism = 4
};
Parallel.ForEach(messages.GetConsumingEnumerable(), options, message => {
ProcessMessage(message);
});
});
Task.WaitAll(producer, consumer);
}
}
void ProcessMessage(Message message)
{
Thread.Sleep(40);
}
The above code completes in approx 130-140 ms, which is exactly what you would expect given the parallelisation of the consumers.
Now, in your scenario you are using Tasks and async/await better suited to TPL Dataflow (official Microsoft supported library tailored to parallel and asynchronous sequence processing).
Here's a little demo showing the different types of TPL Dataflow processing blocks that you would use for the job:
async Task MessageQueueWithTPLDataflow()
{
// Set up our queue.
BufferBlock<Message> queue = new BufferBlock<Message>();
// Set up our processing stage (consumer).
ExecutionDataflowBlockOptions options = new ExecutionDataflowBlockOptions {
CancellationToken = CancellationToken.None, // Plug in your own in case you need to support cancellation.
MaxDegreeOfParallelism = 4
};
ActionBlock<Message> consumer = new ActionBlock<Message>(m => ProcessMessageAsync(m), options);
// Link the queue to the consumer.
queue.LinkTo(consumer, new DataflowLinkOptions { PropagateCompletion = true });
// Wire up our producer.
Task producer = Task.Run(async () =>
{
try
{
for (int i = 0; i < 10; i++)
{
queue.Post(new Message());
await Task.Delay(10).ConfigureAwait(false);
}
}
finally
{
// Signal to the consumer that there are no more items.
queue.Complete();
}
});
await consumer.Completion.ConfigureAwait(false);
}
Task ProcessMessageAsync(Message message)
{
return Task.Delay(40);
}
It's not hard to adapt the above to use your MessageQueue and you can be sure that the end result will be free of threading issues. I'll do just that if I get a bit more time today/tomorrow.
You have one collection of things you want to process.
You create another collection for things being processed (this could be your task objects or items of some sort that reference a task).
You create a loop that will repeat as long as you have work to do. That is, work items are waiting to be started or you still have work items being processed.
At the start of the loop you populate your active task collection with as many tasks as you want to run concurrently and you start them as you add them.
You let the things run for a while (like Thread.Sleep(10);).
You create an inner loop that checks all your started tasks for completion. If one has completed, you remove it and report the results or do whatever seems appropriate.
That's it. On the next turn the upper part of your outer loop will add tasks to your running tasks collection until the number equals the maximum you have set, keeping your work-in-progress collection full.
You may want to do all this on a worker thread and monitor cancel requests in your loop.
The task library in .NET is made to execute a number of tasks in parallell. While there are ways to limit the number of active tasks, the library itself will limit the number of active tasks according to the computers CPU.
The first question that needs to be answered is why do you need to create another limit? If the limit imposed by the task library is OK, then you can just keep create tasks and rely on the task library to execute it with good performance.
If this is OK, then as soon as you get a message from MSMQ just start a task to process the message, skip the waiting (WhenAll call), start over and wait for the next message.
You can limit the number of concurrent tasks by using a custom task scheduler. More on MSDN: https://msdn.microsoft.com/en-us/library/system.threading.tasks.taskscheduler%28v=vs.110%29.aspx.
My colleague came up with the solution below. This solution works, but I'll let this code be reviewed on Code Review.
Based on answers given and some research of our own, we've come to a solution. We're using a SemaphoreSlim to limit our number of parallel Tasks.
static string queue = #".\Private$\concurrenttest";
private static async Task Process(CancellationToken token)
{
MessageQueue msMq = new MessageQueue(queue);
msMq.Formatter = new XmlMessageFormatter(new Type[] { typeof(Command1) });
SemaphoreSlim s = new SemaphoreSlim(15, 15);
while (true)
{
await s.WaitAsync();
await PeekAsync(msMq);
Command1 message = await ReceiveAsync(msMq);
Task.Run(async () =>
{
try
{
await HandleMessage(message);
}
catch (Exception)
{
// Exception handling
}
s.Release();
});
}
}
private static Task HandleMessage(Command1 message)
{
Console.WriteLine("id: " + message.id + ", name: " + message.name);
Thread.Sleep(1000);
return Task.FromResult(1);
}
private static Task<Message> PeekAsync(MessageQueue msMq)
{
return Task.Factory.FromAsync<Message>(msMq.BeginPeek(), msMq.EndPeek);
}
public class Command1
{
public int id { get; set; }
public string name { get; set; }
}
private static async Task<Command1> ReceiveAsync(MessageQueue msMq)
{
var receiveAsync = await Task.Factory.FromAsync<Message>(msMq.BeginReceive(), msMq.EndPeek);
return (Command1)receiveAsync.Body;
}
You should look at using Microsoft's Reactive Framework for this.
You code could look like this:
var query =
from command1 in FromQueue<Command1>(queue)
from text in Observable.Start(() =>
{
Thread.Sleep(1000);
return "id: " + command1.id + ", name: " + command1.name;
})
select text;
var subscription =
query
.Subscribe(text => Console.WriteLine(text));
This does all of the processing in parallel, and ensures that the processing is properly distributed across all cores. When one value ends another starts.
To cancel the subscription just call subscription.Dispose().
The code for FromQueue is:
static IObservable<T> FromQueue<T>(string serverQueue)
{
return Observable.Create<T>(observer =>
{
var responseQueue = Environment.MachineName + "\\Private$\\" + Guid.NewGuid().ToString();
var queue = MessageQueue.Create(responseQueue);
var frm = new System.Messaging.BinaryMessageFormatter();
var srv = new MessageQueue(serverQueue);
srv.Formatter = frm;
queue.Formatter = frm;
srv.Send("S " + responseQueue);
var loop = NewThreadScheduler.Default.ScheduleLongRunning(cancel =>
{
while (!cancel.IsDisposed)
{
var msg = queue.Receive();
observer.OnNext((T)msg.Body);
}
});
return new CompositeDisposable(
loop,
Disposable.Create(() =>
{
srv.Send("D " + responseQueue);
MessageQueue.Delete(responseQueue);
})
);
});
}
Just NuGet "Rx-Main" to get the bits.
In order to limit the concurrency you can do this:
int maxConcurrent = 2;
var query =
FromQueue<Command1>(queue)
.Select(command1 => Observable.Start(() =>
{
Thread.Sleep(1000);
return "id: " + command1.id + ", name: " + command1.name;
}))
.Merge(maxConcurrent);

.NET TPL Start a task after one ends indefinitely

I have an application which process images extracted from database. I need to process various images in parallel, and because of that i'm using .NET TPL with Tasks.
My application is a Winforms app in C#. I just have the option to choose how many processes will start and a Start button. The way that i'm doing right now is this:
private Action getBusinessAction(int numProcess) {
return () =>
{
try {
while (true) {
(new BusinessClass()).doSomeProcess(numProcess);
tokenCancelacion.ThrowIfCancellationRequested();
}
}
catch (OperationCanceledException ex) {
Console.WriteLine("Cancelled process");
}
};
}
...
for (int cnt = 0; cnt < NUM_OF_MAX_PROCESS; cnt++) {
Task.Factory.StartNew(getBusinessAction(cnt + 1), tokenCancelacion);
}
In this case, if i choose 8 as the number of processes, it starts 8 tasks which are running until application is closed. But i think that a better approach will be to start a number of tasks which calls doSomeProcess method, and then finish. But when a task finishes, i would like to start the same task or start a new instance of the task that does the same, in order to having always that number of processes running in parallel. Is there a way in TPL to achieve this?
This sounds like a good fit for TPL Dataflow's ActionBlock. You create a single block at the start. Set how many items it should process concurrently (i.e. 8) and post the items into it:
var block = new ActionBlock<Image>(
image => ProcessImage(image),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8});
foreach (var image in GetImages())
{
block.Post(image);
}
This will take care of creating up to 8 tasks when needed and when they aren't anymore the number would go down.
When you are done you should signal the block for completion and wait for it to complete:
block.Complete();
await block.Completion;

How to properly run multiple async tasks in parallel? [duplicate]

This question already has answers here:
How to limit the amount of concurrent async I/O operations?
(11 answers)
Closed 10 days ago.
What if you need to run multiple asynchronous I/O tasks in parallel but need to make sure that no more than X I/O processes are running at the same time; and pre and post I/O processing tasks shouldn't have such limitation.
Here is a scenario - let's say there are 1000 tasks; each of them accepts a text string as an input parameter; transforms that text (pre I/O processing) then writes that transformed text into a file. The goal is to make pre-processing logic utilize 100% of CPU/Cores and I/O portion of the tasks run with max 10 degree of parallelism (max 10 simultaneously opened for writing files at a time).
Can you provide a sample code how to do it with C# / .NET 4.5?
http://blogs.msdn.com/b/csharpfaq/archive/2012/01/23/using-async-for-file-access-alan-berman.aspx
I think using TPL Dataflow for this would be a good idea: you create pre- and post-process blocks with unbounded parallelism, a file-writing block with limited parallelism and link them together. Something like:
var unboundedParallelismOptions =
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
};
var preProcessBlock = new TransformBlock<string, string>(
s => PreProcess(s), unboundedParallelismOptions);
var writeToFileBlock = new TransformBlock<string, string>(
async s =>
{
await WriteToFile(s);
return s;
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });
var postProcessBlock = new ActionBlock<string>(
s => PostProcess(s), unboundedParallelismOptions);
var propagateCompletionOptions =
new DataflowLinkOptions { PropagateCompletion = true };
preProcessBlock.LinkTo(writeToFileBlock, propagateCompletionOptions);
writeToFileBlock.LinkTo(postProcessBlock, propagateCompletionOptions);
// use something like await preProcessBlock.SendAsync("text") here
preProcessBlock.Complete();
await postProcessBlock.Completion;
Where WriteToFile() could look like this:
private static async Task WriteToFile(string s)
{
using (var writer = new StreamWriter(GetFileName()))
await writer.WriteAsync(s);
}
It sounds like you'd want to consider a Djikstra Semaphore to control access to the starting of tasks.
However, this sounds like a typical queue/fixed number of consumers kind of problem, which may be a more appropriate way to structure it.
I would create an extension method in which one can set maximum degree of parallelism. SemaphoreSlim will be the savior here.
/// <summary>
/// Concurrently Executes async actions for each item of <see cref="IEnumerable<typeparamref name="T"/>
/// </summary>
/// <typeparam name="T">Type of IEnumerable</typeparam>
/// <param name="enumerable">instance of <see cref="IEnumerable<typeparamref name="T"/>"/></param>
/// <param name="action">an async <see cref="Action" /> to execute</param>
/// <param name="maxDegreeOfParallelism">Optional, An integer that represents the maximum degree of parallelism,
/// Must be grater than 0</param>
/// <returns>A Task representing an async operation</returns>
/// <exception cref="ArgumentOutOfRangeException">If the maxActionsToRunInParallel is less than 1</exception>
public static async Task ForEachAsyncConcurrent<T>(
this IEnumerable<T> enumerable,
Func<T, Task> action,
int? maxDegreeOfParallelism = null)
{
if (maxDegreeOfParallelism.HasValue)
{
using (var semaphoreSlim = new SemaphoreSlim(
maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value))
{
var tasksWithThrottler = new List<Task>();
foreach (var item in enumerable)
{
// Increment the number of currently running tasks and wait if they are more than limit.
await semaphoreSlim.WaitAsync();
tasksWithThrottler.Add(Task.Run(async () =>
{
await action(item).ContinueWith(res =>
{
// action is completed, so decrement the number of currently running tasks
semaphoreSlim.Release();
});
}));
}
// Wait for all tasks to complete.
await Task.WhenAll(tasksWithThrottler.ToArray());
}
}
else
{
await Task.WhenAll(enumerable.Select(item => action(item)));
}
}
Sample Usage:
await enumerable.ForEachAsyncConcurrent(
async item =>
{
await SomeAsyncMethod(item);
},
5);

How to limit the amount of concurrent async I/O operations?

// let's say there is a list of 1000+ URLs
string[] urls = { "http://google.com", "http://yahoo.com", ... };
// now let's send HTTP requests to each of these URLs in parallel
urls.AsParallel().ForAll(async (url) => {
var client = new HttpClient();
var html = await client.GetStringAsync(url);
});
Here is the problem, it starts 1000+ simultaneous web requests. Is there an easy way to limit the concurrent amount of these async http requests? So that no more than 20 web pages are downloaded at any given time. How to do it in the most efficient manner?
You can definitely do this in the latest versions of async for .NET, using .NET 4.5 Beta. The previous post from 'usr' points to a good article written by Stephen Toub, but the less announced news is that the async semaphore actually made it into the Beta release of .NET 4.5
If you look at our beloved SemaphoreSlim class (which you should be using since it's more performant than the original Semaphore), it now boasts the WaitAsync(...) series of overloads, with all of the expected arguments - timeout intervals, cancellation tokens, all of your usual scheduling friends :)
Stephen's also written a more recent blog post about the new .NET 4.5 goodies that came out with beta see What’s New for Parallelism in .NET 4.5 Beta.
Last, here's some sample code about how to use SemaphoreSlim for async method throttling:
public async Task MyOuterMethod()
{
// let's say there is a list of 1000+ URLs
var urls = { "http://google.com", "http://yahoo.com", ... };
// now let's send HTTP requests to each of these URLs in parallel
var allTasks = new List<Task>();
var throttler = new SemaphoreSlim(initialCount: 20);
foreach (var url in urls)
{
// do an async wait until we can schedule again
await throttler.WaitAsync();
// using Task.Run(...) to run the lambda in its own parallel
// flow on the threadpool
allTasks.Add(
Task.Run(async () =>
{
try
{
var client = new HttpClient();
var html = await client.GetStringAsync(url);
}
finally
{
throttler.Release();
}
}));
}
// won't get here until all urls have been put into tasks
await Task.WhenAll(allTasks);
// won't get here until all tasks have completed in some way
// (either success or exception)
}
Last, but probably a worthy mention is a solution that uses TPL-based scheduling. You can create delegate-bound tasks on the TPL that have not yet been started, and allow for a custom task scheduler to limit the concurrency. In fact, there's an MSDN sample for it here:
See also TaskScheduler .
If you have an IEnumerable (ie. strings of URL s) and you want to do an I/O bound operation with each of these (ie. make an async http request) concurrently AND optionally you also want to set the maximum number of concurrent I/O requests in real time, here is how you can do that. This way you do not use thread pool et al, the method uses semaphoreslim to control max concurrent I/O requests similar to a sliding window pattern one request completes, leaves the semaphore and the next one gets in.
usage:
await ForEachAsync(urlStrings, YourAsyncFunc, optionalMaxDegreeOfConcurrency);
public static Task ForEachAsync<TIn>(
IEnumerable<TIn> inputEnumerable,
Func<TIn, Task> asyncProcessor,
int? maxDegreeOfParallelism = null)
{
int maxAsyncThreadCount = maxDegreeOfParallelism ?? DefaultMaxDegreeOfParallelism;
SemaphoreSlim throttler = new SemaphoreSlim(maxAsyncThreadCount, maxAsyncThreadCount);
IEnumerable<Task> tasks = inputEnumerable.Select(async input =>
{
await throttler.WaitAsync().ConfigureAwait(false);
try
{
await asyncProcessor(input).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
return Task.WhenAll(tasks);
}
After the release of the .NET 6 (in November, 2021), the recommended way of limiting the amount of concurrent asynchronous I/O operations is the Parallel.ForEachAsync API, with the MaxDegreeOfParallelism configuration. Here is how it can be used in practice:
// let's say there is a list of 1000+ URLs
string[] urls = { "http://google.com", "http://yahoo.com", /*...*/ };
var client = new HttpClient();
var options = new ParallelOptions() { MaxDegreeOfParallelism = 20 };
// now let's send HTTP requests to each of these URLs in parallel
await Parallel.ForEachAsync(urls, options, async (url, cancellationToken) =>
{
var html = await client.GetStringAsync(url, cancellationToken);
});
In the above example the Parallel.ForEachAsync task is awaited asynchronously. You can also Wait it synchronously if you need to, which will block the current thread until the completion of all asynchronous operations. The synchronous Wait has the advantage that in case of errors, all exceptions will be propagated. On the contrary the await operator propagates by design only the first exception. In case this is a problem, you can find solutions here.
(Note: an idiomatic implementation of a ForEachAsync extension method that also propagates the results, can be found in the 4th revision of this answer)
There are a lot of pitfalls and direct use of a semaphore can be tricky in error cases, so I would suggest to use AsyncEnumerator NuGet Package instead of re-inventing the wheel:
// let's say there is a list of 1000+ URLs
string[] urls = { "http://google.com", "http://yahoo.com", ... };
// now let's send HTTP requests to each of these URLs in parallel
await urls.ParallelForEachAsync(async (url) => {
var client = new HttpClient();
var html = await client.GetStringAsync(url);
}, maxDegreeOfParalellism: 20);
Unfortunately, the .NET Framework is missing most important combinators for orchestrating parallel async tasks. There is no such thing built-in.
Look at the AsyncSemaphore class built by the most respectable Stephen Toub. What you want is called a semaphore, and you need an async version of it.
SemaphoreSlim can be very helpful here. Here's the extension method I've created.
/// <summary>
/// Concurrently Executes async actions for each item of <see cref="IEnumerable<typeparamref name="T"/>
/// </summary>
/// <typeparam name="T">Type of IEnumerable</typeparam>
/// <param name="enumerable">instance of <see cref="IEnumerable<typeparamref name="T"/>"/></param>
/// <param name="action">an async <see cref="Action" /> to execute</param>
/// <param name="maxActionsToRunInParallel">Optional, max numbers of the actions to run in parallel,
/// Must be grater than 0</param>
/// <returns>A Task representing an async operation</returns>
/// <exception cref="ArgumentOutOfRangeException">If the maxActionsToRunInParallel is less than 1</exception>
public static async Task ForEachAsyncConcurrent<T>(
this IEnumerable<T> enumerable,
Func<T, Task> action,
int? maxActionsToRunInParallel = null)
{
if (maxActionsToRunInParallel.HasValue)
{
using (var semaphoreSlim = new SemaphoreSlim(
maxActionsToRunInParallel.Value, maxActionsToRunInParallel.Value))
{
var tasksWithThrottler = new List<Task>();
foreach (var item in enumerable)
{
// Increment the number of currently running tasks and wait if they are more than limit.
await semaphoreSlim.WaitAsync();
tasksWithThrottler.Add(Task.Run(async () =>
{
await action(item).ContinueWith(res =>
{
// action is completed, so decrement the number of currently running tasks
semaphoreSlim.Release();
});
}));
}
// Wait for all of the provided tasks to complete.
await Task.WhenAll(tasksWithThrottler.ToArray());
}
}
else
{
await Task.WhenAll(enumerable.Select(item => action(item)));
}
}
Sample Usage:
await enumerable.ForEachAsyncConcurrent(
async item =>
{
await SomeAsyncMethod(item);
},
5);
Although 1000 tasks might be queued very quickly, the Parallel Tasks library can only handle concurrent tasks equal to the amount of CPU cores in the machine. That means that if you have a four-core machine, only 4 tasks will be executing at a given time (unless you lower the MaxDegreeOfParallelism).
this is not good practice as it changes a global variable. it is also not a general solution for async. but it is easy for all instances of HttpClient, if that's all you're after. you can simply try:
System.Net.ServicePointManager.DefaultConnectionLimit = 20;
Here is a handy Extension Method you can create to wrap a list of tasks such that they will be executed with a maximum degree of concurrency:
/// <summary>Allows to do any async operation in bulk while limiting the system to a number of concurrent items being processed.</summary>
private static IEnumerable<Task<T>> WithMaxConcurrency<T>(this IEnumerable<Task<T>> tasks, int maxParallelism)
{
SemaphoreSlim maxOperations = new SemaphoreSlim(maxParallelism);
// The original tasks get wrapped in a new task that must first await a semaphore before the original task is called.
return tasks.Select(task => maxOperations.WaitAsync().ContinueWith(_ =>
{
try { return task; }
finally { maxOperations.Release(); }
}).Unwrap());
}
Now instead of:
await Task.WhenAll(someTasks);
You can go
await Task.WhenAll(someTasks.WithMaxConcurrency(20));
Parallel computations should be used for speeding up CPU-bound operations. Here we are talking about I/O bound operations. Your implementation should be purely async, unless you're overwhelming the busy single core on your multi-core CPU.
EDIT I like the suggestion made by usr to use an "async semaphore" here.
Essentially you're going to want to create an Action or Task for each URL that you want to hit, put them in a List, and then process that list, limiting the number that can be processed in parallel.
My blog post shows how to do this both with Tasks and with Actions, and provides a sample project you can download and run to see both in action.
With Actions
If using Actions, you can use the built-in .Net Parallel.Invoke function. Here we limit it to running at most 20 threads in parallel.
var listOfActions = new List<Action>();
foreach (var url in urls)
{
var localUrl = url;
// Note that we create the Task here, but do not start it.
listOfTasks.Add(new Task(() => CallUrl(localUrl)));
}
var options = new ParallelOptions {MaxDegreeOfParallelism = 20};
Parallel.Invoke(options, listOfActions.ToArray());
With Tasks
With Tasks there is no built-in function. However, you can use the one that I provide on my blog.
/// <summary>
/// Starts the given tasks and waits for them to complete. This will run, at most, the specified number of tasks in parallel.
/// <para>NOTE: If one of the given tasks has already been started, an exception will be thrown.</para>
/// </summary>
/// <param name="tasksToRun">The tasks to run.</param>
/// <param name="maxTasksToRunInParallel">The maximum number of tasks to run in parallel.</param>
/// <param name="cancellationToken">The cancellation token.</param>
public static async Task StartAndWaitAllThrottledAsync(IEnumerable<Task> tasksToRun, int maxTasksToRunInParallel, CancellationToken cancellationToken = new CancellationToken())
{
await StartAndWaitAllThrottledAsync(tasksToRun, maxTasksToRunInParallel, -1, cancellationToken);
}
/// <summary>
/// Starts the given tasks and waits for them to complete. This will run the specified number of tasks in parallel.
/// <para>NOTE: If a timeout is reached before the Task completes, another Task may be started, potentially running more than the specified maximum allowed.</para>
/// <para>NOTE: If one of the given tasks has already been started, an exception will be thrown.</para>
/// </summary>
/// <param name="tasksToRun">The tasks to run.</param>
/// <param name="maxTasksToRunInParallel">The maximum number of tasks to run in parallel.</param>
/// <param name="timeoutInMilliseconds">The maximum milliseconds we should allow the max tasks to run in parallel before allowing another task to start. Specify -1 to wait indefinitely.</param>
/// <param name="cancellationToken">The cancellation token.</param>
public static async Task StartAndWaitAllThrottledAsync(IEnumerable<Task> tasksToRun, int maxTasksToRunInParallel, int timeoutInMilliseconds, CancellationToken cancellationToken = new CancellationToken())
{
// Convert to a list of tasks so that we don't enumerate over it multiple times needlessly.
var tasks = tasksToRun.ToList();
using (var throttler = new SemaphoreSlim(maxTasksToRunInParallel))
{
var postTaskTasks = new List<Task>();
// Have each task notify the throttler when it completes so that it decrements the number of tasks currently running.
tasks.ForEach(t => postTaskTasks.Add(t.ContinueWith(tsk => throttler.Release())));
// Start running each task.
foreach (var task in tasks)
{
// Increment the number of tasks currently running and wait if too many are running.
await throttler.WaitAsync(timeoutInMilliseconds, cancellationToken);
cancellationToken.ThrowIfCancellationRequested();
task.Start();
}
// Wait for all of the provided tasks to complete.
// We wait on the list of "post" tasks instead of the original tasks, otherwise there is a potential race condition where the throttler's using block is exited before some Tasks have had their "post" action completed, which references the throttler, resulting in an exception due to accessing a disposed object.
await Task.WhenAll(postTaskTasks.ToArray());
}
}
And then creating your list of Tasks and calling the function to have them run, with say a maximum of 20 simultaneous at a time, you could do this:
var listOfTasks = new List<Task>();
foreach (var url in urls)
{
var localUrl = url;
// Note that we create the Task here, but do not start it.
listOfTasks.Add(new Task(async () => await CallUrl(localUrl)));
}
await Tasks.StartAndWaitAllThrottledAsync(listOfTasks, 20);

Categories