Limited number of concurent threads C# [duplicate] - c#

Let's say I have 100 tasks that do something that takes 10 seconds.
Now I want to only run 10 at a time like when 1 of those 10 finishes another task gets executed till all are finished.
Now I always used ThreadPool.QueueUserWorkItem() for such task but I've read that it is bad practice to do so and that I should use Tasks instead.
My problem is that I nowhere found a good example for my scenario so could you get me started on how to achieve this goal with Tasks?

SemaphoreSlim maxThread = new SemaphoreSlim(10);
for (int i = 0; i < 115; i++)
{
maxThread.Wait();
Task.Factory.StartNew(() =>
{
//Your Works
}
, TaskCreationOptions.LongRunning)
.ContinueWith( (task) => maxThread.Release() );
}

TPL Dataflow is great for doing things like this. You can create a 100% async version of Parallel.Invoke pretty easily:
async Task ProcessTenAtOnce<T>(IEnumerable<T> items, Func<T, Task> func)
{
ExecutionDataflowBlockOptions edfbo = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10
};
ActionBlock<T> ab = new ActionBlock<T>(func, edfbo);
foreach (T item in items)
{
await ab.SendAsync(item);
}
ab.Complete();
await ab.Completion;
}

You have several options. You can use Parallel.Invoke for starters:
public void DoWork(IEnumerable<Action> actions)
{
Parallel.Invoke(new ParallelOptions() { MaxDegreeOfParallelism = 10 }
, actions.ToArray());
}
Here is an alternate option that will work much harder to have exactly 10 tasks running (although the number of threads in the thread pool processing those tasks may be different) and that returns a Task indicating when it finishes, rather than blocking until done.
public Task DoWork(IList<Action> actions)
{
List<Task> tasks = new List<Task>();
int numWorkers = 10;
int batchSize = (int)Math.Ceiling(actions.Count / (double)numWorkers);
foreach (var batch in actions.Batch(actions.Count / 10))
{
tasks.Add(Task.Factory.StartNew(() =>
{
foreach (var action in batch)
{
action();
}
}));
}
return Task.WhenAll(tasks);
}
If you don't have MoreLinq, for the Batch function, here's my simpler implementation:
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int batchSize)
{
List<T> buffer = new List<T>(batchSize);
foreach (T item in source)
{
buffer.Add(item);
if (buffer.Count >= batchSize)
{
yield return buffer;
buffer = new List<T>();
}
}
if (buffer.Count >= 0)
{
yield return buffer;
}
}

You can create a method like this:
public static async Task RunLimitedNumberAtATime<T>(int numberOfTasksConcurrent,
IEnumerable<T> inputList, Func<T, Task> asyncFunc)
{
Queue<T> inputQueue = new Queue<T>(inputList);
List<Task> runningTasks = new List<Task>(numberOfTasksConcurrent);
for (int i = 0; i < numberOfTasksConcurrent && inputQueue.Count > 0; i++)
runningTasks.Add(asyncFunc(inputQueue.Dequeue()));
while (inputQueue.Count > 0)
{
Task task = await Task.WhenAny(runningTasks);
runningTasks.Remove(task);
runningTasks.Add(asyncFunc(inputQueue.Dequeue()));
}
await Task.WhenAll(runningTasks);
}
And then you can call any async method n times with a limit like this:
Task task = RunLimitedNumberAtATime(10,
Enumerable.Range(1, 100),
async x =>
{
Console.WriteLine($"Starting task {x}");
await Task.Delay(100);
Console.WriteLine($"Finishing task {x}");
});
Or if you want to run long running non async methods, you can do it that way:
Task task = RunLimitedNumberAtATime(10,
Enumerable.Range(1, 100),
x => Task.Factory.StartNew(() => {
Console.WriteLine($"Starting task {x}");
System.Threading.Thread.Sleep(100);
Console.WriteLine($"Finishing task {x}");
}, TaskCreationOptions.LongRunning));
Maybe there is a similar method somewhere in the framework, but I didn't find it yet.

I would love to use the simplest solution I can think of which as I think using the TPL:
string[] urls={};
Parallel.ForEach(urls, new ParallelOptions() { MaxDegreeOfParallelism = 2}, url =>
{
//Download the content or do whatever you want with each URL
});

Related

How do I run a method both parallel and sequentially in C#?

I have a C# console app. In this app, I have a method that I will call DoWorkAsync. For the context of this question, this method looks like this:
private async Task<string> DoWorkAsync()
{
System.Threading.Thread.Sleep(5000);
var random = new Random();
var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
var length = random.Next(10, 101);
await Task.CompletedTask;
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}
I call DoWorkAsync from another method that determines a) how many times this will get ran and b) if each call will be ran in parallel or sequentially. That method looks like this:
private async Task<Task<string>[]> DoWork(int iterations, bool runInParallel)
{
var tasks = new List<Task<string>>();
for (var i=0; i<iterations; i++)
{
if (runInParallel)
{
var task = Task.Run(() => DoWorkAsync());
tasks.Add(task);
}
else
{
await DoWorkAsync();
}
}
return tasks.ToArray();
}
After all of the tasks are completed, I want to display the results. To do this, I have code that looks like this:
var random = new Random();
var tasks = await DoWork(random.Next(10, 101);
Task.WaitAll(tasks);
foreach (var task in tasks)
{
Console.WriteLine(task.Result);
}
This code works as expected if the code runs in parallel (i.e. runInParallel is true). However, when runInParallel is false (i.e. I want to run the Tasks sequentially) the Task array doesn't get populated. So, the caller doesn't have any results to work with. I don't know how to fix it though. I'm not sure how to add the method call as a Task that will run sequentially. I understand that the idea behind Tasks is to run in parallel. However, I have this need to toggle between parallel and sequential.
Thank you!
the Task array doesn't get populated.
So populate it:
else
{
var task = DoWorkAsync();
tasks.Add(task);
await task;
}
P.S.
Also your DoWorkAsync looks kinda wrong to me, why Thread.Sleep and not await Task.Delay (it is more correct way to simulate asynchronous execution, also you won't need await Task.CompletedTask this way). And if you expect DoWorkAsync to be CPU bound just make it like:
private Task<string> DoWorkAsync()
{
return Task.Run(() =>
{
// your cpu bound work
return "string";
});
}
After that you can do something like this (for both async/cpu bound work):
private async Task<string[]> DoWork(int iterations, bool runInParallel)
{
if(runInParallel)
{
var tasks = Enumerable.Range(0, iterations)
.Select(i => DoWorkAsync());
return await Task.WhenAll(tasks);
}
else
{
var result = new string[iterations];
for (var i = 0; i < iterations; i++)
{
result[i] = await DoWorkAsync();
}
return result;
}
}
Why is DoWorkAsync an async method?
It isn't currently doing anything asynchronous.
It seems that you are trying to utilise multiple threads to improve the performance of expensive CPU-bound work, so you would be better to make use of Parallel.For, which is designed for this purpose:
private string DoWork()
{
System.Threading.Thread.Sleep(5000);
var random = new Random();
var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
var length = random.Next(10, 101);
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}
private string[] DoWork(int iterations, bool runInParallel)
{
var results = new string[iterations];
if (runInParallel)
{
Parallel.For(0, iterations - 1, i => results[i] = DoWork());
}
else
{
for (int i = 0; i < iterations; i++) results[i] = DoWork();
}
return results;
}
Then:
var random = new Random();
var serial = DoWork(random.Next(10, 101));
var parallel = DoWork(random.Next(10, 101), true);
I think you'd be better off doing the following:
Create a function that creates a (cold) list of tasks (or an array Task<string>[] for instance). No need to run them. Let's call this GetTasks()
var jobs = GetTasks();
Then, if you want to run them "sequentially", just do
var results = new List<string>();
foreach (var job in jobs)
{
var result = await job;
results.Add(result);
}
return results;
If you want to run them in parallel :
foreach (var job in jobs)
{
job.Start();
}
await results = Task.WhenAll(jobs);
Another note,
All this in itself should be a Task<string[]>, the Task<Task<... smells like a problem.

How to wait some seconds in C# Task.Factory.StartNew

I have a function to create multiple tasks using Task.Factory.StartNew.
public void MultipleThread(List<string> list)
{
List<Task> listTask = new List<Task>();
foreach (string item in list)
{
listTask.Add(Task.Factory.StartNew(() => SetAddress(item)));
}
Task.WaitAll(listTask.ToArray());
}
How to wait some seconds before continue to execute next task?
you can use the other WhaitAll option that takes two arguments
array of tasks and milliseconds Timeout.
public static bool WaitAll (System.Threading.Tasks.Task[] tasks, int millisecondsTimeout);
public void MultipleThread(List<string> list)
{
List<Task> listTask = new List<Task>();
foreach (string item in list)
{
listTask.Add(Task.Factory.StartNew(() => SetAddress(item)));
}
Task.WaitAll(listTask.ToArray(), 50000);
}
I hope this solves your problem
* UPDATE *
To make a gap in between each task run,
you can use the static TimeSpan.FromMilliseconds
Task t = Task.Run( () => {} );
TimeSpan ts = TimeSpan.FromMilliseconds(150);
if (! t.Wait(ts))
Console.WriteLine("The timeout interval elapsed.");

How to span MaxDegreeOfParallelism across multiple TPL Dataflow blocks?

I want to limit the total number of queries that I submit to my database server across all Dataflow blocks to 30. In the following scenario, the throttling of 30 concurrent tasks is per block so it always hits 60 concurrent tasks during execution. Obviously I could limit my parallelism to 15 per block to achieve a system wide total of 30 but this wouldn't be optimal.
How do I make this work? Do I limit (and block) my awaits using SemaphoreSlim, etc, or is there an intrinsic Dataflow approach that works better?
public class TPLTest
{
private long AsyncCount = 0;
private long MaxAsyncCount = 0;
private long TaskId = 0;
private object MetricsLock = new object();
public async Task Start()
{
ExecutionDataflowBlockOptions execOption
= new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 30 };
DataflowLinkOptions linkOption = new DataflowLinkOptions()
{ PropagateCompletion = true };
var doFirstIOWorkAsync = new TransformBlock<Data, Data>(
async data => await DoIOBoundWorkAsync(data), execOption);
var doCPUWork = new TransformBlock<Data, Data>(
data => DoCPUBoundWork(data));
var doSecondIOWorkAsync = new TransformBlock<Data, Data>(
async data => await DoIOBoundWorkAsync(data), execOption);
var doProcess = new TransformBlock<Data, string>(
i => $"Task finished, ID = : {i.TaskId}");
var doPrint = new ActionBlock<string>(
s => Debug.WriteLine(s));
doFirstIOWorkAsync.LinkTo(doCPUWork, linkOption);
doCPUWork.LinkTo(doSecondIOWorkAsync, linkOption);
doSecondIOWorkAsync.LinkTo(doProcess, linkOption);
doProcess.LinkTo(doPrint, linkOption);
int taskCount = 150;
for (int i = 0; i < taskCount; i++)
{
await doFirstIOWorkAsync.SendAsync(new Data() { Delay = 2500 });
}
doFirstIOWorkAsync.Complete();
await doPrint.Completion;
Debug.WriteLine("Max concurrent tasks: " + MaxAsyncCount.ToString());
}
private async Task<Data> DoIOBoundWorkAsync(Data data)
{
lock(MetricsLock)
{
AsyncCount++;
if (AsyncCount > MaxAsyncCount)
MaxAsyncCount = AsyncCount;
}
if (data.TaskId <= 0)
data.TaskId = Interlocked.Increment(ref TaskId);
await Task.Delay(data.Delay);
lock (MetricsLock)
AsyncCount--;
return data;
}
private Data DoCPUBoundWork(Data data)
{
data.Step = 1;
return data;
}
}
Data Class:
public class Data
{
public int Delay { get; set; }
public long TaskId { get; set; }
public int Step { get; set; }
}
Starting point:
TPLTest tpl = new TPLTest();
await tpl.Start();
Why don't you marshal everything to an action block that has the actual limitation?
var count = 0;
var ab1 = new TransformBlock<int, string>(l => $"1:{l}");
var ab2 = new TransformBlock<int, string>(l => $"2:{l}");
var doPrint = new ActionBlock<string>(
async s =>
{
var c = Interlocked.Increment(ref count);
Console.WriteLine($"{c}:{s}");
await Task.Delay(5);
Interlocked.Decrement(ref count);
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 15 });
ab1.LinkTo(doPrint);
ab2.LinkTo(doPrint);
for (var i = 100; i > 0; i--)
{
if (i % 3 == 0) await ab1.SendAsync(i);
if (i % 5 == 0) await ab2.SendAsync(i);
}
ab1.Complete();
ab2.Complete();
await ab1.Completion;
await ab2.Completion;
This is the solution I ended up going with (unless I can figure out how to use a single generic DataFlow block for marshalling every type of database access):
I defined a SemaphoreSlim at the class level:
private SemaphoreSlim ThrottleDatabaseQuerySemaphore = new SemaphoreSlim(30, 30);
I modified the I/O class to call a throttling class:
private async Task<Data> DoIOBoundWorkAsync(Data data)
{
if (data.TaskId <= 0)
data.TaskId = Interlocked.Increment(ref TaskId);
Task t = Task.Delay(data.Delay); ;
await ThrottleDatabaseQueryAsync(t);
return data;
}
The throttling class: (I also have a generic version of the throttling routine because I couldn't figure out how to write one routine to handle both Task and Task<TResult>)
private async Task ThrottleDatabaseQueryAsync(Task task)
{
await ThrottleDatabaseQuerySemaphore.WaitAsync();
try
{
lock (MetricsLock)
{
AsyncCount++;
if (AsyncCount > MaxAsyncCount)
MaxAsyncCount = AsyncCount;
}
await task;
}
finally
{
ThrottleDatabaseQuerySemaphore.Release();
lock (MetricsLock)
AsyncCount--;
}
}
}
The simplest solution to this problem is to configure all your blocks with a limited-concurrency TaskScheduler:
TaskScheduler scheduler = new ConcurrentExclusiveSchedulerPair(
TaskScheduler.Default, maxConcurrencyLevel: 30).ConcurrentScheduler;
ExecutionDataflowBlockOptions execOption = new()
{
TaskScheduler = scheduler,
MaxDegreeOfParallelism = scheduler.MaximumConcurrencyLevel,
};
TaskSchedulers can only limit the concurrency of work done on threads. They can't throttle asynchronous operations that are not running on threads. So in order to enforce the MaximumConcurrencyLevel policy, unfortunately you must pass synchronous delegates to all the Dataflow blocks. For example:
TransformBlock<Data, Data> doFirstIOWorkAsync = new(data =>
{
return DoIOBoundWorkAsync(data).GetAwaiter().GetResult();
}, execOption);
This change will increase the demand for ThreadPool threads, so you'd better increase the number of threads that the ThreadPool creates instantly on demand to a higher value than the default Environment.ProcessorCount:
ThreadPool.SetMinThreads(100, 100); // At the start of the program
I am proposing this solution not because it is optimal, but because it is easy to implement. My understanding is that wasting some RAM on ~30 threads that are going to be blocked most of the time, won't have any measurable negative effect on the type of application that you are working with.

TaskFactory, Starting a new Task when one ends

I have found many methods of using the TaskFactory but I could not find anything about starting more tasks and watching when one ends and starting another one.
I always want to have 10 tasks working.
I want something like this
int nTotalTasks=10;
int nCurrentTask=0;
Task<bool>[] tasks=new Task<bool>[nThreadsNum];
for (int i=0; i<1000; i++)
{
string param1="test";
string param2="test";
if (nCurrentTask<10) // if there are less than 10 tasks then start another one
tasks[nCurrentThread++] = Task.Factory.StartNew<bool>(() =>
{
MyClass cls = new MyClass();
bool bRet = cls.Method1(param1, param2, i); // takes up to 2 minutes to finish
return bRet;
});
// How can I stop the for loop until a new task is finished and start a new one?
}
Check out the Task.WaitAny method:
Waits for any of the provided Task objects to complete execution.
Example from the documentation:
var t1 = Task.Factory.StartNew(() => DoOperation1());
var t2 = Task.Factory.StartNew(() => DoOperation2());
Task.WaitAny(t1, t2)
I would use a combination of Microsoft's Reactive Framework (NuGet "Rx-Main") and TPL for this. It becomes very simple.
Here's the code:
int nTotalTasks=10;
string param1="test";
string param2="test";
IDisposable subscription =
Observable
.Range(0, 1000)
.Select(i => Observable.FromAsync(() => Task.Factory.StartNew<bool>(() =>
{
MyClass cls = new MyClass();
bool bRet = cls.Method1(param1, param2, i); // takes up to 2 minutes to finish
return bRet;
})))
.Merge(nTotalTasks)
.ToArray()
.Subscribe((bool[] results) =>
{
/* Do something with the results. */
});
The key part here is the .Merge(nTotalTasks) which limits the number of concurrent tasks.
If you need to stop the processing part way thru just call subscription.Dispose() and everything gets cleaned up for you.
If you want to process each result as they are produced you can change the code from the .Merge(...) like this:
.Merge(nTotalTasks)
.Subscribe((bool result) =>
{
/* Do something with each result. */
});
This should be all you need, not complete, but all you need to do is wait on the first to complete and then run the second.
Task.WaitAny(task to wait on);
Task.Factory.StartNew()
Have you seen the BlockingCollection class? It allows you to have multiple threads running in parallel and you can wait from results from one task to execute another. See more information here.
The answer depends on whether the tasks to be scheduled are CPU or I/O bound.
For CPU-intensive work I would use Parallel.For() API setting the number of thread/tasks through MaxDegreeOfParallelism property of ParallelOptions
For I/O bound work the number of concurrently executing tasks can be significantly larger than the number of available CPUs, so the strategy is to rely on async methods as much as possible, which reduces the total number of threads waiting for completion.
How can I stop the for loop until a new task is finished and start a
new one?
The loop can be throttled by using await:
static void Main(string[] args)
{
var task = DoWorkAsync();
task.Wait();
// handle results
// task.Result;
Console.WriteLine("Done.");
}
async static Task<bool> DoWorkAsync()
{
const int NUMBER_OF_SLOTS = 10;
string param1="test";
string param2="test";
var results = new bool[NUMBER_OF_SLOTS];
AsyncWorkScheduler ws = new AsyncWorkScheduler(NUMBER_OF_SLOTS);
for (int i = 0; i < 1000; ++i)
{
await ws.ScheduleAsync((slotNumber) => DoWorkAsync(i, slotNumber, param1, param2, results));
}
ws.Complete();
await ws.Completion;
}
async static Task DoWorkAsync(int index, int slotNumber, string param1, string param2, bool[] results)
{
results[slotNumber] = results[slotNumber} && await Task.Factory.StartNew<bool>(() =>
{
MyClass cls = new MyClass();
bool bRet = cls.Method1(param1, param2, i); // takes up to 2 minutes to finish
return bRet;
}));
}
A helper class AsyncWorkScheduler uses TPL.DataFlow components as well as Task.WhenAll():
class AsyncWorkScheduler
{
public AsyncWorkScheduler(int numberOfSlots)
{
m_slots = new Task[numberOfSlots];
m_availableSlots = new BufferBlock<int>();
m_errors = new List<Exception>();
m_tcs = new TaskCompletionSource<bool>();
m_completionPending = 0;
// Initial state: all slots are available
for(int i = 0; i < m_slots.Length; ++i)
{
m_slots[i] = Task.FromResult(false);
m_availableSlots.Post(i);
}
}
public async Task ScheduleAsync(Func<int, Task> action)
{
if (Volatile.Read(ref m_completionPending) != 0)
{
throw new InvalidOperationException("Unable to schedule new items.");
}
// Acquire a slot
int slotNumber = await m_availableSlots.ReceiveAsync().ConfigureAwait(false);
// Schedule a new task for a given slot
var task = action(slotNumber);
// Store a continuation on the task to handle completion events
m_slots[slotNumber] = task.ContinueWith(t => HandleCompletedTask(t, slotNumber), TaskContinuationOptions.ExecuteSynchronously);
}
public async void Complete()
{
if (Interlocked.CompareExchange(ref m_completionPending, 1, 0) != 0)
{
return;
}
// Signal the queue's completion
m_availableSlots.Complete();
await Task.WhenAll(m_slots).ConfigureAwait(false);
// Set completion
if (m_errors.Count != 0)
{
m_tcs.TrySetException(m_errors);
}
else
{
m_tcs.TrySetResult(true);
}
}
public Task Completion
{
get
{
return m_tcs.Task;
}
}
void SetFailed(Exception error)
{
lock(m_errors)
{
m_errors.Add(error);
}
}
void HandleCompletedTask(Task task, int slotNumber)
{
if (task.IsFaulted || task.IsCanceled)
{
SetFailed(task.Exception);
return;
}
if (Volatile.Read(ref m_completionPending) == 1)
{
return;
}
// Release a slot
m_availableSlots.Post(slotNumber);
}
int m_completionPending;
List<Exception> m_errors;
BufferBlock<int> m_availableSlots;
TaskCompletionSource<bool> m_tcs;
Task[] m_slots;
}

Task C#, list of Tast<T> and Task.Wait()

I have simple code and I would know answer:
List<Task<int>> list = new List<Task<int>>;
for(int i=0;i<10;i++)
{
list.Add(new Task<int>((j) => return (int)j%2, i);
}
list.Foreach(t => t.Start());
And now I want to wait for all Tasks, well it's good idea?:
Task.WaitAll(list.ToArray());
P.S:
describtion about WaitAll:
public static void WaitAll(params Task[] tasks)
Yes, that's exactly what Task.WaitAll is for.
But if you know how many tasks you're going to have before creating them, you should stick to Task[] instead of List<Task> from the very beginning:
var tasks = new Task[10];
for(int i=0;i<10;i++)
{
var task = new Task<int>(x => return x % 2, i);
tasks[i] = task;
task.Start();
}
Task.WaitAll(tasks);
This allows you to skip moving from List<T> to T[] array back and forth.

Categories