How do I run a method both parallel and sequentially in C#?


I have a C# console app. In this app, I have a method that I will call DoWorkAsync. For the context of this question, this method looks like this:
private async Task<string> DoWorkAsync()
{
System.Threading.Thread.Sleep(5000);
var random = new Random();
var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
var length = random.Next(10, 101);
await Task.CompletedTask;
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}
I call DoWorkAsync from another method that determines a) how many times it will run and b) whether the calls run in parallel or sequentially. That method looks like this:
private async Task<Task<string>[]> DoWork(int iterations, bool runInParallel)
{
var tasks = new List<Task<string>>();
for (var i=0; i<iterations; i++)
{
if (runInParallel)
{
var task = Task.Run(() => DoWorkAsync());
tasks.Add(task);
}
else
{
await DoWorkAsync();
}
}
return tasks.ToArray();
}
After all of the tasks are completed, I want to display the results. To do this, I have code that looks like this:
var random = new Random();
var runInParallel = true; // set to false to run the calls one after another
var tasks = await DoWork(random.Next(10, 101), runInParallel);
Task.WaitAll(tasks);
foreach (var task in tasks)
{
Console.WriteLine(task.Result);
}
This code works as expected if the code runs in parallel (i.e. runInParallel is true). However, when runInParallel is false (i.e. I want to run the Tasks sequentially) the Task array doesn't get populated. So, the caller doesn't have any results to work with. I don't know how to fix it though. I'm not sure how to add the method call as a Task that will run sequentially. I understand that the idea behind Tasks is to run in parallel. However, I have this need to toggle between parallel and sequential.
Thank you!

the Task array doesn't get populated.
So populate it:
else
{
var task = DoWorkAsync();
tasks.Add(task);
await task;
}
P.S.
Also, your DoWorkAsync looks wrong to me: why Thread.Sleep and not await Task.Delay? Task.Delay is the more correct way to simulate asynchronous execution, and with it you won't need await Task.CompletedTask. And if you expect DoWorkAsync to be CPU-bound, just make it like this:
private Task<string> DoWorkAsync()
{
return Task.Run(() =>
{
// your cpu bound work
return "string";
});
}
After that you can do something like this (for both async/cpu bound work):
private async Task<string[]> DoWork(int iterations, bool runInParallel)
{
if(runInParallel)
{
var tasks = Enumerable.Range(0, iterations)
.Select(i => DoWorkAsync());
return await Task.WhenAll(tasks);
}
else
{
var result = new string[iterations];
for (var i = 0; i < iterations; i++)
{
result[i] = await DoWorkAsync();
}
return result;
}
}
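For completeness, here is a self-contained sketch of this answer's approach, with the simulated delay done through await Task.Delay instead of Thread.Sleep, as suggested above. The class name WorkRunner and the shortened 100 ms delay are assumptions for the demo, and Random.Shared requires .NET 6 or later.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WorkRunner
{
    const string Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Simulated asynchronous work: Task.Delay frees the thread while "waiting",
    // whereas Thread.Sleep would block it.
    public static async Task<string> DoWorkAsync()
    {
        await Task.Delay(100);
        // Random.Shared (.NET 6+) is thread-safe, so parallel calls can share it.
        var length = Random.Shared.Next(10, 101);
        return new string(Enumerable.Range(0, length)
            .Select(_ => Chars[Random.Shared.Next(Chars.Length)])
            .ToArray());
    }

    // Returns the results in call order in both modes.
    public static async Task<string[]> DoWork(int iterations, bool runInParallel)
    {
        if (runInParallel)
        {
            // Start every call first, then wait for all of them together.
            var tasks = Enumerable.Range(0, iterations).Select(_ => DoWorkAsync());
            return await Task.WhenAll(tasks);
        }

        // Sequential: each call starts only after the previous one has finished.
        var results = new string[iterations];
        for (var i = 0; i < iterations; i++)
        {
            results[i] = await DoWorkAsync();
        }
        return results;
    }
}
```

With 5 iterations the sequential run should take roughly 5 × 100 ms, while the parallel run should take roughly 100 ms, because all the delays overlap.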

Why is DoWorkAsync an async method?
It isn't currently doing anything asynchronous.
It seems that you are trying to use multiple threads to improve the performance of expensive CPU-bound work, so you would be better off using Parallel.For, which is designed for this purpose:
private string DoWork()
{
System.Threading.Thread.Sleep(5000);
var random = new Random();
var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
var length = random.Next(10, 101);
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}
private string[] DoWork(int iterations, bool runInParallel = false)
{
var results = new string[iterations];
if (runInParallel)
{
Parallel.For(0, iterations, i => results[i] = DoWork()); // toExclusive is iterations, so no index is skipped
}
else
{
for (int i = 0; i < iterations; i++) results[i] = DoWork();
}
return results;
}
Then:
var random = new Random();
var serial = DoWork(random.Next(10, 101));
var parallel = DoWork(random.Next(10, 101), true);

I think you'd be better off doing the following:
Create a function that creates a (cold) list of tasks (or an array Task<string>[] for instance). No need to run them. Let's call this GetTasks()
var jobs = GetTasks();
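Then, if you want to run them "sequentially", start and await them one at a time (a cold task that is never started never completes, so awaiting it without Start() would hang):
var results = new List<string>();
foreach (var job in jobs)
{
job.Start();
var result = await job;
results.Add(result);
}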
return results;
If you want to run them in parallel, start them all and then await them together:
foreach (var job in jobs)
{
job.Start();
}
var results = await Task.WhenAll(jobs);
return results;
Another note: all of this should itself be a Task<string[]>; the Task<Task<string>[]> return type smells like a problem.
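A minimal sketch of this cold-task idea, assuming GetTasks is a hypothetical helper that builds unstarted Task<string> objects with the Task<TResult> constructor (Thread.Sleep stands in for real work). Note that a Task can only be started once, so GetTasks has to be called again for every run.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ColdTasks
{
    // Hypothetical helper: creates "cold" tasks that have not been started yet.
    public static Task<string>[] GetTasks(int count) =>
        Enumerable.Range(0, count)
            .Select(i => new Task<string>(() =>
            {
                Thread.Sleep(50);          // stand-in for CPU-bound work
                return $"result {i}";
            }))
            .ToArray();

    public static async Task<string[]> RunSequentially(Task<string>[] jobs)
    {
        var results = new List<string>();
        foreach (var job in jobs)
        {
            job.Start();                   // start one job...
            results.Add(await job);        // ...and finish it before starting the next
        }
        return results.ToArray();
    }

    public static Task<string[]> RunInParallel(Task<string>[] jobs)
    {
        foreach (var job in jobs)
        {
            job.Start();                   // start every job up front
        }
        return Task.WhenAll(jobs);         // results come back in the original order
    }
}
```

Both runners return the same ordered array; only the timing differs.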

Related

C# not waiting for all Tasks to be performed

I'm trying to execute multiple requests at the same time to a Pi Number API. The main problem is that despite the 'Task.WhenAll(ExecuteRequests()).Wait();' line, not all tasks seem to complete. It should execute 50 requests and add their results to the pi dictionary, but after the code runs the dictionary has only about 44-46 items.
I tried adding a check of the available ThreadPool threads so I could guarantee I have enough threads, but nothing changed.
The other problem is that sometimes when I run the code, I get an error saying I'm trying to add a key that has already been added to the dictionary. Wasn't the lock statement supposed to prevent this error?
const int TotalRequests = 50;
static int requestsCount = 0;
static Dictionary<int, string> pi = new();
static readonly object lockState = new();
static void Main(string[] args)
{
var timer = new Stopwatch();
timer.Start();
Task.WhenAll(ExecuteRequests()).Wait();
timer.Stop();
foreach (var item in pi.OrderBy(x => x.Key))
Console.Write(item.Value);
Console.WriteLine($"\n\n{timer.ElapsedMilliseconds}ms");
Console.WriteLine($"\n{pi.Count} items");
}
static List<Task> ExecuteRequests()
{
var tasks = new List<Task>();
for (int i = 0; i < TotalRequests; i++)
{
ThreadPool.GetAvailableThreads(out int workerThreads, out int completionPortThreads);
while (workerThreads < 1)
{
ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);
Thread.Sleep(100);
}
tasks.Add(Task.Run(async () =>
{
var currentRequestId = 0;
lock (lockState)
currentRequestId = requestsCount++;
var httpClient = new HttpClient();
var result = await httpClient.GetAsync($"https://api.pi.delivery/v1/pi?start={currentRequestId * 1000}&numberOfDigits=1000");
if (result.StatusCode == System.Net.HttpStatusCode.OK)
{
var json = await result.Content.ReadAsStringAsync();
var content = JsonSerializer.Deserialize<JsonObject>(json)!["content"]!.ToString();
//var content = (await JsonSerializer.DeserializeAsync<JsonObject>(new MemoryStream(System.Text.Encoding.UTF8.GetBytes(json)!)!)!)!["content"]!.ToString();
pi.Add(currentRequestId, content);
}
}));
}
return tasks;
}

There's only one problem: you locked only one of the parts of the code that have threading problems:
lock (lockState)
currentRequestId = requestsCount++;
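But there's another one:
pi.Add(currentRequestId, content);
The problem is how Dictionary<TKey, TValue> works: it supports multiple concurrent readers, but not concurrent writers. Unsynchronized Add calls from several threads can corrupt its internal state, which can explain both the "key already added" exception and the missing items. (If you wrap the Wait() call in a try/catch, you will see the exception wrapped in an AggregateException.) So you need to lock the write as well: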
lock (lockState)
pi.Add(currentRequestId, content);
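An alternative to locking, swapped in here rather than taken from the answer: ConcurrentDictionary<TKey, TValue> is designed for concurrent writers, and Interlocked.Increment can replace the lock around the counter. In this sketch the HTTP call and JSON parsing are replaced by a hypothetical FetchAsync stub (a short Task.Delay), so it runs offline.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class PiCollector
{
    // Hypothetical stand-in for the HTTP request and JSON parsing in the question.
    static async Task<string> FetchAsync(int requestId)
    {
        await Task.Delay(10);
        return $"chunk {requestId}";
    }

    public static async Task<ConcurrentDictionary<int, string>> ExecuteRequests(int totalRequests)
    {
        var pi = new ConcurrentDictionary<int, string>();
        int requestsCount = -1;
        var tasks = new Task[totalRequests];
        for (int i = 0; i < totalRequests; i++)
        {
            tasks[i] = Task.Run(async () =>
            {
                // Atomic increment: each task gets a distinct id without a lock.
                var currentRequestId = Interlocked.Increment(ref requestsCount);
                var content = await FetchAsync(currentRequestId);
                // Safe to call from many threads at once; returns false on a duplicate key instead of throwing.
                pi.TryAdd(currentRequestId, content);
            });
        }
        await Task.WhenAll(tasks);
        return pi;
    }
}
```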

I put a lock statement around the dictionary manipulation, as @AlexeiLevenkov mentioned, and it worked fine.
tasks.Add(Task.Run(async () =>
{
var currentRequestId = 0;
lock (lockState)
currentRequestId = requestsCount++;
var httpClient = new HttpClient();
var result = await httpClient.GetAsync($"https://api.pi.delivery/v1/pi?start={currentRequestId * 1000}&numberOfDigits=1000");
if (result.StatusCode == System.Net.HttpStatusCode.OK)
{
var json = await result.Content.ReadAsStringAsync();
var content = JsonSerializer.Deserialize<JsonObject>(json)!["content"]!.ToString();
//var content = (await JsonSerializer.DeserializeAsync<JsonObject>(new MemoryStream(System.Text.Encoding.UTF8.GetBytes(json)!)!)!)!["content"]!.ToString();
lock (lockState)
pi.Add(currentRequestId, content);
}
}));

I'm not directly answering the question, just suggesting that you can use Microsoft's Reactive Framework (aka Rx) - NuGet System.Reactive and add using System.Reactive.Linq; - then you can do this:
static void Main(string[] args)
{
var timer = new Stopwatch();
timer.Start();
(int currentRequestId, string content)[] results = ExecuteRequests(50).ToArray().Wait();
timer.Stop();
foreach (var item in results.OrderBy(x => x.currentRequestId))
Console.Write(item.content);
Console.WriteLine($"\n\n{timer.ElapsedMilliseconds}ms");
Console.WriteLine($"\n{results.Count()} items");
}
static IObservable<(int currentRequestId, string content)> ExecuteRequests(int totalRequests) =>
Observable
.Defer(() =>
from currentRequestId in Observable.Range(0, totalRequests)
from content in Observable.Using(() => new HttpClient(), hc =>
from result in Observable.FromAsync(() => hc.GetAsync($"https://api.pi.delivery/v1/pi?start={currentRequestId * 1000}&numberOfDigits=1000"))
where result.StatusCode == System.Net.HttpStatusCode.OK
from json in Observable.FromAsync(() => result.Content.ReadAsStringAsync())
select JsonSerializer.Deserialize<JsonObject>(json)!["content"]!.ToString())
// a tuple, to match the declared IObservable<(int currentRequestId, string content)> type
select (currentRequestId, content));

TaskFactory, Starting a new Task when one ends

I have found many ways of using the TaskFactory, but I could not find anything about starting several tasks, watching for one of them to end, and then starting another one.
I always want to have 10 tasks working.
I want something like this
int nTotalTasks=10;
int nCurrentTask=0;
Task<bool>[] tasks=new Task<bool>[nTotalTasks];
for (int i=0; i<1000; i++)
{
string param1="test";
string param2="test";
if (nCurrentTask<10) // if there are less than 10 tasks then start another one
tasks[nCurrentTask++] = Task.Factory.StartNew<bool>(() =>
{
MyClass cls = new MyClass();
bool bRet = cls.Method1(param1, param2, i); // takes up to 2 minutes to finish
return bRet;
});
// How can I stop the for loop until a new task is finished and start a new one?
}

Check out the Task.WaitAny method:
Waits for any of the provided Task objects to complete execution.
Example from the documentation:
var t1 = Task.Factory.StartNew(() => DoOperation1());
var t2 = Task.Factory.StartNew(() => DoOperation2());
Task.WaitAny(t1, t2);
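A sketch of how Task.WaitAny can keep a fixed number of tasks running, assuming the work is a hypothetical Func<int, bool> (standing in for cls.Method1), that maxRunning is at least 1, and that blocking the calling thread is acceptable: fill the slots first, then each time WaitAny reports a finished task, start the next job in that slot.

```csharp
using System;
using System.Threading.Tasks;

public static class Throttler
{
    // Runs work(0..jobCount-1) with at most maxRunning tasks at a time; blocks until done.
    public static bool[] RunThrottled(int jobCount, int maxRunning, Func<int, bool> work)
    {
        var results = new bool[jobCount];
        var slots = new Task[Math.Min(maxRunning, jobCount)];
        int next = 0;

        // index is passed by value, so each task works on its own job number.
        Task Start(int index) => Task.Run(() => results[index] = work(index));

        // Fill every slot with a running task.
        for (int s = 0; s < slots.Length; s++)
        {
            slots[s] = Start(next++);
        }

        // Whenever any task finishes, reuse its slot for the next job.
        while (next < jobCount)
        {
            int finished = Task.WaitAny(slots);
            slots[finished].Wait();        // rethrows if the finished job faulted
            slots[finished] = Start(next++);
        }

        Task.WaitAll(slots);
        return results;
    }
}
```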

I would use a combination of Microsoft's Reactive Framework (NuGet "Rx-Main", now published as "System.Reactive") and TPL for this. It becomes very simple.
Here's the code:
int nTotalTasks=10;
string param1="test";
string param2="test";
IDisposable subscription =
Observable
.Range(0, 1000)
.Select(i => Observable.FromAsync(() => Task.Factory.StartNew<bool>(() =>
{
MyClass cls = new MyClass();
bool bRet = cls.Method1(param1, param2, i); // takes up to 2 minutes to finish
return bRet;
})))
.Merge(nTotalTasks)
.ToArray()
.Subscribe((bool[] results) =>
{
/* Do something with the results. */
});
The key part here is the .Merge(nTotalTasks) which limits the number of concurrent tasks.
If you need to stop the processing partway through, just call subscription.Dispose() and everything gets cleaned up for you.
If you want to process each result as it is produced, you can change the code from the .Merge(...) onward like this:
.Merge(nTotalTasks)
.Subscribe((bool result) =>
{
/* Do something with each result. */
});

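This isn't complete, but it's all you need: wait for any of the running tasks to complete, then start the next one in its place.
int finished = Task.WaitAny(tasks); // index of the task that completed
tasks[finished] = Task.Factory.StartNew(/* next job */); // reuse the freed slot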

Have you looked at the BlockingCollection<T> class? It allows you to have multiple threads running in parallel, and you can wait for the results of one task before executing another. See the BlockingCollection<T> documentation for more information.

The answer depends on whether the tasks to be scheduled are CPU or I/O bound.
For CPU-intensive work, I would use the Parallel.For() API, setting the number of threads/tasks through the MaxDegreeOfParallelism property of ParallelOptions.
For I/O bound work the number of concurrently executing tasks can be significantly larger than the number of available CPUs, so the strategy is to rely on async methods as much as possible, which reduces the total number of threads waiting for completion.
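For the CPU-bound case, a minimal sketch of Parallel.For with ParallelOptions.MaxDegreeOfParallelism; the Func<int, bool> parameter is a hypothetical stand-in for the question's cls.Method1(param1, param2, i) call. Parallel.For blocks until every iteration is done and runs at most maxParallel iterations at once.

```csharp
using System;
using System.Threading.Tasks;

public static class CpuBound
{
    public static bool[] RunAll(int count, int maxParallel, Func<int, bool> work)
    {
        var results = new bool[count];
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        // toExclusive is count, so indices 0..count-1 are processed.
        // Each iteration writes only its own array element, so no locking is needed.
        Parallel.For(0, count, options, i => results[i] = work(i));
        return results;
    }
}
```

For the question's scenario this would be called as RunAll(1000, 10, i => new MyClass().Method1(param1, param2, i)).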
How can I stop the for loop until a new task is finished and start a new one?
The loop can be throttled by using await:
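// AsyncWorkScheduler below needs TPL Dataflow (System.Threading.Tasks.Dataflow)
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
static void Main(string[] args)
{
var task = DoWorkAsync();
task.Wait();
// handle results
// task.Result;
Console.WriteLine("Done.");
}
async static Task<bool> DoWorkAsync()
{
const int NUMBER_OF_SLOTS = 10;
string param1="test";
string param2="test";
// Start every slot at true, so the && below can only turn it false
var results = Enumerable.Repeat(true, NUMBER_OF_SLOTS).ToArray();
AsyncWorkScheduler ws = new AsyncWorkScheduler(NUMBER_OF_SLOTS);
for (int i = 0; i < 1000; ++i)
{
// ScheduleAsync completes only once a slot is free, which throttles the loop
await ws.ScheduleAsync((slotNumber) => DoWorkAsync(i, slotNumber, param1, param2, results));
}
ws.Complete();
await ws.Completion;
// true only if every call to Method1 returned true
return results.All(r => r);
}
async static Task DoWorkAsync(int index, int slotNumber, string param1, string param2, bool[] results)
{
bool bRet = await Task.Factory.StartNew<bool>(() =>
{
MyClass cls = new MyClass();
return cls.Method1(param1, param2, index); // takes up to 2 minutes to finish
});
// A slot is used by only one task at a time, so this write does not race
results[slotNumber] = results[slotNumber] && bRet;
}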
The helper class AsyncWorkScheduler uses TPL Dataflow components (a BufferBlock<int> of free slot numbers) as well as Task.WhenAll():
class AsyncWorkScheduler
{
public AsyncWorkScheduler(int numberOfSlots)
{
m_slots = new Task[numberOfSlots];
m_availableSlots = new BufferBlock<int>();
m_errors = new List<Exception>();
m_tcs = new TaskCompletionSource<bool>();
m_completionPending = 0;
// Initial state: all slots are available
for(int i = 0; i < m_slots.Length; ++i)
{
m_slots[i] = Task.FromResult(false);
m_availableSlots.Post(i);
}
}
public async Task ScheduleAsync(Func<int, Task> action)
{
if (Volatile.Read(ref m_completionPending) != 0)
{
throw new InvalidOperationException("Unable to schedule new items.");
}
// Acquire a slot
int slotNumber = await m_availableSlots.ReceiveAsync().ConfigureAwait(false);
// Schedule a new task for a given slot
var task = action(slotNumber);
// Store a continuation on the task to handle completion events
m_slots[slotNumber] = task.ContinueWith(t => HandleCompletedTask(t, slotNumber), TaskContinuationOptions.ExecuteSynchronously);
}
public async void Complete()
{
if (Interlocked.CompareExchange(ref m_completionPending, 1, 0) != 0)
{
return;
}
// Signal the queue's completion
m_availableSlots.Complete();
await Task.WhenAll(m_slots).ConfigureAwait(false);
// Set completion
if (m_errors.Count != 0)
{
m_tcs.TrySetException(m_errors);
}
else
{
m_tcs.TrySetResult(true);
}
}
public Task Completion
{
get
{
return m_tcs.Task;
}
}
void SetFailed(Exception error)
{
lock(m_errors)
{
m_errors.Add(error);
}
}
void HandleCompletedTask(Task task, int slotNumber)
{
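if (task.IsFaulted || task.IsCanceled)
{
// A canceled task has no Exception, so record a TaskCanceledException instead
SetFailed(task.Exception ?? (Exception)new TaskCanceledException(task));
}
if (Volatile.Read(ref m_completionPending) == 1)
{
return;
}
// Release the slot even after a failure; otherwise ScheduleAsync could wait forever once every slot has failed
m_availableSlots.Post(slotNumber);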
}
int m_completionPending;
List<Exception> m_errors;
BufferBlock<int> m_availableSlots;
TaskCompletionSource<bool> m_tcs;
Task[] m_slots;
}

List of Tasks C#

var tasks = new List<Task>();
for (int i = 0; i < pageCount; i++)
{
var task = Task.Run(() =>
{
worker.GetHouses(currentPage);
});
tasks.Add(task);
currentPage++;
}
Task.WaitAll(tasks.ToArray());
There is something I don't understand.
Whenever I use:
var tasks = new[]
{
Task.Run(() => {worker.GetHouses(1);}),
Task.Run(() => {worker.GetHouses(2);}),
Task.Run(() => {worker.GetHouses(3);})
};
And when I loop through that array, I get the results perfectly fine (when using Task.WaitAll(tasks)).
When I use:
var tasks = new List<Task>();
my Task.WaitAll(tasks.ToArray()) doesn't seem to work; my tasks' "Status" stays on "RanToCompletion".
What did I do wrong?

You have a problem with how the currentPage variable is captured by the lambda. Also, create tasks that return a result (Task<List<House>>).
Solution:
var tasks = new List<Task<List<House>>>();
for (int i = 0; i < pageCount; i++)
{
var currentPageCopy = currentPage;
var task = Task.Run(() =>
{
return worker.GetHouses(currentPageCopy);
});
tasks.Add(task);
currentPage++;
}
Task.WaitAll(tasks.ToArray());
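The problem with your original code is that the lambda captures the currentPage variable itself, not its value at the moment the task is created. By the time a task actually runs, the loop may already have incremented currentPage, so several (possibly all) GetHouses calls can see the same later value, up to one past the last page. Copying the value into a local variable declared inside the loop body gives each task its own copy.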
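The capture problem can be reproduced without any threads: the sketch below builds lambdas in a loop and only invokes them after the loop has finished, which is roughly what happens when tasks start late. The class and method names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CaptureDemo
{
    // Every lambda captures the same currentPage variable.
    public static int[] CapturedVariable(int startPage, int pageCount)
    {
        var calls = new List<Func<int>>();
        var currentPage = startPage;
        for (int i = 0; i < pageCount; i++)
        {
            calls.Add(() => currentPage);  // reads currentPage when invoked, not now
            currentPage++;
        }
        return calls.Select(call => call()).ToArray();
    }

    // Each lambda captures its own copy, declared inside the loop body.
    public static int[] CapturedCopy(int startPage, int pageCount)
    {
        var calls = new List<Func<int>>();
        var currentPage = startPage;
        for (int i = 0; i < pageCount; i++)
        {
            var currentPageCopy = currentPage;
            calls.Add(() => currentPageCopy);
            currentPage++;
        }
        return calls.Select(call => call()).ToArray();
    }
}
```

CapturedVariable(1, 3) returns { 4, 4, 4 } (one past the last page), while CapturedCopy(1, 3) returns { 1, 2, 3 }.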

There's been little issue with task types.
In your sample you were using System.Threading.Tasks.Task, which does not have the result - it's intended just to do some job, like void method.
In your code here:
var tasks = new[]
{
Task.Run(() => {worker.GetHouses(1);}),
Task.Run(() => {worker.GetHouses(2);}),
Task.Run(() => {worker.GetHouses(3);})
};
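no type was specified explicitly, so the compiler infers the element type from the lambdas. Be careful, though: only a lambda that returns the value, such as Task.Run(() => worker.GetHouses(1)), produces System.Threading.Tasks.Task<List<House>>; a statement lambda like () => {worker.GetHouses(1);} discards the result and produces a plain Task. In the first piece of code you specified System.Threading.Tasks.Task explicitly: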
var tasks = new List<Task>();
What you need to use is System.Threading.Tasks.Task<TResult>:
var tasks = new List<Task<List<House>>>();// <- task type specified explicitly
for (int i = 0; i < pageCount; i++)
{
var currentPageCopy = currentPage; // copy, so each task gets its own page number
var task = Task.Factory.StartNew<List<House>>(() =>// <- task type specified explicitly, though it's not mandatory here
{
return worker.GetHouses(currentPageCopy);
});
tasks.Add(task);
currentPage++;
}
In similar situations I tend to specify types explicitly, so that the code is clearer to read and, as you can see, even works.

Why TPL Dataflow block.LinkTo does not give any output?

I'm quite new to TPL Dataflow. I tested the following example from the book Concurrency in C#, and I can't figure out why there's no output; it should be 2*2-2=2.
static void Main(string[] args)
{
//Task tt = test();
Task tt = test1();
Console.ReadLine();
}
static async Task test1()
{
try
{
var multiplyBlock = new TransformBlock<int, int>(item =>
{
if (item == 1)
throw new InvalidOperationException("Blech.");
return item * 2;
});
var subtractBlock = new TransformBlock<int, int>(item => item - 2);
multiplyBlock.LinkTo(subtractBlock,
new DataflowLinkOptions { PropagateCompletion = true });
multiplyBlock.Post(2);
await subtractBlock.Completion;
int temp = subtractBlock.Receive();
Console.WriteLine(temp);
}
catch (AggregateException e)
{
// The exception is caught here.
foreach (var v in e.InnerExceptions)
{
Console.WriteLine(v.Message);
}
}
}
Update 1: I tried another example. Again I did not call Complete(), but I thought that once the first block had finished, its results would be passed to the second block automatically.
private static async Task test3()
{
TransformManyBlock<int, int> tmb = new TransformManyBlock<int, int>((i) => { return new int[] {i, i + 1}; });
ActionBlock<int> ab = new ActionBlock<int>((i) => Console.WriteLine(i));
tmb.LinkTo(ab);
for (int i = 0; i < 4; i++)
{
tmb.Post(i);
}
//tmb.Complete();
await ab.Completion;
Console.WriteLine("Finished post");
}

This part of the code:
await subtractBlock.Completion;
int temp = subtractBlock.Receive();
is first (asynchronously) waiting for the subtraction block to complete, and then attempting to retrieve an output from the block.
There are two problems: the source block is never completed, and the code is attempting to retrieve output from a completed block. Once a block has completed, it will not produce any more data.
(I assume you're referring to the example in recipe 4.2, which will post 1, causing the exception, which completes the block in a faulted state).
So, you can fix this test by completing the source block (and the completion will propagate along the link to the subtractBlock automatically), and by reading the output before (asynchronously) waiting for subtractBlock to complete:
multiplyBlock.Complete();
int temp = subtractBlock.Receive();
await subtractBlock.Completion;
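The same two points apply to the Update 1 example: the link needs PropagateCompletion = true, and the source block must be completed with Complete(), otherwise ab.Completion never finishes. A sketch of the corrected version follows; collecting the output into a queue instead of writing it to the console is our change, so the result can be checked. On older targets the System.Threading.Tasks.Dataflow NuGet package may be needed.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowDemo
{
    public static async Task<int[]> Test3()
    {
        var output = new ConcurrentQueue<int>();
        var tmb = new TransformManyBlock<int, int>(i => new[] { i, i + 1 });
        var ab = new ActionBlock<int>(i => output.Enqueue(i));

        // Without PropagateCompletion, completing tmb would never complete ab.
        tmb.LinkTo(ab, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < 4; i++)
        {
            tmb.Post(i);
        }

        tmb.Complete();          // no more input: completion can now flow downstream
        await ab.Completion;     // finishes after ab has processed every item
        return output.ToArray();
    }
}
```

Both blocks process items in order by default, so the result is { 0, 1, 1, 2, 2, 3, 3, 4 }.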

Limited number of concurrent threads C# [duplicate]

Let's say I have 100 tasks that do something that takes 10 seconds.
Now I want to run only 10 at a time: when one of those 10 finishes, another task gets executed, until all are finished.
Until now I have always used ThreadPool.QueueUserWorkItem() for such tasks, but I've read that this is bad practice and that I should use Tasks instead.
My problem is that I couldn't find a good example for my scenario anywhere, so could you get me started on how to achieve this with Tasks?

SemaphoreSlim maxThread = new SemaphoreSlim(10);
for (int i = 0; i < 115; i++)
{
maxThread.Wait();
Task.Factory.StartNew(() =>
{
//Your Works
}
, TaskCreationOptions.LongRunning)
.ContinueWith( (task) => maxThread.Release() );
}
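The same throttling can be done without blocking the loop thread by using SemaphoreSlim.WaitAsync and keeping the tasks so the caller can await them all. A sketch; RunThrottledAsync and its parameters are illustrative names, and releasing in finally (rather than in a continuation) frees the slot even when the work throws.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class SemaphoreThrottle
{
    public static async Task RunThrottledAsync(int count, int maxConcurrent, Func<int, Task> work)
    {
        using var gate = new SemaphoreSlim(maxConcurrent);
        var tasks = new List<Task>();
        for (int i = 0; i < count; i++)
        {
            await gate.WaitAsync();          // wait asynchronously for a free slot
            int index = i;                   // copy for the lambda (for-loop variables are shared)
            tasks.Add(Task.Run(async () =>
            {
                try { await work(index); }
                finally { gate.Release(); }  // free the slot even on failure
            }));
        }
        await Task.WhenAll(tasks);           // wait for the last running tasks to finish
    }
}
```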

TPL Dataflow is great for doing things like this. You can create a 100% async version of Parallel.Invoke pretty easily:
async Task ProcessTenAtOnce<T>(IEnumerable<T> items, Func<T, Task> func)
{
ExecutionDataflowBlockOptions edfbo = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10
};
ActionBlock<T> ab = new ActionBlock<T>(func, edfbo);
foreach (T item in items)
{
await ab.SendAsync(item);
}
ab.Complete();
await ab.Completion;
}

You have several options. You can use Parallel.Invoke for starters:
public void DoWork(IEnumerable<Action> actions)
{
Parallel.Invoke(new ParallelOptions() { MaxDegreeOfParallelism = 10 }
, actions.ToArray());
}
Here is an alternate option that will work much harder to have exactly 10 tasks running (although the number of threads in the thread pool processing those tasks may be different) and that returns a Task indicating when it finishes, rather than blocking until done.
public Task DoWork(IList<Action> actions)
{
List<Task> tasks = new List<Task>();
int numWorkers = 10;
int batchSize = (int)Math.Ceiling(actions.Count / (double)numWorkers);
foreach (var batch in actions.Batch(batchSize))
{
tasks.Add(Task.Factory.StartNew(() =>
{
foreach (var action in batch)
{
action();
}
}));
}
return Task.WhenAll(tasks);
}
If you don't have MoreLINQ (which provides Batch), here's my simpler implementation:
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int batchSize)
{
List<T> buffer = new List<T>(batchSize);
foreach (T item in source)
{
buffer.Add(item);
if (buffer.Count >= batchSize)
{
yield return buffer;
buffer = new List<T>();
}
}
if (buffer.Count > 0)
{
yield return buffer;
}
}
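On .NET 6 or later, the built-in Enumerable.Chunk can replace the custom Batch helper. A sketch of the same DoWork using it, with the batch size rounded up so that at most numWorkers batches are created; making numWorkers a parameter is our choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BatchedWork
{
    public static Task DoWork(IList<Action> actions, int numWorkers = 10)
    {
        if (actions.Count == 0) return Task.CompletedTask;  // Chunk(0) would throw
        // Round up so that all actions fit into at most numWorkers batches.
        int batchSize = (int)Math.Ceiling(actions.Count / (double)numWorkers);
        var tasks = actions
            .Chunk(batchSize)                                // Action[] batches (.NET 6+)
            .Select(batch => Task.Run(() =>
            {
                foreach (var action in batch) action();      // each batch runs sequentially
            }))
            .ToList();
        return Task.WhenAll(tasks);
    }
}
```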

You can create a method like this:
public static async Task RunLimitedNumberAtATime<T>(int numberOfTasksConcurrent,
IEnumerable<T> inputList, Func<T, Task> asyncFunc)
{
Queue<T> inputQueue = new Queue<T>(inputList);
List<Task> runningTasks = new List<Task>(numberOfTasksConcurrent);
for (int i = 0; i < numberOfTasksConcurrent && inputQueue.Count > 0; i++)
runningTasks.Add(asyncFunc(inputQueue.Dequeue()));
while (inputQueue.Count > 0)
{
Task task = await Task.WhenAny(runningTasks);
runningTasks.Remove(task);
runningTasks.Add(asyncFunc(inputQueue.Dequeue()));
}
await Task.WhenAll(runningTasks);
}
And then you can call any async method n times with a limit like this:
Task task = RunLimitedNumberAtATime(10,
Enumerable.Range(1, 100),
async x =>
{
Console.WriteLine($"Starting task {x}");
await Task.Delay(100);
Console.WriteLine($"Finishing task {x}");
});
Or, if you want to run long-running non-async methods, you can do it this way:
Task task = RunLimitedNumberAtATime(10,
Enumerable.Range(1, 100),
x => Task.Factory.StartNew(() => {
Console.WriteLine($"Starting task {x}");
System.Threading.Thread.Sleep(100);
Console.WriteLine($"Finishing task {x}");
}, TaskCreationOptions.LongRunning));
Maybe there is a similar method somewhere in the framework, but I didn't find it yet.
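Since .NET 6 the framework does include something similar: Parallel.ForEachAsync with ParallelOptions.MaxDegreeOfParallelism. A sketch that keeps the same signature as RunLimitedNumberAtATime on top of it (requires .NET 6 or later); the wrapper ignores the CancellationToken that ForEachAsync passes to the body.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ForEachAsyncDemo
{
    // Same signature as RunLimitedNumberAtATime, built on Parallel.ForEachAsync (.NET 6+).
    public static Task RunLimitedNumberAtATime<T>(int numberOfTasksConcurrent,
        IEnumerable<T> inputList, Func<T, Task> asyncFunc)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = numberOfTasksConcurrent };
        // ForEachAsync expects a body returning ValueTask and taking a CancellationToken;
        // this adapts the Task-returning delegate and ignores the token.
        return Parallel.ForEachAsync(inputList, options,
            (item, cancellationToken) => new ValueTask(asyncFunc(item)));
    }
}
```

The two example calls shown in this answer should work unchanged against this version.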

I would use the simplest solution I can think of, which I think is the TPL's Parallel.ForEach:
string[] urls={};
Parallel.ForEach(urls, new ParallelOptions() { MaxDegreeOfParallelism = 2}, url =>
{
//Download the content or do whatever you want with each URL
});
