Multiple parallel Tasks in C# do not improve calculation time - c#

I have a complicated math problem to solve and I decided to do some independent calculations in parallel to improve calculation time. In many CAE programs, like ANSYS or SolidWorks, it is possible to set multiple cores for that purpose.
I created a simple Windows Forms example to illustrate my problem. Here the function CalculateStuff() raises A from the Sample class to the power 1.2, max times. For 2 tasks it's max / 2 times per task and for 4 tasks it's max / 4 times per task.
I calculated the resulting time of operation both for only one CalculateStuff() function or four duplicates (CalculateStuff1(), ...2(), ...3(), ...4() - one for each task) with the same code. I'm not sure, if it matters to use the same function for each task (anyway, Math.Pow is the same). I also tried to enable or disable the ProgressBar.
The table represents time of operation (sec) for all 12 cases. I expected it to be like 2 and 4 times faster for 2 and 4 tasks, but in some cases 4 tasks are even worse than 1. My computer has 2 processors, 10 cores each. According to Debug window, CPU usage increases with more tasks. What's wrong with my code here or do I misunderstand something? Why don't multiple tasks improve the time of operation?
private readonly ulong max = 400000000ul;
// Sample class
private class Sample
{
public double A { get; set; } = 1.0;
}
// Clear WinForm elements
private void Clear()
{
PBar1.Value = PBar2.Value = PBar3.Value = PBar4.Value = 0;
TextBox.Text = "";
}
// Button that launches 1 task
private async void BThr1_Click(object sender, EventArgs e)
{
Clear();
DateTime start = DateTime.Now;
Sample sample = new Sample();
await Task.Delay(100);
Task t = Task.Run(() => CalculateStuff(sample, PBar1, max));
await t;
TextBox.Text = (DateTime.Now - start).ToString(@"hh\:mm\:ss");
t.Dispose();
}
// Button that launches 2 tasks
private async void BThr2_Click(object sender, EventArgs e)
{
Clear();
DateTime start = DateTime.Now;
Sample sample1 = new Sample();
Sample sample2 = new Sample();
await Task.Delay(100);
Task t1 = Task.Run(() => CalculateStuff(sample1, PBar1, max / 2));
Task t2 = Task.Run(() => CalculateStuff(sample2, PBar2, max / 2));
await t1; await t2;
TextBox.Text = (DateTime.Now - start).ToString(@"hh\:mm\:ss");
t1.Dispose(); t2.Dispose();
}
// Button that launches 4 tasks
private async void BThr4_Click(object sender, EventArgs e)
{
Clear();
DateTime start = DateTime.Now;
Sample sample1 = new Sample();
Sample sample2 = new Sample();
Sample sample3 = new Sample();
Sample sample4 = new Sample();
await Task.Delay(100);
Task t1 = Task.Run(() => CalculateStuff(sample1, PBar1, max / 4));
Task t2 = Task.Run(() => CalculateStuff(sample2, PBar2, max / 4));
Task t3 = Task.Run(() => CalculateStuff(sample3, PBar3, max / 4));
Task t4 = Task.Run(() => CalculateStuff(sample4, PBar4, max / 4));
await t1; await t2; await t3; await t4;
TextBox.Text = (DateTime.Now - start).ToString(@"hh\:mm\:ss");
t1.Dispose(); t2.Dispose(); t3.Dispose(); t4.Dispose();
}
// Calculate some math stuff
private static void CalculateStuff(Sample s, ProgressBar pb, ulong max)
{
ulong c = max / (ulong)pb.Maximum;
for (ulong i = 1; i <= max; i++)
{
s.A = Math.Pow(s.A, 1.2);
if (i % c == 0)
pb.Invoke(new Action(() => pb.Value = (int)(i / c)));
}
}

Tasks are not threads. "Asynchronous" does not mean "simultaneous".
What's wrong with my code here or do I misunderstand something?
You're misunderstanding what tasks are.
You should think of tasks as something that you can do in any order you desire. Take the example of a cooking recipe:
Cut the potatoes
Cut the vegetables
Cut the meat
If these were not tasks and it was synchronous code, you would always do these steps in the exact order they were listed.
If they were tasks, that doesn't mean these jobs will be done simultaneously. You are only one person (= one thread), and you can only do one thing at a time.
You can do the tasks in any order you like, you can possibly even halt one task to begin on another, but you still can't do more than one thing at the same time. Regardless of the order in which you complete the tasks, the total time taken to complete all three tasks remains the same, and this is not (inherently) any faster.
If they were threads, that's like hiring 3 chefs, which means these jobs can be done simultaneously.
Asynchronicity does cut down on idling time, when it is awaitable.
Do note that asynchronous code can lead to time gains in cases where your synchronous code would otherwise be idling, e.g. waiting for a network response. This is not taken into account in the above example, which is exactly why I listed "cut [x]" jobs rather than "wait for [x] to boil".
Your job (the calculation) is not asynchronous code. It never idles (in a way that it's awaitable) and therefore it runs synchronously. This means you're not getting any benefit from running this asynchronously.
Reducing your code to a simpler example:
private static void CalculateStuff(Sample s, ProgressBar pb, ulong max)
{
Thread.Sleep(5000);
}
Very simply put, this job takes 5 seconds and cannot be awaited. If a single thread has to run 3 of these jobs, they are handled one after the other, taking 15 seconds total.
If the job inside your tasks were actually awaitable, you would see a time benefit. E.g.:
private static async Task CalculateStuff(Sample s, ProgressBar pb, ulong max)
{
await Task.Delay(5000);
}
This job takes 5 seconds but is awaitable. If you run 3 of these tasks at the same time, your thread will not waste time idling (i.e. waiting for the delay) and will instead start on the following task. Since it can await (i.e. do nothing for) these tasks at the same time, this means that the total processing time takes 5 seconds total (plus some negligible overhead cost).
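The awaitable case can be checked with a small timing sketch. The class and method names below (DelayDemo, RunConcurrentDelaysAsync, RunSequentialDelaysAsync) are hypothetical and not from the question; the sketch only assumes that Task.Delay stands in for a job that genuinely waits rather than computes.

```csharp
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class DelayDemo
{
    // Starts all delays first, then awaits them together.
    // Returns the elapsed wall-clock time in milliseconds.
    public static async Task<long> RunConcurrentDelaysAsync(int count, int delayMs)
    {
        var sw = Stopwatch.StartNew();
        var delays = Enumerable.Range(0, count)
            .Select(_ => Task.Delay(delayMs))
            .ToArray();
        await Task.WhenAll(delays);
        return sw.ElapsedMilliseconds;
    }

    // Awaits each delay before starting the next one.
    public static async Task<long> RunSequentialDelaysAsync(int count, int delayMs)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            await Task.Delay(delayMs);
        }
        return sw.ElapsedMilliseconds;
    }
}
```

With three delays, the concurrent variant should take roughly one delay period and the sequential variant roughly three. A CPU-bound loop gives no such saving from awaiting alone.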
According to Debug window, CPU usage increases with more tasks.
The managing of tasks takes a small overhead cost, which means that the total amounts of work (which can be measured in CPU usage over time) is slightly higher compared to synchronous code. That is to be expected.
This small cost usually pales in comparison to the benefits gained from well written asynchronous code. However, your code is simply not leveraging the actual benefits from asynchronicity, so you're only seeing the overhead cost and not its benefits, which is why your monitoring is giving you the opposite result of what you were expecting.
My computer has 2 processors, 10 cores each.
CPU cores, threads and tasks are three very different beasts.
Tasks are handled by threads, but they don't necessarily have a one-to-one mapping. Take the example of a team of 4 developers which has 10 bugs to resolve. While this means it's impossible for all 10 bugs to be resolved at the same time, these developers (threads) can take on the tickets (tasks) one after the other, taking on a new ticket (task) whenever they finished their previous ticket (task).
CPU cores are like workstations. It makes little sense to have fewer workstations (CPU cores) than you have developers (threads), since you'll end up with idling developers.
Additionally, you might not want your developers to be able to claim all workstations. Maybe HR and accounting (= other OS processes) also need to have some guaranteed workstations so they can do their job.
The company (= computer) doesn't just grind to a halt because the developers are fixing some bugs. This is what used to happen on single core machines - if one process claims the CPU, nothing else can happen. If that one process takes long or hangs, everything freezes.
This is why we have a thread pool. There is no straightforward real-world-analogy here (unless maybe a consultancy firm that dynamically adjusts how many developers it sends to your company), but the thread pool is basically able to decide how many developers are allowed to work at the company at the same time in order to ensure that development tasks can be seen to as fast as possible while also ensuring other departments can still do their work on the workstations as well.
It's a careful balancing act, not sending too many developers as that floods the systems, while also not sending too few developers as that means the work gets done too slowly.
The exact configuration of your thread pool is not something I can troubleshoot over a simple Q&A. But the behavior you describe is consistent with having fewer CPUs (dedicated to your runtime) and/or threads than you have tasks.
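To see what the runtime actually has to work with, the core count and the thread pool limits can be read directly. This is a minimal sketch; PoolInfo and Describe are hypothetical names, and the numbers it returns depend on the machine and runtime settings.

```csharp
using System;
using System.Threading;

public static class PoolInfo
{
    // Returns the logical core count and the thread pool's
    // minimum and maximum worker thread counts.
    public static (int cores, int minWorkers, int maxWorkers) Describe()
    {
        // The second out value is the I/O completion port thread count, ignored here.
        ThreadPool.GetMinThreads(out int minWorkers, out _);
        ThreadPool.GetMaxThreads(out int maxWorkers, out _);
        return (Environment.ProcessorCount, minWorkers, maxWorkers);
    }
}
```

Calling PoolInfo.Describe() on the machine in question shows whether the pool could in principle run 4 CPU-bound tasks at once.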

There are a lot of possible reasons that you might not see the performance gains you're expecting, including things like what else your machine's cores are getting used for at the moment. Running this trimmed-down version of your code, I am able to see a marked improvement when running parallel:
private IEnumerable<Sample> CalculateMany(int n)
{
return Enumerable.Range(0, n)
.AsParallel() // comment this to remove parallelism
.Select(i => { var s = new Sample(); CalculateStuff(s, max / (ulong)n); return s; })
.ToList();
}
// Calculate some math stuff
private static void CalculateStuff(Sample s, ulong max)
{
for (ulong i = 1; i <= max; i++)
{
s.A = Math.Pow(s.A, 1.2);
}
}
Here's running CalculateMany with n values as 1, 2, and 4:
Here's what I get if not using parallelism:
I see similar results using Task.Run():
private IEnumerable<Sample> CalculateMany(int n)
{
var tasks =
Enumerable.Range(0, n)
.Select(i => Task.Run(() => { var s = new Sample(); CalculateStuff(s, max / (ulong)n); return s; }))
.ToArray() ;
Task.WaitAll(tasks);
return tasks
.Select(t => t.Result)
.ToList();
}
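For comparing the 1, 2 and 4 task runs without any UI involved, a Stopwatch-based harness along these lines could be used. PowBenchmark and its methods are hypothetical names, and the loop keeps its value in a local double rather than in a Sample property, which is a deliberate simplification and may not behave identically to the original code.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class PowBenchmark
{
    // Same arithmetic as CalculateStuff, but on a local variable and with no ProgressBar.
    private static double RaiseRepeatedly(ulong iterations)
    {
        double a = 1.0;
        for (ulong i = 0; i < iterations; i++)
        {
            a = Math.Pow(a, 1.2);
        }
        return a;
    }

    // Splits `total` iterations evenly across `n` tasks and returns the wall-clock time.
    public static TimeSpan Measure(int n, ulong total)
    {
        var sw = Stopwatch.StartNew();
        var tasks = Enumerable.Range(0, n)
            .Select(_ => Task.Run(() => RaiseRepeatedly(total / (ulong)n)))
            .ToArray();
        Task.WaitAll(tasks);
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Calling Measure(1, max), Measure(2, max) and Measure(4, max) in a Release build gives numbers that are not affected by ProgressBar updates or by DateTime.Now resolution.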

Unfortunately I cannot give you a reason other than probably something with state machine magic that is happening, but this significantly increases performance:
private async void BThr4_Click(object sender, EventArgs e)
{
Clear();
DateTime start = DateTime.Now;
await Task.Delay(100);
Task<Sample> t1 = Task<Sample>.Run(() => CalculateStuff(PBar1, max / 4));
Task<Sample> t2 = Task<Sample>.Run(() => CalculateStuff(PBar2, max / 4));
Task<Sample> t3 = Task<Sample>.Run(() => CalculateStuff(PBar3, max / 4));
Task<Sample> t4 = Task<Sample>.Run(() => CalculateStuff(PBar4, max / 4));
Sample sample1 = await t1;
Sample sample2 = await t2;
Sample sample3 = await t3;
Sample sample4 = await t4;
TextBox.Text = (DateTime.Now - start).ToString(@"hh\:mm\:ss");
t1.Dispose(); t2.Dispose(); t3.Dispose(); t4.Dispose();
}
// Calculate some math stuff
private static Sample CalculateStuff(ProgressBar pb, ulong max)
{
Sample s = new Sample();
ulong c = max / (ulong)pb.Maximum;
for (ulong i = 1; i <= max; i++)
{
s.A = Math.Pow(s.A, 1.2);
if (i % c == 0)
pb.Invoke(new Action(() => pb.Value = (int)(i / c)));
}
return s;
}
This way you are not keeping Sample instances that the tasks have to access in the calling function but you create the instances within the task and then just return them to the caller after the task has completed.

Related

Multithread foreach slows down main thread

Edit: As per the discussion in the comments, I was overestimating how much many threads would help, and have gone back to Parallel.ForEach with a reasonable MaxDegreeOfParallelism, and just have to wait it out.
I have a 2D array data structure, and perform work on slices of the data. There will only ever be around 1000 threads required to work on all the data simultaneously. Basically there are around 1000 "days" worth of data for all ~7000 data points, and I would like to process the data for each day in a new thread in parallel.
My issue is that doing work in the child threads dramatically slows the time in which the main thread starts them. If I have no work being done in the child threads, the main thread starts them all basically instantly. In my example below, with just a bit of work, it takes ~65ms to start all the threads. In my real use case, the worker threads will take around 5-10 seconds to compute all what they need, but I would like them all to start instantly otherwise, I am basically running the work in sequence. I do not understand why their work is slowing down the main thread from starting them.
How the data is set up shouldn't matter (I hope). The way it's set up might look weird; I was just simulating exactly how I receive the data. What's important is that if you comment out the foreach loop in the DoThreadWork method, the time it takes to start the threads is way lower.
I have the for (var i = 0; i < 4; i++) loop just to run the simulation multiple times to see 4 sets of timing results to make sure that it wasn't just slow the first time.
Here is a code snippet to simulate my real code:
public static void Main(string[] args)
{
var fakeData = Enumerable
.Range(0, 7000)
.Select(_ => Enumerable.Range(0, 400).ToArray())
.ToArray();
const int offset = 100;
var dataIndices = Enumerable
.Range(offset, 290)
.ToArray();
for (var i = 0; i < 4; i++)
{
var s = Stopwatch.StartNew();
var threads = dataIndices
.Select(n =>
{
var thread = new Thread(() =>
{
foreach (var fake in fakeData)
{
var sliced = new ArraySegment<int>(fake, n - offset, n - (n - offset));
DoThreadWork(sliced);
}
});
return thread;
})
.ToList();
foreach (var thread in threads)
{
thread.Start();
}
Console.WriteLine($"Before Join: {s.Elapsed.Milliseconds}");
foreach (var thread in threads)
{
thread.Join();
}
Console.WriteLine($"After Join: {s.Elapsed.Milliseconds}");
}
}
private static void DoThreadWork(ArraySegment<int> fakeData)
{
// Commenting out this foreach loop will dramatically increase the speed
// in which all the threads start
var a = 0;
foreach (var fake in fakeData)
{
// Simulate thread work
a += fake;
}
}
Use the thread/task pool and limit thread/task count to 2*(CPU cores) at most. Creating more threads doesn't magically make more work get done, as you need hardware "threads" to run them (1 per CPU core for non-SMT CPUs, 2 per core for Intel HT and AMD's SMT implementation). Executing hundreds to thousands of threads that don't have to passively await asynchronous callbacks (i.e. I/O) makes running the threads far less efficient, because the CPU is thrashed with context switches for no reason.
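A bounded version of the "one slice per day" work might look like the sketch below. SliceSums and SumSlices are hypothetical names, and the cap of Environment.ProcessorCount is just one reasonable choice within the limit suggested above.

```csharp
using System;
using System.Threading.Tasks;

public static class SliceSums
{
    // For each start index ("day"), sums data[row][start .. start + length) over all rows.
    // At most Environment.ProcessorCount iterations run at the same time.
    public static long[] SumSlices(int[][] data, int[] starts, int length)
    {
        var results = new long[starts.Length];
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };
        Parallel.For(0, starts.Length, options, k =>
        {
            long sum = 0;
            foreach (var row in data)
            {
                for (int j = starts[k]; j < starts[k] + length; j++)
                {
                    sum += row[j];
                }
            }
            // Each iteration writes only its own slot, so no lock is needed.
            results[k] = sum;
        });
        return results;
    }
}
```

The work is queued to a fixed number of workers instead of being spread over hundreds of dedicated threads, so the main thread does not compete with the workers while starting them.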

ParallelEnumerable.WithDegreeOfParallelism() not restricting tasks?

I'm attempting to use AsParallel() with async-await to have an application process a series of tasks in parallel, but with a restricted degree of concurrency due to the task starting an external Process that has significant memory usage (hence wanting to wait for the process to complete before proceeding to the next item in the series). Most literature I've seen on the function ParallelEnumerable.WithDegreeOfParallelism suggests that using it will set a max limit on concurrent tasks at any one time, but my own tests seem to suggest that it's skipping the limit altogether.
To provide a rough example (WithDegreeOfParallelism() set to 1 deliberately to demonstrate the issue):
public class Example
{
private async Task HeavyTask(int i)
{
await Task.Delay(10 * 1000);
}
public async Task Run()
{
int n = 0;
await Task.WhenAll(Enumerable.Range(0, 100)
.AsParallel()
.WithDegreeOfParallelism(1)
.Select(async i =>
{
Interlocked.Increment(ref n);
Console.WriteLine("[+] " + n);
await HeavyTask(i);
Interlocked.Decrement(ref n);
Console.WriteLine("[-] " + n);
}));
}
}
class Program
{
public static void Main(string[] args)
{
Task.Run(async () =>
{
await new Example().Run();
}).Wait();
}
}
From what I understand, the code above is meant to produce output along the lines of:
[+] 1
[-] 0
[+] 1
[-] 0
...
But instead returns:
[+] 1
[+] 2
[+] 3
[+] 4
...
This suggests that it is starting off all the tasks in the list and then waiting for the tasks to return.
Is there anything particularly obvious (or non-obvious) that I'm doing wrong which is making it seem like WithDegreeOfParallelism() is being ignored?
Update
Sorry, after testing your code i understand what you are seeing now
async i =>
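The async lambda returns its Task as soon as it reaches the first await, so from PLINQ's point of view the item is already processed, while the task itself keeps running regardless. Printing Thread.CurrentThread.ManagedThreadId inside the lambda will show you clearly that it is consuming as many threads as it likes.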
Also note, if your heavy task is IO bound, then skip the PLINQ and Parallel use async and await in an TPL Dataflow ActionBlock as it will give you the best of both worlds
E.g
public static async Task DoWorkLoads(List<int> list)
{
var options = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 2
};
var block = new ActionBlock<int>(MyMethodAsync, options);
foreach (var item in list)
block.Post(item );
block.Complete();
await block.Completion;
}
...
public static async Task MyMethodAsync(int i)
{
await Task.Delay(10 * 1000);
}
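If pulling in TPL Dataflow is not an option, a similar limit can be sketched with SemaphoreSlim. This is a different technique from the ActionBlock above, not something the answer itself proposes; Throttle and RunThrottledAsync are hypothetical names, and the peak counter is only there to make the limit observable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Runs `work` for every item with at most `maxConcurrency` operations in flight.
    // Returns the highest number of operations that were observed running at once.
    public static async Task<int> RunThrottledAsync(
        IEnumerable<int> items, int maxConcurrency, Func<int, Task> work)
    {
        int current = 0, peak = 0;
        var sync = new object();
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();          // wait for a free slot
                try
                {
                    lock (sync) { current++; if (current > peak) peak = current; }
                    await work(item);
                }
                finally
                {
                    lock (sync) { current--; }
                    gate.Release();              // hand the slot to the next item
                }
            }).ToList();
            await Task.WhenAll(tasks);
        }
        return peak;
    }
}
```

With maxConcurrency set to 1, each item only starts after the previous one has released the semaphore, which is the behavior the question expected.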
Original
This is a very subtle and very common misunderstanding; however, I think the documentation seems wrong:
Sets the degree of parallelism to use in a query. Degree of
parallelism is the maximum number of concurrently executing tasks that
will be used to process the query.
Though if we dig into this a bit more we get a better understanding, also there are github conversations on this as well.
ParallelOptions.MaxDegreeOfParallelism vs PLINQ’s WithDegreeOfParallelism
PLINQ is different. Some important Standard Query Operators in PLINQ
require communication between the threads involved in the processing
of the query, including some that rely on a Barrier to enable threads
to operate in lock-step. The PLINQ design requires that a specific
number of threads be actively involved for the query to make any
progress. Thus when you specify a DegreeOfParallelism for PLINQ,
you’re specifying the actual number of threads that will be involved,
rather than just a maximum.

Load Test using C# Async Await

I am creating a console program, which can test read / write to a Cache by simulating multiple clients, and have written following code. Please help me understand:
Is it correct way to achieve the multi client simulation
What can I do more to make it a genuine load test
void Main()
{
List<Task<long>> taskList = new List<Task<long>>();
for (int i = 0; i < 500; i++)
{
taskList.Add(TestAsync());
}
Task.WaitAll(taskList.ToArray());
double averageTime = taskList.Average(t => t.Result);
}
public static async Task<long> TestAsync()
{
// Returns the total time taken using Stop Watch in the same module
return await Task.Factory.StartNew(() => // Call Cache Read / Write);
}
Adjusted your code slightly to see how many threads we have at a particular time.
static volatile int currentExecutionCount = 0;
static void Main(string[] args)
{
List<Task<long>> taskList = new List<Task<long>>();
var timer = new Timer(Print, null, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1));
for (int i = 0; i < 1000; i++)
{
taskList.Add(DoMagic());
}
Task.WaitAll(taskList.ToArray());
timer.Change(Timeout.Infinite, Timeout.Infinite);
timer = null;
//to check that we have all the threads executed
Console.WriteLine("Done " + taskList.Sum(t => t.Result));
Console.ReadLine();
}
static void Print(object state)
{
Console.WriteLine(currentExecutionCount);
}
static async Task<long> DoMagic()
{
return await Task.Factory.StartNew(() =>
{
Interlocked.Increment(ref currentExecutionCount);
//place your code here
Thread.Sleep(TimeSpan.FromMilliseconds(1000));
Interlocked.Decrement(ref currentExecutionCount);
return 4;
}
//this should give the scheduler a hint to use new threads instead of pooled ones
, TaskCreationOptions.LongRunning
);
}
The result is: inside a virtual machine I have from 2 to 10 threads running simultaneously if I don't use the hint. With the hint — up to 100. And on real machine I can see 1000 threads at once. Process explorer confirms this. Some details on the hint that would be helpful.
If it is very busy, then apparently your clients have to wait a while before their requests are serviced. Your program does not measure this, because your stopwatch starts running when the service request starts.
If you also want to measure what happen with the average time before a request is finished, you should start your stopwatch when the request is made, not when the request is serviced.
Your program takes only threads from the thread pool. If you start more tasks than there are threads, some tasks will have to wait before TestAsync starts running. This wait time would be measured if you remembered the time at which Task.Run is called.
Besides the flaw in time measurements, how many service requests do you expect simultaneously? Are there enough free threads in your thread pool to simulate this? If you expect about 50 service requests at the same time, and the size of your thread pool is only 20 threads, then you'll never run 50 service requests at the same time. Vice versa: if your thread pool is way bigger than your number of expected simultaneous service requests, then you'll measure longer times than are actually the case.
Consider changing the number of threads in your thread pool, and make sure no one else uses any threads of the pool.
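The difference between the two starting points can be made visible by running two stopwatches. This is only a sketch; LoadTestTiming and TimedRequestAsync are hypothetical names, and the request delegate stands in for the cache read/write call.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class LoadTestTiming
{
    // Returns both the time since the request was made (including any wait
    // for a free pool thread) and the time spent actually servicing it.
    public static async Task<(long totalMs, long serviceMs)> TimedRequestAsync(Action request)
    {
        var total = Stopwatch.StartNew();        // starts when the request is made
        long serviceMs = await Task.Run(() =>
        {
            var service = Stopwatch.StartNew();  // starts when a pool thread picks the work up
            request();
            return service.ElapsedMilliseconds;
        });
        return (total.ElapsedMilliseconds, serviceMs);
    }
}
```

When many requests are queued at once, totalMs minus serviceMs approximates how long a simulated client waited before its request was serviced.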

How can i use simplest Task?

I'm talking about single-threaded (not TaskEx for WindowsPhone) (ok, even basic Task is designed to be async, this makes question senseless) and synchronous (no async/await) pure Task.
Can it be useful in some cases (i have quite common app which pulls data from the server, deserialize it and shows results), or is Task just a foundation for
await TaskEx.Run()?
EDIT1: i mean, how this
void Foo()
{
DoSmth();
}
void Main()
{
int a = 1;
Foo();
int b = 1;
}
would differ from
void Main()
{
int a = 1;
Task.Run(() => DoSmth());
int b = 1;
}
Calling Foo(); is also kinda a "promise that next code would be called after Foo() is done".
EDIT2: I just ran in wp7 app
Debug.WriteLine("OnLoaded {0} ", Thread.CurrentThread.ManagedThreadId);
Task.Factory.StartNew(() =>
{
Thread.Sleep(5000);
Debug.WriteLine("Run Id: {0}", Thread.CurrentThread.ManagedThreadId);
});
Debug.WriteLine("Done");
Got the output:
OnLoaded 1
Done
Run Id: 4
So, is Task.Factory.StartNew() the same as TaskEx.Run() ?
EDIT3: so, here is a short summary (as Task.Factory.StartNew() is the same as TaskEx.Run()):
Thread.Sleep(5000); // UI is frozen for 5 seconds
int a = 1; // this is called after 5 seconds
TaskEx.Run(() =>
{
Thread.Sleep(5000);
int a = 1; // this is called after 5 seconds
});
int b = 2; // UI is not frozen, this is called instantly
await TaskEx.Run(() => // UI is not frozen, but...
{
Thread.Sleep(5000);
int a = 1; // this is called after 5 seconds
});
int b = 2; // this is called when the task is done
A Task is just a way to represent something that will complete in the future. This is most commonly an asynchronous operation or something running in a background thread (via Task.Run/TaskEx.Run).
A "synchronous pure Task" really doesn't make sense - the entire purpose of a Task is to represent something that is not synchronous.
Can it be useful in some cases (i have quite common app which pulls data from the server, deserialize it and shows results),
In this case, since the data is being pulled from a server, that is by its nature a good candidate for an asynchronous operation. This would make it a perfect candidate for Task (or Task<T>).
In response to your edit:
In the first version, everything is just run sequentially.
The second version, using Task.Run, actually causes DoSmth() to execute in a background thread. The Task returned can be used with await to asynchronously wait for it to complete, if you wanted to do so. This means that DoSmth() will potentially run at the same time as the assignment to b (and subsequent operations).
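The point that Task.Run returns before the work is finished can be shown with a short sketch. TaskRunDemo and StartThenAwaitAsync are hypothetical names, and Thread.Sleep merely stands in for DoSmth().

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class TaskRunDemo
{
    // Starts background work, records whether it had already finished when
    // Task.Run returned, then awaits the result.
    public static async Task<(bool finishedImmediately, int result)> StartThenAwaitAsync()
    {
        Task<int> work = Task.Run(() =>
        {
            Thread.Sleep(200);   // stand-in for DoSmth()
            return 42;
        });

        // This line runs right away, while the work is still in progress.
        bool finishedImmediately = work.IsCompleted;

        // Awaiting is what makes the following code wait for the task.
        int result = await work;
        return (finishedImmediately, result);
    }
}
```

The code after Task.Run corresponds to the assignment to b in the question: it runs without waiting, unless the task is awaited first.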

Wanting to limit Threadpool so it doesn't max out CPU

I am programming with Threads for the first time. My program only shows a small amount of data at a time; as the user moves through the data I want it to load all the possible data that could be access next so there is as little lag as possible when user switches to a new section.
Worst case scenario I might need to preload 6 sections of data. So I use something like:
if (SectionOne == null)
{
ThreadPool.QueueUserWorkItem(new System.Threading.WaitCallback(PreloadSection),
Tuple.Create(thisSection, SectionOne));
}
if (SectionTwo == null)
{
ThreadPool.QueueUserWorkItem(new System.Threading.WaitCallback(PreloadSection),
Tuple.Create(thisSection, SectionTwo));
}
//....
to preload each area. It works great on my main system that has 8 cores; but on my test system that only has 4 cores the entire system slows to a crawl while it is running the threads.
I am thinking that I want to run a maximum of TotalCores - 2 threads at the same time. But really I have no idea.
Looking for any help in getting this to run as efficiently as possible on multiple system setups (single core through 8 cores or whatever). Also, I am using C# and this is a Portable Class Library project, so some of my options are limited.
I would be using this built in .NET parallelism magic.
Task Parallelism
With the Task operations that is managed for you but you still have control to pick how many cores and threads you want.
Example:
const int MAX = 10000;
var options = new ParallelOptions
{
MaxDegreeOfParallelism = 2
};
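// List<int>.Add is not thread-safe inside Parallel.For, so use a concurrent collection
var threadIds = new System.Collections.Concurrent.ConcurrentBag<int>();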
Parallel.For(0, MAX, options, i =>
{
var id = Thread.CurrentThread.ManagedThreadId;
Console.WriteLine("Number '{0}' on thread {1}", i, id);
threadIds.Add(id);
});
You can even do it with Extensions if you want:
const int MAX_TASKS = 8;
var numbers = Enumerable.Range(0, 10000000);
var threadIds = new HashSet<int>();
numbers.AsParallel()
.WithDegreeOfParallelism(MAX_TASKS)
.ForAll(i =>
{
var id = Thread.CurrentThread.ManagedThreadId;
// HashSet is not thread-safe, so guard the shared set with a lock
lock (threadIds)
{
threadIds.Add(id);
}
});
Assert.IsTrue(threadIds.Count > 2);
Assert.IsTrue(threadIds.Count <= MAX_TASKS);
Console.WriteLine(threadIds.Count);
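Applied to the preloading case from the question, the "TotalCores - 2" idea could be sketched as follows. PreloadLimiter, WorkerLimit and PreloadAll are hypothetical names, the preload delegate stands in for PreloadSection, and whether Parallel is available depends on the Portable Class Library profile being targeted.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PreloadLimiter
{
    // Leaves two cores free for the rest of the system, but never drops below one worker.
    public static int WorkerLimit(int coreCount) => Math.Max(1, coreCount - 2);

    // Preloads every section with a bounded number of workers.
    // Returns the highest number of preloads that were running at the same time.
    public static int PreloadAll(IEnumerable<int> sectionIds, Action<int> preload)
    {
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = WorkerLimit(Environment.ProcessorCount)
        };
        int running = 0, peak = 0;
        var sync = new object();
        Parallel.ForEach(sectionIds, options, id =>
        {
            lock (sync) { running++; if (running > peak) peak = running; }
            try { preload(id); }
            finally { lock (sync) { running--; } }
        });
        return peak;
    }
}
```

On a 4 core machine this runs at most 2 preloads at a time, and on a single core machine it falls back to 1.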
