TPL force higher parallelism - c#

When queuing Tasks to the ThreadPool, the code relies on the default TaskScheduler to execute them. In my code example, I can see that 7 Tasks maximum get executed in parallel on separate threads.
new Thread(() =>
{
    while (true)
    {
        ThreadPool.GetAvailableThreads(out var wt, out var cpt);
        Console.WriteLine($"WT:{wt} CPT:{cpt}");
        Thread.Sleep(500);
    }
}).Start();
var stopwatch = new Stopwatch();
stopwatch.Start();
var tasks = Enumerable.Range(0, 100).Select(async i => { await Task.Yield(); Thread.Sleep(10000); }).ToArray();
Task.WaitAll(tasks);
Console.WriteLine(stopwatch.Elapsed.TotalSeconds);
Console.ReadKey();
Is there a way to force the scheduler to fire up more Tasks on other threads? Or is there a more "generous" scheduler in the framework without implementing a custom one?
EDIT:
Adding ThreadPool.SetMinThreads(100, X) seems to do the trick. I presume awaiting frees up the thread, so the pool thinks it can fire up another one, and then it immediately resumes.
By default, the minimum number of threads is set to the number of processors on a system. You can use the SetMinThreads method to increase the minimum number of threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases, the thread pool will perform better with its own algorithm for allocating threads. Reducing the minimum to less than the number of processors can also hurt performance.
From here: https://msdn.microsoft.com/en-us/library/system.threading.threadpool.setminthreads(v=vs.110).aspx
I removed AsParallel as it is not relevant and it just seems to confuse readers.
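To see the effect of the edit above in numbers, here is a minimal sketch that times a batch of blocking tasks, optionally after raising the worker minimum. The class name MinThreadsDemo, the helper RunBlockingTasks, and the "raise" command-line switch are made up for illustration; the sleep durations are shortened from the question, and the timings you get will depend on your machine.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class MinThreadsDemo
{
    // Starts `count` tasks that each block a pool thread for `sleepMs`
    // and returns how long it took for all of them to finish.
    public static TimeSpan RunBlockingTasks(int count, int sleepMs)
    {
        var stopwatch = Stopwatch.StartNew();
        Task[] tasks = Enumerable.Range(0, count)
            .Select(_ => Task.Run(() => Thread.Sleep(sleepMs)))
            .ToArray();
        Task.WaitAll(tasks);
        return stopwatch.Elapsed;
    }

    public static void Main(string[] args)
    {
        // Hypothetical switch: pass "raise" to lift the minimum first.
        // Compare the two modes in separate runs, because the pool keeps
        // threads it has already created around for a while.
        bool raise = args.Length > 0 && args[0] == "raise";

        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        Console.WriteLine($"Default minimum worker threads: {minWorker}");

        if (raise)
        {
            // Raise only the worker minimum; leave the I/O value unchanged.
            bool ok = ThreadPool.SetMinThreads(100, minIo);
            Console.WriteLine($"SetMinThreads(100, {minIo}) returned {ok}");
        }

        TimeSpan elapsed = RunBlockingTasks(50, 1000);
        Console.WriteLine($"50 blocking tasks took {elapsed.TotalSeconds:F1}s");
    }
}
```

Run it once without arguments and once with "raise" to compare. On a machine with fewer than 50 logical processors the first run should take noticeably longer, since the pool adds threads gradually once the minimum is reached.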

Is there a way to force the scheduler to fire up more Tasks on other threads?
You cannot have more executing threads than you have CPU cores. This is just how computers work. If you use more threads, then your work will actually get done more slowly since the threads must swap in and out of the cores in order to run.
Or is there a more "generous" scheduler in the framework without implementing a custom one?
PLINQ is already tuned to make maximum use of the hardware.
You can see this for yourself if you replace the Thread.Sleep call with something that actually uses the CPU (e.g., while (true) ;), and then watch your CPU usage in Task Manager. My expectation is that the 7 or 8 threads used by PLINQ in this example is all your machine can handle.

Useful link that explains it can be done with ThreadPool.SetMinThreads:
https://gist.github.com/JonCole/e65411214030f0d823cb#file-threadpool-md

Try this: https://msdn.microsoft.com/en-us/library/system.threading.threadpool.setmaxthreads(v=vs.110).aspx
You can set the number of worker threads (first argument).

Use WithDegreeOfParallelism extension:
Enumerable.Range(0, 100).AsParallel().WithDegreeOfParallelism(x).Select(...
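A fuller sketch of the WithDegreeOfParallelism suggestion, in case it helps. It records the peak number of items running at once, which should never exceed the requested degree. The names PlinqDegreeDemo and MaxObservedConcurrency are invented for this example. As far as I know PLINQ still runs its work on thread pool threads, so the requested degree is an upper bound, not a guarantee.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class PlinqDegreeDemo
{
    // Runs `itemCount` blocking work items through PLINQ with the given
    // degree of parallelism and returns the highest number of items that
    // were observed running at the same moment.
    public static int MaxObservedConcurrency(int itemCount, int degree, int sleepMs)
    {
        int running = 0;
        int maxRunning = 0;

        Enumerable.Range(0, itemCount)
            .AsParallel()
            .WithDegreeOfParallelism(degree)
            .ForAll(_ =>
            {
                int now = Interlocked.Increment(ref running);

                // Record the peak with a compare-and-swap loop.
                int seen;
                while (now > (seen = Volatile.Read(ref maxRunning)))
                {
                    Interlocked.CompareExchange(ref maxRunning, now, seen);
                }

                Thread.Sleep(sleepMs);   // stand-in for blocking work
                Interlocked.Decrement(ref running);
            });

        return maxRunning;
    }

    public static void Main()
    {
        int peak = MaxObservedConcurrency(100, 16, 200);
        Console.WriteLine($"Requested degree: 16, observed peak: {peak}");
    }
}
```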

Related

Task.Factory.StartNew starts with a great delay despite having available threads in threadpool

This question is a continuation to a previous question I've asked:
It takes more than a few seconds for a task to start running
I now know how exactly to reproduce this scenario.
Task.Factory.StartNew is scheduled on the thread pool, so I'm logging the following (just before I invoke the Factory.StartNew):
int workerThreads = 0;
int completionPortThreads = 0;
ThreadPool.GetMaxThreads(out workerThreads, out completionPortThreads);
ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);
var tokenSource = new CancellationTokenSource();
CancellationToken token = tokenSource.Token;
// I HAVE A LOG HERE
Task task = Task.Factory.StartNew(() =>
{
    // I HAVE A LOG ALSO HERE, AND THAT'S HOW I KNOW
    // THE TASK INVOCATION IS DELAYED, AND THE DELAY IS NOT DUE TO MY CODE WITHIN THE TASK
    // Some action that returns a boolean - **CODE_A**
}).ContinueWith((task2) =>
{
    result = task2.Result;
    if (!result)
    {
        // Another action **CODE_B**
    }
}, token);
When the bug is reproduced, I get 32767 as Max worker threads, and 32756 as available worker threads.
Now, there is something I don't understand.
At least as I've understood, once the threadpool reaches its overload, the threadpool will stop creating new threads immediately. And that's probably the reason for the delay of my task (that starts after more than 5 seconds from the invocation of Factory.StartNew).
But when the delay occurs, I see that I have 32756 available worker threads in my threadpool, so why does the threadpool NOT use one of those 32756 available worker threads to start my task immediately?
The available threads are on the ThreadPool (I mean, I invoke ThreadPool.GetAvailableThreads), and Task.Factory.StartNew allocates a task from the threadPool. So, why am I getting this delay despite having available threads in threadpool?
It's not the MAX worker threads value you need to look at - it's the MIN value you get via ThreadPool.GetMinThreads().
The max value is the absolute maximum number of threads that can be active. The min value is the number of threads the pool will create on demand without waiting. If you queue work when the number of active threads has already reached the min (but is still below the max), the pool throttles thread creation and you'll see a delay, about 2 seconds in this case.
You can change the minimum number of threads if absolutely necessary (which it is in some circumstances) but generally speaking if you find yourself needing to do that, you might need to think about redesigning your multithreading so that you don't need to.
As the Microsoft documentation states:
By default, the minimum number of threads is set to the number of processors on a system. You can use the SetMinThreads method to increase the minimum number of threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases, the thread pool will perform better with its own algorithm for allocating threads. Reducing the minimum to less than the number of processors can also hurt performance.
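To check these numbers on your own machine, a small sketch like the following reads all three values side by side. PoolSnapshot and WorkerThreads are names made up for the example, and the tuple syntax assumes C# 7 or later. GetAvailableThreads reports the maximum minus the threads currently active, which is why it stays close to 32767 even while queued work is waiting for a thread.

```csharp
using System;
using System.Threading;

public static class PoolSnapshot
{
    // Returns the (min, max, available) worker-thread counts of the pool.
    public static (int Min, int Max, int Available) WorkerThreads()
    {
        ThreadPool.GetMinThreads(out int min, out _);
        ThreadPool.GetMaxThreads(out int max, out _);
        ThreadPool.GetAvailableThreads(out int available, out _);
        return (min, max, available);
    }

    public static void Main()
    {
        var (min, max, available) = WorkerThreads();
        Console.WriteLine($"min={min} max={max} available={available}");

        // "Available" is max minus the threads currently active, so
        // max - available approximates the number of busy pool threads.
        Console.WriteLine($"in use (max - available) = {max - available}");
    }
}
```

If "in use" is at or above min while your task is waiting to start, the delay is most likely the pool's throttling rather than a lack of available threads.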

understanding Parallel.Invoke, creation and reusing of threads

I am trying to understand how Parallel.Invoke creates and reuses threads.
I ran the following example code (from MSDN, https://msdn.microsoft.com/en-us/library/dd642243(v=vs.110).aspx):
using System;
using System.Threading;
using System.Threading.Tasks;
class ThreadLocalDemo
{
    static void Main()
    {
        // Thread-Local variable that yields a name for a thread
        ThreadLocal<string> ThreadName = new ThreadLocal<string>(() =>
        {
            return "Thread" + Thread.CurrentThread.ManagedThreadId;
        });
        // Action that prints out ThreadName for the current thread
        Action action = () =>
        {
            // If ThreadName.IsValueCreated is true, it means that we are not the
            // first action to run on this thread.
            bool repeat = ThreadName.IsValueCreated;
            Console.WriteLine("ThreadName = {0} {1}", ThreadName.Value, repeat ? "(repeat)" : "");
        };
        // Launch eight of them. On 4 cores or less, you should see some repeat ThreadNames
        Parallel.Invoke(action, action, action, action, action, action, action, action);
        // Dispose when you are done
        ThreadName.Dispose();
    }
}
As I understand it, Parallel.Invoke tries to create 8 threads here - one for each action. So it creates the first thread, runs the first action, and by that gives a ThreadName to the thread. Then it creates the next thread (which gets a different ThreadName) and so on.
If it cannot create a new thread, it will reuse one of the threads created before. In this case, the value of repeat will be true and we can see this in the console output.
Is this correct until here?
The second-last comment ("Launch eight of them. On 4 cores or less, you should see some repeat ThreadNames") implies that the threads created by Invoke correspond to the available cpu threads of the processor: on 4 cores we have 8 cpu threads, at least one is busy (running the operating system and stuff), so Invoke can only use 7 different threads, so we must get at least one "repeat".
Is my interpretation of this comment correct?
I ran this code on my PC which has an Intel® Core™ i7-2860QM processor (i.e. 4 cores, 8 cpu threads). I expected to get at least one "repeat", but I didn't. When I changed the Invoke to take 10 instead of 8 actions, I got this output:
ThreadName = Thread6
ThreadName = Thread8
ThreadName = Thread6 (repeat)
ThreadName = Thread5
ThreadName = Thread3
ThreadName = Thread1
ThreadName = Thread10
ThreadName = Thread7
ThreadName = Thread4
ThreadName = Thread9
So I have at least 9 different threads in the console application. This contradicts the fact that my processor only has 8 threads.
So I guess some of my reasoning from above is wrong. Does Parallel.Invoke work differently than what I described above? If yes, how?
If you pass fewer than 10 items to Parallel.Invoke and you don't specify MaxDegreeOfParallelism in the options (which is your case), it will just run them all in parallel on the thread pool scheduler, using roughly the following code:
var actions = new [] { action, action, action, action, action, action, action, action };
var tasks = new Task[actions.Length];
for (int index = 1; index < tasks.Length; ++index)
tasks[index] = Task.Factory.StartNew(actions[index]);
tasks[0] = new Task(actions[0]);
tasks[0].RunSynchronously();
Task.WaitAll(tasks);
So it is just a regular Task.Factory.StartNew. If you look at the maximum number of threads in the thread pool:
int th, io;
ThreadPool.GetMaxThreads(out th, out io);
Console.WriteLine(th);
You will see some big number, like 32767. So the number of threads on which Parallel.Invoke will run (in your case) is not limited to the number of CPU cores at all. Even on a 1-core CPU it might run 8 threads concurrently.
You might then ask why some threads are reused at all. When work is done on a thread pool thread, that thread is returned to the pool and is ready to accept new work. The actions from your example do basically no work at all and complete very fast. So sometimes the first thread started via Task.Factory.StartNew has already completed your action and been returned to the pool before all the subsequent threads were started. That thread is then reused.
By the way, you can see (repeat) in your example with 8 actions, and even with 7 if you try hard enough, on an 8-core (16 logical cores) processor.
UPDATE to answer your comment. The thread pool scheduler will not necessarily create new threads immediately. There is a min and a max number of threads in the thread pool. I have already shown above how to see the max. To see the min:
int th, io;
ThreadPool.GetMinThreads(out th, out io);
This number will usually be equal to the number of cores (so, for example, 8). Now, when you request a new action to be performed on a thread pool thread and the number of threads in the pool is less than the minimum, a new thread is created immediately. However, if the number of threads in the pool has already reached the minimum, a certain delay is introduced before a new thread is created (I don't remember exactly how long, unfortunately; about 500 ms).
I highly doubt that the statement you added in your comment takes 2-3 seconds to execute. For me it takes 0.3 seconds at most. So after the first 8 threads are created by the thread pool, there is that 500 ms delay before the 9th is created. During that delay, some (or all) of the first 8 threads finish their work and become available again, so there is no need to create a new thread and they can be reused.
To verify this, introduce a bigger delay:
// Note: Enumerable.Repeat below needs "using System.Linq;" in addition to the usings from the question.
static void Main()
{
    // Thread-Local variable that yields a name for a thread
    ThreadLocal<string> ThreadName = new ThreadLocal<string>(() =>
    {
        return "Thread" + Thread.CurrentThread.ManagedThreadId;
    });
    // Action that prints out ThreadName for the current thread
    Action action = () =>
    {
        // If ThreadName.IsValueCreated is true, it means that we are not the
        // first action to run on this thread.
        bool repeat = ThreadName.IsValueCreated;
        Console.WriteLine("ThreadName = {0} {1}", ThreadName.Value, repeat ? "(repeat)" : "");
        Thread.Sleep(1000000);
    };
    int th, io;
    ThreadPool.GetMinThreads(out th, out io);
    Console.WriteLine("cpu:" + Environment.ProcessorCount);
    Console.WriteLine(th);
    Parallel.Invoke(Enumerable.Repeat(action, 100).ToArray());
    // Dispose when you are done
    ThreadName.Dispose();
    Console.ReadKey();
}
You will see that the thread pool now has to create new threads every time (many more than there are cores), because it cannot reuse previous threads while they are busy.
You can also increase the minimum number of threads in the thread pool, like this:
int th, io;
ThreadPool.GetMinThreads(out th, out io);
ThreadPool.SetMinThreads(100, io);
This will remove the delay (until 100 threads are created), and you will notice the difference in the example above.
Behind the scenes, threads are organized (and owned) by the task scheduler. The primary purpose of the task scheduler is to keep all CPU cores busy with useful work as much as possible.
Under the hood, the scheduler uses the thread pool, and the size of the thread pool is the way to fine-tune how much useful work is executed on the CPU cores.
Now this requires some analysis. For instance, thread switching costs CPU cycles and is not useful work. On the other hand, when one thread executes one task on a core, all other tasks are stalled and make no progress on that core. I believe that is the core reason why the scheduler usually starts two threads per core, so that at least some movement is visible when one task takes longer to complete (several seconds, say).
There are corollaries to this basic mechanism. When some tasks take a long time to complete, the scheduler starts new threads to compensate. That means a long-running task will now have to compete for the core with short-running tasks. In that way, short tasks are completed one after another, and the long task slowly progresses to completion as well.
The bottom line is that your observations about threads are generally correct, but not entirely true in specific situations. In a concrete execution of a number of tasks, the scheduler might choose to raise more threads or to keep going with the default. That is why you will sometimes notice that the number of threads differs.
Remember the goal of the game: utilize CPU cores with useful work as much as possible, while at the same time keeping all tasks moving, so that the application doesn't look frozen. Historically, people tried to reach these goals with many different techniques. Analysis showed that many of those techniques were applied randomly and didn't really increase CPU utilization. That analysis led to the introduction of task schedulers in .NET, so that the fine-tuning can be coded once and done well.
So I have at least 9 different threads in the console application. This contradicts the fact that my processor only has 8 threads.
A thread is a very much overloaded term. It can mean, at the very least: (1) something you sew with, (2) a bunch of code with associated state, that is represented by an OS handle, and (3) an execution pipeline of a CPU. The Thread.CurrentThread refers to (2), the "processor thread" that you mentioned refers to (3).
The existence of a (2)-thread is not predicated on the existence of (3)-thread, and the number of (2)-threads that exist on any particular system is pretty much limited by available memory and OS design. The existence of (2)-thread doesn't imply execution of (2)-thread at any given time (unless you use APIs that guarantee that).
Furthermore, if a (2)-thread executes at some point - implying a temporary 1:1 binding between (2)-thread and (3)-thread, there is no implication that the thread will continue executing in general, and of course neither is there an implication that the thread will continue executing on the same (3)-thread if it continues executing at all.
So, even if you have "caught" the execution of a (2)-thread on a (3)-thread by some side effect, e.g. console output, as you did, that doesn't necessarily imply anything about any other (2)-threads and (3)-threads at that point.
On to your code:
// If ThreadName.IsValueCreated is true, it means that we are not the
// first action to run on this thread. <-- this refers to (2)-thread, NOT (3)-thread.
Parallel.Invoke is not precluded from (in terms of specifications) creating as many new (2)-threads as there are arguments passed to it. The actual number of (2)-threads created may be all the way from zero to a hero, since to call Parallel.Invoke there must be an existing (2)-thread with some code that calls this API. So, no new (2)-threads need to be created at all, for example. Whether the (2)-threads created by Parallel.Invoke execute on any particular number of (3)-threads concurrently is beyond your control either.
So that explains the behavior you see. You conflated (2)-threads with (3)-threads, and assumed that Parallel.Invoke does something specific it in fact is not guaranteed to do. Citing documentation:
No guarantees are made about the order in which the operations execute or whether they execute in parallel.
This implies that Invoke is free to run the actions on dedicated (2)-threads if it so wishes. And that is what you observed.
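A quick way to see the (2)-thread versus (3)-thread distinction for yourself is to count the distinct managed threads that actually ran the actions and compare that with the logical processor count. This is only a sketch: InvokeThreadCount and DistinctThreads are invented names, and the result varies from run to run because the pool decides how many threads to use.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class InvokeThreadCount
{
    // Runs `actionCount` short blocking actions through Parallel.Invoke and
    // returns how many distinct managed threads executed them.
    public static int DistinctThreads(int actionCount, int sleepMs)
    {
        var ids = new ConcurrentDictionary<int, bool>();
        Action action = () =>
        {
            // Remember which managed thread ran this action.
            ids.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
            Thread.Sleep(sleepMs);
        };
        Parallel.Invoke(Enumerable.Repeat(action, actionCount).ToArray());
        return ids.Count;
    }

    public static void Main()
    {
        Console.WriteLine($"Logical processors:       {Environment.ProcessorCount}");
        Console.WriteLine($"Distinct managed threads: {DistinctThreads(16, 500)}");
    }
}
```

Either number can be the larger one. Nothing ties the count of managed threads to the count of hardware threads.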

Why Thread.Sleep affects creation of new Tasks?

private static void Main(string[] args)
{
    for (int i = 0; i < 1000; i++)
    {
        Task.Factory.StartNew(() =>
        {
            Thread.Sleep(1000);
            Console.WriteLine("hej");
            Thread.Sleep(10000);
        });
    }
    Console.ReadLine();
}
Why doesn't this code print "hej" 1000 times after one second? Why does Thread.Sleep(10000) have an impact on the code's behavior?
Factory.StartNew effectively delegates the work to ThreadPool.
The thread pool creates threads immediately in response to requests as long as the thread count is less than or equal to the processor count. Once it reaches the processor count, the thread pool stops creating new threads immediately. That makes sense, because creating more threads than there are processors introduces thread-scheduling overhead and gains nothing.
Instead, it throttles the creation of threads. It waits 500 ms to see whether work is still pending with no thread free to process it. If work is still pending, it introduces one new thread (only one). This process continues for as long as there is enough work to do.
When the work queue's traffic clears, the thread pool destroys the extra threads, and the process described above repeats.
Also, there is a maximum limit on the number of threads the thread pool can run simultaneously. If you hit it, the thread pool stops creating more threads and waits for previous work items to complete so that it can reuse the existing threads.
That's not the end of the story; it is convoluted! These are just a few of the decisions taken by the ThreadPool.
I hope it is now clear why you see what you see.
There are a multitude of factors that would alter the result.
Some being (but not limited to):
The inherent time for the iteration of the loop
The size of the thread pool
Thread management overhead
The way your code behaves is the intended behaviour. You wait 1000 milliseconds to print hej, and after printing you call Thread.Sleep for another 10000 milliseconds. If you want to print hej 1000 times after one second, remove Thread.Sleep(10000).
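If the sleeps only stand in for waiting (and not for CPU work), one way around the throttling described above is to avoid blocking pool threads at all. The sketch below swaps Thread.Sleep for await Task.Delay, which is a different technique from the code in the question. DelayInsteadOfSleep and TimeUntilAllPrinted are names invented here, and a CountdownEvent signal stands in for the Console.WriteLine call.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class DelayInsteadOfSleep
{
    // Starts `count` asynchronous operations that wait without holding a
    // thread, and returns how long it took until every one of them had
    // reached the "print" step.
    public static TimeSpan TimeUntilAllPrinted(int count, int firstMs, int secondMs)
    {
        var stopwatch = Stopwatch.StartNew();
        using (var allPrinted = new CountdownEvent(count))
        {
            Task[] operations = Enumerable.Range(0, count).Select(async _ =>
            {
                await Task.Delay(firstMs);    // no thread is blocked while waiting
                allPrinted.Signal();          // stands in for Console.WriteLine("hej")
                await Task.Delay(secondMs);
            }).ToArray();

            allPrinted.Wait();
            TimeSpan elapsed = stopwatch.Elapsed;
            Task.WaitAll(operations);         // let the second delay finish before disposing
            return elapsed;
        }
    }

    public static void Main()
    {
        TimeSpan elapsed = TimeUntilAllPrinted(1000, 1000, 10000);
        Console.WriteLine($"All 1000 operations printed after {elapsed.TotalSeconds:F1}s");
    }
}
```

Because a pending Task.Delay does not occupy a pool thread, the time until all 1000 operations print should be close to the first delay, regardless of how long the second one is.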

Parallel.For vs regular threads

I'm trying to understand why Parallel.For is able to outperform a number of threads in the following scenario: consider a batch of jobs that can be processed in parallel. While processing these jobs, new work may be added, which then needs to be processed as well. The Parallel.For solution would look as follows:
var jobs = new List<Job> { firstJob };
int startIdx = 0, endIdx = jobs.Count;
while (startIdx < endIdx) {
Parallel.For(startIdx, endIdx, i => WorkJob(jobs[i]));
startIdx = endIdx; endIdx = jobs.Count;
}
This means that there are multiple points at which the Parallel.For needs to synchronize. Consider a breadth-first graph algorithm; the number of synchronizations would be quite large. Waste of time, no?
Trying the same in the old-fashioned threading approach:
var queue = new ConcurrentQueue<Job>(new[] { firstJob });
var threads = new List<Thread>();
var waitHandle = new AutoResetEvent(false);
int numBusy = 0;
for (int i = 0; i < maxThreads; i++)
    threads.Add(new Thread(new ThreadStart(delegate {
        while (!queue.IsEmpty || numBusy > 0) {
            if (queue.IsEmpty)
                // numBusy > 0 implies more data may arrive
                waitHandle.WaitOne();
            Job job;
            if (queue.TryDequeue(out job)) {
                Interlocked.Increment(ref numBusy);
                WorkJob(job); // WorkJob does a waitHandle.Set() when more work was found
                Interlocked.Decrement(ref numBusy);
            }
        }
        // others are possibly waiting for us to enable more work which won't happen
        waitHandle.Set();
    })));
threads.ForEach(t => t.Start());
threads.ForEach(t => t.Join());
The Parallel.For code is of course much cleaner, but what I cannot comprehend is that it's even faster as well! Is the task scheduler just that good? The synchronizations were eliminated and there's no busy waiting, yet the threaded approach is consistently slower (for me). What's going on? Can the threading approach be made faster?
Edit: thanks for all the answers, I wish I could pick multiple ones. I chose to go with the one that also shows an actual possible improvement.
The two code samples are not really the same.
Parallel.For() will use a limited number of threads and reuse them. The second sample is already starting way behind by having to create a number of threads, which takes time.
And what is the value of maxThreads? It is critical; in Parallel.For() it is dynamic.
Is the task scheduler just that good?
It is pretty good. And TPL uses work-stealing and other adaptive technologies. You'll have a hard time to do any better.
Parallel.For doesn't actually break up the items into single units of work. It breaks up all the work (early on) based on the number of threads it plans to use and the number of iterations to be executed. It then has each thread synchronously process its batch (possibly using work stealing or saving some extra items to load-balance near the end). With this approach the worker threads are virtually never waiting on each other, while your threads are constantly waiting on each other because of the heavy synchronization you're using before and after every single iteration.
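To make the batching idea concrete, here is a sketch that does the chunking explicitly with Partitioner.Create, so each worker gets a contiguous range of indices and only synchronizes once at the end. It illustrates the general technique and is not a copy of what Parallel.For does internally. RangePartitionDemo and SumSquares are names made up for the example, and n is assumed to be greater than zero.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class RangePartitionDemo
{
    // Sums i * i for i in [0, n) by handing each worker a contiguous
    // range of indices instead of one index at a time. Requires n > 0.
    public static long SumSquares(int n)
    {
        long total = 0;
        Parallel.ForEach(
            Partitioner.Create(0, n),          // yields Tuple<int, int> ranges
            () => 0L,                          // per-worker running subtotal
            (range, loopState, subtotal) =>
            {
                // No shared state is touched inside the batch.
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    subtotal += (long)i * i;
                }
                return subtotal;
            },
            // One synchronized update per worker, not per iteration.
            subtotal => Interlocked.Add(ref total, subtotal));
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumSquares(1000));
    }
}
```

The only cross-thread synchronization is the final Interlocked.Add per worker, which is the property the paragraph above credits for the speed difference.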
On top of that since it's using thread pool threads many of the threads it needs are likely already created, which is another advantage in its favor.
As for synchronization, the entire point of a Parallel.For is that all of the iterations can be done in parallel, so there is almost no synchronization that needs to take place (at least in their code).
Then of course there is the issue of the number of threads. The thread pool has a lot of very good algorithms and heuristics to help it determine how many threads are needed at a given instant, based on the current hardware, the load from other applications, etc. It's possible that you're using too many threads, or not enough.
Also, since the number of items that you have isn't known before you start, I would suggest using Parallel.ForEach rather than several Parallel.For loops. It is simply designed for the situation that you're in, so its heuristics will apply better. (It also makes for even cleaner code.)
BlockingCollection<Job> queue = new BlockingCollection<Job>();
//add jobs to queue, possibly in another thread
//call queue.CompleteAdding() when there are no more jobs to run
Parallel.ForEach(queue.GetConsumingEnumerable(),
job => job.DoWork());
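For the case where jobs themselves discover more jobs, the open question is when to call CompleteAdding. One possible approach, sketched below under my own assumptions, is to keep a counter of jobs that are queued or in progress and close the queue when it drops to zero. The names DynamicWorkDemo and ProcessTree and the binary-tree workload are invented for illustration. The NoBuffering partitioner option (available from .NET 4.5) is used so that workers take one job at a time.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class DynamicWorkDemo
{
    // Processes a binary tree of jobs: every job at a depth below maxDepth
    // enqueues two child jobs. Returns the number of jobs processed.
    public static int ProcessTree(int maxDepth)
    {
        int processed = 0;
        int pending = 1;                       // jobs queued or in progress
        using (var queue = new BlockingCollection<int>())
        {
            queue.Add(0);                      // root job at depth 0

            // NoBuffering hands out one item at a time, so a worker never
            // sits on buffered jobs while the others have nothing to do.
            var source = Partitioner.Create(
                queue.GetConsumingEnumerable(),
                EnumerablePartitionerOptions.NoBuffering);

            Parallel.ForEach(source, depth =>
            {
                Interlocked.Increment(ref processed);
                if (depth < maxDepth)
                {
                    // Count the children before queuing them so that
                    // pending cannot reach zero too early.
                    Interlocked.Add(ref pending, 2);
                    queue.Add(depth + 1);
                    queue.Add(depth + 1);
                }

                // This job is done; the last one to finish closes the queue.
                if (Interlocked.Decrement(ref pending) == 0)
                {
                    queue.CompleteAdding();
                }
            });
        }
        return processed;
    }

    public static void Main()
    {
        Console.WriteLine($"Processed {ProcessTree(10)} jobs");
    }
}
```

With this shape there is a single Parallel.ForEach call for the whole run, so the repeated loop-level synchronization from the question disappears.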
You're creating a bunch of new threads, whereas Parallel.For uses the thread pool. You'd see better performance if you used the .NET thread pool directly, but there really is no point in doing that.
I would shy away from rolling your own solution; if there is a corner case where you need customization, use the TPL and customize it.

Threadpool QueueUserWorkItem without using SetMaxThreads,GetMaxThreads

I'm using Threadpool to do some parallel processing in c# .NET 2.0.
Code :
int MAXThreads = GetConfigValue("MaxThreadLimit"); // This value is read from app.config
ManualResetEvent[] doneEvents = new ManualResetEvent[MAXThreads];
for (int i = 0; i < MAXThreads; i++)
{
    doneEvents[i] = new ManualResetEvent(false);
    // create workload
    DoProcess job = new DoProcess(workload, doneEvents[i]);
    ThreadPool.QueueUserWorkItem(job.ThreadPoolCallBack, i);
}
WaitHandle.WaitAll(doneEvents);
// proceed
class DoProcess
{
    private WorkLoad load;
    private ManualResetEvent doneEvent;
    public DoProcess(WorkLoad load, ManualResetEvent doneEvent)
    {
        this.load = load;
        this.doneEvent = doneEvent;
    }
    public void ThreadPoolCallBack(object index)
    {
        // Do Processing
        doneEvent.Set();
    }
}
The MAXThreads value is read from config, but I guess this has nothing to do with the actual number of threads generated. Only a few (~4-5) threads handle all the workload. I want the thread count to be fixed somewhere around 20. How can I achieve this? Am I missing something? Does SetMaxThreads address this issue? The above code will run on a quad-core CPU.
You'd have to set the minimum number of threads instead.
That's not a very good idea in general: running more threads than you have processor cores usually gets less work done, since the operating system spends time swapping them in and out. These context switches are not cheap. The thread pool manager does its best to limit the number of active threads to the number of cores, only allowing more threads to run when the existing ones don't complete in time, up to the maximum number of threads, which is an enormous value by default (1000 in your case).
Only increase the min threads when those worker threads are not performing enough work because they are blocked on I/O too often. In that case you really ought to consider Thread objects instead of thread pool threads.
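As a rough illustration of the "Thread objects instead of thread pool threads" suggestion, the sketch below runs a fixed number of dedicated threads that pull items from a shared queue. It uses BlockingCollection, which needs .NET 4 or later, so it would not compile as-is on the .NET 2.0 setup in the question. FixedWorkers and Process are names invented for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class FixedWorkers
{
    // Processes `items` on exactly `workerCount` dedicated threads and
    // returns how many items were handled.
    public static int Process(IEnumerable<int> items, int workerCount, Action<int> work)
    {
        int handled = 0;
        using (var queue = new BlockingCollection<int>())
        {
            var workers = new List<Thread>();
            for (int i = 0; i < workerCount; i++)
            {
                var worker = new Thread(() =>
                {
                    // Blocks until an item arrives; the loop ends once
                    // CompleteAdding was called and the queue is empty.
                    foreach (int item in queue.GetConsumingEnumerable())
                    {
                        work(item);
                        Interlocked.Increment(ref handled);
                    }
                });
                worker.IsBackground = true;
                worker.Start();
                workers.Add(worker);
            }

            foreach (int item in items)
            {
                queue.Add(item);
            }
            queue.CompleteAdding();
            workers.ForEach(w => w.Join());
        }
        return handled;
    }

    public static void Main()
    {
        // 100 items that each block for 50 ms, spread over 20 threads.
        int handled = Process(Enumerable.Range(0, 100), 20, _ => Thread.Sleep(50));
        Console.WriteLine($"Handled {handled} items on 20 dedicated threads");
    }
}
```

The thread count here is fixed by construction, so no pool heuristics are involved. The trade-off is that you pay the cost of creating all the threads up front.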
There is also SetMinThreads in the ThreadPool class. Setting both min and max to the same value "should" fix the number of threads, but what actually happens under the hood is anybody's guess.
MSDN:
The thread pool provides new worker threads or I/O completion threads on demand until it reaches the minimum for each category.
So setting the minimum number of threads to 20 should give you no less than 20 threads in the pool.
Assuming this is a C# application, you can use .NET Framework 4's Parallel programming model and limit the threads to 20.
Parallel.For(0, n, new ParallelOptions { MaxDegreeOfParallelism = 20 },
i =>
{
DoWork(i);
});
However, if this is a web site/app, it's best to stay away from the ThreadPool altogether, except for really small jobs (1-2 threads), because you don't want to starve the ThreadPool. And NEVER set the max or min threads in the code, because it will affect your entire site, and all other sites using the same thread pool.
In this case, I recommend using SmartThreadPool instead.
