Thread count growth when using Task Parallel Library - c#

I'm using the C# TPL and I'm having a problem with producer/consumer code... for some reason, TPL doesn't reuse threads and keeps creating new ones without stopping
I made a simple example to demonstrate this behavior:
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static BlockingCollection<int> m_Buffer = new BlockingCollection<int>(1);
    static CancellationTokenSource m_Cts = new CancellationTokenSource();

    static void Producer()
    {
        try
        {
            while (!m_Cts.IsCancellationRequested)
            {
                Console.WriteLine("Enqueuing job");
                m_Buffer.Add(0);
                Thread.Sleep(1000);
            }
        }
        finally
        {
            m_Buffer.CompleteAdding();
        }
    }

    static void Consumer()
    {
        Parallel.ForEach(m_Buffer.GetConsumingEnumerable(), Run);
    }

    static void Run(int i)
    {
        Console.WriteLine("Job Processed\tThread: {0}\tProcess Thread Count: {1}",
            Thread.CurrentThread.ManagedThreadId,
            Process.GetCurrentProcess().Threads.Count);
    }

    static void Main(string[] args)
    {
        Task producer = new Task(Producer);
        Task consumer = new Task(Consumer);
        producer.Start();
        consumer.Start();
        Console.ReadKey();
        m_Cts.Cancel();
        Task.WaitAll(producer, consumer);
    }
}
This code creates 2 tasks, Producer and Consumer. Producer adds 1 work item every second, and Consumer only prints out a string with some information. I would assume that 1 consumer thread is enough in this situation, because items are processed much faster than they are being added to the queue, but what actually happens is that every second the number of threads in the process grows by 1... as if TPL were creating a new thread for every item
After trying to understand what's happening, I also noticed another thing: even though the BlockingCollection size is 1, after a while Consumer starts getting called in bursts. For example, this is how it starts:
Enqueuing job
Job Processed Thread: 4 Process Thread Count: 9
Enqueuing job
Job Processed Thread: 6 Process Thread Count: 9
Enqueuing job
Job Processed Thread: 5 Process Thread Count: 10
Enqueuing job
Job Processed Thread: 4 Process Thread Count: 10
Enqueuing job
Job Processed Thread: 6 Process Thread Count: 11
and this is how it's processing items less than a minute later:
Enqueuing job
Job Processed Thread: 25 Process Thread Count: 52
Enqueuing job
Enqueuing job
Job Processed Thread: 5 Process Thread Count: 54
Job Processed Thread: 5 Process Thread Count: 54
And because threads get disposed of after the Parallel.ForEach loop finishes (I don't show it in this example, but it was in the real project), I assumed that it had something to do with ForEach specifically... I found this article http://reedcopsey.com/2010/01/26/parallelism-in-net-part-5-partitioning-of-work/ and thought that my problem was caused by the default partitioner, so I took a custom partitioner from the TPL Examples that feeds the Consumer threads items one by one, and although it fixed the order of execution (got rid of the delay)...
Enqueuing job
Job Processed Thread: 71 Process Thread Count: 140
Enqueuing job
Job Processed Thread: 12 Process Thread Count: 141
Enqueuing job
Job Processed Thread: 72 Process Thread Count: 142
Enqueuing job
Job Processed Thread: 38 Process Thread Count: 143
Enqueuing job
Job Processed Thread: 73 Process Thread Count: 143
Enqueuing job
Job Processed Thread: 21 Process Thread Count: 144
Enqueuing job
Job Processed Thread: 74 Process Thread Count: 145
...it didn't stop threads from growing
I know about ParallelOptions.MaxDegreeOfParallelism, but I still want to understand what's happening with TPL and why it creates hundreds of threads for no reason
In my project I have code that has to run for hours, reading new data from a database, putting it into a BlockingCollection and having that data processed by other code. There's 1 new item about every 5 seconds, and it takes from several milliseconds to almost a minute to process one. After running for about 10 minutes, the thread count reached over 1,000 threads.

There are two things that together cause this behavior:
ThreadPool tries to use the optimal number of threads for your situation. But if one of the threads in the pool blocks, the pool sees this as if that thread wasn't doing any useful work and so it tends to create another thread soon after that. What this means is that if you have a lot of blocking, ThreadPool is really bad at guessing the optimal number of threads and it tends to create new threads until it reaches the limit.
Parallel.ForEach() trusts the ThreadPool to guess the correct number of threads, unless you set the maximum number of threads explicitly. Parallel.ForEach() was also primarily meant for bounded collections, not streams of data.
When you combine these two things with GetConsumingEnumerable(), what you get is that Parallel.ForEach() creates threads that are almost always blocked. The ThreadPool sees this, and, to try to keep the CPU utilized, creates more and more threads.
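You can watch the injection happen in isolation with a small demo (hypothetical code, not from the question): block some pool threads and print the process thread count.

// Block some thread-pool threads and watch the pool inject replacements.
for (int i = 0; i < 20; i++)
    ThreadPool.QueueUserWorkItem(_ => Thread.Sleep(Timeout.Infinite));

for (int i = 0; i < 10; i++)
{
    Console.WriteLine("Threads: " + Process.GetCurrentProcess().Threads.Count);
    Thread.Sleep(1000); // the count climbs as the pool adds roughly one thread per second
}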
The correct solution here is to set MaxDegreeOfParallelism. If your computations are CPU-bound, the best value is most likely Environment.ProcessorCount. If they are IO-bound, you will have to find out the best value experimentally.
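Applied to the code above, that could look like this (a sketch; it reuses the question's m_Buffer and Run):

static void Consumer()
{
    var options = new ParallelOptions
    {
        MaxDegreeOfParallelism = Environment.ProcessorCount
    };
    Parallel.ForEach(m_Buffer.GetConsumingEnumerable(), options, Run);
}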
Another option, if you can use .NET 4.5, is to use TPL Dataflow. This library was made specifically to process streams of data, like you have, so it doesn't have the problems your code has. It's actually even better than that: it doesn't use any threads at all when there is currently nothing to process.
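For example, a minimal sketch of the same consumer as a TPL Dataflow ActionBlock (from the System.Threading.Tasks.Dataflow package; the Run method is the one from the question, and the capacity value is illustrative):

using System.Threading.Tasks.Dataflow;

var worker = new ActionBlock<int>(
    i => Run(i),
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = Environment.ProcessorCount,
        BoundedCapacity = 16 // bounds the input queue, including items in flight
    });

// Producer side: SendAsync awaits when the block is full instead of blocking a thread.
await worker.SendAsync(0);

worker.Complete();       // signal no more items
await worker.Completion; // wait for in-flight work to finish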
Note: there is also a good reason why a new thread is created for each new item, but explaining that would require explaining how Parallel.ForEach() works in more detail, and I feel that's not necessary here.

Related

How does asynchronous programming work with threads when using Thread.Sleep()?

Presumptions/Prelude:
In previous questions, we note that Thread.Sleep blocks threads; see: When to use Task.Delay, when to use Thread.Sleep?.
We also note that console apps have three threads: the main thread, the GC thread & the finalizer thread, IIRC. All other threads are debugger threads.
We know that async does not spin up new threads, and it instead runs on the synchronization context, "uses time on the thread only when the method is active". https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/task-asynchronous-programming-model
Setup:
In a sample console app, we can see that neither the sibling nor the parent code is affected by a call to Thread.Sleep, at least until the await is called (unknown if further).
var sw = new Stopwatch();
sw.Start();
Console.WriteLine($"{sw.Elapsed}");
var asyncTests = new AsyncTests();
var go1 = asyncTests.WriteWithSleep();
var go2 = asyncTests.WriteWithoutSleep();
await go1;
await go2;
sw.Stop();
Console.WriteLine($"{sw.Elapsed}");

Stopwatch sw1 = new Stopwatch();

public async Task WriteWithSleep()
{
    sw1.Start();
    await Task.Delay(1000);
    Console.WriteLine("Delayed 1 seconds");
    Console.WriteLine($"{sw1.Elapsed}");
    Thread.Sleep(9000);
    Console.WriteLine("Delayed 10 seconds");
    Console.WriteLine($"{sw1.Elapsed}");
    sw1.Stop();
}

public async Task WriteWithoutSleep()
{
    await Task.Delay(3000);
    Console.WriteLine("Delayed 3 second.");
    Console.WriteLine($"{sw1.Elapsed}");
    await Task.Delay(6000);
    Console.WriteLine("Delayed 9 seconds.");
    Console.WriteLine($"{sw1.Elapsed}");
}
Question:
If the thread is blocked from execution during Thread.Sleep, how is it that it continues to process the parent and sibling? Some answer that it is background threads, but I see no evidence of multithreading background threads. What am I missing?
I see no evidence of multithreading background threads. What am I missing?
Possibly you are looking in the wrong place, or using the wrong tools. There's a handy property that might be of use to you, in the form of Thread.CurrentThread.ManagedThreadId. According to the docs,
A thread's ManagedThreadId property value serves to uniquely identify that thread within its process.
The value of the ManagedThreadId property does not vary over time
This means that all code running on the same thread will always see the same ManagedThreadId value. If you sprinkle some extra WriteLines into your code, you'll be able to see that your tasks may run on several different threads during their lifetimes. It is even entirely possible for some async applications to have all their tasks run on the same thread, though you probably won't see that behaviour in your code under normal circumstances.
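For instance, a hypothetical instrumented version of your two methods (the added WriteLine calls are purely for illustration) might look like this:

public async Task WriteWithSleep()
{
    Console.WriteLine($"* WriteWithSleep on thread {Thread.CurrentThread.ManagedThreadId} before await");
    await Task.Delay(1000);
    Console.WriteLine($"* WriteWithSleep on thread {Thread.CurrentThread.ManagedThreadId} after await");
    // ... rest of the method unchanged
}

public async Task WriteWithoutSleep()
{
    Console.WriteLine($"* WriteWithoutSleep on thread {Thread.CurrentThread.ManagedThreadId} before first await");
    await Task.Delay(3000);
    Console.WriteLine($"* WriteWithoutSleep on thread {Thread.CurrentThread.ManagedThreadId} after first await");
    await Task.Delay(6000);
    Console.WriteLine($"* WriteWithoutSleep on thread {Thread.CurrentThread.ManagedThreadId} after second await");
    // ... rest of the method unchanged
}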
Here's some example output from my machine, not guaranteed to be the same on yours, nor is it necessarily going to be the same output on successive runs of the same application.
00:00:00.0000030
* WriteWithSleep on thread 1 before await
* WriteWithoutSleep on thread 1 before first await
* WriteWithSleep on thread 4 after await
Delayed 1 seconds
00:00:01.0203244
* WriteWithoutSleep on thread 5 after first await
Delayed 3 second.
00:00:03.0310891
* WriteWithoutSleep on thread 6 after second await
Delayed 9 seconds.
00:00:09.0609263
Delayed 10 seconds
00:00:10.0257838
00:00:10.0898976
The business of running tasks on threads is handled by a TaskScheduler. You could write one that forces code to be single threaded, but that's not often a useful thing to do. The default scheduler uses a threadpool, and as such tasks can be run on a number of different threads.
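If you do want to constrain scheduling, a minimal sketch using the built-in ConcurrentExclusiveSchedulerPair (which serializes tasks, though not onto one fixed thread) might look like this:

// Tasks queued to the exclusive scheduler run one at a time,
// but may still hop between thread-pool threads.
var exclusive = new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler;
Task.Factory.StartNew(
        () => Console.WriteLine($"Running on thread {Thread.CurrentThread.ManagedThreadId}"),
        CancellationToken.None,
        TaskCreationOptions.None,
        exclusive)
    .Wait();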
The Task.Delay method is implemented basically like this (simplified¹):
public static Task Delay(int millisecondsDelay)
{
    var tcs = new TaskCompletionSource();
    _ = new Timer(_ => tcs.SetResult(), null, millisecondsDelay, -1);
    return tcs.Task;
}
The Task is completed on the callback of a System.Threading.Timer component, and according to the documentation this callback is invoked on a ThreadPool thread:
The method does not execute on the thread that created the timer; it executes on a ThreadPool thread supplied by the system.
So when you await the task returned by the Task.Delay method, the continuation after the await runs on the ThreadPool. The ThreadPool typically has more than one threads available immediately on demand, so it's not difficult to introduce concurrency and parallelism if you create 2 tasks at once, like you do in your example. The main thread of a console application is not equipped with a SynchronizationContext by default, so there is no mechanism in place to prevent the observed concurrency.
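You can check this yourself with a one-line sketch (assuming a plain console app):

// Prints True: no SynchronizationContext is installed on a console app's main
// thread, so await continuations are free to run on the thread pool.
Console.WriteLine(SynchronizationContext.Current == null);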
¹ For demonstration purposes only. The Timer reference is not stored anywhere, so it might be garbage collected before the callback is invoked, resulting in the Task never completing.
I am not accepting my own answer; I will accept someone else's answer because they helped me figure this out. First, in the context of my question, I was using async Main. It was very hard to choose between Theodor's & Rook's answers. However, Rook's answer provided me with one thing that helped me fish: Thread.CurrentThread.ManagedThreadId.
These are the results of my running code:
1 00:00:00.0000767
Not Delayed.
1 00:00:00.2988809
Delayed 1 second.
4 00:00:01.3392148
Delayed 3 second.
5 00:00:03.3716776
Delayed 9 seconds.
5 00:00:09.3838139
Delayed 10 seconds
4 00:00:10.3411050
4 00:00:10.5313519
I notice that there are 3 threads here. The initial thread (1) runs the first calling method and part of WriteWithSleep() until Task.Delay is initialized and awaited. Once the await on Task.Delay resumes, everything runs on Thread 4 instead of Thread 1, both for the main method and for the remainder of WriteWithSleep.
WriteWithoutSleep uses its own Thread(5).
So my error was believing that there were only 3 threads. I believed the answer to this question: https://stackoverflow.com/questions/3476642/why-does-this-simple-net-console-app-have-so-many-threads
However, that question may not have been async, or may not have considered these additional worker threads from the threadpool.
Thank you all for your assistance in figuring out this question.

Task starts with delay

I create and start task in following way:
Task task = new Task(() => controller.Play());
task.Start();
For some reason, the task sometimes gets started with around a 7-10 second delay.
I use 6 tasks in parallel; the max number of tasks is 32767 and 32759 are available,
which is what I log before I create the task, so it can't be that the max number of tasks has been reached. I write a log entry at the first line of code in the controller.Play() method that the task should execute, so there is no lock or anything else that could make the task wait.
Long running tasks, like your deserialization of 100MB that takes 10 seconds, should be, hm, well, run as long-running tasks :-)
Long-running tasks are, as per the current implementation, always run on a dedicated thread and they do not put pressure on the thread-pool.
In your case, you perhaps have only two tasks - the deserialization and the player. The TaskScheduler works under the assumption that tasks are short-lived, and in this case it obviously schedules the "player" task to run after the "deserialization" one.
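A minimal sketch of running the player as a long-running task, assuming controller.Play() is the long-running work:

// TaskCreationOptions.LongRunning requests a dedicated thread,
// bypassing the thread pool and its injection heuristics.
Task task = Task.Factory.StartNew(
    () => controller.Play(),
    CancellationToken.None,
    TaskCreationOptions.LongRunning,
    TaskScheduler.Default);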

Why is the Completed callback from SocketAsyncEventArgs frequently executed in newly created threads instead of using a bounded thread pool?

I have a simple client application that receives byte buffers from the network with a low throughput. Here is the code:
private static readonly HashSet<int> _capturedThreadIds = new HashSet<int>();

// Assumed P/Invoke declaration for the GetCurrentThreadId call used below:
// [DllImport("kernel32.dll")]
// private static extern uint GetCurrentThreadId();

private static void RunClient(Socket socket)
{
    var e = new SocketAsyncEventArgs();
    e.SetBuffer(new byte[10000], 0, 10000);
    e.Completed += SocketAsyncEventsArgsCompleted;
    Receive(socket, e);
}

private static void Receive(Socket socket, SocketAsyncEventArgs e)
{
    var isAsynchronous = socket.ReceiveAsync(e);
    if (!isAsynchronous)
        SocketAsyncEventsArgsCompleted(socket, e); // completed synchronously
}

private static void SocketAsyncEventsArgsCompleted(object sender, SocketAsyncEventArgs e)
{
    if (e.LastOperation != SocketAsyncOperation.Receive || e.SocketError != SocketError.Success || e.BytesTransferred <= 0)
    {
        Console.WriteLine("Operation: {0}, Error: {1}, BytesTransferred: {2}", e.LastOperation, e.SocketError, e.BytesTransferred);
        return;
    }

    var thread = Thread.CurrentThread;
    if (_capturedThreadIds.Add(thread.ManagedThreadId))
        Console.WriteLine("New thread, ManagedId: " + thread.ManagedThreadId + ", NativeId: " + GetCurrentThreadId());

    //Console.WriteLine(e.BytesTransferred);
    Receive((Socket)sender, e);
}
The threading behavior of the application is quite curious:
The SocketAsyncEventsArgsCompleted method is frequently run in new threads. I would have expected that after some time no new thread would be created. I would have expected the threads to be reused, because of the thread pool (or IOCP thread pool) and because the throughput is very stable.
The number of threads stays low, but I can see in the process explorer that threads are frequently created and destroyed. Likewise, I would not have expected threads to be created or destroyed.
Can you explain the application behavior?
Edit: The "low" throughput is 20 messages per second (roughly 200 KB/s). If I increase the throughput to more than 1000 messages per second (50 MB/s), the application behavior does not change.
The low application throughput itself cannot explain the thread creation and destruction. The socket receives 20 messages per second, which is more than enough to keep a thread alive (waiting threads are destroyed after spending 10 seconds idle).
This problem is related to the thread pool thread injection, i.e. the threads creation and destruction strategy. Thread pool threads are regularly injected and destroyed in order to measure the impact of new threads on the thread pool throughput.
This is called thread probing. It is clearly explained in the Channel 9 video CLR 4 - Inside the Thread Pool (jump to 26:30).
It seems like thread probing is always done with newly created threads instead of moving a thread in and out of the pool. I suppose it works better like this for most applications because it avoids keeping an unused thread alive.
From MSDN:

"Beginning with the .NET Framework 4, the thread pool creates and destroys worker threads in order to optimize throughput, which is defined as the number of tasks that complete per unit of time. Too few threads might not make optimal use of available resources, whereas too many threads could increase resource contention.

Note: When demand is low, the actual number of thread pool threads can fall below the minimum values."

Basically it sounds like your low throughput is causing the thread pool to destroy threads, since they are not required and are just sitting there taking up resources. I wouldn't worry about it. As MS explicitly state:

"In most cases the thread pool will perform better with its own algorithm for allocating threads."
If you're really bothered, you could always poll ThreadPool.GetAvailableThreads() to watch the pool, and see how different network throughputs affect it.
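For instance, a rough polling sketch (illustrative only):

while (true)
{
    ThreadPool.GetMaxThreads(out int maxWorkers, out int maxIo);
    ThreadPool.GetAvailableThreads(out int freeWorkers, out int freeIo);
    Console.WriteLine("Busy worker threads: {0}, busy IOCP threads: {1}",
        maxWorkers - freeWorkers, maxIo - freeIo);
    Thread.Sleep(1000);
}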

CLR via C# 4th Ed. - Confused about waiting for Task deadlock

Jeffrey Richter pointed out in his book 'CLR via C#' the example of a possible deadlock I don't understand (page 702, bordered paragraph).
The example is a thread that runs a Task and calls Wait() on that Task. If the Task has not started yet, it's possible that the Wait() call does not block and instead runs the not-yet-started Task inline. If a lock is entered before the Wait() call, and the Task also tries to enter this lock, this can result in a deadlock.
But the locks are entered on the same thread; should this end up in a deadlock scenario?
The following code produces the expected output.
class Program
{
    static object lockObj = new object();

    static void Main(string[] args)
    {
        Task.Run(() =>
        {
            Console.WriteLine("Program starts running on thread {0}",
                Thread.CurrentThread.ManagedThreadId);
            var taskToRun = new Task(() =>
            {
                lock (lockObj)
                {
                    for (int i = 0; i < 10; i++)
                        Console.WriteLine("{0} from Thread {1}",
                            i, Thread.CurrentThread.ManagedThreadId);
                }
            });
            taskToRun.Start();
            lock (lockObj)
            {
                taskToRun.Wait();
            }
        }).Wait();
    }
}

/* Console output
Program starts running on thread 3
0 from Thread 3
1 from Thread 3
2 from Thread 3
3 from Thread 3
4 from Thread 3
5 from Thread 3
6 from Thread 3
7 from Thread 3
8 from Thread 3
9 from Thread 3
*/
No deadlock occured.
J. Richter wrote in his book "CLR via C#" 4th Edition on page 702:
When a thread calls the Wait method, the system checks if the Task that the thread is waiting for has started executing. If it has, then the thread calling Wait will block until the Task has completed running. But if the Task has not started executing yet, then the system may (depending on the TaskScheduler) execute the Task by using the thread that called Wait. If this happens, then the thread calling Wait does not block; it executes the Task and returns immediately. This is good in that no thread has blocked, thereby reducing resource usage (by not creating a thread to replace the blocked thread) while improving performance (no time is spent to create a thread and there is no context switching). But it can also be bad if, for example, the thread has taken a thread synchronization lock before calling Wait and then the Task tries to take the same lock, resulting in a deadlocked thread!
If I understand the paragraph correctly, the code above has to end in a deadlock!?
You're taking my usage of the word "lock" too literally. The C# "lock" statement (which my book discourages the use of), internally leverages Monitor.Enter/Exit. The Monitor lock is a lock that supports thread ownership & recursion. Therefore, a single thread can acquire this kind of lock multiple times successfully. But, if you use a different kind of lock, like a Semaphore(Slim), an AutoResetEvent(Slim) or a ReaderWriterLockSlim (without recursion), then when a single thread tries to acquire any of these locks multiple times, deadlock occurs.
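To illustrate the difference, here is a small sketch (not from the book): Monitor tracks thread ownership and supports recursion, whereas SemaphoreSlim does not.

var gate = new object();
lock (gate)
{
    // Fine: Monitor is reentrant for the thread that already owns it.
    lock (gate) { }
}

var sem = new SemaphoreSlim(1, 1);
sem.Wait();
sem.Wait(); // never returns: SemaphoreSlim has no notion of an owning thread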
In this example, you're dealing with task inlining, a not-so-rare behavior of the TPL's default task scheduler. It results in the task being executed on the same thread which is already waiting for it with Task.Wait(), rather than on a random pool thread. In which case, there is no deadlock.
Change your code like below and you'll have a deadlock:
taskToRun.Start();
lock (lockObj)
{
    //taskToRun.Wait();
    ((IAsyncResult)taskToRun).AsyncWaitHandle.WaitOne();
}
The task inlining is nondeterministic, it may or may not happen. You should make no assumptions. Check Task.Wait and “Inlining” by Stephen Toub for more details.
Updated: the lock does not affect the task inlining here. Your code still runs without deadlock if you move taskToRun.Start() inside the lock:
lock (lockObj)
{
    taskToRun.Start();
    taskToRun.Wait();
}
What does cause the inlining here is that the main thread calls taskToRun.Wait() right after taskToRun.Start(). Here's what happens behind the scenes:
taskToRun.Start() queues the task for execution by the task scheduler, but it hasn't been allocated a pool thread yet.
On the same thread, the TPL code inside taskToRun.Wait() checks if the task has already been allocated a pool thread (it hasn't) and executes it inline on the main thread. In which case, it's OK to acquire the same lock twice without a deadlock.
There is also a TPL Task Scheduler thread. If this thread gets a chance to execute before taskToRun.Wait() is called on the main thread, inlining doesn't happen and you get a deadlock. Adding Thread.Sleep(100) before Task.Wait() would be modelling this scenario. Inlining also doesn't happen if you don't use Task.Wait() and rather use something like AsyncWaitHandle.WaitOne() above.
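A sketch of that Thread.Sleep(100) scenario, for illustration (not part of the original answer):

taskToRun.Start();
lock (lockObj)
{
    Thread.Sleep(100); // give the scheduler a chance to start the task on a pool thread
    taskToRun.Wait();  // the task is now running elsewhere and blocked on lockObj: deadlock
}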
As to the quote you've added to your question, it depends on how you read it. One thing is for sure: the same lock from the main thread can be entered inside the task, when the task gets inlined, without a deadlock. You just cannot make any assumptions that it will get inlined.
In your example, no deadlock occurs because the thread scheduling the task and the thread executing the task happen to be the same. If you were to modify the code such that your task ran on a different thread, you would see the deadlock occur, because two threads would then be contending for a lock on the same object.
Your example, modified to create a deadlock:
class Program
{
    static object lockObj = new object();

    static void Main(string[] args)
    {
        Console.WriteLine("Program starts running on thread {0}",
            Thread.CurrentThread.ManagedThreadId);
        var taskToRun = new Task(() =>
        {
            lock (lockObj)
            {
                for (int i = 0; i < 10; i++)
                    Console.WriteLine("{0} from Thread {1}",
                        i, Thread.CurrentThread.ManagedThreadId);
            }
        });
        lock (lockObj)
        {
            taskToRun.Start();
            taskToRun.Wait();
        }
    }
}
This example code has two standard threading problems. To understand it, you first have to understand thread races. When you start a thread, you can never assume it will start running right away. Nor can you assume that the code inside the thread arrives at a particular statement at a particular moment in time.
What matters a great deal here is whether or not the task arrives at the lock statement before the main thread does. In other words, whether it races ahead of the code in the main thread. Do model this as a horse race: the thread that acquires the lock is the horse that wins.
If it is the task that wins, pretty common on modern machines with multiple processor cores or a simple program that doesn't have any other threads active (and probably when you test the code) then nothing goes wrong. It acquires the lock and prevents the main thread from doing the same when it, later, arrives at the lock statement. So you'll see the console output, the task finishes, the main thread now acquires the lock and the Wait() call quickly completes.
But if the thread pool is already busy with other threads, or the machine is busy executing threads in other programs, or you are unlucky and you get an email just as the task starts running, then the code in the task doesn't start running right away and it is the main thread that acquires the lock first. The task can now no longer enter the lock statement so it cannot complete. And the main thread cannot complete; Wait() will never return. A deadly embrace called deadlock.
Deadlock is relatively easy to debug, you've got all the time in the world to attach a debugger and look at the active threads to see why they are blocked. Threading race bugs are incredibly difficult to debug, they happen too infrequently and it can be very difficult to reason through the ordering problem that causes them. A common approach to diagnose thread races is to add tracing to the program so you can see the order. Which changes the timing and can make the bug disappear. Lots of programs were shipped with the tracing left on because they couldn't diagnose the problem :)
Thanks @jeffrey-richter for pointing it out. @embee, there are scenarios where we use locks other than Monitor; when a single thread tries to acquire any of these locks multiple times, deadlock occurs. Check out the example below.
The following code produces the expected deadlock. It need not be a nested task; the deadlock can occur without nesting also:
class Program
{
    static AutoResetEvent signalEvent = new AutoResetEvent(false);

    static void Main(string[] args)
    {
        Task.Run(() =>
        {
            Console.WriteLine("Program starts running on thread {0}",
                Thread.CurrentThread.ManagedThreadId);
            var taskToRun = new Task(() =>
            {
                signalEvent.WaitOne();
                for (int i = 0; i < 10; i++)
                    Console.WriteLine("{0} from Thread {1}",
                        i, Thread.CurrentThread.ManagedThreadId);
            });
            taskToRun.Start();
            signalEvent.Set();
            taskToRun.Wait();
        }).Wait();
    }
}

Parallel.ForEach keeps spawning new threads

While I was using Parallel.ForEach in my program, I found that some threads never seemed to finish. In fact, it kept spawning new threads over and over, a behaviour that I wasn't expecting and definitely don't want.
I was able to reproduce this behaviour with the following code which, just like my 'real' program, both uses processor and memory a lot (.NET 4.0 code):
public class Node
{
    public Node Previous { get; private set; }

    public Node(Node previous)
    {
        Previous = previous;
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        DateTime startMoment = DateTime.Now;
        int concurrentThreads = 0;

        var jobs = Enumerable.Range(0, 2000);
        Parallel.ForEach(jobs, delegate(int jobNr)
        {
            Interlocked.Increment(ref concurrentThreads);

            int heavyness = jobNr % 9;

            // Give the processor and the garbage collector something to do...
            List<Node> nodes = new List<Node>();
            Node current = null;
            for (int y = 0; y < 1024 * 1024 * heavyness; y++)
            {
                current = new Node(current);
                nodes.Add(current);
            }

            TimeSpan elapsed = DateTime.Now - startMoment;
            int threadsRemaining = Interlocked.Decrement(ref concurrentThreads);
            Console.WriteLine("[{0:mm\\:ss}] Job {1,4} complete. {2} threads remaining.",
                elapsed, jobNr, threadsRemaining);
        });
    }
}
When run on my quad-core, it initially starts off with 4 concurrent threads, just as you would expect. However, over time more and more threads are being created. Eventually, this program throws an OutOfMemoryException:
[00:00] Job 0 complete. 3 threads remaining.
[00:01] Job 1 complete. 4 threads remaining.
[00:01] Job 2 complete. 4 threads remaining.
[00:02] Job 3 complete. 4 threads remaining.
[00:05] Job 9 complete. 5 threads remaining.
[00:05] Job 4 complete. 5 threads remaining.
[00:05] Job 5 complete. 5 threads remaining.
[00:05] Job 10 complete. 5 threads remaining.
[00:08] Job 11 complete. 5 threads remaining.
[00:08] Job 6 complete. 5 threads remaining.
...
[00:55] Job 67 complete. 7 threads remaining.
[00:56] Job 81 complete. 8 threads remaining.
...
[01:54] Job 107 complete. 11 threads remaining.
[02:00] Job 121 complete. 12 threads remaining.
..
[02:55] Job 115 complete. 19 threads remaining.
[03:02] Job 166 complete. 21 threads remaining.
...
[03:41] Job 113 complete. 28 threads remaining.
<OutOfMemoryException>
The memory usage graph for the experiment showed the following (screenshot omitted; it was in Dutch, with processor usage in the top part and memory usage in the bottom part): a new thread appears to be spawned almost every time the garbage collector gets in the way, as can be seen in the dips of memory usage.
Can anyone explain why this is happening, and what I can do about it? I just want .NET to stop spawning new threads, and finish the existing threads first...
You can limit the maximum number of threads that get created by specifying a ParallelOptions instance with the MaxDegreeOfParallelism property set:
var jobs = Enumerable.Range(0, 2000);

ParallelOptions po = new ParallelOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

Parallel.ForEach(jobs, po, jobNr =>
{
    // ...
});
As to why you're getting the behaviour you're observing: The TPL (which underlies PLINQ) is, by default, at liberty to guess the optimal number of threads to use. Whenever a parallel task blocks, the task scheduler may create a new thread in order to maintain progress. In your case, the blocking might be happening implicitly; for example, through the Console.WriteLine call, or (as you observed) during garbage collection.
From Concurrency Levels Tuning with Task Parallel Library (How Many Threads to Use?):
Since the TPL default policy is to use one thread per processor, we can conclude that TPL initially assumes that the workload of a task is ~100% working and 0% waiting, and if the initial assumption fails and the task enters a waiting state (i.e. starts blocking) - TPL will take the liberty to add threads as appropriate.
You should probably read a bit about the how the task scheduler works.
Parallel Programming with Microsoft .NET - Parallel Tasks
(latter half of the page)
"The .NET thread pool automatically manages the number of worker
threads in the pool. It adds and removes threads according to built-in
heuristics. The .NET thread pool has two main mechanisms for injecting
threads: a starvation-avoidance mechanism that adds worker threads if
it sees no progress being made on queued items and a hill-climbing
heuristic that tries to maximize throughput while using as few threads
as possible.
The goal of starvation avoidance is to prevent deadlock. This kind of
deadlock can occur when a worker thread waits for a synchronization
event that can only be satisfied by a work item that is still pending
in the thread pool's global or local queues. If there were a fixed
number of worker threads, and all of those threads were similarly
blocked, the system would be unable to ever make further progress.
Adding a new worker thread resolves the problem.
A goal of the hill-climbing heuristic is to improve the utilization of
cores when threads are blocked by I/O or other wait conditions that
stall the processor. By default, the managed thread pool has one
worker thread per core. If one of these worker threads becomes
blocked, there's a chance that a core might be underutilized,
depending on the computer's overall workload. The thread injection
logic doesn't distinguish between a thread that's blocked and a thread
that's performing a lengthy, processor-intensive operation. Therefore,
whenever the thread pool's global or local queues contain pending work
items, active work items that take a long time to run (more than a
half second) can trigger the creation of new thread pool worker
threads."
You can mark a task as LongRunning but this has the side effect of allocating a thread for it from outside the thread pool which means that the task cannot be inlined.
Remember that Parallel.ForEach treats the work it is given as blocks, so even if the work in one loop iteration is fairly small, the overall work done by the task invoked by the loop may appear longer to the scheduler.
Most calls to the GC in and of themselves aren't blocking (it runs on a separate thread), but if you wait for the GC to complete then this does block. Remember also that the GC rearranges memory, so this may have some side effects (and blocking) if you are trying to allocate memory while the GC is running. I don't have specifics here, but I know the PPL has some memory allocation features specifically for concurrent memory management for this reason.
Looking at your code's output, it seems that things are running for many seconds, so I'm not surprised that you are seeing thread injection. However, I seem to remember that the default thread pool size is roughly 30 threads (probably depending on the number of cores on your system). A thread takes up roughly a MB of memory before your code allocates any more, so I'm not clear why you could get an out-of-memory exception here.
I've posted the follow-up question "How to count the amount of concurrent threads in .NET application?"
If you count the threads directly, their number in Parallel.For() mostly only increases (very rarely, and insignificantly, does it decrease) and is not released after loop completion.
Checked this in both Release and Debug mode, with
ParallelOptions po = new ParallelOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};
and without
The numbers vary but the conclusions are the same.
Here is the ready-to-run code I was using, if someone wants to play with it:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace Edit4Posting
{
    public class Node
    {
        public Node Previous { get; private set; }

        public Node(Node previous)
        {
            Previous = previous;
        }
    }

    public class Edit4Posting
    {
        public static void Main(string[] args)
        {
            int concurrentThreads = 0;
            int directThreadsCount = 0;
            int diagThreadCount = 0;

            var jobs = Enumerable.Range(0, 160);
            ParallelOptions po = new ParallelOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount
            };
            Parallel.ForEach(jobs, po, delegate(int jobNr)
            //Parallel.ForEach(jobs, delegate(int jobNr)
            {
                int threadsRemaining = Interlocked.Increment(ref concurrentThreads);

                int heavyness = jobNr % 9;

                // Give the processor and the garbage collector something to do...
                List<Node> nodes = new List<Node>();
                Node current = null;
                //for (int y = 0; y < 1024 * 1024 * heavyness; y++)
                for (int y = 0; y < 1024 * 24 * heavyness; y++)
                {
                    current = new Node(current);
                    nodes.Add(current);
                }

                //*******************************
                directThreadsCount = Process.GetCurrentProcess().Threads.Count;
                //*******************************

                threadsRemaining = Interlocked.Decrement(ref concurrentThreads);
                Console.WriteLine("[Job {0} complete. {1} threads remaining but directThreadsCount == {2}",
                    jobNr, threadsRemaining, directThreadsCount);
            });
            Console.WriteLine("FINISHED");
            Console.ReadLine();
        }
    }
}
