C# Multithreading and pooling

Hello fellow developers,
I have a question about implementing multi-threading on my .NET (Framework 4.0) Windows Service.
Basically, what the service should be doing is the following:
Scans the filesystem (a specific directory) to see if there are files to process
If there are files that need to be processed, it should be using a thread pooling mechanism to issue threads up to a predetermined amount.
Each thread will perform an upload operation of a single file
As soon as one thread completes, the filesystem is scanned again to see if there are other files to process (I want to avoid having two threads perform the operation on the same file)
I am struggling to find a way that will allow me to do just that last step.
Right now, I have a function, running in the main thread, that retrieves the maximum number of concurrent threads:
int maximumNumberOfConcurrentThreads = getMaxThreads(databaseConnection);
Then, still in the main thread, I have a function that scans the directory and returns a list of the files to process:
List<FileToUploadInfo> filesToUpload = getFilesToUploadFromFS(directory);
After this, I call the following function:
generateThreads(maximumNumberOfConcurrentThreads, filesToUpload);
Each thread should be calling the below function (returns void):
uploadFile(fileToUpload, databaseConnection, currentThread);
Right now, the way the program is structured, if the maximum number of threads is set to, say, 5, I grab 5 elements from the list and upload them.
As soon as all 5 are done, I grab 5 more and do the same until I don't have any left, as per the code below.
for (int index = 0; index < filesToUpload.Count; index = index + maximumNumberOfConcurrentThreads) {
    try {
        Parallel.For(0, maximumNumberOfConcurrentThreads, iteration => {
            if (index + iteration < filesToUpload.Count) {
                uploadFile(filesToUpload[index + iteration], databaseConnection, iteration);
            }
        });
    }
    catch (System.ArgumentOutOfRangeException outOfRange) {
        debug("Exception in Parallel.For [" + outOfRange.Message + "]");
    }
}
However, if 4 of the files are small and each takes 5 seconds to upload, while the remaining one is big and takes 30 minutes, then once the 4 small files are done I will have only one file uploading, and I need to wait for it to finish before starting to upload the others in the list.
After finishing uploading all the files in the list, my service goes to sleep, and then, when it wakes up again, it scans the file system again.
What is the strategy that best fits my needs? Is it advisable to go this route or will it create concurrency nightmares? I need to avoid uploading any file twice.
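For reference, here is a minimal sketch of one .NET 4.0 option that drops the fixed batches: let Parallel.ForEach walk the entire list with a capped MaxDegreeOfParallelism, so each file is dispatched exactly once and a worker slot frees up as soon as any single upload finishes. This assumes uploadFile and databaseConnection tolerate concurrent use, just as the existing Parallel.For already implies.
var options = new ParallelOptions
{
    // Cap the number of files uploading at the same time.
    MaxDegreeOfParallelism = maximumNumberOfConcurrentThreads
};
Parallel.ForEach(filesToUpload, options, (fileToUpload, loopState, iteration) =>
{
    // iteration is the long index supplied by Parallel.ForEach, used here in place
    // of the batch-local index from the original code.
    uploadFile(fileToUpload, databaseConnection, (int)iteration);
});
A rescan for newly arrived files would still happen on the next service wake-up, as in the current design.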

Related

How to track custom metrics in Application Insights from an Azure .NET Function app

I have a .NET Azure Function app that is initially triggered by a timer, but that just starts a process that puts a folder in a queue. The rest of the functions in this app are queue-triggered. They essentially process folders in Azure Storage, looking for files to process, and place those files in another queue. An instance of one of the functions in this app is spawned for each file in the queue, and it is from this function that I would like to send custom metrics to Application Insights.
This function evaluates the file and determines which of 4 actions it should take with it. There can be hundreds of instances of this function running simultaneously.
I want to track this data, essentially to see, for a given time period, how many files required action one, how many action two, and so on.
The results of this data gathering for a given run should be something like this:
Files processed: 894
Files new: 793
Files Changed: 25
Files Cleaned Up: 74
So I'm essentially wanting to tick up multiple counters.
Here are the two main documents I have been referencing:
Application Insights API for custom events and metrics
Custom metric collection in .NET and .NET Core
I understand the key point that I don't want to transmit telemetry data from each instance of this function directly to Application Insights, as that would be costly in resources, so using GetMetric() is the recommended course. That said, the examples involve creating a TelemetryClient in a context that can be referenced by all the code that needs to report data, and I'm unsure how I can do this effectively and efficiently within such a highly parallel function app.
I would love to find that there is a fairly straightforward pattern for this that does not involve some kind of singleton/durable function, but that might be the only way.
To be more specific on the code front, here is the general structure:
Timer Function
When triggered, puts one or more primary folders (PF) in a primary processing queue (PPQ)
Primary Folder Function
Triggered by the PPQ.
Processes all subfolders (SF) of the PF
SF's are placed into a SF queue (SFQ)
Files are placed into the file queue (FQ)
Subfolder Function
Triggered by the SFQ
Processes contents of subfolder (SF)
New subfolders are placed into a SF queue (SFQ)
Files are placed into the file queue (FQ)
File Function (FF)
Triggered by the FQ
Processes a given file
This is the primary point where I want to transmit one or more results to tick a counter (or counters)
The only thing I have tried so far is writing data directly from a timer-triggered test function in my app, just to see the Application Insights part working.
public class FunctionTesting
{
private readonly TelemetryClient telemetryClient;
public FunctionTesting(TelemetryConfiguration telemetryConfiguration)
{
this.telemetryClient = new TelemetryClient(telemetryConfiguration);
}
[FunctionName("FunctionTesting")]
public async Task Run(
[TimerTrigger("*/5 * * * * *")] TimerInfo timer,
ExecutionContext context,
ILogger log)
{
// Generate a random number between 1 and 100
var random = new Random();
var num = random.Next(1, 101);
// Log the number to Application Insights
this.telemetryClient.TrackMetric("RandomNumber", num);
// Check if the number is even or odd, and less than or greater than 50
if (num % 2 == 0)
{
this.telemetryClient.TrackEvent("EvenNumber");
}
else
{
this.telemetryClient.TrackEvent("OddNumber");
}
if (num < 50)
{
this.telemetryClient.TrackEvent("LessThan50");
}
else
{
this.telemetryClient.TrackEvent("GreaterThan50");
}
log.LogInformation($"Generated random number: {num}");
}
}
My plan, if I'm able to find a pattern that works, is to implement and test it in this test function before adding it to the rest of my code.
Thanks for any help!
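For what it's worth, here is a minimal sketch of how GetMetric() could be wrapped for a function like this; the metric name "FilesProcessed", the "Outcome" dimension and the wrapper class are placeholders, not taken from the docs above.
using Microsoft.ApplicationInsights;

public class FileMetrics
{
    private readonly Metric _fileOutcomes;

    public FileMetrics(TelemetryClient telemetryClient)
    {
        // GetMetric() pre-aggregates values locally and flushes the aggregates
        // periodically, so hundreds of parallel invocations do not each emit a
        // raw telemetry item per file.
        _fileOutcomes = telemetryClient.GetMetric("FilesProcessed", "Outcome");
    }

    public void RecordOutcome(string outcome)
    {
        // outcome would be one of the four actions, e.g. "New", "Changed", "CleanedUp".
        _fileOutcomes.TrackValue(1, outcome);
    }
}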

Last batch never uploads to Solr when uploading batches of data from json file stream

This might be a long shot but I might as well try here. There is a block of C# code that is rebuilding a Solr core. The steps are as follows:
Delete all the existing documents
Get the core entities
Split the entities into batches of 1000
Spin off threads to perform the next set of processes:
Serialize each batch to JSON and write the JSON to a file on the server hosting the core
Send a command to the core to upload that file using System.Net.WebClient solrurl/corename/update/json?stream.file=myfile.json&stream.contentType=application/json;charset=utf-8
Delete the file. I've also tried deleting the files after all the batches are done, as well as not deleting the files at all
After all batches are done it commits. I've also tried committing after each batch is done.
My problem is that the last batch will not upload if it's much smaller than the batch size. It flows through as if the command was called, but nothing happens. It throws no exceptions and I see no errors in the Solr logs. My questions are: why, and how can I ensure the last batch always gets uploaded? We think it's a timing issue, but we've added Thread.Sleep(30000) in many parts of the code to test that theory and it still happens.
The only time it doesn't happen is:
if the batch is full or almost full
we don't run it on multiple threads
we put a break point at the File.Delete line on the last batch and wait for 30 seconds or so, then continue
Here is the code for writing the file and calling the update command. This is called for each batch.
private const string
FileUpdateCommand = "{1}/update/json?stream.file={0}&stream.contentType=application/json;charset=utf-8",
SolrFilesDir = @"\\MYSERVER\SolrFiles",
SolrFileNameFormat = SolrFilesDir + @"\{0}-{1}.json",
_solrUrl = "http://MYSERVER:8983/solr/",
CoreName = "MyCore";
public void UpdateCoreByFile(List<CoreModel> items)
{
if (items.Count == 0)
return;
var settings = new JsonSerializerSettings { DateTimeZoneHandling = DateTimeZoneHandling.Utc };
var dir = new DirectoryInfo(SolrFilesDir);
if (!dir.Exists)
dir.Create();
var filename = string.Format(SolrFileNameFormat, Guid.NewGuid(), CoreName);
using (var sw = new StreamWriter(filename))
{
sw.Write(JsonConvert.SerializeObject(items, settings));
}
var file = HttpUtility.UrlEncode(filename);
var command = string.Format(FileUpdateCommand, file, CoreName);
using (var client = _clientFactory.GetClient())//System.Net.WebClient
{
client.DownloadData(new Uri(_solrUrl + command));
}
//Thread.Sleep(30000);//doesn't work if I add this
File.Delete(filename);//works here if add breakpoint and wait 30 sec or so
}
I'm just trying to figure out why this is happening and how to address it. I hope this makes sense, and I have provided enough information and code. Thanks for any help.
Since changing the size of the data set and adding a breakpoint "fixes" it, this is most certainly a race condition. Since you haven't added the code that actually indexes the content, it's impossible to say what the issue really is, but my guess is that the last commit happens before all the threads have finished, and only works when all threads are done (if you sleep the threads, the issue will still be there, since all threads sleep for the same time).
The easy fix: use commitWithin instead, and never issue explicit commits. The commitWithin parameter makes sure that the documents become available in the index within the given time frame (given as milliseconds). To make sure that the documents you submit become available within ten seconds, append &commitWithin=10000 to your URL.
If there are already documents pending a commit, the documents added will be committed before the ten seconds have elapsed, but even if there's just one last document being submitted as the last batch, it'll never be more than ten seconds before it becomes visible (.. and there will be no documents left forever in a non-committed limbo).
That way you won't have to keep your threads synchronized or issue a final commit, as long as you wait until all threads have finished before exiting your application (if it's an application that actually terminates).
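Applied to the code in the question, that could be as simple as extending the existing FileUpdateCommand constant (a sketch; ten seconds is just the example value from above):
private const string FileUpdateCommand =
    "{1}/update/json?stream.file={0}" +
    "&stream.contentType=application/json;charset=utf-8" +
    "&commitWithin=10000"; // let Solr commit each batch itself within 10 seconds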

Why is this eating memory?

I wrote an application whose purpose is to read logs from a large table (90 million rows) and process them into easily understandable stats: how many, how long, etc.
The first run took 7.5 hours and only had to process 27 million of the 90 million rows. I would like to speed this up, so I am trying to run the queries in parallel. But when I run the code below, within a couple of minutes I crash with an Out of Memory exception.
Environments:
Sync
Test : 26 Applications, 15 million logs, 5 million retrieved, < 20mb, takes 20 seconds
Production: 56 Applications, 90 million logs, 27 million retrieved, < 30mb, takes 7.5 hours
Async
Test : 26 Applications, 15 million logs, 5 million retrieved, < 20mb, takes 3 seconds
Production: 56 Applications, 90 million logs, 27 million retrieved, Memory Exception
public void Run()
{
List<Application> apps;
//Query for apps
using (var ctx = new MyContext())
{
apps = ctx.Applications.Where(x => x.Type == "TypeIWant").ToList();
}
var tasks = new Task[apps.Count];
for (int i = 0; i < apps.Count; i++)
{
var app = apps[i];
tasks[i] = Task.Run(() => Process(app));
}
//try catch
Task.WaitAll(tasks);
}
public void Process(Application app)
{
//Query for logs for time period
using (var ctx = new MyContext())
{
var logs = ctx.Logs.Where(l => l.Id == app.Id).AsNoTracking();
foreach (var log in logs)
{
Interlocked.Increment(ref _totalLogsRead);
var l = log;
Task.Run(() => ProcessLog(l, app.Id));
}
}
}
Is it ill-advised to create 56 contexts?
Do I need to dispose and re-create contexts after a certain number of logs retrieved?
Perhaps I'm misunderstanding how the IQueryable is working? <-- My Guess
My understanding is that it will retrieve logs as needed; I guess that means the loop acts like a yield? Or is my issue that 56 'threads' call the database and I am storing 27 million logs in memory?
Side question
The results don't really scale together. Based on the Test environment results, I would expect Production to only take a few minutes. I assume the increase is directly related to the number of records in the table.
With 27 million rows the problem is one of stream processing, not parallel execution. You need to approach the problem as you would with SQL Server's SSIS or any other ETL tool: each processing step is a transformation that processes its input and sends its output to the next step.
Parallel processing is achieved by using a separate thread to run each step. Some steps could also use multiple threads to process multiple inputs up to a limit. Setting limits to each step's thread count and input buffer ensures you can achieve maximum throughput without flooding your machine with waiting tasks.
.NET's TPL Dataflow addresses exactly this scenario. It provides blocks to transform inputs to outputs (TransformBlock), split collections into individual messages (TransformManyBlock), execute actions without transformations (ActionBlock), combine data in batches (BatchBlock), etc.
You can also specify the maximum degree of parallelism for each step so that, e.g., you have only one log query executing at a time, but use 10 tasks for log processing.
In your case, you could:
Start with a TransformManyBlock that receives an application type and returns a list of app IDs
A TransformBlock reads the logs for a specific ID and sends them downstream
An ActionBlock processes the batch.
Step #3 could be broken into many other steps. E.g. if you don't need to process all app log entries together, you can use a step to process individual entries. Or you could first group them by date.
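A minimal sketch of that three-step pipeline, reusing the question's MyContext, Application and ProcessLog, and assuming the log entity type is named Log; the block options and capacities are illustrative only:
// Requires the System.Threading.Tasks.Dataflow package
// (using System.Linq; using System.Threading.Tasks.Dataflow;)
var getAppIds = new TransformManyBlock<string, int>(appType =>
{
    // Step 1: application type in, list of app IDs out.
    using (var ctx = new MyContext())
        return ctx.Applications.Where(x => x.Type == appType).Select(x => x.Id).ToList();
});

var readLogs = new TransformManyBlock<int, Log>(appId =>
{
    // Step 2: read the logs for one app ID and push them downstream.
    using (var ctx = new MyContext())
        return ctx.Logs.Where(l => l.Id == appId).AsNoTracking().ToList();
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 }); // one log query at a time

// Step 3: process each log entry; the question's query matches Log.Id to the app ID.
var processLog = new ActionBlock<Log>(log => ProcessLog(log, log.Id),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10, BoundedCapacity = 1000 });

var link = new DataflowLinkOptions { PropagateCompletion = true };
getAppIds.LinkTo(readLogs, link);
readLogs.LinkTo(processLog, link);

getAppIds.Post("TypeIWant");
getAppIds.Complete();
processLog.Completion.Wait();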
Another option is to create a custom block that reads data from the database using a DbDataReader and posts each entry to the next step immediately, instead of waiting for all rows to return. This would allow you to process each entry as it arrives, instead of waiting to receive all entries.
If each app log contains many entries, this could be a huge memory and time saver.

How to use threading effectively in a .NET console application

I have an 8-core system and I am processing a number of text files containing millions of lines, say 23 files with a huge number of lines, which takes 2 to 3 hours to finish. I am thinking of using TPL tasks for processing the text files. As of now, the code I am using processes the text files sequentially, one by one, so I am thinking of splitting it up, like 5 text files in one thread, 5 in another thread, etc. Is that a good approach, or is there any other way? I am using .NET 4.0 and the code I am using is shown below.
foreach (DataRow dtr in ds.Tables["test"].Rows)
{
string filename = dtr["ID"].ToString() + "_cfg";
try
{
foreach (var file in
Directory.EnumerateFiles(Path.GetDirectoryName(dtr["FILE_PATH"].ToString()), "*.txt"))
{
id = file.Split('\\').Last();
if (!id.Contains("GMML"))
{
strbsc = id.Split('_');
id = strbsc[0];
}
else
{
strbsc = file.Split('-');
id = ("RC" + strbsc[1]).Replace("SC", "");
}
ProcessFile(file, id, dtr["CODE"].ToString(), dtr["DOR_CODE"].ToString(), dtr["FILE_ID"].ToString());
}
}
catch (Exception ex)
{
// log the exception for this row and continue
}
}
How do I split the text files into batches so that each batch runs in a thread rather than one by one? Suppose there are 23 files: 7 in one thread, 7 in another, 7 in another, and 2 in the last thread. One more thing: I am moving all this data from the text files into an Oracle database.
EDIT
If I use something like this, will it be worth it? And how do I separate the files into batches?
Task.Factory.StartNew(() => {ProcessFile(file, id, dtr["CODE"].ToString(), dtr["DOR_CODE"].ToString(), dtr["FILE_ID"].ToString()); });
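For illustration only, a sketch of letting the TPL partition the files itself instead of hand-splitting them into batches of 7; it reuses ProcessFile and the dtr row from the code above and assumes ProcessFile, and the Oracle inserts it performs, are safe to run concurrently:
Parallel.ForEach(
    Directory.EnumerateFiles(Path.GetDirectoryName(dtr["FILE_PATH"].ToString()), "*.txt"),
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    file =>
    {
        // Derive the id exactly as in the sequential loop above.
        string fileId = file.Split('\\').Last();
        if (!fileId.Contains("GMML"))
            fileId = fileId.Split('_')[0];
        else
            fileId = ("RC" + file.Split('-')[1]).Replace("SC", "");

        ProcessFile(file, fileId, dtr["CODE"].ToString(),
                    dtr["DOR_CODE"].ToString(), dtr["FILE_ID"].ToString());
    });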
Splitting the file into multiple chunks does not seem to be a good idea because its performance boost is related to how the file is placed on your disk. But because of the async nature of disk IO operations, I strongly recommend async access to the file. There are several ways to do this and you can always choose a combination of those.
At the lowest level you can use async methods such as StreamWriter.WriteAsync() or StreamReader.ReadAsync() to access the file on disk and cooperatively let the OS know that it can switch to a new thread for disk IO and let the thread out until the Disk IO operation is finished. While it's useful to make async calls at this level, it alone does not have a significant impact on the overall performance of your application, since your app is still waiting for the disk operation to finish and does nothing in the meanwhile! (These calls can have a big impact on your software's responsiveness when they are called from the UI thread)
So, I recommend splitting your software logic into at least two separate parts running on two separate threads; One to read data from the file, and one to process the read data. You can use the provider/consumer pattern to help these threads interact.
One great data structure provided by .NET is System.Collections.Concurrent.ConcurrentQueue, which is especially useful for implementing a multithreaded provider/consumer pattern.
So you can easily do something like this:
System.Collections.Concurrent.ConcurrentQueue<string> queue = new System.Collections.Concurrent.ConcurrentQueue<string>();
bool readFinished = false;
Task tRead = Task.Run(async () =>
{
    // filePath is a placeholder for the text file being processed
    using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        using (StreamReader re = new StreamReader(fs))
        {
            while (!re.EndOfStream)
                queue.Enqueue(await re.ReadLineAsync());
        }
    }
});
Task tLogic = Task.Run(async () =>
{
    string data;
    // Keep consuming until the reader has finished and the queue is drained
    while (!readFinished || !queue.IsEmpty)
    {
        if (queue.TryDequeue(out data))
        {
            //Process data
        }
        else
        {
            await Task.Delay(100);
        }
    }
});
tRead.Wait();
readFinished = true;
tLogic.Wait();
This simple example uses StreamReader.ReadLineAsync() to read data from the file, but a good practice can be to read a fixed length of characters into a char[] buffer and add that data to the queue. You can find the optimal buffer length after some tests.
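As a rough illustration of that buffered variant (a sketch only; the 80 KB buffer size is arbitrary and ReadBlockAsync requires .NET 4.5 or later):
// Inside the reading task: read fixed-size character blocks instead of single lines.
char[] buffer = new char[81920];
int charsRead;
while ((charsRead = await re.ReadBlockAsync(buffer, 0, buffer.Length)) > 0)
{
    // Enqueue the block; the consumer then has to handle lines that span blocks.
    queue.Enqueue(new string(buffer, 0, charsRead));
}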
In the end, the real bottleneck was the mass insertion: I was checking whether the data being inserted was already present in the database. I have a status column that is set to 'Y' or 'N' via an update statement if the data is present, so the update statement combined with the insert was the culprit. After adding an index in the database, the run time dropped from 4 hours to 10 minutes. What an impact :)

Why is Parallel.ForEach much faster than AsParallel().ForAll() even though MSDN suggests otherwise?

I've been doing some investigation to see how we can create a multithreaded application that runs through a tree.
To find how this can be implemented in the best way I've created a test application that runs through my C:\ disk and opens all directories.
class Program
{
static void Main(string[] args)
{
//var startDirectory = @"C:\The folder\RecursiveFolder";
var startDirectory = @"C:\";
var w = Stopwatch.StartNew();
ThisIsARecursiveFunction(startDirectory);
Console.WriteLine("Elapsed seconds: " + w.Elapsed.TotalSeconds);
Console.ReadKey();
}
public static void ThisIsARecursiveFunction(String currentDirectory)
{
var lastBit = Path.GetFileName(currentDirectory);
var depth = currentDirectory.Count(t => t == '\\');
//Console.WriteLine(depth + ": " + currentDirectory);
try
{
var children = Directory.GetDirectories(currentDirectory);
//Edit this mode to switch what way of parallelization it should use
int mode = 3;
switch (mode)
{
case 1:
foreach (var child in children)
{
ThisIsARecursiveFunction(child);
}
break;
case 2:
children.AsParallel().ForAll(t =>
{
ThisIsARecursiveFunction(t);
});
break;
case 3:
Parallel.ForEach(children, t =>
{
ThisIsARecursiveFunction(t);
});
break;
default:
break;
}
}
catch (Exception eee)
{
//Exception might occur for directories that can't be accessed.
}
}
}
What I have encountered, however, is that when running this in mode 3 (Parallel.ForEach) the code completes in around 2.5 seconds (yes, I have an SSD ;) ). Running the code without parallelization, it completes in around 8 seconds. And running the code in mode 2 (AsParallel().ForAll()), it takes a near infinite amount of time.
When checking in process explorer I also encounter a few strange facts:
Mode1 (No Parallelization):
Cpu: ~25%
Threads: 3
Time to complete: ~8 seconds
Mode2 (AsParallel().ForAll()):
Cpu: ~0%
Threads: Increasing by one per second (I find this strange since it seems to be waiting on the other threads to complete, or on a one-second timeout.)
Time to complete: 1 second per node so about 3 days???
Mode3 (Parallel.ForEach()):
Cpu: 100%
Threads: At most 29-30
Time to complete: ~2.5 seconds
What I find especially strange is that Parallel.ForEach seems to ignore any parent threads/tasks that are still running, while AsParallel().ForAll() seems to wait for the previous Task to complete (which won't happen soon, since all parent Tasks are still waiting on their child tasks to complete).
Also what I read on MSDN was: "Prefer ForAll to ForEach When It Is Possible"
Source: http://msdn.microsoft.com/en-us/library/dd997403(v=vs.110).aspx
Does anyone have a clue why this could be?
Edit 1:
As requested by Matthew Watson I've first loaded the tree in memory before looping through it. Now the loading of the tree is done sequentially.
The results however are the same. Unparallelized and Parallel.ForEach now complete the whole tree in about 0.05 seconds while AsParallel().ForAll still only goes around 1 step per second.
Code:
class Program
{
private static DirWithSubDirs RootDir;
static void Main(string[] args)
{
//var startDirectory = @"C:\The folder\RecursiveFolder";
var startDirectory = @"C:\";
Console.WriteLine("Loading file system into memory...");
RootDir = new DirWithSubDirs(startDirectory);
Console.WriteLine("Done");
var w = Stopwatch.StartNew();
ThisIsARecursiveFunctionInMemory(RootDir);
Console.WriteLine("Elapsed seconds: " + w.Elapsed.TotalSeconds);
Console.ReadKey();
}
public static void ThisIsARecursiveFunctionInMemory(DirWithSubDirs currentDirectory)
{
var depth = currentDirectory.Path.Count(t => t == '\\');
Console.WriteLine(depth + ": " + currentDirectory.Path);
var children = currentDirectory.SubDirs;
//Edit this mode to switch what way of parallelization it should use
int mode = 2;
switch (mode)
{
case 1:
foreach (var child in children)
{
ThisIsARecursiveFunctionInMemory(child);
}
break;
case 2:
children.AsParallel().ForAll(t =>
{
ThisIsARecursiveFunctionInMemory(t);
});
break;
case 3:
Parallel.ForEach(children, t =>
{
ThisIsARecursiveFunctionInMemory(t);
});
break;
default:
break;
}
}
}
class DirWithSubDirs
{
public List<DirWithSubDirs> SubDirs = new List<DirWithSubDirs>();
public String Path { get; private set; }
public DirWithSubDirs(String path)
{
this.Path = path;
try
{
SubDirs = Directory.GetDirectories(path).Select(t => new DirWithSubDirs(t)).ToList();
}
catch (Exception eee)
{
//Ignore directories that can't be accessed
}
}
}
Edit 2:
After reading the update on Matthew's comment I've tried to add the following code to the program:
ThreadPool.SetMinThreads(4000, 16);
ThreadPool.SetMaxThreads(4000, 16);
This however does not change how the AsParallel performs. Still the first 8 steps are executed in an instant before slowing down to 1 step / second.
(Extra note, I'm currently ignoring the exceptions that occur when I can't access a Directory by the Try Catch block around the Directory.GetDirectories())
Edit 3:
Also, what I'm mainly interested in is the difference between Parallel.ForEach and AsParallel().ForAll, because to me it's just strange that for some reason the second one creates one thread for every recursion it does, while the first one handles everything in around 30 threads max. (And also why MSDN suggests using AsParallel even though it creates so many threads with a ~1 second timeout.)
Edit 4:
Another strange thing I found out:
When I try to set the MinThreads on the Thread pool above 1023 it seems to ignore the value and scale back to around 8 or 16:
ThreadPool.SetMinThreads(1023, 16);
Still when I use 1023 it does the first 1023 elements very fast followed by going back to the slow pace I've been experiencing all the time.
Note: Also, literally more than 1000 threads are now created (compared to 30 for the whole Parallel.ForEach one).
Does this mean Parallel.ForEach is just way smarter in handling tasks?
Some more info: this code prints 8 - 8 twice when you set the values above 1023 (when you set the values to 1023 or lower it prints the correct values):
int threadsMin;
int completionMin;
ThreadPool.GetMinThreads(out threadsMin, out completionMin);
Console.WriteLine("Cur min threads: " + threadsMin + " and the other thing: " + completionMin);
ThreadPool.SetMinThreads(1023, 16);
ThreadPool.SetMaxThreads(1023, 16);
ThreadPool.GetMinThreads(out threadsMin, out completionMin);
Console.WriteLine("Now min threads: " + threadsMin + " and the other thing: " + completionMin);
Edit 5:
At Dean's request, I've created another case that manually creates tasks:
case 4:
var taskList = new List<Task>();
foreach (var todo in children)
{
var itemTodo = todo;
taskList.Add(Task.Run(() => ThisIsARecursiveFunctionInMemory(itemTodo)));
}
Task.WaitAll(taskList.ToArray());
break;
This is also as fast as the Parallel.ForEach() loop. So we still don't have the answer to why AsParallel().ForAll() is so much slower.
This problem is pretty debuggable, an uncommon luxury when you have problems with threads. Your basic tool here is the Debug > Windows > Threads debugger window. It shows you the active threads and gives you a peek at their stack traces. You'll easily see that, once it gets slow, you'll have dozens of threads active that are all stuck. Their stack traces all look the same:
mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout, bool exitContext) + 0x16 bytes
mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout) + 0x7 bytes
mscorlib.dll!System.Threading.ManualResetEventSlim.Wait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken) + 0x182 bytes
mscorlib.dll!System.Threading.Tasks.Task.SpinThenBlockingWait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken) + 0x93 bytes
mscorlib.dll!System.Threading.Tasks.Task.InternalRunSynchronously(System.Threading.Tasks.TaskScheduler scheduler, bool waitForCompletion) + 0xba bytes
mscorlib.dll!System.Threading.Tasks.Task.RunSynchronously(System.Threading.Tasks.TaskScheduler scheduler) + 0x13 bytes
System.Core.dll!System.Linq.Parallel.SpoolingTask.SpoolForAll<ConsoleApplication1.DirWithSubDirs,int>(System.Linq.Parallel.QueryTaskGroupState groupState, System.Linq.Parallel.PartitionedStream<ConsoleApplication1.DirWithSubDirs,int> partitions, System.Threading.Tasks.TaskScheduler taskScheduler) Line 172 C#
// etc..
Whenever you see something like this, you should immediately think fire-hose problem. Probably the third-most common bug with threads, after races and deadlocks.
Which you can reason out now that you know the cause: the problem with the code is that every thread that completes adds N more threads, where N is the average number of sub-directories in a directory. In effect, the number of threads grows exponentially, and that's always bad. It will only stay in control if N = 1, which of course never happens on a typical disk.
Do beware that, like almost any threading problem, this misbehavior tends to reproduce poorly. The SSD in your machine tends to hide it. So does the RAM in your machine; the program might well complete quickly and trouble-free the second time you run it, since you'll now read from the file system cache instead of the disk, which is very fast. Tinkering with ThreadPool.SetMinThreads() hides it as well, but it cannot fix it. It never fixes any problem, it only hides them. Because no matter what happens, the exponential number will always overwhelm the set minimum number of threads. You can only hope that it finishes iterating the drive before that happens. Idle hope for a user with a big drive.
The difference between ParallelEnumerable.ForAll() and Parallel.ForEach() is now perhaps also easily explained. You can tell from the stack trace that ForAll() does something naughty, the RunSynchronously() method blocks until all the threads are completed. Blocking is something threadpool threads should not do, it gums up the thread pool and won't allow it to schedule the processor for another job. And has the effect you observed, the thread pool is quickly overwhelmed with threads that are waiting on the N other threads to complete. Which isn't happening, they are waiting in the pool and are not getting scheduled because there are already so many of them active.
This is a deadlock scenario, a pretty common one, but the threadpool manager has a workaround for it. It watches the active threadpool threads and steps in when they don't complete in a timely manner. It then allows an extra thread to start, one more than the minimum set by SetMinThreads(). But not more than the maximum set by SetMaxThreads(); having too many active tp threads is risky and likely to trigger OOM. This does solve the deadlock, it gets one of the ForAll() calls to complete. But this happens at a very slow rate, the threadpool only does this twice a second. You'll run out of patience before it catches up.
Parallel.ForEach() doesn't have this problem, it doesn't block so doesn't gum up the pool.
Seems to be the solution, but do keep in mind that your program is still fire-hosing the memory of your machine, adding ever more waiting tp threads to the pool. This can crash your program as well, it just isn't as likely because you have a lot of memory and the threadpool doesn't use a lot of it to keep track of a request. Some programmers however accomplish that as well.
The solution is a very simple one, just don't use threading. It is harmful, there is no concurrency when you have only one disk. And it does not like being commandeered by multiple threads. Especially bad on a spindle drive, head seeks are very, very slow. SSDs do it a lot better, it however still takes an easy 50 microseconds, overhead that you just don't want or need. The ideal number of threads to access a disk that you can't otherwise expect to be cached well is always one.
The first thing to note is that you are trying to parallelise an IO-bound operation, which will distort the timings significantly.
The second thing to note is the nature of the parallelised tasks: You are recursively descending a directory tree. If you create multiple threads to do this, each thread is likely to be accessing a different part of the disk simultaneously - which will cause the disk read head to be jumping all over the place and slowing things down considerably.
Try changing your test to create an in-memory tree, and access that with multiple threads instead. Then you will be able to compare the timings properly without the results being distorted beyond all usefulness.
Additionally, you may be creating a great number of threads, and they will (by default) be threadpool threads. Having a great number of threads will actually slow things down when they exceed the number of processor cores.
Also note that when you exceed the thread pool minimum threads (defined by ThreadPool.GetMinThreads()), a delay is introduced by the thread pool manager between each new threadpool thread creation. (I think this is around 0.5s per new thread).
Also, if the number of threads exceeds the value returned by ThreadPool.GetMaxThreads(), the creating thread will block until one of the other threads has exited. I think this is likely to be happening.
You can test this hypothesis by calling ThreadPool.SetMaxThreads() and ThreadPool.SetMinThreads() to increase these values, and see if it makes any difference.
(Finally, note that if you are really trying to recursively descend from C:\, you will almost certainly get an IO exception when it reaches a protected OS folder.)
NOTE: Set the max/min threadpool threads like this:
ThreadPool.SetMinThreads(4000, 16);
ThreadPool.SetMaxThreads(4000, 16);
Follow Up
I have tried your test code with the threadpool thread counts set as described above, with the following results (not run on the whole of my C:\ drive, but on a smaller subset):
Mode 1 took 06.5 seconds.
Mode 2 took 15.7 seconds.
Mode 3 took 16.4 seconds.
This is in line with my expectations; adding a load of threading to do this actually makes it slower than single-threaded, and the two parallel approaches take roughly the same time.
In case anyone else wants to investigate this, here's some determinative test code (the OP's code is not reproducible because we don't know his directory structure).
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
namespace Demo
{
internal class Program
{
private static DirWithSubDirs RootDir;
private static void Main()
{
Console.WriteLine("Loading file system into memory...");
RootDir = new DirWithSubDirs("Root", 4, 4);
Console.WriteLine("Done");
//ThreadPool.SetMinThreads(4000, 16);
//ThreadPool.SetMaxThreads(4000, 16);
var w = Stopwatch.StartNew();
ThisIsARecursiveFunctionInMemory(RootDir);
Console.WriteLine("Elapsed seconds: " + w.Elapsed.TotalSeconds);
Console.ReadKey();
}
public static void ThisIsARecursiveFunctionInMemory(DirWithSubDirs currentDirectory)
{
var depth = currentDirectory.Path.Count(t => t == '\\');
Console.WriteLine(depth + ": " + currentDirectory.Path);
var children = currentDirectory.SubDirs;
//Edit this mode to switch what way of parallelization it should use
int mode = 3;
switch (mode)
{
case 1:
foreach (var child in children)
{
ThisIsARecursiveFunctionInMemory(child);
}
break;
case 2:
children.AsParallel().ForAll(t =>
{
ThisIsARecursiveFunctionInMemory(t);
});
break;
case 3:
Parallel.ForEach(children, t =>
{
ThisIsARecursiveFunctionInMemory(t);
});
break;
default:
break;
}
}
}
internal class DirWithSubDirs
{
public List<DirWithSubDirs> SubDirs = new List<DirWithSubDirs>();
public String Path { get; private set; }
public DirWithSubDirs(String path, int width, int depth)
{
this.Path = path;
if (depth > 0)
for (int i = 0; i < width; ++i)
SubDirs.Add(new DirWithSubDirs(path + "\\" + i, width, depth - 1));
}
}
}
The Parallel.For and .ForEach methods are implemented internally as equivalent to running iterations in Tasks; e.g. a loop like:
Parallel.For(0, N, i =>
{
DoWork(i);
});
is equivalent to:
var tasks = new List<Task>(N);
for(int i=0; i<N; i++)
{
tasks.Add(Task.Factory.StartNew(state => DoWork((int)state), i));
}
Task.WaitAll(tasks.ToArray());
And from the perspective of every iteration potentially running in parallel with every other iteration, this is an OK mental model, but it does not happen in reality. Parallel, in fact, does not necessarily use one Task per iteration, as that is significantly more overhead than is necessary. Parallel.ForEach tries to use the minimum number of tasks necessary to complete the loop as fast as possible. It spins up tasks as threads become available to process those tasks, and each of those tasks participates in a management scheme (I think it's called chunking): a task asks for multiple iterations to be done, gets them, processes that work, and then goes back for more. The chunk sizes vary based on the number of tasks participating, the load on the machine, etc.
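As a rough illustration of that chunking idea (not Parallel.ForEach's internal code, just an explicit range partitioner that produces the same effect, reusing the DoWork placeholder from the snippet above):
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ChunkingDemo
{
    static void DoWork(int i) { /* per-iteration work */ }

    static void Main()
    {
        const int N = 1000000;
        // Each worker claims a range of 10,000 iterations at a time instead of
        // one Task per iteration, which keeps the scheduling overhead low.
        Parallel.ForEach(Partitioner.Create(0, N, 10000), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                DoWork(i);
        });
    }
}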
PLINQ’s .AsParallel() has a different implementation, but it ‘can’ still similarly fetch multiple iterations into a temporary store, do the calculations in a thread (but not as a task), and put the query results into a small buffer. (You get something based on ParallelQuery, and then further .Whatever() functions bind to an alternative set of extension methods that provide parallel implementations).
So now that we have a small idea of how these two mechanisms work, I will try to provide an answer to your original question:
So why is .AsParallel() slower than Parallel.ForEach? The reason stems from the following. Tasks (or their equivalent implementation here) do NOT block on I/O-like calls. They ‘await’ and free up the CPU to do something else. But (quoting C# nutshell book): “PLINQ cannot perform I/O-bound work without blocking threads”. The calls are synchronous. They were written with the intention that you increase the degree of parallelism if (and ONLY if) you are doing such things as downloading web pages per task that do not hog CPU time.
And the reason why your function calls are exactly analogous to I/O bound calls is this: One of your threads (call it T) blocks and does nothing until all of its child threads have finished, which can be a slow process here. T itself is not CPU-intensive while it waits for the children to unblock, it is doing nothing but waiting. Hence it is identical to a typical I/O bound function call.
Based on the accepted answer to "How exactly does AsParallel work?": .AsParallel().ForAll() casts back to IEnumerable before calling .ForAll(), so it creates 1 new thread + N recursive calls (each of which generates a new thread).
