I have just done a sample for multithreading using This Link, like below:
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
int count = 0;
Parallel.For(0, 50000, (i, state) =>
{
    count++;
});
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
Console.ReadKey();
It gives me 15 threads before the Parallel.For and 17 threads after it, so only 2 threads are occupied by the Parallel.For.
Then I created another sample using This Link, like below:
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount * 10 };
Console.WriteLine("MaxDegreeOfParallelism : {0}", Environment.ProcessorCount * 10);
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
int count = 0;
Parallel.For(0, 50000, options, (i, state) =>
{
    count++;
});
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
Console.ReadKey();
In the above code I have set MaxDegreeOfParallelism to 40, but Parallel.For is still using the same number of threads.
So how can I increase the number of running threads for Parallel.For?
I am facing a problem where some numbers are skipped inside the Parallel.For when I perform some heavy and complex functionality inside it. So I want to increase the maximum number of threads and get around the skipping issue.
What you're saying is something like: "My car is shaking when driving too fast. I'm trying to avoid this by driving even faster." That doesn't make any sense. What you need is to fix the car, not change the speed.
How exactly to do that depends on what you are actually doing in the loop. The code you showed is obviously a placeholder, but even that is wrong. So I think what you should do first is to learn about thread safety.
Using a lock is one option, and it's the easiest one to get correct, but it can be hard to make efficient: you need to hold the lock only for a short amount of time in each iteration.
There are other ways to achieve thread safety, including Interlocked, the overloads of Parallel.For that use thread-local data, and approaches other than Parallel.For(), such as PLINQ or TPL Dataflow.
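For illustration, a minimal sketch of the lock-based option applied to the count++ loop from the question; the lock is held only for the increment itself, and Interlocked.Increment(ref count) would do the same job without an explicit lock:
using System;
using System.Threading.Tasks;

object sync = new object();
int count = 0;

Parallel.For(0, 50000, (i, state) =>
{
    lock (sync)      // held only for the duration of the increment
    {
        count++;
    }
});

Console.WriteLine(count); // 50000 every time, because the increment is now serialized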
Only after you have made sure your code is thread-safe is it time to worry about things like the number of threads. Regarding that, I think there are two things to note:
For CPU-bound computations, it doesn't make sense to use more threads than the number of cores your CPU has. Using more threads than that will actually usually lead to slower code, since switching between threads has some overhead.
I don't think you can measure the number of threads used by Parallel.For() like that. Parallel.For() uses the thread pool and it's quite possible that there already are some threads in the pool before the loop begins.
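If you do want a rough idea of how many distinct worker threads a particular loop touched, one purely diagnostic sketch (not something the question's code does) is to record the managed thread IDs seen inside the loop:
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var threadIds = new ConcurrentDictionary<int, byte>();

Parallel.For(0, 50000, i =>
{
    threadIds.TryAdd(Thread.CurrentThread.ManagedThreadId, 0);
    Thread.SpinWait(1000); // simulate a little work so the scheduler has a reason to add threads
});

Console.WriteLine("Distinct threads used by the loop: {0}", threadIds.Count);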
Parallel loops use hardware CPU cores. If your CPU has 2 cores, that is the maximum degree of parallelism you can get on your machine.
Taken from MSDN:
What to Expect
By default, the degree of parallelism (that is, how many iterations run at the same time in hardware) depends on the number of available cores. In typical scenarios, the more cores you have, the faster your loop executes, until you reach the point of diminishing returns that Amdahl's Law predicts. How much faster depends on the kind of work your loop does.
Further reading:
Threading vs Parallelism, how do they differ?
Threading vs. Parallel Processing
Parallel loops will give you the wrong result for summation operations without locks, because the result of each iteration depends on the single variable count, and the value of count in a parallel loop is not predictable. However, using locks in parallel loops defeats the actual parallelism, so you should try something other than a summation to test a parallel loop.
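If you do want a correct parallel summation, a minimal sketch using the thread-local-data overload of Parallel.For (each worker keeps its own subtotal and the subtotals are merged once at the end) could look like this:
using System;
using System.Threading;
using System.Threading.Tasks;

long total = 0;

Parallel.For<long>(0, 50000,
    () => 0L,                                   // each worker starts with a local subtotal of 0
    (i, state, local) => local + 1,             // update only the worker's own subtotal
    local => Interlocked.Add(ref total, local)  // merge each subtotal exactly once
);

Console.WriteLine(total); // 50000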
What I want to achieve:
Using Parallel.For (anything else would be appreciated, but I found that this one is the easiest), I want to increase a variable named max until it reaches 100,000,000, using threads, but the program should only use X threads at the same time.
Code snippet:
using System;
using System.Threading;
using System.Linq;
using System.Threading.Tasks;
using System.Diagnostics;
namespace pool_threading
{
    class MainClass
    {
        static int max = 0;
        static int current_max_thread = 0;

        public static void Main(string[] args)
        {
            Parallel.For(0, 100000000, new ParallelOptions { MaxDegreeOfParallelism = 50 },
                i =>
                {
                    max++;
                    if (Thread.CurrentThread.ManagedThreadId > current_max_thread)
                        current_max_thread = Thread.CurrentThread.ManagedThreadId;
                });

            Console.WriteLine("I got to this number {0} using most {1} threads", max, current_max_thread);
        }

        public static void GetPage(int i)
        {
        }
    }
}
Result:
I got this number 38,786,886 using most 11 threads
Now ... I don't know why I get the number 38,786,886, which is less than 100,000,000, but I guess it's because multiple threads are trying to increase it at the exact same time, so if 10 are trying at the same time, only the first one gets the chance. Please correct me if I'm wrong.
The biggest "problem" is that I always get 11 threads, even with the maximum set to 50 (scroll the code to see the maximum). If I set the maximum to 1 thread, I always get 4 (maybe 4 is the minimum in this situation, but that still doesn't explain why I get at most 11 at the same time).
This is trivial: just use i inside the loop instead of trying to increment and read max. You're not incrementing it safely, but there's no reason for you to try. The whole point of Parallel.For is that it gives you the loop index and ensures that each loop index is hit exactly once, no more, no less.
There are two things going on here, as far as I can see:
First, you are hammering away at max from multiple threads. You probably have a multicore machine, so those operations are probably overlapping with each other. To perform the increment in a thread-safe way you need
Interlocked.Increment(ref max);
not
max++; /* not thread safe */
Second, you don't always get the number of threads you asked for in MaxDegreeOfParallelism. You'll never get more, but you might get less.
If you're going to use parallelism please pay close attention to the thread-safety of the code you'll run in the threads.
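As a minimal sketch, here is the loop from the question with the atomic increment applied; with this one change it reaches the full 100,000,000:
using System;
using System.Threading;
using System.Threading.Tasks;

int max = 0;

Parallel.For(0, 100000000, new ParallelOptions { MaxDegreeOfParallelism = 50 },
    i =>
    {
        Interlocked.Increment(ref max); // atomic, so no increments are lost
    });

Console.WriteLine("max = {0}", max); // 100000000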
I believe you are not getting 50 threads for 2 main reasons:
Hardware. There is a point at which a processor can do more with fewer threads, because the cost of switching between threads outweighs simply running an existing thread more often.
Managed code. C#/.NET is a managed environment. The people who wrote .NET know the above and have probably set some internal limits and checks to prevent too many threads from running. What you set in code may be above such an internal limit and so is effectively ignored. Also, you are setting a maximum, so Parallel.For() can use anywhere from 1 to X threads, whichever it thinks is most efficient.
Is there a difference (from performance perspective) between:
Thread.Sleep(10000);
and
for (int i = 0; i < 100; i++)
{
    Thread.Sleep(100);
}
Does the single call to Thread.Sleep(10000) also result in context switches within these 10 seconds (so the OS can check whether it's done sleeping), or is this thread really not scheduled for 10 seconds?
The second version (the for loop) requires more context switches and should be a little slower than Thread.Sleep(10000);
Anyway, you can use the System.Diagnostics.Stopwatch class to measure the exact time of these two approaches. I believe the difference will be very, very small.
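For what it's worth, a minimal sketch of that measurement:
using System;
using System.Diagnostics;
using System.Threading;

var sw = Stopwatch.StartNew();
Thread.Sleep(10000);
Console.WriteLine("Single Sleep(10000): {0} ms", sw.ElapsedMilliseconds);

sw.Restart();
for (int i = 0; i < 100; i++)
{
    Thread.Sleep(100);
}
Console.WriteLine("100 x Sleep(100): {0} ms", sw.ElapsedMilliseconds);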
In any case, the second loop will take more time because of the following overheads:
Memory utilization for 10 different thread objects
10 different callbacks will be initiated once you call Thread.Sleep
The overhead cost of running the loop
If we want to run the code on a single thread anyway, why use a loop at all when there isn't even a break condition inside it?
I need to process several lines from a database (there can be millions) in parallel in C#. The processing is quite quick (50 to 150 ms per line), but I cannot know this speed before runtime as it depends on the hardware/network.
The ThreadPool or the newer Task Parallel Library seems to be what fits my needs, as I am new to threading and want the most efficient way to process the data.
However, these methods do not provide a way to control the execution speed of my tasks (lines/minute): I want to be able to set a maximum speed limit for the processing, or run it at full speed.
Please note that setting the number of threads of the ThreadPool/TaskFactory does not provide sufficient accuracy for my needs, as I would like to be able to set a speed limit below the 'one thread speed'.
Using a custom scheduler for the TPL seems to be a way to do that, but I did not find a way to implement one.
Furthermore, I'm worried about the efficiency cost of such a setup.
Could you give me a way or some advice on how to achieve this?
Thanks in advance for your answers.
The TPL provides a convenient programming abstraction on top of the Thread Pool. I would always select TPL when that is an option.
If you wish to throttle the total processing speed, there's nothing built-in that would support that.
You can measure the total processing speed as you proceed through the file and regulate speed by introducing (non-spinning) delays in each thread. The size of the delay can be dynamically adjusted in your code based on observed processing speed.
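A rough sketch of that approach, assuming a target rate; ProcessLine and targetLinesPerMinute are placeholders for your own work and limit, and the dummy input just stands in for your database rows:
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var lines = Enumerable.Range(0, 1000).Select(i => "row " + i).ToList(); // stand-in for your data
Action<string> ProcessLine = line => Thread.Sleep(20);                  // stand-in for the real work
double targetLinesPerMinute = 6000;                                     // the speed limit you want

int processed = 0;
var sw = Stopwatch.StartNew();

Parallel.ForEach(lines, line =>
{
    ProcessLine(line);

    int done = Interlocked.Increment(ref processed);
    // How long the first 'done' lines should have taken at the target rate.
    double expectedMs = done / targetLinesPerMinute * 60000.0;
    double aheadMs = expectedMs - sw.ElapsedMilliseconds;
    if (aheadMs > 0)
        Thread.Sleep((int)aheadMs); // non-spinning delay; keeps the observed rate at or below the target
});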
I don't see the advantage of limiting the speed, but I suggest you look into limiting the maximum degree of parallelism of the operation. That can be done via MaxDegreeOfParallelism in the ParallelOptions passed to Parallel.ForEach as the code works over the disparate lines of data. That way you can control the slots, for lack of a better term, which can be expanded or reduced depending on the criteria you are working under.
Here is an example using a ConcurrentBag to process lines of disparate data with 2 parallel tasks.
var myLines = new List<string> { "Alpha", "Beta", "Gamma", "Omega" };
var stringResult = new ConcurrentBag<string>();
ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = 2;
Parallel.ForEach( myLines, parallelOptions, line =>
{
    if (line.Contains( "e" ))
        stringResult.Add( line );
} );
Console.WriteLine( string.Join( " | ", stringResult ) );
// Outputs Beta | Omega
Note that ParallelOptions also has a TaskScheduler property with which you can refine the processing further. Finally, for more control, maybe you want to cancel the processing when a specific threshold is reached? If so, look into the CancellationToken property to exit the process early.
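As a rough illustration of that last point, here is the same loop with cancellation wired in; the threshold used is arbitrary and just for the example:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var myLines = new List<string> { "Alpha", "Beta", "Gamma", "Omega" };
var stringResult = new ConcurrentBag<string>();

var cts = new CancellationTokenSource();
var parallelOptions = new ParallelOptions
{
    MaxDegreeOfParallelism = 2,
    CancellationToken = cts.Token
};

try
{
    Parallel.ForEach( myLines, parallelOptions, line =>
    {
        if (line.Contains( "e" ))
            stringResult.Add( line );

        if (stringResult.Count >= 1)   // arbitrary threshold for the example
            cts.Cancel();              // no new iterations start once the token is cancelled
    } );
}
catch (OperationCanceledException)
{
    Console.WriteLine( "Stopped early after {0} result(s).", stringResult.Count );
}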
I wrote a LINQ query to find the frequencies of unique characters in a text file. I was also transforming the initial result into an object with the help of Select. The final result comes out as a List.
Below is the query I have used.
charNodes = inputString.GroupBy(ch => ch)
.Select((ch) => new TNode(ch.Key.ToString(),ch.Count()))
.ToList<TNode>();
I have a quad-core machine, and the above query ran in 15 ms. But strangely it took more time when I PLINQ'ed the same query. The one below took about 40 ms.
charNodes = inputString.GroupBy(ch => ch).AsParallel()
.Select((ch) => new TNode(ch.Key.ToString(),ch.Count()))
.ToList<TNode>();
The worst case was the next query, which took about 83 ms:
charNodes = inputString.AsParallel().GroupBy(ch => ch)
.Select((ch) => new TNode(ch.Key.ToString(), ch.Count()))
.ToList<TNode>();
What is going wrong here?
When this type of question comes up the answer is always the same: The PLINQ overhead is higher than the gains.
This happens because the work items are extremely small (grouping by a char, or creating a new object from trivial inputs). PLINQ works much better when the work items are bigger.
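For contrast, a small sketch where each item carries enough work for AsParallel() to have a chance to pay off; ExpensiveScore is not from the question, it is just a stand-in for a genuinely heavy per-item computation:
using System;
using System.Linq;

// Stand-in for a costly per-item computation.
static double ExpensiveScore(int n)
{
    double acc = 0;
    for (int i = 1; i < 200000; i++)
        acc += Math.Sqrt(n + i) % 3;
    return acc;
}

var scores = Enumerable.Range(0, 1000)
    .AsParallel()
    .Select(n => ExpensiveScore(n))
    .ToList();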
It's really hard to tell what's going on there strictly based on the code you provided.
TPL uses thread pool threads. The thread pool starts up with about 10 running threads; if more are needed, it creates new ones roughly once per second for as long as another thread is required. If your loop resulted in more than 10 parallel operations, it would need to spend time spinning up new threads. Correction: the number of threads a parallel loop needs takes away from the available threads in the thread pool. The pool tries to keep a minimum number of threads available, and if it notices that work items are taking too long it spins up new ones to compensate, which takes resources. Many parts of the framework use the thread pool, so there are all sorts of things that could be stressing it. Starting a thread is fairly expensive.
The other possibility is that if your number of iterations was greater than the number of available CPUs, a lot of context switching resulted. Context switching is expensive; it adds load on the CPUs and limits how quickly the OS can switch between threads.
If you provide more detail, like the input data, I can provide more detail in the answer.
I'm developing an app which scans thousands of copies of a struct; ~1 GB of RAM. Speed is important.
ParallelScan(_from, _to); //In a new thread
I manually adjust the thread count:
if (myStructs.Count == 0) { threads = 0; }
else if (myStructs.Count < 1 * Number.Thousand) { threads = 1; }
else if (myStructs.Count < 3 * Number.Thousand) { threads = 2; }
else if (myStructs.Count < 5 * Number.Thousand) { threads = 4; }
else if (myStructs.Count < 10 * Number.Thousand) { threads = 8; }
else if (myStructs.Count < 20 * Number.Thousand) { threads = 12; }
else if (myStructs.Count < 30 * Number.Thousand) { threads = 20; }
else if (myStructs.Count < 50 * Number.Thousand) { threads = 30; }
else threads = 40;
I just wrote it from scratch, and I would need to modify it for another CPU, etc. I think I could write smarter code that dynamically starts a new thread if a CPU core is available at the moment:
If the CPU is not at 100%, start N threads
Measure CPU or thread processing time & modify/estimate N
Loop until the whole struct array is scanned
Does anyone think "I did something similar" or "I have a better idea"?
UPDATE: The solution
Parallel.For(0, myStructs.Count, (x) =>   // upper bound is exclusive, so Count rather than Count - 1
{
    ParallelScan(x, x); // Will be ParallelScan(x);
});
I did trim tons of code. Thanks people!
UPDATE 2: Results
Scan time for 10K templates
1 Thread: 500 ms
10 Threads: 300 ms
40 Threads: 600 ms
Tasks: 100 ms
The standard answer: Use Tasks (TPL) , not Threads. Tasks require Fx4.
Your ParallelScan could just use Parallel.ForEach( ... ) or PLINQ (.AsParallel()).
The TPL framework includes a scheduler, and ForEach() uses a partitioner, to adapt to CPU cores and load. Your problem is most likely solved with the standard components, but you can also write custom schedulers and partitioners.
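A sketch of what that could look like; myStructs, Scan and IsMatch stand in for the poster's own collection and per-item work:
using System.Linq;
using System.Threading.Tasks;

// Let the TPL decide how many worker threads to use and how to partition the data.
Parallel.ForEach(myStructs, s => Scan(s));

// Or with PLINQ, if each item yields a result worth collecting:
var hits = myStructs.AsParallel()
                    .Where(s => IsMatch(s))
                    .ToList();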
Actually, you won't get much benefit from spawning 50 threads if your CPU only has two cores (even if each of them supports hyperthreading). It will actually run slower due to the context switching that occurs every once in a while.
That means you should go for the Task Parallel Library (.NET 4), which takes care of using all available cores efficiently.
Apart from that, improving the asymptotic running time of your search algorithm might prove more valuable for large quantities of data, regardless of Moore's law.
[Edit]
If you are unable or unwilling to use the .NET 4 TPL, you can start by getting the number of logical processors in the system (use Environment.ProcessorCount or check this answer for detailed info). Based on that number you can partition your data and spawn a fixed number of threads. That is much simpler than checking the CPU utilization and should prevent creating unnecessary threads which would be starved anyway.
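A rough sketch of that manual partitioning, assuming ParallelScan(from, to) scans an inclusive index range as in the question and myStructs is the poster's collection:
using System;
using System.Collections.Generic;
using System.Threading;

int workers = Environment.ProcessorCount;
int chunk = (myStructs.Count + workers - 1) / workers;     // ceiling division
var threads = new List<Thread>();

for (int w = 0; w < workers; w++)
{
    int from = w * chunk;
    int to = Math.Min(from + chunk, myStructs.Count) - 1;
    if (from > to) break;                                  // fewer items than workers

    var t = new Thread(() => ParallelScan(from, to));      // each thread scans its own slice
    threads.Add(t);
    t.Start();
}

threads.ForEach(t => t.Join());                            // wait for all slices to finish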
OK, sorry to keep going on but first to compile my comments:
Unless you have a very, very, very, good reason to think that scanning these structs will take any more than a handful of microseconds and that really, really, really matters, it's not a good idea to do this kind of optimisation. If you really want to do it, you should have one thread per core. But really - don't. If it's just 50,000 structs and you're doing something simple with them, don't bother.
FYI, starting a new thread takes a good amount of time (a measurable part of a second, several milliseconds).
How long does this operation take? It's very unlikely that optimizing the multithreading like this is useful for you; it will give you the smallest improvement. A better improvement will come from a better algorithm, or from not having to depend on this strange home-grown multithreading scheme.
I'm confused about your fixation on performance, partly because you say you're looking through 50,000 structs (a very quick and easy operation) and partly because you're using structs. Without boxing, a struct is a value type, and if you're passing them between threads you're copying data rather than references, i.e. using more memory. My point is that that's a lot of data/memory, unless the structs are small, in which case what kind of processing can you possibly be doing on them that takes so long that you're thinking about 40+ threads in parallel?
If performance is truly incredibly important and your goal, and you're not simply trying to do this as a nice engineering exercise, please share information about what kind of processing you're doing.