In the parallel for loop below, is there a reliable way to determine how many threads will be created and with what index boundaries?
Parallel.For
(
0,
int.MaxValue,
new ParallelOptions() {MaxDegreeOfParallelism=Environment.ProcessorCount},
(i) =>
{
// Monitor [i] to see how the range is segmented.
}
);
If the processor count on the target machine is 4 and we use all 4 processors, I observe that 4 segments are created, roughly equal in size, each approximately int.MaxValue/4. However, this is just an observation; Parallel.For may or may not guarantee deterministic segmentation.
Searching around did not help much either. Is it possible to predict or calculate this?
You can specify your own partitioner if you don't like the default behavior.
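For example, here is a minimal sketch using the built-in range partitioner, which gives deterministic chunk boundaries (the 0/1000/250 numbers are arbitrary, chosen for illustration):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class PartitionerDemo
{
    static void Main()
    {
        // Partitioner.Create with an explicit range size produces fixed,
        // predictable chunk boundaries, unlike the default partitioner.
        var ranges = Partitioner.Create(0, 1000, rangeSize: 250);

        Parallel.ForEach(ranges, range =>
        {
            // Each tuple is (inclusive start, exclusive end) of one chunk.
            Console.WriteLine($"Chunk: [{range.Item1}, {range.Item2})");
        });
    }
}
```

With these numbers you always get exactly four chunks of 250 elements, regardless of how many threads pick them up.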
I have just created a sample for multithreading using This Link, like below:
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
int count = 0;
Parallel.For(0, 50000, (i, state) =>
{
count++;
});
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
Console.ReadKey();
It gives me 15 threads before Parallel.For and only 17 threads after it, so Parallel.For occupies only 2 threads.
Then I created another sample using This Link, like below:
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount * 10 };
Console.WriteLine("MaxDegreeOfParallelism : {0}", Environment.ProcessorCount * 10);
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
int count = 0;
Parallel.For(0, 50000, options,(i, state) =>
{
count++;
});
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
Console.ReadKey();
In the code above I have set MaxDegreeOfParallelism to 40, but Parallel.For is still using the same number of threads.
So how can I increase the number of running threads for Parallel.For?
I am facing a problem where some numbers are skipped inside the Parallel.For when I perform heavy, complex work inside it. So I want to increase the maximum number of threads and get around the skipping issue.
What you're saying is something like: "My car is shaking when driving too fast. I'm trying to avoid this by driving even faster." That doesn't make any sense. What you need is to fix the car, not change the speed.
How exactly to do that depends on what you are actually doing in the loop. The code you showed is obviously a placeholder, but even that is wrong. So I think what you should do first is learn about thread safety.
Using a lock is one option, and it's the easiest one to get correct. But it's also hard to make efficient: you need to hold the lock only for a short amount of time in each iteration.
There are other ways to achieve thread safety, including Interlocked, the overloads of Parallel.For that use thread-local data, and approaches other than Parallel.For(), like PLINQ or TPL Dataflow.
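To make the counter from the samples above thread-safe, here is a sketch of two of those options: Interlocked.Increment on a shared variable, and the Parallel.For overload with thread-local subtotals:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class SafeCountDemo
{
    static void Main()
    {
        // Option 1: atomic increment on a shared counter.
        int count = 0;
        Parallel.For(0, 50000, i => Interlocked.Increment(ref count));
        Console.WriteLine(count); // 50000

        // Option 2: thread-local subtotals, merged once per worker thread,
        // so the shared variable is touched far less often.
        int total = 0;
        Parallel.For(0, 50000,
            () => 0,                            // per-thread initial value
            (i, state, local) => local + 1,     // body: no shared state touched
            local => Interlocked.Add(ref total, local)); // merge step
        Console.WriteLine(total); // 50000
    }
}
```

Both versions always print 50000, unlike the unsynchronized `count++` in the question.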
Only after you have made sure your code is thread-safe is it time to worry about things like the number of threads. And regarding that, I think there are two things to note:
For CPU-bound computations, it doesn't make sense to use more threads than your CPU has cores. Using more threads than that will usually lead to slower code, since switching between threads has some overhead.
I don't think you can measure the number of threads used by Parallel.For() like that. Parallel.For() uses the thread pool, and it's quite possible that some threads already exist in the pool before the loop begins.
Parallel loops use hardware CPU cores. If your CPU has 2 cores, that is the maximum degree of parallelism you can get on your machine.
Taken from MSDN:
What to Expect
By default, the degree of parallelism (that is, how many iterations run at the same time in hardware) depends on the number of available cores. In typical scenarios, the more cores you have, the faster your loop executes, until you reach the point of diminishing returns that Amdahl's Law predicts. How much faster depends on the kind of work your loop does.
Further reading:
Threading vs Parallelism, how do they differ?
Threading vs. Parallel Processing
Parallel loops will give you the wrong result for summation operations without locks, because each iteration depends on the single shared variable count, and its value inside a parallel loop is not predictable. However, using locks in parallel loops defeats actual parallelism, so you should test parallel loops with something other than a summation.
Let's say I have an action block A that perform some work in parallel with a max degree of parallelism of 4.
Say I have a case where action block A is doing work X in some cases and work Y in others. X is a small piece of work; Y is a larger one that must be split into smaller chunks, so I need to parallelise those too.
Inside work Y I therefore need to parallelise the chunks to a max degree of 4. But at that point I might have 4 A blocks executing in parallel, which could lead, for example, to "A-X, A-X, A-Y, A-Y" running at once. That would result in 1 + 1 + 4 + 4 parallel tasks, which is too many for my purpose, as I want to keep a maximum of 4 (or any other chosen number) overall.
Is there a way to control the maximum degree of parallelism including nested blocks?
While creating a block in TPL Dataflow, you can specify a custom scheduler for the block via its options.
An easy way to limit the number of concurrent tasks and the concurrency level is to use ConcurrentExclusiveSchedulerPair in your code, with the parameters you need.
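A minimal sketch of that idea using only the BCL: one ConcurrentExclusiveSchedulerPair caps everything scheduled on it at 4 concurrent tasks. With TPL Dataflow you would hand pair.ConcurrentScheduler to every block via ExecutionDataflowBlockOptions.TaskScheduler, so outer and nested blocks share the same cap.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class SharedCapDemo
{
    static void Main()
    {
        // One pair caps total concurrency at 4 for everything scheduled on
        // its ConcurrentScheduler, outer and nested work alike.
        var pair = new ConcurrentExclusiveSchedulerPair(
            TaskScheduler.Default, maxConcurrencyLevel: 4);
        var factory = new TaskFactory(pair.ConcurrentScheduler);

        int current = 0, observedMax = 0;
        var tasks = new List<Task>();
        for (int i = 0; i < 20; i++)
        {
            tasks.Add(factory.StartNew(() =>
            {
                int now = Interlocked.Increment(ref current);
                InterlockedMax(ref observedMax, now);
                Thread.Sleep(25); // simulated work X or Y
                Interlocked.Decrement(ref current);
            }));
        }
        Task.WaitAll(tasks.ToArray());
        Console.WriteLine($"Peak concurrency: {observedMax}"); // never exceeds 4
    }

    // Lock-free "store the maximum seen so far".
    static void InterlockedMax(ref int target, int value)
    {
        int snapshot;
        do { snapshot = target; }
        while (value > snapshot &&
               Interlocked.CompareExchange(ref target, value, snapshot) != snapshot);
    }
}
```

Note this is a sketch of the scheduling mechanism, not of your blocks; the Dataflow version needs the System.Threading.Tasks.Dataflow package.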
We would like to optionally control the number of "threads" on our parallel loops to avoid overwhelming a web service (for example).
Is it possible to specify a custom MaxDegreeOfParallelism on a Parallel.ForEach loop, but also to revert to the default value as required? Seemingly zero (0) is an invalid value for MaxDegreeOfParallelism, whereas I was hoping it could simply mean "ignore".
In other words, can you avoid writing this type of code?
int numParallelOperations = GetNumParallelOperations();
if (numParallelOperations > 0)
{
ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = numParallelOperations;
Parallel.ForEach(items, options, i =>
{
Foo(i);
});
}
else
{
Parallel.ForEach(items, i =>
{
Foo(i);
});
}
Do you mean -1, as per MSDN:
The MaxDegreeOfParallelism limits the number of concurrent operations run by Parallel method calls that are passed this ParallelOptions instance to the set value, if it is positive. If MaxDegreeOfParallelism is -1, then there is no limit placed on the number of concurrently running operations.
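So, assuming GetNumParallelOperations() (the question's hypothetical helper) returns -1 rather than 0 to mean "no limit", the if/else from the question collapses into a single call:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class NoBranchDemo
{
    // Hypothetical helper from the question; here it returns -1 for "no limit".
    static int GetNumParallelOperations() => -1;

    static void Main()
    {
        int done = 0;
        var options = new ParallelOptions
        {
            // -1 is the documented "no limit" value, so no branching is needed.
            MaxDegreeOfParallelism = GetNumParallelOperations()
        };
        Parallel.ForEach(Enumerable.Range(0, 100), options, i =>
        {
            Interlocked.Increment(ref done); // stands in for Foo(i)
        });
        Console.WriteLine(done); // 100
    }
}
```

The only change needed elsewhere is returning -1 instead of 0 from the helper when no explicit limit is configured.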
You can control the approximate number of threads like this:
// use only (ca.) one core:
int degreeOfParallelism = 1;
// leave (ca.) one core idle:
int degreeOfParallelism = Environment.ProcessorCount - 1;
// use (ca.) half of the cores:
int degreeOfParallelism = Environment.ProcessorCount > 1 ?
    Environment.ProcessorCount / 2 : 1;
// run at full speed:
int degreeOfParallelism = -1;
var options = new ParallelOptions();
options.MaxDegreeOfParallelism = degreeOfParallelism;
Parallel.For(0, x, options, y =>
{
    // ... loop body ...
});
This may not be a definitive answer, as I cannot add a comment due to having only just joined StackOverflow. I don't believe it is possible to do what you're asking, but I do know that the MSDN documentation states that -1 is the value which lets an unlimited number of tasks be run by the ForEach. From my experience, it is best to leave the CLR to determine how many concurrent tasks will be run unless you really know what you are doing. The Parallel library is high level; if you really needed to do something like this, you should be coding at a lower level, in control of your own threads rather than leaving it to a TaskScheduler or ThreadPool, but that takes a lot of experimentation to get your own algorithms running efficiently.
The only thing I can suggest is wrapping the Parallel.ForEach method to include the setting of ParallelOptions.MaxDegreeOfParallelism, to cut down on the duplicate code and let you add an interface and test the asynchronous code in a synchronous manner.
Apologies for not providing a more positive response!
I need to process several lines from a database (possibly millions) in parallel in C#. The processing is quite quick (50 to 150 ms/line), but I cannot know this speed before runtime as it depends on hardware/network.
The ThreadPool or the newer Task Parallel Library seems to be what fits my needs, as I am new to threading and want the most efficient way to process the data.
However, these methods do not provide a way to control the execution speed of my tasks (lines/minute): I want to be able to set a maximum speed limit for the processing, or run it at full speed.
Please note that setting the number of threads of the ThreadPool/TaskFactory does not provide sufficient accuracy for my needs, as I would like to be able to set a speed limit below the 'one thread speed'.
Using a custom scheduler for the TPL seems to be a way to do that, but I did not find a way to implement it.
Furthermore, I'm worried about the efficiency cost of such a setup.
Could you give me a way, or advice on how to achieve this?
Thanks in advance for your answers.
The TPL provides a convenient programming abstraction on top of the Thread Pool. I would always select TPL when that is an option.
If you wish to throttle the total processing speed, there's nothing built-in that would support that.
You can measure the total processing speed as you proceed through the file and regulate speed by introducing (non-spinning) delays in each thread. The size of the delay can be dynamically adjusted in your code based on observed processing speed.
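A minimal sketch of that idea, with an assumed rate limit of 400 lines/second and placeholder work; each iteration compares overall progress against the allowed pace and sleeps off any surplus:

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

class ThrottledLoopDemo
{
    static void Main()
    {
        // Assumed limit; in a real app this would be configurable,
        // with some sentinel (e.g. -1) meaning "full speed".
        const double maxLinesPerSecond = 400;

        long processed = 0;
        var clock = Stopwatch.StartNew();

        Parallel.For(0, 200, i =>
        {
            ProcessLine(i); // placeholder for the real per-line work
            long done = Interlocked.Increment(ref processed);

            // If we're ahead of the allowed pace, sleep off the difference
            // (a non-spinning delay, as suggested above).
            double earliestAllowedSec = done / maxLinesPerSecond;
            double aheadSec = earliestAllowedSec - clock.Elapsed.TotalSeconds;
            if (aheadSec > 0)
                Thread.Sleep(TimeSpan.FromSeconds(aheadSec));
        });

        Console.WriteLine($"{processed} lines in {clock.Elapsed.TotalSeconds:F2}s");
    }

    static void ProcessLine(int i) { /* real work here */ }
}
```

Because the limit is enforced across all threads via the shared counter, it also works for target rates below the speed of a single thread.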
I am not seeing the advantage of limiting speed, but I suggest you look into limiting the max degree of parallelism of the operation. That can be done via MaxDegreeOfParallelism in the Parallel.ForEach options as the code works over the disparate lines of data. That way you can control the slots, for lack of a better term, which can be expanded or reduced depending on the criteria you are working under.
Here is an example using a ConcurrentBag to process lines of disparate data with 2 parallel tasks.
var myLines = new List<string> { "Alpha", "Beta", "Gamma", "Omega" };
var stringResult = new ConcurrentBag<string>();
ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = 2;
Parallel.ForEach( myLines, parallelOptions, line =>
{
if (line.Contains( "e" ))
stringResult.Add( line );
} );
Console.WriteLine( string.Join( " | ", stringResult ) );
// Outputs Beta | Omega (order not guaranteed with a ConcurrentBag)
Note that ParallelOptions also has a TaskScheduler property with which you can refine the processing further. Finally, for more control, maybe you want to cancel the processing when a specific threshold is reached? If so, look into the CancellationToken property to exit the process early.
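A sketch of that cancellation idea, with an assumed threshold of 100 processed items: the loop observes the token through ParallelOptions and throws OperationCanceledException once it is cancelled.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class CancelDemo
{
    static void Main()
    {
        var results = new ConcurrentBag<int>();
        var cts = new CancellationTokenSource();
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = 2,
            CancellationToken = cts.Token
        };

        try
        {
            Parallel.For(0, 1_000_000, options, i =>
            {
                results.Add(i);
                if (results.Count >= 100)   // assumed threshold
                    cts.Cancel();           // stop scheduling further iterations
            });
        }
        catch (OperationCanceledException)
        {
            // The loop stops well short of 1,000,000 iterations.
            Console.WriteLine($"Stopped after {results.Count} items");
        }
    }
}
```

The token is checked between iterations, so a few extra items may be processed after Cancel() before the loop actually stops.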
I'm experimenting with the new System.Threading.Parallel methods like parallel for and foreach.
They seem to work nicely, but I need a way to increase the number of concurrent threads being executed, which is currently 8 (I have a quad core).
I know there is a way; I just can't find where they hid the damn property.
Gilad.
quote:
var query = from item in source.AsParallel().WithDegreeOfParallelism(10)
where Compute(item) > 42
select item;
In cases where a query is performing a significant amount of non-compute-bound work such as File I/O, it might be beneficial to specify a degree of parallelism greater than the number of cores on the machine.
from: MSDN
If you are using Parallel.For or Parallel.ForEach you can specify a ParallelOptions object, which has a property MaxDegreeOfParallelism. Unfortunately this is just a maximum limit, as the name suggests, and does not provide a lower-bound guarantee. For the relationship to WithDegreeOfParallelism, see this blog post.
"might be beneficial", so it also might not. Enough said; Blindy's comment has it right.