TPL Dataflow controlling max degree of parallelism between nested blocks - c#

Let's say I have an action block A that performs some work in parallel with a max degree of parallelism of 4.
Say action block A does work X in some cases and work Y in others. X is some small piece of work; Y is larger work that has to be split into smaller chunks, so I need to parallelise those chunks too.
Inside work Y I therefore need to parallelise the chunks with a max degree of 4, but at that point I might already have 4 A blocks executing in parallel, which could lead, for example, to "A-X, A-X, A-Y, A-Y" running at the same time. That would mean 1 + 1 + 4 + 4 parallel tasks, which is too many for my purpose; I want to keep the overall limit at 4 (or any other chosen number).
Is there a way to control the maximum degree of parallelism including nested blocks?

When you create a block in TPL Dataflow, you can specify a custom scheduler for it via its options.
An easy way to limit the total number of concurrent tasks is to share a ConcurrentExclusiveSchedulerPair, constructed with the concurrency level you need, between all the blocks and any nested work.
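A sketch of that idea, assuming a hard cap of 4 tasks across the outer block and the nested chunks: the work items, delays, and chunk count below are made up, and what matters is that the same scheduler pair is wired into both the block options and the nested Task.Factory.StartNew calls, so everything competes for the same 4 slots.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class Program
{
    static async Task Main()
    {
        // One scheduler pair shared by the outer block and the nested work.
        // maxConcurrencyLevel: 4 caps the TOTAL number of concurrently running tasks.
        var schedulerPair = new ConcurrentExclusiveSchedulerPair(
            TaskScheduler.Default, maxConcurrencyLevel: 4);

        var blockA = new ActionBlock<int>(async item =>
        {
            if (item % 2 == 0)
            {
                // Work X: small, runs inline on one of the 4 slots.
                await Task.Delay(10);
            }
            else
            {
                // Work Y: split into chunks, each queued on the SAME scheduler,
                // so chunks compete with other A invocations for the 4 slots.
                var chunkTasks = new Task[4];
                for (int c = 0; c < chunkTasks.Length; c++)
                {
                    chunkTasks[c] = Task.Factory.StartNew(
                        () => Task.Delay(10).Wait(),   // stand-in for a work chunk
                        default, TaskCreationOptions.None,
                        schedulerPair.ConcurrentScheduler);
                }
                // Awaiting frees this delegate's scheduler slot while chunks run.
                await Task.WhenAll(chunkTasks);
            }
        },
        new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 4,
            TaskScheduler = schedulerPair.ConcurrentScheduler
        });

        for (int i = 0; i < 8; i++) blockA.Post(i);
        blockA.Complete();
        await blockA.Completion;
        Console.WriteLine("done");
    }
}
```

If you later add more blocks (or more nesting levels), pointing them all at the same ConcurrentScheduler keeps the global cap intact without any per-block bookkeeping.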

Related

Why the following C# program uses limited (10) number of threads? [duplicate]

I just tried a multithreading sample from This Link, shown below:
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
int count = 0;
Parallel.For(0, 50000, (i, state) =>
{
count++;
});
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
Console.ReadKey();
It gives me 15 threads before Parallel.For and 17 threads after it, so Parallel.For only occupies 2 threads.
Then I created another sample using This Link, shown below:
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount * 10 };
Console.WriteLine("MaxDegreeOfParallelism : {0}", Environment.ProcessorCount * 10);
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
int count = 0;
Parallel.For(0, 50000, options,(i, state) =>
{
count++;
});
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
Console.ReadKey();
In the above code I set MaxDegreeOfParallelism to 40, but Parallel.For is still using the same number of threads.
So how can I increase the number of threads running in Parallel.For?
I am facing a problem where some numbers are skipped inside the Parallel.For when I perform heavy, complex work in it. So I want to increase the maximum number of threads to get around the skipping issue.
What you're saying is something like: "My car is shaking when driving too fast. I'm trying to avoid this by driving even faster." That doesn't make any sense. What you need is to fix the car, not change the speed.
How exactly to do that depends on what you are actually doing in the loop. The code you showed is obviously a placeholder, but even that is wrong. So I think what you should do first is learn about thread safety.
Using a lock is one option, and it's the easiest one to get correct. But it's also hard to make efficient: you need to hold the lock for only a short time in each iteration.
There are other ways to achieve thread safety, including Interlocked, the overloads of Parallel.For that use thread-local data, and approaches other than Parallel.For(), such as PLINQ or TPL Dataflow.
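A minimal sketch of the first two options, applied to the `count++` from the question (the plain lock is correct but serializes the hot path; Interlocked is the cheaper choice for a simple counter):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class CounterDemo
{
    static void Main()
    {
        int unsafeCount = 0, lockedCount = 0, interlockedCount = 0;
        var gate = new object();

        // Data race: lost updates make the result unpredictable.
        Parallel.For(0, 50000, i => unsafeCount++);

        // Correct, but every iteration contends on one lock.
        Parallel.For(0, 50000, i => { lock (gate) lockedCount++; });

        // Correct and much cheaper for a simple counter.
        Parallel.For(0, 50000, i => Interlocked.Increment(ref interlockedCount));

        Console.WriteLine($"{lockedCount} {interlockedCount}"); // both are 50000
    }
}
```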
Only after you have made sure your code is thread safe is it time to worry about things like the number of threads. And regarding that, I think there are two things to note:
For CPU-bound computations, it doesn't make sense to use more threads than the number of cores your CPU has. Using more threads than that will actually usually lead to slower code, since switching between threads has some overhead.
I don't think you can measure the number of threads used by Parallel.For() like that. Parallel.For() uses the thread pool and it's quite possible that there already are some threads in the pool before the loop begins.
Parallel loops use the hardware CPU cores. If your CPU has 2 cores, that is the maximum degree of parallelism you can get on your machine.
Taken from MSDN:
What to Expect
By default, the degree of parallelism (that is, how many iterations run at the same time in hardware) depends on the number of available cores. In typical scenarios, the more cores you have, the faster your loop executes, until you reach the point of diminishing returns that Amdahl's Law predicts. How much faster depends on the kind of work your loop does.
Further reading:
Threading vs Parallelism, how do they differ?
Threading vs. Parallel Processing
Parallel loops will give you the wrong result for summation without locks, because every iteration depends on the single shared variable 'count', and the value of 'count' inside a parallel loop is not predictable. However, using locks in a parallel loop defeats the parallelism, so you should test the parallel loop with something other than a shared-variable summation.
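For instance, the thread-local overload of Parallel.For sums correctly without a per-iteration lock: each thread accumulates a private subtotal that is merged once at the end. A minimal sketch using the counter from the question:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class LocalSum
{
    static void Main()
    {
        int count = 0;
        Parallel.For(0, 50000,
            () => 0,                                      // per-thread initial value
            (i, state, local) => local + 1,               // lock-free per-iteration body
            local => Interlocked.Add(ref count, local));  // merge once per thread
        Console.WriteLine(count); // 50000
    }
}
```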

Multithreading BigInteger Operation

Very complex BigInteger operations are very slow, e.g.
BigInteger.Pow(new BigInteger(2), 3231233282348);
I was wondering if there is any way I could multi thread any of these basic math functions.
It depends on the math function, but I really can't see how you could speed up basic math functions: for these sorts of calculations, the next step typically depends on the previous one. Threading only helps where you have portions of a calculation that can be computed independently; those can then be combined in a final step to produce the result. You would need to break the calculation up into those independent portions yourself.
For example:
If you had the formula 2 * 3 + 3 * 4, you could run two threads, the first calculating 2 * 3 and the second 3 * 4, then sum the two results at the end. You need to work out how to break the calculation down into smaller pieces and thread those accordingly.
In your power example you could compute the following on 4 threads and then combine the results at the end by multiplying them together:
BigInteger.Pow(new BigInteger(2), 807808320587);
BigInteger.Pow(new BigInteger(2), 807808320587);
BigInteger.Pow(new BigInteger(2), 807808320587);
BigInteger.Pow(new BigInteger(2), 807808320587);
This would not save you any time at all, because all 4 cores would be thrashing around working out the same thing, and you would just multiply the results by each other at the end, which is what a single-threaded solution does anyway. It can even be much slower on some processors, which boost one core when the others are idle. I broke this up the same way you would break 2^5 into 2^2 * 2^3.
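As a runnable sketch of that decomposition (with small, made-up exponents instead of the enormous one from the question), the partial powers can be computed on separate tasks and multiplied at the end, using 2^(a+b) = 2^a * 2^b:

```csharp
using System;
using System.Numerics;
using System.Threading.Tasks;

class PowSplit
{
    static void Main()
    {
        // 2^1000 = 2^400 * 2^600: compute the two partial powers concurrently,
        // then combine them with one multiplication.
        var t1 = Task.Run(() => BigInteger.Pow(2, 400));
        var t2 = Task.Run(() => BigInteger.Pow(2, 600));
        Task.WaitAll(t1, t2);
        BigInteger combined = t1.Result * t2.Result;

        Console.WriteLine(combined == BigInteger.Pow(2, 1000)); // True
    }
}
```

Note that the final multiplication of huge partial results is itself expensive, so for very large exponents the combining step can dominate and eat much of the parallel gain.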
The answer of
BigInteger.Pow(new BigInteger(2), 3231233282348);
will contain
Log(2)/Log(10) * 3231233282348 ≈ 9.727e11
digits, so it would take roughly 900 GB just to write the answer down. That's why it is that slow.
If you're using .NET 4.5 read about async await:
http://blog.stephencleary.com/2012/02/async-and-await.html

ThreadPool with speed execution control

I need to process several lines from a database (there can be millions) in parallel in C#. The processing is quite quick (50 to 150 ms per line), but I cannot know this speed before runtime, as it depends on hardware and network.
The ThreadPool, or the newer Task Parallel Library, seems to be what fits my needs, as I am new to threading and want the most efficient way to process the data.
However, these methods do not provide a way to control the execution speed of my tasks (lines/minute): I want to be able to set a maximum speed limit for the processing, or run it at full speed.
Please note that setting the number of threads of the ThreadPool/TaskFactory does not provide sufficient accuracy for my needs, as I would like to be able to set a speed limit below the 'one thread speed'.
Using a custom scheduler for the TPL seems to be a way to do that, but I did not find a way to implement it.
Furthermore, I'm worried about the efficiency cost such a setup would incur.
Could you suggest a way to achieve this, or offer advice?
Thanks in advance for your answers.
The TPL provides a convenient programming abstraction on top of the Thread Pool. I would always select TPL when that is an option.
If you wish to throttle the total processing speed, there's nothing built-in that would support that.
You can measure the total processing speed as you go and regulate it by introducing (non-spinning) delays in each thread. The size of the delay can be adjusted dynamically in your code based on the observed processing speed.
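A minimal sketch of that delay-based approach, built around a hypothetical PaceAsync helper (not part of any library) that each worker calls after processing a line: it compares the number of items processed so far against the target rate and sleeps off any surplus, so the workers collectively converge on the limit.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

class Throttle
{
    static int _processed;
    static readonly Stopwatch _clock = Stopwatch.StartNew();

    // Hypothetical rate limiter: targetPerSecond is the speed cap; <= 0 means full speed.
    static async Task PaceAsync(double targetPerSecond)
    {
        if (targetPerSecond <= 0) return;                    // full speed, no delay
        int done = Interlocked.Increment(ref _processed);
        double expectedSeconds = done / targetPerSecond;     // when item #done "should" finish
        double aheadBy = expectedSeconds - _clock.Elapsed.TotalSeconds;
        if (aheadBy > 0)
            await Task.Delay(TimeSpan.FromSeconds(aheadBy)); // non-spinning wait
    }

    static async Task Main()
    {
        var sw = Stopwatch.StartNew();
        var workers = new Task[4];
        for (int w = 0; w < workers.Length; w++)
            workers[w] = Task.Run(async () =>
            {
                for (int i = 0; i < 5; i++)
                {
                    // ... process one line here ...
                    await PaceAsync(targetPerSecond: 10);    // cap: ~10 lines/sec overall
                }
            });
        await Task.WhenAll(workers);
        Console.WriteLine($"20 items in {sw.Elapsed.TotalSeconds:F1}s"); // roughly 2 s at 10/sec
    }
}
```

Because the delay is computed from the overall observed rate rather than per thread, this also satisfies the requirement of limiting below the 'one thread speed'.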
I am not seeing the advantage of limiting speed, but I suggest you look into limiting the max degree of parallelism of the operation instead. That can be done via MaxDegreeOfParallelism in the ParallelOptions passed to Parallel.ForEach as the code works over the disparate lines of data. That way you control the number of slots, for lack of a better term, which can be expanded or reduced depending on the criteria you are working under.
Here is an example using a ConcurrentBag to process disparate lines of data with 2 parallel tasks.
var myLines = new List<string> { "Alpha", "Beta", "Gamma", "Omega" };
var stringResult = new ConcurrentBag<string>();
ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = 2;
Parallel.ForEach( myLines, parallelOptions, line =>
{
if (line.Contains( "e" ))
stringResult.Add( line );
} );
Console.WriteLine( string.Join( " | ", stringResult ) );
// Outputs Beta | Omega (bag order is not deterministic, so it may also print Omega | Beta)
Note that ParallelOptions also has a TaskScheduler property with which you can refine the processing further. Finally, for more control, maybe you want to cancel the processing when a specific threshold is reached? If so, look into the CancellationToken property to exit the process early.

Estimate segment indexes in Parallel.For

In the parallel for loop below, is there a reliable way to determine how many threads will be created and with what index boundaries?
Parallel.For
(
0,
int.MaxValue,
new ParallelOptions() {MaxDegreeOfParallelism=Environment.ProcessorCount},
(i) =>
{
// Monitor [i] to see how the range is segmented.
}
);
If the processor count on the target machine is 4 and we use all 4 processors, I observe that 4 roughly equal segments are created, each approximately int.MaxValue/4 in size. However, this is just observation, and Parallel.For may or may not guarantee deterministic segmentation.
Searching around did not help much either. Is it possible to predict or calculate this?
You can specify your own partitioner if you don't like the default behavior.
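For example, Partitioner.Create with an explicit range size gives you deterministic segment boundaries instead of the default, implementation-defined chunking (the bounds and range size below are arbitrary):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Ranges
{
    static void Main()
    {
        // Four fixed segments of 250,000 each; the boundaries are known up front,
        // unlike the default dynamic partitioning of Parallel.For.
        var ranges = Partitioner.Create(0, 1_000_000, rangeSize: 250_000);

        Parallel.ForEach(ranges, range =>
        {
            Console.WriteLine($"segment [{range.Item1}, {range.Item2})");
            for (int i = range.Item1; i < range.Item2; i++)
            {
                // process i
            }
        });
    }
}
```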

How to increase concurrent parallel tasks with System.Threading.Parallel (.Net 4.0)

I'm experimenting with the new System.Threading.Parallel methods like parallel for and foreach.
They seem to work nicely, but I need a way to increase the number of concurrent threads being executed, which is 8 (I have a quad core).
I know there is a way; I just can't find where they hid the damn property.
Gilad.
quote:
var query = from item in source.AsParallel().WithDegreeOfParallelism(10)
where Compute(item) > 42
select item;
In cases where a query is performing a significant amount of non-compute-bound work such as File I/O, it might be beneficial to specify a degree of parallelism greater than the number of cores on the machine.
from: MSDN
If you are using Parallel.For or Parallel.ForEach you can specify a ParallelOptions object, which has a MaxDegreeOfParallelism property. Unfortunately this is just a maximum limit, as the name suggests, and does not provide a lower-bound guarantee. For the relationship to WithDegreeOfParallelism see this blog post.
"Might be beneficial" also means it MAY NOT be - enough said. Blindy's comment got it right.
