ThreadPool with speed execution control - c#

I need proccess several lines from a database (can be millions) in parallel in c#. The processing is quite quick (50 or 150ms/line) but I can not know this speed before runtime as it depends on hardware/network.
The ThreadPool or the newer TaskParallelLibrary seems to be what feets my needs as I am new to threading and want to get the most efficient way to process the data.
However these methods does not provide a way to control the speed execution of my tasks (lines/minute) : I want to be able to set a maximum speed limit for the processing or run it full speed.
Please note that setting the number of thread of the ThreadPool/TaskFactory does not provide sufficient accuracy for my needs as I would like to be able to set a speed limit below the 'one thread speed'.
Using a custom sheduler for the TPL seems to be a way to do that, but I did not find a way to implement it.
Furthermore, I'm worried about the efficiency cost that would take such a setup.
Could you provide me a way or advices how to achieve this work ?
Thanks in advance for your answers.

The TPL provides a convenient programming abstraction on top of the Thread Pool. I would always select TPL when that is an option.
If you wish to throttle the total processing speed, there's nothing built-in that would support that.
You can measure the total processing speed as you proceed through the file and regulate speed by introducing (non-spinning) delays in each thread. The size of the delay can be dynamically adjusted in your code based on observed processing speed.

I am not seeing the advantage of limiting a speed, but I suggest you look into limiting max degree of parallalism of the operation. That can be done via MaxDegreeOfParallelism in the ParalleForEach options property as the code works over the disparate lines of data. That way you can control the slots, for lack of a better term, which can be expanded or subtracted depending on the criteria which you are working under.
Here is an example using the ConcurrentBag to process lines of disperate data and to use 2 parallel tasks.
var myLines = new List<string> { "Alpha", "Beta", "Gamma", "Omega" };
var stringResult = new ConcurrentBag<string>();
ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = 2;
Parallel.ForEach( myLines, parallelOptions, line =>
{
if (line.Contains( "e" ))
stringResult.Add( line );
} );
Console.WriteLine( string.Join( " | ", stringResult ) );
// Outputs Beta | Omega
Note that parallel options also has a TaskScheduler property which you can refine more of the processing. Finally for more control, maybe you want to cancel the processing when a specific threshold is reached? If so look into CancellationToken property to exit the process early.

Related

Why the following C# program uses limited (10) number of threads? [duplicate]

I have just did a sample for multithreading using This Link like below:
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
int count = 0;
Parallel.For(0, 50000, options,(i, state) =>
{
count++;
});
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
Console.ReadKey();
It gives me 15 thread before Parellel.For and after it gives me 17 thread only. So only 2 thread is occupy with Parellel.For.
Then I have created a another sample code using This Link like below:
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount * 10 };
Console.WriteLine("MaxDegreeOfParallelism : {0}", Environment.ProcessorCount * 10);
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
int count = 0;
Parallel.For(0, 50000, options,(i, state) =>
{
count++;
});
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
Console.ReadKey();
In above code, I have set MaxDegreeOfParallelism where it sets 40 but is still taking same threads for Parallel.For.
So how can I increase running thread for Parallel.For?
I am facing a problem that some numbers is skipped inside the Parallel.For when I perform some heavy and complex functionality inside it. So here I want to increase the maximum thread and override the skipping issue.
What you're saying is something like: "My car is shaking when driving too fast. I'm trying to avoid this by driving even faster." That doesn't make any sense. What you need is to fix the car, not change the speed.
How exactly to do that depends on what are you actually doing in the loop. The code you showed is obviously placeholder, but even that's wrong. So I think what you should do first is to learn about thread safety.
Using a lock is one option, and it's the easiest one to get correct. But it's also hard to make it efficient. What you need is to lock only for a short amount of time each iteration.
There are other options how to achieve thread safety, including using Interlocked, overloads of Parallel.For that use thread-local data and approaches other than Parallel.For(), like PLINQ or TPL Dataflow.
After you made sure your code is thread safe, only then it's time to worry about things like the number of threads. And regarding that, I think there are two things to note:
For CPU-bound computations, it doesn't make sense to use more threads than the number of cores your CPU has. Using more threads than that will actually usually lead to slower code, since switching between threads has some overhead.
I don't think you can measure the number of threads used by Parallel.For() like that. Parallel.For() uses the thread pool and it's quite possible that there already are some threads in the pool before the loop begins.
Parallel loops use hardware CPU cores. If your CPU has 2 cores, this is the maximum degree of paralellism that you can get in your machine.
Taken from MSDN:
What to Expect
By default, the degree of parallelism (that is, how many iterations run at the same time in hardware) depends on the
number of available cores. In typical scenarios, the more cores you
have, the faster your loop executes, until you reach the point of
diminishing returns that Amdahl's Law predicts. How much faster
depends on the kind of work your loop does.
Further reading:
Threading vs Parallelism, how do they differ?
Threading vs. Parallel Processing
Parallel loops will give you wrong result for summation operations without locks as result of each iteration depends on a single variable 'Count' and value of 'Count' in parallel loop is not predictable. However, using locks in parallel loops do not achieve actual parallelism. so, u should try something else for testing parallel loop instead of summation.

TPL Dataflow controlling max degree of parallelism between nested blocks

Let's say I have an action block A that perform some work in parallel with a max degree of parallelism of 4.
Say I have case where action block A is doing work X in some cases and work Y in others. X is some small work, Y is some larger work that requires it to be split into smaller work chunks and therefore I need to parallelise those too.
Inside work Y I therefore need to parallelise the work chunks to a max degree of 4, but at this point I might have 4 A blocks executing in parallel which could lead for example to "A-X, A-X, A-Y, A-Y" running in parallel. This would result in 1 + 1 + 4 + 4 parallel tasks which is too many parallel tasks for my purpose as I would always keep it limited to a maximum of 4 (or any other chosen number) overall.
Is there a way to control the maximum degree of parallelism including nested blocks?
While creating a block in TPL Dataflow, you can specify a custom scheduler for the block via its options.
Easy way to limit the number of concurrent tasks and concurrency level is to use the ConcurrentExclusiveSchedulerPair in your code, with parameters you need.

Prevent context-switching in timed section of code (or measure then subtract time not actually spent in thread)

I have a multi-threaded application, and in a certain section of code I use a Stopwatch to measure the time of an operation:
MatchCollection matches = regex.Matches(text); //lazy evaluation
Int32 matchCount;
//inside this bracket program should not context switch
{
//start timer
MyStopwatch matchDuration = MyStopwatch.StartNew();
//actually evaluate regex
matchCount = matches.Count;
//adds the time regex took to a list
durations.AddDuration(matchDuration.Stop());
}
Now, the problem is if the program switches control to another thread somewhere else while the stopwatch is started, then the timed duration will be wrong. The other thread could have done any amount of work before the context switches back to this section.
Note that I am not asking about locking, these are all local variables so there is no need for that. I just want the timed section to execute continuously.
edit: another solution could be to subtract the context-switched time to get the actual time done doing work in the timed section. Don't know if that's possible.
You can't do that. Otherwise it would be very easy for any application to get complete control over the CPU timeslices assigned to it.
You can, however, give your process a high priority to reduce the probability of a context-switch.
Here is another thought:
Assuming that you don't measure the execution time of a regular expression just once but multiple times, you should not see the average execution time as an absolute value but as a relative value compared to the average execution times of other regular expressions.
With this thinking you can compare the average execution times of different regular expressions without knowing the times lost to context switches. The time lost to context switches would be about the same in every average, assuming the environment is relatively stable with regards to CPU utilization.
I don't think you can do that.
A "best effort", for me, would be to put your method in a separate thread, and use
Thread.CurrentThread.Priority = ThreadPriority.Highest;
to avoid as much as possible context switching.
If I may ask, why do you need such a precise measurement, and why can't you extract the function, and benchmark it in its own program if that's the point ?
Edit : Depending on the use case it may be useful to use
Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2); // Or whatever core you want to stick to
to avoid switch between cores.

Running maximum threads: Automatic performance adjustment

I'm developing an app which one scans thousands copies of a struct; ~1 GB RAM. Speed is important.
ParallelScan(_from, _to); //In a new thread
I manually adjust the threads count:
if (myStructs.Count == 0) { threads = 0; }
else if (myStructs.Count < 1 * Number.Thousand) { threads = 1; }
else if (myStructs.Count < 3 * Number.Thousand) { threads = 2; }
else if (myStructs.Count < 5 * Number.Thousand) { threads = 4; }
else if (myStructs.Count < 10 * Number.Thousand) { threads = 8; }
else if (myStructs.Count < 20 * Number.Thousand) { threads = 12; }
else if (myStructs.Count < 30 * Number.Thousand) { threads = 20; }
else if (myStructs.Count < 50 * Number.Thousand) { threads = 30; }
else threads = 40;
I just wrote it from scratch and I need to modify it for another CPU, etc. I think I could write a smarter code which one dynamically starts a new thread if CPU is available at the moment:
If CPU is not %100 start N thread
Measure CPU or thread process time & modify/estimate N
Loop until scan all struct array
Is there anyone think that "I did something similar" or "I have a better idea" ?
UPDATE: The solution
Parallel.For(0, myStructs.Count - 1, (x) =>
{
ParallelScan(x, x); // Will be ParallelScan(x);
});
I did trim tons of code. Thanks people!
UPDATE 2: Results
Scan time for 10K templates
1 Thread: 500 ms
10 Threads: 300 ms
40 Threads: 600 ms
Tasks: 100 ms
The standard answer: Use Tasks (TPL) , not Threads. Tasks require Fx4.
Your ParallelScan could just use Parallel.Foreach( ... ) or PLINQ (.AsParallel()).
The TPL framework includes a scheduler, and ForEach() uses a partitioner, to adapt to CPU cores and load. Your problem is most likely solved with the standard components but you can write custom-schedulers and -partitioners.
Actually, you won't get much benefit from spanning 50 threads, if you CPU only has two cores (even if each of them supports hyperthreading). If will actually run slower due to context switching which will occur every once in a while.
That means you should go for the Task Parallel Library (.NET 4), which takes care that all available cores are used efficiently.
Apart from that, improving the asymptotic duration of your search algorithm might prove more valuable for large quantities of data, regardless of the Moore's law.
[Edit]
If you are unable/unwilling to use .NET 4 TPL, you can start by getting the information about the current number of logical processors in the system (use Environment.ProcessorCount or check this answer for detailed info). Based on that number, you can partition your data and span a fixed number of threads. That is much simpler that checking the CPU utilization, and should prevent creating unnecessary threads which are starved anyway.
OK, sorry to keep going on but first to compile my comments:
Unless you have a very, very, very, good reason to think that scanning these structs will take any more than a handful of microseconds and that really, really, really matters, it's not a good idea to do this kind of optimisation. If you really want to do it, you should have one thread per core. But really - don't. If it's just 50,000 structs and you're doing something simple with them, don't bother.
FYI, starting a new thread takes a good amount of time (a measurable part of a second, several milliseconds).
How long does this operation take? It's very unlikely that it's useful for you to optimize multithreading like this. It will give you the worst improvement. Better improvement will be gained by a better algorithm, or not having to depend on this weird invented multithreading scheme.
I'm confused about your performance fixation partly because you say you're looking through 50,000 structs (a very quick and easy operation) and partly because you're using structs. Without boxing that's a value type and if you're passing them around threads you're copying data rather than references, i.e. using more memory. My point being that that's a lot of data/memory, unless the structs are small, in which case, what kind of processing can you possibly be doing on them that takes so long as to think about 40+ threads in parallel?
If performance is truly incredibly important and your goal, and you're not simply trying to do this as a nice engineering exercise, please share information about what kind of processing you're doing.

How to increase concurrent parallel tasks with System.Threading.Parallel (.Net 4.0)

I'm experimenting with the new System.Threading.Parallel methods like parallel for and foreach.
They seem to work nicely but I need a way to increase the number of concurrent threads that are executed which are 8 (I have a Quad core).
I know there is a way I just can find the place thy hidden the damn property.
Gilad.
quote:
var query = from item in source.AsParallel().WithDegreeOfParallelism(10)
where Compute(item) > 42
select item;
In cases where a query is performing a significant amount of non-compute-bound work such as File I/O, it might be beneficial to specify a degree of parallelism greater than the number of cores on the machine.
from: MSDN
IF you are using Parallel.For or Parallel.ForEach you can specify a ParallelOptions object which has a property MaxDegreesOfParallelism. Unfortunately this is just a maximum limit as the name suggests, and does not provide a lower bound guarantee. For the relationsship to WithDegreeOfParallelism see this blog post.
MAY NOT - enough said. Blindy commented it correctly

Categories