Parallelness within and across TPL Dataflow blocks?

I have some questions about parallelism within and across TPL Dataflow blocks.
Within a block: I have a TransformBlock with, let's say, a MaxDOP of 4, doing work that I already know would benefit from being as parallel as possible, but the scheduler doesn't know that. So when I give it 200 items, instead of processing roughly 50 items on each of 4 threads, it usually processes something like 150 items on 1 thread and 20 on 2 threads, and then doesn't bother to use a fourth thread. Is there some way to hint that it should be more parallel?
Across blocks: I have several blocks forming a pipeline, A -> B -> C. I imagined that early items would finish processing at C while late items were still being processed at A. But when I gave it about 10k items, it processed all 10k items at A, then all 10k at B, then all 10k at C. That means the first item only exited the pipeline once the last item had finished. I guess that to the task scheduler all tasks are equal, but I care about "first response time" rather than "last response time". How do I hint the blocks to behave differently?
Thanks.

Related

Why does the following C# program use a limited number (10) of threads? [duplicate]

I have just written a multithreading sample using this link, like below:
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
int count = 0;
Parallel.For(0, 50000, options, (i, state) =>
{
    count++;
});
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
Console.ReadKey();
It gives me 15 threads before Parallel.For and only 17 threads after it. So only 2 threads are occupied by Parallel.For.
Then I created another sample using this link, like below:
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount * 10 };
Console.WriteLine("MaxDegreeOfParallelism : {0}", Environment.ProcessorCount * 10);
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
int count = 0;
Parallel.For(0, 50000, options, (i, state) =>
{
    count++;
});
Console.WriteLine("Number of Threads: {0}", System.Diagnostics.Process.GetCurrentProcess().Threads.Count);
Console.ReadKey();
In the above code I have set MaxDegreeOfParallelism, which comes to 40, but Parallel.For is still using the same number of threads.
So how can I increase the number of running threads for Parallel.For?

I am facing a problem where some numbers are skipped inside Parallel.For when I perform some heavy and complex functionality inside it. So I want to increase the maximum number of threads to get around the skipping issue.
What you're saying is something like: "My car is shaking when driving too fast. I'm trying to avoid this by driving even faster." That doesn't make any sense. What you need is to fix the car, not change the speed.
How exactly to do that depends on what you are actually doing in the loop. The code you showed is obviously a placeholder, but even that is wrong. So I think what you should do first is learn about thread safety.
Using a lock is one option, and it's the easiest one to get correct. But it's also hard to make efficient. What you need is to hold the lock only for a short amount of time in each iteration.
There are other ways to achieve thread safety, including using Interlocked, overloads of Parallel.For that use thread-local data, and approaches other than Parallel.For(), like PLINQ or TPL Dataflow.
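As a rough sketch of two of the options mentioned above (Interlocked and the thread-local overload of Parallel.For), assuming the loop body only needs to count iterations as in the question's placeholder code. The variable names are made up for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

const int iterations = 50000;

// Option 1: Interlocked makes each increment of the shared counter atomic.
int interlockedCount = 0;
Parallel.For(0, iterations, i =>
{
    Interlocked.Increment(ref interlockedCount);
});

// Option 2: thread-local subtotals. The shared counter is only touched
// once per worker task, in the final delegate.
int localCount = 0;
Parallel.For(0, iterations,
    () => 0,                                   // initial subtotal for each task
    (i, state, subtotal) => subtotal + 1,      // body works on the local subtotal only
    subtotal => Interlocked.Add(ref localCount, subtotal)); // merge once per task

Console.WriteLine($"Interlocked: {interlockedCount}, thread-local: {localCount}");
```

Both counters should end up equal to the number of iterations, which is not guaranteed with a plain count++ on a shared variable.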
Only after you have made sure your code is thread safe is it time to worry about things like the number of threads. And regarding that, I think there are two things to note:
For CPU-bound computations, it doesn't make sense to use more threads than the number of cores your CPU has. Using more threads than that will actually usually lead to slower code, since switching between threads has some overhead.
I don't think you can measure the number of threads used by Parallel.For() like that. Parallel.For() uses the thread pool and it's quite possible that there already are some threads in the pool before the loop begins.

Parallel loops use hardware CPU cores. If your CPU has 2 cores, that is the maximum degree of parallelism that you can get on your machine.
Taken from MSDN:
What to Expect
By default, the degree of parallelism (that is, how many iterations run at the same time in hardware) depends on the number of available cores. In typical scenarios, the more cores you have, the faster your loop executes, until you reach the point of diminishing returns that Amdahl's Law predicts. How much faster depends on the kind of work your loop does.
Further reading:
Threading vs Parallelism, how do they differ?
Threading vs. Parallel Processing

Parallel loops will give you wrong results for summation without locks, because every iteration updates the single shared variable 'count', and its value during a parallel loop is not predictable. However, using locks in a parallel loop means you do not achieve actual parallelism. So you should test parallel loops with something other than summation.

How to optimize a work queue of processes with well-known execution times

I have an IEnumerable of actions, ordered descending by the time they will take to execute. Now I want all of them to be executed in parallel. Are there any better solutions than this one?
IEnumerable<WorkItem> workItemsOrderedByTime = myFactory.WorkItems.DecendentOrderedBy(t => t.ExecutionTime);
Parallel.ForEach(workItemsOrderedByTime, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, t => t.Execute());
So my idea is to execute the most expensive tasks, in terms of the time they need, first.
EDIT: The question is whether there is a better solution to get everything done in the minimum amount of time.

To solve your XY Problem of
Because otherwise it can happen that 9 of 10 tasks are finished and the last one is executed on 1 core and all other cores are doing nothing.
What you need to do is tell Parallel.ForEach to only take one item from the source list at a time. That way when you are down to the last items you won't have a bunch of slow work items all in a single core's queue.
This can be done by using Partitioner.Create and passing in EnumerablePartitionerOptions.NoBuffering
Parallel.ForEach(Partitioner.Create(workItems, EnumerablePartitionerOptions.NoBuffering),
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    t => t.Execute());
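For anyone who wants to try this in isolation, here is a minimal self-contained sketch of the same call. The work items are replaced by a made-up array of durations in milliseconds, and Thread.Sleep stands in for t.Execute(), so treat the numbers as assumptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical work items: simulated durations in milliseconds, longest first.
int[] durationsMs = { 80, 60, 40, 20, 10, 10, 5, 5 };
int completed = 0;

// NoBuffering: each worker takes exactly one item at a time from the source,
// so slow items do not pile up in one worker's private buffer.
var partitioner = Partitioner.Create(durationsMs, EnumerablePartitionerOptions.NoBuffering);

Parallel.ForEach(
    partitioner,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    ms =>
    {
        Thread.Sleep(ms);                      // stand-in for t.Execute()
        Interlocked.Increment(ref completed);  // thread-safe completion counter
    });

Console.WriteLine($"Completed {completed} of {durationsMs.Length} items");
```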

By default there is no execution order guarantee in Parallel.ForEach.
That is why your call to DecendentOrderedBy does not do anything good. It might even do something bad: if the default partitioner decides to do a range partition, dividing say 12 WorkItems into 4 groups of 3 items in the order of the IEnumerable, then the first core has much more work to do, which creates exactly the problem you are trying to avoid.
An easy fix for this is explained in the answer by Scott. If Parallel.ForEach takes just one item at a time then you naturally get some load balancing. In most cases this will work fine.
The optimal (in most cases) solution for an ordered IEnumerable (as you have) is striped partitioning, with the number of buckets equal to the number of cores. AFAIK you don't get this out of the box in .NET, but you can provide a custom OrderablePartitioner that partitions the data this way.
I am sorry to say it, but: "No free lunch".
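To illustrate what striping means here, below is a rough sketch that does the striping by hand instead of through a custom OrderablePartitioner. The costs and the bucket count are made-up values, and adding up the costs stands in for executing the items.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical costs, already sorted from most to least expensive.
int[] costs = { 90, 80, 70, 60, 50, 40, 30, 20, 10 };
int buckets = 3; // assumption for the example; normally Environment.ProcessorCount

// Striping: the item at position i goes to bucket i % buckets,
// so every bucket gets a mix of expensive and cheap items.
int[][] stripes = Enumerable.Range(0, buckets)
    .Select(b => costs.Where((_, i) => i % buckets == b).ToArray())
    .ToArray();

// One parallel iteration per bucket; each one writes only to its own slot.
int[] totals = new int[buckets];
Parallel.For(0, buckets, b =>
{
    foreach (int cost in stripes[b])
        totals[b] += cost; // stand-in for executing the work item
});

Console.WriteLine(string.Join(", ", totals));
```

With these numbers the buckets carry 180, 150 and 120 units of work, whereas a range partition of the same sorted list (first three, middle three, last three) would carry 240, 150 and 60.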

C# delay in Parallel.ForEach

I use 2 Parallel.ForEach nested loops to retrieve information quickly from a url. This is the code:
while (searches.Count > 0)
{
    Parallel.ForEach(searches, (search, loopState) =>
    {
        Parallel.ForEach(search.items, item =>
        {
            RetrieveInfo(item);
        });
    });
}
The outer ForEach has a list of, for example, 10 items, whilst the inner ForEach has a list of 5. This means that I'm going to query the url 50 times, however I query it 5 times simultaneously (inner ForEach).
I need to add a delay for the inner loop so that after it queries the url, it waits for x seconds - the time taken for the inner loop to complete the 5 requests.
Using Thread.Sleep is not a good idea because it will block the complete thread and possibly the other parallel tasks.
Is there an alternative that might work?

To my understanding, you have 50 tasks and you wish to process 5 of them at a time.
If so, you should look into ParallelOptions.MaxDegreeOfParallelism to process 50 tasks with a maximum degree of parallelism at 5. When one task stops, another task is permitted to start.
If you wish to have tasks processed in chunks of five, followed by another chunk of five (as in, you wish to process chunks in serial), then you would want code similar to
for (...)
{
    Parallel.ForEach(
        [paralleloptions,]
        set of 5, action
    )
}
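A minimal sketch of both variants described above, assuming a flat list of 50 integers in place of the real items and a counting stub in place of RetrieveInfo. The pause between chunks is also only a placeholder.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the 50 items and for RetrieveInfo.
List<int> items = Enumerable.Range(1, 50).ToList();
int retrieved = 0;
void RetrieveInfo(int item) => Interlocked.Increment(ref retrieved);

// Variant 1: at most 5 items in flight; a new one starts as soon as one finishes.
var options = new ParallelOptions { MaxDegreeOfParallelism = 5 };
Parallel.ForEach(items, options, item => RetrieveInfo(item));

// Variant 2: chunks of 5 processed one after another.
const int chunkSize = 5;
for (int start = 0; start < items.Count; start += chunkSize)
{
    List<int> chunk = items.GetRange(start, Math.Min(chunkSize, items.Count - start));
    Parallel.ForEach(chunk, item => RetrieveInfo(item));

    // Placeholder pause between chunks. No parallel work is running at this
    // point, so only the sequential outer loop waits here.
    Thread.Sleep(10);
}

Console.WriteLine($"Retrieved {retrieved} times");
```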

Parallel.For and Break() misunderstanding?

I'm investigating how Break() works in a Parallel.For loop.
After reading this and this I still have a question:
I'd expect this code :
Parallel.For(0, 10, (i, state) =>
{
    Console.WriteLine(i);
    if (i == 5) state.Break();
});
To yield at most 6 numbers (0..6).
Not only does it not do that, but the output has a different length on each run:
02351486
013542
0135642
Very annoying. (where the hell is Break() {after 5} here ??)
So I looked at msdn
Break may be used to communicate to the loop that no other iterations after the current iteration need be run.
If Break is called from the 100th iteration of a for loop iterating in parallel from 0 to 1000, all iterations less than 100 should still be run, but the iterations from 101 through to 1000 are not necessary.
Question #1 :
Which iterations? The overall iteration counter, or per thread? I'm pretty sure it is per thread. Please confirm.
Question #2 :
Let's assume we are using Parallel with range partitioning (because the CPU cost doesn't vary between elements), so it divides the data among threads. So if we have 4 cores (and a perfect division among them):
core #1 got 0..250
core #2 got 251..500
core #3 got 501..750
core #4 got 751..1000
then the thread on core #1 will reach value=100 at some point and will break.
That will be its iteration number 100.
But the thread on core #4 got more quanta and is at 900 now. It is way beyond its 100th iteration.
It has no index below 100 at which to be stopped, so it will show them all.
Am I right? Is that the reason why I get more than 5 elements in my example?
Question #3 :
How can I truly break when (i == 5)?
P.S.
I mean, come on! When I call Break(), I want the loop to stop,
exactly as it does in a regular for loop.

To yield at most 6 numbers (0..6).
The problem is that this won't yield at most 6 numbers.
What happens is, when you hit a loop with an index of 5, you send the "break" request. Break() will cause the loop to no longer process any values >5, but process all values <5.
However, any values greater than 5 which were already started will still get processed. Since the various indices are running in parallel, they're no longer ordered, so you get various runs where some values >5 (such as 8 in your example) are still being executed.
Which iterations? The overall iteration counter, or per thread? I'm pretty sure it is per thread. Please confirm.
This is the index being passed into Parallel.For. Break() won't prevent items from being processed, but provides a guarantee that all items up to 100 get processed, but items above 100 may or may not get processed.
Am I right? Is that the reason why I get more than 5 elements in my example?
Yes. If you use a partitioner like you've shown, as soon as you call Break(), items beyond the one where you break will no longer get scheduled. However, items (which is the entire partition) already scheduled will get processed fully. In your example, this means you're likely to always process all 1000 items.
How can I truly break when (i == 5) ?
You are - but when you run in Parallel, things change. What is the actual goal here? If you only want to process the first 6 items (0-5), you should restrict the items before you loop through them via a LINQ query or similar. You can then process the 6 items in Parallel.For or Parallel.ForEach without a Break() and without worry.
I mean, come on! When I call Break(), I want the loop to stop, exactly as it does in a regular for loop.
You should use Stop() instead of Break() if you want things to stop as quickly as possible. This will not stop items that are already running, but it will no longer schedule any new items (including ones at lower indices or earlier in the enumeration than your current position).
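A small sketch of the two points above, assuming it is enough to record which indices ran: Break() still lets every index below the break point run, and restricting the range up front avoids the need for Break() altogether. The collection names are made up for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Record every index that actually ran.
var processed = new ConcurrentBag<int>();

ParallelLoopResult result = Parallel.For(0, 1000, (i, state) =>
{
    processed.Add(i);
    if (i == 5) state.Break();
});

// Break() guarantees that every index up to the break point has run.
// Indices above 5 that had already started may show up as well.
bool allUpToFiveRan = Enumerable.Range(0, 6).All(i => processed.Contains(i));
Console.WriteLine($"0..5 all ran: {allUpToFiveRan}");
Console.WriteLine($"LowestBreakIteration: {result.LowestBreakIteration}");
Console.WriteLine($"Iterations that ran: {processed.Count}"); // varies from run to run

// If only items 0..5 should ever be processed, restrict the range instead.
var firstSix = new ConcurrentBag<int>();
Parallel.For(0, 6, i => firstSix.Add(i));
Console.WriteLine($"Restricted loop ran {firstSix.Count} iterations");
```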

If Break is called from the 100th iteration of a for loop iterating in parallel from 0 to 1000
The 100th iteration of the loop is not necessarily (in fact probably not) the one with the index 99.
Your threads can and will run in an indeterminate order. When the .Break() call is encountered, no further loop iterations beyond that index will be started. Exactly when that happens depends on the specifics of thread scheduling for a particular run.
I strongly recommend reading
Patterns of Parallel Programming
(free PDF from Microsoft)
to understand the design decisions and design tradeoffs that went into the TPL.

Which iterations ? the overall iteration counter ? or per thread ?
Of all the iterations scheduled (or yet to be scheduled).
Remember that the delegate may be run out of order. There is no guarantee that iteration i == 5 will be the sixth to execute; in fact that is unlikely except in rare cases.
Q2: Am I right ?
No, the scheduling is not so simplistic. Rather, all the tasks are queued up and then the queue is processed. But each thread uses its own queue until it is empty, at which point it steals work from the other threads. This leaves no way to predict which thread will process which delegate.
If the delegates are sufficiently trivial it might all be processed on the original calling thread (no other thread gets a chance to steal work).
Q3: How can I truly break when (i == 5) ?
Don't use concurrency if you want linear (in a specific order) processing.
The Break method is there to support speculative execution: try various ways and stop as soon as any one completes.

Acceptable use of Thread.Sleep()

I'm working on a console application which will be scheduled and run at set intervals, say every 30 minutes. Its only purpose is to query a Web Service to update a batch of database rows.
The Web Service API recommends calling once every 30 seconds, and timing out after a set interval. The following pseudocode is given as an example:
listId := updateList(<list of terms>)
LOOP
    WHILE NOT isUpdatingComplete(listId)
END LOOP
statuses := getStatuses("LIST_ID = {listId}")
I have coded this roughly in C# as:
// Submit the list once, then poll for completion (at most 5 checks, 30 seconds apart).
listId = client.updateList(options, terms, out messages);
int callCount = 0;
while (callCount < 5 && !client.isUpdateComplete(listId, out messages))
{
    callCount++;
    Thread.Sleep(30000);
}
// Get resulting status...
Is it OK in this situation to use Thread.Sleep()? I'm aware it is not generally good practice but from reading reasons not to use it this seems like acceptable usage.
Thanks.

Thread.Sleep ensures the current thread doesn't return until at least the specified milliseconds have passed. There are plenty of places it's appropriate to do that, and your example seems fine, assuming it's running on a background thread.
Some example places you don't want to use it - on the UI thread or where you need to do exact timing.

Generally speaking, Thread.Sleep is like any other tool: perfectly OK to use, except when it's terribly misused. I disagree with the "not generally good practice" part, which is the result of people abusing Thread.Sleep when they should be doing something else (i.e. blocking on a synchronization object).
In your case the program is single-threaded, it has no UI (i.e. the thread has no message loop) and you do not want to synchronize with external events. Therefore Thread.Sleep is just fine.

The general objection against Sleep() is that it wastes a Thread.
In your case there is only 1 Thread (maybe 2) so that is not really a problem.
So I think it looks fine (but I would sleep 29 seconds to cut some slack).

It's fine, except that you cannot interrupt it once it goes into sleep, without aborting the thread (which is not recommended).
That's why a ManualResetEvent might be a better idea, since it can be signalled ("awakened") from a different thread.
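A minimal sketch of that idea, assuming some other thread decides when to wake the waiter. The 100 ms signalling delay is made up so that the example finishes quickly.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var wake = new ManualResetEvent(false);

// Hypothetical: another thread decides to interrupt the wait after 100 ms.
Task signaller = Task.Run(() =>
{
    Thread.Sleep(100);
    wake.Set();
});

// Wait up to 30 seconds, but return as soon as the event is signalled.
// WaitOne returns true if signalled and false if the timeout elapsed.
bool signalled = wake.WaitOne(TimeSpan.FromSeconds(30));
signaller.Wait();

Console.WriteLine(signalled ? "Woken early by signal" : "Timed out after 30 seconds");
wake.Dispose();
```

In the polling loop from the question, a WaitOne call with a 30 second timeout could take the place of Thread.Sleep(30000), with the loop exiting early when it returns true.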

You could stick with the Thread.Sleep method. But it would be more elegant to schedule the application to run every 30 minutes, so you don't have to take care of the waiting inside your application.

Thread.Sleep isn't the best for executing periodic logic. Thread.Sleep(n) means your thread will relinquish control for n milliseconds. There is no guarantee that it will regain control after n milliseconds, it depends on the CPU load.

If you would be blocking the thread for 30 minutes, you should instead schedule a Windows task every 30 minutes, so that the program executes and then ends. That way you are not blocking a thread for so long.
For shorter times, like 30 seconds or 1 minute, System.Threading.Thread.Sleep() is perfectly fine. For more than 5 minutes I would use a Windows scheduled task. (I'm Spanish; I think that is what they are called in the English version. I'm talking about the tasks you schedule from the Control Panel ;-) )
